Big Data Application Platform
It's time to think about the architecture and application platforms surrounding "Big Data" databases. Big Data is often centered on new database technologies, mostly from the emerging NoSQL world. The main challenge these databases address is how to handle massive amounts of data at a reasonable cost without sacrificing performance. Distributed databases emerged to meet this challenge, and today we're seeing a high adoption rate and quite impressive success stories, such as Netflix's use of the Cassandra/DataStax solution. All of this indicates the speed at which this market is evolving.
The need for a Big Data Application Platform
Application platforms provide a framework for making the development of applications simpler. They do this by carving out the generic parts of applications such as security, scalability, and reliability (which are attributes of a 'good' application) from the parts of the applications that are specific to our business domain.
Most existing application platforms, such as Java EE and Ruby on Rails, were designed with centralized relational databases in mind. Clearly, that model doesn’t fit the Big Data world well, simply because it wasn’t designed to deal with massive amounts of data in the first place. In addition, frameworks such as Hadoop are considered too complex, as Mike Gilpin, VP/Research Director for Forrester Research, notes in his post, "Big Data" technology: getting hotter, but still too hard:
"Big Data" also matters to application developers - at least, to those who are building applications in domains where "Big Data" is relevant. These include smart grid, marketing automation, clinical care, fraud detection and avoidance, criminal justice systems, cyber-security, and intelligence.
One "big question" about "Big Data": What’s the right development model? Virtually everyone who comments on this issue points out that today’s models, such as those used with Hadoop, are too complex for most developers. It requires a special class of developers to understand how to break their problem down into the components necessary for treatment by a distributed architecture like Hadoop. For this model to take off, we need simpler models that are more accessible to a wider range of developers - while retaining all the power of these special platforms.
Other existing models for handling Big Data, such as the data warehouse, don’t cut it either, as Dan Woods notes in his Forbes post, Big Data Requires a Big, New Architecture:
...to take maximum advantage of big data, IT is going to have to press the re-start button on its architecture for acquiring and understanding information. IT will need to construct a new way of capturing, organizing and analyzing data, because big data stands no chance of being useful if people attempt to process it using the traditional mechanisms of business intelligence, such as a data warehouses and traditional data-analysis techniques.
To write Big Data applications effectively, we need an application platform that brings together, in one framework, the patterns and tools used by pioneers in this space such as Google, Yahoo, and Facebook, and makes them simple enough that any organization can use them without a huge investment.
Here's my personal view of what that platform could look like, based on my experience covering the NoSQL space for a while now and on my work with GigaSpaces.