Wednesday, December 21, 2011

In Memory Data Grid Technologies

After winning a CSC Leading Edge Forum (LEF) research grant, I (Paul Colmer) wanted to publish some of the highlights of my research to share with the wider technology community.

What is an In Memory Data Grid?

It is not an in-memory relational database, a NoSQL database or a relational database.  It is a different breed of software datastore.

In summary, an IMDG is an ‘off the shelf’ software product that exhibits the following characteristics:

The data model is distributed across many servers in a single location or across multiple locations.  This distribution is known as a data fabric.  This distributed model is known as a ‘shared nothing’ architecture.

  • All servers can be active in each site.
  • All data is stored in the RAM of the servers.
  • Servers can be added or removed non-disruptively, to increase the amount of RAM available.
  • The data model is non-relational and is object-based. 
  • Distributed applications written on the .NET and Java application platforms are supported.
  • The data fabric is resilient, allowing non-disruptive automated detection and recovery of a single server or multiple servers.

There are also hardware appliances that exhibit all these characteristics.  I use the term in-memory data grid appliance to describe this group of products and these were excluded from my research.

There are six products in the market that I would consider for a proof of concept, or as a starting point for a product selection and evaluation: 

  • VMware GemFire (Java)
  • Oracle Coherence (Java)
  • Alachisoft NCache (.NET)
  • GigaSpaces XAP Elastic Caching Edition (Java)
  • Hazelcast (Java)
  • ScaleOut StateServer (.NET)

Here are the remaining products on the market that I consider IMDGs:

  • IBM eXtreme Scale
  • Terracotta Enterprise Suite
  • JBoss (Red Hat) Infinispan

Relative newcomers to this space that are worth watching closely are Microsoft and TIBCO.

Why would I want an In Memory Data Grid? 

Let’s compare this with our old friend the traditional relational database:

  • Performance – using RAM is faster than using disk.  No need to try and predict what data will be used next.  It’s already in memory to use.
  • Data Structure – using a key/value store allows greater flexibility for the application developer.  The data model and application code are inextricably linked, more so than with a relational structure.
  • Operations – Scalability and resiliency are easy to provide and maintain.  Software / hardware upgrades can be performed non-disruptively.

How does an In Memory Data Grid map to real business benefits?

  • Competitive Advantage – businesses will make better decisions faster.
  • Safety – businesses can improve the quality of their decision-making.
  • Productivity – improved business process efficiency reduces waste and is likely to improve profitability.
  • Improved Customer Experience – provides the basis for a faster, more reliable web service, which is a strong differentiator in the online business sector.

How do I use an In Memory Data Grid?

  1. Simply install your servers in a single site or across multiple sites.  Each group of servers within a site is referred to as a cluster.
  2. Install the IMDG software on all the servers and choose the appropriate topology for the product.  For multi-site operations I always recommend a partitioned and replicated cache.
  3. Set up your APIs or GUI interfaces to allow replication between the various servers.
  4. Develop your data model and the business logic around the model.

With a partitioned and replicated cache, you partition the cache across the servers in the way that best suits the business needs you are trying to fulfil, and replication ensures there are sufficient copies across the servers.  This means that if a server dies, there is no effect on the business service, provided you have provisioned enough capacity.
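To make the idea concrete, here is a minimal Python sketch of a partitioned and replicated cache. It is a toy model and not the API of any product listed above: the class name, the server names, the hash-based placement and the `backups` parameter are all assumptions made for illustration. Each key gets one primary server and a configurable number of backup servers, so losing a single server still leaves a copy of every entry.

```python
import hashlib


class PartitionedReplicatedCache:
    """Toy model: every key has one primary and `backups` backup servers."""

    def __init__(self, servers, backups=1):
        self.ring = sorted(servers)  # fixed placement order (assumption)
        self.backups = backups
        # Each dict stands in for the RAM of one live server.
        self.stores = {name: {} for name in self.ring}

    def _owners(self, key):
        """Return the servers assigned to this key, primary first."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.ring)
        count = min(self.backups + 1, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)] for i in range(count)]

    def put(self, key, value):
        """Write the entry to every live owner (primary and backups)."""
        for name in self._owners(key):
            if name in self.stores:
                self.stores[name][key] = value

    def get(self, key):
        """Read from the first live owner that still holds the entry."""
        for name in self._owners(key):
            if name in self.stores and key in self.stores[name]:
                return self.stores[name][key]
        return None

    def fail(self, name):
        """Simulate a server dying: its in-memory data is gone."""
        self.stores.pop(name, None)


if __name__ == "__main__":
    cache = PartitionedReplicatedCache(
        ["server-a", "server-b", "server-c"], backups=1
    )
    cache.put("order-42", {"status": "paid"})
    cache.fail("server-b")
    print(cache.get("order-42"))  # still readable from a surviving copy
```

With one backup per entry, any single server can fail without data loss. Surviving two simultaneous failures would need more backups, and therefore more RAM, which is the capacity provisioning point made above. A real product would also re-create the lost copies on the remaining servers, which this sketch leaves out.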

The key here is to design a topology that mitigates all business risk, so that if a server or a site is inoperable, the service keeps running seamlessly in the background. 

There are also some tough decisions you may need to make regarding data consistency versus performance.  You can trade performance for stronger data consistency, and vice versa.
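One common form of this trade-off is whether a write waits for the backup copy before returning. The Python sketch below is only an illustration under that assumption. The class and method names are hypothetical and do not correspond to any vendor's API. A synchronous write is slower for the caller but the backup is always current. An asynchronous write returns sooner but leaves a window in which the backup is stale.

```python
from collections import deque


class ReplicatedEntryStore:
    """Toy primary/backup pair with synchronous or asynchronous replication."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.backup = {}
        self.pending = deque()  # writes not yet applied to the backup

    def put(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # The caller waits until the backup copy is written.
            self.backup[key] = value
        else:
            # The caller returns at once; the backup catches up later.
            self.pending.append((key, value))

    def flush(self):
        """Apply queued writes to the backup (the background replication step)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.backup[key] = value

    def read_after_primary_failure(self, key):
        """What a client would see if the primary died right now."""
        return self.backup.get(key)


if __name__ == "__main__":
    fast = ReplicatedEntryStore(synchronous=False)
    fast.put("price:ACME", 101.5)
    # The backup has not caught up yet, so a failover here loses the write.
    print(fast.read_after_primary_failure("price:ACME"))
    fast.flush()
    print(fast.read_after_primary_failure("price:ACME"))
```

In a real grid the synchronous path costs at least one extra network round trip per write, so the choice depends on whether the business can tolerate losing the most recent writes when a server fails.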

Are there any proven use cases for In Memory Data Grid adoption?

Oh yes, and if you’re a competitor in these markets, you may want to rethink your solution.

Financial Services: Improve decision-making, profitability and market competitiveness through increased performance in financial stock-trading markets. Reduction in processing times from 60 minutes to 60 seconds.

Online Retailer: Providing a highly available, easily maintainable and scalable solution for 3+ million visitors per month in the online card retailer market.

Aviation: Three-site active / active / active flight booking system for a major European budget-airline carrier. Three sites are London, Dublin and Frankfurt.

Check out the VMware Gemfire and Alachisoft NCache websites for more details on these proven use cases.

About the Author:

Paul Colmer is a technology consultant working for CSC, and a director and active professional musician at Music4Film.net.  He specialises in Cloud Computing, Social Business and Solution Architecture. He is based in Brisbane, Australia. http://www.linkedin.com/pub/paul-colmer/6/894/539

Reader Comments (10)

What this post is about?

December 21, 2011 | Unregistered Commenter Vladimir Rodionov

Your article is an excellent summary of in-memory data grids and their benefits; thanks for sharing the results of your work. Near the end of your post you mention that an IMDG can provide a competitive edge due to its enabling faster application performance. We agree, and at ScaleOut Software we’ve extended the performance benefit of an IMDG by integrating a computational engine with our product. This enables fast analysis of stored data using the popular "map/reduce" style of computation. Users across a wide range of vertical applications find that this can give them a further competitive advantage by speeding up data mining and decision making.
I’m sure you’re aware of the huge buzz around Hadoop’s map/reduce model for analyzing big data. We use the same powerful approach, but by storing data in memory in the IMDG instead of in a distributed file system, we have observed much faster performance. (Many interesting datasets fit within the memory of an IMDG.) A recent test of a financial "stock back-testing" application demonstrated a 16X improvement over Hadoop. It also turns out that an IMDG is easier to use for data analysis because the IMDG's object-oriented view of data integrates naturally into Java and C#.
One final note: you list ScaleOut StateServer as a .NET solution. That’s correct, but it also provides equivalent support for Java. We believe that ScaleOut may be the only true cross platform IMDG since our underlying engine is written in C and includes both .NET and Java libraries. In fact, you can mix Java and .NET objects on both Windows and Linux IMDG servers operating as a single IMDG.

December 21, 2011 | Unregistered Commenter William L. Bain

There is no mention of memcached. Can you also please compare the access APIs/protocols for each product?

December 21, 2011 | Unregistered Commenter Shankar

I think this post misses the point.
Why do you mix languages with technologies?

You put in the requisites of an IMDG the support for .NET and Java?? What?

This post is only a list of well known best practices and some advertising for products.

December 22, 2011 | Unregistered Commenter Paolo Casciello

I think this post is missing the point, and by a wide mark too.

I get the overall impression that the author has spent more time reading the marketing blurbs and hype than actually understanding the requirements in this space and using the products. It's so limited on technical information and heavy on simplistic marketing commentary that it's almost disinformation.

One simple point...as soon as you go distributed you must consider the network. Getting data out of distributed memory via the network is going to be orders of magnitude slower than retrieving data from local memory / disk. Obviously this can be solved by processing data locally. Hadoop and some of the products mentioned are great examples of this, but it's not mentioned once.

Was this a sponsored post?

December 22, 2011 | Unregistered Commenter T Mitchell

You really ought to mention TIBCO's ActiveSpaces as one of the products to seriously consider! Especially the new version 2.0 which has some unique features.

December 22, 2011 | Unregistered Commenter JNM

This post was quite painful to read. No mention of the downsides with regard to the CAP trade-offs? How about operational durability during cascading network events? There's a place and time for everything and IMDGs can be quite useful, but treating them like a drop-in replacement for the other technologies borders on the irresponsible.

December 22, 2011 | Unregistered Commenter Ted Chen

Great posts guys. A couple of important points. It is NOT a sponsored post, I decided to share some of my research highlights, as many people outside of the specialist IT community have not heard of IMDGs and even those in the community may not know where they should start.

The technical detail is contained within the actual research paper (over 150 pages in 2 documents). What you see here is just a summary. I'll try and post some more of the detail over the next few weeks / months, so watch this space. If you're genuinely interested in those products, please contact the vendors. Many of them offer 30 day trials. Please post any genuine problems you find. Vendors are great at listening and fixing things up.

Keep the feedback coming. :-)

December 22, 2011 | Unregistered Commenter Paul Colmer

Paul, your article is a very good overview of IMDGs for non-techies. Thanks for posting it.
Some of the differentiators among the products you mentioned are partitioning model, indexing, remote execution and distributed transaction support.
You might want to know that GigaSpaces XAP comes also in a .Net version, not just Java. The premium edition provides further scalability for components like the web server, so it can basically replace the old generation app servers.
For financial trading, 1 second is considered an eternity, not to mention 60, but for large batch or analytics jobs (e.g. reconciliation or risk calculations) you may get this type of reduction in computation time.
Other interesting existing use cases from the real world are a social search engine, telco OSS software and online gaming.
Combining IMDG and big-data can also add some interesting solutions (imagine both of them deployed elastically on the cloud...).

December 26, 2011 | Unregistered Commenter Elaad Teuerstein

I have read the specification. It looks like you have reimplemented Erlang + Mnesia....

February 19, 2012 | Unregistered Commenter apr
