Monday, June 20, 2011

35+ Use Cases for Choosing Your Next NoSQL Database

We've asked "What The Heck Are You Actually Using NoSQL For?" We've asked "101 Questions To Ask When Considering A NoSQL Database." We've even had a webinar, "What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications."

Now we get to the point of considering use cases and which systems might be appropriate for those use cases.

What are your options?

First, let's cover the various data models. These have been adapted from Emil Eifrem and NoSQL databases.

Document Databases

  • Lineage: Inspired by Lotus Notes.
  • Data model: Collections of documents, which contain key-value collections.
  • Example: CouchDB, MongoDB 
  • Good at: Natural data modeling. Programmer friendly. Rapid development. Web friendly, CRUD.
Graph Databases
  • Lineage: Euler and graph theory.
  • Data model: Nodes & relationships, both of which can hold key-value pairs.
  • Example: AllegroGraph, InfoGrid, Neo4j
  • Good at: Rocking complicated graph problems. Fast.
Relational Databases
  • Lineage: E. F. Codd in A Relational Model of Data for Large Shared Data Banks
  • Data Model: a set of relations
  • Example: VoltDB,  Clustrix, MySQL
  • Good at: High performing, scalable OLTP. SQL access. Materialized views. Transactions matter. Programmer friendly transactions.

Object Oriented Databases

  • Lineage: Graph Database Research
  • Data Model: Objects
  • Example: Objectivity, Gemstone
  • Good at: complex object models, fast key-value access, key-function access, and graph database functionality.

Key-Value Stores

  • Lineage: Amazon's Dynamo paper and Distributed HashTables.
  • Data model: A global collection of KV pairs.
  • Example: Membase, Riak
  • Good at: Handles size well. Processing a constant stream of small reads and writes. Fast. Programmer friendly.
BigTable Clones 
  • Lineage: Google's BigTable paper.
  • Data model: Column family, i.e. a tabular model where each row at least in theory can have an individual configuration of columns.
  • Example: HBase, Hypertable, Cassandra
  • Good at: Handles size well. Stream massive write loads. High availability. Multiple-data centers. MapReduce.
Data Structure Servers
  • Lineage: ?
  • Example: Redis
  • Data model: Operations over dictionaries, lists, sets and string values.
  • Good at: Quirky stuff you never thought of using a database for before.
Grid Databases
  • Lineage: Data Grid and Tuple Space research.
  • Data Model: Space Based Architecture
  • Example: GigaSpaces, Coherence
  • Good at: High performance and scalable transaction processing.
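
To make the data models a little more concrete, here is a rough sketch of how the same kind of user data might be shaped under four of them. It uses plain Python dicts and lists as stand-ins; the record, the key names like `user:42`, and the `followers_of` helper are all made up for illustration, and no real product stores data exactly this way.

```python
import json

# Hypothetical user data; every name and value here is illustrative only.

# Key-value store: one global collection, an opaque value looked up by key.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document database: collections of documents that hold key-value fields.
doc_store = {
    "users": [{"_id": 42, "name": "Ada", "city": "London", "tags": ["math"]}]
}

# Column family (BigTable clone): each row may have its own set of columns.
column_family = {
    "row:42": {"profile:name": "Ada", "profile:city": "London"},
    "row:43": {"profile:name": "Bob"},  # this row has no city column
}

# Graph database: nodes and relationships, both holding key-value pairs.
nodes = {42: {"name": "Ada"}, 43: {"name": "Bob"}}
relationships = [(42, "FOLLOWS", 43, {"since": 2011})]


def followers_of(node_id):
    """Return ids of nodes with a FOLLOWS relationship pointing at node_id."""
    return [
        src
        for src, kind, dst, _props in relationships
        if kind == "FOLLOWS" and dst == node_id
    ]
```

Notice that the key-value version can only be fetched whole by key, while the other shapes expose structure the database can query against.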

What should your application use?

  • Key point is to rethink how your application could work differently in terms of the different data models and the different products. Right data model for the right problem. Right product for the right problem.
  • To see what models might help your application take a look at What The Heck Are You Actually Using NoSQL For? In this article I tried to pull together a lot of unconventional use cases of the different qualities and features developers have used in building systems. 
  • Match what you need to do with these use cases. From there you can backtrack to the products you may want to include in your architecture. NoSQL, SQL, it doesn't matter.
  • Look at Data Model + Product Features + Your Situation. Products have such different feature sets it's almost impossible to recommend by pure data model alone.
  • Which option is best is determined by your priorities.

 If your application needs...

  • complex transactions because you can't afford to lose data or if you would like a simple transaction programming model then look at a Relational or Grid database.
    • Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item!
  • to scale then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
  • to always be able to write to a database because you need high availability then look at Bigtable Clones which feature eventual consistency.
  • to handle lots of small continuous reads and writes, that may be volatile, then look at Document or Key-value databases, or databases offering fast in-memory access. Also consider SSD.
  • to implement social network operations then you may first want a Graph database or second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.
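
The inventory example above can be sketched with a small transactional purchase. This is a minimal illustration using Python's built-in `sqlite3` module as a stand-in for whatever relational database you pick; the `inventory` and `orders` tables and the `purchase` function are assumptions made up for this sketch.

```python
import sqlite3


def purchase(conn, item, qty):
    """Decrement stock and record the order atomically; True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? "
                "WHERE item = ? AND stock >= ?",
                (qty, item, qty),
            )
            if cur.rowcount != 1:
                # Nothing was updated, so there was not enough stock.
                raise ValueError("out of stock")
            conn.execute(
                "INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty)
            )
        return True
    except ValueError:
        return False


# Hypothetical schema with a single widget in stock.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER NOT NULL)"
)
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 1)")
conn.commit()
```

The stock check and the order insert either both happen or neither does, so a second buyer of the last widget is told up front instead of getting a compensating transaction later.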

If your application needs...

  • to operate over a wide variety of access patterns and data types then look at a Document database, they generally are flexible and perform well.
  • powerful offline reporting with large datasets then look at Hadoop first and second, products that support MapReduce. Supporting MapReduce isn't the same as being good at it.
  • to span multiple data-centers then look at Bigtable Clones and other products that offer a distributed option that can handle the long latencies and are partition tolerant.
  • to build CRUD apps then look at a Document database, they make it easy to access complex data without joins. 
  • built-in search then look at Riak.
  • to operate on data structures like lists, sets, queues, publish-subscribe then look at Redis. Useful for distributed locking, capped logs, and a lot more.
  • programmer friendliness in the form of programmer friendly data types like JSON, HTTP, REST, Javascript then first look at Document databases and then Key-value Databases.
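
As a rough idea of what a capped log looks like, here is a sketch using Python's `collections.deque` as a local stand-in for a server-side list that is trimmed to a fixed length. The `CappedLog` class and the event names are hypothetical; this is not the Redis API, just the shape of the idea.

```python
from collections import deque


class CappedLog:
    """Keep only the most recent max_entries items, like a trimmed list."""

    def __init__(self, max_entries):
        self._entries = deque(maxlen=max_entries)

    def append(self, entry):
        # When the deque is full, the oldest entry is dropped automatically.
        self._entries.append(entry)

    def recent(self):
        """Return the retained entries, oldest first."""
        return list(self._entries)


log = CappedLog(max_entries=3)
for n in range(5):
    log.append(f"event-{n}")
```

With a data structure server the same append-and-trim happens on the server, so many clients can share one log without coordinating.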

If your application needs...

  • transactions combined with materialized views for real-time data feeds then look at VoltDB. Great for data-rollups and time windowing.
  • enterprise level support and SLAs then look for a product that makes a point of catering to that market. Membase is an example.
  • to log continuous streams of data that may have no consistency guarantees necessary at all then look at Bigtable Clones because they generally work on distributed file systems that can handle a lot of writes.
  • to be as simple as possible to operate then look for a hosted or PaaS solution because they will do all the work for you.
  • to be sold to enterprise customers then consider a Relational Database because they are used to relational technology.
  • to dynamically build relationships between objects that have dynamic properties then consider a Graph Database because often they will not require a schema and models can be built incrementally through programming.
  • to support large media then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.
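
To show what data roll-ups and time windowing mean in practice, here is a small sketch of a view that is updated as each event arrives rather than recomputed by a query. It is plain Python, not VoltDB's API; the `WindowedRollup` class and its field names are assumptions for illustration.

```python
from collections import defaultdict


class WindowedRollup:
    """Maintain per-window count and sum incrementally as events arrive."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.totals = defaultdict(lambda: {"count": 0, "sum": 0.0})

    def record(self, timestamp, value):
        # Map the event to the start of the window that contains it.
        window_start = int(timestamp // self.window_seconds) * self.window_seconds
        bucket = self.totals[window_start]
        bucket["count"] += 1
        bucket["sum"] += value

    def view(self):
        """Return the current roll-up, ordered by window start time."""
        return {start: dict(b) for start, b in sorted(self.totals.items())}
```

A database with materialized views does this bookkeeping inside the same transaction as the write, which is what keeps the feed consistent with the underlying data.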

If your application needs...

  • to bulk upload lots of data quickly and efficiently then look for a product that supports that scenario. Most will not because they don't support bulk operations.
  • an easier upgrade path then use a fluid schema system like a Document Database or a Key-value Database because it supports optional fields, adding fields, and field deletions without the need to build an entire schema migration framework.
  • to implement integrity constraints then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.
  • a very deep join depth then use a Graph Database because they support blisteringly fast navigation between entities.
  • to move behavior close to the data so the data doesn't have to be moved over the network then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.
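
Two of the points above, fluid schemas and integrity constraints in application code, tend to go together. Here is a minimal sketch of what that looks like when documents are plain dicts; the `validate_order` and `display_name` functions and the field names are hypothetical.

```python
def validate_order(doc):
    """Return a list of constraint violations for a hypothetical order document."""
    errors = []
    if not doc.get("customer_id"):
        errors.append("customer_id is required")
    qty = doc.get("qty", 0)
    if not isinstance(qty, int) or qty <= 0:
        errors.append("qty must be a positive integer")
    return errors


def display_name(doc):
    """Tolerate old documents that predate the optional 'nickname' field."""
    return doc.get("nickname") or doc.get("name", "unknown")
```

The upside is that old and new documents can live side by side with no migration. The cost is that every reader and writer has to carry checks like these, since the database won't enforce them for you.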

If your application needs...

  • to cache or store BLOB data then look at a Key-value store. Caching can be for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.
  • a proven track record like not corrupting data and just generally working then pick an established product and when you hit scaling (or other issues) use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc).
  • fluid data types because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable Clone databases. Each has a lot of flexibility in their data types.
  • other business units to run quick relational queries so you don't have to reimplement everything then use a database that supports SQL.
  • to operate in the cloud and automatically take full advantage of cloud features then we may not be there yet.   
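
The caching case above, saving objects that were expensive to join, usually follows a cache-aside pattern. Here is a small sketch with a dict standing in for the key-value store; the `CacheAside` class and the `load_profile` loader are made up for illustration.

```python
class CacheAside:
    """Check the cache first; on a miss, load from the source and store it."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = {}  # stand-in for a key-value store such as memcached
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._loader(key)
        return self._cache[key]

    def invalidate(self, key):
        """Drop a cached entry, e.g. after the underlying rows change."""
        self._cache.pop(key, None)


def load_profile(user_id):
    # Hypothetical stand-in for an expensive multi-table join.
    return {"id": user_id, "orders": [101, 102]}


profiles = CacheAside(load_profile)
```

The hard part in a real system is knowing when to call `invalidate`, which is why this works best for data that changes rarely compared to how often it is read.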

If your application needs...

  • support for secondary indexes so you can look up data by different keys then look at relational databases and Cassandra's new secondary index support.
  • to store an ever-growing set of data (really BigData) that rarely gets accessed then look at a Bigtable Clone, which will spread the data over a distributed file system.
  • to integrate with other services then check if the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.
  • fault tolerance then check how durable writes are in the face of power failures, partitions, and other failure scenarios.
  • to push the technological envelope in a direction nobody seems to be going then build it yourself because that's what it takes to be great sometimes.
  • to work on a mobile platform then look at CouchDB / Mobile Couchbase.
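
If your store doesn't offer secondary indexes, you end up maintaining them yourself. The sketch below shows roughly what that involves on top of a plain key-value map; the `IndexedStore` class and the `city` field are hypothetical, and a real database would also have to keep the two structures consistent across failures.

```python
from collections import defaultdict


class IndexedStore:
    """Key-value store plus one hand-maintained secondary index."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.primary = {}                  # key -> record
        self.secondary = defaultdict(set)  # field value -> set of keys

    def put(self, key, record):
        # Remove the stale index entry if the record is being overwritten.
        old = self.primary.get(key)
        if old is not None and self.indexed_field in old:
            self.secondary[old[self.indexed_field]].discard(key)
        self.primary[key] = record
        if self.indexed_field in record:
            self.secondary[record[self.indexed_field]].add(key)

    def find_by_field(self, value):
        """Look up records by the indexed field instead of the primary key."""
        return [self.primary[k] for k in sorted(self.secondary.get(value, ()))]
```

Every write now touches two structures, which is the cost a database with built-in secondary indexes hides from you.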

Which is Better?

  • Moving for a 25% improvement is probably not a reason to go NoSQL.
  • Benchmark relevancy depends on the use case. Does it match your situation(s)?
  • Are you a startup that needs to release a product as soon as possible and you are playing around with ideas? Both SQL and NoSQL can make an argument.
  • Performance may be equal on one box, but what happens when you need N?
  • Everything has problems. If you look at the Amazon forums, it's "EBS is slow" or "my instances won't reply," etc. For GAE, it's "the datastore is slow" or X. Every product that people are using will have problems. Are you OK with the problems of the system you've selected?

Reader Comments (22)

Good post. I would further say that if you need fluid data types and integrate with other services then you should seriously consider an RDF store (graph model), such as Jena, Sesame, or OWLIM. RDF and Linked Data is the only standard format for combining data on the Web. I will be speaking about this on August 24th at NoSQL Now! in San Jose.

June 20, 2011 | Unregistered CommenterJames Leigh

For object orientated databases you might also want to take a look at JOOB, http://www.joobworld.com/

disclosure: I work for them :)

June 21, 2011 | Unregistered CommenterDr Danyo

GemFire is a Grid Database (i.e. in same category as Coherence); Wall Street and Federal government deploy the product heavily in transactional data grid use cases.

SQLFire is a [new] Relational DB that is horizontally scalable, memory-oriented SQL; SQLFire builds on GemFire foundation but adds full SQL support.

GemStone/S is an Object Database for the Smalltalk market.

All three originate from the GemStone team (now at VMware), but each are distinct products targeting different use cases.

June 21, 2011 | Unregistered CommenterShaun Connolly

What about DB4O? I never seem to see this db mentioned when discussing NoSQL

June 21, 2011 | Unregistered CommenterSean

You seem to be missing Raven?

June 21, 2011 | Unregistered CommenterEric

I wouldn't call JavaScript 'programmer friendly'....

June 21, 2011 | Unregistered Commenter-

Would you go for web programmer friendly? :-)

June 21, 2011 | Registered CommenterTodd Hoff

Great post, a very useful cheat sheet on NoSQL. Perhaps the next post could highlight problems/deficiencies of the various NoSQL products.

June 21, 2011 | Unregistered CommenterFJ

Very good post indeed. But JavaScript based DB access is not programmer friendly ,hope java api's are available and easy to use (or should be developed atleast)

June 22, 2011 | Unregistered Commenterravindra pitambare

If you would like to write that article FJ just let me know.

June 22, 2011 | Registered CommenterTodd Hoff

Where do storages like memcachedb and tarantool fit in?

June 23, 2011 | Registered CommenterKonstantin Osipov

what about voldemort?

June 23, 2011 | Unregistered CommenterEC

what the heck yin and yang has to do with the text? what's the point?

June 24, 2011 | Unregistered CommenterLuiz K.

If your application needs:
- to scale ACID transactions to hundreds or thousands of concurrent users
- possibly very large tables (billions of rows)
- the ability to alter tables, add indexes, expand capacity, and back up while in full production
- a database that gracefully handles hardware faults and is self-healing
- to speak MySQL or integrate with an existing MySQL environment

... then consider Clustrix.

June 24, 2011 | Unregistered CommenterBen

"Goog at" should be "Good at"

June 26, 2011 | Unregistered CommenterTyrael

Very useful article.

June 28, 2011 | Unregistered CommenterJong

What about embeddable databases? TokyoCabinet, leveldb, berkeley db, etc?

July 6, 2011 | Unregistered Commenteranonymous

Hi,
Very nice article,
Which nosql database would you propose for "write load" and "index on many fields" ?

November 24, 2011 | Unregistered CommenterMehdi eshaghi

Could anybody please tell me which NoSql to use, mongodb or couchdb for multiplayer mobile games.?

May 1, 2012 | Unregistered CommenterAmit

I would definitely add that one should consider 'Oracle NoSQL Database' if they are considering using Key-Value. For more details: http://www.oracle.com/technetwork/products/nosqldb/overview/index.html

October 10, 2012 | Unregistered CommenterAnuj Sahni

(disclaimer: I work for Garantia Data which provides a Redis Cloud service)

Thanks for this comprehensive overview of application needs and NoSQL solutions - of the broad plethora of needs shown here, I have a particular interest in these six:

Scale, Friendliness, Enterprise Level Support, Simple as Possible, Operate in Cloud, Fault Tolerance

Users with this set of needs can be characterized as "mission critical, hands off" - they have a mission critical application, they need the scalability, fault tolerance and enterprise-level support, but they need it to work transparently and don't want to / don't have the resources to deal with it day-to-day.

This is quite a common application profile these days, and I thought it would be useful, as you have done for each need separately, to list some products that meet the needs of the "mission critical, hands off" crowd:

* Amazon DynamoDB - high performance (running on SSD), transparent scalability, as simple as you get - if key-value functionality is sufficient.

* Mongolabs - MongoDB as a service, scale up as far as you want, document model supports more complex queries. Runs on disk with RAM caching (affects write latency). Multi-cloud support is a plus.

* Cloudant - another well-regarded document-based database, provided as a service, with a "regular" shared server plan and a "heavy" dedicated cluster option.

* ClearDB - in the spirit of your article, which considered relational DBs as a relevant option (and I agree), ClearDB is a MySQL database as a service running on Amazon or Windows Azure, with built-in high availability, uses master-master replication to scale.

* Lastly, I should also mention our home-grown cloud solution, Redis Cloud - it's like the open source Redis (in-memory, query functionality somewhere in between key-value and document store DBs) but offered on the cloud with automatic clustering and fault tolerance.

July 17, 2013 | Unregistered CommenterItamar Haber

My startup spent alot of time evaluating NoSQL before going with DynamoDB. After a few months in production the hassle and expense of throughput provisioning became too much and we ended up turning Amazon S3 into our NoSQL datastore achieving infinite scalability and throughput combined with zero maintenance and incredible cost reductions.

You can read the case study at: http://www.s3nosql.com

November 6, 2013 | Unregistered CommenterGary Kraft
