Wednesday, June 15, 2011

101 Questions to Ask When Considering a NoSQL Database

You need answers, I know, but all I have here are some questions to consider when thinking about which database to use. These are taken from my webinar "What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications." It's a companion article to "What The Heck Are You Actually Using NoSQL For?"

Actually, I don't even know if there are 101 questions, but there are a lot, probably way too many. You might want to use these questions as a kind of NoSQL I Ching, guiding your way through the immense space of options in front of you. Nothing is fated, all is interpreted, but it might just trigger a new insight or two along the way.

Where are you starting from?

  • A can-do-anything greenfield application?
  • In the middle of a project and worried about hitting bottlenecks? 
  • Worried about hitting the scaling wall once you deploy?
  • Adding a separate loosely coupled service to an existing system?
  • What are your resources? Expertise? Budget?
  • What are your pain points? What's so important that if it fails you will fail? What forces are pushing you?
  • What are your priorities? Prioritize them. What is really important to you? What must get done?
  • What are your risks? Prioritize them. Is the risk of being unavailable more important than the risk of being inconsistent?

What are you trying to accomplish?

  • What are you trying to accomplish? 
  • What's the delivery schedule? 
  • Do the research to be specific, like Facebook did with their messaging system:
Facebook chose HBase because they monitored their usage and figured out what was needed: a system that could handle two types of data patterns.

Things to Consider...Your Problem

  • Do you need to build a custom system?
  • Access patterns: 1) a short set of temporal data that tends to be volatile; 2) an ever-growing set of data that rarely gets accessed; 3) high write loads; 4) high throughput; 5) sequential; 6) random.
  • Requires scalability?
  • Is availability more important than consistency, or is it latency, transactions, durability, performance, or ease of use?
  • Cloud or colo? Hosted services? Resources like disk space?
  • Can you find people who know the stack?
  • Tired of the data transformation (ORM) treadmill? 
  • Store data that can be accessed quickly and is used often?
  • Would you like a high-level interface, like a PaaS?

Things to Consider...Money

  • Cost? With money you have different options than you do without it. You can probably make the technologies you know best scale.
  • Inexpensive scaling?
  • Lower operations cost? 
  • No sysadmins?
  • Type of license?
  • Support costs?

Things to Consider...Programming

  • Flexible datatypes and schemas?
  • Support for which language bindings?
  • Web support: JSON, REST, HTTP, JSON-RPC
  • Built-in stored procedure support? JavaScript?
  • Platform support: mobile, workstation, cloud
  • Transaction support: key-value, distributed, ACID, BASE, eventual consistency, multi-object ACID transactions.
  • Datatype support: graph, key-value, row, column, JSON, document, references, relationships, advanced data structures, large BLOBs.
  • Prefer the simplicity of a transaction model where you can just update and be done with it? In-memory makes it fast enough, and big systems can fit on just a few nodes.

Things to Consider...Performance

  • Performance metrics: IOPS, reads, writes, streaming?
  • Support for your access pattern: random read/write; sequential read/write; large or small or whatever chunk size you use. 
  • Are you storing frequently updated bits of data? 
  • High concurrency vs. high performance?
  • Problems that limit the type of workload you care about?
  • Peak QPS on highly-concurrent workloads?
  • Test your specific scenarios?
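
As one way to act on the last question, here is a minimal sketch of timing your own access patterns. It is an illustration, not a benchmark methodology: the `store` dict is a hypothetical stand-in for a real database client, and `measure_latency` and `percentile` are helper names invented for this example. Swap in your own read or write call and your own key distribution.

```python
import random
import time


def measure_latency(operation, keys, repeats=3):
    """Return per-call latencies in seconds for operation(key) over keys."""
    latencies = []
    for _ in range(repeats):
        for key in keys:
            start = time.perf_counter()
            operation(key)
            latencies.append(time.perf_counter() - start)
    return latencies


def percentile(sorted_values, fraction):
    """Simple index-based percentile of an already sorted, non-empty list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


# Hypothetical stand-in store: replace store.get with your real client's call.
store = {i: f"value-{i}" for i in range(10_000)}

sequential_keys = list(range(10_000))
random_keys = random.sample(sequential_keys, len(sequential_keys))

for name, keys in (("sequential", sequential_keys), ("random", random_keys)):
    samples = sorted(measure_latency(store.get, keys))
    print(name, "p50:", percentile(samples, 0.50), "p99:", percentile(samples, 0.99))
```

An in-memory dict will show little difference between the two patterns; the point is the shape of the test, which only becomes informative when run against the system and chunk sizes you actually plan to use.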

Things to Consider...Features

  • Spooky scalability at a distance: support across multiple data-centers?
  • Ease of installation, configuration, operations, development, deployment, support, management, upgrades, etc.
  • Data integrity: enforced in DDL, stored procedures, or the app?
  • Persistence design: Memtable/SSTable; Append-only B-tree; B-tree; On-disk linked lists; In-memory replicated; In-memory snapshots; In-memory only; Hash; Pluggable.
  • Schema support: none, rigid, optional, mixed
  • Storage model: embedded, client/server, distributed, in-memory
  • Support for search, secondary indexes, range queries, ad-hoc queries, MapReduce?
  • Hitless upgrades?

Things to Consider...More Features

  • Tunability of consistency models?
  • Tools availability and product maturity?
  • Expand rapidly? Develop rapidly? Change rapidly?
  • Durability? On power failure?
  • Bulk import? Export? 
  • Materialized views for rollups of attributes?
  • Built-in web server support?
  • Authentication, authorization, validation?
  • Continuous write-behind for system sync?
  • What is the story for availability, data-loss prevention, backup and restore?
  • Automatic load balancing, partitioning, and repartitioning?
  • Live addition and removal of machines?
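
To make the question about tunable consistency concrete, here is a small sketch of the quorum arithmetic used by many replicated stores. It assumes a system with N replicas per item where a read waits for R replicas and a write waits for W; that model is an assumption of this example, not something every NoSQL product offers, and `quorums_overlap` is a hypothetical helper name. When R + W > N, every read quorum intersects every write quorum, so a read contacts at least one replica that saw the latest acknowledged write.

```python
def quorums_overlap(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# Assumed example settings for a 3-replica cluster.
for r, w in ((1, 1), (2, 2), (1, 3)):
    print(f"N=3 R={r} W={w} overlap={quorums_overlap(3, r, w)}")
```

Lower R and W favor latency and availability, while higher values favor consistency, which is the trade-off the availability versus consistency questions above ask you to rank.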

Things to Consider...The Vendor

  • Viability of the company? 
  • Future direction?
  • Community and support list quality?
  • Support responsiveness?
  • How do they handle disasters?
  • Quality and quantity of partnerships developed?
  • Customer support: enterprise-level SLA, paid support, none

Reader Comments (7)

A very refreshing post on the SQL vs NOSQL debate. I wholeheartedly agree that people need to focus more on educating themselves on how these technologies should be employed, their strengths and weaknesses, where they fit, and what they bring when they fit, instead of defending their camps, starting flame wars, or hyping stuff for the fun of it. In the long run that just spreads dumbness.
The slides are a fantastic primer for anyone who is feeling the pressure to go NOSQL. There are tons of good reasons, but it takes a while to figure out which of the many choices best fits your business's use cases and needs. So a big UP for this post as it is wise, correct, and anti-hype :)

June 15, 2011 | Unregistered Commenter Russell Sullivan

Lots of questions, but no answers or recommendations based on those questions.
Not very helpful.

June 16, 2011 | Unregistered Commenter Morgan

Isn't knowing which questions to ask the true path to wisdom?

June 18, 2011 | Registered Commenter Todd Hoff

Indeed no answers to all those questions ... Totally misleading. Sry man.

June 20, 2011 | Unregistered Commenter Panos

Search for the answers within Panos. Only there will you find true wisdom.

June 20, 2011 | Registered Commenter Todd Hoff

All questions no answers... could have been good.

July 11, 2011 | Unregistered Commenter Bryan

If someone gave answers to each of the questions, it would be the best post on the internet about choosing between NOSQL and relational database systems. Anyway, good effort to list all these questions.

April 28, 2014 | Unregistered Commenter Prabhu
