amazon

Damn, Which Database do I Use Now?

High Scalability

04 Nov 2009 — 2 min read

With so many database options available these days, like for the rest of life, it's natural to wonder how it all fits together. Amazon complicated, or rather expanded the available options by introducing RDS, their relational database service. RDS is MySQL safely cocooned as a manageable cloud element, resting boldly within an energy providing elastic CPU pool, supported by a virtually infinite supply of very capable virtualized storage .

MySQL in AWS is now easy to start, stop, monitor, backup, snapshot, expand, and effortlessly move up and down the instance hierarchy. What it's not, contrary to what you might expect, is a scale-out solution, it's a scale-up solution. You get more by buying a bigger instance, not by horizontally adding more instances. There's a limit. Admittedly a larger limit now with Amazon's new high memory instances.

That's OK, well maybe not for people who helped grow Amazon's ecosystem by offering a similar product, but so many projects use MySQL that this is a big win for a lot of people. It makes life easier even if the promise of infinite relational database storage has yet to be realized.

If one of the reasons you were considering using a Platform as a Service is to knock the database item off your worry list, RDS is one more reason to consider playing your own general contractor and orchestrating all the elements together yourself. As more services become packaged into cloud capable components this is likely how many systems will be bolted together in the future.

But we are left wondering, how RDS fits together with SimpleDB and all the other database options?

This is where James Hamilton comes in with his sense making One Size Does Not Fit All posting. As an organizing principle James came up with this taxonomy for databases based on what the customer wants to do, their intent:

features-first - This is the relation database bucket where the database provides a lot of high level features that make the programmer's job a lot easier. The downside is scaling is hard or very expensive. Examples: Oracle, SQL Server, DB2, MySQL, PostgreSQL, RDS.
scale-first - These are systems that must scale from the start. They generally lack features and put a lot more responsibility back on the programmer. Examples: Voldemort, Ringo, SimpleDB, Ringo, Kai, Dynomite, PNUTS, ThruDB, Hypertable, CouchDB, Cassandra, MemcacheDB.
simple structure storage - These are key-value stores where the emphasis is on storing and getting values of arbitrary structure. They generally don't have the features or scalability of other systems. Examples: file systems, Cassandra, BerkelyDB, SimpleDB.
purpose-optimized stores - These are databases systems built to be good at one thing, like data warehousing. Examples: StreamBase, Vertica, VoltDB, Aster Data, Netezza, Greenplum.

I'm not sure where a graph database like Neo4j would fit in the list, but James has created a useful way to look at things and of course he provides a lot more detail in his post. So all you have to do is figure out your intention, find out where you are belong in the list, and make a choice :-) I know, you want it all, you deserve it all, but that doesn't seem to be happening. Sites for a very long time will be forged from a mix of technologies and we are the master mixoligists.

Windows Azure
NoSQL is a Horsless Carriage by Steve Yen. A nice NoSQL taxonomy: key‐value‐cache, key‐value‐store, eventually‐consistent key‐value‐store, ordered‐key‐value‐store, data‐structures server, tuple‐store, object database, document store, wide columnar store.