Product

Product: Amazon's SimpleDB

Todd Hoff

Jan 20, 2009 — 6 min read

Update 35: How and Why Glue is Using Amazon SimpleDB instead of a Relational Database. Discusses a key design decision that required duplicating data in order to mimic RDBMS joins: Given the trade off between potential inconsistencies and scalability, social services have to choose the latter.
Update 34: Apparently Amazon pulled this article. I'm not sure what that means. Maybe time went backwards or something? Amazon dramatically drops SimpleDB pricing to $0.25 per GB per month from $1.50 per GB. This puts SimpleDB on par with Google App Engine. They also announced a few new features: a SQL-like SELECT API as well as a Batch Put operation to streamline uploading of multiple items or attributes. One of the complaints against SimpleDB is that programmers end up writing too much code to do simple things. These features and a much cheaper price should help considerably. And you can store lots of data now. GAE is still capped.
Update 33: Amazon announces Elastic Block Store (EBS), which provides lots of normal looking disk along with value added features like snapshots and snapshot copying. But database's may find EBS too slow. RightScale tells us Why Amazon’s Elastic Block Store Matters.
Update 32: You can now get all attributes for a property when querying. Previously only the ID was returned and the attributes had to be returned in separate calls. This makes the programmer's job a lot simpler. Artificial levels of parallelization code can now be dumped.
Update 31: Amazon fixes a major hole in SimpleDB by adding the ability to sort query results. Previously developers had to sort results by hand which was a non-starter for many. Now you can do basic top 10 type queries with ease.
Update 30: Amazon SimpleDB - A distributed, highly-scalable, light-weight, query-able, attribute store by Sebastian Stadil. It introduces the CAP theorem and the basics of SimpleDB. Sebastian does a lot of great work in the AWS world and in what must be his limited free time, runs the AWS Meetup group.

Update 29: A stroll down the history of a previous RDBMS killer, object databases. Lots of fond memories of the new kid on the block showing us how objects and code were one, the endless OO vs. relational wars, writing a OODBMS training course, dealing with object migration and querying etc, and the slow decline followed by groveling in front of the old master. It would be a terrible irony if a hash table succeeded where OODBMSs failed.
Update 28: I didn't make the beta program :-(
Update 27: IBM has hired CouchDB creator Damien Katz as their player in the game. Teams Microsoft, IBM, and Amazon have all entered the race. Amazon is 10 furlongs ahead, but watch for team Google, a fast finisher on the outside.
Update 26: Red Monk says Microsoft's Astoria project is SDBish, but developers are afraid of lock-in.
Update 25: Nati Shalom thinks SDB isn't even a database.
Update 24: Igvita asks why do you need SDB when Thrudb is faster and cheaper? It provides a memcached layer in front of a database storing data in S3. And even better, all its service names start with "thru" instead of "S".
Update 23: For all you Perl haters, the Perl interface to SDB is clean and beautiful.
Update 22: On an Erlang email list Jim Larson says the proper model is to store bulk data in S3 and indexable metadata in SimpleDB. The cost of SimpleDB is 10x for storing data versus S3. We are supposed to build our own inverted index for text searching, which is one of those decisions that sounds good in the meeting room (yay, we don't have to do all that work), but is not a good decision in the real world.
Update 21: Sensepost is already creating attack models to drain your bank account through repeated queries.
Update 20: Grow some stones, smoothspan says Eventual Consistency Is Not That Scary.
Update 19: Jacob Harris in A First Look at Amazon SimpleDB offers up some beta Ruby libraries for accessing SDB.
Update 18: Erlang folks hope to get some run, but Erlang the language is too different to go mainstream, though Erlang's concurrency model rocks. A while back I talked about how The Solution to C++ Threading is Erlang and how Java's concurrency approach is fundamentally broken.
Update 17: Subbu tirelessly provides a A RESTful version of Amazon's SimpleDB.
Update 16: Snarfed sees it as a sort of tuplespace implementation. Compare it to Facebook's API. Ning also has a data API.
Update 15: Uncom thinks Winer & Scoble Fail In Tandem. SDB's XML response has 1,755% transmission overhead, which is genius for a per byte pricing model. And I love this one: if you are starting a business whose success hinges on scalability of a data store, you had best figure out how to shard across N machines before you launch. Using a single instance of MySQL for the whole thing is a strong indicator that you have failed at life.
Update 14: Styled Bits sees SDB as more of a way to add metadata to S3 objects.
Update 13: Bex Huff makes the point you'll still need a caching layer in front of SDB.
Update 12: Shahzad Bhatti has been coding for SimpleDB for a few months and gives us a cool Java and Erlang API for basic CRUD operations.
Update 11: DBA4Life says Amazon has just flux capacited us back to 1980s style database management.
Update 10: Bob Warfield of SmoothSpan explains Why the Amazon SimpleDB is a Huge Next Step. It helps achieve the necessary "16:1 operations cost advantages over conventional software."
Update 9: SimpleDB is berkleyDB and 90% of all computing will live in cloud city. Will the Troglyte's revolt?
Update 8: Dave Winer says Amazon removes the database scaling wall by adding a storage ramp that scales up when needed and scales down when unneeded. You no longer need to buy expensive VC funded database talent to take your product to the next level.
Update 7: Kevin Burton in Google vs Amazon in Open Infrastructure has doubts about the entire hosted model. Bandwidth costs too much, it might hurt your acquisition chances, and you can't trust 'em. He just wants to lease managed raw machine power.
Update 6: Amazon SimpleDB and CouchDB compared. Some key differences: SimpleDB is hosted. CouchDB is REST/JSON and SimpleDB is REST/SOAP/XML. In SimpleDB attribute updates are atomic in CouchDB record updates are atomic. CouchDB supports JSON data types and SimpleDB thinks everything is a string. CouchDB has much more flexible indexing and queries.
Update 5: Sriram Krishnan gives a more technical overview of SimpleDB. He likes the big hash table approach and brings up how the query language allows for parallelization.
Update 4: Mark from areyouwatchingthis.com makes a really insightful point: I run a startup that gets 75% of our traffic from our API. The ability to move that processing and storage into a cloud _might_ save me a lot on hosting.
Update 3: Marcelo Calbucci thinks SimpleDB is more of a directory service than a database because records can contain different attributes (no schema) and attributes can have multiple values.
Update 2: Smug Mugs' Don MacAskill likes the service, but is concerned that field sizes are limited to 1024 characters and latency from far away datacenters. He thinks most queries will be easy to convert as they are predominantly hash like lookups anyway.
Update: Scoble asks if SimpleDB kills MySQL, Oracle, et al. The answer is no. Google has a similar service internally and they are still major users of and contributors to MySQL. Sometimes you just need structured data. So RDBMSs aren't dead. They just may not be the starting point as the barrier to entry for doing the simplest thing to start a website has plummeted. No more setup or admin. Just code and go.

The cherry missing from Amazon's AWS hot fudge sundae was a database service. They had a CPU scoop with EC2, they had storage scoop with S3, they had a work distribution scoop with their queue, but the database cherry was missing. Now they've added it and it's dessert time.

News of SimpleDB is everywhere. Apparently it's been in development for a while. You can read about it inside looking out, GIGAOM, Innowave, SimpleDB Developer's Guide, and the SimpleDB Home Page.

It seems to be a simple properties like store implemented on Erlang (as is CouchDB). It has simple query capabilities on attributes. It's fast and scalable. And At $0.14 per hour it's quite competitive with other options.

What it doesn't have is a text search or complex RDBMS style queries for structured data. It's not clear if the data are geographically distributed, in case you are interested in fast response times from different parts of the world. I would be very curious on the relationship between SimpleDB and Dynamo.

Even with these limitations it's a disruptive service. Most high speed websites use a property store for unstructured data and that's been hard for smaller groups to implement at scale. But if you're losing your mind trying to figure out how to store your data at scale, maybe you can now turn your attention to more productive problems.

Product: Amazon's SimpleDB

Todd Hoff

Read more

Capturing A Billion Emo(j)i-ons

Brief History of Scaling Uber

Behind AWS S3’s Massive Scale

The Swedbank Outage shows that Change Controls don't work