Product: Amazon's SimpleDB

Update 30: Amazon SimpleDB - A distributed, highly-scalable, light-weight, query-able, attribute store by Sebastian Stadil. It introduces the CAP theorem and the basics of SimpleDB. Sebastian does a lot of great work in the AWS world and in what must be his limited free time, runs the AWS Meetup group.

Update 29: A stroll down the history of a previous RDBMS killer, object databases. Lots of fond memories of the new kid on the block showing us how objects and code were one, the endless OO vs. relational wars, writing an OODBMS training course, dealing with object migration and querying, etc., and the slow decline followed by groveling in front of the old master. It would be a terrible irony if a hash table succeeded where OODBMSs failed.
Update 28: I didn't make the beta program :-(
Update 27: IBM has hired CouchDB creator Damien Katz as their player in the game. Teams Microsoft, IBM, and Amazon have all entered the race. Amazon is 10 furlongs ahead, but watch for team Google, a fast finisher on the outside.
Update 26: Red Monk says Microsoft's Astoria project is SDBish, but developers are afraid of lock-in.
Update 25: Nati Shalom thinks SDB isn't even a database.
Update 24: Igvita asks why do you need SDB when Thrudb is faster and cheaper? It provides a memcached layer in front of a database storing data in S3. And even better, all its service names start with "thru" instead of "S".
Update 23: For all you Perl haters, the Perl interface to SDB is clean and beautiful.
Update 22: On an Erlang email list Jim Larson says the proper model is to store bulk data in S3 and indexable metadata in SimpleDB. Storing data in SimpleDB costs 10x what it costs in S3. We are supposed to build our own inverted index for text searching, which is one of those decisions that sounds good in the meeting room (yay, we don't have to do all that work), but is not a good decision in the real world.
Update 21: Sensepost is already creating attack models to drain your bank account through repeated queries.
Update 20: Grow some stones, smoothspan says Eventual Consistency Is Not That Scary.
Update 19: Jacob Harris in A First Look at Amazon SimpleDB offers up some beta Ruby libraries for accessing SDB.
Update 18: Erlang folks hope to get some run, but Erlang the language is too different to go mainstream, though Erlang's concurrency model rocks. A while back I talked about how The Solution to C++ Threading is Erlang and how Java's concurrency approach is fundamentally broken.
Update 17: Subbu tirelessly provides A RESTful version of Amazon's SimpleDB.
Update 16: Snarfed sees it as a sort of tuplespace implementation. Compare it to Facebook's API. Ning also has a data API.
Update 15: Uncom thinks Winer & Scoble Fail In Tandem. SDB's XML response has 1,755% transmission overhead, which is genius for a per byte pricing model. And I love this one: if you are starting a business whose success hinges on scalability of a data store, you had best figure out how to shard across N machines before you launch. Using a single instance of MySQL for the whole thing is a strong indicator that you have failed at life.
Update 14: Styled Bits sees SDB as more of a way to add metadata to S3 objects.
Update 13: Bex Huff makes the point you'll still need a caching layer in front of SDB.
Update 12: Shahzad Bhatti has been coding for SimpleDB for a few months and gives us a cool Java and Erlang API for basic CRUD operations.
Update 11: DBA4Life says Amazon has just flux capacited us back to 1980s style database management.
Update 10: Bob Warfield of SmoothSpan explains Why the Amazon SimpleDB is a Huge Next Step. It helps achieve the necessary "16:1 operations cost advantages over conventional software."
Update 9: SimpleDB is BerkeleyDB and 90% of all computing will live in cloud city. Will the Troglytes revolt?
Update 8: Dave Winer says Amazon removes the database scaling wall by adding a storage ramp that scales up when needed and scales down when unneeded. You no longer need to buy expensive VC funded database talent to take your product to the next level.
Update 7: Kevin Burton in Google vs Amazon in Open Infrastructure has doubts about the entire hosted model. Bandwidth costs too much, it might hurt your acquisition chances, and you can't trust 'em. He just wants to lease managed raw machine power.
Update 6: Amazon SimpleDB and CouchDB compared. Some key differences: SimpleDB is hosted. CouchDB is REST/JSON and SimpleDB is REST/SOAP/XML. In SimpleDB attribute updates are atomic; in CouchDB record updates are atomic. CouchDB supports JSON data types and SimpleDB thinks everything is a string. CouchDB has much more flexible indexing and queries.
Update 5: Sriram Krishnan gives a more technical overview of SimpleDB. He likes the big hash table approach and brings up how the query language allows for parallelization.
Update 4: Mark from areyouwatchingthis.com makes a really insightful point: I run a startup that gets 75% of our traffic from our API. The ability to move that processing and storage into a cloud _might_ save me a lot on hosting.
Update 3: Marcelo Calbucci thinks SimpleDB is more of a directory service than a database because records can contain different attributes (no schema) and attributes can have multiple values.
Update 2: SmugMug's Don MacAskill likes the service, but is concerned about the 1024-character limit on field sizes and about latency from faraway datacenters. He thinks most queries will be easy to convert as they are predominantly hash-like lookups anyway.
Update: Scoble asks if SimpleDB kills MySQL, Oracle, et al. The answer is no. Google has a similar service internally and they are still major users of and contributors to MySQL. Sometimes you just need structured data. So RDBMSs aren't dead. They just may not be the starting point as the barrier to entry for doing the simplest thing to start a website has plummeted. No more setup or admin. Just code and go.

The cherry missing from Amazon's AWS hot fudge sundae was a database service. They had a CPU scoop with EC2, they had a storage scoop with S3, they had a work distribution scoop with their queue, but the database cherry was missing. Now they've added it and it's dessert time.

News of SimpleDB is everywhere. Apparently it's been in development for a while. You can read about it inside looking out, GIGAOM, Innowave, SimpleDB Developer's Guide, and the SimpleDB Home Page.

It seems to be a simple properties-like store implemented in Erlang (as is CouchDB). It has simple query capabilities on attributes. It's fast and scalable. And at $0.14 per hour it's quite competitive with other options.
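To make "a properties-like store with simple queries on attributes" concrete, here is a toy in-memory sketch. It is not the SimpleDB API; the class `ToyAttributeStore`, its `put` and `query` methods, and the `pad` helper are all hypothetical names made up for illustration. It only mirrors the traits mentioned in the updates above: items need no schema, an attribute can hold multiple values, and everything is a string, so numbers are zero-padded here to make string comparison behave numerically.

```python
from collections import defaultdict


class ToyAttributeStore:
    """In-memory stand-in for a schemaless attribute store (hypothetical, not the SimpleDB API)."""

    def __init__(self):
        # item name -> attribute name -> set of string values
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item, attribute, value):
        # Every value is stored as a string; an attribute may hold several values.
        self.items[item][attribute].add(str(value))

    def query(self, attribute, predicate):
        # Return names of items with at least one value satisfying the predicate.
        return sorted(
            name
            for name, attrs in self.items.items()
            if any(predicate(v) for v in attrs.get(attribute, ()))
        )


def pad(number, width=6):
    # Zero-pad so that string comparison agrees with numeric comparison.
    return str(number).zfill(width)


store = ToyAttributeStore()
store.put("book1", "tag", "databases")
store.put("book1", "tag", "cloud")        # second value for the same attribute
store.put("book1", "price", pad(9))
store.put("book2", "price", pad(25))
store.put("book2", "author", "someone")   # items need not share attributes

cheap = store.query("price", lambda v: v < pad(10))
tagged = store.query("tag", lambda v: v == "cloud")
```

The padding matters because the plain strings "9" and "10" compare the wrong way round, while "000009" and "000010" compare as the numbers would.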

What it doesn't have is text search or complex RDBMS-style queries for structured data. It's not clear if the data are geographically distributed, in case you are interested in fast response times from different parts of the world. I would be very curious about the relationship between SimpleDB and Dynamo.
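Since text search is missing, the suggestion in Update 22 is to build your own inverted index. A minimal sketch of that idea, assuming the index lives in a plain Python dictionary rather than in SimpleDB itself, might look like the following. The function names `tokenize`, `build_index`, and `search` and the sample documents are illustrative assumptions, not anything from Amazon's service.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    # Return ids of documents containing every word in the query.
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Made-up sample documents for illustration only.
documents = {
    "post1": "SimpleDB is a hosted attribute store",
    "post2": "CouchDB is a document store",
    "post3": "S3 stores bulk data",
}
index = build_index(documents)
hosted_store = search(index, "hosted store")
any_store = search(index, "store")
```

Even this toy version hints at why the work is not trivial: there is no stemming ("stores" does not match "store"), no ranking, and the index has to be kept in sync with every write.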

Even with these limitations it's a disruptive service. Most high speed websites use a property store for unstructured data and that's been hard for smaller groups to implement at scale. But if you're losing your mind trying to figure out how to store your data at scale, maybe you can now turn your attention to more productive problems.

Comments

Re: Product: Amazon's SimpleDB

I think it's a great idea and I can't wait to start using it.

I've seen a few blogs mention that it doesn't have this or it doesn't have that (like a traditional RDBMS).

People also complained that S3 didn't have this and it didn't have that. Ugh.

S3, SQS, SimpleDB.....SIMPLE!!!

Some people WANT simple as long as it works for them.

One thing I am really curious about is how fast is fast? Amazon claims it will be super fast...or something like that.

http://codershangout.com
A place for coders to hangout!

Re: Product: Amazon's SimpleDB

Todd, first of all I'd like to thank you for the coverage of all these interesting topics on this site - I found it very useful, keep up the good work.

"Update 25: Nati Shalom thinks SDB isn't even a database."

To put things in the right context, I think that SimpleDB is a very interesting product - my comment is that SimpleDB is not yet another database and shouldn't be measured as such; any attempt to do so just misses the point of what it tries to bring to the table.

See a snippet from my post Amazon SimpleDB is not a database! below:

SimpleDB seems to address a need that I have seen referred to as Document-Driven Databases, in which records aren’t grouped by their structure but by their attributes. ORM tools, such as Hibernate or the Active Record pattern, attempt to address this requirement by hiding the underlying relational model. They, however, still inherit the complexity and limitation of the underlying relational model.

Having said that, SimpleDB is clearly not a solution for every scenario. In fact, it solves only a limited set of scenarios, such as the one described above. As a disruptive technology, I expect that it will take some time before there is enough experience and patterns to use it correctly in the architecture.

The introduction of SimpleDB occurred mainly due to the limitations of existing database implementations, and how well they fit (or rather, don't fit) with the cloud computing model. There are other approaches that can be used to address these limitations, some of which I covered in my recent posts PaaS – Persistence as a Service (using Hibernate) (which discusses how you can address such requirements while keeping the data in the existing database) and The Missing Piece in Cloud Computing: Middleware Virtualiztion (which provides a broader context on the need for virtualization of the entire middleware stack, not just the data store, to make better use of cloud computing).

There are other solutions, some of which are in the making as we speak, such as the integration of Lucene/Compass and GigaSpaces. I'm sure that there are other solutions that aim to solve this challenge that I'm not aware of, so my recommendation is simple: before going down the SimpleDB path, take a good look at your application requirements, and make sure that it is the right solution for your problems.

Nati S.
GigaSpaces
Write Once Scale Anywhere
