SimpleDB

Todd Hoff's picture

Hitting 300 SimbleDB Requests Per Second on a Small EC2 Instance

High Performance Multithreaded Access to Amazon SimpleDB is a great follow up to the idea in How SimpleDB Differs from a RDBMS that more programming is the price paid for performance in SimpleDB. It shows how much work and infrastructure is required to batter better performance out of SimpleDB.

Remember, in SimpleDB you get keys to records from queries so if you want to get all the fields for records you need to make separate requests. Since SimpleDB isn't exactly a speed daemon the obvious strategy is to parallelize. Even if a job takes a 100 msecs you can get a lot done in a little time if you can execute enough jobs in parallel.

Parallelization is the approach taken by Haakon@AWS in his Java code example of how to get the most out of SimpleDB. You can find the code at Indexing and Querying Amazon S3 Metadata with Amazon SimpleDB. We'll also consider how a back-end service architecture built on Erlang may be a better fit with cloud computing.

Todd Hoff's picture

The Search for the Source of Data - How SimpleDB Differs from a RDBMS

Update 2: Yurii responds with the Top 10 Reasons to Avoid Document Databases FUD.
Update: Top 10 Reasons to Avoid the SimpleDB Hype by Ryan Park provides a well written counter take. Am I really that fawning? If so, doesn't that make me a dear?

All your life you've used a relational database. At the tender age of five you banged out your first SQL query to track your allowance. Your RDBMS allegiance was just assumed, like your politics or religion would have been assumed 100 years ago. They now say--you know them--that relations won't scale and we have to do things differently. New databases like SimpleDB and BigTable are what's different. As a long time RDBMS user what can you expect of SimpleDB? That's what Alex Tolley of MyMeemz.com set out to discover. Like many brave explorers before him, Alex gave a report of his adventures to the Royal Society of the AWS Meetup. Alex told a wild almost unbelievable tale of cultures and practices so different from our own you almost could not believe him. But Alex brought back proof.

Using a relational database is a no-brainer when you have a big organization behind you. Someone else worries about the scaling, the indexing, backups, and so on. When you are out on your own there's no one to hear you scream when your site goes down. In these circumstances you just want a database that works and that you never have to worry about again. That's what attracted Alex to SimpleDB. It's trivial to setup and use, no schema required, insert data on the fly with no upfront preparation, and it will scale with no work on your part. You become free from DIAS (Database Induced Anxiety Syndrome). You don't have to think about or babysit your database anymore. It will just work. And from a business perspective your database becomes a variable cost rather than a high fixed cost, which is excellent for the angel food funding. Those are very nice features in a database. But for those with a relational database background there are some major differences that take getting used to.

Todd Hoff's picture

Microsoft's New Database Cloud Ready to Rumble with Amazon

Update: Zdnet says Ozzie signals Microsoft’s surrender to the cloud. CD ROMs are to the internet as the internet is to the cloud and Microsoft aims to scratch and claw its way into this paradigm shift as well.

The gloves are off. The tag line for Microsoft's new SQL Server Data Service is Your Data, Any Place, Any Time. Thems fighten' words. Microsoft is itch'n for a fight! Who will be Amazon's second?

The service description:
SQL Server Data Services (SSDS) are highly scalable, on-demand data storage and query processing utility services. Built on robust SQL Server database and Windows Server technologies, these services provide high availability, security and support standards-based web interfaces for easy programming and quick provisioning.

Sounds like a fast uppercut aimed squarely at SimpleDB's jaw. As a developer what do you need to know?

Todd Hoff's picture

The Current Pros and Cons List for SimpleDB

Not surprisingly opinions on SimpleDB vary from it sucks, don't take my database, to it will change the world, who needs a database anyway? From a quick survey of the blogosphere, here's where SimpleDB stands at the moment:

SimpleDB Cons

  • No SLA. We don't know how reliable it will be, how fast it will be, or how consistent the performance will be.
  • Consistency constraints are relaxed. Reading data immediately after a write may not reflect the latest updates. To programmers used to transactions, this may be surprising, but many people think this is one of the tradeoffs that needs to be made to scale.
  • Database is a core competency. If you don't control your database you can't out compete your competition.
  • When your database is out of your control you can't guarantee it will work properly. You can't create the proper indexes and other optimizations.
  • No join or IN operator. You'll need to do multiple client side calls to simulate joins, which will be slow.
  • No stored procedures, referential integrity, and other relational goodies. This is not a professional product.
  • Attribute size limited to 1024 bytes. It's not designed for content serving.
  • Latency from outside Amazon will be high.
  • Setting up and maintaining a database is cheap and easy these days, so why bother? It costs too much compared when compared to running your own servers.
  • What happens when you need to super scale to very large datasets?
  • No API support from common languages like PHP, Ruby, etc.
  • All your existing code and infrastructure needs to be rewritten.
  • Not geographically distributed with nearest datacenter routing.
  • Queries are lexigraphical. So you’ll need to store data in lexicographical order. This means says inside looking out: zero-padding your integers, adding positive offsets to negative integer sets, and converting dates into something like ISO 8601.
  • Attribute values are typeless which could lead to a lot of typing related errors and inefficient queries.
  • The 10 GB maximum per domain is too limiting.
  • It's not Dynamo. Amazon is keeping the really good stuff to themselves.
  • Text searching is not supported. You'll need to construct your own fast search indexes.
  • Queries are limited to 5 seconds running time. It's only for getting and setting, nothing more SQLish.
  • No cloned APIs for unit testing. Need to be able to develop locally against other data stores.
  • Your data is under Amazon's control, so there could be security and privacy problems.
  • The XML based protocol unnecessarily increases overhead, latency, and cost.
  • Lockin. If you decide to leave Amazon’s cloud how do you move all your data and get a similar system up and working outside the cloud?
  • Open cash register. Since SDB is charge on use, a malicious user can simply setup a loop to query your site, which costs you an unbounded amount of money.

    SimpleDB Pros

  • SimpleDB is not a relational database. Relational databases are too complex and don't scale well. Keeping data access simple is a selling point, not a weakness.
  • Low setup costs and pay-as-you-go expansion make it perfect for startups. The price is reasonable given the functionality and the hands off admin.
  • Setting up and maintaining a highly available clustered database that is constantly growing is extremely difficult. Building your application on a building block that does all this for you adds a lot of value.
  • Setting up a database inside EC2 is a pain. The makes getting basic database functionality trivial. No need to worry about scaling, capacity planning, or partitioning.
  • It has a decent query language, which is unusual for this type of data store.
  • Data are stored across multiple nodes which supports parallel query execution.
  • It's built on Erlang and that's cool.
  • You don't need to seek funding to hire a database team and buy hardware.

    Depending on how you weight each factor, SimpleDB could be way behind or way ahead of other options. What's interesting is to see what people think is important. For many people the only real database is relational and if it doesn't have transactions, joins, etc it's not real. Databases like beauty seem to be in the eye of the beholder.

  • Amazon SimpleDB - Scalable Cloud Database

    Amazon has announced the limited beta of Amazon SimpleDB - a simple web services interface to create and store multiple data sets, query your data easily, and return the results. Together with the Simple Storage Service (S3), Elastic Compute Cloud (EC2) and other web services Amazon offers a complete utility computing platform. SimpleDB was the missing piece of AWS - the scalable structured database.

    Check out my blog entry: http://innowave.blogspot.com/2007/12/amazon-simpledb-scalable-cloud-data...

    I was waiting for this one :-)

    Geekr

    Syndicate content