Why isn't Google's aggressive new database pricing strategy getting more pub? That's what Bill Katz, instigator of the GAE Meetup and prize-winning science fiction author, is wondering:
It's surprising that the blogosphere hasn't picked up the biggest difference in pricing: Google's datastore is less than a tenth of the price of Amazon's SimpleDB while offering a better API.
If money matters to you then the burn rate under GAE could be convincingly lower. Let's compare the numbers:
Update: Aaron Worsham Interview with James Lindenbaum, CEO of Heroku. Aaron nicely sums up their goal: Heroku is looking to eliminate all the reasons companies have for not doing software projects.
Adam Wiggins of Heroku presented at the lollapalooza that was the Cloud Computing Demo Night. The idea behind Heroku is that you upload a Rails application into Heroku and it automatically deploys into EC2 and it automatically scales using behind the scenes magic. They call this "liquid scaling." You just dump your code and go. You don't have to think about SVN, databases, mongrels, load balancing, or hosting. You just concentrate on building your application. Heroku's unique feature is their web based development environment that lets you develop applications completely from their control panel. Or you can stick with your own development environment and use their API and Git to move code in and out of their system.
For website developers this is as high up the stack as it gets. With Heroku we lose that "build your first lightsaber" moment marking the transition out of apprenticeship and into mastery. Upload your code and go isn't exactly a hero's journey, but it is damn effective...
High Performance Multithreaded Access to Amazon SimpleDB is a great follow-up to the idea in How SimpleDB Differs from a RDBMS that more programming is the price paid for performance in SimpleDB. It shows how much work and infrastructure is required to batter better performance out of SimpleDB.
Remember, in SimpleDB queries return only the keys of the matching records, so if you want all the fields for those records you need to make a separate request for each one. Since SimpleDB isn't exactly a speed daemon the obvious strategy is to parallelize. Even if a job takes 100 msecs you can get a lot done in a little time if you can execute enough jobs in parallel.
Parallelization is the approach taken by Haakon@AWS in his Java code example of how to get the most out of SimpleDB. You can find the code at Indexing and Querying Amazon S3 Metadata with Amazon SimpleDB. We'll also consider how a back-end service architecture built on Erlang may be a better fit with cloud computing.
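To make the parallelization idea concrete, here is a minimal Python sketch (not Haakon's Java code). It assumes a hypothetical get_attributes function standing in for the per-record SimpleDB request, backed by an in-memory FAKE_STORE with an artificial 100 msec delay. All of those names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical stand-in for the data behind a SimpleDB domain.
FAKE_STORE = {f"item{i}": {"name": f"file{i}.jpg", "size": str(i * 10)}
              for i in range(20)}

def get_attributes(item_name):
    """Pretend per-record request; the sleep mimics ~100 msecs of latency."""
    time.sleep(0.1)
    return item_name, FAKE_STORE[item_name]

def fetch_all(item_names, workers=10):
    """Fetch the attributes for every key concurrently instead of serially."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(get_attributes, item_names))

if __name__ == "__main__":
    keys = list(FAKE_STORE)  # pretend these keys came back from a query
    start = time.time()
    records = fetch_all(keys)
    print(len(records), "records in", round(time.time() - start, 2), "seconds")
```

Fetched one at a time, the 20 records would spend about two seconds just sleeping. With ten workers the same requests overlap, which is the whole trick: the latency of each request doesn't change, only how many are in flight at once.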
Update 2: Yurii responds with the Top 10 Reasons to Avoid Document Databases FUD.
Update: Top 10 Reasons to Avoid the SimpleDB Hype by Ryan Park provides a well written counter take. Am I really that fawning? If so, doesn't that make me a dear?
All your life you've used a relational database. At the tender age of five you banged out your first SQL query to track your allowance. Your RDBMS allegiance was just assumed, like your politics or religion would have been assumed 100 years ago. They now say--you know them--that relations won't scale and we have to do things differently. New databases like SimpleDB and BigTable are what's different. As a long time RDBMS user what can you expect of SimpleDB? That's what Alex Tolley of MyMeemz.com set out to discover. Like many brave explorers before him, Alex gave a report of his adventures to the Royal Society of the AWS Meetup. Alex told a wild almost unbelievable tale of cultures and practices so different from our own you almost could not believe him. But Alex brought back proof.
Using a relational database is a no-brainer when you have a big organization behind you. Someone else worries about the scaling, the indexing, backups, and so on. When you are out on your own there's no one to hear you scream when your site goes down. In these circumstances you just want a database that works and that you never have to worry about again. That's what attracted Alex to SimpleDB. It's trivial to set up and use, no schema required, insert data on the fly with no upfront preparation, and it will scale with no work on your part. You become free from DIAS (Database Induced Anxiety Syndrome). You don't have to think about or babysit your database anymore. It will just work. And from a business perspective your database becomes a variable cost rather than a high fixed cost, which is excellent for the angel food funding. Those are very nice features in a database. But for those with a relational database background there are some major differences that take getting used to.
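For a feel of what "no schema required" means in practice, here is a toy in-memory sketch in Python. It is not the SimpleDB API. The Domain class and its put and query methods are invented for illustration, assuming only that items are named bags of attributes, that an attribute can hold several values, and that values are stored as strings.

```python
from collections import defaultdict

class Domain:
    """Toy in-memory stand-in for a schemaless attribute store."""

    def __init__(self):
        # item name -> attribute name -> set of string values
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, **attributes):
        """Add attributes on the fly; no table definition comes first."""
        for name, value in attributes.items():
            values = value if isinstance(value, (list, tuple, set)) else [value]
            self.items[item_name][name].update(str(v) for v in values)

    def query(self, name, value):
        """Return only the names of matching items, not their fields."""
        return sorted(key for key, attrs in self.items.items()
                      if str(value) in attrs.get(name, ()))

users = Domain()
users.put("u1", name="alice", tag=["admin", "beta"])
users.put("u2", name="bob", plan="free")  # a different set of attributes is fine
print(users.query("tag", "beta"))
```

Two things should feel odd to a relational user: the two items don't share columns, and the query hands back item names rather than rows.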
It's been a few days now since GAE (Google App Engine) was released and we had our First Look. It's high time for a retrospective. Too soon? Hey, this is Internet time baby. So how is GAE doing? I did get an invite so hopefully I'll have a more experience grounded take a little later. I don't know Python and being the more methodical type it may take me a while. To perform our retrospective we'll take a look at the three sources of information available to us: actual applications in the AppGallery, blogspew, and developer issues in the forum.
The result: a cautious thumbs up. The biggest issue so far seems to be the change in mindset needed by developers to use GAE. BigTable is not MySQL. The runtime environment is not a VM. A service based approach is not the same as using libraries. A scalable architecture is not the same as one based on optimizing speed. A different approach is needed, but as of yet Google doesn't give you all the tools you need to fully embrace the red pill vision.
I think this quote by Brandon Smith in a thread on how to best implement sessions in GAE nicely sums up the new perspective:
Consider the lack of your daddy's sessions a feature. It's what will make your app scale on Google's infrastructure.
In other words: when in Rome. But how do we know what the Romans do when the Romans do what they do?
I haven't developed an AppEngine application yet, I'm just taking a look around their documentation and seeing what stands out for me. It's not the much speculated super cluster VM. AppEngine is solidly grounded in code and structure. It reminds me a little of the guy who ran a website out of S3 with a splash of Heroku thrown in as a chaser.
The idea is clearly to take advantage of our massive multi-core future by creating a shared nothing infrastructure based firmly on a core set of infinitely scalable database, storage and CPU services. Don't forget Google also has a few other services to leverage: email, login, blogs, video, search, ads, metrics, and apps. A shared nothing request is a simple beast. By its very nature shared nothing architectures must be composed of services which are themselves already scalable and Google is signing up to supply that scalable infrastructure. Google has been busy creating a platform of out-of-the-box scalable services to build on. Now they have their scripting engine to bind it all together.
Everything that could have tied you to a machine is tossed. No disk access, no threads, no sockets, no root, no system calls, no nothing but service based access. Services are king because they are easily made scalable by load balancing and other tricks of the trade that are easily turned behind the scenes, without any application awareness or involvement.
Using the CGI interface was not a mistake. CGI is the perfect metaphor for our brave new app container world: get a request, process the request, die, repeat. Using AppEngine you have no choice but to write an app that can be splayed across a pointy well sharpened CPU grid. CGI was devalued because a new process had to be started for every request. It was too slow, too resource intensive. Ironic that in the cloud that's exactly what you want because that's exactly how you cause yourself fewer problems and buy yourself more flexibility.
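The get a request, process the request, die, repeat cycle is easy to see in a plain CGI script. This is a generic Python sketch of the CGI pattern, not AppEngine's actual handler API. The handle function and the name parameter are made up for illustration.

```python
import os
import sys
from urllib.parse import parse_qs

def handle(environ):
    """Build the whole response from the request alone; nothing survives the call."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = f"Hello, {name}!\n"
    return "Content-Type: text/plain\r\n\r\n" + body

if __name__ == "__main__":
    # A CGI server starts this script once per request, passes the request in
    # environment variables, and the process exits after writing the response.
    sys.stdout.write(handle(os.environ))
```

Because the handler keeps nothing between requests, any machine can serve any request, and that is the property the grid needs.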
The model is pure abstraction. The implementation is pure pragmatism. Your application exists in the cloud and is in no way tied to any single machine or cluster of machines. CPUs run parallel through your application like a swarm of busy bees while wizards safely hidden in a pocket of space-time can bend reality as much as they desire without the muggles taking notice. Yet the abstraction is implemented in a very specific dynamic language that they already have experience with and have confidence they can make work. It's a pretty smart approach. No surprise I guess.
One might ask: is LAMP dead? Certainly not in the way Microsoft was hoping. AppEngine is so much easier to use than the AWS environment of EC2, S3, SQS, and SDB. Creating an app in AWS takes real expertise. That's why I made the comparison of AppEngine to Heroku. Heroku is a load and go approach for RoR whereas AppEngine uses Python. You basically make a Python app using services and it scales. Simple. So simple you can't do much beyond making a web app. Nobody is going to make a super scalable transcoding service out of AppEngine. You simply can't load the needed software because you don't have your own servers. This is where Amazon wins big. But AppEngine does hit a sweet spot in the market: website builders who might have previously gone with LAMP.
What isn't scalable about AppEngine is the scalability of the complexity of the applications you can build. It's a simple request response system. I didn't notice a cron service, for example. Since you can't write your own services a cron service would give you an opportunity to get a little CPU time of your own to do work. To extend this notion a bit what I would like to see is an event driven state machine service that could drive web services. If email needs to be sent every hour, for example, who will invoke your service every hour so you can get the CPU to send the email? If you have a long running seven step asynchronous event driven algorithm to follow, how will you get the CPU to implement the steps? This may be Google's intent. Or somewhere in the development cycle we may get more features of this sort. But for now it's a serious weakness.
Here's a quick tour of a few interesting points. Please note I'm copying large chunks of their documentation in this post as that seems the quickest way to the finish line...
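Here is a rough Python sketch of what that multi-step problem looks like when every bit of CPU has to arrive as a short request. It is an illustration, not an AppEngine feature. STORE stands in for a persistent datastore, and STEPS and tick are hypothetical names. Something outside the app still has to call tick over and over, which is exactly the missing piece.

```python
# Hypothetical persistent store; on GAE this role would fall to the datastore.
STORE = {}

STEPS = ["validate", "render", "send_email"]  # illustrative step names

def tick(job_id):
    """Advance one job by a single step per invocation, then return.

    Each call is an ordinary short request. An outside scheduler (the
    missing cron-like service) would have to keep calling it.
    """
    state = STORE.setdefault(job_id, {"next": 0, "log": []})
    if state["next"] >= len(STEPS):
        return "done"
    step = STEPS[state["next"]]
    state["log"].append(step)  # the real work for this step would go here
    state["next"] += 1
    return "done" if state["next"] == len(STEPS) else "pending"
```

All progress lives in the store, so any request on any machine can pick up the next step. What the sketch cannot do is wake itself up.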
Scalr is a fully redundant, self-curing and self-scaling hosting environment utilizing Amazon's EC2. It has been recently open sourced on Google Code.
Scalr allows you to create server farms through a web-based interface using prebuilt AMI's for load balancers (pound or nginx), app servers (apache, others), databases (mysql master-slave, others), and a generic AMI to build on top of.
Scalr promises automatic high-availability and scaling for developers by health and load monitoring.
The health of the farm is continuously monitored and maintained. When the Load Average on a type of node goes above a configurable threshold a new node is inserted into the farm to spread the load and the cluster is reconfigured. When a node crashes a new machine of that type is inserted into the farm to replace it.
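The two rules above are simple enough to sketch. This Python snippet is not Scalr's code. The reconcile function, the node dictionaries, and the action names are assumptions made for illustration, and it averages load across the live nodes of one type as one plausible reading of "Load Average on a type of node".

```python
def reconcile(nodes, load_threshold):
    """Return the actions one monitoring pass would take for one node type.

    nodes: list of dicts such as {"id": "app-1", "alive": True, "load": 0.7}
    """
    actions = []
    # Rule 1: every crashed node gets a replacement of the same type.
    for node in nodes:
        if not node["alive"]:
            actions.append(("replace", node["id"]))
    # Rule 2: if the live nodes are too busy on average, add one more.
    alive = [node for node in nodes if node["alive"]]
    if alive:
        average = sum(node["load"] for node in alive) / len(alive)
        if average > load_threshold:
            actions.append(("add_node", None))
    return actions

farm = [
    {"id": "app-1", "alive": True, "load": 3.0},
    {"id": "app-2", "alive": False, "load": 0.0},
]
print(reconcile(farm, load_threshold=2.0))
```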
Amazon is fixing two of their major problems: no static IP addresses and single datacenter operation. By adding these two new features developers can finally build a no apology system on Amazon. Before, you always had to throw in an apology or two. No, we don't have low failover times because of the silly DNS games and unacceptable DNS update and propagation times and no, we don't operate in more than one datacenter. No more. Now Amazon is adding Elastic IP Addresses and Availability Zones.
Elastic IP addresses are far better than normal IP addresses because they are both in tight with Jessica Alba and they are:
I attended Sebastian Stadil's AWS Training Camp Saturday and during the class Sebastian brought up a wonderfully counter-intuitive idea: CPU (EC2) costs a lot less than storage (S3, SDB) so you should systematically move as much work as you can to the CPU. This is said to be the Client-Cloud Paradigm. It leverages the well pummeled trend that CPU power follows Moore's Law while storage follows The Great Plains' Law (flat). And what sane computing professional would do battle with Sir Moore and his trusty battle sword of a law?
Embedded systems often make similar environmental optimizations. CPU rich and memory poor means operate on compressed serialized data structures. Deserialized data structures use a lot of memory, so why use them? It's easy enough to create an object wrapper around a buffer. Programmers shouldn't care how their objects are represented anyway. Yet we waste ginormous amounts of time and memory uselessly transforming XML in and out of different representations. Just transport compressed binary objects around and use them in place. Serialization and deserialization happen only on access (Pimpl Idiom).
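A small Python sketch of the object wrapper around a buffer idea. It uses JSON plus zlib as a stand-in for a real compressed binary format, and the LazyRecord class is a made-up name for illustration. The point is only that the buffer can be stored and passed along untouched, and decoding happens the first time a field is read.

```python
import json
import zlib

class LazyRecord:
    """Object wrapper around a compressed, serialized buffer."""

    def __init__(self, buffer):
        self._buffer = buffer   # compressed bytes, kept as-is
        self._decoded = None    # filled in only if a field is read

    @classmethod
    def from_dict(cls, data):
        return cls(zlib.compress(json.dumps(data).encode("utf-8")))

    def to_bytes(self):
        """Hand the buffer on untouched; no decode and re-encode round trip."""
        return self._buffer

    def __getitem__(self, key):
        if self._decoded is None:
            self._decoded = json.loads(zlib.decompress(self._buffer).decode("utf-8"))
        return self._decoded[key]

record = LazyRecord.from_dict({"id": 7, "name": "thumbnail"})
forwarded = LazyRecord(record.to_bytes())  # moved around without being decoded
print(forwarded["name"])                   # decoded only here, on access
```

This version caches the decoded form after the first read, trading memory back for CPU. A stricter memory-poor variant would decode on every access and keep nothing.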
It never occurred to me that in the land of AWS plenty similar "tricks" would make sense. But EC2 is a loss leader in AWS. CPU is plentiful and cheap. It's IO and storage that costs you...
Update 30: Amazon SimpleDB - A distributed, highly-scalable, light-weight, query-able, attribute store by Sebastian Stadil. It introduces the CAP theorem and the basics of SimpleDB. Sebastian does a lot of great work in the AWS world and in what must be his limited free time, runs the AWS Meetup group.
Not surprisingly opinions on SimpleDB vary from it sucks, don't take my database, to it will change the world, who needs a database anyway? From a quick survey of the blogosphere, here's where SimpleDB stands at the moment:
Depending on how you weight each factor, SimpleDB could be way behind or way ahead of other options. What's interesting is to see what people think is important. For many people the only real database is relational and if it doesn't have transactions, joins, etc. it's not real. Databases, like beauty, seem to be in the eye of the beholder.
Amazon has announced the limited beta of Amazon SimpleDB - a simple web services interface to create and store multiple data sets, query your data easily, and return the results. Together with the Simple Storage Service (S3), Elastic Compute Cloud (EC2) and other web services Amazon offers a complete utility computing platform. SimpleDB was the missing piece of AWS - the scalable structured database.