GAE

Todd Hoff's picture

Google AppEngine - A Second Look

Update: Here are a few experience reports of developers using GAE. Diwaker Gupta likes how easy it is to get started on the good documentation. Doesn't like all the limits and poor performance. James here and here also likes the ease of use but finds the data model takes some getting used to and is concerned the API limits won't scale for a real site. He doesn't like how external connections are handled and wants a database where the schema is easier to manage. These posts mirror some of my own concerns. GAE is scalable for Google, but it may not be scalable for my application.

It's been a few days now since GAE (Google App Engine) was released and we had our First Look. It's high time for a retrospective. Too soon? Hey, this is Internet time baby. So how is GAE doing? I did get an invite so hopefully I'll have a more experience grounded take a little later. I don't know Python and being the more methodical type it may take me a while. To perform our retrospective we'll take a look at the three sources of information available to us: actual applications in the AppGallery, blogspew, and developer issues in the forum.

The result: a cautious thumbs up. The biggest issue so far seems to be the change in mindset needed by developers to use GAE. BigTable is not MySQL. The runtime environment is not a VM. A service based approach is not the same as using libraries. A scalable architecture is not the same as one based on optimizing speed. A different approach is needed, but as of yet Google doesn't give you all the tools you need to fully embrace the red pill vision.

I think this quote by Brandon Smith in a thread on how to best implement sessions in GAE nicely sums up the new perspective:

Consider the lack of your daddy's sessions a feature. It's what will make your app scale on Google's infrastructure. 

In other words: when in Rome. But how do we know what the Romans do when the Romans do what they do?

Todd Hoff's picture

Biggest Under Reported Story: Google's BigTable Costs 10 Times Less than Amazon's SimpleDB

Why isn't Google's aggressive new database pricing strategy getting more pub? That's what Bill Katz, instigator of the GAE Meetup and prize winning science fiction author is wondering:

It's surprising that the blogosphere hasn't picked up the biggest difference in pricing: 
Google's datastore is less than a tenth of the price of Amazon's SimpleDB while offering a better API.

If money matters to you then the burn rate under GAE could be convincingly lower. Let's compare the numbers:

Todd Hoff's picture

How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale

Update 3: ReadWriteWeb says Google App Engine Announces New Pricing Plans, APIs, Open Access. Pricing is specified but I'm not sure what to make of it yet. An image manipulation library is added (thus the need to pay for more CPU :-) and memcached support has been added. Memcached will help resolve the can't write for every read problem that pops up when keeping counters.
Update 2: onGWT.com threw a GAE load party and a lot of people came. The results at Load test : Google App Engine = 1, Community = 0. GAE handled a peak of 35 requests/second and a sustained 10 requests/second. Some think performance was good, others not so good. My GMT watch broke and I was late to arrive. Maybe next time. Also added a few new design rules from the post.
Update: Added a few new rules gleaned from the GAE Meetup: Design By Explicit Cost Model and Puts are Precious.

How do you structure your database using a distributed hash table like BigTable? The answer isn't what you might expect. If you were thinking of translating relational models directly to BigTable then think again. The best way to implement joins with BigTable is: don't. You--pause for dramatic effect--duplicate data instead of normalize it. *shudder*

Flickr anticipated this design in their architecture when they chose to duplicate comments in both the commentor and the commentee user shards rather than create a separate comment relation. I don't know how that decision was made, but it must have gone against every fiber in their relational bones...

Todd Hoff's picture

Cloud Programming Directly Feeds Cost Allocation Back into Software Design

Update: Amazon adds Elastic Block Store at $0.10 per 1 million I/O requests. Now I need some cost minimization storage algorithms!

In the GAE Meetup yesterday a very interesting design rule came up: Design By Explicit Cost Model. A clumsy name I know, but it is explained like this:

If you are going to be charged for an operation GAE wants you to explicitly ask for it. This is why some 
automatic navigation between objects isn't provided because that will force an explicit query to be written. 
Writing an explicit query is a sort of EULA for being charged. Click OK in the form of a query and you've 
indicated that you are prepared to pay for a database operation.

Usually in programming the costs we talk about are time, space, latency, bandwidth, storage, person hours, etc. Listening to the Google folks talk about how one of their explicit design goals was to require programmers to be mindful of operations that will cost money made me realize in cloud programming cost will be another aspect of design we'll have to factor in.

Instead of asking for the Big O complexity of an algorithm we'll also have to ask for the Big $ (or Big Euro) notation so we can judge an algorithm by its cost against a particular cloud profile. Maybe something like $(CPU=1.3,DISK=3,IN-BANDWIDTH=2,OUT=BANDWIDTH=3, DB=10). You could look at the Big $ notation for algorithm and shake your head saying that approach will never work for GAE, but it could work for Amazon. Can we find a cheaper Big $? ...

Todd Hoff's picture

Using Google AppEngine for a Little Micro-Scalability

Over the years I've accumulated quite a rag tag collection of personal systems scattered wide across a galaxy of different servers. For the past month I've been on a quest to rationalize this conglomeration by moving everything to a managed service of one kind or another. The goal: lift a load of worry from my mind. I like to do my own stuff my self so I learn something and have control. Control always comes with headaches and it was time for a little aspirin. As part of the process GAE came in handy as a host for a few Twitter related scripts I couldn't manage to run anywhere else. I recoded my simple little scripts into Python/GAE and learned a lot in the process.

Todd Hoff's picture

Google AppEngine - A First Look

I haven't developed an AppEngine application yet, I'm just taking a look around their documentation and seeing what stands out for me. It's not the much speculated super cluster VM. AppEngine is solidly grounded in code and structure. It reminds me a little of the guy who ran a website out of S3 with a splash of Heroku thrown in as a chaser.

The idea is clearly to take advantage of our massive multi-core future by creating a shared nothing infrastructure based firmly on a core set of infinitely scalable database, storage and CPU services. Don't forget Google also has a few other services to leverage: email, login, blogs, video, search, ads, metrics, and apps. A shared nothing request is a simple beast. By its very nature shared nothing architectures must be composed of services which are themselves already scalable and Google is signing up to supply that scalable infrastructure. Google has been busy creating a platform of out-of-the-box scalable services to build on. Now they have their scripting engine to bind it all together.

Everything that could have tied you to a machine is tossed. No disk access, no threads, no sockets, no root, no system calls, no nothing but service based access. Services are king because they are easily made scalable by load balancing and other tricks of the trade that are easily turned behind the scenes, without any application awareness or involvement.

Using the CGI interface was not a mistake. CGI is the perfect metaphor for our brave new app container world: get a request, process the request, die, repeat. Using AppEngine you have no choice but to write an app that can be splayed across a pointy well sharpened CPU grid. CGI was devalued because a new process had to be started for every request. It was too slow, too resource intensive. Ironic that in the cloud that's exactly what you want because that's exactly how you cause yourself fewer problems and buy yourself more flexibility.

The model is pure abstraction. The implementation is pure pragmatism. Your application exists in the cloud and is in no way tied to any single machine or cluster of machines. CPUs run parallel through your application like a swarm of busy bees while wizards safely hidden in a pocket of space-time can bend reality as much as they desire without the muggles taking notice. Yet the abstraction is implemented in a very specific dynamic language that they already have experience with and have confidence they can make work. It's a pretty smart approach. No surprise I guess.

One might ask: is LAMP dead? Certainly not in the way Microsoft was hoping. AppEngine is so much easier to use than the AWS environment of EC2, S3, SQS, and SDB. Creating an app in AWS takes real expertise. That's why I made the comparison of AppEngine to Heroku. Heroku is a load and go approach for RoR whereas AppEngine uses Python. You basically make a Python app using services and it scales. Simple. So simple you can't do much beyond making a web app. Nobody is going to make a super scalable transcoding service out of AppEngine. You simply can't load the needed software because you don't have your own servers. This is where Amazon wins big. But AppEngine does hit a sweet spot in the market: website builders who might have previously went with LAMP.

What isn't scalable about AppEngine is the scalability of the complexity of the applications you can build. It's a simple request response system. I didn't notice a cron service, for example. Since you can't write your own services a cron service would give you an opportunity to get a little CPU time of your own to do work. To extend this notion a bit what I would like to see as an event driven state machine service that could drive web services. If email needs to be sent every hour, for example, who will invoke your service every hour so you can get the CPU to send the email? If you have a long running seven step asynchronous event driven algorithm to follow, how will you get the CPU to implement the steps? This may be Google's intent. Or somewhere in the development cycle we may get more features of this sort. But for now it's a serious weakness.

Here's are a quick tour of a few interesting points. Please note I'm copying large chunks of their documentation in this post as that seems the quickest way to the finish line...

Syndicate content