AWS v GCE Face-off and Why Innovation Needs Lower Cost Infrastructures

This is a repost of part 2 (part 1) of an interview I did for the Boundary blog.

Boundary:  There’s another battle coming down the pike between Amazon (AWS) and Google (GCE). How should the CTO decide which one’s best?

Hoff: Given that GCE is still closed to public access, we have very little common experience on which to judge. The best way to decide is, as always, by running a few experiments. Pick a few representative projects and a representative team, implement the projects on both infrastructures, crunch some numbers, figure out the bigger picture, and then select the one you wanted in the first place.



Sebastian Stadil, founder of Scalr, recently wrote about his experiences on both platforms and found some interesting differences: AWS has a much richer set of services; GCE is on-demand only, so AWS can be cheaper; GCE has faster disk and faster network IO, especially between datacenters; GCE has faster boot times and can mount read-only partitions across multiple machines; and GCE shares images across regions.

Sebastian also wondered what new architectures Google’s feature set will encourage to flourish. One interesting idea of his: because the inter-datacenter network IO is so fast, it will be possible to put read slaves in multiple datacenters and replicate in real time. NuoDB also talked about their experiences on GCE recently and found that GCE offered great performance and was easy to work with.

To generalize, AWS will be hard to compete with on features. Amazon is a feature-producing machine. Both providers will probably stay at cost parity, thanks to the power of competition. Yet Google will compete successfully on speed, throughput, and low 99th percentile variance. As customers, we are all winners. Your CTO should be looking into cloud diversity to meet resilience goals, so it’s more likely that both providers will be in the mix.

Netflix has suffered a lot of ups and downs with customers, but from an IT perspective I keep hearing about them and their “Simian Army” for performance prowess. Any thoughts?

What Netflix has managed to do is take the serious business of testing and make it fun and interesting with their Simian Army gloss. By example, they’ve given people permission to think outside the unit test/system test box and do something very different. We need these kinds of pathfinders to set a new norm and give cover to those eagerly following behind.

For a smaller company/site, how can you get similar performance results but with a much smaller budget, as you grow?

The answer usually depends on your skill set and the problems you are trying to solve. A team with solid IT skills and a relatively bounded problem can save a lot of money by buying their own hardware, pushing scale-up to its limits, and managing everything themselves. They might use a cloud service for testing, cloud bursting, backup, and other specialized tasks.

If you want to go cloud, there are ways of cutting costs: using reserved instances and spot instances, sizing instances carefully, shutting down instances you aren't using, and selecting cheaper algorithms.
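To make the reserved-versus-on-demand tradeoff concrete, here is a minimal sketch of the break-even arithmetic. The prices and function names are hypothetical, made up for illustration, and are not actual AWS or GCE rates. The idea is simply that a reserved instance trades an upfront fee for a lower hourly rate, so it only pays off above a certain number of hours.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours in a non-leap year


def on_demand_cost(hourly_rate, hours):
    """Total cost of paying the on-demand rate for the given hours."""
    return hourly_rate * hours


def reserved_cost(upfront_fee, hourly_rate, hours):
    """Total cost of a reservation: upfront fee plus discounted hourly rate."""
    return upfront_fee + hourly_rate * hours


def break_even_hours(on_demand_rate, upfront_fee, reserved_rate):
    """Hours of use at which reserved and on-demand cost the same."""
    if reserved_rate >= on_demand_rate:
        raise ValueError("reserved rate must be lower than on-demand rate")
    return upfront_fee / (on_demand_rate - reserved_rate)


def cheaper_option(hours, on_demand_rate, upfront_fee, reserved_rate):
    """Return 'reserved' or 'on_demand', whichever costs less for these hours."""
    od = on_demand_cost(on_demand_rate, hours)
    rs = reserved_cost(upfront_fee, reserved_rate, hours)
    return "reserved" if rs < od else "on_demand"


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    od_rate, fee, rs_rate = 0.10, 300.0, 0.04
    hours = break_even_hours(od_rate, fee, rs_rate)
    print(f"break-even: {hours:.0f} hours "
          f"({hours / HOURS_PER_YEAR:.0%} of a year)")
    print("always on:", cheaper_option(HOURS_PER_YEAR, od_rate, fee, rs_rate))
    print("2000 hours:", cheaper_option(2000, od_rate, fee, rs_rate))
```

With numbers like these, a machine that runs around the clock favors the reservation, while one that runs only a couple of thousand hours a year is better left on-demand and shut down when idle. Plug in your provider's real prices before drawing conclusions.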

Another interesting option is to serve a static site directly out of S3 or GitHub. There are many tools available to help build static sites. If you want dynamic features, like comments, you can often find a service that embeds directly into your static HTML and does all the work for you.

What’s clear, though, is that building large systems is still way too expensive given currently available monetization strategies. Costs need to drop an order of magnitude or two before programmers can build the kind of systems that take advantage of huge user bases and huge streams of data. Resource costs are stifling innovation.

We remain stuck with the advertising model, a poor source of nourishment. The only way not to starve to death is for vendors to dramatically lower costs. Can the cloud model sufficiently lower costs so that companies can make a living on marginal revenue sources? Not every service can find a way to sustain itself on a transactional model. Yet there’s a huge amount of innovation possible from exploiting scale, if only we could eke out a living doing it. As revenues don’t seem on the rise, to fund innovation we need lower costs, which will require a revolution in our underlying programming model.