Hot Scalability Links For Oct 24, 2010

On a cold and rainy Fall day, a day stolen from winter rather than our usual gorgeous Indian Summers, a day not even the SF Giants winning the pennant can help warm, here are some hot links to read by a digital flame: 


Paper: Netflix’s Transition to High-Availability Storage Systems 

In an audacious move for such an established property, Netflix is moving their website out of the comfort of their own datacenter and into the wilds of the Amazon cloud. This paper by Netflix's Siddharth “Sid” Anand, Netflix’s Transition to High-Availability Storage Systems, gives a detailed look at this transition and does a deep dive on SimpleDB best practices, focusing especially on techniques useful to those who are making the move from an RDBMS.

Sid is going to give a talk at QCon based on this paper and he would appreciate your feedback. So if you have any comments or thoughts, please comment here, email Sid, or reach him on Twitter at @r39132. Here's the introduction from the paper:

Click to read more ...


What is Network-based Application Virtualization and Why Do You Need It?

With all the attention being paid these days to VDI (virtual desktop infrastructure) and application virtualization and server virtualization and <insert type> virtualization it’s easy to forget about network-based application virtualization. But it’s the one virtualization technique you shouldn’t forget because it is a foundational technology upon which myriad other solutions will be enabled.


This term may not be familiar to you, but that’s because since its inception, oh, more than a decade ago, it’s always just been called “server virtualization”. After the turn of the century (I love saying that, by the way) it was always referred to as service virtualization in SOA and XML circles. With the rise of the likes of VMware and Citrix and Microsoft server virtualization solutions, it’s become impossible to just use the term “server virtualization”, and “service virtualization” is just as ambiguous, so it seems appropriate to give it a few more modifiers to make it clear that we’re talking about the network-based virtualization (aggregation) of applications.

Click to read more ...


Machine VM + Cloud API - Rewriting the Cloud from Scratch

Write a little "Hello World" program these days and it runs inside a bewildering Russian Doll of nested environments, each layer adding its own special performance and complexity tax. First, a language executes in its own environment of data structure libraries, memory management, and so on. That, more often than not, will run inside a language VM like the JVM, CLR, or V8. The language VM will in turn run inside a process that runs inside an OS. An application will run in one or more threads inside a process. And the whole thing will run inside a machine-sharing VM layer like Xen. And across all of that are frameworks for monitoring, elasticity, storage, and so on. That's a lot of overhead for such a little program.

What if we could remove all these taxes and run directly on the new bare metal, which some consider to be a combination of Machine VM + Cloud API? That's exactly what a system called Mirage, described in the paper Turning down the LAMP: Software Specialisation for the Cloud, sets out to do by treating the cloud virtual hardware as a compiler target, and converting high-level language source code directly into kernels that run on it.

Click to read more ...


Sponsored Post: Playfish, Electronic Arts, Tagged, Undertone, Wiredrive, Joyent, DeviantART, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Fun and Informative Events

  • Membase Meetups Coming to Major US Cities. The first of these technical meetups is on October 28 at Zynga’s San Francisco offices.

Cool Products and Services

Click to read more ...



In this post I wanted to spend some time on the CAP theorem and clarify some of the confusion that I often see when people associate CAP with scalability without fully understanding the implications that come with it and the alternative approaches.

You can read the full article here


Troubles with Sharding - What can we learn from the Foursquare Incident?

For everything given something seems to be taken. Caching is a great scalability solution, but caching also comes with problems. Sharding is a great scalability solution, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding also has problems. MongoDB, the database Foursquare uses, contributed their own post-mortem of what went wrong too.

Now that everyone has shared and resharded, what can we learn to help us skip these mistakes and quickly move on to a different set of mistakes?

Click to read more ...


I, Cloud

Do we need Three Laws of Cloud? Not yet. Neither should we be overly concerned regarding reports of cloud leading to the elimination of IT.

Click to read more ...


The CIO’s Problem: Cloud “Mess” or Cloud “Mash”

Loved those mainframe days – you only needed one, but then came along the AS400’s and soon you had ten – but wait, you needed client server and SOA, oh sh%# – now I have ten thousand servers and I need to consolidate server and datacenter operations!

Is Cloud Computing going to follow the same path?

Click to read more ...


4 Scalability Themes from Surgecon

Robert Haas, in his SURGE Recap of the Surge conference, reflected a bit and came up with an interesting checklist of general themes from what he saw. I'm directly quoting his post, so please see the post for a full discussion. He uses this framework to think about the larger picture and where PostgreSQL stands in its progression.

  1. Make use of the academic literature. Inventing your own way to do something is fine, but at least consider the possibility that someone smarter than you has thought about this problem before.
  2. Failures are inevitable, so plan for them.  Try to minimize the possibility of cascading failures, and plan in advance how you can operate in degraded mode if disaster (or the Slashdot effect) strikes.
  3. Disk technology matters. Drive firmware bugs are common and nightmarish, and you can expect very limited help from the manufacturer, especially if the drive is billed as consumer-grade rather than enterprise-grade. SSDs can save you a lot of money, both because a given number of dollars buys more IOs-per-second, and because electricity isn't free.
  4. Large data sets require horizontal scalability.  In the era of 1TB drives, "large" doesn't mean quite what it used to,  but even though the amount of data you can manage with one machine is growing all the time, the amount of data people want to manage is growing even faster.
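
To make theme 2 a little more concrete, here is a minimal Python sketch of one way to "operate in degraded mode": when the primary data source fails, serve the last known good value as long as it is not too stale. This is only an illustration, not something taken from Robert's post or from any Surge talk. The class name DegradedModeCache, the fetch callable, and the max_stale_seconds parameter are all hypothetical names made up for this example.

```python
import time


class DegradedModeCache:
    """Hypothetical helper: fall back to the last good value when a fetch fails."""

    def __init__(self, fetch, max_stale_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch              # callable(key) -> value, may raise
        self._max_stale = max_stale_seconds
        self._clock = clock              # injectable so staleness can be tested
        self._cache = {}                 # key -> (value, time it was stored)

    def get(self, key):
        """Return (value, degraded); degraded is True when a cached copy was served."""
        try:
            value = self._fetch(key)
        except Exception:
            cached = self._cache.get(key)
            if cached is not None:
                stale_value, stored_at = cached
                if self._clock() - stored_at <= self._max_stale:
                    # Backend is down, but we have a recent enough copy.
                    return stale_value, True
            # Nothing usable to fall back on: let the failure propagate.
            raise
        # Normal path: remember the fresh value for future fallbacks.
        self._cache[key] = (value, self._clock())
        return value, False
```

A caller would wrap its backend lookup, for example DegradedModeCache(load_profile_from_db) where load_profile_from_db is whatever function normally hits the database, and use the degraded flag to decide whether to show a "data may be out of date" notice. Bounding the staleness is a design choice: it keeps a long outage from silently serving very old data.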