End-To-End Performance Study of Cloud Services

Cloud computing promises a number of advantages for the deployment of data-intensive applications. Most prominently, these include reducing cost with a pay-as-you-go business model and (virtually) unlimited throughput by adding servers if the workload increases. At the Systems Group, ETH Zurich, we did an extensive end-to-end performance study to compare the major cloud offerings regarding their ability to fulfill these promises and their implied cost.

Click to read more ...


Strategy: Rule of 3 Admins to Save Your Sanity

The idea came up in this Hacker News thread, commenting on a 37signals interview, that having three system administrators is the minimum optimal number of admins. Everyone wants to lower their costs by having each admin administer a lot of machines. The problem is when you have fewer than three admins you can never get a break from the constant corrosive pressure of always being on call. When every moment of your life you are dreading the next emergency, it eats at you. Having three admins solves that problem. With three admins you can:

  • Go on a real vacation. The two remaining admins can switch off being on call.
  • Not be on call all the time.

A larger shop will naturally have more admins so it's not as big an issue, but at smaller shops trying to minimize head count, carrying three admins (or people in those roles) might be something to consider.





Strategy: Scale Writes to 734 Million Records Per Day Using Time Partitioning

In Scaling writes in MySQL (slides) Philip Tellis, while working for Yahoo, describes how using time based partitions they were able to increase their write capability from 2100 inserts per second (7 million a day) to a sustained 8500 inserts per second (734 million a day). This was capacity enough to handle the load during Michael Jackson's memorial service. In summary, the secrets to scalable writes are:

Click to read more ...


7 Lessons Learned While Building Reddit to 270 Million Page Views a Month

Steve Huffman, co-founder of social news site Reddit, gave an excellent presentation (slides, transcript) on the lessons he learned while building and growing Reddit to 7.5 million users per month, 270 million page views per month, and 20+ database servers.

Steve says a lot of the lessons were really obvious, so you may not find a lot of completely new ideas in the presentation. But Steve has an earnestness and genuineness about him that is so obviously grounded in experience that you can't help but think deeply about what you could be doing different. And if Steve didn't know about these lessons, I'm betting others don't either.

There are seven lessons, each has their own summary section: Lesson one: Crash Often; Lesson 2: Separation of Services; Lesson 3: Open Schema; Lesson 4: Keep it Stateless; Lesson 5: Memcache; Lesson 6: Store Redundant Data; Lesson 7: Work Offline.

By far the most surprising feature of their architecture is in Lesson Six, whose essential idea is:

Click to read more ...


Hot Scalability Links for May 14, 2010

Lots of good ones this week...

Click to read more ...


The Rise of the Virtual Cellular Machines

My apologies if you were looking for a post about cell phones. This post is about high density nanodevices. It's a follow up to How will memristors change everything? for those wishing to pursue these revolutionary ideas in more depth. This is one of those areas where if you are in the space then there's a lot of available information and if you are on the outside then it doesn't even seem to exist. Fortunately, Ben Chandler from The SyNAPSE Project, was kind enough to point me to a great set of presentations given at the 12th IEEE CNNA - International Workshop on Cellular Nanoscale Networks and their Applications - Towards Megaprocessor Computing. WARNING: these papers contain extreme technical content. If you are like me and you aren't an electrical engineer, much of it may make a sort of surface sense, but the deep and twisty details will fly over head. For the more software minded there are a couple more accessible presentations:

Here a few excerpts from the presentations, just things I found particularly interesting. I'm still trying to make sense of it all and I thought you might be interested too. It's clear there's something new here and it will require different algorithms and programming models to work. What will those be and who will invent them?

Click to read more ...

May102010 Architecture - A Portal at 3900 Requests Per Second is one of the leading portals in India. is owned by the same company and is one of the top content aggregation sites in India, primarily targeting Non-resident Indians from around the world. Ramki Subramanian, an Architect at Sify, has been generous enough to describe the common back-end for both these sites. One of the most notable aspects of their architecture is that Sify does not use a traditional database. They query Solr and then retrieve records from a distributed file system. Over the years many people have argued for file systems over databases. Filesystems can work for key-value lookups, but they don't work for queries, using Solr is a good way around that problem. Another interesting aspect of their system is the use of Drools for intelligent cache invalidation. As we have more and more data duplicated in multiple specialized services, the problem of how to keep them synchronized is a difficult one. A rules engine is a clever approach.

Click to read more ...


Going global on EC2

Since its inception, Amazon EC2 has enabled companies to run highly scalable infrastructure with minimal overhead.  Over the years, Amazon Web Services has expanded with new offerings and additional regions around the world.

All this growth has made establishing a global footprint easier than ever.  And yet, most EC2 customers still choose to operate in a single region.  While this is fine for many applications, customers with significant web infrastructure are depriving users of drastically improved performance.  Deploying infrastructure in EC2's new regions cuts out one of the biggest sources of latency: distance.

In this post, I describe how Bizo significantly reduced load times by implementing Global Server Load Balancing (GSLB) to distribute traffic across all Amazon regions.


How will memristors change everything? 

A non-random sample of my tech friends shows that not many have heard of memristors (though I do suspect vote tampering). I'd read a little about memristors in 2008 when the initial hubbub about the existence of memristors was raised. I, however,  immediately filed them into that comforting conceptual bucket of potentially revolutionary technologies I didn't have to worry about because like most wondertech, nothing would ever come of it. Wrong. After watching Finding the Missing Memristor by R. Stanley Williams I've had to change my mind. Memristors have gone from "maybe never" to holy cow this could happen soon and it could change everything.

Let's assume for the sake of dreaming memristors do prove out. How will we design systems when we have access to a new material that is two orders of magnitude more efficient from a power perspective than traditional transistor technologies, contains multiple petabits (1 petabit = 128TB) of persistent storage, and can be reconfigured to be either memory or CPU in a package as small as a sugar cube (in a stacked configuration)?

Click to read more ...


Business continuity with real-time data integration

Enterprises want to protect their data. As the appetite for data volumes grows, storage technology becomes a critical business asset on which business continuity relies. My recent survey in the medium-size enterprise segment shows the five dominant investment directions at the level of data management architecture: disaster recovery (DR), high availability (HA), backup, data processing performance and migration to more advanced databases.


This suggests that corporations generally have sufficiently structured data collections but are concerned with business continuity and continuous availability of data. What infrastructures can provide these assurances? In this post I want to focus on yet another option, and that is the Real-Time Data Integration model. As an example I am going to discuss Oracle GoldenGate, which permits you to manage the data critical to your business in safety, ensuring business continuity without disruption even if the data is distributed among multiple, heterogeneous business applications and architectures.