advertise
Tuesday
Oct122010

The CIO’s Problem: Cloud “Mess” or Cloud “Mash”

Loved those mainframe days – you only needed one, but then came along the AS400’s and soon you had ten – but wait, you needed client server and SOA, oh sh%# – now I have ten thousand servers and I need to consolidate server and datacenter operations!

Is Cloud Computing going to follow the same path?

Click to read more ...

Friday
Oct082010

4 Scalability Themes from Surgecon

Robert Haas in his SURGE Recap of the Surge conference, reflected a bit, and came up with an interesting checklist of general themes from what he was seeing. I'm directly quoting his post, so please see the post for a full discussion. He uses this framework to think about the larger picture and where PostgreSQL stands in its progression.

  1. Make use of the academic literature. Inventing your own way to do something is fine, but at least consider the possibility that someone smarter than you has thought about this problem before.
  2. Failures are inevitable, so plan for them.  Try to minimize the possibility of cascading failures, and plan in advance how you can operate in degraded mode if disaster (or the Slashdot effect) strikes.
  3. Disk technology matters. Drive firmware bugs are common and nightmarish, and you can expect very limited help from the manufacturer, especially if the drive is billed as consumer-grade rather than enterprise-grade. SSDs can save you a lot of money, both because a given number of dollars buys more IOs-per-second, and because electricity isn't free.
  4. Large data sets require horizontal scalability.  In the era of 1TB drives, "large" doesn't mean quite what it used to,  but even though the amount of data you can manage with one machine is growing all the time, the amount of data people want to manage is growing even faster.
Thursday
Oct072010

Hot Scalability Links For Oct 8, 2010

Tuesday
Oct052010

Sponsored Post: Box.net, Wiredrive, Joyent, DeviantART, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Cool Products and Services

Click to read more ...

Monday
Oct042010

Paper: An Analysis of Linux Scalability to Many Cores  

An Analysis of Linux Scalability to Many Cores, by a number of MIT researchers, is a refreshingly practical paper on what it takes to scale Linux and common applications like Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce to run on 48 core systems. A very timely paper given moderately massive multicore systems are reportedly the near future of computing.

This paper must have taken a lot of work. They both tracked down bottlenecks in a number of applications and the Linux kernel and they also tried to fix them. Modestly speaking the authors said they made "modest" changes to the kernel and applications, but there's nothing modest about what they did. It's excellent work.

After the next bit, which is the abstract, there is a list of the problems they found and how they fixed them.

Click to read more ...

Friday
Oct012010

Hot Scalability Links For Oct 1, 2010

Click to read more ...

Friday
Oct012010

Google Paper: Large-scale Incremental Processing Using Distributed Transactions and Notifications

This paper, Large-scale Incremental Processing Using Distributed Transactions and Notifications by Daniel Peng and Frank Dabek, is Google's much anticipated description of Percolator, their new real-time indexing system.

The abstract:

Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google’s indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.

 

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%. 
Thursday
Sep302010

More Troubles with Caching

As a tasty pairing with Facebook And Site Failures Caused By Complex, Weakly Interacting, Layered Systems, is another excellent tale of caching gone wrong by Peter Zaitsev, in an exciting twin billing: Cache Miss Storm and More on dangers of the caches. This is fascinating case where the cause turned out to be software upgrade that ran long because it had to be rolled back. During the long recovery time many of the cache entries timed out. When the database came back, slam, all the clients queried the database to repopulate the cache and bad things happened to the database. The solution was equally interesting: 

Click to read more ...

Thursday
Sep302010

Facebook and Site Failures Caused by Complex, Weakly Interacting, Layered Systems

Facebook has been so reliable that when a site outage does occur it's a definite learning opportunity. Fortunately for us we can learn something because in More Details on Today's Outage, Facebook's Robert Johnson gave a pretty candid explanation of what caused a rare 2.5 hour period of down time for Facebook. It wasn't a simple problem. The root causes were feedback loops and transient spikes caused ultimately by the complexity of weakly interacting layers in modern systems. You know, the kind everyone is building these days. Problems like this are notoriously hard to fix and finding a real solution may send Facebook back to the whiteboard. There's a technical debt that must be paid. 

The outline and my interpretation (reading between the lines) of what happened is:

Click to read more ...

Tuesday
Sep282010

Sponsored Post: Wiredrive, Joyent, DeviantART, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Cool Products and Services

Click to read more ...