Hot Scalability Links for January 28 2010

  1. Google's Research Areas of Interest: Building scalable, robust cluster applications. At Google we see distributed systems as a technology in its infancy, with huge gaps in the supporting research that represent some of the most important problems in the space. Here are some examples: resource sharing; balancing cost, performance, and reliability; and self-maintaining systems.
  2. Amazon SimpleDB: A Simple Way to Store Complex Data by Paul Tremblett. The most effective way I have found to understand SimpleDB is to think about it in terms of something else we all use and understand: a spreadsheet.
  3. Rackspace Cloud Servers versus Amazon EC2: Performance Analysis. The Bitsource conducted a review of the two cloud computing platforms, Rackspace Cloud Servers and Amazon Elastic Compute Cloud (EC2), to get a general idea of overall system performance.
  4. Private Clouds Are Not The Future by James Hamilton. Private clouds are better than nothing, but an investment in a private cloud is an investment in a temporary fix that will only slow the path to the final destination: shared clouds.
  5. What is the right way to measure scale? by Daniel Abadi. So which scales better? Is using the number of nodes a better proxy than size of data? Hadoop can “scale” to 3800 nodes. So far, all we know is that Greenplum can “scale” to 96 nodes. Can it handle more nodes?
  6. High Availability Across Multiple Data Centers, Multihoming and EC2 by Andrew Johnstone. It becomes much more complicated to handle failover between multiple data centers. For example, if data center 1 fails entirely, we need to ensure either that VIPs are routed to the correct data center or that DNS is changed.
  7. SimpleDB Performance: 5 Steps to Achieving High Write Throughput by Siddharth Anand. I was recently tasked with forklifting ~1 billion rows from Oracle into SimpleDB.
  8. Scalability Updates for Jan 26th 2010 by Royans.