Hot Scalability Links for June 25, 2010

  • Royans Tharakan is blogging like a madman at the Velocity Conference. Read a summary of many of the presentations on his blog.
  • Zuckerberg almost guarantees 1 billion Facebook users. And I almost believe him.
  • Northscale introduces Membase, a new distributed key-value NoSQL competitor that offers a memcache-compatible interface yet is persistent like a database. Hopefully we'll have more on their internals later.
  • Notable Tweets: 
    • Aaron Cordova - scalability means "can change size" and also "works at large sizes" - this conflates two orthogonal features of cloud computing. 
    • Jaime Garcia Reinoso - It's the scalability, stupid! 
    • Alex Averbuch - when I read/hear "unlimited/infinite scalability" I stop reading/listening and start thinking about cake.
    • Dennis Clark - I used to smirk at developers whose main DB experience was in MUMPS or Pick, until I realized those are old-school #NoSQL engines.
  • Hypertable vs. HBase Performance Evaluation. Hypertable's benchmarking shows Hypertable with up to 10 times better performance than HBase on some tests, and they feel they are fine-tuned for cloud-hosted web analytics workloads.
  • GigaOM is putting up articles and video from their cloud-focused Structure 2010 Conference. You may find the Structure 2010: The Quest for Exascale Computing Power panel talk hosted by Joyent's Jason Hoffman especially interesting. To build an exascale computer today you would need 4 gigawatts of power and 125 million cores!
  • SQL Azure Horizontal Partitioning: Part 2 by Wayne Walter Berry. Azure supports 1 GB and 10 GB databases, and soon 50 GB databases. To go larger you'll have to horizontally partition across multiple databases, and Wayne shows you how.
  • Theo Schlossnagle has shared his Operating at Scale presentation from Velocity. This is a subject he's knowledgeable and passionate about, so it's worth a look. Theo is also putting on the Surge Conference, which is laser-focused on designing large-scale systems.
  • A Chef recipe from ClusterChef that will help you create a scalable, efficient compute cluster in the cloud. It has recipes for Hadoop, Cassandra, NFS and more.
  • Dr. Eric Schadt has an informative video in Nature, Computational solutions to large-scale data management, on how to tame the avalanche of scientific data using High Performance Computing in the cloud.
  • A blast from the past with an overview of a 2001 architecture: Building a Large-Scale E-commerce site with Apache and mod_perl. Things really haven't changed that much. Replace Perl with your dynamic language of choice and this approach would still feel very comfortable today.
  • Which is better for building backend servers: Python, Erlang, or Haskell? For quite a discussion read Debunking the Erlang and Haskell hype for servers on Codexon. Rule one about any sort of benchmark on the Internet is that it will be destroyed from every angle. I get what they were doing, but I do agree with some of the commenters. A useful benchmark has to include complex workloads so that issues like locks, memory management, latency, paging, fairness, drops, etc. can be revealed. That's when a more complex infrastructure really shows its worth.
  • Bloom filters for bioinformatics. Abhishek Tiwari writes on how algorithms first popularized by Google in BigTable are now being used in large-scale gene sequence analysis.
  • While a reverse migration back to MySQL from NoSQL doesn't seem to be imminent, we do see some early settlers going back to a more familiar home. In this case Blue74 is going from MongoDB back to MySQL because of: Lack of Transactions, Missing Records, No Joins, Schemalessness, and Instability.
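
For anyone curious what the Bloom filter item above looks like in practice, here is a minimal Python sketch of the general idea: remember which k-mers (short substrings of a sequence) have been seen, using a fixed-size bit array. This is not code from Abhishek's article. The class name, the kmers helper, the bit-array size, and the use of seeded SHA-256 as the hash family are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=4096, num_hashes=3):
        # Sizes are arbitrary here; real deployments tune them to the
        # expected item count and acceptable false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence, k):
    """Return every length-k substring of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Record the 4-mers of a toy sequence (made-up data).
seen = BloomFilter()
for kmer in kmers("ACGTACGTGA", 4):
    seen.add(kmer)

print("ACGT" in seen)  # a k-mer that was added always reports True
```

The appeal for sequence analysis is that memory use stays fixed no matter how many k-mers go in. The price is that a membership test can occasionally say yes for something that was never added.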
