Friday, December 17, 2010 at 8:00AM
- If you missed it here's a link to my webinar and here's the slidedeck for the talk, with a bunch of additional slides that I didn't have a chance to talk about. The funky picture of Lincoln is classic.
- Can MySQL really handle 1,000,000 req/sec? Sure, when you turn it into a NoSQLish database, skip all the SQL processing, and access the backend store directly. Percona is making this possible with their HandlerSocket plugin based on the work of Yoshinori Matsunobu.
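The trick is that HandlerSocket speaks its own tiny text protocol instead of SQL: clients open an index once, then issue raw key lookups against it. As a rough illustration (this is a hedged sketch, not the official client library; the database, table, and column names are hypothetical), the request lines are tab-separated fields ending in a newline:

```python
# Sketch of HandlerSocket's wire protocol: tab-separated fields, one
# request per line. You would send these over a plain TCP socket to the
# plugin's read port, skipping the SQL layer entirely.

def open_index_cmd(index_id, db, table, index, columns):
    """Build a 'P' (open index) request, binding index_id to a table index."""
    return "\t".join(["P", str(index_id), db, table, index, columns]) + "\n"

def find_cmd(index_id, op, *keys):
    """Build a find request: look up rows through the opened index."""
    return "\t".join([str(index_id), op, str(len(keys)), *map(str, keys)]) + "\n"

# Example requests (db/table names are made up for illustration):
open_req = open_index_cmd(1, "test", "user", "PRIMARY", "id,name")
find_req = find_cmd(1, "=", 42)   # fetch the row where the key equals 42
```

Because each request is just one short line and there's no parsing, optimizing, or locking overhead per query, a single server can push request rates that the SQL front end never could.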
- Quotable Quotes:
- Nice racks. Time has pictures of a Facebook datacenter.
- How Hunch Built a Data-Crunching Monster by Pete Warden. Really cool article following Hunch through their scaling thought process. Their whole approach is based around parallelism within a single box, and they had some interesting reasons for making that choice. They determined that the key bottleneck was network bandwidth, which led them towards housing all of their data processing within a single machine.
- Bound by the Speed of Light - There's only so much you can do to optimize NFS over a WAN, by George V. Neville-Neil. Caching and acceleration are not going to make accessing a transoceanic server anywhere near usable. To be honest, the real answer to these sorts of problems usually lies with understanding what data needs to be where, and distributing it properly. People always want a silver bullet, but they're very rare.
- Running Git on a SSD Speed Comparison. A git add took around ten minutes to run on the hard drive compared to 40 seconds on the SSD. Committing also took about a third of the time: 20 seconds vs. over a minute.
- A Closer Look At The Scalability Of Windows Azure. Five tips on spending less on Azure: avoid crossing data center boundaries, minimize the number of compute hours by using auto scaling, use both Azure Table Storage (ATS) and SQL Azure, ATS table modeling, data purging in ATS.
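The ATS table modeling tip comes down to key design: ATS bills per transaction and only queries within a single partition are cheap point lookups, so you want the PartitionKey/RowKey pair to match your dominant query. A minimal sketch, with an entirely hypothetical entity shape:

```python
# Hedged sketch of ATS key design: pack everything the common query needs
# into one partition so reads are single-partition point lookups rather
# than cross-partition scans. The site/day/page scheme here is made up.

def page_view_entity(site, day, page, count):
    return {
        "PartitionKey": "%s_%s" % (site, day),  # one site's daily data lives together
        "RowKey": page,                          # must be unique within the partition
        "Count": count,
    }

entity = page_view_entity("example.com", "20101217", "/home", 7)
```

With this layout, "all of example.com's traffic for Dec 17" is a single partition scan, and one specific page is a direct PartitionKey + RowKey lookup.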
- In Multipart Upload and Large Object Support, Jeff Barr tells us how to make use of those new 5 TB objects in S3. You clearly can't sequentially access them. S3's REST API supports the HTTP Range header so you can extract sections. Create your own file format and indexing scheme and you should be able to navigate around these pretty easily.
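For instance, if your file format uses fixed-size records, the index-to-byte-range arithmetic is trivial; here's a hedged sketch (the record size is a made-up example) of building the Range header for one record so you never pull down the whole object:

```python
# Sketch: fetch a single fixed-size record from a huge S3 object via an
# HTTP Range request. The record size is hypothetical; the Range header
# itself is standard HTTP (byte positions are inclusive at both ends).

RECORD_SIZE = 1024  # bytes per record in our made-up file format

def range_header(record_index, record_size=RECORD_SIZE):
    """HTTP header selecting exactly one record's bytes."""
    start = record_index * record_size
    end = start + record_size - 1   # Range end is inclusive
    return {"Range": "bytes=%d-%d" % (start, end)}

# Pass this dict as a header on a GET for the object; S3 returns
# 206 Partial Content with just that slice.
headers = range_header(2)
```

A variable-length format works the same way, except the index structure (itself fetchable with a small ranged read) tells you each record's start offset and length.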
- Greg Linden does a nice round up of Papers on specialized databases at Google.
- The Secrets Behind Blekko's Search Technology By Pete Warden. They have around 800 servers in their data center, each with 64 GB RAM and eight SATA drives giving each one about eight terabytes of local storage.
- Royans with his Scalability links for December 4th