Stuff The Internet Says On Scalability For September 14, 2012

It's HighScalability Time:

  • Serves 4 billion hours of video each month, has 425M Gmail users, and has 100PB of active data: Google; provides 340,000+ cores across 300 data centers to >10k scientists, archiving 15PB / yr: Open Science Grid
  • Quotable Quotes:
    • Chris Travers: MySQL is what you get when application developers build an RDBMS. PostgreSQL is what you get when database developers build an application development platform.
    • Hasen: Node.JS is a terrible platform. It’s terribleness stems from a very simple aspect of it, and this aspect happens to be central to how it works: callback-based I/O
    • @cgul: @github nooooooooo say it aint so. But I read all your articles on high scalability!
    • @fhd: It may sound perverse, but I'm really happy to be confronted with scalability issues at Eyeo. I guess I just love tricky stuff :D
    • @brianfcoope: Can't we all just get along? MT @otrajman "Biggest problem with NoSQL guys is none of them know anything about databases..." - Stonebraker
    • @salsabeela: Sigh. I finally decide to join the geeks on the CTO club to talk about scalability over coffee next week 
  • Thank you CIO for including HighScalability as a top Cloud blog.
  • Listen to your mom: just because everyone is doing it doesn't mean you should too. Timo Zimmermann in My Stack Is Bigger Than Yours - Ranting About Web Applications And Scalability says the same about frameworks. Don't go full stack, instead: keep your stack as small as possible; always keep scaling in mind; only scale when you need it; use what you know; "it works" is good enough most of the time.
  • No longer fear signing up front for reserved instances: Amazon EC2 Reserved Instance Marketplace. Given the finance costs, making money on the proposition will be tough, but if you want reserved instances to save money and/or increase redundant capacity, you now don't have to let those resources sit unused. It also provides an upgrade path to newer, more powerful instances.
  • 37signals managed to make a mobile version of their site in HTML5. Imagine that. Backstage: Basecamp for mobile. Starting from scratch and using an iterative, keep it simple, speed first approach, they managed to produce a mobile site they think users are very happy with. Mobile doesn't like lots of JavaScript, so they went with no frameworks and went light on the JavaScript.
  • Automatic Memory Machine. Evernote describes in great detail their automated install system using Puppet, Preseed, PXE boot, and custom scripts. Like their VLAN approach to separate dev from production.
  • Will SSDs give your database a performance kick? Intel SSD 910 vs HDD RAID in tpcc-mysql benchmark says yes: for its price Intel SSD 910 handles MySQL workload quite well... especially if you are looking for quick performance boost in IO heavy workload.
  • Supercomputer built with Raspberry Pi and Lego. It may not be practical from a performance and power perspective, but it's still berry cool: The whole system cost under £2,500 and has a total of 64 processors and 1TB of memory (16GB SD cards for each Raspberry Pi).
  • Track down those misbehaving queries with How to find MySQL queries worth optimizing? One helpful metric: the ratio between rows sent and rows examined.
  • Scaling out Postgres, Part 2: Routing: we now have a partitioning scheme with the ability to add new cells without stopping the system. We incur the logarithmic complexity when routing the external profile identifiers—provider-proid pairs, which does not happen often and can be further optimized by local caching of resolved routes.
  • A faster Tuenti. Lessons Learned in Client Side Scalability: load JavaScript on demand; measure everything; optimize the first page load; use the right tool for the job; maintainable JavaScript; CSS matters.
  • Galaxy internals, part III: TCP/UDP, P2P, packet aggregation, guaranteed delivery and ordering, multicasting. 
  • Visualizing Latency Numbers Every Programmer Should Know. An interesting approach. Worth a look.
  • Distributed Reader-Writer Mutex: let's try to create a scalable distributed reader-writer mutex. The mutex is going to be very simple, I'm not going to dive too deep into advanced lockfree algorithms, let's just create the simplest possible distributed design.
  • Tiny Transactions on Computer Science (TinyToCS). Keep-it-short descriptions of vetted technical papers.
  • 268x Query Performance Increase for MongoDB with Fractal Tree Indexes, SAY WHAT?: We’ve continued our experimental integration of Fractal Tree® Indexes into MongoDB, adding support for clustered indexes.  A clustered index stores all non-index fields as the “value” portion of the index, as opposed to a standard MongoDB index that stores a pointer to the document data.  The benefit is that indexed lookups can immediately return any requested values instead of needing to do an additional lookup (and potential disk IOs) for the requested fields.
  • Resource Allocation via Competing Marketplaces: This thesis proposes a novel method for allocating multi-attribute computational resources via competing marketplaces. Trading agents, working on behalf of resource consumers and providers, choose to trade in resource markets where the resources being traded best align with their preferences and constraints.
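
For the MySQL item above, here is a rough sketch of how the rows sent versus rows examined metric might be used to rank queries. It assumes you have already pulled per-query counts out of something like the slow query log; the QueryStats type, the sample queries, and the threshold of 100 are all made up for illustration and are not taken from the linked article.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Hypothetical per-query counters, e.g. extracted from a slow query log."""
    sql: str
    rows_sent: int
    rows_examined: int


def examined_per_sent(q: QueryStats) -> float:
    """Rows examined for every row returned; higher means more wasted work."""
    # Treat zero rows sent as one so the ratio stays finite.
    return q.rows_examined / max(q.rows_sent, 1)


def worst_offenders(stats, threshold=100.0):
    """Queries at or above the (arbitrary) threshold, worst first."""
    flagged = [q for q in stats if examined_per_sent(q) >= threshold]
    return sorted(flagged, key=examined_per_sent, reverse=True)


# Invented sample data, for illustration only.
SAMPLE = [
    QueryStats("SELECT * FROM users WHERE id = 42", 1, 1),
    QueryStats("SELECT * FROM orders WHERE note LIKE '%gift%'", 12, 480000),
    QueryStats("SELECT name FROM products WHERE category_id = 7", 200, 250),
]

if __name__ == "__main__":
    for q in worst_offenders(SAMPLE):
        print(f"{examined_per_sent(q):>10.1f}  {q.sql}")
```

A query that examines hundreds of thousands of rows to return a dozen is usually a better optimization target than one that is merely called often, though the right threshold will depend on the workload.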
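
The distributed reader-writer mutex item above is easier to picture with a little code. What follows is only a minimal sketch of the general idea (readers touch one slot, writers take every slot), not the linked article's implementation. Two simplifications are assumptions of this sketch: each slot is a plain threading.Lock rather than a real reader-writer lock, so two readers that hash to the same slot briefly exclude each other, and slots are picked by thread id rather than by processor. The class name and slot count are invented.

```python
import threading
from contextlib import contextmanager


class DistributedRWMutex:
    """Sketch: readers lock one slot, writers lock all slots."""

    def __init__(self, slots: int = 8):
        # One independent lock per slot so readers rarely contend.
        self._locks = [threading.Lock() for _ in range(slots)]

    @contextmanager
    def read(self):
        # Assumption: choose the slot from the thread id, not the CPU.
        lock = self._locks[threading.get_ident() % len(self._locks)]
        with lock:
            yield

    @contextmanager
    def write(self):
        # Always acquire in the same order so two writers cannot deadlock.
        for lock in self._locks:
            lock.acquire()
        try:
            yield
        finally:
            for lock in reversed(self._locks):
                lock.release()
```

A reader pays for one uncontended lock, while a writer pays for all of them, which is the trade you want when reads vastly outnumber writes. In CPython the GIL hides most of the scalability benefit, so treat this as a picture of the structure rather than a benchmarkable design.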

This week's selection: