Stuff The Internet Says On Scalability For December 9, 2011

It takes a licking and keeps on HighScalabilitying:

  • Instagram: 60 photos per second; Twitter: 8,868 Tweets per second during the MTV Awards; Foursquare: 15 million users; Facebook: 60 million qps; Netflix: stream 1 billion hours in Q4; Evolution: 16,000 eyes; Evernote: 750K Paid Users; AOL: 60,000 Servers; Netflix API: 20,000 requests per second; Tumblr: 900 posts per second.
  • Quotable quotes:
    • @LusciousPear : This is my key-value store. There are many others like it, but this one is mine.
    • @pavlobaron : @old_sound if this site isn't on highscalability.com , I won't open it
    • @nzmrmn : Words I hate: moist, "sinfully decadent," facilitator, dopesauce, deliverables, accountability, coolio, "ping me," scalability.
    • @anggriawan : Mark Callaghan (Facebook Engineer): "You can tune #MySQL to perform very fast per node if you know what you’re doing."
    • Jesper Nordenberg : Java is good at hiding the real complexities of programming and introducing incidental ones.
    • @markimbriaco : It's true, I'm a hardware nerd. I love servers, not gonna lie.
      • I see a lot of people coming out of the server closet these days (Kyle Brandt). Be proud!
    • @anggriawan : Facebook: Most (>90%) queries never hit database, but only touch cache layer.
  • Possibly NSFW Cartoon: The Hard Life of a NoSQL Coder
  • Making the Netflix API More Resilient. The video must flow. Ben Schmaus explains how Netflix keeps the Landsraad supplied even through collapse and revolution: graceful fallback driven by real-time stats, with visibility through a dashboard.
  • IBM’s 3 big chip breakthroughs explained: Racetrack memory - electrons move around a wire the way a car goes around a racetrack; Graphene - carbon-based chips; Carbon nanotubes - the first transistor with channel lengths smaller than 10 nanometers.
  • One reason instances are rebooted in the cloud: security updates. It can't always be a sexy cascading failure.
  • Replication and the latency-consistency tradeoff. Daniel Abadi takes a penetrating look at the nature of replication and how it forces a tradeoff between consistency and latency even when there are no network partitions. There are only three alternatives for implementing replication (each with several variations): (1) data updates are sent to all replicas at the same time, (2) data updates are sent to an agreed upon master node first, or (3) data updates are sent to a single (arbitrary) node first.
  • Melanie Mitchell says We Need to Talk About Scaling - Because scaling properties (how one variable changes as another is varied) can provide a lot of insight into the mechanisms underlying these properties in a complex system. The mathematical forms describing scaling properties often can give fundamental clues to how complex systems work, and, in fact, provide a rigorous vocabulary for talking about such systems. Anyone who wants to understand complexity needs to know about the important mathematical forms that scaling relationships typically exhibit.
  • Apache Kafka - A high-throughput distributed messaging system, originally built by LinkedIn for activity stream processing. Excellent and detailed description of activity streams and what is needed to process them. Makes the case that using the disk for streaming rather than random IO is quite efficient.
  • Where are the Hardware Clouds? asks Satnam Singh in Reconfigurable Data Processing for Clouds: This is an ideal time to dramatically improve the efficiency of data-centres by mapping common and large-scale tasks into shared, million-LUT FPGA boards that complement the general-purpose hardware currently installed. Also: LLHDL - Low Level Hardware Description Language; FPGA HPC – The road beyond processors; OpenCL for heterogeneous reconfigurable systems programming; Altera announces industry’s first OpenCL program for FPGAs.
  • Pagination with Cassandra and what we can learn from it by Michael Kopp. I've also seen these called stateless cursors. 
  • eBay has released ql.io - a declarative, evented, data-retrieval and aggregation gateway for HTTP APIs. ql.io consists of a domain-specific language inspired by SQL and JSON, and a node.js-based runtime to process scripts written in that language.
  • A Comparison of LTE Advanced HetNets and WiFi. The next performance and capacity leap will come from network topology evolution by using a mix of macro cells and small cells, also referred to as a Heterogeneous Network (HetNet), effectively bringing the network closer to the user. Pico cells offer better performance than Wi-Fi APs because LTE Advanced pico cells have expanded coverage.
  • Batching operations is a big scalability win and it's now Amazon approved: Amazon S3 - Multi-Object Delete. Delete up to 1000 objects from an S3 bucket with a single request.
  • Cassandra: Lessons learnt. Buddhika Chamith shows how he built a data monitoring store on top of Cassandra. Going against type, the data is dynamic and the queries are not known ahead of time, which requires building extra composite indexes. Excellent details on the process. Conclusion: All in all it’s a rather involved scheme to get what we wanted out of Cassandra. Not a big selling point I would say. We like Cassandra for its fast writes and scalable architecture.
  • If you are debating whether to stick with a relational database, then Why I’m Pretty Excited About Using Neo4j for a CMDB Backend by Willie Wheeler is a great read. So far I’m finding it a lot easier to work with the graph database than with a relational database, and I’m finding Spring Data Neo4j to be a big help in terms of repository building and defining app-level schemas. The code is a lot smaller than it was when I did this with a relational database.
  • Videos are now available from the Google Test Automation Conference (GTAC).
  • Entertaining translation from Russian of Lev Walkin's experience with Riak. Lots of good technical discussion. Mathias Meyer summarizes: He recommends starting out with a relational database like MySQL or Postgres if you don't know what Riak is and how you'd benefit from it, or how your data will evolve over time. Start out with one of them, add Riak to the mix later, when the need arises.
  • We already had a big post on Monday if you need even more stuff. And with Christmas coming, we always need more stuff.
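
For the curious, the "stateless cursors" idea from the Cassandra pagination item above can be sketched roughly like this. It is a plain-Python illustration over an in-memory sorted key list, not Cassandra client code and not the approach from the linked article; `ROWS`, `fetch_page`, and `iterate_all` are hypothetical names. The client hands back the last key it saw and the next page starts just after it, so nothing about the cursor has to be remembered on the server side.

```python
from bisect import bisect_right

# Hypothetical stand-in for a store whose rows are ordered by key.
ROWS = {f"user{i:03d}": i for i in range(10)}
SORTED_KEYS = sorted(ROWS)


def fetch_page(last_key=None, page_size=4):
    """Return up to page_size (key, value) pairs strictly after last_key."""
    # The "cursor" is just the last key the caller saw; no server-side state.
    start = 0 if last_key is None else bisect_right(SORTED_KEYS, last_key)
    keys = SORTED_KEYS[start:start + page_size]
    return [(key, ROWS[key]) for key in keys]


def iterate_all(page_size=4):
    """Yield successive pages until the store is exhausted."""
    last_key = None
    while True:
        page = fetch_page(last_key, page_size)
        if not page:
            break
        yield page
        last_key = page[-1][0]  # remember only the final key of this page
```

Because each request carries its own starting point, any server can answer any page, which is what makes this style friendly to horizontally scaled stores.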
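
The batching point from the S3 Multi-Object Delete item above can also be sketched. This is only an illustration under assumptions: `delete_batch` is a hypothetical callable standing in for one multi-object delete request (it is not the real S3 API), and the only detail taken from the item is the limit of 1000 objects per request.

```python
MAX_KEYS_PER_REQUEST = 1000  # limit stated in the S3 item above


def chunked(keys, size=MAX_KEYS_PER_REQUEST):
    """Yield lists of at most `size` keys from any iterable of keys."""
    batch = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly short, batch
        yield batch


def delete_all(keys, delete_batch):
    """Call delete_batch once per chunk; return the number of requests made."""
    requests = 0
    for batch in chunked(keys):
        delete_batch(batch)  # hypothetical: one request deletes the whole batch
        requests += 1
    return requests
```

With 2,500 keys this makes three calls instead of 2,500, which is the whole scalability win.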
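
Finally, "graceful fallback driven by real-time stats" from the Netflix API item above might look something like the following toy. It is a sketch under assumptions, not Netflix's implementation: `FallbackGuard`, its window size, and its error-rate threshold are all hypothetical. It keeps a sliding window of recent outcomes and, when the window is full and the error rate is too high, serves the fallback without touching the primary.

```python
from collections import deque


class FallbackGuard:
    """Serve a fallback when the recent error rate of the primary is too high."""

    def __init__(self, window=10, max_error_rate=0.5):
        # Sliding window of outcomes: True = success, False = failure.
        self.results = deque(maxlen=window)
        self.max_error_rate = max_error_rate

    def error_rate(self):
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def call(self, primary, fallback):
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.error_rate() > self.max_error_rate:
            # Skip the primary entirely, and drop the oldest outcome so the
            # next call probes the primary again.
            self.results.popleft()
            return fallback()
        try:
            value = primary()
        except Exception:
            self.results.append(False)
            return fallback()
        self.results.append(True)
        return value
```

A dashboard in this picture would simply be something that reads `error_rate()` for each guarded dependency.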