Stuff The Internet Says On Scalability For August 31, 2012

It's HighScalability Time:

  • LHC compute jobs use 1.5 CPU millennia every 3 days; Obama helps load test Reddit with 4.3 million page views
  • Quotable Quotes:
    • @secastro: Want to see nearly a terabyte of memory?
    • @DZone: Apache Projects are the Justice League of Scalability
  • Apple And Google Might Be Negotiating Patents. Remember when empires and nation states would have a nice little summer war and then negotiate boundary lines and terms of trade?
  • Google Faculty Summit 2012: The Online Revolution - Education at Scale. Someday we may have direct knowledge downloads and augmented wisdom packs, but until then these primitive attempts at learning process improvement are a good start. 
  • OnLive lost: how the paradise of streaming games was undone by one man's ego. The fascinating story behind a radical idea: applications hosted and rendered in the cloud while being displayed remotely on a device. You might have thought latency would be the killer, but it turned out the rendering required a physical machine per customer, which isn't economically viable. Virtualization has its place. Another scalability lesson, this one on the virtue of incremental expansion: the company deployed thousands of servers that sat unused, and never had more than 1,600 concurrent users of the service worldwide. There were all these reasons why we were going to be an instant success, but it didn't succeed instantly.
  • On IT as a material metaphor: it's interesting how in materials science, materials are elastic in that they can stretch a certain amount and then snap back to their original shape, be permanently deformed, slowly creep into new shapes over time, or just break. There are measures of these characteristics called stiffness, strength, and ductility. We don't have these kinds of notions for the materials we build with, though they seem to apply at some deeper level.
  • Scaling Deep Learning at Google by Jeff Dean: automatically discover high-level categories from vast amounts of unlabeled data combined with smaller amounts of labeled data. Bigger models + more training data = better results. 2 billion parameter models. The approach: parallelize the heck out of everything (a toy sketch of the data-parallel idea appears after this list). Where does Google get their data? You. So those free services...not really free. You are really helping train SkyNet.
  • If Chef and Puppet have left you a little battered and bruised in the past, then why not try rubbing a little Salt in your wounds? Sebastian Kreutzberger says Salt is Python based and is like a mix of Chef/Puppet (defining states) and an easy way to communicate with machines directly (as with an MQ). The big difference from Chef is the architecture: the slave (called a minion) does not poll for changes every few minutes, which can cause weirdness, but holds a standing connection to the master, which allows instant changes and commands (see the sketch after this list).
  • iPhone Cloud Based Architecture, Part 1. Nice description of how CollectedIt! built their iPhone app. The backend uses a Microsoft stack, so it may be of particular interest to those who have alternative coding styles.
  • Great interview with Simon Wardley on DevOps Cafe: diffusing knowledge about the thusness of technology. Cycles of chaos to custom to utility ripple across time through the ebb and flow of opposites like distributed and hierarchical, deviation and standardization, and push vs. pull. Higher level organizations form and reform as standardization washes up and down stream, knocking down inertia barriers to change along the way. Differential advantage leads to a world of operational efficiency. Ignoring customers gives way to listening to customers. Look for weakness in competitors and attack by strategically understanding the process of change. Commoditize a barrier to entry. Play ecosystem games. The answers don't have much to do with the questions, but it's still a very interesting conversation.
  • Why I’m choosing CouchDB. Always good to see people reason through these things for their own problems. Hates: SQL and Node.js. Likes: JSON, views, distributed, git. (A tiny taste of the JSON-and-views model follows this list.)
  • A million times have I meant to write about Akka, but for some reason I've never gotten around to it. Distributed (in-memory) graph processing with Akka is a really clear code example using Akka to code, well, a distributed graph solution. Actors rule! (A toy actor-style version in plain Python appears after this list.)
  • The Hot Topics in Software Defined Networking conference makes all their papers available. Lots of good work. 
  • Be not impressed by big numbers, advises a thoughtful thread on Google Groups. Baron Schwartz with a good gloss: Fermilab says their LHC compute jobs gobble CPU at a mind-boggling rate of 1.5 CPU millennia every 3 days (exercise for the reader: why not cite the amount used in 1 day? a worked answer follows this list). But scalability and capacity planning guru Neil Gunther advises us not to be too impressed by the numbers, and compares them to SETI@home, which he says is of similar magnitude. A follow-up to the thread guesses at the likely failure rate of components in the cluster, estimating 2 switch failures per day, among other things.
  • Jellyfish: Networking Data Centers Randomly: we're proposing a slightly radical alternative: a completely random network. First, high bandwidth helps servers avoid bottlenecks. Second, we want a network that is incrementally expandable. (A toy version of the random wiring idea appears after this list.)
  • How Algorithms Rule the World. Welcome to your new Bubble Sort overlords.
  • Greg Linden with another great set of Quick Links. A few themes are the dark fate of startups, search, and Microsoft.
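
A few toy sketches for the items above, starting with the Jeff Dean talk. This is nothing like Google's actual infrastructure; it's just a minimal single-machine illustration of the data-parallel idea: several workers sharing one set of parameters and updating it concurrently, Hogwild-style, on a one-parameter linear model.

```python
# Toy data-parallel SGD: 4 workers race to update one shared parameter.
# Illustrative only; the model, learning rate, and data are all made up.
import threading, random

true_w = 3.0
data = [(x, true_w * x + random.gauss(0, 0.1))
        for x in (random.uniform(-1, 1) for _ in range(4000))]

params = {"w": 0.0}                           # the shared "parameter server"

def worker(shard, lr=0.05):
    for x, y in shard:
        grad = 2 * (params["w"] * x - y) * x  # d/dw of squared error
        params["w"] -= lr * grad              # racy, lock-free update

shards = [data[i::4] for i in range(4)]       # one shard per worker
threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
for t in threads: t.start()
for t in threads: t.join()
print(params["w"])                            # converges to roughly 3.0
```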
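
For the Salt item, here's roughly what "communicate with machines directly" looks like from the master, using Salt's Python client API. It assumes an installed salt-master with accepted minions; the targets here are hypothetical.

```python
# Fan out commands to minions over Salt's standing connections.
# Requires a running salt-master with accepted minions.
import salt.client

local = salt.client.LocalClient()
print(local.cmd('*', 'test.ping'))               # instant ping of every minion
print(local.cmd('web*', 'cmd.run', ['uptime']))  # shell command on a subset
local.cmd('*', 'state.highstate')                # apply the configured states
```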
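
And for the CouchDB item, a small sketch of what "JSON and views" means in practice: everything is JSON over HTTP, and a view is a JavaScript map function stored in a design document. Assumes a local CouchDB on the default port; the database and view names are made up.

```python
# Create a database, insert a JSON doc, define a view, query it.
import json, requests

base = 'http://localhost:5984/scratch'
hdrs = {'Content-Type': 'application/json'}

requests.put(base)                                  # create the database
requests.post(base, data=json.dumps({'type': 'post', 'title': 'hello'}),
              headers=hdrs)                         # insert a document

design = {'views': {'by_type': {
    'map': "function(doc){ emit(doc.type, 1); }",   # JavaScript map fn
    'reduce': '_count'}}}
requests.put(base + '/_design/demo', data=json.dumps(design), headers=hdrs)

r = requests.get(base + '/_design/demo/_view/by_type', params={'group': 'true'})
print(r.json())                  # {'rows': [{'key': 'post', 'value': 1}]}
```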
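
The Akka post deserves its Scala read, but the core idea fits in plain Python: vertices as actors, each with a mailbox, flooding the smallest vertex id they've seen to their neighbours until the system goes quiet, which labels connected components. This is not Akka, just the same message-passing shape done with threads and queues.

```python
# Actor-style connected components: each vertex actor gossips the
# smallest id it has seen. A message counter detects quiescence.
import threading, queue

class Pending:
    """Counts in-flight messages so we can tell when the system is quiet."""
    def __init__(self):
        self.n, self.cv = 0, threading.Condition()
    def incr(self):
        with self.cv: self.n += 1
    def decr(self):
        with self.cv:
            self.n -= 1
            if self.n == 0: self.cv.notify_all()
    def wait_zero(self):
        with self.cv:
            while self.n > 0: self.cv.wait()

def actor(vid, neighbours, mailboxes, labels, pending):
    while True:
        msg = mailboxes[vid].get()
        if msg is None:                      # poison pill: shut down
            return
        if msg < labels[vid]:                # better label: adopt and gossip
            labels[vid] = msg
            for n in neighbours[vid]:
                pending.incr()
                mailboxes[n].put(msg)
        pending.decr()

edges = [(0, 1), (1, 2), (3, 4)]             # two components: {0,1,2}, {3,4}
neighbours = {v: set() for e in edges for v in e}
for a, b in edges:
    neighbours[a].add(b); neighbours[b].add(a)

mailboxes = {v: queue.Queue() for v in neighbours}
labels = {v: v for v in neighbours}
pending = Pending()

for v, ns in neighbours.items():             # seed: tell neighbours our id
    for n in ns:
        pending.incr(); mailboxes[n].put(v)

threads = [threading.Thread(target=actor, daemon=True,
                            args=(v, neighbours, mailboxes, labels, pending))
           for v in neighbours]
for t in threads: t.start()
pending.wait_zero()                          # no messages left anywhere
for mb in mailboxes.values(): mb.put(None)
for t in threads: t.join()
print(labels)                                # {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```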
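
As for that reader exercise on the Fermilab numbers, the per-day figure is easy to back out, and it translates into a surprisingly concrete core count:

```python
# 1.5 CPU millennia per 3 days, restated per day and as busy cores.
cpu_years = 1.5 * 1000                 # 1.5 millennia in CPU-years
per_day = cpu_years / 3                # 500 CPU-years of work per day
cores = per_day * 365.25               # CPU-days consumed per day
print(per_day, round(cores))           # 500.0 CPU-years/day ~= 182,625 cores
```

So "per 3 days" and "per day" are the same claim; the bigger-sounding unit is the only difference, which is rather Gunther's point.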
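
Finally, the Jellyfish idea is almost embarrassingly simple to sketch: keep joining random pairs of switches that still have free ports. The numbers below are made up, and the paper's real construction adds a link-swap step to use up stranded ports, plus the analysis that makes the randomness respectable.

```python
# Greedy random wiring of top-of-rack switches, Jellyfish-style.
import random

def random_wiring(n_switches, free_ports, seed=0):
    rng = random.Random(seed)
    links = set()
    free = {s: free_ports for s in range(n_switches)}
    while True:
        open_sw = [s for s in free if free[s] > 0]
        pairs = [(a, b) for i, a in enumerate(open_sw)
                        for b in open_sw[i + 1:]
                        if (a, b) not in links]    # no parallel links
        if not pairs:                              # stuck: greedy stops here
            break
        a, b = rng.choice(pairs)
        links.add((a, b))
        free[a] -= 1
        free[b] -= 1
    return links

print(len(random_wiring(20, 4)))   # ~40 links: 20 switches * 4 ports / 2
```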

This week's selection: