Stuff The Internet Says On Scalability For February 1, 2013

Hey, it's HighScalability time:

  • 400 milliseconds: modern private wire for Natural Gas; 620,000: size of a mobile phone botnet; 1 million cores: jet noise modeling supercomputer; 2.2 petabytes per gram: DNA storage density; 1 billion: Google's one quarter infrastructure spend
  • Quotable Quotes:
    • @StartupLJackson: "Good thing we spent 6mo on scalability pre-launch. The thing went hyper-viral day one, just as planned." - nobody ever.
    • @dysinger: WELCOME TO DEVOPS ADVENTURE! YOU ARE STANDING INSIDE AWS. NEARBY IS AN ANGRY ELB. THERE ARE SOME SSH KEYS ON THE GROUND.
    • @levie: Well cloud computing was fun while it lasted; now we can just store all our data in DNA strands.
    • @mtnygard: Debates about “Tech X > tech T” aren’t religious wars. They’re also economic wars.
    • @alfonsol: I do not believe your per load number of socks requires scalability.
    • @mikeolson: Latency breeds contempt.
    • @DrQz: GitHub looks to me like power-law growth with 4MM users ~ Oct 2013 and 5MM ~ April 2014 
    • @floodgateintern: It spends only 1.9% of its time in the kernel at one core, this grows to 23% at 48 cores, indicating scalability limitations
    • @abhatta9: Scalability is key in monetizing the mobile opportunity...Walmart, 140M shoppers a week!
    • @98pm: "@reillyusa: The cloud unit of currency is workload not instance.” < +1, I'd point to the culture of Virtualisation
    • @traviskaufman: lesson of the day: there is a major difference between "scalability" and "overcomplication"
    • @Posmatrach: Scalability is mostly a matter of the level of thought given to the design, rather than technology.
    • @ArnaudGourlay: "Scalability does not come from your runtime platform but from your application architecture"
    • @peakscale: Intra-region transfer price drops are big and a good thing. Now curious what sync-for-you services AWS is planning on.
    • @backslashr: Martin Fowler makes a case for throwing hardware at scalability problem. I tend to disagree to this particular statement:
  • Next step after service orientation? Symbioses. The world swarms with symbiosis. 
  • How EVE Online servers deal with a 3,000 person battle. There's nothing to throttle as servers run at 100%. The game is sharded by solar system; multiple solar systems run on one machine and are moved when that machine becomes overloaded. Some games are run on machines with exceptional hardware. The trickiest strategy for dealing with large battles is time dilation: time is slowed down. Interesting discussion on Hacker News about game design, bandwidth requirements, and state management techniques.
  • Towards A Swarm of Agile Micro Quadrotors: We describe the architecture and algorithms to coordinate a team of quadrotors, organize them into groups and fly through known three-dimensional environments. We provide experimental results for a team of 20 micro quadrotors.
  • Do streaming music royalties scale? You be the judge: On Spotify, 131,000 plays last year netted just $547.71, or an average of 0.42 cent a play.
  • It's DevOps human style - Mind follows the body: Neurobiologists such as Antonio Damasio of the University of Southern California have demonstrated that emotions begin with actions - rapidly increased heart rate, for example - and end with the perception of those actions - the sensation of fear or anger. Damasio calls this the "body loop": the brain learns of the body's response to change via chemical and electric signals conveyed by the bloodstream and nervous system. Thus feeling follows behaviour; the mind follows the body.
  • Cache oblivious algorithms. Cache aware algorithms are 2x faster than cache oblivious algorithms and 5x faster than cache naive algorithms. You can make a program more cache efficient without knowing the cache line size ahead of time. As the problem size grows, cache oblivious performance approaches cache aware performance.
  • Here are some ideas on how to model data based on "time bucket". Also, Indexing Hbase Data. Also also, Tables vs CFs vs Cs.
  • Airbnb found the Holy Grail. Their search is etched in Our First Node.js App: Backbone on the Client and Server. Only this is no myth and the symbology of the grail has changed to fit a more modern context. The grail is executing common code on the client and the server. This stack replaced their old Backbone.js + Rails stack and is 5x faster. Also, Scalable realtime services with Node.js, Socket.IO and Windows Azure. Also also, MySpace is built using Node and Express
  • Using Cascalog to build an app based on City of Palo Alto Open Data. Great example by Paco Nathan of how to bring different datasets together to do something unexpected - go for a walk. Quite detailed and practical.
  • Metaphor: a system for related search recommendations at LinkedIn: Metaphor builds on a number of signals and filters that capture several dimensions of relatedness in search activity: correlation based on time, correlation based on clicks, correlation based on term overlap, and length bias.
  • Ah, FSM based specs, so refreshingly clear. 
  • Amazon Launches New Cloud Based Video Transcoding Service. I guess an advantage of working closely with customers is you can cherry pick high value services and make them your own. This strategy works on American Idol at least. Also, on demand prices reduced by 10-20%.
  • Reaching 200K events/sec. Optimizations like pipelining, batching, and removing magic framework layers really spruce things up.
  • Why is MongoDB wildly popular? It’s a data structure thing. Nice description of the issues and a good discussion. Working with complex value structures for a while and then moving to tables is a serious reminder of the data structure thing. But then again constraints and consistency are nice too. 
  • A couple of good articles on Game Theory.
  • Facebook fights spam using a domain-specific language called FXL. 
  • More Back-to-Basics Weekend Reading - Epidemics: Alan Demers' seminal paper on epidemic techniques for database replication.
  • Forecasting MySQL Scalability with the Universal Scalability Law: Forecasting a system’s scalability limitations can help answer questions such as “will my server handle ten times the existing load?” and “at what point will I need to upgrade my hardware?” Timely answers to these questions have more business value than exact predictions. Dr. Neil J. Gunther’s Universal Scalability Law is a model that predicts a system’s deviation from ideal scalability, based on simple measurements that are relatively easy to collect.
  • Multi-armed Bandit Experiments: This article describes the statistical engine behind Google Analytics Content Experiments. Google Analytics uses a multi-armed bandit approach to managing online experiments.
  • Thredis is Redis + SQL + Threads.
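
For the Universal Scalability Law item above, here is a minimal sketch of what the model looks like in practice. It uses the standard USL form, C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α stands for contention and β for coherency cost. The function names and the coefficient values are made up for illustration and do not come from the linked article; in a real forecast you would fit α and β to your own throughput measurements.

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity at load n: C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha, beta):
    """Load at which capacity peaks: N* = sqrt((1 - alpha) / beta)."""
    return math.sqrt((1.0 - alpha) / beta)


# Hypothetical coefficients, chosen only to show the shape of the curve.
ALPHA = 0.05   # contention (serialized fraction of the work)
BETA = 0.001   # coherency (cost of keeping nodes or threads in sync)

for n in (1, 8, 16, 32, 64):
    print(n, round(usl_capacity(n, ALPHA, BETA), 2))

print("capacity peaks near N =", round(usl_peak(ALPHA, BETA), 1))
```

With these made-up coefficients the curve rises, flattens around 31 units of load, and then declines. That retrograde region is what separates the USL from Amdahl's law, and it is the kind of "at what point will I need to upgrade?" answer the article is after.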