Stuff The Internet Says On Scalability For December 30, 2011

Pork. The Other HighScalability:

  • PlentyOfFish: 6 Billion Page Views; World: info doubling every 2 years; 2015: 7,910 exabytes of global digital data; Khan Academy: 4 million uniques; G+: 62 million users; Zynga: leased 9 megawatts of capacity; Heroku: billions of page views a month
  • Quotable quotes:
    • Udi Dahan: Scalability is not boolean.
    • John Boyd: Look at the mission, not the technology. And if you do look at the mission, don't look at the most fashionable mission of the day.
    • @BigDataClouds: I think the fear of change is the biggest challenge that companies are facing
    • @cjzero : If you were wondering, the #Mythbusters scalability test of a Newton's Cradle using wrecking balls? Busted.
    • @Xorlev : Scalability is really really hard. That's why it's fun. It pushes the limits of engineering talent.
  • 100 Best Cloud & Data Stats of 2011 by Zenoss. Lots of fun facts about how mind-bendingly huge the world of information is becoming. More from the Economist: Welcome to the yotta world
  • Fault tolerant applications in nodejs. Marak Squires with an excellent rundown on building distributed systems in general and node.js in particular. A sampler: 1) It's mathematically impossible your distributed application is not going to f*ck up. Deal with it. 2) The operating system's RAM, processes, and file descriptors are relatively cheap. Use them as needed. 3) Embrace crash-only design. 4) Use hook.io with Paxos.
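Point 3, crash-only design, can be sketched in a few lines. This is a hypothetical Python supervisor, not Marak's code: the worker crashes outright on bad input, and the supervisor restarts from clean state instead of trying to recover in place.

```python
def flaky_task(data):
    """A worker that crashes outright on bad input instead of
    limping along in a corrupt state (crash-only design)."""
    if data is None:
        raise RuntimeError("bad input -- crash, don't patch over it")
    return data * 2

def supervise(task, inputs, max_restarts=5):
    """Tiny supervisor: run the task, and on a crash restart from
    clean state rather than attempting in-place recovery."""
    results, restarts = [], 0
    queue = list(inputs)
    while queue:
        item = queue.pop(0)
        try:
            results.append(task(item))
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts:
                raise  # escalate to a higher-level supervisor
            # drop the poison message and carry on from fresh state

    return results, restarts

results, restarts = supervise(flaky_task, [1, None, 3])
# results == [2, 6]; one restart consumed by the poison message
```

The same shape appears in Erlang supervision trees and in process managers like upstart or monit: the recovery logic lives outside the thing that fails.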
  • Why Startups Could Use .NET, But Don’t by Ian Muir. The real reason startups don’t use .NET is based more on culture than technology. Benjamin Pollack, of Fog Creek Software, leaves a comment on a Hacker News thread detailing another reason why: "Whenever we're doing something in Kiln that Microsoft anticipated, the environment is among the best I could possibly imagine. But when we do something that Microsoft did not anticipate, it's nothing but pain." Frameworks giveth and frameworks taketh away...
  • On Distributed Failures (and handling them with Doozer). Blake Mizerany with one of the better descriptions of the chaos of building a real distributed system I've seen, based on their experience building a PaaS at Heroku. Most distributed systems are glorified 2 or 3 tier systems; what happens when it gets more complicated than that? Some bits: Communicate - servers talking to each other, knowing who is dead, who is alive, etc. Use nginx, varnish, hermes, and dynos. Git pushes code. Broadcasting status messages doesn't scale, so you Coordinate. RabbitMQ is used to shuttle messages around. Need the state of all the things, kept in a database. Race conditions. Distributed deadlock. Out-of-sync problems. There's a disconnect between state, messages, and logic that continually produces race conditions. Need consistency and high yield (p(you will get a response to a request)). Tolerate - Doozer is a Paxos implementation, distributed state machines so nodes in a cluster can come to a consensus on a value. Avoid consistency problems from split brains.
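The "come to a consensus on a value" part ultimately rests on majority quorums. Here's a toy Python sketch of just the quorum-counting core; this is not Doozer or real Paxos, which add proposal numbers, two phases, and competing-proposer handling on top:

```python
from collections import Counter

def majority_value(votes, cluster_size):
    """Return the value a strict majority of the cluster agrees on,
    or None if no quorum exists. Votes from partitioned/dead nodes
    simply never arrive, which is why quorum is computed against
    cluster_size rather than len(votes)."""
    quorum = cluster_size // 2 + 1
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= quorum else None

# 5-node cluster: one node dissents, one is partitioned away.
print(majority_value(["v1", "v1", "v1", "v2"], cluster_size=5))  # v1
# Only 2 of 5 nodes responded: no quorum, no decision.
print(majority_value(["v1", "v2"], cluster_size=5))              # None
```

The key property, and the reason split brain is avoided: any two majorities of the same cluster must overlap in at least one node, so two conflicting values can never both win.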
  • Fiesta! with a live-blog summary of a presentation by Kenny Gorman of Shutterfly on MongoDB Performance Tuning. That's a mouthful! Shutterfly has 8 MongoDB clusters with ~30 servers in a datacenter. The process is like that for an RDBMS: look at queries, etc.; modeling is key; know when to stop tuning; MongoDB is fast when read-only, so minimize writes; use the profiler; use explain & mongostat; split on functional areas first to different replica set clusters, then worry about sharding those; if you have a lot of physical I/O use SSDs; Shutterfly saw a 500% speedup with flashcache.
  • Greg Linden contributes another great set of Quick links. Some real good ones.
  • What is LTE? Neal Gompa with an epic post explaining the miracle of wireless speed that is LTE. LTE represents a paradigm shift from hybrid voice and data networks to data-only networks.
  • Nice short explanation of how to configure an HA environment for ScaleBase on Amazon EC2.
  • Building Blocks for a Participatory Web of Things: Devices, Infrastructures, and Programming Frameworks. Vlad Trifa proposes an end-to-end, fully Web-based framework that fosters fast prototyping of distributed sensing applications that run on top of heterogeneous NEDs (Networked Embedded Devices).
  • Amazon now offers Object Expiration for S3, which schedules the removal of objects after a defined time period. Extremely handy for caches, deleting old logs, and generally keeping data sizes down. Great feature. Setting up timers to delete huge swaths of data yourself is much harder than it sounds.
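As an illustration, a lifecycle rule of the kind Object Expiration accepts looks roughly like this (the ID, prefix, and day count are made up; check the S3 documentation for the exact schema):

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>expire-old-logs</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>30</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```

Any object under `logs/` is then queued for deletion once it is 30 days old, with no cron jobs or sweeper processes on your side.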
  • Scalability Matters In The Digital Economy. Great graphic showing how revenue jumps during the holidays; if you can't handle that traffic spike, it will be an unhappy Christmas.
  • Thrift Architecture. Scale the Thrift server layer independently of your middle tier and HBase cluster. Run the Thrift servers on the middle-tier servers or on the HBase cluster nodes.
  • How to Make a Massively Cross-Platform Game: Use Lua for Everything!
  • Why (and how) we've switched away from Google Maps. API/service selection for an application is like sourcing ingredients for a restaurant. If you are concerned about Google's new Map pricing policy, consider OpenStreetMap as a viable option for your mapping needs. 
  • Deleting data can be surprisingly expensive: So if you have 100 million entities that's an easy $200-300 to delete them. And believe me, it is really easy to generate that many entities when your app processes 1500 req/s.
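The arithmetic checks out if this refers to App Engine-style datastore pricing (an assumption on my part) of roughly $0.10 per 100,000 write operations, with each delete costing a write op for the entity plus one per index entry removed:

```python
# Back-of-envelope check under an ASSUMED price of $0.10 per
# 100,000 datastore write operations, where deleting one entity
# costs 2-3 write ops (entity delete plus 1-2 index deletes).
PRICE_PER_WRITE_OP = 0.10 / 100_000
entities = 100_000_000

for ops_per_delete in (2, 3):
    cost = entities * ops_per_delete * PRICE_PER_WRITE_OP
    print(f"{ops_per_delete} ops/delete -> ${cost:,.0f}")
# 2 ops/delete -> $200
# 3 ops/delete -> $300
```

Which lands squarely on the quoted $200-300, and explains why heavily indexed entities are the expensive ones to delete.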
  • So cool: Decades later, a Cold War secret is revealed -  "They envisaged a satellite that was 60-foot long and 30,000 pounds and supplying film at speeds of 200 inches per second. The precision and complexity blew my mind."
  • MySQL at Facebook, Current and Future. Facebook performance engineer Domas Mituzas gives a keynote address on how Facebook uses MySQL, including a lot of bottlenecks they have found and solved (Krishna Sankar with a nice summary). More MySQL improvements by fixing the Group Commit Problem, which shows the value of a more general scalability principle: combine pending work together instead of processing each request separately. For transaction commits, using separate commits increases IO and queueing compared to doing a batch of commits at once. The workaround, lazy commits, effectively removed the D in ACID because the commits would be delayed.
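The group-commit idea, amortizing one expensive sync over many pending transactions, can be sketched like so (illustrative Python, not Facebook's or MySQL's actual implementation):

```python
import threading

class GroupCommitLog:
    """Toy group commit: writers queue their records and a single
    flusher pays the expensive sync cost once for the whole batch,
    instead of once per transaction."""

    def __init__(self):
        self.pending = []
        self.lock = threading.Lock()
        self.flushes = 0        # stand-in for fsync() calls

    def commit(self, record):
        """Cheap: just queue the record for the next flush."""
        with self.lock:
            self.pending.append(record)

    def flush(self):
        """Expensive: one sync covers every queued commit."""
        with self.lock:
            if self.pending:
                self.flushes += 1
                self.pending.clear()

log = GroupCommitLog()
for txn in range(100):
    log.commit(f"txn-{txn}")
log.flush()
print(log.flushes)  # 1 flush instead of 100
```

The durability caveat in the post falls out directly: between `commit()` and `flush()` a transaction has been acknowledged but not yet synced, which is exactly the window where the D in ACID goes missing.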
  • When you are Facebook with 800+ million users, here's the kind of thing you create: Doppler - "measures worldwide packet latency and generates a mapping between user IPs and DNS resolver IPs. A billion data points can cover a lot of methodological sins. Doppler is not about per-user data, it's about measuring the internet as a whole as it relates to our servers."
  • I always enjoy Nat Torkington's Four short links posts. You may have used random sampling as a scaling strategy; Nat points to the Sparse- and low-rank approximation wiki, which describes a neat trick: "instead of sampling at 2x the rate you need to discriminate then compressing to trade noise for space, use these sampling algorithms to (intelligently) noisily sample at the lower bit rate to begin with. Promises interesting applications, particularly for sensors."
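For the plain sampling baseline that trick improves on, reservoir sampling is the classic way to keep a uniform k-item sample of a stream in constant memory. A generic sketch, unrelated to the linked wiki's algorithms:

```python
import random

def reservoir_sample(stream, k, rng=random.Random(42)):
    """Keep a uniform random sample of k items from a stream of
    unknown length using O(k) memory: item i replaces a reservoir
    slot with probability k/(i+1)."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)   # inclusive of i
            if j < k:
                sample[j] = item
    return sample

picked = reservoir_sample(range(100_000), k=5)
print(len(picked))  # 5
```

Because each pass keeps only k items, you can summarize arbitrarily large streams (logs, sensor readings) without ever buffering them.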
  • If you are into flute music check out F# Contrabass (Long Ago, Deep Within) - Native American Flute in a Tunnel. Amazing!