Stuff The Internet Says On Scalability For June 29, 2012 - The Velocity Edition

Judging from the tweet flow, Velocity looked like a riotous good time. In this video on the main themes at Velocity, after a little microphone-enhanced violence, John Allspaw and Steve Souders identify resilience and automation as two of the big ideas behind building a faster and stronger web.

John says resiliency is the idea that we don't live in a perfect world, so trying to build perfect systems is counterproductive. We have to accept failure as a baseline and think in terms of degrees of availability. All abstraction layers leak, so every part of a system must be monitorable and open to introspection.

A focus on resilience means the web is growing up. Resilience has long been a requirement for "real" systems, so it's great to see web systems treated as the complex systems they've always been. For the Alpha and Omega on resilience you'll want to watch Dr. Richard Cook's inspiring talk on How Complex Systems Fail.

Here are some of the most enjoyable Quotable Quotes from Velocity:

  • @guypod : LTE latency has roughly the same latency we had with dialup connections. 3G latency is akin to satellite... (@patmeenan at ‪#velocityconf‬)
  • @akucharski : Akamai produces 1.3 billion log lines every day! ‪#velocityconf‬
  • @mikeodea : Facebook: 6 billion mobile messages (!!) every 30 minutes ‪#velocityconf‬
  • @mmaretzke : ‪#velocityconf‬ Last ... mind-boggling ... Facebook facts: 3.8 trillion cache operations in 30 minutes! Unbelievable. Scaling Systems. 160m newsfeeds, 5bln realtime msgs, 10bln profile pics, 108 bln queries on mysql still 30 minutes
  • @atseitlin : "If I were to do things over again, I would think about the cloud first" ‪#velocityconf‬ @yammer on migrating to the cloud
  • @holydevil : RT @laraswanson: Swapping DNS architecture for a major site yielded ~3s of page load time decrease for a site. @tomdyninc at ‪#velocityconf‬
  • @mikeodea : Facebook: 3.8 trillion cache ops every 30 minutes. ‪#velocityconf‬
  • @Anselmo : BBC mobile page render costs ~15 Joules on a HTC Android phone, which is pretty fantastic. ‪#velocityconf‬
  • @atseitlin : Integration points are #1 risk to stability @mtnygard ‪#velocityconf‬. That's why we built the circuit breaker pattern
  • @RealGeneKim : @mtnygard: "Antipattern #1: integrations are #1 risk to stability: every process call can/will kill you; even db calls" ‪#velocityconf‬
  • @adrianco : #velocityconf‬ preventing failure is ultimately less successful than responding quickly to it. That's what we use rollback for.
  • @ramarob : Twitter is 45% done converting monolithic Rails to modular Java apps. So 4+ yrs pain for alleged initial gain of rapid dev? ‪#velocityconf‬
  • @jbarciauskas : Bulkhead pattern: partition the system, allow partial failure. Apply at different levels: thread pools, load balancers ‪#velocityconf‬
  • @jonathanklein : "Don't optimize the little things" - I've said it before and it's a popular phrase at ‪#velocityconf‬ - you must measure and focus on big wins
  • @jhofmann : Facebook runs BGP all the way to their top or rack switches. One common protocol they can build tools around. ‪#velocityconf‬
  • @kenny_dee : Time between requests : 3 seconds to type a URL on Google Chrome, 15 seconds to select a search result on Google ! ‪#velocityconf‬ ‪#webperf‬
  • @souders : Good advice from @jrauser wrt data analysis: "Look at the extremes and you'll find things that are broken." ‪#VelocityConf‬
  • @SubmittedDenied : ~45% of twitter traffic now served off JVM stack ‪#velocityconf‬
  • @laraswanson : Crazy mobile energy consumption: Changing images on Amazon to JPEG format would save 20% joules; on Facebook it'd save 30% ‪#velocityconf‬
  • @CoteIndustries : But then as you go up to 20 racks, colo is better #velocityconf
  • @jbarciauskas : Per rack, EC2 servers are 5x more expensive, but inc. other costs: network, load balancing, racking/power, manpower, is equal ‪#velocityconf‬
  • @adrianco : ‪#velocityconf‬ Netflix dev teams push code 10s of times a day and rollback is needed a few times a week.
  • @dlutzy : Operations people are 1. Monitoring 2. Responding 3. Adapting 4. Learning to enable Resilience (Cook) ‪#velocityconf‬
  • @allspaw : "The normal world is not well-behaved." Dr. Richard Cook ‪#velocityconf‬
  • @ginablaber : Dr Cook: "Resilience in my field is a life/death question. We design for reliability, but we what we want is resilience." ‪#velocityconf‬
  • @atseitlin : More Cook: How to design for resilience? Trust people, reveal controls, be transparent, foster learning. Good stuff. ‪#velocityconf‬
  • @RealGeneKim : Cook: "withstand transients, recovery swiftly/smoothly, prioritize to serve high level goals, recognize/respond.." ‪#velocityconf‬
  • @RealGeneKim : Twitter: "we do dark, rolling releases: only way we can gain confidence in releases, despite using iago in pre-prod" ‪#velocityconf‬
  • @RealGeneKim : Twitter: "outcomes: we can launch massive features in parallel: team org now matches sw stack; ‪#velocityconf
  • @jschauma : "The best disaster plans can be impacted by actual disasters." Mike Christian at ‪#velocityconf‬
  • @jonathanklein : Theo Schlossnagle says that eroding data granularity over time to save space is a mistake at ‪#velocityconf‬
  • @thewebvy : Uploading data over 3G uses almost 2x as much battery as downloading, especially when amt of data approaches 260k+ ‪#velocityconf‬
  • @souders : 25% of total time is DNS for CDN resources ‪#VelocityConf‬
  • @grigs : Global avg fixed latency is 125 and avg mobile is 290… global mobile consumer avg latency is 307.3 ms per Cisco – @guypod ‪#velocityconf‬
  • @jbarciauskas : Stability patterns: circuit breakers, timeouts, decoupling middleware, handshakes, test harnesses ‪#velocityconf‬ Some good basic engineering
  • @willmeyer : Word. RT @mikebrittain: "Avoid passive-aggressive snark." Always. ‪#velocityconf‬
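
Several of the quotes above name the circuit breaker as a stability pattern for risky integration points. For readers who haven't met it, here is a minimal sketch of the idea in Python. It is not taken from any of the talks: the class name, the `max_failures` and `reset_after` parameters, and the injectable `clock` are all assumptions made for illustration. After a run of consecutive failures the breaker "opens" and rejects calls immediately, then lets a single trial call through once a cool-down has passed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Illustrative circuit breaker; names and defaults are hypothetical."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast instead of hitting the broken dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens it.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping a remote call is then `breaker.call(fetch_profile, user_id)`, where `fetch_profile` stands in for whatever integration point you are protecting. A production version would also want thread safety and a bound on concurrent trial calls, which this sketch leaves out.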

Notes from Velocity Conf 2012 by Pablo Mercado

Talks - Slide decks or Video