Stuff The Internet Says On Scalability For June 29, 2012 - The Velocity Edition
Judging from the tweet flow, Velocity looked like a riotous good time. In this video on the main themes at Velocity, after a little microphone-enhanced violence, John Allspaw and Steve Souders identify resilience and automation as two of the big ideas behind building a faster and stronger web.
John says resiliency is the idea that we don't live in a perfect world, so trying to build perfect systems is counterproductive. We have to accept failure as a baseline and think in terms of degrees of availability. All abstraction layers leak, so every part of a system must be monitorable and open to introspection.
A focus on resilience means the web is growing up. Resilience has long been a requirement for "real" systems, so it's great to see the web thinking of itself as the complex system it has always been. For the Alpha and Omega on resilience you'll want to watch Dr. Richard Cook's inspiring talk on How Complex Systems Fail.
Here are some of the most enjoyable Quotable Quotes from Velocity:
- @guypod : LTE latency has roughly the same latency we had with dialup connections. 3G latency is akin to satellite... (@patmeenan at #velocityconf)
- @akucharski : Akamai produces 1.3 billion log lines every day! #velocityconf
- @mikeodea : Facebook: 6 billion mobile messages (!!) every 30 minutes #velocityconf
- @mmaretzke : #velocityconf Last ... mind-boggling ... Facebook facts: 3.8 trillion cache operations in 30 minutes! Unbelievable. Scaling Systems. 160m newsfeeds, 5bln realtime msgs, 10bln profile pics, 108 bln queries on mysql still 30 minutes
- @atseitlin : "If I were to do things over again, I would think about the cloud first" #velocityconf @yammer on migrating to the cloud
- @holydevil : RT @laraswanson: Swapping DNS architecture for a major site yielded ~3s of page load time decrease for a site. @tomdyninc at #velocityconf
- @mikeodea : Facebook: 3.8 trillion cache ops every 30 minutes. #velocityconf
- @Anselmo : BBC mobile page render costs ~15 Joules on a HTC Android phone, which is pretty fantastic. #velocityconf
- @atseitlin : Integration points are #1 risk to stability @mtnygard #velocityconf. That's why we built the circuit breaker pattern http://bit.ly/LNCVr4
- @RealGeneKim : @mtnygard: "Antipattern #1: integrations are #1 risk to stability: every process call can/will kill you; even db calls" #velocityconf
- @adrianco : #velocityconf preventing failure is ultimately less successful than responding quickly to it. That's what we use rollback for.
- @ramarob : Twitter is 45% done converting monolithic Rails to modular Java apps. So 4+ yrs pain for alleged initial gain of rapid dev? #velocityconf
- @jbarciauskas : Bulkhead pattern: partition the system, allow partial failure. Apply at different levels: thread pools, load balancers #velocityconf
- @jonathanklein : "Don't optimize the little things" - I've said it before and it's a popular phrase at #velocityconf - you must measure and focus on big wins
- @jhofmann : Facebook runs BGP all the way to their top of rack switches. One common protocol they can build tools around. #velocityconf
- @kenny_dee : Time between requests : 3 seconds to type a URL on Google Chrome, 15 seconds to select a search result on Google ! #velocityconf #webperf
- @souders : Good advice from @jrauser wrt data analysis: "Look at the extremes and you'll find things that are broken." #VelocityConf
- @SubmittedDenied : ~45% of twitter traffic now served off JVM stack #velocityconf
- @laraswanson : Crazy mobile energy consumption: Changing images on Amazon to JPEG format would save 20% joules; on Facebook it'd save 30% #velocityconf
- @CoteIndustries : But then as you go up to 20 racks, colo is better #velocityconf
- @jbarciauskas : Per rack, EC2 servers are 5x more expensive, but inc. other costs: network, load balancing, racking/power, manpower, is equal #velocityconf
- @adrianco : #velocityconf Netflix dev teams push code 10s of times a day and rollback is needed a few times a week.
- @dlutzy : Operations people are 1. Monitoring 2. Responding 3. Adapting 4. Learning to enable Resilience (Cook) #velocityconf
- @allspaw : "The normal world is not well-behaved." Dr. Richard Cook #velocityconf
- @ginablaber : Dr Cook: "Resilience in my field is a life/death question. We design for reliability, but what we want is resilience." #velocityconf
- @atseitlin : More Cook: How to design for resilience? Trust people, reveal controls, be transparent, foster learning. Good stuff. #velocityconf
- @RealGeneKim : Cook: "withstand transients, recovery swiftly/smoothly, prioritize to serve high level goals, recognize/respond.." #velocityconf
- @RealGeneKim : Twitter: "we do dark, rolling releases: only way we can gain confidence in releases, despite using iago in pre-prod" #velocityconf
- @RealGeneKim : Twitter: "outcomes: we can launch massive features in parallel: team org now matches sw stack; #velocityconf
- @jschauma : "The best disaster plans can be impacted by actual disasters." Mike Christian at #velocityconf
- @jonathanklein : Theo Schlossnagle says that eroding data granularity over time to save space is a mistake at #velocityconf
- @thewebvy : Uploading data over 3G uses almost 2x as much battery as downloading, especially when amt of data approaches 260k+ #velocityconf
- @souders : 25% of total time is DNS for CDN resources #VelocityConf
- @grigs : Global avg fixed latency is 125 and avg mobile is 290… global mobile consumer avg latency is 307.3 ms per Cisco – @guypod #velocityconf
- @jbarciauskas : Stability patterns: circuit breakers, timeouts, decoupling middleware, handshakes, test harnesses #velocityconf Some good basic engineering
- @willmeyer : Word. RT @mikebrittain: "Avoid passive-aggressive snark." Always. #velocityconf
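Several of the quotes above mention the circuit breaker pattern as a defense against failing integration points. For readers who haven't met it, here is a minimal sketch in Python. It is an illustration only, not the implementation behind the link in the tweet, and the names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_timeout` are all made up for this example. The idea: after a run of consecutive failures the breaker "opens" and rejects calls immediately instead of waiting on a dependency that is probably still down; once a timeout passes, it lets one trial call through to see whether the dependency has recovered.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    # Hypothetical helper for illustration; thresholds are arbitrary defaults.
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.clock = clock                  # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None               # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: fail fast without touching the dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens it.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever function talks to the remote service, and treat `CircuitOpenError` as a cue to serve a fallback. This sketch is single-threaded; a real service would need locking around the shared state.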
Notes from Velocity Conf 2012 by Pablo Mercado
Talks - Slide Decks or Video
- Velocity US 2012 Videos
- Velocity Slides
- Must See TV: Richard Cook, "How Complex Systems Fail"
- The Mobile Difference In Numbers by Guy Podjarny
- Beyond the Numbers by Baron Schwartz
- It's All About Telemetry by Theo Schlossnagle
- Understanding Hardware Acceleration on Mobile Browsers by Ariya Hidayat
- Mike Brittain, "Building Resilient User Experiences"
- John Rauser, "Investigating Anomalies"
- Arvind Jain & Dominic Hamon, "Predicting User Activity to Make the Web Fast"
- Time To First Tweet by @danwrong and @sayrer
- Rum for Breakfast by Philip Tellis
- Logging as Event Streams by Paul Querna
- Ben Galbraith & Dion Almaer, "The Performance of Web vs. Apps"
- Speed in the Opera Mobile Browsers by Luz Caballero
- Vik Chaudhary, "Broadening the User Perspective"
- Leveling Up by Kate Matsudaira
- Jesse Robbins, "Changing Culture & Being a force for Awesome"
- Patrick Lightbody, "Gathering Insights from Real User Monitoring"
- Lindsey Simon, "Lightning Demo"
- Down With the Fancy Pants! -- How People Have Been Optimizing the Wrong Things and Increased Complexity by Jan Schaumann
- The 90-Minute Mobile Optimization Life Cycle by Hooman Beheshti
- Simple Log Analysis and Trending by Mike Brittain