Stuff The Internet Says On Scalability For December 7, 2012

It's HighScalability Time:

  • Quotable Quotes:
    • Built to win: 4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests. Design, deploy, dismantle in 583 days to elect the President. 
    • @CarlosTheSailor: In modern terms, feudalism was a sort of scalability solution for the tribal system - @angel_m, starting from the beginning
    • @randybias: "Software-defined" is the new "cloud." Sprinkle it on your products along with an API and you *are* the future.
  • How can you resist a story about Lady Gaga and BigData? BigData magic helps turn her 31 million-plus Twitter followers and 51 million-plus Facebook followers into sales by creating more intimate communities of little monsters. While Twitter, Google, Apple, and Facebook are all concentrating on eviscerating the middleman, Lady Gaga wants to cut them all out of the action too. Reap and sow. Reap and sow.
  • Multi-Armed Bandit testing sounds so much cooler than A/B testing. And you always have a .000001% chance of hitting a jackpot.
  • Building Scalable Web Architecture and Distributed Systems. Kate Matsudaira has written an excellent survey of the subject, a nice resource to send someone who is trying to get the big picture in a single shot.
  • Networking ain't easy. GitHub has an awesome write-up of the networking problems that caused a little downtime. Some new switches went dumb and couldn't learn MAC addresses, which caused flooding, which saturated the links. There are good lessons on dealing with problems as well: work with vendors sooner, add more monitoring, don't get blinded by your initial diagnosis, and run more disaster scenarios.
  • There's a metric crap ton of interesting discussion in this thread: Go failed to solve significant problems (according to ex-googler). Nothing was resolved, of course, but we do get to see the various expectations we put on our programming languages. For every person's must-have feature there's another person saying meh. Then they argue about it.
  • When you're choosing between two systems like HBase and Cassandra, a short, to-the-point comparison of the two is exactly what you want to find.
  • VMware increased the capacity of their Application Performance Manager (APM) by 5x, reaching a throughput of up to 3M metrics per minute on a single VM, by replacing the metric processing and storage logic, applying several scalability design patterns, implementing various performance optimizations, utilizing off-the-shelf technologies, and keeping the road open for horizontal scalability later on.
  • Scalability is Easy! (To Get Wrong): When we set out to make a system scale, we need to identify the real scenarios we are trying to scale for and the bottlenecks that stand in our way. Blind performance tuning can look like an improvement, but is really just a poor short-term investment that often entrenches the current performance problems even more deeply.
  • Is it the Golden Age of APIs? And Lucy always tells Charlie this is the golden age of kicking the football.
  • Netflix Announces Blitz4j - a scalable logging framework: the Netflix logging infrastructure that helps Netflix achieve high-volume logging without affecting the scalability of its applications. Karthikeyan Ranganathan says to "Think of blitz4j as a simple framework on top of log4j to make log4j much more scalable for high traffic applications. Scribe is a log collector framework."
  • Web Advent with advice on Going from One to a Million Users. Money solves problems - let others manage your infrastructure. Start small - scale vertically when you need to grow up. Necessary complexity - separate out your system architecture by broad components and services. 
  • Greg geeks out with a review of Google's Processing a Trillion Cells per Mouse Click paper: And that's the point. Build a system that can answer 90% of the questions people ask of the logs in seconds. Build another that can answer 90% of the remaining, harder questions people ask of the logs in minutes. In another post Greg talks about Google's strategy of continually running experiments and how "The entire way you develop software, the entire way you develop product, changes when you are A/B testing everything constantly and continuously."
  • Werner Vogels on Cost-Aware Architectures: find the dimension you are going to make money over and then make sure that the architecture follows the money. Related to, but not quite the same as, Cloud Programming Directly Feeds Cost Allocation Back Into Software Design. Also related: AWS Re:Invent was Awesome! and videos from re:Invent.
  • Scaling up or out? A good discussion of whether to move up to a 16GB VM or add more 8GB VMs. With iowait times running high, adding more nodes is on the table.
  • Great interview with Daniel Abadi: On Big Data, Analytics and Hadoop. Talks eventual consistency, BigData, and Calvin, a system that scales transactional processing in database systems to millions of transactions per second while still maintaining ACID guarantees.
  • The human brain keeps a pain map. Maybe software should build more physically oriented models as well?
  • Antirez breaks down Twemproxy, a Redis proxy from Twitter. A proxy of this sort is something you put between your application and the centralized thingy you really wish was properly distributed. A discussion of the design, likes, and dislikes ends with the verdict: thumbs up.
  • Halloween has passed, but here are Three fundamental tricks for developers writing distributed systems: use a database-based queue to run jobs that propagate data to other systems; enqueue ids, not data; embrace idempotency.
  • Nice looking diagram of a Typical “Big” Data Architecture.
  • A Comparison and Critique of Eucalyptus, OpenNebula and Nimbus: In analyzing these various open-source cloud computing frameworks, we find that there are salient philosophical differences between them regarding the overall scheme of their design.
  • Are column stores really better at compression?: Column stores will almost always have a compression and/or compression performance advantage over row stores. Some excellent comments as well.
  • Amazon Plans Carefully Its Distribution Capacity Growth: Because Amazon opens so many warehouses — 40 in North America — the company believes in rapid build cycles, which allows it to build capacity as the company needs it. Amazon is challenged by ever growing sales, year-over-year growth of 35 percent, and hockey stick seasonal demand, which is 3X greater than non-seasonal demand. Amazon’s CFO has made it clear that he does not want to pay for unused capacity.
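Why is Multi-Armed Bandit testing more than a cooler name for A/B testing? A bandit shifts traffic toward the winning variant while the test is still running, instead of splitting traffic evenly until the end. Below is a minimal sketch of epsilon-greedy, one of the simplest bandit strategies (UCB and Thompson sampling are common alternatives). The conversion rates are simulated, and the function name `epsilon_greedy` is hypothetical, not taken from any of the linked articles.

```python
import random
from typing import List


def epsilon_greedy(true_rates: List[float], rounds: int,
                   epsilon: float = 0.1, seed: int = 0) -> List[int]:
    """Simulate an epsilon-greedy bandit; return how often each variant was shown."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms        # times each variant was shown
    rewards = [0.0] * n_arms    # conversions observed per variant

    for _ in range(rounds):
        if rng.random() < epsilon:
            # explore: show a random variant
            arm = rng.randrange(n_arms)
        else:
            # exploit: show the variant with the best observed conversion rate;
            # variants never shown score +inf so each one gets tried
            arm = max(range(n_arms),
                      key=lambda a: rewards[a] / pulls[a] if pulls[a] else float("inf"))
        # simulated visitor converts with the variant's hidden true probability
        if rng.random() < true_rates[arm]:
            rewards[arm] += 1
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(epsilon_greedy([0.02, 0.05, 0.10], rounds=10_000))
```

Unlike a fixed 33/33/33 split, most of the traffic ends up on the best-converting variant, while `epsilon` keeps a small budget for exploring the others in case the early estimates were wrong.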
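The GitHub item's chain of failure (switches that can't learn MAC addresses, then flooding, then saturated links) follows from how a learning switch forwards frames. It records which port each source MAC arrived on, sends frames for known destinations out that one port, and floods frames for unknown destinations out every other port. The toy model below is a sketch of that general mechanism only. It is not a model of GitHub's actual hardware, and the class and host names are hypothetical.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy Ethernet learning switch that counts the frames it transmits."""

    def __init__(self, ports: int, can_learn: bool = True) -> None:
        self.ports = ports
        self.can_learn = can_learn          # False simulates the broken switch
        self.mac_table: Dict[str, int] = {}
        self.frames_sent = 0

    def receive(self, in_port: int, src: str, dst: str) -> List[int]:
        # learn: remember which port the source MAC lives behind
        if self.can_learn:
            self.mac_table[src] = in_port
        out = self.mac_table.get(dst)
        if out is not None:
            egress = [out]                  # known destination: one port
        else:
            # unknown destination: flood out every port except the ingress port
            egress = [p for p in range(self.ports) if p != in_port]
        self.frames_sent += len(egress)
        return egress


def chat(switch: LearningSwitch, frames: int) -> int:
    """Hosts A (port 1) and B (port 2) take turns sending frames to each other."""
    for i in range(frames):
        if i % 2 == 0:
            switch.receive(1, "A", "B")
        else:
            switch.receive(2, "B", "A")
    return switch.frames_sent
```

On a 48-port switch, 100 frames between two hosts cost 146 transmissions when learning works (one initial flood, then unicast), but 4,700 when it doesn't. Every conversation on the network gets multiplied the same way, which is how links end up saturated.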
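A proxy like Twemproxy can spread keys across several Redis servers. One of its distribution options is ketama, a consistent hashing scheme. The sketch below shows generic consistent hashing with virtual nodes. It is not Twemproxy's exact ketama implementation, and the `HashRing` class and server names are hypothetical. The property it demonstrates is that adding a server only moves keys onto the new server and leaves every other key where it was.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # First 8 bytes of MD5, read as an integer position on the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (a sketch, not ketama itself)."""

    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []        # sorted ring positions
        self._owner: Dict[int, str] = {}    # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Each server occupies many points so load evens out across servers.
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        # A key belongs to the first server point clockwise from its hash,
        # wrapping around to the start of the ring.
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

With plain `hash(key) % n` sharding, going from three servers to four remaps most keys. On a ring, only the keys that now land on the new server's points move, roughly a quarter of them in this case.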
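A quick way to see why column stores tend to compress better is run-length encoding (RLE), one of several encodings column stores use. Storing a column's values next to each other puts repeated values side by side. A row layout interleaves unrelated fields, which breaks up the runs. The tiny table below is made up for illustration.

```python
from itertools import groupby
from typing import Any, List, Sequence, Tuple


def rle(values: Sequence[Any]) -> List[Tuple[Any, int]]:
    """Run-length encode: collapse consecutive equal values into (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


rows = [
    (1, "CA", "active"),
    (2, "CA", "active"),
    (3, "CA", "active"),
    (4, "US", "active"),
    (5, "US", "inactive"),
    (6, "US", "inactive"),
]

# Row store: fields are laid out row after row, so neighbors rarely repeat.
row_stream = [field for row in rows for field in row]

# Column store: each column is stored contiguously.
columns = list(zip(*rows))

if __name__ == "__main__":
    print("row layout runs:   ", len(rle(row_stream)))
    print("column layout runs:", sum(len(rle(col)) for col in columns))
```

Here the row layout yields 18 runs for 18 values, while the column layout needs only 10 (6 for the unique ids, 2 each for country and status). The gap widens with more rows, especially when columns are sorted or have few distinct values.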