Stuff The Internet Says On Scalability For January 11, 2013

Hey, it's HighScalability time:

  • 240,000,000,000 URLs : Wayback Machine; 743 billion : number of words Google analyzed to find that etaoin srhldcu are the most used letters in the English language
  • Quotable Quotes:
    • @actuallyshayne : Building cloud scalability is a lot like playing a tower defense game.
    • @traviskaufman : lesson of the day: there is a major difference between "scalability" and "overcomplication"
    • @deathmtn : Sometimes it seems like that storm of discussion about massive scalability has boiled down to "avoid JOINs and other multi-table queries."
    • @rbranson : we use c1.xlarges for user caching since they need to push so many req/sec.
    • @rbranson : we've got memcache instances in an AZ with no app instances and they can push 100K PPS across AZs fine.
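
The "etaoin srhldcu" ranking is, at heart, a plain frequency count. Here is a minimal Python sketch, assuming only the ASCII letters a-z matter. The function name letter_ranking is hypothetical, and this is not the pipeline used for the 743-billion-word analysis.

```python
from collections import Counter


def letter_ranking(text: str) -> str:
    """Return the letters a-z that occur in text, most frequent first (hypothetical helper)."""
    # Fold to lowercase and keep only ASCII letters; digits, punctuation and spaces are ignored.
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    # most_common() orders by count, highest first.
    return "".join(letter for letter, _ in counts.most_common())


print(letter_ranking("eeee ttt aa o"))  # → etao
```

Fed a large enough English corpus, the first dozen letters of the result are what the headline refers to.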

  • Gabriel Weinberg of DuckDuckGo on Orders of magnitude: I find framing things in orders of magnitude is a really useful way to measure progress and think about the future. Not much changes structurally if you grow by a factor of two; usually your technical and non-technical infrastructure can handle that kind of growth pretty easily. But when you grow by a factor of ten (an order of magnitude) something usually breaks. 
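
Weinberg's framing boils down to one logarithm: how many powers of ten separate two measurements of the same thing (queries per day, users, rows). Below is a small sketch. The function name is hypothetical, and the choice of what to measure is left to you.

```python
import math


def orders_of_magnitude(before: float, after: float) -> float:
    """Number of powers of ten between two positive measurements of the same metric."""
    if before <= 0 or after <= 0:
        raise ValueError("measurements must be positive")
    return math.log10(after / before)


# Doubling is only about 0.3 orders of magnitude; tenfold growth is a full one.
print(round(orders_of_magnitude(1_000_000, 2_000_000), 2))  # → 0.3
print(orders_of_magnitude(1_000_000, 10_000_000))           # → 1.0
```

By this measure, a factor-of-two jump barely moves the needle, while crossing a whole unit is the point where, in Weinberg's experience, something usually breaks.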

  • C++ and Beyond 2012: Herb Sutter - C++ Concurrency. A wonderful talk whose key message is: never block, ever. Wait-free algorithms are the gold standard of concurrency. Hear that, mutex? Truth: sharing is the root of all contention. Though I fear a future where reading source code will be about as useful as pasting binary file printouts to your forehead.
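
Sutter's talk is about C++ atomics and wait-free algorithms, which Python cannot express directly. The sketch below illustrates only the narrower "sharing is the root of all contention" point, using a word-count workload we assume for illustration. Each thread accumulates into private state, and the results are combined once at the end, so no lock is ever taken. This is not a wait-free algorithm. Also, under CPython's GIL the threads will not actually run in parallel; the structure is what carries over.

```python
import threading
from collections import Counter


def count_words_shared_nothing(chunks: list[list[str]]) -> Counter:
    """Count words across chunks, one thread per chunk, with no shared mutable counter."""
    partials: list[Counter] = [Counter() for _ in chunks]  # one private result slot per worker

    def worker(index: int) -> None:
        local = Counter()  # touched by this thread only: no lock, no contention
        for word in chunks[index]:
            local[word] += 1
        partials[index] = local  # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = Counter()  # merge once, after every worker has finished
    for part in partials:
        total.update(part)
    return total


print(count_words_shared_nothing([["a", "b", "a"], ["b"]])["a"])  # → 2
```

The alternative, every thread incrementing one shared counter under a mutex, serializes all workers on that lock. That is exactly the contention the talk warns about.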

  • An apt metaphor for firewall security: The tragic point of their strategy is that the Romans concentrated military force at the frontier. When the Germans attacked the frontier and got in behind the Roman troops, the whole Roman territory was open. Think of the empire as a cell, and barbarian armies as viruses: Once the empire's thin outer membrane was breached, invaders had free rein to pillage the interior.

  • If you are into nerd adventure travel, you may enjoy The Dynjeeling Unlimited Part 4: Finding A Data Center Home In India by Tom Daly. It's a really interesting story: just finding a data center can be an adventure, and finding one in the vastness of India is doubly so. **SPOILER**: We’re pretty sure we will be placing our first India POP in Mumbai on the west coast of the country...Mumbai is able to offer us a more stable power service than the other cities are able to.

  • Thumbs up to this relatively simple thumbnail generation system built on Thumbor and CloudFront: How Yipit Scales Thumbnailing With Thumbor and Cloudfront.
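
Setups like this work because Thumbor encodes the resize operation in the URL path. Every size of every image therefore gets its own URL, which a CDN can cache, and Thumbor only renders on a cache miss. The sketch below builds such URLs and rests on several assumptions:

- The CloudFront hostname is hypothetical.
- The image path is hypothetical.
- URL signing is disabled on the Thumbor server, so the "unsafe" path prefix is used.

Production Thumbor deployments normally sign URLs with an HMAC instead, and this is not Yipit's code.

```python
from urllib.parse import quote

# Hypothetical CloudFront distribution whose origin is a Thumbor server.
CDN_HOST = "https://d1234example.cloudfront.net"


def thumbnail_url(image_path: str, width: int, height: int, smart: bool = True) -> str:
    """Build a Thumbor-style resize URL served through the CDN.

    Assumes URL signing is disabled, hence the 'unsafe' prefix.
    """
    parts = ["unsafe", f"{width}x{height}"]
    if smart:
        parts.append("smart")  # ask Thumbor for smart (feature-aware) cropping
    # Percent-encode anything unusual in the image path, but keep slashes and colons.
    parts.append(quote(image_path.lstrip("/"), safe="/:"))
    return f"{CDN_HOST}/" + "/".join(parts)


print(thumbnail_url("images/deal-42.jpg", 300, 200))
# → https://d1234example.cloudfront.net/unsafe/300x200/smart/images/deal-42.jpg
```

Since the URL fully determines the output image, the CDN can cache each thumbnail indefinitely, and no separate thumbnail store is needed.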

  • Improving ticket spinlocks: Spinlocks, being the lowest-level synchronization mechanism in the kernel, are the target of seemingly endless attempts at performance enhancement. Now, though, some developers have identified a performance bottleneck associated with these locks and are busily trying to come up with an improved version. 
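
A ticket spinlock works like a deli counter. Each arriving thread takes a number from an atomic fetch-and-add counter and spins until the "now serving" value reaches its number, so waiters enter in strict arrival order. Below is a teaching model in Python, not something to use for real locking. It assumes CPython, where next() on an itertools.count is atomic under the GIL and stands in for the hardware fetch-and-add a kernel would use. The class name is hypothetical.

```python
import itertools
import threading
import time


class TicketLock:
    """Teaching model of a ticket spinlock (hypothetical; not for production use).

    Assumes CPython: next() on an itertools.count is atomic under the GIL and
    stands in for the atomic fetch-and-add a kernel ticket lock relies on.
    """

    def __init__(self) -> None:
        self._next_ticket = itertools.count()  # ticket dispenser ("fetch-and-add")
        self._now_serving = 0                  # the ticket number allowed to enter

    def acquire(self) -> None:
        my_ticket = next(self._next_ticket)    # take a number; arrival order is preserved
        while self._now_serving != my_ticket:  # spin until our number is called
            time.sleep(0)                      # yield the CPU; real spinlocks pause or back off here

    def release(self) -> None:
        # Only the current holder writes now_serving, so a plain increment suffices here.
        self._now_serving += 1
```

Because waiters enter strictly in ticket order, the lock is fair. The cost is that every waiter polls the same now-serving value. On real hardware, that means all the spinning CPUs hammer one cache line, which is the kind of contention the proposed improvements aim to reduce.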

  • Relational Model Considered Obsolete. I don't really understand how actors and the relational model relate, but the discussion is interesting. Also see Actor Model of Computation: Scalable Robust Information Systems.
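
For anyone who, like us, needs a refresher on what an actor is: an actor owns private state and changes it only in response to messages, which it takes one at a time from its mailbox. Below is a minimal Python sketch using a thread and queue.Queue. The CounterActor class and its ("add" / "get" / "stop") message format are hypothetical. Real actor systems add addressing, supervision and distribution on top.

```python
import queue
import threading


class CounterActor:
    """Minimal actor (hypothetical): private state, a mailbox, one thread handling messages in order."""

    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message: tuple) -> None:
        self._mailbox.put(message)  # asynchronous: senders never touch the state directly

    def _run(self) -> None:
        while True:
            kind, payload = self._mailbox.get()  # one message at a time, in arrival order
            if kind == "add":
                self._count += payload
            elif kind == "get":
                payload.put(self._count)  # reply through the caller's own queue
            elif kind == "stop":
                return


actor = CounterActor()
for _ in range(3):
    actor.send(("add", 1))
reply: queue.Queue = queue.Queue()
actor.send(("get", reply))
print(reply.get())  # → 3
actor.send(("stop", None))
```

Since only the actor's thread ever reads or writes _count, no locks are needed. The mailbox is the only shared structure.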

  • H.265 4K streams at 1080p video file sizes. The future will be televised and it will look good. Also, Netflix is building out its own CDN by deploying caches inside ISP networks.
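
The arithmetic behind the headline shows why this is a big deal. Assuming "4K" here means consumer 4K UHD (3840×2160) rather than the 4096-pixel-wide cinema format, a 4K frame has four times the pixels of a 1080p frame. With b denoting bits per pixel, matching a 1080p file size at the same frame rate and duration leaves the codec roughly a quarter of the bits for each pixel:

```latex
\[
\frac{3840 \times 2160}{1920 \times 1080} = \frac{8\,294\,400}{2\,073\,600} = 4
\qquad\Longrightarrow\qquad
b_{\text{4K}} \approx \tfrac{1}{4}\, b_{\text{1080p}}
\]
```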

  • Process Partitioning in Distributed Systems. Ryan Smith with a good description of how to start processing data in parallel using queuing and process partitioning. 
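
The core idea of process partitioning can be sketched as follows:

1. Hash each job's key to a fixed partition.
2. Give each partition its own queue and worker.

Every key is then handled by exactly one worker, in arrival order, with no coordination between workers. This is a minimal Python sketch in which threads stand in for processes. The partition count, the crc32 hash, and all the names are assumptions for illustration, not details taken from Ryan Smith's article.

```python
import queue
import threading
import zlib

NUM_PARTITIONS = 4  # hypothetical: one worker per partition


def partition_for(key: str, partitions: int = NUM_PARTITIONS) -> int:
    """Stable key -> partition mapping (crc32, unlike hash(), is the same across runs)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def run(jobs: list[tuple[str, int]]) -> dict[int, list[tuple[str, int]]]:
    """Route each (key, payload) job to its partition's queue and let one worker drain each queue."""
    queues = [queue.Queue() for _ in range(NUM_PARTITIONS)]
    processed: dict[int, list[tuple[str, int]]] = {p: [] for p in range(NUM_PARTITIONS)}

    def worker(p: int) -> None:
        while True:
            job = queues[p].get()
            if job is None:  # sentinel: no more work for this partition
                return
            processed[p].append(job)  # stand-in for real work; only this worker touches processed[p]

    workers = [threading.Thread(target=worker, args=(p,)) for p in range(NUM_PARTITIONS)]
    for w in workers:
        w.start()
    for key, payload in jobs:
        # Same key -> same queue -> same worker, so per-key order is preserved.
        queues[partition_for(key)].put((key, payload))
    for q in queues:
        q.put(None)
    for w in workers:
        w.join()
    return processed
```

To scale out, raise the partition count and run more workers. Per-key ordering is kept because a key never moves between partitions while the count stays fixed.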

  • Intel® 64 and IA-32 Architectures Optimization Reference Manual. Just in case you needed that extra performance boost.

  • Curt Monash introduces a possibly interesting database, GenieDB: The heart of the GenieDB story is probably wide-area replication.

  • There's more to graph processing than just a database says Marko Rodriguez in this informative video: Graph Systems and Databases.

  • Ksankar surveys some Big Data predictions for 2013 in Big Data Borgs, Rise of the Big Data Machines & Revenge of the Fallen Algorithms: learning machines, ARM-based servers, “… for every 25% increase in functionality of a system, there is 100% increase in complexity”, Big Data = all data, Big Data = [Business + Technology] [Requirements + Users], real-time architectures will be prominent, and “… big data is about sensing, algorithmic discovery and gaining deeper insight through data”.
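
Taken literally, the 25%/100% rule of thumb compounds quickly. Suppose it holds at every scale, so that complexity C is a power law in functionality F; that assumption is ours, not the source's. Each 25% step in F then doubles C, which pins down the exponent:

```latex
\[
C(1.25\,F) = 2\,C(F), \quad C(F) = a\,F^{k}
\;\Longrightarrow\; 1.25^{k} = 2
\;\Longrightarrow\; k = \frac{\ln 2}{\ln 1.25} \approx \frac{0.6931}{0.2231} \approx 3.11
\]
\[
C(2F) = 2^{k}\,C(F) \approx 8.6\,C(F)
\]
```

In other words, under this reading, doubling a system's functionality multiplies its complexity by roughly 8.6. Complexity grows a little faster than the cube of functionality.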