Stuff The Internet Says On Scalability For September 6, 2013

Hey, it's HighScalability time:


(Unidentified Ivy Bridge processor using 22 nanometer Tri-Gate transistors)

  • Quotable Quotes:
    • @pbailis: Big ups to AWS folks for following up re: all of my questions on cr1 provisioning. We saw a huge win moving from m1.xl to cr1.8xl
    • @rob_carlson: Packet switching via containers --> almost 8X increase in trade; what will #drones bring? What is optimal mesh size?
    • @mrtazz: “an Open Source, Clojure-based DevOps platform” congratulations, I now have no idea what you’re talking about
    • @KentBeck: If you can't make engineering decisions based on data, then make engineering decisions that result in data.
    • @cassandralondon: Cassandra on AWS SSDs - a perfect fit because you don't get write amplification 

  • If you think about it, a cloud, as a rule-driven, capability-rich environment accessible over a large-surface API, plays the same role for software that physics plays in biology. Software must specify every little detail. Biology relies on the laws of physics to do the work. A cloud provides the response to the call of programmers. Complex grunt work is just done, as if by nature. Peter M. Hoffmann: The amount of information contained in our DNA is staggering, but it is not nearly enough to specify each molecule’s or cell’s location, or even the shape of an organ. Rather than being a blueprint (as DNA is often mistakenly called), DNA is more like a cooking recipe. When I make a cake, I don’t have to specify where each starch or sugar molecule goes. I just follow the instructions, and the molecules go where they are supposed to. Much of the information to make a cake or a human being is contained in the laws of physics and chemistry. Molecules “know” how to put themselves together.

  • Vector 8: Sascha Segan on the fastest mobile networks. Is your LTE 5x5 or 20x20? Don't know what that means? This is an awesome explanation of how data networks really work. 4G is defined as a consistent connection with 8 Mbps down and 4 Mbps up. This is the point where a lot of friction goes away. HD video streaming with no buffering. Seamless, effortless file transfer. Web pages at the speed of your browser, not the speed of the internet.

  • Brandon Burton not only spoke at the Agile 2013 conference, he also took the time to create a great write-up on some of the talks. The maturity index of DevOps practices was interesting, as was the idea that Scrum without DevOps is like chocolate without peanut butter. And then there's the Art of the Three Ways: Flow, Feedback, Continual Experimentation and Learning. On this path DevOps Nirvana can be found.

  • FlySwat with a concise summary of Digital Ocean, Amazon Web Services, Windows Azure benchmarked and compared: Digital Ocean sells you a VPS. AWS and Azure sell you an ecosystem. < But it is a good price.

  • Twitter's Streaming MapReduce with Summingbird: a library that lets you write streaming MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms like Storm and Scalding.

  • Lots of interesting papers in the VLDB 2013 Program

  • You can now find all dotScale 2013 videos on YouTube.

  • If you are in a distributed frame of mind, then Why Cassandra Doesn't Need Vector Clocks is a great discussion. Correctness versus performance tradeoffs made all the more crazy by this stuff being so dang nuanced and complicated. To understand some of the tradeoffs, have a listen to A Distributed Systems Podcast on Causality. It's not causality in the metaphysical sense. There's no Aristotle, Locke or Descartes. It's a relatively accessible "discussion of causality, vector clocks, version vectors, and CRDTs."

  • DataFu, from LinkedIn, is an open-source collection of user-defined functions for working with large-scale data in Hadoop and Pig. LinkedIn also has a lot of technical papers you might find interesting.

  • Need a SQL query? Why reinvent it? Here's a huge list of different queries from Get It Done With MySQL 5&6. Nice.

  • We've Been Looking at Ant Intelligence the Wrong Way: Counter-intuitively, years of bottom-up research has revealed that ants do not integrate all this information into a unified representation of the world, a so-called cognitive map. Instead they possess different and distinct modules dedicated to different navigational tasks. These combine to allow navigation. 

  • Harish Ganesan with an excellent series of articles on Load Balancing in Amazon Web Services

  • Looks interesting. MaaS - Metal as a Service: brings the language of the cloud to physical servers. It makes it easy to set up the hardware on which to deploy any service that needs to scale up and down dynamically; a cloud being just one example.

  • Low-Latency Multi-Datacenter Databases using Replicated Commits: We show in this paper that it is possible to provide the same ACID transactional guarantees for multi-datacenter databases with fewer cross-datacenter communication trips, compared to replicated logging, by using a more efficient architecture. Instead of replicating the transactional log, we replicate the commit operation itself, by running Two-Phase Commit multiple times in different datacenters, and use Paxos to reach consensus among datacenters as to whether the transaction should commit.

  • Timescales, Symmetry, and Uncertainty Reduction in the Origins of Hierarchy in Biological Systems: We suggest that a primary driver of evolutionary change is the reduction of environmental uncertainty through the construction of dynamical processes with a range of characteristic time constants, or nested slow variables. Slow variables arise from mechanisms that naturally integrate over fast, microscopic dynamics. Slow variables, then, can be thought of as coarse-grained variables encoding statistics that are informative about the state of the system.

  • CPU Sharing Techniques for Performance Isolation in Multi-tenant Relational Database-as-a-Service: In this paper, we focus on the problem of effectively sharing and isolating CPU among co-located tenants in a multi-tenant DaaS. We show that traditional CPU sharing abstractions and algorithms are inadequate to support several key new requirements that arise in DaaS.
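
The Replicated Commits item above describes running Two-Phase Commit inside each datacenter and then using Paxos across datacenters to decide whether the transaction commits. A very rough sketch of that decision shape might look like the following. This is a toy model, not the paper's protocol: the function names and datacenter names are made up for illustration, the local 2PC is reduced to "every shard votes yes", and Paxos is approximated by a simple majority count, which is an assumption on my part.

```python
from typing import Dict, List


def local_two_phase_commit(shard_votes: List[bool]) -> bool:
    """Toy 2PC inside one datacenter: accept only if every shard votes yes.

    An empty vote list is treated as a refusal (assumption for this sketch).
    """
    return len(shard_votes) > 0 and all(shard_votes)


def replicated_commit(votes_by_datacenter: Dict[str, List[bool]]) -> bool:
    """Commit if a majority of datacenters accepted their local 2PC.

    The majority count stands in for Paxos consensus among datacenters;
    a real implementation would exchange messages and handle failures.
    """
    accepted = sum(
        1 for votes in votes_by_datacenter.values()
        if local_two_phase_commit(votes)
    )
    majority = len(votes_by_datacenter) // 2 + 1
    return accepted >= majority


# Hypothetical datacenters, each holding two shards touched by the transaction.
votes = {
    "dc-a": [True, True],
    "dc-b": [True, True],
    "dc-c": [True, False],  # one shard refuses, so dc-c rejects locally
}
decision = replicated_commit(votes)
```

With two of three datacenters accepting, `decision` is `True` in this model. The point of the shape is that the expensive coordination is the per-datacenter 2PC, which runs locally, while only the accept/reject outcome has to cross datacenter boundaries.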