Stuff The Internet Says On Scalability For August 22nd, 2014

Hey, it's HighScalability time:


Exterminate! 1,024 small, mobile, three-legged machines that can move and communicate using infrared laser beams.

  • 1.6 billion: facts in Google's Knowledge Vault built by bots; 100: lightning strikes every second
  • Quotable Quotes:
    • @stevendborrelli: There's a common feeling here at #MesosCon that we at the beginning of a massive shift in the way we manage infrastructure.
    • @deanwampler: 2000 machine service will see > 10 machine crashes per day. Failure is normal. (Google) #Mesoscon
    • @peakscale: "not everything revolves around docker" /booted from room immediately
    • @deanwampler: Twitter has most of their critical infrastructure on Mesos, O(10^4) machines, O(10^5) tasks, O(10^0) SREs supporting it. #Mesoscon
    • @adrianco: Dig yourself a big data hole, then drown in your data lake...
    • bbulkow:  I saw huge Go uptake at OSCON. I met one guy doing log processing easily at 1M records per minute on a single amazon instance, and knew it would scale.
    • @julian_dunn: clearly, running Netflix on a mainframe would have avoided this problem

  • Programming is the new way in an old tradition of using new ideas to explain old mysteries. Take the new Theory of Everything, doesn't it sound a lot like OO programming?: According to constructor theory, the most fundamental components of reality are entities—“constructors”—that perform particular tasks, accompanied by a set of laws that define which tasks are actually possible for a constructor to carry out. Then there's Our Mathematical Universe, which posits that the attributes of objects are the objects: all physical properties of an electron, say, can be described mathematically; therefore, to him, an electron is itself a mathematical structure. Any data modeler knows how faulty this conceit is. We only model our view relative to a problem, not universally. Modelers also have another intuition: that all attributes arise out of relationships between entities, and that entities themselves may not have attributes. So maybe physics and programming have something to do with each other after all?

  • Love this. Multi-Datacenter Cassandra on 32 Raspberry Pi’s. Over-the-top lobby theatrics are a signature Silicon Valley move.

  • Computation is all around us. Jellyfish Use Novel Search Strategy: instead of using a consistent Lévy walk approach, barrel jellyfish also employ a bouncing technique to locate prey. These large jellies ride the currents to a new depth in search of food. If a meal is not located in the new location, the creature rides the currents back to its original location. 

  • Two months early. 300k under budget. Building a custom CMS using a JavaScript-based Single Page App (SPA), a Clojure back end, and a set of small Clojure-based microservices sitting on top of MongoDB, hosted in Rackspace.

  • While Twitter may not fight against the impersonation of certain Journalism professors, it does fight spam with a large sword. Here's how that sword of righteousness was forged: Fighting spam with BotMaker. The main challenge: applying rules defined using their own rule language with a low latency. Spam is detected in three stages: real-time, before the tweet enters the system; near real-time, on the write path; periodic, in the background. The result: a 40% reduction in spam and faster response time to new spam attacks.

  • An architecture of small apps. A PHP/Symfony CMS called Megatron takes 10 seconds to render a page. Pervasive slowness leads to constant problems with cache clearing, timeouts, server spin ups and downs, and cache warmup. What to do? By way of an answer, an internal Yammer conversation weighing the different options is shared. The major question is whether to dump their CMS for a microservices-based approach. Interesting discussion that covers a lot of ground.

  • Microservices and PaaS - Part II. John Wetherill with a good summary of a talk by Adrian Cockcroft: Break Things Deliberately, No Manual Anything, Respect Human Attention Span, Denormalize like Crazy, Polyglot Persistence, Avoid Trunk Conflicts, One Service, One Manifest, Contain Everything, No State, Don't Name your Chickens, Create and Curate Access Libraries, Optimize the Interaction, Release the Monkeys. < So how is data kept consistent between services given denormalization?

  • Nice overview of what happened to Go at OSCON. Lots, it seems.

  • Lovingly detailed and quite useful. Atomic operations and contention: let’s talk about some of the primitives necessary to build useful systems on top of a coherent cache, and how they work. The way to get scalable multi-processor code is to avoid contention as much as possible, and to make whatever contention remains pass quickly – in that order. And to do a decent job at this, it’s important to know how cache coherency works (in broad strokes, anyway), what kind of messages cores exchange to maintain memory coherency, and when that coherency traffic happens. <  Even more goodness at Optimizing Software Occlusion Culling – index

  • Who doesn't love a good beer and architecture story? How One Developer Serves Millions of Beers: Untappd + Iron.io. Shows the power of queues + asynchronicity in giving the mobile user fast response times: Untappd has been able to reduce the average check-in time from over 7 seconds to 500ms.
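
The queue-plus-asynchronicity idea in the Untappd item above can be sketched roughly as follows. This is a minimal in-process illustration using Python's standard `queue` and `threading` modules, not Untappd's or Iron.io's actual code; the names `handle_checkin`, `worker`, and `run_demo` are hypothetical, and a real deployment would presumably use a hosted message queue rather than an in-memory one.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for a hosted message queue (assumption)
results = []           # stands in for the side effects of the slow work


def handle_checkin(user, beer):
    """Request handler: enqueue the slow work and respond right away."""
    jobs.put({"user": user, "beer": beer})
    return {"status": "accepted"}


def worker():
    """Background worker: drains the queue outside the request path."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel value: stop the worker
            jobs.task_done()
            break
        # Hypothetical slow work (badges, notifications, feeds) goes here.
        results.append(f"{job['user']} checked in {job['beer']}")
        jobs.task_done()


def run_demo():
    """Start a worker, submit one check-in, then shut the worker down."""
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    response = handle_checkin("alice", "stout")
    jobs.join()      # demo only: wait until the background work has finished
    jobs.put(None)   # ask the worker to stop
    t.join()
    return response
```

The point of the design is that the user-facing call only pays for the enqueue; everything slow happens later, off the request path.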

  • Ricky Ho explains the Lambda Architecture Principles in a very accessible fashion, "focus[sing] more in the underlying design principles and less in the choice of implementation technologies."

  • Reactive Frameworks, Microservices, Docker and Other Necessities for Scalable Cloud Native Applications. Good overview of cloud native: dynamic resource scaling; minimum assumption planning; economics driven adaptability; resiliency; incremental deployment; complete testability. Related technologies: Rx, Typesafe, NetflixOSS, Docker, CoreOS, Microservices architecture.

  • Pinterest on Stealthy shipping with atomic deploys: When a Pinner first visits the site, the backend server instructs the web browser to load a particular JavaScript bundle, and ensures the bundle matches the version of the backend software running on that server. 

  • Tom Livesey with Lessons learnt working with microservices: Don't do too much; Don't try and be too clever, keep things simple; Publish everything; Segregate and maintain data;  Standardise everything;  Maintain contracts; Cache everything. Benefits: building and deploying services is extremely quick and easy; we can easily use the best tool for the job or try out new technologies with minimal impact on the rest of the system; great for introducing members to the team.

  • Pinterest on Rebuilding the user typeahead. HBase was chosen as the storage solution for their new Contacts Service. They had to solve the typical problem that your average user has fewer than two hundred contacts, but some users have millions. The normal case is handled using a Wide Schema. The outlier case is handled using a Tall Schema. Updates are applied in near real-time. The results: 25ms or less.
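
For readers unfamiliar with the wide versus tall distinction in the Pinterest item above, here is a rough sketch of the general idea, with plain Python structures standing in for HBase tables. It is not Pinterest's schema: the `user_id:name` row key format and the function names are assumptions for illustration. In the wide layout a user's contacts are columns in a single row; in the tall layout each contact is its own row, and a lookup becomes a scan over a sorted key range.

```python
import bisect


def wide_lookup(wide_table, user_id, prefix):
    """Wide layout: one row per user, one column per contact name.

    The whole row is fetched and filtered, which is fine for small rows.
    """
    row = wide_table.get(user_id, {})
    return sorted(name for name in row if name.startswith(prefix))


def tall_lookup(tall_keys, user_id, prefix):
    """Tall layout: one row per contact, keyed as 'user_id:name' (assumed).

    tall_keys must be sorted, mimicking a store that keeps rows in key
    order, so a typeahead lookup only touches the matching key range.
    """
    start = f"{user_id}:{prefix}"
    i = bisect.bisect_left(tall_keys, start)
    matches = []
    while i < len(tall_keys) and tall_keys[i].startswith(start):
        matches.append(tall_keys[i].split(":", 1)[1])
        i += 1
    return matches
```

The sketch ignores details such as user ids that contain the separator character; it is only meant to show why a user with millions of contacts is better served by a range scan than by loading one enormous row.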

  • Scale Computing: infrastructure made simple: Imagine infrastructure that comes in a box with no costly VMware licenses, great support and good scalability. That is the idea behind Scale Computing. As one customer put it: “It’s not VMware – it’s better.”

  • Optimising the Unikernel. Thomas Leonard with a great step by step tour of low level performance profiling on Mirage. Don't use a debug build. It looks like profiling OCaml is a tricky prospect, but he did some magic and found a large data structure was serviced too often. Sometimes Xen is just slower than Linux. Used TCP checksum offload, HTTP fixed encoding, keeping multiple disk reads in flight and using optimal buffer sizes. Increased the download speed of a test service running on an ARM dev board from 2.46 MB/s to 7.24 MB/s.

  • Is it the Y2TCAM bug? Mike Palladino, Internap Director of Network Architecture, tells us: Over the next few years, millions of chassis will hit their physical limits. But you can’t just upgrade the supervisor module to the latest and greatest to get a few more years of runway — the entire chassis has to be replaced. 

  • PoolCounter:  a network daemon which provides mutex-like functionality, with a limited wait queue length. If too many servers try to do the same thing at the same time, the wait queue overflows and some configurable action will be taken by subsequent clients, such as displaying an error message or using a stale cache entry.
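
The behaviour described in the PoolCounter item above, a mutex with a capped wait queue and a fallback when the queue overflows, can be approximated in a single process like this. It is a sketch under stated assumptions, not PoolCounter's protocol or API: the real thing is a network daemon, while `BoundedWaitLock` and `render_page` are hypothetical names built on Python's `threading.Lock`.

```python
import threading


class BoundedWaitLock:
    """Mutex-like lock that turns callers away once too many are waiting."""

    def __init__(self, max_waiters):
        self._lock = threading.Lock()    # the mutex being protected
        self._state = threading.Lock()   # guards the waiter count
        self._waiters = 0
        self._max_waiters = max_waiters

    def acquire(self, timeout=None):
        """Return True if the lock was taken, False if the wait queue
        was full or the wait timed out."""
        if self._lock.acquire(blocking=False):   # fast path: nobody holds it
            return True
        with self._state:
            if self._waiters >= self._max_waiters:
                return False                     # queue overflow
            self._waiters += 1
        try:
            if timeout is None:
                return self._lock.acquire()
            return self._lock.acquire(timeout=timeout)
        finally:
            with self._state:
                self._waiters -= 1

    def release(self):
        self._lock.release()


def render_page(lock, cache, key, render):
    """Render under the lock; on overflow fall back to a stale cache entry
    or, failing that, an error message."""
    if lock.acquire(timeout=1.0):
        try:
            cache[key] = render()
            return cache[key]
        finally:
            lock.release()
    return cache.get(key, "error: too many concurrent renders")
```

Capping the queue is what keeps a thundering herd from piling up behind one expensive operation: late arrivals get a cheap answer instead of a long wait.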

  • Data Center Cooling Done Differently: Deep Water Desal proposes to mitigate the power consumption of desalination in a very creative way. Rather than reduce the power required to desalinate water, they proposed to co-locate up to 150MW of data center facilities on site and reduce the power required to cool the data center. Essentially the desalination plant and data centers would be symbiotic and the overall power consumption of the combination of the two plants together would be lower.

  • Singularity: a platform that enables deploying and running services and scheduled jobs in cloud infrastructures, providing efficient management of the underlying processes life cycle and effective use of the cluster resources. Singularity is an Apache Mesos framework. It runs as a task scheduler on top of Mesos Clusters taking advantage of Apache Mesos scalability, fault-tolerance, and resource isolation.

  • Using Reasoning about Knowledge to Analyze Distributed Systems by Joseph Halpern.  In summary, the recent progress in the field gives us reason to hope that reasoning about knowledge will prove to be an extremely useful tool in designing, analyzing, and understanding distributed systems. However, as the list above [and the list of problems given in Halpern (1986) in the general area of reasoning about knowledge] shows, there is much more work to be done. < More here.

  • Abstraction without regret in data management systems: This paper has argued for us to work at a greater level of abstraction, both in our research and in the systems that we build. There is a need to revisit data management research with an aim to identify fundamental principles and patterns. I have illustrated that this is a low hanging fruit even in the space of exploiting data locality.

  • Language Support for Loosely Consistent Distributed Programming by Neil Conway: This thesis explores how to aid developers of loosely consistent applications by providing programming language support for the difficulties they face. The language level is a natural place to tackle this problem: because developers that use loose consistency have fewer system facilities that they can depend on, consistency concerns are naturally pushed into application logic. In part, our goal has been to recognize, formalize, and automate application-level consistency patterns.