Stuff The Internet Says On Scalability For August 8th, 2014

Hey, it's HighScalability time:


Physicists reveal the scaling behaviour of exotic giant molecules.

  • 5 billion: Transistors Intel manufactures each second; 396M: WeChat active users
  • Quotable Quotes:
    • @BenedictEvans: Every hour or so, Apple ships phones with something around 2x more transistors than were in all the PCs on earth in 1995.
    • @robgomes: New client. Had one of their employees tune an ORM-generated query. Reduced CPU by 99.999%, IO by 99.996%.  Server now idle.
    • @pbailis: As a hardware-oriented systems builder, I'd pay attention to, say, ~100 ns RTTs via on-chip photonic interconnects
    • @CompSciFact: "Fancy algorithms are buggier than simple ones, and they're much harder to implement." -- Rob Pike's rule No. 4
    • @LusciousPear: I'm probably doing in Google what would have taken 5-8 engineers on AWS.
    • C. Michael Holloway, NASA: To a first approximation, we can say that accidents are almost always the result of incorrect estimates of the likelihood of one or more things.
    • Stephen O'Grady: More specific to containers specifically, however, is the steady erosion in the importance of the operating system. 

  • Wait, I thought mobile meant making single purpose apps? Mobile meant tearing down the portal cathedrals built by giants of the past. Then Why aren’t App Constellations working?: The App Constellation strategy works when you have a core resource which can be shared across multiple apps. 

  • Decentralization: I Want to Believe. The irony is mobile loves centralization, not p2p. Mobile IP addresses change all the time and you can't run a server on a phone. The assumption that people want decentralization has been disproven. Centralized services have won. People just want a service that works. The implementation doesn't matter that much. Good discussion on Hacker News and on Reddit.

  • Myth: It takes less money to start a startup these days. Sort of.  Why the Structural Changes to the VC Industry Matter: It turns out that, while it is in fact cheaper to get started and enter the market, it also requires more money for the breakout companies to win the market. Ultimately, today’s winners have a chance to be a lot bigger. But winning requires more money for geographic expansion, full-stack support of multiple new disciplines, and product expansion. And these companies have to do all of this while staying private for a much longer period of time; the median for money raised by companies prior to IPO has doubled in the past five years. 

  • Multipetabyte datasets getting you down? Then you may be asking What is Google Cloud Dataflow?: Cloud Dataflow allows you to build pipelines, monitor their execution, and transform & analyse data, all in the cloud. It’s multifunctional. It aims to address the performance issues of MapReduce when building pipelines. It’s good with big data. The coding model is pretty straightforward.  It “evolved” from Flume and Millwheel. 

  • This is also the Twitter problem, draining the energy from your developer ecosystem. Amazon’s Scorpion Problem: We think it's a very dangerous move on their part. Instead of encouraging innovation on their platform, they are actively discouraging it. Do you think we intend to build anything else on AWS? Of course not. Anything we do on AWS just gives them more information. In fact, we are moving off of AWS. EnduroSync was using DynamoDb as a core component, but no more. 

  • The Network is Reliable. Or is it? This article in the ACM contains an impressive list of things that have gone wrong in places like Google, Microsoft, GitHub, Amazon, HP, Fog Creek, Twilio, and PagerDuty. It's a rogues' gallery of split brains, routing loops, double failures, DDoS attacks, hardware failures, maintenance failures, power failures, and misconfigurations. Be careful out there.

  • Since Google is eschewing content-quality-based ranking, preferring lower-latency and SSL-only sites, Ilya Grigorik recommends a very sensible course of action: improve your TLS performance. And here's just the chapter to help: Chapter 4. Transport Layer Security (TLS). Great explanation of TLS and related complexities along with many performance improvement suggestions.

  • Parse is expanding its backend services (cloud storage, authentication, social network integration, push services, analytics) to the server side of the world, which could make it easier to build server-side code. But they are keeping the same resource limits, which doesn't make sense. Parse PHP SDK.

  • Is the glass half full or half empty? Or is there a glass at all? Parallelism and concurrency need different tools: I believe that concurrency and parallelism call for very different tools, and each tool can be really good at either one or the other.

  • Deploying Docker containers with Mesos and Marathon: See how they help deploy and manage Docker containers at scale and how the Mesos cluster scheduler builds highly-available, fault-tolerant web scale apps.

  • Taking over low-level bodily functions is a real step towards conscious control of the whole body. Imagine writing apps for this. Wirelessly Charged Microchip Opens Doors into 'Electroceutical' Devices: We envision that the powering method could pave the way for new generations of sensors and stimulators that can electrically treat some disorders in ways more effective than drugs.

  • Building Carousel, Part II: Speeding Up the Data Model. More nitty-gritty details on Dropbox's architecture: This accumulator and snapshot design works best because we can hold a user’s view model with metadata for all of their photos in memory. We can’t hold all of the user’s photos in memory at the same time, though, so we have a different solution designed to keep a window of the thumbnails in memory at any given time, fetching them around where the user is looking.
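
The "window of thumbnails" idea can be sketched in a few lines. This is only an illustration of the general technique, not Dropbox's code: the class name `ThumbnailWindow`, the `radius` parameter, and the `fake_fetch` loader are all hypothetical.

```python
class ThumbnailWindow:
    """Keep thumbnails only for indices within `radius` of the viewed position."""

    def __init__(self, fetch, total, radius=2):
        self._fetch = fetch      # hypothetical loader: index -> thumbnail bytes
        self._total = total      # number of photos in the user's collection
        self._radius = radius
        self._cache = {}

    def move_to(self, position):
        lo = max(0, position - self._radius)
        hi = min(self._total - 1, position + self._radius)
        wanted = range(lo, hi + 1)
        # Evict thumbnails that fell outside the window.
        for index in list(self._cache):
            if index not in wanted:
                del self._cache[index]
        # Fetch only thumbnails that are newly inside the window.
        for index in wanted:
            if index not in self._cache:
                self._cache[index] = self._fetch(index)

    def cached_indices(self):
        return sorted(self._cache)


fetched = []  # records which indices the loader was asked for


def fake_fetch(index):
    fetched.append(index)
    return f"thumb-{index}".encode()


window = ThumbnailWindow(fake_fetch, total=100, radius=2)
window.move_to(10)   # loads thumbnails 8..12
window.move_to(11)   # evicts 8, loads only 13
print(window.cached_indices())
```

Scrolling by one photo costs one fetch and one eviction, so memory stays bounded no matter how large the collection is.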

  • nbm: I'm not sure about how Tumblr might do it, but one can use a combination of ECMP and ipvs (with a consistent hash) to do the lower-layer L4 load balancing. This means that even if one of the L4 load balancers go down and the connections originally going to that L4 load balancer by the switch/router get moved to another one, the consistent hash to the L7 load balancer handling the request means the connections will not be reset (except in some interesting and less-frequent cases).
    This two-stage process also allows for good health checking from the much-simpler Linux ipvs L4 load balancer servers to the more-complicated L7 load balancer servers.
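
Why the consistent hash keeps connections alive can be shown with a toy model. This is a generic hash ring with virtual nodes, not the scheduler ipvs actually ships; the backend names, the `flow_key` format, and the `vnodes` count are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer; MD5 is stable across processes."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hash ring with virtual nodes; every L4 balancer builds the same ring."""

    def __init__(self, backends, vnodes=100):
        self._ring = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the backend owning the first ring point at or after the key."""
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def flow_key(src_ip, src_port, dst_ip, dst_port):
    """Identify a connection by its addresses and ports."""
    return f"{src_ip}:{src_port}->{dst_ip}:{dst_port}"


L7_BACKENDS = ["l7-a", "l7-b", "l7-c", "l7-d"]  # hypothetical L7 balancers

# Two independent L4 balancers build identical rings from the same list.
l4_one = ConsistentHashRing(L7_BACKENDS)
l4_two = ConsistentHashRing(L7_BACKENDS)

flows = [flow_key("203.0.113.7", 40000 + n, "198.51.100.1", 443) for n in range(1000)]

# If ECMP moves a flow from one L4 balancer to the other, the L7 target is unchanged.
same_choice = all(l4_one.lookup(f) == l4_two.lookup(f) for f in flows)

# Losing one L7 backend remaps only the flows that were pinned to it.
reduced = ConsistentHashRing([b for b in L7_BACKENDS if b != "l7-c"])
moved = [f for f in flows if l4_one.lookup(f) != reduced.lookup(f)]
only_c_moved = all(l4_one.lookup(f) == "l7-c" for f in moved)

print(same_choice, only_c_moved)
```

Because the mapping depends only on the flow and the shared backend list, any surviving L4 balancer sends an in-flight connection to the same L7 balancer, which is what keeps the connection from being reset.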

  • This is storage only; without compute, servers are still necessary. The Server Needs To Die To Save The Internet: “What we’re building is software that connects together all the computers on the network to form -- think of it as one giant computer, or effectively one giant cyber brain. So it really connects together all the nodes on the network and allows them to effectively become a very large datacenter, without of course the datacenter,” explains Lambert. “It’s a network infrastructure that will replace datacenters -- and hopefully large technology companies.”

  • Excellent explanation of Prepping for a Tsunami: Scaling with Amazon CloudFront. The setup isn't easy, but "we moved 1400% more data than we ever have before. Thanks to CloudFront, everything was perfectly stable with no external complaints!"

  • Forest Hydraulics: The maple forest becomes a super-organism, as each tree’s internal plumbing is connected to a larger, landscape-scale hydraulic infrastructure.

  • Just because it's open source doesn't mean it can't be sold. Open Source Dickishness on the sale of Express to StrongLoop. Are the project's "active maintainers and its community" really the most important thing? Or is the project the most important thing? Or are the people who create it the most important thing?

  • For those in a mood most literary. Great stuff from Noah Raford. Silicon Howl. A tech take on Allen Ginsberg’s classic poem “Howl”: Singularity! waving! roaring! Climate changing! carrying flowers! Down to the river! into the street!

  • Reading this gives you a feel for how primitive programming languages are as a medium of expression. We are always making more languages, but we aren't making different languages. The Secret Rules of Adjective Order: the eight categories shimmy into one magistral conga line: general opinion then specific opinion then size then shape then age then color then provenance then material.

  • Use of Formal Methods at Amazon Web Services: At AWS, formal methods have been a big success. They have helped us prevent subtle, serious bugs from reaching production, bugs that we would not have found via any other technique. They have helped us to make aggressive optimizations to complex algorithms without sacrificing quality.

  • Your computer is already a distributed system. Why isn’t your OS?: A modern computer is undeniably a networked system of point-to-point links exchanging messages: Figure 1 shows a 32-core commodity PC server in our lab. But our argument is more than this: distributed systems (applications, networks, P2P systems) are historically distinguished from centralized ones by three additional challenges: node heterogeneity, dynamic changes due to partial failures and other reconfigurations, and latency. Modern computers exhibit all these features.

  • Greg Linden with another blazingly fine set of Quick Links