Stuff The Internet Says On Scalability For February 27th, 2015

Hey, it's HighScalability time:


Hear ye puny mortal. 1.3 million Earths doth fill our Sun. Whence comes this monster black hole with a mass 12 billion times that of the Sun?

  • 1 Terabit of Data per Second: 5G; 1.9 Terabytes: in-stadium customer data usage during the Super Bowl; 1 TB: free each month on BigQuery; 100x: reduced power consumption in radio chip

  • Quotable Quotes:
    • Robin Harris: But now that non-volatile memory technology - flash today, plus RRAM tomorrow - has been widely accepted, it is time to build systems that use flash directly instead of through our antique storage stacks. 
    • Sundar Pichai: That’s the essence of what we [Google] get excited about – working on problems for people at scale, which make a big difference in [people’s] lives. 
    • @timoreilly: Facebook is hacked 600,000 times a day. @futurecrimes First thing to do to protect yourself, turn on 2-factor authentication
    • @architectclippy: I see you have a poorly structured monolith. Would you like me to convert it into a poorly structured set of microservices?
    • Poppy Crum: Your brain wants as much as possible to come up with a robust actionable perception of the world and of the information and data that is coming in.
    • @BenedictEvans: Both Google and Facebook killing XMPP.  IM being euthanized just at the time messaging could become a third run-time for the internet
    • @dhh: 4-tier / micro-service architectures are organizational scaling patterns far more than they're tech. 1st rule of distributed systems: Don't.
    • @amcafee: Ex-Etsy seller: "In practical terms, scaling the handmade economy is an impossibility."
    • kurin: If you're behind a LB you can just drain the traffic to the hosts you're about to upgrade. Also, if you're above your SLA... I mean, some dropped queries aren't the end of the world.
    • @WSJ: Facebook’s 5,000+ staff generate $1.36 million each in annual revenue. The key to productivity is custom-built software tools
    • @jaykreps: Software is mostly human capital (in people's heads): losing the team is usually worse than losing the code.
    • Dylan Tweney: Mobile growth is huge, and could surge at least 3x in the next two years
    • Joe Davison: I learned that there is often more to business than meets the eye, and the only way to succeed is to plan ahead and anticipate all contingencies.
    • @etherealmind: Google published 30000 configuration changes to its network in 1 month 

  • What's different about AI this time around? Less hype, more data, more computation. The Believers: It was a stunning result. These neural nets were little different from what existed in the 1980s. This was simple supervised learning. It didn’t even require Hinton’s 2006 breakthrough. It just turned out that no other algorithm scaled up like these nets. "Retrospectively, it was just a question of the amount of data and the amount of computations," Hinton says.

  • What lesson did Ozgun Erdogan learn while working on a database at Amazon that never saw the light of day? How to Build Your Distributed Database (1/2): This optimized plan has many computations pushed down in the query tree, and only collects a small amount of data. This enables scalability. Much more importantly, this logical plan formalizes how relational algebra operators scale in distributed systems, and why. That's one key takeaway I had from building a distributed database before. In the land of distributed systems, commutativity is king. Model your queries with respect to the king, and they will scale.
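
The point about pushing computation down and relying on commutativity can be sketched with a distributed average. This is a minimal illustration, not code from the linked article: the names `Partial`, `local_aggregate`, and `distributed_avg` are hypothetical, and the shards are plain in-memory lists standing in for worker nodes. An average cannot be merged directly, but a (sum, count) pair can, and because the merge is commutative and associative the coordinator can combine partial results in any order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Small mergeable state that each shard sends to the coordinator."""
    total: float = 0.0
    count: int = 0

    def merge(self, other):
        # Addition is commutative and associative, so merge order is irrelevant.
        return Partial(self.total + other.total, self.count + other.count)


def local_aggregate(rows):
    """Runs on a shard: reduce all local rows to one Partial (the pushed-down step)."""
    return Partial(sum(rows), len(rows))


def distributed_avg(shards):
    """Runs on the coordinator: collect one Partial per shard and merge them."""
    merged = Partial()
    for rows in shards:
        merged = merged.merge(local_aggregate(rows))
    return merged.total / merged.count if merged.count else None


shards = [[1, 2, 3], [4, 5], [6]]
print(distributed_avg(shards))
print(distributed_avg(list(reversed(shards))))  # same answer in any shard order
```

Only one small record per shard crosses the network, no matter how many rows each shard holds.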

  • Replication for resiliency? Nature thought of that. Nibbled? No Problem: As first observed in the 1980s, some plants respond to being eaten by making more seeds, ultimately benefiting from injury in a phenomenon called overcompensation. More recently, Paige and postdoc Daniel Scholes suspected a role for endoreduplication, in which a cell makes extra copies of its genome without dividing, multiplying its number of chromosome sets, or “ploidy.”

  • Jim Starkey's NoSQL low-down: NoSQL requires all the intelligence to be on the client side, and that's the wrong place to put the intelligence. In NoSQL, data views are completely defined by the application program, so no two applications have the same view of the data. Imagine the nightmares that result when applications have different views of the data, and the data becomes inconsistent.

  • Not dead yet. Intel: Moore's Law will continue through 7nm chips.

  • How I used node-cache to speed up external API calls. Nice example of how easy it is to add a caching proxy using off-the-shelf tools.
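
The general idea of caching external API responses for a fixed time can be sketched as follows. This is not node-cache's API, which is a Node.js library; it is a rough Python sketch, and `TTLCache`, `get_or_fetch`, and `slow_api` are hypothetical names chosen for illustration.

```python
import time


class TTLCache:
    """Cache values for ttl_seconds; call fetch only on a miss or after expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested without sleeping
        self._store = {}            # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # fresh hit: skip the external call
        value = fetch(key)          # miss or stale: call the slow API
        self._store[key] = (now + self.ttl, value)
        return value


calls = []

def slow_api(key):
    """Stand-in for an external API call; records each invocation."""
    calls.append(key)
    return key.upper()


cache = TTLCache(ttl_seconds=60)
cache.get_or_fetch("user:1", slow_api)
cache.get_or_fetch("user:1", slow_api)   # served from cache
print(len(calls))
```

A production version would also need an eviction policy and some thought about concurrent misses on the same key.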

  • What does it take to log millions of requests every day? Kafka, MongoDB, S3, Amazon Redshift, RabbitMQ, a Pipeline Manager Server, and a Transporter Cluster that is "an array of Non Blocking Thrift servers for the sole purpose of receiving data from the web servers and routing them to any other component."

  • Pinterest is Open-sourcing PINCache. Good explanation of threading issues on iOS, whose fragmented and overly simplistic approach to threading makes for a huge number of possible problems. The problem is that the next generation of apps requires real-time OS techniques, and iOS at its roots is a simple single-threaded container.

  • This is no lie, it is A Comprehensive Guide to Building a Scalable Web App on Amazon Web Services - Part 1. Lots of good info.

  • Just what I was wondering. What’s the Significance of Docker’s Swarm Announcement Today?: "So today’s unveiling of Swarm is really about Docker having gotten very serious about how enterprises can deploy and manage containers at massive scale." Also, Deploying the Crate Distributed Database Using Mesos and Marathon.

  • Where is 5G going? Chetan Sharma says some of the performance goals for 5G under discussion are:  < 1 ms latency; 1000 times reduction in power consumption; Very high reliability; Deep indoor coverage; 30x higher device density; 10-100x connected devices; significantly higher security requirements.

  • Setting up service discovery in a simple and scalable way. 10x: Service Discovery at Clay.io. Synapse is a daemon which dynamically configures a local HAProxy configuration which routes requests to services within your cluster. It watches Amazon EC2 (or another endpoint) for services, specified in a configuration file. We use the ec2tag watcher, which easily lets us add servers to our cluster by tagging them.
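
The tag-watching idea described above can be sketched as a filter-and-render step. This is an illustration only: it calls neither Synapse nor the EC2 API, the instance list is a hard-coded stand-in for whatever a watcher would return, and `discover` and `render_backend` are hypothetical names. The output uses HAProxy's `backend` and `server ... check` configuration syntax.

```python
def discover(instances, tag_key, tag_value):
    """Return instances whose tags match, in a stable order."""
    matches = [i for i in instances if i["tags"].get(tag_key) == tag_value]
    return sorted(matches, key=lambda i: i["ip"])


def render_backend(name, instances, port):
    """Render an HAProxy backend stanza with one server line per instance."""
    lines = [f"backend {name}"]
    for n, inst in enumerate(instances, start=1):
        lines.append(f"    server {name}-{n} {inst['ip']}:{port} check")
    return "\n".join(lines)


# Hypothetical inventory; a real watcher would poll a cloud API instead.
inventory = [
    {"ip": "10.0.0.7", "tags": {"service": "api"}},
    {"ip": "10.0.0.9", "tags": {"service": "worker"}},
    {"ip": "10.0.0.5", "tags": {"service": "api"}},
]

print(render_backend("api", discover(inventory, "service", "api"), 8080))
```

Adding a server to the cluster then amounts to tagging it; the next poll regenerates the stanza and the proxy is reloaded.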

  • Veit Heller curates Awesome & Interesting Talks concerning Programming.

  • There are certainly similarities. Containers are the new static binaries. One big difference is static binaries just need a standard OS to run on. What's the standard runtime for containers? Some solid advice: Use descriptive versus script oriented orchestration; Colocate everything that is a dependency; Go Micro-Container; Build everything all the time; Know an application is a graph; Version everything.

  • I hope Kaiser is listening. Internet of DNA: The way the math works out, sharing data no longer looks optional, whether researchers are trying to unravel the causes of common diseases or ultra-rare ones. “There’s going to be an enormous change in how science is done, and it’s only because the signal-to-noise ratio necessitates it,” says Arthur Toga, a researcher who leads a consortium studying the science of Alzheimer’s at the University of Southern California. “You can’t get your result with just 10,000 patients—you are going to need more. Scientists will share now because they have to.”

  • Who will prove the provers? I remember thinking this in a class on proving programs correct. If I can't write perfect code how can I write a perfect proof? Aren't they both just code? Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken: Fast forward to 2006. I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug. 
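
The bug in question is the midpoint computation `(low + high) / 2`, which overflows a fixed-width integer when the two indices are large. Python integers do not overflow, so the sketch below simulates Java-style signed 32-bit arithmetic; `to_int32`, `mid_naive`, and `mid_safe` are illustrative helper names, not code from the linked post.

```python
def to_int32(x):
    """Wrap x to a signed 32-bit value, as fixed-width int arithmetic would."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def mid_naive(low, high):
    """(low + high) / 2 with 32-bit wraparound and truncating division."""
    s = to_int32(low + high)
    return -((-s) // 2) if s < 0 else s // 2


def mid_safe(low, high):
    """low + (high - low) / 2: the difference cannot overflow when 0 <= low <= high."""
    return low + (high - low) // 2


low, high = 2**30, 2**30 + 2
print(mid_naive(low, high))  # negative: the sum wrapped past 2**31 - 1
print(mid_safe(low, high))   # the true midpoint, 2**30 + 1
```

A negative midpoint used as an array index is what turns the arithmetic slip into an out-of-bounds failure, and only on arrays with about a billion elements or more, which is why testing missed it for so long.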

  • Troy Hunt with Stories from the trenches: Sizing and penny pinching with Azure websites: These results are all clearly just from one sample test, but what I can emphatically say is that time and again, I’d see high latency on response times, connections dropped and even total outage – sometimes for more than a minute – when deploying on a small instance. I’ve never seen loss of service or anything more than a slight and momentary service degradation on a medium instance. Not once.

  • Randy Bias on the meaning of Hyper-converged infrastructure: What is unique to hyper-converged is that a choice was made to drive more cost and labor efficiencies by trying to standardize and drive conformity in the system through extreme homogeneity...The key to understanding the flaws in HCI for larger production deployments is the very fact that the control plane and data plane have been collapsed into a single system. 

  • What do all of those 2-letter abbreviations mean from the top command? Here you go. Understanding Linux CPU stats
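
As a quick reference, the abbreviations on top's CPU line can be kept in a small lookup table and used to parse the line. This is a minimal sketch: the sample line is made-up illustrative data rather than real measurements, and `CPU_FIELDS` and `parse_cpu_line` are hypothetical names.

```python
import re

CPU_FIELDS = {
    "us": "user: time running un-niced user processes",
    "sy": "system: time running kernel code",
    "ni": "nice: time running niced user processes",
    "id": "idle",
    "wa": "iowait: time waiting for I/O to complete",
    "hi": "time servicing hardware interrupts",
    "si": "time servicing software interrupts",
    "st": "steal: time taken from this VM by the hypervisor",
}


def parse_cpu_line(line):
    """Extract {abbreviation: percentage} from a top-style CPU summary line."""
    pairs = re.findall(r"([\d.]+)\s+([a-z]{2})\b", line)
    return {abbr: float(val) for val, abbr in pairs if abbr in CPU_FIELDS}


# Illustrative sample only; the exact layout varies between top versions.
sample = "%Cpu(s):  5.9 us,  1.2 sy,  0.0 ni, 92.5 id,  0.3 wa,  0.0 hi,  0.1 si,  0.0 st"

for abbr, pct in parse_cpu_line(sample).items():
    print(f"{abbr} {pct:5.1f}%  {CPU_FIELDS[abbr]}")
```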

  • Core Data Concurrency & Maintaining a Silky Smooth UI. Clever use of Core Data to handle a high messaging load by assigning different functions to different tasks.

  • Get your C++ videos here. All videos of Meeting C++ 2014 are online!

  • It's not really about the technology. Settlement scaling and increasing returns in an ancient society: overall productivity of an individual working alone appears to have been relatively constant. As a result, any changes in per capita economic output over time are most likely traceable to changes in the size and density of social networks in larger settlements. 

  • Some pull quotes from Strataconf are available at Hardcore Data Science: 2015 California

  • RancherOS: The smallest, easiest way to run Docker in production at scale. Everything in RancherOS is a container managed by Docker. This includes system services such as udev and rsyslog.

  • John Carnahan on data science at Ticketmaster. First they used data to answer specific questions, such as why tickets remain unsold at an event. Now they want to use data to connect every layer of the stack and to link different stacks.

  • Year two of Papers We Love. There's a talk by Sargun Dhillon on VL2, a paper by Microsoft Research about computer networking. We talked about VL2 here. There's also a talk by Alex Rasmussen on Flat Data Storage.

  • Everything you need to know about HdrHistogram: A better latency capture method. Wonderful detail and elaboration. Definitely worth a look.

  • We have yet another way to execute functions across processes. gRPC: a brand new framework for handling remote procedure calls. It’s BSD licensed, based on the recently finalized HTTP/2 standard, and enables easy creation of highly performant, scalable APIs and microservices in many popular programming languages and platforms. Internally at Google, we are starting to use gRPC to expose most of our public services through gRPC endpoints as part of our long term commitment to HTTP/2.

  • Need a CDN? Read how Dan Rayburn breaks down How To Choose The Right CDN For Mid-Sized Customers In The SMB Market: If you’re a small customer and only have a few hundred dollars a month to spend, I would argue it really won’t matter who you pick. For mid-sized customers, picking and choosing the right CDN usually means looking past the big CDNs...For mid-sized customers, align yourself with mid-sized CDNs that specialize in CDN rather than a slew of non-related web services like web design or marketing.

  • Lock-Free Data Structures. The Evolution of a Stack: Back-off strategies are applicable everywhere in the design of lock-free algorithms. As a rule, each operation is enclosed in an endless loop on the principle of “keep doing it till we succeed”...Elimination back-off is a less general approach that is applicable to a stack, and, to a lesser extent, to a queue...Developed in recent years, flat combining is a special trend in building concurrent containers, both lock-free and fine grained lock-based.
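
The "keep doing it till we succeed" loop with back-off can be sketched on a Treiber-style stack. An important caveat: Python exposes no hardware compare-and-swap, so the `AtomicRef` class below emulates CAS with a lock. That means this sketch is not actually lock-free; it only illustrates the shape of the retry loop and of exponential back-off. All class names are hypothetical.

```python
import random
import threading
import time


class AtomicRef:
    """Emulated atomic reference; the lock stands in for a hardware CAS instruction."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class Backoff:
    """Randomized exponential back-off: the wait ceiling doubles after each failure."""

    def __init__(self, min_delay=1e-6, max_delay=1e-3):
        self.limit = min_delay
        self.max_delay = max_delay

    def pause(self):
        time.sleep(random.uniform(0, self.limit))
        self.limit = min(self.max_delay, self.limit * 2)


class Stack:
    def __init__(self):
        self.head = AtomicRef(None)

    def push(self, value):
        backoff = Backoff()
        node = Node(value)
        while True:                      # retry until the CAS succeeds
            old = self.head.get()
            node.next = old
            if self.head.compare_and_set(old, node):
                return
            backoff.pause()              # contention detected: wait before retrying

    def pop(self):
        backoff = Backoff()
        while True:
            old = self.head.get()
            if old is None:
                return None              # empty stack
            if self.head.compare_and_set(old, old.next):
                return old.value
            backoff.pause()
```

Backing off after a failed CAS reduces contention on the head pointer, at the cost of some added latency for the thread that lost the race.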

  • David Rosenthal with a Report from FAST15. He takes a look at a lot of the different papers presented.

  • eBay is Announcing Pulsar: Real-time Analytics at Scale: an open-source, real-time analytics platform and stream processing framework. Pulsar can be used to collect and process user and business events in real time, providing key insights and enabling systems to react to user activities within seconds. 

  • Nice explanation of Causal Consistency with examples. 

  • Quantifying the Performance of Garbage Collection vs. Explicit Memory Management: These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational collector with a noncopying mature space matches the performance of reachability-based explicit memory management. With only three times as much memory, the collector runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%.

  • Slab: Offheap Java POJOs with guaranteed memory alignment.