Stuff The Internet Says On Scalability For April 8th, 2016

Hey, it's HighScalability time:


Time for a little drone envy. Sea Hunter, 132 foot autonomous surface vessel.

If you like this sort of Stuff then please consider offering your support on Patreon.

  • 12,000: base pairs in the largest biological circuit ever built; 3x: places GitHub data is now stored; 3.5x: Slack's daily user growth this year; 56 million: events/sec processed through BigTable; 100 Billion: requests per day served by Google App Engine

  • Quotable Quotes:
    • Horst724: #PanamaPapers is the biggest secret data leak in history. It involves 2,6 TB of data, a total of 11.5 million documents that have been leaked by an anonymous insider.
    • Amazon cloud has 1 million users and is near $10 billion in annual sales: Today, AWS offers more than 70 services for compute, storage, databases, analytics, mobile, Internet of Things, and enterprise applications. We also offer 33 Availability Zones across 12 geographic regions worldwide, with another five regions and 11 Availability Zones.
    • @CodeWisdom: "Give someone a program, you frustrate them for a day; teach them how to program, you frustrate them for a lifetime." - David Leinweber
    • @peterseibel: OH: it is amazing how many people reach for some complex distributed system when really all they need is a PC with 256 gigs of RAM in it.
    • @dschobel: once you realize that 1TB of ram costs ~$10k it changes your calculus for going distributed. I mean hopefully it does :)
    • @channingwalton: “The grid takes 8 hours so I’ll run it on my dev box, it’ll take 20 mins”, OH’d at a large bank
    • @noahsussman: First attempt at showing that CPU usage statistics of Web servers exhibit a 1/f spectral density. #devops #testing
    • @BenedictEvans: Tech spirals: Open/closed Client/server Search/curation Messaging/apps Document/service Bundle/unbundle Special/general purpose FB/Myspace
    • @Carnage4Life: Insider states Nest falling apart from constant death marches, no new products and missed revenue numbers. 😢
    • Catherine (Cat): Right then and there, I said loud, confidently, and clearly so everyone in the room could hear me, “I DON’T UNDERSTAND.”
    • @mathiasverraes: Reductionist models of Complex Adaptive Systems are usually more appealing & seductive than acknowledging complexity.
    • @grayj_: Elixir vs. Go: the initial learning curve with Go is amazing, the number of things I miss when not using Elixir is amazing.
    • Broad Institute: Our DNA sequencers produce more than 20 Terabytes (TB) of genomic data per day, and they run 365 days a year.
    • Storage Mojo: The IOPS illusion has replaced the capacity illusion as the major impediment to understanding today’s I/O requirements. Latency, not IOPS, is the gating factor in storage performance today.
    • Ario Gilbert: This move [bricking Revolv] by Google opens up an entire host of concerns about other Google hardware.
    • Erik Darling: Every time I think of a place where someone could stick a scalar function into some SQL, it ends up killing parallelism. Now it’s just sad.
    • The Codist: I chose programmer because it was easier. Today I now realize how wrong I was despite all the great stuff I’ve been able to work on and ship over the past 20 years. Going towards the CTO/CIO/VP Engineering route, which was fairly new back then, would have been a much better plan.
    • dforrestwilson1: rotating green units in every 9-12 months and veteran units out is a recipe for failure. They have to relearn everything and rebuild any trust you build up with local leaders. The closest we got to that was 2 year deployments of reservists, which was effective in Iraq.
    • @drew_firment: #serverless … no servers = joy / testing = pain / logging = different / 3rd party APIs = be smart / scaling = dream

  • A good way to put it. The Man [Nvidia] Selling Shovels in the Machine-Learning Gold Rush. Nvidia has produced a $2 Billion Chip to Accelerate Artificial Intelligence. It’s called the Tesla P100 and if you put $1000 down now you can buy it in a few years. Not really. Moving from graphics processing to AI processing seems like an excellent product extension into the next big thing. Size matters in neural networks, and it could support networks that are 30x larger because it’s a beast: 15 billion transistors, “roughly three times as many as Nvidia’s previous chips...an artificial neural network powered by the new chip could learn from incoming data 12 times as fast as was possible using Nvidia's previous best chip.”

  • Couldn’t agree more. Why I love ugly, messy interfaces. Long live UIs that actually let you do something.

  • Is cloud or on-premise cheaper? It turns on whether you optimize your architecture to work in sympathy with the cloud. The Broad Institute: the cost of running the Genome Analysis Toolkit (GATK) best practices pipeline on a 30X-coverage whole genome was roughly the same as the cost of our on-premise infrastructure. Over a period of a few months, however, we developed techniques that allowed us to really reduce costs: We learned how to parallelize the computationally intensive steps like aligning DNA sequences against a reference genome. We also optimized for GCP’s infrastructure to lower costs by using features such as Preemptible VMs. After doing these optimizations, our production whole genome pipeline was about 20% the cost of where we were when we started, saving our researchers millions of dollars, all while reducing processing turnaround time eight-fold.

  • Sharpen up your neural nets, there's a new training dataset for deep learning on images: YFCC100M (Yahoo Flickr Creative Commons 100 Million Dataset). For the value of challenge datasets in advancing the state of the art see Jeff Dean On Large-Scale Deep Learning At Google. Great in-depth article in the Communications of the ACM: YFCC100M: The New Data in Multimedia Research: There are 68,552,616 photos and 418,507 videos in the dataset users have annotated with tags, or keywords...There are 48,366,323 photos and 103,506 videos in the dataset that have been annotated with a geographic coordinate... The YFCC100M dataset includes a diverse collection of complex real-world scenes.

  • Apple Introduces Their Answer To The Raspberry Pi | Hackaday. I totally fell for this. It had a whimsy and spontaneity. April fools.

  • Software Guard Extensions. While not Homomorphic encryption, this is an interesting new security technology talked about by Justine Sherry in Packet Pushers episode Why We’re Stuck With Middleboxes And How To Improve Them. The idea: everything in memory is encrypted. The processor decrypts data before it operates on it and as soon as it wants to put anything back into cache or memory it encrypts it again. So even if someone has physical access to the box a lot of fancy attacks won't work. The only time data is decrypted is while it's being processed by the CPU, so you have to crack the CPU to see registers. Breaking into the CPU is harder and all you get are the values of registers.

  • We have a winner: Open Source is losing, SaaS is leading, APIs will win…. Not sure about this. Unless APIs are yoked to a business model they are a source of technical debt and financial debt from the moment they start being used. If anyone trusts an API right now I'd be surprised. Creative destruction takes a hard toll on APIs.

  • Debugging Distributed Systems. Excellent overview of all the challenges of debugging distributed systems. Problems like heterogeneity, concurrency, distributed state, and partial failures. Attempts at solutions like testing, model checking, theorem proving, record and replay, tracing, log analysis, and visualization. What really shines is ShiViz, a tool for studying executions of distributed systems, which "displays the happens-before relation. Given event e at node n, the happens-before relation indicates all the events that logically precede e." The time-space diagram is a huge step up from multiple terminals running grep on logs. It also has a diff feature for comparing program runs. That would be handy when you are wondering what the heck is different this time.
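
To make the happens-before relation concrete, here is a minimal sketch, assuming every logged event carries a vector clock. The function names and the sample events are made up for illustration; this is not ShiViz's code. An event a happens before b when a's clock is less than or equal to b's in every component and strictly less in at least one.

```python
from typing import Dict, List

# A vector clock maps a node name to that node's logical counter.
VectorClock = Dict[str, int]


def happens_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a logically precedes b (a <= b component-wise and a != b)."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less


def predecessors(target: str, events: Dict[str, VectorClock]) -> List[str]:
    """All events that logically precede `target`, sorted by name."""
    e = events[target]
    return sorted(
        name for name, clock in events.items()
        if name != target and happens_before(clock, e)
    )


# Hypothetical execution: node A sends a message that node B receives as b2.
EVENTS: Dict[str, VectorClock] = {
    "a1": {"A": 1},           # A does local work
    "a2": {"A": 2},           # A sends a message to B
    "b1": {"B": 1},           # B does local work, concurrent with a1 and a2
    "b2": {"A": 2, "B": 2},   # B receives A's message
    "c1": {"C": 1},           # C never talks to anyone
}

print(predecessors("b2", EVENTS))
```

Events that are unordered in both directions, like a1 and b1 here, are concurrent, which is exactly the information a pile of grep output hides.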

  • How much are you paying your cloud provider for bandwidth today? What are the historical transit pricing trends? IP transit pricing has plummeted from $1,200 per Mbps in 1998 to $0.63 per Mbps in 2015, while internet traffic typically increases by more than 50% per year.
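
A quick back-of-the-envelope sketch of what those two numbers imply, assuming a smooth compound decline between the 1998 and 2015 prices and traffic growth of exactly 50% per year (the source only says "more than 50%"). The helper name is made up.

```python
def annual_rate(start: float, end: float, years: int) -> float:
    """Compound annual rate of change that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


# $1,200 per Mbps in 1998 down to $0.63 per Mbps in 2015.
price_rate = annual_rate(1200.0, 0.63, 2015 - 1998)

# Assumption: traffic grows exactly 50% per year.
traffic_growth = 0.50

# Year-over-year change in a transit bill if both trends hold.
spend_factor = (1 + traffic_growth) * (1 + price_rate)

print(f"transit price change per year: {price_rate:.1%}")
print(f"bill multiplier per year at 50% traffic growth: {spend_factor:.2f}")
```

Under these assumptions the price drops by roughly 36% a year, so a transit bill stays close to flat even as traffic grows by half each year.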

  • Big-O notation is not the whole story. Implementation matters, as shown in How to Make C++ Code Run 90x Faster. The battle is between std::list and std::vector, both O(n*n), yet the pointer-chasing list version is much slower. The cause: the way modern computers cache memory. There is every indication that this difference will only grow if computers keep evolving the same way.

  • How are Google's datacenters protected? Google shares data center security and design best practices. It's a layered approach. Lots of guards, alarms, chains of custody, etc. Google runs them, not third parties. No mention of Cerberus, which is surprising; a good dog is the best protector. What's different from the past, not surprisingly, is the use of machine learning: Over the past couple years we've developed this algorithm and trained it with billions of data points from our sites all over the world. We now use this machine learning model to help visualize the data so the operations teams can set up the data center electrical and cooling plants for the optimal, most efficient performance on any given day considering up to 19 independent variables that affect performance. This helps the team identify discontinuities or efficiency inflection points that aren't intuitive.

  • Resilient ad serving at Twitter-scale: The technique we’ve outlined uses concepts from control theory to craft a control variable called quality factor, which is then used by the ad server in achieving the stated goals around resiliency (availability), scalability, resource-utilization, and revenue-optimality.
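
The quote does not spell out how the quality factor is computed, so the following is only a guess at the general shape of such a feedback loop, not Twitter's method: a proportional controller that lowers a hypothetical quality factor when the observed success rate falls below a target, and an ad server that scores fewer candidates as the factor drops. Every name, gain, and threshold here is an assumption.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Keep value inside [low, high]."""
    return max(low, min(high, value))


def next_quality_factor(
    q: float,
    observed_success: float,
    target_success: float = 0.99,  # assumed availability goal
    gain: float = 2.0,             # assumed controller gain
    q_min: float = 0.1,
    q_max: float = 1.0,
) -> float:
    """Proportional control: move q in the direction of the error."""
    error = observed_success - target_success
    return clamp(q + gain * error, q_min, q_max)


def candidates_to_score(q: float, available: int) -> int:
    """Do less work per request when q is low, but always score at least one ad."""
    return max(1, round(q * available))


# Under load the success rate drops to 90%, so the server sheds work.
q = next_quality_factor(1.0, observed_success=0.90)
print(q, candidates_to_score(0.5, 1000))
```

The appeal of this kind of loop is that the server degrades gradually, trading some ad quality for availability instead of timing out.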

  • No flying cars and Brian Hayes asks Where’s My Petabyte Disk Drive?: Extrapolating the steep trend line of the past five years predicts a thousandfold increase in capacity by about 2012; in other words, today’s 120-gigabyte drive becomes a 120-terabyte unit...None of that has happened. The biggest drives in the consumer marketplace hold 2, 4, or 6 terabytes.

  • A programming language for living cells. Very cool of course, but the interesting thing is they didn't invent a new programming language as you might expect. They based their language on Verilog, a "hardware description language (HDL) used to model electronic systems." It's what hard core chip designers use. The researchers designed computing elements such as logic gates and sensors that can be encoded in a bacterial cell's DNA. What accounts for biology and computing having so unreasonably much in common?

  • How is Improving Video Playback on Android. The optimization they wanted was to stream video as the video file is being downloaded. Their solution: "have the MediaPlayer interact with a local proxy server instead, and have the proxy server serve the byte stream to the media player while also storing it into the disk cache." All was not smooth sailing and they explore a few unexpected problems, like artificial one second delays in older players. Their overall strategy: ship the simple thing first, make it stable, then make data-driven improvements.
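
The core of the proxy trick is a tee: each chunk read from the network is handed to the player and written to the cache in the same pass. Here is a minimal sketch of just that step, not the Android code from the article; the function name, chunk size, and in-memory streams are stand-ins.

```python
import io
from typing import BinaryIO, Iterator


def tee_stream(
    upstream: BinaryIO, cache: BinaryIO, chunk_size: int = 64 * 1024
) -> Iterator[bytes]:
    """Yield chunks for the player while also writing each one to the cache."""
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:  # end of the download
            break
        cache.write(chunk)  # store for later replays
        yield chunk         # serve to the player right away


# Stand-ins for the network response and the disk cache file.
video_bytes = b"x" * 150_000
upstream = io.BytesIO(video_bytes)
cache = io.BytesIO()

served = b"".join(tee_stream(upstream, cache))
print(len(served), len(cache.getvalue()))
```

Playback can start after the first chunk arrives, and once the stream ends the cache holds a complete copy for the next play.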

  • WhatsApp added Axolotl encryption to WhatsApp for iOS. If you are curious what that means, Christine Corbett Moran wrote a colorful Illustrated Primer on the Axolotl Protocol.

  • Are you original? The surprising habits of original thinkers. Seems appropriate for an industry that prides itself on continual disruption. The themes are creative procrastination, energizing doubt, motivating fear, and trying a lot because that increases your chances of producing something great. Some highlights: Procrastinating is a vice when it comes to productivity, but it can be a virtue for creativity. What you see with a lot of great originals is that they are quick to start but they're slow to finish...the first-mover advantage is mostly a myth...there are two different kinds of doubt. There's self-doubt and idea doubt. Self-doubt is paralyzing...idea doubt is energizing. It motivates you to test, to experiment, to refine...What about fear? Originals feel fear, too. They're afraid of failing, but what sets them apart from the rest of us is that they're even more afraid of failing to try...the greatest originals are the ones who fail the most, because they're the ones who try the most...One of the best predictors is the sheer volume of compositions that they generate.

  • FPGA PRET Accelerators of Deep Learning Classifiers for Autonomous Vehicles. Interesting work on getting deep learning networks onto FPGAs. FPGAs are the target because they are predictable and timing friendly. The goal is to preemptively bridge the gap between computation and control in order to solve problems like those that occurred in the F16. The digital design of the flight control system introduced randomness. The system became untestable. When you push performance you are bound to run into these problems. Neural networks are the surface layer of the computation process and along the stack there's no notion of real-time. That's the tricky part of designing high-performance control. To control something we need a notion of time.

  • We have a Greg sighting. Quick Links.

  • Today's hit of well earned wisdom. 12 Years, 12 Lessons Working at Thoughtworks: Tools don’t replace thinking; Agile “transformations” rarely work unless the management group understands its values; Safety is required for learning; Everyone can be a leader; Architects make the best decisions when they code; Courage is required for change; Congruence is essential for building trust; Successful pair programming correlates with good collaboration; Multi model thinking leads to more powerful outcomes; Appreciate that everyone has different strengths; Learning is a lifelong skill; Happiness occurs through positive impact.

  • If you want to know how Google keeps their hugeness up and running there’s a new book on that: Site Reliability Engineering: How Google Runs Production Systems. For an introduction see Lessons from a Google App Engine SRE on how to serve over 100 billion requests per day.

  • Moneyball is one of those ideas that appeals to geeks. Using data to find value overlooked by others. But it really hasn’t produced winners in Baseball or pharma. Why Moneyball Failed in the Pharmaceutical Industry: Pearson’s approach wasn’t as sound as Beane’s—there is little evidence to show that slashing R. & D. and jacking up prices maximizes profits in the long term.

  • Wonderful article, both broad and detailed, on the history of Power Water Networks: During the second half of the nineteenth century, water motors were widely used in Europe and America. These small water turbines were connected to the tap and could power any machine that is now driven by electricity. Great site in general.

  • AWS Lambda: a few years of advancement and we are back to stored procedures. Not really. Stored procedures run in the context of the database, usually clogging up the database so it can’t get any other work done. Lambda doesn’t do that. It uses services which are by definition supposed to be scalable. It can’t choke off anything.

  • Let he who is without sin cast the first electronically signed binary. Code Let Lottery Vendor Predict Winning Numbers: A forensic examination found that the generator had code that was installed after the machine had been audited by a security firm that directed the generator not to produce random numbers on three particular days of the year if two other conditions were met.

  • Army unveils far-reaching network strategy: The document focuses on five main focus areas, including dynamic transport, computing and edge sensors, data to decisive action, human cognitive enhancement, robotics and autonomous operations and cybersecurity and resiliency.

  • eBay on Running Arbitrary DAG-based Workflows in the Cloud. The problem: how do you execute thousands of arbitrary workflows concurrently in the cloud? The solution is nicely described and consists of an Input Queue Manager, Workflow Manager, Task Scheduler, Load Balancer, and Node Executor. Execution graphs are defined using GraphML.
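
The scheduling idea underneath any DAG workflow engine is to run a task only after everything it depends on has finished, and to run independent tasks in parallel. A single-process sketch of that idea follows. It is not eBay's design: the graph is a plain dict rather than GraphML, the task names are invented, and a thread pool stands in for the Task Scheduler and Node Executors.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Iterable, List


def run_workflow(
    deps: Dict[str, Iterable[str]],
    task: Callable[[str], None],
    workers: int = 4,
) -> List[List[str]]:
    """Run a DAG wave by wave. `deps` maps each task to the tasks it waits on.

    Returns the waves in the order they ran.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # tasks whose inputs are done
            list(pool.map(task, ready))         # run the wave, wait for it
            sorter.done(*ready)                 # unlock their dependents
            waves.append(ready)
    return waves


# Hypothetical workflow: two independent steps fan out and join again.
DEPS = {
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

executed: List[str] = []


def demo_task(name: str) -> None:
    executed.append(name)  # a real task would do work here


waves = run_workflow(DEPS, demo_task)
print(waves)
```

Running whole waves is the simplest policy. A production scheduler would more likely release each dependent as soon as its own inputs finish instead of waiting for the slowest task in the wave.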

  • How do you get notified when a rack is about to overheat and potentially fail? One way is with complex event processing (CEP), which continuously matches incoming events against a pattern. Here’s an explanation and code example: Introducing Complex Event Processing (CEP) with Apache Flink.
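
To show what "matching events against a pattern" means without pulling in Flink, here is a hand-rolled sketch in plain Python. The pattern is an assumption chosen for illustration: two consecutive hot readings from the same rack within a time window. The event type, threshold, and window are all made up, and none of this is the Flink CEP API.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class TemperatureEvent(NamedTuple):
    rack_id: int
    timestamp: float  # seconds
    celsius: float


def overheating_warnings(
    events: Iterable[TemperatureEvent],
    threshold: float = 100.0,  # assumed "too hot" level
    window: float = 10.0,      # assumed max gap between the two readings
) -> List[Tuple[int, float, float]]:
    """Return (rack_id, first_ts, second_ts) for each matched pair.

    Events are expected in timestamp order.
    """
    last_hot: Dict[int, TemperatureEvent] = {}  # partial match per rack
    warnings: List[Tuple[int, float, float]] = []
    for ev in events:
        if ev.celsius < threshold:
            last_hot.pop(ev.rack_id, None)  # a cool reading resets the match
            continue
        prev = last_hot.get(ev.rack_id)
        if prev is not None and ev.timestamp - prev.timestamp <= window:
            warnings.append((ev.rack_id, prev.timestamp, ev.timestamp))
        last_hot[ev.rack_id] = ev  # this reading may start the next match
    return warnings


STREAM = [
    TemperatureEvent(1, 0, 95),
    TemperatureEvent(2, 0, 105),
    TemperatureEvent(1, 1, 101),
    TemperatureEvent(1, 2, 103),   # second hot reading on rack 1: match
    TemperatureEvent(1, 3, 90),    # rack 1 cools down, match state resets
    TemperatureEvent(1, 4, 102),
    TemperatureEvent(2, 20, 106),  # hot again, but outside the window
]

print(overheating_warnings(STREAM))
```

A CEP engine keeps this kind of per-key partial-match state for you and lets you declare the pattern instead of coding the state machine by hand.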

  • Why we [King] chose Akka for our Cloud Device solution: Quasar and Akka are very close in our evaluation. However, since Akka is more mature, it broke the deadlock, and we decided to choose Akka.

  • Elassandra: cassandra + elasticsearch. A fork of Elasticsearch modified to run on top of Apache Cassandra in a scalable and resilient peer-to-peer architecture. Elasticsearch code is embedded in Cassandra nodes providing advanced search features on Cassandra tables and Cassandra serves as an Elasticsearch data and configuration store.

  • TiKV: a Distributed Key-Value Database which mainly refers to the design of Google Spanner and HBase, but much simpler (Don't depend on any distributed file system). We've implemented the Raft consensus algorithm in Rust and stored consensus state in RocksDB.

  • A DNA-Based Archival Storage System: Using DNA to archive data is an attractive possibility because it is extremely dense, with a raw limit of 1 exabyte per cubic millimeter, and long-lasting, with observed half-life of over 500 years. This paper presents an architecture for a DNA-backed archival storage system. It is structured as a key-value store, and leverages common biochemical techniques to provide random access. We also propose a new encoding scheme that offers controllable redundancy, trading off reliability for density.

  • Consensus in the Cloud: Paxos Systems Demystified: We felt we had to write this paper because we have seen misuses/abuses of Paxos-based coordination services. Glad this is off our chests.