
Stuff The Internet Says On Scalability For March 4th, 2016

High Scalability

04 Mar 2016 — 11 min read

Presented for your consideration: Drone Units of the U.S. Armed Forces. If you like this sort of Stuff then please consider offering your support on Patreon.

16 terabytes: new Samsung SSD; 1%: earned income from an on-demand platform; $35: Raspberry Pi 3 has a 1.2GHz 64-bit quad-core ARM and WiFi; 1.5 million messages per second: Netflix cache replication.

Quotable Quotes:
- @jzawodn: all right.. everything on one disk in one computer: 15TB SSD
- @jaykreps: The disadvantage is that the needs of most companies are really different from Google's. Depth vs breadth thing.
- Eliezer Sternberg: The brain tries to maximize the efficiency of our thinking by recognizing familiar patterns and anticipating them.
- david-given: I would love to have a modernised Ada. With case sensitivity. And garbage collection (a lot of the language semantics are obviously intended to be based around having a garbage collector).
- @tyler_treat: You're not even building microservices if you have things operating in lockstep and tightly coupled interactions and data models.
- cognitive electronic warfare: using artificial intelligence to learn in real-time what the adversaries’ radar is doing and then on-the-fly create a new jamming profile. That whole process of sensing, learning and adapting is going on continually
- @WhatTheFFacts: Cleopatra lived closer to the invention of the iPhone than she did to the building of the Great Pyramid.
- @mjpt777: "I think the net contribution of RPC to human welfare is negative. It was a disaster." - Butler Lampson
- @just_security: Comey[FBI]: until these devices[smart phones], there was no closet, no room, no basement in America where we couldn't get in.
- @traviskorte: The people who give algorithms credit for "creating" DeepDream art are the same ones who say predictive scoring is just a neutral tool. Hmm.
- Emin Gün Sirer: Bitcoin provides an incredibly strong consistency guarantee, far stronger than eventual consistency. Specifically, it guarantees serializability, with a probability that is exponentially decreasing with latency.
- The best thing about working at Facebook: But what makes Facebook a unique place to work isn't its vibrant campuses or cushy salaries. It's the sheer, insane scale of how many people use its product around the world.
- TradersBit: I have found that maybe 80% of everything I am developing/have developed for TradersBit could soon run on Lambda.
- @asymco: There were over 1,800 automobile manufacturers in the United States from 1896 to 1930
- Rob Harrop: it’s better to preserve good service for a smaller number of customers rather than give bad service to all customers, which is what will happen as latency starts to degenerate under heavy load if your queue isn’t bounded.
- @jaykreps: Microservices are about scaling the number of engineers not the number of requests
- mbrock: The ideal is low coupling and high cohesion. That's supposed to mean your system is composed of parts that can be understood separately. Low coupling means that the innards of each module are isolated from the others. High cohesion means that each module presents a clear and distinct purpose.
- js8: What seems to be the main contention here - should the interface just use the names (akin to philosophical nominalism) and leave them open to interpretation or should it somehow encode the properties of things it describes (akin to philosophical realism)?
- Ross Williamson: if you’re working on a new product, try to do less. More and more features aren’t going to drive user adoption. It’s better to focus on a niche, and give those users exactly what they want.
- overenginered: In a sense, working with AWS and Azure has given me a very clear view on how exactly design decisions cost real money. Once you get a lot of traffic, each instance needed to balance the load is costing a non trivial amount of money. For that I'm grateful, because I can now see the need and the benefits of optimizing code and taking basic hygienic measures.
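
Rob Harrop's point about bounded queues is easy to see in code. Here's a minimal sketch, assuming a single in-process work queue; the `submit` helper and the capacity of 3 are made up for illustration and aren't from his talk. Once the queue is full, new requests are rejected immediately instead of piling up and dragging latency down for everyone.

```python
import queue


def submit(work_queue, request):
    """Try to enqueue a request; shed it immediately if the queue is full."""
    try:
        work_queue.put_nowait(request)
        return True
    except queue.Full:
        # Reject now rather than let latency degrade for every caller.
        return False


# Hypothetical capacity of 3, chosen only to keep the example small.
work_queue = queue.Queue(maxsize=3)
results = [submit(work_queue, f"req-{i}") for i in range(5)]
accepted = results.count(True)
rejected = results.count(False)
```

The first three requests get in and the last two are turned away. In a real service the rejection would typically become an HTTP 503 or a retry-later signal to the caller.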

What has Google learned from creating three container management systems—Borg, Omega, and Kubernetes—in over a decade? The benefits of containerization go beyond merely enabling higher levels of utilization. Containerization transforms the data center from being machine oriented to being application oriented...The design of Kubernetes as a combination of microservices and small control loops is an example of control through choreography—achieving a desired emergent behavior by combining the effects of separate, autonomous entities that collaborate.

I can just imagine the disappointment of AIs as they learn how real people don't live up to their fictional counterparts. Computers read 1.8 billion words of fiction to learn how to anticipate human behaviour. What, you mean great minds don't really go on strike and escape to Atlantis when they get a little butthurt?

This is why human drivers will eventually be made illegal. Google: Self-driving car followed 'the spirit of the road' before accident: The test driver, who had been watching the bus in the mirror, also expected the bus to slow or stop, Google said, "and we can imagine the bus driver assumed we were going to stay put."

At a cost of $1.5 trillion it's nice to learn that the F-35 doesn't completely suck. Here's what I've learned so far dogfighting in the F-35. For a moving example of how to counter this fiscal and strategic insanity, Boyd: The Fighter Pilot Who Changed the Art of War is a great read. It contains an illuminating discussion on the OODA loop as well. There seems to be a natural tendency for large projects to keep expanding in scope until they embrace all features and address no particular mission.

Yah, there may be some ease-of-use issues to solve before these go mainstream. The absolute horror of WiFi light switches. It's a true horror story.

Nothing radical or new here, but it's highly instructive to see how it all fits together for a system as large as Netflix. Caching for a Global Netflix: adopted a stateless application server architecture which lets us serve any member request from any region...High-reliability databases and high-performance caches are fundamental to supporting our distributed architecture...Replicating such caches globally helps with the “thundering herd” scenario...Another major use case for caching is to “memoize” data which is expensive to recompute...One non-requirement is strong global consistency...EVCache replicates data both within a region and globally...Clients of EVCache are not aware of other regions or of cross-region replication; reads and writes use only the local, in-region cache instances...The message queue is the cornerstone of the replication system. We use Kafka for this...The 99th percentile of end-to-end replication latency for most of our caches is under one second...Moving into VPC significantly raised some limits, like packets per second, while also giving us access to other enhanced networking capabilities which allow the Relay and Proxy clusters to do more work per instance...Future improvements might involve pipelining replication messages on a single connection for better and more efficient connection use.
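
The shape of that replication flow (clients write locally, a queue carries the change to other regions) can be sketched in a few lines. This is a toy model only: `RegionCache`, `replicate`, and the in-memory `deque` standing in for Kafka are all hypothetical names, not EVCache's actual API.

```python
from collections import deque


class RegionCache:
    """In-memory stand-in for one region's cache cluster (hypothetical)."""

    def __init__(self, name, replication_log):
        self.name = name
        self.data = {}
        self.log = replication_log

    def set(self, key, value):
        # Clients only touch the local region; replication is a side effect.
        self.data[key] = value
        self.log.append((self.name, key, value))

    def get(self, key):
        return self.data.get(key)


def replicate(log, regions):
    """Drain the log and apply each write to every region except its origin."""
    while log:
        origin, key, value = log.popleft()
        for region in regions:
            if region.name != origin:
                region.data[key] = value


log = deque()  # stand-in for the message queue between regions
us_east = RegionCache("us-east-1", log)
eu_west = RegionCache("eu-west-1", log)

us_east.set("member:42:home", "cached-row")
before = eu_west.get("member:42:home")  # not replicated yet
replicate(log, [us_east, eu_west])
after = eu_west.get("member:42:home")   # now visible in the other region
```

The gap between `before` and `after` is the "no strong global consistency" trade-off in miniature: a remote read can miss until the replication message has been applied.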

Often trends show up first in hiring. Here's a report from trend spotter Sean Hull, who learned this from a recruiter...Why is everyone suddenly talking about Amazon Redshift: “yeah it seems as though suddenly everybody is looking for Redshift & Snowflake”

The end of an era. WhatsApp has long touted their support of low end phones so they could be wherever their customers needed them to be. No more. WhatsApp support for mobile devices. Dumb phones aren't worth supporting as WhatsApp moves to add higher end features. Low end phones could still be a niche for someone.

Scaling Knowledge at Airbnb. Airbnb is on the forefront of a problem that will only get bigger for more organizations in the future: "how do we make sure that an insight uncovered by one person effectively transfers beyond the targeted recipient?...As an organization grows, the cost of transmitting knowledge across teams and across time increases." Their solution: build a Knowledge Repo that combines a Git repository with R Markdown files and iPython notebooks, which "solved the issue of reproducibility by marrying code and results."

Need speed? Preload support in Chrome Canary: it’s a way to tell a browser to start fetching a certain resource, because we as authors (or as server administrators, or as smart-server developers) know that the browser is going to need that particular resource pretty soon.

Here's a detailed post on Stack Overflow's Bosun architecture, which is part of their monitoring infrastructure that handles 3.7 billion datapoints a day. As you might expect it's a sophisticated setup with many components: scollector; BosunReporter.NET; OpenTSDB; Opserver; Grafana; TPC; tsdbrelay; HBase; HAProxy; Redis; Elastic. They are thinking of moving to InfluxDB in the future.

If you've been programming for years and don't know what a stack is, here's a quite nice introduction. What is "the stack"?

The Architecture of Open Source Applications has covered a number of new topics: building a graph database, building a web server, writing a program that recognizes handwritten characters, and the Strategy pattern.

If you are in the mood for a good old-fashioned language war then Go channels are bad and you should feel bad is for you. The idea: The channel API is inconsistent and just cray-cray. The reaction: more balanced than expected. Some good discussion on HackerNews and on reddit. Also, Curious Channels.

Zach Holman with an epic post on How to Deploy Software, gathered primarily from 5 years at GitHub. Some points: tests should run fast; support multiple deployed codepaths at once with feature flags; deploys become boring, straightforward, and stress-free once you know new code doesn't break anything in production; roll it out to a small percentage of users first to double-check and triple-check nothing unforeseen is going to break; the ultimate responsibility for the code that gets deployed falls upon the person or people who wrote that code; start code reviews early and often; deploy quickly and often; and lots more. BTW, I like spaces, not tabs.

You may want to slow your microservices roll and embrace The Majestic Monolith: the #1 rule of distributed computing: Don’t distribute your computing!...The patterns that make sense for organizations orders of magnitude larger than yours, are often the exact opposite ones that’ll make sense for you...embrace the monolith with pride and a salute! Don’t just accidentally waltz your system into a monolithic design, do so with intent and with your head held high...So what is a majestic monolith exactly? It’s an integrated system that collapses as many unnecessary conceptual models as possible. Eliminates as much needless abstraction as you can swing a hammer at.

Computing using the physical world. World's First Parallel Computer Based on Biomolecular Motors: the problem to be solved is 'encoded' into a network of nanoscale channels. This is done, on the one hand by mathematically designing a geometrical network that is capable of representing the problem, and on the other hand by fabricating a physical network based on this design using so-called lithography, a standard chip-manufacturing technique. The network is then explored in parallel by many protein filaments (here actin filaments or microtubules) that are self-propelled by a molecular layer of motor proteins (here myosin or kinesin) covering the bottom of the channels. The design of the network using different types of junctions automatically guides the filaments to the correct solutions to the problem.

Another excellent overview from Chetan Sharma of the Mobile World Congress 2016 Observations: Verizon was the first one to announce results from some early tests in the field – 10 Gbps for potential fixed wireless deployments...Operators who will invest to become “solutions providers” will be better positioned for the future vs. the ones who are purely “access providers”...Verizon’s XO deal of $1.8B didn’t get much attention but it was a brilliant deal appreciated by the folks who really understand what is going on...The talk of 5G drowned out any discussion of connecting the unconnected...eSIM is potentially one of the biggest disruptive forces our industry has seen in some time.

Hard to disagree. Interfaces - The Most Important Software Engineering Concept. But like the Tao, we each have our own idea of what is the way of interfaces.

SSD reliability in the real world: Google's experience: High-end SLC drives are no more reliable than MLC drives; SSD age, not usage, affects reliability; SSD UBER rates are higher than disk rates, which means that backing up SSDs is even more important than it is with disks; The SSD is less likely to fail during its normal life, but more likely to lose data.

Robots, they could be big, if you are into that sort of thing. Multi-Billion dollar robotics market is about to boom: International Data Corporation said worldwide spending on robotics and related services will hit $135.4 billion in 2019. The research firm said that global robotics spending in 2015 was $71 billion, and is set to grow at a compound annual growth rate of 17%.
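
A quick sanity check on those numbers, assuming the 17% rate compounds annually over the four years from 2015 to 2019 (the article doesn't spell out the compounding period, so that part is an assumption):

```python
start_2015 = 71.0    # billions of USD, the 2015 figure quoted above
target_2019 = 135.4  # billions of USD, the 2019 forecast quoted above
years = 2019 - 2015

# Where 17% annual compounding lands after four years.
projected_2019 = start_2015 * (1 + 0.17) ** years

# The growth rate actually implied by the two endpoints.
implied_cagr = (target_2019 / start_2015) ** (1 / years) - 1
```

Compounding at 17% gives roughly $133 billion, and the two endpoints imply a rate of about 17.5%, so the quoted figures hang together once you allow for rounding.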

Using a Serverless Architecture to deliver IRC Webhook Notifications: The economics of running IRC Hooky (or other Lambda functions) at scale is what is most appealing about this architecture...Estimating an exaggerated 100K API calls per month will ding you ~$0.35 (per month) with API Gateway.
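
The ~$0.35 figure is consistent with a flat per-request price. A back-of-the-envelope sketch, assuming API Gateway charges $3.50 per million calls (that rate is my assumption for this arithmetic; check current pricing before relying on it, and note it ignores data transfer and Lambda's own charges):

```python
# Assumed rate in USD per million API calls; not taken from the linked post.
PRICE_PER_MILLION_CALLS = 3.50


def api_gateway_cost(calls_per_month):
    """Estimated monthly request charge for a given call volume."""
    return calls_per_month * PRICE_PER_MILLION_CALLS / 1_000_000


monthly_cost = api_gateway_cost(100_000)
```

At 100K calls that works out to 35 cents a month, which is the whole appeal: the bill scales with requests, not with idle servers.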

You can market stuff without building stuff. How I built a hoverboard company and then blew it up.

dastbe on working with feature flags "at scale":
- Always be developing against the current running features. No brainer.
- Design things so that they integrate feature flags, not work around them. This usually means pushing feature flag determination to more generic/common code.
- Separate backend/frontend changes into separate feature flags when possible. Turn on backend changes early and often to better measure your feature's impact.
- Give individual features their own flag, but also have a global flag that manages the entire experience. This makes it easier to manage your gradual dial up as well as shut off problematic features that would otherwise mess up the launch.
- Be diligent about removing feature flags once they're turned on. Schedule it into sprint time, reward teams that remove them, make it a management mandate, whatever. Just get rid of them once they're no longer needed.
- Invest in monitoring around your services that (ideally) can correlate failures with features. You should turn on features over the course of a few hours/days to mitigate customer impact in the event of failures and gain data about performance at 50/50.
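
Two of those ideas, a per-feature dial plus a global kill switch, fit in a small sketch. Everything here (`FeatureFlags`, the feature name, the hashing scheme) is hypothetical and only meant to show the mechanics, not any particular flag library.

```python
import hashlib


class FeatureFlags:
    """Per-feature percentage rollout with a global kill switch (a sketch)."""

    def __init__(self):
        self.global_enabled = True  # one switch for the entire experience
        self.rollout = {}           # feature name -> percent of users (0-100)

    def set_rollout(self, feature, percent):
        self.rollout[feature] = percent

    def is_enabled(self, feature, user_id):
        if not self.global_enabled:
            return False
        percent = self.rollout.get(feature, 0)
        # Hash feature + user so each user lands in a stable bucket in [0, 100).
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


flags = FeatureFlags()
flags.set_rollout("new-checkout-backend", 50)  # hypothetical feature name
users = [f"user-{i}" for i in range(1000)]
enabled = sum(flags.is_enabled("new-checkout-backend", u) for u in users)
```

Because the bucket comes from a hash rather than a random draw, a user who sees the feature at 10% keeps seeing it at 50%, which makes the gradual dial up much easier to reason about. Flipping `global_enabled` to `False` shuts everything off in one move.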

Learn how Code School does it: Building and scaling Code School with Docker and a service-oriented architecture. Excellent interview.

Quantum algorithms: an overview. Overview does not mean simple! This goes way beyond the functional vs OO divide. Also, Quantum Computer Factors Numbers, Could be Scaled Up: now come up with a new, scalable quantum system for factoring numbers efficiently. While it typically takes about 12 qubits to factor the number 15, they found a way to shave the system down to five qubits, each represented by a single atom. Each atom can be held in a superposition of two different energy states simultaneously. The researchers use laser pulses to perform "logic gates," or components of Shor's algorithm, on four of the five atoms.
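
For a feel of what factoring 15 involves, here is the classical scaffolding around Shor's algorithm, with the quantum part (finding the order of `a` modulo `n`) replaced by brute force. This is a sketch of the textbook reduction, not the researchers' five-atom implementation, and the function names are my own.

```python
from math import gcd


def find_order(a, n):
    """Smallest r > 0 with a**r % n == 1, found by brute force.

    This is the step a quantum computer speeds up; here it is purely classical.
    Assumes gcd(a, n) == 1, otherwise no such r exists.
    """
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_from_order(a, n):
    """Classical post-processing: turn the order of a into factors of n."""
    shared = gcd(a, n)
    if shared != 1:
        return shared, n // shared  # the guess already shares a factor
    r = find_order(a, n)
    if r % 2 == 1:
        return None  # odd order: retry with a different a
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None  # trivial square root of 1: retry with a different a
    p, q = gcd(half - 1, n), gcd(half + 1, n)
    return tuple(sorted((p, q)))


factors = factor_from_order(7, 15)
```

With `a = 7` the order is 4, so `7**2 % 15 == 4`, and the two gcds `gcd(3, 15)` and `gcd(5, 15)` give the factors 3 and 5.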

Inside the network wars. Google Blocking IPv6 Adoption With Cogent, Impacting Transit Customers: Of course, if Comcast, Verizon or one of the ISPs was using this IPv6 tactic to their benefit, Internet advocacy groups and the media would be calling for their heads. Yet no one seems to have noticed what Google is doing, and is complaining about it, other than Cogent’s customers being impacted by Google’s tactics. So the question people should now be asking is, “Who is blocking IPv6 routes and why?”

Why hasn't functional programming taken over yet? Perhaps a bad foreign policy? A weak military?

MacroBase: Analytic Monitoring for the Internet of Things: To facilitate rapid development and scalable deployment of analytic monitoring queries, we have developed MacroBase, a data analytics engine that performs analytic monitoring of IoT data streams. MacroBase implements a customizable pipeline of outlier detection, summarization, and ranking operators. For efficient and accurate execution, MacroBase implements several cross-layer optimizations across robust estimation, pattern mining, and sketching procedures, allowing order-of-magnitude speedups. As a result, MacroBase can analyze up to 1M events per second on a single core. MacroBase has already delivered meaningful analytic monitoring results in production at a medium-scale IoT startup.
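
To make "outlier detection with robust estimation" concrete, here's a minimal sketch using the median absolute deviation (MAD). I'm picking MAD as a common robust estimator for illustration; the function, the threshold of 3, and the sample readings are all made up and this is not MacroBase's pipeline.

```python
from statistics import median


def mad_outliers(values, threshold=3.0):
    """Return values whose robust z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    # 1.4826 scales MAD to match the standard deviation for normal data.
    return [v for v in values if abs(v - center) / (1.4826 * mad) > threshold]


# Hypothetical sensor readings with one obviously bad value.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.7, 20.1]
outliers = mad_outliers(readings)
```

The median and MAD barely move when the bad reading shows up, which is why the outlier stands out so clearly. A mean and standard deviation computed on the same data would be pulled toward 35.7 and make it look less unusual.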

Scaling up Superoptimization: Superoptimization can, in principle, discover machine-specific optimizations automatically by searching the space of all instruction sequences. If we can increase the size of code fragments a superoptimizer can optimize, we will be able to discover more optimizations.
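
The core idea is small enough to try at home. Below is a toy brute-force superoptimizer over an invented one-register instruction set (`inc`, `dbl`, `neg` are hypothetical, not any real machine's). It enumerates programs shortest-first and checks them against the target on a set of test inputs, which is weaker than the formal equivalence checking a real superoptimizer would need.

```python
from itertools import product

# Hypothetical one-register instruction set, invented for this sketch.
INSTRUCTIONS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
}


def run(program, x):
    """Execute a sequence of instruction names on the single register."""
    for op in program:
        x = INSTRUCTIONS[op](x)
    return x


def superoptimize(target, max_len=4, tests=range(-8, 9)):
    """Return the shortest program matching target on every test input."""
    for length in range(max_len + 1):
        for program in product(INSTRUCTIONS, repeat=length):
            if all(run(program, x) == target(x) for x in tests):
                return program
    return None


# A deliberately clumsy way to write 4x + 4.
best = superoptimize(lambda x: x + x + x + x + 4)
```

The search finds `inc, dbl, dbl`, i.e. `(x + 1) * 2 * 2`. With three instructions the candidate count is 3 to the power of the program length, which is exactly why scaling to larger code fragments is the hard part.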

tensorflow/tensorflow/core/distributed_runtime/: This directory contains the initial open-source implementation of the distributed TensorFlow runtime, using gRPC for inter-process communication.

google/cayley: Cayley is an open-source graph inspired by the graph database behind Freebase and Google's Knowledge Graph. Its goal is to be a part of the developer's toolbox where Linked Data and graph-shaped data (semantic webs, social networks, etc) in general are concerned.
