
Stuff The Internet Says On Scalability For July 29th, 2016

High Scalability

29 Jul 2016 — 11 min read

Hey, it's HighScalability time:

Facial tats to disrupt big brother surveillance systems may actually work. Our future?

If you like this sort of Stuff then please support me on Patreon.

40.4 million: iPhones sold this quarter; 7: number of times Facebook has avoided the IRS; 104: new exoplanets; 100: new brain regions found; 2x: HTTPS adoption;

Quotable Quotes:
- @mat: Apple is doomed: "the nearly $8 billion in profits this quarter is more than twice what Facebook made in 2015"
- Bruce Schneier: The truth is that technology magnifies power in general, but the rates of adoption are different. The unorganized, the distributed, the marginal, the dissidents, the powerless, the criminal: they can make use of new technologies faster. And when those groups discovered the Internet, suddenly they had power. But when the already powerful big institutions finally figured out how to harness the Internet for their needs, they had more power to magnify. That’s the difference: the distributed were more nimble and were quicker to make use of their new power, while the institutional were slower but were able to use their power more effectively.
- @mjasay: What AWS does for AMZN: $2.89B in revenue (up from $1.8B last year), earning 56% of Amazon profits (EPS was $1.78, up from $0.19 last year)
- @kurtseifried: I wonder how discrete cloud billing can get? Per cpu cycle? bit moved in and out? I suspect yes.
- Algorithms to Live By: More generally, our intuitions about rationality are too often informed by exploitation rather than exploration. When we talk about decision-making, we usually focus just on the immediate payoff of a single decision—and if you treat every decision as if it were your last, then indeed only exploitation makes sense.
- Pinterest: As it turns out, it’s damn hard to design consistent and beautiful things at scale.
- @obfuscurity: OH: “god i hate having to lie about loving containers all the time”
- @beaucronin: Leah McGuire: "Metrics are the unit tests of data science"; without them you won't know when things break and you'll be exposed #wrangleconf
- @tsantero: OH: "Blockchain: a system that allows a bunch of non-CS people to suddenly be distributed computing experts."
- zeveb: People want safety; they want security; they want conformity; they want power over others.
- Richard Watson: My take-home [re Pokemon Go]: even the very best can be surprised when the scale hits the fan.
- @xaibeha: HTTP/2: Because a hundred requests per page load is just a fact of nature.
- mdatwood: many people have this irrational hate for Java, or they hate the Java from 10 years ago. Todays Java is fast, has tons of mature frameworks, and is probably one of the best tools to use from building a web service back end.
- @BenedictEvans: Obvious: an iPhone has hundreds of times more compute power than the original Pentium. More important: $50 Androids in rural Africa do too
- Dark Silicon: infeasible to operate all on-chip components at full performance at the same time due to the thermal constraints (peak temperature, spatial and temporal thermal gradients etc.)
- @Sneakyness: Why do people always assume that companies have scaling issues, and not that they've determined that 85% uptime is enough to make money
- @cdixon: Alternative headline: "Alphabet invests $859M in long-term projects."
- @xaprb: We were promised a Utopian vision with the “semantic web,” but it turns out it’s actually Feedly, IFTT, Slack, and Pocket that fulfill it.
- Amit: Let's drop 10¢ coins and $10 bills and treat them like 50¢ coins, $2 bills, $50 bills — they exist but we don't use them widely.
- Graham Templeton: One major advantage of life over modern engineering is power efficiency.
- @neil_conway: @t_crayford @kellabyte >10k threads running native code + user-defined stored procedures in a single address space sounds pretty scary.

Niantic is looking for a Software Engineer - Server Infrastructure to help make Pokémon GO. You think it's easy? Think again: Create the server infrastructure to support our hosted AR/Geo platform underpinning projects such as Pokémon GO using Java and Google Cloud. You will work on real-time indexing, querying and aggregation problems at massive scales of hundreds of millions of events per day, all on a single, coherent world-wide instance shared by millions of users.

DDoS attacks as a reason to bypass the kernel. Why we use the Linux kernel's TCP stack: During some attacks we are flooded with up to 3M packets per second (pps) per server...With this scale of attack the Linux kernel is not enough for us. We must work around it. We don't use the previously mentioned "full kernel bypass", but instead we run what we call a "partial kernel bypass". With this the kernel retains the ownership of the network card, and allows us to perform a bypass only on a single "RX queue".

BTW, I bought nothing on Prime Day. How AWS Powered Amazon’s Biggest Day Ever: This wave of traffic then circled the globe, arriving in Europe and the US over the course of 40 hours and generating 85 billion clickstream log entries. Orders surpassed Prime Day 2015 by more than 60% worldwide and more than 50% in the US alone. On the mobile side, more than one million customers downloaded and used the Amazon Mobile App for the first time.

Usually you read about projects going the other direction. You may not agree, and many do not, but Uber did their homework. Why Uber Engineering Switched from Postgres to MySQL. Postgres limitations found: Inefficient architecture for writes; Inefficient data replication; Issues with table corruption; Poor replica MVCC support; Difficulty upgrading to newer releases. MySQL wins: The most important architectural difference is that while Postgres directly maps index records to on-disk locations, InnoDB maintains a secondary structure. Instead of holding a pointer to the on-disk row location (like the ctid does in Postgres), InnoDB secondary index records hold a pointer to the primary key value; MySQL supports multiple different replication modes; InnoDB storage engine implements its own LRU in something it calls the InnoDB buffer pool; MySQL implements concurrent connections by spawning a thread-per-connection. This is relatively low overhead.
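To make the index difference concrete, here is a toy sketch, not either engine's real storage code. The class and field names (`CtidTable`, `PkTable`, `index_writes`) are hypothetical, the model ignores Postgres optimizations such as heap-only tuple updates, and it only counts how many index entries each layout touches when a non-indexed column changes.

```python
class CtidTable:
    """Toy Postgres-style layout: every index maps a key to a physical location."""

    def __init__(self):
        self.heap = []       # append-only list of row versions
        self.by_id = {}      # id -> position in heap (stand-in for a ctid)
        self.by_email = {}   # email -> position in heap
        self.index_writes = 0

    def insert(self, row):
        self._write(row)

    def update_name(self, row_id, name):
        # An update writes a new row version at a new location...
        old = self.heap[self.by_id[row_id]]
        self._write({**old, "name": name})

    def _write(self, row):
        self.heap.append(row)
        location = len(self.heap) - 1
        # ...so every index has to be repointed, even if its key did not change.
        self.by_id[row["id"]] = location
        self.by_email[row["email"]] = location
        self.index_writes += 2

    def find_by_email(self, email):
        return self.heap[self.by_email[email]]


class PkTable:
    """Toy InnoDB-style layout: secondary indexes point at the primary key."""

    def __init__(self):
        self.rows = {}       # primary key -> row
        self.by_email = {}   # email -> primary key
        self.index_writes = 0

    def insert(self, row):
        self.rows[row["id"]] = row
        self.by_email[row["email"]] = row["id"]
        self.index_writes += 2

    def update_name(self, row_id, name):
        # Only the primary structure changes; by_email still points at row_id.
        self.rows[row_id] = {**self.rows[row_id], "name": name}
        self.index_writes += 1

    def find_by_email(self, email):
        # The price of the indirection: two lookups instead of one.
        return self.rows[self.by_email[email]]


row = {"id": 1, "email": "a@example.com", "name": "Al"}
ctid_table, pk_table = CtidTable(), PkTable()
for table in (ctid_table, pk_table):
    table.insert(dict(row))
    table.update_name(1, "Alice")
```

In this toy model the location-based layout performs one index write per index on every update, while the primary-key layout pays with an extra lookup on secondary reads.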

What does the decentralized web mean? Depends on who you ask. This sounds good. Jae Kwon: The Decentralized Web is a self-organizing federation of humans and machines that speak many evolving protocols and languages, ideally robust in availability, unencumbered by unwanted regulations, and with no central point of failure, benign or malicious, created and maintained for the benefit of everyone. The Decentralized Web is not a right, but a continuing achievement, by humans and machines.

This is the Google version of embrace and extend. What if you could run the same, everywhere?

You don't need to be Amazon, Facebook, or Google to scale. Riot Games is building a world-wide network for themselves, all in the name of consistently low latency. Fixing the Internet for Real-Time Applications: Part III: Our approach to improving this situation has been to build a system of “gateways” that allow us to steer traffic destined for our network in the appropriate direction at the earliest possible point (where we peer with ISPs), and then “pin” that traffic so that it returns exactly the same way. This completely removes the need for BGP balancing, and allows players' traffic to choose the best route every time they play. This also allows us to use protocols like Anycast to help us verify that traffic arrives at the right gateway via the most efficient path possible. And we can do all of this ourselves using common compute, instead of using expensive and specialized networking hardware.

Excellent discussion of Schedulers with Adrian Cockcroft. Scheduling different workloads--instructions, jobs, containers--across different fabrics--CPUs, networks, grids--share a lot of similarities. Cache affinity matters at every level.

It's a big image so I can't post it, but it's good reading. Envy not. Much work was done in the making. Overnight success takes 20 years: The story of PokemonGO.

Really more of a meditation. Random notes on improving the Redis LRU algorithm: My initial tests show that it outperforms LRU in power-law access patterns, while using the same amount of memory per key, however real world access patterns may be different: time and space locality of accesses may change in very different ways, so I’ll be very happy to learn from real world use cases

Good thread on Maximizing inter-core memory bandwidth/minimizing latency in Broadwell Xeon v4. Martin Thompson: Turbo Boost: With increased active cores the clock rate can go down. Using x86 PAUSE in spin loops can help but best to frequency lock all cores; Bandwidth Limitations: If all cores are accessing the same L3 cache slice then the port on that slice can become a bottleneck. Cache coherence traffic for invalidation and then re-fetching of all cores needs to be considered as the publisher gets exclusive access before modification; Are you seeing back pressure on the publisher? If so, then you are likely waiting for the publisher flow control window to be updated. This could be either the driver conductor or one or more of the subscribers being starved out and thus holding everyone else back; There is so much to look at. Cache missing, starvation, setup for NUMA and CoD (Cluster on Die - effectively NUMA on socket); You need to model the flow rates and dependencies. To have parallel in-flight cache misses you need to ensure you are avoiding data dependent loads and even then you only have 10 line feed buffers per core to keep the cache misses operating concurrently.

I wanna go fast: HTTPS' massive speed advantage. OK, it wasn't really HTTPS, but HTTP/2 that made the difference, but the takeaway is security doesn't have to be slow. It turns out the smart use of a connection actually matters. As we always knew.

Not scary at all. Baidu uses millions of users’ location data to make predictions: The firm’s Big Data Lab in Beijing has announced that it has used billions of location records from its 600 million users as a lens on the Chinese economy, tracking the flux of people around offices and shops as a proxy measurement for employment and consumption activity. The lab even used the data to predict Apple’s second quarter revenue in China.

It will be a slow death for disk drives. QLC Flash on the horizon: Cloud systems such as Facebook's use tiered storage architectures in which re-write rates decrease rapidly down the layers. Because most re-writes would be absorbed by higher layers, it is likely that QLC-based SSDs would work well at the bulk storage level despite only a 500 write cycle life. It seems likely that only a few of the 2015 flash exabytes in the graph are 3D TLC, most would be 2D MLC. If we assume that half the flash from existing fabs becomes 3D QLC, flash output might increase 8x. This would still not be enough to completely displace hard disks, but it would reduce disk volumes and thus worsen the economics of building them. Fewer new flash fabs would be needed to displace the rest, which would be more affordable. Both effects would speed up the disk death spiral.

Lessons Learned the Hard Way: Postgres in Production at GoCardless: Look at what limits you can set in your database config. Figure out appropriate values for your system. Set them before a runaway process bites you; Don't trust your ORM/database adapter. Boot the damn thing up and see if it issues any queries/settings changes on top of what you were expecting; Set something up to watch your logs for (or better yet, turn them into metrics): excessive temporary files being created, and excessive waiting on locks.

It's not easy to make TCP/IP work over a satellite. TCP Congestion Avoidance on Satellite Links: They implemented an IP-layer coding mechanism on the layer-3 path that traversed the satellite link, effectively distributing every TCP packet across a number of transport packets (to minimize the effects of packet bursts on other TCP sessions) while also adding forward error correction to recover from reasonable packet loss rate without triggering TCP retransmissions.
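The excerpt does not say which code the authors used, so the following is only a minimal sketch of the general idea with the simplest possible forward error correction: split a packet into `k` chunks and add one XOR parity chunk, so any single lost chunk can be rebuilt without a retransmission. The function names `encode` and `decode` and the choice of XOR parity are assumptions made for illustration; a real link would use a stronger code.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(payload, k):
    """Split payload into k equal-size chunks and append one XOR parity chunk."""
    size = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def decode(received, length):
    """Rebuild the payload from k+1 chunks where at most one is None (lost)."""
    missing = [i for i, chunk in enumerate(received) if chunk is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost chunk")
    chunks = list(received)
    if missing:
        # XOR of all surviving chunks (data and parity) equals the lost one.
        survivors = [chunk for chunk in received if chunk is not None]
        chunks[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity chunk and the zero padding.
    return b"".join(chunks[:-1])[:length]


packet = b"GET /index.html HTTP/1.1"
sent = encode(packet, k=4)
sent[2] = None  # simulate one transport packet lost on the link
recovered = decode(sent, len(packet))
```

Here the receiver reconstructs the original packet locally, so the TCP sender never sees the loss.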

Cells as state machines? What could be done with a sufficiently programmable cell? How MIT’s new biological ‘computer’ works, and what it could do in the future: In response to each input variable, probably a chemical agent, a recombinase will either delete or invert its associated portion of the genome — and crucially, that portion of the genome itself contains targets that dictate later recombinase binding.

Box can now launch a new service in a week instead of six months. This is how it's done with a new postmodern workflow. Kubernetes at Box: Microservices at Maximum Velocity: An engineer writes a Dockerfile to package up their service into a Docker image. Once their image has been built by Jenkins and published to our internal registry, they write and test the Kubernetes objects that run their service, set up service discovery, generate and load secrets, provide run-time configuration, and more...Once the engineer has written their config (in the Jsonnet templating language for easier refactoring), they add the configs to the central git repository. We then have an "applier" that's responsible for continually reconciling the state of the git repository with the state of the various Kubernetes masters we have in each of our datacenters using "kubectl apply"...At that point Kubernetes takes over and creates Docker containers on the various servers, automatically configures our haproxy load-balancers using service-loadbalancer, provides secrets and configuration to the instances, and so on. Deploying to different environments or clusters is as simple as adding an if statement to generate a few more files in the central git repository.

Not much on details, but an interesting development. Titanfall 2 To Use Multiplay’s Multi-Cloud Platform To Ensure Endless Scalability & High Reliability: Multiplay (a subsidiary of GAME Digital) boasts an auto-scaling hybrid-cloud technology which is integrated into a global network of dedicated server providers and cloud providers such as Microsoft Azure, Google Compute and Amazon Web Services. The promise is to deliver minimal latency and maximum capacity, with Multiplay ready to deploy multiple Cloud providers and Bare Metal Server data center locations in each region.

We first talked about small satellites here. Now it seems they are moving on up. CubeSats are going to Mars. There's also Mars rover uses A.I. to decide what to zap with a laser. You know that future where humans send an army of self-replicating robots to explore every nook and cranny of the universe, sending streams of data back home? It's getting closer.

There are times when you need thousands of low latency timers. That's when you need a hierarchical timer wheel. Here's a great explanation of how they work. Ratas - A hierarchical timer wheel: A timer wheel is effectively a ring buffer of linked lists of events, and a pointer to the ring buffer. Each slot corresponds to a specific timer tick, and contains the head of a linked list. The linked list contains the events that should happen on that tick.
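The description above maps almost directly onto code. This is a minimal single-level sketch, not Ratas itself: the `TimerWheel` class and its method names are hypothetical, a `deque` stands in for each linked list, and the hierarchical part (coarser wheels that cascade long timers down into finer ones) is left out.

```python
from collections import deque


class TimerWheel:
    """Single-level timer wheel: a ring of slots, one slot per tick."""

    def __init__(self, num_slots=8):
        self.slots = [deque() for _ in range(num_slots)]
        self.now = 0  # current tick; now % num_slots is the ring pointer

    def schedule(self, delay, callback):
        # A single wheel can only represent delays shorter than one full turn.
        if not 1 <= delay < len(self.slots):
            raise ValueError("delay must be between 1 and num_slots - 1 ticks")
        self.slots[(self.now + delay) % len(self.slots)].append(callback)

    def tick(self):
        # Advance the pointer by one tick and fire everything in that slot.
        self.now += 1
        slot = self.slots[self.now % len(self.slots)]
        while slot:
            slot.popleft()()


fired = []
wheel = TimerWheel(num_slots=8)
wheel.schedule(3, lambda: fired.append("c"))
wheel.schedule(1, lambda: fired.append("a"))
wheel.schedule(3, lambda: fired.append("d"))
for _ in range(4):
    wheel.tick()
```

Scheduling and firing are both constant-time per event, which is why the structure copes with thousands of timers. A hierarchical wheel lifts the one-turn limit by parking long delays in a coarser wheel until they are close enough to move into this one.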

Nice tutorial on how to Build Scalable Newsfeeds with PHP 7 and Laravel – in 60 Minutes.

If the Brothers Grimm were alive and collecting stories today this would definitely make the cut: A DevOps Cautionary Tale or how a company with nearly $400 million in assets went bankrupt in 45-minutes because of a failed deployment.

If you need cheap deals on hosting there's a place for that.

Just a glimpse of the future when everything is enchanted with sensors. Researchers invent “smart” thread that collects diagnostic data when sutured into tissue: The researchers used a variety of conductive threads that were dipped in physical and chemical sensing compounds and connected to wireless electronic circuitry to create a flexible platform that they sutured into tissue in rats as well as in vitro. The threads collected data on tissue health (e.g. pressure, stress, strain and temperature), pH and glucose levels that can be used to determine such things as how a wound is healing, whether infection is emerging, or whether the body’s chemistry is out of balance. The results were transmitted wirelessly to a cell phone and computer.

rsms/immutable-cpp: Persistent immutable data structures providing practically O(1) for appends, updates and lookups based on work by Rich Hickey and by consequence Phil Bagwell.

FaRM: Fast Remote Memory: main memory distributed computing platform that provides distributed transactions with strict serializability, high performance, durability, and high availability. To scale out, FaRM distributes objects across machines in a data center and also allows transactions to span any number of machines. To reduce CPU overhead it uses one-sided RDMA (Remote Direct Memory Access) operations.

Web Scaling Frameworks Building scalable, high-performance, portable and interoperable Web Services for the Cloud: The major design goal for Web Scaling Frameworks is to create an architecture that enables to build maintainable, automatable, scalable, resilient, portable and interoperable implementations of WSFs...The approach to optimising the performance is to minimise the request flow graph for every request.

Epidemic Broadcast Trees: We use a low cost scheme to build and maintain broadcast trees embedded on a gossip-based overlay. The protocol sends the message payload preferably via tree branches but uses the remaining links of the gossip overlay for fast recovery and expedite tree healing. Experimental evaluation presented in the paper shows that our new strategy has a low overhead and that is able to support large number of faults while maintaining a high reliability

The Majority Illusion in Social Networks: a behavior that is globally rare may be systematically overrepresented in the local neighborhoods of many people, i.e., among their friends. Thus, the "majority illusion" may facilitate the spread of social contagions in networks and also explain why systematic biases in social perceptions, for example, of risky behavior, arise.
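A tiny made-up network shows the effect. In this sketch (the graph, the node numbering, and the helper name `majority_illusion_fraction` are all invented for illustration, not taken from the paper), two well-connected hubs hold a behavior and five leaves do not, yet every leaf sees the behavior in all of its neighbors.

```python
def majority_illusion_fraction(neighbors, active):
    """Fraction of nodes for which more than half of their neighbors are active."""
    fooled = 0
    for node, nbrs in neighbors.items():
        if nbrs and sum(n in active for n in nbrs) > len(nbrs) / 2:
            fooled += 1
    return fooled / len(neighbors)


hubs = [0, 1]
leaves = [2, 3, 4, 5, 6]
neighbors = {node: set() for node in hubs + leaves}


def add_edge(a, b):
    neighbors[a].add(b)
    neighbors[b].add(a)


add_edge(0, 1)
for leaf in leaves:
    for hub in hubs:
        add_edge(leaf, hub)

active = set(hubs)  # only the two hubs exhibit the behavior
global_share = len(active) / len(neighbors)
local_share = majority_illusion_fraction(neighbors, active)
```

Only 2 of 7 nodes are active, but 5 of 7 nodes observe an active majority among their own friends, because the active nodes are the ones everybody is connected to.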
