
Stuff The Internet Says On Scalability For August 5th, 2016

High Scalability

05 Aug 2016 — 12 min read

Hey, it's HighScalability time:

What does a 107-football-field-long, battery-building Gigafactory look like? A lot like a giant Costco. (tour)

If you like this sort of Stuff then please support me on Patreon.

60 billion: Facebook messages per day; 3x: Facebook messages compared to global SMS traffic; $15: min wage increases job growth; 85,000: real world QPS for Twitter's search; 2017: when MRAM finally arrives; $60M: Bitcoin heist, bigger than any bank robbery; 710m: Internet users in China;

Quotable Quotes:
- @cmeik: When @eric_brewer told me that Go was good for building distributed systems, I couldn't help but think about this.
- David Rosenthal: We can see the end of the era of data and computation abundance. Dealing with an era of constrained resources will be very different. In particular, enthusiasm for blockchain technology as A Solution To Everything will need to be tempered by its voracious demand for energy.
- Dr Werner Vogels: What we’ve seen is a revolution where complete applications are being stripped of all their servers, and only code is being run. Quite a few companies are ripping out big pieces of their applications and replacing their servers, their VMs and their containers with just code. Perhaps we no longer have to think about servers.
- @dsb: agree w serverless future - seeing more startups using that model & entirely eliminates most of my infra diligence questions
- Emin Gün Sirer: It's too early for a coherent story to emerge from the smoldering ashes of the Bitfinex disaster.
- @jeremiahdillon: The coming decades will bring population shrinkage not seen since the Black Death. Good for wages, bad for GDP.
- Nicole Hemsoth: The chatter is going around, once again, that AWS is looking to deliver a private version of its public cloud infrastructure, something that is not as easy to do as it sounds.
- Michael Rabin: I must admit that after many years of work in this area, the efficacy of randomness for so many algorithmic problems is absolutely mysterious to me. It is efficient, it works; but why and how is absolutely mysterious.
- Algorithms to Live By: ...that “bubble sort has no apparent redeeming features,” the research of Ackley and his collaborators suggests that there may be a place for algorithms like Bubble Sort after all. Its very inefficiency—moving items only one position at a time—makes it fairly robust against noise, far more robust than faster algorithms like Mergesort, in which each comparison potentially moves an item a long way. Mergesort’s very efficiency makes it brittle.
- JoshGlazebrook: Looks like Hitachi (HGST) is still leading in terms of reliability.
- @SeanMcElwee: don't argue with capitalists. seize the means of production.
- jondubois: What the author describes, I would not call 'protocols' - The Bitcoin network is a hosted implementation of the Bitcoin protocol - It is not the protocol itself. Tokens in the context of the Bitcoin protocol itself have no value - The value is derived from the popularity of the infrastructure, not from the popularity of the protocol.
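The Algorithms to Live By quote above makes a testable claim about noisy comparisons. Here's a small Python harness for trying it yourself, assuming a faulty comparator that returns the wrong answer with probability `p` and measuring how far items end up from their correct positions. The function names are hypothetical, this is not Ackley's methodology, and no particular result is claimed here.

```python
import random

def noisy_less(a, b, rng, p):
    """Return a < b, but flip the answer with probability p (a faulty comparator)."""
    result = a < b
    return (not result) if rng.random() < p else result

def bubble_sort(items, less):
    """Bubble Sort with a fixed number of full passes, so running time ignores the noise."""
    xs = list(items)
    for _ in range(len(xs) - 1):
        for j in range(len(xs) - 1):
            if less(xs[j + 1], xs[j]):  # only swaps neighbours: one position per move
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def merge_sort(items, less):
    """Top-down Mergesort; a single wrong comparison can move an item a long way."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], less)
    right = merge_sort(items[mid:], less)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if less(right[j], left[i]):
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def total_displacement(xs):
    """For a permutation of 0..n-1, sum of distances from each item's correct index."""
    return sum(abs(pos - value) for pos, value in enumerate(xs))

if __name__ == "__main__":
    data = list(range(50))
    random.Random(1).shuffle(data)
    for name, sorter in (("bubble", bubble_sort), ("merge", merge_sort)):
        rng = random.Random(7)  # same noise seed for both runs
        result = sorter(data, lambda a, b: noisy_less(a, b, rng, p=0.1))
        print(name, total_displacement(result))
```

Try several seeds and values of `p` before drawing conclusions; a single run says little.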

Where there is Pokemon there is a way. If you don't make an API someone will. Ingenious third party tracking services are one reason Pokemon Go is slow: The company says these services were making the servers unreliable. Pokémon Go doesn’t have an API, so it seems like Pokévision and others created countless accounts on many servers around the world using Android emulators. With these emulators, they could fake movements around cities and reverse-engineer the game to create a sort of lightweight API and gather Pokémon data.

Two years later it appears Facebook creating a separate Messenger app was a good idea. Go figure. This Is The Smartest Thing Facebook Ever Did: In phase one, Facebook grows the user base. “We’re really at the beginning of phase two,” he said, in which the company focuses on growing organic interactions between people and businesses. Once businesses see this is working, the company launches stage three, in which it asks companies to pay up. This strategy has worked well for the company’s other products: Facebook reported $6.44 billion in sales this year, up 59 percent from a year ago. The company’s profits almost tripled to $2.06 billion.

So you want a system where the guberment has the master key to all encrypted systems? What a great idea! Anyone can now print out all TSA master keys.

This is from WWI! French gov: "WWI sites will be fully cleared of unexploded ordnance in... 300-900 years." Can you imagine what the aftermath of the cryptowars will be like? Sorry, don't touch that toaster...it will hack your neural lace and make you do crazy shite. Voting booths are all compromised, back to paper. Don't even think of using your all-electric AI-controlled car. It's now an IDAID (Improvised Destructive AI Device). Remember all those families that drove themselves over the cliff? So sad. After the fifth iteration of this pattern we'll have to melt it all down and start over again, only this time only steampunk tech will be allowed.

Ka-ching! Everysecond for Apple. See the $$$ increase as you watch.

Wayfair Engineering's Stack in a nutshell: PHP on Linux, and a few data backends, with continuous deployment at the pace of ~250 zero-downtime code pushes a day...We’re proud of how far we got as a business, from our founding in 2002 until 2010, on a keep-it-simple-stupid or KISS architecture of relational-database-backed web scripting...structure consisting of Hadoop and Vertica clusters, and some specialized, vertically-scaled big-memory and GPU machines, for analytical workloads... RabbitMQ and Kafka provide a kind of circulatory system for the data, and they are gradually replacing what traditional ETL we have... if we were starting Wayfair today, we would do it on public cloud infrastructure, for the speed-to-market aspect, for sure...We do a lot of virtualization, and we like it, but when various types of systems become very cookie-cutter or have certain types of requirements, we run physical boxes. Virtualization adds overhead, and it’s one more thing that can break.

How do you make your petabyte-scale analytics database fast? BigQuery executes queries completely in memory: BigQuery’s execution engine builds simple and purely in-memory operators and achieves petabyte-scale analytics through scalable data repartitioning, or "shuffle."...The shuffle step is required for execution of large and complex joins, aggregations and analytic operations...In-memory BigQuery shuffle stores intermediate data produced from various stages of query processing in a set of nodes that are dedicated to hosting remote memory...This API is carefully designed to provide a shared-memory abstraction at the right level: It's general purpose enough to support any form of data repartitioning and transfer in a data processing pipeline, yet it's specific enough to be implemented efficiently using Google networking technology...BigQuery uses a dynamic partitioning mechanism that intelligently chooses partitioning based on the type of operators used in the query, the data volume, background load and other factors.
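As a rough illustration of what "data repartitioning" means, here's a generic hash shuffle in Python. It is not BigQuery's in-memory shuffle tier, which the excerpt says keeps intermediate data on dedicated remote-memory nodes; here the partitions are just local lists. Rows with the same key are routed to the same partition, so each partition can aggregate its keys independently. `partition_for`, `shuffle`, and `sum_by_key` are hypothetical helpers.

```python
import zlib
from collections import defaultdict

def partition_for(key, num_partitions):
    """Stable hash routing: every producer sends a given key to the same partition."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

def shuffle(rows, key_fn, num_partitions):
    """Repartition rows so that all rows sharing a key land in one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_for(key_fn(row), num_partitions)].append(row)
    return partitions

def sum_by_key(partition, key_fn, value_fn):
    """Aggregate one partition on its own; no other partition holds these keys."""
    totals = defaultdict(int)
    for row in partition:
        totals[key_fn(row)] += value_fn(row)
    return dict(totals)

rows = [("us", 3), ("eu", 5), ("us", 4), ("apac", 1), ("eu", 2)]
partitions = shuffle(rows, key_fn=lambda r: r[0], num_partitions=4)
result = {}
for part in partitions:
    result.update(sum_by_key(part, lambda r: r[0], lambda r: r[1]))
print(result)
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, and every producer in a real shuffle has to agree on where a key goes.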

It could, but how many times have we seen centralized branded services lose to federated unbranded services? Disrupting Uber: Driver-owned apps could end Uber’s exploitative reign over the ride-share market.

You are considering using an exotic declarative, statically typed, purely functional programming language like Haskell, but you are afraid to take the risk. You want to know how it will work in production. Here's a story for you: A founder's perspective on 4 years with Haskell. The good news: Haskell lets you get stuff done with fairly limited knowledge. The bad news: there’s a lot to learn, even for experienced developers. Other findings: Haskell lacks tool support; it has decent libraries; it's a lazy language, so you have to grok space complexity; it's easy to refactor; they experienced no downtime in a heavily used system; the type system makes testing easier; it's not a quick prototyping language; hiring people wasn't as hard as you might think; don't expect to be a well-paid Haskell programmer; and it helps encourage a good engineering culture. The verdict: "there are many things I would have done differently at Better; choosing Haskell is not one of them."

Unikernel = embedded system + CGI

A short and honest assessment of Why we [Postgres] lost Uber as a user. It comes down to Uber's particular use case: That's a recipe for runaway table bloat; VACUUM can't do much because there's always some minutes-old transaction hanging around (and SNAPSHOT TOO OLD doesn't really help, we're talking about minutes here), and because of all of the indexes HOT isn't effective. Removing the indexes is equally painful because it means less efficient JOINs.
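To see why a "minutes-old transaction hanging around" defeats VACUUM, here's a deliberately simplified MVCC model in Python, not PostgreSQL's actual visibility rules: every update leaves the old row version behind with an `xmax` naming the transaction that superseded it, and a dead version can only be reclaimed once that transaction is older than the oldest one still running. It assumes every transaction commits and txids increase in start order, and it ignores HOT and wraparound; `TupleVersion` and `vacuum` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TupleVersion:
    key: str
    xmin: int             # txid that created this row version
    xmax: Optional[int]   # txid that superseded it, or None if it's the live version

def vacuum(versions: List[TupleVersion], oldest_active_xid: int) -> List[TupleVersion]:
    """Keep any version that some still-running transaction might need to see.

    Simplified model: every xmax is assumed committed and txids are assigned in
    start order, so a dead version is reclaimable only if it was superseded
    before the oldest running transaction began.
    """
    return [v for v in versions if v.xmax is None or v.xmax >= oldest_active_xid]

# Five quick updates to one row, by transactions 101..105.
versions = [TupleVersion("a", x, x + 1) for x in range(101, 105)]
versions.append(TupleVersion("a", 105, None))

print(len(vacuum(versions, oldest_active_xid=100)))  # long-running txn 100 still open
print(len(vacuum(versions, oldest_active_xid=106)))  # horizon has moved past the updates
```

With the old transaction still open, all five versions survive; once the horizon passes 105, only the live version remains. That's the runaway-bloat pattern from the Postgres post, in miniature.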

Videos from Serverlessconf NYC 2016 are available.

The Machine Learning blog with an excellent summary of talks and papers from ICML 2016 (International Conference on Machine Learning). ICML 2016 was awesome.

Multicore systems are now ubiquitous. So how do you make lockless synchronization fast on a multicore CPU? Samy Al Bahra of Backtrace reveals all. The key isn't just the synchronization primitives you choose. Lock-free concurrent data structures also need a safe memory reclamation scheme (quiescent-state-based reclamation, epoch-based reclamation, hazard pointers, and so on), which allows for performant and robust memory management that is also suitable for advanced concurrent programming techniques such as non-blocking synchronization. The talk is on the paper Making Lockless Synchronization Fast: Performance Implications of Memory Reclamation. It's a good talk, with lots of wonderful detailed illustrations and explanations of what's going on. It's a bit long, so be sure to reclaim some time to watch.
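Epoch-based reclamation is one of the schemes the paper compares. The single-threaded Python model below only simulates its bookkeeping so the logic is easy to follow; a real implementation needs atomic operations and memory fences and would be written in C or similar. Readers enter and exit critical sections, recording the global epoch they saw; unlinked nodes are retired into a limbo list tagged with the current epoch; the global epoch advances only when every active thread has observed it, and at that point anything retired two epochs earlier is unreachable and gets freed. `EpochReclaimer` and its method names are hypothetical.

```python
from collections import defaultdict

class EpochReclaimer:
    """Single-threaded simulation of epoch-based reclamation bookkeeping."""

    def __init__(self, n_threads):
        self.global_epoch = 0
        self.local_epoch = [0] * n_threads   # epoch each thread saw on entry
        self.active = [False] * n_threads    # is the thread inside a critical section?
        self.limbo = defaultdict(list)       # epoch -> nodes retired during that epoch
        self.freed = []                      # stands in for actually freeing memory

    def enter(self, tid):
        """Pin the thread to the current epoch before it touches shared nodes."""
        self.active[tid] = True
        self.local_epoch[tid] = self.global_epoch

    def exit(self, tid):
        self.active[tid] = False

    def retire(self, node):
        """Called after a node is unlinked; pinned threads may still be reading it."""
        self.limbo[self.global_epoch].append(node)

    def try_advance(self):
        """Advance the epoch if every active thread has observed the current one."""
        for tid, is_active in enumerate(self.active):
            if is_active and self.local_epoch[tid] != self.global_epoch:
                return False
        self.global_epoch += 1
        # Every active thread entered in the previous epoch or later, so nodes
        # retired two epochs ago are unreachable and can be freed.
        self.freed.extend(self.limbo.pop(self.global_epoch - 2, []))
        return True

r = EpochReclaimer(n_threads=2)
r.enter(0)            # thread 0 starts reading in epoch 0
r.retire("node-A")    # another thread unlinks node-A during epoch 0
r.try_advance()       # epoch 0 -> 1: thread 0 has seen epoch 0
r.try_advance()       # refused: thread 0 is still pinned at epoch 0
r.exit(0)
r.try_advance()       # epoch 1 -> 2: node-A (retired in epoch 0) is freed
print(r.freed)
```

Note the flip side: a reader that stays pinned in an old epoch stops reclamation entirely. Hazard pointers bound the amount of unreclaimed memory even when a thread stalls, at the cost of extra work on every access.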

Did you see Star Trek Beyond? This is a gunpowder moment in warfare, when everything changes. Even if it's not this particular system, it will be something like it. An interesting part of the design is how it's a hybrid of AI + humans. 6th-generation Russian drone fighter jets to fly in swarms and enter near space: Each swarm will consist of the main vehicle which maintains general command with each of the remaining aircrafts programmed to perform particular tasks, such as reconnaissance, hitting ground targets or destroying enemy aircrafts...A pilot is performing tasks with his air group, while another [group] is a thousand of kilometers away. Let’s say, one ‘swarm’ has suffered losses. Then the second group can share aircrafts with the first one.

Lots of projects have tried writing a better TCP on top of UDP, only to end up recreating TCP in different clothes. Google is giving it a shot with QUIC, nicely explained in Google’s QUIC protocol: moving the web from TCP to UDP, which includes a startling quote from Jim Roskind (QUIC architect): I expect QUIC to largely displace TCP, even as QUIC provides any/all technology suggestions for incorporation into TCP. TCP is routinely implemented in the kernel, which makes evolutionary steps take 5-15 years (including market penetration!… not to mention battles with middle-boxes), while QUIC can evolve in the course of weeks or months.

Here's some real practical info for those who walk the dark side: Where there’s no money, there’s no money laundering. A judge sayeth Bitcoin Is Not Money, so you can't be charged for money laundering. So exchange value as you will.

What do you do if you have 5 different search services? You add another layer of indirection to rule them all. SuperRoot: Launching a High-SLA Production Service at Twitter: a scatter-gather Thrift aggregation service that sits between our customers and our indexes...presenting a consistent, logical view of the underlying decomposed, physical indices...we chose the Power of Two Choices (P2C) + Peak EWMA load balancing algorithm, which is designed to quickly move traffic off of slow endpoints...The VM team implemented asynchronous logging and the issue disappeared...Due to the incremental nature of our development process, there was no single day when we launched the SuperRoot.
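The SuperRoot excerpt names Power of Two Choices plus Peak EWMA. Below is a minimal Python sketch of the idea, not Twitter's or Finagle's implementation: each endpoint keeps a latency estimate that jumps immediately to any slower observation (the "peak" part) and otherwise decays toward new samples over time; its cost is that estimate multiplied by outstanding requests plus one; P2C samples two endpoints at random and sends to the cheaper one. `Endpoint`, `decay_seconds`, and `pick_p2c` are hypothetical names.

```python
import math
import random

class Endpoint:
    """One backend with a Peak-EWMA latency estimate (illustrative only)."""

    def __init__(self, name, decay_seconds=10.0):
        self.name = name
        self.decay_seconds = decay_seconds  # hypothetical smoothing time constant
        self.ewma = 0.0                     # latency estimate in seconds
        self.pending = 0                    # requests currently outstanding
        self.last_update = 0.0

    def observe(self, rtt, now):
        if rtt > self.ewma:
            # "Peak": a slower observation replaces the estimate immediately.
            self.ewma = rtt
        else:
            # Otherwise move toward the sample; the longer since the last update,
            # the more weight the new sample gets.
            w = math.exp(-(now - self.last_update) / self.decay_seconds)
            self.ewma = self.ewma * w + rtt * (1.0 - w)
        self.last_update = now

    def cost(self):
        # Latency scaled by queue depth; +1 so an idle endpoint's cost is its latency.
        return self.ewma * (self.pending + 1)

def pick_p2c(endpoints, rng):
    """Power of Two Choices: sample two distinct endpoints, keep the cheaper one."""
    a, b = rng.sample(endpoints, 2)
    return a if a.cost() <= b.cost() else b

backends = [Endpoint(f"b{i}") for i in range(5)]
for i, backend in enumerate(backends):
    backend.observe(0.010 if i else 0.500, now=1.0)   # b0 is slow
rng = random.Random(0)
picks = [pick_p2c(backends, rng).name for _ in range(1000)]
print(picks.count("b0"))
```

Here the slow `b0` is never chosen, because whenever it's sampled it's paired with a faster endpoint. Sampling two endpoints instead of scanning all of them keeps selection cheap and avoids every client stampeding to the single least-loaded backend. A production balancer would also have to update `pending` as requests start and finish, and decide how to treat endpoints with no latency data yet (here a fresh endpoint costs 0 and so attracts traffic).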

Great Ideas in Computer Architecture. Great UC Berkeley class (lecture notes) for those who want to dive into the deep end of the machine architecture pool. The big ideas? Abstraction; Moore's Law; Principle of Locality/Memory Hierarchy; Parallelism; Performance Measurement and Improvement; Dependency vs Redundancy.

Wayne Scarano predicts by 2021: PaaS usage will exceed IaaS; Serverless/Microservice/API architectures will standardize and evolve into InterCloud as a Service (ICaaS); Cloud/InterCloud Architecture role will supplant DevOps role (think LessOps).

When a prominent company like Uber openly declares their switch from Postgres to MySQL you know there's going to be some blow back. Would that it all was as well reasoned and polite (except perhaps the bit about average developers) as Use The Index, Luke's On Uber’s Choice of Databases. Lots of thoughtful counterpoints. The conclusion is Uber is following a long tradition of customizing MySQL to do their bespoke bidding: I believe Uber did not replace PostgreSQL by MySQL as their article suggests. It seems that they actually replaced PostgreSQL by their tailor-made solution, which happens to be backed by MySQL/InnoDB (at the moment). It seems that the article just explains why MySQL/InnoDB is a better backend for Schemaless than PostgreSQL.

A good question. Why don't companies use FreeBSD as much in production as Linux? nirvdrum: I'm not sure you're going to find one unifying answer, so I'll just contribute my own experience...Linux distributions are just a lot easier to set up and have traditionally enjoyed better packaging of proprietary software (graphics card drivers, RAID card drivers, applications, etc.). While FreeBSD has some very compelling advantages on the server, for those in my circle, it's not so much better as to justify the cognitive overhead in switching between two similar, but different, OSes...FreeBSD wasn't a first class OS on EC2 for years. This allowed an entire ecosystem of devops tools to evolve with essentially Linux-only support.

Neil Brown has written an amazingly comprehensive seven part series on Linux control groups, you know, that little invention that made containers possible.

Hot AWS tip: Using enhanced networking results in consistently lower inter-instance latency. More at How To Setup A Highly Available Multi-AZ Cassandra Cluster On AWS EC2.

We have a pivot. Looks like OpenStack is entering the GIFEE (Google Infrastructure for Everyone Else) space. A sensible plan for both sides. One needs to stay relevant and the other needs to establish a bulwark against AWS on as many fronts as possible. Now we just have to wait for the Americans to enter the war. OpenStack embraces Kubernetes to become a whole lot more like Google: OpenStack vendor Mirantis announced a collaboration with Google and Intel to rewrite key portions of OpenStack's infrastructure as Docker containers managed by Kubernetes. For the historically-sluggish OpenStack project, this is a dramatic move. And a promising one.

Getting robots to listen: Using Watson’s Speech to Text service. This is cool and all, but it's a shame this kind of tech remains under big company control. It won't really take off until it can be used freely in the wild. Will no one rid me of this turbulent priest!

benjaminwootton on A few ways in which this [GIFEE] stack can simplify your life: Abstract away elements of failover, disaster recovery, clustering, load balancing; Abstract your application services cleanly from underlying servers; Give you more consistency across and within development, test and production environments; Allow for immutable deployments; Allow for canary releasing, A/B releasing, rollback etc; Better isolation of processes and services; Move away from general purpose operating systems to lighter weight single purpose OS such as CoreOS.

How much faster is Doing Bayesian Data Analysis On the GPU? dragandj: For example, robust linear regression from chapter 17, that fits 300 points over 4 parameters (easy, but far from trivial) runs in 180 seconds in JAGS and 485 in Stan, in parallel with 4 chains, taking 20,000 samples. Bayadera takes 276,297,912 samples in 300 milliseconds, giving much finer-grained estimations. So, depending on how you count the difference, it would be 500-1000 times faster for this particular analysis, while per-sample ratio is something like 7,000,000 (compared to JAGS).
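The speedup figures can be recomputed from the numbers in the quote. This Python snippet uses only the quoted figures and assumes the 20,000 JAGS samples are the total across all four chains, which the comment doesn't state explicitly. Times are kept in integer milliseconds to avoid floating-point noise.

```python
# Figures quoted above (dragandj's comparison of Bayadera with JAGS and Stan).
JAGS_SECONDS = 180
STAN_SECONDS = 485
BAYADERA_MS = 300
JAGS_SAMPLES = 20_000            # assumption: total across the 4 chains
BAYADERA_SAMPLES = 276_297_912

# Wall-clock speedup for the whole analysis.
wall_clock_vs_jags = JAGS_SECONDS * 1000 / BAYADERA_MS
wall_clock_vs_stan = STAN_SECONDS * 1000 / BAYADERA_MS

# Samples per unit time, Bayadera relative to JAGS:
# (B_samples / B_ms) / (J_samples / J_ms) = B_samples * J_ms / (J_samples * B_ms)
per_sample_ratio = (BAYADERA_SAMPLES * JAGS_SECONDS * 1000) / (JAGS_SAMPLES * BAYADERA_MS)

print(f"wall clock: {wall_clock_vs_jags:.0f}x vs JAGS, {wall_clock_vs_stan:.0f}x vs Stan")
print(f"per-sample throughput: ~{per_sample_ratio:,.0f}x vs JAGS")
```

This gives 600x (JAGS) and about 1,617x (Stan) in wall-clock time, and a per-sample ratio of roughly 8.3 million. If the 20,000 samples were per chain instead, the per-sample ratio drops to about 2.1 million, so the quoted "something like 7,000,000" really does depend on how you count.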

What goes inside the transaction journal? Lots: The whole point of the journal is to make sure that if you had some sort of error (including power loss), you can use the information on the journal to recover up to the same point you were at before the failure.
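A minimal sketch of the idea in Python, assuming a simple JSON-lines journal rather than any real database's on-disk format: each transaction's writes are appended together with a commit record and `fsync`ed before the commit is acknowledged. On recovery, only transactions whose commit record reached disk are replayed, and a torn final line (a write cut off by power loss) ends the replay. `append_txn` and `recover` are hypothetical names.

```python
import json
import os

def append_txn(journal_path, txn_id, writes):
    """Append one transaction's writes plus a commit marker, then force them to disk."""
    with open(journal_path, "a", encoding="utf-8") as f:
        for key, value in writes.items():
            f.write(json.dumps({"txn": txn_id, "op": "set", "key": key, "value": value}) + "\n")
        f.write(json.dumps({"txn": txn_id, "op": "commit"}) + "\n")
        f.flush()
        os.fsync(f.fileno())  # durable before we acknowledge the commit

def recover(journal_path):
    """Rebuild state by replaying only transactions that have a commit record."""
    pending, state = {}, {}
    with open(journal_path, encoding="utf-8") as f:
        for line in f:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                break  # torn write at the tail (e.g. power loss): stop replaying
            if rec["op"] == "set":
                pending.setdefault(rec["txn"], {})[rec["key"]] = rec["value"]
            elif rec["op"] == "commit":
                state.update(pending.pop(rec["txn"], {}))
    return state
```

Writes from a transaction that crashed before its commit record are simply never applied. Real journals also checksum each record, since a torn write isn't always syntactically invalid the way it is in this toy format.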

Too bad blogs are dead. Blogging cells tell their stories using CRISPR gene editing: the CRISPR gene editing technique has been adapted to make cells keep a log of what happens to them, written inside their own DNA. Such CRISPR-based logging could have a huge range of uses, from smart cells that monitor our health from within, to helping us understand exactly how our bodies develop and grow. This exciting technology could record the biography of a cell, says synthetic biologist Darren Nesbeth of University College London, who was not involved in the work. For example, therapeutic immune cells could be engineered to patrol a person’s body, recording what they see and reporting back to clinicians when they are recaptured. “That’s just one of many possible examples,” says Nesbeth.

Why You Should Care about High-Dimensional Sphere Packing: Sphere-packing comes in when we are trying to find error-correcting codes for sending more complicated messages. In this case, we can think of the data we are transmitting as a point in some high-dimensional space...Our goal is to find a good set of points in 100-dimensional space that we can use as codewords...The sphere-packing problem asks how densely we can pack equal-size spheres in, say, 100 dimensions. For error-correcting codes, the centers of these spheres are our codewords...So we're looking for points in 100-dimensional space that are all at least some set distance away from each other. Surprise! We just found a sphere-packing problem in its natural habitat.
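The same geometry shows up in miniature with binary codes, where distance is the Hamming distance (the number of differing bits) rather than Euclidean distance. In this Python sketch the four 5-bit codewords are an illustrative choice, not a standard code: their minimum pairwise distance is 3, so radius-1 "spheres" around them don't overlap, and decoding to the nearest codeword corrects any single flipped bit.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(code):
    return min(hamming(a, b) for a, b in combinations(code, 2))

def correctable_errors(code):
    """Radius of the non-overlapping 'spheres' around codewords: floor((d - 1) / 2)."""
    return (min_distance(code) - 1) // 2

def decode(received, code):
    """Nearest-codeword decoding: pick the centre of the sphere we landed in."""
    return min(code, key=lambda c: hamming(received, c))

CODE = ["00000", "11100", "00111", "11011"]   # illustrative 5-bit code
print(min_distance(CODE), correctable_errors(CODE))
print(decode("10100", CODE))  # "11100" with its second bit flipped
```

Packing the spheres more densely means more codewords (more data per transmission) for the same error tolerance, which is exactly the trade-off the article describes in 100 dimensions.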

Mencius: Building Efficient Replicated State Machines for WANs: We present a protocol for general state machine replication – a method that provides strong consistency – that has high performance in a wide-area network. In particular, our protocol Mencius has high throughput under high client load and low latency under low client load even under changing wide-area network environment and client load. We develop our protocol as a derivation from the well-known protocol Paxos. Such a development can be changed or further refined to take advantage of specific network or application requirements.
