Stuff The Internet Says On Scalability For September 23rd, 2016

High Scalability

18 Sep 2016 — 14 min read

Hey, it's HighScalability time:

Will Minority Report for developers really help us program better? (Primitive)

If you like this sort of Stuff then please support me on Patreon.

October 2017: ICANN changes the DNSSEC root keys; $2.91M: cost of running Let's Encrypt; 20%: Amazon convenience tax; 100%: increase in spam; 6.2 km: Quantum teleportation across a metropolitan fibre network; March 18, 1982: birth of containers; 6 months: how long a lightning bolt can power a 60 watt bulb; trillions: EV cache hits per day @ Netflix; 5x: Spark is faster than MapReduce; billions: HTTP, Git and SSH connections served per day at GitHub; 28: # of websites in North Korea;

Quotable Quotes:
- @vgcerf: It is time to admit after 18 years that the multistakeholder model of Internet operation works. #yestoIANA
- @EricLathrop: Netflix found a 5x performance variation between AWS instances at the same price! They benchmark to avoid overpaying. @indirect #Strangeloop
- @swardley: Perfectly reasonable @NigelBarron. Larry's statements are ludicrous, play is to milk existing customers whilst hoping to find a new future.
- @BethanyMacri: Etsy is very anti-SOA. Monolith forever!
- janfoeh: I've said it before here and I'll say it again: the JS ecosystem is moving in the wrong direction. Sometimes I feel that with Javascript, we developers have taken something that wasn't ours, and we're in the process of destroying the best thing there ever was about it. So here we are, the single <script> tag having been replaced with compilers, transpilers, five mutually incompatible build systems, three different module systems in God knows how many implementations, frameworks changing their API every ten minutes and five thousand lines of NPM module code to be installed for even the simplest of tasks.
- marknadal: This is the way humans have been thinking for thousands of years. And guess what, I sat down with a large airline and had to warn them "we're not Strongly Consistent" and they laughed at me saying "you realize we've been booking seat reservations before there was internet, before you were born, and before there was cheap telephony. Seat reservation has never been strongly consistent - we used to have hundreds of travel agents booking seats and it would take 2 weeks before we would hear about it."
- Jason Feifer: All I have to do is go to another website and see the price is different, and I don't. It's crazy. Like, why am I not doing that? We're the problem.
- @cmeik: "The clock-free design paradigm I promote must eventually prevail. It fits Physics."
- @gabrielgironda: mclaren and apple are a great fit. all the stability of apple's software combined with the reliability of british automobiles
- Bryan Cantrill: The virtual machine is vestigial abstraction. We can not get to #serverless without getting rid of the VM.
- @dchetwynd: The number of US households that only use cellular data has doubled from 10% to 20% between 2013 and 2016 #strangeloop
- Ayende Rahien: In a word, the idea that having a larger amount of simpler queries is better is nonsense. In particular, it completely ignores the cost of going to the database. Sure, a more complex query may require the database to do additional work, and if you are using caching, then you’ll not have the data in the cache in neat “cache entry per row”. But in practice, this leads to applications doing hundreds of queries per page view, absolute reliance on the cache and tremendous cost at startup.
- @amar47shah: Four atomic operations describe the basic workings of a knitting machine -- @doridoidea #strangeloop
- @fnthawar: Memo from TJ Watson, Jr @IBM wondering how they lost super-computing leadership to 34 people "including the janitor"
- Ivan Pepelnjak: Just because clouderati talk about pets and cattle doesn't mean that it's true in typical enterprises. Likewise I could give you examples of network devices being treated like cattle (for example, every cable modem :D).
- @seancribbs: "Good system design is shaped by the boundaries, not the center" @ztellman #strangeloop
- @niall_obrien: People do realise that serverless doesn't have to equal AWS Lambda, right? Currently having fun with @Firebase and @webtaskio :)
- Evan Ackerman: What Made in Space wants to do with Project RAMA*, short for Reconstituting Asteroids into Mechanical Automata, is to make asteroids into self-assembled, self-contained, self-propelled, fully autonomous spacecraft.
- @danielbryantuk: "If I find a performance win at Netflix, I can deploy a fix to the cloud and have cost savings within the hour" @brendangregg #JavaOneConf
- alecmuffett: It wasn't even me, it was done before I arrived, but it was done by a team of geeks with a tremendous nose for making the best of the database that they had available to them without pulling the old password-migration "log in with one password, parallel-encrypt with a new algorithm, and save the new hashes" - thing, because some of those billion people might never log in again for years. You would never stop migrating people.
- Google: Algorithms Engineering is a lot of fun because algorithms do not go out of fashion
- Dan Rayburn: But the writing is on the wall and content owners should take note that at some point soon, RTMP will no longer be a viable option. It’s time to start making the transition away from RTMP as a delivery platform.
- Brain Coprocessors: In the future, the computational module of a brain coprocessor may be powerful enough to assist in high-level human cognition or complex decision making.
- @somic: you know what comes after #serverless? #codeless. then there is #engineeringless. and then it's #skynet.
- Matthew McCaffrey: The idea that gaming conventions are reflections of economic principles is just one example of the many opportunities for economic teaching presented by the mass appeal of gaming.
- erikpukinskis: what all of this boils down to is: avoid declarative control structures at all costs. Procedural design by default. There are a handful of situations where I will build a declarative API, but they are rare. I will use libraries, but only if they have a single well defined purpose and are largely procedural.
- @houglande: "Averages are lie-candy for your brain" speaking about gathering metrics -@indirect #strangeloop
- @etherealmind: The DevOps market reminds me strongly of the IT market in the 1980s. Camps, strongholds, silos and zero interoperability.
- mrob: Bread darkening accelerates with time under heating as the water levels reduce, so human vision isn't the only non-linearity here. For an ordinary consumer product I'd just test various breads to find the average time needed for light/medium/heavy toast, and linearly interpolate between those if I needed finer granularity, no gamma correction required. Hardly anybody knows about objective standards for toastedness, and they can always turn it up or down next time.
- @ryaneshea: Crazy: the Ethereum hash rate is crashing due to a memory bug in Geth, the Go implementation
- @tcrawford: Even if Oracle gave IaaS away for free...they would still struggle to compete w/ #AWS for a number of reasons. #oow16

Interesting results from a major architecture change at Netflix. Zuul 2: The Netflix Journey to Asynchronous, Non-Blocking Systems. Netflix moved from a blocking, connectionless, servlet-based architecture to a non-blocking, asynchronous, connection-based architecture. In general, from a latency, CPU, throughput, and capacity perspective the async version didn't perform much better than the old sync version. Netflix found "the less work a system actually does, the more efficiency we gain from async", which makes sense in terms of scheduling and IO. There was a big win, however, in the ability to scalably maintain over 83 million persistent connections, one for every client, back into their cloud infrastructure. The cost of a connection becomes a file descriptor instead of a thread, which is a lot cheaper. By using a persistent connection Netflix can reduce overall device requests, improve device performance, understand and debug the customer experience better, enable more real-time user experience innovations, and reduce overall cloud costs by replacing “chatty” device protocols today (which account for a significant portion of API traffic) with push notifications. Operations did take a hit. Sync systems are much easier to understand and debug. Also, making the migration was not easy. Changing sync code to async is not for the faint-hearted.
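To make the thread-versus-file-descriptor point concrete, here is a minimal sketch using Python's asyncio. It is not Netflix's implementation, and the "connections" are simulated as parked coroutines rather than real sockets; the names `idle_connection` and `hold_connections` are hypothetical. It only illustrates that an event loop can keep many idle connections open without dedicating a thread to each.

```python
import asyncio
import threading


async def idle_connection(release, counter):
    # A simulated persistent connection: it parks on an event
    # instead of blocking a dedicated thread while it waits.
    counter["open"] += 1
    await release.wait()
    counter["open"] -= 1


async def hold_connections(n):
    release = asyncio.Event()
    counter = {"open": 0}
    tasks = [
        asyncio.create_task(idle_connection(release, counter))
        for _ in range(n)
    ]
    # Yield once so every task runs up to its first await.
    await asyncio.sleep(0)
    peak_open = counter["open"]
    # Threads in the whole process while all connections are parked.
    threads = threading.active_count()
    # Let every simulated connection finish.
    release.set()
    await asyncio.gather(*tasks)
    return peak_open, threads


peak_open, threads = asyncio.run(hold_connections(10_000))
print(f"{peak_open} idle connections held by {threads} thread(s)")
```

A thread-per-connection server would need one thread for each of those waiting clients, which is the cost the async design avoids.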

This is hilarious. Read the whole thread. You won't be disappointed. @stef: You are in a startup. All around is a burning runway. There are exits to the North and East. You have a bootstrap. There is a VC here.

Oracle Cloud is a "minimum viable product" with a bare-metal focus and hopefully attractive lower price point, according to this excellent overview of Oracle’s next-gen cloud IaaS offering: Oracle has paid richly to hire an “A” team, so to speak — former long-time senior AWS engineers lead the project, and they’ve recruited heavily from all three hyperscale cloud providers in Seattle (AWS, Microsoft Azure, Google Cloud Platform)...SDN (capable of both Layer 2 and Layer 3 networking)...bare-metal servers (thus the initial moniker, “Oracle Bare Metal Cloud”)...Virtual machines (VMs) are coming later this year, with containers to follow early next year...based on a detailed engineering briefing that Oracle provided to myself and my colleagues, I would say that smart and scalable choices seem to have been made throughout...would characterize this early offering as minimum viable product. See also Next week, Oracle will start a price war with Amazon over cloud computing.

Videos from Strange Loop 2016 are available. Languages for 3D Industrial Knitting is absolutely fascinating. Here are some other recommendations.

Google has a new TCP congestion control algorithm. BBR (Bottleneck Bandwidth and RTT): BBR has significantly increased throughput and reduced latency for connections on Google's internal backbone networks and google.com and YouTube Web servers. BBR requires only changes on the sender side, not in the network or the receiver side. Thus it can be incrementally deployed on today's Internet, or in datacenters...In a nutshell, BBR creates an explicit model of the network pipe by sequentially probing the bottleneck bandwidth and RTT.
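As a rough illustration of what "an explicit model of the network pipe" could look like, the sketch below tracks the highest recent delivery rate and the lowest recent RTT and multiplies them into a bandwidth-delay product. This is a toy, not the BBR algorithm: the class name `PipeModel`, the sample window of 10, and the sample values are all assumptions made for the example.

```python
from collections import deque


class PipeModel:
    """Toy path model: windowed max bandwidth and windowed min RTT."""

    def __init__(self, window=10):
        # Hypothetical window: how many recent samples to remember.
        self.bw_samples = deque(maxlen=window)
        self.rtt_samples = deque(maxlen=window)

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        # Delivery rate seen over this ACK interval, in bytes per second.
        self.bw_samples.append(delivered_bytes / interval_s)
        self.rtt_samples.append(rtt_s)

    def bottleneck_bw(self):
        # Best rate seen recently approximates the bottleneck bandwidth.
        return max(self.bw_samples)

    def min_rtt(self):
        # Lowest RTT seen recently approximates the queue-free delay.
        return min(self.rtt_samples)

    def bdp_bytes(self):
        # Bandwidth-delay product: data the path holds without queuing.
        return self.bottleneck_bw() * self.min_rtt()


model = PipeModel()
# Made-up samples: bytes delivered, measurement interval, measured RTT.
model.on_ack(delivered_bytes=125_000, interval_s=0.010, rtt_s=0.040)
model.on_ack(delivered_bytes=100_000, interval_s=0.010, rtt_s=0.055)
print(f"estimated BDP: {model.bdp_bytes():.0f} bytes")
```

Everything here is computed from what the sender itself observes, which is consistent with the claim that only the sender side has to change.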

Videos from Earth Engine User Summit 2016 are available, and available, and available.

There's an irony that most of the repository platforms used to manage open source projects are not themselves open source. GitHub vs. Bitbucket vs. GitLab vs. Coding. When the crash happens it will be deafening.

Functional at Scale: Functional programming promotes thinking about building complex behavior out of simple parts, using higher-order functions and effects to glue them together. At Twitter we have applied this line of thinking to distributed computing, structuring systems around a set of core abstractions that express asynchrony through effects and that are composable. This allows building complex systems from components with simple semantics that, preserved under composition, makes it easier to reason about the system as a whole.

Don't think this is true. Code is created based on intent. Chemicals preexist and are put together simply for their reactions. @strickinato: "Reading code to understand a program is like looking at chemical reactions and trying to bake a cake" -- @inconshreveable #strangeloop.

If you were to start a bank de novo what would your architecture look like? Here's what Monzo came up with. Building a Modern Bank Backend. It's fully buzzword compliant, but is that a good thing? Good discussion on Hacker News. It will be most interesting in 10 years to see if they've met their goals. There's a wisdom in starting a system from a small working kernel and extending it based on actual problems that need solving. jhuckestein: I agree that using microservices and kubernetes is not what allows us to provide a good user experience. We could provide the same user experience if we had built everything on rails and postgres. In fact, that would have been much easier. The reason we are investing so heavily in a rock solid platform is that we want to still be able to offer the best possible user experience in 10 years time. This is what we mean when we say we want the platform to be "extensible". Many large banks' IT systems are not extensible, in the sense that it is very expensive to make changes. Take, for example, the ability to freeze a card in the app at the tap of a button. A friend at RBS told me that they considered this feature many times, but it took a long time to work out which 20 IT systems would need changing and some of them had been under change freeze for a few years. So eventually the idea was discarded as too expensive. The freeze card feature would be easy to implement for a startup, regardless of their stack. The key is to still be able to implement such a feature easily in 10 years time :)

I'm shocked, shocked I tell you. Amazon’s Algorithms Don’t Find You the Best Deals: The ease of this clearly wins many people over: apparently the majority of shoppers buy the suggested item. But the cost difference between the algorithm-selected choice and the cheapest version available to buy elsewhere on the site was, on average, $7.88 for the 250 products, adding up to a 20 percent inflation across all the items.

Videos from HashiConf 2016 are available.

How Up Hail Used AWS to Evolve from a Side Project to a Business: The first step was decoupling our monolithic application and separating responsibilities across several virtual machines (also known as instances), thereby creating a multi-tier (or n-tier) architecture...Our app and database servers were deployed onto independent hosts, with security groups in place that defined communication rules between them. Our app servers now live in an Auto Scaling group, with a minimum of two production machines running simultaneously at all times. In case of a traffic spike, new virtual machines spin up in real time in response to demand...We use an Elastic Load Balancing (ELB) load balancer to distribute the load across the cluster. We moved our static assets to Amazon S3, and we use Amazon CloudFront to serve our content for faster delivery to our users. Additionally, we configured Amazon CloudWatch alarms to send SMS and email alarms to our team should anything be unreachable. We created Lambda functions for backups, snapshots, and automated testing. We created security groups to ensure that our databases are not accessible via the public Internet and only whitelisted IP addresses can access our internal infrastructure.

This gives wardriving a completely new meaning. Car Hacking Research: Remote Attack Tesla Motors by Keen Security Lab.

Brief Talk at the Storage Architecture Meeting: But for forensic purposes, for example after an attack, they would like to be able to reconstruct the small fraction of the total that was relevant. This is becoming possible. By storing a small amount of additional data with the extracted features, and devoting an immense amount of computation to the task, the original video can be recovered. This provides

Here are all the papers from the Very Large Data Bases (VLDB) 2016 conference.

It wasn't supposed to be like this. Why Middlemen Are Taking Over The Global Economy: Actually, the eBay sellers were right. What they're doing is legal. It's called arbitrage. That's when you buy something at one price, turn around and sell it somewhere else at another price. But this was arbitrage with a twist because these eBay sellers never actually touched a Ripple Rug. They would advertise it and wait. When they sold one, they would buy it from Fred Ruckel's Amazon store for a lower price, have it shipped directly to their customer and pocket the difference. They never owned the rug. They took on no risk.

Love it. Teen Creates App So Bullied Kids Never Have To Eat Alone.

A great step-by-step walkthrough of search performance. Performance estimation: a worked example using bloom filters. Grep can only take you so far. After you've saturated memory bandwidth you need indexes and horizontal scaling. And of course bloom filters: we can think of a bloom filter as a data structure that represents a set using a bit vector and a set of independent hash functions.
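The bit-vector-plus-hash-functions description maps to very little code. The following is a minimal sketch, not the implementation from the linked article: the sizes (1024 bits, 3 hashes) are arbitrary, and deriving several hash functions by salting SHA-256 with an index is a shortcut chosen for the example.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        # Arbitrary example sizes; real filters are sized from the
        # expected item count and the target false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Simulate several hash functions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


empty = BloomFilter()
index = BloomFilter()
for term in ["scalability", "netflix", "bbr"]:
    index.add(term)

print(index.might_contain("netflix"))
print(empty.might_contain("netflix"))
```

An added item is always reported as possibly present, so there are no false negatives. An item that was never added can occasionally be reported as present too, which is the false-positive trade-off that buys the compact representation.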

This is how you know the blockchain is no longer cool. KPMG Launches Blockchain Service in Canada: KPMG’s Digital Ledger Services include full lifecycle support from strategic qualification and business case development to relevant use-case development, systems and operations integration, and ongoing management of a company’s blockchain infrastructure. The lifecycle support includes management consulting, risk consulting in financial processes and regulation. KPMG’s in-house coding and development will also be part of the services offered to clients.

A very good question for the future of ecommerce. Fashion forward: Eventually, though, Amazon will build a strong offering, and consumers will be called upon to decide: do they want a one-stop-shop for everything, from electric toothbrushes to Jimmy Choo shoes? Zalando’s hope is that there is still something special about shopping for fashion, even if it’s done while waiting for the bus.

Show me your lightning bolt! eBay is using AMP for instant loading of ecommerce pages. Experience the Lightning Bolt. AMP does require a different production pipeline as AMP isn't as expressive as the real web. There are stale cache issues. In the future they are thinking of serving all AMP pages on mobile to reduce the duplication of effort.

Keynote videos from the O'Reilly Velocity Conference in New York 2016 are available.

Astonishing disappointment with AWS’s API Gateway. Can you really be an API gateway and not support binary data? Maybe HTTP gateway would be a better name.

Safe and unsafe operations for high volume PostgreSQL. A detailed list of safe and unsafe operations, where safe means the command can be executed without downtime. The reason: if I run a bad command, it can lock out updates to a table for a long time.

Go is not the only game in town. Really thoughtful article on My experience rewriting Enjarify in Rust: Python is slower than Go and Go is slower than Rust. However, I was surprised by just how big the gaps were, especially between Go and Rust. I guess all those lifetime annotations and long compile times really pay off. I think this does emphasize how Go isn’t quite on the same level of “systems” language as C++ and Rust are.

Well that's clear as mud. purescript-parallel/src/Control/Parallel/Class.purs

Color may be all in your head but that doesn't mean there's no way to do it right. And if you like examples of power laws in action you'll like human sensory perception. What every coder should know about gamma: the only reason to use gamma encoding for digital images is because it allows us to store images more efficiently on a limited bit-length.
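For a feel of what gamma encoding does to pixel values, here is a small sketch using the standard sRGB transfer functions. It is not code from the linked article; the function names are made up for the example, and values are floats in the range 0 to 1.

```python
def srgb_to_linear(c):
    # Decode an sRGB-encoded value (0..1) to linear light.
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(v):
    # Encode a linear-light value (0..1) with the sRGB curve.
    if v <= 0.0031308:
        return v * 12.92
    return 1.055 * v ** (1 / 2.4) - 0.055


black, white = 0.0, 1.0

# Naive blend: average the encoded values directly.
naive_mix = (black + white) / 2

# Blend in linear light, then re-encode for storage or display.
linear_mix = linear_to_srgb(
    (srgb_to_linear(black) + srgb_to_linear(white)) / 2
)

print(f"naive: {naive_mix:.3f}  linear-light: {linear_mix:.3f}")
```

The two blends disagree because the encoded values are not proportional to light intensity. The curve spends more of the available code values on dark tones, which is the storage-efficiency argument the article makes.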

supergiant/supergiant: an open-source container orchestration system that lets developers easily deploy and manage apps as Docker containers using Kubernetes.

bitfunnel.org: an experimental open-source information retrieval system. The goal of this site is to explain how BitFunnel works. We will cover the math and science in order to lay down a theoretical foundation, but to understand the algorithm in practice, it helps to see real, working code.

roughtime / roughtime: Roughtime is a protocol that aims to achieve rough time synchronisation in a secure way that doesn't depend on any particular time server, and in such a way that, if a time server does misbehave, clients end up with cryptographic proof of it.

deepstream.io: a fast, secure and scalable websocket & tcp server for mobile, web & iot

The Tyranny of the Clock: Since my 1988 Turing lecture, I have been exploring an alternative “clock-free” design paradigm. I seek change in the design paradigm to cast off the tyranny of the clock. Instead of making all logic “march to an external drum beat,” let us allow each logic element to proceed at its own pace. Because each element acts only when and if necessary, such a paradigm shift will lead to designs that save energy. The clock-free paradigm will also make computers go faster because doing away with border-crossing delays speeds communication.

L4 Microkernels: The Lessons from 20 Years of Research and Deployment: We demonstrate that while much has changed, the fundamental principles of minimality, generality and high inter-process communication (IPC) performance remain the main drivers of design and implementation decisions.

Why does deep and cheap learning work so well? (video): We show how the success of deep learning depends not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can be approximated through "cheap learning" with exponentially fewer parameters than generic ones, because they have simplifying properties tracing back to the laws of physics.

The End of Slow Networks: It’s Time for a Redesign: In this paper, we first argue that traditional distributed DBMS architectures cannot take full advantage of high-performance networks and suggest a new architecture to address this problem. Then, we discuss initial results from a prototype implementation of our proposed architecture for OLTP and OLAP, showing remarkable performance improvements over existing designs.

TrafficDB: HERE’s High Performance Shared-Memory Data Store: This paper presents TrafficDB, a shared-memory data store, designed to provide high rates of read operations, enabling applications to directly access the data from memory. Our evaluation demonstrates that TrafficDB handles millions of read operations and provides near-linear scalability on multi-core machines, where additional processes can be spawned to increase the systems’ throughput without a noticeable impact on the latency of querying the data store.

Nitro: A Fast, Scalable In-Memory Storage Engine for NoSQL Global Secondary Index: a high-performance in-memory key-value storage engine used in Couchbase 4.5 Global Secondary Indexes. The Nitro storage engine is well suited for the recent hardware trends like large amounts of memory and many CPU cores. The storage engine leverages latch-free data structures and tries to achieve linear scalability for the index read-write operations.

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics: Overall, our experiments show that Spark is about 2.5x, 5x, and 5x faster than MapReduce, for Word Count, k-means, and PageRank, respectively. The main causes of these speedups are the efficiency of the hash-based aggregation component for combine, as well as reduced CPU and disk overheads due to RDD caching in Spark. An exception to this is the Sort workload, for which MapReduce is 2x faster than Spark. We show that MapReduce’s execution model is more efficient for shuffling data than Spark, thus making Sort run faster on MapReduce.
