Stuff The Internet Says On Scalability For January 6th, 2017

High Scalability

Jan 6, 2017 — 17 min read

Hey, it's HighScalability time:

Hot rods in space. The Smith Cloud plummets towards our galaxy at nearly 700,000 mph. Vroom!
If you like this sort of Stuff then please support me on Patreon.

3 of top 5: Stackoverflow questions are about Git; 3,000: four-passenger cars could serve 98 percent of NYC taxi demand; 44%: US population lives within 20 miles of Amazon fulfillment center; 72%: Amazon customers shopped using mobile device; 110%: increase in industrial control system attacks; 455: Number of scripted television series aired this year; $28.5 billion/yr: App downloads on iOS;

Quotable Quotes:
- @ValaAfshar: Number of robots working in Amazon warehouses: 2016: 45,000 / 2015: 30,000 / 2014: 15,000 / 2013: 1,000 — @JonErlichman
- @jason_kint: updated duopoly #s. new IAB data came out yesterday. easy to run vs earnings for goog and fb, it's evident everyone else is zero sum game.
- rb2k_: I also haven't seen one [company in Germany] that isn't riddled with MBA grads that mainly push Jira tickets around.
- Joe McCann: The best software developers I know are always hacking over the holidays. True story.
- @kaffeecoder: Sigh. Async vs blocking protocol is irrelevant. What matters is communicating with other services outside your own req/response cycle.
- Eric Jang: It's not a coincidence that Nvidia, the literal arms-dealer of deep learning, has had a good year in the stock market.
- @markimbriaco: Just read a comment that said "Any good codebase has every part perfectly isolated". Oh, to be young and optimistic about software again.
- @swardley: Asked "What do I think is the biggest impact AI would have?" ... hmmm, the largest erosion of social mobility in human history?
- The Attention Merchants: It is therefore more effective for the State to intervene before options are seen to exist. This creates less friction with the State but requires a larger effort: total attention control.
- StorageMojo: The cloud’s collateral damage to the legacy IT vendors continues to spread. A few billion here and a few billion there, and pretty soon you’re talking real money.
- Janakiram MSV: The key takeaway is that Amazon wants enterprises to consume EC2 while it is pushing startups and developers towards Lambda. This move from Amazon will fuel the growth of serverless computing in the industry.
- Maxime Chevalier-Boisvert: Edsger Dijkstra famously said, “The question of whether machines can think is about as relevant as the question of whether submarines can swim.”
- @karlseguin: Microservices without asynchronous messaging (queues) is actually a monolith with really slow and error prone method invocation.
- AshleysBrain: We've been using WebRTC Datachannels for multiplayer gaming in the browser in our game editor Construct 2 (www.scirra.com) for a couple of years now. Generally they work great! However the main problem we have is switching tab suspends the game, which if you're acting as the host, freezes the game for everybody. This is really inconvenient.
- @lstoll: 2017: Year of the return of three tier architecture.
- @tealtan: “I will never make a racial profiling database!” *continues working on social networks, analytics, ad tech*
- @abt_programming: Inverse bus factor: "how many developers have to be hit by a bus before a project starts to proceed smoothly?” - @gasproni
- M.G. Siegler: The numbers speak for themselves. 2 billion words written on Medium in the last year. 7.5 million posts during that time. 60 million monthly readers now. Pageviews galore. So step 2 is simply to slap some banner ads on the site, while step 3 is to profit, right?
- snarf21: Writing software is hard but to me the hardest part is always taking a random abstract concept from someone's mind (or worse, several people) and converting that into something "real" in a fixed timeline and budget. There will have to be lots of tradeoffs and miscues by definition. We are always making something that doesn't already exist, it is creation and creation is hard.
- @Pinboard: Who could have foreseen the always-on home microphone might be of interest to the cops?
- @ThePracticalDev: I heard a rumor that Santa moved over to AWS this year. Big if true.
- Drew Purves~ “intelligence” extends beyond brains; something as simple as self-replicating RNA exhibits intelligent behavior at the evolutionary scale. The natural world is fractal, cyclic, and fuzzy...in a biosphere, every organism is a resource to another organism. That is, learning and adaptation of each organism is not independent of other organisms.
- @meatcomputer: System clocks are always accurate and increase monotonically. Timestamps from remote machines are reliable
- pjmlp: Everything on web development feels like an hack.
- @kelseyhightower: In my opinion Serverless does not mean FaaS. I consider any platform that hides the management of servers from the user to be Serverless.
- Amit: one of the lessons I learned from this journey was that the tutorials work best when I've needed that technology for a real project
- @Carnage4Life: Snapchat copied all the worst parts of Apple's culture & seen success. More copycats to come
- ch: So all that's missing with the decentralized web is a centralized service to aggregate the decentralized streams?
- @mathiasverraes: "Separation of intent and implementation" is probably a much more useful programming principle than all of SOLID combined.
- doh: We moved back and forth between AWS and GCE (based on who gave us free credits). Once we ran out, we chose GCE and never regretted it. GCE has many quirks, for instance the inconsistency between API and the UI, it misses the richness of the services offered by AWS but everything GCE does offer is just faster, more stable and much more consistent.
- Exponential Laws: We have argued that exponential growth would not have succeeded without sustained exponential growth at three levels of the computing ecosystem—chip, system, and adopting community. Growth (progress) feeds on itself up to the inflection point.

Measuring a gnat's eyebrow at a billion miles. Ivan Linscott tells the thrilling story behind the development of the New Horizons probe to Pluto. a16z Podcast: New Year, New Horizons — Pluto! Completing the probe was a close thing. Finding enough plutonium to power the system almost didn't happen. Enough wasn't found, so the probe had a much lower power budget than originally spec'ed, which caused the communication system to use one FPGA instead of two. You have to use radiation hardened parts. The chips sit right next to a pile of plutonium pumping out gamma rays and neutrons. The FPGAs had a capacity of a million gates, were hardened by design, and had triple redundancy. Each gate in the array is implemented in threes. They are voted in pairs. If three agree then fine. If two agree that's the value used. They fit all the code with 5 gates of margin. They also had a hero's journey sourcing a high precision oscillator. And then the frightening story of when the watchdog timer timed out and put the probe in safe mode. It turned out the JPEG compression algorithm took too long to compress an image of Pluto and that caused the timeout to fire. The reason is one of those crazy testing stories. When this feature was tested the picture of the sky was darker so it took less time to compress!
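The two-out-of-three voting described above can be illustrated with a bitwise majority function. This is only a minimal software sketch of the idea, not the probe's FPGA logic; the function name and the 8-bit example value are made up for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of the same value.

    Each output bit is 1 when at least two of the three inputs have
    that bit set, so a single corrupted copy is outvoted.
    """
    return (a & b) | (a & c) | (b & c)


# Three copies of the same 8-bit value; one copy suffers a single bit flip.
good = 0b1011_0010
flipped = good ^ 0b0000_1000  # bit 3 upset in one copy only

print(bin(majority_vote(good, good, flipped)))  # 0b10110010
```

If two copies were corrupted in the same bit position the vote would pick the wrong value, which is why this scheme only protects against single upsets per voted bit.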

The impulse for folks at Twitter to delay Trump's tweets and insider trade on that information must be overwhelming.

33C3 (Chaos Computer Congress) videos are now available. Great overview by Chris Hager. Lots of interesting talks. You might like: Dissecting modern (3G/4G) cellular modems; Edible Soft Robotics - An exploration of candy as an engineered material; Software Defined Emissions - A hacker’s review of Dieselgate; Rebel Cities - Towards A Global Network Of Neighbourhoods And Cities Rejecting Surveillance.

A compelling breakdown of the DNC phishing attack. Making everything viewable through a generic UI and everything programmable through a scriptable API has interesting consequences. @pwnallthethings: Could have hacked? Sure. Did hack? No. Let me go through why not...The hackers weren't hacking one-by-one; so URL contraction wasn't done manually. It was done via the Bitly API...Why did the hackers include this info? Same reason they contracted links via API. Because they're not hacking 1-by-1. Are hacking at scale...When hackers hack at scale, they reuse infrastructure. They make mistakes. This isn't unusual. You can piece the bits together.

In the game of data you want to be at the top of the data gravity well. When you are down the well, nothing escapes without great cost. AWS Snowball.

The future is a strange place. 7,500 Faceless Coders Paid in Bitcoin Built a Hedge Fund’s Brain. antognini: I've been participating in Numerai for a few months now. (I've only made some beer money from it, nothing serious.) When you get the data, you have no idea what it is. It's just a file with ~70,000 data points, each of which has 21 features. Each feature is uniformly distributed between 0 and 1. All you have to do is make a binary classification of 0 or 1 (or, more accurately, the probability that the data point is in class 0 or 1). They don't tell you what the 21 features represent. As far as you know, these predictions could be used to make currency trades, or stock predictions, or real estate purchase, or something more exotic. You really have no idea. And since you don't know what these data points represent you can't use any insider knowledge about anything to help you.

It was even more interesting when I misread warehouse as whorehouse. @AustenAllred: It's real. Amazon just patented a floating warehouse that spits out drones.

Moore's Law is now more of a suggestion whispered into the wind. Intel Core i7-7700K review: Kaby Lake debuts for desktop: But the hard truth is, Intel’s “Optimization” step doesn’t seem to have delivered all that much in the way of concrete benefits. Best-case, Kaby Lake is about 10% faster than Skylake in a modestly improved power envelope. Considering that Skylake launched 18 months ago now, that’s not much improvement to deliver.

Police seek Amazon Echo data in murder case. What if someone identified their attacker? What if someone updated their will? What if someone poured their heart out to their loved ones as the last few seconds of life drained away? The future is a strange place.

Would it make a difference if VM centric pricing was dropped for staging container services in favor of a resource usage based pricing model? TNS Analysts Show #116: 2016 Year End Wrap-Up: Discussing Docker, OpenStack, and Open Source~ You pay for as many VMs as are necessary to stage containers. So the more distributed your containers are, the more expensive it becomes. That is what has kept costs prohibitive up to now. I expect there will be a time when somebody will test the waters to try a different pricing model that is based on the resources that containers actually consume in terms of compute units, storage units, and/or bandwidth. Making it more cost effective. Then there will be new classes of industries that will say maybe containerization is the way to go. The class I'm really thinking about is health care.

You think you have debug logs? SpaceX Anomaly Updates: Investigators scoured more than 3,000 channels of video and telemetry data covering a very brief timeline of events – there were just 93 milliseconds from the first sign of anomalous data to the loss of the second stage, followed by loss of the vehicle.

Aeron is a good choice when you have a lot of communicating modules. Interview with Adam Gibson, Creator of Deeplearning4j: Why Aeron Matters: Deeplearning4j is a whole connected ecosystem of libraries focused on deep learning applications...Aeron is literally a raw UDP library. What it does is facilitate peer-to-peer transactions via something called a Media Driver...In essence, it does network communications and it's vastly faster than say, Google RPC...What we're seeing now, just from our communications and compression, is a ten-times speed-up so far.

Time reversal is a tricky thing for programs not used to traveling through time. How and why the leap second affected Cloudflare DNS: When RRDNS selects an upstream to resolve a CNAME it uses a weighted selection algorithm. The code takes the upstream time values and feeds them to Go’s rand.Int63n() function. rand.Int63n promptly panics if its argument is negative. That's where the RRDNS panics were coming from.
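The failure mode above has a close analogue in Python: `random.Random.randrange` raises `ValueError` for a non-positive bound, much as Go's `rand.Int63n` panics. Below is a minimal sketch of a weighted pick that guards against a negative time value. It is not Cloudflare's RRDNS code; the weighting scheme, the clamp to 1, and the name `pick_upstream` are all assumptions made for illustration.

```python
import random
from typing import Dict


def pick_upstream(rtts_ms: Dict[str, int], rng: random.Random) -> str:
    """Pick an upstream at random, favouring those with a lower measured RTT.

    A measured RTT can come out negative if the wall clock steps backwards
    between the start and end of a measurement (for example around a leap
    second), so every value is clamped to at least 1 before it is used.
    """
    # Clamp bogus samples so no negative number reaches the random bound.
    safe = {name: max(1, rtt) for name, rtt in rtts_ms.items()}
    slowest = max(safe.values())
    # Hypothetical weighting: faster upstreams get more tickets, minimum 1.
    weights = {name: slowest - rtt + 1 for name, rtt in safe.items()}
    ticket = rng.randrange(sum(weights.values()))
    for name, weight in weights.items():
        if ticket < weight:
            return name
        ticket -= weight
    raise AssertionError("unreachable: ticket is always below the total weight")


rng = random.Random(42)
print(pick_upstream({"a": 20, "b": -5, "c": 40}, rng))
```

Clamping is only one option. Measuring elapsed time with a monotonic clock such as `time.monotonic()` avoids negative durations in the first place, and discarding the bad sample may be preferable to letting it look like the fastest upstream.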

One cloud to rule them all. Why the Computing Cloud Will Keep Growing and Growing: “He [Andy Jassy, head of AWS] wasn’t explicit, but if you were hoping to invest in storage, computing — anything below applications — you are hosed,” said Dharmesh Thakker, a partner at Battery Ventures, who attended the lunch. “Andy is smart and approachable, but reading between the lines, I’m not sure this is good for the V.C. ecosystem.”

Lots of good advice and practices. Overhauling Loco2's hosting infrastructure with AWS, Docker and Terraform.

0 to 1 Million: Scaling my side project to 1 million requests a day. Good story. Started by running everything on a $5/month Digital Ocean droplet. Traffic grew so moved to a $20 per month droplet and moved from Apache to Nginx. For the next growth phase here are some practices: Moved file storage to AWS S3; Moved Mongo to Compose.io; Moved Nginx to its own server; Added more Node servers.

Why are you using containers? Why not? What is holding you back? awal: I find containers to be semantically broken for that purpose. What developers want is better and easier management of state/configuration, and I don’t see how putting all that non-managed state into a deeper room is the solution. How about we fix the problem where it exists instead of covering it under rugs and calling it a day. Here is a crazy idea, which is probably never going to see life: Let's standardize software configuration! pushcx: I don’t use them because they don’t solve any problems. Rather than address the problem of stateful server deployment and management, they just zip up a stateful system.

Eric Jang with a great Summary of NIPS [Neural Information Processing Systems] 2016: research topics that are in vogue right now: GANs, RL, generative models, unsupervised learning, robotics, alternative ways to train DNNs; The volume and quality of published work in 2016 was staggering; There was a 50% increase in NIPS registrations; Generative Adversarial Networks (GANs) to be “the biggest breakthrough in Machine Learning in the last 1-2 decades"; I think one of the most promising areas of research right now is the marriage of deep neural nets with graphical models, referred to by some as “Bayesian deep learning”; People often cite 1) compute, 2) data, and 3) algorithms as the reason why DL is taking off, and I think we should add 4) the unprecedented accessibility of ML research.

"TLS [Transport Layer Security] 1.3 is the core protocol of the Internet," says Steve Gibson in the most recent Security Now. It's a major upgrade that provides authentication and privacy, both of which are necessary to know who we are talking to and protect any information that is exchanged. It's much faster (fewer handshakes), simplified by removing out of date features, and has been analyzed for weakness using a process called model checking.

Murat with Learning Machine Learning: A beginner's journey. Lots of good courses and other sources to learn from.

Matheus28: All my games use a server-client architecture. It's a custom written WebSocket implementation. I think Agar.io has around 190 players per server. Diep.io has around 72. Per game room (each room is a process). I end up just using boxes that have 1 CPU core and run just that game room in there. Except for some dedicated servers that have 40+ cores, in which we run 40+ processes. On Agar.io doing all the collision checking and encoding the packets is the biggest bottleneck. Similarly for Diep.io. Number of players of course increases those two factors almost linearly. For example, Diep.io doesn't process shapes that aren't being transmitted to anyone...For Diep.io, it's completely server side and very little happens on the client side. That is intentional as Agar.io had a problem with "private servers" popping up which were actually people ripping (read stealing) the client side code, putting their ads in there, hosting it on their own website and pointing it at their server emulator.

Gil Tene: One of the biggest reasons folks tend to stay away from the consumer CPUs in this space (like the i7-6950X you mentioned below) is the lack of ECC memory support. I really wish Intel provided ECC support in those chips, but they don't. And ECC is usually a must when driving hardware performance to the edge, especially in FinServ. The nightmare scenarios that happen when you aggressively choose your parts and push their performance to the edge (and even if you don't) with no ECC are very real. The soft-error correcting capabilities (ECC is usually SECDED) are crucial for avoiding actually wrong computation results from occurring on a regular basis from simple things like cosmic ray effects on your DRAM, and with the many-GBs capacities we have in those servers, going without a cosmic-ray-driven bit-flip in DRAM is unlikely.
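SECDED stands for single error correction, double error detection. A toy version can be sketched with an extended Hamming code over 4 data bits plus 4 check bits. Real ECC memory protects much wider words in hardware; the 4-bit width, the bit layout, and the function names here are assumptions chosen to keep the example short.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as 8 bits: Hamming(7,4) plus an overall parity bit."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    word = [p1, p2, d[0], p3, d[1], d[2], d[3]]  # positions 1..7
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]  # position 8 makes total parity even


def decode(word: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status): corrects one flipped bit, detects two."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)  # 1-based position of a single error
    overall = 0
    for bit in w:
        overall ^= bit  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        if syndrome != 0:
            w[syndrome - 1] ^= 1  # flip the faulty bit back
        status = "corrected"  # syndrome 0 means only the parity bit flipped
    else:
        return None, "double error"  # even parity but non-zero syndrome
    return w[2] | (w[4] << 1) | (w[5] << 2) | (w[6] << 3), status


stored = encode(0b1010)
stored[5] ^= 1  # simulate a single cosmic-ray bit flip
print(decode(stored))
```

Without the check bits, the flipped word would simply be read back as a different, wrong value, which is the silent corruption the quote warns about.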

Why not have both? The Micro Monolith Architecture: The core principle in the Micro Monolith architecture is to keep the hardware, software and the data close together in one place. By doing so we can simplify things and get rid of unnecessary coordination...I and the team I work with have used the Micro Monolith architecture in a real production system for a while. We started with a Microservices architecture, where every service had its own repository in Git. The transition to Micro Monolith was very smooth and all we had to do was to throw away about 30% of the code in all our services and replace all REST service calls with simple function calls. It was not only the REST parts that disappeared, but also a lot of complexity related to state and error handling.

burntsushi with a detailed exploration of some performance differences between Go and Rust. Once again memory allocation sucks. But maybe we can all get along: Most importantly, both languages at least have a path to writing a very fast program, which is often what most folks end up caring about at the end of the day.

How to look good and save money while setting up scaling policies in AWS: What has worked for me is to have a time based spot instance schedule according to the load pattern, where we add spot servers in any given hour according to the load that we experience. This keeps CPU low and prevents the on-demand auto scale group from scaling up. When spot instance prices go up and we lose them, the on-demand group will kick in and scale with ease.
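One possible reading of the time based schedule above is a per-hour table of how many spot instances to add on top of a fixed on-demand baseline. The sketch below is only arithmetic, with no AWS API calls; the function name, the load figures, and the per-instance capacity are hypothetical.

```python
import math
from typing import List


def spot_schedule(hourly_load: List[float],
                  per_instance_capacity: float,
                  on_demand_baseline: int) -> List[int]:
    """For each hour, return how many spot instances to add.

    The count is whatever the expected load needs beyond the instances
    the on-demand group already runs, and never below zero.
    """
    schedule = []
    for load in hourly_load:
        needed = math.ceil(load / per_instance_capacity)
        schedule.append(max(0, needed - on_demand_baseline))
    return schedule


# Hypothetical requests per second for three hours of the day.
print(spot_schedule([100, 450, 900], per_instance_capacity=100, on_demand_baseline=2))
# [0, 3, 7]
```

Sizing the spot fleet from the expected load is what keeps CPU low enough that the on-demand group's own scaling policy stays idle until spot capacity is lost.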

Interesting breakdown of development and deployment into a four class system. How we develop in and with distributed systems: I believe we’ll be transitioning to CLASS IV within the next 5 to 10 years...Both CL and DE are remote. Call it Chromebook-based development or whatever, but essentially nothing runs on your machine, really, in this setup.

Maybe we should pay more attention to inhibition? Developmental broadening of inhibitory sensory maps: We found that, in contrast to the refinement observed for excitatory maps, inhibitory sensory maps became broader with maturation. However, like excitatory maps, inhibitory sensory maps are sensitive to experience.

Fast Topic Matching. A detailed look at 5 algorithms--hash, inverted bitmap, optimized inverted bitmap, trie, concurrent subscription trie--for matching topics with subscribers. A task at the heart of many services. The result: concurrent subscription trie is the best option if there is high concurrency and throughput matters. It offers similar performance to the trie but scales better.
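To make the trie approach concrete, here is a minimal single-threaded sketch of a subscription trie. It is not the concurrent subscription trie from the article; the dot-separated topic format, the `*` single-word wildcard, and the class name are assumptions for illustration.

```python
from typing import Dict, List, Set


class TopicTrie:
    """Trie keyed by topic words; each node holds its subscribers."""

    def __init__(self) -> None:
        self.children: Dict[str, "TopicTrie"] = {}
        self.subscribers: Set[str] = set()

    def subscribe(self, pattern: str, subscriber: str) -> None:
        """Register a subscriber under a pattern such as 'forex.*'."""
        node = self
        for word in pattern.split("."):
            node = node.children.setdefault(word, TopicTrie())
        node.subscribers.add(subscriber)

    def lookup(self, topic: str) -> Set[str]:
        """Return every subscriber whose pattern matches the topic."""
        matched: Set[str] = set()
        self._collect(topic.split("."), 0, matched)
        return matched

    def _collect(self, words: List[str], i: int, matched: Set[str]) -> None:
        if i == len(words):
            matched |= self.subscribers
            return
        # Follow both the exact word and the single-word wildcard branch.
        for key in (words[i], "*"):
            child = self.children.get(key)
            if child is not None:
                child._collect(words, i + 1, matched)


trie = TopicTrie()
trie.subscribe("forex.usd", "alice")
trie.subscribe("forex.*", "bob")
trie.subscribe("*.eur", "carol")
print(sorted(trie.lookup("forex.usd")))  # ['alice', 'bob']
```

Lookup cost depends on the number of words in the topic and the wildcard branches taken, not on the total number of subscriptions, which is the main appeal over scanning a flat list of patterns.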

Mobile Predictions 2017: we are entering the Connected Intelligence Era and the mobile industry is growing beyond its traditional borders to transform every vertical industry and by extension – the global GDP. Proof is in the numbers. 7 Zettabytes of digital information created. 1.3 billion smartphones sold. Over 60 Exabytes of mobile data traffic (which btw will grow 15x+ in the next 5 years). Almost 100 million wearables sold. More than 16 billion connected devices. Almost half a trillion dollars in data revenues. Over 400 billion dollars in OTT revenues. At least 77 companies generating a billion or more from 4th wave. At least 8 companies generating a billion or more from IoT.

Here's a handy dandy list of 2017 DEVOPS CONFERENCES.

The three faces of multi-cloud: Multi Cloud with workloads moving between different cloud providers; Multi Cloud where you are consuming services from different cloud providers; Multi Cloud with clean separation of workloads between different cloud providers. Pros and Cons of a Multi Cloud approach: Being multi Cloud puts a burden and slows down your rate of innovation.

Everything Sysadmin: Are You Load Balancing Wrong? Anyone can use a load balancer. Using it properly is much more difficult. Some questions to ask: 1. Is this load balancer used to increase capacity (N+0) or to improve resiliency (N+1)? 2. How do you measure the capacity of each replica? 3. Are you monitoring whether you are compliant with your N+M configuration?
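The N+M questions above reduce to a small calculation once you have a measured per-replica capacity and a peak load. A minimal sketch follows; the function name and the example figures are hypothetical.

```python
import math


def spare_replicas(replicas: int, per_replica_capacity: float, peak_load: float) -> int:
    """Return M in N+M: how many replicas can fail while peak load is still served.

    N is the number of replicas needed to carry peak load. A result of 0
    means N+0 (no headroom); a negative result means under-provisioned.
    """
    needed = math.ceil(peak_load / per_replica_capacity)  # this is N
    return replicas - needed


# Four replicas at 1,000 requests/s each, 2,500 requests/s peak.
print(spare_replicas(4, 1000, 2500))  # 1, so the pool is N+1
```

Running this check against live capacity and load measurements is one way to monitor whether the pool still meets its intended N+M configuration as traffic grows.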

Everything old is new again. UDP vs. TCP: My recommendation then is not only that you use UDP, but that you only use UDP for your game protocol. Don’t mix TCP and UDP, instead learn how to implement the specific pieces of TCP that you wish to use inside your own custom UDP based protocol.
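As an illustration of borrowing one piece of TCP, the sketch below adds sequence numbers and acknowledgements to datagram payloads and tracks which packets are still unacknowledged. It is not the article's protocol: the 4-byte header layout and all names are assumptions, and no sockets are opened, so the bytes produced would still need to be sent over a UDP socket.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical header: 16-bit sequence number, 16-bit acknowledged sequence.
HEADER = struct.Struct("!HH")


def pack_packet(seq: int, ack: int, payload: bytes) -> bytes:
    """Prefix the payload with the sequence and ack fields."""
    return HEADER.pack(seq & 0xFFFF, ack & 0xFFFF) + payload


def unpack_packet(datagram: bytes) -> Tuple[int, int, bytes]:
    """Split a datagram into (seq, ack, payload)."""
    seq, ack = HEADER.unpack_from(datagram)
    return seq, ack, datagram[HEADER.size:]


class ReliableSender:
    """Remembers sent payloads until the peer acknowledges them."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.unacked: Dict[int, bytes] = {}

    def send(self, payload: bytes, ack: int = 0) -> bytes:
        """Assign the next sequence number and return the bytes to transmit."""
        seq = self.next_seq
        self.next_seq = (self.next_seq + 1) & 0xFFFF  # wrap at 16 bits
        self.unacked[seq] = payload
        return pack_packet(seq, ack, payload)

    def on_ack(self, ack: int) -> None:
        """Forget a payload once the peer confirms it arrived."""
        self.unacked.pop(ack, None)

    def pending(self) -> List[int]:
        """Sequence numbers still awaiting acknowledgement."""
        return sorted(self.unacked)


sender = ReliableSender()
first = sender.send(b"move")
second = sender.send(b"fire")
sender.on_ack(0)
print(sender.pending())  # [1]
```

A game can then decide per message type whether a pending packet is worth resending or is already stale, which is the flexibility TCP's in-order, always-retransmit model does not offer.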

On the Journey from Legacy Code to Clean Architecture: Rebuilding the Buffer Android Composer. Great discussion on what a complicated application should look like. Way too in-depth to summarize, but a useful process to go through. What is a clean architecture?: Layers that have no interest in frameworks are completely independent of them; Due to the separation of concerns, our code becomes more testable as we can write more focused and fine-grained test classes; Layers that have no interest in any UI have no knowledge of these view components; Layers are completely independent of any Databases that may be used.

Uber completely rewrote their app. Why Uber Started Over. To support more distributed development Uber is modularizing their own infrastructure by adopting a core code and optional code model so a hundred different program teams and thousands of engineers can work on the same product. Core code changes must be reviewed etc. More experimental development that can be turned off if it doesn't work happens in the optional code. iOS and Android code bases will be aligned more closely by separation of business logic, view logic, data flow, and routing. Not so different than Buffer, Uber is also targeting a Clean Architecture. They went with Riblets, which delegated responsibilities to six different components: Builder, Component, Router, Interactor, Presenter, View (controller). Routing is guided by business logic rather than view logic, which has a lot of interesting implications.

Here's how to create Even Faster Images using HTTP2 and Progressive JPEGs. You may not like progressive JPEGs, but there's lots of good lore here on creating a faster image download experience: The best way to counter negative effects of loading image assets is image compression; HTTP2 Multiplexing will initiate almost all image downloads simultaneously; Our goal is to show meaningful image contents sooner while enabling browsers to lay out the site speedily; Progressively encoded JPEGs contain ten scan layers by default; HTTP2 offers another tool we may use for even faster delivery of image contents: Server Push.

Peter Norvig is solving the coding puzzles that appear each day at Advent of Code. It's always a worthwhile meditation to see how he goes about solving problems.

Scaling Bitcoin with Secure Hardware: We have developed a new scalability solution for Bitcoin, called Teechan. It is a new, practical, high throughput, low latency off-chain transaction protocol that can be deployed securely on the Bitcoin network, as it exists today. Teechan is similar in design to the Lightning Network, save for one crucial differentiating factor: it leverages trusted execution environments (TEEs), that is, secure hardware components found in recent commodity processors such as the latest batch of Intel CPUs with Software Guard Extensions (SGX).

The stirring tale of The Road to 2 Million Websocket Connections in Phoenix. You'll recognize many of the techniques used to increase the number of connections handled, but it's always a good story.

Deep Learning Gallery is a curated collection of deep learning projects. deepjazz sounds pretty good. It's clear by now that if you have a specific task with data to learn from, deep learning can get it done.

deepmind/learning-to-learn: In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way.

Tencent/mars: a cross-platform network component developed by WeChat.

Memshare: a Dynamic Multi-tenant Memory Key-value Cache: We present Memshare, a novel web memory cache that dynamically manages memory across applications. Memshare provides a resource sharing model that guarantees private memory to different applications while dynamically allocating the remaining shared memory to optimize overall hit rate. Today's high cost of DRAM storage and the availability of high performance CPU and memory bandwidth, make web caches memory capacity bound. Memshare's log-structured design allows it to provide significantly higher hit rates and dynamically partition memory among applications at the expense of increased CPU and memory bandwidth consumption.

The hippocampus as a predictive map: We approach this puzzle from a reinforcement learning perspective: what kind of spatial representation is most useful for maximizing future reward? We show that the answer takes the form of a predictive representation. This representation captures many aspects of place cell responses that fall outside the traditional view of a cognitive map.
