
Stuff The Internet Says On Scalability For August 26th, 2016

High Scalability

26 Aug 2016 — 13 min read

Hey, it's HighScalability time:

The Pixar render farm in 1995 is half of an iPhone (@BenedictEvans)

If you like this sort of Stuff then please support me on Patreon.

33.0%: of all retail goods sold online in the US are sold on Amazon; 110.9 million: monthly Amazon unique visitors; 21 cents: cost of 30K batch derived page views on Lambda; 4th: grade level of Buzzfeed articles; $1 trillion: home value threatened by rising sea levels; $1.2B: Uber's loss on $2.1B in revenue in H1 2016; 1.58 trillion: miles Americans drove through June;

Quotable Quotes:
- @bendystraw: My best technical skill isn't coding, it's a willingness to ask questions, in front of everyone, about what I don't understand
- @vmg: "ls is the IDE of producing lists of filenames"
- @nicklockwood: The hardest problem in computer science is fighting the urge to solve a different, more interesting problem than the one at hand.
- @RexRizzo: Wired: "Machine learning will TAKE OVER THE WORLD!" Amazon: "We see you bought a wallet. Would you like to buy ANOTHER WALLET?"
- @viktorklang: "The very existence of Ethernet flow control may come as a shock" - http://jeffq.com/blog/the-ethernet-pause-frame/
- @JoeEmison: 4/ (c) if you need stuff on prem, keep it on prem. No need to make your life harder by hooking it up to some bullshit that doesn't work well
- @grayj_: Also people envision more than you think. Wright Brothers to cargo flights: 7 yrs. Steam engine to car: 7 yrs.
- David Wentzlaff: With Piton, we really sat down and rethought computer architecture in order to build a chip specifically for data centres and the cloud
- @thenewstack: In 2015, there was 1 talk about #microservcies at OSCON; in 2016, there were 30: @dberkholz #CloudNativeDay
- The Memory Guy: Now for the bad news: This new technology [3D XPoint] will not be a factor in the market if Intel and Micron can’t make it, and last week’s IDF certainly gave little reason for optimism.
- @Carnage4Life: $19 billion just to link WhatsApp graph with Facebook's is mundane. Expect deeper, more insidious connections coming
- Seth Lloyd~ The universe is a quantum computer. Biological life is all about extracting meaningful information from a sea of bits.
- Facebook: To automate such design changes, the team introduced new models to FBNet in which IPs and circuits were allocated using design tools based on predefined rules, and relevant config snippets were generated for deployment.
- Robert Graham: Despite the fact that everybody and their mother is buying iPhone 0days to hack phones, it's still the most secure phone. Androids are open to any old hacker -- iPhone are open only to nation state hackers.
- oppositelock: I'm a former Google engineer working at another company now, and we use http/json rpc here. This RPC is the single highest consumer of cpu in our clusters, and our scale isn't all that large. I'm moving over to gRPC asap, for performance reasons.
- Gary Sims: The purposes and goals of Fuchsia are still a mystery, however it is a serious undertaking. Dart is certainly key, as is Flutter.
- @mjpt777: "We haven't made all that much progress on parallel computing in all those years." - Barbara Liskov
- @AnupGhosh_: Just another sleepy August: 1. NSA crown jewels hacked. 2. Apple triple 0-day weaponized. 3. Short selling vulnerabilities for fun & profit.
- @JoeEmison: Hypothesis: enterprises adopted CloudFoundry because at least it gets up and running (cf OpenStack), but now finding it so inferior to AWS.
- Robert Metcalfe: I predict the Internet will soon go spectacularly supernova and in 1996 catastrophically collapse.
- Alan Cooper~ Form follows function to Hell. If you are building something out of bits what does form follows function mean? Function follows the user. If you are focussing on functions you are missing the point.
- @etherealmind: I've _never_ seen a successful outsourcing arrangement. And I've work on both sides in more than 10 companies.
- @musalbas: Schools need to stop spending years teaching kids garbage Microsoft PowerPoint skills and teach them Unix sysadmin skills.
- Dan Woods: With data lakes there’s no inherent way to prioritize what data is going into the supply chain and how it will eventually be used. The result is like a museum with a huge collection of art, but no curator with the eye to tell what is worth displaying and what’s not.
- Jay Kreps: Unlike scalability, multi-tenancy is something of a latent variable in the success of systems. You see hundreds of blog posts on benchmarking infrastructure systems—showing millions of requests per second on vast clusters—but far fewer about the work of scaling a system to hundreds or thousands of engineers and use cases. It’s just a lot harder to quantify multi-tenancy than it is to quantify scalability.
- Jay Kreps: the advantage of Kafka is not just that it can handle that large application but that you can continue to deploy more and more apps to the same cluster as your adoption grows, without needing a siloed cluster for each use.
- @vambenepe: My secret superpower is using “reply” in situations where most others would use “reply all”.
- @tvanfosson: Developer progression: instead of junior to senior 1. Simple and wrong 2. Complicated and wrong 3. Complicated and right 4. Simple and right
- Maria Konnikova: The real confidence game feeds on the desire for magic, exploiting our endless taste for an existence that is more extraordinary and somehow more meaningful.
- gpderetta: Apple A9 is a quite sophisticate CPU, there is no reason to believe is not using a state of the art predictor. The Samsung CPU might not have any advantage at all on this area.
- Chetan Sharma: For 4G, we went from 0% to 25% penetration in 60 months, 25-50% in 21 months, 50-75% in 24 months and by the end of 2020, we will have 95%+ penetration. By 2020, US is likely to be 4 years ahead of Europe and 3 years ahead of China in LTE penetration. In fact, the industry vastly underestimated the growth of 4G in the US market. Will 5G growth curves be any different?

You know what's cool? A rubberband powered refrigerator. Or trillions of dollars...in space mining. Space Mining Company Plans to Launch Asteroid-Surveying Spacecraft by 2020. Billionaires get your rockets ready. It's a start: Weighing about 110 pounds, Prospector-1 will be powered by water, expelling superheated vapor to generate thrust. Since water will be the first resource mined from asteroids, this water propulsion system will allow future spacecraft–the ones that do the actual mining–to refuel on the go.

False positives in the new fully automated, algorithm-driven world are red in tooth and claw. We may need a law. You know that feeling when you use your credit card and you are told it is no longer valid? You are cut off. Some algorithm has decided to isolate you from the world. At least you can call a credit card company. Have you ever tried to call a Cloud Company? Fred Trotter tells a scary story of not being able to face his accuser in Google Intrusion Detection Problem: So today our Google Cloud Account was suspended...Google threatened to shut our cloud account down in 3 days unless we did something…but made it impossible to complete that action...Google Cloud services shutdown the entire project...It is not safe to use any part of Google Cloud Services because their threat detection system has a fully automated allergic reaction to anything that has not seen before, and it is capable of taking down all of your cloud services, without limitation.

In the "every car should come with a buggy whip" department we have The Absurd Fight Over Fund Documents You Probably Don't Read. $200 million would be saved if investors got their mutual fund reports online instead of on paper. You guessed it, there's a paper lobby against it.

Rejoice all ye small technology driven developer-centric startups. Martin Casado (Nicira co-founder/a16z) in Trends in — and the Future of — Infrastructure says infrastructure is not dead, it's transforming (like Autobots or Decepticons?). We are actually entering a golden era of infrastructure. IT is a $4 trillion industry that is being disrupted by The Cloud. The Cloud is "only" a $240 billion market right now, so there's a lot of room to grow. And the entire market is still growing. Incumbents usually don't make the transition to a new technology. There are three broad trends: software-defined movement; software to services; rise of developers. Infrastructure is following the same pattern as consumer goods, being built purely in software instead of a box + software. Software only players exist in compute (mesosphere, databricks), network (cumulus, instalogic, nicira), storage (actifio, alluxio), security (illumio, pindrop). Software only startups require less funding, can focus on just building software, and are easier to deliver to the customer. Software in the field is hard to support, upgrade, administer, charge for, etc, so software as a service, even as core infrastructure, is the way to go. Since infrastructure is moving to software, developers now control everything. Which is an opportunity. Win over developers and you win the day. Almost. Developers often don't control budget so enterprise sales still require a direct sales team. It's a new era.

Which do people like to reinvent more: serialization formats; languages; or web frameworks? gRPC: a true internet-scale RPC framework is now 1.0 and ready for production deployments. Why choose gRPC over HTTP/JSON? gorset: The performance is good, and it's nice to have proto files with messages and services, which acts both as documentation and a way to generate client and server code. Protobuf is much faster, produces less garbage and is easier to work with than JSON/jackson. The generated stubs are very good and it's easy to switch between blocking and asynchronous requests, which still only require a single tcp/ip connection.

Apple would really really like you to know they're hip to AI too. The Brain is Here and it's Already Inside Your Phone: "An exclusive inside look at how artificial intelligence and machine learning work at Apple." To be effective AI needs to get inside your OODA loop. Is that consistent with privacy?

It can happen to the best of us. Google App Engine Incident #16008: During this procedure, a software update on the traffic routers was also in progress, and this update triggered a rolling restart of the traffic routers. This temporarily diminished the available router capacity.

Groupon explains how they Process Payments at Scale. This is one case where you can't go "dude, that's no big deal, that's like only half a transaction a second, that could run on my toaster." Groupon has beat you to it. Groupon processed $1,492,882,000 of gross billings for Q2 2016. That amounts to 7.5 payment transactions per second, with a peak of 12.5 payment transactions per second. They use Kill Bill, an open-source billing and payments system, on 7 VMs, sharing a single MySQL database (dedicated hardware, running SSD, typical master/slave replication). To make this work MySQL is not used as a general database. They allow no DELETEs, no UPDATEs (except for two phase-commit scenarios), no JOINs, and no stored procedures. So each node needs to process only about 1 or 2 payments per second. Nice.
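The "1 or 2 payments per second" per node figure is easy to sanity check. Here is a minimal sketch using only the numbers quoted above; the even spread of load across the 7 VMs is an assumption, not something the Groupon post states.

```python
# Back-of-the-envelope check using only the figures quoted above.
AVG_TPS = 7.5    # average payment transactions per second
PEAK_TPS = 12.5  # peak payment transactions per second
NODES = 7        # Kill Bill VMs sharing the load


def per_node(tps: float, nodes: int = NODES) -> float:
    """Transactions per second per node, ASSUMING load is spread evenly."""
    return tps / nodes


avg_per_node = per_node(AVG_TPS)
peak_per_node = per_node(PEAK_TPS)

print(f"average: {avg_per_node:.2f} tx/s per node")
print(f"peak:    {peak_per_node:.2f} tx/s per node")
```

Both values land between 1 and 2, which matches the claim.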

A timeline of technology changes at Twitter: 2012 - SSDs become the primary storage media for our MySQL and key/value databases; 2013 - Our first custom solution for Hadoop workloads is developed, and becomes our primary bulk storage solution; 2013 - Our custom solution is developed for Mesos, TFE, and cache workloads; 2014 - Our custom SSD key/value server completes development; 2015 - Our custom database solution is developed; 2016 - We developed GPU systems for inference and training of machine learning models.

Scripting the cloud. 30K Page Views for $0.21: A Serverless Story. Good explanation of a cron driven batch pipeline that calls lambda functions to scrape data that is saved into a series of S3 buckets to drive the rest of the pipeline. The development experience using Java seems less than optimal: slow loading times, high memory usage, and huge jar files. Some other negatives: lack of parallel S3 lambda triggers and exceptions should be events. A commenter pointed out this can be solved using SNS. The downside of SNS is it's just another layer to manage. Good discussion on HN. You can get a VPS for $5 a month so one criticism is Lambda is actually expensive, especially as volume goes up and you get out of the free tier. Also, Serverless Architecture with Mike Roberts.
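To put the "Lambda is actually expensive" criticism in numbers, here is a rough sketch of the break-even point against a $5 VPS. It assumes Lambda cost scales linearly with page views, ignores the free tier, and assumes the VPS could actually carry the load; none of that comes from the article.

```python
# Figures quoted above.
LAMBDA_COST_USD = 0.21   # cost of the batch that produced the page views
PAGE_VIEWS = 30_000      # page views served for that cost
VPS_MONTHLY_USD = 5.00   # price of the VPS used as the comparison point

# ASSUMPTION: cost grows linearly with page views and the free tier is ignored.
cost_per_view = LAMBDA_COST_USD / PAGE_VIEWS

# Page views per month at which the Lambda bill equals the flat VPS price.
break_even_views = VPS_MONTHLY_USD / cost_per_view

print(f"cost per page view: ${cost_per_view:.7f}")
print(f"break-even volume:  {break_even_views:,.0f} page views per month")
```

Under those assumptions the crossover sits at roughly 24 times the 30K volume, so the criticism only bites once traffic grows well past the article's scale.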

A very thorough overview: Hot Chips 2016: Memory Vendors Discuss Ideas for Future Memory Tech - DDR5, Cheap HBM, & More. The future looks big, persistent, cheaper, but maybe not as fast and not as soon as we would like.

Ex post facto is the idea that you can't be found guilty under a law passed after an action occurred. Ex post facto laws are expressly forbidden by the United States Constitution. Maybe the same restriction needs to apply to collected data as well?

In the "to fix a problem by adding a scheduling layer" department we have Solving network congestion: "MegaMIMO system from the Computer Science and Artificial Intelligence Lab speeds data transfer by coordinating multiple routers at the same time." It can "transfer wireless data more than three times faster than existing systems while also doubling the range of the signal."

A great discussion on private clouds. A Reality Check on “Everyone’s Moving Everything To The Cloud”. There's a lot of folks out there who want you to know private clouds are a thing and not everyone is hitching up their wagon and emigrating to the public cloud. Greg Ferro often talks about the need for IT to reskill. To quit outsourcing and hire people that know what the heck they are doing. If you do that then a private cloud can work for you. Otherwise I'll see you on the trail.

You need load balancing to prevent single points of failure. How do you load balance over your microservices? Convox explains how they use AWS ALB as a Container and Microservice Load Balancer: for $16/month, a single ALB can serve HTTP, HTTP/2 and websockets to up to 10 microservice backends...Conceptually ALB has a lot in common with ELB. It’s a static endpoint that’s always up and will balance traffic based on its knowledge of healthy hosts...But ALB introduces two new key concepts: content-based routing and target groups...Out of the box ALB does all the things we need for a modern microservice application: Content Based routing, Native HTTP/2 support, Native Websocket support, CloudWatch Metrics integration for service monitoring and alerting, EC2 Container Service (ECS) integration for managed container orchestration, AWS Certificate Manager (ACM) integration for free SSL certificates.
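Content-based routing and target groups are easier to picture with a toy model. The sketch below is not the ALB API; the class names, the path-prefix rules, and the round-robin choice among healthy hosts are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Tuple


@dataclass
class TargetGroup:
    """Hypothetical stand-in for a target group: named hosts plus health state."""
    name: str
    hosts: Dict[str, bool]  # host -> is it currently healthy?
    _counter: count = field(default_factory=count, init=False, repr=False)

    def pick(self) -> str:
        # Only healthy hosts are eligible; round robin is an ASSUMED policy.
        healthy = sorted(h for h, ok in self.hosts.items() if ok)
        if not healthy:
            raise RuntimeError(f"no healthy hosts in target group {self.name}")
        return healthy[next(self._counter) % len(healthy)]


class Router:
    """Content-based routing: the request path decides which group serves it."""

    def __init__(self, rules: List[Tuple[str, TargetGroup]], default: TargetGroup):
        self.rules = rules      # (path prefix, group), checked in order
        self.default = default  # used when no rule matches

    def route(self, path: str) -> str:
        for prefix, group in self.rules:
            if path.startswith(prefix):
                return group.pick()
        return self.default.pick()


if __name__ == "__main__":
    api = TargetGroup("api", {"10.0.0.1": True, "10.0.0.2": False, "10.0.0.3": True})
    web = TargetGroup("web", {"10.0.1.1": True})
    router = Router([("/api/", api)], default=web)
    for path in ("/api/users", "/api/orders", "/index.html"):
        print(path, "->", router.route(path))
```

The point of the split is that one endpoint can front several services: the rule picks the group, and the group's health view picks the host.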

After 14 months of work High-Performance Java Persistence – Part Three is available. Looking at the Table of Contents and the free intro text it looks to be of good quality and quite comprehensive.

Chris Richardson thinks of a pattern language as a better way to discuss technology. A Pattern Language for Microservices.

Here are the Best Paper Award Winners for SIGCOMM 2016. You may be interested to know progress is being made on making truly embedded devices...Inter-Technology Backscatter: Towards Internet Connectivity for Implanted Devices: "Finally, we build proof-of-concepts for previously infeasible applications including the first contact lens form-factor antenna prototype and an implantable neural recording interface that communicate directly with commodity devices such as smartphones and watches, thus enabling the vision of Internet connected implanted devices." Also, this kind of development, PISCES: A Programmable, Protocol-Independent Software Switch, is very interesting for the Software is Eating the World paradigm. You can't very well have software defined everything resting on a bed of unchangeable ASICs.

Interesting thought process explained by Netflix in Engineering Trade-Offs and The Netflix API Re-Architecture. The Netflix API has grown and needs to be refactored somehow while balancing conflicting engineering principles: velocity and full ownership vs. maximum code reuse and consolidation. They've decided to split the API in two. Solomon would approve. Each team will own and operate in production what they build and own their release schedule. The price? Probable lack of component reuse; API drift; potential lack of team skills as one team needs to build the experience to create a resilient system that scales. There's no doubt that there's no perfect solution.

Images are the new text. Almost. We're still looking for a standard regex style package for images (also sound and movies). We are getting closer. Witness Facebook open sourcing facebookresearch/deepmask. It lets you use machine vision to detect and precisely delineate every object in an image and label each object mask with the object type it contains (e.g. person, dog, sheep). It's all explained here: Segmenting and refining images with SharpMask. See also, Text summarization with TensorFlow.

Adding a pinch of neural nets to make a product better even applies to chips. 'Neural network' spotted deep inside Samsung's Galaxy S7 silicon brain: If your CPU can predict accurately which instructions an app is going to execute next, you can continue priming the processing pipeline with instructions rather than dumping the pipeline every time you hit a jump...The neural net gives us very good prediction rates.

If you read some network box is going to get much faster it's probably because they are moving their network processing over to DPDK.

Gartner has rated Critical Capabilities for Operational Database Management Systems. There wasn't a clear overall winner. Select a product to fit your own needs.

This book might be interesting. The DevOps 2.0 Toolkit: Automating the Continuous Deployment Pipeline with Containerized Microservices using tools like Docker, Ansible, Ubuntu, Docker Swarm and Docker Compose, Consul, etcd, Registrator, confd, Jenkins, and nginx.

Princeton Piton Processor: a many-core processor designed by Prof. Wentzlaff's research group in March, 2015. It was taped-out in IBM's 32nm SOI process. Some of Piton's features: 25 modified OpenSPARC T1 cores; Directory-based shared memory; 3 On-chip networks; Multi-chip shared memory support; 1 GHz clock frequency; IBM 32nm SOI process (6mm*6mm); 460 million transistors.

OpenLambda: an open-source serverless computing platform. pmontra: So, after having read the PDF I eventually realized that this is not a tool to cross develop and deploy to the three lambda clouds, but to self host your lambdas on your servers.

Marten: Postgresql as Document Db & Event Store for .Net Development. To learn all about its wonderfulness take a look at Moving from RavenDb to Marten.

facebookresearch/fastText: fastText is a library for efficient learning of word representations and sentence classification.

NASA research is now available on PubMed Central. I have a few plum trees so I found this quite useful: Dried plum diet protects from bone loss caused by ionizing radiation. Just in case we have a really bright future.

Dynamic Branch Prediction with Perceptrons: This paper presents a new method for branch prediction. The key idea is to use one of the simplest possible neural networks, the perceptron, as an alternative to the commonly used two-bit counters. Our predictor achieves increased accuracy by making use of long branch histories, which are possible because the hardware resources for our method scale linearly with the history length.
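For a feel of how small the idea is, here is a software sketch of a perceptron predictor. It is a simplification, not the paper's hardware design: weights are unbounded Python ints rather than small saturating counters, the table is indexed by a plain modulo of the branch address, and the training threshold formula is the one I recall from the paper, so treat it as an assumption.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor (a sketch under the assumptions above)."""

    def __init__(self, history_length: int = 8, table_size: int = 64):
        self.table_size = table_size
        # ASSUMPTION: threshold floor(1.93 * h + 14), as recalled from the paper.
        self.theta = int(1.93 * history_length + 14)
        # One weight vector per table entry: a bias weight plus one per history bit.
        self.weights = [[0] * (history_length + 1) for _ in range(table_size)]
        # Global history: +1 means taken, -1 means not taken.
        self.history = [-1] * history_length

    def _output(self, pc: int) -> int:
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        """Predict taken when the dot product is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, xi in enumerate(self.history, start=1):
                w[i] += t * xi
        # Shift the actual outcome into the history.
        self.history = self.history[1:] + [t]


def accuracy_on_alternating_branch(steps: int = 200) -> float:
    """Fraction of correct predictions over the second half of a T/N/T/N pattern."""
    predictor = PerceptronPredictor()
    hits = []
    for n in range(steps):
        taken = n % 2 == 0
        hits.append(predictor.predict(0x40) == taken)
        predictor.update(0x40, taken)
    tail = hits[steps // 2:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    print(f"accuracy after warm-up: {accuracy_on_alternating_branch():.2%}")
```

The alternating branch is the classic case where a lone two-bit counter struggles, while a predictor that looks at history picks up that each outcome is the opposite of the previous one.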

Spoofing 2D Face Detection: Machines See People Who Aren't There: we show that it is possible to construct images that fool facial detection even when they are printed and then photographed.

Robotron: Top-down Network Management at Facebook Scale: we present Robotron, a system for managing a massive production network in a top-down fashion. The system's goal is to reduce effort and errors on management tasks by minimizing direct human interaction with network devices. Engineers use Robotron to express high-level design intent, which is translated into low-level device configurations and deployed safely. Robotron also monitors devices' operational state to ensure it does not deviate from the desired state. Since 2008, Robotron has been used to manage tens of thousands of network devices connecting hundreds of thousands of servers globally at Facebook.
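The top-down idea (state the intent, derive the device config, then watch for drift) can be sketched in a few lines. This is not Robotron's code or data model: the function names, the /31-per-link allocation rule, and the config syntax are all made up for illustration.

```python
import ipaddress
from typing import Dict, List, Set, Tuple


def render_link_configs(
    links: List[Tuple[str, str]], pool: str = "10.0.0.0/24"
) -> Dict[str, List[str]]:
    """Turn high-level intent (which devices are linked) into per-device snippets.

    ASSUMED rule: each point-to-point link gets the next free /31 from the pool.
    The snippet syntax is invented and matches no particular vendor.
    """
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=31)
    configs: Dict[str, List[str]] = {}
    for (dev_a, dev_b), net in zip(links, subnets):
        for dev, peer, addr in ((dev_a, dev_b, net[0]), (dev_b, dev_a, net[1])):
            configs.setdefault(dev, []).append(
                f"interface to-{peer}\n  ip address {addr}/31"
            )
    return configs


def drift(desired: Dict[str, List[str]], actual: Dict[str, List[str]]) -> Set[str]:
    """Devices whose running config no longer matches the desired state."""
    return {dev for dev in desired if desired[dev] != actual.get(dev)}


if __name__ == "__main__":
    desired = render_link_configs([("sw1", "sw2"), ("sw1", "sw3")])
    for device, snippets in desired.items():
        print(f"# {device}")
        print("\n".join(snippets))

    # Simulate someone hand-editing sw3 on the box.
    running = {dev: list(snips) for dev, snips in desired.items()}
    running["sw3"] = ["interface to-sw1\n  ip address 192.0.2.1/31"]
    print("drifted devices:", drift(desired, running))
```

Because the addresses come from a rule rather than a person, the same intent always renders the same config, which is what makes the drift check meaningful.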
