
Stuff The Internet Says On Scalability For June 17th, 2016

High Scalability

17 Jun 2016

Hey, it's HighScalability time:

You've seen the Netflix Death Star microservices map. Here's a map of microbes conversing on your skin. If you like this sort of Stuff then please support me on Patreon.

4281: # of unread articles in my HackerNews feed; 23%: share of all corporate cash held by Microsoft, Apple, Google; 400 million: number of new servers needed by 2020; ~25,740TB: storage Backblaze adds per month; 3 bits: IBM stores per memory cell; 488 million: faked comments by China per year; 90%: revenue Spotify makes from 30% of users; 780 million: miles of Tesla driving data; 4 days: median time to binge-watch a season on Netflix; $33: cost of Nike Air Max; $50 billion: amount Apple has paid out to app developers; $270 million: amount Line makes from selling stickers; 4,600: # of trees Apple will plant around the Spaceship; 200 million: Google Photos users; $1.8 billion: Series F round for Snapchat; 3x: capacity of the roadway with driverless cars; 138%: growth in Alibaba's cloud.

Quotable Quotes:
- @swardley: By this time next year, AMZN should be comfortably worth more than IBM + HPQ + HPE + CISCO + VMW + ORCL + NetApp + EMC combined.
- @pcalcado: The #serverless revolution, or as we call it in my hometown: "JavaScript folks finally find out about CGI"
- @kellan: Those patents you filed at Y! that would "only ever be used defensively"? Up for sale
- @mipsytipsy: oooo, the term FaaS (Functions as a Service) is WAY better than #serverless. Accurate, doesn't irritate me at all!
- Ian Eyberg: We need a new paradigm. We really need to be deploying software not the systems the software lives on. We need to be configuring software not the system that they are on. This is a new way of thinking.
- @amernetflix: 2 Million #PacketsPerSecond on a #aws public #cloudinstance.
- @antirez: /me hopes that because at Redis Conf there were already folks telling me “don’t bother too much with Docker, we are moving away”.
- @swardley: Asked "Do I not like Docker?" - I like Docker but I view containers as an important but for most, ultimately invisible subsystem ...
- @jeffjarvis: Before AMP, 51% of WaPost users returned in 7 days; after AMP that's up to 63% says David Merrell at #io16
- @HackerNewsOnion: How LinkedIn Scaled To Billions Of Unread Messages
- @nzben: 90% Domain Driven Design 10% Switch Statements
- @igrigorik: 0.3s latency improvement → £8M revenue gain for thetrainline(.com): http://bit.ly/1NiFzdP - great case study.
- @thegrugq: A monk asked Satoshi: “Why do you not sign a challenge msg?” Satoshi answered: “signing proves nothing!” The monk was enlightened #bitkoans
- @johnrobb: Stanford: Apple could manufacture its products in the US with robotic assembly at same price as China now
- @Khanoisseur: Amazon changes prices 2.5 million times a day. Wal-Mart and Best Buy change prices 50,000 times over entire month.
- @Khanoisseur: Amazon pushes code live every 12 seconds and can test a feature on 5000 users by turning it on for just 45 seconds
- southpolesteve: I have one hard example I can share. We had a node service that was running on ec2 and cost ~$2500/mo. Moved the code directly over to lambda. Now ~$400/mo.
- Keith Chen~ The behavioural economist at UCLA said users are willing to accept surge pricing increases as high as 9.9 times the normal price of a ride if their smartphone's battery is close to dying.
- @Bill_Gross: With its always on cellular connection, every 10 hours Tesla gets a MILLION miles of additional data
- @timallenwagner: Nice example of converting batch system to real-time serverless analytics!
- @danielbryantuk: "In one study 35% of catastrophic failures were caused by what I call 'dev laziness' " @caitie #qconnewyork
- @dechampsgu~ "Improving Anything but The Bottleneck is Close to Meaningless" @ziobrando #dddx
- @heinrichhartman: "We [#SQLite] don't compete against Oracle, we compete against fopen(3)" Richard Hipp on @changelog https://changelog.com/201/ (47:00)
- @Pinboard: Any complex website is an interdependent ballet of dozens of mutually supporting services. You can’t reduce it to a word like “up” or “down”
- @kevinmarks: “The web is already decentralized,” Mr. Berners-Lee said. “The problem is the dominance of one search engine…
- CHARITY.WTF: I’ve seen what happens when application developers think they don’t have to care about the skills associated with operations engineering. When they forget that no matter how pretty the abstractions are, you’re still dealing with dusty old concepts like “persistent state” and “queries” and “unavailability” and so forth, or when they literally just think they can throw money at a service to make it go faster because that’s totally how services work.
- @beaucronin: A dollar-an-hour EC2 instance today would have been at or near the top of Top500 in the early/mid 90s
- @cloud_opinion: Is this the real ops? / Is this just devops? / Caught in a deploy / No escape from techdebt / Open your eyes / Look up to the cloud and see #serverless
- lostcolony: In context, I'm pretty sure it was just saying "Unlike computation, bandwidth, and memory size, we haven't seen much improvement in latency, and even if we focused on it, we have a very clear limit we can't get past".
- @codinghorror: yay, 32GB DDR4 ECC DIMMs have reached near price parity ($169). Time for our first 256GB RAM server.
- @TechCrunch: "1 billion people are using Chrome on mobile each month" http://tcrn.ch/206z4Me #IO2016
- @KevlinHenney: "For 40-80% of the jobs submitted to MapReduce systems, you’d be better off just running them on a single machine."
- @mjpt777: Contention is an amazing thing. Make one part of a system do more work and the overall system gains throughput due to reduced contention.
- @christianhern: In '99 Hal Varian estimated that we produced 1.5 billion gigabytes/yr. In '15 IBM stated 2.3 trillion gigabytes of data are created/day
- @neil_conway: If you're building a Mesos framework, watch this: "A Declarative Approach to Building Stateful Frameworks"
- Holstege: It’s estimated that we’re born with around 20,000 blood stem cells, and at any one time, around 1000 are simultaneously active to replenish blood
- courtf: Random stats from one of our busier [Golang] nodes, over the past 20 min: 40GB currently in use; 391GB allocated (and freed) in the last 20 min; average of 2.6ms GC pauses every 70.6s; 95th percentile on GC time is 7.26ms.
- @kevinmarks: #io16 @jasontitus: Firebase analytics works across Android and iOS and is free and unlimited
- @caitie: "Bottlenecks in your System will Shape your Architecture" @alexras #qconnewyork
- ChuckMcM: Imagine a 1.2TB of this stuff [Intel Optane SSD] on the motherboard substituting for DRAM. So you've got every application and all the data for your applications already "in memory" as far as the chip is concerned. App switching? Instant, app data availability? instant. Quite a different experience than what we have today.
- @patrickdebois: #serverless talk summary - docker is an alternative to a zip file to bundle your function
- @robotterror: "Serverless" isn't about not USING servers—it's about not managing/maintaining servers. #Serverless
- @vambenepe: Firebase, BigQuery, Pub/Sub… GCP is *the* platform for #serverless if that's what true Cloud services are now called
- @etherealmind: Don’t underestimate how huge this Apple/Cisco partnership will be. Every iPhone is now Cisco IP Collab node.
- @lindvall: “doesn’t scale” is becoming a trigger for me. Believing that all operations must be scalable is just as damaging as doing everything by hand
- @patrickdebois: #serverless conf is the first conference where talking about #docker is not cool
- Steve Newman: Brute force is a viable approach to real-world, large-scale problems.
- Piotr Solnica: As a result of 9 freaking years of working with Rails and contributing like hell to many ruby OSS projects, I’ve given up. I don’t believe anything good can happen with Rails.
- sidcool: Google Photos update: 200 million monthly active users. 1 trillion labels assigned.
- rconti: No insight into what Amazon uses, but we've got HP DL980s (G7s, so they're OLD) with 4TB of RAM and just started using Oracle X5-8 x86 boxes with 6TB of RAM and 8 sockets. I believe 144 cores/288 threads.
- @WhatTheFFacts: Candy Crush has more active monthly players than the entire population of Canada.
- @jamesurquhart: One critical front in the great cloud wars…services that own core economic processes.
- @viktorklang: Service Locator is an anti-pattern, embracing reality, what you should have is a router.
- Henrique Bucher: Modern C++ has become a game of academic circlejerk and a big waste of time. The 20/80 rule concludes we should move to learn something more productive.
- @MarcWilczek: New #CIO Study: Public #cloud is expected to comprise 20% of total workloads by 2018, up from just 8% in 2014.
- Consultant32452: No, the real battle for the best AI is being fought on the various global stock exchanges. The vast majority of trades are AI now. In this way AI virtually already controls the price of all globally traded goods.
- @philip_pfo: Netflix nextgen arch for device-facing logic: containers/process isolation, JS/node. @probst_kathrin #qconnewyork
- @danielbryantuk: Netflix are carving up AWS r3.8xl boxes (32 CPU, 244GB RAM) for deploying containers via Titus @aspyker #qconnewyork
- mattetti: We've been running splice.com on Go for 3 years now and handle 5TB of audio/binary data per day. Our memory usage is around 10-15MB per server and the GC pause time has been really low. You do need to stream your IOs instead of reading everything in memory.
- c2.com: On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened.

You know all those elaborate A/B testing and new feature testing systems built into websites so companies can gather data and learn more about how their product works in the real world? It's not just for websites. Tesla Tests Self-Driving Functions with Secret Updates to Its Customers’ Cars. It appears by deploying actual cars with real drivers Tesla has the Big Data advantage when it comes to creating self-driving cars. Though like much of the modern world it's damn spooky: Tesla can pull down data from the sensors inside its customers’ vehicles to see how people are driving and the road and traffic conditions they experience. It uses that data to test the effectiveness of new self-driving features. The company even secretly tests new autonomous software by remotely installing it on customer vehicles so it can react to real road and traffic conditions, without controlling the vehicle.
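Testing new software on real inputs "without controlling the vehicle" resembles what is often called shadow mode or dark launching: the candidate sees live data, but only the production decision is acted on, and disagreements are logged for analysis. Here's a minimal sketch, assuming a made-up sensor reading and two hypothetical braking policies; it is not Tesla's actual system.

```python
from typing import Callable, Dict, List, Tuple

Reading = Dict[str, float]
Policy = Callable[[Reading], str]

def run_in_shadow(
    readings: List[Reading],
    production: Policy,
    candidate: Policy,
) -> Tuple[List[str], List[Tuple[Reading, str, str]]]:
    """Act on the production policy; record where the candidate would differ."""
    actions: List[str] = []
    disagreements: List[Tuple[Reading, str, str]] = []
    for reading in readings:
        live = production(reading)   # this decision is actually used
        shadow = candidate(reading)  # computed and logged, never acted on
        actions.append(live)
        if shadow != live:
            disagreements.append((reading, live, shadow))
    return actions, disagreements

# Hypothetical policies: brake when the gap to the car ahead gets small.
def production_policy(r: Reading) -> str:
    return "brake" if r["gap_m"] < 10 else "cruise"

def candidate_policy(r: Reading) -> str:
    # The candidate also reacts to a fast-closing gap.
    return "brake" if r["gap_m"] < 10 or r["closing_mps"] > 5 else "cruise"

readings = [
    {"gap_m": 30, "closing_mps": 1},
    {"gap_m": 25, "closing_mps": 8},
    {"gap_m": 5, "closing_mps": 2},
]
actions, diffs = run_in_shadow(readings, production_policy, candidate_policy)
# actions == ['cruise', 'cruise', 'brake']; one logged disagreement (the second reading)
```

The logged disagreements are the valuable output: they show where the new logic would have behaved differently, without the new logic ever touching the brakes.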

Can you be a Libertarian and use the government as an amplification attack on an enemy? The Stunning and Expected End of Gawker.

What has Backblaze learned from one billion hours of hard drive statistics? Ross Lazarus does a deep dive in Survival analysis of hard disk drive failure data. Any conclusions? osolo: It clearly shows that HGST has an overall superior survival rate over time. WD is a distant second and Seagate in third (although the Seagate ST4000DM000 model is exceptional and fares very well).
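Survival analysis matters here because most drives are still running when the data ends, so you can't just average failure ages. The classic tool is the Kaplan-Meier estimator, which treats still-running drives as censored rather than discarding them. A minimal sketch on toy data (not Backblaze's numbers, and not necessarily the exact method used in the analysis):

```python
from itertools import groupby
from typing import List, Tuple

def kaplan_meier(durations: List[float], failed: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier curve: (time, survival probability) at each failure time.

    durations[i]: how long drive i was observed (e.g. days in service).
    failed[i]: True if the drive failed at that time, False if it was still
    working when observation stopped (right-censored).
    """
    if len(durations) != len(failed):
        raise ValueError("durations and failed must have the same length")
    events = sorted(zip(durations, failed))
    at_risk = len(events)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t, group in groupby(events, key=lambda e: e[0]):
        members = list(group)
        deaths = sum(1 for _, f in members if f)
        if deaths:
            # Probability of surviving past t, given survival up to t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Failed and censored drives both leave the risk set after time t.
        at_risk -= len(members)
    return curve

# Toy data: five drives, two still running when observation ended.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
# survival ≈ 0.8 after t=1, 0.6 after t=2, 0.3 after t=3
```

Comparing one such curve per drive model is what lets you say one vendor "survives better over time" even though every model has been in service for a different length of time.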

Jury: Google's Use Of Oracle's Java APIs Was Fair Use, Oracle Loses On All Counts. This is great of course, but if Google had just ponied up for a license in the first place this whole potential disaster could have been avoided.

Interesting insight into how Backblaze works from brianwski: Our new B2 product is priced at half a penny per GByte per month which accurately reflects how much it costs for us to store your data including a small profit for us. So the $5/month is profitable for us up to a 1 TByte backup. We have about 25 customers with more than 50 TBytes in a $5/month backup, and yes, we lose money on them (which is FINE - they often recommend us to friends with less data). On the opposite end, we have about 20,000 customers with less than 20 GBytes backed up where we are massively profitable on those particular customers. Interestingly enough, my 84 year old father is in that demographic - no digital music, no digital movies, a few digital photos and a Quicken file. In between the 50 TByte and 10 GByte customers is a big bell curve with the bulk of our customers basically paying for their own backups.
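The 1 TByte break-even falls straight out of the quoted price: at $0.005 per GB per month, $5 buys 5 / 0.005 = 1,000 GB. A small check, treating the B2 price as the cost the way the quote does (names are illustrative):

```python
PRICE_PER_GB_MONTH = 0.005  # B2 price quoted above: half a penny per GB per month
PLAN_PRICE = 5.00           # flat monthly backup price

def storage_cost(gigabytes: float) -> float:
    """Monthly cost of storing `gigabytes` at the quoted B2 rate."""
    return gigabytes * PRICE_PER_GB_MONTH

def monthly_margin(gigabytes: float) -> float:
    """Plan price minus storage cost; negative means that customer loses money."""
    return PLAN_PRICE - storage_cost(gigabytes)

break_even_gb = PLAN_PRICE / PRICE_PER_GB_MONTH  # 1,000 GB, i.e. 1 TB (decimal)
heavy_user = monthly_margin(50_000)              # 50 TB: about -$245/month
light_user = monthly_margin(20)                  # 20 GB: about +$4.90/month
```

Which is why a few dozen 50 TByte customers can be carried by the tens of thousands of small ones.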

Biology is the way to program chemistry. Bionic leaf turns sunlight into liquid fuel: “The beauty of biology is it’s the world’s greatest chemist; biology can do chemistry we can’t do easily,” she said. “In principle, we have a platform that can make any downstream carbon-based molecule. So this has the potential to be incredibly versatile.”

Apple takes on Google Photos with new Photos update. What is the power draw for this sort of thing? How can Apple do all this on device? iMore has some details. The A9 processor is powerful enough to create the index as images are captured. And it seems likely that Apple trains their deep learning models in the cloud and downloads a compact version on to the device, like Google Translate.

If you want to get the flavor of IBM's OpenWhisk Carl has a nice introduction in Start with serverless computing on IBM Cloud. Since IBM is behind, they went with an open source strategy, as one does. Some interesting bits: Swift support and the function is defined by a container. Nice intro: The Cloudcast #252 - Understanding IBM OpenWhisk. Also, Polyglot serverless computing using Docker and OpenWhisk.
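OpenWhisk actions can also be written in Python. By the runtime's convention (check it against the docs for your OpenWhisk version), the entry point is a function named `main` that receives a dict of parameters and returns a dict, which becomes the JSON result. A minimal hypothetical action, with the deployment step left out:

```python
def main(params: dict) -> dict:
    """Action entry point: invocation parameters arrive as a dict and the
    returned dict is serialized as the action's JSON result."""
    name = params.get("name", "stranger")
    return {"greeting": f"Hello, {name}!"}
```

Locally it's just a function, which is much of the appeal: `main({"name": "Carl"})` returns `{'greeting': 'Hello, Carl!'}`, and the platform takes care of where and when it runs.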

Big memory is entering the cloud stream. X1 Instances for EC2 – Ready for Your Memory-Intensive Workloads. pritambarhate asks an excellent question: "Can PostgreSQL/MySQL use such type of hardware efficiently and scale up vertically? Also can MemCached/Redis use all this RAM effectively?" What do programs capable of using all this memory even look like? At $100,000 a year the cost is high, but nucleardog has a good take on that: If you need to do some analysis or computation on a massive data set once a month or something, it's going to be cheaper to pay $5k/yr (assuming you run for 24 hours a month and don't make use of the spot market) than to purchase and maintain the hardware and infrastructure.

Here's an impressive graph clearly showing the predictive power of Moore's Law. Brennan Peterson points out that ACM has a trends database with a lot more detail.

Will Containers Replace Hypervisors? Almost Certainly!: It’s now a container world for the next generation of cloud native applications and it’s only a matter of time before we get there. In the meantime, running containers on top of virtualization substrates is a common way to “just get started”, while the underlying technologies get better and better at running containers directly on bare metal. The modern datacenter will be relatively homogeneous, just like the web scale companies who brought us cloud computing.

Papers. We got your papers here: Introducing Research for Practice: Expert-curated guides to the best of CS research. It has the Peter Bailis seal of approval. The initial offerings: Data Centers are Changing the Way We Design Server Systems, and NFV and Middleboxes. For further reference, Justine Sherry appears in a great Packet Pushers episode, Why We’re Stuck With Middleboxes And How To Improve Them.

Is Apple effectively immortal? Apple has $137.1 billion in cash reserves.

All 16+ hours of the Decentralized Web Summit - Live From The Internet Archive (Day 2) are online. Lots of interesting talkers: Mitchell Baker, Vint Cerf, Cory Doctorow, Brewster Kahle, Tim Berners-Lee. Can't watch? There's some coverage: The Web’s Creator Looks to Reinvent It. The big idea: So what might happen, the computer scientists posited, if they could harness newer technologies — like the software used for digital currencies, or the technology of peer-to-peer music sharing — to create a more decentralized web with more privacy, less government and corporate control, and a level of permanence and reliability?

The fatal attraction of algorithm-based feeds. You cannot escape the power of controlling both sides of a market. Instagram Made a Huge Change--and It's Not the Logo.

Round 2 of the Microservices Practitioner Summit will be held online starting July 13th. Virtual Microservices Practitioner Summit.

How do developers make money with new app delivery paradigms like Instant Apps for Android? For some time iOS apps have been turning into containers for components like Today widgets, Watch apps, action extensions, share extensions, etc. Siri will extend that trend. Looks like programmers are filling out the functionality of these devices without a lot of payback.

Andy Butcher with a good experience report from QCon London.

Many moons ago I suggested using a wireless network like this within a rack and was laughed at. The world's first wireless satellite.

Who knew supercomputer cluster interconnects might shed light on how the brain works? The Four-Dimensional Brain?: From topology, a strong concept comes into play in understanding brain functions, namely, the 4D space of a “hypersphere’s torus”, undetectable by observers living in a 3D world… Here we hypothesize that brain functions are embedded in an imperceptible fourth spatial dimension and propose a method to empirically assess its presence.

Serializable, Lockless, Distributed: Isolation in CockroachDB: We have now demonstrated how CockroachDB’s Isolation system is able to provide a serializable and recoverable transaction history in a completely distributed fashion. Combined with our atomic commit post, we have already described a fairly robust system for executing concurrent, distributed ACID transactions.
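One ingredient the post describes is a timestamp cache: every read records the latest timestamp at which a key was read, and a write at or below that timestamp would invalidate the read, so the writer must move to a later timestamp (for a SERIALIZABLE transaction, that means restarting). The toy below illustrates only that one rule, on a single node, in memory; it is a hypothetical sketch, not CockroachDB's code, and it leaves out write intents, transaction records, and distribution entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class ToyMVCCStore:
    """Toy multi-version store with a timestamp cache (illustrative only)."""
    versions: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)
    read_ts: Dict[str, int] = field(default_factory=dict)  # key -> latest read timestamp

    def read(self, key: str, ts: int) -> Optional[str]:
        # Remember that `key` was read at `ts` so no later write can slip under it.
        self.read_ts[key] = max(self.read_ts.get(key, ts), ts)
        # Return the newest version written at or before `ts`.
        visible = [v for t, v in self.versions.get(key, []) if t <= ts]
        return visible[-1] if visible else None

    def write(self, key: str, value: str, ts: int) -> int:
        """Write at `ts` if allowed, else return a pushed (later) timestamp.

        A return value different from `ts` means the write was NOT applied;
        the transaction must retry at the returned timestamp.
        """
        last_read = self.read_ts.get(key)
        if last_read is not None and ts <= last_read:
            return last_read + 1
        self.versions.setdefault(key, []).append((ts, value))
        self.versions[key].sort()  # keep versions in timestamp order
        return ts

store = ToyMVCCStore()
store.write("x", "a", 1)
store.read("x", 5)            # a reader at ts=5 sees "a"
pushed = store.write("x", "b", 3)  # would rewrite history under that read: pushed to 6
```

No locks are taken on the read path; conflicts are detected and resolved by moving timestamps, which is where the "lockless" in the title comes from.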

This takes logging to a whole new level. Creating a DNA Record with CRISPR: Researchers repurpose a bacterial immune system to be a molecular recording device.

Yep. Pushing Back: Over the last year I’ve become more and more convinced that possibly the most important feature of any queuing system is the ability to take action immediately upon enqueuing of a new item, where the action can modify the queue, and is based on state of the queue itself. Most commonly, this is referred to as back-pressure. But back-pressure can have several different forms, suited to different scenarios.
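To make "take action immediately upon enqueuing, based on the state of the queue" concrete, here's a minimal sketch of a bounded queue with two of the back-pressure forms the post alludes to: reject the new item (the caller must slow down) or drop the oldest (keep the freshest data). The class and policy names are hypothetical.

```python
from collections import deque
from enum import Enum
from typing import Deque, Generic, TypeVar

T = TypeVar("T")

class Overflow(Enum):
    REJECT = "reject"            # refuse the new item; caller must back off
    DROP_OLDEST = "drop_oldest"  # evict the oldest item to make room

class BoundedQueue(Generic[T]):
    """A queue that decides, at enqueue time, what to do when it is full."""

    def __init__(self, capacity: int, policy: Overflow) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.policy = policy
        self.items: Deque[T] = deque()
        self.dropped = 0

    def offer(self, item: T) -> bool:
        """Try to enqueue; False tells the producer to back off."""
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        if self.policy is Overflow.REJECT:
            return False
        self.items.popleft()  # DROP_OLDEST: make room for the newer item
        self.dropped += 1
        self.items.append(item)
        return True

    def poll(self) -> T:
        return self.items.popleft()
```

The important part is that `offer` returns a signal the producer can act on; a queue that silently grows without bound has no back-pressure at all.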

Netflix is trying to petition the traffic-shaping gods with a dead simple speed test. It just measures content download speed, with no latency stats.

How do you make Apache Spark 10x faster? Whole-stage Code Generation: automatically generating this handwritten code at runtime... so the engine can achieve the performance of hand-written code, yet provide the functionality of a general purpose engine...Whole-stage code-generation techniques work particularly well for a large spectrum of queries that perform simple, predictable operations over large datasets.
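The contrast is between the classic iterator ("Volcano") model, where each operator pulls rows from its child one call at a time, and the single tight loop a code generator can emit for the whole pipeline. Spark generates Java at runtime; the Python below only shows the shape of the two styles for a hypothetical query like `SELECT SUM(x * 2) FROM t WHERE x > 10`, and is not Spark's code.

```python
from typing import Iterable, Iterator

# Iterator model: each operator is a separate object pulling rows from its child.
class Scan:
    def __init__(self, rows: Iterable[int]) -> None:
        self.rows = rows

    def __iter__(self) -> Iterator[int]:
        return iter(self.rows)

class Filter:
    def __init__(self, child: Iterable[int], threshold: int) -> None:
        self.child, self.threshold = child, threshold

    def __iter__(self) -> Iterator[int]:
        for row in self.child:
            if row > self.threshold:
                yield row

class Project:
    def __init__(self, child: Iterable[int], factor: int) -> None:
        self.child, self.factor = child, factor

    def __iter__(self) -> Iterator[int]:
        for row in self.child:
            yield row * self.factor

def volcano_sum(rows: Iterable[int]) -> int:
    return sum(Project(Filter(Scan(rows), 10), 2))

# Fused loop: roughly what generated code for the same plan looks like.
# No per-row calls between operators; intermediate values stay in locals.
def fused_sum(rows: Iterable[int]) -> int:
    total = 0
    for x in rows:
        if x > 10:
            total += x * 2
    return total
```

Both return the same answer. In Python the speed difference isn't meaningful, but on the JVM the fused version avoids per-row virtual calls and lets the JIT keep values in registers, which is where the large speedups for simple, predictable queries come from.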

Security is one of those really important topics where our talking is the inverse of our learning. DevOpsSec by Jim Bird is a new book that might help change that.

An excellent explanation: Understanding Consensus and Paxos in Distributed Systems.
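For reference, the core of single-decree Paxos fits in a few lines once networking is stripped away. This is a minimal in-memory sketch with hypothetical names; a real implementation also needs durable acceptor state, retries with higher proposal numbers, and handling of lost messages.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Acceptor:
    promised: int = -1                # highest proposal number promised
    accepted_n: int = -1              # number of the accepted proposal, if any
    accepted_v: Optional[str] = None  # value of the accepted proposal, if any

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_v)
        return None  # reject

    def accept(self, n: int, v: str) -> bool:
        """Phase 2: accept unless a higher-numbered prepare was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False

def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run both phases with proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, we must re-propose the one
    # with the highest proposal number instead of our own.
    prior_n, prior_v = max(promises, key=lambda p: p[0])
    chosen = prior_v if prior_n >= 0 else value
    accepts = sum(a.accept(n, chosen) for a in acceptors)
    return chosen if accepts >= majority else None
```

The safety property shows up in the rule inside `propose`: once "A" is chosen, a later proposer asking for "B" learns about "A" from a majority and ends up proposing "A" too.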

Stanford computer scientists show telephone metadata can reveal surprisingly sensitive personal information: a new analysis by Stanford computer scientists shows that it is possible to identify a person’s private information – such as health details – from metadata alone. Additionally, following metadata “hops” from one person’s communications can involve thousands of other people.

How embarrassing. What happened to Austin, Texas, when Uber and Lyft left town. It seems people can adjust and manage to get by. Holding a city hostage is a tricky strategy.

Another excellent chapter. Implementing Queues for Event-Driven Programs: I’m speaking about queues. And not only just about “some” queue, but about queues which have certain properties desirable for our Reactors a.k.a. ad-hoc Finite State Machines a.k.a. Event-Driven Programs.
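The basic shape of a queue feeding a Reactor: any thread may post events into a thread-safe queue, but only one thread pulls them out and calls the state machine, so the FSM's state needs no locks. A minimal sketch using Python's `queue.Queue`; the reactor class and its events are hypothetical.

```python
import queue
import threading
from typing import Any, List

class ToggleReactor:
    """A tiny event-driven FSM: all state changes happen inside react()."""

    def __init__(self) -> None:
        self.state = "idle"
        self.log: List[str] = []

    def react(self, event: str) -> None:
        if event == "start" and self.state == "idle":
            self.state = "running"
        elif event == "stop" and self.state == "running":
            self.state = "idle"
        self.log.append(f"{event}->{self.state}")

_STOP = object()  # sentinel telling the event loop to exit

def run_reactor(reactor: ToggleReactor, inbox: "queue.Queue[Any]") -> None:
    # Only this thread ever calls react(), so reactor state needs no locking.
    while True:
        event = inbox.get()  # blocks until an event arrives
        if event is _STOP:
            break
        reactor.react(event)

if __name__ == "__main__":
    inbox: "queue.Queue[Any]" = queue.Queue()
    reactor = ToggleReactor()
    worker = threading.Thread(target=run_reactor, args=(reactor, inbox))
    worker.start()
    for e in ["start", "ping", "stop"]:
        inbox.put(e)  # put() is safe to call from any thread
    inbox.put(_STOP)
    worker.join()
    print(reactor.log)  # ['start->running', 'ping->running', 'stop->idle']
```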

Netflix Application data caching using SSDs. Their current architecture uses clusters of caches across availability zones, managed as autoscaling groups. The output of a single stage of a single day's personalization batch process can load more than 5 terabytes of data into its dedicated EVCache cluster. As an optimization they use a two-level cache of memory and disk. To save money they moved to the i2 family of instances, which has 10 times as much fast SSD storage as the r3 family, and they also downsize instances when possible. They added another layer of caching with their Rend and Mnemonic servers, written in Golang. Rend is a high-performance proxy written in Go with Netflix use cases as the primary driver for development. Mnemonic is a disk-backed, RocksDB-based L2 solution that is twice as fast for reads and up to 30 times faster for writes.
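The two-level idea in miniature: a small in-memory LRU in front of a much larger store standing in for SSD. Writes go to both levels, L1 evictions leave the data in L2, and L2 hits are promoted back into memory. A minimal sketch; the class is hypothetical and a plain dict plays the role of the disk-backed store, so this is not the Rend/Mnemonic API.

```python
from collections import OrderedDict
from typing import Dict, Optional

class TwoLevelCache:
    """L1: small in-memory LRU. L2: larger store standing in for SSD (a dict here)."""

    def __init__(self, l1_capacity: int) -> None:
        self.l1: "OrderedDict[str, bytes]" = OrderedDict()
        self.l1_capacity = l1_capacity
        self.l2: Dict[str, bytes] = {}  # hypothetical stand-in for a disk-backed store
        self.l1_hits = self.l2_hits = self.misses = 0

    def set(self, key: str, value: bytes) -> None:
        self.l2[key] = value      # the full data set lives on "disk"
        self._put_l1(key, value)  # hot copy in memory

    def get(self, key: str) -> Optional[bytes]:
        if key in self.l1:
            self.l1.move_to_end(key)  # mark as most recently used
            self.l1_hits += 1
            return self.l1[key]
        if key in self.l2:
            self.l2_hits += 1
            value = self.l2[key]
            self._put_l1(key, value)  # promote hot data back into memory
            return value
        self.misses += 1
        return None

    def _put_l1(self, key: str, value: bytes) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evict LRU entry; its copy remains in L2
```

The economics follow from the sizes: RAM only has to hold the hot working set, while the cheaper SSD holds everything.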

Looks like it might be useful. Microservices: From Design to Deployment, a Free Ebook from NGINX.

When I read this I flash on that town in The Prisoner where old spies are warehoused. Google has a secret 'bench' program that keeps executives at the company even when they're not leading anything.

THE ARCHITECTURE OF THE LEAGUE CLIENT UPDATE: The target architecture for the updated client now addresses our three original issues well. It’s an engine which allows multiple teams to independently deliver HTML5-based features without unnecessary dependencies, using supporting communications infrastructure to allow players to remain connected. It’s effectively a hosting engine for C++ microservices and JavaScript web apps, where the choice of plugins is personalized and dynamic, based on a player’s entitlements.

When segregation makes sense. The Strange Science of Why Airport Security Lines Spiral Out of Control: That’s why, Larson says, if you were to line up veteran travelers and novice travelers in queues of equal length, the veteran line would move in half the time. Those grizzled travelers, shoes off before they even arrive at the bins, introduce much less variability.
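Larson's point is that variability, not just average speed, is what builds queues. A toy way to see it is Lindley's recursion for a single first-in, first-out line: each customer's wait is the previous wait plus the previous service time minus the gap between arrivals, floored at zero. Below, two lines get a traveler every minute with the same 0.9 minute average time at the bins; the numbers are made up and this does not reproduce the "half the time" figure from the article.

```python
from statistics import mean
from typing import List

def waits(interarrival: float, service_times: List[float]) -> List[float]:
    """Waiting time of each customer in a FIFO line (Lindley's recursion).

    Customers arrive every `interarrival` minutes; customer i needs service_times[i].
    W[0] = 0 and W[i+1] = max(0, W[i] + S[i] - interarrival).
    """
    w = [0.0]
    for s in service_times[:-1]:
        w.append(max(0.0, w[-1] + s - interarrival))
    return w

steady = [0.9] * 8      # veterans: everyone takes 0.9 min
mixed = [0.3, 1.5] * 4  # same 0.9 min average, alternating fast and slow

steady_wait = mean(waits(1.0, steady))  # 0.0: nobody ever waits
mixed_wait = mean(waits(1.0, mixed))    # 0.1875: every slow traveler delays the next
```

Same average service time, but only the variable line ever makes anyone wait, and in real queues with random arrivals the effect compounds.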

NVMe Over Fabrics Standard is Released: NVMf allows the new high performance SSD interface, Non-Volatile Memory Express (NVMe), to be connected across RDMA-capable networks. This is the first new built-from-the-ground-up networked storage technology to be developed in over 20 years.

Networking @Scale, May 2016 — Recap. Lots of excellent content from Microsoft, Google, Netflix, AT&T, Facebook, Comcast, Akamai, and JPL. How can you not love a talk with a title like this? Jet Propulsion Laboratory, Luther Beegle — "Networking Between Earth and Mars".

Sweet: Understanding caching in Postgres - An in-depth guide.
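Postgres manages its shared buffer pool with a clock-sweep replacement policy: each buffer carries a usage count that is bumped on access (capped at 5 in the Postgres source), and a sweeping "hand" decrements counts until it finds a buffer at zero to evict. A simplified model; it ignores pinning, locking, dirty-page writeback, and the ring buffers Postgres uses for large scans.

```python
from typing import Dict, List, Optional

MAX_USAGE = 5  # mirrors BM_MAX_USAGE_COUNT in the Postgres source

class ClockSweepPool:
    def __init__(self, n_buffers: int) -> None:
        self.pages: List[Optional[int]] = [None] * n_buffers  # page held by each slot
        self.usage: List[int] = [0] * n_buffers
        self.where: Dict[int, int] = {}  # page id -> slot
        self.hand = 0
        self.misses = 0

    def access(self, page: int) -> int:
        """Return the slot holding `page`, loading it (and evicting) on a miss."""
        slot = self.where.get(page)
        if slot is not None:
            self.usage[slot] = min(self.usage[slot] + 1, MAX_USAGE)
            return slot
        self.misses += 1
        slot = self._find_victim()
        old = self.pages[slot]
        if old is not None:
            del self.where[old]
        self.pages[slot] = page
        self.usage[slot] = 1  # a freshly loaded page starts with usage 1
        self.where[page] = slot
        return slot

    def _find_victim(self) -> int:
        # Sweep: take an empty slot or one with usage 0; otherwise decrement and move on.
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pages[slot] is None or self.usage[slot] == 0:
                return slot
            self.usage[slot] -= 1
```

Frequently used pages build up a count the hand has to wear down several times before evicting them, which approximates LRU without maintaining an ordered list on every access.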

Yelp on Monitoring Cassandra at Scale. spodkowinski: it's an interesting approach. Monitoring the availability of replicas instead of individual nodes probably makes sense. However, I'm wondering how this information is actionable for your team.
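The replica-centric view asks a different question than "which nodes are down": for each token range, are enough replicas alive for QUORUM (a majority, floor(RF/2) + 1) to succeed? A minimal sketch with a made-up three-node-replica ring; the function names and topology are hypothetical, not Yelp's tooling.

```python
from typing import Dict, List, Set

def quorum(replication_factor: int) -> int:
    """Cassandra QUORUM: a majority of replicas, floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def ranges_without_quorum(
    replicas_by_range: Dict[str, List[str]],
    live_nodes: Set[str],
) -> List[str]:
    """Token ranges where too few replicas are up for QUORUM to succeed."""
    at_risk = []
    for token_range, replicas in replicas_by_range.items():
        live = sum(1 for node in replicas if node in live_nodes)
        if live < quorum(len(replicas)):
            at_risk.append(token_range)
    return at_risk

# Hypothetical ring with RF=3.
ring = {"r1": ["a", "b", "c"], "r2": ["b", "c", "d"], "r3": ["c", "d", "a"]}
one_down = ranges_without_quorum(ring, {"a", "b", "d"})  # [] : every range still has 2 of 3
two_down = ranges_without_quorum(ring, {"a", "b"})       # ['r2', 'r3']
```

This partly answers the "how is it actionable" question: a node alert says something broke, while an empty list here says no data is unavailable at QUORUM yet, and a non-empty list names exactly which ranges are affected.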

A wonderful introduction: From radio waves to packets with software defined radio: Software-defined radio as a concept means that some or all of the components in radio that have traditionally been implemented in hardware (e.g. mixers, filters, amplifiers, modulators/demodulators) are now implemented in software.
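To make "mixers and filters in software" concrete: a mixer shifts a signal in frequency by multiplying it by a local oscillator, and on complex (IQ) samples that is plain arithmetic. A minimal pure-Python sketch; the sample rate and tone frequency are made up, and the moving average is only a crude stand-in for a real low-pass filter.

```python
import cmath
import math
from typing import List

def tone(freq_hz: float, sample_rate: float, n: int) -> List[complex]:
    """Complex (IQ) samples of a single tone at freq_hz."""
    return [cmath.exp(2j * math.pi * freq_hz * k / sample_rate) for k in range(n)]

def mix(samples: List[complex], shift_hz: float, sample_rate: float) -> List[complex]:
    """Software mixer: multiply by a local oscillator to shift frequency down by shift_hz."""
    return [s * cmath.exp(-2j * math.pi * shift_hz * k / sample_rate)
            for k, s in enumerate(samples)]

def moving_average(samples: List[complex], taps: int) -> List[complex]:
    """Crude low-pass FIR filter: average each window of `taps` samples."""
    return [sum(samples[i:i + taps]) / taps for i in range(len(samples) - taps + 1)]

fs = 48_000.0
signal = tone(1_000.0, fs, 480)      # a 1 kHz tone, 48 samples per cycle
baseband = mix(signal, 1_000.0, fs)  # shifted down to 0 Hz: every sample ≈ 1+0j
```

Mixing then low-pass filtering is the software version of tuning: `moving_average(signal, 48)` averages the un-mixed tone away to roughly zero, while `moving_average(baseband, 48)` keeps it at about 1, so only the signal you shifted to 0 Hz survives the filter.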

I've wondered this. Could I still program if I went blind? I don't know if I could, but this guy is amazing. A Vision of Coding, Without Opening your Eyes.

Antikernel: A Decentralized Secure Hardware-Software Operating System Architecture: This work presents Antikernel, a novel operating system architecture consisting of both hardware and software components and designed to be fundamentally more secure than the state of the art. To make formal verification easier, and improve parallelism, the Antikernel system is highly modular and consists of many independent hardware state machines (one or more of which may be a general-purpose CPU running application or systems software) connected by a packet-switched network-on-chip (NoC). We create and verify an FPGA-based prototype of the system.

Unorthodocs: Abandon your DVCS and Return to Sanity. For most cases, yep.

FPGA design with CλaSH: The world’s finest imperative programming language is also useful for implementing in hardware.

This is cool. hamsternz/FPGA_Webserver: A work-in-progress for what is to be a software-free web server for static content.

E2: Achieving the right balance of power and performance for an application is challenging with today's multicore processors. E2 solves this problem by providing the capability for cores to dynamically adapt their resources during execution to provide highly efficient power/performance hardware configurations for a wide range of workloads.

Stanford Seminar - Rick Coulson of Intel: "The quest for low storage latency changes everything"

Resilience Engineering: The videos presented on this page are the product of a collaboration between Ohio State University's Cognitive Systems Engineering Laboratory, O'Reilly Media and the Resilience Engineering Association. Each film represents one segment of a half-semester-long course taught at Ohio State University by Dr. David Woods.

Could a neuroscientist understand a microprocessor?: Here we take a simulated classical microprocessor as a model organism, and use our ability to perform arbitrary experiments on it to see if popular data analysis methods from neuroscience can elucidate the way it processes information. We show that the approaches reveal interesting structure in the data but do not meaningfully describe the hierarchy of information processing in the processor. This suggests that current approaches in neuroscience may fall short of producing meaningful models of the brain.

HopData: a service that makes it easy for developers of all skill levels to use machine learning technology. HopData provides visualization tools and wizards that guide you through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology. Once your models are ready, HopData makes it easy to obtain predictions for your application using simple APIs, without having to implement custom prediction generation code or manage any infrastructure.

traefik.io: a modern HTTP reverse proxy and load balancer made to deploy microservices with ease. It supports several backends (Docker, Swarm, Mesos/Marathon, Kubernetes, Consul, Etcd, Zookeeper, BoltDB, Rest API, file…) to manage its configuration automatically and dynamically.

linkedin/ambry: a distributed object store that supports storage of trillions of small immutable objects (50K-100K) as well as billions of large objects. It was specifically designed to store and serve media objects in web companies. However, it can be used as a general purpose storage system to store DB backups, search indexes or business reports.
