
Stuff The Internet Says On Scalability For February 26th, 2016

High Scalability

26 Feb 2016 — 12 min read

Wonderful diagram of @adrianco's Microservices talk at #OOP2016 by @remarker_eu.

If you like this sort of Stuff then please consider offering your support on Patreon.

350,000: new Telegram users per day; 15 billion: messages delivered by Telegram per day; 50 billion suns: max size of a black hole; 10,000x: lower power for Wi-Fi; 400 hours: video uploaded to YouTube every minute;

Quotable Quotes:
- sharemywin: I don't think consensus scales. So, I think they'll be an ecosystem of block chains.
- @aneel: "There is no failover process other than the continuous dynamic load balancing."
- Jono MacDougall: If you are happy hosting your own solution, use Cassandra. If you want the ease of scaling and operations, Use DynamoDB.
- @plamere: Google’s BigQuery is *da bomb* - I can start with 2.2Billion ‘things’ and compute/summarize down to 20K in < 1 min.
- Haifa Moses: We’re evaluating a totally new software model that allows us to automatically diagnose if a failure occurs during a mission and for messages to be displayed for flight controllers on the ground
- @fmbutt: IBM abstracted analog calculation. MS abstracted HW. Goog abstracted SW. Powerful Mobile AI could abstract clouds.
- Jon Grall: Essentially, there’s a massive oversupply of apps, and the app markets are now saturated and suffering from neglect and short-term thinking by the companies who operate them.
- jhgg: At work we moved to GCE at the beginning of this year, from Linode after they were having stability issues over the christmas break. No complaints from us. So far have been very happy with it. We were considering moving to AWS, but to realize the same pricing as GCE we'd have to purchase reserved instances - the sustained usage discounts have been huge for us.
- Brave New Geek: Python and App Engine were fast. Not like “this code is f*cking fast” fast—what we call performance—more like “we need to get this sh*t working so we have jobs tomorrow” fast—what we call delivery.
- Ben Stopford: utilizing a distributed log as a backbone has some pretty cool side effects. Add a bit of stream processing into the mix and it all gets a little more interesting still.
- @mtnygard: Even my game console is a distributed system. Controllers have their own CPUs and talk over a bluetooth network.
- @sustrik: if select{} is expected to be used, we won't have nice object APIs. The channels will dangle out of the open gut of the object making attempts to hide internals of the object behind nice function-based API futile
- @kelseyhightower: So tired of complex, on-disk, config files. Simple key/value does the trick. Pull configs from remote source or env, then cache locally.
- tmckn: Bigger blocks are not a scaling strategy
- api: Seems to be a pattern: wimpy mobile endpoint devices drive everything to the cloud.
- nrh: Containers are a part of a long-term strategy to get away from puppet and onto more idempotent units of deployment, and move a lot of what is considered to be "configuration" back into the build process where it belongs.
- @kylembrandt: Some of our @OpenTSDB stats: 3.7 billion datapoints a day (43k a second) per cluster at ~8GB a day un-replicated storage (3x replicated).
- @BenBajarin: Chinese smartphone brands took 40% of global shipment share in 2015. This likely to be well over 50% in 2016. Eventually 60%.
- Steve Shogren: Pop Culture Architecture is the current “fad” of the day. I have seen it be microservices, business capabilities, CQRS, service-oriented architecture, Domain-Driven Design, test driven development, ORMs, ActiveRecord, and MVC. Each of these have been fashionable at some point.
- Marian Dvorsky: Lately, we've [Google] been instead focusing on building systems that make tuning for the most part unnecessary. For example, Dataflow figures out the number of shards automatically (and dynamically reshards as necessary) which takes the manual guesswork out of the picture.

You have to love the datacenter of the future. Data is stored in the DNA of seeds. Compute inhabits electronic plants using xylem, leaf, and vein in the creation of digital organic electronic circuits. Instead of walking into a cold dead datacenter we'll frolic in an uplifted Garden of Eden.

Relying on APIs is like building bridges and skyscrapers out of materials that constantly change their properties. Just Landed is Shutting Down: Since Just Landed launched in 2012, the cost of running the service has steadily increased over time. While flight data remains expensive, the real source of the cost increases has been adapting to the demise or restructuring of supporting services such as StackMob, UrbanAirship, and Bing Maps that Just Landed previously relied on. Traffic and mapping data in particular, much of which used to be free, has become quite expensive, and is now tightly controlled by big companies under oppressive Terms of Service.

With Spotify moving to the Google Cloud Platform it looks like Google may have found a friendly marketing face to play the same role Netflix plays for AWS. Why make the move? nrh: Spotifier here. Frankly, price is not the biggest factor in a decision like this. If we were going for the lowest cost cloud option, it probably wouldn't be either AWS or Google - there are other providers who are hungrier for business that would be willing to do deep cuts at our scale. The way we think about this is that there are basically two classes of cloud services: commodities and differentiated services. Commodities are storage/network/compute, and the big players are going to compete on price and quality on these for the foreseeable future (as with most commodities). The differentiated services stuff is a bit more interesting. Different players have different strengths and weaknesses here - AWS has way, way better capabilities when it comes to administration and access control and identity management, for example (which is actually pretty important when trying to do this in a large org). The places where Google is strong (data platform) are the places that are most important for us as a business. Compelling: dataproc+gcs, bigquery, pubsub, dataflow. Made it safe: high-enough quality, cheap enough.

This is serious design from Apple. Security Now 548 DDoS Attack Mitigation~ Every single update for an iOS device is custom made by Apple in response to a request from the device. It's signed with Apple's private key in order for the device to accept it. No iOS update can be mass distributed. An update can only be used one time on one device.

Uber is liking themselves some Go. How We Built Uber Engineering's Highest Query per Second Service Using Go. Golang was chosen instead of Node.js because: geofence lookups are required on every request from Uber’s mobile apps and must quickly (99th percentile < 100 milliseconds) answer a high rate (hundreds of thousands per second) of queries; geofence lookups require CPU-intensive point-in-polygon algorithms, while Node.js works great for Uber's other services, which are I/O intensive; and because Node.js is single threaded, background refreshing can tie up the CPU for an extended period of time. The result: 99.99% uptime; this service handled a peak load of 170k QPS with 40 machines running at 35% CPU.
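To make "CPU-intensive point-in-polygon" concrete, here is a minimal sketch of the classic ray-casting test. It is a generic textbook version, not Uber's implementation (theirs is in Go and is not shown in the excerpt above); the function name and the unit-square geofence are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: shoot a horizontal ray to the right of the point and
    count how many polygon edges it crosses. An odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical geofence: a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
inside_square = point_in_polygon((0.5, 0.5), square)
outside_square = point_in_polygon((1.5, 0.5), square)
```

The loop is linear in the number of polygon vertices and runs on every lookup, which is the kind of pure CPU work the article says sits on Uber's request path.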

It really is. Someone is pretending to be the IT guy at Hogwarts and it's hilarious. "It took them until 2016, but both students and staff alike have finally caved and demanded that their cell phones work on school grounds, and with that request they had to find a “muggle” (a term I’m quickly learning to detest) to install wifi and maintain any technology that functions on school grounds."

Before you pooh-pooh on-the-fly code generation, remember the brain uses synaptic pruning as a tuning mechanism. For example, Japanese babies can hear the distinction between "r" and "l", although only the "r" sound exists in Japanese, but cannot by 12 months of age.

Facebook on the future of networking and their Telecom Infra Project: It is worth noting that a major transformation has happened in the data center world over the last six years. Optical networks, largely fueled by advancements in densification of DSP chips, have exploded in bandwidth — by ~20x — for the same cost. In 2010 at Facebook we had 1 Gbps links to each server with 2 Gbps of bandwidth per rack of 40 servers. Later this year we expect to deploy 25 Gbps links to each server and 400 Gbps of bandwidth per rack. This dramatic change has made an impact on the way we design almost all of our services and allowed us to realize large cost savings while simplifying the components involved in web service, storage, and databases. As more mobile operators move more of their infrastructure into larger data centers and deploy the newest optical technologies widely, we should see opportunities to level off or reduce spend while greatly increasing capacity.
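A quick back-of-the-envelope on those Facebook numbers, as a sketch. Two assumptions that are not in the quote: that "bandwidth per rack" means the rack's uplink, and that the newer racks still hold 40 servers.

```python
def oversubscription(servers_per_rack: int, link_gbps: float, rack_gbps: float) -> float:
    """Ratio of total server link capacity to rack bandwidth."""
    return servers_per_rack * link_gbps / rack_gbps


def per_server_share_gbps(servers_per_rack: int, rack_gbps: float) -> float:
    """Rack bandwidth available per server if every server sends at once."""
    return rack_gbps / servers_per_rack


# 2010: 1 Gbps links, 2 Gbps per rack of 40 servers (from the quote).
ratio_2010 = oversubscription(40, 1, 2)
share_2010 = per_server_share_gbps(40, 2)

# 2016: 25 Gbps links, 400 Gbps per rack; 40 servers per rack is an ASSUMPTION.
ratio_2016 = oversubscription(40, 25, 400)
share_2016 = per_server_share_gbps(40, 400)
```

Under those assumptions the rack goes from 20:1 oversubscribed to 2.5:1, and the worst-case per-server share of rack bandwidth goes from 0.05 Gbps to 10 Gbps.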

Google seeks new disks for data centers. Google's storage needs continue to grow exponentially, with a 10x increase every five years. Since the key to scale is specialization, Google wants to create new disk drives "that are a better fit for data centers supporting cloud-based storage services." Disks for Data Centers is a white paper on the subject.
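For a feel of what "10x every five years" means year to year, here is a small sketch that converts it to a compound annual growth factor and a doubling time. The function names are illustrative; the only input taken from the text is the 10x-per-five-years figure.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Compound annual factor g such that g ** years == total_factor."""
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor: float, years: float) -> float:
    """Years needed to double at that compound rate."""
    return math.log(2) / math.log(annual_growth_factor(total_factor, years))


growth = annual_growth_factor(10, 5)      # roughly 1.585, i.e. about 58% a year
doubling = doubling_time_years(10, 5)     # roughly 1.5 years
ten_year_factor = growth ** 10            # 100x over a decade
```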

The great Golang talk on high performance that never was. So You Wanna Go Fast?: Lastly, we look at performance with contention on both the reader and writer. Again, the ring buffer’s performance is much worse in the single-threaded case but better in the multithreaded case; Using defer is almost five times slower in this test; Code-generated JSON is about 38% faster than the standard library’s reflection-based implementation; In this benchmark, we compare allocating structs in 10 concurrent goroutines on the heap vs. using a sync.Pool for the same purpose. Pooling yields a 5x improvement;

AMP is now sounding a little more like an alternate Internet. A Q&A with Google’s head of news Richard Gingras on its vision for the Accelerated Mobile Pages project: AMP is not just about news and not just about articles. That was our initial focus. I see applications across a whole spectrum of web experiences, from e-commerce sites to the landing pages for an ad. It’s interesting if you study ad performance, that if I click on an ad, how important getting quickly to the next step is. If you could somehow collapse that to be instantaneous, then ads will be more effective. There are lots of potential areas that are succeeding, and some that are not, and we’re excited about all these next steps.

Videos from Microservices Practitioner Summit are available. You might like Don't Build a Distributed Monolith or Scaling Uber from 1 to 100s of Services.

Here's some unsubstantiated speculation on Telegram's operations: Spending on telegram's servers only is ~1M/month...you need to understand that telegram rolled not only their own crypto, but their own DB, PHP compiler and so on. Everything is written in C++. To solve one of the earliest DDOS attacks telegram's team take their servers put it on the track and install them to other DC. This is insane.

Autonomous cars only ease traffic when paired with smart lights: But giving intelligence to traffic lights as well created a different picture. In this scenario, Gershenson and Zapotecatl saw an improvement of 200 per cent in traffic flow compared to the situation with human drivers and ordinary traffic lights. Their smart lights were very simple: each set had sensors that could detect how many cars were approaching on each street and give priority to those with more traffic.
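The rule described, count approaching cars per street and favor the busier one, is simple enough to sketch in a few lines. This is only an illustration of that one sentence, not the researchers' model; the street names, the tie-breaking, and the `cars_per_green` throughput are all assumptions.

```python
from typing import Dict, Tuple


def pick_green(queues: Dict[str, int]) -> str:
    """Give the green light to the street with the most approaching cars.
    Ties go to whichever street appears first in the dict."""
    return max(queues, key=lambda street: queues[street])


def step(
    queues: Dict[str, int],
    arrivals: Dict[str, int],
    cars_per_green: int = 3,  # hypothetical: cars that clear per green phase
) -> Tuple[str, Dict[str, int]]:
    """Advance one light cycle: add new arrivals, choose a street, let cars through."""
    updated = {street: queues[street] + arrivals.get(street, 0) for street in queues}
    green = pick_green(updated)
    updated[green] = max(0, updated[green] - cars_per_green)
    return green, updated


green, queues_after = step(
    {"north_south": 0, "east_west": 0},
    {"north_south": 5, "east_west": 1},
)
```

Here `green` comes out as `"north_south"`, and the queues after the cycle are 2 cars north-south and 1 car east-west.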

Interesting story of Building Software at Etsy. Lots on the process, engineering practices, and engineering principles.

Ryan Adams with an awesome muggle-accessible technical explanation of AlphaGo on the Machine Learning Music Videos episode of the Talking Machines podcast. A number of different techniques were combined to reduce the number of available moves at any given moment and rapidly search the depth of the tree of potential moves. To reduce the number of potential plays they looked at 30 million plays made by high level human players. So humans still have some use.

Google Genomics has released a dataset that allows you to see how genetic variation is shared among individuals in 26 populations across the world. A Google Dataflow pipeline computed an analysis on over 5 trillion pairs of variants for each of 31 population groupings of the 2,504 individuals in the 1000 Genomes data set.

Very far from simple, but the MySQL autopilot pattern might save 40-50% over RDS. The autopilot pattern: is an approach to application and infrastructure design that pushes automation for each component of our systems into the application. Each container that makes up the application has its own lifecycle, and we package those lifecycle behaviors into the application container rather than relying on external infrastructure.

How a team of five created a next-gen MMO with Unity and SpatialOS. Gorgeous MMO built from millions of individually simulated entities with rich behaviour based on simple rules. It's based on an Entity-Component-Worker: Everything in the world (players, rocks, creatures) is an entity, the core building block of a simulation. Entities are made of components, which describe their different aspects (Health, Physical) and can be combined freely. These behaviours are not run by a single server but by workers, a swarm of compute resources dynamically allocated to cope with the changing workload of the simulation. These workers can be game engines such as Unity.
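A rough sketch of the Entity-Component-Worker idea as described: entities are bags of components, and workers apply simple rules to whichever entities they are handed. None of this is the SpatialOS API; the class names mirror the components mentioned above (Health, Physical), and the static round-robin sharding is a stand-in for the dynamic allocation the article describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Health:
    points: int


@dataclass
class Physical:
    x: float
    y: float


@dataclass
class Entity:
    entity_id: int
    components: Dict[type, object] = field(default_factory=dict)

    def add(self, component: object) -> "Entity":
        # One component per type; components can be combined freely.
        self.components[type(component)] = component
        return self

    def get(self, kind: type) -> Optional[object]:
        return self.components.get(kind)


def assign_to_workers(entities: List[Entity], worker_count: int) -> List[List[Entity]]:
    """Hypothetical static sharding: spread entities over workers by id."""
    shards: List[List[Entity]] = [[] for _ in range(worker_count)]
    for entity in entities:
        shards[entity.entity_id % worker_count].append(entity)
    return shards


def damage_worker(shard: List[Entity], amount: int) -> None:
    """A 'worker' rule: reduce Health on every entity in the shard that has it."""
    for entity in shard:
        health = entity.get(Health)
        if health is not None:
            health.points = max(0, health.points - amount)


player = Entity(0).add(Health(10)).add(Physical(1.0, 2.0))
rock = Entity(1).add(Physical(5.0, 5.0))  # rocks have no Health
creature = Entity(2).add(Health(3)).add(Physical(0.0, 0.0))

shards = assign_to_workers([player, rock, creature], worker_count=2)
for shard in shards:
    damage_worker(shard, amount=4)
```

Because behaviour hangs off components rather than entity types, the rock is skipped without any special casing.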

Videos from Container Summit New York 2016 are available.

Good answers to What in your opinion is the best way to scale Wordpress horizontally? An Nginx Web Server handles static content requests, all vhost routing, our Redis cache, and load balancing for php requests; We are hosted on Amazon, and use Elastic Beanstalk so it's quite easy to scale up and down based on demand and traffic; We actually use NFS mounts for uploads directory. No need to worry about synchronizing with rsync and there's no need to designate an "admin" or master/slave - all VMs are equals which makes on-demand scaling much more reasonable.

On-chip random key generation done using carbon nanotubes. david_in_oregon: this is a PUF (Physically Unclonable Function)...The work is very interesting to people who are interested in PUFs, but it doesn't constitute something product ready and widely understood to be secure.

This reminds me a lot of how programmers work. Chefs have an idea of Mise en place, a "French phrase that means to gather and arrange the ingredients and tools needed for cooking." But it's not merely an idea: "It really is a way of life ... it's a way of concentrating your mind to only focus on the aspects that you need to be working on at that moment, to kind of rid yourself of distractions."

Analyse city traffic from webcams through Google Cloud Vision API. Person of Interest is just around the corner, but as spooky as this is it's a great test of Google's Cloud Vision API.

Same as for a startup? Algorithm for making a hit film: create a child-friendly superhero film with plenty of action and scope for turning it into a franchise; set your budget at an impressive but not reckless $85m; convince a major studio to distribute it on wide release in the summer; cast two lead actors with a solid but unspectacular box-office history, who are thus not too expensive.

Not sure this is the architecture of the future, but it's a different way of thinking about building things. 7 Cool Decentralized Apps Being Built on Ethereum: Etheria is a Minecraft-like virtual world in which players can own tiles, 'farm' them for blocks and build things. According to the project website, the "entire state of the world is held in and all player actions are made through the decentralized, trustless Ethereum blockchain". Until now, it points out, all virtual worlds have been controlled by a single entity. All aspects of Etheria, on the other hand, are "agreed to" by the participants of the Ethereum network without central authority.

A helpful way of looking at the decision process. Splunk vs the Elastic Stack – Which Tool is Right For You?: is it a closed / constant problem or you’re expecting it to grow with additional use cases over time; Splunk has been traditionally on-prem, serving large enterprises, and that’s where it puts most of its focus, with easily customized solutions for a big set of use cases. ELK is all over the place, and its success depends on how much effort you’ll put in.

Definitely PRish, but the scale of broadcasting the Olympics is impressive. Scaling Live Streaming Olympics with Varnish: nearly two billion page views; 159 million video streams; 25,000 transactions per second; 2.8 petabytes of data delivered in a single day; 106 million requests for live and on-demand Olympic video content.

The Reasons Why Mobile Games Don't Go Viral Abroad: #1 No Plan For Worldwide Promotion; #2 Disregard Localization From The Beginning; #3 Not Paying Attention To Local Game Distribution Channels; #4 Unclear Monetization Strategy; #5 Not Enough Testing.

lewisl9029 with a great list of p2p-ish libraries.

onyx-platform/onyx: a masterless, cloud scale, fault tolerant, high performance distributed computation system; batch and stream hybrid processing model; exposes an information model for the description and construction of distributed workflows; Competes against Storm, Cascading, Cascalog, Spark, Map/Reduce, Sqoop, etc; written in pure Clojure.

openwhisk/openwhisk: a cloud-first distributed event-based programming service. It provides a programming model to upload event handlers to a cloud service, and register the handlers to respond to various events.

snipsco/ntm-lasagne: a library to create Neural Turing Machines (NTMs) in Theano using the Lasagne library. If you want to learn more about NTMs, check out our blog post.

Maglev: A Fast and Reliable Software Network Load Balancer: Maglev is Google’s network load balancer. It is a large distributed software system that runs on commodity Linux servers. A single Maglev machine is able to saturate a 10Gbps link with small packets. Maglev is also equipped with consistent hashing and connection tracking features, to minimize the negative impact of unexpected faults and failures on connection-oriented protocols.
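To show why consistent hashing limits the damage when a backend disappears, here is a minimal sketch of a generic hash ring with virtual nodes. To be clear, this is the textbook ring technique, not the lookup-table algorithm the Maglev paper describes; the class name, backend names, and replica count are all illustrative.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    """Stable 64-bit hash derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, backends: List[str], replicas: int = 100):
        if not backends:
            raise ValueError("at least one backend is required")
        # Each backend is placed on the ring many times to even out load.
        self._ring = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, connection_key: str) -> str:
        """Return the backend owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(connection_key)) % len(self._ring)
        return self._ring[idx][1]


before = HashRing(["backend-1", "backend-2", "backend-3"])
after = HashRing(["backend-1", "backend-2"])  # backend-3 has failed

keys = [f"10.0.0.{i}:443" for i in range(200)]
# Keys that did not live on backend-3 but changed owner anyway.
moved = [
    k for k in keys
    if before.lookup(k) != "backend-3" and before.lookup(k) != after.lookup(k)
]
```

With a ring like this, only connections that hashed to the failed backend get a new owner, so `moved` stays empty. A naive `hash(key) % number_of_backends` scheme would reshuffle most connections instead.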

ZeroDB white paper: ZeroDB is an end-to-end encrypted database that enables clients to operate on (search, sort, query, and share) encrypted data without exposing encryption keys or cleartext data to the database server. The familiar client-server architecture is unchanged, but query logic and encryption keys are pushed client-side.

Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples: We introduce the attack strategy of fitting a substitute model to the input-output pairs in this manner, then crafting adversarial examples based on this auxiliary model. We evaluate the approach on existing DNN datasets and real-world settings. In one experiment, we force a DNN supported by MetaMind (one of the online APIs for DNN classifiers) to mis-classify inputs at a rate of 84.24%.

linkerd: is our open-source RPC proxy for microservices. It's built directly on Finagle, and is designed to give you all the operational benefits of Twitter's microservice architecture—those many lessons learned over many years—in a way that's self-contained, has minimal dependencies, and can be dropped into existing applications with a minimum of change.
