Stuff The Internet Says On Scalability For September 30th, 2016

Hey, it's HighScalability time:

Everything is a network. Map showing the global genetic interaction network of a cell.

If you like this sort of Stuff then please support me on Patreon.

  • 18: Google can now drink and drive in Washington DC.; $10 billion: cost of a Vision Quest to Mars; 620 Gbps: DDoS attack on KrebsOnSecurity; 1 Tbps: DDoS attack on OVH; $200,000: cost of a typical cyber incident; 8 million: video training dataset labeled with 4800 labels; 180: Amazon warehouses in the US; 10: bits of info per photon; 16: GPUs in new AI killer P2 instance type;

  • Quotable Quotes:
    • @markmccaughrean: 1,000,000 people to Mars in 100 yrs. 10 people/launch? That's 3 a day, every day, for a century. 1% failure rate? One explosion every month
    • @jeremiahg: Any sufficiently advanced exploit is indistinguishable from a 400lb hacker.
    • BrianKrebs: I suggested to Mr. Wright perhaps a better comparison was that ne’er-do-wells now have a virtually limitless supply of Stormtrooper clones that can be conscripted into an attack at a moment’s notice.
    • Sonia: Academia’s not-so-subtle disdain for applied research does more than damage a few promising careers; it renders our field’s output useless, destined to collect dust on the shelves of Elsevier.
    • Monica L. Smith: Nobody builds their own infrastructure. You don’t build your own highway, train line, water pipe, your own sewer. Those are things that connect you and your household to everybody else sequentially in your neighborhood, in your region, from the city out into the broader hinterlands.
    • @olesovhcom: This botnet with 145607 cameras/dvr (1-30Mbps per IP) is able to send >1.5Tbps DDoS. Type: tcp/ack, tcp/ack+psh, tcp/syn.
    • kenrose: We see this pattern at PagerDuty over the majority of our customers. There is a definite lull in alert volume over the weekends that picks up first thing Monday morning. It's led to my personal conclusion that most production issues are caused by people, not errant hardware or systems.
    • @rseroter: "We Crammed this Monolith Into a Container and Called it a Microservice"
    • @mweagle: I really don’t want to run my own k8s in AWS, but ECS is so opaque to debug that k8s seems like a good choice.
    • Werner Vogels~ We have this overarching goal which is customer centricity. Doing anything that benefits the customer gets priority above everything else. Working on eliminating all single points of failure in the company purely benefits the customer because it really improves the customer experience.
    • Cory Doctorow~ The thing open source software had going for it was the Ulysses Pact...the  irrevocable license, the failure mode of open source software, having founded an open source software company, I can tell you there are moments where it feels like your survival turns on being able to close the code you had opened when you were idealistic. There are moments of desperation when that happens. 
    • @lightbend: "We've been using #Akka in production for over two years, without a single crash." -@CruiseNorwegian
    • @cloud_opinion: Monolithic -> Microservices -> "which container image?" -> "Screw it, lets do PaaS" ->  CF  or AWS?
    • Etsy: concurrency proved to be great for logical aggregation of components, and not so great for performance optimization. Better database access would be better for that.
    • Yaniv Nizan: the number of users actually contributing ad revenue in your app is a lot lower than 6.5% and much closer to the 1% or 2% that contribute revenue from In-app purchases. 
    • @reckless: Elon is basically putting on an Apple event, for going to Mars.
    • @potch: DRY: Don't Repeat Yourself / DAMP: Do Abstraction/Minimalism Pragmatically / MOIST: Maybe Only Innovate Some Times?
    • @dannysullivan: In the Facebook video metrics thing, spare a thought for the poor BuzzFeed watermelon, less viral than it thought :)
    • Addison Snell: If the promise of cloud computing is overblown, it is because of the amplification it gets from its loyal converts, enterprises who have found liberation and agility in outsourcing IT.
    • @psaffo: In 1990, the size of the US software industry was $3.2 billion -- the same size as the gourmet popcorn industry in that same year.
    • David Rosenthal: [Storage] Revenues are flat or decreasing, profits are decreasing for both companies. These do not look like companies faced by insatiable demand for their products; they look like mature companies facing increasing difficulty in scaling their technology.
    • @legind: Let's Encrypt now the 3rd largest CA, after Comodo and Symantec, comprising over 13% of the SSL cert market share 
    • @stewartbrand: “In the long run, the technology driving activities in space will be biological.” Rousing essay by Freeman Dyson.
    • @jessitron: Constructing causal ordering at the generic level of "all messages received cause all future messages sent" is expensive and also less meaningful than a business-logic-aware, conscious causal ordering. This conscious causal ordering gives us external consistency, accurate legibility, and visibility into what we know to be causal.
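Two of the quotes above are back-of-envelope arithmetic, and both hold up. The sketch below redoes the sums using only the figures quoted by @olesovhcom and @markmccaughrean; the function names are made up for illustration, and the 365-day year is an assumption.

```python
# Back-of-envelope checks; every input number comes from the quotes above.

def botnet_capacity_tbps(devices: int, mbps_per_device: float) -> float:
    """Aggregate bandwidth in Tbps if every device sends at the given rate."""
    return devices * mbps_per_device / 1_000_000  # Mbps -> Tbps


def launches_per_day(people: int, per_launch: int, years: int) -> float:
    """Launches needed per day, assuming 365-day years."""
    return people / per_launch / (years * 365)


def failures_per_month(people: int, per_launch: int, years: int, failure_rate: float) -> float:
    """Expected failed launches per month over the whole program."""
    total_launches = people / per_launch
    return total_launches * failure_rate / (years * 12)


if __name__ == "__main__":
    devices = 145_607
    low = botnet_capacity_tbps(devices, 1)     # every camera at 1 Mbps
    high = botnet_capacity_tbps(devices, 30)   # every camera at 30 Mbps
    needed = 1.5 * 1_000_000 / devices         # Mbps per camera to reach 1.5 Tbps
    print(f"botnet range: {low:.2f} to {high:.2f} Tbps, {needed:.1f} Mbps each for 1.5 Tbps")

    print(f"Mars launches per day: {launches_per_day(1_000_000, 10, 100):.2f}")
    print(f"Mars failures per month: {failures_per_month(1_000_000, 10, 100, 0.01):.2f}")
```

The botnet needs an average of only about 10 Mbps per device to pass 1.5 Tbps, well inside the quoted 1-30 Mbps range, and the Mars figures round to the "3 a day" and roughly "one explosion every month" in the tweet.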

  • In an article light on details, written more with a marketing flourish, we still learn some interesting things about the infrastructure behind Pokemon Go. Bringing Pokémon GO to life on Google Cloud. It runs on Google Cloud, Kubernetes, Google Container Engine, HTTP/S Load Balancer, and Cloud Datastore. Keep in mind Alphabet is invested in Niantic, and Ingress, the forerunner of Pokemon Go, ran on App Engine. So it sounds like a new backend implementation that had to scale from zero to the size of Twitter in a matter of weeks, with a much more complicated workload. Growth was explosive. Player traffic was 50x larger than initial estimates. An implication is the problems experienced during launch were not infrastructure related. Google, in the form of Customer Reliability Engineering (CRE), worked closely with Niantic to make sure the infrastructure scaled. The problems must have been elsewhere in the application stack, which is perfectly understandable. That sort of load could not have been predicted. The design decisions you make for 5x expected traffic are very different from the ones you make for 50x. Nobody will spend the money or take the time to build a system for 50x. Nobody. Lots of good comments on HackerNews. Good question by ksec: would Pokemon Go even be possible in a pre-cloud era?

  • Where do you turn when your second and third tier super heroes cry no mas? Google and their Project Shield. The Democratization of Censorship. When Akamai shut down Brian Krebs' website in response to an unprecedented DDoS attack, Google stepped up and executed a classic Roman tortoise formation, using their massive infrastructure, designed to protect independent news sites, as a shield from DDoS attacks. You can't really blame Akamai; they said a sustained attack would cost them millions of dollars. Being a Hero is not profitable. The Internet was never designed to protect itself against attacks, so almost no entity can protect itself from the Spanish Inquisition. Let's just hope that, in the same way bad actors in the ad space prompted Google to create AMP, a constant stream of crippling DDoS attacks won't cause all websites to seek shelter under Google's mighty shield.

  • How YouTube Reinvented Itself for the Next Billion Users. When you read how Google redesigned YouTube to work in places without great networks and top of the line hardware, it reads a lot like how Facebook has approached their mobile apps: How Facebook Makes Mobile Work At Scale For All Phones, On All Screens, On All Networks. What is the new YouTube experience? Make it work on even the cheapest phones; enable sharing between people; localize it as much as possible; maximize data-friendliness. The result is an app that works offline, really offline: YouTube Go compresses and caches thumbnails for videos so you can poke around and see what’s there. You can see, share, and watch videos without ever pinging a cellphone tower.

  • In the cloud version of celebrity dating it seems Adobe and Azure are now a thing. It's not an exclusive relationship. Adobe is still going out with Amazon and is spreading its love across multiple private and public clouds. Makes sense, you create more jealousy, er, leverage that way. It seems Adobe's wandering eye might have been drawn by Cortana, Microsoft’s speech-recognition technology. Adobe’s use of Microsoft Azure will complement, not replace, AWS and its own cloud.

  • == Rules ==. Unlikely that you'll agree with all of them, but they make for interesting reading.

  • agentultra on What I Wish Small Startups Had Known Before Implementing A Microservices Architecture: Know your data. Are you serving ~1000 requests per second peak and have room to grow? You're not going to gain much efficiency by introducing engineering complexity, latency, and failure modes. Best case scenario and your business performs better than expected... does that mean you have a theoretical upper bound in 100k rps? Still not going to gain much. There are so many well-known strategies for coping with scale that I think the main take-away here for non-Uber companies is to start up-front with some performance characteristics to design for. Set the upper bound on your response times to X ms, over-fill data in order to keep the bound on API queries to 1-2 requests, etc. Know your data and the program will reveal itself is the rule of thumb I use.

  • Only in games do you get to have features with cool names like Loot. Running Online Services at Riot: Part I. Riot now deploys with a brand new system called rCluster that leverages Docker and Software Defined Networking in a micro-service architecture. They have a hard problem: they have to be able to create clusters that can support Docker around the world, in places like North & South America, Europe, and Asia. They implemented a container scheduler called Admiral. OpenContrail is used to give each application a private network. Contrail uses GRE tunnels between compute hosts, and a gateway router to manage traffic entering and leaving the overlay tunnels and heading to the rest of the network. When the master repo is changed, a new application container is created and deployed to a QA environment. Containers now have dynamic IP addresses and are constantly spinning up and down, so they had to build a Microservices Platform to handle things like Service Discovery, Configuration Management and Monitoring.
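To make the service discovery point concrete: when container addresses change constantly, callers look up a service by name and only see instances that have checked in recently. The sketch below is a toy, in-memory illustration of that idea, not Riot's implementation; the `ServiceRegistry` class, its method names, the TTL value, and the example service name and addresses are all hypothetical.

```python
import time


class ServiceRegistry:
    """Toy in-memory registry: instances heartbeat in, stale ones age out."""

    def __init__(self, ttl_s: float = 5.0, clock=time.monotonic):
        self._ttl_s = ttl_s          # how long a heartbeat stays valid (assumed value)
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._last_seen = {}         # (service name, address) -> time of last heartbeat

    def heartbeat(self, service: str, address: str) -> None:
        """Called periodically by each running container."""
        self._last_seen[(service, address)] = self._clock()

    def lookup(self, service: str) -> list:
        """Return addresses of instances whose heartbeat is still within the TTL."""
        now = self._clock()
        return sorted(
            address
            for (name, address), seen in self._last_seen.items()
            if name == service and now - seen <= self._ttl_s
        )


if __name__ == "__main__":
    registry = ServiceRegistry(ttl_s=5.0)
    registry.heartbeat("loot", "10.0.0.7:8080")   # hypothetical service and address
    registry.heartbeat("loot", "10.0.1.3:8080")
    print(registry.lookup("loot"))
```

A container that dies simply stops sending heartbeats and drops out of `lookup` once the TTL passes, so nobody has to deregister it explicitly.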

  • Private equity on the rise as investors chase yield by monetizing recurring revenue. Infoblox acquisition by private equity firm Vista Equity Partners for $1.6 billion. Money is cheap, so a potential 10%ish return on taking a company private is attractive, especially for companies that lack growth prospects yet have strong recurring revenue streams. Network Break 105.

  • Car as a Platform lets Tesla add cool new features like setting a max temperature so kids and pets never get overheated. Another app I'd like to sell in the Tesla App Store is one that uses internal sensors to detect potholes and report road conditions. You could get a hell of a crowdsourced view of the highway system.

  • You know what would be nice? Examples. Super awesome software used by lots of people. Or perhaps a little more humility.  @cmeik: "I learned people are doing agile programming which ignores everything learned in the 70s." - Lamport on abstraction in Liskov QA. #hlf16

  • VMs great, PaaS not so much. One year on Google Cloud - Whats great, Whats not. The good: No Reboots. Ever!; Flexible VM sizes; Extremely fast VM creation. The bad: Cloud SQL; Subpar PAAS Services; Network issues; No Connection Draining APIs; Ubuntu Repositories. 

  • mikeash: Some of the comments here make me think of crabs in a pot pulling down the ones who try to climb out. Nobody's making you participate in this venture [Making Humans a Multiplanetary Species]. If you don't like it, then you're free to go do whatever it is you do like. You might think Musk could better direct his efforts and resources elsewhere, but most other billionaires don't do anything all that interesting, they just invest their money in mundane stuff, outsource jobs, build hotels, run for President, etc. So why are you upset with this one and not all those others?

  • Why the REST don’t use WebSockets: the amount of data used within the WebSocket is considerably smaller. With a single byte character set the REST request is 233% larger (81 vs 270 bytes) and the REST response is 190% larger (63 vs 183 bytes). Does this make a difference? Well, here at GameSparks we really think so. If you are pushing hundreds of messages a minute, which is not uncommon, all of this data has to be processed by your application. We think you’d prefer your CPU cycles to be doing more important stuff, like physics or rendering.
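The percentages in the GameSparks quote follow directly from the byte counts it gives. A quick check, using only the quoted sizes; the 300 messages per minute figure is an assumed stand-in for "hundreds of messages a minute":

```python
def percent_larger(small: int, large: int) -> float:
    """How much larger `large` is than `small`, as a percentage of `small`."""
    return (large - small) / small * 100


# Byte counts quoted above: WebSocket vs REST, single-byte character set.
WS_REQUEST, REST_REQUEST = 81, 270
WS_RESPONSE, REST_RESPONSE = 63, 183

if __name__ == "__main__":
    print(f"request:  {percent_larger(WS_REQUEST, REST_REQUEST):.0f}% larger")
    print(f"response: {percent_larger(WS_RESPONSE, REST_RESPONSE):.0f}% larger")

    # Extra bytes per request/response pair when using REST instead of WebSockets.
    extra_per_pair = (REST_REQUEST - WS_REQUEST) + (REST_RESPONSE - WS_RESPONSE)
    messages_per_minute = 300  # assumption, not a figure from the article
    print(f"extra bytes per minute: {extra_per_pair * messages_per_minute}")
```

Each round trip carries 309 extra bytes over REST, which is small per message but adds up at game-client message rates.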

  • Very nice. Optimizing optimizing: some insights that led to a 400% speedup of PowerDNS. Lots of good advice like: These days, it turns out that most obvious optimisations have already been done for you, either by the compiler or the CPU...Many of us have wasted a ton of time implementing userspace multitasking and asynchronous I/O, because surely those system threads must suck, only to find that the kernel knows a hell of a lot more about CPUs and I/O than we do...get call-graph based performance profiles...The best way to speed up code is not to run it.

  • Going forward protocol designers will explicitly have to design with the goal of discouraging attacks, even if it hurts performance, constrains features, and reduces beauty. It's Darwinian out there. Ethereum Transaction spam attack: Next Steps. Today the network was attacked by a transaction spam attack that repeatedly called the EXTCODESIZE opcode (see trace sample here), thereby creating blocks that take up to ~20-60 seconds to validate due to the ~50,000 disk fetches needed to process the transaction. The result of this was a ~2-3x reduction in the rate of block creation while the attack was taking place.

  • Point of order. Cooperative multitasking is not multitasking, it's parasitism. #Reactive #Programming - The Hot new thing or are we going back to Windows 2.1-style cooperative multitasking?

  • A good read on realities behind cloud computing: One of the things we discovered rapidly, for a bursting big data analytics effort, with a sizeable on site storage (a few hundred TB, pulling back 10% of the data per month), was that the cloud models, using specifically the most aggressive pricing models available, were more expensive (on a monthly basis) … often significantly … than the fully burdened cost (power/cooling, space/building, staff, network, …) of hosting an equivalent (and often far better/faster/more productive) system in house.

  • Take a fun trip through datacenter designs past. A Rare Tour Of Microsoft’s Hyperscale Datacenters: “SDN is a very big deal,” said Bakken. “We are already doing load balancing as a service within Azure, and what I really want to be able to do is cluster-based networking, which means I want to eliminate static load balancers, switches, and routers as devices and go to a standard industry interconnect and do those services in software. This is a big thing in my environment because networking makes up about 20 percent of the capital expenses because I have to replicate those frames in multiple locations.”

  • Evaluating MySQL Parallel Replication Part 4: More Benchmarks in Production: For all environments, aggressive parallel replication can produce much better speedups than conservative parallel replication...The biggest surprise was to see speedup increasing past 80 threads. We could have thought that more threads than processing units would slow things down, but it is not the case. This is probably caused by threads being most of the time in a waiting state: either waiting for a previous transaction to commit, or waiting for an IO. 
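The explanation offered for the speedup past 80 threads, that the threads spend most of their time waiting, is easy to reproduce in miniature. The sketch below is not MySQL: `fake_transaction` just sleeps to stand in for a transaction waiting on a commit or on IO, and the task count and wait time are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transaction(wait_s: float) -> None:
    """Stand-in for a transaction that mostly waits (for a prior commit or for IO)."""
    time.sleep(wait_s)


def run_batch(workers: int, tasks: int = 32, wait_s: float = 0.02) -> float:
    """Wall-clock seconds to finish `tasks` waiting-bound jobs using `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all tasks to complete before the timer stops.
        list(pool.map(fake_transaction, [wait_s] * tasks))
    return time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 4, 32):
        print(f"{workers:>2} threads: {run_batch(workers):.3f}s")
```

Because a sleeping thread uses no CPU, the batch keeps getting faster as threads are added, even when the thread count is far above the number of cores. CPU-bound work would not behave this way.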

  • Software That Writes And Evolves Software: The solution: Model project creation as a sequence of transformations on a starting point, which is itself a normal, running project in its target technology. We call these transformations “editors.”

  • These days you just may be able to optimize a video pipeline enough to stream games from Amazon over the Internet. The Technology Behind A Low Latency Cloud Gaming Service. 60 FPS; Parsec is a high performance video streaming app; With network latencies below 20 ms, and bandwidth above 10 Mb/s, Parsec offers a near-native experience.

  • Lies, damned lies, and statistics. As any basketball player knows “Hot Hands” in Basketball Are Real: Real-life data consistently becomes less streaky when rearranged, suggesting that shooters really do run hot and cold. It may be time to put the myth of the myth of the hot hand to rest.
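The "rearranged" test described here is a shuffle (permutation) test: measure how streaky the real order is, then see how often a random reordering of the same hits and misses is at least as streaky. A minimal sketch follows; the streakiness measure (count of adjacent equal outcomes) is one simple choice rather than the one used in the cited research, and the shot sequence is made up, not real basketball data.

```python
import random
from typing import Sequence, Tuple


def count_repeats(shots: Sequence[int]) -> int:
    """Number of adjacent pairs with the same outcome; higher means streakier."""
    return sum(1 for a, b in zip(shots, shots[1:]) if a == b)


def shuffle_test(shots: Sequence[int], trials: int = 2000, seed: int = 0) -> Tuple[int, float]:
    """Return (observed repeats, fraction of shuffles at least as streaky)."""
    rng = random.Random(seed)      # seeded so the run is repeatable
    observed = count_repeats(shots)
    pool = list(shots)
    at_least_as_streaky = 0
    for _ in range(trials):
        rng.shuffle(pool)          # same hits and misses, random order
        if count_repeats(pool) >= observed:
            at_least_as_streaky += 1
    return observed, at_least_as_streaky / trials


if __name__ == "__main__":
    # Made-up sequence (1 = hit, 0 = miss), deliberately streaky.
    shots = [1] * 6 + [0] * 5 + [1] * 5 + [0] * 4
    observed, fraction = shuffle_test(shots)
    print(f"observed repeats: {observed}, shuffles at least as streaky: {fraction:.3f}")
```

A small fraction means the observed order is streakier than chance reordering would usually produce, which is the pattern the article says shows up in real data.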

  • 2DX: a set of JavaScript tools built around a thin NoSQL document store. 2DX tools include a single page application platform, graphing and dynamic math expression evaluation. It can serialize website data, including form inputs, and save it into a serialized file which, when retrieved, fully restores a user session.

  • Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation: In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder.

  • Avoiding Catastrophic Forgetting by a Dual-Network Memory Model Using a Chaotic Neural Network: In neural networks, when new patterns are learned by a network, the new information radically interferes with previously stored patterns. This drawback is called catastrophic forgetting or catastrophic interference. In this paper, we propose a biologically inspired neural network model which overcomes this problem.