Stuff The Internet Says On Scalability For March 31st, 2017

Hey, it's HighScalability time:

What lies beneath? Networks...of blood vessels. (Wellcome Image Awards)
If you like this sort of Stuff then please support me on Patreon.

  • 5000: node (150,000 pod) clusters in Kubernetes 1.6; 15 years: time to @spacex launch with a recycled rocket booster; 174 mbps: Internet speed in Dublin; 10 nm: Intel’s new Moore approved process; 30 minutes: to create Samsung's S8; 50 billion: of your cells replaced each day; 2 million: new red blood cells per second; 3dbm: attenuation of human body, same as a wall; 12: hours of tardis sounds; 350: pages to stop a bullet; 2: meters of DNA packed in a space .000006m wide

  • Quotable Quotes:
    • @swardley: Having met many "leaders" in technology & business, I wouldn't bet on the future survival of humanity. If anything AI might help the odds
    • Francis Pouliot: Any contentious hard fork of the Bitcoin blockchain shall be considered an alternative cryptocurrency (altcoin), regardless of the relative hashing power on the forked chain.
    • @coda: WhatsApp: 900M users, built w/ < 35 devs, using #erlang Krispy Kreme: 1004 locations, 3700 employees, original glazed is 190 #calories
    • @BenedictEvans: Still think it's interesting Instagram shifted emphasis from interests to friends. Is that a law of nature for social if you want scale?
    • @johnrobb: "each robot per thousand workers decreased employment by 6.2 workers and wages by 0.7 percent"
    • Alex Woodie: The Hadoop dream of unifying data and compute in a distributed manner has all but failed in a smoking heap of cost and complexity, according to technology experts and executives who spoke to Datanami.
    • @RichRogersIoT: "First you learn the value of abstraction, then you learn the cost of abstraction, then you are ready to engineer." - @KentBeck
    • @codemanship: Don't explain code quality to execs. Explain high cost of change. Explain slowing down of innovation. Explain longer cycle times.
    • @malwareunicorn: Bad malware pickup lines: Hey girl, I heard you like sandboxes. I would never try to escape yours ;)
    • dkhenry: The selling of data isn't the policy you need to fight. The monopoly power of ISP's is the problem you must push back on. 
    • @MaxWendkos: An SEO expert walks into a bar, bars, pub, tavern, public house, Irish pub, drinks, beer, alcohol
    • Barry Lampert: the point of Amazon isn't to offer a consumer the absolute lowest price possible; it's to offer the lowest price possible given the convenience that Amazon offers
    • Daniel Lemire: Let us make the statement precise: Most performance or memory optimizations are useless.
    • @sarahmei: People run into trouble with DRY because it doesn't tell you *what* not to repeat. People assume syntax, but it's actually concepts.
    • Dan Rayburn: China suffers from 9.2% transfer failure rate (similar to Malaysia, India and Brazil), and a high packet loss.  These two parameters have severe impact on content download time and overall performance.
    • Daniel Lemire: I submit to you that it is no accident if the StackOverflow list of top-paying programming languages is made of obscure languages. They are comparing the average of a niche against the average of a large population
    • Nate Finch: There are no microservices, everything runs in a single binary. This actually works fairly well, since Go is so highly concurrent, there’s no need to worry about any one goroutine blocking anything else.
    • euske: Watching a discussion like this really make me wonder this: what is a module, anyway? There are obviously varying aspects of modularity 
    • closeparen: For my employer, the whole point of microservices is separate deploys. When we had hundreds of engineers committing on the monolith, a bad change in one out of the few dozen commits in a given day's upgrade could require rolling the whole thing back.
    • AtticusTheGreat: I don't have much ideology behind going with microservices vs. monolith, but what we've done on some recent projects is organize our code into modules that only communicate with each other through a narrow and well defined boundary layer. If we need to split a module out into a separate service, then it isn't nearly as much work to split it out later.
    • angry_octet: The virtual memory system, with its concept of paging to disk, is obsolete in the sense that hardly anybody that does bigger-than-ram computations rely on the kernel's algorithms to manage it. The current paging system doesn't have a sensible mechanism for flash-as-core memory (10x RAM latency, e.g. DDR4 12ns for first word, so 120ns), persistent memory in general, or using SSDs as an intermediate cache for data on disk. ZFS has some SSD caching but it is not really taking advantage of the very large and very fast devices now available.
    • @mathiasverraes: Recursion in Haskell works like this: Every paper you read contains references to other papers you need to read first.
    • ortusdux: I once read the account of a student who gamed the bots sellers use to competitively price their products in order to get cheap textbooks. The person made a dummy seller account, setup a bot, and listed the books they needed. As they lowered their listing prices, the real sellers' bots auto-price matched. The student then bought all their books at ~1/10th the original price, canceled any orders they may have received, and deleted their store.
    • @jangray: #FPGA @Intel process scaling 32 nm: 7.5 million transistors/mm2 (2011); 22 nm: 15.3 MTr/mm2; 14 nm: 37.5 MTr/mm2; 10 nm: 100.8 MTr/mm2
    • @codinghorror: Blacklisting doesn't work. But whitelisting ... does.
    • @brucevanhorn2: @StackOverflow I tried pair programming once.  You'll NEVER FIND THE BODY.
    • John Hagel: If you’re a large company or want to become a large company, the one business type you might want to shed most quickly is the product innovation and commercialization business, because that business type will be increasingly vulnerable to fragmentation.
    • tannhaeuser: To this day I still haven't understood what makes microservices different from SOA in a technical sense.
    • ebiester: I'm a monolith guy* in a microservices world. Friday, I had a question asked of me that in our old system was a simple query and I could answer in 5 minutes. This question, however, was split across three separate microservices in the new system, and the information had never been captured in a convenient way in hadoop. 
    • @chrisoldwood: Each instance of Chrome and Slack now weighs in at 1,000 MB, welcome to the Gig economy..
    • @johnrobb: Note that the global sales rate of industrial robotics is linear, not exponential...
    • lobster_johnson: We've used microservices for around 6-7 years now. One thing we realized quite early was that letting each microservice store state "silos" independently was a bad idea. You run into the synchronization issue you describe. Instead, we've moved the state to a central, distributed store that everyone talks to. This allows you to do atomic transactions. Our store also handles fine-grained permissions, so your auth token decides what you're allowed to read and write.
    • dluc: Having worked mostly with mesos and k8s, I found k8s configuration superior, e.g. how one can import secrets, config files, and more importantly set up the network without address translation or port forwarding. Tooling seems OK with both, and I agree one needs to spend some time to get familiar with the CLI and nomenclature, IMHO because both are quite flexible and powerful.
    • _errata_: If you are putting your DB at the center of your app in a traditional 3-tiered app, then you are going to be in a world of pain if anything about your db changes.
    • Silota: PostgreSQL is a much better choice for your next analytics project. 
    • graphicsRat: The fact that the [software development] process does not fit the Victorian definition of engineering does not make it otherwise. The process involves requirements gathering, design, prototyping, development, testing and maintenance. Looks like engineering to me but not in the Victorian/Industrial revolution sense of the word
    • Brent Ozar: Azure SQL DB isn’t just priced by storage capacity and availability: it’s also priced by performance capacity. The more data you churn, the more cash you burn.
    • Ed Sim: Looking at both S1 filings, it’s clear that AppDynamics and Mulesoft have caught on to what Salesforce already knows – if you want to be a massive business you also need to sell professional services. As these tech companies get larger and larger, their target customer also increases in size as these vendors look to move from 6 to 7 figure deals. In order to support continued ARR growth upstream, some of the best companies successfully use professional services as a weapon and make implementation, support and training part of the sale.

  • For good WiFi you don't necessarily need one big powerful router bristling with antennas like a radiation-mutated ant. 802.eleventy what? A deep dive into why Wi-Fi kind of sucks and New Screen Savers (@20 min). You want a true mesh network (Plume). WiFi should whisper: use 5 GHz to create pools of WiFi in each room so signals don't penetrate between rooms. Lots of little access points can automatically find a path through your house. Use a wired backhaul for best performance. Raw throughput isn't the best measure. How does it perform with many people using many devices? Roaming isn't always well supported. Consider how well the system hands off devices as you walk through the house.

  • BloomCON 2017 Videos are now available. You might like Honey, I Stole Your C2 [Command-and-control] Server: A dive into attacker infrastructure.

  • AI denial is a thing. "What the hell happened?" asks futurist Amy Webb on Triangulation 291 about Treasury Secretary Steven Mnuchin's statement that he's not at all worried about robots displacing American workers, that it's not even on his radar screen, that it's 50 or 100 more years out. Key takeaway: stop infantilizing the American public. Automation is coming. Why must we politicize technology? It's a bad thing for our country. There are a whole bunch of things we are going to have to confront. Saying everything is OK isn't helping.

  • NASA Image and Video Library. It's like Google...for space. 

  • When is a rose a rose a module? Modules vs. microservices: Why not start with a modular application? You can always choose to move to microservices later. Then, instead of having to surgically untangle your monolith, you have sensible module boundaries cut out already. It's not even an exclusive choice: you can also use modules to structure microservices internally. The question then becomes, why do microservices have to be 'micro'? Great discussion on HackerNews of all manner of different architectures. 
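
One way to picture "sensible module boundaries cut out already" is a monolith whose modules only talk through a small interface. The sketch below is a hypothetical illustration, not code from the linked article: the names Inventory, InProcessInventory, and Orders are invented here, and typing.Protocol is just one possible way to express the boundary.

```python
from typing import Dict, Protocol


class Inventory(Protocol):
    """The narrow boundary: the only surface other modules may call."""

    def reserve(self, sku: str, quantity: int) -> bool: ...


class InProcessInventory:
    """Hypothetical module implementation that lives inside the monolith."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)

    def reserve(self, sku: str, quantity: int) -> bool:
        # Refuse the reservation when there is not enough stock.
        if self._stock.get(sku, 0) < quantity:
            return False
        self._stock[sku] -= quantity
        return True


class Orders:
    """Depends on the Inventory boundary, never on a concrete class."""

    def __init__(self, inventory: Inventory) -> None:
        self._inventory = inventory

    def place(self, sku: str, quantity: int) -> str:
        reserved = self._inventory.reserve(sku, quantity)
        return "confirmed" if reserved else "rejected"


if __name__ == "__main__":
    orders = Orders(InProcessInventory({"widget": 3}))
    print(orders.place("widget", 2))
    print(orders.place("widget", 2))
```

If the inventory module later needs its own deploy cycle, a client class with the same reserve method could call a remote service instead, and Orders would not need to change. That is the sense in which the choice between modules and microservices is not exclusive.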

  • Only 10% of cars need to be under algorithm control to improve traffic says New AI Algorithm Beats Even the World's Worst Traffic.

  • TodoMVC has been a Rosetta Stone for comparing web frameworks. Here's a new one rendering Hacker News as a Progressive Web App: tastejs/hacker-news-pwas. Examples are in: React, Preact, Svelte, Vue.js, Angular. Preact was fast and not horrible looking.

  • Google says programming techniques for supercomputers are not directly applicable to warehouse-scale computers. The challenge for warehouse-scale computing systems is the need to optimize for low latencies while achieving greater utilization. Attack of the Killer Microseconds [video]. It doesn't matter if you have the fastest machine if it's hard for programmers to create fast programs. Synchronous programming is preferred over async because it's easier for programmers to get right. Networks, SSDs, machine learning accelerators, all operate at microsecond levels, yet programmers don't have tools to program efficiently at this time range. Comment: it's not clear how sync programming is tied to the microsecond problem, they seem like two separate ideas tossed together. And there doesn't appear to be a solution offered. How exactly does programming look different in the microsecond world? Or is it only hardware that needs to change?

  • tdammers hates MySQL with a well reasoned passion: Transactions: While MySQL has some degree of transaction support, it is brittle; Performance: MySQL performs really well, but only in very specific situations; Character encoding. MySQL is notorious for weak charset enforcement; Constraint enforcement. Let's just say MySQL is not very good at this; MySQL's full text indexes suck; MySQL's decision to separate storage engines from the rest of the system leads to a lot of feature fragmentation.

  • The title does not lie. 3 years on Google App Engine. An Epic Review: It is a fully-managed application platform. So far, I do not know a platform which comes close to GAE's full package: log management, mail delivery, scaling, memcache, image manipulation, distributed Cron jobs, load balancing, version management, task queue, search, performance analysis, cloud debugging, content delivery network - and that is not even mentioning auxiliary services that have popped up on Google's cloud in the meantime like SQL, BigQuery, file storage... the list goes on.

  • Perhaps we need a more interesting world? 7 Algorithms That Rule the World: Fast Fourier Transform; Link Analysis; Data Compression; Dijkstra's Algorithm; RSA Algorithm; Proportional Integral Derivative Algorithm; Sorting Algorithms.

  • ORM Bankruptcy: Why We Ditched Our ORM: Code Maintainability; Do less work; Don’t hide the good stuff; the service is now more pleasant to work with, and gives our engineers an increased sense of confidence when pushing code. This is due not only to the legibility of the code, but also the increased testability. 

  • Here's how you mess with image recognition AIs for fun and profit. It turns out you can add a human-invisible map to any image and make it unclassifiable. Universal adversarial perturbations: Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasiimperceptible to the human eye.

  • This is like your parents listening to the same music you do. J.P. Morgan Set to Run First Apps in Public Cloud. Johnson & Johnson Inc will push 85% of its applications to the cloud by the end of 2018. Deutsche Bank will move 30% by 2019.

  • A lurid glimpse into the new algorithm driven world. The High-Speed Trading Behind Your Amazon Purchase: Just beneath the placid surface of a typical product page on Amazon lies an unseen world, a system where third-party vendors can sell products alongside Amazon's own goods. It's like a stock market, complete with day traders, code-slinging quants, artificial intelligence algorithms and, yes, flash crashes...The algorithm will often raise the price on items in a seller's catalog, to see if other sellers will follow suit. The goal is to maximize sales while avoiding bidding wars that can be a race to the bottom...The result, said my marshmallow merchant, is that the customer isn't always getting the absolute best price, especially compared with in-store retail....But the point of Amazon isn't to offer a consumer the absolute lowest price possible; it's to offer the lowest price possible given the convenience that Amazon offers...The most vigorous competitor to sellers on Amazon is, in many cases, Amazon itself. The merchants I interviewed say it is common for Amazon to notice a product category that does well and begin selling it as well. Amazon may even go further and develop a house-branded version of a product. While the price of a pack of Duracell AAA batteries, for instance, fluctuates from one day to the next, the price of Amazon's own brand of AAAs is stable 

  • You will be tempted, but don't use this as your password. Spaceships in Rule 110: The function of the universal machine in Rule 110 requires an infinite number of localized patterns to be embedded within an infinitely repeating background pattern. The background pattern is fourteen cells wide and repeats itself exactly every seven iterations. The pattern is 00010011011111.
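
The quoted claim about the background pattern is easy to check for yourself. This is a minimal sketch, assuming a 14-cell ring as a stand-in for the infinitely repeating background; the function names step and period are invented for this example.

```python
# Rule 110 lookup table: (left, center, right) -> next state.
# 110 in binary is 01101110, read from neighborhood 111 down to 000.
RULE_110 = {
    (1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}

BACKGROUND = "00010011011111"  # the 14-cell pattern quoted above


def step(cells):
    """Advance one generation, treating the row as a ring."""
    n = len(cells)
    return [
        RULE_110[(cells[i - 1], cells[i], cells[(i + 1) % n])]
        for i in range(n)
    ]


def period(pattern, limit=1000):
    """Generations until the ring first returns to its start, or None."""
    start = [int(c) for c in pattern]
    cells = start
    for count in range(1, limit + 1):
        cells = step(cells)
        if cells == start:
            return count
    return None  # gave up: the start state did not recur within the limit


if __name__ == "__main__":
    row = [int(c) for c in BACKGROUND]
    for _ in range(8):
        print("".join(map(str, row)))
        row = step(row)
    print("period:", period(BACKGROUND))
```

On the ring, each generation is the previous row rotated four cells to the left, so the row first lines up with itself again after seven generations, which matches the "every seven iterations" in the quote.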

  • Powering UberEATS with React Native and Uber Engineering. A mega article on building an application spanning lots of device types, especially when you need access to features of the device, like printing. If you expected the next words to be React Native then you've won the prize. They use it and they like it. Highlights: Bridging into the JavaScript layer for features such as firing analytics events also proved to be surprisingly straightforward. In hindsight, this lack of a technical barrier probably led us to rely too heavily on native libraries...eschewing iOS patterns and modules wherever possible...it is highly advantageous to minimize interaction between iOS and JavaScript and concentrate logic in the JavaScript layer... pushing updates in this manner has not completely replaced normal app releases (which are still occasionally needed for changes to the iOS or Android native code), it has reduced their frequency...Using Flow to type check allows us to verify that our state maintains its correct shape...Sagas, an alternative side effect model for Redux apps, leverage ES6 (ECMAScript 6) generator functions to provide a less complicated option.

  • What's next? Should viewers vote? NFL is going to a more centralized model of handling challenges. MMQB: Owners, coaches, GMs to be briefed on the league’s time-saving proposals, including the one that changes the game the most: refs no longer going under the hood on replay but rather watching on a sideline tablet—and NFL vice president of officiating Dean Blandino for the first time retaining final authority on all replay rulings.

  • Nate Finch with lessons learned from working 7000 hours on a 3542 file and 540,000 line Go project (Juju, article): much easier to develop and test a monolith than it would have been if it were a bunch of smaller services; package management wasn't a problem; say no to utils packages and repos; Go’s simplicity was definitely a major factor in the success, this was a boon to the project in that we could hire good developers in general, not just those who had experience in the language; I would stick with the standard library for testing; the time package is the bane of tests and testable code; once we switched to gc, there were basically zero architecture-specific bugs; strong cross platform support of the stdlib for making it so easy to write cross platform code; I haven’t found stack traces in errors to be super useful; For a huge project, Juju is very stable, I credit this to go’s pattern of using multiple returns to indicate errors. The foo, err := pattern and always always checking errors really makes for very few nil pointers being passed around. Only once or twice did I ever personally feel like I missed having generics.

  • Setting the Record Straight: containers vs. Zones vs. Jails vs. VMs: Solaris Zones, BSD Jails, and VMs are first class concepts...Containers on the other hand are not real things...A “container” is just a term people use to describe a combination of Linux namespaces and cgroups. Linux namespaces and cgroups ARE first class objects. NOT containers...Containers are not a Linux isolation primitive, they merely consume Linux primitives...You can get a sandbox level of isolation with containers...But this requires doing the work of building the Death Star from your pieces of Seccomp, AppArmor, and SELinux profiles...Containers allow for a flexibility and control that is not possible with Jails, Zones, or VMs. And THAT IS A FEATURE.

  • Another serialization victim. Optimizing Twitter Heron: we highlight the optimizations, and we show how these optimizations improved throughput by 400-500% and reduced latency by 50-60%...serialization/deserialization were observed as the limiting factors for stream manager throughput. Using simple lower level optimizations for protobuf messages such as preallocation of a memory pool, in place updates and lazy deserialization, we were able to improve Heron throughput and latency.
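
To make "lazy deserialization" concrete, here is a rough sketch of the idea rather than Heron's actual code. Heron's changes were made to its protobuf message handling; this example substitutes Python's struct and json modules, and the frame layout, LazyMessage, encode, and route are all hypothetical names made up for illustration. The point is that a router only needs the header to forward a message, so the payload stays as raw bytes until somebody asks for it.

```python
import json
import struct

# Hypothetical frame layout: 2-byte destination id, 4-byte payload length.
HEADER = struct.Struct("!HI")


def encode(dest, payload):
    """Build one frame: fixed-size header followed by a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return HEADER.pack(dest, len(body)) + body


class LazyMessage:
    """Parses the header eagerly and the payload only on first access."""

    def __init__(self, frame):
        self.dest, length = HEADER.unpack_from(frame, 0)
        # Keep a view of the body instead of decoding it now.
        self._raw = memoryview(frame)[HEADER.size:HEADER.size + length]
        self._payload = None
        self.decoded = False

    @property
    def payload(self):
        if not self.decoded:
            self._payload = json.loads(bytes(self._raw))
            self.decoded = True
        return self._payload


def route(frames):
    """Group frames by destination using only the header fields."""
    queues = {}
    for frame in frames:
        msg = LazyMessage(frame)
        queues.setdefault(msg.dest, []).append(msg)
    return queues


if __name__ == "__main__":
    frames = [encode(1, {"word": "a"}), encode(2, {"word": "b"})]
    queues = route(frames)
    print({dest: len(msgs) for dest, msgs in queues.items()})
    print(queues[2][0].payload)
```

Routing touches six header bytes per message and never parses a body, so the decoding cost is paid once, by whichever consumer finally reads the payload.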

  • @cloud_opinion with the perfect picture of the AWS value chain showing there's a toll collector waiting at every point: I denoted where all AWS will get your pennies when you build a serverless app. Look for red dollar signs - these can add up quick.

  • Look for AI services in the future to say "sorry, it's my nap time, try again later." The Purpose of Sleep? To Forget, Scientists Say: When we sleep, the scientists argued, our brains pare back the connections to lift the signal over the noise...the injected mice couldn’t narrow their memories down to the particular chamber where they had gotten the shock. Without nighttime pruning, their memories ended up fuzzy.

  • A new Luddite movement might start by sabotaging AIs. If semi-trucks and cars start crashing the brakes will be put on the rollout of self-driving vehicles.

  • Never cared for broken promises. Async does look better, if you can wait for Node 8. 6 Reasons Why JavaScript’s Async/Await Blows Promises Away. Pros: cleaner syntax, better error handling, clearer conditionals, clearer exception handling, easier to debug. Concerns: worse performance, debugger support, it's less clear async stuff is happening.

  • If ISPs can now sell personal data that means they can also sell competitive intelligence data about your company and all its employees. That's an interesting thought from Greg Ferro in Network Break 128. Who are they calling? Where are they going? Who are they talking to? You can probably infer a lot by analysing the type and amount of traffic a company is engaged in.

  • Synopsis: Traveling with a Quantum Salesman: For quantum backtracking, the team assigns a superposition of “traveled on” and “not traveled on” to each road and then simultaneously checks both. The results show that a near-quadratic speedup can be obtained when the number of roads leaving each city is small.

  • iamcicada.com: a completely decentralized application platform that delivers on the web’s original vision. Think of it as Lambda without AWS.

  • ncase/loopy: a tool for thinking in systems. 

  • ray-project/ray: Ray is a Python-based distributed execution engine. The same code can be run on a single machine to achieve efficient multiprocessing, and it can be used on a cluster for large computations.

  • Stanford One Hundred Year Study on Artificial Intelligence (AI100): Contrary to the more fantastic predictions for AI in the popular press, the Study Panel found no cause for concern that AI is an imminent threat to humankind. No machines with self-sustaining long-term goals and intent have been developed, nor are they likely to be developed in the near future. Instead, increasingly useful applications of AI, with potentially profound positive impacts on our society and economy are likely to emerge between now and 2030, the period this report considers. At the same time, many of these developments will spur disruptions in how human labor is augmented or replaced by AI, creating new challenges for the economy and society more broadly. Application design and policy decisions made in the near term are likely to have long-lasting influences on the nature and directions of such developments, making it important for AI researchers, developers, social scientists, and policymakers to balance the imperative to innovate with mechanisms to ensure that AI’s economic and social benefits are broadly shared across society. 

  • The final episode of a good series on Building a Scalable Online Game with Azure - Part 5

  • On the Design of Distributed Programming Models: We present two programming models, Lasp and Austere, each of which makes a strong tradeoff with respects to the CAP theorem. These two models outline the bounds of distributed model design: strictly AP or strictly CP. We argue that all possible distributed programming models must come from this design space, and present one practical design that allows declarative specification of consistency tradeoffs, called Spry.

  • Bizur: A Key-value Consensus Algorithm for Scalable File-systems: a consensus algorithm exposing a key-value interface. It is used by a distributed file-system that scales to 100s of servers, delivering millions of IOPS, both data and metadata, with consistent low-latency

  • The Tyranny of Qubits - Quantum Technology's Scalability Bottleneck: In this essay, I look at (bemoan) the issues surrounding simulating quantum systems in order to design quantum devices for quantum technologies. The program runs into a natural difficulty that simulating quantum systems really require a proper quantum simulator. The problem is likened to the "tyranny of numbers" that faced computer engineers in the 1960s.

  • Audience is really anyone programming for AWS. Optimizing Multiplayer Game Server Performance on AWS: This whitepaper discusses the exciting use case of running multiplayer game servers in the AWS Cloud and the optimizations that you can make to achieve the highest level of performance.