Stuff The Internet Says On Scalability For October 14th, 2016

Hey, it's HighScalability time:

A pattern from the collective unconscious of the universe. Scott Kelly's brilliant Year in Space Photos.

If you like this sort of Stuff then please support me on Patreon.

  • $1.5 million: new iOS hack bug bounty; 120 Terabits per second: Google and Facebook's submarine cable between Los Angeles and Hong Kong; 142,000: IT jobs lost last month; $17 billion: cost of recall to Samsung; $4.1 Billion: IRS detected identity theft tax fraud; 1956: first mention of P vs NP by Kurt Gödel to John von Neumann; 1 million HTTP requests per second: DDoS attacks coming from IoT cameras; 90 petaflops: capacity of volunteer computing; 500 msec: time it takes the brain to integrate all sensory data into consciousness;

  • Quotable Quotes:
    • @GreatDismal: Silicon Valley fantasy that our universe is a simulation is actually the fantasy that our universe is a *successful startup*
    • @gblache: Being POTUS must be like inheriting a 240 year old code base and being asked to fix it in 4 years while half your team tries to sandbag you.
    • chrissnell: I'm a huge believer in colocation/on-prem in the post-Kubernetes era. I manage technical operations at a SaaS company and we migrated out of public cloud and into our own private, dedicated gear almost two years ago. Kubernetes and--especially--CoreOS has been a game changer for us.
    • @BenedictEvans: You spend 50-100x more on your smartphone than Google or FB make from you in ad revenue. They pay for their clouds out of that ad revenue
    • @kevinmarks: #NextEconomy Urs Hölzle: training a large model is super computationally intensive - trillions of flops
    • Tim O'Reilly: we see huge amounts of capital sitting on the sidelines rather than being part of a city - how do we fix this?
    • old-gregg: When I was at Rackspace, I was trying to analyze the top reasons our startup customers would stop using some of our SaaS offerings. The most common one, unsurprisingly, was they'd run out of business. But another top one was "they got successful". As they got bigger and more successful (can't mention names) they'd bring more and more in-house, eventually getting to a point that the only products they were interested in were just servers and bandwidth.
    • Joel Spolsky: But developers don’t want to overhear conversations. That’s ideal for a trading floor, but developers need to concentrate
    • Werner Vogels: Fast Data is an emerging industry term for information that is arriving at high volume and incredible rates, faster than traditional databases can manage. 
    • mattmanser: Honestly mate, you're just talking about the same old, same old. Every framework is about componentization and encapsulation. You could take React out of your post and replace it with any framework name in the last 40 years and it would have made 'sense' at the time.
    • @danielbryantuk: "Traditional software dev was like farming. You bought your tool stack and got busy. Now we're more like foragers" @monkchips #jaxlondon
    • Prashant Deva: RethinkDB is a classic story of good engineers doing only 'cool' things, not understanding their business, and ignoring all the 'boring' things that actually make a business tick.
    • Ada Lovelace Day: Lovelace came up with a method for the Analytical Engine to repeat a series of instructions: the first documented loop in computing
    • Greg Sanders: Let's stop talking about the block size. Let's talk about weight, the weight of a transaction, the weight of a block, the externalities it puts on the system. Let's talk about throughput. We can put more information in small spaces, so let's look at these problems
    • James Ryan: A major hold-up has been memory issues. GTA can’t even keep a car in memory after it’s left the player’s field of view, so there’s been no room at all for maintaining something resembling a character’s inner world.
    • yummyfajitas: Paraphrasing this to data science: "Everybody wants to have software provide them insights from data, but no one wants to learn any math."
    • @hunterwalk: "YouTube has a 46% share [of online video market], MySpace has 23% & Google Video has 10%." @nytimes 10/9/06  Happy 10th anniversary YT acq
    • @datawireio: "Microservices should not be used if the organization isn't embracing DevOps principles" http://d6e.co/2dxp0vr  by @danielbryantuk
    • delinka: I'm a bit older than the author. Every time I feel like I'm "out of touch" with the hip new thing, I take a weekend to look into it. I tend to discover that the core principles are the same, this time someone has added another C to MVC; or they put their spin on an API for doing X; or you can tell they didn't learn from the previous solution and this new one misses the mark, but it'll be three years before anyone notices (because those with experience probably aren't touching it yet, and those without experience will discover the shortcomings in time.)
    • sonnytron: But that's never good enough for douche bags that have a Foosball table in the office. They want you to give up your lunch and your evenings and play foosball with them. And crush it bro. And kill it bro.
    • @tupshin: @cmeik at scale (for various axes of scale, such as geographic-induced latency) a totally ordered system is impractical due to ux concerns
    • Victor J. Blue: When we’re addicted to online life, every moment is fun and diverting, but the whole thing is profoundly unsatisfying.
    • Richard Evans: I looked through the code and it turned out that much much earlier in the game I’d been rude to a servant during dinner, and the servant had gone into the kitchen and told the people there what a jerk I’d been – one of those people was the doctor. He remembered that. This took me quite a long time to debug. This is an example of how emergence is exciting but it opens up questions about game design.

  • This is the old: We had a post about whether you need maths to program. My answer: You need this kind [discrete math]. This is the new: Foundations of Data Science: we have written this book to cover the theory likely to be useful in the next 40 years, just as an understanding of automata theory, algorithms and related topics gave students an advantage in the last 40 years. One of the major changes is the switch from discrete mathematics to more of an emphasis on probability, statistics, and numerical methods.

  • Unlocking Horizontal Scalability in Our Web Serving Tier. Using MySQL on AWS RDS, Airbnb ran into C10K problems (connection limitations) that manifested as query latency increases, increased requests queues, and error rate spikes. So they added a connection pooling feature to MaxScale, a database proxy that supports intelligent query routing in between client applications and a set of backend MySQL servers. To neutralize the extra network hop introduced by the proxy they implemented availability zone aware request routing in SmartStack. Result: we were able to scale the application server tier with the addition of more servers without an increase in MySQL server threads. More than 15 Airbnb MaxScale database proxy services are in production.

  • Videos from #AnsibleFest San Francisco 2016 are available.

  • Ben Thompson~ One of the things that's foiled the understanding of technology is the assumption that the only means of integration is OS and hardware. The truth is integration can happen anywhere on the value chain. Where you gain profits is by owning a particular spot on that value chain where you are able to integrate and everyone else modularizes around you and they are all made into commodities and you have that choke point and you profit from that. Apple does this with the OS. They make money on the hardware but the differentiating factor is the software. By fusing those two together they are a very profitable business. Google has the best services that are manifesting in the form of Google Assistant. If you want Google Assistant it's only available on Pixel which costs money. They are fusing those together, that's their point of integration and that's how they hope to make money. Samsung's point of integration is very different. Samsung integrates far lower down the stack. Samsung makes a huge number of parts for their own phone. They are integrated in the actual manufacturing of the phone and then they are good and skilled and spend hundreds of millions of dollars and have relationships with every single phone seller and carrier in the world. They have the distribution. Samsung has succeeded in a very different way that's not appreciated by a lot of observers. They look at the technology and look at the software and think it's not so great. They don't understand there's all these other parts of the value chain that Samsung is very very good at. And that's how they've taken over Android and become the dominant player.

  • O'Reilly has some Free Programming Ebooks.

  • How do you handle distributed transactions with microservices? Derpscientist: The general pattern I've seen flawlessly handle a lot of throughput is that behind each (at least once delivery) message queue you have a queue poller that checks the idempotency key then orchestrates all of the service calls. If all of the calls succeed then the message is considered processed to the queue poller otherwise it will retry N times. If each service is idempotent then partial failures are not bothersome.

  • The Sal Khan: Let's teach for mastery -- not test scores approach makes a lot of sense for programming. Learn something completely before moving on to the next skill. Does it make sense to learn while loops after you only scored 70% on your "if-then-else" skill test? No. Knowledge builds. And programming is the perfect "master concepts at your own pace" kind of subject.

  • Is it worth building anything latency sensitive on top of Lambda? There's a benchmark for that. API Gateway vs Lambda vs Bare EC2. Bare EC2: 15ms mean response time. API Gateway -> EC2: 213ms mean response time. API Gateway -> Lambda: 276ms mean response time. Though perhaps taking the mean isn't the best way to compare. I'd also be interested in the direct to Lambda numbers. You don't need to go through API Gateway to use Lambda.

  • Great article on How Twitch Uses PostgreSQL. Of their 125 database servers ~117 run on PostgreSQL. Their original cluster handles 300,000 transactions per second. They moved from their own datacenter over to AWS, but still use their own specialized infrastructure to keep the system up and working. They also use Redis for cache and disk persisted key-value data. DynamoDB tables for high write load data use cases. S3 for event streams. Redshift for data analytics. OLTP runs on PostgreSQL.

  • Swift may be production ready. Linux (Ubuntu) Benchmarks for Server Side Swift vs Node.js. Node doesn't compare well, but it's a benchmark, so you know. 

  • Want to learn more about Azure? Here's a great resource: The World’s Most Complete Review of the Azure Services and the Portal.

  • Just another version of he who yells loudest and most often in a room wins. Seems humans and their algorithms share a not so smart frequentist bias. Two examples. Facebook has repeatedly trended fake news since firing its human editors and How Trump supporters got images of Bill Clinton to show when you Google 'rapist'. Ignore the politics if you can, but it's just this easy...On the reddit forum for Trump backers, the effort to impact Google search results appears to have begun when a user posted an image of Mr Clinton with the caption: “Rapist. When people search ‘rapist’ we want this image to be the first thing they see.”

  • PQ Show 94: The State Of Open Compute Networking. The story of how people can work together towards a common purpose without force or coercion. Just simple self-interest at work. Work together to create practical designs that solve real problems and let manufacturers compete to make them for you. It's almost enough to renew your faith in humanity. 

  • Good example of making the transition to real-time from batch. NetFlash: Tracking Dropbox network traffic in real-time with Elasticsearch. Dropbox went from a slow Hive/Hadoop/HiveSQL network monitoring solution to using NetFlow + Kafka + Zookeeper + Elasticsearch + Kibana to monitor 260 Billion NetFlow datagram records every day, that’s terabytes of aggregated data, in real-time.

  • Is this why we dream? Asynchronous methods for deep reinforcement learning: This explains the superlinear speed-up in training time required to reach a given level of skill: the more games are being explored in parallel, the better the training input to the network. I really like this idea that the very nature of doing things in parallel opens up the possibility to use a fundamentally different approach. I don’t think that insight would naturally occur to me, and it makes me wonder if there are other scenarios where it might also apply.

  • The brain is the original big data system.

  • Graph-powered Machine Learning at Google: At its core, Expander’s platform combines semi-supervised machine learning with large-scale graph-based learning by building a multi-graph representation of the data with nodes corresponding to objects or concepts and edges connecting concepts that share similarities. The graph typically contains both labeled data (nodes associated with a known output category or label) and unlabeled data (nodes for which no labels were provided). Expander’s framework then performs semi-supervised learning to label all nodes jointly by propagating label information across the graph. 

  • It looks like everyone that isn't Intel is getting on a faster machine learning bus. OpenCAPI Unveiled: AMD, IBM, Google, Xilinx, Micron and Mellanox Join Forces in the Heterogenous Computing Era: "Google, AMD, Xilinx, Micron and Mellanox have joined forces with IBM to create a "coherent high performance bus interface" based on a new bus standard called "Open Coherent Accelerator Processor Interface" (OpenCAPI). Capable of a 25Gbits per second per lane data rate, OpenCAPI outperforms the current PCIe specification, which offers a maximum data transfer rate of 16Gbits per second for a PCIe 3.0 lane. We assume that the total bandwidth will be a lot higher for quite a few OpenCAPI devices, as OpenCAPI lanes will be bundled together." Peruse the comment section too, there's a good discussion vis-a-vis PCIe and datacenter vs consumer markets.

  • MonitoringScape Live with Adrian Cockcroft - Monitoring Challenges: monitoring isn’t a solved problem after twenty plus years honing our craft. Spoiler-alert: it’s difficult to solve a problem that keeps changing...New architectures, tools, networks, and clouds require us to rethink what it is we’re monitoring, how we collect data, and what to do with it once we have it...The thesis underlying many of Adrian’s key points: monitoring must be cheaper than the thing being monitored...Adrian summarized the state of our industry with his wonderfully cynical “Tragic Quadrant” of vendors attempting to both scale horizontally - as the number of nodes to monitor increases - and vertically - as the pace of change within those nodes increases.

  • Understanding caching in Postgres - An in-depth guide: Postgres is a process based system, i.e. each connection has a native OS process...Postgres as a cross platform database, relies heavily on the operating system for its caching...WAL is a redo log that basically keeps track of whatever that is happening to the system...It is always better to monitor something directly from postgres, rather than going through the OS route.

  • Here's the way malloc implementations generally work. It's much more complicated and nuanced than you might think.

  • Nicely done, especially the comparison of levels available in different databases. A Quick Primer on Isolation Levels and Dirty Reads. Also, Databases 101: ACID, MVCC vs Locks, Transaction Isolation Levels, and Concurrency.

  • Free, censorship resistant trade done without any middlepeople. That's OpenBazaar. Nice tight description in OpenBazaar with Sam Patterson. I trust Sam. He has a big beard. Bitcoin is the currency of choice. There's a multi-signature escrow feature so two out of three people have to agree before the money flows. Used by thousands of people all over the world hosting about 8,000 items. The network is completely decentralized and peer-to-peer, so getting statistics is difficult. It's completely free Open Source, though companies are building on top of it to make money by offering for pay services in a competitive market. NAT traversal issues mandated using UDP so it doesn't work with Tor. New version will be based on IPFS to support a more distributed web. Data can be distributed across other OpenBazaar users so store data will be resilient to failure. Transitioning from Python to Go. Also, Top 5 Things People Are Buying on OpenBazaar: Food, Cigarettes, Seeds, Comic Books, Clothing & Accessories.

  • Hybrid computing using a neural network with dynamic external memory. nl helps make sense of what this means: This is probably the most important research direction in modern neural network research. Neural networks are great at pattern recognition. Things like LSTMs allow pattern recognition through time, so they can develop "memories". This is useful in things like understanding text (the meaning of one word often depends on the previous few words). But how can a neural network know "facts"? Humans have things like books, or the ability to ask others for things they don't know. How would we build something analogous to that for neural network-powered "AIs"? There's been a strand of research mostly coming out of Jason Weston's Memory Networks research[1]. This extends on that by using a new form of memory, and shows how it can perform at some pretty difficult tasks. These included graph tasks like London underground traversal. One good quote showing how well it works: In this case, the best LSTM network we found in an extensive hyper-parameter search failed to complete the first level of its training curriculum of even the easiest task (traversal), reaching an average of only 37% accuracy after almost two million training examples; DNCs reached an average of 98.8% accuracy on the final lesson of the same curriculum after around one million training examples.

  • Some odd parallels to cloud adoption. Evolution and Spread of the Most Cooperative and Invasive Species: Us: Dr. Marean argues for another food-related milestone: the turn toward foraging dense and predictable food resources. This shift in behavior led to elevated levels of group territoriality and conflict, which may have provided the ideal conditions for the evolution of the hyper-cooperative behaviors unique to modern humans.

  • Illustrated Guide to Monitoring and Tuning the Linux Networking Stack: Receiving Data. Nice series of diagrams aimed to help readers form a more clear picture of how the Linux network stack works.
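
Back to the distributed transactions question above: the queue poller pattern Derpscientist describes can be sketched in a few lines of Python. This is a minimal in-memory sketch under stated assumptions, not anyone's production code. The names IdempotentPoller, LedgerService, FlakyService, and the idempotency_key message field are all made up for illustration, and a real system would keep processed keys in durable storage instead of a set.

```python
class IdempotentPoller:
    """Handles messages from an at-least-once queue (hypothetical sketch)."""

    def __init__(self, services, max_retries=3):
        self.services = services        # callables taking (key, payload)
        self.max_retries = max_retries  # the "N" in "retry N times"
        self.processed = set()          # idempotency keys already completed
        self.dead_letters = []          # messages that exhausted their retries

    def handle(self, message):
        key = message["idempotency_key"]
        if key in self.processed:
            return "duplicate"          # redelivery of a finished message
        for _ in range(self.max_retries):
            try:
                # Orchestrate every service call; any failure restarts the
                # whole sequence, which is safe because calls are idempotent.
                for service in self.services:
                    service(key, message["payload"])
            except Exception:
                continue
            self.processed.add(key)
            return "processed"
        self.dead_letters.append(message)
        return "failed"


class LedgerService:
    """Idempotent service: applying the same key twice is a no-op."""

    def __init__(self):
        self.applied = {}

    def __call__(self, key, payload):
        self.applied.setdefault(key, payload)


class FlakyService(LedgerService):
    """Same as LedgerService, but fails the first `failures` calls."""

    def __init__(self, failures):
        super().__init__()
        self.failures = failures
        self.calls = 0

    def __call__(self, key, payload):
        self.calls += 1
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("transient failure")
        super().__call__(key, payload)
```

With a LedgerService followed by a FlakyService that fails once, the first attempt leaves a partial result in the ledger, the retry replays both calls, and the ledger still ends up with exactly one entry per key. That is the "partial failures are not bothersome" part.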


  • stackimpact: Monitor, optimize and troubleshoot production Go applications

  • open-guides/og-aws: Amazon Web Services — a practical guide.

  • bloomreach/bloomgateway: a lightweight entry point service which helps in controlling edge service requirements in distributed environment. It is designed to provide high availability, high performance and low latency for mediating service requests.

  • runqlat: it summarizes run queue latency as a histogram. It's measuring the time from when a thread becomes runnable (eg, receives an interrupt, prompting it to process more work), to when it actually begins running on a CPU.

  • tidwall/summitdb: In-memory NoSQL database with ACID transactions, Raft consensus, and Redis API.
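
For a feel of what the runqlat entry above is describing, here is a rough Python sketch of just the histogram part. The real tool traces scheduler events in the kernel. This sketch instead assumes you already have (runnable, running) timestamp pairs in integer microseconds and buckets the deltas by powers of two. All function names here are hypothetical.

```python
from collections import Counter


def log2_bucket(usecs):
    """Power-of-two bucket index: 0..1 -> 0, 2..3 -> 1, 4..7 -> 2, ..."""
    return max(usecs, 1).bit_length() - 1


def run_queue_histogram(events):
    """events: iterable of (runnable_at_us, running_at_us) integer pairs."""
    hist = Counter()
    for runnable_at, running_at in events:
        # Run queue latency is the wait between becoming runnable
        # and actually getting onto a CPU.
        hist[log2_bucket(running_at - runnable_at)] += 1
    return hist


def format_histogram(hist):
    """Render one text row per bucket, including empty buckets in between."""
    if not hist:
        return []
    rows = []
    for bucket in range(max(hist) + 1):
        low = 0 if bucket == 0 else 1 << bucket
        high = (1 << (bucket + 1)) - 1
        rows.append(f"{low:>6} -> {high:<6} : {hist.get(bucket, 0)}")
    return rows
```

Power-of-two buckets keep the output short even when latencies range from a microsecond to many milliseconds, which is why a long tail shows up clearly as a few counts in the high rows.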