Stuff The Internet Says On Scalability For December 4th, 2015

Hey, it's HighScalability time:


Change: Elliott $800,000 in 1960, 8K RAM, 2kHz CPU vs Raspberry Pi Zero, $5, 1GHz, 512MB

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.


  • 434,000: square feet in Facebook's new office; $62.5 billion: Uber's valuation; 11: DigitalOcean datacenters; $4.45 billion: Black Friday online sales; 2MPH: speed news traveled in 1500; 95: percent of world covered by mobile broadband; 86%: items Amazon delivers that weigh less than five pounds.

  • Quotable Quotes:
    • Jeremy Hsu: Is anybody thinking about how we’ll have to code differently to accommodate the jump from a 1-exaflop supercomputer to 10 exaflops? There is not enough attention being paid to this issue.
    • @kml: “Process drives away talent” - @adrianco at #yow15
    • capkutay: Seems like a lot of the momentum behind containers is driven by the Silicon Valley investment community.
    • @taotetek: IoT is turning homes into datacenters with no system administrators and no security team.
    • @asymco: On Thursday and early Friday, mobile traffic accounted for nearly 60% of all online shopping traffic, and 40% of all online sales
    • Mobile App Developers are Suffering: It’s just too saturated. The barriers to adoption and therefore monetization are too high. It’s easier on the web.
    • Taleb: It is foolish to separate risk taking from the risk management of ruin.
    • Maxime Chevalier-Boisvert:  I believe dynamic languages are here to stay. They can be very nimble, in ways that statically typed languages might never be able to match. We’re at a point in time where static typing dominates mainstream thought in the programming world, but that doesn’t mean dynamic languages are dead.
    • @__edorian: "Can i have a static linked binary?" - "No that's stupid, it's slower and takes more space!" - "Can i have a docker image?" - "Sure!"
    • @grzegorz_dyk: When I see people talking about fine grained #microservices I am thinking: why not use actors? #akka #erlang
    • Henry Miller: When you can’t create you can work.
    • @ValaAfshar: For the first time ever, online media consumption is bigger than TV consumption. 
    • @matthewfellows: I learned today that Airbus code is reviewed by hand... in raw assembly code #yow15 @dius_au
    • Rich Hickey: Programmers know the benefits of everything and the tradeoffs of nothing
    • Robin Harris: Cheap storage is changing the world. Whether it is in the cloud, on a dash cam, or embedded in an app, cheap – as in inexpensive – storage is enabling new relationships between individuals, and with culture, power, and groups.
    • @sustrik: libmill shows 1400x performance improvement in c10k scenarios. Wow! I love low-hanging fruit.
    • @jmckenty: At Scale: Bigger than what you’ve got now.
    • John Cage: My notion of how to proceed in a society to bring change is not to protest the thing that is evil, but rather to let it die its own death.
    • @b6n: preemptively blog about how you scaled to support the million users you don't have yet.
    • @joeweinman: When will the FCC start addressing app neutrality?
    • @ufried: i have this post about data scalability always open in a tab, just to remind me of some essentials once in a while 

  • Personalization is getting more personal and more useful. Personalized Nutrition: Healthy foods are unique to individuals: Israeli research teams have demonstrated that there exists a high degree of variability in the responses of different individuals to identical meals...Using their set of amassed data, the researchers then went a step further, applying a machine-learning algorithm to their cohort of 800 participants and developing an algorithm capable of predicting individualized PPGRs (postprandial (post-meal) glycemic responses). This intricate algorithm incorporates 137 features representing meal content, daily activity, blood parameters, CGM-derived features, questionnaires, and microbiome features.

  • Now that's putting concertina wire on the walled garden fence. WhatsApp is blocking links to a competing messenger app.

  • As programming is a creative act, perhaps the ultimate creative act, this advice applies to programmers too. Ira Glass: Nobody tells this to people who are beginners, I wish someone told me. All of us who do creative work, we get into it because we have good taste. But there is this gap. For the first couple years you make stuff, it’s just not that good. It’s trying to be good, it has potential, but it’s not. But your taste, the thing that got you into the game, is still killer. And your taste is why your work disappoints you. A lot of people never get past this phase, they quit. Most people I know who do interesting, creative work went through years of this. We know our work doesn’t have this special thing that we want it to have. We all go through this. And if you are just starting out or you are still in this phase, you gotta know it’s normal and the most important thing you can do is do a lot of work. Put yourself on a deadline so that every week you will finish one story. It is only by going through a volume of work that you will close that gap, and your work will be as good as your ambitions. And I took longer to figure out how to do this than anyone I’ve ever met. It’s gonna take a while. It’s normal to take a while. You’ve just gotta fight your way through.

  • So you want a revolution, what will be the cost? It’s a Trap: Emperor Palpatine’s Poison Pill: In this case study we found that the Rebel Alliance would need to prepare a bailout of at least 15%, and likely at least 20%, of GGP in order to mitigate the systemic risks and the sudden and catastrophic economic collapse. Without such funds at the ready, it is likely the Galactic economy would enter an economic depression of astronomical proportions.

  • Practical advice on How to be Successful Running Docker in Production. SalesforceIQ has over 70% of its infrastructure running in Docker, everything but persistent storage. Good candidates for containers: stateless services like web servers and API servers. Not-so-good candidates: stateful services like databases.

  • How to optimize images for faster load times. Imgix serves 1 billion images per day. Resizing an image takes ~700ms. They process images on Mac Pros using Core Image to handle graphics processing in their own datacenter. Some of the tools they use are HAProxy, Heka (logs), Prometheus, Graphite, Riemann, C, Objective-C, and Lua.

  • Riot Games continues their Docker series; this installment is Taking Control of Your Docker Image. Lots of good examples with specifics.

  • We have a long way to go. Theory of 'smart' plants may explain the evolution of global ecosystems: plants may actively behave in ways that not only benefit themselves but also determine the productivity and composition of their environs.

  • Scalability doesn't matter until it does. #1: Kobe Bryant announced his retirement through Derek Jeter's startup — and caused the site to crash. #2: The Slow, The Crashed, The Out Of Stock: #BlackFriday #Fail Twitter Report. #3: Cyber Monday: Why retailers can't keep their sites from crashing.

  • Julia is the bomb for numerical computing. Federal Reserve Bank of NY converts major economic model to Julia. Bret Victor: Here’s an opinion you might not hear much — I feel that one effective approach to addressing climate change is contributing to the development of Julia. Julia is a modern technical language, intended to replace Matlab, R, SciPy, and C++ on the scientific workbench. It’s immature right now, but it has beautiful foundations, enthusiastic users, and a lot of potential.

  • AnandTech dissects More on Apple’s A9X SoC: 147mm2@TSMC, 12 GPU Cores, No L3 Cache. It's a beast. I can't wait until my Apple Watch has one.

  • Going on Shark Tank as a marketing ploy can work. Shark Tank increased our traffic by 1000x. Here’s how we handled it. Rent Like a Champion went from a peak of 10-20 requests per second to 10-20 thousand requests per second. They use Rails with a stateless server approach. Session info is stored in the cookie so load balancing and adding servers wasn't a problem.  Server Central elastically hosts their VMs. A MariaDB cluster is used for persistence. Capistrano for deployment. nginx and Phusion Passenger as the web server. They cached on the server, using Varnish, and CloudFlare as their CDN. Images are on S3.
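
The stateless approach described above works because any server can read the session straight from the cookie. A minimal sketch of that idea follows, assuming an HMAC-signed cookie; this is not how Rails implements its cookie store, and `SECRET_KEY`, `encode_session`, and `decode_session` are hypothetical names used only for illustration.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Assumption: every app server behind the load balancer shares this key.
SECRET_KEY = b"hypothetical-shared-secret"


def encode_session(session: dict) -> str:
    """Serialize the session and append an HMAC so tampering is detectable."""
    payload = base64.urlsafe_b64encode(
        json.dumps(session, sort_keys=True).encode()
    )
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def decode_session(cookie: str) -> Optional[dict]:
    """Return the session if the signature checks out, otherwise None."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because no server keeps session state in memory, a request can land on any machine, which is what makes adding servers under a traffic spike straightforward.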

  • This is why talking about simplicity is pointless. Simplicity is a quality, not a state. What does simplicity mean? How do we agree on what is simple or not simple? How do we create simplicity? Human Error: The next speaker in line couldn't restrain himself and said: "I also love minimalism, and that's why I love Rails!". Rails is a project that makes a lot of money for a lot of people. It has fans everywhere, and it's handy for starting a project if you already know how to use it. But I would never call it minimalist. 

  • Nice example of Self-Organizing Maps with Google’s TensorFlow, with code.

  • Interesting way to look at it. Pattern: Backends For Frontends: rather than have a general-purpose API backend, instead you have one backend per user experience - or as (ex-SoundClouder) Phil Calçado called it a Backend For Frontend (BFF). Conceptually, you should think of the user-facing application as being two components - a client-side application living outside your perimeter, and a server-side component (the BFF) inside your perimeter.
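
A rough sketch of the BFF idea, assuming two hypothetical downstream services (`fetch_track`, `fetch_artist`) that stand in for real network calls. Each user experience gets its own server-side function that shapes the response for that client only.

```python
# Hypothetical downstream services; real ones would be network calls.
def fetch_track(track_id: int) -> dict:
    return {"id": track_id, "title": "Example Track", "artist_id": 7,
            "duration_ms": 215000, "waveform": [0.1, 0.4, 0.3]}


def fetch_artist(artist_id: int) -> dict:
    return {"id": artist_id, "name": "Example Artist",
            "bio": "A long biography..."}


def mobile_bff_track_view(track_id: int) -> dict:
    """BFF for a hypothetical mobile app: one small payload per screen."""
    track = fetch_track(track_id)
    artist = fetch_artist(track["artist_id"])
    return {"title": track["title"],
            "artist": artist["name"],
            "duration_s": track["duration_ms"] // 1000}


def web_bff_track_view(track_id: int) -> dict:
    """BFF for a hypothetical web app: richer payload for a larger screen."""
    track = fetch_track(track_id)
    artist = fetch_artist(track["artist_id"])
    return {"track": track, "artist": artist}
```

The downstream services stay general purpose; only the thin per-client layer changes when a client's needs change.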

  • Amazon Aurora in sysbench benchmarks: From a cost consideration (compared to provisioned IO volumes) 3000 IOPS is more cost efficient (for this particular case, but in your workload it might be different) than 2000 IOPS, in the sense that it gives more throughput per dollar.

  • Looks like a great market, too bad you need a billion dollars to create a new fab. SSDs aren’t as cheap as hard drives yet, but they’re getting there. Prices have fallen to 39 cents per GB, projected to fall to 17 cents by 2017. By comparison, HDD prices seem to have more or less bottomed out, moving from 9¢ per GB in 2012 to 6¢ per GB this year.
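
To put the quoted prices on a common footing, here is a small calculation of the implied compound annual price change. It assumes "this year" means 2015 (the date of this post), so the SSD projection spans two years and the HDD history spans three.

```python
def annual_change(start_price: float, end_price: float, years: int) -> float:
    """Compound annual rate of change between two prices."""
    return (end_price / start_price) ** (1 / years) - 1


# Prices in dollars per GB, taken from the figures quoted above.
ssd_rate = annual_change(0.39, 0.17, years=2)  # 2015 -> 2017 (projected)
hdd_rate = annual_change(0.09, 0.06, years=3)  # 2012 -> 2015

print(f"SSD: {ssd_rate:.0%} per year")
print(f"HDD: {hdd_rate:.0%} per year")
```

Under those assumptions the SSD projection works out to roughly a 34% drop per year, against roughly 13% per year for HDDs over the prior three years.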

  • I've wondered about this. Why Ball Tracking Works for Tennis and Cricket but Not Soccer or Basketball: greater processing time severely limits the system’s utility for live-broadcasts of sporting events when the virtual replay has to be available almost immediately. The accuracy of certain types of ball tracking—ballistic shots, for example—becomes easier on shorter sequences since there is less unpredictability. 

  • Great explanation of the Challenges of Memory Management on Modern NUMA Systems. It starts with a good definition of exactly what a modern NUMA system is and what it looks like. The problem: "In the near future, expect systems to have even more NUMA nodes and more complicated NUMA topologies...the performance effects of NUMA are significant and that the problem is nontrivial." The solution: "Carrefour uses hardware performance counters and hardware sampling to determine an application's memory-access patterns online with low overhead. It then uses that knowledge to apply three page-level techniques." The result: "Carrefour is able to improve performance compared with default Linux for many applications."

  • Rob Pike says Simplicity is Complicated and that Go has succeeded because it is simple. All languages seem like they are converging into one as they borrow features from each other. Since Go 1, Go has essentially been fixed. Adding features won't make Go better because it just makes Go bigger, not better. A triumvirate of philosopher-kings selected the features that went into Go so that's why Go has the right features. More features add complexity. Complexity hurts readability. And we want readability.

  • Humans aren't the only ones with bad memories. DRAM’s Damning Defects—and How They Cripple Computers: "At Los Alamos, for instance, more than 60 percent of machine outages came from hardware issues...the most common hardware problem was faulty DRAM...Between 12 percent and 45 percent of machines at Google experience at least one DRAM error per year...high temperatures don’t degrade memory as much as people had thought...a small minority of the machines caused a large majority of the errors...we found that almost 90 percent of the memory-access errors could have been prevented by sacrificing less than 1 megabyte of memory per computer." Also, Why use ECC? [A response to Atwood's post].

  • Uber shows how they go about Identifying Outages with Argos: Argos addresses the above with anomaly detection algorithms that are fully automated, embarrassingly parallel, linearly scaling, and statistically and computationally robust. Answers appear in the alert and are easily accessible in dashboards once you wake your sleeping computer. 

  • Yep. Problems With Node.JS Event Loop: Node.JS is not an optimal platform to do complex request processing where different requests might contain different amount of data, especially if we want to guarantee some kind of Service Level Agreement (SLA) that the service must be fast enough. A lot of care must be taken so that a single asynchronous callback can’t do processing for too long.
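
The same concern applies to any single-threaded event loop. As an analogy only (this is Python's asyncio, not Node.JS), the sketch below splits a long computation into chunks and yields to the loop between them so other tasks are not starved. The function names and the chunk size are illustrative assumptions.

```python
import asyncio


async def sum_squares_cooperatively(values, chunk_size=1000):
    """Process values in chunks, yielding to the event loop between chunks."""
    total = 0
    for start in range(0, len(values), chunk_size):
        for v in values[start:start + chunk_size]:
            total += v * v
        await asyncio.sleep(0)  # hand control back so other tasks can run
    return total


async def heartbeat(ticks):
    """Stand-in for other requests that need the loop to stay responsive."""
    while True:
        ticks.append(1)
        await asyncio.sleep(0)


async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    result = await sum_squares_cooperatively(list(range(10_000)))
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without the `await asyncio.sleep(0)` the heartbeat would not run until the whole computation finished, which is the failure mode the quoted article warns about.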

  • Awesome interview. Future Of Networking: Martin Casado: The network is for passing packets between two points in a complex network graph; Everything else is done in the application and networkers need to become application aware.

  • Here's how Localytics runs Vertica at Scale in AWS on r3.4xlarge memory-optimized instances backed by Amazon Elastic Block Store (EBS). One lesson: constantly run tests to determine your optimal instance type, OS version, and product version. They found they got 2x performance on smaller instances due to reduced CPU coordination bottlenecks.

  • There's more than one way to do it. Scaling Elasticsearch Across Data Centers With Kafka. Your options: Single Shared Elasticsearch Cluster Distributed Across Data Centers; Independent Elasticsearch Clusters Searchable by Tribe Nodes; Independent Elasticsearch Clusters and A Shared Kafka Cluster; Independent Elasticsearch and Kafka Clusters.

  • In a language bake-off between Java, Scala, Clojure, Node, Go, Python, who would win? Why we [Metabase] picked Clojure: we chose Clojure for JVM threads + Database drivers, the ability to ship an uberjar and the ease of expressing tree manipulations which were the core piece of complexity in our backend.

  • The most enigmatic headline of all time: How to encrypt a message in the afterglow of the big bang. I don't even care what the article is about. It just makes the mind wander to unimagined places. But if you have to know, the idea is to use cosmic microwave background radiation as a source of randomness. Kickstarter anyone?

  • This is a vendor article, but it is a good example of what deploying on DCOS looks like. Scaling ArangoDB to gigabytes per second on Mesosphere’s DCOS: we explain how an ArangoDB cluster with 640 virtual CPUs can sustain a write load of 1.1 million JSON documents per second, which amounts to approximately 1GB of data per second. It takes a single short command and a few minutes to deploy this system, running on 80 nodes, using the Mesosphere Datacenter Operating System (DCOS).
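
A quick back-of-the-envelope breakdown of the quoted figures, assuming 1GB means 10^9 bytes and that load is spread evenly across nodes and vCPUs (neither is stated in the article).

```python
DOCS_PER_SECOND = 1_100_000
BYTES_PER_SECOND = 1_000_000_000  # assumption: "1GB" = 10**9 bytes
VCPUS = 640
NODES = 80

avg_doc_bytes = BYTES_PER_SECOND / DOCS_PER_SECOND
vcpus_per_node = VCPUS // NODES
docs_per_node = DOCS_PER_SECOND / NODES
docs_per_vcpu = DOCS_PER_SECOND / VCPUS

print(f"average document size: ~{avg_doc_bytes:.0f} bytes")
print(f"vCPUs per node: {vcpus_per_node}")
print(f"documents/s per node: {docs_per_node:.0f}")
print(f"documents/s per vCPU: {docs_per_vcpu:.0f}")
```

So the headline number corresponds to documents of a bit under 1KB each, on 8-vCPU nodes each absorbing about 13,750 writes per second.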

  • Don’t build a queue on Cassandra, but if you want to here's a discussion of how to do it: PROJECT ANGELHAIR: BUILDING A QUEUE ON CASSANDRA.

  • Netflix shares a lot of helpful commands for debugging on Linux. Linux Performance Analysis in 60,000 Milliseconds. Good discussion on Hacker News with even more useful commands.

  • The Disruption Debate - What's Missing?: disruption is occurring with increasing frequency in the business world. Whether it is good or bad, it is happening and becoming increasingly widespread. Between 1965 and 2012, the topple rate increased by 40%. In 1937, at the height of the Great Depression and certainly a time of great turmoil, a company on the S&P 500 had an average lifespan of 75 years. By 2011, that lifespan had dropped to 18 years – a decline in lifespan of almost 75%. < My 2 cents is that technology is changing so fast that instincts developed on a human evolutionary scale can't deal. Humans aren't a one step backwards to go two steps forward kind of species. Look for dispassionate AIs to handle all this in the near future.

  • Avoiding system calls is age-old advice for good performance. Here's the science. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls: We present FlexSC, an implementation of exception-less system calls in the Linux kernel, and an accompanying user-mode thread package (FlexSC-Threads), binary compatible with POSIX threads, that translates legacy synchronous system calls into exception-less ones transparently to applications. We show how FlexSC improves performance of Apache by up to 116%, MySQL by up to 40%, and BIND by up to 105% while requiring no modifications to the applications.
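
The age-old advice usually shows up in user space as buffering. The sketch below is not FlexSC; it only illustrates the general idea of batching many small writes into fewer calls to the underlying layer. `CountingSink` is a hypothetical stand-in for a file descriptor that counts `write()` calls in place of real system calls.

```python
import io


class CountingSink(io.RawIOBase):
    """Stand-in for a file descriptor; counts how often write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b
        return len(b)


RECORDS = [b"record-%03d\n" % i for i in range(1000)]


def write_unbuffered(records):
    """One call to the sink per record."""
    sink = CountingSink()
    for record in records:
        sink.write(record)
    return sink


def write_buffered(records, buffer_size=64 * 1024):
    """Accumulate records in memory and hand them to the sink in bulk."""
    sink = CountingSink()
    writer = io.BufferedWriter(sink, buffer_size=buffer_size)
    for record in records:
        writer.write(record)
    writer.flush()
    return writer.detach()  # returns the sink without closing it


unbuffered = write_unbuffered(RECORDS)
buffered = write_buffered(RECORDS)
print(unbuffered.calls, buffered.calls)
```

Both paths deliver identical bytes, but the buffered path reaches the sink far less often, which is the same trade the advice has always been about.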

  • dgraph-io/dgraph: Scalable, Distributed, Low Latency, High Throughput Graph Database. DGraph's goal is to provide Google production level scale and throughput, with low enough latency to be serving real time user queries, over terabytes of structured data. DGraph supports GraphQL as query language, and responds in JSON.

  • A PROMISE THEORY APPROACH TO UNDERSTANDING RESILIENCE: Promise theory focuses on describing intended outcomes, within these systems. This forms a basis for qualitatively and quantitatively assessing partial systems, with suitably idealized approximations, based on calibrated measurement, and semantic reasoning.

  • Topics in High-Performance Messaging: Successful deployment of a messaging system requires background information that is not easily available; most of what we know, we had to learn in the school of hard knocks. To save others a knock or two, we have collected here the essential background information and commentary on some of the issues involved in successful deployments. 

  • jedisct1/libsodium: Sodium is a new, easy-to-use software library for encryption, decryption, signatures, password hashing and more. It is a portable, cross-compilable, installable, packageable fork of NaCl, with a compatible API, and an extended API to improve usability even further.

  • mercury: An RPC client/server implementation using Typhon, intended for building microservices.

  • ben-manes/caffeine: Caffeine is a high performance, near optimal caching library based on Java 8. Caffeine provides an in-memory cache using a Google Guava inspired API. The improvements draw on our experience designing Guava's cache and ConcurrentLinkedHashMap. Also, TinyLFU: A Highly Efficient Cache Admission Policy