Stuff The Internet Says On Scalability For August 14th, 2015

Hey, it's HighScalability time:


Being Google CEO: Nice. Becoming Tony Stark: Priceless (Alphabet)

  • $7: WeChat's revenue per user and there are 549 million of them; 60%: Etsy users using mobile; 10: times per second a self-driving car makes a decision; 900: calories in a litre of blood, vampires have very efficient metabolisms; 5 billion: the largest feature in the universe in light years

  • Quotable Quotes:
    • @sbeam: they finally had the Enigma machine. They opened the case. A card fell out. Turing picked it up. "Damn. They included a EULA." #oraclefanfic
    • kordless: compute and storage continue to track with Moore's Law but bandwidth doesn't. I keep wondering if this isn't some sort of universal limitation on this reality that will force high decentralization.
    • @SciencePorn: If you were to remove all of the empty space from the atoms that make up every human on earth, all humans would fit into an apple.
    • @adrianco: Commodity server with 1.4TB of RAM running a mix of 16GB regular DRAM and 128GB Memory1 modules.
    • @JudithNursalim: "One of the most scalable structure in history was the Roman army. Its unit: eight guys; the number of guys that fits in a tent" - Chris Fry
    • GauntletWizard: Google RPCs are fast. The RPC trace depth of many requests is >20 in milliseconds. Google RPCs are free - Nobody budgets for intradatacenter traffic. Google RPCs are reliable - Most teams hold themselves to a SLA of 4 9s, as measured by their customers, and many see >5 as the usual state of affairs.
    • @rzidane360: I am a Java library and I will start 50 threads and allocate a billion objects  on your behalf.
    • @codinghorror: From Sandy Bridge in Jan 2011 to Skylake in Aug 2015, x86 CPU perf increased ~25%. Same time for ARM mobile CPUs: ~800%.
    • @raistolo: "The cloud is not a cloud at all, it's a limited number of companies that have control over a large part of the Internet" @granick
    • Benedict Evans: since 1999 there are now roughly 10x more people online, US online revenues from ecommerce and advertising have risen 15x, and the cost of creating software companies has fallen by roughly 10x. 
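To put GauntletWizard's "4 9s" in perspective, here's a back-of-the-envelope sketch that converts a count of nines into downtime per year, and shows what happens when a request needs a chain of such calls to all succeed. The function names are made up for illustration, and the chain calculation assumes failures are independent and every call is on the critical path, which real RPC traces may not satisfy.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def availability(nines: int) -> float:
    """Availability as a fraction, e.g. 4 nines -> 0.9999."""
    return 1.0 - 10.0 ** (-nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Minutes per year a service may be down and still meet its SLA."""
    return (1.0 - availability(nines)) * MINUTES_PER_YEAR


def chain_availability(nines: int, depth: int) -> float:
    """Availability of a request needing `depth` independent serial calls.

    Assumption: failures are independent and every call must succeed.
    """
    return availability(nines) ** depth


if __name__ == "__main__":
    print(f"4 nines: {downtime_minutes_per_year(4):.2f} min/year")
    print(f"5 nines: {downtime_minutes_per_year(5):.2f} min/year")
    print(f"20 serial calls at 4 nines: {chain_availability(4, 20):.6f}")
```

Under those assumptions, 4 nines allows roughly 53 minutes of downtime a year, and a depth-20 chain of 4-nines services lands a bit under 3 nines end to end, which is one reason teams aiming well past their SLA matters.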

  • App constellations aren't working. Is this another idea the West will borrow from the East? When One App Rules Them All: The Case of WeChat and Mobile in China: Chinese apps tend to combine as many features as possible into one application. This is in stark contrast to Western apps, which lean towards “app constellations”.

  • It doesn't get much more direct than this. Labellio: Scalable Cloud Architecture for Efficient Multi-GPU Deep Learning: The Labellio architecture is based on the modern distributed computing architectural concept of microservices, with some modification to achieve maximal utilization of GPU resources. At the core of Labellio is a messaging bus for deep learning training and classification tasks, which launches GPU cloud instances on demand. Running behind the web interface and API layer are a number of components including data collectors, dataset builders, model trainer controllers, metadata databases, image preprocessors, online classifiers and GPU-based trainers and predictors. These components all run inside docker containers. Each component communicates with the others mainly through the messaging bus to maximize the computing resources of CPU, GPU and network, and share data using object storage as well as RDBMS.

  • How might your application architecture change using Lambda? Here's a nice example of Building Scalable and Responsive Big Data Interfaces with AWS Lambda. A traditional master-slave or job server model is not used; instead, Lambda is used to connect streams or processes in a pipeline. Work is broken down into smaller, parallel operations on small chunks, with Lambda functions doing the heavy lifting. The pipeline consists of an S3 key lister, an AWS Lambda invoker/result aggregator, and a web client response handler.
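The lister / invoker / aggregator shape described above can be sketched locally without touching AWS. This is a minimal stand-in, not the linked article's code: the dictionary plays the part of an S3 bucket, a thread pool plays the part of parallel Lambda invocations, and every name here (`FAKE_BUCKET`, `list_keys`, `count_errors`, `invoke_and_aggregate`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for an S3 bucket: key -> object body.
FAKE_BUCKET = {
    "logs/part-0": "error ok ok",
    "logs/part-1": "ok error error",
    "logs/part-2": "ok ok ok",
    "other/readme": "error",
}


def list_keys(prefix):
    """Key lister: return the keys under a prefix (stand-in for an S3 listing)."""
    return sorted(k for k in FAKE_BUCKET if k.startswith(prefix))


def count_errors(key):
    """Worker: the small per-chunk function a Lambda would run."""
    return FAKE_BUCKET[key].split().count("error")


def invoke_and_aggregate(prefix, max_workers=8):
    """Invoker/aggregator: fan out one call per key, then sum the partials."""
    keys = list_keys(prefix)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_errors, keys))
    return sum(partials)


if __name__ == "__main__":
    print(invoke_and_aggregate("logs/"))
```

The point of the shape is that no worker knows about any other worker: the aggregator is the only place results meet, so adding more chunks just means more parallel invocations.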

  • The Indie Web folks have put together a really big list of Site Deaths, that is sites that have had their plugs pulled, bits blackened, dreams dashed. Take some time, look through, and say a little something for those that have gone before.

  • Xkcd provides a handy Researcher Translation table. If a researcher says a cool new tech will be available to consumers in the fourth quarter of next year, what they really mean is the project will be canceled in six months. The others seem spot on as well.

  • Advanced services are looking more like utilities all the time. Google is filling out their service portfolio with Google Cloud Dataflow and Cloud Pub/Sub. Google Cloud Pub/Sub delivered over a trillion messages to alpha and beta customers with pricing as low as 5¢ per million message operations for sustained usage.

  • Facebook shares A Large-Scale Study of Flash Memory Failures in the Field: SSD failure rates do not increase monotonically with flash chip wear; the effects of read disturbance errors are not prevalent in the field; sparse logical data layout across an SSD’s physical address space can greatly affect SSD failure rate; higher temperatures lead to higher failure rates; data written by the operating system to flash-based SSDs does not always accurately indicate the amount of wear induced on flash cells due to optimizations in the SSD controller.

  • EC2 changes over time. How has it changed this year? Amazon EC2 2015 Benchmark: Testing Speeds Between AWS EC2 and S3 Regions: Oregon is the fastest region, overtaking California’s seat from last year; Sao Paulo is the slowest region to upload from; US regions remain the fastest choice for global coverage; For regions outside the US, Tokyo and Ireland came on top; The new Frankfurt region is the slowest in Europe; speeds within the same region were the fastest.

  • Damn you science. The Universe really is slowly dying say researchers.

  • Grammarly shows how they implement Petabyte-Scale Text Processing with Spark: the net cost to get Spark running at our scale ended up being about the same as just using EMR to perform the same tasks. However, over a longer term, we do expect substantially lower overall costs. We hope that our findings will help you avoid our initial investments and help bring Spark to wider adoption in crunching really big data sets! 

  • How do you make a picture give a positive impression in only 200 bytes? That was Facebook's goal in The technology behind preview photos. Rather than download a 100KB image over a slow network, Facebook included a facsimile of the image itself in the initial network request. The photo preview was displayed in the very first drawing pass and the rest was downloaded from the CDN in the background.
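The summary above doesn't say how an image gets down to 200 bytes. One way to fit such a budget, assumed here for illustration rather than taken from the summary, is to encode every preview with identical settings so the file header is the same for all of them, ship that header once inside the app, and send only the per-image payload. The sketch below shows just that split-and-restore step on raw bytes. `SHARED_HEADER` is a made-up placeholder, not a valid JPEG header, and the function names are hypothetical.

```python
# Hypothetical header bytes that every preview shares; shipped once in the app.
# Placeholder content only: this is NOT a valid JPEG header.
SHARED_HEADER = b"\xff\xd8" + b"<fixed tables and dimensions>"


def strip_header(full_image: bytes) -> bytes:
    """Server side: drop the shared header, keep only the per-image payload."""
    if not full_image.startswith(SHARED_HEADER):
        raise ValueError("image was not encoded with the shared header")
    return full_image[len(SHARED_HEADER):]


def restore_image(payload: bytes) -> bytes:
    """Client side: prepend the locally stored header to rebuild the file."""
    return SHARED_HEADER + payload


if __name__ == "__main__":
    original = SHARED_HEADER + b"tiny-image-scan-data"
    payload = strip_header(original)
    print(len(original), "bytes on disk,", len(payload), "bytes on the wire")
    assert restore_image(payload) == original
```

The saving is a fixed number of bytes per image, which only matters because the images are so small to begin with; on a 100KB photo the header is noise.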

  • The V8 JavaScript Engine team shows how they go about Getting Garbage Collection for Free. Wonderfully detailed explanation. Expensive garbage collection operations are hidden during idle periods. That works fine for a browser, but in a server environment there are no idle periods.
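The core idea of hiding work in idle periods can be sketched as a tiny scheduler: given an idle budget, run queued garbage collection steps only while their estimated cost fits. This is a toy model under stated assumptions, not V8's API. The function name, the task names, and the cost estimates are all hypothetical.

```python
def run_idle_tasks(pending, idle_ms):
    """Run queued GC steps that fit in the idle budget.

    `pending` is a list of (name, estimated_ms) in priority order.
    Returns (ran, still_pending). Order is preserved: once a step does
    not fit, everything behind it waits for the next idle period.
    """
    ran = []
    still_pending = []
    remaining_ms = idle_ms
    for name, est_ms in pending:
        if not still_pending and est_ms <= remaining_ms:
            ran.append(name)
            remaining_ms -= est_ms
        else:
            still_pending.append((name, est_ms))
    return ran, still_pending


if __name__ == "__main__":
    # Hypothetical queue with made-up cost estimates in milliseconds.
    queue = [("scavenge", 2), ("marking-step", 5), ("final-mark", 10)]
    ran, queue = run_idle_tasks(queue, idle_ms=8)
    print("ran:", ran, "left:", queue)
```

Feed it an idle budget of zero and nothing ever runs, which is the server-side worry in a nutshell: the trick only pays off when there is idle time to spend.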

  • A great Postmortem for July 27 outage of the Manta service. Deadlock makes fools of us all.

  • Here's another great postmortem, this time from Eve Online. A strange one. One particular log channel on one particular server caused extreme server stability issues.

  • Rethinking the Internet of Things: A Scalable Approach to Connecting Everything: The key principle of the Internet of Things architecture is the segregation of networking cost and complexity to the propagator nodes, permitting much simpler components and architectures at the billions of simple end devices. These intermediate elements then bridge the gap between raw data chirps and big data meaning.

  • One Trillion Edges: Graph Processing at Facebook-Scale: In this paper, we have detailed how a BSP-based, composable graph processing framework supports Facebook-scale production workloads. In particular, we have described the improvements to the Apache Giraph project that enabled us to scale to trillion edge graphs (much larger than those referenced in previous work). We described new graph processing techniques such as composable computation and superstep splitting that have allowed us to broaden the pool of potential applications. We have shared our experiences with several production applications and their performance improvements over our existing Hive/Hadoop infrastructure. < Good discussion on HackerNews
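For anyone who hasn't met the BSP model the paper builds on, here's a minimal single-process sketch: vertices exchange messages in lockstep supersteps, and each vertex only sees messages sent in the previous superstep. It labels connected components by propagating the smallest vertex id. This is a toy illustration of the model, not Giraph's API, and the function name is hypothetical.

```python
def bsp_connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    label = {v: v for v in vertices}

    # Superstep 0: every vertex sends its own label to its neighbors.
    inbox = {v: [] for v in vertices}
    for v in vertices:
        for n in neighbors[v]:
            inbox[n].append(label[v])

    # Later supersteps: a vertex that learns of a smaller label adopts it
    # and tells its neighbors. The computation halts when no messages flow.
    while any(inbox.values()):
        outbox = {v: [] for v in vertices}
        for v, messages in inbox.items():
            if messages and min(messages) < label[v]:
                label[v] = min(messages)
                for n in neighbors[v]:
                    outbox[n].append(label[v])
        inbox = outbox  # barrier: messages become visible next superstep
    return label


if __name__ == "__main__":
    print(bsp_connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```

The barrier between supersteps is what makes the model easy to distribute: within a superstep every vertex can be processed in parallel, on any machine, because nobody reads state that is being written.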

  • Learning to live with incompleteness. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing: We propose that a fundamental shift of approach is necessary to deal with these evolved requirements in modern data processing. We as a field must stop trying to groom unbounded datasets into finite pools of information that eventually become complete, and instead live and breathe under the assumption that we will never know if or when we have seen all of our data, only that new data will arrive, old data may be retracted, and the only way to make this problem tractable is via principled abstractions that allow the practitioner the choice of appropriate tradeoffs along the axes of interest: correctness, latency, and cost.
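A small sketch of what "never knowing when you've seen all your data" means in practice: events carry their own event time, arrive out of order, and each arrival refines the result for the window it belongs to instead of waiting for the window to be declared complete. This is a toy under stated assumptions, not the paper's API. It uses fixed windows and emits an updated running sum on every record, and it leaves out watermarks, triggers, and retractions entirely. The function name is hypothetical.

```python
from collections import defaultdict


def windowed_sums(events, window_size):
    """Process (event_time, value) pairs in arrival order.

    Each record is assigned to a fixed window by its event time. Every
    arrival emits (window_start, running_sum), so a late record simply
    produces a refined result for a window that was already reported.
    Returns (emitted, final_sums).
    """
    sums = defaultdict(int)
    emitted = []
    for event_time, value in events:
        window_start = (event_time // window_size) * window_size
        sums[window_start] += value
        emitted.append((window_start, sums[window_start]))
    return emitted, dict(sums)


if __name__ == "__main__":
    # The third event is late: it belongs to window 0 but arrives after
    # window 10 has already started receiving data.
    arrivals = [(1, 10), (12, 5), (3, 7), (14, 1)]
    emitted, final = windowed_sums(arrivals, window_size=10)
    print(emitted)
    print(final)
```

Emitting on every record is the lowest-latency, highest-cost end of the tradeoff the paper talks about; buffering until some notion of completeness trades latency for fewer, more settled results.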