Stuff The Internet Says On Scalability For February 20th, 2015

Hey, it's HighScalability time:


Networks are everywhere, they can even help reveal disease connections.

  • trillions: number of photons constantly hitting your eyes; $19 billion: Snapchat valuation;  8.5K: average number of questions asked on Stack Overflow per day
  • Quotable Quotes:
    • @BenedictEvans: End of 2014: 3.75-4bn mobiles ~1.5bn PCs  7-800m consumer PCs 1.2-1.3bn closed Android 4-500m open Android 650-675m iOS 80m Macs, ~75m Linux
    • @JeremiahLee: “Humans only use 10% of their internet.” —@nvcexploder #NodeSummit
    • beguiledfoil: Javavu - The feeling experienced when you see new languages make the same mistakes Java made 20 years ago and momentarily mistake said language for Java.
    • @ewolff: If Conway's Law is so important - are #Microservices more an organizational approach than an architecture?
    • @KentLangley: "Apache Spark Continues to Spread Beyond Hadoop." I would say supplant. 
    • Database Soup: An in-memory database is one which lacks the capability of spilling to disk.
    • Matthew Dillon: 1-2 year SSD wear on build boxes has been minimal.
    • @gwenshap: Except there is one writer and many readers - so schema and validation must be done on ingest. Anywhere else is just shifting responsibility
    • @jaykreps: Startup style of engineering (fail fast & iterate) doesn't work for every domain, esp. databases & financial systems
    • Taulant Ramabaja: Decentralization is not a goal in and of itself, it is a strategy
    • Eli Reisman: Etsy runs more Hadoop jobs by 7am than most companies do all day.
    • Dormando: We're [memcached] not sponsored like redis is. I've only ever lost money on this venture.
    • The Trust Engineers: There are more Facebook users than Catholics.

  • Exponent...The new integration is hardware + software + services. Not services like disk storage, but services  like HomeKit, HealthKit, Siri, Car Play, Apple Pay. Services that touch every part of our lives. Apple doesn't build cars, stores, or information services, it wraps them with an Apple layer that provides the customer with an integrated experience while taking full advantage of modularity. Modularity wrapped with integration. Owning the hardware is a better profit model than sercvices in the cloud.

  • Quite a response to You Don't Like Google's Go Because You Are Small on reddit. A vigorous 500+ comments were written. Golang isn't perfect. How disappointing, so many things are.

  • After making Linux work well on multiple cores that next bump in performance comes from Improving Linux networking performance. It's a hard job. For a 100Gb adapter on 3GHz CPU there are only about 200 CPU cycles to process each packet. Good break down of time budgets for for various instructions. The approach is improved batching at multiple layers of the stack and better memory management, which leads directly into Toward a more efficient slab allocator.

  • The process behind creating a Google Doodle for Alessandro Volta’s 270th Birthday reminds me a lot of the process of making old style illustrations as described in Cartographies of Time: A History of the Timeline. The idea is to encode symbolically as much of the important information as possible in a single diagram. The coded icon of a tiny skull could mean, for example, a king died while on the throne. A single flame could stand for the fall of man. This art is not completely lost with today's need to convey a lot of information on small screens. This sort of compression has advantages: Strass believed that a graphic representation of history held manifold advantages over a textual one: it revealed order, scale, and synchronism simply and without the trouble of memorization and calculation.

  • I wonder if there's a similar impact of viri on computer systems? Parasitism alters three power laws of scaling in a metazoan community: Taylor’s law, density-mass allometry, and variance-mass allometry.

  • What can you do in a single afternoon for $20? Quite a lot. You Don’t Need $1MM for a Distributed System. You can create some containers with Docker, set up service discovery with Consul, and setup dynamic load balancing with Nginx. Includes scripts and configuration info. 

  • Great story and lessons. Why Google Beat Inktomi: the Inside Story From Former Engineer: Inktomi didn't control the front-end. We provided results via our API to our customers. This caused latency. In contrast, Google controlled the rendering speed of their results. Inktomi didn't have snippets or caching. Our execs claimed that we didn't need caching because our crawling cycle was much shorter than Google's. Instead of snippets, we had algorithmically-generated abstracts. 

  • Derpscientist: Before sharding, I'd invest in aggressive caching, offloading the OLAP workload to elastic search and putting the requests that can be very high latency behind a batch inserting message queue. Low latency in a user (snowflake) schema is generally the highest priority item that cannot be tricked.

  • Nucleation is a deep deep pattern. All software starts from a seed and builds layer by layer. It’s Buggy Out There: To freeze at higher temperatures, water needs a seed, or ice nucleus, a tiny particle that acts as a geometric template, aligning water molecules into a highly organ­ized solid crystal.

  • Google makes the case that their cloud is cheaper than AWS. Understanding Cloud Pricing. A detailed comparison of different scenarios. Google says their solution has a cost advantage of between 38% and a 3.9% while offering greater flexibility in instance type selection.

  • Maybe photos don't rule the world. Reversal Of Facebook: Photo Posts Now Drive Lowest Organic Reach. The average video post is seen more than twice as often.

  • Walmart chose OpenStack. A significant win for OpenStack and they've already deployed on 100K cores. Walmart is overhauling their technology stack, running services on an elastic cloud. They talk about “diseconomies of scale,” where the cost per transaction actually goes up, and how a cloud architecture can be a more cost efficient way to deal with scale. They don't think OpenStack is best of breed tech, but love that it is open source. 

  • Ben Stopford with a great explanation of Log Structured Merge Trees: In log structured file systems data is written only once, directly to a journal repressed as a chronologically advancing buffer. 

  • The unstoppable rise of tape storage continues. If this seems strange, remember Google backs up on tape.

  • HTTP/2 is Done. The future is secure and fast. It also marks the end of the amateur web. What is it? Take a look at http2 explained or Ilya Grigorik's fabulous book: High Performance Browser Networking.

  • HTTP/2 is Live in Firefox: 9% of all Firefox release channel HTTP transactions are already happening over HTTP/2...For HTTP/1 74% of our active connections carry just a single transaction - persistent connections just aren't as helpful as we all want. But in HTTP/2 that number plummets to 25%. That's a huge win for overhead reduction. Let's build the web around that.

  • Doesn't this sound like what you do on a WAN connecting multiple datacenters? Smarter multicore chips: What we do is we first place the data roughly. You spread the data around in such a way that you don’t have a lot of [memory] banks overcommitted or all the data in a region of the chip. Then you figure out how to place the [computational] threads so that they’re close to the data, and then you refine the placement of the data given the placement of the threads. By doing that three-step solution, you disentangle the problem.

  • Not stuck in the 1990s, Netflix puts A Microscope on Microservices using Atlas for cloud-wide monitoring. They have an impressive ability to drill down on bottlenecks or take a look at the entire cloud, all with pretty pictures. 

  • Epic. Coding for SSDs – Part 6: A Summary – What every programmer should know about solid-state drives.

  • A very rare sighting of an Apple employee giving a technical presentation in the wild. SolrCloud at Apple: Automating and Scaling for Massive Multi-Tenancy. Good on them. It would be great if we could see Apple at more events. They must have a lot to contribute to the Great Work. And it's a good talk too.

  • Thoughts on Apache Mesos: A core challenge Mesos addresses is that of satisfying the constraints of a framework without actually knowing about them. Here’s where the sometimes misunderstood resource offer process comes into play and one way to understand this is by analogy. Mesos behaves like the parent host at a kids birthday party: you’ve got some 15 kids (==frameworks) to supply with food (== resources) and can’t possible know their inclinations (placement preferences). But you can offer them a piece of pizza or a bowl of rocket and they are free to accept it (now or later) or to reject it. Further, it might be that the dad who dropped off one of the guests told you that his youngster is a vegetarian, so there’s no point in you offering him, say, a beef burger (== filters), etc.

  • Papers are now available from Fast '15, the 13th USENIX Conference on File and Storage Technologies. You may like FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs or Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth.

  • Stream Processing and Probabilistic Methods: Data at Scale: Probabilistic solutions trade space and performance for accuracy. While a loss in precision may be a cause for concern to some, the fact is with large data streams, we have to trade something off to get real-time insight. Much like the CAP theorem, we must choose between consistency (accuracy) and availability (online). With CAP, we can typically adjust the level in which that trade-off occurs. This space/performance-accuracy exchange behaves very much the same way.

  • Million user webchat with Full Stack Flux, React, redis and PostgreSQL: Bottom line: to run a full-fledged, million-user chat server, you need 10-20 front-end processes, 1 postgreSQL server, 1 multiplexer process, and 1 redis process.

  • An interesting conversation about Memcached design and comparison with Redis with a response thread with Dormando: There's honestly nobody even asking for better performance... it's just an area of academic study for a lot of people (see one of the dozens of papers on modifying memcached). 

  • The Future is Now: Interview with Reality Sandwich: That’s the main hack: stop worrying what people think, ditch brand names, stay out of debt (you can’t even get rid of this stuff by going bankrupt) and – like Timothy Leary said – find the others. That can be the trickiest part – finding your team. It’s like finding your table in the high school cafeteria.

  • SuperIMAP. Monitor inboxes for incoming email, at scale. Nice explanation of how it works.

  • All things Scala

  • FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs:  We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing commodity SSDs with minimal performance loss. We do so by implementing a graph-processing engine on top of a user-space SSD file system designed for high IOPS and extreme parallelism. 

  • Reliable, Consistent, and Efficient Data Sync for Mobile Apps: We built Simba, a data-sync service that provides mobile app developers with a high-level local-programming abstraction unifying tabular and object data—a need common to mobile apps—and transparently handles data storage and sync in a reliable, consistent, and efficient manner.

  • RIPQ: Advanced Photo Caching on Flash for Facebook: RIPQ aggregates small random writes, co-locates similarly prioritized content, and lazily moves updated content to further reduce device overhead. We show that two families of advanced caching algorithms, Segmented-LRU and Greedy-Dual-Size-Frequency, can be easily implemented with RIPQ. Our evaluation on Facebook’s photo trace shows that these algorithms running on RIPQ increase hit ratios up to ~20% over the current FIFO system, incur low overhead, and achieve high throughput.

  • Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth: In this paper, we design erasure codes that are simultaneously optimal in terms of I/O, storage, and network bandwidth. Our design builds on top of a class of powerful practical codes, called the product-matrix-MSR codes. Evaluations show that our proposed design results in a significant reduction the number of I/Os consumed during reconstructions (a 5 reduction for typical parameters), while retaining optimality with respect to storage, reliability, and network bandwidth.

  • Offline Evaluation and Optimization for Interactive Systems: One approach to offline evaluation is to build a user model that simulates user behavior (clicks, purchases, etc.) under various contexts, and then evaluate metrics of a system with this simulator. While being straightforward and common in practice, the reliability of such model-based approaches relies heavily on how well the user model is built.