hot links

Stuff The Internet Says On Scalability For June 14, 2013

High Scalability

14 Jun 2013 — 4 min read

Hey, it's HighScalability time:

(Steve Gibson on Security Now with a plausible analysis of the tech behind PRISM)

27 billion: WhatsApp messages per day
Quotable Quotes:
- Richard Feinman: If Bill Gates walks into a bar, on average, everybody in the bar is a millionaire.
- @giltene: Financial Programmers get paid by the CPU cycle. Web developers get paid by the developer cycle.
- @johndmitchell: “It’s the I/O, stupid.”
- @PatrickMcFadin: More people registering at #cassandra13 No worries. Adding more nodes at the reg desk.

Google does it with science. Here's a list of Excellent Papers for 2012 from Googlers and friends. Most relevant for HS readers is a wildly inspiring Spanner: Google's Globally-Distributed Database. But you'll also see the influence of extracting knowledge from data to do subtle and interesting things. On that theme is Improving Photo Search: A Step Across the Semantic Gap. Google is doing hard hard things with seemingly small paybacks, but taken as whole it's clear what is being created is a generative ecosystem built around creating and applying knowledge.

When applications move to the browser it turns out we get the same problems as we had on servers. Effectively Managing Memory at Gmail scale describes how Gmail suffered from 100% CPU usage. As usual the problem was memory: learn how the Gmail team used Chrome DevTools to identify, isolate, and fix their memory problems.

Google App Engine is also finding that scattered IDs are a good idea for performance. Spread data around to increase IO bandwidth.

Facebook uses Apache Giraph to turn weekend jobs using Hive to coffee break jobs. 20x CPU speedup and 100x elapsed time speedup. 15 hours becomes 9 minutes. Coordinate with ZooKeeper. Netty for networking. Fastutil for collections. byte[] is powerful. Being unsafe is a good a thing.

It's full of tools. Linux Performance Analysis and Tools. Brendan Gregg with an amazing 60 minute crash course on Linux performance analysis and tools. Love the diagram of the entire stack and what tools to use to inspect each part.

The first thing you learn when you try anything geo is that it's complicated and it's hard. So it's useful to see the stack Foursquare uses in its geo work: Quattroshapes: A Global Polygon Gazetteer from Foursquare: PostGIS, GDAL, Shapely, Fiona, QGIS, S2, and JTS as well as open geographic data: OSM, geonames.org, US Census’ TIGER, Canada’s geogratis, Mexico’s INEGI and EuroGeoGraphics.

Greg Ferro has a very good Introduction to How Overlay Networking and Tunnel Fabrics Work. It's tunnels all the way down.

Please conserve, mobile bandwidth is a precious commodity. Why We Need Responsive Images: if you served appropriately sized images on the original responsive demo site, you could shave 78% off the weight of those images (about 162kB) on small screens.

It's an App Native world. The mobile-first web: The number of mobile web users is already at 1.5 billion, which happens to be quite close to the total number of Internet users back in 2009. In 2015 there will be an estimated 2 billion smartphone users which is quite close to the total number of Internet users currently.

A surprising source of latency. Multiple data center performance. Writing to a replica in another datacenter increases latency. HA has a price.

Handling huge event streams by staying within the flow: Beyond Scaling: Real-TimeEvent Processing with Stream Mining. Stream mining algorithms answer “stream queries” with finite resources usign approximate results.
Videos for the 2013 Content Delivery Summit are available. CDNs are so big a part of modern web sites it's good to learn all you can. Dan Rayburn also has a good take on Amazon Gets More Competitive With Announcement Of Custom SSL Support For Static & Dynamic Content Delivery.
Kale stack: Skyline and Oculus. Skyline detects anomalous metrics. Oculus is searched to see if any other metrics look similar. At that point, we can make an informed diagnosis and hopefully fix the problem.
Interesting Facebook post on: Under the Hood: The entities graph. Hard to imagine that as late as "2010, nearly all profile sections on Facebook were stored as plain text." Google may have a knowledge graph, but Facebook has a people graph. Which graph wins? Also, an infographic on Facebook Behind the Numbers.
Werner Vogels goes back to basics on auction bidding strategies. I think I'll just one click that.
Could be fun. Ian Bogost announces Object Lessons: the first book, a surprising historical take on the remote control by Caetlin Benson-Allott.
Apple WWDC 2013 session videos are now available. Login required, of course.
RedHat is putting more PaaS competition into the ring with OpenShift. Steven Citron-Pousty in Scaling in Action on OpenShift shows how it works. The ideas are familiar and the execution looks straightforward. Scaling at the data tier is not yet supported. Just an option to consider.
Combining distributed with search is a tricky business. Yokozuna: Scaling Solr With Riak by Ryan Zezeski. Nice slide deck on a complicated topic.
Release Engineering as a Force Multiplier. Mozilla explains: how the team added support for concurrent development, rethought continuous integration and increased capacity by moving to a hybrid-cloud build infrastructure. These changes improved several aspects of the business, including switching to a rapid release model and reducing turnaround time on a release from weeks to hours. As a result, Mozilla improved its abilities against much bigger and better funded competitors in the marketplace while also allowing them to enter new markets and help ensure its long-term success.
Hopscotch Hashing: We present a new resizable sequential and concurrent hash map algorithm directed at both uniprocessor and multicore machines. The algorithm is based on a novel hopscotch multi-phased probing and displacement technique that has the ﬂavors of chaining, cuckoo hashing, and linear probing, all put together, yet avoids the limitations and over heads of these former approaches.
Fast Concurrent Queues for x86 Processors: show how to rely on fetch-and-add (F&A), a less powerful primitive that is available on x86 processors, to construct a nonblocking (lock-free) linearizable concurrent FIFO queue.
Drinking from the Firehouse: How the Mill CPU Decodes 30+ Instructions per Cycle.
MegaPipe: A New Programming Interface for Scalable Network I/O: a new API for efﬁcient, scalable network I/O for message-oriented workloads. The design of MegaPipe centers around the abstraction of a channel a per-core, bidirectional pipe between the kernel and user space, used to exchange both I/O requests and event notiﬁcations. On top of the channel abstraction, we introduce three key concepts of MegaPipe: partitioning, lightweight socket (lwsocket), and batching.
Optimal Queue-Size Scaling in Switched Networks: we provide a new class of online scheduling policies that achieve optimal average queue-size scaling for a class of switched networks including input-queued switches.

Stuff The Internet Says On Scalability For June 14, 2013

High Scalability

Read more

Kafka 101

Capturing A Billion Emo(j)i-ons

Brief History of Scaling Uber

Behind AWS S3’s Massive Scale