Stuff The Internet Says On Scalability For December 6th, 2013

High Scalability

06 Dec 2013 — 7 min read

Hey, it's HighScalability time:

Test your sense of scale. Is this image of something microscopic or macroscopic? Find out.

72: Intel's 72 core x86 Processor; One Trillion: number of fonts served by Google.
Quotable Quotes:
- West-Eberhard: The gene does not lead, it follows.
- @waldojaquith: To an ant, gravity is nothing, but surface tension is a powerful force. When you change scale, you play by different rules.
- Nicholas Christakis: The spread of germs is the price we pay for the spread of ideas. We assemble ourselves into networks to facilitate the flow of information but we pay a price, the spread of disease.
- James Mickens: When you debug a distributed system or an OS kernel, you do it Texas-style. You gather some mean, stoic people, people who have seen things die, and you get some primitive tools, like a compass and a rucksack and a stick that’s pointed on one end, and you walk into the wilderness and you look for trouble.
- Joe McMahon: The average small startup in Silicon Valley today – 20 or so people – is carrying about the equivalent power of all the PDP-11's sold during the 1970's in their pockets and purses.
- Ilya Grigorik: Wow... amazon.com is completely disregarding the initial TCP congestion window
- Twitter: Every problem is a scaling problem.
And so it begins. Google has opened Google Compute Engine to the masses. You can look at this by comparing features, cost, performance, etc. You can compare by ecosystem. You can compare by who is most likely to eat their young. But what is clear: developers will now be comparing.

Has Stack Overflow saved billions of dollars in programmer productivity? You can argue the amount, but there's no argument that it's bigger than a bread box and smaller than a death star.

Region boundaries continue to fall and the abstraction and power to the programmer quotient continues to rise. Amazon has made RDS read slaves configurable across regions. That's a solid HA story.

Nice approach. Tutorial Caching Story - this is an overview of basic memcached use case, and how memcached clients work. Though there's almost no character development and the middle is directionless and the twist at the end just doesn't work.
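The basic use case is usually the cache-aside pattern: check the cache, fall back to the slow source on a miss, then store the result. Here is a minimal sketch of that pattern, not the tutorial's own code. `FakeCache`, `slow_lookup`, and `get_user` are hypothetical names, and the in-process dictionary only stands in for a real memcached client.

```python
import time


class FakeCache:
    """Hypothetical in-process stand-in for a memcached client (get/set with TTL)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Treat expired entries as misses, as a real cache would.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires_at)


calls = {"db": 0}  # counts how often the slow path runs


def slow_lookup(user_id):
    """Hypothetical expensive query; a real app would hit the database here."""
    calls["db"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id):
    """Cache-aside read: try the cache, fall back to the source, then populate."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:
        value = slow_lookup(user_id)
        cache.set(key, value, ttl=60)
    return value
```

Calling `get_user` twice with the same id should hit `slow_lookup` only once; the second call is served from the cache until the TTL expires.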

When algorithms break bad: Why Did 9,000 Porny Spambots Descend on This San Diego High Schooler?

Amazing look at the algorithms used in the Linux kernel. Lots of juicy explanations and examples to learn from with links to the code. Priority sorted lists are used for mutexes, drivers, etc. Red-Black trees are used for scheduling, virtual memory management, to track file descriptors and directory entries, etc. Radix trees are used for memory management, NFS related lookups and networking related functionality. Bit arrays, which are featured in Knuth Vol. 4, are used for dealing with flags, interrupts, etc.

Strange fact of the day. Reddit has a group dedicated to cableporn. Nothing dirty. Just lots of long nicely arranged cables.

Basho videos from RICON are coming online.

Great discussion on What is the enlightenment I'm supposed to attain after studying finite automata? There are different kinds of enlightenment. There's the kind of insight enlightenment taught through years of pondering Zen koans. Bing, you are now enlightened. Now go chop wood and carry water. Then there's an enlightenment that comes from lots and lots of work and thinking. Enlightenment comes when facing challenges and the learning that comes from them. So no enlightenment from studying, but when you put yourself in a position where the knowledge is needed in practice, bing, you understand, and you chop wood and carry water with that knowing.

Here's how Facebook generates completely customized pages for 1.2 billion users. Under the Hood: Building and open-sourcing RocksDB. RocksDB is an embeddable, persistent key-value store for fast storage.

What is the performance and stability of Riak compared to similar distributed storage solutions? Benjamin Black: Riak is extremely stable. The M/R system in Riak is for low latency querying, not for batch/bulk processing. M/R is a processing paradigm, not a specific implementation. If you are looking at Riak as an alternative to batch processing with Hadoop, it's probably not the right tool for the job. If you are looking at Riak as a reliable, well-designed, distributed alternative to CouchDB or MongoDB, then it's a great choice.

Netflix is the Cloud Santa Claus, dropping software gifts into developer stockings even when it's not Xmas. First there's Zeno, an in-memory distribution framework that efficiently propagates and keeps up to date large datasets in RAM across many servers. Then there's Scryer: Netflix's Predictive Auto Scaling Engine - Part 2. Fascinating technical design of Netflix's predictive autoscaling engine. Even if you don't want to use their software out of the box these are great elements to consider: The API layer, The Data Collector, The Predictor, The Action Plan Generator, The Scaler. The prediction part works on FFT-based smoothing and linear regression with clustered data points. Allows finding patterns of problems. Follows an SDN kind of model where the controller is centralized, sitting back and observing then controlling based on intelligence when stuff happens.
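To make the two prediction ingredients concrete, here is a rough sketch of frequency-domain smoothing and a least-squares line fit. It is not Scryer's code, and it does not show how Scryer combines or clusters anything. All function names are hypothetical, and a naive O(n^2) DFT stands in for a real FFT to keep the example dependency-free.

```python
import cmath
import math


def dft(xs):
    """Naive discrete Fourier transform (a stand-in for an FFT)."""
    n = len(xs)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
        for k in range(n)
    ]


def idft(cs):
    """Inverse transform; returns the real part of each sample."""
    n = len(cs)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(cs)) / n).real
        for t in range(n)
    ]


def smooth(xs, keep):
    """Low-pass filter: keep the `keep` lowest frequencies, zero out the rest."""
    cs = dft(xs)
    n = len(cs)
    # Bins k and n-k are a conjugate pair, so keep both ends of the spectrum.
    filtered = [c if (k <= keep or k >= n - keep) else 0 for k, c in enumerate(cs)]
    return idft(filtered)


def linear_fit(ys):
    """Ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Smoothing strips high-frequency noise from a load series so the repeating pattern stands out, while the line fit gives a simple trend that can be extrapolated a few steps ahead.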

Google Compute Engine's value proposition according to Urs Hölzle: we're working hard on building a cloud that's better than what's available elsewhere, and this is an example of what to expect from GCE: superior performance and scalability, at a low price (the experiment cost $10). Stay tuned. (from a comment on Compute Engine Load Balancing hits 1 million requests per second!) < Impressive example, though some people confuse load balancing with actually serving a load. Like this comment from Mike Park: Once upon a time setting up a server was a day's adventure. Here, a few public scripts can set up 200 servers to handle 1 million requests per second. Setup to live: 7.5 freaking minutes. Also, Running a 2400 Akka Nodes Cluster on Google Compute Engine.

Brilliant story of how Walmart tracked down a memory leak in Node.js. The problem turned out to be that file descriptors weren't closed properly, a common type of failure. But the fun part was the process and tools used to discover the problem.

Complexity is non-linear. A toy Twitter solution written in a day using RoR is not Twitter. Diagnosis for Healthcare.gov: Unrealistic Technology Expectations: In particular, the project was doomed by a relatively late decision that required applicants to open an account and let the site verify their identity, residence, and income before they could browse for insurance. That meant the site would have to interface in real-time with databases maintained by the Internal Revenue Service and other agencies.

Martin Thompson on Reactive System Design: I think event-based systems are the best way to go for anything of scale. I think we sort of – we're smoking the crack cocaine of synchronous systems and people think it's easy to begin with. As soon as it gets big and complex, it becomes a mess and hard to maintain, hard to make it scale, hard to make it perform, hard to make it reliable. And the answer is really asynchronous and we're sort of driving back to that and proper event-driven system now. It's what we used to do before the web came along.

Great summaries in the Java Advent Calendar, (Part 1 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al.

I've always hated the name master/slave myself, but it's not even close to meaning client/server so that won't work. Primary/secondary seems to work. Rename "master/slave" terminology to "client/server"

Great story of Eve Online Building a Balanced Universe after a huge increase of users. Full of cool graphics and difficult problems well explained. Synackaon's Summary: In short, care not about network latency, care instead about splitting instances based on load. And re-evaluate your theories.

New features in Go 1.2 show why handing your work off to a magical execution machine may not be the best idea: 1) Goroutines are now pre-emptively scheduled 2) An increase to the default goroutine stack size should improve the performance of some programs. < These are core features that apps in the field need extensive tuning to get right. Scheduling work to give the desired cycles to the desired bits of work is the key feature of concurrency in a mixed work environment. Giving up control means you can't really give any guarantees about performance. And stack space is another one of those resources that must be carefully managed to make your particular program work to its optimum. You can't do that if you don't control it.

Making the Web Faster with HTTP 2.0. Ilya Grigorik with a very detailed look at how the web is throwing off its text-based get and display roots and evolving to become faster by addressing: binary framing, head-of-line blocking, multiplexing, compression, server push, flow control, priority. The price is complexity. These are the kind of features commonly found in client-server application protocols, only now pushed to the web. Also, Configuring & Optimizing WebSocket Compression.
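To see what binary framing and multiplexing mean in practice, here is a small sketch that packs and parses frame headers. It assumes the 9-byte header layout later standardized in RFC 7540 (24-bit length, 8-bit type, 8-bit flags, a reserved bit plus a 31-bit stream id), which may differ from the drafts current when this was written. `pack_frame` and `unpack_frame` are hypothetical helper names, not part of any library.

```python
import struct

FRAME_DATA = 0x0        # DATA frame type in RFC 7540
FLAG_END_STREAM = 0x1   # marks the last frame of a stream


def pack_frame(frame_type, flags, stream_id, payload):
    """Build one frame: 9-byte header followed by the payload."""
    length = len(payload)
    if length >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length field")
    if not 0 <= stream_id < 1 << 31:
        raise ValueError("stream id must fit in 31 bits")
    # 24-bit length = low three bytes of a big-endian 32-bit integer.
    header = struct.pack(">I", length)[1:] + struct.pack(
        ">BBI", frame_type, flags, stream_id
    )
    return header + payload


def unpack_frame(buf):
    """Parse one frame from the front of buf; return its fields and the rest."""
    length = int.from_bytes(buf[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", buf[3:9])
    stream_id &= 0x7FFFFFFF  # drop the reserved high bit
    payload = buf[9:9 + length]
    return frame_type, flags, stream_id, payload, buf[9 + length:]
```

Because every frame carries its own length and stream id, frames from different streams can be interleaved on one connection and sorted out by the receiver, which is what lets multiplexing sidestep head-of-line blocking at the HTTP layer.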

Scaling a popular internet radio station, an interview with Mark, creator of Stereodose. Or: What happened when I hit the front page of reddit. Chose PythonAnywhere as it specialized in Python and web2py. Providing random playlists was solved by adding an extra column, because ORDER BY RAND() is not scalable.
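The interview summary doesn't say how the extra column was used, so the following is only one common way to replace ORDER BY RAND(): store a precomputed random key per row, index it, and seek to a random pivot. The table, column, and function names here are hypothetical, and SQLite stands in for whatever database the site runs on.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE playlists (id INTEGER PRIMARY KEY, name TEXT, rand_key REAL)"
)
# The index lets the database seek to the pivot instead of sorting every row.
conn.execute("CREATE INDEX idx_rand_key ON playlists (rand_key)")

seed_rng = random.Random(42)
conn.executemany(
    "INSERT INTO playlists (name, rand_key) VALUES (?, ?)",
    [(f"playlist-{i}", seed_rng.random()) for i in range(1000)],
)


def random_playlist(conn, pivot=None):
    """Pick the first row at or after a random pivot; wrap around if none."""
    if pivot is None:
        pivot = random.random()
    row = conn.execute(
        "SELECT id, name FROM playlists WHERE rand_key >= ? "
        "ORDER BY rand_key LIMIT 1",
        (pivot,),
    ).fetchone()
    if row is None:
        # Pivot landed above the largest key: wrap to the smallest one.
        row = conn.execute(
            "SELECT id, name FROM playlists ORDER BY rand_key LIMIT 1"
        ).fetchone()
    return row
```

Each lookup is an index seek rather than a full sort. The trade-off is that selection is only approximately uniform, since rows that follow a large gap in `rand_key` are picked more often.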

IT'S ALIVE! IT'S ALIVE! Google's secretive Omega tech just like LIVING thing: One of Google's most advanced data center systems behaves more like a living thing than a tightly controlled provisioning system. This has huge implications for how large clusters of IT resources are going to be managed in the future.

Alex MacCaw open sources code behind link and news sharing site Monocle: This looks like a shining example of a well-factored Sinatra app powered by PostgreSQL. If you’re just learning Ruby or want to learn some new tricks, give it a read.

Engineyard's Distill Videos are now available.

Really nice diagram and explanation in Understanding how SQL Server executes a query.

Some insight from Packet Pushers on the future of datacenter networks... Today in the datacenter we have 5 networks: a core network where your production is, a top of rack network, a DMZ out the front, a dev/test network, and a storage network. What we will have is your legacy network. DMZ into a VMware layer. vMotion at layer 2. ECMP overlay network where all your new stuff is. A network driven by applications.

Unified Query Processing for JSON Documents and Indexes: JPath, a JSON database query language, and its syntax, semantics, and implementation. We introduce an indexing data structure for answering JPath queries, and provide a theory unifying query execution on data and index trees using operations on matrices with lattice-valued elements.

SAMOA (Scalable Advanced Massive Online Analysis): a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.

