Stuff The Internet Says On Scalability For January 31st, 2014

Hey, it's HighScalability time:

Largest battle ever on Eve Online. 2,000 players. $200K in damage. Awesome pics.

A teaspoon of soil hosts up to a billion bacteria spread among a million species.

Quotable Quotes:
- Vivek Prakash: The problem of scaling always takes a toll on you.
- @jcsalterego: See This One Weird Trick Hypervisors Don't Want You To Know

Upgrades are the great killer of software systems. Do you really want a pill that would supply materials with instructions for nanobots to form new neurons and place them near existing cells to be replaced so you have a new brain within six months? Scary as hell. But there's a nanoapp for that.

Ted Nelson has a fascinating series of Computers for Cynics vidcasts on YouTube. I'd only really known of Mr. Nelson from his writings on hypertext, but he has a broad and penetrating insight into the early days of the computer industry. He's not really cynical, but I've always had a hard time differentiating realism from cynicism. Suffice it to say he thinks there have been many wrong paths chosen by our industry and not everything is as told by various industry hagiographies. You may like: The Nightmare of Files and Directories, The Database Mess, The Dance of Apple and Microsoft, and The Real Story of the World Wide Web.

From small beginnings. Where it all started: "the internet" in 1969: His idea for the project was the "spirit of community" and was interested in "having computers help people communicate with other people" (Licklider, Licklider, and Robert Taylor) as opposed to using the computer to communicate for us.... By the end of 1969, ARPANET was able to connect to four locations: UCLA, UC Santa Barbara, SRI, and Utah.

In the sometimes your enemy is really your friend department...Why This Mars Rover Has Lasted 3,560 Days Longer Than Expected: scientists originally thought that all of the Martian dust blowing around would coat the robot's solar panels, rendering them unusable within a few months. Instead, strong winds have helped to keep the panels relatively clean.

Data wants to be stored. James Hamilton explores bulk storage options in Optical Archival Storage Technology: This Facebook hardware project is particularly interesting in that it’s based upon an optical media rather than tape. Tape economics come from a combination of very low cost media combined with only a small number of fairly expensive drives. The tape is moved back and forth between storage slots and the drives when needed by robots. Facebook is taking the same basic approach of using robotic systems to allow a small number of drives to support a large media pool. But, rather than using tape, they are leveraging the high volume Blu-ray disk market with the volume economics driven by consumer media applications. Expect to see over a Petabyte of Blu-ray disks supplied by a Japanese media manufacturer housed in a rack built by a robotic systems supplier.

Yeppp!: a high-performance SIMD-optimized mathematical library for x86, ARM, and MIPS processors on Windows, Android, Mac OS X, and GNU/Linux systems.

Evernote on Synchronization Speedupification. Strategies that worked once may not work when you get larger. And SSD only gives you some headroom. This is how Evernote changed the client sync protocol to make it 99% faster and reduce IO and CPU use on shards. The trick was to create an index so requests didn't have to go out to all shards.

Horizontal Scaling of PHP Apps, Part 1 - scaling of the application layer. Horizontal Scaling of PHP Apps, Part 2 - scaling the database. Nicely detailed with code examples. Covers most of the bases.

Redshift isn't as gamechanging as the people who make it suggest. Edward Capriolo says Hive doesn't have to vacuum, loads fast, is vectorized, has fewer limits, and can tackle problems Redshift can't. So there.

Antifragile Systems: power supplies that harvest energy from random vibrations, but perhaps that is too trivial an example. The closest examples I can find of antifragility in an engineered system involve multipath phenomena.

Dropbox on Improving Retrieving Thumbnails Performance. The solution reduces the number of HTTP requests and improves performance on all platforms. First, monitor performance using the Navigation Timing API. Use Hive to look at page load distribution. SPDY didn't work with nginx or Amazon's ELB or the mobile stack. So use HTTP requests with multiple image urls (batch requests). Return all the images in a single plain-text response. Each image is on its own line, as a base-64-encoded data URI. Limit the batch sizes to get around GET request size limitations.
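To make the batching idea concrete, here's a minimal sketch of both sides of such a protocol. It is not Dropbox's code: the endpoint, the `path` query parameter, the 2000-character URL cap, and all function names are assumptions made up for illustration. It only shows the three ideas from the summary: several image urls per GET, one data URI per line in the response, and a cap on batch size.

```python
import base64
from urllib.parse import urlencode

# Assumed cap; real GET size limits depend on the browser, proxy, and server.
MAX_URL_LENGTH = 2000


def _build_url(endpoint, paths):
    # "path" is a hypothetical query parameter name.
    return endpoint + "?" + urlencode([("path", p) for p in paths])


def batch_urls(endpoint, image_paths, max_len=MAX_URL_LENGTH):
    """Group image paths so that each batched GET URL stays under max_len."""
    batches, current = [], []
    for path in image_paths:
        candidate = current + [path]
        if current and len(_build_url(endpoint, candidate)) > max_len:
            batches.append(current)   # close the full batch
            current = [path]          # start a new one with this path
        else:
            current = candidate
    if current:
        batches.append(current)
    return [_build_url(endpoint, b) for b in batches]


def encode_response(images):
    """Server side: one base64 data URI per line, plain text."""
    lines = []
    for mime, data in images:
        payload = base64.b64encode(data).decode("ascii")
        lines.append("data:%s;base64,%s" % (mime, payload))
    return "\n".join(lines)


def decode_response(body):
    """Client side: turn each line back into (mime type, raw bytes)."""
    images = []
    for line in body.splitlines():
        header, payload = line.split(",", 1)
        mime = header[len("data:"):].split(";")[0]
        images.append((mime, base64.b64decode(payload)))
    return images
```

In a browser the client wouldn't need to decode anything: each line can be assigned directly as an image source. The decode step is here only to show the round trip.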

Drones are a problem for Eve Online too: This is one of the bounding scaling factors in large fleet fights, the unavoidable O(n^2) situation where n people do things that n people need to see. This is a problem for guns too, of course, but it’s magnified for drones in two ways. The first is that they simply have more messages, so as the n^2 part becomes large, drones' contribution becomes larger faster. The other bit is that the decision making code behind drone behavior does a poor job of scaling, often considering all attackable objects on grid when figuring out who to go after. Again, an n^2-like problem, where n drones consider attacking (n + num_ships) targets.
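A quick back-of-the-envelope sketch shows why this bites. The two functions below just encode the counts described in the quote; the function names and the one-action-per-actor default are assumptions for illustration, not anything from the Eve codebase.

```python
def broadcast_messages(n_actors, actions_per_actor=1):
    """n actors each do something that all n participants need to see."""
    return n_actors * actions_per_actor * n_actors


def drone_target_checks(n_drones, num_ships):
    """Each of n drones considers every attackable object: n * (n + num_ships)."""
    return n_drones * (n_drones + num_ships)


if __name__ == "__main__":
    for n in (500, 1000, 2000):
        print(n, broadcast_messages(n), drone_target_checks(n, num_ships=n))
```

Doubling the number of participants quadruples the message count, which is why a fleet fight twice the size is far more than twice the work.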

In case you haven't been paying attention...11 Best Practices for Low Latency Systems: Choose the right language; Keep it all in memory; Keep data and processing colocated; Keep the system underutilized; Keep context switches to a minimum; Keep your reads sequential; Batch your writes; Respect your cache; Non blocking as much as possible; Async as much as possible; Parallelize as much as possible.

The story of why MemSQL chose Skiplists over Btrees: in-memory datastructure can reduce indirection with direct pointer references, simpler code, lock free, fast, and flexible. When compared to Innodb: MemSQL’s single-threaded scan performance is 5 times faster in the first case and 8 times faster in the second case. There is no black magic involved here. MemSQL needs to run far fewer CPU instructions.
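For anyone who hasn't met a skiplist before, here's a minimal single-threaded sketch. It is not MemSQL's implementation and it is not lock free; the class name, the height cap of 16, and the coin-flip probability of 0.5 are all illustrative assumptions. What it does show is the "direct pointer references" part: every node holds plain references to its successors, and an ordered scan is just a walk along the bottom level.

```python
import random


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        self.forward = [None] * height  # one direct reference per level


class SkipList:
    MAX_HEIGHT = 16  # assumed cap, plenty for a demo

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_HEIGHT)
        self._height = 1

    def _random_height(self):
        h = 1
        while h < self.MAX_HEIGHT and self._rng.random() < 0.5:
            h += 1
        return h

    def _find_predecessors(self, key):
        """Return, per level, the last node whose key is < key."""
        update = [self._head] * self.MAX_HEIGHT
        node = self._head
        for level in range(self._height - 1, -1, -1):
            while node.forward[level] is not None and node.forward[level].key < key:
                node = node.forward[level]
            update[level] = node
        return update

    def insert(self, key, value):
        update = self._find_predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value  # overwrite an existing key
            return
        h = self._random_height()
        self._height = max(self._height, h)
        new = _Node(key, value, h)
        for level in range(h):
            new.forward[level] = update[level].forward[level]
            update[level].forward[level] = new

    def get(self, key, default=None):
        node = self._find_predecessors(key)[0].forward[0]
        if node is not None and node.key == key:
            return node.value
        return default

    def scan(self, start=None):
        """Yield (key, value) in key order, from the first key >= start."""
        if start is None:
            node = self._head.forward[0]
        else:
            node = self._find_predecessors(start)[0].forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]
```

Usage is what you'd expect: `sl = SkipList(); sl.insert(5, "five"); sl.get(5)`, and `sl.scan(start=4)` walks keys from 4 upward in order.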

Riak vs. Cassandra – A Brief Comparison. Lives up to its name in that it is brief, from a Riak perspective. Riak's key-value approach is more flexible, achieves multi-datacenter replication by connecting independent clusters, uses vector clocks not just time ordering, always accepts reads and writes, read-repair is better for larger datasets.
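The vector clock point is worth a tiny illustration. This is a generic textbook sketch, not Riak's API: the function names and the dict-of-counters representation are assumptions. The idea is that a vector clock can tell you two writes were concurrent, where a plain timestamp would silently pick a winner.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter bumped by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify a relative to b: equal, after, before, or concurrent."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"  # siblings: neither write saw the other


def merge(a, b):
    """Pointwise maximum, used after the conflicting values are reconciled."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

Two clients that both start from the same version and write through different nodes end up with clocks that compare as "concurrent", so the store can keep both values and let the application resolve them.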

In a multi-tenant messaging system that uses back pressure make sure you are not affected by other malicious tenants in the system or one tenant will block another with their chattiness. Use a dedicated cluster. Just one of the lessons in Continuous Integration: Scaling to 74,000 Builds Per Day With Travis CI & RabbitMQ.

Good list of MySQL server memory usage troubleshooting tips: Check memory related Global/Session variables; Check “SHOW ENGINE INNODB STATUS” for section “BUFFER POOL AND MEMORY”; Profiling MySQL Memory usage with Valgrind Massif; Plot memory usage by monitoring ps output; Memory tables in MySQL 5.7; Memory sections in output of pt-summary and pt-mysql-summary.

Nice tutorial on Writing Interactive Web Applications with Web Actors. It's now possible to code some quite sophisticated architectures in the browser.

Great coverage, highly recommended: Recommender systems, Part 2: Introducing open source engines.

A Unified Theory of Garbage Collection: We show that all high-performance collectors (for example, deferred reference counting and generational collection) are in fact hybrids of tracing and reference counting. We develop a uniform cost-model for the collectors to quantify the trade-offs that result from choosing different hybridizations of tracing and reference counting.
