Stuff The Internet Says On Scalability For May 10, 2013

Hey, it's HighScalability time:


(In Thailand, they figured out how to solve the age-old queuing problem!)

  • Nanoscale: Plants IM Using Nanoscale Sound Waves; 100 petabytes: CERN data storage
  • Quotable Quotes:
    • Geoff Arnold: Arguably all interesting advances in computer science and software engineering occur when a resource that was previously scarce or expensive becomes cheap and plentiful.
    • @jamesurquhart: "Complexity is a characteristic of the system, not of the parts in it." -Dekker
    • @louisnorthmore: Scaling down - now that's scalability!
    • @peakscale: Where distributed systems people retire to forget the madness: http://en.wikipedia.org/wiki/Antipaxos 
    • @dozba: "The Linux Game Database" ... Well, at least they will never have scaling problems.
    • Michael Widenius: There is no reason at all to use MySQL
    • @steveloughran: Whenever someone says "unlimited scalability", ask if that exceeds the berkenstein bound
    • @nationofminds: "I have infinite MIPS. Unlimited scalability. And zero effing patience." 
    • Endowing cells with logic and memory: Genetic circuits that process and permanently store information are created with recombinases that flip the orientation of DNA cassettes.

  • And you thought scalability didn't pay: Twitter Acquires Palo Alto-Based Scalable Computing Startup Ubalo

  • Search Is Eating The World. The long sought after Nirvana of search and database becoming one may be nigh. 

  • New Finds: @foodfight is an interesting and informative Chef-oriented DevOps podcast you may enjoy if that's the sort of thing you enjoy, which you probably do. From it I learned, via fellow Way of Kings aficionado Brandon Burton, about a new deep systems podcast called Real Talk by James Golick and Joe Damato, who want to talk about things concrete, not like that Hacker News BS.

  • I'd love to see the API: The idea we live in a simulation isn't science fiction. Magic anyone?

  • So maybe you want ATMs to be ACID after all: Hackers Stole $45 Million From ATMs: They're suspected of working with hackers who twice broke into credit card processing companies' computer systems, stole ATM card data and bypassed the withdrawal limits on the accounts. The technique is known as an "unlimited operation," as the thieves can grab a potentially unlimited amount of cash.

  • Web as Platform is making strides as is shockingly shown by the Unreal Engine 3 running in Firefox in the form of 60MB of minified asm.js-flavoured JavaScript. 

  • In a competition between a credit-card-sized Raspberry Pi and an AWS Micro instance, who do you think would win? If you gave the Raspberry to the Micro you would be correct. James Malone ran the benchmark in Nginx Requests/Second – Raspberry Pi vs. Amazon EC2. Good discussion on Hacker News. Did not know: micro instances are only for short bursts of compute.

  • Drew Crawford says native apps are popular for a very good reason: JS performance on ARM devices is absolutely abysmal.  It is an order-of-magnitude away from x86-class JavaScript performance.

  • Nice breakdown of the Architecture of Monkey, a Linux-based Web Server, by Eduardo Silva. Covers the scheduler, workers, plugins, memory management, and lots of other juicy details. Looks cleanly done.

  • Distributed Computing at Airbnb: A very interesting stack of Mesos, Chronos, Hadoop, and Storm, used to smartly tease location information from bookings and to build a search algorithm that combines dozens of signals to surface the listings guests want.

  • An amazing magnum opus on Key Takeaway Points and Lessons Learned from QCon London 2013. Really too much to comment on, but an interesting read. 

  • Great technical explanation with code samples of Go Make it Rain, a portable game app for both iOS and Android written in C++. They tell you how they made it work on both platforms along with how they handle animation, gamification, and multiplayer P2P game play.

  • Age discrimination is very real in the software industry but here's some good advice from John Sloan on How To Thrive In The Tech Industry For Decades: it's all about taking personal responsibility for learning and adapting over the years.

  • Google’s Chief Internet Evangelist on Creating the Interplanetary Internet. Seems like it has a lot in common with terrestrial mesh networking. 

  • Erudite The Future of the JVM panel discussion and corresponding Google Group thread.

  • Somewhat surprising: Redis with an SSD swap, not what you want: Redis is designed to work in an environment where random access of memory is very fast. Hash tables, and the way Redis objects are allocated is all based on this concept. The freedom Redis gets from the use of memory allows us to serve much more complex tasks at very good peak performance and with minimal system complexity and underlying assumptions. The outcome of this test was expected and Redis is an in-memory system.

  • Quick, let's patent this in the US: New Zealand reject software patents

  • Jonas Schwammberger with a sweet look at the basics of what programmers need to know about Modern Garbage Collectors under the Hood: It is true that if you are using an object anyways later on, that you should probably reuse it, but don't keep objects you might use at some point in the future in memory. With modern garbage collectors, short lived objects produce much less overhead than long lived ones. The Collector might need to swap it around in the Survivor spaces or if you are unlucky, it might even get promoted. Then you really produced unnecessary overhead. So don't hesitate to throw away what you don't need, it will only matter if you are allocating and throwing away megabytes. If you are doing that, no matter if you are using a Garbage Collector or not, you should rethink your program logic.

  • Ditching two-phased commits. Jimmy Bogard talks about the very real world situation of how to coordinate services that are databases and that don't have transaction managers. The ideas: 1) Idempotency is king. Get this and you’re halfway home 2) Strategies for dealing with downstream effects is a business decision.
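
    The "idempotency is king" point can be illustrated with a minimal sketch. This is not Jimmy Bogard's code; the class, method names, and message ids below are hypothetical, and the in-memory dictionary stands in for whatever durable store a real service would use. The idea shown is only that a handler which remembers the message ids it has already applied can safely receive the same message more than once.

```python
class PaymentService:
    """Hypothetical downstream service with no transaction manager."""

    def __init__(self):
        self.balance = 0
        self._processed = {}  # message id -> result recorded on first delivery

    def handle(self, message_id, amount):
        # A redelivered message returns the recorded result
        # instead of applying the change a second time.
        if message_id in self._processed:
            return self._processed[message_id]
        self.balance += amount
        self._processed[message_id] = self.balance
        return self.balance


def deliver_with_retries(service, message_id, amount, attempts=3):
    """Simulate at-least-once delivery: the same message arrives several times."""
    result = None
    for _ in range(attempts):
        result = service.handle(message_id, amount)
    return result


service = PaymentService()
deliver_with_retries(service, "msg-1", 100)  # applied once despite three deliveries
```

    With a handler like this, the sender can retry blindly after a timeout, which is what removes the need for a coordinated commit in the simple case.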

  • Virtual Machines Are The New Processes. The advantage of processes over threads is that you CAN'T share state. Monolithic processes acting as a container for various services with various threads more often than not accidentally on purpose end up sharing state, which is a performance killer and reliability nightmare. Those old Unix guys with their archaic process architectures are far more right than wrong. 

  • Conflict Handling using Rev Trees Functional Specification: high level data structures of revision trees to support conflict management. The purpose of Rev Trees is to establish a relationship between the most recent edits of a document that exist on different machines. 
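
    As a rough illustration of the revision tree idea, here is a minimal sketch. It is not taken from the specification; the class, method names, and revision ids are hypothetical. It assumes only that each revision records its parent, so that the leaves of the tree are the most recent edits and more than one leaf means the document has diverged on different machines.

```python
class RevTree:
    """Hypothetical revision tree: every revision points at its parent."""

    def __init__(self):
        self.parent = {}  # revision id -> parent revision id (None for the root)

    def add(self, rev, parent=None):
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent revision: {parent}")
        self.parent[rev] = parent

    def leaves(self):
        # A leaf is a revision that no other revision names as its parent.
        parents = set(self.parent.values())
        return sorted(rev for rev in self.parent if rev not in parents)

    def in_conflict(self):
        # Two or more leaves mean two machines edited the same ancestor.
        return len(self.leaves()) > 1


tree = RevTree()
tree.add("1-a")
tree.add("2-b", parent="1-a")  # edit made on machine A
tree.add("2-c", parent="1-a")  # concurrent edit made on machine B
```

    After the two concurrent edits, `tree.leaves()` holds both heads and `tree.in_conflict()` reports the divergence; how the conflict is then resolved is a separate policy decision.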

  • Soldier LMeyerov looks at Visualize Parallel Work Stealing in the Browser: Work stealing is a great parallel task scheduling algorithm that provides dynamic load balancing and spatial locality. The idea is that each thread has a local queue of ready-to-fire tasks that it loops through, and if it ever runs out of local tasks, it will steal tasks from another thread's queue. The visualization shows how this works for a parallel top-down tree traversal over a webpage's HTML tree. 
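
    The scheduling loop described above can be sketched as a small single-threaded simulation. This is not the code behind the visualization; the function name, the round-robin stepping, and the toy HTML tree are assumptions made for illustration. Each simulated worker takes tasks from the tail of its own queue, and when its queue is empty it steals from the head of another worker's queue.

```python
from collections import deque


def simulate_work_stealing(tree, root, num_workers=3):
    """Round-robin simulation of work stealing over a top-down tree traversal.

    tree maps each node to a list of its children. Returns (visited, steals),
    where visited[i] lists the nodes that worker i processed.
    """
    queues = [deque() for _ in range(num_workers)]
    visited = [[] for _ in range(num_workers)]
    steals = 0
    queues[0].append(root)  # all work starts on worker 0

    while any(queues):
        for me in range(num_workers):
            if queues[me]:
                node = queues[me].pop()  # owner takes from its own tail
            else:
                victim = next(
                    (v for v in range(num_workers) if v != me and queues[v]),
                    None,
                )
                if victim is None:
                    continue  # nothing to steal this round
                node = queues[victim].popleft()  # thief takes from the victim's head
                steals += 1
            visited[me].append(node)
            queues[me].extend(tree.get(node, []))  # children become local tasks
    return visited, steals


html_tree = {
    "html": ["head", "body"],
    "head": ["title"],
    "body": ["div", "p"],
    "div": ["span"],
}
visited, steals = simulate_work_stealing(html_tree, "html")
```

    Owners working from the tail stay close to the nodes they just expanded, while thieves taking from the head pick up older tasks that tend to sit higher in the tree, which is the usual argument for this split.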

  • RAMP is a non-profit conference with a theme of Scaling Engineering from 1 to 100 Million. It is being held on July 11-12 in gorgeous Budapest. There's a really great speaker lineup: Theo Schlossnagle (omniti), Rajiv Eranki (ex Dropbox), Jeremy Edberg (Netflix, previously Reddit), Amar Arsikere (Zynga) and many others from companies like Spotify, Percona etc.