Stuff The Internet Says On Scalability For May 20, 2011

Submitted for your reading pleasure on this beautiful morning:

  • Group Decision Making in Honey Bee Swarms. In distributed computing systems, nodes reach a quorum when deciding what to do as a group. It turns out bees also use quorum logic when deciding where to nest! Bees do it a bit differently, of course: a scout bee votes for a site by spending time at it, somehow the scouts act and interact so that their numbers rise faster at superior sites, and somehow the bees at each site monitor their numbers there so that they know whether they've reached the threshold number (quorum) and can proceed to initiating the swarm's move to this site. Ants use similar mechanisms to control foraging. Distributed systems may share common mechanisms simply by virtue of being distributed; the nature of the components may not matter that much.
  • Fire! Fire! Brent Chapman shows how to put out that IT fire in Incident Command for IT: What We Can Learn from the Fire Department.
  • Scale Fail (part 1). Josh Berkus warns against hopping on the trendy train: scaling an application is all about managing resources and administrative repeatability. Use data so that you work on real unknowns instead of unknown unknowns. And blocking processes? Just don't do it.
  • Quotable quotes:
    • @Sri_few_words: "Every 600 phones, means a new server in data center" - Cloud Computing being driven strongly by smartphones & tablets
    • @mishok13: I seriously think that using node.js to "minimize memory footprint" and "increase scalability" is f*cking retarded.
    • @varunshoor: Scalability is a b*tch
    • @Mededitor: What corporate jargon drives you mad? "Scalable," "scalability," and "granular" are on my short list.
    • @talfirevic: Adding scalability features to your code is strangely fulfilling.
  • Unlocked achievements: Netflix Now The Largest Single Source of Internet Traffic In North America; With 1 Billion Views Per Quarter, Blip.tv Becomes A Video Destination.
  • Google is trying to make SSL faster too: Eureka! Google breakthrough makes SSL less painful by Dan Goodin. More on the Chromium blog, SSL FalseStart Performance Results: "We implemented SSL False Start in Chrome 9, and the results are stunning, yielding a significant decrease in overall SSL connection setup times. SSL False Start reduces the latency of an SSL handshake by 30%."
  • Gangstas Don't Scale by Julian Browne. There are lots of ways to achieve scale, and each has its pros and cons. Asynchronous links improve loose coupling but reduce the ability to manage distributed transactions, which the CAP theorem suggests avoiding anyway if you want availability and partition tolerance over consistency.
  • Memcache has been used for keeping rate-limiting statistics. Chris O'Hara goes next gen and explains Rate limiting with Redis. Redis exposes fast, efficient in-memory data structures and provides some useful atomic commands. A Redis hash can be used to build a flexible structure where interval-related queries are O(1) amortized.
  • The FAQ for Google App Engine - Pricing and Features. This Google Groups thread is seeing lots of heated activity, and things are still being worked out. The upshot is a move to instance-based pricing instead of CPU-based pricing, which changes how applications need to be built on GAE; it's just not clear yet what that means.
  • A new book, Scalability Rules, has just been released by Martin Abbott and Michael Fisher. Scalability Rules brings together 50 rules that are grounded in experience garnered from over a hundred companies such as eBay, Intuit, PayPal, Etsy, Folica, and Salesforce. You can find it in the HighScalability book store. I have not read it yet but will report after doing so.
  • Lessons Learned in Erlang Land. Kresten Thorup on Erlang, the Actor model, and the misguided conclusion that cloud computing, multi-core processors, and fault tolerance require killing off the object-oriented programming paradigm.
  • Daniel Ehenberg discovers why mmap just gets in the way in the end. On the OS depend not.
  • Dave Winer finds Rackspace cloud beats Amazon EC2, by a lot.
  • Huffman Codes, well explained by Steven Pigeon. Huffman’s simple procedure lets you create your own variable-length codes suited to the frequencies of occurrence of the symbols in the source you are considering.
  • MongoDB live at Craigslist. The great Jeremy Zawodny recounts some of his experiences moving from MySQL to MongoDB. Craigslist needs to store 5 billion documents; he gives some details about their setup, and I look forward to the more detailed deep dive that will come later. You may also enjoy Jeremy's Databases 10 Years Ago: "Nowadays, I could probably simulate our old Yahoo! Finance “feeds” infrastructure on my little Thinkpad laptop."
  • Maze62: a dense and speedy alphanumeric encoding for binary data by Denis Altudov. The proposed encoding algorithm runs in linear time and is nearly as dense as Base64. More on Hacker News.
  • The ϕ Accrual Failure Detector, which is used by Cassandra.
  • One Pass Real-Time Generational Mark-Sweep Garbage Collection on Lambda the Ultimate. Joe Armstrong and Robert Virding talk about a very simple garbage collector used in Erlang.
  • StumbleUpon is starting a technology series on their infrastructure. You can stumble upon the premiere episode at Wrangling Data with Open Source.
  • Data Center Knowledge video: SeaMicro’s Low-Power Servers. 512 processors in a box, low power usage, fast interconnects, composable cell design.
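The quorum rule in the honey bee item above reduces, in its simplest form, to a threshold check on per-site head counts. The sketch below illustrates that idea only and is not a model of real bee behavior: `first_quorum` and `majority` are hypothetical helpers, each visit stands for one scout counted at a site, and the threshold is an arbitrary parameter.

```python
from collections import Counter
from typing import Iterable, Optional


def first_quorum(visits: Iterable[str], threshold: int) -> Optional[str]:
    """Return the first site whose count reaches `threshold`, or None.

    Each element of `visits` is one (hypothetical) scout arriving at a site;
    a scout "votes" just by being counted there.
    """
    counts: Counter = Counter()
    for site in visits:
        counts[site] += 1
        # Sensing is local: only the site that just changed is checked.
        if counts[site] >= threshold:
            return site
    return None


def majority(n: int) -> int:
    """Smallest strict-majority quorum for a cluster of n voters."""
    return n // 2 + 1


# Scouts trickle in; site B reaches a quorum of 3 first.
print(first_quorum(["A", "B", "A", "B", "B"], threshold=3))  # B
print(majority(5))  # 3
```

In a replicated data store the same rule usually shows up as `majority(n)`: any two majorities of n nodes must overlap in at least one node, which is what keeps two groups from deciding different things.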
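The rate-limiting item above describes keeping per-interval counts in a Redis hash. Here is a hedged sketch of that general idea, not necessarily Chris O'Hara's exact scheme: a plain dict per subject stands in for the Redis hash so the example runs without a server, and the class name and parameters are hypothetical. With a real Redis client, recording a hit would roughly correspond to HINCRBY (plus HDEL for pruning) and the query to HMGET.

```python
import time
from typing import Dict, Optional


class BucketedRateLimiter:
    """Counts events per subject in fixed-width time buckets (a sketch).

    A dict per subject stands in for a Redis hash: the field is the absolute
    bucket number and the value is the hit count for that bucket.
    """

    def __init__(self, bucket_seconds: float = 5.0, max_window_seconds: float = 600.0) -> None:
        self.bucket_seconds = bucket_seconds
        # Buckets this far back can never fall inside a query window.
        self.max_buckets = int(max_window_seconds // bucket_seconds)
        self.hashes: Dict[str, Dict[int, int]] = {}

    def _bucket(self, ts: float) -> int:
        return int(ts // self.bucket_seconds)

    def record(self, subject: str, ts: Optional[float] = None) -> None:
        now = time.time() if ts is None else ts
        buckets = self.hashes.setdefault(subject, {})
        b = self._bucket(now)
        buckets[b] = buckets.get(b, 0) + 1  # like HINCRBY subject b 1
        # Prune buckets too old to matter, so each hash stays bounded in size.
        for old in [k for k in buckets if k <= b - self.max_buckets]:
            del buckets[old]  # like HDEL subject old

    def count(self, subject: str, window_seconds: float, ts: Optional[float] = None) -> int:
        now = time.time() if ts is None else ts
        buckets = self.hashes.get(subject, {})
        b = self._bucket(now)
        n = max(1, int(window_seconds // self.bucket_seconds))
        # Sum the current bucket and the n - 1 before it: the cost depends on
        # the window size, not on how many events were recorded.
        return sum(buckets.get(b - i, 0) for i in range(n))
```

A caller would typically `record` each request and reject it when `count(subject, window_seconds)` exceeds its limit. Using absolute bucket numbers as hash fields keeps stale counts from leaking into a window after a quiet period; the trade-off is a small pruning step on each write.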
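Huffman's procedure from the item above fits in a few lines: repeatedly merge the two least frequent subtrees, prefixing one side's codewords with 0 and the other's with 1. This is a generic sketch of the standard algorithm, not code from Steven Pigeon's article; `huffman_code` is a hypothetical name.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict


def huffman_code(text: str) -> Dict[str, str]:
    """Build a prefix-free variable-length code from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a non-empty codeword.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes with 0 / 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


codes = huffman_code("abracadabra")
encoded = "".join(codes[ch] for ch in "abracadabra")
print(len(encoded))  # 23 bits, versus 33 with a fixed 3-bit code for 5 symbols
```

Frequent symbols such as "a" end up with the shortest codewords, and because no codeword is a prefix of another, the bit stream decodes unambiguously.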
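The ϕ accrual failure detector from the item above replaces a yes/no "is it dead?" answer with a continuous suspicion level ϕ = -log10(P_later(t)), where t is the time since the last heartbeat and P_later(t) is the probability that a heartbeat would arrive even later than that. The original paper by Hayashibara et al. estimates P_later from a normal distribution fitted to recent inter-arrival times. This sketch assumes an exponential distribution instead, which makes ϕ a simple linear function of t. The class and method names are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Suspicion level for one peer, based on heartbeat inter-arrival times."""

    def __init__(self, window: int = 1000) -> None:
        self.intervals: Deque[float] = deque(maxlen=window)  # sliding sample window
        self.last: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now: float) -> float:
        if self.last is None or not self.intervals:
            return 0.0  # no history yet: no suspicion
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last
        # Exponential model (an assumption): P_later(t) = exp(-t / mean), so
        # phi = -log10(P_later(t)) = t / (mean * ln 10).
        return elapsed / (mean * math.log(10))


d = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    d.heartbeat(t)
print(d.phi(3.5), d.phi(10.0))  # suspicion grows the longer the peer stays silent
```

The application picks its own threshold: under this model, ϕ = 8 means a heartbeat this late would be expected with probability about 10^-8 if the peer were still alive.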