Stuff The Internet Says On Scalability For June 1, 2012

It's HighScalability Time:

  • Yottabytes : What NSA knows about US; 214ms : ping between San Jose and Fez; $42M : MongoDB is funding scale!; 20K : lines of THX sound code
  • @adrianco: My takeaway from the MongoDB talk at #gluecon is that Mongo is implementing eventual scalability in the next version
  • The death of the general-purpose computer is causing strange events, like Facebook making their own smartphone. Adam Smith said we all benefit when our neighbors get richer, because it creates a bigger pie. We are heading back to the mercantilist notion of a zero-sum game. Google is also racing to the bottom: Google Product Search To Become Google Shopping, Use Pay-To-Play Model. Zero-sum thinking always leads to war. Just sayin'.
  • Stuxnet: sometimes you just can't keep it in your pants, and Pandora always complained that the lid was never on very tight. Bad Prometheus.
  • The Design of LLVM. Chris Lattner with a fascinating, detailed, and surprisingly clear look at LLVM, a universal back-end for C, C++, and Objective-C compilers. On Reddit. On Hacker News.
  • Picplum Tech Stack. A good description of a service for sending photos, especially its UI issues. Their stack: CoffeeScript, Backbone.js, Rails, Unicorn, Resque, Heroku Postgres, Heroku (Nginx), AWS CloudFront & S3, Padrino, Sinatra, MongoDB. Apparently YC companies get a deal on Heroku. A helpful discussion on Hacker News. Also, technologies used by YC companies. And also, Heroku Chief Opens the Door to More Processes, Bigger Ecosystem, Less Amazon.
  • Is Twitter leading a move back to server-side designs? For performance reasons (a 5x improvement), Twitter is reversing its earlier decision to render the UI on the client from a data API. Or did they just do it wrong, and client-side rendering is still the right way to go?
  • Kyle Brandt explains how Stack Exchange uses HAProxy: HAProxy scales very well. As an example, the Stack Exchange network uses web sockets, which maintain open TCP connections. While I am posting this we have 143,000 established TCP sockets on a VMware virtual machine with no issues. The CPU usage on the VM is around 7%.
  • CAP Twelve Years Later: How the "Rules" Have Changed. Eric Brewer's CAP retrospective has escaped the paywall. Murat has notes on the article from his class. The modern CAP goal should be to maximize combinations of consistency and availability that make sense for the specific application. Such an approach incorporates plans for operation during a partition and for recovery afterward.
  • Most of the slides from HBaseCon are available. HBase looks popular for heavy-update use cases and as a backend for scalable search.
  • 21st Century Computer Architecture: architecture as infrastructure, energy first, new technologies, chips to systems, performance + security, parallelism, specialization, cross-layer design, non-volatile memory, near-threshold voltage operation, 3D chips, photonics, breaking layers.
  • Know a Delay: Nagle’s Algorithm and You. Boundary's Kyle Kingsbury recommends setting TCP_NODELAY on sockets that send many small, latency-critical messages, because Nagle's algorithm holds back small writes while earlier data is still unacknowledged so it can coalesce them. The algorithm exists for a reason: each packet carries a 20-byte IP header and a 20-byte TCP header, so a program that sends many tiny messages can saturate the link with overhead. Also, an awesome talk by Glyph Lefkowitz: Through the Ether and Back Again: What Happens to a Packet When You Send.
  • A Simple Algorithm for Finding Frequent Elements in Streams and Bags: We present a simple, exact algorithm for identifying in a multiset the items with frequency more than a threshold θ. The algorithm requires two passes, linear time, and space 1/θ. The first pass is an on-line algorithm, generalizing a well-known algorithm for finding a majority element, for identifying a set of at most 1/θ items that includes, possibly among others, all items with frequency greater than θ.
  • Go Daddy uses Cassandra for their distributed session store.
  • Scalability resources on Quora.
  • The end game for Iqlect is a stack of components for building an elastic application platform; the first part of the stack is Bangdb, a key-value NoSQL database.
  • Iron.io shares their Best Practices: Scalable Image Processing. Processing an image is about a thousand times more demanding than processing text. Switching to a bigger, faster system to process your images is only a short-term fix and can be quite costly. By spreading the work across many parallel workers instead, projects that would normally take 10 hours to complete will now only take 10 minutes.
  • Rackspace is open sourcing a bunch of Node.js related code: node-swiz, a library for serializing, deserializing and validating objects in REST APIs; Whiskey, a testing framework; node-elementtree, a library to build and parse XML documents.
  • Rob Diana with a nice set of links for Geek Reading.
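Brewer's CAP retrospective discusses commutative replicated data types (CRDTs) as one way to handle recovery after a partition: both sides keep accepting writes, and the replicas provably converge once they can talk again. Here is a minimal sketch, assuming the simplest such type, a grow-only counter (G-Counter). The class and method names are illustrative and do not come from the article.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        # Each replica only ever advances its own slot, so it can keep
        # accepting writes while partitioned from the others.
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Recovery after the partition heals: element-wise max is commutative,
        # associative and idempotent, so replicas converge regardless of the
        # order in which merges happen or how often they are repeated.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


if __name__ == "__main__":
    east, west = GCounter("east"), GCounter("west")
    east.increment(3)  # during a partition, both sides keep taking writes
    west.increment(2)
    east.merge(west)   # the partition heals and the replicas exchange state
    west.merge(east)
    print(east.value(), west.value())  # prints: 5 5
```

The design choice that makes this work is that no replica ever overwrites another's slot, so a merge can never lose an increment. Richer application state needs richer types, or the explicit compensation logic Brewer also describes.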
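Kyle Kingsbury's Nagle advice comes down to one socket option. Here is a minimal sketch using Python's standard `socket` module. It computes what share of each packet is payload, assuming the bare 40 bytes of IP and TCP headers cited above with no options, and it opens a TCP socket with `TCP_NODELAY` enabled. The helper names are hypothetical.

```python
import socket

# Per-packet overhead cited above: 20-byte IP header + 20-byte TCP header
# (assumes no IP or TCP options).
HEADER_BYTES = 20 + 20


def wire_efficiency(payload_bytes: int) -> float:
    """Fraction of a packet that is application payload."""
    return payload_bytes / (payload_bytes + HEADER_BYTES)


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1: send small writes immediately instead of holding them
    # back while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    for size in (1, 10, 100, 1000):
        print(f"{size:>5}-byte payload: {wire_efficiency(size):.1%} of the packet is data")
    s = make_low_latency_socket()
    print("TCP_NODELAY enabled:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

The efficiency numbers show why Nagle's algorithm is on by default: a 1-byte message spends almost the whole packet on headers. Disabling it is the right trade only when latency matters more than bandwidth.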
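The two-pass frequent-elements algorithm from the abstract above fits in a few lines. This is a minimal sketch, not the authors' code. It reads θ as a fraction of the input length N, so "frequent" means more than θN occurrences, and the function names are hypothetical. The first pass is the generalized majority-vote step: keep at most ⌊1/θ⌋ counters, and whenever a new item would exceed that limit, decrement every counter and drop the ones that reach zero. The second pass counts the surviving candidates exactly.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Set


def candidates(stream: Iterable[Hashable], theta: float) -> Set[Hashable]:
    """Pass 1 (on-line): at most floor(1/theta) items, a superset of the answer."""
    capacity = math.floor(1 / theta)
    counts: dict = {}
    for x in stream:
        counts[x] = counts.get(x, 0) + 1
        if len(counts) > capacity:
            # Too many counters: decrement all of them (including the new one,
            # which therefore disappears) and drop any that reach zero.
            for key in list(counts):
                counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
    return set(counts)


def frequent_elements(data: List[Hashable], theta: float) -> Set[Hashable]:
    """Pass 2: keep the candidates that occur more than theta * len(data) times."""
    cand = candidates(data, theta)
    exact = Counter(x for x in data if x in cand)
    threshold = theta * len(data)
    return {x for x, c in exact.items() if c > threshold}


if __name__ == "__main__":
    data = list("abacabadabacabae")  # a: 8, b: 4, c: 2, d: 1, e: 1
    print(frequent_elements(data, 0.2))   # items with more than 3.2 occurrences
    print(frequent_elements(data, 0.25))  # items with more than 4 occurrences
```

Each decrement round discards ⌊1/θ⌋ + 1 occurrences, so there are fewer than θN rounds. An item with more than θN occurrences therefore cannot be evicted in pass 1, and pass 2 removes the false positives.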

This week's musical selection: