Stuff The Internet Says On Scalability For March 2, 2012

Please don't squeeze the HighScalability:

  • Quotable quotes:
    • @karmafile: "Scalability" is a much more evil word than we make it out to be
    • @ostaquet: More hardware won't solve #SQL resp. time issues; proper indexing does.
    • @datachick: All computing technology is the rearrangement of data. Data is the center of the universe
    • @jamesurquhart: "Complexity is a characteristic of the system, not of the parts in it."
  • Data is the star of the catwalk, looking fierce in Ilya Katsov's impeccably constructed post on NoSQL Data Modeling Techniques: In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques.
  • Peter Burns describes computer time scales, measured in nanoseconds, as a human might experience them: your memory == computer registers, L1 cache == papers kept close by, L2 cache == books, RAM == the library down the street, and going to disk is a 3-year odyssey for data.
  • Fault Tolerance in a High Volume Distributed System at Netflix (slidedeck). Ben Christensen with another deep dive on Netflix tech. This time it's on how to support an extreme service architecture by isolating failures, shedding load, and being resilient to failures. Their solution is multi-pronged: network timeouts and retries, separate threads on per-dependency thread pools, semaphores (via a tryAcquire, not a blocking call), and circuit breakers.
  • Virtualizing storage controllers. StorageMojo looks at the viability of VMs running directly on storage controllers using the Linux KVM hypervisor. By statically assigning cores and RAM to guests, using direct device assignment, and modifying the block driver to poll instead of relying on interrupts, it is possible to migrate more functionality to controllers without lengthy development cycles, enabling architects to make different tradeoffs.
  • HTML5 Real-Time & Connectivity. If you are looking for an overview of the new HTML5 stack (Web Origin Concept, Cross Document Messaging, CORS, XMLHttpRequest Level 2, WebSocket, Server-Sent Events, SPDY) and what it means for architecture, Rob Nikzad delivers in this first-class presentation. We may now have a viable web platform instead of a patchwork of kludges, just in time for the new App-world.
  • Building data structures that are smaller than an Array AND faster (in C). Terence Siganakis shows how Wavelet trees can create smaller data structures by replacing large pointers with indexes.
  • Chips can benefit from High Intensity Training too: Computational sprinting pushes smartphones till they're tired. Normally, these devices are designed for sustained performance, so that they can run full bore forever. We're proposing a computer system that can perform a giant surge of computation, but then gets tired and has time to rest. Under the computational sprinting scheme, up to 15 additional cores would fire up to work in parallel alongside the chip's main core for up to one second. This could speed up the device's response time tenfold.
  • Affinity - the social database. Intriguing new database option with an interesting mix of features. It borrows aspects from RDBMS, OODBMS, document databases, graph databases, and RDF and XML stores.
  • vitess - Scaling MySQL databases for the web: a front-end to MySQL providing an RPC interface that accepts and transmits SQL commands. It is capable of efficiently multiplexing a large number of incoming connections (10K+) over a small number of db connections at reasonable throughput (~10kqps). It also has an SQL parser which gives the server the ability to understand and intelligently reshape the queries it receives.
  • hello heroku world. Evan Weaver with an interesting benchmark of Heroku's various stacks: Finagle and Node made good throughput showings—Node had the most consistent performance profile, but Finagle’s best case was better. Sinatra (hosted by Thin) and Tomcat did OK. Jetty collapsed when pushed past its limit, and Bottle (hosted by wsgiref) was effectively non-functional. Finally, the naive C/Accept “stack” demonstrated an amusing combination of poor performance and good scalability.
  • We live and die by the API. Google is now charging a lot for its map API, which is causing a reroute storm around it. Foursquare is declaring its freedom by joining the OpenStreetMap movement! Google Maps may be better for now, but if you can't afford it, it doesn't matter.
  • Architecting a Startup in 2012. Rick Mangi with a good look at all the tools available today that help make starting a new venture a breeze. There's the cloud of course, but there's also a bevy of cloud-based services available now for Software Development, Project Management, Product/Biz Development, Marketing, and Design.
  • Magnet URI scheme. A URI scheme that uses a content hash to reference resources available for download via peer-to-peer networks. An approach that can't be locked down by centralized authorities.
  • The Indian Railway system has a fascinating scalability challenge with its online ticketing system: most of its load arrives in a few minutes every day, and it can't add any more supply because the number of trains is fixed. So anything it does to scale basically only helps it say no to people more gracefully. An interesting problem.
  • By tuning TCP's notorious slow start algorithm, Chemo was able to reduce its round-trip times by 20%.
  • Roll your own autocomplete solution using Tries. Vivek with a clear explanation of a complex feature.
  • Interesting Google Groups discussion of adding CAS (compare-and-set) to Riak.
  • Videos from the 25th Neural Information Processing Systems (NIPS) conference in Granada, Spain, are now available. Here's a taste: Big Learning: Algorithms, Systems, and Tools for Learning at Scale, International Workshop on Music and Machine Learning: Learning from Musical Structure
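The Netflix deck above lists circuit breakers among its defenses. To make the idea concrete, here is a minimal single-threaded sketch of the general pattern. It is not Netflix's code, and the class name, thresholds, and fake clock are all hypothetical. After a run of failures the breaker "opens" and sends callers straight to a fallback instead of letting them pile up behind a sick dependency; after a cool-down it lets one trial call through.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (a sketch, not Netflix's code).

    Closed: calls go through to the dependency; consecutive failures are counted.
    Open: calls fail fast to the fallback without touching the dependency.
    Once reset_timeout seconds have passed, one trial call is let through
    (half-open): success closes the breaker, failure opens it again.
    """

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: shed load instead of waiting on a sick dependency
            # cool-down elapsed: half-open, fall through and try one call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            half_open = self.opened_at is not None
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # a success closes the breaker
        return result


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def broken_dependency():
    raise TimeoutError("dependency timed out")


print(breaker.call(broken_dependency, lambda: "cached answer"))  # failure 1
print(breaker.call(broken_dependency, lambda: "cached answer"))  # failure 2: breaker opens
print(breaker.call(broken_dependency, lambda: "cached answer"))  # fails fast, dependency not called
now[0] = 11.0
print(breaker.call(lambda: "fresh answer", lambda: "cached answer"))  # trial call succeeds
```

A real deployment would combine this with the other prongs from the deck (timeouts, per-dependency thread pools, non-blocking semaphores) and would need locking once several threads share one breaker.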
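For the wavelet tree item, the trick is that each level stores one bit per element plus running counts of 1-bits, so moving from a node to a child is integer arithmetic on positions rather than following per-element pointers. The sketch below is a plain Python illustration of that general structure, not Terence Siganakis's C code, and the names are hypothetical. A space-efficient version would pack the bits into machine words with a compact rank directory; this one keeps ordinary prefix-count lists for readability, so it shows the mechanics, not the memory savings.

```python
class WaveletTree:
    """Hypothetical minimal wavelet tree over integers (illustrative sketch).

    Each internal node splits its value range [lo, hi] at mid and keeps one
    bit per element: 0 if the element went to the left child (<= mid), 1 if it
    went right. Prefix counts of 1-bits map a position in a node to a position
    in a child, so no per-element pointers are stored.
    """

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf: every element here equals lo (or there are none)
        mid = (lo + hi) // 2
        # ones[i] = how many of seq[:i] went to the right child
        self.ones = [0]
        for x in seq:
            self.ones.append(self.ones[-1] + (x > mid))
        self.left = WaveletTree([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, hi)

    def access(self, i):
        """Return the original seq[i] by walking down from the root."""
        node = self
        while node.left is not None:
            ones_before = node.ones[i]
            if node.ones[i + 1] > ones_before:  # bit i is 1: go right
                i, node = ones_before, node.right
            else:  # bit i is 0: go left
                i, node = i - ones_before, node.left
        return node.lo

    def rank(self, c, i):
        """Count occurrences of value c in seq[:i]."""
        if not self.lo <= c <= self.hi:
            return 0
        node = self
        while node.left is not None:
            mid = (node.lo + node.hi) // 2
            if c > mid:
                i, node = node.ones[i], node.right
            else:
                i, node = i - node.ones[i], node.left
        return i


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
wt = WaveletTree(data)
print([wt.access(i) for i in range(len(data))])  # reconstructs data
print(wt.rank(5, len(data)), wt.rank(1, 3))
```

Both queries take one step per tree level, which is logarithmic in the alphabet size rather than in the sequence length.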
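The vitess item describes multiplexing many incoming connections over a few database connections. The core of that idea is a bounded pool: clients borrow a backend connection only for the duration of a query and hand it straight back. The sketch below shows only that pooling step with a fake connection class; the names are hypothetical, it is not how vitess is implemented, and it does none of the SQL parsing or query reshaping the project also mentions.

```python
import queue
import threading
import time
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical pool: many callers share a small, fixed set of backend connections."""

    def __init__(self, make_connection, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(make_connection())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty if timeout expires.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


class FakeConnection:
    """Stand-in for a real DB connection; records how many are busy at once."""

    _lock = threading.Lock()
    busy = 0
    peak_busy = 0

    def execute(self, sql):
        with FakeConnection._lock:
            FakeConnection.busy += 1
            FakeConnection.peak_busy = max(FakeConnection.peak_busy, FakeConnection.busy)
        time.sleep(0.001)  # pretend the query takes a moment
        with FakeConnection._lock:
            FakeConnection.busy -= 1
        return f"ok: {sql}"


def run_demo(clients=200, pool_size=4):
    pool = ConnectionPool(FakeConnection, pool_size)
    results = []
    results_lock = threading.Lock()

    def client(n):
        with pool.connection() as conn:
            row = conn.execute(f"SELECT {n}")
        with results_lock:
            results.append(row)

    threads = [threading.Thread(target=client, args=(n,)) for n in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_demo(clients=200, pool_size=4)
print(len(results), "queries served; peak backend connections in use:", FakeConnection.peak_busy)
```

Threads keep the sketch short; a front end aiming at 10K+ client connections would more likely use an event loop or lightweight tasks on the client side while keeping the same small backend pool.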
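The magnet link item is easy to see in code: a magnet URI is just a query string whose `xt` ("exact topic") parameter carries the content hash, optionally with a display name (`dn`) and tracker addresses (`tr`). The helper below is a hypothetical sketch using Python's standard `urllib.parse`; the hash and tracker in the example are made up.

```python
from urllib.parse import parse_qs, urlparse


def parse_magnet(uri):
    """Hypothetical helper: pull the common fields out of a magnet URI."""
    parts = urlparse(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parts.query)  # decodes %-escapes and '+' as space
    return {
        # xt names the content by hash (e.g. urn:btih:...), not by location
        "exact_topics": [xt for xt in params.get("xt", []) if xt.startswith("urn:")],
        "display_name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


# Made-up example link.
link = ("magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
        "&dn=example+file.iso&tr=udp%3A%2F%2Ftracker.example.org%3A6969")
print(parse_magnet(link))
```

Since the link names the content rather than a server, any peer holding data with that hash can serve it, which is what makes these links hard to lock down.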
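The TCP item doesn't say which knob Chemo turned. One commonly discussed slow start parameter is the initial congestion window (initcwnd), and some back-of-the-envelope arithmetic shows why it matters: in slow start the window roughly doubles every round trip, so a bigger starting window can shave whole round trips off a short response. The model below is deliberately idealized (no losses, no receive-window limit, no ssthresh cap, handshake not counted), and the 1460-byte MSS and 64 KB response size are assumptions.

```python
def segments_for(num_bytes, mss=1460):
    """Number of TCP segments needed for a payload (ceiling division)."""
    return -(-num_bytes // mss)


def rtts_to_send(total_segments, initcwnd):
    """Idealized slow start: the window doubles every round trip, with no
    losses, no receive-window limit and no ssthresh cap."""
    cwnd, sent, rtts = initcwnd, 0, 0
    while sent < total_segments:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts


segments = segments_for(64 * 1024)  # an assumed 64 KB response
for initcwnd in (3, 10):
    print(f"initcwnd={initcwnd}: {rtts_to_send(segments, initcwnd)} round trips")
```

The saving is a fixed number of round trips, so it counts for most on high-latency links and small responses that finish while still in slow start.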
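For the autocomplete item, a trie stores words character by character along shared paths, so finding completions means walking down to the node for the prefix and then collecting every word beneath it. This is a generic sketch of that technique, not Vivek's code; the class names and the `limit` parameter are illustrative.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.is_word = False


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix, limit=10):
        """Return up to `limit` stored words starting with prefix, in sorted order."""
        node = self.root
        for ch in prefix:  # walk down to the node that represents the prefix
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            node, word = stack.pop()
            if node.is_word:
                results.append(word)
            # push children in reverse order so the smallest pops first
            for ch in sorted(node.children, reverse=True):
                stack.append((node.children[ch], word + ch))
        return results


t = Trie(["scale", "scalability", "scalable", "schema", "shard"])
print(t.complete("sca"))
print(t.complete("sh"))
```

Lookup cost depends on the prefix length plus the number of results collected, not on the dictionary size, which is why tries suit type-ahead. Production systems usually also rank completions (for example by popularity) instead of returning them alphabetically.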
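For the Riak CAS thread, it helps to pin down what compare-and-set means: a write carries the version the writer last read, and the store applies it only if that version is still current; otherwise the caller re-reads and retries. The sketch below is a single-process, in-memory illustration with hypothetical names, not Riak's API. The lock makes the check-and-write atomic here; in a replicated store the hard part is getting replicas to agree on the current version, which this sketch sidesteps entirely.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory key/value store with compare-and-set."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, new_value):
        """Write only if the key is still at expected_version; report success."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # someone else wrote first
            self._data[key] = (current_version + 1, new_value)
            return True


def increment(store, key):
    # Classic optimistic retry loop: read, compute, CAS, repeat on conflict.
    while True:
        version, value = store.get(key)
        if store.compare_and_set(key, version, (value or 0) + 1):
            return


store = VersionedStore()


def worker():
    for _ in range(100):
        increment(store, "counter")


workers = [threading.Thread(target=worker) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(store.get("counter"))  # (version, value)
```

Without the CAS check, two workers could read the same value and both write value + 1, silently losing an update.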

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...