Stuff The Internet Says On Scalability For January 3, 2010

Submitted for your reading pleasure...

  • Quotable Quotes
    • @hofmanndavid: Performance and scalability anxiety makes developers want to catch the flying butterflies
    • @tivrfoa: "Scalability solutions aren't magic. They involve partitioning, indexing and replication." Twitter engineer
    • Alan Perlis: Fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it.
  • CIO update: Post-mortem on the Skype outage. Interesting tale of a cascading collapse in complex, distributed, interactive systems. For more background see the highly illuminating Explaining Supernodes by Dan York.
  • RethinkDB and SSD Databases. SSD was not a revolution by Kevin Burton. What’s really shocking to me, is that while SSD and flash storage is very exciting, it wasn’t as revolutionary in 2010 as I would have liked to have seen.
  • The case for Datastore-Side-Scripting. Russell Sullivan predicts real-time web applications are going in the direction of being entirely event driven, from client (WebSockets) to web-server (Node.js) to datastore (Redisql). Datastore-side scripting is what completes the event-driven chain.
  • Developments that could change everything...
  • What Your Computer Does While You Wait. Gustavo Duarte with an awesome exploration of the latency and throughput of various subsystems in a modern commodity PC, an Intel Core 2 Duo at 3.0GHz. ...waiting for a hard drive seek is like leaving the building to roam the earth for one year and three months.
  • My thesis - building blocks of a scalable webcrawler by Marc Seeger. This thesis documents my experiences trying to handle over 100 million sets of data while keeping them searchable. All of that happens while collecting and analyzing about 100 new domains per second. It covers topics from the different Ruby VMs (JRuby, Rubinius, YARV, MRI) to different storage backends (Riak, Cassandra, MongoDB, Redis, CouchDB, Tokyo Cabinet, MySQL, Postgres, ...) and the data structures that they use in the background.
  • How Twitter Uses NoSQL by Klint Finley. Nice summary of a talk given by Twitter's always informative Kevin Weil. 
  • End-To-End Arguments in System Design by J.H. Saltzer, D.P. Reed and D.D. Clark. This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called the end-to-end argument, suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level. Examples discussed in the paper include bit error recovery, security using encryption, duplicate message suppression, recovery from system crashes, and delivery acknowledgement. 
  • Pluralcast 31 : Lessons in Performance and Scalability, a good discussion and interview with James World. In the real world people often don't have any idea what their systems really do. How do you change that?
  • Schism: a Workload-Driven Approach to Database Replication and Partitioning by Carlo Curino, Evan Jones, Yang Zhang, Sam Madden. We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of distributed databases.
  • Big Data With Little Chips – A Silverbullet Against Datacenter Energy Physics? by ksankar. The gating factor for solving big data problems is not physical space, but energy. 
  • CFP for The Fourth International Workshop on Data Intensive Distributed Computing. The deadline is Jan 31.
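
As a back-of-the-envelope check on the hard drive seek analogy in the Gustavo Duarte item above, here is a minimal sketch of the arithmetic. It assumes the analogy maps one CPU cycle to one "human" second, which is an assumption on my part and not something the item states; the 3.0GHz clock and the "one year and three months" figure come from the item. The function name is hypothetical.

```python
# Assumption: in the analogy, 1 CPU cycle is stretched to 1 "human" second.
CLOCK_HZ = 3.0e9                         # 3.0 GHz, as stated in the item
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def human_seconds_to_real_ms(human_seconds, clock_hz=CLOCK_HZ):
    """Convert time on the human scale back to real milliseconds."""
    cycles = human_seconds               # one cycle per human second (assumed)
    return cycles / clock_hz * 1e3


# "One year and three months" expressed in seconds, i.e. in cycles.
roam_seconds = 1.25 * SECONDS_PER_YEAR
seek_ms = human_seconds_to_real_ms(roam_seconds)

print(f"{roam_seconds:,.0f} cycles is about {seek_ms:.1f} ms of real time")
```

Under that assumption the analogy corresponds to a seek on the order of 13 ms, or roughly 39 million cycles that the CPU could have spent doing something else.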