Hot Scalability Links For Oct 8, 2010

  • So what happened at the Surge 2010 Conference? A lot...
  • Quotable quotes for 200 Alex:
    • postwait: That which cannot be measured cannot be scaled.
    • gsylvest: Cameron Purdy: Instead of performance, think about scalability. Michael Nygard: Instead of scalability, think of response time distributions.
    • jkalucki: Sometimes this job is a bit too much like high-energy physics: Blast the invisible with a beam of hell fury, then decode the backscatter.
    • DrHayt: Shard for availability, not scalability. Do what is necessary to make sure that your shards do not share the same failure domain.
  • Coda Hale insists You Can't Sacrifice Partition Tolerance: What I'd like to see, though, is far fewer people unknowingly describing their systems as logical impossibilities.
  • Royans Tharakan has open sourced Cfmap: Publishing, discovering and dashboarding infrastructure state. Cfmap is a scalable, eventually consistent, and fault-tolerant repository of state information for error-free configuration for architectures that span multiple datacenters. As virtualization and service-oriented architectures have become more common, this kind of service has become essential.
  • Modeling relationships in App Engine. Nick Johnson with a succinct look at how to express 1-1, 1-M, M-M on GAE.
  • Jonathan Hsieh on Using Flume to Collect Apache 2 Web Server Logs. Collecting and doing anything useful with massive streams of server log data has consumed a tremendous number of programmer years of effort. Use Flume and Hadoop to do it on the cheap.
  • Kenn Ejima with Thoughts on Redis. The gist was: Use Redis for small datasets that don’t grow fast (stay far less than 1GB). Have at least 2x as much memory as the dataset. Use default snapshotting and disable AOF. Jeff Darcy replies with: Use Redis for small datasets (less than 50GB this year) that don’t need to be highly available, have at least 2x as much memory as your actual dataset (until the snapshot implementation improves), use frequent snapshotting or AOF (depending on your need for performance vs. durability – not both) and always avoid overcommit.
  • Gunther on The Universal Scalability Law. A lot of people use the term "scalability" without clearly defining it, let alone defining it quantitatively. Computer system scalability must be quantified. If you can't quantify it, you can't guarantee it. The universal law of computational scaling provides that quantification.
  • MongoSV is a full-day conference about the open source database MongoDB, December 3, 2010, Microsoft Research Silicon Valley. 
  • Cassandra has documentation! Audible cheers reportedly went up at the Cassandra Summit when this was announced.
  • I hadn't really considered how big The Year IPV6 problem will be (http://packetpushers.net/). Not necessarily at the network level, as most operating systems, switches, etc. can handle it, but think of all those scripts that hardcode IPV4 regexes. Think of all those tools like Nagios, syslog, etc. that assume IPV4.
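
To make the hardcoded-regex worry in the last item concrete, here is a minimal sketch in Python. The pattern and both function names are hypothetical, made up for illustration and not taken from any of the tools mentioned. It contrasts a typical IPV4-only check with one that leans on the standard library's ipaddress module, which parses both address families.

```python
import ipaddress
import re

# Hypothetical example of the kind of pattern that ends up baked into scripts:
# four dot-separated groups of 1-3 digits. It knows nothing about IPv6 and
# does not even range-check the octets.
IPV4_ONLY = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_ip_regex(text):
    """IPv4-shaped check via the hardcoded regex above."""
    return bool(IPV4_ONLY.match(text))


def is_ip(text):
    """Accept any valid IPv4 or IPv6 address by letting ipaddress parse it."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # Documentation-range addresses plus two inputs that expose the regex.
    for candidate in ("192.0.2.1", "2001:db8::1", "999.999.999.999", "not-an-ip"):
        print(candidate, is_ip_regex(candidate), is_ip(candidate))
```

The regex rejects the IPv6 address and happily accepts 999.999.999.999, while the parser-based check gets both right. Swapping a hand-rolled pattern for a real parser is one plausible way out, assuming the script only needs validation rather than extraction from free-form log lines.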