Stuff The Internet Says On Scalability For January 28, 2011

Submitted for your reading pleasure...

  • Something we get to say more often than you might expect - funny NoSQL comic: How to Write a CV (SFW)
  • Playtomic shows how to handle over 300 million events per day, in real time, on a budget
  • More Speed, at $80,000 a Millisecond. Does latency matter? Oh yes. “On the Chicago to New York route in the US, three milliseconds can mean the difference between US$2,000 a month and US$250,000 a month.”
  • Quotable Quotes
    • @jkalucki: Throwing 1,920 CPUs and 4TB of RAM at an annoyance, as you do. @jointheflock
    • @hkanji: Scale can come quick and come hard. Be prepared.
    • @elenacarstoiu: When you say #Cloud, everybody's thinking lower cost. Agility, scalability and fast access are advantages far more important.
    • @BillGates: From Melinda - Research proves we can save newborn lives at scale 
  • Kosmix with a fascinating look at Cassandra on SSD, summarizing what they've learned over the past year running SSDs and, more recently, running Cassandra on SSDs. Why run something designed for bulk, linear disk access on SSDs, which aren't any faster at that kind of access? Because SSDs can be extremely valuable for applications that require consistently low latency and high concurrency. The 20GB memory limit of the JVM is becoming a limiting factor, since SSDs have enough IO capacity that they will starve for memory long before they run out of IO. It's interesting to think about how designs will need to change when they no longer have to batch writes and reads to avoid seeks.
  • That's a lot of apps: Apple has 10 Billion App downloads
  • Let other people cache your files with
  • mClock: Handling Throughput Variability for Hypervisor IO Scheduling. This paper introduces a novel algorithm for IO resource allocation in a hypervisor. Our algorithm, mClock, supports proportional-share fairness subject to a minimum reservation and a maximum limit on the IO allocation for VMs.
  • Mesos is an open source cluster operating system. Mesos manages tens to thousands of computers, providing a platform on which to build distributed applications. For example, Hadoop MapReduce, MPI, Hypertable, Spark, and many other applications run on Mesos!
  • John Wood with an excellent experience report of using CouchDB in production. Summary: 1) CouchDB views are great for querying aggregate stats on large databases; 2) The learning curve for MapReduce is steep for those coming from a relational DB background; 3) CouchDB is easy to set up and is low maintenance.
  • Jon Udell lists Seven ways to think like the web. A list of some principles that people apply when they work well together online. It’s the same list that emerges when I talk about computational thinking: 1) Be the authoritative source for your own data; 2) Pass by reference not by value; 3) Know the difference between structured and unstructured data; 4) Create and adopt disciplined naming conventions; 5) Push your data to the widest appropriate scope; 6) Participate in pub/sub networks as both a publisher and a subscriber; 7) Reuse components and services.
  • 1024 cores with a great resource page on lock-free algorithms
  • Greg Linden with a few things that have caught his attention lately.
  • Scalability links for January 19th by Royans.
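To put Playtomic's "300 million events per day" in per-second terms, here is a quick back-of-the-envelope calculation. It assumes traffic is spread evenly over 24 hours, which real traffic never is, so the PEAK_FACTOR below is a made-up illustrative multiplier, not a figure from the Playtomic write-up.

```python
# Convert a daily event count into an average per-second rate.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(events_per_day: int) -> float:
    """Average events per second, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY

avg = per_second(300_000_000)

# Hypothetical peak-to-average ratio; real traffic is bursty and this factor is an assumption.
PEAK_FACTOR = 3

print(f"average: {avg:,.0f} events/s")  # prints: average: 3,472 events/s
print(f"assumed peak: {avg * PEAK_FACTOR:,.0f} events/s")
```

Roughly 3,500 events per second on average is the number a budget system has to sustain before accounting for bursts.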
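The mClock abstract describes proportional-share fairness bounded by a per-VM minimum reservation and maximum limit. The sketch below is not mClock's tag-based scheduler. It is a simplified, static "water-filling" model of the allocation those three constraints describe: find a share-per-unit-weight s such that each VM gets its weight times s, clamped to [reservation, limit], and the clamped allocations sum to the available IOPS. The VM class, the allocate function, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    weight: float       # proportional-share weight (assumed > 0)
    reservation: float  # minimum IOPS this VM must get
    limit: float        # maximum IOPS this VM may get (assumed >= reservation)

def allocate(vms: list[VM], capacity: float, iterations: int = 100) -> dict[str, float]:
    """Split `capacity` IOPS in proportion to weight, clamped to [reservation, limit]."""
    def clamp(vm: VM, s: float) -> float:
        return min(vm.limit, max(vm.reservation, s * vm.weight))

    if sum(vm.reservation for vm in vms) > capacity:
        raise ValueError("reservations exceed capacity")
    if sum(vm.limit for vm in vms) <= capacity:
        # Every VM can run at its limit; leftover capacity simply goes unused.
        return {vm.name: vm.limit for vm in vms}

    # Bisect on s: the total clamped allocation is non-decreasing in s,
    # equals the sum of reservations at s = 0, and the sum of limits at s = hi.
    lo, hi = 0.0, max(vm.limit / vm.weight for vm in vms)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(clamp(vm, mid) for vm in vms) < capacity:
            lo = mid
        else:
            hi = mid
    return {vm.name: clamp(vm, hi) for vm in vms}

# Hypothetical example: 1000 IOPS shared by three VMs.
vms = [
    VM("A", weight=1, reservation=100, limit=1000),
    VM("B", weight=1, reservation=500, limit=1000),
    VM("C", weight=2, reservation=0, limit=250),
]
print(allocate(vms, capacity=1000))
```

In this example all three constraints come into play: B is held at its 500 IOPS reservation even though its weight alone would give it less, C is capped at its 250 IOPS limit even though it has the largest weight, and A gets its plain proportional share of about 250 IOPS.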
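John Wood's point about CouchDB views being good for aggregate stats comes down to the map/reduce model: a map function emits (key, value) pairs per document, and a reduce function combines the values for each key. Real CouchDB views are usually written as JavaScript functions that call emit(), and CouchDB also offers built-in reducers such as _sum. The Python below only imitates the grouped-query semantics in memory to show the idea. The documents and the map_fn, reduce_fn, and query_view helpers are hypothetical and are not part of any CouchDB API.

```python
from collections import defaultdict

# Hypothetical documents standing in for records in a CouchDB database.
docs = [
    {"type": "event", "game": "alpha", "plays": 3},
    {"type": "event", "game": "beta", "plays": 5},
    {"type": "event", "game": "alpha", "plays": 4},
    {"type": "user", "name": "someone"},  # ignored by the map function
]

def map_fn(doc):
    """Like a view's map function: emit (key, value) pairs for matching docs."""
    if doc.get("type") == "event":
        yield doc["game"], doc["plays"]

def reduce_fn(values):
    """Like a summing reduce: combine all values emitted for one key."""
    return sum(values)

def query_view(docs, map_fn, reduce_fn):
    """Run map over every doc, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}

print(query_view(docs, map_fn, reduce_fn))  # prints: {'alpha': 7, 'beta': 5}
```

The steep learning curve Wood mentions is visible even here: instead of writing a GROUP BY query, you decide up front what each document emits and how values combine.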