Hot Scalability Links For November 5th, 2010

So much good stuff this week...

  • Adrian Cockcroft Compares NoSQL Availability Models. Let's risk feeding the CAP trolls and try to get some insight into the differences between the many NoSQL contenders. Adrian asks how each NoSQL product would add a movie to a user's favorites list, read it back, and how this works across availability zones. It's much trickier than it sounds when there are multiple writers. Cassandra and MongoDB answer back.
  • Stuff the Internet Says:
    • @jerng: Reading up on scalability. WHY THE HELL FOR? Because I want to know the future.
    • @freerangedata: The #nosql options are the micro brews/craft beers of data stores. So many good ones, so little time to try them all.
    • @edward_ribeiro: Soon, Darwinism will start to play its role on #NoSQL systems. You know, only the fittest will survive.
    • @connectionreq: I'm always wowed when I hear how Facebook abuses their MySQL databases in crazy ways
    • @louismrose: This is the kind of scalability we should be working on... http://yfrog.com/59qb0oj
    • @jkalucki: I don't know. Scale, stability and centrality bring the sexy to infrastructure.
    • @Taggerz: evented I/O is not magic scalability pixie dust, and like anything, there is a tradeoff.
    • @CBVirtualPA: Instant scalability - How good is it to know that if a big job comes in, you’ve got the back up to handle it. Priceless.
  • Redis at Superfeedr. Each of our Redis servers processes an average of 3,500 queries per second.
  • Rick Cattell has updates to his awesome Scalable Datastores paper. 
  • NoSQL In The Wild: How We Do It at Tekpub. We split out the duties of our persistence story cleanly into two camps: reports of things we needed to know to make decisions for the business, and data users needed to use our site. Ironically, these two different ways of storing data have guided us to do what's natural: put the application data into a high-read, high-availability environment (MongoDB), and put the historical, reporting data into a system that is built to answer questions: a relational data store.
  • Understanding Throughput-Oriented Architectures by Michael Garland and David B. Kirk. For workloads with abundant parallelism, GPUs deliver higher peak computational throughput than latency-oriented CPUs.
  • GFS: Evolution on Fast-Forward. Kirk McKusick and Sean Quinlan discuss the origin and evolution of the Google File System.
  • Why does Scalability matter, and how does Cassandra scale? by Matt Pfeil. For the sake of this discussion, we'll define scalability as the ability to add computational resources to a database in order to gain more throughput.
  • Trying out the MySQL Push-Down-Join (SPJ) preview. A 50x speedup on cluster-based joins, with an explanation of why joins are slow and how to fix them.
  • Building Scalable Database Solution with SQL Azure - Introducing Federation in SQL Azure by Cihan Biyikoglu. Last week, we introduced the Federation concept in SQL Azure in this talk. In the next few posts, I’ll detail the upcoming concept and how to build applications using federations in SQL Azure. This will help you be ready when this technology ships in future versions of SQL Azure.
  • Google continues its quest to speed up the Internet by releasing mod_pagespeed for Apache, which performs many speed optimizations automatically. We’re starting with more than 15 on-the-fly optimizations that address various aspects of web performance.
  • Infrastructure Scalability Pattern: Sharding Sessions on F5 DevCentral. Using hardware for lower-latency sharding and load balancing.
  • On Quora, Ian McAllister has a good explanation of Amazon's working-backwards approach to product development. Write the press release first, and if it doesn't sound interesting, then you have some work to do.
  • Matthew Fowler gives a good example of using a compute grid to handle the load caused by consolidation of worldwide processing through a single system in order to save operational costs.
  • We are experiencing too much load. Let's add a new server by James Golick. If we're maxed out on one of the resources we need to add capacity, attempting to spin up a new node is only going to make the situation worse.
  • Scriptable Object Cache by Kuch Bhi Kabhi Bhi. What we have done here is inverted the responsibility. Instead of doing stuff in the code and putting it in the cache, what we can do now is do the stuff atomically in the cache itself. 
  • NoSQL, Heroku, and You by Adam Wiggin. Database-as-a-service is one of the coming decade’s most promising business models. Already services like MongoHQ (MongoDB), Cloudant (CouchDB), and Amazon RDS (MySQL) are offering fully hosted and managed databases to apps running in EC2.
  • Redis in Practice: Who’s Online? by Luke Melia. Great explanation of how to use the advanced data structure capabilities of Redis to solve practical problems like figuring out who is online.
  • Show 24 – Internet Exchange and Peering Points. The Packet Pushers with all you need to know on the murky underworld of how the Internet actually is formed: peering.
  • MySQL Paginated displays – How to kill performance vs How to improve performance! Ovais Tariq talks about the best way to paginate, something every system needs to do.
  • The case for a cloud computing price war. The long list of issues discussed in sessions at the various cloud conferences, from security to picking a hypervisor, remains moot until the cloud industry becomes more price competitive with on-premise computing options.
  • Cassandra’s future @facebook and links to other NoSQL slides by Royans. I heard an unconfirmed rumor that Facebook is moving away from Cassandra. Not sure why, or to what, but rumors like this are a concern regardless.
  • Google offers classes on Exploring Computational Thinking. Let's try to remember that computational thinking isn't the only kind of thinking we need, even though it does seem to be the only kind emphasized.
  • NoSQL Database Architectures & Hypertable by Doug Judd of Hypertable, who presents an overview of NoSQL database architectures and, in particular, Hypertable. Hypertable is an implementation of Bigtable.
  • Interesting conversation(s) with Doug Lea. We are discussing the merits of a no-fence CAS instruction and of exposing this instruction at the Java level.
  • Steve Yegge on Scalable Programming Language Analysis. Love the idea. Make tools dump their metadata so we can actually do something useful with all that information.
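Adrian Cockcroft's favorites-list question above comes down to whether a read is guaranteed to overlap the replicas that took the write. Here is a minimal sketch of Dynamo/Cassandra-style tunable quorums with last-write-wins timestamps. It is an illustration under stated assumptions, not how Cassandra or MongoDB actually answered (MongoDB of this era routes writes through a single primary instead). The `Replica` class, zone labels, and helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, Tuple[str, ...]]  # (write timestamp, favorites list)

@dataclass
class Replica:
    zone: str  # hypothetical availability-zone label
    store: Dict[str, Version] = field(default_factory=dict)

def quorum_write(replicas: List[Replica], w: int, key: str,
                 value: Tuple[str, ...], ts: int) -> None:
    # Simplification: only the first w replicas receive the write, standing in
    # for "w acknowledgements arrived; the rest are slow or partitioned".
    for rep in replicas[:w]:
        current = rep.store.get(key)
        if current is None or current[0] < ts:  # last write wins
            rep.store[key] = (ts, value)

def quorum_read(replicas: List[Replica], r: int, key: str) -> Optional[Tuple[str, ...]]:
    # Ask the last r replicas (the ones least likely to have seen the write)
    # and return the version with the newest timestamp.
    versions = [rep.store[key] for rep in replicas[-r:] if key in rep.store]
    if not versions:
        return None
    return max(versions, key=lambda v: v[0])[1]

def favorites_after_write(n: int, w: int, r: int) -> Optional[Tuple[str, ...]]:
    replicas = [Replica(zone=f"zone-{i}") for i in range(n)]
    quorum_write(replicas, w, "favorites:alice", ("Alien",), ts=1)
    return quorum_read(replicas, r, "favorites:alice")

print(favorites_after_write(3, 2, 2))  # R + W > N: the read sees ('Alien',)
print(favorites_after_write(3, 1, 1))  # R + W <= N: None, a stale read
```

The multiple-writer trap also shows up here. With last-write-wins, two clients that each append a different movie at the same moment write whole lists, so one of the additions can silently disappear.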
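Matt Pfeil's Cassandra item above defines scalability as adding resources to gain throughput. Cassandra partitions data across a token ring, and the sketch below shows why adding a node to such a ring is cheap: the new node takes over keys from only one neighbour, and every other key stays put. It assumes an MD5-derived token per key and one token per node, similar in spirit to Cassandra's RandomPartitioner, and uses hypothetical node names. It is not Cassandra's code.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple

def token(key: str) -> int:
    # MD5-derived token, similar in spirit to Cassandra's RandomPartitioner.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class TokenRing:
    def __init__(self, nodes: Iterable[str]):
        self._ring: List[Tuple[int, str]] = sorted((token(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (token(node), node))

    def owner(self, key: str) -> str:
        # A node owns the range (previous token, its own token]; wrap at the end.
        tokens = [t for t, _ in self._ring]
        i = bisect.bisect_left(tokens, token(key)) % len(self._ring)
        return self._ring[i][1]

keys = [f"user:{i}" for i in range(1000)]
ring = TokenRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d:",
      all(after[k] == "node-d" for k in moved))
```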
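The MySQL Push-Down-Join item above explains why joins on a cluster are slow. A toy model makes the round-trip cost concrete. A nested-loop join run in the SQL layer pays one network round trip per outer row, whereas a join pushed down to the data node pays one. `DataNode` and its methods are hypothetical stand-ins that count requests. This is not how MySQL Cluster implements SPJ.

```python
from typing import Dict, List, Tuple

class DataNode:
    """Hypothetical remote storage node; every method call counts as one round trip."""

    def __init__(self, customers: Dict[int, str], orders: List[Tuple[int, int]]):
        self.customers = customers  # customer_id -> name
        self.orders = orders        # (order_id, customer_id)
        self.round_trips = 0

    def get_orders(self) -> List[Tuple[int, int]]:
        self.round_trips += 1
        return list(self.orders)

    def get_customer(self, customer_id: int) -> str:
        self.round_trips += 1
        return self.customers[customer_id]

    def pushed_down_join(self) -> List[Tuple[int, str]]:
        # The whole join runs next to the data: one request, one reply.
        self.round_trips += 1
        return [(oid, self.customers[cid]) for oid, cid in self.orders]

def client_side_join(node: DataNode) -> List[Tuple[int, str]]:
    # Nested-loop join in the SQL layer: one extra round trip per outer row.
    return [(oid, node.get_customer(cid)) for oid, cid in node.get_orders()]

customers = {1: "ann", 2: "bob"}
orders = [(10, 1), (11, 2), (12, 1)]
naive, pushed = DataNode(customers, orders), DataNode(customers, orders)
naive_rows = client_side_join(naive)
pushed_rows = pushed.pushed_down_join()
print(naive.round_trips, pushed.round_trips)  # 4 1
```

With three orders the gap is 4 vs. 1. The naive count grows with the size of the outer table, which is where large speedups on cluster joins come from.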
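James Golick's warning above can be reduced to simple arithmetic. The following toy model assumes a Cassandra-style cluster, where a new node bootstraps by streaming data from existing nodes. That streaming uses the same bottleneck resource (disk or network) that is already saturated. The function names and the `streaming_overhead` and `margin` parameters are hypothetical illustrations, not measurements.

```python
def bootstrap_makes_it_worse(utilization: float, streaming_overhead: float) -> bool:
    """True if streaming data to a new node would push existing nodes past 100%.

    utilization: current load on the bottleneck resource (disk, network, ...), 0.0-1.0.
    streaming_overhead: assumed extra load on that same resource while existing
    nodes stream their data to the new node.
    """
    return utilization + streaming_overhead > 1.0

def latest_safe_utilization(streaming_overhead: float, margin: float = 0.1) -> float:
    """Highest utilization at which starting a bootstrap still leaves `margin` headroom."""
    return 1.0 - streaming_overhead - margin

print(bootstrap_makes_it_worse(0.95, 0.20))  # True: already maxed out
print(bootstrap_makes_it_worse(0.60, 0.20))  # False: enough headroom
print(latest_safe_utilization(0.20))         # roughly 0.7
```

The practical reading is that the trigger for adding capacity has to sit well below saturation, because the act of adding capacity consumes some of it.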
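The Scriptable Object Cache item above inverts responsibility. Instead of getting a value, changing it in application code, and putting it back (which races with other writers), the caller hands the change to the cache and the cache applies it atomically. The sketch below is a single-process illustration of that idea using a lock. The `ScriptableCache` class and its `execute` method are hypothetical and are not the post's implementation. In a distributed cache the "script" would have to run on the cache server itself.

```python
import threading
from typing import Any, Callable, Dict

class ScriptableCache:
    """Single-process cache that runs caller-supplied functions on an entry atomically."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._data.get(key, default)

    def execute(self, key: str, script: Callable[[Any], Any], default: Any = None) -> Any:
        # Read, transform, and write back under one lock, so no other caller
        # can slip in between the read and the write.
        with self._lock:
            new_value = script(self._data.get(key, default))
            self._data[key] = new_value
            return new_value

cache = ScriptableCache()

def bump_views() -> None:
    for _ in range(1000):
        cache.execute("page:views", lambda v: v + 1, default=0)

workers = [threading.Thread(target=bump_views) for _ in range(8)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(cache.get("page:views"))  # 8000: no increments lost
```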
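The pagination item above concerns a cost that grows with the page number. With `LIMIT ... OFFSET`, the database must walk past and discard every earlier row, whereas "seek" (keyset) pagination remembers the last key shown and jumps straight to it through an index. The sketch below compares the two. It uses SQLite from the Python standard library so it runs anywhere, and the same SQL shape works in MySQL. The `posts` table and function names are hypothetical, and keyset paging is offered as a common technique, not necessarily the exact fix Ovais Tariq recommends.

```python
import sqlite3
from typing import List, Tuple

def make_db(n_rows: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)",
                     ((i, f"post {i}") for i in range(1, n_rows + 1)))
    return conn

def page_by_offset(conn: sqlite3.Connection, page: int, per_page: int) -> List[Tuple[int, str]]:
    # The server walks and discards page * per_page rows before returning any.
    return conn.execute(
        "SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (per_page, page * per_page)).fetchall()

def page_after(conn: sqlite3.Connection, last_seen_id: int, per_page: int) -> List[Tuple[int, str]]:
    # Seek directly to the next rows through the primary-key index.
    return conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, per_page)).fetchall()

conn = make_db(1000)
by_offset = page_by_offset(conn, page=3, per_page=10)   # rows 31-40
by_key = page_after(conn, last_seen_id=30, per_page=10)  # same rows
print(by_offset == by_key, by_offset[0])  # True (31, 'post 31')
```

The trade-off is that keyset pagination supports "next page" links naturally, but it cannot jump to an arbitrary page number without extra bookkeeping.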