Hot Scalability Links For Oct 24, 2010

On a cold and rainy Fall day, a day stolen from winter rather than our usual gorgeous Indian Summers, a day not even the SF Giants winning the pennant can help warm, here are some hot links to read by a digital flame:

  • Using MySQL as a NoSQL - A story for exceeding 750,000 qps on a commodity server by Yoshinori Matsunobu. Wonderfully detailed post on how you can look up a row by ID really fast if you bypass all the typical MySQL query parsing overhead.
  • Minecraftwiki.net and minecraftforum.net now serve more traffic than Slashdot and Stackoverflow! 1 million pageviews and 100k uniques per day, per site; 10TB of bandwidth a month; 4+ machines running Varnish, HAProxy, PHP, MySQL, Nginx.
  • Stuff the Internet Says:
    • @old_sound: Somebody make me a t-shirt that says "I've read the CAP theorem and I liked it" 
    • : How relevant do I think the CAP theorem is? Not at all. I honestly hate conversations where anyone talks about crap.. cap, sorry. 
    • @humidbeing: If you hit limits of mysql, why reinvent the wheel by rolling your own solution when DBs like MS SQL and Oracle have proven scalability?
    • Ayende Rahien: You saved 5 cents, and your code is not readable, congrats!
    • @SutraLite: Is scalability important for a start-up? If so why?
  • Baron Schwartz with a great discussion of the problem of how to replicate data between servers. There's no good answer, it seems. MySQL Limitations Part 1: Single-Threaded Replication
  • The Doors said You Cannot Petition the Lord with Prayer and Coda Hale says You Can't Sacrifice Partition Tolerance either. Of the CAP theorem’s Consistency, Availability, and Partition Tolerance, Partition Tolerance is mandatory in distributed systems. You cannot not choose it. Instead of CAP, you should think about your availability in terms of yield (percent of requests answered successfully) and harvest (percent of required data actually included in the responses) and which of these two your system will sacrifice when failures happen.
  • Tony Bain leaves Santa Claus alone and goes after Some NoSQL Myths instead. NoSQL can be fully consistent, Cassandra didn't kill Digg, Tweets are still stored in MySQL, NoSQL doesn't offer unlimited scalability, and the object-relational impedance mismatch still lives in the heart of systems.
  • Kenn Ejima gives his Thoughts on Redis and Jeff Darcy responds with his thoughts on the thoughts. This pretty much sums up the spirit of the exchange: Here, after several paragraphs of pointless bashing, we get to the real nub of the matter: you don’t care about scalability, thus you’ve never invested the time necessary to understand it, and so you play “sour grapes” by dismissing it as a goal.
  • High-End Varnish – 275 thousand requests per second by Kristian Lyngstol. The sheer number of parallel execution threads is staggering...
  • Riak introduces a new search feature, which is notable because even after all the talk about Polyglot Persistence, who really wants to implement search themselves?
  • Beyond Hadoop: Next-Generation Big Data Architectures by Bill McColl. Hadoop already old school? Then have a look at SQL, Cloudscale, MPI, BSP, Pregel, Dremel, or Percolator.
  • Onix: A Distributed Control Platform for Large-scale Production Networks. There has been recent interest in a new networking paradigm called Software-Defined Networking (SDN). The crucial enabler for SDN is a distributed control platform that shields developers from the details of the underlying physical infrastructure and allows them to write sophisticated control logic against a high-level API.
  • Thinking Clearly about Performance by Cary Millsap. A really excellent paper on how performance works. Some of the topics: Response Time vs Throughput; Percentile Specifications; Problem Diagnosis; Amdahl’s Law; Minimizing Risk; Load; Queueing Delay; The Knee; Relevance of the Knee; Capacity Planning; Random Arrivals; Coherency Delay; Measuring; Performance is a Feature. I especially found the discussion of knees in graphs useful because this comes up a lot in real life.
  • Josh Berkus with some Lessons from PostgreSQL's Git transition. The biggest lesson, though, is not to be in a hurry! It was over three years from PostgreSQL's first Git mirror to final conversion, and 16 months of actual preparation. If you take your time and are ready to retry things that don't work the first time, you should be able to have a successful migration to Git.
  • Kent Langley has rounded up a lot of good links to ZeroMQ.
  • Royans gives his scalability links for October 21st as well as a nice article on Scaling Graphite by using Cfmap as the data transport.
  • Scalability of Scissors. Really cool scissors where you can set the blade length. Genius.
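
For anyone who wants to play with the yield and harvest idea from the Coda Hale item above, here is a minimal sketch in Python. It is not taken from his post: the Response record, its field names, and the choice to compute harvest only over answered requests are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    """Hypothetical per-request record; the field names are illustrative."""
    answered: bool        # was the request answered successfully?
    parts_required: int   # pieces of data a complete answer needs
    parts_included: int   # pieces of data actually returned


def yield_pct(responses: List[Response]) -> float:
    """Percent of requests answered successfully."""
    if not responses:
        raise ValueError("need at least one response")
    answered = sum(1 for r in responses if r.answered)
    return 100.0 * answered / len(responses)


def harvest_pct(responses: List[Response]) -> float:
    """Percent of required data actually included in the responses.

    Assumption: only answered requests count toward harvest, and the
    ratio is taken over the pooled totals rather than averaged per request.
    """
    answered = [r for r in responses if r.answered]
    required = sum(r.parts_required for r in answered)
    if required == 0:
        raise ValueError("no required data among answered responses")
    included = sum(r.parts_included for r in answered)
    return 100.0 * included / required


# Four requests against a store split into four partitions:
# one request failed outright, one came back missing a partition.
sample = [
    Response(True, 4, 4),
    Response(True, 4, 3),
    Response(False, 4, 0),
    Response(True, 4, 4),
]
sample_yield = yield_pct(sample)
sample_harvest = harvest_pct(sample)
```

A system that drops requests during a failure gives up yield; one that answers with partial data gives up harvest. Tracking both numbers makes it visible which one a given design is actually sacrificing.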