Hot Scalability Links For Aug 20, 2010

Lots of good links this week...

  • Membase, powering Farmville's 500k operations *per second*. Of course, some people contend they could do this on their old Vic-20, but this is a useful, vigorous discussion thread on Reddit.
  • Tweets of Gold:
    • kbsingh: I dont understand why some developers think its ok to leave operations people out of scalability decisions
    • karmazilla: I find it a little odd when a database claims to support "massive scalability" when it is not distributed.
    • pcapr: OH: teenagers are eventually consistent
    • tv: Verb suggestion for the act of mapreducing data: "marinating". "Then we marinade it to get the n-gram frequencies."
  • Superfeedr makes The Case against Rate Limiting. Push, don't poll. Of course, the receiving systems may still need to rate limit.
  • Multi-core, Threads & Message Passing by Ilya Grigorik. We need threads, we need events, and we need message passing - it is not a question of which is better.
  • Doug Cutting gives an excellent presentation at Digg about why Avro should be used over Thrift for data serialization on the Hadoop platform. A schema is sent with the data, which makes for smaller payloads, strong typing, easier schema evolution, and no need to generate code stubs. Schemas are defined in JSON so they are easy to parse. Small and fast.
  • Google App Engine now supports multi-tenancy and high performance image serving.
  • Video from Berlin Buzzwords Conference. A conference for developers and users of open source software projects, focussing on the issues of scalable search, data analysis in the cloud, and NoSQL databases.
  • Transformer: A New Paradigm for Building Data-Parallel Programming Models. Transformer has two layers: a common runtime system and a model-specific system. Using Transformer, the authors show how to implement three programming models: Dryad-like data flow, MapReduce, and All-Pairs.
  • Clojure Workers and Large Scale HTTP Fetching. Covers lots of core issues like queueing, distribution, task models, blocking vs non-blocking, too many open file problems, and so on.
  • Cloudant explains the architecture for CouchDB clusters. It seems very similar to Cassandra and other NoSQL systems, so I guess that means people will like it.
  • Great explanation of Consistent Hashing by Tom Kleinpeter. Strangely, it does not involve potatoes...
  • Polyglot Persistence: Integrating Low-Latency NoSQL Systems (Cassandra and InfiniteGraph). Todd Stavish began to wonder if it would be possible to extract data from Cassandra for analysis in a graph database. In other words, to implement a polyglot persistence application by fusing InfiniteGraph and Cassandra, in order to understand what each data store can give to the other.
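
For anyone who hasn't met consistent hashing before, here is a minimal Python sketch of the general idea. It is not taken from Tom Kleinpeter's article: the `HashRing` class, the use of SHA-256, and the 100 virtual points per node are all assumptions chosen for illustration. Nodes are hashed onto a ring, a key belongs to the first node point at or after its own hash, and adding a node only moves the keys that land on the new node.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring (illustrative names, not from any library)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._keys = []           # sorted ring positions
        self._ring = {}           # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # Any well-mixed hash works; SHA-256 is an arbitrary choice here.
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Place several virtual points per node to even out the load.
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self._ring[pos] = node
            bisect.insort(self._keys, pos)

    def remove(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            del self._ring[pos]
            self._keys.remove(pos)

    def get(self, key):
        if not self._keys:
            raise KeyError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]
```

A ring built with `HashRing(["a", "b", "c"])` answers `ring.get("user:42")` with one of the three nodes. After `ring.add("d")`, any key whose owner changed now belongs to `"d"`; the rest stay where they were, which is the whole point of the technique.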

Interested in advertising a job, product, service, or event? Then contact us.