Papers: Readings in Distributed Systems

Marton Trencseni has collected a wonderful list of different papers on distributed systems. He's organized them into the following sections: The Google Papers, Distributed Filesystems, Non-relational Distributed Databases, The Lamport Papers, and Implementation Issues. Many old favorites on the list and some that are likely new to you. My new favorite is "Frangipani: A Scalable Distributed File System." How can you not love "Frangipani" as a word?

Click to read more ...


Scaling in Games & Virtual Worlds  

"Online games and virtual worlds have familiar scaling requirements, but don’t be fooled: everything you know is wrong." Jim Waldo, Sun Microsystems Laboratories * The computational environment for online games or virtual worlds is close to the exact inverse of that found in most markets serviced by the high-tech industry. * The need for a heavyweight client is, in part, an outcome of the evolution of these games. * Latency is the enemy of fun—and therefore the enemy of online games and virtual worlds. * The game server is used both to discourage cheating (by making it much more difficult) and to detect cheating (by seeing patterns of divergence between the game state reported by the client and the game state held by the server). Peer-to-peer technologies might seem a natural fit for the first role of the game server, but this second role means that few if any games or worlds trust their peers enough to avoid the server component. * Using multiple servers is a basic mechanism for scaling the server component of a game to the levels that are being seen in the online world today. * Having multiple servers means that part of building the game is deciding how to partition the load over these servers. The first technique is to exploit the geography of the game or world. The second technique is known as sharding. * While shards allow scale, they do so at the price of player interaction. * The problem is that the culture that has grown up around games and virtual worlds is not one that understands or is overly familiar with the programming techniques that are required to exploit the parallelism inherent in these systems. * It is for these reasons that we started Project Darkstar (, a research effort attempting to build a server-side infrastructure that will exploit the multithreaded, multicore chips being produced and scaled over a large group of machines, while presenting the programmer with the illusion that he or she is developing in a single-threaded, single-machine environment. *The model is a simple event-based one in which input from the client is received by the server, which then sets off a task in response to that event. * This mechanism for concurrency control does require that all tasks access all of their data through the Darkstar data service. Our current implementation uses the Berkeley Database. we believe that we can keep the penalty for accessing through a data service small by caching data in intelligent ways. We also believe that by using the inherent parallelism in these games, we can increase the overall performance of the game as the number of players increases, even if there is a small penalty for individual data access. * We found that additional machines lowered the capacity of the overall system. We are working on removing the choke points so that adding equipment actually adds capacity.

Click to read more ...


Intro to Caching,Caching algorithms and caching frameworks part 1

Informative and well organized post on caching. Talks about: Why do we need cache?, What is Cache?, Cache Hit, Cache Miss, Storage Cost, Retrieval Cost, Invalidation, Replacement Policy, Optimal Replacement Policy, Caching Algorithms, Least Frequently Used (LFU), Least Recently Used (LRU), Least Recently Used 2(LRU2), Two Queues, Adaptive Replacement Cache (ACR), Most Recently Used (MRU), First in First out (FIFO), Distributed caching, Measuring Cache.

Click to read more ...


Just-In-Time Scalability: Agile Methods to Support Massive Growth (IMVU case study)

We started with a small site, a mess of open source, and a small team that didn't know much about scaling.

We ended with a large site, a medium sized team, and an architecture that has scaled.

We never stopped. We used a roadmap and a compass, made weekly changes in direction, regularly shipped code on Wednesday to handle the next weekend's capacity constraints, and shipped new features the whole time.

These are excerpts from the IMVU PDF presentation of their architecture which can be viewed or downloaded here.
IMVU is an online destination where adults and teens meet new people in 3D. IMVU won the 2008 Virtual Worlds Innovation Award and was also named a Rising Star in the 2008 Silicon Valley Technology Fast 50 program.

Click to read more ...


Database Sharding for startups

The most important aspect of a scalable web architecture is data partitioning. Most components in a modern data center are completely stateless, meaning they just do batches of work that is handed to them, but don't store any data long-term. This is true of most web application servers, caches like memcached, and all of the network infrastructure that connects them. Data storage is becoming a specialized function, delegated most often to relational databases. This makes sense, because stateless servers are easiest to scale - you just keep adding more. Since they don't store anything, failures are easy to handle too - just take it out of rotation.

Stateful servers require more careful attention. If you are storing all of your data in a relational database, and the load on that database exceeds its capacity, there is no automatic solution that allows you to simply add more hardware and scale up. (One day, there will be, but that's for another post). In the meantime, most websites are building their own scalable clusters using sharding.

Read more on LessonLearned blog.

Click to read more ...


Reducing Your Website's Bandwidth Usage - how to

In this article Jeff Atwood (a rockstar programmer and one of StackOverflow website founders) discusses the measures of how you can reduce you bandwidth usage and refers specifically on high trafficked websites for which bandwidth is more costly than for an average website.

This is his experience and you can read more on his post on

Click to read more ...


Product: Gearman - Open Source Message Queuing System

Update: New Gearman Server & Library in C, MySQL UDFs. Gearman is an open source message queuing system that makes it easy to do distributed job processing using multiple languages. With Gearman you: farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, to call functions between languages, spread CPU usage around your network. Gearman is used by companies like LiveJournal, Yahoo!, and Digg. Digg, for example, runs 300,000 jobs a day through Gearman without any issues. Most large sites use something similar. Why would anyone ever even need a message queuing system? Message queuing is a handy way to move work off your web servers (like image manipulation), to generate thousands of documents in the background, to run the multiple requests in parallel needed to build a web page, or to perform tasks that can comfortably be run in the background and not part of the main request loop for servicing a web request. There's a gearmand server and clients written in Perl, Ruby, Python or C. Use at least two gearmand server daemons for higher availability. The tasks each client can perform are registered with gearman distributes requests for those functions to the client that can implement them. Gearman uses a very robust, if somewhat higher latency, signal-and-pull architecture.

  • According to dormando the flow goes like: * worker connects to all gearmand servers. * worker registers what functions it supports. * worker asks for jobs. * if no jobs, sends command 'pre_sleep' to all gearmand's and sleeps.
  • Client does: * Connect to gearmand. * submit's a job for a particular func.
  • Gearmand does: * Acks the job, finds all *sleeping workers* related to the function. * Sends them all a 'noop' command to wake them up.
  • Worker does: * Urk, I'm awake now. * Worker asks for jobs. * If jobs, do work. * If no jobs, sends command 'pre_sleep' to all gearmand's, etc. Gearman uses an efficient binary protocol and no XML. There's an a line-based text protocol for admin so you can use telnet and hook into Nagios plugins. The system makes no guarantees. If there's a failure the client is told about the failure and the client is responsible for retries. And the queue isn’t persistent. If gearman is restarted the queue is gone.

    Related Articles

  • Gearman Wiki
  • German Google Groups
  • Queue everything and delight everyone by Leslie Michael Orchard.
  • USENIX 2007. Starts at slide 83.
  • PEAR and Gearman by Daniel O'Connor.
  • Amazon Architecture

    Click to read more ...

  • Monday

    Getting ready for the cloud

    This presentation illustrates how one can scale EXISTING JEE application and deploy it on Amazon cloud using GigaSpaces as the scale-out application server while: * Not having to re-write your application * Preventing lock-in to specific cloud provider * Enabling seamless portability between your local environment to cloud environment o No code or configuration change is required between the two environments o Develop local - test on the cloud o Built for iterative development

    Click to read more ...


    17 Distributed Systems and Web Scalability Resources

    Here's a short list of some great resources that I've found very inspirational and thought provoking. I've broken these resources up into two lists: Blogs and Presentations.


    Paper: Sharding with Oracle Database

    The upshot of the paper is Oracle rules and MySQL sucks for sharding. Which is technically probable, if you don't throw in minor points like cost and ease of use. The points where they think Oracle wins: online schema changes, more robust replication, higher availability, better corruption handling, better use of large RAM and multiple cores, better and better tested partitioning features, better monitoring, and better gas mileage.

    Click to read more ...