
How can I learn to scale my project?

This question was asked on the ycombinator list, and it drew some good responses. I gave a quick response, but I particularly like neilk's insightful, knock-it-out-of-the-park answer:
  • Read Cal Henderson's book. (I'd add in Theo's book and Release It! too)
  • The center of your design should be the data store, not a process. You transition the data store from state to state, securely and reliably, in small increments.
  • Avoid globals and session state. The more "pure" your function is, the easier it will be to cache or partition.
  • Don't make your data store too smart. Calculations and renderings should happen in a separate, asynchronous process.
  • The data store should be able to handle lots of concurrent connections. Minimize locking. (Read about optimistic locking).
  • Protect your algorithm from the implementation of the data store, with a helper class or module or whatever. But don't (DO NOT) try to build a framework for any conceivable query. Just the ones your algorithm needs.
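
To make the last two points concrete, here is a minimal sketch of optimistic locking behind a narrow helper class. It is not neilk's code: the names `OrderStore`, `write_if_unchanged`, and `mark_paid` are hypothetical, and a plain dict stands in for the real data store. The idea is that a writer reads a version number, does its work without holding a lock, and the write succeeds only if the version is still the one it read.

```python
import threading


class VersionConflict(Exception):
    """Raised when a record changed between our read and our write."""


class OrderStore:
    """Hypothetical helper that hides the data store from the algorithm.

    It exposes only the operations the caller needs, not a general
    query framework. A dict stands in for the real database.
    """

    def __init__(self):
        self._rows = {}                 # order_id -> (version, state)
        self._guard = threading.Lock()  # held only for the brief compare-and-set

    def create(self, order_id, state):
        with self._guard:
            self._rows[order_id] = (1, state)

    def read(self, order_id):
        """Return (version, state) without taking a long-lived lock."""
        with self._guard:
            return self._rows[order_id]

    def write_if_unchanged(self, order_id, expected_version, new_state):
        """Store new_state only if nobody wrote since expected_version."""
        with self._guard:
            version, _ = self._rows[order_id]
            if version != expected_version:
                raise VersionConflict(order_id)
            self._rows[order_id] = (version + 1, new_state)
            return version + 1


def mark_paid(store, order_id, retries=3):
    """Move an order to 'paid', retrying if a concurrent writer got in first."""
    for _ in range(retries):
        version, state = store.read(order_id)
        if state == "paid":
            return version
        try:
            return store.write_if_unchanged(order_id, version, "paid")
        except VersionConflict:
            continue  # someone else wrote first: re-read and try again
    raise VersionConflict(order_id)
```

In a SQL store the same check is usually an `UPDATE ... WHERE id = ? AND version = ?` followed by a look at the affected row count. The algorithm only ever sees `read` and `write_if_unchanged`, so the store behind them can change without touching it.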

    Viewing an application as a series of state transitions, instead of a blizzard of actions and events, is a very underappreciated design perspective. It is one of the key design approaches for making robust embedded systems. A great paper on this sort of thing is Mission Planning and Execution Within the Mission Data System, an effort to make engineering flight software more straightforward and less prone to error through the explicit modeling of spacecraft state. Another interesting paper is CLEaR: Closed Loop Execution and Recovery High-Level Onboard Autonomy for Rover Operations.

    In general I call these Fact Based Architectures. I'm really glad neilk brought it up.
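
    As a rough illustration of the state-transition view (my own toy example, not something taken from the papers above), the legal moves can be written down as data and applied by a pure function. The states, the events, and the names `TRANSITIONS`, `next_state`, and `replay` are all assumptions made up for the sketch.

```python
# Allowed transitions: (current_state, event) -> next_state.
# Anything not listed here is rejected instead of silently handled.
TRANSITIONS = {
    ("new", "pay"): "paid",
    ("paid", "ship"): "shipped",
    ("new", "cancel"): "cancelled",
    ("paid", "cancel"): "cancelled",
}


def next_state(state, event):
    """Pure function: reads no globals besides the table, mutates nothing."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}")


def replay(events, start="new"):
    """Apply a sequence of events one small step at a time."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

    Because `next_state` depends only on its arguments, it is easy to test, cache, or run on any machine, and the stored state is always one of a small, known set of values.
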
Reader Comments (3)

    Curious: what is Cal Henderson's book? Title? URL?

    Scale is definitely about the data.
    Many developers, however, focus on the code at far too low a level, worrying about single versus double quotes and other trivial minutiae.

    If you trawl through the presentations by the architects of sites like YouTube you find that the challenges of scale are architectural and impact the data store the most - with requirements like shards and partitioning.

    November 29, 1990 | Unregistered Commenter Toby Hede

    Cal Henderson's book is:
    Building Scalable Web Sites: Building, scaling, and optimizing the next generation of web applications


    November 29, 1990 | Unregistered Commenter Kris Buytaert

    The suggestions you make are all very good. I'd add one more dimension, especially for web sites that cater to large numbers of users:

    * Prepare to scale your app horizontally by splitting your data up by user and mapping groups of users to clusters. This way you can expand indefinitely: for every N users you add another cluster of web+app+db servers.
    * To prepare for this, keep data for each user well separated from other users (don't share gratuitously) and add a user_id or account_id field to each table to make it easy to grab a user's data and move it to a different database.
    * If you have "friend" type of links that connect users with each other, keep that as separate as possible since you'll have to special-case that when you move to multiple clusters.
    * Also, keep the user table as simple as possible as you'll have to replicate that so you can direct each user to the right cluster when they log in.

    November 29, 1990 | Unregistered Commenter Thorsten
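
    The user-to-cluster mapping described in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ClusterDirectory`, its methods, and the fill-the-first-cluster-with-room policy are hypothetical, and an in-memory dict stands in for the small, replicated user table.

```python
class ClusterDirectory:
    """Hypothetical lookup from user_id to the cluster holding that user's data."""

    def __init__(self, clusters, users_per_cluster):
        self.clusters = list(clusters)
        self.users_per_cluster = users_per_cluster
        self._assignment = {}  # user_id -> cluster name; small, easy to replicate

    def assign(self, user_id):
        """Place a new user on the first cluster that still has room."""
        if user_id in self._assignment:
            return self._assignment[user_id]
        counts = {name: 0 for name in self.clusters}
        for name in self._assignment.values():
            counts[name] += 1
        for name in self.clusters:
            if counts[name] < self.users_per_cluster:
                self._assignment[user_id] = name
                return name
        raise RuntimeError("all clusters are full: add another cluster")

    def add_cluster(self, name):
        """Bring a new web+app+db cluster online for the next N users."""
        self.clusters.append(name)

    def cluster_for(self, user_id):
        """Used at login to direct the user to the right cluster."""
        return self._assignment[user_id]

    def move(self, user_id, target):
        """Record that a user's rows were copied to another cluster."""
        if target not in self.clusters:
            raise ValueError(f"unknown cluster {target!r}")
        self._assignment[user_id] = target
```

    Moving a user is only this cheap if every table carries the user_id or account_id column, so that the user's rows can be selected and copied in one pass before the directory entry is updated.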
