This question was asked on the Y Combinator list, and it drew some good responses. I gave a quick response, but I particularly like neilk's insightful, knock-it-out-of-the-park answer:
Viewing an application as a series of state transitions, instead of a blizzard of actions and events, is a badly underappreciated design perspective. It is one of the key design approaches for making robust embedded systems. A great paper on this topic is Mission Planning and Execution Within the Mission Data System, an effort to make engineering flight software more straightforward and less prone to error through the explicit modeling of spacecraft state. Another interesting paper is CLEaR: Closed Loop Execution and Recovery High-Level Onboard Autonomy for Rover Operations.
In general I call these Fact Based Architectures. I'm really glad neilk brought it up.
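To make the state-transition view concrete, here is a minimal Python sketch. The states, events, and the `TRANSITIONS` table are hypothetical and are not taken from either paper; the point is only that every legal transition is written down in one place, so anything not listed is rejected instead of being handled ad hoc.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    HEATING = auto()
    READY = auto()
    FAULT = auto()


# Hypothetical table: every legal (state, event) pair is listed explicitly.
TRANSITIONS = {
    (State.IDLE, "power_on"): State.HEATING,
    (State.HEATING, "target_reached"): State.READY,
    (State.HEATING, "sensor_error"): State.FAULT,
    (State.READY, "power_off"): State.IDLE,
    (State.FAULT, "reset"): State.IDLE,
}


def step(state, event):
    """Return the next state, or raise if the event is not allowed here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not valid in state {state.name}"
        ) from None


def run(events, state=State.IDLE):
    """Apply a sequence of events, starting from the given state."""
    for event in events:
        state = step(state, event)
    return state
```

Calling `run(["power_on", "target_reached"])` walks the table from `IDLE` to `READY`, while an event that is not listed for the current state fails loudly.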
Comments
Re: How can I learn to scale my project?
Curious: what is Cal Henderson's book? Title? URL?
Scale is definitely about the data.
Many developers, however, focus on the code at far too low a level, worrying about single versus double quotes and other trivial minutiae.
If you trawl through the presentations by the architects of sites like YouTube, you find that the challenges of scale are architectural and hit the data store hardest, with requirements like sharding and partitioning.
Re: How can I learn to scale my project?
Cal Henderson's book is:
Building Scalable Web Sites: Building, scaling, and optimizing the next generation of web applications
http://www.amazon.com/Building-Scalable-Web-Sites-applications/dp/059610...
Re: How can I learn to scale my project?
The suggestions you make are all very good. I'd add another dimension, especially for web sites that cater to large numbers of users:
* Prepare to scale your app horizontally by splitting your data up by user and mapping groups of users to clusters. This way you can expand indefinitely: for every N users you add another cluster of web+app+db servers.
* To prepare for this, keep data for each user well separated from other users (don't share gratuitously) and add a user_id or account_id field to each table to make it easy to grab a user's data and move it to a different database.
* If you have "friend" type of links that connect users with each other, keep that as separate as possible since you'll have to special-case that when you move to multiple clusters.
* Also, keep the user table as simple as possible as you'll have to replicate that so you can direct each user to the right cluster when they log in.
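The points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `UserDirectory` class, the `databases` dictionary standing in for per-cluster databases, and the tiny `USERS_PER_CLUSTER` value are all hypothetical. It shows a simple directory that maps each user to a cluster, and how a `user_id` field on every row makes it easy to move one user's data elsewhere.

```python
from collections import defaultdict

USERS_PER_CLUSTER = 2  # hypothetical and tiny, so the example is easy to follow


class UserDirectory:
    """Small lookup table (the part you would replicate): user_id -> cluster."""

    def __init__(self):
        self.cluster_of = {}
        self.clusters = []  # cluster names in creation order

    def register(self, user_id):
        # Add a new cluster whenever the existing ones are full.
        if len(self.cluster_of) >= len(self.clusters) * USERS_PER_CLUSTER:
            self.clusters.append(f"cluster-{len(self.clusters) + 1}")
        self.cluster_of[user_id] = self.clusters[-1]
        return self.cluster_of[user_id]


# One in-memory stand-in "database" per cluster: table name -> list of rows.
databases = defaultdict(lambda: defaultdict(list))


def insert(directory, table, row):
    """Store a row in whichever cluster owns the row's user_id."""
    cluster = directory.cluster_of[row["user_id"]]
    databases[cluster][table].append(row)


def move_user(directory, user_id, target):
    """Move every row carrying this user_id to the target cluster."""
    source = directory.cluster_of[user_id]
    if source == target:
        return
    for table, rows in databases[source].items():
        mine = [r for r in rows if r["user_id"] == user_id]
        databases[target][table].extend(mine)
        rows[:] = [r for r in rows if r["user_id"] != user_id]
    directory.cluster_of[user_id] = target
```

Because every row carries a `user_id`, `move_user` never needs to know what a table means. Cross-user "friend" links are deliberately left out here, since they are the part that needs special-casing.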
Re: How can I learn to scale my project?
I recently wrote about the lessons from eBay, Amazon, and LinkedIn in one of my posts: Architecture You Always Wondered About: Lessons Learned at Qcon.
Below is a summary of the main bullets:
Scalability -- How to Do It Right
* any synchronous interaction with the data or business logic tier. Instead, use an event-driven approach and workflow
* that it will fit the partitioning model
* get the most out of the available resources. A good place to use parallel execution is for processing user requests. In this case, multiple instances of each service can take the requests from the messaging system and execute them in parallel. Another place for parallel processing is using MapReduce for performing aggregated requests on partitioned data
* (LinkedIn seems to fall into this category well), database replication can help load-balance the read load by splitting the read requests among the replicated database nodes
* of the hot topics of the conference, which also sparked some discussion during one of the panels I participated in. An argument was made that to reach scalability you have to sacrifice consistency and handle consistency in your applications using things such as optimistic locking and asynchronous error-handling. It also assumes that you will need to handle idempotency in your code. My argument was that while this pattern addresses scalability, it creates complexity and is therefore error-prone. During another panel, Dan Pritchett argued that there are ways to avoid this level of complexity and still achieve the same goal, as I outlined in this blog post.
* agreement that the database bottleneck can only be solved if database interactions happen in the background.
Quoting Werner Vogels again: "To scale: No direct access to the database anymore. Instead data access is encapsulated in services (code and data together), with a stable, public interface."
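The summary above mentions multiple instances of a service taking requests from a messaging system and executing them in parallel. A minimal Python sketch of that idea follows, assuming an in-process `queue.Queue` as a stand-in for the messaging system and a hypothetical `handle` function as the business logic; a real deployment would use separate processes or machines and a real message broker.

```python
import queue
import threading


def handle(payload):
    """Hypothetical stand-in for business logic; here it squares a number."""
    return payload * payload


def worker(requests, results):
    """One service instance: pull requests off the queue until told to stop."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: no more work for this worker
            break
        request_id, payload = item
        results[request_id] = handle(payload)


def process(payloads, n_workers=4):
    """Enqueue all payloads and let n_workers instances drain the queue."""
    requests = queue.Queue()
    results = {}  # each request_id is written by exactly one worker
    threads = [
        threading.Thread(target=worker, args=(requests, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for request_id, payload in enumerate(payloads):
        requests.put((request_id, payload))
    for _ in threads:
        requests.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return [results[i] for i in range(len(payloads))]
```

The caller never talks to a worker directly: it only puts messages on the queue, so adding capacity means starting more workers rather than changing the caller.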
You should also check out the following white paper: The Scalability Revolution: From Dead End to Open Road.
At GigaSpaces we're delivering a platform that does all that for you, so you can try it out.
HTH
Nati S.