13 Scalability Best Practices

AKF Partners has released what they feel are the Best Practices for Scalability:

  1. Asynchronous - Use asynchronous communication when possible.
  2. Swim Lanes - Create fault-isolated “swim lanes” of hardware by customer segmentation.
  3. Cache - Make use of cache at multiple layers.
  4. Monitoring - Understand your application’s performance from a customer’s perspective.
  5. Replication - Replicate databases for recovery as well as to offload reads to multiple instances.
  6. Sharding - Split the application and databases by service and/or by customer using a modulus.
  7. Use Few RDBMS Features - Use the OLTP database as a persistent storage device as much as possible.
  8. Slow Roll - Roll out new code versions slowly, to a small subset of your servers, without bringing the entire site down.
  9. Load & Performance Testing - Test the performance of each application version before it goes into production.
  10. Capacity Planning / Scalability Summits - Know how much capacity you have on all tiers and services in your system.
  11. Rollback - Always have the ability to roll back a code release.
  12. Root Cause Analysis - Foster a learning culture that uses Root Cause Analysis to find and fix the real cause of issues.
  13. Quality From The Beginning - Quality can’t be tested into a product; it must be designed in from the beginning.
This is just a quick summary; more details are on their site.
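
To make point 6 a little more concrete, here is a minimal sketch of splitting customers across databases with a modulus. The host names and the helper functions `shard_for` and `host_for` are hypothetical, made up for illustration; the list above does not prescribe any particular implementation.

```python
# Hypothetical shard map: these host names are placeholders, not from the article.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(customer_id: int, shard_count: int = len(SHARDS)) -> int:
    """Return the shard index for a customer using a simple modulus."""
    if customer_id < 0:
        raise ValueError("customer_id must be non-negative")
    return customer_id % shard_count


def host_for(customer_id: int) -> str:
    """Look up the (hypothetical) database host that owns this customer."""
    return SHARDS[shard_for(customer_id)]
```

With four shards, customer 10 lands on index 2 and customer 7 on index 3. One caveat with a plain modulus: changing the number of shards remaps most customers, so a resize usually means moving data.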

Reader Comments (3)

Some good stuff here, but I feel like there is a bunch of FUD to go along with it:

1: You don't need async to mess up your code - run lots of threads, no shared state, and use sync calls. Everything will be much more readable in general. Use async *only* when this fails (yes, there are such scenarios, but running out of threads is not one of them, unless you are writing a high-performance web server from scratch, in which case you should be asking some bigger questions).
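
A minimal sketch of the "lots of threads, no shared state, sync calls" style, assuming Python's standard `concurrent.futures` thread pool. The `handle_request` and `serve` functions are hypothetical stand-ins, not anything from the original list.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> str:
    """Hypothetical handler: plain blocking code with no shared mutable state."""
    # A real handler would make ordinary blocking calls here (database, HTTP, disk).
    return f"request {request_id} done"


def serve(request_ids, max_workers: int = 8):
    """Run each request on a worker thread and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_request, request_ids))
```

Each handler reads top to bottom like single-threaded code; the concurrency lives entirely in the pool.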

5, 6, 7: They make a roundabout case for NoSQL. In short: use Cassandra (or Voldemort or any of the other scale-out key-value stores).

9: Perf testing - while this is rather ideal, in most realistic scenarios it's really not worthwhile doing anything more than rudimentary testing. Roll things out, you will get loaded, fix it. This does not apply to you if you are Twitter.

10: Cap planning - again, quite the luxury. Once you figure out you don't have enough, what's next? Shut down servers? I'm guessing not. So you'll probably go with degraded service offerings. If you were going to do that anyway (and it's a good idea), why bother with this? Anything more than back-of-the-envelope calculations (okay, fine, some Excel) is a waste of time.

13: No waaay. Keen insight there.

So if you take out the nonsense (IMO) ones, you get:
1. Caching
2. Monitoring
3. NoSQL
4. Sane release procedures (rolling updates, rollback)

I think they forgot the single *most* important one:
Keep. It. Simple.

Apologies for my wording / tone.

November 29, 1990 | Unregistered Commenter Roosh

All the 13 best practices make a lot of sense. Thanks for putting this together.

Caching at multiple layers is pretty tricky, because it may lead to cache-coherency issues. Cache incoherency means that different layers can be out of sync. You may need advanced methods to quiesce the different layers. If you do not do that, it may lead to data inconsistency issues.
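
A minimal sketch of the problem, assuming a hypothetical two-layer setup: a per-process local cache in front of a shared cache, with a plain dict standing in for the database. The class and method names are made up for illustration.

```python
class TwoLayerCache:
    """Hypothetical local cache in front of a shared cache and a backing store."""

    def __init__(self, store: dict):
        self.store = store   # stands in for the database
        self.shared = {}     # stands in for a shared cache tier
        self.local = {}      # per-process cache

    def get(self, key):
        # Check the closest layer first, filling the layers on the way back.
        if key in self.local:
            return self.local[key]
        if key in self.shared:
            value = self.shared[key]
        else:
            value = self.store[key]
            self.shared[key] = value
        self.local[key] = value
        return value

    def put_without_invalidation(self, key, value):
        # Writes the store only: both cache layers keep serving the old value.
        self.store[key] = value

    def put(self, key, value):
        # Write the store, then drop the key from every layer so the next read refills it.
        self.store[key] = value
        self.shared.pop(key, None)
        self.local.pop(key, None)
```

After `put_without_invalidation`, reads still return the stale cached value; `put` avoids that here only because everything lives in one process. With several processes, each has its own local layer, and the invalidation has to reach all of them.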


November 29, 1990 | Unregistered Commenter Mukul Kumar

Can somebody explain point 6 in depth? Thanks.

November 29, 1990 | Unregistered Commenter Luislopez
