7 Scaling Strategies Facebook Used to Grow to 500 Million Users

Robert Johnson, a director of engineering at Facebook, celebrated Facebook's monumental achievement of reaching 500 million users by sharing the scaling principles that helped reach that milestone. In case you weren't suitably impressed by the 500 million user number, Robert ratchets up the numbers game with these impressive figures:
  • 1 million users per engineer
  • 500 million active users
  • 100 billion hits per day
  • 50 billion photos
  • 2 trillion objects cached, with hundreds of millions of requests per second
  • 130TB of logs every day
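The daily totals are easier to compare with the per-second cache figure once they are converted to rates. The sketch below does that back-of-envelope arithmetic in Python. It assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly over 24 hours; real traffic peaks well above these averages. It also spells out that 1 million users per engineer across 500 million users implies roughly 500 engineers.

```python
# Back-of-envelope conversion of the daily figures above into average
# per-second rates. Assumptions: decimal units (1 TB = 10**12 bytes) and
# load spread evenly over the day, which real traffic never is.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_total: float) -> float:
    """Average rate per second for a quantity measured per day."""
    return daily_total / SECONDS_PER_DAY


users = 500_000_000
users_per_engineer = 1_000_000
hits_per_day = 100_000_000_000
log_bytes_per_day = 130 * 10**12  # 130 TB, decimal units assumed

engineers = users // users_per_engineer             # implied head count
hits_per_sec = per_second(hits_per_day)             # average, not peak
log_gb_per_sec = per_second(log_bytes_per_day) / 10**9

if __name__ == "__main__":
    print(f"implied engineers: {engineers}")
    print(f"average hits/s:    {hits_per_sec:,.0f}")
    print(f"average log GB/s:  {log_gb_per_sec:.2f}")
```

On those assumptions, 100 billion hits a day averages out to roughly 1.16 million hits per second, and 130TB of logs to about 1.5 GB per second.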

How did Facebook get to this point?

  1. People Matter Most. It's people who build and run systems. The best tools for scaling are an engineering and operations team that can handle anything.
  2. Scale Horizontally. Handling exponentially growing traffic requires spreading load arbitrarily across many machines. Splitting by function, such as putting tables like accounts and profiles in different databases, only doubles capacity. Spreading load this way hurts efficiency, but efficiency is a separate effort from scaling: improving efficiency by itself doesn't substantially help a system scale.
  3. Move Fast. At every level of scale there are surprises. Facebook deals with them quickly by relying on a highly qualified, cross-disciplinary team that is flexible and skilled enough to handle anything that comes up. Flexibility is more important than any individual technical decision. Moving fast also lets Facebook try more options and figure out which ones work best.
  4. Change Incrementally. Making small changes and measuring the result is the key to moving fast. Big projects are broken into distinct parts, and changes are not batched. Changes can be rolled out first on a few machines to a few users. New systems can be built in parallel with old ones, with traffic slowly moved over to the new system while the results are measured. Incremental change increases overall system stability because you find out sooner whether a particular strategy is working, and it's easier to figure out where things went wrong when dealing with smaller increments.
  5. Measure Everything. Production is where the really useful data comes from. Measure both system-level and application-level statistics to know what's happening. Check what's happening at the 95th or 99th percentile, because averages hide important issues.
  6. Small, Independent Teams. Small teams allow work to be done efficiently, quickly, and carefully. For example, only three people work on photos, even though Facebook is the largest photo site on the internet.
  7. Control and Responsibility. Responsibility requires control: if a team is responsible for something, it must control it. For example, Facebook pushes code into production every day, and the person who wrote the code is there to fix anything that goes wrong. If the responsibilities for writing and pushing code are split, the person who wrote the code doesn't feel the effect when it breaks the system. Robert puts it wonderfully: "The best way we know of to get great software to these 500 million people is to have a person who understands the importance of what they're doing make a good decision about something they understand and control."
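Principle 2 contrasts splitting data by function with spreading load arbitrarily across machines. The minimal Python sketch below illustrates that difference under stated assumptions: the article does not say how Facebook actually partitions its data, so `shard_for` and `functional_capacity` are hypothetical names, and hashing user IDs modulo a shard count is simply one common way to spread load.

```python
import hashlib
from collections import Counter


def shard_for(user_id: int, num_shards: int) -> int:
    """Map a user to one of num_shards machines with a stable hash.

    hashlib is used instead of the built-in hash() so the mapping stays the
    same across processes and restarts.
    """
    digest = hashlib.sha1(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def functional_capacity(tables: list[str]) -> int:
    """Functional partitioning: one database per table, so the number of
    machines (and therefore capacity) is capped by the number of tables."""
    return len(set(tables))


if __name__ == "__main__":
    # Accounts and profiles on separate databases: at most 2 machines.
    print(functional_capacity(["accounts", "profiles"]))

    # Horizontal partitioning: users spread over as many shards as we add.
    load = Counter(shard_for(uid, 16) for uid in range(100_000))
    print(len(load), min(load.values()), max(load.values()))
```

Plain modulo hashing moves most keys to new shards whenever the shard count changes; schemes such as consistent hashing reduce that reshuffling at the cost of more bookkeeping.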
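Principle 4 describes rolling changes out to a few users first and moving traffic over gradually. One common way to do that, sketched below in Python as an assumption rather than Facebook's actual mechanism, is to hash each user into one of 100 buckets and enable the change only for buckets below a target percentage. The feature name `new_photo_uploader` and the helper names are hypothetical.

```python
import hashlib


def rollout_bucket(feature: str, user_id: int) -> int:
    """Deterministically place a user in one of 100 buckets for a feature.

    Salting with the feature name keeps rollouts independent, so the same
    1% of users is not first in line for every change.
    """
    key = f"{feature}:{user_id}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % 100


def is_enabled(feature: str, user_id: int, percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return rollout_bucket(feature, user_id) < percent


if __name__ == "__main__":
    users = range(10_000)
    # Ramp the hypothetical feature up step by step, measuring at each step.
    for percent in (1, 5, 25, 100):
        on = sum(is_enabled("new_photo_uploader", u, percent) for u in users)
        print(f"{percent:3d}% target -> {on} of {len(users)} users enabled")
```

Because a user's bucket never changes, raising the percentage from 1 to 5 to 25 only adds users; everyone already on the new path stays there. The same check can decide whether a request goes to an old system or its replacement while traffic is shifted over.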
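Principle 5 warns that averages hide problems that show up at the 95th or 99th percentile. The Python sketch below uses made-up latency numbers (not Facebook data) and a simple nearest-rank percentile to show how a slow tail can look harmless in the mean.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p%
    of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in ms: most requests are fast, 6% hit a slow path.
latencies_ms = [20.0] * 94 + [1000.0] * 6

if __name__ == "__main__":
    print(f"mean {statistics.fmean(latencies_ms):.1f} ms")  # mean 78.8 ms
    print(f"p50  {percentile(latencies_ms, 50):.0f} ms")     # p50  20 ms
    print(f"p95  {percentile(latencies_ms, 95):.0f} ms")     # p95  1000 ms
    print(f"p99  {percentile(latencies_ms, 99):.0f} ms")     # p99  1000 ms
```

Here the mean of 78.8 ms looks acceptable, while the 95th and 99th percentiles show that 6% of requests take a full second.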

These principles are not really new, but I think when you see them all laid out together like this, it's easy to see how they work together to form a self-reinforcing virtuous circle. You can't move fast unless you have small teams with control and responsibility. You can't know how your changes are working unless you get them into production and measure the results. You can't move code into production unless people feel responsible for pushing out working code. And you can't handle the scale unless you figure out how to scale horizontally, move fast, measure everything, and so on, all of which comes down to good people.

Will these principles be enough to grow the next 500 million users? Because of principle number one, my guess is yes. The world will change and there will be tremendous unforeseen challenges, but good people given the right environment will learn and adapt. The real challenge, one Facebook has met so far, will be staying true to the principles that got them here and avoiding the organizational rot that infects so many organizations once they reach a tipping point of size and complexity.

Reader Comments (10)

> 1 million users per engineer
> 500 million active users
Do they really have 500+ engineers?!

August 2, 2010 | Alex

Your math skills are impeccable!

August 2, 2010 | Rintoul

Point #1 ("People Matter Most") is not in the corresponding Facebook Note. Was it deleted from the note later?

August 2, 2010 | Tahir

Impressive. Whenever I come across impressive numbers like this, I find that people matter the most. It would be interesting to learn how they push code into production and what their version control strategy is. I have heard that in a traditional data center the operations team is responsible for everything, but here the programmer is. We get trashed by the ops team because senior management believes only the ops team.

August 3, 2010 | Mohan Radhakrishnan

Tahir, I try to organize these types of posts so that they are immediately useful at a glance. The source post does a great job of adding color. When people come here they generally just want actionable material. So, the people idea was discussed in the article, but wasn't pulled out as a principle, and I thought it should have been.

August 3, 2010 | Todd Hoff

How many product managers do they have? How many tech managers manage ~500 engineers? Does that figure of 500 include support staff?

August 3, 2010 | John

@Alex, of course not. I wouldn't believe any of these; I suggest dividing all these numbers by 5.

August 4, 2010 | Andrey

I think they forgot to add Principle #8: Hire 500 engineers!

August 5, 2010 | Fred

@Fred. If you want to handle 500 million users, don't hire more engineers. Hire user experience professionals. That is how Facebook grew to what it is.

August 9, 2010 | Thomas

Nice article; many developers overlook the deployment aspects.

August 22, 2010 | Ravi Periasamy
