How Facebook Handled the New Year's Eve Onslaught

How does Facebook handle the massive New Year's Eve traffic spike? Thanks to Mike Swift's article "Facebook gets ready for New Year's Eve", we get a little insight into the method behind the madness. Nothing really detailed, but still interesting.

Problem Setup

  • Facebook expects that one billion+ photos will be shared on New Year's Eve.
  • Facebook's 800 million users are scattered around the world. Three quarters live outside the US. Each user is linked to an average of 130 friends.
  • Photos and posts must appear in less than a second. Opening a homepage requires executing requests on 100 different servers, and the results have to be ranked, sorted, and privacy-checked, and then rendered.
  • Different events put different stresses on different parts of Facebook. 
    • Photo and Video Uploads - Holidays require hundreds of terabytes of capacity.
    • News Feed - News events, like big sporting events and the death of Steve Jobs, drive user status updates.
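
The homepage fan-out described above (ask many servers, then privacy-check, rank, and trim) can be sketched roughly as follows. This is only an illustration of the general scatter-gather pattern, not Facebook's implementation: the shard layout, the story tuples, and the names fetch_from_shard and build_homepage are all hypothetical, and the "servers" are simulated in-process.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each simulated "server" holds two stories as
# (story_id, author, score, visible_to) tuples.
def make_shard(shard_id):
    return [
        (f"s{shard_id}-a", f"user{shard_id}", shard_id * 1.0, {"alice", "bob"}),
        (f"s{shard_id}-b", f"user{shard_id}", shard_id + 0.5, {"bob"}),
    ]

def fetch_from_shard(shard_id):
    """Stand-in for a network request to one backend server."""
    return make_shard(shard_id)

def build_homepage(viewer, shard_ids, limit=5):
    """Fan out to every shard, then privacy-check, rank, and trim."""
    # Scatter: issue the per-shard requests concurrently.
    with ThreadPoolExecutor(max_workers=16) as pool:
        responses = list(pool.map(fetch_from_shard, shard_ids))

    # Gather: flatten the per-shard responses into one candidate list.
    stories = [story for response in responses for story in response]

    # Privacy check: keep only stories the viewer is allowed to see.
    visible = [s for s in stories if viewer in s[3]]

    # Rank by score, highest first, and keep the top `limit` story ids.
    ranked = sorted(visible, key=lambda s: s[2], reverse=True)
    return [s[0] for s in ranked[:limit]]

if __name__ == "__main__":
    print(build_homepage("alice", range(100), limit=3))
    print(build_homepage("bob", range(100), limit=3))
```

A real system would also need per-request deadlines and a way to render a page when some servers answer late, which this sketch leaves out.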

Coping Strategies

  • Try to predict the surge in traffic.
  • Run checks on hardware and software to find problems.
  • Designate engineers to be on call.
  • Prepare to bring additional capacity online from data centers. The implication is that your architecture can absorb additional capacity and make meaningful use of it.
  • Overbuild as a matter of culture so big events aren't big challenges.
  • Prepare emergency parachutes. These are knobs that can be turned to survive system failures or unanticipated surges in traffic. For example, when there's a capacity problem they can serve smaller photos to reduce bandwidth usage. The idea is to not go offline completely when there's a problem, but to degrade gracefully.
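
To make the "emergency parachute" idea concrete, here is a minimal sketch of a degradation knob that switches to smaller photo renditions as bandwidth utilization climbs. Everything in it is an assumption for illustration: the rendition names and byte sizes, the Parachute class, and the 90%/60% thresholds are invented, and the article does not say how Facebook's knobs are actually built or whether they are turned automatically or by hand.

```python
from dataclasses import dataclass

# Hypothetical photo renditions, largest first: (name, approximate bytes).
# The sizes are illustrative only.
RENDITIONS = [("large", 400_000), ("medium", 150_000), ("small", 40_000)]

@dataclass
class Parachute:
    """A degradation knob; level 0 means normal, full-quality service."""
    level: int = 0

    def pull(self):
        # Degrade one step, but never past the smallest rendition.
        self.level = min(self.level + 1, len(RENDITIONS) - 1)

    def release(self):
        # Recover one step, but never past normal service.
        self.level = max(self.level - 1, 0)

def choose_rendition(parachute):
    """Return the (name, bytes) rendition to serve at the current level."""
    return RENDITIONS[parachute.level]

def auto_adjust(parachute, bandwidth_used, bandwidth_capacity,
                high=0.9, low=0.6):
    """Pull the parachute above `high` utilization, release it below `low`.

    The gap between the two thresholds keeps the knob from flapping
    when utilization hovers around a single cutoff.
    """
    utilization = bandwidth_used / bandwidth_capacity
    if utilization > high:
        parachute.pull()
    elif utilization < low:
        parachute.release()
    return parachute.level

if __name__ == "__main__":
    p = Parachute()
    for used in (50, 95, 99, 70, 50):
        level = auto_adjust(p, used, 100)
        print(used, level, choose_rendition(p)[0])
```

The point of the design is that the service keeps answering at every level: a user gets a smaller photo rather than an error page.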

Top Five Global Events for Facebook status updates in 2011

  • Death of Osama Bin Laden, May 2
  • Packers win the Super Bowl, Feb. 6
  • Casey Anthony verdict, July 5
  • Charlie Sheen, early March
  • Death of Steve Jobs, Oct. 5