Live Video Streaming At Facebook Scale

Operating at Facebook scale is far from trivial. With 1.49 billion monthly active users (and growing 13 percent yearly), every 60 seconds on Facebook 510,000 comments are posted, 293,000 statuses are updated, and 136,000 photos are uploaded. Therein lies the challenge: serving the masses efficiently and reliably, without outages.

For serving offline content, whether text (updates, comments, etc.), photos or videos, Facebook developed a sophisticated architecture that includes state-of-the-art data center technology and a search engine to traverse and fetch content quickly and efficiently.

But now comes a new type of challenge: A few months ago Facebook rolled out a new live streaming service called Live for Facebook Mentions, which allows celebs to broadcast live video to their followers. The service is quite similar to Twitter’s Periscope (acquired by Twitter at the beginning of this year) and the popular Meerkat app, both of which offer live video streaming to everyone, not just celebs. In fact, Facebook announced this month that it is piloting a new service that will offer live streaming to the general public as well.

While offline photos and videos are uploaded in full and then distributed and made accessible to followers and friends, serving live video streams is much more challenging to implement at scale. To make things even harder, the viral nature of social media (and of celeb content in particular) often creates spikes in which thousands of followers demand the same popular content at the same time, a phenomenon the Facebook team calls the “thundering herd” problem.

An interesting post by Facebook engineering shares these challenges and the design approaches the team took. Facebook’s system uses a Content Delivery Network (CDN) architecture with two layers of caching, with the edge cache sitting closest to the users and serving 98 percent of the content. This design aims to reduce the load on the backend server processing the incoming live feed from the broadcaster. Another useful optimization for further reducing the load on the backend is request coalescing: when many followers (in the case of celebs it could reach millions!) ask for content that is missing from the cache (a cache miss), only one request proceeds to the backend to fetch the content on behalf of all the others, avoiding a flood.
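To make the idea concrete, here is a minimal sketch of request coalescing in Python, using a single in-process cache. The names (CoalescingCache, fetch_from_origin) are my own illustrations under that simplifying assumption, not Facebook’s actual code, which runs across distributed cache tiers rather than a single process.

```python
import threading

class CoalescingCache:
    """Toy edge cache: on a miss, only the first requester for a key
    goes to the origin; concurrent requesters wait for that one result."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # hypothetical origin-fetch callable
        self._cache = {}                  # key -> cached value
        self._inflight = {}               # key -> Event for a fetch in progress
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:                 # cache hit: serve locally
                return self._cache[key]
            event = self._inflight.get(key)
            if event is None:                      # first miss: we become the leader
                event = threading.Event()
                self._inflight[key] = event
                leader = True
            else:                                  # someone is already fetching this key
                leader = False

        if leader:
            value = self._fetch(key)               # the single request to the backend
            with self._lock:
                self._cache[key] = value
                del self._inflight[key]
            event.set()                            # wake up all the waiters
            return value

        event.wait()                               # piggyback on the leader's fetch
        with self._lock:
            return self._cache[key]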

It’s interesting to note that the celebs’ service and the newer public service involve different considerations and trade-offs between throughput and latency, which led Facebook’s engineering team to adapt the architecture for the new service:

Where building Live for Facebook Mentions was an exercise in making sure the system didn’t get overloaded, building Live for people was an exercise in reducing latency.

The content itself is broken down into tiny segments of multiplexed audio and video for more efficient distribution and lower latency. The new Live service (for the general public) even called for changing the underlying streaming protocol, from HLS to RTMP, to achieve still lower latency, cutting the lag between broadcaster and viewer by a factor of five.
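As a rough illustration of segmented delivery, the sketch below models an HLS-style workflow in Python: the encoder emits short segments of muxed audio/video, and viewers repeatedly fetch a rolling playlist to discover the newest ones. The segment length, file naming and playlist window here are illustrative choices of mine, not Facebook’s actual parameters.

```python
from collections import deque

SEGMENT_SECONDS = 2        # illustrative; real segment length is a tuning choice
PLAYLIST_WINDOW = 5        # how many recent segments the playlist advertises

class LiveSegmenter:
    """Rough HLS-style model: the encoder pushes short segments, and viewers
    poll a rolling playlist that always points at the newest few segments."""

    def __init__(self):
        self._segments = deque(maxlen=PLAYLIST_WINDOW)
        self._sequence = 0

    def push_segment(self, av_chunk: bytes) -> str:
        """Called every SEGMENT_SECONDS with the next chunk of muxed audio/video."""
        name = f"segment_{self._sequence}.ts"   # hypothetical naming scheme
        self._segments.append((name, av_chunk))
        self._sequence += 1
        return name

    def playlist(self) -> str:
        """Simplified live playlist; clients re-fetch it to find new segments."""
        first_seq = self._sequence - len(self._segments)
        lines = [
            "#EXTM3U",
            f"#EXT-X-TARGETDURATION:{SEGMENT_SECONDS}",
            f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
        ]
        for name, _ in self._segments:
            lines.append(f"#EXTINF:{SEGMENT_SECONDS},")
            lines.append(name)
        return "\n".join(lines)
```

The latency in this model comes from buffering a full segment before it can be advertised and then waiting for clients to poll the playlist; a persistent RTMP connection pushes the muxed stream continuously instead, which is what lets the new Live service shrink the broadcaster-to-viewer lag.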

This is a fascinating exercise in scalable architecture for live streaming, one that is said to scale effectively to millions of broadcasters. Such open discussions can pave the way for smaller companies in social media, the Internet of Things (IoT) and the ever-more-connected world. You can read Facebook's full post here.

For more on this, check out my blog and follow me on Twitter.