WordPress.com Serves 70,000 req/sec and over 15 Gbit/sec of Traffic using NGINX

This is a guest post by Barry Abrahamson, Chief Systems Wrangler at Automattic, and NGINX's co-founder Andrew Alexeev.

WordPress.com serves more than 33 million sites, attracting over 339 million visitors and 3.4 billion page views each month. Since April 2008, page views on WordPress.com have grown roughly 4.4x. WordPress.com VIP hosts many popular sites, including CNN’s Political Ticker, NFL, Time Inc.’s The Page, People Magazine’s Style Watch, corporate blogs for Flickr and KROQ, and many more. Automattic operates two thousand servers in twelve globally distributed data centers. WordPress.com customer data is instantly replicated between locations to provide an extremely reliable and fast web experience for hundreds of millions of visitors.

Problem

WordPress.com, which began in 2005, started on shared hosting, much like most WordPress.org sites. It soon moved to a single dedicated server and then to two servers. In late 2005, WordPress.com opened to the public, and by early 2006 it had expanded to four web servers, with traffic distributed using round-robin DNS. Soon thereafter WordPress.com expanded to a second data center and then to a third. It quickly became apparent that round-robin DNS wasn't a viable long-term solution.

While hardware appliances like F5's BIG-IP offered many features that WordPress.com required, the 5-member Automattic Systems Team decided to evaluate different options built on existing open source software. Using open source software on commodity hardware provides the ultimate level of flexibility and also comes with cost savings: "Purchasing a pair of capable hardware appliances in a failover configuration for a single datacenter may be a little expensive, but purchasing and servicing 10 sets for 10 data centers soon becomes very expensive."

At first, the WordPress.com team chose Pound as a software load balancer because of its ease of use and built-in SSL support. After using Pound for about two years, WordPress.com required additional functionality and scalability, namely:

  • On-the-fly reconfiguration capabilities, without interrupting live traffic.
  • Better health-check mechanisms, able to recover from a backend failure smoothly and gradually, without overloading the application infrastructure with an unexpected surge of requests (see the sketch after this list).
  • Better scalability, in both requests per second and the number of concurrent connections. Pound's thread-based model wasn’t able to reliably handle more than 1,000 requests per second per load-balancing instance.
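In NGINX terms (foreshadowing the solution below), the health-check requirement maps onto passive failure detection in an upstream pool. The following is a minimal sketch with placeholder backend addresses, not WordPress.com's actual configuration:

    # Hypothetical upstream pool; the addresses are placeholders.
    upstream wordpress_backends {
        # After 3 failed requests within 30 seconds, a server is taken
        # out of rotation for 30 seconds and then retried, so a
        # recovering backend is not hit with the full load at once.
        server 10.0.0.1:80 max_fails=3 fail_timeout=30s;
        server 10.0.0.2:80 max_fails=3 fail_timeout=30s;
        # Used only when every primary server is marked down.
        server 10.0.0.3:80 backup;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://wordpress_backends;
            # On an error or timeout, retry the request on the next server.
            proxy_next_upstream error timeout;
        }
    }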

Solution

In April 2008 Automattic converted all WordPress.com load balancers from Pound to NGINX. Before that, Automattic engineers had been using NGINX for Gravatar for a few months and were impressed by its performance and scalability, so moving WordPress.com over was the natural next step. Before switching WordPress.com to NGINX, Automattic evaluated several other products, including HAProxy and LVS. Here are some of the reasons why NGINX was chosen:

  • Easy, flexible and logical configuration.
  • Ability to reconfigure and upgrade NGINX instances on-the-fly, without dropping user requests.
  • Application request routing via the FastCGI, uwsgi, or SCGI protocols; NGINX can also serve static content directly from storage for additional performance optimization (see the configuration sketch after this list).
  • The only software tested that was capable of reliably handling over 10,000 requests per second of live traffic to WordPress applications from a single server.
  • NGINX’s memory and CPU footprints are minimal and predictable. After switching to NGINX, CPU usage on the load-balancing servers dropped threefold.
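The on-the-fly reconfiguration mentioned above is standard NGINX behavior: nginx -s reload makes the master process re-read the configuration and gracefully retire the old worker processes, and a similar signal-based procedure upgrades the binary itself without dropping connections. The FastCGI routing and static-content bullets can be sketched roughly as follows; the hostname, document root, and PHP-FPM address are assumptions for illustration, not Automattic's real values:

    server {
        listen 80;
        server_name example.wordpress.com;   # placeholder hostname
        root /var/www/wordpress;             # assumed document root

        # Static assets are served directly from storage, bypassing
        # the PHP application tier entirely.
        location ~* \.(css|js|gif|jpe?g|png|ico)$ {
            expires 7d;
        }

        # Everything else falls through to WordPress's front controller.
        location / {
            try_files $uri $uri/ /index.php?$args;
        }

        # PHP requests are routed to the application tier over FastCGI;
        # the php-fpm address below is a placeholder.
        location ~ \.php$ {
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_pass 127.0.0.1:9000;
        }
    }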

Overall, WordPress.com serves about 70,000 requests per second and over 15 Gbit/sec of traffic from its NGINX-powered load balancers at peak, with plenty of room to grow. Each load balancer runs dual 4-core Xeon 5620 CPUs with hyper-threading and 8–12 GB of RAM on Debian Linux 6.0. As part of its high-availability setup, WordPress.com previously used Wackamole/Spread but has recently begun migrating to Keepalived. Inbound requests are distributed evenly across the NGINX-based web acceleration and load-balancing layer via round-robin DNS.
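For readers unfamiliar with Keepalived: each pair of load balancers shares a virtual IP via VRRP, and the surviving node takes the address over when its peer fails. A minimal sketch of the primary node's configuration follows; the interface, router ID, and address are placeholders, not Automattic's actual values:

    # keepalived.conf on the primary load balancer (sketch).
    # The standby node uses "state BACKUP" and a lower priority.
    vrrp_instance VI_1 {
        state MASTER
        interface eth0            # NIC that carries the virtual IP
        virtual_router_id 51      # must match on both nodes
        priority 100              # the higher priority wins the election
        advert_int 1              # VRRP advertisement interval, in seconds
        virtual_ipaddress {
            192.0.2.10            # an IP published via round-robin DNS
        }
    }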
