Slashdot Architecture - How the Old Man of the Internet Learned to Scale
Site: http://slashdot.org
Information Sources
The Platform
The Stats
The Hardware Architecture
- Running Red Hat 9.
- Rackable 1U servers, each with two 2.66GHz Xeon processors, 2GB of RAM, and 2x80GB IDE hard drives.
- Two serve static content: JavaScript, images, and the front page for non-logged-in users.
- Four serve the front page to logged-in users.
- Ten handle comment pages.
- Host roles are changed in response to load.
- All NFS mounts are in read-only mode.
- All run CentOS 4.
- 2 in a master-master configuration:
-- Dual Opteron 270s with 16GB RAM and 4x36GB 15K RPM SCSI drives.
-- One master is the write-only database.
-- One master is the read-only database.
-- They can fail over at any time and switch roles.
- 2 reader databases:
-- Dual Opteron 270s with 8GB RAM and 4x36GB 15K RPM SCSI drives.
-- Each syncs from one of the master databases.
-- More readers can be added to scale, but the current two are plenty fast enough for now.
- 3 miscellaneous databases
-- Quad P3 Xeon 700MHz with 4GB RAM and 8x36GB 10K RPM SCSI drives.
-- Accesslog writer and accesslog reader. Separate databases are used because moderation and stats require a lot of CPU time for computation.
-- Search database.
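The split between a write master and read-oriented databases can be sketched roughly as follows. This is only an illustration, not Slashdot's actual code: the `DatabaseRouter` class, the host names, and the rule that anything other than a `SELECT` counts as a write are all assumptions made for the example, as is the choice to include the read-only master in the read rotation.

```python
import itertools


class DatabaseRouter:
    """Hypothetical router: writes go to the write master, reads rotate over read hosts."""

    def __init__(self, write_master, read_hosts):
        self.write_master = write_master
        # Round-robin over the read-only master and the reader databases.
        self._read_hosts = itertools.cycle(read_hosts)

    def pick(self, sql):
        # Simplification: only statements starting with SELECT are treated as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_hosts) if is_read else self.write_master


# Host names are placeholders, not real Slashdot hosts.
router = DatabaseRouter("master-write", ["master-read", "reader-1", "reader-2"])
```

Adding another reader to scale reads would, under these assumptions, just mean appending one more name to the read list.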
The Software Architecture
- Non-logged-in users all see the same page. This is a static page that is regenerated every couple of minutes.
- Logged-in users have custom options that can't be cached, so generating pages for these users takes more resources.
- If a request can't be handled, it is forwarded on to a web server.
- Pound servers run on the same machines as the web servers.
- They are distributed for load balancing and redundancy.
- SSL is handled by the Pound server, so the web server doesn't need to support SSL.
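The anonymous versus logged-in split described above could look something like the sketch below. It is a minimal illustration under stated assumptions: the `AnonymousPageCache` class, the `front_page` helper, and the 120-second lifetime (standing in for "every couple of minutes") are hypothetical, and the real site serves a static file rather than an in-process cache.

```python
import time


class AnonymousPageCache:
    """Hypothetical cache holding the single page shown to all anonymous visitors."""

    def __init__(self, render, ttl_seconds=120, clock=time.monotonic):
        self._render = render      # callable that builds the shared page
        self._ttl = ttl_seconds    # assumed value for "every couple of minutes"
        self._clock = clock
        self._page = None
        self._built_at = None

    def get(self):
        now = self._clock()
        # Rebuild only when nothing is cached yet or the cached copy has expired.
        if self._built_at is None or now - self._built_at >= self._ttl:
            self._page = self._render()
            self._built_at = now
        return self._page


def front_page(user, cache, render_for_user):
    # Anonymous visitors (user is None) share the cached page;
    # logged-in users get a page rendered with their own options.
    return cache.get() if user is None else render_for_user(user)
```

The point of the design is that the cost of the anonymous front page is paid once per refresh interval, no matter how many anonymous visitors arrive in that window.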
- Software is mounted from /usr/local on the read-only NFS server.
- The images are kept simple. All that is compiled in is:
-- mod_perl
-- lingerd to free up RAM during delivery.
-- mod_auth_useragent to block bots.
- 1 for SSL.
- 2 for static (.shtml) requests.
- 4 for the dynamic homepage.
- 6 for dynamic comment-delivery pages (comments, article, pollBooth.pl).
- 3 for all other dynamic scripts (ajax, tags, bookmarks, firehose).
- Isolate the servers in case there are performance problems or a DDoS attack on a specific page. The rest of the system will function even when one part is failing.
- Efficiency, such as httpd-level caching and MaxClients tuning. The web server can be tuned differently for each role. MaxClients is set to 5-15 for dynamic web servers and 25 for static servers. The bottleneck is CPU, not RAM, so if requests aren't processed quickly then something is wrong, and queuing more requests won't help the CPU process them any faster.
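The role split and per-role tuning above can be sketched as a small routing table. This is an illustration only: the pool names, the `pick_pool` and `max_clients_range` functions, the `.pl` script names, and the paths treated as the homepage are assumptions for the example. Only the server counts and MaxClients figures come from the list above.

```python
# Server counts per role, as listed above (pool names are made up for this sketch).
POOLS = {"ssl": 1, "static": 2, "homepage": 4, "comments": 6, "other": 3}

# Assumed script file names for the comment-delivery role.
COMMENT_SCRIPTS = ("comments.pl", "article.pl", "pollBooth.pl")
DYNAMIC_POOLS = {"homepage", "comments", "other"}


def pick_pool(path, is_ssl=False):
    """Choose a server pool for a request path (hypothetical rules)."""
    if is_ssl:
        return "ssl"
    if path.endswith(".shtml"):
        return "static"
    if path in ("/", "/index.pl"):  # assumed homepage paths
        return "homepage"
    script = path.rsplit("/", 1)[-1]
    if script in COMMENT_SCRIPTS:
        return "comments"
    return "other"


def max_clients_range(pool):
    """Return the (low, high) MaxClients setting quoted above for a pool."""
    if pool == "static":
        return (25, 25)
    if pool in DYNAMIC_POOLS:
        return (5, 15)
    raise ValueError("no MaxClients figure is given for pool %r" % pool)
```

Keeping the table explicit makes the isolation argument concrete: a flood of requests for one script can only exhaust the pool that script maps to.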