Update 4:: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow).
Update 3:: Scaling Digg and Other Web Applications.
Update 2:: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information.
Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades.
Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of requests a month?
- Four master databases are partitioned by functionality: promotion, profiles, comments, main. Many slave databases hang off each master.
- Writes go to the masters and reads go to the slaves.
- Transaction-heavy servers use the InnoDB storage engine.
- OLAP-heavy servers use the MyISAM storage engine.
- They did not notice a performance degradation moving from MySQL 4.1 to version 5.
- The schema is denormalized more than "your average database design."
- Sharding is used to break the database into several smaller ones.
The data layer is where most scaling and performance problems are to be found and these are language specific. You'll hit them using Java, PHP, Ruby, or insert your favorite language here.