Saturday, April 4, 2009

Digg Architecture

Update 4: Introducing Digg's IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g., integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported, with more to follow).
Update 3: Scaling Digg and Other Web Applications.
Update 2: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information.
Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades.

Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of requests a month?

Site: http://digg.com

    Information Sources

  • How Digg Works by Digg
  • How Digg.com uses the LAMP stack to scale upward
  • Digg PHP's Scalability and Performance

    Platform

  • MySQL
  • Linux
  • PHP
  • Lucene
  • Python
  • APC PHP Accelerator
  • MCache
  • Gearman - job scheduling system (a minimal sketch follows this list)
  • MogileFS - open source distributed filesystem
  • Apache
  • Memcached

    The Stats

  • Started in late 2004 with a single Linux server running Apache 1.3, PHP 4, and MySQL 4.0 using the default MyISAM storage engine.
  • Over 22 million users.
  • 230 million plus page views per month
  • 26 million unique visitors per month
  • Several billion requests per month; 230 million page views per month averages out to roughly 90 page views per second, with far higher peaks.
  • None of the scaling challenges faced had anything to do with PHP. The biggest issues faced were database related.
  • Dozens of web servers.
  • Dozens of DB servers.
  • Six specialized graph database servers to run the Recommendation Engine.
  • Six to ten machines that serve files from MogileFS.

    What's Inside

  • Specialized load balancer appliances monitor the application servers, handle failover, constantly adjust the cluster according to health, balance incoming requests, and cache JavaScript, CSS, and images. If you don't have fancy load balancer appliances, take a look at Linux Virtual Server and Squid as a replacement.
  • Requests are passed to the Application Server cluster. Application servers consist of Apache+PHP, Memcached, Gearman, and other daemons. They are responsible for coordinating access to different services (DB, MogileFS, etc.) and building the response sent to the browser.
  • Uses a MySQL master-slave setup.
    - Four master databases are partitioned by functionality: promotion, profiles, comments, main. Many slave databases hang off each master.
    - Writes go to the masters and reads go to the slaves (see the first sketch after this list).
    - Transaction-heavy servers use the InnoDB storage engine.
    - OLAP-heavy servers use the MyISAM storage engine.
    - They did not notice a performance degradation moving from MySQL 4.1 to version 5.
    - The schema is denormalized more than "your average database design."
    - Sharding is used to break the database into several smaller ones.
  • Digg's usage pattern makes it easier for them to scale. Most people just view the front page and leave. Thus 98% of Digg's database accesses are reads. With this balance of operations they don't have to worry about the complex work of architecting for writes, which makes it a lot easier for them to scale.
  • They had problems with their storage system telling them writes were on disk when they really weren't. Controllers do this to make their performance look better. But what it does is leave a giant data integrity hole in failure scenarios. This is a pretty common problem and can be hard to fix, depending on your hardware setup.
  • To lighten their database load they used the APC PHP accelerator and MCache.
  • Memcached is used for caching, and the memcached servers appear to be spread across their database and application servers (see the second sketch after this list). A specialized daemon monitors connections and kills connections that have been open too long.
  • You can configure PHP not to parse and compile on each load using a combination of Apache 2's worker threads, FastCGI, and a PHP accelerator. On a page's first load the PHP code is compiled, so any subsequent page loads are very fast.
  • MogileFS, a distributed file system, serves story icons, user icons, and stores copies of each story’s source. A distributed file system spreads and replicates files across a lot of disks which supports fast and scalable file access.
  • A specialized Recommendation Engine service was built to act as their distributed graph database. Relational databases are not well structured for generating recommendations so a separate service was created. LinkedIn did something similar for their graph.

    Lessons Learned

  • The number of machines isn't as important as what the pieces are and how they fit together.
  • Don't treat the database as a hammer. Recommendations didn't fit well with the relational model, so they made a specialized service.
  • Tune MySQL through your storage engine selection. Use InnoDB when you need transactions and MyISAM when you don't. For example, tables that are transactional on the master can use MyISAM on read-only slaves (see the DDL sketch after this list).
  • At some point in their growth curve they were unable to grow by adding RAM so had to grow through architecture.
  • People often complain that Digg is slow. This is perhaps due to their large JavaScript libraries rather than their backend architecture.
  • One way they scale is by being careful about which applications they deploy on their system. They are careful not to release applications which use too much CPU. Clearly Digg has a pretty standard LAMP architecture, but I thought this was an interesting point. Engineers often have a bunch of cool features they want to release, but those features can kill an infrastructure if that infrastructure doesn't grow along with the features. So push back until your system can handle the new features. This goes to capacity planning, something Flickr emphasizes in their scaling process.
  • You have to wonder: by limiting new features to match their infrastructure, might Digg lose ground to other, faster-moving social bookmarking services? Perhaps if the infrastructure were more easily scaled they could add features faster, which would help them compete better. On the other hand, just adding features because you can doesn't make a lot of sense either.
  • The data layer is where most scaling and performance problems are to be found, and these problems aren't language specific. You'll hit them using Java, PHP, Ruby, or insert your favorite language here.

    Reader Comments (45)

    Yeah, how come Digg only has 30 GB of data? If this report is authentic, I am highly amused.

    Unregistered Commenter: Photoshop Tutorials

    30 GB of database data is HUUUUUGE!

    I have been blogging for nearly 4 years and I've used only some 15 MB of database data.

    30 GB of database data is colossal.

    Unregistered Commenter: Anonymous

    A 30 GB database isn't much if they track just a little historic data about their 1.2 million users.

    Unregistered Commenter: Martin Leblanc Bakmar

    I am not convinced. Digg would need much more than the 30 GB of data (whatever) shown here.

    Unregistered Commenter: Anonymous

    You're wondering why "only" 30GB? Databases are text. A character is a byte. Do the math.

    Unregistered Commenter: Anonymous

    Maybe I will be alone, but for me there is nothing to be proud of. It looks like a standard US company where boxes are cheaper than people, so it's easier to buy 100 servers than to pay 10 people to play with the EXPLAIN command.

    Unregistered Commenter: zed

    You may come from a time when you did have to wring every last performance gain from a piece of hardware. However, it is (somewhat) true that the cost/benefit analysis is moving more toward just throwing more hardware at a problem. Engineers, especially good ones, aren't cheap; hardware is. So while you may not be impressed, I find knowing when to spend less money to accomplish the same task "smart".

    Now, that doesn't mean that proper database design should be thrown out the window. I am actually somewhat ADD when it comes to relations, constraints, indexing, etc. But there comes a time when it just makes sense to do the cost effective thing, and I would hope that is what digg does.

    Unregistered Commenter: Jonathon

    Wow, I didn't realize how bad my last subject was till just now.

    One last thing: it actually takes quite a bit of effort to be able to scale horizontally. Just because they use a lot of boxes doesn't mean that they didn't spend a lot of time and effort on the overall system architecture and database design. That is currently what I am designing my system to be able to do, and it isn't so I can avoid optimizing my code; it's so that I can capacity plan and buy resources to handle increasing load incrementally and cost effectively. I find being able to do that IS impressive.

    Unregistered Commenter: Anonymous

    Maybe the 30 GB he's referring to is the space needed on each cluster node. OS + Application, etc.

    Unregistered Commenter: John

    I'll be the first to disagree. First, without the right people designing the overall architecture, throwing boxes at a problem won't do much other than increasing your electricity bill. I've seen this first hand with clients. In fact, I've seen "solutions" where clusters were used that actually decreased performance because the design counter-indicated the use of a cluster (in other words, it wasn't a scalable design).

    Personally, I'd rather have a select few architects and administrators and many powerful machines, than be top heavy with staff and not have enough server resources. :)

    --
    Dustin Puryear
    Author, Best Practices for Managing Linux and UNIX Servers (http://www.puryear-it.com/pubs/linux-unix-best-practices)
    http://www.puryear-it.com

    Unregistered Commenter: Dustin Puryear

    What load balancing solution/product is Digg using?

    Unregistered Commenter: Anonymous

    Sure, they can add servers cheaply, but 100 servers for 200M views per month? Compare that to 1.1B views per month on 2 boxes at Plenty of Fish. Digg sux?

    Unregistered Commenter: jonathan

    Any idea what form of FastCGI is being used at Digg? Or what other people use for PHP/Apache configurations?

    Unregistered Commenter: Anonymous

    To clear things up: 30 GB of database is colossal. On Digg, you just have to store [in a database] a short title, short context, a link URL, date, author, and # of diggs, and that's about it. And then some more user account data... But that's nothing. 30 GB is 30,000,000,000 bytes (approx), and each character is a byte (UTF-8).

    I mean, 30 GB might sound like it's small to you because you store videos/music/games, but we are talking about plain text here. Get over it.

    Unregistered Commenter: Keehun Nam

    A 30 GB database is not that big, really.

    You forget that the site is not only storing the number of diggs, but who dugg what, who dugg which comments, and most likely click-throughs (which I suspect they use to detect fraud). A popular story might have several thousand diggs, then several thousand more diggs within the comments. This data is all stored.

    We also know Digg is sharding, so I'd suspect the database size is substantially larger than this.

    Unregistered Commenter: Tony

    BIG-IP I believe :)

    Unregistered Commenter: Anonymous

    I highly doubt that's true; they probably do under 20 million pageviews with two boxes.

    Unregistered Commenter: Anonymous

    I have worked with MS SQL Servers for banks, developing mostly core-of-business and BI applications, where the SQL Server MDF has reached 100 GB+. The database in question tracked every transaction against every account for every client over a period of 5-odd years. That's big. And that was SQL 2000. And it was fast, very fast. SQL Server and a solid, dependable enterprise architecture developed on the .NET framework > anything else on earth. And any anti-MS geek who thinks he can do better using any other (inferior) toolset is just plain dreaming.

    Unregistered Commenter: Anonymous

    Like mentioned before, 30 GB is quite a large database. I wouldn't say colossal, as that implies (to me) something at the edge of its limits with no room to grow anymore. Nevertheless, quite a large database.
    On www.freecrm.com we've reached a 30 GB database size on a MySQL 5 setup on 1 fast server, but after pruning old email logs we cut it down to around 18 GB. That is for over 4 years of heavy usage in a CRM application sporting over 60,000 users. So far MySQL works without a hitch, and any slow query so far has been due to inadequate indexing.

    -Harel Malka ------------------------
    http://www.harelmalka.com
    http://www.freecrm.com
    http://www.kadoink.com

    Unregistered Commenter: harel

    30 GB is tiny compared to the databases that I manage.

    Most of the production databases I manage range from 600 GB to 2 TB, all running Oracle 10g.

    Most of these websites have trouble because they chose MySQL instead of a real database.

    Sites like Bebo that use Oracle scale well. Bebo.com has an infrastructure that supports more than 100 million site page views per day and about 1.2 million image uploads per day using only 6 CPUs.

    Link here:

    http://searchoracle.techtarget.com/originalContent/0,289142,sid41_gci1241597,00.html

    Unregistered Commenter: Enzo

    What's the size of the Lucene data?

    Unregistered Commenter: Anonymous

    Everyone might want to take note of anyone saying that 30 GB is big or even medium sized. You might not want to take their opinion when it comes to anything related to high scalability, for they, obviously, have not had a need to scale.

    Unregistered Commenter: Bill

    I would like to know more about how/why they use MCache. APC should be fine for local-server caching, memcached for distributed caching. What does MCache help with? I used to use msession, and it had some weird startup overhead on each page. I've switched to using MySQL for sessions instead of msession and it's great. I am not sure why they added MCache into the mix and would like to know how and why...

    Unregistered Commenter: mike

    I guess each server has 30 GB.

    Unregistered Commenter: venu

    I think the point in calling 30 GB small is that you can buy machines with upwards of 128 GB of RAM and fit the entire database in cache.

    Considering a quad-core Opteron with 32 GB of RAM from Dell is going for $6,600, it seems like wasting resources on memcached plus all those web servers for, by my math, 77 requests/second is a bit excessive.

    Unregistered Commenter: Anonymous
