FeedBurner Architecture
FeedBurner is a news feed management provider launched in 2004. FeedBurner provides custom RSS feeds and management tools to bloggers, podcasters, and other web-based content publishers. Services provided to publishers include traffic analysis and an optional advertising system.
Site: http://www.feedburner.com
Information Sources
Platform
The Stats
- July 2004: 300 Kbps, 5,600 feeds, 3 app servers, 3 web servers, 2 DB servers, Round Robin DNS
- April 2005: 5 Mbps, 47,700 feeds, 6 app servers, 6 web servers (same machines)
- September 2005: 20 Mbps, 109,200 feeds
- Currently: 250 Mbps bandwidth usage, 310 million feed views per day, 100 million hits per day
The Architecture
- With Round Robin DNS across three web servers, a single server failure was seen by 1/3 of all users.
- Added health checks that go all the way back to the database; load balancers monitor them and route requests to the live machines when a server fails (see the health-check sketch after this list).
- Use Cacti and Nagios for monitoring, looking at uptime and performance data to identify problems.
- Every hit is recorded, which slowed everything down because of table-level locks.
- Used Doug Lea's concurrency library (later standardized as java.util.concurrent) to do the stat updates in multiple threads (see the batched-update sketch after this list).
- Only stats for today are calculated in real time; older stats are calculated lazily.
- Used the master DB for everything.
- Needed to balance read and read/write load.
- Found where queries could be broken up into read vs. read/write.
- Balanced the load between master and slave (see the read/write routing sketch after this list).
- Everything slowed down because the database (MyISAM) was being used as a cache.
- Added caching layers: RAM on the machines, memcached, and caching inside the database itself (see the cache-aside sketch after this list).
- When stats were rolled up on demand, popular feeds slowed down the whole system.
- Turned to batch processing, doing the rollups once a night (see the nightly rollup sketch after this list).
- Wrote to the master too much: there was more data with each feed as stats tracking was added for ads, items, and circulation.
- Used merge tables and truncated the data from two days ago (also covered in the nightly rollup sketch).
- Went to horizontal partitioning: ad serving, flare serving, circulation.
- Moved the hottest tables/queries to their own clusters (see the cluster-routing sketch after this list).
- With a master and a slave there is still a single point of failure, because it's hard to promote a slave to a master; went to a multi-master solution.
- Needed a disaster recovery/secondary site.
- Active/active wasn't possible: it would take too much hardware, they didn't like having half the hardware go to waste, and it needed a really fast connection between data centers.
- Created a custom solution to download feeds to the remote servers (see the feed mirror sketch after this list).
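The health-check sketch below shows one way the load-balancer check could reach all the way back to the database: a node only reports healthy if a trivial query succeeds. The JDBC URL, credentials, and the idea of printing instead of serving HTTP are assumptions; the article does not describe the actual implementation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Minimal health-check endpoint a load balancer could poll. It goes "all the
// way back" to the database: the server only reports healthy if it can run a
// trivial query. Host, credentials, and schema are hypothetical.
public class HealthCheck {
    private static final String JDBC_URL = "jdbc:mysql://db-master:3306/feeds"; // assumed URL
    private static final String USER = "monitor";
    private static final String PASS = "secret";

    public static boolean databaseIsUp() {
        try (Connection conn = DriverManager.getConnection(JDBC_URL, USER, PASS);
             Statement stmt = conn.createStatement()) {
            return stmt.executeQuery("SELECT 1").next(); // trivial round trip
        } catch (Exception e) {
            return false; // any failure means the LB should route around this node
        }
    }

    public static void main(String[] args) {
        // A real deployment would expose this over HTTP; printing keeps the sketch self-contained.
        System.out.println(databaseIsUp() ? "OK" : "FAIL");
    }
}
```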
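The batched-update sketch is a minimal illustration of taking per-hit stat writes off the request path with java.util.concurrent, the standard-library descendant of Doug Lea's library. The flush interval, the in-memory counters, and the feed_stats schema in the comment are assumptions, not FeedBurner's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of moving per-hit stat updates off the request path. Counters are
// accumulated in memory and flushed periodically by a background thread, so
// one table-level lock covers a whole batch instead of every single hit.
public class HitCounter {
    private final ConcurrentHashMap<Long, AtomicLong> counts = new ConcurrentHashMap<>();
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

    public HitCounter() {
        flusher.scheduleAtFixedRate(this::flush, 10, 10, TimeUnit.SECONDS);
    }

    // Called on every feed hit; cheap from the caller's point of view.
    public void record(long feedId) {
        counts.computeIfAbsent(feedId, id -> new AtomicLong()).incrementAndGet();
    }

    // Drain the in-memory counters and write one batched update per feed.
    private void flush() {
        for (Map.Entry<Long, AtomicLong> e : counts.entrySet()) {
            long delta = e.getValue().getAndSet(0);
            if (delta > 0) {
                // e.g. UPDATE feed_stats SET hits = hits + ? WHERE feed_id = ?  (assumed schema)
                System.out.printf("UPDATE feed_stats SET hits = hits + %d WHERE feed_id = %d%n",
                        delta, e.getKey());
            }
        }
    }
}
```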
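The read/write routing sketch shows the shape of splitting read from read/write traffic at a single decision point in application code. The two DataSource objects (master and replica) are assumed to be wired up elsewhere; how FeedBurner actually routed queries is not described in the source.

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of routing queries by intent: writes (and reads that must see the
// latest data) go to the master, everything else goes to a slave/replica.
public class RoutingConnectionFactory {
    private final DataSource master;
    private final DataSource replica;

    public RoutingConnectionFactory(DataSource master, DataSource replica) {
        this.master = master;
        this.replica = replica;
    }

    public Connection forWrite() throws SQLException {
        return master.getConnection();
    }

    public Connection forRead() throws SQLException {
        Connection c = replica.getConnection();
        c.setReadOnly(true); // make the intent explicit to the driver
        return c;
    }
}
```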
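The cache-aside sketch shows how a memcached layer typically sits in front of the database for rolled-up stats. The spymemcached client, the key format, the expiry, and the loadFromDatabase stub are assumptions; the article only says that memcached was added.

```java
import net.spy.memcached.MemcachedClient;

import java.io.IOException;
import java.net.InetSocketAddress;

// Cache-aside sketch for rolled-up stats: check memcached first, fall back to
// the database on a miss, then populate the cache. Host, key format, and the
// database stub are placeholders.
public class StatsCache {
    private final MemcachedClient cache;

    public StatsCache(String host, int port) throws IOException {
        this.cache = new MemcachedClient(new InetSocketAddress(host, port));
    }

    public String circulationFor(long feedId) {
        String key = "circ:" + feedId;
        Object cached = cache.get(key);          // 1. check memcached first
        if (cached != null) {
            return (String) cached;
        }
        String value = loadFromDatabase(feedId); // 2. fall back to the database
        cache.set(key, 3600, value);             // 3. cache the result for an hour
        return value;
    }

    private String loadFromDatabase(long feedId) {
        return "circulation-for-" + feedId;      // stand-in for the real query
    }
}
```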
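The nightly rollup sketch combines the batch rollup, the merge tables, and the truncation of two-day-old data into one job, assuming per-day MyISAM hit tables behind a MERGE table. All table and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of the nightly batch job: yesterday's raw hits are rolled up into a
// summary table, the MERGE table is repointed at the two most recent day
// tables, and the raw data from two days ago is truncated. Schema is assumed.
public class NightlyRollup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://db-master:3306/stats", "batch", "secret");
             Statement stmt = conn.createStatement()) {

            // Roll yesterday's raw hits into the daily summary table.
            stmt.executeUpdate(
                "INSERT INTO daily_stats (feed_id, day, hits) " +
                "SELECT feed_id, DATE(hit_time), COUNT(*) FROM hits_yesterday " +
                "GROUP BY feed_id, DATE(hit_time)");

            // Repoint the MERGE table at only the two most recent day tables...
            stmt.executeUpdate(
                "ALTER TABLE hits_all UNION=(hits_yesterday, hits_today)");

            // ...then discard the raw data from two days ago.
            stmt.executeUpdate("TRUNCATE TABLE hits_two_days_ago");
        }
    }
}
```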
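The cluster-routing sketch illustrates sending each partitioned function (ads, flare, circulation) to its own database cluster through a per-function connection pool, so the hottest workloads no longer contend with each other. The Function enum and the map of DataSources are illustrative assumptions.

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Map;

// Sketch of splitting the hottest workloads onto their own database clusters:
// each functional area gets its own connection pool.
public class ClusterRouter {
    public enum Function { ADS, FLARE, CIRCULATION }

    private final Map<Function, DataSource> clusters;

    public ClusterRouter(Map<Function, DataSource> clusters) {
        this.clusters = clusters;
    }

    public Connection connect(Function function) throws SQLException {
        // Queries for one function can no longer slow down the others.
        return clusters.get(function).getConnection();
    }
}
```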
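The feed mirror sketch is a guess at the shape of the custom disaster-recovery copy: periodically fetch each feed's rendered XML and store it at the secondary site so it can serve slightly stale feeds if the primary data center is lost. The feed list, URLs, and paths are placeholders.

```java
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Sketch of copying rendered feeds down to remote servers for disaster recovery.
public class FeedMirror {
    public static void main(String[] args) throws Exception {
        List<String> feedNames = List.of("TechCrunch", "Slashdot"); // assumed feed list
        Path mirrorDir = Paths.get("/var/feedmirror");              // assumed local path
        Files.createDirectories(mirrorDir);

        for (String name : feedNames) {
            URL url = new URL("http://feeds.feedburner.com/" + name);
            try (InputStream in = url.openStream()) {
                // Overwrite the local copy; serving slightly stale feeds is acceptable for DR.
                Files.copy(in, mirrorDir.resolve(name + ".xml"),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```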