FeedBurner Architecture
FeedBurner is a news feed management provider launched in 2004. FeedBurner provides custom RSS feeds and management tools to bloggers, podcasters, and other web-based content publishers. Services provided to publishers include traffic analysis and an optional advertising system.
Site: http://www.feedburner.com
Information Sources
Platform
The Stats
- July 2004: 300 Kbps, 5,600 feeds, 3 app servers, 3 web servers, 2 DB servers, Round Robin DNS
- April 2005: 5 Mbps, 47,700 feeds, 6 app servers, 6 web servers (same machines)
- September 2005: 20 Mbps, 109,200 feeds
- Currently: 250 Mbps bandwidth usage, 310 million feed views per day, 100 million hits per day
The Architecture
- With Round Robin DNS across three web servers, a single server failure was seen by 1/3 of all users.
- Added health checks that go all the way back to the database; load balancers monitor them and route requests to the live machines when a server fails (see the health-check sketch after this list).
- Use Cacti and Nagios for monitoring, looking at uptime and performance data to identify problems.
- Every hit is recorded, which slowed everything down because of table-level locks.
- Used Doug Lea's concurrency library (later standardized as java.util.concurrent) to do the stat updates in multiple threads (see the batched-update sketch after this list).
- Only stats for today are calculated in real time; older stats are calculated lazily.
- Used the master DB for everything.
- Needed to balance read and read/write load.
- Found where queries could be broken up into read vs. read/write.
- Balanced the load between master and slave (see the read/write routing sketch after this list).
- Everything slowed down because the database (MyISAM) was being used as a cache.
- Added caching layers: RAM on the machines, memcached, and caching inside the database itself (see the cache-aside sketch after this list).
- When stats were rolled up on demand, popular feeds slowed down the whole system.
- Turned to batch processing, doing the rollups once a night (see the nightly rollup sketch after this list).
- Wrote to the master too much: there was more data with each feed as stats tracking was added for ads, items, and circulation.
- Used merge tables and truncated the data from two days ago (also covered in the nightly rollup sketch).
- Went to horizontal partitioning: ad serving, flare serving, circulation.
- Moved the hottest tables/queries to their own clusters (see the cluster-routing sketch after this list).
- With a master and a slave there is still a single point of failure, because it's hard to promote a slave to a master; went to a multi-master solution.
- Needed a disaster recovery/secondary site.
- Active/active wasn't possible: it would take too much hardware, they didn't like having half the hardware go to waste, and it needed a really fast connection between data centers.
- Created a custom solution to download feeds to the remote servers (see the feed mirror sketch after this list).
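The health-check sketch below shows one way the load-balancer check could reach all the way back to the database: a node only reports healthy if a trivial query succeeds. The JDBC URL, credentials, and the idea of printing instead of serving HTTP are assumptions; the article does not describe the actual implementation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Minimal health-check endpoint a load balancer could poll. It goes "all the
// way back" to the database: the server only reports healthy if it can run a
// trivial query. Host, credentials, and schema are hypothetical.
public class HealthCheck {
    private static final String JDBC_URL = "jdbc:mysql://db-master:3306/feeds"; // assumed URL
    private static final String USER = "monitor";
    private static final String PASS = "secret";

    public static boolean databaseIsUp() {
        try (Connection conn = DriverManager.getConnection(JDBC_URL, USER, PASS);
             Statement stmt = conn.createStatement()) {
            return stmt.executeQuery("SELECT 1").next(); // trivial round trip
        } catch (Exception e) {
            return false; // any failure means the LB should route around this node
        }
    }

    public static void main(String[] args) {
        // A real deployment would expose this over HTTP; printing keeps the sketch self-contained.
        System.out.println(databaseIsUp() ? "OK" : "FAIL");
    }
}
```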
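The batched-update sketch is a minimal illustration of taking per-hit stat writes off the request path with java.util.concurrent, the standard-library descendant of Doug Lea's library. The flush interval, the in-memory counters, and the feed_stats schema in the comment are assumptions, not FeedBurner's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of moving per-hit stat updates off the request path. Counters are
// accumulated in memory and flushed periodically by a background thread, so
// one table-level lock covers a whole batch instead of every single hit.
public class HitCounter {
    private final ConcurrentHashMap<Long, AtomicLong> counts = new ConcurrentHashMap<>();
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

    public HitCounter() {
        flusher.scheduleAtFixedRate(this::flush, 10, 10, TimeUnit.SECONDS);
    }

    // Called on every feed hit; cheap from the caller's point of view.
    public void record(long feedId) {
        counts.computeIfAbsent(feedId, id -> new AtomicLong()).incrementAndGet();
    }

    // Drain the in-memory counters and write one batched update per feed.
    private void flush() {
        for (Map.Entry<Long, AtomicLong> e : counts.entrySet()) {
            long delta = e.getValue().getAndSet(0);
            if (delta > 0) {
                // e.g. UPDATE feed_stats SET hits = hits + ? WHERE feed_id = ?  (assumed schema)
                System.out.printf("UPDATE feed_stats SET hits = hits + %d WHERE feed_id = %d%n",
                        delta, e.getKey());
            }
        }
    }
}
```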
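The read/write routing sketch shows the shape of splitting read from read/write traffic at a single decision point in application code. The two DataSource objects (master and replica) are assumed to be wired up elsewhere; how FeedBurner actually routed queries is not described in the source.

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of routing queries by intent: writes (and reads that must see the
// latest data) go to the master, everything else goes to a slave/replica.
public class RoutingConnectionFactory {
    private final DataSource master;
    private final DataSource replica;

    public RoutingConnectionFactory(DataSource master, DataSource replica) {
        this.master = master;
        this.replica = replica;
    }

    public Connection forWrite() throws SQLException {
        return master.getConnection();
    }

    public Connection forRead() throws SQLException {
        Connection c = replica.getConnection();
        c.setReadOnly(true); // make the intent explicit to the driver
        return c;
    }
}
```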
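The cache-aside sketch shows how a memcached layer typically sits in front of the database for rolled-up stats. The spymemcached client, the key format, the expiry, and the loadFromDatabase stub are assumptions; the article only says that memcached was added.

```java
import net.spy.memcached.MemcachedClient;

import java.io.IOException;
import java.net.InetSocketAddress;

// Cache-aside sketch for rolled-up stats: check memcached first, fall back to
// the database on a miss, then populate the cache. Host, key format, and the
// database stub are placeholders.
public class StatsCache {
    private final MemcachedClient cache;

    public StatsCache(String host, int port) throws IOException {
        this.cache = new MemcachedClient(new InetSocketAddress(host, port));
    }

    public String circulationFor(long feedId) {
        String key = "circ:" + feedId;
        Object cached = cache.get(key);          // 1. check memcached first
        if (cached != null) {
            return (String) cached;
        }
        String value = loadFromDatabase(feedId); // 2. fall back to the database
        cache.set(key, 3600, value);             // 3. cache the result for an hour
        return value;
    }

    private String loadFromDatabase(long feedId) {
        return "circulation-for-" + feedId;      // stand-in for the real query
    }
}
```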
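The nightly rollup sketch combines the batch rollup, the merge tables, and the truncation of two-day-old data into one job, assuming per-day MyISAM hit tables behind a MERGE table. All table and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of the nightly batch job: yesterday's raw hits are rolled up into a
// summary table, the MERGE table is repointed at the two most recent day
// tables, and the raw data from two days ago is truncated. Schema is assumed.
public class NightlyRollup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://db-master:3306/stats", "batch", "secret");
             Statement stmt = conn.createStatement()) {

            // Roll yesterday's raw hits into the daily summary table.
            stmt.executeUpdate(
                "INSERT INTO daily_stats (feed_id, day, hits) " +
                "SELECT feed_id, DATE(hit_time), COUNT(*) FROM hits_yesterday " +
                "GROUP BY feed_id, DATE(hit_time)");

            // Repoint the MERGE table at only the two most recent day tables...
            stmt.executeUpdate(
                "ALTER TABLE hits_all UNION=(hits_yesterday, hits_today)");

            // ...then discard the raw data from two days ago.
            stmt.executeUpdate("TRUNCATE TABLE hits_two_days_ago");
        }
    }
}
```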
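The cluster-routing sketch illustrates sending each partitioned function (ads, flare, circulation) to its own database cluster through a per-function connection pool, so the hottest workloads no longer contend with each other. The Function enum and the map of DataSources are illustrative assumptions.

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Map;

// Sketch of splitting the hottest workloads onto their own database clusters:
// each functional area gets its own connection pool.
public class ClusterRouter {
    public enum Function { ADS, FLARE, CIRCULATION }

    private final Map<Function, DataSource> clusters;

    public ClusterRouter(Map<Function, DataSource> clusters) {
        this.clusters = clusters;
    }

    public Connection connect(Function function) throws SQLException {
        // Queries for one function can no longer slow down the others.
        return clusters.get(function).getConnection();
    }
}
```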
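The feed mirror sketch is a guess at the shape of the custom disaster-recovery copy: periodically fetch each feed's rendered XML and store it at the secondary site so it can serve slightly stale feeds if the primary data center is lost. The feed list, URLs, and paths are placeholders.

```java
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Sketch of copying rendered feeds down to remote servers for disaster recovery.
public class FeedMirror {
    public static void main(String[] args) throws Exception {
        List<String> feedNames = List.of("TechCrunch", "Slashdot"); // assumed feed list
        Path mirrorDir = Paths.get("/var/feedmirror");              // assumed local path
        Files.createDirectories(mirrorDir);

        for (String name : feedNames) {
            URL url = new URL("http://feeds.feedburner.com/" + name);
            try (InputStream in = url.openStream()) {
                // Overwrite the local copy; serving slightly stale feeds is acceptable for DR.
                Files.copy(in, mirrorDir.resolve(name + ".xml"),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```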