| Memcached |
Update 5: Nuts & Bolts: HAproxy . Nice explanation (post, screencast) by Mark Imbriaco of why HAProxy (load balancing proxy server) is their favorite (fast, efficient, graceful configuration, queues requests when Mongrels are busy) for spreading dynamic content between Apache web servers and Mongrel application servers.
Update 4: O'Rielly's Tim O'Brien interviews David Hansson, Rails creator and 37signals partner. Says BaseCamp scales horizontally on the application and web tier. Scales up for the database, using one "big ass" 128GB machine. Says: As technology moves on, hardware gets cheaper and cheaper. In my mind, you don't want to shard unless you positively have to, sort of a last resort approach.
Update 3: The need for speed: Making Basecamp faster. Pages now load twice as fast, cut CPU usage by a third and database time by about half. Results achieved by: Analysis, Caching, MySQL optimizations, Hardware upgrades.
Update 2: customer support is handled in real-time using Campfire.
Update: highly useful information on creating a customer billing system.
In the giving spirit of Christmas the folks at 37signals have shared a bit about how their system works. 37signals is most famous for loosing Ruby on Rails into the world and they've use RoR to make their very popular Basecamp, Highrise, Backpack, and Campfire products. RoR takes a lot of heat for being a performance dog, but 37signals seems to handle a lot of traffic with relatively normal sounding resources. This is just an initial data dump, they promise to add more details later. As they add more I'll update it here.
The primero recommendation for speeding up a website is almost always to add cache and more cache. And after that add a little more cache just in case. Memcached is almost always given as the recommended cache to use. What we don't often hear is how to effectively use a cache in our own products. MySQL hosted two excellent webinars (referenced below) on the subject of how to deploy and use memcached. The star of the show, other than MySQL of course, is Farhan Mashraqi of Fotolog. You may recall we did an earlier article on Fotolog in Secrets to Fotolog's Scaling Success, which was one of my personal favorites.
Fotolog, as they themselves point out, is probably the largest site nobody has ever heard of, pulling in more page views than even Flickr. Fotolog has 51 instances of memcached on 21 servers with 175G in use and 254G available. As a large successful photo-blogging site they have very demanding performance and scaling requirements. To meet those requirements they've developed a sophisticated approach to using memcached that others can learn from and emulate. We'll cover some of the highlightable strategies from the webinar down below the fold.

Lukas Biewald shares a fascinating slam by slam recount of how his FaceStat (upload your picture and be judged by the masses) site was battered by a link on Yahoo's main page that caused an almost instantaneous 650,000 page view jump on their site. Yahoo spends considerable effort making sure its own properties can handle the truly massive flow from the main page. Turning the Great Eye of the Internet towards an unsuspecting newborn site must be quite the diaper ready experience. Theo Schlossnagle eerily prophesized about such events in The Implications of Punctuated Scalabilium for Website Architecture: massive, unexpected and sudden traffic spikes will become more common as a fickle internet seeks ever for new entertainments (my summary). Exactly FaceStat's situation.
This is also one of our first exposures to an application written on Merb, a popular Ruby on Rails competitor. For those who think Ruby is the problem, their architecture now serves 100 times the original load.
How did our fine FaceStat fellowship fair against Yahoo’s onslaught?
FeedBurner is a news feed management provider launched in 2004. FeedBurner provides custom RSS feeds and management tools to bloggers, podcasters, and other web-based content publishers. Services provided to publishers include traffic analysis and an optional advertising system.
Update: Jake in Does Django really scale better than Rails? thinks apps like FFS shouldn't need so much hardware to scale.
In a short three months Friends for Sale (think Hot-or-Not with a market economy) grew to become a top 10 Facebook application handling 200 gorgeous requests per second and a stunning 300 million page views a month. They did all this using Ruby on Rails, two part time developers, a cluster of a dozen machines, and a fairly standard architecture. How did Friends for Sale scale to sell all those beautiful people? And how much do you think your friends are worth on the open market?
Pownce is a new social messaging application competing micromessage to micromessage with the likes of Twitter and Jaiku. Still in closed beta, Pownce has generously shared some of what they've learned so far. Like going to a barrel tasting of a young wine and then tasting the same wine after some aging, I think what will be really interesting is to follow Pownce and compare the Pownce of today with the Pownce of tomorrow, after a few years spent in the barrel. What lessons lie in wait for Pownce as they grow?
A fascinating and detailed story of how LiveJournal evolved their system to scale. LiveJournal was an early player in the free blog service race and faced issues from quickly adding a large number of users. Blog posts come fast and furious which causes a lot of writes and writes are particularly hard to scale. Understanding how LiveJournal faced their scaling problems will help any aspiring website builder.
Mixi is a fast growing social networking site in Japan. They provide services like: diary, community, message, review, and photo album. Having a lot in common with LiveJournal they also developed many of the same approaches. Their write up on how they scaled their system is easily one of the best out there.
memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Danga Interactive developed memcached to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. memcached dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.
This post covers two main options for scaling-out MySql and compare between them. The first is based on data-base clustering and the second is based on In Memory clustering a.k.a Data Grid. A special emphasis is given to a pattern which shows how to scale our existing data base without changing it through a combination of Data Grid and data base as a background service. This pattern is referred to as Persistency as a Service (PaaS). It also address many of the fequently asked question related to how performance, reliability and scalability is achieved with this pattern.
Update 2: a commenter in Twitter Fails Macworld Keynote Test said this entry needs to be updated. LOL. My uneducated guess is it's not a language or architecture problem, but more a problem of not being able to add hardware fast enough into their data center. The predictability of this problem is debatable, but once you have it, it's hard to fix.
Update: Twitter releases Starling - light-weight persistent queue server that speaks the MemCache protocol. It was built to drive Twitter's backend, and is in production across Twitter's cluster.
Twitter started as a side project and blew up fast, going from 0 to millions of page views within a few terrifying months. Early design decisions that worked well in the small melted under the crush of new users chirping tweets to all their friends. Web darling Ruby on Rails was fingered early for the scaling problems, but Blaine Cook, Twitter's lead architect, held Ruby blameless:
For us, it’s really about scaling horizontally - to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January.
If Ruby on Rails wasn't to blame, how did Twitter learn to scale ever higher and higher?
Update: added slides Small Talk on Getting Big. Scaling a Rails App & all that Jazz
Fotolog, a social blogging site centered around photos, grew from about 300 thousand users in 2004 to over 11 million users in 2007. Though they initially experienced the inevitable pains of rapid growth, they overcame their problems and now manage over 300 million photos and 800,000 new photos are added each day. Generating all that fabulous content are 20 million unique monthly visitors and a volunteer army of 30,000 new users each day. They did so well a very impressed suitor bought them out for a cool $90 million. That's scale meets success by anyone standards. How did they do it?
Slashdot effect: overwhelming unprepared sites with an avalanche of reader's clicks after being mentioned on Slashdot. Sure, we now have the "Digg effect" and other hot new stars, but Slashdot was the original. And like many stars from generations past, Slashdot plays the elder statesman's role with with class, dignity, and restraint. Yet with millions and millions of users Slashdot is still box office gold and more than keeps up with the young'ins. And with age comes the wisdom of learning how to handle all those users. Just how does Slashdot scale and what can you learn by going old school?
Caching is like aspirin for headaches. Head hurts: pop a 'sprin. Slow site: add caching. Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. But what's the best way for you to cache for your application? It's a remarkably complex and rich topic. Alexey Kovyrin talks about one common caching problem called the Dog Pile Effect in Dog-pile Effect and How to Avoid it with Ruby on Rails. Glenn Franxman also has a Django solution in MintCache.
Data is usually cached because it's too expensive to calculate for every hit. Maybe it's a gnarly SQL query you want to avoid and a little stale data is OK. Or maybe the amount of data you have is simply larger than physical memory on any one machine. Or maybe you have the temerity to write to your database and cause its cache to flush so database caching isn't sufficient at a certain level of scale.
Typical examples are for caching article vote counts, comment threads, and event streams. One familiar example that bit me hard is displaying the the top N blog articles. Do you want to scan through your entire access log table for every page display? Absolutely not. Especially when the nightly backups are going on and the network is very slow. Not good :-) Yet you still want to update the results every X minutes so the stats stay fresh.
Data freshness requires a refrigeration truck or an expiry time on your cache entry that causes stats to be periodically recalculated. Now, what happens when your cached data expires and a 1000 requests simultaneously try to recalculate the expensive to calculate data? Database load spikes and the world nearly ends. And since memcached operations are not atomic it's possible stale data could be cached and you'll serve stale data. Which kind of defeats of the purpose of taking load off the data while providing accurate data. So, how do you unpile the dogs?
Update 2: Michael Galpin in Cache Money and Cache Discussions likes memcached for it's expiry policy, complex graph data, process data, but says MySQL has many advantages: SQL, Uniform Data Access, Write-through, Read-through, Replication, Management, Cold starts, LRU eviction.
Update: Dormando asks Should you use memcached? Should you just shard mysql more?. The idea of caching is the most important part of caching as it transports you beyond a simple CRUD worldview. Plan for caching and sharding by properly abstracting data access methods. Brace for change. Be ready to shard, be ready to cache. React and change to what you push out which is actually popular, vs over planning and wasting valuable time.
Feedster's François Schiettecatte wonders if Fotolog's 21 memcached servers wouldn't be better used to further shard data by adding more MySQL servers? He mentions Feedster was able to drop memcached once they partitioned their data across more servers. The algorithm: partition until all data resides in memory and then you may not need an additional memcached layer.
Parvesh Garg goes a step further and asks why people think they should be using MySQL at all?
ThemBid provides a market where people needing work done broadcast their request and accept bids from people competing for the job. Unlike many of the sites profiled at HighScalability, ThemBid is not in the popular press as often as Paris Hilton. It's not a media darling or a giant of the industry. But what I like is they have a strategy, a point-of-view for building websites and were gracious enough to share very detailed instructions on how to go about building a website. They even delve into actual installation details of the various software packages they use. Anyone can benefit by taking a look at their work.
TypePad is considered the largest paid blogging service in the world. After experience problems because of their meteoric growth, they eventually transitioned to an architecture patterned after their sister company, LiveJournal.
Recent comments
1 hour 47 sec ago
2 hours 19 min ago
9 hours 56 min ago
10 hours 3 min ago
15 hours 42 min ago
20 hours 37 min ago
20 hours 45 min ago
20 hours 48 min ago