RoR

Todd Hoff's picture

Scaling Twitter: Making Twitter 10000 Percent Faster

Update 6: Some interesting changes from Twitter's Evan Weaver: everything in RAM now, database is a backup; peaks at 300 tweets/second; every tweet followed by average 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby optimization resistant so went with Scala; Thrift and HTTP are used internally; 100s of internal requests for every external request; rewrote MQ but kept interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards compatibility; switched to C memcached client for speed; optimize critical path; faster to get the cached results from the network memory than recompute them locally.
Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners. A fascinating discussion of why Twitter moved to the Java JVM for their server infrastructure (long lived processes) and why they moved to Scala to program against it (high level language, static typing, functional). Ruby is used on the front-end but wasn't performant or reliable enough for the back-end.
Update 4: Improving Running Components at Twitter by Evan Weaver. Tells how Twitter changed their infrastructure to go from handling 3 requests to 139 requests a second. They moved to a messaging model, asynchronous processing, 3 levels of cache, and moved their middleware to a mixture of C and Scala/JVM.
Update 3: Upgrading Twitter without service disruptions by Gojko Adzic. Lots of good updates on the new Twitter architecture.
Update 2: a commenter in Twitter Fails Macworld Keynote Test said this entry needs to be updated. LOL. My uneducated guess is it's not a language or architecture problem, but more a problem of not being able to add hardware fast enough into their data center. The predictability of this problem is debatable, but once you have it, it's hard to fix.
Update: Twitter releases Starling - light-weight persistent queue server that speaks the MemCache protocol. It was built to drive Twitter's backend, and is in production across Twitter's cluster.

Heroku - Simultaneously Develop and Deploy Automatically Scalable Rails Applications in the Cloud

Update 2: Heroku has gone live! Congratulations to the team. It's difficult right now to get a feeling for the relative cost and reliability of Heroku, but it's an impressive accomplishment and a viable option for people looking for a delivery platform.
Update: Heroku Architecture. A great interactive presentation of the Heroku stack. Requests flow into Nginx used as an HTTP Reverse Proxy. Nginx routes requests into a Varnish based HTTP cache. Then requests are injected into an Erlang based routing mesh that balances requests across a grid of dynos. Dynos are your application "VMs" that implement application specific behaviors. Dynos themselves are a stack of: POSIX, Ruby VM, App Server, Rack, Middleware, Framework, Your App. Applications can access PostgreSQL. Memcached is used as an application caching layer.

At Some Point the Cost of Servers Outweighs the Cost of Programmers

This is the intriguing quote by Bill Venners in an interview with Twitter's Alex Payne on Twitter's heretical switch from a pure Ruby stack to a Ruby on Rails stack on the front-end and JVM/Scala on the back-end:


So performance was also one of the problems with JRuby, which I [Bill Venners] think helps explain better why they'd [Twitter] prefer Scala over Ruby or JRuby for some things.

I have often heard Rubyists say that although Ruby is slower than Java, for many things it is plenty fast enough, and they are right. The logic goes further, saying that servers are cheap, and programmers expensive, so it makes sense to tradeoff some runtime performance for programmer productivity. And I think that's very often true too, but not always. If you have enough traffic, at some point the cost of servers outweighs the cost of programmers. I'm not sure whether Twitter is past that point, but they get a lot of traffic. And frankly this isn't an intrinsic tradeoff. Other dynamic languages are faster than Ruby, and Scala is too. And people can be quite productive in these other languages too, including Scala.

I feel Alex's Max Payne. You might wonder why the geekosphere cares so passionately which technology stack Twitter uses. Well, it's Twitter and it's Ruby on Rails. That's like the Lindsay Lohan and Samantha Ronson of tech buzz. It creates its own self-sustaining posting reaction. Boom!

It took some giant cojones to switch from a well defended platform like Ruby on Rails to an obscure language like Scala. Few people would have been brave enough to pull the trigger on that decision.

Twitter didn't take this large leap out of ignorance or incompetence. Twitter's Steve Jenson said they "spent several weeks going over our options, running extensive load tests, and presented our findings to the team at each stage. We did our due diligence."

They did the work and came to conclusions valid for their situation. They have to follow their own bliss. They aren't telling you to use Scala. They aren't telling you not to use Ruby. Have at it. But they have chosen the path less traveled and seem happy with the direction they are heading. If you aren't happy with their decision then that's a you problem, not a them problem.

13 Screencasts on How to Scale Rails

Gregg Pollack has made 13 screencasts on how to scale Rails:

  • Episode #1 - Page Responsiveness
  • Episode #2 - Page Caching
  • Episode #3 - Cache Expiration
  • Episode #4 - New Relic RPM
  • Episode #5 - Advanced Page Caching
  • Episode #6 - Action Caching
  • Episode #7 - Fragment Caching
  • Episode #8 - Memcached
  • Episode #9 - Taylor Weibley & Databases
  • Episode #10 - Client-side Caching
  • Episode #11 - Advanced HTTP Caching
  • Episode #12 - Jesse Newland & Deployment
  • Episode #13 - Jim Gochee & Advanced RPM

    For a good InfoQ interview with Gregg take a look at Gregg Pollack and the How-To of Scaling Rails.

  • Building and Scaling a Startup on Rails: 12 Things We Learned the Hard Way

    Garry Tan, cofounder of Posterous, lists 12 lessons for scaling that apply to more than just Rails.

  • Use cloud storage for static files.
  • Use HTTP Cache Control to tell the browser what it can cache.
  • Use Sphinx for text search.
  • Use InnoDB for more crash resistant and faster writes.
  • Don't use textbook Rails ActiveRecord objects. Use New Relic to find exactly what is slow in your system.
  • Use memcache later so you find your database bottlenecks now.
  • Use mongrel proctitle to find your slow queries. You are only as fast as your slowest queries.
  • Use asynchronous job queuing to do work in parallel.
  • Use monitoring so you'll know when your site went down and why.
  • Learn by reading the source code, fixing problems, and submitting them back to the community.
  • Use new plugins. Old plugins can't be trusted.
  • Use new information. Old information can't be trusted.
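
The advice to use asynchronous job queuing can be illustrated with a minimal sketch. This is not Posterous's setup: the `start_workers` helper, the tuple job format, and the in-process `queue.Queue` are all assumptions made for illustration. A production Rails site would more likely use a separate queue server and worker processes.

```python
import queue
import threading


def start_workers(jobs, results, count=2):
    """Start `count` threads that pull (func, args) jobs until they see None.

    Hypothetical helper for illustration only.
    """
    def worker():
        while True:
            job = jobs.get()
            try:
                if job is None:      # sentinel: this worker should stop
                    return
                func, args = job
                results.append(func(*args))
            finally:
                jobs.task_done()     # always mark the job as handled

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


if __name__ == "__main__":
    jobs, results = queue.Queue(), []
    workers = start_workers(jobs, results)
    for n in range(5):
        jobs.put((pow, (n, 2)))      # enqueue and return immediately
    jobs.join()                      # wait only when the results are needed
    print(sorted(results))
```

The request path only pays for the `put`; the slow work happens on the workers, which is the point of the lesson.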

  • Olio Web2.0 Toolkit - Evaluate Web Technologies and Tools

    How do you evaluate and decide which web technologies (and there are myriads out there) to use for your new web application, which one potentially gives you the best performance, which one will likely give you the shortest time-to-market? The Apache incubator project Olio might help.

    Olio is an open source web 2.0 toolkit to help evaluate the suitability, functionality and performance of web technologies. Olio defines an example web 2.0 application (an events site somewhat like yahoo.com/upcoming) and provides three initial implementations: PHP, Java EE and Ruby on Rails (RoR). The toolkit also defines ways to drive load against the application in order to measure performance.

    Apache Olio could be used to

    • Understand how to use various web 2.0 technologies such as AJAX, memcached, mogileFS etc. Use the code in the application to understand the subtle complexities involved and how to get around issues with these technologies.
    • Evaluate the differences in the three implementations (PHP, Ruby and Java) to understand which might work best for your situation.
    • Within each implementation, evaluate different infrastructure technologies by changing the servers used (e.g. Apache vs lighttpd, MySQL vs PostgreSQL, Ruby vs JRuby etc.)
    • Drive load against the application to evaluate the performance and scalability of the chosen platform.
    • Experiment with different algorithms (e.g. memcache locking, a different DB access API) by replacing portions of code in the application.

    Olio started its life as the web2.0kit developed by Sun Microsystems in collaboration with the U.C. Berkeley RAD Lab and was presented at Velocity 2008.

    How to Scale with Ruby on Rails

    By George Palmer of 3dogsbark.com. Covers:
    * How you start out: shared hosting, web server and DB on the same machine. Move to 2 machines. Minimal code changes.
    * Scaling the database. Add read slaves on their own machines. Then master-master setup. Still minimal code changes.
    * Scaling the web server. Load balance against multiple application servers. Application servers scale but the database doesn't.
    * User clusters. Partition and allocate users to their own dedicated cluster. Requires substantial code changes.
    * Caching. A large percentage of hits are read only. Use reverse proxy, memcached, and language specific cache.
    * Elastic architectures. Based on Amazon EC2. Start and stop instances on demand. For global applications keep a cache on each continent, assign users to clusters by location, maintain app servers on each continent, use transaction replication software if you must replicate your site globally.
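
The "user clusters" step is the one that needs substantial code changes, because every request must first find out which cluster owns the user. A minimal sketch of that lookup follows. The `UserDirectory` class, the cluster names, and the least-loaded assignment policy are assumptions for illustration, not details from the presentation.

```python
class UserDirectory:
    """Maps each user to one dedicated cluster (hypothetical sketch)."""

    def __init__(self, clusters):
        # How many users each cluster currently holds.
        self._counts = {name: 0 for name in clusters}
        # user_id -> cluster name; in practice this would live in a small shared store.
        self._assignment = {}

    def cluster_for(self, user_id):
        """Return the user's cluster, assigning the least-loaded one on first sight."""
        if user_id not in self._assignment:
            target = min(self._counts, key=self._counts.get)
            self._assignment[user_id] = target
            self._counts[target] += 1
        return self._assignment[user_id]


if __name__ == "__main__":
    directory = UserDirectory(["cluster-a", "cluster-b"])
    for user_id in (1, 2, 3, 1):
        print(user_id, directory.cluster_for(user_id))
```

Because the assignment is stored rather than computed from the user ID, a user can later be moved to another cluster by updating one entry, at the cost of an extra lookup per request.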

    37signals Architecture

    Update 6: Things We’ve Learned at 37Signals. Themes: less is more; don't worry be happy.
    Update 5: Nuts & Bolts: HAProxy. Nice explanation (post, screencast) by Mark Imbriaco of why HAProxy (load balancing proxy server) is their favorite (fast, efficient, graceful configuration, queues requests when Mongrels are busy) for spreading dynamic content between Apache web servers and Mongrel application servers.
    Update 4: O'Reilly's Tim O'Brien interviews David Hansson, Rails creator and 37signals partner. Says Basecamp scales horizontally on the application and web tier. Scales up for the database, using one "big ass" 128GB machine. Says: As technology moves on, hardware gets cheaper and cheaper. In my mind, you don't want to shard unless you positively have to, sort of a last resort approach.
    Update 3: The need for speed: Making Basecamp faster. Pages now load twice as fast, cut CPU usage by a third and database time by about half. Results achieved by: Analysis, Caching, MySQL optimizations, Hardware upgrades.
    Update 2: customer support is handled in real-time using Campfire.
    Update: highly useful information on creating a customer billing system.

    In the giving spirit of Christmas the folks at 37signals have shared a bit about how their system works. 37signals is most famous for loosing Ruby on Rails into the world and they've used RoR to make their very popular Basecamp, Highrise, Backpack, and Campfire products. RoR takes a lot of heat for being a performance dog, but 37signals seems to handle a lot of traffic with relatively normal sounding resources. This is just an initial data dump; they promise to add more details later. As they add more I'll update it here.

    Scaling Bumper Sticker: A 1 Billion Page Per Month Facebook RoR App

    Several months ago I attended a Joyent presentation where the spokesman hinted that Joyent had the chops to support a one billion page per month Facebook Ruby on Rails application. Even under a few seconds of merciless grilling he would not give up the name of the application. Now we have the big reveal: it was LinkedIn's Bumper Sticker app. For those not currently sticking things on bumps, Bumper Sticker is quite surprisingly a viral media sharing application that allows users to express their individuality by sticking small virtual stickers on Facebook profiles. At the time I was quite curious how Joyent's cloud approach could be leveraged for this kind of app. Now that they've released a few details, we get to find out.

    Strategy: Break Up the Memcache Dog Pile

    Caching is like aspirin for headaches. Head hurts: pop a 'sprin. Slow site: add caching. Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. But what's the best way for you to cache for your application? It's a remarkably complex and rich topic. Alexey Kovyrin talks about one common caching problem called the Dog Pile Effect in Dog-pile Effect and How to Avoid it with Ruby on Rails. Glenn Franxman also has a Django solution in MintCache.

    Data is usually cached because it's too expensive to calculate for every hit. Maybe it's a gnarly SQL query you want to avoid and a little stale data is OK. Or maybe the amount of data you have is simply larger than physical memory on any one machine. Or maybe you have the temerity to write to your database and cause its cache to flush so database caching isn't sufficient at a certain level of scale.

    Typical examples are caching article vote counts, comment threads, and event streams. One familiar example that bit me hard is displaying the top N blog articles. Do you want to scan through your entire access log table for every page display? Absolutely not. Especially when the nightly backups are going on and the network is very slow. Not good :-) Yet you still want to update the results every X minutes so the stats stay fresh.

    Data freshness requires a refrigeration truck or an expiry time on your cache entry that causes stats to be periodically recalculated. Now, what happens when your cached data expires and 1000 requests simultaneously try to recalculate the expensive-to-calculate data? Database load spikes and the world nearly ends. And since memcached operations are not atomic it's possible stale data could be cached and you'll serve stale data. Which kind of defeats the purpose of taking load off the database while providing accurate data. So, how do you unpile the dogs?
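
One common way to unpile the dogs is to let a single request recalculate an expired entry while everyone else keeps getting the old value. The sketch below shows that idea only; it is not the code from either linked article. The `DogPileCache` class is hypothetical, and a plain dictionary stands in for memcached so the example is self-contained.

```python
import threading
import time


class DogPileCache:
    """In-memory stand-in for memcached (an assumption made for this sketch)."""

    def __init__(self):
        self._data = {}            # key -> (value, expiry timestamp)
        self._recomputing = set()  # keys some caller is recalculating right now
        self._guard = threading.Lock()

    def fetch(self, key, ttl, compute, now=time.time):
        with self._guard:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now():
                return entry[0]    # still fresh
            if entry is not None and key in self._recomputing:
                return entry[0]    # expired, but someone is on it: serve stale
            self._recomputing.add(key)
        try:
            value = compute()      # the expensive work runs outside the lock
            with self._guard:
                self._data[key] = (value, now() + ttl)
            return value
        finally:
            with self._guard:
                self._recomputing.discard(key)


if __name__ == "__main__":
    cache = DogPileCache()
    top_articles = lambda: ["article-1", "article-2"]  # placeholder for a slow query
    print(cache.fetch("top_n", ttl=300, compute=top_articles))
```

With this shape an expired key costs one recalculation instead of a thousand. The sketch does not protect a completely cold key: if many requests arrive before any value exists, each of them still computes.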
