Several months ago I attended a Joyent presentation where the spokesman hinted that Joyent had the chops to support a one billion page per month Facebook Ruby on Rails application. Even under a few seconds of merciless grilling he would not give up the name of the application. Now we have the big reveal: it was LinkedIn's Bumper Sticker app. For those not currently sticking things on bumps, Bumper Sticker is quite surprisingly a viral media sharing application that allows users to express their individuality by sticking small virtual stickers on Facebook profiles. At the time I was quite curious how Joyent's cloud approach could be leveraged for this kind of app. Now that they've released a few details, we get to find out.
The Platform
Ruby on Rails (rapid prototype development approach)
Facebook
Joyent Accelerator - provides a highly scalable on-demand infrastructure for running web sites, including rich web applications written in Ruby on Rails, PHP, Python, and Java. Joyent Accelerators are next-generation virtual computers that can grow and multiply (or shrink and consolidate) depending on the real-world demands faced by your web application. Accelerators are built on OpenSolaris and run on multi-core (8+ core), RAM-rich (32GB+) servers backed by vast amounts of NAS storage.
Masochism Plugin - provides an easy solution for Ruby on Rails applications to work in a replicated database environment. Connection proxy sends some database queries (those in a transaction, update statements, and ActiveRecord::Base#reload) to a master database, and the rest to the slave database.
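For context, the setup looks roughly like this in a Rails app of that era; the call and the configuration keys follow the Masochism README, and should be read as a sketch rather than Bumper Sticker's actual configuration:

```ruby
# config/environment.rb (at the bottom) -- sketch of the Masochism
# plugin's documented setup; exact names come from its README and are
# assumptions here, not Bumper Sticker's real config.
#
# config/database.yml gains a master entry alongside the normal one:
#   production:         # slave -- ordinary SELECTs go here
#     adapter: mysql
#     host: db-slave.example.com
#   master_database:    # master -- writes, transactions, #reload
#     adapter: mysql
#     host: db-master.example.com

# Swap ActiveRecord's connection for a proxy that routes reads to the
# slave and anything write-ish (or transactional) to the master.
ActiveReload::ConnectionProxy.setup!
```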
The Stats
1 billion page views per month
13.5 million installations
1.5 million daily active users. Recruited 1 million users in first 46 days.
20-27 million canvas page views a day
13 web application servers running Nginx and Mongrel
8 static asset servers serving over 3,500,000 stickers (migrating to a CDN)
4 MySQL servers in a master/slave configuration using Masochism as a proxy to load balance database operations.
Cost is about $25K/month.
The Architecture
Bumper Sticker was an experiment to see how fast the Light Engineering Development (LED) team at LinkedIn could build a Ruby on Rails Facebook application.
RoR was an easy environment to prototype in, but they needed a production environment in which they could quickly develop, deploy, and scale. Joyent was selected.
Some Notes on Joyent:
Joyent is a scale-on-demand cloud. It lets customers have a dynamic data center instead of being stuck with their own rigid infrastructure.
There's an API if you need one. The service is unmanaged; you get root on all your boxes.
They consider their infrastructure better and more open than Amazon's. You get access to a high-end load balancer and the capabilities of OpenSolaris (DTrace, Zones, lower request processing overhead, sub-10-second reboot times).
Joyent's primary scalability principle is to organize apps around silos built from their powerful Accelerator blocks: put applications on different servers based on the quality of service you want to give them. For example, put static content on its own servers so the static content is always served fast and reliably. This allows you to prioritize based on what's important to you. You could, for example, prioritize the virality of your application by putting the Invite Friends functionality on its own servers, thus assuring the growth of your application through your viral functionality, possibly at the expense of less important functionality.
Has three data centers in the US and is opening a fourth; none in Europe.
Considers its secret sauce to be its highly sophisticated administration system, which allows a few people to easily manage a large infrastructure.
Has a peering relationship with Facebook. That means there are direct high-speed fiber links between Joyent's data center in Emeryville and Facebook's data center in San Francisco.
80% of the content for Bumper Sticker is static. The Facebook API can directly render content at a specified memory location. Bumper Sticker was able to use the scripting feature (iRules) of the F5 BIG-IP load balancer to serve static content directly by passing a pointer to the Facebook API.
The Lessons
Rails scales exactly like any other app. Take into account all the components from the moment the request is received at the load balancer all the way down and all the way back again.
The development process is: put some measurements in place, find problems, fix problems, more people adopt the app and scale you out of your solution, and the cycle repeats. Sun's DTrace makes it easy to instrument the stack to identify bottlenecks.
Rails scales as long as the development team using it understands that many of the bottlenecks are exactly those faced by developers on any other database-driven web platform.
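The classic instance of such a bottleneck is the N+1 query problem, which hits Rails exactly as it hits any ORM on any platform. A sketch in Rails 2-era idiom, with hypothetical User/Sticker models:

```ruby
# Hypothetical models to illustrate the point; this is not Bumper
# Sticker's schema.

# N+1: one query for the users, plus one stickers query per user --
# the database, not Rails, is what buckles.
User.find(:all).each do |user|
  puts user.stickers.size
end

# Eager loading collapses the work into a fixed number of queries.
User.find(:all, :include => :stickers).each do |user|
  puts user.stickers.size
end
```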
Hit a disk spindle and you are screwed. Avoid going to the database or the file system. The more they avoided disk the fewer timeouts they experienced.
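The write-up doesn't show code, but in Rails terms avoiding the disk usually means keeping hot reads in memory. A minimal sketch, assuming a memcached-backed Rails.cache (Rails 2.1+) and a hypothetical Sticker model:

```ruby
# Hypothetical example: serve a hot list from memcached so repeated
# requests never touch a MySQL spindle.
class Sticker < ActiveRecord::Base
  def self.most_popular(limit = 10)
    Rails.cache.fetch("stickers/most_popular/#{limit}",
                      :expires_in => 5.minutes) do
      find(:all, :order => "installs_count DESC", :limit => limit)
    end
  end
end
```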
Convert anything dynamic into static content. Dynamic content is your enemy; turning it into static content removes it from the disk path.
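One standard Rails mechanism for this (the write-up doesn't confirm it's what Bumper Sticker used) is page caching: the first request renders the action and writes the HTML under public/, and from then on the front-end server (Nginx here) serves the file without invoking Rails at all. A sketch with a hypothetical StickersController:

```ruby
# Hypothetical controller showing Rails page caching: after the first
# render, public/stickers/:id.html exists and Nginx serves it directly,
# keeping Rails and the database out of the request path.
class StickersController < ApplicationController
  caches_page :show

  def show
    @sticker = Sticker.find(params[:id])
  end

  def update
    @sticker = Sticker.find(params[:id])
    @sticker.update_attributes(params[:sticker])
    expire_page :action => 'show', :id => @sticker  # re-render on next hit
    redirect_to :action => 'show', :id => @sticker
  end
end
```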
Push content to the edge. Move content as close to the client as possible. Move cache to the CDN. Reduce time going across the network.
Faster means more viral. On a viral system, the better the performance, the more people can play with your application. The more people who play with your system, the more likely they are to pull in more people, which means the more the app will spread and go viral. Bumper Sticker has been successful at creating a community of fans who enjoy uploading and sharing their own stickers.
Some issues:
Since most of the content is static and served by the load balancer, the impact of Rails in the system is not clear.
The functionality of Bumper Sticker is relatively simple. What would the impact be on scalability if other often requested features like search were added?