Update: Jake in Does Django really scale better than Rails? thinks apps like FFS shouldn't need so much hardware to scale.
In a short three months Friends for Sale (think Hot-or-Not with a market economy) grew to become a top 10 Facebook application handling 200 gorgeous requests per second and a stunning 300 million page views a month. They did all this using Ruby on Rails, two part time developers, a cluster of a dozen machines, and a fairly standard architecture. How did Friends for Sale scale to sell all those beautiful people? And how much do you think your friends are worth on the open market?
Site: http://www.facebook.com/apps/application.php?id=7019261521
Our system is designed for our Facebook application, Friends for Sale.
It's basically Hot-or-Not with a market economy. At the time of this
writing it's the 10th most popular application on Facebook.
Their Facebook description reads: Buy and sell your friends as pets! You can make your pets poke, send gifts, or just show off for you.
Make money as a shrewd pets investor or as a hot commodity! Friends for Sale is the bees knees!
We designed this as more of an experiment to see if we understood
virality concepts and metrics on Facebook. I guess we do. =)
As a Facebook application, every request is dynamic so no page caching
is possible. Also, it is a very interactive, write heavy application
so scaling the database was a challenge.
We memcached extensively early on - every page reload results in 0 SQL
calls. We use Rail's fragment caching with custom expiration logic
mostly.
We had more than half a million unique visitors yesterday and growing fast.
We're on track to do more than 300 million page views this month.
We used around 3 terabytes of bandwidth last month. This month should
be at least 5TB or so. This number is just for a few icons and
XHTML/CSS.
We don't really have unique documents ... we do have around 10 million
user profiles though.
The only images we store are a few static image icons.
We went from around 3M page views per day a month ago to more than 10M
page views a day. A month before that we were doing 1M page views per
day. So that's around a 300% monthly growth rate but that is
plateauing. On a request per second basis, we get around 200 requests
per second.
It's all free.
It's around 1% per day, with a growth rate of 3% or so per day in
terms of installed users.
We had roughtly 2.1 million unique visitors in the past month
according to Google.
It's a relatively standard Rails cluster. We have a dedicated front
end proxy balancer / static web server running nginx, which proxies
directly to 6, 4 core 8 GB application servers. Each application
server runs 16 mongrels for a total of 96 mongrels. The front end load
balancer proxies directly to the mongrel ports. In addition, we run a
4 GB memcache instance on each application server, along with a local
starling distributed queue server and misc background processes.
We use god to monitor our processes.
On the DB layer, we have 2 32GB 4 core servers with 4x 15K SCSI RAID
10 disks in a master-slave setup. We use Dr Nic's magic
multi-connection's gem in production split reads and writes to each
box.
We are adding more slaves right now so we can distribute the read load
better and have better redundancy and backup policies. We also get
help from Percona (the mysqlperformanceblog guys) for remote DBA work.
We're hosted on Softlayer - they're a fantastic host. The only problem
was that their hardware load balancing server doesn't really work very
well ... we had lots of problems with hanging connections and latency.
Switching a dedicated box running just nginx fixed everything.
It really isn't. On the application layer we are shared-nothing so
it's pretty trivial. On the database side we're still with a
monolithic master and we're trying to push off sharding for as long as
we can. We're still vertically scaled on the database side and I think
we can get away with it for quite some time.
The three things that are unique is -
1. Neither of the two developers in involved had previous experience in large scale Rails deployment.
2. Our growth trajectory is relatively rare in the history of Rails deployments
3. We had very little opportunity for static page caching - each request does hit the full Rails stack
We learned that a good host, good hardware, and a good DBA are very
important. We used to be hosted on Railsmachine, which to be fair is
an excellent shared hosting company and they did go out of there way
to support us. In the end though, we were barely responsive for a good
month due to hardware problems, and it only took two hours to get up
and running on Softlayer without a hitch. Choose a good host if you
plan on scaling, because migrating isn't fun.
The most important thing we learned is that your scalability problems
is pretty much always, always, always the database. Check it first,
and if you don't find anything, check again. Then check again. Without
exception, every performance problem we had can be traced to the
database server, the database configuration, the query, or the use and
non-use of indices.
We definitely should have gotten on to a better host earlier in the
game so we would have been up.
We definitely wouldn't change our choice of framework - Rails was
invaluable for rapid application development, and I think we've pretty
much proven that two guys without a lot of scaling experience can
scale a Rails app up. The whole 'but does Rails scale?' discussion
sounds like a bunch of masturbation - the point is moot.
We have two Rails developers, inclusive of me. We very recently
retained the services of a remote DBA for help on the database end.
On the technical side, 2 part time (now full time), and 1 remote DBA
contractor.
The full time employees are also located in the SOMA area of San Francisco.
The two developers server as co-founders . I (Siqi) was responsible
for front end design and development early on, but since I had some
experience with deployment I also ended up handling network operations
and deployment as well. My co founder Alex is responsible for the bulk
of the Rails code - basically all the application logic is from him.
Now I find myself doing more deep back end network operations tasks
like MySQL optimization and replication - it's hard to find time to
get back to the front end which is what I love. But it's been a real
fun learning experience so I've been eating up all I can from this.
Yes - basically find the smartest people you can, give them the best
deal possible, and get out of their way. The best managers GET OUT OF
THE WAY, so I try to run the company as much as I can with that in
mind. I think I usually fail at it.
We'd have to have some really good communication tools in the cloud -
somebody would have to be a Basecamp nazi. I think remote work /
outsourcing is really difficult - I prefer to stay away with from it
for core development. For something like MySQL DBA or even sysadmin -
it might make more sense.
We use Rails with a bunch of plugins, most notable cache-fu from Chris
Wanstrath and magic multi connections from Dr. Nic. I use VIM as the
editor with the rails.vim plugin.
Ruby / Rails
We now have 12 servers in the cluster.
4 DB servers, 6 application servers, 1 staging server, and 1 front end server.
We order them from Softlayer - there's a less than 4 hour turn around
for most boxes, which is awesome.
CentOS 5 (64 bit)
nginx
MySQL 5.1
We just use nginx's built in proxy balancer.
We use a dedicated hosting service, Softlayer.
We use NAS for backups but internal SCSI drives for our production boxes.
Across all of our boxes we probably have around ... 5 TB of storage or
thereabouts.
Ad-hoc. We haven't done a proper capacity planning study, to our detriment.
Nope.
Nope.
Right now we just persist it to the database - it would be fairly easy
to use memcache directly for this purpose though.
Master/slave right now. We're moving towards a Master/Multi-slave with
a read only load balancing proxy to the slave cluster.
We do it in software via nginx.
Rails.
None.
Starling
We run network ads. We also weight our various ad networks by eCPM on
our application layer.
Nope.
2 developers.
Me: Front end design, development, limited Rails. Obviously, recently
proficient in MySQL optimization and large scale Rails deployment.
Alex: application logic development, front end design, general
software engineering.
Alex develops on OSX while I develop on Ubuntu. We use SVN for version
control. I use VIM for editing and Alex uses TextMate.
On the logic layer, it's very test driven - we test extensively. On
the application layer, it's all about quick iterations and testing.
We cache both in memcache with no TTL, and we just manually expire.
None.
We use Pingdom for external website monitoring - they're really good.
Right now we're just relying on our external monitoring and
Softlayer's ping monitoring. We're investigating FiveRuns for
monitoring as a possible solution to server monitoring.
We don't.
We deploy to staging and run some sanity tests, then we do a deploy to
all application servers.
We trace back every SQL query in development to make sure we're not
doing any unnecessary calls or model instantiations. Other than that,
we haven't done any real benchmarking.
Carefully.
User feedback and critical thinking. We are big believers in
simplicity so we are pretty careful to consider before we add any
major features.
We use a home grown metrics tracking system for virality optimization,
and we also use Google Analytics.
Yes, from the time to time we will tweak aspects of our design to
optimize for virality.
Don't know to all of the above.
We use LVM to do incrementals on a weekly and daily basis.
Right now they are done manually, except for new Rails application
deployments. We use capistrano to update and restart our application
servers.
We usually migrate on a slave first and then just switch masters.
Not very good.
Oh we wish.
Nope
CPM - more page views more money. We also have incentivized direct
offers through our virtual currency.
Word of mouth - the social graph. We just leverage viral design tactics to grow.
I think Ruby is pretty particularly cool. But no, not really - we're
not doing rocket science, we're just trying to get people laid.
No, that wouldn't be very smart.
Hm. I'd say none if you haven't scaled up anything before, and a lot
if you have. It's hard to know what's actually going to be the problem
until you've actually been through and see what real load problems
look like. Once you've done that, then you have enough domain
knowledge to do some actual meaningful up front design on our next go
around.
How unreliable vendor hardware can be, and how different support can
be from host to host. The number one most important thing you will
need is a scaled up dedicated host who can support your needs. We use
Softlayer and we can't recommend them highly enough.
On the other hand, it's surprising how far just a master-multislave
setup can take you on commodity hardware. You can easily do a Billion
page views per month on this setup.
It doesn't really, we just fix bottle necks as they come and we see them coming.
Brad Fitzpatrick for inventing memcache, and anyone who has
successfully horizontally scaled anything.
We will have to start sharding by users soon as we hit database size
and write limits.
I'd really like to thank Siqi taking the time to answer all my questions and provide this fascinating look in to their system. It's amazing what you've done in so little time. Excellent job and thanks again.
Comments
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
I'd love to know how they funded this project. Since they're not charging for anything, how can they afford "4 DB servers, 6 application servers, 1 staging server, and 1 front end server"? How can they afford "3 terabytes of data" a month?
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
It pretty much funds it self. 10MM pages a day, times some reasonable CPM. You can kind of do the math.
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Very interesting. Thanks for the interview and thanks for the insightful answers!
Rails doesn't scale
Um you guys probably weren't at the meeting when it was decided that "Rails doesn't scale". I'll forgive you this once, but don't let me catch you scaling Rails again.
Re: Friends for Sale Architecture
How many lines of code? Test/code ratio? Coverage stats? What plugins?
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
My roommate is addicted to friends for sale.. he's always on it, scouting out hot girls with cheap pictures because "he'll make a lot of money off them" I think its a lot of work for $1000 of virtual cash, but hey, I wasn't that in to Hot or Not either. great job for just being two people.
Re: Friends for Sale Architecture
Thanks for the fascinating interview! Good luck scaling even further!
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
"I'd love to know how they funded this project. Since they're not charging for anything, how can they afford "4 DB servers, 6 application servers, 1 staging server, and 1 front end server"? How can they afford "3 terabytes of data" a month?"
Their setup costs less than a Honda Civic..
-- Greg
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
may i point out the notice on homepage? "Are we down again? Call our ghetto monitoring system at 714-423-2748!"
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Nice info and comments!
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
yes, thanks for the insights, guys
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
96 mongrels means they just handle 96 simultaneous users. Using 6 quad-core 8Gb machines this number is simply ridiculous!
Any share-nothing system scales. 1 user-per-box scales, but c'mon! do the math!
Indeed "Rails doesn't scale"!
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Lucindo:
You are a little confused about how Mongrels work in Rails. 96 mongrels means that we can handle, on average:
96 * (1 second)/(avg request time in seconds) requests, which is about 400 requests per second, which is probably a couple thousand 'simultaneous users'.
People who say Rails doesn't scale - can't scale.
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Siqi Chen:
I think you are a little confused with the difference between simultaneous users and requests per second. Rails isn't thread-safe, this means each process (mongrel, fastcgi, or whatever you use to serve a rails request) will handle one user at time.
If you take 240ms to complete a request, with 96 process you can handle 400 requests per second on average. But you really can only have 96 simultaneous requests.
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Indeed, memcached scales wonderfully. It's a shame that rails is verty limited due to its nature, otherwise we would have another good tool for high load applications. To use such huge servers for that low amount of connections is really a shame (4GB for memcached + 4GB for system/rails for 16 simultaneous sessions).
Keep up the good work and let us know when you manage to go beyond the 16 users barriers. I'm very interested to check what rails are doing to get better.
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Lucindo:
No, I'm quite clear on the difference. How are you defining 'simultaneous users'? That's a very different metric than the 'simultaneous requests' term you just threw out.
You acknowledge that we can handle 400 requests per second. In practice, an actual user sends out a request maybe once every 10-30 seconds, so 400 requests a second translates to thousands of simultaneous, actual users.
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Siqi Chen:
Simply as this: if 99 users send a request to the system at the very same time, one will have to wait. If you prefer to call this "simultaneous requests", ok. A system with these hardware handling at most 98 simultaneous requests is simply ridiculous.
But I'm pretty sure the problem is Rails itself, not the application.
OMFG !
Lucindo, first of all your claims about thread safety are way too ridiculous. Obviously, you have never heard of Buzzwords like Copy-on-write, Event Based Web server, Multi-core ( write them down, you might be able to use them in your next trollfest article ;-) ).
If you want to waste 50 years to earn 10 M$ and buy an F1 car to commute to work, may the god bless you.
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
oy, rails troll ahoy. pratik needs to learn how to read.
Re: OMFG !
Pratik:
I've never used Rails or Mongrel, but from what I've read above it seems you're advocating an "Event Based Web server" that uses a thread-per-request strategy instead of a multiplexed I/O strategy one, which sounds pretty inconsistent. This means the OS is handling the events, NOT the blocking I/O server in question. In case you're backed by a good threads implementation (NPTL, whatever) then your cost will be the additional stack-per-client. Obviously, you have never heard of Buzzwords like ACE, Grizzly, Erlang, Stackless and even Unify ;-)
Re: OMFG !
Ricardo,
Clearly you're not qualified enough to speak about Rails as you have never used it :) Now, if you have a tiny bit of idea about Ruby, you'd know that Ruby uses green threads and not OS level threads.
But hey, sometimes knowing the buzzword is just not enough. For example, you mention Erlang without knowing a jackshit about light weight concurrency! Please google it.
Re: OMFG !
Pratik,
Please don't say you think Erlang is based on green threads in the same sense as Ruby. Threads are allowed to share state, Erlang's "green processes" are not (but, hey, copy-on-write, as you mentioned, is a useful buzzword). Google for Joe Armstrong's thesis.
Why is it that average railers feel omniscient ? I've got to try it someday ... and as a bonus become "qualified", because I'm just one more sinner that used Perl's Maypole before DHH took inspiration from it and thus am not surprised by RoR. Btw, your strong belief that "I know jackshit" just confirms what I've said.
PS: In case you didn't realize, I'm just mirroring your attitude, you probably know who Joe Armstrong and Doug Schmidt are. Let's all be friends now.
Re: OMFG
"In case you're backed by a good threads implementation (NPTL, whatever) then your cost will be the additional stack-per-client. Obviously, you have never heard of Buzzwords like .....ERLANG....... ;-)"
NPTL and Erlang's processes
More than this far from each other ;-)
It's not about feeling omniscient, it's about being so sick and tired of non-rails people talking about scaling without never having used it themselves. Lame, innit ?
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Patrik:
Don't feel personally offended by what I said. I'm just making a point about rails scalability, but first lets define some concepts[1]:
- scalability: is a desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged.
- scale up: To scale vertically means to add resources to a single node in a system.
- scale out: To scale horizontally means to add more nodes to a system.
On a real live system you have to worry about the two means of scalability: scale out and scale up. And why is that?
Take a system that handle one request per node each time and scales out easily. When the load on your system grows the only thing you can do is scale out, adding more nodes. Soon you'll see that is impracticable, you need to be able to scale up also.
The problem with rails is: it is a non thread-safe behemoth. I may be wrong, but what you think the system described on this post runs only 16 mongrels per node? And 16 mongrels means 16 simultaneous requests. This is the only point I'm trying to make.
And I'm not considering development time and all other Rails claims, or if it is good or any other Rails merit. As any tool, Rails isn't suitable for all jobs.
If Rails was thread-safe you could run just one mongrel and use the ruby green threads (maybe hundreds of it), and even the event driven mongrel.
REFS:
[1] http://en.wikipedia.org/wiki/Scalability
Re: Friends for Sale Architecture
Lucindo,
Yeah, 16 mongrels == 16 concurrent requests. But *practically* speaking, overall throughput is of much more importance than that.
Also, ruby uses green threads which suffer from IO blocking. Due to that, thread never give true concurrency and you need to run multiple processes in order to exhaust available cpu/memory.
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Love the Application, my friends and I are stating careers in Investment Banking and Trading, so find it a great way relax and unwind. Our trades are mostly within a select circle of mutual friends. A suggestion (if not already in progress) would be a friends online now on friends for sale feature.
Keep up the good work
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Pratik: AFAIK, Ruby's green threads do not suffer from IO blocking. There is a layer in the Ruby runtime that translates Ruby-side blocking IO ops into non-blocking system IO and until the IO is completed, only the Ruby code is blocked, the VM is free to run another Ruby thread.
Friends for Sale
How to easy earn money at Friends for sales??? Please teach me... i need other way to earn more money like get the first bonus $10,000.00 to claim... Thank You!!!
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
400 requests per second is what a single cheap machine should handle not a whole cluster. Friends for Sale is often down, data is inconsistent (you should reconsider your expiration policies). Your architecture still has much way to go to be called scalable, but good job until now and thanks for sharing your experience, it's very valuable.
Ruby On Rails scalability problems have nothing to do with the load of the system, but with system complexity. Even PERL could scale (in performance terms) with a proper architecture.
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
what is your monthly cost for machine, storage and bandwidth ? is there a profit in your business plan ?
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
I think alot of you 'Rails cant scale' people are missing the point.
Ok, rails is pretty heavy, and you need a relativley large amount of horizontal capacity thrown at it to get big thoughput numbers.
*BUT* it can scale. You just throw more horizontal capacity at it.
Servers are cheap in the grand scheme of things - developer hours and maintainace hours are not!
A 300 Million Page View/Month Facebook RoR App
I agree with some comments above.
Also, a note on the RAM usage. They have 16 mongrels/app box with 8GB of RAM [I think 2 being used for memcache?] Anyway, disregarding memcache, that's 500MB RAM/mongrel! Were they cpu bound or memory bound? I used to think that rails apps were RAM bound. Maybe I was wrong.
The fact that they seemed happy with 'how well it scaled' and that they survived with only a small team on a reasonably popular site and didn't complain harshly does say something. Guess if you can afford the RAM then maybe rails isn't so bad after all :)
-R
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Does anyone performed the calculations to estimate which price such a service is worth?
Re: Friends for Sale Architecture - Month Facebook RoR App
Thanks for the fascinating interview! Good luck scaling even further
Re: A 300 Million Page View/Month Facebook RoR App
Facebook enables rapid social distribution of new applications through the social graph.
good.
Re: A 300 Million Page View/Month Facebook RoR App
But I'm pretty sure the problem is Rails itself, not the application.
Re: A 300 Million Page View/Month Facebook RoR App
Brad Fitzpatrick for inventing memcache, and anyone who has
successfully horizontally scaled anything.
Re: A 300 Million Page View/Month Facebook RoR App
Servers are cheap in the grand scheme of things - developer hours and maintainace hours are not!
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
So wait, they're bragging that it took them 12 beefy servers to serve 300 million page views a month?
Thanks
nice work, thanks
Are these numbers right??
OK, I'm not a Ruby/RoR expert but I'm always evaluating new languages and frameworks. (new for me, anyway).
So, the way I understand it, each page request is passed to ONE mongrel instance. So if I had a server with 16 mongrel instances that would mean I can only serve 16 requests at the EXACT SAME TIME. Granted, each request might only be 200ms.
OK, now to crunch some numbers.
6 APP servers (not counting the full 12) running 16 mongrels each.
300M page requests per month = 10M page requests per day.
10M page requests per day / 6 SERVERS = 1,666,667 requests per server per day.
1,666,667 requests / 16 mongrels = 104,167 requests per mongrel per day.
104,167 / 24 hours = 4,340 requests per hour per mongrel per day.
4,340 / 60 minutes = 72 requests per minute per mongrel per day.
72 / 60 seconds = 1.2 requests per second per mongrel per day.
So, each mongrel is handling 1.2 requests per second. So each server is handling 19.2 requests per second? Are these numbers real?
If a request took 200ms then that means effectively each server is handling 19.2*5 = 96 requests per second?
That is a lot. If my math serves me, then technically, couldn't ONE server running 16 mongrels actually handle that 300M page views? Assuming 200ms per page?
I'm not flamebaiting...just curious.
cbmeeks
http://bitcircle.com
Find what you want, have fun doing it.
Re:
Facebook enables rapid social distribution of new applications through the social graph.
good.
Re: Friends for Sale Architecture - A 300 Million Page View/Mont
Its not that tough. All they needed was money for it and they collected that easily by showing advertisments on their webpage from which they got funded!
Re: Friends for Sale Architecture
Great Interview!
Re: Friends for Sale Architecture - A 300 Million Page View
I wonder how many visits facebook poker have.. :)
hmm
Servers are cheap in the grand scheme of things - developer hours and maintainace hours are not!
good job!
Great Interview!
nice
Thanks for this information. nice article.
Kalite Belgesi
Thank you
FULL OF INFORMATION
FH Hi,everyone.It's been a long time that such an informative site i came across.This website should be made familiar to all the computer users and especially to students.
=============================================================================
peterheins
widecircles
Post new comment