Twitter as a scalability case study

A lot has been said already about Twitter's scalability issues. Many have given Twitter as an anti-pattern of how not to deal with scalability and have suggested different solutions for scaling it. As Twitter is famously a Ruby-on-Rails deployment, this case has also been used as a weapon in the language/platform wars between the RoR and Java camps, and to a lesser degree, also with the LAMP (PHP) camp

Click to read more ...


DB2 Express-C

Searching around the HS website I noticed that there are no articles regarding db2, which has an express edition, free of charge and from what I know there aren't any restrictions. Being a powerful database system I thought it could make be an alternative to MySQL, PostgreSQL databases. Here is the IBM statement: "DB2 Express Edition for Community (DB2 Express-C) is a no charge data server for use in development and deployment. DB2 Express-C supports a full range of APIs, drivers, and interfaces for application development including PHP, C/C++, and .NET. In addition, DB2 Express-C V9 contains advanced XML features. DB2 Express-C provides ISVs an ideal starting database server for Web, enterprise, and eBusiness applications. This IBM Redbook provides fundamentals of DB2 application development with DB2 Express-C. It covers the DB2 Express-C installation and configuration for application development and skills and techniques for building DB2 applications with XML, PHP, C/C++, Java, and .NET. Code examples are used to demonstrate how to develop a DB2 application in a different language. By following the examples provided, you will be able to learn DB2 application development with XML, PHP, C/C++, Java, and .NET in a short time." Download the redbook about db2 express-c.

Click to read more ...


WebSphere Commerce High Availability and Performance Configurations

Nobody came up with an example of a website powered by a Websphere product (which has a community edition) and backed up by a DB2 database. I guess you all know about so here's the story: While the re-emergence of 35-year-old Andre Agassi and the continued dominance of wunderkind Maria Sharapova have highlighted the on-court headlines at this year's U.S. Open Tennis Championships in Flushing Meadows, N.Y., IBM is hoping its new Power5 chip-based IT support for can make news among those more interested in .NET than tennis nets. Big Blue has partnered with the U.S. Tennis Association and the U.S. Open -- the most prestigious tennis tournament in the U.S. -- since 1992. Together, they launched in 1995 so racket heads could follow the matches online. The iSeries' role this year is in powering a Web-based end-user application called "Point Tracker," a graphics tool using autonomic technology that recreates the trajectory of every shot. On-court cameras capture and record ball position data for every forehand, ace and volley. Once that data is integrated with the scoring data, the shot data is pushed to the Web site to enable visitors to follow the action online. IBM is running the Web site on an eServer pSeries system, a Power5-based server. Two pSeries systems, models p550 -- released two weeks ago -- and p570, replaced Web and application servers to help automate the infrastructure that supports the Web site. The 2005 U.S. Open Web site traffic will be managed by Big Blue from a "virtualized" server environment at one of the three hosting locations. According to IBM, the pSeries systems allow IBM to consolidate several servers onto two larger boxes. The pSeries p5 systems handle workloads from Web serving to fan polling, feedback and player search applications, which are managed from each pSeries p5 server as a virtualized environment using Power-based virtualization technologies such as Micro-Partitioning, Virtual I/O Server and Partition Load Manager, which consolidate AIX 5L and multiple Linux operating environments onto a single system. Approximately 2.8 million fans visited during the two-week tournament in 2004. More information on this technologies can be found here: Quote from IBM redbooks: Building a high performance and high availability commerce site is not a trivial task -- from having right capacity hardware to handle the workload to properly testing the code change before deploying in production site. This redbook covers several major areas that need to be considered when using WebSphere Commerce Server and provide solution on how to address them. Here are some of the topics: 1. How to build a Commerce site to deal with various kind of unplanned outage? Topic including utilizing WebSphere Application Server Network Deployment 6.0 and IBM DB2 High Availability disaster Recovery (HADR) in Commerce environment. 2. How to build a Commerce site to deal with planned outages such as software fix and operation update? Topic including uses of WebSphere Application Server's Rolling update feature and uses of Commerce's Staging Server and Content Management. 3. How to proactively monitoring the commerce site prevent potential problem happening? Various Tools should be discussed including various WebSphere Application Server build-in tools and Tivoli's Composite Application Management. 4. How to utilized dynacache to future enhance your Commerce Site's performance? Topics includes additional Commerce command caching introduce in Commerce Fix pack and e-spot caching. 5. What's the methodology of doing performance and scalability testing on Commerce site? Tools that may be covered included Tivoli Performance Tester 6. Techniques on migrate a high volume Commerce site to newer Commerce release. " end Quote Maybe some of us can find this useful, Websphere Community Edition is a free Java™ EE 5 server for building and managing Java™ applications. Download this Redbook

Click to read more ...


New Facebook Chat Feature Scales to 70 Million Users Using Erlang

UpdateErlang at Facebook by Eugene Letuchy. How Facebook uses Erlang to implement Chat, AIM Presence, and Chat Jabber support. 

I've done some XMPP development so when I read Facebook was making a Jabber chat client I was really curious how they would make it work. While core XMPP is straightforward, a number of protocol extensions like discovery, forms, chat states, pubsub, multi user chat, and privacy lists really up the implementation complexity. Some real engineering challenges were involved to make this puppy scale and perform. It's not clear what extensions they've implemented, but a blog entry by Facebook's Eugene Letuchy hits some of the architectural challenges they faced and how they overcame them.

A web based Jabber client poses a few problems because XMPP, like most IM protocols, is an asynchronous event driven system that pretty much assumes you have a full time open connection. After logging in the server sends a client roster information and presence information. Your client has to be present to receive the information. If your client wants to discover the capabilities of another client then a request is sent over the wire and some time later the response comes back. An ID is used to map the reply to the request. All responses are intermingled. IM messages can come in at any time. Subscription requests can come in at any time.

Facebook has the client open a persistent connection to the IM server and uses long polling to send requests and continually get data from the server. Long polling is a mixture of client pull and server push. It works by having the client make a request to the server. The client connection blocks until the server has data to return. When it does data is returned, the client processes it, and then is in position to make another request of the server and get any more data that has queued up in the mean time. Obviously there are all sorts of latency, overhead, and resource issues with this approach. The previous link discusses them in more detail and for performance information take a look at Performance Testing of Data Delivery Techniques for AJAX Applications by Engin Bozdag, Ali Mesbah and Arie van Deursen.

From a client perspective I think this approach is workable, but obviously not ideal. Your client's IMs, presence changes, subscription requests, and chat states etc are all blocked on the polling loop, which wouldn't have a predictable latency. Predictable latency can be as important as raw performance.

The real scaling challenge is on the server side. With 70 million people how do you keep all those persistent connections open? Well, when you read another $100 million was invested in Facebook for hardware you know why. That's one hella lot of connections. And consider all the data those IM servers must store up in between polling intervals. Looking at the memory consumption for their servers would be like watching someone breath. Breath in- streams of data come in and must be stored waiting for the polling loop. Breath out- the polling loops hit and all the data is written to the client and released from the server. A ceaseless cycle. In a stream based system data comes in and is pushed immediately out the connection. Only socket queue is used and that's usually quite sufficient. Now add network bandwidth for all the XMPP and TCP protocol overhead and CPU to process it all and you are talking some serious scalability issues.

So, how do you handle all those concurrent connections? They chose Erlang. When you first hear Erlang and Jabber you think ejabberd, an open source Erlang based XMPP server. But since the blog doesn't mention ejabberd it seems they haven't used it .

Why Erlang? First, the famous Yaws vs Apache shootout where "Apache dies at about 4,000 parallel sessions. Yaws is still functioning at over 80,000 parallel connections." Erlang is naturally good at solving high concurrency problems. Yet following the rule that no benchmark can go unchallenged, Erik Onnen calls this the Worst Measurement Ever and has some good reasoning behind it.

In any case, Erlang does nicely match the problem space. Erlang's approach to a concurrency problem is to throw a very light weight Erlang process at each state machine you want to be concurrent. Code-wise that's more natural than thread pools, async IO, or thread per connection systems. Until Linux 2.6 it wasn't even possible to schedule large numbers of threads on a single machine. And you are still devoting a lot of unnecessary stack space to each thread. Erlang will make excellent use of machine resources to handle all those connections. Something anyone with a VPS knows is hard to do with Apache. Apache sucks up memory with joyous VPS killing abandon.

The blog says C++ is used to log IM messages. Erlang is famously excellent for its concurrency prowess and equally famous for being poor at IO, so I imagine C++ was needed for efficiency.

One of the downsides of multi-language development is reusing code across languages. Facebook created Thrift to tie together the Babeling Tower of all their different implementation languages. Thrift is a software framework for scalable cross-language services development. It combines a powerful software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, and Ruby. Another approach might be to cross language barriers using REST based services.

A problem Facebook probably doesn't have to worry about scaling is the XMPP roster (contact list). Handling that many user accounts would challenge most XMPP server vendors, but Facebook has that part already solved. They could concentrate on scaling the protocol across a bunch of shiny new servers without getting bogged down in database issues. Wouldn't that be nice :-) They can just load balance users across servers and scalability is solved horizontally, simply by adding more servers. Nice work.


Scaling an image upload service

Hi, First of all I want to to say that this is an extremely interesting and informative website. i have enjoyed reading the various posts on how the big sites scale to meet the needs of their customers. The service we are developing is a webcam service. The client application sends images to the server via HTTP POST and they are saved in folder specified by the users id. When a new image is sent to the server it will overwrite the current image. Users can then view the images via our web server. Ideally we want the images to upload as quickly as possible and allow users to view them as quickly as possible. Would I be correct to assume that when the number of uploading clients exceeds the capability of the server the only way to scale is to add more hardware. Also I assume that to use HTTP accelerator caches will not speed up viewing the images as the new images will invalidate the cache. I appreciate any input on the subject.

Click to read more ...


Hitting 300 SimbleDB Requests Per Second on a Small EC2 Instance

High Performance Multithreaded Access to Amazon SimpleDB is a great follow up to the idea in How SimpleDB Differs from a RDBMS that more programming is the price paid for performance in SimpleDB. It shows how much work and infrastructure is required to batter better performance out of SimpleDB. Remember, in SimpleDB you get keys to records from queries so if you want to get all the fields for records you need to make separate requests. Since SimpleDB isn't exactly a speed daemon the obvious strategy is to parallelize. Even if a job takes a 100 msecs you can get a lot done in a little time if you can execute enough jobs in parallel. Parallelization is the approach taken by Haakon@AWS in his Java code example of how to get the most out of SimpleDB. You can find the code at Indexing and Querying Amazon S3 Metadata with Amazon SimpleDB. We'll also consider how a back-end service architecture built on Erlang may be a better fit with cloud computing. Two general mechanisms of parallelism are available: threads and boxes. To get the most bang out of a single machine you need threads (events, etc). To scale beyond the load handled by a single machine you need multiple boxes. The example code uses the Executor Thread Pool for parallelism within a program. Thread pools are a pretty common idiom by now. Amazon's queue service SQS was used to distribute work amongst boxes. Work was queued to SQS in batches of 1000 work items. The items were pulled by the thread pool and processed. Why 1000? The idea is to balance processing overhead with work overhead. You don't want popping items off SQS to dominate your processing time so you have to do enough work in each pass to make it worth the investment. The architecture uses two thread pools: one to run queries and one to get record values. Applications must carefully tune the number of threads in each pool so the queries to overwhelm the gets. Using a query thread pool with 2 threads and a get thread pool with 32 threads it was possible to perform 300 TPS on a small EC2 instances. Theoretically the advantage of this architecture is that it will scale to any size you need. SQS is your work distribution backbone and you just spin up the number of thread pool instances you need. The disadvantage is that this is a lot of programmer effort. But let's consider that you had to do some serious processing on each record, you would need something like this approach anyway to scale out the processing. But to perform simple aggregation operations it's total overkill which is why more time needs to be spent on the write site of the equation in SimpleDB/BigTable than the read side as we are used to with a RDBMS. What's the best way to go parallel? On the front-end life is simple. Go shared nothing and compose your pages from scalable back-end services. This is how Amazon does it and it's how Google AppEngine does it. GAE completely punts on the back-end service layer architecture. Unfortunately we still need to create a back-end architecture for more complex applications. Thread pools and SQS is one parallelization approach. Instead of thread pools something like Java's fork/join framework could be used. Initially I thought piling on more low level primitive threading facilities into Java was the wrong way to go. Yes, it is a "'multicore-friendly lightweight parallel framework' that supports a style of parallel programming where problems are recursively split into smaller fragments, solved in parallel and recombined," but it's also a style of programming that is very difficult to program correctly. If cloud architectures will rely on these primitives for efficiency then I think we have regressed. Erlang style architectures described by Luke Hoersten in Scalable Web Apps: Erlang + Python is a simpler more reliable to programming model. An event driven actor based approach is much harder to screw up than closely cooperating threads in a shared memory space. Erlang originally ran in embedded systems where the requirement was to reliably squeeze the most work possible out of limited CPU and other compute resources. Oddly enough the embedded node of old closely parallels your basic cloud VM. Start your work horse Erlang (or other similar system) instances and let them efficiently chew up your work loads. Erlang's scheduling model fits perfectly with a service centric job engine cloud instance. It will get more work done then your typical thread based system ever would.

Click to read more ...


HSCALE - Handling 200 Million Transactions Per Month Using Transparent Partitioning With MySQL Proxy

Update 2: A HSCALE benchmark finds HSCALE "adds a maximum overhead of about 0.24 ms per query (against a partitioned table)." Future releases promise much improved results. Update: A new presentation at An Introduction to HSCALE. After writing Skype Plans for PostgreSQL to Scale to 1 Billion Users, which shows how Skype smartly uses a proxy architecture for scaling, I'm now seeing MySQL Proxy articles all over the place. It's like those "get rich quick" books that say all you have to do is visualize a giraffe with a big yellow dot superimposed over it and by sympathetic magic giraffes will suddenly stampede into your life. Without realizing it I must have visualized transparent proxies smothered in yellow dots. One of the brightest images is a wonderful series of articles by Peter Romianowski describing the evolution of their proxy architecture. Their application is an OLTP system executing 200 million transaction per month, tables with more than 1.5 billion rows, and a 600 GB total database size. They ran into a wall buying bigger boxes and wanted to move to a sharded architecture. The question for them was: how do you implement sharding? In the first article four approaches to sharding were identified:

  • Using MySQL Cluster
  • Using MySQL Proxy with transparent query rewriting and load balancing
  • Implement it into a JDBC driver
  • Implement it into the application data access layer. The proxy solution was selected because it's transparent to the application layer. Applications need not know about the partitioning scheme to make it work. Not mucking with apps is a big win. The downside is implementation complexity. How do you parse a query and and map it correctly to the right server? Will this cause a big performance degradation? How is this new more complex and dynamic system to be tested? Can we run the same queries they did before or will they have to rewrite parts of their application? A lot of questions to be worked out. The second article starts working out those problems using MySQL Proxy. The process was broken into a few steps:
  • Analyze the query to find out which tables are involved and what the parition key would be.
  • Validate the query and reject queries that cannot be analyzed.
  • Determine the partition table / database. This could be done by a simple lookup, a hashing function or anything else.
  • Rewrite the query and replace the table names with the partition table names.
  • Execute the query on the correct database server and return the result back to the client. Some of the comments were concerned that a modulus scheme was being used to identify a partition. The recommendation was to use a directory service for mapping to partitions instead. A directory service allows you to logically map partitions behind the scenes and doesn't tie you to a deterministic physical mapping. After getting all this working they generously released it to the world as HSCALE - Transparent MySQL Partitioning: HSCALE is a plugin written for MySQL Proxy which allows you to transparently split up tables into multiple tables called partitions. In later versions you will be able to put each partition on a different MySQL server. Application based partitioning means that your split up your data logically and rewrite your application to select the right piece of data (i.e. partition) at any given time. More on application based partitioning. Read here some more about what could be done with HSCALE. HSCALE helps in application based partitioning. Using the MySQL Proxy it sits between your application and the database server. Whenever a sql statement is sent to the server HSCALE analyzes it to find out whether a partitioned table is used. It then tries to find out which partition the sql statement should go to. Access release .1 at HSCALE 0.1 released - Partitioning Using MySQL Proxy. The transparent proxy ability is very powerful, but what we are lacking that various companies have created internally is a partition management layer. How do you move partitions? How do you split partitions when a table outgrows the shard or performance declines? Lots of cool tools still to build.

    Related Articles

  • HSCALE - Transparent MySQL Partitioning
  • Pero: HSCALE 0.1 released - Partitioning Using MySQL Proxy
  • Pero: MySQL Partitioning on Application Side
  • Pero: Progress on MySQL Proxy Partitioning
  • HighScalability: Flickr Architecture - more information on partitioning.
  • Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
  • HighScalabilty: An Unorthodox Approach to Database Design : The Coming of the Shard.

    Click to read more ...

  • Monday

    Put the web server on a diet and increase scalability

    Misusing HTTP sessions is probably the number one obstacle to building scalable web sites today. Here are some tips how to consume HTTP sessions responsibly.

    Click to read more ...


    Product: nginx

    Update 6: nginx_http_push_module. Turn nginx into a long-polling message queuing HTTP push server.

    Update 5: In Load Balancer Update Barry describes how moved from Pound to Nginx and are now "regularly serving about 8-9k requests/second and about 1.2Gbit/sec through a few Nginx instances and have plenty of room to grow!".
    Update 4: Nginx better than Pound for load balancing. Pound spikes at 80% CPU, Nginx uses 3% and is easier to understand and better documented.
    Update 3: combines two cool tools together for better performance in Nginx and Memcached, a 400% boost!.
    Update 2: Software Project on Installing Nginx Web Server w/ PHP and SSL. Breaking away from mother Apache can be a scary proposition and this kind of getting started article really helps easy the separation.
    Update: Slicehost has some nice tutorials on setting up Nginx.

    From their website:
    Nginx ("engine x") is a high-performance HTTP server and reverse proxy, as well as an IMAP/POP3/SMTP proxy server. Nginx was written by Igor Sysoev for, Russia's second-most visited website, where it has been running in production for over two and a half years. Igor has released the source code under a BSD-like license. Although still in beta, Nginx is known for its stability, rich feature set, simple configuration, and low resource consumption.

    Bob Ippolito says of Nginx:

    The only solution I know of that's extremely high performance that offers all of the features that you want is Nginx... I currently have Nginx doing reverse proxy of over tens of millions of HTTP requests per day (thats a few hundred per second) on a single server. At peak load it uses about 15MB RAM and 10% CPU on my particular configuration (FreeBSD 6).

    Under the same kind of load, Apache falls over (after using 1000 or so processes and god knows how much RAM), Pound falls over (too many threads, and using 400MB+ of RAM for all the thread stacks), and Lighty leaks more than 20MB per hour (and uses more CPU, but not significantly more).

    See Also


  • Nginx vs. Lighttpd for a small VPS
  • nginx: high performance smpt/pop/imap proxy
  • Light Weight Web Server
  • Nginx and Mirror on Demand
  • Running Drupal with Clean URL on Nginx or Lighttpd
  • Goodbye Pound, Hello Nginx
  • Using Nginx, SSI and Memcache to Make Your Web Applications Faster
  • Ruby on Rails hosting with Nginx
  • NGINX Tutorial: Developing Modules
  • Friday

    Friends for Sale Architecture - A 300 Million Page View/Month Facebook RoR App

    Update: Jake in Does Django really scale better than Rails? thinks apps like FFS shouldn't need so much hardware to scale.

    In a short three months Friends for Sale (think Hot-or-Not with a market economy) grew to become a top 10 Facebook application handling 200 gorgeous requests per second and a stunning 300 million page views a month. They did all this using Ruby on Rails, two part time developers, a cluster of a dozen machines, and a fairly standard architecture. How did Friends for Sale scale to sell all those beautiful people? And how much do you think your friends are worth on the open market? 


    Information Sources

  • Siqi Chen and Alexander Le, co-creators of Friends for Sale, answering my standard questionairre.
  • Virality on Facebook

    The Platform

  • Ruby on Rails
  • CentOS 5 (64 bit)
  • Capistrano - update and restart application servers.
  • Memcached
  • MySQL
  • Nginx
  • Starling - distributed queue server
  • Softlayer - hosting service
  • Pingdom - for website monitoring
  • LVM - logical volume manager
  • Dr. Nics Magic Multi-Connections Gem - split database reads and writes to servers

    The Stats

  • 10th most popular application on Facebook.
  • Nearly 600,000 active users.
  • Half a million unique visitors a day and growing fast.
  • 300 million page views a month.
  • 300% monthly growth rate, but that is plateauing.
  • 2.1 million unique visitors in the past month
  • 200 requests per second.
  • 5TB of bandwidth per month.
  • 2 part time (now full time), and 1 remote DBA contractor.

  • 4 DB servers, 6 application servers, 1 staging server, and 1 front end server.
    - 6, 4 core 8 GB application servers.
    - Each application server runs 16 mongrels for a total of 96 mongrels. -
    - 4 GB memcache instance on each application server
    - 2 32GB 4 core servers with 4x 15K SCSI RAID 10 disks in a master-slave setup

    Getting to Know You

  • What is your system is for?

    Our system is designed for our Facebook application, Friends for Sale.
    It's basically Hot-or-Not with a market economy. At the time of this
    writing it's the 10th most popular application on Facebook.

    Their Facebook description reads: Buy and sell your friends as pets! You can make your pets poke, send gifts, or just show off for you.
    Make money as a shrewd pets investor or as a hot commodity! Friends for Sale is the bees knees!

  • Why did you decide to build this system?

    We designed this as more of an experiment to see if we understood virality concepts and metrics on Facebook. I guess we do. =)

  • What particular design/architecture/implementation challenges do your system have?

    As a Facebook application, every request is dynamic so no page caching is possible. Also, it is a very interactive, write heavy application so scaling the database was a challenge.

  • What did you do to meet these challenges?

    We memcached extensively early on - every page reload results in 0 SQL calls. We use Rail's fragment caching with custom expiration logic mostly.

  • How big is your system?

    We had more than half a million unique visitors yesterday and growing fast. We're on track to do more than 300 million page views this month.

  • What is your in/out bandwidth usage?

    We used around 3 terabytes of bandwidth last month. This month should be at least 5TB or so. This number is just for a few icons and XHTML/CSS.

  • How many documents, do you server? How many images? How much data?

    We don't really have unique documents ... we do have around 10 million user profiles though.

    The only images we store are a few static image icons.

  • How fast are you growing?

    We went from around 3M page views per day a month ago to more than 10M page views a day. A month before that we were doing 1M page views per day. So that's around a 300% monthly growth rate but that is plateauing. On a request per second basis, we get around 200 requests per second.

  • What is your ratio of free to paying users?

    It's all free.

  • What is your user churn?

    It's around 1% per day, with a growth rate of 3% or so per day in terms of installed users.

  • How many accounts have been active in the past month?

    We had roughtly 2.1 million unique visitors in the past month according to Google.

  • What is the architecture of your system?

    It's a relatively standard Rails cluster. We have a dedicated front end proxy balancer / static web server running nginx, which proxies directly to 6, 4 core 8 GB application servers. Each application server runs 16 mongrels for a total of 96 mongrels. The front end load balancer proxies directly to the mongrel ports. In addition, we run a 4 GB memcache instance on each application server, along with a local starling distributed queue server and misc background processes.

    We use god to monitor our processes.

    On the DB layer, we have 2 32GB 4 core servers with 4x 15K SCSI RAID 10 disks in a master-slave setup. We use Dr Nic's magic multi-connection's gem in production split reads and writes to each

    We are adding more slaves right now so we can distribute the read load better and have better redundancy and backup policies. We also get help from Percona (the mysqlperformanceblog guys) for remote DBA work.

    We're hosted on Softlayer - they're a fantastic host. The only problem was that their hardware load balancing server doesn't really work very well ... we had lots of problems with hanging connections and latency. Switching a dedicated box running just nginx fixed everything.

  • How is your system architected to scale?

    It really isn't. On the application layer we are shared-nothing so it's pretty trivial. On the database side we're still with a monolithic master and we're trying to push off sharding for as long as we can. We're still vertically scaled on the database side and I think we can get away with it for quite some time.

  • What do you do that is unique and different that people could best learn from?

    The three things that are unique is -

    1. Neither of the two developers in involved had previous experience in large scale Rails deployment.
    2. Our growth trajectory is relatively rare in the history of Rails deployments
    3. We had very little opportunity for static page caching - each request does hit the full Rails stack

  • What lessons have you learned? Why have you succeeded? What do you wish you would have done differently? What wouldn't you change?

    We learned that a good host, good hardware, and a good DBA are very important. We used to be hosted on Railsmachine, which to be fair is an excellent shared hosting company and they did go out of there way to support us. In the end though, we were barely responsive for a good month due to hardware problems, and it only took two hours to get up and running on Softlayer without a hitch. Choose a good host if you plan on scaling, because migrating isn't fun.

    The most important thing we learned is that your scalability problems is pretty much always, always, always the database. Check it first, and if you don't find anything, check again. Then check again. Without exception, every performance problem we had can be traced to the database server, the database configuration, the query, or the use and non-use of indices.

    We definitely should have gotten on to a better host earlier in the game so we would have been up.

    We definitely wouldn't change our choice of framework - Rails was invaluable for rapid application development, and I think we've pretty much proven that two guys without a lot of scaling experience can scale a Rails app up. The whole 'but does Rails scale?' discussion sounds like a bunch of masturbation - the point is moot.

  • How is your team setup?

    We have two Rails developers, inclusive of me. We very recently retained the services of a remote DBA for help on the database end.

  • How many people do you have?

    On the technical side, 2 part time (now full time), and 1 remote DBA contractor.

  • Where are they located?

    The full time employees are also located in the SOMA area of San Francisco.

  • Who performs what roles?

    The two developers server as co-founders . I (Siqi) was responsible for front end design and development early on, but since I had some experience with deployment I also ended up handling network operations and deployment as well. My co founder Alex is responsible for the bulk of the Rails code - basically all the application logic is from him. Now I find myself doing more deep back end network operations tasks like MySQL optimization and replication - it's hard to find time to get back to the front end which is what I love. But it's been a real fun learning experience so I've been eating up all I can from this.

  • Do you have a particular management philosophy?

    Yes - basically find the smartest people you can, give them the best deal possible, and get out of their way. The best managers GET OUT OF THE WAY, so I try to run the company as much as I can with that in mind. I think I usually fail at it.

  • If you have a distributed team how do you make that work?

    We'd have to have some really good communication tools in the cloud - somebody would have to be a Basecamp nazi. I think remote work / outsourcing is really difficult - I prefer to stay away with from it
    for core development. For something like MySQL DBA or even sysadmin - it might make more sense.

    What do you use?

    We use Rails with a bunch of plugins, most notable cache-fu from Chris Wanstrath and magic multi connections from Dr. Nic. I use VIM as the editor with the rails.vim plugin.

  • Which languages do you use to develop your system?

    Ruby / Rails

  • How many servers do you have?

    We now have 12 servers in the cluster.

  • How are they allocated?

    4 DB servers, 6 application servers, 1 staging server, and 1 front end server.

  • How are they provisioned?

    We order them from Softlayer - there's a less than 4 hour turn around for most boxes, which is awesome.

  • What operating systems do you use?

    CentOS 5 (64 bit)

  • Which web server do you use?


  • Which database do you use?

    MySQL 5.1

  • Do you use a reverse proxy?

    We just use nginx's built in proxy balancer.

  • How is your system deployed in data centers?

    We use a dedicated hosting service, Softlayer.

  • What is your storage strategy?

    We use NAS for backups but internal SCSI drives for our production boxes.

  • How much capacity do you have?

    Across all of our boxes we probably have around ... 5 TB of storage or

  • How do you grow capacity?

    Ad-hoc. We haven't done a proper capacity planning study, to our detriment.

  • Do you use a storage service?


  • Do you use storage virtualization?


  • How do you handle session management?

    Right now we just persist it to the database - it would be fairly easy to use memcache directly for this purpose though.

  • How is your database architected? Master/slave? Shard? Other?

    Master/slave right now. We're moving towards a Master/Multi-slave with a read only load balancing proxy to the slave cluster.

  • How do you handle load balancing?

    We do it in software via nginx.

  • Which web framework/AJAX Library do you use?


  • Which real-time messaging frame works do you use?


  • Which distributed job management system do you use?


  • How do you handle ad serving?

    We run network ads. We also weight our various ad networks by eCPM on our application layer.

  • Do you have a standard API to your website?


  • How many people are in your team?

    2 developers.

  • What skill sets does your team possess?

    Me: Front end design, development, limited Rails. Obviously, recently proficient in MySQL optimization and large scale Rails deployment.
    Alex: application logic development, front end design, general software engineering.

  • What is your development environment?

    Alex develops on OSX while I develop on Ubuntu. We use SVN for version control. I use VIM for editing and Alex uses TextMate.

  • What is your development process?

    On the logic layer, it's very test driven - we test extensively. On the application layer, it's all about quick iterations and testing.

  • What is your object and content caching strategy?

    We cache both in memcache with no TTL, and we just manually expire.

  • What is your client side caching strategy?


    How do you manage your system?

  • How do check global availability and simulate end-user performance?

    We use Pingdom for external website monitoring - they're really good.

  • How do you health check your server and networks?

    Right now we're just relying on our external monitoring and Softlayer's ping monitoring. We're investigating FiveRuns for monitoring as a possible solution to server monitoring.

  • How you do graph network and server statistics and trends?

    We don't.

  • How do you test your system?

    We deploy to staging and run some sanity tests, then we do a deploy to all application servers.

  • How you analyze performance?

    We trace back every SQL query in development to make sure we're not doing any unnecessary calls or model instantiations. Other than that, we haven't done any real benchmarking.

  • How do you handle security?


  • How do you decide what features to add/keep?

    User feedback and critical thinking. We are big believers in simplicity so we are pretty careful to consider before we add any major features.

  • How do you implement web analytics?

    We use a home grown metrics tracking system for virality optimization,
    and we also use Google Analytics.

  • Do you do A/B testing?

    Yes, from the time to time we will tweak aspects of our design to optimize for virality.

    How is your data center setup?

  • Which firewall product do you use?
  • Which DNS service do you use?
  • Which routers do you use?
  • Which switches do you use?
  • Which email system do you use?
  • How do you handle spam?
  • How do you handle virus checking of email and uploads?

    Don't know to all of the above.

  • How do you backup and restore your system?

    We use LVM to do incrementals on a weekly and daily basis.

  • How are software and hardware upgrades rolled out?

    Right now they are done manually, except for new Rails application deployments. We use capistrano to update and restart our application servers.

  • How do you handle major changes in database schemas on upgrades?

    We usually migrate on a slave first and then just switch masters.

  • What is your fault tolerance and business continuity plan?

    Not very good.

  • Do you have a separate operations team managing your website?

    Oh we wish.

  • Do you use a content delivery network? If so, which one and what for?


  • What is your revenue model?

    CPM - more page views more money. We also have incentivized direct offers through our virtual currency.

  • How do you market your product?

    Word of mouth - the social graph. We just leverage viral design tactics to grow.

  • Do you use any particularly cool technologies are algorithms?

    I think Ruby is pretty particularly cool. But no, not really - we're not doing rocket science, we're just trying to get people laid.

  • Do your store images in your database?

    No, that wouldn't be very smart.

  • How much up front design should you do?

    Hm. I'd say none if you haven't scaled up anything before, and a lot if you have. It's hard to know what's actually going to be the problem until you've actually been through and see what real load problems look like. Once you've done that, then you have enough domain knowledge to do some actual meaningful up front design on our next go around.

  • Has anything surprised your either for the good or bad?

    How unreliable vendor hardware can be, and how different support can be from host to host. The number one most important thing you will need is a scaled up dedicated host who can support your needs. We use Softlayer and we can't recommend them highly enough.

    On the other hand, it's surprising how far just a master-multislave setup can take you on commodity hardware. You can easily do a Billion page views per month on this setup.

  • How does your system evolve to meet new scaling challenges?

    It doesn't really, we just fix bottle necks as they come and we see them coming.

  • Who do you admire?

    Brad Fitzpatrick for inventing memcache, and anyone who has successfully horizontally scaled anything.

  • How are you thinking of changing your architecture in the future?

    We will have to start sharding by users soon as we hit database size and write limits.

    Their Thoughts on Facebook Virality

  • Facebook models the social graph in digital form as accurately and completely as possible.
  • Social graph is more important that features.
  • Facebook enables rapid social distribution of new applications through the social graph.
  • Your application idea should be: social, engaging, and universal.
  • The social aspect makes it viral.
  • Engaging makes it monetizable.
  • Universal gives it potential.
  • Friends for Sale is social because you are buying and selling your social graph.
  • It's engaging because it's a twist on an idea, low pressure, flirty, and a bit cynical.
  • It's universal because everyone is vain, has a price, and wants to flirt with hot people.
  • Every touch point in the application is a potential for recruiting new users.
  • Every user converts 1.4 other users which is the basis for exponential growth.
  • For every new user track the number of invites, notifications, minifeed items, profile clicks, and other channels.
  • For every channel track the percent clicked, converted, uninstalls.

    Lessons Learned

  • Scaling from the start is a requirement on Facebook. They went to 1 million pages/day in 4 weeks.
  • Ruby on Rails can scale.
  • Anything scales on the right architecture. Focus on architecture and operations.
  • You need a good DBA, good host, and good well configured hardware.
  • With caching and the heavy duty servers available today, you can go a long time without adopting more complicated database architectures.
  • The social graph is real. It's truly staggering the number of accessible users on Facebook with the right well implemented viral application.
  • Most performance problems are in the database. Look to the database server, the database configuration, the query, or the use and non-use of indexes.
  • People still use Vi!

    I'd really like to thank Siqi taking the time to answer all my questions and provide this fascinating look in to their system. It's amazing what you've done in so little time. Excellent job and thanks again.