We started with a small site, a mess of open source, and a small team that didn't know much about scaling.
We ended with a large site, a medium sized team, and an architecture that has scaled.
We never stopped. We used a roadmap and a compass, made weekly changes in direction, regularly shipped code on Wednesday to handle the next weekend's capacity constraints, and shipped new features the whole time.
These are excerpts from the IMVU PDF presentation of their architecture which can be viewed or downloaded here.
IMVU is an online destination where adults and teens meet new people in 3D. IMVU won the 2008 Virtual Worlds Innovation Award and was also named a Rising Star in the 2008 Silicon Valley Technology Fast 50 program.
The most important aspect of a scalable web architecture is data partitioning. Most components in a modern data center are completely stateless, meaning they just do batches of work that is handed to them, but don't store any data long-term. This is true of most web application servers, caches like memcached, and all of the network infrastructure that connects them. Data storage is becoming a specialized function, delegated most often to relational databases. This makes sense, because stateless servers are easiest to scale - you just keep adding more. Since they don't store anything, failures are easy to handle too - just take it out of rotation.
Stateful servers require more careful attention. If you are storing all of your data in a relational database, and the load on that database exceeds its capacity, there is no automatic solution that allows you to simply add more hardware and scale up. (One day, there will be, but that's for another post). In the meantime, most websites are building their own scalable clusters using sharding.
Read more on LessonLearned blog.
In this article Jeff Atwood (a rockstar programmer and one of StackOverflow website founders) discusses the measures of how you can reduce you bandwidth usage and refers specifically on high trafficked websites for which bandwidth is more costly than for an average website.
This is his experience and you can read more on his post on CodingHorror.com.
Update: New Gearman Server & Library in C, MySQL UDFs. Gearman is an open source message queuing system that makes it easy to do distributed job processing using multiple languages. With Gearman you: farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, to call functions between languages, spread CPU usage around your network. Gearman is used by companies like LiveJournal, Yahoo!, and Digg. Digg, for example, runs 300,000 jobs a day through Gearman without any issues. Most large sites use something similar. Why would anyone ever even need a message queuing system? Message queuing is a handy way to move work off your web servers (like image manipulation), to generate thousands of documents in the background, to run the multiple requests in parallel needed to build a web page, or to perform tasks that can comfortably be run in the background and not part of the main request loop for servicing a web request. There's a gearmand server and clients written in Perl, Ruby, Python or C. Use at least two gearmand server daemons for higher availability. The tasks each client can perform are registered with gearman distributes requests for those functions to the client that can implement them. Gearman uses a very robust, if somewhat higher latency, signal-and-pull architecture.
This presentation illustrates how one can scale EXISTING JEE application and deploy it on Amazon cloud using GigaSpaces as the scale-out application server while: * Not having to re-write your application * Preventing lock-in to specific cloud provider * Enabling seamless portability between your local environment to cloud environment o No code or configuration change is required between the two environments o Develop local - test on the cloud o Built for iterative development
Here's a short list of some great resources that I've found very inspirational and thought provoking. I've broken these resources up into two lists: Blogs and Presentations.
The upshot of the paper is Oracle rules and MySQL sucks for sharding. Which is technically probable, if you don't throw in minor points like cost and ease of use. The points where they think Oracle wins: online schema changes, more robust replication, higher availability, better corruption handling, better use of large RAM and multiple cores, better and better tested partitioning features, better monitoring, and better gas mileage.
I have two servers connected via Internet (NOT IN THE SAME LAN) serving the same website (http://www.ourexample.com).The problem is files uploaded on serverA and serverB cannot see each other immediately,thus rsync with certain intervals is not a good solution. Can anybody give me some advice on the following options? 1.NFS over Internet for file sharing 2.sshfs 3.inotify(our system's kernel does not support this and we donot want to risk upgrading our kernel as well) 4.drbd in active-active mode 5 or any other solutions Any suggestions will be welcomed. Thank you in advance.
Datacenterknowledge.com: In an effort to boost its refocused cloud computing initiative, Sun Microsystems (JAVA) has acquired Q-layer, a Belgian provider that automates the deployment of both public and private clouds. Sun says Q-layer’s technology will help users instantly provision servers, storage, bandwidth and applications. Do you have experience with Q-layers technology like its Virtual Private DataCenter and NephOS?
It seems that HTTP calls have become a default way to think about distributed systems. HTTP and Web services definitely have a lot to offer, but they are not the only way to do things and there are definitely cases where web is not the right choice. Unfortunately, lots of people just stick with web services and hack on, trying to fit a square peg in a round hole. In cases such as these, a different distribution paradigm can save us quite a lot of time and effort both in development and later in maintenance. One of those different paradigms is messaging.