What's cool about starting a new project is you finally have a chance to do it right. You of course eventually mess everything up in your own way, but for that one moment the world has a perfect order, a rightness that feels satisfying and good. Arne Claassen, the CTO of notify.me, a brand new real time notification delivery service, is in this honeymoon period now. Arne has been gracious enough to share with us his philosophy of how to build a notification service. I think you'll find it fascinating because Arne goes into a lot of useful detail about how his system works. His main design philosophy is to minimize the bottlenecks that form around synchronous access, that is when some resource is requested and the requestor ties up more resources, waiting for a response. If the requested resource can’t be delivered in a timely manner, more and more requests pile up until the server can’t accept any new ones. Nobody gets what they want and you have an outage. Breaking synchronous operations into asynchronous operations by separating request and response into separate message passing actions, stops the resource overload. Instead of a system going down from too many parallel requests, it can works its way through a backlog of requests as fast as it can. And in most cases the request/response cycles are so fast that they appear like a linear sequence of events. Notify.me is taking the innovative and risky strategy of using ejabberd, an XMPP based system, as their internal messaging and routing layer. Will Erlang and Mnesia (Erlang's database) be able to keep up with traffic and keep low latencies as traffic scales? It will be interesting to find out. If you are interested in notify.me they've kindly offered 500 beta accounts for HS readers: http://notify.me/user/account/create/highscale
This is a question everyone must struggle with when building out their datacenter. Storage choices are always the ones I have the least confidence in. David Marks in his blog You Can Change It Later! asks the question Should I get a SAN to scale my site architecture? and answers no. A better solution is to use commodity hardware, directly attach storage on servers, and partition across servers to scale and for greater availability. David's reasoning is interesting:
Update: Digg on their choice and use of Puppet. They chose puppet over cfengine, and bcfg2 because they liked Puppet's resource abstraction layer (RAL), the ability to implement configuration management incrementally, support for bundles, and the overall design philosophy. Puppet implements a declarative (what not how) configuration language for automating common administration tasks. It's the system every large site writes for themselves and it's already made for you! Ilike was able to "easily" scale from 0 to hundreds of servers using Puppet. I can't believe I've never seen this before. It looks really cool. What is Puppet and how can it help you scale your website operations? From the Puppet website: Puppet has been developed to help the sysadmin community move to building and sharing mature tools that avoid the duplication of everyone solving the same problem. It does so in two ways: * It provides a powerful framework to simplify the majority of the technical tasks that sysadmins need to perform * The sysadmin work is written as code in Puppet's custom language which is shareable just like any other code. This means that your work as a sysadmin can get done much faster, because you can have Puppet handle most or all of the details, and you can download code from other sysadmins to help you get done even faster. The majority of Puppet implementations use at least one or two modules developed by someone else, and there are already tens of recipes available in Puppet's CookBook. This sound good. But does it work in the field? HJK Solutions' Adam Jacob says it does: Puppet enables us to get a huge jump-start on building automated, scaleable, easy to manage infrastructures for our clients. Using puppet, we: 1. Automate as much of the routine systems administration tasks as possible. 2. Get 10 minute unattended build times from bare metal, most of which is data transfer. Puppet takes it the rest of the way, getting the machines ready to have applications deployed on them. It’s down to two and a half minutes for Xen. 3. Bootstrap our clients production environments while building their development environment. I can’t stress how cool this really is. Because we are expressing the infrastructure at a higher level, when it comes time to deploy your production systems, it’s really a non-event. We just roll out the Puppet Master and an Operating System auto-install environment, and it’s finished. 4. Cross-pollinate between clients with similar architectures. We work with several different shops using Ruby on Rails, all of whom have very similar infrastructure needs. By using Puppet in all of them, when we solve a problem for one client, we’ve effectively solved it for the others. I love being able to tell a client that we solved a problem for them, and all it’s going to cost is the time it takes for us to add the recipe. Puppet, today, is a tool that is good enough to handle the vast majority of issues encountered in building scalable infrastructures. Even the places where it falls short are almost always just a matter of it being less elegant than it could be, and the entire community is working on making those parts better.
OK, there is no "they" and "they" wouldn't care if you knew anyway. After all, this isn't a blog about really important stuff like investing, acne cures, or cheap natural cleansing products. But the secrets are real. Super cloud scaling consultant Kent Langley has put together a comprehensive checklist to consider when developing for the cloud:
At eBay, one of the primary architectural forces we contend with every day is scalability. It colors and drives every architectural and design decision we make. With hundreds of millions of users worldwide, over two billion page views a day, and petabytes of data in our systems, this is not a choice - it is a necessity.
In a scalable architecture, resource usage should increase linearly (or better) with load, where load may be measured in user traffic, data volume, etc. Where performance is about the resource usage associated with a single unit of work, scalability is about how resource usage changes as units of work grow in number or size. Said another way, scalability is the shape of the price-performance curve, as opposed to its value at one point in that curve.
There are many facets to scalability - transactional, operational, development effort. In this article, I will outline several of the key best practices we have learned over time to scale the transactional throughput of a web-based system. Most of these best practices will be familiar to you. Some may not. All come from the collective experience of the people who develop and operate the eBay site.
Read the rest of the article on InfoQ.
The transport-level server load balancing architectures described in the first half of this article are more than adequate for many Web sites, but more complex and dynamic sites can't depend on them. Applications that rely on cache or session data must be able to handle a sequence of requests from the same client accurately and efficiently, without failing. In this follow up to his introduction to server load balancing, Gregor Roth discusses various application-level load balancing architectures, helping you decide which one will best meet the business requirements of your Web site.
The first half of this article describes transport-level server load balancing solutions, such as TCP/IP-based load balancers, and analyzes their benefits and disadvantages. Load balancing on the TCP/IP level spreads incoming TCP connections over the real servers in a server farm. It is sufficient in most cases, especially for static Web sites. However, support for dynamic Web sites often requires higher-level load balancing techniques. For instance, if the server-side application must deal with caching or application session data, effective support for client affinity becomes an important consideration.
Here in Part 2, I'll discuss techniques for implementing server load balancing at the application level to address the needs of many dynamic Web sites.
Read the rest of the article on JavaWorld.
Server farms achieve high scalability and high availability through server load balancing, a technique that makes the server farm appear to clients as a single server. In this two-part article, Gregor Roth explores server load balancing architectures, with a focus on open source solutions. Part 1 covers server load balancing basics and discusses the pros and cons of transport-level server load balancing.
The barrier to entry for many Internet companies is low. Anyone with a good idea can develop a small application, purchase a domain name, and set up a few PC-based servers to handle incoming traffic. The initial investment is small, so the start-up risk is minimal. But a successful low-cost infrastructure can become a serious problem quickly. A single server that handles all the incoming requests may not have the capacity to handle high traffic volumes once the business becomes popular. In such a situations companies often start to scale up: they upgrade the existing infrastructure by buying a larger box with more processors or add more memory to run the applications.
Read the rest of the article on JavaWorld.
EVE Online is "The World's Largest Game Universe", a massively multiplayer online game (MMO) made by CCP. EVE Online's Architecture is unusual for a MMOG because it doesn't divide the player load among different servers or shards. Instead, the same cluster handles the entire EVE universe. It is an interesting to compare this with the Architecture of the Second Life Grid. How do they manage to scale?
- EVE Insider Dev Blog
- EVE Online FAQ
- Massively - Eve Evolved: Eve Online's Server Model and its discussion on Slashdot
- EVE Online Forums
- Massively Multiplayer Game Development 2
- Stackless Python used for both server and client game logic. It allows programmers to reap the benefits of thread-based programming without the performance and complexity problems associated with conventional threads.
- SQL Server
- Blade servers with SSDs for high IOPS
- Plans to use Infiniband interconnects for low latency networking
- Founded in 1997
- ~300K active users
- Up to 40K concurrent users
- Battles involving hundreds of ships
- 250M transactions per day
- Proxy Blades - These are the public facing segment of the EVE Cluster - they are responsible for taking player connections and establishing player communication within the rest of the cluster.
- SOL Blades - These are the workhorses of Tranquility. The cluster is divided across 90 - 100 SOL blades which run 2 nodes each. A node is the primarily CPU intensive EVE server process running on one core. There are some SOL blades dedicated to one busy solar systems such as Jita, Motsu and Saila.
- Database Cluster - This is the persistence layer of EVE Online. The running nodes interact heavily with the Database, and of course pretty much everything to do with the game lives here. Thanks to Solid-state drives, the database is able to keep up with the enormous I/O load that Tranquility generates.
- With innovative ideas MMO games can scale up to the hundreds of players in the same battle.
- SSDs will in fact bridge the gap huge performance gap between the memory and disks to some extent.
- Low latency Infiniband network interconnect will enable larger clusters.
ArchitectureThe EVE Cluster is broken into 3 distinct layers
Lessons LearnedThere are many interesting facts about the architecture of the EVE Online MMOG such as the use of Stacless Python and SSDs.
One particularly interesting EC2 third party provider is GigaSpaces with their XAP platform that provides in memory transactions backed up to a database. The in memory transactions appear to scale linearly across machines thus providing a distributed in-memory datastore that gets backed up to persistent storage.
Abstract—This paper presents the architecture and characteristics of a memory database intended to be used as a cache engine for web applications. Primary goals of this database are speed and efficiency while running on SMP systems with several CPU cores (four and more). A secondary goal is the support for simple metadata structures associated with cached data that can aid in efficient use of the cache. Due to these goals, some data structures and algorithms normally associated with this field of computing needed to be adapted to the new environment.