Amazon is fixing two of their major problems: no static IP addresses and single datacenter operation. By adding these two new features developers can finally build a no apology system on Amazon. Before you always had to throw in an apology or two. No, we don't have low failover times because of the silly DNS games and unexceptionable DNS update and propagation times and no, we don't operate in more than one datacenter. No more. Now Amazon is adding Elastic IP Addresses and Availability Zones. Elastic IP addresses are far better than normal IP addresses because they are both in tight with Jessica Alba and they are: Static IP addresses designed for dynamic cloud computing. An Elastic IP address is associated with your account, not a particular instance, and you control that address until you choose to explicitly release it. Unlike traditional static IP addresses, however, Elastic IP addresses allow you to mask instance or availability zone failures by programmatically remapping your public IP addresses to any instance associated with your account. Rather than waiting on a data technician to reconfigure or replace your host, or waiting for DNS to propagate to all of your customers, Amazon EC2 enables you to engineer around problems with your instance or software by programmatically remapping your Elastic IP address to a replacement instance. About the new feature RightScale says: Amazon did a very nice job in creating something much more powerful than simply adding “static IPs” to their offering. They are giving us dynamically remappable IP addresses that fit well into the overall cloud computing paradigm that we can use to manage servers better than with traditional hosting solutions. Mostly good news. It's not great news because RightScale also says "Assigning or reassigning an IP to an instance takes a couple of minutes..." So it's not as speedy as one would hope, but at least you don't have to wait for TTL to kick in and everyone up and down the stack to get new IP addresses. Cached static IP addresses will always be valid, which simplifies and speeds things up considerably, especially when using redundant load balancers as the entry point into your system. The other power feature added was the ability to specify which datacenter your instances run in. Amazon calls this feature Availability Zones: Availability Zones provide the ability to place instances in multiple locations. Amazon EC2 locations are composed of regions and availability zones. Regions are geographically dispersed and will be in separate geographic areas or countries. Currently, Amazon EC2 exposes only a single region. Availability zones are distinct locations that are engineered to be insulated from failures in other availability zones and provide inexpensive, low latency network connectivity to other availability zones in the same region. Regions consist of one or more availability zones. By launching instances in separate availability zones, you can protect your applications from failure of a single location. You might also be wondering how fast connections are between datacenters. They are said to be: "inexpensive, low latency network connectivity to other availability zones in the same region." I tend to believe this. I've been surprised before how fast datacenter links can be in that you didn't have code specially for these configurations. How this will impact S3 and SimpleDB latencies is an interesting question. And how system design will need to change once datacenters are in different regions of the world is another interesting question. Other services still have more datacenters, more geographically dispersed datacenters than Amazon, and better content migration capabilities, but this is a great first step that allows developers to add another layer of reliability to their systems. Update: I thought this was a good explanation of using static IPs in Some more about EC2: Slashdot hasn't run many stories on EC2 (none that I know of) because until now it's been a niche service. Without a way to guarantee that you can have a static IP, there had been a single point of failure: if your outward-facing VMs all went down, your only recourse was to start up more VMs on new, dynamically-assigned IPs, point your DNS to them, and wait hours for your users' DNS caches to expire. That meant that while it may have been a good service for sites that needed to do massive private computation, it was an unacceptable hosting service. Now with static IPs, you basically set up your service to have several VMs which provide the outward-facing service (maybe running a webserver, or a reverse proxy for your internal webservers), and you point your public, static IPs at those. If one or more of them goes down, you start up new copies of those VMs and repoint the IPs to them. No DNS changes required.
Greg Linden links to a heavily lesson ladened LISA 2007 paper titled On Designing and Deploying Internet-Scale Services by James Hamilton of the Windows Live Services Platform group. I know people crave nitty-gritty details, but this isn't a how to configure a web server article. It hitches you to a rocket and zooms you up to 50,000 feet so you can take a look at best web operations practices from a broad, yet practical perspective. The author and his team of contributors obviously have a lot of in the trenches experience. Many non-obvious topics are covered. And there's a lot to learn from.
The paper has too many details to cover here, but the big sections are:
In the recommendations we see some of our old favorites:
Personally, I'm still trying to figure out how to make something simple.
Next are some good thoughts on how to design operations friendly software:
And the paper continues along the same lines in each section. Good detailed advice on lots of different topics.
You'll undoubtedly agree with some of the advice and disagree with some. Greg wants faster release cycles, thinks having server affinity for some things is OK, and thinks the advice on allowing humans to throttle load won't work in a crisis. Perfectly valid points, but what's fun is to consider them. Some companies, for example, have a dead-man's switch that must be thrown before one master can failover to another in a multi-datacenter situation. Is that wrong or right? Only the shadow knows.
The advice to "document all conceivable component failures and modes and combinations" sounds good but is truly difficult to do in practice. I went through this process once on a telco project and it took months just to cover all the failure scenarios on a few cards. But the spirit is right I think.
My favorite part of the whole paper is:
We have long believed that 80% of operations issues originate in design and development, so this section
on overall service design is the largest and most important. When systems fail, there is a natural tendency
to look first to operations since that is where the problem actually took place. Most operations issues,
however, either have their genesis in design and development are best solved there.
Understand this and I think much of the rest follows naturally.
Comet has popularized asynchronous non-blocking HTTP programming, making it practically indistinguishable from reverse Ajax, also known as server push. This JavaWorld article takes a wider view of asynchronous HTTP, explaining its role in developing high-performance HTTP proxies and non-blocking HTTP clients, as well as the long-lived HTTP connections associated with Comet.
The RAD Lab (Reliable Adaptive Distributed Systems Laboratory) wants to leapfrog the Big Switch and create The Next Big Switch, skipping the cloud/utility evolutionary stage altogether. This hyper-evolutionary niche buster develops technology so advanced the cloud disperses and you can go back to building your own personal datacenters again. Where Google took years to create their datacenters, using a prefab Datacenter Operating System you might create your own in a long holiday weekend. Not St. Patrick's of course. Their vision: Enable one person to invent and run the next revolutionary IT service, operationally expressing a new business idea as a multi-million-user service over the course of a long weekend. By doing so we hope to enable an Internet "Fortune 1 million". How? By wizardry in the form of a “datacenter operating system” created from a pinch of "statistical machine learning (SML)" and a tincture of "recent insights from networking and distributed systems." But like most magics it's not so outlandish once you understand it:
Hi. I'm looking for a way to share files between EC2 nodes. Currently we are using glusterfs to do this. It has been reliable recently, but in the past it has crashed under high load and we've had trouble starting it up again. We've only been able to restart it by removing the files, restarting the cluster, and filing it up again with our files from backup. This takes ages, and will take even longer the more files we get. What worries me is that it seems to make each node a point of failure for the entire system. One node crashes and soon the entire cluster has crashed. The other problem is adding another node. It seems like you have to take down the whole thing, reconfigure to include the new node, and restart. This kind of defeats the horizontal scaling strategy. We are using 2 EC2 instances as web servers, 1 as a DB master, and 1 as a slave. GlusterFS is installed on the web server machines as well as the DB slave machine (we backup files to s3 from this machine). The files are mostly thumbnails, but also some larger images and media files. Does anyone have a good solution for sharing files between EC2 nodes? I like the ThruDB [http://trac.thrudb.org/] concept of using the local filesystem as a cache for S3, but I'm not sure if ThruDB is mature enough yet. Or maybe some kind of distributed filesystem built on top of git would work? Any ideas? Thanks! ~rvr
[Tim O'Reilly] Continuing my series of queries about how "Web 2.0" companies used databases, I asked Cal Henderson of Flickr to tell me "how the folksonomy model intersects with the traditional database. How do you manage a tag cloud?"
I am working on the design for my database and can't seem to come up with a firm schema. I am torn between normalizing the data and dealing with the overhead of joins and denormalizing it for easy sharding. The data is essentially music information per user: UserID, Artist, Album, Song. This lends itself nicely to be normalized and have separate User, Artist, Album and Song databases with a table full of INTs to tie them together. This will be in a mostly read based environment and with about 80% being searches of data by artist album or song. By the time I begin the query for artist, album or song I will already have a list of UserID's to limit the search by. The problem is that the tables can get unmanageably large pretty quickly and my plan was to shard off users once it got too big. Given this simple data relationship what are the pros and cons of normalizing the data vs denormalizing it? Should I go with 4 separate, normalized tables or one 4 column table? Perhaps it might be best to write the data in both formats at first and see what query speed is like once the tables fill up... Another potential issue would be the fact that inserts will be coming in batches of about 500 - 2000+ per user at a time which will be pretty intensive to pull off for the normalized table as there will need to be quite a few selects for each insert due to the fact that the artist, album or song may already be in the database or it may not requiring an insert. What do you all think?
Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
Consistent hashing is one of those ideas that really puts the science in computer science and reminds us why all those really smart people spend years slaving over algorithms. Consistent hashing is "a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots" and was originally a way of distributing requests among a changing population of web servers. My first reaction to the idea was "wow, that's really smart" and I sadly realized I would never come up with something so elegant. I then immediately saw applications for it everywhere. And consistent hashing is used everywhere: distributed hash tables, overlay networks, P2P, IM, caching, and CDNs. Here's the abstract from the original paper and after the abstract are some links to a few very good articles with accessible explanations of consistent hashing and its applications in the real world. Abstract: We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems. Other excellent resources for learning more about consistent hashing are at: