Strategy: Survive a Comet Strike in the East With Reserved Instances in the West

Ali Sadat of MuleSoft gave an interesting presentation at Saturday's Talk Cloudy to Me! event about their experience moving Mule iON, their ESB (enterprise service bus) product, to the cloud.

First, a little about Talk Cloudy to Me. This event is the second one-day cloud event put on by Sebastian Stadil, founder of Scalr, as part of the Cloud Computing Meetup group, also created by Sebastian. Sebastian has become a master at running these mini-conference style events. Really a quality job by him and his dedicated crew. These events are free, sponsored by various vendors; they are short, 11-5; the food is good, Thai; the venue is nice, eBay; they are on topic, with cloud and other speakers giving 30-45 minute talks. More on the event when the video comes out. I type as fast as I can, but I can't do much without the video.

Back to Ali Sadat. While he shared a lot of lessons (the integration with your billing system will take a lot longer than you think; talk to your Amazon account reps, as they have good advice you might not have thought about; and move off EBS in favor of local drives in a replicated configuration), the lesson that stuck with me was:

Buy reserved instances in Amazon's US West region (or some secondary region) in case a comet hits the East region (or your primary region).

Besides the vivid imagery, the idea is that if an AZ goes down, everyone will be moving to another region or AZ. In the event of a large failure you may not find the instances you need in your preferred failover AZ. So the way to guarantee you'll have the resources you need is to buy reserved instances ahead of time so you'll have them when you need them. What they do is replicate to 3 AZs within a region and asynchronously replicate to S3. On a major failure the reserved instances in a different region can come up in read-only mode, reload from S3, and provide read-only service until the problem is fixed in the primary region. If it looks like the problem won't be fixed in a reasonable amount of time, I imagine the read-only version of the system could be promoted to a write-available system.
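
To make the failover flow concrete, here is a minimal sketch of the decision the secondary region has to make. It is my own illustration, not MuleSoft's code: the names (Mode, FailoverPolicy, next_mode) and the thresholds are hypothetical, and the promotion step is my speculation rather than something described in the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    STANDBY = "standby"        # reserved instances wait in the secondary region
    READ_ONLY = "read_only"    # serve reads from data reloaded from S3
    READ_WRITE = "read_write"  # promoted to take over from the primary


@dataclass
class FailoverPolicy:
    # Hypothetical thresholds, chosen only for illustration.
    healthy_azs_required: int = 1     # primary is usable with at least this many AZs
    promote_after_minutes: int = 240  # how long to wait before promoting

    def next_mode(self, healthy_azs: int, outage_minutes: int) -> Mode:
        """Pick the secondary region's mode from the primary region's health."""
        if healthy_azs >= self.healthy_azs_required:
            # The primary region can still serve traffic; stay out of the way.
            return Mode.STANDBY
        if outage_minutes >= self.promote_after_minutes:
            # The outage has dragged on; accept writes in the secondary region.
            return Mode.READ_WRITE
        # The primary region is down; serve reads only until it recovers.
        return Mode.READ_ONLY


policy = FailoverPolicy()
print(policy.next_mode(healthy_azs=3, outage_minutes=0))    # Mode.STANDBY
print(policy.next_mode(healthy_azs=0, outage_minutes=30))   # Mode.READ_ONLY
print(policy.next_mode(healthy_azs=0, outage_minutes=300))  # Mode.READ_WRITE
```

The reserved instances are what make the READ_ONLY branch something you can count on: the capacity is already paid for, so you are not competing with everyone else for on-demand instances during the outage.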