Strategy: Use Spare Region Capacity to Survive Availability Zone Failures
In the wake of the recent Amazon problems, Ryan Lackey offers some practical, first-responder advice for surviving in the cloud:
If you're a large site (particularly a PaaS) on AWS and care about availability, you need to have spare capacity in your region (using Reserved Instances, like Netflix does) to cover when a single AZ disappears, and your own load balancing external to AWS (not DNS based), with your own per-AZ subsidiary load balancers (nginx or whatever) running within EC2.
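To make the "spare capacity" arithmetic concrete, here is a minimal sketch, not taken from the original advice. It assumes load is spread evenly across AZs, that capacity can be counted in whole instances, and that the external load balancer only needs a boolean health result for each per-AZ balancer. The function names and AZ labels are hypothetical and for illustration only.

```python
from __future__ import annotations

import math


def per_az_capacity(peak_instances: int, num_azs: int, azs_lost: int = 1) -> int:
    """Instances to run in each AZ so the surviving AZs can still carry peak load.

    Assumes (hypothetically) that load is spread evenly across AZs.
    """
    if num_azs <= azs_lost:
        raise ValueError("need more AZs than the number you plan to lose")
    surviving = num_azs - azs_lost
    # Round up: a fractional instance still means a whole instance to run.
    return math.ceil(peak_instances / surviving)


def total_capacity(peak_instances: int, num_azs: int, azs_lost: int = 1) -> int:
    """Total instances to keep in the region, including the spare headroom."""
    return per_az_capacity(peak_instances, num_azs, azs_lost) * num_azs


def healthy_balancers(az_health: dict[str, bool]) -> list[str]:
    """Per-AZ balancers that the external load balancer may still send traffic to."""
    healthy = [az for az, ok in sorted(az_health.items()) if ok]
    if not healthy:
        raise RuntimeError("no healthy AZ left in this region")
    return healthy


def pick_balancer(az_health: dict[str, bool], request_id: int) -> str:
    """Round-robin a request across the healthy per-AZ balancers only."""
    healthy = healthy_balancers(az_health)
    return healthy[request_id % len(healthy)]


if __name__ == "__main__":
    # Hypothetical example: 30 instances handle peak load, spread over 3 AZs.
    print(per_az_capacity(30, 3))  # instances per AZ
    print(total_capacity(30, 3))   # instances in the region

    # Hypothetical health-check results; one AZ has "disappeared".
    health = {"us-east-1a": True, "us-east-1b": False, "us-east-1c": True}
    print([pick_balancer(health, i) for i in range(4)])
```

With three AZs, surviving the loss of one means each AZ must carry half of peak, so the region runs 45 instances for a 30-instance peak (1.5x). With four AZs, each carries a third, for 40 instances (about 1.33x), which is why spreading across more AZs reduces the cost of the spare capacity.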
You need a robust database layer, ideally multi-region or AWS + nonAWS, but that's more site specific.
Going multiregion is the next step, and the above is an essential part of getting to that point.