
4 Strategies for Punching Down Traffic Spikes

Travis Reeder, in Spikability - An Application's Ability to Handle Unknown and/or Inconsistent Load, gives four good ways of handling spiky loads:

  1. Have more resources than you'll ever need. Estimate the maximum traffic you expect and keep that many servers running. Downside is that you are paying for capacity you aren't using.
  2. Disable features during high loads. Reduce load by disabling features or substituting lighter-weight features. Downside is that users lose access to features.
  3. Auto scaling. Launch new servers in response to load. Downsides are that it's complicated to set up and slow to respond. Random spikes can cause instances to cycle up and down.
  4. Use message queues. Queues soak up work requests during traffic spikes. More servers can be started to process work from the queue. Resources aren't wasted and features aren't disabled. Downside is increased latency.
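
As a rough illustration of the fourth strategy, here is a minimal in-process sketch using Python's standard `queue` and `threading` modules. It is not Reeder's implementation: the function name `run_spike` and the doubling "job" are placeholders invented for this example, and a real deployment would use a networked queue service rather than an in-memory one. The point is only that the burst is accepted immediately and drained later by however many workers are available.

```python
import queue
import threading


def run_spike(num_requests, num_workers):
    """Accept a burst of requests at once, then drain it with a small worker pool."""
    work = queue.Queue()          # in-memory stand-in for a real message queue
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:      # sentinel: no more work for this worker
                work.task_done()
                return
            with lock:
                results.append(item * 2)   # placeholder for the real, slow job
            work.task_done()

    # The spike: every request is accepted right away by enqueueing it.
    for i in range(num_requests):
        work.put(i)

    # Workers can be started (or added) after the spike has already arrived.
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for _ in threads:
        work.put(None)            # one sentinel per worker
    work.join()                   # wait until the backlog is fully processed
    for t in threads:
        t.join()
    return results
```

The cost shows up as latency: a request enqueued at the back of a long backlog waits for everything ahead of it.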

Reader Comments (5)

Two more strategies that you missed:
1. Simply throttle traffic. If you're over capacity, just answer X% of the requests with a "fail whale" message. It's not ideal, since you alienate some customers, but it's better than having latencies so big that the service becomes unusable for 100% of your customers. In fact, it could be argued that the second strategy in the article (disable features) is often just a particular form of throttling (where you choose to stop serving some kinds of requests).
2. Cache & serve static versions of your dynamic pages. This works best if you suddenly have a lot of page views due to being "reddited". And you can make this happen automatically: just pass all your dynamic requests through a cache like Varnish; if the backend system doesn't respond in a timely fashion, the cache can simply serve the last version of the page that it saw (there are issues with authenticated pages, since you don't want to divulge private data to other customers, but for public pages this strategy works fine).

July 12, 2012 | Unregistered Commenter virgil
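
The two suggestions in the comment above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Varnish or any particular product works: `LoadShedder` and `StaleOnErrorCache` are hypothetical names, the backend is any callable, and a slow backend is modelled as one that raises `TimeoutError`.

```python
import random


class LoadShedder:
    """Reject a fixed fraction of requests outright when over capacity."""

    def __init__(self, reject_fraction, rng=None):
        if not 0.0 <= reject_fraction <= 1.0:
            raise ValueError("reject_fraction must be between 0 and 1")
        self.reject_fraction = reject_fraction
        self.rng = rng or random.Random()

    def handle(self, request, backend):
        # rng.random() is in [0, 1), so 0.0 rejects nothing and 1.0 rejects everything.
        if self.rng.random() < self.reject_fraction:
            return 503, "Over capacity, please try again shortly."
        return 200, backend(request)


class StaleOnErrorCache:
    """Serve the last good copy of a public page when the backend fails."""

    def __init__(self, backend):
        self.backend = backend
        self.last_good = {}       # url -> last page body successfully fetched

    def get(self, url):
        try:
            page = self.backend(url)
        except TimeoutError:
            if url in self.last_good:
                return self.last_good[url]   # stale, but better than an error
            raise                            # nothing cached yet: nothing to serve
        self.last_good[url] = page
        return page
```

As the comment notes, the cache half is only safe for public pages; anything keyed only by URL would leak one user's private page to another.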

The author's preference for message queues seems a little shortsighted and biased. Yes, message queues are a great way to keep spikes from impacting back-end services, but the overall latency, complexity, and extra infrastructure and management they bring are serious drawbacks. It is also not a very customer-facing solution: it does not remove the need for extra web and app layer servers to accommodate a flood that will ultimately go unanswered as messages time out in queues. Fire and forget is great for the web/app layer, but there needs to be communication back to the requester (user) when a message does fail, and it is better to fail fast than to make the requester wait and then fail (IMHO).

Plus, a traffic spike that lasts only a short time is not the only situation to worry about. Normal usage patterns such as "heavy Mondays" are essentially spikes when looked at over a large enough time scale. Where I currently work, we see nearly twice as much traffic on Monday as on the rest of the business week, and nearly four times what we see on the weekend. Auto scaling is much more appropriate (in our situation), but we do combine it with message queues to protect the back-end services. We try to cache as much as possible before a request ever has to hit a message queue, and we try to fail only when the service is actually down. If we make a user wait, then we'd better give them exactly what they want...

July 13, 2012 | Unregistered Commenter Dipesh
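
One way to read the "fail fast" point above is to bound the queue and refuse work at the door once it is full, instead of accepting a message that will only time out later. The sketch below uses Python's standard `queue` module; the `submit` helper and the limit of two jobs are made up for the example, and this is not a description of the commenter's actual system.

```python
import queue


def submit(work, job):
    """Try to enqueue a job; report failure immediately if the queue is full."""
    try:
        work.put_nowait(job)      # never blocks the requester
    except queue.Full:
        return False              # caller can answer the user with an error right away
    return True


# A bounded queue: at most two jobs may be waiting at any time.
work = queue.Queue(maxsize=2)
accepted = [submit(work, job) for job in ("a", "b", "c")]
```

Here the third request is turned away at once, so the user hears about the failure while the first two are still waiting to be processed.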

If you use XMPP instead of AJAX, then you can disable or discourage features during high-load periods by having users subscribe to the presence of components. When a component gets overloaded, it broadcasts a presence saying so. Clients receive that presence packet and change the GUI accordingly.

July 14, 2012 | Unregistered Commenter Glenn
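
The presence idea above can be modelled without any XMPP library at all. The sketch below is plain Python and only imitates the shape of the interaction: `Component`, `Client`, `report_load`, and `on_presence` are hypothetical names, and a real system would send XMPP presence stanzas over the network rather than call methods directly.

```python
class Component:
    """Hypothetical server-side component that publishes its load state."""

    def __init__(self, name, max_load):
        self.name = name
        self.max_load = max_load
        self.subscribers = []
        self.overloaded = False

    def subscribe(self, client):
        self.subscribers.append(client)
        client.on_presence(self.name, self.overloaded)   # send current state on subscribe

    def report_load(self, load):
        overloaded = load > self.max_load
        if overloaded != self.overloaded:                 # broadcast only on a state change
            self.overloaded = overloaded
            for client in self.subscribers:
                client.on_presence(self.name, overloaded)


class Client:
    """Hypothetical client that greys out features whose component is overloaded."""

    def __init__(self):
        self.disabled_features = set()

    def on_presence(self, component_name, overloaded):
        if overloaded:
            self.disabled_features.add(component_name)
        else:
            self.disabled_features.discard(component_name)
```

Broadcasting only on state changes keeps the notification traffic itself from adding to the load during a spike.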

Well, the trade-offs can be listed as a question: what are you willing to trade off for throughput? Latency? Consistency? Money? Of course, the answer is some mix of all three. Tuning your cache timeouts (i.e. consistency) to your load level is a good start, as is queueing the cache refresh (i.e. serve out of cache the whole time, and even on expiration, serve out of cache until the backend has the new version). Also, there's the matter of how quickly you can turn on various VMs: cache servers, backends, etc. A fast-boot cache server may do some good work for you.

July 22, 2012 | Unregistered Commenter Lally
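
The "queue the cache refresh" idea above might look something like the following. It is a single-threaded sketch with invented names (`RefreshAheadCache`, `refresh_pending`) and an injected clock so the behaviour is easy to follow; a real cache would run the refresh in a background worker and handle fetch failures.

```python
class RefreshAheadCache:
    """Always answer from cache; expired entries are queued for refresh, not refetched inline."""

    def __init__(self, fetch, ttl, clock):
        self.fetch = fetch        # callable: key -> fresh value from the backend
        self.ttl = ttl            # seconds before an entry counts as expired
        self.clock = clock        # callable returning the current time in seconds
        self.entries = {}         # key -> (value, fetched_at)
        self.pending = []         # keys waiting for a refresh

    def get(self, key):
        if key not in self.entries:
            self._store(key)      # cold miss: there is nothing stale to serve
        value, fetched_at = self.entries[key]
        expired = self.clock() - fetched_at >= self.ttl
        if expired and key not in self.pending:
            self.pending.append(key)   # schedule a refresh, but do not wait for it
        return value              # possibly stale, always fast

    def refresh_pending(self):
        """Would run in a background worker; called explicitly in this sketch."""
        while self.pending:
            self._store(self.pending.pop(0))

    def _store(self, key):
        self.entries[key] = (self.fetch(key), self.clock())
```

Raising `ttl` under heavy load is the "tune timeouts to load level" knob: fewer refreshes reach the backend, at the price of staler pages.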

The ability to scale dynamically in the face of variable workloads is one of the promises of the cloud, and it has largely been kept with respect to the upper two tiers of the app stack (presentation and logic) but NOT in the case of the database. That's the whole premise of the company I work at, ParElastic, which brings elasticity to the database tier.

Item #3 in the post above is not so hard with us in the mix. Check us out ...


July 30, 2012 | Unregistered Commenter Amrith Kumar
