Strategy: Limit Result Sets

Release It! author Michael Nygard tells a tale of two web sites, both brought low by unexpectedly huge unbounded results sets that slowed down their sites to the speed of a Christmas checkout line.

I've committed this error more than a few times. During testing the results sets are often small, so you don't see problems. Or when a product is new you don't have a lot of data so everything is fine, until some magic line is crossed and you get that dreaded 2AM fix it call.

My most embarrassing bug of this type caused a rather spectacular failure at a customer site as the variance in response times was out of spec and this kicked in penalty clauses. What happened was the customer had a larger network than we could even test (customers always get the good stuff). I took a lock and went to get all the data. Because the result set was so much larger in their larger system I took the lock for many more milliseconds than I should have. Unknown to me a chunk of code on the critical path also was in the lock path and all hell broke loose. I had to change the logic to process the result set in fixed size deterministic chunks, releasing locks as I went. I even had to measure CPU usage and back off after a certain amount of CPU was used. But all was well again. I then hunted down every other place I made the same mistake. And there were a few. To solve this problem in general I developed an architecture supporting scheduling work by CPU usage.

A common theme in many of the profiles on this site is protecting your system from requests that can bring down the system. Mailinator has a lot resource exhaustion problems and does a good job solving them.

Ebay has an interesting strategy of doing as little work as possible in the database which leads them to do joins in application space. Which is exactly the opposite of this strategy's conclusion. But I think this may be going too far. With proper indexes performing selects in the database to minimize the result sets would seem to be a win as databases are good at this sort of thing. Yah, relational databases suck at doing top 10 type of logic, so calculate that on the fly and cache it.

How can you bound results sets?

  • Michael's strategy:
    You should make your apps be paranoid about their data. If your app processes one record at a time, then looping through an entire result set might be OK---as long as you're not making a user wait while you do. But if your app that turns rows into objects, then it had better be very selective about its SELECTs. The relationships might not be what you expect. The data producer might have changed in a surprising way, particularly if it's not under your control. Purging routines might not be in place, or might have gotten broken. Definitely don't trust some other application or batch job to load your data in a safe way.

  • A diagnostic tactic is to benchmark all page response times and look for pages that take too long. This should be an automated system that emails you or shows alerts in your dashboard system. And because you logged everything you can figure out why the response was slow and then take measures to fix it.