
By Todd Hoff

Strategy: Sample to Reduce Data Set

Update: Arjen links to the video Supporting Scalable Online Statistical Processing, which shows that
"rather than doing complete aggregates, use statistical sampling to provide a reasonable estimate (unbiased guess) of the result."
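To make "unbiased guess" concrete, here is a minimal sketch assuming a simple random sample drawn without replacement. This is a standard textbook estimator, not necessarily what the video uses, and the function name `estimate_sum` is hypothetical. The total is estimated as N times the sample mean, and the standard error includes the finite population correction, so it shrinks to zero as the sample approaches the whole data set.

```python
import random
import statistics


def estimate_sum(population, sample_size, seed=None):
    """Hypothetical helper: estimate sum(population) from a simple random sample.

    N * mean(sample) is an unbiased estimator of the population total.
    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    n_total = len(population)
    # Draw sample_size distinct items, without replacement.
    sample = rng.sample(population, sample_size)
    estimate = n_total * statistics.fmean(sample)

    if sample_size > 1:
        s = statistics.stdev(sample)
        # Finite population correction: error goes to 0 as the sample nears N.
        fpc = (1 - sample_size / n_total) ** 0.5
        stderr = n_total * s / sample_size ** 0.5 * fpc
    else:
        stderr = float("inf")  # one item gives no estimate of spread
    return estimate, stderr


if __name__ == "__main__":
    rng = random.Random(42)
    orders = [rng.uniform(1, 100) for _ in range(1_000_000)]
    est, se = estimate_sum(orders, 10_000, seed=7)
    print(f"exact={sum(orders):.0f} estimate={est:.0f} +/- {1.96 * se:.0f} (approx. 95%)")
```

Reading 1% of the rows gives an estimate with a quantified error bar, which is often good enough for dashboards and capacity planning.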

When you have a lot of data, sampling lets you draw conclusions from a much smaller subset of it. That's why sampling is a scalability solution: if you don't have to process all your data to get the information you need, you've made the problem smaller, so you'll need fewer resources and you'll get results sooner.
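When the data arrives as a stream whose length you don't know in advance, one common way to shrink it is reservoir sampling (Algorithm R). The post doesn't prescribe a method; this is a technique swapped in for illustration, and `reservoir_sample` is a hypothetical name. It keeps a fixed-size, uniformly random sample using O(k) memory no matter how much data flows past.

```python
import random
import statistics
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Hypothetical helper: pick k items uniformly at random from a stream.

    Each item ends up in the result with probability k / n, where n is the
    total number of items seen, without ever storing more than k of them.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i (0-based) with probability k / (i + 1),
            # evicting a uniformly chosen slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # A generator stands in for a large stream that never fits in memory.
    stream = (x for x in range(1_000_000))
    sample = reservoir_sample(stream, 1_000, seed=42)
    print(len(sample), statistics.fmean(sample))  # mean should be near 499999.5
```

Downstream aggregates then run over 1,000 items instead of a million, trading a small, measurable error for a large cut in work.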
