Strategy: Don't Use Polling for Real-time Feeds

Ivan Zuzak wrote a fascinating article on Real-time feed processing and filtering using Google App Engine to build Feed-buster, a service that inserts MediaRSS tags into feeds that don't have them. He talks about using polling and PubSubHubBub (real-time) to process FriendFeed feeds. Ivan is trying to devise a separate filtering service where:

  1. filtering services should be applied as close to the publisher as possible so notifications that nobody wants don’t waste network resource.
  2. processing services should be applied as close to the subscriber so that the original update may be transported through the network as a single notification for as long as possible.

Besides being a generally interesting article, Ivan makes an insightful observation on the nature of using polling services in combination with metered Infrastructure/Platform services:

Polling is bad because AppEngine applications have a fixed free daily quota for consumed resources, when the number of feeds the service processed increased - the daily quota was exhausted before the end of the day because FF polls the service for each feed every 45 minutes.

This fits directly in with the ideas in Cloud Programming Directly Feeds Cost Allocation Back into Software Design. My general preference is to poll a distributed queue for work items. It's robust and allows your system to control it's own resource usage by determining when to poll. Otherwise you can easily be overwhelmed by fast pushers. Here the overwhelming is going the other way. Your budget is being overwhelmed by the polling requests. And the more you try approximate real-time with frequent polling requests the more your budget is busted.It's a cool example of how costs, algorithm, and platform choices all feed into and shape product architectures.