Implementing large scale web analytics

Does anyone know of any articles or papers that discuss the nuts and bolts of how web analytics is implemented at organizations with large volumes of web traffic and a critical business need to analyze that data, e.g. places like Amazon.com, eBay, and Google?

Just as a fun project, I'm planning to build my own web log analysis app that can effectively index and query large volumes of web log data (i.e. in the TB range). But first I'd like to learn more about how it's done in the organizations whose lifeblood depends on this stuff. Even just a high-level architectural overview of their approaches would be nice to have.

Reader Comments (2)

There is a video of a Facebook presentation on the same topic. It's in the Yahoo coffee talk series, I think.

November 29, 1990 | Unregistered Commenter Anonymous

I can't seem to find it, but I did find this:


The post/article can be summarized as "we use Hadoop". I wish there were more details though.

November 29, 1990 | Unregistered Commenter robw
