Re: Database People Hating on MapReduce
No inside knowledge here, but my understanding was that BigTable is a storage mechanism and MapReduce is a distributed computation infrastructure. One of the big issues in getting the M/R and RDBMS folks talking is that most of the M/R folks come from or work in the NLP field, where answers are inherently subjective and fuzzy matching algorithms are king. I find similar frustrations getting strongly typed and dynamically typed language people to see that each has its sweet spot.
I've found "Managing Gigabytes" (Witten) and "Foundations of Statistical Natural Language Processing" (Manning/Schütze) to be the best inoculation for RDBMS folks trying to think about NLP and text search problems.
BTW - One criticism I would have of M/R is that it seems horribly inefficient in terms of computation. Their goal was probably development agility for parallel computation, so that's not a big ding. I worry about inter-node bandwidth, though. When working on a text search engine back in '93, I "cleaned up" some code in a way that pushed one data map from L1 cache to main memory and dropped indexing speed by 40%.
Cheers,
Clark