Re: Database People Hating on MapReduce

No inside knowledge here, but my understanding is that BigTable is a storage mechanism and MapReduce is a distributed calculation infrastructure. One of the big issues in getting the M/R and RDBMS folks talking is that most of the M/R folks come from or work in the NLP field, where answers are inherently subjective and fuzzy matching algorithms are king. I find similar frustrations getting strongly typed and dynamically typed language people to see that each has its sweet spot.

I've found "Managing Gigabytes" (Witten) and "Foundations of Statistical Natural Language Processing" (Manning/Schütze) to be the best inoculation for RDBMS folks trying to think about NLP and text search problems.

BTW - One criticism I would have of M/R is that it seems horribly inefficient in terms of computation. Their goal was probably development agility for parallel computation, so that's not a big ding. I worry about inter-node bandwidth, though, because data movement can dominate everything else. When working on a text search engine back in '93, I "cleaned up" some code in a way that pushed one data map out of L1 cache into main memory, and indexing speed dropped by 40%.
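
To make the bandwidth worry concrete, here is a toy single-process sketch of the M/R pattern on a word count. It is not Google's implementation, and the names (map_phase, shuffle, reduce_phase, pairs_moved) are just ones I made up for illustration. The thing to watch is pairs_moved: every intermediate pair the map step emits has to pass through the shuffle, which on a real cluster would mean traffic between nodes.

```python
from collections import defaultdict

def map_phase(doc):
    # Emit one (word, 1) pair per token, as in the usual word-count example.
    return [(word, 1) for word in doc.lower().split()]

def shuffle(mapped_outputs):
    # Group values by key. On a real cluster this is the step that moves
    # data between nodes; here we only count the pairs that would move.
    groups = defaultdict(list)
    moved = 0
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
            moved += 1
    return groups, moved

def reduce_phase(groups):
    # Sum the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input: each string stands in for a document on its own node.
docs = ["the cat sat", "the cat ran", "the dog sat"]
mapped = [map_phase(d) for d in docs]
groups, pairs_moved = shuffle(mapped)
counts = reduce_phase(groups)

print(counts)
print("intermediate pairs shuffled:", pairs_moved)
```

Even in this tiny case, one pair per input token goes through the shuffle to produce only five final counts, which is the kind of overhead I have in mind.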

Cheers,
Clark
