Update: Yahoo! Launches World's Largest Hadoop Production Application. A 10,000 core Hadoop cluster produces data used in every Yahoo! Web search query. Raw disk is at 5 Petabytes. Their previous 1 petabyte database couldn't handle the load and couldn't grow larger. Greg Linden thinks the Google cluster has way over 133,000 machines.
From an InfoQ interview with project lead Doug Cutting, it appears Hadoop, an open source distributed computing platform, is making good progress towards their 1.0 release. They've successfully reached a 1000 node cluster size, improved file system integrity, and jacked performance by 20x in the last year.
How they are making progress could be a good model for anyone:
I have read the blog about Mailtrust/Rackspace as well the interesting things with Google and Yahoo. Who else is using Hadoop/MapReduce to solve business problems.
TIA
johnmwillis.com
How do you query hundreds of gigabytes of new data each day streaming in from over 600 hyperactive servers? If you think this sounds like the perfect battle ground for a head-to-head skirmish in the great MapReduce Versus Database War, you would be correct.
Bill Boebel, CTO of Mailtrust (Rackspace's mail division), has generously provided a fascinating account of how they evolved their log processing system from an early amoeba'ic text file stored on each machine approach, to a Neandertholic relational database solution that just couldn't compete, and finally to a Homo sapien'ic Hadoop based solution that works wisely for them and has virtually unlimited scalability potential.
Dryad is Microsoft's answer to Google's map-reduce. What's the question: How do you process really large amounts of data? My initial impression of Dryad is it's like a giant Unix command line filter on steroids. There are lots of inputs, outputs, tees, queues, and merge sorts all connected together by a master exec program. What else does Dryad have to offer the scalable infrastructure wars?
Recent comments
1 hour 51 min ago
8 hours 59 min ago
1 day 1 hour ago
1 day 4 hours ago
1 day 5 hours ago
1 day 6 hours ago
1 day 8 hours ago
1 day 17 hours ago
1 day 21 hours ago
2 days 1 hour ago