mapreduce

Need help with your Hadoop deployment? This company may help!

A group of top Silicon Valley engineers (ex-Yahoo, Facebook, Google) have come together to launch a new startup called Cloudera.
Not yet launched, it intends to help other companies adopt a promising software platform called Hadoop.

Hadoop is an open-source software project (written in Java) designed to let developers write and run applications that process huge amounts of data. While it could potentially improve a wide range of other software, the ecosystem supporting its implementation is still developing. Which is where Cloudera hopes to make a place for itself.

More on Hadoop: It uses the Google-introduced MapReduce systems framework that divides applications into small blocks of work, creating multiple replicas of data blocks that it places on various computer nodes.

It is already in use at large companies like Yahoo.
Read more about Cloudera here.

Is MapReduce going mainstream?

Compares MapReduce to other parallel processing approaches and suggests new paradigm for clouds and grids

tmielika's picture

MapReduce framework Disco

Disco is an open-source implementation of the MapReduce framework for distributed computing. It was started at Nokia Research Center as a lightweight framework for rapid scripting of distributed data processing tasks. The Disco core is written in Erlang. The MapReduce jobs in Disco are natively described as Python programs, which makes it possible to express complex algorithmic and data processing tasks often only in tens of lines of code.

Marcelb's picture

Distributed Computing & Google Infrastructure

A couple of videos about distributed computing with direct reference on Google infrastructure.
You will get acquainted with:

--MapReduce the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware
--GFS and the way it stores it's data into 64mb chunks
--Bigtable which is the simple implementation of a non-relational database at Google

Cluster Computing and MapReduce Lectures 1-5.

Syndicate content