A group of top Silicon Valley engineers (ex-Yahoo, Facebook, Google) has come together to found a new startup called Cloudera.
The company has not launched yet; it intends to help other companies adopt a promising software platform called Hadoop.
Hadoop is an open-source software project (written in Java) designed to let developers write and run applications that process huge amounts of data. Although it could improve a wide range of other software, the ecosystem supporting its deployment is still developing, and that is where Cloudera hopes to carve out a place for itself.
More on Hadoop: it implements MapReduce, the programming framework introduced by Google, which divides an application into many small units of work that can run in parallel. Its distributed file system (HDFS) stores multiple replicas of each data block on different computer nodes.
Hadoop is already in use at large companies such as Yahoo.
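The block replication described above can be illustrated with a small, single-process Python sketch. It is a toy built on assumptions, not Hadoop code: the function names, the 4-byte block size and the node names are all hypothetical, and the round-robin placement stands in for HDFS's real policy, which also takes rack topology into account. What it does show is the property replication buys: with three copies of each block on distinct nodes, losing any single node leaves every block readable.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the input into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Give each block `replication` distinct nodes, rotating the starting node.

    Illustrative round-robin policy only (hypothetical); it is not HDFS's placement algorithm.
    """
    if not 0 < replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed_node: str) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node != failed_node for node in replicas) for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", 4)  # b"abcd", b"efgh", b"ij"
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    layout = place_replicas(len(blocks), nodes, replication=3)
    for block_id, replicas in layout.items():
        print(block_id, replicas)
    print(readable_after_failure(layout, "node-a"))  # True
```

A real cluster uses far larger blocks (tens of megabytes or more) and tracks block locations in a central metadata service; both details are left out here.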
Compares MapReduce with other parallel processing approaches and suggests a new paradigm for clouds and grids.
Disco is an open-source implementation of the MapReduce framework for distributed computing. It was started at Nokia Research Center as a lightweight framework for rapid scripting of distributed data processing tasks. The Disco core is written in Erlang, while MapReduce jobs in Disco are natively described as Python programs, which often makes it possible to express complex algorithmic and data processing tasks in only tens of lines of code.
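Because Disco jobs are written in Python, a word count fits in a handful of lines. The sketch below only mimics that style and does not use Disco itself: `map_words`, `reduce_counts` and the driver `run_local` are hypothetical names, and the driver simply runs the map, group-by-key (shuffle) and reduce phases in one process instead of submitting a job to a Disco cluster through Disco's own API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all partial counts emitted for one word."""
    return word, sum(counts)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical single-process driver: map, group values by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)  # the "shuffle": collect values per key
    return dict(reduce_counts(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(run_local(text))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a distributed run, the map calls would execute on the nodes holding the input and the grouped values would be shipped to the reducers over the network; the logic of the two user-written functions stays the same.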
A couple of videos about distributed computing, with direct reference to Google's infrastructure.
You will get acquainted with:
--MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabytes) data sets on commodity hardware
--GFS, and the way it stores its data in 64 MB chunks
--Bigtable, Google's implementation of a simple, non-relational database
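GFS's fixed 64 MB chunk size turns locating data into integer arithmetic: a client converts a byte offset within a file into a chunk index by dividing by the chunk size, then asks the master where that chunk's replicas live. The sketch below shows only that arithmetic; the helper names are hypothetical, and the master lookup and chunkserver I/O are left out.

```python
MB = 1024 * 1024
CHUNK_SIZE = 64 * MB  # fixed GFS chunk size


def chunk_index(offset: int) -> int:
    """Index of the chunk that contains byte `offset` of a file."""
    return offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> range:
    """Chunk indices touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    return range(chunk_index(offset), chunk_index(offset + length - 1) + 1)


def chunk_count(file_size: int) -> int:
    """Number of chunks needed to store a file (the last one may be partly full)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


if __name__ == "__main__":
    print(chunk_count(1024 * MB))                    # 16 chunks for a 1 GB file
    print(list(chunks_for_range(60 * MB, 10 * MB)))  # [0, 1]: the read spans a boundary
```

A read that crosses a 64 MB boundary, like the second call, touches two chunks and so may be served by two different sets of chunkservers.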