Paper: GraphLab: A New Framework For Parallel Machine Learning
Wednesday, June 30, 2010 at 8:41AM
Todd Hoff in BigData, Paper, graph

In the never ending quest to figure out how to do something useful with never ending streams of data, GraphLab: A New Framework For Parallel Machine Learning wants to go beyond low-level programming, MapReduce, and dataflow languages with a new parallel framework for ML (machine learning) which exploits the sparse structure and common computational patterns of ML algorithms. GraphLab enables ML experts to easily design and implement efficient scalable parallel algorithms by composing problem specific computation, data-dependencies, and scheduling.  Our main contributions include: 

From the abstract:

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently  expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large scale real-world problems.           

Related Articles

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.