« Stuff The Internet Says On Scalability For July 20, 2012 | Main | Strategy: Kill Off Multi-tenant Instances with High CPU Stolen Time »

Disks Ain't Dead Yet: GraphChi - a disk-based large-scale graph computation

GraphChi uses a Parallel Sliding Windows method which can: process a graph with mutable edge values efficiently from disk, with only a small number of non-sequential disk accesses, while supporting the asynchronous model of computation.

The result is graphs with billions of edges can be processed on just a single machine. It uses a vertex-centric computation model similar to Pregel, which supports iterative algorithims as apposed to the batch style of MapReduce. Streaming graph updates are supported.

About GraphChi, Carlos Guestrin, codirector of Carnegie Mellon's Select Lab, says:

A Mac Mini running GraphChi can analyze Twitter's social graph from 2010—which contains 40 million users and 1.2 billion connections—in 59 minutes. "The previous published result on this problem took 400 minutes using a cluster of about 1,000 computers

Related Articles

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>