
Strategy: Using Lots of RAM Often Cheaper than Using a Hadoop Cluster

Solving problems while saving money is always a challenge. In the paper "Nobody ever got fired for using Hadoop on a cluster", the authors give some counter-intuitive advice, showing that a big-memory server may provide better performance per dollar than a cluster:

  1. For jobs where the input data is multi-terabyte or larger, a Hadoop cluster is the right solution.
  2. For smaller problems, memory has reached a GB/$ ratio where it is technically and financially feasible to use a single server with hundreds of GB of DRAM rather than a cluster. Given that the majority of analytics jobs do not process huge data sets, a cluster doesn't need to be your first option. Scaling up RAM saves programmer time and effort, improves accuracy, and reduces hardware costs.
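
The two rules above boil down to a sizing check: does the job's working set fit in one server's RAM? The sketch below is a hypothetical illustration, not code from the paper. The 1 TB cutoff standing in for "multi-terabyte", the 3x expansion from raw input to in-memory working set, and the 80% RAM headroom are all assumptions you would replace with measurements from your own jobs.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # bytes in a gibibyte
TB = 1024 ** 4  # bytes in a tebibyte

# ASSUMPTION: illustrative values only, not taken from the paper.
CLUSTER_INPUT_CUTOFF = 1 * TB   # stand-in for "multi-terabyte" input
DEFAULT_EXPANSION = 3.0         # in-memory working set / raw input size


@dataclass
class Job:
    input_bytes: int
    expansion: float = DEFAULT_EXPANSION  # hypothetical per-job estimate


def working_set_bytes(job: Job) -> float:
    """Estimate how much RAM the job needs once its input is loaded."""
    return job.input_bytes * job.expansion


def choose_platform(job: Job, server_ram_bytes: int, headroom: float = 0.8) -> str:
    """Return 'single-server' if the estimated working set fits within a
    fraction (headroom) of one server's RAM, else 'hadoop-cluster'."""
    if job.input_bytes >= CLUSTER_INPUT_CUTOFF:
        # Rule 1: very large inputs go to a cluster.
        return "hadoop-cluster"
    if working_set_bytes(job) <= server_ram_bytes * headroom:
        # Rule 2: smaller jobs that fit in RAM run on one big-memory server.
        return "single-server"
    return "hadoop-cluster"


if __name__ == "__main__":
    ram = 512 * GB  # hypothetical big-memory server
    print(choose_platform(Job(input_bytes=50 * GB), ram))   # 150 GB needed
    print(choose_platform(Job(input_bytes=200 * GB), ram))  # 600 GB needed
    print(choose_platform(Job(input_bytes=2 * TB), ram))    # multi-terabyte
```

The check compares an estimated working set, not raw input size, against RAM because data loaded into memory often takes more space than the files it came from. A job that fails the check could also be handled by buying a larger server rather than a cluster, which is exactly the trade-off the paper asks you to price out.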


Reader Comments (1)

Interesting perspective when it comes to watching your spend. I would ask what would happen when that server fails? I'm not sure that using a single server allows you to fail gracefully.

May 20, 2013 | Unregistered CommenterDave Berardi
