Wednesday, April 24, 2013 at 9:25AM
Solving problems while saving money is always a problem. In "Nobody ever got fired for using Hadoop on a cluster", the authors give some counter-intuitive advice by showing that a big-memory server may provide better performance per dollar than a cluster:
- For jobs where the input data is multi-terabyte or larger, a Hadoop cluster is the right solution.
- For smaller problems, memory has reached a GB/$ ratio where it is technically and financially feasible to use a single server with 100s of GB of DRAM rather than a cluster. Given that the majority of analytics jobs do not process huge data sets, a cluster doesn't need to be your first option. Scaling up RAM saves programmer time, reduces programmer effort, improves accuracy, and reduces hardware costs.
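
The rule of thumb in the two bullets can be sketched as a small Python check. This is only an illustration, not anything from the paper: the function name `choose_platform` and the 512 GB default for server memory are assumptions picked to stand in for "100s of GB of DRAM", and a real decision would also weigh cost, job type, and growth.

```python
GB = 1024 ** 3
TB = 1024 ** 4


def choose_platform(input_bytes, server_ram_bytes=512 * GB):
    """Pick a platform from input size alone.

    Hypothetical helper: server_ram_bytes is an assumed stand-in for a
    big-memory server with "100s of GB of DRAM".
    """
    # If the whole input fits in one server's memory, scale up first.
    if input_bytes <= server_ram_bytes:
        return "single big-memory server"
    # Otherwise the input is too large for one box, so scale out.
    return "Hadoop cluster"


# Made-up example job sizes, for illustration only.
for label, size in [("50 GB job", 50 * GB), ("5 TB job", 5 * TB)]:
    print(label, "->", choose_platform(size))
```

The point of the sketch is the order of the checks: a single server is tried first, and the cluster is the fallback rather than the default.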