Improve small job completion times by 47% by running full clones.
The idea is most jobs are small. Researchers found 82% of jobs on Facebook's cluster were less than 10 tasks. Clusters have a median utilization of under 20%. And since small jobs are particularly sensitive to stragglers the audacious solution is to proactively launch clones of a job as they are submitted and pick the result from the earliest clone. The result is an average completion time of all the small jobs improved by 47% using cloning, at the cost of just 3% extra resources.
For more details take a look at the very interesting Why Let Resources Idle? Aggressive Cloning of Jobs with Dolly.