Paper

Paper: Heracles: Improving Resource Efficiency at Scale

High Scalability

04 Jun 2015 — 1 min read

Underutilization and segregation are the classic strategies for ensuring resources are available when work absolutely must get done. Keep a database on its own server so when the load spikes another VM or high priority thread can't interfere with RAM, power, disk, or CPU access. And when you really need fast and reliable networking you can't rely on QOS, you keep a dedicated line.

Google flips the script in Heracles: Improving Resource Efficiency at Scale, shooting for high resource utilization while combining different load profiles.

I'm assuming the project name Heracles was chosen not simply for his legendary strength, but because when strength failed, Heracles could always depend on his wits. Who can ever forget when Heracles tricked Atlas into taking the sky back onto his shoulders? Good times.

The problem: better utilization of compute resources while complying service level objectives (SLOs) for latency-critical (LC) and best effort batch (BE) tasks:

Several studies have established that the average server utilization in most datacenters is low, ranging between 10% and 50%...The cost of such underutilization can be significant. For instance, Google websearch servers often have an average idleness of 30% over a 24 hour period. For a hypothetical cluster of 10,000 servers, this idleness translates to a wasted capacity of 3,000 servers.

The solution:

a) coordinated management of multiple isolation mechanisms is key to achieving high utilization without SLO violations; b) carefully separating interference into independent subproblems is effective at reducing the complexity of the dynamic control problem; and c) a local, real-time controller that monitors latency in each server is sufficient...Heracles manages 4 mechanisms to mitigate interference: core isolation, LLC isolation, power isolation, network traffic isolation.

The result:

We evaluate Heracles using production latency critical and batch work loads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated...Heracles also improves throughput/TCO by 15% to 300%.

Kafka 101

This is a guest article by Stanislav Kozlovski, an Apache Kafka Committer. If you would like to connect with Stanislav, you can do so on Twitter and LinkedIn. Originally developed in LinkedIn during 2011, Apache Kafka is one of the most popular open-source Apache projects out there. So far it

Capturing A Billion Emo(j)i-ons

This blog post was written by Dedeepya Bonthu. This is a repost from her Medium article, approved by the author. In stadiums, sports fans love to express themselves by cheering for their favorite teams, holding up placards and team logos. Emoji’s allow fans at home to rapidly express themselves,

Brief History of Scaling Uber

This blog post was written by Josh Clemm, Senior Director of Engineering at Uber Eats. This is a repost from his LinkedIn article, approved by the author. On a cold evening in Paris in 2008, Travis Kalanick and Garrett Camp couldn't get a cab. That's when

Behind AWS S3’s Massive Scale

This is a guest article by Stanislav Kozlovski, an Apache Kafka Committer. If you would like to connect with Stanislav, you can do so on Twitter and LinkedIn. AWS S3 is a service every engineer is familiar with. It’s the service that popularized the notion of cold-storage to the

Read more

Kafka 101

Capturing A Billion Emo(j)i-ons

Brief History of Scaling Uber

Behind AWS S3’s Massive Scale