Paper: Large-scale cluster management at Google with Borg
Joe Beda (@jbeda): Borg paper is finally out. Lots of reasoning for why we made various decisions in #kubernetes. Very exciting.
The hints and allusions are over. We now have everything about Google's long rumored Borg project in one iconic Google style paper: Large-scale cluster management at Google with Borg.
When Google blew our minds by audaciously treating the Datacenter as a Computer it did not go unnoticed that by analogy there must be an operating system for that datacenter/computer.
Now we have the story behind a critical part of that OS:
Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.
It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.
We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.
Virtually all of Google’s cluster workloads have switched to use Borg over the past decade. We continue to evolve it, and have applied the lessons we learned from it to Kubernetes
The next version of Borg was called Omega and Omega is being rolled up into Kubernetes (steersman, helmsman, sailing master), which has been open sourced as part of Google's Cloud initiative.
Note how the world has changed. A decade ago when Google published their industry changing Big Table and Map Reduce papers they launched a thousand open source projects in response. Now we are not only seeing Google open source their software instead of others simply copying the ideas, the software has been released well in advance of the paper describing the software.
The future is still in balance. There's a huge fight going on for the future of what software will look like, how it is built, how it is distributed, and who makes the money. In the search business keeping software closed was a competitive advantage. In the age of AWS the only way to capture hearts and minds is by opening up your software. Interesting times.