From bare-metal to Kubernetes

This is a guest post by Hugues Alary, Lead Engineer at Betabrand, a retail clothing company and crowdfunding platform based in San Francisco. This article was originally published here.

How migrating Betabrand's bare-metal infrastructure to a Kubernetes cluster hosted on Google Container Engine solved many engineering issues (from hardware failures to the lack of scalability of our production services, complex configuration management and highly heterogeneous development-staging-production environments) and allowed us to achieve a reliable, available and scalable infrastructure.

This post will walk you through the many infrastructure changes and challenges Betabrand faced from 2011 to 2018.


Early infrastructure

VPS

Betabrand’s infrastructure has changed many times over the course of the 7 years I’ve worked here.

In 2011, the year our CTO hired me, the website was hosted on a shared server with a Plesk interface and no root access (of course). Every newsletter send, to at most a few hundred people, would bring the website to its knees, slowing it to a crawl and at times making it completely unresponsive.

My first order of business became finding a replacement and moving the website to its own dedicated server.

Rackspace

After a few days of online research, we settled on a VPS at Rackspace: 8GB RAM, 320GB disk, 4 virtual CPUs and 150Mbps of bandwidth. A few more days and we were live on our new infrastructure composed of… 1 server running your typical Linux, Apache, PHP, MySQL stack, with a hint of Memcached.

Unsurprisingly, this infrastructure quickly became obsolete.

Not only did it not scale at all but, more importantly, every part of it was a single point of failure. Apache down? Website down. Rackspace instance down? Website down. MySQL down… you get the idea.

Another aspect was its cost.

Our average monthly bill quickly climbed past $1,000, which was quite a price tag for a single machine and the low amount of traffic we generated at the time.

After a couple of years running this stack, in mid-2013, I decided it was time to make our website more scalable and redundant, but also more cost-effective.

I estimated we needed a minimum of 3 servers to make our website somewhat redundant, which would amount to a whopping $14,400/year at Rackspace. Being a really small startup, we couldn’t justify that "high" an infrastructure bill; I kept looking.

The cheapest option ended up being to run our stack on bare-metal servers.

OVH

I had worked with OVH in the past and had always been fairly satisfied (despite mixed reviews online). I estimated that running 3 servers at OVH would amount to $3,240/year, almost 5 times less expensive than Rackspace.

Not only was OVH cheaper, but their servers were also 4 times more powerful than Rackspace’s: 32GB RAM, 8 CPUs, SSDs and unlimited bandwidth.

To top it off they had just opened a new datacenter in North America.

A few weeks later Betabrand.com was hosted at OVH in Beauharnois, Canada.

Hardware infrastructure

Between 2013 and 2017, our hardware infrastructure went through a few architectural changes.

Towards the end of 2017, our stack was significantly larger than it used to be, both in terms of software and hardware.

Betabrand.com ran on 17 bare-metal servers:

2 HAProxy machines in charge of SSL offloading, configured as hot-standby

2 Varnish Cache machines configured as hot-standby, load-balancing to our webservers

5 machines running Apache and PHP-FPM

2 Redis servers, each running 2 separate Redis instances: one for application caching, one for our PHP sessions

3 MariaDB servers configured as master-master, though used in a master-slave manner

3 GlusterFS servers serving all our static assets

Each machine would otherwise run one or multiple processes like keepalived, Ganglia, Munin, logstash, exim, backup-manager, supervisord, sshd, fail2ban, prerender, rabbitmq and… docker.

However, while this infrastructure was very cheap, redundant and had no single point of failure, it still wasn’t scalable and was also much harder to maintain.

The scalability and maintainability issue

Administering our server "fleet" now involved writing a set of Ansible scripts and maintaining them, which, despite Ansible being an amazing piece of software, was no easy feat.

Even though it makes its best effort to get you there, Ansible doesn’t guarantee the state of your system.

For example, running your Ansible scripts on a server fleet made of heterogeneous OSes (say Debian 8 and Debian 9) will bring all your machines to a state close to what you defined, but you will most likely end up with discrepancies; the first one being that you’re running both Debian 8 and Debian 9, but also software versions and configurations that differ from one server to another.
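
For illustration, an ad-hoc command like the following can surface that kind of drift; the inventory file name is just a placeholder:

    # List the distribution and release reported by every host in the fleet
    ansible all -i inventory -m setup -a 'filter=ansible_distribution*'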

I searched quite often for an Ansible replacement, but never found a better fit.

I looked into Puppet but found its learning curve too steep, and, from reading other people’s recipes, was taken aback by what seemed to be too many different ways of doing the same thing. Some people might think of this as flexibility; I see it as complexity.

SaltStack caught my eye but I also found it very hard to learn; despite its extensive, in-depth documentation, its nomenclature choices (mine, pillar, salt, etc.) never stuck with me, and it seemed to suffer from the same complexity issue as Puppet.

The Nix package manager and NixOS sounded amazing, except that I didn’t feel comfortable learning a whole new OS (I’ve been using Debian for years) and was worried that, despite their huge package selection, I would eventually need packages not already available, which would then become something new to maintain.

Those are the only 3 I looked at, but I’m sure there are many other tools out there I’ve never heard of.

Writing Ansible scripts and maintaining them, however, wasn’t our only issue; adding capacity was another one.

With bare-metal, it is impossible to add and remove capacity on the fly. You need to plan your needs well in advance: buy a machine (usually leased for a minimum of 1 month), wait for it to be ready (which can take anywhere from 2 minutes to 3 days), install its base OS, install Ansible’s dependencies (mainly Python and a few other packages) and then, finally, run your Ansible scripts against it.
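
Conceptually, provisioning one new server boiled down to something like the following; the hostname and playbook name here are made up:

    # Install Ansible's main dependency on the freshly delivered machine
    ssh root@new-server 'apt-get update && apt-get install -y python'
    # Then bring it to the desired state with the existing playbooks
    ansible-playbook -i 'new-server,' site.yml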

For us this entire process was wholly impractical, and what usually happened is that we’d add capacity for an anticipated peak load but never remove it afterwards, which in turn added to our costs.

It is worth noting, however, that even though having unused capacity in your infrastructure is akin to setting cash on fire, it is still an order of magnitude less expensive on bare-metal than in the cloud. On the other hand, the engineering headaches that come with using bare-metal servers simply shift the cost from purely material ones to administrative ones.

In our bare-metal setup, capacity planning, server administration and Ansible scripting were just the tip of the iceberg.

Scaling development processes

By early 2017, our infrastructure had grown, and so had our team.

We hired 7 more engineers, making us a small team of 9, with skill sets distributed all over the spectrum from backend to frontend and with varying levels of seniority.

Even in a small team of 9, being productive and limiting the number of bugs deployed to production warrants a simple, easy-to-set-up and easy-to-use development-staging-production trifecta.

Setting up your development environment as a new hire shouldn’t take hours, and neither should upgrading or re-creating it.

Moreover, a staging environment accessible company-wide should exist and match 99% of your production setup, if not 100%.

Unfortunately, with our hardware infrastructure, reaching this harmonious trifecta was impossible.

The development environment

First of all, everybody in our engineering team uses MacBook Pros, which is an issue since our stack is Linux-based.

However, asking everybody to switch to Linux and potentially change their precious workflow wasn’t really ideal. This meant that the best solution was to provide a development environment agnostic of each developer's personal preference in machines.

I could only see two obvious options:

Either provide a Vagrant stack that would run multiple virtual machines (potentially 17, though, more realistically, 1 machine running our entire stack), or reuse the already-written Ansible scripts and run them against our local MacBooks.

After investigating Vagrant, I felt that using virtual machines would hinder performance too much and wasn’t worth it. I decided, for better or worse, to go the Ansible route (in hindsight, this probably wasn’t the best decision).

We would use the same set of Ansible scripts on production, staging and dev. The caveat, of course, being that our development stack, although close to production, was not a 100% match.

This worked well enough for a while; however, the mismatch caused issues later when, for example, our development and production MySQL versions weren’t aligned: some queries that ran on dev wouldn’t run on production.

The staging environment

Secondly, having development and production environments running on widely different software (macOS versus Debian) meant that we absolutely needed a staging environment.

Not only because of potential bugs caused by version mismatches, but also because we needed a way to share new features with external members before launch.

Once again I had multiple choices:

buy 17 servers and run Ansible against them; this would double our costs, though, and we were trying to save money.

set up our entire stack on a single Linux server accessible from the outside; a cheaper solution, but once again not an exact replica of our production system.

I decided to implement the cost-saving solution.

An early version of the staging environment involved 3 independent Linux servers, each running the entire stack. Developers would then yell across the room (or HipChat) "taking over dev1", "is anybody using dev3?", "dev2 is down :/".

Overall, our development-staging-production setup was far from optimal: it did the job, but definitely needed improvement.

The advent of Docker

In 2013, dotCloud released Docker.

The Betabrand use case for Docker was immediately obvious. I saw it as the solution to simplify our development and staging environments by getting rid of the Ansible scripts (well, almost; more on that later).

Those scripts would now only be used for production.

At the time, one main pain point for the team was competing for our three physical staging servers: dev1, dev2 and dev3; and for me, maintaining those 3 servers was a major annoyance.

After observing Docker for a few months, I decided to give it a go in April 2014.

After installing Docker on one of the staging servers, I created a single Docker image containing our entire stack (HAProxy, Varnish, Redis, Apache, etc.), then over the next few months wrote a tool (sailor) allowing us to create, destroy and manage an infinite number of staging environments, each accessible via its own unique URL.

It’s worth noting that docker-compose didn’t exist at the time, and that putting your entire stack inside one Docker image is of course a big no-no, but that’s an unimportant detail here.
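
In essence, sailor automated something along these lines; the image and container names below are purely illustrative:

    # One image containing the whole stack...
    docker build -t betabrand/stack .
    # ...and one container per staging environment, each on its own port/URL
    docker run -d --name staging-feature-x -p 8001:80 betabrand/stack
    docker run -d --name staging-feature-y -p 8002:80 betabrand/stack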

From this point on, the team wasn’t competing for access to the staging servers anymore. Anybody could create a new, fully configured staging container from the Docker image using sailor. I didn’t need to maintain the servers anymore either; better yet, I shut down and cancelled 2 of them.

Our development environment, however, was still running on macOS (well, "Mac OS X" at the time) and using the Ansible scripts.

Then, sometime around 2016, docker-machine was released.

Docker Machine is a tool that takes care of deploying a Docker daemon on any stack of your choice: VirtualBox, AWS, GCE, bare-metal, Azure, you name it, docker-machine does it, in one command line.

I saw it as the opportunity to easily and quickly migrate our Ansible-based development environment to a Docker-based one. I modified sailor to use docker-machine as its backend.

Setting up a development environment was now a matter of creating a new docker-machine and then passing a flag telling sailor to use it.
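
The docker-machine commands below are real; the machine name and the sailor flag are just examples of how this looked for us:

    # Create a local Docker daemon inside a VirtualBox VM
    docker-machine create --driver virtualbox dev
    # Point the local Docker client at it
    eval "$(docker-machine env dev)"
    # Then tell sailor to target that machine, e.g.: sailor create --machine dev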

At this point, our development-staging process had been simplified tremendously, at least from a DevOps perspective: anytime I needed to upgrade any software in our stack to a newer version or change its configuration, instead of modifying my Ansible scripts, asking the whole team to run them, then running them myself on all 3 staging servers, I could now simply push a new Docker image.

Ironically enough, I ended up needing virtual machines (which I had deliberately avoided) to run Docker on our MacBooks. Using Vagrant instead of Ansible would have been a better choice from the get-go. Hindsight is always 20/20.

Using Docker for our development and staging systems paved the way for the better solution that Betabrand.com now runs on.

Kubernetes

Because Betabrand is primarily an e-commerce platform, Black Friday loomed over our website more and more each year.

To our surprise, the website had handled increasingly higher loads since 2013 without any major catastrophe, but it did require a month-long preparation beforehand: adding capacity, load testing and optimizing our checkout code paths as much as we possibly could.

After preparing for Black Friday 2016, however, it became evident the infrastructure wouldn’t scale for Black Friday 2017; I worried the website would become inaccessible under the load.

Luckily, sometime in 2015, the release of Kubernetes 1.0 caught my attention.

Just as I had seen an obvious use case in Docker, I knew k8s was what we needed to solve many of our issues. First of all, it would finally allow us to run almost identical dev, staging and production environments. It would also solve our scalability issues.

I also evaluated 2 other solutions, Nomad and Docker Swarm, but Kubernetes seemed to be the most promising.

For Black Friday 2017, I set out to migrate our entire infra to k8s.

Although I considered it, I quickly ruled out using our current OVH bare-metal servers for our k8s nodes, since it would work against my goal of getting rid of Ansible and of not dealing with all the issues that come with hardware servers. Moreover, soon after I started investigating Kubernetes, Google released its managed Kubernetes offering (GKE), which I rapidly came to choose.

Learning Kubernetes

Migrating to k8s first involved gaining a strong understanding of its architecture and concepts by reading the online documentation.

Most importantly, understanding containers, Pods, Deployments and Services and how they all fit together; then, in order, ConfigMaps, Secrets, DaemonSets, StatefulSets, Volumes, PersistentVolumes and PersistentVolumeClaims.

Other concepts are important, though less necessary to get a cluster going.

Once I had assimilated those concepts, the second, and hardest, step involved translating our bare-metal architecture into a set of YAML manifests.

From the beginning I set out to have one, and only one, set of manifests used to create all three environments: development, staging and production. I quickly ran into the need to parameterize my YAML manifests, which isn’t supported by Kubernetes out of the box. This is where Helm [1] comes in handy.

From the Helm website: "Helm helps you manage Kubernetes applications - Helm Charts help you define, install, and upgrade even the most complex Kubernetes application."
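
With Helm, the same chart can be rendered differently per environment. A minimal sketch, assuming hypothetical per-environment values files and Helm 2's --name flag:

    # One chart, three environments
    helm install ./betabrand --name development --values values/development.yaml
    helm install ./betabrand --name staging --values values/staging.yaml
    helm install ./betabrand --name production --values values/production.yaml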

Helm markets itself as a package manager for Kubernetes, though I originally used it solely for its templating feature. I have now also come to appreciate its package manager aspect, and used it to install Grafana [2] and Prometheus [3].
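
Installing those took little more than the following; the chart names reflect the Helm 2-era stable repository:

    helm install stable/prometheus --name prometheus
    helm install stable/grafana --name grafana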

After a bit of sweat and a few tears, our infrastructure was now neatly organized into 1 Helm package, 17 Deployments, 9 ConfigMaps, 5 PersistentVolumeClaims, 5 Secrets, 18 Services, 1 StatefulSet, 2 StorageClasses, 22 container images.

All that was left was to migrate to this new infrastructure and shut down all our hardware servers.

Officially migrating

October 5th 2017 was the night.

Pulling the trigger was extremely easy and went without a hitch.

I created a new GKE cluster, ran helm install betabrand --name production, imported our MySQL database to Google Cloud SQL, then, after what actually took about 2 hours, we were live in the Clouds.

The migration was that simple.
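
Stripped of its details, the launch-night sequence looked roughly like this; the zone, bucket and database instance names are made up:

    # Create the GKE cluster and point kubectl at it
    gcloud container clusters create production --zone us-central1-a
    gcloud container clusters get-credentials production --zone us-central1-a
    # Import the MySQL dump into Cloud SQL
    gcloud sql import sql production-db gs://betabrand-dumps/betabrand.sql
    # Install the Helm chart
    helm install betabrand --name production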

What helped a lot, of course, was the ability to create multiple clusters in GKE: before migrating our production, I was able to rehearse through many test migrations, jotting down every step needed for a successful launch.

Black Friday 2017 was very successful for Betabrand, and the few technical issues we ran into weren’t related to the migration.

The development/staging environments

Our development machines run a Kubernetes cluster via Minikube [4].

The same YAML manifests are being used to create a local development environment or a "production-like" environment.

Everything that runs on Production, also runs in Development. The only difference between the two environments is that our development environment talks to a local MySQL database, whereas production talks to Google Cloud SQL.
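
Bringing up the full stack locally looks roughly like this; the memory setting and the database value are illustrative:

    # Start a local single-node cluster
    minikube start --memory 8192
    # Install the same chart, pointed at a local MySQL database
    helm install ./betabrand --name development --set database=127.0.0.1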

Creating a staging environment is exactly the same as creating a new production cluster: all that is needed is to clone the production database instance (which is only a few clicks or one command line) and then point the staging cluster at this database via a --set database parameter in Helm.
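
For example, something along these lines; the instance names and IP are placeholders:

    # Clone the production Cloud SQL instance...
    gcloud sql instances clone production-db staging-db
    # ...and point a staging release at the clone
    helm install betabrand --name staging --set database=10.0.0.5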

A year after

It’s now been a year and 2 months since we moved our infrastructure to Kubernetes and I couldn’t be happier.

Kubernetes has been rock solid in production and we have yet to experience an outage.

In anticipation of a lot of traffic for Black Friday 2018, we were able to create an exact replica of our production services in a few minutes and do a lot of load testing. Those load tests revealed specific code paths that performed extremely poorly under heavy traffic, and allowed us to fix them before Black Friday.

As expected, Black Friday 2018 brought more traffic than ever to Betabrand.com, but k8s kept its promises, and features like the HorizontalPodAutoscaler coupled with GKE’s node autoscaling allowed our website to absorb peak loads without any issues.
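
To give an idea of what that combination looks like, here is a sketch; the deployment name, thresholds and node counts are made up, not our actual settings:

    # Scale the web Deployment between 5 and 50 replicas based on average CPU
    kubectl autoscale deployment web --min=5 --max=50 --cpu-percent=70
    # Let GKE add or remove nodes in the pool as Pods demand capacity
    gcloud container clusters update production --enable-autoscaling \
        --min-nodes=3 --max-nodes=20 --node-pool=default-pool --zone=us-central1-a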

K8s, combined with GKE, gave us the tools we needed to make our infrastructure reliable, available, scalable and maintainable.


1. https://helm.sh/
2. https://grafana.com/
3. https://prometheus.io/
4. https://github.com/kubernetes/minikube