I owe High Scalability a great deal of credit for the idea behind my latest software project. I was reading about how an older tool I helped create, Func, was used at Tumblr, and it kicked some ideas into gear. This article is about what came of that idea.
My observation, which the article reinforced, was that many shops end up using a configuration management tool (Puppet, Chef, cfengine), a separate deployment tool (Capistrano, Fabric) and yet another separate ad-hoc task execution tool (Func, pssh, etc.) because no single class of tool has historically been good at all three jobs.
My other observation (not from the article) was that the whole "infrastructure as code" movement, while revolutionary, and definitely great for many, was probably secretly grating on a good number of systems administrators. As a software developer, I myself can empathize -- the software design/development/testing process is frequently painful, and I would rather think of infrastructure as being data-driven. Data is supposed to be simple; programs often are not. This is why I made Ansible.
Ansible: How is it Different?
Ansible is a configuration management tool, deployment tool, and ad-hoc task execution tool all in one.
It requires no daemons or any other software to start managing remote machines -- it works over SSH (using paramiko, to make it smoother), and nearly everyone is already running sshd. Because it's using SSH, it should easily pass a security audit and be usable in places that would be resistant to running a root-level daemon with a custom PKI. Best of all, you should probably be able to completely understand most of Ansible in about 20-30 minutes. Hopefully less.
I also wanted to make Ansible maximally extensible. Ansible modules can be written in any language -- not just Ruby or Python, but any language capable of returning JSON or key=value text pairs. Bash or Perl is fine! In this way, Ansible manages to sidestep most of the popular Python vs Ruby language wars entirely, and should be of interest to people who like both -- or neither.
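To make that contract a little more concrete, here is a minimal sketch of what a module could look like in Python. It assumes, purely for illustration, that arguments arrive as one string of key=value pairs and that the result is reported as a single JSON object; the function names and the "changed" and "msg" keys are hypothetical choices for this example, not a description of Ansible's actual module interface.

```python
import json
import shlex


def parse_args(arg_string):
    """Split 'key=value key2="two words"' into a dict (illustrative format)."""
    args = {}
    for token in shlex.split(arg_string):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        args[key] = value
    return args


def run_module(arg_string):
    """Hypothetical module body: build a greeting and report that nothing changed."""
    args = parse_args(arg_string)
    name = args.get("name", "world")
    return {"changed": False, "msg": f"hello {name}"}


# A module only has to emit its result as JSON text on standard output.
print(json.dumps(run_module("name=ops")))
```

The same shape works in Bash or Perl: read some key=value input, do the work, print JSON or key=value text.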
As Ansible is pretty new, it is probably best to grab Ansible from a git checkout. Packages for distributions are coming soon. See the instructions here to get started.
The key concept is that there's really not much of anything to set up. There are no configuration files, daemons, or databases. Ansible does have a host file, which defines what hosts are in what "groups", and you target hosts either by globs (such as "*.example.com") or by groups. If you want to store this inventory list in LDAP, Cobbler, or something else instead, we have a facility for that, but I won't cover it here. If a host isn't listed in the host file, ansible won't manage it.
A host file, which defaults to /etc/ansible/hosts, looks like this:
[dbservers]
alpha.example.com
beta.example.com

[appservers]
gamma.example.com
delta.example.com
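To show how little there is to this format, here is a rough Python sketch that reads a host file like the one above and selects hosts by group name or by glob. The function names are made up for this example, and the matching rule (group name first, then hostname glob) is an assumption rather than a statement of how Ansible resolves patterns.

```python
from fnmatch import fnmatch

HOSTS_TEXT = """\
[dbservers]
alpha.example.com
beta.example.com

[appservers]
gamma.example.com
delta.example.com
"""


def parse_hosts(text):
    """Return {group_name: [hosts]} from the bracketed-group format."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append(line)
        # Hosts listed before any group header are ignored in this sketch.
    return groups


def select(groups, pattern):
    """Try the pattern as a group name first, then as a hostname glob."""
    if pattern in groups:
        return list(groups[pattern])
    every_host = [h for hosts in groups.values() for h in hosts]
    return [h for h in every_host if fnmatch(h, pattern)]


groups = parse_hosts(HOSTS_TEXT)
print(select(groups, "dbservers"))
print(select(groups, "g*.example.com"))
```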
What's perhaps more interesting is that no software needs to be installed on the remote machines. This means that if you have a clean image of your favorite OS running somewhere, Ansible can start managing that system immediately.
Not needing any software installed on the remote machines makes Ansible very useful for places where you have a large number of (perhaps legacy) nodes, but no good way to bootstrap them. So, if you have a lot of machines now, but no automated way to manage them, you don't have to visit each node and get it ready for management. This should also make it great for consultants who have to get something done, but have to get in, get out, and leave no trace.
Using Playbooks For Configuration
Ansible has a powerful but simple configuration management and multinode orchestration format called a "Playbook". One of the main goals for Playbooks is to keep them free of programming-like syntax and nesting, so they are easy to review and audit. Again, the motive behind Ansible is "infrastructure is data", not "infrastructure is code".
Rather than reproduce everything here, see github.com/mpdehaan/ansible-examples for a simplified example of setting up Ganglia. I've used CentOS-6 as the basis for this example, but users of other operating systems can at least get the idea of what a playbook looks like.
Here's a full play (below). A playbook can contain more than one play and each play can select a different group of hosts to work with. Hosts are typically defined in /etc/ansible/hosts, but can also be defined by external software.
- hosts: nodes;ganglia.example.com
  user: root
  tasks:
    # what roles to apply?
    - include: tasks/common.yml
    - include: tasks/monitored_server.yml
  handlers:
    - include: handlers/handlers.yml
In the playbook above, we target all nodes in the group "nodes" and in addition to that, explicitly, also add the server named "ganglia.example.com".
Various steps to perform on each host can be stored in the play, or, like I've done here, in separate files to encourage readability. "Handlers" are just like "tasks", but are event-driven, and only get triggered when "change events" occur. If you're familiar with notify/subscribe in Puppet, it's exactly the same concept.
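The notify idea can be sketched in a few lines of Python. This is only a model of the concept: it assumes that each task reports whether it changed anything, and that a notified handler runs once after the task list finishes, which is a detail the description above does not spell out. All names in it are hypothetical.

```python
def run_play(tasks, handlers):
    """Run tasks in order, then run each notified handler once.

    tasks: list of (name, step, notify) where step() returns True if it
           changed something and notify is a list of handler names.
    handlers: dict mapping handler name to a callable.
    """
    notified = []
    log = []
    for name, step, notify in tasks:
        changed = step()
        log.append((name, "changed" if changed else "ok"))
        if changed:
            for handler_name in notify:
                if handler_name not in notified:
                    notified.append(handler_name)  # queue it only once
    for handler_name in notified:
        handlers[handler_name]()
        log.append((handler_name, "handler ran"))
    return log


restarts = []
tasks = [
    ("install gmond", lambda: False, []),
    ("configure gmond", lambda: True, ["restart gmond"]),
    ("ensure ganglia is running", lambda: False, []),
]
handlers = {"restart gmond": lambda: restarts.append("gmond")}
log = run_play(tasks, handlers)
print(log)
```

If "configure gmond" had reported no change, the restart handler would never have been called.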
Just to show a bit more, here's what the monitored_server.yml file looks like:
# file: monitored_server.yml
# this file defines behavior for the 'monitored_server' class of nodes

- name: install gmond
  action: yum pkg=ganglia-gmond state=installed

- name: configure gmond
  action: template src=templates/etc/ganglia/gmond.conf dest=/etc/ganglia/gmond.conf owner=root group=root
  notify:
    - restart gmond

- name: ensure ganglia is running
  action: service name=gmond state=running
As you can see, playbooks are relatively free of programming-language-like syntax. They are just a list of steps to perform for each group of hosts. While this looks like a script, it's not. Each step is "idempotent" (as you may remember from Puppet or Chef), meaning that only changes that need to be made actually get made.
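For anyone who hasn't met the term, here is a small Python illustration of an idempotent step. It is not how any Ansible module is implemented; ensure_file is a hypothetical helper that describes a desired state (a file with given content) and only touches the system when the current state differs.

```python
import os
import tempfile


def ensure_file(path, content):
    """Write content to path only if needed; return True when a change was made."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == content:
                return False  # already in the desired state, do nothing
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "gmond.conf")
    first = ensure_file(target, "cluster = demo\n")   # file is missing, so it is written
    second = ensure_file(target, "cluster = demo\n")  # same content, so nothing happens
    third = ensure_file(target, "cluster = prod\n")   # content differs, so it is rewritten
    print(first, second, third)  # prints: True False True
```

Running the same step twice in a row is safe, which is what separates a list of idempotent steps from a shell script.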
Deployment and Orchestration
So, that's clearly an ops-side configuration example. Why is this system good for app-side deployment and orchestration?
Well, playbooks are ordered and push-based, so you can address one group of hosts and then another. If you need to update your database server and then upgrade your app servers, it's no problem to do very explicit ordering where you jump back and forth between addressing different groups of hosts, just like lines on a football team. Just include multiple plays in the same playbook file, all in order.
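As a rough model of that ordering, the Python sketch below walks a list of plays from top to bottom and pushes each play's tasks to its group of hosts. The task names are invented for the example, and running each task across all hosts before starting the next one is an assumption of this sketch.

```python
def run_playbook(plays, inventory, run_task):
    """Apply each play, in listed order, to the hosts in that play's group."""
    for play in plays:
        hosts = inventory[play["hosts"]]
        for task in play["tasks"]:
            for host in hosts:
                run_task(host, task)


inventory = {
    "dbservers": ["alpha.example.com", "beta.example.com"],
    "appservers": ["gamma.example.com", "delta.example.com"],
}
plays = [
    {"hosts": "dbservers", "tasks": ["upgrade database"]},
    {"hosts": "appservers", "tasks": ["deploy app", "restart app"]},
    {"hosts": "dbservers", "tasks": ["run cleanup"]},  # back to the first group
]

events = []
run_playbook(plays, inventory, lambda host, task: events.append((host, task)))
for host, task in events:
    print(host, task)
```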
You don't just have to work in terms of packages either. Ansible also ships with an example 'git' resource for checking out dynamic language apps straight from source.
Ad Hoc Tasks
A need that comes up frequently when managing a large number of systems is that of running ad-hoc tasks or making things happen via scripts on several machines at once.
Suppose you need to shut down a service right now (in an emergency) or reboot several nodes. Configuration management and deployment tools are obviously the wrong tools for this, because you don't want to describe the desired steady state of the system, or even a process; you just want to run some commands. But you don't want to have to install a special-purpose tool just for this either.
Ansible lets you run these kinds of steps over the same management path that playbooks use, via the /usr/bin/ansible command line. You can also use the exact same resources that you use in playbooks, making things easy to remember.
ansible cluster01 -m service -a "name=memcached state=restarted"
ansible cluster01 -m shell -a "/sbin/reboot"
# etc
Rather than trying to fully document the application here, if you are interested in learning more, or have ideas about the project, see the Ansible web site, follow the project on github, or join the Google Group. If ansible isn't right for your environment, I at least hope it provides some interesting insight into ways of managing software systems.