Wednesday, April 18, 2012

Ansible - A Simple Model-Driven Configuration Management and Command Execution Framework

This is a guest post by Michael DeHaan (@laserllama), a software developer and architect, on Ansible, a simple deployment, model-driven configuration management, and command execution framework.

I owe High Scalability a great deal of credit for the idea behind my latest software project. I was reading about how an older tool I helped create, Func, was used at Tumblr, and it kicked some ideas into gear. This article is about what happened from that idea.

My observation, which the article reinforced, was that many shops end up using a configuration management tool (Puppet, Chef, cfengine), a separate deployment tool (Capistrano, Fabric) and yet another separate ad-hoc task execution tool (Func, pssh, etc) because one class of tool historically hasn't been good at all three jobs.

My other observation (not from the article) was that the whole "infrastructure as code" movement, while revolutionary, and definitely great for many, was probably secretly grating on a good number of systems administrators. As a software developer, I myself can empathize -- the software design/development/testing process is frequently painful, and I would rather think of infrastructure as being data-driven. Data is supposed to be simple; programs are often not. This is why I made Ansible.

Ansible: How is it Different?

Ansible is a configuration management tool, deployment tool, and ad-hoc task execution tool all in one.

It requires no daemons or any other software to start managing remote machines -- it works using SSHd (using paramiko, to make it smoother), which is something nearly everyone is running already. Because it's using SSH, it should easily pass a security audit and be usable in places that would be resistant to running a root-level daemon with a custom PKI infrastructure. Best of all, you should probably be able to completely understand most of Ansible in about 20-30 minutes. Hopefully less.

I also wanted to make Ansible maximally extensible. Ansible modules can be written in any language -- not just Ruby or Python, but any language capable of returning JSON or key=value text pairs. Bash or Perl is fine! In this way, Ansible manages to sidestep most of the popular Python vs Ruby language wars entirely, and should be of interest to people who like both -- or neither.
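To make the "any language that can return JSON" idea concrete, here is a minimal sketch of what such a module could look like, written in Python only for illustration. The function names, the way arguments reach the script, and the result fields ("changed", "echo") are assumptions made for this example, not Ansible's documented module contract.

```python
import json
import sys


def parse_args(arg_string):
    """Split 'key=value key2=value2' text into a dict, ignoring stray tokens."""
    pairs = (token.split("=", 1) for token in arg_string.split() if "=" in token)
    return dict(pairs)


def run_module(arg_string):
    """Pretend to do some work and describe the outcome as plain data."""
    args = parse_args(arg_string)
    # 'changed' and 'echo' are illustrative field names (an assumption).
    return {"changed": False, "echo": args}


if __name__ == "__main__":
    # Assumption: arguments arrive on the command line as key=value text,
    # and the result is reported by printing one JSON document to stdout.
    print(json.dumps(run_module(" ".join(sys.argv[1:]))))
```

The point of the sketch is that the interface is just text in and structured text out, which is why a Bash or Perl script could play the same role.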

Initial Setup

As Ansible is pretty new, it is probably best to grab Ansible from a git checkout. Packages for distributions are coming soon. See the instructions here to get started.

The key concept is that there's really not much of anything to set up. There are no configuration files, daemons, or databases. Ansible does have a host file, which defines what hosts are in what "groups", and you target hosts either by globs such as "*.example.com" or by groups. If you want to store this inventory list in LDAP, Cobbler, or something else instead, we have a facility for that, but I won't cover it here. If a host isn't listed in the host file, Ansible won't manage it.

A host file, which defaults to /etc/ansible/hosts, looks like this:

[dbservers]
alpha.example.com
beta.example.com

[appservers]
gamma.example.com
delta.example.com
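As a rough illustration of how a host file like the one above could be turned into groups and then targeted by group name or by glob, here is a small Python sketch. It is not Ansible's own parser; the function names parse_hosts and select, and the rule that a group name is tried before a glob, are assumptions for this example.

```python
from fnmatch import fnmatch

HOSTS_TEXT = """\
[dbservers]
alpha.example.com
beta.example.com

[appservers]
gamma.example.com
delta.example.com
"""


def parse_hosts(text):
    """Return {group_name: [hosts]} from INI-style host file text."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups[current] = []
        elif current is not None:
            # Hosts listed before any [group] header are ignored in this sketch.
            groups[current].append(line)
    return groups


def select(groups, pattern):
    """Try the pattern as a group name first, then as a glob over all hosts."""
    if pattern in groups:
        return list(groups[pattern])
    all_hosts = [host for hosts in groups.values() for host in hosts]
    return [host for host in all_hosts if fnmatch(host, pattern)]


if __name__ == "__main__":
    inventory = parse_hosts(HOSTS_TEXT)
    print(select(inventory, "dbservers"))
    print(select(inventory, "*.example.com"))
```

A host that is not in the file never appears in any selection, which mirrors the rule that unlisted hosts are not managed.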

What's perhaps more interesting is that no software needs to be installed on the remote machines. This means that if you have a clean image of your favorite OS running somewhere, Ansible can start managing that system immediately.

Not needing any software installed on the remote machines makes Ansible very useful for places where you have a large number of (perhaps legacy) nodes, but no good way to bootstrap them. So, if you have a lot of machines now, but no automated way to manage them, you don't have to visit each node and get it ready for management. This should also make it great for consultants who have to get something done, but have to get in, get out, and leave no trace.

Using Playbooks For Configuration

Ansible has a powerful but simple configuration management and multinode orchestration format called a "Playbook". One of the main goals for Playbooks is to keep them free of programming-like syntax and nesting, so they are easy to review and audit. Again, the motive behind Ansible is "infrastructure is data", not "infrastructure is code".

Rather than reproduce everything here, see github.com/mpdehaan/ansible-examples for a simplified example of setting up Ganglia. I've used CentOS-6 as the basis for this example, but users of other operating systems can at least get the idea of what a playbook looks like.

Here's a full play (below). A playbook can contain more than one play and each play can select a different group of hosts to work with. Hosts are typically defined in /etc/ansible/hosts, but can also be defined by external software.

- hosts: nodes;ganglia.example.com
  user: root
  tasks:
     # what roles to apply?
     - include: tasks/common.yml
     - include: tasks/monitored_server.yml
  handlers:
     - include: handlers/handlers.yml

In the playbook above, we target all nodes in the group "nodes" and in addition to that, explicitly, also add the server named "ganglia.example.com".

Various steps to perform on each host can be stored in the play, or, like I've done here, in separate files to encourage readability. "Handlers" are just like "tasks", but are event-driven, and only get triggered when "change events" occur. If you're familiar with notify/subscribe in Puppet, it's exactly the same concept.

Just to show a bit more, here's what the monitored_server.yml file looks like:

# file: monitored_server.yml
# this file defines behavior for the 'monitored_server' class of nodes

- name: install gmond
  action: yum pkg=ganglia-gmond state=installed

- name: configure gmond
  action: template src=templates/etc/ganglia/gmond.conf dest=/etc/ganglia/gmond.conf owner=root group=root
  notify:
      - restart gmond

- name: ensure ganglia is running
  action: service name=gmond state=running

As you can see, playbooks are relatively free of programming-language-like syntax. They are just a list of steps to perform for each group of hosts. While this looks like a script, it's not. Each step is "idempotent" (as you may remember from Puppet or Chef), meaning that only changes that need to be made actually get made.
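The combination of idempotent steps and change-triggered handlers can be sketched in a few lines of Python. This is a toy model, not Ansible's implementation: ensure_file and run_play are hypothetical names, and running each notified handler once after all tasks have finished is an assumption made for the example.

```python
import os
import tempfile


def ensure_file(path, content):
    """Write content only if the file differs; return True when a change was made."""
    if os.path.exists(path):
        with open(path) as handle:
            if handle.read() == content:
                return False  # already in the desired state, nothing to do
    with open(path, "w") as handle:
        handle.write(content)
    return True


def run_play(tasks, handlers):
    """Run tasks in order, then run each notified handler once."""
    notified = []
    for task in tasks:
        changed = task["run"]()
        if changed:
            for name in task.get("notify", []):
                if name not in notified:
                    notified.append(name)
    for name in notified:
        handlers[name]()
    return notified


if __name__ == "__main__":
    conf = os.path.join(tempfile.mkdtemp(), "gmond.conf")
    tasks = [{
        "run": lambda: ensure_file(conf, "cluster=demo\n"),
        "notify": ["restart gmond"],
    }]
    handlers = {"restart gmond": lambda: print("restarting gmond (pretend)")}
    print(run_play(tasks, handlers))  # first run writes the file and notifies
    print(run_play(tasks, handlers))  # second run finds nothing to change
```

Running the same play twice is safe because each step checks the current state before acting, and the handler fires only on the run that actually changed something.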

Deployment and Orchestration

So, that's clearly an ops-side configuration example. Why is this system good for app-side deployment and orchestration? 

Well, playbooks are ordered and push-based, so you can address one group of hosts and then another. If you need to update your database server and then upgrade your app servers, it's no problem to do very explicit ordering where you jump back and forth between addressing different groups of hosts, just like lines on a football team. Just include multiple plays in the same playbook file, all in order.
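The ordering described above can be modeled with a short Python sketch. It assumes, for illustration only, that plays run strictly one after another and that within a play each task is applied to every host in the group before the next task starts; run_playbook and the execute callback are hypothetical names.

```python
def run_playbook(plays, inventory, execute):
    """Run plays in order; within a play, apply each task to all hosts in turn."""
    order = []
    for play in plays:
        hosts = inventory[play["hosts"]]
        for task in play["tasks"]:
            for host in hosts:
                execute(host, task)  # push the step out to one host
                order.append((host, task))
    return order


if __name__ == "__main__":
    inventory = {
        "dbservers": ["alpha.example.com", "beta.example.com"],
        "appservers": ["gamma.example.com", "delta.example.com"],
    }
    plays = [
        {"hosts": "dbservers", "tasks": ["update database"]},
        {"hosts": "appservers", "tasks": ["upgrade app", "restart app"]},
    ]
    for host, task in run_playbook(plays, inventory, lambda host, task: None):
        print(host, "->", task)
```

Because the controller pushes each step, no app server is touched until every database server has finished its play.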

You don't just have to work in terms of packages either. Ansible also ships with an example 'git' resource for checking out dynamic language apps straight from source.

Ad Hoc Tasks

A need that comes up frequently when managing a large number of systems is that of running ad-hoc tasks or making things happen via scripts on several machines at once.

Suppose you need to shut down a service right now (in an emergency) or reboot several nodes. Configuration management and deployment tools are obviously the wrong tools for this because you don't want to describe the desired steady state of the system, or even a process; you just want to run some commands. But you don't want to have to install a special-purpose tool just for this either.

Ansible lets you do these kinds of steps over the same management path that playbooks use, via the /usr/bin/ansible command line. You can also use the exact same resources that you use in playbooks, making things easy to remember.

ansible cluster01 -m service -a "name=memcached state=restarted"
ansible cluster01 -m shell -a "/sbin/reboot"
# etc
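If you want to drive the same ad-hoc commands from a script, one option is to build the argument list programmatically. The Python sketch below only assembles and prints the command; adhoc_argv is a hypothetical helper, and it uses just the pattern, -m, and -a arguments shown above.

```python
import shlex


def adhoc_argv(pattern, module, args):
    """Build the argv for: ansible <pattern> -m <module> -a "<args>"."""
    return ["ansible", pattern, "-m", module, "-a", args]


if __name__ == "__main__":
    argv = adhoc_argv("cluster01", "service", "name=memcached state=restarted")
    # shlex.join quotes the key=value string so it survives as one shell argument.
    print(shlex.join(argv))
    # To actually execute it, the list could be handed to subprocess.run(argv).
```

Keeping the command as a list avoids quoting mistakes when the module arguments contain spaces.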

Learning More

Rather than trying to fully document the application here, if you are interested in learning more, or have ideas about the project, see the Ansible web site, follow the project on github, or join the Google Group. If ansible isn't right for your environment, I at least hope it provides some interesting insight into ways of managing software systems.

Reader Comments (9)

Since reading your post, I've tried Ansible, and here are my thoughts:
1. I'm a Fabric and Puppet user, and I don't think your configuration schema looks simpler than these 2 awesome tools. Python itself is designed for readability, so using Fabric is easy and simple if you know a little about Python.
2. Why do you use json (or simplejson) to parse the result from the remote server? When I tried your tool, I had to install simplejson on the remote server to make it work; none of my servers have simplejson installed. Instead of using json, why don't you just return the output as a simple dictionary and parse the result later on localhost?

April 19, 2012 | Unregistered Commenter hungnv

I really like the ideas of Ansible, but I'm discouraged at how young the project appears. Granted the same thing could be said of Puppet a year ago, but is there a list of people successfully using Ansible in their environments? It's not entirely clear to me if Ansible is ready for prime-time.

April 20, 2012 | Unregistered Commenter wjohnson

hungnv,

Obviously it would be best if Todd answered, but reading through the docs suggests that simplejson is only required for managed hosts with python < 2.6.

"If you have any managed-nodes with python older than 2.6, you will also need:

python-simplejson"

April 20, 2012 | Unregistered Commenter wjohnson

Python is (somewhat) designed for readability when writing computer programs. This type of complexity naturally leads to infrastructure becoming the result of a computer program, which is an auditing challenge. I believe that it is not right to describe systems infrastructure in a programming language. Also, as a software developer, I don't believe automating infrastructure should require the levels of complexity that software development requires. Others are bound to disagree with this, which is part of the reason there are so many choices in this space. Still, for auditing and compliance purposes, Ansible is explicitly designed around not doing this. The core audience includes programmers, but is not limited to those who can program. In this way, by concentrating on the interface, it's not a Python tool or a Ruby tool -- it's a tool.

As for your requirements discussion -- true, if you are not running a current Python (Enterprise Linux 6 or later has 2.6 included stock), you do currently need python-simplejson, which I suspect many people will just want to bake in with their kickstarts or images. The reason we need this is that tools like ohai, as well as our own built-in facts (in Ansible 0.0.3, out Monday), return structured data. If we taught setup to also pickle when there is no simplejson, and did the same for the yum/apt modules, it could auto-bootstrap even in those cases! Excellent topic! How about bringing it up on the mailing list?

April 20, 2012 | Unregistered Commenter Michael DeHaan

wjohnson,

Yes, quite a few people are using Ansible in their environments. The best thing for you to do is stop by the IRC (preferably during the week), and you can talk to everyone. The mailing list is great too.

There's not much for you to worry about WRT project age, so don't let that discourage you. We have almost 150 github followers and with only 1/60th the amount of code total -- that's a crazy amount of peer review relative to the lines of code involved. My advice is to try it out and experience it first hand in your staging environment or in a VM. I'm really happy if you are comparing us to Puppet a year ago, because Puppet's about 5 years old (plus?).

If it helps, I'm also the guy who wrote Cobbler (http://github.com/cobbler), which is in production use practically everywhere, and nearly all of the ideas have already been proven out using other related tools. I also worked for Puppet for a while. I clearly don't think Ansible is going to fit into everyone's brain, but it is for people who want easier-to-use tools. That work gave me a huge chance to talk to several hundred users about how they configure systems, and Ansible is based on observations from all of that.

I can promise playbooks are only going to evolve in forwards-compatible ways -- but if you want to snapshot something and make sure it doesn't change, just fork it on github! Anyway, yeah, try it out on your staging environment, and if you see anything missing you'd like to see, just drop me a line on IRC, github, or the mailing list. Thanks!

April 21, 2012 | Unregistered Commenter Michael DeHaan

wjohnson,

If you read my comment carefully, you will notice that I talked about both json and simplejson. Since there's no json module support on python 2.6, Ansible tries to catch the exception on python 2.4 and use simplejson instead of json. The main idea is that this tool should not require any additional package on the server side if it's using SSH; it should just read stdout and stderr from the server and return a predefined data structure.

April 22, 2012 | Unregistered Commenter hungnv


[...]
My observation, which the article reinforced, was that many shops end up using a configuration management tool (Puppet, Chef, cfengine), a separate deployment tool (Capistrano, Fabric) and yet another separate ad-hoc task execution tool (Func, pssh, etc) because one class of tool historically hasn't been good at all three jobs.
[...]

Michael, as you explain, these tools have been dedicated to particular systems automation goals: they do something, say Config Mgmt, and they do it the best they can. Could you explain why in your opinion this merging makes sense? Is it just Config Mgmt and Command Orchestration (CO) that you think should be merged? What about the functions of something like Jenkins? Shouldn't that also be part of the "all-in-one" tool approach? What about Logging and Monitoring?
From the perspective of someone with less experience than yourself, the all-in-one approach seems counterintuitive as it removes the modularity that _seems_ necessary to complex systems automation/mgmt, so it'd be great to read your reasoning.

April 25, 2012 | Unregistered Commenter Jaime Gago

Jaime,

It would be best if we merged the Linux Kernel with Tux Racer as well.

Ok, kidding.

The truth is that these tasks are not unrelated at all, but the design of the various applications to date has made them difficult to combine.

Ansible is still a very simple tool, much smaller in source code base than any of the configuration management tools out there, so it's by no means getting close to the terribleness of a software monolith. If you're interested, I would encourage you to check out the project and experiment.

I strongly believe that people using three separate languages to accomplish tasks with so much overlap is a poor way to go. Having to contend with the Puppet DSL, the Fabric language, and Func at the same time is pretty rough -- that's probably three different ways you have to deal with restarting services ... I'd really like one resource model to use between all of these things, and that's what Ansible provides. There's so much overlap I'm quite surprised it hasn't been done already.

April 28, 2012 | Unregistered Commenter Michael DeHaan

Interesting... my friend just pointed me to Ansible and I landed here on a web search for it. I gotta say that I do like it quite a bit. I definitely like not having to run software on the machines and using ssh. Honestly, I'm glad it's an all-in-one solution. It helps me get things done faster. I'm currently starting something new and I'm trying to figure out which route to go: 1) Chef 2) Puppet 3) Pallet and now 4) Ansible. I'm leaning toward Pallet and Ansible.

April 8, 2013 | Unregistered Commenter Adrian Rodriguez
