Wednesday, April 2, 2008

Product: Supervisor - Monitor and Control Your Processes

It's a sad fact of life, but processes die. I know, it's horrible. You start them, send them out into process space, and hope for the best. Yet sometimes, despite your best coding, they core dump, seg fault, or some other calamity befalls them. Unlike our messy biological world so cruelly ruled by entropy, in the digital world processes can be given another chance. They can be restarted. A greater destiny awaits. And hopefully this time the random lottery of unforeseen killing factors will be avoided and a long productive life will be had by all.

This is fun code to write because it's a lot more complicated than you might think. And restarting processes is a highly effective high availability strategy. Most faults are transient, caused by an unexpected series of events. Rather than taking drastic action, like taking a node out of production or failing over, transients can be effectively masked by simply restarting failed processes. Though complexity makes it a fun problem, it's also why you may want to "buy" rather than build. If you are in the market, Supervisor looks worth a visit.
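To make the restart idea concrete, here is a minimal sketch of a restart loop in Python. It is not how Supervisor is implemented; the function name `run_with_restarts`, the restart limit, and the exponential backoff are all assumptions chosen for illustration. It treats any non-zero exit status as a crash and stops retrying once the child exits cleanly.

```python
import subprocess
import sys
import time


def run_with_restarts(argv, max_restarts=3, backoff_seconds=0.1):
    """Run argv and restart it after each failure, up to max_restarts times.

    Returns the list of exit codes observed, one per attempt.
    Hypothetical helper for illustration; not part of Supervisor.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # subprocess.call blocks until the child exits and returns its status.
        code = subprocess.call(argv)
        exit_codes.append(code)
        # A zero exit is treated as a clean shutdown, so no restart is needed.
        if code == 0 or attempt == max_restarts:
            break
        # Back off a little longer after each failure to avoid a tight crash loop.
        time.sleep(backoff_seconds * (2 ** attempt))
    return exit_codes


# Example child that always fails, built from the current interpreter.
ALWAYS_FAILS = [sys.executable, "-c", "import sys; sys.exit(1)"]
```

Even this toy version hints at where the complexity comes from: deciding what counts as a crash, how often to retry, and when to give up are all policy questions that a real process manager has to make configurable.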

Adapted from their website:
Supervisor is a Python program that allows you to start, stop, and restart other programs on UNIX systems. It can restart crashed processes.

  • It is often inconvenient to need to write "rc.d" scripts for every single process instance. rc.d scripts are a great lowest-common-denominator form of process initialization/autostart/management, but they can be painful to write and maintain. Additionally, rc.d scripts cannot automatically restart a crashed process, and many programs do not restart themselves properly on a crash. Supervisord starts processes as its subprocesses, and can be configured to automatically restart them on a crash. It can also be configured to start processes automatically on its own invocation.
  • It's often difficult to get accurate up/down status on processes on UNIX. Pidfiles often lie. Supervisord starts processes as subprocesses, so it always knows the true up/down status of its children and can be queried conveniently for this data.
  • Users who need to control process state often need only to do that. They don't want or need full-blown shell access to the machine on which the processes are running. Supervisorctl allows a very limited form of access to the machine, essentially allowing users to see process status and control supervisord-controlled subprocesses by emitting "stop", "start", and "restart" commands from a simple shell or web UI.
  • Users often need to control processes on many machines. Supervisor provides a simple, secure, and uniform mechanism for interactively and automatically controlling processes on groups of machines.
  • Processes which listen on "low" TCP ports often need to be started and restarted as the root user (a UNIX misfeature). It's usually the case that it's perfectly fine to allow "normal" people to stop or restart such a process, but providing them with shell access is often impractical, and providing them with root access or sudo access is often impossible. It's also (rightly) difficult to explain to them why this problem exists. If supervisord is started as root, it is possible to allow "normal" users to control such processes without needing to explain the intricacies of the problem to them.
  • Processes often need to be started and stopped in groups, sometimes even in a "priority order". It's often difficult to explain to people how to do this. Supervisor allows you to assign priorities to processes, and allows users to emit commands via the supervisorctl client like "start all" and "restart all", which start them in the preassigned priority order. Additionally, processes can be grouped into "process groups" and a set of logically related processes can be stopped and started as a unit.
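As a rough illustration of the priority ordering described above, the sketch below parses a Supervisor-style configuration and derives a start order from it. The `[program:x]` section naming and the `command`, `priority`, and `autorestart` keys follow Supervisor's configuration conventions; the program names, the paths, the `start_order` helper, and the tie-breaking by name are assumptions made up for this example. The fallback value of 999 reflects what I understand Supervisor's default priority to be, with lower numbers starting first.

```python
import configparser

# Hypothetical programs and paths, written in Supervisor's INI-style format.
SAMPLE_CONFIG = """
[program:database]
command=/usr/local/bin/db-server
priority=100
autorestart=true

[program:webapp]
command=/usr/local/bin/webapp
priority=200
autorestart=true

[program:worker]
command=/usr/local/bin/worker
autorestart=true
"""

# Assumed default for programs that do not set a priority.
DEFAULT_PRIORITY = 999


def start_order(config_text):
    """Return program names sorted by ascending priority (lowest starts first)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    programs = []
    for section in parser.sections():
        if section.startswith("program:"):
            name = section.split(":", 1)[1]
            priority = parser.getint(section, "priority", fallback=DEFAULT_PRIORITY)
            programs.append((priority, name))
    # Sorting the (priority, name) tuples breaks ties alphabetically by name.
    return [name for _, name in sorted(programs)]
```

With the sample above, `start_order(SAMPLE_CONFIG)` puts `database` ahead of `webapp`, and `worker` last because it falls back to the default priority.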

    Supervisor also has a web interface and an XML-RPC interface:
  • A (sparse) web user interface with functionality comparable to supervisorctl may be accessed via a browser if you start supervisord against an internet socket. Visit the server URL (e.g. http://localhost:9001/) to view and control process status through the web interface after activating the configuration file's [inet_http_server] section.
  • XML-RPC interface: the same HTTP server which serves the web UI serves up an XML-RPC interface that can be used to interrogate and control supervisor and the programs it runs. To use the XML-RPC interface, connect to supervisor's HTTP port with any XML-RPC client library and run commands against it. Python's xmlrpclib client library is one option.
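A minimal sketch of such a client follows. It uses `xmlrpc.client`, the Python 3 name for the xmlrpclib module, and relies on the `supervisor.getAllProcessInfo` method from Supervisor's XML-RPC API, which returns one struct per process with fields such as `name` and `statename`. The `summarize` helper is a hypothetical name for this example, and the URL assumes supervisord is listening on localhost port 9001 with its XML-RPC endpoint at `/RPC2`.

```python
from xmlrpc.client import ServerProxy

# Assumed address: the [inet_http_server] port from the example URL above,
# plus the /RPC2 path where supervisord exposes XML-RPC.
SUPERVISOR_URL = "http://localhost:9001/RPC2"


def summarize(proxy):
    """Map each managed process name to its state name (e.g. RUNNING, STOPPED).

    `proxy` is anything exposing supervisor.getAllProcessInfo(), normally a
    ServerProxy pointed at a running supervisord.
    """
    return {
        info["name"]: info["statename"]
        for info in proxy.supervisor.getAllProcessInfo()
    }


def connect(url=SUPERVISOR_URL):
    """Build a client; no network traffic happens until a method is called."""
    return ServerProxy(url)
```

Against a running supervisord, `summarize(connect())` would return a dictionary of process names and states. The same proxy also exposes control methods such as `supervisor.startProcess(name)` and `supervisor.stopProcess(name)`.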


    Related Articles



  • PyCon Presentation: Supervisor as a Platform
  • Monitor Pylons application with supervisord
  • Supervisor Manual
    Reader Comments (5)

    How is this different from daemontools?

    November 29, 1990 | Unregistered Commenter allspaw

    A brief blurb from http://supervisord.org/2006/08/08/supervisor-vs-launchd/:
    "daemontools has too much focus on security as opposed to being a process manager for my taste." Hopefully someone will know more.

    November 29, 1990 | Unregistered Commenter Todd Hoff

    The means of process management is exactly like daemontools. It just wraps it up a bit differently. It gives you a more interactive way of managing the processes (supervisorctl), and provides a couple different kinds of APIs to manage the processes. It also handles multiple processes at once (which I guess launchd also does, but daemontools doesn't). It has some other features as well. You can also do things like give non-root users access to restart processes, and there's a web interface to see the status of processes, and supervisor can manage and rotate output, and other stuff.

    If you like the model of daemontools, but don't like the interface so much, then supervisor would be good to look at. If you like daemontools, but want to extend the process management programmatically, then supervisor might also be good to look at.

    November 29, 1990 | Unregistered Commenter Ian Bicking

    I've used monit to handle similar situations before. It's okay as long as you keep it under control. But it's very persistent and becomes a nuisance if you're not careful. It's a bit different, with a few other kinds of features of varying usefulness.

    Monit
    http://www.tildeslash.com/monit/

    November 29, 1990 | Unregistered Commenter Kent

    Eh, I just discussed this in some detail on our blog and added this service for our managed customers.

    We do this at the application/port level, not just to see if the process died. There are many other situations where the process is running but no one is home (so to speak). Testing at the application level whether the service produces some predefined result(s) is the best way of doing this. Keep in mind daemontools or supervisor only check whether the process exists. That is a much more limited test and not conclusive.

    http://www.supportem.com/blog/article/163

    IMHO you are best off using nagios and creating an event handler to restart the service. It covers many more situations.

    --
    Larry Ludwig
    HostCube
    http://www.hostcube.com/

    November 29, 1990 | Unregistered Commenter empowering
