A services watchdog written in Perl

This Perl script was written some 3 or 4 years ago, but it still does its daily job: it checks that certain processes are running and, if one is not, restarts it and prints a notification.

The script should run as the root user (since it will be restarting services or processes), and can easily be configured as a cron job.

Let’s say you’ve saved the script as /root/bin/services_watchdog.pl; the cron job could then look like:

# Check if main services are still running
*/5 * * * * /root/bin/services_watchdog.pl

This will check every 5 minutes if all required processes are running.

The script itself will start with:

#!/usr/bin/perl
use strict; use warnings;
use Proc::ProcessTable;

The script uses Proc::ProcessTable to retrieve a full process list.
The processes it should monitor are easily defined in a hash:

my %services_table = (
        'MySQL' => {
                'cmd' => '/etc/init.d/mysql restart',
                're' => '/usr/sbin/mysqld --basedir=/usr',
        },
        'Postfix' => {
                'cmd' => '/etc/init.d/postfix restart',
                're' => '/usr/lib/postfix/master',
        },
# ...
);

The actual processing of the process list can be done in a minimum of lines:

my %nok_services = %services_table ;
my $services = join '|', map {$services_table{$_}->{'re'}} keys %services_table;

my $process_table = Proc::ProcessTable->new;

foreach my $process ( @{ $process_table->table } ) {
        if($process->cmndline =~ m/($services)/) {
                # Keep a copy of the matched text: $1 is not reliable inside the grep block
                my $matched = $1;
                # Find which service the matched text belongs to (\Q...\E matches it literally)
                my $proc_name = (grep { $matched =~ m/\Q$services_table{$_}->{'re'}\E/} keys %services_table)[0];
                delete $nok_services{$proc_name}                                  if exists($nok_services{$proc_name});
        }
}
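The matching step can be tried out without root access or a real process table. The following is a minimal sketch, not the original script: missing_services() is a hypothetical helper, the command lines are made-up sample data, and the 're' values are treated as literal substrings rather than as regular expressions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): given a services
# table and a list of command lines, return the names of the services for
# which no command line matches.
sub missing_services {
    my ($table, $cmdlines) = @_;
    my %missing = map { $_ => 1 } keys %$table;

    for my $cmdline (@$cmdlines) {
        for my $name (keys %missing) {
            # quotemeta escapes '.', '/' and friends, so the 're' value
            # is matched as a literal substring (an assumption of this sketch).
            my $pattern = quotemeta $table->{$name}{'re'};
            delete $missing{$name} if $cmdline =~ m/$pattern/;
        }
        last unless %missing;    # every service was found, stop early
    }
    return sort keys %missing;
}

my %services_table = (
    'MySQL'   => { 're' => '/usr/sbin/mysqld --basedir=/usr' },
    'Postfix' => { 're' => '/usr/lib/postfix/master' },
);

# Made-up command lines standing in for Proc::ProcessTable output
my @fake_cmdlines = (
    '/sbin/init',
    '/usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql',
);

my @missing = missing_services(\%services_table, \@fake_cmdlines);
print "Missing: @missing\n";
```

With this sample data only Postfix is reported as missing, since no command line contains its path.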

Finally, the script tries to restart the missing service and prints a log message:

foreach my $process (keys %nok_services) {
        print "$process NOT running! Restarting...\n";
        system($nok_services{$process}->{'cmd'});
        if ($? == -1) {
                print "failed to execute: $!\n";
        }
        elsif ($? & 127) {
                printf "child died with signal %d, %s coredump\n", ($? & 127),  ($? & 128) ? 'with':'without';
        }
        else {
                printf "child exited with value %d\n", $? >> 8;
        }
}
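The checks on $? follow the usual Perl convention: -1 means the command could not be started, the low 7 bits hold the signal number, bit 128 flags a core dump, and the high byte holds the exit value. As a small sketch, the same decoding can be put into a function and tried on hand-made status values; describe_status() is a hypothetical name, not something from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status (the value of $? after system())
# into a log message. $errno_text is only used for the -1 case.
sub describe_status {
    my ($status, $errno_text) = @_;
    if ($status == -1) {
        return "failed to execute: " . ($errno_text // 'unknown error');
    }
    if ($status & 127) {
        return sprintf "child died with signal %d, %s coredump",
            ($status & 127), ($status & 128) ? 'with' : 'without';
    }
    return sprintf "child exited with value %d", $status >> 8;
}

print describe_status(0), "\n";        # normal exit with value 0
print describe_status(3 << 8), "\n";   # exit value 3
print describe_status(9), "\n";        # killed by signal 9, no core dump
```

In the watchdog loop this would be called as describe_status($?, "$!") right after system().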

And that’s all folks!
