<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Perl &#8211; Johnny Morano&#039;s Tech Articles</title>
	<atom:link href="https://jmorano.moretrix.com/category/development/perl/feed/" rel="self" type="application/rss+xml" />
	<link>https://jmorano.moretrix.com</link>
	<description>Ramblings of an old-fashioned space cowboy</description>
	<lastBuildDate>Mon, 25 Apr 2022 10:52:46 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.6.2</generator>

<image>
	<url>https://jmorano.moretrix.com/wp-content/uploads/2022/04/cropped-jmorano_emblem-32x32.png</url>
	<title>Perl &#8211; Johnny Morano&#039;s Tech Articles</title>
	<link>https://jmorano.moretrix.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Read the HAProxy UNIX socket file using Perl</title>
		<link>https://jmorano.moretrix.com/2022/04/read-the-haproxy-unix-socket-file-using-perl/</link>
					<comments>https://jmorano.moretrix.com/2022/04/read-the-haproxy-unix-socket-file-using-perl/#respond</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Mon, 25 Apr 2022 10:52:45 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[DevOps]]></category>
		<category><![CDATA[HAProxy]]></category>
		<category><![CDATA[Monitoring]]></category>
		<guid isPermaLink="false">https://jmorano.moretrix.com/?p=1515</guid>

					<description><![CDATA[HAProxy provides a socket file which can be used to do maintenance (enable/disable backends, retrieve information and&#8230;]]></description>
										<content:encoded><![CDATA[
<p><a rel="noreferrer noopener" href="http://www.haproxy.org/" data-type="URL" data-id="http://www.haproxy.org/" target="_blank">HAProxy</a> provides a <a href="http://docs.haproxy.org/2.5/management.html#9.3" data-type="URL" data-id="http://docs.haproxy.org/2.5/management.html#9.3" target="_blank" rel="noreferrer noopener">socket file</a> which can be used for maintenance (enable/disable backends, retrieve information and statistics, &#8230;).</p>



<p>The statistics part contains quite a bit of information that is useful for monitoring and alerting.</p>



<p>The Perl snippet below loops over a <code>glob</code> of socket files (useful, for instance, when you run multiple HAProxy configurations as separate processes) and prints the values returned by the &#8220;<code>show info</code>&#8221; command.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use strict;
use warnings;
use IO::Socket::UNIX;

foreach my $socket_file (glob("/run/haproxy/*.sock")){
    print "- Reading socket: $socket_file\n";
    my $client = IO::Socket::UNIX->new(
        Type => SOCK_STREAM(),
        Peer => $socket_file,
    ) or die "Cannot connect to $socket_file: $!\n";

    print "- show info\n";
    print $client "show info\n";
    my $header = &lt;$client>;
    chomp($header);

    $header =~ s/^#\s+//;
    my @keys = split ',', $header;
    print "- header:$header\n";

    while (my $line = &lt;$client>){
        next unless $line =~ /^.+/;

        chomp($line);
        my @values = split ',', $line;
        print " - Got $line\n";
        print "   $keys[$_]: ".($values[$_]//'')."\n" foreach 0..$#keys;
    }

    close $client;
}</pre>
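

<p>Instead of printing every field, the same parsing can fold the response into a hash that a monitoring check can query. A minimal sketch, fed here with a canned response in the comma-separated shape the loop above expects (the field values are made up; real code would read the lines from the socket as shown above):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use strict;
use warnings;

# canned response, for illustration only
my @lines = ("# Name,Version,Uptime_sec\n", "haproxy,2.5.5,86400\n");

my $header = shift @lines;
chomp($header);
$header =~ s/^#\s+//;
my @keys = split ',', $header;

my %info;
foreach my $line (@lines) {
    chomp($line);
    my @values = split ',', $line;
    @info{@keys} = @values;
}

print "$info{Name} up for $info{Uptime_sec}s\n";   # haproxy up for 86400s</pre>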
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2022/04/read-the-haproxy-unix-socket-file-using-perl/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Managing LDAP passwords with Perl</title>
		<link>https://jmorano.moretrix.com/2022/04/managing-ldap-passwords-with-perl/</link>
					<comments>https://jmorano.moretrix.com/2022/04/managing-ldap-passwords-with-perl/#respond</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Mon, 25 Apr 2022 09:30:40 +0000</pubDate>
				<category><![CDATA[Automation]]></category>
		<category><![CDATA[Blog]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[DevOps]]></category>
		<category><![CDATA[OpenLDAP]]></category>
		<category><![CDATA[SysAdmin]]></category>
		<guid isPermaLink="false">https://jmorano.moretrix.com/?p=1511</guid>

					<description><![CDATA[OpenLDAP Software is an open source implementation of the Lightweight Directory Access Protocol. Many graphical interfaces are available&#8230;]]></description>
										<content:encoded><![CDATA[
<p><a href="https://openldap.org/" data-type="URL" data-id="https://openldap.org/" target="_blank" rel="noreferrer noopener">OpenLDAP</a> Software is an open source implementation of the Lightweight Directory Access Protocol.</p>



<p>Many graphical interfaces are available for managing user accounts in OpenLDAP like PHPLDAPAdmin (<a rel="noreferrer noopener" href="http://phpldapadmin.sourceforge.net/wiki/index.php/Main_Page" target="_blank">http://phpldapadmin.sourceforge.net/wiki/index.php/Main_Page</a>) or LAM (<a rel="noreferrer noopener" href="https://www.ldap-account-manager.org/lamcms/" target="_blank">https://www.ldap-account-manager.org/lamcms/</a>).</p>



<p>When generating accounts in bulk, or simply managing user details, automating the changes with a simple script is much more flexible and can even be quicker.</p>



<p>LDAP passwords can be stored or changed by using an LDIF file. The LDIF file needs three lines:</p>



<ol class="wp-block-list"><li>The &#8220;<code>dn</code>&#8221; you are about to change</li><li>The &#8220;<code>changetype</code>&#8221;, set to &#8220;<code>modify</code>&#8221;</li><li>A &#8220;<code>replace</code>&#8221; line naming the field you want to change (in our case the password, so &#8220;<code>userPassword</code>&#8221;)</li></ol>



<p>Your LDAP password can be stored either in clear text (which is not advisable) or as a salted <code>SHA</code> hash. The hash must have the salt appended to the digest, and the whole value must be <code>base64</code> encoded.</p>



<p>The code snippet below calls a subroutine named <code>generate_password()</code>, which comes from a previous article (<a href="https://jmorano.moretrix.com/2013/08/secure-password-generator-perl/" data-type="post" data-id="953">Secure Password Generator in Perl</a>).</p>



<p>At the end, the script prints the LDIF content, which needs to be saved to <code>change.ldif</code>. Finally, it prints the <code>ldapmodify</code> command that makes the actual change. You will need to know the <code>admin</code> password for this; alternatively, you could make the change using your own <code>dn</code> for authentication.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use Digest::SHA;
use MIME::Base64;

my $random_password = generate_password(24);
my $random_salt     = generate_password(3);

my $ctx = Digest::SHA->new;   # defaults to SHA-1, which the {SSHA} scheme expects
$ctx->add($random_password);
$ctx->add($random_salt);
my $hashedPasswd = encode_base64($ctx->digest . $random_salt, '');

print "password: $random_password\n";
print "salt: $random_salt\n";
print &lt;&lt;EOF;
# LDIF
dn: uid=user1,ou=users,dc=shihai-corp,dc=at
changetype: modify
replace: userPassword
userPassword: {SSHA}$hashedPasswd
EOF

print "\n";
print q{LDAP cmd: ldapmodify -H "ldap://ldap_server01" -Z -x -W -D "cn=ldapadmin,ou=admins,dc=shihai-corp,dc=at" -f change.ldif} . "\n\n"</pre>
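

<p>To later verify a password against such an <code>{SSHA}</code> value, the process is reversed: <code>base64</code>-decode the stored string, split off the 20-byte SHA-1 digest from the trailing salt, and re-hash the candidate password with that salt. A minimal sketch (the example password and salt are made up):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use strict;
use warnings;
use Digest::SHA;
use MIME::Base64;

sub check_ssha {
    my ($password, $stored) = @_;
    $stored =~ s/^\{SSHA\}//;
    my $decoded = decode_base64($stored);
    my $digest  = substr($decoded, 0, 20);   # SHA-1 digests are 20 bytes
    my $salt    = substr($decoded, 20);
    return Digest::SHA->new(1)->add($password)->add($salt)->digest eq $digest;
}

# build a value the same way the snippet above does
my $hash = '{SSHA}' . encode_base64(Digest::SHA->new(1)->add('secret')->add('xyz')->digest . 'xyz', '');

print check_ssha('secret', $hash) ? "match\n" : "no match\n";   # match</pre>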
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2022/04/managing-ldap-passwords-with-perl/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Perl script to monitor the rate of logs</title>
		<link>https://jmorano.moretrix.com/2022/04/perl-script-to-monitor-the-rate-of-logs/</link>
					<comments>https://jmorano.moretrix.com/2022/04/perl-script-to-monitor-the-rate-of-logs/#respond</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Thu, 07 Apr 2022 12:39:50 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[DevOps]]></category>
		<category><![CDATA[IPTables]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Logging]]></category>
		<guid isPermaLink="false">https://jmorano.moretrix.com/?p=1399</guid>

					<description><![CDATA[In a previous article (IPTables Logging in JSON with NFLOG and ulogd2) we learned how to log certain&#8230;]]></description>
										<content:encoded><![CDATA[
<p>In a previous article (<a href="https://jmorano.moretrix.com/2022/03/logging-in-iptables-with-nflog-and-ulogd2/" data-type="post" data-id="1308">IPTables Logging in JSON with NFLOG and ulogd2</a>) we learned how to log certain IPTables rules to JSON log files.</p>



<p>Monitoring the logs in real time on the command line can be very useful when debugging the rules themselves or when analyzing issues. Rather than just watching the logs, in some situations it helps to track the rate of the log messages. A self-written Perl script is handy here because it offers flexibility in:</p>



<ul class="wp-block-list"><li>parsing logs</li><li>formatting the output (with colors or tables or &#8230;)</li><li>calculating statistics</li><li>&#8230;</li></ul>



<p>The following Perl script uses a few modules which need to be present:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use IO::Async::Timer::Periodic;
use IO::Async::Loop;
use Time::HiRes qw/time/;
use Term::ANSIColor qw(:constants);
use Getopt::Long;</pre>



<p>The first two modules can be installed on Debian systems with:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">apt install libio-async-perl</pre>



<p>The others are part of the standard Perl distribution and do not require any extra installation.</p>



<p>Next, the script sets up a polling mechanism that reads from standard input at fixed intervals and calculates the rate of each unique log line. The default polling interval is 2 seconds, but it can be changed through a command line parameter:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">my $last_poll_time = time;

my $poll_rate = 2;
GetOptions (
    'p|pollrate=i' => \$poll_rate,
);

my $loop = IO::Async::Loop->new;
my $timer = IO::Async::Timer::Periodic->new(
   interval => $poll_rate,
   on_tick  => \&amp;log_rate
);

$timer->start;
$loop->add( $timer );
$loop->run;</pre>



<p>Finally, the script defines a subroutine called <code>log_rate</code>, which reads from standard input (or from a file given as an argument) at each poll interval. It is important that the log lines do not contain unique data such as timestamps; the input must be as generic as possible.</p>



<p>Example:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">tail -qf /var/log/ulog/blocked_detailed.json /var/log/ulog/blocked.json /var/log/ulog/passed.json  | jq -r --unbuffered '."oob.prefix"' 
blocked: invalid state
blocked: invalid state
blocked: invalid state
blocked: invalid state
blocked: invalid state
action=blocked
action=blocked
action=blocked
action=blocked
action=blocked
action=passed
action=passed
action=passed
action=passed</pre>



<p>The code snippet for <code>log_rate</code> could contain:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub log_rate {
    local $SIG{ALRM} = sub { die time, " time exceeded to read STDIN\n" };

    alarm($poll_rate);
    my $h;
    eval {
        local $| = 1;
        while (my $line = &lt;>) {
            chomp($line);
            $h->{$line}++;
        }
    };
    alarm(0);

    return unless keys %$h;

    my $delta_time = time - $last_poll_time;
    print DARK WHITE . sprintf("%d: ", time) . RESET;
    print( BOLD WHITE . $_ ." [" . GREEN . sprintf("%.2f/s", $h->{$_}/$delta_time) . BOLD WHITE "] | " . RESET) foreach keys %$h; 
    print "\n";

    $last_poll_time = time;
}</pre>



<p><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">Line 2</mark> starts by installing a handler for the &#8220;<code>ALRM</code>&#8221; signal. The handler is called when the <code>alarm</code> timeout has been reached (see further below).</p>



<p><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">Line 4</mark> sets the <code>alarm</code> timeout in seconds: if everything below <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">line 4</mark> (up to the next <code>alarm</code> call) takes longer than that timeout, the &#8220;ALRM&#8221; handler defined at <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">line 2</mark> is called, which stops execution with a <code>die</code> (which, outside an <code>eval</code>, would end the script with exit status 1).</p>



<p><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">Line 5</mark> declares the hash reference used further below to temporarily store the unique log lines.</p>



<p><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">Lines 6</mark> through <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">12</mark> form an <code>eval</code> block. The <code>eval</code> catches the <code>die</code> from the ALRM handler (once the timeout is reached) without terminating the script. Inside the block, standard input is read with the diamond operator (<code>&lt;></code>) and unique lines are counted in the <code>$h</code> hash reference.</p>



<p><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">Line 13</mark>, right after the <code>eval</code> block, sets the <code>alarm</code> timeout back to 0, which disables it. That way, only the execution of the <code>eval</code> block is subject to the timeout.</p>



<p><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">Line 15</mark> ensures that rates are only printed when log lines were actually collected in the temporary hash ref <code>$h</code>.</p>



<p>The rest of the code will take care of printing the discovered log lines with their rates to the screen. Colors from <code>Term::ANSIColor</code> are used to make the output more vivid.</p>
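

<p>The <code>alarm</code>/<code>eval</code> timeout pattern described above can be tried in isolation. A minimal sketch, independent of the log script, where a blocking <code>sleep</code> stands in for reading standard input:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use strict;
use warnings;
use Time::HiRes qw/time/;

my $timeout = 1;
my $start   = time;

local $SIG{ALRM} = sub { die "timeout\n" };

alarm($timeout);
eval {
    sleep 10;   # stands in for a blocking read
};
alarm(0);

printf "interrupted after %.1fs: %s", time - $start, $@;</pre>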



<p>Example output:</p>



<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="911" height="285" src="https://jmorano.moretrix.com/wp-content/uploads/2022/04/Screenshot-from-2022-04-06-14-14-00.png" alt="" class="wp-image-1405" srcset="https://jmorano.moretrix.com/wp-content/uploads/2022/04/Screenshot-from-2022-04-06-14-14-00.png 911w, https://jmorano.moretrix.com/wp-content/uploads/2022/04/Screenshot-from-2022-04-06-14-14-00-300x94.png 300w, https://jmorano.moretrix.com/wp-content/uploads/2022/04/Screenshot-from-2022-04-06-14-14-00-768x240.png 768w" sizes="(max-width: 911px) 100vw, 911px" /></figure>



<p>The full version of the script can be found at: <a href="https://github.com/insani4c/perl_tools/tree/master/log_rate" target="_blank" rel="noreferrer noopener">https://github.com/insani4c/perl_tools/tree/master/log_rate</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2022/04/perl-script-to-monitor-the-rate-of-logs/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Perl: Archive E-Mails in an IMAP Folder</title>
		<link>https://jmorano.moretrix.com/2015/11/perl-archive-e-mails-in-an-imap-folder/</link>
					<comments>https://jmorano.moretrix.com/2015/11/perl-archive-e-mails-in-an-imap-folder/#comments</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Thu, 05 Nov 2015 09:33:47 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[IMAP]]></category>
		<category><![CDATA[SysAdmin]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1199</guid>

					<description><![CDATA[IMAP folders are really cool because you can have your e-mails synchronized on multiple devices, without losing e-mails across&#8230;]]></description>
										<content:encoded><![CDATA[
<p>IMAP folders are really cool because your e-mails stay synchronized on multiple devices, without losing e-mails across devices when retrieving new mail. IMAP folders also aren&#8217;t that cool, because e-mails are usually never deleted or even archived, and having millions of e-mails can make some e-mail readers on certain devices really slow.</p>



<p>The script below is an example of how to clean and archive e-mails. It was written in Perl and tested against a Courier IMAP server. Remember the Perl motto: there&#8217;s more than one way to do it.</p>



<p>The following modules were used:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use Net::IMAP::Simple::SSL;
use Email::Simple;
use Getopt::Long qw/:config bundling/;
use DateTime;
use YAML qw/LoadFile/;
use Log::Log4perl;
use Pod::Usage;
use Data::Dumper;</pre>



<p>The configuration file is written in YAML and the logging of the script is handled by Log4Perl.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">log4perl.logger.shihai= DEBUG, shihaiLogfile
log4perl.logger.hostlistdb= DEBUG, shihaiLogfile

log4perl.appender.shihaiLogfile          = Log::Log4perl::Appender::File
log4perl.appender.shihaiLogfile.filename = /var/tmp/shihai.log
log4perl.appender.shihaiLogfile.layout   = Log::Log4perl::Layout::PatternLayout
log4perl.appender.shihaiLogfile.layout.ConversionPattern = %d %p [%x][%r millis][%c][%F{1}:%L][%M] %m%n</pre>



<p>The configuration file contains the login credentials and the threshold values:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">imap:
    host: 'mail.example.com'
    user: 'bob@example.com'
    pass: 'myultrasecretpassword'

threshold:
    archive:
        years: 3
    delete:
        years: 8
</pre>



<p>The threshold values are in fact the same parameters that the DateTime method &#8216;subtract()&#8217; accepts.</p>
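

<p>The <code>$archive_date</code> and <code>$delete_date</code> used in the main loop are not shown in this article; they would be derived from the thresholds roughly like this (the inlined configuration stands in for the hash that YAML&#8217;s <code>LoadFile()</code> returns):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use strict;
use warnings;
use DateTime;

# stands in for the configuration loaded from YAML
my $config = {
    threshold => {
        archive => { years => 3 },
        delete  => { years => 8 },
    },
};

my $now          = DateTime->now;
my $archive_date = $now->clone->subtract( %{ $config->{threshold}{archive} } );
my $delete_date  = $now->clone->subtract( %{ $config->{threshold}{delete} } );

print "archive mails older than: ", $archive_date->ymd, "\n";
print "delete mails older than:  ", $delete_date->ymd, "\n";</pre>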



<p>The main loop of the program will:</p>



<ul class="wp-block-list"><li>Connect to the IMAP server</li><li>Retrieve all IMAP folders and loop through them</li><li>Loop through all messages in each mailbox</li></ul>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Connect to the IMAP server
my $imap = connect_imap( $config->{imap} );

# get all mailboxes and loop through them
my @mailboxes = get_mailboxes($imap);
foreach my $mailbox_name (@mailboxes){

    # Skip all Archive boxes
    next if $mailbox_name =~ /Archive/;

    # select the mailbox and get the number of messages
    my $mb = $imap->select($mailbox_name);
    unless(defined $mb){
        $logger->error("Mailbox [$mailbox_name] doesn't exist: ", $imap->errstr());
        next;
    }

    $logger->info("Scanning $mailbox_name");

    # loop through the messages
    foreach my $i (1 .. $mb){
        my ($from, $subject, $date, $year) = get_mail_header($imap, $i);

        if(defined $date){
            if($date &lt; $delete_date ){
                delete_mail($imap, $i);
            }
            elsif($date &lt; $archive_date){
                $logger->info("Archiving [$i][$from][$subject][$date]");
                my $archive_box = get_archive_box($imap, $mailbox_name);
                move_mail($imap, $i, $archive_box)
            }
        }
    }

    $imap->expunge_mailbox($mailbox_name);
}

$imap->quit;
</pre>



<p>Let&#8217;s go through the different subroutines called in the main loop.</p>



<p><strong>connect_imap()</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub connect_imap {
    my ($cfg) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    my $imap = Net::IMAP::Simple::SSL->new($cfg->{host})
        or $logger->logdie("Unable to connect to IMAP server: $Net::IMAP::Simple::errstr");
    $logger->info("Connected to IMAP host $cfg->{host}");

    unless( $imap->login($cfg->{user}, $cfg->{pass}) ){
        $logger->logdie("Login failed: ", $imap->errstr);
    }
    $logger->info("Logged in to IMAP host $cfg->{host} as user '$cfg->{user}'");

    return $imap;
}
</pre>



<p>The parameters expected in the &#8216;$cfg&#8217; hash are:</p>



<ul class="wp-block-list"><li>host</li><li>user</li><li>pass</li></ul>



<p><strong>get_mailboxes()</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub get_mailboxes {
    my ($imap) = @_;
    my @mailboxes = $imap->mailboxes;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    return @mailboxes;
}
</pre>



<p>Ok, I admit, I shouldn&#8217;t have created an extra subroutine for it&#8230; but I was kind of in a flow!</p>



<p><strong>get_mail_header()</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub get_mail_header {
    my ($imap, $i) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    my $header = $imap->top($i);
    unless( $header ){
        $logger->error("No header found for message $i in ", $imap->current_box);
        return;
    }

    my $email = Email::Simple->new(join '', @{ $header });
    unless( $email ){
        $logger->error("No Email::Simple object, skipping...");
        return;
    }

    my ($subject) = $email->header('Subject');
    my ($date)    = $email->header('Date');
    my ($from)    = $email->header('From');

    # $logger->debug("Got e-mail [$from] [$subject] [$date]");

    unless(defined $date){
        $logger->error("No date found: ", $email->header_obj->as_string);
        delete_mail($imap, $i);
        return;
    }

    my($junk, $day, $month, $year) = ( $date =~ m/(...,\s+)?([0-9]{1,2})\s+(...)\s+(\d{4})/ );

    my $date_obj;
    if(defined $year &amp;&amp; defined $month &amp;&amp; defined $day){
        $date_obj = DateTime->new(
            year  => $year,
            month => $months{$month},
            day   => $day,
        );
    }

    return ($from, $subject, $date_obj, $year, $month, $day);
}
</pre>



<p>This subroutine takes the &#8216;$imap&#8217; object and the message number as input parameters. It will then retrieve the mail header and convert it to an &#8216;Email::Simple&#8217; object. I&#8217;ve chosen this module so I can easily extract header fields.<br />If no &#8216;Date:&#8217; field was found in the e-mail, then the tool will just delete the email. I don&#8217;t like e-mails with wrong or missing headers <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> (they&#8217;re usually spam anyway).</p>
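

<p>The <code>%months</code> hash used in the <code>DateTime</code> call above is not defined anywhere in this article; it simply maps abbreviated month names to month numbers. One possible definition:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use strict;
use warnings;

my %months;
@months{qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/} = (1 .. 12);

print "$months{Nov}\n";   # 11</pre>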



<p><strong>delete_mail()</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub delete_mail {
    my($imap, $i) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    if( $imap->delete($i) ){
        $logger->info("Deleted message number $i from ", $imap->current_box);
    }
}
</pre>



<p>Pretty straightforward.</p>



<p><strong>get_archive_box()</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub get_archive_box{
    my($imap, $mailbox_name) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    my ($archive_box) = ($mailbox_name);
    $archive_box =~ s/INBOX/INBOX.Archive/;

    # \Q...\E quotes the dots in the mailbox name
    if( not grep /^\Q$archive_box\E$/, @mailboxes) {
        create_mailbox($imap, $archive_box);
        subscribe($imap, $archive_box);
    }

    return $archive_box;
}
</pre>



<p>This subroutine will assemble the archive mailbox name. It will then check if the mailbox already exists and otherwise create it and subscribe to it.</p>



<p><strong>create_mailbox()</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub create_mailbox {
    my($imap, $mb) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    $imap->create_mailbox($mb) or $logger->logdie("Mailbox creation '$mb' failed: ", $imap->errstr());
    $logger->info("Created mailbox $mb");
}
</pre>



<p>It will basically just create the mailbox and log about it.</p>



<p><strong>subscribe()</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub subscribe {
    my($imap, $box) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    $imap->folder_subscribe($box);
    $logger->info("Subscribed to mailbox $box");
}
</pre>



<p><strong>move_mail()</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub move_mail {
    my($imap, $i, $new_box) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    if( $imap->copy($i, $new_box) ){
        $logger->info("Copied message number [$i] from ", $imap->current_box, " to [$new_box]");
        delete_mail($imap, $i);
    }
}
</pre>



<p>Moving an e-mail consists of copying it first to the new mailbox and then afterwards removing it from the old mailbox.</p>



<p>And that&#8217;s basically it!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2015/11/perl-archive-e-mails-in-an-imap-folder/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Perl: SSL Communication in web applications</title>
		<link>https://jmorano.moretrix.com/2014/11/perl-ssl-communication-in-web-applications/</link>
					<comments>https://jmorano.moretrix.com/2014/11/perl-ssl-communication-in-web-applications/#respond</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Thu, 06 Nov 2014 10:27:34 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[OpenSSL]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1142</guid>

					<description><![CDATA[The following demonstrates how to create a strict SSL communication between client and server, using HTTP.This setup could&#8230;]]></description>
										<content:encoded><![CDATA[
<p>The following demonstrates how to create a strict SSL communication between client and server over HTTP.<br />This setup could be used when creating a web API that requires strong encryption and only admits clients with a properly signed certificate.</p>



<p>The Apache configuration in the example below requires two web servers:</p>



<ul class="wp-block-list"><li>one proxy host, which will accept the SSL connection, verify, check for ACLs and then forward the connection unencrypted internally</li><li>one internal web server which will actually contain the WebAPI scripts</li></ul>



<p>This article explains how to use <a href="http://mojolicio.us/" target="_blank" rel="noopener">Mojolicious</a> for the WebAPI side and <a href="http://search.cpan.org/~mschilli/libwww-perl-6.08/lib/LWP/UserAgent.pm" target="_blank" rel="noopener">LWP::UserAgent</a> to send and receive the WebAPI calls. We will furthermore use <a href="http://search.cpan.org/~makamaka/JSON-2.90/lib/JSON.pm" target="_blank" rel="noopener">JSON</a> to send and receive information.</p>
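

<p>On the client side, <code>LWP::UserAgent</code> can be pointed at the client certificate and key through its <code>ssl_opts</code> parameter. A sketch of assembling such a request; the host name, URL and JSON payload are placeholders, and the request is only built here, not sent:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use JSON;

my $ua = LWP::UserAgent->new(
    ssl_opts => {
        # these options are passed through to IO::Socket::SSL
        SSL_cert_file   => '/path/to/SSL/certs/client01.crt',
        SSL_key_file    => '/path/to/SSL/private/client01.key',
        SSL_ca_file     => '/path/to/SSL/certs/example_com_ca.crt',
        verify_hostname => 1,
    },
);

my $request = HTTP::Request->new(
    POST => 'https://www.example.com/send/',
    [ 'Content-Type' => 'application/json' ],
    encode_json({ message => 'hello' }),
);

# my $response = $ua->request($request);   # uncomment to actually send

print $request->method, ' ', $request->uri, "\n";</pre>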



<p>First we need to have or create a set of OpenSSL certificates.<br />The example below uses self-signed certificates, since they don&#8217;t cost any money and suit the purpose of this example perfectly.<br />There are a million how-tos on the internet that explain these steps very thoroughly, so I won&#8217;t reinvent the wheel. I&#8217;m just going to post the steps I took to create:</p>



<ul class="wp-block-list"><li>a CA certificate</li><li>a client certificate</li><li>a server certificate</li></ul>



<pre class="EnlighterJSRAW" data-enlighter-language="shell" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">cd /path/to/SSL
cp /etc/ssl/openssl.cnf example.cnf
vim example.cnf  # Edit the file to your needs
openssl genrsa -aes256 -out private/example_com_ca.key 4096
openssl req -config example.cnf -new -x509 -extensions v3_ca -key private/example_com_ca.key -out certs/example_com_ca.crt -days 3650
openssl req -config example.cnf -new -nodes -keyout private/client01.key -out client01.csr -days 365
openssl ca -config example.cnf -policy policy_anything -out certs/client01.crt -infiles client01.csr
openssl req -config example.cnf -new -nodes -keyout private/server.key -out server.csr -days 365
openssl ca -config example.cnf -policy policy_anything -out certs/server.crt -infiles server.csr</pre>
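<p>To sanity-check the result, the whole flow can be condensed into a self-contained sketch: a throwaway directory, no passphrases, and <code>openssl x509 -req</code> instead of the <code>openssl ca</code> workflow above. All names here are illustrative:</p>

```shell
# Condensed CA + server certificate flow in a throwaway directory
set -e
tmp=$(mktemp -d)
cd "$tmp"
# 1. CA key and self-signed CA certificate
openssl genrsa -out ca.key 2048
openssl req -new -x509 -key ca.key -out ca.crt -days 3650 -subj "/CN=Example CA"
# 2. Server key and CSR, then sign the CSR with the CA
openssl req -new -nodes -keyout server.key -out server.csr -subj "/CN=server.example.com"
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 365
# 3. Verify the server certificate against the CA
openssl verify -CAfile ca.crt server.crt
```

The final <code>openssl verify</code> should report the certificate as OK; the same check works against the certificates generated with the config-file based steps above.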



<p>Next we will need to configure our web server (this example uses the Apache web server) in order to use our self-signed certificates, and to proxy-forward our WebAPI calls.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="apache" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SSLEngine on
SSLCertificateFile       /path/to/SSL/certs/server.crt
SSLCertificateKeyFile    /path/to/SSL/private/server.key
SSLCertificateChainFile  /path/to/SSL/certs/example_com_ca.crt
SSLCACertificateFile     /path/to/SSL/certs/example_com_ca.crt
SSLVerifyClient require

ProxyPass /send/         http://internal-host/send.pl/
ProxyPassReverse /send/  http://internal-host/send.pl/

&lt;Proxy *>
            Options FollowSymLinks MultiViews
            AllowOverride All
            Order deny,allow
            allow from localhost
            allow from 8.8.8.8 # The client IP address
            deny from all
&lt;/Proxy></pre>



<p>The above is the configuration for the external proxy server. The internal web server should have a pretty straightforward configuration:</p>



<ul class="wp-block-list"><li>a cgi-handler for the Perl extension &#8216;.pl&#8217;</li></ul>
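<p>For completeness, a minimal sketch of what that internal configuration could look like (a plain Apache CGI setup in 2.2 syntax to match the proxy above; the paths are illustrative):</p>

```apache
# Hypothetical internal vhost: execute the .pl script as CGI
ScriptAlias /send.pl /var/www/cgi-bin/send.pl
<Directory "/var/www/cgi-bin">
    Options +ExecCGI
    AddHandler cgi-script .pl
    Order allow,deny
    Allow from all
</Directory>
```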



<p>I could have also sent those proxy requests to an internal Mojolicious application, listening on a specific port. I&#8217;ll leave that for another article.</p>



<p>The test client script is going to make an SSL connection to the external web server, send some JSON and wait for the server to send some JSON data back. The interesting part in the script below is how the SSL connection is set up.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="perl" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#!/usr/bin/perl
use strict; use warnings;

use HTTP::Request;
use LWP::UserAgent;
use IO::Socket::SSL;
use JSON;

my $data = {
    username  => 'skipper',
    password  => 'secret',
    variable  => 'value',
};

my $uri = 'https://example.com/send/event';
my $json = encode_json( $data );
my $req = HTTP::Request->new( 'POST', $uri );
$req->header( 'Content-Type' => 'application/json' );
$req->content( $json );
 
my $lwp = LWP::UserAgent->new(
    ssl_opts => {
        SSL_use_cert    => 1,
        SSL_version     => 'TLSv12',
        SSL_verify_mode => SSL_VERIFY_PEER,
        SSL_ca_file     => "/path/to/SSL/certs/example_com_ca.crt",
        SSL_cert_file   => "/path/to/SSL/certs/client01.crt",
        SSL_key_file    => "/path/to/SSL/private/client01.key",
    },
);
# Note: the constructor won't fail on handshake problems;
# SSL errors surface in the response object below.
my $res = $lwp->request( $req );
if ($res->is_success) {
    print "RESPONSE:", $res->content . "\n";
} 
else {
    print "ERROR: ", $res->status_line . "\n";
}</pre>



<p>The server example uses the Mojolicious framework. Mojolicious is porn for every Perl WebAPI developer. If you don&#8217;t know it, you should be ashamed and start reading about it right away.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="perl" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#!/usr/bin/env perl
 
use Mojolicious::Lite;

# A helper to identify visitors
helper whois => sub {
    my $c               = shift;
    my $headers         = $c->req->headers;
    my $agent           = $c->req->headers->user_agent || 'Anonymous';
    my $local_ip        = $c->tx->remote_address;
    my $remote_ip       = $headers->header('x-forwarded-for');

    return { 
        agent      => $agent, 
        local_ip   => $local_ip,
        remote_ip  => $remote_ip,
   };
};

any '/' => sub {
  my $c = shift;
  $c->render( text => "There is nothing to see here, move along" );
};
 
post '/event' => sub {
    my $c = shift;
    my $json = $c->req->json;
    my $data = {
        username        => $json->{username},
        password        => $json->{password},
        whois           => $c->whois,
    };
    $c->render( json => $data );
};



### IMPORTANT
app->secrets(['some_cool_secret']);  # older Mojolicious versions used the singular app->secret()
app->start;
</pre>



<p>Example output of the test client script:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">$ perl test_send.pl 
RESPONSE:
{"whois":{"remote_ip":"176.9.64.17","agent":"libwww-perl\/6.04","local_ip":"176.9.64.17"},"password":"secret","username":"skipper"}</pre>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2014/11/perl-ssl-communication-in-web-applications/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Perl: Create schema backups in PostgreSQL</title>
		<link>https://jmorano.moretrix.com/2014/08/perl-create-schema-backups-in-postgresql/</link>
					<comments>https://jmorano.moretrix.com/2014/08/perl-create-schema-backups-in-postgresql/#respond</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Fri, 22 Aug 2014 09:09:20 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Postgresql]]></category>
		<category><![CDATA[SysAdmin]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1114</guid>

					<description><![CDATA[At my recent job, I was asked to create a backup procedure, which would dump a PostgreSQL schema&#8230;]]></description>
										<content:encoded><![CDATA[
<p>At my recent job, I was asked to create a backup procedure, which would dump a PostgreSQL schema to a compressed file and which was able to create weekly and daily backups.<br />The backups had to be full backups each time a backup was made and the amount of daily and weekly backups should be defined through thresholds.</p>



<p>The PostgreSQL tool used for those backups is &#8216;<code>pg_dump</code>&#8216; and I have used Perl to script all the interesting stuff together.</p>



<p>The script will basically go through the following steps:</p>



<ul class="wp-block-list"><li>Check the backup path for the required directories (and if not, create them)</li><li>Rotate old backups based on thresholds</li><li>Create a new backup</li></ul>



<p>The script shown below is just an example and probably needs to be adapted to your own needs. The script works for me and the environment it was created in.</p>



<p>First things first.<br />The script uses the following Perl modules:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use DateTime;
use Getopt::Long;
use Pod::Usage;
use YAML qw/LoadFile/;
use File::Path qw/make_path/;
use File::Copy;
use Data::Dumper;
use POSIX qw/setuid/;
</pre>



<p>A YAML configuration file is used to provide the script with essential information. An example configuration file looks like the following:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">thresholds:
    daily: 7
    weekly: 4

backup_path: /data/backup/schema_backups

database: my_db

daily_to_weekly_pattern: sunday

schemas:
    - my_cool_schema
    - my_not_so_cool_schema
</pre>



<p>Remember: YAML is sensitive about tabs!</p>



<p>Command line arguments are set up in the script by using Getopt::Long.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">my ($help, $cfg_file, $schema, $verbose, $debug) = @_;
# Check command line arguments
GetOptions(
    "help"     =&amp;gt; \$help,
    "verbose"  =&amp;gt; \$verbose,
    "debug"    =&amp;gt; \$debug,
    "cfg=s"    =&amp;gt; \$cfg_file,
    "schema=s" =&amp;gt; \$schema,
);
pod2usage(1) if $help;
</pre>



<p>The script needs to run as the &#8216;postgres&#8217; user. Should it be executed by another user (for instance root), then the script will try to switch to the &#8216;postgres&#8217; user.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">my ($user) = ( split /\c/, getpwuid($&amp;lt;) )[0]; 
unless ($user eq 'postgres') { 
    p_info("Script $0 needs to run as 'postgres', switching user..."); 
    setuid(scalar getpwnam 'postgres'); 
}</pre>



<p>Next we will load the configuration file and check if a schema name was supplied on the command line. If one was defined, then we will override the schema names which were set in the configuration, and only create a backup of that one schema name.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">if(defined $cfg_file){
    if( -f $cfg_file ){
        p_info("Loading configuration file '$cfg_file'");
        $cfg = LoadFile($cfg_file);
    }
    else {
        die "No such configuration file '$cfg_file'\n";
    }
}

$cfg-&amp;gt;{schemas} = [$schema] if defined $schema;
</pre>



<p>And now we are ready for the mainloop of the script:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">foreach my $s (@{ $cfg-&amp;gt;{schemas} }){
    check_current_backups($s);
    create_backup($s);
}
</pre>



<p>For each schema, we will first check if the required directories are in place and otherwise create them. Afterwards we will check those directories for older backups.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub check_current_backups {
    my($schema) = @_;

    check_directory_structure($schema);
    check_backups('daily', $schema);
    check_backups('weekly', $schema);
}

sub check_directory_structure {
    my($schema) = @_;

    foreach my $period (qw/daily weekly/){
        my $_path = return_backup_path($period, $schema);
        p_info("Checking path '$_path'");
        unless(-d $_path){
            make_path($_path);
            p_info("Created path '$_path'");
        }
    }
}

# check if older backups need rotation / deletion
sub check_backups {
    my($period, $schema) = @_;

    my $path = return_backup_path($period, $schema);

    my @files = glob("$path/*");
    my @sorted = sort { get_date($b) &amp;lt;=&amp;gt; get_date($a) } @files;

    if(scalar @sorted &amp;gt;= $cfg-&amp;gt;{thresholds}{$period}){
        p_info("Rotating backups for period '$period'");
        rotate_backups($period, \@sorted);
    }
}
</pre>



<p>The rotation of the backups works as follows:<br />&#8211; If the daily threshold has been reached (for instance 7 daily backups), then those files will be nominated for rotation or deletion</p>



<p>The rotation itself is custom designed for my current job. Each backup filename is appended the day name (Monday, Tuesday, &#8230;). Backup files matching a certain pattern (in my situation &#8216;sunday&#8217;) will be moved into the &#8216;weekly&#8217; backup path, other old files will be deleted.</p>



<p>Since rotation is done before a backup is created, we will delete one file more than the threshold requires (since a new backup file is going to be created a few lines further down).</p>
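<p>The slice logic in the rotation code is easier to see with plain values. A quick standalone illustration, with a hypothetical threshold of 3:</p>

```perl
use strict; use warnings;

my $threshold = 3;                            # hypothetical daily threshold
my @files     = qw/newest f2 f3 f4 oldest/;   # youngest first, as in @sorted
# Keep threshold-1 files (a new backup is created right after rotation);
# everything from position threshold-1 onwards gets rotated or removed.
my @to_move = @files[ $threshold - 1 .. $#files ];
print "@to_move\n";    # f3 f4 oldest
```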



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub rotate_backups {
    my($period, $files) = @_;

    p_debug("All Files: ".Dumper($files));
    p_debug("$period threshold: ".$cfg-&amp;gt;{thresholds}{$period});

    # make a true copy
    my (@to_move_files) = (@{ $files });
    # The @files contains all backup files, with the youngest as element 0, the oldest 
    # backup as last element.
    # @to_move_files is a slice of @files, starting from the position threshold - 1, 
    # until the end of the array. Those files will be either rotated or removed
    @to_move_files = @to_move_files[ $cfg-&amp;gt;{thresholds}{$period} -1 .. $#to_move_files ];
    p_debug("TO MOVE FILES: ".Dumper(\@to_move_files));

    if($period eq 'daily'){
        foreach my $file (@to_move_files){
            # move backups to weekly
            if($file =~ /$cfg-&amp;gt;{daily_to_weekly_pattern}/){
                p_info("Moving daily backup '$file' to weekly");
                move($file, return_backup_path('weekly', $schema) . '/' . $file)
            }
            else {
                p_info("Removing backup '$file'");
                unlink($file);
            }
        }
    }

    if($period eq 'weekly'){
        foreach my $file (@to_move_files){
            # remove files
            p_info("Removing backup '$file'");
            unlink($file);
        }
    }
}
</pre>



<p>At this point now, the required directory structure has been checked and is present and older backup files have been rotated or deleted.<br />Finally we can create the actual backup:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub create_backup {
    my($schema) =@_;

    p_info("Creating backup for schema '$schema', database:" . $cfg-&amp;gt;{database});
    my $now = DateTime-&amp;gt;now;
    my $path = return_backup_path('daily', $schema) 
                . '/' . $now-&amp;gt;ymd('') . $now-&amp;gt;hms('')
                . '_' .lc($now-&amp;gt;day_name) 
                . '.dump.sql';

    # Create the dump file
    my $dump_output = do{
        local $/;
        open my $c, '-|', "pg_dump -v -n $schema -f $path $cfg-&amp;gt;{database} 2&amp;gt;&amp;amp;1" 
            or die "pg_dump for '$schema' failed: $!";
        &amp;lt;$c&amp;gt;;
    };
    p_debug('pg_dump output: ', $dump_output);

    # GZIP the dump file
    my $gzip_output = do{
        local $/;
        open my $c, '-|', "gzip $path 2&amp;gt;&amp;amp;1" 
            or die "gzip for '$path' failed: $!";
        &amp;lt;$c&amp;gt;;
    };
    p_debug('gzip output: ', $gzip_output);

    # change the permissions
    chmod 0660, "$path.gz";

    p_info("Created backup for schema '$schema' in '$path.gz'");
}
</pre>



<p>The backup is created by issuing <code>pg_dump</code> for that schema and it will produce a normal text SQL file. This file will be compressed with <code>gzip</code> and afterwards the file permissions will be changed to 0660. This means that, since the backup file is created by the <code>postgres</code> user, only the <code>postgres</code> user will have access to this file.</p>
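<p>Restoring such a schema dump later is the mirror operation; it could look along these lines (the file name, path and database are illustrative):</p>

```shell
# Decompress the dump and feed it back into psql as the postgres user
gunzip -c /data/backup/schema_backups/my_cool_schema/daily/20140822090920_friday.dump.sql.gz \
    | psql my_db
```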



<p>The full script and configuration file can be found at <a title="Github Repositry" href="https://github.com/insani4c/perl_tools/tree/master/backup_schema" target="_blank" rel="noopener">https://github.com/insani4c/perl_tools/tree/master/backup_schema</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2014/08/perl-create-schema-backups-in-postgresql/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Monitor running processes with Perl</title>
		<link>https://jmorano.moretrix.com/2014/05/monitor-running-processes-with-perl/</link>
					<comments>https://jmorano.moretrix.com/2014/05/monitor-running-processes-with-perl/#comments</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Thu, 15 May 2014 12:33:22 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[CPAN]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[SysAdmin]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1081</guid>

					<description><![CDATA[Update: This article is updated thanks to Colin Keith his excellent comment. I was extremely inspired by it&#8230;]]></description>
										<content:encoded><![CDATA[
<p><strong>Update:</strong> This article has been updated thanks to Colin Keith&#8217;s excellent comment. I was extremely inspired by it.</p>



<p>Maintaining a large number of servers cannot be done without proper programming skills. Every good system administrator must therefore make sure he knows how to automate his daily work.</p>



<p>Although many programming languages exist, most people will only write code in one. I happen to like Perl.</p>



<p>In this blog post, I am going to show how to create a script which can be deployed on all the Linux servers you maintain, to check for certain running services.</p>



<p>Of course, a tool such as Nagios together with NRPE and a configured event-handler could also be used, but lately I was often in the situation that the &#8216;nrpe daemon&#8217; crashed, Nagios was spewing a lot of errors and the event-handler&#8230; well, since nrpe was down, the event-handler of course couldn&#8217;t connect or do anything. So why rely on a remotely triggered action, when a simple local script could be used instead?</p>



<p>The following script will check a default list of services and can additionally load or overwrite these services. A regular expression can be used to check for running processes, and of course, a startup command needs to be defined. And that is all the script will and should do.</p>



<p>The script uses three CPAN modules:</p>



<ul class="wp-block-list"><li><a title="Proc::ProcessTable" href="http://search.cpan.org/~jwb/Proc-ProcessTable-0.50/ProcessTable.pm">Proc::ProcessTable</a></li><li><a title="YAML" href="http://search.cpan.org/~ingy/YAML-0.90/lib/YAML.pm">YAML</a></li><li><a title="File::Slurp" href="http://search.cpan.org/~uri/File-Slurp-9999.19/lib/File/Slurp.pm">File::Slurp</a></li></ul>



<p>The first one will be used to get a full listing of all running processes, the second one will provide us a means for using configuration files, and the third one will be used to read the PID files.</p>



<p>So let&#8217;s start our script:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="perl" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#!/usr/bin/env perl 
use strict; use warnings;
use utf8;

use Proc::ProcessTable;
use YAML qw/LoadFile/;
use File::Slurp;

# Default set of processes to watch
my %default_services = (
    'NRPE' =&amp;gt; {
        'cmd'     =&amp;gt; '/etc/init.d/nagios-nrpe-server restart',
        're'      =&amp;gt; '/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d',
	'pidfile' =&amp;gt; '/var/tmp/nagios-nrpe-server.pid',
    },
    'Freshclam' =&amp;gt; {
        'cmd'     =&amp;gt; '/etc/init.d/clamav-freshclam restart',
        're'      =&amp;gt; '/usr/bin/freshclam -d --quiet',
	'pidfile' =&amp;gt; '/var/tmp/clamav-freshclam.pid',
    },
    'Syslog-NG' =&amp;gt; {
        'cmd'     =&amp;gt; '/etc/init.d/syslog-ng restart',
        're'      =&amp;gt; '/usr/sbin/syslog-ng -p /var/run/syslog-ng.pid',
	'pidfile' =&amp;gt; '/var/run/syslog-ng.pid',     
    },
    'VMToolsD' =&amp;gt; {
        'cmd'     =&amp;gt; '/etc/init.d/vmware-tools restart',
        're'      =&amp;gt; '/usr/sbin/vmtoolsd',
	'pidfile' =&amp;gt; '/var/tmp/vmtoolsd.pid',
    },
    'Munin-Node' =&amp;gt; {
        'cmd'     =&amp;gt; '/etc/init.d/munin-node restart',
        're'      =&amp;gt; '/usr/sbin/munin-node',
	'pidfile' =&amp;gt; '/var/tmp/munin-node.pid',
    },
);

my (%services) = (%default_services);
</pre>



<p>Until now, no rocket science. We load the required modules, we defined our default services that need to be checked.</p>



<p>Next part: check if there is a configuration file on disk. The script looks for a hard-coded path &#8216;/etc/default/watchdog.yaml&#8217;:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="perl" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Check if there is a local config file and if yes, load them in the services hash
if( -f '/etc/default/watchdog.yaml' ){
    my $local_config = LoadFile '/etc/default/watchdog.yaml';

    %services = (%default_services, %{ $local_config->{services} });
}
</pre>



<p>The last Perl statement actually allows overwriting one or more (or even all) of the default defined services.</p>
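<p>This works because, when hashes are merged in list context, later keys win. A tiny standalone illustration (the service names and commands are made up):</p>

```perl
use strict; use warnings;

my %defaults = ( NRPE => 'restart-nrpe', Munin => 'restart-munin' );
my %local    = ( NRPE => 'my-custom-restart' );

# Config entries override defaults of the same name; other defaults are kept.
my %merged = (%defaults, %local);
print "$merged{NRPE}\n";    # my-custom-restart
print "$merged{Munin}\n";   # restart-munin
```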



<p>Now let&#8217;s see if these processes are actually running. The following code was hugely inspired by Colin Keith&#8217;s comment below. I have combined his examples with my code.</p>



<p>Let&#8217;s first have a look at the code:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Get current process table
my $processes = Proc::ProcessTable-&amp;gt;new;
my %procs; 
my %matched_procs;
foreach my $p (@{ $processes-&amp;gt;table }){
    $procs{ $p-&amp;gt;pid } = $p-&amp;gt;cmndline;
    foreach my $s (keys %services){
        if($p-&amp;gt;cmndline =~ m#$services{$s}-&amp;gt;{re}#){
            $matched_procs{$s}++;
            last;
        }
    }
}

# Search the process table for not running services
foreach my $service ( keys %services ) {
    if(exists($services{$service}-&amp;gt;{pidfile}) &amp;amp;&amp;amp; -f $services{$service}-&amp;gt;{pidfile} ) {
        my $pid = read_file( glob($services{$service}-&amp;gt;{pidfile}) );
 
        # If we get a pid ensure that it is running, and that we can signal it
        $pid &amp;amp;&amp;amp; exists($procs{$pid}) &amp;amp;&amp;amp; kill(0, $pid) &amp;amp;&amp;amp; next;  
        
        # Remove the stale PID file because no running process for this PID file
        unlink( $services{$service}-&amp;gt;{pidfile} );
    }
    else {
        # check if the configured process regex matches
        if( exists($matched_procs{$service}) ){
            # process is running but has no PID file
            next;
        }
    }
	
    # Execute the service command
    system( $services{$service}-&amp;gt;{'cmd'} );

    # Check the exit code of the service command
    if ($? == -1) {
        print "Failed to restart '$service' with '$services{$service}-&amp;gt;{cmd}': $!\n";
    }
    elsif ($? &amp;amp; 127) {
        printf "Restart of '$service' died with signal %d, %s coredump\n", ($? &amp;amp; 127),  ($? &amp;amp; 128) ? 'with':'without';
    }
    else {
        printf "Process '$service' successfully restarted, exit status:  %d\n", $? &amp;gt;&amp;gt; 8;
    }
}
</pre>



<p>Line 2 retrieves the current process table. We will save that information in two hashes with a little less information, because we actually only need the PID and the actual &#8216;command line&#8217; of each process.</p>



<p>At line 16 we start looping through the services we have defined in the <code>%services</code> hash.<br />Inspired by Colin&#8217;s comment, we first check if a PID file is configured for the service and if it still exists on disk. If it does, we verify whether the PID stored in the PID file exists in the process list, which we have stored in <code>%procs</code>. This happens in lines 18-21.<br />At line 21, if the process is still running and the PID matches, we move on to the next service (the <code>&amp;&amp; next</code> part).<br />If the process is not running anymore but the PID file was still in the defined path, then the PID file is removed at line 24.</p>



<p>Otherwise, if no PID file was found or none was configured, we check the process list with the regular expression defined for that service. We have already built a hash, <code>%matched_procs</code>, between lines 7 and 10, which we use for this check. If the service shows up in that hash, we skip it and move on to the next one.</p>



<p>Now, if there was no PID file or the PID file was removed at line 24, the process will be started again. This happens at line 35.<br />I&#8217;ve executed it with the &#8216;system&#8217; function since I want to have the output of this command directly in STDOUT. And of course, the last thing to do is to check if the process started up correctly or not by checking its exit code.</p>



<p>Now save the script as, for instance, &#8216;watchdog.pl&#8217; and configure it in a cron job.<br />Example:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">*/5 * * * * root /usr/local/bin/watchdog.pl
</pre>



<p>And here&#8217;s an example of the configuration file:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">services:
    Exim-Mailserver:
        cmd: /etc/init.d/exim4 restart
        re: /usr/sbin/exim4 -bd -q30m
    Ossec-Agent:
        cmd: /etc/init.d/ossec restart
        re: !!perl/regexp '(?:ossec-agentd|ossec-logcollector|ossec-syscheckd)'

</pre>



<p>Link to script source code: <a href="https://github.com/insani4c/perl_tools/tree/master/watchdog" target="_blank" rel="noopener">https://github.com/insani4c/perl_tools/tree/master/watchdog</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2014/05/monitor-running-processes-with-perl/feed/</wfw:commentRss>
			<slash:comments>15</slash:comments>
		
		
			</item>
		<item>
		<title>Postgresql: Monitor sequence scans with Perl</title>
		<link>https://jmorano.moretrix.com/2014/02/postgresql-monitor-sequence-scans-perl/</link>
					<comments>https://jmorano.moretrix.com/2014/02/postgresql-monitor-sequence-scans-perl/#comments</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Wed, 12 Feb 2014 07:33:26 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Postgresql]]></category>
		<category><![CDATA[SysAdmin]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1065</guid>

					<description><![CDATA[Not using indexes or huge tables without indexes, can have a very negative impact on the duration of&#8230;]]></description>
										<content:encoded><![CDATA[
<p>Not using indexes, or huge tables without indexes, can have a very negative impact on the duration of an SQL query. The query planner will decide to make a sequence scan, which means that the query will go through the table sequentially to search for the required data. When this table is only 100 rows big, you will probably not even notice it is making sequence scans, but if your table is 1,000,000 rows big or even more, you can probably optimize your table to use indexes and get faster searches.</p>



<p>In the example script we will be using a <em>Storable</em> state file, and we will store the statistics as a JSON object in the PostgreSQL database.</p>



<p>First let&#8217;s take a look at the query we will be executing:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT schemaname, relname, seq_tup_read 
FROM pg_stat_all_tables 
WHERE seq_tup_read &amp;gt; '0' 
      AND relname NOT LIKE 'pg_%'
ORDER BY seq_tup_read desc
</pre>



<p>As you can see, PostgreSQL keeps all the information we need about our tables in one place, the statistics view <em>pg_stat_all_tables</em>. It has a column called <em>seq_tup_read</em>, which contains the information we need.</p>



<p>Just reading out this information is not going to be enough, because it contains information since the startup of your PostgreSQL database. Since production databases aren&#8217;t restarted (that often), we will have to compare this information with some previous information (hence the <em>Storable</em> state file).<br />Our plan is to run the script in a cronjob, every 5 minutes.</p>
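<p>The state-file mechanics boil down to a <em>Storable</em> retrieve/store round trip. A minimal sketch (the path and key names are illustrative):</p>

```perl
use strict; use warnings;
use Storable qw/retrieve store/;

my $state_file = '/var/tmp/seq_demo.state';    # hypothetical path

# Load the previous run's numbers, or start empty on the first run
my $state = -f $state_file ? retrieve($state_file) : {};

# ... compare the fresh counters against $state->{last}{...} here ...

# Remember the fresh counters for the next run
$state->{last}{'public:my_table'}{seq_tup_read} = 12345;
store($state, $state_file);
```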



<p>The statistics are also stored as a JSON object in a database, so that we could build some web interface for the statistics at a later stage. And we want to keep a history of these statistics.</p>
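<p>The script assumes a small statistics table exists on the state database. One possible shape (the names follow the INSERT statement used further down; the exact columns are an assumption):</p>

```sql
-- Illustrative DDL for the statistics table
CREATE TABLE mydbschema.seq_tup_read (
    id         serial PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),
    data       json NOT NULL
);
```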



<p>Furthermore the script will <em>setuid</em> to postgres (similar to <em>su &#8211; postgres</em> on the command line), so that it can connect to the PostgreSQL UNIX socket file.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use strict;
use warnings;
use utf8;

use DBI;
use DateTime;
use Storable qw/retrieve store/;
use POSIX qw/setuid/;
use Text::ASCIITable;
use JSON;

my $db   = 'mydatabase';
if(scalar @ARGV){
    $db = shift @ARGV;
}

my $host = '/var/run/postgresql';
my $user = 'postgres';
my $pass = undef;

my $state_db   = 'database_statistics';
my $state_host = '192.168.1.1';
my $state_user = 'skeletor';
my $state_pass = 'he-manisawhimp';

my $state_file = '/var/tmp/sequence_read.state';

# suid to postgres
setuid(scalar getpwnam 'postgres');

# define and open up the state file
my $state = {};
$state = retrieve $state_file if -f $state_file;

my $now      = DateTime-&amp;gt;now;

# Connect to the database which we want to monitor
my $dbh = DBI-&amp;gt;connect("dbi:Pg:dbname=$db;host=$host", $user, $pass) 
                or die "Could not connect to database: $!\n";

# Connect to the database that will be used to store the statistics
my $state_dbh = DBI-&amp;gt;connect("dbi:Pg:dbname=$state_db;host=$state_host", $state_user, $state_pass) 
                or die "Could not connect to the State database '$state_db': $!\n";

my $sql = &amp;lt;&amp;lt;EOF;
SELECT schemaname, relname, seq_tup_read 
FROM pg_stat_all_tables 
WHERE seq_tup_read &amp;gt; '0' 
      AND relname NOT LIKE 'pg_%'
ORDER BY seq_tup_read desc
EOF

# Get the statistics
my $results = $dbh-&amp;gt;selectall_arrayref( $sql, undef);

# Store the statistics as a JSON object in the second databse
eval {
    $state_dbh-&amp;gt;do('INSERT INTO mydbschema.seq_tup_read (data) VALUES(?)', undef, encode_json($results));
};
if($@){
    print "Insert into state-db failed: $@\n";
}

# Prepare a nice ASCII table for output
my $t = Text::ASCIITable-&amp;gt;new({ headingText =&amp;gt; 'Seq Tup Read ' . $now-&amp;gt;ymd('-')     . ' ' . $now-&amp;gt;hms(':')});
$t-&amp;gt;setCols('Schema Name','Relation Name ', 'Seq Tup Read', 'Increase (delta)');

my $row_count = 0;
foreach my $r (@{$results}){
    last if $row_count &amp;gt; 25;

    my (@values) = (@{$r});
    my ($increase, $delta) = (0, 0);
    # Calculate the increase and its delta
    if(defined $state-&amp;gt;{last}{$r-&amp;gt;[0].':'.$r-&amp;gt;[1]}{seq_tup_read}){
        $increase = $r-&amp;gt;[2] - $state-&amp;gt;{last}{$r-&amp;gt;[0].':'.$r-&amp;gt;[1]}{seq_tup_read};
        $delta    = $increase / $state-&amp;gt;{last}{$r-&amp;gt;[0].':'.$r-&amp;gt;[1]}{seq_tup_read} * 100;
        my $str = sprintf '%.0f (%.4f %%)', $increase, $delta;
        push @values, ($str);
    }
    else {
        push @values, '0 (0%)';
    }
    # Store this information for the next run of the script
    $state-&amp;gt;{last}{$r-&amp;gt;[0].':'.$r-&amp;gt;[1]}{seq_tup_read} = $r-&amp;gt;[2];
    $state-&amp;gt;{last}{$r-&amp;gt;[0].':'.$r-&amp;gt;[1]}{delta}        = $delta;
    $state-&amp;gt;{last}{$r-&amp;gt;[0].':'.$r-&amp;gt;[1]}{increase}     = $increase;

    # Only add the information to ASCII output table if there was an increase
    next unless $increase &amp;gt; 0;
    $t-&amp;gt;addRow(@values);
    $row_count++;
}
# Print out the ASCII table
print $t;

nstore $state, $state_file;

</pre>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2014/02/postgresql-monitor-sequence-scans-perl/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Postgresql: Monitor unused indexes</title>
		<link>https://jmorano.moretrix.com/2014/02/postgresql-monitor-unused-indexes/</link>
					<comments>https://jmorano.moretrix.com/2014/02/postgresql-monitor-unused-indexes/#comments</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Tue, 11 Feb 2014 09:09:08 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Postgresql]]></category>
		<category><![CDATA[SysAdmin]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1057</guid>

					<description><![CDATA[Working on large database systems, with many tables and many indexes, it is easy to lose the overview&#8230;]]></description>
										<content:encoded><![CDATA[
<p>Working on large database systems, with many tables and many indexes, it is easy to lose track of what is actually being used and what is just consuming unwanted disk space.<br />If indexes are not closely monitored, they can end up occupying undesired space and, moreover, every write to the table spends unnecessary CPU cycles keeping them up to date.</p>



<p>Statistics about indexes can be easily retrieved from the PostgreSQL database system. All required information is available in two system relations:</p>



<ul class="wp-block-list"><li>pg_stat_user_indexes</li><li>pg_index</li></ul>



<p>When joining these two relations, interesting information can be read from the following columns:</p>



<ul class="wp-block-list"><li>idx_scan: the number of times the query planner has used this index in an index scan</li><li>idx_tup_read: how many index entries have been read using the index</li><li>idx_tup_fetch: how many live table rows have been fetched using the index</li></ul>



<p>A neat function called <em>pg_relation_size()</em> returns the on-disk size of a relation, in this case the index.</p>
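

<p>For instance, to print the on-disk size of a single index in a human-readable form (the index name is just a placeholder):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT pg_size_pretty(pg_relation_size('my_index'::regclass));
</pre>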



<p>Based on this information, the monitoring query will be built up as follows:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT 
    relid::regclass AS table, 
    indexrelid::regclass AS index, 
    pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size, 
    idx_tup_read, 
    idx_tup_fetch, 
    idx_scan
FROM 
    pg_stat_user_indexes 
    JOIN pg_index USING (indexrelid) 
WHERE 
    idx_scan = 0 
    AND indisunique IS FALSE
</pre>



<p>Now, all we need to do is write a script which stores this information in a state file and periodically reports on the statistics.</p>



<p>First of all, we will need a configuration file which contains the database credentials.<br />I&#8217;ve chosen YAML because it is so versatile.</p>



<p>It will contain two important pieces of information:</p>



<ul class="wp-block-list"><li>The database credentials</li><li>The path to the state file</li></ul>



<p>Example:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">dsn: "dbi:Pg:host=/var/run/postgresql;database=testdb"
user: postgres
pass:
state_file: /var/tmp/monitor_unused_indexes.state
</pre>



<p>As you can see, we will connect to the PostgreSQL database through its UNIX socket.</p>



<p>The script will use <em>Text::ASCIITable</em> to output the statistics in a nice table. <em>Storable</em> is used to save our statistics to disk.</p>



<p>In the script below, we check whether an index has remained unused for a timespan of 30 days. If so, the script reports this index to STDOUT.<br />To do this, we store a score and a timestamp for each unused index in the state file.</p>
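

<p>Conceptually, the <em>Storable</em> state file will then hold a structure like this (an illustrative sketch with a made-up index name, not code from the script):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">$state = {
    unused_indexes =&gt; {
        'my_unused_idx' =&gt; {
            score     =&gt; 12,          # consecutive days the index has been seen unused
            first_hit =&gt; 1391990400,  # epoch of the first time it was seen unused
        },
    },
};
</pre>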



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use DBI;
use Storable qw/nstore retrieve/;
use YAML qw/LoadFile/;
use POSIX qw/setuid/;
use Getopt::Long;
use DateTime;
use Text::ASCIITable;

my $cfg_file = './monitor_unused_indexes.yaml';
my $verbose  = 0;
GetOptions("cfg=s"     =&gt; \$cfg_file,
           "verbose|v" =&gt; \$verbose,
        );

my $sql = &lt;&lt;EOS;
SELECT 
    relid::regclass AS table, 
    indexrelid::regclass AS index, 
    pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size, 
    idx_tup_read, 
    idx_tup_fetch, 
    idx_scan
FROM 
    pg_stat_user_indexes 
    JOIN pg_index USING (indexrelid) 
WHERE 
    idx_scan = 0 
    AND indisunique IS FALSE
EOS

my ($cfg) = LoadFile($cfg_file);

# suid to postgres, or whatever user is configured in the config.yaml file
setuid(scalar getpwnam $cfg-&gt;{user});

# Connect to the database
my $dbh = DBI-&gt;connect($cfg-&gt;{dsn}, $cfg-&gt;{user}, $cfg-&gt;{pass})
            or die "Could not connect to database: $! (DBI ERROR: " . $DBI::errstr . ")\n";

my $state = {};
$state = retrieve $cfg-&gt;{state_file} if -f $cfg-&gt;{state_file};

# Fetch the statistics
my $results = $dbh-&gt;selectall_arrayref( $sql, undef );

my $now_dt = DateTime-&gt;now;

# Initialize the ASCII table
my $t = Text::ASCIITable-&gt;new({ headingText =&gt; 'INDEX STATISTICS' });
$t-&gt;setCols(qw/Table Index Index_Size idx_tup_read idx_tup_fetch idx_scan/);

# Analyze the results
foreach my $r (@$results){
    if($verbose){
        $t-&gt;addRow(@{$r});
    }
    # Only update the state file if --verbose was not specified.
    # This way the script can be checked manually with --verbose as often as needed,
    # and executed, for instance, from a cronjob once a day without --verbose
    else {
        if(defined $state-&gt;{unused_indexes}{$r-&gt;[1]}){
            my $first_dt = DateTime-&gt;from_epoch( epoch =&gt; $state-&gt;{unused_indexes}{$r-&gt;[1]}{first_hit} );
            if($first_dt-&gt;add(days =&gt; $state-&gt;{unused_indexes}{$r-&gt;[1]}{score})-&gt;day == $now_dt-&gt;day ) {
                $state-&gt;{unused_indexes}{$r-&gt;[1]}{score}++;
            }
            else {
                $state-&gt;{unused_indexes}{$r-&gt;[1]}{score}     = 1;
                $state-&gt;{unused_indexes}{$r-&gt;[1]}{first_hit} = $now_dt-&gt;epoch;
            }
        }
        else {
            $state-&gt;{unused_indexes}{$r-&gt;[1]}{score}     = 1;
            $state-&gt;{unused_indexes}{$r-&gt;[1]}{first_hit} = $now_dt-&gt;epoch;
        }
    }
}

# Print out the statistics table, if --verbose was specified
print $t if $verbose; 

# Store the statistics to disk in a state file
nstore $state, $cfg-&gt;{state_file};

foreach my $idx (keys %{ $state-&gt;{unused_indexes} }){
    my $first_dt = DateTime-&gt;from_epoch( epoch =&gt; $state-&gt;{unused_indexes}{$idx}{first_hit} );
    if( $first_dt-&gt;add(days =&gt; 30) &lt;= $now_dt ){
        my $line = "Index: $idx ready for deletion";
        $line .= " (score: " . $state-&gt;{unused_indexes}{$idx}{score};
        $line .= " | first_hit: " . DateTime-&gt;from_epoch(epoch =&gt; $state-&gt;{unused_indexes}{$idx}{first_hit})-&gt;ymd . ")";

        # report regardless of --verbose, so the daily cronjob run produces output
        print $line . "\n";
    }
}
</pre>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2014/02/postgresql-monitor-unused-indexes/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Postgresql 9.3: Creating an index on a JSON attribute</title>
		<link>https://jmorano.moretrix.com/2013/12/postgresql-9-3-creating-index-json-attribute/</link>
					<comments>https://jmorano.moretrix.com/2013/12/postgresql-9-3-creating-index-json-attribute/#respond</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Fri, 27 Dec 2013 10:28:25 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Postgresql]]></category>
		<category><![CDATA[SQL]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1036</guid>

					<description><![CDATA[Recently I&#8217;ve discovered some very interesting new features in the PostgreSQL 9.3 database.First of all, a new data&#8230;]]></description>
										<content:encoded><![CDATA[
<p>Recently I&#8217;ve discovered some very interesting new features in the PostgreSQL 9.3 database.<br />First of all, a new data type has been introduced: <a title="Datatype JSON" href="http://www.postgresql.org/docs/9.3/static/datatype-json.html" target="_blank" rel="noopener">JSON</a>. Together with this new data type, <a title="JSON Functions" href="http://www.postgresql.org/docs/9.3/static/functions-json.html" target="_blank" rel="noopener">new functions</a> were also introduced.</p>



<p>These new features simplify, for instance, saving web forms in your PostgreSQL database &#8211; or really any kind of dynamic data, such as Perl hashes. Plus, thanks to the new JSON functions, this data can be easily searched and indexed.</p>
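

<p>To give an idea, these are the two JSON extraction operators used later in this article (<em>-&gt;</em> returns JSON, <em>-&gt;&gt;</em> returns text):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT '{"name":"johnny","count":1}'::json-&gt;&gt;'name';  -- returns the text 'johnny'
SELECT '{"c":{"d":"ddddd"}}'::json-&gt;'c'-&gt;&gt;'d';        -- returns the text 'ddddd'
</pre>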



<p>Let&#8217;s start with creating a test table.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">CREATE SEQUENCE data_seq    
    START WITH 1    
    INCREMENT BY 1    
    NO MINVALUE    
    NO MAXVALUE    
    CACHE 1;

CREATE TABLE data (    
    id bigint DEFAULT nextval('data_seq'::regclass) NOT NULL,
    form_name TEXT,
    form_data JSON
);
</pre>



<p>I&#8217;ve inserted 100k rows of test data into this table with a very simple Perl script.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use AnyEvent;
use AnyEvent::Util;
$AnyEvent::Util::MAX_FORKS = 25;

print "Inserting test data...\n";
my $cv = AnyEvent-&gt;condvar;
$cv-&gt;begin;
foreach my $d (0..100000){
    $cv-&gt;begin;
    fork_call {
        my ($d) = @_;
        my $name = do { local $/; open my $c, '-|', 'pwgen -B -s -c1 64'; &lt;$c&gt; };
        chomp($name);
        my $dbh = DBI-&gt;connect("dbi:Pg:host=/var/run/postgresql;dbname=test;port=5432", 'postgres', undef);
        $dbh-&gt;do(qq{INSERT INTO data (form_name,form_data) VALUES('test_form', '{"c":{"d":"ddddd"},"name":"$name","b":"bbbbb", "count":$d}')});
        $dbh-&gt;disconnect;
        return $d;
    } $d,
    sub {
        my ($count) = @_;
        print "$d ";
        $cv-&gt;end;
    }
} 
$cv-&gt;end;
$cv-&gt;recv;
print "\n\nDone\n";
</pre>



<p>Now let&#8217;s assume that the JSON data we are going to insert (or have inserted) always contains the attribute field &#8216;name&#8217;. On this attribute we will create the following database index:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">CREATE INDEX ON data USING btree (form_name, json_extract_path_text(form_data,'name'));
</pre>



<p>The above statement creates a multi-column index: one column is <em>form_name</em>, the other is an expression that extracts the <em>name</em> attribute from the JSON document.</p>



<p>Now let&#8217;s make our first test.<br />The first test will not use the index we have created previously, because the <em>-&gt;&gt;</em> operator does not match the indexed expression.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">EXPLAIN ANALYZE VERBOSE SELECT * FROM data WHERE form_name = 'test_form' AND form_data-&gt;&gt;'name' = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME';
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on data  (cost=0.00..4337.28 rows=500 width=102) (actual time=28.608..129.945 rows=1 loops=1)
   Filter: ((data.form_name = 'test_form'::text) AND ((data.form_data -&gt;&gt; 'name'::text) = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME'::text))
   Rows Removed by Filter: 100000
 Total runtime: 129.968 ms
(5 rows)

</pre>



<p>130 ms for searching through 100k rows is actually quite OK.</p>



<p>Now let&#8217;s see how we can speed up this query by using the index we&#8217;ve created.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">EXPLAIN ANALYZE VERBOSE SELECT * FROM data WHERE form_name = 'test_form' AND json_extract_path_text(form_data,'name') = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME';
                                                                             QUERY PLAN                                                                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using data_form_name_json_extract_path_text_idx on data  (cost=0.42..8.44 rows=1 width=102) (actual time=0.056..0.057 rows=1 loops=1)
   Index Cond: ((data.form_name = 'test_form'::text) AND (json_extract_path_text(data.form_data, VARIADIC '{name}'::text[]) = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME'::text))
 Total runtime: 0.084 ms
(4 rows)

</pre>



<p>0.084 ms! That is roughly 1,500 times faster! What makes this index extremely interesting is that it was created on only one attribute of the JSON data, not on the entire JSON document. This keeps the index small, so it is more likely to stay in your database server&#8217;s memory.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2013/12/postgresql-9-3-creating-index-json-attribute/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
