Perl: Create schema backups in PostgreSQL


At my recent job, I was asked to create a backup procedure that dumps a PostgreSQL schema to a compressed file and supports both daily and weekly backups.
Each backup had to be a full backup, and the number of daily and weekly backups to keep should be defined through thresholds.

The PostgreSQL tool used for these backups is ‘pg_dump’, and I used Perl to glue all the interesting parts together.

The script will basically go through the following steps:

  • Check the backup path for the required directories (and create them if they are missing)
  • Rotate old backups based on thresholds
  • Create a new backup

The script shown below is just an example and probably needs to be adapted to your own needs. It works for me and the environment it was created in.

First things first.
The script uses the following Perl modules:

use DateTime;
use Getopt::Long;
use Pod::Usage;
use YAML qw/LoadFile/;
use File::Path qw/make_path/;
use File::Copy;
use File::Basename;
use Data::Dumper;
use POSIX qw/setuid/;

A YAML configuration file is used to provide the script with essential information. An example configuration file looks like the following:

thresholds:
    daily: 7
    weekly: 4

backup_path: /data/backup/schema_backups

database: my_db

daily_to_weekly_pattern: sunday

schemas:
    - my_cool_schema
    - my_not_so_cool_schema

Remember: YAML is sensitive about tabs!

Command line arguments are set up in the script by using Getopt::Long.

my ($help, $cfg_file, $schema, $verbose, $debug);
# Check command line arguments
GetOptions(
    "help"     => \$help,
    "verbose"  => \$verbose,
    "debug"    => \$debug,
    "cfg=s"    => \$cfg_file,
    "schema=s" => \$schema,
);
pod2usage(1) if $help;
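
pod2usage() prints the SYNOPSIS section of the script's POD when ‘--help’ is given. The POD itself is not shown in this post; a minimal block (with hypothetical wording) could look like this:

=head1 SYNOPSIS

backup_schema.pl [--help] [--verbose] [--debug] --cfg <config file> [--schema <name>]

=cut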

The script needs to run as the ‘postgres’ user. Should it be executed by another user (for instance root), then the script will try to switch to the ‘postgres’ user.

# getpwuid() in scalar context returns the username
my $user = getpwuid($<);
unless ($user eq 'postgres') {
    p_info("Script $0 needs to run as 'postgres', switching user...");
    setuid(scalar getpwnam 'postgres');
}
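
The p_info() and p_debug() helpers used throughout this post are defined in the full script (linked at the end). Assuming they simply honour the --verbose and --debug flags, a minimal stand-in could look like this:

sub p_info {
    # print informational messages when --verbose or --debug was given
    print "[INFO] @_\n" if $verbose or $debug;
}

sub p_debug {
    # print debugging messages only when --debug was given
    print "[DEBUG] @_\n" if $debug;
}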

Next, we load the configuration file and check whether a schema name was supplied on the command line. If one was given, it overrides the schema names set in the configuration, and only that single schema is backed up.

my $cfg;
if(defined $cfg_file){
    if( -f $cfg_file ){
        p_info("Loading configuration file '$cfg_file'");
        $cfg = LoadFile($cfg_file);
    }
    else {
        die "No such configuration file '$cfg_file'\n";
    }
}

$cfg->{schemas} = [$schema] if defined $schema;
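
Since the rest of the script blindly dereferences $cfg, a small sanity check after loading does not hurt. The following is not part of the original script, just a sketch which fails early when a required key is missing:

# Sanity check (sketch): fail early on missing configuration keys
foreach my $key (qw/thresholds backup_path database schemas/){
    die "Missing configuration key '$key'\n" unless defined $cfg->{$key};
}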

And now we are ready for the main loop of the script:

foreach my $s (@{ $cfg->{schemas} }){
    check_current_backups($s);
    create_backup($s);
}

For each schema, we first check whether the required directories are in place and create them if they are missing. Afterwards, we check those directories for older backups.

sub check_current_backups {
    my($schema) = @_;

    check_directory_structure($schema);
    check_backups('daily', $schema);
    check_backups('weekly', $schema);
}

sub check_directory_structure {
    my($schema) = @_;

    foreach my $period (qw/daily weekly/){
        my $_path = return_backup_path($period, $schema);
        p_info("Checking path '$_path'");
        unless(-d $_path){
            make_path($_path);
            p_info("Created path '$_path'");
        }
    }
}
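
The return_backup_path() helper is also only part of the full script. Given the configuration above, and assuming a layout of backup_path/schema/period, a minimal version could be:

sub return_backup_path {
    my($period, $schema) = @_;

    # e.g. /data/backup/schema_backups/my_cool_schema/daily
    return join '/', $cfg->{backup_path}, $schema, $period;
}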

# check if older backups need rotation / deletion
sub check_backups {
    my($period, $schema) = @_;

    my $path = return_backup_path($period, $schema);

    my @files = glob("$path/*");
    my @sorted = sort { get_date($b) <=> get_date($a) } @files;

    if(scalar @sorted >= $cfg->{thresholds}{$period}){
        p_info("Rotating backups for period '$period'");
        rotate_backups($period, $schema, \@sorted);
    }
}
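
get_date() extracts a sortable timestamp from a backup filename, so that the newest backup ends up first in @sorted. It is defined in the full script; assuming the filename format used in create_backup() below (a 14-digit timestamp followed by the day name), a minimal version could be:

sub get_date {
    my($file) = @_;

    # pull the timestamp out of e.g. .../20170219153000_sunday.dump.sql.gz
    my ($stamp) = $file =~ /(\d{14})_/;
    return $stamp // 0;
}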

The rotation of the backups works as follows: if the threshold for a period has been reached (for instance 7 daily backups), the oldest files are nominated for rotation or deletion.

The rotation itself is custom designed for my current job. The day name (monday, tuesday, …) is appended to each backup filename. Backup files matching a certain pattern (in my situation ‘sunday’) will be moved into the ‘weekly’ backup path; other old files will be deleted.

Since rotation is done before the new backup is created, we delete one file more than the threshold requires, to make room for the backup file that is about to be created a few lines further down.

sub rotate_backups {
    my($period, $schema, $files) = @_;

    p_debug("All Files: ".Dumper($files));
    p_debug("$period threshold: ".$cfg->{thresholds}{$period});

    # make a true copy
    my (@to_move_files) = (@{ $files });
    # The @files contains all backup files, with the youngest as element 0, the oldest 
    # backup as last element.
    # @to_move_files is a slice of @files, starting from the position threshold - 1, 
    # until the end of the array. Those files will be either rotated or removed
    @to_move_files = @to_move_files[ $cfg->{thresholds}{$period} -1 .. $#to_move_files ];
    p_debug("TO MOVE FILES: ".Dumper(\@to_move_files));

    if($period eq 'daily'){
        foreach my $file (@to_move_files){
            # move backups to weekly
            if($file =~ /$cfg->{daily_to_weekly_pattern}/){
                p_info("Moving daily backup '$file' to weekly");
                move($file, return_backup_path('weekly', $schema) . '/' . basename($file));
            }
            else {
                p_info("Removing backup '$file'");
                unlink($file);
            }
        }
    }

    if($period eq 'weekly'){
        foreach my $file (@to_move_files){
            # remove files
            p_info("Removing backup '$file'");
            unlink($file);
        }
    }
}

At this point, the required directory structure has been checked and is in place, and older backup files have been rotated or deleted.
Finally, we can create the actual backup:

sub create_backup {
    my($schema) = @_;

    p_info("Creating backup for schema '$schema', database: " . $cfg->{database});
    my $now = DateTime->now;
    my $path = return_backup_path('daily', $schema) 
                . '/' . $now->ymd('') . $now->hms('')
                . '_' .lc($now->day_name) 
                . '.dump.sql';

    # Create the dump file
    my $dump_output = do{
        local $/;
        open my $c, '-|', "pg_dump -v -n $schema -f $path $cfg->{database} 2>&1" 
            or die "pg_dump for '$schema' failed: $!";
        <$c>;
    };
    p_debug('pg_dump output: ', $dump_output);

    # GZIP the dump file
    my $gzip_output = do{
        local $/;
        open my $c, '-|', "gzip $path 2>&1" 
            or die "gzip for '$path' failed: $!";
        <$c>;
    };
    p_debug('gzip output: ', $gzip_output);

    # change the permissions
    chmod 0660, "$path.gz";

    p_info("Created backup for schema '$schema' in '$path.gz'");
}

The backup is created by running pg_dump for that schema, which produces a plain-text SQL file. This file is then compressed with gzip, and afterwards the file permissions are changed to 0660. Since the backup file is created by the postgres user, this means that only the postgres user and its group have access to the file.
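
Restoring such a backup is not part of the script, but since the dump is plain SQL, it can be fed straight back into psql. A hypothetical restore, wrapped in Perl the same way as the calls above:

my $restore_output = do{
    local $/;
    open my $c, '-|', "gunzip -c $backup_file | psql $cfg->{database} 2>&1"
        or die "restore of '$backup_file' failed: $!";
    <$c>;
};

Here $backup_file is assumed to hold the path to the compressed dump file.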

The full script and configuration file can be found at https://github.com/insani4c/perl_tools/tree/master/backup_schema
