
Watch a directory for uploaded files


In some situations, you need to watch a directory for uploaded files and move or process them immediately. Many scripts do this by polling: they check for new files every few seconds. This is nowadays completely out of fashion and generally not cool.

A real watcher script uses the OS’s built-in features, and Linux is therefore the number one choice for this kind of operation.
The Linux kernel has a feature called inotify. inotify is an inode-based file system notification mechanism, which can be used from almost every real programming language.

The following example uses the Perl programming language. The Perl module that will be used is called Linux::Inotify2, which is able to catch the inotify notifications.

The example script will watch a certain directory for files (which will be uploaded using HTTP, FTP, SFTP or which will be just moved into that specific directory).
It will also use other Perl modules, such as:
– AnyEvent: for event looping
– File::Copy: to move the uploaded file to another directory
– Digest::MD5: to check whether a file has already been uploaded; we don’t want duplicates
– Storable: to keep a state file of uploaded files
– feature: to enable the cool newer features of Perl (say and given/when)

The first part of the script loads all the required modules and configures some variables that will be used in the script.

#!/usr/bin/perl
use strict; use warnings;

# Requirements
use AnyEvent;
use Linux::Inotify2;
use File::Copy qw/move/;
use Digest::MD5 qw/md5_hex/;
use Storable qw/nstore retrieve/;
use feature qw/say switch/;

# Config
my $savedir   = '/tmp/savedir';
my $chroot    = '/tmp/upload_dir';
my $statefile = '/tmp/upload_statefile';
my $state;

Next the script will check if there is already a statefile on disk and if so, the script will load it.
It will also catch the SIGTERM and SIGINT signals (when the script is killed or stopped) and store the statefile when the signals are caught.

if(-f $statefile){
    $state = retrieve($statefile);
}

# Store state by 'murder murder murder, kill kill kill'
$SIG{TERM} = sub { $state && nstore( $state, $statefile ); exit 0 };
$SIG{INT}  = $SIG{TERM};
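
The state handling above comes down to a Storable round trip. The following is a minimal sketch of that round trip, not part of the original script. It assumes the state is a hash reference mapping a file's basename to its MD5 digest (which is how check_n_move uses it later), and it writes to a throwaway temporary directory; the file name report.csv and the digest value are made up for illustration.

```perl
#!/usr/bin/perl
use strict; use warnings;

use File::Temp qw/tempdir/;
use Storable qw/nstore retrieve/;

# Hypothetical state: basename => MD5 hex digest of the last accepted upload
my $state = { 'report.csv' => '5d41402abc4b2a76b9719d911017c592' };

# Throwaway location so the sketch does not touch the real state file
my $dir       = tempdir( CLEANUP => 1 );
my $statefile = "$dir/upload_statefile";

# nstore writes in network byte order, so the file is portable across machines
nstore( $state, $statefile );

# retrieve returns a reference to a fresh copy of the stored structure
my $loaded = retrieve($statefile);

print "$_ => $loaded->{$_}\n" for sort keys %$loaded;
```

Because the script only stores the state from the signal handlers, anything learned since startup is lost if the process dies some other way (a crash, or SIGKILL). Calling nstore after each successful move would be one way around that, at the cost of extra writes.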

Now the fun part starts. The script will
– initialize the event loop
– create a watcher
– wait for events.

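Only three inotify events will be watched:
– IN_MOVED_TO
– IN_CLOSE_WRITE
– IN_DELETE
Other events are not important here. IN_CLOSE_WRITE is received when a file that was open for writing is closed, which is what happens when an upload finishes or a file is copied into the directory; only then will our action be called. A file renamed into the directory from elsewhere on the same file system raises IN_MOVED_TO instead, which this example merely logs. For the sake of the example, the action will just move the new file into another directory if it passes some tests.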

# enable event loop
my $cv = AnyEvent->condvar;

# watch dir
my $dir_inotify = Linux::Inotify2->new;
my $dir_w = $dir_inotify->watch(
    $chroot,
    IN_MOVED_TO|IN_CLOSE_WRITE|IN_DELETE,
    sub {
        my $e = shift;
        my $filename = $e->fullname;

        given($e->mask){
            when(IN_MOVED_TO)       { say scalar localtime() . " file:$filename IN_MOVED_TO" }
            when(IN_CLOSE_WRITE)    { say scalar localtime() . " file:$filename IN_CLOSE_WRITE";
                                      check_n_move($filename);
                                    }
            when(IN_DELETE)         { say scalar localtime() . " file:$filename IN_DELETE" }
            default                 { say scalar localtime() . " The end of the world!";
                                      unlink $filename;
                                    }
        };
        
    }
);
my $inotify_w; $inotify_w = AnyEvent->io (
       fh => $dir_inotify->fileno, poll => 'r', cb => sub { $dir_inotify->poll }
);

# Wait for events
$cv->recv();
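
One detail of the callback is worth a closer look: an inotify mask is a bit field, and the kernel may set extra bits such as IN_ISDIR next to the event bit. A comparison of the whole mask against a single constant then no longer matches, and the event falls through to the default branch. The sketch below illustrates the difference with a bitwise test. The constant values are copied from <sys/inotify.h> purely so the example runs without the module, and event_name is a hypothetical helper, not something Linux::Inotify2 provides.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Values copied from <sys/inotify.h> for illustration only; in the real
# script these constants are exported by Linux::Inotify2.
use constant {
    IN_CLOSE_WRITE => 0x00000008,
    IN_MOVED_TO    => 0x00000080,
    IN_DELETE      => 0x00000200,
    IN_ISDIR       => 0x40000000,
};

# Hypothetical helper: name the first watched event whose bit is set in $mask
sub event_name {
    my ($mask) = @_;
    return 'IN_MOVED_TO'    if $mask & IN_MOVED_TO;
    return 'IN_CLOSE_WRITE' if $mask & IN_CLOSE_WRITE;
    return 'IN_DELETE'      if $mask & IN_DELETE;
    return 'unknown';
}

# A subdirectory removed from the watched directory: two bits are set
my $mask = IN_DELETE | IN_ISDIR;

print "equality test: ", ( $mask == IN_DELETE ? 'match' : 'no match' ), "\n";
print "bitwise test:  ", event_name($mask), "\n";
```

In the watcher itself, the same idea could be expressed with `$e->mask & IN_DELETE`, or with the per-event boolean methods that the Linux::Inotify2 documentation describes for event objects (for example `$e->IN_DELETE`).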

The action used for the IN_CLOSE_WRITE notification is defined below. It is pretty straightforward and is left for the reader to analyze.

#
# The Subs
#
sub check_n_move {
    my ($filename) = @_;
    return unless -f $filename;
    my $basename = (split /\//, $filename)[-1];

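    # MD5 sum
    # Read the whole file; $filedata stays undefined if the open or read fails
    my $filedata;
    if(open my $fh, '<:raw', $filename){
        local $/;
        $filedata = <$fh>;
        close $fh;
    }
    if(!defined $filedata){
        say scalar localtime() . " MD5 of $filename failed: $!";
        unlink $filename;
        return 0
    }
    my $md5 = md5_hex($filedata);
    say scalar localtime() . " $filename has MD5sum $md5";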

    # Check state
    if(exists($state->{$basename}) && $state->{$basename} ne $md5) {
        say scalar localtime() . " $filename already uploaded, but has new content, updating...";
    }
    elsif(exists($state->{$basename}) && $state->{$basename} eq $md5){
        say scalar localtime() . " $filename already uploaded, skipping...";
        unlink $filename;
        return 0
    }

    # Move file
    my $newfilename = "$savedir/$basename";
    # File::Copy::move returns false on failure instead of dying, so test its return value
    if(!move($filename, $newfilename)){
        say scalar localtime() . " move of $filename to $newfilename failed: $!";
        say scalar localtime() . " $filename removed, skipping...";
        unlink $filename;
        return 0
    }
    say scalar localtime() . " moved $filename to $newfilename";

    # Update state
    $state->{$basename} = $md5;

    return 0;
}
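
For readers who prefer to see the state check in isolation, it can be read as a three-way decision. The following sketch pulls that decision out into a small function. The name classify_upload and its return values are made up for this illustration; the original script does the same checks inline.

```perl
#!/usr/bin/perl
use strict; use warnings;

use Digest::MD5 qw/md5_hex/;

# Hypothetical helper: decide what to do with an upload, given the state hash
# (basename => MD5 of the last accepted content) and the new file's digest.
sub classify_upload {
    my ($state, $basename, $md5) = @_;
    return 'new'       unless exists $state->{$basename};
    return 'duplicate' if $state->{$basename} eq $md5;
    return 'updated';
}

# Pretend a.txt with content "hello" has been uploaded before
my $state = { 'a.txt' => md5_hex('hello') };

for my $case ( [ 'a.txt', 'hello' ], [ 'a.txt', 'hello world' ], [ 'b.txt', 'hello' ] ) {
    my ($name, $content) = @$case;
    print "$name: ", classify_upload( $state, $name, md5_hex($content) ), "\n";
}
```

Note that the state is keyed on the file name, not on the digest. A file with known content but a new name therefore counts as new, so "duplicate" here means the same name with the same content.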