dark

Perl: Archive E-Mails in an IMAP Folder

blank
blank

IMAP folders are really because you can have your e-mails synchronized on multiple devices, without losing e-mails across your devices when retrieving your new e-mails. IMAP folders actually also aren’t that cool because e-mails are usually never deleted or even archived. Having millions of e-mails can make some e-mail readers on certain devices really slow.

The below script is an example how to clean and archive e-mails. The script was written in Perl and tested on a Courier IMAP server. Remember the Perl devise: there are million ways to write Perl scripts.

The following modules were used:

use Net::IMAP::Simple::SSL;
use Email::Simple;
use Getopt::Long qw/:config bundling/;
use DateTime;
use YAML qw/LoadFile/;
use Log::Log4perl;
use Pod::Usage;
use Data::Dumper;

The configuration file is written in YAML and the logging of the script is handled by Log4Perl.

log4perl.logger.shihai= DEBUG, shihaiLogfile
log4perl.logger.hostlistdb= DEBUG, shihaiLogfile

log4perl.appender.shihaiLogfile          = Log::Log4perl::Appender::File
log4perl.appender.shihaiLogfile.filename = /var/tmp/shihai.log
log4perl.appender.shihaiLogfile.layout   = Log::Log4perl::Layout::PatternLayout
log4perl.appender.shihaiLogfile.layout.ConversionPattern = %d %p [%x][%r millis][%c][%F{1}:%L][%M] %m%n

The configuration file contains the login credentials and the threshold values:

imap:
    host: 'mail.example.com'
    user: 'bob@example.com'
    pass: 'myultrasecretpassword'

threshold:
    archive:
        years: 3
    delete:
        years: 8

The threshold values are in fact the same parameters that can be used in the DateTime method ‘subtract()’;

The main loop of the program will:

  • Connect to the IMAP server
  • Retrieve all IMAP folders and loop through them
  • Loop through all messages in each mailbox
# Connect to the IMAP server
my $imap = connect_imap( $config->{imap} );

# get all mailboxes and loop through them
my @mailboxes = get_mailboxes($imap);
foreach my $mailbox_name (@mailboxes){

    # Skip all Archive boxes 
    next if $mailbox_name =~ /Archive/;

    # select the mailbox and get the number of messages
    my $mb = $imap->select($mailbox_name);
    unless(defined $mb){
        $logger->error("Mailbox [$mailbox_name] doesn't exist: ", $imap->errstr());
        next;
    }

    $logger->info("Scanning $mailbox_name");

    # loop through the messages
    foreach my $i (1 .. $mb){
        my ($from, $subject, $date, $year) = get_mail_header($imap, $i);

        if(defined $date){
            if($date < $delete_date ){
                delete_mail($imap, $i);
            }
            elsif($date < $archive_date){
                $logger->info("Archiving [$i][$from][$subject][$date]");
                my $archive_box = get_archive_box($imap, $mailbox_name);
                move_mail($imap, $i, $archive_box)
            }
        }
    }

    $imap->expunge_mailbox($mailbox_name);
}

$imap->quit;

Let’s go through the different subroutines called in the mainloop.

connect_imap()

sub connect_imap {
    my ($cfg) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    my $imap = Net::IMAP::Simple::SSL->new($cfg->{host})
        or $logger->logdie("Unable to connect to IMAP server: $Net::IMAP::Simple::errstr");
    $logger->info("Connected to IMAP host $cfg->{host}");

    unless( $imap->login($cfg->{user}, $cfg->{pass}) ){
        $logger->logdie("Login failed: ", $imap->errstr);

    }
    $logger->info("Logged to IMAP host $cfg->{host} as user '$cfg->{user}'");

    return $imap;
}

The parameters expected in the ‘$cfg’ hash are:

  • host
  • user
  • pass

get_mailboxes()

sub get_mailboxes {
    my ($imap) = @_;
    my @mailboxes = $imap->mailboxes;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    return @mailboxes;
}

Ok, I admit, I shouldn’t have created an extra subroutine for it… but i was kind of in a flow!

get_mail_header()

sub get_mail_header {
    my ($imap, $i) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    my $header = $imap->top($i);
    unless( $header ){
        $logger->error("No header found for message $i in ", $imap->current_box);
    }
    
    my $email = Email::Simple->new(join '', @{ $header });
    unless( $email ){
        $logger->error("No Email::Simple object, skipping...");
        return
    }

    my ($subject) = $email->header('Subject');
    my ($date)    = $email->header('Date');
    my ($from)    = $email->header('From');

    # $logger->debug("Got e-mail [$from] [$subject] [$date]");

    unless(defined $date){
        $logger->error("No date found: ", $email->header_obj->as_string);
        delete_mail($imap, $i);
        return;
    }
    
    my($junk, $day, $month, $year) = ( $date =~ m/(...,\s+)?([0-9]{1,2})\s+(...)\s+(\d{4})/ );

    my $date_obj;
    if(defined $year && defined $month && defined $day){
        $date_obj = DateTime->new(
            year  => $year,
            month => $months{$month},
            day   => $day,
        );
    }    

    return ($from, $subject, $date_obj, $year, $month, $day);
}

This subroutine takes the ‘$imap’ object and the message number as input parameters. It will then retrieve the mail header and convert it to an ‘Email::Simple’ object. I’ve chosen this module so I can easily extract header fields.
If no ‘Date:’ field was found in the e-mail, then the tool will just delete the email. I don’t like e-mails with wrong or missing headers 😉 (they’re usually spam anyway).

delete_mail()

sub delete_mail {
    my($imap, $i) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    if( $imap->delete($i) ){
        $logger->info("Deleted message number $i from ", $imap->current_box);
    }
}

Pretty straightforward.

get_archive_box()

sub get_archive_box{
    my($imap, $mailbox_name) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    my ($archive_box) = ($mailbox_name);
    $archive_box =~ s/INBOX/INBOX.Archive/;

    if( not grep /^$archive_box$/, @mailboxes) {
        create_mailbox($imap, $archive_box);
        subscribe($imap, $archive_box);
    }

    return $archive_box;       
}

This subroutine will assemble the archive mailbox name. It will then check if the mailbox already exists and otherwise create it and subscribe to it.

create_mailbox()

sub create_mailbox {
    my($imap, $mb) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    $imap->create_mailbox($mb) or $logger->logdie("Mailbox creation '$mb' failed: ", $imap->errstr());
    $logger->info("Created mailbox $mb");
}

It will basically just create the mailbox and log about it.

subscribe()

sub subscribe {
    my($imap, $box) =@_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    $imap->folder_subscribe($box);
    $logger->info("Subscribed to mailbox $box");
}

move_mail()

sub move_mail {
    my($imap, $i, $new_box) = @_;
    my $logger = Log::Log4perl->get_logger('shihai.archive_mail');

    if( $imap->copy($i, $new_box) ){
        $logger->info("Copied message number [$i] from ", $imap->current_box ," to [$new_box]");
        delete_mail($imap, $i);
    }
}

Moving an e-mail consists of copying it first to the new mailbox and then afterwards removing it from the old mailbox.

And that’s basically it!

1 comment
Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Previous Post
blank

Perl: SSL Communication in web applications

Next Post
blank

OSSEC: building an OpenBSD package

Related Posts