Postgresql: Monitor unused indexes

Working on large database systems, with many tables and many indexes, it is easy to loose the overview on what is actually being used and what is just consuming unwanted disk space.
If indexes are not closely monitored, they could end up using undesired space and moreover, they will consume unnecessary CPU cycles.

Statistics about indexes can be easily retrieved from the PostgreSQL database system. All required information is stored in two tables:

  • pg_stat_user_indexes
  • pg_index

When joining these two tables, interesting information can be read in the following columns:

  • idx_scan: has the query planner used this index for an ‘Index Scan’, the number returned is the amount of times it was used
  • idx_tup_read: how many tuples have been read by using the index
  • idx_tup_fetch: how many tuples have been fetch by using the index

A neat function called pg_relation_size() allows to fetch the on-disk size of a relation, in this case the index.

Based on this information, the monitoring query will be built up as follows:

SELECT 
    relid::regclass AS table, 
    indexrelid::regclass AS index, 
    pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size, 
    idx_tup_read, 
    idx_tup_fetch, 
    idx_scan
FROM 
    pg_stat_user_indexes 
    JOIN pg_index USING (indexrelid) 
WHERE 
    idx_scan = 0 
    AND indisunique IS FALSE

Now, all we need to do is write a script which stores this information in some kind of file and periodically report about the statistics.

First of all we will need a configuration file, which contains the database credentials.
I’ve chosen YAML because it is so versatile.

It will contain two important sets of information:

  • The database credentials
  • path to the state file

Example:

dsn: "dbi:Pg:host=/var/run/postgresql;database=testdb"
user: postgres
pass:
state_file: /var/tmp/monitor_unused_indexes.state

As you can see, we will be connect to the PostgreSQL database by using its UNIX socket.

The script will use Text::ASCIITable to output the statistics in a nice table. Storable is used to save our statistics to disk.

In the below script, we will check if an index was unused in a timespan of 30 days. If yes, the script will report this index to STDOUT.
Therefore, we will store a score and timestamp for each unused index in the state file.

#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use DBI;
use Storable qw/nstore retrieve/;
use YAML qw/LoadFile/;
use POSIX qw/setuid/;
use Getopt::Long;
use DateTime;
use Text::ASCIITable;

my $cfg_file = './monitor_unused_indexes.yaml';
my $verbose = 0;
GetOptions("cfg=s" => \$cfg_file,
           "verbose|v" => \$verbose, 
        );

my $sql = <<EOS;
SELECT 
    relid::regclass AS table, 
    indexrelid::regclass AS index, 
    pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size, 
    idx_tup_read, 
    idx_tup_fetch, 
    idx_scan
FROM 
    pg_stat_user_indexes 
    JOIN pg_index USING (indexrelid) 
WHERE 
    idx_scan = 0 
    AND indisunique IS FALSE
EOS

my ($cfg) = LoadFile($cfg_file);

# suid to postgres, other whatever user is configured in the config.yaml file
setuid(scalar getpwnam $cfg->{user});

# Connect to the database
my $dbh = DBI->connect($cfg->{dsn}, $cfg->{user}, $cfg->{pass}) 
            or die "Could not connect to database: $! (DBI ERROR: ".$DBI::errstr.")\n";

my $state;
if(-f $cfg->{state_file}){
    $state = retrieve $cfg->{state_file};
}

# Fetch the statistics
my $results = $dbh->selectall_arrayref( $sql, undef );

my $now_dt   = DateTime->now;

# Initialize the ASCII table
my $t = Text::ASCIITable->new({ headingText => 'INDEX STATISTICS'});
$t->setCols(qw/Table Index Index_Size idx_tup_read idx_tup_fetch idx_scan/);

# Analyze the results
foreach my $r (@$results){
    if($verbose){
        $t->addRow(@{$r});
    }
    # Only update the state file if --verbose was not specified.
    # This way the script can be check manually with --verbose many times and executed for instance
    # from a cronjob once a day without --verbose
    else {
        if(defined $state->{unused_indexes}{$r->[1]}){
            my $first_dt = DateTime->from_epoch( epoch => $state->{unused_indexes}{$r->[1]}{first_hit} );
            if($first_dt->add(days => $state->{unused_indexes}{$r->[1]}{score})->day == $now_dt->day ) {
                $state->{unused_indexes}{$r->[1]}{score}++;
            }
            else {
                $state->{unused_indexes}{$r->[1]}{score}     = 1;
                $state->{unused_indexes}{$r->[1]}{first_hit} = $now_dt->epoch;;
            }
        }
        else {
            $state->{unused_indexes}{$r->[1]}{score}     = 1;
            $state->{unused_indexes}{$r->[1]}{first_hit} = $now_dt->epoch;;
        }
    }
}

# Print out the statistics table, if --verbose was specified
print $t if $verbose; 

# Store the statistics to disk in a state file
nstore $state, $cfg->{state_file};

foreach my $idx (keys %{ $state->{unused_indexes} }){
    my $first_dt = DateTime->from_epoch( epoch => $state->{unused_indexes}{$idx}{first_hit} );
    if( $first_dt->add(days => 30) <= $now_dt ){
        my $line = "Index: $idx ready for deletion";
        $line .= " (score:" . $state->{unused_indexes}{$idx}{score};
        $line .= " (|first_hit:" . DateTime->from_epoch(epoch => $state->{unused_indexes}{$idx}{first_hit})->ymd . ")";

        print $line."\n" if $verbose;
    }
}

flattr this!


Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>