Postgresql: Monitor sequence scans with Perl


Not using indexes or huge tables without indexes, can have a very negative impact on the duration of a SQL query. The query planner will decide to make a sequence scan, which means that the query will go through the table sequentially to search for the required data. When this table is only 100 rows big, you will probably not even notice it is making a sequence scans, but if your table is 1,000,000 rows big or even more, you can probably optimize your table to use indexes to result in faster searches.

In the example script we will be using a Storable state file and we will the statistics as a JSON object in the PostgreSQL database.

First let’s take a look at the query we will be executing:

SELECT schemaname, relname, seq_tup_read 
FROM pg_stat_all_tables 
WHERE seq_tup_read > '0' 
      AND relname NOT LIKE 'pg_%'
ORDER BY seq_tup_read desc

As you can see, PostgreSQL stores all the information we need about our tables in just one table, called pg_stat_all_tables. In this table there is a column called seq_tup_read, which will contain the information we need.

Just reading out this information is not going to be enough, because it contains information since the startup of your PostgreSQL database. Since production databases aren’t restarted (that often), we will have to compare this information with some previous information (hence the Storable state file).
Our plan is to run the script in a cronjob, each 5 minutes.

The statistics are also stored in as a JSON object in a database, just so that we could build some web interface for the statistics, in a later stage. And we want to keep a history of these statistics.

Furthermore the script will setuid to postgres (same like su – postgres on the command line), so that it could connect to the PostgreSQL UNIX socket file.

use strict;
use warnings;
use utf8;

use DBI;
use DateTime;
use POSIX qw/setuid/;
use Text::ASCIITable;
use JSON;

my $db   = 'mydatabase';
if(scalar @ARGV){
    $db = shift @ARGV;

my $host = '/var/run/postgresql';
my $user = 'postgres';
my $pass = 'undef';

my $state_db   = 'database_statistics';
my $state_host = '';
my $state_user = 'skeletor';
my $state_pass = 'he-manisawhimp';

my $state_file = '/var/tmp/sequence_read.state';

# suid to postgres
setuid(scalar getpwnam 'postgres');

# define and open up the state file
my $state = {};
$state = retrieve $state_file if -f $state_file;

my $now      = DateTime->now;

# Connect to the database which we want to monitor
my $dbh = DBI->connect("dbi:Pg:dbname=$db;host=$host", $user, $pass) 
                or die "Could not connect to database: $!\n";

# Connect to the database that will be used to store the statistics
my $state_dbh = DBI->connect("dbi:Pg:dbname=$state_db;host=$state_host", $state_user, $state_pass) 
                or die "Could not connect to the State database '$state_db': $!\n";

my $sql = <<EOF;
SELECT schemaname, relname, seq_tup_read 
FROM pg_stat_all_tables 
WHERE seq_tup_read > '0' 
      AND relname NOT LIKE 'pg_%'
ORDER BY seq_tup_read desc

# Get the statistics
my $results = $dbh->selectall_arrayref( $sql, undef);

# Store the statistics as a JSON object in the second databse
eval {
    $state_dbh->do('INSERT INTO mydbschema.seq_tup_read (data) VALUES(?)', undef, encode_json($results));
    print "Insert into state-db failed: $@\n";

# Prepare a nice ASCII table for output
my $t = Text::ASCIITable->new({ headingText => 'Seq Tup Read ' . $now->ymd('-')     . ' ' . $now->hms(':')});
$t->setCols('Schema Name','Relation Name ', 'Seq Tup Read', 'Increase (delta)');

my $row_count = 0;
foreach my $r (@{$results}){
    last if $row_count > 25;

    my (@values) = (@{$r});
    my ($increase, $delta) = (0, 0);
    # Calculate the increase and its delta
    if(defined $state->{last}{$r->[0].':'.$r->[1]}{seq_tup_read}){
        $increase = $r->[2] - $state->{last}{$r->[0].':'.$r->[1]}{seq_tup_read};
        $delta    = $increase / $state->{last}{$r->[0].':'.$r->[1]}{seq_tup_read} * 100;
        my $str = sprintf '%.0f (%.4f %%)', $increase, $delta;
        push @values, ($str);
    else {
        push @values, '0 (0%)';
    # Store this information for the next run of the script
    $state->{last}{$r->[0].':'.$r->[1]}{seq_tup_read} = $r->[2];
    $state->{last}{$r->[0].':'.$r->[1]}{delta}        = $delta;
    $state->{last}{$r->[0].':'.$r->[1]}{increase}     = $increase;

    # Only add the information to ASCII output table if there was an increase
    next unless $increase > 0;
# Print out the ASCII table
print $t;

nstore $state, $state_file;

  1. Nice, I was working on something like this but with bash scripts and a repository database. Your approach appears to be quicker to implement. I never thought of storing the data as a JSON object. More flexible than my approach. Thanks for posting.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Previous Post

Postgresql: Monitor unused indexes

Next Post

Monitor running processes with Perl

Related Posts