Missing indexes on large tables can have a very negative impact on the duration of a SQL query. Without a usable index, the query planner will decide to make a sequential scan, which means the query reads through the table row by row to find the required data. When the table is only 100 rows big, you will probably not even notice the sequential scans, but if your table is 1,000,000 rows big or more, you can probably optimize it with indexes to get much faster searches.
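As a quick illustration (the table and column names here are hypothetical), EXPLAIN shows which plan PostgreSQL chooses; on a sufficiently large table, adding an index typically changes a sequential scan into an index scan:

```sql
-- Hypothetical example table with many rows
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Typically shows: Seq Scan on orders (no usable index yet)

CREATE INDEX orders_customer_id_idx ON orders (customer_id);

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Typically shows: Index Scan using orders_customer_id_idx on orders
```

Note that for very small tables the planner may still prefer a sequential scan, because reading the whole table is cheaper than using the index.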
In the example script we will be using a Storable state file, and we will store the statistics as a JSON object in the PostgreSQL database.
First let’s take a look at the query we will be executing:
SELECT schemaname, relname, seq_tup_read
FROM pg_stat_all_tables
WHERE seq_tup_read > 0
AND relname NOT LIKE 'pg_%'
ORDER BY seq_tup_read DESC
As you can see, PostgreSQL exposes all the information we need about our tables in a single system view, called pg_stat_all_tables. This view has a column called seq_tup_read, which contains the number of rows fetched by sequential scans.
Just reading out this information once is not going to be enough, because these counters accumulate from the startup of your PostgreSQL database. Since production databases aren't restarted that often, we will have to compare the current values with previously recorded ones (hence the Storable state file).
Our plan is to run the script from a cronjob, every 5 minutes.
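A matching crontab entry could look like this (the script path is hypothetical; since the script setuids to postgres itself, it can be run from root's crontab):

```
*/5 * * * * /usr/local/bin/seq_tup_read.pl mydatabase
```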
The statistics are also stored as a JSON object in a database, so that we could build a web interface for the statistics at a later stage, and so that we keep a history of these statistics.
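A minimal schema for this statistics table could look like the sketch below. Only the data column is actually referenced by the script's INSERT statement; the id and created_at columns are assumptions, added so each stored snapshot can be identified and ordered historically:

```sql
CREATE SCHEMA IF NOT EXISTS mydbschema;

CREATE TABLE mydbschema.seq_tup_read (
    id         serial PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),
    data       json NOT NULL
);
```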
Furthermore, the script will setuid to the postgres user (similar to su - postgres on the command line), so that it can connect to the PostgreSQL UNIX socket file.
use strict;
use warnings;
use utf8;
use DBI;
use DateTime;
use POSIX qw/setuid/;
use Storable qw/retrieve nstore/;
use Text::ASCIITable;
use JSON;

my $db = 'mydatabase';
if (scalar @ARGV) {
    $db = shift @ARGV;
}

my $host = '/var/run/postgresql';
my $user = 'postgres';
my $pass = undef;

my $state_db   = 'database_statistics';
my $state_host = '192.168.1.1';
my $state_user = 'skeletor';
my $state_pass = 'he-manisawhimp';
my $state_file = '/var/tmp/sequence_read.state';

# setuid to postgres
setuid(scalar getpwnam 'postgres');

# Define the state and load the state file, if it exists
my $state = {};
$state = retrieve $state_file if -f $state_file;

my $now = DateTime->now;

# Connect to the database which we want to monitor
my $dbh = DBI->connect("dbi:Pg:dbname=$db;host=$host", $user, $pass)
    or die "Could not connect to database: $DBI::errstr\n";

# Connect to the database that will be used to store the statistics
my $state_dbh = DBI->connect("dbi:Pg:dbname=$state_db;host=$state_host",
    $state_user, $state_pass)
    or die "Could not connect to the State database '$state_db': $DBI::errstr\n";

my $sql = <<EOF;
SELECT schemaname, relname, seq_tup_read
FROM pg_stat_all_tables
WHERE seq_tup_read > 0
AND relname NOT LIKE 'pg_%'
ORDER BY seq_tup_read DESC
EOF

# Get the statistics
my $results = $dbh->selectall_arrayref($sql, undef);

# Store the statistics as a JSON object in the second database
eval {
    $state_dbh->do('INSERT INTO mydbschema.seq_tup_read (data) VALUES (?)',
        undef, encode_json($results));
};
if ($@) {
    print "Insert into state-db failed: $@\n";
}

# Prepare a nice ASCII table for output
my $t = Text::ASCIITable->new({
    headingText => 'Seq Tup Read ' . $now->ymd('-') . ' ' . $now->hms(':')
});
$t->setCols('Schema Name', 'Relation Name', 'Seq Tup Read', 'Increase (delta)');

my $row_count = 0;
foreach my $r (@{$results}) {
    last if $row_count > 25;
    my @values = @{$r};
    my ($increase, $delta) = (0, 0);
    my $key = $r->[0] . ':' . $r->[1];

    # Calculate the increase and its delta
    if (defined $state->{last}{$key}{seq_tup_read}) {
        $increase = $r->[2] - $state->{last}{$key}{seq_tup_read};
        $delta    = $increase / $state->{last}{$key}{seq_tup_read} * 100;
        push @values, sprintf '%.0f (%.4f %%)', $increase, $delta;
    }
    else {
        push @values, '0 (0%)';
    }

    # Store this information for the next run of the script
    $state->{last}{$key}{seq_tup_read} = $r->[2];
    $state->{last}{$key}{delta}        = $delta;
    $state->{last}{$key}{increase}     = $increase;

    # Only add the row to the ASCII output table if there was an increase
    next unless $increase > 0;
    $t->addRow(@values);
    $row_count++;
}

# Print out the ASCII table
print $t;

# Save the state for the next run
nstore $state, $state_file;
Nice, I was working on something like this, but with bash scripts and a repository database. Your approach appears to be quicker to implement. I never thought of storing the data as a JSON object; that is more flexible than my approach. Thanks for posting.
Hi there,
Could you perhaps give a small schema for the destination/stats db?