Postgresql: Monitor unused indexes

Working on large database systems, with many tables and many indexes, it is easy to loose the overview on what is actually being used and what is just consuming unwanted disk space.
If indexes are not closely monitored, they could end up using undesired space and moreover, they will consume unnecessary CPU cycles.

Statistics about indexes can be easily retrieved from the PostgreSQL database system. All required information is stored in two tables:

  • pg_stat_user_indexes
  • pg_index

When joining these two tables, interesting information can be read in the following columns:

  • idx_scan: has the query planner used this index for an ‘Index Scan’, the number returned is the amount of times it was used
  • idx_tup_read: how many tuples have been read by using the index
  • idx_tup_fetch: how many tuples have been fetch by using the index

A neat function called pg_relation_size() allows to fetch the on-disk size of a relation, in this case the index.

Based on this information, the monitoring query will be built up as follows:

Now, all we need to do is write a script which stores this information in some kind of file and periodically report about the statistics.

First of all we will need a configuration file, which contains the database credentials.
I’ve chosen YAML because it is so versatile.

It will contain two important sets of information:

  • The database credentials
  • path to the state file

Example:

As you can see, we will be connect to the PostgreSQL database by using its UNIX socket.

The script will use Text::ASCIITable to output the statistics in a nice table. Storable is used to save our statistics to disk.

In the below script, we will check if an index was unused in a timespan of 30 days. If yes, the script will report this index to STDOUT.
Therefore, we will store a score and timestamp for each unused index in the state file.


Comments

Postgresql: Monitor unused indexes — 3 Comments

  1. Nice script! Thank you for sharing your idea and handson!

    1) Why not only execute the query instead of the script to check index usage? If we use the script to create historic data about usage, can’t the pg_stat_user_indexes pg_index tables give us that information?

    2) If I really need the script, what frequency it needs to be called?

    Thank you!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.