dark

PostgreSQL 9.2 Master – Slave Monitoring

Nagios plugin script written in Bash to check the master-slave replication in PostgreSQL (tested on PostgreSQL 9.2.4) (executed on the slave).
The script will report how many bytes the slave server is behind, and how many seconds ago the last replay of data occurred.

The script must be executed as ‘postgres’ user.

#!/bin/bash

# $Id: check_slave_replication.sh 3421 2013-08-09 07:52:44Z jmorano $

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
 
## Master (p_) and Slave (s_) DB Server Information	
export s_host=$1
export s_port=$2
export p_db=$3
export p_host=$4
export p_port=$5
 
export psql=/opt/postgresql/bin/psql
export bc=/usr/bin/bc
 
## Limits
export  critical_limit=83886080 # 5 * 16MB, size of 5 WAL files
export   warning_limit=16777216 # 16 MB, size of 1 WAL file
 
master_lag=$($psql -U postgres -h$p_host -p$p_port -A -t -c "SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS offset" $p_db)
slave_lag=$($psql -U postgres  -h$s_host -p$s_port -A -t -c "SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0') AS receive" $p_db)
replay_lag=$($psql -U postgres -h$s_host -p$s_port -A -t -c "SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0') AS replay" $p_db)
replay_timediff=$($psql -U postgres -h$s_host -p$s_port -A -t -c "SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_delay" $p_db)
 
if [[ $master_lag -eq '' || $slave_lag -eq '' || $replay_lag -eq '' ]]; then
    echo "CRITICAL: Stream has no value to compare (is replication configured or connectivity problem?)"
    exit $STATE_CRITICAL
else
    if [[ $master_lag -eq $slave_lag && $master_lag -eq $replay_lag && $slave_lag -eq $replay_lag ]] ; then
        echo "OK: Stream: MASTER:$master_lag Slave:$slave_lag Replay:$replay_lag"
        exit $STATE_OK
    else
        if [[ $master_lag -eq $slave_lag ]] ; then
            if [[ $master_lag -ne $replay_lag ]] ; then
                if [ $(bc <<< $master_lag-$replay_lag) -lt $warning_limit ]; then
                    echo "OK: Stream: MASTER:$master_lag Replay:$replay_lag :: REPLAY BEHIND"
                    exit $STATE_OK
                else
                    echo "WARNING: Stream: MASTER:$master_lag Replay:$replay_lag :: REPLAY $(bc <<< $master_lag-$replay_lag)bytes BEHIND (${replay_timediff}seconds)"
                    exit $STATE_WARNING
                fi
            fi
        else
            if [ $(bc <<< $master_lag-$slave_lag) -gt $critical_limit ]; then
                echo "CRITICAL: Stream: MASTER:$master_lag Slave:$slave_lag :: STREAM BEYOND CRITICAL LIMIT ($(bc <<< $master_lag-$slave_lag)bytes)"
                exit $STATE_CRITICAL
            else
                if [ $(bc <<< $master_lag-$slave_lag) -lt $warning_limit ]; then
                    echo "OK: Stream: MASTER:$master_lag Slave:$slave_lag Replay:$replay_lag :: STREAM BEHIND"
                    exit $STATE_OK
                else
                    echo "WARNING: Stream: MASTER:$master_lag Slave:$slave_lag :: STREAM BEYOND WARNING LIMIT ($(bc <<< $master_lag-$replay_lag)bytes)"
                    exit $STATE_WARNING
                fi
            fi
        fi
        echo "UNKNOWN: Stream: MASTER: $master_lag Slave: $slave_lag Replay: $replay_lag"
        exit $STATE_UNKNOWN
    fi
fi

Possible outputs:

$ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432
WARNING: Stream: MASTER:1907958306184 Replay:1907878056888 :: REPLAY 80249296bytes BEHIND (00:03:14.056747seconds)
$ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432
OK: Stream: MASTER:2055690128376 Slave:2055690143144 Replay:2055690193744 :: STREAM BEHIND
$ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432
OK: Stream: MASTER:2055690497120 Replay:2055690497328 :: REPLAY BEHIND
$ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432
OK: Stream: MASTER:2055691704672 Slave:2055691704672 Replay:2055691704672
14 comments
    1. Yes, that function is only available starting from 9.2 (hence the version in the header of the post ;-))

  1. This script expects that you will alwasy logon as the postgres user itself, while this is not alweays the case.

    I added an extra option for the user, and edited the psql commands so it will use this user instead of the static postgres user.

    export p_user=$6

    master_lag=$($psql -U $p_user -h$p_host -p$p_port -A -t -c “SELECT pg_xlog_location_diff(pg_current_xlog_location(), ‘0/0’) AS offset” $p_db)
    slave_lag=$($psql -U $p_user -h$s_host -p$s_port -A -t -c “SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), ‘0/0’) AS receive” $p_db)
    replay_lag=$($psql -U $p_user -h$s_host -p$s_port -A -t -c “SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(), ‘0/0’) AS replay” $p_db)
    replay_timediff=$($psql -U $p_user -h$s_host -p$s_port -A -t -c “SELECT NOW() – pg_last_xact_replay_timestamp() AS replication_delay” $p_db)

  2. Hi, when I run it, I got this error. Please help to solve it. Thanks!

    : No such file or directoryne 18: /usr/bin/psql
    : No such file or directoryne 19: /usr/bin/psql
    : No such file or directoryne 20: /usr/bin/psql
    : No such file or directoryne 21: /usr/bin/psql
    monitor_master_slave.sh: line 58: syntax error: unexpected end of file

    1. Hi

      Where do you have ‘psql’ installed? Execute ‘which psql’ to find its location, or execute ‘locate bin/psql’ 😉

  3. Does this work for streaming replication?.. I receive an error when trying to run. Streaming Replication has the secondary always in recovery mode.

    ERROR: recovery is in progress
    HINT: WAL control functions cannot be executed during recovery.
    CRITICAL: Stream has no value to compare (is replication configured or connectivity problem?)

  4. Hi, when I run it, I got this error. Please help to solve it. Thanks!

    My operative system is debian

    /opt/postgresql/bin/psql: No such file or directory
    /opt/postgresql/bin/psql: No such file or directory
    /opt/postgresql/bin/psql: No such file or directory
    /opt/postgresql/bin/psql: No such file or directory

    1. On the command line, write ‘which psql’ and configure that path in the script 😉

Leave a Reply to Johnny Morano Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Previous Post

Recursive Inotify Daemon

Next Post

Secure Password Generator in Perl

Related Posts