Nagios plugin script written in Bash to check the master-slave replication in PostgreSQL (tested on PostgreSQL 9.2.4) (executed on the slave).
The script will report how many bytes the slave server is behind, and how many seconds ago the last replay of data occurred.
The script must be executed as ‘postgres’ user.
#!/bin/bash # $Id: check_slave_replication.sh 3421 2013-08-09 07:52:44Z jmorano $ STATE_OK=0 STATE_WARNING=1 STATE_CRITICAL=2 STATE_UNKNOWN=3 ## Master (p_) and Slave (s_) DB Server Information export s_host=$1 export s_port=$2 export p_db=$3 export p_host=$4 export p_port=$5 export psql=/opt/postgresql/bin/psql export bc=/usr/bin/bc ## Limits export critical_limit=83886080 # 5 * 16MB, size of 5 WAL files export warning_limit=16777216 # 16 MB, size of 1 WAL file master_lag=$($psql -U postgres -h$p_host -p$p_port -A -t -c "SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS offset" $p_db) slave_lag=$($psql -U postgres -h$s_host -p$s_port -A -t -c "SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0') AS receive" $p_db) replay_lag=$($psql -U postgres -h$s_host -p$s_port -A -t -c "SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0') AS replay" $p_db) replay_timediff=$($psql -U postgres -h$s_host -p$s_port -A -t -c "SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_delay" $p_db) if [[ $master_lag -eq '' || $slave_lag -eq '' || $replay_lag -eq '' ]]; then echo "CRITICAL: Stream has no value to compare (is replication configured or connectivity problem?)" exit $STATE_CRITICAL else if [[ $master_lag -eq $slave_lag && $master_lag -eq $replay_lag && $slave_lag -eq $replay_lag ]] ; then echo "OK: Stream: MASTER:$master_lag Slave:$slave_lag Replay:$replay_lag" exit $STATE_OK else if [[ $master_lag -eq $slave_lag ]] ; then if [[ $master_lag -ne $replay_lag ]] ; then if [ $(bc <<< $master_lag-$replay_lag) -lt $warning_limit ]; then echo "OK: Stream: MASTER:$master_lag Replay:$replay_lag :: REPLAY BEHIND" exit $STATE_OK else echo "WARNING: Stream: MASTER:$master_lag Replay:$replay_lag :: REPLAY $(bc <<< $master_lag-$replay_lag)bytes BEHIND (${replay_timediff}seconds)" exit $STATE_WARNING fi fi else if [ $(bc <<< $master_lag-$slave_lag) -gt $critical_limit ]; then echo "CRITICAL: Stream: MASTER:$master_lag Slave:$slave_lag :: STREAM BEYOND CRITICAL LIMIT ($(bc <<< $master_lag-$slave_lag)bytes)" exit $STATE_CRITICAL else if [ $(bc <<< $master_lag-$slave_lag) -lt $warning_limit ]; then echo "OK: Stream: MASTER:$master_lag Slave:$slave_lag Replay:$replay_lag :: STREAM BEHIND" exit $STATE_OK else echo "WARNING: Stream: MASTER:$master_lag Slave:$slave_lag :: STREAM BEYOND WARNING LIMIT ($(bc <<< $master_lag-$replay_lag)bytes)" exit $STATE_WARNING fi fi fi echo "UNKNOWN: Stream: MASTER: $master_lag Slave: $slave_lag Replay: $replay_lag" exit $STATE_UNKNOWN fi fi
Possible outputs:
$ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432 WARNING: Stream: MASTER:1907958306184 Replay:1907878056888 :: REPLAY 80249296bytes BEHIND (00:03:14.056747seconds) $ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432 OK: Stream: MASTER:2055690128376 Slave:2055690143144 Replay:2055690193744 :: STREAM BEHIND $ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432 OK: Stream: MASTER:2055690497120 Replay:2055690497328 :: REPLAY BEHIND $ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432 OK: Stream: MASTER:2055691704672 Slave:2055691704672 Replay:2055691704672
when I run this, I get:
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
ERROR: function pg_xlog_location_diff(text, unknown) does not exist
LINE 1: SELECT pg_xlog_location_diff(pg_last_xlog_receive_location()…
would this be a suggested fix ?
http://vibhorkumar.wordpress.com/2013/02/18/pg_xlog_location_diff-function-for-postgreqsqlppas/
Yes, that function is only available starting from 9.2 (hence the version in the header of the post ;-))
and yes, I’m running 9.1
did it and it works like a charm. cheers 🙂
This script expects that you will alwasy logon as the postgres user itself, while this is not alweays the case.
I added an extra option for the user, and edited the psql commands so it will use this user instead of the static postgres user.
export p_user=$6
master_lag=$($psql -U $p_user -h$p_host -p$p_port -A -t -c “SELECT pg_xlog_location_diff(pg_current_xlog_location(), ‘0/0’) AS offset” $p_db)
slave_lag=$($psql -U $p_user -h$s_host -p$s_port -A -t -c “SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), ‘0/0’) AS receive” $p_db)
replay_lag=$($psql -U $p_user -h$s_host -p$s_port -A -t -c “SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(), ‘0/0’) AS replay” $p_db)
replay_timediff=$($psql -U $p_user -h$s_host -p$s_port -A -t -c “SELECT NOW() – pg_last_xact_replay_timestamp() AS replication_delay” $p_db)
Hi, when I run it, I got this error. Please help to solve it. Thanks!
: No such file or directoryne 18: /usr/bin/psql
: No such file or directoryne 19: /usr/bin/psql
: No such file or directoryne 20: /usr/bin/psql
: No such file or directoryne 21: /usr/bin/psql
monitor_master_slave.sh: line 58: syntax error: unexpected end of file
Hi
Where do you have ‘psql’ installed? Execute ‘which psql’ to find its location, or execute ‘locate bin/psql’ 😉
Does this work for streaming replication?.. I receive an error when trying to run. Streaming Replication has the secondary always in recovery mode.
ERROR: recovery is in progress
HINT: WAL control functions cannot be executed during recovery.
CRITICAL: Stream has no value to compare (is replication configured or connectivity problem?)
you need to swap your host and slave – you might have them reversed
Hi, when I run it, I got this error. Please help to solve it. Thanks!
My operative system is debian
/opt/postgresql/bin/psql: No such file or directory
/opt/postgresql/bin/psql: No such file or directory
/opt/postgresql/bin/psql: No such file or directory
/opt/postgresql/bin/psql: No such file or directory
On the command line, write ‘which psql’ and configure that path in the script 😉
or just…
export psql=`which psql`
export bc=`which bc`
or just…
export psql=\`which psql\`
export bc=\`which bc\`