<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Postgresql &#8211; Johnny Morano&#039;s Tech Articles</title>
	<atom:link href="https://jmorano.moretrix.com/tag/postgresql/feed/" rel="self" type="application/rss+xml" />
	<link>https://jmorano.moretrix.com</link>
	<description>Ramblings of an old-fashioned space cowboy</description>
	<lastBuildDate>Sat, 09 Apr 2022 08:46:25 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.6.2</generator>

<image>
	<url>https://jmorano.moretrix.com/wp-content/uploads/2022/04/cropped-jmorano_emblem-32x32.png</url>
	<title>Postgresql &#8211; Johnny Morano&#039;s Tech Articles</title>
	<link>https://jmorano.moretrix.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Deploy a PostgreSQL database with an initial schema using Ansible</title>
		<link>https://jmorano.moretrix.com/2022/04/deploy-a-postgresql-database-with-an-initial-schema-using-ansible/</link>
					<comments>https://jmorano.moretrix.com/2022/04/deploy-a-postgresql-database-with-an-initial-schema-using-ansible/#respond</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Sat, 09 Apr 2022 08:46:24 +0000</pubDate>
				<category><![CDATA[Automation]]></category>
		<category><![CDATA[Blog]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Ansible]]></category>
		<category><![CDATA[DevOps]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Postgresql]]></category>
		<guid isPermaLink="false">https://jmorano.moretrix.com/?p=1479</guid>

					<description><![CDATA[Ansible is a great automation tool to manage operating systems, but also to manage databases like PostgreSQL. Many&#8230;]]></description>
										<content:encoded><![CDATA[
<p><a rel="noreferrer noopener" href="https://www.ansible.com/" data-type="URL" data-id="https://www.ansible.com/" target="_blank">Ansible</a> is a great automation tool to manage operating systems, but also to manage databases like <a rel="noreferrer noopener" href="https://postgresql.org" data-type="URL" data-id="https://postgresql.org" target="_blank">PostgreSQL</a>. Many <a rel="noreferrer noopener" href="https://docs.ansible.com/ansible/latest/collections/community/postgresql/index.html" data-type="URL" data-id="https://docs.ansible.com/ansible/latest/collections/community/postgresql/index.html" target="_blank">Ansible modules</a> are available to create playbooks which execute various database administration tasks.</p>



<p>In this article we will take a closer look at how to ensure that:</p>



<ul class="wp-block-list"><li>a default database has been created</li><li>a set of configured extensions have been installed</li><li>an initial database schema has been deployed</li></ul>



<p>The user that will log on to the remote host using Ansible (and <code>SSH</code>) must be able to become the <code>postgres</code> user using <code>sudo</code> without a password (in this example; this can of course be changed).</p>
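

<p>For reference, a minimal sketch of a matching <code>sudoers</code> rule, assuming the Ansible login user is called <code>deploy</code> (both the user name and the file path are examples):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Hypothetical /etc/sudoers.d/deploy: allow the Ansible login user "deploy"
# to become the postgres user without entering a password
deploy ALL=(postgres) NOPASSWD: ALL
</pre>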



<p>The tasks below are split into two big tasks, each containing a block. <a rel="noreferrer noopener" href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_blocks.html" data-type="URL" data-id="https://docs.ansible.com/ansible/latest/user_guide/playbooks_blocks.html" target="_blank">Blocks</a> allow you to group tasks that need the same variables (for instance the same tags, become variables, &#8230;).</p>



<p>The first block will ensure that the default database has been created, based on a YAML variable called <code>{{ db_name }}</code>. This variable must be set in either a <code>group_vars</code> file or a <code>host_vars</code> file, or supplied on the command line.</p>
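

<p>For illustration, such a variable file could look like the sketch below (the file name and database name are examples):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="yaml" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># group_vars/databases.yml (example values)
db_name: appdb

# or supplied on the command line instead:
#   ansible-playbook site.yml -e db_name=appdb
</pre>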



<p>In the same block, Ansible will also ensure that the <code>pg_stat_statements</code> extension is enabled and installed in the above database. <a href="https://www.postgresql.org/docs/current/pgstatstatements.html" data-type="URL" data-id="https://www.postgresql.org/docs/current/pgstatstatements.html" target="_blank" rel="noreferrer noopener">pg_stat_statements</a> is a very useful extension to debug or display statistics regarding the SQL statements executed on that specific database.</p>



<p>At the end of the first block, a task was added to loop over a YAML variable called <code>{{ shared_preload_libraries }}</code>. The variable is supposed to be a comma-separated string containing all extra extensions that need to be enabled in the above database. Shared preload extensions also need to be configured in the <code>postgresql.conf</code> file, which is not covered in this article.</p>
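

<p>As a sketch, the variable and the matching <code>postgresql.conf</code> line could look like this (the extension names are only examples):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="yaml" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># host_vars/db01.yml (example values)
shared_preload_libraries: "pg_stat_statements,pgaudit"

# corresponding postgresql.conf entry (managed outside this article):
#   shared_preload_libraries = 'pg_stat_statements,pgaudit'
</pre>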



<pre class="EnlighterJSRAW" data-enlighter-language="yaml" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">- name: Default Database
  tags: default_database
  vars:
    ansible_become_user: postgres
    ansible_become_method: sudo
    ansible_become_pass: null
  block:
    - name: Ensure required database
      postgresql_db:
        name: "{{ db_name }}"
        encoding: UTF-8

    - name: Ensure pg_stat_statements extension
      postgresql_ext:
        name: "pg_stat_statements"
        db: "{{ db_name }}"
        schema: public
        state: present

    - name: Ensure shared_preload extensions
      postgresql_ext:
        name: "{{ item }}"
        db: "{{ db_name }}"
        state: present
      loop: "{{ shared_preload_libraries.split(',') }}"
      loop_control:
        label: "{{ item }}"

- name: Ensure initial database schema
  tags: default_database
  block:
    - name: Schema definition file
      template:
        src: "{{ db_name }}/schema_definition.sql.j2"
        dest: /var/lib/postgresql/initial_schema_definition.sql
        owner: postgres
        group: postgres
        mode: '0600'
      when: ( role_path + "/templates/" + db_name + "/schema_definition.sql.j2" ) is file
      register: initial_schema

    - name: Apply the schema definition
      ignore_errors: True
      vars:
        ansible_become_user: postgres
        ansible_become_method: sudo
        ansible_become_pass: null
      community.postgresql.postgresql_query:
        db: "{{ db_name }}"
        path_to_script: /var/lib/postgresql/initial_schema_definition.sql
        encoding: UTF-8
        as_single_query: yes
      when: initial_schema.changed

  rescue:
    - debug:
        msg: "No schema definition found for {{db_name}}, skipping..."
</pre>



<p>The second block of tasks will ensure that an initial database schema is deployed. This block consists of just two tasks.</p>



<p>The first task will upload the schema definition SQL-based file to a specific path on the remote host.</p>



<p>The second task will execute that SQL file, but only if it was changed by the first task. This of course means that if the file was changed manually on the remote host (or even removed), Ansible <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">will update that file again</mark> in the first task and deploy the full file again in the second task! This might overwrite or break your database.</p>
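

<p>To run only these tasks, the <code>default_database</code> tag that both blocks carry can be selected on the command line (the playbook name is an example):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># run just the database tasks, supplying the database name as an extra var
ansible-playbook site.yml --tags default_database -e db_name=appdb
</pre>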
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2022/04/deploy-a-postgresql-database-with-an-initial-schema-using-ansible/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Perl: Create schema backups in PostgreSQL</title>
		<link>https://jmorano.moretrix.com/2014/08/perl-create-schema-backups-in-postgresql/</link>
					<comments>https://jmorano.moretrix.com/2014/08/perl-create-schema-backups-in-postgresql/#respond</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Fri, 22 Aug 2014 09:09:20 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Postgresql]]></category>
		<category><![CDATA[SysAdmin]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1114</guid>

					<description><![CDATA[At my recent job, I was asked to create a backup procedure, which would dump a PostgreSQL schema&#8230;]]></description>
										<content:encoded><![CDATA[
<p>At my current job, I was asked to create a backup procedure which would dump a PostgreSQL schema to a compressed file and which was able to create weekly and daily backups.<br />The backups had to be full backups each time, and the number of daily and weekly backups to keep had to be defined through thresholds.</p>



<p>The PostgreSQL tool used for those backups is &#8216;<code>pg_dump</code>&#8217; and I have used Perl to script all the interesting stuff together.</p>



<p>The script will basically go through the following steps:</p>



<ul class="wp-block-list"><li>Check the backup path for the required directories (and if not, create them)</li><li>Rotate old backups based on thresholds</li><li>Create a new backup</li></ul>



<p>The script shown below is just an example and probably needs to be adapted to your own needs. The script works for me and the environment it was created in.</p>



<p>First things first.<br />The script uses the following Perl modules:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use DateTime;
use Getopt::Long;
use Pod::Usage;
use YAML qw/LoadFile/;
use File::Path qw/make_path/;
use File::Copy;
use File::Basename qw/basename/;
use Data::Dumper;
use POSIX qw/setuid/;
</pre>



<p>A YAML configuration file is used to provide the script with essential information. An example configuration file looks like the following:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">thresholds:
    daily: 7
    weekly: 4

backup_path: /data/backup/schema_backups

database: my_db

daily_to_weekly_pattern: sunday

schemas:
    - my_cool_schema
    - my_not_so_cool_schema
</pre>



<p>Remember: YAML is sensitive about tabs!</p>



<p>Command line arguments are set up in the script by using Getopt::Long.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">my ($help, $cfg_file, $schema, $verbose, $debug);
# Check command line arguments
GetOptions(
    "help"     =&gt; \$help,
    "verbose"  =&gt; \$verbose,
    "debug"    =&gt; \$debug,
    "cfg=s"    =&gt; \$cfg_file,
    "schema=s" =&gt; \$schema,
);
pod2usage(1) if $help;
</pre>



<p>The script needs to run as the &#8216;postgres&#8217; user. Should it be executed as another user (for instance root), then the script will try to switch to the &#8216;postgres&#8217; user.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">my ($user) = getpwuid($&lt;); 
unless ($user eq 'postgres') { 
    p_info("Script $0 needs to run as 'postgres', switching user..."); 
    setuid(scalar getpwnam 'postgres'); 
}</pre>



<p>Next we will load the configuration file and check if a schema name was supplied on the command line. If one was defined, then we will override the schema names which were set in the configuration, and only create a backup of that one schema name.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">if(defined $cfg_file){
    if( -f $cfg_file ){
        p_info("Loading configuration file '$cfg_file'");
        $cfg = LoadFile($cfg_file);
    }
    else {
        die "No such configuration file '$cfg_file'\n";
    }
}

$cfg-&gt;{schemas} = [$schema] if defined $schema;
</pre>



<p>And now we are ready for the mainloop of the script:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">foreach my $s (@{ $cfg-&gt;{schemas} }){
    check_current_backups($s);
    create_backup($s);
}
</pre>



<p>For each schema, we will first check if the required directories are in place and otherwise create them. Afterwards we will check those directories for older backups.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub check_current_backups {
    my($schema) = @_;

    check_directory_structure($schema);
    check_backups('daily', $schema);
    check_backups('weekly', $schema);
}

sub check_directory_structure {
    my($schema) = @_;

    foreach my $period (qw/daily weekly/){
        my $_path = return_backup_path($period, $schema);
        p_info("Checking path '$_path'");
        unless(-d $_path){
            make_path($_path);
            p_info("Created path '$_path'");
        }
    }
}

# check if older backups need rotation / deletion
sub check_backups {
    my($period, $schema) = @_;

    my $path = return_backup_path($period, $schema);

    my @files = glob("$path/*");
    my @sorted = sort { get_date($b) &lt;=&gt; get_date($a) } @files;

    if(scalar @sorted &gt;= $cfg-&gt;{thresholds}{$period}){
        p_info("Rotating backups for period '$period'");
        rotate_backups($period, $schema, \@sorted);
    }
}
</pre>



<p>The rotation of the backups works as follows:<br />&#8211; If the threshold for a period has been reached (for instance 7 daily backups), then the oldest files will be nominated for rotation or deletion</p>



<p>The rotation itself is custom designed for my current job. Each backup filename has the day name (monday, tuesday, &#8230;) appended. Backup files matching a certain pattern (in my situation &#8216;sunday&#8217;) will be moved into the &#8216;weekly&#8217; backup path; other old files will be deleted.</p>



<p>Since rotation is done before a new backup is created, we delete one file more than the threshold requires (a new backup file is going to be created a few lines further on).</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub rotate_backups {
    my($period, $schema, $files) = @_;

    p_debug("All Files: ".Dumper($files));
    p_debug("$period threshold: ".$cfg-&gt;{thresholds}{$period});

    # make a true copy
    my (@to_move_files) = (@{ $files });
    # The @files contains all backup files, with the youngest as element 0, the oldest 
    # backup as last element.
    # @to_move_files is a slice of @files, starting from the position threshold - 1, 
    # until the end of the array. Those files will be either rotated or removed
    @to_move_files = @to_move_files[ $cfg-&gt;{thresholds}{$period} - 1 .. $#to_move_files ];
    p_debug("TO MOVE FILES: ".Dumper(\@to_move_files));

    if($period eq 'daily'){
        foreach my $file (@to_move_files){
            # move backups to weekly
            if($file =~ /$cfg-&gt;{daily_to_weekly_pattern}/){
                p_info("Moving daily backup '$file' to weekly");
                move($file, return_backup_path('weekly', $schema) . '/' . basename($file));
            }
            else {
                p_info("Removing backup '$file'");
                unlink($file);
            }
        }
    }

    if($period eq 'weekly'){
        foreach my $file (@to_move_files){
            # remove files
            p_info("Removing backup '$file'");
            unlink($file);
        }
    }
}
</pre>



<p>At this point, the required directory structure has been checked and is present, and older backup files have been rotated or deleted.<br />Finally we can create the actual backup:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sub create_backup {
    my ($schema) = @_;

    p_info("Creating backup for schema '$schema', database: " . $cfg-&gt;{database});
    my $now = DateTime-&gt;now;
    my $path = return_backup_path('daily', $schema) 
                . '/' . $now-&gt;ymd('') . $now-&gt;hms('')
                . '_' . lc($now-&gt;day_name) 
                . '.dump.sql';

    # Create the dump file
    my $dump_output = do{
        local $/;
        open my $c, '-|', "pg_dump -v -n $schema -f $path $cfg-&gt;{database} 2&gt;&amp;1" 
            or die "pg_dump for '$schema' failed: $!";
        &lt;$c&gt;;
    };
    p_debug('pg_dump output: ', $dump_output);

    # GZIP the dump file
    my $gzip_output = do{
        local $/;
        open my $c, '-|', "gzip $path 2&gt;&amp;1" 
            or die "gzip for '$path' failed: $!";
        &lt;$c&gt;;
    };
    p_debug('gzip output: ', $gzip_output);

    # change the permissions
    chmod 0660, "$path.gz";

    p_info("Created backup for schema '$schema' in '$path.gz'");
}
</pre>



<p>The backup is created by issuing <code>pg_dump</code> for that schema, which produces a plain-text SQL file. This file is then compressed with <code>gzip</code>, and afterwards the file permissions are changed to 0660. Since the backup file is created by the <code>postgres</code> user, only the <code>postgres</code> user and its group will have access to this file.</p>
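

<p>To produce the daily backups, the script can be scheduled from cron. A minimal sketch, assuming hypothetical paths for the script and its configuration file:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># /etc/cron.d/schema_backups (example paths)
# Run the schema backups every night at 02:00; the script switches
# to the 'postgres' user by itself if started as root.
0 2 * * * root /usr/local/bin/backup_schema.pl --cfg /etc/backup_schema.yaml
</pre>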



<p>The full script and configuration file can be found at <a title="GitHub Repository" href="https://github.com/insani4c/perl_tools/tree/master/backup_schema" target="_blank" rel="noopener">https://github.com/insani4c/perl_tools/tree/master/backup_schema</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2014/08/perl-create-schema-backups-in-postgresql/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Postgresql: Monitor sequence scans with Perl</title>
		<link>https://jmorano.moretrix.com/2014/02/postgresql-monitor-sequence-scans-perl/</link>
					<comments>https://jmorano.moretrix.com/2014/02/postgresql-monitor-sequence-scans-perl/#comments</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Wed, 12 Feb 2014 07:33:26 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Postgresql]]></category>
		<category><![CDATA[SysAdmin]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1065</guid>

					<description><![CDATA[Not using indexes, or huge tables without indexes, can have a very negative impact on the duration of&#8230;]]></description>
										<content:encoded><![CDATA[
<p>Not using indexes, or querying huge tables without indexes, can have a very negative impact on the duration of an SQL query. The query planner will decide to make a sequence scan, which means that the query will go through the table sequentially to search for the required data. When a table is only 100 rows big, you will probably not even notice that it is making sequence scans, but if your table is 1,000,000 rows big or even more, you can probably optimize the table with indexes, resulting in faster searches.</p>



<p>In the example script we will be using a <em>Storable</em> state file and we will store the statistics as a JSON object in a PostgreSQL database.</p>



<p>First let&#8217;s take a look at the query we will be executing:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT schemaname, relname, seq_tup_read 
FROM pg_stat_all_tables 
WHERE seq_tup_read &gt; 0 
      AND relname NOT LIKE 'pg_%'
ORDER BY seq_tup_read desc
</pre>



<p>As you can see, PostgreSQL stores all the information we need about our tables in just one view, called <em>pg_stat_all_tables</em>. This view has a column called <em>seq_tup_read</em> (the number of live rows fetched by sequential scans), which contains the information we need.</p>



<p>Just reading out this information is not going to be enough, because it contains cumulative counts since the statistics were last reset (typically the startup of your PostgreSQL database). Since production databases aren&#8217;t restarted (that often), we will have to compare this information with some previous information (hence the <em>Storable</em> state file).<br />Our plan is to run the script in a cronjob, every 5 minutes.</p>
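

<p>Such a cronjob entry could look like this (the script path is hypothetical):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># /etc/cron.d/seq_scan_monitor (example path)
# Run every 5 minutes; the script setuids to 'postgres' by itself.
*/5 * * * * root /usr/local/bin/monitor_seq_scans.pl mydatabase
</pre>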



<p>The statistics are also stored as a JSON object in a second database, so that we could build some web interface for the statistics at a later stage. And we want to keep a history of these statistics.</p>
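

<p>The script inserts into a table called <code>mydbschema.seq_tup_read</code>, which is not created by the script itself. A minimal sketch of its definition, assuming we want an insert timestamp for the history:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">CREATE SCHEMA mydbschema;

CREATE TABLE mydbschema.seq_tup_read (
    id         BIGSERIAL PRIMARY KEY,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
    data       JSON NOT NULL
);
</pre>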



<p>Furthermore the script will <em>setuid</em> to postgres (same as <em>su - postgres</em> on the command line), so that it can connect to the PostgreSQL UNIX socket file.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">use strict;
use warnings;
use utf8;

use DBI;
use DateTime;
use Storable qw/nstore retrieve/;
use POSIX qw/setuid/;
use Text::ASCIITable;
use JSON;

my $db   = 'mydatabase';
if(scalar @ARGV){
    $db = shift @ARGV;
}

my $host = '/var/run/postgresql';
my $user = 'postgres';
my $pass = undef;

my $state_db   = 'database_statistics';
my $state_host = '192.168.1.1';
my $state_user = 'skeletor';
my $state_pass = 'he-manisawhimp';

my $state_file = '/var/tmp/sequence_read.state';

# suid to postgres
setuid(scalar getpwnam 'postgres');

# define and open up the state file
my $state = {};
$state = retrieve $state_file if -f $state_file;

my $now      = DateTime-&gt;now;

# Connect to the database which we want to monitor
my $dbh = DBI-&gt;connect("dbi:Pg:dbname=$db;host=$host", $user, $pass) 
                or die "Could not connect to database: $!\n";

# Connect to the database that will be used to store the statistics
my $state_dbh = DBI-&gt;connect("dbi:Pg:dbname=$state_db;host=$state_host", $state_user, $state_pass) 
                or die "Could not connect to the State database '$state_db': $!\n";

my $sql = &lt;&lt;EOF;
SELECT schemaname, relname, seq_tup_read 
FROM pg_stat_all_tables 
WHERE seq_tup_read &gt; 0 
      AND relname NOT LIKE 'pg_%'
ORDER BY seq_tup_read desc
EOF

# Get the statistics
my $results = $dbh-&gt;selectall_arrayref( $sql, undef);

# Store the statistics as a JSON object in the second database
eval {
    $state_dbh-&gt;do('INSERT INTO mydbschema.seq_tup_read (data) VALUES(?)', undef, encode_json($results));
};
if($@){
    print "Insert into state-db failed: $@\n";
}

# Prepare a nice ASCII table for output
my $t = Text::ASCIITable-&gt;new({ headingText =&gt; 'Seq Tup Read ' . $now-&gt;ymd('-')     . ' ' . $now-&gt;hms(':')});
$t-&gt;setCols('Schema Name','Relation Name ', 'Seq Tup Read', 'Increase (delta)');

my $row_count = 0;
foreach my $r (@{$results}){
    last if $row_count &gt; 25;

    my (@values) = (@{$r});
    my ($increase, $delta) = (0, 0);
    # Calculate the increase and its delta (guard against division by zero)
    if(defined $state-&gt;{last}{$r-&gt;[0].':'.$r-&gt;[1]}{seq_tup_read}
       &amp;&amp; $state-&gt;{last}{$r-&gt;[0].':'.$r-&gt;[1]}{seq_tup_read} &gt; 0){
        $increase = $r-&gt;[2] - $state-&gt;{last}{$r-&gt;[0].':'.$r-&gt;[1]}{seq_tup_read};
        $delta    = $increase / $state-&gt;{last}{$r-&gt;[0].':'.$r-&gt;[1]}{seq_tup_read} * 100;
        my $str = sprintf '%.0f (%.4f %%)', $increase, $delta;
        push @values, ($str);
    }
    else {
        push @values, '0 (0%)';
    }
    # Store this information for the next run of the script
    $state-&gt;{last}{$r-&gt;[0].':'.$r-&gt;[1]}{seq_tup_read} = $r-&gt;[2];
    $state-&gt;{last}{$r-&gt;[0].':'.$r-&gt;[1]}{delta}        = $delta;
    $state-&gt;{last}{$r-&gt;[0].':'.$r-&gt;[1]}{increase}     = $increase;

    # Only add the information to ASCII output table if there was an increase
    next unless $increase &gt; 0;
    $t-&gt;addRow(@values);
    $row_count++;
}
# Print out the ASCII table
print $t;

nstore $state, $state_file;

</pre>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2014/02/postgresql-monitor-sequence-scans-perl/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Postgresql: Monitor unused indexes</title>
		<link>https://jmorano.moretrix.com/2014/02/postgresql-monitor-unused-indexes/</link>
					<comments>https://jmorano.moretrix.com/2014/02/postgresql-monitor-unused-indexes/#comments</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Tue, 11 Feb 2014 09:09:08 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Postgresql]]></category>
		<category><![CDATA[SysAdmin]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1057</guid>

					<description><![CDATA[Working on large database systems, with many tables and many indexes, it is easy to lose the overview&#8230;]]></description>
										<content:encoded><![CDATA[
<p>Working on large database systems, with many tables and many indexes, it is easy to lose the overview of what is actually being used and what is just consuming unwanted disk space.<br />If indexes are not closely monitored, they can end up using undesired space and, moreover, they will consume unnecessary CPU cycles.</p>



<p>Statistics about indexes can easily be retrieved from the PostgreSQL database system. All required information is stored in two relations:</p>



<ul class="wp-block-list"><li>pg_stat_user_indexes</li><li>pg_index</li></ul>



<p>When joining these two relations, interesting information can be read in the following columns:</p>



<ul class="wp-block-list"><li>idx_scan: the number of times the query planner has used this index for an &#8216;Index Scan&#8217;</li><li>idx_tup_read: how many tuples have been read by using the index</li><li>idx_tup_fetch: how many tuples have been fetched by using the index</li></ul>



<p>A neat function called <em>pg_relation_size()</em> allows you to fetch the on-disk size of a relation, in this case the index.</p>
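

<p>For example, to check the on-disk size of a single index by hand (the index name is hypothetical):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT pg_size_pretty(pg_relation_size('my_schema.my_unused_idx'::regclass));
</pre>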



<p>Based on this information, the monitoring query will be built up as follows:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT 
    relid::regclass AS table, 
    indexrelid::regclass AS index, 
    pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size, 
    idx_tup_read, 
    idx_tup_fetch, 
    idx_scan
FROM 
    pg_stat_user_indexes 
    JOIN pg_index USING (indexrelid) 
WHERE 
    idx_scan = 0 
    AND indisunique IS FALSE
</pre>



<p>Now, all we need to do is write a script which stores this information in some kind of file and periodically reports on the statistics.</p>



<p>First of all we will need a configuration file, which contains the database credentials.<br />I&#8217;ve chosen YAML because it is so versatile.</p>



<p>It will contain two important sets of information:</p>



<ul class="wp-block-list"><li>The database credentials</li><li>The path to the state file</li></ul>



<p>Example:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">dsn: "dbi:Pg:host=/var/run/postgresql;database=testdb"
user: postgres
pass:
state_file: /var/tmp/monitor_unused_indexes.state
</pre>



<p>As you can see, we will be connecting to the PostgreSQL database by using its UNIX socket.</p>



<p>The script will use <em>Text::ASCIITable</em> to output the statistics in a nice table. <em>Storable</em> is used to save our statistics to disk.</p>



<p>In the script below, we will check whether an index has been unused for a timespan of 30 days. If so, the script will report this index to STDOUT.<br />To do this, we store a score and a timestamp for each unused index in the state file.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use DBI;
use Storable qw/nstore retrieve/;
use YAML qw/LoadFile/;
use POSIX qw/setuid/;
use Getopt::Long;
use DateTime;
use Text::ASCIITable;

my $cfg_file = './monitor_unused_indexes.yaml';
my $verbose = 0;
GetOptions("cfg=s" =&gt; \$cfg_file,
           "verbose|v" =&gt; \$verbose, 
        );

my $sql = &lt;&lt;EOS;
SELECT 
    relid::regclass AS table, 
    indexrelid::regclass AS index, 
    pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size, 
    idx_tup_read, 
    idx_tup_fetch, 
    idx_scan
FROM 
    pg_stat_user_indexes 
    JOIN pg_index USING (indexrelid) 
WHERE 
    idx_scan = 0 
    AND indisunique IS FALSE
EOS

my ($cfg) = LoadFile($cfg_file);

# suid to postgres, or whatever user is configured in the config.yaml file
setuid(scalar getpwnam $cfg-&gt;{user});

# Connect to the database
my $dbh = DBI-&gt;connect($cfg-&gt;{dsn}, $cfg-&gt;{user}, $cfg-&gt;{pass}) 
            or die "Could not connect to database: $! (DBI ERROR: ".$DBI::errstr.")\n";

my $state;
if(-f $cfg-&gt;{state_file}){
    $state = retrieve $cfg-&gt;{state_file};
}

# Fetch the statistics
my $results = $dbh-&gt;selectall_arrayref( $sql, undef );

my $now_dt   = DateTime-&gt;now;

# Initialize the ASCII table
my $t = Text::ASCIITable-&gt;new({ headingText =&gt; 'INDEX STATISTICS'});
$t-&gt;setCols(qw/Table Index Index_Size idx_tup_read idx_tup_fetch idx_scan/);

# Analyze the results
foreach my $r (@$results){
    if($verbose){
        $t-&gt;addRow(@{$r});
    }
    # Only update the state file if --verbose was not specified.
    # This way the script can be checked manually with --verbose many times and executed
    # from a cronjob once a day without --verbose
    else {
        if(defined $state-&gt;{unused_indexes}{$r-&gt;[1]}){
            my $first_dt = DateTime-&gt;from_epoch( epoch =&gt; $state-&gt;{unused_indexes}{$r-&gt;[1]}{first_hit} );
            if($first_dt-&gt;add(days =&gt; $state-&gt;{unused_indexes}{$r-&gt;[1]}{score})-&gt;day == $now_dt-&gt;day ) {
                $state-&gt;{unused_indexes}{$r-&gt;[1]}{score}++;
            }
            else {
                $state-&gt;{unused_indexes}{$r-&gt;[1]}{score}     = 1;
                $state-&gt;{unused_indexes}{$r-&gt;[1]}{first_hit} = $now_dt-&gt;epoch;
            }
        }
        else {
            $state-&gt;{unused_indexes}{$r-&gt;[1]}{score}     = 1;
            $state-&gt;{unused_indexes}{$r-&gt;[1]}{first_hit} = $now_dt-&gt;epoch;
        }
    }
}

# Print out the statistics table, if --verbose was specified
print $t if $verbose; 

# Store the statistics to disk in a state file
nstore $state, $cfg-&gt;{state_file};

foreach my $idx (keys %{ $state-&gt;{unused_indexes} }){
    my $first_dt = DateTime-&gt;from_epoch( epoch =&gt; $state-&gt;{unused_indexes}{$idx}{first_hit} );
    if( $first_dt-&gt;add(days =&gt; 30) &lt;= $now_dt ){
        my $line = "Index: $idx ready for deletion";
        $line .= " (score: " . $state-&gt;{unused_indexes}{$idx}{score};
        $line .= " | first_hit: " . DateTime-&gt;from_epoch(epoch =&gt; $state-&gt;{unused_indexes}{$idx}{first_hit})-&gt;ymd . ")";

        print $line."\n" if $verbose;
    }
}
</pre>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2014/02/postgresql-monitor-unused-indexes/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Postgresql 9.3: Creating an index on a JSON attribute</title>
		<link>https://jmorano.moretrix.com/2013/12/postgresql-9-3-creating-index-json-attribute/</link>
					<comments>https://jmorano.moretrix.com/2013/12/postgresql-9-3-creating-index-json-attribute/#respond</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Fri, 27 Dec 2013 10:28:25 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Postgresql]]></category>
		<category><![CDATA[SQL]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=1036</guid>

					<description><![CDATA[Recently I&#8217;ve discovered some very interesting new features in the PostgreSQL 9.3 database. First of all, a new data&#8230;]]></description>
										<content:encoded><![CDATA[
<p>Recently I&#8217;ve discovered some very interesting new features in the PostgreSQL 9.3 database.<br />First of all, a new data type has been introduced: <a title="Datatype JSON" href="http://www.postgresql.org/docs/9.3/static/datatype-json.html" target="_blank" rel="noopener">JSON</a>. Together with this new data type, <a title="JSON Functions" href="http://www.postgresql.org/docs/9.3/static/functions-json.html" target="_blank" rel="noopener">new functions</a> were also introduced.</p>



<p>These new features now simplify, for instance, saving web forms in your PostgreSQL database. Or actually any kind of dynamic data, such as Perl hashes. Plus, thanks to the new JSON functions, this data can be easily searched and indexed.</p>



<p>Let&#8217;s start with creating a test table.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">CREATE SEQUENCE data_seq    
    START WITH 1    
    INCREMENT BY 1    
    NO MINVALUE    
    NO MAXVALUE    
    CACHE 1;

CREATE TABLE data (    
    id bigint DEFAULT nextval('data_seq'::regclass) NOT NULL,
    form_name TEXT,
    form_data JSON
);
</pre>



<p>I&#8217;ve inserted into this table 100k rows of test data with a very simple Perl script.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use AnyEvent;
use AnyEvent::Util;
$AnyEvent::Util::MAX_FORKS = 25;

print "Inserting test data...\n";
my $cv = AnyEvent-&gt;condvar;
$cv-&gt;begin;
foreach my $d (0..100000){
    $cv-&gt;begin;
    fork_call {
        my($d) = @_;
        # Generate a random 64-character name with pwgen
        my $name = do{local $/; open my $c, '-|', 'pwgen -B -s -c1 64'; &lt;$c&gt;};
        chomp($name);
        my $dbh = DBI-&gt;connect("dbi:Pg:host=/var/run/postgresql;dbname=test;port=5432",'postgres', undef);
        $dbh-&gt;do(qq{insert into data (form_name,form_data) VALUES('test_form', '{"c":{"d":"ddddd"},"name":"$name","b":"bbbbb", "count":$d}')});
        $dbh-&gt;disconnect;
        return $d;
    } $d,
    sub {
        my ($count) = @_;
        print "$d ";
        $cv-&gt;end;
    }
} 
$cv-&gt;end;
$cv-&gt;recv;
print "\n\nDone\n";
</pre>



<p>Now let&#8217;s assume that the JSON data we are going to insert (or have inserted) always contains the attribute &#8216;name&#8217;. On this attribute we will create the following database index:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">CREATE INDEX ON data USING btree (form_name, json_extract_path_text(form_data,'name'));
</pre>



<p>The above example creates a multi-column index: one column is <code>form_name</code>, the other is the &#8216;name&#8217; attribute extracted from the JSON data (an expression index). Note that a query must use the same <code>json_extract_path_text()</code> expression before the planner will consider this index.</p>



<p>Now let&#8217;s make our first test.<br />The first test will not use the index we created previously, because it accesses the attribute with the <code>-&gt;&gt;</code> operator rather than the indexed expression.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">EXPLAIN ANALYZE VERBOSE SELECT * FROM data WHERE form_name = 'test_form' AND form_data-&gt;&gt;'name' = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME';
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on data  (cost=0.00..4337.28 rows=500 width=102) (actual time=28.608..129.945 rows=1 loops=1)
   Filter: ((data.form_name = 'test_form'::text) AND ((data.form_data -&gt;&gt; 'name'::text) = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME'::text))
   Rows Removed by Filter: 100000
 Total runtime: 129.968 ms
(5 rows)

</pre>



<p>130ms for searching through 100k rows is actually quite ok.</p>



<p>Now let&#8217;s see how we can speed up this query by using the index we&#8217;ve created.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">EXPLAIN ANALYZE VERBOSE SELECT * FROM data WHERE form_name = 'test_form' AND json_extract_path_text(form_data,'name') = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME';
                                                                             QUERY PLAN                                                                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using data_form_name_json_extract_path_text_idx on data  (cost=0.42..8.44 rows=1 width=102) (actual time=0.056..0.057 rows=1 loops=1)
   Index Cond: ((data.form_name = 'test_form'::text) AND (json_extract_path_text(data.form_data, VARIADIC '{name}'::text[]) = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME'::text))
 Total runtime: 0.084 ms
(4 rows)

</pre>



<p>0.084ms! That is about 1,500 times faster! What makes this index extremely interesting is that it has only been created on one attribute of the JSON data and not on the entire JSON document. This keeps the index data small, so it will stay in your database&#8217;s memory longer.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2013/12/postgresql-9-3-creating-index-json-attribute/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>PostgreSQL 9.2 Master &#8211; Slave Monitoring</title>
		<link>https://jmorano.moretrix.com/2013/08/postgresql-9-2-master-slave-monitoring/</link>
					<comments>https://jmorano.moretrix.com/2013/08/postgresql-9-2-master-slave-monitoring/#comments</comments>
		
		<dc:creator><![CDATA[Johnny Morano]]></dc:creator>
		<pubDate>Tue, 13 Aug 2013 13:07:04 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Bash]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Nagios]]></category>
		<category><![CDATA[Postgresql]]></category>
		<guid isPermaLink="false">http://jmorano.moretrix.com/?p=943</guid>

					<description><![CDATA[Nagios plugin script written in Bash to check the master-slave replication in PostgreSQL (tested on PostgreSQL 9.2.4) (executed&#8230;]]></description>
										<content:encoded><![CDATA[<p>Nagios plugin script written in Bash to check the master-slave replication in PostgreSQL (tested on PostgreSQL 9.2.4, executed on the slave).<br />
The script will report how many bytes the slave server is behind, and how many seconds ago the last replay of data occurred.</p>
<p>The script must be executed as the &#8216;postgres&#8217; user.</p>
<pre class="brush:bash">
#!/bin/bash

# $Id: check_slave_replication.sh 3421 2013-08-09 07:52:44Z jmorano $

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
 
## Master (p_) and Slave (s_) DB Server Information	
export s_host=$1
export s_port=$2
export p_db=$3
export p_host=$4
export p_port=$5
 
export psql=/opt/postgresql/bin/psql
export bc=/usr/bin/bc
 
## Limits
export  critical_limit=83886080 # 5 * 16MB, size of 5 WAL files
export   warning_limit=16777216 # 16 MB, size of 1 WAL file
 
master_lag=$($psql -U postgres -h$p_host -p$p_port -A -t -c "SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS offset" $p_db)
slave_lag=$($psql -U postgres  -h$s_host -p$s_port -A -t -c "SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0') AS receive" $p_db)
replay_lag=$($psql -U postgres -h$s_host -p$s_port -A -t -c "SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0') AS replay" $p_db)
replay_timediff=$($psql -U postgres -h$s_host -p$s_port -A -t -c "SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_delay" $p_db)
 
if [[ -z $master_lag || -z $slave_lag || -z $replay_lag ]]; then
    echo "CRITICAL: Stream has no value to compare (is replication configured or connectivity problem?)"
    exit $STATE_CRITICAL
else
    if [[ $master_lag -eq $slave_lag && $master_lag -eq $replay_lag && $slave_lag -eq $replay_lag ]] ; then
        echo "OK: Stream: MASTER:$master_lag Slave:$slave_lag Replay:$replay_lag"
        exit $STATE_OK
    else
        if [[ $master_lag -eq $slave_lag ]] ; then
            if [[ $master_lag -ne $replay_lag ]] ; then
                if [ $(bc <<< $master_lag-$replay_lag) -lt $warning_limit ]; then
                    echo "OK: Stream: MASTER:$master_lag Replay:$replay_lag :: REPLAY BEHIND"
                    exit $STATE_OK
                else
                    echo "WARNING: Stream: MASTER:$master_lag Replay:$replay_lag :: REPLAY $(bc <<< $master_lag-$replay_lag)bytes BEHIND (${replay_timediff}seconds)"
                    exit $STATE_WARNING
                fi
            fi
        else
            if [ $(bc <<< $master_lag-$slave_lag) -gt $critical_limit ]; then
                echo "CRITICAL: Stream: MASTER:$master_lag Slave:$slave_lag :: STREAM BEYOND CRITICAL LIMIT ($(bc <<< $master_lag-$slave_lag)bytes)"
                exit $STATE_CRITICAL
            else
                if [ $(bc <<< $master_lag-$slave_lag) -lt $warning_limit ]; then
                    echo "OK: Stream: MASTER:$master_lag Slave:$slave_lag Replay:$replay_lag :: STREAM BEHIND"
                    exit $STATE_OK
                else
                    echo "WARNING: Stream: MASTER:$master_lag Slave:$slave_lag :: STREAM BEYOND WARNING LIMIT ($(bc <<< $master_lag-$replay_lag)bytes)"
                    exit $STATE_WARNING
                fi
            fi
        fi
        echo "UNKNOWN: Stream: MASTER: $master_lag Slave: $slave_lag Replay: $replay_lag"
        exit $STATE_UNKNOWN
    fi
fi
</pre>
<p>Possible outputs:</p>
<pre class="brush:bash">
$ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432
WARNING: Stream: MASTER:1907958306184 Replay:1907878056888 :: REPLAY 80249296bytes BEHIND (00:03:14.056747seconds)
$ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432
OK: Stream: MASTER:2055690128376 Slave:2055690143144 Replay:2055690193744 :: STREAM BEHIND
$ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432
OK: Stream: MASTER:2055690497120 Replay:2055690497328 :: REPLAY BEHIND
$ bash check_slave_replication.sh 192.168.0.1 5432 live 192.168.0.2 5432
OK: Stream: MASTER:2055691704672 Slave:2055691704672 Replay:2055691704672
</pre>
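<p>Hooked into Nagios, the command and service definitions could look roughly like this (host names, paths and the service template are examples):</p>
<pre class="brush:bash">
# Hypothetical Nagios object definitions for this plugin
define command {
    command_name  check_pg_slave_replication
    command_line  /usr/lib/nagios/plugins/check_slave_replication.sh $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$
}

define service {
    use                  generic-service
    host_name            pgslave01
    service_description  PostgreSQL replication lag
    check_command        check_pg_slave_replication!192.168.0.1!5432!live!192.168.0.2!5432
}
</pre>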
]]></content:encoded>
					
					<wfw:commentRss>https://jmorano.moretrix.com/2013/08/postgresql-9-2-master-slave-monitoring/feed/</wfw:commentRss>
			<slash:comments>14</slash:comments>
		
		
			</item>
	</channel>
</rss>
