dark

Postgresql 9.3: Creating an index on a JSON attribute

blank
blank

Recently I’ve discovered some very interesting new features in the PostgreSQL 9.3 database.
First of all, a new data type has been introduced: JSON. Together with this new data type, new functions were also introduced.

These new features now simply for instance saving web forms in your Postgresql database. Or actually any kind of dynamic data, such as for instance Perl hashes. Plus, thanks to the new JSON functions, this data can be easily searched and indexed.

Let’s start with creating a test table.

CREATE SEQUENCE data_seq    
    START WITH 1    
    INCREMENT BY 1    
    NO MINVALUE    
    NO MAXVALUE    
    CACHE 1;

CREATE TABLE data (    
    id bigint DEFAULT nextval('data_seq'::regclass) NOT NULL,
    form_name TEXT,
    form_data JSON
);

I’ve inserted into this table 100k rows of test data with a very simple Perl script.

#!/usr/bin/perl
use strict;
use DBI;
use AnyEvent;
use AnyEvent::Util;
$AnyEvent::Util::MAX_FORKS = 25;

print "Inserting test data...\n";
my $cv = AnyEvent->condvar;
$cv->begin;
foreach my $d (0..100000){
    $cv->begin;
    fork_call {
        my($d) = @_;
        my $name = do{local $/; open my $c, '-|', 'pwgen -B -s -c1 64'; <$c>};
        chomp($name);
        my $dbh = DBI->connect("dbi:Pg:host=/var/run/postgresql;dbname=test;port=5432",'postgres', undef);
        $dbh->do(qq{insert into data (form_name,form_data) VALUES('test_form', '{"c":{"d":"ddddd"},"name":"$name","b":"bbbbb", "count":$d}')});
        $dbh->disconnect;
        return $d;
    } $d,
    sub {
        my ($count) = @_;
        print "$d ";
        $cv->end;
    }
} 
$cv->end;
$cv->recv;
print "\n\nDone\n";

Now let’s assume that the JSON data we are going to insert (or have inserted) always contains the attribute field ‘name’. On this attribute we will create the following database index:

CREATE INDEX ON data USING btree (form_name, json_extract_path_text(form_data,'name'));

The above example creates a multi-column index.

Now let’s a make our first test.
The first test will not use the index we have created previously.

EXPLAIN ANALYZE VERBOSE SELECT * FROM data WHERE form_name = 'test_form' AND form_data->>'name' = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME';
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on data  (cost=0.00..4337.28 rows=500 width=102) (actual time=28.608..129.945 rows=1 loops=1)
   Filter: ((data.form_name = 'test_form'::text) AND ((data.form_data ->> 'name'::text) = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME'::text))
   Rows Removed by Filter: 100000
 Total runtime: 129.968 ms
(5 rows)

130ms for searching through 100k rows, is actually quite ok.

Now let’s see how we can speed up this query by using the index we’ve created.

EXPLAIN ANALYZE VERBOSE SELECT * FROM data WHERE form_name = 'test_form' AND json_extract_path_text(form_data,'name') = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME';
                                                                             QUERY PLAN                                                                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using data_form_name_json_extract_path_text_idx on data  (cost=0.42..8.44 rows=1 width=102) (actual time=0.056..0.057 rows=1 loops=1)
   Index Cond: ((data.form_name = 'test_form'::text) AND (json_extract_path_text(data.form_data, VARIADIC '{name}'::text[]) = 'cbcO5twuPnAYJ1VLV6gsEv9zWs2AbQxQ9PoALLr2w6Rwpr2PtoQHCCK0hyOMuIME'::text))
 Total runtime: 0.084 ms
(4 rows)

0.084ms! That’s is about 1625 times faster! What makes this index extremely interesting is that the index has only been created on one attribute of the JSON data and not on the entire JSON data. This will keep the index data small and thus will be kept longer in your database’ memory.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Previous Post
blank

Datatables and Perl (and a little bit of jQuery)

Next Post
blank

Postgresql: Monitor unused indexes

Related Posts