Practical JSON · PgConf India · 2020-03-16 · Ivan Panchenko · Postgres Professional
TRANSCRIPT
Practical JSON
Ivan Panchenko
Postgres Professional
About myself
• Astronomer
• Application developer since 1996
• Use PostgreSQL since 1999
• Use JSON in PostgreSQL since 2009
(yes, before it was supported)
• Use PL/Perl since 2009
• Cofounder of Postgres Professional Co.
About this talk
• Too little time to cover JSON fully
• So this talk does not cover JSON completely
• The documentation does not, either
• So I will give the main ideas
• And I hope they will be useful
How to store polymorphic rows?
EAV and brothers? No!
2003 (Pg 8.2) : HStore extension (T.Sigaev, O.Bartunov)
– a data type to store sets of key-value pairs: 'a=>1, b=>2'::hstore
Operators: • Retrieve value by key : hstore -> text
• Key exists? : hstore ? text
• Does one hstore contain another? : hstore @> hstore
Index support with GiST, GIN (since 2006)
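A minimal hstore session showing these operators (assuming the contrib extension is available; the table name is illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

SELECT 'a=>1, b=>2'::hstore -> 'a';     -- value by key: '1'
SELECT 'a=>1, b=>2'::hstore ? 'b';      -- key exists: true
SELECT 'a=>1, b=>2'::hstore @> 'a=>1';  -- containment: true

-- GIN index accelerating ? and @> searches:
CREATE TABLE h (attrs hstore);
CREATE INDEX h_attrs_idx ON h USING GIN (attrs);
```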
Jsonb comes to help
JSON:
• Standard
• Multi-level
• JSONB (better)
HStore:
• Fast
• Multi-level Hstore (proposal)
• 2011: Joseph Adams proposed a JSON extension for 9.1
(not accepted)
• 2012: 9.2: json data type (storage and verification)
• 2012: Nested HStore proposal
• 2014: 9.4 JSONB (Bartunov, Korotkov, Sigaev)
• 2016: SQL:2016 standard: JSON support (Technical Report)
• 2019: 12 SQL/JSON (Bartunov, Korotkov, Glukhov)
https://commitfest.postgresql.org/17/1471/
[Chart: Google Trends, jsonb vs hstore]
[Chart: DB-Engines.com ranking (relative)]
JSON or JSONB ?
JSON:
• Text
• Preserves key order
• Allows duplicate keys
• Keeps spaces
• Compact (if no spaces); may be 2 times smaller than JSONB for short values
• Fast INSERT
• No @> (contains) operator
JSONB:
• Fast values retrieval (up to
1000 times!)
• Sorted and unique keys.
• Index support
JSON was not
designed for
storage !!!
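The distinction is easy to see on a single cast: json keeps the input text verbatim, while jsonb parses it into a binary form, dropping spaces and duplicate keys (the last duplicate wins) and sorting keys:

```sql
SELECT '{"b":1, "a":2, "a":3}'::json;
-- {"b":1, "a":2, "a":3}

SELECT '{"b":1, "a":2, "a":3}'::jsonb;
-- {"a": 3, "b": 1}
```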
Start practicing….
Creating JSON(B)
1. From textual representation: '{}'::jsonb
2. Constructing functions: jsonb_build_object(), array_to_json(), …
https://postgrespro.com/docs/postgresql/12/functions-json
3. From database objects:
SELECT row_to_json(pg_class) FROM pg_class;
Values retrieval
1. Operators: json->text, json->>text:
SELECT row_to_json(pg_class)->>'relname'
FROM pg_class;
2. Operators: json #> text[], json #>>text[]
3. Iterator functions: json_each, json_each_text,
json_array_elements, json_array_elements_text
SELECT * FROM
json_each_text('{"a":"foo","b":"bar"}')
key | value
-----+-------
a | foo
b | bar
Simple search
1. Operators json->text, json->>text:
WHERE js ->>'key' = 'value';
2. Operators json #> text[], json #>>text[]
WHERE js #>>ARRAY['key'] = 'value';
3. Functions json_each, json_each_text
WHERE EXISTS (
SELECT * FROM json_each_text(js)
WHERE key='key' AND value='value'
);
(a B-Tree expression index can serve such searches)
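A sketch of such an expression index, using the tbl(js jsonb) table from the examples below (note the double parentheses required around the expression):

```sql
CREATE INDEX tbl_key_idx ON tbl ((js ->> 'key'));

-- This predicate can now use the B-Tree index:
SELECT * FROM tbl WHERE js ->> 'key' = 'value';
```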
Test data
{ "company_name": "Postgres Vr̥ttipara",
"offices": [
{ "name": "Main",
"city": "Bengaluru",
"area": 1000
},
{ "name": "Secondary",
"city": "Pune",
"area" : 200
}
]
}
CREATE TABLE tbl (js jsonb); -- Bad practice!
“Contains” search (@>)
Which companies have offices in Bengaluru?
WHERE js @> '{"offices":[ {"city" : "Bengaluru"} ]}'
CREATE INDEX idx ON tbl USING GIN (js);
Bitmap Heap Scan on tbl (cost=68.77..411.01 rows=100 width=194)
Recheck Cond: (js @> '{"offices": [{"city": "Bengaluru"}]}'::jsonb)
-> Bitmap Index Scan on idx (cost=0.00..68.75 rows=100 width=0)
Index Cond: (js @> '{"offices": [{"city": "Bengaluru"}]}'::jsonb)
(GIN index is used)
More complicated search
Which companies have offices in any of the two cities?
WHERE
js @>
'{"offices":[ {"city" : "Bengaluru"} ]}'
OR
js @>
'{"offices":[ {"city" : "Pune"} ]}'
Other (pure SQL) way:
WHERE EXISTS (
SELECT *
FROM jsonb_array_elements(js->'offices') t
WHERE t->>'city' IN ('Bengaluru', 'Pune')
);
(the GIN index serves the @> form, not the pure-SQL alternative)
Even more complicated search
Which companies have MAIN office in any of the two cities?
WHERE
js @>
'{"offices":[
{"city": "Bengaluru", "name": "Main" }
]}'
OR
js @>
'{"offices":[
{"city": "Pune", "name": "Main" }
]}'
(GIN index is used)
SQL style…
Which companies have MAIN office in any of the two cities?
WHERE EXISTS(
SELECT *
FROM jsonb_array_elements(js->'offices') t
WHERE t->>'city' IN ('Bengaluru', 'Pune')
AND t->>'name' = 'Main'
);
No index acceleration …
Most complicated search for today
Which companies have MAIN office with area > 500 sq.m in
any of the two cities?
WHERE EXISTS(
SELECT *
FROM jsonb_array_elements(js->'offices') t
WHERE t->>'city' IN ('Bengaluru', 'Pune')
AND t->>'name' = 'Main'
AND (t->>'area')::float > 500
);
@> operator cannot help for >…
(in fact, it can, if used for preselection)
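A sketch of that preselection pattern: the indexable @> condition narrows the candidate rows, and the exact per-element check runs only on those:

```sql
SELECT * FROM tbl
WHERE js @> '{"offices": [{"name": "Main"}]}'  -- GIN-indexable preselection
  AND EXISTS (
    SELECT 1
    FROM jsonb_array_elements(js -> 'offices') t
    WHERE t ->> 'name' = 'Main'
      AND t ->> 'city' IN ('Bengaluru', 'Pune')
      AND (t ->> 'area')::float > 500          -- the part @> cannot express
  );
```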
Some conclusions
• Search with -> is good and can be fast with B-Tree functional indexes, but has poor flexibility.
• Search with @> is more flexible, and fast with GIN, but not effective for complicated queries.
• SQL-style search is the most flexible, but not effective.
???????
Advanced JSONB search solutions
JSquery extension (Alexander Korotkov)
JSONPATH – SQL:2016 features in PostgreSQL 12
(O.Bartunov, N.Glukhov, A.Korotkov, L.Mantrova,
T.Sigaev)
JSON_TABLE – continuation in PostgreSQL 13
JSQuery
Defines a data type for a JSON search query language:
jsonb @@ jsquery
GIN index support (2 types of GIN indexes optimized for
different types of queries)
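A hedged sketch of what a jsquery search over the office data could look like (syntax and opclass name recalled from the extension's README; verify against the jsquery documentation before use):

```sql
CREATE EXTENSION jsquery;

SELECT * FROM tbl
WHERE js @@ 'offices.#(name = "Main" AND area > 500)'::jsquery;

-- one of the two GIN opclasses the extension provides:
CREATE INDEX tbl_jsq_idx ON tbl USING GIN (js jsonb_path_value_ops);
```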
JSONPATH
• Defines a JSONPATH data type for a jsonb query
language.
• Defines functions and operators.
• Defines an opclass for index search.
• Implements SQL:2016.
Why a datatype?
• Because we can (this is a Postgres way)
• To enable operators and indexes, leveraging
PostgreSQL extendability
JSONPATH: retrieval
City of the first office:
SELECT jsonb_path_query_first(
js, '$.offices[0].city'
) FROM tbl;
City of the Main office with area > 500
SELECT jsonb_path_query_first(
js,
'$.offices[*]
? (@.area > 500 && @.name == "Main")
.city'
) FROM tbl;
JSONPATH: search
SELECT * FROM tbl WHERE
js @?
'$.offices[*]
? (@.area > 500 && @.name == "Main")
.city'
CREATE INDEX idx1 ON tbl
USING gin (js jsonb_path_ops);
(GIN index is used)
Details on JSONPATH:
https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md
More opclasses in JSQuery
What else can we do with JSON ?
JSON enthusiast’s table:
-- ( not recommended! )
CREATE TABLE data (
fields jsonb
);
JSON enthusiast’s table:
(next day)
-- ( not recommended! )
CREATE TABLE data (
fields jsonb
);
CREATE INDEX data_pk
ON data ((fields->>'id'));
JSON enthusiast’s table:
(2 days later)
CREATE TABLE foo(
fields jsonb
);
CREATE UNIQUE INDEX data_pk
ON foo((fields->>'id'));
CREATE UNIQUE INDEX data_pki
ON foo (((fields->>'id')::int));
-- which is better?
INSERT INTO foo
SELECT jsonb_build_object('q', random(), 'id', x)
FROM generate_series(10000000,20000000) x;
Storing all fields in a single JSONB
Part 1. Storing the primary key in JSON
( not recommended! )
1. Impossible to define referential constraints:
CREATE TABLE data_smth (
data_id int4 REFERENCES ……… ???? WTF
);
2. Do not forget to check it for NOT NULL:
CHECK (fields->>'id' IS NOT NULL);
3. Still it is not a PK. Define IDENTITY for logical
replication…
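A sketch of the shape the talk recommends instead: promote the identifier to a real column and keep only the genuinely polymorphic data in jsonb:

```sql
CREATE TABLE data (
    id     int PRIMARY KEY,  -- real PK: NOT NULL, unique, replica identity for free
    fields jsonb
);

CREATE TABLE data_smth (
    data_id int REFERENCES data(id)  -- referential constraint now possible
);
```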
Storing all fields in a single JSONB
Part 2. Storing all other data in JSON
( not recommended! )
• Needs more space (keys + JSONB overhead), though not always!
• Slower retrieval (especially with TOAST)
• Index scan: roughly the same performance; Seq scan is 2-3 times
slower even without TOAST
• No statistics: planning is less effective.
• No data types. Even dates are missing!
• Less readability (sometimes)
A good article by Dan Robinson.
How to save space with JSON ?
• Merge several rows into one row
• Which rows to merge?
• Close in space-time, or semantically
• Example: part of a time series.
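For instance (all table and column names here are hypothetical), per-measurement rows can be rolled up into one row per sensor per hour, with the points packed into a jsonb array:

```sql
-- assumed source: measurements(sensor_id int, ts timestamptz, value float8)
SELECT sensor_id,
       date_trunc('hour', ts) AS hour,
       jsonb_agg(jsonb_build_object('ts', ts, 'value', value)
                 ORDER BY ts) AS points
FROM measurements
GROUP BY sensor_id, date_trunc('hour', ts);
```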
Which fields to store in JSON?
• Really polymorphic data
• Better to store schema metadata too
• Use something to restrict from chaos
• Data with internal structure
• Especially if you plan to use JSQuery or JSONPATH
• If you need to store history, regardless of the schema
changes
Polymorphic structure with versions
CREATE TABLE foo (
id int primary key,
subclass … ,
schema_version …
/*
SUPERCLASS_FIELDS
*/
subclass_fields jsonb
);
CREATE TRIGGER …
-- ensure schema
[Diagram: table foo stores Subclass1, Subclass2, Subclass3 rows;
metadata tables subclasses and subclass_versions describe the expected
subclass_fields]
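One possible sketch of the schema-ensuring trigger (the metadata table subclass_versions and its required_keys column are hypothetical):

```sql
CREATE FUNCTION check_subclass_fields() RETURNS trigger
LANGUAGE plpgsql AS $$
DECLARE
    missing text;
BEGIN
    -- assumed metadata: subclass_versions(subclass, schema_version, required_keys text[])
    SELECT k INTO missing
    FROM subclass_versions v, unnest(v.required_keys) k
    WHERE v.subclass = NEW.subclass
      AND v.schema_version = NEW.schema_version
      AND NOT NEW.subclass_fields ? k   -- jsonb "key exists" operator
    LIMIT 1;
    IF missing IS NOT NULL THEN
        RAISE EXCEPTION 'subclass %: missing key %', NEW.subclass, missing;
    END IF;
    RETURN NEW;
END $$;

CREATE TRIGGER ensure_schema BEFORE INSERT OR UPDATE ON foo
FOR EACH ROW EXECUTE FUNCTION check_subclass_fields();
```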
Fields with internal structure
• Image
{
"file_name": "...",
"file_path": "...",
"width": ...,
"height": ...
}
• Query plan
a big tree returned by
EXPLAIN (FORMAT JSON)
Obtaining and storing query plans
• Cannot obtain it in pure SQL: SELECT FROM (EXPLAIN …) does not work
• Also impossible: EXPLAIN INTO …
• We need to write a function, like this:
CREATE FUNCTION explain (query text)
RETURNS jsonb
LANGUAGE plperl TRANSFORM FOR TYPE jsonb AS
$$
my ($sql) = @_;
my $res = spi_exec_query(
"EXPLAIN(FORMAT JSON) $sql", 1);
return $res->{rows}[0]->{"QUERY PLAN"};
$$;
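Once defined, a stored plan is just jsonb and can be queried like any other document; for example, extracting the top node type of a plan:

```sql
-- EXPLAIN (FORMAT JSON) yields an array with one element holding the "Plan" key
SELECT explain('SELECT 1') -> 0 -> 'Plan' ->> 'Node Type';
```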
JSON aggregates
• Problem: Obtain a book list together with authors
[Diagram: book -< authorship (pos int) >- person]
Field Theory // Landau, L.D., Lifshitz, E.M.
The C Programming Language // Kernighan, B.W., Ritchie, D.M.
JSON aggregates
Field Theory
Landau, L.D.; Lifshitz, E.M.
The C Programming Language
Kernighan, B.W.; Ritchie, D.M.
JSON aggregates
SELECT book.*,
( SELECT json_agg(row_to_json(x))
FROM (
SELECT person.*
FROM person
JOIN authorship
ON person.id = person_id
WHERE book_id = book.id
ORDER BY pos
) x
) authors
FROM book;
More JSON aggregates
• Expanding attributes from referenced tables
[Diagram: book references publisher and series]
SELECT book.*,
row_to_json(series) AS series,
row_to_json(publisher) AS publisher
FROM book
LEFT JOIN publisher
ON publisher.id = book.publisher
LEFT JOIN series
ON series.id = book.series
More complex JSON aggregates
• Retrieve a book with hierarchical classification
CREATE TABLE section ( /* Section tree */
id int PRIMARY KEY,
parent int REFERENCES section(id),
title text
);
CREATE TABLE book ( /* Books */
…
sections int[]
…
);
More complex JSON aggregates
• Retrieve a book with hierarchical classification
Output:
{ sections: [ /* paths of all sections of this book */
[ {id: ..., title: ...},
{id: ...} …
], -- 1st section path
[ {id: ...,
...
] -- 2nd section path
...
] }
Test data
INSERT INTO section VALUES
(1, NULL, 'DBMS'),
(2, 1, 'PostgreSQL'),
(3, NULL, 'OS'),
(4, 3, 'Solaris');
INSERT INTO book VALUES (
'Postgres for Solaris OS',
ARRAY[2,4]
);
Step 1: compute the section path
WITH RECURSIVE path AS (
SELECT s.id, s.title, s.parent , 0 AS level
FROM section s WHERE s.id = 4
UNION ALL
SELECT s.id, s.title, s.parent,
path.level+1 AS level
FROM section s
JOIN path ON s.id = path.parent
)
SELECT id, title FROM path ORDER BY level DESC;
Step 2: aggregate the section path
SELECT json_agg(row_to_json(path)) AS path
FROM ( WITH RECURSIVE path AS (
SELECT s.id, s.title, s.parent , 0 AS level
FROM section s WHERE s.id = 4
UNION ALL
SELECT s.id, s.title, s.parent, path.level+1 AS level
FROM section s JOIN path ON s.id = path.parent
) SELECT id,title FROM path ORDER BY level DESC
) path;
path
-----------------------------------------------------
[{"id":3,"title":"OS"}, {"id":4,"title":"Solaris"}]
(1 row)
Step 3: aggregate several paths
SELECT * FROM (
SELECT unnest(ARRAY[2,4]) node_id
) nodes
JOIN LATERAL ( SELECT json_agg(row_to_json(path)) AS path
FROM (
WITH RECURSIVE path AS (
SELECT s.id, s.title, s.parent, 0 AS level
FROM section s WHERE s.id = node_id
UNION ALL
SELECT s.id, s.title, s.parent, path.level+1 AS level
FROM section s JOIN path ON s.id = path.parent
) SELECT id,title FROM path ORDER BY level DESC
) path
) AS path ON true;
node_id | path
---------+----------------------------------------------------------
2 | [{"id":1,"title":"DBMS"}, {"id":2,"title":"PostgreSQL"}]
4 | [{"id":3,"title":"OS"}, {"id":4,"title":"Solaris"}]
(2 rows)
Step 4: final result
SELECT title, (
SELECT json_agg(path.path)
FROM ( SELECT unnest(book.sections) node_id
) nodes
JOIN LATERAL (
SELECT json_agg(row_to_json(path)) path
FROM (
WITH RECURSIVE path AS (
SELECT s.id, s.title, s.parent , 0 AS level
FROM section s WHERE s.id = node_id
UNION ALL
SELECT s.id, s.title, s.parent, path.level+1 AS level
FROM section s JOIN path ON s.id = path.parent
)
SELECT id,title FROM path ORDER BY level DESC
) path
) path ON true
) paths FROM book;
The output
-[ RECORD 1 ]-----------------------------------------------
title | Postgres for Solaris OS
paths |
[
[{"id":1,"title":"DBMS"}, {"id":2,"title":"PostgreSQL"}],
[{"id":3,"title":"OS"}, {"id":4,"title":"Solaris"}]
]
Disclaimer:
• The more complex the query is,
the sicker the optimizer gets
Extracting too
much data
increases your
carbon footprint
Several queries in one
Problem: return several result sets as one.
SELECT
( SELECT json_agg(row_to_json(book)) books
FROM book )::jsonb ||
( SELECT json_agg(row_to_json(section)) sections
FROM section)::jsonb;
or
SELECT json_build_object(
'books',
(SELECT json_agg(row_to_json(book)) books FROM
book ),
'sections',
(SELECT json_agg(row_to_json(section)) sections
FROM section));
Make a tree
Problem: build a multi-level hierarchical tree in JSON:
[{ id: 1, title: 'DBMS', nodes: [
{ id: 2, title: 'Relational DBMS',
nodes: [
……
etc.
• Of course we need a recursive CTE, but…
– a CTE adds rows, while we need to reduce the number of rows…
– Wow! Only aggregates do that! We need an aggregate.
Make a tree: plan
1) Preprocess the tree with CTE
2) Make a smart aggregate, which puts each node on its place
in the tree.
Step 1: Compute node paths WITH RECURSIVE positioned_section AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY parent ORDER BY id) -1 AS pos FROM section ), t AS ( SELECT id,title, ARRAY['nodes', pos::text] AS path FROM positioned_section WHERE parent IS NULL
UNION ALL
SELECT s.id, s.title, t.path || ARRAY['nodes', pos::text] AS path
FROM positioned_section s JOIN t ON s.parent = t.id
)
SELECT row_to_json(t)::jsonb node FROM t ORDER BY path;
Step 1: Result
node
---------------------------------------------------------
{"id": 1, "path": ["nodes", "0"], "title": "DBMS"}
{"id": 2, "path": ["nodes", "0", "nodes", "0"], "title": "PostgreSQL"}
{"id": 3, "path": ["nodes", "1"], "title": "OS"}
{"id": 4, "path": ["nodes", "1", "nodes", "0"], "title": "Solaris"}
(4 rows)
Step 2: Aggregate
CREATE OR REPLACE FUNCTION tree_agg_f (state jsonb, item jsonb)
RETURNS jsonb LANGUAGE sql IMMUTABLE STRICT AS $$
SELECT jsonb_set(
state,
(SELECT array_agg(x)
FROM jsonb_array_elements_text(item->'path') x
),
jsonb_build_object(
'id', item->'id',
'title', item->'title',
'nodes','[]'::jsonb
),
true);
$$ ;
CREATE AGGREGATE tree_agg (item jsonb) (
SFUNC=tree_agg_f, STYPE=jsonb, INITCOND='{"nodes":[]}');
Step 3: 1+2
SELECT jsonb_pretty(tree_agg(x.node)) FROM (
WITH RECURSIVE
positioned_section AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY parent ORDER BY id)
-1 AS pos
FROM section
),
t AS (
SELECT id,title,
ARRAY['nodes', pos::text] AS path
FROM positioned_section WHERE parent IS NULL
UNION ALL
SELECT s.id, s.title,
t.path || ARRAY['nodes',pos::text] AS path
FROM positioned_section s JOIN t ON s.parent = t.id
)
SELECT row_to_json(t)::jsonb node FROM t ORDER BY path)
x;
One more example: Histogram
A single aggregate for calculating complex statistics.
For example:
Compute the frequencies of random integers from 0 to 20.
Obtaining random numbers:
SELECT (random()*20)::int
FROM generate_series(1,20);
Aggregate for frequencies
CREATE OR REPLACE FUNCTION freq_agg_f (
state jsonb, item int)
RETURNS jsonb LANGUAGE sql IMMUTABLE STRICT AS
$$
SELECT jsonb_set(
state,
ARRAY[item]::text[],
to_jsonb(
COALESCE((state->>(item::text))::int,0)+1
)
);
$$;
CREATE AGGREGATE freq_agg (int) (
SFUNC=freq_agg_f, STYPE=jsonb, INITCOND='{}');
The result
SELECT freq_agg ( (random()*20)::int ) FROM
generate_series(1,20);
freq_agg
----------------------------------------------------
{"2": 1, "3": 2, "5": 1, "6": 2, "10": 2, "11": 2,
"13": 1, "15": 3, "16": 3, "17": 1, "18": 2}
(1 row)
JSON in PL/*
SQL and PL/pgSQL are not always effective with JSON:
• The code is not always readable
• The object is copied for every operation
What about PL/Perl, PL/Python, PL/v8 ?
• PL/Perl and PL/Python get JSON as text, unless
TRANSFORM is explicitly used
• PL/v8 gets JSON transformed into internal
representation automatically
• PL/v8 is not included in PostgreSQL distribution and has
a version lag
Aggregate function in PL/v8
CREATE EXTENSION plv8;
CREATE FUNCTION v8_agg_f (state jsonb, item int)
RETURNS jsonb AS $$
state[item] = (state[item] || 0) + 1; // plain state[item]++ on undefined yields NaN
return state;
$$
LANGUAGE plv8 IMMUTABLE STRICT;
CREATE AGGREGATE v8_agg (int)
(SFUNC=v8_agg_f, STYPE=jsonb,
INITCOND='{}');
-- 4 times faster!
Aggregate function in PL/Perl
CREATE EXTENSION plperl;
CREATE EXTENSION jsonb_plperl; -- defines TRANSFORM
CREATE FUNCTION pl_agg_f (state jsonb, item int)
RETURNS jsonb AS $$
my ($state, $item) = @_;
$state->{$item} ++;
return $state;
$$
LANGUAGE plperl TRANSFORM FOR TYPE jsonb
IMMUTABLE STRICT;
CREATE AGGREGATE pl_agg (int)( SFUNC=pl_agg_f, STYPE=jsonb,
INITCOND='{}');
-- 2.5 times faster than SQL
For a snack: a NULL pitfall
• The jsonb_set function is STRICT
SELECT jsonb_set (some_large_json, path, NULL)
gives NULL. Usually people do not expect this.
Correct way:
SELECT jsonb_set (some_large_json, path,
'null'::jsonb)
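A quick demonstration of the pitfall on a small document:

```sql
SELECT jsonb_set('{"a": 1}'::jsonb, '{a}', NULL);          -- result is NULL
SELECT jsonb_set('{"a": 1}'::jsonb, '{a}', 'null'::jsonb); -- {"a": null}
```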
Future
• Single data type for JSON
• Storage optimization (TOAST modification, compression, etc)
• Lazy transform
• Faster search inside
• JSONPATH search engine optimization
More info
• Rethinking JSONB (PgCon 2015)
• JSON, JSONB, JSQuery (PgConf.Russia 2017)
• PL/{Perl,Python,V8}
• JSONB by example (PgConf.EU 2019)
• JSONPATH manual
Thanks for your attention
Questions: [email protected]