postgresql and json with python - przemek lewandowski
TRANSCRIPT
sunscrapers.comYour favored tech partner.
PostgreSQL and JSONwith PythonPrzemek Lewandowski
16.06.2016
sunscrapers.comYour favored tech partner.
1. Whys2. PostgreSQL types3. HStore vs JSON vs JSONB4. SQLAlchemy5. Django
Overview
sunscrapers.comYour favored tech partner.
● Schema-less data● Schema-unknown data● Document storage● User defined schema
Why?
sunscrapers.comYour favored tech partner.
● XML (since 8.2)
● HStore (since 8.2)
● JSON (since 9.2)
● JSONB (since 9.4)
PostgreSQL types
sunscrapers.comYour favored tech partner.
● Only simple key/value pairs● No nesting● Strings only● Indexing● Many functions and operators
HStore
SELECT 'user => "fred", id => 1'::hstore -> 'user' AS user;
user------fred
sunscrapers.comYour favored tech partner.
● Simple validation on input● Stored as text (like XML)● Preserves key order and duplicates● Indexes (only expression index)
JSON
SELECT '[{"a": {"b":{"c": "foo"}}}'::json#>'{a,b}';
?column?-------------{"c": "foo"}
sunscrapers.comYour favored tech partner.
● Full JSON implementation● Binary storage● No key order or duplicate preservation● Fast access operations● Indexing● No date type!
JSONB
SELECT '{"a": {"b": 2, "c": 3}}'::jsonb @> '{"a": {"c": 3}}'::jsonb;
sunscrapers.comYour favored tech partner.
SELECT feed_data.feed_id AS feed_data_feed_id, feed_data.feed AS feed_data_feed, feed_data.data AS feed_data_dataFROM Feed_dataWHERE CAST(feed_data.data ->> 'DISSEMINATION_ID' AS INTEGER) <= 90000 AND (feed_data.data ->> 'ACTION') = 'NEW'ORDER BY feed_data.data -> 'PRICE_NOTATION' DESCLIMIT 100 OFFSET 0
Query example
sunscrapers.comYour favored tech partner.
SQLAlchemy
● PostgreSQL dialect● Close to database● HStore, JSON, JSONB data types and much more
sunscrapers.comYour favored tech partner.
from sqlalchemy.dialects.postgresql import JSONB
# …
class FeedData(db.Model): feed_id = Column(String(), primary_key=True) feed = Column(Enum(*feed_urls.keys(), name='feeds_enum'), nullable=False) data = Column(MutableDict.as_mutable(JSONB), nullable=False)
def __init__(self, feed_id, feed, data): self.feed_id = feed_id self.feed = feed self.data = data
def as_dict(self): data = {field: getattr(self, field) for field in ['feed_id', 'feed']} data.update(self.data) return data
SQLAlchemy ORM example I
sunscrapers.comYour favored tech partner.
def filter_by_date(db_field, dt_from, dt_to): return and_( db_field.cast(Date) >= dt_from, db_field.cast(Date) <= dt_to )
SQLAlchemy ORM example II
sunscrapers.comYour favored tech partner.
Django ORM
● Django introduced PostgreSQL specific model fields● HStoreField since 1.8● JSONField since 1.9 (uses JSONB)● Support for migrations● Third party libs like: django-hstore
sunscrapers.comYour favored tech partner.
More info
● https://www.pgcon.org/2014/schedule/events/684.en.html● https://www.postgresql.org/docs/9.4/static/datatype-json.html● https://docs.djangoproject.com/en/1.9/ref/contrib/postgres/fields/● http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
sunscrapers.comYour favored tech partner.
Thanks!Questions?