Open Source PostgreSQL database as a connector and enabler for traditional, NoSQL and IoT dataflows
Automatica – IT2Industry ForumMunich 2018
Kaarel Moppel – Senior Database ConsultantCybertec Schönig & Schönig GmbH
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
OUR SERVICES
24/7 SUPPORT
TRAINING CONSULTING
DEVELOPMENT
HIGH AVAILABILITY
SETUP
REPLICATION
PERFORMANCE TUNING
BIG DATA
CLOUD
office@ cybertec.at
+43 2622/ 93022-0
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
EXCERPT FROM
REFERENCES
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
LOW SUPPORT COSTS
SCALABILITY
RELIA- BILITY
Intro - Showstoppers for Innovation
● Thinking a.k.a People
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
LOW SUPPORT COSTS
SCALABILITY
RELIA- BILITY
Intro - Showstoppers for Innovation
● Thinking a.k.a People
● Knowledge
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
LOW SUPPORT COSTS
SCALABILITY
RELIA- BILITY
Intro - Showstoppers for Innovation
● Thinking a.k.a People
● Knowledge
● Legacy systems
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
LOW SUPPORT COSTS
SCALABILITY
RELIA- BILITY
Connecting business domain with IT
● Relational
● Graph
● Object
● NoSQL
● Files
Various database systems are #1 way to connect real world (domain) thinking & processes with technology
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
LOW SUPPORT COSTS
SCALABILITYRELIA- BILITY
Middleware for Data Interchange
Central problem of System Integration - how to communicate between various data storage systems.
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
LOW SUPPORT COSTS
SCALABILITYRELIA- BILITY
Middleware for Data Interchange
Nowadays it looks actually even more complex ...
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
LOW SUPPORT COSTS
SCALABILITYRELIA- BILITY
SQL/MED
Are there some solutions to elegantly solve some of the Data Interchange problems?
Meet SQL/MED – an ISO standard for inter-database communication
International standard accepted by the ISO SQL standardization committee. Implemented though sufficient only by a couple of database systems.
● IBM DB2 (since 2008)● PostgreSQL (since 2011)
* Some other popular systems provide also something similar but in proprietary ways and with simplified functionality - restricted mostly to only reading data.
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
Introducing POSTGRESQL
ToSolve / Simplify
A lot of Data Interchange Problems
PostgreSQL RDBMS
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
NOLICENCE COSTS
LOW SUPPORT COSTS
SCALABILITY
EXTENSIVE FUNCTIONALITY
RELIA- BILITY
BENEFITS OF POSTGRESQL
MOST ADVANCEDOPEN SOURCE DATABASE SYSTEM
25 YEARS OF DEVELOPMENT
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
WHO USESPOSTGRESQL?
FINANCE & BANKING
PUBLIC SECTOR
INDUSTRY & MANUFACTURING
WEB TECHNOLOGY
RESEARCH & SCIENCE
AND MANY MORE...
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
INDUSTRY & MANUFACTURING
TRADE
SENSOR TECHNOLOGYTire pressure measurement
Parking sensors Airbag activation
PRODUCTION DATAAutomatization
Quality data Production scheduling
LOCATING OF LIGHTNING
WEATHER TIME SERIESweather-dependent analysis of
service intervalsof wind turbines
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
FINANCE&
BANKING
TIME SERIES ANALYSIS
dynamic credit rating
ANOMALY DETECTION
Identifiying anomalies in transaction patterns
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
PUBLIC SECTOR
ELDAElectronic data exchange with social insurance agencies
PUBLIC STATISTICS
HEALTH DATAEntry & management of health data
OFFICIAL DOCUMENTSPassports and visa applications
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
RESEARCH &SCIENCE
POLLUTANT EMISSIONAdjustment of pollution
emissions during specific Traffic events
BUILDING SITE IMPROVEMENT
Risk evaluation - searching for duds from World War II
RADIOACTIVE EMISSIONDistribution, danger zones
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
SQL/MED in PostgreSQL
In Postgres SQL/MED implementation is also known as Foreign Data Wrappers (FDWs)
● There are ~100+ data sources supported
● Your Enterprise Database product or Web data source
(API) is most probably already covered
● Works best between Postgres databases
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
SQL/MED in PostgreSQL
To give you an idea what’s supported:
● Oracle, MS SQL, MySQL / MariaDB, Sybase, ODBC
● MongoDB, Cassandra, Neo4j, Redis, Riak
● Hadoop, Hive, Elasticsearch, CouchDB, HBase
● WWW, LDAP, AWS S3, Twitter, IMAP, Open Street Map
● Flat files – CSV, Zip-files & Co
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
SQL/MED in PostgreSQL
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
SQL/MED in PostgreSQL
●Futureproof - ISO standard
Benefits
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
SQL/MED in PostgreSQL
●Futureproof - ISO standard
●Simpler - declarative usage vs imperative step-by-
step data loading / duplication
Benefits
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
SQL/MED in PostgreSQL
●Futureproof - ISO standard
●Simpler - declarative usage vs imperative data
loading / duplication
●Faster – „take what you need” principle, less waste
Benefits
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
SQL/MED in PostgreSQL
●Futureproof - ISO standard
●Simpler - declarative usage vs imperative data
loading / duplication
●Faster – „take what you need” principle, less waste
●Extensible – possibility to add support for your own
data source / format
Benefits
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
SQL/MED in PostgreSQL
CREATE EXTENSION oracle_fdw;
CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '//dbserver.mydomain.com:1521/ORADB');
CREATE USER MAPPING FOR public SERVER oradb OPTIONS (user 'orauser', password 'orapwd');
CREATE FOREIGN TABLE oratab ( id integer OPTIONS (key 'true') NOT NULL, my_data character varying(30) ) SERVER oradb OPTIONS (schema 'ORAUSER', table 'ORATAB');
How it looks in code ...
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
Era of „Web/Big Data”, IOT and NoSQL
● Modern data flows usually employ dynamic (no schema)
data formats for storage and interchange.
● Traditionally „old school” relational databases have not
been too good at supporting those new data formats and
associated dynamic software development flows.
● Postgres has been adressing this issue since many
years, being the 1st established RDBMS to include native
support for JSON.
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
Era of „Web/Big Data”, IOT and NoSQL
Also due to open nature of Postgres, the „core” database product can be easily extended via „extensions” to make it fit for all kinds of modern use cases. Some examples include:
●Columnar data storage extension for Big Data
●Time-series ext. for effective IoT data storage
●PostGIS for Geospatial data (kNN)
●GPU leveraging for Big Data with PG-Strom
●Sharding for Big Data with Citus
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
Effective Data Analytics
What can we do with the gathered/accessible (remember, FDW-s) data inside the database? A lot!
●SQL standard features like normal aggregates
●Ordered Set Aggregates
●Hypothetical Set Aggregates ●Window Functions (the OVER clause)
●ROLLUP, CUBE, Grouping Sets
●Custom functions in your favourite programming language!
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
Effective Data Science
Machine Learning there where the data lives – inside the database!
●PL/R – caret, forecast , glmnet
●PL/Python - scikit-learn
●MADlib - classication, regression, clustering
●Your favourite programming language / software library!
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
Effective Data Science
A complex process – so all simplifications are very welcome!
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
Effective Data Science
Sample PL/R code:
CREATE OR REPLACE FUNCTION benford(numarr float8[])RETURNS benford_t AS $$ xcoll <- function(x) {return ((10 * x) / (10 ^ (trunc(log10(x)))))} numarr <- numarr[numarr >= 10] numarr <- xcoll(numarr) actual_mean <- mean(numarr) n <- length(numarr) expected_mean <- (90 / (n * (10 ^ (1/n) - 1))) distorion<-((actual_mean - expected_mean) / expected_mean) z<-(distorion / sd(numarr)) retval<-data.frame(actual_mean,n,expected_mean,distorion,z) return(retval)$$ LANGUAGE plr;
* https://www.joeconway.com/presentations/plr-DWDC-2015.05.pdf
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
Effective Data Science
Sample MADlib code
SELECT madlib.linregr_train( 'houses', 'houses_linregr', 'price', 'ARRAY[1, tax, bath, size]' );
SELECT houses.*, madlib.linregr_predict( ARRAY[1,tax,bath,size], m.coef ) FROM houses, houses_linregr;
* https://madlib.apache.org/docs/v1.10/group__grp__linreg.html
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
Summary
There are lots and lots of proven, industry standard, data interchange and data analysis options available for all sizes of data with the world’s #1 most advanced Open Source database - PostgreSQL!
CYBERTEC
POSTGRESQL
REFERENCES
CONTACT
FIND US @ B4.211
We regularly attend relevant international trade fairs and conferences - both as exhibitor and speaker.
PG Conf USPG Conf EUPG Conf DE
PG Conf APACPG Day IsraelNordic PG DaySwiss PG Day
Prague Developers Day
CEBITDOAG
Postgres Open San Francisco
AGITAutomatica
topsoftNVIDIA GPU
Thank you!
www.cybertec-postgresql.com