road to a multi-model database - pgconf asia · document mongodb, couchbase, marklogic ......
TRANSCRIPT
![Page 1: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/1.jpg)
December 5, 2017
Takayuki TsunakawaFujitsu Limited
Road to a Multi-model Database-- making PostgreSQL the most
popular and versatile database
Copyright 2017 FUJITSU LIMITED0
PGConf.ASIA 2017
![Page 2: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/2.jpg)
Who am I?
Takayuki Tsunakawa PostgreSQL contributor PostgreSQL Enterprise Consortium member
(PostgreSQL Ecosystem Wiki maintainer)
Develop/Maintain/SupportFUJITSU Software Enterprise Postgres
(PostgreSQL-based product) Support open source PostgreSQL in various products
Copyright 2017 FUJITSU LIMITED1
![Page 3: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/3.jpg)
AgendaWhy is multi-model necessary? (background)What is multi-model database?How should we implement it?
Copyright 2017 FUJITSU LIMITED2
![Page 4: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/4.jpg)
Why is multi-model necessary?
Copyright 2017 FUJITSU LIMITED3
![Page 5: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/5.jpg)
Big Data
Copyright 2017 FUJITSU LIMITED
Variety
VelocityVolume
4
![Page 6: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/6.jpg)
Can PostgreSQL Handle Big Data?
Copyright 2017 FUJITSU LIMITED
Variety
Volume Velocity
Key-value modelhstore type
Document modeljsonb type
PartitioningPostgreSQL 10~
ScaleoutPostgres-XL Citus
(fork) (extension)
GPUPG-Strom
(extension)
StreamingPipelineDB
(fork)
In-memory columnarIn developing
Persistent memory, FPGA, SIMD
N/A
5
![Page 7: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/7.jpg)
Developer productivity with flexible data model Can handle various data types as-is (array, list, object, graph, etc.)No need to map to relational model (eliminate ORM)
High scalability Can store and process voluminous data Can handle many requests simultaneously
Fault tolerance
Why NoSQL Attracts Attention?
Copyright 2017 FUJITSU LIMITED
array list object
graph ・・・relational modelMapping
is not needed
voluminous dataapplication
6
![Page 8: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/8.jpg)
Data Models
Copyright 2017 FUJITSU LIMITED
Data model Representative DBMSsRelational Oracle, MySQL, SQL Server, PostgreSQLKey-value Redis, MemcachedDocument MongoDB, CouchBase, MarkLogicGraph Neo4jWide columnar Cassandra, HbaseRDF MarkLogic, Virtuoso, OracleText search Elasticsearch, Apache SolrTime series InfluxDBMulti-dimensional array rasdaman, SciDBEvent Event Store, NEventStoreObject InterSystems Cache
7
![Page 9: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/9.jpg)
Polyglot PersistenceUse multiple DBMSs in one system/application Spread by Martin Fowler
Copyright 2017 FUJITSU LIMITED
Graph
Key-value
Document Wide columnar
Data models in online shopping application
RDBWeb sessionshopping cartuser profile
customerorder
recommendationproduct catalog
Web access log
application
8
![Page 10: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/10.jpg)
Multiple DBMSs Use Leading tech companies use many DBMSs (ex. Netflix)
Copyright 2017 FUJITSU LIMITED
Data model DBMSsRelational MySQL, Redshift
Key-value Memcached, Redis, Hollow (developed by Netflix )
Text search Elasticsearch
Wide columnar Cassandra
Time series Atlas (developed by Netflix )
Event Druid
9
![Page 11: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/11.jpg)
Problems (1/2)Data silo to prevent cross-sectional data analysis Time-consuming and laborious ETL Complex logic in application (fetch, join, aggregation, sort)
Data consistency among DBMSsDistributed transaction is not available in all DBMSs
Infrastructure cost increase due to duplication of data
Copyright 2017 FUJITSU LIMITED
・・・ ・・・
Key-value Graph
RDB Document
10
![Page 12: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/12.jpg)
Problems (2/2)Operational complexity Product/OSS software management, support/service contracts Infrastructure provisioning (server, storage, network)Deployment, patching, testing, configuration, version control Security: user management, access control, encryption, auditingMonitoring and diagnosis, performance tuning, troubleshootingHA: backup/recovery, local failover, disaster recovery
Steep learning curve for developersDBMS-specific non-SQL API and SQL-like query language Transaction control, consistency model, application tuning
Lack of skilled personnel
Copyright 2017 FUJITSU LIMITED
? ?
11
![Page 13: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/13.jpg)
What is multi-model database?
Copyright 2017 FUJITSU LIMITED12
![Page 14: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/14.jpg)
Overview Support multiple data models in one DBMS
Copyright 2017 FUJITSU LIMITED
etc・・・
RDB Graph Key-value Document
application
Very smart!
13
![Page 15: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/15.jpg)
Merits
"All-in-one" is convenient, just like a smartphone
Copyright 2017 FUJITSU LIMITED
Smooth data utilization with less data integration
Higher developer productivity
Lower cost for infrastructure and DBA
14
![Page 16: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/16.jpg)
Multi-model Database Examples
Copyright 2017 FUJITSU LIMITED
DBMS Supported data modelsArangoDB key-value, document, graph
Cosmos DB key-value, document, graph
CouchBase key-value, document
DataStax(on Cassandra) key-value, wide column, graph
MarkLogic document, text/binary, RDF
OrientDB key-value, document, graph, text/binary
15
![Page 17: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/17.jpg)
Trends of Major DBMSsMajor RDBMSs are adding data modelsNoSQL DBMSs are also adding data models
Copyright 2017 FUJITSU LIMITED
DBMS Key-value Document Wide column Graph
Oracle ++ +MySQL ++ +SQL Server + +MongoDB + ++ +PostgreSQL + +
Data model support in top 5 popular DBMSs
16
![Page 18: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/18.jpg)
PostgreSQL as a Multi-model DatabaseWhy based on RDBMS?
Why based on PostgreSQL?
Copyright 2017 FUJITSU LIMITED
Mature storage engine and transaction management Smart optimizer Prevalent RDBMS gives more people the chance to use
RDBMS has
Extensibility as a data platform Liberal community open to niche data models
PostgreSQL has
17
![Page 19: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/19.jpg)
How should we implement multi-model database?
Copyright 2017 FUJITSU LIMITED18
![Page 20: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/20.jpg)
What is Data Model?
Copyright 2017 FUJITSU LIMITED
Data model = Structure + Constraint + OperationData model
Structure
table, row, column
key, value
node, relationship,property, label
Constraint
unique, referential, check, not null, ...
unique
unique, node existence
Operation
scan, join, restriction,
projection, …
get, put
scan, join, restriction, projection,
pattern match, …
Relational
Key-value
Graph
19
![Page 21: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/21.jpg)
Query Language and API Adopt standard and well-known languages/APIs per
data modelDeveloper productivity: leverage skill/know-how/asset Rich information for learning Standard compliance and popularity for ecosystem
Examples
Copyright 2017 FUJITSU LIMITED
Data model languages/APIs Key-value Redis API, Memcached APIDocument SQL/JSON path (SQL standard), MongoDB APIGraph Cypher, GremlinRDF SPARQL (W3C standard)Array SQL/MDA (Multi-Dimensional Array) (future SQL standard)
20
![Page 22: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/22.jpg)
Multi-model Approach 1 Flexible Schema Data (FSD) Leverage RDBMS’s user defined data type, function, and index Store/access data in a table column with functions in SQLUsed for XML, JSON, geospatial data
Copyright 2017 FUJITSU LIMITED
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper5.pdfreference :
Relational Data Flexible SchemaData (FSD)
RDBMSSQLSQL
NoSQL API
application
user defined data type,function, index
21
![Page 23: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/23.jpg)
Multi-model Approach 2 Independent data model componentsQuery language and API for each data modelData is optionally separated from relational dataUse for Graph, RDF, time series, event…
Independence ensures performance for each data model
Copyright 2017 FUJITSU LIMITED
ParserTransformer
PlannerExecutor
Graph Parser
RDF
Que
ry P
roce
ssor
Cypher, GremlinSPARQL
ParserTransformer
PlannerExecutor
SQL
Relational
application
Storage engineStorage engine
22
![Page 24: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/24.jpg)
Examples Based on Approach 2Graph model: AgensGraph (fork) https://github.com/bitnine-oss/agensgraph
Time series model: TimescaleDB (extension) https://github.com/timescale/timescaledb
Copyright 2017 FUJITSU LIMITED23
![Page 25: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/25.jpg)
Pluggable Data ModelWant to facilitate data model development Introduce 3 pluggable objectsQuery language : generate parse tree from query stringData model : generate query plan from parse tree and run it Region : combination of query language and data model
Copyright 2017 FUJITSU LIMITED
Data model as an extension
Querylanguage
Data model
Relational model
PostgreSQL
Querylanguage
Data model
Graph model
Querylanguage
RDF model24
![Page 26: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/26.jpg)
Multi-model QueryMix queries for multiple data models in a query string Execute query in a specified region
in_region(region_name, query string) Convert data across regions
cast_region(source data, dest region name,dest container, dest schema)
Copyright 2017 FUJITSU LIMITED
-- Among Chinese restaurants in Tokyo, -- list up to 5 top ones among friends' friends SELECT r.name, g.num_likers FROM restaurant r,
cast_region(in_region('graph_cypher',
'MATCH (:Person {name:"Taro"})-[:IS_FRIEND_OF*1..2]-(friend),(friend)-[:LIKES]->(restaurant:Restaurant)RETURN restaurant.name, count(*)'),
'relational', 'g', '(name text, num_likers int') WHERE r.name = g.name AND r.city = 'Tokyo' AND r.cuisine = ‘chinese'ORDER BY g.num_likers DESC LIMIT 5;
25
![Page 27: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/27.jpg)
Mixed-model Query Execution
Copyright 2017 FUJITSU LIMITED
Multi-model query plan
relation:table/index scanrestaurant
relation:join
relation:sort
graph:pattern matchIS_FRIEND_OF
graph:node scanPerson
26
![Page 28: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/28.jpg)
Document Model PostgreSQL supports JSON since 2012, but…Different SQL/JSON was standardized in SQL:2016 Store JSON data in character/binary column Intuitive function and SQL/JSON path language Powerful JSON_TABLE function to map JSON to relational data
Support for SQL/JSON is being developed in community
Copyright 2017 FUJITSU LIMITED
SELECT JSON_VALUE(jcol, '$.name') AS name, JSON_QUERY(jcol, '$.skills') AS skills
FROM empWHERE
JSON_EXISTS(jcol, '$.projects[*] ? (@.category == "IoT")');
SELECT jcol ->> 'name' AS name, jcol -> 'skills' AS skills
FROM empWHERE
jcol @> '{ "projects": [{ "category": "IoT" }] }';
Query in current PostgreSQL Query in SQL/JSON
27
![Page 29: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/29.jpg)
Index
Graph Model The key is performance in storage engine RDB is slow to traverse graph due to index scan Eliminate index scan using direct pointers between recordsNode traversal cost drops from O(n) to O(1)
Copyright 2017 FUJITSU LIMITED
Friend Friend
FriendFriend
Native graph
Friend
Jill JackJohn JackJohn JillJack JillJack John
Graph in RDBMS
John Jack Jill
John
Jack
28
![Page 30: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/30.jpg)
Key-value Model PostgreSQL has hstore data type, but Less performant than expectedUnfamiliar API
Solution: Redis in the background workerMaximal performance by bypassing SQL processor Familiar, developer-friendly Redis API
Copyright 2017 FUJITSU LIMITED
SQL processor storage engine
Redis API (get/put)
application
DiskTable
29
![Page 31: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/31.jpg)
Conclusion
Copyright 2017 FUJITSU LIMITED
Multi-model is necessary for broader use of PostgreSQL
PostgreSQL 10
PostgreSQL 11 Build pluggable data model infrastructure Add/Improve popular data models:
key-value, SQL/JSON, graph
PostgreSQL 12 Add other (niche?) data models
30
![Page 32: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/32.jpg)
Let’s do it together! Search “multi-model“ in pgsql-hackers mailing list
Any idea/wish comment as a user is welcome Contact me if inconvenient (Japanese/English OK)
Copyright 2017 FUJITSU LIMITED31
![Page 33: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/33.jpg)
32
![Page 34: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:](https://reader031.vdocuments.us/reader031/viewer/2022011802/5b189a687f8b9a23258be32c/html5/thumbnails/34.jpg)
Copyright 2017 FUJITSU LIMITED
Questions?
33