stephen buxton | data integration - a multi-model approach - documents and triples

22
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Stephen Buxton, Senior Director, Product Management, MarkLogic When to Use Documents vs Triples

Upload: semanticsconference

Post on 07-Jan-2017

32 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Stephen Buxton, Senior Director, Product Management, MarkLogic

When to Use Documents vs Triples

Page 2: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 2

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

NoSQL

KEY-VALUE

COLUMN

DOCUMENT

GRAPH

PROPERTY GRAPHS

TRIPLE STORES

NoSQL

Page 3: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 3

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

NoSQL

KEY-VALUE

COLUMN

DOCUMENT

GRAPH

PROPERTY GRAPHS

TRIPLE STORES

NoSQL

A Database That Integrates Data Better, Faster, with Less Cost

Page 4: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 4

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Leading Organizations Using MarkLogic Semantics

Intelligent Search Semantic Metadata Hub Dynamic Semantic Publishing Recommendation Engines Compliance

EntertainmentCompany

PharmaceuticalCompany

Page 5: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 5

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Relational Databases

Table PROs Natural way to model strictly-tabular data Mature technology with rich eco-system

Ph_ID Cus_ID Type Number4001 2001 Home 555-6789

4002 2001 Cell 555-7238

4003 2002 Home 137-2859

4004 2003 Home 189-2212

4005 2003 Cell 199-2312

4006 2003 Office 444-1898

4007 2003 Main 199-2312

CONs Real-world entities require complex modeling up-front Brittle: changes require adding columns and tables No inherent semantics

Page 6: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 6

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Document Databases

Document PROs Natural way to model entities Schema is flexible within/across documents Self-describing Query and Search immediately Handles hierarchical data Handles repeating elements Handles sparse data Joins can be denormalized away

{ “ID” : 1001 ,“Fname” : “Paul” ,“Lname” : “Jackson” ,“Phone” : “415-555-1212” ,“SSN” : “123-45-6789” ,“Addr” : “123 Avenue Road” ,“City” : “San Francisco” ,“State” : “CA” ,“Zip” : 94111

}

Page 7: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 7

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Graph Databases – Triple Stores

Graph PROs A triple defines a relationship

Entity->Entity Entity->Concept Concept->Value

Triples come together to form Graphs Graphs can be easily shared, combined Graphs can be traversed Can infer new triples using definitions (rules)

Page 8: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 8

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Hybrid Documents + Triple Store

HybridPROs PROs of a Document Store PROs of a Triple Store Combination: Documents with Semantic context

Define the semantics of your data Richer search through context and facts

Combination: Triples with Document context Arbitrary annotation of Triples Metadata, provenance, temporal, etc.

Rich queries over rich data Fast, iterative development Query through a SQL lens where appropriate

{ “ID” : 1001 ,“Fname” : “Paul” ,“Lname” : “Jackson” ,“Phone” : “415-555-1212” ,“SSN” : “123-45-6789” ,“Addr” : “123 Avenue Road” ,“City” : “San Francisco” ,“State” : “CA” ,“Zip” : 94111

}

Page 9: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 9

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Sidetrack – Documents and Data

Title

DateBody

Section

SectionSection

Article

Abstract

Paragraph

Paragraph

Paragraph

Type

DateParties

Seller

BuyerChannel

Trade

Amount

PaidBy

Affiliation

Name

Page 10: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

TRIPLES AND DOCUMENTS

Page 11: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 11

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Triples Alongside Documents

User1

rank

Senior Manager

Geneva

basedIn

Compliance Officer

role

High risk personApp1

runsOn

Cluster1TopSecret

requires

Database1

accesses

runs

Page 12: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 12

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Show me documents that mention App1 (or its dependencies)

… and "trades" or "markets" … that were valid yesterday afternoon … that were produced near HQ see Intelligent Search, Infobox

Show me instructions to access App1 App1 user guide How to get TopSecret access Scope of Database1 see Dynamic Semantic Publishing

Triples Alongside Documents

Page 13: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 13

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Documents as Part of the Graph

User1

rank

Senior Manager

Geneva

basedIn

Compliance Officer

role

High risk person

App1

runsOn

Cluster1TopSecret

requires

Database1

accesses

runs

deep dive

license

user guidetutorialMovie

order

Page 14: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 14

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Document as opaque object Show me all the instructional

documents related to App1 Search inside the document

Show me all the applications that managers use that expire in the next 6 months

Documents as Part of the Graph

Page 15: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 15

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Triples About Documents – Extended Metadata

User1

rank

Senior Manager

Geneva

basedIn

Compliance Officer

role

High risk person

App1

runsOn

Cluster1TopSecret

requires

Database1

accesses

runsorder

format

JSON

English

Delaware

2016-12-31

jurisdiction

expires

Ts and Cs

language

Page 16: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 16

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Triples are a natural way to represent metadata about documents

Extended because that metadata is part of the graph

Example: show me all orders for a TopSecret app that will expire soon

Triples About Documents – Extended Metadata

Page 17: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 19

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Data Integration: Dirty data Show me license documents from

vendor Acme Data Integration: Overlapping data

Show me all assets from vendor Acme

Triples About Documents

Page 18: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 23

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Triples as part of a document Embed triples in a document

Triples and document have the same security, transactions, backup, temporality, …

Annotate triples in an entirely generic way (XML or JSON) Provenance Confidence Bitemporal

Query across triples and documents in the same query SPARQL, restrict result to some source, confidence range, bitemporal range Search, restrict result to documents that contain some facts or metadata

Page 19: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SUMMARY

Page 20: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 25

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Use Triples when you want to …

Data Documents Triples

Store and query hundreds of billions of facts and relationships

Explore a graph

Visualize a graph

Leverage standards: data + query

Infer new information

better insights

simpler data modeling

Semantics of data

integration

Page 21: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 26

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Use Documents when you want to …

Data Documents Triples

Easily store heterogeneous data (transactional data, records, free-text)

Schema-agnostic

modeling freedom

integrate without ETL*

Search flexibility and specificity

Fast app development

Page 22: Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

SLIDE: 27

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Document Store and Triple Store Combined

Data Documents Triples

All the benefits of each, plus:

Docs can contain triples, Triples can annotate docs, Graphs can contain docs

– Faster data integration using semantics as the glue

– Ideal model for reference data, metadata, provenance

– Ability to run really powerful queries

Massive speed and scale

Simplicity of a single unified platform

Enterprise features (security, HA/DR, ACID transactions,…)