Stephen Buxton | Data Integration - A Multi-Model Approach - Documents and Triples
TRANSCRIPT
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Stephen Buxton, Senior Director, Product Management, MarkLogic
When to Use Documents vs Triples
SLIDE: 2
NoSQL
– KEY-VALUE
– COLUMN
– DOCUMENT
– GRAPH
  – PROPERTY GRAPHS
  – TRIPLE STORES
SLIDE: 3
A Database That Integrates Data Better, Faster, with Less Cost
SLIDE: 4
Leading Organizations Using MarkLogic Semantics
– Intelligent Search
– Semantic Metadata Hub
– Dynamic Semantic Publishing
– Recommendation Engines
– Compliance
Entertainment Company
Pharmaceutical Company
SLIDE: 5
Relational Databases
Table PROs
– Natural way to model strictly-tabular data
– Mature technology with rich eco-system
Ph_ID  Cus_ID  Type    Number
4001   2001    Home    555-6789
4002   2001    Cell    555-7238
4003   2002    Home    137-2859
4004   2003    Home    189-2212
4005   2003    Cell    199-2312
4006   2003    Office  444-1898
4007   2003    Main    199-2312
CONs
– Real-world entities require complex modeling up-front
– Brittle: changes require adding columns and tables
– No inherent semantics
SLIDE: 6
Document Databases
Document PROs
– Natural way to model entities
– Schema is flexible within/across documents
– Self-describing
– Query and Search immediately
– Handles hierarchical data
– Handles repeating elements
– Handles sparse data
– Joins can be denormalized away
{
  "ID": 1001,
  "Fname": "Paul",
  "Lname": "Jackson",
  "Phone": "415-555-1212",
  "SSN": "123-45-6789",
  "Addr": "123 Avenue Road",
  "City": "San Francisco",
  "State": "CA",
  "Zip": 94111
}
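A minimal Python sketch of what "flexible schema", "repeating elements", and "sparse data" can mean in practice. This is plain in-memory Python, not MarkLogic's API; the second record and the `values` and `find` helpers are hypothetical and exist only for illustration.

```python
# Two documents with different shapes; no shared schema is declared up front.
docs = [
    {"ID": 1001, "Fname": "Paul", "Lname": "Jackson",
     "Phone": "415-555-1212", "City": "San Francisco", "State": "CA"},
    # Hypothetical second record: repeating phones, no address fields (sparse).
    {"ID": 1002, "Fname": "Ann",
     "Phone": [{"Type": "Home", "Number": "555-6789"},
               {"Type": "Cell", "Number": "555-7238"}]},
]


def values(node):
    """Yield every scalar value found anywhere inside a document."""
    if isinstance(node, dict):
        for child in node.values():
            yield from values(child)
    elif isinstance(node, list):
        for child in node:
            yield from values(child)
    else:
        yield node


def find(documents, value):
    """Return the IDs of documents that contain the value at any depth."""
    return [d["ID"] for d in documents if value in values(d)]
```

Both records can be queried right away, for example `find(docs, "555-7238")`, even though one stores a single phone string and the other a nested list of phones.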
SLIDE: 7
Graph Databases – Triple Stores
Graph PROs
– A triple defines a relationship
  – Entity->Entity
  – Entity->Concept
  – Concept->Value
– Triples come together to form Graphs
– Graphs can be easily shared, combined
– Graphs can be traversed
– Can infer new triples using definitions (rules)
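The points above (triples form a graph, the graph can be traversed, rules can infer new triples) can be sketched in a few lines of Python. The edges and the "uses" rule below are assumptions chosen for illustration; a real triple store would express this with RDF and SPARQL or a rules engine rather than Python sets.

```python
from collections import deque

# Each triple is (subject, predicate, object). The edges are illustrative.
triples = {
    ("App1", "runsOn", "Cluster1"),
    ("App1", "accesses", "Database1"),
    ("User1", "runs", "App1"),
    ("User1", "rank", "Senior Manager"),
}


def neighbors(node, graph):
    """Objects reachable from node in exactly one hop."""
    return {o for s, _, o in graph if s == node}


def reachable(start, graph):
    """Breadth-first traversal: every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, graph):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def infer_uses(graph):
    """Hypothetical rule: if X runs Y and Y accesses Z, then X uses Z."""
    return {
        (x, "uses", z)
        for x, p1, y in graph if p1 == "runs"
        for y2, p2, z in graph if p2 == "accesses" and y2 == y
    }
```

Calling `infer_uses(triples)` yields a triple that was never stored explicitly, which is the sense in which rules "infer new triples".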
SLIDE: 8
Hybrid Documents + Triple Store
Hybrid PROs
– PROs of a Document Store
– PROs of a Triple Store
– Combination: Documents with Semantic context
  – Define the semantics of your data
  – Richer search through context and facts
– Combination: Triples with Document context
  – Arbitrary annotation of Triples
  – Metadata, provenance, temporal, etc.
– Rich queries over rich data
– Fast, iterative development
– Query through a SQL lens where appropriate
{
  "ID": 1001,
  "Fname": "Paul",
  "Lname": "Jackson",
  "Phone": "415-555-1212",
  "SSN": "123-45-6789",
  "Addr": "123 Avenue Road",
  "City": "San Francisco",
  "State": "CA",
  "Zip": 94111
}
SLIDE: 9
Sidetrack – Documents and Data
[Figure: two example hierarchical documents, an Article and a Trade. Element labels shown: Title, Date, Body, Section, Abstract, Paragraph; Type, Date, Parties, Seller, Buyer, Channel, Amount, PaidBy, Affiliation, Name.]
TRIPLES AND DOCUMENTS
SLIDE: 11
Triples Alongside Documents
[Figure: graph of triples. Node labels: User1, Senior Manager, Geneva, Compliance Officer, High risk person, App1, Cluster1, TopSecret, Database1. Edge labels: rank, basedIn, role, runsOn, requires, accesses, runs.]
SLIDE: 12
Triples Alongside Documents
Show me documents that mention App1 (or its dependencies)
– … and "trades" or "markets"
– … that were valid yesterday afternoon
– … that were produced near HQ
– see Intelligent Search, Infobox
Show me instructions to access App1
– App1 user guide
– How to get TopSecret access
– Scope of Database1
– see Dynamic Semantic Publishing
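As a rough sketch of the first query on this slide, the Python below expands "App1" to its direct dependencies using triples and then runs a plain text match over documents. The triples, document texts, and the `dependencies` and `search` helpers are all hypothetical; it only follows one hop and uses substring matching, which a real search engine would replace with proper indexing.

```python
# Illustrative triples; the exact edges are assumed from the slide's labels.
triples = [
    ("App1", "runsOn", "Cluster1"),
    ("App1", "accesses", "Database1"),
    ("App1", "requires", "TopSecret"),
]

# Hypothetical documents keyed by URI.
documents = {
    "doc1": "Outage report for Cluster1 affecting trades",
    "doc2": "Database1 scope and retention policy",
    "doc3": "Cafeteria menu for next week",
    "doc4": "App1 release notes covering markets data",
}


def dependencies(entity, graph):
    """Direct (one-hop) dependencies of an entity."""
    return {o for s, _, o in graph if s == entity}


def search(entity, graph, docs, any_words=()):
    """URIs of documents mentioning the entity or one of its dependencies.

    If any_words is given, the document must also contain at least one of them.
    """
    terms = {entity} | dependencies(entity, graph)
    hits = []
    for uri, text in sorted(docs.items()):
        mentions = any(term in text for term in terms)
        has_word = not any_words or any(word in text for word in any_words)
        if mentions and has_word:
            hits.append(uri)
    return hits
```

`search("App1", triples, documents, any_words=("trades", "markets"))` corresponds to the first refinement on the slide; the temporal and geospatial refinements are not modeled here.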
SLIDE: 13
Documents as Part of the Graph
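[Figure: graph of triples in which documents appear as nodes. Node labels: User1, Senior Manager, Geneva, Compliance Officer, High risk person, App1, Cluster1, TopSecret, Database1. Edge labels: rank, basedIn, role, runsOn, requires, accesses, runs. Document-related labels: deep dive, license, user guide, tutorial, Movie, order.]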
SLIDE: 14
Documents as Part of the Graph
Document as opaque object
– Show me all the instructional documents related to App1
Search inside the document
– Show me all the applications that managers use that expire in the next 6 months
SLIDE: 15
Triples About Documents – Extended Metadata
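[Figure: graph of triples with an order document described by metadata triples. Node labels: User1, Senior Manager, Geneva, Compliance Officer, High risk person, App1, Cluster1, TopSecret, Database1, order. Edge labels: rank, basedIn, role, runsOn, requires, accesses, runs. Metadata labels: format, JSON, language, English, jurisdiction, Delaware, expires, 2016-12-31, Ts and Cs.]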
SLIDE: 16
Triples About Documents – Extended Metadata
– Triples are a natural way to represent metadata about documents
– Extended because that metadata is part of the graph
– Example: show me all orders for a TopSecret app that will expire soon
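The example query (orders for a TopSecret app that will expire soon) can be sketched as a join across metadata triples and graph triples. The `isA` and `orderFor` predicates, the second order, and `App2` are assumptions; the transcript preserves only the labels in the diagram, not how the order links to the app.

```python
from datetime import date, timedelta

triples = [
    ("App1", "requires", "TopSecret"),
    ("order1", "isA", "order"),
    ("order1", "orderFor", "App1"),       # hypothetical predicate
    ("order1", "expires", "2016-12-31"),
    ("order1", "jurisdiction", "Delaware"),
    ("order2", "isA", "order"),
    ("order2", "orderFor", "App2"),       # hypothetical app with no clearance
    ("order2", "expires", "2016-12-15"),
]


def objects(subject, predicate, graph):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def expiring_orders(graph, clearance, today, within_days):
    """Orders for apps requiring `clearance` that expire within the window."""
    cutoff = today + timedelta(days=within_days)
    result = []
    orders = sorted(s for s, p, o in graph if p == "isA" and o == "order")
    for order in orders:
        apps = objects(order, "orderFor", graph)
        secured = any(clearance in objects(app, "requires", graph) for app in apps)
        soon = any(today <= date.fromisoformat(d) <= cutoff
                   for d in objects(order, "expires", graph))
        if secured and soon:
            result.append(order)
    return result
```

The document metadata (`expires`) and the application graph (`requires`) are queried together, which is the sense in which the metadata is "extended".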
SLIDE: 19
Triples About Documents
Data Integration: Dirty data
– Show me license documents from vendor Acme
Data Integration: Overlapping data
– Show me all assets from vendor Acme
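One way to read the "dirty data" case is that the same vendor appears under several spellings, and triples record that those spellings refer to one entity. The sketch below assumes a `sameAs` predicate and invented vendor spellings and documents; none of these names come from the talk.

```python
# Hypothetical alias triples linking variant spellings to a canonical vendor.
triples = [
    ("ACME Inc.", "sameAs", "Acme"),
    ("Acme Corp", "sameAs", "Acme"),
]

# Hypothetical documents loaded from different source systems.
documents = [
    {"uri": "lic1", "kind": "license", "vendor": "ACME Inc."},
    {"uri": "lic2", "kind": "license", "vendor": "Acme Corp"},
    {"uri": "lic3", "kind": "license", "vendor": "Globex"},
    {"uri": "inv1", "kind": "invoice", "vendor": "Acme"},
]


def aliases(canonical, graph):
    """The canonical name plus every name declared sameAs it."""
    return {canonical} | {s for s, p, o in graph if p == "sameAs" and o == canonical}


def by_vendor(canonical, graph, docs, kind=None):
    """URIs of documents from the vendor under any alias, optionally by kind."""
    names = aliases(canonical, graph)
    return [d["uri"] for d in docs
            if d["vendor"] in names and (kind is None or d["kind"] == kind)]
```

`by_vendor("Acme", triples, documents, kind="license")` mirrors the license query, and dropping `kind` mirrors "all assets from vendor Acme". The source documents are left untouched; only the alias triples are added.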
SLIDE: 23
Triples as part of a document
– Embed triples in a document
– Triples and document have the same security, transactions, backup, temporality, …
– Annotate triples in an entirely generic way (XML or JSON)
  – Provenance
  – Confidence
  – Bitemporal
– Query across triples and documents in the same query
  – SPARQL, restrict result to some source, confidence range, bitemporal range
  – Search, restrict result to documents that contain some facts or metadata
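A small Python sketch of triples embedded in JSON-like documents and annotated with provenance and confidence. The field names (`triples`, `source`, `confidence`), the sample reports, and both helper functions are assumptions for illustration and do not reflect MarkLogic's storage format or query API.

```python
# Hypothetical documents, each carrying its own annotated triples.
documents = [
    {
        "uri": "report1",
        "text": "User1 was relocated to Geneva.",
        "triples": [
            {"s": "User1", "p": "basedIn", "o": "Geneva",
             "source": "HR feed", "confidence": 0.9},
        ],
    },
    {
        "uri": "report2",
        "text": "Rumour: User1 may be based in Zurich.",
        "triples": [
            {"s": "User1", "p": "basedIn", "o": "Zurich",
             "source": "chat log", "confidence": 0.3},
        ],
    },
]


def facts(docs, predicate, min_confidence=0.0, source=None):
    """Triple query restricted by the annotations stored beside each triple."""
    return [
        (t["s"], t["p"], t["o"])
        for d in docs for t in d["triples"]
        if t["p"] == predicate
        and t["confidence"] >= min_confidence
        and (source is None or t["source"] == source)
    ]


def docs_with_fact(docs, s, p, o):
    """Document search restricted to documents that contain a given fact."""
    return [d["uri"] for d in docs
            if any((t["s"], t["p"], t["o"]) == (s, p, o) for t in d["triples"])]
```

Because each triple lives inside its document, the same record answers both kinds of question: `facts(...)` filters triples by annotation, and `docs_with_fact(...)` filters documents by the facts they hold.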
SUMMARY
SLIDE: 25
Use Triples when you want to …
Data Documents Triples
Store and query hundreds of billions of facts and relationships
Explore a graph
Visualize a graph
Leverage standards: data + query
Infer new information
better insights
simpler data modeling
Semantics of data
integration
SLIDE: 26
Use Documents when you want to …
Data Documents Triples
Easily store heterogeneous data (transactional data, records, free-text)
Schema-agnostic
modeling freedom
integrate without ETL*
Search flexibility and specificity
Fast app development
SLIDE: 27
Document Store and Triple Store Combined
Data Documents Triples
All the benefits of each, plus:
Docs can contain triples, Triples can annotate docs, Graphs can contain docs
– Faster data integration using semantics as the glue
– Ideal model for reference data, metadata, provenance
– Ability to run really powerful queries
Massive speed and scale
Simplicity of a single unified platform
Enterprise features (security, HA/DR, ACID transactions,…)