enterprise knowledge graphs

Enterprise Knowledge Graphs Sören Auer https://www.eccenca.com

Upload: soeren-auer

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Enterprise knowledge graphs

Enterprise Knowledge Graphs

Sören Auer

https://www.eccenca.com

Page 2: Enterprise knowledge graphs

The three Big Data "V"s – Variety is often neglected

Source: Gesellschaft für Informatik

Page 3: Enterprise knowledge graphs

Linked Data Principles

Addressing the neglected third V (Variety)

1. Use URIs to identify the “things” in your data

2. Use http:// URIs so people (and machines) can look them up on the web

3. When a URI is looked up, return a description of the thing (in RDF format)

4. Include links to related things

http://www.w3.org/DesignIssues/LinkedData.html


[1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
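Principles 2 and 3 are usually realized with HTTP content negotiation: the client asks for an RDF serialization through the Accept header. The sketch below only prepares such a request and does not send it. The helper name build_lookup_request and the choice of DBpedia's Bonn URI are illustrative assumptions, not part of the slides.

```python
from urllib.request import Request

# Registered media types for common RDF serializations.
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "jsonld": "application/ld+json",
}


def build_lookup_request(uri: str, serialization: str = "turtle") -> Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF description."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("principle 2 expects an http(s) URI")
    return Request(uri, headers={"Accept": RDF_MEDIA_TYPES[serialization]})


# Hypothetical lookup of a "thing" identified by a URI.
request = build_lookup_request("http://dbpedia.org/resource/Bonn")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup; whether a given server answers with RDF depends on that server.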

Page 4: Enterprise knowledge graphs

Linked (Open) Data: The RDF Data Model

RDF = Resource Description Framework

[Figure: example RDF graph. DHL (full name "DHL International GmbH") has industry Logistics (labels "Logistik" and "物流") and headquarters Post Tower; Post Tower is located in Bonn and has height 162.5 m.]

Page 5: Enterprise knowledge graphs

RDF Data Model (a bit more technical)

– Graph consists of:
• Resources (identified via URIs)
• Literals: data values with data type (URI) or language (multilinguality integrated)
• Attributes of resources are also URI-identified (from vocabularies)

– Various data sources and vocabularies can be arbitrarily mixed and meshed

– URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/

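[Figure: the example graph with prefixed names, read as triples]

dbp:DHL_International_GmbH foaf:name "DHL International GmbH"^^xsd:string
dbp:DHL_International_GmbH dbo:industry dbp:Logistics
dbp:DHL_International_GmbH ex:headquarters dbp:Post_Tower
dbp:Logistics rdfs:label "Logistik"@de
dbp:Logistics rdfs:label "物流"@zh
dbp:Post_Tower gn:locatedIn dbp:Bonn
dbp:Post_Tower ex:height [ rdf:value "162.5"^^xsd:decimal ; ex:unit unit:Meter ]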
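Prefix expansion is plain string concatenation, which a few lines of Python can illustrate. This is a minimal sketch: only the dbp: mapping is stated on the slide; the dbo:, rdfs: and foaf: namespaces are the commonly used ones, and ex: stands for a made-up example namespace.

```python
# Prefix table. dbp: comes from the slide; ex: is a hypothetical namespace.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/",
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'dbp:Bonn' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


def to_line(subject: str, predicate: str, obj: str) -> str:
    """Render one triple; objects starting with a quote are literals."""
    rendered = obj if obj.startswith('"') else f"<{expand(obj)}>"
    return f"<{expand(subject)}> <{expand(predicate)}> {rendered} ."


triples = [
    ("dbp:DHL_International_GmbH", "dbo:industry", "dbp:Logistics"),
    ("dbp:DHL_International_GmbH", "ex:headquarters", "dbp:Post_Tower"),
    ("dbp:Logistics", "rdfs:label", '"Logistik"@de'),
    ("dbp:Logistics", "rdfs:label", '"物流"@zh'),
]

for triple in triples:
    print(to_line(*triple))
```

Because every resource and property ends up as a full URI, triples from different sources can be concatenated into one graph without renaming anything.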

Page 6: Enterprise knowledge graphs


RDF mediates between different Data Models & bridges between Conceptual and Operational Layers

Tabular/Relational Data

Id   | Title   | Screen
5624 | SmartTV | 104cm
5627 | Tablet  | 21cm

Prod:5624 rdf:type Electronics
Prod:5624 rdfs:label "SmartTV"
Prod:5624 hasScreenSize "104"^^unit:cm
...

Taxonomic/Tree Data

[Figure: tree with root Vehicle and children Car, Bus, Truck]

Vehicle rdf:type owl:Thing
Car rdfs:subClassOf Vehicle
Bus rdfs:subClassOf Vehicle
...

Logical Axioms / Schema

Male rdfs:subClassOf Human
Female rdfs:subClassOf Human
Male owl:disjointWith Female
...
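The tabular case can be sketched in a few lines: each row becomes a subject, each column a property. The function name row_to_triples is hypothetical, and typing every row as Electronics is an assumption (the slide shows the triples for product 5624 only).

```python
rows = [
    {"Id": 5624, "Title": "SmartTV", "Screen": "104cm"},
    {"Id": 5627, "Title": "Tablet", "Screen": "21cm"},
]


def row_to_triples(row: dict, rdf_class: str = "Electronics") -> list:
    """Turn one table row into (subject, predicate, object) triples."""
    subject = f"Prod:{row['Id']}"
    screen = row["Screen"]
    # Split "104cm" into the number and the unit used as datatype.
    size = screen[:-2] if screen.endswith("cm") else screen
    return [
        (subject, "rdf:type", rdf_class),
        (subject, "rdfs:label", f'"{row["Title"]}"'),
        (subject, "hasScreenSize", f'"{size}"^^unit:cm'),
    ]


for row in rows:
    for s, p, o in row_to_triples(row):
        print(s, p, o)
```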

Page 7: Enterprise knowledge graphs

© Fraunhofer · Seite 7

Vocabulary Example

Vocabulary / Schema

Class: Company
Property | Expected type
inIndustry | Industry
fullName | String
headquarter | Building

Class: Building
Property | Expected type
locatedIn | Industry
height | unit:meter

RDF Representation

Company rdf:type rdfs:Class
Building rdf:type rdfs:Class

inIndustry rdf:type rdf:Property
inIndustry rdfs:domain Company
inIndustry rdfs:range Industry

headquarter rdf:type rdf:Property
headquarter rdfs:domain Company
headquarter rdfs:range Building

Instantiation

Visual Representation

[Figure: the DHL / Post Tower example graph. DHL (full name "DHL International GmbH") has industry Logistics (labels "Logistik" and "物流") and headquarters Post Tower; Post Tower is located in Bonn and has height 162.5 m.]

RDF Representation

DHL rdf:type Company
DHL fullName "DHL Int. GmbH"
DHL inIndustry Logistics
DHL headquarter PostTower

PostTower rdf:type Building
PostTower locatedIn dbpedia:Bonn
PostTower height "162.5"^^meter
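The rdfs:domain and rdfs:range statements are more than documentation: under RDFS semantics they let a reasoner derive types for instances. A minimal sketch of that inference follows; the function name infer_types is hypothetical, and the sketch assumes at most one domain and one range per property.

```python
schema = [
    ("inIndustry", "rdfs:domain", "Company"),
    ("inIndustry", "rdfs:range", "Industry"),
    ("headquarter", "rdfs:domain", "Company"),
    ("headquarter", "rdfs:range", "Building"),
]

data = [
    ("DHL", "inIndustry", "Logistics"),
    ("DHL", "headquarter", "PostTower"),
]


def infer_types(schema: list, data: list) -> set:
    """Derive rdf:type triples from rdfs:domain and rdfs:range statements."""
    domains = {p: c for p, kind, c in schema if kind == "rdfs:domain"}
    ranges = {p: c for p, kind, c in schema if kind == "rdfs:range"}
    inferred = set()
    for s, p, o in data:
        if p in domains:  # the subject belongs to the property's domain
            inferred.add((s, "rdf:type", domains[p]))
        if p in ranges:  # the object belongs to the property's range
            inferred.add((o, "rdf:type", ranges[p]))
    return inferred


for triple in sorted(infer_types(schema, data)):
    print(*triple)
```

Here the explicit statement DHL rdf:type Company becomes redundant, and Logistics and PostTower receive types that were never asserted directly.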

Page 8: Enterprise knowledge graphs


Semantic Web Layer Cake 2001

http://www.w3.org/2001/10/03-sww-1/slide7-0.html

• Monolithic, based on XML
• Focus on heavyweight semantics (ontologies, logic, reasoning)

Page 9: Enterprise knowledge graphs


The Semantic Web Layer Cake 2015 – Bridging between Big & Smart Data

[Figure: layer cake. Base: Unicode, URIs. Source formats XML, JSON, CSV, RDB and HTML are lifted to RDF via RDF/XML, JSON-LD, CSV2RDF, R2RML and RDFa. On top of RDF: RDF Data Shapes, RDF-Schema, vocabularies, ontologies, SKOS thesauri, logic, SWRL rules, and SPARQL. Vertical layer: (access control), signature, encryption (HTTPS/CERT/DANE).]

• Lingua Franca of Data integration with many technology interfaces (XML, HTML, JSON, CSV, RDB,…)

• Focus on lightweight vocabularies, rules, thesauri etc.

• Less “invasive”

Page 10: Enterprise knowledge graphs


RDF - the Lingua Franca of Data Integration

• RDF is simple
• We can easily encode and combine all kinds of data models (relational, taxonomic, graphs, object-oriented, …)
• RDF supports distributed data and schema
• We can seamlessly evolve simple semantic representations (vocabularies) to more complex ones (e.g. ontologies)
• Small representational units (URI/IRIs, triples) facilitate mixing and mashing
• RDF can be viewed from many perspectives: facts, graphs, ER, logical axioms, objects
• RDF integrates well with other formalisms: HTML (RDFa), XML (RDF/XML), JSON (JSON-LD), CSV, …
• Linking and referencing between different knowledge bases, systems and platforms facilitates the creation of sustainable data ecosystems
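As an illustration of the JSON integration, the DHL example from the RDF data model slides can be written as a JSON-LD document using nothing but Python's json module. This is a sketch: the selection of properties is an assumption made for brevity, and the context reuses the commonly used DBpedia and FOAF namespaces.

```python
import json

# "@context" declares the prefixes; "@id" marks values that are resources
# rather than plain strings.
document = {
    "@context": {
        "dbp": "http://dbpedia.org/resource/",
        "dbo": "http://dbpedia.org/ontology/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@id": "dbp:DHL_International_GmbH",
    "foaf:name": "DHL International GmbH",
    "dbo:industry": {"@id": "dbp:Logistics"},
}

serialized = json.dumps(document, indent=2)
print(serialized)
```

An application that ignores the @context still sees ordinary JSON, which is what makes this encoding comparatively non-invasive for existing JSON consumers.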


Page 11: Enterprise knowledge graphs

Successful application domains: Linked Data & Semantic Integration

Search Engine Optimization & Web-Commerce
• Schema.org used by >20% of Web sites
• Major search engines exploit semantic descriptions

Pharma, Life Sciences
• Mature, comprehensive vocabularies and ontologies
• Billions of disease, drug, clinical trial descriptions

Digital Libraries
• Many established vocabularies (DublinCore, FRBR, EDM)
• Millions of items aggregated from thousands of memory institutions in Europeana, German Digital Library

Page 12: Enterprise knowledge graphs

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme

IAIS

The Web evolves into a Web of Data

Linked Open Data

Facebook Open Graph

Page 13: Enterprise knowledge graphs

Knowledge Graphs – A definition

• Fabric of concept, class, property, relationships, entity descriptions
• Uses a knowledge representation formalism (typically RDF, RDF-Schema, OWL)
• Holistic knowledge (multi-domain, source, granularity):
  – instance data (ground truth): open (e.g. DBpedia, WikiData), private (e.g. supply chain data), closed data (product models)
  – derived, aggregated data
  – schema data (vocabularies, ontologies)
  – meta-data (e.g. provenance, versioning, documentation, licensing)
  – comprehensive taxonomies to categorize entities
  – links between internal and external data
  – mappings to data stored in other systems and databases

Page 14: Enterprise knowledge graphs

Knowledge Graph Challenges & Opportunities

Knowledge graphs typically cover
• Multiple domains
• Various levels of granularity
• Data from multiple sources
• Various degrees of structure

Challenges
• Quality
• Coherence
• Co-evolution
• Update propagation
• Curation & interaction

Opportunities
• Background knowledge for various applications (e.g. question answering, data integration, machine learning)
• Facilitate intra-organizational data sharing and exchange (data value chains)

Page 15: Enterprise knowledge graphs

Comparison of various enterprise data integration paradigms

Paradigm | Data model | Integr. strategy | Conceptual/operational | Heterogeneous data | Intern./extern. data | No. of sources | Type of integr. | Domain coverage | Semantic repres.
XML Schema | DOM trees | LaV | operational | | | medium | both | medium | high
Data Warehouse | relational | GaV | operational | - | partially | medium | physical | small | medium
Data Lake | various | LaV | operational | | | large | physical | high | medium
MDM | UML | GaV | conceptual | - | - | small | physical | small | medium
PIM / PCS | trees | GaV | operational | partially | partially | - | physical | medium | medium
Enterprise search | document | - | operational | | partially | large | virtual | high | low
EKG | RDF | LaV | both | | | medium | both | high | very high

[1] Michael Galkin, Sören Auer, Simon Scerri: Enterprise Knowledge Graphs: A Survey. Submitted to 37th International Conference on Information Systems. 2016.

Page 16: Enterprise knowledge graphs

Knowledge Graph Technology

Page 17: Enterprise knowledge graphs


Adding a Semantic Layer to Data Lakes

Semantic Data Lake
• central place for model, schema and data historization
• combination of scale-out (cost reduction) and semantics (increased control & flexibility)
• grows incrementally (pay-as-you-go)

[Figure: architecture from inbound to outbound]
• Inbound data sources: Management, Accounting, Marketing, Sales, Support, R&D
• Inbound raw data store
• Data lake (order of magnitude cheaper scalable data store)
• Knowledge graph for relationship definition and meta data
• Mappings: JSON-LD, CSVW, R2RML, XML2RDF
• Outbound and consumption: frontend to access relationship and KPI definition / documentation; frontend to access (ad hoc) reports; outbound data delivery to target systems

© eccenca.com See also https://www.eccenca.com/en/products-corporate-memory.html

Page 18: Enterprise knowledge graphs

W3C R2RML – Relational to RDF Mapping

R2RML: RDB to RDF Mapping Language, W3C Recommendation 27 September 2012

Editors: Souripriya Das, Seema Sundara, Richard Cyganiak

http://www.w3.org/TR/r2rml/

Page 19: Enterprise knowledge graphs


Example R2RML Mapping

Page 20: Enterprise knowledge graphs

SPARQLMap – Mapping RDB 2 RDF

1. Either the resulting RDF knowledge base is materialized in a triple store and
2. subsequently queried using SPARQL,
3. or the materialization step is avoided by dynamically mapping an input SPARQL query into a corresponding SQL query, which yields exactly the same results as the SPARQL query executed against the materialized RDF dump
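The two options can be contrasted on a toy scale with an in-memory SQLite database. This is only a sketch of the idea, not the SPARQLMap algorithm: the product table, the MAPPING dictionary and the single supported pattern (?s predicate "literal") are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(5624, "SmartTV"), (5627, "Tablet")],
)

# Hypothetical mapping: each row of `product` yields one triple
#   Prod:<id> rdfs:label <title>
MAPPING = {"rdfs:label": ("product", "id", "title")}


def materialize(connection) -> list:
    """Options 1 and 2: dump every mapped triple, then query the dump."""
    triples = []
    for predicate, (table, key, column) in MAPPING.items():
        query = f"SELECT {key}, {column} FROM {table}"
        for key_value, value in connection.execute(query):
            triples.append((f"Prod:{key_value}", predicate, value))
    return triples


def rewrite(predicate: str, literal: str) -> tuple:
    """Option 3: translate the pattern `?s predicate "literal"` into SQL."""
    table, key, column = MAPPING[predicate]
    return f"SELECT {key} FROM {table} WHERE {column} = ?", (literal,)


# Question: which subject carries the label "Tablet"?
from_dump = [
    s for s, p, o in materialize(conn) if p == "rdfs:label" and o == "Tablet"
]
sql, params = rewrite("rdfs:label", "Tablet")
from_sql = [f"Prod:{row[0]}" for row in conn.execute(sql, params)]

print(from_dump, from_sql)
```

Both routes return the same subject; the rewriting route never copies the table, so it always reflects the current state of the relational database.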

Page 21: Enterprise knowledge graphs

Example: Sparqlify

• Rationale: Exploit existing formalisms (SQL, SPARQL Construct) as much as possible

• Flexible & versatile mapping language
• Translating one SPARQL query into exactly one efficiently executable SQL query
• Solid theoretical formalization based on SPARQL-relational algebra transformations
• Extremely scalable through elaborate view candidate selection mechanism
• Used to publish 20B triples for LinkedGeoData

[1] Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases.
[2] Unbehauen, Stadler, Auer: Optimizing SPARQL-to-SQL Rewriting. iiWAS 2013
[3] Auer, et al.: Triplify: light-weight linked data publication from relational databases. WWW 2009

[Figure: SPARQL Construct – Bridge – SQL View]

Page 22: Enterprise knowledge graphs


Semantified Big Data Architecture Blueprint

[1] Mami, Scerri, Auer, Vidal: Towards the Semantification of Big Data Technology. DEXA 2016

[Figure: pipeline stages Data sources, Ingestion, Storage, Queries; semantic lifting with mappings]

Storing of semantic and semantified data in Apache Parquet files on HDFS

Page 23: Enterprise knowledge graphs


SEBIDA Implementation Architecture

Page 24: Enterprise knowledge graphs

SEBIDA Evaluation Results

• Loads data faster
• Has quite different query performance characteristics: faster in 5 out of 12 queries, similar performance in 2, slower in 5

Page 25: Enterprise knowledge graphs


VOCOL: COLLABORATIVE VOCABULARY CURATION ENVIRONMENT

Comprehensive Support for Evolving Vocabularies

Page 26: Enterprise knowledge graphs

Industry 4.0: Semantic Models as Bridge between Shop & Office Floor

Page 27: Enterprise knowledge graphs

Semantic Administrative Shell & Reference Architecture for Industry 4.0 (RAMI4.0)

• The Administrative Shell (Verwaltungsschale) provides a digital identity for arbitrary Industry 4.0 components (e.g. sensors, actuators/robots), exposing data covering the whole life-cycle
• The Reference Architecture for Industry 4.0 (RAMI4.0) provides a conceptual framework for implementing comprehensive Industry 4.0 scenarios
• We have implemented both concepts along with a number of IEC and ISO standards in a comprehensive information model ready to be implemented in productive environments

Page 28: Enterprise knowledge graphs

VoCol: Collaborative Development Environment for Vocabularies

[Figure label: Versioning with Git/Bitbucket]

• Integrates a number of tools & services for different aspects of vocabulary development
• Is centered around Git version control (or Bitbucket), thus supporting the branching and merging of vocabularies
• Supports the roundtrip between
  – Schema/vocabulary development
  – Competency questions (expressed in SPARQL)
  – Example data
• Bridges between conceptual models and executable code

http://eis.iai.uni-bonn.de/Projects/VoCol.html

Page 29: Enterprise knowledge graphs


Development based on Git – Version Control

• Git is by now the most widely used version control system. It is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows.
• Git was initially designed and developed in 2005 by Linux kernel developers for Linux kernel development.
• Git is the basis for a variety of open-source or commercial services and products such as:
  – GitHub/Bitbucket: Web-based Git repository hosting services with millions of users
  – GitLab/Gitolite: open-source Web-based Git repository management platforms
  – Since the Team Foundation Server 2013 release, Microsoft has added native support for Git
• Git is easily extensible and can be integrated into arbitrary workflows via Git hooks
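A Git hook is simply an executable placed in .git/hooks. The following is a rough sketch of a pre-commit hook for vocabulary files, not VoCol's actual implementation. The function names are hypothetical, the validity check is a deliberately naive placeholder where a real hook would call a Turtle parser, and the hook reads the working-tree copy of each file rather than the staged blob.

```python
import subprocess
import sys


def turtle_files(paths: list) -> list:
    """Keep only paths that look like Turtle vocabulary files."""
    return [path for path in paths if path.endswith(".ttl")]


def looks_valid(content: bytes) -> bool:
    """Placeholder check: the file is non-empty, valid UTF-8 text."""
    try:
        return bool(content.decode("utf-8").strip())
    except UnicodeDecodeError:
        return False


def staged_paths() -> list:
    """Ask Git for the files added or modified in the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=AM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def main() -> int:
    """Return 0 to allow the commit, 1 to abort it."""
    failed = []
    for path in turtle_files(staged_paths()):
        with open(path, "rb") as handle:
            if not looks_valid(handle.read()):
                failed.append(path)
    for path in failed:
        print(f"rejected: {path}", file=sys.stderr)
    return 1 if failed else 0


# When installed as .git/hooks/pre-commit, the script would end with:
#     sys.exit(main())
```

Git aborts the commit whenever the hook exits with a non-zero status, so a broken vocabulary file never reaches the shared history.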

Page 30: Enterprise knowledge graphs

VoCol Collaborative Vocabulary Development Environment Entry Page

Page 31: Enterprise knowledge graphs

VoCol: Dynamic Documentation

Page 32: Enterprise knowledge graphs


Environment: Dynamic Documentation

Page 33: Enterprise knowledge graphs


VoCol Environment: Dynamic Visualization

Page 34: Enterprise knowledge graphs


VoCol Environment: Analytics

Page 35: Enterprise knowledge graphs

VoCol Environment: Version Control with Git/GitHub/GitLab/Bitbucket

Page 36: Enterprise knowledge graphs

VoCol Environment: Integrated SPARQL Querying, e.g. for checking competency questions

Page 37: Enterprise knowledge graphs

VoColMap Visualization

Page 38: Enterprise knowledge graphs

VoCol Environment: Direct Turtle Editing

Page 39: Enterprise knowledge graphs

VoCol Environment: Vocabulary Evolution Report

Page 40: Enterprise knowledge graphs


INDUSTRIAL DATA SPACE

Page 41: Enterprise knowledge graphs

Vocabulary-based Integration facilitates Data-driven Businesses

Vocabulary

Page 42: Enterprise knowledge graphs

The work on the Industrial Data Space is complementary to and closely interlinked with the Plattform Industrie 4.0

[Figure: Retail 4.0, Bank 4.0, Insurance 4.0, …, Industrie 4.0 (focus on the manufacturing industry); Smart Services; Transmission, Networks; Real-time systems; Industrial Data Space (focus on data); Data]

Page 43: Enterprise knowledge graphs

The Industrial Data Space Initiative

• Community of >30 large German and European companies
• Pre-competitive, publicly funded innovation project involving 11 Fraunhofer institutes for developing the IDS reference architecture

[Figure: current members of the Industrial Data Space Association]

Page 44: Enterprise knowledge graphs

Images: © Fotolia / Francesco De Paoli, Nmedia, hakandogu

Semantic Data Linking for Enterprise Data Value Chains

Aspect | Data Lake | Industrial Data Space | Pure Internet
Character | centralized, monopolistic | federated, secure, "trusted", standard-based | completely decentralized, open, insecure
Data management | Central repository | Decentralized | Decentralized
Data ownership | Central | Decentralized | Decentralized
Data linking | Single provider | Federated, on demand | Missing
Data security | Bilateral | Certified system | Bilateral
Market structure | Central provider | Role system | Unstructured
Transport infrastructure | Internet | Internet | Internet

Page 45: Enterprise knowledge graphs

Images: © Fotolia 77260795 ∙ 73040142 58947296 ∙ 68898041

Basic principles of the Industrial Data Space

• Linked Light Semantics
• Security with Industrial Data Container
• Certified Roles
• On Demand Interlinking

Page 46: Enterprise knowledge graphs

Image sources: iStockphoto

Industrial Data Space: On Demand Interlinking

[Figure: Enterprises 1 to 6 interlinked on demand through Services A to G]

All data stays with its owner and remains controlled and secured. Data is shared only on request for a service. No central platform.

Page 47: Enterprise knowledge graphs

Industrial Data Space: IDS Architecture Overview

[Figure: Company A and Company B each run an internal IDS connector; external IDS connectors run at third-party cloud providers. Connectors exchange data over the Internet (upload / download / search). Further components: Industrial Data Space Broker, Industrial Data Space App Store; further labels: Apps, Vocabulary, Clearing, Registry, Index.]

Page 48: Enterprise knowledge graphs

Big Data is not just Volume and Velocity

Variety (& Veracity) are key challenges

Linked Data helps in dealing with both
• The Linked Data life-cycle requires integrating and adapting results from a number of disciplines
  – NLP,
  – Machine Learning,
  – Knowledge Representation,
  – Data Management,
  – User Interaction
  – …
• Applications in a number of domains
  – cultural heritage,
  – life sciences,
  – industry 4.0 / cyber-physical systems,
  – smart cities,
  – mobility,
  – …

Linked Data links not only data but also:
• Various disciplines
• Applications and use cases

Page 49: Enterprise knowledge graphs

Creating Knowledge out of Interlinked Data

Thanks for your attention!

Sören Auer

http://www.iai.uni-bonn.de/~auer | http://[email protected]

https://www.eccenca.com