Making Semantic Data Federation Work
DESCRIPTION
Enterprises are drowning in data that they can't find, access, or use. For many years, enterprises have wrestled with the best way to combine all that data into actionable information without building systems that break as schemas evolve. Approaches like warehousing and ETL can be expensive to create and brittle in the face of changing data sources. Data integration at the application level is common, but it results in significant complexity in the code. Data-oriented web services attempt to provide reusable sources of integrated data; however, they have just added another layer of data access that constrains query and access patterns.
This talk will look at how semantic web technologies can be used to make existing data visible and actionable using standards like RDF (data), R2RML (data translation), OWL (schema definition and integration), SPARQL (federated query), and RIF (rules). The semantic web approach takes the data you already have and makes that data available for query and use across your existing data sources. This base capability is an excellent platform for building federated analytics.
TRANSCRIPT
Making Semantic Data Federation Work
by Alex Miller
Data Integration Problems
1. Discovery and description
2. Internal integration
3. External integration
4. Nomadic data
5. Inflexible interfaces
1. Discovery and description
• What data do we have?
• What does it mean?
• Who is creating it?
• Who is using it?
2. Internal integration
• Does your order entity have the same fields as my entity?
• Are your codes for order status the same as my codes for order status?
3. External integration
• Does a public source of information exist?
• How do the entities in the public source relate to the entities in my data?
4. Nomadic data
• Where does your data come from?
• Which version of the data are you using?
• Why does your data not match my data?
5. Inflexible interfaces
• Why can't I see all of my data?
• Why does it take months to expose a new data element in my application?
Results
Data → Information → Action (X)
Semantic Technologies
• Data model - RDF
• Metadata - RDFS/OWL
• Entailment - OWL, RIF
• Relational data - R2RML
• Query - SPARQL
• Federation - SPARQL Protocol, Federation
Semantic Data Source
• SPARQL Protocol (API)
• SPARQL (Query)
• RDFS/OWL (Metadata)
• RDF (Data model)
Relational Access
Semantic Data Source
• SPARQL Protocol
• SPARQL
• RDFS/OWL
• RDF (virtual)
• RDB2RDF
Relational Database
• SQL
Music Database
Musicians:
MID  First  Last       Inst_ID
1    Eddie  Van Halen  10
2    Yo Yo  Ma         20
3    Kenny  G          30
Instruments:
IID  Instrument  Type
10   Guitar      String
20   Cello       String
30   Saxophone   Woodwind
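Musician Schema
Classes (rdf:type rdfs:Class):
• music:Musician
• music:Instrument
Properties (rdf:type rdf:Property):
• music:firstName (rdfs:domain music:Musician)
• music:lastName (rdfs:domain music:Musician)
• music:plays (rdfs:domain music:Musician, rdfs:range music:Instrument)
• music:instName (rdfs:domain music:Instrument)
• music:instType (rdfs:domain music:Instrument)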
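The domain and range declarations in the schema can be read as simple inference rules: using a property implies the type of its subject (domain) and of its object (range). The following is a minimal sketch of that idea, assuming triples are plain tuples of strings. The `SCHEMA` list is one reading of the slide's diagram, and `infer_types` is an illustrative name, not part of any library.

```python
# Schema facts as (subject, predicate, object) tuples.
# This list is an interpretation of the slide's diagram, not an official vocabulary.
SCHEMA = [
    ("music:firstName", "rdfs:domain", "music:Musician"),
    ("music:lastName", "rdfs:domain", "music:Musician"),
    ("music:plays", "rdfs:domain", "music:Musician"),
    ("music:plays", "rdfs:range", "music:Instrument"),
    ("music:instName", "rdfs:domain", "music:Instrument"),
    ("music:instType", "rdfs:domain", "music:Instrument"),
]


def infer_types(data, schema=SCHEMA):
    """Apply the RDFS domain and range rules once; return inferred rdf:type triples."""
    domains = {}
    ranges = {}
    for s, p, o in schema:
        if p == "rdfs:domain":
            domains.setdefault(s, set()).add(o)
        elif p == "rdfs:range":
            ranges.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in data:
        # Domain rule: the subject of property p is an instance of each domain class.
        for cls in domains.get(p, ()):
            inferred.add((s, "rdf:type", cls))
        # Range rule: the object of property p is an instance of each range class.
        for cls in ranges.get(p, ()):
            inferred.add((o, "rdf:type", cls))
    return inferred


result = infer_types([("artist:1", "music:plays", "instrument:10")])
```

A single `music:plays` statement is enough for this sketch to conclude that `artist:1` is a `music:Musician` and `instrument:10` is a `music:Instrument`, even if neither type was stated explicitly.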
Triples From Tables
Turn each key into a resource and specify the proper type of each resource:
artist:1 rdf:type music:Musician
artist:2 rdf:type music:Musician
artist:3 rdf:type music:Musician
instrument:10 rdf:type music:Instrument
instrument:20 rdf:type music:Instrument
instrument:30 rdf:type music:Instrument
Turn each cell into a triple based on the key, property (mapped per column), and value:
artist:1 music:firstName "Eddie"
artist:1 music:lastName "Van Halen"
artist:2 music:firstName "Yo Yo"
artist:2 music:lastName "Ma"
artist:3 music:firstName "Kenny"
artist:3 music:lastName "G"
instrument:10 music:instName "Guitar"
instrument:10 music:instType "String"
instrument:20 music:instName "Cello"
instrument:20 music:instType "String"
instrument:30 music:instName "Saxophone"
instrument:30 music:instType "Woodwind"
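The key-to-resource and cell-to-triple steps can be sketched in a few lines. This is only an illustration, assuming rows are dictionaries keyed by column name and triples are plain tuples; `rows_to_triples` and the `MUSICIAN_COLUMNS` mapping are hypothetical names chosen for this example.

```python
# Rows of the Musicians table from the slides, keyed by column name.
MUSICIANS = [
    {"MID": 1, "First": "Eddie", "Last": "Van Halen", "Inst_ID": 10},
    {"MID": 2, "First": "Yo Yo", "Last": "Ma", "Inst_ID": 20},
    {"MID": 3, "First": "Kenny", "Last": "G", "Inst_ID": 30},
]

# Column-to-property mapping: one property per data column.
MUSICIAN_COLUMNS = {"First": "music:firstName", "Last": "music:lastName"}


def rows_to_triples(rows, key, prefix, rdf_class, columns):
    """Emit one rdf:type triple per row and one literal triple per mapped cell."""
    triples = []
    for row in rows:
        subject = f"{prefix}:{row[key]}"  # the primary key becomes the resource name
        triples.append((subject, "rdf:type", rdf_class))
        for column, prop in columns.items():
            triples.append((subject, prop, row[column]))
    return triples


musician_triples = rows_to_triples(
    MUSICIANS, "MID", "artist", "music:Musician", MUSICIAN_COLUMNS
)
```

The same function would handle the Instruments table with a different key, prefix, class, and column mapping.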
Turn each foreign key reference into a relationship between the foreign and primary resources:
artist:1 music:plays instrument:10
artist:2 music:plays instrument:20
artist:3 music:plays instrument:30
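The foreign key step follows the Inst_ID column of the Musicians table: each (MID, Inst_ID) pair links a musician resource to an instrument resource. A minimal sketch, assuming the same prefix naming as the earlier slides; `foreign_key_triples` is an illustrative helper, not a library function.

```python
# (MID, Inst_ID) pairs taken from the Musicians table on the slides.
MUSICIAN_KEYS = [(1, 10), (2, 20), (3, 30)]


def foreign_key_triples(pairs, subject_prefix, predicate, object_prefix):
    """Link each row's resource to the resource its foreign key points at."""
    return [
        (f"{subject_prefix}:{pk}", predicate, f"{object_prefix}:{fk}")
        for pk, fk in pairs
    ]


plays = foreign_key_triples(MUSICIAN_KEYS, "artist", "music:plays", "instrument")
```

Unlike the cell-level triples, the object here is another resource rather than a literal, which is what lets a query walk from a musician to the instrument's name and type.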
R2RML Triple Mapping
• Database: the Instruments table (IID, Instrument, Type), for example the row 10, Guitar, String
• Domain ontology: music:Instrument, with music:instName and music:instType (each with rdfs:domain music:Instrument)
• R2RML:
  • Triples Map: rr:tableName points at the source table
  • Subject Map: "http://example.com/music/Inst-{iid}", with rr:class pointing at the ontology class
  • Predicate Object Map: a Predicate Map (rr:predicate) pointing at the ontology property, and an Object Map (rr:column) pointing at the source column
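To make the roles of the triples map, subject map, and predicate object maps concrete, here is a small sketch that applies such a mapping to one row. It is not an R2RML processor and the dictionary layout is an assumption made for illustration; only the key names echo the R2RML terms on the slide. The column-to-property pairing follows the triples shown on the earlier slides.

```python
import re

# A plain-Python stand-in for an R2RML triples map (illustrative layout, not R2RML syntax).
INSTRUMENT_MAP = {
    "tableName": "Instruments",
    "subjectMap": {
        "template": "http://example.com/music/Inst-{iid}",
        "class": "music:Instrument",
    },
    "predicateObjectMaps": [
        {"predicate": "music:instName", "column": "Instrument"},
        {"predicate": "music:instType", "column": "Type"},
    ],
}

# A toy "database": table name -> list of rows.
TABLES = {
    "Instruments": [{"IID": 10, "Instrument": "Guitar", "Type": "String"}],
}


def expand_template(template, row):
    """Fill {column} placeholders, matching column names case-insensitively."""
    lowered = {name.lower(): value for name, value in row.items()}
    return re.sub(
        r"\{(\w+)\}", lambda m: str(lowered[m.group(1).lower()]), template
    )


def apply_triples_map(triples_map, tables):
    """Generate triples for every row of the mapped table."""
    triples = []
    for row in tables[triples_map["tableName"]]:
        subject = expand_template(triples_map["subjectMap"]["template"], row)
        triples.append((subject, "rdf:type", triples_map["subjectMap"]["class"]))
        for pom in triples_map["predicateObjectMaps"]:
            triples.append((subject, pom["predicate"], row[pom["column"]]))
    return triples


instrument_triples = apply_triples_map(INSTRUMENT_MAP, TABLES)
```

The mapping is data, not code: exposing a new column means adding one more predicate object entry rather than changing the function that walks the table.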
Registry
• Semantic data sources are self-describing and use a common protocol
• Easy to build into a registry with additional metadata (also described with RDFS/OWL)
Benefits of semantic technology stack
1. Common data model
2. Precise description
3. Uniform access
4. Federation
1. Common data model
• RDF provides common model for both data and descriptions of all kinds
• Very flexible (but also very fine-grained)
2. Precise flexible description
Prefixes:
dbp: http://dbpedia.org/resource/
ex: http://example.org/ontology/
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
xsd: http://www.w3.org/2001/XMLSchema#
Data:
dbp:London rdf:type ex:City
dbp:London ex:cityFounded 47
Description:
ex:City rdf:type rdfs:Class
ex:cityFounded rdf:type rdf:Property
ex:cityFounded rdfs:domain ex:City
ex:cityFounded rdfs:range xsd:gYear
3. Uniform access
• SPARQL 1.1
• SPARQL Protocol
• HTTP
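Uniform access means every source accepts the same kind of request: a SPARQL query sent over HTTP. As a rough sketch, the SPARQL Protocol's query-via-GET form passes the query text in a `query` parameter. The endpoint URL and the `music:` namespace below are placeholders assumed for this example, and the request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol query-via-GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical namespace for the music vocabulary; the slides do not give one.
QUERY = """
PREFIX music: <http://example.com/music/>
SELECT ?first ?last WHERE {
  ?artist music:firstName ?first ;
          music:lastName ?last .
}
"""

# Placeholder endpoint; substitute the address of a real semantic data source.
request = build_sparql_request("http://example.com/sparql", QUERY)
```

Because the request shape is the same for every source, a client written once against this protocol can talk to a relational source behind a mapping layer or to a native RDF store without changes.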
4. Federation
Figure: several Semantic Data Sources queried together, including a Relational Database and DBpedia.
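In SPARQL 1.1, federation is expressed with the `SERVICE` keyword, which sends part of a query to a remote endpoint. The sketch below only assembles such a query as text. The helper names are illustrative, the local pattern reuses the `ex:cityFounded` property from the earlier example, and pairing it with labels from DBpedia's public endpoint is an assumption made for this illustration.

```python
def service(endpoint, pattern):
    """Wrap a graph pattern in a SPARQL 1.1 SERVICE clause for a remote endpoint."""
    return f"  SERVICE <{endpoint}> {{\n    {pattern}\n  }}"


def federated_select(variables, prefixes, local_pattern, remote_clauses):
    """Assemble a SELECT query with one local pattern and any number of SERVICE clauses."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append("SELECT " + " ".join(variables) + " WHERE {")
    lines.append("  " + local_pattern)
    lines.extend(remote_clauses)
    lines.append("}")
    return "\n".join(lines)


query = federated_select(
    ["?city", "?year", "?label"],
    {
        "ex": "http://example.org/ontology/",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    },
    "?city ex:cityFounded ?year .",
    [service("https://dbpedia.org/sparql", "?city rdfs:label ?label .")],
)
```

The shared variable `?city` is what joins the two sources: the local source binds it from its own data, and the remote clause looks up the same resource elsewhere.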
Data Integration Solutions (with semantics)
1. Discovery and description
2. Internal integration
3. External integration
4. Nomadic data
5. Inflexible interfaces
Challenges
• Relating data domains
• Security
• Unconstrained query access
• Federated query optimization
Thanks!
Visit us at http://revelytix.com or at our booth!