Who’s Who and What’s What with Oracle Database Semantic Technologies
Xavier Lopez, Director, Product Management, Oracle Server Technologies
Jay Banerjee, Senior Director, Software Development, Oracle Server Technologies
Agenda
• Intro: Oracle Database 11g Semantic Technologies
• Feature Overview
• New features in 11.2
• Performance and scalability evaluation
The purpose of Oracle Semantic Technologies
• To fluidly combine diverse sources of information
– For analysis, mining, reporting, problem solving, …
– Sources may be relational, calendars, email, social networks, spreadsheets, PDF, …
• To enable machines to understand what we mean
– Model machine-recognizable semantics through vocabularies and information derivation rules
• To obtain more semantically rich information from enterprise relational databases
– Natively, in SQL
Adoption of Semantics-enabled Business Applications
• Intelligence, Law Enforcement
– Threat analysis, asset tracking, integrated justice
• Integrated Bioinformatics & Health Care
– Bio-pathway analysis, protein interaction
• Health Care Informatics
– Patient records, reporting, bio-surveillance
• Finance
– Fraud detection, compliance management
• Web and Social Network Solutions
– Recommender, social network analysis, activity analysis
• Media, Games, Content Management
– Media metadata, content re-purposing
Success Factors
• Diverse industries adopting semantic web concepts
– Realizing the need for a new approach to capture semantics
– For flexible data integration
– For knowledge modeling through linked graph-structured data
• Wide acceptance of standards
– SQL, XML, RDF, RDFS, OWL, SPARQL
• Wide availability of Life Sciences ontologies
– SNOMED, UniProt, UMLS, NCI Cancer ontology, etc.
Relational to Ontological Mapping
Courtesy: SenseLab, Yale University
[Figure: ontology graph with the classes Drug, Neuron, PathologicalAgent, Receptor, Channel, Agent, NeuronalProperty, PathologicalChange and Compartment, linked by the properties inhibits, involves, has and is_located_in. Courtesy: SenseLab, Yale University]
Use Case: Integrated Bioinformatics Data
Source: Siderean Software
Case Study: National Intelligence
[Figure: processing pipeline. Web resources (news, email, RSS) and content management systems feed information extraction (categorization, feature/term extraction), producing a processed document collection in RDF/OWL. An ontology engineering modeling process produces a domain-specific knowledge base of OWL ontologies. The analyst explores both through SQL/SPARQL query, with browsing, presentation, reporting and visualization.]
Data Integration in Health Informatics
[Figure: layered integration diagram. Enterprise Information Consumers (EICs): business intelligence, clinical analytics, patient care, workforce management. Run-time metadata held in Oracle Spatial (semantic knowledge base), with the steps Model Physical, Model Virtual, Relate, Deploy and Access. Source systems: HTB, CIS, LIS, HIS.]
Oracle Database 11g Release 1 RDF/OWL Capability
• Oracle 11g is the leading commercial database with native RDF/OWL data management
• Scalable & secure platform for a wide range of semantic applications
• Readily scales to ultra-large repositories (more than 1 billion triples)
• Choice of SQL or SPARQL query
• Leverages Oracle Partitioning; RAC supported
• Growing ecosystem of 3rd-party tool partners
Key Capabilities:
Load / Storage
• Native RDF graph data store
• Manages billions of triples
• Fast batch, bulk and incremental load
Query
• SQL: SEM_MATCH
• SPARQL: via Jena plug-in
• Ontology-assisted query of RDBMS data
Reasoning
• Forward-chaining model
• RDFS++, OWL, OWLPrime
• User-defined rulebase
Semantic Data Management Workflow
Data Sources
• Transaction systems
• Unstructured content
• RSS, email
• Other data formats
Edit & Transform (partners)
• Entity extraction & transform
• Ontology engineering
• Categorization
• Custom scripting
Load, Query & Inference
• RDF/OWL data management
• SQL & SPARQL query
• Inferencing
• Semantic rules
• Scalability & security
Applications & Analysis (partners)
• Graph visualization
• Link analysis
• Statistical analysis
• Faceted search
• Pattern discovery
• Text mining
Oracle Database 11g Release 2 RDF/OWL Highlights
• Strong security for Semantic Technologies
– Security policies and data classification for RDF data
• Semantic indexing for documents
– Semantic indexing of documents based on popular natural language tools
• Faster, more efficient reasoning to find new relationships
– Parallel and incremental inference, owl:sameAs optimization
• Change management for collaboration
• Standards & open source support
– SPARQL query support for FILTER, UNION in SEM_MATCH table function
– OWL: union, intersection, OWL 2 property chains, disjoint properties
– Pellet OWL DL reasoner integration
– Jena V2.5
– Java SDK for SPARQL for 3rd-party integration, e.g., Sesame
– W3C SKOS & SNOMED ontologies
Semantic Web – Implementations
Stanford University
Swiss Institute of Bioinformatics
Hutchinson
Semantic Technology Partners
Integrated Tools and Solution Providers:
Oracle Database 11g Semantic Technologies
Architectural Overview
[Figure: layered architecture diagram, shown on the slides as a progressive build. Recoverable labels:]
• Oracle DB store: enterprise (relational) data; RDF/OWL data and ontologies; rulebases (OWL, RDF/S, user-defined); inferred RDF/OWL data; semantic indexes
• Database services: Security: fine-grained; Versioning: Workspaces
• Core functionality
– LOAD: bulk load, incremental DML
– INFER: RDF/S, user-defined, OWL subsets
– QUERY (SQL-based SPARQL): query RDF/OWL data and ontologies; ontology-assisted query of enterprise data
• Programming interface
– SQL interface: SQL*Plus, PL/SQL, SQL Developer
– Java API support: SPARQL (Jena), Sesame; JDBC; Java programs
• 3rd-party callouts
– Reasoners: Pellet
– NLP information extractors: Calais, GATE
• Tools
– Visualizer (Cytoscape)
– 3rd-party tools: TopBraid Composer, Protégé
– SPARQL endpoint
Semantic Data Modeling - Basic Concepts
• Resources (concepts, things, events, …) are uniquely identifiable through URIs
– trunk (of an elephant) and trunk (a storage chest) are different
– separately developed models can be reconciled/merged
• RDF Triples: Resources relate to other resources through properties/predicates (subject, predicate, object)
– (NYCityURI rdf:type CityClassURI)
• Rules, ontologies & inferencing provide the basis for richer semantics
– RDFS – class, subclass, property, subproperty, range, domain, etc.
– OWL – transitive, symmetric, union of, disjoint with, etc.
– Oracle user-defined rules:
(?a SSN ?x) (?b SSN ?x) => (?a same-as ?b)
• OWL schema provides structure but data can evolve continuously
– e.g., new triple has a new property (John likes books)
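The user-defined rule on this slide can be illustrated with a small forward-chaining sketch. This is plain in-memory Python, not Oracle's rule engine; the resource names and SSN values are hypothetical, and the sketch deliberately skips the trivial case where ?a and ?b are the same resource.

```python
from itertools import combinations


def apply_same_ssn_rule(triples):
    """Apply (?a SSN ?x) (?b SSN ?x) => (?a same-as ?b) once over a triple set."""
    subjects_by_ssn = {}
    for s, p, o in triples:
        if p == "SSN":
            subjects_by_ssn.setdefault(o, set()).add(s)

    inferred = set()
    for subjects in subjects_by_ssn.values():
        # Every pair of distinct resources sharing an SSN is linked both ways.
        for a, b in combinations(sorted(subjects), 2):
            inferred.add((a, "same-as", b))
            inferred.add((b, "same-as", a))
    return inferred


# Hypothetical data: two records for one person, one unrelated record,
# and one triple with a property the rule does not look at.
triples = {
    ("emp:JSmith", "SSN", "123-45-6789"),
    ("cust:JohnSmith", "SSN", "123-45-6789"),
    ("emp:ALee", "SSN", "987-65-4321"),
    ("John", "likes", "books"),
}

for triple in sorted(apply_same_ssn_rule(triples)):
    print(triple)
```

Note that the triple with the new property (John likes books) is simply ignored by the rule, which mirrors the point that data can evolve without schema changes.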
Semantic Data Mgmt.: Example
[Figure: graph with nodes :California, :USA and :NorthAmerica connected by :partOf edges; :partOf has rdf:type owl:TransitiveProperty; the derived :partOf edge from :California to :NorthAmerica is added in a later build.]
Asserted Facts
:partOf rdf:type owl:TransitiveProperty
:California :partOf :USA
:USA :partOf :NorthAmerica
Derived Facts
:California :partOf :NorthAmerica
Query: SELECT ?x ?y FROM … WHERE { ?x :partOf ?y }
Result:
?x            ?y
:California   :USA
:California   :NorthAmerica
:USA          :NorthAmerica
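The example can be traced with a short sketch: derive the closure for every property declared owl:TransitiveProperty, then answer the ?x :partOf ?y pattern over asserted plus derived facts. This is a naive fixed-point loop written for illustration only; it says nothing about how the database implements inference.

```python
def infer_transitive(triples):
    """Return the triples derivable from owl:TransitiveProperty declarations."""
    transitive_props = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "owl:TransitiveProperty"
    }
    closure = set(triples)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        for a, p, b in list(closure):
            if p not in transitive_props:
                continue
            for b2, p2, c in list(closure):
                if p2 == p and b2 == b and (a, p, c) not in closure:
                    closure.add((a, p, c))
                    changed = True
    return closure - set(triples)


asserted = {
    (":partOf", "rdf:type", "owl:TransitiveProperty"),
    (":California", ":partOf", ":USA"),
    (":USA", ":partOf", ":NorthAmerica"),
}
derived = infer_transitive(asserted)

# Equivalent of: SELECT ?x ?y WHERE { ?x :partOf ?y }
result = sorted((s, o) for s, p, o in asserted | derived if p == ":partOf")
for row in result:
    print(row)
```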
Store Semantic Data
[Figure: example triple, Hand_Fracture :subClassOf Arm_Fracture]
• Scalable native graph data store in Oracle Database
– Oracle Database 11g stores up to 8 exabytes
• Semantic data stored optimally in relational tables
• Load Options: Bulk, Batch, and SQL INSERT
• Indexes: S-P-O, P-O-S, O-S-P
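Why three index orders help can be seen in a toy in-memory store: each order answers a different shape of triple pattern without scanning everything. The class below is a hypothetical illustration using nested dictionaries; it is not a description of the actual table or index layout in the database.

```python
from collections import defaultdict


class TripleStore:
    """Toy triple store keeping S-P-O, P-O-S and O-S-P lookups."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None means 'any value'."""
        if s is not None:  # subject bound: use S-P-O
            for p2, objects in self.spo.get(s, {}).items():
                if p is None or p2 == p:
                    for o2 in objects:
                        if o is None or o2 == o:
                            yield (s, p2, o2)
        elif p is not None:  # predicate bound: use P-O-S
            for o2, subjects in self.pos.get(p, {}).items():
                if o is None or o2 == o:
                    for s2 in subjects:
                        yield (s2, p, o2)
        elif o is not None:  # only object bound: use O-S-P
            for s2, predicates in self.osp.get(o, {}).items():
                for p2 in predicates:
                    yield (s2, p2, o)
        else:  # nothing bound: full scan
            for s2, by_pred in self.spo.items():
                for p2, objects in by_pred.items():
                    for o2 in objects:
                        yield (s2, p2, o2)


store = TripleStore()
store.add("Hand_Fracture", "rdfs:subClassOf", "Arm_Fracture")
store.add("Elbow_Fracture", "rdfs:subClassOf", "Arm_Fracture")
store.add("Arm_Fracture", "rdfs:subClassOf", "Upper_Extremity_Fracture")

print(sorted(store.match(o="Arm_Fracture")))
```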
Inference RDF Data
• Native inferencing in the database for
– RDF, RDFS, OWL subset
– User-defined rules
• New relationships/triples are inferred and stored ahead of query time
– Forward chaining
– Entailment stored persistently to minimize on-the-fly computation, thus speeding query execution
• Automatic identification of new relationships (triples)
Ex: :California :partOf :USA
:USA :partOf :NorthAmerica
=> :California :partOf :NorthAmerica
OWL Subsets Supported
• RDFS++
– RDFS plus owl:sameAs and owl:InverseFunctionalProperty
• OWLSIF (OWL with IF semantics)
– Based on Dr. Horst’s pD* vocabulary¹
• OWLPrime
– rdfs:subClassOf, subPropertyOf, domain, range
– owl:TransitiveProperty, SymmetricProperty, FunctionalProperty, InverseFunctionalProperty, inverseOf
– owl:sameAs, differentFrom
– owl:disjointWith, complementOf
– owl:hasValue, allValuesFrom, someValuesFrom
– owl:equivalentClass, equivalentProperty
• Jointly determined with domain experts, customers and partners
¹ Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary
[Figure: OWLPrime shown in relation to OWL Lite and OWL DL]
Query Semantic Data
• Choice of SQL or SPARQL
• SPARQL-like graph queries can be embedded in SQL
– The SEM_MATCH table function takes a SPARQL query pattern
– Key advantage: semantic queries can be combined with relational data
– Ex: find all fractures related to upper_extremity_fracture that occurred in patients between ages 5 and 10
• The Jena plug-in for Oracle can be used; it includes a full SPARQL API
• Joseki endpoint for SPARQL queries
Jena Adaptor for Oracle Database 11g
• Jena Adaptor for Oracle Database 11g Release 1
– Implements Jena’s Graph/Model/BulkUpdateHandler/… APIs
– Full SPARQL query (select, ask, construct, describe) support
– Allows various forms of data loading: bulk/batch/incremental load of RDF or OWL (in N3, RDF/XML, N-TRIPLE, etc.) with long literal support
– Integrates Oracle Database 11g RDF/OWL with tools: TopBraid Composer, external complete DL reasoners (e.g., Pellet)
• Jena Adaptor for Oracle Database 11g Release 2
– SPARQL service endpoint supporting full SPARQL Protocol
– Tight integration with Jena ARQ 2.7.0 (2.8.0) for faster query performance
– Query management and execution control (timeout, parallel execution, abort, …)
– Support for ARQ functions for projected variables
– Extensible user-defined functions in SPARQL
– Connection pooling through OraclePool
– API enhancements (parallel inference, incremental inference, query inferred data only, …)
– Utility functions
Programming Semantic Applications in Java
• Create an Oracle object
– oracle = new Oracle(oracleConnection);
• Create a GraphOracleSem object (no need to create the model manually!)
– graph = new GraphOracleSem(oracle, model_name, attachment);
• Load data
– graph.add(Triple.create(…)); // for incremental triple additions
• Collect statistics (important for performance!)
– graph.analyze();
• Run inference
– graph.performInference();
• Collect statistics on the inferred graph (important for performance!)
– graph.analyzeInferredGraph();
• Query
– query = QueryFactory.create(…);
– queryExec = QueryExecutionFactory.create(query, model);
– resultSet = queryExec.execSelect();
Security: Virtual Private Database for RDF Data
Ability to restrict access to parts of the RDF graph based on the application/user context.
– An individual can only access information about the project he works on.
– The monetary value of a project can only be accessed by the project lead or the VP of the department.
– Only a department VP can access information about the department’s budget.
• Data access constraints are associated with RDF classes and properties.
• SPARQL query is rewritten to include appropriate constraints.
[Figure: example RDF graph. Projects projectHLS and projectDOT with hasLead (Andy, Cathy), hasStatus (Open, Complete) and hasValue (100,000, 500,000); ownedBy NEDept, which hasVP Steve and hasBudget 1,000,000; Susan is linked by worksOn.]
RDF Metadata for enforcing VPD
[Figure: subgraph for projectDOT: hasLead Cathy, hasStatus Open, hasValue 500,000, ownedBy NEDept (hasVP Steve, hasBudget 1,000,000); Susan is linked by worksOn.]
:Contract rdfs:subClassOf :Project
:GovtProject rdfs:subClassOf :Project
…
:hasLead rdfs:domain :Project
:hasLead rdfs:range :Employee
:hasStatus rdfs:domain :Project
:hasValue rdfs:domain :Contract
:ownedBy rdfs:domain :Project
:ownedBy rdfs:range :Department
…
:hasVP rdfs:domain :Department
:hasVP rdfs:range :Employee
:hasBudget rdfs:domain :Department
Query: Get the list of projects and their values
SELECT ?proj ?val FROM ProjectsGraph WHERE { ?proj :hasValue ?val }
Query as rewritten with the VPD constraints appended:
SELECT ?proj ?val FROM ProjectsGraph WHERE { ?proj :hasValue ?val . ?proj :hasStatus :Open . ?proj :hasLead "sys_context(..)" }
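The rewriting step can be sketched as follows: constraints are registered per property, and any query pattern using that property gets the extra patterns appended before evaluation. The constraint table, the tiny pattern matcher and the project data (read off the diagram labels, with projectHLS assumed to be the Complete project led by Andy) are all illustrative assumptions, not the database's VPD policy API.

```python
# Hypothetical policy: whoever asks for :hasValue only sees open projects they lead.
CONSTRAINTS = {
    ":hasValue": [
        ("{subj}", ":hasStatus", ":Open"),
        ("{subj}", ":hasLead", "{user}"),
    ],
}


def rewrite(patterns, user):
    """Append the constraint patterns for every constrained property in the query."""
    rewritten = list(patterns)
    for subj, prop, _ in patterns:
        for template in CONSTRAINTS.get(prop, []):
            extra = tuple(term.format(subj=subj, user=user) for term in template)
            if extra not in rewritten:
                rewritten.append(extra)
    return rewritten


def evaluate(patterns, triples):
    """Naive basic-graph-pattern join; terms starting with '?' are variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                matched = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if candidate.setdefault(term, value) != value:
                            matched = False
                            break
                    elif term != value:
                        matched = False
                        break
                if matched:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions


DATA = [
    (":projectHLS", ":hasLead", ":Andy"),
    (":projectHLS", ":hasStatus", ":Complete"),
    (":projectHLS", ":hasValue", "100000"),
    (":projectDOT", ":hasLead", ":Cathy"),
    (":projectDOT", ":hasStatus", ":Open"),
    (":projectDOT", ":hasValue", "500000"),
]

query = [("?proj", ":hasValue", "?val")]
print(evaluate(rewrite(query, ":Cathy"), DATA))
```

In this sketch the user identity is passed in explicitly; on the slide that role is played by the sys_context(..) call in the rewritten query.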
Security: Oracle Label Security for RDF Data
• OLS enforces fine-grained security at the “row” level.
– Attempts to access key or non-key columns of a specific row are validated using row/data sensitivity labels.
– Row labels determine the sensitivity of the rows or the rights a person must possess in order to read or write the data.
– User labels indicate their access rights to the data records.
• Notion of “row” does not exist for RDF data
– Conceptually, a relational table row maps to a set of triples and the sensitivity label applies to the complete set.
Relational row:
ContractID   Organization   ContractValue   RowLabel
ProjectHLS   N. America     1000000         SE:HLS:US
Equivalent triples, all carrying the sensitivity label SE:HLS:US:
Subject      Predicate       Object
projectHLS   Organization    N.America
projectHLS   ContractValue   1000000
Sensitivity label format: Level:Compartment:Group
Securing triples with sensitivity labels
Triples table
Subject      Predicate       Object      RowLabel
projectHLS   Organization    N.America   SE:HLS:US
projectHLS   ContractValue   1000000     SE:HLS:US
• User’s read access label must cover the sensitivity label of a triple to be read.
• Each resource has up to 3 labels (as subject, as predicate or as object).
• User’s write access label must cover the labels for all 3 parts of a new triple.
• A new triple is assigned the user’s “initial session label”.
• For inferred triples, the label to be assigned may be based on the label for the subject, predicate, object, the last rule in the inferencing path, etc.
Sensitivity labels shown on the slide: SE:HLS:US and SE:HLS,FIN:US
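What it means for one label to "cover" another can be sketched with the Level:Compartment:Group format from the slides. The comparison below assumes the usual label-dominance reading (level at least as high, all compartments held, at least one group shared) and ignores group hierarchies; the level ranking and the "PUB" level are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ordering of levels, lowest to highest.
LEVEL_RANK = {"PUB": 0, "SE": 1}


@dataclass(frozen=True)
class Label:
    level: str
    compartments: frozenset
    groups: frozenset


def parse(text):
    """Parse 'LEVEL:COMP1,COMP2:GROUP1,GROUP2'; trailing parts may be omitted."""
    parts = text.split(":") + ["", ""]

    def to_set(field):
        return frozenset(item for item in field.split(",") if item)

    return Label(parts[0], to_set(parts[1]), to_set(parts[2]))


def covers(user, data):
    """True if the user label dominates the data label (assumed rules, see above)."""
    return (
        LEVEL_RANK[user.level] >= LEVEL_RANK[data.level]
        and data.compartments <= user.compartments
        and (not data.groups or bool(data.groups & user.groups))
    )


def can_write(user, subject_label, predicate_label, object_label):
    """A new triple needs the user's label to cover all three part labels."""
    return all(covers(user, lbl) for lbl in (subject_label, predicate_label, object_label))


analyst = parse("SE:HLS,FIN:US")
print(covers(analyst, parse("SE:HLS:US")))            # analyst can read HLS data
print(covers(parse("SE:HLS:US"), parse("SE:HLS,FIN:US")))  # missing FIN compartment
```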
Inferencing Optimizations for Improved Performance
• Enabling the parallel inference option
EXECUTE sem_apis.create_entailment('M_IDX', sem_models('M'), sem_rulebases('OWLPRIME'), sem_apis.REACH_CLOSURE, null, 'DOP=x');
– Where 'x' is the degree of parallelism (DOP)
• Enabling the incremental inference option
EXECUTE sem_apis.create_entailment('M_IDX', sem_models('M'), sem_rulebases('OWLPRIME'), null, null, 'INC=T');
– Or, use the SEM_APIS.ENABLE_INC_INFERENCE procedure
• Enabling the owl:sameAs option to limit duplicates
EXECUTE sem_apis.create_entailment('M_IDX', sem_models('M'), sem_rulebases('OWLPRIME'), null, null, 'OPT_SAMEAS=T');
Enhanced Standards Support
• Systematized Nomenclature of Medicine (SNOMED)
– US NIH comprehensive clinical ontology
– Complex ontology covering most areas of clinical medicine
– Only Oracle has a commercially available reasoner with the necessary OWL support, scalability and data persistence to inference both the SNOMED ontology and large SNOMED-based user data sets!
• W3C Simple Knowledge Organization System (SKOS)
– New rulebase supporting the emerging SKOS standard on RDF
– Enables easy sharing of controlled / structured vocabularies (thesauri, taxonomies, classification schemes)
– Enforces integrity constraints
New CREATE_ENTAILMENT Components
• UNION: (OWL 1) owl:unionOf
• INTERSECT & INTERSECTSCOH: (OWL 1) owl:intersectionOf
• SNOMED: (OWL 2) Systematized Nomenclature of Medicine
• PROPDISJH: (OWL 2) interaction between owl:propertyDisjointWith and rdfs:subPropertyOf
• CHAIN: (OWL 2) supports chains of length 2
• SKOSAXIOMS: most of the axioms defined in the SKOS reference
• MBRLST: for any resource, every item in the list given as the value of the skos:memberList property is also a value of the skos:member property
• SVFH: certain interaction between owl:someValuesFrom and rdfs:subClassOf
• THINGH & THINGSAM: any defined OWL class is a subclass of owl:Thing & instances of owl:Thing are equal to themselves
New SEM_MATCH Table Function Syntax
• FILTER specifies a filter expression in the graph pattern to restrict the solutions to a query
e.g., returns grandchildren info for only grandfathers who are residents of either NY or CA
SELECT x, y
FROM TABLE(SEM_MATCH(
'{?x :grandParentOf ?y . ?x rdf:type :Male . ?x :residentOf ?z
FILTER (?z = "NY" || ?z = "CA")}',
…
• UNION matches one of alternative graph patterns
e.g., grandfathers are returned only if they are residents of NY or CA or own property in NY or CA, or if both conditions are true
SELECT x, y
FROM TABLE(SEM_MATCH(
'{?x :grandParentOf ?y . ?x rdf:type :Male .
{{?x :residentOf ?z} UNION {?x :ownsPropertyIn ?z}}
FILTER (?z = "NY" || ?z = "CA")}',
…
Change Management for Semantic Data
• Manage public and private versions of semantic data
• Objective: Collaboration and “What if” analysis
• An RDF Model is version-enabled by version-enabling its Application table.
• Changed data is private to the version until it is merged.
[Figure: version tree rooted at the Live or Production Version, with child versions Version1 and Version2 and, below them, Version3, Version4 and Version5]
Change Management
• Efficient data storage and querying
– New versions created only for changed triples
– SEM_MATCH queries are version aware, & apply to one version only
– Database Workspace Manager is the underlying feature
• Inferred data
– Continues to be visible in the parent and child version, until the child version has its own private changes
– When asserted data is modified in the child, the inferred data is available in the parent but invalid in the child
– Child version can re-inference to support queries
– Parent inference becomes invalid upon merge and must be re-inferenced
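The storage idea, that a version records only its changed triples and stays private until merged, can be sketched with a small class. This is a conceptual illustration with hypothetical names; it does not use or describe the Workspace Manager API, and it leaves out inference invalidation.

```python
class Workspace:
    """A version that stores only its own additions and removals."""

    def __init__(self, parent=None):
        self.parent = parent
        self.added = set()
        self.removed = set()

    def triples(self):
        """Visible triples: the parent's view with this version's changes applied."""
        base = self.parent.triples() if self.parent else set()
        return (base - self.removed) | self.added

    def add(self, triple):
        self.removed.discard(triple)
        self.added.add(triple)

    def remove(self, triple):
        self.added.discard(triple)
        self.removed.add(triple)

    def merge(self):
        """Publish private changes to the parent and clear them locally."""
        for triple in self.removed:
            self.parent.remove(triple)
        for triple in self.added:
            self.parent.add(triple)
        self.added.clear()
        self.removed.clear()


live = Workspace()
live.add((":California", ":partOf", ":USA"))

version1 = Workspace(parent=live)  # "what if" version
version1.add((":USA", ":partOf", ":NorthAmerica"))
version1.remove((":California", ":partOf", ":USA"))

print(len(live.triples()), len(version1.triples()))  # change is still private
version1.merge()
print(sorted(live.triples()))
```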
Ontology-assisted Query (SEM_RELATED operator)
[Figure: medical ontology with the classes Upper_Extremity_Fracture, Arm_Fracture, Forearm_Fracture, Elbow_Fracture, Hand_Fracture and Finger_Fracture, linked by rdfs:subClassOf]
Patients
ID   DIAGNOSIS
1    Hand_Fracture
2    Rheumatoid_Arthritis
“Find all entries in diagnosis column that are related to ‘Upper_Extremity_Fracture’”
Syntactic query will not work:
SELECT p_id, diagnosis FROM Patients WHERE diagnosis = 'Upper_Extremity_Fracture';
Ontology-assisted query:
SELECT p_id, diagnosis FROM Patients
WHERE SEM_RELATED (diagnosis, 'rdfs:subClassOf', 'Upper_Extremity_Fracture', 'Medical_ontology') = 1;
Ontology-assisted query limited by distance in the ontology:
SELECT p_id, diagnosis FROM Patients
WHERE SEM_RELATED (diagnosis, 'rdfs:subClassOf', 'Upper_Extremity_Fracture', 'Medical_ontology') = 1
AND SEM_DISTANCE() <= 2;
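The difference between the syntactic and the ontology-assisted query can be sketched by walking the subclass chain and counting hops. The exact shape of the hierarchy is an assumption (the slide text does not preserve which class sits under which), and whether an exact match should count as related is a choice made here, not a statement about the operator.

```python
# Assumed hierarchy: child -> parent via rdfs:subClassOf.
SUBCLASS_OF = {
    "Finger_Fracture": "Hand_Fracture",
    "Hand_Fracture": "Arm_Fracture",
    "Elbow_Fracture": "Arm_Fracture",
    "Forearm_Fracture": "Arm_Fracture",
    "Arm_Fracture": "Upper_Extremity_Fracture",
}

PATIENTS = [(1, "Hand_Fracture"), (2, "Rheumatoid_Arthritis")]


def distance(term, ancestor):
    """Number of subClassOf hops from term up to ancestor, or None if unrelated."""
    hops = 0
    while term is not None:
        if term == ancestor:
            return hops
        term = SUBCLASS_OF.get(term)
        hops += 1
    return None


def related_patients(ancestor, max_distance=None):
    """Rows whose diagnosis is the ancestor class or one of its subclasses."""
    rows = []
    for patient_id, diagnosis in PATIENTS:
        hops = distance(diagnosis, ancestor)
        if hops is not None and (max_distance is None or hops <= max_distance):
            rows.append((patient_id, diagnosis))
    return rows


# Syntactic match finds nothing; the ontology-assisted match finds patient 1.
print([row for row in PATIENTS if row[1] == "Upper_Extremity_Fracture"])
print(related_patients("Upper_Extremity_Fracture"))
print(related_patients("Upper_Extremity_Fracture", max_distance=2))
```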
Semantic Indexing for Documents
• Lexical or syntactic indexing allows textual documents to be searched for exact match of keywords
• Semantic indexing allows concepts, events, facts to be searched based on actual meaning
• Semantic indexing of a column in a relational table
– Configure an entity extraction or NLP (natural language processing) tool to plug in to the new semantic indexing API
– Examples of 3rd-party tools: OpenCalais, GATE
– Declare the column to have a semantic index based on the selected extraction tool
– The index is simply an RDF graph for each row of the table
– Use the SEM_CONTAINS operator in relational SQL queries
Extracting RDF from Documents
[Figure: the input document passes through a document processor and an entity extractor, guided by domain ontologies, to produce RDF/XML output]
Input Document
Indiana authorities filed felony charges and a court issued an arrest warrant for a financial manager who apparently tried to fake his death by crashing his airplane in a Florida swamp. Marcus, 38, remained on the lam Tuesday afternoon, two days after authorities say he staged his disappearance and then rode out of a small Alabama town on a red motorcycle under cover of darkness.
RDF/XML Output
<rdf:RDF>
<rdf:Description rdf:about="http://../Marcus">
<rdf:type rdf:resource="http://../Person"/>
<p:hasName .. >Marcus</p:hasName>
<p:hasAge .. >38</p:hasAge>
<p:hasGender .. >Male</p:hasGender>
</rdf:Description>
<rdf:Description rdf:about="http://../FloridaSwamp">
<rdf:type rdf:resource="http://../NaturalFeature"/>
<c:hasLocation>Florida</c:hasLocation>
…
</rdf:Description>
…
</rdf:RDF>
Newsfeed table
DocId   Article
1       Indiana authorities filed felony charges and a court issued an arrest warrant for a financial manager who apparently tried to fake his death …
2       Major dealers and investors in over-the-counter derivatives agreed to report all credit ..
..
Triples table (RDF/XML for each document; r1 and r2 identify the source document)
Subject      Property       Object                  NG
p:Marcus     rdf:type       rc:Person               r1
p:Marcus     pred:hasName   "Marcus"^^xsd:string    r1
p:Marcus     pred:hasAge    "38"^^xsd:integer       r1
c:AcmeCorp   rdf:type       rc:Organization         r2
..
Semantic Indexing for Documents
• Embed SPARQL graph pattern queries in SQL to find documents of interest and return relevant information.
e.g., find documents that contain financial business organization names
SELECT docId FROM Newsfeed
WHERE SEM_CONTAINS (article, '{ ?org rdf:type c:Organization .
?org pred:categoryName calais:BusinessFinance }') = 1
• User-defined rules, domain ontologies and inferencing may be applied on the extracted metadata
SELECT docId FROM Newsfeed
WHERE SEM_CONTAINS (article, '{ ?org rdf:type c:Organization .
?org pred:categoryName c:BusinessFinance .
?org pred:location ?city .
?city geo:state "NY"^^xsd:string }') = 1
• Allows combining triples extracted by multiple extraction tools.
• Allows extension of the knowledge base with community feedback.
• Treat the entire index as one knowledge base, and use for analytics.
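Since the index is described as one RDF graph per table row, a SEM_CONTAINS-style lookup can be sketched as "does this row's graph contain a subject matching every part of the pattern". The sketch below handles only star-shaped patterns around a single variable, and the categoryName triple for document 2 is a hypothetical addition to the rows shown on the slide.

```python
# One extracted graph per Newsfeed row (docId -> set of triples).
SEMANTIC_INDEX = {
    1: {
        ("p:Marcus", "rdf:type", "rc:Person"),
        ("p:Marcus", "pred:hasName", "Marcus"),
        ("p:Marcus", "pred:hasAge", 38),
    },
    2: {
        ("c:AcmeCorp", "rdf:type", "rc:Organization"),
        ("c:AcmeCorp", "pred:categoryName", "calais:BusinessFinance"),  # assumed
    },
}


def sem_contains(graph, star_pattern):
    """True if some subject in graph has every (property, object) pair in the pattern."""
    subjects = {s for s, _, _ in graph}
    return any(
        all((s, prop, obj) in graph for prop, obj in star_pattern)
        for s in subjects
    )


def matching_documents(star_pattern):
    """docIds whose per-row graph satisfies the pattern."""
    return sorted(
        doc_id for doc_id, graph in SEMANTIC_INDEX.items()
        if sem_contains(graph, star_pattern)
    )


# { ?org rdf:type rc:Organization . ?org pred:categoryName calais:BusinessFinance }
financial_orgs = [
    ("rdf:type", "rc:Organization"),
    ("pred:categoryName", "calais:BusinessFinance"),
]
print(matching_documents(financial_orgs))
```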
Bulk Loader Performance on Desktop PC
Time
Ontology        Size (triples)   Bulk-load API [1]   SQL*Loader low [2]   SQL*Loader high [3]
LUBM50          6.9 million      8 min               1 min                4.3 min
LUBM1000        138 million      3 hr 25 min         19 min               1 h 26 m
LUBM8000        1,106 million    30 hr 43 min        2 h 35 m             11 h 32 m
UniProt (old)   207 million      4 hr 40 min         30 m                 1 h 55 m
Space (in GB)
Ontology        RDF Model: Data / Indexes   RDF Values: Data / Indexes   Total: Data / Index   App Table: Data [4]   Staging Table: Data [5]
LUBM50          0.14 / 0.48                 0.11 / 0.12                  0.25 / 0.60           0.14                  0.32
LUBM1000        2.75 / 9.32                 2.30 / 2.33                  5.05 / 11.65          2.77                  6.39
LUBM8000        21.98 / 74.15               18.62 / 19.45                40.60 / 93.60         22.10                 51.66
UniProt (old)   4.06 / 13.86                1.44 / 2.18                  5.50 / 16.04          4.04                  7.69
[1] Uses flags=>' VALUES_TABLE_INDEX_REBUILD '
[2] Less time for minimal syntax check.
[3] More time is needed when RDF values used in N-Triple file are checked for correctness.
[4] Application table has table compression enabled.
[5] Staging table has table compression enabled.
• Results collected on a single CPU PC (3GHz), 4GB RAM, 7200rpm SATA 3.0Gbps, 32-bit Linux. RDBMS 11.1.0.6
• Empty network is assumed
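As a rough reading aid, the bulk-load API times in the table can be turned into load rates. The calculation below uses only the sizes and bulk-load API times from the table; it leaves out the SQL*Loader staging step, so it is not an end-to-end throughput figure.

```python
# (triples, bulk-load API time in minutes), taken from the table above.
BULK_LOADS = {
    "LUBM50": (6.9e6, 8),
    "LUBM1000": (138e6, 3 * 60 + 25),
    "LUBM8000": (1106e6, 30 * 60 + 43),
    "UniProt (old)": (207e6, 4 * 60 + 40),
}


def triples_per_second(triples, minutes):
    """Average load rate for the bulk-load API step alone."""
    return triples / (minutes * 60)


for name, (triples, minutes) in BULK_LOADS.items():
    rate = triples_per_second(triples, minutes)
    print(f"{name}: about {rate:,.0f} triples/second")
```

On these figures the rate stays in the range of roughly ten to fifteen thousand triples per second across all four data sets.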
Query Performance
Ontology: LUBM50 (6.8 million triples & 5.4 million inferred)
Rulebase: OWLPrime & new inference components
LUBM Benchmark Queries
Query        Q1     Q2      Q3     Q4     Q5     Q6       Q7
# answers    4      130     6      34     719    519842   67
Complete?    Y      Y       Y      Y      Y      Y        Y
Time (sec)   0.05   0.75    0.20   0.5    0.22   1.86     1.71
Query        Q8     Q9      Q10    Q11    Q12    Q13      Q14
# answers    7790   13639   4      224    15     228      393730
Complete?    Y      Y       Y      Y      Y      Y        Y
Time (sec)   1.07   1.65    0.01   0.02   0.03   0.01     1.47
• Setup: Intel Q6600 quad-core, 3 7200RPM SATA disks, 8GB DDR2 PC6400 RAM, no RAID. 64-bit Linux 2.6.18. Average of 3 warm runs
Query Performance on Server Going Parallel
[Chart: LUBM1000 Query Performance. x-axis: LUBM Benchmark Query (1 to 14); y-axis: Time (seconds), 0 to 100; series: DOP=1 and DOP=4]
• Setup: Server class machine with 16 cores, NAND-based flash storage, 32GB RAM, 64-bit Linux. Average of 3 warm runs
Inference Performance
[Chart: inference time in hours on a 3GHz single CPU versus a dual-core 2.33GHz CPU]
• OWLPrime (11.1.0.7) inference performance scales well with hardware, although it is not a parallel inference engine.
Inference Performance
• Parallel Inference (LUBM8000: 1.06 billion triples + 860M inferred)
– Time to finish inference: 12 hrs.
– 3.3x faster compared to serial inference in release 11.1
• Parallel Inference (LUBM25000: 3.3 billion triples + 2.7 billion inferred)
– Time to finish inference: 40 hrs.
– 30% faster than nearest competitor
– 1/5 cost of other hardware configurations
• Incremental Inference (LUBM8000: 1.06 billion triples + 860M inferred)
– Time to update inference: less than 30 seconds after adding 100 triples.
– At least 15x to 50x faster than a complete inference done with release 11.1
• Large scale owl:sameAs Inference (UniProt 1 Million sample)
– 60% less disk space required
– 10x faster inference compared to release 11.1
• Setup: Intel Q6600 quad-core, 3 7200RPM SATA disks, 8GB DDR2 PC6400 RAM, no RAID. 64-bit Linux 2.6.18. Assembly cost: less than USD 1,000
For More Information
http://search.oracle.com
Google Oracle RDF
Semantic Technologies