unifying data and domain knowledge using virtual views ibm t.j. watson research center lipyeow lim,...

18
Unifying Data and Domain Knowledge Using Virtual Views IBM T.J. Watson Research Center Lipyeow Lim, Haixun Wang, Min Wang, VLDB2007 2008. 1. 4 Summarized by Gong GI Hyun, IDS Lab., Seoul National University

Upload: elmer-domenic-daniel

Post on 29-Dec-2015

229 views

Category:

Documents


1 download

TRANSCRIPT

Unifying Data and Domain Knowledge Using Virtual Views

IBM T.J. Watson Research Center

Lipyeow Lim, Haixun Wang, Min Wang, VLDB2007

2008. 1. 4

Summarized by Gong GI Hyun, IDS Lab., Seoul National University

Copyright 2008 by CEBTCenter for E-Business Technology

Background

DBMS originally designed for transaction data.

Current DBMSs are not ready to manipulate data in con-nection with knowledge.

An unending quest

Database or Knowledge-base?

New applications: the Semantic web, etc.

New extensions are required to bridge the gap between data representation and knowledge representation/infer-encing

IDS Lab. Seminar - 2

Copyright 2008 by CEBTCenter for E-Business Technology

A Motivating Example

RDBMS allows us to query wines through attributes ID, Type, Origin, Maker, Price.

Human intelligence operates in a quite different way.

Humans have ability to combine data with the domain Knowledge.

IDS Lab. Seminar - 3

Copyright 2008 by CEBTCenter for E-Business Technology

A Motivating Example

Query 1

To find wines that originate from the US.

RDB Case :

– Select W.ID From Wine as W Where W.Origin = ‘US’;

– Result : No result

Human’s answer : Zinfandel

– Zinfandel’s Origin EdnaValley is located in California.

IDS Lab. Seminar - 4

Copyright 2008 by CEBTCenter for E-Business Technology

A Motivating Example

Query 2

Which wine is a red wine?

RDB Case :

– Select W.ID From Wine as W Where W.hasColor = ‘red’;

– Result : ERROR!

– HasColor is not in the schema of the wine table.

Human’s Answer : Zinfandel

– Zinfandel is red.

Both the user and the DBMS must know what HasColor stands for when it appears in a query, and how to derive the value for Has-Color for any given wine.

IDS Lab. Seminar - 5

Copyright 2008 by CEBTCenter for E-Business Technology

Domain Knowledge from OWL Ontol-ogy

Wine Ontology from the web ontology language OWL (W3C)

Extract class hierarchies, (transitive) properties, implications from OWL

IDS Lab. Seminar - 6

Copyright 2008 by CEBTCenter for E-Business Technology

Challenges

How to incorporate domain knowledge (ontology) into a RDBMS?

Relational Data model remains ill-suited for semi-structured data.

How to integrate relational data with domain knowledge?

How to query relational data with meaning ?

How to process such queries ?

IDS Lab. Seminar - 7

Copyright 2008 by CEBTCenter for E-Business Technology

Overview of our solution

Create a relational virtual view on top of the data and the domain knowledge.

Data and knowledge can be queried together.

New knowledge can be derived

The virtual view is an interface through which users can query data, domain knowledge, and derived knowledge in a seamlessly unified manner.

Rewrite query on virtual view.

IDS Lab. Seminar - 8

Copyright 2008 by CEBTCenter for E-Business Technology

The Virtuality of the View

Users create virtual views over the relational data and the ontology.

Virtual columns/attributes not in original data.

Virtual columns not materialized -- inferred from the on-tology.

IDS Lab. Seminar - 9

Copyright 2008 by CEBTCenter for E-Business Technology

Creating the Vitual View

CREATE VIEW WineView(Id, Type, Origin,

Maker, Price, LocatedIn) AS

SELECT W.*, R.Regions

FROM Wine AS W, RegionKnowledge AS R

WHERE W.Origin = R.region

IDS Lab. Seminar - 10

Copyright 2008 by CEBTCenter for E-Business Technology

Queries

Query 1 : To find wines that originate from the US

Select ID from WineView Where ‘US’ in LocatedIn

Query 2 : Which wine is a red wine? Select ID from WineView Where hasColor = ‘red’;

IDS Lab. Seminar - 11

Copyright 2008 by CEBTCenter for E-Business Technology

Physical Storage Layer

The ontology is modeled as semi-structured data.

Traditional RDMSs cannot handle directly.

Hybrid Relational-XML DBMS

IBM DB2 9 PureXML supports XML

IDS Lab. Seminar - 12

Copyright 2008 by CEBTCenter for E-Business Technology

Ontology Repository

Ontology repository extracts several types of information from the ontology files including class hierarchies, implication rules, transitive properties.

Class Hierarchies

subClassOf

isA

IDS Lab. Seminar - 13

Transitive Property

TransitiveProperty

Implication rules

Implication graph

Copyright 2008 by CEBTCenter for E-Business Technology

Query Expanding

SELECT V.Id FROM WineView AS V WHERE .hasColor=White;

(Type=WhiteWine) → (hasColor=white)

(Type=Riesling) → (hasColor=white)

SELECT V.Id FROM Wine AS W WHERE W.type=WhiteWine OR W.type=Riesling;

IDS Lab. Seminar - 14

Copyright 2008 by CEBTCenter for E-Business Technology

Experiment

Investigate time to rewrite the queries on virtual views.

Measurement: rewriting time averaged over 5 randomly generated data sets.

Some tweaks :

Remove dead nodes

Memoization techniques

Pre-computation of predicate re-writing

IDS Lab. Seminar - 15

Copyright 2008 by CEBTCenter for E-Business Technology

Experiment

Implication Graph Density

Size of transitive property trees

IDS Lab. Seminar - 16

Copyright 2008 by CEBTCenter for E-Business Technology

Related Works

Ontology Tools

OntoEdit : Use a file system to store ontology.

RStar, KAON : Allow the ontology data to be stored in a RDB

Two Limitations of this loosely coupled approach

DBMS users cannot reference ontology data directly.

Ontology related query processing cannot leverage the query processing and optimization power of a DBMS.

A recent advance in ontology management in DBMSs was introduced by Oracle.

IDS Lab. Seminar - 17

Copyright 2008 by CEBTCenter for E-Business Technology

Conclusion

Framework for putting a little semantics into relational SQL systems.

Users register ontologies in DBMS and links them with re-lational data by creating virtual views.

Virtual columns in the virtual views are not materialized.

Queries on the virtual columns are rewritten to predi-cates on base table columns.

Future work: performance issues

IDS Lab. Seminar - 18