Inexact Querying of XML

Upload: emerald-farmer

Post on 16-Jan-2016


XML Data May be Irregular

• Relational data is regular and organized. XML may be very different.
– Data is incomplete: Missing values of attributes in elements
– Data has structural variations: Relationships between elements are represented differently in different parts of the document
– Data has ontology variations: Different labels are used to describe nodes of the same type
• (Note: In some of the upcoming slides, we have labels on edges instead of on nodes.)
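The three kinds of irregularity can be made concrete with a small sketch using Python's standard `xml.etree.ElementTree`. The sample document below is hypothetical: its element names and nesting are assumptions, loosely modeled on the movie database used in the slides, not a copy of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names and nesting are illustrative assumptions.
SAMPLE = """
<MovieDatabase>
  <Movie>
    <Title>Dune</Title>
    <Year>1984</Year>
    <Actor><Name>Kyle MacLachlan</Name></Actor>
  </Movie>
  <Actor>
    <Name>Harrison Ford</Name>
    <Movie><Title>Star Wars</Title></Movie>
  </Actor>
  <Film><Title>Magnolia</Title></Film>
</MovieDatabase>
"""

root = ET.fromstring(SAMPLE)

# Incomplete data: Movie elements that have no Year child.
missing_year = [m.findtext("Title") for m in root.iter("Movie")
                if m.find("Year") is None]

# Structural variation: Actor below Movie in one place, Movie below Actor in another.
actor_below_movie = [a.findtext("Name") for a in root.findall(".//Movie/Actor")]
movie_below_actor = [m.findtext("Title") for m in root.findall(".//Actor/Movie")]

# Ontology variation: Movie and Film label nodes of the same type.
all_titles = sorted(e.findtext("Title") for e in root.iter()
                    if e.tag in {"Movie", "Film"})

print(missing_year, actor_below_movie, movie_below_actor, all_titles)
```

A query written against only one of these shapes (for example, Movie elements that contain a Year and an Actor) would silently miss the other entries.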

Incomplete Data

(Figure: tree of a Movie Database, rooted at node 1. Node labels: Movie, Actor, T.V. Series, Film, Title, Name, Year. Leaf values: Kyle MacLachlan, Natalie Portman, Harrison Ford, Mark Hamill, Dune, Star Wars, Twin Peaks, Léon, Magnolia, 1977, 1984. Callouts: "The movie has a year attribute" on one movie; "The year of the movie is missing" on another.)

Variations in Structure

(Figure: the Movie Database tree, with two callouts: "Movie below Actor" in one part of the tree and "Actor below Movie" in another.)

Ontology Variations

(Figure: the Movie Database tree, with two callouts: "A movie label" and "A film label" on nodes of the same type.)

• The description of the schema is large (e.g., a DTD of XML)
– It is difficult to use the schema when formulating queries
• Data is contributed by many users in a variety of designs
– The query should deal with different structures of data
• The structure of the database is changed frequently
– Queries should be rewritten frequently
• Need to allow the user to write an “approximate query” and have the query processor deal with it

The Problem

• In many different domains, we are given the option to query some source of information
• Usually, the user only gets results if the query can be completely answered (satisfied)
• In many domains, this is not appropriate, e.g.,
– The user is not familiar with the database
– The database does not contain complete information
– There is a mismatch between the ontology of the user and that of the database

Example 1

Query: Locality: Be'er Sheva; Area code: 03
Response: The selected locality does not appear in the selected area code!

Query: Boarding: Haifa - Technion; Alighting: Eilat
Response: There is no direct line connecting the selected points

Query: Boarding: (left empty); Alighting: Eilat

Query: Course details: Databases
Response: No matching courses were found

What Do Users Need?

• Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist
• These partial answers should contain maximal information
• Problem:
– It is easy to define when an answer satisfies a query
– Hard to say when an answer that does not satisfy a query is of interest
– Hard to say which incomplete answers are better than others

Modeling a Database and a Query

• It is useful to model both databases and queries as labeled directed graphs
– Clean mathematical modeling!
– Captures the essentials of XPath, XQuery

University Database

(Figure: University database as an edge-labeled graph. Edge labels: University, Name, Dept, Faculty, Professor, Lecturer, Teaches. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology.)

Query

(Figure: query graph with labels University, Dept, Faculty, Name.)

• Exact answers are defined by exact matchings, i.e., subgraph homomorphisms
• This query asks for the names of all faculty members (of any type)

How would you write this in XPath?
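One possible answer, sketched with Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. The XML below is an assumed reconstruction of the University database from the labels and values on the slides; the exact nesting, and the use of a `*` step for "faculty member of any type", are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the University database; nesting is a guess.
UNIVERSITY = """
<University>
  <Name>Technion</Name>
  <Dept>
    <Name>Computer Science</Name>
    <Faculty>
      <Professor>
        <Name>Chana Israeli</Name>
        <Teaches>Databases</Teaches>
        <Teaches>Bioinformatics</Teaches>
      </Professor>
    </Faculty>
  </Dept>
  <Dept>
    <Name>Biology</Name>
    <Faculty>
      <Lecturer>
        <Name>Avi Levy</Name>
        <Teaches>Molecular Biology</Teaches>
      </Lecturer>
    </Faculty>
  </Dept>
</University>
"""

root = ET.fromstring(UNIVERSITY)  # root is the University element

# "*" matches a faculty member of any type (Professor, Lecturer, ...).
# In full XPath this would be written /University/Dept/Faculty/*/Name
names = [n.text for n in root.findall("./Dept/Faculty/*/Name")]
print(names)
```

The path is relative to the root element because `ElementTree` does not accept absolute paths on an element.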

Exact Answers

(Figure, shown on two consecutive slides: the query graph University, Dept, Faculty, Name matched against the University database graph.)

Slightly More Complex Query

(Figure: query graph with labels University, Dept, Faculty, Name, and the value Biology attached to the department.)

• Returns faculty members only from the Biology Department
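The Biology restriction can be sketched as a predicate on the Dept step, again with `xml.etree.ElementTree`. The XML is an assumed, shortened reconstruction of the University database, and placing each faculty member in a particular department is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened reconstruction of the University database.
UNIVERSITY = """
<University>
  <Dept>
    <Name>Computer Science</Name>
    <Faculty><Professor><Name>Chana Israeli</Name></Professor></Faculty>
  </Dept>
  <Dept>
    <Name>Biology</Name>
    <Faculty><Lecturer><Name>Avi Levy</Name></Lecturer></Faculty>
  </Dept>
</University>
"""

root = ET.fromstring(UNIVERSITY)

# [Name='Biology'] keeps only Dept elements with a Name child equal to "Biology".
biology_names = [
    n.text for n in root.findall("./Dept[Name='Biology']/Faculty/*/Name")
]

# An exact query is brittle: a single misspelling returns nothing at all.
misspelled = root.findall("./Dept[Name='Biolgy']/Faculty/*/Name")

print(biology_names, misspelled)
```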

Exact Answers Are Not Always Useful

• Problems with exact answers:
– labels are not always known
– content may be unknown, misspelled, etc.
– structure may be unknown, or may vary from one representation to another
– we may actually want to perform a search, since the query is a vague hypothesis
– exact answers do not allow users to get partial/vague answers where no better ones exist

Manually Adding Inexactness

• One can use language constructs in order to get more flexible queries
• Example: Suppose we want to find courses, with teachers that teach them, but we don’t know which hierarchy exists in the database:
– for each teacher, there is a list of courses or
– for each course, there is a list of teachers
– or both…

(Figure: University database in which Teacher nodes appear under Faculty and Course nodes appear below each Teacher. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology.)

Query needed: a Teacher node with a Course child

(Figure: University database in which Course nodes appear under Faculty and Teacher nodes appear below each Course. Values: Technion, Computer Science, Biology, Bioinformatics, Molecular Biology, Chana Israeli, Avi Levy.)

Query needed: a Course node with a Teacher child

Manually Adding Inexactness (cont.)

• If we don’t know the hierarchy, we need the union of two queries:
– Teacher with a Course child
– Course with a Teacher child
• If we don’t know what exactly the labels are, we might need the same union with each label replaced by a disjunction:
– Teacher or Lecturer or Professor
– Course or Seminar or Lab
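To show how much work this pushes onto the user, here is a minimal sketch of the manual union in Python. `ElementTree` has no XPath union operator, so the union is built in Python by collecting results from every combination. The data, the function name `teaches_pairs`, and the nesting are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical data that mixes both hierarchies and several labels.
DATA = """
<University>
  <Teacher>
    <Name>Chana Israeli</Name>
    <Course><Name>Databases</Name></Course>
  </Teacher>
  <Seminar>
    <Name>Molecular Biology</Name>
    <Lecturer><Name>Avi Levy</Name></Lecturer>
  </Seminar>
</University>
"""

TEACHER_LABELS = ("Teacher", "Lecturer", "Professor")
COURSE_LABELS = ("Course", "Seminar", "Lab")


def teaches_pairs(root):
    """Return (teacher name, course name) pairs found under either hierarchy."""
    pairs = set()
    for t_label, c_label in product(TEACHER_LABELS, COURSE_LABELS):
        # Hierarchy 1: the course is listed below the teacher.
        for teacher in root.iter(t_label):
            for course in teacher.findall(c_label):
                pairs.add((teacher.findtext("Name"), course.findtext("Name")))
        # Hierarchy 2: the teacher is listed below the course.
        for course in root.iter(c_label):
            for teacher in course.findall(t_label):
                pairs.add((teacher.findtext("Name"), course.findtext("Name")))
    return pairs


pairs = teaches_pairs(ET.fromstring(DATA))
print(sorted(pairs))
```

With three labels on each side and two hierarchies, this already amounts to 18 exact queries, and the count grows with every further variation the user has to anticipate.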

Help!

Intuition

• Users write regular queries, stating what they are looking for
• The query processor uses a built-in strategy to find answers that exactly satisfy the query or inexactly satisfy the query
• Burden is on the query processor, not on the user

Inexact Answers

• Many different definitions have been given
– For each definition, query processing algorithms have been defined
• Examples:
– Allow some of the nodes of the query to be unmatched
– Allow edges in the query to be matched to paths in the database
– Allow nodes to be matched to nodes with labels that have a similar meaning
• Be careful so that answers are meaningful!

Allow Unmatched Nodes: Bezeq Query

(Figure: query graph rooted at Phone Number, with edges Name = Shmulevitz, City = Be'er Sheva, Area Code = 03.)
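A minimal sketch of the "unmatched nodes" relaxation for this query: instead of requiring all three conditions to hold, return the records that satisfy as many of them as possible. The records are invented for illustration, and ranking by the number of matched conditions is an assumed, simplified notion of "maximal information".

```python
# Hypothetical directory records; the entries are invented for illustration.
RECORDS = [
    {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "08"},
    {"Name": "Shmulevitz", "City": "Tel Aviv", "Area Code": "03"},
    {"Name": "Cohen", "City": "Haifa", "Area Code": "04"},
]

QUERY = {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "03"}


def best_partial_matches(query, records):
    """Return (matching records, matched count) for the records that
    satisfy the largest number of query conditions."""
    def score(record):
        return sum(record.get(key) == value for key, value in query.items())

    best = max((score(r) for r in records), default=0)
    if best == 0:
        return [], 0
    return [r for r in records if score(r) == best], best


exact = [r for r in RECORDS if all(r.get(k) == v for k, v in QUERY.items())]
partial, matched = best_partial_matches(QUERY, RECORDS)
print(exact, partial, matched)
```

Here the exact query returns nothing, while the relaxed one returns the two Shmulevitz records, each leaving one query node unmatched.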

Matching Edges to Paths: Egged Query

(Figure: query graph with edges Source = Technion-Haifa and Destination = Eilat.)
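Matching a query edge to a path can be sketched as a graph search: when no direct line connects source and destination, look for a sequence of lines that does. The network below is hypothetical, and breadth-first search is just one simple way to find such a path.

```python
from collections import deque

# Hypothetical direct bus lines; stops and connections are illustrative.
DIRECT_LINES = {
    "Technion-Haifa": ["Tel Aviv"],
    "Tel Aviv": ["Be'er Sheva", "Jerusalem"],
    "Be'er Sheva": ["Eilat"],
    "Jerusalem": [],
    "Eilat": [],
}


def find_route(source, destination, lines):
    """Breadth-first search: return a shortest list of stops, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for next_stop in lines.get(path[-1], []):
            if next_stop not in visited:
                visited.add(next_stop)
                queue.append(path + [next_stop])
    return None


# Exact matching: the query edge must be a single direct line.
has_direct_line = "Eilat" in DIRECT_LINES["Technion-Haifa"]

# Inexact matching: the query edge may be matched to a path.
route = find_route("Technion-Haifa", "Eilat", DIRECT_LINES)
print(has_direct_line, route)
```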

Similar Meaning Labels

(Figure: query graph rooted at Course, with edges Name = Databases and Details.)
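One simple way to sketch this relaxation is a table of labels that are treated as interchangeable. The table, the catalog data, and the function name `find_details` are all hypothetical; a real system might derive similarity from a thesaurus or an ontology instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of labels treated as similar in meaning.
SIMILAR = {
    "Course": {"Course", "Subject", "Seminar"},
    "Name": {"Name", "Title"},
}

# Hypothetical catalog that uses different labels for the same kind of node.
DATA = """
<Catalog>
  <Subject>
    <Title>Databases</Title>
    <Details>Relational and XML data</Details>
  </Subject>
  <Course>
    <Name>Algorithms</Name>
    <Details>Graphs and sorting</Details>
  </Course>
</Catalog>
"""


def find_details(root, course_name):
    """Return Details of course-like elements whose name-like child matches."""
    results = []
    for elem in root.iter():
        if elem.tag not in SIMILAR["Course"]:
            continue
        names = [child.text for child in elem if child.tag in SIMILAR["Name"]]
        if course_name in names:
            results.append(elem.findtext("Details"))
    return results


root = ET.fromstring(DATA)
exact = root.findall("./Course[Name='Databases']")  # exact labels only
details = find_details(root, "Databases")
print(exact, details)
```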

Other Types of Inexactness

• Many other definitions have been given, e.g.,
– allow permutations of nodes in the query
– allow child nodes to be promoted
– interconnection
• Summary: Inexactness basically means that we relax some of the query requirements!