Probabilistic answers to relational queries (PARQ)
Octavian Udrea, Yu Deng, Edward Hung, V. S. Subrahmanian
Content
Motivation and goals
Running example
Technical preliminaries
CPO model
CPO integration
CPO inference algorithms
Experimental results
Ongoing work
Motivation
Query algebras do not take semantics into account when computing answers
Data is not always precise: ambiguity, insufficient information
Goal: Use probabilistic ontologies to improve query answer recall and quality
The probabilistic solution
Compute and return answers with high probability ( > pthr)
Keep probabilities hidden from the user
Problems:
  How do we assign a probability to each data item?
  How do we choose pthr?
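A minimal Python sketch of the thresholding idea on this slide. The function name `visible_answers`, the candidate list, and the 0.10 score are illustrative assumptions; only the `> pthr` rule and the 75% figure come from the slides.

```python
def visible_answers(scored_answers, p_thr):
    """Keep answers whose probability is strictly above p_thr; hide the probabilities."""
    return [answer for answer, prob in scored_answers if prob > p_thr]


# Hypothetical scored candidates for "what type of board meeting?"
candidates = [("Board of Directors", 0.75), ("Judicial Board", 0.10)]
print(visible_answers(candidates, p_thr=0.6))
```

The user only sees the surviving answers, which is why the choice of pthr matters: it trades recall against precision.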
Concepts
Constraint probabilistic ontologies (CPOs)
  Is-a graph with edges labeled with probabilities
  Including conditional probabilities
  Disjoint decompositions
Ontologies associated with terms in a data source
  Attributes in a relation/XML
  Propositional entities in text sources
Running example
Email fragment: “Ed Masters opposed the new marketing policy during the board meeting... Eric claimed Ed was not aware of the situation in the financial unit...”
[Figure: is-a graph rooted at OrganizationUnit, with disjoint decompositions marked "d". Classes: Board, Committee, Team, Department, Legal, Executive, Financial, Marketing, Judicial Board, Financial Review Board, Auditing Committee, Board of Directors, Sales Department. Unconditional edge labels, in extraction order: 0.1, 0.5, 0.1, 0.2, 0.1, 0.4, 0.4, 0.1, 0.35, 0.1, 0.15, 0.9, 0.95, 0.85, 0.85, 0.95. Two edges also carry a conditional label: 0.5, <(hasSubject marketing), 0.65> and 0.5, <(isPresent Ed_Masters), 0.75>.]
Example: decompositions
[Figure: the running-example is-a graph again; the "d" marks on its edge groups denote disjoint decompositions.]
Example: probability labels
[Figure: the running-example is-a graph again; each edge carries a probability label such as 0.5 or 0.95.]
Example: conditional probabilities
[Figure: the running-example is-a graph again; two edges carry conditional labels, 0.5, <(hasSubject marketing), 0.65> and 0.5, <(isPresent Ed_Masters), 0.75>.]
Running example: Sample queries
“Ed Masters opposed the new marketing policy during the board meeting... Eric claimed Ed was not aware of the situation in the financial unit...”
What type of board meeting is being discussed?
  Since Ed Masters is present, there is a 75% probability it is a board of directors meeting.
What type of financial unit is referenced?
  Since the subject is marketing policy, there is a 65% probability it is the Financial Review Board.
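The two answers above follow the same pattern: if the condition attached to an edge holds for the data item, use the conditional probability, otherwise fall back to the unconditional one. A small Python sketch of that lookup, assuming a hypothetical representation of a label as `(nil_probability, [((attribute, value), probability), ...])`; the names `edge_probability` and `email_context` are illustrative and not from the slides.

```python
def edge_probability(label, context):
    """Pick the probability on one edge given what is known about the data item.

    label   = (nil_probability, [((attribute, value), probability), ...])
    context = {attribute: set_of_observed_values}
    """
    nil_probability, conditionals = label
    for (attribute, value), probability in conditionals:
        if value in context.get(attribute, set()):
            return probability
    return nil_probability


# Label "0.5, <(isPresent Ed_Masters), 0.75>" from the running example.
board_of_directors = (0.5, [(("isPresent", "Ed_Masters"), 0.75)])
email_context = {"isPresent": {"Ed_Masters"}, "hasSubject": {"marketing"}}
print(edge_probability(board_of_directors, email_context))  # 0.75
print(edge_probability(board_of_directors, {}))             # 0.5
```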
Technical preliminaries: POB
POB schema $(C, \Rightarrow, me, \varphi)$:
  $C$ is a finite set of classes
  $(C, \Rightarrow)$ is a directed acyclic graph
  $me$ produces clusters (disjoint decompositions) for each node
    me(OrganizationUnit) = {{Committee, Board, Team, Department}, {Legal, Executive, Financial, Marketing}}
  $\varphi$ maps each edge in $(C, \Rightarrow)$ to a positive rational number in [0,1]
  $\forall c \in C,\ \forall L \in me(c):\ \sum_{d \in L} \varphi(d, c) \le 1$
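A sketch of the cluster condition in Python, under stated assumptions: the labeling is stored as a dictionary `phi` keyed by `(subclass, superclass)`, and the pairing of the eight extracted labels with the eight subclasses of OrganizationUnit is a guess made only for illustration. `Fraction` is used because the labels are rational numbers and exact sums avoid floating-point surprises at the boundary.

```python
from fractions import Fraction


def clusters_sum_to_at_most_one(me, phi):
    """For every class c and cluster L in me[c], check that the sum of phi[(d, c)], d in L, is <= 1."""
    return all(
        sum(phi[(d, c)] for d in cluster) <= 1
        for c, clusters in me.items()
        for cluster in clusters
    )


root = "OrganizationUnit"
me = {root: [["Committee", "Board", "Team", "Department"],
             ["Legal", "Executive", "Financial", "Marketing"]]}
# Assumed pairing of the extracted labels with edges (not confirmed by the slides).
values = ["0.1", "0.5", "0.1", "0.2", "0.1", "0.4", "0.4", "0.1"]
children = me[root][0] + me[root][1]
phi = {(d, root): Fraction(v) for d, v in zip(children, values)}
print(clusters_sum_to_at_most_one(me, phi))  # True
```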
Back to the example
[Figure: the running-example is-a graph, repeated.]
Constraint probabilities
Simple constraints: $A_i \in D$, with $D \subseteq dom(A_i)$
  Only for entities NOT represented in the current ontology
Nil constraint: $D = dom(A_i)$
Constraint probabilities: pair $(p, \gamma)$, with $p$ in [0,1] and $\gamma$ a conjunction of simple constraints
Labeling
Labeling should not be arbitrary
  Invalid labeling may lead to time-consuming consistency algorithms
  And to ambiguity in interpreting query answers
Valid labeling:
  No constraint refers to the entities associated with this ontology
  There is exactly one nil constraint probability on each edge
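The two validity conditions can be checked mechanically. The Python sketch below assumes a hypothetical encoding: each edge maps to a list of `(probability, constraint)` pairs, where `None` stands for the nil constraint and any other constraint is a tuple of `(attribute, allowed_values)` simple constraints. The attribute name `meetingType` is made up for the example.

```python
def is_valid_labeling(labels, own_terms):
    """labels maps an edge to a list of (probability, constraint) pairs.

    A constraint is None (nil) or a tuple of (attribute, allowed_values) simple constraints.
    own_terms is the set of attributes this ontology is itself associated with.
    """
    for pairs in labels.values():
        nil_count = sum(1 for _, constraint in pairs if constraint is None)
        if nil_count != 1:
            return False  # exactly one nil constraint probability per edge
        for _, constraint in pairs:
            if constraint is None:
                continue
            if any(attribute in own_terms for attribute, _ in constraint):
                return False  # constraint refers to this ontology's own entities
    return True


labels = {
    ("Board of Directors", "Board"): [
        (0.5, None),
        (0.75, (("isPresent", frozenset({"Ed_Masters"})),)),
    ],
}
print(is_valid_labeling(labels, own_terms={"meetingType"}))  # True
```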
The CPO model
CPO $(C, \Rightarrow, me, \varphi)$:
  $C$ is a finite set of classes
  $(C, \Rightarrow)$ is a directed acyclic graph
  $me$ produces clusters (disjoint decompositions) for each node
  $\varphi$ is a valid labeling for $(C, \Rightarrow)$
Note there is no condition on the probabilities... yet!
CPO enhanced data sources
Associate CPOs with some attributes of a relation.
Associate CPOs with elements in an XML data store.
Associate CPOs with some keywords for text files.
CPO_k: at most k probabilities on each edge
  A CPO_1 is a POB
Answering queries
“Ed Masters opposed the new marketing policy during the board meeting... Eric claimed Ed was not aware of the situation in the financial unit...”
What type of board meeting is being discussed?
  Since Ed Masters is present, there is a 75% probability it is a board of directors meeting.
Goal: Associate probabilities with possible answers.
Probability path
[Figure: the running-example is-a graph, repeated.]
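If $c \Rightarrow c_1 \Rightarrow c_2 \Rightarrow \dots \Rightarrow c_k \Rightarrow d$:
  $f$ is a function defined on the chain, with $f(x, y) \in \varphi(x, y)$
  $f$ selects one probability on each edge
  The probability path is $p_{c \to d} = \prod_{(x, y)} p_{f(x, y)}$, taken over the edges $x \Rightarrow y$ of the chain
  $\gamma_f(p_{c \to d})$ is the set of constraints selected by $f$ along with the probabilities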
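A short Python sketch of a probability path, reading the slide as "multiply the one probability that f selects on each edge of the chain". That reading, the dictionary `selected`, and the pairing of 0.5 with the Board => OrganizationUnit edge are assumptions for illustration.

```python
from math import prod


def path_probability(chain, f):
    """Multiply the probabilities that f selects on each edge x => y of the chain."""
    return prod(f(x, y) for x, y in zip(chain, chain[1:]))


# Hypothetical selection: conditional label on the first edge, nil label on the second.
selected = {("Board of Directors", "Board"): 0.75, ("Board", "OrganizationUnit"): 0.5}
chain = ["Board of Directors", "Board", "OrganizationUnit"]
print(path_probability(chain, lambda x, y: selected[(x, y)]))  # 0.375
```

A different choice function f (for example, always taking the nil label) gives a different path probability over the same chain, together with a different set of selected constraints.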
CPO consistency
CPO $(C, \Rightarrow, me, \varphi)$
An arbitrary universe of objects $O$
Interpretation ε is a mapping from $C$ to $2^O$
ε is a taxonomic model iff:
  We assign objects to each class
  Objects cannot be shared between classes in the same cluster
  => edges imply subset relations on the sets of objects assigned to each class
  If A => B is labeled with probability p, at least a fraction p of the objects in B are also assigned to A
CPO consistency (cont’d)
A CPO is consistent iff it has a taxonomic probabilistic model
Deciding if a CPO is consistent is NP-complete
  Cf. the weight formula satisfiability problem.
A non-deterministic algorithm for consistency checking is straightforward.
Consistency approach
Identify a subclass of CPOs for which we can check consistency
Two parts:
  Pseudoconsistency: this was done for POBs
  Well-structuredness: particular to CPOs
Pseudoconsistent CPO
CPO $(C, \Rightarrow, me, \varphi)$
  No two classes in the same cluster have a common subclass
  The graph is rooted
  Any two distinct immediate subclasses of c either:
    Have no common subclass
    Have a greatest common subclass different from them
  No cycles
  If c inherits from multiple clusters, all paths from descendants of c to the root go through c
Pseudoconsistency
[Figure: the running-example is-a graph, repeated.]
Weight factor
A set P of non-nil constraint probabilities
  If P is the empty set, wf(P) = 0
  If P = {(p,γ)}, wf(P) = p
  wf(P ∪ Q) = wf(P) + wf(Q) - wf(P) * wf(Q)
Intuitive meaning: how many objects from class A must be assigned to class B to satisfy the constraints?
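The recursive definition of wf folds neatly over a list of probabilities. A minimal Python sketch, assuming the set P is passed as an iterable of its p values (the constraints themselves play no role in the arithmetic); the sample values are taken from labels that appear elsewhere in the slides.

```python
from functools import reduce


def weight_factor(probabilities):
    """wf of a set of non-nil constraint probabilities, given as an iterable of p values."""
    return reduce(lambda acc, p: acc + p - acc * p, probabilities, 0.0)


print(weight_factor([]))           # 0.0
print(weight_factor([0.65]))       # 0.65
print(weight_factor([0.75, 0.2]))  # about 0.8
```

The combination rule is the familiar "probability of at least one of two independent events", so the result never exceeds 1 for inputs in [0,1].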
More weight factors
CPO $(C, \Rightarrow, me, \varphi)$, with $c \Rightarrow d$ an edge
We write: $\varphi(c, d) = \{(p_0, true)\} \cup \varphi'(c, d)$
We define: $w(c, d) = \max(p_0, w_f(\varphi'(c, d)))$
Result: Conditions of taxonomic interpretation can be satisfied by selecting at most $w(c, d) \cdot |O_d|$ objects from d into c.
Well-structured CPO
Conditional constraints on edges from the same cluster must be disjoint
  Otherwise, it is impossible to compute a weight factor for the cluster edges.
The sum of the weight factors for edges in a cluster is ≤ 1
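A Python sketch of the second condition only, assuming each edge label is given as `(p0, [conditional probabilities])` and that the edge weight is the larger of the nil probability p0 and the weight factor of the conditional probabilities. The cluster below is hypothetical, although its labels appear in the slides; the disjointness of the conditional constraints is not checked here.

```python
def weight_factor(probabilities):
    """wf: empty set gives 0; combine with wf(P) + wf(Q) - wf(P) * wf(Q)."""
    result = 0.0
    for p in probabilities:
        result = result + p - result * p
    return result


def edge_weight(p0, conditional_probabilities):
    """Assumed form: w(c, d) = max(p0, wf(conditional probabilities on the edge))."""
    return max(p0, weight_factor(conditional_probabilities))


def cluster_weight_ok(cluster_labels):
    """cluster_labels: list of (p0, [conditional probabilities]) for the edges of one cluster."""
    return sum(edge_weight(p0, conds) for p0, conds in cluster_labels) <= 1


# Two edges of a hypothetical cluster: 0.5, <..., 0.75> and 0.1, <..., 0.2>.
cluster = [(0.5, [0.75]), (0.1, [0.2])]
print(cluster_weight_ok(cluster))  # True: 0.75 + 0.2 <= 1
```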
Well-structuredness
[Figure: the running-example is-a graph, repeated; one edge additionally carries the conditional label 0.1, <NOT (isPresent Ed_Masters), 0.2>.]
Consistent CPOs revisited
A pseudoconsistent and well-structured CPO is consistent
  Pseudoconsistency accounts for most of the conditions in the taxonomic interpretation
  Well-structuredness accounts for the assignment of objects to subclasses
Consistency checking algorithm
Pseudoconsistency is $O(n^2 e)$ and well-structuredness is $O(n^2 k^2)$
  n: the number of classes
  e: the number of edges
  k: the order of the CPO
Algorithm based on:
  Topological sort
  Dijkstra and derivatives
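As one ingredient of such a check, the "rooted" and "no cycles" conditions can be tested with a topological sort. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+); it is not the authors' algorithm and covers only these two conditions. The edge set is a small hypothetical fragment of the running example.

```python
from graphlib import TopologicalSorter, CycleError


def is_rooted_dag(classes, edges):
    """edges is a set of (sub, sup) pairs meaning sub => sup."""
    superclasses = {c: set() for c in classes}
    for sub, sup in edges:
        superclasses[sub].add(sup)
    try:
        # static_order raises CycleError if the graph has a cycle.
        list(TopologicalSorter(superclasses).static_order())
    except CycleError:
        return False
    # In a finite acyclic graph, a single class without superclasses is reachable from all others.
    roots = [c for c in classes if not superclasses[c]]
    return len(roots) == 1


classes = {"OrganizationUnit", "Board", "Board of Directors"}
edges = {("Board", "OrganizationUnit"), ("Board of Directors", "Board")}
print(is_rooted_dag(classes, edges))  # True
```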
CPO enhanced algebras
CPO enhanced algebras formally defined for:
  Relational data sources
  XML data stores
  Selection, projection, product, join, etc.
Ongoing work: RDF enhanced query algebra
  Directly related to RDF extraction from text.
CPO integration: motivation
[Figure: two is-a graphs side by side, each rooted at OrganizationUnit, with disjoint decompositions marked "d".
ACME corp. CPO classes: Board, Committee, Executive, Financial, FinancialReviewBoard, AuditingCommittee, Board of Directors. Unconditional labels, in extraction order: 0.1, 0.5, 0.4, 0.4, 0.35, 0.15, 0.85, 0.95. Conditional labels: 0.5, <(hasSubject marketing), 0.65> and 0.5, <(isPresent Ed_Masters), 0.75>.
EVIL corp. CPO classes: Board, Team, Department, Management, Financial, Marketing, FO Board, Board of Directors, Sales Department. Unconditional labels, in extraction order: 0.5, 0.1, 0.2, 0.4, 0.4, 0.1, 0.7, 0.15, 0.9, 0.95. Conditional label: 0.3, <(isPresent Ed_Masters), 0.75>.]
Equality constraints: Management :=: Financial, FinancialReviewBoard :=: FO Board
Email from ACME corp. to EVIL corp.: “During your last FO board meeting, the rising costs of quality assurance were not addressed. We would like to include this in our next auditing committee meeting...”
Merging CPOs
Two scenarios:
  One data source that refers to similar entities but from different application domains.
    Example: ACME – EVIL correspondence
  Queries across multiple data sources
    Example: Two different CPOs associated with distinct relations during a join query.
Interoperation constraints
Since the CPOs being merged refer to similar entities, some classes may be equivalent
  Equality constraints c1 :=: c2
Possibility: immediate subclassing constraints
  Not really used: hardly feasible
The integration problem
Two CPOs S1 = (C1, =>1, me1, φ1), S2 = (C2, =>2, me2, φ2)
Set of interoperation constraints I
An integration witness is another CPO S = (C, =>, me, φ) that satisfies S1, S2 and I
Integration witness
Every class c in C1 ∪ C2
  Appears in C
  OR c :=: d appears in I and d ∈ C
  i.e. no classes get “lost”
Similarly, no edges are lost
No constraints are lost
If the same constraint is on the same edge in both CPOs with different probabilities, take a probability p between the two
Integration witness (cont’d)
Immediate subclassing constraints add edges to S
No cluster can be split as a result of merging
S is pseudoconsistent and well-structured (if it’s not, it’s of no use)
  Open problem: If it is not, how can we minimally change it such that it has these properties?
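The "no classes get lost" condition of an integration witness is easy to state as code. A Python sketch under assumptions: classes are plain strings, equality constraints are unordered pairs, and the function name `classes_preserved` is made up. It checks only this one condition, not edges, constraints, or clusters.

```python
def classes_preserved(c1, c2, merged, equalities):
    """Every class of C1 and C2 is in merged, or is equal (via I) to a class in merged."""
    def covered(c):
        if c in merged:
            return True
        return any((a == c and b in merged) or (b == c and a in merged)
                   for a, b in equalities)
    return all(covered(c) for c in c1 | c2)


# Small fragments of the ACME and EVIL class sets.
acme = {"Board", "FinancialReviewBoard"}
evil = {"Board", "FO Board"}
equalities = {("FinancialReviewBoard", "FO Board")}
merged = {"Board", "FinancialReviewBoard"}
print(classes_preserved(acme, evil, merged, equalities))  # True
```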
CPOmerge algorithm
CPOmerge produces an integration witness if one exists
$O(n^3)$: costly
In practice, much more efficient through:
  Caching
  Some properties are preserved if the original ontologies are pseudoconsistent and well-structured
Who writes the interop constraints?
User: not feasible
How to infer them?
  Intuitive solution: If enough neighbors are in equality constraints, then infer that the respective nodes should be equivalent.
  But we still need some equivalence constraints to get started: use lexical distance
  How many neighbors are “enough”?
ICI – Simple solution
Neighbor: parent, immediate child, sibling from the same cluster
We define $p_e(c, d)$ from:
  $n_e$: number of neighbors in equality constraints
  $n_{c,d}$: number of neighbors of c, d
Why? Number of equal neighbors / total number of neighbors (including self). Always < 1
ICI algorithm: if $p_e$ exceeds threshold, assume they are equal
  Start with lexical distance
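A Python sketch of the ICI decision rule, following the verbal description on this slide. The exact formula is an assumption: here the score is the number of equal neighbors divided by the total number of neighbors plus 2, the 2 counting c and d themselves. The function names and the sample counts are illustrative only.

```python
def equality_score(n_equal, n_total):
    """Assumed form: equal neighbors / (total neighbors + 2); the +2 counts c and d themselves."""
    return n_equal / (n_total + 2)


def infer_equal(n_equal, n_total, threshold):
    """Infer c :=: d when the score exceeds the threshold."""
    return equality_score(n_equal, n_total) > threshold


print(equality_score(3, 4))    # 0.5
print(infer_equal(3, 4, 0.4))  # True
```

With this form the score stays below 1 even when every neighbor is already in an equality constraint, which matches the "Always < 1" remark.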
Give me a CPO…
Very little work so far on probabilistic ontologies.
  Nothing resembling CPOs around
How do we infer them:
  How do we build disjoint decompositions?
  How do we infer probabilities?
Building disjoint decompositions
Take regular ontologies from the Web
  Many sources: daml.org, SchemaWeb, OntoBroker
Modify CPOmerge to ignore labeling
The merge result will contain disjoint decompositions
Equality constraints can be inferred through ICI
Infer probabilities – simple methods
Simple methods:
  Distribute probabilities uniformly within each cluster
  For each cluster L in me(c) and each d => c, derive $p_{d,c}$ from Dist(d, c), normalized over Dist(e, c) for the classes e => c, for any distance function Dist (lexical or otherwise)
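Both simple methods can be sketched in a few lines of Python. `uniform_labels` gives every edge of a cluster of size n the probability 1/n. `normalized_labels` is a generic stand-in for the distance-based method: how a distance is turned into a positive score is left to the caller and is an assumption here, since the slide's exact formula did not survive extraction.

```python
from fractions import Fraction


def uniform_labels(me):
    """Give each edge d => c in a cluster of size n the probability 1/n."""
    return {
        (d, c): Fraction(1, len(cluster))
        for c, clusters in me.items()
        for cluster in clusters
        for d in cluster
    }


def normalized_labels(me, score):
    """Normalize a caller-supplied positive score(d, c) within each cluster."""
    labels = {}
    for c, clusters in me.items():
        for cluster in clusters:
            total = sum(score(d, c) for d in cluster)
            for d in cluster:
                labels[(d, c)] = score(d, c) / total
    return labels


me = {"OrganizationUnit": [["Committee", "Board", "Team", "Department"]]}
print(uniform_labels(me)[("Board", "OrganizationUnit")])  # 1/4
```

Either way, the labels within one cluster sum to 1, so the cluster condition on probabilities holds by construction.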
Advanced methods
Probabilistic relational models with structural uncertainty
  Work by Dr. Getoor et al.
Classification approach
  Feature extraction determines entities of interest
  Create conditional probabilities on those entities
User feedback approach
  General, applicable to any of the above (ongoing work)
Experimental setup
Java implementation
  CPO enhanced relational DB
Movies database maintained by Dr. Wiederhold
IMDB data
  IMDB to estimate recall
  Classifications from the Web to build initial CPO
Consistency check & inference
[Figure: running time [s] (axis 0 to 5) versus ontology size [no. of classes] (45 to 342); series: consistency, CPO inference.]
Recall
[Figure: recall (axis 0 to 0.9) versus ontology size [no. of classes] (45 to 342); series: threshold 0.6, threshold 0.7, threshold 0.8, threshold 0.9, relational.]
Precision
[Figure: precision (axis 0 to 1.2) versus ontology size [no. of classes] (45 to 342); series: p: 0.6, p: 0.7, p: 0.8, p: 0.9, relational.]
Answer quality
[Figure: quality, SQRT(Precision*Recall) (axis 0.6 to 0.8), versus ontology size [no. of classes] (0 to 400); series: p: 0.6, p: 0.7, p: 0.8, p: 0.9, relational.]
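The quality measure on the vertical axis is the geometric mean of precision and recall. A tiny Python sketch; the input values 0.8 and 0.5 are made up for the example and are not results from the experiments.

```python
from math import sqrt


def answer_quality(precision, recall):
    """Quality = SQRT(Precision * Recall), as labeled on the plot axis."""
    return sqrt(precision * recall)


# Hypothetical values, for illustration only.
print(round(answer_quality(0.8, 0.5), 3))  # 0.632
```

Because it is a geometric mean, the measure drops sharply when either precision or recall is low, so a gain in recall only pays off if precision does not collapse.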
Query running time
[Figure: running time [s] (axis 0 to 4.5) versus ontology size [no. of classes] (45 to 342); series: p: 0.6, p: 0.7, p: 0.8, p: 0.9, relational.]
ICI quality
[Figure: quality (axis 0.5 to 0.8) versus epsilon (0 to 1.2); series: join quality, relational join quality.]
Bottom line
Clear improvement in query answer quality
  Some time penalty, but reasonable
Very little user intervention
CPOs are suited for a wide variety of data sources
  Potentially, they can be used to convey semantics across heterogeneous data sources
Current experimental setup
DBLP data
  Over 60 years of scientific publications
  XML data set
CPOs from complex ontologies
  DBLP classification
  ACM classification of subjects
Goals (1)
Determine the efficiency of advanced CPO inference methods
Experimentally determine the best approach in terms of minimizing user feedback
Goals (2)
Use CPOs with RDF databases
  For extracting RDF from text as a means of using semantic information
  For answering queries from RDF databases
Benefits:
  Probabilistic model is clearly formalized
  Proven improvement in answer quality
Experimentally determine what the probability threshold may be for various domains
Thank you