On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases
![Page 1: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/1.jpg)
On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases
Presented by Xi Zhang
February 8th, 2008
![Page 2: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/2.jpg)
Outline Background Motivation Examples Top-k Queries in Probabilistic Databases Conclusion
![Page 3: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/3.jpg)
Outline Background
Probabilistic database model Top-k queries & scoring functions
Motivation Examples Top-k Queries in Probabilistic Databases Conclusion
![Page 4: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/4.jpg)
Probabilistic Databases
- Motivation: uncertainty, vagueness, and imprecision in data
- History
  - Incomplete information in relational DBs [Imielinski & Lipski 1984]
  - Probabilistic DB model [Cavallo & Pittarelli 1987]
  - Probabilistic Relational Algebra [Fuhr & Rölleke 1997 etc.]
- Comeback: uncertain data flourishes in real-world applications
  - Examples: WWW, biological data, sensor networks, etc.
![Page 5: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/5.jpg)
Probabilistic Database Model [Fuhr & Rölleke 1997]
- Probabilistic Database Model: a generalization of the relational DB
- Probabilistic Relational Algebra (PRA): a generalization of standard relational algebra
![Page 6: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/6.jpg)
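A Table in Probabilistic Database

DocTerm:

| DocNo | Term | Prob | Basic Event |
|---|---|---|---|
| 1 | IR | 0.9 | eDT(1, IR) |
| 2 | DB | 0.7 | eDT(2, DB) |
| 3 | IR | 0.8 | eDT(3, IR) |
| 3 | DB | 0.5 | eDT(3, DB) |
| 4 | AI | 0.8 | eDT(4, AI) |

Each tuple carries an event expression. The basic events are independent.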
![Page 7: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/7.jpg)
Probabilistic Relational Algebra
Just like in Relational Algebra…
- Selection (σ)
- Projection (π)
- Join (⋈)
- Union (∪)
- Difference (−)
![Page 9: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/9.jpg)
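Selection

DocTerm:

| DocNo | Term | Prob | Basic Event |
|---|---|---|---|
| 1 | IR | 0.9 | eDT(1, IR) |
| 2 | DB | 0.7 | eDT(2, DB) |
| 3 | IR | 0.8 | eDT(3, IR) |
| 3 | DB | 0.5 | eDT(3, DB) |
| 4 | AI | 0.8 | eDT(4, AI) |

Selecting the tuples with Term = IR gives the derived table:

| DocNo | Term | Prob | Complex Event |
|---|---|---|---|
| 1 | IR | 0.9 | eDT(1, IR) |
| 3 | IR | 0.8 | eDT(3, IR) |

In a derived table, each complex event is a propositional expression of basic events.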
![Page 10: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/10.jpg)
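Projection

DocTerm:

| DocNo | Term | Prob | Basic Event |
|---|---|---|---|
| 1 | IR | 0.9 | eDT(1, IR) |
| 2 | DB | 0.7 | eDT(2, DB) |
| 3 | IR | 0.8 | eDT(3, IR) |
| 3 | DB | 0.5 | eDT(3, DB) |
| 4 | AI | 0.8 | eDT(4, AI) |

Projecting onto Term merges tuples with the same term, and their events are combined by disjunction:

| Term | Prob | Complex Event |
|---|---|---|
| IR | 0.98 | eDT(1, IR) ∨ eDT(3, IR) |
| DB | 0.85 | eDT(2, DB) ∨ eDT(3, DB) |
| AI | 0.80 | eDT(4, AI) |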
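The projected probabilities follow from the independence of the basic events: a disjunction of independent events has probability 1 minus the product of the complements. The sketch below reproduces the three values of the Term projection; the function names `prob_or` and `project_term` are illustrative and do not come from the slides.

```python
from math import prod

# DocTerm rows from the slide: (DocNo, Term, Prob)
doc_term = [(1, "IR", 0.9), (2, "DB", 0.7), (3, "IR", 0.8), (3, "DB", 0.5), (4, "AI", 0.8)]

def prob_or(probs):
    """Probability of a disjunction of mutually independent events."""
    return 1.0 - prod(1.0 - p for p in probs)

def project_term(rows):
    """Project onto Term, OR-ing the basic events of rows that share a term."""
    by_term = {}
    for _doc_no, term, p in rows:
        by_term.setdefault(term, []).append(p)
    return {term: prob_or(ps) for term, ps in by_term.items()}

projected = project_term(doc_term)
print({term: round(p, 2) for term, p in projected.items()})
```

For IR this evaluates 1 - (1 - 0.9)(1 - 0.8) = 0.98, and for DB 1 - (1 - 0.7)(1 - 0.5) = 0.85.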
![Page 11: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/11.jpg)
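Join

DocTerm:

| DocNo | Term | Prob | Basic Event |
|---|---|---|---|
| 1 | IR | 0.9 | eDT(1, IR) |
| 2 | DB | 0.7 | eDT(2, DB) |

DocAu:

| DocNo | AName | Prob | Basic Event |
|---|---|---|---|
| 1 | Bauer | 0.9 | eDU(1, Bauer) |
| 2 | Meier | 0.8 | eDU(2, Meier) |

Result:

| DocAu.DocNo | AName | DocTerm.DocNo | Term | Prob | Complex Event |
|---|---|---|---|---|---|
| 1 | Bauer | 1 | IR | 0.9 * 0.9 | eDU(1, Bauer) ∧ eDT(1, IR) |
| 1 | Bauer | 2 | DB | 0.9 * 0.7 | eDU(1, Bauer) ∧ eDT(2, DB) |
| 2 | Meier | 1 | IR | 0.8 * 0.9 | eDU(2, Meier) ∧ eDT(1, IR) |
| 2 | Meier | 2 | DB | 0.8 * 0.7 | eDU(2, Meier) ∧ eDT(2, DB) |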
![Page 12: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/12.jpg)
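Join + Projection

DocTerm:

| DocNo | Term | Prob | Basic Event |
|---|---|---|---|
| 1 | IR | 0.9 | eDT(1, IR) |
| 2 | DB | 0.7 | eDT(2, DB) |
| 3 | IR | 0.8 | eDT(3, IR) |
| 3 | DB | 0.5 | eDT(3, DB) |
| 4 | AI | 0.8 | eDT(4, AI) |

IR:

| DocNo | Prob | Complex Event |
|---|---|---|
| 1 | 0.9 | eDT(1, IR) |
| 3 | 0.8 | eDT(3, IR) |

DB:

| DocNo | Prob | Complex Event |
|---|---|---|
| 2 | 0.7 | eDT(2, DB) |
| 3 | 0.5 | eDT(3, DB) |

DocAu:

| DocNo | AName | Prob | Basic Event |
|---|---|---|---|
| 1 | Bauer | 0.9 | eDU(1, Bauer) |
| 2 | Bauer | 0.3 | eDU(2, Bauer) |
| 2 | Meier | 0.9 | eDU(2, Meier) |
| 2 | Schmidt | 0.8 | eDU(2, Schmidt) |
| 3 | Schmidt | 0.7 | eDU(3, Schmidt) |
| 4 | Koch | 0.9 | eDU(4, Koch) |
| 4 | Bauer | 0.6 | eDU(4, Bauer) |

IR joined with DocAu, projected onto AName:

| AName | Prob | Complex Event |
|---|---|---|
| Bauer | 0.81 | eDU(1, Bauer) ∧ eDT(1, IR) |
| Schmidt | 0.56 | eDU(3, S) ∧ eDT(3, IR) |

DB joined with DocAu, projected onto AName:

| AName | Prob | Complex Event |
|---|---|---|
| Bauer | 0.21 | eDU(2, Bauer) ∧ eDT(2, DB) |
| Meier | 0.63 | eDU(2, Meier) ∧ eDT(2, DB) |
| Schmidt | 0.91 | (eDU(2, S) ∧ eDT(2, DB)) ∨ (eDU(3, S) ∧ eDT(3, DB)) |

Joining the two author tables on AName:

| AName | Prob | Complex Event |
|---|---|---|
| Bauer | 0.81 * 0.21 = 0.1701 | (eDU(1, B) ∧ eDT(1, IR)) ∧ (eDU(2, B) ∧ eDT(2, DB)) |
| Schmidt | 0.56 * 0.91 = 0.5096 | (eDU(3, S) ∧ eDT(3, IR)) ∧ ((eDU(2, S) ∧ eDT(2, DB)) ∨ (eDU(3, S) ∧ eDT(3, DB))) |

Schmidt's complex event evaluates to 0.4368.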
![Page 13: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/13.jpg)
Join + Projection: Intensional Semantics vs. Extensional Semantics

The tables and the query are the same as on the previous slide. For Schmidt, the two semantics disagree:
- Extensional semantics: 0.56 * 0.91 = 0.5096
- Intensional semantics: 0.4368, computed from the complex event (eDU(3, S) ∧ eDT(3, IR)) ∧ ((eDU(2, S) ∧ eDT(2, DB)) ∨ (eDU(3, S) ∧ eDT(3, DB)))
![Page 14: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/14.jpg)
Intensional vs. Extensional
- Intensional Semantics
  - Assumes data independence of the base tables
  - Keeps track of data dependence during the evaluation
- Extensional Semantics
  - Assumes data independence during the evaluation
  - Can be WRONG in the probability computation!
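The difference can be checked on Schmidt's event expression from the Join + Projection example. The sketch below evaluates the expression intensionally by enumerating every truth assignment of the basic events, weighting each assignment by the product of its independent basic-event probabilities. The helper names are illustrative, and brute-force enumeration is only one way to evaluate an event expression; the slides do not say how the evaluation is carried out.

```python
from itertools import product

# Basic events in Schmidt's expression and their probabilities (from DocAu and DocTerm).
basic = {
    "eDU(2,S)": 0.8, "eDU(3,S)": 0.7,
    "eDT(2,DB)": 0.7, "eDT(3,DB)": 0.5, "eDT(3,IR)": 0.8,
}

def schmidt_event(w):
    """(eDU(3,S) and eDT(3,IR)) and ((eDU(2,S) and eDT(2,DB)) or (eDU(3,S) and eDT(3,DB)))"""
    return (w["eDU(3,S)"] and w["eDT(3,IR)"]) and (
        (w["eDU(2,S)"] and w["eDT(2,DB)"]) or (w["eDU(3,S)"] and w["eDT(3,DB)"])
    )

def intensional_prob(event, probs):
    """Sum the weights of all truth assignments that satisfy the event expression."""
    names = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if event(world):
            weight = 1.0
            for name in names:
                weight *= probs[name] if world[name] else 1.0 - probs[name]
            total += weight
    return total

intensional = intensional_prob(schmidt_event, basic)
extensional = 0.56 * 0.91  # product of the two input tuple probabilities, as on the slide
print(round(intensional, 4), round(extensional, 4))
```

The enumeration yields 0.4368. The extensional product 0.5096 treats the two operands as independent even though eDU(3, S) occurs in both.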
![Page 15: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/15.jpg)
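When Intensional = Extensional?
- When no basic event occurs more than once in the event expression

| AName | Prob (extensional) | Complex Event |
|---|---|---|
| Bauer | 0.81 * 0.21 = 0.1701 | (eDU(1, B) ∧ eDT(1, IR)) ∧ (eDU(2, B) ∧ eDT(2, DB)) |
| Schmidt | 0.56 * 0.91 = 0.5096 | (eDU(3, S) ∧ eDT(3, IR)) ∧ ((eDU(2, S) ∧ eDT(2, DB)) ∨ (eDU(3, S) ∧ eDT(3, DB))) |

Schmidt's complex event evaluates to 0.4368. Identical basic event: eDU(3, S) occurs twice in Schmidt's expression.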
![Page 16: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/16.jpg)
Fuhr & Rölleke 1997 Summary
- Probabilistic DB Model
  - Concept of event
  - Basic vs. complex event
  - Event expression
- Probabilistic Relational Algebra
  - Just like in Relational Algebra…
- Computation of event probabilities
  - Intensional vs. extensional semantics
  - Both yield the same result when there is NO data dependence in the event expressions
![Page 17: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/17.jpg)
Outline Background
Probabilistic database model Top-k queries & scoring functions
Motivation Examples Top-k Queries in Probabilistic Databases
Semantics Query Evaluation
Conclusion
![Page 18: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/18.jpg)
Top-k Queries
Traditionally, given
- Objects: o1, o2, …, on
- A non-negative integer: k
- A scoring function s
Question: What are the k objects with the highest scores?
Top-k queries have been studied for the Web, XML, and relational databases, and more recently for probabilistic databases.
![Page 19: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/19.jpg)
Scoring Function A scoring function s over a deterministic
relation R is
For any ti and tj from R,
![Page 20: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/20.jpg)
Outline Background Motivation Examples
Smart Environment Example Sensor Network Example
Top-k Queries in Probabilistic Databases Conclusion
![Page 21: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/21.jpg)
Motivating Example I: Smart Environment
- Sample Question: “Who were the two visitors in the lab last Saturday night?”
- Data
  - Biometric data from sensors: we can see how well those data match the profile of every candidate, which gives a scoring function
  - Historical statistics, e.g. the probability of a certain candidate being in the lab on Saturday nights
![Page 22: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/22.jpg)
Motivating Example I (cont.)

| Personnel | Biometrics: score(Face Detection, Voice Detection, …) | Probability of being in lab on Saturday nights |
|---|---|---|
| Aiden | score(0.70, 0.60, …) = 0.65 | 0.3 |
| Bob | score(0.50, 0.60, …) = 0.55 | 0.9 |
| Chris | score(0.50, 0.40, …) = 0.45 | 0.4 |

Question: Find two people in the lab last Saturday night
- A Top-2 query over the above probabilistic database under the above scoring function
![Page 23: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/23.jpg)
Motivating Example II: Sensor Network in a Habitat
- Sample Question: “What is the temperature of the warmest spot?”
- Data
  - Sensor readings from different sensors
  - At a sampling time, only one “real” reading from a sensor
  - Each sensor reading comes with a confidence value
![Page 24: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/24.jpg)
Motivating Example II (cont.)

| Part | Temp (F) | Prob |
|---|---|---|
| C1 (from Sensor 1) | 22 | 0.6 |
| C1 (from Sensor 1) | 10 | 0.4 |
| C2 (from Sensor 2) | 25 | 0.1 |
| C2 (from Sensor 2) | 15 | 0.6 |

Question: What is the temperature of the warmest spot?
- A Top-1 query over the above probabilistic database under a scoring function proportional to temperature
![Page 25: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/25.jpg)
Outline Background Motivation Examples Top-k Queries in Probabilistic Databases
Semantics Query Evaluation
Conclusion
![Page 26: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/26.jpg)
Models
A probabilistic relation $R^p = \langle R, p, \mathcal{C} \rangle$
- $R$: the support deterministic relation
- $p$: probability function
- $\mathcal{C}$: a partition of $R$, such that $\sum_{t \in C_i} p(t) \le 1$ for every part $C_i \in \mathcal{C}$
Simple vs. general probabilistic relation
- Simple: assumes tuple independence, i.e. $|\mathcal{C}| = |R|$. E.g. the smart environment example
- General: tuples can be independent or exclusive, i.e. $|\mathcal{C}| < |R|$. E.g. the sensor network example
![Page 27: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/27.jpg)
Challenges
Given
- A probabilistic relation $R^p = \langle R, p, \mathcal{C} \rangle$
- An injective scoring function s over R (no ties)
- A non-negative integer k
What is the top-k answer set over $R^p$? (Semantics)
How to compute the top-k answer of $R^p$? (Query Evaluation)
![Page 28: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/28.jpg)
What is a “Good” Semantics? Desired Properties
Exact-k Faithfulness Stability
![Page 29: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/29.jpg)
Properties
- Exact-k: if R has at least k tuples, then exactly k tuples are returned as the top-k answer
- Faithfulness: a “better” tuple, i.e. one higher in both score and probability, is more likely to be in the top-k answer than a “worse” one
- Stability
  - Raising the score/prob. of a winning tuple will not cause it to lose
  - Lowering the score/prob. of a losing tuple will not cause it to win
![Page 30: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/30.jpg)
Global-Topk Semantics
Given
- A probabilistic relation $R^p = \langle R, p, \mathcal{C} \rangle$
- An injective scoring function s over R (no ties)
- A non-negative integer k
What is the top-k answer set over $R^p$? (Semantics)
Global-Topk
- Return the k highest-ranked tuples according to their probability of being in the top-k answers of possible worlds
- Global-Topk satisfies the three aforementioned properties
![Page 31: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/31.jpg)
Smart Environment Example

| Personnel | Biometrics: Score(Face Detection, Voice Detection, …) | Prob. |
|---|---|---|
| Aiden | Score(0.70, 0.60, …) = 0.65 | 0.3 |
| Bob | Score(0.50, 0.60, …) = 0.55 | 0.9 |
| Chris | Score(0.50, 0.40, …) = 0.45 | 0.4 |

Query: Find two people in the lab last Saturday night

| Possible world | Prob. | Top-2 |
|---|---|---|
| ∅ | 0.042 | ∅ |
| {Aiden} | 0.018 | {Aiden} |
| {Bob} | 0.378 | {Bob} |
| {Chris} | 0.028 | {Chris} |
| {Aiden, Bob} | 0.162 | {Aiden, Bob} |
| {Aiden, Bob, Chris} | 0.108 | {Aiden, Bob} |
| {Aiden, Chris} | 0.012 | {Aiden, Chris} |
| {Bob, Chris} | 0.252 | {Bob, Chris} |

Global-Topk Semantics:
- Pr(Aiden in top-2) = 0.3
- Pr(Bob in top-2) = 0.9
- Pr(Chris in top-2) = 0.028 + 0.012 + 0.252 = 0.292
Top-2 Answer: {Aiden, Bob}
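The Global-Topk numbers above can be reproduced by brute force. The sketch below enumerates all possible worlds of the three independent tuples, takes the top-2 of each world by score, and accumulates each person's probability of appearing in a top-2 answer. The function names are illustrative, and exhaustive enumeration is used only because the example is tiny.

```python
from itertools import product

# (name, score, probability) from the smart environment example
people = [("Aiden", 0.65, 0.3), ("Bob", 0.55, 0.9), ("Chris", 0.45, 0.4)]

def possible_worlds(tuples):
    """Yield (world, probability) for every subset of independent tuples."""
    for bits in product([False, True], repeat=len(tuples)):
        prob = 1.0
        world = []
        for (name, score, p), present in zip(tuples, bits):
            prob *= p if present else 1.0 - p
            if present:
                world.append((name, score))
        yield world, prob

def global_topk(tuples, k):
    """Rank tuples by their probability of being in the top-k of a possible world."""
    in_topk = {name: 0.0 for name, _, _ in tuples}
    for world, prob in possible_worlds(tuples):
        for name, _score in sorted(world, key=lambda t: t[1], reverse=True)[:k]:
            in_topk[name] += prob
    ranked = sorted(in_topk, key=in_topk.get, reverse=True)
    return ranked[:k], in_topk

answer, probs = global_topk(people, 2)
print(answer, {name: round(p, 3) for name, p in probs.items()})
```

The answer is Bob and Aiden, with Chris narrowly behind Aiden at 0.292.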
![Page 32: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/32.jpg)
Other Semantics Soliman, Ilyas & Chang 2007 Two Alternative Semantics
U-Topk U-kRanks
![Page 33: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/33.jpg)
U-Topk Semantics
Given
- A probabilistic relation $R^p = \langle R, p, \mathcal{C} \rangle$
- An injective scoring function s over R (no ties)
- A non-negative integer k
What is the top-k answer set over $R^p$? (Semantics)
U-Topk
- Return the most probable top-k answer set among those of the possible worlds
- U-Topk does not satisfy all three properties
![Page 34: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/34.jpg)
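Smart Environment Example

| Personnel | Biometrics: Score(Face Detection, Voice Detection, …) | Prob. |
|---|---|---|
| Aiden | Score(0.70, 0.60, …) = 0.65 | 0.3 |
| Bob | Score(0.50, 0.60, …) = 0.55 | 0.9 |
| Chris | Score(0.50, 0.40, …) = 0.45 | 0.4 |

Query: Find two people in the lab last Saturday night

| Possible world | Prob. | Top-2 |
|---|---|---|
| ∅ | 0.042 | ∅ |
| {Aiden} | 0.018 | {Aiden} |
| {Bob} | 0.378 | {Bob} |
| {Chris} | 0.028 | {Chris} |
| {Aiden, Bob} | 0.162 | {Aiden, Bob} |
| {Aiden, Bob, Chris} | 0.108 | {Aiden, Bob} |
| {Aiden, Chris} | 0.012 | {Aiden, Chris} |
| {Bob, Chris} | 0.252 | {Bob, Chris} |

U-Topk Semantics:
- Pr({Aiden, Bob}) = 0.162 + 0.108 = 0.27
- …
- Pr({Bob}) = 0.378
Top-2 Answer: {Bob}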
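U-Topk can be checked the same way: group the possible worlds by their top-2 answer set and pick the set with the largest total probability. This is a brute-force sketch with illustrative names, not the evaluation algorithm of Soliman, Ilyas & Chang.

```python
from collections import defaultdict
from itertools import product

# (name, score, probability) from the smart environment example
people = [("Aiden", 0.65, 0.3), ("Bob", 0.55, 0.9), ("Chris", 0.45, 0.4)]

def u_topk(tuples, k):
    """Return the most probable top-k answer set and the probability of every answer set."""
    answer_prob = defaultdict(float)
    for bits in product([False, True], repeat=len(tuples)):
        prob = 1.0
        world = []
        for (name, score, p), present in zip(tuples, bits):
            prob *= p if present else 1.0 - p
            if present:
                world.append((name, score))
        top = sorted(world, key=lambda t: t[1], reverse=True)[:k]
        # The answer is keyed by names in decreasing score order.
        answer_prob[tuple(name for name, _ in top)] += prob
    best = max(answer_prob, key=answer_prob.get)
    return best, dict(answer_prob)

best, answers = u_topk(people, 2)
print(best, round(answers[best], 3))
```

The winner is the single-tuple answer {Bob} with probability 0.378, which shows why U-Topk fails Exact-k: fewer than k tuples are returned even though R has three.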
![Page 35: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/35.jpg)
U-kRanks Semantics
Given
- A probabilistic relation $R^p = \langle R, p, \mathcal{C} \rangle$
- An injective scoring function s over R (no ties)
- A non-negative integer k
What is the top-k answer set over $R^p$? (Semantics)
U-kRanks
- For i = 1, 2, …, k, return the most probable i-th ranked tuple across all possible worlds
- U-kRanks does not satisfy all three properties
![Page 36: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/36.jpg)
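Smart Environment Example

| Personnel | Biometrics: Score(Face Detection, Voice Detection, …) | Prob. |
|---|---|---|
| Aiden | Score(0.70, 0.60, …) = 0.65 | 0.3 |
| Bob | Score(0.50, 0.60, …) = 0.55 | 0.9 |
| Chris | Score(0.50, 0.40, …) = 0.45 | 0.4 |

Query: Find two people in the lab last Saturday night

| Possible world | Prob. | Top-2 |
|---|---|---|
| ∅ | 0.042 | ∅ |
| {Aiden} | 0.018 | {Aiden} |
| {Bob} | 0.378 | {Bob} |
| {Chris} | 0.028 | {Chris} |
| {Aiden, Bob} | 0.162 | {Aiden, Bob} |
| {Aiden, Bob, Chris} | 0.108 | {Aiden, Bob} |
| {Aiden, Chris} | 0.012 | {Aiden, Chris} |
| {Bob, Chris} | 0.252 | {Bob, Chris} |

U-kRanks Semantics:
e.g. Pr(Chris at rank-2) = 0.012 + 0.252 = 0.264

| | Rank-1 | Rank-2 |
|---|---|---|
| Aiden | 0.3 | 0 |
| Bob | 0.63 | 0.27 |
| Chris | 0.028 | 0.264 |

Bob is the highest at rank-1 (0.63) and also the highest at rank-2 (0.27).
Top-2 Answer: {Bob}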
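The rank table can be rebuilt by accumulating, for every possible world, the probability that each person lands at rank 1 and at rank 2. As before, this is a brute-force sketch with illustrative names.

```python
from itertools import product

# (name, score, probability) from the smart environment example
people = [("Aiden", 0.65, 0.3), ("Bob", 0.55, 0.9), ("Chris", 0.45, 0.4)]

def u_kranks(tuples, k):
    """For each rank 1..k, find the tuple most likely to occupy that rank."""
    rank_prob = [{name: 0.0 for name, _, _ in tuples} for _ in range(k)]
    for bits in product([False, True], repeat=len(tuples)):
        prob = 1.0
        world = []
        for (name, score, p), present in zip(tuples, bits):
            prob *= p if present else 1.0 - p
            if present:
                world.append((name, score))
        ranked = sorted(world, key=lambda t: t[1], reverse=True)[:k]
        for i, (name, _score) in enumerate(ranked):
            rank_prob[i][name] += prob
    winners = [max(rp, key=rp.get) for rp in rank_prob]
    return winners, rank_prob

winners, rank_prob = u_kranks(people, 2)
print(winners)
print([{name: round(p, 3) for name, p in rp.items()} for rp in rank_prob])
```

Bob wins both ranks, so the returned answer collapses to the single tuple {Bob}.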
![Page 37: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/37.jpg)
Properties

| Semantics | Exact-k | Faithfulness | Stability |
|---|---|---|---|
| Global-Topk | Yes | Yes | Yes |
| U-Topk | No | Yes/No* | Yes |
| U-kRanks | No | No | No |

* Yes when the relation is simple, No otherwise

Global-Topk: a better semantics
![Page 38: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/38.jpg)
Challenges
Given
- A probabilistic relation $R^p = \langle R, p, \mathcal{C} \rangle$
- An injective scoring function s over R (no ties)
- A non-negative integer k
What is the top-k answer set over $R^p$? (Semantics): Global-Topk
How to compute the top-k answer of $R^p$? (Query Evaluation)
![Page 39: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/39.jpg)
Global-Topk in Simple Relation
Given $R^p = \langle R, p, \mathcal{C} \rangle$, a scoring function s, and a non-negative integer k
Assumptions
- Tuples are independent, i.e. $|\mathcal{C}| = |R|$
- $R = \{t_1, t_2, \ldots, t_n\}$, ordered in decreasing order of their scores, i.e. $s(t_1) > s(t_2) > \cdots > s(t_n)$
![Page 40: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/40.jpg)
Global-Topk in Simple Relation
Query Evaluation
- Recursion on $P_{k,s}(t_i)$, the Global-Topk probability of tuple $t_i$
- Dynamic Programming
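The recursion itself did not survive in the transcript, so the following is a sketch of one dynamic program with the stated shape (tuples sorted by score, O(kn) work), not necessarily the formula on the slide. It relies on the observation that, for independent tuples, $t_i$ is in the top-k exactly when $t_i$ exists and fewer than k of the higher-scoring tuples $t_1, \ldots, t_{i-1}$ exist. The name `global_topk_probs` and the array `exact` are illustrative.

```python
def global_topk_probs(probs, k):
    """Global-Topk probability of each tuple in a simple (independent) relation.

    probs: tuple probabilities, listed in decreasing order of score.
    """
    if k <= 0:
        return [0.0 for _ in probs]
    # exact[j] = Pr(exactly j of the tuples processed so far exist), for j = 0..k-1
    exact = [1.0] + [0.0] * (k - 1)
    result = []
    for p in probs:
        # The tuple makes the top-k iff it exists and fewer than k better tuples exist.
        result.append(p * sum(exact))
        # Fold this tuple into the counts, from high j to low j so that
        # exact[j - 1] still refers to the previous step.
        for j in range(k - 1, 0, -1):
            exact[j] = exact[j] * (1.0 - p) + exact[j - 1] * p
        exact[0] *= 1.0 - p
    return result

# Smart environment example: Aiden, Bob, Chris in decreasing score order, k = 2
top2 = global_topk_probs([0.3, 0.9, 0.4], 2)
print([round(p, 3) for p in top2])
```

On the smart environment example this gives 0.3, 0.9 and 0.292, the same values as the possible-worlds enumeration, without listing the worlds.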
![Page 41: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/41.jpg)
Optimization: Threshold Algorithm (TA) [Fagin & Lotem 2001]
Given a system of objects, such that
- For each object attribute, there is a sorted list ranking objects in decreasing order of their score on that attribute
- An aggregation function f combines the individual attribute scores $x_i$, $i = 1, 2, \ldots, m$, to obtain the overall object score $f(x_1, x_2, \ldots, x_m)$
- f is monotonic: $f(x_1, x_2, \ldots, x_m) \le f(x'_1, x'_2, \ldots, x'_m)$ whenever $x_i \le x'_i$ for every i
TA is cost-optimal in finding the top-k objects.
TA and its variants are widely used in ranking queries, e.g. top-k, skyline, etc.
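A minimal sketch of the textbook form of TA follows, to make the setting above concrete. It scans the sorted lists in parallel, scores each newly seen object by random access to all of its attributes, and stops once k objects score at least the threshold f(last values seen in each list). The function name, the input layout, and the assumption that every object appears in every list are choices made for this sketch, and random access is simulated with a dictionary built up front.

```python
import heapq

def threshold_algorithm(sorted_lists, f, k):
    """Top-k objects under a monotonic aggregation function f.

    sorted_lists: m lists of (object, value), each sorted by value in decreasing
    order. Assumption for this sketch: every object appears in every list.
    """
    if k <= 0:
        return []
    m = len(sorted_lists)
    # Random-access table: object -> its m attribute values.
    attrs = {}
    for i, lst in enumerate(sorted_lists):
        for obj, value in lst:
            attrs.setdefault(obj, [None] * m)[i] = value
    top = []      # min-heap of (overall score, object), at most k entries
    seen = set()
    for depth in range(len(sorted_lists[0])):
        last_seen = []
        for lst in sorted_lists:
            obj, value = lst[depth]       # sorted access
            last_seen.append(value)
            if obj not in seen:
                seen.add(obj)
                score = f(*attrs[obj])    # random access to all attributes
                if len(top) < k:
                    heapq.heappush(top, (score, obj))
                elif score > top[0][0]:
                    heapq.heapreplace(top, (score, obj))
        # No unseen object can score above f(last_seen), because f is monotonic.
        if len(top) == k and top[0][0] >= f(*last_seen):
            break
    return sorted(top, reverse=True)

# Hypothetical data: two attributes, aggregated by their sum.
list_a = [("a", 9), ("b", 8), ("c", 3), ("d", 1)]
list_b = [("b", 7), ("a", 5), ("d", 4), ("c", 2)]
top_two = threshold_algorithm([list_a, list_b], lambda x, y: x + y, 2)
print(top_two)
```

In this run the scan stops after the second row of each list, since both a (14) and b (15) already reach the threshold 8 + 5 = 13, and c and d are never scored.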
![Page 42: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/42.jpg)
Applying TA Optimization
Global-Topk
- Two attributes: probability & score
- Aggregation function: Global-Topk probability
![Page 43: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/43.jpg)
Global-Topk in General Relation
Given $R^p = \langle R, p, \mathcal{C} \rangle$, a scoring function s, and a non-negative integer k
Assumptions
- Tuples are independent or exclusive, i.e. $|\mathcal{C}| < |R|$
- $R = \{t_1, t_2, \ldots, t_n\}$, ordered in decreasing order of their scores, i.e. $s(t_1) > s(t_2) > \cdots > s(t_n)$
![Page 44: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/44.jpg)
Global-Topk in General Relation
Induced Event Relation
- For each tuple in R, there is a probabilistic relation $E^p = \langle E, p^E, \mathcal{C}^E \rangle$ generated by the following two rules
- $E^p$ is simple
![Page 45: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/45.jpg)
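Sensor Network Example

Prob. Relation (general):

| Part | Temp (F) | Prob |
|---|---|---|
| C1 (from Sensor 1) | 22 | 0.6 |
| C1 (from Sensor 1) | 10 | 0.4 |
| C2 (from Sensor 2) | 25 | 0.1 |
| C2 (from Sensor 2) | 15 | 0.6 |

For example, for the tuple t = (Temp 15, Prob 0.6), the Induced Event Relation (simple) is:

| Event | Prob |
|---|---|
| $t_{e_{C_1}}$ | 0.6 = $\sum_{t' \in C_i,\ s(t') > s(t)} p(t')$, where i = 1 |
| $t_{e_t}$ | 0.6 = p(t) |

The two rows are produced by Rule 1 and Rule 2 of the construction.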
![Page 46: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/46.jpg)
Global-Topk in General Relation
![Page 47: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/47.jpg)
Evaluating Global-Topk in General Relation
- For each tuple t, generate the corresponding induced event relation
- Compute the Global-Topk probability of t by Theorem 4.3
- Pick the k tuples with the highest Global-Topk probability
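The three steps can be sketched as follows. The statement of Theorem 4.3 is not in the transcript, so this code rests on an assumption stated plainly: a tuple t is in the top-k exactly when t exists and fewer than k other parts contribute a higher-scoring tuple. For each other part, the induced event has probability equal to the sum of the probabilities of its tuples that outscore t, and induced events of different parts are independent. All names are illustrative.

```python
def global_topk_general(parts, k):
    """Global-Topk over a general relation.

    parts: list of parts; each part is a list of (label, score, prob) tuples that
    are mutually exclusive. Tuples in different parts are independent.
    """
    if k <= 0:
        return [], {}
    results = {}
    for ci, part in enumerate(parts):
        for label, score, p in part:
            # Induced events: one per other part that holds higher-scoring tuples.
            event_probs = []
            for cj, other in enumerate(parts):
                if cj == ci:
                    continue
                q = sum(p2 for _label2, s2, p2 in other if s2 > score)
                if q > 0:
                    event_probs.append(q)
            # exact[j] = Pr(exactly j induced events occur), for j = 0..k-1
            exact = [1.0] + [0.0] * (k - 1)
            for q in event_probs:
                for j in range(k - 1, 0, -1):
                    exact[j] = exact[j] * (1.0 - q) + exact[j - 1] * q
                exact[0] *= 1.0 - q
            results[label] = p * sum(exact)
    answer = sorted(results, key=results.get, reverse=True)[:k]
    return answer, results

# Sensor network example: labels are the temperatures, scores equal the temperature.
sensors = [
    [("22F", 22, 0.6), ("10F", 10, 0.4)],   # C1, Sensor 1
    [("25F", 25, 0.1), ("15F", 15, 0.6)],   # C2, Sensor 2
]
warmest, sensor_probs = global_topk_general(sensors, 1)
print(warmest, {label: round(p, 2) for label, p in sensor_probs.items()})
```

For the top-1 query the reading of 22 wins with probability 0.6 * (1 - 0.1) = 0.54: Sensor 1 must report 22 and Sensor 2 must not report 25. The tuple with Temp 15 gets 0.6 * (1 - 0.6) = 0.24, which uses the induced event of probability 0.6 shown in the sensor network example.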
![Page 48: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/48.jpg)
Summary on Query Evaluation
- Simple (Independent Tuples)
  - Dynamic Programming
  - Tuples are ordered on their scores
  - Recursion on the tuple index and k
- General (Independent/Exclusive Tuples)
  - Polynomial reduction to simple cases
![Page 49: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/49.jpg)
Complexity

| | Global-Topk | U-Topk | U-kRanks |
|---|---|---|---|
| Simple | $O(kn)$ | $O(kn)$ | $O(kn)$ |
| General | $O(kn^2)$ | $\Theta(mkn^{k-1} \lg n)$* | $\Omega(mn^{k-1})$* |

* m is a rule-engine-related factor. It represents how complicated the relationships between tuples can be.
![Page 50: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/50.jpg)
Outline Background Motivation Examples Top-k Queries in Probabilistic Databases Conclusion
![Page 51: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/51.jpg)
Conclusion
- Three intuitive semantic properties for top-k queries in probabilistic databases
- Global-Topk semantics, which satisfies all the properties above
- Query evaluation algorithms for Global-Topk in simple and general probabilistic databases
![Page 52: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/52.jpg)
Future Problems
- Weak order scoring function
  - Allow ties
  - Not clear how to extend the properties
  - Not clear how to define the semantics (other than an “arbitrary tie breaker”)
- Preference Strength
  - Sensitivity to Score: given a prob. relation $R^p$, if the DB is sufficiently large, then by manipulating the scores of tuples we would be able to get different answers
  - NOT satisfied by our semantics
  - NOT satisfied by any semantics in the literature
  - Need to consider preference strength in the semantics
![Page 53: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/53.jpg)
Thank you !
![Page 54: On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816850550346895dde5585/html5/thumbnails/54.jpg)
Related Works
- Introduction to Probabilistic Databases
  - Probabilistic DB Model & Probabilistic Relational Algebra [Fuhr & Rölleke 1997]
- Top-k Queries in Probabilistic Databases
  - On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases [Zhang & Chomicki 2008]
  - Alternative Top-k Semantics and Query Evaluation in Probabilistic Databases [Soliman, Ilyas & Chang 2007]