TRANSCRIPT
1
Logical Bayesian Networks: A Knowledge Representation View on Probabilistic Logical Models
Daan Fierens, Hendrik Blockeel, Jan Ramon, Maurice Bruynooghe
Katholieke Universiteit Leuven, Belgium
2
Probabilistic Logical Models
Variety of PLMs:
• Origin in Bayesian networks (Knowledge Based Model Construction):
  • Probabilistic Relational Models
  • Bayesian Logic Programs
  • CLP(BN)
  • …
• Origin in logic programming:
  • PRISM
  • Stochastic Logic Programs
  • …
THIS TALK: PRMs and BLPs (the best known and most developed PLMs, with learning approaches)
3
Combining PRMs and BLPs
PRMs:
• + Easy to understand, intuitive
• - Somewhat restricted (as compared to BLPs)
BLPs:
• + More general, expressive
• - Not always intuitive
Can we combine the strengths of both models in one model?
We propose Logical Bayesian Networks (PRMs + BLPs)
4
Overview of this Talk
• Example
• Probabilistic Relational Models
• Bayesian Logic Programs
• Combining PRMs and BLPs: why and how?
• Logical Bayesian Networks
5
Example [Koller et al.]
University domain:
• students (with an IQ) + courses (with a rating)
• students take courses (with a grade)
• grade depends on IQ
• rating depends on the sum of the IQs of the students taking the course
Specific situation:
• jeff takes ai, pete and rick take lp, no student takes db
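As a concrete illustration (ours, not part of the talk), a minimal Python sketch enumerating the random variables that this situation gives rise to; the predicate names follow the slides:

  # Illustrative sketch: enumerate the random variables of the example.
  # 'takes' encodes the specific situation described above.
  takes = [("jeff", "ai"), ("pete", "lp"), ("rick", "lp")]
  students = ["jeff", "pete", "rick"]
  courses = ["ai", "lp", "db"]

  variables = (
      [f"iq({s})" for s in students]
      + [f"rating({c})" for c in courses]
      + [f"grade({s},{c})" for (s, c) in takes]
  )
  print(variables)
  # ['iq(jeff)', 'iq(pete)', 'iq(rick)', 'rating(ai)', 'rating(lp)',
  #  'rating(db)', 'grade(jeff,ai)', 'grade(pete,lp)', 'grade(rick,lp)']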
6
Bayesian Network-structure
[Figure: the induced Bayesian network. Nodes: iq(jeff), iq(pete), iq(rick); grade(jeff,ai), grade(pete,lp), grade(rick,lp); rating(ai), rating(lp), rating(db). Each grade(S,C) has parent iq(S); rating(ai) has parent iq(jeff); rating(lp) has parents iq(pete) and iq(rick); rating(db) has no parents.]
7
PRMs [Koller et al.]
PRM = relational schema + dependency structure (+ aggregates + CPDs)
[Figure: relational schema with classes Student(key, iq), Course(key, rating) and Takes(key, student, course, grade). Dependency structure: Takes.grade depends on Student.iq (CPT); Course.rating depends on an aggregate of the Student.iq values (aggregate + CPT).]
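A hypothetical encoding of these two qualitative components as plain Python data; class and attribute names come from the slides, the dict layout is ours:

  # Hypothetical encoding of a PRM's qualitative parts (layout is ours).
  schema = {
      "Student": ["key", "iq"],
      "Course":  ["key", "rating"],
      "Takes":   ["key", "student", "course", "grade"],
  }

  # Takes.grade depends on the iq of the student taking the course (CPT);
  # Course.rating depends on an aggregate (sum) of the iq's of the
  # students taking the course (aggregate + CPT).
  dependencies = {
      ("Takes", "grade"):   {"parents": [("Student", "iq")], "aggregate": None},
      ("Course", "rating"): {"parents": [("Student", "iq")], "aggregate": "sum"},
  }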
8
PRMs (2)
• Semantics: PRM induces a Bayesian network on the relational skeleton
Relational skeleton:
Student:
  key  | iq
  jeff | ?
  pete | ?
  rick | ?
Course:
  key | rating
  ai  | ?
  lp  | ?
  db  | ?
Takes:
  key | student | course | grade
  f1  | jeff    | ai     | ?
  f2  | pete    | lp     | ?
  f3  | rick    | lp     | ?
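A sketch (ours) of how this induction can be computed mechanically from the skeleton, assuming the dependency structure of the previous slide; variable names follow the slides:

  # Illustrative sketch: induce the BN structure from the relational skeleton.
  takes = [("jeff", "ai"), ("pete", "lp"), ("rick", "lp")]
  courses = ["ai", "lp", "db"]

  parents = {f"rating({c})": [] for c in courses}
  for s, c in takes:
      parents[f"iq({s})"] = []                    # iq has no parents
      parents[f"grade({s},{c})"] = [f"iq({s})"]   # grade(S,C) depends on iq(S)
      parents[f"rating({c})"].append(f"iq({s})")  # rating(C) aggregates the iq's

  print(parents["rating(lp)"])   # ['iq(pete)', 'iq(rick)']
  print(parents["rating(db)"])   # [] -- no student takes db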
9
PRMs - BN-structure (3)
[Figure: the Bayesian network induced on the relational skeleton; identical in structure to the network on slide 6.]
10
PRMs: Pros & Cons (4)
+ Easy to understand and interpret
- Expressiveness, as compared to BLPs, …:
• Not possible to combine selection and aggregation [Blockeel & Bruynooghe, SRL workshop '03]
  • e.g., given an extra attribute sex for students: "rating depends on the sum of the IQs of the female students" cannot be expressed
• No way to specify logical background knowledge (no functors or constants)
11
BLPs [Kersting, De Raedt]
Definite logic programs + Bayesian networks:
• Bayesian predicates (each with a range)
• Random variable = ground Bayesian atom, e.g. iq(jeff)
• BLP = set of clauses, each with a CPT
  rating(C) | iq(S), takes(S,C).
  Range: {low, high}; CPT + combining rule (which can be anything)
• Semantics: a Bayesian network
  • random variables = the ground atoms in the least Herbrand model
  • dependencies are determined by the grounding of the BLP
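To illustrate the grounding semantics, a small sketch (ours, not actual BLP machinery) grounding the clause above over the example facts; note that in a BLP the logical atoms (takes(S,C), student(S), …) are themselves random variables and show up as parents:

  # Illustrative grounding sketch for: rating(C) | iq(S), takes(S,C).
  # Ground instances are restricted to bodies that hold in the least
  # Herbrand model, i.e. to the takes/2 facts.
  takes = [("jeff", "ai"), ("pete", "lp"), ("rick", "lp")]

  parents = {}
  for s, c in takes:
      parents.setdefault(f"rating({c})", []).extend(
          [f"iq({s})", f"takes({s},{c})"])

  print(parents["rating(lp)"])
  # ['iq(pete)', 'takes(pete,lp)', 'iq(rick)', 'takes(rick,lp)']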
12
BLPs (2)
student(pete)., …, course(lp)., …, takes(rick,lp).
rating(C) | iq(S), takes(S,C).
rating(C) | course(C).
grade(S,C) | iq(S), takes(S,C).
iq(S) | student(S).
BLPs do not distinguish probabilistic from logical/certain/structural knowledge
• This affects the readability of the clauses
• What about the resulting Bayesian network?
13
BLPs - BN-structure (3)
• Fragment:
[Figure: BN fragment with nodes student(jeff), takes(jeff,ai), iq(jeff) and grade(jeff,ai); student(jeff) is a parent of iq(jeff).]
CPD for iq(jeff):
  student(jeff) = true  → distribution for iq/1
  student(jeff) = false → ?
14
BLPs - BN-structure (3)
• Fragment: [same figure as on the previous slide]
CPD for grade(jeff,ai):
  takes(jeff,ai) = true  → distribution for grade/2, a function of iq(jeff)
  takes(jeff,ai) = false → ?
15
BLPs: Pros & Cons (4)
+ High expressiveness:
• definite logic programs (functors, …)
• can combine selection and aggregation (via combining rules)
- Not always easy to interpret:
• the clauses
• the resulting Bayesian network
16
Combining PRMs and BLPs
Why? One model that is intuitive and highly expressive.
How?
• Expressiveness (from BLPs):
  • logic programming
• Intuitiveness (from PRMs):
  • distinguish probabilistic from logical/certain knowledge
  • distinct components (in PRMs, the schema determines the random variables separately from the dependency structure)
  • (general vs. specific knowledge)
17
Logical Bayesian Networks
Probabilistic predicates (determine the random variables and their range) vs. logical predicates
LBN components (with their PRM counterparts):
• relational schema → V (random variable declarations)
• dependency structure → DE
• CPDs + aggregates → DI
• relational skeleton → a logic program Pl (describing the domain of discourse / the deterministic information)
18
Logical Bayesian Networks
Semantics:
• an LBN induces a Bayesian network on the random variables determined by Pl and V
19
Normal Logic Program Pl
student(jeff).
course(ai).
takes(jeff,ai).
student(pete).
course(lp).
takes(pete,lp).
student(rick).
course(db).
takes(rick,lp).
Semantics: the well-founded model WFM(Pl) (when there is no negation: the least Herbrand model)
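For a program without negation, such as this one, the least Herbrand model can be computed by iterating the immediate-consequence operator until a fixpoint; a minimal sketch (ours, for ground rules only):

  # Illustrative sketch: least Herbrand model of a definite program with
  # ground rules, via the immediate-consequence operator T_P.
  # Rules are (head, body) pairs over ground atoms; facts have empty bodies.
  def least_herbrand_model(rules):
      model = set()
      while True:
          new = {head for head, body in rules if set(body) <= model} - model
          if not new:
              return model
          model |= new

  facts = [("student(jeff)", []), ("course(ai)", []), ("takes(jeff,ai)", [])]
  print(least_herbrand_model(facts))
  # {'student(jeff)', 'course(ai)', 'takes(jeff,ai)'}  (a set; order may vary)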
20
V:
iq(S) <= student(S).
rating(C) <= course(C).
grade(S,C) <= takes(S,C).
Semantics: V determines the random variables
• each ground probabilistic atom in WFM(Pl ∪ V) is a random variable
  • iq(jeff), …, rating(lp), …, grade(rick,lp)
• non-monotonic negation is allowed (not available in PRMs or BLPs):
  • grade(S,C) <= takes(S,C), not(absent(S,C)).
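Continuing the sketch (ours), applying the V clauses to the model of Pl yields exactly these random variables:

  # Illustrative sketch: apply the V clauses to WFM(Pl); each derived
  # ground probabilistic atom is a random variable.
  pl_model = {"student(jeff)", "student(pete)", "student(rick)",
              "course(ai)", "course(lp)", "course(db)",
              "takes(jeff,ai)", "takes(pete,lp)", "takes(rick,lp)"}

  def arg(atom):          # "student(jeff)" -> "jeff"
      return atom[atom.index("(") + 1:-1]

  random_vars = (
      {f"iq({arg(a)})" for a in pl_model if a.startswith("student(")}
      | {f"rating({arg(a)})" for a in pl_model if a.startswith("course(")}
      | {f"grade({arg(a)})" for a in pl_model if a.startswith("takes(")}
  )
  print(sorted(random_vars))
  # ['grade(jeff,ai)', 'grade(pete,lp)', 'grade(rick,lp)', 'iq(jeff)',
  #  'iq(pete)', 'iq(rick)', 'rating(ai)', 'rating(db)', 'rating(lp)']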
21
DE:
grade(S,C) | iq(S).
rating(C) | iq(S) <- takes(S,C).
Semantics: DE determines the conditional dependencies
• the ground instances whose context holds in WFM(Pl)
  • e.g. rating(lp) | iq(pete) <- takes(pete,lp)
  • e.g. rating(lp) | iq(rick) <- takes(rick,lp)
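A sketch (ours) of this grounding-with-context-check for the rating/1 clause:

  # Illustrative sketch: ground rating(C) | iq(S) <- takes(S,C) and keep
  # only the instances whose context holds in WFM(Pl).
  wfm = {"takes(jeff,ai)", "takes(pete,lp)", "takes(rick,lp)"}
  students = ["jeff", "pete", "rick"]
  courses = ["ai", "lp", "db"]

  dependencies = [
      (f"rating({c})", f"iq({s})")
      for s in students for c in courses          # all ground instances
      if f"takes({s},{c})" in wfm                 # context check
  ]
  print(dependencies)
  # [('rating(ai)', 'iq(jeff)'), ('rating(lp)', 'iq(pete)'),
  #  ('rating(lp)', 'iq(rick)')]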
22
V + DE:
iq(S) <= student(S).
rating(C) <= course(C).
grade(S,C) <= takes(S,C).
grade(S,C) | iq(S).
rating(C) | iq(S) <- takes(S,C).
23
LBNs - BN-structure
[Figure: the induced Bayesian network; identical to the networks on slides 6 and 9.]
24
DI: the quantitative component
• ~ aggregates + CPDs in PRMs
• ~ CPDs + combining rules in BLPs
For each probabilistic predicate p, a logical CPD:
• a function with
  • input: a set of (ground probabilistic atom, value) pairs
  • output: a probability distribution for p
• Semantics: determines the CPDs for all random variables about p
25
DI (2)
• e.g. for rating/1 (the inputs are about iq/1):
  If the sum of the values Val over all input pairs (iq(S), Val) > 1000
  Then 0.7 high / 0.3 low
  Else 0.5 high / 0.5 low
• Can be written as a logical probability tree (TILDE):
  sum(Val, iq(S,Val), Sum), Sum > 1000
    yes → 0.7 high / 0.3 low
    no  → 0.5 high / 0.5 low
• cf. [Van Assche et al., SRL workshop '04]
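A minimal Python sketch of this logical CPD (the function name rating_cpd is ours); the input is the set of (ground atom, value) pairs described on the previous slide:

  # Illustrative sketch of the logical CPD for rating/1: input is a set
  # of (ground atom, value) pairs, all about iq/1; output is a
  # probability distribution over {high, low}.
  def rating_cpd(inputs):
      total = sum(value for _atom, value in inputs)   # the aggregate
      if total > 1000:                                # the test in the tree
          return {"high": 0.7, "low": 0.3}
      return {"high": 0.5, "low": 0.5}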
26
DI (3)
DI determines the CPDs:
• e.g. the CPD for rating(lp) is a function of iq(pete) and iq(rick)
• the entry in the CPD for iq(pete)=100 and iq(rick)=120?
  • apply the logical CPD for rating/1 to {(iq(pete),100), (iq(rick),120)}
  • result: the probability distribution 0.5 high / 0.5 low (100 + 120 = 220 ≤ 1000, so the Else branch applies)
If the sum of the values Val over all input pairs (iq(S), Val) > 1000
Then 0.7 high / 0.3 low
Else 0.5 high / 0.5 low
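Using the rating_cpd sketch from the previous slide, this computation looks as follows:

  # 100 + 120 = 220 <= 1000, so the Else branch applies.
  print(rating_cpd({("iq(pete)", 100), ("iq(rick)", 120)}))
  # {'high': 0.5, 'low': 0.5}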
27
DI (4)
Can we combine selection and aggregation? Yes:
• e.g. rating depends on the sum of the IQs of the female students
  sum(Val, (iq(S,Val), sex(S,fem)), Sum), Sum > 1000
    yes → 0.7 high / 0.3 low
    no  → 0.5 high / 0.5 low
• again cf. [Van Assche et al., SRL workshop '04]
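The same sketch extended with the selection; the sex/2 facts and the student names here are hypothetical, purely for illustration:

  # Illustrative sketch: aggregate over the selected (female) students only.
  female = {"ann", "sue"}   # hypothetical students with sex(S, fem)

  def rating_cpd_female(inputs):
      total = sum(v for atom, v in inputs
                  if atom[atom.index("(") + 1:-1] in female)  # selection
      if total > 1000:                                        # aggregation test
          return {"high": 0.7, "low": 0.3}
      return {"high": 0.5, "low": 0.5}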
28
LBNs: Pros & Cons / Conclusion
+ Qualitative part (V + DE): easy to interpret
+ High expressiveness:
• normal logic programs (non-monotonic negation, functors, …)
• combining selection and aggregation
- Comes at a cost:
• the quantitative part (DI) is more difficult (than for PRMs)
29
Future Work: Learning LBNs
Learning algorithms exist for PRMs & BLPs:
• at a high level, an appropriate mix will probably do for LBNs
• LBNs vs. PRMs: learning the quantitative component is more difficult for LBNs
• LBNs vs. BLPs:
  • LBNs separate V from DE
  • LBNs distinguish probabilistic from logical predicates = a bias (but one also used by BLPs in practice)