context-specific independence parameter learning: mle
![Page 1: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/1.jpg)
Use Chapter 3 of K&F as a reference for CSI
Reading for parameter learning: Chapter 12 of K&F
Context-specific independence
Parameter learning: MLE
Graphical Models – 10708
Carlos Guestrin
Carnegie Mellon University
October 5th, 2005
![Page 2: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/2.jpg)
Announcements
Homework 2:
• Out today/tomorrow
• Programming part in groups of 2-3
Class project
• Teams of 2-3 students
• Ideas on the class webpage, but you can do your own
• Timeline:
  • 10/19: 1 page project proposal
  • 11/14: 5 page progress report (20% of project grade)
  • 12/2: poster session (20% of project grade)
  • 12/5: 8 page paper (60% of project grade)
• All write-ups in NIPS format (see class webpage)
![Page 3: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/3.jpg)
Clique trees versus VE
Clique tree advantages
• Multi-query settings
• Incremental updates
• Pre-computation makes complexity explicit
Clique tree disadvantages
• Space requirements – no factors are “deleted”
• Slower for single query
• Local structure in factors may be lost when they are multiplied together into initial clique potential
![Page 4: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/4.jpg)
Clique tree summary
• Solve marginal queries for all variables in only twice the cost of query for one variable
• Cliques correspond to maximal cliques in induced graph
• Two message passing approaches
  • VE (the one that multiplies messages)
  • BP (the one that divides by old message)
• Clique tree invariant
  • Clique tree potential is always the same
  • We are only reparameterizing clique potentials
• Constructing clique tree for a BN
  • from elimination order
  • from triangulated (chordal) graph
• Running time (only) exponential in size of largest clique
  • Solve exactly problems with thousands (or millions, or more) of variables, and cliques with tens of nodes (or less)
![Page 5: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/5.jpg)
Global Structure: Treewidth w
$O(n \exp(w))$
![Page 6: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/6.jpg)
Local Structure 1: Context-specific independence
[Figure: car-diagnosis Bayesian network with nodes Battery Age, Alternator, Fan Belt, Battery, Charge Delivered, Battery Power, Starter, Radio, Lights, Engine Turn Over, Gas Gauge, Gas, Fuel Pump, Fuel Line, Distributor, Spark Plugs, Engine Start]
![Page 7: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/7.jpg)
Local Structure 1: Context-specific independence
[Figure: the same car-diagnosis Bayesian network as on the previous slide]
Context Specific Independence (CSI)
After observing a variable, some vars become independent
![Page 8: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/8.jpg)
CSI example: Tree CPD
• Represent P(Xi|PaXi) using a decision tree
  • Path to leaf is an assignment to (a subset of) PaXi
  • Leaves are distributions over Xi given assignment of PaXi on path to leaf
• Interpretation of leaf: for the specific assignment of PaXi on the path to this leaf, Xi is independent of the other parents
• Representation can be exponentially smaller than equivalent table
[Figure: tree CPD over the variables Apply, SAT, Letter, Job]
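A minimal sketch of how a tree CPD might be stored and queried, using the variable names from the figure. The tree shape, the binary domains, and all probabilities below are assumptions made up for illustration; they are not taken from the slide.

```python
from itertools import product

# Hypothetical tree CPD for P(Job = 1 | Apply, SAT, Letter); all variables binary.
# An internal node is (variable, {value: subtree}); a leaf is P(Job = 1).
# The numbers are invented for illustration.
TREE = ("Apply", {
    0: 0.0,                       # did not apply: SAT and Letter are irrelevant
    1: ("SAT", {
        1: 0.9,                   # high SAT: Letter is irrelevant
        0: ("Letter", {0: 0.1, 1: 0.6}),
    }),
})

def p_job(tree, assignment):
    """Walk from the root to a leaf, reading only the parents on the path."""
    node = tree
    while isinstance(node, tuple):
        var, children = node
        node = children[assignment[var]]
    return node

def count_leaves(tree):
    """Number of parameters the tree representation needs."""
    if not isinstance(tree, tuple):
        return 1
    return sum(count_leaves(child) for child in tree[1].values())

def to_table(tree, parents):
    """Expand the tree into a full table with one row per parent assignment."""
    return {vals: p_job(tree, dict(zip(parents, vals)))
            for vals in product([0, 1], repeat=len(parents))}

table = to_table(TREE, ["Apply", "SAT", "Letter"])
print(count_leaves(TREE), "leaves vs", len(table), "table rows")
```

With k binary parents the table always has 2^k rows, while a tree like this one can stay linear in k, which is the sense in which the tree can be exponentially smaller.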
![Page 9: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/9.jpg)
Tabular VE with Tree CPDs
• If we turn a tree CPD into a table, “sparsity” is lost!
• Need inference approach that deals with tree CPD directly!
![Page 10: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/10.jpg)
Local Structure 2: Determinism
[Figure: the car-diagnosis Bayesian network, with the CPT for Lights given Battery Power]

| Battery Power | Lights = ON | Lights = OFF |
|---|---|---|
| OK | .99 | .01 |
| WEAK | .20 | .80 |
| DEAD | 0 | 1 |

Determinism: if Battery Power = Dead, then Lights = OFF
![Page 11: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/11.jpg)
Determinism and inference
• Determinism gives a little sparsity in table, but much bigger impact on inference
• Multiplying deterministic factor with other factor introduces many new zeros
  • Operations related to theorem proving, e.g., unit resolution
[Table: P(Lights | Battery Power) for (ON, OFF): OK → (.99, .01), WEAK → (.20, .80), DEAD → (0, 1)]
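As a small illustration of how zeros spread under factor multiplication, the sketch below multiplies the Lights CPT from the slide by a second factor. The second factor `g` over (Lights, Radio) and its numbers are hypothetical, chosen only to be strictly positive.

```python
from itertools import product

BATTERY = ["OK", "WEAK", "DEAD"]
ONOFF = ["ON", "OFF"]

# P(Lights | Battery Power), taken from the table on the slide.
p_lights = {("OK", "ON"): 0.99, ("OK", "OFF"): 0.01,
            ("WEAK", "ON"): 0.20, ("WEAK", "OFF"): 0.80,
            ("DEAD", "ON"): 0.0, ("DEAD", "OFF"): 1.0}

# Hypothetical, strictly positive factor over (Lights, Radio); numbers are made up.
g = {("ON", "ON"): 0.7, ("ON", "OFF"): 0.3,
     ("OFF", "ON"): 0.4, ("OFF", "OFF"): 0.6}

# Factor product over (Battery Power, Lights, Radio).
prod = {(b, l, r): p_lights[(b, l)] * g[(l, r)]
        for b, l, r in product(BATTERY, ONOFF, ONOFF)}

zeros_before = sum(v == 0.0 for v in p_lights.values())
zeros_after = sum(v == 0.0 for v in prod.values())
print(zeros_before, "zero in the CPT;", zeros_after, "zeros in the product")
```

Each zero in the deterministic factor is copied once for every assignment to the variables it is multiplied against, so the number of zeros grows with the scope of the product.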
![Page 12: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/12.jpg)
Today’s Models …
Often characterized by:
• Richness in local structure (determinism, CSI)
• Massiveness in size (10,000’s variables)
• High connectivity (treewidth)
Enabled by:
• High level modeling tools: relational, first order
• Advances in machine learning
• New application areas (synthesis):
  • Bioinformatics (e.g. linkage analysis)
  • Sensor networks
Exploiting local structure a must!
![Page 13: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/13.jpg)
Exact inference in large models is possible…
BN from a relational model
![Page 14: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/14.jpg)
Recursive Conditioning
• Treewidth complexity (worst case)
• Better than treewidth complexity with local structure
• Provides a framework for time-space tradeoffs
• Only quick intuition today, details:
  • Koller & Friedman: 3.1-3.4, 6.4-6.6
  • “Recursive Conditioning”, Adnan Darwiche. In Artificial Intelligence Journal, 125:1, pages 5-41
![Page 15: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/15.jpg)
A. Darwiche
The Computational Power of Assumptions
[Figure: car-diagnosis Bayesian network with nodes Battery Age, Alternator, Fan Belt, Battery, Charge Delivered, Battery Power, Starter, Radio, Lights, Engine Turn Over, Gas Gauge, Gas, Leak, Fuel Line, Distributor, Spark Plugs, Engine Start]
![Page 16: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/16.jpg)
The Computational Power of Assumptions
[Figure: the same car-diagnosis network as on the previous slide]
![Page 17: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/17.jpg)
Decomposition
[Figure: the same car-diagnosis network]
![Page 18: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/18.jpg)
Case Analysis
[Figure: two copies of the car-diagnosis network, one per case; expression: p + p]
![Page 19: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/19.jpg)
Case Analysis
[Figure: the car-diagnosis network under case analysis, with the sub-networks Battery Age through Gas Gauge and Gas through Engine Start shown separately; expression: pl * pr + p]
![Page 20: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/20.jpg)
Case Analysis
[Figure: the car-diagnosis network under case analysis, with the sub-networks Battery Age through Gas Gauge and Gas through Engine Start shown separately; expression: pl * pr + pl * pr]
![Page 21: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/21.jpg)
Case Analysis
[Figure: as on the previous slide; expression: pl * pr + pl * pr]
![Page 22: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/22.jpg)
Case Analysis
[Figure: as on the previous slide; expression: pl * pr + pl * pr]
![Page 23: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/23.jpg)
Decomposition Tree
[Figure: decomposition tree (dtree) for a network over A, B, C, D, E, with leaves f(A), f(A,B), f(B,C), f(C,D), f(B,D,E); a cutset is marked]
![Page 24: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/24.jpg)
Decomposition Tree
[Figure: the same dtree, with leaves f(A), f(A,B), f(B,C), f(C,D), f(B,D,E); a cutset is marked]
![Page 25: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/25.jpg)
Decomposition Tree
[Figure: the same dtree, with leaves f(A), f(A,B), f(B,C), f(C,D), f(B,D,E); a cutset is marked]
Time: $O(n \exp(w \log n))$
Space: linear
(using appropriate dtree)
![Page 26: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/26.jpg)
RC1
RC1(T, e)  // compute probability of evidence e on dtree T
  If T is a leaf node
    Return Lookup(T, e)
  Else
    p := 0
    for each instantiation c of cutset(T)-E do
      p := p + RC1(Tl, ec) * RC1(Tr, ec)
    return p
![Page 27: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/27.jpg)
Lookup(T, e)
  ΘX|U : CPT associated with leaf T
  If X is instantiated in e, then
    x: value of X in e
    u: value of U in e
    Return θx|u
  Else return 1 = Σx θx|u
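The RC1 and Lookup pseudocode can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the reference implementation: the `Leaf` and `Node` classes, the binary domains, the three-variable chain A → B → C, and all CPT numbers are hypothetical. The cutset at a node is computed as the variables shared by its two children that are not yet set in `e`, which matches cutset(T)-E once the ancestors' cutsets have been added to `e`.

```python
from itertools import product

class Leaf:
    """Dtree leaf holding the CPT of `var` given `parents`."""
    def __init__(self, var, parents, cpt):
        self.var, self.parents, self.cpt = var, tuple(parents), cpt
        self.vars = {var, *parents}

class Node:
    """Internal dtree node with two children."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.vars = left.vars | right.vars

def lookup(leaf, e):
    # In a dtree with one leaf per CPT, the parents of an instantiated X
    # have already been set by some ancestor cutset or by the evidence.
    if leaf.var in e:
        u = tuple(e[p] for p in leaf.parents)
        return leaf.cpt[u][e[leaf.var]]
    return 1.0  # summing theta_{x|u} over x gives 1

def rc1(t, e):
    """Probability of evidence e (dict var -> value) on dtree t."""
    if isinstance(t, Leaf):
        return lookup(t, e)
    cut = sorted((t.left.vars & t.right.vars) - set(e))
    p = 0.0
    for c in product([0, 1], repeat=len(cut)):
        ec = {**e, **dict(zip(cut, c))}
        p += rc1(t.left, ec) * rc1(t.right, ec)
    return p

# Hypothetical chain A -> B -> C; cpt[parent_values][value].
leaf_a = Leaf("A", [], {(): [0.6, 0.4]})
leaf_b = Leaf("B", ["A"], {(0,): [0.9, 0.1], (1,): [0.2, 0.8]})
leaf_c = Leaf("C", ["B"], {(0,): [0.7, 0.3], (1,): [0.5, 0.5]})
root = Node(leaf_a, Node(leaf_b, leaf_c))

print(rc1(root, {"C": 1}))  # P(C = 1)
```

No results are stored between recursive calls, so the sketch uses only the memory needed for the current recursion path and recomputes shared subproblems.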
![Page 28: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/28.jpg)
Caching
[Figure: dtree for a network over A, B, C, D, E, F, with leaves A, AB, BC, CD, DE, EF; a cache at an internal node is indexed by the instantiation of its context (example entries .27, .39)]
![Page 29: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/29.jpg)
Caching
[Figure: dtree for a network over A, B, C, D, E, F, with leaves A, AB, BC, CD, DE, EF; a cache at an internal node is indexed by the instantiation of its context (example entries .27, .39)]
Time: $O(n \exp(w))$
Space: $O(n \exp(w))$
(using appropriate dtree)
Recursive Conditioning: an any-space algorithm with treewidth complexity
Darwiche AIJ-01
![Page 30: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/30.jpg)
RC2
RC2(T, e)
  If T is a leaf node, return Lookup(T, e)
  y := instantiation of context(T)
  If cacheT[y] <> nil, return cacheT[y]
  p := 0
  For each instantiation c of cutset(T)-E do
    p := p + RC2(Tl, ec) * RC2(Tr, ec)
  cacheT[y] := p
  Return p
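A Python sketch of RC2 under the same kind of assumptions: the `Leaf`/`Node` classes, the `chain_dtree` helper, the binary chain over A through F, and the CPT numbers are all hypothetical. Here context(T) is taken to be the variables of T that were instantiated by the cutsets of its ancestors, and the `stats` counter is added only to show how often a node is actually expanded.

```python
from itertools import product

class Leaf:
    def __init__(self, var, parents, cpt):
        self.var, self.parents, self.cpt = var, tuple(parents), cpt
        self.vars = {var, *parents}

class Node:
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.vars = left.vars | right.vars
        self.cache = {}  # valid for one fixed evidence set only

def lookup(leaf, e):
    if leaf.var in e:
        return leaf.cpt[tuple(e[p] for p in leaf.parents)][e[leaf.var]]
    return 1.0

def rc2(t, e, acut=frozenset(), stats=None):
    """RC with caching; acut holds variables set by ancestors' cutsets."""
    if isinstance(t, Leaf):
        return lookup(t, e)
    # y: instantiation of context(T)
    y = tuple((v, e[v]) for v in sorted(t.vars & acut))
    if y in t.cache:
        return t.cache[y]
    if stats is not None:
        stats["expanded"] += 1
    cut = sorted((t.left.vars & t.right.vars) - set(e))
    below = acut | set(cut)
    p = 0.0
    for c in product([0, 1], repeat=len(cut)):
        ec = {**e, **dict(zip(cut, c))}
        p += rc2(t.left, ec, below, stats) * rc2(t.right, ec, below, stats)
    t.cache[y] = p
    return p

def chain_dtree(names):
    """Right-leaning dtree for a chain; all CPT numbers are made up."""
    prior = {(): [0.6, 0.4]}
    trans = {(0,): [0.9, 0.1], (1,): [0.2, 0.8]}
    leaves = [Leaf(names[0], [], prior)]
    leaves += [Leaf(x, [u], trans) for u, x in zip(names, names[1:])]
    tree = leaves[-1]
    for leaf in reversed(leaves[:-1]):
        tree = Node(leaf, tree)
    return tree

stats = {"expanded": 0}
p_f1 = rc2(chain_dtree("ABCDEF"), {"F": 1}, stats=stats)
print(p_f1, stats["expanded"])
```

Because each cache is keyed only by the context instantiation, a fresh dtree (or cleared caches) is needed whenever the evidence changes; this sketch simply builds a new dtree per query.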
![Page 31: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/31.jpg)
Decomposition with Local Structure
[Figure: network fragment with nodes A, B, C, X; label: A, B, C]
X Independent of B, C given A
![Page 32: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/32.jpg)
Decomposition with Local Structure
[Figure: network fragment with nodes A, B, C, X; label: A, B, C]
X Independent of B, C given A
![Page 33: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/33.jpg)
Decomposition with Local Structure
X Independent of B, C given A
[Figure: network fragment with nodes A, B, C, X; label: A, B, C]
No need to consider an exponential number of cases (in the cutset size) given local structure
![Page 34: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/34.jpg)
Caching with Local Structure
[Figure: network fragment with nodes A, B, C, X; labels: A,B,C; B,C; A; a structural cache with entries indexed by instantiations of A, B, C]
![Page 35: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/35.jpg)
Caching with Local Structure
[Figure: network fragment with nodes A, B, C, X; labels: A,B,C; B,C; A; a structural cache with entries indexed by instantiations of A, B, C]
![Page 36: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/36.jpg)
Caching with Local Structure
[Figure: network fragment with nodes A, B, C, X; labels: A,B,C; B,C; A; a structural cache shown next to a non-structural cache]
No need to cache an exponential number of results (in the context size) given local structure
![Page 37: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/37.jpg)
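Determinism…
[Figure: network fragment with nodes A, B, C, X; label: A, B, C]
$A \Rightarrow X$, $B \Rightarrow X$, $C \Rightarrow X$
$\neg A \wedge \neg B \wedge \neg C \Rightarrow \neg X$
A natural setup to incorporate SAT technology:
• Unit resolution to:
  • Derive values of variables
  • Detect/skip inconsistent cases
• Dependency directed backtracking
• Clause learning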
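To make the role of unit resolution concrete, here is a minimal sketch of unit propagation over the clauses that encode the deterministic relation between A, B, C and X on this slide. The clause encoding and the function name `unit_propagate` are assumptions made for illustration; real SAT-based inference engines use far more efficient data structures.

```python
def unit_propagate(clauses, assignment):
    """Repeatedly apply unit resolution.

    clauses: list of clauses, each a list of (variable, polarity) literals.
    assignment: dict variable -> bool with the values fixed so far.
    Returns the extended assignment, or None if some clause is falsified.
    """
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(v) == pol for v, pol in clause):
                continue  # clause already satisfied
            free = [(v, pol) for v, pol in clause if v not in assignment]
            if not free:
                return None  # every literal is false: inconsistent case
            if len(free) == 1:
                v, pol = free[0]
                assignment[v] = pol  # unit clause: the value is forced
                changed = True
    return assignment

# A => X, B => X, C => X, and (not A and not B and not C) => not X, as clauses.
CLAUSES = [
    [("A", False), ("X", True)],
    [("B", False), ("X", True)],
    [("C", False), ("X", True)],
    [("A", True), ("B", True), ("C", True), ("X", False)],
]

derived = unit_propagate(CLAUSES, {"A": True})              # derives X = True
conflict = unit_propagate(CLAUSES, {"A": True, "X": False})  # inconsistent
print(derived, conflict)
```

A derived value removes a variable from the cases that must be enumerated, and a detected conflict lets the whole case be skipped, which is where the savings during conditioning come from.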
![Page 38: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/38.jpg)
CSI Summary
• Exploit local structure
  • Context-specific independence
  • Determinism
• Significantly speed-up inference
  • Tackle problems with tree-width in the thousands
• Acknowledgements
  • Recursive conditioning slides courtesy of Adnan Darwiche
  • Implementation available: http://reasoning.cs.ucla.edu/ace
![Page 39: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/39.jpg)
Where are we?
• Bayesian networks
  • Represent exponentially-large probability distributions compactly
• Inference in BNs
  • Exact inference very fast for problems with low tree-width
  • Exploit local structure for fast inference
• Now: Learning BNs
  • Given structure, estimate parameters
![Page 40: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/40.jpg)
Thumbtack – Binomial Distribution
P(Heads) = θ, P(Tails) = 1-θ
Flips are i.i.d.:
• Independent events
• Identically distributed according to Binomial distribution
Sequence D of αH Heads and αT Tails
![Page 41: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/41.jpg)
Maximum Likelihood Estimation
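Data: Observed set D of αH Heads and αT Tails
Hypothesis: Binomial distribution
Learning θ is an optimization problem
• What’s the objective function?
MLE: Choose θ that maximizes the probability of observed data:
$$\hat{\theta} = \arg\max_{\theta} P(D \mid \theta) = \arg\max_{\theta} \ln P(D \mid \theta), \qquad P(D \mid \theta) = \theta^{\alpha_H} (1-\theta)^{\alpha_T}$$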
![Page 42: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/42.jpg)
Your first learning algorithm
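Set derivative to zero:
$$\frac{d}{d\theta} \ln P(D \mid \theta) = \frac{\alpha_H}{\theta} - \frac{\alpha_T}{1-\theta} = 0 \quad\Rightarrow\quad \hat{\theta} = \frac{\alpha_H}{\alpha_H + \alpha_T}$$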
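As a quick numerical check, the sketch below compares the closed-form estimate αH / (αH + αT) with a brute-force search over a grid of θ values. The counts (3 heads, 2 tails) and the grid resolution are arbitrary choices made for illustration.

```python
import math

def log_likelihood(theta, heads, tails):
    """ln P(D | theta) for a sequence with the given counts."""
    return heads * math.log(theta) + tails * math.log(1 - theta)

def mle(heads, tails):
    """Closed-form maximizer obtained by setting the derivative to zero."""
    return heads / (heads + tails)

heads, tails = 3, 2  # made-up data
theta_hat = mle(heads, tails)

# Brute-force check on a grid of theta values strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: log_likelihood(t, heads, tails))

print(theta_hat, best)
```

The log-likelihood is concave in θ, so the stationary point found by the derivative is the global maximum and the grid search lands on the same value.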
![Page 43: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/43.jpg)
MLE for conditional probabilities
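MLE estimate of P(X=x) = $\frac{\mathrm{Count}(X=x)}{m}$, where m is the number of data points
MLE estimate of P(X=x|Y=y) = $\frac{\mathrm{Count}(X=x, Y=y)}{\mathrm{Count}(Y=y)}$
• Only consider subset of data where Y=y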
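The counting estimate for a conditional probability can be sketched as follows. The function name `mle_conditional` and the toy data are hypothetical; the only idea taken from the slide is to restrict attention to the samples where Y=y.

```python
from collections import Counter

def mle_conditional(data, x, y):
    """Estimate P(X=x | Y=y) from a list of (x, y) pairs by counting."""
    joint = Counter(data)
    count_y = sum(c for (_, yv), c in joint.items() if yv == y)
    if count_y == 0:
        raise ValueError("no samples with Y = %r; the MLE is undefined" % (y,))
    return joint[(x, y)] / count_y

# Made-up samples of (X, Y).
data = [("h", "a"), ("h", "a"), ("t", "a"),
        ("h", "b"), ("t", "b"), ("t", "b"), ("t", "b")]

print(mle_conditional(data, "h", "a"))  # uses only the three samples with Y = a
print(mle_conditional(data, "t", "b"))  # uses only the four samples with Y = b
```

Note that the estimate is undefined when no sample has Y=y, which the sketch reports as an error instead of guessing a value.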
![Page 44: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/44.jpg)
Learning the CPTs
Data: x(1), …, x(m)
![Page 45: Context-specific independence Parameter learning: MLE](https://reader031.vdocuments.us/reader031/viewer/2022012516/619020861fdc3b289e42110c/html5/thumbnails/45.jpg)
MLE learning CPTs for general BN
Vars X1,…,Xn and BN structure given
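Each i.i.d. data point assigns a value to all vars
Likelihood of the data:
$$P(D \mid \theta) = \prod_{j=1}^{m} \prod_{i=1}^{n} P\left(x_i^{(j)} \mid \mathbf{pa}_{X_i}^{(j)}, \theta\right)$$
MLE for CPT P(Xi | PaXi):
$$\hat{P}(X_i = x \mid \mathbf{Pa}_{X_i} = \mathbf{u}) = \frac{\mathrm{Count}(X_i = x, \mathbf{Pa}_{X_i} = \mathbf{u})}{\mathrm{Count}(\mathbf{Pa}_{X_i} = \mathbf{u})}$$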
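Because the likelihood factorizes over the CPTs, each CPT can be estimated separately by counting. The sketch below assumes fully observed data; the function name `learn_cpts`, the two-variable structure Rain → Wet, and the five data points are hypothetical.

```python
from collections import Counter

def learn_cpts(data, parents):
    """MLE CPTs by counting.

    data: list of dicts, each assigning a value to every variable.
    parents: dict variable -> tuple of parent variables (the given structure).
    Returns cpts[var][(parent_values, value)] = estimated probability.
    Combinations never seen in the data are simply absent from the result.
    """
    cpts = {}
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        cpts[var] = {(u, x): n / marg[u] for (u, x), n in joint.items()}
    return cpts

# Hypothetical structure Rain -> Wet with made-up, fully observed data.
structure = {"Rain": (), "Wet": ("Rain",)}
data = [
    {"Rain": 1, "Wet": 1},
    {"Rain": 1, "Wet": 1},
    {"Rain": 1, "Wet": 0},
    {"Rain": 0, "Wet": 0},
    {"Rain": 0, "Wet": 0},
]

cpts = learn_cpts(data, structure)
print(cpts["Rain"][((), 1)], cpts["Wet"][((1,), 1)])
```

With few samples the counts for some parent assignments can be zero, in which case the MLE for that row is undefined; this sketch leaves such rows out instead of filling them in.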