Lecture 5: Variational Estimation and Inference
TRANSCRIPT
Outline
• Estimation of Models in Exponential Families
• Estimation with partial observations - EM
• Mean Field Methods
2
Factorized Exponential Family
Consider an exponential family of joint distributions over […]:
Here, […] indicates the subset of components involved in the […]-th factor.
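The displayed formula for this family is absent from the transcript. As a sketch only, one standard way to write a factorized exponential family is shown below; the symbols $x$, $\theta_k$, $\phi_k$, $C_k$, $K$ and $A$ are assumptions chosen for illustration, not notation recovered from the slide.

```latex
p(x \mid \theta) = \exp\Big( \sum_{k=1}^{K} \langle \theta_k, \phi_k(x_{C_k}) \rangle - A(\theta) \Big),
\qquad
A(\theta) = \log \int \exp\Big( \sum_{k=1}^{K} \langle \theta_k, \phi_k(x_{C_k}) \rangle \Big)\, dx .
```

In this form, $C_k$ is the index set of the components of $x$ that the $k$-th factor touches, and $A(\theta)$ is the log-partition function that normalizes the density.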
3
With Complete Observations
Given […], the optimal estimates are given by
• With canonical parameterization, this is convex.
• Parameters may come with constraints.
• There can be analytic solutions; otherwise, one can solve this using numerical methods.
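A small case with an analytic solution may help here. The sketch below assumes a Bernoulli model written in canonical form, p(x | theta) = exp(theta * x - A(theta)) with A(theta) = log(1 + exp(theta)); the model choice and the function names are illustrative assumptions, not taken from the slides. The MLE matches the mean of the sufficient statistic, A'(theta) = mean of x.

```python
import math

def bernoulli_mle_natural_param(samples):
    """MLE of the natural parameter theta for p(x|theta) = exp(theta*x - A(theta)).

    Assumes the samples contain at least one 0 and at least one 1,
    so that the logit below is finite.
    """
    mean_stat = sum(samples) / len(samples)   # empirical mean of phi(x) = x
    # Moment matching: A'(theta) = sigmoid(theta) = mean_stat, so theta = logit(mean_stat).
    return math.log(mean_stat / (1.0 - mean_stat))

def avg_log_likelihood(theta, samples):
    """Average log-likelihood; concave in theta under the canonical parameterization."""
    log_partition = math.log1p(math.exp(theta))   # A(theta)
    return sum(theta * x - log_partition for x in samples) / len(samples)

samples = [1, 0, 1, 1, 0, 1, 1, 0]
theta_hat = bernoulli_mle_natural_param(samples)
```

Because the objective is concave in theta, any numerical optimizer started anywhere should reach the same theta_hat when no closed form is available.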
4
Example: GMM
• GMM involves: observed feature […] and component indicator […].
• How to estimate if both […] and […] are observed?
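When both the feature and the component indicator are observed, the estimate is available in closed form: each component is fitted on its own samples. The sketch below assumes a 1-D Gaussian mixture and uses the names x, z and fit_gmm_complete, all of which are illustrative assumptions.

```python
import numpy as np

def fit_gmm_complete(x, z, num_components):
    """Closed-form MLE for a 1-D GMM when the labels z are observed.

    Assumes every component has at least one assigned sample.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z)
    weights, means, variances = [], [], []
    for k in range(num_components):
        xk = x[z == k]                    # samples assigned to component k
        weights.append(len(xk) / len(x))  # fraction of samples in component k
        means.append(xk.mean())
        variances.append(xk.var())        # MLE variance (divides by n, not n - 1)
    return np.array(weights), np.array(means), np.array(variances)

x = [0.0, 2.0, 10.0, 12.0, 14.0]
z = [0, 0, 1, 1, 1]
weights, means, variances = fit_gmm_complete(x, z, num_components=2)
```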
5
Partially Observed Models
Consider an exponential family involving observed variables […] and latent variables […]:
Here, […] and […] refer to the observed parts and latent parts of the entire sample set.
8
Partially Observed Models (cont'd)
Given an observation […], we have
where […] is called the conditional log-partition:
This also belongs to an exponential family.
9
MLE with Partial Observations
The maximum likelihood estimate is obtained by maximizing the marginal likelihood over observed data:
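The formulas for the conditional distribution and the marginal likelihood are absent from the transcript. A sketch in assumed notation (observed $x$, latent $z$, joint sufficient statistics $\phi(x, z)$, log-partition $A(\theta)$, conditional log-partition $A_x(\theta)$; none of these symbols are recovered from the slides) would read:

```latex
p(z \mid x, \theta) = \exp\big( \langle \theta, \phi(x, z) \rangle - A_x(\theta) \big),
\qquad
A_x(\theta) = \log \int \exp\big( \langle \theta, \phi(x, z) \rangle \big)\, dz,
\\
\log p(x \mid \theta) = A_x(\theta) - A(\theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \; A_x(\theta) - A(\theta).
```

The second line follows by integrating the joint density $\exp(\langle \theta, \phi(x, z) \rangle - A(\theta))$ over $z$.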
10
Issues
• The conditional log-partition […] as below is often very difficult to evaluate:
• We usually resort to Expectation-Maximization (EM), a strategy that iteratively constructs and maximizes lower bounds of […].
11
Expectation-Maximization
The Expectation-Maximization (EM) algorithm is coordinate ascent on […]:
• E-step:
• M-step:
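To make the two steps concrete, here is a minimal sketch of EM for a 1-D Gaussian mixture. The model, the initial values and the function name em_gmm_1d are assumptions for illustration; the slides' own update formulas are not in the transcript. The E-step computes expected component indicators, and the M-step reuses the complete-data estimates with those expectations in place of the indicators.

```python
import numpy as np

def em_gmm_1d(x, means, variances, weights, num_iters=50):
    """EM for a 1-D Gaussian mixture; returns parameters and the log-likelihood trace."""
    x = np.asarray(x, dtype=float)
    means, variances, weights = (np.array(a, dtype=float)
                                 for a in (means, variances, weights))
    trace = []
    for _ in range(num_iters):
        # log of weights[k] * N(x[i] | means[k], variances[k]), shape (n, K)
        log_joint = (np.log(weights)
                     - 0.5 * np.log(2.0 * np.pi * variances)
                     - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_marginal = np.logaddexp.reduce(log_joint, axis=1)
        trace.append(log_marginal.sum())      # log-likelihood at the current parameters
        # E-step: posterior responsibilities, i.e. expected component indicators
        resp = np.exp(log_joint - log_marginal[:, None])
        # M-step: complete-data MLE with indicators replaced by their expectations
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances, trace

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
weights, means, variances, trace = em_gmm_1d(
    x, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```

The trace is useful as a sanity check: under exact E- and M-steps the log-likelihood should never decrease from one iteration to the next.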
14
E-step
• Each E-step reduces to maximizing […]; the optimal solution is the expectation of […]:
• By conjugate duality, with […], we have […], thus:
15
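[Figure: curves $\log L(\theta \mid x)$, $Q(\theta; \mu^{(t)})$ and $Q(\theta; \mu^{(t+1)})$, with $\theta^{(t-1)}$, $\theta^{(t)}$, $\theta^{(t+1)}$ marked on the axis]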
EM Optimizes […]
A stationary point is attained when […] and […] are dually coupled, w.r.t. both […] and […]:
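The objective and the coupling condition are absent from the transcript. A sketch consistent with the surrounding text, in assumed notation ($A(\theta)$ for the log-partition, $A_x(\theta)$ for the conditional log-partition, $A_x^{*}$ for its convex conjugate, $\phi$ for the sufficient statistics), and using the symbol $Q(\theta; \mu)$ that survives in the figure labels, would be:

```latex
A_x(\theta) = \sup_{\mu} \big\{ \langle \theta, \mu \rangle - A_x^{*}(\mu) \big\}
\;\;\Longrightarrow\;\;
\log p(x \mid \theta) \;\ge\; Q(\theta; \mu) := \langle \theta, \mu \rangle - A_x^{*}(\mu) - A(\theta),
\\
\text{E-step: } \mu^{(t+1)} = \mathbb{E}_{z \sim p(z \mid x, \theta^{(t)})}\big[ \phi(x, z) \big] = \nabla A_x(\theta^{(t)}),
\qquad
\text{M-step: } \theta^{(t+1)} = \arg\max_{\theta} \; \langle \theta, \mu^{(t+1)} \rangle - A(\theta).
```

Under these assumptions the bound is tight exactly when $\mu = \nabla A_x(\theta)$, which is one way to read "dually coupled".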
18
Info. Geo. Interpretation
• A parameter […] indicates a conditional distribution over […]: […].
• A mean […] is realized by another conditional distribution […] with […].
• The KL divergence between them:
19
Info. Geo. Interpretation
• For any […] and […]:
• E-step: minimize […] to close the gap between […] and […].
• M-step: M-projection of […] onto […].
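The "gap" closed by the E-step can be checked numerically. The sketch below assumes a discrete latent variable with three states and a fixed observation; the function name elbo_and_gap and the toy numbers are illustrative assumptions. It verifies that the log-likelihood minus the lower bound equals the KL divergence from the chosen distribution to the true posterior.

```python
import numpy as np

def elbo_and_gap(log_joint, q):
    """log_joint[k] = log p(x, z=k | theta) for a fixed x; q is a strictly positive distribution over z."""
    log_joint = np.asarray(log_joint, dtype=float)
    q = np.asarray(q, dtype=float)
    log_evidence = np.logaddexp.reduce(log_joint)       # log p(x | theta)
    posterior = np.exp(log_joint - log_evidence)        # p(z | x, theta)
    elbo = np.sum(q * (log_joint - np.log(q)))          # E_q[log p(x, z)] + H(q)
    kl = np.sum(q * (np.log(q) - np.log(posterior)))    # KL(q || posterior)
    return log_evidence, elbo, kl, posterior

log_joint = np.log([0.1, 0.3, 0.2])
log_evidence, elbo, kl, posterior = elbo_and_gap(log_joint, [0.2, 0.5, 0.3])
# Choosing q equal to the posterior (an exact E-step) closes the gap.
_, elbo_star, kl_star, _ = elbo_and_gap(log_joint, posterior)
```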
20
EM with iid samples
Consider a common problem: […] are generated from an exponential family distribution, and only […] is observed for each […]:
21
Variational EM
• Basic idea: Use a distribution […] from a tractable family […] to approximate […], and thus […] to approximate […].
• This is to restrict […] to […].
• The lower bound becomes:
27
Variational EM (cont'd)
• Variational E-step: with restriction to […], computing […] is tractable:
• M-step: remains the same
28
Variational E-step
• […] is usually chosen to be an exponential family, parameterized by […]. Then the variational E-step reduces to two steps.
• Step 1: Find optimal […] through I-projection:
• Step 2: Compute […]
29
Key Problem
• With […] given, […] remains an exponential family distribution: […] with […] and […].
• […] plays a key role in model estimation.
• Key problem: choose a tractable distribution […] from […] to approximate […] and compute […]
30
Mean Field Methods
• Consider an exponential family distribution […] for which it is intractable to compute the mean given […].
• Mean field methods use a distribution […] from a tractable family, usually in a product form, to approximate the given distribution […], and use […] to approximate […].
31
Product Form
• We say a joint distribution over […] is of the product form if its density can be written:
• An exponential family of product form:
32
Product Form (cont'd)
• Log-partition function:
• Expectation:
• If each factor is tractable, then the whole distribution is tractable.
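The tractability claim can be illustrated on a tiny case. The sketch below assumes independent binary components with p(x | theta) proportional to exp(sum_i theta_i x_i); this model and the function names are assumptions for illustration. The log-partition function is a sum of per-factor terms and the expectations are computed factor by factor, which the brute-force enumeration confirms.

```python
import itertools
import numpy as np

def log_partition_product(theta):
    """A(theta) for independent binary components: sum of per-factor log-partitions."""
    return np.sum(np.log1p(np.exp(theta)))

def log_partition_brute_force(theta):
    """Same quantity by enumerating all 2^d joint configurations."""
    scores = [np.dot(theta, x) for x in itertools.product([0, 1], repeat=len(theta))]
    return np.logaddexp.reduce(scores)

theta = np.array([0.5, -1.0, 2.0, 0.3])
mean = 1.0 / (1.0 + np.exp(-theta))   # per-component expectations E[x_i]
```

The factored computation costs O(d), while the enumeration costs O(2^d); that difference is what makes the product form attractive as an approximating family.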
33
Ising Model (approximation)
To find […] that approximates […], we perform I-projection of […] onto the factorized family […]:
with […].
36
Ising Model (approximation)
The best approximation can be solved iteratively:
Whereas […] is in a product form, the parameters associated with different components are usually coupled in the optimal approximation.
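The iterative solution can be sketched as follows. The update formula on the slide is not in the transcript, so this example assumes a binary model with states in {0, 1}, p(x) proportional to exp(theta . x + 0.5 x' W x) with symmetric W and zero diagonal, and a product of Bernoulli(mu_i) as the approximation; all names are hypothetical. Each coordinate update depends on the neighbouring means, which is the coupling mentioned above.

```python
import itertools
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mean_field_bound(theta, W, mu):
    """Lower bound on the log-partition: E_q[score] + H(q) for a product of Bernoulli(mu_i)."""
    entropy = -np.sum(mu * np.log(mu) + (1.0 - mu) * np.log(1.0 - mu))
    return theta @ mu + 0.5 * mu @ W @ mu + entropy

def naive_mean_field(theta, W, num_sweeps=100):
    """Coordinate updates mu_i <- sigmoid(theta_i + sum_j W_ij mu_j)."""
    mu = np.full(len(theta), 0.5)
    bounds = [mean_field_bound(theta, W, mu)]
    for _ in range(num_sweeps):
        for i in range(len(theta)):
            mu[i] = sigmoid(theta[i] + W[i] @ mu)   # W[i, i] = 0, so mu[i] itself is not used
        bounds.append(mean_field_bound(theta, W, mu))
    return mu, bounds

def exact_log_partition(theta, W):
    """Brute-force log-partition, feasible only for very small models."""
    scores = []
    for x in itertools.product([0, 1], repeat=len(theta)):
        x = np.array(x, dtype=float)
        scores.append(theta @ x + 0.5 * x @ W @ x)
    return np.logaddexp.reduce(scores)

# A 4-node cycle with uniform coupling, chosen only for illustration.
theta = np.array([0.2, -0.1, 0.3, 0.0])
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    W[i, j] = W[j, i] = 0.5
mu, bounds = naive_mean_field(theta, W)
```

Each coordinate update maximizes the bound exactly in mu_i with the others held fixed, so the recorded bounds should be non-decreasing and stay below the exact log-partition.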
37
Mean Field Theory
Consider an exponential family:
and a tractable family […]. Then for any […]:
[…] can generally be factorized into simpler forms.
38
Mean Field Theory (cont'd)
The difference between […] and the tractable lower bound is the KL divergence:
with […]. The optimum […] is the I-projection:
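The bound and the projection are absent from the transcript. A sketch of the standard statement, in assumed notation ($p_\theta(x) = \exp(\langle \theta, \phi(x) \rangle - A(\theta))$ for the target, $q$ for a member of the tractable family $\mathcal{Q}$, $H$ for entropy; none of these symbols are recovered from the slides), would be:

```latex
A(\theta) \;\ge\; \mathbb{E}_{q}\big[ \langle \theta, \phi(x) \rangle \big] + H(q)
\quad \text{for any } q \in \mathcal{Q},
\\
A(\theta) - \Big( \mathbb{E}_{q}\big[ \langle \theta, \phi(x) \rangle \big] + H(q) \Big)
= \mathrm{KL}\big( q \,\|\, p_{\theta} \big),
\qquad
q^{\star} = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\big( q \,\|\, p_{\theta} \big).
```

The identity follows by expanding $\mathrm{KL}(q \,\|\, p_\theta) = \mathbb{E}_q[\log q] - \mathbb{E}_q[\langle \theta, \phi(x) \rangle] + A(\theta)$, so maximizing the bound over $\mathcal{Q}$ and minimizing the KL divergence are the same problem.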
39
Naive Mean Field
The mean field methods are called naive mean field when […] is of product form. Consider:
and
40
Hence, the negative entropy of […] can be factorized:
The optimum […] can be solved by minimizing:
where […].
41
Naive Mean Field (Optima)
• This problem can be solved by coordinate descent.
• When the optimum is attained:
• Hence, the optimum […] is given by
42
Naive Mean Field (Discussion)
• In naive mean field, while […] is of a product form, the parameters associated with different components are generally coupled in the optimal approximation.
• The I-projection problem in naive mean field is non-convex in general. In practice, the coordinate descent procedure can be trapped in a local valley.
• Generally, it is unclear how far […] is from […].
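The non-convexity can be seen on a two-variable example. The sketch below assumes binary states in {0, 1} with p(x) proportional to exp(theta . x + 0.5 x' W x) and a strong positive coupling; the model, the numbers and the function name are illustrative assumptions. The exact marginals are 0.5 by symmetry, yet the coordinate updates settle in one of two different fixed points depending on the starting point.

```python
import numpy as np

def mean_field_fixed_point(theta, W, mu_init, num_sweeps=200):
    """Run coordinate updates mu_i <- sigmoid(theta_i + sum_j W_ij mu_j) from mu_init."""
    mu = np.array(mu_init, dtype=float)
    for _ in range(num_sweeps):
        for i in range(len(mu)):
            mu[i] = 1.0 / (1.0 + np.exp(-(theta[i] + W[i] @ mu)))
    return mu

# Configurations (0, 0) and (1, 1) have equal score, so the true marginals are 0.5.
theta = np.array([-4.0, -4.0])
W = np.array([[0.0, 8.0], [8.0, 0.0]])
mu_low = mean_field_fixed_point(theta, W, [0.1, 0.1])    # settles near 0
mu_high = mean_field_fixed_point(theta, W, [0.9, 0.9])   # settles near 1
```

Neither fixed point recovers the exact marginal of 0.5, which illustrates both the dependence on initialization and the difficulty of judging how far the approximation is from the target.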
43
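Latent Dirichlet Allocation
[Figure: plate diagram for LDA. Nodes: $\alpha$, $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$. Plate labels: M, Nnd.]
• Variables
• Parameters: $\alpha$, $\beta$
• Observed: $w$
• Latent: $\theta$, $z$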
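The roles of these variables are easiest to see in the generative process. The sketch below follows the standard LDA generative story (document-level topic proportions from a Dirichlet, a topic per word, then a word from that topic's distribution); the function name, argument names and toy values are assumptions, and the slide itself does not spell out the process.

```python
import numpy as np

def sample_lda_corpus(alpha, beta, doc_lengths, seed=0):
    """Draw (theta_d, z_d, w_d) per document from the LDA generative process.

    alpha: Dirichlet parameter, shape (K,). beta: topic-word probabilities, shape (K, V),
    each row summing to 1. doc_lengths: number of words in each document.
    """
    rng = np.random.default_rng(seed)
    num_topics, vocab_size = beta.shape
    corpus = []
    for n_d in doc_lengths:
        theta_d = rng.dirichlet(alpha)                       # latent topic proportions
        z_d = rng.choice(num_topics, size=n_d, p=theta_d)    # latent topic of each word
        w_d = np.array([rng.choice(vocab_size, p=beta[k]) for k in z_d])  # observed words
        corpus.append((theta_d, z_d, w_d))
    return corpus

alpha = np.array([0.5, 0.5, 0.5])
beta = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.1, 0.7, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])
corpus = sample_lda_corpus(alpha, beta, doc_lengths=[5, 8])
```

In an estimation setting only the words w_d would be kept; theta_d and z_d are the latent quantities that EM or mean field methods have to infer.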
45