Download - n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

Transcript

Page 1: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

机器学习十大算法-EM算法

一.两个例子

例1：Dempster,Laird,Rubin(1977)

2 31 4

1 2 3 4

1 2 3 4

1 2 3 4

1 1 1( , , , ) ( , , , , )

2 4 4 4 4

! 2 1

! ! ! ! 4 4 4

log(2 ) ( ) log(1 ) log log

x xx x

X x x x x MN n

n

x x x x

x x x x C

C

设。

对数似然函数为：

L( |X)=logf(X; )=log[ ( ) ( ) ( ) ]

=

其中为与无关的常数。

多项分布

Page 2: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

2 31 4( | )0

2 1

x xx xL X

于是似然方程为：

解称为极大似然估计。

下面给出另外一种算法

2 31 2 4

1 1 2

1 2 2 3 4

1 2 2 3 4

2 4 2 3

( , ),

1 1 1, , , , ( , , , , , )

2 4 4 4 4

! 1 1( | , ) log

! ! ! ! ! 2 4 4

) log ( ) log(1 ) log '

'

x xy y x

C

x y y

y y x x x MN n

nL X Y

y y x x x

x x x C

C

将分为使

（）。

于是上述完全数据的对数似然函数为：

[ ( ) ( ) ( ) ]

=(y

其中为与无关的常数。

Page 3: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

2 42

2 2 3 4

ˆ y xy

y x x x

，但为缺失数据，不能使用。

EM算法的思路如下：( ) ( )

( )

2 1 ( ) ( )

( )

( )

2 4 2 3

( )

2 4 2 3

( )

14 2 3( )

/ 4| , ( , )

1/ 2 / 4 2

[ ( | , ) | , ]

[ ) log ( ) log(1 ) log ' | , ]

( [ | , ] ) log ( ) log(1 ) log '

( ) log ( ) log(1 ) lo2

t tt

t t

t

C

t

t

t

t

X BN x

E L X Y X

E x x x C X

E y X x x x C

xx x x

因为y

于是（E步）

(y

( )g ' ( | )tC Q 。

容易看出其MLE为：

二项分布

Page 4: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

( )

( )

14( )

1)

( )

12 3 4( )

2 4

2 2 3 4

max ( | )

2

2

ˆ

t

t

tt

t

t

Q

xx

xx x x

y x

y x x x

EM

（

然后求解（M步）

得迭代公式

注意与进行比较，可以看出

算法的合理性。

Page 5: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

若X=(125,18,20,34),不难计算此时极大似然估计为0.6268.

现用EM算法进行计算,可以看出不管初值如何,均收敛到极大似然估计。

k 0 theta 0.5

k 1 theta 0.608247

k 2 theta 0.624321

k 3 theta 0.626489

k 4 theta 0.626777

k 5 theta 0.626816

k 0 theta 0.2

k 1 theta 0.544166

k 2 theta 0.615135

k 3 theta 0.625256

k 4 theta 0.626613

k 5 theta 0.626794

k 6 theta 0.626818

实例计算如下：

Page 6: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

若X=(1997,906,904,32),不难此时计算极大似然估计为0.0357.现用EM算法计算如下：

k 0 theta 0.3

k 1 theta 0.139111

k 2 theta 0.0820893

k 3 theta 0.0576522

k 4 theta 0.0463409

k 5 theta 0.0409191

k 6 theta 0.0382769

k 7 theta 0.0369788

k 8 theta 0.0363386

k 9 theta 0.0360222

k 10 theta 0.0358657

k 11 theta 0.0357882

k 12 theta 0.0357499

k 13 theta 0.0357309

k 0 theta 0.9

k 1 theta 0.264753

k 2 theta 0.127901

k 3 theta 0.0774875

k 4 theta 0.0555629

k 5 theta 0.0453485

k 6 theta 0.0404376

k 7 theta 0.0380408

k 8 theta 0.0368625

k 9 theta 0.0362811

k 10 theta 0.0359938

k 11 theta 0.0358516

k 12 theta 0.0357813

k 13 theta 0.0357465

k 14 theta 0.0357292

Page 7: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

例2：混合分布

1 2

1

1

1

1 1

, , ,

g( , , , ) ( )

0 j 1,2, , ; 1.

L( | ) [ ( )]

n

m

m j j

j

m

j j

j

n m

j j i

i j

X X X

x p p p x

p m p

p X Log p X

设独立同分布，密度函数为

，

其中，

对数似然函数为

上述函数的最大似然估计不易求解。

Page 8: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

无关的常数。为与其中

)(

)]([

])([)|p（

为于是完全对数似然函数

.,,1),(的条件分布服从|且

;))0,0,1,0,,0((

布。为缺失数据，独立同分,,,设

1 1

1 1

1 1

21

pC

CLogpY

XLogYLogpY

XpLogZL

mjxeYX

peYP

YYY

m

jj

n

iij

n

i

m

jijijjij

n

i

m

ji

Y

j

Y

jC

jjii

jji

n

ijij

Page 9: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

1

1p̂ , 1, , .

n

j ij ij

i

Y j m Yn

但为缺失数据，无法使用。

(t) ( )

( )

1 1

( )

( )1 1

1

E

Q(p|p ) [ ( | ) | , ]

[ ( | , )]

( )[ ]

( )

t

C

m nt

j ij

j i

tm nj j i

j mtj i

j j i

j

E L p Z X p

Logp E Y X p C

p XLogp C

p X

步：

容易看出其MLE为：

EM算法如下：

Page 10: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

( ) ( )

( ) ( )

( )

1

( )

( )

( )

1

( )

( 1) ( )

1

( ) ( | )( | , )

( ) ( | )

( ), 1, , .

( )

max ( | )

1, 1, , .

t t

t t

i j i i jp pt

i j i m

i j i i jp pj

t

j j i t

ijmt

j j i

j

t

p

nt t

j ij

i

P Y e P X Y eP Y e X p

P Y e P X Y e

p XY j m

p X

M

Q p p

p Y j mn

上式条件期望的计算利用贝叶斯公式

步：

解得迭代公式

Page 11: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

p <- 0.7 #mixing parameter

n <- 50 #sample size

mu <- c(3, 10) #parameters of the normal densities

sigma <- c(1, 1)

# generate the mixed normal distribution sample

i <- sample(1:2, size=n, replace=TRUE, prob=c(p, 1-p))

x <- rnorm(n, mu[i], sigma[i])

# mle of the mixed normal distribution

logL <- function(y) {

-sum( log(y * dnorm(x, mu[1], sigma[1]) +

(1-y) * dnorm(x, mu[2], sigma[2])) )

}

optimize(logL, lower=0, upper=1)

$minimum

[1] 0.7399919

$objective

[1] 102.5042

Page 12: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

# EM alogrithm

p <- 0.2 #intialize of parameter

y <- c(1:n)

k <- 1

while(k <= 5) {

for(i in 1:n) {

y[i] <- ( p * dnorm(x[i], mu[1], sigma[1]) ) /

( p * dnorm(x[i], mu[1], sigma[1]) + ( 1 - p ) * dnorm(x[i], mu[2], sigma[2]) )

}

p <- sum( y ) / n

print(p)

k <- k+1

}

[1] 0.7400001

[1] 0.740003

[1] 0.740003

[1] 0.740003

[1] 0.740003

Page 13: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

例3：带参数的混合分布

2

2

1 2

1

( )

2

1

2

1 1

1 1

, , ,

( ; ) ( )

10 1, ( ) ,j 1,2, , .

2

( , , , , , , ).

L( | ) [ ( )]

i

n

m

j j

j

xm

j j j

j

m m

n m

j j i

i j

X X X

f x p f x

p p f x e m

p p

p X Log p f X

设独立同分布，密度函数为

，

其中，

对数似然函数为

上述函数的最大似然估计不易求解。

Page 14: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

1 2

1 1

1 1

2

1 1

, , ,

( (0, ,0,1,0, 0)) ;

| ( ), 1, , .

( | ) [ ( )]

[ ( )]

(1( ) [log

2

ij ij

n

i j j

i i j j

n mY Y

C j j i

i j

n m

ij j ij j i

i j

m ni

ij j ij

j i

Y Y Y

P Y e p

X Y e x j m

L Z Log p f X

Y Logp Y Logf X

XY Logp Y

设为缺失数据，独立同分布。

且的条件分布服从f

于是完全对数似然函数为

2

21 1

)]

m nj

j i

C

C

其中为与无关的常数。

Page 15: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

(t) ( )

( )

1 1

2

( ) 2

21 1

E

Q(p|p ) [ ( | ) | , ]

[ ( | , )]

( )1( | , )[log ]

2

t

C

m nt

j ij

j i

m ni jt

ij

j i

E L Z X

Logp E Y X

XE Y X C

步：

EM算法如下：

Page 16: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

( )

( ) ( )

( )

1

( )

( 1) ( )

1

( )

( 1) 2 ( 1) ( ) ( 1) 21

( ) 1 1

1

( )( | , ) , 1, , .

( )

max ( | )

1, 1, , ;

1, 1, , ; ( ) ( )

t

j j it t

ij i ijmt

j j i

j

t

p

nt t

j ij

i

nt

ij i m nt t t ti

j ij i jnt j i

ij

i

p f XE Y X Y j m

p f X

M

Q

p Y j mn

Y X

j m Y Xn

Y

上式条件期望

步：

解得迭代公式

Page 17: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

p <- 0.7 #mixing parameter

n <- 200 #sample size

mu <- c(3, 10) #parameters of the normal densities

sigma0 <- c(1, 1)

# generate the mixed normal distribution sample

i <- sample(1:2, size=n, replace=TRUE, prob=c(p, 1-p))

x <- rnorm(n, mu[i], sigma0[i])

# EM alogrithm

y <- c(1:n)

k <- 1

for(j in 1:5) {

for(i in 1:n) {

y[i] <- ( p * dnorm(x[i], mu1, sigma) ) /

( p * dnorm(x[i], mu1, sigma) + ( 1 - p ) * dnorm(x[i], mu2, sigma) )

}

p <- sum( y ) / n

mu1 <- sum(y * x) / sum(y)

mu2 <- sum((1 - y) * x) / sum(1-y)

sigma <- ( sum(y*(x-mu1)^2) + sum((1-y)*(x-mu2)^2) ) / n

print(cbind(p,mu1,mu2,sigma))

}

Page 18: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

p <- 0.6 #intialize of parameter

mu1 <- 5

mu2 <- 7

sigma <- 1

p mu1 mu2 sigma

[1,] 0.7066418 2.919782 9.94336 0.9731005

p mu1 mu2 sigma

[1,] 0.71 2.92178 10.0198 0.8256728

p mu1 mu2 sigma

[1,] 0.71 2.92178 10.0198 0.8256686

p mu1 mu2 sigma

[1,] 0.71 2.92178 10.0198 0.8256686

p mu1 mu2 sigma

[1,] 0.71 2.92178 10.0198 0.8256686

p <- 0.6 #intialize of parameter

mu1 <- 2

mu2 <- 5

sigma <- 1

p mu1 mu2 sigma

[1,] 0.491604 2.627146 7.500634 4.755987

p mu1 mu2 sigma

[1,] 0.4950997 4.027033 6.161662 9.552981

p mu1 mu2 sigma

[1,] 0.4950526 4.978672 5.228472 10.67643

p mu1 mu2 sigma

[1,] 0.495052 5.092976 5.116407 10.6919

p mu1 mu2 sigma

[1,] 0.495052 5.103701 5.105893 10.69203

Page 19: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

二.EM算法及其收敛性

设Z=(X,Y),其中X为观测数据，Y为缺失数据，Z为完全数据。

( | ) log ( ; )

max ( | )

L X f X

L X

对数似然函数为：

，

可能不易求解，其中为的值域。

EM算法思路如下：

( )

( ) ( )) [ ( | ) | , ]

t

t t

C

E

E L Z X

步：在假定的条件下，计算对数完全似然的条件

期望，即：

Q( |

Page 20: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

( ) 1)

( 1) ( ) ( )

( 1) ( ) ( ) ( )

),

( | ) ( | ), .

( | ) ( | )

t t

t t t

t t t t

M

Q Q

Q Q

GEM

（步：通过最大化Q( | 求出。即

注：若只要求，则称为

算法。

( 1) ( )( | ) ( | ), 0,1,2,t tL X L X t 定理：

Page 21: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

dYXYfXYLogfdYXYfXYLogf

Jensen

dYXYfXYLogfXLQ

XYf

XYLogfXLZL

ttt

tt

t

C

),|(),|(),|(),|(

不等式，有由

),|(),|()|()|(

求期望，得),|(两边对

),|()|()|(证明：

)()()(

)()(

)(

)|()|(于是有

),|(),|()|()|(

),|(),|()|()|(又

)()1(

)()()()()(

)()1()1()()1(

XLXL

dYXYfXYLogfXLQ

dYXYfXYLogfXLQ

tt

ttttt

ttttt

Page 22: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

定义规则条件如下：

0

( 1)

0

0 0

( )

1). ;

2). { | ( | ) ( | )}

{ | ( | ) };

3). ( | )

( | )4). | 0.t

d

t

R

L X L X

L X

L X

Q

为紧集，

在上关于连续，在内部关于可导；

问题： ( ) ˆt

MLE 是否成立？

Page 23: n! 2 1 xxxx 1 2 3 4math.nju.edu.cn/uploads/ckeditor/attachments/106/_数学...p < -0.6 #intialize of parameter mu1 < -5 mu2 < -7 sigma < -1 p mu1 mu2 sigma [1,] 0.7066418 2.919782

( )

) ( )

( | )

{ } ( | )

t

t t

Q

L X

（

在上述规则条件下，若关于，

为连续函数，则的任意收敛点均为

的驻点。证明：参见Wu(1983).

2 21 22 21 2

0

( ) ( )

2 2

1 2

0 0 2 2

1). ( | )

1 1

2 2

{ | ( | ) ( | )}

X X

L X

L X L X

注：的驻点可能为局部极大点或鞍点。关于收敛到局部

极大点的例子参见后面的例题；若收敛到鞍点，只要将初值进行

变动，就可以避开鞍点。

2).规则条件中2）可能不一定满足，例如带参数的混合分布：

L( |X)=log[p e +(1-p) e ],

对任意，对，无界。

（令 =( 1 2 1 2 1 1 2 2, , , ,p)中 =X, 趋于0，则，可以任意大）

定理：

Top Related

MU-2 Comments...1986, and since as a self employed aircraft broker with a specialty in the MU2. I have heard over about 40 ... characteristics of the MU2 is the difference in control

MU-2 Comments...1986, and since as a self employed aircraft broker with a specialty in the MU2. I have heard over about 40 ... characteristics of the MU2 is the difference in control

L&T Authorised Stockist MAAMIDI ENTERPRISES . Agri Price List 1st Jan 2016.pdf · MB1 MB2 ML1.5 ML2 MK1 MU1 MU2 MB1 MB2 ML1.5 ML2 MK1 MK1 Starter Type Composition for FASD Starters

L&T Authorised Stockist MAAMIDI ENTERPRISES . Agri Price List 1st Jan 2016.pdf · MB1 MB2 ML1.5 ML2 MK1 MU1 MU2 MB1 MB2 ML1.5 ML2 MK1 MK1 Starter Type Composition for FASD Starters

501-700-MU1 CABINA CAS REV105 - sibelmed.com · 501-700-MU1 1 MANUAL DE USO 501-750-001 1 INTERCONEXIONES AUDIOMETRO CABINA (6) ... recomendaciones de este Manual de Uso. La serie

501-700-MU1 CABINA CAS REV105 - sibelmed.com · 501-700-MU1 1 MANUAL DE USO 501-750-001 1 INTERCONEXIONES AUDIOMETRO CABINA (6) ... recomendaciones de este Manual de Uso. La serie

SFAR for MU2.pdf

SFAR for MU2.pdf

511-B00-MU2 GENERAL Rev 1.02 - gimaitaly.com1 DATOSPIR TOUCH User’ s Manual 4 511-B00-MU2 • REV. 1.02 The DATOSPIR TOUCH Spirometer has been designed by the R+D+I Department of

511-B00-MU2 GENERAL Rev 1.02 - gimaitaly.com1 DATOSPIR TOUCH User’ s Manual 4 511-B00-MU2 • REV. 1.02 The DATOSPIR TOUCH Spirometer has been designed by the R+D+I Department of

agri range pricelist 301215 - AAYUSHI ENTERPRISES · MB1 MB2 ML1.5 ML2 MK1 MK1 Starter Type Composition for FASD (Fully Automatic Star-Delta) Starters 6-10 9-14 MK1 MU2 MK1 MU2 MK1

agri range pricelist 301215 - AAYUSHI ENTERPRISES · MB1 MB2 ML1.5 ML2 MK1 MK1 Starter Type Composition for FASD (Fully Automatic Star-Delta) Starters 6-10 9-14 MK1 MU2 MK1 MU2 MK1

Sigma Sigma Sigma Golden Glimpses

Sigma Sigma Sigma Golden Glimpses

Agri Price List 070717 ( Old New) - Larsen & Toubro · 2017-07-17 · MU1 MB1 ML1.5 MU2 MB2 ML2 6 - 10 12.5 15 20 25 30 35 40 50 75 MU-G15 MU-G20 MU-G20H MU-G30 MU-G50 MU-G75 MU-G40

Agri Price List 070717 ( Old New) - Larsen & Toubro · 2017-07-17 · MU1 MB1 ML1.5 MU2 MB2 ML2 6 - 10 12.5 15 20 25 30 35 40 50 75 MU-G15 MU-G20 MU-G20H MU-G30 MU-G50 MU-G75 MU-G40