d p h v p i j 16 / 2 2021) 15, h c lecture-22(mar c b a p b a...

3
BITS F464: Machine Learning 22 Graphical Model: Bayesian Belief Networks Dr. Kamlesh Tiwari Assistant Professor, Department of CSIS, BITS Pilani, Pilani Campus, Rajasthan-333031 INDIA March 15, 2021 ONLINE (Campus @ BITS-Pilani Jan-May 2021) http://ktiwari.in/ml Bayesian Learning Addresses most probable classication of new instance instead of best hypothesis for data. Searching a possibility to do better then MAP Bayes optimal classication: argmax v j V h i H P(v j |h i )P(h i |D) Outperforms on an average but, quite costly to apply GIBBS Algorithm: Choose a hypothesis h H at random, according to the posterior probability distribution over H. (Expected misclassication error is bounded to the twice of the Bayes optimal classier) Naive Bayes Classier: assumes independence given the target value argmax v j V P(a 1 , a 2 , ..., a n |v j )P(v j ) argmax v j V P(v j )Π i P(a i |v j ) Highly practical method Machine Learning (BITS F464) M W F (10-11AM) online@BITS-Pilani Lecture-22(March 15, 2021) 2 / 16 Probability P(x , y )= P(x ) × P(y |x ) Independence of x and y implies P(y |x )= P(y ) Then P(x , y )= P(x ) × P(y ) Bayes Rule P(x |y )= P(x , y ) P(y ) = P(y |x ) × P(x ) P(y ) Marginal: distribution of a single variable x can be obtained from a given joint distribution p(x , y ) by p(x )= y p(x , y ) The process of computing a marginal from a joint distribution is called marginalisation. p(x 1 , ..., x i 1 , x i +1 , ..., x n )= x i p(x 1 , x 2 , ..., x n ) Machine Learning (BITS F464) M W F (10-11AM) online@BITS-Pilani Lecture-22(March 15, 2021) 3 / 16 Marginalisation Conditional Independence when two variable are independent of each other, provided that we know state of some other variable P(x , y |z )= P(x |z ) × P(y |z ) Consider: C as soft XOR A B p(C=1|A, B) 0 0 0.10 0 1 0.99 1 0 0.80 1 1 0.25 If p(A=1)= 0.65, p(B=1)= 0.77 Determine p(A=1|C=0) p(A=1, C=0)= B p(A = 1, B, C = 0) = B p(C = 0|A = 1, B)p(A = 1)p(B) = p(C = 0|A = 1, B = 0)p(A = 1)p(B = 0) +p(C = 0|A = 1, B = 1)p(A = 1)p(B = 1) = 0.2 × 0.65 × 0.23 + 0.75 × 0.65 × 0.77 = 0.405 Similarly p(A=0, C=0)= 0.075 p(A=1|C=0)= p(A=1,C=0) p(C=0) = p(A=1,C=0) p(A=1,C=0)+p(A=0,C=0) = 0.843 Machine Learning (BITS F464) M W F (10-11AM) online@BITS-Pilani Lecture-22(March 15, 2021) 4 / 16 Let’s see this Statement: J p(J |R) × f (R) = f (R) Proof: J p(J |R) × f (R) = J p(J , R) p(R) × f (R) = J p(J , R) × f (R) p(R) = f (R) × J p(J , R) p(R) = f (R) × p(R) p(R) = f (R) Machine Learning (BITS F464) M W F (10-11AM) online@BITS-Pilani Lecture-22(March 15, 2021) 5 / 16 Belief network (a graphical model) Belief network uses graphs to represent independence among the variables in probabilistic model Independently specifying all the attributed is overkill With distribution of n attributes, marginal for one takes O(2 n1 ) By constraining variable interaction (specifying independence) one can get the form like p(x 1 , x 2 , ..., x 100 )= Π 99 i =1 φ(x i , x i +1 ) Belief networks are a convenient framework for representing such independence assumptions Belief networks are also called as Bayes’ Networks or Bayesian Belief Networks Machine Learning (BITS F464) M W F (10-11AM) online@BITS-Pilani Lecture-22(March 15, 2021) 6 / 16

Upload: others

Post on 11-Apr-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: D P h v P i j 16 / 2 2021) 15, h c Lecture-22(Mar C B A p B A ...ktiwari.in/ml/ml2021-handout-22.pdfvidence e tain Uncer vidence e tain uncer or Soft let , dom ( x = ) { red, lue b,

BIT

SF4

64:M

achi

neLe

arni

ng

22Gra

phical

Model

:Baye

sianBelie

fNetwork

sD

r.K

amle

shTi

war

iA

ssis

tant

Pro

fess

or,D

epar

tmen

tofC

SIS

,B

ITS

Pila

ni,P

ilani

Cam

pus,

Raj

asth

an-3

3303

1IN

DIA

Mar

ch15

,202

1O

NLI

NE

(Cam

pus

@B

ITS

-Pila

niJa

n-M

ay20

21)

http://ktiwari.in/ml

Bay

esia

nLe

arni

ng

Add

ress

esm

ostp

roba

ble

clas

sific

atio

nof

new

inst

ance

inst

ead

ofbe

sthy

poth

esis

ford

ata.

Sea

rchi

nga

poss

ibili

tyto

dobe

ttert

hen

MA

PB

ayes

optim

alcl

assi

ficat

ion:

argmax

v j∈V

�h i∈H

P(v

j|hi)

P(h

i|D)

Out

perfo

rms

onan

aver

age

but,

quite

cost

lyto

appl

yG

IBB

SA

lgor

ithm

:C

hoos

ea

hypo

thes

ish∈

Hat

rand

om,

acco

rdin

gto

the

post

erio

rpro

babi

lity

dist

ribut

ion

over

H.

(Exp

ecte

dm

iscl

assi

ficat

ion

erro

ris

boun

ded

toth

etw

ice

ofth

eB

ayes

optim

alcl

assi

fier)

Nai

veB

ayes

Cla

ssifi

er:

assu

mes

inde

pend

ence

give

nth

eta

rget

valu

eargmax

v j∈V

P(a

1,a 2

,...,a

n|v

j)P(v

j)

argmax

v j∈V

P(v

j)Π

iP(a

i|vj)

Hig

hly

prac

tical

met

hod

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

2/1

6

Pro

babi

lity

P(x,y

)=

P(x)×

P(y|x)

Inde

pend

ence

ofx

and

yim

plie

sP(y|x)=

P(y)

Then

P(x,y

)=

P(x)×

P(y)

Bay

esR

ule

P(x|y)=

P(x,y

)

P(y)

=P(y|x)×

P(x)

P(y)

Mar

gina

l:di

strib

utio

nof

asi

ngle

varia

ble

xca

nbe

obta

ined

from

agi

ven

join

tdis

trib

utio

np(

x,y)

by

p(x)

=� y

p(x,

y)

The

proc

ess

ofco

mpu

ting

am

argi

nalf

rom

ajo

intd

istr

ibut

ion

isca

lled

mar

gina

lisat

ion.

p(x 1,...,x

i−1,

x i+

1,...,

x n)=

� x i

p(x 1,x

2,...,

x n)

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

3/1

6

Mar

gina

lisat

ion

Con

ditio

nalI

ndep

ende

nce

whe

ntw

ova

riabl

ear

ein

depe

nden

tof

each

othe

r,pr

ovid

edth

atw

ekn

owst

ate

ofso

me

othe

rvar

iabl

e

P(x,y

|z)=

P(x|z)×

P(y|z)

Con

side

r:C

asso

ftX

OR

AB

p(C

=1|A,B

)

00

0.10

01

0.99

10

0.80

11

0.25

Ifp(

A=1

)=

0.65

,p(B

=1)=

0.77

Det

erm

ine

p(A

=1|C

=0)

p(A

=1,

C=0

)=

�B

p(A

=1,

B,

C=

0)

=� B

p(C

=0|

A=

1,B)p(A

=1)

p(B)

=p(

C=

0|A

=1,

B=

0)p(

A=

1)p(

B=

0)

+p(

C=

0|A

=1,

B=

1)p(

A=

1)p(

B=

1)

=0.

0.65

×0.

23+

0.75

×0.

65×

0.77

=0.

405

Sim

ilarly

p(A

=0,

C=0

)=

0.07

5

p(A

=1|C

=0)=

p(A=

1,C=

0)p(

C=

0)=

p(A=

1,C=

0)p(

A=

1,C=

0)+

p(A=

0,C=

0)

=0.

843

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

4/1

6

Let’s

see

this

Sta

tem

ent:

� J

� p(J|

R)×

f(R)� =

f(R)

Pro

of:

� J

� p(J|

R)×

f(R)�

=� J

� p(J,R

)

p(R)

×f(

R)�

=

�J� p(

J,R)×

f(R)�

p(R)

=f(

R)×

�J

p(J,

R)

p(R)

=f(

R)×

p(R)

p(R)

=f(

R)

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

5/1

6

Bel

iefn

etw

ork

(agr

aphi

calm

odel

)

Bel

iefn

etw

ork

uses

grap

hsto

repr

esen

tind

epen

denc

eam

ong

the

varia

bles

inpr

obab

ilist

icm

odel

Inde

pend

ently

spec

ifyin

gal

lthe

attr

ibut

edis

over

kill

With

dist

ribut

ion

ofn

attr

ibut

es,m

argi

nalf

oron

eta

kes

O(2

n−1 )

By

cons

train

ing

varia

ble

inte

ract

ion

(spe

cify

ing

inde

pend

ence

)on

eca

nge

tthe

form

like

p(x 1,x

2,...,

x 100)=

Π99 i=

1φ(x

i,x i+

1)

Bel

iefn

etw

orks

are

aco

nven

ient

fram

ewor

kfo

rrep

rese

ntin

gsu

chin

depe

nden

ceas

sum

ptio

nsB

elie

fnet

wor

ksar

eal

soca

lled

asB

ayes

’Net

wor

ksor

Bay

esia

nB

elie

fNet

wor

ks

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

6/1

6

Page 2: D P h v P i j 16 / 2 2021) 15, h c Lecture-22(Mar C B A p B A ...ktiwari.in/ml/ml2021-handout-22.pdfvidence e tain Uncer vidence e tain uncer or Soft let , dom ( x = ) { red, lue b,

Mod

elin

gIn

depe

nden

cies

One

mor

ning

Trac

eyle

aves

herh

ouse

and

real

ities

that

herg

rass

isw

et.

Isit

due

toov

erni

ghtr

ain

ordi

dsh

efo

rget

totu

rnof

fthe

sprin

kler

last

nigh

t?N

exts

heno

tices

that

the

gras

sof

hern

eigh

bor,

Jack

,is

also

wet

.

(R=1

)→ra

inla

stni

ght,

(S=1

)→sp

rinkl

eron

last

nigh

t,(J

=1)→

Jack

’sgr

ass

isw

et,

(T=1

)→Tr

acey

a’s

Gra

ssis

wet

Mod

elof

Trac

eya’

sw

orld

invo

lves

prob

abili

tydi

strib

utio

non

T,J

,R,S

that

has

24=

16st

ates

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

7/1

6

Mod

elin

gIn

depe

nden

cies

(con

td..)

How

ever

we

know

p(T,J

,R,S

)=

p(T|J,R

,S)p(J,R

,S)

=p(

T|J,R

,S)p(J|R

,S)p(R

,S)

=p(

T|J,R

,S)p(J|R

,S)p(R

|S)p(S

)

Com

puta

tion

ofp(

T|J,R

,S)

requ

ires

usto

spec

ify23

=8

valu

esW

ithp(

T=

1|J,

R,S

),on

eca

nus

eno

rmal

izat

ion

toco

mpu

tep(

T=

0|J,

R,S

)as

1−

p(T

=1|

J,R,S

)

Com

puta

tion

ofot

herf

acto

rsw

ould

also

need

4+2+

1va

lues

Tota

lwe

need

8+4+

2+1=

15va

lues

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

8/1

6

Con

ditio

nalI

ndep

ende

nce

We

may

assu

me

that

Trac

eya’

sgr

ass

isw

etde

pend

son

lydi

rect

lyon

whe

ther

orno

tith

asbe

enra

inin

gan

dw

heth

eror

noth

ersp

rinkl

erw

ason

sop(

T|J,R

,S)=

p(T|R

,S)

Ass

ume

that

Jack

’sgr

ass

isw

etis

influ

ence

don

lydi

rect

lyby

whe

ther

orno

tith

asbe

enra

inin

gp(

J|R,S

)=

p(J|

R)

Furt

herm

ore,

we

assu

me

the

rain

isno

tdire

ctly

influ

ence

dby

the

sprin

kler

p(R|S)=

p(R)

Ther

efor

e,ou

rmod

elbe

com

es

p(T,J

,R,S

)=

p(T|J,R

,S)p(J|R

,S)p(R

|S)p(S

)

=p(

T|R

,S)p(J|R

)p(R

)p(S

)

Num

bero

fval

ues

we

need

tosp

ecify

is4+

2+1+

1=8

We

can

repr

esen

tthe

seco

nditi

onal

inde

pend

ence

as

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

9/1

6

Bel

iefn

etw

ork

Ans

wer

s“h

owto

repr

esen

tthe

seco

nditi

onal

inde

pend

ence

?”B

elie

fnet

wor

kis

adi

strib

utio

nof

the

form

p(x 1,x

2,...,

x n)=

Πn i=

1p(x

i|pa(

x i))

whe

repa

(xi)

repr

esen

tthe

pare

ntal

varia

bles

ofva

riabl

ex i

Rep

rese

nted

asa

dire

cted

grap

h,w

ithan

arro

wpo

intin

gfro

ma

pare

ntva

riabl

eto

child

varia

ble,

abe

liefn

etw

ork

corr

espo

nds

toa

Dire

cted

Acy

clic

Gra

ph(D

AG

)

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

10/1

6

Exa

mpl

eO

nem

orni

ngTr

acey

real

ises

that

herg

rass

isw

etan

dth

egr

ass

ofhe

rne

ighb

our,

Jack

,is

also

wet

.Le

tthe

prio

rpro

babi

litie

sbe

p(R

=1)=

0.2

and

p(S

=1)=

0.1.

We

setp

(J=1

|R=1

)=

1,p(

J=1|

R=0

)=

0.2,

p(T

=1|R

=1,S

=0)=

1,p(

T=1

|R=1

,S=1

)=

1,p(

T=1

|R=0

,S=1

)=

0.9,

p(T

=1|R

=0,S

=0)=

0

Usi

ngfo

llow

ing

Bel

eifN

etw

ork;

calc

ulat

e

1P

roba

bilit

yth

atsp

rinkl

erw

asO

Nov

erni

ght,

give

nth

atTr

acey

a’s

gras

sis

wet

.p(

S=1

|T=1

)=

0.33

82H

ow?

onne

xt

slid

e

2P

roba

bilit

yth

atsp

rinkl

erw

asO

Nov

erni

ght,

give

nth

atTr

acey

a’s

gras

sis

wet

and

Jack

’sgr

ass

isal

sow

et.

p(S

=1|T

=1,J

=1)

=0.

1604

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

11/1

6

Exa

mpl

e:P

roba

bilit

yof

p(S

=1|T

=1)

p(S=

1|T

=1)

=p(

S=

1,T

=1)

p(T

=1)

=

�J,

Rp(

S=

1,J,

R,T

=1)

�J,

R,S

p(T

=1,

J,R,S

)(1

)

=

�J,

Rp(

J|R)p(T

=1|

R,S

=1)

p(R)p(S

=1)

�J,

R,S

p(J|

R)p(T

=1|

R,S

)p(R

)p(S

)

=

�R

p(T

=1|

R,S

=1)

p(R)p(S

=1)

�R,S

p(T

=1|

R,S

)p(R

)p(S

)(2

)

=0.

0.8×

0.1+

0.2×

0.1

.9×.8

×.1

+1×.2

×.1

+0×.8

×.9

+1×.2

×.9

=0.

3382

Use

sgi

ven

belie

fnet

wor

kin

(1)a

ndpr

oofi

n(2

)

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

12/1

6

Page 3: D P h v P i j 16 / 2 2021) 15, h c Lecture-22(Mar C B A p B A ...ktiwari.in/ml/ml2021-handout-22.pdfvidence e tain Uncer vidence e tain uncer or Soft let , dom ( x = ) { red, lue b,

Unc

erta

inev

iden

ceS

ofto

run

cert

ain

evid

ence

,let

dom(x)=

{red

,blu

e,gr

een}

and

the

vect

ory=

(0.6,0

.1,0

.3)

repr

esen

tsth

ebe

liefi

nth

ere

spec

tive

stat

es.

Har

dev

iden

cear

elik

e(0

,0,1

).A

ssum

ptio

nis

that

p(x|

y,y)

=p(

x|y)

p(x|

y)=

�y

p(x,

y|y)

=�

yp(

x|y,

y)p(

y|y)

=�

yp(

x|y)

p(y|

y)w

here

p(y=

i|y)

repr

esen

tsth

epr

obab

ility

that

yis

inst

ate

iD

ashe

dci

rcle

isus

edto

repr

esen

tava

riabl

ein

soft-

evid

ence

stat

e

Exa

mpl

e:Le

tpro

babi

lity

offir

e,w

hen

ther

eis

afir

eal

arm

is0.

9.M

onu

said

heis

70%

confi

dent

that

heha

dhe

ard

afir

eal

arm

.W

hati

spr

obab

ility

offir

e.

p(F

=1|

A)=

�A

p(F

=1|

A)p(A

|A)=

p(F

=1|

A=

0)p(

A=

0|A)+

p(F

=1|

A=

1)p(

A=

1|A)=

0.1×

0.3+

0.9×

0.7=

0.66

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

13/1

6

Unr

elia

ble

Evi

denc

e

Unr

elia

ble:

(con

tinue

dfro

mpr

evio

usex

ampl

e)Ia

sked

Raj

uab

outt

heal

arm

.It

isbe

lieve

dth

atif

alar

mha

dso

und,

ther

eis

80%

chan

ceth

atR

aju

wou

ldte

llit

soun

d.If

alar

mha

dN

OT

soun

ded,

ther

eis

70%

chan

ceth

athe

wou

ldte

llN

OT

soun

d.U

nrel

iabl

eev

iden

ces

are

mod

eled

usin

gda

shed

lines

asbe

low

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

14/1

6

Exa

mpl

e

LetB

=1,m

eans

Hol

mes

hous

eha

sbe

enbu

rgle

d.A

=1,m

eans

alar

mw

ento

ff.W

=1,m

eans

Wat

son

hear

dal

arm

.G

=1,m

eans

Gib

bon

hear

dal

arm

.

(a)B

Nfo

rthe

envi

ronm

ent,

(b)I

fGib

bon

isa

little

defa

ndis

only

80%

sure

abou

tthe

alar

mso

und

bein

ghe

ard,

(c)R

epla

cem

ento

fevi

denc

e,(d

)Hol

mes

feel

sW

atso

n’s

obse

rvat

ion

isun

relia

ble

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

15/1

6

Than

kYo

u!

Than

kyo

uve

rym

uch

for

your

atte

ntio

n!

Que

ries

?

(Ref

eren

ce1)

1[1

]Tex

tBoo

k:ch

:1/2

/3B

ayes

ian

Rea

soni

ngan

dM

achi

neLe

arni

ng,b

yD

avid

Bar

ber

Mac

hine

Lear

ning

(BIT

SF4

64)

MW

F(1

0-11

AM

)onl

ine@

BIT

S-P

ilani

Lect

ure-

22(M

arch

15,2

021)

16/1

6