Updating with incomplete observations (UAI-2003)

Page 1: Updating with incomplete observations (UAI-2003)

Updating with incomplete observations (UAI-2003)

Marco Zaffalon
“Dalle Molle” Institute for Artificial Intelligence (IDSIA), SWITZERLAND
http://www.idsia.ch/~zaffalon
[email protected]

Gert de Cooman
SYSTeMS research group, BELGIUM
http://ippserv.ugent.be/~gert
[email protected]

Page 2: Updating with incomplete observations (UAI-2003)

What are incomplete observations?

A simple example
  C (class) and A (attribute) are Boolean random variables
  C = 1 is the presence of a disease
  A = 1 is the positive result of a medical test

Let us do diagnosis
  Good point: you know that
    p(C = 0, A = 0) = 0.99
    p(C = 1, A = 1) = 0.01
    Whence p(C = 0 | A = a) allows you to make a sure diagnosis
  Bad point: the test result can be missing
    This is an incomplete, or set-valued, observation {0,1} for A

What is p(C = 0 | A is missing)?

Page 3: Updating with incomplete observations (UAI-2003)

Example ctd

Kolmogorov’s definition of conditional probability seems to say
  p(C = 0 | A ∈ {0,1}) = p(C = 0) = 0.99
  i.e., with high probability the patient is healthy

Is this right?
In general, it is not
Why?

Page 4: Updating with incomplete observations (UAI-2003)

Why?

Because A can be selectively reported
  e.g., the medical test machine is broken; it produces an output ⇔ the test is negative (A = 0)
  In this case p(C = 0 | A is missing) = p(C = 0 | A = 1) = 0
  The patient is definitely ill!
  Compare this with the former naive application of Kolmogorov’s updating (or naive updating, for short)
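
The value 0 can be checked by treating the missing report as an observation O = * and conditioning on it. This is a worked sketch under one assumption that the slide states only in words: the broken machine reports every negative result and never a positive one, so p(O = * | A = 1) = 1 and p(O = * | A = 0) = 0. The symbols O and * are notation introduced here for the check.

```latex
\begin{align*}
p(C = 0 \mid O = *)
  &= \frac{\sum_{a} p(C = 0, A = a)\, p(O = * \mid A = a)}
          {\sum_{c,a} p(C = c, A = a)\, p(O = * \mid A = a)} \\
  &= \frac{p(C = 0, A = 0)\cdot 0 + p(C = 0, A = 1)\cdot 1}
          {p(A = 0)\cdot 0 + p(A = 1)\cdot 1}
   = \frac{0}{0.01} = 0 .
\end{align*}
```

Here p(C = 0, A = 1) = 0 because the two stated joint probabilities already sum to 1. Naive updating on the same data gives 0.99, the opposite conclusion.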

Page 5: Updating with incomplete observations (UAI-2003)

Modeling it the right way

Observations-generating model

  p(C,A) → (c,a) → IM → o

  p(C,A): distribution generating pairs for (C,A)
  (c,a): complete pair (not observed)
  IM: Incompleteness Mechanism
  o: actual observation about A

o is a generic value for O, another random variable
  o can be 0, 1, or * (i.e., missing value for A)

IM = p(O | C,A) should not be neglected!

The correct overall model we need is p(C,A)p(O | C,A)
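
The overall model p(C,A)p(O | C,A) can be exercised directly. The following is a minimal sketch, not code from the talk: the function name `posterior_c` and both incompleteness mechanisms are hypothetical, chosen only to show that the posterior after a missing value depends on the IM. The joint table is the one from the diagnosis example, with the two unstated entries set to 0 because the stated ones sum to 1.

```python
from itertools import product

# Joint p(C, A) from the diagnosis example (zero entries are implied).
JOINT = {(0, 0): 0.99, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.01}


def posterior_c(joint, im, o):
    """Return {c: p(C = c | O = o)} under the model p(C,A) p(O | C,A).

    `im` maps each pair (c, a) to a dict {observation: probability}.
    """
    weights = {}
    for (c, a), p_ca in joint.items():
        weights[c] = weights.get(c, 0.0) + p_ca * im[(c, a)].get(o, 0.0)
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError(f"observation {o!r} has probability zero")
    return {c: w / total for c, w in weights.items()}


# Hypothetical IM 1: the machine reports A = 0 faithfully and drops A = 1.
broken_machine = {
    (c, a): ({0: 1.0} if a == 0 else {"*": 1.0})
    for c, a in product((0, 1), repeat=2)
}

# Hypothetical IM 2: A goes missing with probability 0.3 whatever (c, a) is.
random_loss = {
    (c, a): {a: 0.7, "*": 0.3} for c, a in product((0, 1), repeat=2)
}

print(posterior_c(JOINT, broken_machine, "*"))  # all mass on C = 1
print(posterior_c(JOINT, random_loss, "*"))     # C = 0 gets about 0.99
```

Only the second mechanism reproduces the naive answer p(C = 0) = 0.99; the first one reverses the diagnosis.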

Page 6: Updating with incomplete observations (UAI-2003)

What about Bayesian nets (BNs)?

Asia net
  [Figure: Asia network with nodes (V)isit to Asia, (S)moking = y, (T)uberculosis = n, Lung (C)ancer?, Bronc(H)itis, Abnorma(L) X-rays = y, (D)yspnea]

Let us predict C on the basis of the observation (L,S,T) = (y,y,n)

BN updating instructs us to use p(C | L = y,S = y,T = n) to predict C

Page 7: Updating with incomplete observations (UAI-2003)

Asia ctd

Should we really use p(C | L = y,S = y,T = n) to predict C?
  (V,H,D) is missing
  (L,S,T,V,H,D) = (y,y,n,*,*,*) is an incomplete observation

p(C | L = y,S = y,T = n) is just the naive updating
By using the naive updating, we are neglecting the IM!

Wrong inference in general

Page 8: Updating with incomplete observations (UAI-2003)

New problem?

Problems with naive updating have been clear since 1985 at least (Shafer)

Practical consequences were not so clear
  How often does naive updating cause problems?
  Perhaps it is not a problem in practice?

Page 9: Updating with incomplete observations (UAI-2003)

Grünwald & Halpern (UAI-2002) on naive updating

Three points made strongly
1) naive updating works ⇔ CAR holds
   i.e., neglecting the IM is correct ⇔ CAR holds
   With missing data:
     CAR (coarsening at random) = MAR (missing at random) =
     p(A is missing | c,a) is the same for all pairs (c,a)
2) CAR holds rather infrequently
3) The IM, p(O | C,A), can be difficult to model

2 & 3 = serious theoretical & practical problem

How should we do updating given 2 & 3?
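
Point 1 can be checked numerically on a small table. This is an illustrative sketch only: the joint distribution and both mechanisms below are made-up numbers, and the helper names (`is_mar`, `posterior_given_missing`, `naive_update`) are hypothetical. It tests the MAR condition as stated above and compares the IM-aware posterior with naive updating.

```python
# Hypothetical joint p(C, A); not taken from the slides.
JOINT = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.2}


def is_mar(im, missing="*", tol=1e-12):
    """True if p(O = missing | c, a) is the same for every pair (c, a)."""
    probs = [dist.get(missing, 0.0) for dist in im.values()]
    return max(probs) - min(probs) <= tol


def posterior_given_missing(joint, im, missing="*"):
    """p(C = c | O = missing) under the model p(C,A) p(O | C,A)."""
    weights = {}
    for (c, a), p_ca in joint.items():
        weights[c] = weights.get(c, 0.0) + p_ca * im[(c, a)].get(missing, 0.0)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def naive_update(joint):
    """Naive updating on a missing A: simply the marginal p(C)."""
    marginal = {}
    for (c, _a), p_ca in joint.items():
        marginal[c] = marginal.get(c, 0.0) + p_ca
    return marginal


# MAR mechanism: A is dropped with probability 0.4 for every (c, a).
mar_im = {(c, a): {a: 0.6, "*": 0.4} for (c, a) in JOINT}

# Non-MAR mechanism: positive results are dropped far more often.
selective_im = {
    (c, a): ({a: 0.9, "*": 0.1} if a == 0 else {a: 0.1, "*": 0.9})
    for (c, a) in JOINT
}

naive = naive_update(JOINT)
print(is_mar(mar_im), posterior_given_missing(JOINT, mar_im)[0], naive[0])
print(is_mar(selective_im), posterior_given_missing(JOINT, selective_im)[0], naive[0])
```

Under the MAR mechanism the two numbers agree; under the selective one they do not, which is the failure mode the slide warns about.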

Page 10: Updating with incomplete observations (UAI-2003)

What this paper is about

Have a conservative (i.e., robust) point of view
  Deliberately worst case, as opposed to the MAR best case

Assume little knowledge about the IM
  You are not allowed to assume MAR
  You are not able/willing to model the IM explicitly

Derive an updating rule for this important case
  Conservative updating rule

Page 11: Updating with incomplete observations (UAI-2003)

1st step: plug ignorance into your model

  p(C,A) → (c,a) → IM → o

  p(C,A): known prior distribution
  (c,a): complete pair (not observed)
  IM: unknown Incompleteness Mechanism
  o: actual observation about A

Fact: the IM is unknown
  p(O ∈ {0,1,*} | C,A) = 1
  a constraint on p(O | C,A), i.e. any distribution p(O | C,A) is possible
  This is too conservative; to draw useful conclusions we need a little less ignorance

Consider the set of all p(O | C,A) s.t. p(O | C,A) = p(O | A)
  i.e., all the IMs which do not depend on what you want to predict

Use this set of IMs jointly with prior information p(C,A)
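
To see what "a set of IMs" means in practice, one can sweep over mechanisms that depend on A only and watch the posterior move. This is a hedged sketch: the joint table is hypothetical, and the IM is reduced to two made-up parameters, `m0` = p(O = * | A = 0) and `m1` = p(O = * | A = 1), scanned on a coarse grid.

```python
# Hypothetical joint p(C, A); not taken from the slides.
JOINT = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.2}


def posterior_c0_given_missing(joint, m0, m1):
    """p(C = 0 | O = *) when the IM depends on A only.

    m0 = p(O = * | A = 0), m1 = p(O = * | A = 1).
    """
    miss = {0: m0, 1: m1}
    num = sum(p * miss[a] for (c, a), p in joint.items() if c == 0)
    den = sum(p * miss[a] for (c, a), p in joint.items())
    return num / den


grid = [i / 10 for i in range(11)]
values = [
    posterior_c0_given_missing(JOINT, m0, m1)
    for m0 in grid
    for m1 in grid
    if m0 + m1 > 0  # skip the IM under which "*" can never be observed
]
lower, upper = min(values), max(values)

print(lower, upper)
print(JOINT[(0, 1)] / 0.4, JOINT[(0, 0)] / 0.6)  # p(C=0 | A=1), p(C=0 | A=0)
```

On this table the swept posteriors fill the range between p(C = 0 | A = 1) and p(C = 0 | A = 0), so a single number cannot summarize what is known once the IM is left open.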

Page 12: Updating with incomplete observations (UAI-2003)

2nd step: derive the conservative updating

Let E = evidence = observed variables, in state e
Let R = remaining unobserved variables (except C)

Formal derivation yields:
1) All the values for R should be considered
2) In particular, updating becomes:

Conservative Updating Rule (CUR)

  min_{r ∈ R} p(c | E = e, R = r) ≤ p(c | o) ≤ max_{r ∈ R} p(c | E = e, R = r)
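
The rule translates almost literally into code: enumerate the completions r and take the extremes of the ordinary conditional probability. A minimal sketch, assuming the caller supplies a function `cond_prob(c, r)` that returns p(c | E = e, R = r); that function, the name `cur_bounds`, and the toy joint table are all hypothetical.

```python
from itertools import product


def cur_bounds(cond_prob, c, r_domains):
    """Lower and upper probability of class c under the CUR.

    cond_prob(c, r) must return p(c | E = e, R = r) for a tuple r;
    r_domains lists the possible values of each unobserved variable.
    """
    values = [cond_prob(c, r) for r in product(*r_domains)]
    return min(values), max(values)


# Toy case: no evidence E, a single unobserved Boolean attribute A.
JOINT = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.2}


def cond_prob(c, r):
    (a,) = r
    return JOINT[(c, a)] / (JOINT[(0, a)] + JOINT[(1, a)])


lower, upper = cur_bounds(cond_prob, 0, [(0, 1)])
naive = JOINT[(0, 0)] + JOINT[(0, 1)]  # naive updating: p(C = 0)
print(lower, naive, upper)
```

The naive value always lies inside the interval, since the marginal is a weighted average of the conditionals; the CUR simply refuses to commit to the weights.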

Page 13: Updating with incomplete observations (UAI-2003)

CUR & Bayesian nets

Evidence: (L,S,T) = (y,y,n)

What is your posterior confidence on C = y?

Consider all the joint values of nodes in R
Take min & max of p(C = y | L = y,S = y,T = n,v,h,d)

Posterior confidence ∈ [0.42,0.71]

Computational note: only Markov blanket matters!

[Figure: Asia network with nodes (V)isit to Asia, (S)moking = y, (T)uberculosis = n, Lung (C)ancer?, Bronc(H)itis, Abnorma(L) X-rays = y, (D)yspnea]
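
The enumeration on this slide can be sketched as follows. Everything numeric here is an assumption: the slides do not give the network's parameters or exact arcs, so the sketch uses the structure and tables usually quoted for the Asia network (L and D depending on "T or C"). The resulting interval therefore need not match the [0.42, 0.71] reported above; the point is the mechanics and the Markov-blanket remark.

```python
from itertools import product

# ASSUMED parameters (usual textbook Asia values), not taken from the slides.
P_V = 0.01                                 # p(V = y)
P_S = 0.5                                  # p(S = y)
P_T = {True: 0.05, False: 0.01}            # p(T = y | V)
P_C = {True: 0.10, False: 0.01}            # p(C = y | S)
P_H = {True: 0.60, False: 0.30}            # p(H = y | S)
P_L = {True: 0.98, False: 0.05}            # p(L = y | T or C)
P_D = {(True, True): 0.9, (True, False): 0.7,
       (False, True): 0.8, (False, False): 0.1}  # p(D = y | T or C, H)


def bern(p_yes, value):
    """Probability of a Boolean value given p(value = y)."""
    return p_yes if value else 1.0 - p_yes


def joint(v, s, t, c, h, l, d):
    """Full joint probability under the assumed factorization."""
    either = t or c
    return (bern(P_V, v) * bern(P_S, s) * bern(P_T[v], t)
            * bern(P_C[s], c) * bern(P_H[s], h)
            * bern(P_L[either], l) * bern(P_D[(either, h)], d))


def p_cancer(l, s, t, v, h, d):
    """p(C = y | L = l, S = s, T = t, V = v, H = h, D = d)."""
    yes = joint(v, s, t, True, h, l, d)
    no = joint(v, s, t, False, h, l, d)
    return yes / (yes + no)


def cur_interval(l, s, t):
    """Min and max of p(C = y | evidence, v, h, d) over all (v, h, d)."""
    values = [p_cancer(l, s, t, v, h, d)
              for v, h, d in product((True, False), repeat=3)]
    return min(values), max(values)


lower, upper = cur_interval(l=True, s=True, t=False)
print(lower, upper)
```

With this structure V lies outside the Markov blanket of C, so `p_cancer` returns the same value for `v=True` and `v=False`; only the four (h, d) combinations actually need to be enumerated.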

Page 14: Updating with incomplete observations (UAI-2003)

A few remarks

The CUR…
  is based only on p(C,A), like the naive updating
  produces lower & upper probabilities
  can produce indecision

Page 15: Updating with incomplete observations (UAI-2003)

CUR & decision-making

Decisions
  c’ dominates c’’ (c’,c’’ ∈ C) if for all r ∈ R,
    p(c’ | E = e, R = r) > p(c’’ | E = e, R = r)

Indecision?
  It may happen that ∃ r’,r’’ ∈ R so that:
    p(c’ | E = e, R = r’) > p(c’’ | E = e, R = r’)
  and
    p(c’ | E = e, R = r’’) < p(c’’ | E = e, R = r’’)
  There is no evidence that you should prefer c’ to c’’ and vice versa
  (= keep both)
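
The dominance criterion and the resulting set of retained classes can be written down in a few lines. A minimal sketch, assuming the posteriors p(c | E = e, R = r) have already been computed for every completion r; the function names and the two small tables are hypothetical.

```python
def dominates(c1, c2, posteriors):
    """True if c1 beats c2 under every completion r.

    `posteriors` maps each completion r to {class: p(class | E = e, R = r)}.
    """
    return all(p[c1] > p[c2] for p in posteriors.values())


def undominated(classes, posteriors):
    """Classes that no other class dominates (all are kept when undecided)."""
    return [
        c for c in classes
        if not any(dominates(other, c, posteriors)
                   for other in classes if other != c)
    ]


# Made-up posteriors: the preferred class flips between completions.
undecided = {
    "r1": {"y": 0.9, "n": 0.1},
    "r2": {"y": 0.4, "n": 0.6},
}
# Made-up posteriors: "n" wins under every completion.
decided = {
    "r1": {"y": 0.1, "n": 0.9},
    "r2": {"y": 0.3, "n": 0.7},
}

print(undominated(["y", "n"], undecided))  # both classes are kept
print(undominated(["y", "n"], decided))    # only "n" survives
```

Returning a set of classes rather than a single one is the design choice that lets the rule express indecision instead of forcing a guess.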

Page 16: Updating with incomplete observations (UAI-2003)

Decision-making example

Evidence: E = (L,S,T) = (y,y,n) = e

What is your diagnosis for C?
  p(C = y | E = e, H = n, D = y) > p(C = n | E = e, H = n, D = y)
  p(C = y | E = e, H = n, D = n) < p(C = n | E = e, H = n, D = n)
  Both C = y and C = n are plausible

Evidence: E = (L,S,T) = (y,y,y) = e
  C = n dominates C = y: “cancer” is ruled out

[Figure: Asia network with nodes (V)isit to Asia, (S)moking = y, (T)uberculosis, Lung (C)ancer?, Bronc(H)itis, Abnorma(L) X-rays = y, (D)yspnea]

Page 17: Updating with incomplete observations (UAI-2003)

Algorithmic facts

CUR ⇒ restrict attention to Markov blanket
State enumeration still prohibitive in some cases
  e.g., naive Bayes

Dominance test based on dynamic programming
  Linear in the number of children of class node C

However: decision-making possible in linear time, by provided algorithm, even on some multiply connected nets!
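
For the naive Bayes special case, one way to see why enumeration is avoidable is that the likelihood ratio factorizes over attributes, so its minimum over completions is a product of per-attribute minima. The sketch below illustrates that observation only; it is not the paper's dynamic-programming algorithm, it assumes strictly positive conditional tables, and all names and numbers are hypothetical. A brute-force version is included for comparison.

```python
from itertools import product
from math import prod


def dominates_naive_bayes(c1, c2, prior, cpts, evidence):
    """Linear-time dominance test for naive Bayes (positive CPTs assumed).

    cpts[i][c][a] = p(A_i = a | C = c); evidence[i] is a value or None
    when attribute i is missing.
    """
    ratio = prior[c1] / prior[c2]
    for cpt, value in zip(cpts, evidence):
        if value is None:
            # Worst case for c1 over the values of a missing attribute.
            ratio *= min(cpt[c1][a] / cpt[c2][a] for a in cpt[c1])
        else:
            ratio *= cpt[c1][value] / cpt[c2][value]
    return ratio > 1.0


def dominates_brute_force(c1, c2, prior, cpts, evidence):
    """Same test by enumerating every completion (exponential)."""
    def score(c, completion):
        return prior[c] * prod(cpt[c][a] for cpt, a in zip(cpts, completion))

    domains = [list(cpt[c1]) if v is None else [v]
               for cpt, v in zip(cpts, evidence)]
    return all(score(c1, r) > score(c2, r) for r in product(*domains))


PRIOR = {"y": 0.3, "n": 0.7}
CPTS = [
    {"y": {0: 0.1, 1: 0.9}, "n": {0: 0.8, 1: 0.2}},
    {"y": {0: 0.4, 1: 0.6}, "n": {0: 0.5, 1: 0.5}},
    {"y": {0: 0.3, 1: 0.7}, "n": {0: 0.6, 1: 0.4}},
]

print(dominates_naive_bayes("y", "n", PRIOR, CPTS, [1, None, 1]))     # True
print(dominates_naive_bayes("y", "n", PRIOR, CPTS, [1, None, None]))  # False
print(dominates_naive_bayes("n", "y", PRIOR, CPTS, [1, None, None]))  # False
```

The last two lines show indecision: with two attributes missing, neither class dominates the other.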

Page 18: Updating with incomplete observations (UAI-2003)

On the application side

Important characteristics of present approach
  Robust approach, easy to implement
  Does not require changes in pre-existing BN knowledge bases
    based on p(C,A) only!
  Markov blanket ⇒ favors low computational complexity
  If you can write down the IM explicitly, your decisions/inferences will be contained in ours

By-product for large networks
  Even when naive updating is OK, CUR can serve as a useful preprocessing phase
  Restricting attention to Markov blanket may produce strong enough inferences and decisions

Page 19: Updating with incomplete observations (UAI-2003)

What we did in the paper

Theory of coherent lower previsions (imprecise probabilities)
  Coherence
  Equivalent to a large extent to sets of probability distributions
  Weaker assumptions

CUR derived in quite a general framework

Page 20: Updating with incomplete observations (UAI-2003)

Concluding notes

There are cases when:
  IM is unknown/difficult to model
  MAR does not hold
Serious theoretical and practical problem

CUR applies
  Robust to the unknown IM
  Computationally easy decision-making with BNs

CUR works with credal nets, too
  Same complexity

Future: how to make stronger inferences and decisions
  Hybrid MAR/non-MAR modeling?