A reinforcement learning algorithm for sampling design in Markov random fields

Mathieu BONNEAU, Nathalie PEYRARD, Régis SABBADIN

INRA-MIA Toulouse
E-Mail: {mbonneau,peyrard,sabbadin}@toulouse.inra.fr

MSTGA, Bordeaux, 7 2012


Page 2: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Plan

1. Problem statement.

2. General approach.

3. Formulation using a dynamic model.

4. Reinforcement learning solution.

5. Experiments.

6. Conclusions.

Page 3: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

PROBLEM STATEMENT: Adaptive sampling

[Figure: graph of nine variables X(1), …, X(9)]

• Adaptive selection of variables to observe for reconstruction of the random vector X = (X(1), …, X(n))

• c(A) -> Cost of observing variables X(A)

• B -> Initial budget

• Observations are reliable

Page 4: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

PROBLEM STATEMENT: Adaptive sampling

• Adaptive selection of variables to observe for reconstruction of the random vector X = (X(1), …, X(n))

• c(A, x(A)) -> Cost of observing variables X(A) in state x(A)

• B -> Initial budget

• Observations are reliable

Problem: Find a strategy / sampling policy that adaptively selects the variables to observe in order to:

Optimize the quality of the reconstruction of X / Respect the initial budget


Page 6: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

DEFINITION: Adaptive sampling policy

For any sampling plans A_1, …, A_t and observations x(A_1), …, x(A_t), an adaptive sampling policy 𝛿 is a function giving the next variable(s) to observe:

𝛿((A_1, x(A_1)), …, (A_t, x(A_t))) = A_{t+1}.

Page 7: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

DEFINITION: Adaptive sampling policy

For any sampling plans A_1, …, A_t and observations x(A_1), …, x(A_t), an adaptive sampling policy 𝛿 is a function giving the next variable(s) to observe:

𝛿((A_1, x(A_1)), …, (A_t, x(A_t))) = A_{t+1}.

- - Example - -

A_1 = 𝛿_1 = {8}, x(A_1) = 0

A_2 = 𝛿_2((8, 0)) = {6, 7}, x(A_2) = (1, 3)

[Figure: graph of nine variables X(1), …, X(9)]
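The definition and example on this slide can be read as a function from histories to sampling plans. The sketch below is only an illustration under stated assumptions: the names `example_policy` and `run_policy` are hypothetical, the policy stops after the two steps shown on the slide, and each observed variable is assumed to cost 1.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# A history is a list of (sampling plan A_t, observed values x(A_t)) pairs.
History = List[Tuple[FrozenSet[int], Tuple[int, ...]]]
Policy = Callable[[History], FrozenSet[int]]

def example_policy(history: History) -> FrozenSet[int]:
    """Hypothetical policy reproducing the two steps of the slide's example."""
    if not history:
        return frozenset({8})                      # A_1 = delta_1 = {8}
    if history == [(frozenset({8}), (0,))]:
        return frozenset({6, 7})                   # A_2 = delta_2((8, 0)) = {6, 7}
    return frozenset()                             # otherwise: stop sampling

def run_policy(policy: Policy, x: Dict[int, int], budget: int) -> History:
    """Apply a policy to a hidden realization x; unit cost per variable is assumed."""
    history: History = []
    spent = 0
    while True:
        plan = policy(history)
        cost = len(plan)
        if not plan or spent + cost > budget:      # stop when done or budget exceeded
            return history
        observed = tuple(x[i] for i in sorted(plan))
        history.append((plan, observed))
        spent += cost

# Hidden values chosen to match the slide: x(8) = 0, x(6) = 1, x(7) = 3.
hidden = {i: 0 for i in range(1, 10)}
hidden[6], hidden[7] = 1, 3
trajectory = run_policy(example_policy, hidden, budget=3)
```

Here `trajectory` is the history followed when applying the policy to this particular realization; a different realization may lead to a different sequence of sampling plans.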

Page 8: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

DEFINITIONS

For any sampling plans A_1, …, A_t and observations x(A_1), …, x(A_t), an adaptive sampling policy 𝛿 is a function giving the next variable(s) to observe:

𝛿((A_1, x(A_1)), …, (A_t, x(A_t))) = A_{t+1}.

Example: A_1 = 𝛿_1 = {8}, x(A_1) = 0; A_2 = 𝛿_2((8, 0)) = {6, 7}, x(A_2) = (1, 3)

• Vocabulary:

• A history {(A_1, x(A_1)), …, (A_H, x(A_H))} is a trajectory followed when applying 𝛿

• τ_𝛿: set of all reachable histories of 𝛿

• c(𝛿) ≤ B: the cost of any history respects the initial budget

Page 9: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

GENERAL APPROACH

Page 10: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

GENERAL APPROACH

1. Find a distribution ℙ that describes the phenomenon under study well.

2. Define the value of an adaptive sampling policy:

3. Define an approximate resolution method for finding a near-optimal policy:

Page 11: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

STATE OF THE ART

1. Find a distribution ℙ that describes the phenomenon under study well.

   X continuous random vector, ℙ multivariate Gaussian joint distribution

2. Define the value of an adaptive sampling policy:

   Entropy-based criterion, kriging variance

3. Define an approximate resolution method for finding a near-optimal policy:

   Greedy algorithm

Page 12: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

1. Find a distribution ℙ that describes the phenomenon under study well.

   State of the art: X continuous random vector, ℙ multivariate Gaussian joint distribution

   OUR CONTRIBUTION: X discrete random vector, ℙ Markov random field distribution

2. Define the value of an adaptive sampling policy:

   State of the art: entropy-based criteria, kriging variance

   OUR CONTRIBUTION: Maximum Posterior Marginals (MPM)

3. Define an approximate resolution method for finding a near-optimal policy:

   State of the art: greedy algorithm

   OUR CONTRIBUTION: reinforcement learning


Page 14: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Formulation using dynamic model

An adapted framework for reinforcement learning

Page 15: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

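Summarize knowledge on X in a random vector S of length n

• Observing variables updates our knowledge on X -> evolution of S

• Example: s = (-1, …, k, …, -1), with k in position i: variable X(i) was observed in state k (the other variables, marked -1, have not been observed)

[Diagram: s_1 --A_1--> s_2 --A_2--> s_3 --A_3--> … --> s_{H+1}, with terminal value U(s_{H+1})]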
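The evolution of S described on this slide can be sketched in a few lines. This is a minimal illustration, assuming that -1 marks an unobserved variable (as the slide's example suggests) and using 1-based variable indices; the function names are hypothetical.

```python
UNOBSERVED = -1  # marker assumed from the example s = (-1, ..., k, ..., -1)

def initial_state(n):
    """State s_1: no variable has been observed yet."""
    return (UNOBSERVED,) * n

def update_state(s, plan, values):
    """Return the next state after observing X(i) = k for each (i, k) pair.

    `plan` lists 1-based variable indices, `values` the observed states.
    """
    s_next = list(s)
    for i, k in zip(plan, values):
        if s_next[i - 1] != UNOBSERVED:
            raise ValueError(f"X({i}) was already observed")
        s_next[i - 1] = k
    return tuple(s_next)

# Replaying the earlier example: observe X(8) = 0, then (X(6), X(7)) = (1, 3).
s1 = initial_state(9)
s2 = update_state(s1, [8], [0])
s3 = update_state(s2, [6, 7], [1, 3])
```

Using an immutable tuple for the state makes it usable directly as a dictionary key, which is convenient when a Q-function is stored in a table.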


Page 17: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Reinforcement learning solution

Page 18: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Find optimal policy: The Q-function

Compute Q* -> Compute 𝛿*

Q*(s_t, A_t) = « the expected value of the history when starting in s_t, observing variables X(A_t) and then following policy 𝛿* »

Page 19: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

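Find optimal policy: The Q-function

• How to compute Q*: classical solution (Q-learning …)

1. Initialize Q

2. Simulate history: at each step, choose the action maximizing the current Q with proba 1-𝜀, a random action with proba 𝜀

[Diagram: s_1 --A_1--> s_2 --A_2--> s_3, with terminal value U(s_3)]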

Page 20: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

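Find optimal policy: The Q-function

• How to compute Q*: classical solution (Q-learning …)

1. Initialize Q

2. Simulate history: at each step, choose the action maximizing the current Q with proba 1-𝜀, a random action with proba 𝜀

Update Q(s_1, A_1), update Q(s_2, A_2)

Repeat many times! Q -> Q* (convergence)

[Diagram: s_1 --A_1--> s_2 --A_2--> s_3, with terminal value U(s_3)]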
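The classical solution outlined on this slide can be sketched as tabular Q-learning with 𝜀-greedy exploration. Everything specific here is an assumption made for illustration: the toy model (three independent binary variables with probabilities `P_ONE`), the horizon H = 2, the terminal value (number of variables observed in state 1) and the 1/visits step size are not taken from the slides.

```python
import random
from collections import defaultdict

# Hypothetical toy model standing in for the distribution P:
# three independent binary variables with P(X(i) = 1) = P_ONE[i].
P_ONE = [0.5, 0.9, 0.1]
N, H = len(P_ONE), 2              # number of variables, number of decision steps
EMPTY = (-1,) * N                 # state s_1: nothing observed yet

def actions(s):
    """Indices of the variables not yet observed in state s."""
    return [i for i, v in enumerate(s) if v == -1]

def terminal_value(s):
    """Illustrative U(s_{H+1}): number of variables observed in state 1."""
    return sum(1 for v in s if v == 1)

def q_learning(episodes=20000, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)        # 1. initialize Q (all zeros)
    visits = defaultdict(int)
    for _ in range(episodes):     # repeat "many times"
        x = [int(rng.random() < p) for p in P_ONE]   # hidden realization of X
        s = EMPTY
        for t in range(H):        # 2. simulate one history
            acts = actions(s)
            if rng.random() < eps:
                a = rng.choice(acts)                      # random with proba eps
            else:
                a = max(acts, key=lambda i: q[(s, i)])    # greedy with proba 1 - eps
            s_next = s[:a] + (x[a],) + s[a + 1:]
            if t == H - 1:
                target = terminal_value(s_next)
            else:
                target = max(q[(s_next, i)] for i in actions(s_next))
            visits[(s, a)] += 1
            # Update Q(s_t, A_t) with a 1/visits step size (running average).
            q[(s, a)] += (target - q[(s, a)]) / visits[(s, a)]
            s = s_next
    return q

q = q_learning()
best_first = max(actions(EMPTY), key=lambda i: q[(EMPTY, i)])
```

In this toy model the table stays tiny; with n variables the number of states grows exponentially, which is the practical obstacle to a tabular representation of Q.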

Page 21: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Alternative approach

• Linear approximation of Q-function:

• Choice of function Φi:

Page 22: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

LSDP Algorithm

• Linear approximation of Q-function:

• Define weights for each decision step t = 1, …, H

• Compute weights using “backward induction”
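The two ideas on this slide (one weight vector per decision step, computed by backward induction) can be illustrated with a generic least-squares fit. This is a sketch under stated assumptions, not the authors' LSDP implementation: the toy model, the features in `features`, the terminal value and the small ridge term in the solver are all hypothetical choices.

```python
import random

P_ONE = [0.5, 0.9, 0.1]   # hypothetical toy model: independent binary variables
N, H = len(P_ONE), 2
EMPTY = (-1,) * N

def actions(s):
    return [i for i, v in enumerate(s) if v == -1]

def features(s, a):
    """Illustrative features Phi_i(s, a); not the ones used in the talk."""
    return [1.0, float(sum(v == 1 for v in s)), P_ONE[a]]

def dot(w, phi):
    return sum(wi * pi for wi, pi in zip(w, phi))

def solve_least_squares(rows, ys, ridge=1e-8):
    """Solve the (slightly regularized) normal equations by Gauss-Jordan elimination."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(k)] + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))   # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

def fit_weights(n_histories=5000, seed=0):
    rng = random.Random(seed)
    batch = []                                   # simulated histories
    for _ in range(n_histories):
        x = [int(rng.random() < p) for p in P_ONE]
        s, steps = EMPTY, []
        for _ in range(H):
            a = rng.choice(actions(s))           # exploratory (random) policy
            s_next = s[:a] + (x[a],) + s[a + 1:]
            steps.append((s, a, s_next))
            s = s_next
        batch.append(steps)
    w = {}
    for t in range(H - 1, -1, -1):               # backward induction (0-based steps)
        rows, ys = [], []
        for steps in batch:
            s, a, s_next = steps[t]
            if t == H - 1:
                y = float(sum(v == 1 for v in s_next))      # illustrative terminal value U
            else:
                y = max(dot(w[t + 1], features(s_next, b)) for b in actions(s_next))
            rows.append(features(s, a))
            ys.append(y)
        w[t] = solve_least_squares(rows, ys)     # one linear system per decision step
    return w

w = fit_weights()
best_first = max(actions(EMPTY), key=lambda a: dot(w[0], features(EMPTY, a)))
```

The weights of the last step are fitted first, against the terminal value; each earlier step is then fitted against the best approximate value available at the following step.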

Page 23: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

LSDP Algorithm: application to sampling

1. Computation of Φi(st,At):

2. Computation of

3. Computation of

Page 24: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

LSDP Algorithm: application to sampling

1. Computation of Φi(st,At):

2. Computation of

3. Computation of

• We fix |At| = 1 and use the approximation:

Page 25: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Experiments

Page 26: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Experiments

• Regular grid with first-order neighbourhood.

• X(i) are binary variables.

• ℙ is a Potts model with β=0.5

• Simple cost: observing each variable costs 1

[Figure: graph of nine variables X(1), …, X(9)]
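To make the experimental model concrete, here is a small sketch of a Potts model on a regular grid with first-order neighbourhood. It assumes the pairwise-only form P(x) ∝ exp(β · number of neighbouring pairs in the same state), with no external field, and computes conditional marginals by brute-force enumeration, which is only feasible on tiny grids. The function names are hypothetical.

```python
import itertools
import math

def grid_edges(rows, cols):
    """Edges of a regular grid with first-order (4-nearest) neighbourhood."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))       # right neighbour
            if r + 1 < rows:
                edges.append((i, i + cols))    # bottom neighbour
    return edges

def potts_weight(x, edges, beta=0.5):
    """Unnormalized probability: exp(beta * number of equal neighbour pairs)."""
    return math.exp(beta * sum(x[i] == x[j] for i, j in edges))

def conditional_marginal(i, value, observed, rows, cols, beta=0.5, states=(0, 1)):
    """P(X(i) = value | observed) by enumerating all configurations."""
    edges = grid_edges(rows, cols)
    num = den = 0.0
    for x in itertools.product(states, repeat=rows * cols):
        if any(x[j] != v for j, v in observed.items()):
            continue                            # inconsistent with the observations
        wgt = potts_weight(x, edges, beta)
        den += wgt
        if x[i] == value:
            num += wgt
    return num / den

# 2 x 2 grid, 0-based indices: marginal of X(0) before and after observing X(1) = 1.
prior = conditional_marginal(0, 1, {}, 2, 2)
posterior = conditional_marginal(0, 1, {1: 1}, 2, 2)
```

Without observations the two states are equally likely by symmetry; observing a neighbour in state 1 raises the probability that X(0) is also in state 1, since β > 0 favours equal neighbours.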

Page 27: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Experiments

• Comparison between:

  Random policy

  BP-max heuristic: at each time step observed variable

  LSPI policy (a common reinforcement learning algorithm)

  LSDP policy

• using score:

Page 28: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Experiment: 100 variables (n=100)

Page 29: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

LSDP and BP-max: no observation

[Figure panels: action values for the LSDP policy; max marginals]

Page 30: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

LSDP and BP-max: one observation

[Figure panels: action values for the LSDP policy (two panels); max marginals]

Page 31: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

LSDP and BP-max: two observations

[Figure panels: action values for the LSDP policy (two panels); max marginals]

Page 32: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Experiment: 100 variables - constrained moves

• Only moves within the second-order neighbourhood are allowed!


Page 34: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Experiment: 100 variables - constrained moves


Page 36: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Experiment: 200 variables - Different cost

                                     Random    BP-max    LSDP     Min Cost
Policy value                         60.27%    61.77%    64.8%    64.58%
Mean number of observed variables    26.65     19        27.3     38

Initial Budget = 38

• Cost repartition: [Figure panels: Random, BP-max, LSDP]

Page 37: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Experiment: 100 variables - Different cost

                                     Random    BP-max    LSDP
Policy value                         65.3%     66.2%     67%
Mean number of observed variables    15.4      15.4      15.4
WR 1                                 65.9%     65.4%     67%
WR 0                                 61%       63.2%     64%

• Initial Budget = 30

• Cost of 1 when the observed variable is in state 0

• Cost of 3 when the observed variable is in state 1

Page 38: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Conclusions

• An adapted framework for adaptive sampling of discrete random variables

• LSDP: a reinforcement learning approach for finding a near-optimal policy

  Adaptation of a common reinforcement learning algorithm for solving the adaptive sampling problem

  Computation of a near-optimal policy « off-line »

  Design of new policies that outperform simple heuristics and the usual RL method

• Possible application?

  Weed sampling in crop fields

Page 39: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

THANK YOU!

Page 40: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Reconstruction of X(R) and trajectory value

A_1 = 𝛿_1 = {8}, x(A_1) = 0

A_2 = 𝛿_2((8, 0)) = {6, 7}, x(A_2) = (4, 2)

[Figure label: reconstruction of X(R)]

• Maximum Posterior Marginal for reconstruction:

Page 41: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

Reconstruction of X(R) and trajectory value

A_1 = 𝛿_1 = {8}, x(A_1) = 0

A_2 = 𝛿_2((8, 0)) = {6, 7}, x(A_2) = (4, 2)

• Maximum Posterior Marginal for reconstruction:

• Quality of trajectory:
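The Maximum Posterior Marginal rule named on this slide picks, for each variable, the state with the largest posterior marginal. The sketch below assumes the marginals are already available (the numbers are hypothetical) and uses the expected number of correctly reconstructed variables as a quality measure; that measure is an assumption for illustration, since the slide's own formula did not survive in the transcript.

```python
def mpm_reconstruction(marginals):
    """For each variable i, pick the state with the largest posterior marginal."""
    return {i: max(p, key=p.get) for i, p in marginals.items()}

def expected_correct(marginals):
    """Expected number of variables correctly reconstructed by MPM.

    Assumed quality measure: the sum over variables of the largest marginal.
    """
    return sum(max(p.values()) for p in marginals.values())

# Hypothetical posterior marginals P(X(i) = k | observations) for three variables.
marginals = {
    1: {0: 0.8, 1: 0.2},
    2: {0: 0.35, 1: 0.65},
    3: {0: 0.5, 1: 0.5},
}
x_hat = mpm_reconstruction(marginals)
value = expected_correct(marginals)
```

In case of a tie, as for variable 3 here, `max` returns the first state encountered, so the reconstruction depends on the ordering of the states.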

Page 42: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

LSDP Algorithm

• Linear approximation of Q-function:

• How to compute w_1, …, w_H:

[Diagram: simulated histories s_1 --a_1--> s_2 --a_2--> … --> s_{H-1} --a_{H-1}--> s_H --a_H--> s_{H+1}, each with terminal value U(s_{H+1})]

Page 43: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

LSDP Algorithm

• Linear approximation of Q-function:

• How to compute w_1, …, w_H:

[Diagram: simulated histories s_1 --a_1--> s_2 --a_2--> … --> s_{H-1} --a_{H-1}--> s_H --a_H--> s_{H+1}, each with terminal value U(s_{H+1})]

LINEAR SYSTEM -> w_H

Page 44: A  reinforcement learning algorithm  for  sampling  design in Markov  random fields

LSDP Algorithm

• Linear approximation of Q-function:

• How to compute w_1, …, w_H:

[Diagram: simulated histories truncated at step H-1: s_1 --a_1--> s_2 --a_2--> … --> s_{H-1} --a_{H-1}-->]

LINEAR SYSTEM -> w_{H-1}