
Maximizing Information Gain via Prediction Reward

Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White

Active perception

Sensor selection · Visual attention

The ability to take actions to reduce uncertainty

Active perception as an RL task

Reward: ρ(b) = −H(b)

Explicit belief inference?
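For concreteness, a minimal sketch of this reward for a discrete belief represented as a probability vector (the function name and the clipping constant are illustrative):

```python
import numpy as np

def neg_entropy_reward(belief: np.ndarray) -> float:
    """Reward rho(b) = -H(b) for a discrete belief b (non-negative, sums to 1)."""
    b = np.clip(belief, 1e-12, 1.0)        # avoid log(0) for zero-probability states
    return float(np.sum(b * np.log(b)))    # sum_s b(s) log b(s) = -H(b)

# A peaked (low-uncertainty) belief earns a higher reward than a uniform one.
print(neg_entropy_reward(np.array([0.97, 0.01, 0.01, 0.01])))  # about -0.17
print(neg_entropy_reward(np.full(4, 0.25)))                    # -log 4, about -1.39
```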

This paper

• Main theoretical result

• How can we design state-based reward functions that approximate information gain?

• Prediction rewards are a linear approximation to the negative entropy.

• Deep Anticipatory Networks (DAN)

• A deep RL algorithm that trains two deep neural networks simultaneously on each other's feedback.

• Useful when the reward is a convex function of the agent's belief.

• Experiments

• Sensor selection with DAN

• Visual attention with DAN

Prediction reward

A connection between prediction reward and information gain

Prediction reward: reward the agent for making an accurate prediction of the unknown variable.

Expected prediction reward: the agent receives r′ for a correct prediction and r′′ otherwise; the expectation is taken under the agent's belief b.
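A minimal sketch of this quantity, assuming the agent predicts the most likely value under its belief, so the prediction is correct with probability max_s b(s) (the concrete values of r′ and r′′ below are illustrative, not the slide's):

```python
import numpy as np

def expected_prediction_reward(belief: np.ndarray,
                               r_correct: float = 0.0,   # r' for a correct prediction
                               r_wrong: float = -1.0     # r'' otherwise
                               ) -> float:
    """Expected prediction reward when predicting the most likely value under b."""
    p_correct = float(np.max(belief))
    return r_correct * p_correct + r_wrong * (1.0 - p_correct)

print(expected_prediction_reward(np.array([0.7, 0.2, 0.1])))  # 0.0*0.7 + (-1.0)*0.3 = -0.3
```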

Main result

A connection between prediction reward and information gain: the expected prediction reward ρ′(b) approximates the negative-entropy reward ρ(b) = −H(b) up to a constant term, with the approximation error bounded by constants ε₁ and ε₂.
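A quick numeric check of this relationship under the same illustrative choices (r′ = 0, r′′ = −1): both quantities decrease together as the belief flattens, which is the sense in which the expected prediction reward tracks −H(b).

```python
import numpy as np

def neg_entropy(b):
    return float(np.sum(b * np.log(np.clip(b, 1e-12, 1.0))))

def pred_reward(b, r_correct=0.0, r_wrong=-1.0):
    return r_correct * np.max(b) + r_wrong * (1.0 - np.max(b))

for b in (np.array([0.9, 0.05, 0.05]),
          np.array([0.5, 0.3, 0.2]),
          np.full(3, 1.0 / 3.0)):
    print(f"rho'(b) = {pred_reward(b):+.3f}   -H(b) = {neg_entropy(b):+.3f}")
```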

Main consequences

A connection between prediction reward and information gain

• Can estimate using samples, without explicit belief inference (sketched below).

• Question answering · Visual attention · Intrinsic motivation

• This paper: active sensing · active perception · sensor selection
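A minimal sketch of the sample-based estimate (the 1/0 reward values and variable names are illustrative): instead of computing a belief and its entropy, average the prediction reward over sampled predictions and ground-truth values.

```python
import numpy as np

def sampled_prediction_reward(predictions, labels,
                              r_correct: float = 1.0, r_wrong: float = 0.0) -> float:
    """Monte Carlo estimate of the expected prediction reward from
    (prediction, ground truth) samples -- no explicit belief is needed."""
    predictions, labels = np.asarray(predictions), np.asarray(labels)
    return float(np.mean(np.where(predictions == labels, r_correct, r_wrong)))

# Four sampled episodes: the prediction matched the hidden variable in three.
print(sampled_prediction_reward([2, 0, 1, 2], [2, 1, 1, 2]))  # 0.75
```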

DAN: Deep Anticipatory Networks

Train the Q and M networks simultaneously: the Q agent is rewarded if the M agent predicts the unknown variable correctly.
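A high-level sketch of one DAN training episode under these rules. The environment interface (reset/step returning the true hidden variable for supervision), the epsilon-greedy exploration, the discount factor, and the absence of a replay buffer or target network are simplifying assumptions for illustration, not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def run_dan_episode(env, q_net, m_net, q_opt, m_opt, epsilon=0.1, gamma=0.99):
    """One episode of Deep Anticipatory Network training: the Q network chooses
    sensing actions, the M network predicts the hidden variable, and the Q agent
    is rewarded whenever the M agent's prediction is correct."""
    obs, hidden = env.reset()          # observation tensor + true hidden variable (int)
    done = False
    while not done:
        # Q agent: epsilon-greedy choice of which sensor / glimpse to use next.
        with torch.no_grad():
            q_values = q_net(obs)
        if torch.rand(1).item() < epsilon:
            action = int(torch.randint(q_values.shape[-1], (1,)).item())
        else:
            action = int(q_values.argmax().item())

        next_obs, hidden, done = env.step(action)

        # M agent: supervised update towards the true hidden variable.
        logits = m_net(next_obs)
        m_loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([hidden]))
        m_opt.zero_grad(); m_loss.backward(); m_opt.step()

        # Prediction reward for the Q agent: +1 if M predicts correctly, else 0.
        reward = float(int(logits.argmax().item()) == hidden)

        # Q agent: one-step Q-learning update on that prediction reward.
        with torch.no_grad():
            bootstrap = 0.0 if done else gamma * q_net(next_obs).max().item()
        target = torch.tensor(reward + bootstrap)
        q_loss = F.mse_loss(q_net(obs)[action], target)
        q_opt.zero_grad(); q_loss.backward(); q_opt.step()

        obs = next_obs
```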

Experiments: Sensor selection

Baselines: Coverage · Random · Coverage + DAN · Shared representations

At each time step:
• The agent must select 1 out of 10 sensors to process observations from.
• The agent is rewarded for correctly predicting the (x, y) position of a person.
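A toy sketch of what the sensor-selection interaction could look like. The corridor layout, sensor coverage, and motion model are invented for illustration and the observations are plain arrays; this is not the paper's actual tracking setup.

```python
import numpy as np

class ToySensorSelectionEnv:
    """10 sensors on a 10-cell corridor; the chosen sensor reports whether the
    person is currently in its cell. The hidden variable is the person's cell."""
    def __init__(self, n_cells: int = 10, horizon: int = 20, seed: int = 0):
        self.n_cells, self.horizon = n_cells, horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.pos = int(self.rng.integers(self.n_cells))
        return np.zeros(2, dtype=np.float32), self.pos   # dummy initial observation

    def step(self, sensor: int):
        self.t += 1
        # The person takes a random step left, right, or stays in place.
        self.pos = int(np.clip(self.pos + self.rng.integers(-1, 2), 0, self.n_cells - 1))
        obs = np.array([sensor, float(self.pos == sensor)], dtype=np.float32)
        return obs, self.pos, self.t >= self.horizon
```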

Experiments: Sensor selection

[Figure: Correct predictions per episode in multi-person tracking vs. number of tracked people, comparing DAN + Coverage, Coverage, Random Policy, DAN shared, and DAN.]

Experiments: Visual attention

[Figure: Test curves in the terminal-reward setting. Total reward in an episode (out of 1) vs. training episodes (0 to 20000), for MNIST DAN, Fashion-MNIST DAN, MNIST terminal-reward, and Fashion-MNIST terminal-reward.]

[Figure: Test curves in the continuous-reward setting. Total reward in an episode (out of 12) vs. training episodes (0 to 20000), for MNIST DAN, Fashion-MNIST DAN, MNIST terminal-reward, and Fashion-MNIST terminal-reward.]

Thank you!

Contact: Y.Satsangi@tilburguniversity.edu

yashsatsangi.com

Summary

• Connection between prediction rewards and information gain.

• Compute information gain estimates without explicit belief inference.
