Studies on Goal-Directed Feature Learning
Cornelius Weber, FIAS
presented at: “Machine Learning Approaches to Representational Learning
and Recognition in Vision”
Workshop at the Frankfurt Institute for Advanced Studies (FIAS), November 27-28, 2008
for taking action, we need only the relevant features
[figure: axes x, y, z]
models’ background & overview:
- unsupervised feature learning models are enslaved by bottom-up input
- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19/6, 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn partitioning of input space: e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)
- reward-modulated Hebb: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-214 (2005); Franz & Triesch, ICDL (2007)
(model 3 presented here, extending to delayed reward)
- feature-pruning models learn all features but forget the irrelevant ones (models 1 & 2 presented here)
sensory input
reward
action
purely sensory data, in which one feature type is linked to reward
the action is not controlled by the network
model 1: obtaining the relevant features
1) build a feature detecting model
2) learn associations between features
3) register the average features’ reward
4) spread value along associative connections
5) check whether actions in-/decrease value
6) remove features where action doesn’t matter
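Steps 3) to 6) can be sketched on a toy problem. This is a minimal illustration, not the implementation from the cited work: the five-feature layout, the action-conditional associative weights `RIGHT` and `LEFT`, the discount `GAMMA`, the value-iteration form of the spreading rule and the threshold `eps` are all assumptions made for the example.

```python
GAMMA = 0.9  # assumed discount factor for spreading value


def rows(next_ids):
    """Associative weights as one-hot rows: feature i is followed by next_ids[i]."""
    n = len(next_ids)
    return [[1.0 if j == nxt else 0.0 for j in range(n)] for nxt in next_ids]


def expected(row, value):
    """Value reached from one feature along its associative connections."""
    return sum(w * v for w, v in zip(row, value))


def spread_value(reward, assoc, sweeps=200):
    """Step 4: spread the average reward along associative connections."""
    value = [0.0] * len(reward)
    for _ in range(sweeps):
        value = [r + GAMMA * max(expected(a[i], value) for a in assoc)
                 for i, r in enumerate(reward)]
    return value


def action_effect(assoc, value):
    """Step 5: how much the reachable value differs between actions."""
    effects = []
    for i in range(len(value)):
        reached = [expected(a[i], value) for a in assoc]
        effects.append(max(reached) - min(reached))
    return effects


def select_relevant(reward, assoc, eps=1e-6):
    """Step 6: keep only the features where the action matters."""
    value = spread_value(reward, assoc)
    return [i for i, e in enumerate(action_effect(assoc, value)) if e > eps]


# Toy data (hypothetical): features 0-2 lie along the relevant dimension,
# features 3-4 alternate regardless of the action taken.
REWARD = [0.0, 0.0, 1.0, 0.3, 0.3]   # step 3: average reward per feature
RIGHT = rows([1, 2, 2, 4, 3])        # associations under action 'right'
LEFT = rows([0, 0, 1, 4, 3])         # associations under action 'left'

print(select_relevant(REWARD, [RIGHT, LEFT]))
```

In this toy setting features 3 and 4 carry nonzero value but are removed, because both actions lead to the same value from them.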
irrelevant relevant
Földiák, Biol Cybern 64, 165-70 (1990)
→ homogeneous activity distr.
[figure labels: features; thresholds; lateral weights (decorrelation); selected features; associative weights; action effect]
Weber & Triesch, Proc ICANN, 740-9 (2008); Witkowski, Adap Behav, 15(1), 73-97 (2007); Toussaint, Proc NIPS, 929-36 (2003); Weber, Proc ICANN, 1147-52 (2001)
→ relevant features identified
sensory input reward
motor-sensory data (again, one feature type is linked to reward)
the network selects the action (to get reward)
irrelevant subspace
relevant subspace
model 2: removing the irrelevant inputs
1) initialize feature detecting model
(but continue learning)
2) perform actor-critic RL, taking the features’ outputs as state representation
- works despite irrelevant features
- challenge: relevant features will occur at different frequencies
- nevertheless, features may remain stable
3) observe the critic: it puts negative value on irrelevant features after long training
4) modulate (multiply) learning by critic’s value
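Step 4) can be written as a single weight update. A minimal sketch, assuming a generic Hebbian-style rule that moves a feature's weights toward the input; the function name, the learning rate `eta` and the exact rule are illustrative assumptions and not the feature model of the cited work.

```python
def value_modulated_update(weights, x, activation, critic_value, eta=0.1):
    """Move weights toward input x, scaled by activation and the critic's value.

    Under this assumed rule a negative critic value reverses the step, so
    features that the critic rates negatively are unlearned, not reinforced.
    """
    gain = eta * activation * critic_value
    return [w + gain * (xi - w) for w, xi in zip(weights, x)]


w = [0.0, 0.0]
x = [1.0, 0.0]
print(value_modulated_update(w, x, activation=1.0, critic_value=0.5))   # step toward x
print(value_modulated_update(w, x, activation=1.0, critic_value=-0.5))  # step away from x
```

A critic value of zero leaves the weights unchanged, which is one way to read "forgetting" of features that the critic does not value.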
[figure axes: frequency, value]
Lücke & Bouecke, Proc ICANN, 31-7 (2005)
[figure panels: features; critic value; action weights]
→ relevant subspace discovered
model 3: learning only the relevant inputs
1) top level: reinforcement learning model (SARSA)
2) lower level: feature learning model (SOM / K-means)
3) modulate learning by δ, in both layers
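One learning step of this scheme can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation behind the results shown: the lower level is reduced to a winner-take-all prototype layer (K-means-like, without a SOM neighbourhood), and the way δ scales the feature update, as well as the names `winner`, `sarsa_feature_step`, `alpha`, `eta` and `gamma`, are choices made for illustration.

```python
def winner(features, x):
    """Index of the prototype closest to input x (squared Euclidean distance)."""
    dists = [sum((w - xi) ** 2 for w, xi in zip(proto, x)) for proto in features]
    return dists.index(min(dists))


def sarsa_feature_step(features, q, x, a, r, x_next, a_next,
                       alpha=0.5, eta=0.5, gamma=0.9, done=False):
    """One SARSA step on the winning feature; delta modulates both layers.

    features: list of prototype vectors (lower level)
    q:        q[k][a] action values per feature unit (top level)
    Returns the TD error delta. Both arguments are updated in place.
    """
    k = winner(features, x)
    target = r
    if not done:
        target += gamma * q[winner(features, x_next)][a_next]
    delta = target - q[k][a]
    # Top level: standard SARSA update of the chosen action value.
    q[k][a] += alpha * delta
    # Lower level (assumed form): the prototype moves toward x in proportion
    # to delta, so inputs that never change the value prediction are not learned.
    features[k] = [w + eta * delta * (xi - w) for w, xi in zip(features[k], x)]
    return delta


features = [[0.0, 0.0], [1.0, 1.0]]
q = [[0.0, 0.0], [0.0, 0.0]]
delta = sarsa_feature_step(features, q, x=[0.2, 0.0], a=1, r=1.0,
                           x_next=[0.9, 1.0], a_next=0)
print(delta, q[0], features[0])
```

With δ = 0 neither layer changes; a negative δ moves the prototype away from the input under this assumed rule.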
[diagram labels: RL weights; feature weights; input; action]
model 3: SARSA with SOM-like activation and update
[figure panels: relevant subspace; subspace coverage; RL action weights; feature weights; input data; reward; 2 actions (not shown)]
learning the ‘long bars’ data
RL action weights
feature weights
input data: bars controlled by actions ‘up’, ‘down’, ‘left’, ‘right’
learning the ‘short bars’ data
reward
action
short bars in 12x12; average number of steps to goal: 11
cortex
striatum
GPi (output of basal ganglia)
biological interpretation
- no direct feedback from striatum to cortex
- convergent mapping → little receptive field overlap, consistent with subspace discovery
feature/subspace detection
action selection
Discussion
- models 1 and 2 learn all features and identify the relevant features
- either requires homogeneous feature distribution (model 1)
- or can do only subspace- (no real feature) detection (model 2)
- model 3 is very simple: SARSA on SOM with δ-feedback
- learns only the relevant subspace or features in the first place
- link between unsupervised and reinforcement learning
Sponsors
Bernstein Focus Neurotechnology
EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3
Frankfurt Institute for Advanced Studies (FIAS)
early learning late learning
Jog et al., Science, 286, 1158-61 (1999)
relevant features change during learning
units in the basal ganglia are active at the junction during early task acquisition but not at a later stage
T-maze decision task (rat)
evidence for reward/action modulated learning in the visual system
Shuler & Bear, "Reward timing in the primary visual cortex", Science, 311, 1606-9 (2006)
Schoups et al. "Practising orientation identification improves orientation coding in V1 neurons" Nature, 412, 549-53 (2001)