Neurobiological Models and Research Themes

Matthew J. Crossley

Department of Psychological and Brain Sciences, University of California, Santa Barbara, 93106

I. A neurobiological model of appetitive instrumental conditioning

II. Overview of my research

III. Contribution to the Ivry lab

Talk Goals

Why Instrumental Conditioning?

• The Ashby lab's bread and butter is category learning

• Information-integration (II) category learning is a procedural skill

• Appetitive Instrumental Conditioning is a procedural skill

• Procedural Skills

• Model Architecture

• Instrumental Conditioning Applications

• Instrumental Conditioning Summary

Part I Outline




• Category Learning Applications


Outline

• Learned incrementally from feedback

• Model-free reinforcement learning

• Habitual control

• E.g., riding a bike or playing an instrument

• E.g., radiology

Procedural Skills

Procedural Skills

Where are the tumors?

Procedural Skills

TUMORS!

Procedural Skills Depend on the Basal Ganglia

• Basal ganglia are a collection of subcortical nuclei

• Interconnects with cortex in well-defined circuits

• Striatum is a major input structure

Cortex Excites the Striatum

Striatum Inhibits the GPi

GPi Inhibits the Thalamus

High baseline firing rate

Striatum Disinhibits the Thalamus

Thalamus Excites Cortex

Dopamine Modulates Activity
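Taken together, the circuit slides describe a sign flip. Cortex excites striatum, striatum inhibits a tonically active GPi, and GPi inhibits thalamus, so striatal activity releases the thalamus. Below is a minimal steady-state rate sketch of that chain. It assumes rectified-linear units, arbitrary illustrative weights, and a multiplicative dopamine gain; none of these are parameters from the talk's model.

```python
def relu(x: float) -> float:
    """Rectify so that firing rates never go negative."""
    return max(x, 0.0)


def direct_pathway(cortex: float, da_gain: float = 1.0):
    """Steady-state rates for a toy cortex -> striatum -> GPi -> thalamus chain.

    Weights, baselines, and the multiplicative DA gain are illustrative
    assumptions, not parameters of the model described in this talk.
    """
    striatum = relu(da_gain * cortex)   # cortex excites striatum; DA scales the response (assumed)
    gpi = relu(1.0 - 0.8 * striatum)    # GPi: high baseline rate (1.0), inhibited by striatum
    thalamus = relu(1.0 - gpi)          # thalamus: fixed drive (1.0) minus GPi inhibition
    return striatum, gpi, thalamus


for drive in (0.0, 0.5, 1.0):
    s, g, t = direct_pathway(drive)
    print(f"cortex={drive:.1f}  striatum={s:.2f}  GPi={g:.2f}  thalamus={t:.2f}")
```

Raising cortical drive lowers GPi output and raises thalamic output. In this toy version, lowering `da_gain` weakens that release.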

Procedural Learning Depends on the Striatum

• Single-cell recordings: Carelli, Wolske, & West, 1997; Merchant, Zainos, Hernández, Salinas, & Romo, 1997; Romo, Merchant, Ruiz, Crespo, & Zainos, 1995

• Lesion studies: Eacott & Gaffan, 1991; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987; McDonald & White, 1993, 1994; Packard, Hirsch, & White, 1989; Packard & McGaugh, 1992

• Neuropsychological patient studies: Filoteo, Maddox, & Davis, 2001; Filoteo, Maddox, Salmon, & Song, 2005; Knowlton, Mangels, & Squire, 1996

• Neuroimaging: Nomura et al., 2007; Seger & Cincotta, 2002; Waldschmidt & Ashby, 2011

Striatal Neurons

• Medium Spiny Projection Neurons (MSNs): 96%

• GABA Interneurons: 2%

• TANs (Cholinergic Interneurons): 2%

The TANs are of Particular Interest

• Tonically active; pause in response to excitatory input

• Presynaptically inhibit cortical input to MSNs

• Receive major input from CM-Pf (thalamus)

• Learn to pause to stimuli that predict reward (requires dopamine)




• Category Learning Applications

• Closing Remarks

Outline

Model Architecture

Ashby and Crossley (2011)

Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses

Pf-TAN Synapse

CTX-MSN Synapse


Network Dynamics: Early Trial

SMA

Response and Feedback

• The model responds if SMA activity crosses a threshold

• The model receives feedback after every trial
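The slide specifies only a threshold rule for responding. One way to picture it is with a hypothetical leaky-integrator SMA unit updated in Euler steps of size 1. The talk's model uses spiking units, so these dynamics and parameter values are illustrative assumptions, not the model's.

```python
from typing import Optional, Tuple


def sma_response(drive: float, threshold: float = 1.0, leak: float = 0.1,
                 max_steps: int = 500) -> Tuple[bool, Optional[int]]:
    """Integrate a leaky SMA unit and report whether (and when) it crosses threshold.

    Leaky-integrator dynamics, threshold, and leak are hypothetical stand-ins;
    the slide only states that the model responds when SMA crosses a threshold.
    """
    sma = 0.0
    for step in range(max_steps):
        sma += drive - leak * sma   # one Euler step (dt = 1) of dSMA/dt = drive - leak * SMA
        if sma >= threshold:
            return True, step       # response emitted; step serves as a response time
    return False, None              # no crossing: no response on this trial


print(sma_response(0.5))   # → (True, 2)
print(sma_response(0.05))  # → (False, None); settles at drive / leak = 0.5
```

Stronger drive crosses sooner, so the crossing step doubles as a crude response time. Weak drive can stall below threshold and yield no response.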


CTX-MSN Synaptic Modification Requires a TANs Pause

• Synaptic Strengthening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Elevated DA levels

• Synaptic Weakening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Depressed DA levels

Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)

Synaptic Plasticity in the Striatum Depends on Dopamine (DA)

• Synaptic Strengthening:



- Elevated DA levels

• Synaptic Weakening:



- Depressed DA levels

Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)

DA Encodes Reward Prediction Error (RPE)

• Elevated after unexpected reward

• Depressed after unexpected absence of reward

• Stays at baseline when the expected outcome occurs

Bayer & Glimcher (2005)

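Computing RPE

Obtained feedback on trial n:

$$R_n = \begin{cases} 1 & \text{if positive feedback} \\ 0 & \text{otherwise} \end{cases}$$

Predicted feedback on trial n:

$$P_n = P_{n-1} + \alpha \, (R_{n-1} - P_{n-1})$$

RPE on trial n:

$$\mathrm{RPE}(n) = R_n - P_n$$

DA released on trial n, D(n):

$$D(n) = \begin{cases} 1 & \text{if } \mathrm{RPE} > 1 \\ 0.8\,\mathrm{RPE} + 0.2 & \text{if } -0.25 < \mathrm{RPE} \le 1 \\ 0 & \text{if } \mathrm{RPE} \le -0.25 \end{cases}$$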
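The RPE and DA rules on this slide translate almost line for line into code. Below is a minimal sketch in Python. R_n is 1 for positive feedback and 0 otherwise. The prediction is updated as P_n = P_{n-1} + α(R_{n-1} − P_{n-1}), and DA is the piecewise-linear function of R_n − P_n. The learning rate `alpha = 0.1` and the starting prediction of 0 are assumptions, since the slide does not give them.

```python
from typing import List, Tuple


def da_from_rpe(rpe: float) -> float:
    """Piecewise-linear DA release: saturates at 1, floors at 0, equals 0.2 when RPE = 0."""
    if rpe > 1.0:
        return 1.0
    if rpe > -0.25:
        return 0.8 * rpe + 0.2
    return 0.0


def run_trials(rewards: List[int], alpha: float = 0.1) -> List[Tuple[float, float, float]]:
    """Return (P_n, RPE(n), DA(n)) for each trial.

    R_n is 1 for positive feedback and 0 otherwise. alpha and the starting
    prediction P_1 = 0 are assumed values, not taken from the talk.
    """
    prediction = 0.0
    previous_reward = None
    history = []
    for reward in rewards:
        if previous_reward is not None:
            # P_n = P_{n-1} + alpha * (R_{n-1} - P_{n-1})
            prediction += alpha * (previous_reward - prediction)
        rpe = reward - prediction
        history.append((prediction, rpe, da_from_rpe(rpe)))
        previous_reward = reward
    return history


# 40 rewarded trials, then one unexpected omission.
trials = run_trials([1] * 40 + [0])
print(trials[0])    # first reward is unexpected: large RPE, DA at its ceiling
print(trials[39])   # reward now well predicted: DA close to the 0.2 baseline
print(trials[40])   # omission of an expected reward: negative RPE, DA at its floor
```

The three printed trials follow the pattern on the "DA Encodes Reward Prediction Error" slide. An unexpected reward elevates DA, a predicted reward leaves it near baseline, and an unexpected omission depresses it.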

Updating Synapses in the Model

$$
\begin{aligned}
w_{K,J}(n+1) = {} & w_{K,J}(n) \\
& + \alpha_w \, I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D(n) - D_{\mathrm{base}}]^+ \, [1 - w_{K,J}(n)] \\
& - \beta_w \, I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D_{\mathrm{base}} - D(n)]^+ \, w_{K,J}(n) \\
& - \gamma_w \, I_K(n) \, [\theta_{\mathrm{NMDA}} - S_J(n)]^+ \, [S_J(n) - \theta_{\mathrm{AMPA}}]^+ \, w_{K,J}(n)
\end{aligned}
$$

• Presynaptic activity: $I_K(n)$

• Postsynaptic activation: $S_J(n)$

• Synaptic strengthening (elevated DA, $D(n) > D_{\mathrm{base}}$): the $\alpha_w$ term

• Synaptic weakening (depressed DA, $D(n) < D_{\mathrm{base}}$): the $\beta_w$ term

• Synaptic weakening (postsynaptic activation between $\theta_{\mathrm{AMPA}}$ and $\theta_{\mathrm{NMDA}}$): the $\gamma_w$ term

Here $w_{K,J}(n)$ is the weight of the synapse from unit K to unit J on trial n, and $[x]^+ = \max(x, 0)$.
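The synaptic update on this slide multiplies three factors: presynaptic activity I_K(n), postsynaptic activation S_J(n) measured against the NMDA and AMPA thresholds, and DA measured against baseline. Below is a sketch of one trial's update for a single synapse. The parameter values are hypothetical, since the talk does not list them. D_base = 0.2 is assumed to match the DA released when RPE = 0.

One possible reading of "CTX-MSN plasticity requires a TAN pause" is this: if non-paused TANs block cortical input, I_K(n) is near zero and every term vanishes.

```python
def rectify(x: float) -> float:
    """[x]^+ = max(x, 0)."""
    return max(x, 0.0)


def update_weight(w: float, i_k: float, s_j: float, d: float, *,
                  alpha_w: float = 0.5, beta_w: float = 0.5, gamma_w: float = 0.1,
                  theta_nmda: float = 0.5, theta_ampa: float = 0.1,
                  d_base: float = 0.2) -> float:
    """One trial of the three-term update for the synapse K -> J.

    w: weight w_{K,J}(n); i_k: presynaptic activity I_K(n);
    s_j: postsynaptic activation S_J(n); d: DA level D(n).
    All parameter values are hypothetical illustrations.
    """
    above_nmda = rectify(s_j - theta_nmda)
    # Strengthening: strong pre + strong post + elevated DA; (1 - w) keeps w <= 1.
    strengthen = alpha_w * i_k * above_nmda * rectify(d - d_base) * (1.0 - w)
    # Weakening: strong pre + strong post + depressed DA; factor w keeps w >= 0.
    weaken_da = beta_w * i_k * above_nmda * rectify(d_base - d) * w
    # Weakening: postsynaptic activation between the AMPA and NMDA thresholds.
    weaken_weak_post = (gamma_w * i_k * rectify(theta_nmda - s_j)
                        * rectify(s_j - theta_ampa) * w)
    return w + strengthen - weaken_da - weaken_weak_post


w0 = 0.5
print(update_weight(w0, i_k=1.0, s_j=1.0, d=1.0))  # elevated DA: strengthens
print(update_weight(w0, i_k=1.0, s_j=1.0, d=0.0))  # depressed DA: weakens
print(update_weight(w0, i_k=1.0, s_j=0.3, d=1.0))  # weak postsynaptic activation: weakens
print(update_weight(w0, i_k=0.0, s_j=1.0, d=1.0))  # no presynaptic activity: unchanged
```

The `(1 - w)` and `w` factors make the update self-limiting, so weights stay between 0 and 1 for modest learning rates.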

Network Dynamics: Late Trial

SMA

Model Accounts for Electrophysiological Recordings from TANs


Model Accounts for Electrophysiological Recordings from MSNs






Outline

Fast Reacquisition


Fast reacquisition is evidence that extinction did not erase initial learning

Fast Reacquisition Mechanics

TANs quickly stop pausing and thereby protect cortico-striatal synapses

Partial Reinforcement Extinction (PRE)

Extinction is slower when acquisition is trained with partial reinforcement

PRE Mechanics

TANs take longer to stop pausing under partial reinforcement

Slowed Reacquisition

Phase / Condition   Ext2               Ext8               Prf2            Prf8
Acquisition         VI-30 sec          VI-30 sec          VI-30 sec       VI-30 sec
Extinction          No reinforcement   No reinforcement   Lean schedule   Lean schedule
Reacquisition       VI-2 min           VI-8 min           VI-2 min        VI-8 min

Woods and Bouton (2007)

Behavioral Results

Crossley, Horvitz, Balsam, & Ashby (in prep)

Modeling Results


TANs don’t stop pausing during extinction in Prf Conditions

CTX-MSN Synapse Pf-TAN Synapse

Renewal - Basic Design

Phase / Condition      ABA             AAB             ABC
Acquisition            Environment A   Environment A   Environment A
Extinction             Environment B   Environment A   Environment B
Renewal (Extinction)   Environment A   Environment B   Environment C

Bouton et al. (2011)

Renewal

Model Architecture


Synaptic Plasticity at ALL Pf-TAN Synapses


Renewal


ABA Mechanics


Net Pf-TAN synaptic weight is the average of all active Pf-TAN synapses
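The slide states only that the net Pf-TAN weight is the average over the active Pf-TAN synapses. Below is a sketch of that averaging. The input names ("cue" and one input per context) and their weight values are hypothetical numbers chosen for illustration; they are not simulation results.

```python
from typing import Dict, Iterable


def net_pf_tan_weight(weights: Dict[str, float], active: Iterable[str]) -> float:
    """Average the Pf-TAN weights of the currently active inputs."""
    active = list(active)
    if not active:
        raise ValueError("at least one Pf-TAN input must be active")
    return sum(weights[name] for name in active) / len(active)


# Hypothetical weights after acquisition in context A and extinction in context B.
weights = {"cue": 0.4, "context_A": 0.8, "context_B": 0.1}
print(net_pf_tan_weight(weights, ["cue", "context_B"]))  # extinction context
print(net_pf_tan_weight(weights, ["cue", "context_A"]))  # back in the acquisition context
```

Under these assumed numbers, the net weight is higher in context A than in context B. In this account, that would favor a renewed TAN pause on return to A, exposing the CTX-MSN learning that extinction left intact.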

Instrumental Conditioning Summary

• The TANs protect learning at CTX-MSN synapses.

• Manipulations that keep the TANs paused during extinction leave learning at the CTX-MSN synapse subject to change.
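Below is a toy, trial-level caricature of this summary; it is not the Ashby and Crossley (2011) spiking network. The simplifications are hypothetical: the TAN pause equals the Pf-TAN weight, CTX-MSN plasticity is scaled by that pause, and all learning rates and the 200/100-trial schedule are assumptions. DA follows the piecewise rule from the Computing RPE slide. Only the rate at which TANs stop pausing differs between the two runs.

```python
from typing import List

D_BASE = 0.2  # DA level when RPE = 0 under the piecewise DA rule


def da_from_rpe(rpe: float) -> float:
    if rpe > 1.0:
        return 1.0
    if rpe > -0.25:
        return 0.8 * rpe + 0.2
    return 0.0


def simulate(rewards: List[int], tan_ltd_rate: float, alpha: float = 0.02,
             ctx_rate: float = 0.1, tan_ltp_rate: float = 0.1) -> List[float]:
    """Toy gated-learning model; returns the CTX-MSN weight after each trial.

    Hypothetical simplifications: the TAN pause equals the Pf-TAN weight, and
    CTX-MSN plasticity is scaled by that pause (a paused TAN lets cortical
    input through). All rates are illustrative.
    """
    w_ctx, w_pf = 0.2, 0.0
    prediction, previous = 0.0, None
    history = []
    for reward in rewards:
        if previous is not None:
            prediction += alpha * (previous - prediction)
        da = da_from_rpe(reward - prediction)
        up = max(da - D_BASE, 0.0)     # elevated DA
        down = max(D_BASE - da, 0.0)   # depressed DA
        pause = w_pf
        # CTX-MSN weights change only to the extent that the TANs are paused.
        w_ctx += pause * ctx_rate * (up * (1.0 - w_ctx) - down * w_ctx)
        # Pf-TAN weights are not gated; tan_ltd_rate sets how fast the pause is lost.
        w_pf += tan_ltp_rate * up * (1.0 - w_pf) - tan_ltd_rate * down * w_pf
        history.append(w_ctx)
        previous = reward
    return history


schedule = [1] * 200 + [0] * 100               # acquisition, then extinction
fast = simulate(schedule, tan_ltd_rate=0.5)    # TANs stop pausing quickly
slow = simulate(schedule, tan_ltd_rate=0.01)   # TANs keep pausing
print(f"end of acquisition: {fast[199]:.2f}")
print(f"end of extinction, fast TAN unlearning: {fast[-1]:.2f}")
print(f"end of extinction, slow TAN unlearning: {slow[-1]:.2f}")
```

In this toy, the CTX-MSN weight survives extinction much better when the TANs stop pausing quickly. The slow-unlearning run stands in for a manipulation that keeps the TANs paused.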

I. A Neurobiological model of appetitive instrumental conditioning


III. Contribution to the Ivry Lab

Talk Goals

Category Learning: The Basics

A or B

Rule-Based Category Learning

Figure: category structure in stimulus space (axes: Spatial Frequency, Orientation)

Information-Integration Category Learning

Figure: category structure in stimulus space (axes: Spatial Frequency, Orientation)
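In the standard RB and II designs, the optimal RB strategy is a criterion on a single dimension, whereas the optimal II bound is diagonal, so both dimensions must be integrated. Below is a sketch using linear decision bounds. It assumes stimuli are points in a normalized (spatial frequency, orientation) space; the bound coefficients are hypothetical, not the category distributions used in these experiments.

```python
def linear_bound(freq: float, orient: float, a: float, b: float, c: float) -> str:
    """Respond 'A' if a*freq + b*orient + c > 0, otherwise 'B'."""
    return "A" if a * freq + b * orient + c > 0 else "B"


def rule_based(freq: float, orient: float) -> str:
    # Hypothetical one-dimensional rule: only spatial frequency matters (criterion 0.5).
    return linear_bound(freq, orient, a=-1.0, b=0.0, c=0.5)


def information_integration(freq: float, orient: float) -> str:
    # Hypothetical diagonal bound (orientation = frequency): both dimensions are weighed.
    return linear_bound(freq, orient, a=-1.0, b=1.0, c=0.0)


for stim in [(0.2, 0.9), (0.2, 0.1), (0.8, 0.9), (0.8, 0.1)]:
    print(stim, "RB:", rule_based(*stim), "II:", information_integration(*stim))
```

The two bounds agree on (0.2, 0.9) and (0.8, 0.1) but disagree on the other two stimuli. This is why a one-dimensional rule cannot reach perfect accuracy on a diagonal II structure.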

Many Qualitative Differences Between RB and II

                                   RB    II
Unsupervised learning              Yes   No
Observational learning             Yes   No
Dual-task interference             Yes   No
Time needed to process feedback    Yes   No
Interference from button switch    No    Yes
Interference from feedback delay   No    Yes

II Category Learning is a Procedural Skill

Major Research Themes

• Unlearning

• System Interaction

• Miscellaneous

Unlearning Experiment Design

Crossley, Maddox & Ashby (under review)

Phase / Condition   Active Condition        Meta-Learning Condition
Acquisition         True Feedback           True Feedback
Extinction          Feedback Manipulation   Feedback Manipulation
Reacquisition       True Feedback           True Feedback, New Categories

We Achieved Unlearning

Unlearning requires partially contingent feedback

Crossley, Maddox & Ashby (under review)

Theoretical Account: Network architecture and new DA model

Crossley, Maddox & Ashby (under review)

• DA is RPE scaled by response-feedback contingency
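The slide does not say how response-feedback contingency is estimated. This sketch therefore takes contingency as a given number between 0 (feedback unrelated to responding) and 1 (fully contingent). It also assumes that "scaled" means the RPE is multiplied by this value before the piecewise DA rule from the Computing RPE slide is applied. Both choices are assumptions, not the published DA model.

```python
def da_from_rpe(rpe: float) -> float:
    """Piecewise-linear DA rule from the Computing RPE slide."""
    if rpe > 1.0:
        return 1.0
    if rpe > -0.25:
        return 0.8 * rpe + 0.2
    return 0.0


def contingency_scaled_da(rpe: float, contingency: float) -> float:
    """DA released when the RPE is first scaled by an (assumed) contingency in [0, 1]."""
    if not 0.0 <= contingency <= 1.0:
        raise ValueError("contingency must lie in [0, 1]")
    return da_from_rpe(contingency * rpe)


for c in (1.0, 0.5, 0.0):
    print(c, contingency_scaled_da(1.0, c), contingency_scaled_da(-1.0, c))
```

Under this assumed form, zero contingency leaves DA at the 0.2 baseline whatever the feedback. In the synaptic update rule, that means neither the DA-driven strengthening term nor the DA-driven weakening term is active.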


• Unlearning


• Miscellaneous

System Interaction Theme

• Development of the TAN pause precedes development of category-specific responses in MSNs

• TANs should stop pausing during extinction (i.e., reward removal in instrumental conditioning and noncontingent feedback in category learning).

• Phasic DA response should be scaled by response-feedback contingency.

• Do systems cooperate to learn optimal behavior?

• What does it take to get system-switching?

• Does the procedural system learn during declarative control?

• What mechanistic models describe system switching throughout learning?

• What is the correct neurobiological model of system switching?

Do Systems Cooperate?

Perfect accuracy is possible with trial-by-trial switching between RB and II strategies

Ashby & Crossley (2010)

2 days (1200 trials) of training on:

• Information-Integration

• Uniform Hybrid

• Non-Uniform Hybrid

Systems Compete

Decision-Bound Model Fit Summary

Figure: number of participants best fit by Guessing, Rule-Based, Information-Integration, and Hybrid models

Almost nobody was best fit by a hybrid model

Ashby & Crossley (2010)

What does it take to get successful system switching?

Figure: four categories (A, B, C, D)

Behavioral: Crossley, Roeder & Ashby (in prep)

fMRI: Turner, Crossley & Ashby (in prep)

Crossley, Roeder & Ashby (in prep)

Successful System-Switching

Training Protocol

• 100 RB trials

• 400 II trials

• 300 intermixed trials

• 100 button-switched intermixed trials

Successful System-Switching

Button Switch

Crossley, Roeder & Ashby (in prep)

Persistent button-switch interference on II trials but not RB trials supports true system switching

Figure: Button Switch Interference (two panels)

Does the procedural system learn during declarative control?

Conditions

• Transfer Positive

• All Positive

• Transfer Negative

• All Negative

Crossley & Ashby (in prep)

Potential for weak bootstrapping

Small but significant performance cost in the Transfer Negative condition during the first 50 trials after transfer

Figure panels: Train, Transfer

Crossley & Ashby (in prep)







• Does the II system learn during RB control?



Explicitly Modeling System Switching

Turner, Crossley & Ashby (in prep)







• Does the II system learn during RB control?



Neurobiological Models of System Interaction


• Unlearning


• Miscellaneous

Category Structure and Feedback Effects




• What system learns unstructured categories?

• Does probabilistic feedback induce procedural learning?

The Experiment

Crossley, Madsen & Ashby (in prep)

Conditions

• Unstructured - Deterministic

• Unstructured - Probabilistic

• Rule-based - Deterministic

• Rule-based - Probabilistic

The Experiment


The Experiment


Figure panels: Button Switch Interference (Accuracy); Button Switch Interference (Reaction Time)

Button-switch effect on unstructured categories suggests procedural control

Learning Under a Dual-Task




• Hypothesis 1: Dual-task induces procedural control.

• Hypothesis 2: Dual-task only slows the declarative system down.

RB category learning with a simultaneous numerical Stroop task

The Experiment

Paul, Crossley & Ashby (in prep)

• Every participant learns either RB or II category structures with:

- Single-task, button-switch

- Dual-task, button-switch

The Experiment

Paul, Crossley & Ashby (in prep)

I. A Neurobiological model of appetitive instrumental conditioning


III. Contribution to the Ivry Lab

Talk Goals

I. Lots of room to build spiking networks

• Hand / Object Choice networks

• Inhibitory Control and Competition Resolution

• Supervised learning in the cerebellum

• Model of timing in instrumental conditioning

II. Object choice, hand choice, and categorization: Experiment ideas

Contribution to the Ivry Lab

Spiking Networks of Hand and Object Choice

Motivation

• Predictive clarity

• Model-based imaging

• Natural ability to account for patient data

• Generate new experiments

Supervised Learning in the Cerebellum

Hypothesized hand-choice and object-choice brain systems operate with different learning algorithms.

Doya, 2000

Spiking Networks of IC and CR

• Role of the hyperdirect pathway?

• Relationship to our studies of system switching?

I. Many of the tools used to dissociate RB and II category learning systems might be used to dissociate hand choice from object choice, and subsystems thereof.

• Feedback delay

• Time needed to process feedback

• Feedback contingency

• Automaticity

Object choice, hand choice, and categorization experiment ideas

Acknowledgments

Collaborators:

Greg Ashby

The Ashby Lab

Todd Maddox

Jon Horvitz

Peter Balsam

Funding:

NIMH Grant MH3760-2, Todd Wilkinson