Tracking the evolution of learning on a complex visualmotor task
Devika Subramanian, Rice University

Page 1: Tracking the evolution of learning on a complex

Tracking the evolution of learning on a complex visualmotor task

Devika Subramanian
Rice University

Page 2: Tracking the evolution of learning on a complex

Outline
A quick overview of current research
Tracking the evolution of human learning on a complex visualmotor task

Page 3: Tracking the evolution of learning on a complex

Research question

[Diagram: a system coupled to its environment; sensors deliver perceptions, actuators carry out actions]

Goal: build systems that adapt to changes effectively.

Page 4: Tracking the evolution of learning on a complex

Research questions
What aspects of the task, environment and its internal dynamics does a system have to model for autonomous decision making?
How can a system build and maintain such models in changing environments?
How can a system with limited resources efficiently use these models to make decisions?

Page 5: Tracking the evolution of learning on a complex

Current projects
Modeling learning on a complex visual-motor task (ONR)
Adaptive compilers (with Cooper & Torczon; NSF ITR, ATP and ARPA)
Predicting militarized interstate disputes (with Stoll; NSF ITR, Intel)
Designing controllers for life support systems (NASA)
Designing opto-mechanical systems from specifications of behavior (NSF)
Virgil, the Rice robotic tour guide (School of Engg, Rice)
Predicting protein-protein interactions (with K. Matthews; Keck Foundation)

Page 6: Tracking the evolution of learning on a complex

Adaptive systems for analyzing and predicting conflict

Supported by NSF ITR, Oct 2002.

Page 7: Tracking the evolution of learning on a complex

Questions
Given a non-stationary time series, determine its properties: e.g., what is the nature of the non-stationarity?
Can the series be segmented into quasi-stationary segments over which predictive models can be learned?
How can we detect the onset of change of dynamics in a time series?

Page 8: Tracking the evolution of learning on a complex

Analyzing conflict

Page 9: Tracking the evolution of learning on a complex

Analyzing conflict

Page 10: Tracking the evolution of learning on a complex

The need for adaptive compilers

Today's production compilers are not sensitive to new compilation objectives, e.g., power and space.
– They use the same optimization sequence independent of the characteristics of the input program!

Supported by NSF ITR, July 2002.

Page 11: Tracking the evolution of learning on a complex

The need for adaptive compilers

Building a high quality compiler is an expensive, labor-intensive effort that requires experts who are in short supply.
Compiler experts determine code transformation/optimization sequences for any program presented to the compiler (chosen by the –O option).

Page 12: Tracking the evolution of learning on a complex

Combinatorics

Large solution space.
Discrete, non-linear objective function.
How do we sample the space to get a good solution?

Page 13: Tracking the evolution of learning on a complex

Solution
Use probabilistic models of the effects of transformation sequences on classes of programs, for different objective functions.
Learn these models by biased random sampling of the space of possible sequences.

Page 14: Tracking the evolution of learning on a complex

Code space optimization experiments

We ran a biased random sampler to find optimization sequences for several benchmark programs
– Fortran: fmin, rkf45, seval, solve, svd, urand, zeroin (FMM benchmarks), tomcatv (SPEC).
– C: adpcm, compress, fft, dfa, dhrystone, nsieve.

Page 15: Tracking the evolution of learning on a complex

Results
Space optimization (LCTES99, Schielke 2001)
– 13% smaller code than a fixed sequence (0 to 41%)
– Code was generally faster (26% to -25%; 5 of 14 slower)
– Best methods at this point yield a 5% space advantage at the cost of a 2% slowdown in time.

Page 16: Tracking the evolution of learning on a complex

Adaptive Compilers

[Diagram: the front end remains intact; a steering algorithm varies parameters of the optimizer and back end, guided by an objective function, to produce executable code]

We are exploring new organizing principles
– Explicit objective function (chosen by the user)
– Biased random sampler controls the optimizer and back end.

Page 17: Tracking the evolution of learning on a complex

Outline
A quick overview of current research
Tracking the evolution of human learning on a complex visualmotor task

Page 18: Tracking the evolution of learning on a complex

The context: training submarine pilots

[Diagram: a human agent interacting with the NRL task]

Track the evolution of a human learning a visualmotor task with a significant strategic component, and alter the training protocol to improve the speed and efficacy of that learning.

Page 19: Tracking the evolution of learning on a complex

Goal of project
Construct computational models of human learning based on performance data gathered during task learning.
– Models will be used to diagnose problems in learning and aid in the design of training protocols that help humans achieve high levels of competence on the task.

A computational microscope for training: can we infer cognitive constructs from objective performance data?

Page 20: Tracking the evolution of learning on a complex

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A hybrid model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

Page 21: Tracking the evolution of learning on a complex

The NRL Navigation Task

Page 22: Tracking the evolution of learning on a complex

The NRL Navigation Task

Page 23: Tracking the evolution of learning on a complex

Mathematical characteristics of the NRL task

A partially observable Markov decision process which can be made fully observable by augmentation of the state with the previous action.
State space of size 10^14; at each step a choice of 153 actions (17 turns and 9 speeds).
Challenging for both humans and machines.

Page 24: Tracking the evolution of learning on a complex

Challenges for a human learner

A task with a significant strategic and a visual-motor component.
Need for rapid decision making with incomplete information.
The sheer number (10^14) of sensor panel configurations and action choices (153).
Binary feedback at the end of an episode (200 steps).

Page 25: Tracking the evolution of learning on a complex

Experiments on human subjects

Conducted at San Diego with an ASL eyetracker.
5 subjects, five one-hour sessions each.
60 mines, small mine drift, small sensor noise.
Collected visualmotor data, verbal protocols and eyetracker data.

Page 26: Tracking the evolution of learning on a complex

Learning curves (success)

[Figure: success % (0–100) vs. episode (1–751) for subjects S1–S5]

Page 27: Tracking the evolution of learning on a complex

Learning curves (explosions)

[Figure: explosion % (0–120) vs. episode (1–721) for subjects S1–S5]

Page 28: Tracking the evolution of learning on a complex

Learning curves (timeouts)

[Figure: timeout % (0–120) vs. episode (1–766) for subjects S1–S5]

Page 29: Tracking the evolution of learning on a complex

Observations on human learning

Learning curves are qualitatively similar for successful learners.
– This raises hope for a common learning model!
Success learning curves are similar for unsuccessful learners, but timeout and explosion curves show individual differences in failure to learn the task.

Page 30: Tracking the evolution of learning on a complex

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

Page 31: Tracking the evolution of learning on a complex

Building Representative Models

Behavioral equivalence (similarity in learning curves)

[Diagram: the player maps percepts p(t) to actions a(t) on the task; the model maps the same percepts to actions a'(t); the two are compared by their success-% learning curves over time]

Page 32: Tracking the evolution of learning on a complex

Challenges in modeling learning
It is not possible to gather objective data from subjects about their strategy; the game doesn't allow useful verbalizations during play, and post-play explanations are often inaccurate and incomplete.

Page 33: Tracking the evolution of learning on a complex

Cognitive modeling by machine learning
Treat the low-level visualmotor data stream as ground truth from which to induce models.
A model m: sensor history → actions is an approximation of a subject's strategy function, learned directly from the available (p(t), a(t)) time series data.
Cognitive modeling is a data compression problem!

Page 34: Tracking the evolution of learning on a complex

Difficulties

High dimensionality of visual-motor data (11 dimensions spanning a space of size 10^14)
Noise in visual-motor data
– lapses of attention.
– joystick hysteresis.
Non-stationarity
– Subjects have static periods followed by radical conceptual shifts which usually trigger significant performance gains.

Page 35: Tracking the evolution of learning on a complex

Action distribution close to mines (subject 1, day 2)

Page 36: Tracking the evolution of learning on a complex

Action distribution close to mines (subject 1, day 4)

Page 37: Tracking the evolution of learning on a complex

Action distribution close to mines (subject 1, day 5)

Page 38: Tracking the evolution of learning on a complex

Action distribution far from mines (subject 1, day 5)

Page 39: Tracking the evolution of learning on a complex

Formulating the modeling task
Given: an episodic non-stationary time series
– episode 1: (sv0,a0),(sv1,a1)…(svn,an)
– episode 2: …
– episode N:
Find:
– stationary segments in the data.
– an appropriate class of models m: sensor history → actions to fit the stationary segments.

Page 40: Tracking the evolution of learning on a complex

Model class selection

m: sensor history → actions
– What prefix of the sensor history determines actions at time t?
– How do we abstract the sensor space? Response-equivalent partitions.

Page 41: Tracking the evolution of learning on a complex

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A hybrid model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

Page 42: Tracking the evolution of learning on a complex

A near-optimal player

A three-part deterministic controller solves the task!
The only information required about the previous state is the last turn made.
A very coarse discretization of the state space is needed: about 1000 states!
Discovering this solution was not easy!

Page 43: Tracking the evolution of learning on a complex

Part 1: Seek Goal

There is a clear sonar in the direction of the goal.

If the sonar in the direction of the goal is clear, follow it at a speed of 20; unless the goal is straight ahead, in which case travel at speed 40.

Page 44: Tracking the evolution of learning on a complex

Part 2: Avoid Mine

There is a clear sonar, but not in the direction of the goal.

Turn at zero speed to orient with the first clear sonar counted from the middle outward. If the middle sonar is clear, move forward with speed 20.

Page 45: Tracking the evolution of learning on a complex

Part 3: Gap Finder

There are no clear sonars.

If the last turn was non-zero, turn again by the same amount; else initiate a soft turn by summing the right and left sonars and turning in the direction of the lower sum.
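The three rules above can be sketched as a single decision procedure. This is a hedged reconstruction, not the actual NRL controller: the slides fix only the clear-threshold of 50, the speeds 20/40, and the turn-at-zero-speed behavior, while the sonar indexing (7 sonars, index 3 straight ahead), the turn magnitudes, and the goal encoding below are assumptions.

```python
# A minimal sketch of the three-part deterministic controller from the
# preceding slides. Assumed encodings: 7 sonars indexed 0..6 (3 is straight
# ahead; "clear" means a reading > 50), the goal given as a sonar index or
# None if no sonar points at it, and turns proportional to angular offset.

CLEAR = 50
MIDDLE = 3
MIDDLE_OUT = [3, 2, 4, 1, 5, 0, 6]  # middle sonar first, then outward

def step(sonars, goal_idx, last_turn):
    """Return (turn, speed) for one decision step."""
    # Part 1: Seek Goal -- the sonar toward the goal is clear.
    if goal_idx is not None and sonars[goal_idx] > CLEAR:
        if goal_idx == MIDDLE:
            return 0, 40                      # goal straight ahead: fast
        return 8 * (goal_idx - MIDDLE), 20    # gentle turn toward goal
    # Part 2: Avoid Mine -- some sonar is clear, but not toward the goal.
    for i in MIDDLE_OUT:
        if sonars[i] > CLEAR:
            if i == MIDDLE:
                return 0, 20                  # middle clear: move forward
            return 8 * (i - MIDDLE), 0        # orient at zero speed
    # Part 3: Gap Finder -- no clear sonars.
    if last_turn != 0:
        return last_turn, 0                   # repeat the last turn
    left, right = sum(sonars[:MIDDLE]), sum(sonars[MIDDLE + 1:])
    return (-8, 0) if left < right else (8, 0)  # soft turn toward lower sum
```

Note how the three parts are tried in priority order, so the controller's behavior is determined entirely by the current sonar readings plus the single remembered value, the last turn.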

Page 46: Tracking the evolution of learning on a complex

Performance of optimal player

Player                Success %  Behavior
Opt. player           99.7%      Baseline
Opt. player – Part 3  79.9%      Oscillates/times out
Opt. player – Part 2  98.3%      Times out
Opt. player – Part 1  7.3%       Never gets to goal/times out
Part 1                50.1%      Aggressive goal seeker/blows up

Mine density = 60
All results reported for 10,000 episodes

Page 47: Tracking the evolution of learning on a complex

Properties of optimal player
Reflects the task decomposition found in human players.
– However, the sub-goals are very coupled, and this coupling is what is hard for humans to learn.
Keys to success:
– state space partitioning; a threshold of 50 on the sonar value makes the right compromise between succeeding, timing out and blowing up.
– turning at zero speed.
– turning consistently in a given direction to find a gap in the mines.

Page 48: Tracking the evolution of learning on a complex

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

Page 49: Tracking the evolution of learning on a complex

A modeling approach
Abstraction of sensor space
– view sensors through the prism of equivalence classes defined by a near-optimal policy for the task.
– Extract the subject's Part 1 and Part 2 policies as probability distributions on actions, and the Part 3 policy as a hidden Markov model.
Advantage
– deviations from optimal can be the basis for directed training of subjects.
Disadvantage
– humans may not adopt anything close to the conceptualization needed for optimal play.

Page 50: Tracking the evolution of learning on a complex

A model

[Diagram: sensors feed three components, the Part 1 action distribution, the Part 2 action distribution and the Part 3 HMM, which together produce the action]

A very small number of parameters is sufficient to capture a subject. Can acquire the subject model online!

Page 51: Tracking the evolution of learning on a complex

Model extraction algorithm

To find stationary subsequences, segment the data using KL divergence on all distributions
– chunk the data into uniform segments.
– for each segment, compute the Part 1, 2 and 3 distributions.
– for each Part, compute the KL divergence between successive segments, and identify change points as those segment boundaries where the measure changes significantly.
Given a stationary subsequence of visual-motor data
– learn the Part 1 and Part 2 conditional action distributions from the data (a counting process)
– obtain the action sequences in Part 3 and learn an HMM.
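The segmentation step above can be sketched as follows. This is a simplified illustration, not the slides' algorithm: the segment length, add-one smoothing, and threshold are made-up parameters, and a faithful version would compute separate Part 1/2/3 distributions per segment rather than one distribution over all actions.

```python
# Sketch of change-point detection by KL divergence between successive
# segments: chunk episodes into uniform segments, fit an empirical action
# distribution per segment, and flag boundaries where the divergence jumps.
import math
from collections import Counter

def action_distribution(actions, action_space):
    """Empirical action distribution with add-one smoothing so the
    KL divergence below is always finite."""
    counts = Counter(actions)
    total = len(actions) + len(action_space)
    return {a: (counts[a] + 1) / total for a in action_space}

def kl_divergence(p, q):
    return sum(p[a] * math.log(p[a] / q[a]) for a in p)

def change_points(episodes, action_space, seg_len=10, threshold=0.5):
    """Return indices of segment boundaries where the policy shifts."""
    segs = [episodes[i:i + seg_len] for i in range(0, len(episodes), seg_len)]
    dists = [action_distribution([a for ep in seg for a in ep], action_space)
             for seg in segs]
    return [i for i in range(1, len(dists))
            if kl_divergence(dists[i - 1], dists[i]) > threshold]
```

A subject who plays one policy for 200 episodes and then switches produces a single large divergence at the switch boundary, which this routine reports as the lone change point.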

Page 52: Tracking the evolution of learning on a complex

Evolution of gap finding strategy

Subject Col: episodes 45–67 and episodes 68–90 on day 2. The subject learns to turn in place.

HMM models for gap finding.

Page 53: Tracking the evolution of learning on a complex

Pre-shift gap finding strategy

Transitions (states 1, 2, 3):
1→1: 0.79   1→2: 0.21
2→2: 0.995  2→3: 0.005
3→3: 1

Action emission probabilities:
        1      2      3
0       0.84   0.00   0.36
right   0.03   0.003  0.33
left    0.05   0.00   0.31
other   0.08   0.997  0.00

Page 54: Tracking the evolution of learning on a complex

Post-shift gap finding strategy

Transitions (states 1, 2, 3):
1→1: 0.73  1→2: 0.27
2→2: 0.62  2→3: 0.38
3→3: 1

Action emission probabilities:
        1      2      3
0       0.82   0.05   0.37
right   0.024  0.00   0.553
left    0.025  0.95   0.07
other   0.131  0.00   0.00

Page 55: Tracking the evolution of learning on a complex

Results

Pre-shift   Successes  Explosions  Timeouts  Total episodes
Col         0          12          11        23
Model       0          17          6         23

Post-shift  Successes  Explosions  Timeouts  Total episodes
Col         0          2           13        15
Model       0          4           11        15

A better fit than using C4.5.

Page 56: Tracking the evolution of learning on a complex

Problems with model

Very sensitive to the choice of equivalence classes; the near-optimal policy does not always provide the right classes to model subjects accurately.
The fit of the learning curve worsens, especially for later days in training, as the subject becomes an expert.
However, it is still the best way to summarize the strategy adopted by the human at a high level.

Page 57: Tracking the evolution of learning on a complex

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A hybrid model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

Page 58: Tracking the evolution of learning on a complex

Machine learning of the NRL task

What does it take to get machines to learn the task?
Can machine learners achieve higher levels of competence?
How does the sample complexity of machine learning compare with humans?
Can we use machine learning to improve human learning?

Page 59: Tracking the evolution of learning on a complex

Reinforcement learning

[Diagram: the learner sends actions to the task and receives state and reward in return, generating the trajectory s1,a1,r1,s2,a2,r2,…]

Page 60: Tracking the evolution of learning on a complex

Reinforcement learning

Representational hurdles
– The state space has to be manageably small.
– Good intermediate feedback in the form of a non-deceptive progress function is needed.
Algorithmic hurdles
– An appropriate credit assignment policy is needed.
– The sum-of-rewards assessment criterion is too slow to converge.

Page 61: Tracking the evolution of learning on a complex

State space design

Binary distinction on each sonar: is it > 50?
Six distinctions on bearing: 12, {1,2}, {3,4}, {5,6,7}, {8,9}, {10,11}
State space size = 2^7 * 6 = 768.
Discretization of actions
– speed: 0, 20 and 40.
– turn: -32, -16, -8, 0, 8, 16, 32.
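The discretization above can be sketched directly. The bearing classes, thresholds, and action sets come from the slide; the state-id packing and the assumption of 7 sonars (so that 2^7 * 6 = 768) are mine.

```python
# Sketch of the state and action discretization described on this slide:
# each of 7 sonars is binarized at 50 and the 12-valued bearing is mapped
# to six classes, giving 2**7 * 6 = 768 abstract states.

BEARING_CLASSES = [(12,), (1, 2), (3, 4), (5, 6, 7), (8, 9), (10, 11)]

def discretize(sonars, bearing):
    """Map 7 raw sonar values and a bearing in 1..12 to a state id in 0..767."""
    sonar_bits = sum((1 << i) for i, s in enumerate(sonars) if s > 50)
    bearing_cls = next(i for i, cls in enumerate(BEARING_CLASSES)
                       if bearing in cls)
    return sonar_bits * 6 + bearing_cls

SPEEDS = [0, 20, 40]
TURNS = [-32, -16, -8, 0, 8, 16, 32]
ACTIONS = [(t, s) for t in TURNS for s in SPEEDS]  # 21 discrete actions
```

This reduces the raw 10^14 sensor configurations to a table small enough for tabular reinforcement learning.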

Page 62: Tracking the evolution of learning on a complex

The progress function

r(s,a,s') = 0 if s' is a state where the player hits a mine
          = 1 if s' is a goal state
          = 0.5 if s' is a timeout state
          = 0.75 if s is a Part 3 state and s' is a Part 1 or Part 2 state
          = 0.5 + sum of sonars/1000 if s' is a Part 3 state
          = 0.5 + range/1000 + abs(bearing - 6)/40 otherwise
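The definition above, transcribed as code. The case values are from the slide; passing the state predicates in as precomputed flags (rather than deriving them from the raw game state) is a simplification of mine.

```python
# Sketch of the progress function r(s, a, s') defined on this slide.
# The boolean arguments describe s and s'; sonars, rng (range) and bearing
# describe the successor state s'.

def progress(hit_mine, at_goal, timed_out, s_in_part3, sp_in_part3,
             sonars, rng, bearing):
    """Local reward for the transition s -> s'."""
    if hit_mine:
        return 0.0
    if at_goal:
        return 1.0
    if timed_out:
        return 0.5
    if s_in_part3 and not sp_in_part3:
        return 0.75                       # escaped a no-clear-sonar state
    if sp_in_part3:
        return 0.5 + sum(sonars) / 1000.0
    return 0.5 + rng / 1000.0 + abs(bearing - 6) / 40.0
```

Note that the function is locally non-deceptive by design: every non-terminal reward sits above the mine-hit reward of 0, and moving out of a blocked (Part 3) state is always worth more than staying in one.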

Page 63: Tracking the evolution of learning on a complex

Credit assignment policy

Penalize only the last action in a sequence which ends in an explosion.
Penalize all actions in a sequence which ends in a timeout.

Page 64: Tracking the evolution of learning on a complex

Simplification of value estimation

Estimate the average local reward for each action in each state.
– A big change from learning the sum of rewards from each state.
Q(s,a) is the sum of rewards from s to the terminal state. Here we only maintain the local reward at state s.

Page 65: Tracking the evolution of learning on a complex

Staged learning

First learn turns alone, with the speed supplied by the near-optimal player.
Next learn both turn and speed.
Differences in the two learners suggest new protocols for training humans.

Page 66: Tracking the evolution of learning on a complex

Results of learning turns

Page 67: Tracking the evolution of learning on a complex

Turn learner/600 episodes

Page 68: Tracking the evolution of learning on a complex

Turn learner/10,000 episodes

Page 69: Tracking the evolution of learning on a complex

Turn learner/failure after 10K

Page 70: Tracking the evolution of learning on a complex

Learning of complete policy
The estimate of average local reward is not a perfect substitute for the global sum of rewards.
Make the action choice based on the estimated local reward weighted by the global measure of wins/(wins+timeouts) from that state.
Optimistic initialization of Q values.
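The action-choice rule above can be sketched as follows. The data structures (tables of local rewards, win/timeout counts, and predicted successors) and the optimistic default for unseen states are assumptions; the slide specifies only the weighting wins/(wins+timeouts).

```python
# Sketch of the weighted action choice: score each action by its estimated
# local reward times the global win ratio of the predicted successor state.

def choose_action(s, actions, local_q, wins, timeouts, successor):
    """Pick the action maximizing local reward weighted by win ratio."""
    def score(a):
        sp = successor[(s, a)]                      # predicted next state
        total = wins[sp] + timeouts[sp]
        ratio = wins[sp] / total if total else 1.0  # optimistic when unseen
        return local_q[(s, a)] * ratio
    return max(actions, key=score)
```

The win-ratio weight is what restores a global signal: an action with slightly lower immediate reward still wins if its successor state has historically led to the goal rather than to timeouts.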

Page 71: Tracking the evolution of learning on a complex

Results of learning complete policy

Page 72: Tracking the evolution of learning on a complex

Full Q learner/1500 episodes

Page 73: Tracking the evolution of learning on a complex

Full Q learner/10000 episodes

Page 74: Tracking the evolution of learning on a complex

Full Q learner/failure after 10K

Page 75: Tracking the evolution of learning on a complex

Why learning takes so long

Page 76: Tracking the evolution of learning on a complex

Effect of discretization

Page 77: Tracking the evolution of learning on a complex

Lessons from machine learning
Why the task is hard: the most frequently occurring state occurs 45% of the time; all others occur less than 5%.
The long sequence of moves makes credit assignment hard.
Staged learning makes the task easier, and might help humans acquire the task more easily.
Need for a locally non-deceptive reward function to speed up training. Can giving the progress function as hints to human players help?

Page 78: Tracking the evolution of learning on a complex

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A hybrid model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

Page 79: Tracking the evolution of learning on a complex

Direct models
How well can stateless stochastic models of the form m: sensors → P(actions) match subject learning curves?
– Associate with every observed sensor configuration the distribution of actions taken by the player at that configuration.
Advantages:
– no need to abstract the sensor space.
– model construction can be done in real time!

Page 80: Tracking the evolution of learning on a complex

Surely, this can't work!

There are 10^14 sensor configurations possible in the NRL Navigation task.
However, only between 10^3 and 10^4 of those configurations are actually observed by humans in a training run of 600 episodes.
Exploit sparsity in the sensor configuration space to build a direct model of the subject.

Page 81: Tracking the evolution of learning on a complex

Model construction

Segmentation of episodic data

[Diagram: episodes from the start of training to the end of training, divided into segments]

Fitting models of the form sensors → P(actions) on the stationary segments.

Page 82: Tracking the evolution of learning on a complex

Model Derivative

dm/dt = KLdiv(Π(i-2w+s, i-w+s), Π(i-w, i))

where Π(a, b) is the model fitted on the window of episodes from a to b; successive windows have width w and overlap s.

Empirical optimum: w = 20, s = 5
Computed by Monte Carlo sampling (stabilizes after 5% of entries are sampled)
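The model derivative can be sketched as a sliding-window computation over the episode stream. This is an illustrative reconstruction: the windowing (width w = 20, overlap s = 5) matches the slide, while the add-one smoothing and the dictionary output format are assumptions.

```python
# Sketch of the model derivative: at each position i, compare the action
# distributions fitted on two width-w windows of episodes overlapping by s.
import math
from collections import Counter

def dist(actions, space):
    """Add-one-smoothed empirical action distribution."""
    c = Counter(actions)
    n = len(actions) + len(space)
    return {a: (c[a] + 1) / n for a in space}

def kl(p, q):
    return sum(p[a] * math.log(p[a] / q[a]) for a in p)

def model_derivative(episodes, space, w=20, s=5):
    """dm/dt(i) = KL( Pi(i-2w+s, i-w+s) || Pi(i-w, i) )."""
    def flatten(eps):
        return [a for ep in eps for a in ep]
    deriv = {}
    for i in range(2 * w - s, len(episodes) + 1):
        older = dist(flatten(episodes[i - 2 * w + s:i - w + s]), space)
        newer = dist(flatten(episodes[i - w:i]), space)
        deriv[i] = kl(older, newer)
    return deriv
```

A flat derivative indicates a static period; a spike marks a strategy shift of the kind shown on the next few slides.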

Page 83: Tracking the evolution of learning on a complex

Model derivative for Cea

Page 84: Tracking the evolution of learning on a complex

Before shift: Cea (episode 300)

Page 85: Tracking the evolution of learning on a complex

After shift: Cea (episode 320)

Page 86: Tracking the evolution of learning on a complex

Model derivative for Col

Gap strategy shift here

Page 87: Tracking the evolution of learning on a complex

Model derivative for Hei

Page 88: Tracking the evolution of learning on a complex

How humans learn

Subjects have relatively static periods of action policy choice, punctuated by radical shifts.
Successful learners have conceptual shifts during the first part of training; unsuccessful ones keep trying until the end of the protocol!

Page 89: Tracking the evolution of learning on a complex

How the model is used

To compute the action a associated with the current sensor configuration s:
– take the 100 nearest neighbors of s in the lookup table.
– compute a weighted average of the actions taken by these neighbors, OR
– perform locally weighted regression (LWR) on these 100 (s,a) pairs.
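The weighted-average variant can be sketched as a nearest-neighbor lookup. The distance metric (Euclidean over sensor configurations) and the inverse-distance weighting are assumptions; the slide specifies only "100 neighbors" and "weighted average".

```python
# Sketch of the direct-model lookup: find the (up to) k nearest stored
# sensor configurations and average their actions with inverse-distance
# weights. A real implementation would use a spatial index, not a sort.
import math

def predict_action(s, table, k=100):
    """table: list of (sensor_tuple, action) pairs; s: query configuration."""
    def d(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    nearest = sorted(table, key=lambda entry: d(entry[0], s))[:k]
    weights = [1.0 / (1e-9 + d(cfg, s)) for cfg, _ in nearest]
    return sum(w * a for w, (_, a) in zip(weights, nearest)) / sum(weights)
```

Because the weighted average only blends actions that were actually observed, it can never extrapolate outside the subject's behavior, which is exactly the property the LWR comparison on the next slides turns on.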

Page 90: Tracking the evolution of learning on a complex

Evaluation protocol

Same mine configurations as the subject.
The model is switched on segment boundaries.
Cross-validation method on each segment:
– Train on 9/10ths of the data
– Test on the left-out chunk

Page 91: Tracking the evolution of learning on a complex

Results: w.avg. vs. LWR

Page 92: Tracking the evolution of learning on a complex

LWR is worse: why?

LWR performs worse than the weighted average.
– data sparsity implies it should be otherwise
Reason: LWR extrapolates often
– shown by the timeout record

Page 93: Tracking the evolution of learning on a complex

Biased dimension elimination

Projecting out dimensions to force interpolation

[Diagram: in a two-dimensional percept space, a candidate percept is projected onto one percept dimension so that the stored percepts surround it and the prediction interpolates rather than extrapolates]

Page 94: Tracking the evolution of learning on a complex

Results: use of bde with LWR

Page 95: Tracking the evolution of learning on a complex

Richer models: internal state

Remember the past k actions:
f_k = <p_t, a_{t-1}, ..., a_{t-k}>
k-gram models: experimented with k = 1, 2, 3
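The k-gram feature construction can be sketched directly; the tuple encoding below is an illustrative choice, not the slides' representation.

```python
# Sketch of the k-gram feature f_k = <p_t, a_{t-1}, ..., a_{t-k}>:
# the current percept augmented with the k most recent actions.

def kgram_feature(percepts, actions, t, k):
    """Feature for time t: percept at t plus the previous k actions."""
    history = tuple(actions[t - j] for j in range(1, k + 1))
    return (percepts[t],) + history
```

The direct model is then trained on these augmented keys instead of the raw percept, so the lookup distinguishes otherwise identical sensor configurations by the recent action history.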

Page 96: Tracking the evolution of learning on a complex

Results: 1-gram models

Page 97: Tracking the evolution of learning on a complex

Increasing state preferentially

Add additional history information for sensor configurations "close to mines".
Two-tier model
– tier 1: in far-mine configurations, use the 1-gram model
– tier 2: in close-mine configurations, use
f_7 = <p_t, p_{t-1}, a_{t-1}, a_{t-3}, a_{t-5}, a_{t-7}>

Page 98: Tracking the evolution of learning on a complex

Results: 2-tier models

Page 99: Tracking the evolution of learning on a complex

Subject Cea: Day 5: 1

Subject Model

Page 100: Tracking the evolution of learning on a complex

Subject Cea: Day 5: 2

Subject Model

Page 101: Tracking the evolution of learning on a complex

Subject Cea: Day 5: 3

Subject Model

Page 102: Tracking the evolution of learning on a complex

Subject Cea: Day 5: 4

Subject Model

Page 103: Tracking the evolution of learning on a complex

Subject Cea: Day 5: 5

Subject Model

Page 104: Tracking the evolution of learning on a complex

Subject Cea: Day 5: 6

Subject Model

Page 105: Tracking the evolution of learning on a complex

Subject Cea: Day 5: 7

Subject Model

Page 106: Tracking the evolution of learning on a complex

Subject Cea: Day 5: 8

Subject Model

Page 107: Tracking the evolution of learning on a complex

Subject Cea: Day 5: 9

Subject Model

Page 108: Tracking the evolution of learning on a complex

Comparison with global methods

Page 109: Tracking the evolution of learning on a complex

Result summary

We can model subjects on the NRL task in real time, achieving excellent fits to their learning curves, using the technique of 1-gram/bde-LWR/2-tier on the available visual-motor data stream.

Page 110: Tracking the evolution of learning on a complex

Conclusions
We have used inductive machine learning techniques to construct compact cognitive models in real time from the vast empirical visual-motor data gathered from subjects.
Direct models offer the best approach to modeling human learning on the task.
We have studied machine learners for the task and used the results to understand the complexities of the task.
Machine learning the NRL task has pushed the science and engineering of reinforcement learning.
Nice interplay between human and machine learning.

Page 111: Tracking the evolution of learning on a complex

Conclusions (contd.)
One of the first in the cognitive science community to directly use objective visualmotor performance data to derive high-level strategy models on a complex task.
A scalable solution that harnesses the power of new sensors and computing to change training protocols.
New algorithms for detecting changepoints and building predictive stochastic models for massive, noisy, non-stationary, vector time series data.

Page 112: Tracking the evolution of learning on a complex

Open questions

How to design algorithms that can learn to include relevant aspects of sensor history to increase goodness of fit with the data?
How would algorithms such as SVMs perform on this data? What class of kernels will be appropriate?
Can DBNs be learned from this data? How do we represent/approximate the needed probability distributions?

Page 113: Tracking the evolution of learning on a complex

More open questions

Building explanatory models
– reconciling coarse HMM models with the bde-LWR models

Conjecture: a fundamental problem?
– Explanatory models do not fit performance well.
– Performance models may not be very abstract; the task seems to need a series of local models rather than a single global model.
– Performance models can be used to modify training protocols online and for designing directed lessons, because they identify sensor configurations where the subject has trouble with action choice.

Page 114: Tracking the evolution of learning on a complex

Current work

Training subjects to achieve higher competence by giving them access to their learning.
Use of neuro-imaging to find the signature of strategy shifts in the brain.

Page 115: Tracking the evolution of learning on a complex

Acknowledgements

Diana Gordon and Sandra Marshall
– Human subject data collection
My students at Rice
– Scott Griffin, undergraduate
– Sameer Siruguri, graduate student
Program directors at ONR
– Helen Gigley, Susan Chipman and Astrid Schmidt-Nielsen

Page 116: Tracking the evolution of learning on a complex

EEG setup

Page 117: Tracking the evolution of learning on a complex

Data Sources: EEG data

Sampled at 2 kHz into binary (BDF) format
Continuous recording
Game now sends 'markers' to EEG acquisition software:
– S1: game (episode) start marker
– S2: game (episode) end marker
– S3: for each timestep in details file
Raw data file size: for 20 mins (50-100 games), 1.5-1.8 GB for 256 channels

Page 118: Tracking the evolution of learning on a complex

EEG data pre-processing

Using Analyzer (proprietary software in Biomedical lab: only 1 machine)
New reference
Filter signal above 50 Hz
Down-sample to 512 Hz
Segment into individual games by markers
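A minimal sketch of these pre-processing steps in Python with NumPy/SciPy. The sampling rates and marker roles follow the slides; the array layout, the common-average choice of reference, the toy marker indices, and reading "filter signal above 50 Hz" as a 50 Hz low-pass are all assumptions, not details from the talk.

```python
import numpy as np
from scipy import signal

fs_raw, fs_out = 2000, 512          # recorded at 2 kHz, down-sampled to 512 Hz
rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 10 * fs_raw))   # 4 toy channels, 10 s of data

# 1. Re-reference (here: common average reference, an assumed choice)
eeg = eeg - eeg.mean(axis=0, keepdims=True)

# 2. Low-pass filter: keep content below 50 Hz (zero-phase)
sos = signal.butter(4, 50, btype="low", fs=fs_raw, output="sos")
eeg = signal.sosfiltfilt(sos, eeg, axis=1)

# 3. Down-sample to 512 Hz by polyphase resampling
eeg = signal.resample_poly(eeg, up=fs_out, down=fs_raw, axis=1)

# 4. Segment into games using S1/S2 marker sample indices (toy values)
markers = [(0, 2 * fs_out), (3 * fs_out, 7 * fs_out)]   # (start, end) pairs
games = [eeg[:, s:e] for s, e in markers]
```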

Page 119: Tracking the evolution of learning on a complex

EEG analysis

ERP vs long-term EEG
ERP: evoked response potentials
– More 'popular' research topic
– Instantaneous stimuli evoke clear responses
– Easier to detect and analyze
Long-term EEG
– Aims to detect long-term trends in EEG data
– Mainly clinical studies, e.g. seizures
– Our task is in this category

Page 120: Tracking the evolution of learning on a complex

EEG: Computational research

Inverse problem:
– the blind source separation problem of locating the generating neural assemblies from external readings
– physical computing methods using electromagnetic physics
– statistical methods such as ICA
Artifact and noise removal
– Noise due to electrical activity or equipment limitations
– Muscle movements, eye blinks and other 'unwanted' signals
Brain dynamics
– Which regions are active at any given time? Correlated activity?
– How does the activity change over time?

Page 121: Tracking the evolution of learning on a complex

Some known results

Visualmotor task learning:
– At the beginning, high levels of activity in pre-frontal cortex and other 'conscious thinking' areas
– After training, activity shifts to cerebellum and other 'involuntary' areas
– Decrease in overall activity intensity
Hypothesis:
– Initially, more of a conscious cognitive process (think about what you are doing!)
– After training, automatic (do it without thinking)
– Specialized small brain regions take over.

Page 122: Tracking the evolution of learning on a complex

Some known results (contd.)

Cognitive tasks: mostly working memory tasks; simple strategy tasks such as subtraction have been studied
Found to be mediated by multiple areas of the brain working in tandem
Again, shifts and changes in intensity depending on level of achievement
Hypothesis: cognitive processes rely on large-scale integration of multiple brain regions: the concept of functional connectivity/functional networks

Page 123: Tracking the evolution of learning on a complex

Artifacts

What are artifacts?
– External (electrical) noise
– Unwanted muscular movements, eye blinks or other unrelated brain activity
In our task, difficult to characterize
We expect high levels of cognitive activity and muscular movements

Page 124: Tracking the evolution of learning on a complex

Artifact detection

Using the 4-sigma rule: for each game in each channel, compute the median and std dev
Remove a game if any channel contains points exceeding 4 std devs from the median
However, such points are very rare: for any game, only 0.2% of points (on average) in the signal
Effect of these points is smoothed by averaging
Removing artifacts by this method may throw out a lot of valuable information
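The 4-sigma rule above can be sketched in a few lines of NumPy. The per-game array shape (channels x samples) and the toy signals are assumptions for illustration.

```python
import numpy as np

def has_artifact(game, n_sigma=4.0):
    """True if any channel has a sample beyond n_sigma std devs of its median."""
    med = np.median(game, axis=1, keepdims=True)   # per-channel median
    sd = game.std(axis=1, keepdims=True)           # per-channel std dev
    return bool(np.any(np.abs(game - med) > n_sigma * sd))

rng = np.random.default_rng(0)
clean = rng.uniform(-1, 1, (8, 1000))      # well-behaved game, 8 channels
spiky = clean.copy()
spiky[3, 500] = 10.0                       # one large eye-blink-like spike

# keep only games that pass the rule
artifact_free = [g for g in (clean, spiky) if not has_artifact(g)]
```

Changing `n_sigma` to 6.0 gives the 6-sigma variant reported in the tables that follow.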

Page 125: Tracking the evolution of learning on a complex

Artifacts: example

Page 126: Tracking the evolution of learning on a complex

4-sigma artifact statistics

Subject   #games without artifacts   Avg % artifacts in a game   Max % artifacts in a game
Vishal    6/237                      0.15                        3.6
Rosario   0/145                      0.2                         3.1
Norbert   0/153                      0.13                        2.9

Page 127: Tracking the evolution of learning on a complex

6-sigma artifact statistics

Subject   #games without artifacts   Avg % artifacts in a game   Max % artifacts in a game
Vishal    150/237                    0.01                        1.37
Rosario   26/145                     0.08                        1.9
Norbert   33/153                     0.02                        1.26

Page 128: Tracking the evolution of learning on a complex

Artifacts: suggestions

Leave as is: hopefully smoothes out
Remove beginning and end sections of longer games (20s+)
Remove very short games altogether (4-6s) since they may not contain enough information to detect learning

Page 129: Tracking the evolution of learning on a complex

Data analysis

Statistical techniques: sliding window averaging variations – inconclusive
Time-frequency analysis
– Windowed Fourier transforms: average over frequency bands – inconclusive
– Wavelets: have been applied previously for ERPs. Detect discontinuities. Can we apply them to our problem?

Page 130: Tracking the evolution of learning on a complex

Functional network analysis

Method to find functional networks
For every channel pair ci and cj:
– Compute mij, which measures the degree of similarity between the signals from the two channels, e.g. statistical correlation, mutual information, spectral coherence, etc.
– If mij > t, mark (ci, cj) as an edge (t is a threshold)
Drawbacks of the basic algorithm
– Computationally intensive: every channel pair is considered
– No temporal variance in functional networks considered

Page 131: Tracking the evolution of learning on a complex

Spectral Coherence

Consider 2 signals xi(t), xj(t)

    Mij(f) = |Cij(f)|^2 / (Cii(f) Cjj(f))

Cij(f): cross-spectral density at frequency f, by Welch's method
Cii(f): power spectrum of signal xi(t) at f
Linear measure between 0 and 1.
Matlab function 'cohere'

Page 132: Tracking the evolution of learning on a complex

Windowed coherence analysis

Take a smaller subset of channels (64).
For each channel pair:
– Slide a window over the signal.
– Compute coherence for each window between the 2 channels.
– Average coherence over frequency bands.
For each frequency band, consider channel pairs with coherence above threshold.

Page 133: Tracking the evolution of learning on a complex

Preliminary Results

KLD curves
# of coherent channels (size of functional network) at different thresholds
Plots of coherent channels

Page 134: Tracking the evolution of learning on a complex

Observations

(above) Motor data shifts
(below) Number of coherent channel pairs in the frequency band 4-8 Hz above coherence 0.6

Page 135: Tracking the evolution of learning on a complex

Observations

For a sharp peak (game 30), note the high coherences within the frontal and visual cortices, and the coherences between the two.

Page 136: Tracking the evolution of learning on a complex

Observations

For a non-peak time (game 140), the coherences are fewer, and confined to specific regions.

Page 137: Tracking the evolution of learning on a complex

Observations

The number of coherent channel pairs increases sharply 15-20 trials before conceptual motor data shifts.
Coherence peaks involve greater coherence between channels in the frontal cortex, and between the frontal and visual cortices.
Left parietal cortex shows high activity, since subjects are all right-handed.

Page 138: Tracking the evolution of learning on a complex

Observations

There are sharp changes in brain activity and in the size of functional networks, correlated with strategy derivative peaks.
However, the relative intensities of the peaks are not necessarily the same. Why?
– Use of an arbitrary empirical threshold
– Intensity of a cognitive process shift is not necessarily indicative of the intensity of the strategy shift
– Non-implemented cognitive shifts?
Need a better metric for shifts in fn networks!

Page 139: Tracking the evolution of learning on a complex

Mutual Information

How much information does one signal (random variable) provide about another?

    I(u, v) = h(u) - h(u|v) = h(v) - h(v|u)
    h(u) = -E[log2 P(u)]
    h(u|v) = -E[log2 P(u|v)]

h(u): 'randomness' (entropy) of RV u.
h(u|v): 'randomness' of u given the distribution of v.
Non-linear measure

Page 140: Tracking the evolution of learning on a complex

Mutual Information (contd.)

Has been applied to fMRI data
Easy to implement for discrete-valued signals, e.g. voxels in fMRI data
How to compute for continuous-valued signals such as EEG?
Matlab library by R. Moddemeijer
Discretization using histogram
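A minimal sketch of the histogram discretization mentioned above: estimate I(u, v) between two continuous signals by binning them and plugging the empirical distributions into the entropy formulas. This is a simple plug-in estimator with an arbitrary bin count, not the Moddemeijer library.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in MI estimate (in bits) from a 2-D histogram of x and y."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                        # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x
    py = pxy.sum(axis=0, keepdims=True)          # marginal of y
    nz = pxy > 0                                 # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
noise = rng.standard_normal(5000)
mi_dep = mutual_information(x, x + 0.5 * noise)   # dependent pair: high MI
mi_ind = mutual_information(x, noise)             # independent pair: near zero
```

Note that the plug-in estimate has a positive bias for independent signals that grows with the bin count, which is one reason choosing the discretization for EEG is non-trivial.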

Page 141: Tracking the evolution of learning on a complex

Other possible methods

Linear regression analysis: Charles
Neural-network-based clustering
Other statistical clustering methods