Tracking the evolution of learning on a complex visual-motor task


Tracking the evolution of learning on a complex visual-motor task

Devika Subramanian, Rice University

Outline
A quick overview of current research
Tracking the evolution of human learning on a complex visual-motor task

Research question

[Diagram: a system interacting with its environment through sensors (perception) and actuators (action)]

Goal: build systems that adapt to changes effectively.

Research questions
What aspects of the task, environment and its internal dynamics does a system have to model for autonomous decision making?
How can a system build and maintain such models in changing environments?
How can a system with limited resources efficiently use these models to make decisions?

Current projects
Modeling learning on a complex visual-motor task (ONR)
Adaptive compilers (with Cooper & Torczon; NSF ITR, ATP and ARPA)
Predicting militarized interstate disputes (with Stoll; NSF ITR, Intel)
Designing controllers for life support systems (NASA)
Designing opto-mechanical systems from specifications of behavior (NSF)
Virgil, the Rice robotic tour guide (School of Engineering, Rice)
Predicting protein-protein interactions (with K. Matthews; Keck Foundation)

Adaptive systems for analyzing and predicting conflict

Supported by NSF ITR, Oct 2002.

Questions
Given a non-stationary time series, determine its properties: e.g., what is the nature of the non-stationarity?
Can the series be segmented into quasi-stationary segments over which predictive models can be learned?
How can we detect the onset of change of dynamics in a time series?

Analyzing conflict

The need for adaptive compilers

Today's production compilers are not sensitive to new compilation objectives: e.g., power, space.
– They use the same optimization sequence independent of the characteristics of the input program!

Supported by NSF ITR, July 2002.

The need for adaptive compilers

Building a high-quality compiler is an expensive, labor-intensive effort that requires experts who are in short supply.
Compiler experts determine code transformation/optimization sequences for any program presented to the compiler (chosen by the -O option).

Combinatorics

Large solution space.
Discrete, non-linear objective function.
How do we sample the space to get a good solution?

Solution
Use probabilistic models of the effects of transformation sequences on classes of programs, for different objective functions.
Learn these models by biased random sampling of the space of possible sequences.
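One simple way to realize biased random sampling over optimization sequences is to keep the best sequence found so far and sample new candidates by mutating it, so that sampling concentrates around good regions of the space. The sketch below is illustrative only: the pass names and the `evaluate` function (which would compile a benchmark with the given sequence and measure the objective, e.g. code size) are assumptions, not the actual adaptive-compiler implementation.

```python
import random

PASSES = ["dce", "gvn", "licm", "inline", "unroll", "peel", "sccp"]  # hypothetical pass names

def biased_random_search(evaluate, seq_len=10, iters=200, mutate_p=0.3):
    """Keep the best sequence found so far and sample new candidates by
    randomly mutating it, biasing the sampler toward good regions."""
    best = [random.choice(PASSES) for _ in range(seq_len)]
    best_score = evaluate(best)
    for _ in range(iters):
        cand = [random.choice(PASSES) if random.random() < mutate_p else p
                for p in best]
        score = evaluate(cand)
        if score < best_score:          # lower objective (e.g. code size) is better
            best, best_score = cand, score
    return best, best_score
```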

Code space optimization experiments

We ran a biased random sampler to find optimization sequences for several benchmark programs
– Fortran: fmin, rkf45, seval, solve, svd, urand, zeroin (FMM benchmarks), tomcatv (SPEC).
– C: adpcm, compress, fft, dfa, dhrystone, nsieve.

Results
Space optimization (LCTES99, Schielke 2001)
– 13% smaller code than fixed sequence (0 to 41%)
– Code was generally faster (26% to -25%; 5 of 14 slower)
– Best methods at this point yield a 5% space advantage at the cost of a 2% slowdown in time.


Adaptive Compilers

[Diagram: the front end remains intact; a steering algorithm varies parameters of the optimizer and back end; an objective function evaluated on the executable code drives the steering algorithm]

We are exploring new organizing principles:
Explicit objective function (chosen by the user).
Biased random sampler controls the optimizer and back end.

Outline
A quick overview of current research
Tracking the evolution of human learning on a complex visual-motor task

The context: training submarine pilots

[Diagram: an agent interacting with the NRL task]

Track the evolution of a human learning a visual-motor task with a significant strategic component, and alter the training protocol to improve the speed and efficacy of that learning.

Goal of project
Construct computational models of human learning based on performance data gathered during task learning.
– Models will be used to diagnose problems in learning and aid in the design of training protocols that help humans achieve high levels of competence on the task.
A computational microscope for training: can we infer cognitive constructs from objective performance data?

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A hybrid model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

The NRL Navigation Task

Mathematical characteristics of the NRL task

A partially observable Markov decision process which can be made fully observable by augmentation of the state with the previous action.
State space of size 10^14; at each step a choice of 153 actions (17 turns and 9 speeds).
Challenging for both humans and machines.
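As a minimal sketch of the augmentation idea, the decision process can be made (approximately) Markov by folding the last action into the state the learner conditions on. The names below are illustrative, not the task's actual interface.

```python
def augmented_state(observation, previous_action):
    """State used by the learner/controller: the raw observation
    extended with the previous action, as described above."""
    return (tuple(observation), previous_action)
```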

Challenges for a human learner

A task with a significant strategic and a visual-motor component.
Need for rapid decision making with incomplete information.
The sheer number of sensor panel configurations (10^14) and action choices (153).
Binary feedback at the end of an episode (200 steps).

Experiments on human subjects

Conducted at San Diego with ASL eyetracker.
5 subjects, five one-hour sessions each.
60 mines, small mine drift, small sensor noise.
Collected visual-motor data, verbal protocols and eyetracker data.

Learning curves (success)

[Figure: success % vs. episode (1 to ~751) for subjects S1-S5]

Learning curves (explosions)

[Figure: explosion % vs. episode for subjects S1-S5]

Learning curves (timeouts)

[Figure: timeout % vs. episode for subjects S1-S5]

Observations on human learning
Learning curves are qualitatively similar for successful learners.
– raises hope for a common learning model!
Success learning curves are similar for unsuccessful learners, but timeout and explosion curves show individual differences in failure to learn the task.

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

Building Representative Models
Behavioral equivalence (similarity in learning curves)

[Diagram: the task sends percepts p(t) to both the player, who produces actions a(t), and the model, which produces actions a'(t); their success % learning curves over time are compared]

Challenges in modeling learning
It is not possible to gather objective data from subjects about their strategy; the game doesn't allow useful verbalizations during play, and post-play explanations are often inaccurate and incomplete.

Cognitive modeling by machine learning
Treat the low-level visual-motor data stream as ground truth from which to induce models.
A model m: sensor history → actions is an approximation of a subject's strategy function, learned directly from the available (p(t), a(t)) time series data.
Cognitive modeling is a data compression problem!

Difficulties

High dimensionality of visual-motor data (11 dimensions spanning a space of size 10^14)
Noise in visual-motor data
– lapses of attention.
– joystick hysteresis.
Non-stationarity
– Subjects have static periods followed by radical conceptual shifts which usually trigger significant performance gains.

Action distribution close to mines (subject 1, day 2)

Action distribution close to mines (subject 1, day 4)

Action distribution close to mines (subject 1, day 5)

Action distribution far from mines (subject 1, day 5)

Formulating the modeling task
Given: an episodic non-stationary time series
– episode 1: (sv_0, a_0), (sv_1, a_1), ..., (sv_n, a_n)
– episode 2: ...
– episode N: ...
Find:
– stationary segments in the data.
– an appropriate class of models m: sensor history → actions to fit the stationary segments.

Model class selection
m: sensor history → actions
– What prefix of the sensor history determines actions at time t?
– How to abstract the sensor space?

Response-equivalent partitions

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A hybrid model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

A near-optimal player

A three-part deterministic controller solves the task!
The only information required about the previous state is the last turn made.
A very coarse discretization of the state space is needed: about 1000 states!
Discovering this solution was not easy!

Part 1: Seek Goal

There is a clear sonar in the direction of the goal.

If the sonar in the direction of the goal is clear, follow it at speed 20, unless the goal is straight ahead, then travel at speed 40.

Part 2: Avoid Mine

There is a clear sonar but not in the direction of the goal.

Turn at zero speed to orient with the first clear sonar counted from the middle outward. If the middle sonar is clear, move forward with speed 20.

Part 3: Gap Finder

There are no clear sonars.

If the last turn was non-zero, turn again by the same amount; else initiate a soft turn by summing the right and left sonars and turning in the direction of the lower sum.
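The sketch below pulls the three parts together as one decision rule. It assumes seven sonar readings indexed left to right, the clearance threshold of 50 mentioned later in the talk, and turns expressed as sonar-index offsets; the real NRL interface and action units differ.

```python
CLEAR = 50  # sonar reading above which a direction counts as "clear"

def controller(sonars, goal_idx, last_turn):
    """Return (turn, speed) given sonar readings, the index of the sonar
    pointing toward the goal, and the previous turn (an index offset)."""
    mid = len(sonars) // 2

    # Part 1: Seek Goal - follow a clear sonar toward the goal.
    if sonars[goal_idx] > CLEAR:
        speed = 40 if goal_idx == mid else 20
        return goal_idx - mid, speed

    # Part 2: Avoid Mine - orient (at zero speed) with the first clear
    # sonar counted from the middle outward.
    for offset in range(mid + 1):
        for idx in (mid - offset, mid + offset):
            if 0 <= idx < len(sonars) and sonars[idx] > CLEAR:
                if idx == mid:
                    return 0, 20        # middle sonar clear: move forward
                return idx - mid, 0     # otherwise turn in place

    # Part 3: Gap Finder - no clear sonars.
    if last_turn != 0:
        return last_turn, 0             # keep turning by the same amount
    left, right = sum(sonars[:mid]), sum(sonars[mid + 1:])
    return (-1, 0) if left < right else (1, 0)  # soft turn toward the lower sum
```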

Performance of optimal player

Player                   Success %   Behavior
Opt. player              99.7%       Baseline
Opt. player – Part 3     79.9%       Oscillates/times out
Opt. player – Part 2     98.3%       Times out
Opt. player – Part 1     7.3%        Never gets to goal/times out
Part 1                   50.1%       Aggressive goal seeker/blows up

Mine density = 60. All results reported for 10,000 episodes.

Properties of optimal player
Reflects the task decomposition found in human players.
– However, the sub-goals are very coupled and this coupling is what is hard for humans to learn.
Key to success:
– state space partitioning; a threshold of 50 on sonar value makes the right compromise between succeeding, timing out and blowing up.
– turning at zero speed.
– turning consistently in a given direction to find a gap in the mines.

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

A modeling approach
Abstraction of sensor space
– view sensors through the prism of equivalence classes defined by a near-optimal policy for the task.
– Extract the subject's Part 1 and Part 2 policies as probability distributions on actions, and the Part 3 policy as a hidden Markov model.
Advantage
– deviations from optimal can be the basis for directed training of subjects.
Disadvantage
– humans may not adopt anything close to the conceptualization needed for optimal play.

A model

[Diagram: sensors feed Part 1 (action distribution), Part 2 (action distribution) and Part 3 (HMM), which together produce the action]

A very small number of parameters is sufficient to capture a subject. Can acquire the subject model online!

Model extraction algorithm

To find stationary subsequences, segment the data using KL divergence on all distributions
– chunk the data into uniform segments.
– for each segment, compute Part 1, 2 and 3 distributions.
– for each Part, compute the KL divergence between successive segments, and identify change points as those segment boundaries where the measure changes significantly.
Given a stationary subsequence of visual-motor data
– learn Part 1 and Part 2 conditional action distributions from data (a counting process).
– obtain the action sequences in Part 3 and learn an HMM.
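A minimal sketch of the segmentation step, assuming the actions have already been mapped to discrete indices; chunk size, smoothing, and threshold values are illustrative, not the values used in the actual study.

```python
import numpy as np

def action_distribution(actions, n_actions):
    """Empirical distribution of discrete action indices (with smoothing)."""
    counts = np.bincount(actions, minlength=n_actions) + 1e-3
    return counts / counts.sum()

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

def change_points(actions, n_actions, chunk=50, threshold=0.5):
    """Flag chunk boundaries where the action policy shifts significantly."""
    chunks = [actions[i:i + chunk]
              for i in range(0, len(actions) - chunk + 1, chunk)]
    dists = [action_distribution(np.asarray(c), n_actions) for c in chunks]
    return [i * chunk for i in range(1, len(dists))
            if kl_divergence(dists[i - 1], dists[i]) > threshold]
```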

Evolution of gap finding strategy

Subject Col: episodes 45-67 and episodes 68-90 on day 2.
Subject learns to turn in place.

HMM models for gap finding.

Pre-shift gap finding strategy

Transition probabilities (states 1, 2, 3): 1→1 = 0.79, 1→2 = 0.21; 2→2 = 0.005, 2→3 = 0.995; 3→3 = 1.

Emission probabilities (columns = states 1, 2, 3):
0       0.84    0.00    0.36
right   0.03    0.003   0.33
left    0.05    0.00    0.31
other   0.08    0.997   0.00

Post-shift gap finding strategy

Transition probabilities (states 1, 2, 3): 1→1 = 0.73, 1→2 = 0.27; 2→2 = 0.62, 2→3 = 0.38; 3→3 = 1.

Emission probabilities (columns = states 1, 2, 3):
0       0.82    0.05    0.37
right   0.024   0.00    0.553
left    0.025   0.95    0.07
other   0.131   0.00    0.00

Results

Pre-shift    Successes   Explosions   Timeouts   Total episodes
Col          0           12           11         23
Model        0           17           6          23

Post-shift   Successes   Explosions   Timeouts   Total episodes
Col          0           2            13         15
Model        0           4            11         15

A better fit than using C4.5.

Problems with model
Very sensitive to the choice of equivalence classes; the near-optimal policy does not always provide the right classes to model subjects accurately.
Fit of the learning curve worsens, especially for later days in training as the subject becomes an expert.
However, still the best way to summarize the strategy adopted by the human, at a high level.

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A hybrid model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

Machine learning of NRL task

What does it take to get machines to learn the task?
Can machine learners achieve higher levels of competence?
How does the sample complexity of machine learning compare with humans?
Can we use machine learning to improve human learning?

Reinforcement learning

[Diagram: the learner sends actions to the task and receives state and reward; the interaction generates the sequence s1, a1, r1, s2, a2, r2, ...]

Reinforcement learning
Representational hurdles
– State space has to be manageably small.
– Good intermediate feedback in the form of a non-deceptive progress function is needed.
Algorithmic hurdles
– Appropriate credit assignment policy needed.
– Sum-of-rewards assessment criterion is too slow to converge.

State space design
Binary distinction on sonar: is it > 50?
Six distinctions on bearing: {12}, {1,2}, {3,4}, {5,6,7}, {8,9}, {10,11}
State space size = 2^7 * 6 = 768.
Discretization of actions
– speed: 0, 20 and 40.
– turn: -32, -16, -8, 0, 8, 16, 32.
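A minimal sketch of this discretization, assuming seven sonar readings and a clock-style goal bearing (1-12); the encoding itself is illustrative, but the threshold, bearing groups, and action values follow the slide above.

```python
BEARING_GROUPS = [{12}, {1, 2}, {3, 4}, {5, 6, 7}, {8, 9}, {10, 11}]
SPEEDS = [0, 20, 40]
TURNS = [-32, -16, -8, 0, 8, 16, 32]

def discretize_state(sonars, bearing):
    """Map 7 raw sonar readings and a clock bearing (1-12) to one of
    2**7 * 6 = 768 discrete states."""
    sonar_bits = sum((s > 50) << i for i, s in enumerate(sonars))
    group = next(i for i, g in enumerate(BEARING_GROUPS) if bearing in g)
    return sonar_bits * len(BEARING_GROUPS) + group

# Example: discretize_state([10, 60, 55, 5, 0, 70, 20], bearing=3)
```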

The progress function

r(s, a, s') = 0     if s' is a state where the player hits a mine
            = 1     if s' is a goal state
            = 0.5   if s' is a timeout state
            = 0.75  if s is a Part 3 state and s' is a Part 1 or Part 2 state
            = 0.5 + sum of sonars/1000                  if s' is a Part 3 state
            = 0.5 + range/1000 + abs(bearing - 6)/40    otherwise
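A sketch of the progress function in code. The state fields and the Part 1/2/3 classification are assumptions for illustration; the numeric cases follow the slide above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NavState:
    sonars: List[float]
    bearing: int            # clock bearing of the goal, 1-12
    range: float            # distance to the goal
    part: int               # 1, 2 or 3, per the controller decomposition
    hit_mine: bool = False
    at_goal: bool = False
    timed_out: bool = False

def progress(s: NavState, s_prime: NavState) -> float:
    """Local reward r(s, a, s') following the cases on the slide."""
    if s_prime.hit_mine:
        return 0.0
    if s_prime.at_goal:
        return 1.0
    if s_prime.timed_out:
        return 0.5
    if s.part == 3 and s_prime.part in (1, 2):
        return 0.75
    if s_prime.part == 3:
        return 0.5 + sum(s_prime.sonars) / 1000.0
    return 0.5 + s_prime.range / 1000.0 + abs(s_prime.bearing - 6) / 40.0
```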

Credit assignment policy
Penalize the last action alone in a sequence which ends in an explosion.
Penalize all actions in a sequence which ends in a timeout.

Simplification of value estimation
Estimate the average local reward for each action in each state.
– A big change from learning sum-of-rewards from each state.
Q(s,a) is the sum of rewards from s to the terminal state. Here we only maintain the local reward at state s.
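A minimal sketch of this simplification: keep a running average of the local progress reward per (state, action) pair instead of a sum-of-rewards Q value. The data structures and the optimistic default are illustrative assumptions (the slides later mention optimistic initialization of q values).

```python
from collections import defaultdict

class LocalRewardLearner:
    def __init__(self):
        self.total = defaultdict(float)   # sum of local rewards per (s, a)
        self.count = defaultdict(int)     # visits per (s, a)

    def update(self, state, action, local_reward):
        self.total[(state, action)] += local_reward
        self.count[(state, action)] += 1

    def q(self, state, action, default=0.75):
        """Average local reward; an optimistic default encourages exploration."""
        n = self.count[(state, action)]
        return self.total[(state, action)] / n if n else default
```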

Staged learning

First learn turns alone, with speed supplied by the near-optimal player.
Next learn both turn and speed.
Differences in the two learners suggest new protocols for training humans.

Results of learning turns

Turn learner/600 episodes

Turn learner/10,000 episodes

Turn learner/failure after 10K

Learning of complete policy
Estimate of average local reward is not a perfect substitute for the global sum-of-rewards.
Make action choices based on the estimated local reward weighted by the global measure of wins/(wins+timeouts) from that state.
Optimistic initialization of q values.

Results of learning complete policy

Full Q learner/1500 episodes

Full Q learner/10,000 episodes

Full Q learner/failure after 10K

Why learning takes so long

Effect of discretization

Lessons from machine learning
Why the task is hard: the most frequently occurring state occurs 45% of the time, all others less than 5%.
Long sequences of moves make credit assignment hard.
Staged learning makes the task easier, and might help humans acquire the task more easily.
Need for a locally non-deceptive reward function to speed up training. Can giving the progress function as hints to human players help?

Outline
The NRL Navigation Task
Challenges in modeling human learning
Understanding the task: optimal player
A hybrid model for human learning
Understanding the task: reinforcement learner
High-fidelity models for human learning

Direct models
How well can stateless stochastic models of the form m: sensors → P(actions) match subject learning curves?
– Associate with every observed sensor configuration the distribution of actions taken by the player at that configuration.
Advantage:
– no need to abstract the sensor space.
– model construction can be done in real time!

Surely, this can't work!

There are 10^14 sensor configurations possible in the NRL Navigation task.
However, only between 10^3 and 10^4 of those configurations are actually observed by humans in a training run of 600 episodes.
Exploit sparsity in the sensor configuration space to build a direct model of the subject.
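A minimal sketch of such a direct model: a lookup table from each observed sensor configuration to the empirical distribution of actions the subject took there. The representation of configurations and actions is an illustrative assumption.

```python
from collections import Counter, defaultdict

class DirectModel:
    def __init__(self):
        self.table = defaultdict(Counter)   # sensor config -> action counts

    def observe(self, sensors, action):
        self.table[tuple(sensors)][action] += 1

    def action_distribution(self, sensors):
        counts = self.table.get(tuple(sensors))
        if not counts:
            return {}                        # unseen configuration
        total = sum(counts.values())
        return {a: c / total for a, c in counts.items()}
```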

Model construction

Segmentation of episodic data
[Diagram: episodes from the start of training to the end of training, divided into segments]

Fitting models of the form sensors → P(actions) on the stationary segments.

Model Derivative

dm/dt = KLdiv( Π(i-2w+s, i-w+s), Π(i-w, i) )

where Π(a, b) is the model fit to episodes a through b: two windows of width w, overlapping by s.
Empirical optimum: w = 20, s = 5.
Computed by Monte Carlo sampling (stabilizes after 5% of entries are sampled).

Model derivative for Cea

Before shift: Cea (episode 300)

After shift: Cea (episode 320)

Model derivative for Col
Gap strategy shift here

Model derivative for Hei

How humans learn

Subjects have relatively static periods of action policy choice punctuated by radical shifts.
Successful learners have conceptual shifts during the first part of training; unsuccessful ones keep trying till the end of the protocol!

How the model is used

To compute the action a associated with the current sensor configuration s:
– take 100 neighbors of s in the lookup table.
– compute a weighted average of the actions taken by these neighbors, OR
– perform locally weighted regression (LWR) on these 100 (s, a) pairs.
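A minimal sketch of the weighted-average option: find the 100 nearest stored configurations and average their actions, weighted by inverse distance. The distance metric, weighting, and array layout are illustrative assumptions.

```python
import numpy as np

def predict_action(query, configs, actions, k=100):
    """configs: (n, d) array of stored sensor vectors; actions: (n, m) array
    of the actions taken there; returns a weighted-average action."""
    dists = np.linalg.norm(configs - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-6)   # closer neighbors count more
    weights /= weights.sum()
    return weights @ actions[nearest]
```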

Evaluation protocol

Same mine configurations as the subject.
Model switched on segment boundaries.
Cross-validation method on each segment:
– Train on 9/10ths of the data.
– Test on the left-out chunk.

Results: w.avg. vs. LWR

LWR is worse: why?

LWR performs worse than w.avg.
– data sparsity implies otherwise
Reason: LWR extrapolates often
– shown by the timeout record

Biased dimension elimination

Projecting out dimensions to force interpolation

[Figure: a candidate percept in a 2-D percept space (percept dim 1 vs. percept dim 2) is projected onto a lower-dimensional subspace, giving a projected candidate percept]

Results: use of bde with LWR

Richer models: internal state

Remember the past k actions:

f_k = ⟨ p_t, a_{t-1}, ..., a_{t-k} ⟩

k-gram models: experimented with k = 1, 2, 3
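A minimal sketch of the k-gram feature construction: each training example pairs the current percept plus the k preceding actions with the action taken. The data layout is an illustrative assumption.

```python
def kgram_features(percepts, actions, k):
    """Yield (feature, action) pairs where the feature is the percept at
    time t together with the k preceding actions."""
    for t in range(k, len(percepts)):
        feature = tuple(percepts[t]) + tuple(actions[t - k:t])
        yield feature, actions[t]
```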

Results: 1-gram models

Increasing state preferentially

Add additional history information for sensor configurations "close to mines".
Two-tier model
– tier 1: in far-mine, use the 1-gram model
– tier 2: in close-mine, use f_7 = ⟨ p_t, p_{t-1}, a_{t-1}, a_{t-3}, a_{t-5}, a_{t-7} ⟩

Results: 2-tier models

Subject Cea: Day 5 (plots 1-9)

[Figures: subject vs. model comparison plots]

Comparison with global methods

Result summary

We can model subjects on the NRL task in real time, achieving excellent fits to their learning curves, using the technique of 1-gram/bde-LWR/2-tier on the available visual-motor data stream.

Conclusions
We have used inductive machine learning techniques to construct compact cognitive models in real time from the vast empirical visual-motor data gathered from subjects.
Direct models offer the best approach to modeling human learning on the task.
We have studied machine learners for the task and used the results to understand the complexities of the task.
Machine learning the NRL task has pushed the science and engineering of reinforcement learning.
Nice interplay between human and machine learning.

Conclusions (contd.)
One of the first in the cognitive science community to directly use objective visual-motor performance data to derive high-level strategy models on a complex task.
A scalable solution that harnesses the power of new sensors and computing to change training protocols.
New algorithms for detecting changepoints and building predictive stochastic models for massive, noisy, non-stationary, vector time series data.

Open questions

How to design algorithms that can learn to include relevant aspects of sensor history to increase goodness of fit with the data?
How would algorithms such as SVMs perform on this data? What class of kernels will be appropriate?
Can DBNs be learned from this data? How do we represent/approximate the needed probability distributions?

More open questions
Building explanatory models
– reconciling coarse HMM models with the bde-LWR models
Conjecture: a fundamental problem?
– Explanatory models do not fit performance well.
– Performance models may not be very abstract; the task seems to need a series of local models rather than a single global model.
– Performance models can be used to modify training protocols online and for designing directed lessons, because they identify sensor configurations where the subject has trouble with action choice.

Current work

Training subjects to achieve higher competence by giving them access to their learning.
Use of neuro-imaging to find the signature of strategy shifts in the brain.

Acknowledgements
Diana Gordon and Sandra Marshall
– Human subject data collection
My students at Rice
– Scott Griffin, undergraduate
– Sameer Siruguri, graduate student
Program directors at ONR
– Helen Gigley, Susan Chipman and Astrid Schmidt-Nielsen

EEG setup

Data Sources: EEG data

Sampled at 2 kHz into binary (BDF) format
Continuous recording
Game now sends 'markers' to the EEG acquisition software:
– S1: game (episode) start marker
– S2: game (episode) end marker
– S3: for each timestep in the details file
Raw data file size: for 20 mins (50-100 games), 1.5-1.8 GB for 256 channels

EEG data pre-processing

Using Analyzer (proprietary software in Biomedical lab: only 1 machine)
New reference
Filter signal above 50 Hz
Downsample to 512 Hz
Segment into individual games by markers

EEG analysis

ERP vs. long-term EEG
ERP: evoked response potentials
– More 'popular' research topic
– Instantaneous stimuli evoke clear responses
– Easier to detect and analyze
Long-term EEG
– Aims to detect long-term trends in EEG data
– Mainly clinical studies, e.g. seizures
– Our task is in this category

EEG: Computational research

Inverse problem:
– the blind source separation problem of locating the generating neural assemblies from external readings
– physical computing methods using electromagnetic physics
– statistical methods such as ICA
Artifact and noise removal
– Noise due to electrical activity or equipment limitations
– Muscle movements, eye blinks and other 'unwanted' signals
Brain dynamics
– Which regions are active at any given time? Correlated activity?
– How does the activity change over time?

Some known results

Visual-motor task learning:
– At the beginning, high levels of activity in the pre-frontal cortex and other 'conscious thinking' areas
– After training, activity shifts to the cerebellum and other 'involuntary' areas
– Decrease in overall activity intensity
Hypothesis:
– Initially, more of a conscious cognitive process (think about what you are doing!). After training, automatic (do it without thinking).
– Specialized small brain regions take over.

Some known results (contd.)

Cognitive tasks: mostly working memory tasks and simple strategy tasks such as subtraction have been studied
Found to be mediated by multiple areas of the brain working in tandem
Again, shifts and changes in intensity depending on level of achievement
Hypothesis: cognitive processes rely on large-scale integration of multiple brain regions: the concept of functional connectivity/functional networks

Artifacts

What are artifacts?
– External (electrical) noise
– Unwanted muscular movements, eye blinks or other unrelated brain activity
In our task, difficult to characterize
We expect high levels of cognitive activity, and muscular movements

Artifact detection

Using the 4-sigma rule: for each game in each channel, compute the median and std dev
Remove a game if any channel contains points exceeding 4 std dev from the median
However, such points are very rare. For any game, only 0.2% of points (on average) in the signal
Effect of these points is smoothed by averaging
Removing artifacts by this method may throw out a lot of valuable information
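A minimal sketch of the 4-sigma rule, assuming the EEG for one game is an (n_channels, n_samples) array.

```python
import numpy as np

def has_artifact(game, n_sigma=4.0):
    """Flag a game if any channel contains samples more than n_sigma
    standard deviations away from that channel's median."""
    medians = np.median(game, axis=1, keepdims=True)
    stds = np.std(game, axis=1, keepdims=True)
    return bool(np.any(np.abs(game - medians) > n_sigma * stds))
```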

Artifacts: example

4-sigma artifact statistics

Subject   #games without artifacts   Avg % artifacts in a game   Max % artifacts in a game
Vishal    6/237                      0.15                        3.6
Rosario   0/145                      0.2                         3.1
Norbert   0/153                      0.13                        2.9

6-sigma artifact statistics

Subject   #games without artifacts   Avg % artifacts in a game   Max % artifacts in a game
Vishal    150/237                    0.01                        1.37
Rosario   26/145                     0.08                        1.9
Norbert   33/153                     0.02                        1.26

Artifacts: suggestions

Leave as is: hopefully smooths out
Remove beginning and end sections of longer games (20s+)
Remove very short games (4-6s) altogether, since they may not contain enough information to detect learning

Data analysis

Statistical techniques: sliding window averaging variations (inconclusive)
Time-frequency analysis
– Windowed Fourier transforms: average over frequency bands (inconclusive)
– Wavelets: have been applied previously for ERPs; they detect discontinuities. Can we apply them to our problem?

Functional network analysis

Method to find functional networks
For every channel pair c_i and c_j
– Compute m_ij, which measures the degree of similarity between the signals from the two channels: e.g. statistical correlation, mutual information, spectral coherence, etc.
– If m_ij > t, mark (c_i, c_j) as an edge (t is a threshold)
Drawbacks of the basic algorithm
– Computationally intensive: every channel pair is considered
– No temporal variance in functional networks is considered

Spectral Coherence

Consider 2 signals x_i(t), x_j(t)

M_ij(f) = |C_ij(f)|^2 / ( C_ii(f) C_jj(f) )

C_ij(f): cross-spectral density at frequency f, by Welch's method
C_ii(f): power spectrum of signal x_i(t) at f
Linear measure between 0 and 1.
Matlab function 'cohere'

Windowed coherence analysis

Take a smaller subset of channels (64).
For each channel pair
– Slide a window over the signal.
– Compute coherence for each window between the 2 channels.
– Average coherence over frequency bands.
For each frequency band, consider channel pairs with coherence above a threshold.
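A minimal sketch of this windowed analysis in Python, using SciPy's Welch-based coherence in place of Matlab's 'cohere'. The window length, frequency band, and threshold below are illustrative (the observations later use the 4-8 Hz band and a 0.6 threshold).

```python
import numpy as np
from scipy.signal import coherence

def coherent_pairs(eeg, fs=512, win_sec=2.0, band=(4, 8), threshold=0.6):
    """eeg: (n_channels, n_samples) array. For each channel pair, compute
    magnitude-squared coherence per sliding window, average it over the
    given frequency band, and return pairs whose mean exceeds threshold."""
    n_ch, n_samp = eeg.shape
    win = int(win_sec * fs)
    pairs = []
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            band_coh = []
            for start in range(0, n_samp - win + 1, win):
                xi, xj = eeg[i, start:start + win], eeg[j, start:start + win]
                f, cxy = coherence(xi, xj, fs=fs, nperseg=win // 4)
                mask = (f >= band[0]) & (f <= band[1])
                band_coh.append(cxy[mask].mean())
            if np.mean(band_coh) > threshold:
                pairs.append((i, j))
    return pairs
```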

Preliminary Results

KLD curves
# of coherent channels (size of functional network) at different thresholds
Plots of coherent channels

Observations

(above) Motor data shifts
(below) Number of coherent channel pairs in the 4-8 Hz frequency band above coherence 0.6

Observations

For a sharp peak (game 30), note the high coherences within the frontal and visual cortices, and the coherences between the two.

Observations

For a non-peak time (game 140), the coherences are fewer, and confined to specific regions.

Observations

Number of coherent channel pairs increases sharply 15-20 trials before conceptual motor data shifts.
Coherence peaks involve greater coherence between channels in the frontal cortex, and between frontal and visual cortices.
Left parietal cortex shows high activity, since subjects are all right-handed.

Observations

There are sharp changes in brain activity and in the size of functional networks correlated with strategy derivative peaks.
However, relative intensities of peaks are not necessarily the same. Why?
– Use of an arbitrary empirical threshold
– Intensity of a shift in the cognitive process is not necessarily indicative of the intensity of the strategy shift
– Non-implemented cognitive shifts?
Need a better metric for shifts in functional networks!

Mutual Information

How much information does one signal (random variable) provide about another?

I(u, v) = h(u) - h(u|v) = h(v) - h(v|u)
h(u) = -E[ log2 P(u) ]
h(u|v) = -E[ log2 P(u|v) ]

h(u): 'randomness' (entropy) of RV u.
h(u|v): 'randomness' of u given the distribution of v.
Non-linear measure.

Mutual Information (contd.)

Has been applied to fMRI data
Easy to implement for discrete-valued signals, e.g. voxels in fMRI data
How to compute for continuous-valued signals such as EEG?
Matlab library by R. Moddemeijer
Discretization using histograms
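A minimal sketch of the histogram approach for two continuous signals: discretize with a 2-D histogram and use the equivalent identity I(u, v) = h(u) + h(v) - h(u, v). The bin count is an illustrative assumption, not the value used in the study.

```python
import numpy as np

def mutual_information(u, v, bins=32):
    """Estimate I(u, v) in bits from samples of two continuous signals."""
    joint, _, _ = np.histogram2d(u, v, bins=bins)
    p_uv = joint / joint.sum()
    p_u = p_uv.sum(axis=1)
    p_v = p_uv.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # I(u, v) = h(u) + h(v) - h(u, v), equivalent to h(u) - h(u|v) above
    return entropy(p_u) + entropy(p_v) - entropy(p_uv.ravel())
```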

Other possible methods

Linear regression analysis: Charles
Neural-network based clustering
Other statistical clustering methods
