TRANSCRIPT

Tracking the evolution of learning on a complex visualmotor task

Devika Subramanian, Rice University
Outline
– A quick overview of current research
– Tracking the evolution of human learning on a complex visualmotor task
Research question
Environment
[Diagram: a system perceives the environment through sensors and acts on it through actuators]
Goal: build systems that adapt to changes effectively.
Research questions
– What aspects of the task, environment and its internal dynamics does a system have to model for autonomous decision making?
– How can a system build and maintain such models in changing environments?
– How can a system with limited resources efficiently use these models to make decisions?
Current projects
– Modeling learning on a complex visual-motor task (ONR)
– Adaptive compilers (with Cooper & Torczon, NSF ITR, ATP and ARPA)
– Predicting militarized interstate disputes (with Stoll, NSF ITR, Intel)
– Designing controllers for life support systems (NASA)
– Designing opto-mechanical systems from specifications of behavior (NSF)
– Virgil, the Rice robotic tour guide (School of Engg, Rice)
– Predicting protein-protein interactions (with K. Matthews, Keck Foundation)
Adaptive systems for analyzing and predicting conflict
Supported by NSF ITR, Oct 2002.
Questions
– Given a non-stationary time series, determine its properties: e.g., what is the nature of the non-stationarity?
– Can the series be segmented into quasi-stationary segments over which predictive models can be learned?
– How can we detect the onset of change of dynamics in a time series?
Analyzing conflict
The need for adaptive compilers
– Today's production compilers are not sensitive to new compilation objectives: e.g., power, space.
– They use the same optimization sequence independent of the characteristics of the input program!
Supported by NSF ITR, July 2002.
The need for adaptive compilers
– Building a high quality compiler is an expensive, labor-intensive effort that requires experts who are in short supply.
– Compiler experts determine code transformation/optimization sequences for any program presented to the compiler (chosen by the -O option).
Combinatorics
– Large solution space.
– Discrete, non-linear objective function.
– How do we sample the space to get a good solution?
Solution
– Use probabilistic models of the effects of transformation sequences on classes of programs, for different objective functions.
– Learn these models by biased random sampling of the space of possible sequences.
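A minimal sketch of the biased random sampling idea. The pass names and the cost function are hypothetical stand-ins (the real system compiles the program and measures the objective); only the sampling strategy itself follows the slide.

```python
import random

# Hypothetical optimization passes; not the actual Rice compiler passes.
PASSES = ["inline", "dce", "licm", "gvn", "unroll", "peephole"]

def code_size(sequence):
    # Stand-in objective: deterministic toy cost. The real system would
    # compile the program with `sequence` and measure emitted code size.
    return 1000 - 15 * len(set(sequence)) + 3 * sum(len(p) for p in sequence)

def biased_random_search(n_samples=200, seq_len=5):
    """Sample pass sequences, biasing pass choice toward passes that
    appeared in the best sequence found so far."""
    weights = {p: 1.0 for p in PASSES}
    best_seq, best_cost = None, float("inf")
    for _ in range(n_samples):
        seq = random.choices(PASSES,
                             weights=[weights[p] for p in PASSES],
                             k=seq_len)
        cost = code_size(seq)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
            for p in seq:            # reinforce passes in the new best
                weights[p] *= 1.5
    return best_seq, best_cost
```

The bias update (multiplying weights of passes in improving sequences) is one simple choice; any scheme that skews sampling toward promising regions fits the slide's description.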
Code space optimization experiments
We ran a biased random sampler to find optimization sequences for several benchmark programs:
– Fortran: fmin, rkf45, seval, solve, svd, urand, zeroin (FMM benchmarks), tomcatv (SPEC).
– C: adpcm, compress, fft, dfa, dhrystone, nsieve.
Results
Space optimization (LCTES99, Schielke 2001):
– 13% smaller code than the fixed sequence (0 to 41%).
– Code was generally faster (26% to -25%; 5 of 14 slower).
– Best methods at this point yield a 5% space advantage at the cost of a 2% slowdown in time.
Adaptive Compilers
[Diagram: front end → optimizer → back end → executable code; a steering algorithm varies parameters against an objective function while the front end remains intact]
We are exploring new organizing principles:
– Explicit objective function (chosen by the user).
– Biased random sampler controls the optimizer and back end.
Outline
– A quick overview of current research
– Tracking the evolution of human learning on a complex visualmotor task
The context: training submarine pilots
NRL task
[Diagram: agent interacting with the NRL task environment]
Track the evolution of a human learning a visualmotor task with a significant strategic component, and alter the training protocol to improve the speed and efficacy of that learning.
Goal of project
– Construct computational models of human learning based on performance data gathered during task learning.
– Models will be used to diagnose problems in learning and aid in the design of training protocols that help humans achieve high levels of competence on the task.
– A computational microscope for training: can we infer cognitive constructs from objective performance data?
Outline
– The NRL Navigation Task
– Challenges in modeling human learning
– Understanding the task: optimal player
– A hybrid model for human learning
– Understanding the task: reinforcement learner
– High-fidelity models for human learning
The NRL Navigation Task
Mathematical characteristics of the NRL task
– A partially observable Markov decision process which can be made fully observable by augmentation of the state with the previous action.
– State space of size 10^14; at each step a choice of 153 actions (17 turns and 9 speeds).
– Challenging for both humans and machines.
Challenges for a human learner
– A task with a significant strategic and a visual-motor component.
– Need for rapid decision making with incomplete information.
– The sheer number (10^14) of sensor panel configurations and action choices (153).
– Binary feedback at the end of an episode (200 steps).
Experiments on human subjects
– Conducted at San Diego with an ASL eyetracker.
– 5 subjects, five one-hour sessions each.
– 60 mines, small mine drift, small sensor noise.
– Collected visualmotor data, verbal protocols and eyetracker data.
Learning curves (success)
[Figure: success % vs. episode (1-751) for subjects S1-S5]
Learning curves (explosions)
[Figure: explosion % vs. episode (1-721) for subjects S1-S5]
Learning curves (timeouts)
[Figure: timeout % vs. episode (1-766) for subjects S1-S5]
Observations on human learning
– Learning curves are qualitatively similar for successful learners; this raises hope for a common learning model!
– Success learning curves are similar for unsuccessful learners, but timeout and explosion curves show individual differences in failure to learn the task.
Outline
– The NRL Navigation Task
– Challenges in modeling human learning
– Understanding the task: optimal player
– A model for human learning
– Understanding the task: reinforcement learner
– High-fidelity models for human learning
Building Representative Models
Behavioral equivalence (similarity in learning curves)
[Diagram: the Task sends percepts p(t) to both the Player, producing actions a(t), and the Model, producing actions a'(t); success % over time is compared]
Challenges in modeling learning
It is not possible to gather objective data from subjects about their strategy; the game doesn't allow useful verbalizations during play, and post-play explanations are often inaccurate and incomplete.
Cognitive modeling by machine learning
– Treat the low-level visualmotor data stream as ground truth from which to induce models.
– A model m: sensor history → actions is an approximation of a subject's strategy function, learned directly from the available (p(t), a(t)) time series data.
– Cognitive modeling is a data compression problem!
Difficulties
– High dimensionality of visual-motor data (11 dimensions spanning a space of size 10^14).
– Noise in visual-motor data:
  – lapses of attention.
  – joystick hysteresis.
– Non-stationarity: subjects have static periods followed by radical conceptual shifts which usually trigger significant performance gains.
Action distribution close to mines (subject 1, day 2)
Action distribution close to mines (subject 1, day 4)
Action distribution close to mines (subject 1, day 5)
Action distribution far from mines (subject 1, day 5)
Formulating the modeling task
Given: an episodic non-stationary time series
– episode 1: (sv0,a0),(sv1,a1)...(svn,an)
– episode 2: ...
– episode N: ...
Find:
– stationary segments in the data.
– an appropriate class of models m: sensor history → actions to fit the stationary segments.
Model class selection
m: sensor history → actions
– What prefix of the sensor history determines actions at time t?
– How to abstract the sensor space?
Response-equivalent partitions
Outline
– The NRL Navigation Task
– Challenges in modeling human learning
– Understanding the task: optimal player
– A hybrid model for human learning
– Understanding the task: reinforcement learner
– High-fidelity models for human learning
A near-optimal player
– A three-part deterministic controller solves the task!
– The only information required about the previous state is the last turn made.
– A very coarse discretization of the state space is needed: about 1000 states!
– Discovering this solution was not easy!
Part 1: Seek Goal
Condition: there is a clear sonar in the direction of the goal.
If the sonar in the direction of the goal is clear, follow it at a speed of 20, unless the goal is straight ahead, then travel at speed 40.
Part 2: Avoid MinePart 2: Avoid Mine
There is a clear sonar but not in the direction of the goalThere is a clear sonar but not in the direction of the goal
Turn at zero speed to orient with the first clear sonarcounted from the middle outward. If middle sonar is clear move forward with speed 20clear, move forward with speed 20.
Part 3: Gap Finder
Condition: there are no clear sonars.
If the last turn was non-zero, turn again by the same amount, else initiate a soft turn by summing the right and left sonars and turning in the direction of the lower sum.
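The three parts can be combined into one dispatch function. A minimal sketch, assuming 7 sonars indexed 0-6 with index 3 straight ahead, a goal-direction index, and "clear" meaning a reading above 50; the exact NRL sensor interface differs.

```python
CLEAR = 50  # threshold from the slides: a sonar > 50 counts as "clear"

def controller(sonars, goal_dir, last_turn):
    """sonars: list of 7 readings, index 3 is straight ahead.
    goal_dir: sonar index (0-6) pointing toward the goal.
    last_turn: turn issued on the previous step.
    Returns (turn, speed); turn is a signed offset from straight ahead."""
    clear = [s > CLEAR for s in sonars]
    # Part 1: Seek Goal -- a clear sonar in the goal direction.
    if clear[goal_dir]:
        speed = 40 if goal_dir == 3 else 20
        return (goal_dir - 3), speed
    # Part 2: Avoid Mine -- some sonar is clear, but not toward the goal.
    if any(clear):
        if clear[3]:
            return 0, 20
        # first clear sonar counted from the middle outward, at zero speed
        for offset in (1, 2, 3):
            for idx in (3 - offset, 3 + offset):
                if 0 <= idx < 7 and clear[idx]:
                    return (idx - 3), 0
    # Part 3: Gap Finder -- no clear sonars.
    if last_turn != 0:
        return last_turn, 0
    left, right = sum(sonars[:3]), sum(sonars[4:])
    return (-1 if left < right else 1), 0
```

The priority order (Part 1 before Part 2 before Part 3) mirrors the conditions on the slides: each part fires only when the previous part's condition fails.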
Performance of optimal player

Player                 Success %   Behavior
Opt. player            99.7%       Baseline
Opt. player – Part 3   79.9%       Oscillates/times out
Opt. player – Part 2   98.3%       Times out
Opt. player – Part 1   7.3%        Never gets to goal/times out
Part 1                 50.1%       Aggressive goal seeker/blows up

Mine density = 60. All results reported for 10,000 episodes.
Properties of optimal player
– Reflects the task decomposition found in human players. However, the sub-goals are very coupled, and this coupling is what is hard for humans to learn.
– Key to success:
  – state space partitioning; a threshold of 50 on sonar value makes the right compromise between succeeding, timing out and blowing up.
  – turning at zero speed.
  – turning consistently in a given direction to find the gap in the mines.
Outline
– The NRL Navigation Task
– Challenges in modeling human learning
– Understanding the task: optimal player
– A model for human learning
– Understanding the task: reinforcement learner
– High-fidelity models for human learning
A modeling approach
Abstraction of sensor space:
– view sensors through the prism of equivalence classes defined by a near-optimal policy for the task.
– extract the subject's Part 1 and Part 2 policies as probability distributions on actions, and the Part 3 policy as a hidden Markov model.
Advantage:
– deviations from optimal can be the basis for directed training of subjects.
Disadvantage:
– humans may not adopt anything close to the conceptualization needed for optimal play.
A model
[Diagram: sensors feed Part 1 (action distribution), Part 2 (action distribution) and Part 3 (HMM), which together produce the action]
A very small number of parameters is sufficient to capture a subject. Can acquire a subject model online!
Model extraction algorithm
To find stationary subsequences, segment the data using KL divergence on all distributions:
– chunk the data into uniform segments.
– for each segment, compute Part 1, 2 and 3 distributions.
– for each Part, compute the KL divergence between successive segments, and identify change points as those segment boundaries where the measure changes significantly.
Given a stationary subsequence of visual-motor data:
– learn Part 1 and Part 2 conditional action distributions from the data (a counting process).
– obtain action sequences in Part 3 and learn an HMM.
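The segmentation step above can be sketched as follows, on a single action stream; the chunk size and divergence threshold are illustrative assumptions, not the values used in the study.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over dicts mapping actions to probabilities,
    smoothed with a small epsilon to tolerate missing actions."""
    keys = set(p) | set(q)
    return sum(p.get(k, 0) * math.log((p.get(k, 0) + eps) / (q.get(k, 0) + eps))
               for k in keys if p.get(k, 0) > 0)

def action_distribution(actions):
    dist = {}
    for a in actions:
        dist[a] = dist.get(a, 0) + 1
    n = len(actions)
    return {a: c / n for a, c in dist.items()}

def change_points(actions, chunk=50, threshold=0.5):
    """Chunk the stream into uniform segments, estimate per-segment action
    distributions, and flag boundaries with large successive KL divergence."""
    chunks = [actions[i:i + chunk]
              for i in range(0, len(actions) - chunk + 1, chunk)]
    dists = [action_distribution(c) for c in chunks]
    return [i * chunk for i in range(1, len(dists))
            if kl_divergence(dists[i - 1], dists[i]) > threshold]
```

The full algorithm runs this test per Part (1, 2 and 3) and declares a change point only where the measure shifts significantly; this sketch shows the mechanics for one distribution.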
Evolution of gap finding strategy
– Subject Col: episodes 45-67 and episodes 68-90 on day 2.
– Subject learns to turn in place.
– HMM models for gap finding.
Pre-shift gap finding strategy
3-state HMM. Transitions: state 1 self-loop 0.79, 1→2 0.21; state 2 self-loop 0.995, 2→3 0.005; state 3 self-loop 1.
Emission probabilities (action × state):
        state 1   state 2   state 3
0       0.84      0.00      0.36
right   0.03      0.003     0.33
left    0.05      0.00      0.31
other   0.08      0.997     0.00
Post-shift gap finding strategy
3-state HMM. Transitions: state 1 self-loop 0.73, 1→2 0.27; state 2 self-loop 0.62, 2→3 0.38; state 3 self-loop 1.
Emission probabilities (action × state):
        state 1   state 2   state 3
0       0.82      0.05      0.37
right   0.024     0.00      0.553
left    0.025     0.95      0.07
other   0.131     0.00      0.00
Results

Pre-shift    Successes   Explosions   Timeouts   Total episodes
Col          0           12           11         23
Model        0           17           6          23

Post-shift   Successes   Explosions   Timeouts   Total episodes
Col          0           2            13         15
Model        0           4            11         15

A better fit than using C4.5.
Problems with model
– Very sensitive to the choice of equivalence classes; the near-optimal policy does not always provide the right classes to model subjects accurately.
– Fit of the learning curve worsens, especially for later days in training as the subject becomes an expert.
– However, it is still the best way to summarize, at a high level, the strategy adopted by the human.
Outline
– The NRL Navigation Task
– Challenges in modeling human learning
– Understanding the task: optimal player
– A hybrid model for human learning
– Understanding the task: reinforcement learner
– High-fidelity models for human learning
Machine learning of NRL task
– What does it take to get machines to learn the task?
– Can machine learners achieve higher levels of competence?
– How does the sample complexity of machine learning compare with humans?
– Can we use machine learning to improve human learning?
Reinforcement learning
[Diagram: the Learner sends an action to the Task and receives a reward and state, generating the sequence s1,a1,r1,s2,a2,r2,...]
Reinforcement learning
Representational hurdles:
– The state space has to be manageably small.
– Good intermediate feedback in the form of a non-deceptive progress function is needed.
Algorithmic hurdles:
– An appropriate credit assignment policy is needed.
– The sum-of-rewards assessment criterion is too slow to converge.
State space design
– Binary distinction on sonar: is it > 50?
– Six distinctions on bearing: 12, {1,2}, {3,4}, {5,6,7}, {8,9}, {10,11}.
– State space size = 2^7 * 6 = 768.
– Discretization of actions:
  – speed: 0, 20 and 40.
  – turn: -32, -16, -8, 0, 8, 16, 32.
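The discretization above can be sketched directly: 7 binary sonar bits plus a 6-way bearing class give 2^7 * 6 = 768 states. The encoding (bit packing, clock-style bearing of 1-12) is an illustrative assumption.

```python
# Bearing classes from the slide: {12}, {1,2}, {3,4}, {5,6,7}, {8,9}, {10,11}
BEARING_CLASS = {12: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 3,
                 8: 4, 9: 4, 10: 5, 11: 5}

def discretize(sonars, bearing):
    """sonars: 7 raw readings; bearing: clock position 1-12.
    Returns a state index in 0..767."""
    sonar_bits = 0
    for s in sonars:
        sonar_bits = (sonar_bits << 1) | (1 if s > 50 else 0)
    return sonar_bits * 6 + BEARING_CLASS[bearing]
```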
The progress function
r(s,a,s') = 0      if s' is a state where the player hits a mine
          = 1      if s' is a goal state
          = 0.5    if s' is a timeout state
          = 0.75   if s is a Part 3 state and s' is a Part 1 or Part 2 state
          = 0.5 + sum of sonars/1000           if s' is a Part 3 state
          = 0.5 + range/1000 + abs(bearing - 6)/40   otherwise
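The progress function transcribes directly into code. The dictionary-valued state with predicate fields (hit_mine, goal, timeout, part) is an assumed representation for illustration.

```python
def progress(s, s_next):
    """Progress reward r(s, a, s') from the slide; states are dicts with
    assumed fields: hit_mine, goal, timeout, part, sonars, range, bearing."""
    if s_next["hit_mine"]:
        return 0.0
    if s_next["goal"]:
        return 1.0
    if s_next["timeout"]:
        return 0.5
    if s["part"] == 3 and s_next["part"] in (1, 2):
        return 0.75
    if s_next["part"] == 3:
        return 0.5 + sum(s_next["sonars"]) / 1000
    return 0.5 + s_next["range"] / 1000 + abs(s_next["bearing"] - 6) / 40
```

Note how the function is locally informative: escaping Part 3 (0.75), or reducing range and centering the bearing, always moves the reward in the right direction, which is what makes it non-deceptive.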
Credit assignment policy
– Penalize the last action alone in a sequence which ends in an explosion.
– Penalize all actions in a sequence which ends in a timeout.
Simplification of value estimation
– Estimate the average local reward for each action in each state.
– A big change from learning the sum-of-rewards from each state.
Q(s,a) is the sum of rewards from s to the terminal state. Here we only maintain the local reward at state s.
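The contrast can be sketched as two update rules: a standard Q-learning backup, which bootstraps from the successor state, versus the simplified running average of the immediate local reward, which does not. The learning rate and discount are illustrative values.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Standard Q-learning backup: estimates the discounted sum of
    rewards from (s, a) by bootstrapping from the best successor value."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        r + gamma * best_next - Q.get((s, a), 0.0))

def local_update(R, N, s, a, r):
    """Simplified estimate: incremental average of the immediate
    progress reward for (s, a); no bootstrapping from successors."""
    N[(s, a)] = N.get((s, a), 0) + 1
    R[(s, a)] = R.get((s, a), 0.0) + (r - R.get((s, a), 0.0)) / N[(s, a)]
```

The local table converges much faster because each (s, a) entry depends only on its own samples, at the cost of ignoring long-range consequences; the slides address that gap by weighting with a global wins/(wins+timeouts) measure.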
Staged learning
– First learn turns alone, with speed supplied by the near-optimal player.
– Next learn both turn and speed.
– Differences between the two learners suggest new protocols for training humans.
Results of learning turns
[Figures: turn learner after 600 episodes, after 10,000 episodes, and a failure after 10K]
Learning of complete policy
– The estimate of average local reward is not a perfect substitute for the global sum-of-rewards.
– Make the action choice based on the estimated local reward weighted by the global measure of wins/(wins+timeouts) from that state.
– Optimistic initialization of Q values.
Results of learning complete policy
[Figures: full Q learner after 1500 episodes, after 10,000 episodes, and a failure after 10K]
Why learning takes so long
[Figure: effect of discretization]
Lessons from machine learning
– Why the task is hard: the most frequently occurring state occurs 45% of the time; all others occur less than 5% of the time.
– Long sequences of moves make credit assignment hard.
– Staged learning makes the task easier, and might help humans acquire the task more easily.
– Need for a locally non-deceptive reward function to speed up training. Can giving the progress function as hints to human players help?
Outline
– The NRL Navigation Task
– Challenges in modeling human learning
– Understanding the task: optimal player
– A hybrid model for human learning
– Understanding the task: reinforcement learner
– High-fidelity models for human learning
Direct models
How well can stateless stochastic models of the form m: sensors → P(actions) match subject learning curves?
– Associate with every observed sensor configuration the distribution of actions taken by the player at that configuration.
Advantages:
– no need to abstract the sensor space.
– model construction can be done in real time!
Surely, this can't work!
– There are 10^14 sensor configurations possible in the NRL Navigation task.
– However, there are between 10^3 and 10^4 of those configurations actually observed by humans in a training run of 600 episodes.
– Exploit sparsity in the sensor configuration space to build a direct model of the subject.
Model construction
– Segmentation of the episodic data from the start of training to the end of training.
– Fitting models of the form sensors → P(actions) on the stationary segments.
Model Derivative
dm/dt = KLdiv(Π(i - w, i), Π(i - w + 2s, i + 2s))
(the KL divergence between models fit on two sliding windows of width w, offset by 2s)
Empirical optimum: w = 20, s = 5.
Computed by Monte Carlo sampling (stabilizes after 5% of entries are sampled).
Model derivative for Cea
[Figures: Cea before the shift (episode 300) and after the shift (episode 320)]
Model derivative for Col
[Figure: model derivative with the gap strategy shift marked]
Model derivative for Hei
How humans learn
– Subjects have relatively static periods of action policy choice punctuated by radical shifts.
– Successful learners have conceptual shifts during the first part of training; unsuccessful ones keep trying till the end of the protocol!
How model is used
To compute the action a associated with the current sensor configuration s:
– take the 100 neighbors of s in the lookup table.
– compute a weighted average of the actions taken by these neighbors, OR
– perform locally weighted regression (LWR) on these 100 (s,a) pairs.
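A sketch of the weighted-average variant: find the k closest stored sensor configurations and average their actions with inverse-distance weights. Euclidean distance and the weighting scheme are illustrative choices, not necessarily those used in the study.

```python
import math

def predict_action(table, s, k=100):
    """table: list of (sensor_tuple, action) pairs observed so far.
    s: query sensor configuration.
    Returns the inverse-distance-weighted average action of the
    k nearest stored configurations."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    neighbors = sorted(table, key=lambda entry: dist(entry[0], s))[:k]
    weights = [1.0 / (1e-6 + dist(cfg, s)) for cfg, _ in neighbors]
    total = sum(weights)
    return sum(w * a for w, (_, a) in zip(weights, neighbors)) / total
```

Because the prediction is a convex combination of observed actions, this estimator can only interpolate; the LWR variant fits a local linear model instead, which is what lets it extrapolate (the failure mode discussed on the next slide).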
Evaluation protocol
– Same mine configurations as the subject.
– Model switched on segment boundaries.
– Cross-validation method on each segment:
  – Train on 9/10ths of the data.
  – Test on the left-out chunk.
Results: w.avg. vs. LWR
LWR is worse: why?
– LWR performs worse than w.avg., though data sparsity implies it should be otherwise.
– Reason: LWR extrapolates often, as shown by the timeout record.
Biased dimension elimination
Projecting out dimensions to force interpolation.
[Diagram: a candidate percept in percept dimensions 1 and 2 is projected onto a lower-dimensional subspace, yielding the projected candidate percept]
Results: use of bde with LWR
Richer models: internal state
– Remember the past k actions: f_k = <p_t, a_{t-1}, ..., a_{t-k}>
– k-gram models: experimented with k = 1, 2, 3.
Results: 1-gram models
Increasing state preferentially
– Add additional history information for sensor configurations "close to mines".
– Two-tier model:
  – tier 1: in far-mine configurations, use the 1-gram model.
  – tier 2: in close-mine configurations, use f_7 = <p_t, p_{t-1}, a_{t-1}, a_{t-3}, a_{t-5}, a_{t-7}>
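The two tiers amount to a feature constructor that switches on mine proximity: a 1-gram feature <p_t, a_{t-1}> far from mines, and the richer f_7 close to mines. `close_to_mine` is an assumed predicate, and percepts are treated as atomic values for illustration.

```python
def two_tier_features(percepts, actions, t, close_to_mine):
    """Build the feature vector for time t (requires t >= 7).
    Tier 2 (close to mines): f_7 = <p_t, p_{t-1}, a_{t-1}, a_{t-3}, a_{t-5}, a_{t-7}>.
    Tier 1 (far from mines): the 1-gram feature <p_t, a_{t-1}>."""
    if close_to_mine(percepts[t]):
        return (percepts[t], percepts[t - 1],
                actions[t - 1], actions[t - 3], actions[t - 5], actions[t - 7])
    return (percepts[t], actions[t - 1])
```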
Results: 2-tier models
[Figures: Subject Cea, day 5, segments 1-9: subject vs. model learning curves]
Comparison with global methods
Result summary
We can model subjects on the NRL task in real time, achieving excellent fits to their learning curves, using the 1-gram/bde-LWR/2-tier technique on the available visual-motor data stream.
Conclusions
– We have used inductive machine learning techniques to construct compact cognitive models in real time from the vast empirical visual-motor data gathered from subjects.
– Direct models offer the best approach to modeling human learning on the task.
– We have studied machine learners for the task and used the results to understand the complexities of the task.
– Machine learning the NRL task has pushed the science and engineering of reinforcement learning.
– Nice interplay between human and machine learning.
Conclusions (contd.)
– One of the first efforts in the cognitive science community to directly use objective visualmotor performance data to derive high-level strategy models on a complex task.
– A scalable solution that harnesses the power of new sensors and computing to change training protocols.
– New algorithms for detecting changepoints and building predictive stochastic models for massive, noisy, non-stationary, vector time series data.
Open questions
– How to design algorithms that can learn to include relevant aspects of sensor history to increase goodness of fit with the data?
– How would algorithms such as SVMs perform on this data? What class of kernels would be appropriate?
– Can DBNs be learned from this data? How do we represent/approximate the needed probability distributions?
More open questions
Building explanatory models:
– reconciling coarse HMM models with the bde-LWR models.
Conjecture: a fundamental problem?
– Explanatory models do not fit performance well.
– Performance models may not be very abstract; the task seems to need a series of local models rather than a single global model.
– Performance models can be used to modify training protocols online and to design directed lessons, because they identify sensor configurations where the subject has trouble with action choice.
Current work
– Training subjects to achieve higher competence by giving them access to their learning.
– Use of neuro-imaging to find the signature of strategy shifts in the brain.
Acknowledgements
– Diana Gordon and Sandra Marshall: human subject data collection.
– My students at Rice: Scott Griffin (undergraduate), Sameer Siruguri (graduate student).
– Program directors at ONR: Helen Gigley, Susan Chipman and Astrid Schmidt-Nielsen.
EEG setup
Data Sources: EEG data
– Sampled at 2 kHz into binary (BDF) format.
– Continuous recording.
– The game now sends 'markers' to the EEG acquisition software:
  – S1: game (episode) start marker.
  – S2: game (episode) end marker.
  – S3: for each timestep in the details file.
– Raw data file size: for 20 mins (50-100 games), 1.5-1.8 GB for 256 channels.
EEG data pre-processing
Using Analyzer (proprietary software in the Biomedical lab: only 1 machine):
– New reference.
– Filter signal above 50 Hz.
– Downsample to 512 Hz.
– Segment into individual games by markers.
EEG analysis
ERP vs long-term EEG
ERP: evoked response potentials
– More 'popular' research topic.
– Instantaneous stimuli evoke clear responses.
– Easier to detect and analyze.
Long-term EEG
– Aims to detect long-term trends in EEG data.
– Mainly clinical studies, e.g. seizures.
– Our task is in this category.
EEG: Computational research
Inverse problem: the blind source separation problem of locating the generating neural assemblies from external readings.
– Physical computing methods using electromagnetic physics.
– Statistical methods such as ICA.
Artifact and noise removal:
– Noise due to electrical activity or equipment limitations.
– Muscle movements, eye blinks and other 'unwanted' signals.
Brain dynamics:
– Which regions are active at any given time? Correlated activity?
– How does the activity change over time?
Some known results
Visualmotor task learning:
– At the beginning, high levels of activity in the pre-frontal cortex and other 'conscious thinking' areas.
– After training, activity shifts to the cerebellum and other 'involuntary' areas.
– Decrease in overall activity intensity.
Hypothesis:
– Initially, more of a conscious cognitive process (think about what you are doing!); after training, automatic (do it without thinking).
– Specialized small brain regions take over.
Some known results (contd.)
– Cognitive tasks: mostly working memory tasks and simple strategy tasks such as subtraction have been studied.
– Found to be mediated by multiple areas of the brain working in tandem.
– Again, shifts and changes in intensity depending on the level of achievement.
– Hypothesis: cognitive processes rely on large-scale integration of multiple brain regions: the concept of functional connectivity/functional networks.
Artifacts
What are artifacts?
– External (electrical) noise.
– Unwanted muscular movements, eye blinks or other unrelated brain activity.
In our task, artifacts are difficult to characterize: we expect high levels of cognitive activity, and muscular movements.
Artifact detection
– Using the 4-sigma rule: for each game in each channel, compute the median and standard deviation.
– Remove a game if any channel contains points exceeding 4 stddev from the median.
– However, such points are very rare: for any game, only 0.2% of points (on average) in the signal.
– The effect of these points is smoothed by averaging.
– Removing artifacts by this method may throw out a lot of valuable information.
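The 4-sigma rule above can be sketched as follows, assuming a game is represented as a mapping from channel names to sample lists (an illustrative layout, not the BDF format itself).

```python
import statistics

def has_artifact(game, n_sigma=4):
    """Flag a game if any channel contains samples more than n_sigma
    standard deviations away from that channel's median.
    game: dict mapping channel name -> list of samples."""
    for samples in game.values():
        med = statistics.median(samples)
        sd = statistics.pstdev(samples)
        if any(abs(x - med) > n_sigma * sd for x in samples):
            return True
    return False
```

Raising n_sigma to 6 reproduces the more permissive statistics on the later slide: fewer games are flagged, at the risk of keeping mild artifacts.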
Artifacts: example
4-sigma artifact statistics

Subject   #games without artifacts   Avg % artifacts in a game   Max % artifacts in a game
Vishal    6/237                      0.15                        3.6
Rosario   0/145                      0.2                         3.1
Norbert   0/153                      0.13                        2.9
6-sigma artifact statistics

Subject   #games without artifacts   Avg % artifacts in a game   Max % artifacts in a game
Vishal    150/237                    0.01                        1.37
Rosario   26/145                     0.08                        1.9
Norbert   33/153                     0.02                        1.26
Artifacts: suggestions
– Leave as is: hopefully the artifacts smooth out.
– Remove the beginning and end sections of longer games (20s+).
– Remove very short games (4-6s) altogether, since they may not contain enough information to detect learning.
Data analysis
– Statistical techniques: sliding window averaging variations; inconclusive.
– Time-frequency analysis:
  – Windowed Fourier transforms: average over frequency bands; inconclusive.
  – Wavelets: have been applied previously for ERPs to detect discontinuities. Can we apply them to our problem?
Functional network analysis
Method to find functional networks:
– For every channel pair ci and cj, compute mij, which measures the degree of similarity between the signals from the two channels: e.g., statistical correlation, mutual information, spectral coherence, etc.
– If mij > t, mark ci and cj as an edge (t is a threshold).
Drawbacks of the basic algorithm:
– Computationally intensive: every channel pair is considered.
– No temporal variance in functional networks is considered.
Spectral Coherence
Consider 2 signals x_i(t), x_j(t):
    M_ij = |C_ij(f)|^2 / (C_ii(f) C_jj(f))
– C_ij(f): cross-spectral density at frequency f, by Welch's method.
– C_ii(f): power spectrum of signal x_i(t) at f.
– Linear measure between 0 and 1.
– Matlab function 'cohere'.
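A minimal stand-in for Matlab's `cohere`: the magnitude-squared coherence estimated by averaging spectra over Welch-style segments (the segment length is an illustrative choice, and no tapering window is applied for brevity).

```python
import numpy as np

def coherence(x, y, nperseg=256):
    """Magnitude-squared coherence M(f) = |Sxy(f)|^2 / (Sxx(f) Syy(f)),
    with spectra averaged over consecutive segments of length nperseg.
    Note: with a single segment the estimate is identically 1, so
    averaging over several segments is essential."""
    segs = len(x) // nperseg
    Sxx = Syy = Sxy = 0
    for k in range(segs):
        xs = np.fft.rfft(x[k * nperseg:(k + 1) * nperseg])
        ys = np.fft.rfft(y[k * nperseg:(k + 1) * nperseg])
        Sxx = Sxx + np.abs(xs) ** 2
        Syy = Syy + np.abs(ys) ** 2
        Sxy = Sxy + xs * np.conj(ys)
    return np.abs(Sxy) ** 2 / (Sxx * Syy + 1e-12)
```

Identical signals give coherence 1 at every frequency; independent noise gives values that shrink toward 1/(number of segments), which is why the analysis on the next slide thresholds the averaged coherence per frequency band.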
Windowed coherence analysis
– Take a smaller subset of channels (64).
– For each channel pair:
  – slide a window over the signal.
  – compute the coherence for each window between the 2 channels.
  – average the coherence over frequency bands.
– For each frequency band, consider channel pairs with coherence above a threshold.
Preliminary Results
– KLD curves.
– Number of coherent channels (size of functional network) at different thresholds.
– Plots of coherent channels.
Observations
[Figures: (above) motor data shifts; (below) number of coherent channel pairs in the 4-8 Hz frequency band above coherence 0.6]
Observations
For a sharp peak (game 30), note the high coherences within the frontal and visual cortices, and the coherences between the two.
Observations
For a non-peak time (game 140), the coherences are fewer, and confined to specific regions.
Observations
– The number of coherent channel pairs increases sharply 15-20 trials before conceptual motor data shifts.
– Coherence peaks involve greater coherence between channels in the frontal cortex, and between the frontal and visual cortices.
– The left parietal cortex shows high activity, since the subjects are all right-handed.
Observations
– There are sharp changes in brain activity and in the size of functional networks, correlated with strategy derivative peaks.
– However, the relative intensities of the peaks are not necessarily the same. Why?
  – Use of an arbitrary empirical threshold.
  – The intensity of a shift in the cognitive process is not necessarily indicative of the intensity of the strategy shift.
  – Non-implemented cognitive shifts?
– Need a better metric for shifts in functional networks!
Mutual Information
How much information does one signal (random variable) provide about another?
    I(u,v) = h(u) - h(u|v) = h(v) - h(v|u)
    h(u) = -E[log2 P(u)]
    h(u|v) = -E[log2 P(u|v)]
– h(u): 'randomness' (entropy) of the random variable u.
– h(u|v): 'randomness' of u given the distribution of v.
– Non-linear measure.
Mutual Information (contd.)
– Has been applied to fMRI data.
– Easy to implement for discrete-valued signals, e.g. voxels in fMRI data.
– How to compute it for continuous-valued signals such as EEG?
– Matlab library by R. Moddemeijer.
– Discretization using a histogram.
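The histogram approach suggested above can be sketched as follows: discretize each continuous signal into bins, then compute I(u,v) from the joint histogram via I(u,v) = sum p(a,b) log2( p(a,b) / (p(a) p(b)) ), which equals h(u) - h(u|v). The bin count is an illustrative choice, not Moddemeijer's implementation.

```python
import math

def mutual_information(u, v, bins=8):
    """Histogram-based mutual information (in bits) between two
    continuous-valued signals of equal length."""
    def to_bins(x):
        lo, hi = min(x), max(x)
        width = (hi - lo) / bins or 1.0   # guard against constant signals
        return [min(int((xi - lo) / width), bins - 1) for xi in x]
    bu, bv = to_bins(u), to_bins(v)
    n = len(u)
    pu, pv, puv = {}, {}, {}
    for a, b in zip(bu, bv):
        pu[a] = pu.get(a, 0) + 1 / n
        pv[b] = pv.get(b, 0) + 1 / n
        puv[(a, b)] = puv.get((a, b), 0) + 1 / n
    return sum(p * math.log2(p / (pu[a] * pv[b]))
               for (a, b), p in puv.items())
```

Histogram estimators are biased upward for short signals; bin count should scale with the amount of data available per window.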
Other possible methods
– Linear regression analysis: Charles.
– Neural-network based clustering.
– Other statistical clustering methods.