Spike Sorting I:
Bijan Pesaran
New York University
Acknowledgements
• Ken Harris and Samar Mehta at the Neuroinformatics course, Woods Hole.
Aims
We would like to …
• Monitor the activity of large numbers of neurons simultaneously
• Know which neuron fired when
• Know which neuron is of which type
• Estimate our errors
Primate retinal ganglion cells, courtesy of the lab of Dr. E.J. Chichilnisky
THE PROBLEM: Multiple Neural Signals
[Figure: raw extracellular recording, Voltage (A/D Levels) vs. Time (sec). Panels: 10 sec trace; expanded view from 4.64 to 4.76 sec; 3 msec scale bar.]
THE GOAL: Spike Times of Single Neurons
[Figure: Raw Data from 4.5 to 5 sec (region from previous slide) passed through a Spike Detector, giving Neuron #1 Spikes and Neuron #2 Spikes. Horizontal axis: Time (sec).]
THE ‘GRADUATE STUDENT’ ALGORITHM
[Figure: Raw Data from 4.5 to 5 sec, Voltage (A/D Levels) vs. Time (sec). Threshold detector at 32.]
[Figure: Candidate Waveforms, Voltage (A/D Levels) vs. Time (msec).]
[Figure: Spike Height vs. Width Plot, Width (msec) vs. Height (A/D Levels).]
[Figure: Interspike Interval Histogram, # of Intervals vs. Time (msec).]
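The interspike interval histogram in this procedure can be sketched as follows. This is a minimal illustration, not the code used for the slides: the function names, the 1 msec bin width, and the 2 msec refractory period are all assumptions chosen for the example.

```python
import numpy as np

def isi_histogram(spike_times_s, bin_ms=1.0, max_ms=50.0):
    """Histogram of interspike intervals (msec) for one candidate neuron."""
    isis_ms = np.diff(np.sort(spike_times_s)) * 1000.0
    edges = np.arange(0.0, max_ms + bin_ms, bin_ms)
    counts, _ = np.histogram(isis_ms, bins=edges)
    return counts, edges

def refractory_violation_fraction(spike_times_s, refractory_ms=2.0):
    """Fraction of intervals shorter than an assumed refractory period."""
    isis_ms = np.diff(np.sort(spike_times_s)) * 1000.0
    if isis_ms.size == 0:
        return 0.0
    return float(np.mean(isis_ms < refractory_ms))

# Hypothetical spike times in seconds.
times = np.array([0.0, 0.0015, 0.011, 0.031])
counts, edges = isi_histogram(times)
violations = refractory_violation_fraction(times)
```

Many very short intervals suggest that the candidate unit contains spikes from more than one cell.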
A GENERAL FRAMEWORK
Locate Spikes → Preprocess Waveforms → Density Estimation → Spike Classification → Quality Measures
Extracellular Recording Hardware
• You can buy two types of hardware, allowing
• Wide-band continuous recordings
• Filtered, spike-triggered recordings
The Tetrode
• Four microwires twisted into a bundle
• Different neurons will have different amplitudes on the four wires
Raw Data
Spikes
High Pass Filtering
• Local field potential is primarily at low frequencies.
• Spikes are at higher frequencies.
• So use a high-pass filter. An 800 Hz cutoff works well.
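A minimal sketch of the high-pass step, assuming a wide-band recording and using SciPy. The Butterworth design, the filter order, the zero-phase filtering, and the 30 kHz sampling rate are assumptions for illustration; only the 800 Hz cutoff comes from the slide.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def highpass(raw, fs, cutoff_hz=800.0, order=4):
    """Zero-phase Butterworth high-pass filter along the last axis."""
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, raw, axis=-1)

# Synthetic example: a slow 10 Hz "LFP" plus a fast 3 kHz component.
fs = 30000.0                     # assumed sampling rate in Hz
t = np.arange(int(fs)) / fs      # one second of samples
lfp = 100.0 * np.sin(2 * np.pi * 10 * t)
fast = 10.0 * np.sin(2 * np.pi * 3000 * t)
filtered = highpass(lfp + fast, fs)
```

Filtering forward and backward (`sosfiltfilt`) avoids shifting spike peaks in time, which matters for the alignment step that follows.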
Filtered Data
Cell 1
Cell 2
Spike Detection
• Locate spikes at times of maximum extracellular negativity
• Exact alignment is important: is it on peak of largest channel or summed channels?
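One possible detector is sketched below. It assumes a fixed negative threshold and aligns each spike to the most negative sample of the summed channels; aligning to the peak of the largest channel is the other option raised above. The function name, threshold, and window lengths are hypothetical.

```python
import numpy as np

def detect_spikes(filtered, threshold, window=(10, 22)):
    """Detect spikes on a multichannel trace of shape (n_channels, n_samples).

    threshold is a negative number applied to the summed channels.
    Returns (peak_indices, waveforms), with waveforms shaped
    (n_spikes, n_channels, pre + post).
    """
    pre, post = window
    summed = filtered.sum(axis=0)
    below = summed < threshold
    # Samples where the summed trace first drops below threshold.
    onsets = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    peaks, waves = [], []
    for onset in onsets:
        # Look a short way past the crossing for the most negative sample.
        stop = min(onset + post, summed.size)
        peak = onset + int(np.argmin(summed[onset:stop]))
        if peaks and peak <= peaks[-1]:
            continue  # same event as the previous detection
        if peak - pre < 0 or peak + post > summed.size:
            continue  # too close to the edge to cut a full waveform
        peaks.append(peak)
        waves.append(filtered[:, peak - pre:peak + post])
    return np.array(peaks, dtype=int), np.array(waves)
```

Each returned waveform has its alignment point at sample index `pre`, so all spikes share the same time base.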
Data Reduction
• We now have a waveform for each spike, for each channel.
• Still too much information!
• Before assigning individual spikes to cells, we must reduce further.
Principal Component Analysis
• Create “feature vector” for each spike.
• Typically take the first 3 PCs for each channel.
• Do you use canonical principal components, or new ones for each file?
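A minimal sketch of building the feature vector with NumPy, assuming waveforms are stored as (spikes, channels, samples). It computes new principal components from the data at hand rather than using canonical ones; the function name is hypothetical.

```python
import numpy as np

def pca_features(waveforms, n_pcs=3):
    """Map (n_spikes, n_channels, n_samples) to (n_spikes, n_channels * n_pcs)."""
    n_spikes, n_channels, _ = waveforms.shape
    features = np.empty((n_spikes, n_channels * n_pcs))
    for ch in range(n_channels):
        x = waveforms[:, ch, :]
        x = x - x.mean(axis=0)  # center each time sample
        # Rows of vt are principal axes, sorted by decreasing variance.
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        features[:, ch * n_pcs:(ch + 1) * n_pcs] = x @ vt[:n_pcs].T
    return features
```

For a tetrode this gives 4 x 3 = 12 features per spike.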
“Feature Space”
Cluster Cutting
• Which spikes belong to which neuron?
• Assume a single cluster of spikes in feature space corresponds to a single cell
• Automatic or manual clustering?
Cluster Cutting Methods
• Purely manual – time consuming, leads to high error rates.
• Purely automatic – untrustworthy.
• Hybrid – less time consuming, lowest error rates.
Semi-automatic Clustering
How Do You Know It Works?
• We can split waveforms into clusters, but are we sure they correspond to single cells?
• Simultaneous intra- and extra-cellular recordings allow us to estimate errors.
• Quality measures allow us to guess errors even without simultaneous intracellular recording.
Intra-extra Recording
• Simultaneous recording with a wire tetrode and glass micropipette.
Intra-extra Recording
Extracellular waveform is approximately the negative derivative of the intracellular waveform
Bizarre Extracellular Waveshapes
Model Experiment
Two Types of Error
• Type I error (false positive) – Incorrect inclusion of noise, or spikes of other cells
• Type II error (false negative) – Omission of true spikes from cluster
• Which is worse? Depends on application…
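Given ground-truth spike times from the intracellular electrode, the two error rates can be estimated roughly as below. This is a sketch under assumptions: the 0.5 msec matching tolerance, the function name, and the choice to express each error as a fraction (of cluster spikes for Type I, of true spikes for Type II) are illustrative.

```python
import numpy as np

def error_rates(cluster_times, true_times, tolerance_s=0.0005):
    """Return (false_positive_fraction, false_negative_fraction)."""
    cluster_times = np.sort(np.asarray(cluster_times, dtype=float))
    true_times = np.sort(np.asarray(true_times, dtype=float))

    def has_match(a, b):
        # For each time in a, is some time in b within the tolerance?
        if a.size == 0 or b.size == 0:
            return np.zeros(a.size, dtype=bool)
        idx = np.searchsorted(b, a)
        left = np.abs(a - b[np.clip(idx - 1, 0, b.size - 1)])
        right = np.abs(b[np.clip(idx, 0, b.size - 1)] - a)
        return np.minimum(left, right) <= tolerance_s

    matched_cluster = has_match(cluster_times, true_times)
    matched_true = has_match(true_times, cluster_times)
    false_pos = 1.0 - matched_cluster.mean() if matched_cluster.size else 0.0
    false_neg = 1.0 - matched_true.mean() if matched_true.size else 0.0
    return float(false_pos), float(false_neg)

# Hypothetical spike times in seconds.
fp, fn = error_rates([0.1001, 0.2, 0.35], [0.1, 0.2, 0.3, 0.4])
```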
Manual Clustering Contest
Best Ellipsoid Error Rates
Find ellipsoid that minimizes weighted sum of Type I and Type II errors.
Must evaluate using cross-validation!
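The idea can be illustrated with a simplified sketch. It is not the full method: instead of optimizing the whole ellipsoid, it fixes the center and shape from the training-set mean and covariance of the true spikes and only searches over the radius. The 50/50 split, the equal weights, and all names are assumptions.

```python
import numpy as np

def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance of each row of x from center."""
    diff = x - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)

def weighted_error(inside, is_target, w_fp=1.0, w_fn=1.0):
    """Weighted sum of Type I and Type II error fractions."""
    fp = np.sum(inside & ~is_target) / max(np.sum(inside), 1)
    fn = np.sum(~inside & is_target) / max(np.sum(is_target), 1)
    return w_fp * fp + w_fn * fn, fp, fn

def cross_validated_ellipsoid_error(features, is_target, seed=0):
    """Choose the radius on one half of the spikes, report errors on the other."""
    rng = np.random.default_rng(seed)
    train = rng.random(len(features)) < 0.5
    test = ~train
    target_train = features[train & is_target]
    center = target_train.mean(axis=0)
    cov = np.cov(target_train, rowvar=False)
    d_train = mahalanobis_sq(features[train], center, cov)
    # Candidate radii: every squared distance seen in the training half.
    radii = np.sort(d_train)
    scores = [weighted_error(d_train <= r, is_target[train])[0] for r in radii]
    best_radius = radii[int(np.argmin(scores))]
    d_test = mahalanobis_sq(features[test], center, cov)
    _, fp, fn = weighted_error(d_test <= best_radius, is_target[test])
    return float(fp), float(fn)
```

Reporting errors on held-out spikes is what keeps the estimate from being optimistic: an ellipsoid tuned and scored on the same spikes will always look better than it is.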
Humans vs. B.E.E.R.
Waveshape Helps Separation
Why were human errors higher?
• To understand this, try to understand why clusters have the shape they do
• Simplest possibility: spike waveform is constant, cluster spread comes from background noise
• Are clusters multivariate normal?
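One way to probe the last question is sketched below, under the assumption that a cluster is an array of feature vectors. If the cluster were multivariate normal, the squared Mahalanobis distances of its spikes from the cluster mean would follow approximately a chi-square distribution with as many degrees of freedom as there are feature dimensions. The function name and the use of a Kolmogorov-Smirnov test are choices made for this example.

```python
import numpy as np
from scipy import stats

def normality_check(cluster_features):
    """Compare squared Mahalanobis distances against a chi-square distribution."""
    x = np.asarray(cluster_features, dtype=float)
    diff = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    # Degrees of freedom equal the number of feature dimensions.
    return stats.kstest(d2, "chi2", args=(x.shape[1],))

# Synthetic Gaussian cluster in 3 feature dimensions.
rng = np.random.default_rng(0)
result = normality_check(rng.normal(size=(2000, 3)))
```

A large test statistic (small p-value) indicates heavier or lighter tails than a Gaussian cluster would have. Because the mean and covariance are estimated from the same spikes, the chi-square reference is only approximate for small clusters.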
Problem: Overlapping Spikes
Problem: Cellular Synchrony
Problem: Bursting
Problem: Misalignment
• When you have a spike whose peak occurs at different times on different channels, it can align on either.
• This causes the cluster to be split in two.
Problem: Dimensionality
Manual clustering only uses 2 dimensions at a time
BEER measure can use all of them
“Semi-Automatic” Clustering
• Uses all dimensions at once
• Errors should be lower
• Still requires human input
Semi-automatic Performance
Software: KlustaKwik
• Mixture of Gaussians, unconstrained covariance matrices
• Speed is crucial
• CEM Algorithm – faster than EM
• Most probabilities not calculated
• Local maxima result in over- and under-clustering
• Split and merge features to tunnel out of local maxima
• Still requires supercomputer resources.
klustakwik.sourceforge.net
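The core idea of a classification EM (CEM) fit of a Gaussian mixture with unconstrained covariances can be sketched as below. This is an illustration, not KlustaKwik's code: it computes every spike-cluster probability, has no split and merge steps, and uses a farthest-point initialization that is an assumption made here for simplicity.

```python
import numpy as np

def cem_gaussian_mixture(x, n_clusters, n_iter=50):
    """Hard-assignment EM for a Gaussian mixture with full covariances."""
    n, d = x.shape
    # Assumed initialization: farthest-point seeding, then nearest seed.
    centers = [x[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[int(np.argmax(dist))])
    labels = np.argmin([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
    for _ in range(n_iter):
        log_post = np.full((n, n_clusters), -np.inf)
        for k in range(n_clusters):
            members = x[labels == k]
            if len(members) <= d:
                continue  # too few spikes for a full covariance; drop cluster
            mean = members.mean(axis=0)
            cov = np.cov(members, rowvar=False) + 1e-6 * np.eye(d)
            diff = x - mean
            maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
            _, logdet = np.linalg.slogdet(cov)
            # Log of (mixing weight times Gaussian density) for every spike.
            log_post[:, k] = (np.log(len(members) / n)
                              - 0.5 * (maha + logdet + d * np.log(2 * np.pi)))
        # Classification step: each spike goes to its most probable cluster.
        new_labels = log_post.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels
```

Because each spike is assigned to exactly one cluster, the parameter update reduces to per-cluster means and covariances, which is part of why CEM is cheaper than soft EM. Like EM, it can stop at a local maximum.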
Software: Klusters
Recluster Feature
Ergonomic Design
Auto/Cross correlograms
Grouping Assistant
Waveforms
Timecourse
klusters.sourceforge.net
Cluster Quality Measures
• Would like to automatically detect which cells are well isolated.
• BEER measure needs intracellular data, which we don’t have in general.
• Will define two measures that only use extracellular data.
Isolation Distance
Size of ellipsoid within which as many spikes belong to our cluster as not
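A sketch of this measure, assuming the common formulation in which the ellipsoid is defined by the cluster's own mean and covariance and its size is reported as a squared Mahalanobis distance. With n spikes in the cluster, the value returned is the squared distance of the n-th closest spike that is not in the cluster. The function names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance of each row of x from center."""
    diff = x - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)

def isolation_distance(features, in_cluster):
    """Squared Mahalanobis radius containing as many other spikes as cluster spikes."""
    cluster = features[in_cluster]
    others = features[~in_cluster]
    n_cluster = len(cluster)
    if len(others) < n_cluster:
        return np.nan  # not defined when there are too few other spikes
    d2 = mahalanobis_sq(others, cluster.mean(axis=0), np.cov(cluster, rowvar=False))
    return float(np.sort(d2)[n_cluster - 1])
```

Larger values mean that spikes from other clusters lie farther from this cluster.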
L_ratio
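$$L_{\mathrm{ratio}} = \frac{1}{N_{\mathrm{cluster}}} \sum_{i \in \mathrm{noise}} \left(1 - \mathrm{cdf}_{\chi^2}\left(D_i^2\right)\right)$$
where $D_i^2$ is the squared Mahalanobis distance of noise spike $i$ from the cluster center.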
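A sketch of computing L_ratio, assuming the usual definition: sum, over all spikes not in the cluster, one minus the chi-square CDF of the spike's squared Mahalanobis distance from the cluster center, then divide by the number of spikes in the cluster. The degrees of freedom are taken to be the number of feature dimensions; function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance of each row of x from center."""
    diff = x - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)

def l_ratio(features, in_cluster):
    """Sum of chi-square tail probabilities of non-cluster spikes, per cluster spike."""
    cluster = features[in_cluster]
    noise = features[~in_cluster]
    d2 = mahalanobis_sq(noise, cluster.mean(axis=0), np.cov(cluster, rowvar=False))
    # chi2.sf is the survival function, equal to 1 - cdf.
    return float(np.sum(chi2.sf(d2, df=features.shape[1])) / len(cluster))
```

Smaller values mean fewer non-cluster spikes sit close to the cluster.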
False Positives and Negatives
Which Measure to Use?
• Isolation distance correlates with false positive error rates
– Measures distance to other clusters
• L_ratio correlates with false negative error rates
– Measures number of spikes near cluster boundary
Conclusions
• Automatic clustering will save time and reduce errors.
• Errors can be as low as ~5%.
• Quality measures give you a feeling of how bad your errors are.
Room for Improvement
• Make it faster
• Improved spike detection and alignment
• Quality measures that estimate % error
• Fully automatic sorting
• Resolve overlapping spikes
(Listed from easy to hard.)