
Improved Models for Analysis of Motor-Cortical Signals

Alex L. Rojas∗, Anthony E. Brockwell† and Andrew B. Schwartz‡

Abstract

In recent years, devices capable of linking the brain to the external world have been developed. Such devices directly measure the output of multiple neurons simultaneously. One obvious

application of this technology is in helping those who are movement impaired; in theory it would

be possible to implant such a device in the brain, and use its output to control movement of

a robotic prosthetic limb, for instance. However, to achieve this goal, it is necessary to first

understand the relationship between movement and neural signals. We consider data collected

from rhesus monkeys in experiments, and propose a model for describing this relationship. The

model generalizes several previously considered models from the neuroscience literature, and allows individual neurons (1) to encode different kinematic variables, and (2) to have more general spike count distributions. The proposed model is used to decode cortical signals recorded for 258

neurons in the ventral premotor cortex of rhesus monkeys during an ellipse-drawing task, and

we demonstrate that relative to the existing models, a substantial reduction in mean squared

error is achieved.

Keywords: Bayesian decoding, neuron, spike, model selection

1 Introduction

The search for accurate descriptions of the relationship between spike trains and movement is a

fundamentally important problem in neuroscience. Not only could these descriptions, formalized

as probabilistic models, give us a better understanding of the underlying processes in the neural

∗ Department of Statistics, and Center for Automatic Learning and Discovery, Carnegie Mellon University.
† Department of Statistics, Carnegie Mellon University.
‡ Department of Neurobiology, University of Pittsburgh.


system, but they have potential applications in areas including development of neurally-controlled

prosthetic devices, restoration of functionality lost due to neural injury, etc.

The relationship between movement and spike trains is often referred to as neural coding, and

is typically specified by a model which gives the probabilistic (spiking) behavior of a neuron as

a function of current, past, and potentially, future movement. The use of probability is natural

since cortical signals are typically described by stochastic models due to their inherent variability.

The converse problem of determining motion, given observed spike trains, is referred to as neural

decoding, and has been heavily studied in the literature. There are basically two groups of decoding

algorithms in the literature: the “population vector” (PV) approach (Georgopoulos et al., 1983;

Taylor et al., 2002; Salinas and Abbott, 1994) and Bayesian decoding algorithms (Brockwell et al.,

2004; Brown et al., 1998; Gao et al., 2002; Shoham et al., 2003; Wu et al., 2002). The former

algorithms are easily implemented, but they are unable to account for dynamic behavior of the

underlying signal of interest (Brockwell et al., 2004), they lack a clear probabilistic model, and

they provide no estimate of uncertainty. On the other hand, Bayesian decoding offers a broad and

flexible model-based approach. It starts with an encoding model, and then “turns the problem

around,” obtaining (via standard recursions involving Bayes’ rule) distributions of kinematic vari-

ables conditional on observations of spike trains. In the past, this approach has not been widely

used, due to the complexity of its implementation. Brockwell et al. (2004) and Gao et al. (2002) considered the use of a recently developed numerical method known as the “particle filter” (Doucet et al., 2001), which makes Bayesian decoding practical even when the encoding models of cortical activity are non-linear or non-Gaussian (see Section 2). Brockwell et al. (2004) performed a simulation study in which they showed how

the particle filter algorithm can lead to more accurate decoding than the decoding obtained by PV

and “optimal linear estimation” (OLE, Salinas and Abbott, 1994) methods. When applied to real

data, the particle filter also outperformed the PV and OLE methods.

The reliance of the recursive Bayesian approach on the model can be either a strength or a

weakness. When the model is correct, results are optimal (in the sense that conditional means

minimize mean-squared-error), but when the model is not correct, bias and/or noise may be introduced into decoded trajectories. Therefore, assessment of model goodness-of-fit is critical in such

a decoding scheme. (This point is recognized by others, such as Paninski et al., 2004, who are also

working to develop better encoding models.) In this paper we study the appropriateness of some of

the models that have been used in the literature, with particular reference to factors that have been

recently included in neuron firing models, such as position, acceleration (Wu et al., 2002) and speed

(Brockwell et al., 2004). In addition, we describe and analyze the conditional distributions of the

cortical signals. This leads us to propose a general model, which is more flexible than many of those

previously considered. The model we propose differs from that of Paninski et al. (2004) in that

ours allows neuron firing rates to depend on arbitrary subsets of kinematic variables, with a more

flexible functional form. On the other hand, the model of Paninski et al. (2004) has the advantage

of being able to capture more correlation between spiking in different neurons. We also propose

several diagnostics and tests which can be used for model-fitting, and finally, we examine whether

or not these new factors help us to decode hand-velocity from cortical signals more accurately.

2 Motor Cortex Data

To study the model specification, we use a center→out task (Moran and Schwartz, 1999a; Taylor

et al., 2002) and an ellipse-drawing task (Reina and Schwartz, 2003).

2.1 Description of Experimental Setup and Data

The basic methodology for the center→out task in two dimensions is described by Moran and

Schwartz (1999a). In the three-dimensional (3D) case, three rhesus monkeys (Macaca mulatta)

were trained to perform hand movements in a computer-generated 3D virtual environment. The

monkeys could not see their actual hand movements, but rather saw two spheres: the motion of the

first sphere was controlled by the monkey’s hand position, and the second sphere was a stationary

target sitting on a vertex of an imaginary cube, creating a total of eight possible targets. Each

monkey performed the same task five times, for each randomly selected target. The cortical activity

(spikes) and the hand position, as functions of time, were recorded for 258 neurons in the ventral

premotor cortex (area 6V), for five repetitions of the experiment. These firing counts were recorded

one neuron at a time as opposed to hundreds of neurons at a time (Black et al., 2003; Paninski et al.,

2004). The total times recorded were divided into 100 equal-sized bins, in such a way that the actual


reaching task was performed between time bins 31 and 71. Two of the three monkeys carried out the

experiment with both arms (one after the other), and the third monkey did it only with its left arm.

After conducting the center→out task, the ellipse-drawing task was conducted by the same

three monkeys using the same arms. Specifically, each monkey continually traced five elliptical

loops in the x-y plane, with the z-component capturing small deviations of the hand from the x-y

plane. During the task, cortical activity and hand position were recorded for the same 258 neurons

recorded in the center→out task. As before, the duration of each of the five loops was divided into

100 equal-size bins and the whole experiment was conducted five times. A more detailed description

of the ellipse-drawing task can be found in Reina and Schwartz (2003).

Let $N^{(i)}_t$ be the spike count for the $i$th neuron in the $t$th time bin and $N^{(i)} = \sum_{t=1}^{T} N^{(i)}_t$ be the total spike count recorded for the $i$th neuron, where $T$ is the number of time bins. Let $p^{(i)}_t = (p^{(i)}_{x,t}, p^{(i)}_{y,t}, p^{(i)}_{z,t})'$ be the smoothed vector of hand positions in the $(x, y, z)$ directions for the $i$th neuron at time $t$. This smoothing was performed using a cubic spline, and it was done to facilitate calculating hand velocity and acceleration. Let $v^{(i)}_t = (v^{(i)}_{x,t}, v^{(i)}_{y,t}, v^{(i)}_{z,t})'$ be the vector of velocities at time $t$ in the $(x, y, z)$ directions for the $i$th neuron, calculated by taking the first derivative of the smoothed hand positions. Let $a^{(i)}_t = (a^{(i)}_{x,t}, a^{(i)}_{y,t}, a^{(i)}_{z,t})'$ be the vector of accelerations at time $t$ in the $(x, y, z)$ directions for the $i$th neuron, calculated by taking the second derivative of the smoothed hand positions.

Define the vector of kinematic variables for the $i$th neuron as

$$x^{(i)}_{t,k} = \left(p'^{(i)}_t, v'^{(i)}_t, a'^{(i)}_t, s^{(i)}_t\right)$$

with $s^{(i)}_t$ the speed (or tangential velocity) defined as

$$s^{(i)}_t = \|v^{(i)}_t\|_2 = \sqrt{\left(v^{(i)}_{x,t}\right)^2 + \left(v^{(i)}_{y,t}\right)^2 + \left(v^{(i)}_{z,t}\right)^2}.$$
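The construction of velocity, acceleration and speed from binned positions can be sketched as follows. This is only an illustration under stated assumptions: central differences stand in for the derivative of the cubic smoothing spline used in the paper, the bin width `dt` is a hypothetical parameter, and the function names are ours.

```python
import math

def central_diff(values, dt):
    """First derivative by central differences (one-sided at the two ends)."""
    n = len(values)
    out = []
    for t in range(n):
        lo, hi = max(t - 1, 0), min(t + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out

def kinematics(px, py, pz, dt=1.0):
    """Return velocity, acceleration (per axis) and speed (per bin)."""
    vel = [central_diff(c, dt) for c in (px, py, pz)]
    acc = [central_diff(c, dt) for c in vel]
    # speed is the Euclidean norm of the velocity vector in each bin
    speed = [math.sqrt(vx ** 2 + vy ** 2 + vz ** 2) for vx, vy, vz in zip(*vel)]
    return vel, acc, speed

# Toy trajectory: constant velocity (3, 4, 0) per bin, so the speed is 5.
T = 5
px = [3.0 * t for t in range(T)]
py = [4.0 * t for t in range(T)]
pz = [0.0] * T
vel, acc, speed = kinematics(px, py, pz)
```

With real data the positions would first be smoothed, since differencing raw positions amplifies measurement noise.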

Finally, we define λ(i)(·) as the observed spike intensity function, for the ith neuron. This

function is obtained by smoothing the observed spike counts using loess (Cleveland and Devlin,

1988). (Note that λ(i)(·) represents the real-valued rate of firing of the ith neuron, which is different

from the integer-valued count process N(i)t .)


2.2 Features of the data

Our first step toward a better encoding of cortical signals is to characterize the variability of spike counts. We start by checking whether or not the spike train of each neuron follows an inhomogeneous Poisson process, since this has been the main assumption in earlier studies (Brockwell et al., 2004; Gao et al., 2003). We design a goodness-of-fit test based on the time-rescaling theorem (see Appendix A and Brown et al., 2001), the probability integral transform (Casella and Berger, 2002), the Kolmogorov-Smirnov (KS) test and the Quantile-Quantile (QQ) plot (Johnson and Kotz, 1970). The idea is to examine whether or not, after time-rescaling so that the process is homogeneous, the inter-spike times have exponential distributions. Details are given in Appendix B.

The KS test and the QQ plot for neuron #156 are displayed in Figure 1. If the spike counts

for neuron #156 were a realization of an inhomogeneous Poisson process with intensity function

λ(i)(·), then we would expect the solid line in Figure 1 to be close to a 45-degree line and stay

within the dashed and dotted lines, according to the KS test and the QQ plot. As can be seen

in Figure 1, there is lack of fit for lower quantiles (below 0.42 for the QQ plot and 0.30 for the

KS test); therefore, the Poisson assumption appears not to be valid for this neuron. The behavior

found in neuron #156 is also found in many other neurons, but not all of them; hence, based on

the KS test and the QQ plot, we conclude that the Poisson assumption is not reasonable for all

neurons and a new model should be proposed. We propose to use the Bernoulli distribution as an

alternative, since the spike counts for most of the neurons that lack fit using the KS test and the

QQ plot typically have at most one spike in each bin.
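The goodness-of-fit idea used in this section can be sketched in a few lines. The sketch assumes a piecewise-constant intensity (one rate per unit-width bin) rather than the loess-smoothed intensity of the paper, uses the asymptotic 5% KS critical value 1.36/sqrt(n), and all function names are hypothetical.

```python
import math

def cumulative_intensity(rates, t):
    """Integral from 0 to t of a piecewise-constant intensity (unit-width bins)."""
    full = min(int(math.floor(t)), len(rates))
    total = sum(rates[:full])
    if full < len(rates):
        total += rates[full] * (t - full)
    return total

def rescaled_uniforms(spike_times, rates):
    """Time-rescale the inter-spike intervals and map them to (0, 1)."""
    big = [cumulative_intensity(rates, u) for u in sorted(spike_times)]
    z = [b - a for a, b in zip([0.0] + big[:-1], big)]
    # under the Poisson hypothesis the z are Exp(1), so 1 - exp(-z) is uniform
    return sorted(1.0 - math.exp(-zk) for zk in z)

def ks_statistic(u_sorted):
    """Largest gap between the empirical CDF and the uniform CDF."""
    n = len(u_sorted)
    return max(max((k + 1) / n - u, u - k / n) for k, u in enumerate(u_sorted))

def ks_rejects(spike_times, rates, crit=1.36):
    """True if the size-0.05 KS test rejects the Poisson hypothesis."""
    u = rescaled_uniforms(spike_times, rates)
    return ks_statistic(u) > crit / math.sqrt(len(u))

# A perfectly regular spike train is too regular to be Poisson.
rates = [1.0] * 50
regular = [float(k) for k in range(1, 51)]
rejected = ks_rejects(regular, rates)
```

Plotting the sorted transformed values against uniform quantiles gives the corresponding QQ plot.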

3 Modeling the Data

There have been a variety of proposed models to describe the variability of neuron firing and the

neuron’s expected firing rate. They mainly differ in the set of kinematic variables included, and

the way the response variable is handled. The first proposed model (Ashe and Georgopoulos,

1994) uses only velocity and avoids the use of counts by using the second order variance stabilizing

transformation for a Poisson variable (Anscombe, 1948). In other words, if Nt ∼ Poisson(λ) then

$$\mathrm{Var}_\lambda\left(\sqrt{N_t + \tfrac{3}{8}}\right) \approx \frac{1}{4} \cdot \left(1 + \frac{1}{16 \cdot \lambda^2}\right)$$
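As a numerical check on this variance-stabilizing approximation, the sketch below computes the variance of the transformed count by direct summation over the Poisson probability mass function and compares it with the approximation. The truncation point `n_max` and the function names are our own choices.

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability of n events, computed on the log scale."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def var_anscombe(lam, n_max=400):
    """Variance of sqrt(N + 3/8) for N ~ Poisson(lam), by truncated summation."""
    m1 = m2 = 0.0
    for n in range(n_max + 1):
        p = poisson_pmf(n, lam)
        y = math.sqrt(n + 0.375)
        m1 += p * y
        m2 += p * y * y
    return m2 - m1 * m1

def var_approx(lam):
    """Second-order approximation (1/4) * (1 + 1 / (16 * lam^2))."""
    return 0.25 * (1.0 + 1.0 / (16.0 * lam ** 2))

exact_20 = var_anscombe(20.0)
approx_20 = var_approx(20.0)
```

For moderate rates both values are close to 1/4, which is why the transformed counts can be treated as having roughly constant variance.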


[Figure 1 plot: Empirical Quantiles (vertical axis) versus Model Quantiles (horizontal axis), both ranging from 0 to 1.]

Figure 1: KS test (dotted line) and QQ plot (dashed line) for the interspike times of neuron #156.

The fact that the curve exceeds the dashed line indicates that a test of size 0.05 would reject the

null hypothesis of a Poisson distributed spike count.

Brockwell et al. (2004) used a different approach; they worked directly with spike counts and

made use of generalized linear models (McCullagh and Nelder, 1989) to estimate the parameters in

the model. Their model includes velocities and speed as covariates, as well as time lags. Shoham

et al. (2003) define their own distribution to describe the variability of the spike counts. They

called it the “Normalized-Gaussian discrete” distribution. Their proposal includes the use of linear

combinations of position, velocity, acceleration plus the use of a 4th degree polynomial to model

the expected firing rate. Their model does not include time lags. Gao et al. (2003) proposed the use

of generalized additive models (Hastie and Tibshirani, 1990), allowing a more flexible description

of the expected firing rate as a function of position, velocity and acceleration.

We can generalize the models mentioned before as follows. For modeling the expected firing rate

we adopt the approach of Moran and Schwartz (1999a), Wu et al. (2003) and Shoham et al. (2003), including in the model other kinematic variables such as position and acceleration, as well as speed.

These inclusions are made given that different neurons encode for different kinematic variables.

For example, some encode primarily velocity while others encode position (Shoham et al., 2003).

Given that following neural discharge there is a time lag before movement occurs (or in some cases,


the movement occurs before associated neural discharge), we also include a time lag term τ (i). In

summary, for the ith neuron we propose the model

$$N^{(i)}_t \sim p(n^{(i)}_t \mid r^{(i)}_t), \qquad (1)$$

$$r^{(i)}_t = f^{(i)}(\beta_i \cdot x^{(i)}_{t+\tau^{(i)}}), \qquad (2)$$

where $p(n^{(i)}_t \mid r^{(i)}_t)$ denotes the conditional distribution of $n^{(i)}_t$ given $r^{(i)}_t$, with $r^{(i)}_t$ the expected firing rate. The function $f^{(i)}$ (namely, the inverse of what is called the ‘link’ function in the generalized linear model literature) represents the non-linear relationship between the expected firing rate $r^{(i)}_t$ and a linear combination of kinematic variables. Finally, $x^{(i)}_t$ is a subset of $[p_t, v_t, a_t, s_t]^T$ for the $i$th neuron. As a special case, for example, if $p(\cdot \mid r^{(i)}_t)$ is Poisson, $f(\cdot) = \exp(\cdot)$, and $\log r^{(i)}_{t-\tau} = \beta_0 + \beta_1 v_{x,t} + \beta_2 v_{y,t} + \beta_3 v_{z,t} + \beta_4 s_t$, we obtain the model used by Brockwell et al. (2004).
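To make Equations (1) and (2) concrete, the following sketch evaluates the expected firing rate for the two inverse link functions considered in this paper and applies a time lag. It assumes that each covariate vector carries a leading 1 for the intercept and that lags falling outside the recorded bins are simply dropped; the function names are hypothetical.

```python
import math

def expected_rate(beta, x, family):
    """r = f(beta . x), with f = exp (Poisson) or the logistic function (Bernoulli)."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    if family == "poisson":
        return math.exp(eta)
    if family == "bernoulli":
        return 1.0 / (1.0 + math.exp(-eta))
    raise ValueError("family must be 'poisson' or 'bernoulli'")

def lagged_rates(beta, covariates, tau, family):
    """Map each bin t to f(beta . x_{t+tau}), for bins where t + tau is in range."""
    T = len(covariates)
    return {t: expected_rate(beta, covariates[t + tau], family)
            for t in range(T) if 0 <= t + tau < T}

# Special case with covariates (1, vx, vy, vz, s), as in the velocity-plus-speed model.
covariates = [(1.0, 0.3 * t, 0.4 * t, 0.0, 0.5 * t) for t in range(5)]
beta = (0.0, 0.0, 0.0, 0.0, 0.0)
rates = lagged_rates(beta, covariates, tau=2, family="poisson")
```

With all coefficients equal to zero, the Poisson rate is 1 and the Bernoulli probability is 0.5 in every bin, which is a quick sanity check on the two link functions.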

As mentioned before, some neurons seem to have a variability described by the Poisson distri-

bution, while other neurons appear to have more Bernoulli-like distributions. Therefore we include

these two possibilities for $p(\cdot \mid r^{(i)}_t)$, but in order to use the Bernoulli distribution, we would need, in a few cases, to transform the spike counts $N_t$ as follows:

$$N^{(i)*}_t \equiv \min(1, N^{(i)}_t). \qquad (3)$$

Assuming that the conditional mean r(i)t or its transformation through (f (i))−1(·) is a linear

function of the hand kinematics, we use generalized linear models (GLMs, McCullagh and Nelder,

1989) to estimate the parameters βi for each neuron. The parameters in a GLM can be estimated

by the maximum likelihood method using iterative re-weighted least squares (IRLS). To carry out

our analysis we make use of the glm function in Splus.

It is important to mention that the IRLS method is not guaranteed to converge to the global maximum of the likelihood function, and it failed to do so for some of the models we fitted in this study. This may be due to the sparsity of spike counts for some neurons; any neuron with this problem was excluded from our study.
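The IRLS iteration can be illustrated with a hand-written Poisson regression with one covariate. This is a minimal sketch, not the `glm` implementation used in the paper: it assumes a log link, an intercept plus a single covariate, and solves the 2-by-2 weighted least squares system in closed form.

```python
import math

def irls_poisson(x, y, n_iter=25, tol=1e-10):
    """Fit log E[y] = b0 + b1 * x by iteratively re-weighted least squares."""
    b0, b1 = math.log(max(sum(y) / len(y), 1e-8)), 0.0
    for _ in range(n_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            z = eta + (yi - mu) / mu      # working response
            s_w += mu                     # working weight is mu for the log link
            s_wx += mu * xi
            s_wxx += mu * xi * xi
            s_wz += mu * z
            s_wxz += mu * xi * z
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Two groups with mean counts 2 and 6, so the fit should give exp(b0) = 2, exp(b1) = 3.
x = [0.0, 0.0, 1.0, 1.0]
y = [1, 3, 4, 8]
b0, b1 = irls_poisson(x, y)
```

When counts are very sparse, the weighted system can become nearly singular (`det` close to zero), which is one way the convergence problems mentioned above can show up.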


3.1 Automatic per-Neuron Model Choice

The main goal of this study is to find a “better” model, or set of models, for encoding cortical

signals. We do so by first taking the generalization given in Equations (1) and (2), and exploring

different selections for the conditional distribution of spike counts given the expected firing rate,

p(·|rt); the link function f−1; the time-lag, τ ; and the set of kinematic variables, x. These selections

have to be made for all available neurons, so we need to define an automatic procedure to select

each component of our model.

Based on the preliminary diagnostics considered in Section 2, our choices of allowable distri-

butions p(·|r(i)t ) in Equation (1) are the Bernoulli distribution and the Poisson distribution, with

f(·), the inverse of the corresponding link function, that is, f(x) = exp(x) for the Poisson model

and f−1(x) = logit(x) for the Bernoulli model. Choice of p(·|r(i)t ) for each neuron is described in

Section 3.1.1, while choice of τ and X is explained in Sections 3.1.2 and 3.1.3, respectively.

3.1.1 Distribution Selection

To determine whether the Bernoulli or Poisson distribution is more appropriate, in a manner which

is easily automated, we use the following approach.

First, for the $i$th neuron, let $\lambda^{(i)}_l$ and $\lambda^{(i)}_u$ be the minimum and the maximum, respectively, of $\lambda^{(i)}(\cdot)$ in the interval $(0, b^{(i)} \cdot T)$. Let us divide the interval $[\lambda^{(i)}_l, \lambda^{(i)}_u]$ into $M$ bins, $\{B_1, \dots, B_M\}$, each bin with approximately the same number of elements. We define $\mu^{(i)}_k$ as the mean of the smoothed values in the $k$th bin, $k = 1, \dots, M$, and $S^{(i)2}_k$ as the conditional sample variance of the spike counts for all time bins $t$ such that $\lambda^{(i)}_t \in B_k$. Based on $\mu^{(i)}_k$ and $S^{(i)2}_k$ we define the statistic $C^{(i)}$ as

$$C^{(i)} = \sum_{k=1}^{M} w^{(i)}_k \left\{ \left(\mu^{(i)}_k (1 - \mu^{(i)}_k) - S^{(i)2}_k\right)^2 - \left(\mu^{(i)}_k - S^{(i)2}_k\right)^2 \right\} \qquad (4)$$

where

$$w^{(i)}_k = \frac{\sqrt{\mu^{(i)}_k} + 1/M}{\sum_{k=1}^{M} \sqrt{\mu^{(i)}_k} + 1},$$

for $k = 1, \dots, M$.

The statistic $C^{(i)}$ compares the squared deviation of the sample conditional variance from the variance expected under the Bernoulli model, $\mu(1-\mu)$, with its squared deviation from the variance expected under the Poisson model, $\mu$, giving more weight to deviations at high estimated firing rates. The reason for this weighting is that for small values of $\mu$, the Poisson and Bernoulli conditional variance functions are similar. From the definition of $C^{(i)}$, we would expect positive values of $C^{(i)}$ for Poisson-like neurons and negative values of $C^{(i)}$ for Bernoulli-like neurons. A formal distribution selection procedure using this statistic is given in Figure 2.

Input: an integer $K > 0$, the intensity function for the $i$th neuron $\lambda^{(i)}(\cdot)$, and the $T$ bins where the cortical activity was recorded.

(i) For each of the $T$ bins, generate a Poisson sample of size one, with rate $\lambda^{(i)}_t$, $t = 1, \dots, T$.

(ii) Compute the statistic $C^{(i)}$ (see Equation (4)) based on the sample generated in (i); name it $C^{(i)}_P$.

(iii) For each of the $T$ bins, generate a Bernoulli sample of size one, with probability $\lambda^{(i)}_t$, $t = 1, \dots, T$.

(iv) Compute the statistic $C^{(i)}$ based on the sample generated in (iii); name it $C^{(i)}_B$.

(v) Repeat (i)-(iv) $K$ times, obtaining $C^{(i)}_{P,1}, \dots, C^{(i)}_{P,K}$ and $C^{(i)}_{B,1}, \dots, C^{(i)}_{B,K}$.

(vi) Choose the Poisson distribution if $\hat f_p(C^{(i)}_*) > \hat f_b(C^{(i)}_*)$, where $C^{(i)}_*$ is the value of the statistic computed from the observed spike counts, $\hat f_b$ is the kernel density estimate (Silverman, 1986) of $f(C^{(i)}_*)$ based on $C^{(i)}_{B,1}, \dots, C^{(i)}_{B,K}$, and $\hat f_p$ is the kernel density estimate of $f(C^{(i)}_*)$ based on $C^{(i)}_{P,1}, \dots, C^{(i)}_{P,K}$. Otherwise choose the Bernoulli distribution.

Output: selected density function

Figure 2: Procedure to select p(·|r(i)t )
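The selection procedure of Figure 2 can be sketched as below. Several details are our assumptions rather than the paper's specification: bins are formed by sorting the intensities into M equal-count groups, the weights are normalized to sum to one, the kernel density estimate is Gaussian with a normal-reference bandwidth, and the intensities are taken as given (in the paper they come from a loess smooth). Each bin needs at least two time points.

```python
import math
import random

def c_statistic(counts, lam, M=5):
    """Statistic C of Equation (4) from spike counts and intensities."""
    order = sorted(range(len(lam)), key=lambda t: lam[t])
    size = len(order) / M
    mu, s2 = [], []
    for k in range(M):
        idx = order[int(round(k * size)):int(round((k + 1) * size))]
        n_k = [counts[t] for t in idx]
        mean_n = sum(n_k) / len(n_k)
        mu.append(sum(lam[t] for t in idx) / len(idx))
        s2.append(sum((n - mean_n) ** 2 for n in n_k) / (len(n_k) - 1))
    denom = sum(math.sqrt(m) for m in mu) + 1.0
    total = 0.0
    for m, v in zip(mu, s2):
        w = (math.sqrt(m) + 1.0 / M) / denom
        total += w * ((m * (1 - m) - v) ** 2 - (m - v) ** 2)
    return total

def rpois(rate, rng):
    """Poisson draw by multiplying uniforms (adequate for small rates)."""
    limit, k, p = math.exp(-rate), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def kde(sample, x):
    """Gaussian kernel density estimate at x, normal-reference bandwidth."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    h = 1.06 * max(sd, 1e-12) * n ** (-0.2)
    dens = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    return dens / (n * h * math.sqrt(2 * math.pi))

def select_distribution(counts, lam, K=200, M=5, seed=0):
    """Steps (i)-(vi): simulate C under both models, compare densities at C*."""
    rng = random.Random(seed)
    c_star = c_statistic(counts, lam, M)
    c_p = [c_statistic([rpois(l, rng) for l in lam], lam, M) for _ in range(K)]
    c_b = [c_statistic([1 if rng.random() < l else 0 for l in lam], lam, M)
           for _ in range(K)]
    return "poisson" if kde(c_p, c_star) > kde(c_b, c_star) else "bernoulli"

# Toy input: intensities rising from 0.1 to about 0.9, with 0/1 spike counts.
lam = [0.1 + 0.008 * t for t in range(100)]
toy_rng = random.Random(1)
counts = [1 if toy_rng.random() < l else 0 for l in lam]
choice = select_distribution(counts, lam)
```

Because the comparison is made between two simulated reference distributions, the result depends on K and on the random seed when the two densities at the observed value are close.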


3.1.2 Time Lags

The physical relationship between hand movement and neuron firing implies the existence of a

(possibly small) time lag between the firing and the movement, referred to as τ (i) in Equation (2).

To identify “the” lag for each of the neurons, we fit each model for different values of lag, τ =

−40, . . . ,−1, 0, 1, . . . , 40, and choose the value that maximizes the statistic

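$$D^*_\tau = \eta_\tau(D_\tau) = \eta_\tau\left(\frac{D_{0,\tau} - D_{1,\tau}}{D_{0,\tau}}\right), \qquad (5)$$

in the center→out task and $D_\tau$ in the ellipse-drawing task. $D_{0,\tau}$ is the null deviance

$$D_{0,\tau} = 2\left(l(N^{(i)}_t; N^{(i)}_t) - l(\bar N^{(i)}; N^{(i)}_t)\right),$$

where $\bar N^{(i)}$ is the mean spike count. $D_{1,\tau}$ is the deviance

$$D_{1,\tau} = 2\left(l(N^{(i)}_t; N^{(i)}_t) - l(\hat r^{(i)}_t; N^{(i)}_t)\right),$$

where $\hat r^{(i)}_t$ is the estimated expected tuning function. Finally,

$$\eta_\tau(x) = \begin{cases} x & \text{if } -20 < \tau < 20, \\ \left(2 - \frac{|\tau|}{20}\right) \cdot x & \text{otherwise.} \end{cases} \qquad (6)$$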

For models where the Poisson distribution is used, the log-likelihood is

$$l(\hat r^{(i)}_t; N^{(i)}_t) = \sum_{t=1}^{T} \left[ N^{(i)}_t \log(\hat r^{(i)}_t) - \hat r^{(i)}_t - \log(N^{(i)}_t!) \right] \qquad (7)$$

with i = 1, . . . , 258 and T = 100. For models where the Bernoulli distribution is used, we assume

that for each time bin, we record whether or not spike counts during this time are observed.

Therefore, the log-likelihood can be written as:

$$l(\hat r^{(i)}_t; N^{(i)}_t) = \sum_{t=1}^{T} \left[ N^{(i)*}_t \log \hat r^{(i)}_t + (1 - N^{(i)*}_t) \log(1 - \hat r^{(i)}_t) \right] \qquad (8)$$

with $N^{(i)*}_t = \min\{N^{(i)}_t, 1\}$, $i = 1, \dots, 258$ and $T = 100$.

Note that we do not use the likelihoods of each model to compare them because for each lag

there is a different amount of data. On the other hand, we are not comparing nested models;


therefore, we do not use the deviance $D_1$ alone. Instead, we compare the “gain” in including the predictive variables $p^{(i)}_t$, $v^{(i)}_t$, $a^{(i)}_t$, and $s^{(i)}_t$ in our model for different lag values. The function $\eta$ is

introduced to eliminate large values of τ in the center→out task because, for these values, a high

value of Dτ is probably an artifact of the small amount of data left for high time lags.
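The lag-selection criterion can be sketched as follows for the Poisson case. The sketch assumes that fitted rates for each lag are already available (for example from a GLM fit), and the helper names are ours; the deviances follow Equations (5) to (7).

```python
import math

def poisson_loglik(rate, counts):
    """Poisson log-likelihood of Equation (7)."""
    return sum(n * math.log(r) - r - math.lgamma(n + 1)
               for r, n in zip(rate, counts))

def saturated_poisson_loglik(counts):
    """Log-likelihood with rate equal to each count (n * log n taken as 0 at n = 0)."""
    return sum((n * math.log(n) if n > 0 else 0.0) - n - math.lgamma(n + 1)
               for n in counts)

def deviance_gain(counts, fitted):
    """D_tau = (D0 - D1) / D0 for one lag."""
    mean = sum(counts) / len(counts)
    sat = saturated_poisson_loglik(counts)
    d0 = 2.0 * (sat - poisson_loglik([mean] * len(counts), counts))
    d1 = 2.0 * (sat - poisson_loglik(fitted, counts))
    return (d0 - d1) / d0

def eta(tau, x):
    """Penalty of Equation (6): shrink the gain linearly to 0 as |tau| goes from 20 to 40."""
    return x if -20 < tau < 20 else (2.0 - abs(tau) / 20.0) * x

def best_lag(gain_by_lag, penalize=True):
    """Lag maximizing D*_tau (penalize=True) or D_tau (penalize=False)."""
    score = {tau: (eta(tau, d) if penalize else d)
             for tau, d in gain_by_lag.items()}
    return max(score, key=score.get)

# Hypothetical gains for three lags: the large-lag value is discounted by eta.
gains = {-5: 0.08, 10: 0.10, 38: 0.15}
lag_center_out = best_lag(gains, penalize=True)
lag_ellipse = best_lag(gains, penalize=False)
```

In this toy input the unpenalized criterion picks the extreme lag, while the penalized one prefers the moderate lag, which is the behavior the function η is meant to produce.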

3.1.3 Kinematic Variables

Our model does not fix the subset of kinematic variables that we should include, as most current models do. This added flexibility generally leads to better models since, as observed in the literature (see, e.g., Shoham et al., 2003; Moran and Schwartz, 1999a), some neurons appear to encode position while others encode velocity. However, with this added flexibility, we have to come up with a procedure to decide which subset of kinematic variables to include, a task that is not trivial.

We use the following procedure for selection of kinematic variables. For each value of τ =

−40, . . . ,−1, 0, 1, . . . , 40, the GLM is fitted and the significant variables are recorded. Then, we

find the proportion of times that each kinematic variable was significant. Finally, we take the

subset of kinematic variables that were significant more than κ% of the times. We choose κ = 40

in an ad hoc manner. If no variable is significant more than 40% of the time, we just take the variable that was significant more often than the others. Note that if a variable is significant in only one or two directions (x, y, or z), we include all the variables of the same “class”; for instance, if position in the x-direction is selected, position in the y- and z-directions is also included. We are now able to specify our automatic model choice procedure, which appears in Figure 3.
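The variable-selection rule can be sketched as follows. It is assumed here that a class counts as significant at a given lag when at least one of its components is significant; the coefficient labels (`px`, `vx`, and so on) and function names are hypothetical. The two dictionaries reuse the percentages reported for neuron #156 in Table 1.

```python
CLASSES = {
    "position": ("px", "py", "pz"),
    "velocity": ("vx", "vy", "vz"),
    "acceleration": ("ax", "ay", "az"),
    "speed": ("s",),
}

def class_percentages(significant_by_lag):
    """Percentage of lags at which some component of each class is significant."""
    n_lags = len(significant_by_lag)
    return {name: 100.0 * sum(any(c in sig for c in comps)
                              for sig in significant_by_lag) / n_lags
            for name, comps in CLASSES.items()}

def select_classes(percent_significant, kappa=40):
    """Keep classes significant more than kappa% of the time, else the single best."""
    chosen = [name for name, pct in percent_significant.items() if pct > kappa]
    if not chosen:
        chosen = [max(percent_significant, key=percent_significant.get)]
    return chosen

center_out = {"position": 32, "velocity": 62, "acceleration": 19, "speed": 5}
ellipse = {"position": 49, "velocity": 46, "acceleration": 16, "speed": 9}
chosen_center_out = select_classes(center_out)   # ['velocity']
chosen_ellipse = select_classes(ellipse)         # ['position', 'velocity']
```

Selecting a class brings in all of its directional components, in line with the rule that position in the y- and z-directions is included whenever position in the x-direction is.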

3.1.4 Example: The selection procedure applied to neuron #156

To get a better understanding of the procedure described in Figure 2, we apply it to neuron #156.

We start by deciding between the Poisson and the Bernoulli distributions. The kernel density

estimates of f(c) obtained by applying the procedure described in Section 3.1.1 appear in Fig-

ure 4(a). As can be seen in Figure 4(a), we select the Bernoulli assumption for neuron #156

because 0.2446 = f̂p(C∗) = f̂p(−0.5024) < f̂b(−0.5024) = 2.6934. Note that this result was ob-

tained using the center→out task, but exactly the same decision is made using the ellipse-drawing

task.


1. Choose p(·|r(i)t ) using the procedure defined in Section 3.1.1.

2. Select the set of significant kinematic variables using the procedure defined in Section 3.1.3,

given that p(·|r(i)t ) has been chosen to be Bernoulli or Poisson.

3. Estimate time lag, $\tau$:

Select $\hat\tau = \arg\max_{\tau \in \{-40, \dots, 40\}} D^*_\tau$ for the center→out task, or $\hat\tau = \arg\max_{\tau \in \{-40, \dots, 40\}} D_\tau$ for the ellipse-drawing task, where

$$D^*_\tau = \eta(D_\tau) = \eta\left(\frac{D_{0,\tau} - D_{1,\tau}}{D_{0,\tau}}\right)$$

is calculated from the GLM with the $p(\cdot \mid r^{(i)}_t)$ chosen in step 1 and the set of kinematic variables selected in step 2.

Figure 3: Selection procedure

Given that we have selected p(·|r(i)t ), we follow the approach described in Section 3.1.3 and

evaluate the percentage of times a kinematic variable is significant for values of τ = −40, . . . , 40.

These percentages appear in Table 1 for both the center→out and ellipse-drawing tasks. From the results in Table 1, velocity is included for both data sets, while position is included only for the ellipse-drawing data set.

Table 1: Percentage of times a kinematic variable is significant in the GLM, for neuron #156 and values of τ = −40, . . . , 40

Variable        center→out task    ellipse-drawing task
Position        32                 49
Velocity        62                 46
Acceleration    19                 16
Speed           5                  9


[Figure 4(a): Kernel density estimates of f(c) based on $C^b_1, \dots, C^b_K$ (solid line) and $C^p_1, \dots, C^p_K$ (dashed line). Horizontal axis: C, with C* marked; vertical axis: Density.]

[Figure 4(b): Values of $D^*_\tau$ for the center→out task and $D_\tau$ for the ellipse-drawing task. Horizontal axis: τ, from −40 to 40.]

Figure 4: Distribution and lag selection for neuron #156

Finally, having selected the kinematic variables, we choose the lag value. As mentioned before,

we choose τ such that the value of D∗τ , in the case of the center→out task, or Dτ for the ellipse-

drawing task, is maximized. Figure 4(b) displays the values of D∗τ and Dτ , plus a smoothing curve

of these values. The lags selected are 20 for the center→out task and 1 for the ellipse-drawing

task. Notice how Dτ is always greater than D∗τ . On one hand, the difference is due to the fact

that Dτ was calculated using position and velocity, while D∗τ was calculated using only velocity.

On the other hand, the gain of including kinematic variables in the GLM for the ellipse-drawing

task is greater than in the center→out case. It is also clear that the shape of the curve differs between the two tasks; this difference and the fact that different sets of kinematic variables appear significant for these tasks lead us to conclude that neurons may not always encode the same information, and that their time lags do not remain constant across tasks or sets of kinematic variables. (The fact that time lags vary was noted by Moran and Schwartz, 1999b, who observed that time lags depend on the inherent curvature of a particular task.)


4 Results

4.1 Summary of the automatic selection procedure for the Rhesus data

Fifty-seven percent and seventy percent of neurons were classified as Bernoulli distributed, 43%

and 30% were classified as Poisson distributed in the center→out task and ellipse-drawing task,

respectively. These distributions of p(·|r(i)t ) establish a majority of Bernoulli-like neurons for both

tasks. This situation is very interesting given that the most commonly used distribution is Poisson.

On the other hand, the agreement between the chosen p(·|r(i)t ) for each task is 68%; therefore, we

may expect to go through the process of identifying p(·|r(i)t ) for each new task.

Regarding time lags, the differences between the two tasks are quite substantial. As can be seen in Figure 5, the lags estimated from the two tasks seem not to be related at all. Furthermore, no relationship between $p(\cdot \mid r^{(i)}_t)$, monkey, arm and time lag was found.

[Figure 5 plot: Time Lag, Center-out task (vertical axis, −30 to 30) versus Time Lag, Ellipse-drawing task (horizontal axis, −40 to 40).]

Figure 5: Scatter plot of Time Lags estimated using the ellipse-drawing task vs. Time Lags

estimated using the center→out task

Finally, the proportion of neurons for which each kinematic variable is significant appears in Table 2. Notice how more neurons encode position than velocity in the center→out task, while in the ellipse-drawing task, the number of neurons encoding velocity is a little higher than the number encoding position. Another important feature is that only a few neurons seem to encode speed; this variable will therefore rarely be part of the model, in contrast to the model proposed by Brockwell et al. (2004).

Table 2: Percentage of neurons for which each kinematic variable is significant

Variable    center→out task (%)    ellipse-drawing task (%)
p           34                     39
v           23                     43
a           23                     27
s           25                     10

4.2 Decoding

As mentioned before, there are basically two groups of decoding algorithms. In this paper, we

focus on Bayesian Decoding using particle filtering, since it has been shown that particle filtering

performs better than other methods (Brockwell et al., 2004; Gao et al., 2003). This method is

briefly explained in Appendix C.

4.2.1 The model

The unobserved kinematic variables (hidden states) $\{X_t, t = 0, 1, 2, \dots\}$ are modeled as a Markov process with initial distribution $p(X_0)$ and transition equation $p(X_t \mid X_{t-1})$. The cortical signals $\{N_t, t = 0, 1, 2, \dots\}$ are assumed to be conditionally independent given the process $\{X_t, t = 0, 1, 2, \dots\}$, with conditional distribution $p(N_t \mid X_t)$. In summary, the basic model is described by

$$p(X_0)$$

$$p(X_t \mid X_{t-1}) \quad \text{for } t \geq 1 \qquad (9)$$

$$p(N_t \mid X_t) \quad \text{for } t \geq 1. \qquad (10)$$


The fundamental objective is to find the posterior distributions $p(X_t \mid N_0, N_1, \dots, N_t)$, $t = 1, 2, \dots$. These posterior distributions are typically impossible to find analytically, so numerical approximations are needed. One numerical approximation is the Particle Filter (PF), also known as the “Bootstrap Filter” (Doucet et al., 2001), which is applicable to a large class of models and is easy to implement. This algorithm can be seen in Figure 8.
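A bootstrap particle filter for a model of the form (9)-(10) can be sketched in a few lines. The sketch is deliberately reduced and is not the configuration used in the paper: the state is one-dimensional and follows a Gaussian random walk, there is a single Poisson neuron with rate exp(b0 + b1 x), the initial distribution is standard normal, and all parameter names are hypothetical.

```python
import math
import random

def bootstrap_filter(counts, b0, b1, q_sd, n_particles=500, seed=0):
    """Return the filtered posterior mean of the state at each time step."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # draws from p(X_0)
    means = []
    for n in counts:
        # propagate each particle through the transition p(X_t | X_{t-1})
        particles = [x + rng.gauss(0.0, q_sd) for x in particles]
        # weight by the Poisson likelihood p(N_t | X_t); log(n!) cancels out
        logw = [n * (b0 + b1 * x) - math.exp(b0 + b1 * x) for x in particles]
        top = max(logw)
        w = [math.exp(l - top) for l in logw]
        total = sum(w)
        means.append(sum(wi * x for wi, x in zip(w, particles)) / total)
        # resample with probability proportional to the weights
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means

# High counts should pull the decoded state up, zero counts should pull it down.
high = bootstrap_filter([20] * 10, b0=0.0, b1=1.0, q_sd=0.5)
low = bootstrap_filter([0] * 10, b0=0.0, b1=1.0, q_sd=0.5)
```

With many neurons, the log-weights of the individual neurons are summed before normalization, which follows from the conditional independence assumption stated above.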

4.2.2 Decoding of cortical signals

Given that our final goal is to improve the decoding of cortical signals, we analyze the decoding

accuracy of hand velocity using the model proposed by Brockwell et al. (2004) (M1) and the model

we proposed (M2), by comparing the integrated square error (ISE) of the reconstruction of the

original velocity, in the fourth loop of the first repetition, for the ellipse-drawing task. Since each

neuron was recorded separately, there are 258 different velocities. We average all of them to get

the actual velocity. Finally, we assume that the vector of velocities follows a random walk.

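For M1, all neurons use the same set of kinematic variables and all assume the Poisson distribution, so the state-space model is as follows:

State equation

$$v^{ed}_t = v^{ed}_{t-1} + \varepsilon_t \qquad (11)$$

Observation equation

$$N_t = \begin{pmatrix} N^{(1)}_t \\ \vdots \\ N^{(n)}_t \end{pmatrix}, \quad N^{(i)}_t \mid x_t \sim \mathrm{Poisson}(f_i(x_t)) \qquad (12)$$

where $\varepsilon_t \sim N_3(0, Q)$ and $f_i(x_t) = \exp\left\{\hat\beta_0 + \hat\beta_1 v^{(i)}_x + \hat\beta_2 v^{(i)}_y + \hat\beta_3 v^{(i)}_z + \hat\beta_4 s^{(i)}_t\right\}$. The estimates $\hat\beta_j$, $j = 0, \dots, 4$, are found by IRLS using the first three loops of the ellipse-drawing task.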

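For M2 the state-space model is

State equation

$$\begin{pmatrix} p_t \\ v_t \\ a_t \end{pmatrix} = \begin{pmatrix} I & \delta I & 0 \\ 0 & I & 0 \\ 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} p_{t-1} \\ v_{t-1} \\ a_{t-1} \end{pmatrix} + \begin{pmatrix} 0 \\ \varepsilon_t \\ \varepsilon_t / \delta \end{pmatrix} \qquad (13)$$

Observation equation

$$N_t = \begin{pmatrix} N^{(1)}_t \\ \vdots \\ N^{(n)}_t \end{pmatrix}, \quad N^{(i)}_t \mid x_t \sim p_i(f_i(x_t)) \qquad (14)$$

where $p_i$, $f_i$, and $x^{(i)}_t$ are chosen using the selection procedure defined in Section 3.1.3, $\varepsilon_t \sim N_3(0, Q)$, and

$$f_i(x_t) = \frac{\exp\{\hat\beta_i x^{(i)}\}}{1 + \exp\{\hat\beta_i x^{(i)}\}}, \quad \text{if } p_i \text{ is Bernoulli,} \qquad (15)$$

$$f_i(x_t) = \exp\{\hat\beta_i x^{(i)}\}, \quad \text{if } p_i \text{ is Poisson.} \qquad (16)$$

Here $\hat\beta_i = (\hat\beta_0, \dots, \hat\beta_p)$ is the GLM estimate of $\beta_i$ in M2, obtained using the first three loops of the ellipse-drawing task, with $p$ being the column dimension of $x$.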
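One step of the M2 state equation can be written out explicitly, which makes the role of δ visible: position advances by δ times the previous velocity, velocity follows a random walk, and acceleration is the velocity increment divided by δ. The sketch assumes, for simplicity, a diagonal covariance Q with common standard deviation `q_sd`; this and the function name are our assumptions.

```python
import random

def m2_step(p, v, delta, q_sd, rng):
    """One transition of Equation (13) for 3-D position, velocity, acceleration."""
    eps = [rng.gauss(0.0, q_sd) for _ in range(3)]
    p_new = [pi + delta * vi for pi, vi in zip(p, v)]   # p_t = p_{t-1} + delta * v_{t-1}
    v_new = [vi + ei for vi, ei in zip(v, eps)]         # v_t = v_{t-1} + eps_t
    a_new = [ei / delta for ei in eps]                  # a_t = eps_t / delta
    return p_new, v_new, a_new

rng = random.Random(0)
p0, v0 = [0.0, 0.0, 0.0], [1.0, 2.0, 0.0]
p1, v1, a1 = m2_step(p0, v0, delta=0.1, q_sd=0.05, rng=rng)
```

Note that the previous acceleration does not enter the transition, which matches the zero block in the last row and column of the transition matrix.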

Notice that the estimates of $\beta$ are based only on the ellipse-drawing task, as opposed to including information from the center→out task (Brockwell et al., 2004). This seems more natural since it is not clear how to carry information over from one task to another. Moreover, as mentioned before, neurons seem to encode different information for each task. Finally, even though we have

a total of 258 neurons, we choose only the set of neurons such that the vector of parameters

(β1, . . . , βp) is significantly different from the p-dimensional vector of zeros, for a total of 118

neurons.

4.2.3 Decoding results

Results from applying the particle filter (PF) with the two models are shown in Figure 6, and the ISE and the maximum squared error (maxSE) are given in Table 3. The reconstruction based on the PF algorithm and M2 is visually closer to the true trajectory than the one using M1, and its accuracy, in terms of mean-squared error, is improved by a factor of approximately three. This result may be interpreted, as was done by Brockwell et al. (2004), as saying that approximately three times as many neurons would be needed when using the PF algorithm and M1 to obtain the same reconstruction error achieved when using the PF algorithm and M2.

Table 3: Decoding errors summarized across time, for M1 and M2

           M1        M2
ISE        4.6864    1.3997
maxSE      0.1350    0.0374
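The "factor of approximately three" can be checked directly from the entries of Table 3. The short snippet below does this arithmetic; the variable names are chosen for the example only.

```python
# Values copied from Table 3.
ise = {"M1": 4.6864, "M2": 1.3997}
max_se = {"M1": 0.1350, "M2": 0.0374}

# Ratio of M1 error to M2 error for each summary.
ise_ratio = ise["M1"] / ise["M2"]
max_se_ratio = max_se["M1"] / max_se["M2"]

print(f"ISE ratio M1/M2:   {ise_ratio:.2f}")
print(f"maxSE ratio M1/M2: {max_se_ratio:.2f}")
```

Both ratios fall between 3 and 4 (roughly 3.35 for ISE and 3.61 for maxSE), which is consistent with the statement in the text.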

The improvement obtained by using M2 seems mainly due to the explicit specification of the

variability among neurons in terms of the hand kinematics that they encode and the probabilistic

and deterministic relationships they have with the spike trains.

Finally, it is worth mentioning that the PF using M2 is less sensitive to the estimated time lag

than the PF using M1. The reason for this phenomenon is unknown, but we suspect that it is due

to the fact that we fix the subset of kinematic variables in M1.

5 Conclusions

We have seen (confirming the results of a number of prior studies) that neurons in the ventral premotor cortex do not all behave alike. Not only do they exhibit different time lags, but they also encode different information relevant to hand movement, and their spike count distributions differ in variability. This heterogeneity is also found when different tasks are performed. By accounting for it and making model distinctions accordingly for each neuron, we are able to achieve more accurate decoding of hand position from motor-cortical signals. Further decoding improvement may be obtained by using more flexible functions in our model, for instance, splines or polynomials. In addition, following in the spirit of Paninski et al. (2004), it may be useful to adapt our model to handle correlation between neurons (beyond the correlation induced by neurons having similar “preferred directions”).

[Figure 6 panels. A, B, C: position trajectories in the x and y, z and y, and z and x coordinates, each showing Actual, PF M1, and PF M2. D: "Actual and PF Velocity, Model 1" (velocity components x, y, z against time bin). E: "Actual and PF Velocity, Model 2" (velocity components x, y, z against time bin). F: "Squared Errors in Velocity" (SE against time bin, for M1 and M2).]

Figure 6: Decoding of x, y, and z components of the 4th loop of the ellipse-drawing task, based on 258 neurons in the ventral premotor cortex. A, B, C: actual position trajectory along with decoded trajectories; D, E: actual velocity and decoded velocity for the PF algorithm, over the 100 time bins in the 4th loop using M1 and M2, respectively; and F: squared errors of decoded velocities for the PF algorithm using M1 and M2.

6 Acknowledgements

This work was partially supported by NSF Grant IIS-083148.


A Time-Rescaling Theorem.

Let $0 < s_1 < s_2 < \ldots < s_n < T$ be a realization from a point process with a conditional intensity function $r(t)$ satisfying $r(t) > 0$ for all $t \in (0, T]$. Define the transformation

$$\Lambda(s_j) = \int_0^{s_j} r(u)\,du, \tag{17}$$

for $j = 1, \ldots, n$, and assume $\Lambda(t) < \infty$ with probability one for all $t \in (0, T]$. Then the $\Lambda(s_j)$ form a Poisson process with unit rate.

Because the $\Lambda(s_j)$ form a Poisson process with unit rate, the rescaled inter-spike times $\tau_k = \Lambda(s_{k+1}) - \Lambda(s_k)$, for $k = 1, \ldots, n-1$, are independent with $\tau_k \sim \mathrm{Exponential}(1)$ (see e.g., Taylor and Karlin, 1998). By the probability integral transform, $z_k = 1 - \exp(-\tau_k)$ then follows a uniform distribution, that is, $z_k \sim \mathrm{Uniform}(0, 1)$ for $k = 1, \ldots, n-1$.
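The transformation above can be sketched numerically as follows. This is a minimal illustration, not the code used in this paper: the function names are hypothetical, the intensity is passed in as an ordinary Python function, and the integral in Equation (17) is approximated by a midpoint Riemann sum with an assumed step size `dt`.

```python
import math


def rescale_spike_times(spike_times, intensity, dt=1e-3):
    """Return Lambda(s_j), the integral of the intensity from 0 to each spike time.

    spike_times must be sorted in increasing order. The integral is
    approximated with a midpoint rule of step dt (an assumption of this sketch).
    """
    rescaled = []
    total, t = 0.0, 0.0
    for s in spike_times:
        # Full steps of width dt up to the spike time.
        while t + dt <= s:
            total += intensity(t + 0.5 * dt) * dt
            t += dt
        # Remaining partial step, ending exactly at the spike time.
        total += intensity(0.5 * (t + s)) * (s - t)
        t = s
        rescaled.append(total)
    return rescaled


def rescaled_uniforms(spike_times, intensity, dt=1e-3):
    """Return z_k = 1 - exp(-tau_k), where tau_k are the rescaled inter-spike times."""
    lam = rescale_spike_times(spike_times, intensity, dt)
    taus = [b - a for a, b in zip(lam, lam[1:])]
    return [1.0 - math.exp(-tau) for tau in taus]
```

For a constant intensity the rescaling simply multiplies each spike time by the rate, which gives an easy sanity check before applying the functions to a fitted, time-varying intensity.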

B Kolmogorov-Smirnov test and Quantile-Quantile plot

We use the Kolmogorov-Smirnov test and the Quantile-Quantile plot to assess the goodness-of-fit of the spike data model that assumes an inhomogeneous Poisson process. These tests are applied to one neuron at a time, and are described in Figure 7.

(i) Let $r(\cdot) = \lambda(\cdot)$ in Equation (17), with $\lambda(\cdot)$ the observed intensity function defined in Section 2.

(ii) Calculate $z_1, \ldots, z_{N-1}$.

(iii) Find the order statistics $z_{(1)}, \ldots, z_{(N-1)}$.

(iv) Compute the values of the cumulative distribution function of the uniform density, defined as $b_k = \dfrac{k - \frac{1}{2}}{N - 1}$, for $k = 1, \ldots, N-1$.

(v) Plot $\{b_k\}_{k=1}^{N-1}$ versus $\{z_{(k)}\}_{k=1}^{N-1}$.

(vi) Add the confidence bounds $b_k \pm \dfrac{1.63}{(N-1)^{1/2}}$, which correspond to the Kolmogorov-Smirnov test.

(vii) Add the 2.5th and 97.5th percentiles of the beta density with parameters $k$ and $N - k$, which correspond to the Quantile-Quantile plot.

Figure 7: Kolmogorov-Smirnov Test and Quantile-Quantile plot
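Steps (iii) to (vi) of Figure 7 can be sketched as below. The function names are hypothetical, and the sketch covers only the Kolmogorov-Smirnov part: the beta percentiles of step (vii) are left out because they require a beta quantile function that the Python standard library does not provide.

```python
import math


def ks_plot_points(z):
    """Return the sorted z values, the uniform quantiles b_k, and the KS half-width.

    len(z) plays the role of N - 1 in Figure 7.
    """
    n = len(z)
    z_sorted = sorted(z)                                # step (iii)
    b = [(k - 0.5) / n for k in range(1, n + 1)]        # step (iv)
    half_width = 1.63 / math.sqrt(n)                    # step (vi)
    return z_sorted, b, half_width


def within_ks_bounds(z):
    """True if every ordered z value lies inside the band b_k +/- half_width."""
    z_sorted, b, half_width = ks_plot_points(z)
    return all(abs(zk - bk) <= half_width for zk, bk in zip(z_sorted, b))
```

A neuron whose rescaled values stay inside the band gives no evidence against the inhomogeneous Poisson model at the level implied by the constant 1.63; points that leave the band indicate lack of fit.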


C Particle Filter Algorithm

1. Initialize: $t = 0$.

2. For $i = 1, \ldots, N$, sample $X_0^{(i)} \sim p(X_0)$, and set $t = 1$.

3. Importance sampling step:

For $i = 1, \ldots, N$, sample $\tilde{X}_t^{(i)} \sim p(X_t \mid X_{t-1}^{(i)})$ and set $(\tilde{X}_0^{(i)}, \ldots, \tilde{X}_t^{(i)}) = (X_0^{(i)}, \ldots, X_{t-1}^{(i)}, \tilde{X}_t^{(i)})$.

For $i = 1, \ldots, N$, evaluate the importance weights $\tilde{w}_t^{(i)} = p(N_t \mid \tilde{X}_1^{(i)}, \tilde{X}_2^{(i)}, \ldots, \tilde{X}_t^{(i)})$.

Normalize the importance weights.

4. Selection step:

Resample with replacement $N$ particles $(X_0^{(i)}, \ldots, X_t^{(i)},\ i = 1, \ldots, N)$ from the set $(\tilde{X}_0^{(i)}, \ldots, \tilde{X}_t^{(i)},\ i = 1, \ldots, N)$ according to the importance weights.

5. Set $t \leftarrow t + 1$ and go to step 3.

Figure 8: Particle Filter Algorithm
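The steps of Figure 8 can be sketched for a toy problem as follows. This is a minimal bootstrap-style illustration under stated assumptions, not the decoder used in this paper: the state is a scalar random walk rather than the full kinematic state, the prior p(X_0) is taken to be standard normal, each neuron has a hypothetical log-linear Poisson rate exp(b0 + b1·x), only the current particle positions are stored rather than whole paths, and the weight for each particle uses only its current state, which is what the observation equation implies.

```python
import math
import random


def poisson_log_pmf(k, log_rate):
    """Log of the Poisson probability of count k, given the log of the rate."""
    return k * log_rate - math.exp(log_rate) - math.lgamma(k + 1)


def bootstrap_filter(observations, n_particles, q_sd, betas, rng):
    """Run the particle filter and return the posterior-mean estimate at each time.

    observations: list over time of lists of spike counts (one count per neuron).
    betas:        list of (b0, b1) pairs, one per neuron (hypothetical values).
    q_sd:         standard deviation of the random-walk innovation.
    """
    # Steps 1-2: draw the initial particles from p(X_0), assumed N(0, 1) here.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for counts in observations:
        # Step 3: propagate each particle through p(X_t | X_{t-1}).
        proposed = [x + rng.gauss(0.0, q_sd) for x in particles]
        # Importance weights: likelihood of the observed counts, on the log scale.
        log_w = [
            sum(poisson_log_pmf(k, b0 + b1 * x) for (b0, b1), k in zip(betas, counts))
            for x in proposed
        ]
        # Normalize, subtracting the maximum first to avoid underflow.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        estimates.append(sum(wi * x for wi, x in zip(w, proposed)))
        # Step 4: resample with replacement according to the weights.
        particles = rng.choices(proposed, weights=w, k=n_particles)
        # Step 5: the loop advances t to the next observation.
    return estimates
```

Working with log-weights and subtracting their maximum before exponentiating is a common safeguard when many neurons contribute to the likelihood, since the raw product of Poisson probabilities can underflow.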


References

F. J. Anscombe. The transformation of Poisson, binomial and negative-binomial data. Biometrika, 35:246–254, 1948.

J. Ashe and A. P. Georgopoulos. Movement parameters and neural activity in motor cortex and

area 5. Cereb. Cortex, 6:590–600, 1994.

J. M. Black, E. Bienenstock, J. P. Donoghue, M. Serruya, W. Wu, and Y. Gao. Connecting brains with machines: The neural control of 2D cursor movement. In Proceedings of the First International IEEE/EMBS Conference on Neural Engineering, pages 580–583, 2003.

A. E. Brockwell, A. L. Rojas, and R. E. Kass. Recursive Bayesian decoding of cortical signals by particle filter. J. Neurophysiol., 91:1899–1907, 2004.

E. N. Brown, R. Barbieri, V. Ventura, R. E. Kass, and L. M. Frank. The time-rescaling theorem

and its application to neural spike train data analysis. Neural Computation, 14:325–346, 2001.

E. N. Brown, L. M. Frank, D. Tang, M. C. Quirk, and M. A. Wilson. A statistical paradigm for

neural spike train decoding applied to position prediction from ensemble firing patterns of rat

hippocampal place cells. Neuroscience, 18:7411–7425, 1998.

George Casella and Roger Berger. Statistical Inference. Duxbury, 2nd edition, 2002.

W. S. Cleveland and S. J. Devlin. Locally-weighted regression: an approach to regression analysis

by local fitting. Journal of the American Statistical Association, 83:596–610, 1988.

A. Doucet, M. de Freitas, and N. J. Gordon, editors. Sequential Monte Carlo Methods in Practice.

Springer Verlag, New York, 2001.

Y. Gao, J. M. Black, E. Bienenstock, S. Shoham, and J. P. Donoghue. Probabilistic inference of

arm motion from neural activity in motor cortex. Advances in Neural Information Processing

Systems 14,The MIT Press, 2002.

Y. Gao, J. M. Black, E. Bienenstock, W. Wu, and J. P. Donoghue. A quantitative comparison of linear and non-linear models of motor cortical activity for the encoding and decoding of arm motions. In Proceedings of the First International IEEE/EMBS Conference on Neural Engineering, pages 189–192, 2003.

A. P. Georgopoulos, R. Caminiti, J. F. Kalaska, and J. T. Massey. Spatial coding movement: a

hypothesis concerning the coding of movement direction by cortical populations. Exp. Brain Res.

Suppl., 7:327–336, 1983.

T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall, London, England,

1990.

A Johnson and S Kotz. Distribution in statistics: Continuous univariate distributions – 2. Wiley,

New York, 1970.

P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, London, England,

1989.

D. W. Moran and A. B. Schwartz. Motor cortical representation of speed and direction. J. Neurophysiol., 82:2676–2692, 1999a.

D. W. Moran and A. B. Schwartz. Motor cortical representation of speed and direction. J. Neurophysiol., 82:2693–2704, 1999b.

Liam Paninski, Shy Shoham, Matthew R. Fellows, Nicholas G. Hatsopoulos, and John P. Donoghue.

Superlinear population encoding of dynamic hand trajectory signals in primary motor cortex.

Journal of Neuroscience, 560(3):883–896, 2004.

G.A. Reina and A.B. Schwartz. Eye-hand coupling during closed-loop drawing: evidence of shared

motor planned? Hum. Move. Sci., 22:137–152, 2003.

E. Salinas and L. F. Abbott. Vector reconstruction from firing rates. J. Compu. Neuro., 1:89–107,

1994.

S. Shoham, L. M. Paninski, M. R. Fellows, N. G. Hatsopoulos, J. P. Donoghue, and R. A. Normann. Optimal decoding for a primary motor cortical brain-computer interface: I. Statistical encoding models for MI neurons. IEEE, 2003.

B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London,

1986.

D. M. Taylor, S. I. Helms Tillery, and A. B. Schwartz. Direct cortical control of 3D neuroprosthetic devices. Science, 296:1829–1832, 2002.

Howard M Taylor and Samuel Karlin. An Introduction To Stochastic Modeling. Academic Press,

New York, 3rd edition, 1998.

W. Wu, J. M. Black, Y. Gao, E. Bienenstock, M. Serruya, and J. P. Donoghue. Inferring hand motion from multi-cell recordings in motor cortex using a Kalman filter. Proceedings of the SAB’02-Workshop on Motor Control in Humans and Robots: On the Interplay of Real Brains and Artificial Devices, pages 66–73, 2002.

W. Wu, J. M. Black, Y. Gao, E. Bienenstock, M. Serruya, A. Shaikaouni, and J. P. Donoghue. Neural decoding of cursor motion using a Kalman filter. In Advances in Neural Information Processing Systems 15, The MIT Press, 2003.
