Decision Making, Models
Definition
Models of decision making attempt to describe the dynamical process that produces a commitment to a single action or outcome as a result of incoming evidence that can be ambiguous as to the action it supports. They do so using stochastic differential equations, which represent either neural activity or more abstract psychological variables.
Background
Decision making can be separated into four processes (Doya, 2008):
1) Acquisition of sensory information to determine the state of the environment and the organism within it.
2) Evaluation of potential actions (options) in terms of the cost and benefit to the organism given its belief about the current state.
3) Selection of an action based on, ideally, an optimal tradeoff between the costs and benefits.
4) Use of the outcome of the action to update the costs and benefits associated with it.
Models of the dynamics of decision making have focused on perceptual decisions with only two possible responses available. The term two-alternative forced choice (TAFC) applies to such tasks when two stimuli are provided, but the term is now generally used for any binary choice discrimination task. In a perceptual decision, the response, or action, is directly determined by the current percept. Thus the decision in these tasks is essentially one of perceptual categorization, namely process (1) above, though the same models can be used for action selection given ambiguous information about the current state (process 3). Evaluation of the possible responses in terms of their value or the resulting state’s utility (process 2) (Sugrue et al., 2005), given both uncertainty in the current state and uncertainty in the outcomes of an action given the state, is the subject of expected utility theory and prospect theory. The necessary learning and updating of the values of different actions given the actual outcomes they produce (process 4) is the subject of instrumental conditioning and reinforcement learning, for example via temporal-difference learning (Seymour et al., 2004) and actor-critic models (Joel et al., 2002). This article is primarily concerned with the dynamics of the production of either a single percept given unreliable sensory evidence (1), or a single action given uncertainty in the outcomes (3).
General features of discrimination tasks or TAFC tasks
In a TAFC task, a single decision variable can be defined, representing the likelihood ratio: the ratio of the probabilities of the evidence to date under each of the two alternatives, which measures how strongly that evidence favors one alternative over the other.
While TAFC tasks (Figure 1) have provided the dominant paradigm for analysis of choice behavior, the restriction to only two choices is lifted in many of the more recent models of decision making based on multiple variables, allowing for the fitting of a wider range of data sets.
The tasks can either be based on a free response paradigm, in which a subject responds after as much or little time as she wants, or an interrogation (forced response) paradigm, in which the stimulus duration is limited and the subject must make a response within a given time interval. The free response paradigm is perhaps more powerful, since each trial produces two types of information: accuracy (correct or incorrect) and response time. However, by variation of the time allowed when responses are forced, both paradigms are valuable for constraining models, since they can provide a distribution of response times for both correct and incorrect trials, as well as the proportion of trials that are correct or incorrect with a given stimulus. These behavioral data can be modified by task difficulty, task instructions, such as (“respond rapidly” versus “respond accurately”) or reward schedules and inter-‐trial intervals. Most models of the dynamics of decision making focus on tasks where the time from stimulus onset to response is no more than one to two seconds, a timescale over which neural spiking can be maintained. Choices requiring much more time than this are likely to depend upon multiple memory stores, neural circuits and strategies, which become difficult to identify, extract and model in a dynamical systems framework (a state-‐based framework is more appropriate).
Figure 1. Scheme of the two-alternative forced choice (TAFC) task. Two streams of sensory input, each containing stimulus information, or a signal ($S_1$ and $S_2$) combined with noise ($\sigma_1 \eta_1(t)$ and $\sigma_2 \eta_2(t)$), are compared in a decision-making circuit. The circuit must produce one of two responses (A or B) indicating which of the two signals is the stronger. The optimal method for achieving this discrimination is via the sequential probability ratio test (SPRT), which requires the decision-making circuit to integrate inputs over time.
In the standard setup of the models, two parallel streams of noisy sensory input are available, with each stream supplying evidence in support of one of the two allowed actions (see Figure 1). The sensory inputs can be of either discrete or continuous quantities and can arrive discretely or continuously in time. The majority of models focus on continuous update in continuous time, so can be formulated as stochastic differential equations (Gillespie, 1992, Lawler, 2006). The sensory evidence, which is momentary, produces a decision variable, which indicates the likelihood of choosing one of the two alternatives given current evidence and all prior evidence. The primary difference between models is in how sensory evidence determines the decision variable. While most models incorporate a form of temporal integration of evidence (Cain and Shea-Brown, 2012) and include a negative interaction between the two sources of evidence, differences arise in the stability
of initial states which determines whether integration is perfect and in the nature of the interaction: feedforward between the inputs, feedforward between outputs or feedback from outputs to decision variables (Bogacz et al., 2006). Models can also differ in their choice of decision threshold—the value of the decision variable at which a response is produced—in the free response paradigm (Simen et al., 2009, Deneve, 2012, Drugowitsch et al., 2012), and in particular whether this parameter or other model parameters, such as input gain, which also affect the response time distribution, are static or dynamic across a trial (Shea-‐Brown et al., 2008, Thura et al., 2012). As the time available for acquisition of sensory information increases, so accuracy of responses increases in a perceptual discrimination task. Accuracy is measured as probability of choosing the response leading to more reward, which is equivalent to obtaining a veridical percept in these tasks. All of the models to be discussed below can produce such a speed-‐accuracy tradeoff by parameter adjustment. If parameters are adjusted so as to increase the mean response time, then accuracy increases. Such a tradeoff is observed in behavioral tasks, when either instructions or the schedule of reward and punishment encourages participants to respond as quickly as possible while being less concerned about making errors, or to respond as accurately as possible, while being less concerned about the time it takes to decide. The simplest way to effect such a tradeoff is to adjust the inter trial interval, which if long compared to the decision time, means that accuracy of responses impacts reward rate much more so than the time for the decision itself. Models can replicate such behavior when optimal performance is based on the maximal reward rate. 
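The dependence of the reward-rate-maximizing tradeoff on the inter-trial interval can be illustrated numerically. The sketch below is a minimal example and is not taken from the source. It assumes a drift diffusion model with an unbiased start and symmetric thresholds at +a and -a, uses the standard closed-form error rate and mean decision time for that case (reviewed in Bogacz et al., 2006), and defines reward rate as the fraction of correct responses divided by the mean time per trial. The function names, parameter values and the grid search are illustrative assumptions.

```python
import math

def error_rate(S, sigma, a):
    """Error probability of a DDM with drift S, noise sigma, thresholds +/-a."""
    return 1.0 / (1.0 + math.exp(2.0 * S * a / sigma**2))

def mean_decision_time(S, sigma, a):
    """Mean time to reach either threshold from an unbiased start at 0."""
    return (a / S) * math.tanh(S * a / sigma**2)

def reward_rate(S, sigma, a, iti):
    """Correct responses per unit time, assuming one reward per correct trial."""
    return (1.0 - error_rate(S, sigma, a)) / (mean_decision_time(S, sigma, a) + iti)

def best_threshold(S, sigma, iti, grid=None):
    """Grid search (an illustrative choice) for the reward-rate-maximizing threshold."""
    grid = grid or [0.01 * i for i in range(1, 501)]
    return max(grid, key=lambda a: reward_rate(S, sigma, a, iti))

if __name__ == "__main__":
    for iti in (0.5, 2.0, 8.0):
        a = best_threshold(S=1.0, sigma=1.0, iti=iti)
        print(f"ITI={iti}: best threshold={a:.2f}, error rate={error_rate(1.0, 1.0, a):.3f}")
```

Under these assumptions, a longer inter-trial interval moves the best threshold further from the starting point, so responses become slower and more accurate.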
Typical parameter adjustments to increase accuracy while slowing responses would be a multiplicative scaling down of inputs (and the concurrent input noise) or a scaling up of the range across which the decision variable can vary by raising a decision threshold (Figs 2-‐3) (Ratcliff, 2002, Simen et al., 2009, Balci et al., 2011). A similar effect can be achieved in alternative, attractor-‐based models through the level of a global applied current, which affects the stability of the initial “undecided” state (Figs 6-‐9) (Miller and Katz, 2013). From a neuroscience perspective, the decision variable is typically interpreted as either the mean firing rate of a group of neurons or a linear combination of rates of many neurons (Beck et al., 2008) (the difference between two groups being the simplest such combination). There has been remarkable progress in matching the observed firing patterns of neurons (Newsome et al., 1989, Shadlen and Newsome, 2001, Huk and Shadlen, 2005) with the dynamics of a decision variable in more mathematical models of decision making (Glimcher, 2001, Gold and Shadlen, 2001, Glimcher, 2003, Smith and Ratcliff, 2004, Gold and Shadlen, 2007, Ratcliff et al., 2007). This has led to the introduction of biophysically based models of neural circuits (Wang, 2008), which have accounted for much of the concordance between simple mathematical models, neural activity and behavior. Optimal Decision Making An optimal decision-‐making strategy either maximizes expected reward over a given time or minimizes risk. In TAFC perceptual tasks, a response is either correct or an error. In the interrogation paradigm, with fixed time per decision, the optimal strategy is the one leading to greatest accuracy, that is the lowest expected error rate. In the free response paradigm the optimal strategy either delivers the greatest accuracy for a given mean response time, or produces the fastest mean response time for a given accuracy. 
In these tasks, the sequential probability ratio test (SPRT), introduced by Wald and Wolfowitz (Wald, 1947, Wald and Wolfowitz, 1948), and in its continuous form, the drift diffusion model (DDM) (Ratcliff and
Smith, 2004, Ratcliff and McKoon, 2008) leads to optimal choice behavior by any of these measures of optimality (see (Bogacz et al., 2006) for a thorough review). Using SPRT in the interrogation paradigm, one simply accumulates over time the log-likelihood ratio of the probabilities of each alternative given the stream of evidence, where the observed sensory input per unit time has a certain probability given alternative A and another probability given alternative B. Integrating the log-likelihood ratio over time, after setting the initial condition as the log-likelihood ratio of the prior probabilities, log[P(A)/P(B)], leads to a quantity log[P(A|S)/P(B|S)] which is greater than zero if A is more likely than B given the stimulus and less than zero otherwise. Thus the optimal procedure is to choose A or B depending on the sign of the summed, or in the continuous limit, integrated, log-likelihood ratio. In the free response paradigm a stopping criterion must be included. This is achieved by setting two thresholds for the integrated log-likelihood ratio, a positive one ($+a$) for choice A and a negative one ($-b$) for choice B. The further the thresholds are from the origin, the lower the chance of error, but the longer the integration time before reaching a decision. Thus the thresholds reflect the fraction of errors that can be tolerated, with
$$a = \log\frac{1-\beta}{\alpha} \quad \text{and} \quad b = \log\frac{1-\alpha}{\beta},$$
where $\alpha$ is the probability of choosing A when B is correct and $\beta$ is the probability of choosing B when A is correct.
The Models
Accumulator Models
The first models of decision making in humans or animals were accumulator models, sometimes called counter models or race models. In these models, evidence accumulates separately for each possible outcome. This has the advantage that if many outcomes are possible, the models are simply extended by addition of one more variable for each additional alternative, with evidence for each alternative accumulating within its allotted variable. In the interrogation paradigm, one simply reads out the highest variable, so the choice depends on the sign of the difference of the two variables in the TAFC paradigm. Thus, if the difference in accumulated quantities matched the difference in integrated log probabilities of the two types of evidence, such readout from an accumulator model would be equivalent to an SPRT, so would be optimal. In the free response paradigm, accumulator models produce a choice when any one of the accumulated variables reaches a threshold, so these models can be called “race to threshold models” or simply “race models”. The original accumulator models included neither interaction between accumulators, nor ability for variables to decrease. However, for decisions in nature or in laboratory protocols, evidence in favor of one alternative is typically evidence against the other alternative. This is particularly problematic in the free response paradigm, because the time at which one variable reaches threshold and produces the corresponding choice is independent of evidence accumulated for other choices. Thus the behavior of simple accumulator models is not optimal.
Comparisons of response time distributions of these models with behavioral responses showed the models to be inaccurate in this regard: observed response-time distributions are skewed with a long tail, whereas the response times of accumulator models were much more symmetric about the mean. These discrepancies led to the ascendance of Ratcliff’s drift diffusion model (Ratcliff, 1978).
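Before turning to the drift diffusion model, the discrete-time SPRT described under Optimal Decision Making can be sketched in a few lines. This is a minimal illustration, not the source's implementation. It assumes, purely for concreteness, that each observation is Gaussian with unit variance and mean +mu under alternative A and -mu under alternative B, so that the log-likelihood ratio contributed by one observation x is 2*mu*x. The thresholds use Wald's approximation in terms of the tolerated error probabilities alpha and beta; the function names and parameter values are hypothetical.

```python
import math
import random

def sprt_thresholds(alpha, beta):
    """Wald's approximate thresholds (+a for A, -b for B) for error rates alpha, beta."""
    a = math.log((1.0 - beta) / alpha)
    b = math.log((1.0 - alpha) / beta)
    return a, b

def sprt_trial(mu, truth, alpha, beta, rng, prior_log_odds=0.0):
    """Run one free-response SPRT trial; returns (choice, number of samples)."""
    a, b = sprt_thresholds(alpha, beta)
    llr = prior_log_odds                  # start at log[P(A)/P(B)]
    mean = mu if truth == "A" else -mu    # assumed Gaussian evidence model
    n = 0
    while -b < llr < a:
        x = rng.gauss(mean, 1.0)          # one noisy observation
        llr += 2.0 * mu * x               # log[p(x|A)/p(x|B)] for unit-variance Gaussians
        n += 1
    return ("A" if llr >= a else "B"), n

if __name__ == "__main__":
    rng = random.Random(0)
    trials = [sprt_trial(0.2, "A", 0.05, 0.05, rng) for _ in range(2000)]
    errors = sum(choice == "B" for choice, _ in trials) / len(trials)
    mean_n = sum(n for _, n in trials) / len(trials)
    print(f"error rate ~ {errors:.3f}, mean samples ~ {mean_n:.1f}")
```

Moving the thresholds further from zero (smaller alpha and beta) lowers the error rate at the cost of more samples per decision, which is the tradeoff described in the text.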
The Drift Diffusion Model
The drift diffusion model (DDM) is an integrator with thresholds (Figure 2), or more precisely, the decision variable, $x$, follows a Wiener process with two absorbing boundaries (Figure 3). It includes a deterministic (drift) term, $S$, proportional to the rate of incoming evidence and a diffusive noise term of variance $\sigma^2$, which produces variability in response times and can lead to errors:
$$\frac{dx}{dt} = S + \sigma \eta(t),$$
where $\eta(t)$ is a white noise term defined by $\langle \eta(t)\,\eta(t') \rangle = \delta(t - t')$.
Figure 2. The drift diffusion model (DDM). The DDM is a one-dimensional model, so the two competing inputs and their noise terms are first combined: in this case $S = S_1 - S_2$ and $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$.
If the model is scaled to a given level of noise then its three independent parameters are drift rate ($S$) and positions of each of the two thresholds ($+a$, $-b$) with respect to the starting point. When the model was introduced, these parameters were assumed fixed for a given subject in a specific task. The threshold spacing determines where one operates in the speed-accuracy tradeoff, so can be optimized as a function of the relative cost for making an incorrect response and the time between trials. Any starting point away from the midpoint represents bias or prior information. The drift rate is proportional to stimulus strength.
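A minimal simulation sketch of the DDM with these three parameters follows. It uses the Euler-Maruyama method to integrate dx/dt = S + sigma*eta(t) until x first crosses +a or -b. The time step, parameter values, trial count and function name are illustrative assumptions rather than values from the source, and any additional sensory or motor delay is omitted.

```python
import math
import random

def ddm_trial(S, sigma, a, b, x0=0.0, dt=1e-3, rng=random, t_max=60.0):
    """Simulate one DDM trial; returns (choice, response_time)."""
    x, t = x0, 0.0
    noise_scale = sigma * math.sqrt(dt)   # std of the noise increment over one step
    while -b < x < a and t < t_max:
        x += S * dt + noise_scale * rng.gauss(0.0, 1.0)
        t += dt
    if x >= a:
        return "A", t
    if x <= -b:
        return "B", t
    return None, t                        # no boundary reached before t_max

if __name__ == "__main__":
    rng = random.Random(1)
    trials = [ddm_trial(S=1.0, sigma=1.0, a=1.0, b=1.0, rng=rng) for _ in range(500)]
    correct = [t for choice, t in trials if choice == "A"]
    print(f"fraction choosing A: {len(correct) / len(trials):.2f}")
    print(f"mean response time for A: {sum(correct) / len(correct):.2f}")
```

Passing a nonzero x0 shifts the starting point toward one boundary, which is one way to represent the bias or prior information mentioned above.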
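[Figure 2 diagram: inputs S1 and S2, with noise terms σ1η1(t) and σ2η2(t), are combined into a single variable X. Choice A if X > 0 or X = +T; Choice B if X < 0 or X = −T.]
[Figure 3 plot: X against t, starting from X(0) = 0, with boundaries at +a and −b; the distribution P(X,t) has mean St.]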
Figure 3. The drift diffusion model is a Wiener process (one-dimensional Brownian motion) with absorbing boundaries. In the absence of boundaries the probability distribution is Gaussian, centered at a distance $St$ from its starting point with standard deviation increasing as $\sigma\sqrt{t}$.
With fixed parameters, which could be fitted to any subject’s responses, the DDM reproduces key features of the behavioral data: notably the skewed shape of response time distributions and the covariation of mean response times and response accuracy with task difficulty. Skewed response time distributions arise because the variance of a Wiener process increases linearly with time: responses much earlier than the mean response time, when the variance in the decision variable is low, are less likely than responses much later, when the variance in the decision variable is high. A more difficult perceptual choice is represented by a drift rate closer to zero, which increases response times and increases the probability of error. Such covariation of response times with accuracy matches behavioral data well, so long as an additional processing time is added to the model. This additional time represents a variable sensory transduction delay on the input side and a motor delay on the output side, both of which contribute to response times in addition to the processing within any decision circuit. While the original DDM included trial-to-trial variability through the diffusive noise term, additional trial-to-trial variability in the parameters was needed to account for differences between the response time distributions of correct trials and those of incorrect trials. With fixed parameters and no initial bias, the DDM produces identical distributions (though with different magnitudes) for the timing of correct responses and errors.
However, response times of human subjects are typically slower when they produce errors, unless they are instructed to respond as quickly as possible, in which case the reverse is true. These behaviors are accounted for in the DDM by including trial-‐to-‐trial variability in the drift rate and/or the starting point. Trial-‐to-‐trial variability in drift rate leads to slower errors, as errors become more likely on those trials when the drift rate is altered from its mean toward zero and mean responses are longer. Trial-‐to-‐trial variability in the starting point leads to faster errors, as errors become more likely on those trials in which the starting point is closer to the error boundary, in which case the response is faster. Error response times are reduced more by such variability than are correct response times since the correct responses include more of the trials started at the midpoint where mean times are longer. Prior information or bias can be incorporated in the drift-‐diffusion model, either through a shift in the starting point of integration, or an additional bias current to the inputs to be integrated. A shift in starting point is more optimal and has led to two-‐stage models , where the first stage sets the starting point from the values of the two choices, before a second stage of integration toward threshold commences. Such a model is supported by electrophysiological data (Rorie et al., 2010).
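The effect of a shifted starting point on choice probabilities can be made concrete with the standard first-passage result for a Wiener process with constant drift between two absorbing boundaries. The sketch below is an illustration for the fixed-parameter DDM defined earlier (drift S, noise sigma, boundaries +a and -b, starting point x0). The function name and the numerical values are assumptions, not taken from the source.

```python
import math

def prob_choice_a(S, sigma, a, b, x0=0.0):
    """Probability that x reaches +a before -b, starting from x0."""
    if not -b <= x0 <= a:
        raise ValueError("x0 must lie between -b and +a")
    if S == 0.0:
        return (x0 + b) / (a + b)         # pure diffusion: linear in start position
    k = 2.0 * S / sigma**2
    return (1.0 - math.exp(-k * (x0 + b))) / (1.0 - math.exp(-k * (a + b)))

if __name__ == "__main__":
    for x0 in (-0.5, 0.0, 0.5):
        p = prob_choice_a(S=1.0, sigma=1.0, a=1.0, b=1.0, x0=x0)
        print(f"x0={x0:+.1f}: P(choose A)={p:.3f}")
```

With a positive drift, starting closer to -b raises the probability of the error response B, consistent with the account above of how starting-point variability makes errors more likely on trials that begin nearer the error boundary.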
The Leaky Competing Accumulator Model
Figure 4. The Leaky Competing Accumulator (LCA) model. Two separate integration processes for the two separate stimuli produce two decision variables, $X_1$ and $X_2$. A “leak” term proportional to $k$ causes a decay of the decision variable through self-inhibition while a cross-inhibition term proportional to $\beta$ produces competition. In the interrogation paradigm a decision is made according to the greater of $X_1$ and $X_2$ at the response time, while in the free response paradigm a choice is made when one decision variable first reaches its threshold (given as $+a$ and $+b$ respectively).
The Leaky Competing Accumulator (LCA), introduced by Usher and McClelland (Usher and McClelland, 2001), was suggested to be in better accord with neural data and to better fit some behavioral data than the DDM. The LCA is a two-variable model, with each variable integrating evidence in support of one of the two alternatives in a TAFC task. The model includes a “leak” term, as decay back to baseline for each individual variable in the absence of incoming evidence. The “competition” in LCA is a cross-inhibition between the two variables, thus improving upon original accumulator models by allowing evidence for one variable to contribute to a reduction in the other variable.
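The leak and cross-inhibition just described can be sketched numerically. The example below is a hedged illustration, not the authors' code. For each of the two variables it integrates tau*dX_i/dt = S_i - k*X_i - beta*X_j + sigma_i*sqrt(tau)*eta_i(t) with the Euler-Maruyama method and stops when either variable first reaches its threshold (free response paradigm). Parameter values, the time step and the function name are assumptions chosen only for illustration, and the variables are not constrained to be non-negative.

```python
import math
import random

def lca_trial(S1, S2, k, beta, sigma1, sigma2, a, b,
              tau=0.1, dt=1e-3, rng=random, t_max=10.0):
    """Simulate one free-response LCA trial; returns (choice, time, X1, X2)."""
    x1 = x2 = 0.0
    t = 0.0
    noise = math.sqrt(dt / tau)           # scaling of each noise increment
    while x1 < a and x2 < b and t < t_max:
        dx1 = (S1 - k * x1 - beta * x2) * dt / tau + sigma1 * noise * rng.gauss(0.0, 1.0)
        dx2 = (S2 - k * x2 - beta * x1) * dt / tau + sigma2 * noise * rng.gauss(0.0, 1.0)
        x1, x2 = x1 + dx1, x2 + dx2       # update both from the same previous state
        t += dt
    if x1 >= a:
        choice = 1                        # ties in the same step default to choice 1
    elif x2 >= b:
        choice = 2
    else:
        choice = None                     # neither threshold reached before t_max
    return choice, t, x1, x2

if __name__ == "__main__":
    rng = random.Random(2)
    results = [lca_trial(1.2, 1.0, 1.0, 1.0, 0.3, 0.3, 1.0, 1.0, rng=rng)
               for _ in range(200)]
    frac_1 = sum(r[0] == 1 for r in results) / len(results)
    print(f"fraction of trials choosing alternative 1: {frac_1:.2f}")
```

With the noise switched off and k equal to beta, both variables rise at first and then the one with the weaker input is suppressed while their difference grows at a constant rate.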
$$\tau \frac{dX_1}{dt} = S_1 - kX_1 - \beta X_2 + \sigma_1 \sqrt{\tau}\,\eta_1(t)$$
$$\tau \frac{dX_2}{dt} = S_2 - kX_2 - \beta X_1 + \sigma_2 \sqrt{\tau}\,\eta_2(t)$$
The difference in the two LCA variables, $X_d = X_1 - X_2$, follows an Ornstein-Uhlenbeck process:
$$\tau \frac{dX_d}{dt} = S_d - kX_d + \beta X_d + \sigma_d \sqrt{\tau}\,\eta(t),$$
where $S_d = S_1 - S_2$ and $\sigma_d = \sqrt{\sigma_1^2 + \sigma_2^2}$. It should be noted that the DDM is retrieved as a special case of the LCA, if the coefficient of the term linear in $X_d$, that is $\beta - k$, is set to zero. If the only criterion for choosing the best model were its match to behavioral data, one could use the Akaike Information Criterion or the Bayesian Information Criterion to assess whether inclusion of a non-zero term in the LCA is justified by the better fit to data so produced. As well as ability to fit human response times, the ability for a model to match neural activity and to be generated robustly in a neural circuit should be considered when assessing its value and relevance. One achievement of the LCA is to reproduce an initial increase in both variables, before the competition between variables causes one variable to be suppressed, while the other variable accelerates its rise. Such behavior matches electrophysiological data recorded in monkeys during perceptual choices; in particular, firing rates of neurons
representing the response not chosen increase upon stimulus onset before they are suppressed (cf trajectories in Figure 6). Neural Network Models and Attractor States Some of the first models of perceptual categorization, which have had significant impact on neuroscience, were Hopfield networks, Cohen-‐Grossberg neural networks and Willshaw networks. While these models are primarily aimed at formation and storage of memories, the retrieval of a memory via corrupted information is identical to a perceptual decision. A correspondence between memory retrieval and decision making should not be surprising, since Ratcliffs’ introduction of the DDM—the archetype of decision making models—was within a paper entitled “A theory of memory retrieval”. Ratcliff was focused on fitting response times, so thus produced an inherently dynamic model—memory retrieval was treated as a set of parallel DDMs, each representing an individual item with a “match” and “non-‐match” threshold to indicate its recognition or not. The rate of evidence accumulation in each individual DDM was taken as the closeness of match between a current item and one in memory. Models such as those of Hopfield respond to the match between a current item and memories of previously encoded items, which are contained within the network as attractor states. The network’s activity reaches a stable attractor state more rapidly if the match is close. The temporal dynamics of memory retrieval or pattern completion, which comprises the decision process, is not addressed carefully in neural network models, in which neural units can be binary and time can be discretized, since this is not their goal. However, use of attractor states to represent the outcome of a decision or a perceptual categorization has achieved success in more biophysical models based on spiking neurons (Wang, 2008), albeit with far simpler attractor states than those of neural networks. 
Biophysical Models The LCA model (Usher and McClelland, 2001), being motivated by neurophysiology, is similar in spirit to the more detailed biophysical models that followed it. In particular, the first model of decision making based on spiking neurons assumes two competing integrators, where integration is produced through tuned recurrent excitation and competition is the result of cross-‐inhibition between the pools of neurons (see Figure 5). However, when even the most elementary properties of spiking neurons are taken into account, a few additional complications arise (Wong and Wang, 2006).
Figure 5. A neural circuit model of decision-‐making. Integration of stimuli (S1, S2) by groups of neurons with rates r1 and r2 can be achieved through strong excitatory recurrent connections within groups (looped
connections with arrows). The mean firing rate of cells in the inhibitory pool (“INHIB”) increases with both r1 and r2, producing competition through inhibitory input to both cell-‐groups (solid circles) (Wang, 2002). First, neurons emit spikes as a point process in time, so that even at a constant firing rate they supply a variable current to other cells. The variability in the current can be reduced with an increase in the number of cells in a group, so long as the spike times are uncorrelated—that is, cells are firing asynchronously. Asynchrony is most easily achieved when neurons spike irregularly, a feature produced by adding additional noise to each neuron in the decision making circuit. So, in accord with most likely realizations of a decision making circuit in vivo, biophysical models introduce additional noise into the decision making model itself, which inevitably adds to any already-‐present stimulus noise (Wang, 2002). Second, neurons are either excitatory or inhibitory, so in order for two groups of cells with self-‐excitation to inhibit each other, at a minimum a third group of inhibitory cells must be added. The need for an extra cell-‐group to mediate the inhibition adds a small delay to the cross-‐inhibition compared to self-‐excitation, though this effect can be counteracted with fast responses in the synaptic connections to and from inhibitory cells versus slower synaptic responses in excitatory connections. The second effect of an intermediate cell-‐group is to add the nonlinearity of the inhibitory interneuron’s response function into the cross-‐inhibition. Thus the inhibitory input to one excitatory cell-‐group is not a linear function of the firing rate of the other cell-‐group. The consequences of this and other nonlinearities are discussed further below. Third, biophysical models of neurons, just like neurons in vivo, respond nonlinearly to increased inputs. 
Similarly synaptic inputs to a cell saturate, so are a nonlinear function of presynaptic firing rates. In the LCA model (Usher and McClelland, 2001), both the neural response and the feedback are linear functions passing through the origin, so via the tuning of one variable the two curves can match each other and produce an integrator. Integrators require such a matching of synaptic feedback to neural response so they can retain a stable firing rate in the absence of input—the firing rate produced by a given synaptic input must be exactly that needed to feedback the same synaptic input—and this must be true for a wide range of firing rates. Such matching of synaptic feedback to neural response produces a state of marginal stability, typically called a line attractor or continuous attractor.
Figure 6. Nonlinearity of neural response functions produces stable fixed points, toward which neural activity evolves. Nullclines are represented by the solid thick red/green lines, the green line where dr2/dt = 0 at fixed r1 and the red line where dr1/dt = 0 at fixed r2. Crossings of the lines are fixed points of the system, with stable fixed points denoted by solid black circles. Decision states are stable fixed points with r2>>r1 or r1>>r2. As a symmetric input current is added to the network, the spontaneous “undecided” state, that is the fixed point of low rates with r1=r2, becomes unstable, so that by I1=I2=10, the only stable fixed points correspond to one choice or the other and a decision is forced. With high enough applied input current, I1=I2=15, a new symmetric stable state appears at high firing rates, but this plays no role in the decision making process. Deterministic trajectories from a range of starting points are indicated by the thin colored lines, which terminate at a fixed point. See Table 1 for an xppaut code, which produces the nullclines, and Table 2 for the Matlab code, which produces the trajectories.
[Figure 6 panels, left to right: I1 = I2 = 0, 5, 10 and 15. Axes: r1 (Hz) against r2 (Hz), each from 0 to 100.]
Figure 7. Stochastic noise in the simulation produces trial-to-trial variability in the neural responses. With the same neural circuit of Fig. 6, addition of noise can cause neural activity to end up in different states, even with the same starting points. Small colored dots indicate trajectories, with black solid circles denoting end points after 5s. All simulations begin with $r_1 = r_2 = 0.1$ Hz. Far left: with no applied current, neural activity is maintained near the low-rate spontaneous state. Center left: with moderate applied current, noise can induce transitions to one of the two “decision states”. Center and far right: with increased applied current, trajectories always end up at a decision state; even with I1=I2=15, the two decision states are more stable than the symmetric high-rate state (far right). See Table 1 for an xppaut code, which produces the nullclines, and Table 2 for the Matlab code, which produces the trajectories.
However, in the absence of symmetry or a remarkable similarity in the shape of neural response curves and synaptic feedback curves, the nonlinear curves of biophysical neurons intersect at no more than three points (Figure 6), leading to the possibility of two discrete stable attractor states for a group of cells (with an unstable fixed point in between). When two such groups are coupled, such as by cross-inhibition, the network can have at most four stable states, given by the combinations of low and high firing rates for the two groups. In the winner-takes-all model of decision making, three of these states can be stable in the absence of input: the state when both cell-groups have low, or spontaneous activity and the two states with just one of the cell-groups possessing high activity. The fixed point with both groups possessing high activity is unstable.
In the presence of input, the symmetric low-‐activity state becomes unstable and only two stable states remain: the decision states with one group active (“the winner”) and the other group inactive (“the loser”). Importantly, with a combination of slow synaptic time constants (NMDA receptors with a time constant of 50-‐100ms are an essential ingredient of the model) and sufficient fine tuning of parameters of the network, the time course for the network’s activity to shift from the unstable initial state to one of the two remaining stable attractor states is slow enough to match neural and behavioral response times.
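The loss of stability of the undecided state with input, and the resulting winner-takes-all dynamics, can be illustrated with a deliberately simple firing-rate model. This is a hypothetical toy model, not the spiking network of Wang (2002) and not the code of Tables 1 and 2. Two units with rates r1 and r2 (scaled to lie between 0 and 1) each receive self-excitation, cross-inhibition and an external input, and pass their total input through a sigmoidal response function. All parameter values (weights, threshold theta, slope s, time constant) are assumptions, chosen so that the symmetric low-rate state is stable without input and unstable with input.

```python
import math
import random

def response(x, theta=0.5, s=0.1):
    """Sigmoidal response function: saturating, nonlinear output rate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / s))

def simulate(I1, I2, w_self=1.0, w_cross=1.0, noise=0.0, tau=0.02,
             dt=1e-4, T=1.0, r0=0.1, rng=random):
    """Euler-Maruyama integration of two mutually inhibiting rate units."""
    r1 = r2 = r0
    amp = noise * math.sqrt(dt / tau)     # scaling of each noise increment
    for _ in range(int(round(T / dt))):
        f1 = response(w_self * r1 - w_cross * r2 + I1)
        f2 = response(w_self * r2 - w_cross * r1 + I2)
        r1 += (f1 - r1) * dt / tau + amp * rng.gauss(0.0, 1.0)
        r2 += (f2 - r2) * dt / tau + amp * rng.gauss(0.0, 1.0)
    return r1, r2

if __name__ == "__main__":
    rng = random.Random(3)
    print("no input:      ", simulate(0.0, 0.0, noise=0.02, rng=rng))
    print("biased input:  ", simulate(0.52, 0.5, noise=0.02, rng=rng))
    wins = sum(r1 > r2 for r1, r2 in
               (simulate(0.5, 0.5, noise=0.02, T=0.5, rng=rng) for _ in range(20)))
    print(f"unit 1 wins {wins} of 20 trials with symmetric input")
```

In this toy model a small input bias decides the winner when noise is absent, while with symmetric input the noise alone determines which unit ends in the high-rate state.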
[Figure 7 panels, left to right: I1 = I2 = 0, 5, 10 and 15. Axes: r1 (Hz) against r2 (Hz), each from 0 to 100.]
Figure 8. A bias in the inputs causes neural activity to favor one decision state over the other, though noise means that “errors” can arise. Left: In the deterministic system more trajectories terminate at the attractor point of high r1, because group-‐1 receives higher input current. In particular, a symmetric initial condition (r1=r2) results in termination with high r1 and low r2. That is, the basin of attraction for the state with r1>r2 includes the line r1=r2. Right: With added noise, some trials with a symmetric starting point terminate in the state with high r2—these correspond to “Errors” in the standard terminology of decision making. See Table 1 for an xppaut code, which produces the nullclines, and Table 2 for the Matlab code, which produces the trajectories.
Figure 9. In a noisy system, the initial state can remain deterministically stable, but responses terminate in one of the decision states, with the bias favoring one final state over the other. Left: In the absence of noise but with a bias in the inputs, more trajectories evolve to the fixed point favored by the input, but many terminate at the “undecided” state. Right: With added noise and symmetric initial conditions, most otherwise “undecided” responses switch to the decision state favored by the input bias (corresponding to “Correct” responses), while a few terminate in the other decision state (“Incorrect” responses) and one remains in the symmetric low-rate state (an “Undecided” response). See Table 1 for the xppaut code that produces the nullclines, and Table 2 for the Matlab code that produces the trajectories.

[Figure panels (Figures 8 and 9): r2 (Hz) plotted against r1 (Hz), both axes running from 0 to 100; panel titles "I1 = 11, I2 = 10" and "I1 = 5.5, I2 = 5".]

Extension of models to multiple choices

Many decision-making models for TAFC tasks can be extended straightforwardly to the case of multiple alternatives (Bogacz et al., 2007b, Furman and Wang, 2008, Niwa and Ditterich, 2008, Ditterich, 2010). Electrophysiological data suggest that neural firing rates reach a threshold that is independent of the number of alternatives, but that neurons receive greater inhibition when there are more alternatives, as revealed by reduced firing rates in their initial, spontaneous activity state (Churchland et al., 2008, Churchland and Ditterich, 2012). This is akin to a reduction in the prior probability of each individual choice alternative before stimulus onset, if prior probability impacts the starting point of integration. Models in which the total amount of inhibition depends on the total number of alternative choices (see Figure 10) reproduce such behavior. One consequence of the increased inhibition is a slowing of decision times as the number of alternatives increases.
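The pooled-inhibition idea above can be sketched with a small simulation. The following is a minimal illustration, not the implementation used in any of the cited papers: it assumes an N-alternative leaky competing accumulator in which each unit is inhibited by the summed activity of the other units, with activity rectified at zero. The function name `simulate_lca` and all parameter values are hypothetical choices made for this example.

```python
import random


def simulate_lca(inputs, leak=1.0, inhibition=1.0, sigma=0.3,
                 threshold=1.0, dt=0.001, t_max=10.0, seed=0):
    """Simulate one trial of an N-alternative leaky competing accumulator.

    Assumed dynamics (illustrative only):
        dx_i = (input_i - leak * x_i - inhibition * sum_{j != i} x_j) dt
               + sigma * sqrt(dt) * N(0, 1)
    Activities are rectified at zero. Returns (winner_index, decision_time),
    or (None, t_max) if no unit reaches threshold within t_max.
    """
    rng = random.Random(seed)
    n = len(inputs)
    x = [0.0] * n
    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        total = sum(x)
        new_x = []
        for i in range(n):
            # Inhibition grows with the summed activity of all other units.
            drift = inputs[i] - leak * x[i] - inhibition * (total - x[i])
            noise = sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            new_x.append(max(0.0, x[i] + drift * dt + noise))
        x = new_x
        best = max(range(n), key=lambda i: x[i])
        if x[best] >= threshold:
            return best, step * dt
    return None, t_max


if __name__ == "__main__":
    # Noise switched off so that the two runs can be compared directly.
    print(simulate_lca([2.0, 1.5], sigma=0.0))
    print(simulate_lca([2.0, 1.5, 1.5, 1.5], sigma=0.0))
```

With noise switched off and these assumed parameters, the favored unit wins in both runs, and the four-alternative run crosses threshold later than the two-alternative run, because each unit is held down by more competitors.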
Figure 10. Models for decision-making with multiple alternatives. Left: Extension of the leaky competing accumulator model to multiple alternatives, such that the negative feedback is proportional to the sum of multiple decision variables. Right: Extension of the biophysical model, in which cells providing inhibitory feedback are activated by the summed activity of multiple groups of excitatory cells.

Sequential Discrimination, Context-Dependent Decisions and Prior Information

The most developed dynamical models of decision-making pertain to the identification of an incoming sensory stimulus or the comparison of two or more concurrent stimuli. However, many decisions require a comparison of successive stimuli, and even when two stimuli are concurrent, our attention typically switches from one to the other when making a perceptual choice based on the two stimuli. Thus, models of decision-making have been developed in which the stimuli are separated in time, so that a form of short-term memory is required, with the contents of short-term memory affecting the choice of action given a later sensory input (Romo and Salinas, 2003, Machens et al., 2005, Miller and Wang, 2006). The process of making a decision by combining short-term memory, which can represent the current “context” (Salinas, 2004), with sensory input provides the essential ingredient for working memory tasks and for model-based strategies of action selection (Deco and Rolls, 2005).

Prior information can make one response either a more likely alternative or a more rewarding alternative given ambiguous sensory information, and can lead to across-trial dependences in decision-making behavior. Deciding how much weight to give prior information relative to the current stimulus requires a separate choice: how long one should continue to acquire sensory input (Hanks et al., 2011). Such a choice is akin to setting a decision-making threshold. Factors affecting the optimal period of obtaining sensory input include the relative clarity of incoming information compared to the strength of the prior (Deneve, 2012), as well as the intrinsic cost in reduced reward rate when taking more time to decide (Drugowitsch et al., 2012). At some point in time, awaiting further sensory evidence does not improve one's probability of a correct choice sufficiently to warrant any extra delay of reward. Solution by dynamic programming (Bellman, 1957) of a model that takes these factors into account suggests that monkeys respond according to an optimal, time-varying cost function (Drugowitsch et al., 2012).
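The trade-off between accuracy and delay of reward can be made concrete with a much simpler calculation than the dynamic-programming solution mentioned above. The sketch below assumes an unbiased drift-diffusion model with fixed symmetric thresholds and uses the standard closed-form expressions for its error rate and mean decision time (as analyzed in Bogacz et al., 2006). The function names, the grid search and the parameter values are assumptions made for illustration only.

```python
import math


def ddm_error_rate(drift, noise, threshold):
    """Error rate of an unbiased DDM with symmetric thresholds at +/- threshold."""
    return 1.0 / (1.0 + math.exp(2.0 * drift * threshold / noise ** 2))


def ddm_mean_decision_time(drift, noise, threshold):
    """Mean decision time of the same DDM, starting midway between thresholds."""
    return (threshold / drift) * math.tanh(drift * threshold / noise ** 2)


def reward_rate(drift, noise, threshold, delay):
    """Correct responses per unit time; `delay` lumps all non-decision time per trial."""
    accuracy = 1.0 - ddm_error_rate(drift, noise, threshold)
    return accuracy / (ddm_mean_decision_time(drift, noise, threshold) + delay)


def best_threshold(drift, noise, delay, grid_step=0.01, grid_max=3.0):
    """Grid search for the fixed threshold that maximizes reward rate."""
    n_points = int(round(grid_max / grid_step))
    candidates = [grid_step * k for k in range(1, n_points + 1)]
    return max(candidates, key=lambda z: reward_rate(drift, noise, z, delay))


if __name__ == "__main__":
    z_star = best_threshold(drift=1.0, noise=1.0, delay=2.0)
    print("threshold maximizing reward rate:", z_star)
    print("reward rate there:", reward_rate(1.0, 1.0, z_star, 2.0))
```

In this toy setting the reward rate rises and then falls as the threshold increases: a very low threshold gives fast but near-chance responses, while a very high threshold gives accurate responses that arrive too slowly to be worth the wait.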
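[Figure 10 diagram. Left: inputs S1, S2, S3, ..., SN drive integrators (∫) with outputs X1, X2, X3, ..., XN, and the feedback term is I = Σ Xn. Right: inputs S1, S2, S3, ..., SN drive excitatory groups r1, r2, r3, ..., rN, which are coupled through a shared inhibitory pool (INHIB).]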
Testing Decision-Making Models

Decision-making models can be tested more stringently using tasks in which the stimulus is not held constant across the time allotted for producing a response (Zhou et al., 2009, Stanford et al., 2010, Shankar et al., 2011, Rüter et al., 2012). For example, in models based on perfect integration, such as the DDM, if a stimulus is altered or even reversed for a short amount of time, then so long as the alteration falls within the period of integration it has the same effect on response time and choice probability whether it is early or late. However, in models where the initial state is unstable, a late altered stimulus has a weaker impact on the decision-making dynamics than an early one. Conversely, in models where the initial state is stable, such as the LCA with a positive leak term, only stimuli shortly before the final response contribute to it: the time constant of the drift term back to the initial state corresponds to a time constant for the forgetting of earlier evidence. Alternatively, if one sets up a task in which noise in the stimulus, controlled by the experimenter, is the dominant noise source in the decision-making process, one can analyze correct trials and error trials separately and align them by either the time of stimulus onset or the time of response, to assess the impact of noise fluctuations on choice probability or response times. Care must be taken when aligning by response times, since threshold crossings are inevitably produced by noise fluctuations in the direction of the threshold crossed. However, in all cases model predictions can be tested with experimental measurements.
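The dependence on pulse timing described above can be illustrated with a deterministic linear model, dx/dt = lam*x + s(t). This is a sketch under simplifying assumptions: noise and the decision threshold are omitted, and the effect of a brief stimulus pulse is measured as the shift it produces in the decision variable at a fixed readout time. The symbol `lam` and the function names are hypothetical; lam = 0 stands in for a perfect integrator, lam > 0 for an unstable initial state, and lam < 0 for a leaky (stable) one.

```python
def final_state(lam, pulse_start, pulse_amp=1.0, pulse_width=0.05,
                base_input=0.5, t_end=1.0, dt=0.001):
    """Euler-integrate dx/dt = lam * x + s(t) from x = 0 and return x(t_end).

    s(t) equals base_input, plus pulse_amp during the pulse.
    Pass pulse_start=None for the unperturbed stimulus.
    """
    n_steps = int(round(t_end / dt))
    if pulse_start is None:
        first = last = -1  # empty pulse window
    else:
        first = int(round(pulse_start / dt))
        last = first + int(round(pulse_width / dt))
    x = 0.0
    for k in range(n_steps):
        s = base_input + (pulse_amp if first <= k < last else 0.0)
        x += dt * (lam * x + s)
    return x


def pulse_effect(lam, pulse_start, **kwargs):
    """Shift in the final decision variable caused by the pulse."""
    return final_state(lam, pulse_start, **kwargs) - final_state(lam, None, **kwargs)


if __name__ == "__main__":
    for lam in (0.0, 3.0, -3.0):
        early = pulse_effect(lam, 0.1)
        late = pulse_effect(lam, 0.8)
        print(f"lam={lam:+.1f}  early pulse: {early:.4f}  late pulse: {late:.4f}")
```

In this sketch the early and late pulses shift the perfect integrator equally, the early pulse dominates when lam is positive, and the late pulse dominates when lam is negative, matching the three cases described in the text.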
Current results appear to be task dependent, as some data sets suggest all evidence is equally impactful (supporting a perfect integrator) (Huk and Shadlen, 2005, Brunton et al., 2013), while others suggest the weighting of sensory evidence is higher early (Ludwig and Davies, 2011), higher late (Cisek et al., 2009, Thura et al., 2012), or oscillatory (Wyart et al., 2012) across the stimulus duration.

Beyond Discrimination: Action Selection

In the models considered heretofore, the decision of what action to take has been equivalent to the question of what is perceived. This is because in the relevant tasks, the difficulty is in unraveling the cause of a sensory input, which has been degraded either at its source, or through sensory processing, or as a result of imperfections of memory encoding and recall. The requisite action given a sensory percept is either specified by a straightforward instruction, for human subjects, or established by weeks to months of training, for non-human animal subjects. Thus, in the post-training stage used to acquire data, the step from percept to action can be considered very fast and independent of the parameters varied by the experimenter to modify task difficulty. However, most decisions require us to select a course of action given a percept, or given a combination of percepts. Two general strategies, termed model-based and model-free (Dayan and Daw, 2008, Dayan and Niv, 2008), are possible for action selection. Model-based strategies require an evaluation of all possible consequences, with their likelihoods, similar in a chess game to calculating all the combinations of moves in response to one move, or conversely, all the possible causes of an observation. A model-free system simply learns the value of a given state and selects an action based on the immediately reachable state with the highest value. For example, one could move a chess piece to produce the pattern of pieces that has led to the most games won in the past.
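The contrast between the two strategies can be sketched on a toy decision tree. The example below is illustrative only and makes strong simplifying assumptions: transitions are deterministic (so the likelihoods mentioned above are all one), there is a single agent, and the cached values used by the model-free rule are supplied by hand and are deliberately out of date. All names and numbers are hypothetical.

```python
def model_free_choice(state, successors, cached_value):
    """Pick the action whose immediately reachable state has the highest cached value."""
    options = successors[state]  # dict: action -> next state
    return max(options, key=lambda action: cached_value[options[action]])


def lookahead_value(state, successors, terminal_reward):
    """Model-based evaluation: exhaustively search the tree below `state`."""
    if state not in successors:  # leaf state: its outcome is known
        return terminal_reward[state]
    return max(lookahead_value(nxt, successors, terminal_reward)
               for nxt in successors[state].values())


def model_based_choice(state, successors, terminal_reward):
    """Pick the action whose subtree contains the best reachable outcome."""
    options = successors[state]
    return max(options,
               key=lambda action: lookahead_value(options[action],
                                                  successors, terminal_reward))


# Toy problem (all values are made up for illustration).
successors = {
    "start": {"left": "A", "right": "B"},
    "A": {"go": "A1"},
    "B": {"go": "B1", "stay": "B2"},
}
terminal_reward = {"A1": 1.0, "B1": 0.0, "B2": 3.0}
cached_value = {"A": 0.8, "B": 0.5}  # learned from past experience, now stale

if __name__ == "__main__":
    print("model-free :", model_free_choice("start", successors, cached_value))
    print("model-based:", model_based_choice("start", successors, terminal_reward))
```

Here the model-free rule needs only one table lookup per action, whereas the model-based rule visits every leaf of the tree, which is the source of both its flexibility and its cost.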
A wide literature in the field of reinforcement learning (Barto, 1994, Redish et al., 2007) and its possible neural underpinnings (Daw and Doya, 2006, Johnson et al., 2007, Lee and Seo, 2007) addresses how one can learn the value of states over multiple trials. This literature has influenced model-free biophysical models of decision making (Soltani and Wang, 2008, 2010), with the principal requirement being enhanced Hebbian learning when the resulting action leads to positive reinforcement, but not otherwise. The neural activity underlying the decision precedes the reinforcement signal, so in order for the appropriate synapses to be potentiated when positive reinforcement arrives, authors have suggested either that the activity itself remains in a persistent state of high firing rate (Soltani and Wang, 2006), or that a molecular “eligibility trace” of earlier activity persists through the time of the reinforcement signal (Bogacz et al., 2007a, Izhikevich, 2007). A model-based method for making a decision is more flexible and can be more accurate, but in all but the simplest cases the combinatorial explosion of alternatives becomes unwieldy and time-consuming to calculate. Hierarchical reinforcement learning renders such models more manageable (Barto and Mahadevan, 2003, Ito and Doya, 2011, Botvinick, 2012).
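The role of an eligibility trace in bridging the gap between decision-related activity and a later reinforcement signal can be sketched as follows. This is a generic reward-gated Hebbian rule written for illustration, not the specific model of any of the papers cited above: it assumes that coincident pre- and postsynaptic activity sets a trace, that the trace decays exponentially until the reward arrives, and that the weight change is the product of reward and remaining trace. The function name and all parameters are hypothetical.

```python
import math


def reward_gated_update(weight, pre_rate, post_rate, delay, reward,
                        trace_tau=1.0, learning_rate=1e-4):
    """Return the synaptic weight after one reward-gated Hebbian update.

    pre_rate, post_rate : firing rates at the time of the decision
    delay               : time from decision to reinforcement signal
    reward              : reinforcement signal (0 means no reinforcement)
    trace_tau           : assumed decay time constant of the eligibility trace
    """
    # Hebbian coincidence at decision time tags the synapse.
    trace_at_decision = pre_rate * post_rate
    # The tag decays exponentially while waiting for the outcome.
    trace_at_reward = trace_at_decision * math.exp(-delay / trace_tau)
    # Only a nonzero reinforcement signal converts the tag into a weight change.
    return weight + learning_rate * reward * trace_at_reward


if __name__ == "__main__":
    w0 = 0.5
    print("no reward      :", reward_gated_update(w0, 20.0, 30.0, 0.5, 0.0))
    print("prompt reward  :", reward_gated_update(w0, 20.0, 30.0, 0.5, 1.0))
    print("delayed reward :", reward_gated_update(w0, 20.0, 30.0, 2.0, 1.0))
```

Under these assumptions an unrewarded trial leaves the weight unchanged, and the longer the reward is delayed, the smaller the potentiation.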
Table 1. xppaut code to produce nullclines and fixed points for two interacting groups of neurons, connected in a decision-making circuit, as used in Figs. 4-7.
# Rate-based model of two coupled neuron populations.
# I1 and I2 are the inputs to each population, which should be varied.
par I1=5
par I2=5.5
# Strength of self-excitation
par wself=1
# Cross connections are net inhibitory
par wcross=-0.2
# tau is the time constant -- the slowest synaptic time constant, of NMDA is used.
# tau has no effect on steady state solutions.
par tau=0.075
# Sigmoidal f-I curves require three parameters, these are:
# rmax, the maximum rate
par rmax=100
# Ithresh, current to produce half-maximum rate
par Ithresh=50
# delta, determines the width of the sigmoid (sensitivity near Ithresh).
par delta=20
# Initial firing rates can be varied.
init r1=0
init r2=0
dr1/dt=-r1/tau+rmax/(1+exp(-(I1+wself*r1+wcross*r2-Ithresh)/delta))/tau
dr2/dt=-r2/tau+rmax/(1+exp(-(I2+wself*r2+wcross*r1-Ithresh)/delta))/tau
# The following are default values for the start-up of xpp.
@ method=rk,total=2,bound=500,dt=.005,dtmin=1e-12,atoler=1e-7
@ toler=1e-5,xhi=2,yhi=100,ylo=0 njmp=5
@ bell=off,nout=50
done
Table 2. The Matlab function used to simulate multiple decision-making trials, as plotted in each panel of Figs. 6-9.

function [ r1 r2 ] = WTAtrials( Iapp1,Iapp2,sigma );
% WTAtrials produces multiple trials of a winner-takes-all network produced
% by coupling two sigmoidal firing rate models, each representing a neural
% population.
% Iapp1 and Iapp2 are the respective current inputs to the 2 populations.
% sigma is the level of noise (added independently to each population).
% The function returns the firing rate as a function of time for each
% trial for each population.
% Number of trials is defined within the function as Ntrials = 10.

dt = 0.001;                         % timestep for simulation
tmax = 5.0;                         % max time for integration
tvec = 0:dt:tmax;                   % time vector
Ntrials = 10;
r1 = zeros(Ntrials, length(tvec));  % records rate of one cell group
r2 = zeros(Ntrials, length(tvec));  % records rate of other cell group
tau = 0.05;                         % time constant (cf NMDA receptors)
rmax = 100;                         % max rate of cells
Ithresh = 50;                       % input needed for 1/2-max firing
Idelta = 20;                        % determines steepness of sigmoidal f-I curve
wself = 1;                          % recurrent excitatory feedback
wcross = -0.2;                      % strength of cross-inhibition
trial = 0;
rhighstart = 0*rmax;                % If nonzero, provides a range of starting points.

for trial = 1:Ntrials
    if ( trial <= Ntrials/2 )
        r1init = 0.1+rhighstart*(Ntrials/2-trial)/(Ntrials/2-1);
        r2init = 0.1;
    else
        r2init = 0.1+rhighstart*(Ntrials-trial)/(Ntrials/2-1);
        r1init = 0.1;
    end
    % Implement initial conditions
    r1(trial,1) = r1init;
    r2(trial,1) = r2init;
    % Produce independent random noise for each population
    noise1 = sigma*sqrt(dt/tau)*randn(size(tvec));
    noise2 = sigma*sqrt(dt/tau)*randn(size(tvec));
    % Now integrate with basic Euler-Maruyama method using sigmoid f-I
    % curves and current, I, linear in rate.
    for i = 2:length(tvec)
        r1(trial,i) = r1(trial,i-1) + ...
            dt/tau*( rmax/(1+exp(-(Iapp1+wself*r1(trial,i-1)+wcross*r2(trial,i-1)-Ithresh)/Idelta)) ...
            -r1(trial,i-1) )+ noise1(i);
        r2(trial,i) = r2(trial,i-1) + ...
            dt/tau*(rmax/(1+exp(-(Iapp2+wself*r2(trial,i-1)+wcross*r1(trial,i-1)-Ithresh)/Idelta)) ...
            -r2(trial,i-1) ) + noise2(i);
    end
end
end
References

Balci F, Simen P, Niyogi R, Saxe A, Hughes JA, Holmes P, Cohen JD (2011) Acquisition of decision making criteria: reward rate ultimately beats accuracy. Atten Percept Psychophys 73:640-657.
Barto AG (1994) Reinforcement learning control. Current Opinion in Neurobiology 4:888-893.
Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discrete Event Dynamical Systems: Theory and Applications 13:343-379.
Beck JM, Ma WJ, Kiani R, Hanks T, Churchland AK, Roitman J, Shadlen MN, Latham PE, Pouget A (2008) Probabilistic population codes for Bayesian decision making. Neuron 60:1142-1152.
Bellman R (1957) Dynamic Programming. Princeton, NJ: Princeton University Press.
Bogacz R, Brown E, Moehlis J, Holmes P, Cohen JD (2006) The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review 113:700-765.
Bogacz R, McClure SM, Li J, Cohen JD, Montague PR (2007a) Short-term memory traces for action bias in human reinforcement learning. Brain Res 1153:111-121.
Bogacz R, Usher M, Zhang J, McClelland JL (2007b) Extending a biologically inspired model of choice: multi-alternatives, nonlinearity and value-based multidimensional choice. Philos Trans R Soc Lond B Biol Sci 362:1655-1670.
Botvinick MM (2012) Hierarchical reinforcement learning and decision making. Current Opinion in Neurobiology 22:956-962.
Brunton BW, Botvinick MM, Brody CD (2013) Rats and humans can optimally accumulate evidence for decision-making. Science 340:95-98.
Cain N, Shea-Brown E (2012) Computational models of decision making: integration, stability, and noise. Current Opinion in Neurobiology 22:1047-1053.
Churchland AK, Ditterich J (2012) New advances in understanding decisions among multiple alternatives. Current Opinion in Neurobiology 22:920-926.
Churchland AK, Kiani R, Shadlen MN (2008) Decision-making with multiple alternatives. Nature Neuroscience 11:693-702.
Cisek P, Puskas GA, El-Murr S (2009) Decisions in changing conditions: the urgency-gating model. J Neurosci 29:11560-11571.
Daw ND, Doya K (2006) The computational neurobiology of learning and reward. Current Opinion in Neurobiology 16:199-204.
Dayan P, Daw ND (2008) Decision theory, reinforcement learning, and the brain. Cognitive, Affective & Behavioral Neuroscience 8:429-453.
Dayan P, Niv Y (2008) Reinforcement learning: the good, the bad and the ugly. Current Opinion in Neurobiology 18:185-196.
Deco G, Rolls ET (2005) Attention, short-term memory, and action selection: a unifying theory. Prog Neurobiol 76:236-256.
Deneve S (2012) Making decisions with unknown sensory reliability. Frontiers in Neuroscience 6:75.
Ditterich J (2010) A comparison between mechanisms of multi-alternative perceptual decision making: ability to explain human behavior, predictions for neurophysiology, and relationship with decision theory. Frontiers in Neuroscience 4:184.
Doya K (2008) Modulators of decision making. Nature Neuroscience 11:410-416.
Drugowitsch J, Moreno-Bote R, Churchland AK, Shadlen MN, Pouget A (2012) The cost of accumulating evidence in perceptual decision making. The Journal of Neuroscience 32:3612-3628.
Furman M, Wang XJ (2008) Similarity effect and optimal control of multiple-choice decision making. Neuron 60:1153-1168.
Gillespie DT (1992) Markov Processes. Academic Press.
Glimcher PW (2001) Making choices: the neurophysiology of visual-saccadic decision making. Trends in Neurosciences 24:654-659.
Glimcher PW (2003) The neurobiology of visual-saccadic decision making. Annual Review of Neuroscience 26:133-179.
Gold JI, Shadlen MN (2001) Neural computations that underlie decisions about sensory stimuli. Trends Cogn Sci 5:10-16.
Gold JI, Shadlen MN (2007) The neural basis of decision making. Annual Review of Neuroscience 30:535-574.
Hanks TD, Mazurek ME, Kiani R, Hopp E, Shadlen MN (2011) Elapsed decision time affects the weighting of prior probability in a perceptual decision task. The Journal of Neuroscience 31:6339-6352.
Huk AC, Shadlen MN (2005) Neural activity in macaque parietal cortex reflects temporal integration of visual motion signals during perceptual decision making. The Journal of Neuroscience 25:10420-10436.
Ito M, Doya K (2011) Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology 21:368-373.
Izhikevich EM (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17:2443-2452.
Joel D, Niv Y, Ruppin E (2002) Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks 15:535-547.
Johnson A, van der Meer MA, Redish AD (2007) Integrating hippocampus and striatum in decision-making. Current Opinion in Neurobiology 17:692-697.
Lawler GF (2006) Introduction to Stochastic Processes. Boca Raton: Chapman & Hall/CRC.
Lee D, Seo H (2007) Mechanisms of reinforcement learning and decision making in the primate dorsolateral prefrontal cortex. Annals of the New York Academy of Sciences 1104:108-122.
Ludwig CJ, Davies JR (2011) Estimating the growth of internal evidence guiding perceptual decisions. Cognitive Psychology 63:61-92.
Machens CK, Romo R, Brody CD (2005) Flexible control of mutual inhibition: a neural model of two-interval discrimination. Science 307:1121-1124.
Miller P, Katz DB (2013) Accuracy and response-time distributions for decision-making: linear perfect integrators versus nonlinear attractor-based neural circuits. Journal of Computational Neuroscience.
Miller P, Wang XJ (2006) Discrimination of temporally separated stimuli by integral feedback control. Proc Natl Acad Sci USA 103:201-206.
Newsome WT, Britten KH, Movshon JA (1989) Neuronal correlates of a perceptual decision. Nature 341:52-54.
Niwa M, Ditterich J (2008) Perceptual decisions between multiple directions of visual motion. The Journal of Neuroscience 28:4435-4445.
Ratcliff R (1978) A theory of memory retrieval. Psychological Review 85:59-108.
Ratcliff R (2002) A diffusion model account of response time and accuracy in a brightness discrimination task: fitting real data and failing to fit fake but plausible data. Psychon Bull Rev 9:278-291.
Ratcliff R, Hasegawa YT, Hasegawa RP, Smith PL, Segraves MA (2007) Dual diffusion model for single-cell recording data from the superior colliculus in a brightness-discrimination task. J Neurophysiol 97:1756-1774.
Ratcliff R, McKoon G (2008) The diffusion decision model: theory and data for two-choice decision tasks. Neural Comput 20:873-922.
Ratcliff R, Smith PL (2004) A comparison of sequential sampling models for two-choice reaction time. Psychol Rev 111:333-367.
Redish AD, Jensen S, Johnson A, Kurth-Nelson Z (2007) Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychological Review 114:784-805.
Romo R, Salinas E (2003) Flutter discrimination: neural codes, perception, memory and decision making. Nat Rev Neurosci 4:203-218.
Rorie AE, Gao J, McClelland JL, Newsome WT (2010) Integration of sensory and reward information during perceptual decision-making in lateral intraparietal cortex (LIP) of the macaque monkey. PLoS ONE 5:e9308.
Rüter J, Marcille N, Sprekeler H, Gerstner W, Herzog MH (2012) Paradoxical evidence integration in rapid decision processes. PLoS Computational Biology 8:e1002382.
Salinas E (2004) Fast remapping of sensory stimuli onto motor actions on the basis of contextual modulation. J Neurosci 24:1113-1118.
Seymour B, O'Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS (2004) Temporal difference models describe higher-order learning in humans. Nature 429:664-667.
Shadlen MN, Newsome WT (2001) Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology 86:1916-1936.
Shankar S, Massoglia DP, Zhu D, Costello MG, Stanford TR, Salinas E (2011) Tracking the temporal evolution of a perceptual judgment using a compelled-response task. The Journal of Neuroscience 31:8406-8421.
Shea-Brown E, Gilzenrat MS, Cohen JD (2008) Optimization of decision making in multilayer networks: the role of locus coeruleus. Neural Comput 20:2863-2894.
Simen P, Contreras D, Buck C, Hu P, Holmes P, Cohen JD (2009) Reward rate optimization in two-alternative decision making: empirical tests of theoretical predictions. Journal of Experimental Psychology: Human Perception and Performance 35:1865-1897.
Smith PL, Ratcliff R (2004) Psychology and neurobiology of simple decisions. Trends Neurosci 27:161-168.
Soltani A, Wang XJ (2006) A biophysically-based neural model of matching law behavior: melioration by stochastic synapses. J Neurosci 26:3731-3744.
Soltani A, Wang XJ (2008) From biophysics to cognition: reward-dependent adaptive choice behavior. Curr Opin Neurobiol 18:209-216.
Soltani A, Wang XJ (2010) Synaptic computation underlying probabilistic inference. Nat Neurosci 13:112-119.
Stanford TR, Shankar S, Massoglia DP, Costello MG, Salinas E (2010) Perceptual decision making in less than 30 milliseconds. Nature Neuroscience 13:379-385.
Sugrue LP, Corrado GS, Newsome WT (2005) Choosing the greater of two goods: neural currencies for valuation and decision making. Nature Reviews Neuroscience 6:363-375.
Thura D, Beauregard-Racine J, Fradet CW, Cisek P (2012) Decision making by urgency gating: theory and experimental support. Journal of Neurophysiology 108:2912-2930.
Usher M, McClelland JL (2001) The time course of perceptual choice: the leaky, competing accumulator model. Psychol Rev 108:550-592.
Wald A (1947) Sequential analysis. New York: Wiley.
Wald A, Wolfowitz J (1948) Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics 19:326-339.
Wang XJ (2002) Probabilistic decision making by slow reverberation in cortical circuits. Neuron 36:955-968.
Wang XJ (2008) Decision making in recurrent neuronal circuits. Neuron 60:215-234.
Wong KF, Wang XJ (2006) A recurrent network mechanism of time integration in perceptual decisions. The Journal of Neuroscience 26:1314-1328.
Wyart V, de Gardelle V, Scholl J, Summerfield C (2012) Rhythmic fluctuations in evidence accumulation during decision making in the human brain. Neuron 76:847-858.
Zhou X, Wong-Lin K, Holmes P (2009) Time-varying perturbations can distinguish among integrate-to-threshold models for perceptual decision making in reaction time tasks. Neural Computation 21:2336-2362.