Adaptive stepsize selection for tracking in a regime-switching environment



Automatica 43 (2007) 1896–1908
www.elsevier.com/locate/automatica

Andre Costa (a,∗), Felisa J. Vázquez-Abad (b)

(a) ARC Centre of Excellence for Mathematics and Statistics of Complex Systems, University of Melbourne, 3010, Australia
(b) Department of Mathematics and Statistics, University of Melbourne, 3010, Australia

Received 29 June 2006; received in revised form 3 December 2006; accepted 20 March 2007
Available online 30 August 2007

Abstract

We consider the problem of using a stochastic approximation algorithm to perform online tracking in a non-stationary environment characterised by abrupt “regime changes”. The primary contribution of this paper is a new approach for adaptive stepsize selection that is suitable for this type of non-stationarity. Our approach is pre-emptive rather than reactive, and is based on a strategy of maximising the rate of adaptation, subject to a constraint on the probability that the iterates fall outside a pre-determined range of acceptable error. The basis for our approach is provided by the theory of weak convergence for stochastic approximation algorithms.

Crown Copyright © 2007 Published by Elsevier Ltd. All rights reserved.

Keywords: Stochastic approximation; Tracking; Regime switching; Abrupt changes; Weak convergence

This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor George Yin under the direction of Editor Ian Petersen.

∗ Corresponding author.
E-mail addresses: [email protected] (A. Costa), [email protected] (F.J. Vázquez-Abad).

0005-1098/$ - see front matter Crown Copyright © 2007 Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.automatica.2007.03.025

1. Introduction

In this paper, we present a new approach for adaptively selecting the stepsize parameters for stochastic approximation algorithms (Kushner & Yin, 2003) in a non-stationary environment. We consider stochastic approximation algorithms of the general form

$$\theta_{n+1} := \theta_n + \Gamma_n Y_n, \tag{1}$$

where $n = 1, 2, \ldots$ is a discrete-time iteration index, $\theta_n \in \mathbb{R}^q$ is a column vector of estimates of system parameters of interest, $\Gamma_n$ is a $q \times q$ diagonal matrix with diagonal entries $\gamma_{n,i}$, $i = 1, \ldots, q$, and the $Y_n \in \mathbb{R}^q$ are random vectors which, on average, drive the iterates towards some target $\theta^* \in \mathbb{R}^q$. The vectors $\theta_n$ and $Y_n$ have components $\theta_{n,i}$ and $Y_{n,i}$, respectively, where $i = 1, \ldots, q$. In particular, we consider the case of a non-stationary environment where the target is subject to abrupt changes which we refer to as regime switching (Yin, Krishnamurthy, & Ion, 2004; Yin & Zhang, 2005). Thus, the target is time-dependent, and is denoted $\theta^*_n$ accordingly (with components $\theta^*_{n,i}$, $i = 1, \ldots, q$).

Tracking algorithms have been studied extensively for the case of slowly time-varying parameters (see, for example, Benveniste, Metivier, & Priouret, 1990; Krishnamurthy, Yin, & Singh, 2001; Kushner, 1995; Kushner & Yin, 2003, and the references therein). These studies analyse the properties of tracking algorithms when the stepsizes are constant, that is, $\gamma_n = \gamma$ for all $n$. In particular, the theory of weak convergence was used in Benveniste et al. (1990) to characterize and quantify the compromise between tracking and accuracy, under a variety of hypermodels governing the non-stationarity. This led to the identification of optimal constant stepsizes for specific classes of tracking problems involving slow variations of the target $\theta^*_n$. The most common framework for this type of study is that of linear systems; the reader is referred to Benveniste et al. (1990), Farhang-Boroujeny (1998) and Kushner and Yin (2003) for an overview. On the other hand, tracking in a non-stationary environment characterised by abrupt regime changes has received only limited attention, most recently in Yin et al. (2004) and Yin and Zhang (2005).

In this paper, we present a novel adaptive stepsize approach that is suitable for online tracking in a regime-switching environment. To the best of our knowledge, this is the first adaptive stepsize scheme that is specifically designed for this type of non-stationarity, and which employs a pre-emptive rather than reactive approach, as will be described shortly.

By “online”, we refer to the situation where the values of the iterates $\theta_n$ generated by (1) have an immediate impact on the system, in real time, and where no offline training period is available. For example, take the case where the iterates $\theta_n$ track the value of a risky stock whose actual return is unknown, and where the $\theta_n$ are used to make investment decisions as the financial market evolves in real time. Here, tracking errors result in over- or underestimation of the true returns, which can lead to a loss of actual revenue.

We propose an adaptive stepsize algorithm that is based on a strategy of maximising the rate of adaptation (by maximising the stepsizes), subject to a constraint on the probability that the iterates fall outside a given range of “acceptable error”, discussed below. Thus, our approach is pre-emptive rather than reactive (or predictive), because it maintains a perpetual state of readiness for change, instead of relying on an auxiliary change detection mechanism (see, for example, Basseville & Benveniste, 1986) and subsequent reactive control. In particular, this approach is predicated on the assumption that the controller has no a priori knowledge of the regime change times, hence the strategy of maintaining readiness by making the stepsize as large as possible, subject to a constraint on the acceptable error.

The proposed stepsize selection criterion described above is a novel approach to balancing the competing requirements for fast tracking versus small errors, and can be stated informally as the following optimisation criterion: for each component $i = 1, \ldots, q$,

$$\gamma^*_{n,i} = \max_{\gamma \in (0, \bar{\gamma}]} \gamma \quad \text{s.t.} \quad P^{\gamma}_n(\text{``error''}) \le \alpha, \tag{2}$$

where $\alpha \in (0, 1)$ determines the error probability, $\bar{\gamma} \le 1$ is an upper bound on the stepsizes, and $P^{\gamma}_n(\cdot)$ represents a probability measure conditional on $\gamma_{n,i} = \gamma$ and the current regime at time $n$. Criterion (2) differs from other stepsize selection criteria that appear in the literature (see, for example, Benveniste et al., 1990; Yousef & Sayed, 2001); the typical approach is to find the constant stepsizes which minimise

$$\lim_{n \to \infty} E^{\gamma}_M\left[\|\theta_n - \theta^*_n\|^2\right],$$

where $E^{\gamma}_M$ represents the expectation under $\gamma$ and some assumed model $M$ of slow time variations for the target $\theta^*_n$.

It is important to note that our approach is designed for abrupt regime changes of the kind described in Basseville and Benveniste (1986), Yin et al. (2004) and Yin and Zhang (2005), where $\theta^*_n$ is constant in between regime change times. It is not designed for the case where $\theta^*_n$ is slowly and continually varying, which is the typical context for performance analysis of tracking algorithms.

In this paper, we focus on the following two alternative models for the “error” event appearing in (2).

Model 1: Suppose that the user wishes to track the absolute values of the components of $\theta^*_n$ with a given level of accuracy. Hence, for each $i = 1, \ldots, q$, a constraint is imposed on the probability that the iterates $\theta_{n,i}$ fall outside the set $[\theta^*_{n,i} - \delta_i, \theta^*_{n,i} + \delta_i]$, where $\delta_i > 0$ is a constant set by the user. Thus, the $\delta_i$, $i = 1, \ldots, q$, define a region of “tolerance” for each component within which the user is prepared to operate.

Model 2: Suppose the user wishes to identify the largest component of $\theta^*_n$. Hence, a constraint is placed on the probability of the event

$$\left\{ \arg\max_{i=1,\ldots,q} \theta_{n,i} \ne \arg\max_{i=1,\ldots,q} \theta^*_{n,i} \right\}.$$

In this second model, it is the relative ordering (rather than the absolute values) of the components of $\theta^*$ that is important.
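The two error events can be made concrete with a small sketch. The function names and the three-component example values below are hypothetical and chosen only for illustration; the tie-breaking rule in `argmax` is also a choice made here, since ties are not discussed in the text.

```python
def model1_errors(theta, target, delta):
    """Model 1: per-component flag, True when theta_i lies outside
    [target_i - delta_i, target_i + delta_i]."""
    return [abs(th - tg) > d for th, tg, d in zip(theta, target, delta)]


def argmax(values):
    """Index of the largest entry (first index wins on ties; a choice made here)."""
    return max(range(len(values)), key=lambda i: values[i])


def model2_error(theta, target):
    """Model 2: True when the iterates and the target disagree on
    which component is largest."""
    return argmax(theta) != argmax(target)


# Hypothetical three-component example (values chosen for illustration only).
theta = [1.9, 5.2, 4.1]      # current iterates
target = [2.0, 6.0, 4.0]     # true targets for the active regime
delta = [0.3, 0.3, 0.3]      # tolerances

print(model1_errors(theta, target, delta))  # only the second component misses its band
print(model2_error(theta, target))          # both vectors peak at the same index
```

In this example the second component is a Model 1 error (it misses its target by 0.8), yet there is no Model 2 error, because the ordering of the components is still correct.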

Significantly, our approach does not require any assumptions about the process governing regime changes; we do not even require that it be stationary. We assume only that the tracking algorithm has the opportunity to reach near-equilibrium within the current regime prior to the next regime change.

The paper is structured as follows. In Section 2, we describe the regime-switching environment. In Section 3, we give the specific form of the stochastic approximation algorithm that will be studied, and we review weak convergence results that form the basis for our adaptive stepsize scheme, which is presented in Section 4. In Sections 5 and 6 we apply the stepsize criterion (2) to Models 1 and 2, respectively, and present the results of numerical experiments. We discuss some additional applications of our method in Section 7, followed by conclusions in Section 8.

2. Non-stationarity: a regime-switching environment

We consider the problem of tracking a time-dependent quantity, $\theta^*_n$, $n = 1, 2, \ldots$, in the scenario where $\theta^*_n$ is subject to abrupt and unpredictable changes. In between such change times, which we refer to as regime changes, $\theta^*_n$ is constant; an illustrative example in one dimension is shown in Fig. 1.

In order to more precisely characterize the non-stationary environment, let $\Omega$ denote a countable collection of regimes, where each regime $\omega \in \Omega$ is associated with a 4-tuple

$$\{g_\omega(\cdot), F_\omega(\cdot), \theta^*_\omega, \Sigma_\omega\},$$

where $g_\omega : \mathbb{R}^q \to \mathbb{R}^q$ is a vector-valued function that has its unique zero at $\theta = \theta^*_\omega$, and $F_\omega : \mathbb{R}^q \to \mathbb{R}$ is a joint probability distribution with associated mean $\theta^*_\omega$ and variance–covariance matrix $\Sigma_\omega$. As we shall see in Section 3, the above formulation will allow us to pose iteration (1) as a stochastic search method for tracking the solution to $g_\omega(\theta) = 0$. In this paper, we focus on the particular case where $g_\omega(\theta) = \theta^*_\omega - \theta$. A discussion of other functional forms for $g$ is given in Section 7. It is important to note that none of the above functions or parameters is assumed to be known a priori by the controller.

Let $\omega_n \in \Omega$ denote the regime that is active at time $n$. To keep the notation simple, let $g_n \equiv g_{\omega_n}$, $F_n \equiv F_{\omega_n}$, $\theta^*_n \equiv \theta^*_{\omega_n}$ and $\Sigma_n \equiv \Sigma_{\omega_n}$ denote the quantities associated with the current regime.

Fig. 1. Example of a regime-switching environment.

We introduce a general framework for non-stationarity in the form of regime switching, where changes occur on a time scale that is slow compared with the iteration index $n$. Specifically, let $T_\omega$ denote the (random) duration of regime $\omega$.

Assumption 1. For each regime $\omega \in \Omega$, $E[T_\omega] \gg 1$.

This is an extremely mild assumption; it simply encapsulates the requirement that the regime does not change “too often” (for example, at every iteration), so that the adaptive algorithm has some chance of tracking the changes. Indeed, such an assumption would apply to any adaptive algorithm.

Note that we do not make any assumptions about the process governing the regime sequence other than Assumption 1; we do not even require that the modulating process be stationary.

3. A stochastic approximation algorithm

We take the case where $\theta^*_n$ is the unique solution to the equation $g_n(\theta) = 0$, and where

$$E[Y_n \mid \theta_n] = g_n(\theta_n)$$

with probability 1, so that we can pose iteration (1) as a stochastic zero-finding search procedure. In particular, the choice

$$g_n(\theta) = \theta^*_n - \theta \tag{3}$$

yields the well-known least mean squares method (Benveniste et al., 1990; Kushner & Yin, 2003), which we shall employ throughout this paper. We note that while (3) governs the search directions for the iteration (1), our criterion (2) will be used simultaneously to determine the magnitude of the updates, by adapting the stepsize sequence $\Gamma_n$ in a manner that is appropriate for a regime-switching environment. This will be described in Section 4.

We assume that the controller observes a sequence of independent random vectors $\xi_n$, $n = 1, 2, \ldots$, with entries $\xi_{n,i}$, $i = 1, \ldots, q$, and joint distribution functions $F_n$, satisfying $E[\xi_{n,i}] = \theta^*_{n,i}$, $\mathrm{Var}[\xi_{n,i}] = \Sigma_{n,ii}$ for $i = 1, \ldots, q$, and $\mathrm{Cov}[\xi_{n,i}, \xi_{n,j}] = \Sigma_{n,ij}$ for $i \ne j$, $i, j = 1, \ldots, q$. As noted in Section 2, none of these quantities are assumed to be known a priori by the controller. However, we do make the following mild assumption.

Assumption 2. For each $n$, $\theta^*_{n,i} < \infty$, $i = 1, \ldots, q$, and $\Sigma_{n,ij} < \infty$, $i, j = 1, \ldots, q$.

The unknown quantity $\theta^*_n$ in (3) is replaced by its “noisy estimate” $\xi_n$, so that $Y_n = \xi_n - \theta_n$. Iteration (1) then takes the specific form

$$\theta_{n+1} := \theta_n + \Gamma_n(\xi_n - \theta_n), \tag{4}$$

where we set $\theta_1 = \xi_1$.
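As a minimal sketch of iteration (4) in one dimension with a constant stepsize, the following may help fix ideas. The function name `track`, the seed, and the target and variance values are assumptions introduced here for illustration; they are not part of the algorithm statement.

```python
import random


def track(observations, gamma, theta_start):
    """Iteration (4) in one dimension with a constant stepsize:
    theta_{n+1} = theta_n + gamma * (xi_n - theta_n)."""
    theta = theta_start
    path = [theta]
    for xi in observations:
        theta = theta + gamma * (xi - theta)
        path.append(theta)
    return path


# Deterministic check: noiseless observations equal to 1.0, starting from 0.0.
# With gamma = 0.5 the remaining error halves at every step.
print(track([1.0, 1.0, 1.0], gamma=0.5, theta_start=0.0))

# Noisy check under assumed values: target 2.0, observation variance 0.5.
rng = random.Random(0)
obs = [rng.gauss(2.0, 0.5 ** 0.5) for _ in range(2000)]
path = track(obs, gamma=0.05, theta_start=obs[0])  # theta_1 = xi_1
tail_mean = sum(path[-1000:]) / 1000
print(round(tail_mean, 2))
```

The update is an exponentially weighted average of past observations, so the iterates fluctuate around the target with a spread that grows with the stepsize.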

3.1. Review: results for a stationary environment

Stochastic approximation algorithms of form (1) have been studied extensively for the stationary case, where $\theta^*_n = \theta^*$ for all $n$ (in other words, when there are no regime changes, and $g_n(\theta) = g(\theta)$ for all $n$). We now give a brief review of results for the stationary case.

In this section we focus on the case of constant stepsizes, with $\gamma_{n,i} = \gamma$ for all $n$ and components $i$. In this case, it is common to denote the iterates using $\theta^\gamma_n$, where the superscript emphasizes their dependence on the fixed stepsize $\gamma$. Since the constant stepsize case will be used to motivate an adaptive stepsize scheme in Section 4, we summarise the main results, which are developed in detail in Benveniste et al. (1990) and also in Kushner and Yin (2003).

Let $\theta^\gamma(t)$, $t \in \mathbb{R}$, be the continuous-time interpolation of $\theta^\gamma_n$, $n = 0, 1, 2, \ldots$, given by

$$\theta^\gamma(t) = \theta^\gamma_n \quad \text{for } t \in (\gamma n, \gamma(n + 1)]. \tag{5}$$

Furthermore, let $\theta(t) \in \mathbb{R}^q$ be the continuous-time variable that solves the system of ODEs

$$\frac{d\theta(t)}{dt} = g(\theta). \tag{6}$$

Then, under some regularity assumptions which we outline below, as $\gamma \to 0$, there exists a time $\tau < \infty$ beyond which the fluctuations of the interpolated process around the limit ODE (6) are approximately normally distributed, that is,

$$\theta^\gamma(t) - \theta^* \overset{d}{\approx} N(0, \gamma\Psi) \quad \text{for } t \ge \tau, \tag{7}$$

where $\Psi$ is a constant matrix that satisfies

$$A\Psi + \Psi A^T + \Sigma = 0, \tag{8}$$

$\Sigma$ is the variance–covariance matrix of the observations $\xi_n$, and

$$A = \nabla_\theta g(\theta^*). \tag{9}$$

Relation (8) is known as the Lyapunov equation, and appears frequently in Benveniste et al. (1990) for the steady-state analysis of tracking algorithms where $\theta^*$ is slowly varying. In Section 4, we shall take a different approach, which involves exploiting (8) to solve the adaptive stepsize criterion (2).

For the case considered in this paper, where $g(\theta)$ is given by (3), we have $A = -I$, where $I$ is the identity matrix. Thus, (8) can be solved explicitly to give

$$\Psi = \tfrac{1}{2}\Sigma. \tag{10}$$

Expression (10) describes the relationship between the variance–covariance of the observations $\xi_n$ and that of the iterates $\theta_n$.
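The small-stepsize nature of this relationship can be checked numerically in one dimension. The sketch below is an illustration added here, not part of the original analysis: it iterates the exact variance recursion of (4) for independent observations with variance `sigma2`, namely $v_{n+1} = (1-\gamma)^2 v_n + \gamma^2 \sigma^2$, and compares its fixed point with the approximation $\gamma\Psi = \gamma\sigma^2/2$ implied by (7) and (10). The function name and the chosen stepsizes are assumptions.

```python
def stationary_variance(gamma, sigma2, n_iter=5000):
    """Iterate the exact variance recursion of
    theta_{n+1} = (1 - gamma) * theta_n + gamma * xi_n
    for independent observations xi_n with variance sigma2."""
    v = 0.0
    for _ in range(n_iter):
        v = (1.0 - gamma) ** 2 * v + gamma ** 2 * sigma2
    return v


sigma2 = 1.0
for gamma in (0.01, 0.05, 0.5):
    exact = stationary_variance(gamma, sigma2)
    approx = gamma * sigma2 / 2.0   # gamma * Psi with Psi = Sigma / 2
    print(gamma, exact, approx, exact / approx)
```

The fixed point of the recursion is $\gamma\sigma^2/(2-\gamma)$, so the ratio of exact to approximate variance is $2/(2-\gamma)$: close to 1 for small stepsizes, but about 1.33 at $\gamma = 0.5$, which is consistent with (7) being a limit result as $\gamma \to 0$.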

The regularity assumptions which underpin (7) are given in Chapter 10 of Kushner and Yin (2003). For our purposes, it can be verified that for the function $g(\theta)$ given in (3), the stochastic approximation (4) fits within the framework of martingale difference noise, and that Assumptions (A1.1)–(A1.7) in Chapter 10 of Kushner and Yin (2003) are satisfied if Assumption 2 holds. In particular, the finite variance condition in Assumption 2 together with the fact that $A = -I$ is a Hurwitz matrix yields the required uniform integrability of the $Y_n$, tightness of the iterates $\theta_n$ and stability of the ODE (6). The reader is referred to Kushner and Yin (2003) for details.

In the following section, we turn our attention back to a non-stationary regime-switching environment, where the variance–covariance matrix $\Sigma_n$ depends on the current regime at time $n$. Thus, (10) becomes

$$\Psi_n = \tfrac{1}{2}\Sigma_n, \tag{11}$$

to reflect this time-dependence.

Fig. 2. Comparison of small versus large fixed stepsize. Panels: small stepsize ($\gamma = 0.05$) and large stepsize ($\gamma = 0.5$).

4. An adaptive stepsize approach for a regime-switching environment

In this section, we use the theoretical results outlined in Section 3 for the stationary case to motivate a new adaptive stepsize scheme that is appropriate for tracking in the regime-switching environment described in Section 2.

In order to perform tracking in any non-stationary environment, the stepsize parameters must be bounded away from zero, otherwise the algorithm eventually ceases to adapt to a changing environment. Hence, the almost sure convergence results based on decreasing stepsizes, where $\gamma_{n,i} \to 0$ as $n \to \infty$ in an appropriate manner (see, for example, Kushner & Yin, 2003), are not relevant for the non-stationary case.

A common approach for tracking is to use fixed, constant stepsizes. Consider a given component $i$. When using a constant stepsize, there exists a trade-off in selecting its value: from (4), we see that large stepsize values yield a fast rate of adaptation, which is desirable when a regime change occurs, but on the other hand, (7) shows that smaller stepsize values are favourable in order to reduce the variance of the iterates $\theta_{n,i}$ about $\theta^*_{n,i}$. Thus, a stepsize value that is good for tracking the abrupt regime changes is not good for precise convergence within a given regime, where $\theta^*_{n,i}$ is constant. This trade-off is illustrated in Fig. 2 in the one-dimensional case.

Instead of using fixed stepsizes, we propose an adaptive stepsize scheme, which is designed to automatically balance the above competing objectives, without assuming any prior knowledge about the regime switching process (other than Assumption 1). The key to our approach is as follows: we anticipate (arbitrary) regime changes by always selecting the largest possible stepsizes for the current regime, as this maximises the rate of adaptation when a regime change eventually does occur. However, this maximisation is subject to a constraint on the probability that the variance of the iterates yields a given type of “error” that the user wishes to avoid. Thus, we seek solutions to the stepsize criterion (2). In the following sections, we consider two alternative models for the definition of an error event, which appears in the probability constraint in (2). In particular, the result (7) suggests using a normal distribution to approximate this probability constraint once the error event is specified.
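The trade-off faced by a fixed stepsize can be quantified with a short calculation. The sketch below rests on assumptions introduced here: after a jump of size `jump` in the target, the mean error under (4) decays as `jump * (1 - gamma)**k`, and within a regime the iterates have the exact stationary variance `gamma * sigma2 / (2 - gamma)` for independent observations. The scenario values (a jump of 4 units, tolerance 0.3, unit observation variance) are illustrative only.

```python
import math


def steps_to_recover(jump, delta, gamma):
    """Smallest k such that jump * (1 - gamma)**k <= delta, i.e. the number of
    iterations before the mean tracking error re-enters the tolerance band."""
    if jump <= delta:
        return 0
    return math.ceil(math.log(delta / jump) / math.log(1.0 - gamma))


def stationary_sd(gamma, sigma2):
    """Standard deviation of the iterates within a regime, from the variance
    recursion of (4) with independent observations."""
    return math.sqrt(gamma * sigma2 / (2.0 - gamma))


# Assumed scenario: the target jumps by 4 units, tolerance 0.3, unit variance.
for gamma in (0.05, 0.5):
    print(gamma, steps_to_recover(4.0, 0.3, gamma), round(stationary_sd(gamma, 1.0), 3))
```

Under these assumptions, $\gamma = 0.05$ needs 51 iterations to recover but has a within-regime standard deviation of about 0.16, whereas $\gamma = 0.5$ recovers in 4 iterations at the cost of a standard deviation of about 0.58, well outside a tolerance of 0.3.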

5. Model 1

For Model 1, we shall say that an error occurs for component $i$ at iteration $n$ if $\theta_{n,i}$ falls outside the set

$$[\theta^*_{n,i} - \delta_i, \theta^*_{n,i} + \delta_i],$$

where $\delta_i > 0$ is a constant that is set by the user. Hence the probability constraints in (2) take the specific form

$$P(\theta_{n,i} \notin [\theta^*_{n,i} - \delta_i, \theta^*_{n,i} + \delta_i]) \le \alpha, \quad i = 1, \ldots, q, \tag{12}$$

where $\delta_i$ defines an “error tolerance” range for the iterates $\theta_{n,i}$.

5.1. One dimension

Consider the one-dimensional case, where $\theta^*_n$ (and hence the iterates $\theta_n$) are scalars. Here we have a single stepsize parameter $\gamma$, and $\Psi_n$ and $\Sigma_n$ reduce to scalar variance terms. We observe that in the limit $\gamma \to 0$, the quantity $\gamma\Psi_n$ is the variance of the stationary Ornstein–Uhlenbeck process associated with the current regime at time $n$ (see Section 3.1). This suggests using a normal distribution to approximate the (singleton) probability constraint (12), which can then be expressed as

$$\frac{\delta}{\sqrt{\gamma\Psi_n}} \ge z_{1-\alpha}, \tag{13}$$

where $z_{1-\alpha}$ is the two-sided $100(1 - \alpha)$-percentile for the standard normal distribution. It follows that an approximate solution to problem (2) for this example is obtained when (13) holds with equality, or when $\gamma = \bar{\gamma}$, whichever is smaller. Thus, within the current regime at time $n$, we seek the target stepsize

$$\gamma^*_n = \min\left[\bar{\gamma}, \frac{\delta^2}{z^2_{1-\alpha}\Psi_n}\right], \tag{14}$$

where $\min[a, b]$ denotes the minimum value of the two numbers $a$ and $b$.
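As a hedged illustration of (14), the target stepsize can be evaluated directly once the observation variance is known. The function name `target_stepsize` is hypothetical, and reading the “two-sided percentile” as the $1 - \alpha/2$ quantile of the standard normal is an interpretation made here; it agrees with the value 1.96 quoted for $\alpha = 0.05$ in Section 5.3.1. The parameter values are those used in that section.

```python
from statistics import NormalDist


def target_stepsize(delta, alpha, sigma2, gamma_max):
    """Target stepsize (14) in one dimension, with Psi = sigma2 / 2 from (11)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided quantile, 1.96 for alpha = 0.05
    psi = sigma2 / 2.0
    return min(gamma_max, delta ** 2 / (z ** 2 * psi))


# Tolerance 0.3, error probability 0.05, maximum stepsize 0.8.
for sigma2 in (1.0, 0.5):
    print(sigma2, round(target_stepsize(0.3, 0.05, sigma2, 0.8), 4))
```

With these values the target stepsize is roughly 0.047 for an observation variance of 1 and roughly 0.094 for a variance of 0.5; halving the noise doubles the stepsize that the tolerance permits.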

5.2. Multiple dimensions

The one-dimensional case forms the basis for a general adaptive stepsize scheme in higher dimensions. Consider the marginal distribution of a given component $\theta_{n,i}$ of the iterates $\theta_n$. Within the current regime at time $n$, in the limit $\gamma_i \to 0$, the differences $\theta_{n,i} - \theta^*_{n,i}$ are governed by an Ornstein–Uhlenbeck process with variance $\gamma_i\Psi_{n,ii}$. Thus, following the same steps as for the one-dimensional case, we obtain the target stepsizes

$$\gamma^*_{n,i} = \min\left[\bar{\gamma}, \frac{\delta_i^2}{z^2_{1-\alpha}\Psi_{n,ii}}\right], \quad i = 1, \ldots, q. \tag{15}$$

Observe that $\delta_i$, $i = 1, \ldots, q$, and $z_{1-\alpha}$ are fixed constants, which are set by the algorithm designer. On the other hand, $\Psi_n$ depends on $\Sigma_n$, which in turn depends on the current regime at time $n$, and which is not assumed to be known in advance. Thus, an adaptive stepsize scheme based on (15) must estimate the variances $\Sigma_{n,ii}$, which can then be used to estimate the $\Psi_{n,ii}$ via relation (11). In particular, these quantities are estimated using a moving window of length $M$, that is, using the $M$ most-recent samples $\xi_{n-M+1}, \ldots, \xi_n$.

Table 1
Summary of parameters for Algorithm 1

Parameter         Range             Description
$M$               $\mathbb{Z}_+$    Length of moving window
$\bar{\gamma}$    $(0, 1]$          Maximum stepsize value
$\beta$           $(0, 1]$          Smoothing parameter for stepsize update
$\delta_i$        $(0, \infty)$     Tolerance for $\theta_{n,i}$
$\alpha$          $(0, 1)$          Probability of exceeding tolerance

The following adaptive stepsize algorithm is used to select the stepsizes $\gamma_{n,i}$ for the stochastic approximation (4). A summary of the algorithm parameters is given in Table 1.

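Algorithm 1. Initialise $M \in \mathbb{Z}_+$, $\bar{\gamma} \in (0, 1]$, $\delta \in \mathbb{R}^q_+$, $\alpha \in (0, 1)$ (and hence $z_{1-\alpha}$), $\beta \in (0, 1]$. Set $n := 1$, and initial stepsizes $\gamma_{1,i} \in (0, \bar{\gamma}]$, $i = 1, \ldots, q$.

(1) If $n$ is an integer multiple of $M$, perform Steps 2 and 3; otherwise, set $\gamma_{n+1,i} := \gamma_{n,i}$ for each $i = 1, \ldots, q$, and proceed directly to Step 4.

(2) For each $i = 1, \ldots, q$, calculate the sample variances $\hat{\Sigma}_{n,ii}$, and

$$\hat{\Psi}_{n,ii} := \frac{\hat{\Sigma}_{n,ii}}{2}.$$

(3) For each $i = 1, \ldots, q$, calculate

$$\hat{\gamma}^*_{n,i} := \min\left[\bar{\gamma}, \frac{\delta_i^2}{z^2_{1-\alpha}\hat{\Psi}_{n,ii}}\right] \tag{16}$$

and update

$$\gamma_{n+1,i} := (1 - \beta)\gamma_{n,i} + \beta\hat{\gamma}^*_{n,i}.$$

(4) Set $n := n + 1$, repeat from Step 1.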
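The steps of Algorithm 1 can be sketched in one dimension as follows. This is an illustrative reading rather than a reference implementation: the function name `adaptive_tracking`, the use of the unbiased sample variance, the guard for a zero sample variance, and the stationary test stream (mean 6, variance 1) are all assumptions introduced here.

```python
import random
from collections import deque
from statistics import NormalDist, variance


def adaptive_tracking(observations, M, gamma_max, delta, alpha, beta, gamma_init):
    """One-dimensional sketch of Algorithm 1 driving iteration (4).

    Returns the lists [theta_1, ..., theta_N] and [gamma_1, ..., gamma_N].
    Requires M >= 2 so that a sample variance can be formed.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided normal quantile
    window = deque(maxlen=M)                      # the M most recent observations
    theta = observations[0]                       # theta_1 = xi_1
    gamma = gamma_init
    thetas, gammas = [], []
    for n, xi in enumerate(observations, start=1):
        thetas.append(theta)
        gammas.append(gamma)
        window.append(xi)
        theta = theta + gamma * (xi - theta)      # iteration (4)
        if n % M == 0:                            # Step 1: update every M steps
            psi_hat = variance(list(window)) / 2.0            # Step 2
            if psi_hat > 0.0:
                target = min(gamma_max, delta ** 2 / (z ** 2 * psi_hat))  # (16)
            else:
                target = gamma_max                # guard added here for zero variance
            gamma = (1.0 - beta) * gamma + beta * target      # Step 3: smoothing
    return thetas, gammas


# Assumed stationary stream: mean 6, variance 1, with the settings of Section 5.3.1.
rng = random.Random(1)
obs = [rng.gauss(6.0, 1.0) for _ in range(2000)]
thetas, gammas = adaptive_tracking(obs, M=20, gamma_max=0.8, delta=0.3,
                                   alpha=0.05, beta=0.3, gamma_init=0.1)
avg_gamma = sum(gammas[-1000:]) / 1000
print(round(avg_gamma, 3))
```

Because the smoothing step forms a convex combination of the previous stepsize and a target that never exceeds the maximum, the stepsizes stay within $(0, \bar{\gamma}]$ whenever the initial stepsize does.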

Before proceeding with the experiments, we make a number of comments regarding Algorithm 1.

First, stepsize changes are performed only once every $M$ time steps, which is the length of the moving window used to estimate the mean and variance of the current regime. This was found to improve the overall stability and performance of the adaptive stepsize algorithm, compared with updating the stepsizes at every time step. One reason for this is that the estimation of the target stepsize $\gamma^*_{n,i}$ (see (15)) is based on the assumption of stationarity within a given regime; thus, any estimation window that includes a regime change will contain a mixture of samples $\xi_{n,i}$ from different distributions, which in turn introduces a bias in the estimation of $\gamma^*_{n,i}$ for that given time window. The advantage of updating only once every $M$ time steps is that for each regime change, at most one estimate of the target stepsize is biased due to a mixture of samples from the two regimes before and after the change event.

Second, there exists a trade-off in selecting the value of $M$; larger values lead to more accurate estimates of the target stepsizes $\gamma^*_{n,i}$ within each regime but reduce the speed of response, and vice versa. Our experience suggests that $M \ge 10$ is required for stability of the algorithm. A reasonable upper bound is $M \le \tfrac{1}{2}\inf_{\omega \in \Omega} E[T_\omega]$, such that with high probability, at least one time window of length $M$ contains samples taken entirely from the current regime. Indeed, Assumption 1 is consistent with the satisfaction of these bounds. For the numerical experiments presented in this paper, $E[T_\omega]$ ranges from 50 to 200, and we obtain good performances with $M$ in the range 20–30. It is worth noting that if the system designer does have some a priori knowledge about the characteristic length of the time intervals between regime changes, then it is possible to improve performance by adjusting $M$. In particular, if regime changes are less frequent, then a larger value of $M$ may improve performance, and vice versa.

Third, we note that $\hat{\gamma}^*_n \propto 1/\hat{\Psi}_n$, and in general, $E[1/X] \ne 1/E[X]$ for any random variable $X$. Thus, $\hat{\gamma}^*_n$ is a biased estimator of $\gamma^*_n$. If necessary, a first order correction for this bias can be applied as follows. Consider the Taylor series approximation $E[f(X)] \approx f(E[X]) + \tfrac{1}{2}f''(E[X])\,\mathrm{Var}[X]$ for a (twice-differentiable) function $f$. Setting $f(x) = 1/x$ in the preceding formula and letting $X = \hat{\Psi}_{n,ii}$, it is straightforward to derive the following bias-corrected estimator for the target stepsize:

$$\hat{\gamma}^{*\,\mathrm{corr}}_{n,i} := \min\left[\bar{\gamma}, \frac{\delta_i^2}{z^2_{1-\alpha}\hat{\Psi}_{n,ii}}(1 - \nu_i)\right], \tag{17}$$

where $\nu_i$ is the variance of $\hat{\Psi}_{n,ii}$. Note that (17) differs from (16) only by the presence of the correction term $(1 - \nu_i)$. In the case of normally distributed samples $\xi_n$ (see numerical experiments in Section 5.3), we have $\nu_i = \Sigma^2_{n,ii}/2(M - 1)$. We note that in all of our experiments, we found the effect of this correction term to be negligible, although this may not always be the case.

Fourth, the parameter $\beta$ may be regarded as a “stepsize for the stepsize”. It has the overall effect of reducing the variance of the sequences $\gamma_{n,i}$.

5.3. Numerical experiments

In this section, we present numerical experiments to investigate the performance (see below) of Algorithm 1, used to adapt the stepsize $\gamma_n$ in the stochastic approximation (4) for the one-dimensional case ($q = 1$). As a point of reference, we take the fixed stepsize version of (4), that is, with $\gamma_n = \gamma$ for all $n$.

By evaluating the performance of (4) for a range of fixed stepsizes $\gamma$, we are able to determine empirically, and with full hindsight, the best performance that could have been obtained using a fixed, constant stepsize. Of course, this value cannot be known a priori (because we assume that the controller cannot perform offline experimentation or “training”, and that no prior knowledge of the regime switching process $\omega_n$ is available). In contrast to the fixed stepsize case, Algorithm 1 is designed to track the target stepsize $\gamma^*_n$ given by (14), which in turn depends on the current regime at time $n$.

Let $N$ denote the total duration of a given experiment, and let $1\{\cdot\}$ denote the indicator function. For both the adaptive and fixed stepsize algorithms, we employ the performance measure

$$V = \frac{1}{N}\sum_{n=1}^{N} 1\{\theta_n \in [\theta^*_n - \delta, \theta^*_n + \delta]\}, \tag{18}$$

that is, the proportion of time steps for which the iterates $\theta_n$ are within the pre-determined acceptable error bounds defined by $\delta$. By performing multiple replications of the given experiment, we obtain 95% confidence intervals for the proportion $V$.
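The performance measure (18) and a confidence interval over replications can be sketched as follows. The text does not state how the intervals were computed; the Student-t interval below, with the 0.975 quantile for 9 degrees of freedom (about 2.262) as the default for 10 replications, is an assumption made here, as are the function names.

```python
from statistics import mean, stdev


def performance(thetas, targets, delta):
    """Proportion of time steps with theta_n inside
    [target_n - delta, target_n + delta], as in (18)."""
    hits = sum(1 for th, tg in zip(thetas, targets) if abs(th - tg) <= delta)
    return hits / len(thetas)


def confidence_interval_95(values, t_quantile=2.262):
    """Mean +/- t * s / sqrt(R) over R replications. The default t_quantile is the
    0.975 quantile of Student's t with 9 degrees of freedom (10 replications)."""
    half_width = t_quantile * stdev(values) / len(values) ** 0.5
    centre = mean(values)
    return centre - half_width, centre + half_width


# Hypothetical four-step path against a constant target of 2.0 with tolerance 0.5.
print(performance([1.0, 2.5, 2.0, 0.0], [2.0] * 4, 0.5))
```

In the small example, two of the four iterates lie within the band $[1.5, 2.5]$, so the measure evaluates to 0.5.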

5.3.1. Deterministic regime changes

We start with the case of a given (deterministic) regime sequence, occurring over $N = 2500$ time steps. The collection of regimes is $\Omega = \{1, 2, 3, 4\}$, and we fix the regime sequence $R = \{3, 2, 1, 4, 1, 3, 2, 4, 1, 2, 1\}$, with regime changes occurring at times

$$T = \{350, 500, 600, 750, 800, 1150, 1600, 1850, 2000, 2200\}.$$

For each regime $\omega \in \Omega$, we let $F_\omega(\theta^*_\omega, \sigma^2_\omega)$ be the normal distribution with mean $\theta^*_\omega$ and variance $\sigma^2_\omega$. In particular, we set:

Regime 1: $\theta^* = 2$, $\sigma^2 = 0.5$,
Regime 2: $\theta^* = 6$, $\sigma^2 = 1$,
Regime 3: $\theta^* = 4$, $\sigma^2 = 1$,
Regime 4: $\theta^* = 0.5$, $\sigma^2 = 0.5$. (19)

Note that $R$, $T$, and the regime means and variances are not known in advance by the controller.

For all experiments using Algorithm 1, we set $M = 20$, $\bar{\gamma} = 0.8$, $\delta = 0.3$, $\alpha = 0.05$ (and hence $z_{1-\alpha} = 1.96$), $\beta = 0.3$ and initial stepsize $\gamma_1 = 0.1$.
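The deterministic scenario can be set up in a few lines. In the sketch below, the constant names, the helper `regime_path` and the random seed are hypothetical; the convention that a change time $t$ means the new regime applies from step $t + 1$ onwards is also an assumption, since the text does not specify it.

```python
import random

REGIME_SEQUENCE = [3, 2, 1, 4, 1, 3, 2, 4, 1, 2, 1]                       # R
CHANGE_TIMES = [350, 500, 600, 750, 800, 1150, 1600, 1850, 2000, 2200]    # T
N = 2500
REGIMES = {1: (2.0, 0.5), 2: (6.0, 1.0), 3: (4.0, 1.0), 4: (0.5, 0.5)}   # (mean, variance), from (19)


def regime_path(sequence, change_times, n_steps):
    """Active regime for each of the n_steps time steps. A change time t is taken
    to mean that the new regime applies from step t + 1 onwards (assumed)."""
    bounds = [0] + list(change_times) + [n_steps]
    path = []
    for regime, start, end in zip(sequence, bounds[:-1], bounds[1:]):
        path.extend([regime] * (end - start))
    return path


path = regime_path(REGIME_SEQUENCE, CHANGE_TIMES, N)
targets = [REGIMES[r][0] for r in path]
rng = random.Random(2007)   # arbitrary seed, for reproducibility only
observations = [rng.gauss(REGIMES[r][0], REGIMES[r][1] ** 0.5) for r in path]
print(len(path), path[349], path[350], path[-1])
```

The resulting `targets` and `observations` lists can then be fed to any implementation of iteration (4), with either fixed or adaptive stepsizes.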

Fig. 3 shows a sample path for $\theta_n$ (solid line), together with the bounds $\theta^*_n - \delta$ and $\theta^*_n + \delta$ (dashed lines). The figure illustrates that the variation of the iterates $\theta_n$ within the interval $[\theta^*_n - \delta, \theta^*_n + \delta]$ is effectively made as large as possible, subject to the probability constraint (12). The purpose of this is to enable rapid tracking when a regime change does occur. Importantly, this is a pre-emptive mechanism, because it maintains a perpetual state of readiness for change, instead of relying on an auxiliary change detection mechanism and subsequent reactive control.

Fig. 3. Sample path for $\theta_n$ using stepsizes generated by Algorithm 1.

Fig. 4. Sample path for stepsizes $\gamma_n$ (solid line) generated by Algorithm 1. The dotted line shows the sequence of exact target stepsizes $\gamma^*_n$.

Fig. 4 shows the corresponding sample path for the stepsizes $\gamma_n$ (solid line) produced by Algorithm 1, and also the exact target stepsizes $\gamma_n^*$ (dashed line) given by (14) for comparison. We see that Algorithm 1 is able to track the target stepsizes quite well.

The left-hand portion of Fig. 5 shows 95% confidence intervals for the performance measure $V$, for a range of fixed stepsizes $\gamma$. The right-hand portion of Fig. 5 shows a 95% confidence interval for $V$ using Algorithm 1. In each case, the confidence intervals were produced by performing 10 replications of the experiment, using the same regime sequence $R$ and change times $T$ given above.

Fig. 5. (Deterministic regime changes) Comparison of performance measure $V$ (95% confidence intervals) for fixed stepsizes versus adaptive stepsizes generated using Algorithm 1.

We see from Fig. 5 that Algorithm 1 achieves a level of performance that is as good as the best performance that could be obtained using a fixed stepsize (in this case, approximately $\gamma = 0.07$). However, Algorithm 1 does not assume any prior knowledge about the regime changes, and achieves a good performance by learning online, in real time, and without the need for offline experimentation or training.

5.3.2. A comment on normality

In the previous example, the observations $\xi_n$ used in (4) were normally distributed, that is, we set $F_n$ to be normal distributions. However, it is important to note that the limiting result (7)–(9) holds for any distribution of the samples $\xi_n$, subject to Assumption 2. Accordingly, we find that Algorithm 1 exhibits a similarly high level of performance when other distributions are employed. As an example, Fig. 6 shows the results obtained for the experiment of Section 5.3.1, where $F_n$ are uniform instead of normal distributions (the regime means and variances remain the same as those given in (19)).

5.3.3. A comment on the upper bound $\bar{\gamma}$

For the example shown in Fig. 4, the combination of regime variances (19) and choice of tolerance $\delta = 0.3$ are such that the constraint $\gamma_n^* \leq \bar{\gamma} = 0.8$ in (2) never becomes active. From (14), we see that this constraint would become active for a sufficiently large increase in $\delta$ and/or decrease of one or more of the regime variances. An example of this is shown in Fig. 7, for the case where $\delta = 1$.
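A quick numerical check illustrates when the upper bound binds. Expression (14) is not reproduced in this section, so the sketch below assumes it has the one-dimensional analogue of the form of (27), namely $\min[\bar{\gamma},\ 2\delta^2 / (\tilde{z}_{1-\alpha}^2 \sigma^2)]$, which is what a limiting variance of $\gamma\sigma^2/2$ would give. That form, and the function name, are assumptions.

```python
def model1_target_stepsize(delta, variance, z=1.96, gamma_max=0.8):
    """Return (target stepsize, bound_active) under the ASSUMED form of (14).

    Assumed form: min(gamma_max, 2 * delta**2 / (z**2 * variance)).
    """
    unconstrained = 2.0 * delta ** 2 / (z ** 2 * variance)
    return min(gamma_max, unconstrained), unconstrained > gamma_max


if __name__ == "__main__":
    for delta in (0.3, 1.0):
        for variance in (0.5, 1.0):  # the two variance levels appearing in (19)
            print(delta, variance, model1_target_stepsize(delta, variance))
```

Under this assumed form, the tolerance 0.3 gives unconstrained values of roughly 0.094 and 0.047 for the variances 0.5 and 1, well below 0.8, whereas a tolerance of 1 with variance 0.5 gives a value above 1, so the bound becomes active.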

Fig. 6. (Deterministic regime changes) Uniformly distributed samples $\xi_n$.

Fig. 7. Sample path for stepsizes $\gamma_n$ with active upper bound constraint.

5.3.4. Markov-modulated regime changes

Next, we consider the case where the regime changes are Markov-modulated. Let $L$ denote the number of regimes in the collection $\Lambda$, and let $\varepsilon \in (0, 1)$. Then the regime sequence $R$ and change times $T$ are governed by the $L \times L$ transition matrix $P$, with entries

$$p_{ij} = \begin{cases} 1 - \varepsilon, & i = j, \\ \dfrac{\varepsilon}{L - 1}, & i \neq j, \end{cases} \tag{20}$$

for $i, j = 1, \ldots, L$, where $p_{ij}$ is the probability that the regime at the next time step is Regime $j$, given that the regime at the current time step is Regime $i$. Thus, the parameter $\varepsilon$ determines the average rate of regime changes. In particular, observe that the expected number of time steps between regime changes is given by $1/\varepsilon$.
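The transition structure in (20) can be sketched in a few lines. The function names are hypothetical, and the uniform choice of initial regime is an assumption, since the text does not specify an initial distribution.

```python
import random


def transition_matrix(L, eps):
    """L x L matrix with 1 - eps on the diagonal and eps / (L - 1) elsewhere, as in (20)."""
    off_diagonal = eps / (L - 1)
    return [[1.0 - eps if i == j else off_diagonal for j in range(L)] for i in range(L)]


def simulate_regimes(L, eps, N, seed=0):
    """Sample a regime path of length N with labels 1..L.

    Assumption: the initial regime is drawn uniformly at random.
    """
    rng = random.Random(seed)
    regime = rng.randrange(L)
    path = []
    for _ in range(N):
        path.append(regime + 1)
        if rng.random() < eps:
            # Leave the current regime; each other regime is equally likely.
            other = rng.randrange(L - 1)
            regime = other if other < regime else other + 1
    return path
```

Because a change occurs at each step with probability `eps`, the time between changes is geometric with mean `1 / eps`, which a long simulated path reproduces approximately.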

We start by taking the collection of four regimes given in (19), so that $L = 4$, and set $\varepsilon = 0.005$. The experiment duration is $N = 2500$. In order to assess the performance of Algorithm 1 with respect to the modulating process, we generate 100 pairs $(R, T)$ according to (20), and for each pair, we perform 10 replications of the algorithm. We use the latter to generate a point estimate for $V$, and the former to construct a 95% confidence interval for $V$ based on the sample variance of the point estimate.

Fig. 8. (Markov-modulated regime changes) Comparison of performance measure $V$ (95% confidence intervals) for fixed stepsizes versus adaptive stepsizes generated using Algorithm 1, with $\varepsilon = 0.005$ and high variance regime collection.
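One reading of this two-level scheme is sketched below: the replications within each pair $(R, T)$ are averaged into a point estimate, and the spread of those point estimates across pairs gives the confidence interval. The function name, the data layout and the normal quantile 1.96 are assumptions.

```python
import math


def two_level_ci95(results):
    """results[p][r] holds V for regime pair p and algorithm replication r.

    Returns (overall mean, (lower, upper)), where the interval is based on the
    sample variance of the per-pair point estimates (an assumed construction).
    """
    point_estimates = [sum(reps) / len(reps) for reps in results]
    m = len(point_estimates)
    mean = sum(point_estimates) / m
    var = sum((v - mean) ** 2 for v in point_estimates) / (m - 1)
    half_width = 1.96 * math.sqrt(var / m)
    return mean, (mean - half_width, mean + half_width)
```

In the experiment described above, `results` would have 100 rows of 10 values each.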

Fig. 8 compares the performance of Algorithm 1 with a range of fixed stepsizes. We see from Fig. 8 that the performance of Algorithm 1 is close to the best performance that could have been achieved using a fixed stepsize. Observe that for this combination of $\varepsilon$ and regime collection, the best fixed stepsize with respect to the performance measure $V$ is approximately $\gamma = 0.07$.

Next, we increase the expected frequency of regime changes by setting $\varepsilon = 0.02$, and we replace (19) with the following collection of lower-variance regimes (the mean values remain identical):

Regime 1: $\theta^* = 2$, $\sigma^2 = 0.09$,

Regime 2: $\theta^* = 6$, $\sigma^2 = 0.2$,

Regime 3: $\theta^* = 4$, $\sigma^2 = 0.1$,

Regime 4: $\theta^* = 0.5$, $\sigma^2 = 0.08$. (21)

The performance of the adaptive stepsize algorithm versus a range of fixed stepsizes is shown in Fig. 9. Once more, we find that the performance of the adaptive stepsize algorithm is close to that obtained using the best fixed stepsize, which in this case is approximately $\gamma = 0.3$.

Finally, we set $\varepsilon = 0.0125$, which is the midpoint between the previous values 0.005 and 0.02, and we take a collection of mixed high and low variance regimes:

Regime 1: $\theta^* = 2$, $\sigma^2 = 0.5$,

Regime 2: $\theta^* = 6$, $\sigma^2 = 1$,

Regime 3: $\theta^* = 4$, $\sigma^2 = 0.1$,

Regime 4: $\theta^* = 0.5$, $\sigma^2 = 0.08$. (22)

The performance results are shown in Fig. 10.

Fig. 9. (Markov-modulated regime changes) Comparison of performance measure $V$ (95% confidence intervals) for fixed stepsizes versus adaptive stepsizes generated using Algorithm 1, with $\varepsilon = 0.02$ and a low variance regime collection.

Fig. 10. (Markov-modulated regime changes) Comparison of performance measure $V$ (95% confidence intervals) for fixed stepsizes versus adaptive stepsizes generated using Algorithm 1, with $\varepsilon = 0.0125$ and mixed high and low variance regime collection.

Comparing Figs. 8–10, we see that the adaptive stepsize algorithm (Algorithm 1) is robust, in the sense that it is able to automatically achieve a performance that is close to the best performance that could have been achieved using a fixed stepsize. Note that the best fixed stepsize ranges from 0.07 to 0.3, depending on the statistics of the regimes and the regime switching process. However, it is not possible to identify the best fixed stepsize in advance, without prior knowledge of these statistics. In particular, the best fixed stepsizes shown in each of Figs. 8–10 are identified in hindsight, by performing many replications of the experiment for each fixed $\gamma$, thus rendering the selection of an optimal fixed stepsize effectively an off-line procedure, which is made possible only by many “training” examples. On the other hand, the adaptive stepsize algorithm achieves a good performance using only real-time information, and with no prior training period.

Fig. 11. Sample path for $\theta_n$.

We also note that while, for example, the fixed stepsize $\gamma = 0.07$ is best for the experiment of Fig. 8, this small stepsize yields poor performance for the experiment shown in Fig. 9, and vice versa. Thus, using Algorithm 1 provides a degree of robustness that cannot be achieved by a “set-and-forget” approach with a fixed stepsize. In addition, the performance of a given fixed stepsize is also heavily dependent on the tolerance parameter $\delta$, which is automatically accounted for by Algorithm 1 via the constraint (13).

5.3.5. Non-stationary modulating process

As we noted in Section 2, our adaptive stepsize scheme is not based on any assumptions about the process which governs regime changes, other than Assumption 1 (which is a very weak assumption). In particular, we noted that we do not even require that the modulating process be stationary. To illustrate the robustness of Algorithm 1, we let the modulating parameter $\varepsilon$ in (20) vary with time, thus giving rise to a non-stationary Markov-modulated regime switching process.

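Specifically, we let

$$\varepsilon_n = \tfrac{1}{2}(\varepsilon_{\min} + \varepsilon_{\max}) + \tfrac{1}{2}(\varepsilon_{\min} - \varepsilon_{\max}) \sin\left(\frac{4\pi n}{N}\right), \tag{23}$$

which describes a sinusoidal time variation with minimum and maximum values $\varepsilon_{\min}$ and $\varepsilon_{\max}$, respectively, with two cycles over the total experiment duration $N$.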
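As a sanity check on (23), the short sketch below evaluates the time-varying switching parameter and confirms that it stays between its stated minimum and maximum. The function name is hypothetical, and the numerical values in the usage note are merely example inputs.

```python
import math


def switching_rate(n, N, eps_min, eps_max):
    """Time-varying switching parameter of (23): two sinusoidal cycles over n = 1..N."""
    mid = 0.5 * (eps_min + eps_max)
    amplitude = 0.5 * (eps_min - eps_max)  # negative, so the first extremum is the minimum
    return mid + amplitude * math.sin(4.0 * math.pi * n / N)
```

For example, with `N = 5000`, `eps_min = 0.005` and `eps_max = 0.03`, the value at `n = 625` is the minimum and the value at `n = 1875` is the maximum, since the sine term equals 1 and -1 at those points.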

For this experiment, we set $\varepsilon_{\min} = 0.005$, $\varepsilon_{\max} = 0.03$, and $N = 5000$. We create an augmented collection of regimes by combining the four regimes given in (21) with the additional four given in (22), and relabelling them $\Lambda = \{1, 2, 3, 4, 5, 6, 7, 8\}$, accordingly. Thus, we have a total of $L = 8$ regimes.

Fig. 12. (Non-stationary modulating process) Comparison of performance measure $V$ (95% confidence intervals) for fixed stepsizes versus adaptive stepsizes generated using Algorithm 1. The regime-switching parameter $\varepsilon_n$ is given by (23).

Fig. 11 shows a typical sample path for the iterates $\theta_n$. The periodic variation in the frequency of regime changes due to the time variation of $\varepsilon_n$ can be clearly seen in the figure. Fig. 12 shows the performance of Algorithm 1 compared with fixed stepsizes. As before, the confidence intervals were obtained by performing 100 replications of the pair $(R, T)$ and 10 replications of the algorithm for each pair. Once again, we see that the performance of the adaptive stepsize algorithm is close to the best that could have been achieved (with full knowledge, in hindsight) using a fixed stepsize. Of course, Algorithm 1 achieves this competitive level of performance without any prior knowledge of the process (23) or the regime parameters.

6. Model 2

We now take the case where the user wishes to identify the largest component of $\theta_n^*$. An example of this scenario is where each element $\theta_{n,i}^*$, $i = 1, \ldots, q$, represents the expected payoff or reward associated with a decision $i$, and the user wishes to identify the decision which achieves the maximum expected reward. In this context, the user is not directly concerned with obtaining precise estimates of $\theta_n^*$; instead, the user’s goal is to be able to resolve the entries of the estimate $\theta_n$ so as to identify the optimal decision with a desired level of confidence.

Let $\kappa_n^* = \arg\max_{i=1,\ldots,q} \theta_{n,i}^*$ denote the true optimal decision, and let $\tilde{\kappa}_n = \arg\max_{i=1,\ldots,q} \theta_{n,i}$ denote the current estimate of the optimal decision. Thus, for Model 2, we shall say that an error occurs at iteration $n$ if $\tilde{\kappa}_n \neq \kappa_n^*$. Hence the probability constraints in (2) take the specific form

$$P(\tilde{\kappa}_n \neq \kappa_n^*) \leq \alpha. \tag{24}$$

Observe that unlike for Model 1, the “error” event in Model 2 involves all components $i = 1, \ldots, q$ simultaneously. We therefore have the same error criterion for each component. Furthermore, since Model 2 is designed for the direct comparison of “like” quantities (that is, sharing the same units), we take the approach of assigning a common stepsize $\gamma_{n,i} = \gamma_n$ for all components $i$.

6.1. Two dimensions

Consider the two-dimensional case, where $\theta_n^* = (\theta_{n,1}^*, \theta_{n,2}^*)$ and $\theta_n = (\theta_{n,1}, \theta_{n,2})$. Define the sequence of random variables

$$D_n = \theta_{n,1} - \theta_{n,2},$$

and let $d_n = E[D_n]$. Observe that within a given regime,

$$d_n = \theta_{n,1}^* - \theta_{n,2}^*.$$

It follows from (7) and (11) that the variance of $D_n$ in the limit $\gamma \to 0$ is given by $\gamma v_n$, where

$$v_n = \tfrac{1}{2}\left(\Sigma_{n,11} + \Sigma_{n,22} - 2\Sigma_{n,12}\right). \tag{25}$$

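The probability of an error in identifying the correct decision can therefore be approximated as follows. Suppose that $\theta_{n,1}^* > \theta_{n,2}^*$, such that decision 1 is optimal within the current regime at time $n$. Then

$$\begin{aligned} P(\kappa_n^* \neq \tilde{\kappa}_n) &= P(\theta_{n,1} < \theta_{n,2}) \\ &= P(D_n < 0) \\ &= P\left(Z_n < \frac{-d_n}{\sqrt{\gamma v_n}}\right) = P\left(Z_n > \frac{d_n}{\sqrt{\gamma v_n}}\right), \end{aligned}$$

where

$$Z_n = \frac{D_n - d_n}{\sqrt{\gamma v_n}}$$

has the distribution $N(0, 1)$. Similarly, if $\theta_{n,1}^* < \theta_{n,2}^*$, such that decision 2 is optimal at time $n$, then $d_n < 0$ and

$$P(\kappa_n^* \neq \tilde{\kappa}_n) = P(\theta_{n,1} > \theta_{n,2}) = P\left(Z_n > \frac{-d_n}{\sqrt{\gamma v_n}}\right).$$

Thus, constraint (24) takes the specific form

$$\frac{|d_n|}{\sqrt{\gamma v_n}} \geq \tilde{z}_{1-\alpha}, \tag{26}$$

where $\tilde{z}_{1-\alpha}$ is the one-sided $100(1 - \alpha)$-percentile for the standard normal distribution. Thus, an approximate solution to problem (2) is given by the target stepsize

$$\gamma_n^* = \min\left[\bar{\gamma},\ \frac{d_n^2}{\tilde{z}_{1-\alpha}^2 v_n}\right]. \tag{27}$$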
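The target stepsize (27) is straightforward to evaluate once the mean difference and the variance–covariance entries are known. The sketch below is illustrative only: the function name is hypothetical and the numerical inputs in the usage note are example values chosen to show both the unconstrained and the capped case.

```python
def target_stepsize_2d(d, sigma, z, gamma_max):
    """Evaluate (27): min(gamma_max, d^2 / (z^2 * v)), with v taken from (25).

    d     : difference of the two target components
    sigma : 2x2 variance-covariance matrix as nested lists
    z     : one-sided normal percentile
    """
    v = 0.5 * (sigma[0][0] + sigma[1][1] - 2.0 * sigma[0][1])
    return min(gamma_max, d * d / (z * z * v))
```

For instance, with `d = 2`, `sigma = [[9, 1], [1, 9]]`, `z = 1.65` and `gamma_max = 0.8`, we get `v = 8` and a target stepsize of about 0.18; a well-separated, low-variance case such as `d = 20` with `sigma = [[4, 1], [1, 4]]` is capped at 0.8.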

Our adaptive stepsize scheme must estimate $v_n$, by estimating the variance and covariance terms on the right-hand side of expression (25), that is, the entries of $\Sigma_n$. As before, we estimate these quantities using the $M$ most-recent samples $\xi_{n-M+1,i}, \ldots, \xi_{n,i}$, $i = 1, 2$. The following algorithm implements this scheme. A summary of the algorithm parameters is given in Table 2.

Table 2
Summary of parameters for Algorithm 2

Parameter        Range             Description
$M$              $\mathbb{Z}^+$    Length of moving window
$\bar{\gamma}$   $(0, 1]$          Maximum stepsize value
$\beta$          $(0, 1]$          Smoothing parameter for stepsize update
$\alpha$         $(0, 1)$          Probability of exceeding acceptable error

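Algorithm 2. Initialise $M \in \mathbb{Z}^+$, $\bar{\gamma} \in (0, 1]$, $\alpha \in (0, 1]$ (and hence $\tilde{z}_{1-\alpha}$), $\beta \in (0, 1]$. Set $n := 1$, and initial stepsize $\gamma_1 \in (0, \bar{\gamma}]$.

(1) If $n$ is an integer multiple of $M$, perform Steps 2 and 3; otherwise, set $\gamma_{n+1} := \gamma_n$ and proceed directly to Step 4.

(2) Calculate the sample variances $\hat{\Sigma}_{n,11}$, $\hat{\Sigma}_{n,22}$, the sample covariance $\hat{\Sigma}_{n,12}$, the sample squared difference $\hat{d}_n^2$, and

$$\hat{v}_n = \tfrac{1}{2}\left(\hat{\Sigma}_{n,11} + \hat{\Sigma}_{n,22} - 2\hat{\Sigma}_{n,12}\right).$$

(3) Calculate

$$\hat{\gamma}_n^* := \min\left[\bar{\gamma},\ \frac{\hat{d}_n^2}{\tilde{z}_{1-\alpha}^2 \hat{v}_n}\right]$$

and update

$$\gamma_{n+1} := (1 - \beta)\gamma_n + \beta\hat{\gamma}_n^*.$$

(4) Set $n := n + 1$, repeat from Step 1.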
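The steps above can be sketched as a small class that consumes one two-dimensional sample per call and returns the stepsize for the next iteration. This is a minimal sketch under stated assumptions, not the authors' implementation: the class and attribute names are hypothetical, the sample squared difference is taken to be the squared difference of the two window means, unbiased (divide by $M - 1$) estimators are used, and a zero variance estimate falls back to the maximum stepsize.

```python
from collections import deque


class Algorithm2Stepsize:
    """Sketch of the stepsize update of Algorithm 2 (two-dimensional case)."""

    def __init__(self, M=30, gamma_max=0.8, beta=0.3, z=1.65, gamma1=0.1):
        if M < 2:
            raise ValueError("M must be at least 2 to form sample variances")
        self.M, self.gamma_max, self.beta, self.z = M, gamma_max, beta, z
        self.gamma = gamma1
        self.n = 0
        self.window = deque(maxlen=M)  # the M most recent samples

    def update(self, xi):
        """xi = (xi_1, xi_2) is the newest sample; returns the next stepsize."""
        self.window.append(xi)
        self.n += 1
        if self.n % self.M == 0:  # Step 1: update only every M iterations
            x1 = [s[0] for s in self.window]
            x2 = [s[1] for s in self.window]
            m1, m2 = sum(x1) / self.M, sum(x2) / self.M
            # Step 2: sample variances, covariance and v-hat
            s11 = sum((a - m1) ** 2 for a in x1) / (self.M - 1)
            s22 = sum((b - m2) ** 2 for b in x2) / (self.M - 1)
            s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (self.M - 1)
            v_hat = 0.5 * (s11 + s22 - 2.0 * s12)
            d2_hat = (m1 - m2) ** 2  # assumption: squared difference of window means
            # Step 3: estimated target stepsize and smoothed update
            if v_hat > 0.0:
                target = min(self.gamma_max, d2_hat / (self.z ** 2 * v_hat))
            else:
                target = self.gamma_max  # assumption: degenerate window
            self.gamma = (1.0 - self.beta) * self.gamma + self.beta * target
        return self.gamma
```

Between window boundaries the stepsize is simply carried forward, which mirrors the "otherwise" branch of Step 1.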

6.2. Numerical experiments

We consider once again the case of Markov-modulated regimes, which evolve according to the transition matrix defined in (20). In the following experiments, we take the collection of regimes $\Lambda = \{1, 2, 3, 4\}$, so that $L = 4$ in (20). For each regime $\lambda \in \Lambda$, we let $F_\lambda(\theta_\lambda^*, \Sigma_\lambda)$ be the joint normal distribution with mean $\theta_\lambda^*$ and variance–covariance matrix $\Sigma_\lambda$. The experiment duration is $N = 2000$. For all experiments using Algorithm 2, we set $M = 30$, $\bar{\gamma} = 0.8$, $\alpha = 0.05$ (and hence $\tilde{z}_{1-\alpha} = 1.65$), $\beta = 0.3$ and initial stepsize $\gamma_1 = 0.1$.

For both the adaptive and fixed stepsize algorithms, we employ the performance measure

$$V' = \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}\{\tilde{\kappa}_n \neq \kappa_n^*\}, \tag{28}$$

that is, the proportion of time steps for which the iterates $\theta_n$ are in the “incorrect order” with respect to the true values $\theta_n^*$.

Fig. 13. (Markov-modulated regime changes) Comparison of performance measure $V'$ (95% confidence intervals) for fixed stepsizes versus adaptive stepsizes generated using Algorithm 2, with $\varepsilon = 0.005$ and high variance regime collection.

As before, we obtain 95% confidence intervals for $V'$ by performing 100 replications of the regime sequence $(R, T)$, and 10 replications of the algorithm for each regime sequence.

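For the regime switching parameter in (20), we first set $\varepsilon = 0.005$, and employ a high variance regime collection with:

Regime 1: $\theta^* = (10, -10)$, $\Sigma = \begin{pmatrix} 8000 & 0 \\ 0 & 4000 \end{pmatrix}$,

Regime 2: $\theta^* = (-11, 11)$, $\Sigma = \begin{pmatrix} 5000 & -10 \\ -10 & 5000 \end{pmatrix}$,

Regime 3: $\theta^* = (1, -1)$, $\Sigma = \begin{pmatrix} 9 & 1 \\ 1 & 9 \end{pmatrix}$,

Regime 4: $\theta^* = (-20, -10)$, $\Sigma = \begin{pmatrix} 2000 & 0 \\ 0 & 2000 \end{pmatrix}$.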

Fig. 13 compares the performance of the adaptive stepsize algorithm (Algorithm 2) with a range of fixed stepsizes. Note that for this combination of $\varepsilon$ and regimes with high variance, the best fixed stepsize is approximately 0.03.

Next, we set $\varepsilon = 0.02$, and employ a low variance regime collection with:

Regime 1: $\theta^* = (10, -10)$, $\Sigma = \begin{pmatrix} 1000 & 0 \\ 0 & 1000 \end{pmatrix}$,

Regime 2: $\theta^* = (-11, 11)$, $\Sigma = \begin{pmatrix} 500 & 0 \\ 0 & 500 \end{pmatrix}$,

Regime 3: $\theta^* = (1, -1)$, $\Sigma = \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix}$,

Regime 4: $\theta^* = (-20, -10)$, $\Sigma = \begin{pmatrix} 200 & -10 \\ -10 & 200 \end{pmatrix}$.

Fig. 14. (Markov-modulated regime changes) Comparison of performance measure $V'$ (95% confidence intervals) for fixed stepsizes versus adaptive stepsizes generated using Algorithm 2, with $\varepsilon = 0.02$ and low variance regime collection.

Fig. 14 shows the performance of the fixed versus adaptive stepsize algorithm. Observe that for this combination of $\varepsilon$ and regimes with low variance, the best fixed stepsize is approximately 0.15.

Finally, we set $\varepsilon = 0.0125$, which is the midpoint between the previous values $\varepsilon = 0.005$ and $\varepsilon = 0.02$, and employ a mixed high and low variance regime collection given by

Regime 1: $\theta^* = (10, -10)$, $\Sigma = \begin{pmatrix} 8000 & 0 \\ 0 & 4000 \end{pmatrix}$,

Regime 2: $\theta^* = (-11, 11)$, $\Sigma = \begin{pmatrix} 5000 & -10 \\ -10 & 5000 \end{pmatrix}$,

Regime 3: $\theta^* = (1, -1)$, $\Sigma = \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix}$,

Regime 4: $\theta^* = (-20, -10)$, $\Sigma = \begin{pmatrix} 200 & -10 \\ -10 & 200 \end{pmatrix}$.

Fig. 15 shows the corresponding performance comparison. We see that in this case, the best fixed stepsize is approximately 0.1.

Comparing the results in Figs. 13–15, we see that the adaptive stepsize algorithm automatically achieves a good performance compared with the best performance that can be achieved using a fixed stepsize, and we note that it does so without any prior knowledge of the regime switching process. In contrast, the best fixed stepsizes identified in Figs. 13–15 are obtained in hindsight, having performed many replications of the experiment. Finally, we observe that the best fixed stepsize for the experiment in Fig. 13 is very different to the best fixed stepsize for the experiment in Fig. 14, demonstrating that the adaptive stepsize algorithm is robust in the absence of a priori knowledge about the underlying regime switching process.

Fig. 15. (Markov-modulated regime changes) Comparison of performance measure $V'$ (95% confidence intervals) for fixed stepsizes versus adaptive stepsizes generated using Algorithm 2, with $\varepsilon = 0.0125$ and mixed high and low variance regime collection.

7. Extensions

Our adaptive stepsize approach is not restricted to the least mean squares case corresponding to the choice of function $g$ given in (3). The weak convergence results presented in Section 3.1 are general, and hold for any choice of $g$ that satisfies the tightness and stability assumptions (A1.1)–(A1.7) in Chapter 10 of Kushner and Yin (2003).

For example, the function $g$ can be chosen to yield a descent method for unconstrained optimisation, or a Lagrange multiplier method for constrained optimisation (Bertsekas, 1999). Other examples stem from reinforcement learning for solving Markov decision problems (MDPs) (Puterman, 1994). In this context, $g(\theta) = h(\theta) - \theta$, where $h(\theta)$ is a (vector-valued) dynamic programming operator, and $\theta$ corresponds to a vector of value functions. For example, the well-known Q-learning algorithm (Watkins & Dayan, 1992) is based on a stochastic approximation for solving Bellman’s equation via value iteration. Under a broad range of conditions, the value iteration operator is a contraction mapping (Bertsekas & Tsitsiklis, 1996), which means that $A = -\nabla_\theta g(\theta^*) = I - \nabla_\theta h(\theta^*)$ is a Hurwitz matrix (which depends on the parameters of the MDP and $\theta^*$). In particular, the conditions in Chapter 10 of Kushner and Yin (2003) can be shown to be satisfied under mild conditions.

However, it should be noted that for extensions such as the ones described above, the form of $g$ is generally such that Eq. (8) cannot be used to derive a closed form solution for the matrix that it defines. In this case, our adaptive stepsize approach can still be applied by modifying Algorithm 1 so that this matrix is estimated directly from the iterates $\theta_n$. Thus, the intermediate estimation of the variance–covariance matrix $\Sigma_n$ is effectively bypassed. Similarly, Algorithm 2 can be modified so that $v_n$ is estimated directly (this is described by the authors in Levy, Vázquez-Abad, and Costa (2006)).

Direct estimation does not exploit the information available in expression (8), and is therefore, in general, less accurate. However, direct estimation is a feasible and effective option in the absence of efficient methods for solving (8). The reader is referred to Levy et al. (2006) for details, given in the context of online Q-learning for solving MDPs. In this setting, each regime is associated with a distinct set of MDP parameters comprising the state transition probabilities and rewards, which are not known a priori. Thus, each regime is associated with a different optimal policy, which the controller wishes to track in real time. The iterates $\theta_n$ correspond to the “Q-values”, which represent the estimates of the value associated with the state-action pairs of the MDP. In this scenario, the use of Model 2 (see Section 6) is most natural, since for each regime, the controller is interested in rapidly identifying the actions at each state which yield the maximum (or minimum) expected reward.

8. Conclusions and future work

We have presented a new approach for stepsize selection that is appropriate for tracking in a regime-switching environment with abrupt changes. Our approach is based on the theory of weak convergence for stochastic approximation algorithms. The main innovation lies in the use of this theory to formulate the criterion (2) for stepsize selection. In particular, our approach is pre-emptive rather than reactive, and does not require an auxiliary change detection algorithm, or prior knowledge of the statistics of the regime switching process.

Furthermore, because Algorithms 1 and 2 do not attempt to “learn” the statistics of the process governing regime changes, they are suitable for relatively short-lived (as well as long-run average) online tracking problems. For instance, the example shown in Fig. 3 involves an experiment of duration 2500 time steps, during which Regime 1 occurs four times, Regime 2 occurs three times, and Regimes 3 and 4 each occur just twice. For this kind of short, once-off realisation, an adaptive stepsize algorithm which attempts to learn (and possibly then track) the long-run statistics of the regime modulating process would be of little use.

A direction for further research is to extend Model 2 to higher-dimensional cases ($q > 2$); a possible approach is to perform pairwise comparisons between the components $\theta_{n,i}$, $i = 1, \ldots, q$. More broadly, we expect that our method can be applied to a range of real-time stochastic search algorithms, when abrupt regime changes are present. Indeed, alternative types of error (beyond Models 1 and 2) can be investigated, according to the requirements dictated by the application at hand.

Acknowledgement

Andre Costa acknowledges the support of the Australian Research Council Centre of Excellence for Mathematics and Statistics of Complex Systems.

References

Basseville, M., & Benveniste, A. (Eds.). (1986). Detection of abrupt changes in signals and dynamical systems. Berlin: Springer.

Benveniste, A., Metivier, M., & Priouret, P. (1990). Adaptive algorithms and stochastic approximation. New York, Berlin: Springer.

Bertsekas, D. P. (1999). Nonlinear programming. Belmont, MA: Athena Scientific.

Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont, MA: Athena Scientific.

Farhang-Boroujeny, B. (1998). Adaptive filters: Theory and applications. Chichester, UK: Wiley.

Krishnamurthy, V., Yin, G., & Singh, S. (2001). Adaptive step-size algorithms for blind interference suppression in DS/CDMA systems. IEEE Transactions on Signal Processing, 49(1), 190–201.

Kushner, H. J. (1995). Analysis of adaptive step-size SA algorithms for parameter tracking. IEEE Transactions on Automatic Control, 40(8), 1403–1410.

Kushner, H. J., & Yin, G. (2003). Stochastic approximation and recursive algorithms and applications. Berlin: Springer.

Levy, K., Vázquez-Abad, F. J., & Costa, A. (2006). Adaptive stepsize selection for online Q-learning in a non-stationary environment. IEEE Proceedings of the eighth workshop on discrete event systems (pp. 372–377).

Puterman, M. (1994). Markov decision processes. New York: Wiley.

Watkins, C., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279–292.

Yin, G., Krishnamurthy, V., & Ion, C. (2004). Regime switching stochastic approximation algorithms with application to adaptive discrete stochastic optimization. SIAM Journal on Optimization, 14(4), 1187–1215.

Yin, G., & Zhang, Q. (2005). Discrete-time Markov chains: Two-time-scale methods and applications. Berlin: Springer.

Yousef, N. R., & Sayed, A. H. (2001). A unified approach to the steady-state and tracking analyses of adaptive filters. IEEE Transactions on Signal Processing, 49(2), 314–324.

Andre Costa obtained a Ph.D. in Applied Mathematics from the University of Adelaide in 2003. During 2003, he held a teaching position in the School of Applied Mathematics at the University of Adelaide. From 2004 to 2007 he held a postdoctoral research fellowship in the Australian Research Council Centre of Excellence for Mathematics and Statistics of Complex Systems at the University of Melbourne. His research interests include deterministic and stochastic optimisation, learning algorithms and applied probability. He is currently working in the finance industry.

Felisa J. Vázquez-Abad obtained a B.Sc. in Physics in 1983 and an M.Sc. in Statistics and Operations Research from the Universidad Nacional Autónoma de México. In 1989 she obtained a Ph.D. in Applied Mathematics from Brown University. She spent four years doing postdoctoral research at Brown and at the INRS-Telecommunications in Montreal, Que. She became a professor at the University of Montreal, Canada in 1993, where she remained until 2004, when she moved to the University of Melbourne, in Australia. In 2000, she was a recipient of the Jacob Wolfowitz award for advances in the mathematical and management sciences. Her interests focus on the optimisation of complex systems under uncertainty, primarily to build efficient self-regulated learning systems. She has applied novel techniques in telecommunications, transportation, finance and insurance, and she is interested in real-life problems. She co-authored a US patent for an optical network switch and has been a research consultant to the Melbourne Airport. She has participated in NSERC Selection Committees and has been Associate Editor for IEEE Transactions on Automatic Control, Management Science, and Operations Research Letters. She is Area Editor of the ACM Transactions on Computer Modeling and Simulation.