

    Lectures on Stochastic Stability

    Sergey FOSS

    Heriot-Watt University

    Lecture 4

    Coupling and Harris Processes

    1 A simple example

Consider a Markov chain $X_n$ on a countable state space $S$ with transition probabilities $p_{i,j}$ such that: there exists a state $a \in S$ and an $\varepsilon > 0$ with the property

$p_{i,a} \ge \varepsilon > 0, \quad \text{for all } i \in S.$

We will show that there is a unique stationary distribution $\pi$ and that

$\sum_{j \in S} |P(X_n = j) - \pi(j)| \le 2(1-\varepsilon)^n,$

regardless of the initial state $X_0$. We give two proofs.

Proof 1 (analytic). Think of the one-step transition matrix $P = [p_{i,j}]$ as a linear operator acting on $\mathbb{R}^S$. Equip $\mathbb{R}^S$ with the norm $\|x\| := \sum_{j \in S} |x_j|$. Stroock (2005) [pp. 28-29] proves that, for any $\mu \in \mathbb{R}^S$ such that $\sum_{i \in S} \mu_i = 0$, we have

$\|\mu P\| \le (1-\varepsilon)\,\|\mu\|.$

He then claims that this implies that

$\|\mu P^n\| \le (1-\varepsilon)^n \|\mu\|, \quad n \in \mathbb{N},$

and uses this to show that, for any $\mu \in \mathbb{R}^S$ with $\mu_i \ge 0$ for all $i \in S$ and $\sum_{i \in S} \mu_i = 1$, it holds that $\|\mu P^n - \mu P^m\| \le 2(1-\varepsilon)^m$, for $m \le n$.

Proof 2 (probabilistic). Consider the following experiment. Suppose the current state is $i$. Toss a coin with $P(\text{heads}) = \varepsilon$. If heads shows up, then move to state $a$. If tails shows up, then move to state $j$ with probability

$\tilde p_{i,j} = \frac{p_{i,j} - \varepsilon\,\delta_{a,j}}{1-\varepsilon}.$


(That this IS a valid probability is a consequence of the assumption!) In this manner, we obtain a process that has precisely the transition probabilities $p_{i,j}$. Note that state $a$ will be visited either because of heads in a coin toss or because it was chosen by the alternative transition probability $\tilde p$. So state $a$ will be visited at least as many times as the number of heads in the coin tosses. This means that state $a$ is positive recurrent, and so a stationary probability $\pi$ exists. We will show that it is unique and that the distribution of the chain converges to it. To do this, consider two chains $X, X'$, both with transition probabilities $p_{i,j}$, and realise them as follows. The first one starts with $X_0$ distributed according to an arbitrary $\mu$. The second one starts with $X'_0$ distributed according to $\pi$. Now do this: use the same coin for both. So, if heads shows up, then move both chains to $a$. If tails shows up, then realise each one according to $\tilde p$, independently. Repeat this at the next step, by tossing a new coin, independently of the past; once the two chains occupy the same state, we let them move together (each chain still has the correct transition probabilities). Thus, as long as heads have not come up yet, the chains are moving independently. Of course, sooner or later, heads will show up, and the chains will be the same thereafter. Let $T$ be the first time at which heads shows up. We have:

$P(X_n \in B) = P(X_n \in B,\ T > n) + P(X_n \in B,\ T \le n)$
$= P(X_n \in B,\ T > n) + P(X'_n \in B,\ T \le n)$
$\le P(T > n) + P(X'_n \in B) = P(T > n) + \pi(B).$

    Similarly,

$\pi(B) = P(X'_n \in B) = P(X'_n \in B,\ T > n) + P(X'_n \in B,\ T \le n)$
$= P(X'_n \in B,\ T > n) + P(X_n \in B,\ T \le n)$
$\le P(T > n) + P(X_n \in B).$

Hence

$|P(X_n \in B) - \pi(B)| \le P(T > n) = (1-\varepsilon)^n.$

Finally, check that $\sup_{B \subseteq S} |P(X_n \in B) - \pi(B)| = \frac{1}{2}\sum_{i \in S} |P(X_n = i) - \pi(i)|$.
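To make the probabilistic proof concrete, here is a minimal simulation sketch in Python, with a hypothetical 3-state kernel satisfying $p_{i,a} \ge \varepsilon$ for $a = 0$, $\varepsilon = 0.3$ (the names P, Pt, coupled_step are ours, not part of the lecture). It builds the residual kernel $\tilde p$, runs the two chains with a shared $\varepsilon$-coin, and compares the empirical probability that the chains have not yet met with the bound $(1-\varepsilon)^n$.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state kernel: every row puts mass >= eps on state a = 0.
P = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.4, 0.2],
              [0.3, 0.1, 0.6]])
a, eps = 0, 0.3

# Residual kernel p~_{i,j} = (p_{i,j} - eps*delta_{a,j}) / (1 - eps);
# rows are valid distributions precisely because p_{i,a} >= eps.
Pt = P.copy()
Pt[:, a] -= eps
Pt /= 1 - eps

def coupled_step(x, y):
    """One step of the coupling: a shared eps-coin; on heads both chains
    jump to a; on tails they move according to p~ (independently while
    apart, together once they have met, as in the proof)."""
    if rng.random() < eps:
        return a, a
    if x == y:
        z = rng.choice(3, p=Pt[x])
        return z, z
    return rng.choice(3, p=Pt[x]), rng.choice(3, p=Pt[y])

# P(X_n != Y_n) <= P(T > n) = (1 - eps)^n dominates the TV distance.
n, trials, apart = 10, 20000, 0
for _ in range(trials):
    x, y = 1, 2
    for _ in range(n):
        x, y = coupled_step(x, y)
    apart += (x != y)
print(apart / trials, "<=", (1 - eps) ** n)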

    2 Coupling

In the introductory lecture, we have already discussed various concepts of coupling, but in a static way. Now we will do it dynamically.

Consider more examples of coupling:

Ex. 1: Skorokhod embedding. Let $X$ be a zero-mean r.v. with distribution $F$. Then we can realise $X$ on a probability space supporting at least a standard Brownian motion $(W_t)$ by means of the Skorokhod embedding. This is another instance of coupling. For example, if $F = p\,\delta_a + q\,\delta_b$, where $p + q = 1$, $pa + qb = 0$ (so $a < 0 < b$), we let $T = \inf\{t \ge 0 : W_t \notin (a, b)\}$ and let $X = W_T$. Then $X \sim F$.
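A quick numerical illustration of this embedding (a sketch, not part of the lecture; the values $a = -1$, $b = 2$ and the step size dt are our hypothetical choices): discretise the Brownian motion, stop it on leaving $(a, b)$, and check that the exit value has the two-point law $F$.

import numpy as np

rng = np.random.default_rng(1)

a, b = -1.0, 2.0               # hypothetical two-point support, a < 0 < b
p = b / (b - a)                # P(BM from 0 exits (a,b) at a)

def skorokhod_sample(dt=1e-3):
    """Discretised standard BM run from 0 until it leaves (a, b);
    returns the (approximate) exit value W_T."""
    w = 0.0
    while a < w < b:
        w += np.sqrt(dt) * rng.standard_normal()
    return a if w <= a else b  # snap the small overshoot to the boundary

samples = np.array([skorokhod_sample() for _ in range(1000)])
print("empirical P(X=a):", np.mean(samples == a), " theory:", p)
print("empirical mean:", samples.mean())  # should be close to 0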

    2

  • 8/12/2019 Foss Lecture4

    3/12

Ex. 2: Dynkin's trick. You shuffle well a deck of $n$ cards. Let their values, after the shuffle, be $x(1), x(2), \ldots, x(n)$. (If we are talking about playing cards with J, Q, K included, assign some specific numbers to them, e.g. J=1, Q=2, K=3.) Experiment (performed by you, in secrecy): starting from one of the first 10 cards at random, say card $N$ (with $P(N = i) = 1/10$, $i = 1, \ldots, 10$), you notice the value $x(N)$ and then move forward to the card with index $N + x(N)$, notice its value $x(N + x(N))$, move forward to the card with index $N + x(N + x(N))$, etc. So, at the $i$-th step, $i \ge 1$, if you are at position $N_i$, at a card with value $x(N_i)$, you move to position $N_{i+1} = N_i + x(N_i)$ and notice the value $x(N_{i+1})$. (We take $N_0 = N$.) The process stops at the step $I$ for which $N_I + x(N_I) > n$. You notice $x(N_I)$. The value $x(N_I)$ can be guessed with probability $1 - c^n$, for some $c < 1$. The coupling here is this: one process starts from one of the first 10 cards at random; another process starts from a specific card, say the first one. Then, if $n$ is large, the two processes will meet with high probability.
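The following sketch (our construction, not from the lecture) simulates a 52-card deck with the J=1, Q=2, K=3 convention mentioned above, and estimates how often the two pointer processes meet, i.e. how often the guess is correct.

import numpy as np

rng = np.random.default_rng(2)

def final_position(deck, start):
    """Follow the pointer chain: from position i jump forward deck[i]
    places; return the last position reached inside the deck."""
    i = start
    while i + deck[i] < len(deck):
        i += deck[i]
    return i

# A..10 at face value, four suits each; J=1, Q=2, K=3 as in the lecture.
vals = np.array(list(range(1, 11)) * 4 + [1, 2, 3] * 4)

trials, hits = 10000, 0
for _ in range(trials):
    deck = rng.permutation(vals)
    # random secret start among the first 10 cards vs. fixed start 0
    if final_position(deck, rng.integers(10)) == final_position(deck, 0):
        hits += 1
print("coalescence frequency:", hits / trials)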

Ex. 3: Common component for two distributions. Let $F, G$ be two distributions on $\mathbb{R}$. I pick i.i.d. numbers $X_n$ according to $F$ and you pick i.i.d. $Y_n$ according to $G$. We want to do this in a way that, for sure, $X_n = Y_n$ for some finite $n$. The caveat here is that it is not a requirement that my sequence be independent of yours. Obviously, if the supports of $F$ and $G$ are disjoint, then there is no way to achieve equality in finite time. So let us say that the supports of $F$ and $G$ contain a common set, say a small interval $I$. Let then $\nu$ be some measure supported on $I$, e.g., a multiple of the uniform distribution on $I$. The important thing is that there is some nontrivial finite measure $\nu$ such that

$F \ge \nu, \quad G \ge \nu,$

in the sense that $F(B) \ge \nu(B)$, $G(B) \ge \nu(B)$ for all Borel sets $B$. Then, letting $\|\nu\| := \nu(\mathbb{R})$ and assuming $0 < \|\nu\| < 1$ (this is not a problem), we have

$F = \|\nu\|\,\frac{\nu}{\|\nu\|} + \|F - \nu\|\,\frac{F - \nu}{\|F - \nu\|},$

$G = \|\nu\|\,\frac{\nu}{\|\nu\|} + \|G - \nu\|\,\frac{G - \nu}{\|G - \nu\|}.$

Now here is how we do a coupling. Let $\Omega$ be a probability space supporting 4 independent sequences of i.i.d. random variables: the sequence $(\varepsilon_n)$ such that $P(\varepsilon_n = 1) = \|\nu\|$, $P(\varepsilon_n = 0) = 1 - \|\nu\|$; the sequence $(\zeta_n)$ such that $P(\zeta_n \in \cdot) = \nu(\cdot)/\|\nu\|$; the sequence $(X'_n)$ such that $P(X'_n \in \cdot) = (F(\cdot) - \nu(\cdot))/\|F - \nu\|$; and the sequence $(Y'_n)$ such that $P(Y'_n \in \cdot) = (G(\cdot) - \nu(\cdot))/\|G - \nu\|$. We can take $\Omega = (\{0,1\} \times \mathbb{R}^3)^{\mathbb{N}}$, the canonical space of $(\varepsilon_n, \zeta_n, X'_n, Y'_n)_{n \in \mathbb{N}}$. We now define

$X_n = \varepsilon_n \zeta_n + (1 - \varepsilon_n) X'_n,$
$Y_n = \varepsilon_n \zeta_n + (1 - \varepsilon_n) Y'_n.$

Clearly, $X_n \sim F$, $Y_n \sim G$, as needed, and if $T = \inf\{n : \varepsilon_n = 1\}$, we have $X_T = Y_T$. But $T$ is a geometric random variable with $P(T > n) = (1 - \|\nu\|)^n$, and so is a.s. finite.
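As a concrete sketch of this construction (the specific pair $F = U(0,1)$, $G = U(0.5, 1.5)$ and the component $\nu$ = Lebesgue measure on $(0.5, 1)$ are our hypothetical choices, not the lecture's):

import numpy as np

rng = np.random.default_rng(3)

# F = U(0,1) and G = U(0.5,1.5) share nu = Lebesgue measure on (0.5,1):
#   ||nu|| = 0.5, nu/||nu|| = U(0.5,1),
#   (F-nu)/||F-nu|| = U(0,0.5), (G-nu)/||G-nu|| = U(1,1.5).
nu_mass = 0.5

def coupled_sequences(n):
    eps  = rng.random(n) < nu_mass        # eps_n ~ Bernoulli(||nu||)
    zeta = rng.uniform(0.5, 1.0, n)       # zeta_n ~ nu/||nu||
    xp   = rng.uniform(0.0, 0.5, n)       # X'_n ~ (F-nu)/||F-nu||
    yp   = rng.uniform(1.0, 1.5, n)       # Y'_n ~ (G-nu)/||G-nu||
    X = np.where(eps, zeta, xp)           # X_n ~ F
    Y = np.where(eps, zeta, yp)           # Y_n ~ G
    return X, Y, eps

X, Y, eps = coupled_sequences(100000)
print("mean of X (theory 0.5):", X.mean())
print("mean of Y (theory 1.0):", Y.mean())
T = int(np.argmax(eps)) + 1               # T = inf{n : eps_n = 1}
print("T =", T, " X_T == Y_T:", X[T-1] == Y[T-1])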


$X_{T_K+1}$ has law $Q$ and that the sequence of cycles is stationary. (This is referred to as Palm stationarity.) Moreover, we have a regenerative structure: $\{T_K^{(m)},\ 0 \le m \le n\}$ is independent of $\{C_m,\ m \ge n+1\}$, for all $n$. The only problem is that the cycle durations may be random variables with infinite mean.

Now let $P_Q$ be the law of the process when $X_0$ is chosen according to $Q$. Define the measure

$\nu(\cdot) = E_Q \sum_{n=1}^{T_K+1} \mathbf{1}(X_n \in \cdot).$

The strong Markov property ensures that $\nu$ is stationary, i.e.,

$\nu(\cdot) = \int_S P(x, \cdot)\,\nu(dx).$

Positive Harris recurrence. If, in addition to the Harris property, we also have

$E_Q(T_K) < \infty, \qquad (1)$

we then say that the process is positive Harris recurrent. In such a case, $\pi(\cdot) = \nu(\cdot)/E_Q(T_K)$ defines a stationary probability measure. Moreover, the assumption that $R$ is recurrent ensures that there is no other stationary probability measure.

A sufficient condition for positive Harris recurrence is that

$\sup_{x \in R} E_x \tau_R < \infty. \qquad (2)$

This is a condition that does not depend on the (usually unknown) measure $Q$. To see the sufficiency, just use the fact that $T_K$ may be represented as a geometric sum of r.v.'s with uniformly bounded means, so that (2) implies (1). To check (2), Lyapunov function methods are very useful. Let us also offer a further remark on the relation between (2) and (1): it can be shown that if (1) holds, then there is an $R' \subseteq R$ such that

$\sup_{x \in R'} E_x \tau_{R'} < \infty.$
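To spell out the geometric-sum step (a sketch of the standard estimate; the notation $N$, $\tau_i$, $c$, $\delta$ is ours): write $T_K = \tau_1 + \cdots + \tau_N$, where the $\tau_i$ are successive excursion lengths with $\sup_i E\tau_i \le c$ (which is what (2) provides), and $N$ is a geometric stopping time with success probability $\delta > 0$. Then Wald's identity gives

$E_Q(T_K) = E \sum_{i=1}^{N} \tau_i \le c\, E N = \frac{c}{\delta} < \infty,$

which is (1).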

We next give a brief description of the stability, achieved by means of coupling, of a positive Harris recurrent process. To avoid periodicity phenomena, we assume that the discrete random variable $T_K$ has an aperiodic distribution under $P_Q$. (A sufficient condition for this is: $P_Q(T_K = 1) > 0$.) Then we can construct a successful coupling between the process starting from an arbitrary $X_0 = x_0$ and its stationary version. Assume that $E_{x_0} \tau_R < \infty$. (This $x_0$ may or may not be an element of $R$.) Let $\{X_n\}$ be the resulting process. Then one can show, by means of a backwards coupling construction, that the process $\{X_n\}$ couples (in the sense of the definition of the previous subsection) with the stationary version.


Let $S$ be a finite set and $p_{i,j}$ transition probabilities on $S$, assumed to be irreducible. Given any probability $\mu$ on $S$, we can define a Markov chain $\{X_n,\ n \ge 0\}$ by requiring that

$X_0 \sim \mu, \quad \text{and, for all } n \ge 1,\quad P(X_n = j \mid X_{n-1} = i) = p_{i,j}.$

The random sequence thus constructed is ergodic and satisfies all the usual limit theorems. In particular, if $\pi$ denotes the unique invariant probability measure, we have that $X_n \to \pi$ weakly, and $n^{-1}\sum_{k=1}^{n} \mathbf{1}(X_k \in \cdot) \to \pi(\cdot)$, a.s. We choose a particular way to realise the process. Define a sequence of random maps $\xi_n: S \to S$, $n \in \mathbb{Z}$, such that $\{\xi_n(i),\ i \in S,\ n \in \mathbb{Z}\}$ are independent, and

$P(\xi_n(i) = j) = p_{i,j}, \quad i, j \in S,\ n \in \mathbb{Z}.$

We assume that these random maps are defined on a probability space $(\Omega, \mathcal{F}, P)$ supporting a measurable flow $\theta: \Omega \to \Omega$ such that $\xi_n \circ \theta = \xi_{n+1}$ for all $n \in \mathbb{Z}$. (E.g., let $\Omega = (S^S)^{\mathbb{Z}}$ and $(\theta\omega)(n) = \omega(n+1)$.) Let $Y$ be a random variable with distribution $\mu$, independent of everything else, and define

$X_0 = Y, \quad X_n = \xi_{n-1} \cdots \xi_0(Y), \quad n \ge 1.$

(Notational convention: we use a little circle $\circ$ when we denote composition with respect to the variable $\omega \in \Omega$, and nothing when we denote composition with respect to the variable $i \in S$.) Clearly this sequence $(X_0, X_1, \ldots)$ realises our Markov chain. Observe that, for any $m \in \mathbb{Z}$, the sequence $(Y, \xi_m(Y), \xi_{m+1}\xi_m(Y), \ldots)$ also realises our Markov chain.

Next define the forward strong coupling time by

$\sigma = \inf\{n \ge 0 : \forall\, i, j \in S,\ \xi_n \cdots \xi_0(i) = \xi_n \cdots \xi_0(j)\},$

and the backward coupling time by

$\tau = \inf\{n \ge 0 : \forall\, i, j \in S,\ \xi_0 \cdots \xi_{-n}(i) = \xi_0 \cdots \xi_{-n}(j)\}.$

Proposition 1. (i) $\sigma$ is a.s. finite. (ii) $\sigma \stackrel{d}{=} \tau$.

Proof. Consider the Markov chain $\{\xi_n \cdots \xi_0,\ n \ge 0\}$, with values in $S^S$. Let $\Delta = \{\varphi \in S^S : \varphi(1) = \cdots = \varphi(d)\}$ be the set of constant maps. Then $\{\sigma < \infty\} = \bigcup_{n \ge 0} \{\xi_n \cdots \xi_0 \in \Delta\}$. Although the chain is not the product chain (its components are not independent), the probability that it hits the set $\Delta$ in finite time is larger than or equal to the probability that the product chain hits the set $\Delta$ in finite time. But the product chain is irreducible, hence the latter probability is 1. Hence (i) holds. To prove (ii), notice that $\xi_{\sigma+n} \cdots \xi_0 \in \Delta$ for all $n \ge 0$, a.s. (once the composition is a constant map, it remains constant). Hence, for each $n \ge 0$,

$\{\sigma \le n\} = \{\xi_n \cdots \xi_0 \in \Delta\}. \qquad (3)$

The same argument applied to the maps composed in reverse order shows that

$\{\tau \le n\} = \{\xi_0 \cdots \xi_{-n} \in \Delta\}. \qquad (4)$

The events on the right-hand sides of (3) and (4) have the same probability, due to stationarity. This proves (ii).


Corollary 1. $\tau < \infty$, a.s.

Warning: While it is true that, for each $n$, the random maps $\xi_n \cdots \xi_0$ and $\xi_0 \cdots \xi_{-n}$ have the same distribution, it is not true, in general, that the processes $\{\xi_n \cdots \xi_0,\ n \ge 0\}$ and $\{\xi_0 \cdots \xi_{-n},\ n \ge 0\}$ are indistinguishable. They are both Markov, the first one being a realisation of our original Markov process, but the second one not.

Define now the random variable

$Z(i) = \xi_0 \cdots \xi_{-\tau}(i).$

By the very definition of $\tau$, $Z(i) = Z(j)$ for all $i, j$. So we may set $Z(i) \equiv Z$.

Proposition 2. $Z \sim \pi$.

    Proof.

$P(Z = k) = \lim_n P(Z = k,\ \tau \le n) = \lim_n P(\xi_0 \cdots \xi_{-n}(i) = k) = \lim_n P(\xi_n \cdots \xi_0(i) = k) = \pi(k),$

the latter equality following from the convergence $X_n \to \pi$.

We now describe the actual algorithm. We think of an element $\varphi \in S^S$ either as a map $\varphi: S \to S$ or as a vector $(\varphi(1), \varphi(2), \ldots, \varphi(d))$, where $d$ is the cardinality of $S$. So we may think of $\xi_0, \xi_{-1}, \xi_{-2}, \ldots$ as i.i.d. random vectors in $S^d$. These vectors are needed at each step of the algorithm, and we assume we have a random number generator that allows picking these vectors. The algorithm can be written as:

1. $\varphi_0 =$ identity

2. $n = 1$

3. $\varphi_n = \varphi_{n-1}\,\xi_{-(n-1)}$

4. If $\varphi_n \notin \Delta$, set $n = n + 1$ and repeat the previous step; otherwise, set $Z = \varphi_n(1)$ and stop.

For example, let $S = \{1, 2, 3, 4\}$. Suppose that the $\xi_{-k}$, $k \ge 0$, are drawn as follows:

$\xi_0 = (2, 3, 2, 1)$
$\xi_{-1} = (2, 1, 4, 3)$
$\xi_{-2} = (2, 4, 2, 1)$
$\xi_{-3} = (2, 1, 4, 3)$
$\xi_{-4} = (4, 1, 1, 2)$
$\xi_{-5} = (1, 2, 3, 3)$
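Here is the algorithm run on exactly these draws (a short Python sketch; the function names are ours). With states stored 0-based, the composition $\varphi\,\xi$ is just array indexing, and the loop reproduces the backward compositions $\varphi_1, \varphi_2, \ldots$ until a constant map appears; on these draws it coalesces at $n = 5$ with $Z = 2$.

import numpy as np

# The draws xi_0, xi_{-1}, ..., xi_{-5} from the example, as vectors.
xis = [np.array(v) for v in
       [(2, 3, 2, 1), (2, 1, 4, 3), (2, 4, 2, 1),
        (2, 1, 4, 3), (4, 1, 1, 2), (1, 2, 3, 3)]]

def compose(phi, xi):
    """(phi xi)(i) = phi(xi(i)); states 1..d are stored in 0-based arrays."""
    return phi[xi - 1]

phi = np.arange(1, 5)                  # step 1: phi_0 = identity
for n, xi in enumerate(xis, start=1):  # steps 2-4
    phi = compose(phi, xi)             # phi_n = phi_{n-1} xi_{-(n-1)}
    print("phi_%d =" % n, tuple(phi))
    if np.all(phi == phi[0]):          # phi_n in Delta: stop
        print("Z =", int(phi[0]))
        break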


For the backward chain $\eta_n = \eta_{n-1}\,\xi_{-(n-1)}$, we similarly compute the transition probability of going from a map $x$ to a map $y$ (given that the chain is currently at $x$):

$\hat p(x, y) = P(\eta_{n-1}\,\xi_{-(n-1)} = y \mid \eta_{n-1} = x) = P(x\,\xi_0 = y)$

$= P(x(\xi_0(1)) = y(1), \ldots, x(\xi_0(d)) = y(d))$

$= \prod_{i=1}^{d} P(x(\xi_0(i)) = y(i))$

$= \prod_{i=1}^{d} P(\xi_0(i) \in x^{-1}(y(i)))$

$= \prod_{i=1}^{d} \sum_{j \in x^{-1}(y(i))} p_{i,j}. \qquad (6)$

For this chain we observe that not only is the set $\Delta$ closed, but also that each individual element of $\Delta$ is an absorbing state. The elements of $S^S \setminus \Delta$ are transient.

Example. Let us consider $S = \{1, 2\}$ and transition probability matrix

$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}.$

The forward chain $\{\xi_n \cdots \xi_0\}$ and the backward chain $\{\xi_0 \cdots \xi_{-n}\}$ take values in $S^S$, whose elements we write as vectors: $S^S = \{(11), (12), (21), (22)\}$. Notice the properties mentioned before: the set $\Delta = \{(11), (22)\}$ is closed in both cases. Individual states of $\Delta$ are absorbing in the backward case. All other states are transient in both cases. We compute the transition probabilities following equations (5) and (6). For instance, with $x = (21)$ and $y = (11)$, we have

$p(x, y) = P(\xi_0(x(1)) = y(1),\ \xi_0(x(2)) = y(2)) = P(\xi_0(2) = 1,\ \xi_0(1) = 1) = \beta(1-\alpha),$

$\hat p(x, y) = P(\xi_0(1) \in x^{-1}(y(1)),\ \xi_0(2) \in x^{-1}(y(2)))$
$= P(\xi_0(1) \in x^{-1}(1),\ \xi_0(2) \in x^{-1}(1)) = P(\xi_0(1) = 2,\ \xi_0(2) = 2) = \alpha(1-\beta).$

The invariant probability measure of the original 2-state chain is $\pi = (\beta/(\alpha+\beta),\ \alpha/(\alpha+\beta))$. When the backward chain is absorbed at $\Delta$, each of its components has distribution $\pi$. It is easily checked that this is not the case, in general, with the forward chain. For instance, suppose $\alpha = 1$ and $0 < \beta < 1$. Then $\pi = (\beta/(1+\beta),\ 1/(1+\beta))$. But forward coupling can only occur at state 2. In other words, at time $\sigma$, the chain takes the value 2 with probability one. This is not the stationary distribution! (Of course, if the forward coupled process is allowed to evolve ad infinitum, it certainly converges to $\pi$.) On the other hand, it can easily be checked that the backward chain, starting at the state (12), is absorbed at state (11) with probability $\beta/(1+\beta)$ and at state (22) with probability $1/(1+\beta)$. This is precisely the stationary distribution.
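A simulation sketch of this bias (our code; we take $\alpha = 1$, $\beta = 1/2$, so $\pi = (1/3, 2/3)$): the value produced at the forward coupling time is always 2, while the backward (perfect-simulation) value has distribution $\pi$.

import numpy as np

rng = np.random.default_rng(4)
alpha, beta = 1.0, 0.5
pi1 = beta / (1 + beta)               # stationary mass of state 1 (= 1/3)

def random_map():
    """A random map xi on S = {1,2}: xi(i) is drawn from row i of P."""
    return np.array([1 if rng.random() < 1 - alpha else 2,
                     1 if rng.random() < beta else 2])

def coalesced_value(backward):
    phi = np.array([1, 2])            # identity map, i.e. state (12)
    while phi[0] != phi[1]:
        xi = random_map()
        # backward: phi <- phi xi ; forward: phi <- xi phi
        phi = phi[xi - 1] if backward else xi[phi - 1]
    return phi[0]

for backward, name in [(False, "forward "), (True, "backward")]:
    vals = np.array([coalesced_value(backward) for _ in range(20000)])
    print(name, "P(value = 1) ~", np.mean(vals == 1), " (pi(1) =", pi1, ")")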


    5 Examples of non-Harris Markov chains.

Example 1. Consider a circle of unit length and define the dynamics as follows: move clockwise, with equal probabilities, a distance of length either $1/2$ or $1/\sqrt{2}$. Then, for any initial value $X_0$, the distribution of $X_n$ converges weakly to the uniform distribution. The limiting distribution coincides with Lebesgue measure, while each distribution of $X_n$ is discrete. So there is no convergence in total variation.

Example 2. Consider the autoregressive model on the real line

$X_{n+1} = \xi_n X_n + 1,$

where the $\xi_n$ are i.i.d. and take the values $1/2$ and $-1/2$ with equal probabilities. Again, for any initial non-random value $X_0$, each distribution of $X_n$ is discrete and singular with respect to the limiting distribution, which is the distribution of the random variable

$1 + \sum_{i=-\infty}^{-1} \prod_{j=i}^{-1} \xi_j.$
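A simulation sketch under the reconstruction above (values $\pm 1/2$; variable names ours): starting from a fixed $X_0 = 0$, the law of $X_n$ sits on at most $2^n$ atoms, while a (truncated) sample of the stationary variable spreads continuously, with mean 1 since $E\xi_n = 0$.

import numpy as np

rng = np.random.default_rng(6)

n, paths = 12, 100000
xi = rng.choice([0.5, -0.5], size=(paths, n))
X = np.zeros(paths)
for k in range(n):                     # X_{k+1} = xi_k X_k + 1, X_0 = 0
    X = xi[:, k] * X + 1
print("distinct atoms of X_n (at most 2^n = %d):" % 2**n,
      len(np.unique(np.round(X, 12))))

# Stationary variable 1 + sum_i prod_j xi_j, truncated at 60 terms.
prods = np.cumprod(rng.choice([0.5, -0.5], size=(paths, 60)), axis=1)
Xstat = 1 + prods.sum(axis=1)
print("stationary mean ~", Xstat.mean(), "(theory: 1)")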

Example 3. Let $A, B, C$ be three non-collinear points in the plane, forming the vertices of a triangle. Let $X_0$ be an arbitrary point in the plane. Toss a 3-sided coin and pick one of the vertices at random. Join $X_0$ with the chosen vertex by a straight line segment and let $X_1$ be the midpoint of the segment. Repeat the process independently. We thus obtain a Markov process $(X_n,\ n \ge 0)$. The process has a unique stationary version, constructed as follows. Let $(\xi_n,\ n \in \mathbb{Z})$ be i.i.d. random elements of the plane, such that $\xi_1$ is distributed uniformly on $\{A, B, C\}$. By assigning some Cartesian coordinates, define

$X_{n+1} = \frac{1}{2}(X_n + \xi_n).$

Iterate the recursion, starting from $-m$ and ending up at $n \ge -m$. Fix $n$ and let $m \to \infty$. We obtain

$X_n = \sum_{k=1}^{\infty} \frac{1}{2^k}\,\xi_{n-k}.$

The so-constructed process is Markovian with the required transitions and is also a stationary process. There is no other stationary solution to the recursion. So the stationary distribution is the distribution of the random variable

$\sum_{k=1}^{\infty} \frac{1}{2^k}\,\xi_{-k}.$

It is not hard to see that the stationary distribution is supported on a set of Lebesgue measure zero (the so-called Sierpinski gasket).
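The stationary version is easy to visualise with a "chaos game" simulation (a sketch; the particular triangle and sample size are arbitrary choices of ours):

import numpy as np

rng = np.random.default_rng(5)

V = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])  # A, B, C
x = np.array([0.3, 0.1])              # arbitrary X_0

points = []
for n in range(20000):
    x = (x + V[rng.integers(3)]) / 2  # X_{n+1} = (X_n + xi_n)/2
    if n > 50:                        # discard a short burn-in
        points.append(x.copy())
points = np.array(points)
# Scatter-plotting `points` (e.g. with matplotlib) reveals the
# Sierpinski gasket spanned by A, B, C.
print(points.shape)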


    6 Discussion

1. Show, in detail, that if $X_n$ is aperiodic and positive Harris recurrent, then the law of $X_n$ converges in total variation to the unique invariant probability measure.

2. Show that any irreducible, (positive) recurrent Markov chain on a countable state space $S$ is (positive) Harris recurrent.

3. Let $X_{a,b}$ be a random variable with $P(X_{a,b} = a) = p$, $P(X_{a,b} = b) = 1 - p$, $EX_{a,b} = 0$. Let $Y$ be another zero-mean random variable. Show that, by randomising $a, b$, we can write $Y \stackrel{d}{=} X_{A,B}$. Use this to couple any zero-mean random variable and a Brownian motion.

4. Why is the process of the last example non-Harris? Find a formula for the characteristic function of the stationary distribution.

5. In the same example, let $X_0$ have a law with a density. Will $(X_n)$ couple with its stationary version?

6. How are Lyapunov function methods used for checking positive Harris recurrence? (For the countable state-space case, check the excellent book of Brémaud (2001).)

    References

    Asmussen, S. (2003). Applied Probability and Queues. Springer.

Brémaud, P. (2001). Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer.

Doeblin, W. (1940). Éléments d'une théorie générale des chaînes simples constantes de Markoff. Ann. Sci. École Norm. Sup. 57, 61-111.

Foss, S.G. and Tweedie, R.L. (1998). Perfect simulation and backward coupling. Stochastic Models 14, 187-204.

Foss, S. and Konstantopoulos, T. (2004). An overview of some stochastic stability methods. J. Oper. Res. Soc. Japan 47, 275-303.

Harris, T.E. (1956). The existence of stationary measures for certain Markov processes. Proc. 3rd Berkeley Symp. Math. Stat. Prob. 2, 113-124.

Meyn, S. and Tweedie, R. (1993). Markov Chains and Stochastic Stability. Springer, New York.

Nummelin, E. (1984). General Irreducible Markov Chains and Non-Negative Operators. Cambridge University Press.

Orey, S. (1959). Recurrent Markov chains. Pacific J. Math. 9, 805-827.

Orey, S. (1971). Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand Reinhold.

Stroock, D. (2005). An Introduction to Markov Processes. Springer.