arXiv:math/0410174v1 [math.PR] 6 Oct 2004

The Annals of Probability
2004, Vol. 32, No. 3B, 2765–2818
DOI: 10.1214/009117904000000135
© Institute of Mathematical Statistics, 2004

LARGE DEVIATION ASYMPTOTICS FOR OCCUPANCY PROBLEMS¹

By Paul Dupuis², Carl Nuzman and Phil Whiting

Brown University, Bell Labs and Bell Labs

In the standard formulation of the occupancy problem one considers the distribution of r balls in n cells, with each ball assigned independently to a given cell with probability 1/n. Although closed form expressions can be given for the distribution of various interesting quantities (such as the fraction of cells that contain a given number of balls), these expressions are often of limited practical use. Approximations provide an attractive alternative, and in the present paper we consider a large deviation approximation as r and n tend to infinity. In order to analyze the problem we first consider a dynamical model, where the balls are placed in the cells sequentially and “time” corresponds to the number of balls that have already been thrown. A complete large deviation analysis of this “process level” problem is carried out, and the rate function for the original problem is then obtained via the contraction principle. The variational problem that characterizes this rate function is analyzed, and a fairly complete and explicit solution is obtained. The minimizing trajectories and minimal cost are identified up to two constants, and the constants are characterized as the unique solution to an elementary fixed point problem. These results are then used to solve a number of interesting problems, including an overflow problem and the partial coupon collector’s problem.

1. Introduction. Urn occupancy problems center on the distribution of r balls in n cells, typically with each ball independently assigned to a given cell with probability 1/n. The literature on the general topic is enormous. See, for example, [9, 19, 20] and the references therein.

Received April 2002; revised October 2003.
¹Supported in part by NSF Grant DMS-03-06070.
²Supported in part by NSF Grants DMS-00-72004, ECS-99-79250 and Army Research Office Grants DAAD19-00-1-0549 and DAAD19-02-1-0425.
AMS 2000 subject classifications. 60F10, 65K10, 49N99.
Key words and phrases. Occupancy problems, urn models, large deviations, calculus of variations, sample paths, explicit solutions, combinatorics, Euler–Lagrange equations.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Probability, 2004, Vol. 32, No. 3B, 2765–2818. This reprint differs from the original in pagination and typographic detail.




There are many different questions one can pose. For example, it may be that one is interested in the distribution of $(\Gamma_0, \Gamma_1, \ldots)$, where $\Gamma_i$ is the number of cells containing exactly $i$ balls after all $r$ balls have been thrown. In the classical occupancy problem [9, 14, 20], one is interested only in the distribution of unoccupied urns $\Gamma_0$. In other cases, one might be interested in the (random) number of balls required to fill all cells to a given level, or the number required so that a given fraction are filled to that level, the so-called coupon collector’s or dixie cup problem. In biology, the inverse problem of estimating the number of balls thrown from the number of occupied cells also arises, and is used to estimate species abundance [17]. Related applications in biology appear in [2, 3], and an application in computer science appears in [21].
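As an illustration of the quantities just introduced, the following minimal Python sketch throws $r$ balls uniformly at random into $n$ cells and tabulates $\Gamma_i$. The function name `occupancy_counts` and the chosen values of $n$ and $r$ are assumptions made for this example only; they do not appear in the paper.

```python
import random
from collections import Counter

def occupancy_counts(n, r, rng):
    """Throw r balls independently and uniformly into n cells (n >= 1).

    Returns a list G where G[i] is the number of cells that hold
    exactly i balls after all r balls have been thrown.
    """
    balls_in_cell = [0] * n
    for _ in range(r):
        balls_in_cell[rng.randrange(n)] += 1
    tally = Counter(balls_in_cell)
    return [tally.get(i, 0) for i in range(max(balls_in_cell) + 1)]

# Hypothetical example values: n = 1000 cells, r = 2000 balls.
rng = random.Random(0)
counts = occupancy_counts(n=1000, r=2000, rng=rng)
print("Gamma_0 (empty cells):", counts[0])
```

By construction the entries of `counts` sum to $n$, and $\sum_i i\,\Gamma_i = r$, which gives a quick sanity check on any simulated sample.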

Another related problem of interest is the overflow problem, in which the urns are supposed to have a finite capacity $C$, and the number of balls that overflow is the random variable of interest. Ramakrishna and Mukhopadhyay [24] describe an application in computer science concerned with memory access, and [12] considers an application to optical switches. In [12] one is concerned with dimensioning the number of wavelength converters so as to reduce the probability of packet loss across the switch to an acceptable level. In [12] any-color-to-any-color converters are considered. However, by extending the results proved here to the case where the balls have distinct colors, dimensioning in the case where we have many-to-one color converters can be carried out, and the blocking probability of the output estimated.

A wide range of results have been proved for the occupancy problem by using “exact” approaches. For example, combinatorial methods are used in [9, 14, 20], and methods that utilize generating functions are discussed in [20]. The implementation of these results, however, can be difficult. For example, in applying the combinatorial results, one must compute the difference between large quantities that appear in the inclusion–exclusion formula for the probability that a given fraction of cells are occupied. An analogous difficulty occurs with techniques based on moment generating functions, since one must invert the generating function itself.
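The cancellation difficulty can be seen concretely in the classical occupancy problem. The sketch below evaluates the standard inclusion–exclusion formula for the probability that exactly $k$ of $n$ cells are empty after $r$ balls, $\binom{n}{k}\sum_{j=0}^{n-k}(-1)^j\binom{n-k}{j}(1-(k+j)/n)^r$. This formula is the textbook one and is not reproduced in the text above; the function name is hypothetical. Exact rational arithmetic is used here precisely because the alternating terms are large compared with their sum.

```python
from fractions import Fraction
from math import comb

def prob_exactly_k_empty(n, r, k):
    """Probability that exactly k of n cells are empty after r uniform throws.

    Uses the classical inclusion-exclusion sum with exact rationals,
    so the alternating terms cancel without rounding error.
    """
    total = Fraction(0)
    for j in range(n - k + 1):
        term = comb(n - k, j) * Fraction(n - k - j, n) ** r
        total += -term if j % 2 else term
    return comb(n, k) * total

# Hypothetical small example: two balls in two cells.
print(prob_exactly_k_empty(2, 2, 0))  # both cells occupied
```

In floating point the same sum loses accuracy quickly as $n$ grows, which is one motivation for the asymptotic approach taken in the paper.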

Asymptotic methods provide an attractive alternative to both of these approaches. One reason is that they often offer good approximations with only a modest computational effort. A second, perhaps more important, reason is that superior qualitative insights can often be obtained. Indeed, a range of asymptotic results have already been obtained for these problems (see, e.g., [18]). The first large deviations principle (LDP) for urn problems that we are aware of was established in [28] for the special case of the classical occupancy problem. This result was applied in [21] to a Boolean satisfiability problem in computer science. Reference [6], which appeared while the present paper was under review, proves an LDP for the infinite-dimensional occupancy measures associated with occupancy processes. The present paper focuses on finite-dimensional occupancy measures, in which urn occupancies above a given level are not distinguished. In this finite case we are able to provide a concise large deviations proof, along with explicit, insightful and computable expressions for the rate functions and for the large deviations extremals. The occupancy model after all the balls have been thrown is shown to have a simple and fairly explicit rate function, which can be defined in terms of relative entropy with respect to the Poisson distribution. Many different problems can be solved in this framework, simply by changing the set over which the rate function is minimized. We also give sample path results for the evolution of the urn occupancies toward a particular event. In principle, the explicit form of the minimizing trajectories for sample path results should enable accurate empirical estimates of unlikely events using importance sampling; see [7, 22].

Let $\lfloor a \rfloor$ denote the integer part of a scalar $a$. With $n$ cells available and a total of $r = \lfloor \beta n \rfloor$ balls to be thrown (with $\beta > 0$), we consider the large deviation asymptotics as $n \to \infty$. The precise statement is as follows. Fix a positive integer $I$. Then, with $\Gamma^n_i(\beta)$ equal to the fraction of cells containing $i$ balls, we characterize the large deviation asymptotics of the random vectors $\{(\Gamma^n_0(\beta), \Gamma^n_1(\beta), \ldots, \Gamma^n_I(\beta)),\ n = 1, 2, \ldots\}$ as $n \to \infty$. A direct analysis of this problem is difficult, and in fact it turns out to be simpler to first “lift” the problem to the level of a sample path large deviation problem, and then use the contraction principle to reduce to the original finite-dimensional problem. A “time” variable $x$ is introduced into the problem, where $\lfloor nx \rfloor$ balls have been thrown at time $x$, and $\Gamma^n_i(x)$ is equal to the fraction of cells containing $i$ balls at this time. We then follow a standard program: the large deviation properties of this Markov process are analyzed, the rate function $J$ on path space is obtained, and the rate function for the occupancy at time $\beta$ is then characterized as the solution to a variational problem involving $J$.

Although the program is standard, there are several very interesting features, both qualitative and technical, which distinguish this large deviation problem. We first describe some of the attractive qualitative features. Typically, one has a rate function on path space of the form $J(\phi) = \int L(\phi(x), \dot\phi(x))\,dx$, where the nonnegative function $L(\gamma, \xi)$ is jointly lower semicontinuous and convex in $\xi$ for each fixed $\gamma$. The large deviation properties of the process at time $\beta$ are then found by solving a variational problem of the form

\[ \inf\{J(\phi) : \phi(\beta) = \omega\}, \]

where $\omega$ is given, and where there will also be constraints on the initial condition $\phi(0)$. In general, this problem does not have an explicit closed form solution. One exception to this rule is the extraordinarily simple situation where $L(\gamma, \xi)$ does not depend on the state variable $\gamma$. In this case, Jensen’s inequality implies that the minimizing trajectory is a straight line, and so the variational problem is actually finite dimensional. Another exception is the case of large deviation asymptotics of a small noise linear stochastic differential equation. In this case, the variational problem takes the form of the classical linear quadratic regulator, and the explicit solution is well known from the theory of deterministic optimal control. However, in this case there is no need to “lift” the problem to the sample path level, since the distribution of the diffusion at any time is Gaussian with explicitly calculable mean and covariance. Other exceptions occur in large deviation problems from queuing theory, but in these problems the variational integrand is “sectionally” independent of $\gamma$, and one can show that the minimizing trajectories are the concatenation of a finite number of straight line segments (see, e.g., [1] and the references therein).

For the occupancy problem, the variational integrand has a complicated state dependence [see (2.2)], reflecting the complicated dependence of the transition probabilities (in the process level version) on the state. Nonetheless, the rate function possesses a great deal of structure that can be heavily exploited. For example, the function $J$ turns out to be strictly convex on path space, and so all local minimizers in the variational problem are actually global minimizers. Perhaps even more surprising is the fact that explicit solutions to the Euler–Lagrange equations can be constructed, and as a consequence the variational problem can be more-or-less solved explicitly (see the Appendix). Both of these properties follow from the fact that the variational integrand $L$ can be defined in terms of the famous relative entropy function (or divergence).

A technical novelty of the problem is the singular behavior of the transition probabilities of the underlying Markov process. Since only cells containing $j$ balls can become cells with $j+1$, it is clear that the transition probability corresponding to such an event scales linearly with $\Gamma^n_j(x)$, and in particular that it vanishes at the boundary of the state space when $\Gamma^n_j(x) = 0$. This poses no difficulty for the large deviations upper bound, but is an obstacle for the lower bound. (Some results that address lower bounds when rates go to zero include Chapter 8 of [26], which treats processes with “flat” boundaries, and recent general results in [27].) For the occupancy model, existing results provide a lower bound for open sets of trajectories that do not touch the boundary, because away from the boundaries the set $\{\xi : L(\Gamma, \xi) < \infty\}$ is independent of $\Gamma$. To deal with more general open sets, we use a perturbation argument. We first show, using the strict convexity of $J$ and properties of the zero cost paths, that it is enough to prove large deviation lower bounds for open neighborhoods of trajectories that stay away from the boundary for all positive times. (Note that this still allows the trajectory to start on the boundary.) Loosely speaking, to prove the lower bound for sets of this type, it is enough to show that given $a > 0$, there is $b > 0$ such that the probability that the process is at least distance $b$ from the boundary by time $b$ is bounded below by $\exp\{-na\}$. It turns out that these bounds can be easily established by exploiting an explicit representation for such probabilities that was proved in [11].

An outline of the paper is as follows. Section 2 states the main results of the paper. In the first part of Section 2, we construct the underlying stochastic process model and state the corresponding sample path level LDP, as well as the LDP for the terminal distribution. The proof of the sample path LDP is given in the following section, although in Section 2 an additional, heuristic argument for the form of the local rate function, based on Sanov’s theorem, is provided. In the second part of Section 2, the terminal distribution rate function is characterized and explicit expressions for the sample path minimizers are presented. These constitute more-or-less complete solutions to the corresponding calculus of variations problem. The detailed proofs of these latter results are deferred to the Appendix.

Section 3 gives a proof of the sample path LDP. The solutions to the variational problem are exemplified in Section 4, where we show how specific questions regarding the occupancy problem can be answered. In particular, we work out the asymptotics for a number of examples, including an overflow problem and a partial coupon collection problem. Generalizations of our results are also described.

2. Main results.

2.1. Statement of the LDP. In this section we formulate the problem of interest and state the LDP. The proof will be given in Section 3. As noted in the Introduction, our focus is the asymptotic behavior of the occupancy problem. If $n$ denotes the total number of cells, then to have a nontrivial limit the number of balls placed in the cells should scale linearly with $n$. We will place $\lfloor \beta n \rfloor$ balls in the cells, where $\beta \in (0, \infty)$ is a fixed parameter and $\lfloor a \rfloor$ denotes the integer part of $a$.

As will be seen in the sequel, the large deviation asymptotics of the occupancy should be treated by first lifting the problem to the level of sample path large deviations, and then using the contraction principle to reduce to the original problem. To do this, we introduce a “time” variable $x$ that ranges from $0$ to $\beta$. At time $x$ one should imagine that $\lfloor nx \rfloor$ balls have been thrown. Thus the occupancy process will be piecewise constant over intervals of the form $[i/n, i/n + 1/n)$. With this scaling of time, large deviation asymptotics can be obtained if we scale space by a factor of $1/n$. Thus we define the random occupancy process $\Gamma^n(x) \doteq (\Gamma^n_0(x), \ldots, \Gamma^n_I(x), \Gamma^n_{I+}(x))$ by letting $\Gamma^n_i(x)$, $i = 0, \ldots, I$, denote $1/n$ times the number of cells with exactly $i$ balls at time $x$, and letting $\Gamma^n_{I+}(x)$ be $1/n$ times the number of cells with more than $I$ balls at time $x$. Note that $\Gamma^n$ takes values in the set of probability vectors on $I + 2$ points:

\[ S_I \doteq \Big\{ \gamma \in \mathbb{R}^{I+2} : \gamma_j \ge 0,\ 0 \le j \le I+1,\ \text{and}\ \sum_{j=0}^{I+1} \gamma_j = 1 \Big\}. \]

The process $\{\Gamma^n(i/n),\ i = 0, 1, \ldots\}$ is obviously Markovian. It will be convenient to work with the following “dynamical system” representation:

\[ \Gamma^n\Big(\frac{i+1}{n}\Big) = \Gamma^n\Big(\frac{i}{n}\Big) + \frac{1}{n}\, b_i\Big(\Gamma^n\Big(\frac{i}{n}\Big)\Big), \]

where the independent and identically distributed random vector fields $\{b_i(\cdot),\ i = 0, 1, \ldots\}$ have distributions

\[ P\{b_i(\gamma) = v\} = \begin{cases} \gamma_j, & \text{if } v = e_{j+1} - e_j,\ 0 \le j \le I, \\ \gamma_{I+1}, & \text{if } v = 0. \end{cases} \]

Here $e_j$ represents the vector in $\mathbb{R}^{I+2}$ whose $j$th element is unity and for which all other elements are zero. The occupancy after $\lfloor \beta n \rfloor$ balls have been thrown can then be represented by $n\Gamma^n(\beta)$.
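The dynamical system representation translates directly into a simulation. The following Python sketch tracks only the vector of cell counts (not individual cells): at each step an occupancy class $j$ is drawn with probability $\gamma_j$, and one cell moves from class $j$ to class $j+1$ unless $j$ is the top class $I+1$. The function name `simulate_occupancy_chain` and the all-empty starting state are assumptions made for this illustration.

```python
import random

def simulate_occupancy_chain(n, beta, I, rng):
    """Simulate Gamma^n(beta) starting from n empty cells.

    counts[j] for j <= I is the number of cells holding exactly j balls;
    counts[I + 1] is the number of cells holding more than I balls.
    Each step draws class j with probability counts[j] / n, which is the
    law of b_i(gamma), and applies the increment e_{j+1} - e_j (or 0).
    """
    counts = [n] + [0] * (I + 1)
    for _ in range(int(beta * n)):          # floor(beta * n) balls
        j = rng.choices(range(I + 2), weights=counts)[0]
        if j <= I:
            counts[j] -= 1
            counts[j + 1] += 1
    return [c / n for c in counts]

# Hypothetical example values: n = 20000 cells, beta = 1, I = 2.
rng = random.Random(1)
print(simulate_occupancy_chain(n=20000, beta=1.0, I=2, rng=rng))
```

For large $n$ the output should be close to the law of large numbers limit, which for empty initial urns is a Poisson distribution with its tail lumped into the last component.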

As we have discussed, the large deviations behavior of $\Gamma^n(\beta)$ will be deduced from that of the process $\Gamma^n(\cdot)$. In order to state the large deviation asymptotics precisely, we should clarify the space in which $\Gamma^n(\cdot)$ takes values and the topology used on that space. As usual for processes with jumps of this sort, we use the Skorokhod space $D([0, \beta] : \mathbb{R}^{I+2})$, together with the Skorokhod topology ([5], Chapter 4). However, the large deviations properties of $\{\Gamma^n\}$ are the same as for the processes $\{\tilde\Gamma^n\}$, where $\tilde\Gamma^n$ is defined to be the piecewise linear process which agrees with $\Gamma^n$ at times of the form $i/n$. For readers unfamiliar with the Skorokhod space and associated topology, the identical large deviations results hold for $\{\tilde\Gamma^n\}$, save that the space of continuous functions and the sup norm topology are used instead.

To complete the statement of the LDP for $\{\Gamma^n\}$, we need some additional notation. A vector of (deterministic) occupancy rates $\theta(x) = (\theta_0(x), \ldots, \theta_I(x), \theta_{I+}(x))$ is a measurable mapping from $[0, \beta]$ to $S_I$. Intuitively, these rates represent the rate at which balls flow into urns of a given occupancy level. Associated with each such vector of rates is the corresponding deterministic occupancy function $\gamma(x) = (\gamma_0(x), \ldots, \gamma_I(x), \gamma_{I+}(x))$, which is defined by the initial condition $\gamma(0)$ and the differential equations $\dot\gamma_0(x) = -\theta_0(x)$, $\dot\gamma_j(x) = \theta_{j-1}(x) - \theta_j(x)$ for $j = 1, \ldots, I$, and $\dot\gamma_{I+}(x) = \theta_I(x)$. These equations reflect the idea that the fraction of urns containing $i$ balls increases as balls enter $(i-1)$-occupied urns and decreases as balls enter $i$-occupied urns. Defining the matrix

\[ M = \begin{pmatrix} -1 & 0 & 0 & \cdots & 0 \\ 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & -1 & 0 \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix}, \tag{2.1} \]

we can write $\dot\gamma(x) = M\theta(x)$ for all $x \in [0, \beta]$. Conversely, given a differentiable occupancy function $\gamma$, the corresponding rates are uniquely determined by $\dot\gamma(x) = M\theta(x)$ and the normalization $\sum_{i=0}^{I} \theta_i(x) + \theta_{I+}(x) = 1$. We will also be interested in the cumulative occupancy function $\psi$ with components $\psi_i(x) = \sum_{j=0}^{i} \gamma_j(x)$. Inspecting the cumulative sums of the rows of $M$ shows that $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$. As more balls are thrown, the fraction of urns containing $i$ or fewer balls can only decrease, and the rate of decrease is the rate at which balls enter $i$-occupied urns. We will say that $\gamma$ is a valid occupancy function if $\gamma$ is absolutely continuous, if $\gamma(x)$ is a probability vector for all $x \in [0, \beta]$, and if its associated $\theta(x)$ is a probability vector for almost all $x \in [0, \beta]$. Note that the functions $\psi$, $\gamma$ and $\theta$ are interchangeable, in the sense that each can be derived from any of the others [given $\gamma(0)$, in the case of $\theta$]. Thus we say that $\theta$ and $\psi$ are valid if the associated $\gamma$ is valid. The following lemma gives a direct characterization of validity in terms of $\psi$.

Lemma 2.1. A vector of $I + 1$ continuous functions $\psi$, each of which maps $[0, \beta]$ to $[0, 1]$, is a valid cumulative occupancy path if and only if

(a) $\psi_i(x) \ge \psi_{i-1}(x)$,
(b) $\psi_i(x) \ge \psi_i(y)$,
(c) $\sum_{k=0}^{I} (\psi_k(x) - \psi_k(y)) \le y - x$,

for each $0 \le i \le I$ and $0 \le x < y \le \beta$.

Sketch of the proof. In the forward direction, the first condition plus the bound $\psi_I(x) \le 1$ ensure that $\gamma(x)$ is a probability distribution. The second and third conditions together imply that $\psi$ is Lipschitz continuous with constant 1, and hence absolutely continuous. This implies the absolute continuity of $\gamma$. Since $\dot\psi = -\theta$, the third condition ensures that $\theta$ is almost always a probability distribution. The reverse direction proceeds similarly.

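Given $\gamma, \theta \in S_I$, let $D(\theta \| \gamma)$ denote the relative entropy of $\theta$ with respect to $\gamma$. Thus

\[ D(\theta \| \gamma) \doteq \sum_{i=0}^{I+1} \theta_i \log(\theta_i/\gamma_i), \]

with the understanding that $\theta_i \log(\theta_i/\gamma_i) \doteq 0$ whenever $\theta_i = 0$, and that $\theta_i \log(\theta_i/\gamma_i) \doteq \infty$ if $\theta_i > 0$ and $\gamma_i = 0$. If $\gamma(x)$ is a valid occupancy function with corresponding occupancy rates $\theta(x)$, then we set

\[ J(\gamma) = \int_0^\beta D(\theta(x) \| \gamma(x))\,dx. \tag{2.2} \]

In all other cases, set $J(\gamma) = \infty$.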
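To make the definition of the rate function concrete, here is a small Python sketch of the relative entropy with the two conventions stated above, together with a midpoint-rule approximation of the integral defining $J$. The names `rel_entropy` and `path_cost`, the callable interface for $\theta$ and $\gamma$, and the number of quadrature steps are all assumptions of this sketch. The caller is responsible for supplying a pair with $\dot\gamma = M\theta$; the code does not check validity.

```python
import math

def rel_entropy(theta, gamma):
    """D(theta || gamma) with 0*log(0/.) = 0 and t*log(t/0) = inf for t > 0."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue
        if g == 0.0:
            return math.inf
        total += t * math.log(t / g)
    return total

def path_cost(theta, gamma, beta, steps=20000):
    """Midpoint-rule approximation of the integral of D(theta(x)||gamma(x)) on [0, beta].

    theta and gamma are callables returning probability vectors; they are
    assumed (not verified) to satisfy gamma' = M theta.
    """
    h = beta / steps
    return h * sum(rel_entropy(theta((k + 0.5) * h), gamma((k + 0.5) * h))
                   for k in range(steps))

# Hypothetical example with I = 0: gamma_0 decreases linearly from 1 to 1/2.
linear_gamma = lambda x: [1.0 - x / 2.0, x / 2.0]
linear_theta = lambda x: [0.5, 0.5]
print(path_cost(linear_theta, linear_gamma, beta=1.0))
```

For this linear example the integral can be done by hand and equals $1 - \log 2$, so the printed value should be close to that number; any path with $\theta = \gamma$ has cost zero.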

Theorem 2.2. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha \in S_I$ as $n \to \infty$. Then the sequence of processes $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies the LDP with rate function $J$. In other words, for any measurable set $A$ of trajectories, we have the large deviation lower bound

\[ \liminf_{n \to \infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \ge -\inf\{J(\gamma) : \gamma \in A^\circ,\ \gamma(0) = \alpha\} \]

and the large deviation upper bound

\[ \limsup_{n \to \infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \le -\inf\{J(\gamma) : \gamma \in \bar{A},\ \gamma(0) = \alpha\}, \]

where $A^\circ$ and $\bar{A}$ denote the interior and closure of $A$, and moreover, for any compact set of initial conditions $K$ and $C < \infty$, the set

\[ \{\gamma : J(\gamma) \le C,\ \gamma(0) \in K\} \]

is compact.

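The proof of this result is provided in Section 3. However, a formal justification for the form of the rate function is as follows. Let $\delta > 0$ be a small time increment. Owing to the fact that $\Gamma^n$ varies slowly when compared to $b_i(\gamma)$, one expects

\[ P\bigg( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx \eta \,\bigg|\, \Gamma^n(0) = \gamma \bigg) \approx P\bigg( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} b_j(\gamma) \approx \eta \bigg), \]

where the symbol “$\approx$” inside the probability means that the indicated quantities are within a small fixed constant $\varepsilon > 0$ of each other. Suppose that we interpret $\gamma$ as a probability measure on $\{e_0, \ldots, e_{I+1}\}$, and let $\{Y_j\}$ be independent and identically distributed (i.i.d.) with distribution $\gamma$. Then the sequence of i.i.d. random vectors $\{b_j(\gamma)\}$ can be realized by setting

\[ b_j(\gamma) = \begin{cases} e_{i+1} - e_i, \\ 0, \end{cases} \iff Y_j = \begin{cases} e_i, & 0 \le i \le I, \\ e_{I+1}, \end{cases} \]

that is,

\[ b_j(\gamma) = M Y_j. \]

By Sanov’s theorem, for any probability vector $\theta \in S_I$,

\[ P\bigg( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} Y_j \approx \theta \bigg) \approx \exp\{-n\delta D(\theta \| \gamma)\}. \]

Thus

\[ P\bigg( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx M\theta \,\bigg|\, \Gamma^n(0) = \gamma \bigg) \approx \exp\{-n\delta D(\theta \| \gamma)\}. \]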

Approximating an arbitrary trajectory by a piecewise linear trajectory with nearly equal cost and using the Markov property, one expects

\[ P(\Gamma^n \approx \gamma \mid \Gamma^n(0) = \alpha) \approx \exp\bigg\{ -n \int_0^\beta D(\theta(t) \| \gamma(t))\,dt \bigg\}, \]

where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus the rate function on path space should be $J(\gamma)$.

The zero cost trajectories are, of course, the law of large numbers limits, and can easily be computed. For example, if $\alpha = (1, 0, \ldots)$ (so all cells are initially empty) and $i \le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t} t^i / i!$. In other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass corresponding to $i > I$ is collected together into the state $I + 1$. Throughout this paper, we will denote the Poisson distribution with mean $t$ by $\mathcal{P}(t)$, where $\mathcal{P}_i(t) = e^{-t} t^i / i!$.
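The Poisson form of the zero cost path can be checked in a few lines. The derivation below is a sketch added for illustration; it uses only the definition (2.2) of $J$, the fact that relative entropy vanishes only when its two arguments coincide, and the relation $\dot\gamma = M\theta$.

```latex
% J(\gamma) = 0 forces D(\theta(t)\|\gamma(t)) = 0 for almost every t,
% hence \theta(t) = \gamma(t) a.e., and the path solves a linear ODE.
\begin{align*}
\dot\gamma(t) &= M\gamma(t), \qquad \gamma(0) = (1, 0, \ldots, 0), \\
\dot\gamma_0 &= -\gamma_0
  \;\Longrightarrow\; \gamma_0(t) = e^{-t}, \\
\dot\gamma_i &= \gamma_{i-1} - \gamma_i, \qquad 1 \le i \le I, \\
\frac{d}{dt}\Big[ e^{-t}\frac{t^i}{i!} \Big]
  &= e^{-t}\frac{t^{i-1}}{(i-1)!} - e^{-t}\frac{t^i}{i!}.
\end{align*}
% The last line shows that \gamma_i(t) = e^{-t} t^i / i! satisfies the
% equation for \gamma_i with \gamma_i(0) = 0 for i \ge 1, and
% \gamma_{I+}(t) = 1 - \sum_{i=0}^{I} \gamma_i(t) carries the remaining mass.
```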

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction principle (e.g., [11], Theorem 1.3.2) implies the following variational representation for the rate function for $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$. Let $A(\alpha, \omega, \beta)$ denote the set of valid occupancy paths $\gamma$ on $[0, \beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha$ as $n \to \infty$. Then the sequence of random vectors $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$ satisfies the LDP with the convex rate function

\[ \mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma \in A(\alpha, \omega, \beta)\}. \]

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7 and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is in fact strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation lower bound holds for sets with no interior relative to the ambient space. Such bounds can often be proved for processes such as ours that take values in a discrete lattice. An example would be a set $A \subset \{\gamma \in S_I : \gamma_{I+} = 0\}$, for which the minimizing trajectories must be polynomial extremals (defined after Theorem 2.6). Although we do not need such results in the present paper, it is worth observing that lower bounds of this kind can be proved.

2.2. Characterization of the terminal rate function and minimizing paths. The results of this section were obtained using calculus of variations techniques, the details of which are given in the Appendix. The presentation begins with the case in which all urns are initially empty, because it appears in many applications and because it is a building block for the general case. In each case, first we give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional forms contain parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha, \omega, \beta)$, where $\alpha, \omega \in S_I$ are the initial and terminal occupancies, and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss of generality that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I + 1$ balls, and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I + 1$.

Definition 2.1. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if the corresponding set of valid occupancy paths $A(\alpha, \omega, \beta)$ is nonempty.

Lemma 2.4. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if and only if

\[ \sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)} \tag{2.3} \]

and

\[ \sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{k=0}^{I+1} k\alpha_k + \beta \quad \text{(conservation)}. \tag{2.4} \]

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
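The two feasibility conditions are easy to test numerically. The following Python sketch checks the monotonicity and conservation conditions of Lemma 2.4 for vectors of length $I + 2$; the function name `is_feasible` and the tolerance parameter `tol` are assumptions of this example, introduced only to absorb floating point rounding.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity and conservation conditions of Lemma 2.4.

    alpha and omega are sequences of length I + 2; the last entry of alpha
    is alpha_{I+1} and the last entry of omega is omega_{I+}.
    """
    I = len(alpha) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:      # monotonicity fails at index i
            return False
    balls_initial = sum(k * a for k, a in enumerate(alpha))
    balls_final_lower = sum(i * w for i, w in enumerate(omega))
    return balls_final_lower <= balls_initial + beta + tol   # conservation

# Hypothetical examples with I = 1 and empty initial urns.
print(is_feasible([1.0, 0.0, 0.0], [0.5, 0.5, 0.0], beta=1.0))
print(is_feasible([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], beta=1.0))
```

In the second example every urn would need at least two balls while only one ball per urn is thrown, so conservation fails.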

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha, \omega, \beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i < I$. Otherwise, the constraints are termed reducible.

For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i} \alpha_k = \sum_{k=0}^{i} \omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i + 1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha, \omega, \beta)$ may be divided into a finite number of isolated irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $\mathcal{F}(1, \omega, \beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and the constraint

\[ \sum_{i=0}^{\infty} i\pi_i = \beta. \]

In the empty case, the conditions for feasibility of $(1, \omega, \beta)$ reduce to $\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \beta$, from which it follows that the set $\mathcal{F}$ is nonempty if and only if $(1, \omega, \beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $A(1, \omega, \beta)$ can be represented as an infimum of a relative entropy over distributions in $\mathcal{F}(1, \omega, \beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}^{+}$ defined in Corollary 2.3 may be determined as

\[ \mathcal{J}(\omega) = \min_{\pi \in \mathcal{F}(1, \omega, \beta)} D(\pi \| \mathcal{P}(\beta)) \]

if $(1, \omega, \beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^* \in \mathcal{F}(1, \omega, \beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show, in fact, that the solution takes the form $\pi^*_i = C\mathcal{P}_i(\rho\beta)$ for $i > I$, for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty} i\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty} \pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the equality $\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I + 1$ if necessary to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I} \omega_i = 0$. In this case, it turns out that $\mathcal{F}(1, \omega, \beta)$ has only one element; we then have $C = 0$, and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[ \frac{\rho\beta - \sum_{i=0}^{I} i\mathcal{P}_i(\rho\beta)}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{1 - \sum_{i=0}^{I} \omega_i}. \tag{2.5} \]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I + 1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0, \infty)$ to $(I+1, \infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[ C = \frac{1 - \sum_{i=0}^{I} \omega_i}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{\rho\beta - \sum_{i=0}^{I} i\mathcal{P}_i(\rho\beta)}. \tag{2.6} \]

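Evaluating the relative entropy of $\pi^*$ and $P(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

\[ \mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr) + \Bigl(\beta - \sum_{i=0}^{I} i\omega_i\Bigr)\log\rho. \tag{2.7} \]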
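As an illustration of how (2.5)-(2.7) might be evaluated in practice, the following minimal Python sketch finds $\rho$ by bisection on the conditional Poisson mean, then computes $C$ and the rate. The function names (`tail_sums`, `twist_parameters`, `rate_function`), the bisection bracket and the truncation tolerance are assumptions made for this example, not part of the paper; the sketch covers only the exponential case (strict inequality in the conservation condition) and moderate values of $I$.

```python
import math

def poisson_pmf(i, mean):
    """P_i(mean): Poisson probability of the value i."""
    return math.exp(-mean) * mean ** i / math.factorial(i)

def tail_sums(I, mean, tol=1e-17):
    """Return (sum_{i>I} P_i(mean), sum_{i>I} i*P_i(mean)) by direct summation."""
    i = I + 1
    term = poisson_pmf(i, mean)
    mass, first_moment = 0.0, 0.0
    while True:
        mass += term
        first_moment += i * term
        i += 1
        term *= mean / i          # P_i = P_{i-1} * mean / i
        if i > mean and term <= tol * mass:
            return mass, first_moment

def twist_parameters(omega, beta):
    """Solve (2.5) for rho by bisection, then obtain C from (2.6).

    omega = [omega_0, ..., omega_I]; empty initial conditions are assumed.
    """
    I = len(omega) - 1
    free_mass = 1.0 - sum(omega)                               # omega_{I+}
    free_balls = beta - sum(i * w for i, w in enumerate(omega))
    if free_mass <= 0.0 or free_balls / free_mass <= I + 1:
        raise ValueError("polynomial or infeasible case; not handled in this sketch")
    target = free_balls / free_mass                            # right side of (2.5)

    def conditional_mean(rho):                                 # left side of (2.5)
        mass, first_moment = tail_sums(I, rho * beta)
        return first_moment / mass

    lo, hi = 1e-6, 1.0
    while conditional_mean(hi) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if conditional_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    mass, _ = tail_sums(I, rho * beta)
    return rho, free_mass / mass

def rate_function(omega, beta):
    """Evaluate (2.7) using the twist parameters."""
    rho, C = twist_parameters(omega, beta)
    free_mass = 1.0 - sum(omega)
    free_balls = beta - sum(i * w for i, w in enumerate(omega))
    head = sum(w * math.log(w / poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0.0)
    return head + free_mass * (math.log(C) + (1.0 - rho) * beta) + free_balls * math.log(rho)
```

For instance, `rate_function([0.15], 3.0)` treats the case $I = 0$, $\beta = 3$, $\omega_0 = 0.15$; in that case the computed parameters should satisfy $C = 1/\rho$.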

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1,\omega,\beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $A(1,\omega,\beta)$ is achieved on the occupancy path $\gamma \in A(1,\omega,\beta)$ defined by

\[ \gamma_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{k}, \tag{2.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!}(-1)^i \gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x), \]

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$, $\gamma_0(x+y) = \gamma_0(x) + y\gamma_0^{(1)}(x) + \cdots$, evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term, and we say that $\gamma$ is a polynomial extremal. Otherwise, $C > 0$ and we have an exponential extremal.
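The path of Theorem 2.6 can be evaluated without symbolic differentiation. Differentiating (2.8) term by term and substituting into (2.9) gives, by a short calculation that is ours rather than the paper's, $\gamma_i(x) = C P_i(\rho x) + \sum_{k=i}^{I} (\omega_k - C P_k(\rho\beta)) \binom{k}{i} (x/\beta)^i (1 - x/\beta)^{k-i}$. The sketch below implements this closed form; the function name `extremal_path` is hypothetical, and the twist parameters are taken as inputs rather than solved for.

```python
import math

def poisson_pmf(i, mean):
    """P_i(mean): Poisson probability of the value i."""
    return math.exp(-mean) * mean ** i / math.factorial(i)

def binomial_pmf(k, i, p):
    """Probability of i successes in k trials with success probability p."""
    return math.comb(k, i) * p ** i * (1.0 - p) ** (k - i)

def extremal_path(omega, beta, rho, C, x):
    """Return [gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)] from (2.8)-(2.9).

    omega = [omega_0, ..., omega_I]; rho and C are the twist parameters
    (C = 0 gives the polynomial extremal).
    """
    I = len(omega) - 1
    # Coefficients of the polynomial part of gamma_0 in (2.8).
    a = [omega[k] - C * poisson_pmf(k, rho * beta) for k in range(I + 1)]
    gamma = []
    for i in range(I + 1):
        poly = sum(a[k] * binomial_pmf(k, i, x / beta) for k in range(i, I + 1))
        gamma.append(C * poisson_pmf(i, rho * x) + poly)
    gamma.append(1.0 - sum(gamma))   # gamma_{I+}
    return gamma
```

As a usage example, the polynomial case $\omega = (0.25, 0.5, 0.25)$, $\beta = 1$, $C = 0$ starts from all urns empty at $x = 0$ and reaches $\omega$ at $x = \beta$, with the mean occupancy equal to $x$ along the way.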

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k+j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $S_\infty$ denote the set of distributions on the nonnegative integers, and let $S_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $S_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi(0), \ldots, \pi(k), \ldots, \pi(K)\}$, where for any $k \in \mathcal{K}$ a component distribution is denoted $\pi(k) = \{\pi_{k0}, \pi_{k1}, \ldots\}$, and it is understood that the corresponding $\pi(k)$ is omitted if $\alpha_k = 0$. Finally, let $F(\alpha,\beta,\omega)$ be the set of $\pi \in S_\infty^{|\mathcal{K}|}$ which satisfy the terminal constraints

\[ \omega_i = \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \qquad \text{for all } 0 \le i \le I, \tag{2.10} \]

along with the conservation constraint

\[ \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta. \tag{2.11} \]

As in the empty case, it may be established that $F$ is nonempty if and only if $(\alpha,\omega,\beta)$ is feasible.

Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed as

\[ \mathcal{J}(\omega) = \min_{\pi \in F(\alpha,\omega,\beta)} \sum_{k \in \mathcal{K}} \alpha_k D(\pi(k) \,\|\, P(\beta)) \]

whenever $(\alpha,\beta,\omega)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in F(\alpha,\omega,\beta)$ is unique.

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[ \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k+j \le I, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k+j > I. \end{cases} \]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[ \pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j+k \le I, \\ 0, & k+j > I. \end{cases} \]

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in \mathcal{K}$, denote the mean of $\pi^*(k)$ by $\beta_k = \sum_{j=0}^{\infty} j\pi^*_{kj}$, and define the terminal condition $\omega(k) \in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0, \ldots, I-k$. Then the constraints $(1,\omega(k),\beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma(k) = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha,\omega,\beta)$, let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0,\beta_k] \to [0,1]$ be the minimizing paths corresponding to the subproblems $(1,\omega(k),\beta_k)$. The infimum of $J(\gamma)$ over $A(\alpha,\omega,\beta)$ is achieved on the occupancy path $\gamma \in A(\alpha,\omega,\beta)$ defined by

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x). \]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define

\[ H(\gamma,\zeta) = \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr), \]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma,\zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma,\zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies a large deviation upper bound with a rate function $\bar{J}$, which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma,\zeta)$ in $\zeta$:

\[ L(\gamma,\eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma,\zeta)]. \]

If $\{\gamma(x), 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then

\[ \bar{J}(\gamma) = \int_0^\beta L(\gamma(x), \dot{\gamma}(x))\,dx. \]

If $\gamma$ is not absolutely continuous, then $\bar{J}(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma,\zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma,\zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$ then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar{J} = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma,\eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[ L(\gamma, M\theta) = D(\theta \,\|\, \gamma). \]

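It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma,\eta) < \infty$, then $\eta$ can be written as a convex combination of the form

\[ \theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I), \]

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \ldots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

\begin{align*}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} \bigl[\zeta \cdot M\theta - \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr)\bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Biggl[ M^T\zeta \cdot \theta - \log\Biggl( \Biggl[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Biggr] + \gamma_{I+1} \Biggr) \Biggr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\Biggl( \Biggl[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Biggr] + \gamma_{I+1} \Biggr) \Biggr].
\end{align*}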

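Given any values $\mu_0, \ldots, \mu_{I+1}$, we can define $\zeta_0, \ldots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions, it is apparent that the last display is equal to

\begin{align*}
&\sup_{\mu \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\Biggl( \Biggl[\sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1})\Biggr] + \gamma_{I+1} \Biggr) \Biggr] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\Biggl( \sum_{j=0}^{I+1} \gamma_j \exp \mu_j \Biggr) \Biggr] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\Biggl( \sum_{j=0}^{I+1} \gamma_j \exp \mu_j \Biggr) \Biggr].
\end{align*}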

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \,\|\, \gamma)$, thereby completing the proof of the upper bound.
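The variational formula used in this last step can be checked numerically on a small example. The sketch below is only an illustration with made-up vectors: it evaluates the objective $\sum_j \mu_j\theta_j - \log\sum_j \gamma_j e^{\mu_j}$ at the candidate maximizer $\mu_j = \log(\theta_j/\gamma_j)$ (a standard choice, assumed here rather than taken from the paper) and compares it with the relative entropy. The function names are hypothetical.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors with gamma_j > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0.0)

def dv_objective(mu, theta, gamma):
    """The bracketed expression: mu . theta - log(sum_j gamma_j * exp(mu_j))."""
    linear = sum(m * t for m, t in zip(mu, theta))
    log_mgf = math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))
    return linear - log_mgf

# Illustrative probability vectors (not from the paper).
theta = [0.2, 0.5, 0.3]
gamma = [0.6, 0.3, 0.1]

# Candidate maximizer: the log-likelihood ratio.
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
print(dv_objective(mu_star, theta, gamma), relative_entropy(theta, gamma))
```

The two printed numbers should agree up to rounding, and any other choice of $\mu$ should give a value no larger than the relative entropy.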

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

\[ \liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Bigl( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \Bigr) \ge -J(\gamma) - \varepsilon. \tag{3.1} \]

Of course, this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > b x^K$ for all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$, and such that

\[ J(y) \le J(\gamma). \]

Consider the zero cost trajectory defined by

\[ \dot{z}(x) = M z(x), \qquad z(0) = \gamma(0). \]

We have the following expression for $z_j$ when $j \le I$:

\[ z_j(x) = \Biggl[ \sum_{k=0}^{j} \gamma_k(0) \frac{x^{j-k}}{(j-k)!} \Biggr] e^{-x}. \tag{3.2} \]

It is easy to check from this explicit formula that $z_j(x) > b x^K$ for some $b > 0$, $K = I$, and all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0,1)$, let $y^\rho = \rho z + (1-\rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \,\|\, z) = 0$, we have

\begin{align*}
J(y^\rho) &= \int_0^\beta D\bigl(\rho z(x) + (1-\rho)\theta(x) \,\big\|\, \rho z(x) + (1-\rho)\gamma(x)\bigr)\,dx \\
&\le \rho \int_0^\beta D(z(x) \,\|\, z(x))\,dx + (1-\rho) \int_0^\beta D(\theta(x) \,\|\, \gamma(x))\,dx \\
&= (1-\rho) J(\gamma).
\end{align*}

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0,1)$.

It follows that in proving the lower bound we can assume without loss of generality that, for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge b x^K$ for all $x \in [0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0,\beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

\[ h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else}. \end{cases} \]

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor/n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0,\beta]$), we have the inequality

\begin{align*}
& P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma\bigr) + e^{-2n\varepsilon} \\
&\qquad \ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon} \\
&\qquad \ge E_{\Gamma^n(0)}\bigl(\exp\{-n h(\Gamma^n(\lfloor n\tau \rfloor/n))\}\bigr).
\end{align*}

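We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar{\Gamma}^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar{b}^n_i$:

\[ \bar{\Gamma}^n\Bigl(\frac{i+1}{n}\Bigr) = \bar{\Gamma}^n\Bigl(\frac{i}{n}\Bigr) + \frac{1}{n}\bar{b}^n_i, \qquad \bar{\Gamma}^n(0) = \Gamma^n(0). \]

Furthermore, the distribution of $\bar{b}^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar{\Gamma}^n(j/n), 0 \le j \le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar{\mu}^n_i$ denote the (random) distribution of $\bar{b}^n_i$ given $\{\bar{\Gamma}^n(j/n), 0 \le j \le i\}$. We let $\bar{E}_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\begin{align*}
& -\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl( \exp\Bigl\{-n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr)\Bigr\} \Bigr) \\
&\qquad = \inf \bar{E}_{\Gamma^n(0)} \Biggl[ h\Bigl(\bar{\Gamma}^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D\bigl(\bar{\mu}^n_i \,\big\|\, \mu_{\bar{\Gamma}^n(i/n)}\bigr) \Biggr],
\end{align*}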

where the infimum is over all such processes $\bar{\Gamma}^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar{b}^n_i$. We can write $\gamma(\tau) - \gamma(0) = M v\tau$ for some probability vector $v$. Define a process $\bar{b}^n_i$ as follows:

\[ \bar{b}^n_i = \begin{cases} e_j - e_{j-1}, & \text{if } \Bigl\lfloor \sum_{k=0}^{j-1} v_k\tau n \Bigr\rfloor \le i \le \Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor - 1, \text{ for } 0 \le j \le I, \\[4pt] 0, & \text{if } \Bigl\lfloor \sum_{k=0}^{I} v_k\tau n \Bigr\rfloor \le i \le \lfloor \tau n \rfloor - 1. \end{cases} \]

In other words, $\bar{b}^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar{\mu}^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[ D\bigl(\bar{\mu}^n_i \,\big\|\, \mu_{\bar{\Gamma}^n(i/n)}\bigr) = \log\Bigl(\frac{1}{\bar{\Gamma}^n_{j-1}(i/n)}\Bigr) = -\log\Bigl(\bar{\Gamma}^n_{j-1}\Bigl(\frac{i}{n}\Bigr)\Bigr). \tag{3.3} \]

The process $\bar{\Gamma}^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar{\Gamma}^n_{j-1}((i+1)/n) - \bar{\Gamma}^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k\tau n \rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor - 1$,

\[ \bar{\Gamma}^n_{j-1}\Bigl(\frac{i}{n}\Bigr) \downarrow \bar{\Gamma}^n_{j-1}\Biggl(\frac{1}{n}\Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor\Biggr) \]

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor$, it follows that

\[ \bar{\Gamma}^n_{j-1}\Biggl(\frac{1}{n}\Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor\Biggr) \to \gamma_{j-1}(\tau) \]

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar{\Gamma}^n$, which in turn provides a finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (independent of $\tau$) such that, whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$

\[ D\bigl(\bar{\mu}^n_i \,\big\|\, \mu_{\bar{\Gamma}^n(i/n)}\bigr) \le C_1[-\log \tau^K] \le -C_2 \log \tau. \]

In addition, as $n \to \infty$ and $\eta \to 0$,

\[ \bar{\Gamma}^n\Bigl(\frac{1}{n}\lfloor \tau n \rfloor\Bigr) \to \gamma(\tau). \]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[ -\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl( \exp\Bigl\{-n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr)\Bigr\} \Bigr) \le -C_2\tau \log \tau. \]

We now choose $\tau > 0$ so that $-C_2\tau \log \tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0,\tau]$ w.p.1 if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[ \liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Biggl( \Bigl|\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr) - \gamma\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr| < \sigma,\ \sup_{x \in [0,\tau]} |\Gamma^n(x) - \gamma(x)| < \delta \Biggr) \ge -\frac{\varepsilon}{2}. \]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau,\beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma$,

\[ \liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau \rfloor/n),\,\lfloor n\tau \rfloor/n}\Bigl( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \Bigr) \ge -J(\gamma) - \frac{\varepsilon}{2}, \]

where $P_{\Gamma^n(\lfloor n\tau \rfloor/n),\,\lfloor n\tau \rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau \rfloor/n)$ at time $\lfloor n\tau \rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[ \mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega). \]

Using Theorem 2.7, we can write

\begin{align*}
\mathcal{J}(\Omega) &= \inf_{\omega \in \Omega}\ \inf_{\pi \in F(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi(k) \,\|\, P(\beta)) \\
&= \inf_{\pi \in F(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi(k) \,\|\, P(\beta)), \tag{4.1}
\end{align*}

where we have abused notation to define $F(\alpha,\Omega,\beta) = \bigcup_{\omega \in \Omega} F(\alpha,\omega,\beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third example below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar{\Omega}$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega \in \Omega^\circ} \mathcal{J}(\omega) = \inf_{\omega \in \bar{\Omega}} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n} \log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.

4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[ \gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}. \]

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

\[ \rho(1 - \omega_0) = 1 - e^{-\beta\rho}. \]

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
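For the classical problem, the computation reduces to a single scalar root. The following sketch, whose function names and bisection bracket are assumptions made for this illustration, finds the positive root of $\rho(1-\omega_0) = 1 - e^{-\beta\rho}$, evaluates the path $\gamma_0(x) = e^{-\rho x}/\rho + 1 - 1/\rho$, and evaluates (2.7) with $I = 0$ and $C = 1/\rho$. It assumes $1 - \omega_0 < \beta$, so that the constraints are strictly feasible.

```python
import math

def classical_rho(omega0, beta):
    """Positive root of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection."""
    def f(rho):
        # expm1 avoids cancellation in exp(-beta*rho) - 1 for small rho.
        return rho * (1.0 - omega0) + math.expm1(-beta * rho)
    lo = 1e-9                          # f(lo) < 0 when 1 - omega0 < beta
    hi = 1.0 / (1.0 - omega0) + 1.0    # f(hi) > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Least cost fraction of empty urns after x balls per urn."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

def classical_rate(omega0, beta):
    """Equation (2.7) specialized to I = 0, with C = 1/rho."""
    rho = classical_rho(omega0, beta)
    return (omega0 * math.log(omega0 / math.exp(-beta))
            + (1.0 - omega0) * (-math.log(rho) + (1.0 - rho) * beta)
            + beta * math.log(rho))

rho = classical_rho(0.15, 3.0)
print(rho, gamma0(3.0, rho), classical_rate(0.15, 3.0))
```

With $\beta = 3$ and $\omega_0 = 0.15$, the printed terminal value `gamma0(3.0, rho)` should reproduce $\omega_0$, which serves as a simple consistency check on the root.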

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[ w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n), \]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

\[ w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\Gamma_i(r/n). \]
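The identity for $w(r)$ can be checked by direct simulation. The sketch below is an illustration only (the function name, seed and parameter values are arbitrary choices): it throws $r$ balls uniformly into $n$ infinite-capacity urns, counts the overflow directly as the balls in excess of capacity $I$, and compares the result with the formula in terms of the occupancy fractions $\Gamma_i$.

```python
import random

def overflow_two_ways(n, r, I, seed=0):
    """Return (direct overflow count, overflow from the occupancy formula)."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1      # each ball picks an urn uniformly

    # Direct count: balls beyond capacity I would have fallen on the floor.
    direct = sum(max(c - I, 0) for c in counts)

    # Occupancy fractions Gamma_i for i = 0, ..., I.
    fractions = [sum(1 for c in counts if c == i) / n for i in range(I + 1)]
    via_formula = r - n * I + n * sum((I - i) * fractions[i] for i in range(I + 1))
    return direct, via_formula

print(overflow_two_ways(n=1000, r=3000, I=2))
```

The two returned values should agree up to floating-point rounding for any choice of `n`, `r` and `I`.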

In order to compute

\[ J_O(\eta,\beta) = -\lim_{n} \frac{1}{n} \log P\Bigl( \frac{w(\beta n)}{n} > \eta \Bigr), \]

we therefore consider sample paths which satisfy the end constraint

\[ \sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2} \]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

\[ \sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad \text{and} \qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3} \]

We introduce the Lagrangian

\[ L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr) + \lambda\Bigl(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\Bigr) \]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[ \log \pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+\lambda. \]

In terms of variables $C$, $\rho$ and $\nu$, we may write

\[ \pi^*_i = C P_i(\rho\beta) \qquad \text{for } i \ge I \]

and

\[ \pi^*_i = C P_i(\rho\beta)\nu^{I-i} \qquad \text{for } i \le I. \]

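The distribution is conditionally Poisson($\rho\beta/\nu$) for $i \le I$ and conditionally Poisson($\rho\beta$) for $i \ge I$. For convenience, we introduce the notation

\[ Q_I(\rho) = \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} i P_i(\rho\beta), \]

\[ R_I(\rho,\nu) = \nu^I e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\Bigl(\frac{\rho\beta}{\nu}\Bigr) = \frac{\nu^{I+1}}{\rho\beta} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} i P_i\Bigl(\frac{\rho\beta}{\nu}\Bigr). \]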

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[ C\bigl(R_I(\rho,\nu) + Q_I(\rho)\bigr) = 1, \]

\[ C\Bigl(\frac{\rho\beta}{\nu} R_I(\rho,\nu) + \rho\beta Q_I(\rho)\Bigr) = \beta, \]

\[ C\Bigl(I P_I(\rho\beta) + \Bigl(I - \frac{\rho\beta}{\nu}\Bigr) R_I(\rho,\nu)\Bigr) = \zeta. \]

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \,\|\, P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[ (\rho/\nu - 1) R_I(\rho,\nu) + (\rho - 1) Q_I(\rho) = 0, \]

\[ (I - \rho\beta/\nu - \zeta) R_I(\rho,\nu) - \zeta Q_I(\rho) + I P_I(\rho\beta) = 0. \]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

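The large deviations exponent for the overflow problem may then be expressed as

\begin{align*}
J_O(\eta,\beta) &= D(\pi^* \,\|\, P(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl(C e^{\beta - \rho\beta} \rho^i \nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1 - \rho) + \beta \log \rho + \zeta \log \nu.
\end{align*}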

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[ J_C(\alpha,\beta,\xi) = -\lim_{n} \frac{1}{n} \log P\Bigl( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \Bigr), \]

where

\[ \xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta) \]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[ \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi, \]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[ \omega_{kj} = \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W, & j + k \le I, \\ C_k P_j(\rho\beta), & j + k > I. \end{cases} \]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[ J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log \rho) + \xi \log W + \sum_{k=0}^{K} \alpha_k \log C_k. \]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
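The zero-cost figures quoted in this example can be reproduced with a few lines of arithmetic. In the zero-cost solution each urn receives a Poisson($\beta$) number of additional balls, so the expected fraction of types with at most $I$ coupons is $\sum_k \alpha_k \sum_{j \le I-k} P_j(\beta)$. The sketch below (function names are ours) evaluates this sum and also the rough estimate $e^{-nJ_C}$, taking the value $J_C \approx 0.18$ from the text rather than computing it.

```python
import math

def poisson_cdf(m, mean):
    """P(Y <= m) for Y ~ Poisson(mean); zero if m < 0."""
    return sum(math.exp(-mean) * mean ** j / math.factorial(j) for j in range(m + 1))

def zero_cost_low_occupancy(alpha, I, beta):
    """Expected fraction of urns holding at most I balls after beta balls per urn."""
    return sum(a * poisson_cdf(I - k, beta) for k, a in enumerate(alpha))

psi3 = zero_cost_low_occupancy([0.5, 0.3, 0.2], I=3, beta=2.0)
expected_complete_types = 100 * (1.0 - psi3)     # types with 4 or more coupons
rough_probability = math.exp(-100 * 0.18)        # exp(-n * J_C), J_C taken from the text
print(round(psi3, 2), round(expected_complete_types), rough_probability)
```

The large deviations exponent only determines the probability up to subexponential factors, so the last number should be read as an order-of-magnitude estimate.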

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible, from the point of view of large deviations, by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

A1 Preliminaries

A2 Proof of Lemma 24 Recall that the lemma states that endpointconstraints (αωβ) are feasible if and only if

isum

j=0

αj geisum

j=0

ωj i= 0 I (monotonicity)(A4)

and

Isum

i=0

iωi + (I +1)ωI+ leI+1sum

i=0

iαi + β (conservation)(A5)

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0,\beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I}\Biggl(\sum_{j=0}^{i}\alpha_j - \sum_{j=0}^{i}\omega_j\Biggr)
&= \sum_{j=0}^{I}(I+1-j)(\alpha_j - \omega_j) \\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0,\ldots,I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i \le 1$ follows from (A.5).
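The feasibility test in the lemma is easy to check numerically. The following Python sketch is illustrative only: the function names `is_feasible` and `linear_path` and the list layout (`alpha` holding $\alpha_0,\ldots,\alpha_{I+1}$, `omega` holding $\omega_0,\ldots,\omega_I,\omega_{I+}$) are assumptions made here, not notation from the paper.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity and conservation conditions of Lemma 2.4.

    alpha = [alpha_0, ..., alpha_{I+1}], omega = [omega_0, ..., omega_I, omega_{I+}]
    (hypothetical layout chosen for this sketch).
    """
    I = len(omega) - 2
    # Monotonicity: partial sums of alpha dominate partial sums of omega.
    a_cum = w_cum = 0.0
    for i in range(I + 1):
        a_cum += alpha[i]
        w_cum += omega[i]
        if a_cum < w_cum - tol:
            return False
    # Conservation: the terminal ball count cannot exceed initial count plus beta.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol


def linear_path(alpha, omega, beta, x):
    """Linear interpolation gamma(x) between alpha (x = 0) and omega (x = beta)."""
    return [a + (w - a) * x / beta for a, w in zip(alpha, omega)]
```

For instance, with all urns initially empty and `omega = [0.2, 0.3, 0.3, 0.2]`, the conservation condition requires at least 1.5 balls per urn, so `beta = 2.0` passes and `beta = 1.0` fails.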

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0,\ldots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^{\beta} L(\psi(x),\theta(x))\,dx.
\]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi,\theta) &= D(\theta\,\|\,\gamma) \\
&= \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \Biggl(1 - \sum_{i=0}^{I}\theta_i\Biggr)\log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}.
\end{aligned}
\]

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[
\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x)) = -\frac{d}{dx}\,\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x))
\]

for all $i \in \{0,\ldots,I\}$, $x \in (0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\Biggl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Biggr]
\tag{A.7}
\]

for $i = 0,\ldots,I-1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I}
= \frac{d}{dx}\Biggl[-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Biggr].
\tag{A.8}
\]

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

\[
L(\psi,\theta) = \sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}}.
\tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\Biggl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\Biggr]
\tag{A.10}
\]

for $i = 0,\ldots,I-1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty: $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$, which satisfy the equations

\[
\sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} C\,P_i(\rho\beta) = 1,
\qquad
\sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C\,P_i(\rho\beta) = \beta.
\]
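In the exponential case these two equations can be solved numerically. Dividing the second tail sum by the first eliminates $C$ and leaves a single equation in $\lambda = \rho\beta$: the mean of a Poisson($\lambda$) variable conditioned to exceed $I$ must equal $(\beta - \sum i\omega_i)/(1 - \sum\omega_i)$. The sketch below solves it by bisection, which is a choice made for this illustration and not the paper's method; it assumes $P_i(\lambda) = e^{-\lambda}\lambda^i/i!$, strict inequality in the conservation condition, and moderate values of $I$ and $\beta$. The names `poisson_tail` and `twist_parameters` are hypothetical.

```python
import math


def poisson_tail(lam, I, terms=400):
    """Return (sum_{i>I} P_i(lam), sum_{i>I} i * P_i(lam)) by direct summation."""
    i = I + 1
    p = math.exp(-lam) * lam ** i / math.factorial(i)
    mass = mean = 0.0
    for _ in range(terms):
        mass += p
        mean += i * p
        i += 1
        p *= lam / i  # P_i(lam) = P_{i-1}(lam) * lam / i
    return mass, mean


def twist_parameters(omega, beta, iters=200):
    """Solve the two twist equations for (C, rho), given omega = [omega_0..omega_I]."""
    I = len(omega) - 1
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    if tail_mass <= 0.0 or tail_mean <= (I + 1) * tail_mass:
        raise ValueError("exponential case requires strict conservation inequality")
    target = tail_mean / tail_mass

    def conditional_mean(lam):
        mass, mean = poisson_tail(lam, I)
        return mean / mass

    # The conditional mean increases in lam from I + 1 upward, so bracket and bisect.
    lo, hi = 1e-9, 1.0
    while conditional_mean(hi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if conditional_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    mass, _ = poisson_tail(lam, I)
    return tail_mass / mass, lam / beta
```

As a sanity check, when $\omega_i = P_i(\beta)$ for $i \le I$ (the zero cost terminal distribution), the solver should return $C = 1$ and $\rho = 1$.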

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I}\bigl(\omega_k - C P_k(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{k},
\tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0 \le i \le I,
\tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x)
\]

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I}\omega_k\Bigl(1 - \frac{x}{\beta}\Bigr)^{k}.
\tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.
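The extremal can be evaluated directly by differentiating $\psi_0$ term by term: the exponential contributes $\rho^i C e^{-\rho x}$ to $(-1)^i\psi_0^{(i)}(x)$, and each polynomial term contributes a falling-factorial multiple of $(1 - x/\beta)^{k-i}$. The Python sketch below is illustrative; the function names are hypothetical and $P_k(\lambda) = e^{-\lambda}\lambda^k/k!$ is assumed. Note that $\gamma_i(\beta) = \omega_i$ holds algebraically for any $C$ and $\rho$, whereas the initial condition $\psi_0(0) = 1$ needs the twist equations.

```python
import math


def poisson_pmf(i, lam):
    """Poisson probability P_i(lam) = exp(-lam) * lam**i / i!."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def signed_derivative(i, x, omega, beta, rho, C):
    """Return (-1)**i times the i-th derivative of psi_0 at x."""
    I = len(omega) - 1
    value = C * rho ** i * math.exp(-rho * x)
    for k in range(i, I + 1):  # empty when i > I, leaving only the exponential
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)
        value += coeff * falling * beta ** (-i) * (1.0 - x / beta) ** (k - i)
    return value


def gamma(i, x, omega, beta, rho, C):
    """Occupancy fraction gamma_i(x) = x**i / i! * (-1)**i * psi_0^(i)(x)."""
    return x ** i / math.factorial(i) * signed_derivative(i, x, omega, beta, rho, C)
```

With $\omega_i = P_i(\beta)$, $C = 1$ and $\rho = 1$, the polynomial coefficients vanish and `gamma(i, x, ...)` reduces to the Poisson law $P_i(x)$, the zero cost path.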

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\phi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$, with

\[
(-1)^i\phi^{(i)}(x) \ge 0 \qquad\text{for all } x \in [a,b] \text{ and } i > 0.
\tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0,\beta)$ and $i = 0,\ldots,I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I}\bigl(\omega_k - C P_k(\rho\beta)\bigr)\beta^{-i}\frac{k!}{(k-i)!}\Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0,\ldots,I$, and

\[
(-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
\tag{A.12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0, 1, \ldots$. We deduce that $\phi(x) \doteq (-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[
\begin{aligned}
\phi(\beta) &= (-1)^I\psi_0^{(I)}(\beta) \\
&= \rho^I C e^{-\rho\beta} + \Bigl(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I} I! \\
&= \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I \ge 0,
\end{aligned}
\]

so that $\phi(x) > 0$ on $[0,\beta)$. It follows that $\phi(x)$ is completely monotone on $[0,\beta]$, and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case, $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I\psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I > 0 \qquad\text{for all } x \in [0,\beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Biggl(1 - \sum_{i=0}^{I}P_i(\rho\beta)\Biggr) + \sum_{i=0}^{I}\omega_i = 1.
\]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I}\theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus, the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case, $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty}P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty}P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\ i \ge 0.
\tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr]
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta),
\tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case, $C = 0$, note that (A.15) holds for $i = 0,\ldots,I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$ they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

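Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[
(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \qquad\text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0,\ldots,\gamma_I,\gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\Bigl[\gamma_i(\beta)\log\bigl|\psi_0^{(i)}(\beta)\bigr| - \gamma_i(0)\log\bigl|\psi_0^{(i)}(0)\bigr|\Bigr].
\tag{A.17}
\]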

Proof. For all $i > I$ and $x \in [0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
+ (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\bigl|\psi_0^{(i+1)}(x)\bigr|\,dx
+ \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log\bigl|\psi_0^{(i)}(x)\bigr|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log\bigl|\psi_0^{(i)}(x)\bigr|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$, since for $i > I$

\[
-\dot\psi_i(x)\log\bigl|\psi_0^{(i)}(x)\bigr| = \rho\,\gamma_i(x)\,[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty}\Bigl[\gamma_i\log\bigl|\psi_0^{(i)}\bigr|\Bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\ldots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\ldots,I-1,
\]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I}\Bigl[\gamma_i(\beta)\log\bigl|\psi_0^{(i)}(\beta)\bigr| - \gamma_i(0)\log\bigl|\psi_0^{(i)}(0)\bigr|\Bigr].
\tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,\mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi\in F(1,\omega,\beta)} D(\pi\,\|\,\mathcal{P}(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\ldots,I$, and define the Lagrangian $\mathcal{L}$ on this set to be

\[
\mathcal{L}(\pi,y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\Biggl(1 - \sum_{i=0}^{\infty}\pi_i\Biggr) + z\Biggl(\beta - \sum_{i=0}^{\infty} i\,\pi_i\Biggr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi_i^* \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1, with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi_i^*$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi_i^*$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - z\,i\,\pi_i \ge \pi_i^*\log\frac{\pi_i^*}{P_i(\beta)} - y\pi_i^* - z\,i\,\pi_i^*.
\]

Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi\in F(1,\omega,\beta)} D(\pi\,\|\,\mathcal{P}(\beta)) &= \inf_{\pi\in F(1,\omega,\beta)}\mathcal{L}(\pi,y,z) \\
&\ge \inf_{\pi\in\mathbb{R}_+^{\infty}}\mathcal{L}(\pi,y,z) \\
&= D(\pi^*\,\|\,\mathcal{P}(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi_i^*$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,\mathcal{P}(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)}
+ \Biggl(1 - \sum_{i=0}^{I}\omega_i\Biggr)\bigl(\log C + (1-\rho)\beta\bigr)
+ \Biggl(\beta - \sum_{i=0}^{I} i\,\omega_i\Biggr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
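The explicit cost formula can be compared numerically with a direct evaluation of the relative entropy between the terminal distribution ($\omega_i$ for $i \le I$, $C P_i(\rho\beta)$ for $i > I$) and the Poisson($\beta$) law. The sketch below is a hedged illustration: the function names are hypothetical, logarithms of Poisson probabilities are computed with `math.lgamma` to avoid overflow, and the inputs are assumed to satisfy the two twist equations.

```python
import math


def log_poisson_pmf(i, lam):
    """log P_i(lam) for lam > 0, using lgamma(i + 1) = log(i!)."""
    return i * math.log(lam) - lam - math.lgamma(i + 1)


def head_cost(omega, beta):
    """Contribution of the constrained levels 0..I to the relative entropy."""
    return sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)


def extremal_cost(omega, beta, rho, C):
    """Closed-form cost of an exponential extremal."""
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return (head_cost(omega, beta)
            + tail_mass * (math.log(C) + (1.0 - rho) * beta)
            + tail_mean * math.log(rho))


def relative_entropy_direct(omega, beta, rho, C, terms=200):
    """Sum pi_i * log(pi_i / P_i(beta)) term by term, truncating the tail."""
    I = len(omega) - 1
    total = head_cost(omega, beta)
    for i in range(I + 1, I + 1 + terms):
        log_p = math.log(C) + log_poisson_pmf(i, rho * beta)
        total += math.exp(log_p) * (log_p - log_poisson_pmf(i, beta))
    return total
```

The two functions agree only when $(C,\rho)$ and $(\omega,\beta)$ are consistent with the twist equations; for arbitrary inputs the closed form is not a relative entropy.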

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi\in S_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta))
\tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\pi_{kj} = \beta
\tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\,\pi_{k,i-k} \qquad\text{for all } 0 \le i \le I.
\tag{A.22}
\]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I,
\tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega \in S_{\bar I}$, where $\bar I > I$. Any feasible point for constraints $(\alpha,\bar\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
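The ordered filling construction can be sketched as a greedy allocation. The version below is one possible reading of the construction, not the paper's specification: at each terminal level $i$ it draws mass from the eligible classes $k \le i$ in increasing order of $k$ (the proof only says mass is applied "as necessary"). It assumes polynomial constraints, with `alpha` and `omega` each summing to one and no initial occupancy above $I$; the function name `ordered_filling` is hypothetical.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints.

    alpha[k]: initial fraction of urns with k balls; omega[i]: terminal fraction
    with i balls. Returns pi[k][j], the fraction of class-k urns that receive
    j additional balls, for every class k with alpha[k] > 0.
    """
    I = len(omega) - 1
    remaining = list(alpha)  # unassigned urn mass in each class
    pi = {k: [0.0] * (I - k + 1) for k, a in enumerate(alpha) if a > 0}
    for i, need in enumerate(omega):
        for k in sorted(pi):
            if k > i:
                break  # urns never lose balls, so class k cannot end at level i < k
            take = min(need, remaining[k])
            pi[k][i - k] += take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition fails at level %d" % i)
    return pi
```

For `alpha = [0.5, 0.5]` and `omega = [0.2, 0.3, 0.5]` this yields `pi[0] = [0.4, 0.6, 0.0]` and `pi[1] = [0.0, 1.0]` up to rounding, and the mean number of added balls per urn is $0.8 = \sum i\omega_i - \sum k\alpha_k$, as equality in the conservation condition requires.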

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i\,:\,P_i > Q_i}(P_i - Q_i) = \sum_{i\,:\,Q_i > P_i}(Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A = \Biggl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_{k}\alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le A\Biggr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha,\omega,\beta) \doteq F(\alpha,\omega,\beta)\cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j\ge m} j\,\pi^{(n)}_{kj} \le \varepsilon
\tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]

where, by convention, $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty}e^{j/\delta}P_j(\beta) = \sum_{j=0}^{\infty}e^{j/\delta}e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta}-\beta} < \infty.
\]
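Both ingredients of the uniform integrability estimate are easy to check numerically. The following sketch (function names hypothetical, $P_j(\beta) = e^{-\beta}\beta^j/j!$ assumed) sums the exponential moment series directly and evaluates the gap in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$, which should be nonnegative and vanish at $x = e^y$.

```python
import math


def poisson_exp_moment(beta, delta, terms=400):
    """Sum_j exp(j / delta) * P_j(beta), accumulated term by term."""
    ratio = beta * math.exp(1.0 / delta)
    term = math.exp(-beta)  # j = 0 term
    total = 0.0
    for j in range(terms):
        total += term
        term *= ratio / (j + 1)
    return total


def young_gap(x, y):
    """(x log x - x + 1) + (exp(y) - 1) - x*y, with the convention 0 log 0 = 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y
```

The fixed number of terms is adequate only for moderate values of $\beta e^{1/\delta}$; a production implementation would choose the truncation point adaptively.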

Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j\ge m} j\,\pi^{(n)}_{kj}
&= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j\ge 0}\delta\Biggl[\pi^{(n)}_{kj}\log\Biggl(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\Biggr) - \pi^{(n)}_{kj} + P_j(\beta)\Biggr] + \delta\sum_{j\ge m}\bigl(e^{j/\delta} - 1\bigr)P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta)) + \delta\sum_{j\ge m}\bigl(e^{j/\delta} - 1\bigr)P_j(\beta) \\
&\le \delta A + \delta\sum_{j\ge m}\bigl(e^{j/\delta} - 1\bigr)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k}\alpha_k\beta_k = \lim_{n}\sum_{k}\alpha_k\beta^{(n)}_k = \lim_{n}\beta = \beta,
\]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k\in\mathcal{K}}\alpha_k D(\pi^*_{(k)}\,\|\,\mathcal{P}(\beta)) \le \liminf_{n}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$, such that

\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n}\mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A > \liminf_n\mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{kj}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,\mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$ even if $\alpha_k = 0$.
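The closed form $\beta - \lambda + \lambda\log(\lambda/\beta)$ is the relative entropy of a Poisson($\lambda$) law with respect to a Poisson($\beta$) law, which is the minimizer among distributions with mean $\lambda$ by the usual exponential tilting argument. A short numerical check, with hypothetical function names and a fixed truncation that assumes moderate $\lambda$ and $\beta$:

```python
import math


def log_poisson_pmf(j, lam):
    """log P_j(lam) for lam > 0."""
    return j * math.log(lam) - lam - math.lgamma(j + 1)


def poisson_relative_entropy(lam, beta, terms=300):
    """D(Poisson(lam) || Poisson(beta)) by direct summation of the series."""
    total = 0.0
    for j in range(terms):
        log_p = log_poisson_pmf(j, lam)
        total += math.exp(log_p) * (log_p - log_poisson_pmf(j, beta))
    return total


def closed_form(lam, beta):
    """beta - lam + lam * log(lam / beta)."""
    return beta - lam + lam * math.log(lam / beta)
```

Any other distribution with the same mean costs more: for example, a point mass at 3 has relative entropy $-\log P_3(\beta)$ with respect to Poisson($\beta$), which exceeds `closed_form(3.0, beta)`.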

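Thus,

\[
\sum_{k\in\mathcal{K}}\beta_k\alpha_k = \lim_{n}\sum_{k}\beta^{(n)}_k\alpha^{(n)}_k = \lim_{n}\beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_{n}\sum_{k=0}^{K}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta^{(n)}))
&\ge \liminf_{n}\sum_{k\,:\,\alpha_k>0}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k\,:\,\alpha_k>0}\alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.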

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\bar\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\bar\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{mn} > 0$, that $\tilde\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i}\alpha_k\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\tilde\pi_{kj},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta > 0$ we may define

\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1-\eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1-\eta).
\]

By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\check\pi = \eta\tilde\pi + (1-\eta)\hat\pi$. By construction, $\check\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\check\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\check\pi$ also has finite support.

We will show that $\bar\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\check\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\check\pi + (1-\varepsilon)\bar\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\Biggl(\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} + 1\Biggr)(\check\pi_{kj} - \bar\pi_{kj}) \\
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}\,(\check\pi_{kj} - \bar\pi_{kj}) \\
&= \alpha_m\check\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{P_n(\beta)} + \sum_{k\in\mathcal{K}}\alpha_k\sum_{(k,j)\ne(m,n)}\check\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} - \sum_{k\in\mathcal{K}}\alpha_k\sum_{j}\bar\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k}\alpha_k D(\bar\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\check\pi_{kj}\log P_j(\beta)$, which is finite because $\check\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m + n \in \mathcal{I}(\omega)$, $\bar\pi$ cannot minimize (A.20) if $\bar\pi_{mn} = 0$. As the pair $(m,n)$ is arbitrary with $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j+k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k+j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+ \sum_{k\in\mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{k+l\in\mathcal{I}} \pi_{kl}\Bigr)
+ \sum_{i\in\mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k-1}$ and $W_i = e^{w_i}$.
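The step "taking partial derivatives and rearranging" can be spelled out as follows. This is a reader's sketch based only on the Lagrangian displayed above, for indices with $k+j \in \mathcal{I}$ and under the assumption that $\alpha_k > 0$ for $k \in \mathcal{K}$, so that one may divide by $\alpha_k$.

```latex
\begin{align*}
\frac{\partial L}{\partial \pi_{kj}}
  &= \alpha_k\Bigl(\log\frac{\pi_{kj}}{P_j(\beta)} + 1\Bigr)
     - z_k\alpha_k - w_{k+j}\alpha_k = 0 \\
\Longrightarrow\quad
\log\frac{\pi^*_{kj}}{P_j(\beta)} &= z_k - 1 + w_{k+j},
\qquad
\pi^*_{kj} = P_j(\beta)\,e^{z_k-1}\,e^{w_{k+j}} = D_k\,P_j(\beta)\,W_{k+j}.
\end{align*}
```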

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\, i\le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k+j \le I,\ k+j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k+j > I, \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} = \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\,\pi^*_{kl}, \qquad
\eta^{(M)}_k = \sum_{l\le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\begin{align*}
\sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl} &= \beta^{(M)}, \\
\sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{align*}

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\begin{align*}
L^{(M)} ={}& \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
 + y^{(M)}\Bigl(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl}\Bigr) \\
&+ \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl}\Bigr)
 + \sum_{i\in\mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k}\Bigr).
\end{align*}

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\,e^{\,j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k+j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
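The final substitution rests on the elementary identity $P_j(\beta)\,e^{jy} = e^{(\rho-1)\beta}P_j(\rho\beta)$ with $\rho = e^{y}$, which turns the exponentially tilted Poisson($\beta$) weights into Poisson($\rho\beta$) weights. The following is a small numerical check of that identity; the function names and the values of `beta` and `rho` are illustrative choices and do not come from the paper.

```python
import math


def poisson_pmf(j: int, mean: float) -> float:
    """P_j(mean) = exp(-mean) * mean**j / j!."""
    return math.exp(-mean) * mean**j / math.factorial(j)


def tilted_poisson(j: int, beta: float, y: float) -> float:
    """P_j(beta) * exp(j*y): the form produced by the multiplier y."""
    return poisson_pmf(j, beta) * math.exp(j * y)


def rescaled_poisson(j: int, beta: float, rho: float) -> float:
    """exp((rho - 1) * beta) * P_j(rho * beta): the form used in Theorem A.10."""
    return math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta)


beta, rho = 1.7, 0.6          # illustrative values only
y = math.log(rho)             # rho = e^y
max_gap = max(
    abs(tilted_poisson(j, beta, y) - rescaled_poisson(j, beta, rho))
    for j in range(12)
)
print(max_gap)                # agreement up to floating-point rounding
```

The factor $e^{(\rho-1)\beta}$ does not depend on $j$, which is why it can be absorbed into the constant $C_k$.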

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson($\rho_k\beta_k$) tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0},\ldots,\omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega(k),\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega(k),\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega(k),\beta_k)$. Then the curves

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x)
= \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x)
= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x)
= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x)
= \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

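\begin{align*}
\tilde\psi_{k0}(x)
&= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_kP_i(\rho_k\beta_k)\bigr)
   \Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_kP_i(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i} \\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)P_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i}\Bigr].
\end{align*}

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_kP_i(\rho\beta)W_{k+i}$ when $i \le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{00}$ is

\begin{align*}
(-1)^k\psi_0^{(k)}(x)
&= \alpha_0C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)P_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}
   \Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Bigr] \tag{A.25} \\
&= \alpha_0C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)P_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{j}\Bigr] \\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\tilde\psi_{k0}(x).
\end{align*}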
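The passage from the first to the second line of (A.25) uses the identity $P_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$, followed by the reindexing $j = i-k$. The sketch below checks that identity numerically; the helper names and parameter values are illustrative assumptions rather than anything defined in the paper.

```python
import math


def poisson_pmf(j: int, mean: float) -> float:
    """P_j(mean) = exp(-mean) * mean**j / j!."""
    return math.exp(-mean) * mean**j / math.factorial(j)


def differentiated_coefficient(i: int, k: int, rho: float, beta: float) -> float:
    """Coefficient of (1 - x/beta)**(i-k) after k derivatives of
    P_i(rho*beta) * (1 - x/beta)**i, with the sign (-1)**k removed."""
    return poisson_pmf(i, rho * beta) * math.factorial(i) / (
        math.factorial(i - k) * beta**k
    )


def reindexed_coefficient(i: int, k: int, rho: float, beta: float) -> float:
    """rho**k * P_{i-k}(rho*beta), the coefficient in the second line of (A.25)."""
    return rho**k * poisson_pmf(i - k, rho * beta)


rho, beta = 0.8, 2.5          # illustrative values only
coefficients_match = all(
    math.isclose(
        differentiated_coefficient(i, k, rho, beta),
        reindexed_coefficient(i, k, rho, beta),
        rel_tol=1e-12,
    )
    for i in range(8)
    for k in range(i + 1)
)
print(coefficients_match)
```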

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\begin{align*}
\gamma_i(x)
&= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi_0^{(i)}(x).
\end{align*}

Similarly, we obtain

\begin{align*}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{align*}

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

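In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\tilde\psi_{k0}(x) \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_kD_k}{\alpha_0D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x)
= \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).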

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx
+ \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

\begin{align*}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr]
   - \sum_{k=0}^{K} \alpha_k\log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta)\,
   \log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{align*}

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}}
= \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta}
= \frac{\tilde\gamma_{kj}(x)}{(-x)^{j}e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k\,D(\tilde\gamma_{k\cdot}(\beta)\,\|\,P(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).
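The cost in Theorem A.13 is a weighted sum of relative entropies with respect to the Poisson($\beta$) distribution. As a rough illustration of how such a cost could be evaluated for finitely supported candidate distributions, here is a minimal sketch; the function names and the sample inputs are hypothetical, and the code evaluates the objective only, without enforcing the constraints (A.21) and (A.22).

```python
import math


def poisson_pmf(j: int, mean: float) -> float:
    """P_j(mean) = exp(-mean) * mean**j / j!."""
    return math.exp(-mean) * mean**j / math.factorial(j)


def relative_entropy_to_poisson(pi, beta: float) -> float:
    """D(pi || P(beta)) = sum_j pi_j * log(pi_j / P_j(beta)), with 0 log 0 = 0.

    `pi` is a finitely supported sequence indexed by j = 0, 1, ...
    """
    return sum(
        p * math.log(p / poisson_pmf(j, beta))
        for j, p in enumerate(pi)
        if p > 0.0
    )


def terminal_cost(alpha, pis, beta: float) -> float:
    """sum_k alpha_k * D(pi(k) || P(beta)) for initial weights alpha_k."""
    return sum(a * relative_entropy_to_poisson(pi, beta) for a, pi in zip(alpha, pis))


# Hypothetical example: all cells start empty (alpha_0 = 1) and end up
# evenly split between zero and one ball, with beta = 1.
example_cost = terminal_cost([1.0], [[0.5, 0.5]], beta=1.0)
print(example_cost)
```

With these inputs both terms of the sum equal $0.5\log(0.5/e^{-1})$, so the cost is $1 + \log 0.5$.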

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\,\|\,\gamma)$ and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x)\,\|\,\gamma^{\varepsilon}(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining

\[
\eta(x) \doteq \tilde\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx
\doteq \int_0^{\beta} g(x,\varepsilon)\,dx. \tag{A.30}
\]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded away from zero for each subproblem. These in turn bound the overall $\gamma_i$ and $\theta_i$ from below, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x) \ge \alpha_i\,\tilde\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1-\varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded away from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

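The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)
= \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\Big|_{\gamma^{\varepsilon}(x)}\eta_i(x)
- \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\Big|_{\gamma^{\varepsilon}(x)}\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\begin{align*}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{align*}

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\Bigl|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\Bigr| < B
\qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.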

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx
\]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[
\int_0^{x} \frac{\partial L}{\partial\psi_i}\Big|_{\gamma}\,dx'
= \int_0^{x} \Bigl(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\Bigr)\Big|_{\gamma}\,dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\begin{align*}
G'_+[0]
&= \int_0^{\beta} \Bigl[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\Bigr]dx \\
&= \int_0^{\beta} \dot\eta(x)\cdot\Bigl(-\int_0^{x}\frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\Bigr)dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\,(\eta_i(0) - \eta_i(\beta)) = 0.
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1-\lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx.
\]

It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_{n}\ \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n}\ J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


There are many different questions one can pose. For example, it may be that one is interested in the distribution of $(\Gamma_0,\Gamma_1,\ldots)$, where $\Gamma_i$ is the number of cells containing exactly $i$ balls after all $r$ balls have been thrown. In the classical occupancy problem [9, 14, 20], one is interested only in the distribution of unoccupied urns $\Gamma_0$. In other cases, one might be interested in the (random) number of balls required to fill all cells to a given level, or the number required so that a given fraction are filled to that level; this is the so-called coupon collector’s or dixie cup problem. In biology, the inverse problem of estimating the number of balls thrown from the number of occupied cells also arises, and is used to estimate species abundance [17]. Related applications in biology appear in [2, 3], and an application in computer science appears in [21].

Another related problem of interest is the overflow problem, in which the urns are supposed to have a finite capacity $C$, and the number of balls that overflow is the random variable of interest. Ramakrishna and Mukhopadhyay [24] describe an application in computer science concerned with memory access, and [12] considers an application to optical switches. In [12], one is concerned with dimensioning the number of wavelength converters so as to reduce the probability of packet loss across the switch to an acceptable level. In [12], any-color-to-any-color converters are considered. However, by extending the results proved here to the case where the balls have distinct colors, dimensioning in the case where we have many-to-one color converters can be carried out, and the blocking probability of the output estimated.

A wide range of results have been proved for the occupancy problem by using “exact” approaches. For example, combinatorial methods are used in [9, 14, 20], and methods that utilize generating functions are discussed in [20]. The implementation of these results, however, can be difficult. For example, in applying the combinatorial results, one must compute the difference between large quantities that appear in the inclusion–exclusion formula for the probability that a given fraction of cells are occupied. An analogous difficulty occurs with techniques based on moment generating functions, since one must invert the generating function itself.
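To make the inclusion–exclusion difficulty concrete, the sketch below evaluates the classical formula for the probability that exactly $m$ of $n$ cells are empty after $r$ balls, namely $\binom{n}{m}\sum_{j=0}^{n-m}(-1)^j\binom{n-m}{j}\bigl(\tfrac{n-m-j}{n}\bigr)^r$, once in exact rational arithmetic and once in floating point. The formula is the standard textbook one rather than an expression quoted from this paper, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def prob_empty_cells_exact(n: int, r: int, m: int) -> Fraction:
    """P(exactly m of n cells are empty after r balls), in exact arithmetic."""
    total = Fraction(0)
    for j in range(n - m + 1):
        # Alternating terms: each can be far larger than the final sum.
        total += (-1) ** j * comb(n - m, j) * Fraction(n - m - j, n) ** r
    return comb(n, m) * total


def prob_empty_cells_float(n: int, r: int, m: int) -> float:
    """Same alternating sum in floating point; exposed to cancellation error."""
    total = 0.0
    for j in range(n - m + 1):
        total += (-1) ** j * comb(n - m, j) * ((n - m - j) / n) ** r
    return comb(n, m) * total


# Small sanity check: with 2 cells and 2 balls, both cells are occupied
# with probability 1/2.
print(prob_empty_cells_exact(2, 2, 0))
```

For small $n$ the two versions agree closely. As $n$ grows, the alternating terms become large relative to their sum, so the floating-point version can be expected to lose accuracy while the exact version becomes slow, which is one way to see the appeal of asymptotic approximations.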

Asymptotic methods provide an attractive alternative to both of these approaches. One reason is that they often offer good approximations with only a modest computational effort. A second, perhaps more important, reason is that superior qualitative insights can often be obtained. Indeed, a range of asymptotic results have already been obtained for these problems (see, e.g., [18]). The first large deviations principle (LDP) for urn problems that we are aware of was established in [28] for the special case of the classical occupancy problem. This result was applied in [21] to a boolean satisfiability problem in computer science. Reference [6], which appeared while the present paper was under review, proves an LDP for the infinite-dimensional occupancy measures associated with occupancy processes. The present paper focuses on finite-dimensional occupancy measures, in which urn occupancies above a given level are not distinguished. In this finite case we are able to provide a concise large deviations proof, along with explicit, insightful and computable expressions for the rate functions and for the large deviations extremals. The rate function for the occupancy model after all the balls have been thrown is shown to have a simple and fairly explicit form, which can be defined in terms of relative entropy with respect to the Poisson distribution. Many different problems can be solved in this framework simply by changing the set over which the rate function is minimized. We also give sample path results for the evolution of the urn occupancies toward a particular event. In principle, the explicit form of the minimizing trajectories for sample path results should enable accurate empirical estimates of unlikely events using importance sampling; see [7, 22].

Let $\lfloor a\rfloor$ denote the integer part of a scalar $a$. With $n$ cells available and a total of $r = \lfloor\beta n\rfloor$ balls to be thrown (with $\beta > 0$), we consider the large deviation asymptotics as $n \to \infty$. The precise statement is as follows. Fix a positive integer $I$. Then, with $\Gamma^n_i(\beta)$ equal to the fraction of cells containing $i$ balls, we characterize the large deviation asymptotics of the random vectors $\{(\Gamma^n_0(\beta),\Gamma^n_1(\beta),\ldots,\Gamma^n_I(\beta)),\ n = 1,2,\ldots\}$ as $n \to \infty$. A direct analysis of this problem is difficult, and in fact it turns out to be simpler to first “lift” the problem to the level of a sample path large deviation problem, and then use the contraction principle to reduce to the original finite-dimensional problem. A “time” variable $x$ is introduced into the problem, where $\lfloor nx\rfloor$ balls have been thrown at time $x$, and $\Gamma^n_i(x)$ is equal to the fraction of cells containing $i$ balls at this time. We then follow a standard program: the large deviation properties of this Markov process are analyzed, the rate function $J$ on path space is obtained, and the rate function for the occupancy at time $\beta$ is then characterized as the solution to a variational problem involving $J$.
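The dynamical model just described is easy to simulate, which can help fix the notation. The following is a minimal sketch, assuming uniformly random cell choices as in the standard formulation; the function name, the argument `top_level` (playing the role of $I$) and the extra last component collecting cells with more than `top_level` balls are illustrative choices.

```python
import random


def simulate_occupancy_path(n: int, beta: float, top_level: int, seed: int = 0):
    """Throw floor(beta * n) balls into n cells, one at a time.

    Returns a list of occupancy vectors, one per ball count. Entry i
    (0 <= i <= top_level) is the fraction of cells holding exactly i balls;
    the last entry is the fraction holding more than top_level balls.
    Row t corresponds to "time" x = t / n.
    """
    rng = random.Random(seed)
    balls_in_cell = [0] * n
    counts = [0] * (top_level + 2)
    counts[0] = n                          # all cells start empty
    path = [[c / n for c in counts]]
    for _ in range(int(beta * n)):         # floor, since beta * n >= 0
        cell = rng.randrange(n)            # each cell chosen with probability 1/n
        old_level = min(balls_in_cell[cell], top_level + 1)
        balls_in_cell[cell] += 1
        new_level = min(balls_in_cell[cell], top_level + 1)
        counts[old_level] -= 1             # the cell leaves its old level ...
        counts[new_level] += 1             # ... and joins the next (or catch-all) one
        path.append([c / n for c in counts])
    return path


path = simulate_occupancy_path(n=200, beta=1.5, top_level=3)
print(path[-1])                            # occupancy fractions at time beta
```

Only a cell currently holding $j$ balls can become a cell with $j+1$ balls, which is the mechanism behind the state-dependent transition probabilities discussed in the text.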

Although the program is standard, there are several very interesting features, both qualitative and technical, which distinguish this large deviation problem. We first describe some of the attractive qualitative features. Typically, one has a rate function on path space of the form $J(\phi) = \int L(\phi(x),\dot\phi(x))\,dx$, where the nonnegative function $L(\gamma,\xi)$ is jointly lower semicontinuous and convex in $\xi$ for each fixed $\gamma$. The large deviation properties of the process at time $\beta$ are then found by solving a variational problem of the form

\[
\inf\{J(\phi) : \phi(\beta) = \omega\},
\]

where $\omega$ is given, and where there will also be constraints on the initial condition $\phi(0)$. In general, this problem does not have an explicit closed form solution. One exception to this rule is the extraordinarily simple situation where $L(\gamma,\xi)$ does not depend on the state variable $\gamma$. In this case, Jensen’s inequality implies that the minimizing trajectory is a straight line, and so the variational problem is actually finite dimensional. Another exception is the case of large deviation asymptotics of a small noise linear stochastic differential equation. In this case, the variational problem takes the form of the classical linear quadratic regulator, and the explicit solution is well known from the theory of deterministic optimal control. However, in this case there is no need to “lift” the problem to the sample path level, since the distribution of the diffusion at any time is Gaussian, with explicitly calculable mean and covariance. Other exceptions occur in large deviation problems from queuing theory, but in these problems the variational integrand is “sectionally” independent of $\gamma$, and one can show that the minimizing trajectories are the concatenation of a finite number of straight line segments (see, e.g., [1] and the references therein).

For the occupancy problem, the variational integrand has a complicated state dependence [see (2.2)], reflecting the complicated dependence of the transition probabilities (in the process level version) on the state. Nonetheless, the rate function possesses a great deal of structure that can be heavily exploited. For example, the function $J$ turns out to be strictly convex on path space, and so all local minimizers in the variational problem are actually global minimizers. Perhaps even more surprising is the fact that explicit solutions to the Euler–Lagrange equations can be constructed, and as a consequence the variational problem can be more-or-less solved explicitly (see the Appendix). Both of these properties follow from the fact that the variational integrand $L$ can be defined in terms of the famous relative entropy function (or divergence).

A technical novelty of the problem is the singular behavior of the transition probabilities of the underlying Markov process. Since only cells containing $j$ balls can become cells with $j+1$, it is clear that the transition probability corresponding to such an event scales linearly with $\Gamma^n_j(x)$, and in particular that it vanishes at the boundary of the state space when $\Gamma^n_j(x) = 0$. This poses no difficulty for the large deviations upper bound, but is an obstacle for the lower bound. (Some results that address lower bounds when rates go to zero include Chapter 8 of [26], which treats processes with “flat” boundaries, and recent general results in [27].) For the occupancy model, existing results provide a lower bound for open sets of trajectories that do not touch the boundary, because away from the boundaries the set $\{\xi : L(\Gamma,\xi) < \infty\}$ is independent of $\Gamma$. To deal with more general open sets, we use a perturbation argument. We first show, using the strict convexity of $J$ and properties of the zero cost paths, that it is enough to prove large deviation lower bounds for open neighborhoods of trajectories that stay away from the boundary for all positive times. (Note that this still allows the trajectory to start on the boundary.) Loosely speaking, to prove the lower bound for sets of this type, it is enough to show that, given $a > 0$, there is $b > 0$ such that the probability that the process is at least distance $b$ from the boundary by time $b$ is bounded below by $\exp\{-na\}$. It turns out that these bounds can be easily established by exploiting an explicit representation for such probabilities that was proved in [11].

An outline of the paper is as follows. Section 2 states the main results of the paper. In the first part of Section 2 we construct the underlying stochastic process model and state the corresponding sample path level LDP, as well as the LDP for the terminal distribution. The proof of the sample path LDP is given in the following section, although in Section 2 an additional heuristic argument for the form of the local rate function, based on Sanov's theorem, is provided. In the second part of Section 2 the terminal distribution rate function is characterized, and explicit expressions for the sample path minimizers are presented. These constitute more-or-less complete solutions to the corresponding calculus of variations problem. The detailed proofs of these latter results are deferred to the Appendix.

Section 3 gives a proof of the sample path LDP. The solutions to the variational problem are exemplified in Section 4, where we show how specific questions regarding the occupancy problem can be answered. In particular, we work out the asymptotics for a number of examples, including an overflow problem and a partial coupon collection problem. Generalizations of our results are also described.

2. Main results.

2.1. Statement of the LDP. In this section we formulate the problem of interest and state the LDP. The proof will be given in Section 3. As noted in the Introduction, our focus is the asymptotic behavior of the occupancy problem. If $n$ denotes the total number of cells, then to have a nontrivial limit the number of balls placed in the cells should scale linearly with $n$. We will place $\lfloor \beta n \rfloor$ balls in the cells, where $\beta \in (0,\infty)$ is a fixed parameter and $\lfloor a \rfloor$ denotes the integer part of $a$.

As will be seen in the sequel, the large deviation asymptotics of the occupancy should be treated by first lifting the problem to the level of sample path large deviations, and then using the contraction principle to reduce to the original problem. To do this we introduce a "time" variable $x$ that ranges from $0$ to $\beta$. At time $x$ one should imagine that $\lfloor nx \rfloor$ balls have been thrown. Thus the occupancy process will be piecewise constant over intervals of the form $[i/n, i/n + 1/n)$. With this scaling of time, large deviation asymptotics can be obtained if we scale space by a factor of $1/n$. Thus we define the random occupancy process

\[ \Gamma^n(x) \doteq (\Gamma^n_0(x), \ldots, \Gamma^n_I(x), \Gamma^n_{I+}(x)) \]

by letting $\Gamma^n_i(x)$, $i = 0, \ldots, I$, denote $1/n$ times the number of cells with exactly $i$ balls at time $x$, and letting $\Gamma^n_{I+}(x)$ be $1/n$ times the number of cells with more than $I$ balls at time $x$. Note that $\Gamma^n$ takes values in the set of probability vectors on $I+2$ points:

\[ S_I \doteq \Big\{ \gamma \in \mathbb{R}^{I+2} : \gamma_j \ge 0,\ 0 \le j \le I+1,\ \text{and}\ \sum_{j=0}^{I+1} \gamma_j = 1 \Big\}. \]

The process $\{\Gamma^n(i/n),\ i = 0, 1, \ldots\}$ is obviously Markovian. It will be convenient to work with the following "dynamical system" representation:

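\[ \Gamma^n\Big(\frac{i+1}{n}\Big) = \Gamma^n\Big(\frac{i}{n}\Big) + \frac{1}{n}\, b_i\Big(\Gamma^n\Big(\frac{i}{n}\Big)\Big), \]

where the independent and identically distributed random vector fields $\{b_i(\cdot),\ i = 0, 1, \ldots\}$ have distributions

\[ P\{b_i(\gamma) = v\} = \begin{cases} \gamma_j, & \text{if } v = e_{j+1} - e_j,\ 0 \le j \le I, \\ \gamma_{I+1}, & \text{if } v = 0. \end{cases} \]

Here $e_j$ represents the vector in $\mathbb{R}^{I+2}$ whose $j$th element is unity and for which all other elements are zero. The occupancy after $\lfloor \beta n \rfloor$ balls have been thrown can then be represented by $n\Gamma^n(\beta)$.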
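The recursion above can be sketched in a few lines of Python. This is only an illustrative simulation, assuming all cells start empty; the function name `simulate_occupancy` and its arguments are hypothetical and not part of the paper.

```python
import random

def simulate_occupancy(n, beta, I, seed=0):
    """Simulate Gamma^n(beta) starting from all cells empty (an assumption).

    Returns [Gamma_0, ..., Gamma_I, Gamma_{I+}]: the fractions of cells with
    exactly i balls (i <= I) and with more than I balls.
    """
    rng = random.Random(seed)
    counts = [n] + [0] * (I + 1)      # counts[j] = n * Gamma^n_j; last slot is I+
    for _ in range(int(beta * n)):    # floor(beta * n) balls are thrown
        # Choose occupancy class j with probability gamma_j = counts[j] / n.
        u = rng.randrange(n)
        j = 0
        while u >= counts[j]:
            u -= counts[j]
            j += 1
        if j <= I:                    # increment v = e_{j+1} - e_j
            counts[j] -= 1
            counts[j + 1] += 1
        # j == I + 1 corresponds to v = 0: the state is unchanged
    return [c / n for c in counts]

print(simulate_occupancy(n=5000, beta=1.5, I=3))
```

Tracking only the class counts, rather than individual cells, mirrors the Markov structure of $\Gamma^n$: the next increment depends on the current state only through the vector of fractions.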

As we have discussed, the large deviations behavior of $\Gamma^n(\beta)$ will be deduced from that of the process $\Gamma^n(\cdot)$. In order to state the large deviation asymptotics precisely, we should clarify the space in which $\Gamma^n(\cdot)$ takes values and the topology used on that space. As usual for processes with jumps of this sort, we use the Skorokhod space $D([0,\beta] : \mathbb{R}^{I+2})$, together with the Skorokhod topology ([5], Chapter 4). However, the large deviations properties of $\{\Gamma^n\}$ are the same as for the processes $\{\tilde\Gamma^n\}$, where $\tilde\Gamma^n$ is defined to be the piecewise linear process which agrees with $\Gamma^n$ at times of the form $i/n$. For readers unfamiliar with the Skorokhod space and associated topology, the identical large deviations results hold for $\{\tilde\Gamma^n\}$, save that the space of continuous functions and the sup norm topology are used instead.

To complete the statement of the LDP for $\{\Gamma^n\}$ we need some additional notation. A vector of (deterministic) occupancy rates $\theta(x) = (\theta_0(x), \ldots, \theta_I(x), \theta_{I+}(x))$ is a measurable mapping from $[0,\beta]$ to $S_I$. Intuitively, these rates represent the rate at which balls flow into urns of a given occupancy level. Associated with each such vector of rates is the corresponding deterministic occupancy function $\gamma(x) = (\gamma_0(x), \ldots, \gamma_I(x), \gamma_{I+}(x))$, which is defined by the initial condition $\gamma(0)$ and the differential equations $\dot\gamma_0(x) = -\theta_0(x)$, $\dot\gamma_j(x) = \theta_{j-1}(x) - \theta_j(x)$ for $j = 1, \ldots, I$, and $\dot\gamma_{I+}(x) = \theta_I(x)$. These equations reflect the idea that the fraction of urns containing $i$ balls increases as balls enter $(i-1)$-occupied urns and decreases as balls enter $i$-occupied urns. Defining the matrix

\[ M = \begin{pmatrix} -1 & 0 & 0 & \cdots & 0 \\ 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & -1 & 0 \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix}, \tag{2.1} \]


we can write $\dot\gamma(x) = M\theta(x)$ for all $x \in [0,\beta]$. Conversely, given a differentiable occupancy function $\gamma$, the corresponding rates are uniquely determined by $\dot\gamma(x) = M\theta(x)$ and the normalization $\sum_{i=0}^{I} \theta_i(x) + \theta_{I+}(x) = 1$. We will also be interested in the cumulative occupancy function $\psi$, with components $\psi_i(x) = \sum_{j=0}^{i} \gamma_j(x)$. Inspecting the cumulative sums of the rows of $M$ shows that $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$. As more balls are thrown, the fraction of urns containing $i$ or fewer balls can only decrease, and the rate of decrease is the rate at which balls enter $i$-occupied urns. We will say that $\gamma$ is a valid occupancy function if $\gamma$ is absolutely continuous, if $\gamma(x)$ is a probability vector for all $x \in [0,\beta]$, and if its associated $\theta(x)$ is a probability vector for almost all $x \in [0,\beta]$. Note that the functions $\psi$, $\gamma$ and $\theta$ are interchangeable, in the sense that each can be derived from any of the others [given $\gamma(0)$, in the case of $\theta$]. Thus we say that $\theta$ and $\psi$ are valid if the associated $\gamma$ is valid. The following lemma gives a direct characterization of validity in terms of $\psi$.
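As a small illustration of the relations $\dot\gamma = M\theta$ and $\dot\psi_i = -\theta_i$, the following Python sketch builds $M$ for a given $I$ and applies it to a sample rate vector. The helper names (`occupancy_matrix`, `apply_matrix`, `cumulative`) and the sample rates are assumptions made for this example.

```python
def occupancy_matrix(I):
    """Build the (I+2) x (I+2) matrix M of (2.1) as a list of rows."""
    size = I + 2
    M = [[0] * size for _ in range(size)]
    for j in range(I + 1):       # column j: balls entering j-occupied urns
        M[j][j] = -1             # class j loses an urn
        M[j + 1][j] = 1          # class j+1 gains it
    return M                     # the last column is zero: I+ urns stay in I+

def apply_matrix(M, theta):
    """Matrix-vector product M * theta."""
    return [sum(m * t for m, t in zip(row, theta)) for row in M]

def cumulative(v):
    """Running sums of v, i.e. psi-dot computed from gamma-dot."""
    out, total = [], 0.0
    for x in v:
        total += x
        out.append(total)
    return out

I = 2
theta = [0.5, 0.3, 0.1, 0.1]                # sample rates (theta_0, theta_1, theta_2, theta_{I+})
gamma_dot = apply_matrix(occupancy_matrix(I), theta)
psi_dot = cumulative(gamma_dot)[: I + 1]    # should equal -theta_i for i = 0..I
print(gamma_dot, psi_dot)
```

The components of `gamma_dot` sum to zero, reflecting that the total fraction of urns is conserved.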

Lemma 2.1. A vector of $I+1$ continuous functions $\psi$, each of which maps $[0,\beta]$ to $[0,1]$, is a valid cumulative occupancy path if and only if

(a) $\psi_i(x) \ge \psi_{i-1}(x)$,
(b) $\psi_i(x) \ge \psi_i(y)$,
(c) $\sum_{k=0}^{I} (\psi_k(x) - \psi_k(y)) \le y - x$

for each $0 \le i \le I$ and $0 \le x < y \le \beta$.

Sketch of the proof. In the forward direction, the first condition plus the bound $\psi_I(x) \le 1$ ensure that $\gamma(x)$ is a probability distribution. The second and third conditions together imply that $\psi$ is Lipschitz continuous with constant 1, and hence absolutely continuous. This implies the absolute continuity of $\gamma$. Since $\dot\psi = -\theta$, the third condition ensures that $\theta$ is almost always a probability distribution. The reverse direction proceeds similarly.

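Given $\gamma, \theta \in S_I$, let $D(\theta \| \gamma)$ denote the relative entropy of $\theta$ with respect to $\gamma$. Thus

\[ D(\theta \| \gamma) \doteq \sum_{i=0}^{I+1} \theta_i \log(\theta_i / \gamma_i), \]

with the understanding that $\theta_i \log(\theta_i/\gamma_i) \doteq 0$ whenever $\theta_i = 0$, and that $\theta_i \log(\theta_i/\gamma_i) \doteq \infty$ if $\theta_i > 0$ and $\gamma_i = 0$. If $\gamma(x)$ is a valid occupancy function with corresponding occupancy rates $\theta(x)$, then we set

\[ J(\gamma) = \int_0^\beta D(\theta(x) \| \gamma(x))\, dx. \tag{2.2} \]

In all other cases set $J(\gamma) = \infty$.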
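The conventions in the definition of $D(\theta\|\gamma)$ are easy to get wrong in code, so a minimal Python sketch may help. The function names and the midpoint-rule quadrature are assumptions of this example; `path_cost` only approximates the integral in (2.2) and does not check that the supplied $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma), with 0*log(0/.) = 0 and +inf if theta_i > 0 = gamma_i."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0:
            continue              # convention: the term is 0 when theta_i = 0
        if g == 0:
            return math.inf       # convention: infinite when gamma_i = 0 < theta_i
        total += t * math.log(t / g)
    return total

def path_cost(theta_path, gamma_path, beta, steps=1000):
    """Midpoint-rule approximation of the integral in (2.2).

    theta_path and gamma_path are callables x -> probability vector; validity
    of the pair is assumed, not verified.
    """
    dx = beta / steps
    return sum(
        relative_entropy(theta_path((k + 0.5) * dx), gamma_path((k + 0.5) * dx)) * dx
        for k in range(steps)
    )

print(relative_entropy([1.0, 0.0], [0.5, 0.5]))   # equals log 2
```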


Theorem 2.2. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha \in S_I$ as $n \to \infty$. Then the sequence of processes $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies the LDP with rate function $J$. In other words, for any measurable set $A$ of trajectories we have the large deviation lower bound

\[ \liminf_{n\to\infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \ge -\inf\{J(\gamma) : \gamma \in A^\circ,\ \gamma(0) = \alpha\} \]

and the large deviation upper bound

\[ \limsup_{n\to\infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \le -\inf\{J(\gamma) : \gamma \in \bar A,\ \gamma(0) = \alpha\}, \]

and moreover, for any compact set of initial conditions $K$ and $C < \infty$, the set

\[ \{\gamma : J(\gamma) \le C,\ \gamma(0) \in K\} \]

is compact.

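The proof of this result is provided in Section 3. However, a formal justification for the form of the rate function is as follows. Let $\delta > 0$ be a small time increment. Owing to the fact that $\Gamma^n$ varies slowly when compared to $b_i(\gamma)$, one expects

\[ P\Big( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx \eta \,\Big|\, \Gamma^n(0) = \gamma \Big) \approx P\Big( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} b_j(\gamma) \approx \eta \Big), \]

where the symbol "$\approx$" inside the probability means that the indicated quantities are within a small fixed constant $\varepsilon > 0$ of each other. Suppose that we interpret $\gamma$ as a probability measure on $\{e_0, \ldots, e_{I+1}\}$, and let $\{Y_j\}$ be independent and identically distributed (i.i.d.) with distribution $\gamma$. Then the sequence of i.i.d. random vectors $\{b_j(\gamma)\}$ can be realized by setting

\[ b_j(\gamma) = \begin{cases} e_{i+1} - e_i \\ 0 \end{cases} \iff Y_j = \begin{cases} e_i, & 0 \le i \le I, \\ e_{I+1}, \end{cases} \]

that is,

\[ b_j(\gamma) = M Y_j. \]

By Sanov's theorem, for any probability vector $\theta \in S_I$,

\[ P\Big( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} Y_j \approx \theta \Big) \approx \exp\{-n\delta D(\theta\|\gamma)\}. \]

Thus

\[ P\Big( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx M\theta \,\Big|\, \Gamma^n(0) = \gamma \Big) \approx \exp\{-n\delta D(\theta\|\gamma)\}. \]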


Approximating an arbitrary trajectory by a piecewise linear trajectory with nearly equal cost and using the Markov property, one expects

\[ P(\Gamma^n \approx \gamma \mid \Gamma^n(0) = \alpha) \approx \exp\Big\{ -n \int_0^\beta D(\theta(t)\|\gamma(t))\, dt \Big\}, \]

where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus the rate function on path space should be $J(\gamma)$.

The zero cost trajectories are, of course, the law of large numbers limits, and can easily be computed. For example, if $\alpha = (1, 0, \ldots)$ (so all cells are initially empty) and $i \le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t} t^i / i!$. In other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass corresponding to $i > I$ is collected together into the state $I+1$. Throughout this paper we will denote the Poisson distribution with mean $t$ by $P(t)$, where $P_i(t) = e^{-t} t^i / i!$.
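The Poisson form of the zero cost path can be checked numerically. Since $D(\theta\|\gamma) = 0$ only when $\theta = \gamma$, a zero cost path satisfies $\dot\gamma = M\gamma$. The sketch below integrates this linear system with a classical fourth-order Runge–Kutta scheme (a choice made for this example, not taken from the paper) and compares the result with the truncated Poisson distribution; all function names are hypothetical.

```python
import math

def zero_cost_path(beta, I, steps=2000):
    """Integrate gamma' = M gamma from the all-empty state up to time beta (RK4)."""
    def rhs(g):
        out = [-g[0]]                                      # gamma_0' = -gamma_0
        out += [g[j - 1] - g[j] for j in range(1, I + 1)]  # gamma_j' = gamma_{j-1} - gamma_j
        out.append(g[I])                                   # gamma_{I+}' = gamma_I
        return out

    g = [1.0] + [0.0] * (I + 1)
    h = beta / steps
    for _ in range(steps):
        k1 = rhs(g)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(g, k1)])
        k3 = rhs([a + 0.5 * h * b for a, b in zip(g, k2)])
        k4 = rhs([a + h * b for a, b in zip(g, k3)])
        g = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(g, k1, k2, k3, k4)]
    return g

def truncated_poisson(t, I):
    """(P_0(t), ..., P_I(t), remaining mass collected in the state I+)."""
    head = [math.exp(-t) * t**i / math.factorial(i) for i in range(I + 1)]
    return head + [1.0 - sum(head)]

numeric = zero_cost_path(beta=1.5, I=3)
exact = truncated_poisson(1.5, 3)
print(max(abs(a - b) for a, b in zip(numeric, exact)))
```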

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction principle (e.g., [11], Theorem 1.3.2) implies the following variational representation for the rate function for $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$. Let $A(\alpha,\omega,\beta)$ denote the set of valid occupancy paths $\gamma$ on $[0,\beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha$ as $n \to \infty$. Then the sequence of random vectors $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$ satisfies the LDP with the convex rate function

\[ \mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma \in A(\alpha,\omega,\beta)\}. \]

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7 and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is in fact strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation lower bound holds for sets with no interior relative to the ambient space. Such bounds can often be proved for processes, such as ours, that take values in a discrete lattice. An example would be a set $A \subset \{\gamma \in S_I : \gamma_{I+} = 0\}$, for which the minimizing trajectories must be polynomial extremals (defined after Theorem 2.6). Although we do not need such results in the present paper, it is worth observing that lower bounds of this kind can be proved.

2.2. Characterization of the terminal rate function and minimizing paths. The results of this section were obtained using calculus of variations techniques, the details of which are given in the Appendix. The presentation begins with the case in which all urns are initially empty, because it appears in many applications and because it is a building block for the general case. In each case, we first give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional form contains parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha,\omega,\beta)$, where $\alpha, \omega \in S_I$ are the initial and terminal occupancies, and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I+1$ balls, and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I+1$.

Definition 2.1. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if the corresponding set of valid occupancy paths $A(\alpha,\omega,\beta)$ is nonempty.

Lemma 2.4. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if and only if

\[ \sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)} \tag{2.3} \]

and

\[ \sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{k=0}^{I+1} k\alpha_k + \beta \quad \text{(conservation)}. \tag{2.4} \]

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
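The two conditions of Lemma 2.4 translate directly into a short check. The Python sketch below is illustrative only: the function name `is_feasible`, the argument layout (vectors of length $I+2$ with the last entry holding $\alpha_{I+1}$ or $\omega_{I+}$) and the floating-point tolerance are assumptions of this example.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check conditions (2.3) and (2.4) for an endpoint constraint.

    alpha = (alpha_0, ..., alpha_{I+1}), omega = (omega_0, ..., omega_I, omega_{I+}).
    """
    I = len(omega) - 2
    # Monotonicity (2.3): cumulative sums of alpha dominate those of omega.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False
    # Conservation (2.4): a lower bound on final balls per urn cannot exceed
    # the initial balls per urn plus the beta balls per urn thrown.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(k * alpha[k] for k in range(I + 2)) + beta
    return lhs <= rhs + tol

print(is_feasible([1, 0, 0, 0], [0.3, 0.3, 0.2, 0.2], beta=1.5))
print(is_feasible([1, 0, 0, 0], [0.3, 0.3, 0.2, 0.2], beta=1.0))
```

In the second call the terminal occupancy requires at least 1.3 balls per urn, which exceeds the 1.0 available, so conservation fails.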

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha,\omega,\beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i < I$. Otherwise the constraints are termed reducible.


For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i} \alpha_k = \sum_{k=0}^{i} \omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or fewer will always contain $i$ balls or fewer. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i+1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha,\omega,\beta)$ may be divided into a finite number of isolated irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $F(1,\omega,\beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$ and the constraint

\[ \sum_{i=0}^{\infty} i\pi_i = \beta. \]

In the empty case, the conditions for feasibility of $(1,\omega,\beta)$ reduce to $\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \beta$, from which it follows that the set $F$ is nonempty if and only if $(1,\omega,\beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $A(1,\omega,\beta)$ can be represented as an infimum of a relative entropy over distributions in $F(1,\omega,\beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be determined as

\[ \mathcal{J}(\omega) = \min_{\pi \in F(1,\omega,\beta)} D(\pi \| P(\beta)) \]

if $(1,\omega,\beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^* \in F(1,\omega,\beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show in fact that the solution takes the form $\pi^*_i = C P_i(\rho\beta)$ for $i > I$, for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty} i\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty} \pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the equality $\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I+1$ if necessary to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I} \omega_i = 0$. In this case it turns out that $F(1,\omega,\beta)$ has only one element; we then have $C = 0$, and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[ \frac{\rho\beta - \sum_{i=0}^{I} i P_i(\rho\beta)}{1 - \sum_{i=0}^{I} P_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{1 - \sum_{i=0}^{I} \omega_i}. \tag{2.5} \]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I+1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0,\infty)$ to $(I+1,\infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[ C = \frac{1 - \sum_{i=0}^{I} \omega_i}{1 - \sum_{i=0}^{I} P_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{\rho\beta - \sum_{i=0}^{I} i P_i(\rho\beta)}. \tag{2.6} \]

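Evaluating the relative entropy of $\pi^*$ and $P(\beta)$, the rate function of $\{\Gamma^n(\beta)\}$ can be expressed as

\[ \mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \Big(1 - \sum_{i=0}^{I} \omega_i\Big)(\log C + (1-\rho)\beta) + \Big(\beta - \sum_{i=0}^{I} i\omega_i\Big)\log\rho. \tag{2.7} \]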
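Equations (2.5) to (2.7) can be evaluated numerically. The following Python sketch solves (2.5) for $\rho$ by bisection, which relies on the monotonicity of the conditional mean noted above, then obtains $C$ from (2.6) and the rate from (2.7). It covers only the exponential case (strict inequality in the conservation condition). The function names, the bracketing strategy, the fixed number of bisection steps and the truncation of the Poisson tail at `n_terms` are all assumptions of this example rather than the authors' procedure.

```python
import math

def poisson_pmf(i, mean):
    """P_i(mean) = exp(-mean) mean^i / i!, evaluated in log form for stability."""
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

def tail_conditional_mean(I, mean, n_terms=400):
    """E[Y | Y > I] for Y ~ Poisson(mean), from a truncated tail sum."""
    idx = range(I + 1, I + 1 + n_terms)
    mass = sum(poisson_pmf(i, mean) for i in idx)
    first = sum(i * poisson_pmf(i, mean) for i in idx)
    return first / mass

def twist_parameters(omega, beta):
    """Solve (2.5) for rho by bisection, then compute C from (2.6).

    omega = (omega_0, ..., omega_I); strict inequality in (2.4) is assumed.
    """
    I = len(omega) - 1
    rest = 1.0 - sum(omega)                          # omega_{I+}
    target = (beta - sum(i * w for i, w in enumerate(omega))) / rest
    lo = hi = 1.0
    while tail_conditional_mean(I, lo * beta) > target:
        lo /= 2.0                                    # shrink until below the target
    while tail_conditional_mean(I, hi * beta) < target:
        hi *= 2.0                                    # grow until above the target
    for _ in range(200):                             # plain bisection on rho
        mid = 0.5 * (lo + hi)
        if tail_conditional_mean(I, mid * beta) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    head = sum(poisson_pmf(i, rho * beta) for i in range(I + 1))
    return rho, rest / (1.0 - head)

def rate_function(omega, beta):
    """Evaluate (2.7) for empty initial conditions (exponential case)."""
    rho, C = twist_parameters(omega, beta)
    rest = 1.0 - sum(omega)
    excess = beta - sum(i * w for i, w in enumerate(omega))
    head = sum(w * math.log(w / poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)
    return head + rest * (math.log(C) + (1.0 - rho) * beta) + excess * math.log(rho)

print(twist_parameters((0.3, 0.3, 0.2), 1.5), rate_function((0.3, 0.3, 0.2), 1.5))
```

A useful consistency check is to rebuild $\pi^*$ from $\omega$, $C$ and $\rho$, and confirm that it sums to one, has mean $\beta$, and that $D(\pi^*\|P(\beta))$ computed term by term agrees with (2.7).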

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\{\Gamma^n(\beta)\}$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments, conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1,\omega,\beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $A(1,\omega,\beta)$ is achieved on the occupancy path $\gamma \in A(1,\omega,\beta)$ defined by

\[ \gamma_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C P_k(\rho\beta)) \Big(1 - \frac{x}{\beta}\Big)^k, \tag{2.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!} (-1)^i \gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x), \]

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$,

\[ \gamma_0(x+y) = \gamma_0(x) + y\gamma_0^{(1)}(x) + \cdots, \]

evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term, and we say that $\gamma$ is a polynomial extremal. Otherwise $C > 0$, and we have an exponential extremal.
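The path of Theorem 2.6 is straightforward to evaluate once $\rho$ and $C$ are known, because the derivatives of $\gamma_0$ in (2.9) are available in closed form. The Python sketch below takes the twist parameters as inputs (they are assumed to have been computed separately from (2.5) and (2.6)); the function name `minimizing_path` is hypothetical.

```python
import math

def minimizing_path(x, omega, beta, rho, C):
    """Evaluate [gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)] from (2.8)-(2.9).

    omega = (omega_0, ..., omega_I); rho and C are the twist parameters,
    supplied by the caller.
    """
    I = len(omega) - 1
    mean = rho * beta
    # Polynomial coefficients a_k = omega_k - C * P_k(rho * beta) in (2.8).
    a = [w - C * math.exp(-mean) * mean**k / math.factorial(k)
         for k, w in enumerate(omega)]
    u = 1.0 - x / beta
    gamma = []
    for i in range(I + 1):
        # i-th derivative of gamma_0 at x: exponential part plus polynomial part.
        d_exp = C * (-rho) ** i * math.exp(-rho * x)
        d_poly = sum(
            a[k] * (math.factorial(k) // math.factorial(k - i))
            * (-1.0 / beta) ** i * u ** (k - i)
            for k in range(i, I + 1)
        )
        gamma.append(x**i / math.factorial(i) * (-1) ** i * (d_exp + d_poly))
    gamma.append(1.0 - sum(gamma))
    return gamma

# Sanity check: with omega_k = P_k(beta), rho = 1 and C = 1 the polynomial part
# vanishes and the path reduces to the zero cost Poisson path.
beta, I = 2.0, 3
omega = [math.exp(-beta) * beta**k / math.factorial(k) for k in range(I + 1)]
print(minimizing_path(0.7, omega, beta, rho=1.0, C=1.0))
```

Evaluating at $x = \beta$ returns the terminal occupancy $\omega$: only the $k = i$ term of the polynomial part survives, and it cancels the exponential contribution.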

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k+j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $S_\infty$ denote the set of distributions on the nonnegative integers, and let $S^n_\infty$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $S^{|\mathcal{K}|}_\infty$ by $\pi = \{\pi(0), \ldots, \pi(k), \ldots, \pi(K)\}$, where for any $k \in \mathcal{K}$ a component distribution is denoted $\pi(k) = \{\pi_{k0}, \pi_{k1}, \ldots\}$, and it is understood that the corresponding $\pi(k)$ is omitted if $\alpha_k = 0$. Finally, let $F(\alpha,\omega,\beta)$ be the set of $\pi \in S^{|\mathcal{K}|}_\infty$ which satisfies the terminal constraints

\[ \omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \qquad \text{for all } 0 \le i \le I, \tag{2.10} \]


along with the conservation constraint

\[ \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta. \tag{2.11} \]

As in the empty case, it may be established that $F$ is nonempty if and only if $(\alpha,\omega,\beta)$ is feasible.

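Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed as

\[ \mathcal{J}(\omega) = \min_{\pi \in F(\alpha,\omega,\beta)} \sum_{k \in \mathcal{K}} \alpha_k D(\pi(k) \| P(\beta)) \]

whenever $(\alpha,\omega,\beta)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in F(\alpha,\omega,\beta)$ is unique.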

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[ \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I. \end{cases} \]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[ \pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \le I, \\ 0, & k + j > I. \end{cases} \]

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_i$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in \mathcal{K}$, denote the mean of $\pi^*(k)$ by $\beta_k = \sum_{j=0}^{\infty} j\pi^*_{kj}$, and define the terminal condition $\omega(k) \in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0, \ldots, I-k$. Then the constraints $(1,\omega(k),\beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma(k) = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha,\omega,\beta)$, let $\pi^* \in S^{|\mathcal{K}|}_\infty$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0,\beta_k] \to [0,1]$ be the minimizing paths corresponding to the subproblems $(1,\omega(k),\beta_k)$. The infimum of $J(\gamma)$ over $A(\alpha,\omega,\beta)$ is achieved on the occupancy path $\gamma \in A(\alpha,\omega,\beta)$ defined by

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x). \]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define

\[ H(\gamma,\zeta) \doteq \log(E[\exp(\zeta \cdot b_i(\gamma))]), \]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma,\zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma,\zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies a large deviation upper bound with a rate function $\bar J$ which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma,\zeta)$ in $\zeta$:

\[ L(\gamma,\eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma,\zeta)]. \]

If $\{\gamma(x),\ 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then

\[ \bar J(\gamma) = \int_0^\beta L(\gamma(x), \dot\gamma(x))\, dx. \]

If $\gamma$ is not absolutely continuous, then $\bar J(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma,\zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma,\zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar J = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma,\eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[ L(\gamma, M\theta) = D(\theta\|\gamma). \]


It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma,\eta) < \infty$, then $\eta$ can be written as a convex combination of the form

\[ \theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I), \]

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \ldots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

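\[ \begin{aligned} L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot M\theta - \log(E[\exp(\zeta \cdot b_i(\gamma))])] \\ &= \sup_{\zeta \in \mathbb{R}^{I+2}} \Big[ M^T\zeta \cdot \theta - \log\Big( \Big[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Big] + \gamma_{I+1} \Big) \Big] \\ &= \sup_{\zeta \in \mathbb{R}^{I+2}} \Big[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\Big( \Big[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Big] + \gamma_{I+1} \Big) \Big]. \end{aligned} \]

Given any values $\mu_0, \ldots, \mu_{I+1}$, we can define $\zeta_0, \ldots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions, it is apparent that the last display is equal to

\[ \begin{aligned} &\sup_{\mu \in \mathbb{R}^{I+2}} \Big[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\Big( \Big[\sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1})\Big] + \gamma_{I+1} \Big) \Big] \\ &\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \Big[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\Big( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \Big) \Big] \\ &\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \Big[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\Big( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \Big) \Big]. \end{aligned} \]

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta\|\gamma)$, thereby completing the proof of the upper bound.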
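The final step can be illustrated numerically. For strictly positive $\theta$ and $\gamma$, the choice $\mu_j = \log(\theta_j/\gamma_j)$ attains the supremum in the last display, and no other $\mu$ does better. The Python sketch below evaluates the objective at this choice and at random alternatives; the function names, the sample vectors and the random search range are assumptions of this example, and the sketch is a spot check rather than a proof.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for strictly positive probability vectors."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma))

def dv_objective(mu, theta, gamma):
    """mu . theta - log(sum_j gamma_j exp(mu_j)), the bracket in the last display."""
    return (sum(m * t for m, t in zip(mu, theta))
            - math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu))))

def dv_maximizer(theta, gamma):
    """One maximizer, mu_j = log(theta_j / gamma_j); adding a constant to mu
    leaves the objective unchanged."""
    return [math.log(t / g) for t, g in zip(theta, gamma)]

theta = [0.5, 0.3, 0.2]
gamma = [0.2, 0.3, 0.5]
best = dv_objective(dv_maximizer(theta, gamma), theta, gamma)
rng = random.Random(0)
trials = [dv_objective([rng.uniform(-3, 3) for _ in theta], theta, gamma)
          for _ in range(1000)]
print(best, relative_entropy(theta, gamma), max(trials))
```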

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

\[ \liminf_{n\to\infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Big( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \Big) \ge -J(\gamma) - \varepsilon. \tag{3.1} \]

Of course this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > bx^K$ for all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$, and such that

\[ J(y) \le J(\gamma). \]

Consider the zero cost trajectory defined by

\[ \dot z(x) = M z(x), \qquad z(0) = \gamma(0). \]

We have the following expression for $z_j$ when $j \le I$:

\[ z_j(x) = \Big[ \sum_{k=0}^{j} \gamma_k(0) \frac{x^{j-k}}{(j-k)!} \Big] e^{-x}. \tag{3.2} \]

It is easy to check from this explicit formula that $z_j(x) > bx^K$ for some $b > 0$, $K = I$, and all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0,1)$, let $y^\rho = \rho z + (1-\rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z\|z) = 0$, we have

\[ \begin{aligned} J(y^\rho) &= \int_0^\beta D(\rho z(x) + (1-\rho)\theta(x) \,\|\, \rho z(x) + (1-\rho)\gamma(x))\, dx \\ &\le \rho \int_0^\beta D(z(x)\|z(x))\, dx + (1-\rho) \int_0^\beta D(\theta(x)\|\gamma(x))\, dx \\ &= (1-\rho) J(\gamma). \end{aligned} \]

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0,1)$.
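The convexity step above holds pointwise in $x$: for probability vectors $z$, $\theta$, $\gamma$ and $\rho \in (0,1)$, one has $D(\rho z + (1-\rho)\theta \,\|\, \rho z + (1-\rho)\gamma) \le (1-\rho) D(\theta\|\gamma)$. The following Python sketch spot-checks this inequality on random strictly positive vectors. The helper names and the sampling scheme are assumptions of this example; it illustrates the inequality but does not replace the cited lemma.

```python
import math
import random

def relative_entropy(p, q):
    """D(p || q) for strictly positive probability vectors."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def mix(rho, a, b):
    """Convex combination rho * a + (1 - rho) * b, componentwise."""
    return [rho * x + (1.0 - rho) * y for x, y in zip(a, b)]

def random_probability_vector(rng, size):
    """A strictly positive probability vector (entries bounded away from 0)."""
    w = [rng.random() + 1e-3 for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]

def convexity_gap(rho, z, theta, gamma):
    """(1 - rho) D(theta || gamma) - D(mixed rates || mixed occupancies).

    Joint convexity of relative entropy, with D(z || z) = 0, makes this >= 0.
    """
    return ((1.0 - rho) * relative_entropy(theta, gamma)
            - relative_entropy(mix(rho, z, theta), mix(rho, z, gamma)))

rng = random.Random(0)
gaps = []
for _ in range(200):
    z, theta, gamma = (random_probability_vector(rng, 5) for _ in range(3))
    gaps.append(convexity_gap(rng.uniform(0.01, 0.99), z, theta, gamma))
print(min(gaps))
```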

It follows that in proving the lower bound we can assume without loss of generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge bx^K$ for all $x \in [0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0,\beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

\[ h(\gamma) \doteq \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else}. \end{cases} \]

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor / n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0,\beta]$), we have the inequality

\[ \begin{aligned} &P_{\Gamma^n(0)}(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\lfloor n\tau \rfloor / n)| < \sigma) + e^{-2n\varepsilon} \\ &\qquad \ge P_{\Gamma^n(0)}(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\tau)| < \sigma/2) + e^{-2n\varepsilon} \\ &\qquad \ge E_{\Gamma^n(0)}(\exp\{-n h(\Gamma^n(\lfloor n\tau \rfloor / n))\}). \end{aligned} \]

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b^n_i$:

\[ \bar\Gamma^n\Big(\frac{i+1}{n}\Big) = \bar\Gamma^n\Big(\frac{i}{n}\Big) + \frac{1}{n}\, \bar b^n_i, \qquad \bar\Gamma^n(0) = \Gamma^n(0). \]

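Furthermore, the distribution of $\bar b^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n),\ 0 \le j \le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar\mu^n_i$ denote the (random) distribution of $\bar b^n_i$, given $\{\bar\Gamma^n(j/n),\ 0 \le j \le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\[ -\frac{1}{n} \log E_{\Gamma^n(0)}\Big( \exp\Big\{ -n h\Big( \Gamma^n\Big( \frac{\lfloor n\tau \rfloor}{n} \Big) \Big) \Big\} \Big) = \inf \bar E_{\Gamma^n(0)}\Big[ h\Big( \bar\Gamma^n\Big( \frac{\lfloor n\tau \rfloor}{n} \Big) \Big) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D(\bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)}) \Big], \]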

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b^n_i$. We can write $\gamma(\tau) - \gamma(0) = M v \tau$ for some probability vector $v$. Define a process $\bar b^n_i$ as follows:

\[ \bar b^n_i = \begin{cases} e_j - e_{j-1}, & \text{if } \Big\lfloor \sum_{k=0}^{j-1} v_k \tau n \Big\rfloor \le i \le \Big\lfloor \sum_{k=0}^{j} v_k \tau n \Big\rfloor - 1, \text{ for } 0 \le j \le I, \\ 0, & \text{if } \Big\lfloor \sum_{k=0}^{I} v_k \tau n \Big\rfloor \le i \le \lfloor \tau n \rfloor - 1. \end{cases} \]

In other words, $\bar b^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[ D(\bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)}) = \log\Big( \frac{1}{\bar\Gamma^n_{j-1}(i/n)} \Big) = -\log\Big( \bar\Gamma^n_{j-1}\Big( \frac{i}{n} \Big) \Big). \tag{3.3} \]

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k\tau n\rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor - 1$,

$$\bar\Gamma^n_{j-1}\left(\frac{i}{n}\right) \downarrow \bar\Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor \sum_{k=0}^{j} v_k\tau n\right\rfloor\right)$$

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor$, it follows that

$$\bar\Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor \sum_{k=0}^{j} v_k\tau n\right\rfloor\right) \to \gamma_{j-1}(\tau)$$

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (independent of $\tau$) such that, whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$

$$D\big(\bar\mu^n_i \,\big\|\, \mu_{\bar\Gamma^n(i/n)}\big) \le C_1[-\log \tau^K] \le -C_2 \log\tau.$$

In addition, as $n \to \infty$ and $\eta \to 0$,

$$\bar\Gamma^n\left(\frac{1}{n}\lfloor \tau n\rfloor\right) \to \gamma(\tau).$$

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\left(\exp\left[-n\,h\left(\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right)\right]\right) \le -C_2\,\tau\log\tau.$$

We now choose $\tau > 0$ so that $-C_2\tau\log\tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\bar\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0, \tau]$ w.p.1 if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$\liminf_{n\to\infty} \frac{1}{n}\log P_{\Gamma^n(0)}\left(\left|\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right) - \gamma\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right| < \sigma,\ \sup_{x\in[0,\tau]}|\Gamma^n(x) - \gamma(x)| < \delta\right) \ge -\frac{\varepsilon}{2}.$$

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0, \beta]$, we will use the Markov property and an existing lower bound for paths which avoid that boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0, \delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau, \beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology and, moreover, that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\lfloor n\tau\rfloor/n)| < \sigma$,

$$\liminf_{n\to\infty} \frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}\left(\sup_{\tau\le x\le\beta}|\Gamma^n(x) - \gamma(x)| < \zeta\right) \ge -J(\gamma) - \frac{\varepsilon}{2},$$

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

$$\mathcal{J}(\Omega) = \inf_{\omega\in\Omega} \mathcal{J}(\omega).$$

Using Theorem 2.7 we can write

$$\mathcal{J}(\Omega) = \inf_{\omega\in\Omega}\ \inf_{\pi\in F(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k D\big(\pi^{(k)} \,\big\|\, \mathcal{P}(\beta)\big) = \inf_{\pi\in F(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k D\big(\pi^{(k)} \,\big\|\, \mathcal{P}(\beta)\big), \tag{4.1}$$

where we have abused notation to define $F(\alpha,\Omega,\beta) = \bigcup_{\omega\in\Omega} F(\alpha,\omega,\beta)$.

In many cases, for example when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega\in\Omega^\circ} \mathcal{J}(\omega) = \inf_{\omega\in\bar\Omega} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n}\log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega\in\Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.

4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

$$\gamma_0(x) = \frac{1}{\rho}e^{-\rho x} + 1 - \frac{1}{\rho}.$$

Using (2.5), we find that $\rho$ is determined by the unique nonnegative solution to

$$\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.$$

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
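As a rough numerical illustration of the fixed point equation for $\rho$, the sketch below solves $\rho(1-\omega_0) = 1 - e^{-\beta\rho}$ by bisection and evaluates $\gamma_0$. The function names `twist_rho` and `gamma0`, the bracketing intervals, and the example values $\beta = 3$, $\omega_0 = 0.15$ (taken from the Figure 1 discussion) are choices made here for illustration and are not part of the paper.

```python
import math

def twist_rho(omega0, beta, iters=200):
    """Positive root of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection.

    f(rho) = 1 - exp(-beta*rho) - rho*(1 - omega0) is concave with f(0) = 0,
    so it has at most one positive root. The brackets below are assumptions
    of this sketch, chosen so that f changes sign across them.
    """
    if not max(0.0, 1.0 - beta) < omega0 < 1.0:
        raise ValueError("omega0 must lie strictly between max(0, 1 - beta) and 1")
    f = lambda rho: 1.0 - math.exp(-beta * rho) - rho * (1.0 - omega0)
    if f(1.0) == 0.0:
        return 1.0                       # omega0 equals the zero-cost value exp(-beta)
    if f(1.0) > 0.0:                     # omega0 > exp(-beta): root lies above 1
        pos, neg = 1.0, 1.0 / (1.0 - omega0)
    else:                                # omega0 < exp(-beta): root lies below 1
        pos, neg = (beta - 1.0 + omega0) / beta**2, 1.0
    for _ in range(iters):
        mid = 0.5 * (pos + neg)
        if f(mid) > 0.0:
            pos = mid
        else:
            neg = mid
    return 0.5 * (pos + neg)

def gamma0(x, rho):
    """Constrained fraction of empty urns, gamma_0(x) = exp(-rho*x)/rho + 1 - 1/rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

beta, omega0 = 3.0, 0.15
rho = twist_rho(omega0, beta)
print(rho, gamma0(0.0, rho), gamma0(beta, rho))
```

With more empty urns than expected the root lies above 1, and with fewer it lies below 1, which matches the interpretation of $\rho$ as a multiplicative factor on the rate at which balls enter occupied urns.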

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies, conditioned on an unusual number of empty urns, namely $\gamma_i(x) = \mathcal{P}_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns $\omega_0 = 0.15$ is three times larger than the expected value $\mathcal{P}_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

$$w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),$$

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I}\Gamma_i$,

$$w(r) = r - nI + n\sum_{i=0}^{I}(I - i)\Gamma_i(r/n).$$
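The identity for $w(r)$ can be checked on a small simulated configuration. The sketch below throws balls into infinite-capacity urns, counts the overflow directly as $\sum_{\text{urns}} \max(c - I, 0)$, and compares it with $r - nI + n\sum_{i=0}^{I}(I-i)\Gamma_i(r/n)$. The function names, the seed, and the sizes $n = 50$, $r = 120$, $I = 2$ are arbitrary choices for illustration.

```python
import random

def overflow_direct(counts, I):
    """Balls that would have fallen to the floor if every urn had capacity I."""
    return sum(max(c - I, 0) for c in counts)

def overflow_formula(counts, I):
    """Same quantity via w(r) = r - n*I + sum_{i<=I} (I - i) * (number of urns holding i balls).

    The number of urns holding exactly i balls equals n * Gamma_i(r/n),
    so the arithmetic stays in integers.
    """
    n, r = len(counts), sum(counts)
    urns_with = [sum(1 for c in counts if c == i) for i in range(I + 1)]
    return r - n * I + sum((I - i) * urns_with[i] for i in range(I + 1))

rng = random.Random(0)          # fixed seed so the run is reproducible
n, r, I = 50, 120, 2
counts = [0] * n
for _ in range(r):
    counts[rng.randrange(n)] += 1   # each ball picks an urn uniformly at random

print(overflow_direct(counts, I), overflow_formula(counts, I))
```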

In order to compute

$$J_O(\eta,\beta) \doteq -\lim_{n} \frac{1}{n}\log P\left(\frac{w(\beta n)}{n} > \eta\right),$$

we therefore consider sample paths which satisfy the end constraint

$$\sum_{i=0}^{I}(I - i)\gamma_i(\beta) \ge \eta + I - \beta \doteq \zeta. \tag{4.2}$$

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $\mathcal{P}(\beta)$ under the linear equality constraints

$$\sum_{i=0}^{\infty}\pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I}(I - i)\pi_i = \zeta. \tag{4.3}$$

We introduce the Lagrangian

$$L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right) + \lambda\left(\zeta - \sum_{i=0}^{I}(I - i)\pi_i\right),$$

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

$$\log\pi^*_i = \log\mathcal{P}_i(\beta) - 1 + y + iz + (I - i)^+\lambda.$$

In terms of variables $C$, $\rho$ and $\nu$, we may write

$$\pi^*_i = C\,\mathcal{P}_i(\rho\beta) \quad\text{for } i \ge I$$

and

$$\pi^*_i = C\,\mathcal{P}_i(\rho\beta)\,\nu^{I-i} \quad\text{for } i \le I.$$

The distribution is conditionally Poisson($\rho\beta/\nu$) for $i \le I$ and conditionally Poisson($\rho\beta$) for $i \ge I$. For convenience, we introduce the notation

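$$Q_I(\rho) \doteq \sum_{i=I}^{\infty}\mathcal{P}_i(\rho\beta) = \frac{1}{\rho\beta}\sum_{i=I+1}^{\infty} i\,\mathcal{P}_i(\rho\beta),$$

$$R_I(\rho,\nu) \doteq \nu^{I}e^{-\rho\beta(1-1/\nu)}\sum_{i=0}^{I-1}\mathcal{P}_i\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta}e^{-\rho\beta(1-1/\nu)}\sum_{i=1}^{I} i\,\mathcal{P}_i\left(\frac{\rho\beta}{\nu}\right).$$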

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

$$C\big(R_I(\rho,\nu) + Q_I(\rho)\big) = 1,$$

$$C\left(\frac{\rho\beta}{\nu}R_I(\rho,\nu) + \rho\beta\,Q_I(\rho)\right) = \beta,$$

$$C\left(I\,\mathcal{P}_I(\rho\beta) + \left(I - \frac{\rho\beta}{\nu}\right)R_I(\rho,\nu)\right) = \zeta.$$

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\,\|\,\mathcal{P}(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

$$(\rho/\nu - 1)R_I(\rho,\nu) + (\rho - 1)Q_I(\rho) = 0,$$

$$(I - \rho\beta/\nu - \zeta)R_I(\rho,\nu) - \zeta Q_I(\rho) + I\,\mathcal{P}_I(\rho\beta) = 0.$$

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.
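Before solving for $(\rho,\nu)$ it can be useful to confirm that the closed forms $Q_I$ and $R_I$ really reproduce the three constraint sums. The sketch below compares truncated sums over the unnormalized weights $\mathcal{P}_i(\rho\beta)\,\nu^{(I-i)^+}$ with the corresponding expressions in $Q_I$ and $R_I$. The parameter values, the truncation length and all function names are assumptions made for illustration; the values of $\rho$ and $\nu$ are arbitrary and do not solve the constraint equations.

```python
import math

def pois(i, mu):
    """Poisson(mu) probability of i, computed in log space to avoid overflow."""
    return math.exp(i * math.log(mu) - mu - math.lgamma(i + 1))

def Q(I, rho, beta, terms=200):
    """Q_I(rho): Poisson(rho*beta) tail mass from I upward (truncated sum)."""
    return sum(pois(i, rho * beta) for i in range(I, I + terms))

def R(I, rho, nu, beta):
    """R_I(rho, nu): closed form for sum_{i<I} P_i(rho*beta) * nu**(I - i)."""
    mu = rho * beta / nu
    return nu**I * math.exp(-rho * beta * (1.0 - 1.0 / nu)) * sum(pois(i, mu) for i in range(I))

def weight(i, I, rho, nu, beta):
    """Unnormalized minimizer pi*_i / C = P_i(rho*beta) * nu**((I - i)^+)."""
    return pois(i, rho * beta) * nu**max(I - i, 0)

I, beta, rho, nu, terms = 3, 2.5, 1.2, 1.5, 200   # arbitrary illustrative values
idx = range(I + terms)
mass = sum(weight(i, I, rho, nu, beta) for i in idx)
mean = sum(i * weight(i, I, rho, nu, beta) for i in idx)
spare = sum((I - i) * weight(i, I, rho, nu, beta) for i in range(I + 1))

RI, QI = R(I, rho, nu, beta), Q(I, rho, beta, terms)
print(mass, RI + QI)
print(mean, rho * beta / nu * RI + rho * beta * QI)
print(spare, I * pois(I, rho * beta) + (I - rho * beta / nu) * RI)
```

Dividing each pair by `mass` gives the normalized constraint values, so a root finder for $(\rho,\nu)$ only needs these three closed forms.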

The large deviations exponent for the overflow problem may then be expressed as

$$J_O(\eta,\beta) = D\big(\pi^* \,\big\|\, \mathcal{P}(\beta)\big) = \sum_{i=0}^{\infty}\pi^*_i\log\big(Ce^{\beta-\rho\beta}\rho^{i}\nu^{(I-i)^+}\big) = \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.$$

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

$$J_C(\alpha,\beta,\xi) \doteq -\lim_{n}\frac{1}{n}\log P\left(\sum_{i=0}^{I}\Gamma^n_i(\beta) < \xi\right),$$

where

$$\xi < \sum_{k=0}^{K}\alpha_k\sum_{i=0}^{I-k}\mathcal{P}_i(\beta)$$

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

$$\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{I-k}\pi^{(k)}_j = \xi,$$

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

$$\omega_{kj} = \pi^{*(k)}_j = \begin{cases} C_k\,\mathcal{P}_j(\rho\beta)\,W, & j + k \le I,\\ C_k\,\mathcal{P}_j(\rho\beta), & j + k > I. \end{cases}$$

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

$$J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K}\alpha_k\log C_k.$$
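One possible way to determine the constants numerically is a nested bisection, sketched below under the following assumptions: each $\pi^{*(k)}$ is normalized to a probability distribution, which fixes $C_k$ in terms of $\rho$ and $W$; the low-occupancy constraint is solved for $W$ at fixed $\rho$; and the conservation constraint (total mean equal to $\beta$) is then solved for $\rho$. The function names, the bracketing intervals and the iteration counts are choices made here, not part of the paper. The inputs are those of the worked example around Figure 2 ($\alpha = [0.5, 0.3, 0.2]$, $I = 3$, $\beta = 2$, $\xi = 0.55$), for which the paper reports $J_C \approx 0.18$.

```python
import math

def pois_cdf(m, mu):
    """P(Poisson(mu) <= m); an empty sum (m < 0) gives 0."""
    return sum(math.exp(-mu) * mu**j / math.factorial(j) for j in range(m + 1))

def pieces(rho, W, alpha, I, beta):
    """Normalizers C_k, low-occupancy mass and total mean for the tilted form."""
    mu = rho * beta
    C, low, mean = [], 0.0, 0.0
    for k, a in enumerate(alpha):
        F = pois_cdf(I - k, mu)          # Poisson mass on j with j + k <= I
        G = pois_cdf(I - k - 1, mu)      # used for the mean, since j*P_j(mu) = mu*P_{j-1}(mu)
        Ck = 1.0 / (W * F + 1.0 - F)     # makes pi^(k) sum to one
        C.append(Ck)
        low += a * Ck * W * F
        mean += a * Ck * mu * (W * G + 1.0 - G)
    return C, low, mean

def bisect(f, lo, hi, iters):
    """Root of f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coupon_exponent(alpha, I, beta, xi):
    # Inner solve: W such that the low-occupancy mass equals xi (increasing in W).
    W_of = lambda rho: bisect(lambda W: pieces(rho, W, alpha, I, beta)[1] - xi, 0.0, 1e6, 200)
    # Outer solve: rho such that the mean number of added coupons per type equals beta.
    # The bracket [0.05, 5] is an assumption that suits this example.
    rho = bisect(lambda r: pieces(r, W_of(r), alpha, I, beta)[2] - beta, 0.05, 5.0, 60)
    W = W_of(rho)
    C = pieces(rho, W, alpha, I, beta)[0]
    J = (beta * (1.0 - rho + math.log(rho)) + xi * math.log(W)
         + sum(a * math.log(c) for a, c in zip(alpha, C)))
    return rho, W, C, J

alpha, I, beta, xi = [0.5, 0.3, 0.2], 3, 2.0, 0.55
expected_low = sum(a * pois_cdf(I - k, beta) for k, a in enumerate(alpha))  # zero-cost value
rho, W, C, J = coupon_exponent(alpha, I, beta, xi)
print(expected_low, rho, W, J)
```

The quantity `expected_low` is the zero-cost fraction of types with $I$ or fewer coupons, which is the bound that $\xi$ must fall below for the event to be a rare one.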

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14, using the Euler–Lagrange equations together with properties of the relative entropy.

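A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

$$\sum_{j=0}^{i}\alpha_j \ge \sum_{j=0}^{i}\omega_j, \qquad i = 0, \ldots, I \quad\text{(monotonicity)} \tag{A.4}$$

and

$$\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad\text{(conservation)}. \tag{A.5}$$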

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

$$\sum_{i=0}^{I}\left(\sum_{j=0}^{i}\alpha_j - \sum_{j=0}^{i}\omega_j\right) = \sum_{j=0}^{I}(I + 1 - j)(\alpha_j - \omega_j) = (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j). \tag{A.6}$$

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

$$\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}$$

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i \le 1$ follows from (A.5).
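The two feasibility conditions are easy to test mechanically. The sketch below checks (A.4) and (A.5) for given endpoint data, assuming `alpha` lists $\alpha_0, \ldots, \alpha_{I+1}$, `omega` lists $\omega_0, \ldots, \omega_I$, and $\omega_{I+} = 1 - \sum_{i=0}^{I}\omega_i$. The function name `is_feasible` and the numerical tolerance are illustrative choices.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (A.4) and conservation (A.5) for endpoint data.

    alpha: initial occupancies alpha_0..alpha_{I+1} (length I + 2)
    omega: terminal occupancies omega_0..omega_I (length I + 1)
    beta:  balls thrown per urn
    """
    I = len(omega) - 1
    if len(alpha) != I + 2:
        raise ValueError("alpha must have exactly one more entry than omega")
    omega_plus = 1.0 - sum(omega)                 # terminal mass above occupancy I
    # (A.4): cumulative low-occupancy mass can only decrease.
    monotone = all(sum(alpha[:i + 1]) + tol >= sum(omega[:i + 1]) for i in range(I + 1))
    # (A.5): the balls accounted for at the end cannot exceed initial balls plus beta.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega_plus
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return monotone and lhs <= rhs + tol

print(is_feasible([1.0, 0.0], [0.15], 3.0))   # classical example: empty start, I = 0
print(is_feasible([1.0, 0.0], [0.15], 0.5))   # too few balls to occupy 85% of the urns
print(is_feasible([0.5, 0.5], [0.6], 3.0))    # empty fraction cannot grow
```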

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

$$J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx.$$

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

$$L(\psi, \theta) = D(\theta\,\|\,\gamma) = \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} = \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left(1 - \sum_{i=0}^{I}\theta_i\right)\log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}.$$

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

$$\frac{\partial L}{\partial\psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x), \theta(x))$$

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

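In the case at hand, the Euler–Lagrange equations are given by

$$-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\right] \tag{A.7}$$

for $i = 0, \ldots, I - 1$, and by

$$-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\left[-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\right]. \tag{A.8}$$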

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

$$L(\psi, \theta) = \sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}$$

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

$$-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\right] \tag{A.10}$$

for $i = 0, \ldots, I - 1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

$$\sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} C\,\mathcal{P}_i(\rho\beta) = 1,$$

$$\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} iC\,\mathcal{P}_i(\rho\beta) = \beta.$$

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

$$\psi_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I}\big(\omega_k - C\,\mathcal{P}_k(\rho\beta)\big)\left(1 - \frac{x}{\beta}\right)^{k}, \tag{A.8}$$

$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x)$$

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.
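The construction in Theorem A.1 can be explored numerically. The sketch below is one way to do so, under assumptions made here: it treats only the exponential case (total terminal mass $\sum_i \omega_i < 1$ and strict inequality in conservation), finds $\rho\beta$ by bisection on the mean of a Poisson variable conditioned to exceed $I$, and then evaluates $\gamma_i(x)$ from the explicit derivatives of $\psi_0$ (the formula used in the proof of Lemma A.2). Function names, the truncation length and the example data $\omega = [0.1, 0.2]$, $\beta = 3$ are hypothetical.

```python
import math

def pmf(i, mu):
    """Poisson(mu) probability of i, computed in log space."""
    return math.exp(i * math.log(mu) - mu - math.lgamma(i + 1))

def twist_parameters(omega, beta, terms=400):
    """Solve the two twist equations for (C, rho) in the exponential case."""
    I = len(omega) - 1
    mass = 1.0 - sum(omega)                              # mass left for occupancies > I
    if mass <= 0.0:
        raise ValueError("polynomial case: no mass above occupancy I")
    target = (beta - sum(i * w for i, w in enumerate(omega))) / mass
    if target <= I + 1:
        raise ValueError("constraints infeasible or not in the exponential case")
    idx = range(I + 1, I + 1 + terms)
    tail = lambda mu: sum(pmf(i, mu) for i in idx)
    cond_mean = lambda mu: sum(i * pmf(i, mu) for i in idx) / tail(mu)
    lo, hi = 1e-6, target        # cond_mean(lo) is close to I + 1; cond_mean(mu) >= mu
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return mass / tail(mu), mu / beta

def gamma(i, x, omega, beta, C, rho):
    """gamma_i(x) = x^i / i! * (-1)^i * psi_0^(i)(x) for any i >= 0."""
    I = len(omega) - 1
    d = rho**i * C * math.exp(-rho * x)                  # exponential part of (-1)^i psi_0^(i)
    for k in range(i, I + 1):                            # polynomial part; empty when i > I
        coeff = omega[k] - C * pmf(k, rho * beta)
        d += coeff * beta**(-i) * math.factorial(k) / math.factorial(k - i) * (1.0 - x / beta)**(k - i)
    return x**i / math.factorial(i) * d

omega, beta = [0.1, 0.2], 3.0
C, rho = twist_parameters(omega, beta)
total = sum(gamma(i, 1.5, omega, beta, C, rho) for i in range(80))
mean = sum(i * gamma(i, 1.5, omega, beta, C, rho) for i in range(80))
print(C, rho, total, mean)
```

In this sketch the occupancies should sum to one at every $x$ and their mean should equal $x$, reflecting $\psi_0(0) = 1$ and $\psi_0^{(1)}(0) = -1$; the terminal values $\gamma_i(\beta)$ should reproduce $\omega_i$.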

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th-order polynomial

$$\psi_0(x) = \sum_{k=0}^{I}\omega_k\left(1 - \frac{x}{\beta}\right)^{k}. \tag{A.10}$$

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

$$(-1)^i\varphi^{(i)}(x) \ge 0 \quad\text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11}$$

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

$$(-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} + \sum_{k=i}^{I}\big(\omega_k - C\,\mathcal{P}_k(\rho\beta)\big)\beta^{-i}\frac{k!}{(k-i)!}\left(1 - \frac{x}{\beta}\right)^{k-i}$$

for $i = 0, \ldots, I$, and

$$(-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} \tag{A.12}$$

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) \doteq (-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

$$\varphi(\beta) = (-1)^I\psi_0^{(I)}(\beta) = \rho^I Ce^{-\rho\beta} + \left(\omega_I - Ce^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\right)\beta^{-I}I! = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I \ge 0,$$

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

$$(-1)^I\psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I > 0 \quad\text{for all } x \in [0, \beta].$$

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

$$\psi_0(0) = C\left(1 - \sum_{i=0}^{I}\mathcal{P}_i(\rho\beta)\right) + \sum_{i=0}^{I}\omega_i = 1.$$

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots. \tag{A.13}$$

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

$$\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.$$

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

$$\begin{aligned}\theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i}\dot\gamma_k(x)\\ &= \sum_{k=1}^{i}-\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x)\\ &= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0\end{aligned} \tag{A.14}$$

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

$$\sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty}-\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.$$

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

$$\sum_{i=0}^{I}\theta_i(x) \le 1$$

over an arbitrary subinterval of $[0, \beta]$. Thus, the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case, $C > 0$. The $I+$ terms satisfy the simple expressions

$$\theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty}-\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty}\mathcal{P}_i(\rho x),$$

$$1 - \psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty}\mathcal{P}_i(\rho x),$$

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

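From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15}$$

Then

$$\frac{d}{dx}\left[-\log\frac{\theta_i}{\gamma_i}\right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16}$$

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).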

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

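Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$ with

$$(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad\text{for } i > I,$$

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

$$\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,$$

$$\sum_{i=0}^{\infty}-\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},$$

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{\infty}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A.17}$$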

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

$$\theta_{I+} = 1 - \sum_{i=0}^{I}-\dot\psi_i = \sum_{i=I+1}^{\infty}-\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1 - \psi_I).$$

Then for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,

$$\begin{aligned}L(\psi, \dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I}\\ &= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).\end{aligned}$$

Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log\big(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\big)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

$$J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.$$

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

$$\begin{aligned}&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx\\ &\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\big(\dot\psi_i(x) - \dot\psi_{i-1}(x)\big)\log|\psi_0^{(i)}(x)|\,dx.\end{aligned} \tag{A.18}$$

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$, since for $i > I$

$$-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],$$

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

$$\sum_{i=0}^{\infty}\Big[\gamma_i\log|\psi_0^{(i)}|\Big]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i\,dx.$$

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I - 1,
\]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[\gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19}
\]


Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i |\psi_0^{(i)}(x)| / i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \| \mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[
J(\gamma) = \min_{\pi \in \mathcal{F}(1, \omega, \beta)} D(\pi \| \mathcal{P}(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi_i^* \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi_i^*$, $z$ and $y$ and the strict convexity of $x \log x$ imply that $\pi_i^*$ is the unique global minimizer of $x \mapsto x \log x - x[\log \mathcal{P}_i(\beta) + y + iz]$. Indeed, setting the derivative to zero gives $x = \mathcal{P}_i(\beta) e^{y - 1 + iz} = C\mathcal{P}_i(\rho\beta)$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[
\pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - zi\pi_i \ge \pi_i^* \log\frac{\pi_i^*}{\mathcal{P}_i(\beta)} - y\pi_i^* - zi\pi_i^*.
\]


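Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in \mathcal{F}(1, \omega, \beta)} D(\pi \| \mathcal{P}(\beta)) &= \inf_{\pi \in \mathcal{F}(1, \omega, \beta)} L(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
&= D(\pi^* \| \mathcal{P}(\beta)).
\end{aligned}
\]

Since $\pi^* \in \mathcal{F}(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.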

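Substituting the particular form of $\pi_i^*$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \| \mathcal{P}(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C\mathcal{P}_i(\rho\beta), & i > I.
\end{cases}
\]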

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)(\log C + (1 - \rho)\beta) + \Bigl(\beta - \sum_{i=0}^{I} i\omega_i\Bigr) \log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
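The explicit expression for the exponential cost can be checked numerically against the relative entropy it is meant to equal. The following Python sketch does this for $I = 1$. The parameter values ($\rho = 1.2$, $C = 0.9$, $\beta = 1$), the truncation of the Poisson tail and all function names are illustrative assumptions of ours, not part of the original argument; in particular, $C$ is simply chosen here rather than computed from (2.6).

```python
import math

def log_poisson(j, mean):
    """log P_j(mean) for the Poisson distribution with the given mean."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def terminal_distribution(rho, C, beta, n_terms=60):
    """Terminal law with I = 1: (omega_0, omega_1), then the tail C * P_i(rho * beta).

    omega_0 and omega_1 are solved from the unit-mass and mean-beta constraints.
    The tail is truncated at n_terms, which is harmless for moderate means.
    """
    tail = [C * math.exp(log_poisson(i, rho * beta)) for i in range(2, n_terms)]
    tail_mass = sum(tail)
    tail_mean = sum(i * p for i, p in enumerate(tail, start=2))
    omega_1 = beta - tail_mean
    omega_0 = 1.0 - tail_mass - omega_1
    return [omega_0, omega_1] + tail

def relative_entropy(pi, beta):
    """D(pi || P(beta)), with the convention 0 log 0 = 0."""
    return sum(p * (math.log(p) - log_poisson(i, beta))
               for i, p in enumerate(pi) if p > 0)

def explicit_cost(omega, rho, C, beta):
    """Three-term expression for J(gamma) in the exponential case."""
    head = sum(w * (math.log(w) - log_poisson(i, beta)) for i, w in enumerate(omega))
    mass_left = 1.0 - sum(omega)
    mean_left = beta - sum(i * w for i, w in enumerate(omega))
    return (head + mass_left * (math.log(C) + (1.0 - rho) * beta)
            + mean_left * math.log(rho))

rho, C, beta = 1.2, 0.9, 1.0
pi = terminal_distribution(rho, C, beta)
print(relative_entropy(pi, beta), explicit_cost(pi[:2], rho, C, beta))
```

The two printed numbers should agree up to rounding, because for $i > I$ each summand of the relative entropy equals $\pi_i[\log C + (1 - \rho)\beta + i \log\rho]$.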

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers and that the extremal curves can be constructed using the Lagrange multipliers.

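The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \| \mathcal{P}(\beta)) \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \qquad \text{for all } 0 \le i \le I. \tag{A.22}
\]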

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega \in S_{\bar I}$, where $\bar I > I$. Any feasible point for constraints $(\alpha, \bar\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$ and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
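The ordered filling construction can be sketched in a few lines of Python. This is one possible reading of the construction, under assumptions that we state explicitly: all $\alpha_k$ are strictly positive, $K \le I$, the constraints are polynomial, and mass is always drawn from the lowest-indexed class first (the text leaves this order open). The function name `ordered_fill` and the example constraints are hypothetical.

```python
def ordered_fill(alpha, omega, tol=1e-12):
    """Greedy ordered filling for polynomial constraints.

    alpha[k]: fraction of urns starting with k balls (assumed > 0, with K <= I).
    omega[i]: required fraction of urns ending with i balls, i = 0..I.
    Returns pi with pi[k][j] = fraction of class-k urns that receive j balls.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    unused = list(alpha)                                  # class mass not yet assigned
    mass = [[0.0] * (I - k + 1) for k in range(K + 1)]    # mass[k][j] = alpha_k * pi_kj
    for i in range(I + 1):
        need = omega[i]
        for k in range(min(i, K) + 1):                    # only classes k <= i reach level i
            take = min(need, unused[k])
            mass[k][i - k] += take
            unused[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"monotonicity condition fails at level {i}")
    return [[m / alpha[k] for m in mass[k]] for k in range(K + 1)]

# Illustrative constraints: half the urns start empty, half start with one ball.
alpha, omega = [0.5, 0.5], [0.2, 0.3, 0.5]
pi = ordered_fill(alpha, omega)
print(pi)
```

For these illustrative constraints the monotonicity conditions hold, each $\pi^{(k)}$ ends with unit mass, and the resulting mean $\sum_k \alpha_k \sum_j j\pi_{kj}$ equals $\sum_i i\omega_i - \sum_k k\alpha_k$, which is the value of $\beta$ forced by equality in the conservation condition.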

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \Bigl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_k \alpha_k D(\pi^{(k)} \| \mathcal{P}(\beta)) \le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[
Q_A(\alpha, \omega, \beta) \doteq \mathcal{F}(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\pi^{(n)}_{kj} \le \varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x \log x + x - 1]$, we have the inequality

\[
xy \le (x \log x - x + 1) + (e^y - 1),
\]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{(1/\delta)j} \mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{(1/\delta)j} \frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
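Both facts just stated are easy to confirm numerically. The Python sketch below evaluates the gap in the inequality on a small grid and compares a truncated version of the exponential moment sum with its closed form. The grid, the values $\beta = 1$ and $\delta = 2$, the truncation level and the function names are our own illustrative choices.

```python
import math

def fenchel_gap(x, y):
    """Right side minus left side of  x*y <= (x log x - x + 1) + (e^y - 1)."""
    xlogx = x * math.log(x) if x > 0 else 0.0      # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, n_terms=200):
    """Truncated sum of e^{j/delta} * P_j(beta), built up term by term."""
    total = 0.0
    term = math.exp(-beta)                          # value of e^{j/delta} P_j(beta) at j = 0
    for j in range(n_terms):
        total += term
        term *= math.exp(1.0 / delta) * beta / (j + 1)
    return total

# The gap is nonnegative everywhere and vanishes at x = e^y.
grid = [(x / 4.0, y / 2.0) for x in range(0, 21) for y in range(-6, 7)]
print(min(fenchel_gap(x, y) for x, y in grid), fenchel_gap(math.exp(0.7), 0.7))

beta, delta = 1.0, 2.0
print(poisson_exp_moment(beta, delta), math.exp(beta * math.exp(1.0 / delta) - beta))
```

The supremum in the variational formula is attained at $x = e^y$, which is why the gap is zero there.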


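Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/\mathcal{P}_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\pi^{(n)}_{kj} &= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
&\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{kj} \log\Bigl(\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\Bigr) - \pi^{(n)}_{kj} + \mathcal{P}_j(\beta)\Bigr] + \delta \sum_{j \ge m} (e^{(1/\delta)j} - 1)\mathcal{P}_j(\beta) \\
&= \delta D(\pi^{(n)(k)} \| \mathcal{P}(\beta)) + \delta \sum_{j \ge m} (e^{(1/\delta)j} - 1)\mathcal{P}_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{(1/\delta)j} - 1)\mathcal{P}_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus, (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]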
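To see the tail estimate in action, the following Python sketch compares both sides for one concrete choice of distribution. As an assumption of ours, $\pi$ is taken to be a single Poisson law with mean $\lambda = 2$ and the reference law has $\beta = 1$, so that the relative entropy has the closed form $\beta - \lambda + \lambda\log(\lambda/\beta)$; the function names and the truncation level are likewise illustrative. The sums are evaluated in log space, and $\delta$ should be kept moderate, since the closed form $e^{\beta e^{1/\delta} - \beta}$ overflows floating point for very small $\delta$.

```python
import math

def log_poisson(j, mean):
    """log P_j(mean) for the Poisson distribution with the given mean."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def tail_mean(lam, m, n_terms=150):
    """Truncated sum over j >= m of j * pi_j, for pi = Poisson(lam)."""
    return sum(j * math.exp(log_poisson(j, lam)) for j in range(m, n_terms))

def tail_bound(lam, beta, delta, m, n_terms=150):
    """delta * D(pi || P(beta)) + delta * sum_{j >= m} (e^{j/delta} - 1) P_j(beta)."""
    rel_entropy = beta - lam + lam * math.log(lam / beta)   # closed form, Poisson vs Poisson
    tail = 0.0
    for j in range(m, n_terms):
        lp = log_poisson(j, beta)
        tail += math.exp(j / delta + lp) - math.exp(lp)     # combine exponents before exp
    return delta * (rel_entropy + tail)

for delta in (1.0, 0.5):
    for m in (0, 5, 10, 20):
        print(delta, m, tail_mean(2.0, m), tail_bound(2.0, 1.0, delta, m))
```

For fixed $\delta$ the second term of the bound vanishes as $m$ grows, leaving $\delta D(\pi \| \mathcal{P}(\beta))$; this is the reason $\delta$ is chosen first and $m$ second.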

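We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta_k^{(n)} \to \beta_k.
\]

Finally,

\[
\sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta_k^{(n)} = \lim_n \beta = \beta,
\]

so that $\pi \in \mathcal{F}(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \| \mathcal{P}(\beta)) \ge 0.
\]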

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)(k)} \| \mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation, we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \| \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*(k)} \| \mathcal{P}(\beta)) \le \liminf_n \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)(k)} \| \mathcal{P}(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

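Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)
\]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).
\]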

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta_k^{(n)} \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{kj}, \qquad k = 0, 1, \ldots, K.
\]

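If $\alpha_k^{(n)} \to 0$ and $\beta_k^{(n)}\alpha_k^{(n)} \not\to 0$, then the term $\alpha_k^{(n)} D(\pi^{(n)(k)} \| \mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \| \mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda \log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha_k^{(n)}\beta_k^{(n)} \to \alpha_k\beta_k$, even if $\alpha_k = 0$.

Thus,

\[
\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta_k^{(n)}\alpha_k^{(n)} = \lim_n \beta^{(n)} = \beta,
\]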

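and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha_k^{(n)} D(\pi^{(n)(k)} \| \mathcal{P}(\beta^{(n)})) &\ge \liminf_n \sum_{k : \alpha_k > 0} \alpha_k^{(n)} D(\pi^{(n)(k)} \| \mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k D(\pi^{(k)} \| \mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}
\]

as desired.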
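The formula $\beta - \lambda + \lambda\log(\lambda/\beta)$ used in the proof above is the relative entropy of a Poisson law with mean $\lambda$ with respect to $\mathcal{P}(\beta)$. The Python sketch below checks this numerically, compares it with one other distribution of the same mean (a geometric law, chosen by us purely for illustration), and shows how $\alpha$ times the minimal cost grows when $\alpha \to 0$ with $\alpha\lambda$ held fixed. All names, parameter values and truncation levels are assumptions of the sketch.

```python
import math

def log_poisson(j, mean):
    """log P_j(mean) for the Poisson distribution with the given mean."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def relative_entropy(pi, beta):
    """D(pi || P(beta)), with the convention 0 log 0 = 0."""
    return sum(p * (math.log(p) - log_poisson(j, beta))
               for j, p in enumerate(pi) if p > 0)

def poisson_law(lam, n_terms=120):
    return [math.exp(log_poisson(j, lam)) for j in range(n_terms)]

def geometric_law(lam, n_terms=400):
    """Geometric law on {0, 1, ...} with mean lam (truncated)."""
    q = lam / (1.0 + lam)
    return [(1.0 - q) * q ** j for j in range(n_terms)]

def min_cost_given_mean(lam, beta):
    """Infimum of D(pi || P(beta)) over laws pi with mean lam."""
    return beta - lam + lam * math.log(lam / beta)

lam, beta = 2.0, 1.0
print(relative_entropy(poisson_law(lam), beta), min_cost_given_mean(lam, beta))
print(relative_entropy(geometric_law(lam), beta))

# With alpha * lam = c held fixed, alpha * (minimal cost) grows as alpha -> 0.
c = 0.5
print([a * min_cost_given_mean(c / a, beta) for a in (0.1, 0.01, 0.001)])
```

The last line illustrates the divergence invoked in the proof: with $\alpha\lambda = c$ fixed, $\alpha(\beta - \lambda + \lambda\log(\lambda/\beta)) = \alpha\beta - c + c\log(c/(\alpha\beta))$, which tends to infinity as $\alpha \to 0$.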

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints, the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\hat\pi_{mn} > 0$, that $\hat\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

\[
\hat\omega_i \doteq \sum_{k=0}^{i} \alpha_k\hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\hat\pi_{kj},
\]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha, \hat\omega, \hat\beta)$. Next, for any $\eta > 0$, we may define

\[
\tilde\beta \doteq (\beta - \eta\hat\beta)/(1 - \eta), \qquad \tilde\omega \doteq (\omega - \eta\hat\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \tilde\omega, \tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha, \omega, \beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \hat\omega, \hat\beta)$, and hence for $(\alpha, \tilde\omega, \tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha, \tilde\omega, \tilde\beta)$, and finally form $\bar\pi = \eta\hat\pi + (1 - \eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha, \omega, \beta)$ and, in addition, $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

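We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Bigl(\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} + 1\Bigr)(\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\bar\pi_{mn} \log\frac{\pi^\varepsilon_{mn}}{\mathcal{P}_n(\beta)} + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \bar\pi_{kj} \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{kj} \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_k \alpha_k D(\pi^{(k)} \| \mathcal{P}(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj} \log \mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.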

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k \mathcal{P}_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence, the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} + \sum_{k \in \mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{k+l \in \mathcal{I}} \pi_{kl}\Bigr) + \sum_{i \in \mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = \mathcal{P}_j(\beta) e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
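The product form $D_k \mathcal{P}_j(\beta) W_{k+j}$ suggests a simple way to compute the minimizer numerically in small polynomial examples: alternately rescale the $D_k$ so that each $\pi^{(k)}$ has unit mass and the $W_i$ so that the terminal conditions (A.22) hold. This is iterative proportional fitting, a standard matrix-scaling technique that we substitute here for illustration; it is not the method of the proof. The Python sketch assumes strictly positive $\alpha_k$ and $\omega_i$ and $K \le I$, and the function name, iteration count and example constraints are hypothetical.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability P_j(mean)."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def scale_to_constraints(alpha, omega, beta, n_iter=2000):
    """Iterative proportional fitting for pi_kj = D_k * P_j(beta) * W_{k+j}.

    Alternates between fixing D (each pi^(k) sums to 1) and fixing W
    (sum_k alpha_k * pi_{k,i-k} = omega_i for every level i).
    """
    K, I = len(alpha) - 1, len(omega) - 1
    D = [1.0] * (K + 1)
    W = [1.0] * (I + 1)
    for _ in range(n_iter):
        for k in range(K + 1):
            D[k] = 1.0 / sum(poisson_pmf(j, beta) * W[k + j] for j in range(I - k + 1))
        for i in range(I + 1):
            col = sum(alpha[k] * D[k] * poisson_pmf(i - k, beta)
                      for k in range(min(i, K) + 1))
            W[i] = omega[i] / col
    pi = [[D[k] * poisson_pmf(j, beta) * W[k + j] for j in range(I - k + 1)]
          for k in range(K + 1)]
    return D, W, pi

# Illustrative polynomial constraints with beta = sum_i i*omega_i - sum_k k*alpha_k = 0.8.
alpha, omega, beta = [0.5, 0.5], [0.2, 0.3, 0.5], 0.8
D, W, pi = scale_to_constraints(alpha, omega, beta)
print(pi)
```

By construction the output has the product form of the theorem, so any cross-ratio such as $\pi_{01}\pi_{11}/(\pi_{02}\pi_{10})$ equals the corresponding Poisson cross-ratio $\mathcal{P}_1(\beta)^2/(\mathcal{P}_2(\beta)\mathcal{P}_0(\beta)) = 2$.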

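We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k \mathcal{P}_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k \mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise}.
\end{cases}
\]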

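Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\pi^*_{kl}, \qquad \eta_k^{(M)} \doteq \sum_{l \le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
\]

subject to the constraints

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\pi_{kl} = \beta^{(M)},
\]

\[
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} = \eta_k^{(M)}, \qquad k \in \mathcal{K},
\]

\[
\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.
\]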

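By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} + y^{(M)}\Bigl(\beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\pi_{kl}\Bigr) \\
&\quad + \sum_{k \in \mathcal{K}} z_k^{(M)}\alpha_k\Bigl(\eta_k^{(M)} - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl}\Bigr) + \sum_{i \in \mathcal{I}} w_i^{(M)}\Bigl(\omega_i - \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\pi_{k,i-k}\Bigr).
\end{aligned}
\]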

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = \mathcal{P}_j(\beta) e^{jy^{(M)} - 1 + z_k^{(M)} + w_{k+j}^{(M)}}
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\, e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w_{k+j}^{(M)} = 0$ if $k + j > I$, it follows that $z_k^{(M)}$ and $w_i^{(M)}$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{*(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega^{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega^{(k)}, \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha, \omega, \beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega^{(k)}, \beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha, \omega, \beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also for convenience, define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0, \beta]$ and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence, the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

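First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\tilde\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k\mathcal{P}_i(\rho_k\beta_k))\Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^i \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k\mathcal{P}_i(\rho\beta))\Bigl(1 - \frac{x}{\beta}\Bigr)^i \\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)\mathcal{P}_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^i\Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)\mathcal{P}_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Bigr] \\
&= \alpha_0C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)\mathcal{P}_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^j\Bigr] \\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\tilde\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]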

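Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{i-k}}{dx^{i-k}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\frac{\beta_k}{\beta}\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{i-k+1}}{dx^{i-k+1}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{aligned}
\]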

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\tilde\psi_{k0}(x) \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x), \theta(x))\, dx + \frac{\partial L}{\partial\theta_i}(\psi(x'), \theta(x')) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

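Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence, the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \Bigl[\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(\beta) \log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K} \alpha_k \log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta) \log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)e^{-\beta}}.
\end{aligned}
\]

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^j \psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k\cdot}(\beta) \| \mathcal{P}(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).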

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha, \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \| \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.

(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^\varepsilon \doteq (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \| \gamma^\varepsilon(x))\, dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\, dx$. After defining

\[
\eta(x) \doteq \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta, \theta - \varepsilon\dot\eta)\, dx = \int_0^\beta g(x, \varepsilon)\, dx. \tag{A.30}
\]

We can assume without loss that $G[1] = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma^k_0(x)$ and $\theta^k_0(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma^k_{i-k}(x) \ge \alpha_i\gamma^i_0(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and $\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

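The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^\varepsilon(x)} \eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^\varepsilon(x)} \dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\[ \frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \qquad \frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}. \]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left|\frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\right| < B \quad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.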

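We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx \]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[ \int_0^x \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma} dx' = \int_0^x \left.\left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma} dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, and noting that the boundary term vanishes because $\eta(0) = \eta(\beta) = 0$, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta \left[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right] dx \\
&= \int_0^\beta \dot\eta(x)\cdot\left(-\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i(\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
\]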

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path, of the form $\lambda\gamma + (1-\lambda)\tilde\gamma$, which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

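Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx. \]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.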

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

focuses on finite-dimensional occupancy measures, in which urn occupancies above a given level are not distinguished. In this finite case we are able to provide a concise large deviations proof, along with explicit, insightful and computable expressions for the rate functions and for the large deviations extremals. The occupancy model after all the balls have been thrown is shown to have a simple and fairly explicit rate function, which can be defined in terms of relative entropy with respect to the Poisson distribution. Many different problems can be solved in this framework simply by changing the set over which the rate function is minimized. We also give sample path results for the evolution of the urn occupancies toward a particular event. In principle, the explicit form of the minimizing trajectories for sample path results should enable accurate empirical estimates of unlikely events using importance sampling; see [7, 22].

Let $\lfloor a\rfloor$ denote the integer part of a scalar $a$. With $n$ cells available and a total of $r = \lfloor\beta n\rfloor$ balls to be thrown (with $\beta > 0$), we consider the large deviation asymptotics as $n \to \infty$. The precise statement is as follows. Fix a positive integer $I$. Then, with $\Gamma^n_i(\beta)$ equal to the fraction of cells containing $i$ balls, we characterize the large deviation asymptotics of the random vectors $\{(\Gamma^n_0(\beta), \Gamma^n_1(\beta), \ldots, \Gamma^n_I(\beta)),\ n = 1, 2, \ldots\}$ as $n \to \infty$. A direct analysis of this problem is difficult, and in fact it turns out to be simpler to first "lift" the problem to the level of a sample path large deviation problem, and then use the contraction principle to reduce to the original finite-dimensional problem. A "time" variable $x$ is introduced into the problem, where $\lfloor nx\rfloor$ balls have been thrown at time $x$, and $\Gamma^n_i(x)$ is equal to the fraction of cells containing $i$ balls at this time. We then follow a standard program: the large deviation properties of this Markov process are analyzed, the rate function $J$ on path space is obtained, and the rate function for the occupancy at time $\beta$ is then characterized as the solution to a variational problem involving $J$.

Although the program is standard, there are several very interesting features, both qualitative and technical, which distinguish this large deviation problem. We first describe some of the attractive qualitative features. Typically one has a rate function on path space of the form $J(\phi) = \int L(\phi(x), \dot\phi(x))\,dx$, where the nonnegative function $L(\gamma,\xi)$ is jointly lower semicontinuous and convex in $\xi$ for each fixed $\gamma$. The large deviation properties of the process at time $\beta$ are then found by solving a variational problem of the form

\[ \inf\{J(\phi) : \phi(\beta) = \omega\}, \]

where $\omega$ is given, and where there will also be constraints on the initial condition $\phi(0)$. In general, this problem does not have an explicit closed form solution. One exception to this rule is the extraordinarily simple situation where $L(\gamma,\xi)$ does not depend on the state variable $\gamma$. In this case Jensen's inequality implies that the minimizing trajectory is a straight line, and so the variational problem is actually finite dimensional. Another exception is the case of large deviation asymptotics of a small noise linear stochastic differential equation. In this case the variational problem takes the form of the classical linear quadratic regulator, and the explicit solution is well known from the theory of deterministic optimal control. However, in this case there is no need to "lift" the problem to the sample path level, since the distribution of the diffusion at any time is Gaussian with explicitly calculable mean and covariance. Other exceptions occur in large deviation problems from queueing theory, but in these problems the variational integrand is "sectionally" independent of $\gamma$, and one can show that the minimizing trajectories are the concatenation of a finite number of straight line segments (see, e.g., [1] and the references therein).

For the occupancy problem, the variational integrand has a complicated state dependence [see (2.2)], reflecting the complicated dependence of the transition probabilities (in the process level version) on the state. Nonetheless, the rate function possesses a great deal of structure that can be heavily exploited. For example, the function $J$ turns out to be strictly convex on path space, and so all local minimizers in the variational problem are actually global minimizers. Perhaps even more surprising is the fact that explicit solutions to the Euler–Lagrange equations can be constructed, and as a consequence the variational problem can be more-or-less solved explicitly (see the Appendix). Both of these properties follow from the fact that the variational integrand $L$ can be defined in terms of the famous relative entropy function (or divergence).

A technical novelty of the problem is the singular behavior of the transition probabilities of the underlying Markov process. Since only cells containing $j$ balls can become cells with $j+1$, it is clear that the transition probability corresponding to such an event scales linearly with $\Gamma^n_j(x)$, and in particular that it vanishes at the boundary of the state space when $\Gamma^n_j(x) = 0$. This poses no difficulty for the large deviations upper bound, but is an obstacle for the lower bound. (Some results that address lower bounds when rates go to zero include Chapter 8 of [26], which treats processes with "flat" boundaries, and recent general results in [27].) For the occupancy model, existing results provide a lower bound for open sets of trajectories that do not touch the boundary, because away from the boundaries the set $\{\xi : L(\Gamma,\xi) < \infty\}$ is independent of $\Gamma$. To deal with more general open sets, we use a perturbation argument. We first show, using the strict convexity of $J$ and properties of the zero cost paths, that it is enough to prove large deviation lower bounds for open neighborhoods of trajectories that stay away from the boundary for all positive times. (Note that this still allows the trajectory to start on the boundary.) Loosely speaking, to prove the lower bound for sets of this type, it is enough to show that, given $a > 0$, there is $b > 0$ such that the probability that the process is at least distance $b$ from the boundary by time $b$ is bounded below by $\exp(-na)$. It turns out that these bounds can be easily established by exploiting an explicit representation for such probabilities that was proved in [11].

An outline of the paper is as follows. Section 2 states the main results of the paper. In the first part of Section 2, we construct the underlying stochastic process model and state the corresponding sample path level LDP, as well as the LDP for the terminal distribution. The proof of the sample path LDP is given in the following section, although in Section 2 an additional heuristic argument for the form of the local rate function, based on Sanov's theorem, is provided. In the second part of Section 2, the terminal distribution rate function is characterized and explicit expressions for the sample path minimizers are presented. These constitute more-or-less complete solutions to the corresponding calculus of variations problem. The detailed proofs of these latter results are deferred to the Appendix.

Section 3 gives a proof of the sample path LDP. The solutions to the variational problem are exemplified in Section 4, where we show how specific questions regarding the occupancy problem can be answered. In particular, we work out the asymptotics for a number of examples, including an overflow problem and a partial coupon collection problem. Generalizations of our results are also described.

2. Main results.

2.1. Statement of the LDP. In this section we formulate the problem of interest and state the LDP. The proof will be given in Section 3. As noted in the Introduction, our focus is the asymptotic behavior of the occupancy problem. If $n$ denotes the total number of cells, then to have a nontrivial limit the number of balls placed in the cells should scale linearly with $n$. We will place $\lfloor\beta n\rfloor$ balls in the cells, where $\beta \in (0,\infty)$ is a fixed parameter and $\lfloor a\rfloor$ denotes the integer part of $a$.

As will be seen in the sequel, the large deviation asymptotics of the occupancy should be treated by first lifting the problem to the level of sample path large deviations, and then using the contraction principle to reduce to the original problem. To do this, we introduce a "time" variable $x$ that ranges from 0 to $\beta$. At time $x$, one should imagine that $\lfloor nx\rfloor$ balls have been thrown. Thus the occupancy process will be piecewise constant over intervals of the form $[i/n,\ i/n + 1/n)$. With this scaling of time, large deviation asymptotics can be obtained if we scale space by a factor of $1/n$. Thus we define the random occupancy process $\Gamma^n(x) \doteq (\Gamma^n_0(x), \ldots, \Gamma^n_I(x), \Gamma^n_{I+}(x))$ by letting $\Gamma^n_i(x)$, $i = 0, \ldots, I$, denote $1/n$ times the number of cells with exactly $i$ balls at time $x$, and letting $\Gamma^n_{I+}(x)$ be $1/n$ times the number of cells with more than $I$ balls at time $x$. Note that $\Gamma^n$ takes values in the set of probability vectors on $I+2$ points,

\[ S_I \doteq \Bigl\{\gamma \in \mathbb{R}^{I+2} : \gamma_j \ge 0,\ 0 \le j \le I+1, \text{ and } \sum_{j=0}^{I+1}\gamma_j = 1\Bigr\}. \]

The process $\{\Gamma^n(i/n),\ i = 0, 1, \ldots\}$ is obviously Markovian. It will be convenient to work with the following "dynamical system" representation:

\[ \Gamma^n\Bigl(\frac{i+1}{n}\Bigr) = \Gamma^n\Bigl(\frac{i}{n}\Bigr) + \frac{1}{n}\,b_i\Bigl(\Gamma^n\Bigl(\frac{i}{n}\Bigr)\Bigr), \]

where the independent and identically distributed random vector fields $\{b_i(\cdot),\ i = 0, 1, \ldots\}$ have distributions

\[ P\{b_i(\gamma) = v\} = \begin{cases} \gamma_j, & \text{if } v = e_{j+1} - e_j,\ 0 \le j \le I, \\ \gamma_{I+1}, & \text{if } v = 0. \end{cases} \]

Here $e_j$ represents the vector in $\mathbb{R}^{I+2}$ whose $j$th element is unity and for which all other elements are zero. The occupancy after $\lfloor\beta n\rfloor$ balls have been thrown can then be represented by $n\Gamma^n(\beta)$.
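
The dynamical-system representation translates directly into a simulation. The following Python sketch is illustrative only: the function name `simulate_occupancy`, its parameters and the fixed seed are assumptions made here, not part of the paper. It starts from empty cells, throws $\lfloor\beta n\rfloor$ balls one at a time, and applies the increment $b_i/n$ at each throw.

```python
import random


def simulate_occupancy(n, beta, I, seed=0):
    """Throw floor(beta*n) balls into n cells; return the vector Gamma^n(beta).

    Entries 0..I hold the fraction of cells with exactly i balls and the
    last entry holds the fraction of cells with more than I balls.
    """
    rng = random.Random(seed)
    counts = [0] * n                  # number of balls in each cell
    gamma = [0.0] * (I + 2)
    gamma[0] = 1.0                    # all cells start empty
    for _ in range(int(beta * n)):
        cell = rng.randrange(n)       # a cell at level j is hit with probability gamma[j]
        j = min(counts[cell], I + 1)  # occupancy level before the throw
        counts[cell] += 1
        if j <= I:                    # increment e_{j+1} - e_j, scaled by 1/n
            gamma[j] -= 1.0 / n
            gamma[j + 1] += 1.0 / n
        # if j == I + 1 the increment is zero: the cell stays in the catch-all state
    return gamma
```

For large $n$ the output should be close to the law of large numbers limit, while large deviations concern the exponentially small probability that it is not.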

As we have discussed, the large deviations behavior of $\Gamma^n(\beta)$ will be deduced from that of the process $\Gamma^n(\cdot)$. In order to state the large deviation asymptotics precisely, we should clarify the space in which $\Gamma^n(\cdot)$ takes values and the topology used on that space. As usual for processes with jumps of this sort, we use the Skorokhod space $D([0,\beta] : \mathbb{R}^{I+2})$, together with the Skorokhod topology ([5], Chapter 4). However, the large deviations properties of $\{\Gamma^n\}$ are the same as for the processes $\{\tilde\Gamma^n\}$, where $\tilde\Gamma^n$ is defined to be the piecewise linear process which agrees with $\Gamma^n$ at times of the form $i/n$. For readers unfamiliar with the Skorokhod space and associated topology, the identical large deviations results hold for $\{\tilde\Gamma^n\}$, save that the space of continuous functions and the sup norm topology are used instead.

To complete the statement of the LDP for $\{\Gamma^n\}$, we need some additional notation. A vector of (deterministic) occupancy rates $\theta(x) = (\theta_0(x), \ldots, \theta_I(x), \theta_{I+}(x))$ is a measurable mapping from $[0,\beta]$ to $S_I$. Intuitively, these rates represent the rate at which balls flow into urns of a given occupancy level. Associated with each such vector of rates is the corresponding deterministic occupancy function $\gamma(x) = (\gamma_0(x), \ldots, \gamma_I(x), \gamma_{I+}(x))$, which is defined by the initial condition $\gamma(0)$ and the differential equations $\dot\gamma_0(x) = -\theta_0(x)$, $\dot\gamma_j(x) = \theta_{j-1}(x) - \theta_j(x)$ for $j = 1, \ldots, I$, and $\dot\gamma_{I+}(x) = \theta_I(x)$. These equations reflect the idea that the fraction of urns containing $i$ balls increases as balls enter $(i-1)$-occupied urns and decreases as balls enter $i$-occupied urns. Defining the matrix

\[ M = \begin{pmatrix} -1 & 0 & 0 & \cdots & 0 \\ 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & -1 & 0 \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix}, \tag{2.1} \]

we can write $\dot\gamma(x) = M\theta(x)$ for all $x \in [0,\beta]$. Conversely, given a differentiable occupancy function $\gamma$, the corresponding rates are uniquely determined by $\dot\gamma(x) = M\theta(x)$ and the normalization $\sum_{i=0}^{I}\theta_i(x) + \theta_{I+}(x) = 1$. We will also be interested in the cumulative occupancy function $\psi$, with components $\psi_i(x) = \sum_{j=0}^{i}\gamma_j(x)$. Inspecting the cumulative sums of the rows of $M$ shows that $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$. As more balls are thrown, the fraction of urns containing $i$ or fewer balls can only decrease, and the rate of decrease is the rate at which balls enter $i$-occupied urns. We will say that $\gamma$ is a valid occupancy function if $\gamma$ is absolutely continuous, if $\gamma(x)$ is a probability vector for all $x \in [0,\beta]$, and if its associated $\theta(x)$ is a probability vector for almost all $x \in [0,\beta]$. Note that the functions $\psi$, $\gamma$ and $\theta$ are interchangeable, in the sense that each can be derived from any of the others [given $\gamma(0)$ in the case of $\theta$]. Thus we say that $\theta$ and $\psi$ are valid if the associated $\gamma$ is valid. The following lemma gives a direct characterization of validity in terms of $\psi$.

Lemma 2.1. A vector of $I+1$ continuous functions $\psi$, each of which maps $[0,\beta]$ to $[0,1]$, is a valid cumulative occupancy path if and only if:

(a) $\psi_i(x) \ge \psi_{i-1}(x)$,
(b) $\psi_i(x) \ge \psi_i(y)$,
(c) $\sum_{k=0}^{I}(\psi_k(x) - \psi_k(y)) \le y - x$,

for each $0 \le i \le I$ and $0 \le x < y \le \beta$.

Sketch of the proof. In the forward direction, the first condition plus the bound $\psi_I(x) \le 1$ ensure that $\gamma(x)$ is a probability distribution. The second and third conditions together imply that $\psi$ is Lipschitz continuous with constant 1, and hence absolutely continuous. This implies the absolute continuity of $\gamma$. Since $\dot\psi = -\theta$, the third condition ensures that $\theta$ is almost always a probability distribution. The reverse direction proceeds similarly.
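
Conditions (a) to (c) are easy to test numerically on a sampled path. The sketch below is a hypothetical helper (the name `is_valid_cumulative_path` and the grid-based interface are assumptions, not the paper's). Because (b) is transitive and (c) is additive over adjacent intervals, it suffices to compare consecutive grid points; a grid check can refute validity but only supports it up to the grid resolution.

```python
def is_valid_cumulative_path(psi, xs, tol=1e-12):
    """Check conditions (a)-(c) of Lemma 2.1 on a finite grid.

    psi[m][i] is psi_i evaluated at xs[m]; xs must be increasing.
    """
    for row in psi:
        # range constraint 0 <= psi_i <= 1 and (a): psi_i >= psi_{i-1}
        if row[0] < -tol or row[-1] > 1 + tol:
            return False
        if any(row[i] < row[i - 1] - tol for i in range(1, len(row))):
            return False
    for m in range(len(xs) - 1):
        drops = [psi[m][i] - psi[m + 1][i] for i in range(len(psi[m]))]
        # (b): each psi_i is nonincreasing in x
        if any(d < -tol for d in drops):
            return False
        # (c): the total decrease is at most the elapsed time y - x
        if sum(drops) > xs[m + 1] - xs[m] + tol:
            return False
    return True
```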

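Given $\gamma, \theta \in S_I$, let $D(\theta\,\|\,\gamma)$ denote the relative entropy of $\theta$ with respect to $\gamma$. Thus

\[ D(\theta\,\|\,\gamma) \doteq \sum_{i=0}^{I+1} \theta_i \log(\theta_i/\gamma_i), \]

with the understanding that $\theta_i \log(\theta_i/\gamma_i) \doteq 0$ whenever $\theta_i = 0$, and that $\theta_i \log(\theta_i/\gamma_i) \doteq \infty$ if $\theta_i > 0$ and $\gamma_i = 0$. If $\gamma(x)$ is a valid occupancy function with corresponding occupancy rates $\theta(x)$, then we set

\[ J(\gamma) = \int_0^\beta D(\theta(x)\,\|\,\gamma(x))\,dx. \tag{2.2} \]

In all other cases set $J(\gamma) = \infty$.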
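
As a minimal sketch of these definitions, the Python below implements the relative entropy with its two conventions and a Riemann-sum approximation of (2.2). The names `relative_entropy` and `path_cost`, and the use of equally spaced samples, are assumptions made for illustration.

```python
import math


def relative_entropy(theta, gamma):
    """D(theta || gamma) with the conventions 0 log(0/g) = 0 and t log(t/0) = +inf."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue              # zero-mass terms contribute nothing
        if g == 0.0:
            return math.inf       # positive rate into a level that holds no cells
        total += t * math.log(t / g)
    return total


def path_cost(thetas, gammas, dx):
    """Riemann-sum approximation of J in (2.2) from samples spaced dx apart."""
    return dx * sum(relative_entropy(t, g) for t, g in zip(thetas, gammas))
```

The infinite value in the second convention is what makes boundary behavior delicate: a path that pushes balls into a level with no cells has infinite cost.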

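Theorem 2.2. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha \in S_I$ as $n \to \infty$. Then the sequence of processes $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies the LDP with rate function $J$. In other words, for any measurable set $A$ of trajectories, with interior $A^\circ$ and closure $\bar A$, we have the large deviation lower bound

\[ \liminf_{n\to\infty} \frac{1}{n}\log P\{\Gamma^n \in A\} \ge -\inf\{J(\gamma) : \gamma \in A^\circ,\ \gamma(0) = \alpha\} \]

and the large deviation upper bound

\[ \limsup_{n\to\infty} \frac{1}{n}\log P\{\Gamma^n \in A\} \le -\inf\{J(\gamma) : \gamma \in \bar A,\ \gamma(0) = \alpha\}, \]

and moreover, for any compact set of initial conditions $K$ and $C < \infty$, the set

\[ \{\gamma : J(\gamma) \le C,\ \gamma(0) \in K\} \]

is compact.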

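The proof of this result is provided in Section 3. However, a formal justification for the form of the rate function is as follows. Let $\delta > 0$ be a small time increment. Owing to the fact that $\Gamma^n$ varies slowly when compared to $b_i(\gamma)$, one expects

\[ P\Bigl(\frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx \eta \,\Big|\, \Gamma^n(0) = \gamma\Bigr) \approx P\Bigl(\frac{1}{n\delta}\sum_{j=0}^{\lfloor n\delta\rfloor - 1} b_j(\gamma) \approx \eta\Bigr), \]

where the symbol "$\approx$" inside the probability means that the indicated quantities are within a small fixed constant $\varepsilon > 0$ of each other. Suppose that we interpret $\gamma$ as a probability measure on $\{e_0, \ldots, e_{I+1}\}$, and let $\{Y_j\}$ be independent and identically distributed (i.i.d.) with distribution $\gamma$. Then the sequence of i.i.d. random vectors $\{b_j(\gamma)\}$ can be realized by setting

\[ b_j(\gamma) = \begin{cases} e_{i+1} - e_i \\ 0 \end{cases} \iff Y_j = \begin{cases} e_i, & 0 \le i \le I, \\ e_{I+1}, \end{cases} \]

that is,

\[ b_j(\gamma) = MY_j. \]

By Sanov's theorem, for any probability vector $\theta \in S_I$,

\[ P\Bigl(\frac{1}{n\delta}\sum_{j=0}^{\lfloor n\delta\rfloor - 1} Y_j \approx \theta\Bigr) \approx \exp(-n\delta D(\theta\,\|\,\gamma)). \]

Thus

\[ P\Bigl(\frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx M\theta \,\Big|\, \Gamma^n(0) = \gamma\Bigr) \approx \exp(-n\delta D(\theta\,\|\,\gamma)). \]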

Approximating an arbitrary trajectory by a piecewise linear trajectory with nearly equal cost, and using the Markov property, one expects

\[ P(\Gamma^n \approx \gamma \mid \Gamma^n(0) = \alpha) \approx \exp\Bigl(-n\int_0^\beta D(\theta(t)\,\|\,\gamma(t))\,dt\Bigr), \]

where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus the rate function on path space should be $J(\gamma)$.

The zero cost trajectories are, of course, the law of large numbers limits, and can easily be computed. For example, if $\alpha = (1, 0, \ldots)$ (so all cells are initially empty) and $i \le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t}t^i/i!$. In other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass corresponding to $i > I$ is collected together into the state $I+1$. Throughout this paper we will denote the Poisson distribution with mean $t$ by $\mathcal{P}(t)$, where $\mathcal{P}_i(t) = e^{-t}t^i/i!$.
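
The Poisson form of the zero cost path can be checked numerically. Relative entropy vanishes only when its two arguments coincide, so zero cost forces $\theta = \gamma$ and hence $\dot\gamma = M\gamma$. The following sketch (all function names are assumptions introduced here) compares a central finite difference of the truncated Poisson vector with $M\gamma$.

```python
import math


def truncated_poisson(t, I):
    """Return (P_0(t), ..., P_I(t), 1 - sum_{i<=I} P_i(t)), a point of S_I."""
    head = [math.exp(-t) * t**i / math.factorial(i) for i in range(I + 1)]
    return head + [1.0 - sum(head)]


def apply_M(theta):
    """Compute M theta for the matrix M of (2.1)."""
    I = len(theta) - 2
    out = [-theta[0]]
    out += [theta[j - 1] - theta[j] for j in range(1, I + 1)]
    out.append(theta[I])          # the catch-all state only gains mass
    return out


def zero_cost_residual(t, I, h=1e-5):
    """Largest entry of |d/dt gamma - M gamma| along the truncated Poisson path."""
    plus, minus = truncated_poisson(t + h, I), truncated_poisson(t - h, I)
    deriv = [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    drift = apply_M(truncated_poisson(t, I))
    return max(abs(d - v) for d, v in zip(deriv, drift))
```

The residual should be at the level of the finite-difference error, which is consistent with the truncated Poisson vector solving $\dot\gamma = M\gamma$ from the empty initial condition.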

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction principle (e.g., [11], Theorem 1.3.2) implies the following variational representation for the rate function for $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$. Let $\mathcal{A}(\alpha,\omega,\beta)$ denote the set of valid occupancy paths $\gamma$ on $[0,\beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha$ as $n \to \infty$. Then the sequence of random vectors $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$ satisfies the LDP with the convex rate function

\[ \mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma \in \mathcal{A}(\alpha,\omega,\beta)\}. \]

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7 and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is in fact strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation lower bound holds for sets with no interior relative to the ambient space. Such bounds can often be proved for processes, such as ours, that take values in a discrete lattice. An example would be a set $A \subset \{\gamma \in S_I : \gamma_{I+} = 0\}$, for which the minimizing trajectories must be polynomial extremals (defined after Theorem 2.6). Although we do not need such results in the present paper, it is worth observing that lower bounds of this kind can be proved.

2.2. Characterization of the terminal rate function and minimizing paths. The results of this section were obtained using calculus of variations techniques, the details of which are given in the Appendix. The presentation begins with the case in which all urns are initially empty, because it appears in many applications and because it is a building block for the general case. In each case, we first give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional forms contain parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha,\omega,\beta)$, where $\alpha, \omega \in S_I$ are the initial and terminal occupancies and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I+1$ balls, and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I+1$.

Definition 2.1. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if the corresponding set of valid occupancy paths $\mathcal{A}(\alpha,\omega,\beta)$ is nonempty.

Lemma 2.4. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if and only if

\[ \sum_{j=0}^{i}\alpha_j \ge \sum_{j=0}^{i}\omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)} \tag{2.3} \]

and

\[ \sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{k=0}^{I+1} k\alpha_k + \beta \quad \text{(conservation)}. \tag{2.4} \]

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
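
The two conditions of Lemma 2.4 amount to a few cumulative sums. A minimal sketch follows, assuming the vectors are passed as Python sequences of length $I+2$; the function name `is_feasible` and the tolerance argument are illustrative choices, not the paper's.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (2.3) and conservation (2.4) conditions.

    alpha = (alpha_0, ..., alpha_{I+1}); omega = (omega_0, ..., omega_I, omega_{I+}).
    """
    I = len(omega) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:   # (2.3): cumulative occupancies can only decrease
            return False
    # (2.4): lower bound on balls present at time beta versus initial balls plus balls thrown
    balls_needed = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    balls_available = sum(k * alpha[k] for k in range(I + 2)) + beta
    return balls_needed <= balls_available + tol
```

For example, with empty initial conditions and $\omega = (0.5, 0.3, 0.2)$, at least $0.3 + 2 \cdot 0.2 = 0.7$ balls per urn are required, so the constraint is feasible for $\beta = 1$ but not for $\beta = 0.5$.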

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha,\omega,\beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i < I$. Otherwise, the constraints are termed reducible.

For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i}\alpha_k = \sum_{k=0}^{i}\omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i+1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha,\omega,\beta)$ may be divided into a finite number of isolated, irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $\mathcal{F}(1,\omega,\beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$ and the constraint

\[ \sum_{i=0}^{\infty} i\pi_i = \beta. \]

In the empty case, the conditions for feasibility of $(1,\omega,\beta)$ reduce to $\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \beta$, from which it follows that the set $\mathcal{F}$ is nonempty if and only if $(1,\omega,\beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $\mathcal{A}(1,\omega,\beta)$ can be represented as an infimum of a relative entropy over distributions in $\mathcal{F}(1,\omega,\beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}^+$ defined in Corollary 2.3 may be determined as

\[ \mathcal{J}(\omega) = \min_{\pi \in \mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,\mathcal{P}(\beta)) \]

if $(1,\omega,\beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^* \in \mathcal{F}(1,\omega,\beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show in fact that the solution takes the form $\pi^*_i = C\mathcal{P}_i(\rho\beta)$ for $i > I$, for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty} i\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty}\pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the equality $\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I+1$ if necessary to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I}\omega_i = 0$. In this case it turns out that $\mathcal{F}(1,\omega,\beta)$ has only one element; we then have $C = 0$ and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[ \frac{\rho\beta - \sum_{i=0}^{I} i\mathcal{P}_i(\rho\beta)}{1 - \sum_{i=0}^{I}\mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{1 - \sum_{i=0}^{I}\omega_i}. \tag{2.5} \]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I+1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0,\infty)$ to $(I+1,\infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[ C = \frac{1 - \sum_{i=0}^{I}\omega_i}{1 - \sum_{i=0}^{I}\mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{\rho\beta - \sum_{i=0}^{I} i\mathcal{P}_i(\rho\beta)}. \tag{2.6} \]

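Evaluating the relative entropy of $\pi^*$ and $\mathcal{P}(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

$$\mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \left(1 - \sum_{i=0}^{I}\omega_i\right)\bigl(\log C + (1-\rho)\beta\bigr) + \left(\beta - \sum_{i=0}^{I} i\omega_i\right)\log\rho. \tag{2.7}$$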
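The recipe in (2.5) to (2.7) can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes the exponential case (strict inequality in the conservation condition and $\omega_{I+} > 0$), finds $\rho$ by bisection on the conditional Poisson mean, and then evaluates $C$ and $\mathcal{J}(\omega)$. The function names (`log_pmf`, `tail_sums`, `twist_parameters`, `terminal_rate`) are hypothetical helpers introduced here.

```python
import math

def log_pmf(i, mean):
    """Log of the Poisson probability P_i(mean)."""
    return -mean + i * math.log(mean) - math.lgamma(i + 1)

def tail_sums(I, mean):
    """Return (sum_{i>I} P_i(mean), sum_{i>I} i*P_i(mean)) by direct summation."""
    i = I + 1
    term = math.exp(log_pmf(i, mean))
    s0 = s1 = 0.0
    while True:
        s0 += term
        s1 += i * term
        i += 1
        term *= mean / i          # P_i = P_{i-1} * mean / i
        if i > mean and term <= 1e-17 * s0:
            return s0, s1

def conditional_mean(I, mean):
    """E[Y | Y > I] for Y ~ Poisson(mean): the left-hand side of (2.5)."""
    s0, s1 = tail_sums(I, mean)
    return s1 / s0

def twist_parameters(omega, beta):
    """Solve (2.5) for rho by bisection, then get C from (2.6).

    omega = [omega_0, ..., omega_I]; assumes the exponential case.
    """
    I = len(omega) - 1
    mass = 1.0 - sum(omega)                                  # omega_{I+}
    excess = beta - sum(i * w for i, w in enumerate(omega))  # balls above level I
    target = excess / mass                                   # right-hand side of (2.5)
    lo = hi = 1.0
    while conditional_mean(I, lo * beta) > target:
        lo /= 2.0
    while conditional_mean(I, hi * beta) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if conditional_mean(I, mid * beta) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = mass / tail_sums(I, rho * beta)[0]
    return rho, C

def terminal_rate(omega, beta):
    """Evaluate the rate function (2.7)."""
    rho, C = twist_parameters(omega, beta)
    mass = 1.0 - sum(omega)
    excess = beta - sum(i * w for i, w in enumerate(omega))
    entropy = sum(w * (math.log(w) - log_pmf(i, beta))
                  for i, w in enumerate(omega) if w > 0)
    return entropy + mass * (math.log(C) + (1.0 - rho) * beta) + excess * math.log(rho)

# Illustrative values only: I = 1, beta = 2 balls per urn.
print(twist_parameters([0.05, 0.2], 2.0), terminal_rate([0.05, 0.2], 2.0))
```

When $\omega_i = \mathcal{P}_i(\beta)$ for all $i \le I$, the sketch returns $\rho = C = 1$ and a rate of zero, as expected for the law of large numbers limit.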

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1,\omega,\beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $\mathcal{A}(1,\omega,\beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(1,\omega,\beta)$ defined by

$$\gamma_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C\mathcal{P}_k(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^{k}, \tag{2.8}$$

$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i \gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9}$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x),$$

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$, $\gamma_0(x+y) = \gamma_0(x) + y\gamma_0^{(1)}(x) + \cdots$, evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term, and we say that $\gamma$ is a polynomial extremal. Otherwise, $C > 0$, we have an exponential extremal.
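To make (2.8) and (2.9) concrete, the path can be evaluated in closed form. Carrying out the derivatives in (2.9) term by term gives $\gamma_i(x) = C\mathcal{P}_i(\rho x) + \sum_{k \ge i} a_k \binom{k}{i}(x/\beta)^i(1-x/\beta)^{k-i}$ with $a_k = \omega_k - C\mathcal{P}_k(\rho\beta)$; this binomial form is our own rearrangement of the theorem, not a formula quoted from the paper. The sketch below takes $\rho$ and $C$ as given inputs, and the helper names are hypothetical.

```python
import math

def poisson_pmf(i, mean):
    """Poisson probability P_i(mean), with the convention P_i(0) = 1 if i == 0 else 0."""
    if mean == 0:
        return 1.0 if i == 0 else 0.0
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

def extremal_path(omega, beta, rho, C, x):
    """Return [gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)] for the path of Theorem 2.6.

    omega = [omega_0, ..., omega_I]; rho and C are the twist parameters,
    assumed to have been computed already from (2.5) and (2.6).
    """
    I = len(omega) - 1
    a = [omega[k] - C * poisson_pmf(k, rho * beta) for k in range(I + 1)]
    u = x / beta
    gamma = []
    for i in range(I + 1):
        # Polynomial part: i-th Taylor term of sum_k a_k (1 - x/beta)^k.
        poly = sum(a[k] * math.comb(k, i) * u**i * (1.0 - u)**(k - i)
                   for k in range(i, I + 1))
        # Exponential part: i-th Taylor term of C * exp(-rho * x).
        gamma.append(C * poisson_pmf(i, rho * x) + poly)
    gamma.append(1.0 - sum(gamma))
    return gamma

# Polynomial extremal (C = 0): all balls are needed to meet the terminal constraint.
for x in (0.0, 0.5, 1.0):
    print(x, extremal_path([0.2, 0.6, 0.2], 1.0, 1.0, 0.0, x))
```

In the polynomial example the path starts at the empty state $(1,0,0,0)$ and ends at $\omega$ with $\gamma_{I+}(\beta) = 0$; with $\rho = C = 1$ and $\omega_i = \mathcal{P}_i(\beta)$ it reduces to the zero cost path $\gamma_i(x) = \mathcal{P}_i(x)$.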

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{k,j}(t)$, representing the fraction of urns initially having $k$ balls which have $k+j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $\mathcal{S}_\infty$ denote the set of distributions on the nonnegative integers, and let $\mathcal{S}_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $\mathcal{S}_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)}\}$, where for any $k \in \mathcal{K}$ a component distribution is denoted $\pi^{(k)} = \{\pi_{k,0}, \pi_{k,1}, \ldots\}$, and it is understood that the corresponding $\pi^{(k)}$ is omitted if $\alpha_k = 0$. Finally, let $\mathcal{F}(\alpha,\beta,\omega)$ be the set of $\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}$ which satisfies the terminal constraints

$$\omega_i = \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \qquad \text{for all } 0 \le i \le I, \tag{2.10}$$

along with the conservation constraint

$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{k,j} = \beta. \tag{2.11}$$

As in the empty case, it may be established that $\mathcal{F}$ is nonempty if and only if $(\alpha,\omega,\beta)$ is feasible.

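Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega): \mathcal{S}_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed

$$\mathcal{J}(\omega) = \min_{\pi \in \mathcal{F}(\alpha,\omega,\beta)} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta))$$

whenever $(\alpha,\beta,\omega)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in \mathcal{F}(\alpha,\omega,\beta)$ is unique.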

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

$$\pi^*_{k,j} = \begin{cases} C_k \mathcal{P}_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I, \\ C_k \mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I. \end{cases}$$

In the case of equality in (2.4) (the polynomial case), the corresponding form is

$$\pi^*_{k,j} = \begin{cases} D_k \mathcal{P}_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \le I, \\ 0, & k + j > I. \end{cases}$$

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_i$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in \mathcal{K}$, denote the mean of $\pi^{*(k)}$ by $\beta_k = \sum_{j=0}^{\infty} j\pi^*_{k,j}$, and define the terminal condition $\omega^{(k)} \in \mathcal{S}_{I-k}$ by $\omega_{k,j} = \pi^*_{k,j}$ for $j = 0, \ldots, I-k$. Then the constraints $(1, \omega^{(k)}, \beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma^{(k)} = \{\gamma_{k,j}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha,\omega,\beta)$, let $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{k,j}: [0,\beta_k] \to [0,1]$ be the minimizing paths corresponding to the subproblems $(1, \omega^{(k)}, \beta_k)$. The infimum of $J(\gamma)$ over $\mathcal{A}(\alpha,\omega,\beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(\alpha,\omega,\beta)$ defined by

$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).$$

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in \mathcal{S}_I$, define

$$H(\gamma,\zeta) = \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr),$$

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h: \mathbb{R} \to \mathbb{R}$ such that $H(\gamma,\zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma,\zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies a large deviation upper bound with a rate function $\bar{J}$ which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma,\zeta)$ in $\zeta$:

$$L(\gamma,\eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma,\zeta)].$$

If $\{\gamma(x), 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $\mathcal{S}_I$, then

$$\bar{J}(\gamma) = \int_0^\beta L(\gamma(x), \dot\gamma(x))\,dx.$$

If $\gamma$ is not absolutely continuous, then $\bar{J}(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma,\zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma,\zeta)$ preserved. However, if the process $\Gamma^n$ starts in $\mathcal{S}_I$, then it stays in $\mathcal{S}_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar{J} = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma,\eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

$$L(\gamma, M\theta) = D(\theta \,\|\, \gamma).$$

It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma,\eta) < \infty$, then $\eta$ can be written as a convex combination of the form

$$\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I),$$

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \ldots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I}\theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

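$$\begin{aligned}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} \bigl[\zeta \cdot M\theta - \log(E[\exp(\zeta \cdot b_i(\gamma))])\bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ M^T\zeta \cdot \theta - \log\left(\left[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\right] + \gamma_{I+1}\right)\right] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\left(\left[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\right] + \gamma_{I+1}\right)\right].
\end{aligned}$$

Given any values $\mu_0, \ldots, \mu_{I+1}$, we can define $\zeta_0, \ldots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions it is apparent that the last display is equal to

$$\begin{aligned}
&\sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\left(\left[\sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1})\right] + \gamma_{I+1}\right)\right] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\left(\sum_{j=0}^{I+1} \gamma_j \exp\mu_j\right)\right] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\left(\sum_{j=0}^{I+1} \gamma_j \exp\mu_j\right)\right].
\end{aligned}$$

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \,\|\, \gamma)$, thereby completing the proof of the upper bound.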
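The final identity can be checked numerically on a small example. The sketch below is only an illustration under assumed values of $\theta$ and $\gamma$ (both strictly positive): it evaluates the objective $\sum_j \mu_j\theta_j - \log\sum_j \gamma_j e^{\mu_j}$ at the candidate maximizer $\mu_j = \log(\theta_j/\gamma_j)$, where it equals $D(\theta\,\|\,\gamma)$, and confirms that randomly drawn $\mu$ never do better.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors with gamma > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """The bracketed expression in the last supremum of the display."""
    linear = sum(m * t for m, t in zip(mu, theta))
    log_mgf = math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))
    return linear - log_mgf

# Illustrative (assumed) probability vectors with I + 2 = 3 components.
theta = [0.5, 0.3, 0.2]
gamma = [0.2, 0.3, 0.5]

mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
best = dv_objective(mu_star, theta, gamma)

rng = random.Random(0)
trials = [dv_objective([rng.uniform(-5, 5) for _ in theta], theta, gamma)
          for _ in range(1000)]

print(best, relative_entropy(theta, gamma), max(trials))
```

The objective is invariant under adding a constant to every $\mu_j$, which is why the change of variables from $\zeta$ to $\mu$ loses nothing.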

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

$$\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -J(\gamma) - \varepsilon. \tag{3.1}$$

Of course, this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $\mathcal{S}_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > bx^K$ for all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$, and such that

$$J(y) \le J(\gamma).$$

Consider the zero cost trajectory defined by

$$\dot{z}(x) = Mz(x), \qquad z(0) = \gamma(0).$$

We have the following expression for $z_j$ when $j \le I$:

$$z_j(x) = \left[\sum_{k=0}^{j} \gamma_k(0) \frac{x^{(j-k)}}{(j-k)!}\right] e^{-x}. \tag{3.2}$$

It is easy to check from this explicit formula that $z_j(x) > bx^K$ for some $b > 0$, $K = I$, and all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0,1)$, let $y^\rho = \rho z + (1-\rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \,\|\, z) = 0$, we have

$$\begin{aligned}
J(y^\rho) &= \int_0^\beta D\bigl(\rho z(x) + (1-\rho)\theta(x) \,\|\, \rho z(x) + (1-\rho)\gamma(x)\bigr)\,dx \\
&\le \rho \int_0^\beta D(z(x) \,\|\, z(x))\,dx + (1-\rho)\int_0^\beta D(\theta(x) \,\|\, \gamma(x))\,dx \\
&= (1-\rho)J(\gamma).
\end{aligned}$$

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0,1)$.

It follows that in proving the lower bound, we can assume without loss of generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge bx^K$ for all $x \in [0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0,\beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

$$h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else}. \end{cases}$$

For $n$ large enough that $|\gamma(\lfloor n\tau\rfloor/n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0,\beta]$), we have the inequality

$$\begin{aligned}
P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\lfloor n\tau\rfloor/n)| < \sigma\bigr) + e^{-2n\varepsilon}
&\ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon} \\
&\ge E_{\Gamma^n(0)}\bigl(\exp\{-nh(\Gamma^n(\lfloor n\tau\rfloor/n))\}\bigr).
\end{aligned}$$

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar{b}^n_i$:

$$\bar\Gamma^n\left(\frac{i+1}{n}\right) = \bar\Gamma^n\left(\frac{i}{n}\right) + \frac{1}{n}\bar{b}^n_i, \qquad \bar\Gamma^n(0) = \Gamma^n(0).$$

Furthermore, the distribution of $\bar{b}^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar\mu^n_i$ denote the (random) distribution of $\bar{b}^n_i$, given $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. We let $\bar{E}_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\left(\exp\left\{-nh\left(\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right)\right\}\right) = \inf \bar{E}_{\Gamma^n(0)}\left[ h\left(\bar\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right) + \frac{1}{n}\sum_{i=1}^{\lfloor n\tau\rfloor - 1} D\bigl(\bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr)\right],$$

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar{b}^n_i$. We can write $\gamma(\tau) - \gamma(0) = Mv\tau$ for some probability vector $v$. Define a process $\bar{b}^n_i$ as follows:

$$\bar{b}^n_i = \begin{cases}
e_j - e_{j-1}, & \text{if } \left\lfloor \sum_{k=0}^{j-1} v_k\tau n \right\rfloor \le i \le \left\lfloor \sum_{k=0}^{j} v_k\tau n \right\rfloor - 1, \text{ for } 0 \le j \le I, \\[1ex]
0, & \text{if } \left\lfloor \sum_{k=0}^{I} v_k\tau n \right\rfloor \le i \le \lfloor \tau n\rfloor - 1.
\end{cases}$$

In other words, $\bar{b}^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

$$D\bigl(\bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) = \log\left(\frac{1}{\bar\Gamma^n_{j-1}(i/n)}\right) = -\log\left(\bar\Gamma^n_{j-1}\left(\frac{i}{n}\right)\right). \tag{3.3}$$

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k\tau n\rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor - 1$,

$$\bar\Gamma^n_{j-1}\left(\frac{i}{n}\right) \downarrow \bar\Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor \sum_{k=0}^{j} v_k\tau n \right\rfloor\right)$$

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor$, it follows that

$$\bar\Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor \sum_{k=0}^{j} v_k\tau n \right\rfloor\right) \to \gamma_{j-1}(\tau)$$

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (and independent of $\tau$) such that, whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

$$D\bigl(\bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) \le C_1[-\log\tau^K] \le -C_2\log\tau.$$

In addition, as $n \to \infty$ and $\eta \to 0$,

$$\bar\Gamma^n\left(\frac{1}{n}\lfloor \tau n\rfloor\right) \to \gamma(\tau).$$

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\left(\exp\left\{-nh\left(\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right)\right\}\right) \le -C_2\tau\log\tau.$$

We now choose $\tau > 0$ so that $-C_2\tau\log\tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0,\tau]$ w.p.1 if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid: for the given $\tau > 0$, for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$\liminf_{n \to \infty} \frac{1}{n}\log P_{\Gamma^n(0)}\left( \left|\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right) - \gamma\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right| < \sigma,\ \sup_{x \in [0,\tau]} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -\frac{\varepsilon}{2}.$$

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid that boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $\mathcal{S}_I$ for all $x \in [\tau,\beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in \mathcal{S}_I^{\circ}$, where $\mathcal{S}_I^{\circ}$ denotes interior relative to the smallest affine space that contains $\mathcal{S}_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\lfloor n\tau\rfloor/n)| < \sigma$,

$$\liminf_{n \to \infty} \frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}\left( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \right) \ge -J(\gamma) - \frac{\varepsilon}{2},$$

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

$$\mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega).$$

Using Theorem 2.7, we can write

$$\mathcal{J}(\Omega) = \inf_{\omega \in \Omega}\ \inf_{\pi \in \mathcal{F}(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) = \inf_{\pi \in \mathcal{F}(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)), \tag{4.1}$$

where we have abused notation to define $\mathcal{F}(\alpha,\Omega,\beta) = \bigcup_{\omega \in \Omega} \mathcal{F}(\alpha,\omega,\beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third example below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^{\circ}$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega \in \Omega^{\circ}} \mathcal{J}(\omega) = \inf_{\omega \in \bar\Omega} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n}\log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.

4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variation problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

$$\gamma_0(x) = \frac{1}{\rho}e^{-\rho x} + 1 - \frac{1}{\rho}.$$

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

$$\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.$$

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = \mathcal{P}_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $\mathcal{P}_0(3.0) = 0.05$.
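For the numerical example just described ($\omega_0 = 0.15$, $\beta = 3$), the twist parameter and the exponent can be computed in a few lines. This is a sketch under the assumptions $0 < \omega_0 < 1$ and $1 - \omega_0 < \beta$; the bisection solver and the function name `classical_exponent` are our own illustrative choices, and the exponent is (2.7) specialized to $I = 0$ with $C = 1/\rho$.

```python
import math

def classical_exponent(omega0, beta):
    """Solve rho*(1 - omega0) = 1 - exp(-beta*rho) for rho > 0 by bisection,
    then evaluate the rate (2.7) with I = 0 and C = 1/rho."""
    def f(rho):
        return rho * (1.0 - omega0) - (1.0 - math.exp(-beta * rho))

    # f is negative just to the right of 0 and positive for large rho.
    hi = 1.0
    while f(hi) <= 0.0:
        hi *= 2.0
    lo = hi
    while f(lo) >= 0.0:
        lo /= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = 1.0 / rho
    rate = (omega0 * (math.log(omega0) + beta)                  # omega0 * log(omega0 / e^{-beta})
            + (1.0 - omega0) * (math.log(C) + (1.0 - rho) * beta)
            + beta * math.log(rho))
    return rho, C, rate

def empty_fraction(x, rho):
    """Constrained path gamma_0(x) = exp(-rho*x)/rho + 1 - 1/rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

rho, C, rate = classical_exponent(0.15, 3.0)
print(rho, C, rate, empty_fraction(3.0, rho))
```

The path starts at $\gamma_0(0) = 1$ and ends at $\gamma_0(\beta) = \omega_0$; choosing $\omega_0 = e^{-\beta}$ gives $\rho = 1$ and a zero exponent.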

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

$$w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),$$

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I}\Gamma_i$,

$$w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\Gamma_i(r/n).$$
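The bookkeeping identity for $w(r)$ is easy to confirm by simulation. The sketch below is illustrative only: it throws $r$ balls uniformly into $n$ infinite-capacity urns, counts the overflow directly as the number of balls above level $I$, and compares that with the formula $r - nI + n\sum_{i \le I}(I-i)\Gamma_i$, where $n\Gamma_i$ is the number of urns holding exactly $i$ balls. The function name and parameter values are assumptions made for the example.

```python
import random
from collections import Counter

def overflow_two_ways(n, r, I, seed=0):
    """Return (direct overflow count, overflow via the occupancy formula)."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1          # each ball picks an urn uniformly

    # Direct count: every ball beyond capacity I in an urn would hit the floor.
    direct = sum(max(c - I, 0) for c in counts)

    # Formula: w(r) = r - n*I + sum_{i<=I} (I - i) * (number of urns with i balls).
    histogram = Counter(counts)
    via_formula = r - n * I + sum((I - i) * histogram.get(i, 0) for i in range(I + 1))
    return direct, via_formula

print(overflow_two_ways(n=1000, r=3000, I=2))
```

Because both quantities are integer counts of the same balls, they agree exactly for every realization, not merely on average.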

In order to compute

$$J_O(\eta,\beta) \doteq -\lim_{n} \frac{1}{n}\log P\left(\frac{w(\beta n)}{n} > \eta\right),$$

we therefore consider sample paths which satisfy the end constraint

$$\sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta \doteq \zeta. \tag{4.2}$$

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $\mathcal{P}(\beta)$ under the linear equality constraints

$$\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad \text{and} \qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3}$$

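We introduce the Lagrangian

$$L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right) + \lambda\left(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\right)$$

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

$$\log\pi^*_i = \log\mathcal{P}_i(\beta) - 1 + y + iz + (I - i)^+\lambda.$$

In terms of variables $C$, $\rho$ and $\nu$, we may write

$$\pi^*_i = C\mathcal{P}_i(\rho\beta) \qquad \text{for } i \ge I$$

and

$$\pi^*_i = C\mathcal{P}_i(\rho\beta)\nu^{I-i} \qquad \text{for } i \le I.$$

The distribution is conditionally Poisson($\rho\beta/\nu$) for $i \le I$ and conditionally Poisson($\rho\beta$) for $i \ge I$. For convenience, we introduce the notation

$$Q_I(\rho) \doteq \sum_{i=I}^{\infty} \mathcal{P}_i(\rho\beta) = \frac{1}{\rho\beta}\sum_{i=I+1}^{\infty} i\mathcal{P}_i(\rho\beta),$$

$$R_I(\rho,\nu) \doteq \nu^I e^{-\rho\beta(1 - 1/\nu)}\sum_{i=0}^{I-1} \mathcal{P}_i\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta} e^{-\rho\beta(1 - 1/\nu)}\sum_{i=1}^{I} i\mathcal{P}_i\left(\frac{\rho\beta}{\nu}\right).$$

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

$$C\bigl(R_I(\rho,\nu) + Q_I(\rho)\bigr) = 1,$$

$$C\left(\frac{\rho\beta}{\nu}R_I(\rho,\nu) + \rho\beta Q_I(\rho)\right) = \beta,$$

$$C\left(I\mathcal{P}_I(\rho\beta) + \left(I - \frac{\rho\beta}{\nu}\right)R_I(\rho,\nu)\right) = \zeta.$$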

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

$$(\rho/\nu - 1)R_I(\rho,\nu) + (\rho - 1)Q_I(\rho) = 0,$$

$$(I - \rho\beta/\nu - \zeta)R_I(\rho,\nu) - \zeta Q_I(\rho) + I\mathcal{P}_I(\rho\beta) = 0.$$

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed

$$\begin{aligned}
J_O(\eta,\beta) &= D(\pi^* \,\|\, \mathcal{P}(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl(Ce^{\beta - \rho\beta}\rho^i\nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}$$
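The closed form for $J_O$ can be verified against a direct evaluation of the relative entropy. Rather than solving the two-equation system for $(\rho, \nu)$, the sketch below works in the forward direction, which is our own simplification: it picks $m = \rho\beta$ and $\nu$ (illustrative values), builds $\pi^*_i = C\mathcal{P}_i(m)\nu^{(I-i)^+}$ on a truncated support, reads off the implied $\beta$, $\zeta$ and $\rho = m/\beta$, and then compares $D(\pi^*\,\|\,\mathcal{P}(\beta))$ computed term by term with $\log C + \beta(1-\rho) + \beta\log\rho + \zeta\log\nu$. The function names and the truncation length are assumptions.

```python
import math

def log_pmf(i, mean):
    """Log of the Poisson probability P_i(mean)."""
    return -mean + i * math.log(mean) - math.lgamma(i + 1)

def overflow_tilt(I, m, nu, n_terms=200):
    """Build pi*_i = C * P_i(m) * nu^{(I-i)^+} and compare two evaluations of the exponent.

    Returns (beta, zeta, rho, C, direct relative entropy, closed form).
    """
    log_w = [log_pmf(i, m) + max(I - i, 0) * math.log(nu) for i in range(n_terms)]
    C = 1.0 / sum(math.exp(v) for v in log_w)             # normalization constraint
    pi = [C * math.exp(v) for v in log_w]
    beta = sum(i * p for i, p in enumerate(pi))           # implied balls per urn
    zeta = sum((I - i) * pi[i] for i in range(I + 1))     # implied spare capacity
    rho = m / beta
    direct = sum(p * (math.log(C) + v - log_pmf(i, beta))
                 for i, (p, v) in enumerate(zip(pi, log_w)) if p > 0)
    closed = math.log(C) + beta * (1.0 - rho) + beta * math.log(rho) + zeta * math.log(nu)
    return beta, zeta, rho, C, direct, closed

print(overflow_tilt(I=2, m=3.0, nu=1.5))
```

To recover the exponent for a prescribed $(\beta, \zeta)$ one would still need a root-finder over $(m, \nu)$; the forward map above can serve as the residual function for such a solver.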

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

$$J_C(\alpha,\beta,\xi) \doteq -\lim_{n} \frac{1}{n}\log P\left(\sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi\right),$$

where

$$\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} \mathcal{P}_i(\beta)$$

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

$$\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{k,j} = \xi,$$

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

$$\omega_{k,j} = \pi^*_{k,j} = \begin{cases} C_k\mathcal{P}_j(\rho\beta)W, & j + k \le I, \\ C_k\mathcal{P}_j(\rho\beta), & j + k > I. \end{cases}$$

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations, and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

$$J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k\log C_k.$$

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types, and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
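The constants for this example can be found with two nested one-dimensional searches. The sketch below is one possible numerical scheme, not necessarily the one used by the authors: for a trial $\rho$ it solves the low-occupancy constraint for $W$ by bisection on $\log W$, and then adjusts $\rho$ by bisection until the conservation constraint (2.11) holds. It assumes $\sum_k \alpha_k = 1$ and $K \le I$; all function names are hypothetical.

```python
import math

def pmf(j, mean):
    """Poisson probability P_j(mean)."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def cdf(m, mean):
    """sum_{j<=m} P_j(mean); equals 0 when m < 0."""
    return sum(pmf(j, mean) for j in range(m + 1))

def class_stats(k, I, m, W):
    """For class k with m = rho*beta: (C_k, mass on j + k <= I, mean of pi*_{k,.})."""
    F = cdf(I - k, m)                  # untilted mass on the low-occupancy set
    L = m * cdf(I - k - 1, m)          # untilted sum of j*P_j(m) over that set
    C = 1.0 / (W * F + 1.0 - F)
    return C, C * W * F, C * (W * L + m - L)

def solve_W(alpha, I, m, xi):
    """Bisection on log W so that sum_k alpha_k * (low mass of class k) = xi."""
    lo, hi = -60.0, 60.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        low = sum(a * class_stats(k, I, m, math.exp(mid))[1] for k, a in enumerate(alpha))
        if low < xi:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def coupon_exponent(alpha, beta, I, xi):
    """Return (rho, W, [C_k], J_C) for the partial coupon collector's problem."""
    def mean(rho):
        W = solve_W(alpha, I, rho * beta, xi)
        return sum(a * class_stats(k, I, rho * beta, W)[2] for k, a in enumerate(alpha))
    lo = hi = 1.0
    while mean(lo) > beta:
        lo /= 2.0
    while mean(hi) < beta:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean(mid) < beta:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    W = solve_W(alpha, I, rho * beta, xi)
    Cs = [class_stats(k, I, rho * beta, W)[0] for k in range(len(alpha))]
    J = (beta * (1.0 - rho + math.log(rho)) + xi * math.log(W)
         + sum(a * math.log(C) for a, C in zip(alpha, Cs)))
    return rho, W, Cs, J

alpha, beta, I, xi, n = [0.5, 0.3, 0.2], 2.0, 3, 0.55, 100
zero_cost_low = sum(a * cdf(I - k, beta) for k, a in enumerate(alpha))   # psi_3(2.0)
rho, W, Cs, J = coupon_exponent(alpha, beta, I, xi)
print(zero_cost_low, rho, W, J, math.exp(-n * J))
```

The first printed value is the zero-cost fraction of low occupancy types, to be compared with $\psi_3(2.0) \approx 0.71$; the last two are the exponent and the crude estimate $e^{-nJ_C}$, to be compared with the values $J_C \approx 0.18$ and $10^{-8}$ quoted in the text.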

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem The Appendix is dedi-cated to proving the calculus of variations results given in Section 22 Re-call that these results provide explicit representations for the terminal ratefunction J (ω) defined in Corollary 23 and for the minimizing occupancyfunctions γlowast satisfying J (ω) = J(γlowast)

In the first step of the proof we characterize a set of extremal occupancypaths that is paths which satisfy the EulerndashLagrange equations For allfeasible terminal conditions of the form (1 ω β) Theorem A1 shows thatoccupancy paths of the form given in Theorem 26 are extremals and thatpaths of this form can be constructed to meet any feasible constraints of theform (1 ω β) Likewise Lemma A6 and Theorem A11 show that occupancypaths of the form given in Theorem 28 are extremals and such paths can beconstructed to meet all general feasible conditions (αωβ) The special formof the extremals is used in Theorem A5 and Theorem A13 to show thatthe extremals have the costs given in Theorem 25 and Theorem 27 respec-tively If γ is the extremal occupancy path constructed for given constraints(αωβ) we thus have the upper bound J (ω)le J(γ) The assertion in The-orem 25 and Theorem 27 that the minimum relative entropy is achieved bya unique distribution πlowast is established in Lemma A6 The final step neededto prove Theorems 25ndash28 is the lower bound J (ω)ge J(γ) This bound isproved in Theorem A14 using the EulerndashLagrange equations together withproperties of the relative entropy

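A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

$$\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \dots, I \quad \text{(monotonicity)} \tag{A.4}$$

and

$$\sum_{i=0}^{I} i\,\omega_i + (I+1)\,\omega_{I+} \le \sum_{i=0}^{I+1} i\,\alpha_i + \beta \quad \text{(conservation)}. \tag{A.5}$$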

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

$$\sum_{i=0}^{I} \Bigl( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \Bigr) = \sum_{j=0}^{I} (I+1-j)(\alpha_j - \omega_j) = (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j). \tag{A.6}$$

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

$$\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \dots, I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}$$

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta \sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).
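The two feasibility conditions are easy to test numerically. The following is a minimal sketch, not part of the original argument; the function name `is_feasible`, the list layout (the last entry of each list holds the mass above occupancy $I$) and the tolerance are assumptions made for illustration.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (A.4) and conservation (A.5) conditions.

    alpha, omega: sequences of length I + 2. Entries 0..I are the fractions at
    occupancy 0..I; the last entry is the fraction above I (alpha_{I+1} and
    omega_{I+} in the text). This layout is an illustrative assumption.
    """
    if len(alpha) != len(omega):
        raise ValueError("alpha and omega must have the same length")
    I = len(alpha) - 2

    # Monotonicity (A.4): partial sums of alpha dominate partial sums of omega.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # Conservation (A.5): the terminal ball count cannot exceed the initial
    # ball count plus the beta balls per urn that are thrown.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol
```

For instance, with all urns initially empty and $I = 1$, the terminal fractions $(0.5, 0.3, 0.2)$ need at least $0.3 + 2 \cdot 0.2 = 0.7$ balls per urn, so they pass the check for $\beta = 1$ and fail it for $\beta = 0.5$.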

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \dots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

$$J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx.$$

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

$$L(\psi, \theta) = D(\theta \,\|\, \gamma) = \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} = \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \Bigl(1 - \sum_{i=0}^{I} \theta_i\Bigr) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.$$


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
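As a rough illustration of the local rate function, the sketch below evaluates $D(\theta \,\|\, \gamma)$ at a single instant from the cumulative occupancies and the rates. The function name `local_rate`, the argument layout and the numerical tolerance are assumptions for illustration; only the formula itself comes from the text.

```python
import math

def local_rate(psi, theta, tol=1e-12):
    """Evaluate L(psi, theta) = D(theta || gamma) at one instant.

    psi:   cumulative occupancies psi_0, ..., psi_I
    theta: rates theta_0, ..., theta_I at which balls enter urns of occupancy i
    Returns math.inf when the relative entropy is infinite.
    """
    # gamma_i = psi_i - psi_{i-1}, with psi_{-1} = 0, plus the lumped class I+.
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, len(psi))]
    gamma.append(1.0 - psi[-1])
    rates = list(theta) + [1.0 - sum(theta)]

    total = 0.0
    for t, g in zip(rates, gamma):
        if t < -tol or g < -tol:
            return math.inf          # not a valid pair of distributions
        if t <= tol:
            continue                 # convention 0 log 0 = 0
        if g <= tol:
            return math.inf          # positive rate into an empty class
        total += t * math.log(t / g)
    return total
```

When the rates match the current occupancies ($\theta = \gamma$) the cost is zero, which is the law-of-large-numbers behavior; forcing every ball into empty urns when only half the urns are empty costs $\log 2$ per unit time.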

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

$$\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\,\frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))$$

for all $i \in \{0, \dots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections, we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

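In the case at hand, the Euler–Lagrange equations are given by

$$-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Bigr] \tag{A.7}$$

for $i = 0, \dots, I-1$, and by

$$-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\Bigl[-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Bigr]. \tag{A.8}$$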

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

$$L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}$$

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

$$-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\Bigr] \tag{A.10}$$

for $i = 0, \dots, I-1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

$$\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C\,P_i(\rho\beta) = 1, \qquad \sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C\,P_i(\rho\beta) = \beta.$$
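One way to compute the twist parameters numerically is sketched below. It is not the authors' procedure: dividing the second equation by the first eliminates $C$ and leaves a one-dimensional equation for $\lambda = \rho\beta$, which is solved here by bisection, assuming that the mean of a Poisson variable conditioned to exceed $I$ is increasing in $\lambda$. The names `log_poisson_pmf`, `tail_sums` and `twist_parameters`, the truncation at 400 terms and the bracket are all illustrative assumptions suited to moderate $I$ and $\beta$.

```python
import math

def log_poisson_pmf(i, lam):
    """log P_i(lam), where P_i(lam) = exp(-lam) lam^i / i! and lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def tail_sums(I, lam, terms=400):
    """Return (sum_{i>I} P_i(lam), sum_{i>I} i P_i(lam)), truncated after `terms` terms."""
    mass = mean = 0.0
    for i in range(I + 1, I + 1 + terms):
        p = math.exp(log_poisson_pmf(i, lam))
        mass += p
        mean += i * p
    return mass, mean

def twist_parameters(omega, beta, iters=200):
    """Solve the two twist equations for (C, rho); omega = [omega_0, ..., omega_I]."""
    I = len(omega) - 1
    mass = 1.0 - sum(omega)                                  # mass above level I
    mean = beta - sum(i * w for i, w in enumerate(omega))    # balls carried above level I
    if mass <= 0.0:
        raise ValueError("polynomial case (C = 0): no exponential twist to solve for")
    target = mean / mass
    if target <= I + 1:
        raise ValueError("constraints leave no room for an exponential tail")

    # Ratio of the two equations: the mean of Poisson(lam) conditioned on
    # exceeding I must equal `target`. That conditional mean is near I + 1 for
    # small lam and at least lam, so the root lies in (0, target).
    lo, hi = 1e-9, target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        m, s = tail_sums(I, mid)
        if s / m < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    C = mass / tail_sums(I, lam)[0]
    return C, lam / beta
```

For example, `twist_parameters([0.5], 1.0)` treats the case $I = 0$, $\omega_0 = 0.5$, $\beta = 1$; the second twist equation then reduces to $C\rho = 1$, which gives a quick sanity check on the output.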

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints, and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

$$\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{k}, \tag{A.8}$$

$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)$$

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

$$\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^{k}. \tag{A.10}$$

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

$$(-1)^i \varphi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11}$$

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \dots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \dots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

$$(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\,\beta^{-i}\,\frac{k!}{(k-i)!}\Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}$$

for $i = 0, \dots, I$, and

$$(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12}$$

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \dots$. We deduce that $\varphi(x) \doteq (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

$$\varphi(\beta) = (-1)^I \psi_0^{(I)}(\beta) = \rho^I C e^{-\rho\beta} + \Bigl(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I} I! = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I \ge 0,$$

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

$$(-1)^I \psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I > 0 \quad \text{for all } x \in [0, \beta].$$

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

$$\psi_0(0) = C\Bigl(1 - \sum_{i=0}^{I} P_i(\rho\beta)\Bigr) + \sum_{i=0}^{I} \omega_i = 1.$$

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \dots, I, I+1, \dots. \tag{A.13}$$

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

$$\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.$$

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

$$\begin{aligned}
\theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned} \tag{A.14}$$

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

$$\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.$$

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

$$\sum_{i=0}^{I} \theta_i(x) \le 1$$

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

$$\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho\,C \sum_{i=I+1}^{\infty} P_i(\rho x),$$

$$1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),$$

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15}$$

Then

$$\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16}$$

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \dots, I$, so that (A.16) implies (A.10).
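The construction in Theorem A.1 can be checked numerically on a small case. The sketch below evaluates $\gamma_i(x)$ from the derivative formulas for $\psi_0$ and confirms two facts used in the proof: the extended sequence sums to one, and the terminal values reproduce $\omega$. The function names and the specific numbers ($I = 1$, $C = 0.5$, $\rho = 1.5$, $\beta = 1$, with $\omega$ chosen so that the twist equations hold) are illustrative assumptions, not values from the paper.

```python
import math

def poisson_pmf(i, lam):
    """P_i(lam) = exp(-lam) lam^i / i! for lam > 0."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def gamma_extremal(i, x, omega, beta, C, rho):
    """gamma_i(x) = x^i / i! * (-1)^i psi_0^{(i)}(x) for the extremal of Theorem A.1."""
    I = len(omega) - 1
    # Exponential part of (-1)^i psi_0^{(i)}(x); present for every i.
    deriv = rho ** i * C * math.exp(-rho * x)
    # Polynomial part; the sum is empty once i > I.
    for k in range(i, I + 1):
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)
        deriv += coeff * beta ** (-i) * falling * (1.0 - x / beta) ** (k - i)
    return x ** i / math.factorial(i) * deriv

# Illustrative data: with e = exp(-1.5), these omega satisfy both twist
# equations for C = 0.5, rho = 1.5, beta = 1 (an assumption for this demo).
e = math.exp(-1.5)
OMEGA, BETA, C, RHO = [0.25 + 0.5 * e, 0.25 + 0.75 * e], 1.0, 0.5, 1.5

def total_mass(x, terms=60):
    """Truncated sum of gamma_i(x) over i = 0, ..., terms - 1."""
    return sum(gamma_extremal(i, x, OMEGA, BETA, C, RHO) for i in range(terms))
```

In this example $\psi_0(x) = 0.5\,e^{-1.5x} + 0.25 + 0.25(1 - x)$, so $\psi_0(0) = 1$ and $\psi_0^{(1)}(0) = -1$ can also be confirmed by hand.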

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

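Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

$$(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I,$$

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \dots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

$$\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0,$$

$$\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}$$

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0, \dots, \gamma_I, \gamma_{I+}\}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{\infty} \bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.17}$$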

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

$$\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).$$

Then for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

$$\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}$$

Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

$$J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.$$

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

$$\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned} \tag{A.18}$$

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$, since for $i > I$,

$$-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\,\gamma_i(x)\bigl[\log C + i\log\rho - \rho x\bigr],$$

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

$$\sum_{i=0}^{\infty} \bigl[\gamma_i \log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.$$

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \dots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

$$\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,$$

$$-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \dots, I-1,$$

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19}$$

Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, \mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

$$J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, \mathcal{P}(\beta)).$$

Proof. We first solve the minimization problem, and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \dots, I$, and define the Lagrangian $L$ on this set to be

$$L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\,\pi_i\Bigr),$$

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi_i^* \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi_i^*$, $z$ and $y$ and the strict convexity of $x \log x$ imply that $\pi_i^*$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

$$\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - z\,i\,\pi_i \ge \pi_i^* \log\frac{\pi_i^*}{P_i(\beta)} - y\pi_i^* - z\,i\,\pi_i^*.$$

Following standard Lagrangian arguments, we thus have

$$\inf_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, \mathcal{P}(\beta)) = \inf_{\pi \in F(1, \omega, \beta)} L(\pi, y, z) \ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) = D(\pi^* \,\|\, \mathcal{P}(\beta)).$$

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi_i^*$ into (A.17), we have

$$J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \,\|\, \mathcal{P}(\beta)),$$

where

$$\gamma_i(\beta) = \begin{cases} \omega_i, & 0 \le i \le I, \\ C\,P_i(\rho\beta), & i > I. \end{cases}$$

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

$$J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)\bigl(\log C + (1 - \rho)\beta\bigr) + \Bigl(\beta - \sum_{i=0}^{I} i\,\omega_i\Bigr)\log\rho,$$

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
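The explicit cost can be compared against a direct evaluation of the relative entropy $D(\pi^* \,\|\, \mathcal{P}(\beta))$. The sketch below does this for one illustrative data set ($I = 1$, $C = 0.5$, $\rho = 1.5$, $\beta = 1$, with $\omega$ chosen so that the twist equations hold); the function names, the data and the truncation of the infinite sum are assumptions made for the demonstration.

```python
import math

def log_poisson_pmf(i, lam):
    """log P_i(lam) for lam > 0; log space avoids underflow for large i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def cost_closed_form(omega, beta, C, rho):
    """Explicit cost of an exponential extremal in terms of rho, C, omega, beta."""
    first = sum(w * (math.log(w) - log_poisson_pmf(i, beta))
                for i, w in enumerate(omega) if w > 0.0)
    mass = 1.0 - sum(omega)                                  # mass above level I
    mean = beta - sum(i * w for i, w in enumerate(omega))    # balls above level I
    return first + mass * (math.log(C) + (1.0 - rho) * beta) + mean * math.log(rho)

def cost_relative_entropy(omega, beta, C, rho, terms=200):
    """Truncated D(pi* || P(beta)), pi*_i = omega_i (i <= I) or C P_i(rho beta) (i > I)."""
    I = len(omega) - 1
    total = 0.0
    for i in range(terms):
        if i <= I:
            if omega[i] <= 0.0:
                continue
            log_p = math.log(omega[i])
        else:
            log_p = math.log(C) + log_poisson_pmf(i, rho * beta)
        total += math.exp(log_p) * (log_p - log_poisson_pmf(i, beta))
    return total

# Illustrative data satisfying the twist equations (an assumption for this demo).
e = math.exp(-1.5)
OMEGA, BETA, C, RHO = [0.25 + 0.5 * e, 0.25 + 0.75 * e], 1.0, 0.5, 1.5
```

The two functions should agree up to truncation and rounding error, and the common value is nonnegative, as a relative entropy must be.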

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k \beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

$$\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).$$

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1 with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point, we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

$$\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \tag{A.20}$$

subject to the conservation constraint

$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{k,j} = \beta \tag{A.21}$$

and the terminal conditions

$$\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\,\pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22}$$

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

$$\pi_{k,j} = 0, \qquad k + j > I, \tag{A.23}$$

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \dots, \pi^{(k)}, \dots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
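The ordered filling construction can be sketched in a few lines for the polynomial case. The text only says that mass is drawn from the eligible classes "as necessary"; the version below makes one specific choice, drawing from the lowest class first, which is an assumption of this sketch. The function name `ordered_filling` and the dictionary layout are likewise illustrative.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Build one feasible point for polynomial constraints by ordered filling.

    alpha: dict {k: alpha_k} over the classes with alpha_k > 0 (all k <= I)
    omega: list [omega_0, ..., omega_I] with total mass 1 (polynomial case)
    Returns a dict pi with pi[k][j] = pi_{k,j} for j = 0, ..., I - k.
    """
    I = len(omega) - 1
    classes = sorted(alpha)
    remaining = dict(alpha)      # unassigned mass of each class, weighted by alpha_k
    pi = {k: [0.0] * (I - k + 1) for k in classes}

    for i, need in enumerate(omega):
        # Level i can only be filled by classes k <= i (urns never lose balls).
        for k in classes:
            if k > i:
                break
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition fails at level %d" % i)
    return pi
```

For example, with half the urns initially empty and half holding one ball, `ordered_filling({0: 0.5, 1: 0.5}, [0.2, 0.3, 0.5])` assigns class 0 the distribution $(0.4, 0.6, 0)$ and class 1 the distribution $(0, 1)$, which corresponds to $\beta = 0.5 \cdot 0.6 + 0.5 \cdot 1 = 0.8$ additional balls per urn.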

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

$$\sum_{i:\,P_i > Q_i} (P_i - Q_i) = \sum_{i:\,Q_i > P_i} (Q_i - P_i),$$

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

$$H_A \doteq \Bigl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \le A\Bigr\}.$$

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

$$Q_A(\alpha, \omega, \beta) \doteq F(\alpha, \omega, \beta) \cap H_A$$

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

$$\sum_{j \ge m} j\,\pi^{(n)}_{k,j} \le \varepsilon \tag{A.24}$$

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

$$xy \le (x\log x - x + 1) + (e^y - 1),$$

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

$$\sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.$$


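Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate

$$\begin{aligned}
\sum_{j \ge m} j\,\pi^{(n)}_{k,j} &= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{k,j}\log\Bigl(\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\Bigr) - \pi^{(n)}_{k,j} + P_j(\beta)\Bigr] + \delta\sum_{j \ge m} (e^{j/\delta} - 1)P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)) + \delta\sum_{j \ge m} (e^{j/\delta} - 1)P_j(\beta) \\
&\le \delta A + \delta\sum_{j \ge m} (e^{j/\delta} - 1)P_j(\beta),
\end{aligned}$$

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]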

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

$$\beta_k^{(n)} \to \beta_k.$$

Finally,

$$\sum_{k} \alpha_k\beta_k = \lim_{n} \sum_{k} \alpha_k\beta_k^{(n)} = \lim_{n} \beta = \beta,$$

so that $\pi \in F(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

$$G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \ge 0.$$

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

$$\sum_{k \in \mathcal{K}} \alpha_k D(\pi^*_{(k)} \,\|\, \mathcal{P}(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)) = G,$$

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


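Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that

$$(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)$$

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

$$\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).$$

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

$$\mathcal{J}^{(n)} \le G + 1/n < A,$$

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

$$\beta_k^{(n)} \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \to \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{k,j}, \qquad k = 0, 1, \dots, K.$$

If $\alpha_k^{(n)} \to 0$ and $\beta_k^{(n)}\alpha_k^{(n)} \not\to 0$, then the term $\alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha_k^{(n)}\beta_k^{(n)} \to \alpha_k\beta_k$, even if $\alpha_k = 0$. Thus

$$\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_{n} \sum_{k} \beta_k^{(n)}\alpha_k^{(n)} = \lim_{n} \beta^{(n)} = \beta,$$

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

$$\begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)})) &\ge \liminf_{n} \sum_{k:\,\alpha_k > 0} \alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k > 0} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}$$

as desired.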
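The value $\beta - \lambda + \lambda\log(\lambda/\beta)$ used in the proof is the relative entropy of a Poisson distribution with mean $\lambda$ with respect to $\mathcal{P}(\beta)$. The sketch below checks that identity numerically; it does not verify the infimum property itself, only that the Poisson distribution attains the stated value and that one other distribution with the same mean costs more. The function names and the truncation level are illustrative assumptions.

```python
import math

def log_poisson_pmf(i, lam):
    """log P_i(lam) for lam > 0; log space avoids underflow for large i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def kl_poisson(lam, beta, terms=200):
    """Truncated relative entropy D(P(lam) || P(beta)) computed term by term."""
    total = 0.0
    for i in range(terms):
        log_p = log_poisson_pmf(i, lam)
        total += math.exp(log_p) * (log_p - log_poisson_pmf(i, beta))
    return total

def kl_closed_form(lam, beta):
    """The value beta - lam + lam log(lam / beta) quoted in the proof."""
    return beta - lam + lam * math.log(lam / beta)

def kl_point_mass(m, beta):
    """D(delta_m || P(beta)) for the point mass at the integer m."""
    return -log_poisson_pmf(m, beta)
```

With $\lambda = 2$ and $\beta = 1$, both expressions evaluate to $2\log 2 - 1$, while the point mass at $2$ (which also has mean $2$) has the larger cost $1 + \log 2$.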

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\bar\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\bar\pi_{m,n} > 0$, that $\bar\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\bar\pi$ has finite support. If we define

$$\bar\omega_i \doteq \sum_{k=0}^{i} \alpha_k\,\bar\pi_{k,i-k}, \qquad \bar\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\bar\pi_{k,j},$$

then $\bar\pi$ is a feasible solution for the constraints $(\alpha, \bar\omega, \bar\beta)$. Next, for any $\eta > 0$, we may define

$$\hat\beta \doteq (\beta - \eta\bar\beta)/(1 - \eta), \qquad \hat\omega \doteq (\omega - \eta\bar\omega)/(1 - \eta).$$

By construction, the constraints $(\alpha, \hat\omega, \hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha, \omega, \beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \bar\omega, \bar\beta)$, and hence for $(\alpha, \hat\omega, \hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha, \hat\omega, \hat\beta)$, and finally form $\tilde\pi = \eta\bar\pi + (1 - \eta)\hat\pi$. By construction, $\tilde\pi$ is a feasible solution corresponding to the original constraints $(\alpha, \omega, \beta)$, and in addition $\tilde\pi_{m,n} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\tilde\pi$ also has finite support.

We will show that π is not a minimizer by proving that a sufficientlysmall perturbation toward π reduces the objective function Because theconstraints are linear the points πε = επ+(1minus ε)π are feasible solutions tothe original constraints for all 0 le ε le 1 Let f(ε) denote the value of theobjective function evaluated at πε The derivative of this function is

f prime(ε) =sum

kisinK

αk

infinsum

j=0

(

logπεkjPj(β)

+ 1

)

(πkj minus πkj)

=sum

kisinK

αk

infinsum

j=0

logπεkjPj(β)

(πkj minus πkj)

= αmπmn logπεmn

Pn(β)+sum

kisinK

αk

sum

(kj)6=(mn)

πkj logπεkjPj(β)

minussum

kisinK

αk

sum

j

πkj logπεkjPj(β)

In the limit as εrarr 0 the third term in the last display tends to

minussum

k

αkD(π(k)P(β)) le 0

The second term is bounded above by the expression minussum

k αksum

j πkj logPj(β)which is finite because π has finite support The first term in the displaytends to minusinfin which establishes that f(δ)lt f(0) for some sufficiently smallδ gt 0 We have shown that for any m+n isin I(ω) π cannot minimize (A20)if πmn = 0 As πmn is arbitrary withinm+n isin I(ω) it follows that πlowastmn gt 0whenever m+ n isin I(ω)
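The mechanism behind this argument is that the slope of $t \log t$ at $t = 0$ is $-\infty$, so moving a vanishing entry toward a positive value always lowers the objective for small enough steps. The following is a minimal numerical sketch of that effect for a single distribution (one value of $k$, weight $\alpha_k = 1$); the vectors `pi` and `pi_hat` and the value of `beta` are hypothetical and chosen only for illustration.

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) probability of the value j."""
    return math.exp(-lam) * lam ** j / math.factorial(j)

def objective(pi, ref):
    """sum_j pi_j log(pi_j / ref_j), with the convention 0 log 0 = 0."""
    return sum(p * math.log(p / r) for p, r in zip(pi, ref) if p > 0.0)

def mix(eps, pi_hat, pi):
    """The perturbed point pi^eps = eps * pi_hat + (1 - eps) * pi."""
    return [eps * a + (1.0 - eps) * b for a, b in zip(pi_hat, pi)]

beta = 1.0
ref = [poisson_pmf(j, beta) for j in range(3)]  # P_0(beta), P_1(beta), P_2(beta)
pi = [0.5, 0.5, 0.0]       # hypothetical candidate with a vanishing entry
pi_hat = [0.4, 0.4, 0.2]   # hypothetical competitor, positive at that entry

f0 = objective(pi, ref)
steps = (1e-2, 1e-4, 1e-6)
slopes = [(objective(mix(e, pi_hat, pi), ref) - f0) / e for e in steps]
for e, s in zip(steps, slopes):
    print(f"eps={e:g}  (f(eps) - f(0)) / eps = {s:.4f}")
```

The difference quotients are negative and keep decreasing as the step shrinks, which is the numerical signature of $f'(\varepsilon) \to -\infty$.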

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

46 P DUPUIS C NUZMAN AND P WHITING

\[
\pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j+k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise}. \end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k+j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{j:\,k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)} + \sum_{k \in \mathcal{K}} z_k \alpha_k \left( 1 - \sum_{l:\,k+l \in \mathcal{I}} \pi_{kl} \right) + \sum_{i \in \mathcal{I}} w_i \left( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \right)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)\, e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
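The product form $D_k P_j(\beta) W_{k+j}$ can be observed numerically. The sketch below uses iterative proportional fitting, which is a standard scaling method for entropy minimization under marginal-type constraints and is not the argument used in the proof; it alternately rescales the rows (index $k$) and the anti-diagonals (index $i = k+j$) of $\mu_{kj} = \alpha_k \pi_{kj}$, starting from $\alpha_k P_j(\beta)$. The constraint values `alpha`, `omega` and `beta` are hypothetical, chosen so that the constraints are feasible and of polynomial type.

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) probability of the value j."""
    return math.exp(-lam) * lam ** j / math.factorial(j)

# Hypothetical polynomial-case data: all terminal mass sits on i <= 2.
alpha = {0: 0.5, 1: 0.5}             # initial occupancy fractions alpha_k
omega = {0: 0.10, 1: 0.55, 2: 0.35}  # terminal occupancy fractions omega_i
beta = 0.75                          # equals sum_i i*omega_i - sum_k k*alpha_k

support = [(k, j) for k in alpha for j in range(3) if k + j in omega]

# Work with mu_kj = alpha_k * pi_kj, started at the reference alpha_k * P_j(beta).
mu = {(k, j): alpha[k] * poisson_pmf(j, beta) for (k, j) in support}

for _ in range(500):
    for k in alpha:   # row step: enforce sum_j pi_kj = 1
        s = sum(mu[c] for c in support if c[0] == k)
        for c in support:
            if c[0] == k:
                mu[c] *= alpha[k] / s
    for i in omega:   # anti-diagonal step: enforce sum_k alpha_k pi_{k,i-k} = omega_i
        s = sum(mu[c] for c in support if c[0] + c[1] == i)
        for c in support:
            if c[0] + c[1] == i:
                mu[c] *= omega[i] / s

pi = {c: mu[c] / alpha[c[0]] for c in support}
ratio = {c: pi[c] / poisson_pmf(c[1], beta) for c in support}  # should equal D_k * W_{k+j}
row_sums = {k: sum(pi[c] for c in support if c[0] == k) for k in alpha}
mean = sum(alpha[k] * j * pi[(k, j)] for (k, j) in support)
print(row_sums, mean)
```

Because every update multiplies an entry by a factor depending only on $k$ or only on $k+j$, the ratio $\pi_{kj}/P_j(\beta)$ keeps the factorized form throughout, so cross-ratios such as `ratio[(0, 1)] / ratio[(1, 0)]` and `ratio[(0, 2)] / ratio[(1, 1)]` both equal $D_0/D_1$.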

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k+j \le I,\ k+j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k+j > I, \\ 0, & \text{otherwise}. \end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} = \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\,\pi^*_{kl}, \qquad \eta^{(M)}_k = \sum_{l \le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\begin{aligned}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{aligned}
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} ={}& \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)} + y^{(M)} \left( \beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{kl} \right) \\
&+ \sum_{k \in \mathcal{K}} z^{(M)}_k \alpha_k \left( \eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} \right) + \sum_{i \in \mathcal{I}} w^{(M)}_i \left( \omega_i - \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \right).
\end{aligned}
\]

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\, e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k+j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\, e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
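The substitution in the last step rests on the elementary identity $P_j(\beta)\,e^{jy} = e^{(\rho-1)\beta} P_j(\rho\beta)$ with $\rho = e^{y}$, which converts an exponential tilt of the Poisson($\beta$) weights into Poisson($\rho\beta$) weights. A minimal numerical check, with hypothetical values of `beta` and `y`:

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) probability of the value j."""
    return math.exp(-lam) * lam ** j / math.factorial(j)

def tilted_weight(j, beta, y):
    """The factor P_j(beta) * exp(j * y) appearing in the optimality condition."""
    return poisson_pmf(j, beta) * math.exp(j * y)

beta, y = 1.3, 0.4      # hypothetical values, for illustration only
rho = math.exp(y)       # the substitution rho = e^y

# Compare both sides of P_j(beta) e^{jy} = e^{(rho - 1) beta} P_j(rho * beta).
gaps = [
    abs(tilted_weight(j, beta, y) - math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta))
    for j in range(8)
]
max_gap = max(gaps)
print(max_gap)
```

The constant $e^{(\rho-1)\beta}$ does not depend on $j$, which is why it can be absorbed into $C_k$.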

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{*(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and has a Poisson($\rho_k\beta_k$) tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega^{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega^{(k)}, \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega^{(k)}, \beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \quad i = 0, \ldots, I, \qquad \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\bar\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k \bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).
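As a sanity check on the superposition formula, consider the simplest special case, assumed here purely for illustration: no twisting at all, so that every subproblem curve is the Poisson curve $\gamma_{kj}(x) = P_j(x)$, $\beta_k = \beta$, and the rates equal the occupancies, $\theta_i = \gamma_i$. The superposed curves are then $\gamma_i(x) = \sum_{k \le i} \alpha_k P_{i-k}(x)$. The sketch below checks numerically that they start at $\alpha$, sum to one, and satisfy $\dot\gamma_i = \gamma_{i-1} - \gamma_i$; the vector `alpha` is hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) probability of the value j (equals 1 at j = 0, lam = 0)."""
    return math.exp(-lam) * lam ** j / math.factorial(j)

def gamma(i, x, alpha):
    """Superposed curve gamma_i(x) = sum_{k <= i} alpha_k * P_{i-k}(x)."""
    return sum(a * poisson_pmf(i - k, x) for k, a in enumerate(alpha) if k <= i)

alpha = [0.6, 0.3, 0.1]   # hypothetical initial occupancy fractions
x, h = 0.7, 1e-5

total = sum(gamma(i, x, alpha) for i in range(60))   # should be close to 1

def ode_gap(i):
    """Central-difference check of d/dx gamma_i = gamma_{i-1} - gamma_i."""
    derivative = (gamma(i, x + h, alpha) - gamma(i, x - h, alpha)) / (2.0 * h)
    inflow = gamma(i - 1, x, alpha) if i > 0 else 0.0
    return abs(derivative - (inflow - gamma(i, x, alpha)))

max_gap = max(ode_gap(i) for i in range(6))
print(total, max_gap)
```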

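In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\bar\psi_{k0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k P_i(\rho_k\beta_k)) \left( 1 - \frac{x\beta_k/\beta}{\beta_k} \right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k P_i(\rho\beta)) \left( 1 - \frac{x}{\beta} \right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{i} \right].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I-k$. The $k$th derivative of $\psi_0 = \alpha_0 \bar\psi_{00}$ is

\[
\begin{aligned}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \left( 1 - \frac{x}{\beta} \right)^{i-k} \right] \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \bar\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]

The second line uses $P_i(\rho\beta)\, i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$ together with the change of index $j = i-k$.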

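Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k0}^{(i-k)}\!\left( x\beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{aligned}
\]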

Similarly we obtain

\[
\begin{aligned}
\theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\!\left( x\beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\, \bar\psi_{k0}(x) \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

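Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(\beta) \log|\psi_0^{(i)}(\beta)| \right] - \sum_{k=0}^{K} \alpha_k \log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(\beta) \log\left| \frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}} \right|.
\end{aligned}
\]

The fact that $\bar\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \bar\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \left( \frac{\beta_k}{\beta} \right)^{j} \psi_{k0}^{(j)}\!\left( x\beta_k/\beta \right) e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\bar\gamma_{k\cdot}(\beta) \,\|\, P(\beta)).
\]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).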

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon \doteq (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.
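The convexity used in Lemma A.15 is joint convexity of relative entropy in the pair $(\theta,\gamma)$. As a small illustration (a numerical spot check, not a proof), the sketch below draws random pairs of probability vectors and confirms the midpoint inequality $D(\tfrac12(\theta^1+\theta^2) \,\|\, \tfrac12(\gamma^1+\gamma^2)) \le \tfrac12 D(\theta^1\|\gamma^1) + \tfrac12 D(\theta^2\|\gamma^2)$. The helper names and the dimension 4 are arbitrary choices.

```python
import math
import random

def rel_entropy(theta, gamma):
    """D(theta || gamma) with 0 log 0 = 0, and +infinity if theta_i > 0 = gamma_i."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t > 0.0:
            if g == 0.0:
                return math.inf
            total += t * math.log(t / g)
    return total

def random_prob(rng, n):
    """A strictly positive random probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
worst = -math.inf   # largest observed value of D(midpoint) - average of D
for _ in range(1000):
    t1, g1, t2, g2 = (random_prob(rng, 4) for _ in range(4))
    tm = [(a + b) / 2.0 for a, b in zip(t1, t2)]
    gm = [(a + b) / 2.0 for a, b in zip(g1, g2)]
    gap = rel_entropy(tm, gm) - 0.5 * (rel_entropy(t1, g1) + rel_entropy(t2, g2))
    worst = max(worst, gap)
print(worst)   # never positive, up to rounding
```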

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

\[
\eta(x) \doteq \tilde\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx. \tag{A.30}
\]

We can assume without loss that $G[1] = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\bar\gamma_{k0}(x)$ and $\bar\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) \ge \alpha_i \bar\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and $\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

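The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \left. \frac{\partial L}{\partial\psi_i} \right|_{\gamma^\varepsilon}\!(x)\,\eta_i(x) - \sum_{i=0}^{I} \left. \frac{\partial L}{\partial\theta_i} \right|_{\gamma^\varepsilon}\!(x)\,\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \qquad
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left| \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) \right| < B \quad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.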

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx
\]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[
\int_0^x \left. \frac{\partial L}{\partial\psi_i} \right|_{\gamma} dx' = \int_0^x \left. \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right) \right|_{\gamma} dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, and using $\eta(0) = \eta(\beta) = 0$, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta \left( \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \right) dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1-\lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}\big(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\big) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


inequality implies that the minimizing trajectory is a straight line, and so the variational problem is actually finite dimensional. Another exception is the case of large deviation asymptotics of a small noise linear stochastic differential equation. In this case the variational problem takes the form of the classical linear quadratic regulator, and the explicit solution is well known from the theory of deterministic optimal control. However, in this case there is no need to "lift" the problem to the sample path level, since the distribution of the diffusion at any time is Gaussian with explicitly calculable mean and covariance. Other exceptions occur in large deviation problems from queuing theory, but in these problems the variational integrand is "sectionally" independent of $\gamma$, and one can show that the minimizing trajectories are the concatenation of a finite number of straight line segments (see, e.g., [1] and the references therein).

For the occupancy problem, the variational integrand has a complicated state dependence [see (2.2)], reflecting the complicated dependence of the transition probabilities (in the process level version) on the state. Nonetheless, the rate function possesses a great deal of structure that can be heavily exploited. For example, the function $J$ turns out to be strictly convex on path space, and so all local minimizers in the variational problem are actually global minimizers. Perhaps even more surprising is the fact that explicit solutions to the Euler–Lagrange equations can be constructed, and as a consequence the variational problem can be more-or-less solved explicitly (see the Appendix). Both of these properties follow from the fact that the variational integrand $L$ can be defined in terms of the famous relative entropy function (or divergence).

A technical novelty of the problem is the singular behavior of the transition probabilities of the underlying Markov process. Since only cells containing $j$ balls can become cells with $j+1$, it is clear that the transition probability corresponding to such an event scales linearly with $\Gamma^n_j(x)$, and in particular that it vanishes at the boundary of the state space when $\Gamma^n_j(x) = 0$. This poses no difficulty for the large deviations upper bound, but is an obstacle for the lower bound. (Some results that address lower bounds when rates go to zero include Chapter 8 of [26], which treats processes with "flat" boundaries, and recent general results in [27].) For the occupancy model, existing results provide a lower bound for open sets of trajectories that do not touch the boundary, because away from the boundaries the set $\{\xi : L(\Gamma,\xi) < \infty\}$ is independent of $\Gamma$. To deal with more general open sets we use a perturbation argument. We first show, using the strict convexity of $J$ and properties of the zero cost paths, that it is enough to prove large deviation lower bounds for open neighborhoods of trajectories that stay away from the boundary for all positive times. (Note that this still allows the trajectory to start on the boundary.) Loosely speaking, to prove the lower bound for sets of this type, it is enough to show that given $a > 0$ there is $b > 0$ such that the probability that the process is at least distance $b$ from the boundary by time $b$ is bounded below by $\exp\{-na\}$. It turns out that these bounds can be easily established by exploiting an explicit representation for such probabilities that was proved in [11].

An outline of the paper is as follows. Section 2 states the main results of the paper. In the first part of Section 2, we construct the underlying stochastic process model and state the corresponding sample path level LDP, as well as the LDP for the terminal distribution. The proof of the sample path LDP is given in the following section, although in Section 2 an additional heuristic argument for the form of the local rate function, based on Sanov's theorem, is provided. In the second part of Section 2, the terminal distribution rate function is characterized and explicit expressions for the sample path minimizers are presented. These constitute more-or-less complete solutions to the corresponding calculus of variations problem. The detailed proofs of these latter results are deferred to the Appendix.

Section 3 gives a proof of the sample path LDP. The solutions to the variational problem are exemplified in Section 4, where we show how specific questions regarding the occupancy problem can be answered. In particular, we work out the asymptotics for a number of examples, including an overflow problem and a partial coupon collection problem. Generalizations to our results are also described.

2. Main results.

2.1. Statement of the LDP. In this section we formulate the problem of interest and state the LDP. The proof will be given in Section 3. As noted in the Introduction, our focus is the asymptotic behavior of the occupancy problem. If $n$ denotes the total number of cells, then to have a nontrivial limit the number of balls placed in the cells should scale linearly with $n$. We will place $\lfloor\beta n\rfloor$ balls in the cells, where $\beta \in (0,\infty)$ is a fixed parameter and $\lfloor a\rfloor$ denotes the integer part of $a$.

As will be seen in the sequel, the large deviation asymptotics of the occupancy should be treated by first lifting the problem to the level of sample path large deviations, and then using the contraction principle to reduce to the original problem. To do this, we introduce a "time" variable $x$ that ranges from 0 to $\beta$. At time $x$ one should imagine that $\lfloor nx\rfloor$ balls have been thrown. Thus the occupancy process will be piecewise constant over intervals of the form $[i/n, i/n + 1/n)$. With this scaling of time, large deviation asymptotics can be obtained if we scale space by a factor of $1/n$. Thus we define the random occupancy process $\Gamma^n(x) \doteq (\Gamma^n_0(x), \ldots, \Gamma^n_I(x), \Gamma^n_{I+}(x))$ by letting $\Gamma^n_i(x)$, $i = 0, \ldots, I$, denote $1/n$ times the number of cells with exactly $i$ balls at time $x$, and letting $\Gamma^n_{I+}(x)$ be $1/n$ times the number of cells with more than $I$ balls at time $x$. Note that $\Gamma^n$ takes values in the set of probability vectors on $I+2$ points: $S_I \doteq \{\gamma \in \mathbb{R}^{I+2} : \gamma_j \ge 0,\ 0 \le j \le I+1,\ \text{and } \sum_{j=0}^{I+1} \gamma_j = 1\}$.

The process $\{\Gamma^n(i/n),\ i = 0, 1, \ldots\}$ is obviously Markovian. It will be convenient to work with the following "dynamical system" representation:

\[
\Gamma^n\!\left( \frac{i+1}{n} \right) = \Gamma^n\!\left( \frac{i}{n} \right) + \frac{1}{n}\, b_i\!\left( \Gamma^n\!\left( \frac{i}{n} \right) \right),
\]

where the independent and identically distributed random vector fields $\{b_i(\cdot),\ i = 0, 1, \ldots\}$ have distributions

\[
P\{b_i(\gamma) = v\} = \begin{cases} \gamma_j, & \text{if } v = e_{j+1} - e_j,\ 0 \le j \le I, \\ \gamma_{I+1}, & \text{if } v = 0. \end{cases}
\]

Here $e_j$ represents the vector in $\mathbb{R}^{I+2}$ whose $j$th element is unity and for which all other elements are zero. The occupancy after $\lfloor\beta n\rfloor$ balls have been thrown can then be represented by $n\Gamma^n(\beta)$.
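The recursion above can be simulated directly by tracking only the occupancy counts. The following is a minimal sketch; the function name `simulate_occupancy`, the seed and the parameter values are illustrative choices, and indices are zero-based with the last entry playing the role of $\Gamma^n_{I+}$.

```python
import math
import random

def simulate_occupancy(n, beta, I, seed=0):
    """Throw floor(beta * n) balls into n cells and return Gamma^n(beta).

    counts[j] (j <= I) is the number of cells holding exactly j balls;
    counts[I + 1] is the number of cells holding more than I balls.
    """
    rng = random.Random(seed)
    counts = [0] * (I + 2)
    counts[0] = n                        # empty initial condition
    for _ in range(math.floor(beta * n)):
        # A uniformly chosen cell lies in occupancy class j with probability counts[j] / n.
        u = rng.randrange(n)
        acc = 0
        for j, c in enumerate(counts):
            acc += c
            if u < acc:
                break
        if j <= I:                       # increment e_{j+1} - e_j
            counts[j] -= 1
            counts[j + 1] += 1
        # otherwise the ball lands in an overfull cell and the increment is zero
    return [c / n for c in counts]

final = simulate_occupancy(n=20000, beta=1.0, I=2)
print(final)   # the first entry should be near exp(-1), the typical fraction of empty cells
```

Tracking the counts rather than the individual cells is exactly what makes the process Markovian in the variable $\Gamma^n$.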

As we have discussed, the large deviations behavior of $\Gamma^n(\beta)$ will be deduced from that of the process $\Gamma^n(\cdot)$. In order to state the large deviation asymptotics precisely, we should clarify the space in which $\Gamma^n(\cdot)$ takes values and the topology used on that space. As usual for processes with jumps of this sort, we use the Skorokhod space $D([0,\beta] : \mathbb{R}^{I+2})$, together with the Skorokhod topology ([5], Chapter 4). However, the large deviations properties of $\{\Gamma^n\}$ are the same as for the processes $\{\tilde\Gamma^n\}$, where $\tilde\Gamma^n$ is defined to be the piecewise linear process which agrees with $\Gamma^n$ at times of the form $i/n$. For readers unfamiliar with the Skorokhod space and associated topology, the identical large deviations results hold for $\{\tilde\Gamma^n\}$, save that the space of continuous functions and the sup norm topology are used instead.

To complete the statement of the LDP for $\Gamma^n$, we need some additional notation. A vector of (deterministic) occupancy rates $\theta(x) = (\theta_0(x), \ldots, \theta_I(x), \theta_{I+}(x))$ is a measurable mapping from $[0,\beta]$ to $S_I$. Intuitively, these rates represent the rate at which balls flow into urns of a given occupancy level. Associated with each such vector of rates is the corresponding deterministic occupancy function $\gamma(x) = (\gamma_0(x), \ldots, \gamma_I(x), \gamma_{I+}(x))$, which is defined by the initial condition $\gamma(0)$ and the differential equations $\dot\gamma_0(x) = -\theta_0(x)$, $\dot\gamma_j(x) = \theta_{j-1}(x) - \theta_j(x)$ for $j = 1, \ldots, I$, and $\dot\gamma_{I+}(x) = \theta_I(x)$. These equations reflect the idea that the fraction of urns containing $i$ balls increases as balls enter $(i-1)$-occupied urns and decreases as balls enter $i$-occupied urns. Defining the matrix

\[
M = \begin{pmatrix}
-1 & 0 & 0 & \cdots & 0\\
1 & -1 & 0 & \cdots & 0\\
0 & 1 & -1 & \cdots & 0\\
\vdots & & \ddots & \ddots & \vdots\\
0 & \cdots & 1 & -1 & 0\\
0 & \cdots & 0 & 1 & 0
\end{pmatrix},
\tag{2.1}
\]


we can write $\dot\gamma(x) = M\theta(x)$ for all $x \in [0,\beta]$. Conversely, given a differentiable occupancy function $\gamma$, the corresponding rates are uniquely determined by $\dot\gamma(x) = M\theta(x)$ and the normalization $\sum_{i=0}^{I} \theta_i(x) + \theta_{I+}(x) = 1$. We will also be interested in the cumulative occupancy function $\psi$, with components $\psi_i(x) = \sum_{j=0}^{i} \gamma_j(x)$. Inspecting the cumulative sums of the rows of $M$ shows that $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$. As more balls are thrown, the fraction of urns containing $i$ or fewer balls can only decrease, and the rate of decrease is the rate at which balls enter $i$-occupied urns. We will say that $\gamma$ is a valid occupancy function if $\gamma$ is absolutely continuous, if $\gamma(x)$ is a probability vector for all $x \in [0,\beta]$, and if its associated $\theta(x)$ is a probability vector for almost all $x \in [0,\beta]$. Note that the functions $\psi$, $\gamma$ and $\theta$ are interchangeable, in the sense that each can be derived from any of the others [given $\gamma(0)$ in the case of $\theta$]. Thus we say that $\theta$ and $\psi$ are valid if the associated $\gamma$ is valid. The following lemma gives a direct characterization of validity in terms of $\psi$.

Lemma 2.1. A vector of $I + 1$ continuous functions $\psi$, each of which maps $[0,\beta]$ to $[0,1]$, is a valid cumulative occupancy path if and only if

(a) $\psi_i(x) \ge \psi_{i-1}(x)$,
(b) $\psi_i(x) \ge \psi_i(y)$,
(c) $\sum_{k=0}^{I} (\psi_k(x) - \psi_k(y)) \le y - x$,

for each $0 \le i \le I$ and $0 \le x < y \le \beta$.

Sketch of the proof. In the forward direction, the first condition plus the bound $\psi_I(x) \le 1$ ensure that $\gamma(x)$ is a probability distribution. The second and third conditions together imply that $\psi$ is Lipschitz continuous with constant 1, and hence absolutely continuous. This implies the absolute continuity of $\gamma$. Since $\dot\psi = -\theta$, the third condition ensures that $\theta$ is almost always a probability distribution. The reverse direction proceeds similarly.

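Given $\gamma, \theta \in S_I$, let $D(\theta\|\gamma)$ denote the relative entropy of $\theta$ with respect to $\gamma$. Thus

\[
D(\theta\|\gamma) = \sum_{i=0}^{I+1} \theta_i \log(\theta_i/\gamma_i),
\]

with the understanding that $\theta_i \log(\theta_i/\gamma_i) = 0$ whenever $\theta_i = 0$, and that $\theta_i \log(\theta_i/\gamma_i) = \infty$ if $\theta_i > 0$ and $\gamma_i = 0$. If $\gamma(x)$ is a valid occupancy function with corresponding occupancy rates $\theta(x)$, then we set

\[
J(\gamma) = \int_0^\beta D(\theta(x)\|\gamma(x))\,dx.
\tag{2.2}
\]

In all other cases set $J(\gamma) = \infty$.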
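The definitions of the relative entropy and of the path cost (2.2) can be checked numerically. The following is a minimal sketch, not part of the paper: the function names `relative_entropy` and `path_cost` are illustrative, and the integral is approximated with a midpoint rule, which is an assumption of this sketch rather than anything used in the analysis.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) with the conventions 0*log(0/g) = 0 and t*log(t/0) = inf."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue              # zero-mass terms contribute nothing
        if g == 0.0:
            return math.inf       # positive rate into an empty occupancy class
        total += t * math.log(t / g)
    return total

def path_cost(theta_fn, gamma_fn, beta, steps=1000):
    """Midpoint-rule approximation of J(gamma) = int_0^beta D(theta(x) || gamma(x)) dx.

    theta_fn and gamma_fn map x in [0, beta] to probability vectors; the caller
    is responsible for supplying a valid pair (gamma' = M theta).
    """
    h = beta / steps
    return h * sum(
        relative_entropy(theta_fn((k + 0.5) * h), gamma_fn((k + 0.5) * h))
        for k in range(steps)
    )

# Illustrative path with I = 1: every ball is thrown into an empty urn,
# so theta = (1, 0, 0) and gamma(x) = (1 - x, x, 0) on [0, 1/2].
cost = path_cost(lambda x: [1.0, 0.0, 0.0],
                 lambda x: [1.0 - x, x, 0.0],
                 beta=0.5)
```

For this path the integrand is $-\log(1 - x)$, so the exact cost is $\tfrac12 + \tfrac12 \log\tfrac12 \approx 0.1534$, which the midpoint rule reproduces closely.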


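Theorem 2.2. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha \in S_I$ as $n \to \infty$. Then the sequence of processes $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies the LDP with rate function $J$. In other words, for any measurable set $A$ of trajectories, with interior $A^\circ$ and closure $\bar A$, we have the large deviation lower bound

\[
\liminf_{n\to\infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \ge -\inf\{J(\gamma) : \gamma \in A^\circ,\ \gamma(0) = \alpha\}
\]

and the large deviation upper bound

\[
\limsup_{n\to\infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \le -\inf\{J(\gamma) : \gamma \in \bar A,\ \gamma(0) = \alpha\},
\]

and, moreover, for any compact set of initial conditions $K$ and $C < \infty$, the set

\[
\{\gamma : J(\gamma) \le C,\ \gamma(0) \in K\}
\]

is compact.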

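The proof of this result is provided in Section 3. However, a formal justification for the form of the rate function is as follows. Let $\delta > 0$ be a small time increment. Owing to the fact that $\Gamma^n$ varies slowly when compared to $b_i(\gamma)$, one expects

\[
P\left( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx \eta \,\Big|\, \Gamma^n(0) = \gamma \right)
\approx P\left( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} b_j(\gamma) \approx \eta \right),
\]

where the symbol "$\approx$" inside the probability means that the indicated quantities are within a small fixed constant $\varepsilon > 0$ of each other. Suppose that we interpret $\gamma$ as a probability measure on $\{e_0, \ldots, e_{I+1}\}$, and let $\{Y_j\}$ be independent and identically distributed (i.i.d.) with distribution $\gamma$. Then the sequence of i.i.d. random vectors $\{b_j(\gamma)\}$ can be realized by setting

\[
b_j(\gamma) =
\begin{cases}
e_{i+1} - e_i\\
0
\end{cases}
\iff
Y_j =
\begin{cases}
e_i, & 0 \le i \le I,\\
e_{I+1},
\end{cases}
\]

that is,

\[
b_j(\gamma) = M Y_j.
\]

By Sanov's theorem, for any probability vector $\theta \in S_I$,

\[
P\left( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} Y_j \approx \theta \right) \approx \exp\{-n\delta D(\theta\|\gamma)\}.
\]

Thus

\[
P\left( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx M\theta \,\Big|\, \Gamma^n(0) = \gamma \right) \approx \exp\{-n\delta D(\theta\|\gamma)\}.
\]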


Approximating an arbitrary trajectory by a piecewise linear trajectory with nearly equal cost and using the Markov property, one expects

\[
P(\Gamma^n \approx \gamma \mid \Gamma^n(0) = \alpha) \approx \exp\left\{ -n \int_0^\beta D(\theta(t)\|\gamma(t))\,dt \right\},
\]

where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus the rate function on path space should be $J(\gamma)$.

The zero cost trajectories are, of course, the law of large numbers limits and can easily be computed. For example, if $\alpha = (1, 0, \ldots)$ (so all cells are initially empty) and $i \le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t} t^i / i!$. In other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass corresponding to $i > I$ is collected together into the state $I + 1$. Throughout this paper we will denote the Poisson distribution with mean $t$ by $\mathcal{P}(t)$, where $\mathcal{P}_i(t) = e^{-t} t^i / i!$.
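The claim that the truncated Poisson law is a zero cost trajectory amounts to saying that it solves $\dot\gamma = M\gamma$, that is, $\theta = \gamma$ and hence $D(\theta\|\gamma) = 0$. The sketch below checks this by central finite differences; the helper names `truncated_poisson` and `apply_M` and the step size are illustrative choices, not taken from the paper.

```python
import math

def truncated_poisson(t, I):
    """(P_0(t), ..., P_I(t), 1 - sum): Poisson(t) with all mass above I lumped into state I+1."""
    p = [math.exp(-t) * t**i / math.factorial(i) for i in range(I + 1)]
    return p + [1.0 - sum(p)]

def apply_M(theta):
    """Compute M theta for the matrix M of (2.1), for a vector theta of length I + 2."""
    n = len(theta)
    out = [-theta[0]]                                        # row 0: -theta_0
    out += [theta[j - 1] - theta[j] for j in range(1, n - 1)]  # rows 1..I
    out.append(theta[n - 2])                                 # last row: theta_I
    return out

def max_ode_residual(t, I, h=1e-5):
    """Largest component of |gamma'(t) - M gamma(t)|, with gamma' from central differences."""
    plus, minus = truncated_poisson(t + h, I), truncated_poisson(t - h, I)
    deriv = [(a - b) / (2.0 * h) for a, b in zip(plus, minus)]
    return max(abs(d - m) for d, m in zip(deriv, apply_M(truncated_poisson(t, I))))

residual = max_ode_residual(t=1.3, I=3)
```

The residual is at the level of the finite-difference error, consistent with $\theta(t) = \gamma(t)$ along the law of large numbers path.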

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction principle (e.g., [11], Theorem 1.3.2) implies the following variational representation for the rate function for $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$. Let $\mathcal{A}(\alpha,\omega,\beta)$ denote the set of valid occupancy paths $\gamma$ on $[0,\beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha$ as $n \to \infty$. Then the sequence of random vectors $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$ satisfies the LDP with the convex rate function

\[
\mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma \in \mathcal{A}(\alpha,\omega,\beta)\}.
\]

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7 and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is in fact strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation lower bound holds for sets with no interior relative to the ambient space. Such bounds can often be proved for processes such as ours that take values in a discrete lattice. An example would be a set $A \subset \{\gamma \in S_I : \gamma_{I+} = 0\}$, for which the minimizing trajectories must be polynomial extremals (defined after Theorem 2.6). Although we do not need such results in the present paper, it is worth observing that lower bounds of this kind can be proved.

2.2. Characterization of the terminal rate function and minimizing paths. The results of this section were obtained using calculus of variations techniques, the details of which are given in the Appendix. The presentation begins with the case in which all urns are initially empty, because it appears in many applications and because it is a building block for the general case. In each case, we first give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional forms contain parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha,\omega,\beta)$, where $\alpha, \omega \in S_I$ are the initial and terminal occupancies and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I + 1$ balls, and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I + 1$.

Definition 2.1. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if the corresponding set of valid occupancy paths $\mathcal{A}(\alpha,\omega,\beta)$ is nonempty.

Lemma 2.4. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)}
\tag{2.3}
\]

and

\[
\sum_{i=0}^{I} i\,\omega_i + (I+1)\,\omega_{I+} \le \sum_{k=0}^{I+1} k\,\alpha_k + \beta \quad \text{(conservation)}.
\tag{2.4}
\]

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
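The two feasibility conditions of Lemma 2.4 translate directly into a short check. The following sketch assumes that $\alpha$ and $\omega$ are passed as lists of length $I + 2$ whose last entries are $\alpha_{I+1}$ and $\omega_{I+}$; the function name `is_feasible` and the tolerance parameter are illustrative and not part of the paper.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (2.3) and conservation (2.4) for an endpoint constraint.

    alpha = (alpha_0, ..., alpha_{I+1}) and omega = (omega_0, ..., omega_I, omega_{I+}).
    """
    I = len(alpha) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:          # (2.3) violated at index i
            return False
    # Lower bound on balls per urn at time beta; the last term is (I+1)*omega_{I+}.
    balls_end = sum(i * omega[i] for i in range(I + 2))
    # Balls per urn present initially.
    balls_start = sum(k * alpha[k] for k in range(I + 2))
    return balls_end <= balls_start + beta + tol  # (2.4)
```

For example, with empty initial conditions and $I = 1$, the terminal occupancy $\omega = (0.2, 0.3, 0.5)$ needs at least $0.3 + 2 \cdot 0.5 = 1.3$ balls per urn, so it is feasible for $\beta = 1.5$ but not for $\beta = 1$.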

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha,\omega,\beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i < I$. Otherwise, the constraints are termed reducible.


For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i} \alpha_k = \sum_{k=0}^{i} \omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i + 1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha,\omega,\beta)$ may be divided into a finite number of isolated irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $F(1,\omega,\beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$ and the constraint

\[
\sum_{i=0}^{\infty} i\,\pi_i = \beta.
\]

In the empty case, the conditions for feasibility of $(1,\omega,\beta)$ reduce to $\sum_{i=0}^{I} i\,\omega_i + (I+1)\,\omega_{I+} \le \beta$, from which it follows that the set $F$ is nonempty if and only if $(1,\omega,\beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths in $\mathcal{A}(1,\omega,\beta)$ can be represented as an infimum of a relative entropy over distributions in $F(1,\omega,\beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}^+$ defined in Corollary 2.3 may be determined as

\[
\mathcal{J}(\omega) = \min_{\pi \in F(1,\omega,\beta)} D(\pi\|\mathcal{P}(\beta))
\]

if $(1,\omega,\beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^* \in F(1,\omega,\beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show in fact that the solution takes the form $\pi^*_i = C\,\mathcal{P}_i(\rho\beta)$ for $i > I$, for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty} i\,\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty} \pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the equality $\sum_{i=0}^{I} i\,\omega_i + (I+1)\,\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I + 1$ if necessary to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I} \omega_i = 0$. In this case, it turns out that $F(1,\omega,\beta)$ has only one element; we then have $C = 0$, and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[
\frac{\rho\beta - \sum_{i=0}^{I} i\,\mathcal{P}_i(\rho\beta)}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)}
= \frac{\beta - \sum_{i=0}^{I} i\,\omega_i}{1 - \sum_{i=0}^{I} \omega_i}.
\tag{2.5}
\]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I + 1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0,\infty)$ to $(I+1,\infty)$, and hence the equation has exactly one positive root. The constant $C$ is then given by

\[
C = \frac{1 - \sum_{i=0}^{I} \omega_i}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)}
= \frac{\beta - \sum_{i=0}^{I} i\,\omega_i}{\rho\beta - \sum_{i=0}^{I} i\,\mathcal{P}_i(\rho\beta)}.
\tag{2.6}
\]

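Evaluating the relative entropy of $\pi^*$ and $\mathcal{P}(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

\[
\mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)}
+ \left( 1 - \sum_{i=0}^{I} \omega_i \right) (\log C + (1 - \rho)\beta)
+ \left( \beta - \sum_{i=0}^{I} i\,\omega_i \right) \log\rho.
\tag{2.7}
\]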
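Equations (2.5) through (2.7) suggest a simple numerical recipe for the empty case with strict inequality in (2.4): find $\rho$ by a one-dimensional root search, read off $C$ from (2.6), and evaluate (2.7). The sketch below is one possible implementation, not the authors' code. It uses bisection, which relies on the monotonicity of the conditional mean stated in the text, and it computes the Poisson tail sums by direct summation to avoid cancellation; the function names and the truncation at 400 terms are assumptions of this sketch. Here `omega` holds only $\omega_0, \ldots, \omega_I$.

```python
import math

def poisson_pmf(i, mean):
    return math.exp(-mean) * mean**i / math.factorial(i)

def tail_moments(mean, I, terms=400):
    """Return (sum_{i>I} P_i(mean), sum_{i>I} i*P_i(mean)) by direct summation."""
    i = I + 1
    term = math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
    mass = first = 0.0
    for _ in range(terms):
        mass += term
        first += i * term
        i += 1
        term *= mean / i          # P_i = P_{i-1} * mean / i
    return mass, first

def twist_parameters(omega, beta):
    """Solve (2.5) for rho by bisection, then obtain C from (2.6)."""
    I = len(omega) - 1
    tail_mass = 1.0 - sum(omega)                                   # omega_{I+}
    target = (beta - sum(i * w for i, w in enumerate(omega))) / tail_mass

    def cond_mean(rho):           # E[Y | Y > I], Y ~ Poisson(rho * beta)
        mass, first = tail_moments(rho * beta, I)
        return first / mass

    lo, hi = 1e-9, 1.0
    while cond_mean(hi) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = tail_mass / tail_moments(rho * beta, I)[0]
    return rho, C

def terminal_rate(omega, beta):
    """Evaluate (2.7) for empty initial conditions."""
    rho, C = twist_parameters(omega, beta)
    head = sum(w * math.log(w / poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0.0)
    tail_mass = 1.0 - sum(omega)
    excess = beta - sum(i * w for i, w in enumerate(omega))
    return head + tail_mass * (math.log(C) + (1.0 - rho) * beta) + excess * math.log(rho)
```

As a sanity check, taking $\omega_i = \mathcal{P}_i(\beta)$ (the law of large numbers limit) should return $\rho = 1$, $C = 1$ and a rate of zero, up to rounding.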

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments, conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1,\omega,\beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $\mathcal{A}(1,\omega,\beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(1,\omega,\beta)$ defined by

\[
\gamma_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C\,\mathcal{P}_k(\rho\beta)) \left( 1 - \frac{x}{\beta} \right)^k,
\tag{2.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!} (-1)^i \gamma_0^{(i)}(x), \qquad 1 \le i \le I,
\tag{2.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x),
\]

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$, $\gamma_0(x + y) = \gamma_0(x) + y\gamma_0^{(1)}(x) + \cdots$, evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$, there is no exponential term and we say that $\gamma$ is a polynomial extremal. Otherwise, when $C > 0$, we have an exponential extremal.
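The path of Theorem 2.6 can be evaluated in closed form. Differentiating (2.8) term by term and substituting into (2.9) gives $\gamma_i(x) = C\,\mathcal{P}_i(\rho x) + \sum_{k=i}^{I} a_k \binom{k}{i} (x/\beta)^i (1 - x/\beta)^{k-i}$ with $a_k = \omega_k - C\,\mathcal{P}_k(\rho\beta)$; this rearrangement is a derivation made for the sketch below rather than a formula quoted from the paper. The function name `minimizing_path` is illustrative, and the twist parameters $\rho$ and $C$ are taken as given inputs.

```python
import math

def poisson_pmf(i, mean):
    return math.exp(-mean) * mean**i / math.factorial(i)

def minimizing_path(x, omega, beta, rho, C):
    """Evaluate (gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)) from (2.8)-(2.9).

    omega = (omega_0, ..., omega_I); rho and C are assumed to satisfy (2.5)-(2.6),
    or rho = 1, C = 0 in the polynomial case.
    """
    I = len(omega) - 1
    a = [omega[k] - C * poisson_pmf(k, rho * beta) for k in range(I + 1)]
    s = x / beta
    gamma = []
    for i in range(I + 1):
        # Polynomial part: i-th Taylor term of sum_k a_k (1 - x/beta)^k.
        poly = sum(a[k] * math.comb(k, i) * s**i * (1.0 - s)**(k - i)
                   for k in range(i, I + 1))
        # Exponential part: i-th Taylor term of C exp(-rho x).
        gamma.append(C * poisson_pmf(i, rho * x) + poly)
    gamma.append(1.0 - sum(gamma))
    return gamma

# Polynomial extremal with I = 1: exactly enough balls (beta = 1/2) to leave
# half of the urns with one ball, so C = 0 and gamma_1(x) = x.
midway = minimizing_path(0.25, [0.5, 0.5], beta=0.5, rho=1.0, C=0.0)
```

With $\rho = 1$, $C = 1$ and $\omega_i = \mathcal{P}_i(\beta)$, all coefficients $a_k$ vanish and the function returns the truncated Poisson law $\mathcal{P}(x)$, in agreement with the zero cost trajectory.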

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k + j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $S_\infty$ denote the set of distributions on the nonnegative integers, and let $S_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $S_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi(0), \ldots, \pi(k), \ldots, \pi(K)\}$, where for any $k \in \mathcal{K}$ a component distribution is denoted $\pi(k) = \{\pi_{k0}, \pi_{k1}, \ldots\}$, and it is understood that the corresponding $\pi(k)$ is omitted if $\alpha_k = 0$. Finally, let $F(\alpha,\beta,\omega)$ be the set of $\pi \in S_\infty^{|\mathcal{K}|}$ which satisfies the terminal constraints

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\,\pi_{k,i-k} \qquad \text{for all } 0 \le i \le I,
\tag{2.10}
\]

along with the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{kj} = \beta.
\tag{2.11}
\]

As in the empty case, it may be established that $F$ is nonempty if and only if $(\alpha,\omega,\beta)$ is feasible.

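Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}^+$ defined in Corollary 2.3 may be expressed as

\[
\mathcal{J}(\omega) = \min_{\pi \in F(\alpha,\omega,\beta)} \sum_{k \in \mathcal{K}} \alpha_k D(\pi(k)\|\mathcal{P}(\beta))
\]

whenever $(\alpha,\beta,\omega)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in F(\alpha,\omega,\beta)$ is unique.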

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\\
C_k\,\mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I.
\end{cases}
\]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[
\pi^*_{kj} =
\begin{cases}
D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k \in \mathcal{K},\ j + k \le I,\\
0, & k + j > I.
\end{cases}
\]

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can all be computed numerically, using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in \mathcal{K}$, denote the mean of $\pi^*(k)$ by $\beta_k = \sum_{j=0}^{\infty} j\,\pi^*_{kj}$, and define the terminal condition $\omega(k) \in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0, \ldots, I - k$. Then the constraints $(1, \omega(k), \beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma(k) = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha,\omega,\beta)$, let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0,\beta_k] \to [0,1]$ be the minimizing paths corresponding to the subproblems $(1, \omega(k), \beta_k)$. The infimum of $J(\gamma)$ over $\mathcal{A}(\alpha,\omega,\beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(\alpha,\omega,\beta)$ defined by

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).
\]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define

\[
H(\gamma,\zeta) = \log(E[\exp(\zeta \cdot b_i(\gamma))]),
\]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma,\zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma,\zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies a large deviation upper bound with a rate function $\bar J$, which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma,\zeta)$ in $\zeta$:

\[
L(\gamma,\eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma,\zeta)].
\]

If $\{\gamma(x),\ 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then

\[
\bar J(\gamma) = \int_0^\beta L(\gamma(x), \dot\gamma(x))\,dx.
\]

If $\gamma$ is not absolutely continuous, then $\bar J(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma,\zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma,\zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar J = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma,\eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[
L(\gamma, M\theta) = D(\theta\|\gamma).
\]

It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma,\eta) < \infty$, then $\eta$ can be written as a convex combination of the form

\[
\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I),
\]

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \ldots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

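\[
\begin{aligned}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot M\theta - \log(E[\exp(\zeta \cdot b_i(\gamma))])]\\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ M^T\zeta \cdot \theta - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right]\\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right].
\end{aligned}
\]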

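Given any values $\mu_0, \ldots, \mu_{I+1}$, we can define $\zeta_0, \ldots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions, it is apparent that the last display is equal to

\[
\begin{aligned}
&\sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1}) \right] + \gamma_{I+1} \right) \right]\\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right]\\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right].
\end{aligned}
\]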

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta\|\gamma)$, thereby completing the proof of the upper bound.
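The variational formula used in the last step can be illustrated numerically for strictly positive vectors: the objective $\sum_j \mu_j\theta_j - \log\sum_j \gamma_j e^{\mu_j}$ attains the value $D(\theta\|\gamma)$ at $\mu_j = \log(\theta_j/\gamma_j)$ and does not exceed it elsewhere. The sketch below checks this on hypothetical vectors chosen only for illustration; the function names are not from the paper.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) for strictly positive gamma (zero theta entries are skipped)."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0.0)

def dv_objective(mu, theta, gamma):
    """The functional mu . theta - log(sum_j gamma_j exp(mu_j)) from the last display."""
    linear = sum(m * t for m, t in zip(mu, theta))
    log_mgf = math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))
    return linear - log_mgf

# Hypothetical probability vectors, used only to illustrate the identity.
theta = [0.2, 0.5, 0.3]
gamma = [0.6, 0.3, 0.1]

mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]  # maximizer of the objective
value_at_optimum = dv_objective(mu_star, theta, gamma)
entropy = relative_entropy(theta, gamma)
```

Here `value_at_optimum` and `entropy` agree up to rounding, and any other choice of `mu` gives a value of `dv_objective` no larger than `entropy`.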

We turn now to the proof of the lower bound. In this proof, we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

\[
\liminf_{n\to\infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -J(\gamma) - \varepsilon.
\tag{3.1}
\]

Of course, this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > b x^K$ for all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$, and such that

\[
J(y) \le J(\gamma).
\]

Consider the zero cost trajectory defined by

\[
\dot z(x) = M z(x), \qquad z(0) = \gamma(0).
\]

We have the following expression for $z_j$ when $j \le I$:

\[
z_j(x) = \left[ \sum_{k=0}^{j} \gamma_k(0)\,\frac{x^{j-k}}{(j-k)!} \right] e^{-x}.
\tag{3.2}
\]

It is easy to check from this explicit formula that $z_j(x) > b x^K$ for some $b > 0$, $K = I$, and all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0,1)$, let $y^\rho = \rho z + (1 - \rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1 - \rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z\|z) = 0$, we have

\[
\begin{aligned}
J(y^\rho) &= \int_0^\beta D(\rho z(x) + (1 - \rho)\theta(x) \,\|\, \rho z(x) + (1 - \rho)\gamma(x))\,dx\\
&\le \rho \int_0^\beta D(z(x)\|z(x))\,dx + (1 - \rho) \int_0^\beta D(\theta(x)\|\gamma(x))\,dx\\
&= (1 - \rho) J(\gamma).
\end{aligned}
\]

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0,1)$.

It follows that in proving the lower bound, we can assume without loss of generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge b x^K$ for all $x \in [0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0,\beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

\[
h(\gamma) =
\begin{cases}
0, & |\gamma - \gamma(\tau)| < \sigma/2,\\
2\varepsilon, & \text{else}.
\end{cases}
\]

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor/n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0,\beta]$), we have the inequality

\[
\begin{aligned}
&P_{\Gamma^n(0)}(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma) + e^{-2n\varepsilon}\\
&\qquad \ge P_{\Gamma^n(0)}(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\tau)| < \sigma/2) + e^{-2n\varepsilon}\\
&\qquad \ge E_{\Gamma^n(0)}(\exp\{-n h(\Gamma^n(\lfloor n\tau \rfloor/n))\}).
\end{aligned}
\]

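We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b^n_i$:

\[
\bar\Gamma^n\left( \frac{i+1}{n} \right) = \bar\Gamma^n\left( \frac{i}{n} \right) + \frac{1}{n}\,\bar b^n_i, \qquad \bar\Gamma^n(0) = \Gamma^n(0).
\]

Furthermore, the distribution of $\bar b^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n),\ 0 \le j \le i\}$. Let $\mu^\gamma$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar\mu^n_i$ denote the (random) distribution of $\bar b^n_i$, given $\{\bar\Gamma^n(j/n),\ 0 \le j \le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\left( \exp\left\{ -n h\left( \Gamma^n\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) \right\} \right)
= \inf \bar E_{\Gamma^n(0)}\left[ h\left( \bar\Gamma^n\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D(\bar\mu^n_i \,\|\, \mu^{\bar\Gamma^n(i/n)}) \right],
\]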

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b^n_i$. We can write $\gamma(\tau) - \gamma(0) = M v \tau$ for some probability vector $v$. Define a process $\bar b^n_i$ as follows:

\[
\bar b^n_i =
\begin{cases}
e_j - e_{j-1}, & \text{if } \left\lfloor \sum_{k=0}^{j-1} v_k \tau n \right\rfloor \le i \le \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor - 1 \text{ for } 0 \le j \le I,\\[4pt]
0, & \text{if } \left\lfloor \sum_{k=0}^{I} v_k \tau n \right\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}
\]

In other words, $\bar b^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[
D(\bar\mu^n_i \,\|\, \mu^{\bar\Gamma^n(i/n)}) = \log\left( \frac{1}{\bar\Gamma^n_{j-1}(i/n)} \right) = -\log\left( \bar\Gamma^n_{j-1}\left( \frac{i}{n} \right) \right).
\tag{3.3}
\]

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k \tau n \rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor - 1$,

\[
\bar\Gamma^n_{j-1}\left( \frac{i}{n} \right) \downarrow \bar\Gamma^n_{j-1}\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor \right)
\]

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$, it follows that

\[
\bar\Gamma^n_{j-1}\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor \right) \to \gamma_{j-1}(\tau)
\]

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus at any given time step $i$, we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$, there are $C_1, C_2 < \infty$ (and independent of $\tau$) such that whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

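\[
D(\bar\mu^n_i \,\|\, \mu^{\bar\Gamma^n(i/n)}) \le C_1[-\log \tau^K] \le -C_2 \log\tau.
\]

In addition, as $n \to \infty$ and $\eta \to 0$,

\[
\bar\Gamma^n\left( \frac{1}{n} \lfloor \tau n \rfloor \right) \to \gamma(\tau).
\]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\left( \exp\left\{ -n h\left( \Gamma^n\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) \right\} \right) \le -C_2\,\tau \log\tau.
\]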

We now choose $\tau > 0$ so that $-C_2\,\tau \log\tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0,\tau]$, w.p.1, if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
\liminf_{n\to\infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \left| \Gamma^n\left( \frac{\lfloor n\tau \rfloor}{n} \right) - \gamma\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right| < \sigma,\ \sup_{x \in [0,\tau]} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau,\beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and, moreover, that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma$,

\[
\liminf_{n\to\infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau \rfloor/n),\,\lfloor n\tau \rfloor/n}\left( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \right) \ge -J(\gamma) - \frac{\varepsilon}{2},
\]

where $P_{\Gamma^n(\lfloor n\tau \rfloor/n),\,\lfloor n\tau \rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau \rfloor/n)$ at time $\lfloor n\tau \rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10], and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.


In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[
\mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega).
\]

Using Theorem 2.7, we can write

\[
\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega \in \Omega}\ \inf_{\pi \in F(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi(k)\|\mathcal{P}(\beta))\\
&= \inf_{\pi \in F(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi(k)\|\mathcal{P}(\beta)),
\end{aligned}
\tag{4.1}
\]

where we have abused notation to define $F(\alpha,\Omega,\beta) = \bigcup_{\omega \in \Omega} F(\alpha,\omega,\beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega \in \Omega^\circ} \mathcal{J}(\omega) = \inf_{\omega \in \bar\Omega} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n} \log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.


4.1. The classical occupancy problem. In the classical occupancy problem the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[ \gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}. \]

Using (2.5), we find that $\rho$ is determined by the unique nonnegative solution to

\[ \rho(1-\omega_0) = 1 - e^{-\beta\rho}. \]

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
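As a numerical illustration of this example, the following minimal sketch solves the fixed point equation for $\rho$ by bisection and evaluates the exponent in two ways. It assumes that the explicit cost formula for exponential extremals stated in the Appendix specializes to $I = 0$ with $C = 1/\rho$; the function names `solve_rho`, `cost_closed_form` and `cost_relative_entropy` are hypothetical and introduced only for illustration.

```python
import math

def solve_rho(omega0, beta, lo=1e-9, hi=50.0, iters=200):
    """Bisection for the positive root of rho*(1 - omega0) = 1 - exp(-beta*rho)."""
    def f(rho):
        # expm1 keeps precision for small arguments
        return rho * (1.0 - omega0) + math.expm1(-beta * rho)
    # rho = 0 is always a trivial root; a positive root needs 1 - omega0 < beta.
    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("no sign change: need 1 - beta < omega0 < 1")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cost_closed_form(omega0, beta, rho):
    """Assumed I = 0 specialization of the exponential-extremal cost, with C = 1/rho."""
    C = 1.0 / rho
    return (omega0 * (math.log(omega0) + beta)            # omega0 * log(omega0 / P_0(beta))
            + (1.0 - omega0) * (math.log(C) + (1.0 - rho) * beta)
            + beta * math.log(rho))

def cost_relative_entropy(omega0, beta, rho, nmax=200):
    """Direct evaluation of D(pi* || P(beta)) with pi*_0 = omega0, pi*_i = C P_i(rho*beta)."""
    C = 1.0 / rho
    total = omega0 * (math.log(omega0) + beta)
    for i in range(1, nmax):
        log_twisted = -rho * beta + i * math.log(rho * beta) - math.lgamma(i + 1)
        log_ref = -beta + i * math.log(beta) - math.lgamma(i + 1)
        pi_i = C * math.exp(log_twisted)
        if pi_i > 0.0:
            total += pi_i * (math.log(C) + log_twisted - log_ref)
    return total

if __name__ == "__main__":
    omega0, beta = 0.15, 3.0          # values used for Figure 1
    rho = solve_rho(omega0, beta)
    print(rho, cost_closed_form(omega0, beta, rho), cost_relative_entropy(omega0, beta, rho))
```

The two cost evaluations should agree up to truncation of the Poisson tail, which gives a quick consistency check on the computed $\rho$.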

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[ w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n), \]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I}\Gamma_i$,

\[ w(r) = r - nI + n\sum_{i=0}^{I}(I-i)\Gamma_i(r/n). \]

In order to compute

\[ J_O(\eta,\beta) = -\lim_{n}\frac{1}{n}\log P\left(\frac{w(\beta n)}{n} > \eta\right), \]

we therefore consider sample paths which satisfy the end constraint

\[ \sum_{i=0}^{I}(I-i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2} \]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I-\beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

\[ \sum_{i=0}^{\infty}\pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I}(I-i)\pi_i = \zeta. \tag{4.3} \]


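We introduce the Lagrangian

\[ L(\pi,y,z,\lambda) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left(1-\sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta-\sum_{i=0}^{\infty} i\pi_i\right) + \lambda\left(\zeta-\sum_{i=0}^{I}(I-i)\pi_i\right) \]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[ \log\pi^*_i = \log P_i(\beta) - 1 + y + iz + (I-i)^+\lambda. \]

In terms of the variables $\rho = e^{z}$, $\nu = e^{\lambda}$ and $C = e^{y-1-(1-\rho)\beta}$, we may write

\[ \pi^*_i = C P_i(\rho\beta) \quad\text{for } i \ge I \]

and

\[ \pi^*_i = C P_i(\rho\beta)\,\nu^{I-i} \quad\text{for } i \le I. \]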

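The distribution is conditionally Poisson with parameter $\rho\beta/\nu$ for $i \le I$ and conditionally Poisson with parameter $\rho\beta$ for $i \ge I$. For convenience we introduce the notation

\[ Q_I(\rho) = \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta}\sum_{i=I+1}^{\infty} iP_i(\rho\beta), \]

\[ R_I(\rho,\nu) = \nu^{I} e^{-\rho\beta(1-1/\nu)}\sum_{i=0}^{I-1} P_i\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta}\, e^{-\rho\beta(1-1/\nu)}\sum_{i=1}^{I} iP_i\left(\frac{\rho\beta}{\nu}\right). \]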

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[ C\bigl(R_I(\rho,\nu) + Q_I(\rho)\bigr) = 1, \]

\[ C\left(\frac{\rho\beta}{\nu} R_I(\rho,\nu) + \rho\beta\, Q_I(\rho)\right) = \beta, \]

\[ C\left(I P_I(\rho\beta) + \left(I - \frac{\rho\beta}{\nu}\right) R_I(\rho,\nu)\right) = \zeta. \]

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\|P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[ (\rho/\nu - 1) R_I(\rho,\nu) + (\rho - 1) Q_I(\rho) = 0, \]

\[ (I - \rho\beta/\nu - \zeta) R_I(\rho,\nu) - \zeta Q_I(\rho) + I P_I(\rho\beta) = 0. \]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

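The large deviations exponent for the overflow problem may then be expressed

\[
\begin{aligned}
J_O(\eta,\beta) &= D(\pi^*\|P(\beta)) \\
&= \sum_{i=0}^{\infty}\pi^*_i \log\bigl(C e^{\beta-\rho\beta}\rho^{i}\nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1-\rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}
\]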
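The reduced equations above can be explored numerically. The sketch below is one possible approach, not the authors' procedure: it fixes $\nu$, solves the first reduced equation for $\rho$ by bisection (the root is bracketed between $1$ and $\nu$), and then reads off $C$, $\zeta$ and the exponent. Sweeping $\nu$ traces out the exponent as a function of $\zeta$. The helper names (`rho_given_nu`, `overflow_point`) are hypothetical, and $R_I$ is evaluated through the algebraically equivalent sum $\sum_{i<I} P_i(\rho\beta)\nu^{I-i}$.

```python
import math

def log_pois(i, lam):
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def pois(i, lam):
    return math.exp(log_pois(i, lam))

def Q(I, rho, beta):
    # Q_I(rho) = sum_{i >= I} P_i(rho*beta)
    return 1.0 - sum(pois(i, rho * beta) for i in range(I))

def R(I, rho, nu, beta):
    # R_I(rho, nu), written as sum_{i < I} P_i(rho*beta) * nu**(I - i)
    return sum(pois(i, rho * beta) * nu ** (I - i) for i in range(I))

def rho_given_nu(I, beta, nu, iters=200):
    """Solve (rho/nu - 1) R_I + (rho - 1) Q_I = 0 for rho between 1 and nu."""
    if nu == 1.0:
        return 1.0
    def g(rho):
        return (rho / nu - 1.0) * R(I, rho, nu, beta) + (rho - 1.0) * Q(I, rho, beta)
    lo, hi = min(1.0, nu), max(1.0, nu)   # g(lo) < 0 < g(hi) in both cases
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def overflow_point(I, beta, nu, nmax=150):
    """Return (rho, C, zeta, J, pi) for a given multiplier nu; pi is truncated at nmax terms."""
    rho = rho_given_nu(I, beta, nu)
    r, q = R(I, rho, nu, beta), Q(I, rho, beta)
    C = 1.0 / (r + q)
    zeta = C * (I * pois(I, rho * beta) + (I - rho * beta / nu) * r)
    J = math.log(C) + beta * (1.0 - rho) + beta * math.log(rho) + zeta * math.log(nu)
    pi = [C * pois(i, rho * beta) * nu ** max(I - i, 0) for i in range(nmax)]
    return rho, C, zeta, J, pi

if __name__ == "__main__":
    I, beta = 3, 2.0                      # illustrative values, not taken from the paper
    for nu in (0.8, 1.0, 1.5, 2.0):
        rho, C, zeta, J, _ = overflow_point(I, beta, nu)
        eta = zeta - I + beta             # average overflow per urn, from (4.2)
        print(nu, rho, zeta, eta, J)
```

To hit a prescribed $\zeta$ one could wrap `overflow_point` in an outer one-dimensional search over $\nu$.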

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I+1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[ J_C(\alpha,\beta,\xi) = -\lim_{n}\frac{1}{n}\log P\left(\sum_{i=0}^{I}\Gamma^n_i(\beta) < \xi\right), \]

where

\[ \xi < \sum_{k=0}^{K}\alpha_k\sum_{i=0}^{I-k} P_i(\beta) \]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[ \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{I-k}\pi_{kj} = \xi, \]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[ \omega_{kj} = \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W, & j + k \le I,\\ C_k P_j(\rho\beta), & j + k > I. \end{cases} \]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

\[ J_C(\alpha,\beta,\xi) = \beta(1-\rho+\log\rho) + \xi\log W + \sum_{k=0}^{K}\alpha_k\log C_k. \]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5\ \ 0.3\ \ 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
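The numbers in this example can be checked with a short computation. The sketch below is an illustrative solver, not the authors' code: it assumes the twisted form of $\pi^*_{kj}$ given above, normalizes each class to obtain $C_k$, and uses nested bisection (inner loop on $\rho$ for the conservation constraint, outer loop on $W$ for the constraint on $\xi$). The function names and the bracketing intervals are assumptions chosen for this particular example.

```python
import math

def pois(j, lam):
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def twisted_classes(alpha, I, beta, rho, W, nmax=80):
    """Per-class distributions pi[k][j] proportional to P_j(rho*beta) * W^{1[j+k<=I]}, and log C_k."""
    p = [pois(j, rho * beta) for j in range(nmax)]
    pis, log_c = [], []
    for k in range(len(alpha)):
        w = [p[j] * (W if j + k <= I else 1.0) for j in range(nmax)]
        norm = sum(w)                       # 1 / C_k, up to a negligible truncated tail
        pis.append([x / norm for x in w])
        log_c.append(-math.log(norm))
    return pis, log_c

def mean_and_xi(alpha, I, beta, rho, W):
    """Balls per urn added, and fraction of urns with total occupancy at most I."""
    pis, _ = twisted_classes(alpha, I, beta, rho, W)
    mean = sum(a * sum(j * x for j, x in enumerate(pk)) for a, pk in zip(alpha, pis))
    xi = sum(a * sum(pk[:max(I - k + 1, 0)]) for k, (a, pk) in enumerate(zip(alpha, pis)))
    return mean, xi

def bisect(f, lo, hi, iters=60):
    # assumes f(lo) < 0 < f(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rho_for(alpha, I, beta, W):
    return bisect(lambda r: mean_and_xi(alpha, I, beta, r, W)[0] - beta, 1e-4, 10.0)

def solve_coupon(alpha, I, beta, xi):
    """Return (rho, W, J_C) for a target xi below its zero-cost value."""
    W = bisect(lambda w: mean_and_xi(alpha, I, beta, rho_for(alpha, I, beta, w), w)[1] - xi,
               1e-3, 1.0)
    rho = rho_for(alpha, I, beta, W)
    _, log_c = twisted_classes(alpha, I, beta, rho, W)
    J = (beta * (1.0 - rho + math.log(rho)) + xi * math.log(W)
         + sum(a * c for a, c in zip(alpha, log_c)))
    return rho, W, J

if __name__ == "__main__":
    alpha, I, beta = [0.5, 0.3, 0.2], 3, 2.0
    print("zero-cost psi_3(beta):", mean_and_xi(alpha, I, beta, 1.0, 1.0)[1])
    rho, W, J = solve_coupon(alpha, I, beta, 0.55)
    print("rho, W, J_C:", rho, W, J, "probability for n = 100:", math.exp(-100 * J))
```

Setting $\rho = W = 1$ recovers the zero-cost value of $\psi_3(\beta)$, and the solved exponent can be compared with the value quoted in the text.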

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

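A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

\[ \sum_{j=0}^{i}\alpha_j \ge \sum_{j=0}^{i}\omega_j, \qquad i = 0,\ldots,I \quad\text{(monotonicity)} \tag{A.4} \]

and

\[ \sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad\text{(conservation)}. \tag{A.5} \]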

Proof of Lemma 2.4. If a valid occupancy curve $\gamma\colon [0,\beta]\to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4) and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I}\left(\sum_{j=0}^{i}\alpha_j - \sum_{j=0}^{i}\omega_j\right) &= \sum_{j=0}^{I}(I+1-j)(\alpha_j-\omega_j) \\
&= (I+1)(\omega_{I+}-\alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j-\alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[ \gamma_i(x) = \alpha_i + (\omega_i-\alpha_i)\frac{x}{\beta}, \qquad i = 0,\ldots,I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) = \alpha_{I+1} + (\omega_{I+}-\alpha_{I+1})\frac{x}{\beta} \]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i \le 1$ follows from (A.5).
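The feasibility conditions of Lemma 2.4 are easy to test mechanically. The following minimal sketch checks (A.4) and (A.5) for given endpoint constraints and evaluates the linear path used in the proof; the function names `is_feasible` and `linear_path`, and the convention that `alpha` lists $\alpha_0,\ldots,\alpha_{I+1}$ while `omega` lists $\omega_0,\ldots,\omega_I$, are assumptions made for illustration.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (A.4) and conservation (A.5).

    alpha: [alpha_0, ..., alpha_{I+1}], omega: [omega_0, ..., omega_I];
    omega_{I+} is taken to be 1 - sum(omega).
    """
    I = len(omega) - 1
    if len(alpha) != I + 2:
        raise ValueError("alpha must have one more entry than omega")
    omega_plus = 1.0 - sum(omega)
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:      # (A.4) violated at level i
            return False
    lhs = sum(i * w for i, w in enumerate(omega)) + (I + 1) * omega_plus
    rhs = sum(i * a for i, a in enumerate(alpha)) + beta
    return lhs <= rhs + tol                  # (A.5)

def linear_path(alpha, omega, beta, x):
    """Occupancies gamma_0(x), ..., gamma_I(x) of the linear path used in the proof."""
    return [a + (w - a) * x / beta for a, w in zip(alpha, omega)]

if __name__ == "__main__":
    print(is_feasible([1.0, 0.0], [0.15], 3.0))   # classical example with I = 0
    print(is_feasible([1.0, 0.0], [0.15], 0.5))   # too few balls to occupy 85% of the urns
    print(linear_path([1.0, 0.0], [0.15], 3.0, 1.5))
```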

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0,\ldots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[ J(\psi) = \int_0^{\beta} L(\psi(x),\theta(x))\,dx. \]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi,\theta) &= D(\theta\|\gamma) \\
&= \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}} + \left(1-\sum_{i=0}^{I}\theta_i\right)\log\frac{1-\sum_{i=0}^{I}\theta_i}{1-\psi_I}.
\end{aligned}
\]

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
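To make the integral representation concrete, the sketch below evaluates $J(\psi) = \int_0^\beta L\,dx$ numerically for the classical problem ($I = 0$) along the curve $\gamma_0(x) = \rho^{-1}e^{-\rho x} + 1 - \rho^{-1}$ of Section 4.1, and compares it with a closed-form cost. It is a sanity check under stated assumptions, not part of the proof: the closed form is the $I = 0$ specialization of the exponential-extremal cost with $C = 1/\rho$, Simpson's rule is our choice of quadrature, and all function names are hypothetical.

```python
import math

def solve_rho(omega0, beta, lo=1e-9, hi=50.0, iters=200):
    """Positive root of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection."""
    f = lambda rho: rho * (1.0 - omega0) + math.expm1(-beta * rho)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def local_rate(x, rho):
    """L(psi, theta) for I = 0 along psi_0(x) = (exp(-rho x) + rho - 1) / rho."""
    theta0 = math.exp(-rho * x)                      # theta_0 = -d psi_0 / dx
    psi0 = (math.exp(-rho * x) + rho - 1.0) / rho    # fraction of empty urns
    # Here theta_{0+} = 1 - theta0 and 1 - psi0 = (1 - theta0) / rho, so the
    # ratio inside the second logarithm is the constant rho (avoids 0/0 at x = 0).
    return theta0 * math.log(theta0 / psi0) + (1.0 - theta0) * math.log(rho)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return s * h / 3.0

def cost_closed_form(omega0, beta, rho):
    return (omega0 * (math.log(omega0) + beta)
            + (1.0 - omega0) * (-math.log(rho) + (1.0 - rho) * beta)
            + beta * math.log(rho))

if __name__ == "__main__":
    omega0, beta = 0.15, 3.0
    rho = solve_rho(omega0, beta)
    print(simpson(lambda x: local_rate(x, rho), 0.0, beta), cost_closed_form(omega0, beta, rho))
```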

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[ \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x)) \]

for all $i \in \{0,\ldots,I\}$, $x \in (0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

\[ -\frac{\theta_i}{\psi_i-\psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1}-\psi_i} = \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i-\psi_{i-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\right] \tag{A.7} \]

for $i = 0,\ldots,I-1$, and by

\[ -\frac{\theta_I}{\psi_I-\psi_{I-1}} + \frac{\theta_{I+}}{1-\psi_I} = \frac{d}{dx}\left[-\log\frac{\theta_I}{\psi_I-\psi_{I-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\right]. \tag{A.8} \]

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

\[ L(\psi,\theta) = \sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}. \tag{A.9} \]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[ -\frac{\theta_i}{\psi_i-\psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1}-\psi_i} = \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i-\psi_{i-1}}\right] \tag{A.10} \]

for $i = 0,\ldots,I-1$.


A.4. Characterization of the extremals under empty initial conditions.

In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[ \sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1, \]

\[ \sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} iC P_i(\rho\beta) = \beta. \]
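In the exponential case these two equations can be reduced to a single scalar equation: eliminating $C$ shows that the mean of a Poisson($\rho\beta$) variable conditioned to exceed $I$ must equal $(\beta - \sum_i i\omega_i)/(1 - \sum_i\omega_i)$, and this conditional mean is increasing in $\rho$. The sketch below solves for $(C,\rho)$ by bisection on that basis. It is an illustrative implementation under these assumptions; the name `twist_parameters` and the truncation level `nmax` are not from the paper.

```python
import math

def pois(i, lam):
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def twist_parameters(omega, beta, nmax=200, iters=200):
    """Solve the two twist equations for (C, rho), given omega = [omega_0, ..., omega_I].

    Assumes the exponential case: positive tail mass and strict conservation.
    """
    I = len(omega) - 1
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    if tail_mass <= 0.0 or tail_mean <= (I + 1) * tail_mass:
        raise ValueError("polynomial or infeasible case: no exponential twist")
    target = tail_mean / tail_mass

    def tail_sums(rho):
        lam = rho * beta
        T = sum(pois(i, lam) for i in range(I + 1, nmax))
        M = sum(i * pois(i, lam) for i in range(I + 1, nmax))
        return T, M

    def cond_mean(rho):
        T, M = tail_sums(rho)
        return M / T

    lo, hi = 1e-6, 1.0
    while cond_mean(hi) < target:       # expand the bracket if needed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    T, _ = tail_sums(rho)
    return tail_mass / T, rho

if __name__ == "__main__":
    print(twist_parameters([0.15], 3.0))        # I = 0: expect C * rho = 1
    print(twist_parameters([0.15, 0.2], 3.0))   # I = 1, illustrative values
```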

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[ \psi_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I}\bigl(\omega_k - CP_k(\rho\beta)\bigr)\left(1-\frac{x}{\beta}\right)^{k}, \tag{A.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) \]

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[ \psi_0(x) = \sum_{k=0}^{I}\omega_k\left(1-\frac{x}{\beta}\right)^{k}. \tag{A.10} \]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[ (-1)^i\varphi^{(i)}(x) \ge 0 \quad\text{for all } x\in[a,b] \text{ and } i > 0. \tag{A.11} \]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x\in[0,\beta)$ and $i = 0,\ldots,I$. In the case $C > 0$, this can be strengthened to all $i = 0,1,\ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[ (-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} + \sum_{k=i}^{I}\bigl(\omega_k - CP_k(\rho\beta)\bigr)\beta^{-i}\frac{k!}{(k-i)!}\left(1-\frac{x}{\beta}\right)^{k-i} \]

for $i = 0,\ldots,I$, and

\[ (-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} \tag{A.12} \]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x\in[0,\infty)$, $i = 0,1,\ldots$. We deduce that $\varphi(x) = (-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\begin{aligned}
\varphi(\beta) &= (-1)^I\psi_0^{(I)}(\beta) \\
&= \rho^I Ce^{-\rho\beta} + \left(\omega_I - Ce^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\right)\beta^{-I}I! \\
&= \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I \ge 0,
\end{aligned}
\]

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[ (-1)^I\psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I > 0 \quad\text{for all } x\in[0,\beta]. \]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[ \psi_0(0) = C\left(1-\sum_{i=0}^{I}P_i(\rho\beta)\right) + \sum_{i=0}^{I}\omega_i = 1. \]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[ \gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad i = 0,1,\ldots,I,I+1,\ldots. \tag{A.13} \]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0)-\psi_0(x) = \sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[ \sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1. \]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\}\in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x) \\
&= \sum_{k=1}^{i}-\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x\in[0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[ \sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty}-\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1. \]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[ \sum_{i=0}^{I}\theta_i(x) \le 1 \]

over an arbitrary subinterval of $[0,\beta]$. Thus, the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[ \theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty}-\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty}P_i(\rho x), \]

\[ 1-\psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty}P_i(\rho x), \]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1-\psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x\in(0,\beta),\ i \ge 0. \tag{A.15} \]

Then

\[ \frac{d}{dx}\left[-\log\frac{\theta_i}{\gamma_i}\right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x\in(0,\beta), \tag{A.16} \]

verifying (A.7). Likewise, to verify (A.8) we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\ldots,I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1-\psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1-\psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

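Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[ (-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad\text{for } i > I, \]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further, suppose that $\gamma_0,\gamma_1,\ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[ \sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0, \]

\[ \sum_{i=0}^{\infty}-\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \]

for all $x\in[0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0,\ldots,\gamma_I,\gamma_{I+}\}$, where $\gamma_{I+} = 1-\sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{\infty}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.17} \]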

Proof. For all $i > I$ and $x\in[0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[ \theta_{I+} = 1 - \sum_{i=0}^{I}(-\dot\psi_i) = \sum_{i=I+1}^{\infty}-\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1-\psi_I). \]

Then for each $x\in[0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1+\sum_{i=0}^{I}\dot\psi_i)/(1-\psi_I) = \rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i-\psi_{i-1}} + (1+\dot\psi_0+\cdots+\dot\psi_I)\log\frac{1+\dot\psi_0+\cdots+\dot\psi_I}{1-\psi_I} \\
&= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i-\psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]

Note that, since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)| = 1$, $x\in[0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x\in[\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[ J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx. \]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x)-\dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[ -\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x], \]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[ \sum_{i=0}^{\infty}\bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i\,dx. \]

The lemma follows on taking limits as $\varepsilon\downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\ldots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[ \sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1}-\dot\psi_i(x) = 1, \]

\[ -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\ldots,I-1, \]

for all $x\in[0,\beta]$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{I}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19} \]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\|P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[ J(\gamma) = \min_{\pi\in F(1,\omega,\beta)} D(\pi\|P(\beta)). \]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty}\in\mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\ldots,I$, and define the Lagrangian $L$ on this set to be

\[ L(\pi,y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left(1-\sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta-\sum_{i=0}^{\infty} i\pi_i\right), \]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x\mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i\in[0,\infty)$,

\[ \pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i\log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i. \]

Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi\in F(1,\omega,\beta)} D(\pi\|P(\beta)) &= \inf_{\pi\in F(1,\omega,\beta)} L(\pi,y,z) \\
&\ge \inf_{\pi\in\mathbb{R}_+^{\infty}} L(\pi,y,z) \\
&= D(\pi^*\|P(\beta)).
\end{aligned}
\]

Since $\pi^*\in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

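Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[ J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\|P(\beta)), \]

where

\[ \gamma_i(\beta) = \begin{cases} \omega_i, & 0 \le i \le I,\\ CP_i(\rho\beta), & i > I. \end{cases} \]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[ J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)} + \left(1-\sum_{i=0}^{I}\omega_i\right)\bigl(\log C + (1-\rho)\beta\bigr) + \left(\beta-\sum_{i=0}^{I} i\omega_i\right)\log\rho, \]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.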

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0\in\mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma^k_i\}_i$, where the function $\gamma^k_i\colon[0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[ \gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\gamma^k_{i-k}(x\beta_k/\beta). \]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[ \min_{\pi\in S_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|P(\beta)) \tag{A.20} \]

subject to the conservation constraint

\[ \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21} \]

and the terminal conditions

\[ \omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} \quad\text{for all } 0 \le i \le I. \tag{A.22} \]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[ \pi_{kj} = 0, \qquad k + j > I, \tag{A.23} \]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S^{|\mathcal{K}|}_\infty$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S^{|\mathcal{K}|}_\infty$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch of the argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for the constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i=0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i=1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
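The ordered filling construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ordered_filling`, the list representation of $\alpha$ and $\omega$, and the choice to draw mass from the lowest available class first are all hypothetical, and the monotonicity condition is taken in the form $\sum_{j\le i}\omega_j \le \sum_{j\le i}\alpha_j$.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy "ordered filling" of a feasible point for polynomial constraints.

    alpha[k] is the initial fraction of urns holding k balls and omega[i] the
    required terminal fraction holding i balls. Both are assumed to sum to 1,
    to satisfy sum(omega[:i+1]) <= sum(alpha[:i+1]) for every i, and
    len(alpha) <= len(omega). Returns {k: [pi_{k,0}, ..., pi_{k,I-k}]} for
    the classes with alpha[k] > 0.
    """
    top = len(omega) - 1                      # the index I
    remaining = list(alpha)                   # unassigned urn mass per class
    pi = {k: [0.0] * (top - k + 1) for k, a in enumerate(alpha) if a > 0}
    for i, need in enumerate(omega):          # fill terminal levels in order
        for k in sorted(pi):                  # lowest class first (an arbitrary choice)
            if k > i or need <= tol:
                break
            take = min(remaining[k], need)    # urn mass moved from class k to level i
            pi[k][i - k] += take / alpha[k]   # convert to a probability within class k
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"monotonicity condition fails at level {i}")
    return pi


# Illustrative data (not from the paper): two nonempty classes, I = 2.
alpha = [0.5, 0.5, 0.0]
omega = [0.2, 0.3, 0.5]
pi = ordered_filling(alpha, omega)
# Average number of balls added per urn implied by the constructed point.
mean_added = sum(alpha[k] * j * p for k, row in pi.items() for j, p in enumerate(row))
```

For this illustrative data the conservation condition with equality gives $\beta = \sum_i i\,\omega_i - \sum_k k\,\alpha_k = 0.8$, and the constructed point reproduces that value because it uses up all of the mass in each class.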

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$ this distance is simply

\[ \sum_{i:\,P_i>Q_i} (P_i - Q_i) = \sum_{i:\,Q_i>P_i} (Q_i - P_i), \]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A<\infty$ define the set

\[ H_A = \Big\{ \pi \in S^{|\mathcal{K}|}_\infty : \sum_k \alpha_k\,D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le A \Big\}. \]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[ Q_A(\alpha,\omega,\beta) \doteq \mathcal{F}(\alpha,\omega,\beta)\cap H_A \]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n),(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n),(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon>0$ there is $m<\infty$ such that

\[ \sum_{j\ge m} j\,\pi^{(n)}_{k,j} \le \varepsilon \tag{A.24} \]

for all $n$ and $k$.

Since $e^{y}-1 = \sup_{x>0}\,[xy - x\log x + x - 1]$, we have the inequality

\[ xy \le (x\log x - x + 1) + (e^{y}-1), \]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x\ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta>0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta}\,\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\,\frac{e^{-\beta}\beta^{j}}{j!} = e^{\beta e^{1/\delta}-\beta} < \infty. \]
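Both facts used here are easy to check numerically. The following Python sketch is illustrative only: the helper names `log_poisson`, `young_gap` and `exp_moment`, the grid of test points, and the values of $\beta$ and $\delta$ are arbitrary choices, not taken from the paper. It evaluates the gap in the inequality $xy \le (x\log x - x + 1) + (e^{y}-1)$ on a grid and compares a truncated version of the exponential moment sum with the closed form $e^{\beta e^{1/\delta}-\beta}$.

```python
import math


def log_poisson(j, beta):
    """Logarithm of the Poisson(beta) probability of the value j."""
    return -beta + j * math.log(beta) - math.lgamma(j + 1)


def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with the convention 0 log 0 = 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y


def exp_moment(beta, delta, terms=400):
    """Truncated sum of e^{j/delta} P_j(beta), accumulated in the log domain."""
    return sum(math.exp(j / delta + log_poisson(j, beta)) for j in range(terms))


beta, delta = 1.3, 0.5                       # arbitrary illustrative values
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
truncated = exp_moment(beta, delta)

# Smallest gap over a grid with x in [0, 10] and y in [-5, 5]; it should not be negative.
worst_gap = min(young_gap(0.25 * a, 0.25 * b)
                for a in range(0, 41) for b in range(-20, 21))
```

The gap vanishes exactly on the curve $x = e^{y}$, which is where the supremum defining $e^{y}-1$ is attained.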

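Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/\mathcal{P}_j(\beta)$, we have the estimate

\begin{align*}
\sum_{j\ge m} j\,\pi^{(n)}_{k,j} &= \sum_{j\ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
&\le \sum_{j\ge 0} \delta\Big[\pi^{(n)}_{k,j}\log\Big(\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\Big) - \pi^{(n)}_{k,j} + \mathcal{P}_j(\beta)\Big] + \delta\sum_{j\ge m}(e^{j/\delta}-1)\,\mathcal{P}_j(\beta) \\
&= \delta\,D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta)) + \delta\sum_{j\ge m}(e^{j/\delta}-1)\,\mathcal{P}_j(\beta) \\
&\le \delta A + \delta\sum_{j\ge m}(e^{j/\delta}-1)\,\mathcal{P}_j(\beta),
\end{align*}

where the second equality uses the fact that both $\pi^{(n),(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta>0$ sufficiently small and then $m<\infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)}\to\beta\in(0,\infty)$.]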

We now apply the uniform integrability to analyze the limits of the means. Writing $\beta^{(n)}_k$ and $\beta_k$ for the means of $\pi^{(n),(k)}$ and $\pi^{(k)}$, it follows from [5], Theorem 5.4, that

\[ \beta^{(n)}_k \to \beta_k. \]

Finally,

\[ \sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta, \]

so that $\pi\in\mathcal{F}(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[ G = \inf_{\pi\in Q_A}\ \sum_{k\in\mathcal{K}} \alpha_k\,D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)}\in Q_A$ such that $\sum_{k=0}^{K}\alpha_k\,D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^*\in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[ \sum_{k\in\mathcal{K}} \alpha_k\,D(\pi^{*(k)}\,\|\,\mathcal{P}(\beta)) \le \liminf_n \sum_{k\in\mathcal{K}} \alpha_k\,D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta)) = G, \]

where in fact we have equality, by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[ (\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta) \]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0<\beta<\infty$. Then

\[ \liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta). \]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[ \mathcal{J}^{(n)} \le G + 1/n < A, \]

and for each problem in this subsequence we define $\pi^{(n)}\in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)}\to\pi\in H_A$. As in the proof of Lemma A.6, if $\beta<\infty$, then

\[ \beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \ \to\ \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{k,j}, \qquad k = 0,1,\ldots,K. \]

If $\alpha^{(n)}_k\to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,\mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.
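The infimum quoted above is attained by the Poisson distribution with mean $\lambda$, and this can be checked numerically. The sketch below is illustrative only: the helper names, the truncation at 120 terms, and the values of $\lambda$, $\beta$ and $c$ are assumptions made for the example. It compares the closed form with the relative entropy of a truncated Poisson($\lambda$) distribution and of a point mass with the same mean, and then shows how $\alpha$ times the infimum grows when $\alpha\to 0$ while the product of $\alpha$ and the class mean stays fixed at $c$.

```python
import math


def log_poisson(j, beta):
    """Logarithm of the Poisson(beta) probability of the value j."""
    return -beta + j * math.log(beta) - math.lgamma(j + 1)


def entropy_vs_poisson(pi, beta):
    """D(pi || Poisson(beta)) for a finitely supported pi given as a list."""
    return sum(p * (math.log(p) - log_poisson(j, beta))
               for j, p in enumerate(pi) if p > 0)


def min_entropy_given_mean(lam, beta):
    """Closed form beta - lam + lam * log(lam / beta) quoted in the text."""
    return beta - lam + lam * math.log(lam / beta)


def scaled_cost(a, c, beta):
    """a times the infimum when the class mean is c / a."""
    return a * min_entropy_given_mean(c / a, beta)


lam, beta = 2.0, 0.7                                   # arbitrary illustrative values
poisson_lam = [math.exp(log_poisson(j, lam)) for j in range(120)]
point_mass = [0.0, 0.0, 1.0]                           # mean 2, but not Poisson

d_poisson = entropy_vs_poisson(poisson_lam, beta)      # should match the closed form
d_point = entropy_vs_poisson(point_mass, beta)         # should be strictly larger
bound = min_entropy_given_mean(lam, beta)

# With c = 0.1 held fixed, the scaled cost grows as the class weight shrinks.
costs = [scaled_cost(a, 0.1, beta) for a in (1e-2, 1e-4, 1e-6)]
```

The growth of `costs` is logarithmic in $1/\alpha$, which is enough to contradict a bounded cost along the sequence.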

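Thus

\[ \sum_{k\in\mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta, \]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\begin{align*}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta^{(n)})) &\ge \liminf_n \sum_{k:\,\alpha_k>0} \alpha^{(n)}_k D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k>0} \alpha_k\,D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{align*}

as desired.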

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$ it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i\in\mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i>0$, or if $i>I$ and $\omega_{I+}>0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{k,j}$ with $k+j\in\mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^*\in S^{|\mathcal{K}|}_\infty$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j}>0$ for all $k+j\in\mathcal{I}(\omega)$, and $\pi^*_{k,j}=0$ if $k+j\notin\mathcal{I}(\omega)$.

Proof. For $k+j\notin\mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

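Let $\pi\in S^{|\mathcal{K}|}_\infty$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n}=0$ for some particular $(m,n)$ with $m+n\in\mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S^{|\mathcal{K}|}_\infty$, subject to the restrictions that $\hat\pi_{m,n}>0$, that $\hat\pi_{k,j}=0$ for all $k+j\notin\mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

\[ \hat\omega_i \doteq \sum_{k=0}^{i} \alpha_k\,\hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\hat\pi_{k,j}, \]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha,\hat\omega,\hat\beta)$. Next, for any $\eta>0$ we may define

\[ \tilde\beta \doteq (\beta - \eta\hat\beta)/(1-\eta), \qquad \tilde\omega \doteq (\omega - \eta\hat\omega)/(1-\eta). \]

By construction, the constraints $(\alpha,\tilde\omega,\tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\hat\omega,\hat\beta)$, and hence for $(\alpha,\tilde\omega,\tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha,\tilde\omega,\tilde\beta)$, and finally form $\bar\pi = \eta\hat\pi + (1-\eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$ and, in addition, $\bar\pi_{m,n}>0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\begin{align*}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Big(\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)} + 1\Big)(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m\bar\pi_{m,n}\log\frac{\pi^{\varepsilon}_{m,n}}{\mathcal{P}_n(\beta)} + \sum_{k\in\mathcal{K}} \alpha_k \sum_{j:\,(k,j)\ne(m,n)} \bar\pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{j} \pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}.
\end{align*}

In the limit as $\varepsilon\to 0$, the third term in the last display tends to

\[ -\sum_k \alpha_k\,D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le 0. \]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{k,j}\log\mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta)<f(0)$ for some sufficiently small $\delta>0$. We have shown that for any $m+n\in\mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n}=0$. As $(m,n)$ is arbitrary within $m+n\in\mathcal{I}(\omega)$, it follows that $\pi^*_{m,n}>0$ whenever $m+n\in\mathcal{I}(\omega)$.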

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[ \pi^*_{k,j} = \begin{cases} D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega), \\ 0, & \text{otherwise}. \end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.3.3). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{j:\,k+j\in\mathcal{I}} \pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)} + \sum_{k\in\mathcal{K}} z_k\alpha_k\Big(1 - \sum_{l:\,k+l\in\mathcal{I}} \pi_{k,l}\Big) + \sum_{i\in\mathcal{I}} w_i\Big(\omega_i - \sum_{k=0}^{i} \alpha_k\,\pi_{k,i-k}\Big) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[ \pi^*_{k,j} = \mathcal{P}_j(\beta)\,e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k-1}$ and $W_i = e^{w_i}$.
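The product form $D_k\,\mathcal{P}_j(\beta)\,W_{k+j}$ can be observed numerically on a small polynomial problem. The sketch below does not follow the Lagrangian argument of the proof. Instead it uses iterative proportional fitting, a standard alternating-rescaling scheme that is swapped in here purely for illustration, on the masses $\mu_{k,j} = \alpha_k\pi_{k,j}$, starting from the reference $\alpha_k\mathcal{P}_j(\beta)$. The function name `ipf_minimizer`, the number of sweeps, and the data are assumptions. The example requires every $\omega_i$ to be positive, and relies on the conservation constraint being implied by the other constraints in the polynomial case.

```python
import math


def poisson_pmf(j, beta):
    """Poisson(beta) probability of the value j."""
    return math.exp(-beta + j * math.log(beta) - math.lgamma(j + 1))


def ipf_minimizer(alpha, omega, beta, sweeps=2000):
    """Alternately rescale rows (classes k) and anti-diagonals (levels k + j).

    mu[k][j] = alpha_k * pi_{k,j} starts at alpha_k * P_j(beta); each rescaling
    multiplies by a factor depending only on k or only on k + j, so every
    iterate keeps the form D_k * P_j(beta) * W_{k+j}.
    """
    top = len(omega) - 1                                   # the index I
    classes = [k for k, a in enumerate(alpha) if a > 0]
    mu = {k: [alpha[k] * poisson_pmf(j, beta) for j in range(top - k + 1)]
          for k in classes}
    for _ in range(sweeps):
        for k in classes:                                  # each pi^(k) sums to one
            total = sum(mu[k])
            mu[k] = [m * alpha[k] / total for m in mu[k]]
        for i in range(top + 1):                           # terminal conditions (A.22)
            members = [k for k in classes if k <= i]
            total = sum(mu[k][i - k] for k in members)
            for k in members:
                mu[k][i - k] *= omega[i] / total
    return {k: [m / alpha[k] for m in mu[k]] for k in classes}


# Illustrative polynomial problem (not from the paper): I = 2, classes {0, 1}.
alpha = [0.5, 0.5]
omega = [0.2, 0.4, 0.4]
beta = 0.7              # equals sum_i i*omega_i - sum_k k*alpha_k, i.e. equality in conservation
pi = ipf_minimizer(alpha, omega, beta)

# Ratios r_{k,j} = pi_{k,j} / P_j(beta) should factor as D_k * W_{k+j}.
r = {k: [p / poisson_pmf(j, beta) for j, p in enumerate(row)] for k, row in pi.items()}
mean_added = sum(alpha[k] * j * p for k, row in pi.items() for j, p in enumerate(row))
```

One consequence of the product form is the cross-ratio identity $r_{0,1}\,r_{1,1} = r_{0,2}\,r_{1,0}$, since both sides equal $D_0D_1W_1W_2$; this gives a quick check that does not require knowing the multipliers themselves.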

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[ \pi^*_{k,j} = \begin{cases} C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I}, \\ C_k\,\mathcal{P}_j(\rho\beta), & k\in\mathcal{K},\ k+j>I, \\ 0, & \text{otherwise}. \end{cases} \]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M>I$. For each such $M$, define

\[ \beta^{(M)} = \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\,\pi^*_{k,l}, \qquad \eta^{(M)}_k \doteq \sum_{l\le M} \pi^*_{k,l}, \]

and consider the problem of minimizing

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)} \]

subject to the constraints

\begin{align*}
\sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{k,l} &= \beta^{(M)}, \\
\sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{k,l} &= \eta^{(M)}_k, \qquad k\in\mathcal{K}, \\
\sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\,\pi_{k,i-k} &= \omega_i, \qquad i\in\mathcal{I}.
\end{align*}

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the proof of Theorem A.9, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.3.3. The Lagrangian for the $M$th problem is

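\begin{align*}
L^{(M)} &= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)} + y^{(M)}\Big(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{k,l}\Big) \\
&\quad + \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k\Big(\eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{k,l}\Big) + \sum_{i\in\mathcal{I}} w^{(M)}_i\Big(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\,\pi_{k,i-k}\Big).
\end{align*}

Since the derivative of $L^{(M)}$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

\[ \pi^*_{k,j} = \mathcal{P}_j(\beta)\,e^{\,j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}} \]

for all $j<M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i>I$. For $k+j>I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\,e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j>I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.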
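The substitution for $C_k$ rests on the elementary identity $\mathcal{P}_j(\beta)\,\rho^{j} = e^{(\rho-1)\beta}\,\mathcal{P}_j(\rho\beta)$, which turns the exponential tilt $e^{jy}$ of a Poisson($\beta$) weight into a Poisson($\rho\beta$) weight times a constant. A minimal numerical check is sketched below; the values of $\beta$ and $\rho$ and the helper name `poisson_pmf` are arbitrary choices for illustration.

```python
import math


def poisson_pmf(j, beta):
    """Poisson(beta) probability of the value j."""
    return math.exp(-beta + j * math.log(beta) - math.lgamma(j + 1))


beta, rho = 0.7, 1.8    # arbitrary illustrative values; rho plays the role of e^y

# Largest relative discrepancy between the two sides of the identity for j = 0, ..., 29.
max_rel_err = max(
    abs(poisson_pmf(j, beta) * rho ** j
        - math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta))
    / (poisson_pmf(j, beta) * rho ** j)
    for j in range(30)
)
```

With this identity, $\mathcal{P}_j(\beta)\,e^{jy - 1 + z_k + w_{k+j}}$ becomes $e^{(\rho-1)\beta - 1 + z_k}\,\mathcal{P}_j(\rho\beta)\,e^{w_{k+j}}$, which is the form stated in Theorem A.10.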

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\,\pi^*_{k,j}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{*(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} \doteq \pi^*_{k,j}$ and $\omega^{(k)} = (\omega_{k,0},\ldots,\omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i>I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define the rescaled curves $\tilde\gamma_{k,j}(x) \doteq \gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{k,j}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k,0}(0) = 1$ and $\gamma_{k,j}(0) = 0$ for $j>0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k,0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

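First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve after rescaling, $\tilde\psi_{k,0}(x) = \psi_{k,0}(x\beta_k/\beta)$, is

\begin{align*}
\tilde\psi_{k,0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \big(\omega_{k,i} - C_k\mathcal{P}_i(\rho_k\beta_k)\big)\Big(1 - \frac{x\beta_k/\beta}{\beta_k}\Big)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \big(\omega_{k,i} - C_k\mathcal{P}_i(\rho\beta)\big)\Big(1 - \frac{x}{\beta}\Big)^{i} \\
&= C_k\Big[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)\,\mathcal{P}_i(\rho\beta)\Big(1 - \frac{x}{\beta}\Big)^{i}\Big].
\end{align*}

Here we have used that $\omega_{k,i} = \pi^*_{k,i} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i\le I-k$. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{0,0}$ is

\begin{align*}
(-1)^{k}\psi^{(k)}_0(x) &= \alpha_0 C_0\Big[\rho^{k}e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)\,\mathcal{P}_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^{k}}\Big(1 - \frac{x}{\beta}\Big)^{i-k}\Big] \tag{A.25} \\
&= \alpha_0 C_0\rho^{k}\Big[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)\,\mathcal{P}_j(\rho\beta)\Big(1 - \frac{x}{\beta}\Big)^{j}\Big] \\
&= \frac{\alpha_0 C_0\rho^{k}}{C_k}\,\tilde\psi_{k,0}(x).
\end{align*}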

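Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, and writing $\tilde\psi_{k,0}(x) = \psi_{k,0}(x\beta_k/\beta)$ for the rescaled curve, we have

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi^{(i-k)}_{k,0}\Big(\frac{x\beta_k}{\beta}\Big) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\,\tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^{k}}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi^{(i)}_0(x).
\end{align*}

Similarly, we obtain

\begin{align*}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Big(\frac{x\beta_k}{\beta}\Big) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\,\tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^{k}}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi^{(i+1)}_0(x),
\end{align*}

so that the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}, \tag{A.26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i>I$, which establishes (as in the proof of Lemma A.3) that

\[ \frac{\theta_{I+}(x)}{1-\psi_I(x)} = \rho. \tag{A.27} \]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.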

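In the polynomial case, similar computations show that

\[ (-1)^{k}\psi^{(k)}_0(x) = \frac{\alpha_0 D_0}{D_k}\,\tilde\psi_{k,0}(x), \tag{A.28} \]

where $\tilde\psi_{k,0}(x) = \psi_{k,0}(x\beta_k/\beta)$, and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi^{(i+1)}_0(x) = \gamma_i(x)\,\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)} \tag{A.29} \]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).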

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i \]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0>0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\big(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x)\big)$, which is finite at both endpoints because $(-1)^{i}\psi^{(i)}_0(x)>0$ for $x\in[0,\beta)$, $i\le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

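Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence, writing $\tilde\gamma_{k,j}(x) = \gamma_{k,j}(x\beta_k/\beta)$ and $\tilde\psi_{k,0}(x) = \psi_{k,0}(x\beta_k/\beta)$ for the rescaled subproblem curves, the cost is

\begin{align*}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\Big[\sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(\beta)\log|\psi^{(i)}_0(\beta)|\Big] - \sum_{k=0}^{K} \alpha_k\log|\psi^{(k)}_0(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(\beta)\log\bigg|\frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0)\,e^{-\beta}}\bigg|.
\end{align*}

The fact that $\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi^{(k)}_0(x) = \psi^{(k)}_0(0)\times\tilde\psi_{k,0}(x)$, so that

\[ \frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0)\,e^{-\beta}} = \Big(\frac{\beta_k}{\beta}\Big)^{j}\psi^{(j)}_{k,0}\Big(\frac{x\beta_k}{\beta}\Big)e^{\beta} = \frac{\tilde\gamma_{k,j}(x)}{(-x)^{j}e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k\,D(\tilde\gamma_{k,\cdot}(\beta)\,\|\,\mathcal{P}(\beta)). \]

By construction, the endpoints $\tilde\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).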

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma\in\mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\,\|\,\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma,\bar\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1-\varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G\colon[0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[ G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x)\,\|\,\gamma^{\varepsilon}(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\bar\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma)\le J(\bar\gamma). \]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining

\[ \eta(x) \doteq \bar\psi(x) - \psi(x), \]

we may write

\[ G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x,\varepsilon)\,dx. \tag{A.30} \]

We can assume without loss that $G[1] = J(\bar\gamma)<\infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of $0$. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I>0$, and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k,0}(x)$ and $\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example, with $\tilde\gamma_{k,j}(x) = \gamma_{k,j}(x\beta_k/\beta)$ denoting the rescaled subproblem curves,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x) \ge \alpha_i\,\tilde\gamma_{i,0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1}>0$. The identity $\theta_{I+}(x) = \rho\,\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon}\ge(1-\varepsilon)\gamma$ and $\theta^{\varepsilon}\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0<1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

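The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^{\varepsilon}}(x)\,\eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^{\varepsilon}}(x)\,\dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\begin{align*}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\end{align*}

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon}\le 1$, $\theta^{\varepsilon}\le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \bigg|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\bigg| < B \qquad\text{for } \varepsilon\in[0,\varepsilon_0],\ \text{a.e. } x\in[0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.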

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx \]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[ \int_0^{x} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma}\,dx' = \int_0^{x} \Big(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\Big)\bigg|_{\gamma}\,dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\begin{align*}
G'_+[0] &= \int_0^{\beta} \Big\{\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\Big\}\,dx \\
&= \int_0^{\beta} \dot\eta(x)\cdot\Big(-\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\Big)\,dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\,(\eta_i(0) - \eta_i(\beta)) = 0.
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0>0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1-\lambda)\bar\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1\downarrow 0$ and $x^{(n)}_2\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x^{(n)}_1)$ and terminal point $\bar\gamma(x^{(n)}_2)$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx. \]

It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}\big(\bar\gamma(x^{(n)}_1),\ \bar\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1\big) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx \\
&= J(\bar\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram F Dai J G and Hasenbein J J (2001) Explicit solutions for variationalproblems in the quadrant Queueing Systems 37 259ndash289 MR1833666

56 P DUPUIS C NUZMAN AND P WHITING

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation, and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

$b > 0$ such that the probability that the process is at least distance $b$ from the boundary by time $b$ is bounded below by $\exp\{-na\}$. It turns out that these bounds can be easily established by exploiting an explicit representation for such probabilities that was proved in [11].

An outline of the paper is as follows. Section 2 states the main results of the paper. In the first part of Section 2, we construct the underlying stochastic process model and state the corresponding sample path level LDP, as well as the LDP for the terminal distribution. The proof of the sample path LDP is given in the following section, although in Section 2 an additional heuristic argument for the form of the local rate function, based on Sanov's theorem, is provided. In the second part of Section 2, the terminal distribution rate function is characterized and explicit expressions for the sample path minimizers are presented. These constitute more-or-less complete solutions to the corresponding calculus of variations problem. The detailed proofs of these latter results are deferred to the Appendix.

Section 3 gives a proof of the sample path LDP. The solutions to the variational problem are exemplified in Section 4, where we show how specific questions regarding the occupancy problem can be answered. In particular, we work out the asymptotics for a number of examples, including an overflow problem and a partial coupon collection problem. Generalizations of our results are also described.

2. Main results.

2.1. Statement of the LDP. In this section we formulate the problem of interest and state the LDP. The proof will be given in Section 3. As noted in the Introduction, our focus is the asymptotic behavior of the occupancy problem. If $n$ denotes the total number of cells, then to have a nontrivial limit, the number of balls placed in the cells should scale linearly with $n$. We will place $\lfloor \beta n \rfloor$ balls in the cells, where $\beta \in (0,\infty)$ is a fixed parameter and $\lfloor a \rfloor$ denotes the integer part of $a$.

As will be seen in the sequel, the large deviation asymptotics of the occupancy should be treated by first lifting the problem to the level of sample path large deviations, and then using the contraction principle to reduce to the original problem. To do this, we introduce a “time” variable $x$ that ranges from $0$ to $\beta$. At time $x$, one should imagine that $\lfloor nx \rfloor$ balls have been thrown. Thus the occupancy process will be piecewise constant over intervals of the form $[i/n, i/n + 1/n)$. With this scaling of time, large deviation asymptotics can be obtained if we scale space by a factor of $1/n$. Thus we define the random occupancy process $\Gamma^n(x) = (\Gamma^n_0(x), \ldots, \Gamma^n_I(x), \Gamma^n_{I+}(x))$ by letting $\Gamma^n_i(x)$, $i = 0, \ldots, I$, denote $1/n$ times the number of cells with exactly $i$ balls at time $x$, and letting $\Gamma^n_{I+}(x)$ be $1/n$ times the number of cells with more than $I$

balls at time $x$. Note that $\Gamma^n$ takes values in the set of probability vectors on $I+2$ points,
$$S_I = \Big\{\gamma \in \mathbb{R}^{I+2} : \gamma_j \ge 0,\ 0 \le j \le I+1, \text{ and } \sum_{j=0}^{I+1} \gamma_j = 1\Big\}.$$

The process $\{\Gamma^n(i/n),\ i = 0, 1, \ldots\}$ is obviously Markovian. It will be convenient to work with the following “dynamical system” representation:
$$\Gamma^n\Big(\frac{i+1}{n}\Big) = \Gamma^n\Big(\frac{i}{n}\Big) + \frac{1}{n}\, b_i\Big(\Gamma^n\Big(\frac{i}{n}\Big)\Big),$$
where the independent and identically distributed random vector fields $\{b_i(\cdot),\ i = 0, 1, \ldots\}$ have distributions
$$P\{b_i(\gamma) = v\} = \begin{cases} \gamma_j, & \text{if } v = e_{j+1} - e_j,\ 0 \le j \le I, \\ \gamma_{I+1}, & \text{if } v = 0. \end{cases}$$
Here $e_j$ represents the vector in $\mathbb{R}^{I+2}$ whose $j$th element is unity and for which all other elements are zero. The occupancy after $\lfloor \beta n \rfloor$ balls have been thrown can then be represented by $n\Gamma^n(\beta)$.
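As a rough illustration of the dynamical system representation above, the following Python sketch simulates one realization of the scaled occupancy vector after $\lfloor \beta n \rfloor$ balls. The function name `simulate_occupancy` and its arguments are hypothetical and are not part of the paper. The sketch tracks only the class counts, which is sufficient because the increment $e_{j+1} - e_j$ is chosen with probability $\gamma_j$.

```python
import random


def simulate_occupancy(n, beta, I, seed=0):
    """Throw floor(beta*n) balls into n cells and return Gamma^n(beta).

    The returned list has length I+2: entries 0..I are the fractions of
    cells with exactly i balls, and the last entry is the fraction of
    cells with more than I balls.
    """
    rng = random.Random(seed)
    # counts[j] = number of cells in class j; class I+1 means "more than I".
    counts = [n] + [0] * (I + 1)
    for _ in range(int(beta * n)):
        # A uniformly chosen cell lies in class j with probability counts[j]/n.
        u = rng.randrange(n)
        j = 0
        while u >= counts[j]:
            u -= counts[j]
            j += 1
        if j <= I:
            # Increment e_{j+1} - e_j: one cell moves from class j to class j+1.
            counts[j] -= 1
            counts[j + 1] += 1
        # If j == I+1 the increment is 0 and nothing changes.
    return [c / n for c in counts]


if __name__ == "__main__":
    print(simulate_occupancy(n=10000, beta=1.5, I=3))
```

For large $n$ the output should be close to the law of large numbers limit, and the large deviation results below quantify how unlikely it is to land elsewhere in $S_I$.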

As we have discussed, the large deviations behavior of $\Gamma^n(\beta)$ will be deduced from that of the process $\Gamma^n(\cdot)$. In order to state the large deviation asymptotics precisely, we should clarify the space in which $\Gamma^n(\cdot)$ takes values and the topology used on that space. As usual for processes with jumps of this sort, we use the Skorokhod space $D([0,\beta] : \mathbb{R}^{I+2})$, together with the Skorokhod topology ([5], Chapter 4). However, the large deviations properties of $\Gamma^n$ are the same as for the processes $\tilde\Gamma^n$, where $\tilde\Gamma^n$ is defined to be the piecewise linear process which agrees with $\Gamma^n$ at times of the form $i/n$. For readers unfamiliar with the Skorokhod space and associated topology, the identical large deviations results hold for $\tilde\Gamma^n$, save that the space of continuous functions and the sup norm topology are used instead.

To complete the statement of the LDP for $\{\Gamma^n\}$, we need some additional notation. A vector of (deterministic) occupancy rates $\theta(x) = (\theta_0(x), \ldots, \theta_I(x), \theta_{I+}(x))$ is a measurable mapping from $[0,\beta]$ to $S_I$. Intuitively, these rates represent the rate at which balls flow into urns of a given occupancy level. Associated with each such vector of rates is the corresponding deterministic occupancy function $\gamma(x) = (\gamma_0(x), \ldots, \gamma_I(x), \gamma_{I+}(x))$, which is defined by the initial condition $\gamma(0)$ and the differential equations $\dot\gamma_0(x) = -\theta_0(x)$, $\dot\gamma_j(x) = \theta_{j-1}(x) - \theta_j(x)$ for $j = 1, \ldots, I$, and $\dot\gamma_{I+}(x) = \theta_I(x)$. These equations reflect the idea that the fraction of urns containing $i$ balls increases as balls enter $(i-1)$-occupied urns, and decreases as balls enter $i$-occupied urns. Defining the matrix
$$M = \begin{pmatrix} -1 & 0 & 0 & \cdots & 0 \\ 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & -1 & 0 \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix}, \tag{2.1}$$

we can write $\dot\gamma(x) = M\theta(x)$ for all $x \in [0,\beta]$. Conversely, given a differentiable occupancy function $\gamma$, the corresponding rates are uniquely determined by $\dot\gamma(x) = M\theta(x)$ and the normalization $\sum_{i=0}^{I} \theta_i(x) + \theta_{I+}(x) = 1$. We will also be interested in the cumulative occupancy function $\psi$, with components $\psi_i(x) = \sum_{j=0}^{i} \gamma_j(x)$. Inspecting the cumulative sums of the rows of $M$ shows that $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$. As more balls are thrown, the fraction of urns containing $i$ or fewer balls can only decrease, and the rate of decrease is the rate at which balls enter $i$-occupied urns. We will say that $\gamma$ is a valid occupancy function if $\gamma$ is absolutely continuous, if $\gamma(x)$ is a probability vector for all $x \in [0,\beta]$, and if its associated $\theta(x)$ is a probability vector for almost all $x \in [0,\beta]$. Note that the functions $\psi$, $\gamma$ and $\theta$ are interchangeable, in the sense that each can be derived from any of the others [given $\gamma(0)$, in the case of $\theta$]. Thus we say that $\theta$ and $\psi$ are valid if the associated $\gamma$ is valid. The following lemma gives a direct characterization of validity in terms of $\psi$.

Lemma 2.1. A vector of $I+1$ continuous functions $\psi$, each of which maps $[0,\beta]$ to $[0,1]$, is a valid cumulative occupancy path if and only if

(a) $\psi_i(x) \ge \psi_{i-1}(x)$,
(b) $\psi_i(x) \ge \psi_i(y)$,
(c) $\sum_{k=0}^{I} (\psi_k(x) - \psi_k(y)) \le y - x$,

for each $0 \le i \le I$ and $0 \le x < y \le \beta$.

Sketch of the proof. In the forward direction, the first condition plus the bound $\psi_I(x) \le 1$ ensure that $\gamma(x)$ is a probability distribution. The second and third conditions together imply that $\psi$ is Lipschitz continuous with constant 1, and hence absolutely continuous. This implies the absolute continuity of $\gamma$. Since $\dot\psi = -\theta$, the third condition ensures that $\theta$ is almost always a probability distribution. The reverse direction proceeds similarly.

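Given $\gamma, \theta \in S_I$, let $D(\theta \| \gamma)$ denote relative entropy of $\theta$ with respect to $\gamma$. Thus
$$D(\theta \| \gamma) = \sum_{i=0}^{I+1} \theta_i \log(\theta_i / \gamma_i),$$
with the understanding that $\theta_i \log(\theta_i/\gamma_i) = 0$ whenever $\theta_i = 0$, and that $\theta_i \log(\theta_i/\gamma_i) = \infty$ if $\theta_i > 0$ and $\gamma_i = 0$. If $\gamma(x)$ is a valid occupancy function with corresponding occupancy rates $\theta(x)$, then we set
$$J(\gamma) = \int_0^{\beta} D(\theta(x) \| \gamma(x))\, dx. \tag{2.2}$$
In all other cases, set $J(\gamma) = \infty$.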
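The conventions in the definition of relative entropy are easy to get wrong in numerical work, so a minimal Python sketch may help. The names `relative_entropy` and `path_cost` are hypothetical, and the midpoint rule used for the integral in (2.2) is only one possible discretization, chosen here for brevity.

```python
import math


def relative_entropy(theta, gamma):
    """D(theta || gamma) with the conventions 0*log(0/g) = 0 and t*log(t/0) = inf."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue          # term contributes 0 by convention
        if g == 0.0:
            return math.inf   # positive rate into a class with zero mass
        total += t * math.log(t / g)
    return total


def path_cost(theta_of, gamma_of, beta, steps=1000):
    """Midpoint-rule approximation of J(gamma) = int_0^beta D(theta(x) || gamma(x)) dx.

    theta_of and gamma_of are callables returning the rate vector and the
    occupancy vector at a given time x (both assumed to be supplied by the caller).
    """
    h = beta / steps
    return h * sum(
        relative_entropy(theta_of((k + 0.5) * h), gamma_of((k + 0.5) * h))
        for k in range(steps)
    )
```

A path whose rates coincide with its occupancies, $\theta(x) = \gamma(x)$, has zero cost under this definition.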

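Theorem 2.2. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha \in S_I$ as $n \to \infty$. Then the sequence of processes $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies the LDP with rate function $J$. In other words, for any measurable set $A$ of trajectories, with interior $A^\circ$ and closure $\bar A$, we have the large deviation lower bound
$$\liminf_{n \to \infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \ge -\inf\{J(\gamma) : \gamma \in A^\circ,\ \gamma(0) = \alpha\}$$
and the large deviation upper bound
$$\limsup_{n \to \infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \le -\inf\{J(\gamma) : \gamma \in \bar A,\ \gamma(0) = \alpha\},$$
and moreover, for any compact set of initial conditions $K$ and $C < \infty$, the set
$$\{\gamma : J(\gamma) \le C,\ \gamma(0) \in K\}$$
is compact.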

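The proof of this result is provided in Section 3. However, a formal justification for the form of the rate function is as follows. Let $\delta > 0$ be a small time increment. Owing to the fact that $\Gamma^n$ varies slowly when compared to $b_i(\gamma)$, one expects
$$P\left( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx \eta \,\Big|\, \Gamma^n(0) = \gamma \right) \approx P\left( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} b_j(\gamma) \approx \eta \right),$$
where the symbol “$\approx$” inside the probability means that the indicated quantities are within a small fixed constant $\varepsilon > 0$ of each other. Suppose that we interpret $\gamma$ as a probability measure on $\{e_0, \ldots, e_{I+1}\}$, and let $Y_j$ be independent and identically distributed (i.i.d.) with distribution $\gamma$. Then the sequence of i.i.d. random vectors $b_j(\gamma)$ can be realized by setting
$$b_j(\gamma) = \begin{cases} e_{i+1} - e_i \\ 0 \end{cases} \iff Y_j = \begin{cases} e_i, & 0 \le i \le I, \\ e_{I+1}, \end{cases}$$
that is,
$$b_j(\gamma) = M Y_j.$$
By Sanov's theorem, for any probability vector $\theta \in S_I$,
$$P\left( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} Y_j \approx \theta \right) \approx \exp\{-n\delta D(\theta \| \gamma)\}.$$
Thus
$$P\left( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx M\theta \,\Big|\, \Gamma^n(0) = \gamma \right) \approx \exp\{-n\delta D(\theta \| \gamma)\}.$$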

Approximating an arbitrary trajectory by a piecewise linear trajectory with nearly equal cost, and using the Markov property, one expects
$$P(\Gamma^n \approx \gamma \mid \Gamma^n(0) = \alpha) \approx \exp\left\{ -n \int_0^{\beta} D(\theta(t) \| \gamma(t))\, dt \right\},$$
where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus the rate function on path space should be $J(\gamma)$.

The zero cost trajectories are, of course, the law of large numbers limits, and can easily be computed. For example, if $\alpha = (1, 0, \ldots)$ (so all cells are initially empty) and $i \le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t} t^i / i!$. In other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass corresponding to $i > I$ is collected together into the state $I+1$. Throughout this paper we will denote the Poisson distribution with mean $t$ by $\mathcal{P}(t)$, where $\mathcal{P}_i(t) = e^{-t} t^i / i!$.
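The Poisson form of the zero cost trajectory can be checked numerically. Zero cost forces $\theta = \gamma$, so the path solves $\dot\gamma = M\gamma$ with $\gamma(0) = (1, 0, \ldots)$. The sketch below integrates this linear system with a classical fourth-order Runge–Kutta step and compares the result with the truncated Poisson law. The function names and the choice of integrator are assumptions made for illustration only.

```python
import math


def drift(gamma):
    """Return M @ gamma for the matrix M in (2.1); gamma has length I+2."""
    I = len(gamma) - 2
    out = [-gamma[0]]
    out += [gamma[j - 1] - gamma[j] for j in range(1, I + 1)]
    out.append(gamma[I])  # class "more than I" only gains mass
    return out


def lln_limit(beta, I, steps=2000):
    """Integrate d(gamma)/dx = M gamma on [0, beta] from empty initial conditions."""
    gamma = [1.0] + [0.0] * (I + 1)
    h = beta / steps
    for _ in range(steps):
        k1 = drift(gamma)
        k2 = drift([g + 0.5 * h * k for g, k in zip(gamma, k1)])
        k3 = drift([g + 0.5 * h * k for g, k in zip(gamma, k2)])
        k4 = drift([g + h * k for g, k in zip(gamma, k3)])
        gamma = [
            g + h * (a + 2 * b + 2 * c + d) / 6
            for g, a, b, c, d in zip(gamma, k1, k2, k3, k4)
        ]
    return gamma


def truncated_poisson(t, I):
    """Poisson(t) masses for 0..I, with all remaining mass lumped into the last entry."""
    p = [math.exp(-t) * t**i / math.factorial(i) for i in range(I + 1)]
    return p + [1.0 - sum(p)]


if __name__ == "__main__":
    print(lln_limit(1.5, 3))
    print(truncated_poisson(1.5, 3))
```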

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction principle (e.g., [11], Theorem 1.3.2) implies the following variational representation for the rate function for $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$. Let $\mathcal{A}(\alpha, \omega, \beta)$ denote the set of valid occupancy paths $\gamma$ on $[0,\beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha$ as $n \to \infty$. Then the sequence of random vectors $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$ satisfies the LDP with the convex rate function
$$\mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma \in \mathcal{A}(\alpha, \omega, \beta)\}.$$

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7 and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is in fact strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation lower bound holds for sets with no interior relative to the ambient space. Such bounds can often be proved for processes such as ours that take values in a discrete lattice. An example would be a set $A \subset \{\gamma \in S_I : \gamma_{I+} = 0\}$, for which the minimizing trajectories must be polynomial extremals (defined after Theorem 2.6). Although we do not need such results in the present paper, it is worth observing that lower bounds of this kind can be proved.

2.2. Characterization of the terminal rate function and minimizing paths. The results of this section were obtained using calculus of variations techniques, the details of which are given in the Appendix. The presentation begins with the case in which all urns are initially empty, because it appears

in many applications and because it is a building block for the general case. In each case, we first give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional form contains parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha, \omega, \beta)$, where $\alpha, \omega \in S_I$ are the initial and terminal occupancies, and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I+1$ balls, and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I+1$.

Definition 2.1. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if the corresponding set of valid occupancy paths $\mathcal{A}(\alpha, \omega, \beta)$ is nonempty.

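Lemma 2.4. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if and only if
$$\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)}, \tag{2.3}$$
and
$$\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{k=0}^{I+1} k\alpha_k + \beta \quad \text{(conservation)}. \tag{2.4}$$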

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
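The two feasibility conditions of Lemma 2.4 translate directly into a short check. The following Python sketch is an illustration only. The function name `is_feasible`, the list layout (length $I+2$, last entry for the class $I+1$ or $I+$) and the floating-point tolerance `tol` are assumptions, not part of the paper.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check conditions (2.3) and (2.4) for an endpoint constraint.

    alpha and omega are lists of length I+2; the last entry of alpha is
    alpha_{I+1} and the last entry of omega is omega_{I+}.
    """
    I = len(alpha) - 2
    # Monotonicity (2.3): cumulative sums of alpha dominate those of omega.
    cum_alpha = 0.0
    cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False
    # Conservation (2.4): a lower bound on balls per urn at time beta must
    # not exceed the initial balls per urn plus the beta balls thrown.
    balls_needed = sum(i * omega[i] for i in range(I + 2))
    balls_available = sum(k * alpha[k] for k in range(I + 2)) + beta
    return balls_needed <= balls_available + tol


if __name__ == "__main__":
    print(is_feasible([1.0, 0.0, 0.0, 0.0], [0.5, 0.2, 0.1, 0.2], beta=1.5))
```

In the usage line, the terminal occupancy needs at least $0.2 + 0.2 + 0.6 = 1.0$ balls per urn, which does not exceed $\beta = 1.5$, so the constraint is feasible.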

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha, \omega, \beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i < I$. Otherwise the constraints are termed reducible.

For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i} \alpha_k = \sum_{k=0}^{i} \omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i+1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha, \omega, \beta)$ may be divided into a finite number of isolated, irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $F(1, \omega, \beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and the constraint
$$\sum_{i=0}^{\infty} i\pi_i = \beta.$$
In the empty case, the conditions for feasibility of $(1, \omega, \beta)$ reduce to $\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \beta$, from which it follows that the set $F$ is nonempty if and only if $(1, \omega, \beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $\mathcal{A}(1, \omega, \beta)$ can be represented as an infimum of a relative entropy over distributions in $F(1, \omega, \beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}^+$ defined in Corollary 2.3 may be determined as
$$\mathcal{J}(\omega) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \| \mathcal{P}(\beta))$$
if $(1, \omega, \beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^* \in F(1, \omega, \beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show, in fact, that the solution takes the form $\pi^*_i = C\mathcal{P}_i(\rho\beta)$ for $i > I$, for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty} i\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty} \pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the

strict equality $\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I+1$, if necessary, to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I} \omega_i = 0$. In this case, it turns out that $F(1, \omega, \beta)$ has only one element; we then have $C = 0$ and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation
$$\frac{\rho\beta - \sum_{i=0}^{I} i\mathcal{P}_i(\rho\beta)}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{1 - \sum_{i=0}^{I} \omega_i}. \tag{2.5}$$
Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I+1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0, \infty)$ to $(I+1, \infty)$, and hence the equation has exactly one positive root. $C$ is given by
$$C = \frac{1 - \sum_{i=0}^{I} \omega_i}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{\rho\beta - \sum_{i=0}^{I} i\mathcal{P}_i(\rho\beta)}. \tag{2.6}$$
Evaluating the relative entropy of $\pi^*$ and $\mathcal{P}(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as
$$\mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \Big(1 - \sum_{i=0}^{I} \omega_i\Big)(\log C + (1-\rho)\beta) + \Big(\beta - \sum_{i=0}^{I} i\omega_i\Big)\log\rho. \tag{2.7}$$
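Because the conditional mean in (2.5) is monotone in $\rho$, the twist parameters can be found by simple bisection. The Python sketch below does this for the exponential case (strict inequality in the conservation condition) and then evaluates (2.7). All function names are hypothetical. The tail of the Poisson law is summed directly over a fixed number of terms, which is an assumption that is adequate only when $\rho\beta$ is moderate.

```python
import math


def poisson_pmf(i, m):
    """P_i(m) = exp(-m) m^i / i!, computed in log space for stability."""
    return math.exp(-m + i * math.log(m) - math.lgamma(i + 1))


def tail_mean(m, I, terms=400):
    """E[Y | Y > I] for Y ~ Poisson(m), by direct summation over the tail."""
    idx = range(I + 1, I + 1 + terms)
    mass = sum(poisson_pmf(i, m) for i in idx)
    first_moment = sum(i * poisson_pmf(i, m) for i in idx)
    return first_moment / mass


def twist_parameters(omega, beta):
    """Return (rho, C) from (2.5)-(2.6); omega = (omega_0, ..., omega_I).

    Assumes empty initial conditions and strict inequality in (2.4).
    """
    I = len(omega) - 1
    target = (beta - sum(i * w for i, w in enumerate(omega))) / (1.0 - sum(omega))
    lo, hi = 1e-9, 1.0
    while tail_mean(hi * beta, I) < target:   # bracket the root from above
        hi *= 2.0
    for _ in range(200):                      # bisection on the monotone map
        mid = 0.5 * (lo + hi)
        if tail_mean(mid * beta, I) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    tail_mass = 1.0 - sum(poisson_pmf(i, rho * beta) for i in range(I + 1))
    return rho, (1.0 - sum(omega)) / tail_mass


def rate_function(omega, beta):
    """Evaluate the terminal rate function via formula (2.7)."""
    rho, C = twist_parameters(omega, beta)
    head = sum(
        w * math.log(w / poisson_pmf(i, beta)) for i, w in enumerate(omega) if w > 0
    )
    return (
        head
        + (1.0 - sum(omega)) * (math.log(C) + (1.0 - rho) * beta)
        + (beta - sum(i * w for i, w in enumerate(omega))) * math.log(rho)
    )
```

When $\omega$ agrees with the Poisson law $\mathcal{P}(\beta)$ on $0, \ldots, I$, the sketch should return $\rho \approx 1$, $C \approx 1$ and a rate close to zero, in line with the law of large numbers.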

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments, conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1, \omega, \beta)$ are feasible constraints with empty initial conditions. The infimum of

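$J(\gamma)$ over $\mathcal{A}(1, \omega, \beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(1, \omega, \beta)$ defined by
$$\gamma_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C\mathcal{P}_k(\rho\beta))\Big(1 - \frac{x}{\beta}\Big)^k, \tag{2.8}$$
$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i \gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9}$$
$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x),$$
where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.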

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0$ about $x$, $\gamma_0(x+y) = \gamma_0(x) + y\gamma_0^{(1)}(x) + \cdots$, evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term, and we say that $\gamma$ is a polynomial extremal. Otherwise, when $C > 0$, we have an exponential extremal.
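Formulas (2.8) and (2.9) can be evaluated directly once the twist parameters are known. The following Python sketch takes $\rho$ and $C$ as inputs (they are assumed to have been computed separately from (2.5) and (2.6)) and differentiates $\gamma_0$ term by term. The function names are hypothetical.

```python
import math


def poisson_pmf(i, m):
    return math.exp(-m) * m**i / math.factorial(i)


def gamma0_derivative(x, order, omega, beta, rho, C):
    """The order-th derivative of gamma_0 in (2.8), computed term by term."""
    value = C * (-rho) ** order * math.exp(-rho * x)
    for k in range(order, len(omega)):
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - order)  # k!/(k-order)!
        value += (
            coeff * falling * (1.0 - x / beta) ** (k - order) * (-1.0 / beta) ** order
        )
    return value


def minimizing_path(x, omega, beta, rho, C):
    """Return (gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)) from (2.8)-(2.9).

    omega = (omega_0, ..., omega_I); rho and C are the twist parameters.
    """
    I = len(omega) - 1
    gam = [
        x**i / math.factorial(i) * (-1) ** i
        * gamma0_derivative(x, i, omega, beta, rho, C)
        for i in range(I + 1)
    ]
    return gam + [1.0 - sum(gam)]
```

With $\rho = 1$, $C = 1$ and $\omega_i = \mathcal{P}_i(\beta)$, the polynomial coefficients vanish and the path reduces to the truncated Poisson law $\mathcal{P}(x)$, which is the zero cost trajectory.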

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k+j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $S_\infty$ denote the set of distributions on the nonnegative integers, and let $S_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $S_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi(0), \ldots, \pi(k), \ldots, \pi(K)\}$, where for any $k \in \mathcal{K}$ a component distribution is denoted $\pi(k) = \{\pi_{k0}, \pi_{k1}, \ldots\}$, and it is understood that the corresponding $\pi(k)$ is omitted if $\alpha_k = 0$. Finally, let $F(\alpha, \omega, \beta)$ be the set of $\pi \in S_\infty^{|\mathcal{K}|}$ which satisfies the terminal constraints
$$\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \qquad \text{for all } 0 \le i \le I, \tag{2.10}$$

along with the conservation constraint
$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta. \tag{2.11}$$
As in the empty case, it may be established that $F$ is nonempty if and only if $(\alpha, \omega, \beta)$ is feasible.

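Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}^+$ defined in Corollary 2.3 may be expressed
$$\mathcal{J}(\omega) = \min_{\pi \in F(\alpha, \omega, \beta)} \sum_{k \in \mathcal{K}} \alpha_k D(\pi(k) \| \mathcal{P}(\beta))$$
whenever $(\alpha, \omega, \beta)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in F(\alpha, \omega, \beta)$ is unique.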

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form
$$\pi^*_{kj} = \begin{cases} C_k \mathcal{P}_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I, \\ C_k \mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I. \end{cases}$$
In the case of equality in (2.4) (the polynomial case), the corresponding form is
$$\pi^*_{kj} = \begin{cases} D_k \mathcal{P}_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \le I, \\ 0, & k + j > I. \end{cases}$$
As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_i$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in \mathcal{K}$, denote the mean of $\pi^*(k)$ by $\beta_k = \sum_{j=0}^{\infty} j\pi^*_{kj}$ and define the terminal condition $\omega(k) \in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0, \ldots, I-k$. Then the constraints $(1, \omega(k), \beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma(k) = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha, \omega, \beta)$, let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0, \beta_k] \to [0,1]$ be the minimizing paths corresponding to the subproblems $(1, \omega(k), \beta_k)$. The infimum

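of $J(\gamma)$ over $\mathcal{A}(\alpha, \omega, \beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(\alpha, \omega, \beta)$ defined by
$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,$$
$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).$$
In addition, $\gamma$ satisfies the Euler–Lagrange equations.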

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define
$$H(\gamma, \zeta) = \log(E[\exp(\zeta \cdot b_i(\gamma))]),$$
where “$\cdot$” denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma, \zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma, \zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies a large deviation upper bound with a rate function $\bar J$, which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma, \zeta)$ in $\zeta$:
$$L(\gamma, \eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma, \zeta)].$$
If $\{\gamma(x),\ 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then
$$\bar J(\gamma) = \int_0^{\beta} L(\gamma(x), \dot\gamma(x))\, dx.$$
If $\gamma$ is not absolutely continuous, then $\bar J(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma, \zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma, \zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar J = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma, \eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case
$$L(\gamma, M\theta) = D(\theta \| \gamma).$$

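It is well known that $L(\gamma, \eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma, \eta) < \infty$, then $\eta$ can be written as a convex combination of the form
$$\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I),$$
where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \ldots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then
$$\begin{aligned} L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot M\theta - \log(E[\exp(\zeta \cdot b_i(\gamma))])] \\ &= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ M^T\zeta \cdot \theta - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right] \\ &= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right]. \end{aligned}$$
Given any values $\mu_0, \ldots, \mu_{I+1}$, we can define $\zeta_0, \ldots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions, it is apparent that the last display is equal to
$$\begin{aligned} &\sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1}) \right] + \gamma_{I+1} \right) \right] \\ &\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right] \\ &\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right]. \end{aligned}$$
According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \| \gamma)$, thereby completing the proof of the upper bound.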
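The final step can be sanity-checked numerically. For strictly positive $\theta$ and $\gamma$, the supremum in the last display is attained at $\mu_j = \log(\theta_j/\gamma_j)$, a standard fact that the sketch below takes as given rather than derives. The Python code compares the objective at this point with $D(\theta\|\gamma)$, and the function names are hypothetical.

```python
import math


def dv_objective(mu, theta, gamma):
    """sum_j mu_j theta_j - log(sum_j gamma_j exp(mu_j)), the quantity being maximized."""
    linear = sum(m * t for m, t in zip(mu, theta))
    log_mgf = math.log(sum(g * math.exp(m) for m, g in zip(mu, gamma)))
    return linear - log_mgf


def relative_entropy(theta, gamma):
    """D(theta || gamma), assuming gamma_j > 0 wherever theta_j > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)


if __name__ == "__main__":
    theta = [0.2, 0.5, 0.3]
    gamma = [0.6, 0.1, 0.3]
    mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
    # The two printed values should agree up to rounding error.
    print(dv_objective(mu_star, theta, gamma), relative_entropy(theta, gamma))
```

Any other choice of $\mu$ should give a value no larger than $D(\theta\|\gamma)$, which can be confirmed by evaluating `dv_objective` at randomly drawn points.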

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower

bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,
$$\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -J(\gamma) - \varepsilon. \tag{3.1}$$
Of course, this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > bx^K$ for all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$, and such that
$$J(y) \le J(\gamma).$$
Consider the zero cost trajectory defined by
$$\dot z(x) = Mz(x), \qquad z(0) = \gamma(0).$$
We have the following expression for $z_j$ when $j \le I$:
$$z_j(x) = \left[ \sum_{k=0}^{j} \gamma_k(0) \frac{x^{j-k}}{(j-k)!} \right] e^{-x}. \tag{3.2}$$
It is easy to check from this explicit formula that $z_j(x) > bx^K$ for some $b > 0$, $K = I$, and all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0,1)$, let $y^\rho = \rho z + (1-\rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \| z) = 0$, we have
$$\begin{aligned} J(y^\rho) &= \int_0^{\beta} D(\rho z(x) + (1-\rho)\theta(x) \,\|\, \rho z(x) + (1-\rho)\gamma(x))\, dx \\ &\le \rho \int_0^{\beta} D(z(x) \| z(x))\, dx + (1-\rho) \int_0^{\beta} D(\theta(x) \| \gamma(x))\, dx \\ &= (1-\rho) J(\gamma). \end{aligned}$$
All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0,1)$.

It follows that in proving the lower bound, we can assume without loss of generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge bx^K$ for all $x \in [0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$

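(for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0,\beta]$, $\sigma > 0$ and $\varepsilon > 0$, define
$$h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else}. \end{cases}$$
For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor/n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0,\beta]$), we have the inequality
$$\begin{aligned} P_{\Gamma^n(0)}(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma) + e^{-2n\varepsilon} &\ge P_{\Gamma^n(0)}(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\tau)| < \sigma/2) + e^{-2n\varepsilon} \\ &\ge E_{\Gamma^n(0)}(\exp\{-nh(\Gamma^n(\lfloor n\tau \rfloor/n))\}). \end{aligned}$$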

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b_i^n$:

\[
\bar\Gamma^n\Bigl(\frac{i+1}{n}\Bigr) = \bar\Gamma^n\Bigl(\frac{i}{n}\Bigr) + \frac{1}{n}\bar b_i^n, \qquad \bar\Gamma^n(0) = \Gamma^n(0).
\]

Furthermore, the distribution of $\bar b_i^n$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$ and (without explicitly exhibiting all the dependencies) let $\bar\mu_i^n$ denote the (random) distribution of $\bar b_i^n$ given $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl(\exp -n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr)\Bigr)
= \inf \bar E_{\Gamma^n(0)} \Biggl[ h\Bigl(\bar\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau\rfloor - 1} D\bigl(\bar\mu_i^n \,\big\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) \Biggr],
\]

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b_i^n$. We can write $\gamma(\tau) - \gamma(0) = M v \tau$ for some probability vector $v$. Define a process $\bar b_i^n$ as follows:

\[
\bar b_i^n = \begin{cases}
e_j - e_{j-1}, & \text{if } \Bigl\lfloor \sum_{k=0}^{j-1} v_k \tau n \Bigr\rfloor \le i \le \Bigl\lfloor \sum_{k=0}^{j} v_k \tau n \Bigr\rfloor - 1, \text{ for } 0 \le j \le I, \\
0, & \text{if } \Bigl\lfloor \sum_{k=0}^{I} v_k \tau n \Bigr\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}
\]

In other words, $\bar b_i^n$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu_i^n$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[
D\bigl(\bar\mu_i^n \,\big\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) = \log\Bigl(\frac{1}{\bar\Gamma^n_{j-1}(i/n)}\Bigr) = -\log\Bigl(\bar\Gamma^n_{j-1}\Bigl(\frac{i}{n}\Bigr)\Bigr). \tag{3.3}
\]

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k\tau n \rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor - 1$,

\[
\bar\Gamma^n_{j-1}\Bigl(\frac{i}{n}\Bigr) \downarrow \bar\Gamma^n_{j-1}\Biggl(\frac{1}{n}\Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor\Biggr)
\]

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor$, it follows that

\[
\bar\Gamma^n_{j-1}\Biggl(\frac{1}{n}\Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor\Biggr) \to \gamma_{j-1}(\tau)
\]

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (independent of $\tau$) such that whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

\[
D\bigl(\bar\mu_i^n \,\big\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) \le C_1[-\log \tau^K] \le -C_2 \log \tau.
\]

In addition, as $n \to \infty$ and $\eta \to 0$,

\[
\bar\Gamma^n\Bigl(\frac{1}{n}\lfloor \tau n \rfloor\Bigr) \to \gamma(\tau).
\]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl(\exp -n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr)\Bigr) \le -C_2 \tau \log \tau.
\]

We now choose $\tau > 0$ so that $-C_2\tau \log \tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\bar\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0, \tau]$, w.p.1, if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$, for any $\sigma > 0$ and all sufficiently small $\eta > 0$: $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
\liminf_{n\to\infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Biggl( \Bigl|\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr) - \gamma\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr| < \sigma,\ \sup_{x\in[0,\tau]} |\Gamma^n(x) - \gamma(x)| < \delta \Biggr) \ge -\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0, \beta]$ we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0, \delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau, \beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes the interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\lfloor n\tau\rfloor/n)| < \sigma$,

\[
\liminf_{n\to\infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}\Bigl( \sup_{\tau\le x\le\beta} |\Gamma^n(x) - \gamma(x)| < \zeta \Bigr) \ge -J(\gamma) - \frac{\varepsilon}{2},
\]

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice β = γ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.


In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[
\mathcal{J}(\Omega) = \inf_{\omega\in\Omega} \mathcal{J}(\omega).
\]

Using Theorem 2.7, we can write

\[
\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega\in\Omega}\ \inf_{\pi\in\mathcal{F}(\alpha,\omega,\beta)}\ \sum_{k=0}^{K} \alpha_k D\bigl(\pi^{(k)} \,\big\|\, \mathcal{P}(\beta)\bigr) \\
&= \inf_{\pi\in\mathcal{F}(\alpha,\Omega,\beta)}\ \sum_{k=0}^{K} \alpha_k D\bigl(\pi^{(k)} \,\big\|\, \mathcal{P}(\beta)\bigr),
\end{aligned}
\tag{4.1}
\]

where we have abused notation to define $\mathcal{F}(\alpha,\Omega,\beta) = \bigcup_{\omega\in\Omega} \mathcal{F}(\alpha,\omega,\beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega\in\Omega^\circ} \mathcal{J}(\omega) = \inf_{\omega\in\bar\Omega} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n} \log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega\in\Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.


4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[
\gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}.
\]

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

\[
\rho(1 - \omega_0) = 1 - e^{-\beta\rho}
\]

(the value $\rho = 0$ satisfies this equation trivially and is excluded). Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
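To illustrate how the twist parameter can be computed in this classical case, here is a minimal sketch that finds the positive root of $\rho(1-\omega_0) = 1 - e^{-\beta\rho}$ by bisection. The function names `solve_rho` and `gamma0` and the bracketing choice are assumptions made for this illustration; the sample values $\omega_0 = 0.15$, $\beta = 3$ are those quoted for Figure 1.

```python
import math

def solve_rho(omega0, beta, tol=1e-13):
    """Positive root of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection.

    Assumes 0 < 1 - omega0 < beta, so that a positive root exists.
    """
    if not (0.0 < 1.0 - omega0 < beta):
        raise ValueError("need 0 < 1 - omega0 < beta")
    # f(rho) = 1 - exp(-beta*rho) - rho*(1 - omega0); expm1 keeps accuracy near 0.
    f = lambda rho: -math.expm1(-beta * rho) - rho * (1.0 - omega0)
    lo = 1e-9                   # f > 0 just to the right of zero
    hi = 1.0 / (1.0 - omega0)   # f(hi) = -exp(-beta*hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Fraction of empty urns along the extremal, with C = 1/rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

rho = solve_rho(0.15, 3.0)
```

Under these assumptions the computed $\rho$ exceeds 1, consistent with an unusually large number of empty urns, and `gamma0(3.0, rho)` reproduces the terminal value $\omega_0$.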

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = \mathcal{P}_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns $\omega_0 = 0.15$ is three times larger than the expected value $\mathcal{P}_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[
w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),
\]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

\[
w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\Gamma_i(r/n).
\]

In order to compute

\[
J_O(\eta, \beta) = -\lim_{n} \frac{1}{n} \log P\Bigl(\frac{w(\beta n)}{n} > \eta\Bigr),
\]

we therefore consider sample paths which satisfy the end constraint

\[
\sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2}
\]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta, \beta)$ will be given by minimizing the divergence between $\pi$ and $\mathcal{P}(\beta)$ under the linear equality constraints

\[
\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3}
\]


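We introduce the Lagrangian

\[
L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr) + \lambda\Bigl(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\Bigr),
\]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[
\log \pi^*_i = \log \mathcal{P}_i(\beta) - 1 + y + iz + (I - i)^+\lambda.
\]

In terms of variables $C$, $\rho$ and $\nu$, we may write

\[
\pi^*_i = C\mathcal{P}_i(\rho\beta) \qquad\text{for } i \ge I
\]

and

\[
\pi^*_i = C\mathcal{P}_i(\rho\beta)\nu^{I-i} \qquad\text{for } i \le I.
\]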

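The distribution is conditionally Poisson $\rho\beta/\nu$ for $i \le I$ and conditionally Poisson $\rho\beta$ for $i \ge I$. For convenience we introduce the notation

\[
Q_I(\rho) = \sum_{i=I}^{\infty} \mathcal{P}_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} i\mathcal{P}_i(\rho\beta),
\]

\[
R_I(\rho, \nu) = \nu^I e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} \mathcal{P}_i\Bigl(\frac{\rho\beta}{\nu}\Bigr) = \frac{\nu^{I+1}}{\rho\beta} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} i\mathcal{P}_i\Bigl(\frac{\rho\beta}{\nu}\Bigr).
\]

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[
\begin{aligned}
C\bigl(R_I(\rho, \nu) + Q_I(\rho)\bigr) &= 1, \\
C\Bigl(\frac{\rho\beta}{\nu} R_I(\rho, \nu) + \rho\beta Q_I(\rho)\Bigr) &= \beta, \\
C\Bigl(I\mathcal{P}_I(\rho\beta) + \Bigl(I - \frac{\rho\beta}{\nu}\Bigr) R_I(\rho, \nu)\Bigr) &= \zeta.
\end{aligned}
\]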

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \| \mathcal{P}(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[
\begin{aligned}
(\rho/\nu - 1) R_I(\rho, \nu) + (\rho - 1) Q_I(\rho) &= 0, \\
(I - \rho\beta/\nu - \zeta) R_I(\rho, \nu) - \zeta Q_I(\rho) + I\mathcal{P}_I(\rho\beta) &= 0.
\end{aligned}
\]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed as

\[
\begin{aligned}
J_O(\eta, \beta) &= D(\pi^* \| \mathcal{P}(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl(C e^{\beta - \rho\beta} \rho^i \nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}
\]
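The numerical procedure described above can be sketched as follows. This is only an illustration under stated assumptions: rather than fixing $\zeta$ and intersecting two curves, the sketch fixes a twist $\rho > 1$, solves the reduced mean constraint for $\nu$ by bisection, and then reads off $C$, $\zeta$ and the exponent. The function names and the sample values $I = 2$, $\beta = 2$, $\rho = 1.2$ are hypothetical, and infinite sums are truncated at 100 terms.

```python
import math

def poisson_pmf(i, lam):
    # Poisson probability P_i(lam), evaluated in log space (requires lam > 0).
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def Q(I, rho, beta):
    # Q_I(rho) = sum_{i >= I} P_i(rho*beta), via the complement of the head.
    return 1.0 - sum(poisson_pmf(i, rho * beta) for i in range(I))

def R(I, rho, nu, beta):
    # R_I(rho, nu) = nu^I exp(-rho*beta*(1 - 1/nu)) sum_{i < I} P_i(rho*beta/nu).
    head = sum(poisson_pmf(i, rho * beta / nu) for i in range(I))
    return nu ** I * math.exp(-rho * beta * (1.0 - 1.0 / nu)) * head

def solve_nu(I, rho, beta):
    # Bisection on (rho/nu - 1) R_I + (rho - 1) Q_I = 0, assuming rho > 1,
    # in which case the root satisfies nu > rho.
    g = lambda nu: ((rho / nu - 1.0) * R(I, rho, nu, beta)
                    + (rho - 1.0) * Q(I, rho, beta))
    lo, hi = rho, 2.0 * rho
    while g(hi) > 0.0:          # expand until the sign changes
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def overflow_solution(I, rho, beta):
    # Returns (C, nu, zeta, J_O) for the given twist rho.
    nu = solve_nu(I, rho, beta)
    r, q = R(I, rho, nu, beta), Q(I, rho, beta)
    C = 1.0 / (r + q)
    zeta = C * (I * poisson_pmf(I, rho * beta) + (I - rho * beta / nu) * r)
    J = (math.log(C) + beta * (1.0 - rho) + beta * math.log(rho)
         + zeta * math.log(nu))
    return C, nu, zeta, J

def pi_star(I, rho, nu, C, beta, terms=100):
    # Truncated minimizing distribution pi*_i = C P_i(rho*beta) nu^{(I-i)^+}.
    return [C * poisson_pmf(i, rho * beta) * nu ** max(I - i, 0)
            for i in range(terms)]

I_cap, beta, rho = 2, 2.0, 1.2
C, nu, zeta, J = overflow_solution(I_cap, rho, beta)
pi = pi_star(I_cap, rho, nu, C, beta)
```

The list `pi` can be used to check the three constraints (4.3) directly, and the corresponding overflow level is $\eta = \zeta - I + \beta$. To target a prescribed $\zeta$, one would wrap `overflow_solution` in an outer one-dimensional search over $\rho$.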

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[
J_C(\alpha, \beta, \xi) = -\lim_{n} \frac{1}{n} \log P\Bigl(\sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi\Bigr),
\]

where

\[
\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} \mathcal{P}_i(\beta)
\]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi,
\]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega_{kj} = \pi^*_{kj} = \begin{cases} C_k \mathcal{P}_j(\rho\beta) W, & j + k \le I, \\ C_k \mathcal{P}_j(\rho\beta), & j + k > I. \end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[
J_C(\alpha, \beta, \xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k \log C_k.
\]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types, and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
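The zero-cost figures quoted in this example can be reproduced with a few lines of code. The sketch below evaluates $\sum_k \alpha_k \sum_{i=0}^{I-k} \mathcal{P}_i(\beta)$ for the stated $\alpha$, $I$ and $\beta$, and converts the quoted exponent $J_C \approx 0.18$ into the rough probability estimate $e^{-nJ_C}$. The helper names are illustrative assumptions, and the exponent 0.18 is taken from the text rather than recomputed.

```python
import math

def poisson_pmf(i, lam):
    # Poisson probability P_i(lam).
    return math.exp(-lam) * lam ** i / math.factorial(i)

def expected_low_occupancy(alpha, I, beta):
    # sum_k alpha_k * sum_{i=0}^{I-k} P_i(beta): zero-cost fraction of
    # types with I or fewer coupons after beta*n further draws.
    return sum(a * sum(poisson_pmf(i, beta) for i in range(I - k + 1))
               for k, a in enumerate(alpha))

alpha, I, beta, n = [0.5, 0.3, 0.2], 3, 2.0, 100
psi3 = expected_low_occupancy(alpha, I, beta)   # zero-cost psi_3(2.0)
types_with_four_or_more = (1.0 - psi3) * n      # expected count of such types
prob_estimate = math.exp(-n * 0.18)             # e^{-n J_C}, J_C quoted from the text
```

The estimate $e^{-nJ_C}$ ignores subexponential prefactors, so it should be read only as an order of magnitude.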

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

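A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad\text{(monotonicity)} \tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad\text{(conservation)}. \tag{A.5}
\]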

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I} \Bigl( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \Bigr) &= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).
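The two feasibility conditions are simple to test numerically. The following minimal sketch checks (A.4) and (A.5) for given endpoint data; the function name `is_feasible`, the list layout of the inputs and the tolerance are assumptions made for this illustration.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the conditions of Lemma 2.4.

    alpha = [alpha_0, ..., alpha_{I+1}] and
    omega = [omega_0, ..., omega_I, omega_{I+}] (assumed layout),
    each summing to one; beta is the number of balls thrown per urn.
    """
    I = len(omega) - 2
    # Monotonicity (A.4): partial sums of alpha dominate those of omega.
    a_cum = w_cum = 0.0
    for i in range(I + 1):
        a_cum += alpha[i]
        w_cum += omega[i]
        if a_cum < w_cum - tol:
            return False
    # Conservation (A.5): counted balls cannot exceed initial balls plus beta.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol

# Classical case I = 0 with empty initial conditions.
ok = is_feasible([1.0, 0.0], [0.15, 0.85], 3.0)
too_few_balls = is_feasible([1.0, 0.0], [0.15, 0.85], 0.5)
not_monotone = is_feasible([0.5, 0.5], [0.6, 0.4], 3.0)
```

In the second call conservation fails, since 85% of the urns cannot be occupied after only 0.5 balls per urn; in the third call monotonicity fails, since the fraction of empty urns cannot increase.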

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx.
\]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi, \theta) &= D(\theta\|\gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \Bigl(1 - \sum_{i=0}^{I} \theta_i\Bigr) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{aligned}
\]


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
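For concreteness, the local rate function can be evaluated as in the following sketch. The function name `local_rate`, the list layout of the arguments, the tolerance and the convention of returning infinity for inadmissible rates are assumptions of this illustration; the formula itself is the relative entropy $D(\theta\|\gamma)$ with $\gamma_i = \psi_i - \psi_{i-1}$ and $\psi_{-1} = 0$.

```python
import math

def local_rate(psi, theta, tol=1e-12):
    """L(psi, theta) = D(theta || gamma) for cumulative occupancies psi.

    psi = [psi_0, ..., psi_I], theta = [theta_0, ..., theta_I] (assumed layout).
    The I+ components are 1 - psi_I and 1 - sum(theta).
    """
    I = len(psi) - 1
    # gamma_i = psi_i - psi_{i-1}, with psi_{-1} = 0, plus gamma_{I+}.
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, I + 1)]
    gamma.append(1.0 - psi[I])
    rates = list(theta) + [1.0 - sum(theta)]
    total = 0.0
    for t, g in zip(rates, gamma):
        if t < -tol:
            return math.inf      # negative rate: not an admissible path
        if t <= 0.0:
            continue             # 0 log 0 = 0 convention
        if g <= 0.0:
            return math.inf      # mass placed where none is expected
        total += t * math.log(t / g)
    return total

zero_cost = local_rate([0.5, 0.8], [0.5, 0.3])   # theta equals gamma
positive_cost = local_rate([0.5, 0.8], [0.2, 0.3])
```

When $\theta = \gamma$ the cost vanishes up to rounding, which is the zero cost behavior used throughout; any other admissible rate gives a strictly positive cost.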

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[
\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx} \frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))
\]

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl( -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Bigr) \tag{A.7}
\]

for $i = 0, \ldots, I - 1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\Bigl( -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Bigr). \tag{A.8}
\]

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[
L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl( -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \Bigr) \tag{A.10}
\]

for $i = 0, \ldots, I - 1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C\mathcal{P}_i(\rho\beta) = 1, \qquad \sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} iC\mathcal{P}_i(\rho\beta) = \beta.
\]

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C\mathcal{P}_k(\rho\beta))\Bigl(1 - \frac{x}{\beta}\Bigr)^k, \tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^k. \tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[
(-1)^i \varphi^{(i)}(x) \ge 0 \qquad\text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11}
\]


This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly, $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} + \sum_{k=i}^{I} (\omega_k - C\mathcal{P}_k(\rho\beta))\, \beta^{-i} \frac{k!}{(k-i)!} \Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} \tag{A.12}
\]

for $i > I$. It is clear that $a(x) = (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\begin{aligned}
\varphi(\beta) = (-1)^I \psi_0^{(I)}(\beta) &= \rho^I Ce^{-\rho\beta} + \Bigl(\omega_I - Ce^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I} I! \\
&= \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1} \omega_I \ge 0,
\end{aligned}
\]

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1} \omega_I > 0 \qquad\text{for all } x \in [0, \beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Bigl(1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)\Bigr) + \sum_{i=0}^{I} \omega_i = 1.
\]


A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I + 1, \ldots. \tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\gamma(x) \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1} \psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1} \psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]
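These summation identities can be checked numerically in the classical case $I = 0$, where $C = 1/\rho$ and the derivatives of $\psi_0$ are explicit. The sketch below is an illustration only: the function names, the truncation at 60 terms and the sample value of $\rho$ are assumptions, and the closed forms $\gamma_i(x) = \mathcal{P}_i(\rho x)/\rho$ ($i \ge 1$) and $\theta_i(x) = \mathcal{P}_i(\rho x)$ are what (A.13) and (A.14) reduce to in this special case.

```python
import math

def poisson_pmf(i, lam):
    # Poisson probability P_i(lam); valid also for lam = 0.
    return math.exp(-lam) * lam ** i / math.factorial(i)

def gamma_classical(i, x, rho):
    # (A.13) with I = 0, C = 1/rho: gamma_0 = psi_0, gamma_i = P_i(rho x)/rho.
    if i == 0:
        return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho
    return poisson_pmf(i, rho * x) / rho

def theta_classical(i, x, rho):
    # (A.14) with I = 0, C = 1/rho: (x^i/i!) rho^(i+1) C e^(-rho x) = P_i(rho x).
    return poisson_pmf(i, rho * x)

rho, terms = 1.138, 60   # rho is an illustrative value, not a computed root
sums = []
for x in (0.0, 0.5, 1.5, 3.0):
    total_gamma = sum(gamma_classical(i, x, rho) for i in range(terms))
    total_theta = sum(theta_classical(i, x, rho) for i in range(terms))
    sums.append((x, total_gamma, total_theta))
```

Both truncated sums equal one up to rounding at every sample point, and the ratio $\theta_i/\gamma_i$ equals $\rho$ for all $i \ge 1$, in line with the role of $\rho$ as a multiplicative twist on the high-occupancy urns.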

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0, \beta]$. Thus, the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl(-\log\frac{\theta_i}{\gamma_i}\Bigr) = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a generalform which will also apply to the nonempty case of the next section Theconditions of the lemmas are satisfied by the exponential and polynomialextremals of Theorem A1 as can be readily verified

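Lemma A3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[
(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \qquad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \dots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad
\sum_{i=0}^{\infty} i\gamma_i(x) \le B_0, \qquad
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\gamma_0, \dots, \gamma_I, \gamma_{I+}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A17}
\]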

Proof. For all $i > I$ and $x \in [0,\beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A12) and (A13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
+ (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$.

Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
+ \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A15) for each term of (A18), the integral is

\[
\sum_{i=0}^{\infty}\bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0, \dots, \gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\dots,I-1,
\]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A3.

A6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A17) and (A19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\{\gamma_i(\beta)\}$ and the Poisson zero-cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 25, which we restate here.

Theorem A5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1,\omega,\beta)} D(\pi\,\|\,P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\dots,I$, and define the Lagrangian $\mathcal{L}$ on this set to be

\[
\mathcal{L}(\pi; y, z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty}\pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ \ge\ \pi^*_i\log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]

Following standard Lagrangian arguments, we thus have

\[
\inf_{\pi \in F(1,\omega,\beta)} D(\pi\,\|\,P(\beta)) = \inf_{\pi \in F(1,\omega,\beta)}\mathcal{L}(\pi; y, z) \ \ge\ \inf_{\pi \in \mathbb{R}_+^{\infty}}\mathcal{L}(\pi; y, z) = D(\pi^*\,\|\,P(\beta)).
\]

Since $\pi^* \in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)}
+ \Bigl(1 - \sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr)
+ \Bigl(\beta - \sum_{i=0}^{I} i\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (26). In the polynomial case, the cost is simply given by the first term in the above expression.
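As a sanity check on the explicit cost formula, the following sketch compares it with a direct (truncated) evaluation of the relative entropy $D(\pi^*\,\|\,P(\beta))$. Everything numeric here is an assumption made for illustration: the values of $\beta$, $\rho$, $C$ and $I = 1$ are made up, and instead of solving for $C$ and $\rho$ from given $\omega$ (the paper's route via (26), which is not reproduced here), the sketch works backwards and derives the $\omega_i$ that make the mass and mean constraints hold.

```python
import math

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam), computed in log space for stability."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

# Hypothetical parameters (illustration only).
BETA, RHO, C, I = 1.0, 1.2, 0.9, 1
N = 100  # truncation level for the infinite sums

# Mass and first moment carried by the tail C * P_i(rho*beta), i > I.
tail_mass = sum(C * poisson_pmf(i, RHO * BETA) for i in range(I + 1, N))
tail_mean = sum(i * C * poisson_pmf(i, RHO * BETA) for i in range(I + 1, N))

# Choose omega_0, omega_1 so that total mass is 1 and the mean is BETA.
omega1 = BETA - tail_mean
omega0 = 1.0 - tail_mass - omega1
omega = [omega0, omega1]

def terminal(i):
    """Terminal distribution: omega_i for i <= I, C * P_i(rho*beta) beyond."""
    return omega[i] if i <= I else C * poisson_pmf(i, RHO * BETA)

# Direct relative entropy D(pi* || P(beta)), truncated at N terms.
direct = sum(terminal(i) * math.log(terminal(i) / poisson_pmf(i, BETA))
             for i in range(N))

# Closed form in terms of rho, omega and beta.
closed = (sum(w * math.log(w / poisson_pmf(i, BETA)) for i, w in enumerate(omega))
          + (1.0 - sum(omega)) * (math.log(C) + (1.0 - RHO) * BETA)
          + (BETA - sum(i * w for i, w in enumerate(omega))) * math.log(RHO))

print(direct, closed)
```

The two numbers agree up to rounding because, for $i > I$, $\log\bigl(CP_i(\rho\beta)/P_i(\beta)\bigr) = \log C + (1-\rho)\beta + i\log\rho$.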

A7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i}\colon [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A1 with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.
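The superposition of class curves can be sketched in a few lines. This is an illustrative sketch, not the paper's construction of the constrained extremals: the function name `overall_curve` and its signature are hypothetical, and the demonstration uses the simplest case as an assumption, namely unconstrained (zero-cost) evolution, in which every class receives balls at the same rate ($\beta_k = \beta$) and the number of additional balls in a class-$k$ urn is Poisson with mean $x$.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability P_j(lam); lam = 0 puts all mass at j = 0."""
    if lam == 0.0:
        return 1.0 if j == 0 else 0.0
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def overall_curve(i, x, alpha, beta_k, beta, sub_curve):
    """gamma_i(x) = sum over classes k <= i with alpha_k > 0 of
    alpha_k * gamma_{k, i-k}(x * beta_k / beta).

    sub_curve(k, j, t) is the class-k curve gamma_{k,j} at rescaled time t.
    """
    return sum(a * sub_curve(k, i - k, x * beta_k[k] / beta)
               for k, a in enumerate(alpha) if a > 0.0 and k <= i)

# Hypothetical initial condition: 60% empty urns, 40% urns holding 2 balls.
alpha = [0.6, 0.0, 0.4]
beta = 1.5
beta_k = [beta, beta, beta]          # zero-cost assumption: beta_k = beta

def zero_cost(k, j, t):
    """Assumed class curve in the unconstrained case: Poisson(t) additions."""
    return poisson_pmf(j, t)

total = sum(overall_curve(i, 0.7, alpha, beta_k, beta, zero_cost)
            for i in range(60))
print(total)                          # fractions over all occupancy levels
print(overall_curve(2, 0.0, alpha, beta_k, beta, zero_cost))  # initial value alpha_2
```

At $x = 0$ the superposed curves reproduce the initial condition $\alpha$, and at every $x$ they sum to one, as valid occupancy curves must.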

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \tag{A20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} \qquad \text{for all } 0 \le i \le I. \tag{A22}
\]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (24). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I, \tag{A23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A20) subject to (A23) and the endpoint constraints (A22), in which case the conservation constraint (A21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\dots,\pi^{(k)},\dots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A20) subject to constraints (A21) and (A22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A21) and (A22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (23) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (23). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (24) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
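The ordered filling construction can be sketched as a short greedy routine. This is a minimal sketch under assumptions: the text does not fix the order in which classes contribute mass at each level, so the code below arbitrarily draws from the lowest class index first; the function name `ordered_filling` and the example data are hypothetical. It treats the polynomial case, where $\sum_k\alpha_k = \sum_i\omega_i = 1$.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Build a feasible pi[k][j] with sum_k alpha[k] * pi[k][i-k] == omega[i].

    alpha[k]: initial fraction of urns holding k balls (k = 0..K).
    omega[i]: terminal fraction of urns holding i balls (i = 0..I).
    Greedy choice (an assumption): at each level i, take mass from class 0,
    then class 1, and so on, until omega[i] is met.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    remaining = list(alpha)              # unassigned urn mass per class
    pi = [[0.0] * (I + 1) for _ in range(K + 1)]
    for i in range(I + 1):
        need = omega[i]
        for k in range(min(i, K) + 1):
            if alpha[k] == 0.0:
                continue                 # classes with no urns contribute nothing
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition fails at level %d" % i)
    return pi

# Hypothetical polynomial constraints with K = 1, I = 2.
alpha = [0.5, 0.5]
omega = [0.2, 0.3, 0.5]
pi = ordered_filling(alpha, omega)

# Balls added per urn: sum_k alpha_k * sum_j j * pi[k][j].
added = sum(a * sum(j * p for j, p in enumerate(row))
            for a, row in zip(alpha, pi))
print(pi, added)
```

When every class uses all of its mass, the number of balls added per urn is forced to equal $\sum_i i\omega_i - \sum_k k\alpha_k$ (here $1.3 - 0.5 = 0.8$), whichever greedy order is used.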

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$ this distance is simply

\[
\sum_{i:\,P_i > Q_i}(P_i - Q_i) = \sum_{i:\,Q_i > P_i}(Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \Bigl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_k \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 143), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[
Q_A(\alpha,\omega,\beta) \doteq F(\alpha,\omega,\beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j\ge m} j\pi^{(n)}_{kj} \le \varepsilon \tag{A24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta}P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\,\frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]

Taking $y = j/\delta$ and $x = \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j\ge m} j\pi^{(n)}_{kj}
&= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j\ge 0}\delta\Bigl[\pi^{(n)}_{kj}\log\Bigl(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\Bigr) - \pi^{(n)}_{kj} + P_j(\beta)\Bigr]
+ \delta\sum_{j\ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) + \delta\sum_{j\ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&\le \delta A + \delta\sum_{j\ge m}(e^{j/\delta} - 1)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]
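The tail estimate can be illustrated numerically. The sketch below is a hedged example with made-up inputs: it takes $\pi$ to be a Poisson distribution with mean 2 as a stand-in for one of the $\pi^{(n)}_{(k)}$, uses the reference law $P(\beta)$ with $\beta = 1$, truncates all sums at 80 terms, and compares the tail first moment $\sum_{j\ge m} j\pi_j$ with the bound $\delta D(\pi\,\|\,P(\beta)) + \delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta)$. The helper names are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability P_j(lam), computed in log space."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def tail_first_moment(pi, m):
    """Left-hand side: sum over j >= m of j * pi_j."""
    return sum(j * p for j, p in enumerate(pi) if j >= m)

def relative_entropy(pi, ref):
    """D(pi || ref) for two (truncated) distributions on 0..len-1."""
    return sum(a * math.log(a / b) for a, b in zip(pi, ref) if a > 0.0)

def tail_bound(pi, beta, delta, m):
    """Right-hand side: delta*D(pi||P(beta)) + delta*sum_{j>=m}(e^{j/delta}-1)*P_j(beta)."""
    n = len(pi)
    ref = [poisson_pmf(j, beta) for j in range(n)]
    tail = sum((math.exp(j / delta) - 1.0) * ref[j] for j in range(m, n))
    return delta * relative_entropy(pi, ref) + delta * tail

N = 80                                        # truncation level (assumption)
pi = [poisson_pmf(j, 2.0) for j in range(N)]  # illustrative pi: Poisson(2)
beta = 1.0

for delta, m in [(2.0, 5), (1.0, 8), (0.5, 12)]:
    print(delta, m, tail_first_moment(pi, m), tail_bound(pi, beta, delta, m))
```

The bound is far from tight for a fixed pair $(\delta, m)$; what matters in the proof is only that the right-hand side can be made small uniformly in $n$ by choosing $\delta$ small first and then $m$ large.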

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 54, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_k \alpha_k\beta_k = \lim_n\sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta,
\]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 143(b)), it follows that

\[
\sum_{k\in\mathcal{K}}\alpha_k D(\pi^*_{(k)}\,\|\,P(\beta)) \le \liminf_n\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$, such that

\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

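Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{kj} \ \to\ \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{kj}, \qquad k = 0,1,\dots,K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.

Thus

\[
\sum_{k\in\mathcal{K}}\beta_k\alpha_k = \lim_n\sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 143(b)), we have

\[
\begin{aligned}
\liminf_n\sum_{k=0}^{K}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))
&\ge \liminf_n\sum_{k:\,\alpha_k>0}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k>0}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.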

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A20) subject to constraints (A21) and (A22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$ and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\bar\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\bar\pi_{mn} > 0$, that $\bar\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\bar\pi$ has finite support. If we define

\[
\bar\omega_i \doteq \sum_{k=0}^{i}\alpha_k\bar\pi_{k,i-k}, \qquad \bar\beta \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\bar\pi_{kj},
\]

then $\bar\pi$ is a feasible solution for the constraints $(\alpha,\bar\omega,\bar\beta)$. Next, for any $\eta > 0$ we may define

\[
\tilde\beta \doteq (\beta - \eta\bar\beta)/(1 - \eta), \qquad \tilde\omega \doteq (\omega - \eta\bar\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha,\tilde\omega,\tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (23) and (24) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (23) and the conservation constraint (24) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\bar\omega,\bar\beta)$, and hence for $(\alpha,\tilde\omega,\tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha,\tilde\omega,\tilde\beta)$, and finally form $\hat\pi = \eta\bar\pi + (1 - \eta)\tilde\pi$. By construction, $\hat\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\hat\pi_{mn} > 0$. As discussed in the proof of Lemma A6, we may take $\tilde\pi$ to have finite support, so that $\hat\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\hat\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\hat\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\Bigl(\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} + 1\Bigr)(\hat\pi_{kj} - \pi_{kj}) \\
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}\,(\hat\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\hat\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{P_n(\beta)}
+ \sum_{k\in\mathcal{K}}\alpha_k\sum_{j:\,(k,j)\ne(m,n)}\hat\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}
- \sum_{k\in\mathcal{K}}\alpha_k\sum_{j}\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_k \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\hat\pi_{kj}\log P_j(\beta)$, which is finite because $\hat\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A20) may be considered to be a finite-dimensional problem over this set. By Lemma A8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
\mathcal{L}(\pi) = \sum_{k\in\mathcal{K}}\alpha_k\sum_{k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+ \sum_{k\in\mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{k+l\in\mathcal{I}}\pi_{kl}\Bigr)
+ \sum_{i\in\mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $\mathcal{L}$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
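The product form $D_kP_j(\beta)W_{k+j}$ suggests a simple way to compute such a minimizer numerically, sketched below. This is not a method taken from the paper: it swaps in iterative proportional fitting (alternately rescaling rows and columns of a starting table), a standard technique whose iterates keep the product form by construction. The sketch assumes strictly positive $\alpha_k$ and $\omega_i$ in the polynomial case; the function name and the example data are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability P_j(lam), computed in log space."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def product_form_minimizer(alpha, omega, beta, sweeps=2000):
    """Iterative proportional fitting (an assumed technique, not the paper's).

    Works on the table mass[k][i] = alpha_k * pi_{k, i-k}, starting from the
    Poisson reference alpha_k * P_{i-k}(beta), and alternately enforces
      rows:    sum_i mass[k][i] = alpha_k   (each pi^(k) sums to one),
      columns: sum_k mass[k][i] = omega_i   (terminal conditions (A22)).
    Returns pi[k][j] for j = 0..I-k.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    mass = [[alpha[k] * poisson_pmf(i - k, beta) if i >= k else 0.0
             for i in range(I + 1)] for k in range(K + 1)]
    for _ in range(sweeps):
        for k in range(K + 1):                       # row scaling
            s = sum(mass[k])
            mass[k] = [v * alpha[k] / s for v in mass[k]]
        for i in range(I + 1):                       # column scaling
            s = sum(mass[k][i] for k in range(K + 1))
            for k in range(K + 1):
                mass[k][i] *= omega[i] / s
    return [[mass[k][k + j] / alpha[k] for j in range(I - k + 1)]
            for k in range(K + 1)]

# Hypothetical polynomial constraints: K = 1, I = 2,
# beta = sum_i i*omega_i - sum_k k*alpha_k = 1.2 - 0.5 = 0.7.
alpha, omega, beta = [0.5, 0.5], [0.2, 0.4, 0.4], 0.7
pi = product_form_minimizer(alpha, omega, beta)
print(pi)
```

Because every rescaling multiplies a whole row or a whole column, the output has the shape $D_kP_j(\beta)W_{k+j}$; for instance the ratios $\pi_{01}P_0(\beta)/(\pi_{10}P_1(\beta))$ and $\pi_{02}P_1(\beta)/(\pi_{11}P_2(\beta))$ both equal $D_0/D_1$.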

We now give the corresponding theorem for the exponential case.

Theorem A10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M} l\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l\le M}\pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}} l\pi_{kl} = \beta^{(M)},
\]

\[
\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl} = \eta^{(M)}_k, \qquad k \in \mathcal{K},
\]

\[
\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
\mathcal{L}^{(M)} ={}& \sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+ y^{(M)}\Bigl(\beta^{(M)} - \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}} l\pi_{kl}\Bigr) \\
&+ \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl}\Bigr)
+ \sum_{i\in\mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\Bigr).
\end{aligned}
\]

Since the derivative of $\mathcal{L}^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\,e^{jy^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*_{(k)}$ by $\beta_k \doteq \sum_j j\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$, and has a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0},\dots,\omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing distribution arising in Theorem A1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\dots,I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A1 [see (A13)]. Also, for convenience define $\bar\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i}\alpha_k\bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$ and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A21). Hence, the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 21.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve after rescaling is

\[
\begin{aligned}
\bar\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k}\bigl(\omega_{ki} - C_kP_i(\rho_k\beta_k)\bigr)\Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k}\bigl(\omega_{ki} - C_kP_i(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i} \\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k}(W_{k+i} - 1)P_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i}\Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_kP_i(\rho\beta)W_{k+i}$ when $i \le I - k$, as given by Theorem A10. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x)
&= \alpha_0C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I}(W_i - 1)P_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Bigr] \\
&= \alpha_0C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k}(W_{k+j} - 1)P_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{j}\Bigr] \\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\bar\psi_{k0}(x).
\end{aligned}
\tag{A25}
\]

Using Theorem A1 to express $\bar\gamma_{kj}$ in terms of $\bar\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i}\alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i}\alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i}\alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A27}
\]

As shown in the proof of Theorem A1, the relations (A26) and (A27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\bar\psi_{k0}(x) \tag{A28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A29}
\]

for $i = 0,\dots,I$, establishing the restricted set of equations (A10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A12. For irreducible constraints, the occupancy functions defined in Theorem A11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0,\dots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A13. Suppose that $\gamma$ is the extremal defined by Theorem A11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A20) subject to constraints (A21) and (A22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A11 are shown in the proof to satisfy the conditions of Lemma A3. Hence, the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K}\alpha_k\log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\bar\psi_{k0}(0) = 1$, together with (A25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\times\bar\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K}\alpha_k D(\bar\gamma_{k\cdot}(\beta)\,\|\,P(\beta)).
\]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A4 and (A28).

A8 Extremal curves have globally minimal cost In this section weprove the following theorem

52 P DUPUIS C NUZMAN AND P WHITING

Theorem A14 (Strong minimum) Given feasible constraints (αωβ)let γ be the corresponding extremal occupancy path defined in Theorem A11and let γ be any other occupancy path satisfying the same constraints ThenJ(γ)le J(γ)

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) = \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \,\|\, \gamma)$, and is hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

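For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$ we denote

\[
\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma
\]

and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] = J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x) \,\|\, \gamma^{\varepsilon}(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.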

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11 and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi, \theta)\,dx$. After defining

\[
\eta(x) = \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x, \varepsilon)\,dx. \tag{A.30}
\]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x) \ge \alpha_i \gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1 - \varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

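The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^{\varepsilon}(x)} \eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^{\varepsilon}(x)} \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \qquad
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \right| < B \quad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.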

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\,dx
\]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[
\int_0^{x} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma} dx' = \int_0^{x} \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\bigg|_{\gamma} dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^{\beta} \left( \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \right) dx \\
&= \int_0^{\beta} \dot\eta(x) \cdot \left( -\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \right) dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1 - \lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x_1^{(n)})$ and terminal point $\bar\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha, \omega, \beta)
&\le \liminf_{n} \mathcal{J}\bigl(\bar\gamma(x_1^{(n)}),\ \bar\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx \\
&= J(\bar\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
E-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
E-mail: cnuzman@lucent.com
E-mail: pawhiting@lucent.com
URL: cm.bell-labs.com/who/nuzman
URL: cm.bell-labs.com/who/pawhiting


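balls at time $x$. Note that $\Gamma^n$ takes values in the set of probability vectors on $I + 2$ points,

\[
S_I = \Bigl\{ \gamma \in \mathbb{R}^{I+2} : \gamma_j \ge 0,\ 0 \le j \le I + 1, \text{ and } \sum_{j=0}^{I+1} \gamma_j = 1 \Bigr\}.
\]

The process $\{\Gamma^n(i/n),\ i = 0, 1, \ldots\}$ is obviously Markovian. It will be convenient to work with the following “dynamical system” representation:

\[
\Gamma^n\Bigl(\frac{i+1}{n}\Bigr) = \Gamma^n\Bigl(\frac{i}{n}\Bigr) + \frac{1}{n}\, b_i\Bigl(\Gamma^n\Bigl(\frac{i}{n}\Bigr)\Bigr),
\]

where the independent and identically distributed random vector fields $\{b_i(\cdot),\ i = 0, 1, \ldots\}$ have distributions

\[
P\{b_i(\gamma) = v\} =
\begin{cases}
\gamma_j, & \text{if } v = e_{j+1} - e_j,\ 0 \le j \le I, \\
\gamma_{I+1}, & \text{if } v = 0.
\end{cases}
\]

Here $e_j$ represents the vector in $\mathbb{R}^{I+2}$ whose $j$th element is unity and for which all other elements are zero. The occupancy after $\lfloor \beta n \rfloor$ balls have been thrown can then be represented by $n\Gamma^n(\beta)$.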
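The dynamical system representation can be illustrated with a short simulation. The following is a minimal sketch, not part of the original analysis: the function name `simulate_occupancy`, the seed handling and the parameter values are assumptions made for the example. It tracks the counts $n\Gamma^n$ directly, so that each ball moves one urn from class $j$ to class $j+1$ with probability $\gamma_j$ and leaves the vector unchanged with probability $\gamma_{I+1}$.

```python
import random


def simulate_occupancy(n, beta, I, seed=0):
    """Simulate Gamma^n(beta) from empty urns (hypothetical helper).

    Returns a list of length I + 2: entry j (j <= I) is the fraction of urns
    holding exactly j balls, and the last entry is the fraction holding more
    than I balls.
    """
    rng = random.Random(seed)
    counts = [0] * (I + 2)
    counts[0] = n  # empty initial condition: alpha = (1, 0, ..., 0)
    for _ in range(int(beta * n)):
        # Pick an urn uniformly at random; it lies in class j with
        # probability counts[j] / n = gamma_j.
        u = rng.randrange(n)
        j, acc = 0, counts[0]
        while u >= acc:
            j += 1
            acc += counts[j]
        if j <= I:
            # Increment v = e_{j+1} - e_j.
            counts[j] -= 1
            counts[j + 1] += 1
        # If j == I + 1 the increment is v = 0 and nothing changes.
    return [c / n for c in counts]


if __name__ == "__main__":
    print(simulate_occupancy(n=20000, beta=1.0, I=2))
```

For large $n$ the output should lie close to the law of large numbers limit, while the large deviation results concern the exponentially small probability that it does not.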

As we have discussed, the large deviations behavior of $\Gamma^n(\beta)$ will be deduced from that of the process $\Gamma^n(\cdot)$. In order to state the large deviation asymptotics precisely, we should clarify the space in which $\Gamma^n(\cdot)$ takes values and the topology used on that space. As usual for processes with jumps of this sort, we use the Skorokhod space $D([0, \beta] : \mathbb{R}^{I+2})$ together with the Skorokhod topology ([5], Chapter 4). However, the large deviations properties of $\{\Gamma^n\}$ are the same as for the processes $\{\tilde\Gamma^n\}$, where $\tilde\Gamma^n$ is defined to be the piecewise linear process which agrees with $\Gamma^n$ at times of the form $i/n$. For readers unfamiliar with the Skorokhod space and associated topology, the identical large deviations results hold for $\{\tilde\Gamma^n\}$, save that the space of continuous functions and the sup norm topology are used instead.

To complete the statement of the LDP for $\{\Gamma^n\}$ we need some additional notation. A vector of (deterministic) occupancy rates $\theta(x) = (\theta_0(x), \ldots, \theta_I(x), \theta_{I+}(x))$ is a measurable mapping from $[0, \beta]$ to $S_I$. Intuitively, these rates represent the rate at which balls flow into urns of a given occupancy level. Associated with each such vector of rates is the corresponding deterministic occupancy function $\gamma(x) = (\gamma_0(x), \ldots, \gamma_I(x), \gamma_{I+}(x))$, which is defined by the initial condition $\gamma(0)$ and the differential equations $\dot\gamma_0(x) = -\theta_0(x)$, $\dot\gamma_j(x) = \theta_{j-1}(x) - \theta_j(x)$ for $j = 1, \ldots, I$, and $\dot\gamma_{I+}(x) = \theta_I(x)$. These equations reflect the idea that the fraction of urns containing $i$ balls increases as balls enter $(i-1)$-occupied urns and decreases as balls enter $i$-occupied urns. Defining the matrix

\[
M = \begin{pmatrix}
-1 & 0 & 0 & \cdots & 0 \\
1 & -1 & 0 & \cdots & 0 \\
0 & 1 & -1 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & \cdots & 1 & -1 & 0 \\
0 & \cdots & 0 & 1 & 0
\end{pmatrix}, \tag{2.1}
\]

we can write $\dot\gamma(x) = M\theta(x)$ for all $x \in [0, \beta]$. Conversely, given a differentiable occupancy function $\gamma$, the corresponding rates are uniquely determined by $\dot\gamma(x) = M\theta(x)$ and the normalization $\sum_{i=0}^{I} \theta_i(x) + \theta_{I+}(x) = 1$. We will also be interested in the cumulative occupancy function $\psi$, with components $\psi_i(x) = \sum_{j=0}^{i} \gamma_j(x)$. Inspecting the cumulative sums of the rows of $M$ shows that $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$. As more balls are thrown, the fraction of urns containing $i$ or fewer balls can only decrease, and the rate of decrease is the rate at which balls enter $i$-occupied urns. We will say that $\gamma$ is a valid occupancy function if $\gamma$ is absolutely continuous, if $\gamma(x)$ is a probability vector for all $x \in [0, \beta]$, and if its associated $\theta(x)$ is a probability vector for almost all $x \in [0, \beta]$. Note that the functions $\psi$, $\gamma$ and $\theta$ are interchangeable, in the sense that each can be derived from any of the others [given $\gamma(0)$ in the case of $\theta$]. Thus we say that $\theta$ and $\psi$ are valid if the associated $\gamma$ is valid. The following lemma gives a direct characterization of validity in terms of $\psi$.

Lemma 2.1. A vector of $I + 1$ continuous functions $\psi$, each of which maps $[0, \beta]$ to $[0, 1]$, is a valid cumulative occupancy path if and only if

(a) $\psi_i(x) \ge \psi_{i-1}(x)$,
(b) $\psi_i(x) \ge \psi_i(y)$,
(c) $\sum_{k=0}^{I} (\psi_k(x) - \psi_k(y)) \le y - x$,

for each $0 \le i \le I$ and $0 \le x < y \le \beta$.

Sketch of the proof. In the forward direction, the first condition plus the bound $\psi_I(x) \le 1$ ensure that $\gamma(x)$ is a probability distribution. The second and third conditions together imply that $\psi$ is Lipschitz continuous with constant 1, and hence absolutely continuous. This implies the absolute continuity of $\gamma$. Since $\dot\psi = -\theta$, the third condition ensures that $\theta$ is almost always a probability distribution. The reverse direction proceeds similarly.
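Conditions (a)-(c) of Lemma 2.1 are easy to test numerically on a grid. The sketch below is illustrative only: the helper names `poisson_cumulative` and `satisfies_lemma_2_1`, the grid size and the tolerance are assumptions, and passing the check on a grid is a necessary condition for validity rather than a proof of it. Checking (b) and (c) on consecutive grid points suffices for all grid pairs, since both conditions telescope.

```python
import math


def poisson_cumulative(I, x):
    """psi_i(x) = sum_{j<=i} e^{-x} x^j / j! for i = 0..I (an example path)."""
    psi, total = [], 0.0
    for j in range(I + 1):
        total += math.exp(-x) * x**j / math.factorial(j)
        psi.append(total)
    return psi


def satisfies_lemma_2_1(psi_fn, beta, I, steps=1000, tol=1e-12):
    """Check conditions (a)-(c) of Lemma 2.1 on a uniform grid.

    psi_fn(x) must return [psi_0(x), ..., psi_I(x)].
    """
    h = beta / steps
    prev = psi_fn(0.0)
    for s in range(steps + 1):
        cur = psi_fn(s * h)
        # Each component must take values in [0, 1].
        if any(not (-tol <= p <= 1 + tol) for p in cur):
            return False
        # (a) psi_i(x) >= psi_{i-1}(x)
        if any(cur[i] < cur[i - 1] - tol for i in range(1, I + 1)):
            return False
        if s > 0:
            # (b) each psi_i is nonincreasing in x
            if any(prev[i] < cur[i] - tol for i in range(I + 1)):
                return False
            # (c) the total decrease over [x, y] is at most y - x
            if sum(prev[i] - cur[i] for i in range(I + 1)) > h + tol:
                return False
        prev = cur
    return True


if __name__ == "__main__":
    print(satisfies_lemma_2_1(lambda x: poisson_cumulative(2, x), 1.5, 2))
```

A path such as $\psi_0(x) = 1 - 2x$ fails condition (c), because it would require balls to enter empty urns at rate 2, faster than balls are thrown.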

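Given $\gamma, \theta \in S_I$, let $D(\theta \,\|\, \gamma)$ denote the relative entropy of $\theta$ with respect to $\gamma$. Thus

\[
D(\theta \,\|\, \gamma) = \sum_{i=0}^{I+1} \theta_i \log(\theta_i/\gamma_i),
\]

with the understanding that $\theta_i \log(\theta_i/\gamma_i) = 0$ whenever $\theta_i = 0$, and that $\theta_i \log(\theta_i/\gamma_i) = \infty$ if $\theta_i > 0$ and $\gamma_i = 0$. If $\gamma(x)$ is a valid occupancy function with corresponding occupancy rates $\theta(x)$, then we set

\[
J(\gamma) = \int_0^{\beta} D(\theta(x) \,\|\, \gamma(x))\,dx. \tag{2.2}
\]

In all other cases set $J(\gamma) = \infty$.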
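The relative entropy and the path cost (2.2) can be sketched in a few lines. This is an illustration under stated assumptions: the function names and the midpoint quadrature rule are choices made for the example, and the two test paths use $I = 0$, so that $\gamma = (\gamma_0, \gamma_{0+})$ and $\theta = (-\dot\gamma_0, 1 + \dot\gamma_0)$.

```python
import math


def relative_entropy(theta, gamma):
    """D(theta || gamma) with the two conventions stated in the text."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue  # convention: 0 log(0/g) = 0
        if g == 0.0:
            return math.inf  # convention: t log(t/0) = infinity for t > 0
        total += t * math.log(t / g)
    return total


def path_cost(theta_fn, gamma_fn, beta, steps=2000):
    """Midpoint-rule approximation of J(gamma) in (2.2)."""
    h = beta / steps
    return h * sum(
        relative_entropy(theta_fn((s + 0.5) * h), gamma_fn((s + 0.5) * h))
        for s in range(steps)
    )


if __name__ == "__main__":
    # Path with theta = gamma: gamma_0(x) = exp(-x), so the cost is zero.
    zero = path_cost(
        lambda x: [math.exp(-x), 1 - math.exp(-x)],
        lambda x: [math.exp(-x), 1 - math.exp(-x)],
        beta=1.0,
    )
    # Straight line from gamma_0 = 1 to the same endpoint exp(-1): constant rate.
    r = 1 - math.exp(-1.0)
    line = path_cost(lambda x: [r, 1 - r], lambda x: [1 - r * x, r * x], beta=1.0)
    print(zero, line)
```

The straight-line path reaches the same endpoint as the zero-cost path but at strictly positive cost, which is the sense in which $J$ penalizes atypical trajectories.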

Theorem 2.2. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha \in S_I$ as $n \to \infty$. Then the sequence of processes $\{\Gamma^n,\ n = 1, 2, \ldots\}$ satisfies the LDP with rate function $J$. In other words, for any measurable set $A$ of trajectories we have the large deviation lower bound

\[
\liminf_{n \to \infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \ge -\inf\{J(\gamma) : \gamma \in A^{\circ},\ \gamma(0) = \alpha\}
\]

and the large deviation upper bound

\[
\limsup_{n \to \infty} \frac{1}{n} \log P\{\Gamma^n \in A\} \le -\inf\{J(\gamma) : \gamma \in \bar{A},\ \gamma(0) = \alpha\},
\]

and moreover, for any compact set of initial conditions $K$ and $C < \infty$, the set

\[
\{\gamma : J(\gamma) \le C,\ \gamma(0) \in K\}
\]

is compact.

The proof of this result is provided in Section 3. However, a formal justification for the form of the rate function is as follows. Let $\delta > 0$ be a small time increment. Owing to the fact that $\Gamma^n$ varies slowly when compared to $b_i(\gamma)$, one expects

\[
P\left( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx \eta \,\Big|\, \Gamma^n(0) = \gamma \right) \approx P\left( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} b_j(\gamma) \approx \eta \right),
\]

where the symbol “$\approx$” inside the probability means that the indicated quantities are within a small fixed constant $\varepsilon > 0$ of each other. Suppose that we interpret $\gamma$ as a probability measure on $\{e_0, \ldots, e_{I+1}\}$, and let $\{Y_j\}$ be independent and identically distributed (i.i.d.) with distribution $\gamma$. Then the sequence of i.i.d. random vectors $\{b_j(\gamma)\}$ can be realized by setting

\[
b_j(\gamma) = \left\{ \begin{array}{l} e_{i+1} - e_i \\ 0 \end{array} \right\} \iff Y_j = \left\{ \begin{array}{l} e_i \\ e_{I+1} \end{array} \right\},
\]

that is,

\[
b_j(\gamma) = M Y_j.
\]

By Sanov's theorem, for any probability vector $\theta \in S_I$,

\[
P\left( \frac{1}{n\delta} \sum_{j=0}^{\lfloor n\delta \rfloor - 1} Y_j \approx \theta \right) \approx \exp\{-n\delta D(\theta \,\|\, \gamma)\}.
\]

Thus

\[
P\left( \frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx M\theta \,\Big|\, \Gamma^n(0) = \gamma \right) \approx \exp\{-n\delta D(\theta \,\|\, \gamma)\}.
\]

Approximating an arbitrary trajectory by a piecewise linear trajectory with nearly equal cost, and using the Markov property, one expects

\[
P(\Gamma^n \approx \gamma \mid \Gamma^n(0) = \alpha) \approx \exp\left\{ -n \int_0^{\beta} D(\theta(t) \,\|\, \gamma(t))\,dt \right\},
\]

where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus the rate function on path space should be $J(\gamma)$.

The zero cost trajectories are, of course, the law of large numbers limits, and can easily be computed. For example, if $\alpha = (1, 0, \ldots)$ (so all cells are initially empty) and $i \le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t} t^i / i!$. In other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass corresponding to $i > I$ is collected together into the state $I + 1$. Throughout this paper we will denote the Poisson distribution with mean $t$ by $\mathcal{P}(t)$, where $\mathcal{P}_i(t) = e^{-t} t^i / i!$.
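The Poisson form of the zero cost trajectory can be checked numerically. Since $D(\theta \,\|\, \gamma) = 0$ forces $\theta = \gamma$, a zero cost path solves $\dot\gamma = M\gamma$. The sketch below integrates this linear system with a classical Runge-Kutta scheme and compares the result with the truncated Poisson distribution; the function names and step count are assumptions made for the example.

```python
import math


def drift(gamma):
    """Return M @ gamma for the matrix M of (2.1), written componentwise."""
    n = len(gamma)  # n = I + 2
    out = [0.0] * n
    out[0] = -gamma[0]
    for j in range(1, n - 1):
        out[j] = gamma[j - 1] - gamma[j]
    out[n - 1] = gamma[n - 2]  # the catch-all class only gains urns
    return out


def zero_cost_path(alpha, t, steps=1000):
    """Integrate d(gamma)/dt = M gamma from gamma(0) = alpha with RK4."""
    gamma = list(alpha)
    h = t / steps
    for _ in range(steps):
        k1 = drift(gamma)
        k2 = drift([g + 0.5 * h * k for g, k in zip(gamma, k1)])
        k3 = drift([g + 0.5 * h * k for g, k in zip(gamma, k2)])
        k4 = drift([g + h * k for g, k in zip(gamma, k3)])
        gamma = [
            g + h * (a + 2 * b + 2 * c + d) / 6
            for g, a, b, c, d in zip(gamma, k1, k2, k3, k4)
        ]
    return gamma


def truncated_poisson(I, t):
    """(P_0(t), ..., P_I(t)) followed by the remaining mass."""
    head = [math.exp(-t) * t**i / math.factorial(i) for i in range(I + 1)]
    return head + [1.0 - sum(head)]


if __name__ == "__main__":
    I, t = 3, 2.0
    numeric = zero_cost_path([1.0] + [0.0] * (I + 1), t)
    exact = truncated_poisson(I, t)
    print(max(abs(a - b) for a, b in zip(numeric, exact)))
```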

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction principle (e.g., [11], Theorem 1.3.2) implies the following variational representation for the rate function for $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$. Let $A(\alpha, \omega, \beta)$ denote the set of valid occupancy paths $\gamma$ on $[0, \beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0),\ n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha$ as $n \to \infty$. Then the sequence of random vectors $\{\Gamma^n(\beta),\ n = 1, 2, \ldots\}$ satisfies the LDP with the convex rate function

\[
\mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma \in A(\alpha, \omega, \beta)\}.
\]

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7 and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is in fact strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation lower bound holds for sets with no interior relative to the ambient space. Such bounds can often be proved for processes, such as ours, that take values in a discrete lattice. An example would be a set $A \subset \{\gamma \in S_I : \gamma_{I+} = 0\}$, for which the minimizing trajectories must be polynomial extremals (defined after Theorem 2.6). Although we do not need such results in the present paper, it is worth observing that lower bounds of this kind can be proved.

2.2. Characterization of the terminal rate function and minimizing paths. The results of this section were obtained using calculus of variations techniques, the details of which are given in the Appendix. The presentation begins with the case in which all urns are initially empty, because it appears in many applications and because it is a building block for the general case. In each case, we first give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional form contains parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha, \omega, \beta)$, where $\alpha, \omega \in S_I$ are the initial and terminal occupancies and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I + 1$ balls, and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I + 1$.

Definition 2.1. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if the corresponding set of valid occupancy paths $A(\alpha, \omega, \beta)$ is nonempty.

Lemma 2.4. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)} \tag{2.3}
\]

and

\[
\sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} \le \sum_{k=0}^{I+1} k\alpha_k + \beta \quad \text{(conservation)}. \tag{2.4}
\]

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
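The two conditions of Lemma 2.4 translate directly into a feasibility test. The following is a minimal sketch in which the function names, the list layout (both $\alpha$ and $\omega$ are passed as lists of length $I + 2$, with the catch-all entry last) and the floating point tolerance are assumptions made for the example. The second helper tests strict monotonicity for all $i < I$, which is the irreducibility condition used for general initial conditions.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (2.3) and conservation (2.4) for (alpha, omega, beta)."""
    I = len(alpha) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:  # (2.3) fails at index i
            return False
    # Lower bound on balls per urn at time beta (left-hand side of (2.4)).
    needed = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    # Initial balls per urn plus balls thrown (right-hand side of (2.4)).
    available = sum(k * alpha[k] for k in range(I + 2)) + beta
    return needed <= available + tol


def is_irreducible(alpha, omega, beta):
    """Feasible, with strict inequality in (2.3) for every i < I."""
    if not is_feasible(alpha, omega, beta):
        return False
    I = len(alpha) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if not cum_alpha > cum_omega:
            return False
    return True


if __name__ == "__main__":
    print(is_feasible([1.0, 0.0, 0.0], [0.5, 0.3, 0.2], beta=1.0))
    print(is_feasible([1.0, 0.0, 0.0], [0.5, 0.3, 0.2], beta=0.5))
```

In the second call the terminal occupancy needs at least 0.7 balls per urn, but only 0.5 are thrown, so the constraint is infeasible.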

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha, \omega, \beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i < I$. Otherwise the constraints are termed reducible.

For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i} \alpha_k = \sum_{k=0}^{i} \omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i + 1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha, \omega, \beta)$ may be divided into a finite number of isolated irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $F(1, \omega, \beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$ and the constraint

\[
\sum_{i=0}^{\infty} i\pi_i = \beta.
\]

In the empty case the conditions for feasibility of $(1, \omega, \beta)$ reduce to $\sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} \le \beta$, from which it follows that the set $F$ is nonempty if and only if $(1, \omega, \beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $A(1, \omega, \beta)$ can be represented as an infimum of a relative entropy over distributions in $F(1, \omega, \beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be determined as

\[
\mathcal{J}(\omega) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, \mathcal{P}(\beta))
\]

if $(1, \omega, \beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^* \in F(1, \omega, \beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show, in fact, that the solution takes the form $\pi^*_i = C\mathcal{P}_i(\rho\beta)$ for $i > I$, for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty} i\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty} \pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the strict equality $\sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I + 1$ if necessary to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I} \omega_i = 0$. In this case it turns out that $F(1, \omega, \beta)$ has only one element; we then have $C = 0$ and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[
\frac{\rho\beta - \sum_{i=0}^{I} i\mathcal{P}_i(\rho\beta)}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{1 - \sum_{i=0}^{I} \omega_i}. \tag{2.5}
\]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I + 1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0, \infty)$ to $(I + 1, \infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[
C = \frac{1 - \sum_{i=0}^{I} \omega_i}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{\rho\beta - \sum_{i=0}^{I} i\mathcal{P}_i(\rho\beta)}. \tag{2.6}
\]

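Evaluating the relative entropy of $\pi^*$ and $\mathcal{P}(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

\[
\mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)(\log C + (1 - \rho)\beta) + \Bigl(\beta - \sum_{i=0}^{I} i\omega_i\Bigr)\log\rho. \tag{2.7}
\]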
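Equations (2.5)-(2.7) can be evaluated numerically with a one-dimensional root search. The following is a minimal sketch for the exponential case only; the function names, the use of bisection, the bracket for $\rho$ and the truncation of the Poisson tail at a fixed number of terms are all assumptions made for the example, not part of the original analysis. The conditional mean is summed directly over the tail to avoid cancellation when $\rho\beta$ is small.

```python
import math


def poisson_pmf(i, mean):
    return math.exp(-mean) * mean**i / math.factorial(i)


def tail_conditional_mean(mean, I, terms=400):
    """E[Y | Y > I] for Y ~ Poisson(mean): the left-hand side of (2.5)."""
    term, num, den = 1.0, 0.0, 0.0  # term is proportional to mean**i / i!
    for i in range(I + 1, I + 1 + terms):
        num += i * term
        den += term
        term *= mean / (i + 1)
    return num / den


def solve_twist(omega, beta):
    """Return (rho, C) for omega = [omega_0, ..., omega_I], exponential case."""
    I = len(omega) - 1
    mass = 1.0 - sum(omega)  # this is omega_{I+}
    excess = beta - sum(i * w for i, w in enumerate(omega))
    if mass <= 0 or excess / mass <= I + 1:
        raise ValueError("need strict inequality in (2.4) and omega_{I+} > 0")
    target = excess / mass  # right-hand side of (2.5)
    lo, hi = 1e-12, 1.0
    while tail_conditional_mean(hi * beta, I) <= target:
        hi *= 2.0
    for _ in range(200):  # bisection on the increasing map rho -> E[Y | Y > I]
        mid = 0.5 * (lo + hi)
        if tail_conditional_mean(mid * beta, I) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = mass / (1.0 - sum(poisson_pmf(i, rho * beta) for i in range(I + 1)))
    return rho, C


def terminal_rate(omega, beta):
    """Evaluate (2.7) for empty initial conditions."""
    rho, C = solve_twist(omega, beta)
    mass = 1.0 - sum(omega)
    excess = beta - sum(i * w for i, w in enumerate(omega))
    head = sum(
        w * math.log(w / poisson_pmf(i, beta)) for i, w in enumerate(omega) if w > 0
    )
    return head + mass * (math.log(C) + (1.0 - rho) * beta) + excess * math.log(rho)


if __name__ == "__main__":
    print(solve_twist([0.5, 0.3], beta=1.0))
    print(terminal_rate([0.5, 0.3], beta=1.0))
```

When $\omega$ is itself the truncated Poisson distribution with mean $\beta$, the search should return $\rho = 1$ and $C = 1$ up to rounding, and the rate is zero, consistent with the law of large numbers.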

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1, \omega, \beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $A(1, \omega, \beta)$ is achieved on the occupancy path $\gamma \in A(1, \omega, \beta)$ defined by

\[
\gamma_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C\mathcal{P}_k(\rho\beta)) \Bigl(1 - \frac{x}{\beta}\Bigr)^{k}, \tag{2.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!} (-1)^i \gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x),
\]

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$, $\gamma_0(x + y) = \gamma_0(x) + y\gamma_0^{(1)}(x) + \cdots$, evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term, and we say that $\gamma$ is a polynomial extremal. Otherwise, when $C > 0$, we have an exponential extremal.
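The path in (2.8)-(2.9) can be evaluated in closed form, because applying $(x^i/i!)(-1)^i \, d^i/dx^i$ to $Ce^{-\rho x}$ gives $C\mathcal{P}_i(\rho x)$, and applying it to $(1 - x/\beta)^k$ gives the binomial term $\binom{k}{i}(x/\beta)^i(1 - x/\beta)^{k-i}$ for $k \ge i$. The sketch below uses this observation; the function name and argument layout are assumptions made for the example, and the twist parameters $\rho$ and $C$ are taken as inputs that are assumed to have been computed beforehand from (2.5) and (2.6).

```python
import math


def poisson_pmf(i, mean):
    return math.exp(-mean) * mean**i / math.factorial(i)


def extremal_path(x, omega, beta, rho, C):
    """Evaluate (gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)) from (2.8)-(2.9).

    omega = [omega_0, ..., omega_I]; rho and C are the twist parameters,
    assumed to have been computed already from (2.5) and (2.6).
    """
    I = len(omega) - 1
    # Polynomial coefficients omega_k - C P_k(rho beta) appearing in (2.8).
    a = [omega[k] - C * poisson_pmf(k, rho * beta) for k in range(I + 1)]
    u = x / beta
    gamma = []
    for i in range(I + 1):
        # Binomial form of (x^i / i!) (-1)^i d^i/dx^i applied to (1 - x/beta)^k.
        poly = sum(
            a[k] * math.comb(k, i) * u**i * (1.0 - u) ** (k - i)
            for k in range(i, I + 1)
        )
        # The same operator applied to C exp(-rho x) gives C P_i(rho x).
        gamma.append(C * poisson_pmf(i, rho * x) + poly)
    gamma.append(1.0 - sum(gamma))
    return gamma


if __name__ == "__main__":
    beta, I = 2.0, 2
    # Zero-cost example: omega is the truncated Poisson law, so rho = C = 1.
    omega = [poisson_pmf(i, beta) for i in range(I + 1)]
    print(extremal_path(0.7, omega, beta, rho=1.0, C=1.0))
```

In this example the polynomial coefficients vanish and the path reduces to the truncated Poisson distribution with mean $x$, as expected for a zero cost trajectory.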

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k + j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $S_\infty$ denote the set of distributions on the nonnegative integers, and let $S_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $S_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)}\}$, where for any $k \in \mathcal{K}$ a component distribution is denoted $\pi^{(k)} = \{\pi_{k0}, \pi_{k1}, \ldots\}$, and it is understood that the corresponding $\pi^{(k)}$ is omitted if $\alpha_k = 0$. Finally, let $F(\alpha, \omega, \beta)$ be the set of $\pi \in S_\infty^{|\mathcal{K}|}$ which satisfy the terminal constraints

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I, \tag{2.10}
\]

along with the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta. \tag{2.11}
\]

As in the empty case, it may be established that $F$ is nonempty if and only if $(\alpha, \omega, \beta)$ is feasible.

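Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed as

\[
\mathcal{J}(\omega) = \min_{\pi \in F(\alpha, \omega, \beta)} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta))
\]

whenever $(\alpha, \omega, \beta)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in F(\alpha, \omega, \beta)$ is unique.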

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k \mathcal{P}_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I, \\
C_k \mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I.
\end{cases}
\]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[
\pi^*_{kj} =
\begin{cases}
D_k \mathcal{P}_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \le I, \\
0, & k + j > I.
\end{cases}
\]

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_i$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in \mathcal{K}$, denote the mean of $\pi^{*(k)}$ by $\beta_k = \sum_{j=0}^{\infty} j\pi^*_{kj}$, and define the terminal condition $\omega^{(k)} \in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0, \ldots, I - k$. Then the constraints $(1, \omega^{(k)}, \beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma^{(k)} = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha, \omega, \beta)$, let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0, \beta_k] \to [0, 1]$ be the minimizing paths corresponding to the subproblems $(1, \omega^{(k)}, \beta_k)$. The infimum of $J(\gamma)$ over $A(\alpha, \omega, \beta)$ is achieved on the occupancy path $\gamma \in A(\alpha, \omega, \beta)$ defined by

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).
\]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3 Proof of Theorem 22 The purpose of this section is to prove themain large deviations result We recall the processes and notation definedat the beginning of Section 2

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define

\[
H(\gamma, \zeta) = \log(E[\exp(\zeta \cdot b_i(\gamma))]),
\]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma, \zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma, \zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1, 2, \dots\}$ satisfies a large deviation upper bound with a rate function $\bar{J}$, which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma, \zeta)$ in $\zeta$:

\[
L(\gamma, \eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma, \zeta)].
\]

If $\{\gamma(x), 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then

\[
\bar{J}(\gamma) = \int_0^{\beta} L(\gamma(x), \dot{\gamma}(x))\,dx.
\]

If $\gamma$ is not absolutely continuous, then $\bar{J}(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set, with the bound $H(\gamma, \zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma, \zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar{J} = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma, \eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[
L(\gamma, M\theta) = D(\theta \| \gamma).
\]


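It is well known that $L(\gamma, \eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma, \eta) < \infty$, then $\eta$ can be written as a convex combination of the form

\[
\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I),
\]

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \dots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

\[
\begin{aligned}
L(\gamma, M\theta)
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \bigl[\zeta \cdot M\theta - \log(E[\exp(\zeta \cdot b_i(\gamma))])\bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Biggl[ M^T\zeta \cdot \theta - \log\Biggl(\Biggl[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Biggr] + \gamma_{I+1}\Biggr)\Biggr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\Biggl(\Biggl[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Biggr] + \gamma_{I+1}\Biggr)\Biggr].
\end{aligned}
\]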

Given any values $\mu_0, \dots, \mu_{I+1}$, we can define $\zeta_0, \dots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions, it is apparent that the last display is equal to

\[
\begin{aligned}
&\sup_{\mu \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\Biggl(\Biggl[\sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1})\Biggr] + \gamma_{I+1}\Biggr)\Biggr] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\Biggl(\sum_{j=0}^{I+1} \gamma_j \exp \mu_j\Biggr)\Biggr] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\Biggl(\sum_{j=0}^{I+1} \gamma_j \exp \mu_j\Biggr)\Biggr].
\end{aligned}
\]

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \| \gamma)$, thereby completing the proof of the upper bound.
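The variational identity used in this last step can be checked numerically. The following Python sketch is an illustration only, not part of the proof: the vectors `theta` and `gamma` are assumed example values, and the candidate maximizer $\mu_j = \log(\theta_j/\gamma_j)$ is the standard one for this concave problem. It evaluates the bracketed expression at that point and at random perturbations of it.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors, assuming gamma > 0 componentwise."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """The bracket in the last display: mu . theta - log(sum_j gamma_j exp(mu_j))."""
    linear = sum(m * t for m, t in zip(mu, theta))
    return linear - math.log(sum(g * math.exp(m) for m, g in zip(mu, gamma)))

# Assumed example values with I + 2 = 4 components (not taken from the paper).
theta = [0.1, 0.2, 0.3, 0.4]
gamma = [0.4, 0.3, 0.2, 0.1]

# Candidate maximizer mu_j = log(theta_j / gamma_j).
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
d = relative_entropy(theta, gamma)
value_at_mu_star = dv_objective(mu_star, theta, gamma)

# The objective is concave in mu, so perturbed points should not exceed the value at mu_star.
rng = random.Random(0)
best_perturbed = max(
    dv_objective([m + rng.uniform(-2.0, 2.0) for m in mu_star], theta, gamma)
    for _ in range(1000)
)
```

In this sketch `value_at_mu_star` agrees with `d` up to rounding, and no perturbed point does better, which is what the supremum representation predicts.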

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Bigl( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \Bigr) \ge -J(\gamma) - \varepsilon. \tag{3.1}
\]

Of course, this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > b x^K$ for all $j = 0, 1, \dots, I, I+$ and $0 < x \le \beta$, and such that

\[
J(y) \le J(\gamma).
\]

Consider the zero cost trajectory defined by

\[
\dot{z}(x) = M z(x), \qquad z(0) = \gamma(0).
\]

We have the following expression for $z_j$ when $j \le I$:

\[
z_j(x) = \Biggl[ \sum_{k=0}^{j} \gamma_k(0) \frac{x^{j-k}}{(j-k)!} \Biggr] e^{-x}. \tag{3.2}
\]

It is easy to check from this explicit formula that $z_j(x) > b x^K$ for some $b > 0$, $K = I$ and all $j = 0, 1, \dots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0,1)$, let $y^{\rho} = \rho z + (1 - \rho)\gamma$. Then $y^{\rho}$ is the occupancy function that corresponds to the rate $\rho z + (1 - \rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \| z) = 0$, we have

\[
\begin{aligned}
J(y^{\rho}) &= \int_0^{\beta} D\bigl(\rho z(x) + (1 - \rho)\theta(x) \,\big\|\, \rho z(x) + (1 - \rho)\gamma(x)\bigr)\,dx \\
&\le \rho \int_0^{\beta} D(z(x) \| z(x))\,dx + (1 - \rho) \int_0^{\beta} D(\theta(x) \| \gamma(x))\,dx \\
&= (1 - \rho) J(\gamma).
\end{aligned}
\]

All required properties are then obtained by letting $y = y^{\rho}$ for suitably small $\rho \in (0,1)$.

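It follows that in proving the lower bound, we can assume without loss of generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge b x^K$ for all $x \in [0, \beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0, \beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

\[
h(\gamma) =
\begin{cases}
0, & |\gamma - \gamma(\tau)| < \sigma/2, \\
2\varepsilon, & \text{else.}
\end{cases}
\]

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor / n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0, \beta]$), we have the inequality

\[
\begin{aligned}
&P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\lfloor n\tau \rfloor / n)| < \sigma\bigr) + e^{-2n\varepsilon} \\
&\qquad \ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon} \\
&\qquad \ge E_{\Gamma^n(0)}\bigl(\exp -n h(\Gamma^n(\lfloor n\tau \rfloor / n))\bigr).
\end{aligned}
\]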

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar{\Gamma}^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar{b}^n_i$:

\[
\bar{\Gamma}^n\Bigl(\frac{i+1}{n}\Bigr) = \bar{\Gamma}^n\Bigl(\frac{i}{n}\Bigr) + \frac{1}{n}\bar{b}^n_i, \qquad \bar{\Gamma}^n(0) = \Gamma^n(0).
\]

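Furthermore, the distribution of $\bar{b}^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar{\Gamma}^n(j/n), 0 \le j \le i\}$. Let $\mu_{\gamma}$ denote the distribution of $b_i(\gamma)$ and (without explicitly exhibiting all the dependencies) let $\bar{\mu}^n_i$ denote the (random) distribution of $\bar{b}^n_i$ given $\{\bar{\Gamma}^n(j/n), 0 \le j \le i\}$. We let $\bar{E}_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl(\exp -n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr)\Bigr)
= \inf \bar{E}_{\Gamma^n(0)}\Biggl[ h\Bigl(\bar{\Gamma}^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D\bigl(\bar{\mu}^n_i \,\big\|\, \mu_{\bar{\Gamma}^n(i/n)}\bigr) \Biggr],
\]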

where the infimum is over all such processes $\bar{\Gamma}^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar{b}^n_i$. We can write $\gamma(\tau) - \gamma(0) = M v \tau$ for some probability vector $v$. Define a process $\bar{b}^n_i$ as follows:

\[
\bar{b}^n_i =
\begin{cases}
e_j - e_{j-1}, & \text{if } \Bigl\lfloor \sum_{k=0}^{j-1} v_k \tau n \Bigr\rfloor \le i \le \Bigl\lfloor \sum_{k=0}^{j} v_k \tau n \Bigr\rfloor - 1 \text{ for } 0 \le j \le I, \\[4pt]
0, & \text{if } \Bigl\lfloor \sum_{k=0}^{I} v_k \tau n \Bigr\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}
\]

In other words, $\bar{b}^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$ and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar{\mu}^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[
D\bigl(\bar{\mu}^n_i \,\big\|\, \mu_{\bar{\Gamma}^n(i/n)}\bigr) = \log\Bigl(\frac{1}{\bar{\Gamma}^n_{j-1}(i/n)}\Bigr) = -\log\Bigl(\bar{\Gamma}^n_{j-1}\Bigl(\frac{i}{n}\Bigr)\Bigr). \tag{3.3}
\]

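The process $\bar{\Gamma}^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar{\Gamma}^n_{j-1}((i+1)/n) - \bar{\Gamma}^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k \tau n \rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor - 1$,

\[
\bar{\Gamma}^n_{j-1}\Bigl(\frac{i}{n}\Bigr) \downarrow \bar{\Gamma}^n_{j-1}\Biggl(\frac{1}{n} \Bigl\lfloor \sum_{k=0}^{j} v_k \tau n \Bigr\rfloor\Biggr)
\]

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$, it follows that

\[
\bar{\Gamma}^n_{j-1}\Biggl(\frac{1}{n} \Bigl\lfloor \sum_{k=0}^{j} v_k \tau n \Bigr\rfloor\Biggr) \to \gamma_{j-1}(\tau)
\]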

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \dots, I$. Thus, at any given time step $i$, we have a strictly positive lower bound on the relevant component of $\bar{\Gamma}^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (and independent of $\tau$) such that whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

\[
D\bigl(\bar{\mu}^n_i \,\big\|\, \mu_{\bar{\Gamma}^n(i/n)}\bigr) \le C_1[-\log \tau^K] \le -C_2 \log \tau.
\]

In addition, as $n \to \infty$ and $\eta \to 0$,

\[
\bar{\Gamma}^n\Bigl(\frac{1}{n} \lfloor \tau n \rfloor\Bigr) \to \gamma(\tau).
\]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl(\exp -n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr)\Bigr) \le -C_2 \tau \log \tau.
\]

We now choose $\tau > 0$ so that $-C_2 \tau \log \tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0, \tau]$ w.p.1, if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Biggl( \Bigl|\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr) - \gamma\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr| < \sigma,\ \sup_{x \in [0, \tau]} |\Gamma^n(x) - \gamma(x)| < \delta \Biggr) \ge -\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0, \beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0, \delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau, \beta]$. Recall that when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \dots, I+1\}$ (i.e., $\gamma \in S_I^{\circ}$, where $S_I^{\circ}$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \dots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\lfloor n\tau \rfloor / n)| < \sigma$,

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau \rfloor / n),\, \lfloor n\tau \rfloor / n}\Bigl( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \Bigr) \ge -J(\gamma) - \frac{\varepsilon}{2},
\]

where $P_{\Gamma^n(\lfloor n\tau \rfloor / n),\, \lfloor n\tau \rfloor / n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau \rfloor / n)$ at time $\lfloor n\tau \rfloor / n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10], and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest, which can be solved by relatively straightforward generalizations of the results presented in this paper.


In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[
\mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega).
\]

Using Theorem 2.7, we can write

\[
\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega \in \Omega}\ \inf_{\pi \in \mathcal{F}(\alpha, \omega, \beta)} \sum_{k=0}^{K} \alpha_k D(\pi(k) \| P(\beta)) \\
&= \inf_{\pi \in \mathcal{F}(\alpha, \Omega, \beta)} \sum_{k=0}^{K} \alpha_k D(\pi(k) \| P(\beta)),
\end{aligned}
\tag{4.1}
\]

where we have abused notation to define $\mathcal{F}(\alpha, \Omega, \beta) = \bigcup_{\omega \in \Omega} \mathcal{F}(\alpha, \omega, \beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^{\circ}$ and closure $\bar{\Omega}$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega \in \Omega^{\circ}} \mathcal{J}(\omega) = \inf_{\omega \in \bar{\Omega}} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n} \log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.


4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6), it is immediate that $C = 1/\rho$ and

\[
\gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}.
\]

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

\[
\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.
\]

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value, $P_0(3.0) = 0.05$.
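The parameters of this example are easy to reproduce numerically. The following Python sketch is an illustration, not the authors' code: it solves $\rho(1 - \omega_0) = 1 - e^{-\beta\rho}$ by bisection for the values $\omega_0 = 0.15$, $\beta = 3$ quoted above, and then evaluates the conditioned paths $\gamma_0(x)$ and $\gamma_i(x) = P_i(\rho x)/\rho$. The function names and the bracketing interval are assumptions made for the sketch.

```python
import math

def solve_rho(omega0, beta, tol=1e-12):
    """Positive root of rho * (1 - omega0) = 1 - exp(-beta * rho), by bisection."""
    def f(rho):
        return -math.expm1(-beta * rho) - rho * (1.0 - omega0)

    # f is concave with f(0) = 0, and f(1 / (1 - omega0)) < 0 because 1 - exp(.) < 1.
    lo, hi = 1e-9, 1.0 / (1.0 - omega0)
    if f(lo) <= 0.0:
        raise ValueError("no positive root: need beta > 1 - omega0")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma(i, x, rho):
    """Conditioned occupancy path: gamma_0 as displayed, gamma_i = P_i(rho x) / rho for i > 0."""
    if i == 0:
        return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho
    lam = rho * x
    return math.exp(-lam) * lam ** i / math.factorial(i) / rho

omega0, beta = 0.15, 3.0
rho = solve_rho(omega0, beta)
terminal_empty = gamma(0, beta, rho)                      # should reproduce omega0
total_mass = sum(gamma(i, beta, rho) for i in range(60))  # occupancies should sum to 1
```

With these inputs the root lies slightly above 1, the terminal value of $\gamma_0$ matches $\omega_0$, and the occupancy fractions sum to 1 up to truncation error.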

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[
w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),
\]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

\[
w(r) = r - nI + n \sum_{i=0}^{I} (I - i)\Gamma_i(r/n).
\]
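The second expression for $w(r)$ is a purely combinatorial identity, so it can be checked on any simulated configuration. The following Python sketch is an illustration with assumed values of $n$, $r$ and $I$: it throws $r$ balls uniformly into $n$ infinite-capacity urns, counts the overflow directly, and compares the result with the formula, using the fact that $n\Gamma_i(r/n)$ is the number of urns holding exactly $i$ balls.

```python
import random
from collections import Counter

def overflow_two_ways(n, r, I, seed=0):
    """Return (direct overflow count, value of r - n I + n sum_i (I - i) Gamma_i)."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1   # each ball picks an urn uniformly at random

    # Direct count: balls beyond capacity I in each urn.
    direct = sum(max(c - I, 0) for c in counts)

    # Formula: hist[i] = n * Gamma_i(r / n), the number of urns with exactly i balls.
    hist = Counter(counts)
    via_formula = r - n * I + sum((I - i) * hist.get(i, 0) for i in range(I + 1))
    return direct, via_formula

# Assumed example sizes (not from the paper).
direct, via_formula = overflow_two_ways(n=1000, r=3000, I=3)
```

Both counts are integers and agree exactly for every seed, since the identity does not depend on the randomness.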

In order to compute

\[
J_O(\eta, \beta) \doteq -\lim_{n} \frac{1}{n} \log P\Bigl( \frac{w(\beta n)}{n} > \eta \Bigr),
\]

we therefore consider sample paths which satisfy the end constraint

\[
\sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta \doteq \zeta. \tag{4.2}
\]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta, \beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

\[
\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad \text{and} \qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3}
\]


We introduce the Lagrangian

\[
L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log \frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr) + \lambda\Bigl(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\Bigr),
\]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[
\log \pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+ \lambda.
\]

In terms of variables $C$, $\rho$ and $\nu$, we may write

\[
\pi^*_i = C P_i(\rho\beta) \qquad \text{for } i \ge I
\]

and

\[
\pi^*_i = C P_i(\rho\beta) \nu^{I-i} \qquad \text{for } i \le I.
\]

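The distribution is conditionally Poisson with parameter $\rho\beta/\nu$ for $i \le I$, and conditionally Poisson with parameter $\rho\beta$ for $i \ge I$. For convenience, we introduce the notation

\[
Q_I(\rho) \doteq \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} i P_i(\rho\beta),
\]
\[
R_I(\rho, \nu) \doteq \nu^I e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\Bigl(\frac{\rho\beta}{\nu}\Bigr) = \frac{\nu^{I+1}}{\rho\beta} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} i P_i\Bigl(\frac{\rho\beta}{\nu}\Bigr).
\]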

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[
C(R_I(\rho, \nu) + Q_I(\rho)) = 1,
\]
\[
C\Bigl(\frac{\rho\beta}{\nu} R_I(\rho, \nu) + \rho\beta Q_I(\rho)\Bigr) = \beta,
\]
\[
C\Bigl(I P_I(\rho\beta) + \Bigl(I - \frac{\rho\beta}{\nu}\Bigr) R_I(\rho, \nu)\Bigr) = \zeta.
\]

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \| P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[
(\rho/\nu - 1) R_I(\rho, \nu) + (\rho - 1) Q_I(\rho) = 0,
\]
\[
(I - \rho\beta/\nu - \zeta) R_I(\rho, \nu) - \zeta Q_I(\rho) + I P_I(\rho\beta) = 0.
\]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

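The large deviations exponent for the overflow problem may then be expressed as

\[
\begin{aligned}
J_O(\eta, \beta) &= D(\pi^* \| P(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl(C e^{\beta - \rho\beta} \rho^i \nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1 - \rho) + \beta \log \rho + \zeta \log \nu.
\end{aligned}
\]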
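The closed-form exponent and the reduced equations for $(\rho, \nu)$ can be cross-checked without a root finder by working backwards. The following Python sketch is an illustration under assumed inputs: it fixes $I$, the product $\rho\beta$ (called `lam`) and $\nu$, builds $\pi^*$ from the stated form, reads off the values of $\beta$, $\rho$ and $\zeta$ that this distribution satisfies, and then compares the relative entropy computed directly with the closed-form expression. The truncation point `N` for the infinite sums is also an assumption.

```python
import math

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

# Assumed example inputs (not from the paper); lam plays the role of rho * beta.
I, lam, nu = 3, 3.0, 1.5
N = 100  # truncation point for the infinite sums

# pi*_i = C P_i(rho beta) nu^{(I - i)^+}, with C fixed by normalization.
weights = [poisson_pmf(i, lam) * nu ** max(I - i, 0) for i in range(N + 1)]
C = 1.0 / sum(weights)
pi = [C * w for w in weights]

# Constraint values satisfied by this distribution, as in (4.3).
beta = sum(i * p for i, p in enumerate(pi))
rho = lam / beta
zeta = sum((I - i) * pi[i] for i in range(I + 1))

# Relative entropy D(pi* || P(beta)), directly and via the closed form.
direct = sum(p * math.log(p / poisson_pmf(i, beta)) for i, p in enumerate(pi))
closed_form = (math.log(C) + beta * (1 - rho) + beta * math.log(rho)
               + zeta * math.log(nu))

# Residuals of the two reduced equations for (rho, nu).
Q = sum(poisson_pmf(i, lam) for i in range(I, N + 1))
R = sum(poisson_pmf(i, lam) * nu ** (I - i) for i in range(I))
residual_mean = (rho / nu - 1) * R + (rho - 1) * Q
residual_spare = ((I - rho * beta / nu - zeta) * R - zeta * Q
                  + I * poisson_pmf(I, lam))
```

In this example $\nu > 1$, and the recovered $\rho$ satisfies $\nu > \rho > 1$, in line with the remark on larger than expected spare capacity. To solve for $(\rho, \nu)$ from given $(\beta, \zeta)$ instead, one would pass the two residuals to a two-dimensional root finder.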

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[
J_C(\alpha, \beta, \xi) \doteq -\lim_{n} \frac{1}{n} \log P\Biggl( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \Biggr),
\]

where

\[
\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta)
\]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi,
\]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega_{kj} = \pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W, & j + k \le I, \\
C_k P_j(\rho\beta), & j + k > I.
\end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[
J_C(\alpha, \beta, \xi) = \beta(1 - \rho + \log \rho) + \xi \log W + \sum_{k=0}^{K} \alpha_k \log C_k.
\]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types, and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
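The zero-cost figures quoted in this example follow directly from the Poisson expression $\sum_k \alpha_k \sum_{i=0}^{I-k} P_i(\beta)$. The following Python sketch is an illustration (the function names are assumptions): it evaluates that expression for $\alpha = [0.5, 0.3, 0.2]$, $\beta = 2$, $I = 3$, and converts the quoted exponent into a rough probability via $e^{-n J_C}$, ignoring subexponential factors.

```python
import math

def poisson_cdf(m, lam):
    """Sum of Poisson probabilities P_0(lam) + ... + P_m(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(m + 1))

def expected_low_occupancy(alpha, beta, I):
    """Zero-cost fraction of urns with at most I balls: sum_k alpha_k sum_{i <= I-k} P_i(beta)."""
    return sum(a * poisson_cdf(I - k, beta)
               for k, a in enumerate(alpha) if I - k >= 0)

# Values quoted in the example above.
alpha, beta, I, n = [0.5, 0.3, 0.2], 2.0, 3, 100
psi3 = expected_low_occupancy(alpha, beta, I)   # fraction of types with 3 or fewer coupons
expected_types_done = n * (1.0 - psi3)          # types with 4 or more coupons

# Rough probability from the quoted exponent J_C of about 0.18.
rough_probability = math.exp(-n * 0.18)
```

The computed `psi3` is close to 0.71, so about 29 of the 100 types are expected to reach four coupons, and `rough_probability` is of order $10^{-8}$.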

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \dots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

A.1. Preliminaries.

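A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \dots, I \qquad \text{(monotonicity)} \tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \qquad \text{(conservation)}. \tag{A.5}
\]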
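The two conditions translate directly into a feasibility test. The following Python sketch is an illustration: the function name, the argument layout (vectors of length $I + 2$ whose last entry is the $I+$ component) and the numerical tolerance are assumptions made for the sketch.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (A.4) and conservation (A.5) for endpoint constraints.

    alpha, omega: sequences of length I + 2; the last entry is the I+ component.
    beta: number of balls thrown per urn.
    """
    I = len(alpha) - 2

    # Monotonicity (A.4): partial sums of alpha dominate partial sums of omega.
    alpha_cum = omega_cum = 0.0
    for i in range(I + 1):
        alpha_cum += alpha[i]
        omega_cum += omega[i]
        if alpha_cum < omega_cum - tol:
            return False

    # Conservation (A.5): the terminal configuration cannot need more balls than are thrown.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol

# Assumed examples with I = 1, starting from empty urns.
ok = is_feasible([1.0, 0.0, 0.0], [0.15, 0.30, 0.55], beta=3.0)
too_few_balls = is_feasible([1.0, 0.0, 0.0], [0.15, 0.30, 0.55], beta=1.0)
not_monotone = is_feasible([0.5, 0.5, 0.0], [0.6, 0.2, 0.2], beta=3.0)
```

The first example is feasible, the second violates conservation (the terminal configuration needs at least 1.4 balls per urn), and the third violates monotonicity (the fraction of empty urns cannot increase).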

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I} \Biggl( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \Biggr)
&= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \dots, I,
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot{\psi}_i \le 1$. Indeed, $-\beta \sum_{i=0}^{I} \dot{\psi}_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot{\psi}_i \le 1$ follows from (A.5).

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot{\gamma}$, $\dot{\psi}$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot{\psi}_i = -\theta_i$ for $i = 0, \dots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot{\psi}$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx.
\]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi, \theta) &= D(\theta \| \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log \frac{\theta_i}{\gamma_i} + \theta_{I+} \log \frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log \frac{\theta_i}{\psi_i - \psi_{i-1}} + \Bigl(1 - \sum_{i=0}^{I} \theta_i\Bigr) \log \frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{aligned}
\]


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
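The local rate function is straightforward to evaluate from a cumulative occupancy vector and a rate vector. The following Python sketch is an illustration: the function name is an assumption, the convention $\psi_{-1} = 0$ (so that $\gamma_0 = \psi_0$) is assumed, and the handling of invalid inputs by returning infinity is a simplification of the convention just stated.

```python
import math

def local_rate(psi, theta):
    """L(psi, theta) = D(theta || gamma), with gamma_i = psi_i - psi_{i-1} and psi_{-1} = 0.

    psi:   cumulative occupancies (psi_0, ..., psi_I)
    theta: rates (theta_0, ..., theta_I); theta_{I+} is the remainder 1 - sum(theta)
    """
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, len(psi))]
    gamma.append(1.0 - psi[-1])                       # gamma_{I+}
    theta_full = list(theta) + [1.0 - sum(theta)]     # append theta_{I+}

    cost = 0.0
    for t, g in zip(theta_full, gamma):
        if t < 0.0 or g < 0.0:
            return math.inf          # not a valid rate or occupancy vector
        if t == 0.0:
            continue                 # convention 0 log 0 = 0
        if g == 0.0:
            return math.inf          # positive rate out of an empty occupancy class
        cost += t * math.log(t / g)
    return cost

# Assumed example with I = 2: gamma = (0.2, 0.3, 0.4, 0.1).
psi = [0.2, 0.5, 0.9]
zero_cost = local_rate(psi, [0.2, 0.3, 0.4])   # theta equal to gamma
positive_cost = local_rate(psi, [0.4, 0.3, 0.2])
```

The cost vanishes when the rates match the expected rates ($\theta = \gamma$) and is strictly positive otherwise.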

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8],

\[
\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx} \frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))
\]

for all $i \in \{0, \dots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

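In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[ -\log \frac{\theta_i}{\psi_i - \psi_{i-1}} + \log \frac{\theta_{I+}}{1 - \psi_I} \Bigr] \tag{A.7}
\]

for $i = 0, \dots, I - 1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\Bigl[ -\log \frac{\theta_I}{\psi_I - \psi_{I-1}} + \log \frac{\theta_{I+}}{1 - \psi_I} \Bigr]. \tag{A.8}
\]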

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[
L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log \frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[ -\log \frac{\theta_i}{\psi_i - \psi_{i-1}} \Bigr] \tag{A.10}
\]

for $i = 0, \dots, I - 1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1,
\]
\[
\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} i C P_i(\rho\beta) = \beta.
\]

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C P_k(\rho\beta)) \Bigl(1 - \frac{x}{\beta}\Bigr)^k, \tag{A.8}
\]
\[
\gamma_i(x) = \frac{x^i}{i!} (-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^k. \tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.
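For a concrete exponential extremal, the twist parameters can be found with a one-dimensional search. The following Python sketch is an illustration, not the authors' procedure: dividing the second twist equation by the first eliminates $C$ and leaves the requirement that the mean of a Poisson($\rho\beta$) variable conditioned to exceed $I$ equals a known target, which the code solves by bisection (assuming, as is standard, that this conditional mean increases with $\rho\beta$). The endpoint data `omega`, `beta` and all function names are assumptions. The derivative formula used in `signed_derivative` is the one obtained by differentiating (A.8) term by term.

```python
import math

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam ** i / math.factorial(i)

def tail(lam, I):
    """Mass and first moment of Poisson(lam) restricted to {I+1, I+2, ...}."""
    head = [poisson_pmf(i, lam) for i in range(I + 1)]
    return 1.0 - sum(head), lam - sum(i * p for i, p in enumerate(head))

def twist_parameters(omega, beta, tol=1e-12):
    """Solve the two twist equations for (C, rho); omega = (omega_0, ..., omega_I)."""
    I = len(omega) - 1
    mass_left = 1.0 - sum(omega)                                # mass above level I
    mean_left = beta - sum(i * w for i, w in enumerate(omega))  # balls carried above level I
    if mass_left <= 0.0 or mean_left <= (I + 1) * mass_left:
        raise ValueError("not in the exponential case")
    target = mean_left / mass_left

    def cond_mean(lam):
        mass, mean = tail(lam, I)
        return mean / mass

    lo, hi = 1e-3, 1.0
    while cond_mean(hi) < target:      # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return mass_left / tail(lam, I)[0], lam / beta

def signed_derivative(i, x, omega, beta, C, rho):
    """(-1)^i psi_0^{(i)}(x) for 0 <= i <= I, by term-by-term differentiation of (A.8)."""
    I = len(omega) - 1
    poly = sum((omega[k] - C * poisson_pmf(k, rho * beta))
               * math.factorial(k) / math.factorial(k - i)
               * (1.0 - x / beta) ** (k - i) / beta ** i
               for k in range(i, I + 1))
    return rho ** i * C * math.exp(-rho * x) + poly

def gamma(i, x, omega, beta, C, rho):
    """Occupancy path gamma_i(x) from (A.9)."""
    return x ** i / math.factorial(i) * signed_derivative(i, x, omega, beta, C, rho)

# Assumed example: I = 1, two balls per urn, 20% empty and 30% singly occupied at the end.
omega, beta = [0.2, 0.3], 2.0
C, rho = twist_parameters(omega, beta)
```

With these inputs the sketch gives $\psi_0(0) = 1$, $\psi_0^{(1)}(0) = -1$ and $\gamma_i(\beta) = \omega_i$ up to the bisection tolerance. The lower bracket `1e-3` is a simplification and would need adjusting for targets very close to $I + 1$.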

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\phi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[
(-1)^i \phi^{(i)}(x) \ge 0 \qquad \text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11}
\]


This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \dots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \dots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} (\omega_k - C P_k(\rho\beta))\, \beta^{-i} \frac{k!}{(k-i)!} \left(1 - \frac{x}{\beta}\right)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1} \psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\varphi(\beta) = (-1)^I \psi_0^{(I)}(\beta)
= \rho^I C e^{-\rho\beta} + \left(\omega_I - C e^{-\rho\beta} \frac{(\rho\beta)^I}{I!}\right) \beta^{-I} I!
= \left(\frac{\beta^I}{I!}\right)^{-1} \omega_I \ge 0,
\]

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1} \omega_I > 0 \quad \text{for all } x \in [0,\beta].
\]

Proof of Theorem A1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (26), since

\[
\psi_0(0) = C\left(1 - \sum_{i=0}^{I} P_i(\rho\beta)\right) + \sum_{i=0}^{I} \omega_i = 1.
\]


A similar computation, also using (26), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!} (-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots. \tag{A13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A2 the $\gamma_i$ are nonnegative on $[0,\beta]$, the sequence $\{\gamma_i(x)\}_{i \ge 0}$ lies in $S_\infty$ for all $x$ in that interval. It follows from (A13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!} (-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!} (-1)^{k+1} \psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!} (-1)^{i+1} \psi_0^{(i+1)}(x) \ge 0
\end{aligned} \tag{A14}
\]

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 21 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case, $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]
\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A12) and (A14), and the second uses (A12) and (A13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A7) and (A8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A13) and (A14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\; i \ge 0. \tag{A15}
\]

Then

\[
\frac{d}{dx}\left[-\log\frac{\theta_i}{\gamma_i}\right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta), \tag{A16}
\]

verifying (A7). Likewise, to verify (A8) we apply (A16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A12).

In the polynomial case $C = 0$, note that (A15) holds for $i = 0, \ldots, I$, so that (A16) implies (A10).

A5 Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A1, as can be readily verified.

Lemma A3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[
(-1)^i \psi_0^{(i)}(x) = C \rho^i e^{-\rho x} \quad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,
\]
\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0, \ldots, \gamma_I, \gamma_{I+}\}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \left[\gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)|\right]. \tag{A17}
\]

Proof. For all $i > I$ and $x \in [0,\beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A12) and (A13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} (-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
+ (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]


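Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that given $\varepsilon > 0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\, dx.
\]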

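Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|\, dx + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log|\psi_0^{(i)}(x)|\, dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta-\varepsilon} (\dot\psi_i(x) - \dot\psi_{i-1}(x)) \log|\psi_0^{(i)}(x)|\, dx.
\end{aligned} \tag{A18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[
-\dot\psi_i(x) \log|\psi_0^{(i)}(x)| = \rho\gamma_i(x) [\log C + i \log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A15) for each term of (A18), the integral is

\[
\sum_{i=0}^{\infty} \left[\gamma_i \log|\psi_0^{(i)}|\right]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\, dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.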

A similar result holds in the case of the polynomial extremal.

Lemma A4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]
\[
-\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I-1,
\]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \left[\gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)|\right]. \tag{A19}
\]


Proof. The cost is obtained by integrating the reduced cost function (A9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A3.

A6 Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A1 satisfy $\gamma_i(x) = x^i |\psi_0^{(i)}(x)| / i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A17) and (A19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \| P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero-cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 25, which we restate here.

Theorem A5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1,\omega,\beta)} D(\pi \| P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi; y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty} \pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$ and the strict convexity of $x \log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x \log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[
\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i \log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]


Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in F(1,\omega,\beta)} D(\pi \| P(\beta)) &= \inf_{\pi \in F(1,\omega,\beta)} L(\pi; y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi; y, z) \\
&= D(\pi^* \| P(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

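Substituting the particular form of $\pi^*_i$ into (A17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \| P(\beta)),
\]

where

\[
\gamma_i(\beta) = \begin{cases} \omega_i, & 0 \le i \le I, \\ C P_i(\rho\beta), & i > I. \end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \left(1 - \sum_{i=0}^{I} \omega_i\right) (\log C + (1-\rho)\beta) + \left(\beta - \sum_{i=0}^{I} i\omega_i\right) \log\rho,
\]

where $C$ is defined by (26). In the polynomial case the cost is simply given by the first term in the above expression.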
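The explicit cost formula can be cross-checked against a direct evaluation of the relative entropy. The sketch below is illustrative: it assumes the twist parameters $(C,\rho)$ consistent with $(\omega,\beta)$ are supplied by the caller, truncates the infinite sum after a fixed number of terms, and works with logarithms of Poisson probabilities to avoid underflow. The function names are hypothetical.

```python
import math

def log_poisson(i, lam):
    """log of the Poisson probability P_i(lam)."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def explicit_cost(omega, beta, C, rho):
    """Closed-form cost of an exponential extremal in terms of rho, omega, beta."""
    mass = sum(omega)
    mean = sum(i * w for i, w in enumerate(omega))
    head = sum(w * (math.log(w) - log_poisson(i, beta))
               for i, w in enumerate(omega) if w > 0)
    return (head
            + (1.0 - mass) * (math.log(C) + (1.0 - rho) * beta)
            + (beta - mean) * math.log(rho))

def relative_entropy_cost(omega, beta, C, rho, terms=200):
    """D(pi* || P(beta)) with pi*_i = omega_i (i <= I) and C * P_i(rho*beta) (i > I)."""
    I = len(omega) - 1
    total = sum(w * (math.log(w) - log_poisson(i, beta))
                for i, w in enumerate(omega) if w > 0)
    for i in range(I + 1, I + 1 + terms):
        log_p = math.log(C) + log_poisson(i, rho * beta)
        total += math.exp(log_p) * (log_p - log_poisson(i, beta))
    return total
```

The two functions should agree up to truncation and rounding error whenever $(C,\rho)$ satisfy the two twist-parameter equations; for the Poisson endpoint ($C = \rho = 1$) both reduce to zero.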

A7 Characterization of the extremals under general initial conditions

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k \beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k \beta_k / \beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma_{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma_{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A1 with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega_{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega_{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi_{(k)} \| P(\beta)) \tag{A20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{k,j} = \beta \tag{A21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A22}
\]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi_{(k)}\}_{k \in \mathcal{K}}$, where each $\pi_{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (24). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{k,j} = 0, \qquad k + j > I, \tag{A23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A20) subject to (A23) and the endpoint constraints (A22), in which case the conservation constraint (A21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi_{(0)}, \ldots, \pi_{(k)}, \ldots, \pi_{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi_{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A20) subject to constraints (A21) and (A22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A21) and (A22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega \in S_{\bar I}$, where $\bar I > I$. Any feasible point for constraints $(\alpha,\bar\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (23) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi_{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi_{(1)}$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi_{(0)}$ and $\pi_{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (23). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (24) implies that the final step uses up all of the probability mass in the distributions $\pi_{(k)}$.
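The ordered filling construction can be sketched in code. This is an illustrative reading of the argument, not the authors' implementation: the text leaves open how the demand at each level is split among the eligible classes, and the sketch below simply draws from the lowest-indexed class first. It assumes polynomial constraints with `sum(alpha) == sum(omega) == 1`; the function name and the dictionary representation of $\pi$ are hypothetical.

```python
def ordered_filling(alpha, omega):
    """Greedy construction of a feasible point for polynomial constraints.

    alpha[k]: initial fraction of urns holding k balls (k = 0..K).
    omega[i]: required final fraction of urns holding i balls (i = 0..I).
    Returns pi as a dict {(k, j): mass}, where pi[(k, j)] is the fraction of
    class-k urns that receive j additional balls.
    """
    remaining = list(alpha)   # alpha-weighted mass of each class not yet assigned
    pi = {}
    for i, need in enumerate(omega):
        # Only classes k <= i can end at level i (balls are never removed).
        for k in range(min(i, len(alpha) - 1) + 1):
            if need <= 0.0:
                break
            take = min(need, remaining[k])
            if take > 0.0:
                pi[(k, i - k)] = take / alpha[k]
                remaining[k] -= take
                need -= take
        if need > 1e-12:
            # Not enough mass in classes 0..i: the i-th monotonicity condition fails.
            raise ValueError(f"monotonicity condition fails at level {i}")
    return pi
```

For example, with `alpha = [0.5, 0.5]` and `omega = [0.2, 0.3, 0.5]`, class 0 supplies all the mass at levels 0 and 1, and class 1 supplies the mass at level 2. Any other split that respects the remaining mass would be equally valid as a feasible point.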

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

\[
H_A = \left\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi_{(k)} \| P(\beta)) \le A \right\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 143), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha,\omega,\beta) \doteq F(\alpha,\omega,\beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi_{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\pi^{(n)}_{k,j} \le \varepsilon \tag{A24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0} [xy - x \log x + x - 1]$, we have the inequality

\[
xy \le (x \log x - x + 1) + (e^y - 1),
\]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta} \frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]


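Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\pi^{(n)}_{k,j} &= \sum_{j \ge m} \delta\, \frac{j}{\delta}\, \frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\, P_j(\beta) \\
&\le \sum_{j \ge 0} \delta \left[ \pi^{(n)}_{k,j} \log\left(\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\right) - \pi^{(n)}_{k,j} + P_j(\beta) \right] + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)} \| P(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]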
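The two ingredients of this estimate, the inequality $xy \le (x \log x - x + 1) + (e^y - 1)$ and the Poisson exponential moment identity, can be checked numerically. The following is a small illustrative sketch; the function names and the truncation level of the series are assumptions made for the example.

```python
import math

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y; nonnegative for all y and all x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0   # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, terms=200):
    """Truncated sum over j of e^{j/delta} * P_j(beta)."""
    total = 0.0
    for j in range(terms):
        total += math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
    return total
```

The gap vanishes exactly when $x = e^y$, which is the maximizer in the supremum formula for $e^y - 1$, and the truncated moment sum should match $e^{\beta e^{1/\delta} - \beta}$ up to rounding when the truncation level is large relative to $\beta e^{1/\delta}$.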

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 54, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k} \alpha_k \beta_k = \lim_{n} \sum_{k} \alpha_k \beta^{(n)}_k = \lim_{n} \beta = \beta,
\]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi_{(k)} \| P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)} \| P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \| \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 143(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^*_{(k)} \| P(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)} \| P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

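Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{k,j} \to \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{k,j}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k \alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)} \| P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \| P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda \log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k \beta^{(n)}_k \to \alpha_k \beta_k$, even if $\alpha_k = 0$. Thus

\[
\sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k \alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 143(b)), we have

\[
\begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \| P(\beta^{(n)})) &\ge \liminf_{n} \sum_{k : \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \| P(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k D(\pi_{(k)} \| P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.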
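The value $\beta - \lambda + \lambda\log(\lambda/\beta)$ used in the proof is the relative entropy of a Poisson($\lambda$) distribution with respect to Poisson($\beta$). The sketch below checks this numerically and illustrates the blow-up invoked in the argument: with $\alpha\lambda = c$ held fixed, $\alpha\,[\beta - \lambda + \lambda\log(\lambda/\beta)]$ grows without bound as $\alpha \to 0$. The function names and the truncation level are illustrative assumptions.

```python
import math

def poisson_kl(lam, beta, terms=200):
    """Truncated D(P(lam) || P(beta)) for Poisson distributions."""
    total = 0.0
    for j in range(terms):
        log_p = -lam + j * math.log(lam) - math.lgamma(j + 1)
        log_q = -beta + j * math.log(beta) - math.lgamma(j + 1)
        total += math.exp(log_p) * (log_p - log_q)
    return total

def mean_constrained_infimum(lam, beta):
    """Closed form beta - lam + lam * log(lam / beta)."""
    return beta - lam + lam * math.log(lam / beta)

def weighted_cost(alpha_k, c, beta):
    """alpha_k times the infimum when the class mean is lam = c / alpha_k."""
    return alpha_k * mean_constrained_infimum(c / alpha_k, beta)
```

With `c = 1.0` and `beta = 2.0`, `weighted_cost` increases as `alpha_k` decreases through 0.1, 0.01, 0.001, in line with the claim that such a term cannot stay bounded.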

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A20) subject to constraints (A21) and (A22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{m,n} > 0$, that $\tilde\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k \tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\tilde\pi_{k,j},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta > 0$ we may define

\[
\hat\beta \doteq \frac{\beta - \eta\tilde\beta}{1 - \eta}, \qquad \hat\omega \doteq \frac{\omega - \eta\tilde\omega}{1 - \eta}.
\]

By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (23) and (24) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (23) and the conservation constraint (24) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.


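We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1 - \eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{m,n} > 0$. As discussed in the proof of Lemma A6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left( \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)} + 1 \right) (\bar\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}\, (\bar\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m \bar\pi_{m,n} \log\frac{\pi^\varepsilon_{m,n}}{P_n(\beta)} + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \bar\pi_{k,j} \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{k,j} \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k} \alpha_k D(\pi_{(k)} \| P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{k,j} \log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A20) if $\pi_{m,n} = 0$. As $(m,n)$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.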

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{k,j} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\; j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise.} \end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A20) may be considered to be a finite-dimensional problem over this set. By Lemma A8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)} + \sum_{k \in \mathcal{K}} z_k \alpha_k \left( 1 - \sum_{k+l \in \mathcal{I}} \pi_{k,l} \right) + \sum_{i \in \mathcal{I}} w_i \left( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \right)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{k,j} = P_j(\beta)\, e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
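The product form $D_k P_j(\beta) W_{k+j}$ suggests a simple way to compute the minimizer numerically in the polynomial case. The sketch below is not taken from the paper: it swaps in standard iterative proportional fitting (alternating rescaling), applied to the array $\mu_{k,i} = \alpha_k \pi_{k,i-k}$, whose "row" sums must equal $\alpha_k$ and whose "column" sums must equal $\omega_i$. It assumes every $\alpha_k > 0$ and every $\omega_i > 0$, that a strictly positive feasible point exists, and that a fixed iteration count suffices. The function name is hypothetical.

```python
import math

def fit_polynomial_minimizer(alpha, omega, beta, iters=2000):
    """Alternately rescale classes (k) and terminal levels (i = k + j).

    Starts from mu[k][i] = alpha_k * P_{i-k}(beta), so that every iterate keeps
    the product form D_k * P_j(beta) * W_{k+j}. Returns pi with pi[k][j] for
    j = 0..I-k.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    pois = [math.exp(-beta) * beta ** j / math.factorial(j) for j in range(I + 1)]
    mu = [[alpha[k] * pois[i - k] if i >= k else 0.0 for i in range(I + 1)]
          for k in range(K + 1)]
    for _ in range(iters):
        for i in range(I + 1):                    # match the terminal constraints
            col = sum(mu[k][i] for k in range(K + 1))
            for k in range(K + 1):
                mu[k][i] *= omega[i] / col
        for k in range(K + 1):                    # make each pi_(k) sum to one
            row = sum(mu[k])
            for i in range(I + 1):
                mu[k][i] *= alpha[k] / row
    return [[mu[k][i] / alpha[k] for i in range(k, I + 1)] for k in range(K + 1)]
```

Each rescaling step multiplies the entries by a factor depending only on $k$ or only on $k + j$, which is why the iterates stay within the family described by Theorem A9. The multipliers $D_k$ and $W_i$ can be read off, up to a common scaling, from the accumulated factors.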

We now give the corresponding theorem for the exponential case.

Theorem A10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{k,j} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\; k + j \le I,\; k + j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\; k + j > I, \\ 0, & \text{otherwise.} \end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[
\beta^{(M)} = \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\pi^*_{k,l}, \qquad \eta^{(M)}_k \doteq \sum_{l \le M} \pi^*_{k,l},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\; k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)}
\]

subject to the constraints

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\; k+l \in \mathcal{I}} l\pi_{k,l} = \beta^{(M)},
\]
\[
\sum_{l \le M,\; k+l \in \mathcal{I}} \pi_{k,l} = \eta^{(M)}_k, \qquad k \in \mathcal{K},
\]
\[
\sum_{k \le i,\; k \in \mathcal{K}} \alpha_k \pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\; k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)}
+ y^{(M)} \left( \beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\; k+l \in \mathcal{I}} l\pi_{k,l} \right) \\
&\quad + \sum_{k \in \mathcal{K}} z^{(M)}_k \alpha_k \left( \eta^{(M)}_k - \sum_{l \le M,\; k+l \in \mathcal{I}} \pi_{k,l} \right)
+ \sum_{i \in \mathcal{I}} w^{(M)}_i \left( \omega_i - \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \right).
\end{aligned}
\]

Since the derivative of $L$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

\[
\pi^*_{k,j} = P_j(\beta) \exp\left( j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j} \right)
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\, e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*_{(k)}$ by $\beta_k \doteq \sum_j j\pi^*_{k,j}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi_{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} \doteq \pi^*_{k,j}$ and $\omega_{(k)} = (\omega_{k,0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing distribution arising in Theorem A1 for the terminal constraints $(1,\omega_{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega_{(k)},\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions from Lemma A6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A1 for the subproblems $(1,\omega_{(k)},\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\bar\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$ and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence, the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

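\[
\begin{aligned}
\bar\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k}\bigl(\omega_{ki} - C_k\mathcal{P}_i(\rho_k\beta_k)\bigr)\left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k}\bigl(\omega_{ki} - C_k\mathcal{P}_i(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k}(W_{k+i} - 1)\mathcal{P}_i(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{i}\right].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i \le I-k$. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I}(W_i - 1)\mathcal{P}_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0C_0\rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k}(W_{k+j} - 1)\mathcal{P}_j(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0C_0\rho^k}{C_k}\bar\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]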
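The passage from the first to the second line of (A.25) rests on differentiating the polynomial term $k$ times and on a factorial identity for the Poisson weights. A short check, using only the definition $\mathcal{P}_i(t) = e^{-t}t^i/i!$ and the reindexing $i = k + j$ (the index $j$ is the one already used in (A.25)), might read as follows.

```latex
\begin{align*}
\frac{d^{k}}{dx^{k}}\left(1-\frac{x}{\beta}\right)^{i}
  &= (-1)^{k}\,\frac{i!}{(i-k)!\,\beta^{k}}\left(1-\frac{x}{\beta}\right)^{i-k},
  \qquad i \ge k, \\[4pt]
\mathcal{P}_{k+j}(\rho\beta)\,\frac{(k+j)!}{j!\,\beta^{k}}
  &= e^{-\rho\beta}\,\frac{(\rho\beta)^{k+j}}{(k+j)!}\cdot\frac{(k+j)!}{j!\,\beta^{k}}
   = \rho^{k}\,e^{-\rho\beta}\,\frac{(\rho\beta)^{j}}{j!}
   = \rho^{k}\,\mathcal{P}_{j}(\rho\beta).
\end{align*}
```

Terms with $i < k$ vanish under differentiation, which is why the sum in the first line of (A.25) starts at $i = k$.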

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{i-k}}{dx^{i-k}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i}\alpha_k\frac{\beta_k}{\beta}\dot\psi_{k,i-k}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{i-k+1}}{dx^{i-k+1}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\bar\psi_{k0}(x) \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence, the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\right] - \sum_{k=0}^{K}\alpha_k\log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)e^{-\beta}}.
\end{aligned}
\]

The fact that $\bar\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\times\bar\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j}\psi_{k0}^{(j)}\!\left(\frac{x\beta_k}{\beta}\right)e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^{j}e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K}\alpha_kD(\bar\gamma_{k\cdot}(\beta)\,\|\,\mathcal{P}(\beta)).
\]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $O$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $O(\alpha,\omega,\beta) = \{\gamma \in O : \gamma(0) = \alpha, \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $O(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $O(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \tilde\gamma \in O(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G: [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] = J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x)\,\|\,\gamma^\varepsilon(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in O(\alpha,\omega,\beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

\[
\eta(x) = \tilde\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx. \tag{A.30}
\]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded away from zero for each subproblem. These in turn bound the overall $\gamma_i$ and $\theta_i$ from below since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(x) \ge \alpha_i\bar\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and $\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded away from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

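The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I}\left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^\varepsilon(x)}\eta_i(x) - \sum_{i=0}^{I}\left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^\varepsilon(x)}\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\]

\[
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.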

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^\beta\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx
\]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[
\int_0^x\left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma}dx' = \int_0^x\left.\left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma}dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta\left[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right]dx \\
&= \int_0^\beta\dot\eta(x)\cdot\left(-\int_0^x\frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right)dx \\
&= -\int_0^\beta\sum_{i=0}^{I}C_i\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I}C_i(\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

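Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}}D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n\mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\tilde\gamma(x_2^{(n)}),x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n\int_{x_1^{(n)}}^{x_2^{(n)}}D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7 and the last equality from the monotone convergence theorem.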

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


we can write $\dot\gamma(x) = M\theta(x)$ for all $x \in [0,\beta]$. Conversely, given a differentiable occupancy function $\gamma$, the corresponding rates are uniquely determined by $\dot\gamma(x) = M\theta(x)$ and the normalization $\sum_{i=0}^{I}\theta_i(x) + \theta_{I+}(x) = 1$. We will also be interested in the cumulative occupancy function $\psi$, with components $\psi_i(x) = \sum_{j=0}^{i}\gamma_j(x)$. Inspecting the cumulative sums of the rows of $M$ shows that $\dot\psi_i = -\theta_i$ for $i = 0,\ldots,I$. As more balls are thrown, the fraction of urns containing $i$ or fewer balls can only decrease, and the rate of decrease is the rate at which balls enter $i$-occupied urns. We will say that $\gamma$ is a valid occupancy function if $\gamma$ is absolutely continuous, if $\gamma(x)$ is a probability vector for all $x \in [0,\beta]$, and if its associated $\theta(x)$ is a probability vector for almost all $x \in [0,\beta]$. Note that the functions $\psi$, $\gamma$ and $\theta$ are interchangeable, in the sense that each can be derived from any of the others [given $\gamma(0)$ in the case of $\theta$]. Thus, we say that $\theta$ and $\psi$ are valid if the associated $\gamma$ is valid. The following lemma gives a direct characterization of validity in terms of $\psi$.

Lemma 2.1. A vector of $I+1$ continuous functions $\psi$, each of which maps $[0,\beta]$ to $[0,1]$, is a valid cumulative occupancy path if and only if

(a) $\psi_i(x) \ge \psi_{i-1}(x)$,
(b) $\psi_i(x) \ge \psi_i(y)$,
(c) $\sum_{k=0}^{I}(\psi_k(x) - \psi_k(y)) \le y - x$,

for each $0 \le i \le I$ and $0 \le x < y \le \beta$.

Sketch of the proof. In the forward direction, the first condition plus the bound $\psi_I(x) \le 1$ ensure that $\gamma(x)$ is a probability distribution. The second and third conditions together imply that $\psi$ is Lipschitz continuous with constant 1, and hence absolutely continuous. This implies the absolute continuity of $\gamma$. Since $\dot\psi = -\theta$, the third condition ensures that $\theta$ is almost always a probability distribution. The reverse direction proceeds similarly.
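The three conditions of Lemma 2.1 are easy to test numerically on a grid. The following Python sketch is illustrative only: the function names, the grid size and the tolerance are assumptions, and the test path is the cumulative Poisson path $\psi_i(x) = \sum_{j \le i} e^{-x}x^j/j!$ that arises when all urns start empty. Checking (b) and (c) on consecutive grid points is enough for all pairs of grid points, since both conditions telescope.

```python
import math

def poisson_pmf(i, t):
    """P_i(t) = exp(-t) t^i / i!"""
    return math.exp(-t) * t**i / math.factorial(i)

def cumulative_poisson_path(x, I):
    """psi_i(x) = sum_{j <= i} P_j(x) for i = 0..I (hypothetical test path)."""
    out, total = [], 0.0
    for i in range(I + 1):
        total += poisson_pmf(i, x)
        out.append(total)
    return out

def is_valid_on_grid(psi, beta, I, steps=200, tol=1e-12):
    """Check conditions (a)-(c) of Lemma 2.1 on a uniform grid of [0, beta]."""
    xs = [beta * s / steps for s in range(steps + 1)]
    vals = [psi(x, I) for x in xs]
    for v in vals:
        # Each component must lie in [0, 1].
        if any(p < -tol or p > 1.0 + tol for p in v):
            return False
        # (a) psi_i(x) >= psi_{i-1}(x)
        if any(v[i] < v[i - 1] - tol for i in range(1, I + 1)):
            return False
    for n in range(steps):
        vx, vy = vals[n], vals[n + 1]
        dx = xs[n + 1] - xs[n]
        # (b) each psi_i is nonincreasing in x
        if any(vx[i] < vy[i] - tol for i in range(I + 1)):
            return False
        # (c) the total decrease over [x, y] is at most y - x
        if sum(vx[i] - vy[i] for i in range(I + 1)) > dx + tol:
            return False
    return True
```

A grid check of this kind can only refute validity, not prove it, so it is best read as a sanity test for candidate paths.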

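Given $\gamma, \theta \in S_I$, let $D(\theta\|\gamma)$ denote the relative entropy of $\theta$ with respect to $\gamma$. Thus

\[
D(\theta\|\gamma) = \sum_{i=0}^{I+1}\theta_i\log(\theta_i/\gamma_i),
\]

with the understanding that $\theta_i\log(\theta_i/\gamma_i) = 0$ whenever $\theta_i = 0$, and that $\theta_i\log(\theta_i/\gamma_i) = \infty$ if $\theta_i > 0$ and $\gamma_i = 0$. If $\gamma(x)$ is a valid occupancy function with corresponding occupancy rates $\theta(x)$, then we set

\[
J(\gamma) = \int_0^\beta D(\theta(x)\,\|\,\gamma(x))\,dx. \tag{2.2}
\]

In all other cases, set $J(\gamma) = \infty$.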
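The two conventions attached to the relative entropy (a zero term when $\theta_i = 0$, an infinite value when $\theta_i > 0$ and $\gamma_i = 0$) are easy to get wrong in code. A minimal Python sketch follows; the function name and the list-based representation of probability vectors are assumptions made for illustration.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) for two probability vectors of equal length."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue             # convention: 0 * log(0 / g) = 0
        if g == 0.0:
            return math.inf      # convention: theta_i > 0 and gamma_i = 0
        total += t * math.log(t / g)
    return total
```

For instance, `relative_entropy([1.0, 0.0], [0.5, 0.5])` evaluates to $\log 2$, and the integrand of (2.2) could be approximated by calling this function at grid points of $[0,\beta]$.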


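Theorem 2.2. Suppose that the sequence of initial conditions $\{\Gamma^n(0), n = 1,2,\ldots\}$ is deterministic and that it converges to $\alpha \in S_I$ as $n \to \infty$. Then the sequence of processes $\{\Gamma^n, n = 1,2,\ldots\}$ satisfies the LDP with rate function $J$. In other words, for any measurable set $A$ of trajectories, we have the large deviation lower bound

\[
\liminf_{n\to\infty}\frac{1}{n}\log P\{\Gamma^n \in A\} \ge -\inf\{J(\gamma) : \gamma \in A^\circ, \gamma(0) = \alpha\}
\]

and the large deviation upper bound

\[
\limsup_{n\to\infty}\frac{1}{n}\log P\{\Gamma^n \in A\} \le -\inf\{J(\gamma) : \gamma \in \bar A, \gamma(0) = \alpha\},
\]

where $A^\circ$ and $\bar A$ denote the interior and closure of $A$, and moreover, for any compact set of initial conditions $K$ and $C < \infty$, the set

\[
\{\gamma : J(\gamma) \le C, \gamma(0) \in K\}
\]

is compact.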

The proof of this result is provided in Section 3. However, a formal justification for the form of the rate function is as follows. Let $\delta > 0$ be a small time increment. Owing to the fact that $\Gamma^n$ varies slowly when compared to $b_i(\gamma)$, one expects

\[
P\left(\left.\frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx \eta\ \right|\ \Gamma^n(0) = \gamma\right) \approx P\left(\frac{1}{n\delta}\sum_{j=0}^{\lfloor n\delta\rfloor - 1}b_j(\gamma) \approx \eta\right),
\]

where the symbol "$\approx$" inside the probability means that the indicated quantities are within a small fixed constant $\varepsilon > 0$ of each other. Suppose that we interpret $\gamma$ as a probability measure on $\{e_0,\ldots,e_{I+1}\}$, and let $\{Y_j\}$ be independent and identically distributed (i.i.d.) with distribution $\gamma$. Then the sequence of i.i.d. random vectors $\{b_j(\gamma)\}$ can be realized by setting

\[
b_j(\gamma) = \begin{cases} e_{i+1} - e_i \\ 0 \end{cases} \iff Y_j = \begin{cases} e_i \\ e_{I+1}, \end{cases}
\]

that is,

\[
b_j(\gamma) = MY_j.
\]

By Sanov's theorem, for any probability vector $\theta \in S_I$,

\[
P\left(\frac{1}{n\delta}\sum_{j=0}^{\lfloor n\delta\rfloor - 1}Y_j \approx \theta\right) \approx \exp\{-n\delta D(\theta\|\gamma)\}.
\]

Thus

\[
P\left(\left.\frac{\Gamma^n(\delta) - \Gamma^n(0)}{\delta} \approx M\theta\ \right|\ \Gamma^n(0) = \gamma\right) \approx \exp\{-n\delta D(\theta\|\gamma)\}.
\]

Approximating an arbitrary trajectory by a piecewise linear trajectory with nearly equal cost, and using the Markov property, one expects

\[
P(\Gamma^n \approx \gamma \mid \Gamma^n(0) = \alpha) \approx \exp\left\{-n\int_0^\beta D(\theta(t)\,\|\,\gamma(t))\,dt\right\},
\]

where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus the rate function on path space should be $J(\gamma)$.

The zero cost trajectories are, of course, the law of large numbers limits and can easily be computed. For example, if $\alpha = (1,0,\ldots)$ (so all cells are initially empty) and $i \le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t}t^i/i!$. In other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass corresponding to $i > I$ is collected together into the state $I+1$. Throughout this paper we will denote the Poisson distribution with mean $t$ by $\mathcal{P}(t)$, where $\mathcal{P}_i(t) = e^{-t}t^i/i!$.
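A zero cost path has $\theta(t) = \gamma(t)$, so it must solve $\dot\gamma = M\gamma$. The Python sketch below checks this numerically for the truncated Poisson path. It assumes the action of $M$ described in the heuristic above ($Me_i = e_{i+1} - e_i$ for $i \le I$ and $Me_{I+1} = 0$); the function names, step size and tolerance are illustrative choices.

```python
import math

def zero_cost_path(t, I):
    """gamma(t): Poisson(t) masses for i <= I, with the remainder lumped into state I+1."""
    head = [math.exp(-t) * t**i / math.factorial(i) for i in range(I + 1)]
    return head + [1.0 - sum(head)]

def drift(theta):
    """M theta: mass theta_i leaves state i and enters state i+1; the last state absorbs."""
    out = [0.0] * len(theta)
    for i in range(len(theta) - 1):
        out[i] -= theta[i]
        out[i + 1] += theta[i]
    return out

def max_ode_residual(t, I, h=1e-5):
    """Largest gap between a central-difference derivative of gamma and M gamma(t)."""
    plus, minus = zero_cost_path(t + h, I), zero_cost_path(t - h, I)
    derivative = [(a - b) / (2.0 * h) for a, b in zip(plus, minus)]
    return max(abs(d - m) for d, m in zip(derivative, drift(zero_cost_path(t, I))))
```

With $t = 1.3$ and $I = 3$, for example, the residual should be at the level of finite-difference error only.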

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction principle (e.g., [11], Theorem 1.3.2) implies the following variational representation for the rate function for $\{\Gamma^n(\beta), n = 1,2,\ldots\}$. Let $\mathcal{A}(\alpha,\omega,\beta)$ denote the set of valid occupancy paths $\gamma$ on $[0,\beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0), n = 1,2,\ldots\}$ is deterministic and that it converges to $\alpha$ as $n \to \infty$. Then the sequence of random vectors $\{\Gamma^n(\beta), n = 1,2,\ldots\}$ satisfies the LDP with the convex rate function

\[
\mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma \in \mathcal{A}(\alpha,\omega,\beta)\}.
\]

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7 and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is in fact strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation lower bound holds for sets with no interior relative to the ambient space. Such bounds can often be proved for processes, such as ours, that take values in a discrete lattice. An example would be a set $A \subset \{\gamma \in S_I : \gamma_{I+} = 0\}$, for which the minimizing trajectories must be polynomial extremals (defined after Theorem 2.6). Although we do not need such results in the present paper, it is worth observing that lower bounds of this kind can be proved.

2.2. Characterization of the terminal rate function and minimizing paths. The results of this section were obtained using calculus of variations techniques, the details of which are given in the Appendix. The presentation begins with the case in which all urns are initially empty, because it appears in many applications and because it is a building block for the general case. In each case, we first give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional forms contain parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha,\omega,\beta)$, where $\alpha, \omega \in S_I$ are the initial and terminal occupancies, and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I+1$ balls and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I+1$.

Definition 2.1. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if the corresponding set of valid occupancy paths $\mathcal{A}(\alpha,\omega,\beta)$ is nonempty.

Lemma 2.4. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if and only if

\[
\sum_{j=0}^{i}\alpha_j \ge \sum_{j=0}^{i}\omega_j, \qquad i = 0,\ldots,I \quad \text{(monotonicity)} \tag{2.3}
\]

and

\[
\sum_{i=0}^{I}i\omega_i + (I+1)\omega_{I+} \le \sum_{k=0}^{I+1}k\alpha_k + \beta \quad \text{(conservation)}. \tag{2.4}
\]

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
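The feasibility test of Lemma 2.4 translates directly into code. In the Python sketch below, the function name, the tolerance and the list layout (both vectors have length $I+2$, with the last entries holding $\alpha_{I+1}$ and $\omega_{I+}$) are assumptions made for illustration.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (2.3) and conservation (2.4) conditions.

    alpha, omega: lists of length I + 2; the last entries are alpha_{I+1} and omega_{I+}.
    """
    I = len(alpha) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:      # (2.3) fails at index i
            return False
    balls_at_end = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    balls_at_start = sum(k * alpha[k] for k in range(I + 2))
    return balls_at_end <= balls_at_start + beta + tol   # (2.4)
```

For example, starting from empty urns with $I = 1$, the endpoint $\omega = (0.5, 0.3, 0.2)$ needs at least $0.3 + 2 \cdot 0.2 = 0.7$ balls per urn, so it is feasible for $\beta = 1$ but not for $\beta = 0.5$.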

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha,\omega,\beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i < I$. Otherwise, the constraints are termed reducible.

For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i}\alpha_k = \sum_{k=0}^{i}\omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus, these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i+1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha,\omega,\beta)$ may be divided into a finite number of isolated, irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary, we suppose this to be the case.
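Locating the split point of a reducible problem is a one-pass comparison of cumulative sums. The Python sketch below is an illustration under stated assumptions: the function name and tolerance are hypothetical, the inputs are taken to be feasible already, and the vectors have length $I+2$ as in the feasibility conditions. It returns only the first index of equality and does not carry out the decomposition itself.

```python
def first_reducible_index(alpha, omega, tol=1e-12):
    """Return the first i < I with sum_{k<=i} alpha_k == sum_{k<=i} omega_k, else None.

    A return value of None means the (feasible) constraints are irreducible.
    """
    I = len(alpha) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if abs(cum_alpha - cum_omega) <= tol:
            return i
    return None
```

Exact equality of floating-point cumulative sums is fragile, which is why the comparison uses a tolerance.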

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $F(1,\omega,\beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0,\ldots,I$ and the constraint

\[
\sum_{i=0}^{\infty}i\pi_i = \beta.
\]

In the empty case, the conditions for feasibility of $(1,\omega,\beta)$ reduce to $\sum_{i=0}^{I}i\omega_i + (I+1)\omega_{I+} \le \beta$, from which it follows that the set $F$ is nonempty if and only if $(1,\omega,\beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $\mathcal{A}(1,\omega,\beta)$ can be represented as an infimum of a relative entropy over distributions in $F(1,\omega,\beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega): S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be determined as

\[
\mathcal{J}(\omega) = \min_{\pi \in F(1,\omega,\beta)}D(\pi\,\|\,\mathcal{P}(\beta))
\]

if $(1,\omega,\beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^* \in F(1,\omega,\beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show in fact that the solution takes the form $\pi^*_i = C\mathcal{P}_i(\rho\beta)$ for $i > I$, for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty}i\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty}\pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the strict equality $\sum_{i=0}^{I}i\omega_i + (I+1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I+1$ if necessary to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I}\omega_i = 0$. In this case, it turns out that $F(1,\omega,\beta)$ has only one element; we then have $C = 0$ and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[
\frac{\rho\beta - \sum_{i=0}^{I}i\mathcal{P}_i(\rho\beta)}{1 - \sum_{i=0}^{I}\mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I}i\omega_i}{1 - \sum_{i=0}^{I}\omega_i}. \tag{2.5}
\]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I+1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0,\infty)$ to $(I+1,\infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[
C = \frac{1 - \sum_{i=0}^{I}\omega_i}{1 - \sum_{i=0}^{I}\mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I}i\omega_i}{\rho\beta - \sum_{i=0}^{I}i\mathcal{P}_i(\rho\beta)}. \tag{2.6}
\]
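Since the left-hand side of (2.5) is strictly monotone in $\rho$, plain bisection is one simple way to find the root. The Python sketch below is an illustration, not the authors' procedure. The function names, the bracketing strategy and the 200-term truncation of the Poisson tail are assumptions that are adequate for moderate values of $\rho\beta$. The tail sums are accumulated directly, rather than as one minus a head sum, to avoid cancellation when $\rho\beta$ is small. Only the exponential case (strict inequality in (2.4)) is handled.

```python
import math

def tail_sums(m, I, terms=200):
    """Return (sum_{i>I} P_i(m), sum_{i>I} i*P_i(m)) for a Poisson(m) law, summed directly."""
    p = math.exp(-m) * m ** (I + 1) / math.factorial(I + 1)
    mass = balls = 0.0
    for i in range(I + 1, I + 1 + terms):
        mass += p
        balls += i * p
        p *= m / (i + 1)
    return mass, balls

def conditional_mean(rho, beta, I):
    """Left-hand side of (2.5): E[Y | Y > I] for Y ~ Poisson(rho * beta)."""
    mass, balls = tail_sums(rho * beta, I)
    return balls / mass

def twist_parameters(omega, beta, iterations=200):
    """Solve (2.5) for rho by bisection, then get C from (2.6); omega = (omega_0..omega_I)."""
    I = len(omega) - 1
    tail_mass = 1.0 - sum(omega)
    tail_balls = beta - sum(i * w for i, w in enumerate(omega))
    target = tail_balls / tail_mass            # right-hand side of (2.5)
    lo, hi = 1e-9, 1.0
    while conditional_mean(hi, beta, I) < target:
        hi *= 2.0                              # expand until the root is bracketed
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if conditional_mean(mid, beta, I) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    mass, _ = tail_sums(rho * beta, I)
    return rho, tail_mass / mass               # first expression in (2.6)
```

A convenient sanity check is the zero cost endpoint $\omega_i = \mathcal{P}_i(\beta)$, for which the untwisted values $\rho = 1$ and $C = 1$ should be recovered.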

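Evaluating the relative entropy of $\pi^*$ and $\mathcal{P}(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

\[
\mathcal{J}(\omega) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \left(1 - \sum_{i=0}^{I}\omega_i\right)(\log C + (1-\rho)\beta) + \left(\beta - \sum_{i=0}^{I}i\omega_i\right)\log\rho. \tag{2.7}
\]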
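Formula (2.7) can be evaluated in a few lines once the twist parameters are known. The Python sketch below takes $\rho$ and $C$ as inputs rather than solving for them; the function name and argument layout are assumptions, and the tail terms are skipped when $1 - \sum_i \omega_i = 0$ so that $\log C$ is never evaluated at $C = 0$.

```python
import math

def rate_function(omega, beta, rho, C):
    """Evaluate (2.7) for omega = (omega_0..omega_I) and twist parameters rho, C."""
    def poisson(i, t):
        return math.exp(-t) * t**i / math.factorial(i)

    # Head: relative entropy terms for the constrained indices i <= I.
    head = sum(w * math.log(w / poisson(i, beta)) for i, w in enumerate(omega) if w > 0.0)
    tail_mass = 1.0 - sum(omega)
    tail_balls = beta - sum(i * w for i, w in enumerate(omega))
    tail = 0.0
    if tail_mass > 0.0:
        # Tail: contribution of pi*_i = C * P_i(rho * beta) for i > I.
        tail = tail_mass * (math.log(C) + (1.0 - rho) * beta) + tail_balls * math.log(rho)
    return head + tail
```

At the law of large numbers endpoint $\omega_i = \mathcal{P}_i(\beta)$, with $\rho = C = 1$, every term vanishes and the function returns zero, as a rate function should at the typical outcome.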

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments, conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1,\omega,\beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $\mathcal{A}(1,\omega,\beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(1,\omega,\beta)$ defined by

\[
\gamma_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I}(\omega_k - C\mathcal{P}_k(\rho\beta))\left(1 - \frac{x}{\beta}\right)^{k}, \tag{2.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x),
\]

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$, $\gamma_0(x+y) = \gamma_0(x) + y\gamma_0^{(1)}(x) + \cdots$, evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$, there is no exponential term and we say that $\gamma$ is a polynomial extremal. Otherwise, when $C > 0$, we have an exponential extremal.

222 General initial conditions Occupancy problems with general ini-tial conditions may be thought of as a coupled set of problems with emptyinitial conditions In particular we consider the set of urns initially contain-ing k balls to form a class The evolution of excess balls (beyond k) enteringurns of this class may be denoted by occupancy functions of the form γkj(t)representing the fraction of balls initially having k urns which have k + jurns at time t The fraction of urns containing i balls in the overall system isobtained by summing contributions from all subproblem components withk+ j = i

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $S_\infty$ denote the set of distributions on the nonnegative integers, and let $S_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $S_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)}\}$, where for any $k \in \mathcal{K}$ a component distribution is denoted $\pi^{(k)} = \{\pi_{k0}, \pi_{k1}, \ldots\}$, and it is understood that the corresponding $\pi^{(k)}$ is omitted if $\alpha_k = 0$. Finally, let $F(\alpha, \omega, \beta)$ be the set of $\pi \in S_\infty^{|\mathcal{K}|}$ which satisfy the terminal constraints

$$\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \qquad \text{for all } 0 \le i \le I, \tag{2.10}$$

along with the conservation constraint

$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j \pi_{kj} = \beta. \tag{2.11}$$

As in the empty case, it may be established that $F$ is nonempty if and only if $(\alpha, \omega, \beta)$ is feasible.

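Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed as

$$\mathcal{J}(\omega) = \min_{\pi \in F(\alpha, \omega, \beta)} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, P(\beta))$$

whenever $(\alpha, \omega, \beta)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in F(\alpha, \omega, \beta)$ is unique.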

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

$$\pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I. \end{cases}$$

In the case of equality in (2.4) (the polynomial case), the corresponding form is

$$\pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \le I, \\ 0, & k + j > I. \end{cases}$$

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in \mathcal{K}$, denote the mean of $\pi^{*(k)}$ by $\beta_k = \sum_{j=0}^{\infty} j \pi^*_{kj}$, and define the terminal condition $\omega^{(k)} \in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0, \ldots, I - k$. Then the constraints $(1, \omega^{(k)}, \beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma^{(k)} = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha, \omega, \beta)$, let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0, \beta_k] \to [0, 1]$ be the minimizing paths corresponding to the subproblems $(1, \omega^{(k)}, \beta_k)$. The infimum of $J(\gamma)$ over $A(\alpha, \omega, \beta)$ is achieved on the occupancy path $\gamma \in A(\alpha, \omega, \beta)$ defined by

$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).$$

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define

$$H(\gamma, \zeta) = \log(E[\exp(\zeta \cdot b_i(\gamma))]),$$

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma, \zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma, \zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies a large deviation upper bound with a rate function $\bar{J}$ which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma, \zeta)$ in $\zeta$:

$$L(\gamma, \eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma, \zeta)].$$

If $\{\gamma(x), 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then

$$\bar{J}(\gamma) = \int_0^\beta L(\gamma(x), \dot{\gamma}(x))\,dx.$$

If $\gamma$ is not absolutely continuous, then $\bar{J}(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set, with the bound $H(\gamma, \zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma, \zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar{J} = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma, \eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

$$L(\gamma, M\theta) = D(\theta \,\|\, \gamma).$$


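It is well known that $L(\gamma, \eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma, \eta) < \infty$, then $\eta$ can be written as a convex combination of the form

$$\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I),$$

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \ldots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

$$\begin{aligned}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} \bigl[\zeta \cdot M\theta - \log(E[\exp(\zeta \cdot b_i(\gamma))])\bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ M^T\zeta \cdot \theta - \log\left( \left[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\right] + \gamma_{I+1} \right) \right] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\left( \left[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\right] + \gamma_{I+1} \right) \right].
\end{aligned}$$

Given any values $\mu_0, \ldots, \mu_{I+1}$, we can define $\zeta_0, \ldots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions it is apparent that the last display is equal to

$$\begin{aligned}
&\sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\left( \left[\sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1})\right] + \gamma_{I+1} \right) \right] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right].
\end{aligned}$$

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \,\|\, \gamma)$, thereby completing the proof of the upper bound.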
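The last step can be checked numerically. The following Python sketch (not part of the original argument) evaluates the objective $\mu \cdot \theta - \log\sum_j \gamma_j e^{\mu_j}$ at the candidate maximizer $\mu_j = \log(\theta_j/\gamma_j)$ and compares it with $D(\theta \,\|\, \gamma)$. The vectors `theta` and `gamma` are arbitrary illustrative choices, and the random search is only a sanity check that no sampled $\mu$ exceeds the relative entropy.

```python
import math
import random


def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors with gamma_j > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)


def dv_objective(mu, theta, gamma):
    """Variational objective mu . theta - log(sum_j gamma_j * exp(mu_j))."""
    linear = sum(m * t for m, t in zip(mu, theta))
    return linear - math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))


# Illustrative probability vectors (assumed values, not from the paper).
theta = [0.5, 0.3, 0.2]
gamma = [0.2, 0.3, 0.5]

# Candidate maximizer mu_j = log(theta_j / gamma_j).
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
d = relative_entropy(theta, gamma)
value_at_mu_star = dv_objective(mu_star, theta, gamma)

# Random search: no sampled mu should beat the relative entropy.
rng = random.Random(0)
best_random = max(
    dv_objective([rng.uniform(-5.0, 5.0) for _ in theta], theta, gamma)
    for _ in range(2000)
)

print(d, value_at_mu_star, best_random)
```

The objective is unchanged when a constant is added to every $\mu_j$, which is why the supremum over $\mathbb{R}^{I+2}$ is attained along a whole line rather than at a single point.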

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that, given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

$$\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -J(\gamma) - \varepsilon. \tag{3.1}$$

Of course this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > bx^K$ for all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$, and such that

$$J(y) \le J(\gamma).$$

Consider the zero cost trajectory defined by

$$\dot{z}(x) = Mz(x), \qquad z(0) = \gamma(0).$$

We have the following expression for $z_j$ when $j \le I$:

$$z_j(x) = \left[ \sum_{k=0}^{j} \gamma_k(0) \frac{x^{j-k}}{(j-k)!} \right] e^{-x}. \tag{3.2}$$

It is easy to check from this explicit formula that $z_j(x) > bx^K$ for some $b > 0$, $K = I$ and all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0, 1)$ let $y^\rho = \rho z + (1 - \rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1 - \rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \,\|\, z) = 0$, we have

$$\begin{aligned}
J(y^\rho) &= \int_0^\beta D\bigl(\rho z(x) + (1 - \rho)\theta(x) \,\|\, \rho z(x) + (1 - \rho)\gamma(x)\bigr)\,dx \\
&\le \rho \int_0^\beta D(z(x) \,\|\, z(x))\,dx + (1 - \rho) \int_0^\beta D(\theta(x) \,\|\, \gamma(x))\,dx \\
&= (1 - \rho)J(\gamma).
\end{aligned}$$

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0, 1)$.
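As an illustration of the zero cost trajectory used above, the following Python sketch compares the explicit formula (3.2) with a forward Euler integration. It assumes that, for $j \le I$, the equation $\dot{z} = Mz$ reads $\dot{z}_j = z_{j-1} - z_j$ (with $z_{-1} = 0$), which is the dynamics that (3.2) solves; the matrix $M$ itself is defined in (2.1) and is not restated here. The initial vector and step count are illustrative choices.

```python
import math


def zero_cost(gamma0, x):
    """z_j(x) for j = 0..I from the explicit formula (3.2)."""
    return [
        math.exp(-x) * sum(
            gamma0[k] * x ** (j - k) / math.factorial(j - k)
            for k in range(j + 1)
        )
        for j in range(len(gamma0))
    ]


def euler_path(gamma0, x, steps=50_000):
    """Forward Euler for dz_j/dx = z_{j-1} - z_j (assumed form of M on j <= I)."""
    z = list(gamma0)
    h = x / steps
    for _ in range(steps):
        z = [
            z[j] + h * ((z[j - 1] if j > 0 else 0.0) - z[j])
            for j in range(len(z))
        ]
    return z


# Illustrative initial occupancies gamma_0(0), gamma_1(0), gamma_2(0).
gamma_init = [0.5, 0.3, 0.2]
exact = zero_cost(gamma_init, 2.0)
approx = euler_path(gamma_init, 2.0)
print(exact)
print(approx)
```

For $j = 0$ the formula reduces to $z_0(x) = \gamma_0(0)e^{-x}$, the familiar decay of the fraction of empty urns.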

It follows that in proving the lower bound, we can assume without loss of generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge bx^K$ for all $x \in [0, \beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0, \beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

$$h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else.} \end{cases}$$

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor/n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0, \beta]$), we have the inequality

$$\begin{aligned}
P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma\bigr) + e^{-2n\varepsilon}
&\ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon} \\
&\ge E_{\Gamma^n(0)}\bigl(\exp\{-nh(\Gamma^n(\lfloor n\tau \rfloor/n))\}\bigr).
\end{aligned}$$

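We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar{\Gamma}^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar{b}^n_i$:

$$\bar{\Gamma}^n\left(\frac{i+1}{n}\right) = \bar{\Gamma}^n\left(\frac{i}{n}\right) + \frac{1}{n}\bar{b}^n_i, \qquad \bar{\Gamma}^n(0) = \Gamma^n(0).$$

Furthermore, the distribution of $\bar{b}^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar{\Gamma}^n(j/n), 0 \le j \le i\}$. Let $\mu^\gamma$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar{\mu}^n_i$ denote the (random) distribution of $\bar{b}^n_i$ given $\{\bar{\Gamma}^n(j/n), 0 \le j \le i\}$. We let $\bar{E}_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

$$-\frac{1}{n} \log E_{\Gamma^n(0)}\left( \exp\left\{-nh\left(\Gamma^n\left(\frac{\lfloor n\tau \rfloor}{n}\right)\right)\right\} \right) = \inf \bar{E}_{\Gamma^n(0)}\left[ h\left(\bar{\Gamma}^n\left(\frac{\lfloor n\tau \rfloor}{n}\right)\right) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D\bigl(\bar{\mu}^n_i \,\|\, \mu^{\bar{\Gamma}^n(i/n)}\bigr) \right],$$

where the infimum is over all such processes $\bar{\Gamma}^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar{b}^n_i$. We can write $\gamma(\tau) - \gamma(0) = Mv\tau$ for some probability vector $v$. Define a process $\bar{b}^n_i$ as follows:

$$\bar{b}^n_i = \begin{cases} e_j - e_{j-1}, & \text{if } \left\lfloor \sum_{k=0}^{j-1} v_k\tau n \right\rfloor \le i \le \left\lfloor \sum_{k=0}^{j} v_k\tau n \right\rfloor - 1 \text{ for } 0 \le j \le I, \\[4pt] 0, & \text{if } \left\lfloor \sum_{k=0}^{I} v_k\tau n \right\rfloor \le i \le \lfloor \tau n \rfloor - 1. \end{cases}$$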

In other words, $\bar{b}^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar{\mu}^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

$$D\bigl(\bar{\mu}^n_i \,\|\, \mu^{\bar{\Gamma}^n(i/n)}\bigr) = \log\left( \frac{1}{\bar{\Gamma}^n_{j-1}(i/n)} \right) = -\log\left( \bar{\Gamma}^n_{j-1}\left(\frac{i}{n}\right) \right). \tag{3.3}$$

The process $\bar{\Gamma}^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar{\Gamma}^n_{j-1}((i+1)/n) - \bar{\Gamma}^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k\tau n \rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor - 1$,

$$\bar{\Gamma}^n_{j-1}\left(\frac{i}{n}\right) \downarrow \bar{\Gamma}^n_{j-1}\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k\tau n \right\rfloor \right)$$

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor$, it follows that

$$\bar{\Gamma}^n_{j-1}\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k\tau n \right\rfloor \right) \to \gamma_{j-1}(\tau)$$

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar{\Gamma}^n$, which in turn provides a finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (independent of $\tau$) such that, whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$

$$D\bigl(\bar{\mu}^n_i \,\|\, \mu^{\bar{\Gamma}^n(i/n)}\bigr) \le C_1[-\log\tau^K] \le -C_2\log\tau.$$

In addition, as $n \to \infty$ and $\eta \to 0$,

$$\bar{\Gamma}^n\left( \frac{1}{n}\lfloor \tau n \rfloor \right) \to \gamma(\tau).$$

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$-\frac{1}{n} \log E_{\Gamma^n(0)}\left( \exp\left\{-nh\left(\Gamma^n\left(\frac{\lfloor n\tau \rfloor}{n}\right)\right)\right\} \right) \le -C_2\tau\log\tau.$$

We now choose $\tau > 0$ so that $-C_2\tau\log\tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0, \tau]$ w.p.1 if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \left| \Gamma^n\left(\frac{\lfloor n\tau \rfloor}{n}\right) - \gamma\left(\frac{\lfloor n\tau \rfloor}{n}\right) \right| < \sigma,\ \sup_{x \in [0, \tau]} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -\frac{\varepsilon}{2}.$$

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0, \beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0, \delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau, \beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that, as long as $|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma$,

$$\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau \rfloor/n),\,\lfloor n\tau \rfloor/n}\left( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \right) \ge -J(\gamma) - \frac{\varepsilon}{2},$$

where $P_{\Gamma^n(\lfloor n\tau \rfloor/n),\,\lfloor n\tau \rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau \rfloor/n)$ at time $\lfloor n\tau \rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.


In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

$$\mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega).$$

Using Theorem 2.7, we can write

$$\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega \in \Omega}\ \inf_{\pi \in F(\alpha, \omega, \beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \\
&= \inf_{\pi \in F(\alpha, \Omega, \beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)),
\end{aligned} \tag{4.1}$$

where we have abused notation to define $F(\alpha, \Omega, \beta) = \bigcup_{\omega \in \Omega} F(\alpha, \omega, \beta)$.

In many cases, for example when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar{\Omega}$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega \in \Omega^\circ} \mathcal{J}(\omega) = \inf_{\omega \in \bar{\Omega}} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n} \log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.


4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

$$\gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}.$$

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

$$\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.$$

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns $\omega_0 = 0.15$ is three times larger than the expected value $P_0(3.0) = 0.05$.
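The computation in this example can be sketched in Python as follows. The function names (`solve_rho`, `gamma_path`) are hypothetical, and bisection is just one convenient root finder, not necessarily the method used to produce Figure 1. The sketch assumes $1 - \omega_0 < \beta$, so that a positive root exists.

```python
import math


def solve_rho(omega0, beta, iters=200):
    """Positive root of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection."""
    def f(rho):
        return rho * (1.0 - omega0) - (1.0 - math.exp(-beta * rho))

    # f(0) = 0 is the trivial root. f is convex with its minimum at `lo`,
    # where f < 0 (given 1 - omega0 < beta), so the positive root lies beyond.
    lo = math.log(beta / (1.0 - omega0)) / beta
    hi = max(2.0 * lo, 1.0)
    while f(hi) <= 0.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_pmf(i, lam):
    """P_i(lam): Poisson(lam) probability of the value i."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def gamma_path(i, x, rho):
    """Constrained occupancy path: gamma_0 from Section 4.1, gamma_i = P_i(rho*x)/rho."""
    if i == 0:
        return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho
    return poisson_pmf(i, rho * x) / rho


omega0, beta = 0.15, 3.0           # values used for Figure 1
rho = solve_rho(omega0, beta)
C = 1.0 / rho
print(rho, C, gamma_path(0, beta, rho))
```

Because $\omega_0 = 0.15$ exceeds $e^{-3}$, the root satisfies $\rho > 1$, and the path $\gamma_0$ starts at 1 and ends at $\omega_0$ by construction.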

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

$$w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),$$

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

$$w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\Gamma_i(r/n).$$
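The identity for $w(r)$ is easy to confirm by simulation. The sketch below, in Python, throws $r$ balls into $n$ urns uniformly at random, counts the overflow directly, and compares it with the formula in terms of the empirical occupancies $\Gamma_i(r/n)$. The function name and parameter values are illustrative assumptions.

```python
import random


def overflow_counts(n, r, capacity, seed=0):
    """Return (direct overflow count, value of the occupancy formula for w(r))."""
    rng = random.Random(seed)
    balls = [0] * n
    for _ in range(r):
        balls[rng.randrange(n)] += 1

    # Direct count: every ball beyond the capacity I of its urn falls to the floor.
    direct = sum(max(b - capacity, 0) for b in balls)

    # Empirical fractions Gamma_i(r/n) of urns holding exactly i balls, i = 0..I.
    occupancy = [
        sum(1 for b in balls if b == i) / n for i in range(capacity + 1)
    ]

    # Formula: w(r) = r - n*I + n * sum_{i=0}^{I} (I - i) * Gamma_i(r/n).
    formula = r - n * capacity + n * sum(
        (capacity - i) * occupancy[i] for i in range(capacity + 1)
    )
    return direct, formula


direct, formula = overflow_counts(n=1000, r=3000, capacity=2)
print(direct, formula)
```

The two numbers agree up to floating point rounding for any seed, since the formula is an exact identity rather than an asymptotic statement.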

In order to compute

$$J_O(\eta, \beta) = -\lim_{n} \frac{1}{n} \log P\left( \frac{w(\beta n)}{n} > \eta \right),$$

we therefore consider sample paths which satisfy the end constraint

$$\sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2}$$

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta, \beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

$$\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad \text{and} \qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3}$$


We introduce the Lagrangian

$$L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty} \pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right) + \lambda\left(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\right),$$

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

$$\log\pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+\lambda.$$

In terms of variables $C$, $\rho$ and $\nu$, we may write

$$\pi^*_i = CP_i(\rho\beta) \qquad \text{for } i \ge I$$

and

$$\pi^*_i = CP_i(\rho\beta)\nu^{I-i} \qquad \text{for } i \le I.$$

The distribution is conditionally Poisson $\rho\beta/\nu$ for $i \le I$ and conditionally Poisson $\rho\beta$ for $i \ge I$. For convenience, we introduce the notation

$$Q_I(\rho) = \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} iP_i(\rho\beta),$$

$$R_I(\rho, \nu) = \nu^I e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} iP_i\left(\frac{\rho\beta}{\nu}\right).$$

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

$$C(R_I(\rho, \nu) + Q_I(\rho)) = 1,$$

$$C\left( \frac{\rho\beta}{\nu} R_I(\rho, \nu) + \rho\beta Q_I(\rho) \right) = \beta,$$

$$C\left( IP_I(\rho\beta) + \left(I - \frac{\rho\beta}{\nu}\right) R_I(\rho, \nu) \right) = \zeta.$$

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \,\|\, P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

$$(\rho/\nu - 1)R_I(\rho, \nu) + (\rho - 1)Q_I(\rho) = 0,$$

$$(I - \rho\beta/\nu - \zeta)R_I(\rho, \nu) - \zeta Q_I(\rho) + IP_I(\rho\beta) = 0.$$

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed as

$$\begin{aligned}
J_O(\eta, \beta) &= D(\pi^* \,\|\, P(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl(Ce^{\beta - \rho\beta}\rho^i\nu^{(I - i)^+}\bigr) \\
&= \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}$$

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

$$J_C(\alpha, \beta, \xi) = -\lim_{n} \frac{1}{n} \log P\left( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \right),$$

where

$$\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta)$$

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

$$\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi,$$

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

$$\omega_{kj} = \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta)W, & j + k \le I, \\ C_k P_j(\rho\beta), & j + k > I. \end{cases}$$

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

$$J_C(\alpha, \beta, \xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k \log C_k.$$

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
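The zero-cost figures quoted in this example can be reproduced with a few lines of Python. The sketch assumes that the expected fraction of types with at most $I$ coupons is $\sum_k \alpha_k \sum_{i=0}^{I-k} P_i(\beta)$, which is the expression bounding $\xi$ in the definition of $J_C$ above. The function names are hypothetical, and the last line only converts the quoted exponent $J_C \approx 0.18$ into the rough probability $e^{-nJ_C}$; it does not compute $J_C$ itself.

```python
import math


def poisson_cdf(m, lam):
    """P(Poisson(lam) <= m); the empty sum gives 0 when m < 0."""
    return sum(
        math.exp(-lam) * lam ** i / math.factorial(i) for i in range(m + 1)
    )


def zero_cost_low_occupancy(alpha, beta, I):
    """sum_k alpha_k * P(Poisson(beta) <= I - k): expected fraction with <= I coupons."""
    return sum(a * poisson_cdf(I - k, beta) for k, a in enumerate(alpha))


alpha = [0.5, 0.3, 0.2]            # initial conditions from the example
psi3 = zero_cost_low_occupancy(alpha, beta=2.0, I=3)
expected_types_with_four = 100 * (1.0 - psi3)

# Rough probability implied by the quoted exponent, exp(-n * J_C) with n = 100.
prob_estimate = math.exp(-100 * 0.18)
print(psi3, expected_types_with_four, prob_estimate)
```

The value of `psi3` is close to 0.71, so roughly 29 of the 100 types are expected to have four or more coupons, in line with the text.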

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and that such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

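A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

$$\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \qquad \text{(monotonicity)} \tag{A.4}$$

and

$$\sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \qquad \text{(conservation)}. \tag{A.5}$$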

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

$$\begin{aligned}
\sum_{i=0}^{I} \left( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \right) &= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned} \tag{A.6}$$

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

$$\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}$$

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot{\psi}_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot{\psi}_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot{\psi}_i \le 1$ follows from (A.5).
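The two conditions of Lemma 2.4 translate directly into a feasibility test. The following Python sketch is one way to write it; the function name, the list layout (occupancies $0, \ldots, I$ followed by the $I+$ class) and the tolerance are assumptions made for illustration.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check (A.4) and (A.5) for occupancy vectors of length I + 2.

    alpha[i], omega[i] for i = 0..I are the fractions of urns with i balls;
    the last entry of each list is the fraction with more than I balls.
    """
    I = len(alpha) - 2

    # (A.4) monotonicity: cumulative low occupancies can only decrease.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # (A.5) conservation: the final ball count cannot exceed initial plus thrown.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol


# Empty initial conditions with I = 1 (illustrative values).
print(is_feasible([1.0, 0.0, 0.0], [0.15, 0.30, 0.55], beta=3.0))
print(is_feasible([1.0, 0.0, 0.0], [0.15, 0.30, 0.55], beta=1.0))
```

The first call passes both conditions; the second fails conservation, since the terminal occupancies require at least 1.4 balls per urn while only 1.0 is thrown.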

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot{\gamma}$, $\dot{\psi}$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot{\psi}_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot{\psi}$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

$$J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx.$$

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

$$\begin{aligned}
L(\psi, \theta) &= D(\theta \,\|\, \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left(1 - \sum_{i=0}^{I} \theta_i\right) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{aligned}$$


The rate function is defined to be infinity if the curve is not a cumulativeoccupancy function

The calculus of variations problem is to find the path ψ having least costamong all paths satisfying given initial conditions and endpoint constraintsand to find the cost of such a minimal path As illustrated in the examplesin Section 4 the results extend to cases where the terminal point (or theinitial point) are required to lie in a given constraint set

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[
\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx} \frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))
\]

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Bigr] \tag{A.7}
\]

for $i = 0, \ldots, I - 1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\Bigl[-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Bigr]. \tag{A.8}
\]

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[
L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\Bigr] \tag{A.10}
\]

for $i = 0, \ldots, I - 1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty: $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$, which satisfy the equations

\[
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1,
\]

\[
\sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C P_i(\rho\beta) = \beta.
\]
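The two equations for the twist parameters can be solved numerically by eliminating $C$: dividing the second equation's tail term by the first's shows that $\rho\beta$ must make the conditional mean of a Poisson variable, given that it exceeds $I$, equal to $(\beta - \sum_i i\omega_i)/(1 - \sum_i \omega_i)$. The sketch below uses plain bisection on that scalar equation. It is an illustration under stated assumptions, not the authors' procedure: the function names, the truncation of the Poisson tail at `n_terms` terms, and the bracketing strategy are all choices made here, and the routine covers only the exponential case ($C > 0$) with moderate values of $\rho\beta$.

```python
import math

def poisson_pmf(i, lam):
    """P_i(lam) = exp(-lam) lam^i / i!, computed in log space."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def tail_sums(I, lam, n_terms=200):
    """Truncated tail mass and tail mean: sums over i > I of P_i and i * P_i."""
    mass = mean = 0.0
    for i in range(I + 1, I + 1 + n_terms):
        p = poisson_pmf(i, lam)
        mass += p
        mean += i * p
    return mass, mean

def twist_parameters(omega, beta):
    """Return (C, rho) for terminal constraints omega_0..omega_I and mean beta."""
    I = len(omega) - 1
    rem_mass = 1.0 - sum(omega)                               # mass left for levels > I
    rem_mean = beta - sum(i * w for i, w in enumerate(omega)) # mean left for levels > I
    if rem_mass <= 0.0 or rem_mean <= (I + 1) * rem_mass:
        raise ValueError("not an exponential case with C > 0")
    target = rem_mean / rem_mass

    def cond_mean(lam):
        mass, mean = tail_sums(I, lam)
        return mean / mass

    lo, hi = 1e-8, 1.0
    while cond_mean(hi) < target:   # the conditional mean increases with lam
        hi *= 2.0
    for _ in range(200):            # bisection on lam = rho * beta
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    mass, _ = tail_sums(I, lam)
    return rem_mass / mass, lam / beta
```

As a sanity check, if the terminal constraints are exactly the zero-cost Poisson values $\omega_i = P_i(\beta)$, the solver should return $C = 1$ and $\rho = 1$.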

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{k}, \tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^{k}. \tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\phi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[
(-1)^i \phi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\,\beta^{-i}\,\frac{k!}{(k-i)!}\Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1} \psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\phi(x) \doteq (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[
\begin{aligned}
\phi(\beta) &= (-1)^I \psi_0^{(I)}(\beta) \\
&= \rho^I C e^{-\rho\beta} + \Bigl(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I} I! \\
&= \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1} \omega_I \ge 0,
\end{aligned}
\]

so that $\phi(x) > 0$ on $[0, \beta)$. It follows that $\phi(x)$ is completely monotone on $[0, \beta]$, and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1} \omega_I > 0 \quad \text{for all } x \in [0, \beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Bigl(1 - \sum_{i=0}^{I} P_i(\rho\beta)\Bigr) + \sum_{i=0}^{I} \omega_i = 1.
\]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I + 1, \ldots. \tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$ the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1} \psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1} \psi_0^{(i+1)}(x) \ge 0
\end{aligned} \tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]
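The two normalizations just derived, $\sum_i \gamma_i(x) = 1$ and $\sum_i \theta_i(x) = 1$, are easy to confirm numerically for an exponential extremal. The sketch below is illustrative only: it fixes $I = 1$, picks the values $\beta = 2$, $\rho = 1.2$, $C = 0.9$ arbitrarily, and then chooses $\omega_0, \omega_1$ so that the two twist-parameter equations hold. All names and the truncation of the infinite sums at 80 terms are assumptions made here.

```python
import math

def poisson_pmf(i, lam):
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

# Illustrative parameters (assumptions, not taken from the paper), with I = 1.
beta, rho, C, I = 2.0, 1.2, 0.9, 1
lam = rho * beta
tail_mass = 1.0 - poisson_pmf(0, lam) - poisson_pmf(1, lam)  # sum_{i>1} P_i(lam)
tail_mean = lam - poisson_pmf(1, lam)                        # sum_{i>1} i P_i(lam)
omega1 = beta - C * tail_mean            # from the mean equation
omega0 = 1.0 - C * tail_mass - omega1    # from the mass equation
omega = [omega0, omega1]
coeff = [omega[k] - C * poisson_pmf(k, lam) for k in range(I + 1)]

def abs_derivative(i, x):
    """(-1)^i psi_0^{(i)}(x), following the derivative formula in Lemma A.2."""
    value = rho ** i * C * math.exp(-rho * x)
    for k in range(i, I + 1):
        value += (coeff[k] * beta ** (-i)
                  * math.factorial(k) / math.factorial(k - i)
                  * (1.0 - x / beta) ** (k - i))
    return value

def gamma(i, x):
    """Occupancy gamma_i(x) from (A.13)."""
    return x ** i / math.factorial(i) * abs_derivative(i, x)

def theta(i, x):
    """Rate theta_i(x) from (A.14)."""
    return x ** i / math.factorial(i) * abs_derivative(i + 1, x)

def total(f, x, n_terms=80):
    """Truncated sum over i of f(i, x)."""
    return sum(f(i, x) for i in range(n_terms))
```

Evaluating `total(gamma, x)` and `total(theta, x)` at several points of $[0, \beta]$ gives 1 up to rounding, and `gamma(i, beta)` reproduces $\omega_i$ for $i \le I$.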

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0, \beta]$. Thus, the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

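From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).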

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

\[
(-1)^i \psi_0^{(i)}(x) = C \rho^i e^{-\rho x} \quad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0, \ldots, \gamma_I, \gamma_{I+}\}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \bigl[\gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)|\bigr]. \tag{A.17}
\]

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} (-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0, \beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]

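Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr) \log|\psi_0^{(i)}(x)|\,dx.
\end{aligned} \tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$, since for $i > I$,

\[
-\dot\psi_i(x) \log|\psi_0^{(i)}(x)| = \rho\,\gamma_i(x)\,[\log C + i \log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts, and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty} \bigl[\gamma_i \log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.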

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I - 1,
\]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[\gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i |\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero-cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\,\pi_i\Bigr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$ and the strict convexity of $x \log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x \log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[
\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\,\pi_i - z\,i\,\pi_i \ge \pi^*_i \log\frac{\pi^*_i}{P_i(\beta)} - y\,\pi^*_i - z\,i\,\pi^*_i.
\]
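For completeness, the pointwise minimization behind this inequality can be written out. The following worked step is a sketch that uses only the definitions of $y$, $z$ and $P_i$ given above; it shows why the stationary point is exactly $C P_i(\rho\beta)$.

```latex
\[
\frac{d}{dx}\Bigl(x\log x - x\,[\log P_i(\beta) + y + iz]\Bigr)
  = \log x + 1 - \log P_i(\beta) - y - iz = 0
  \;\Longrightarrow\;
  x = P_i(\beta)\,e^{\,y - 1 + iz}.
\]
With $z = \log\rho$ and $y = \log C + (1-\rho)\beta + 1$,
\[
P_i(\beta)\,e^{\,y - 1 + iz}
  = \frac{e^{-\beta}\beta^i}{i!}\; C\,e^{(1-\rho)\beta}\,\rho^i
  = C\,\frac{e^{-\rho\beta}(\rho\beta)^i}{i!}
  = C\,P_i(\rho\beta).
\]
```

Since $x \log x$ is strictly convex, this stationary point is the unique global minimizer on $[0, \infty)$.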

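Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, P(\beta)) &= \inf_{\pi \in F(1, \omega, \beta)} L(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
&= D(\pi^* \,\|\, P(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \,\|\, P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]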

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)\bigl(\log C + (1 - \rho)\beta\bigr) + \Bigl(\beta - \sum_{i=0}^{I} i\,\omega_i\Bigr) \log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
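The explicit cost can be cross-checked against a direct evaluation of the relative entropy $D(\pi^* \,\|\, P(\beta))$. The sketch below does this for one illustrative parameter set. The values $\beta = 2$, $\rho = 1.2$, $C = 0.9$ with $I = 1$ are arbitrary assumptions, $\omega_0, \omega_1$ are then chosen so that the twist-parameter equations hold, and the infinite tail is truncated at 200 terms; none of the names come from the paper.

```python
import math

def log_poisson_pmf(i, lam):
    """log P_i(lam)."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

# Illustrative parameters (assumptions), with I = 1.
beta, rho, C, I = 2.0, 1.2, 0.9, 1
lam = rho * beta
p0 = math.exp(log_poisson_pmf(0, lam))
p1 = math.exp(log_poisson_pmf(1, lam))
omega1 = beta - C * (lam - p1)                 # mean equation
omega0 = 1.0 - C * (1.0 - p0 - p1) - omega1    # mass equation
omega = [omega0, omega1]

def head_cost():
    """sum_{i<=I} omega_i log(omega_i / P_i(beta))."""
    return sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega))

def relative_entropy_cost(n_terms=200):
    """D(pi* || P(beta)) with pi*_i = omega_i (i <= I), C P_i(rho beta) (i > I)."""
    total = head_cost()
    for i in range(I + 1, I + 1 + n_terms):
        log_pi = math.log(C) + log_poisson_pmf(i, lam)
        total += math.exp(log_pi) * (log_pi - log_poisson_pmf(i, beta))
    return total

def closed_form_cost():
    """The explicit expression for J(gamma) in terms of rho, C, omega, beta."""
    mass = 1.0 - sum(omega)
    mean = beta - sum(i * w for i, w in enumerate(omega))
    return head_cost() + mass * (math.log(C) + (1.0 - rho) * beta) + mean * math.log(rho)
```

The two functions agree up to rounding, which reflects the identity $\log\bigl(C P_i(\rho\beta)/P_i(\beta)\bigr) = \log C + (1 - \rho)\beta + i \log\rho$ summed against the tail of $\pi^*$.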

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k \beta_k$. It turns out, in the solution of the Euler–Lagrange equations, that the fraction of balls entering the $k$th class in any time period is $\alpha_k \beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{(k)} \,\|\, P(\beta)) \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{k,j} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\,\pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22}
\]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$ as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{k,j} = 0, \qquad k + j > I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary until the constraint $\alpha_0 \pi_{0,1} + \alpha_1 \pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \Bigl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k\,D(\pi^{(k)} \,\|\, P(\beta)) \le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha, \omega, \beta) \doteq F(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\,\pi^{(n)}_{k,j} \le \varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x \log x + x - 1]$, we have the inequality

\[
xy \le (x \log x - x + 1) + (e^y - 1),
\]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{(1/\delta) j} P_j(\beta) = \sum_{j=0}^{\infty} e^{(1/\delta) j}\,\frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
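Both ingredients of the estimate, the inequality $xy \le (x \log x - x + 1) + (e^y - 1)$ and the closed form of the Poisson exponential moment, can be spot-checked numerically. This is a small sketch with made-up function names; the series is truncated at 400 terms, which is an assumption that is adequate only for moderate $\beta e^{1/\delta}$.

```python
import math

def poisson_exp_moment(beta, delta, n_terms=400):
    """Truncated sum over j of exp(j / delta) * P_j(beta), term by term in log space."""
    return sum(
        math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
        for j in range(n_terms)
    )

def exp_moment_closed_form(beta, delta):
    """exp(beta * e^{1/delta} - beta)."""
    return math.exp(beta * math.exp(1.0 / delta) - beta)

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with 0 log 0 = 0; should be >= 0."""
    entropy_part = 1.0 if x == 0.0 else x * math.log(x) - x + 1.0
    return entropy_part + (math.exp(y) - 1.0) - x * y
```

The gap in `young_gap` vanishes exactly when $x = e^y$, which is the maximizer in the supremum that defines $e^y - 1$.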

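Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\,\pi^{(n)}_{k,j} &= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{k,j} \log\Bigl(\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\Bigr) - \pi^{(n)}_{k,j} + P_j(\beta)\Bigr] + \delta \sum_{j \ge m} \bigl(e^{(1/\delta) j} - 1\bigr) P_j(\beta) \\
&= \delta\,D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) + \delta \sum_{j \ge m} \bigl(e^{(1/\delta) j} - 1\bigr) P_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} \bigl(e^{(1/\delta) j} - 1\bigr) P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]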

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k} \alpha_k \beta_k = \lim_{n} \sum_{k} \alpha_k \beta^{(n)}_k = \lim_{n} \beta = \beta,
\]

so that $\pi \in F(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric, and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{(k)} \,\|\, P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k\,D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{*}_{(k)} \,\|\, P(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)
\]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \to \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{k,j}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k \alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda \log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k \beta^{(n)}_k \to \alpha_k \beta_k$, even if $\alpha_k = 0$.

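Thus

\[
\sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k \alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)})) &\ge \liminf_{n} \sum_{k : \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k\,D(\pi^{(k)} \,\|\, P(\beta)) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}
\]

as desired.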

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints, the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

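Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{m,n} > 0$, that $\tilde\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\,\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\tilde\pi_{k,j},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha, \tilde\omega, \tilde\beta)$. Next, for any $\eta > 0$, we may define

\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1 - \eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \hat\omega, \hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha, \omega, \beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \tilde\omega, \tilde\beta)$, and hence for $(\alpha, \hat\omega, \hat\beta)$.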

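We may now let $\hat\pi$ be a feasible point for $(\alpha, \hat\omega, \hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1 - \eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha, \omega, \beta)$, and in addition $\bar\pi_{m,n} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Bigl(\log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)} + 1\Bigr)(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)}\,(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m \bar\pi_{m,n} \log\frac{\pi^{\varepsilon}_{m,n}}{P_n(\beta)} + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \bar\pi_{k,j} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{k,j} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k} \alpha_k\,D(\pi^{(k)} \,\|\, P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{k,j} \log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.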

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{k,j} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)} + \sum_{k \in \mathcal{K}} z_k \alpha_k \Bigl(1 - \sum_{k+l \in \mathcal{I}} \pi_{k,l}\Bigr) + \sum_{i \in \mathcal{I}} w_i \Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\,\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{k,j} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
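The product form $D_k P_j(\beta) W_{k+j}$ suggests a simple numerical route to the minimizer in the polynomial case: start from $\alpha_k P_{i-k}(\beta)$ and alternately rescale by class $k$ and by terminal level $i$ until both sets of constraints hold. This is iterative proportional fitting, a standard matrix-scaling technique that is swapped in here for illustration and is not the argument used in the proof. The function name, the dictionary layout, the fixed number of sweeps, and the example data are all assumptions, and the sketch presumes feasible, irreducible polynomial constraints.

```python
import math

def poisson_pmf(j, lam):
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def fit_polynomial_minimizer(alpha, omega, beta, sweeps=500):
    """Approximate pi*_{k,j} = D_k P_j(beta) W_{k+j} by alternating rescaling.

    alpha : dict {k: alpha_k} over classes with alpha_k > 0
    omega : list omega_0..omega_I summing to 1 (polynomial case)
    Returns a dict {(k, j): pi_{k,j}}.
    """
    I = len(omega) - 1
    levels = [i for i in range(I + 1) if omega[i] > 0.0]
    # m[k, i] = alpha_k * pi_{k, i-k}; the start has the required product form.
    m = {(k, i): a * poisson_pmf(i - k, beta)
         for k, a in alpha.items() for i in levels if i >= k}
    for _ in range(sweeps):
        for k, a in alpha.items():          # class constraint: sum_i m[k, i] = alpha_k
            s = sum(v for (kk, _), v in m.items() if kk == k)
            for key in m:
                if key[0] == k:
                    m[key] *= a / s
        for i in levels:                    # terminal constraint: sum_k m[k, i] = omega_i
            s = sum(v for (_, ii), v in m.items() if ii == i)
            for key in m:
                if key[1] == i:
                    m[key] *= omega[i] / s
    return {(k, i - k): v / alpha[k] for (k, i), v in m.items()}
```

Every rescaling multiplies entries by a factor depending only on $k$ or only on $k + j$, so the iterate keeps the product form throughout; at convergence the class and terminal constraints hold as well. For instance, with `alpha = {0: 0.5, 1: 0.5}`, `omega = [0.2, 0.4, 0.4]` and `beta = 0.7`, the conservation constraint (A.21) is then satisfied automatically, as noted for the polynomial case.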

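We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{k,j} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\,\pi^*_{k,l}, \qquad \eta^{(M)}_k \doteq \sum_{l \le M} \pi^*_{k,l},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)}
\]

subject to the constraints

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{k,l} = \beta^{(M)},
\]

\[
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{k,l} = \eta^{(M)}_k, \qquad k \in \mathcal{K},
\]

\[
\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\,\pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.
\]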

By construction the minimizer of this problem is obtained simply by trun-cating πlowast As in the previous lemma the strict positivity of the minimizertogether with the linearity of the constraints guarantees the existence ofLagrange multipliers via [4] Proposition 133 The Lagrangian for the M thproblem is

L(M) =sum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

+ y(M)

(

β(M) minussum

kisinK

αk

sum

lleMk+lisinI

lπkl

)

+Ksum

kisinK

z(M)k αk

(

η(M)k minus

sum

lleMk+lisinI

πkl

)

+sum

iisinI

w(M)i

(

ωi minussum

kleikisinK

αkπkiminusk

)

Since the derivative of L with respect to πkj must be zero at the minimizerit follows that

πlowastkj =Pj(β)ejy(M)minus1+z

(M)k

+w(M)k+j

for all j ltM and k + j isin I where for convenience we have defined wi = 0for i gt I For k+ j gt I note that

πlowastkj+1

πlowastkj=

Pj+1(β)

Pj(β)ey

(M)

so that y(M) is independent of M Since w(M)k+j = 0 if k+ j gt I it follows that

z(M)k and w

(M)i are also independent of M Then the above expression for

πlowastkj holds for all k+ j isin I with fixed Lagrange multipliers y zk and wiThe form expressed in the theorem is based on the substitutions ρ = ey Ck = e(ρminus1)βminus1+zk and Wi = ewi
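The final substitution step can be checked numerically. The sketch below is only an illustration under assumed values: the function names and the sample numbers for $\beta$, $y$ and $z_k$ are hypothetical and do not come from the paper. It compares the multiplier form $P_j(\beta)e^{jy-1+z_k}$ with the twisted form $C_kP_j(\rho\beta)$ for indices with $w_{k+j}=0$.

```python
import math

def poisson_pmf(j, mean):
    """P_j(mean) = exp(-mean) * mean**j / j!"""
    return math.exp(-mean) * mean**j / math.factorial(j)

def multiplier_form(j, beta, y, z_k):
    """P_j(beta) * exp(j*y - 1 + z_k): the Lagrange-multiplier form when w_{k+j} = 0."""
    return poisson_pmf(j, beta) * math.exp(j * y - 1.0 + z_k)

def twisted_form(j, beta, y, z_k):
    """C_k * P_j(rho*beta) with rho = e^y and C_k = e^{(rho-1)*beta - 1 + z_k}."""
    rho = math.exp(y)
    c_k = math.exp((rho - 1.0) * beta - 1.0 + z_k)
    return c_k * poisson_pmf(j, rho * beta)

# Hypothetical sample values, chosen only to exercise the identity.
for j in range(6):
    a = multiplier_form(j, beta=1.3, y=0.4, z_k=-0.2)
    b = twisted_form(j, beta=1.3, y=0.4, z_k=-0.2)
    print(j, a, b)
```

The two columns should agree up to floating-point rounding, because $P_j(\beta)e^{jy} = e^{(\rho-1)\beta}P_j(\rho\beta)$ when $\rho = e^{y}$.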

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\pi^*_{kj}$. In the exponential case, we define $\rho_k \doteq \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I-k$ terminal constraints, has mean $\beta_k$, and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0},\dots,\omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega(k),\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega(k),\beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\pi^*_{kj}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k \doteq \sum_j j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega(k),\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta),\qquad i = 0,\dots,I,
\]

\[
\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define the rescaled curves $\bar\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i}\alpha_k\bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty}-\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}-\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

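In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\bar\psi_{k0}(x) &= C_ke^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho_k\beta_k)\bigr)\Bigl(1-\frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i}\\
&= C_ke^{-\rho x} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k}(W_{k+i}-1)P_i(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_kP_i(\rho\beta)W_{k+i}$ when $i\le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\Bigl[\rho^ke^{-\rho x} + \sum_{i=k}^{I}(W_i-1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\Bigl(1-\frac{x}{\beta}\Bigr)^{i-k}\Bigr]\\
&= \alpha_0C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k}(W_{k+j}-1)P_j(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{j}\Bigr]\\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\bar\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]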

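Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, and writing $\bar\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the rescaled curve, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\bar\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i}\alpha_k\frac{\beta_k}{\beta}\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\bar\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1-\psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.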

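In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\bar\psi_{k0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\tag{A.29}
\]

for $i = 0,\dots,I$, establishing the restricted set of equations (A.10). Here $\bar\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ denotes the rescaled zero-occupancy curve of the $k$th subproblem.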

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i = 0,\dots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x\in[0,\beta)$, $i\le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

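Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence, writing $\bar\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ and $\bar\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the rescaled subproblem curves, the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K}\alpha_k\log|\psi_0^{(k)}(0)|\\
&= \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(\beta)\log\frac{|\psi_0^{(k+j)}(\beta)|}{|\psi_0^{(k)}(0)|\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\bar\psi_{k0}(0) = 1$ together with (A.25) implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\times\bar\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^{j}e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K}\alpha_kD(\bar\gamma_{k\cdot}(\beta)\,\|\,\mathcal{P}(\beta)).
\]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).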

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma\in\mathcal{O} : \gamma(0) = \alpha, \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.

(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1-\varepsilon)\gamma+\varepsilon\tilde\gamma$ and define the function $G : [0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta}D(\theta^{\varepsilon}(x)\,\|\,\gamma^{\varepsilon}(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma)\le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma+\varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta}L(\psi,\theta)\,dx$. After defining

\[
\eta(x) \doteq \tilde\psi(x)-\psi(x),
\]

where $\tilde\psi$ is the cumulative occupancy function of the competitor $\tilde\gamma$, we may write

\[
G[\varepsilon] = \int_0^{\beta}L(\psi+\varepsilon\eta,\theta-\varepsilon\dot\eta)\,dx = \int_0^{\beta}g(x,\varepsilon)\,dx.
\tag{A.30}
\]

We can assume without loss that $G[1] = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ of the extremal are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example, with $\bar\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$,

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(x)\ge\alpha_i\bar\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon}\ge(1-\varepsilon)\gamma$ and $\theta^{\varepsilon}\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0<1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where the derivative of the competing path may not exist.

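The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I}\frac{\partial L}{\partial\psi_i}\Big|_{\gamma^{\varepsilon}}(x)\,\eta_i(x) - \sum_{i=0}^{I}\frac{\partial L}{\partial\theta_i}\Big|_{\gamma^{\varepsilon}}(x)\,\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)}+\frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\]

\[
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)}-\log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon}\le 1$, $\theta^{\varepsilon}\le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\Bigl|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\Bigr| < B\qquad\text{for }\varepsilon\in[0,\varepsilon_0],\ \text{a.e. }x\in[0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.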

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta}\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx
\]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[
\int_0^{x}\frac{\partial L}{\partial\psi_i}\Big|_{\gamma}\,dx' = \int_0^{x}\Bigl(-\frac{\theta_i}{\gamma_i}+\frac{\theta_{i+1}}{\gamma_{i+1}}\Bigr)\Big|_{\gamma}\,dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^{\beta}\Bigl(\frac{\partial L}{\partial\psi}(x)\cdot\eta(x)-\frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\Bigr)dx\\
&= \int_0^{\beta}\dot\eta(x)\cdot\Bigl(-\int_0^{x}\frac{\partial L}{\partial\psi}(x')\,dx'-\frac{\partial L}{\partial\theta}(x)\Bigr)dx\\
&= -\int_0^{\beta}\sum_{i=0}^{I}C_i\dot\eta_i(x)\,dx\\
&= \sum_{i=0}^{I}C_i(\eta_i(0)-\eta_i(\beta)) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that the extremal and the competing path have the same beginning and end points, so that $\eta(0) = \eta(\beta) = 0$.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma+(1-\lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)}\downarrow 0$ and $x_2^{(n)}\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)})\le\int_{x_1^{(n)}}^{x_2^{(n)}}D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta)\\
&\le\liminf_{n}\mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\tilde\gamma(x_2^{(n)}),x_2^{(n)}-x_1^{(n)}\bigr)\\
&= \liminf_{n}J(\gamma^{(n)})\\
&\le\lim_{n}\int_{x_1^{(n)}}^{x_2^{(n)}}D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx\\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics I: General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis

Lefschetz Center for Dynamical Studies

Brown University

Providence, Rhode Island 02912

USA

e-mail: dupuis@dam.brown.edu

C. Nuzman

P. Whiting

Bell Labs

600 Mountain Avenue

Murray Hill, New Jersey 07974

USA

e-mail: cnuzman@lucent.com

e-mail: pawhiting@lucent.com

url: cm.bell-labs.com/who/nuzman

url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References


Theorem 2.2. Suppose that the sequence of initial conditions $\{\Gamma^n(0), n = 1,2,\dots\}$ is deterministic, and that it converges to $\alpha\in\mathcal{S}_I$ as $n\to\infty$. Then the sequence of processes $\{\Gamma^n, n = 1,2,\dots\}$ satisfies the LDP with rate function $J$. In other words, for any measurable set $A$ of trajectories, with interior $A^{\circ}$ and closure $\bar A$, we have the large deviation lower bound

\[
\liminf_{n\to\infty}\frac{1}{n}\log P\{\Gamma^n\in A\}\ge-\inf\{J(\gamma) : \gamma\in A^{\circ},\ \gamma(0) = \alpha\}
\]

and the large deviation upper bound

\[
\limsup_{n\to\infty}\frac{1}{n}\log P\{\Gamma^n\in A\}\le-\inf\{J(\gamma) : \gamma\in\bar A,\ \gamma(0) = \alpha\},
\]

and moreover, for any compact set of initial conditions $K$ and $C<\infty$, the set

\[
\{\gamma : J(\gamma)\le C,\ \gamma(0)\in K\}
\]

is compact.

The proof of this result is provided in Section 3. However, a formal justification for the form of the rate function is as follows. Let $\delta > 0$ be a small time increment. Owing to the fact that $\Gamma^n$ varies slowly when compared to $b_i(\gamma)$, one expects

\[
P\Bigl(\frac{\Gamma^n(\delta)-\Gamma^n(0)}{\delta}\approx\eta\,\Big|\,\Gamma^n(0) = \gamma\Bigr)\approx P\Bigl(\frac{1}{n\delta}\sum_{j=0}^{\lfloor n\delta\rfloor-1}b_j(\gamma)\approx\eta\Bigr),
\]

where the symbol "$\approx$" inside the probability means that the indicated quantities are within a small fixed constant $\varepsilon > 0$ of each other. Suppose that we interpret $\gamma$ as a probability measure on $\{e_0,\dots,e_{I+1}\}$, and let $\{Y_j\}$ be independent and identically distributed (i.i.d.) with distribution $\gamma$. Then the sequence of i.i.d. random vectors $\{b_j(\gamma)\}$ can be realized by setting

\[
b_j(\gamma) =
\begin{cases}
e_{i+1}-e_i, & \text{if } Y_j = e_i,\ i\le I,\\
0, & \text{if } Y_j = e_{I+1},
\end{cases}
\]

that is,

\[
b_j(\gamma) = MY_j.
\]

By Sanov's theorem, for any probability vector $\theta\in\mathcal{S}_I$,

\[
P\Bigl(\frac{1}{n\delta}\sum_{j=0}^{\lfloor n\delta\rfloor-1}Y_j\approx\theta\Bigr)\approx\exp\{-n\delta D(\theta\|\gamma)\}.
\]

Thus

\[
P\Bigl(\frac{\Gamma^n(\delta)-\Gamma^n(0)}{\delta}\approx M\theta\,\Big|\,\Gamma^n(0) = \gamma\Bigr)\approx\exp\{-n\delta D(\theta\|\gamma)\}.
\]

Approximating an arbitrary trajectory by a piecewise linear trajectory with nearly equal cost, and using the Markov property, one expects

\[
P(\Gamma^n\approx\gamma\,|\,\Gamma^n(0) = \alpha)\approx\exp\Bigl\{-n\int_0^{\beta}D(\theta(t)\|\gamma(t))\,dt\Bigr\},
\]

where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus the rate function on path space should be $J(\gamma)$.

The zero cost trajectories are, of course, the law of large numbers limits, and can easily be computed. For example, if $\alpha = (1,0,\dots)$ (so all cells are initially empty) and $i\le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t}t^i/i!$. In other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass corresponding to $i > I$ is collected together into the state $I+1$. Throughout this paper we will denote the Poisson distribution with mean $t$ by $\mathcal{P}(t)$, where $P_i(t) = e^{-t}t^i/i!$.
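The law of large numbers limit for empty initial conditions can be illustrated by a small simulation. This is a minimal sketch, not part of the paper's analysis: the function names, the seed, and the sample sizes are assumptions chosen for illustration. It throws $\lfloor\beta n\rfloor$ balls uniformly into $n$ urns and compares the fraction of urns holding $i$ balls with $P_i(\beta)$, lumping all occupancies above $I$ into one state.

```python
import math
import random
from collections import Counter

def poisson_pmf(i, t):
    """P_i(t) = exp(-t) * t**i / i!"""
    return math.exp(-t) * t**i / math.factorial(i)

def empirical_occupancy(n, beta, I, seed=0):
    """Fractions of urns with 0..I balls, plus one lumped entry for more than I."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(int(beta * n)):          # beta balls per urn on average
        counts[rng.randrange(n)] += 1       # each ball picks an urn uniformly
    tally = Counter(min(c, I + 1) for c in counts)
    return [tally.get(i, 0) / n for i in range(I + 2)]

def lln_limit(beta, I):
    """Zero-cost endpoint: Poisson(beta) with the mass above I collected together."""
    head = [poisson_pmf(i, beta) for i in range(I + 1)]
    return head + [1.0 - sum(head)]

if __name__ == "__main__":
    for emp, lim in zip(empirical_occupancy(20000, 1.5, 3), lln_limit(1.5, 3)):
        print(f"empirical {emp:.4f}   limit {lim:.4f}")
```

For large $n$ the two columns should be close; the large deviation results quantify how unlikely it is for them to differ by a fixed amount.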

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction principle (e.g., [11], Theorem 1.3.2) implies the following variational representation for the rate function for $\{\Gamma^n(\beta), n = 1,2,\dots\}$. Let $\mathcal{A}(\alpha,\omega,\beta)$ denote the set of valid occupancy paths $\gamma$ on $[0,\beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0), n = 1,2,\dots\}$ is deterministic, and that it converges to $\alpha$ as $n\to\infty$. Then the sequence of random vectors $\{\Gamma^n(\beta), n = 1,2,\dots\}$ satisfies the LDP with the convex rate function

\[
\mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma\in\mathcal{A}(\alpha,\omega,\beta)\}.
\]

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7 and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is in fact strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation lower bound holds for sets with no interior relative to the ambient space. Such bounds can often be proved for processes such as ours that take values in a discrete lattice. An example would be a set $A\subset\{\gamma\in\mathcal{S}_I : \gamma_{I+} = 0\}$, for which the minimizing trajectories must be polynomial extremals (defined after Theorem 2.6). Although we do not need such results in the present paper, it is worth observing that lower bounds of this kind can be proved.

2.2. Characterization of the terminal rate function and minimizing paths. The results of this section were obtained using calculus of variations techniques, the details of which are given in the Appendix. The presentation begins with the case in which all urns are initially empty, because it appears in many applications and because it is a building block for the general case. In each case, we first give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional form contains parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha,\omega,\beta)$, where $\alpha,\omega\in\mathcal{S}_I$ are the initial and terminal occupancies, and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I+1$ balls, and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I+1$.

Definition 2.1. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if the corresponding set of valid occupancy paths $\mathcal{A}(\alpha,\omega,\beta)$ is nonempty.

Lemma 2.4. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if and only if

\[
\sum_{j=0}^{i}\alpha_j\ge\sum_{j=0}^{i}\omega_j,\qquad i = 0,\dots,I\quad\text{(monotonicity)}
\tag{2.3}
\]

and

\[
\sum_{i=0}^{I}i\omega_i+(I+1)\omega_{I+}\le\sum_{k=0}^{I+1}k\alpha_k+\beta\quad\text{(conservation)}.
\tag{2.4}
\]

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn, plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
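The two feasibility conditions translate directly into a check on the vectors $\alpha$ and $\omega$. The sketch below is a hedged illustration: the function name, the list layout (entries $0,\dots,I$ followed by the lumped entry $I+1$ or $I+$), and the tolerance are assumptions made here and are not taken from the paper.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (2.3) and conservation (2.4) for an endpoint constraint.

    alpha, omega: sequences of length I + 2 holding the occupancies for
    0..I balls, followed by the last entry (alpha_{I+1}, respectively omega_{I+}).
    """
    if len(alpha) != len(omega):
        raise ValueError("alpha and omega must have the same length")
    I = len(alpha) - 2

    # Monotonicity (2.3): partial sums of alpha dominate those of omega.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # Conservation (2.4): a lower bound on final balls per urn cannot exceed
    # the initial balls per urn plus the beta balls per urn thrown.
    lower_bound_final = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    available = sum(k * alpha[k] for k in range(I + 2)) + beta
    return lower_bound_final <= available + tol

if __name__ == "__main__":
    print(is_feasible([1.0, 0.0, 0.0], [0.5, 0.3, 0.2], beta=1.0))  # enough balls
    print(is_feasible([1.0, 0.0, 0.0], [0.5, 0.3, 0.2], beta=0.5))  # too few balls
```

In the first call the terminal occupancy needs at least 0.7 balls per urn and 1.0 are thrown; in the second only 0.5 are thrown, so conservation fails.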

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha,\omega,\beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i<I$. Otherwise the constraints are termed reducible.

For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i}\alpha_k = \sum_{k=0}^{i}\omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i+1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha,\omega,\beta)$ may be divided into a finite number of isolated irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $F(1,\omega,\beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0,\dots,I$ and the constraint

\[
\sum_{i=0}^{\infty}i\pi_i = \beta.
\]

In the empty case, the conditions for feasibility of $(1,\omega,\beta)$ reduce to $\sum_{i=0}^{I}i\omega_i+(I+1)\omega_{I+}\le\beta$, from which it follows that the set $F$ is nonempty if and only if $(1,\omega,\beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $\mathcal{A}(1,\omega,\beta)$ can be represented as an infimum of a relative entropy over distributions in $F(1,\omega,\beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : \mathcal{S}_I\to\mathbb{R}_+$ defined in Corollary 2.3 may be determined as

\[
\mathcal{J}(\omega) = \min_{\pi\in F(1,\omega,\beta)}D(\pi\,\|\,\mathcal{P}(\beta))
\]

if $(1,\omega,\beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^*\in F(1,\omega,\beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show in fact that the solution takes the form $\pi^*_i = CP_i(\rho\beta)$ for $i > I$, for some constants $C\ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty}i\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty}\pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the strict equality $\sum_{i=0}^{I}i\omega_i+(I+1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I+1$ if necessary to ensure that $\omega_{I+} = 1-\sum_{i=0}^{I}\omega_i = 0$. In this case it turns out that $F(1,\omega,\beta)$ has only one element; we then have $C = 0$, and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[
\frac{\rho\beta-\sum_{i=0}^{I}iP_i(\rho\beta)}{1-\sum_{i=0}^{I}P_i(\rho\beta)} = \frac{\beta-\sum_{i=0}^{I}i\omega_i}{1-\sum_{i=0}^{I}\omega_i}.
\tag{2.5}
\]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I+1$. The left-hand side of the equation is the conditional mean $E[Y\,|\,Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0,\infty)$ to $(I+1,\infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[
C = \frac{1-\sum_{i=0}^{I}\omega_i}{1-\sum_{i=0}^{I}P_i(\rho\beta)} = \frac{\beta-\sum_{i=0}^{I}i\omega_i}{\rho\beta-\sum_{i=0}^{I}iP_i(\rho\beta)}.
\tag{2.6}
\]

Evaluating the relative entropy of $\pi^*$ and $\mathcal{P}(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

\[
\mathcal{J}(\omega) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)}+\Bigl(1-\sum_{i=0}^{I}\omega_i\Bigr)(\log C+(1-\rho)\beta)+\Bigl(\beta-\sum_{i=0}^{I}i\omega_i\Bigr)\log\rho.
\tag{2.7}
\]
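Equations (2.5) to (2.7) can be evaluated numerically in a few lines. The following is a sketch under stated assumptions rather than the authors' procedure: bisection is used here as one convenient root finder for (2.5), the Poisson tail is summed directly for numerical stability, and all function names are hypothetical. The input `omega` holds $\omega_0,\dots,\omega_I$ only, and the code covers the case of strict inequality in (2.4).

```python
import math

def poisson_pmf(i, mean):
    """P_i(mean) = exp(-mean) * mean**i / i!"""
    return math.exp(-mean) * mean**i / math.factorial(i)

def tail_sums(mean, I, max_terms=10_000):
    """Return (sum_{i>I} P_i(mean), sum_{i>I} i * P_i(mean)) by direct summation."""
    term = poisson_pmf(I + 1, mean)
    mass = first_moment = 0.0
    i = I + 1
    for _ in range(max_terms):
        mass += term
        first_moment += i * term
        i += 1
        term *= mean / i                      # P_i from P_{i-1}
        if i > mean and term < 1e-17 * mass:  # remaining terms are negligible
            break
    return mass, first_moment

def solve_twist(omega, beta, iters=200):
    """Solve (2.5) for rho by bisection, then obtain C from (2.6)."""
    I = len(omega) - 1
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    if tail_mass <= 0.0 or tail_mean <= (I + 1) * tail_mass:
        raise ValueError("requires strict inequality in the conservation condition")
    target = tail_mean / tail_mass            # right-hand side of (2.5)

    def conditional_mean(rho):                # E[Y | Y > I], Y ~ Poisson(rho * beta)
        mass, first_moment = tail_sums(rho * beta, I)
        return first_moment / mass

    lo = hi = 1.0
    while conditional_mean(lo) > target:
        lo /= 2.0
    while conditional_mean(hi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if conditional_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = tail_mass / tail_sums(rho * beta, I)[0]
    return rho, C

def rate_function(omega, beta):
    """Evaluate (2.7) for empty initial conditions (exponential case)."""
    rho, C = solve_twist(omega, beta)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    head = sum(w * math.log(w / poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0.0)
    return head + tail_mass * (math.log(C) + (1.0 - rho) * beta) + tail_mean * math.log(rho)
```

As a sanity check, taking $\omega_i = P_i(\beta)$ should give $\rho\approx 1$, $C\approx 1$ and a rate close to zero, consistent with the law of large numbers limit; any other feasible $\omega$ should give a positive rate.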

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments, conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 26 (Globally minimizing path empty case) Suppose that(1ωβ) are feasible constraints with empty initial conditions The infimum of

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 13

J(γ) over A(1 ω β) is achieved on the occupancy path γ isinA(1 ω β) definedby

γ0(x) = Ceminusρx +Isum

k=0

(ωk minusCPk(ρβ))

(

1minusx

β

)k

(28)

γi(x) =xi

i(minus1)iγ

(i)0 (x) 1le ile I(29)

γI+(x) = 1minusIsum

i=0

γi(x)

where ρ gt 0 and C ge 0 are twist parameters associated with the constraintsIn addition γ satisfies the EulerndashLagrange equations

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0$ about $x$,

\[
\gamma_0(x+y) = \gamma_0(x) + y\,\gamma_0^{(1)}(x) + \cdots,
\]

evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term, and we say that $\gamma$ is a polynomial extremal. Otherwise, $C > 0$ and we have an exponential extremal.
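As a numerical illustration of Theorem 2.6 in the exponential case, the following Python sketch solves for $\rho$ by bisection, using the equality of the two expressions for $C$ in (2.6), and then evaluates $\gamma_0$ from (2.8) and the higher components from (2.9). The function names, the bisection bracket and the sample constraints ($I = 1$, $\beta = 2$, $\omega = (0.2, 0.3)$) are assumptions chosen for illustration and do not come from the paper.

```python
import math

def poisson_pmf(i, m):
    """Poisson probability P_i(m) = e^{-m} m^i / i!."""
    return math.exp(-m) * m**i / math.factorial(i)

def twist_parameters(omega, beta, lo=1e-3, hi=1.0, iters=200):
    """Solve (2.6) for (rho, C) in the exponential case by bisection (illustrative)."""
    I = len(omega) - 1
    target = (beta - sum(i * w for i, w in enumerate(omega))) / (1.0 - sum(omega))

    def gap(rho):
        m = rho * beta
        num = m - sum(i * poisson_pmf(i, m) for i in range(I + 1))
        den = 1.0 - sum(poisson_pmf(i, m) for i in range(I + 1))
        return num / den - target  # conditional mean E[Y | Y > I] minus its target

    while gap(hi) < 0:  # expand the bracket until it encloses the root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    m = rho * beta
    C = (1.0 - sum(omega)) / (1.0 - sum(poisson_pmf(i, m) for i in range(I + 1)))
    return rho, C

def gamma_component(i, x, omega, beta, rho, C):
    """gamma_i(x) from (2.8)-(2.9), using the exact i-th derivative of gamma_0."""
    I = len(omega) - 1
    coeff = [omega[k] - C * poisson_pmf(k, rho * beta) for k in range(I + 1)]
    deriv = C * (-rho) ** i * math.exp(-rho * x)
    for k in range(i, I + 1):
        falling = math.factorial(k) // math.factorial(k - i)
        deriv += coeff[k] * falling * (-1.0 / beta) ** i * (1.0 - x / beta) ** (k - i)
    return x**i / math.factorial(i) * (-1) ** i * deriv

# Hypothetical feasible constraints with strict inequality in the conservation condition
omega, beta = [0.2, 0.3], 2.0
rho, C = twist_parameters(omega, beta)
start = gamma_component(0, 0.0, omega, beta, rho, C)   # should equal 1 (all urns empty)
end = [gamma_component(i, beta, omega, beta, rho, C) for i in range(len(omega))]
```

Under these assumptions `start` should reproduce the empty initial condition $\gamma_0(0) = 1$, and `end` should reproduce the terminal constraint $\omega$, which is a quick consistency check on (2.6), (2.8) and (2.9).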

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k+j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $S_\infty$ denote the set of distributions on the nonnegative integers, and let $S_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $S_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi(0),\ldots,\pi(k),\ldots,\pi(K)\}$, where for any $k \in \mathcal{K}$ a component distribution is denoted $\pi(k) = \{\pi_{k0}, \pi_{k1}, \ldots\}$, and it is understood that the corresponding $\pi(k)$ is omitted if $\alpha_k = 0$. Finally, let $\mathcal{F}(\alpha,\beta,\omega)$ be the set of $\pi \in S_\infty^{|\mathcal{K}|}$ which satisfies the terminal constraints

\[
\omega_i = \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k\, \pi_{k,i-k} \qquad \text{for all } 0 \le i \le I, \tag{2.10}
\]

along with the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{kj} = \beta. \tag{2.11}
\]

As in the empty case, it may be established that $\mathcal{F}$ is nonempty if and only if $(\alpha,\omega,\beta)$ is feasible.

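Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed

\[
\mathcal{J}(\omega) = \min_{\pi \in \mathcal{F}(\alpha,\omega,\beta)} \sum_{k \in \mathcal{K}} \alpha_k\, D\bigl(\pi(k)\,\|\,\mathcal{P}(\beta)\bigr)
\]

whenever $(\alpha,\beta,\omega)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in \mathcal{F}(\alpha,\omega,\beta)$ is unique.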

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[
\pi^*_{kj} = \begin{cases}
C_k\, \mathcal{P}_j(\rho\beta)\, W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\\
C_k\, \mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I.
\end{cases}
\]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[
\pi^*_{kj} = \begin{cases}
D_k\, \mathcal{P}_j(\beta)\, W_{k+j}, & k \in \mathcal{K},\ j + k \le I,\\
0, & k + j > I.
\end{cases}
\]

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in \mathcal{K}$, denote the mean of $\pi^*(k)$ by $\beta_k = \sum_{j=0}^{\infty} j\,\pi^*_{kj}$, and define the terminal condition $\omega(k) \in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0,\ldots,I-k$. Then the constraints $(1,\omega(k),\beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma(k) = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha,\omega,\beta)$, let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0,\beta_k] \to [0,1]$ be the minimizing paths corresponding to the subproblems $(1,\omega(k),\beta_k)$. The infimum of $J(\gamma)$ over $A(\alpha,\omega,\beta)$ is achieved on the occupancy path $\gamma \in A(\alpha,\omega,\beta)$ defined by

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\, \gamma_{k,i-k}\bigl(x\beta_k/\beta\bigr), \qquad i = 0,\ldots,I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).
\]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define

\[
H(\gamma,\zeta) = \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr),
\]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma,\zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma,\zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1,2,\ldots\}$ satisfies a large deviation upper bound with a rate function $\bar J$, which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma,\zeta)$ in $\zeta$:

\[
L(\gamma,\eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} \bigl[\zeta\cdot\eta - H(\gamma,\zeta)\bigr].
\]

If $\{\gamma(x), 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then

\[
\bar J(\gamma) = \int_0^\beta L\bigl(\gamma(x),\dot\gamma(x)\bigr)\,dx.
\]

If $\gamma$ is not absolutely continuous, then $\bar J(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set, with the bound $H(\gamma,\zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma,\zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar J = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma,\eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[
L(\gamma, M\theta) = D(\theta\,\|\,\gamma).
\]


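It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma,\eta) < \infty$, then $\eta$ can be written as a convex combination of the form

\[
\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I),
\]

where $\theta_j \ge 0$ and $\sum_{j=0}^{I}\theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0,1,\ldots,I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I}\theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

\[
\begin{aligned}
L(\gamma, M\theta) &= \sup_{\zeta\in\mathbb{R}^{I+2}} \bigl[\zeta\cdot M\theta - \log\bigl(E[\exp(\zeta\cdot b_i(\gamma))]\bigr)\bigr]\\
&= \sup_{\zeta\in\mathbb{R}^{I+2}} \Biggl[M^T\zeta\cdot\theta - \log\Biggl(\Biggl[\sum_{j=0}^{I}\gamma_j \exp(\zeta_{j+1}-\zeta_j)\Biggr] + \gamma_{I+1}\Biggr)\Biggr]\\
&= \sup_{\zeta\in\mathbb{R}^{I+2}} \Biggl[\sum_{j=0}^{I}(\zeta_{j+1}-\zeta_j)\theta_j - \log\Biggl(\Biggl[\sum_{j=0}^{I}\gamma_j \exp(\zeta_{j+1}-\zeta_j)\Biggr] + \gamma_{I+1}\Biggr)\Biggr].
\end{aligned}
\]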

Given any values $\mu_0,\ldots,\mu_{I+1}$, we can define $\zeta_0,\ldots,\zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions it is apparent that the last display is equal to

\[
\begin{aligned}
&\sup_{\mu\in\mathbb{R}^{I+2}} \Biggl[\sum_{j=0}^{I}\mu_j\theta_j - \mu_{I+1}(1-\theta_{I+1}) - \log\Biggl(\Biggl[\sum_{j=0}^{I}\gamma_j\exp(\mu_j - \mu_{I+1})\Biggr] + \gamma_{I+1}\Biggr)\Biggr]\\
&\quad= \sup_{\mu\in\mathbb{R}^{I+2}} \Biggl[\sum_{j=0}^{I}\mu_j\theta_j - \mu_{I+1}(1-\theta_{I+1}) + \mu_{I+1} - \log\Biggl(\sum_{j=0}^{I+1}\gamma_j\exp\mu_j\Biggr)\Biggr]\\
&\quad= \sup_{\mu\in\mathbb{R}^{I+2}} \Biggl[\sum_{j=0}^{I+1}\mu_j\theta_j - \log\Biggl(\sum_{j=0}^{I+1}\gamma_j\exp\mu_j\Biggr)\Biggr].
\end{aligned}
\]

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta\,\|\,\gamma)$, thereby completing the proof of the upper bound.
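The variational identity used in the last step can be checked numerically. The Python sketch below evaluates the objective $\sum_j \mu_j\theta_j - \log\sum_j \gamma_j e^{\mu_j}$ at the candidate maximizer $\mu_j = \log(\theta_j/\gamma_j)$ and at randomly drawn $\mu$. The vectors `theta` and `gamma`, the sampling range and the function names are hypothetical choices made for illustration; the check is a sanity test of the formula, not part of the proof.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for strictly positive probability vectors."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma))

def dv_objective(mu, theta, gamma):
    """sum_j mu_j theta_j - log sum_j gamma_j exp(mu_j), as in the last display."""
    return sum(m * t for m, t in zip(mu, theta)) - math.log(
        sum(g * math.exp(m) for g, m in zip(gamma, mu))
    )

theta = [0.1, 0.6, 0.3]   # hypothetical rate vector
gamma = [0.5, 0.3, 0.2]   # hypothetical occupancy vector
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]  # candidate maximizer
best = dv_objective(mu_star, theta, gamma)

# Randomly drawn mu should never beat the candidate maximizer
rng = random.Random(0)
trials = [dv_objective([rng.uniform(-5, 5) for _ in theta], theta, gamma)
          for _ in range(1000)]
```

Here `best` should agree with `relative_entropy(theta, gamma)`, and no entry of `trials` should exceed it.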

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

\[
\liminf_{n\to\infty} \frac{1}{n}\log P_{\Gamma^n(0)}\Bigl(\sup_{0\le x\le\beta} |\Gamma^n(x) - \gamma(x)| < \delta\Bigr) \ge -J(\gamma) - \varepsilon. \tag{3.1}
\]

Of course this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0\le x\le\beta}|y(x) - \gamma(x)| < a$, $y_j(x) > b x^K$ for all $j = 0,1,\ldots,I,I+$ and $0 < x \le \beta$, and such that

\[
J(y) \le J(\gamma).
\]

Consider the zero cost trajectory defined by

\[
\dot z(x) = M z(x), \qquad z(0) = \gamma(0).
\]

We have the following expression for $z_j$ when $j \le I$:

\[
z_j(x) = \Biggl[\sum_{k=0}^{j} \gamma_k(0)\,\frac{x^{j-k}}{(j-k)!}\Biggr] e^{-x}. \tag{3.2}
\]

It is easy to check from this explicit formula that $z_j(x) > b x^K$ for some $b > 0$, $K = I$, and all $j = 0,1,\ldots,I,I+$ and $0 < x \le \beta$. For $\rho \in (0,1)$, let $y^\rho = \rho z + (1-\rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z\,\|\,z) = 0$, we have

\[
\begin{aligned}
J(y^\rho) &= \int_0^\beta D\bigl(\rho z(x) + (1-\rho)\theta(x)\,\|\,\rho z(x) + (1-\rho)\gamma(x)\bigr)\,dx\\
&\le \rho\int_0^\beta D\bigl(z(x)\,\|\,z(x)\bigr)\,dx + (1-\rho)\int_0^\beta D\bigl(\theta(x)\,\|\,\gamma(x)\bigr)\,dx\\
&= (1-\rho)J(\gamma).
\end{aligned}
\]

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0,1)$.
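The explicit zero cost trajectory (3.2) can be checked against its defining differential equation by finite differences. The Python sketch below assumes that, for $j \le I$, the $j$th component of $Mz$ is $z_{j-1} - z_j$ (with $z_{-1} = 0$); this is inferred from the convex combination $\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I)$ above with $\theta = z$, since $M$ itself is defined in (2.1). The initial vector, the evaluation point and the step size are hypothetical.

```python
import math

def zero_cost(j, x, gamma0):
    """z_j(x) from (3.2) for 0 <= j <= I, given the initial occupancies gamma0."""
    return math.exp(-x) * sum(
        gamma0[k] * x ** (j - k) / math.factorial(j - k) for k in range(j + 1)
    )

gamma0 = [0.5, 0.3, 0.2]   # hypothetical gamma_0(0), gamma_1(0), gamma_2(0), so I = 2
x, h = 1.3, 1e-5           # evaluation point and finite-difference step

residuals = []
for j in range(len(gamma0)):
    # Central difference approximation to the derivative of z_j at x
    slope = (zero_cost(j, x + h, gamma0) - zero_cost(j, x - h, gamma0)) / (2 * h)
    inflow = zero_cost(j - 1, x, gamma0) if j > 0 else 0.0
    # Assumed form of (M z)_j: urns arrive from level j-1 and leave level j
    residuals.append(slope - (inflow - zero_cost(j, x, gamma0)))
```

Each entry of `residuals` should be close to zero, up to finite-difference error.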

It follows that in proving the lower bound, we can assume without loss of generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge b x^K$ for all $x \in [0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0,\beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

\[
h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2,\\ 2\varepsilon, & \text{else}.\end{cases}
\]

For $n$ large enough that $|\gamma(\lfloor n\tau\rfloor/n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0,\beta]$), we have the inequality

\[
\begin{aligned}
P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\lfloor n\tau\rfloor/n)| < \sigma\bigr) + e^{-2n\varepsilon}
&\ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon}\\
&\ge E_{\Gamma^n(0)}\bigl(\exp\{-n h(\Gamma^n(\lfloor n\tau\rfloor/n))\}\bigr).
\end{aligned}
\]

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b^n_i$:

\[
\bar\Gamma^n\Bigl(\frac{i+1}{n}\Bigr) = \bar\Gamma^n\Bigl(\frac{i}{n}\Bigr) + \frac{1}{n}\,\bar b^n_i, \qquad \bar\Gamma^n(0) = \Gamma^n(0).
\]

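Furthermore, the distribution of $\bar b^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar\mu^n_i$ denote the (random) distribution of $\bar b^n_i$ given $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\[
-\frac{1}{n}\log E_{\Gamma^n(0)}\Bigl(\exp\Bigl\{-n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr)\Bigr\}\Bigr)
= \inf \bar E_{\Gamma^n(0)}\Biggl[h\Bigl(\bar\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr) + \frac{1}{n}\sum_{i=1}^{\lfloor n\tau\rfloor - 1} D\bigl(\bar\mu^n_i\,\|\,\mu_{\bar\Gamma^n(i/n)}\bigr)\Biggr],
\]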

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b^n_i$. We can write $\gamma(\tau) - \gamma(0) = Mv\tau$ for some probability vector $v$. Define a process $\bar b^n_i$ as follows:

\[
\bar b^n_i = \begin{cases}
e_j - e_{j-1}, & \text{if } \Bigl\lfloor \sum_{k=0}^{j-1} v_k\tau n\Bigr\rfloor \le i \le \Bigl\lfloor \sum_{k=0}^{j} v_k\tau n\Bigr\rfloor - 1 \text{ for } 0 \le j \le I,\\[4pt]
0, & \text{if } \Bigl\lfloor \sum_{k=0}^{I} v_k\tau n\Bigr\rfloor \le i \le \lfloor \tau n\rfloor - 1.
\end{cases}
\]

In other words, $\bar b^n_i$ defines a deterministic, discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[
D\bigl(\bar\mu^n_i\,\|\,\mu_{\bar\Gamma^n(i/n)}\bigr) = \log\Bigl(\frac{1}{\bar\Gamma^n_{j-1}(i/n)}\Bigr) = -\log\Bigl(\bar\Gamma^n_{j-1}\Bigl(\frac{i}{n}\Bigr)\Bigr). \tag{3.3}
\]

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor\sum_{k=0}^{j-1} v_k\tau n\rfloor \le i \le \lfloor\sum_{k=0}^{j} v_k\tau n\rfloor - 1$,

\[
\bar\Gamma^n_{j-1}\Bigl(\frac{i}{n}\Bigr) \downarrow \bar\Gamma^n_{j-1}\Biggl(\frac{1}{n}\Bigl\lfloor\sum_{k=0}^{j} v_k\tau n\Bigr\rfloor\Biggr)
\]

as $i \uparrow \lfloor\sum_{k=0}^{j} v_k\tau n\rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor\sum_{k=0}^{j} v_k\tau n\rfloor$, it follows that

\[
\bar\Gamma^n_{j-1}\Biggl(\frac{1}{n}\Bigl\lfloor\sum_{k=0}^{j} v_k\tau n\Bigr\rfloor\Biggr) \to \gamma_{j-1}(\tau)
\]

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0,\ldots,I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (and independent of $\tau$) such that whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

\[
D\bigl(\bar\mu^n_i\,\|\,\mu_{\bar\Gamma^n(i/n)}\bigr) \le C_1[-\log\tau^K] \le -C_2\log\tau.
\]

In addition, as $n \to \infty$ and $\eta \to 0$,

\[
\bar\Gamma^n\Bigl(\frac{1}{n}\lfloor\tau n\rfloor\Bigr) \to \gamma(\tau).
\]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
-\frac{1}{n}\log E_{\Gamma^n(0)}\Bigl(\exp\Bigl\{-n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr)\Bigr\}\Bigr) \le -C_2\tau\log\tau.
\]

We now choose $\tau > 0$ so that $-C_2\tau\log\tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0,\tau]$ w.p.1, if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid: for the given $\tau > 0$, for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(0)}\Biggl(\Bigl|\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr) - \gamma\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr| < \sigma,\ \sup_{x\in[0,\tau]}|\Gamma^n(x) - \gamma(x)| < \delta\Biggr) \ge -\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau,\beta]$. Recall that when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0,1,\ldots,I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1,2,\ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\lfloor n\tau\rfloor/n)| < \sigma$,

\[
\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}\Bigl(\sup_{\tau\le x\le\beta}|\Gamma^n(x) - \gamma(x)| < \zeta\Bigr) \ge -J(\gamma) - \frac{\varepsilon}{2},
\]

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[
\mathcal{J}(\Omega) = \inf_{\omega\in\Omega}\mathcal{J}(\omega).
\]

Using Theorem 2.7, we can write

\[
\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega\in\Omega}\ \inf_{\pi\in\mathcal{F}(\alpha,\omega,\beta)} \sum_{k=0}^{K}\alpha_k\, D\bigl(\pi(k)\,\|\,\mathcal{P}(\beta)\bigr)\\
&= \inf_{\pi\in\mathcal{F}(\alpha,\Omega,\beta)} \sum_{k=0}^{K}\alpha_k\, D\bigl(\pi(k)\,\|\,\mathcal{P}(\beta)\bigr),
\end{aligned} \tag{4.1}
\]

where we have abused notation to define $\mathcal{F}(\alpha,\Omega,\beta) = \bigcup_{\omega\in\Omega}\mathcal{F}(\alpha,\omega,\beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third example below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega\in\Omega^\circ}\mathcal{J}(\omega) = \inf_{\omega\in\bar\Omega}\mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n}\log P(\Gamma^n(\beta)\in\Omega) = \inf_{\omega\in\Omega}\mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.

4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[
\gamma_0(x) = \frac{1}{\rho}e^{-\rho x} + 1 - \frac{1}{\rho}.
\]

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

\[
\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.
\]

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = \mathcal{P}_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns $\omega_0 = 0.15$ is three times larger than the expected value $\mathcal{P}_0(3.0) = 0.05$.
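The computation for this example can be sketched in a few lines of Python. The code below finds the positive root of $\rho(1-\omega_0) = 1 - e^{-\beta\rho}$ by bisection for the values $\beta = 3$ and $\omega_0 = 0.15$ quoted above, and then evaluates (2.7) with $I = 0$ and $C = 1/\rho$. The function names, the bracket and the iteration count are illustrative assumptions, and the resulting exponent is not a value reported in the paper.

```python
import math

def classical_rho(omega0, beta, iters=200):
    """Positive root of rho * (1 - omega0) = 1 - exp(-beta * rho), by bisection."""
    def g(rho):
        return rho * (1.0 - omega0) - 1.0 + math.exp(-beta * rho)
    lo, hi = 1e-6, 1.0      # g < 0 just right of 0 when beta > 1 - omega0
    while g(hi) < 0:        # expand until g changes sign
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def classical_exponent(omega0, beta):
    """Rate (2.7) specialized to I = 0, with C = 1 / rho."""
    rho = classical_rho(omega0, beta)
    C = 1.0 / rho
    J = (omega0 * math.log(omega0 / math.exp(-beta))
         + (1.0 - omega0) * (math.log(C) + (1.0 - rho) * beta)
         + beta * math.log(rho))
    return rho, C, J

def gamma0(x, rho):
    """Constrained empty-urn path gamma_0(x) = exp(-rho x) / rho + 1 - 1 / rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

omega0, beta = 0.15, 3.0
rho, C, J = classical_exponent(omega0, beta)
terminal = gamma0(beta, rho)   # should reproduce omega0
```

The value `terminal` serves as a check that the computed path ends at the prescribed fraction of empty urns.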

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[
w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),
\]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I}\Gamma_i$,

\[
w(r) = r - nI + n\sum_{i=0}^{I}(I - i)\Gamma_i(r/n).
\]
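The two expressions for $w(r)$ are deterministic identities and can be confirmed on a simulated experiment. In the Python sketch below, the function name, the seed and the sample sizes ($n = 50$, $r = 120$, $I = 2$) are arbitrary illustrative choices.

```python
import random
from collections import Counter

def overflow_counts(n, r, I, seed=0):
    """Throw r balls into n urns and return the overflow computed three ways."""
    rng = random.Random(seed)
    balls = Counter(rng.randrange(n) for _ in range(r))
    counts = [balls.get(u, 0) for u in range(n)]
    # Balls on the floor, counted directly in the finite-capacity system
    direct = sum(max(c - I, 0) for c in counts)
    # Occupancy fractions Gamma_0, ..., Gamma_I and Gamma_{I+}
    Gamma = [sum(1 for c in counts if c == i) / n for i in range(I + 1)]
    Gamma_plus = 1.0 - sum(Gamma)
    first = r - n * sum(i * Gamma[i] for i in range(I + 1)) - I * n * Gamma_plus
    second = r - n * I + n * sum((I - i) * Gamma[i] for i in range(I + 1))
    return direct, first, second

direct, first, second = overflow_counts(n=50, r=120, I=2)
```

All three returned values should agree up to floating-point rounding.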

In order to compute

\[
J_O(\eta,\beta) \doteq -\lim_{n}\frac{1}{n}\log P\Bigl(\frac{w(\beta n)}{n} > \eta\Bigr),
\]

we therefore consider sample paths which satisfy the end constraint

\[
\sum_{i=0}^{I}(I - i)\gamma_i(\beta) \ge \eta + I - \beta \doteq \zeta. \tag{4.2}
\]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $\mathcal{P}(\beta)$ under the linear equality constraints

\[
\sum_{i=0}^{\infty}\pi_i = 1, \qquad \sum_{i=0}^{\infty} i\,\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I}(I - i)\pi_i = \zeta. \tag{4.3}
\]

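We introduce the Lagrangian

\[
L(\pi,y,z,\lambda) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty}\pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\,\pi_i\Bigr) + \lambda\Bigl(\zeta - \sum_{i=0}^{I}(I - i)\pi_i\Bigr),
\]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[
\log\pi^*_i = \log\mathcal{P}_i(\beta) - 1 + y + iz + (I - i)^+\lambda.
\]

In terms of variables $C$, $\rho$ and $\nu$, we may write

\[
\pi^*_i = C\,\mathcal{P}_i(\rho\beta) \qquad\text{for } i \ge I
\]

and

\[
\pi^*_i = C\,\mathcal{P}_i(\rho\beta)\,\nu^{I-i} \qquad\text{for } i \le I.
\]

The distribution is conditionally Poisson $\rho\beta/\nu$ for $i \le I$ and conditionally Poisson $\rho\beta$ for $i \ge I$. For convenience, we introduce the notation

\[
Q_I(\rho) \doteq \sum_{i=I}^{\infty}\mathcal{P}_i(\rho\beta) = \frac{1}{\rho\beta}\sum_{i=I+1}^{\infty} i\,\mathcal{P}_i(\rho\beta),
\]

\[
R_I(\rho,\nu) \doteq \nu^I e^{-\rho\beta(1 - 1/\nu)}\sum_{i=0}^{I-1}\mathcal{P}_i\Bigl(\frac{\rho\beta}{\nu}\Bigr) = \frac{\nu^{I+1}}{\rho\beta}\,e^{-\rho\beta(1 - 1/\nu)}\sum_{i=1}^{I} i\,\mathcal{P}_i\Bigl(\frac{\rho\beta}{\nu}\Bigr).
\]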

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[
C\bigl(R_I(\rho,\nu) + Q_I(\rho)\bigr) = 1,
\]

\[
C\Bigl(\frac{\rho\beta}{\nu}R_I(\rho,\nu) + \rho\beta\,Q_I(\rho)\Bigr) = \beta,
\]

\[
C\Bigl(I\,\mathcal{P}_I(\rho\beta) + \Bigl(I - \frac{\rho\beta}{\nu}\Bigr)R_I(\rho,\nu)\Bigr) = \zeta.
\]

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\,\|\,\mathcal{P}(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[
(\rho/\nu - 1)R_I(\rho,\nu) + (\rho - 1)Q_I(\rho) = 0,
\]

\[
(I - \rho\beta/\nu - \zeta)R_I(\rho,\nu) - \zeta\,Q_I(\rho) + I\,\mathcal{P}_I(\rho\beta) = 0.
\]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed

\[
\begin{aligned}
J_O(\eta,\beta) &= D\bigl(\pi^*\,\|\,\mathcal{P}(\beta)\bigr)\\
&= \sum_{i=0}^{\infty}\pi^*_i\log\bigl(C e^{\beta - \rho\beta}\rho^i\nu^{(I-i)^+}\bigr)\\
&= \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}
\]
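The structure of this solution can be checked numerically without solving the nonlinear system. The Python sketch below works in the reverse direction, which is a device of this illustration rather than the method described above: it fixes hypothetical values of $m = \rho\beta$ and $\nu$, builds $\pi^*$ from the stated form, reads off $\beta$ and $\zeta$ from the constraints (4.3), and then compares the relative entropy computed term by term with the closed-form exponent. The parameter values and the truncation level `N` of the infinite sums are assumptions.

```python
import math

def poisson_pmf(i, m):
    """Poisson probability P_i(m), computed through logarithms for stability."""
    return math.exp(-m + i * math.log(m) - math.lgamma(i + 1))

I, m, nu, N = 3, 4.0, 1.5, 150   # m plays the role of rho * beta; N truncates the sums

# Unnormalized minimizer: P_i(m) * nu^{(I - i)^+}
weights = [poisson_pmf(i, m) * nu ** max(I - i, 0) for i in range(N)]
C = 1.0 / sum(weights)                                   # first constraint in (4.3)
pi_star = [C * w for w in weights]
beta = sum(i * p for i, p in enumerate(pi_star))         # second constraint defines beta
zeta = sum((I - i) * pi_star[i] for i in range(I + 1))   # third constraint defines zeta
rho = m / beta

# Relative entropy D(pi* || P(beta)), summed term by term
direct = sum(p * math.log(p / poisson_pmf(i, beta))
             for i, p in enumerate(pi_star) if p > 0.0)
# Closed form: log C + beta (1 - rho) + beta log rho + zeta log nu
closed = math.log(C) + beta * (1.0 - rho) + beta * math.log(rho) + zeta * math.log(nu)

# Residuals of the two reduced equations in (rho, nu)
Q = sum(poisson_pmf(i, m) for i in range(I, N))
R = sum(poisson_pmf(i, m) * nu ** (I - i) for i in range(I))
eq1 = (rho / nu - 1.0) * R + (rho - 1.0) * Q
eq2 = (I - rho * beta / nu - zeta) * R - zeta * Q + I * poisson_pmf(I, m)
```

With $\nu > 1$ the resulting parameters should fall in the regime $\nu > \rho > 1$ noted above, `direct` and `closed` should agree, and both residuals should vanish up to rounding and truncation error.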

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[
J_C(\alpha,\beta,\xi) \doteq -\lim_{n}\frac{1}{n}\log P\Biggl(\sum_{i=0}^{I}\Gamma^n_i(\beta) < \xi\Biggr),
\]

where

\[
\xi < \sum_{k=0}^{K}\alpha_k\sum_{i=0}^{I-k}\mathcal{P}_i(\beta)
\]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{I-k}\pi_{kj} = \xi,
\]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega_{kj} = \pi^*_{kj} = \begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W, & j + k \le I,\\
C_k\,\mathcal{P}_j(\rho\beta), & j + k > I.
\end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

\[
J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K}\alpha_k\log C_k.
\]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
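The zero-cost figures quoted in this example can be reproduced directly. The Python sketch below assumes that, in the zero-cost solution, each class of types starting with $k$ coupons gains a Poisson($\beta$) number of additional coupons, so that $\psi_3(\beta) = \sum_k \alpha_k \sum_{i=0}^{I-k}\mathcal{P}_i(\beta)$, which is the bound on $\xi$ displayed above. The exponent $J_C = 0.18$ is taken from the text rather than computed, and $e^{-nJ_C}$ captures only the exponential order of the probability; the variable names are illustrative.

```python
import math

def poisson_cdf(j, m):
    """P(Y <= j) for Y Poisson with mean m; an empty sum gives 0 when j < 0."""
    return sum(math.exp(-m) * m**i / math.factorial(i) for i in range(j + 1))

alpha = [0.5, 0.3, 0.2]   # initial fractions of types holding 0, 1 and 2 coupons
beta, I, n = 2.0, 3, 100

# Zero-cost fraction of types still holding I or fewer coupons at time beta
psi3 = sum(a * poisson_cdf(I - k, beta) for k, a in enumerate(alpha))
expected_types_with_four_or_more = n * (1.0 - psi3)

# Leading-order estimate exp(-n * J_C), using the exponent quoted in the text
J_C = 0.18
estimate = math.exp(-n * J_C)
```

The computed `psi3` should round to the value 0.71 quoted above, and `estimate` should be of order $10^{-8}$.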

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1,2,\ldots,K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14, using the Euler–Lagrange equations together with properties of the relative entropy.

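A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

\[
\sum_{j=0}^{i}\alpha_j \ge \sum_{j=0}^{i}\omega_j, \qquad i = 0,\ldots,I \quad\text{(monotonicity)}, \tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\,\omega_i + (I+1)\,\omega_{I+} \le \sum_{i=0}^{I+1} i\,\alpha_i + \beta \quad\text{(conservation)}. \tag{A.5}
\]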

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0,\beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I}\Biggl(\sum_{j=0}^{i}\alpha_j - \sum_{j=0}^{i}\omega_j\Biggr) &= \sum_{j=0}^{I}(I + 1 - j)(\alpha_j - \omega_j)\\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned} \tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0,\ldots,I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i \le 1$ follows from (A.5).

A.3. Euler-Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$ and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0,\dots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^{\beta} L(\psi(x),\theta(x))\,dx.
\]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi,\theta) &= D(\theta \,\|\, \gamma) \\
 &= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} \\
 &= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}
   + \Bigl(1 - \sum_{i=0}^{I}\theta_i\Bigr)\log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}.
\end{aligned}
\]

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
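As an illustration of the local rate function just defined, the sketch below evaluates $L(\psi,\theta) = D(\theta\,\|\,\gamma)$ for finite vectors. The function name `local_rate` is hypothetical, and the convention $\psi_{-1} = 0$ used later in the appendix is assumed.

```python
import math


def local_rate(psi, theta):
    """Relative entropy D(theta || gamma) for cumulative occupancies psi.

    psi   -- [psi_0, ..., psi_I], nondecreasing, with psi_I <= 1
    theta -- [theta_0, ..., theta_I], nonnegative rates with sum <= 1
    """
    # gamma_i = psi_i - psi_{i-1}, with the convention psi_{-1} = 0.
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, len(psi))]
    gamma_tail = 1.0 - psi[-1]        # gamma_{I+}
    theta_tail = 1.0 - sum(theta)     # theta_{I+}

    def term(t, g):
        if t < 0.0:
            return math.inf           # not a valid rate vector
        if t == 0.0:
            return 0.0                # convention 0 log 0 = 0
        if g <= 0.0:
            return math.inf           # mass placed where gamma has none
        return t * math.log(t / g)

    return sum(term(t, g) for t, g in zip(theta, gamma)) + term(theta_tail, gamma_tail)


# Following the expected rates (theta = gamma) costs nothing, up to rounding;
# any other admissible rate vector has strictly positive cost.
print(local_rate([0.5, 0.8], [0.5, 0.3]))
print(local_rate([0.5, 0.8], [0.2, 0.2]))
```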

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler-Lagrange equations [8]

\[
\frac{\partial L}{\partial \psi_i}(\psi(x),\theta(x)) = -\frac{d}{dx}\,\frac{\partial L}{\partial \theta_i}(\psi(x),\theta(x))
\]

for all $i \in \{0,\dots,I\}$, $x \in (0,\beta)$.

Although the Euler-Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above and show that the extremal paths are in fact globally minimal.

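In the case at hand, the Euler-Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
 = \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Bigr]
\tag{A.7}
\]

for $i = 0,\dots,I-1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I}
 = \frac{d}{dx}\Bigl[-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Bigr].
\tag{A.8}
\]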

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths the rate function $L(\psi,\theta)$ then reduces to

\[
L(\psi,\theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}
\]

The Euler-Lagrange equations pertinent to the problem of minimizing the cost over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
 = \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\Bigr]
\tag{A.10}
\]

for $i = 0,\dots,I-1$.

A.4. Characterization of the extremals under empty initial conditions.

In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} C\,P_i(\rho\beta) = 1,
\qquad
\sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C\,P_i(\rho\beta) = \beta.
\]
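The two equations above can be solved numerically by noting that their ratio depends only on $\lambda = \rho\beta$: it is the mean of a Poisson($\lambda$) variable conditioned to exceed $I$, which increases in $\lambda$. The following is a minimal sketch using bisection. The function names and the bisection strategy are assumptions for illustration and not the paper's method, and the code is intended only for small $I$ and moderate $\beta$ in the exponential case $C > 0$.

```python
import math


def poisson_tail_sums(I, lam, max_terms=10_000):
    """Return (sum_{i>I} P_i(lam), sum_{i>I} i*P_i(lam)) by direct summation."""
    i = I + 1
    term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))  # P_{I+1}(lam)
    mass = mean = 0.0
    for _ in range(max_terms):
        mass += term
        mean += i * term
        i += 1
        term *= lam / i               # P_i = P_{i-1} * lam / i
        if i > lam and term < 1e-17 * mass:
            break
    return mass, mean


def twist_parameters(omega, beta, iterations=200):
    """Solve the two twist equations for (C, rho), given omega_0..omega_I and beta."""
    I = len(omega) - 1
    mass = 1.0 - sum(omega)                                   # mass above level I
    mean = beta - sum(i * w for i, w in enumerate(omega))     # mean carried above I
    if mass <= 0.0:
        raise ValueError("no mass above I: polynomial case, C = 0")
    target = mean / mass
    if target <= I + 1:
        raise ValueError("constraints are not feasible in the exponential case")

    def conditional_mean(lam):
        m, s = poisson_tail_sums(I, lam)
        return s / m

    lo, hi = 1e-8, 1.0
    while conditional_mean(hi) < target:                      # bracket the root
        hi *= 2.0
    for _ in range(iterations):                               # bisection on lam
        mid = 0.5 * (lo + hi)
        if conditional_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return mass / poisson_tail_sums(I, lam)[0], lam / beta    # (C, rho)


# Sanity check: if omega is itself Poisson(beta) on 0..I, the path has zero
# cost and the twist parameters should be C = 1 and rho = 1.
beta = 2.0
omega = [math.exp(-beta), beta * math.exp(-beta)]
print(twist_parameters(omega, beta))
```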

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{k}, \tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x)
\]

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.
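The boundary behavior claimed in Theorem A.1 can be checked numerically from the closed form of $\psi_0$. The sketch below differentiates the exponential and polynomial parts of $\psi_0$ analytically. The function names and the sample parameters ($I = 1$, $\beta = 2$, $\rho = 1.5$, $C = 0.6$) are hypothetical, and $\omega$ is built from them so that the twist equations hold.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def signed_derivative(i, x, omega, beta, C, rho):
    """(-1)^i times the i-th derivative of psi_0 at x, for 0 <= i <= I."""
    I = len(omega) - 1
    value = rho**i * C * math.exp(-rho * x)             # exponential part
    for k in range(i, I + 1):                           # polynomial part
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) // math.factorial(k - i)   # k!/(k-i)!
        value += coeff * beta ** (-i) * falling * (1.0 - x / beta) ** (k - i)
    return value


def gamma(i, x, omega, beta, C, rho):
    """Occupancy gamma_i(x) = x^i / i! * (-1)^i psi_0^{(i)}(x)."""
    return x**i / math.factorial(i) * signed_derivative(i, x, omega, beta, C, rho)


# Hypothetical parameters; omega is chosen so that both twist equations hold.
beta, rho, C = 2.0, 1.5, 0.6
lam = rho * beta
tail_mass = 1.0 - poisson_pmf(0, lam) - poisson_pmf(1, lam)   # sum_{i>1} P_i
tail_mean = lam - poisson_pmf(1, lam)                         # sum_{i>1} i P_i
omega1 = beta - C * tail_mean
omega0 = 1.0 - C * tail_mass - omega1
omega = [omega0, omega1]

print([gamma(i, beta, omega, beta, C, rho) for i in range(2)], omega)  # terminal values
print(signed_derivative(0, 0.0, omega, beta, C, rho))                  # psi_0(0)
print(signed_derivative(1, 0.0, omega, beta, C, rho))                  # -psi_0'(0)
```

Under these assumptions the terminal values reproduce $\omega$, and both $\psi_0(0)$ and $-\psi_0^{(1)}(0)$ equal one up to rounding, matching the computations in the proof.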

Recall that in the special case $C = 0$ the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^{k}. \tag{A.10}
\]

We refer to such paths as polynomial extremals and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[
(-1)^i \varphi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a,b] \text{ and } i > 0. \tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0,\beta)$ and $i = 0,\dots,I$. In the case $C > 0$, this can be strengthened to all $i = 0,1,\dots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\,\beta^{-i}\,\frac{k!}{(k-i)!}\Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0,\dots,I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0,1,\dots$. We deduce that $\varphi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\begin{aligned}
\varphi(\beta) &= (-1)^I \psi_0^{(I)}(\beta) \\
 &= \rho^I C e^{-\rho\beta} + \Bigl(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I} I! \\
 &= \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I \ge 0,
\end{aligned}
\]

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I > 0 \quad \text{for all } x \in [0,\beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Bigl(1 - \sum_{i=0}^{I} P_i(\rho\beta)\Bigr) + \sum_{i=0}^{I}\omega_i = 1.
\]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0,1,\dots,I,I+1,\dots. \tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) \\
 &= -\sum_{k=0}^{i}\dot\gamma_k(x) \\
 &= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
 &= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I}\theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler-Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty} P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

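From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\ i \ge 0. \tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr]
 = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
 = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta), \tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\dots,I$, so that (A.16) implies (A.10).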

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

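Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

\[
(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \dots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0,\dots,\gamma_I,\gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.17}
\]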

Proof. For all $i > I$ and $x \in [0,\beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I}(-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1 - \psi_I).
\]

Then, for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
 + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
 &= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

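Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
 + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\,\gamma_i(x)\,[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty}\bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.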

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\dots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\dots,I-1,
\]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1,\omega,\beta)} D(\pi\,\|\,P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\dots,I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi,y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty}\pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\,\pi_i\Bigr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - z\,i\,\pi_i \ge \pi^*_i\log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - z\,i\,\pi^*_i.
\]

Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in F(1,\omega,\beta)} D(\pi\,\|\,P(\beta)) &= \inf_{\pi \in F(1,\omega,\beta)} L(\pi,y,z) \\
 &\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi,y,z) \\
 &= D(\pi^*\,\|\,P(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

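Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr) + \Bigl(\beta - \sum_{i=0}^{I} i\,\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.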
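The explicit cost formula can be compared with a direct evaluation of the relative entropy $D(\pi^*\,\|\,P(\beta))$. The following is a minimal sketch with hypothetical function names; it assumes that $(C,\rho)$ already satisfy the twist equations for the given $(\omega,\beta)$, and it truncates the infinite sum after a fixed number of terms.

```python
import math


def log_poisson(i, lam):
    """log P_i(lam), computed with lgamma to avoid overflow for large i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def head_entropy(omega, beta):
    """Contribution of the constrained levels i = 0..I."""
    return sum(w * (math.log(w) - log_poisson(i, beta))
               for i, w in enumerate(omega) if w > 0.0)


def cost_explicit(omega, beta, C, rho):
    """Closed-form cost of an exponential extremal."""
    mass = 1.0 - sum(omega)
    mean = beta - sum(i * w for i, w in enumerate(omega))
    return (head_entropy(omega, beta)
            + mass * (math.log(C) + (1.0 - rho) * beta)
            + mean * math.log(rho))


def cost_as_relative_entropy(omega, beta, C, rho, terms=200):
    """D(pi* || P(beta)) with pi*_i = omega_i (i <= I) and C*P_i(rho*beta) (i > I)."""
    I = len(omega) - 1
    total = head_entropy(omega, beta)
    for i in range(I + 1, I + 1 + terms):           # truncated tail sum
        log_pi = math.log(C) + log_poisson(i, rho * beta)
        total += math.exp(log_pi) * (log_pi - log_poisson(i, beta))
    return total


# Hypothetical example with I = 1; omega is built from (C, rho) so that the
# twist equations hold exactly.
beta, rho, C = 2.0, 1.5, 0.6
lam = rho * beta
p0, p1 = math.exp(log_poisson(0, lam)), math.exp(log_poisson(1, lam))
omega1 = beta - C * (lam - p1)
omega0 = 1.0 - C * (1.0 - p0 - p1) - omega1
print(cost_explicit([omega0, omega1], beta, C, rho))
print(cost_as_relative_entropy([omega0, omega1], beta, C, rho))
```

The two printed values should agree to within rounding and truncation error, since the tail terms of the relative entropy collapse to the last two terms of the closed form once the twist equations are used.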

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler-Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler-Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

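The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\pi_{kj} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\,\pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22}
\]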

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\dots,\pi^{(k)},\dots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch of the argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
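The ordered filling construction described above can be sketched in code for the polynomial case. The version below always draws from the lowest-index class that still has mass, which is only one of several admissible orderings; the function name `ordered_filling` and the sample data are hypothetical.

```python
from fractions import Fraction as Fr


def ordered_filling(alpha, omega):
    """Build one feasible point for polynomial constraints.

    alpha -- dict k -> alpha_k > 0 (initial classes), summing to 1
    omega -- [omega_0, ..., omega_I], summing to 1 (polynomial case)
    Returns a dict (k, j) -> pi_{kj}: the probability that a class-k urn
    receives j additional balls.
    """
    remaining = dict(alpha)            # unallocated alpha-weighted mass per class
    pi = {}
    for i, need in enumerate(omega):
        for k in sorted(remaining):
            if k > i:                  # class k cannot end at a level below k
                break
            take = min(need, remaining[k])
            if take > 0:
                pi[(k, i - k)] = take / alpha[k]
                remaining[k] -= take
                need -= take
        if need != 0:
            raise ValueError("monotonicity condition violated at level %d" % i)
    return pi


# Hypothetical example with I = 2 and two initial classes.
alpha = {0: Fr(1, 2), 1: Fr(1, 2)}
omega = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]
pi = ordered_filling(alpha, omega)
print(pi)
```

Exact fractions are used so that the terminal constraints and the unit mass of each class can be checked without rounding. In the polynomial case the conservation constraint then holds automatically, as noted in the text.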

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i:\,P_i > Q_i}(P_i - Q_i) = \sum_{i:\,Q_i > P_i}(Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \Bigl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_{k}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[
Q_A(\alpha,\omega,\beta) \doteq F(\alpha,\omega,\beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j\ge m} j\,\pi^{(n)}_{kj} \le \varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta}P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta}-\beta} < \infty.
\]
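Both ingredients of the uniform integrability estimate can be checked numerically: the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$, with equality at $x = e^y$, and the closed form of the Poisson exponential moment. The following is a small sketch with hypothetical function names; the series is truncated, so it is meant only for moderate $\beta$ and $1/\delta$.

```python
import math


def young_gap(x, y):
    """Right-hand side minus left-hand side of xy <= (x log x - x + 1) + (e^y - 1)."""
    xlogx = x * math.log(x) if x > 0.0 else 0.0      # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y


def poisson_exponential_moment(beta, delta, terms=400):
    """Truncated series for sum_j exp(j/delta) * P_j(beta)."""
    total = 0.0
    for j in range(terms):
        log_pmf = -beta + j * math.log(beta) - math.lgamma(j + 1)
        total += math.exp(j / delta + log_pmf)       # combine exponents before exp
    return total


# The gap is nonnegative and vanishes at the maximizer x = e^y.
print(young_gap(3.0, 2.0), young_gap(math.exp(0.7), 0.7))

# Series versus closed form exp(beta * e^{1/delta} - beta).
beta, delta = 2.0, 0.5
print(poisson_exponential_moment(beta, delta),
      math.exp(beta * math.exp(1.0 / delta) - beta))
```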

Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j\ge m} j\,\pi^{(n)}_{kj} &= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \\
 &\le \sum_{j\ge 0}\delta\Bigl[\pi^{(n)}_{kj}\log\Bigl(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\Bigr) - \pi^{(n)}_{kj} + P_j(\beta)\Bigr] + \delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta) \\
 &= \delta D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) + \delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta) \\
 &\le \delta A + \delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k}\alpha_k\beta_k = \lim_{n}\sum_{k}\alpha_k\beta^{(n)}_k = \lim_{n}\beta = \beta,
\]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{*(k)}\,\|\,P(\beta)) \le \liminf_{n}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$, such that

\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n}\mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{kj}, \qquad k = 0,1,\dots,K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.
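The value $\beta - \lambda + \lambda\log(\lambda/\beta)$ quoted in the proof is the relative entropy of a Poisson($\lambda$) law with respect to a Poisson($\beta$) law, and this can be confirmed by direct summation. The following is a small numerical sketch; the function names are hypothetical and the series is truncated.

```python
import math


def log_poisson(j, lam):
    return -lam + j * math.log(lam) - math.lgamma(j + 1)


def relative_entropy_poisson(lam, beta, terms=400):
    """Truncated series for D(P(lam) || P(beta))."""
    total = 0.0
    for j in range(terms):
        lp = log_poisson(j, lam)
        total += math.exp(lp) * (lp - log_poisson(j, beta))
    return total


def mean_constrained_bound(lam, beta):
    """The stated infimum of D(pi || P(beta)) over distributions with mean lam."""
    return beta - lam + lam * math.log(lam / beta)


lam, beta = 3.0, 2.0
print(relative_entropy_poisson(lam, beta), mean_constrained_bound(lam, beta))

# Another distribution with the same mean, a point mass at 3, costs more:
# D(delta_3 || P(beta)) = -log P_3(beta).
print(-log_poisson(3, beta))
```

The bound grows like $\lambda\log\lambda$ as $\lambda \to \infty$, which is why a class whose weight $\alpha^{(n)}_k$ vanishes cannot retain a nonvanishing share of the balls at bounded cost.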

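Thus

\[
\sum_{k\in\mathcal{K}}\beta_k\alpha_k = \lim_{n}\sum_{k}\beta^{(n)}_k\alpha^{(n)}_k = \lim_{n}\beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_{n}\sum_{k=0}^{K}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))
 &\ge \liminf_{n}\sum_{k:\,\alpha_k>0}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)})) \\
 &\ge \sum_{k:\,\alpha_k>0}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \\
 &\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.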

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints, the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{mn} > 0$, that $\tilde\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i}\alpha_k\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\tilde\pi_{kj},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta > 0$ we may define

\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1-\eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1-\eta).
\]

By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1-\eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

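We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\Bigl(\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} + 1\Bigr)(\bar\pi_{kj} - \pi_{kj}) \\
 &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
 &= \alpha_m\bar\pi_{mn}\log\frac{\pi^{\varepsilon}_{mn}}{P_n(\beta)}
  + \sum_{k\in\mathcal{K}}\alpha_k\sum_{(k,j)\ne(m,n)}\bar\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}
  - \sum_{k\in\mathcal{K}}\alpha_k\sum_{j}\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\bar\pi_{kj}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.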

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{k+j\in\mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} + \sum_{k\in\mathcal{K}} z_k \alpha_k \Bigl(1 - \sum_{k+l\in\mathcal{I}} \pi_{kl}\Bigr) + \sum_{i\in\mathcal{I}} w_i \Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k}\Bigr) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[ \pi^*_{kj} = \mathcal{P}_j(\beta)\, e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
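The rearrangement in the last step can be spelled out as follows. This is only a sketch of the omitted computation, using the Lagrangian $L$ and the multipliers $z_k$, $w_i$ introduced in the proof; it assumes $\alpha_k > 0$ for $k \in \mathcal{K}$, so that one may divide through by $\alpha_k$.

```latex
\begin{align*}
\frac{\partial L}{\partial \pi_{kj}}
  &= \alpha_k\Bigl(\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} + 1\Bigr)
     - z_k\alpha_k - w_{k+j}\alpha_k = 0 \\
\Longrightarrow\quad
\log\frac{\pi^*_{kj}}{\mathcal{P}_j(\beta)} &= z_k - 1 + w_{k+j} \\
\Longrightarrow\quad
\pi^*_{kj} &= \mathcal{P}_j(\beta)\, e^{z_k-1}\, e^{w_{k+j}}
            = D_k\,\mathcal{P}_j(\beta)\,W_{k+j}.
\end{align*}
```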

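We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\, i\le I}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} C_k \mathcal{P}_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k+j \le I,\ k+j \in \mathcal{I}, \\ C_k \mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k+j > I, \\ 0, & \text{otherwise.} \end{cases} \]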

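Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[ \beta^{(M)} = \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\,\pi^*_{kl}, \qquad \eta^{(M)}_k = \sum_{l\le M} \pi^*_{kl}, \]

and consider the problem of minimizing

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} \]

subject to the constraints

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl} = \beta^{(M)}, \]

\[ \sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl} = \eta^{(M)}_k, \qquad k \in \mathcal{K}, \]

\[ \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k \pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}. \]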

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[ L^{(M)} = \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} + y^{(M)} \Bigl(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl}\Bigr) \]

\[ \qquad + \sum_{k\in\mathcal{K}} z^{(M)}_k \alpha_k \Bigl(\eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl}\Bigr) + \sum_{i\in\mathcal{I}} w^{(M)}_i \Bigl(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k \pi_{k,i-k}\Bigr). \]

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[ \pi^*_{kj} = \mathcal{P}_j(\beta) \exp\bigl(j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}\bigr) \]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\, e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$ with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
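The substitutions at the end of the proof rest on a change of Poisson parameter. The following sketch writes this step out; it uses only $\rho = e^{y}$ and the definition $\mathcal{P}_j(t) = e^{-t}t^j/j!$, and is an illustration rather than part of the original argument.

```latex
\begin{align*}
\mathcal{P}_j(\beta)\, e^{jy}
  &= e^{-\beta}\,\frac{\beta^{j}\rho^{j}}{j!}
   = e^{(\rho-1)\beta}\, e^{-\rho\beta}\,\frac{(\rho\beta)^{j}}{j!}
   = e^{(\rho-1)\beta}\,\mathcal{P}_j(\rho\beta), \\
\pi^*_{kj}
  &= e^{(\rho-1)\beta - 1 + z_k}\,\mathcal{P}_j(\rho\beta)\, e^{w_{k+j}}
   = C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j},
\end{align*}
% with W_{k+j} = 1 whenever k + j > I, because w_i = 0 for i > I.
```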

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\pi^*_{kj}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha, \omega, \beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) \]

are extremals which satisfy the constraints $(\alpha, \omega, \beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0, \beta]$ and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

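In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve after rescaling is

\[ \tilde\psi_{k0}(x) = C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k \mathcal{P}_i(\rho_k\beta_k)\bigr) \Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i} \]

\[ \qquad = C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k \mathcal{P}_i(\rho\beta)\bigr) \Bigl(1 - \frac{x}{\beta}\Bigr)^{i} \]

\[ \qquad = C_k \Bigl[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) \mathcal{P}_i(\rho\beta) \Bigl(1 - \frac{x}{\beta}\Bigr)^{i} \Bigr]. \]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k \mathcal{P}_i(\rho\beta) W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0 \tilde\psi_{00}$ is

\[ (-1)^k \psi_0^{(k)}(x) = \alpha_0 C_0 \Bigl[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) \mathcal{P}_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k} \Bigr] \tag{A.25} \]

\[ \qquad = \alpha_0 C_0 \rho^k \Bigl[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) \mathcal{P}_j(\rho\beta) \Bigl(1 - \frac{x}{\beta}\Bigr)^{j} \Bigr] \]

\[ \qquad = \frac{\alpha_0 C_0 \rho^k}{C_k}\, \tilde\psi_{k0}(x). \]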
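The passage from the first to the second line of (A.25) relies on an elementary Poisson identity, written out here as a sketch. The index $j = i - k$ is the substitution used in that display; nothing beyond the definition $\mathcal{P}_i(t) = e^{-t}t^i/i!$ is assumed.

```latex
\begin{align*}
\mathcal{P}_{k+j}(\rho\beta)\,\frac{(k+j)!}{j!\,\beta^{k}}
  &= e^{-\rho\beta}\,\frac{(\rho\beta)^{k+j}}{(k+j)!}\cdot\frac{(k+j)!}{j!\,\beta^{k}} \\
  &= \rho^{k}\, e^{-\rho\beta}\,\frac{(\rho\beta)^{j}}{j!}
   = \rho^{k}\,\mathcal{P}_{j}(\rho\beta).
\end{align*}
```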

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \]

\[ \qquad = \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \tilde\psi_{k0}(x) \]

\[ \qquad = \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x). \]

Similarly, we obtain

\[ -\dot\psi_i(x) = -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \]

\[ \qquad = \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \tilde\psi_{k0}(x) \]

\[ \qquad = \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x), \]

so that the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[ \frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27} \]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

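In the polynomial case, similar computations show that

\[ (-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\, \tilde\psi_{k0}(x) \tag{A.28} \]

and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29} \]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).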

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i \]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

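Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[ J(\gamma) = \beta + \sum_{i=0}^{\infty} \Bigl[ \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(\beta) \log|\psi_0^{(i)}(\beta)| \Bigr] - \sum_{k=0}^{K} \alpha_k \log|\psi_0^{(k)}(0)| \]

\[ \qquad = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta) \log \frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}}. \]

The fact that $\tilde\psi_{k0}(0) = 1$ together with (A.25) implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k0}(x)$, so that

\[ \frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j} \psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr) e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k\cdot}(\beta) \,\|\, \mathcal{P}(\beta)). \]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).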

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11 and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha, \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \| \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^{\varepsilon} \doteq (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[ G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x) \,\|\, \gamma^{\varepsilon}(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11 and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma) \le J(\bar\gamma). \]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi, \theta)\,dx$. After defining

\[ \eta(x) \doteq \bar\psi(x) - \psi(x), \]

we may write

\[ G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x, \varepsilon)\,dx. \tag{A.30} \]

We can assume without loss that $G[1] = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) \ge \alpha_i \tilde\gamma_{i0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1 - \varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

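The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^{\varepsilon}}\!(x)\, \eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^{\varepsilon}}\!(x)\, \dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\[ \frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \qquad \frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}. \]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left|\frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\right| < B \quad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.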

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\,dx \]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[ \int_0^{x} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma} dx' = \int_0^{x} \left.\Bigl(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\Bigr)\right|_{\gamma} dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 361, page 209) to the first term, we obtain

\[ G'_+[0] = \int_0^{\beta} \Bigl[ \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \Bigr] dx \]

\[ \qquad = \int_0^{\beta} \dot\eta(x) \cdot \Bigl( -\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \Bigr) dx \]

\[ \qquad = -\int_0^{\beta} \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \]

\[ \qquad = \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0. \]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\bar\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x_1^{(n)})$ and terminal point $\bar\gamma(x_2^{(n)})$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx. \]

It then follows that

\[ J(\gamma) = \mathcal{J}(\alpha, \omega, \beta) \le \liminf_{n} \mathcal{J}\bigl(\bar\gamma(x_1^{(n)}), \bar\gamma(x_2^{(n)}), x_2^{(n)} - x_1^{(n)}\bigr) = \liminf_{n} J(\gamma^{(n)}) \]

\[ \qquad \le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx = J(\bar\gamma), \]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P Dupuis

Lefschetz Center for Dynamical Studies

Brown University

Providence Rhode Island 02912

USA

e-mail dupuisdambrowneduurl cmbell-labscomwhonuzman

C Nuzman

P Whiting

Bell Labs

600 Mountain Avenue

Murray Hill New Jersey 07974

USA

e-mail cnuzmanlucentcome-mail pawhitinglucentcomurl cmbell-labscomwhopawhitinge-mail


Approximating an arbitrary trajectory by a piecewise linear trajectory with nearly equal cost, and using the Markov property, one expects

\[ P(\Gamma^n \approx \gamma \mid \Gamma^n(0) = \alpha) \approx \exp\Bigl\{ -n \int_0^{\beta} D(\theta(t) \,\|\, \gamma(t))\,dt \Bigr\}, \]

where $\theta$ and $\gamma$ are related by $\dot\gamma = M\theta$, $\gamma(0) = \alpha$. Thus the rate function on path space should be $J(\gamma)$.

The zero cost trajectories are of course the law of large numbers limits, and can easily be computed. For example, if $\alpha = (1, 0, \ldots)$ (so all cells are initially empty) and $i \le I$, then $J(\gamma) = 0$ implies that $\gamma_i(t) = e^{-t}t^i/i!$. In other words, $\gamma(t)$ is the Poisson distribution with mean $t$, save that all mass corresponding to $i > I$ is collected together into the state $I + 1$. Throughout this paper we will denote the Poisson distribution with mean $t$ by $\mathcal{P}(t)$, where $\mathcal{P}_i(t) = e^{-t}t^i/i!$.
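The law of large numbers limit described above can be checked by direct simulation. The following is a minimal sketch, not taken from the paper: the function names, the urn count n = 20000, the value beta = 1.5 and the seed are all illustrative assumptions. It throws floor(beta * n) balls into n empty urns and compares the empirical occupancy fractions with the truncated Poisson distribution.

```python
import math
import random

def poisson_pmf(i, t):
    """P_i(t) = exp(-t) * t**i / i!, the Poisson weight used in the text."""
    return math.exp(-t) * t**i / math.factorial(i)

def simulate_occupancy(n, beta, I, seed=0):
    """Throw floor(beta * n) balls uniformly into n initially empty urns.

    Returns [gamma_0, ..., gamma_I, gamma_{I+}]: the fraction of urns holding
    exactly i balls for i <= I, with every urn holding more than I balls
    collected in the last entry.
    """
    rng = random.Random(seed)  # seeded for reproducibility (illustrative choice)
    counts = [0] * n
    for _ in range(int(beta * n)):
        counts[rng.randrange(n)] += 1
    gamma = [0.0] * (I + 2)
    for c in counts:
        gamma[min(c, I + 1)] += 1.0 / n
    return gamma

def lln_limit(beta, I):
    """Zero-cost endpoint: Poisson(beta) with the mass above I lumped together."""
    head = [poisson_pmf(i, beta) for i in range(I + 1)]
    return head + [1.0 - sum(head)]

empirical = simulate_occupancy(n=20000, beta=1.5, I=3)
limit = lln_limit(beta=1.5, I=3)
max_gap = max(abs(e - p) for e, p in zip(empirical, limit))
```

For urn counts of this size, max_gap should be small compared with the individual Poisson weights; large deviations from this limit are exactly the rare events whose probabilities the rate function describes.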

We are primarily interested in the distribution of $\Gamma^n(\beta)$. The contraction principle (e.g., [11], Theorem 1.3.2) implies the following variational representation for the rate function for $\{\Gamma^n(\beta), n = 1, 2, \ldots\}$. Let $\mathcal{A}(\alpha, \omega, \beta)$ denote the set of valid occupancy paths $\gamma$ on $[0, \beta]$ satisfying $\gamma(0) = \alpha$ and $\gamma(\beta) = \omega$.

Corollary 2.3. Suppose that the sequence of initial conditions $\{\Gamma^n(0), n = 1, 2, \ldots\}$ is deterministic and that it converges to $\alpha$ as $n \to \infty$. Then the sequence of random vectors $\{\Gamma^n(\beta), n = 1, 2, \ldots\}$ satisfies the LDP with the convex rate function

\[ \mathcal{J}(\omega) = \inf\{J(\gamma) : \gamma \in \mathcal{A}(\alpha, \omega, \beta)\}. \]

Remark 2.1. Using the explicit formula for $\mathcal{J}(\omega)$ stated in Theorem 2.7 and the convexity of relative entropy in its first argument, it follows that $\mathcal{J}$ is in fact strictly convex.

Remark 2.2. It is sometimes useful to show that the large deviation lower bound holds for sets with no interior relative to the ambient space. Such bounds can often be proved for processes, such as ours, that take values in a discrete lattice. An example would be a set $A \subset \{\gamma \in \mathcal{S}_I : \gamma_{I+} = 0\}$, for which the minimizing trajectories must be polynomial extremals (defined after Theorem 2.6). Although we do not need such results in the present paper, it is worth observing that lower bounds of this kind can be proved.

2.2. Characterization of the terminal rate function and minimizing paths. The results of this section were obtained using calculus of variations techniques, the details of which are given in the Appendix. The presentation begins with the case in which all urns are initially empty, because it appears in many applications and because it is a building block for the general case. In each case, we first give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional forms contain parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha, \omega, \beta)$, where $\alpha, \omega \in \mathcal{S}_I$ are the initial and terminal occupancies and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I + 1$ balls, and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I + 1$.

Definition 2.1. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if the corresponding set of valid occupancy paths $\mathcal{A}(\alpha, \omega, \beta)$ is nonempty.

Lemma 2.4. An endpoint constraint $(\alpha, \omega, \beta)$ is feasible if and only if

\[ \sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)} \tag{2.3} \]

and

\[ \sum_{i=0}^{I} i\,\omega_i + (I + 1)\,\omega_{I+} \le \sum_{k=0}^{I+1} k\,\alpha_k + \beta \quad \text{(conservation)}. \tag{2.4} \]

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
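The two conditions of Lemma 2.4 translate directly into a numerical check. The sketch below is illustrative only: the function name is_feasible, the list layout of alpha and omega, and the tolerance tol are assumptions made for this example and do not appear in the paper.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (2.3) and conservation (2.4) conditions.

    alpha: [alpha_0, ..., alpha_{I+1}]          initial occupancy fractions
    omega: [omega_0, ..., omega_I, omega_{I+}]  terminal occupancy fractions
    beta:  number of balls thrown per urn (beta > 0)
    """
    if len(alpha) != len(omega):
        raise ValueError("alpha and omega must both have I + 2 entries")
    I = len(omega) - 2

    # Monotonicity (2.3): partial sums of alpha dominate those of omega.
    cum_alpha = 0.0
    cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # Conservation (2.4): a lower bound on the final number of balls per urn
    # cannot exceed the initial balls per urn plus the beta balls thrown.
    final_lower_bound = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    initial_balls = sum(k * a for k, a in enumerate(alpha))
    return final_lower_bound <= initial_balls + beta + tol
```

For example, with I = 1, empty initial urns alpha = [1, 0, 0] and target omega = [0.5, 0.3, 0.2], the left-hand side of (2.4) is 0.7, so the constraint is feasible for beta = 1 but not for beta = 0.5.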

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha, \omega, \beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i < I$. Otherwise, the constraints are termed reducible.

For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i} \alpha_k = \sum_{k=0}^{i} \omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i + 1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha, \omega, \beta)$ may be divided into a finite number of isolated irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $\mathcal{F}(1, \omega, \beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$ and the constraint

\[ \sum_{i=0}^{\infty} i\,\pi_i = \beta. \]

In the empty case, the conditions for feasibility of $(1, \omega, \beta)$ reduce to $\sum_{i=0}^{I} i\,\omega_i + (I + 1)\,\omega_{I+} \le \beta$, from which it follows that the set $\mathcal{F}$ is nonempty if and only if $(1, \omega, \beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $\mathcal{A}(1, \omega, \beta)$ can be represented as an infimum of a relative entropy over distributions in $\mathcal{F}(1, \omega, \beta)$.

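Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : \mathcal{S}_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be determined as

\[ \mathcal{J}(\omega) = \min_{\pi \in \mathcal{F}(1, \omega, \beta)} D(\pi \,\|\, \mathcal{P}(\beta)) \]

if $(1, \omega, \beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^* \in \mathcal{F}(1, \omega, \beta)$ is unique.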

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show in fact that the solution takes the form $\pi^*_i = C\mathcal{P}_i(\rho\beta)$ for $i > I$, for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty} i\,\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty} \pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the strict equality $\sum_{i=0}^{I} i\,\omega_i + (I + 1)\,\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I + 1$ if necessary to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I} \omega_i = 0$. In this case it turns out that $\mathcal{F}(1, \omega, \beta)$ has only one element; we then have $C = 0$, and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[ \frac{\rho\beta - \sum_{i=0}^{I} i\,\mathcal{P}_i(\rho\beta)}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\,\omega_i}{1 - \sum_{i=0}^{I} \omega_i}. \tag{2.5} \]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I + 1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0, \infty)$ to $(I + 1, \infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[ C = \frac{1 - \sum_{i=0}^{I} \omega_i}{1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\,\omega_i}{\rho\beta - \sum_{i=0}^{I} i\,\mathcal{P}_i(\rho\beta)}. \tag{2.6} \]

Evaluating the relative entropy of $\pi^*$ and $\mathcal{P}(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

\[ \mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)\bigl(\log C + (1 - \rho)\beta\bigr) + \Bigl(\beta - \sum_{i=0}^{I} i\,\omega_i\Bigr) \log\rho. \tag{2.7} \]
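Equations (2.5) to (2.7) can be evaluated numerically. The sketch below is one possible approach and is not the authors' method: it finds $\rho$ by bisection, using the monotonicity of the conditional mean noted above. The function names, the bracketing interval starting at 1e-6, the series length of 400 terms and the iteration count are all assumptions chosen for illustration, and the code covers only the exponential case (strict inequality in (2.4)).

```python
import math

def poisson_pmf(i, t):
    """P_i(t) = exp(-t) * t**i / i!"""
    return math.exp(-t) * t**i / math.factorial(i)

def tail_mean(rho, beta, I, terms=400):
    """Left-hand side of (2.5): E[Y | Y > I] for Y ~ Poisson(rho * beta).

    Summed directly over i > I (truncated series) to avoid cancellation.
    """
    m = rho * beta
    i = I + 1
    p = poisson_pmf(i, m)
    num = den = 0.0
    for _ in range(terms):
        num += i * p
        den += p
        i += 1
        p *= m / i
    return num / den

def twist_parameters(omega_head, beta, iters=200):
    """Solve (2.5) for rho by bisection, then obtain C from (2.6).

    omega_head = [omega_0, ..., omega_I]; assumes the exponential case,
    so that sum(omega_head) < 1 and the target exceeds I + 1.
    """
    I = len(omega_head) - 1
    mass_tail = 1.0 - sum(omega_head)
    balls_tail = beta - sum(i * w for i, w in enumerate(omega_head))
    target = balls_tail / mass_tail
    lo, hi = 1e-6, 1.0
    while tail_mean(hi, beta, I) < target:  # enlarge the bracket upwards
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail_mean(mid, beta, I) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = mass_tail / (1.0 - sum(poisson_pmf(i, rho * beta) for i in range(I + 1)))
    return rho, C

def terminal_rate(omega_head, beta):
    """Evaluate the rate function via (2.7)."""
    rho, C = twist_parameters(omega_head, beta)
    head = sum(w * math.log(w / poisson_pmf(i, beta))
               for i, w in enumerate(omega_head) if w > 0)
    mass_tail = 1.0 - sum(omega_head)
    balls_tail = beta - sum(i * w for i, w in enumerate(omega_head))
    return head + mass_tail * (math.log(C) + (1.0 - rho) * beta) + balls_tail * math.log(rho)
```

As a sanity check, taking $\omega_i = \mathcal{P}_i(\beta)$ for $i \le I$ (the law of large numbers endpoint) should give $\rho = 1$, $C = 1$ and a rate of zero, up to rounding.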

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1, \omega, \beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $\mathcal{A}(1, \omega, \beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(1, \omega, \beta)$ defined by

\[ \gamma_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C\mathcal{P}_k(\rho\beta)\bigr) \Bigl(1 - \frac{x}{\beta}\Bigr)^{k}, \tag{2.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!} (-1)^i \gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x), \]

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$,

\[ \gamma_0(x + y) = \gamma_0(x) + y\,\gamma_0^{(1)}(x) + \cdots, \]

evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term, and we say that $\gamma$ is a polynomial extremal. Otherwise, when $C > 0$, we have an exponential extremal.
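Formulas (2.8) and (2.9) can be evaluated term by term, since each derivative of $\gamma_0$ is available in closed form. The following sketch takes the twist parameters rho and C as given inputs (in practice they would come from solving (2.5) and (2.6)); the function names and argument layout are hypothetical choices made for this illustration.

```python
import math

def poisson_pmf(i, t):
    """P_i(t) = exp(-t) * t**i / i!"""
    return math.exp(-t) * t**i / math.factorial(i)

def gamma0_derivative(order, x, omega_head, beta, rho, C):
    """order-th derivative of gamma_0 from (2.8), evaluated at x."""
    I = len(omega_head) - 1
    # Exponential part: d^n/dx^n of C * exp(-rho * x).
    value = C * (-rho) ** order * math.exp(-rho * x)
    # Polynomial part: d^n/dx^n of a_k * (1 - x/beta)**k, nonzero only for k >= n.
    for k in range(order, I + 1):
        a_k = omega_head[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) // math.factorial(k - order)
        value += a_k * falling * (-1.0 / beta) ** order * (1.0 - x / beta) ** (k - order)
    return value

def extremal_path(x, omega_head, beta, rho, C):
    """Return [gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)] from (2.8) and (2.9)."""
    I = len(omega_head) - 1
    gamma = [
        x**i / math.factorial(i) * (-1) ** i
        * gamma0_derivative(i, x, omega_head, beta, rho, C)
        for i in range(I + 1)
    ]
    gamma.append(1.0 - sum(gamma))
    return gamma
```

With $\rho = C = 1$ and $\omega_i = \mathcal{P}_i(\beta)$, the polynomial coefficients vanish and the path reduces to the Poisson law of large numbers trajectory $\gamma_i(x) = \mathcal{P}_i(x)$. At $x = \beta$, the components $\gamma_i(\beta)$ for $i \le I$ reproduce the terminal values $\omega_i$.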

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k + j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $S_\infty$ denote the set of distributions on the nonnegative integers, and let $S_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $S_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)}\}$, where for any $k\in\mathcal{K}$ a component distribution is denoted $\pi^{(k)} = \{\pi_{k0},\pi_{k1},\ldots\}$, and it is understood that the corresponding $\pi^{(k)}$ is omitted if $\alpha_k = 0$. Finally, let $F(\alpha,\beta,\omega)$ be the set of $\pi\in S_\infty^{|\mathcal{K}|}$ which satisfy the terminal constraints

$$\omega_i = \sum_{k\le i,\; k\in\mathcal{K}} \alpha_k\pi_{k,i-k} \qquad \text{for all } 0\le i\le I, \tag{2.10}$$

along with the conservation constraint

$$\sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta. \tag{2.11}$$

As in the empty case, it may be established that $F$ is nonempty if and only if $(\alpha,\omega,\beta)$ is feasible.

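Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega)\colon S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed

$$\mathcal{J}(\omega) = \min_{\pi\in F(\alpha,\omega,\beta)} \sum_{k\in\mathcal{K}} \alpha_k D\bigl(\pi^{(k)}\,\|\,\mathcal{P}(\beta)\bigr)$$

whenever $(\alpha,\beta,\omega)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^*\in F(\alpha,\omega,\beta)$ is unique.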

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

$$\pi^*_{kj} = \begin{cases} C_kP_j(\rho\beta)W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\\ C_kP_j(\rho\beta), & k\in\mathcal{K},\ k+j> I. \end{cases}$$

In the case of equality in (2.4) (the polynomial case), the corresponding form is

$$\pi^*_{kj} = \begin{cases} D_kP_j(\beta)W_{k+j}, & k\in\mathcal{K},\ j+k\le I,\\ 0, & k+j> I. \end{cases}$$

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k\in\mathcal{K}$, denote the mean of $\pi^{*(k)}$ by $\beta_k = \sum_{j=0}^{\infty} j\pi^*_{kj}$, and define the terminal condition $\omega^{(k)}\in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0,\ldots,I-k$. Then the constraints $(1,\omega^{(k)},\beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma^{(k)} = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha,\omega,\beta)$, let $\pi^*\in S_\infty^{|\mathcal{K}|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj}\colon[0,\beta_k]\to[0,1]$ be the minimizing paths corresponding to the subproblems $(1,\omega^{(k)},\beta_k)$. The infimum of $J(\gamma)$ over $A(\alpha,\omega,\beta)$ is achieved on the occupancy path $\gamma\in A(\alpha,\omega,\beta)$ defined by

$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I,$$

$$\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x).$$

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta\in\mathbb{R}^{I+2}$ and $\gamma\in S_I$, define

$$H(\gamma,\zeta) = \log\bigl(E[\exp(\zeta\cdot b_i(\gamma))]\bigr),$$

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h\colon\mathbb{R}\to\mathbb{R}$ such that $H(\gamma,\zeta)\le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma,\zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1,2,\ldots\}$ satisfies a large deviation upper bound with a rate function $J$ which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma,\zeta)$ in $\zeta$:

$$L(\gamma,\eta) = \sup_{\zeta\in\mathbb{R}^{I+2}} [\zeta\cdot\eta - H(\gamma,\zeta)].$$

If $\{\gamma(x), 0\le x\le\beta\}$ is an absolutely continuous function that takes values in $S_I$, then

$$J(\gamma) = \int_0^\beta L(\gamma(x),\dot\gamma(x))\,dx.$$

If $\gamma$ is not absolutely continuous, then $J(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma\in\mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma,\zeta)\le h(|\zeta|)$ and the continuity of $H(\gamma,\zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that the rate function just defined coincides with the rate function defined in (2.2). This will hold if we can show that $L(\gamma,\eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

$$L(\gamma,M\theta) = D(\theta\,\|\,\gamma).$$

It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma,\eta)<\infty$, then $\eta$ can be written as a convex combination of the form

$$\theta_0(e_1-e_0) + \cdots + \theta_I(e_{I+1}-e_I),$$

where $\theta_j\ge 0$ and $\sum_{j=0}^{I}\theta_j\le 1$. Since the vectors $(e_{j+1}-e_j)$, $j = 0,1,\ldots,I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1-\sum_{j=0}^{I}\theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

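$$\begin{aligned}
L(\gamma,M\theta) &= \sup_{\zeta\in\mathbb{R}^{I+2}} \bigl[\zeta\cdot M\theta - \log\bigl(E[\exp(\zeta\cdot b_i(\gamma))]\bigr)\bigr]\\
&= \sup_{\zeta\in\mathbb{R}^{I+2}} \left[M^T\zeta\cdot\theta - \log\left(\left[\sum_{j=0}^{I}\gamma_j\exp(\zeta_{j+1}-\zeta_j)\right]+\gamma_{I+1}\right)\right]\\
&= \sup_{\zeta\in\mathbb{R}^{I+2}} \left[\sum_{j=0}^{I}(\zeta_{j+1}-\zeta_j)\theta_j - \log\left(\left[\sum_{j=0}^{I}\gamma_j\exp(\zeta_{j+1}-\zeta_j)\right]+\gamma_{I+1}\right)\right].
\end{aligned}$$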

Given any values $\mu_0,\ldots,\mu_{I+1}$, we can define $\zeta_0,\ldots,\zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1}-\zeta_j = \mu_j-\mu_{I+1}$. With these definitions it is apparent that the last display is equal to

$$\begin{aligned}
&\sup_{\mu\in\mathbb{R}^{I+2}} \left[\sum_{j=0}^{I}\mu_j\theta_j - \mu_{I+1}(1-\theta_{I+1}) - \log\left(\left[\sum_{j=0}^{I}\gamma_j\exp(\mu_j-\mu_{I+1})\right]+\gamma_{I+1}\right)\right]\\
&\quad= \sup_{\mu\in\mathbb{R}^{I+2}} \left[\sum_{j=0}^{I}\mu_j\theta_j - \mu_{I+1}(1-\theta_{I+1}) + \mu_{I+1} - \log\left(\sum_{j=0}^{I+1}\gamma_j\exp\mu_j\right)\right]\\
&\quad= \sup_{\mu\in\mathbb{R}^{I+2}} \left[\sum_{j=0}^{I+1}\mu_j\theta_j - \log\left(\sum_{j=0}^{I+1}\gamma_j\exp\mu_j\right)\right].
\end{aligned}$$

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta\,\|\,\gamma)$, thereby completing the proof of the upper bound.
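The variational formula used in this last step can be checked numerically on a small example. The sketch below uses arbitrary illustrative probability vectors `theta` and `gamma` (not taken from the paper) and relies on the standard fact that the supremum is attained at $\mu_j = \log(\theta_j/\gamma_j)$ when all entries are positive.

```python
import math
import random

def rel_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors with gamma > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """mu . theta - log(sum_j gamma_j exp(mu_j))."""
    linear = sum(m * t for m, t in zip(mu, theta))
    return linear - math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))

# Illustrative vectors only
theta = [0.2, 0.5, 0.3]
gamma = [0.6, 0.3, 0.1]

d = rel_entropy(theta, gamma)
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
at_maximizer = dv_objective(mu_star, theta, gamma)

# Random trial points never exceed the relative entropy
rng = random.Random(0)
trials = [dv_objective([rng.uniform(-5, 5) for _ in theta], theta, gamma)
          for _ in range(1000)]
```

Here `at_maximizer` agrees with `d` up to rounding, and every entry of `trials` is bounded above by `d`.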

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp. $E_{\Gamma^n(0)}$] denote probability (resp. expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$ there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0)-\gamma(0)| < \eta$,

$$\liminf_{n\to\infty} \frac{1}{n}\log P_{\Gamma^n(0)}\left(\sup_{0\le x\le\beta} |\Gamma^n(x)-\gamma(x)| < \delta\right) \ge -J(\gamma)-\varepsilon. \tag{3.1}$$

Of course, this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K\in\mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0\le x\le\beta}|y(x)-\gamma(x)| < a$, $y_j(x) > bx^K$ for all $j = 0,1,\ldots,I,I+$ and $0 < x\le\beta$, and such that

$$J(y)\le J(\gamma).$$

Consider the zero cost trajectory defined by

$$\dot z(x) = Mz(x), \qquad z(0) = \gamma(0).$$

We have the following expression for $z_j$ when $j\le I$:

$$z_j(x) = \left[\sum_{k=0}^{j} \gamma_k(0)\frac{x^{j-k}}{(j-k)!}\right]e^{-x}. \tag{3.2}$$
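The explicit formula (3.2) is easy to evaluate. The sketch below assumes, as differentiating (3.2) suggests, that the zero cost dynamics read $\dot z_j = z_{j-1} - z_j$ (with $z_{-1} = 0$); the function name and the sample initial condition are illustrative only.

```python
import math

def zero_cost_trajectory(x, gamma0):
    """Evaluate z_0(x), ..., z_I(x) from (3.2).

    gamma0 = [gamma_0(0), ..., gamma_I(0)] are the initial occupancy fractions.
    """
    return [
        math.exp(-x) * sum(gamma0[k] * x ** (j - k) / math.factorial(j - k)
                           for k in range(j + 1))
        for j in range(len(gamma0))
    ]

# Central-difference check of dz_j/dx = z_{j-1} - z_j at one point
gamma0 = [0.5, 0.3, 0.2]
x, h = 0.7, 1e-6
z = zero_cost_trajectory(x, gamma0)
z_plus = zero_cost_trajectory(x + h, gamma0)
z_minus = zero_cost_trajectory(x - h, gamma0)
derivative = [(p - m) / (2 * h) for p, m in zip(z_plus, z_minus)]
expected = [(z[j - 1] if j > 0 else 0.0) - z[j] for j in range(len(z))]
```

With empty initial conditions, `gamma0 = [1, 0, ..., 0]`, the formula reduces to the Poisson probabilities $z_j(x) = e^{-x}x^j/j!$.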

It is easy to check from this explicit formula that $z_j(x) > bx^K$ for some $b > 0$, $K = I$ and all $j = 0,1,\ldots,I,I+$ and $0 < x\le\beta$. For $\rho\in(0,1)$, let $y^\rho = \rho z + (1-\rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z\,\|\,z) = 0$, we have

$$\begin{aligned}
J(y^\rho) &= \int_0^\beta D\bigl(\rho z(x) + (1-\rho)\theta(x)\,\|\,\rho z(x) + (1-\rho)\gamma(x)\bigr)\,dx\\
&\le \rho\int_0^\beta D(z(x)\,\|\,z(x))\,dx + (1-\rho)\int_0^\beta D(\theta(x)\,\|\,\gamma(x))\,dx\\
&= (1-\rho)J(\gamma).
\end{aligned}$$

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho\in(0,1)$.

It follows that in proving the lower bound we can assume without loss of generality that for some fixed constants $b > 0$ and $K\in\mathbb{N}$, $\gamma_i(x)\ge bx^K$ for all $x\in[0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau\in(0,\beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

$$h(\gamma) = \begin{cases} 0, & |\gamma-\gamma(\tau)| < \sigma/2,\\ 2\varepsilon, & \text{else}. \end{cases}$$

For $n$ large enough that $|\gamma(\lfloor n\tau\rfloor/n)-\gamma(\tau)|\le\sigma/2$ (and independent of $\tau\in(0,\beta]$), we have the inequality

$$\begin{aligned}
&P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\lfloor n\tau\rfloor/n)| < \sigma\bigr) + e^{-2n\varepsilon}\\
&\quad\ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon}\\
&\quad\ge E_{\Gamma^n(0)}\bigl(\exp -nh(\Gamma^n(\lfloor n\tau\rfloor/n))\bigr).
\end{aligned}$$

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a controlled process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b^n_i$:

$$\bar\Gamma^n\left(\frac{i+1}{n}\right) = \bar\Gamma^n\left(\frac{i}{n}\right) + \frac{1}{n}\bar b^n_i, \qquad \bar\Gamma^n(0) = \Gamma^n(0).$$

Furthermore, the distribution of $\bar b^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n), 0\le j\le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar\mu^n_i$ denote the (random) distribution of $\bar b^n_i$ given $\{\bar\Gamma^n(j/n), 0\le j\le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\left(\exp -nh\left(\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right)\right) = \inf \bar E_{\Gamma^n(0)}\left[h\left(\bar\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right) + \frac{1}{n}\sum_{i=1}^{\lfloor n\tau\rfloor-1} D\bigl(\bar\mu^n_i\,\|\,\mu_{\bar\Gamma^n(i/n)}\bigr)\right],$$

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b^n_i$. We can write $\gamma(\tau)-\gamma(0) = Mv\tau$ for some probability vector $v$. Define a process $\bar b^n_i$ as follows:

$$\bar b^n_i = \begin{cases}
e_j - e_{j-1} & \text{if } \left\lfloor \sum_{k=0}^{j-1} v_k\tau n\right\rfloor \le i\le \left\lfloor \sum_{k=0}^{j} v_k\tau n\right\rfloor - 1 \text{ for } 0\le j\le I,\\[4pt]
0 & \text{if } \left\lfloor \sum_{k=0}^{I} v_k\tau n\right\rfloor \le i\le \lfloor\tau n\rfloor - 1.
\end{cases}$$

In other words, $\bar b^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0-e_1$ for an amount of time $v_0\tau$, $e_1-e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j-e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j-e_{j-1}$, the cost is

$$D\bigl(\bar\mu^n_i\,\|\,\mu_{\bar\Gamma^n(i/n)}\bigr) = \log\left(\frac{1}{\bar\Gamma^n_{j-1}(i/n)}\right) = -\log\left(\bar\Gamma^n_{j-1}\left(\frac{i}{n}\right)\right). \tag{3.3}$$

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n)-\bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor\sum_{k=0}^{j-1} v_k\tau n\rfloor\le i\le\lfloor\sum_{k=0}^{j} v_k\tau n\rfloor-1$,

$$\bar\Gamma^n_{j-1}\left(\frac{i}{n}\right) \downarrow \bar\Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor\sum_{k=0}^{j} v_k\tau n\right\rfloor\right)$$

as $i\uparrow\lfloor\sum_{k=0}^{j} v_k\tau n\rfloor$. In addition, because the $(j-1)$st component is never modified when $i\ge\lfloor\sum_{k=0}^{j} v_k\tau n\rfloor$, it follows that

$$\bar\Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor\sum_{k=0}^{j} v_k\tau n\right\rfloor\right) \to \gamma_{j-1}(\tau)$$

as $n\to\infty$ and $\eta\to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K\in\mathbb{N}$ such that $\gamma_j(\tau)\ge b\tau^K$ for $j = 0,\ldots,I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau)\ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (independent of $\tau$) such that whenever $|\Gamma^n(0)-\gamma(0)| < \eta$, for all $i$,

$$D\bigl(\bar\mu^n_i\,\|\,\mu_{\bar\Gamma^n(i/n)}\bigr) \le C_1[-\log\tau^K] \le -C_2\log\tau.$$

In addition, as $n\to\infty$ and $\eta\to 0$,

$$\bar\Gamma^n\left(\frac{1}{n}\lfloor\tau n\rfloor\right) \to \gamma(\tau).$$

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0)-\gamma(0)| < \eta$ implies

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\left(\exp -nh\left(\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right)\right) \le -C_2\tau\log\tau.$$

We now choose $\tau > 0$ so that $-C_2\tau\log\tau\le\varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x)-\gamma(x)|\le\delta$ for all $x\in[0,\tau]$ w.p.1 if $|\Gamma^n(0)-\gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0)-\gamma(0)| < \eta$ implies

$$\liminf_{n\to\infty} \frac{1}{n}\log P_{\Gamma^n(0)}\left(\left|\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)-\gamma\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right| < \sigma,\ \sup_{x\in[0,\tau]} |\Gamma^n(x)-\gamma(x)| < \delta\right) \ge -\frac{\varepsilon}{2}.$$

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x\in[0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display be independent of $\sigma > 0$.

Now choose $\zeta\in(0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x\in[\tau,\beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i\in\{0,1,\ldots,I+1\}$ (i.e., $\gamma\in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1,2,\ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\lfloor n\tau\rfloor/n)| < \sigma$,

$$\liminf_{n\to\infty} \frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\lfloor n\tau\rfloor/n}\left(\sup_{\tau\le x\le\beta} |\Gamma^n(x)-\gamma(x)| < \zeta\right) \ge -J(\gamma)-\frac{\varepsilon}{2},$$

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and is therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

$$\mathcal{J}(\Omega) = \inf_{\omega\in\Omega} \mathcal{J}(\omega).$$

Using Theorem 2.7, we can write

$$\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega\in\Omega}\ \inf_{\pi\in F(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_kD\bigl(\pi^{(k)}\,\|\,\mathcal{P}(\beta)\bigr)\\
&= \inf_{\pi\in F(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_kD\bigl(\pi^{(k)}\,\|\,\mathcal{P}(\beta)\bigr),
\end{aligned} \tag{4.1}$$

where we have abused notation to define $F(\alpha,\Omega,\beta) = \bigcup_{\omega\in\Omega} F(\alpha,\omega,\beta)$.

In many cases, for example when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega\in\Omega^\circ}\mathcal{J}(\omega) = \inf_{\omega\in\bar\Omega}\mathcal{J}(\omega)$. For such sets the large deviations lower and upper bounds coincide, so that $-\lim\frac{1}{n}\log P(\Gamma^n(\beta)\in\Omega) = \inf_{\omega\in\Omega}\mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.

4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

$$\gamma_0(x) = \frac{1}{\rho}e^{-\rho x} + 1 - \frac{1}{\rho}.$$

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

$$\rho(1-\omega_0) = 1-e^{-\beta\rho}.$$

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
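The fixed point equation for $\rho$ can be solved by bisection. The sketch below is one possible approach, not necessarily the method used by the authors; it assumes $\beta > 1-\omega_0$, so that a positive root exists, and the function names are hypothetical. The sample values $\omega_0 = 0.15$ and $\beta = 3$ are those of the example discussed in connection with Figure 1.

```python
import math

def solve_rho(omega0, beta, iters=200):
    """Positive root of rho * (1 - omega0) = 1 - exp(-beta * rho).

    Assumes beta > 1 - omega0. The function f below is concave with f(0) = 0,
    so it is positive to the left of the positive root and negative to the right.
    """
    def f(rho):
        return -math.expm1(-beta * rho) - rho * (1.0 - omega0)

    lo, hi = 1e-9, 1.0 / (1.0 - omega0)  # f(lo) > 0 and f(hi) < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Empty-urn fraction along the least cost path, with C = 1 / rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

rho = solve_rho(0.15, 3.0)
terminal = gamma0(3.0, rho)  # should reproduce omega0 = 0.15
```

For these values the root lies between 1.1 and 1.2, that is, the twisted path empties urns more slowly at the end than the zero cost path $e^{-x}$ would.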

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies, conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns $\omega_0 = 0.15$ is three times larger than the expected value $P_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

$$w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),$$

or, since $\Gamma_{I+} = 1-\sum_{i=0}^{I}\Gamma_i$,

$$w(r) = r - nI + n\sum_{i=0}^{I} (I-i)\Gamma_i(r/n).$$
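The identity for $w(r)$ can be confirmed by a small simulation. The sketch below throws $r$ balls uniformly into $n$ unbounded urns and compares a direct count of the overflow with the occupancy formula; the function name, seed and sample sizes are illustrative choices only.

```python
import random
from collections import Counter

def overflow_two_ways(n, r, I, seed=0):
    """Return (direct overflow count, overflow from the occupancy formula)."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1  # each ball picks an urn uniformly

    # Direct count: balls beyond capacity I in each (unbounded) urn
    direct = sum(max(c - I, 0) for c in counts)

    # Formula: w(r) = r - n*I + n * sum_{i<=I} (I - i) * Gamma_i,
    # where n * Gamma_i is the number of urns holding exactly i balls
    urns_with = Counter(counts)
    via_formula = r - n * I + sum((I - i) * urns_with[i] for i in range(I + 1))
    return direct, via_formula

direct, via_formula = overflow_two_ways(n=50, r=200, I=3)
```

Both quantities are integers and agree exactly for every realization, since the identity is purely combinatorial.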

In order to compute

$$J_O(\eta,\beta) = -\lim_{n} \frac{1}{n}\log P\left(\frac{w(\beta n)}{n} > \eta\right),$$

we therefore consider sample paths which satisfy the end constraint

$$\sum_{i=0}^{I} (I-i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2}$$

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I-\beta]^+\le\zeta < I$, and that the average overflow satisfies $[\beta-I]^+\le\eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $\mathcal{P}(\beta)$ under the linear equality constraints

$$\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad \text{and} \qquad \sum_{i=0}^{I} (I-i)\pi_i = \zeta. \tag{4.3}$$

We introduce the Lagrangian

$$L(\pi,y,z,\lambda) = \sum_{i=0}^{\infty} \pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left(1-\sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta-\sum_{i=0}^{\infty} i\pi_i\right) + \lambda\left(\zeta-\sum_{i=0}^{I} (I-i)\pi_i\right)$$

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

$$\log\pi^*_i = \log P_i(\beta) - 1 + y + iz + (I-i)^+\lambda.$$

In terms of variables $C$, $\rho$ and $\nu$, we may write

$$\pi^*_i = CP_i(\rho\beta) \qquad \text{for } i\ge I$$

and

$$\pi^*_i = CP_i(\rho\beta)\nu^{I-i} \qquad \text{for } i\le I.$$

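The distribution is conditionally Poisson($\rho\beta/\nu$) for $i\le I$ and conditionally Poisson($\rho\beta$) for $i\ge I$. For convenience, we introduce the notation

$$Q_I(\rho) = \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta}\sum_{i=I+1}^{\infty} iP_i(\rho\beta),$$

$$R_I(\rho,\nu) = \nu^I e^{-\rho\beta(1-1/\nu)}\sum_{i=0}^{I-1} P_i\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta}e^{-\rho\beta(1-1/\nu)}\sum_{i=1}^{I} iP_i\left(\frac{\rho\beta}{\nu}\right).$$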

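Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

$$C\bigl(R_I(\rho,\nu) + Q_I(\rho)\bigr) = 1,$$

$$C\left(\frac{\rho\beta}{\nu}R_I(\rho,\nu) + \rho\beta Q_I(\rho)\right) = \beta,$$

$$C\left(IP_I(\rho\beta) + \left(I-\frac{\rho\beta}{\nu}\right)R_I(\rho,\nu)\right) = \zeta.$$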

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\,\|\,\mathcal{P}(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I+Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

$$(\rho/\nu-1)R_I(\rho,\nu) + (\rho-1)Q_I(\rho) = 0,$$

$$(I-\rho\beta/\nu-\zeta)R_I(\rho,\nu) - \zeta Q_I(\rho) + IP_I(\rho\beta) = 0.$$

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

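The large deviations exponent for the overflow problem may then be expressed

$$\begin{aligned}
J_O(\eta,\beta) &= D\bigl(\pi^*\,\|\,\mathcal{P}(\beta)\bigr)\\
&= \sum_{i=0}^{\infty} \pi^*_i\log\bigl(Ce^{\beta-\rho\beta}\rho^i\nu^{(I-i)^+}\bigr)\\
&= \log C + \beta(1-\rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}$$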
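The closed form for $J_O$ can be checked against a direct evaluation of the relative entropy. Rather than solving the three constraint equations, the sketch below works in the forward direction, which is an assumption made for simplicity: it picks values of $\rho\beta$ and $\nu$, builds the tilted distribution, and reads off the implied $C$, $\beta$, $\rho$ and $\zeta$. The function names, the truncation at 80 terms and the sample inputs are illustrative.

```python
import math

def pois(i, lam):
    """Poisson probability, computed in log space to avoid overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def tilted_overflow(lam, nu, I, n_terms=80):
    """Build pi*_i proportional to P_i(lam) * nu^{(I-i)^+}, with lam = rho*beta.

    Returns the implied (C, rho, beta, zeta) and the exponent computed
    directly and via the closed form. The support is truncated at n_terms.
    """
    weights = [pois(i, lam) * nu ** max(I - i, 0) for i in range(n_terms)]
    C = 1.0 / sum(weights)                       # normalization constraint
    pi = [C * w for w in weights]
    beta = sum(i * p for i, p in enumerate(pi))  # conservation constraint
    zeta = sum((I - i) * p for i, p in enumerate(pi[: I + 1]))  # spare capacity
    rho = lam / beta
    direct = sum(p * math.log(p / pois(i, beta)) for i, p in enumerate(pi))
    closed = (math.log(C) + beta * (1 - rho) + beta * math.log(rho)
              + zeta * math.log(nu))
    return C, rho, beta, zeta, direct, closed

C, rho, beta, zeta, direct, closed = tilted_overflow(lam=3.0, nu=1.5, I=2)
```

For this choice the two evaluations of the exponent agree to rounding error, and the implied parameters satisfy $\nu > \rho > 1$, consistent with a larger than expected spare capacity $\zeta$.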

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I+1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i\le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

$$J_C(\alpha,\beta,\xi) = -\lim_{n} \frac{1}{n}\log P\left(\sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi\right),$$

where

$$\xi < \sum_{k=0}^{K} \alpha_k\sum_{i=0}^{I-k} P_i(\beta)$$

is an unusually small number of low occupancy urns. The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

$$\sum_{k=0}^{K} \alpha_k\sum_{j=0}^{I-k} \pi_{kj} = \xi,$$

where $K\le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

$$\omega_{kj} = \pi^*_{kj} = \begin{cases} C_kP_j(\rho\beta)W, & j+k\le I,\\ C_kP_j(\rho\beta), & j+k > I. \end{cases}$$

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

$$J_C(\alpha,\beta,\xi) = \beta(1-\rho+\log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k\log C_k.$$

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0)\approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C\approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
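The two numbers quoted in this example can be reproduced directly. The sketch below evaluates the zero-cost fraction of low occupancy types, $\sum_k \alpha_k\sum_{i=0}^{I-k} P_i(\beta)$, and the crude estimate $e^{-nJ_C}$; the latter ignores any subexponential prefactor, so it is an order-of-magnitude figure only. The function names are hypothetical, and the exponent 0.18 is taken from the text rather than computed.

```python
import math

def pois(i, lam):
    """Poisson probability of i with mean lam."""
    return math.exp(-lam) * lam**i / math.factorial(i)

def zero_cost_low_fraction(alpha, beta, I):
    """Expected fraction of types with I or fewer coupons after beta*n draws.

    alpha[k] is the fraction of types initially holding k coupons; such a
    type stays at or below I if it receives at most I - k further coupons.
    """
    return sum(a * sum(pois(i, beta) for i in range(I - k + 1))
               for k, a in enumerate(alpha))

psi3 = zero_cost_low_fraction([0.5, 0.3, 0.2], beta=2.0, I=3)  # about 0.71
rough_probability = math.exp(-100 * 0.18)  # exp(-n * J_C) with J_C from the text
```

The first value confirms that about 29 of the 100 types are expected to reach four or more coupons, and the second is of order $10^{-8}$.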

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1,2,\ldots,K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible, from the point of view of large deviations, by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x\in(0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and that such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega)\le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega)\ge J(\gamma)$. This bound is proved in Theorem A.14, using the Euler–Lagrange equations together with properties of the relative entropy.

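A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

$$\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0,\ldots,I \quad \text{(monotonicity)} \tag{A.4}$$

and

$$\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad \text{(conservation)}. \tag{A.5}$$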
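The two feasibility conditions translate directly into a small check. In the sketch below the function name, the list layout (the last entry of each vector holds the fraction of urns with more than $I$ balls) and the numerical tolerance are all assumptions made for illustration.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (A.4) and conservation (A.5).

    alpha = [alpha_0, ..., alpha_I, alpha_{I+1}]
    omega = [omega_0, ..., omega_I, omega_{I+}]
    Both are probability vectors of length I + 2.
    """
    I = len(omega) - 2

    # (A.4): cumulative initial mass dominates cumulative terminal mass
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # (A.5): the terminal state needs no more balls than were available
    needed = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    available = sum(i * alpha[i] for i in range(I + 2)) + beta
    return needed <= available + tol
```

For instance, with empty initial conditions and $I = 0$, the terminal condition $\omega_0 = 0.15$ is feasible for $\beta = 3$ but not for $\beta = 0.5$, since at least $0.85$ balls per urn are needed to occupy 85% of the urns.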

Proof of Lemma 2.4. If a valid occupancy curve $\gamma\colon[0,\beta]\to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

$$\begin{aligned}
\sum_{i=0}^{I}\left(\sum_{j=0}^{i}\alpha_j - \sum_{j=0}^{i}\omega_j\right) &= \sum_{j=0}^{I} (I+1-j)(\alpha_j-\omega_j)\\
&= (I+1)(\omega_{I+}-\alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j-\alpha_j).
\end{aligned} \tag{A.6}$$

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

$$\gamma_i(x) = \alpha_i + (\omega_i-\alpha_i)\frac{x}{\beta}, \qquad i = 0,\ldots,I,$$

$$\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x) = \alpha_{I+1} + (\omega_{I+}-\alpha_{I+1})\frac{x}{\beta}$$

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i\le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i\le 1$ follows from (A.5).

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0,\ldots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

$$J(\psi) = \int_0^\beta L(\psi(x),\theta(x))\,dx.$$

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i-\psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

$$\begin{aligned}
L(\psi,\theta) &= D(\theta\,\|\,\gamma)\\
&= \sum_{i=0}^{I} \theta_i\log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}}\\
&= \sum_{i=0}^{I} \theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}} + \left(1-\sum_{i=0}^{I}\theta_i\right)\log\frac{1-\sum_{i=0}^{I}\theta_i}{1-\psi_I}.
\end{aligned}$$

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[
\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\,\frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))
\]

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above and show that the extremal paths are in fact globally minimal.

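In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right]
\tag{A7}
\]

for $i = 0, \ldots, I-1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I}
= \frac{d}{dx}\left[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right].
\tag{A8}
\]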

In the case when we have equality in the conservation constraint (24), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[
L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}.
\tag{A9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \right]
\tag{A10}
\]

for $i = 0, \ldots, I-1$.


A4. Characterization of the extremals under empty initial conditions.

In this section we consider the simplest and most important case, in which the urns are all initially empty: $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C\,\mathcal{P}_i(\rho\beta) = 1,
\qquad
\sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C\,\mathcal{P}_i(\rho\beta) = \beta.
\]
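The two equations for the twist parameters can be solved numerically. The sketch below is one possible approach, not the authors' procedure: dividing the second equation's tail term by the first's shows that $\rho$ must make the mean of a Poisson($\rho\beta$) variable conditioned to exceed $I$ equal to $(\beta - \sum_i i\omega_i)/(1 - \sum_i \omega_i)$, and this conditional mean is increasing in $\rho$, so bisection applies. The names `poisson_tail` and `twist_parameters`, the truncation at 400 terms, and the choice to return `rho = None` in the polynomial case are all assumptions made for illustration.

```python
import math

def poisson_tail(mu, I, n_terms=400):
    """Return (sum_{i>I} P_i(mu), sum_{i>I} i*P_i(mu)) by direct summation."""
    mass = mean = 0.0
    for i in range(I + 1, I + 1 + n_terms):
        p = math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
        mass += p
        mean += i * p
    return mass, mean

def twist_parameters(omega, beta):
    """Solve the two twist equations for (C, rho) given omega_0..omega_I and beta."""
    I = len(omega) - 1
    slack_mass = 1.0 - sum(omega)  # mass left for urns with more than I balls
    slack_mean = beta - sum(i * w for i, w in enumerate(omega))
    if slack_mass <= 0.0:
        return 0.0, None  # polynomial case: C = 0 (rho left unset here)
    target = slack_mean / slack_mass  # required conditional mean of the tail
    if target <= I + 1:
        raise ValueError("constraints are not feasible")

    def cond_mean(rho):
        mass, mean = poisson_tail(rho * beta, I)
        return mean / mass

    lo, hi = 1e-8, 1.0
    while cond_mean(hi) < target:  # bracket the root
        hi *= 2.0
    for _ in range(200):  # bisection on the increasing function cond_mean
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = slack_mass / poisson_tail(rho * beta, I)[0]
    return C, rho
```

The fixed truncation is adequate only for moderate values of $\rho\beta$; a production implementation would choose the number of terms adaptively.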

Theorem A1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C\,\mathcal{P}_k(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^{k},
\tag{A8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}\,(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I,
\tag{A9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.
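As a numerical sanity check of the construction in Theorem A1, the following Python sketch evaluates $\gamma_i(x)$ from the closed-form derivatives of $\psi_0$ (an exponential plus a polynomial, differentiated term by term). The helper names `poisson`, `signed_derivative` and `gamma_i` are hypothetical, and the parameter values in the usage note are arbitrary choices, not taken from the paper.

```python
import math

def poisson(i, mu):
    """Poisson probability P_i(mu)."""
    return math.exp(-mu) * mu**i / math.factorial(i)

def signed_derivative(i, x, C, rho, omega, beta):
    """(-1)^i psi_0^{(i)}(x) for the psi_0 of Theorem A1."""
    I = len(omega) - 1
    value = rho**i * C * math.exp(-rho * x)  # exponential part
    for k in range(i, I + 1):  # polynomial part; empty when i > I
        coeff = omega[k] - C * poisson(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)  # k!/(k-i)!
        value += coeff * beta**(-i) * falling * (1.0 - x / beta)**(k - i)
    return value

def gamma_i(i, x, C, rho, omega, beta):
    """gamma_i(x) = x^i / i! * (-1)^i psi_0^{(i)}(x)."""
    return x**i / math.factorial(i) * signed_derivative(i, x, C, rho, omega, beta)
```

For example, with $I = 1$, $\beta = 1$, $C = 0.8$, $\rho = 1.2$ and $\omega$ chosen so that the twist equations hold, one can check that `gamma_i(0, 0.0, ...)` equals 1, that `gamma_i(i, beta, ...)` returns $\omega_i$, and that the $\gamma_i(x)$ summed over many $i$ give 1 at interior points.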

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \left(1 - \frac{x}{\beta}\right)^{k}.
\tag{A10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A2. A nonnegative function $\phi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[
(-1)^i \phi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a, b] \text{ and } i > 0.
\tag{A11}
\]


This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A2. The function $\psi_0(x)$ given in (A8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly, $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C\,\mathcal{P}_k(\rho\beta)\bigr)\,\beta^{-i}\,\frac{k!}{(k-i)!}\left(1 - \frac{x}{\beta}\right)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
\tag{A12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\phi(x) \doteq (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\phi(\beta) = (-1)^I \psi_0^{(I)}(\beta)
= \rho^I C e^{-\rho\beta} + \left(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\right)\beta^{-I} I!
= \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I \ge 0,
\]

so that $\phi(x) > 0$ on $[0, \beta)$. It follows that $\phi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I > 0 \quad \text{for all } x \in [0, \beta].
\]

Proof of Theorem A1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (26), since

\[
\psi_0(0) = C\Bigl(1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)\Bigr) + \sum_{i=0}^{I} \omega_i = 1.
\]


A similar computation, also using (26), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}\,(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots.
\tag{A13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\,\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A2 the $\gamma_i$ are nonnegative on $[0, \beta]$, the sequence $\{\gamma_i(x)\}$ lies in $S_\infty$ for all $x$ in that interval. It follows from (A13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) = -\dot\psi_i(x)
&= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}\,(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}\,(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}\,(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 21 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0, \beta]$. Thus, the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho\,C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A12) and (A14), and the second uses (A12) and (A13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A7) and (A8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A13) and (A14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0.
\tag{A15}
\]

Then

\[
\frac{d}{dx}\left[-\log\frac{\theta_i}{\gamma_i}\right]
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta),
\tag{A16}
\]

verifying (A7). Likewise, to verify (A8), we apply (A16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A12).

In the polynomial case $C = 0$, note that (A15) holds for $i = 0, \ldots, I$, so that (A16) implies (A10).

A5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A1, as can be readily verified.

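Lemma A3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

\[
(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0, \ldots, \gamma_I, \gamma_{I+}\}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \Bigl[\gamma_i(\beta) \log\bigl|\psi_0^{(i)}(\beta)\bigr| - \gamma_i(0) \log\bigl|\psi_0^{(i)}(0)\bigr|\Bigr].
\tag{A17}
\]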

Proof. For all $i > I$ and $x \in [0, \beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A12) and (A13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} (-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho\,(1 - \psi_I).
\]

Then for each $x \in [0, \beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi)
&= \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
 + (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]


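Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\bigl|\psi_0^{(i+1)}(x)\bigr|\,dx
 + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log\bigl|\psi_0^{(i)}(x)\bigr|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr) \log\bigl|\psi_0^{(i)}(x)\bigr|\,dx.
\end{aligned}
\tag{A18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$

\[
-\dot\psi_i(x) \log\bigl|\psi_0^{(i)}(x)\bigr| = \rho\,\gamma_i(x)\,[\log C + i \log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A15) for each term of (A18), the integral is

\[
\sum_{i=0}^{\infty} \Bigl[\gamma_i \log\bigl|\psi_0^{(i)}\bigr|\Bigr]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.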

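A similar result holds in the case of the polynomial extremal.

Lemma A4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I-1,
\]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \Bigl[\gamma_i(\beta) \log\bigl|\psi_0^{(i)}(\beta)\bigr| - \gamma_i(0) \log\bigl|\psi_0^{(i)}(0)\bigr|\Bigr].
\tag{A19}
\]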


Proof. The cost is obtained by integrating the reduced cost function (A9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A3.

A6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A1 satisfy $\gamma_i(x) = x^i |\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A17) and (A19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, \mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero-cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 25, which we restate here.

Theorem A5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, \mathcal{P}(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$ and define the Lagrangian $\mathcal{L}$ on this set to be

\[
\mathcal{L}(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\,\pi_i\Bigr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$ and the strict convexity of $x \log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x \log x - x[\log \mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[
\pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\,\pi_i - z\,i\,\pi_i \ \ge\ \pi^*_i \log\frac{\pi^*_i}{\mathcal{P}_i(\beta)} - y\,\pi^*_i - z\,i\,\pi^*_i.
\]


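Following standard Lagrangian arguments, we thus have

\[
\inf_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, \mathcal{P}(\beta))
= \inf_{\pi \in F(1, \omega, \beta)} \mathcal{L}(\pi, y, z)
\ \ge\ \inf_{\pi \in \mathbb{R}_+^{\infty}} \mathcal{L}(\pi, y, z)
= D(\pi^* \,\|\, \mathcal{P}(\beta)).
\]

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \,\|\, \mathcal{P}(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C\,\mathcal{P}_i(\rho\beta), & i > I.
\end{cases}
\]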

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)}
+ \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)\bigl(\log C + (1 - \rho)\beta\bigr)
+ \Bigl(\beta - \sum_{i=0}^{I} i\,\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (26). In the polynomial case, the cost is simply given by the first term in the above expression.
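The explicit cost formula can be checked against a direct evaluation of the relative entropy $D(\pi^* \,\|\, \mathcal{P}(\beta))$. The Python sketch below does this under the assumption that $(C, \rho)$ already satisfy the twist equations for the given $(\omega, \beta)$; the function names and the truncation at 400 terms are illustrative choices. Logarithms of Poisson probabilities are handled in the log domain to avoid underflow in the far tail.

```python
import math

def log_poisson(i, mu):
    """log P_i(mu) for the Poisson distribution with mean mu."""
    return -mu + i * math.log(mu) - math.lgamma(i + 1)

def explicit_cost(omega, beta, C, rho):
    """Closed-form cost of an exponential extremal (C > 0)."""
    mass = 1.0 - sum(omega)
    mean = beta - sum(i * w for i, w in enumerate(omega))
    head = sum(w * (math.log(w) - log_poisson(i, beta))
               for i, w in enumerate(omega) if w > 0.0)
    return head + mass * (math.log(C) + (1.0 - rho) * beta) + mean * math.log(rho)

def relative_entropy_cost(omega, beta, C, rho, n_terms=400):
    """D(pi* || P(beta)) with pi*_i = omega_i for i <= I and C*P_i(rho*beta) for i > I."""
    I = len(omega) - 1
    total = sum(w * (math.log(w) - log_poisson(i, beta))
                for i, w in enumerate(omega) if w > 0.0)
    for i in range(I + 1, I + 1 + n_terms):
        log_p = math.log(C) + log_poisson(i, rho * beta)
        total += math.exp(log_p) * (log_p - log_poisson(i, beta))
    return total
```

The two functions agree because, for $i > I$, $\log\bigl(C\,\mathcal{P}_i(\rho\beta)/\mathcal{P}_i(\beta)\bigr) = \log C + (1-\rho)\beta + i\log\rho$, which is linear in $i$.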

A7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k \beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k \beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A1 with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

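The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{(k)} \,\|\, \mathcal{P}(\beta))
\tag{A20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{kj} = \beta
\tag{A21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\,\pi_{k,i-k} \quad \text{for all } 0 \le i \le I.
\tag{A22}
\]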

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (24). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I,
\tag{A23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A20) subject to (A23) and the endpoint constraints (A22), in which case the conservation constraint (A21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A20) subject to constraints (A21) and (A22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A21) and (A22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch of the argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (23) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$ as necessary, until the constraint $\alpha_0 \pi_{01} + \alpha_1 \pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (23). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (24) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
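The ordered filling construction can be sketched in code. The Python function below is one possible greedy realization, not necessarily the allocation the authors have in mind: at each terminal level $i$ it draws mass from the lowest-indexed class first, whereas the proof allows any split among the classes $k \le i$. The function name `ordered_filling`, the dictionary-based inputs, and the assumption that every initial occupancy $k$ is at most $I$ are illustrative.

```python
def ordered_filling(alpha, omega):
    """Greedy feasible point for polynomial constraints.

    alpha -- dict {k: alpha_k} of initial occupancy fractions (alpha_k > 0, k <= I)
    omega -- list omega_0..omega_I of terminal fractions, summing to 1
    Returns {k: [pi_k0, pi_k1, ...]} with sum_k alpha_k * pi_{k,i-k} = omega_i.
    """
    I = len(omega) - 1
    remaining = dict(alpha)  # unallocated mass of each class, in units of urns
    pi = {k: [0.0] * (I - k + 1) for k in alpha}
    for i, need in enumerate(omega):
        for k in sorted(alpha):
            if k > i or need <= 0.0:
                break  # class k cannot end at level i < k, or level i is filled
            take = min(need, remaining[k])  # mass moved from class k to level i
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > 1e-12:
            raise ValueError("monotonicity condition violated at level %d" % i)
    return pi
```

For example, with $\alpha_0 = \alpha_1 = 1/2$ and $\omega = (0.2, 0.3, 0.5)$, the construction sends all of class 0 to levels 0 and 1 and all of class 1 to level 2. If the monotonicity condition fails (for instance $\omega_0 > \alpha_0$), the function raises an error rather than returning an infeasible point.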

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \Bigl\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k\,D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \le A \Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 143), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha, \omega, \beta) \doteq F(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\,\pi^{(n)}_{kj} \le \varepsilon
\tag{A24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x \log x + x - 1]$, we have the inequality

\[
xy \le (x \log x - x + 1) + (e^y - 1),
\]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta}\,\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\,\frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
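Both facts just stated are easy to confirm numerically. The Python sketch below checks the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ on sample points and compares a truncated sum of $e^{j/\delta}\mathcal{P}_j(\beta)$ with the closed form $e^{\beta e^{1/\delta} - \beta}$. The function names and the truncation length are assumptions made for illustration.

```python
import math

def fenchel_gap(x, y):
    """Right side minus left side of x*y <= (x log x - x + 1) + (e^y - 1)."""
    xlogx = x * math.log(x) if x > 0.0 else 0.0  # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, n_terms=400):
    """Truncated sum of e^{j/delta} * P_j(beta) over j = 0..n_terms-1."""
    return sum(
        math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
        for j in range(n_terms)
    )
```

The gap is zero exactly when $x = e^y$, which is the maximizer in the supremum defining $e^y - 1$.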


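Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/\mathcal{P}_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\,\pi^{(n)}_{kj}
&= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
&\le \sum_{j \ge 0} \delta \left[ \pi^{(n)}_{kj} \log\left(\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\right) - \pi^{(n)}_{kj} + \mathcal{P}_j(\beta) \right] + \delta \sum_{j \ge m} (e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
&= \delta\,D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1)\,\mathcal{P}_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]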

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 54, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k} \alpha_k \beta_k = \lim_{n} \sum_{k} \alpha_k \beta^{(n)}_k = \lim_{n} \beta = \beta,
\]

so that $\pi \in F(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k\,D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 143(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^*_{(k)} \,\|\, \mathcal{P}(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)
\]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).
\]

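Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{kj} \ \to\ \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{kj}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k \alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda \log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k \beta^{(n)}_k \to \alpha_k \beta_k$, even if $\alpha_k = 0$.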

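Thus

\[
\sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k \alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 143(b)), we have

\[
\begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)}))
&\ge \liminf_{n} \sum_{k : \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k\,D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}
\]

as desired.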

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A20) subject to constraints (A21) and (A22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

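Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{mn} > 0$, that $\tilde\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\,\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\tilde\pi_{kj},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha, \tilde\omega, \tilde\beta)$. Next, for any $\eta > 0$ we may define

\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1 - \eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \hat\omega, \hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha, \omega, \beta)$ satisfies all constraints in (23) and (24) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (23) and the conservation constraint (24) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \tilde\omega, \tilde\beta)$, and hence for $(\alpha, \hat\omega, \hat\beta)$.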


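We may now let $\hat\pi$ be a feasible point for $(\alpha, \hat\omega, \hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1 - \eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha, \omega, \beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon)
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left( \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} + 1 \right)(\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m \bar\pi_{mn} \log\frac{\pi^\varepsilon_{mn}}{\mathcal{P}_n(\beta)}
 + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \bar\pi_{kj} \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}
 - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{kj} \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k} \alpha_k\,D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj} \log \mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.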

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A9. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding equality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A20) may be considered to be a finite-dimensional problem over this set. By Lemma A8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
\mathcal{L}(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
+ \sum_{k \in \mathcal{K}} z_k \alpha_k \Bigl(1 - \sum_{k+l \in \mathcal{I}} \pi_{kl}\Bigr)
+ \sum_{i \in \mathcal{I}} w_i \Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\,\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $\mathcal{L}$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.

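We now give the corresponding theorem for the exponential case.

Theorem A10. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding strict inequality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k\,\mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise}.
\end{cases}
\]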

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[
\beta^{(M)} = \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M} l\,\pi^*_{kl},
\qquad
\eta^{(M)}_k = \sum_{l\le M}\pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k\in\mathcal{K}}\alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl} = \beta^{(M)},
\]
\[
\sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl} = \eta^{(M)}_k, \qquad k\in\mathcal{K},
\]
\[
\sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k} = \omega_i, \qquad i\in\mathcal{I}.
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} ={}& \sum_{k\in\mathcal{K}}\alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+ y^{(M)}\Bigl(\beta^{(M)} - \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl}\Bigr)\\
&+ \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl}\Bigr)
+ \sum_{i\in\mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\Bigr).
\end{aligned}
\]

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\,e^{jy^{(M)}-1+z^{(M)}_k+w^{(M)}_{k+j}}
\]

for all $j < M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta-1+z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k = \sum_j j\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} = \pi^*_{kj}$ and $\omega(k) = (\omega_{k0},\ldots,\omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega(k),\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega(k),\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega(k),\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i=0,\ldots,I,
\]
\[
\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\bar\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i}\alpha_k\bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

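In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\bar\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho_k\beta_k)\bigr)\Bigl(1-\frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i}\\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k}(W_{k+i}-1)P_i(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_kP_i(\rho\beta)W_{k+i}$ when $i\le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I}(W_i-1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\Bigl(1-\frac{x}{\beta}\Bigr)^{i-k}\Bigr] \qquad \text{(A.25)}\\
&= \alpha_0C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k}(W_{k+j}-1)P_j(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{j}\Bigr]\\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\bar\psi_{k0}(x).
\end{aligned}
\]

The second line uses the identity $P_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$ together with the change of index $j = i-k$.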

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\bar\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i}\alpha_k\frac{\beta_k}{\beta}\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\bar\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad \text{(A.26)}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1-\psi_I(x)} = \rho. \qquad \text{(A.27)}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

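In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\bar\psi_{k0}(x) \qquad \text{(A.28)}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \qquad \text{(A.29)}
\]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).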

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x\in[0,\beta)$, $i\le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K}\alpha_k\log|\psi_0^{(k)}(0)|\\
&= \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\times\bar\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^{j}e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K}\alpha_k D(\bar\gamma_{k\cdot}(\beta)\,\|\,P(\beta)).
\]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) = \{\gamma\in\mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\,\|\,\gamma)$, and hence omitted.

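Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[
G[\varepsilon] = J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x)\,\|\,\gamma^{\varepsilon}(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.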

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma)\le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma+\varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining

\[
\eta(x) = \tilde\psi(x)-\psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\psi+\varepsilon\eta,\ \theta-\varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x,\varepsilon)\,dx. \qquad \text{(A.30)}
\]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\bar\gamma_{k0}(x)$ and $\bar\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(x)\ge\alpha_i\bar\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon}\ge(1-\varepsilon)\gamma$ and $\theta^{\varepsilon}\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

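The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I}\frac{\partial L}{\partial\psi_i}\Big|_{\gamma^{\varepsilon}(x)}\eta_i(x) - \sum_{i=0}^{I}\frac{\partial L}{\partial\theta_i}\Big|_{\gamma^{\varepsilon}(x)}\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)}+\frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\]
\[
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)}-\log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon}\le 1$, $\theta^{\varepsilon}\le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\Bigl|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\Bigr| < B \qquad \text{for } \varepsilon\in[0,\varepsilon_0],\ \text{a.e. } x\in[0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.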

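We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta}\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx
\]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[
\int_0^{x}\frac{\partial L}{\partial\psi_i}\Big|_{\gamma}\,dx' = \int_0^{x}\Bigl(-\frac{\theta_i}{\gamma_i}+\frac{\theta_{i+1}}{\gamma_{i+1}}\Bigr)\Big|_{\gamma}\,dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 361, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^{\beta}\Bigl[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x)-\frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\Bigr]dx\\
&= \int_0^{\beta}\dot\eta(x)\cdot\Bigl(-\int_0^{x}\frac{\partial L}{\partial\psi}(x')\,dx'-\frac{\partial L}{\partial\theta}(x)\Bigr)dx\\
&= -\int_0^{\beta}\sum_{i=0}^{I}C_i\dot\eta_i(x)\,dx\\
&= \sum_{i=0}^{I}C_i(\eta_i(0)-\eta_i(\beta)) = 0.
\end{aligned}
\]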

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma+(1-\lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

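Given $x_1^{(n)}\downarrow 0$ and $x_2^{(n)}\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)})\le\int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha,\omega,\beta) &\le \liminf_n \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\tilde\gamma(x_2^{(n)}),x_2^{(n)}-x_1^{(n)}\bigr)\\
&= \liminf_n J(\gamma^{(n)})\\
&\le \lim_n\int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx\\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.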

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

in many applications and because it is a building block for the general case. In each case, we first give a characterization of $\mathcal{J}(\omega)$ as a minimal relative entropy, which may be computed explicitly in this form. We then give an explicit functional form for the sample path minimizer. The functional form contains parameters determined by the solutions to fixed point equations.

Before stating these theorems, it is helpful to specify the domain over which $\mathcal{J}(\omega)$ is finite. We define an endpoint constraint to be a triple $(\alpha,\omega,\beta)$, where $\alpha,\omega\in\mathcal{S}_I$ are the initial and terminal occupancies and where $\beta > 0$ is the number of balls thrown per urn. We assume without loss that $\alpha_0 > 0$, that is, that some fraction of urns are initially empty, and we denote the index set of positive initial occupancies by $\mathcal{K} = \{k : \alpha_k > 0\}$. Since we do not distinguish between urns having more than $I$ balls, we can always suppose that no urns initially have more than $I+1$ balls, and denote the last element of $\alpha$ by $\alpha_{I+1}$. The last element of $\omega$ is denoted $\omega_{I+}$, signifying that it collects all urns with occupancy greater than or equal to $I+1$.

Definition 2.1. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if the corresponding set of valid occupancy paths $\mathcal{A}(\alpha,\omega,\beta)$ is nonempty.

Lemma 2.4. An endpoint constraint $(\alpha,\omega,\beta)$ is feasible if and only if

\[
\sum_{j=0}^{i}\alpha_j\ge\sum_{j=0}^{i}\omega_j, \qquad i=0,\ldots,I \quad \text{(monotonicity)} \qquad \text{(2.3)}
\]

and

\[
\sum_{i=0}^{I}i\omega_i+(I+1)\omega_{I+}\le\sum_{k=0}^{I+1}k\alpha_k+\beta \quad \text{(conservation).} \qquad \text{(2.4)}
\]

This lemma is proved in the Appendix. Condition (2.3) relates to the fact that the $\psi_i(x)$ must decrease monotonically, and (2.4) to a conservation constraint for the number of balls thrown. The right-hand side of the inequality equals the initial number of balls per urn plus the additional balls per urn thrown up to time $\beta$, while the left-hand side is a lower bound on the number of balls per urn at time $\beta$.
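As a concrete illustration of the two conditions of Lemma 2.4, the following minimal sketch checks monotonicity and conservation for given vectors. The function name, the list layout (`alpha` holding $\alpha_0,\ldots,\alpha_{I+1}$ and `omega` holding $\omega_0,\ldots,\omega_I,\omega_{I+}$) and the tolerance are assumptions made for the example.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check conditions (2.3) and (2.4) for an endpoint constraint.

    alpha = [alpha_0, ..., alpha_{I+1}], omega = [omega_0, ..., omega_I, omega_{I+}].
    """
    I = len(omega) - 2
    # Monotonicity (2.3): partial sums of alpha dominate those of omega.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False
    # Conservation (2.4): minimum number of balls per urn at time beta
    # cannot exceed the initial balls per urn plus the beta thrown.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(k * alpha[k] for k in range(I + 2)) + beta
    return lhs <= rhs + tol

# Empty initial urns, I = 1: at least 0.3 + 2 * 0.2 = 0.7 balls per urn are needed.
print(is_feasible([1.0, 0.0, 0.0], [0.5, 0.3, 0.2], 1.0))
print(is_feasible([1.0, 0.0, 0.0], [0.5, 0.3, 0.2], 0.5))
```

In the second call only 0.5 balls per urn are thrown, fewer than the 0.7 required by the terminal occupancies, so conservation fails.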

In the treatment of general initial conditions, the following further definition will be useful.

Definition 2.2. A set of feasible constraints $(\alpha,\omega,\beta)$ is irreducible if the monotonicity conditions (2.3) hold with strict inequality for all $i < I$. Otherwise, the constraints are termed reducible.

For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i}\alpha_k = \sum_{k=0}^{i}\omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i+1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha,\omega,\beta)$ may be divided into a finite number of isolated irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.
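The first step of this decomposition, locating the first index at which the monotonicity condition is tight, can be sketched as follows. The function name and the floating-point tolerance are assumptions for illustration; the list layout matches $\alpha_0,\ldots,\alpha_{I+1}$ and $\omega_0,\ldots,\omega_I,\omega_{I+}$.

```python
def first_tight_index(alpha, omega, tol=1e-12):
    """Return the first i < I at which (2.3) holds with equality, or None.

    A return value of None means the constraints are irreducible in the
    sense of Definition 2.2 (feasibility is assumed, not checked).
    """
    I = len(omega) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if abs(cum_alpha - cum_omega) <= tol:
            return i
    return None

# With I = 2: half of the urns start empty and half end empty,
# so no ball is ever thrown into an empty urn and the problem splits at i = 0.
print(first_tight_index([0.5, 0.5, 0.0, 0.0], [0.5, 0.2, 0.2, 0.1]))
```

Applying the same test repeatedly to the remaining urns, after subtracting $i+1$ balls from each, would produce the chain of irreducible subproblems described above.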

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $\mathcal{F}(1,\omega,\beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0,\ldots,I$ and the constraint

\[
\sum_{i=0}^{\infty}i\pi_i = \beta.
\]

In the empty case, the conditions for feasibility of $(1,\omega,\beta)$ reduce to $\sum_{i=0}^{I}i\omega_i+(I+1)\omega_{I+}\le\beta$, from which it follows that the set $\mathcal{F}$ is nonempty if and only if $(1,\omega,\beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $\mathcal{A}(1,\omega,\beta)$ can be represented as an infimum of a relative entropy over distributions in $\mathcal{F}(1,\omega,\beta)$.

Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : \mathcal{S}_I\to\mathbb{R}_+$ defined in Corollary 2.3 may be determined as

\[
\mathcal{J}(\omega) = \min_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,P(\beta))
\]

if $(1,\omega,\beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^*\in\mathcal{F}(1,\omega,\beta)$ is unique.

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show in fact that the solution takes the form $\pi^*_i = CP_i(\rho\beta)$ for $i > I$, for some constants $C\ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty}i\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty}\pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the strict equality $\sum_{i=0}^{I}i\omega_i+(I+1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I+1$ if necessary to ensure that $\omega_{I+} = 1-\sum_{i=0}^{I}\omega_i = 0$. In this case it turns out that $\mathcal{F}(1,\omega,\beta)$ has only one element; we then have $C = 0$, and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[
\frac{\rho\beta-\sum_{i=0}^{I}iP_i(\rho\beta)}{1-\sum_{i=0}^{I}P_i(\rho\beta)} = \frac{\beta-\sum_{i=0}^{I}i\omega_i}{1-\sum_{i=0}^{I}\omega_i}. \qquad \text{(2.5)}
\]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I+1$. The left-hand side of the equation is the conditional mean $E[Y\mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0,\infty)$ to $(I+1,\infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[
C = \frac{1-\sum_{i=0}^{I}\omega_i}{1-\sum_{i=0}^{I}P_i(\rho\beta)} = \frac{\beta-\sum_{i=0}^{I}i\omega_i}{\rho\beta-\sum_{i=0}^{I}iP_i(\rho\beta)}. \qquad \text{(2.6)}
\]
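Since the conditional mean in (2.5) is strictly monotonic in $\rho$, the root can be located by bisection. The sketch below is one possible implementation under stated assumptions: the function names are hypothetical, the Poisson tail is summed directly over 200 terms (adequate only for moderate $\rho\beta$ and $I$), the lower bracket $10^{-6}$ is arbitrary, and strict inequality in (2.4) is assumed.

```python
import math

def tail_sums(mean, I, terms=200):
    """Return (sum_{i>I} P_i, sum_{i>I} i*P_i) for a Poisson(mean) law."""
    p = math.exp(-mean) * mean ** (I + 1) / math.factorial(I + 1)
    s0 = s1 = 0.0
    for i in range(I + 1, I + 1 + terms):
        s0 += p
        s1 += i * p
        p *= mean / (i + 1)  # P_{i+1} = P_i * mean / (i + 1)
    return s0, s1

def twist_parameters(omega, beta, iters=200):
    """Solve (2.5) for rho by bisection, then obtain C from (2.6).

    omega = [omega_0, ..., omega_I]; omega_{I+} is the remaining mass.
    """
    I = len(omega) - 1
    mass = 1.0 - sum(omega)
    balls = beta - sum(i * w for i, w in enumerate(omega))
    target = balls / mass  # right-hand side of (2.5), greater than I + 1

    def cond_mean(rho):
        s0, s1 = tail_sums(rho * beta, I)
        return s1 / s0  # E[Y | Y > I] for Y ~ Poisson(rho * beta)

    lo, hi = 1e-6, 1.0
    while cond_mean(hi) < target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = mass / tail_sums(rho * beta, I)[0]
    return rho, C

rho, C = twist_parameters([0.5, 0.3], 1.0)
```

A useful sanity check is that when $\omega_i = P_i(\beta)$ for $i\le I$, that is, when the terminal condition is the typical one, the solution is $\rho = 1$ and $C = 1$, so that no twisting is required.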

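Evaluating the relative entropy of $\pi^*$ and $P(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

\[
\mathcal{J}(\omega) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)} + \Bigl(1-\sum_{i=0}^{I}\omega_i\Bigr)(\log C+(1-\rho)\beta) + \Bigl(\beta-\sum_{i=0}^{I}i\omega_i\Bigr)\log\rho. \qquad \text{(2.7)}
\]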
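Formula (2.7) is straightforward to evaluate once the twist parameters are known. The following minimal sketch takes $\rho$ and $C$ as inputs rather than solving for them; the function name is hypothetical, and the conventions $0\log 0 = 0$ and $C = 0$ in the polynomial case (where the middle term vanishes) are handled explicitly.

```python
import math

def rate_function(omega, beta, rho, C):
    """Evaluate (2.7) for omega = [omega_0, ..., omega_I] and twist parameters rho, C."""
    total = 0.0
    for i, w in enumerate(omega):
        if w > 0.0:  # convention: 0 * log 0 = 0
            p = math.exp(-beta) * beta**i / math.factorial(i)
            total += w * math.log(w / p)
    mass = 1.0 - sum(omega)                                   # omega_{I+}
    balls = beta - sum(i * w for i, w in enumerate(omega))    # balls left for the tail
    if C > 0.0:  # exponential case; in the polynomial case mass = 0 and C = 0
        total += mass * (math.log(C) + (1.0 - rho) * beta)
    total += balls * math.log(rho)
    return total

# With the typical terminal condition omega_i = P_i(beta), rho = C = 1 and the rate is zero.
beta = 1.5
omega = [math.exp(-beta) * beta**i / math.factorial(i) for i in range(3)]
print(rate_function(omega, beta, 1.0, 1.0))
```

The three terms correspond to the explicitly constrained occupancies, the normalization of the Poisson tail, and the tilt of that tail, respectively.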

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1,\omega,\beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $\mathcal{A}(1,\omega,\beta)$ is achieved on the occupancy path $\gamma\in\mathcal{A}(1,\omega,\beta)$ defined by

\[
\gamma_0(x) = Ce^{-\rho x}+\sum_{k=0}^{I}(\omega_k-CP_k(\rho\beta))\Bigl(1-\frac{x}{\beta}\Bigr)^{k}, \qquad \text{(2.8)}
\]
\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\gamma_0^{(i)}(x), \qquad 1\le i\le I, \qquad \text{(2.9)}
\]
\[
\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x),
\]

where $\rho > 0$ and $C\ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$, $\gamma_0(x+y) = \gamma_0(x)+y\gamma_0^{(1)}(x)+\cdots$, evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term and we say that $\gamma$ is a polynomial extremal. Otherwise, $C > 0$ and we have an exponential extremal.
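The path in (2.8) and (2.9) can be evaluated in closed form because the derivatives of $\gamma_0$ are explicit. The sketch below does this directly; the function name and argument layout are assumptions, and the twist parameters $\rho$ and $C$ are taken as given rather than solved for.

```python
import math

def extremal_path(x, omega, beta, rho, C):
    """Return [gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)] from (2.8)-(2.9).

    omega = [omega_0, ..., omega_I]; rho and C are assumed to satisfy (2.5)-(2.6).
    """
    I = len(omega) - 1
    m = rho * beta
    pois = [math.exp(-m) * m**k / math.factorial(k) for k in range(I + 1)]
    coef = [omega[k] - C * pois[k] for k in range(I + 1)]
    u = 1.0 - x / beta
    gammas = []
    for i in range(I + 1):
        # i-th derivative of gamma_0 at x: exponential part plus polynomial part.
        d = C * (-rho) ** i * math.exp(-rho * x)
        for k in range(i, I + 1):
            falling = math.factorial(k) // math.factorial(k - i)  # k! / (k - i)!
            d += coef[k] * falling * (-1.0 / beta) ** i * u ** (k - i)
        gammas.append(x**i / math.factorial(i) * (-1) ** i * d)
    gammas.append(1.0 - sum(gammas))
    return gammas

# Typical case: omega_i = P_i(beta) with rho = C = 1.
beta = 1.5
omega = [math.exp(-beta) * beta**i / math.factorial(i) for i in range(3)]
print(extremal_path(0.7, omega, beta, 1.0, 1.0))
```

In this typical case the polynomial coefficients vanish and the path reduces to $\gamma_i(x) = x^i e^{-x}/i!$, the Poisson law with mean $x$, which is the law of large numbers trajectory. For general constraints, evaluating the function at $x = \beta$ returns the terminal occupancies $\omega_i$, and at $x = 0$ it returns the empty initial condition.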

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k+j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k+j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $\mathcal{S}_\infty$ denote the set of distributions on the nonnegative integers, and let $\mathcal{S}^n_\infty$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $\mathcal{S}^{|\mathcal{K}|}_\infty$ by $\pi = \{\pi(0),\ldots,\pi(k),\ldots,\pi(K)\}$, where for any $k\in\mathcal{K}$ a component distribution is denoted $\pi(k) = \{\pi_{k0},\pi_{k1},\ldots\}$, and it is understood that the corresponding $\pi(k)$ is omitted if $\alpha_k = 0$. Finally, let $\mathcal{F}(\alpha,\omega,\beta)$ be the set of $\pi\in\mathcal{S}^{|\mathcal{K}|}_\infty$ which satisfy the terminal constraints

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} \qquad \text{for all } 0\le i\le I, \qquad \text{(2.10)}
\]

along with the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}j\pi_{kj} = \beta. \qquad \text{(2.11)}
\]

As in the empty case, it may be established that $\mathcal{F}$ is nonempty if and only if $(\alpha,\omega,\beta)$ is feasible.

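Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : \mathcal{S}_I\to\mathbb{R}_+$ defined in Corollary 2.3 may be expressed

\[
\mathcal{J}(\omega) = \min_{\pi\in\mathcal{F}(\alpha,\omega,\beta)}\sum_{k\in\mathcal{K}}\alpha_kD(\pi(k)\,\|\,P(\beta))
\]

whenever $(\alpha,\omega,\beta)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^*\in\mathcal{F}(\alpha,\omega,\beta)$ is unique.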

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_kP_j(\rho\beta)W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\\
C_kP_j(\rho\beta), & k\in\mathcal{K},\ k+j > I.
\end{cases}
\]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[
\pi^*_{kj} =
\begin{cases}
D_kP_j(\beta)W_{k+j}, & k\in\mathcal{K},\ j+k\le I,\\
0, & k+j > I.
\end{cases}
\]

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k\in\mathcal{K}$, denote the mean of $\pi^*(k)$ by $\beta_k = \sum_{j=0}^{\infty}j\pi^*_{kj}$, and define the terminal condition $\omega(k)\in\mathcal{S}_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0,\ldots,I-k$. Then the constraints $(1,\omega(k),\beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma(k) = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha,\omega,\beta)$, let $\pi^*\in\mathcal{S}^{|\mathcal{K}|}_\infty$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0,\beta_k]\to[0,1]$ be the minimizing paths corresponding to the subproblems $(1,\omega(k),\beta_k)$. The infimum of $J(\gamma)$ over $\mathcal{A}(\alpha,\omega,\beta)$ is achieved on the occupancy path $\gamma\in\mathcal{A}(\alpha,\omega,\beta)$ defined by

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I,
\]
\[
\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x).
\]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define

\[ H(\gamma,\zeta) = \log(E[\exp(\zeta \cdot b_i(\gamma))]), \]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma,\zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma,\zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1, 2, \dots\}$ satisfies a large deviation upper bound with a rate function $\bar{J}$, which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma,\zeta)$ in $\zeta$:

\[ L(\gamma,\eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma,\zeta)]. \]

If $\{\gamma(x), 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then

\[ \bar{J}(\gamma) = \int_0^\beta L(\gamma(x), \dot{\gamma}(x))\,dx. \]

If $\gamma$ is not absolutely continuous, then $\bar{J}(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma,\zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma,\zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar{J} = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma,\eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[ L(\gamma, M\theta) = D(\theta \| \gamma). \]


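It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma,\eta) < \infty$, then $\eta$ can be written as a convex combination of the form

\[ \theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I), \]

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \dots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

\begin{align*}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} \bigl[\zeta \cdot M\theta - \log(E[\exp(\zeta \cdot b_i(\gamma))])\bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Bigl[ M^T\zeta \cdot \theta - \log\Bigl( \Bigl[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Bigr] + \gamma_{I+1} \Bigr) \Bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Bigl[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\Bigl( \Bigl[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Bigr] + \gamma_{I+1} \Bigr) \Bigr].
\end{align*}

Given any values $\mu_0, \dots, \mu_{I+1}$, we can define $\zeta_0, \dots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions, it is apparent that the last display is equal to

\begin{align*}
&\sup_{\mu \in \mathbb{R}^{I+2}} \Bigl[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\Bigl( \Bigl[\sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1})\Bigr] + \gamma_{I+1} \Bigr) \Bigr] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \Bigl[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\Bigl( \sum_{j=0}^{I+1} \gamma_j \exp \mu_j \Bigr) \Bigr] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \Bigl[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\Bigl( \sum_{j=0}^{I+1} \gamma_j \exp \mu_j \Bigr) \Bigr].
\end{align*}

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \| \gamma)$, thereby completing the proof of the upper bound.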
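The final identity can be checked numerically. The sketch below is not part of the proof: it evaluates the objective $\mu \cdot \theta - \log \sum_j \gamma_j e^{\mu_j}$ at the candidate maximizer $\mu_j = \log(\theta_j/\gamma_j)$ and at random points, for two probability vectors `theta` and `gamma` that are arbitrary illustrative choices.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors with gamma_j > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """mu . theta - log(sum_j gamma_j * exp(mu_j))."""
    inner = sum(m * t for m, t in zip(mu, theta))
    return inner - math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))

theta = [0.5, 0.3, 0.2]   # illustrative rate vector (assumption)
gamma = [0.2, 0.3, 0.5]   # illustrative occupancy vector (assumption)

# Candidate maximizer of the variational formula.
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
best = dv_objective(mu_star, theta, gamma)

# Random trial points should never beat the candidate maximizer.
rng = random.Random(0)
trials = [dv_objective([rng.uniform(-5, 5) for _ in theta], theta, gamma)
          for _ in range(1000)]
```

At `mu_star` the logarithmic term vanishes, so `best` coincides with `relative_entropy(theta, gamma)`, while every random trial stays at or below it.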

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that, given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

\[ \liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Bigl( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \Bigr) \ge -J(\gamma) - \varepsilon. \tag{3.1} \]

Of course, this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > bx^K$ for all $j = 0, 1, \dots, I, I+$ and $0 < x \le \beta$, and such that

\[ J(y) \le J(\gamma). \]

Consider the zero cost trajectory defined by

\[ \dot{z}(x) = Mz(x), \qquad z(0) = \gamma(0). \]

We have the following expression for $z_j$ when $j \le I$:

\[ z_j(x) = \Bigl[ \sum_{k=0}^{j} \gamma_k(0) \frac{x^{j-k}}{(j-k)!} \Bigr] e^{-x}. \tag{3.2} \]

It is easy to check from this explicit formula that $z_j(x) > bx^K$ for some $b > 0$, $K = I$ and all $j = 0, 1, \dots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0,1)$, let $y^\rho = \rho z + (1-\rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \| z) = 0$, we have

\begin{align*}
J(y^\rho) &= \int_0^\beta D\bigl(\rho z(x) + (1-\rho)\theta(x) \,\|\, \rho z(x) + (1-\rho)\gamma(x)\bigr)\,dx \\
&\le \rho \int_0^\beta D(z(x) \| z(x))\,dx + (1-\rho) \int_0^\beta D(\theta(x) \| \gamma(x))\,dx \\
&= (1-\rho) J(\gamma).
\end{align*}

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0,1)$.
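The closed form (3.2) can be compared against a direct numerical integration of the zero cost dynamics. The sketch below assumes that, for the components $j \le I$, the equation $\dot{z} = Mz$ reads $\dot{z}_0 = -z_0$ and $\dot{z}_j = z_{j-1} - z_j$, which is what the representation $M\theta = \sum_j \theta_j(e_{j+1} - e_j)$ gives. The initial vector and the helper names are illustrative.

```python
import math

def zero_cost_closed_form(gamma0, x):
    """z_j(x) from (3.2) for j = 0..I, given gamma0 = (gamma_0(0), ..., gamma_I(0))."""
    return [math.exp(-x) * sum(gamma0[k] * x ** (j - k) / math.factorial(j - k)
                               for k in range(j + 1))
            for j in range(len(gamma0))]

def zero_cost_rk4(gamma0, x_end, steps=2000):
    """Integrate z_0' = -z_0, z_j' = z_{j-1} - z_j with the classical RK4 scheme."""
    def rhs(z):
        return [-z[0]] + [z[j - 1] - z[j] for j in range(1, len(z))]
    z, h = list(gamma0), x_end / steps
    for _ in range(steps):
        k1 = rhs(z)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(z, k1)])
        k3 = rhs([a + 0.5 * h * b for a, b in zip(z, k2)])
        k4 = rhs([a + h * b for a, b in zip(z, k3)])
        z = [a + h * (p + 2 * q + 2 * r + s) / 6
             for a, p, q, r, s in zip(z, k1, k2, k3, k4)]
    return z

gamma0 = [0.5, 0.3, 0.2]            # illustrative initial occupancies for I = 2
closed = zero_cost_closed_form(gamma0, 2.0)
numeric = zero_cost_rk4(gamma0, 2.0)
```

With empty initial conditions, `gamma0 = [1, 0, 0]`, the closed form reduces to the Poisson probabilities $P_j(x)$.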

It follows that in proving the lower bound we can assume without loss of generality that, for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge bx^K$ for all $x \in [0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0,\beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

\[ h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else.} \end{cases} \]

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor/n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0,\beta]$), we have the inequality

\begin{align*}
P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma\bigr) + e^{-2n\varepsilon}
&\ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon} \\
&\ge E_{\Gamma^n(0)}\bigl(\exp -nh(\Gamma^n(\lfloor n\tau \rfloor/n))\bigr).
\end{align*}

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar{\Gamma}^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar{b}^n_i$:

\[ \bar{\Gamma}^n\Bigl(\frac{i+1}{n}\Bigr) = \bar{\Gamma}^n\Bigl(\frac{i}{n}\Bigr) + \frac{1}{n}\bar{b}^n_i, \qquad \bar{\Gamma}^n(0) = \Gamma^n(0). \]

Furthermore, the distribution of $\bar{b}^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar{\Gamma}^n(j/n), 0 \le j \le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$ and (without explicitly exhibiting all the dependencies) let $\bar{\mu}^n_i$ denote the (random) distribution of $\bar{b}^n_i$ given $\{\bar{\Gamma}^n(j/n), 0 \le j \le i\}$. We let $\bar{E}_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\[ -\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl( \exp -nh\Bigl( \Gamma^n\Bigl( \frac{\lfloor n\tau \rfloor}{n} \Bigr) \Bigr) \Bigr) = \inf \bar{E}_{\Gamma^n(0)}\Bigl[ h\Bigl( \bar{\Gamma}^n\Bigl( \frac{\lfloor n\tau \rfloor}{n} \Bigr) \Bigr) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D\bigl(\bar{\mu}^n_i \,\|\, \mu_{\bar{\Gamma}^n(i/n)}\bigr) \Bigr], \]

where the infimum is over all such processes $\bar{\Gamma}^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar{b}^n_i$. We can write $\gamma(\tau) - \gamma(0) = Mv\tau$ for some probability vector $v$. Define a process $\bar{b}^n_i$ as follows:

\[ \bar{b}^n_i = \begin{cases} e_j - e_{j-1}, & \text{if } \Bigl\lfloor \sum_{k=0}^{j-1} v_k\tau n \Bigr\rfloor \le i \le \Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor - 1 \text{ for } 0 \le j \le I, \\ 0, & \text{if } \Bigl\lfloor \sum_{k=0}^{I} v_k\tau n \Bigr\rfloor \le i \le \lfloor \tau n \rfloor - 1. \end{cases} \]

In other words, the control sequence $\bar{b}^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_1 - e_0$ for an amount of time $v_0\tau$, $e_2 - e_1$ for an amount of time $v_1\tau$ and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar{\mu}^n_i$ (the distribution of $\bar{b}^n_i$) concentrates its mass on $e_j - e_{j-1}$, the cost is

\[ D\bigl(\bar{\mu}^n_i \,\|\, \mu_{\bar{\Gamma}^n(i/n)}\bigr) = \log\Bigl( \frac{1}{\bar{\Gamma}^n_{j-1}(i/n)} \Bigr) = -\log\Bigl( \bar{\Gamma}^n_{j-1}\Bigl( \frac{i}{n} \Bigr) \Bigr). \tag{3.3} \]

The controlled process $\bar{\Gamma}^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar{\Gamma}^n_{j-1}((i+1)/n) - \bar{\Gamma}^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k\tau n \rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor - 1$,

\[ \bar{\Gamma}^n_{j-1}\Bigl( \frac{i}{n} \Bigr) \downarrow \bar{\Gamma}^n_{j-1}\Bigl( \frac{1}{n} \Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor \Bigr) \]

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k\tau n \rfloor$, it follows that

\[ \bar{\Gamma}^n_{j-1}\Bigl( \frac{1}{n} \Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor \Bigr) \to \gamma_{j-1}(\tau) \]

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \dots, I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar{\Gamma}^n$, which in turn provides a finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (independent of $\tau$) such that, whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$

\[ D\bigl(\bar{\mu}^n_i \,\|\, \mu_{\bar{\Gamma}^n(i/n)}\bigr) \le C_1[-\log \tau^K] \le -C_2 \log \tau. \]

In addition, as $n \to \infty$ and $\eta \to 0$,

\[ \bar{\Gamma}^n\Bigl( \frac{1}{n} \lfloor \tau n \rfloor \Bigr) \to \gamma(\tau). \]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[ -\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl( \exp -nh\Bigl( \Gamma^n\Bigl( \frac{\lfloor n\tau \rfloor}{n} \Bigr) \Bigr) \Bigr) \le -C_2\tau \log \tau. \]

We now choose $\tau > 0$ so that $-C_2\tau \log \tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0,\tau]$ w.p.1 if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[ \liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Bigl( \Bigl| \Gamma^n\Bigl( \frac{\lfloor n\tau \rfloor}{n} \Bigr) - \gamma\Bigl( \frac{\lfloor n\tau \rfloor}{n} \Bigr) \Bigr| < \sigma, \ \sup_{x \in [0,\tau]} |\Gamma^n(x) - \gamma(x)| < \delta \Bigr) \ge -\frac{\varepsilon}{2}. \]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau,\beta]$. Recall that when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology and, moreover, that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \dots, I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \dots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma$,

\[ \liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau \rfloor/n), \lfloor n\tau \rfloor/n}\Bigl( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \Bigr) \ge -J(\gamma) - \frac{\varepsilon}{2}, \]

where $P_{\Gamma^n(\lfloor n\tau \rfloor/n), \lfloor n\tau \rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau \rfloor/n)$ at time $\lfloor n\tau \rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[ \mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega). \]

Using Theorem 2.7, we can write

\begin{align*}
\mathcal{J}(\Omega) &= \inf_{\omega \in \Omega} \ \inf_{\pi \in F(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \| P(\beta)) \\
&= \inf_{\pi \in F(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \| P(\beta)), \tag{4.1}
\end{align*}

where we have abused notation to define $F(\alpha,\Omega,\beta) = \bigcup_{\omega \in \Omega} F(\alpha,\omega,\beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar{\Omega}$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega \in \Omega^\circ} \mathcal{J}(\omega) = \inf_{\omega \in \bar{\Omega}} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n} \log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.

4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns or, in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[ \gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}. \]

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

\[ \rho(1 - \omega_0) = 1 - e^{-\beta\rho}. \]

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
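Since $f(\rho) = 1 - e^{-\beta\rho} - \rho(1 - \omega_0)$ is concave with $f(0) = 0$, its positive root can be bracketed and found by bisection. The following is a minimal sketch under the assumption $0 < 1 - \omega_0 < \beta$ (which guarantees that a positive root exists); the function names and the values $\omega_0 = 0.15$, $\beta = 3$ are illustrative.

```python
import math

def solve_rho(omega0, beta, iterations=200):
    """Positive root of rho*(1 - omega0) = 1 - exp(-beta*rho), found by bisection.

    Assumes 0 < 1 - omega0 < beta, so that a positive root exists.
    """
    def f(rho):
        return 1.0 - math.exp(-beta * rho) - rho * (1.0 - omega0)
    lo, hi = 1e-12, 1.0 / (1.0 - omega0) + 1.0   # f(lo) > 0 > f(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Fraction of empty urns along the extremal path."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

beta, omega0 = 3.0, 0.15     # illustrative terminal constraint
rho = solve_rho(omega0, beta)
```

With these values the path starts at `gamma0(0, rho) = 1` and ends at `gamma0(beta, rho) = omega0`, and the twist parameter satisfies $\rho > 1$ because more urns than expected remain empty.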

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[ w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n) \]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

\[ w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\Gamma_i(r/n). \]
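The identity for $w(r)$ can be confirmed on a simulated throw. In the sketch below, the urn count, ball count, capacity and seed are arbitrary illustrative values; `urns_with[i]` plays the role of $n\Gamma_i(r/n)$ in the infinite capacity system.

```python
import random
from collections import Counter

def overflow_counts(n, r, capacity, seed=0):
    """Throw r balls uniformly into n urns; return (direct, via_formula) overflow counts."""
    rng = random.Random(seed)
    balls = Counter(rng.randrange(n) for _ in range(r))
    # Direct count: balls beyond the capacity of each urn fall to the floor.
    direct = sum(max(c - capacity, 0) for c in balls.values())
    # Number of urns holding exactly i balls when capacity is unlimited.
    urns_with = Counter(balls.values())
    urns_with[0] = n - len(balls)
    via_formula = r - n * capacity + sum((capacity - i) * urns_with[i]
                                         for i in range(capacity + 1))
    return direct, via_formula

direct, via_formula = overflow_counts(n=1000, r=3000, capacity=3)
```

The two counts agree for every realization, since the formula is an exact rearrangement of the direct count rather than an approximation.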

In order to compute

\[ J_O(\eta,\beta) = -\lim_{n} \frac{1}{n} \log P\Bigl( \frac{w(\beta n)}{n} > \eta \Bigr), \]

we therefore consider sample paths which satisfy the end constraint

\[ \sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2} \]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

\[ \sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad \text{and} \qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3} \]

We introduce the Lagrangian

\[ L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log \frac{\pi_i}{P_i(\beta)} + y\Bigl( 1 - \sum_{i=0}^{\infty} \pi_i \Bigr) + z\Bigl( \beta - \sum_{i=0}^{\infty} i\pi_i \Bigr) + \lambda\Bigl( \zeta - \sum_{i=0}^{I} (I - i)\pi_i \Bigr) \]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[ \log \pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+\lambda. \]

In terms of variables $C$, $\rho$ and $\nu$, we may write

\[ \pi^*_i = C P_i(\rho\beta) \qquad \text{for } i \ge I \]

and

\[ \pi^*_i = C P_i(\rho\beta)\nu^{I-i} \qquad \text{for } i \le I. \]

The distribution is conditionally Poisson $\rho\beta/\nu$ for $i \le I$ and conditionally Poisson $\rho\beta$ for $i \ge I$. For convenience, we introduce the notation

\[ Q_I(\rho) = \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} iP_i(\rho\beta), \]

\[ R_I(\rho,\nu) = \nu^I e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\Bigl( \frac{\rho\beta}{\nu} \Bigr) = \frac{\nu^{I+1}}{\rho\beta} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} iP_i\Bigl( \frac{\rho\beta}{\nu} \Bigr). \]

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[ C(R_I(\rho,\nu) + Q_I(\rho)) = 1, \]

\[ C\Bigl( \frac{\rho\beta}{\nu} R_I(\rho,\nu) + \rho\beta Q_I(\rho) \Bigr) = \beta, \]

\[ C\Bigl( IP_I(\rho\beta) + \Bigl( I - \frac{\rho\beta}{\nu} \Bigr) R_I(\rho,\nu) \Bigr) = \zeta. \]

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \| P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[ (\rho/\nu - 1)R_I(\rho,\nu) + (\rho - 1)Q_I(\rho) = 0, \]

\[ (I - \rho\beta/\nu - \zeta)R_I(\rho,\nu) - \zeta Q_I(\rho) + IP_I(\rho\beta) = 0. \]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed as

\begin{align*}
J_O(\eta,\beta) &= D(\pi^* \| P(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl( C e^{\beta - \rho\beta} \rho^i \nu^{(I-i)^+} \bigr) \\
&= \log C + \beta(1 - \rho) + \beta \log \rho + \zeta \log \nu.
\end{align*}
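One way to check the closed form for the exponent without solving the nonlinear system is to run the parametrization backwards. The sketch below is an illustrative device, not the solution method described in the text: it fixes the Poisson parameter $m = \rho\beta$ and the twist $\nu$, builds the tilted distribution, and then reads off the $\beta$ and $\zeta$ for which this distribution satisfies (4.3). The truncation of the infinite sums at `n_terms` and the numerical values are assumptions.

```python
import math

def log_poisson(i, mean):
    """Logarithm of the Poisson probability P_i(mean)."""
    return -mean + i * math.log(mean) - math.lgamma(i + 1)

def tilted_distribution(m, nu, I, n_terms=120):
    """pi_i = C * P_i(m) * nu**max(I - i, 0), truncated to i < n_terms (assumption)."""
    logw = [log_poisson(i, m) + max(I - i, 0) * math.log(nu) for i in range(n_terms)]
    C = 1.0 / sum(math.exp(v) for v in logw)
    return C, [C * math.exp(v) for v in logw]

I, m, nu = 3, 3.6, 1.5       # illustrative choices; m plays the role of rho*beta
C, pi = tilted_distribution(m, nu, I)

# Read off the constraint values in (4.3) that this distribution satisfies.
beta = sum(i * p for i, p in enumerate(pi))
zeta = sum(max(I - i, 0) * p for i, p in enumerate(pi))
rho = m / beta

# Relative entropy computed term by term, and via the closed form.
direct = sum(p * (math.log(p) - log_poisson(i, beta)) for i, p in enumerate(pi))
closed = math.log(C) + beta * (1 - rho) + beta * math.log(rho) + zeta * math.log(nu)
```

By construction $(C, \rho, \nu)$ solves the three constraint equations for the computed pair $(\beta, \zeta)$, so `direct` and `closed` should agree up to rounding.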

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[ J_C(\alpha,\beta,\xi) = -\lim_{n} \frac{1}{n} \log P\Bigl( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \Bigr), \]

where

\[ \xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta) \]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7 subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[ \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi, \]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[ \omega_{kj} = \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W, & j + k \le I, \\ C_k P_j(\rho\beta), & j + k > I. \end{cases} \]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[ J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log \rho) + \xi \log W + \sum_{k=0}^{K} \alpha_k \log C_k. \]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
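The computation behind this example can be sketched with two nested bisections. The code below is one possible numerical scheme, not necessarily the one used to produce Figure 2. It assumes that the conservation constraint takes the form $\sum_k \alpha_k \sum_j j\pi_{kj} = \beta$, truncates the Poisson sums at `N_TERMS`, and relies on the mean being increasing in $m = \rho\beta$ for fixed $W$ and on the low occupancy mass being increasing in $W$ along the resulting curve. All function names are hypothetical.

```python
import math

ALPHA, BETA, I, XI = [0.5, 0.3, 0.2], 2.0, 3, 0.55   # values of the Figure 2 example
N_TERMS = 60                                          # truncation of Poisson sums (assumption)

def poisson(j, mean):
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def tilted(m, W):
    """Rows pi[k][j] = C_k P_j(m) W^{1[j+k<=I]}, the C_k, the mean and the low-occupancy mass."""
    base = [poisson(j, m) for j in range(N_TERMS)]
    C, pi = [], []
    for k in range(len(ALPHA)):
        w = [p * (W if j + k <= I else 1.0) for j, p in enumerate(base)]
        C.append(1.0 / sum(w))
        pi.append([C[k] * v for v in w])
    mean = sum(a * sum(j * p for j, p in enumerate(row)) for a, row in zip(ALPHA, pi))
    low = sum(a * sum(row[: I - k + 1]) for k, (a, row) in enumerate(zip(ALPHA, pi)))
    return pi, C, mean, low

def bisect(f, lo, hi, iterations=60):
    """Root of an increasing function f on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def m_given_W(W):
    # Inner solve: pick m = rho*beta so that the mean number of added balls is beta.
    return bisect(lambda m: tilted(m, W)[2] - BETA, 1e-3, 10.0)

# Zero-cost value psi_3(2.0), obtained with W = 1 and rho = 1.
psi3_zero_cost = tilted(BETA, 1.0)[3]

# Outer solve: pick W so that the low-occupancy mass equals xi.
W = bisect(lambda w: tilted(m_given_W(w), w)[3] - XI, 1e-6, 1.0)
m = m_given_W(W)
pi, C, mean, low = tilted(m, W)
rho = m / BETA

J_closed = BETA * (1 - rho + math.log(rho)) + XI * math.log(W) + sum(
    a * math.log(c) for a, c in zip(ALPHA, C))
J_direct = sum(a * sum(p * math.log(p / poisson(j, BETA)) for j, p in enumerate(row))
               for a, row in zip(ALPHA, pi))
```

Here `psi3_zero_cost` reproduces the zero-cost value $\psi_3(2.0) \approx 0.71$, and `J_closed` can be compared with the exponent quoted in the text; `J_direct` evaluates the weighted relative entropy term by term as a consistency check on the closed form.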

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \dots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

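A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

\[ \sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \dots, I \qquad \text{(monotonicity)} \tag{A.4} \]

and

\[ \sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \qquad \text{(conservation).} \tag{A.5} \]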

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0,\beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\begin{align*}
\sum_{i=0}^{I} \Bigl( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \Bigr) &= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j). \tag{A.6}
\end{align*}

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[ \gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \dots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta} \]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot{\psi}_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot{\psi}_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot{\psi}_i \le 1$ follows from (A.5).
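The two feasibility conditions translate directly into a short check. The sketch below assumes that `alpha` lists $(\alpha_0, \dots, \alpha_{I+1})$ and `omega` lists $(\omega_0, \dots, \omega_I, \omega_{I+})$; the function names and the tolerance are illustrative.

```python
import math

def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check (A.4) and (A.5) for alpha = (alpha_0..alpha_{I+1}), omega = (omega_0..omega_I, omega_{I+})."""
    I = len(omega) - 2
    # (A.4): cumulative occupancies can only decrease as balls are added.
    monotone = all(sum(alpha[: i + 1]) >= sum(omega[: i + 1]) - tol
                   for i in range(I + 1))
    # (A.5): the balls accounted for at the end cannot exceed those available.
    balls_end = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    balls_start = sum(i * a for i, a in enumerate(alpha))
    return monotone and balls_end <= balls_start + beta + tol

def linear_path(alpha, omega, beta, x):
    """The linear interpolation used in the proof, evaluated at 0 <= x <= beta."""
    return [a + (w - a) * x / beta for a, w in zip(alpha, omega)]

# Empty urns, I = 1, terminal point equal to the Poisson(1) probabilities.
p0 = math.exp(-1.0)
feasible = is_feasible([1.0, 0.0, 0.0], [p0, p0, 1.0 - 2.0 * p0], beta=1.0)
# Every urn ending with two or more balls needs at least two balls per urn.
too_few_balls = is_feasible([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], beta=1.0)
# The fraction of empty urns cannot increase.
not_monotone = is_feasible([0.5, 0.5, 0.0], [0.6, 0.2, 0.2], beta=5.0)
```

In these three cases only the first set of constraints is feasible: the second violates conservation and the third violates monotonicity.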

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot{\gamma}$, $\dot{\psi}$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot{\psi}_i = -\theta_i$ for $i = 0, \dots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot{\psi}$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[ J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx. \]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\begin{align*}
L(\psi,\theta) &= D(\theta \| \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log \frac{\theta_i}{\gamma_i} + \theta_{I+} \log \frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log \frac{\theta_i}{\psi_i - \psi_{i-1}} + \Bigl( 1 - \sum_{i=0}^{I} \theta_i \Bigr) \log \frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{align*}

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.
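For $I = 0$ the cost functional is easy to evaluate numerically, which gives a concrete picture of the minimization. The sketch below uses a midpoint rule to compare the cost of the exponential extremal of the classical occupancy problem with that of the straight line joining the same endpoints. The terminal constraint ($\omega_0 = 0.15$, $\beta = 3$), the step count and the helper names are illustrative assumptions.

```python
import math

def local_rate(psi0, theta0):
    """L(psi, theta) = D(theta || gamma) for I = 0, where gamma_0 = psi_0."""
    def term(t, g):
        return 0.0 if t == 0.0 else t * math.log(t / g)
    return term(theta0, psi0) + term(1.0 - theta0, 1.0 - psi0)

def path_cost(psi0, theta0, beta, steps=20000):
    """Midpoint-rule approximation of J(psi) = int_0^beta L(psi(x), theta(x)) dx."""
    h = beta / steps
    return h * sum(local_rate(psi0((s + 0.5) * h), theta0((s + 0.5) * h))
                   for s in range(steps))

beta, omega0 = 3.0, 0.15          # illustrative terminal constraint (assumption)

# Twist parameter: positive root of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection.
lo, hi = 1e-12, 1.0 / (1.0 - omega0) + 1.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if 1.0 - math.exp(-beta * mid) > mid * (1.0 - omega0):
        lo = mid
    else:
        hi = mid
rho = 0.5 * (lo + hi)

# Exponential extremal and its rate theta_0 = -d(psi_0)/dx.
cost_extremal = path_cost(lambda x: math.exp(-rho * x) / rho + 1.0 - 1.0 / rho,
                          lambda x: math.exp(-rho * x), beta)
# Straight line from psi_0(0) = 1 to psi_0(beta) = omega0, for comparison.
slope = (1.0 - omega0) / beta
cost_linear = path_cost(lambda x: 1.0 - slope * x, lambda x: slope, beta)
```

The straight line is a valid occupancy path meeting the same constraints, but it is far more expensive than the extremal, mainly because it forces a low filling rate while almost all urns are still empty.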

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[ \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx} \frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x)) \]

for all $i \in \{0, \dots, I\}$, $x \in (0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections, we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

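In the case at hand, the Euler–Lagrange equations are given by

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl( -\log \frac{\theta_i}{\psi_i - \psi_{i-1}} + \log \frac{\theta_{I+}}{1 - \psi_I} \Bigr) \tag{A.7} \]

for $i = 0, \dots, I-1$, and by

\[ -\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\Bigl( -\log \frac{\theta_I}{\psi_I - \psi_{I-1}} + \log \frac{\theta_{I+}}{1 - \psi_I} \Bigr). \tag{A.8} \]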

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

\[ L(\psi,\theta) = \sum_{i=0}^{I-1} \theta_i \log \frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9} \]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl( -\log \frac{\theta_i}{\psi_i - \psi_{i-1}} \Bigr) \tag{A.10} \]

for $i = 0, \dots, I-1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[ \sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} CP_i(\rho\beta) = 1, \]

\[ \sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} iCP_i(\rho\beta) = \beta. \]

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[ \psi_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I} (\omega_k - CP_k(\rho\beta))\Bigl( 1 - \frac{x}{\beta} \Bigr)^k, \tag{A.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) \]

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[ \psi_0(x) = \sum_{k=0}^{I} \omega_k\Bigl( 1 - \frac{x}{\beta} \Bigr)^k. \tag{A.10} \]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.
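The construction in Theorem A.1 can be exercised numerically in the exponential case. The sketch below first solves the two twist equations by eliminating $C$: their ratio says that a Poisson variable with mean $\rho\beta$, conditioned to exceed $I$, has mean $(\beta - \sum_i i\omega_i)/(1 - \sum_i \omega_i)$, which is increasing in $\rho\beta$ and so can be matched by bisection. This elimination is a convenience of the sketch, not a method stated in the text. The derivatives of $\psi_0$ are then evaluated in closed form by differentiating (A.8) term by term. The terminal vector `omega` and all function names are illustrative.

```python
import math

def poisson(i, mean):
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

def twist_parameters(omega, beta):
    """Solve the two twist equations for (C, rho), given omega = (omega_0..omega_I).

    Assumes the exponential case (strict inequality in the conservation condition).
    """
    I = len(omega) - 1
    mass_left = 1.0 - sum(omega)                               # mass on occupancies > I
    balls_left = beta - sum(i * w for i, w in enumerate(omega))
    target = balls_left / mass_left                            # conditional mean above I

    def tail(m):   # (sum_{i>I} P_i(m), sum_{i>I} i P_i(m))
        head = [poisson(i, m) for i in range(I + 1)]
        return 1.0 - sum(head), m - sum(i * p for i, p in enumerate(head))

    lo, hi = 1e-2, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        t0, t1 = tail(mid)
        if t1 / t0 < target:
            lo = mid
        else:
            hi = mid
    m = 0.5 * (lo + hi)
    return mass_left / tail(m)[0], m / beta

def signed_derivative(i, x, omega, beta, C, rho):
    """(-1)^i times the i-th derivative of psi_0 from (A.8), for 0 <= i <= I."""
    I = len(omega) - 1
    poly = sum((omega[k] - C * poisson(k, rho * beta)) * beta ** (-i)
               * math.factorial(k) / math.factorial(k - i) * (1.0 - x / beta) ** (k - i)
               for k in range(i, I + 1))
    return rho ** i * C * math.exp(-rho * x) + poly

def gamma(i, x, omega, beta, C, rho):
    """Occupancy gamma_i(x) from (A.9)."""
    return x ** i / math.factorial(i) * signed_derivative(i, x, omega, beta, C, rho)

omega, beta = [0.10, 0.20, 0.25], 3.0      # illustrative feasible constraint with I = 2
C, rho = twist_parameters(omega, beta)
```

With these parameters, `signed_derivative(0, 0.0, ...)` returns $\psi_0(0) = 1$, `signed_derivative(1, 0.0, ...)` returns $-\psi_0^{(1)}(0) = 1$, and `gamma(i, beta, ...)` recovers $\omega_i$ for each $i \le I$.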

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[ (-1)^i\varphi^{(i)}(x) \ge 0 \qquad \text{for all } x \in [a,b] \text{ and } i > 0. \tag{A.11} \]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0,\beta)$ and $i = 0, \dots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \dots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\,\beta^{-i}\,\frac{k!}{(k-i)!}\left(1 - \frac{x}{\beta}\right)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
\tag{A.12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[
\varphi(\beta) = (-1)^I \psi_0^{(I)}(\beta)
= \rho^I C e^{-\rho\beta} + \left(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\right)\beta^{-I} I!
= \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I \ge 0,
\]

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I > 0 \quad \text{for all } x \in [0, \beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\left(1 - \sum_{i=0}^{I} P_i(\rho\beta)\right) + \sum_{i=0}^{I}\omega_i = 1.
\]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I}\theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty} P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty} P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

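From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0.
\tag{A.15}
\]

Then

\[
\frac{d}{dx}\left[-\log\frac{\theta_i}{\gamma_i}\right]
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta),
\tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).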

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

\[
(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right].
\tag{A.17}
\]

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0, \beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i - \psi_{i-1}}
+ (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{(1 + \dot\psi_0 + \cdots + \dot\psi_I)}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]


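Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0}\int_{\varepsilon}^{\beta - \varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
+ \int_{\varepsilon}^{\beta - \varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta - \varepsilon}\bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\,\gamma_i(x)\,[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty}\Bigl[\gamma_i\log|\psi_0^{(i)}|\Bigr]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.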

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I-1,
\]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right].
\tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi\,\|\,P(\beta)).
\]

Proof. We first solve the minimization problem, and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi, y, z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\,\pi_i\right),
\]

where $z = \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i = \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1, with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[
\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - z i\pi_i \ \ge\ \pi^*_i\log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - z i\pi^*_i.
\]


Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in F(1, \omega, \beta)} D(\pi\,\|\,P(\beta)) &= \inf_{\pi \in F(1, \omega, \beta)} L(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
&= D(\pi^*\,\|\,P(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)}
+ \left(1 - \sum_{i=0}^{I}\omega_i\right)\bigl(\log C + (1 - \rho)\beta\bigr)
+ \left(\beta - \sum_{i=0}^{I} i\,\omega_i\right)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
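The agreement between the relative entropy form of the cost and the explicit formula can be checked numerically. The sketch below is illustrative only. It assumes that $P_i$ is the Poisson probability, uses made-up values of $\beta$, $\rho$ and $C$ with $I = 1$, and derives $\omega$ from the twist-parameter equations so that the endpoint is consistent. The function names are hypothetical.

```python
import math

def log_poisson_pmf(i, lam):
    """Log of the Poisson(lam) probability of the value i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def entropy_cost(omega, C, rho, beta, tail_terms=100):
    """D(pi* || P(beta)) with pi*_i = omega_i for i <= I and C*P_i(rho*beta) for i > I."""
    I = len(omega) - 1
    total = 0.0
    for i, w in enumerate(omega):
        if w > 0.0:                                   # convention 0 log 0 = 0
            total += w * (math.log(w) - log_poisson_pmf(i, beta))
    for i in range(I + 1, I + 1 + tail_terms):        # truncated tail sum
        log_p = math.log(C) + log_poisson_pmf(i, rho * beta)
        total += math.exp(log_p) * (log_p - log_poisson_pmf(i, beta))
    return total

def explicit_cost(omega, C, rho, beta):
    """Closed-form cost of an exponential extremal in terms of rho, omega, beta, C."""
    head = sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0.0)
    free_mass = 1.0 - sum(omega)
    free_mean = beta - sum(i * w for i, w in enumerate(omega))
    return head + free_mass * (math.log(C) + (1.0 - rho) * beta) + free_mean * math.log(rho)

# Illustrative data (assumed): choose beta, rho, C and derive omega for I = 1.
beta, rho, C = 1.0, 1.2, 0.9
lam = rho * beta
p0, p1 = math.exp(log_poisson_pmf(0, lam)), math.exp(log_poisson_pmf(1, lam))
omega1 = beta - C * (lam - p1)
omega0 = 1.0 - C * (1.0 - p0 - p1) - omega1
omega = [omega0, omega1]
```

Calling `entropy_cost(omega, C, rho, beta)` and `explicit_cost(omega, C, rho, beta)` should return the same nonnegative number up to rounding and truncation error.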

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}}\alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

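The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}}\ \sum_{k \in \mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)),
\tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\pi_{k,j} = \beta
\tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}}\alpha_k\,\pi_{k,i-k} \quad \text{for all } 0 \le i \le I.
\tag{A.22}
\]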

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$ as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{k,j} = 0, \qquad k + j > I,
\tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
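The ordered filling construction can be sketched in code. The proof does not fix the order in which classes contribute mass. The version below makes one particular choice, namely that the lowest nonexhausted class is always used first, and that choice is an assumption of this sketch. It takes polynomial constraints, with `alpha` and `omega` each summing to one, and raises an error when a monotonicity condition fails. The function name and tolerance are illustrative.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints.

    alpha[k]: initial fraction of urns holding k balls.
    omega[i]: required final fraction of urns holding i balls.
    Returns pi with pi[k][j] = fraction of class-k urns that gain j balls.
    """
    remaining = list(alpha)                          # unassigned mass per class
    flow = [[0.0] * len(omega) for _ in alpha]       # flow[k][i]: mass moved from level k to i
    k = 0
    for i, need in enumerate(omega):
        while need > tol:
            while k < len(alpha) and remaining[k] <= tol:
                k += 1                               # move on to the next class with mass left
            if k >= len(alpha) or k > i:
                # urns cannot lose balls, so level i can only be fed by classes k <= i
                raise ValueError("monotonicity condition violated at level %d" % i)
            moved = min(need, remaining[k])
            flow[k][i] += moved
            remaining[k] -= moved
            need -= moved
    pi = []
    for k, a in enumerate(alpha):
        row = [flow[k][k + j] / a if a > 0.0 else 0.0 for j in range(len(omega) - k)]
        pi.append(row)
    return pi
```

For example, with `alpha = [0.5, 0.5]` and `omega = [0.2, 0.3, 0.5]` the class-0 urns are split between final levels 0 and 1, and every class-1 urn gains exactly one ball.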

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i:\,P_i > Q_i}(P_i - Q_i) = \sum_{i:\,Q_i > P_i}(Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A = \left\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_{k}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le A\right\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[
Q_A(\alpha, \omega, \beta) \doteq F(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.
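The distance formula displayed above for two distributions on the nonnegative integers is easy to evaluate. A minimal sketch follows, assuming that the distributions are given as finite lists of probabilities, with missing entries treated as zero. The function name is illustrative.

```python
def distribution_distance(P, Q):
    """Sum of (P_i - Q_i) over the indices where P_i exceeds Q_i."""
    n = max(len(P), len(Q))
    P = list(P) + [0.0] * (n - len(P))   # pad with zeros to a common length
    Q = list(Q) + [0.0] * (n - len(Q))
    return sum(p - q for p, q in zip(P, Q) if p > q)
```

Because both arguments sum to one, swapping `P` and `Q` should give the same value, which matches the equality of the two sums in the display.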

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\,\pi^{(n)}_{k,j} \le \varepsilon
\tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta}P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
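Both facts used here can be checked numerically. The sketch below evaluates the gap in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ and compares a truncated version of the exponential moment series with its closed form. The function names and the truncation length are illustrative assumptions.

```python
import math

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, which is nonnegative for x >= 0 and real y."""
    entropy_part = x * math.log(x) - x + 1.0 if x > 0.0 else 1.0   # convention 0 log 0 = 0
    return entropy_part + math.exp(y) - 1.0 - x * y

def poisson_exponential_moment(beta, delta, terms=400):
    """Truncated sum of e^{j/delta} * P_j(beta) over j = 0, ..., terms - 1."""
    total = 0.0
    for j in range(terms):
        log_term = j / delta - beta + j * math.log(beta) - math.lgamma(j + 1)
        total += math.exp(log_term)
    return total
```

The gap vanishes exactly when $x = e^y$, which is the maximizer in the supremum, and `poisson_exponential_moment(beta, delta)` should agree with `math.exp(beta * math.exp(1 / delta) - beta)` when the truncation is long enough.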


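Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\,\pi^{(n)}_{k,j}
&= \sum_{j \ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j \ge 0}\delta\left[\pi^{(n)}_{k,j}\log\left(\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\right) - \pi^{(n)}_{k,j} + P_j(\beta)\right]
+ \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&\le \delta A + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small, and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]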

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k}\alpha_k\beta_k = \lim_{n}\sum_{k}\alpha_k\beta^{(n)}_k = \lim_{n}\beta = \beta,
\]

so that $\pi \in F(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G = \inf_{\pi \in Q_A}\sum_{k \in \mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}}\alpha_k D(\pi^{*(k)}\,\|\,P(\beta)) \le \liminf_{n}\sum_{k \in \mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) = G,
\]

where in fact we have equality, by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)
\]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible, with $0 < \beta < \infty$. Then

\[
\liminf_{n}\mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).
\]

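Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \ \to\ \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{k,j}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$. Thus

\[
\sum_{k \in \mathcal{K}}\beta_k\alpha_k = \lim_{n}\sum_{k}\beta^{(n)}_k\alpha^{(n)}_k = \lim_{n}\beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_{n}\sum_{k=0}^{K}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))
&\ge \liminf_{n}\sum_{k:\,\alpha_k > 0}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k > 0}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}
\]

as desired.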
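The formula $\beta - \lambda + \lambda\log(\lambda/\beta)$ used in the proof for the smallest relative entropy at a given mean can be illustrated numerically. It is a standard fact, not stated in the text, that the infimum is attained by the Poisson distribution with mean $\lambda$. The sketch below sums $D(P(\lambda)\,\|\,P(\beta))$ directly and compares it with the closed form. The function names and the truncation length are illustrative.

```python
import math

def log_poisson_pmf(i, lam):
    """Log of the Poisson(lam) probability of the value i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def poisson_relative_entropy(lam, beta, terms=200):
    """D(P(lam) || P(beta)) computed by direct (truncated) summation."""
    total = 0.0
    for i in range(terms):
        log_p = log_poisson_pmf(i, lam)
        total += math.exp(log_p) * (log_p - log_poisson_pmf(i, beta))
    return total

def minimal_entropy(lam, beta):
    """Closed form beta - lam + lam * log(lam / beta)."""
    return beta - lam + lam * math.log(lam / beta)

def point_mass_entropy(n, beta):
    """D(delta_n || P(beta)) for the point mass at the integer n, which has mean n."""
    return -log_poisson_pmf(n, beta)
```

For instance, with $\beta = 1$ the point mass at 2 has mean 2 and relative entropy $1 + \log 2$, which exceeds `minimal_entropy(2.0, 1.0)`. The closed form also shows why the term in the proof diverges: if $\alpha \to 0$ while $\alpha\lambda$ stays bounded away from zero, then $\alpha\lambda\log(\lambda/\beta) \to \infty$.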

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints, the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$ and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

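Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\hat\pi_{m,n} > 0$, that $\hat\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

\[
\hat\omega_i \doteq \sum_{k=0}^{i}\alpha_k\hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k \in \mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\hat\pi_{k,j},
\]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha, \hat\omega, \hat\beta)$. Next, for any $\eta > 0$ we may define

\[
\tilde\beta \doteq (\beta - \eta\hat\beta)/(1 - \eta), \qquad \tilde\omega \doteq (\omega - \eta\hat\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \tilde\omega, \tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha, \omega, \beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \hat\omega, \hat\beta)$, and hence for $(\alpha, \tilde\omega, \tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha, \tilde\omega, \tilde\beta)$, and finally form $\bar\pi = \eta\hat\pi + (1 - \eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha, \omega, \beta)$, and in addition $\bar\pi_{m,n} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\left(\log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)} + 1\right)(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k \in \mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}\,(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m\bar\pi_{m,n}\log\frac{\pi^\varepsilon_{m,n}}{P_n(\beta)}
+ \sum_{k \in \mathcal{K}}\alpha_k\sum_{(k,j) \ne (m,n)}\bar\pi_{k,j}\log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}
- \sum_{k \in \mathcal{K}}\alpha_k\sum_{j}\pi_{k,j}\log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\bar\pi_{k,j}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, since $\pi^\varepsilon_{m,n} = \varepsilon\bar\pi_{m,n} \to 0$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.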

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{k,j} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}}\alpha_k\sum_{k+j \in \mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{P_j(\beta)}
+ \sum_{k \in \mathcal{K}} z_k\alpha_k\left(1 - \sum_{k+l \in \mathcal{I}}\pi_{k,l}\right)
+ \sum_{i \in \mathcal{I}} w_i\left(\omega_i - \sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\right)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{k,j} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{k,j} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise}.
\end{cases}
\]

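Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} = \sum_{k \in \mathcal{K}}\alpha_k\sum_{l \le M} l\,\pi^*_{k,l}, \qquad \eta^{(M)}_k \doteq \sum_{l \le M}\pi^*_{k,l},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}}\alpha_k\sum_{j \le M,\ k+j \in \mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{P_j(\beta)}
\]

subject to the constraints

\[
\sum_{k \in \mathcal{K}}\alpha_k\sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{k,l} = \beta^{(M)},
\]

\[
\sum_{l \le M,\ k+l \in \mathcal{I}}\pi_{k,l} = \eta^{(M)}_k, \qquad k \in \mathcal{K},
\]

\[
\sum_{k \le i,\ k \in \mathcal{K}}\alpha_k\pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.
\]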

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} &= \sum_{k \in \mathcal{K}}\alpha_k\sum_{j \le M,\ k+j \in \mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{P_j(\beta)}
+ y^{(M)}\left(\beta^{(M)} - \sum_{k \in \mathcal{K}}\alpha_k\sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{k,l}\right) \\
&\quad + \sum_{k \in \mathcal{K}} z^{(M)}_k\alpha_k\left(\eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}}\pi_{k,l}\right)
+ \sum_{i \in \mathcal{I}} w^{(M)}_i\left(\omega_i - \sum_{k \le i,\ k \in \mathcal{K}}\alpha_k\pi_{k,i-k}\right).
\end{aligned}
\]

Since the derivative of $L^{(M)}$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

\[
\pi^*_{k,j} = P_j(\beta)\,e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\,\pi^*_{k,j}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{*(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} \doteq \pi^*_{k,j}$ and $\omega^{(k)} = (\omega_{k,0}, \ldots, \omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega^{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega^{(k)}, \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha, \omega, \beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega^{(k)}, \beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha, \omega, \beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of γi(x) for i gt I by using the extended defi-nition of γkj(x) used in the proof of Theorem A1 [see (A13)] Also forconvenience define γkj(x)

= γkj(xβkβ) and likewise define ψkj The ψi

inherit from the γi the relation

ψi(x) =isum

k=0

αkψkiminusk(x)

To see that the γi are valid occupancy curves note that

infinsum

i=0

γi(x) =infinsum

i=0

isum

k=0

αkγkiminusk(x) =Ksum

k=0

αk

infinsum

j=0

γkj(x) = 1

Also the minusψi(x) are all nonnegative on [0 β] and

infinsum

i=0

minusψi(x) =Ksum

k=0

αk

infinsum

j=0

minusψkj(x) =Ksum

k=0

αk(βkβ) = 1

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 49

where the last equality follows from the fact that the πlowastkj satisfy the conser-vation constraint (A21) Hence curves γi(x) i le I and γI+(x) satisfy theconditions of Lemma 21

It is clear that the γi satisfy the desired initial conditions since γk0(0) = 1and γkj(0) = 0 for j gt 0 The terminal conditions are guaranteed by the factthat the values γkj(βk) = πlowastkj satisfy the constraints (A22)

In order to establish that the curves satisfy the EulerndashLagrange equationswe first show that the rescaled zero-occupancy curve ψk0 for the kth sub-problem is proportional to the kth derivative of the overall zero-occupancycurve ψ0

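First, take the exponential case and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\begin{align*}
\tilde\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)
&= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho_k\beta_k)\bigr) \left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho\beta)\bigr) \left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{i} \right].
\end{align*}

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0 \tilde\psi_{00}$ is

\begin{align*}
(-1)^k \psi_0^{(k)}(x)
&= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \left(1 - \frac{x}{\beta}\right)^{i-k} \right] \tag{A.25} \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \tilde\psi_{k0}(x).
\end{align*}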

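Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, and writing $\tilde\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the rescaled curve, we have

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k0}^{(i-k)}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{align*}

Similarly, we obtain

\begin{align*}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{align*}

so that the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[ \frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27} \]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.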

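In the polynomial case, similar computations show that

\[ (-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\, \tilde\psi_{k0}(x), \tag{A.28} \]

where $\tilde\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$, and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x) \left( -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \right) \tag{A.29} \]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).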

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i \]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11 the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

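Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence, writing $\tilde\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ and $\tilde\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the rescaled subproblem curves, the cost is

\begin{align*}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(\beta) \log\bigl|\psi_0^{(i)}(\beta)\bigr| \right] - \sum_{k=0}^{K} \alpha_k \log\bigl|\psi_0^{(k)}(0)\bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta) \log\left| \frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}} \right|.
\end{align*}

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k0}(x)$, so that

\[ \frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j} \psi_{k0}^{(j)}\!\left(x\beta_k/\beta\right) e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k D\bigl(\tilde\gamma_{k\cdot}(\beta) \,\|\, \mathcal{P}(\beta)\bigr). \]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).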

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha, \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \| \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^\varepsilon \doteq (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[ G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma) \le J(\bar\gamma). \]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\,dx$. After defining

\[ \eta(x) \doteq \bar\psi(x) - \psi(x), \]

we may write

\[ G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta, \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x, \varepsilon)\,dx. \tag{A.30} \]

We can assume without loss that $G[1] = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example, with $\tilde\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) \ge \alpha_i \tilde\gamma_{i0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where the derivative $\bar\theta$ of the competing path may not exist.

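The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \psi_i}\right|_{\gamma^\varepsilon(x)} \eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \theta_i}\right|_{\gamma^\varepsilon(x)} \dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\[ \frac{\partial L}{\partial \psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \]

\[ \frac{\partial L}{\partial \theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}. \]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \right| < B \qquad \text{for } \varepsilon \in [0, \varepsilon_0], \text{ a.e. } x \in [0, \beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.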

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\,dx \]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[ \int_0^x \left.\frac{\partial L}{\partial \psi_i}\right|_{\gamma} dx' = \int_0^x \left.\left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\right|_{\gamma} dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\begin{align*}
G'_+[0] &= \int_0^\beta \left[ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right] dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x')\,dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that the extremal and the competing path have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1 - \lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x_1^{(n)})$ and terminal point $\bar\gamma(x_2^{(n)})$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx. \]

It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha, \omega, \beta) \\
&\le \liminf_n \mathcal{J}\bigl(\bar\gamma(x_1^{(n)}), \bar\gamma(x_2^{(n)}), x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx \\
&= J(\bar\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics I: General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

For a reducible set of constraints, let $i$ be the first index such that $\sum_{k=0}^{i} \alpha_k = \sum_{k=0}^{i} \omega_k$. This condition can only be met if no balls are ever thrown into $i$-occupied urns, and it follows that urns which initially contain $i$ balls or less will always contain $i$ balls or less. Thus these urns may be treated in isolation from the urns containing more than $i$ balls. Furthermore, by subtracting $i + 1$ balls from each urn in this latter set, we obtain another occupancy problem in standard form, that is, with $\alpha_0 > 0$. Continuing in this way, a given problem with constraints $(\alpha, \omega, \beta)$ may be divided into a finite number of isolated irreducible subproblems. It is only necessary, therefore, to treat problems with irreducible constraints, and wherever necessary we suppose this to be the case.

2.2.1. Empty initial conditions. An important special case of the initial constraint is when all urns are initially empty, that is, $\alpha_0 = 1$ and $\alpha_i = 0$ for $i > 0$. We abuse notation and denote this case by $\alpha = 1$.

Define the set $\mathcal{F}(1, \omega, \beta)$ to be the set of distributions $\pi$ on the nonnegative integers satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$ and the constraint

\[ \sum_{i=0}^{\infty} i\pi_i = \beta. \]

In the empty case the conditions for feasibility of $(1, \omega, \beta)$ reduce to $\sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} \le \beta$, from which it follows that the set $\mathcal{F}$ is nonempty if and only if $(1, \omega, \beta)$ is feasible. As we now discuss, the infimum of the rate function $J$ over paths $\mathcal{A}(1, \omega, \beta)$ can be represented as an infimum of a relative entropy over distributions in $\mathcal{F}(1, \omega, \beta)$.

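Theorem 2.5 (Terminal rate function, empty case). Given the initial condition $\alpha = 1$, the rate function $\mathcal{J}(\omega) : \mathcal{S}_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be determined as

\[ \mathcal{J}(\omega) = \min_{\pi \in \mathcal{F}(1, \omega, \beta)} D(\pi \,\|\, \mathcal{P}(\beta)) \]

if $(1, \omega, \beta)$ is feasible, and is otherwise infinite. The minimizing argument $\pi^* \in \mathcal{F}(1, \omega, \beta)$ is unique.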

The above expression is of interest in its own right. Moreover, the optimal solution $\pi^*$ can be computed explicitly. Using Lagrange multipliers, one can show, in fact, that the solution takes the form $\pi^*_i = C P_i(\rho\beta)$ for $i > I$, for some constants $C \ge 0$ and $\rho > 0$ that we refer to as twist parameters. Here $\rho$ is related to the Lagrange multiplier for the conservation constraint $\sum_{i=0}^{\infty} i\pi^*_i = \beta$, while $C$ is a normalization constant ensuring that $\sum_{i=0}^{\infty} \pi^*_i = 1$. These two constraints may be solved to determine $\rho$ and $C$. If we have the strict equality $\sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I + 1$, if necessary, to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I} \omega_i = 0$. In this case it turns out that $\mathcal{F}(1, \omega, \beta)$ has only one element; we then have $C = 0$ and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[ \frac{\rho\beta - \sum_{i=0}^{I} i P_i(\rho\beta)}{1 - \sum_{i=0}^{I} P_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{1 - \sum_{i=0}^{I} \omega_i}. \tag{2.5} \]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I + 1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0, \infty)$ to $(I + 1, \infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[ C = \frac{1 - \sum_{i=0}^{I} \omega_i}{1 - \sum_{i=0}^{I} P_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{\rho\beta - \sum_{i=0}^{I} i P_i(\rho\beta)}. \tag{2.6} \]

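Evaluating the relative entropy of $\pi^*$ and $\mathcal{P}(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

\[ \mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \left( 1 - \sum_{i=0}^{I} \omega_i \right) (\log C + (1 - \rho)\beta) + \left( \beta - \sum_{i=0}^{I} i\omega_i \right) \log\rho. \tag{2.7} \]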
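The recipe in (2.5)-(2.7) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the exponential case (strict inequality in the conservation condition), finds $\rho$ by plain bisection on the conditional mean, and sums the Poisson tail directly to avoid cancellation. The function names `poisson_tail`, `twist_parameters` and `rate_function` are hypothetical.

```python
import math

def poisson_tail(mean, I):
    """Return (sum_{i>I} P_i(mean), sum_{i>I} i*P_i(mean)) by direct summation."""
    i = I + 1
    term = math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))  # P_i(mean)
    mass, first = 0.0, 0.0
    for _ in range(100_000):
        mass += term
        first += i * term
        i += 1
        term *= mean / i  # P_i = P_{i-1} * mean / i
        if i > 2 * mean and term < 1e-17 * mass:
            break
    return mass, first

def twist_parameters(omega, beta):
    """Solve (2.5) for rho by bisection, then get C from the first form of (2.6)."""
    I = len(omega) - 1
    tail_mass = 1.0 - sum(omega)                                 # omega_{I+}
    tail_balls = beta - sum(i * w for i, w in enumerate(omega))
    if tail_mass <= 0 or tail_balls <= (I + 1) * tail_mass:
        raise ValueError("this sketch covers only the exponential case")
    target = tail_balls / tail_mass                              # right side of (2.5)

    def cond_mean(rho):                                          # E[Y | Y > I]
        mass, first = poisson_tail(rho * beta, I)
        return first / mass

    lo, hi = 1e-8, 1.0
    while cond_mean(hi) < target:                                # bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = tail_mass / poisson_tail(rho * beta, I)[0]
    return rho, C

def rate_function(omega, beta):
    """Evaluate (2.7) for terminal occupancies omega = (omega_0, ..., omega_I)."""
    rho, C = twist_parameters(omega, beta)
    head = sum(
        w * (math.log(w) - (-beta + i * math.log(beta) - math.lgamma(i + 1)))
        for i, w in enumerate(omega) if w > 0
    )
    tail_mass = 1.0 - sum(omega)
    tail_balls = beta - sum(i * w for i, w in enumerate(omega))
    return head + tail_mass * (math.log(C) + (1 - rho) * beta) + tail_balls * math.log(rho)
```

As a sanity check, taking $\omega_i = P_i(\beta)$ should return $\rho = 1$, $C = 1$ and a rate of zero, since the Poisson law is the typical outcome.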

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments, conditioned on the occurrence of a rare event. Third, they allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1, \omega, \beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $\mathcal{A}(1, \omega, \beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(1, \omega, \beta)$ defined by

\[ \gamma_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C P_k(\rho\beta)) \left( 1 - \frac{x}{\beta} \right)^{k}, \tag{2.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!} (-1)^i \gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x), \]

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$, $\gamma_0(x + y) = \gamma_0(x) + y\gamma_0^{(1)}(x) + \cdots$, evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term and we say that $\gamma$ is a polynomial extremal. Otherwise, $C > 0$ and we have an exponential extremal.
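To make (2.8) and (2.9) concrete, the sketch below evaluates the path at a given time $x$. It relies on a closed form obtained by differentiating (2.8) term by term: the exponential part contributes $C P_i(\rho x)$ and the polynomial part contributes binomial terms in $x/\beta$. The function name `extremal_path` is hypothetical, and the twist parameters $C$ and $\rho$ are assumed to be supplied by the caller.

```python
import math

def extremal_path(x, omega, beta, C, rho):
    """Return ([gamma_0(x), ..., gamma_I(x)], gamma_{I+}(x)) from (2.8)-(2.9).

    omega = (omega_0, ..., omega_I); C and rho are the twist parameters.
    """
    I = len(omega) - 1
    # Coefficients a_k = omega_k - C * P_k(rho * beta) of the polynomial part of (2.8).
    a = [
        omega[k] - C * math.exp(-rho * beta) * (rho * beta) ** k / math.factorial(k)
        for k in range(I + 1)
    ]
    u = x / beta
    gamma = []
    for i in range(I + 1):
        # (x^i / i!) (-1)^i d^i/dx^i [C e^{-rho x}] = C * P_i(rho * x)
        expo = C * math.exp(-rho * x) * (rho * x) ** i / math.factorial(i)
        # (x^i / i!) (-1)^i d^i/dx^i [(1 - x/beta)^k] = comb(k, i) u^i (1 - u)^(k - i)
        poly = sum(a[k] * math.comb(k, i) * u**i * (1 - u) ** (k - i)
                   for k in range(i, I + 1))
        gamma.append(expo + poly)
    return gamma, 1.0 - sum(gamma)
```

For example, in the polynomial case with $I = 1$, $\beta = 0.5$, $\omega = (0.5, 0.5)$ and $C = 0$, the formulas reduce to $\gamma_0(x) = 1 - x$ and $\gamma_1(x) = x$, which the function reproduces.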

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k + j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $\mathcal{S}_\infty$ denote the set of distributions on the nonnegative integers and let $\mathcal{S}_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $\mathcal{K}$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $\mathcal{S}_\infty^{|\mathcal{K}|}$ by $\pi = \{\pi(0), \ldots, \pi(k), \ldots, \pi(K)\}$, where for any $k \in \mathcal{K}$ a component distribution is denoted $\pi(k) = \{\pi_{k0}, \pi_{k1}, \ldots\}$, and it is understood that the corresponding $\pi(k)$ is omitted if $\alpha_k = 0$. Finally, let $\mathcal{F}(\alpha, \omega, \beta)$ be the set of $\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}$ which satisfies the terminal constraints

\[ \omega_i = \sum_{k \le i,\, k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \qquad \text{for all } 0 \le i \le I, \tag{2.10} \]

along with the conservation constraint

\[ \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta. \tag{2.11} \]

As in the empty case, it may be established that $\mathcal{F}$ is nonempty if and only if $(\alpha, \omega, \beta)$ is feasible.

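Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : \mathcal{S}_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed

\[ \mathcal{J}(\omega) = \min_{\pi \in \mathcal{F}(\alpha, \omega, \beta)} \sum_{k \in \mathcal{K}} \alpha_k D(\pi(k) \,\|\, \mathcal{P}(\beta)) \]

whenever $(\alpha, \omega, \beta)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in \mathcal{F}(\alpha, \omega, \beta)$ is unique.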
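The objects in this minimization problem are easy to evaluate for a candidate $\pi$. The sketch below, which assumes finitely supported component distributions stored as Python lists, computes the terminal occupancies from (2.10), the number of balls thrown from (2.11), and the weighted relative entropy objective. It does not perform the minimization itself, and all function names are hypothetical.

```python
import math

def relative_entropy_to_poisson(pi, beta):
    """D(pi || P(beta)) for a finitely supported pi = (pi_0, pi_1, ...)."""
    total = 0.0
    for j, p in enumerate(pi):
        if p > 0:
            log_pois = -beta + j * math.log(beta) - math.lgamma(j + 1)  # log P_j(beta)
            total += p * (math.log(p) - log_pois)
    return total

def terminal_occupancy(alpha, pis, I):
    """omega_i, i = 0..I, from (2.10); pis[k] is the distribution pi(k) of class k."""
    omega = []
    for i in range(I + 1):
        omega.append(sum(
            alpha[k] * pis[k][i - k]
            for k in range(min(i, len(alpha) - 1) + 1)
            if alpha[k] > 0 and i - k < len(pis[k])
        ))
    return omega

def balls_thrown(alpha, pis):
    """Left-hand side of the conservation constraint (2.11)."""
    return sum(a * sum(j * p for j, p in enumerate(pi)) for a, pi in zip(alpha, pis))

def objective(alpha, pis, beta):
    """Sum over classes of alpha_k * D(pi(k) || P(beta))."""
    return sum(a * relative_entropy_to_poisson(pi, beta)
               for a, pi in zip(alpha, pis) if a > 0)
```

For instance, with $\alpha = (0.5, 0.5)$, $\pi(0) = (0.5, 0.5)$ and $\pi(1) = (1)$, the candidate yields $\omega = (0.25, 0.75)$ and uses $\beta = 0.25$ balls per urn.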

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[ \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I. \end{cases} \]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[ \pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \le I, \\ 0, & k + j > I. \end{cases} \]

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in \mathcal{K}$, denote the mean of $\pi^*(k)$ by $\beta_k \doteq \sum_{j=0}^{\infty} j\pi^*_{kj}$, and define the terminal condition $\omega(k) \in \mathcal{S}_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0, \ldots, I - k$. Then the constraints $(1, \omega(k), \beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma(k) = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha, \omega, \beta)$, let $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0, \beta_k] \to [0, 1]$ be the minimizing paths corresponding to the subproblems $(1, \omega(k), \beta_k)$. The infimum of $J(\gamma)$ over $\mathcal{A}(\alpha, \omega, \beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(\alpha, \omega, \beta)$ defined by

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x). \]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in \mathcal{S}_I$ define

\[ H(\gamma, \zeta) \doteq \log(E[\exp(\zeta \cdot b_i(\gamma))]), \]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma, \zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma, \zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies a large deviation upper bound with a rate function $\bar{J}$, which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma, \zeta)$ in $\zeta$:

\[ L(\gamma, \eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma, \zeta)]. \]

If $\{\gamma(x), 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $\mathcal{S}_I$, then

\[ \bar{J}(\gamma) = \int_0^\beta L(\gamma(x), \dot\gamma(x))\,dx. \]

If $\gamma$ is not absolutely continuous, then $\bar{J}(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma, \zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma, \zeta)$ preserved. However, if the process $\Gamma^n$ starts in $\mathcal{S}_I$, then it stays in $\mathcal{S}_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound we must show that $\bar{J} = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma, \eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[ L(\gamma, M\theta) = D(\theta \,\|\, \gamma). \]

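It is well known that $L(\gamma, \eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma, \eta) < \infty$, then $\eta$ can be written as a convex combination of the form

\[ \theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I), \]

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \ldots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

\begin{align*}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot M\theta - \log(E[\exp(\zeta \cdot b_i(\gamma))])] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ M^T\zeta \cdot \theta - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right].
\end{align*}

Given any values $\mu_0, \ldots, \mu_{I+1}$, we can define $\zeta_0, \ldots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions, it is apparent that the last display is equal to

\begin{align*}
&\sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1}) \right] + \gamma_{I+1} \right) \right] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right].
\end{align*}

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \,\|\, \gamma)$, thereby completing the proof of the upper bound.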
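The final supremum can be checked numerically on a small example. The sketch below assumes strictly positive probability vectors $\theta$ and $\gamma$, for which the supremum is attained at $\mu_j = \log(\theta_j/\gamma_j)$; it compares the objective at that point with $D(\theta \,\|\, \gamma)$ and with randomly drawn $\mu$. The function names and the example vectors are illustrative choices, not taken from the paper.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors with gamma > 0 wherever theta > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """The bracketed expression: mu . theta - log(sum_j gamma_j * exp(mu_j))."""
    linear = sum(m * t for m, t in zip(mu, theta))
    log_mgf = math.log(sum(g * math.exp(m) for m, g in zip(mu, gamma)))
    return linear - log_mgf

theta = [0.2, 0.5, 0.3]
gamma = [0.6, 0.3, 0.1]
d = relative_entropy(theta, gamma)

# The maximizer for strictly positive vectors.
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
gap_at_optimum = abs(dv_objective(mu_star, theta, gamma) - d)

# No other mu should exceed D(theta || gamma).
rng = random.Random(0)
largest_random_value = max(
    dv_objective([rng.uniform(-5, 5) for _ in theta], theta, gamma)
    for _ in range(1000)
)
```

Here `gap_at_optimum` should vanish up to rounding, and `largest_random_value` should not exceed `d`.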

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp. $E_{\Gamma^n(0)}$] denote probability (resp. expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

\[ \liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -J(\gamma) - \varepsilon. \tag{3.1} \]

Of course, this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked a source of difficulty is the singular behavior ofthe transition rates of the process when Γn is near the boundary of SI We first show that this can be avoided at all times save x= 0 To do thiswe show that for any a gt 0 there exist b gt 0 K isin N and an occupancyfunction y such that y(0) = γ(0) sup0lexleβ |y(x) minus γ(x)| lt a yj(x) gt bxK

for all j = 01 I I+ and 0lt xle β and such that

J(y)le J(γ)

Consider the zero cost trajectory defined by

z(x) =Mz(x) z(0) = γ(0)

We have the following expression for zj when j le I

zj(x) =

[ jsum

k=0

γk(0)x(jminusk)(j minus k)

]

eminusx(32)

It is easy to check from this explicit formula that $z_j(x) > b x^K$ for some $b > 0$, $K = I$, and all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0,1)$ let $y^\rho = \rho z + (1-\rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z\|z) = 0$, we have

\[
\begin{aligned}
J(y^\rho) &= \int_0^\beta D\big(\rho z(x) + (1-\rho)\theta(x) \,\big\|\, \rho z(x) + (1-\rho)\gamma(x)\big)\,dx \\
&\le \rho \int_0^\beta D(z(x)\|z(x))\,dx + (1-\rho) \int_0^\beta D(\theta(x)\|\gamma(x))\,dx \\
&= (1-\rho) J(\gamma).
\end{aligned}
\]

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0,1)$.
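The closed form (3.2) for the zero cost trajectory can be sanity-checked numerically. The sketch below is only an illustration under an assumption not spelled out in this section: that the zero cost dynamics take the form $\dot z_0 = -z_0$ and $\dot z_j = z_{j-1} - z_j$ for $1 \le j \le I$ (a ball entering an urn of occupancy $j-1$ moves it to occupancy $j$). The function names and the initial vector are hypothetical.

```python
import math

def zero_cost_occupancy(gamma0, x):
    """Evaluate (3.2): z_j(x) = [sum_{k<=j} gamma_k(0) x^(j-k)/(j-k)!] e^{-x}.

    gamma0 holds gamma_0(0), ..., gamma_I(0); the returned list has one
    extra entry, z_{I+}(x) = 1 - sum_j z_j(x).
    """
    z = []
    for j in range(len(gamma0)):
        s = sum(gamma0[k] * x ** (j - k) / math.factorial(j - k)
                for k in range(j + 1))
        z.append(s * math.exp(-x))
    z.append(1.0 - sum(z))
    return z

def drift(z):
    """Assumed zero cost drift for components 0..I (an assumption, see lead-in)."""
    return [-z[0]] + [z[j - 1] - z[j] for j in range(1, len(z) - 1)]

if __name__ == "__main__":
    gamma0 = [0.5, 0.3, 0.2]          # hypothetical initial occupancies, I = 2
    x, h = 0.7, 1e-5
    lo = zero_cost_occupancy(gamma0, x - h)
    hi = zero_cost_occupancy(gamma0, x + h)
    numeric = [(b - a) / (2 * h) for a, b in zip(lo, hi)][:-1]
    print(numeric)                               # central differences of (3.2)
    print(drift(zero_cost_occupancy(gamma0, x)))  # assumed drift at the same point
```

The two printed vectors should agree to many digits, which is consistent with (3.2) solving the assumed linear system with $z(0) = \gamma(0)$.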

It follows that in proving the lower bound we can assume without loss of generality that, for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge b x^K$ for all $x \in [0, \beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0, \beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

\[
h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else.} \end{cases}
\]

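For $n$ large enough that $|\gamma(\lfloor n\tau\rfloor/n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0, \beta]$), we have the inequality

\[
\begin{aligned}
P_{\Gamma^n(0)}\big(|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\lfloor n\tau\rfloor/n)| < \sigma\big) + e^{-2n\varepsilon}
&\ge P_{\Gamma^n(0)}\big(|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\tau)| < \sigma/2\big) + e^{-2n\varepsilon} \\
&\ge E_{\Gamma^n(0)}\big(\exp -n h(\Gamma^n(\lfloor n\tau\rfloor/n))\big).
\end{aligned}
\]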

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b^n_i$:

\[
\bar\Gamma^n\Big(\frac{i+1}{n}\Big) = \bar\Gamma^n\Big(\frac{i}{n}\Big) + \frac{1}{n}\,\bar b^n_i, \qquad \bar\Gamma^n(0) = \Gamma^n(0).
\]

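Furthermore, the distribution of $\bar b^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n),\ 0 \le j \le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$ and (without explicitly exhibiting all the dependencies) let $\bar\mu^n_i$ denote the (random) distribution of $\bar b^n_i$, given $\bar\Gamma^n(j/n)$, $0 \le j \le i$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\Big( \exp -n h\Big( \Gamma^n\Big( \frac{\lfloor n\tau\rfloor}{n} \Big)\Big)\Big)
= \inf \bar E_{\Gamma^n(0)}\Big[ h\Big( \bar\Gamma^n\Big( \frac{\lfloor n\tau\rfloor}{n} \Big)\Big) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau\rfloor - 1} D\big(\bar\mu^n_i \,\big\|\, \mu_{\bar\Gamma^n(i/n)}\big) \Big],
\]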

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b^n_i$. We can write $\gamma(\tau) - \gamma(0) = M v \tau$ for some probability vector $v$. Define a process $\bar b^n_i$ as follows:

\[
\bar b^n_i = \begin{cases}
e_j - e_{j-1}, & \text{if } \Big\lfloor \sum_{k=0}^{j-1} v_k \tau n \Big\rfloor \le i \le \Big\lfloor \sum_{k=0}^{j} v_k \tau n \Big\rfloor - 1, \text{ for } 0 \le j \le I, \\[4pt]
0, & \text{if } \Big\lfloor \sum_{k=0}^{I} v_k \tau n \Big\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}
\]

In other words, $\bar b^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[
D\big(\bar\mu^n_i \,\big\|\, \mu_{\bar\Gamma^n(i/n)}\big) = \log\Big( \frac{1}{\bar\Gamma^n_{j-1}(i/n)} \Big) = -\log\Big( \bar\Gamma^n_{j-1}\Big( \frac{i}{n} \Big)\Big). \tag{3.3}
\]

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k\tau n\rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor - 1$,

\[
\bar\Gamma^n_{j-1}\Big( \frac{i}{n} \Big) \downarrow \bar\Gamma^n_{j-1}\Big( \frac{1}{n} \Big\lfloor \sum_{k=0}^{j} v_k \tau n \Big\rfloor \Big)
\]

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor$, it follows that

\[
\bar\Gamma^n_{j-1}\Big( \frac{1}{n} \Big\lfloor \sum_{k=0}^{j} v_k \tau n \Big\rfloor \Big) \to \gamma_{j-1}(\tau)
\]

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (and independent of $\tau$) such that whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

\[
D\big(\bar\mu^n_i \,\big\|\, \mu_{\bar\Gamma^n(i/n)}\big) \le C_1[-\log \tau^K] \le -C_2 \log \tau.
\]

In addition, as $n \to \infty$ and $\eta \to 0$,

\[
\bar\Gamma^n\Big( \frac{1}{n} \lfloor \tau n \rfloor \Big) \to \gamma(\tau).
\]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\Big( \exp -n h\Big( \Gamma^n\Big( \frac{\lfloor n\tau\rfloor}{n} \Big)\Big)\Big) \le -C_2 \tau \log \tau.
\]

We now choose $\tau > 0$ so that $-C_2\tau\log\tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0, \tau]$ w.p.1, if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
\liminf_{n\to\infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Big( \Big| \Gamma^n\Big( \frac{\lfloor n\tau\rfloor}{n} \Big) - \gamma\Big( \frac{\lfloor n\tau\rfloor}{n} \Big) \Big| < \sigma,\ \sup_{x\in[0,\tau]} |\Gamma^n(x) - \gamma(x)| < \delta \Big) \ge -\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0, \beta]$, we will use the Markov property and an existing lower bound for paths which avoid that boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0, \delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau, \beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes the interior of $S_I$ relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\lfloor n\tau\rfloor/n)| < \sigma$,

\[
\liminf_{n\to\infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}\Big( \sup_{\tau\le x\le\beta} |\Gamma^n(x) - \gamma(x)| < \zeta \Big) \ge -J(\gamma) - \frac{\varepsilon}{2},
\]

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[
\mathcal{J}(\Omega) = \inf_{\omega\in\Omega} \mathcal{J}(\omega).
\]

Using Theorem 2.7 we can write

\[
\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega\in\Omega}\ \inf_{\pi\in F(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \| P(\beta)) \\
&= \inf_{\pi\in F(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \| P(\beta)),
\end{aligned} \tag{4.1}
\]

where we have abused notation to define $F(\alpha, \Omega, \beta) = \bigcup_{\omega\in\Omega} F(\alpha, \omega, \beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

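A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega\in\Omega^\circ} \mathcal{J}(\omega) = \inf_{\omega\in\bar\Omega} \mathcal{J}(\omega)$. For such sets the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n} \log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega\in\Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.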


4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[
\gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}.
\]

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

\[
\rho(1 - \omega_0) = 1 - e^{-\beta\rho}
\]

(the value $\rho = 0$ satisfies this equation trivially and is excluded). Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
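The fixed point equation for $\rho$ has no closed form solution, but it is easy to solve numerically. The following is a minimal sketch using bisection; the function names are hypothetical, and the bracketing interval relies on the observation that $f(\rho) = 1 - e^{-\beta\rho} - \rho(1 - \omega_0)$ is concave with $f(0) = 0$, which is an elementary fact rather than something taken from the paper. The values $\omega_0 = 0.15$ and $\beta = 3$ are those used for Figure 1.

```python
import math

def solve_twist(omega0, beta, tol=1e-12):
    """Positive root rho of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection."""
    if not 0.0 < 1.0 - omega0 < beta:
        raise ValueError("need 0 < 1 - omega0 < beta for a positive root")

    def f(r):
        return 1.0 - math.exp(-beta * r) - r * (1.0 - omega0)

    # f is concave with f(0) = 0 and f'(0) > 0, so f is positive at its
    # maximiser and negative at 1/(1 - omega0); the root lies in between.
    lo = math.log(beta / (1.0 - omega0)) / beta
    hi = 1.0 / (1.0 - omega0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Constrained fraction of empty urns, gamma_0(x) = e^{-rho x}/rho + 1 - 1/rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

if __name__ == "__main__":
    rho = solve_twist(0.15, 3.0)
    print(rho)                 # between 1.13 and 1.14
    print(gamma0(3.0, rho))    # recovers omega0 = 0.15 up to rounding
```

A value $\rho > 1$ here corresponds to more empty urns than expected, and setting $\omega_0 = e^{-\beta}$ returns $\rho = 1$, the zero cost case.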

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[
w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),
\]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

\[
w(r) = r - nI + n \sum_{i=0}^{I} (I - i)\Gamma_i(r/n).
\]
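The identity between the direct overflow count and the expression in terms of the low occupancy fractions can be checked by simulation. The sketch below is illustrative only; the function name, the seed and the parameter values are arbitrary choices, not taken from the paper.

```python
import random

def overflow_counts(n, r, capacity, seed=0):
    """Throw r balls into n urns and count overflow in two ways.

    Returns (direct, via_formula): the direct count sum_u max(c_u - I, 0),
    and r - n*I + n * sum_{i<=I} (I - i) * Gamma_i, where Gamma_i is the
    fraction of urns holding exactly i balls in the infinite-capacity model.
    """
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1
    direct = sum(max(c - capacity, 0) for c in counts)
    frac = [sum(1 for c in counts if c == i) / n for i in range(capacity + 1)]
    via_formula = r - n * capacity + n * sum(
        (capacity - i) * frac[i] for i in range(capacity + 1))
    return direct, via_formula

if __name__ == "__main__":
    print(overflow_counts(n=1000, r=3000, capacity=2))
```

The two returned numbers agree up to floating point rounding for any seed, since the identity is purely combinatorial.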

In order to compute

\[
J_O(\eta, \beta) = -\lim_{n} \frac{1}{n} \log P\Big( \frac{w(\beta n)}{n} > \eta \Big),
\]

we therefore consider sample paths which satisfy the end constraint

\[
\sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2}
\]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta, \beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

\[
\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3}
\]


We introduce the Lagrangian

\[
L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\Big( 1 - \sum_{i=0}^{\infty} \pi_i \Big) + z\Big( \beta - \sum_{i=0}^{\infty} i\pi_i \Big) + \lambda\Big( \zeta - \sum_{i=0}^{I} (I - i)\pi_i \Big),
\]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[
\log \pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+\lambda.
\]

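In terms of variables $C$, $\rho$ and $\nu$, we may write

\[
\pi^*_i = C P_i(\rho\beta) \quad\text{for } i \ge I
\]

and

\[
\pi^*_i = C P_i(\rho\beta)\,\nu^{I-i} \quad\text{for } i \le I.
\]

The distribution is conditionally Poisson $\rho\beta/\nu$ for $i \le I$ and conditionally Poisson $\rho\beta$ for $i \ge I$. For convenience we introduce the notation

\[
\begin{aligned}
Q_I(\rho) &= \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} iP_i(\rho\beta), \\
R_I(\rho, \nu) &= \nu^I e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\Big( \frac{\rho\beta}{\nu} \Big) = \frac{\nu^{I+1}}{\rho\beta}\, e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} iP_i\Big( \frac{\rho\beta}{\nu} \Big).
\end{aligned}
\]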

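Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[
\begin{aligned}
C\big(R_I(\rho, \nu) + Q_I(\rho)\big) &= 1, \\
C\Big( \frac{\rho\beta}{\nu} R_I(\rho, \nu) + \rho\beta\, Q_I(\rho) \Big) &= \beta, \\
C\Big( I P_I(\rho\beta) + \Big( I - \frac{\rho\beta}{\nu} \Big) R_I(\rho, \nu) \Big) &= \zeta.
\end{aligned}
\]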

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\|P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[
\begin{aligned}
(\rho/\nu - 1)R_I(\rho, \nu) + (\rho - 1)Q_I(\rho) &= 0, \\
(I - \rho\beta/\nu - \zeta)R_I(\rho, \nu) - \zeta Q_I(\rho) + I P_I(\rho\beta) &= 0.
\end{aligned}
\]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

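The large deviations exponent for the overflow problem may then be expressed

\[
\begin{aligned}
J_O(\eta, \beta) &= D(\pi^*\|P(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\big( C e^{\beta - \rho\beta} \rho^i \nu^{(I-i)^+} \big) \\
&= \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}
\]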
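One possible way to carry out the numerical solution is sketched below. It does not intersect the two curves described in the text; instead, as a swapped-in but equivalent technique, it applies a damped Newton iteration to the convex dual in the multipliers $(z, \lambda)$ of the Lagrangian, using $\rho = e^z$ and $\nu = e^\lambda$. The function name, the truncation of the infinite sums at `n_terms`, and the example values $I = 2$, $\beta = 1.5$, $\zeta = 1$ are all assumptions made for illustration.

```python
import math

def overflow_exponent(I, beta, zeta, n_terms=120, tol=1e-10):
    """Minimise D(pi || Poisson(beta)) under (4.3) via the convex dual.

    Returns (J, C, rho, nu, pi), with pi truncated to n_terms entries.
    """
    # Poisson(beta) weights P_i(beta) and spare capacities (I - i)^+.
    base = [math.exp(-beta)]
    for i in range(1, n_terms):
        base.append(base[-1] * beta / i)
    spare = [max(I - i, 0) for i in range(n_terms)]

    def tilt(z, lam):
        w = [base[i] * math.exp(z * i + lam * spare[i]) for i in range(n_terms)]
        total = sum(w)
        return total, [wi / total for wi in w]

    def dual(z, lam):
        return math.log(tilt(z, lam)[0]) - z * beta - lam * zeta

    z = lam = 0.0
    for _ in range(200):
        _, pi = tilt(z, lam)
        m1 = sum(i * p for i, p in enumerate(pi))
        m2 = sum(s * p for s, p in zip(spare, pi))
        g1, g2 = m1 - beta, m2 - zeta            # gradient of the dual
        if abs(g1) + abs(g2) < tol:
            break
        # Hessian of the dual = covariance of (i, (I - i)^+) under pi.
        c11 = sum(i * i * p for i, p in enumerate(pi)) - m1 * m1
        c22 = sum(s * s * p for s, p in zip(spare, pi)) - m2 * m2
        c12 = sum(i * s * p for i, (s, p) in enumerate(zip(spare, pi))) - m1 * m2
        det = c11 * c22 - c12 * c12
        dz = -(c22 * g1 - c12 * g2) / det
        dl = -(c11 * g2 - c12 * g1) / det
        # Halve the Newton step until the dual does not increase.
        step, f0 = 1.0, dual(z, lam)
        while step > 1e-8 and dual(z + step * dz, lam + step * dl) > f0 + 1e-14:
            step *= 0.5
        z, lam = z + step * dz, lam + step * dl

    total, pi = tilt(z, lam)
    rho, nu = math.exp(z), math.exp(lam)
    C = math.exp(rho * beta - beta) / total
    J = math.log(C) + beta * (1.0 - rho) + beta * math.log(rho) + zeta * math.log(nu)
    return J, C, rho, nu, pi

if __name__ == "__main__":
    J, C, rho, nu, _ = overflow_exponent(I=2, beta=1.5, zeta=1.0)
    print(J, C, rho, nu)
```

In this example the zero cost spare capacity is $2P_0(1.5) + P_1(1.5) \approx 0.78$, so $\zeta = 1$ is larger than expected and the computed parameters should satisfy $\nu > \rho > 1$, in line with the remark above.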

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[
J_C(\alpha, \beta, \xi) = -\lim_{n} \frac{1}{n} \log P\Big( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \Big),
\]

where

\[
\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta)
\]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7 subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi,
\]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega_{kj} = \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W, & j + k \le I, \\ C_k P_j(\rho\beta), & j + k > I. \end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

\[
J_C(\alpha, \beta, \xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k \log C_k.
\]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
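The zero-cost figure quoted in this example can be reproduced from the expression $\sum_k \alpha_k \sum_{i=0}^{I-k} P_i(\beta)$ that appears in the definition of $J_C$. The short sketch below does this; the function names are hypothetical, and the last line simply evaluates $e^{-nJ_C}$ with the rounded exponent $0.18$, so it gives only an order of magnitude.

```python
import math

def poisson(i, lam):
    """Poisson probability P_i(lam)."""
    return math.exp(-lam) * lam ** i / math.factorial(i)

def zero_cost_low_occupancy(alpha, beta, I):
    """Expected fraction of urns with at most I balls after beta*n throws:
    sum_k alpha_k * sum_{i=0}^{I-k} P_i(beta)."""
    return sum(a * sum(poisson(i, beta) for i in range(I - k + 1))
               for k, a in enumerate(alpha))

if __name__ == "__main__":
    psi3 = zero_cost_low_occupancy([0.5, 0.3, 0.2], beta=2.0, I=3)
    print(round(psi3, 4))            # about 0.7128, i.e. psi_3(2.0) ~ 0.71
    print(100 * (1.0 - psi3))        # about 29 types with 4 or more coupons
    print(math.exp(-100 * 0.18))     # about 1.5e-08, the order quoted above
```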

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;

(ii) balls have a probability $p$ of not entering any urn;

(iii) balls enter different subsets of urns with differing probabilities;

(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible, from the point of view of large deviations, by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

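A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad\text{(monotonicity)} \tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad\text{(conservation).} \tag{A.5}
\]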

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4) and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I} \Big( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \Big) &= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned} \tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\begin{aligned}
\gamma_i(x) &= \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\end{aligned}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).
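The feasibility conditions (A.4) and (A.5), together with the linear path used in the proof, translate directly into a few lines of code. The following sketch is illustrative only: the function names, the small tolerance `eps`, and the convention that each vector lists occupancies $0, \ldots, I$ followed by the final class ($\alpha_{I+1}$, respectively $\omega_{I+}$) are assumptions made here.

```python
def is_feasible(alpha, omega, beta, eps=1e-12):
    """Check monotonicity (A.4) and conservation (A.5).

    alpha and omega have length I + 2: entries 0..I are the fractions of
    urns with exactly that many balls, and the last entry is the top class.
    """
    I = len(alpha) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - eps:        # (A.4) violated
            return False
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + eps                     # (A.5)

def linear_path(alpha, omega, beta, x):
    """Linear interpolation gamma(x) = alpha + (omega - alpha) * x / beta."""
    return [a + (w - a) * x / beta for a, w in zip(alpha, omega)]

if __name__ == "__main__":
    empty = [1.0, 0.0, 0.0]                                  # I = 1, all urns empty
    print(is_feasible(empty, [0.4, 0.35, 0.25], beta=1.0))   # True
    print(is_feasible(empty, [0.0, 0.0, 1.0], beta=1.0))     # False: too few balls
    print(linear_path(empty, [0.4, 0.35, 0.25], 1.0, 0.5))
```

The second call fails conservation, since placing every urn in the top class needs at least two balls per urn while only one per urn is thrown.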

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx.
\]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi, \theta) &= D(\theta\|\gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \Big( 1 - \sum_{i=0}^{I} \theta_i \Big)\log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{aligned}
\]


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[
\frac{\partial L}{\partial\psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x), \theta(x))
\]

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above and show that the extremal paths are in fact globally minimal.

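In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Big[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Big] \tag{A.7}
\]

for $i = 0, \ldots, I - 1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\Big[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Big]. \tag{A.8}
\]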

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[
L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Big[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \Big] \tag{A.10}
\]

for $i = 0, \ldots, I - 1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\begin{aligned}
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) &= 1, \\
\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} iC P_i(\rho\beta) &= \beta.
\end{aligned}
\]

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I} (\omega_k - CP_k(\rho\beta))\Big( 1 - \frac{x}{\beta} \Big)^k, \tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals on $[0, \beta]$ which satisfy the terminal constraints along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k\Big( 1 - \frac{x}{\beta} \Big)^k. \tag{A.10}
\]

We refer to such paths as polynomial extremals and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\phi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[
(-1)^i\phi^{(i)}(x) \ge 0 \qquad\text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11}
\]


This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly, $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} + \sum_{k=i}^{I} (\omega_k - CP_k(\rho\beta))\,\beta^{-i}\,\frac{k!}{(k-i)!}\Big( 1 - \frac{x}{\beta} \Big)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} \tag{A.12}
\]

for $i > I$. It is clear that $a(x) = (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\phi(x) = (-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\begin{aligned}
\phi(\beta) &= (-1)^I\psi_0^{(I)}(\beta) \\
&= \rho^I Ce^{-\rho\beta} + \Big( \omega_I - Ce^{-\rho\beta}\frac{(\rho\beta)^I}{I!} \Big)\beta^{-I} I! \\
&= \Big( \frac{\beta^I}{I!} \Big)^{-1}\omega_I \ge 0,
\end{aligned}
\]

so that $\phi(x) > 0$ on $[0, \beta)$. It follows that $\phi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that

\[
(-1)^I\psi_0^{(I)}(x) = \Big( \frac{\beta^I}{I!} \Big)^{-1}\omega_I > 0 \qquad\text{for all } x \in [0, \beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Big( 1 - \sum_{i=0}^{I} P_i(\rho\beta) \Big) + \sum_{i=0}^{I} \omega_i = 1.
\]


A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots. \tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]
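As an illustration of this normalization, the identity can be verified by hand in the simplest case $I = 0$, where $C = 1/\rho$ as in Section 4.1. The computation below is a worked example under that assumption only, using the expressions for $\gamma_0$ and $\gamma_i$ quoted there.

```latex
% Worked check for I = 0, C = 1/\rho (classical occupancy problem).
\begin{align*}
\gamma_0(x) &= \psi_0(x) = \frac{1}{\rho}e^{-\rho x} + 1 - \frac{1}{\rho},
\qquad
\gamma_i(x) = \frac{x^i}{i!}\,\frac{\rho^i}{\rho}\,e^{-\rho x}
            = \frac{1}{\rho}P_i(\rho x) \quad (i \ge 1), \\
\sum_{i=0}^{\infty}\gamma_i(x)
  &= \frac{1}{\rho}e^{-\rho x} + 1 - \frac{1}{\rho}
     + \frac{1}{\rho}\sum_{i=1}^{\infty}P_i(\rho x) \\
  &= \frac{1}{\rho}e^{-\rho x} + 1 - \frac{1}{\rho}
     + \frac{1}{\rho}\bigl(1 - e^{-\rho x}\bigr) = 1 .
\end{align*}
```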

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0, \beta]$, the sequence $\{\gamma_i(x)\}$ lies in $S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) \\
&= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned} \tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\begin{aligned}
\theta_{I+}(x) &= 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty} P_i(\rho x), \\
1 - \psi_I(x) &= \sum_{i=I+1}^{\infty} \gamma_i(x) = C\sum_{i=I+1}^{\infty} P_i(\rho x),
\end{aligned}
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

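From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15}
\]

Then

\[
\frac{d}{dx}\Big[ -\log\frac{\theta_i}{\gamma_i} \Big] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).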

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 35

The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

\[
(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \qquad\text{for } i>I,
\]

for some $I\ge 0$, $C>0$ and $\rho>0$. Further suppose that $\gamma_0,\gamma_1,\dots$ is an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x)\le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]

for all $x\in[0,\beta]$ and some constant $B_0<\infty$. Let $\gamma$ denote the vector of functions $(\gamma_0,\dots,\gamma_I,\gamma_{I+})$, where $\gamma_{I+}=1-\sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.17}
\]

Proof. For all $i>I$ and $x\in[0,\beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)}=-\rho$. Therefore, using $\theta=-\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1-\sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1-\psi_I).
\]

Then for each $x\in[0,\beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)}=\rho$ for $i>I$ and $(1+\sum_{i=0}^{I}\dot\psi_i)/(1-\psi_I)=\rho$,

\[
L(\psi,\dot\psi) = \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i-\psi_{i-1}}
+ (1+\dot\psi_0+\cdots+\dot\psi_I)\log\frac{(1+\dot\psi_0+\cdots+\dot\psi_I)}{1-\psi_I}
\]

\[
= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i-\psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\]


Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi)\ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)|=1$, $x\in[0,\beta]$, and that, given $\varepsilon>0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x\in[\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x)=0$, the integral on the right may be written as

\[
\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
+ \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \tag{A.18}
\]

\[
= \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x)-\dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i>I$

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty}\bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon}
+ \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon\downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\dots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i=0,\dots,I-1,
\]

for all $x\in[0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19}
\]


Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x)=x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\|\mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\|\mathcal{P}(\beta)).
\]

Proof. We first solve the minimization problem, and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C>0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty}\in\mathbb{R}_+^{\infty}$ satisfying $\pi_i=\omega_i$ for $i=0,\dots,I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi,y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)}
+ y\Bigl(1-\sum_{i=0}^{\infty}\pi_i\Bigr)
+ z\Bigl(\beta-\sum_{i=0}^{\infty} i\pi_i\Bigr),
\]

where $z\doteq\log\rho$ and $y\doteq\log C+(1-\rho)\beta+1$.

Define $\pi_i^*\doteq\gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i>I$, the definitions of $\pi_i^*$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi_i^*$ is the unique global minimizer of $x\mapsto x\log x - x[\log\mathcal{P}_i(\beta)+y+iz]$. Therefore, for any $i>I$ and $\pi_i\in[0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - zi\pi_i
\ \ge\ \pi_i^*\log\frac{\pi_i^*}{\mathcal{P}_i(\beta)} - y\pi_i^* - zi\pi_i^*.
\]


Following standard Lagrangian arguments, we thus have

\[
\inf_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\|\mathcal{P}(\beta))
= \inf_{\pi\in\mathcal{F}(1,\omega,\beta)} L(\pi,y,z)
\ \ge\ \inf_{\pi\in\mathbb{R}_+^{\infty}} L(\pi,y,z)
= D(\pi^*\|\mathcal{P}(\beta)).
\]

Since $\pi^*\in\mathcal{F}(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi_i^*$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\|\mathcal{P}(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0\le i\le I,\\
C\,\mathcal{P}_i(\rho\beta), & i>I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{\mathcal{P}_i(\beta)}
+ \Bigl(1-\sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C+(1-\rho)\beta\bigr)
+ \Bigl(\beta-\sum_{i=0}^{I} i\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
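As a sanity check on the explicit cost formula, the sketch below compares it with a direct evaluation of $D(\pi^*\|\mathcal{P}(\beta))$ for a terminal distribution of the stated form ($\omega_i$ for $i\le I$, $C\mathcal{P}_i(\rho\beta)$ for $i>I$). The numerical values of $\beta$, $\rho$, $C$ and the choice $I=1$ are assumptions made purely for illustration. Here $\omega$ is back-solved so that $\pi^*$ has unit mass and mean $\beta$, rather than $C$ being computed from (2.6). The helper names are hypothetical.

```python
import math

def log_poisson(i, lam):
    """Logarithm of the Poisson(lam) probability of i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def terminal_distribution(omega, C, rho, beta, n_terms=100):
    """pi*_i = omega_i for i <= I and C * P_i(rho*beta) for i > I (truncated at n_terms)."""
    I = len(omega) - 1
    tail = [C * math.exp(log_poisson(i, rho * beta)) for i in range(I + 1, n_terms)]
    return list(omega) + tail

def relative_entropy_to_poisson(pi, beta):
    """D(pi || Poisson(beta)), computed in log form to avoid underflow in the ratio."""
    return sum(p * (math.log(p) - log_poisson(i, beta)) for i, p in enumerate(pi) if p > 0)

def explicit_cost(omega, C, rho, beta):
    """Closed-form cost of an exponential extremal in terms of omega, C, rho, beta."""
    head = sum(w * (math.log(w) - log_poisson(i, beta)) for i, w in enumerate(omega) if w > 0)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return head + tail_mass * (math.log(C) + (1.0 - rho) * beta) + tail_mean * math.log(rho)

def example_constraints(beta=1.0, rho=1.5, C=0.5):
    """Illustrative I = 1 example: choose omega so that pi* has mass 1 and mean beta."""
    lam = rho * beta
    p0, p1 = math.exp(log_poisson(0, lam)), math.exp(log_poisson(1, lam))
    tail_mass = C * (1.0 - p0 - p1)      # mass of C * Poisson(lam) on {2, 3, ...}
    tail_mean = C * (lam - p1)           # mean contribution of the same tail
    omega1 = beta - tail_mean
    omega0 = 1.0 - tail_mass - omega1
    return (omega0, omega1), C, rho, beta
```

The agreement of the two evaluations is an algebraic identity for any distribution of this form; the minimality claim of Theorem A.5 is not tested here.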

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k>0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0\in\mathcal{K}$), and we may take $K=\max\mathcal{K}$.

In the case $K\ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta=\sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)}=\{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i}:[0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

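The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi\in\mathcal{S}_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta)) \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\pi_{k,j} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} \qquad\text{for all } 0\le i\le I. \tag{A.22}
\]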

Recall that $\mathcal{S}_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+}=0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I}\alpha_j=\sum_{j=0}^{I}\omega_j=1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case the constraints imply that

\[
\pi_{k,j} = 0, \qquad k+j>I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi=(\pi^{(0)},\dots,\pi^{(k)},\dots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k>0$. Including all indices $0\le k\le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k=0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^*\in\mathcal{S}_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi\in\mathcal{S}_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega\in\mathcal{S}_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega\in\mathcal{S}_{\bar I}$, where $\bar I>I$. Any feasible point for constraints $(\alpha,\bar\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{0,0}=\omega_0/\alpha_0$. The first monotonicity condition ($i=0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary until the constraint $\alpha_0\pi_{0,1}+\alpha_1\pi_{1,0}=\omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i=1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
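The ordered filling construction can be sketched in a few lines. This is only an illustration under stated assumptions: the polynomial case with $K\le I$, all $\alpha_k>0$, and a monotonicity condition (2.3) of the form $\sum_{j\le i}\omega_j\le\sum_{j\le i}\alpha_j$, which is an assumed reading of a condition stated earlier in the paper. The function name `ordered_fill` and the greedy order (lowest class first) are choices made here; the argument above only requires that mass be applied "as necessary".

```python
def ordered_fill(alpha, omega, tol=1e-12):
    """Greedy feasible point pi[k][j] with sum_{k<=i} alpha[k]*pi[k][i-k] == omega[i].

    Assumes the polynomial case: len(alpha) <= len(omega), sum(alpha) == sum(omega) == 1,
    and every alpha[k] > 0.
    """
    I = len(omega) - 1
    remaining = list(alpha)                      # unassigned mass of each class, alpha_k-weighted
    pi = [[0.0] * (I - k + 1) for k in range(len(alpha))]
    for i, target in enumerate(omega):
        need = target
        for k in range(min(i, len(alpha) - 1) + 1):
            take = min(need, remaining[k])       # use as much of class k as is needed/available
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:                           # monotonicity condition violated at level i
            raise ValueError(f"constraints infeasible at level {i}")
    return pi
```

For example, `ordered_fill((0.6, 0.4), (0.2, 0.5, 0.3))` assigns all of class 0 to levels 0 and 1 and splits class 1 between levels 1 and 2. The resulting mean $\sum_k\alpha_k\sum_j j\pi_{k,j}$ equals $\sum_i i\omega_i-\sum_k k\alpha_k$, the value of $\beta$ for which these constraints hold with equality.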

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $\mathcal{S}_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $\mathcal{S}_\infty$, this distance is simply

\[
\sum_{i:\,P_i>Q_i}(P_i-Q_i) = \sum_{i:\,Q_i>P_i}(Q_i-P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A<\infty$, define the set

\[
H_A \doteq \Bigl\{\pi\in\mathcal{S}^{|\mathcal{K}|} : \sum_{k}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta))\le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha,\omega,\beta) \doteq \mathcal{F}(\alpha,\omega,\beta)\cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)}\in Q_A$ that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n),(k)}$. This will be the case if we can show for any sequence in $Q_A$ and for an arbitrary $k$ that the sequence $\pi^{(n),(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon>0$ there is $m<\infty$ such that

\[
\sum_{j\ge m} j\pi^{(n)}_{k,j}\le\varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^{y}-1=\sup_{x>0}[xy-x\log x+x-1]$, we have the inequality

\[
xy\le(x\log x-x+1)+(e^{y}-1),
\]

where by convention $0\log 0=0$. Observe that this inequality is valid for all $y$ and all $x\ge 0$, and that $x\log x-x+1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta>0$,

\[
\sum_{j=0}^{\infty}e^{j/\delta}\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty}e^{j/\delta}e^{-\beta}\frac{\beta^{j}}{j!} = e^{\beta e^{1/\delta}-\beta}<\infty.
\]


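Taking $y=j/\delta$ and $x\doteq\pi^{(n)}_{k,j}/\mathcal{P}_j(\beta)$, we have the estimate

\begin{align*}
\sum_{j\ge m} j\pi^{(n)}_{k,j}
&= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta)\\
&\le \sum_{j\ge 0}\delta\Bigl[\pi^{(n)}_{k,j}\log\Bigl(\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\Bigr)-\pi^{(n)}_{k,j}+\mathcal{P}_j(\beta)\Bigr]
 + \delta\sum_{j\ge m}(e^{j/\delta}-1)\mathcal{P}_j(\beta)\\
&= \delta D(\pi^{(n),(k)}\|\mathcal{P}(\beta)) + \delta\sum_{j\ge m}(e^{j/\delta}-1)\mathcal{P}_j(\beta)\\
&\le \delta A + \delta\sum_{j\ge m}(e^{j/\delta}-1)\mathcal{P}_j(\beta),
\end{align*}

where the second equality uses the fact that both $\pi^{(n),(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta>0$ sufficiently small and then $m<\infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)}\to\beta\in(0,\infty)$.]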
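The two ingredients of this estimate, the inequality $xy\le(x\log x-x+1)+(e^{y}-1)$ and the Poisson exponential moment, are easy to probe numerically. The sketch below does so and also evaluates both sides of the resulting tail bound for one test distribution. The choice of a Poisson(2) test distribution, the values of $\beta$, $\delta$ and $m$, and all function names are assumptions made for illustration only.

```python
import math

def log_poisson(j, lam):
    """Logarithm of the Poisson(lam) probability of j."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, which should be nonnegative for x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0    # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, n_terms=400):
    """Truncated sum of e^{j/delta} P_j(beta); exponents are combined to avoid overflow."""
    return sum(math.exp(j / delta + log_poisson(j, beta)) for j in range(n_terms))

def tail_mean(pi, m):
    """Left-hand side of (A.24): sum over j >= m of j * pi_j."""
    return sum(j * p for j, p in enumerate(pi) if j >= m)

def tail_bound(pi, beta, delta, m, n_terms=400):
    """delta * D(pi || P(beta)) + delta * sum_{j >= m} (e^{j/delta} - 1) P_j(beta)."""
    entropy = sum(p * (math.log(p) - log_poisson(j, beta)) for j, p in enumerate(pi) if p > 0)
    tail = sum(math.exp(j / delta + log_poisson(j, beta)) - math.exp(log_poisson(j, beta))
               for j in range(m, n_terms))
    return delta * entropy + delta * tail
```

The bound is far from tight for moderate $\delta$; its value lies in being uniform over all distributions with relative entropy at most $A$.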

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k\to\beta_k.
\]

Finally,

\[
\sum_{k}\alpha_k\beta_k = \lim_{n}\sum_{k}\alpha_k\beta^{(n)}_k = \lim_{n}\beta = \beta,
\]

so that $\pi\in\mathcal{F}(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta))\ge 0.
\]

Choose a sequence $\pi^{(n)}\in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n),(k)}\|\mathcal{P}(\beta))\le G+1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^*\in Q_A$ be the limit point. Since relative entropy $D(\pi\|\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{*,(k)}\|\mathcal{P}(\beta))
\le \liminf_{n}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n),(k)}\|\mathcal{P}(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)}=\mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)})\to(\alpha,\omega,\beta)
\]

componentwise and such that $(\alpha,\omega,\beta)$ is feasible with $0<\beta<\infty$. Then

\[
\liminf_{n}\mathcal{J}^{(n)}\ge\mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A>\liminf_{n}\mathcal{J}^{(n)}\doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)}\le G+1/n<A,
\]

and for each problem in this subsequence we define $\pi^{(n)}\in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)}\to\pi\in H_A$. As in the proof of Lemma A.6, if $\beta<\infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{k,j}\ \to\ \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{k,j}, \qquad k=0,1,\dots,K.
\]

If $\alpha^{(n)}_k\to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k\not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n),(k)}\|\mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\|\mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta-\lambda+\lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k\to\alpha_k\beta_k$, even if $\alpha_k=0$.

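Thus

\[
\sum_{k\in\mathcal{K}}\beta_k\alpha_k = \lim_{n}\sum_{k}\beta^{(n)}_k\alpha^{(n)}_k = \lim_{n}\beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\begin{align*}
\liminf_{n}\sum_{k=0}^{K}\alpha^{(n)}_k D(\pi^{(n),(k)}\|\mathcal{P}(\beta^{(n)}))
&\ge \liminf_{n}\sum_{k:\,\alpha_k>0}\alpha^{(n)}_k D(\pi^{(n),(k)}\|\mathcal{P}(\beta^{(n)}))\\
&\ge \sum_{k:\,\alpha_k>0}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta))\\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{align*}

as desired.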

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i\in\mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i>0$, or if $i>I$ and $\omega_{I+}>0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{k,j}$ with $k+j\in\mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints and let $\pi^*\in\mathcal{S}_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j}>0$ for all $k+j\in\mathcal{I}(\omega)$, and $\pi^*_{k,j}=0$ if $k+j\notin\mathcal{I}(\omega)$.

Proof. For $k+j\notin\mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi\in\mathcal{S}_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n}=0$ for some particular $(m,n)$ with $m+n\in\mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $\mathcal{S}_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{m,n}>0$, that $\tilde\pi_{k,j}=0$ for all $k+j\notin\mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i}\alpha_k\tilde\pi_{k,i-k}, \qquad
\tilde\beta \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\tilde\pi_{k,j},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta>0$ we may define

\[
\hat\beta \doteq (\beta-\eta\tilde\beta)/(1-\eta), \qquad \hat\omega \doteq (\omega-\eta\tilde\omega)/(1-\eta).
\]

By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$ and, finally, form $\bar\pi=\eta\tilde\pi+(1-\eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{m,n}>0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon}=\varepsilon\bar\pi+(1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\begin{align*}
f'(\varepsilon)
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\Bigl(\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}+1\Bigr)(\bar\pi_{k,j}-\pi_{k,j})\\
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{k,j}-\pi_{k,j})\\
&= \alpha_m\bar\pi_{m,n}\log\frac{\pi^{\varepsilon}_{m,n}}{\mathcal{P}_n(\beta)}
 + \sum_{k\in\mathcal{K}}\alpha_k\sum_{(k,j)\ne(m,n)}\bar\pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}
 - \sum_{k\in\mathcal{K}}\alpha_k\sum_{j}\pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}.
\end{align*}

In the limit as $\varepsilon\to 0$, the third term in the last display tends to

\[
-\sum_{k}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta))\le 0.
\]

The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\bar\pi_{k,j}\log\mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta)<f(0)$ for some sufficiently small $\delta>0$. We have shown that for any $m+n\in\mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n}=0$. As $(m,n)$ is arbitrary within $m+n\in\mathcal{I}(\omega)$, it follows that $\pi^*_{m,n}>0$ whenever $m+n\in\mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{k,j} =
\begin{cases}
D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega),\\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k\in\mathcal{K}}\alpha_k\sum_{k+j\in\mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
+ \sum_{k\in\mathcal{K}}z_k\alpha_k\Bigl(1-\sum_{k+l\in\mathcal{I}}\pi_{k,l}\Bigr)
+ \sum_{i\in\mathcal{I}}w_i\Bigl(\omega_i-\sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{k,j} = \mathcal{P}_j(\beta)\,e^{z_k-1+w_{k+j}}.
\]

The result follows on defining $D_k=e^{z_k-1}$ and $W_i=e^{w_i}$.
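The product form $\pi^*_{k,j}=D_k\mathcal{P}_j(\beta)W_{k+j}$ suggests a simple numerical device for computing the multipliers: alternately rescale $D_k$ so that each $\pi^{(k)}$ sums to one, and $W_i$ so that the terminal constraint (A.22) holds at level $i$. This alternating rescaling (iterative proportional fitting) is not the method of the proof; it is swapped in here only as a sketch, under the assumptions that all $\alpha_k$ and $\omega_i$ are strictly positive, that $K\le I$, and that the constraints are feasible and hold with equality in (2.4). All function names and the example data are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) probability of j."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def fit_product_form(alpha, omega, beta, sweeps=2000):
    """Return (D, W, pi) with pi[k][j] = D[k] * P_j(beta) * W[k+j] for k + j <= I."""
    K, I = len(alpha) - 1, len(omega) - 1
    D = [1.0] * (K + 1)
    W = [1.0] * (I + 1)
    for _ in range(sweeps):
        for i in range(I + 1):       # enforce the level-i terminal constraint (A.22)
            W[i] = omega[i] / sum(alpha[k] * D[k] * poisson_pmf(i - k, beta)
                                  for k in range(min(i, K) + 1))
        for k in range(K + 1):       # enforce that each pi^(k) has unit mass
            D[k] = 1.0 / sum(poisson_pmf(j, beta) * W[k + j] for j in range(I - k + 1))
    pi = [[D[k] * poisson_pmf(j, beta) * W[k + j] for j in range(I - k + 1)]
          for k in range(K + 1)]
    return D, W, pi

def objective(alpha, pi, beta):
    """sum_k alpha_k * sum_j pi_kj * log(pi_kj / P_j(beta)), the objective in (A.20)."""
    return sum(a * sum(p * math.log(p / poisson_pmf(j, beta))
                       for j, p in enumerate(row) if p > 0)
               for a, row in zip(alpha, pi))
```

With $\alpha=(0.6,0.4)$, $\omega=(0.2,0.5,0.3)$ and $\beta=0.7$ (the value forced by equality in the conservation condition), the feasible set is one-dimensional, so the fitted point can be compared directly with nearby feasible points.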

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[
\pi^*_{k,j} =
\begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I},\\
C_k\,\mathcal{P}_j(\rho\beta), & k\in\mathcal{K},\ k+j>I,\\
0, & \text{otherwise.}
\end{cases}
\]

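Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M>I$. For each such $M$ define

\[
\beta^{(M)} \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M} l\pi^*_{k,l}, \qquad
\eta^{(M)}_k \doteq \sum_{l\le M}\pi^*_{k,l},
\]

and consider the problem of minimizing

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
\]

subject to the constraints

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}} l\pi_{k,l} = \beta^{(M)},
\]

\[
\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{k,l} = \eta^{(M)}_k, \qquad k\in\mathcal{K},
\]

\[
\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} = \omega_i, \qquad i\in\mathcal{I}.
\]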

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\begin{align*}
L^{(M)} &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
 + y^{(M)}\Bigl(\beta^{(M)}-\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}} l\pi_{k,l}\Bigr)\\
&\quad + \sum_{k\in\mathcal{K}}z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k-\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{k,l}\Bigr)
 + \sum_{i\in\mathcal{I}}w^{(M)}_i\Bigl(\omega_i-\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\Bigr).
\end{align*}

Since the derivative of $L$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

\[
\pi^*_{k,j} = \mathcal{P}_j(\beta)\,e^{jy^{(M)}-1+z^{(M)}_k+w^{(M)}_{k+j}}
\]

for all $j<M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i=0$ for $i>I$. For $k+j>I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j}=0$ if $k+j>I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho=e^{y}$, $C_k=e^{(\rho-1)\beta-1+z_k}$ and $W_i=e^{w_i}$.
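The final substitution is a short piece of algebra that can be confirmed numerically: for levels $k+j>I$ (where $w_{k+j}=0$), the multiplier form $\mathcal{P}_j(\beta)e^{jy-1+z_k}$ should coincide with $C_k\mathcal{P}_j(\rho\beta)$ when $\rho=e^{y}$ and $C_k=e^{(\rho-1)\beta-1+z_k}$. The sketch below uses arbitrary illustrative values for $\beta$, $y$ and $z_k$; the function names are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) probability of j."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def multiplier_form(j, beta, y, z_k):
    """P_j(beta) * exp(j*y - 1 + z_k): the stationarity condition with w_{k+j} = 0."""
    return poisson_pmf(j, beta) * math.exp(j * y - 1.0 + z_k)

def twisted_form(j, beta, y, z_k):
    """C_k * P_j(rho*beta) with rho = e^y and C_k = exp((rho - 1)*beta - 1 + z_k)."""
    rho = math.exp(y)
    C_k = math.exp((rho - 1.0) * beta - 1.0 + z_k)
    return C_k * poisson_pmf(j, rho * beta)
```

The agreement reflects the identity $e^{-\beta}\beta^{j}\rho^{j}/j! = e^{(\rho-1)\beta}\,e^{-\rho\beta}(\rho\beta)^{j}/j!$.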

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*,(k)}$ by $\beta_k\doteq\sum_j j\pi^*_{k,j}$. In the exponential case we define $\rho_k=\rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j}\doteq\pi^*_{k,j}$ and $\omega^{(k)}=(\omega_{k,0},\dots,\omega_{k,I-k})$, we see that $\pi^{*,(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial, with $C_k=0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k=\sum_j j\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i=0,\dots,I,
\]

\[
\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i>I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define the rescaled curves $\tilde\gamma_{k,j}(x)\doteq\gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i}\alpha_k\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\tilde\gamma_{k,j}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty} -\dot{\tilde\psi}_{k,j}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k,0}(0)=1$ and $\gamma_{k,j}(0)=0$ for $j>0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k)=\pi^*_{k,j}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k,0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

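First take the exponential case, and recall that $\rho_k\beta_k=\rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\begin{align*}
\tilde\psi_{k,0}(x)
&= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k}\bigl(\omega_{k,i}-C_k\mathcal{P}_i(\rho_k\beta_k)\bigr)\Bigl(1-\frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i}\\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k}\bigl(\omega_{k,i}-C_k\mathcal{P}_i(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k}(W_{k+i}-1)\mathcal{P}_i(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\Bigr].
\end{align*}

Here we have used that $\omega_{k,i}=\pi^*_{k,i}=C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i\le I-k$. The $k$th derivative of $\psi_0=\alpha_0\tilde\psi_{0,0}$ is

\begin{align*}
(-1)^{k}\psi_0^{(k)}(x)
&= \alpha_0C_0\Bigl[\rho^{k}e^{-\rho x} + \sum_{i=k}^{I}(W_i-1)\mathcal{P}_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^{k}}\Bigl(1-\frac{x}{\beta}\Bigr)^{i-k}\Bigr] \tag{A.25}\\
&= \alpha_0C_0\rho^{k}\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k}(W_{k+j}-1)\mathcal{P}_j(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{j}\Bigr]\\
&= \frac{\alpha_0C_0\rho^{k}}{C_k}\,\tilde\psi_{k,0}(x).
\end{align*}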

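Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, and writing $\tilde\psi_{k,0}(x)=\psi_{k,0}(x\beta_k/\beta)$ for the rescaled curve, we have

\begin{align*}
\gamma_i(x)
&= \sum_{k=0}^{i}\alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k,0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k,0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^{k}}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{align*}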

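Similarly, writing $\tilde\psi_{k,0}(x)=\psi_{k,0}(x\beta_k/\beta)$, we obtain

\begin{align*}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i}\alpha_k\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\tilde\psi_{k,0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^{k}}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{align*}

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i>I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1-\psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^{k}\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\tilde\psi_{k,0}(x) \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x)
= \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]

for $i=0,\dots,I$, establishing the restricted set of equations (A.10).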

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i=0,\dots,I-1$ (and also $i=I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0>0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^{i}\psi_0^{(i)}(x)>0$ for $x\in[0,\beta)$, $i\le I$ (and $i=I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

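Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence, with the rescaled curves $\tilde\gamma_{k,j}(x)=\gamma_{k,j}(x\beta_k/\beta)$ and $\tilde\psi_{k,0}(x)=\psi_{k,0}(x\beta_k/\beta)$, the cost is

\begin{align*}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K}\alpha_k\log|\psi_0^{(k)}(0)|\\
&= \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\tilde\gamma_{k,j}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{align*}

The fact that $\tilde\psi_{k,0}(0)=1$, together with (A.25), implies that $\psi_0^{(k)}(x)=\psi_0^{(k)}(0)\times\tilde\psi_{k,0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}}
= \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\psi_{k,0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta}
= \frac{\tilde\gamma_{k,j}(x)}{(-x)^{j}e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K}\alpha_k D(\tilde\gamma_{k,\cdot}(\beta)\|\mathcal{P}(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).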

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\tilde\gamma)$.

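We first introduce some notation. Let $O$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $O(\alpha, \omega, \beta) = \{\gamma \in O : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \| \gamma)$ and is hence omitted.

Lemma A.15.

(a) $O(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $O(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in O(\alpha, \omega, \beta)$, we denote $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] = J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in O(\alpha, \omega, \beta)$.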

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\,dx$. After defining

\[
\eta(x) = \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x, \varepsilon)\,dx. \tag{A.30}
\]

We can assume without loss that $G[1] = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x) \ge \alpha_i \gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

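The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma^\varepsilon}\!(x)\, \eta_i(x) - \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \theta_i} \right|_{\gamma^\varepsilon}\!(x)\, \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial \psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \qquad
\frac{\partial L}{\partial \theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \right| < B \quad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.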

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\,dx
\]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[
\int_0^x \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma} dx' = \int_0^x \left. \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right) \right|_{\gamma} dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta \left( \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right) dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x')\,dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer, even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1 - \lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x_1^{(n)})$ and terminal point $\bar\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha, \omega, \beta)
&\le \liminf_n \mathcal{J}\big(\bar\gamma(x_1^{(n)}),\ \bar\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\big) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx \\
&= J(\bar\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
          • Proof of Theorem
          • Examples and extensions
            • The classical occupancy problem
            • The overflow problem
            • Partial coupon collection with initial conditions
            • Extensions
              • Appendix
                • Preliminaries
                • Proof of Lemma
                • Euler--Lagrange equations
                • Characterization of the extremals under empty initial conditions
                • Interpretation of the twist parameter
                • Characterization of the extremal cost
                • Characterization of the extremals under general initial conditions
                • Extremal curves have globally minimal cost
                  • References

strict equality $\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} = \beta$, there are just enough balls to meet the terminal constraints, and we may replace $I$ by $I+1$ if necessary to ensure that $\omega_{I+} = 1 - \sum_{i=0}^{I} \omega_i = 0$. In this case, it turns out that $F(1, \omega, \beta)$ has only one element; we then have $C = 0$ and we may take $\rho = 1$. Otherwise, we define $\rho$ to be the unique positive root of the equation

\[
\frac{\rho\beta - \sum_{i=0}^{I} i P_i(\rho\beta)}{1 - \sum_{i=0}^{I} P_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{1 - \sum_{i=0}^{I} \omega_i}. \tag{2.5}
\]

Because of the strict inequality in the conservation condition (2.4), the right-hand side of the last equation is strictly greater than $I+1$. The left-hand side of the equation is the conditional mean $E[Y \mid Y > I]$ of a Poisson random variable $Y$ with mean $\rho\beta$. As a function of $\rho$, this conditional mean is a strictly monotonic and continuous map from $(0, \infty)$ to $(I+1, \infty)$, and hence the equation has exactly one positive root. $C$ is given by

\[
C = \frac{1 - \sum_{i=0}^{I} \omega_i}{1 - \sum_{i=0}^{I} P_i(\rho\beta)} = \frac{\beta - \sum_{i=0}^{I} i\omega_i}{\rho\beta - \sum_{i=0}^{I} i P_i(\rho\beta)}. \tag{2.6}
\]
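The root-finding step in (2.5) and the evaluation of (2.6) can be sketched numerically. The following is a minimal illustration, not the authors' procedure: it assumes the exponential case, takes $P_i(\lambda)$ to be the Poisson probability $e^{-\lambda}\lambda^i/i!$, and uses plain bisection, which is justified by the monotonicity of the conditional mean noted above. The function names `poisson_tail`, `conditional_mean` and `twist_parameters` are hypothetical.

```python
import math

def poisson_tail(lam, I, terms=500):
    """Return (P[Y > I], E[Y; Y > I]) for Y ~ Poisson(lam) by direct summation."""
    term = math.exp(-lam) * lam ** (I + 1) / math.factorial(I + 1)
    prob = mean = 0.0
    for i in range(I + 1, I + 1 + terms):
        prob += term
        mean += i * term
        term *= lam / (i + 1)          # P_{i+1}(lam) from P_i(lam)
    return prob, mean

def conditional_mean(lam, I):
    """Left-hand side of (2.5): E[Y | Y > I], with lam = rho * beta."""
    prob, mean = poisson_tail(lam, I)
    return mean / prob

def twist_parameters(omega, beta, iters=200):
    """Solve (2.5) for rho by bisection, then evaluate C from (2.6)."""
    I = len(omega) - 1
    mass_left = 1.0 - sum(omega)                                 # omega_{I+}
    balls_left = beta - sum(i * w for i, w in enumerate(omega))
    if mass_left <= 0.0 or balls_left / mass_left <= I + 1:
        raise ValueError("not the exponential case: take C = 0 and rho = 1")
    target = balls_left / mass_left                              # right-hand side of (2.5)
    lo, hi = 0.0, 1.0
    while conditional_mean(hi * beta, I) < target:               # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if conditional_mean(mid * beta, I) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    prob, _ = poisson_tail(rho * beta, I)
    return rho, mass_left / prob
```

As a sanity check on the sketch, choosing $\omega_i = P_i(\beta)$ for $i \le I$ makes both sides of (2.5) agree at $\rho = 1$, so the routine should return values close to $\rho = 1$ and $C = 1$.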

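Evaluating the relative entropy of $\pi^*$ and $P(\beta)$, the rate function of $\Gamma^n(\beta)$ can be expressed as

\[
\mathcal{J}(\omega) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \left( 1 - \sum_{i=0}^{I} \omega_i \right) (\log C + (1 - \rho)\beta) + \left( \beta - \sum_{i=0}^{I} i\omega_i \right) \log\rho. \tag{2.7}
\]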
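Formula (2.7) can be checked against a term-by-term evaluation of the relative entropy. The sketch below assumes, as an illustration only, that the minimizer has the form $\pi^*_i = \omega_i$ for $i \le I$ and $\pi^*_i = C P_i(\rho\beta)$ for $i > I$, with $(\rho, C)$ already satisfying (2.5) and (2.6); the function names are hypothetical and Poisson probabilities are handled in log form to avoid overflow.

```python
import math

def log_poisson_pmf(i, lam):
    """log of exp(-lam) * lam**i / i!, for lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def rate_closed_form(omega, beta, rho, C):
    """Evaluate (2.7) for given twist parameters rho > 0 and C > 0."""
    head = sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)
    mass_left = 1.0 - sum(omega)
    balls_left = beta - sum(i * w for i, w in enumerate(omega))
    return (head
            + mass_left * (math.log(C) + (1.0 - rho) * beta)
            + balls_left * math.log(rho))

def rate_direct(omega, beta, rho, C, terms=200):
    """D(pi* || P(beta)) summed term by term, truncating the Poisson tail."""
    I = len(omega) - 1
    total = sum(w * (math.log(w) - log_poisson_pmf(i, beta))
                for i, w in enumerate(omega) if w > 0)
    for i in range(I + 1, I + 1 + terms):
        log_p = math.log(C) + log_poisson_pmf(i, rho * beta)   # log pi*_i for i > I
        total += math.exp(log_p) * (log_p - log_poisson_pmf(i, beta))
    return total
```

The two functions agree only when $(\rho, C)$ are consistent with $(\omega, \beta)$; for arbitrary inputs the closed form is not a relative entropy and the comparison is not meaningful.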

The next theorem shows that the least cost path $\gamma^*$ satisfying $J(\gamma^*) = \mathcal{J}(\omega)$ can also be expressed explicitly in terms of the twist parameters $C$ and $\rho$. The least cost path is of interest for a number of reasons. First of all, the proof of Theorem 2.5 is obtained by evaluating (2.2) using the explicit form of $\gamma^*$. In a similar way, the least cost paths may be used as a tool in other problems of interest, such as for determining the rate function of $\Gamma^n(\beta)$ when $\beta$ is random. Second, the least cost path provides insight into the expected behavior of occupancy experiments conditioned on the occurrence of a rare event. Third, the least cost paths allow empirical estimation of rare event probabilities by change-of-measure importance sampling. A key step in proving the minimality of the proposed least cost path is to show that the path satisfies the Euler–Lagrange equations (defined in the Appendix).

Theorem 2.6 (Globally minimizing path, empty case). Suppose that $(1, \omega, \beta)$ are feasible constraints with empty initial conditions. The infimum of $J(\gamma)$ over $A(1, \omega, \beta)$ is achieved on the occupancy path $\gamma \in A(1, \omega, \beta)$ defined by

\[
\gamma_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C P_k(\rho\beta)) \left( 1 - \frac{x}{\beta} \right)^{k}, \tag{2.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!} (-1)^i \gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x),
\]

where $\rho > 0$ and $C \ge 0$ are twist parameters associated with the constraints. In addition, $\gamma$ satisfies the Euler–Lagrange equations.

Note that the entire path $\gamma(x)$ is completely determined by the empty component $\gamma_0(x)$ and its derivatives. In particular, the components $\gamma_i(x)$ are the terms in the Taylor expansion of $\gamma_0(x)$ about $x$, $\gamma_0(x + y) = \gamma_0(x) + y\gamma_0^{(1)}(x) + \cdots$, evaluated at time 0, that is, with $y = -x$. Note that $\gamma_0(x)$ is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When $C = 0$ there is no exponential term, and we say that $\gamma$ is a polynomial extremal. Otherwise, when $C > 0$, we have an exponential extremal.
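The path in (2.8) and (2.9) can be evaluated without numerical differentiation, because the operator $\frac{x^i}{i!}(-1)^i \frac{d^i}{dx^i}$ maps $C e^{-\rho x}$ to $C P_i(\rho x)$ and maps $(1 - x/\beta)^k$ to the binomial weight $\binom{k}{i}(x/\beta)^i (1 - x/\beta)^{k-i}$. The sketch below applies these two identities; it assumes $P_i$ is the Poisson probability and that $(\rho, C)$ are already consistent with (2.5) and (2.6). The name `least_cost_path` is hypothetical.

```python
import math

def poisson_pmf(i, lam):
    """exp(-lam) * lam**i / i!, with the convention 0**0 = 1."""
    return math.exp(-lam) * lam ** i / math.factorial(i)

def least_cost_path(omega, beta, rho, C, x):
    """Return [gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)] from (2.8)-(2.9)."""
    I = len(omega) - 1
    # polynomial coefficients omega_k - C * P_k(rho * beta) appearing in (2.8)
    a = [omega[k] - C * poisson_pmf(k, rho * beta) for k in range(I + 1)]
    u = x / beta
    gamma = []
    for i in range(I + 1):
        # contribution of the exponential term C * exp(-rho * x)
        expo = C * poisson_pmf(i, rho * x)
        # contribution of each polynomial term (1 - x/beta)**k with k >= i
        poly = sum(a[k] * math.comb(k, i) * u ** i * (1.0 - u) ** (k - i)
                   for k in range(i, I + 1))
        gamma.append(expo + poly)
    gamma.append(1.0 - sum(gamma))      # catch-all component gamma_{I+}
    return gamma
```

With consistent twist parameters the path starts at the empty state, $\gamma(0) = (1, 0, \ldots, 0)$, and ends at $\gamma_i(\beta) = \omega_i$, which gives a quick check of any implementation.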

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing $k$ balls to form a class. The evolution of excess balls (beyond $k$) entering urns of this class may be denoted by occupancy functions of the form $\gamma_{kj}(t)$, representing the fraction of urns initially having $k$ balls which have $k + j$ balls at time $t$. The fraction of urns containing $i$ balls in the overall system is obtained by summing contributions from all subproblem components with $k + j = i$.

As in the empty case, the rate function $\mathcal{J}(\omega)$ can be expressed as the solution to a minimization problem. Let $S_\infty$ denote the set of distributions on the nonnegative integers, and let $S_\infty^n$ denote the set of $n$-tuples of such distributions. Recall that $K$ is the set of indices $k$ such that $\alpha_k > 0$. We will denote an element of $S_\infty^{|K|}$ by $\pi = \{\pi(0), \ldots, \pi(k), \ldots, \pi(K)\}$, where for any $k \in K$ a component distribution is denoted $\pi(k) = \{\pi_{k0}, \pi_{k1}, \ldots\}$, and it is understood that the corresponding $\pi(k)$ is omitted if $\alpha_k = 0$. Finally, let $F(\alpha, \beta, \omega)$ be the set of $\pi \in S_\infty^{|K|}$ which satisfy the terminal constraints

\[
\omega_i = \sum_{k \le i,\ k \in K} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I, \tag{2.10}
\]

along with the conservation constraint

\[
\sum_{k \in K} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta. \tag{2.11}
\]

As in the empty case, it may be established that $F$ is nonempty if and only if $(\alpha, \omega, \beta)$ is feasible.

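Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : S_I \to \mathbb{R}_+$ defined in Corollary 2.3 may be expressed as

\[
\mathcal{J}(\omega) = \min_{\pi \in F(\alpha, \omega, \beta)} \sum_{k \in K} \alpha_k D(\pi(k) \,\|\, P(\beta))
\]

whenever $(\alpha, \beta, \omega)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^* \in F(\alpha, \omega, \beta)$ is unique.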

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[
\pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in K,\ k + j \le I, \\ C_k P_j(\rho\beta), & k \in K,\ k + j > I. \end{cases}
\]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[
\pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in K,\ j + k \le I, \\ 0, & k + j > I. \end{cases}
\]

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k \in K$, denote the mean of $\pi^*(k)$ by $\beta_k = \sum_{j=0}^{\infty} j\pi^*_{kj}$, and define the terminal condition $\omega(k) \in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0, \ldots, I - k$. Then the constraints $(1, \omega(k), \beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma(k) = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha, \omega, \beta)$, let $\pi^* \in S_\infty^{|K|}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0, \beta_k] \to [0, 1]$ be the minimizing paths corresponding to the subproblems $(1, \omega(k), \beta_k)$. The infimum of $J(\gamma)$ over $A(\alpha, \omega, \beta)$ is achieved on the occupancy path $\gamma \in A(\alpha, \omega, \beta)$ defined by

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).
\]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define

\[
H(\gamma, \zeta) = \log(E[\exp(\zeta \cdot b_i(\gamma))]),
\]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma, \zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma, \zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies a large deviation upper bound with a rate function $\bar J$, which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma, \zeta)$ in $\zeta$:

\[
L(\gamma, \eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma, \zeta)].
\]

If $\{\gamma(x), 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then

\[
\bar J(\gamma) = \int_0^\beta L(\gamma(x), \dot\gamma(x))\,dx.
\]

If $\gamma$ is not absolutely continuous, then $\bar J(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma, \zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma, \zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar J = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma, \eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[
L(\gamma, M\theta) = D(\theta \| \gamma).
\]

It is well known that $L(\gamma, \eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma, \eta) < \infty$, then $\eta$ can be written as a convex combination of the form

\[
\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I),
\]

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \ldots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

\[
\begin{aligned}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot M\theta - \log(E[\exp(\zeta \cdot b_i(\gamma))])] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ M^T\zeta \cdot \theta - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right].
\end{aligned}
\]

Given any values $\mu_0, \ldots, \mu_{I+1}$, we can define $\zeta_0, \ldots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions, it is apparent that the last display is equal to

\[
\begin{aligned}
&\sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1}) \right] + \gamma_{I+1} \right) \right] \\
&\quad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right] \\
&\quad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right].
\end{aligned}
\]

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \| \gamma)$, thereby completing the proof of the upper bound.
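The final variational identity can be illustrated numerically. The sketch below assumes strictly positive probability vectors and uses the standard fact that the supremum is attained at $\mu_j = \log(\theta_j/\gamma_j)$ (up to an additive constant); the vectors and function names are hypothetical examples, not values from the paper.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors with gamma > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """mu . theta - log(sum_j gamma_j * exp(mu_j)), the bracket in the last display."""
    linear = sum(m * t for m, t in zip(mu, theta))
    return linear - math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))

theta = [0.5, 0.3, 0.2]    # hypothetical rate vector
gamma = [0.2, 0.2, 0.6]    # hypothetical occupancy vector
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]   # candidate maximizer

# At mu_star the log term vanishes, so the objective equals D(theta || gamma).
print(dv_objective(mu_star, theta, gamma), relative_entropy(theta, gamma))
```

Any other choice of $\mu$ should give a value no larger than $D(\theta \| \gamma)$, which is the content of the variational formula.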

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that, given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -J(\gamma) - \varepsilon. \tag{3.1}
\]

Of course this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > b x^K$ for all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$, and such that

\[
J(y) \le J(\gamma).
\]

Consider the zero cost trajectory defined by

\[
\dot z(x) = M z(x), \qquad z(0) = \gamma(0).
\]

We have the following expression for $z_j$ when $j \le I$:

\[
z_j(x) = \left[ \sum_{k=0}^{j} \gamma_k(0) \frac{x^{j-k}}{(j-k)!} \right] e^{-x}. \tag{3.2}
\]
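Formula (3.2) can be checked against the differential equation by finite differences. The matrix $M$ of (2.1) is not reproduced in this section, so the sketch below assumes the componentwise form $\dot z_j = z_{j-1} - z_j$ for $j \le I$ (with $z_{-1} = 0$), which is what the increments $e_{j+1} - e_j$ taken with probability $\gamma_j$ would give; that form and the name `zero_cost_path` are assumptions made for illustration.

```python
import math

def zero_cost_path(gamma0, x):
    """z_j(x) for j = 0..I from (3.2); gamma0 = [gamma_0(0), ..., gamma_I(0)]."""
    return [math.exp(-x) * sum(gamma0[k] * x ** (j - k) / math.factorial(j - k)
                               for k in range(j + 1))
            for j in range(len(gamma0))]

gamma0 = [0.6, 0.3, 0.1]       # hypothetical initial occupancy with I = 2
x, h = 0.7, 1e-5
z = zero_cost_path(gamma0, x)
z_plus = zero_cost_path(gamma0, x + h)
z_minus = zero_cost_path(gamma0, x - h)
for j in range(len(gamma0)):
    derivative = (z_plus[j] - z_minus[j]) / (2.0 * h)   # central difference
    drift = (z[j - 1] if j > 0 else 0.0) - z[j]         # assumed (M z)_j
    print(j, derivative, drift)
```

The two printed columns should agree to within the finite-difference error, and every component is strictly positive for $x > 0$ whenever $\gamma_0(0) > 0$.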

It is easy to check from this explicit formula that $z_j(x) > b x^K$ for some $b > 0$, $K = I$, and all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0, 1)$ let $y^\rho = \rho z + (1 - \rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1 - \rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \| z) = 0$, we have

\[
\begin{aligned}
J(y^\rho) &= \int_0^\beta D(\rho z(x) + (1 - \rho)\theta(x) \,\|\, \rho z(x) + (1 - \rho)\gamma(x))\,dx \\
&\le \rho \int_0^\beta D(z(x) \| z(x))\,dx + (1 - \rho) \int_0^\beta D(\theta(x) \| \gamma(x))\,dx \\
&= (1 - \rho) J(\gamma).
\end{aligned}
\]

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0, 1)$.

It follows that in proving the lower bound we can assume without loss of generality that, for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge b x^K$ for all $x \in [0, \beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0, \beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

\[
h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else.} \end{cases}
\]

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor / n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0, \beta]$), we have the inequality

\[
\begin{aligned}
P_{\Gamma^n(0)}(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\lfloor n\tau \rfloor / n)| < \sigma) + e^{-2n\varepsilon}
&\ge P_{\Gamma^n(0)}(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\tau)| < \sigma/2) + e^{-2n\varepsilon} \\
&\ge E_{\Gamma^n(0)}(\exp\{-n h(\Gamma^n(\lfloor n\tau \rfloor / n))\}).
\end{aligned}
\]

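We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b_i^n$:

\[
\bar\Gamma^n\left( \frac{i+1}{n} \right) = \bar\Gamma^n\left( \frac{i}{n} \right) + \frac{1}{n} \bar b_i^n, \qquad \bar\Gamma^n(0) = \Gamma^n(0).
\]

Furthermore, the distribution of $\bar b_i^n$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$ and (without explicitly exhibiting all the dependencies) let $\bar\mu_i^n$ denote the (random) distribution of $\bar b_i^n$ given $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\[
\begin{aligned}
&-\frac{1}{n} \log E_{\Gamma^n(0)}\left( \exp\left\{ -n h\left( \Gamma^n\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) \right\} \right) \\
&\quad = \inf \bar E_{\Gamma^n(0)}\left[ h\left( \bar\Gamma^n\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D(\bar\mu_i^n \,\|\, \mu_{\bar\Gamma^n(i/n)}) \right],
\end{aligned}
\]

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b_i^n$. We can write $\gamma(\tau) - \gamma(0) = M v \tau$ for some probability vector $v$. Define a process $\bar b_i^n$ as follows:

\[
\bar b_i^n = \begin{cases}
e_j - e_{j-1}, & \text{if } \left\lfloor \sum_{k=0}^{j-1} v_k \tau n \right\rfloor \le i \le \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor - 1 \text{ for } 0 \le j \le I, \\[4pt]
0, & \text{if } \left\lfloor \sum_{k=0}^{I} v_k \tau n \right\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}
\]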

In other words, $\bar b_i^n$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then, since $\bar\mu_i^n$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[
D(\bar\mu_i^n \,\|\, \mu_{\bar\Gamma^n(i/n)}) = \log\left( \frac{1}{\bar\Gamma^n_{j-1}(i/n)} \right) = -\log\left( \bar\Gamma^n_{j-1}\left( \frac{i}{n} \right) \right). \tag{3.3}
\]

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k \tau n \rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor - 1$,

\[
\bar\Gamma^n_{j-1}\left( \frac{i}{n} \right) \downarrow \bar\Gamma^n_{j-1}\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor \right)
\]

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$, it follows that

\[
\bar\Gamma^n_{j-1}\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor \right) \to \gamma_{j-1}(\tau)
\]

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (independent of $\tau$) such that, whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$

\[
D(\bar\mu_i^n \,\|\, \mu_{\bar\Gamma^n(i/n)}) \le C_1[-\log \tau^K] \le -C_2 \log\tau.
\]

In addition, as $n \to \infty$ and $\eta \to 0$,

\[
\bar\Gamma^n\left( \frac{1}{n} \lfloor \tau n \rfloor \right) \to \gamma(\tau).
\]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\left( \exp\left\{ -n h\left( \Gamma^n\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) \right\} \right) \le -C_2 \tau \log\tau.
\]

We now choose $\tau > 0$ so that $-C_2\tau\log\tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0, \tau]$ w.p.1 if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \left| \Gamma^n\left( \frac{\lfloor n\tau \rfloor}{n} \right) - \gamma\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right| < \sigma,\ \sup_{x \in [0, \tau]} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0, \beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0, \delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau, \beta]$. Recall that when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\lfloor n\tau \rfloor / n)| < \sigma$,

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau \rfloor / n),\, \lfloor n\tau \rfloor / n}\left( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \right) \ge -J(\gamma) - \frac{\varepsilon}{2},
\]

where $P_{\Gamma^n(\lfloor n\tau \rfloor / n),\, \lfloor n\tau \rfloor / n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau \rfloor / n)$ at time $\lfloor n\tau \rfloor / n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and is therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint Γn(β)should lie in a terminal constraint set Ω To apply the LDP for Γn(β)established by Corollary 23 one must compute exponents of the form

J (Ω) = infωisinΩ

J (ω)

Using Theorem 27 we can write

J (Ω) = infωisinΩ

infπisinF (αωβ)

Ksum

k=0

αkD(π(k)P(β))

(41)

= infπisinF (αΩβ)

Ksum

k=0

αkD(π(k)P(β))

where we have abused notation to define F (αΩ β) =⋃

ωisinΩF (αωβ)In many cases for example when the terminal set Ω is convex and defined

by linear constraints the exponent J can be computed directly from (41)using Lagrange multipliers That is one solves minimization problems of thetype given in Theorem 27 but with the endpoint constraints (210) replacedby constraints defining Ω This is the approach used in the second andthird example below We do not prove that appropriate Lagrange multipliersalways exist if needed existence may be established using methods similarto those used in the Appendix for the case Ω= ω Because of the convexityof relative entropy in (41) a local minimum is always a global minimum overconvex sets Hence in any particular scenario with convex Ω it is sufficientto establish a local minimum by numerically computing a set of Lagrangemultipliers

An alternative approach for computing J (Ω) could be to return to thesample path level and use natural boundary conditions on the extremalcurves (see [25])

A set Ω with interior Ω and closure Ω is a J -continuity set if and onlyif infωisinΩ J (ω) = infωisinΩJ (ω) For such sets the large deviations lower andupper bounds coincide so that minus lim1n logP (Γn(β) isinΩ) = infωisinΩJ (ω) Itmay readily be verified in each of the following examples that the event ofinterest is a J -continuity set


4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[
\gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}.
\]

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

\[
\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.
\]

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.
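As a numerical illustration of the classical case, the following minimal sketch finds the positive root of $\rho(1-\omega_0) = 1 - e^{-\beta\rho}$ by bisection and evaluates $\gamma_0$. The function names `solve_rho` and `gamma0`, the bracketing strategy and the tolerance are illustrative assumptions, not part of the paper; the values $\omega_0 = 0.15$ and $\beta = 3$ are those of the Figure 1 example.

```python
import math

def solve_rho(omega0, beta, tol=1e-12):
    """Positive root of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection.

    Hypothetical helper: assumes 0 < 1 - omega0 < beta, in which case the
    concave function f below is positive just right of 0 and negative at
    rho = 1/(1 - omega0), so exactly one positive root exists.
    """
    if not 0.0 < 1.0 - omega0 < beta:
        raise ValueError("need 0 < 1 - omega0 < beta for a positive root")
    f = lambda rho: -math.expm1(-beta * rho) - rho * (1.0 - omega0)
    hi = 1.0 / (1.0 - omega0)      # f(hi) = -exp(-beta*hi) < 0
    lo = hi / 2.0
    while f(lo) <= 0.0:            # shrink toward 0 until f(lo) > 0
        lo /= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Constrained fraction of empty urns, gamma_0(x) = exp(-rho*x)/rho + 1 - 1/rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

rho = solve_rho(0.15, 3.0)
print(rho, gamma0(0.0, rho), gamma0(3.0, rho))
```

The trivial root $\rho = 0$ is excluded by construction, which is why the bracket starts strictly to the right of zero.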

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[
w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),
\]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

\[
w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\Gamma_i(r/n).
\]
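The identity for $w(r)$ can be checked on a simulated configuration. The sketch below is an illustration only: the function name `overflow_two_ways` and the parameter values are assumptions. It throws $r$ balls into $n$ infinite-capacity urns, counts the overflow directly, and compares it with $r - nI + \sum_{i=0}^{I}(I-i)N_i$, where $N_i = n\Gamma_i(r/n)$ is the number of urns holding exactly $i$ balls.

```python
import random

def overflow_two_ways(n, r, I, seed=0):
    """Return (direct overflow count, value of the occupancy formula)."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1      # each ball picks an urn uniformly

    # Direct count: balls beyond capacity I in each urn.
    direct = sum(max(c - I, 0) for c in counts)

    # N[i] = number of urns with exactly i balls, i = 0..I (N[i] = n * Gamma_i).
    N = [sum(1 for c in counts if c == i) for i in range(I + 1)]
    formula = r - n * I + sum((I - i) * N[i] for i in range(I + 1))
    return direct, formula

print(overflow_two_ways(n=100, r=300, I=3))
```

The two numbers agree for every realization, since the formula is a deterministic rearrangement of the direct count.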

In order to compute

\[
J_O(\eta, \beta) = -\lim_{n} \frac{1}{n} \log P\Bigl( \frac{w(\beta n)}{n} > \eta \Bigr),
\]

we therefore consider sample paths which satisfy the end constraint

\[
\sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta.
\tag{4.2}
\]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta, \beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

\[
\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta.
\tag{4.3}
\]


We introduce the Lagrangian

\[
L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr) + \lambda\Bigl(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\Bigr),
\]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[
\log \pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+\lambda.
\]

In terms of variables $C$, $\rho$ and $\nu$, we may write

\[
\pi^*_i = C P_i(\rho\beta) \quad\text{for } i \ge I
\]

and

\[
\pi^*_i = C P_i(\rho\beta)\,\nu^{I-i} \quad\text{for } i \le I.
\]

The distribution is conditionally Poisson($\rho\beta/\nu$) for $i \le I$ and conditionally Poisson($\rho\beta$) for $i \ge I$. For convenience we introduce the notation

\[
Q_I(\rho) = \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} iP_i(\rho\beta),
\]

\[
R_I(\rho, \nu) = \nu^{I} e^{-\rho\beta(1-1/\nu)} \sum_{i=0}^{I-1} P_i\Bigl(\frac{\rho\beta}{\nu}\Bigr) = \frac{\nu^{I+1}}{\rho\beta} e^{-\rho\beta(1-1/\nu)} \sum_{i=1}^{I} iP_i\Bigl(\frac{\rho\beta}{\nu}\Bigr).
\]

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[
C\bigl(R_I(\rho, \nu) + Q_I(\rho)\bigr) = 1,
\]

\[
C\Bigl(\frac{\rho\beta}{\nu} R_I(\rho, \nu) + \rho\beta\, Q_I(\rho)\Bigr) = \beta,
\]

\[
C\Bigl(I P_I(\rho\beta) + \Bigl(I - \frac{\rho\beta}{\nu}\Bigr) R_I(\rho, \nu)\Bigr) = \zeta.
\]

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \,\|\, P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example,


from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[
(\rho/\nu - 1) R_I(\rho, \nu) + (\rho - 1) Q_I(\rho) = 0,
\]

\[
(I - \rho\beta/\nu - \zeta) R_I(\rho, \nu) - \zeta Q_I(\rho) + I P_I(\rho\beta) = 0.
\]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed as

\[
\begin{aligned}
J_O(\eta, \beta) &= D(\pi^* \,\|\, P(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl(C e^{\beta - \rho\beta} \rho^{i} \nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1 - \rho) + \beta \log\rho + \zeta \log\nu.
\end{aligned}
\]
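One way to obtain $(C, \rho, \nu)$ numerically is sketched below. It does not follow the curve-intersection procedure described above; instead it swaps in a different but equivalent technique, maximizing the concave dual $z\beta + \lambda\zeta - \log\sum_i P_i(\beta)e^{zi + \lambda(I-i)^+}$ by damped Newton steps and then setting $\rho = e^{z}$, $\nu = e^{\lambda}$. The function names `tilted` and `solve_overflow`, the truncation level `imax`, the tolerances and the example values $\beta = 2$, $I = 3$, $\zeta = 1.4$ are all assumptions made for illustration.

```python
import math

def tilted(z, lam, beta, I, imax=200):
    """(log Z, p) with p_i proportional to P_i(beta) * exp(z*i + lam*(I-i)^+),
    truncated at i = imax (an assumption; the tail is negligible here)."""
    logw = [-beta + i * math.log(beta) - math.lgamma(i + 1)
            + z * i + lam * max(I - i, 0) for i in range(imax + 1)]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return m + math.log(total), [v / total for v in w]

def solve_overflow(beta, I, zeta, iters=200, tol=1e-11):
    """Damped Newton ascent on the dual; returns (C, rho, nu, J, p)."""
    z = lam = 0.0
    dual = lambda a, b: a * beta + b * zeta - tilted(a, b, beta, I)[0]
    for _ in range(iters):
        _, p = tilted(z, lam, beta, I)
        mean = sum(i * pi for i, pi in enumerate(p))
        spare = sum(max(I - i, 0) * pi for i, pi in enumerate(p))
        g1, g2 = beta - mean, zeta - spare           # dual gradient
        if abs(g1) + abs(g2) < tol:
            break
        vff = sum((i - mean) ** 2 * pi for i, pi in enumerate(p))
        vgg = sum((max(I - i, 0) - spare) ** 2 * pi for i, pi in enumerate(p))
        vfg = sum((i - mean) * (max(I - i, 0) - spare) * pi
                  for i, pi in enumerate(p))
        det = vff * vgg - vfg * vfg
        dz = (vgg * g1 - vfg * g2) / det             # Newton direction
        dl = (vff * g2 - vfg * g1) / det
        t, base = 1.0, dual(z, lam)
        while dual(z + t * dz, lam + t * dl) < base - 1e-13 and t > 1e-8:
            t *= 0.5                                 # backtrack if no ascent
        z, lam = z + t * dz, lam + t * dl
    rho, nu = math.exp(z), math.exp(lam)
    logZ, p = tilted(z, lam, beta, I)
    logC = -logZ - beta * (1.0 - rho)
    J = logC + beta * (1.0 - rho) + beta * math.log(rho) + zeta * math.log(nu)
    return math.exp(logC), rho, nu, J, p

C, rho, nu, J, p = solve_overflow(beta=2.0, I=3, zeta=1.4)
print(C, rho, nu, J)
```

With these values the spare capacity $\zeta = 1.4$ exceeds its zero-cost value, so the computed twist parameters should satisfy $\nu > \rho > 1$, in line with the remark above; the returned $J$ is the closed-form exponent evaluated at the computed constants.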

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[
J_C(\alpha, \beta, \xi) = -\lim_{n} \frac{1}{n} \log P\Bigl( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \Bigr),
\]

where

\[
\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta)
\]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal


conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi^{k}_{j} = \xi,
\]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega^{k}_{j} = \pi^{*k}_{j} =
\begin{cases}
C_k P_j(\rho\beta)\, W, & j + k \le I, \\
C_k P_j(\rho\beta), & j + k > I.
\end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[
J_C(\alpha, \beta, \xi) = \beta(1 - \rho + \log\rho) + \xi \log W + \sum_{k=0}^{K} \alpha_k \log C_k.
\]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
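The zero-cost figures quoted in this example can be reproduced with a few lines. The sketch below evaluates $\sum_k \alpha_k \sum_{i=0}^{I-k} P_i(\beta)$ for $\alpha = [0.5, 0.3, 0.2]$, $\beta = 2$, $I = 3$, and converts the quoted exponent $J_C \approx 0.18$ into the rough probability estimate $e^{-nJ_C}$. The helper names are illustrative assumptions, and the value 0.18 is taken from the text rather than computed here.

```python
import math

def poisson_pmf(i, mean):
    """Poisson probability P_i(mean)."""
    return math.exp(-mean) * mean ** i / math.factorial(i)

def zero_cost_low_fraction(alpha, beta, I):
    """Expected fraction of types with I or fewer coupons after beta*n draws:
    sum_k alpha_k * sum_{i=0}^{I-k} P_i(beta)."""
    return sum(a_k * sum(poisson_pmf(i, beta) for i in range(I - k + 1))
               for k, a_k in enumerate(alpha))

n = 100
psi3 = zero_cost_low_fraction([0.5, 0.3, 0.2], beta=2.0, I=3)
expected_types_with_4_plus = n * (1.0 - psi3)

J_C = 0.18                      # exponent quoted in the text, not recomputed
prob_estimate = math.exp(-n * J_C)

print(psi3, expected_types_with_4_plus, prob_estimate)
```

The estimate $e^{-nJ_C}$ ignores subexponential prefactors, so it should be read only as an order of magnitude.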

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible, from the point of view of large deviations, by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem


is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations, together with properties of the relative entropy.

A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad\text{(monotonicity)}
\tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad\text{(conservation)}.
\tag{A.5}
\]

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1


implies (A.4), and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I} \Bigl( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \Bigr) &= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta \sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).
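The two feasibility conditions (A.4) and (A.5) are simple to test numerically. The following sketch is illustrative only: the function name `is_feasible`, the vector layout and the tolerance are assumptions. Both `alpha` and `omega` are taken to have $I + 2$ entries, with the last entry of `omega` holding $\omega_{I+}$ and the last entry of `alpha` holding $\alpha_{I+1}$.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (A.4) and conservation (A.5) for endpoint
    constraints (alpha, omega, beta); hypothetical helper."""
    if len(alpha) != len(omega) or len(omega) < 2:
        raise ValueError("alpha and omega must both have I + 2 entries")
    I = len(omega) - 2

    # (A.4): partial sums of alpha dominate partial sums of omega.
    a_cum = w_cum = 0.0
    for i in range(I + 1):
        a_cum += alpha[i]
        w_cum += omega[i]
        if a_cum < w_cum - tol:
            return False

    # (A.5): the terminal configuration needs at most beta extra balls per urn.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol

print(is_feasible([1.0, 0.0], [0.15, 0.85], beta=3.0))   # classical example
print(is_feasible([1.0, 0.0], [0.15, 0.85], beta=0.5))   # too few balls
```

In the classical case $I = 0$, the conservation test reduces to $1 - \omega_0 \le \beta$: at least one ball per occupied urn must have been thrown.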

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx.
\]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi, \theta) &= D(\theta \,\|\, \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \Bigl(1 - \sum_{i=0}^{I} \theta_i\Bigr) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{aligned}
\]


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[
\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx} \frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))
\]

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections, we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Bigr]
\tag{A.7}
\]

for $i = 0, \ldots, I - 1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\Bigl[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Bigr].
\tag{A.8}
\]

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[
L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}.
\tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \Bigr]
\tag{A.10}
\]

for $i = 0, \ldots, I - 1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty: $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1,
\]

\[
\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} iC P_i(\rho\beta) = \beta.
\]

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints, and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr) \Bigl(1 - \frac{x}{\beta}\Bigr)^{k},
\tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^{i}}{i!} (-1)^{i} \psi_0^{(i)}(x), \qquad 0 \le i \le I,
\tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^{k}.
\tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\phi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[
(-1)^{i} \phi^{(i)}(x) \ge 0 \quad\text{for all } x \in [a, b] \text{ and } i > 0.
\tag{A.11}
\]


This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^{i} \psi_0^{(i)}(x) = \rho^{i} C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr) \beta^{-i} \frac{k!}{(k-i)!} \Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^{i} \psi_0^{(i)}(x) = \rho^{i} C e^{-\rho x}
\tag{A.12}
\]

for $i > I$. It is clear that $a(x) = (-1)^{I+1} \psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^{i} a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\phi(x) = (-1)^{I} \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[
\begin{aligned}
\phi(\beta) &= (-1)^{I} \psi_0^{(I)}(\beta) \\
&= \rho^{I} C e^{-\rho\beta} + \Bigl( \omega_I - C e^{-\rho\beta} \frac{(\rho\beta)^{I}}{I!} \Bigr) \beta^{-I} I! \\
&= \Bigl( \frac{\beta^{I}}{I!} \Bigr)^{-1} \omega_I \ge 0,
\end{aligned}
\]

so that $\phi(x) > 0$ on $[0, \beta)$. It follows that $\phi(x)$ is completely monotone on $[0, \beta]$, and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^{I} \psi_0^{(I)}(x) = \Bigl( \frac{\beta^{I}}{I!} \Bigr)^{-1} \omega_I > 0 \quad\text{for all } x \in [0, \beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Bigl(1 - \sum_{i=0}^{I} P_i(\rho\beta)\Bigr) + \sum_{i=0}^{I} \omega_i = 1.
\]


A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^{i}}{i!} (-1)^{i} \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^{i}}{i!} \psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^{i}}{i!} \psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!} (-1)^{k} \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^{k}}{k!} (-1)^{k+1} \psi_0^{(k+1)}(x) \\
&= \frac{x^{i}}{i!} (-1)^{i+1} \psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0.
\tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[ -\log\frac{\theta_i}{\gamma_i} \Bigr] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta),
\tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

\[
(-1)^{i} \psi_0^{(i)}(x) = C \rho^{i} e^{-\rho x} \quad\text{for } i > I
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \bigl[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \bigr].
\tag{A.17}
\]

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then, for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$, and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]


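Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr) \log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$, since for $i > I$,

\[
-\dot\psi_i(x) \log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)\bigl[\log C + i\log\rho - \rho x\bigr],
\]

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty} \bigl[ \gamma_i \log|\psi_0^{(i)}| \bigr]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I - 1,
\]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \bigr].
\tag{A.19}
\]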


Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^{i} |\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[
J(\gamma) = \min_{\pi \in \mathcal{F}(1, \omega, \beta)} D(\pi \,\|\, P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr),
\]

where $z = \log\rho$ and $y = \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i = \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x \log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[
\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i \log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]


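Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in \mathcal{F}(1, \omega, \beta)} D(\pi \,\|\, P(\beta)) &= \inf_{\pi \in \mathcal{F}(1, \omega, \beta)} L(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
&= D(\pi^* \,\|\, P(\beta)).
\end{aligned}
\]

Since $\pi^* \in \mathcal{F}(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^{i} e^{-\beta}/i!} = D(\pi^* \,\|\, P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)\bigl(\log C + (1 - \rho)\beta\bigr) + \Bigl(\beta - \sum_{i=0}^{I} i\omega_i\Bigr) \log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.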
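For the simplest case $I = 0$, where $C = 1/\rho$ (as noted in Section 4.1), the explicit cost formula can be compared with a direct evaluation of $D(\pi^* \,\|\, P(\beta))$. The sketch below is an illustration under assumptions: the helper names, the bisection used to find $\rho$, the series truncation at `imax = 100` and the values $\omega_0 = 0.15$, $\beta = 3$ are not taken from the paper's numerical work.

```python
import math

def log_poisson(i, mean):
    """log P_i(mean), computed in log space to avoid underflow."""
    return -mean + i * math.log(mean) - math.lgamma(i + 1)

def twist_rho(omega0, beta, tol=1e-13):
    """Positive root of rho*(1 - omega0) = 1 - exp(-beta*rho) (needs 1 - omega0 < beta)."""
    f = lambda rho: -math.expm1(-beta * rho) - rho * (1.0 - omega0)
    hi = 1.0 / (1.0 - omega0)
    lo = hi / 2.0
    while f(lo) <= 0.0:
        lo /= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def cost_closed_form(omega0, beta, rho):
    """Explicit exponential-extremal cost with I = 0 and C = 1/rho."""
    log_C = -math.log(rho)
    return (omega0 * (math.log(omega0) - log_poisson(0, beta))
            + (1.0 - omega0) * (log_C + (1.0 - rho) * beta)
            + beta * math.log(rho))

def cost_direct(omega0, beta, rho, imax=100):
    """D(pi* || P(beta)) with pi*_0 = omega0 and pi*_i = P_i(rho*beta)/rho for i >= 1."""
    total = omega0 * (math.log(omega0) - log_poisson(0, beta))
    for i in range(1, imax + 1):
        log_pi = log_poisson(i, rho * beta) - math.log(rho)
        total += math.exp(log_pi) * (log_pi - log_poisson(i, beta))
    return total

rho = twist_rho(0.15, 3.0)
print(cost_closed_form(0.15, 3.0, rho), cost_direct(0.15, 3.0, rho))
```

The two values agree up to truncation and root-finding error, which is a useful sanity check on the signs in the closed form.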

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k \beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in


any time period is αkβkβ After rescaling time by the factor βkβ the evo-lution of additional balls entering each class of urns becomes an occupancyproblem with initially empty conditions and terminal time βk

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)}=\{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i}:[0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

$$\gamma_i(x)=\sum_{k\in\mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta).$$

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

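The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

$$\min_{\pi\in S^{|\mathcal{K}|}_{\infty}}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|P(\beta)), \tag{A.20}$$

subject to the conservation constraint

$$\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\pi_{kj}=\beta \tag{A.21}$$

and the terminal conditions

$$\omega_i=\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\qquad\text{for all } 0\le i\le I. \tag{A.22}$$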

Recall that $S^{|\mathcal{K}|}_{\infty}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+}=0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I}\alpha_j=\sum_{j=0}^{I}\omega_j=1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

$$\pi_{kj}=0,\qquad k+j>I, \tag{A.23}$$

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi=(\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k>0$. Including all indices $0\le k\le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k=0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^*\in S^{|\mathcal{K}|}_{\infty}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi\in S^{|\mathcal{K}|}_{\infty}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega\in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega\in S_{\bar I}$, where $\bar I>I$. Any feasible point for constraints $(\alpha,\bar\omega,\beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00}=\omega_0/\alpha_0$. The first monotonicity condition ($i=0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01}+\alpha_1\pi_{10}=\omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i=1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
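The ordered filling construction can be sketched in a few lines. The version below is one possible reading of the construction, not a definitive implementation: it assumes polynomial constraints with $\alpha_k>0$ allowed to vanish for some $k$, it takes the monotonicity condition to be $\sum_{j\le i}\alpha_j\ge\sum_{j\le i}\omega_j$, and it draws mass from the lowest class first at each level (the text leaves the order open). The function name `ordered_fill` and the numerical values are hypothetical.

```python
def ordered_fill(alpha, omega, tol=1e-12):
    """Build pi[k][j] (k + j <= I) with sum_k alpha[k] * pi[k][i-k] = omega[i].

    Assumes len(alpha) == len(omega) == I + 1, both summing to one, and
    sum(alpha[:i+1]) >= sum(omega[:i+1]) for every i (monotonicity).
    """
    I = len(omega) - 1
    remaining = list(alpha)  # mass of each class not yet assigned to a level
    pi = [[0.0] * (I + 1 - k) for k in range(I + 1)]
    for i in range(I + 1):
        need = omega[i]
        for k in range(i + 1):  # lowest class first: an arbitrary ordering choice
            if alpha[k] <= 0.0:
                continue
            take = max(0.0, min(need, remaining[k]))
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition violated at level %d" % i)
    return pi


# Hypothetical polynomial constraints with I = 2.
alpha = [0.5, 0.3, 0.2]
omega = [0.2, 0.3, 0.5]
pi = ordered_fill(alpha, omega)

# Additional balls per urn implied by the filling (conservation constraint).
beta = sum(a * sum(j * p for j, p in enumerate(row)) for a, row in zip(alpha, pi))
print(pi, beta)
```

The resulting point is feasible but generally lies on the boundary of the feasible set, which is all that the argument requires: it has finite support and hence finite cost.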

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_{\infty}$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_{\infty}$, this distance is simply

$$\sum_{i:\,P_i>Q_i}(P_i-Q_i)=\sum_{i:\,Q_i>P_i}(Q_i-P_i),$$

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A<\infty$, define the set

$$H_A=\left\{\pi\in S^{|\mathcal{K}|}_{\infty}:\ \sum_{k}\alpha_k D(\pi^{(k)}\|P(\beta))\le A\right\}.$$

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

$$Q_A(\alpha,\omega,\beta)\doteq F(\alpha,\omega,\beta)\cap H_A$$

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)}\in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon>0$, there is $m<\infty$ such that

$$\sum_{j\ge m} j\pi^{(n)}_{kj}\le\varepsilon \tag{A.24}$$

for all $n$ and $k$.

Since $e^{y}-1=\sup_{x>0}[xy-x\log x+x-1]$, we have the inequality

$$xy\le(x\log x-x+1)+(e^{y}-1),$$

where, by convention, $0\log 0=0$. Observe that this inequality is valid for all $y$ and all $x\ge 0$, and that $x\log x-x+1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta>0$,

$$\sum_{j=0}^{\infty}e^{j/\delta}P_j(\beta)=\sum_{j=0}^{\infty}e^{j/\delta}\,\frac{e^{-\beta}\beta^{j}}{j!}=e^{\beta e^{1/\delta}-\beta}<\infty.$$
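Both facts are easy to check numerically. The following sketch (with hypothetical test values for $x$, $y$, $\beta$ and $\delta$) evaluates the gap in the inequality on a small grid and compares a truncated version of the Poisson exponential moment sum with the closed form $e^{\beta e^{1/\delta}-\beta}$.

```python
import math


def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, which should be >= 0 for x >= 0."""
    xlogx = x * math.log(x) if x > 0.0 else 0.0  # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y


def poisson_exp_moment(beta, delta, terms=150):
    """Truncated sum over j of e^{j/delta} * P_j(beta)."""
    total = 0.0
    term = math.exp(-beta)  # term_j = e^{-beta} * (beta * e^{1/delta})**j / j!
    for j in range(terms):
        total += term
        term *= beta * math.exp(1.0 / delta) / (j + 1)
    return total


# Hypothetical grid for the inequality; equality is attained at x = e^y.
gaps = [young_gap(x, y) for x in (0.0, 0.5, 1.0, 2.0, 5.0)
        for y in (-2.0, -0.5, 0.0, 1.0, 3.0)]
equality_gap = young_gap(math.exp(0.7), 0.7)

beta, delta = 1.5, 2.0
moment = poisson_exp_moment(beta, delta)
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
print(min(gaps), equality_gap, moment, closed_form)
```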


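Taking $y=j/\delta$ and $x\doteq\pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

$$\begin{aligned}
\sum_{j\ge m} j\pi^{(n)}_{kj} &= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta)\\
&\le \sum_{j\ge 0}\delta\left[\pi^{(n)}_{kj}\log\left(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\right)-\pi^{(n)}_{kj}+P_j(\beta)\right]+\delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta)\\
&= \delta D(\pi^{(n)(k)}\|P(\beta))+\delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta)\\
&\le \delta A+\delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta),
\end{aligned}$$

where the second equality uses the fact that both $\pi^{(n)(k)}$ and $P(\beta)$ are probability distributions. Thus, (A.24) follows by first picking $\delta>0$ sufficiently small and then $m<\infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)}\to\beta\in(0,\infty)$.]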

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4 that

$$\beta^{(n)}_k\to\beta_k.$$

Finally,

$$\sum_{k}\alpha_k\beta_k=\lim_{n}\sum_{k}\alpha_k\beta^{(n)}_k=\lim_{n}\beta=\beta,$$

so that $\pi\in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

$$G=\inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|P(\beta))\ge 0.$$

Choose a sequence $\pi^{(n)}\in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)(k)}\|P(\beta))\le G+1/n$. Since $Q_A$ is compact, this has a convergent subsequence and, to simplify the notation, we index this subsequence by $n$. Let $\pi^*\in Q_A$ be the limit point. Since relative entropy $D(\pi\|\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

$$\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{*(k)}\|P(\beta))\le\liminf_{n}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n)(k)}\|P(\beta))=G,$$

where in fact we have equality, by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)}=\mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$, such that

$$(\alpha^{(n)},\omega^{(n)},\beta^{(n)})\to(\alpha,\omega,\beta)$$

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0<\beta<\infty$. Then

$$\liminf_{n}\mathcal{J}^{(n)}\ge\mathcal{J}(\alpha,\omega,\beta).$$

Proof. For any $A>\liminf_{n}\mathcal{J}^{(n)}\doteq G$, we may choose a subsequence of problems such that

$$\mathcal{J}^{(n)}\le G+1/n<A,$$

and for each problem in this subsequence we define $\pi^{(n)}\in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)}\to\pi\in H_A$. As in the proof of Lemma A.6, if $\beta<\infty$, then

$$\beta^{(n)}_k\doteq\sum_{j=0}^{\infty}j\pi^{(n)}_{kj}\ \to\ \beta_k\doteq\sum_{j=0}^{\infty}j\pi_{kj},\qquad k=0,1,\ldots,K.$$

If $\alpha^{(n)}_k\to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k\not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)(k)}\|P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\|P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta-\lambda+\lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k\to\alpha_k\beta_k$, even if $\alpha_k=0$.
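The expression $\beta-\lambda+\lambda\log(\lambda/\beta)$ is the relative entropy of a Poisson distribution with mean $\lambda$ with respect to $P(\beta)$. A short numerical sketch, with hypothetical values of $\beta$ and $\lambda$ and a two-point distribution chosen only as a comparison, illustrates that the Poisson law attains this value while another law with the same mean does not fall below it.

```python
import math


def poisson_dist(mean, terms=80):
    """First `terms` Poisson probabilities via p_{j+1} = p_j * mean / (j + 1)."""
    probs, p = [], math.exp(-mean)
    for j in range(terms):
        probs.append(p)
        p *= mean / (j + 1)
    return probs


def relative_entropy(pi, reference):
    """D(pi || reference) = sum_j pi_j log(pi_j / reference_j), with 0 log 0 = 0."""
    return sum(p * math.log(p / q) for p, q in zip(pi, reference) if p > 0.0)


beta, lam = 1.5, 0.4  # hypothetical values
reference = poisson_dist(beta)
bound = beta - lam + lam * math.log(lam / beta)

d_poisson = relative_entropy(poisson_dist(lam), reference)   # attains the bound
two_point = [1.0 - lam, lam] + [0.0] * (len(reference) - 2)  # another law with mean lam
d_two_point = relative_entropy(two_point, reference)

print(bound, d_poisson, d_two_point)
```

Multiplying the bound by $\alpha$ and writing $\lambda=c/\alpha$ with $c>0$ fixed gives a quantity of order $c\log(1/\alpha)$ as $\alpha\to 0$, which is the divergence used in the argument.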

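Thus,

$$\sum_{k\in\mathcal{K}}\beta_k\alpha_k=\lim_{n}\sum_{k}\beta^{(n)}_k\alpha^{(n)}_k=\lim_{n}\beta^{(n)}=\beta,$$

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

$$\begin{aligned}
\liminf_{n}\sum_{k=0}^{K}\alpha^{(n)}_k D(\pi^{(n)(k)}\|P(\beta^{(n)})) &\ge \liminf_{n}\sum_{k:\,\alpha_k>0}\alpha^{(n)}_k D(\pi^{(n)(k)}\|P(\beta^{(n)}))\\
&\ge \sum_{k:\,\alpha_k>0}\alpha_k D(\pi^{(k)}\|P(\beta))\\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}$$

as desired.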

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i\in\mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i>0$, or if $i>I$ and $\omega_{I+}>0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k+j\in\mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^*\in S^{|\mathcal{K}|}_{\infty}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj}>0$ for all $k+j\in\mathcal{I}(\omega)$, and $\pi^*_{kj}=0$ if $k+j\notin\mathcal{I}(\omega)$.

Proof. For $k+j\notin\mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi\in S^{|\mathcal{K}|}_{\infty}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn}=0$ for some particular $(m,n)$ with $m+n\in\mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S^{|\mathcal{K}|}_{\infty}$, subject to the restrictions that $\hat\pi_{mn}>0$, that $\hat\pi_{kj}=0$ for all $k+j\notin\mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

$$\hat\omega_i\doteq\sum_{k=0}^{i}\alpha_k\hat\pi_{k,i-k},\qquad \hat\beta\doteq\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}j\hat\pi_{kj},$$

then $\hat\pi$ is a feasible solution for the constraints $(\alpha,\hat\omega,\hat\beta)$. Next, for any $\eta>0$, we may define

$$\tilde\beta\doteq\frac{\beta-\eta\hat\beta}{1-\eta},\qquad \tilde\omega\doteq\frac{\omega-\eta\hat\omega}{1-\eta}.$$

By construction, the constraints $(\alpha,\tilde\omega,\tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\hat\omega,\hat\beta)$, and hence for $(\alpha,\tilde\omega,\tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha,\tilde\omega,\tilde\beta)$, and finally form $\bar\pi=\eta\hat\pi+(1-\eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn}>0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon}=\varepsilon\bar\pi+(1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

$$\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\left(\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}+1\right)(\bar\pi_{kj}-\pi_{kj})\\
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}\,(\bar\pi_{kj}-\pi_{kj})\\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^{\varepsilon}_{mn}}{P_n(\beta)}+\sum_{k\in\mathcal{K}}\alpha_k\sum_{(k,j)\ne(m,n)}\bar\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}-\sum_{k\in\mathcal{K}}\alpha_k\sum_{j}\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}.
\end{aligned}$$

In the limit as $\varepsilon\to 0$, the third term in the last display tends to

$$-\sum_{k}\alpha_k D(\pi^{(k)}\|P(\beta))\le 0.$$

The second term is bounded above by the expression $-\sum_{k}\alpha_k\sum_{j}\bar\pi_{kj}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta)<f(0)$ for some sufficiently small $\delta>0$. We have shown that, for any $m+n\in\mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn}=0$. As $\pi_{mn}$ is arbitrary within $m+n\in\mathcal{I}(\omega)$, it follows that $\pi^*_{mn}>0$ whenever $m+n\in\mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

$$\pi^*_{kj}=\begin{cases} D_k P_j(\beta) W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega),\\ 0, & \text{otherwise}.\end{cases}$$

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$, and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

$$L(\pi)=\sum_{k\in\mathcal{K}}\alpha_k\sum_{k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}+\sum_{k\in\mathcal{K}}z_k\alpha_k\left(1-\sum_{k+l\in\mathcal{I}}\pi_{kl}\right)+\sum_{i\in\mathcal{I}}w_i\left(\omega_i-\sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\right)$$

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

$$\pi^*_{kj}=P_j(\beta)\,e^{z_k-1+w_{k+j}}.$$

The result follows on defining $D_k=e^{z_k-1}$ and $W_i=e^{w_i}$.
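The product form $D_kP_j(\beta)W_{k+j}$ suggests a simple way to compute the minimizer numerically in the polynomial case. The sketch below uses iterative proportional fitting, alternately rescaling the $D_k$ to normalize each class distribution and the $W_i$ to meet the terminal conditions (A.22). This is a standard matrix scaling technique substituted here for illustration, not the argument used in the proof; the function name `fit_product_form`, the sweep count, and the numerical constraints (which assume $\alpha_k>0$ for every $k\le I$ and $\omega_i>0$ for every $i\le I$) are hypothetical.

```python
import math


def poisson_pmf(j, mean):
    """Poisson probability P_j(mean) = exp(-mean) * mean**j / j!."""
    return math.exp(-mean) * mean**j / math.factorial(j)


def fit_product_form(alpha, omega, beta, sweeps=2000):
    """Return (D, W, pi) with pi[k][j] = D[k] * P_j(beta) * W[k+j] for k + j <= I."""
    I = len(omega) - 1
    P = [poisson_pmf(j, beta) for j in range(I + 1)]
    D = [1.0] * (I + 1)
    W = [1.0] * (I + 1)
    for _ in range(sweeps):
        # Rescale D[k] so that the class-k distribution sums to one.
        for k in range(I + 1):
            D[k] = 1.0 / sum(P[j] * W[k + j] for j in range(I + 1 - k))
        # Rescale W[i] so that the terminal condition at level i holds.
        for i in range(I + 1):
            W[i] = omega[i] / sum(alpha[k] * D[k] * P[i - k] for k in range(i + 1))
    pi = [[D[k] * P[j] * W[k + j] for j in range(I + 1 - k)]
          for k in range(I + 1)]
    return D, W, pi


# Hypothetical irreducible polynomial constraints with I = 2.
alpha = [0.5, 0.3, 0.2]
omega = [0.2, 0.3, 0.5]
# In the polynomial case, beta is determined by conservation with equality.
beta = sum(i * w for i, w in enumerate(omega)) - sum(k * a for k, a in enumerate(alpha))

D, W, pi = fit_product_form(alpha, omega, beta)
print(D, W, pi)
```

Since each entry $\pi_{kj}$ enters exactly one normalization constraint (class $k$) and one terminal condition (level $k+j$), the two rescaling steps play the role of row and column scaling in a two-marginal fitting problem.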

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

$$\pi^*_{kj}=\begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I},\\ C_k P_j(\rho\beta), & k\in\mathcal{K},\ k+j>I,\\ 0, & \text{otherwise}.\end{cases}$$

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M>I$. For each such $M$, define

$$\beta^{(M)}=\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M}l\pi^*_{kl},\qquad \eta^{(M)}_k\doteq\sum_{l\le M}\pi^*_{kl},$$

and consider the problem of minimizing

$$\sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}$$

subject to the constraints

$$\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}}l\pi_{kl}=\beta^{(M)},$$

$$\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl}=\eta^{(M)}_k,\qquad k\in\mathcal{K},$$

$$\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}=\omega_i,\qquad i\in\mathcal{I}.$$

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

$$\begin{aligned}
L^{(M)} &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}+y^{(M)}\left(\beta^{(M)}-\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}}l\pi_{kl}\right)\\
&\quad+\sum_{k\in\mathcal{K}}z^{(M)}_k\alpha_k\left(\eta^{(M)}_k-\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl}\right)+\sum_{i\in\mathcal{I}}w^{(M)}_i\left(\omega_i-\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\right).
\end{aligned}$$

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

$$\pi^*_{kj}=P_j(\beta)\,e^{jy^{(M)}-1+z^{(M)}_k+w^{(M)}_{k+j}}$$

for all $j<M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i=0$ for $i>I$. For $k+j>I$, note that

$$\frac{\pi^*_{k,j+1}}{\pi^*_{kj}}=\frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},$$

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j}=0$ if $k+j>I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho=e^{y}$, $C_k=e^{(\rho-1)\beta-1+z_k}$ and $W_i=e^{w_i}$.
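The substitution rests on the identity $P_j(\beta)e^{jy}=e^{(\rho-1)\beta}P_j(\rho\beta)$ with $\rho=e^{y}$: exponentially tilting a Poisson distribution yields, up to a constant factor, a Poisson distribution with rescaled mean. A quick numerical check, with hypothetical values of $\beta$ and $y$:

```python
import math


def poisson_pmf(j, mean):
    """Poisson probability P_j(mean) = exp(-mean) * mean**j / j!."""
    return math.exp(-mean) * mean**j / math.factorial(j)


def tilted(j, beta, y):
    """Left-hand side: P_j(beta) * e^{j y}."""
    return poisson_pmf(j, beta) * math.exp(j * y)


def rescaled(j, beta, y):
    """Right-hand side: e^{(rho - 1) beta} * P_j(rho * beta), with rho = e^y."""
    rho = math.exp(y)
    return math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta)


beta, y = 1.5, -0.3  # hypothetical values
max_gap = max(abs(tilted(j, beta, y) - rescaled(j, beta, y)) for j in range(20))
print(max_gap)
```

The factor $e^{(\rho-1)\beta}$ is what is absorbed into $C_k$, together with $e^{-1+z_k}$.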

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k\doteq\sum_{j}j\pi^*_{kj}$. In the exponential case, we define $\rho_k=\rho\beta/\beta_k$, so that the distribution $\pi^{*(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$, and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj}\doteq\pi^*_{kj}$ and $\omega^{(k)}=(\omega_{k0},\ldots,\omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial, with $C_k=0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k=\sum_{j}j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

$$\gamma_i(x)=\sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta),\qquad i=0,\ldots,I,$$

$$\gamma_{I+}(x)=1-\sum_{i=0}^{I}\gamma_i(x)$$

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i>I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define the rescaled curves $\bar\gamma_{kj}(x)\doteq\gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

$$\psi_i(x)=\sum_{k=0}^{i}\alpha_k\bar\psi_{k,i-k}(x).$$

To see that the $\gamma_i$ are valid occupancy curves, note that

$$\sum_{i=0}^{\infty}\gamma_i(x)=\sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(x)=\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(x)=1.$$

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

$$\sum_{i=0}^{\infty}-\dot\psi_i(x)=\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}-\dot{\bar\psi}_{kj}(x)=\sum_{k=0}^{K}\alpha_k(\beta_k/\beta)=1,$$

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence, curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0)=1$ and $\gamma_{kj}(0)=0$ for $j>0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k)=\pi^*_{kj}$ satisfy the constraints (A.22).

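In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}(x)=\psi_{k0}(x\beta_k/\beta)$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k=\rho\beta$. The $k$th zero-occupancy curve after rescaling is

$$\begin{aligned}
\bar\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)}+\sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho_k\beta_k)\bigr)\left(1-\frac{x\beta_k/\beta}{\beta_k}\right)^{i}\\
&= C_k e^{-\rho x}+\sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho\beta)\bigr)\left(1-\frac{x}{\beta}\right)^{i}\\
&= C_k\left[e^{-\rho x}+\sum_{i=0}^{I-k}(W_{k+i}-1)P_i(\rho\beta)\left(1-\frac{x}{\beta}\right)^{i}\right].
\end{aligned}$$

Here we have used that $\omega_{ki}=\pi^*_{ki}=C_kP_i(\rho\beta)W_{k+i}$ when $i\le I-k$. The $k$th derivative of $\psi_0=\alpha_0\bar\psi_{00}$ is

$$\begin{aligned}
(-1)^{k}\psi^{(k)}_0(x) &= \alpha_0C_0\left[\rho^{k}e^{-\rho x}+\sum_{i=k}^{I}(W_i-1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^{k}}\left(1-\frac{x}{\beta}\right)^{i-k}\right]\\
&= \alpha_0C_0\rho^{k}\left[e^{-\rho x}+\sum_{j=0}^{I-k}(W_{k+j}-1)P_j(\rho\beta)\left(1-\frac{x}{\beta}\right)^{j}\right]\\
&= \frac{\alpha_0C_0\rho^{k}}{C_k}\,\bar\psi_{k0}(x).
\end{aligned}\tag{A.25}$$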

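Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, and writing $\bar\psi_{k0}(x)=\psi_{k0}(x\beta_k/\beta)$ for the rescaled zero-occupancy curve, we have

$$\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi^{(i-k)}_{k0}\!\left(x\beta_k/\beta\right)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\bar\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^{k}}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi^{(i)}_0(x).
\end{aligned}$$

Similarly, we obtain

$$\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i}\alpha_k\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(x\beta_k/\beta\right)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\bar\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^{k}}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x),
\end{aligned}$$

so that the extremals satisfy the simple relation

$$\frac{\theta_i(x)}{\gamma_i(x)}=-\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}, \tag{A.26}$$

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i>I$, which establishes (as in the proof of Lemma A.3) that

$$\frac{\theta_{I+}(x)}{1-\psi_I(x)}=\rho. \tag{A.27}$$

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.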

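In the polynomial case, similar computations show that

$$(-1)^{k}\psi^{(k)}_0(x)=\frac{\alpha_0D_0}{D_k}\,\bar\psi_{k0}(x), \tag{A.28}$$

where $\bar\psi_{k0}(x)=\psi_{k0}(x\beta_k/\beta)$, and that

$$\theta_i(x)=\sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x)=\gamma_i(x)\,\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)} \tag{A.29}$$

for $i=0,\ldots,I$, establishing the restricted set of equations (A.10).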

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i=0,\ldots,I-1$ (and also $i=I$ in the exponential case), depending on $x_1$ only, such that

$$\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx+\frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x'))=C_i$$

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0>0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x))$, which is finite at both endpoints because $(-1)^{i}\psi^{(i)}_0(x)>0$ for $x\in[0,\beta)$, $i\le I$ (and $i=I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

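Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case, and write $\bar\gamma_{kj}(x)=\gamma_{kj}(x\beta_k/\beta)$ and $\bar\psi_{k0}(x)=\psi_{k0}(x\beta_k/\beta)$ for the rescaled subproblem curves. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence, the cost is

$$\begin{aligned}
J(\gamma) &= \beta+\sum_{i=0}^{\infty}\left[\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(\beta)\log|\psi^{(i)}_0(\beta)|\right]-\sum_{k=0}^{K}\alpha_k\log|\psi^{(k)}_0(0)|\\
&= \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(\beta)\log\frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0)\,e^{-\beta}}.
\end{aligned}$$

The fact that $\bar\psi_{k0}(0)=1$, together with (A.25), implies that $\psi^{(k)}_0(x)=\psi^{(k)}_0(0)\times\bar\psi_{k0}(x)$, so that

$$\frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0)\,e^{-\beta}}=\left(\frac{\beta_k}{\beta}\right)^{j}\psi^{(j)}_{k0}\!\left(x\beta_k/\beta\right)e^{\beta}=\frac{\bar\gamma_{kj}(x)}{(-x)^{j}e^{-\beta}/j!}.$$

Then

$$J(\gamma)=\sum_{k=0}^{K}\alpha_kD(\bar\gamma_{k\cdot}(\beta)\|P(\beta)).$$

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).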

A.8. Extremal curves have globally minimal cost.

In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta)\doteq\{\gamma\in\mathcal{O}:\gamma(0)=\alpha,\ \gamma(\beta)=\omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon}\doteq(1-\varepsilon)\gamma+\varepsilon\tilde\gamma$ and define the function $G:[0,1]\to\mathbb{R}_{+}\cup\{\infty\}$ by

$$G[\varepsilon]\doteq J(\gamma^{\varepsilon})=\int_{0}^{\beta}D(\theta^{\varepsilon}(x)\|\gamma^{\varepsilon}(x))\,dx.$$

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K=I+1$ in the exponential case or $K=I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

$$J(\gamma)\le J(\tilde\gamma).$$

Proof. We construct the family of paths $\gamma^{\varepsilon}=(1-\varepsilon)\gamma+\varepsilon\tilde\gamma$, with $G[\varepsilon]=J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_{+}[0]=0$, where $G'_{+}[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma)=\int_{0}^{\beta}L(\psi,\theta)\,dx$. After defining

$$\eta(x)\doteq\tilde\psi(x)-\psi(x),$$

where $\tilde\psi$ is the cumulative occupancy function of the competing path $\tilde\gamma$, we may write

$$G[\varepsilon]=\int_{0}^{\beta}L(\psi+\varepsilon\eta,\ \theta-\varepsilon\dot\eta)\,dx=\int_{0}^{\beta}g(x,\varepsilon)\,dx. \tag{A.30}$$

We can assume without loss that $G(1)=J(\tilde\gamma)<\infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x)=-\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta)=\omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta)=\omega_1/\beta$ if $I>0$ and $\theta_0(\beta)=CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example, with $\bar\gamma_{kj}(x)=\gamma_{kj}(x\beta_k/\beta)$,

$$\gamma_i(x)=\sum_{k=0}^{i}\alpha_k\bar\gamma_{k,i-k}(x)\ge\alpha_i\bar\gamma_{i0}(x).$$

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1}>0$. The identity $\theta_{I+}(x)=\rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon}\ge(1-\varepsilon)\gamma$ and $\theta^{\varepsilon}\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$, these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0<1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$, the derivative associated with the competing path, may not exist.

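The partial derivative of the integrand of (A.30) is

$$\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)=\sum_{i=0}^{I}\left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^{\varepsilon}}\!(x)\,\eta_i(x)-\sum_{i=0}^{I}\left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^{\varepsilon}}\!(x)\,\dot\eta_i(x).$$

The partial derivatives of $L$ are (in mixed notation)

$$\frac{\partial L}{\partial\psi_i}(x)=-\frac{\theta_i(x)}{\gamma_i(x)}+\frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},$$

$$\frac{\partial L}{\partial\theta_i}(x)=\log\frac{\theta_i(x)}{\gamma_i(x)}-\log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.$$

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with upper bounds $\gamma^{\varepsilon}\le 1$, $\theta^{\varepsilon}\le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

$$\left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right|<B\qquad\text{for }\varepsilon\in[0,\varepsilon_0],\ \text{a.e. }x\in[0,\beta].$$

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I=1$ and $\theta_I=0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.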

We are now free to differentiate under the integral sign, so that

$$G'[\varepsilon]=\int_{0}^{\beta}\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx$$

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

$$\int_{0}^{x}\left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma}dx'=\int_{0}^{x}\left.\left(-\frac{\theta_i}{\gamma_i}+\frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma}dx'$$

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 361, page 209) to the first term, we obtain

$$\begin{aligned}
G'_{+}[0] &= \int_{0}^{\beta}\left\{\frac{\partial L}{\partial\psi}(x)\cdot\eta(x)-\frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right\}dx\\
&= \int_{0}^{\beta}\dot\eta(x)\cdot\left(-\int_{0}^{x}\frac{\partial L}{\partial\psi}(x')\,dx'-\frac{\partial L}{\partial\theta}(x)\right)dx\\
&= -\int_{0}^{\beta}\sum_{i=0}^{I}C_i\dot\eta_i(x)\,dx\\
&= \sum_{i=0}^{I}C_i(\eta_i(0)-\eta_i(\beta))=0.
\end{aligned}$$

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that the extremal and the competing path have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0>0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t=0$ and $t=\beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t=0$ or $t=\beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma+(1-\lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1\downarrow 0$ and $x^{(n)}_2\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x^{(n)}_1)$ and terminal point $\tilde\gamma(x^{(n)}_2)$. By Lemma A.16,

$$J(\gamma^{(n)})\le\int_{x^{(n)}_1}^{x^{(n)}_2}D(\tilde\theta(x)\|\tilde\gamma(x))\,dx.$$

It then follows that

$$\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta)\\
&\le \liminf_{n}\mathcal{J}\bigl(\tilde\gamma(x^{(n)}_1),\tilde\gamma(x^{(n)}_2),x^{(n)}_2-x^{(n)}_1\bigr)\\
&= \liminf_{n}J(\gamma^{(n)})\\
&\le \lim_{n}\int_{x^{(n)}_1}^{x^{(n)}_2}D(\tilde\theta(x)\|\tilde\gamma(x))\,dx\\
&= J(\tilde\gamma),
\end{aligned}$$

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

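J(γ) over A(1, ω, β) is achieved on the occupancy path γ ∈ A(1, ω, β) defined by

\begin{align}
\gamma_0(x) &= C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^{k}, \tag{2.8}\\
\gamma_i(x) &= \frac{x^i}{i!}\,(-1)^i\,\gamma_0^{(i)}(x), \qquad 1 \le i \le I, \tag{2.9}\\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x), \notag
\end{align}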

where ρ > 0 and C ≥ 0 are twist parameters associated with the constraints. In addition, γ satisfies the Euler–Lagrange equations.

Note that the entire path γ(x) is completely determined by the empty component γ_0(x) and its derivatives. In particular, the components γ_i(x) are the terms in the Taylor expansion of γ_0 about x, γ_0(x + y) = γ_0(x) + y γ_0^{(1)}(x) + ···, evaluated at time 0, that is, with y = −x. Note that γ_0(x) is the sum of a polynomial and a single exponential term, so that the Taylor expansion always exists. When C = 0 there is no exponential term, and we say that γ is a polynomial extremal. Otherwise, when C > 0, we have an exponential extremal.
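As a quick consistency check on (2.8) and (2.9), the following short computation confirms that the path meets the terminal conditions γ_i(β) = ω_i for i = 0 and, when I ≥ 1, for i = 1. It is only a sketch using the formulas above, and it assumes that P_k(m) = e^{−m} m^k / k! denotes the Poisson weights, as the usage elsewhere in the paper suggests.

```latex
\begin{align*}
\gamma_0(\beta) &= C e^{-\rho\beta} + \bigl(\omega_0 - C P_0(\rho\beta)\bigr) = \omega_0,\\
\gamma_0^{(1)}(x) &= -C\rho e^{-\rho x}
   - \frac{1}{\beta}\sum_{k=1}^{I} k\,\bigl(\omega_k - C P_k(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^{k-1},\\
\gamma_1(\beta) &= -\beta\,\gamma_0^{(1)}(\beta)
   = C\rho\beta e^{-\rho\beta} + \bigl(\omega_1 - C P_1(\rho\beta)\bigr) = \omega_1 .
\end{align*}
```

At x = β every factor (1 − x/β)^k with k ≥ 1 vanishes, so only the k = 0 term survives in γ_0 and only the k = 1 term survives in its first derivative; the exponential contributions then cancel because P_0(ρβ) = e^{−ρβ} and P_1(ρβ) = ρβ e^{−ρβ}.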

2.2.2. General initial conditions. Occupancy problems with general initial conditions may be thought of as a coupled set of problems with empty initial conditions. In particular, we consider the set of urns initially containing k balls to form a class. The evolution of excess balls (beyond k) entering urns of this class may be denoted by occupancy functions of the form γ_{kj}(t), representing the fraction of urns initially containing k balls which contain k + j balls at time t. The fraction of urns containing i balls in the overall system is obtained by summing contributions from all subproblem components with k + j = i.

As in the empty case, the rate function 𝒥(ω) can be expressed as the solution to a minimization problem. Let S_∞ denote the set of distributions on the nonnegative integers, and let S_∞^n denote the set of n-tuples of such distributions. Recall that K is the set of indices k such that α_k > 0. We will denote an element of S_∞^{|K|} by π = {π^{(0)}, ..., π^{(k)}, ..., π^{(K)}}, where for any k ∈ K a component distribution is denoted π^{(k)} = {π_{k0}, π_{k1}, ...}, and it is understood that the corresponding π^{(k)} is omitted if α_k = 0. Finally, let F(α, ω, β) be the set of π ∈ S_∞^{|K|} which satisfy the terminal constraints

\[
\omega_i = \sum_{k \le i,\; k \in K} \alpha_k\, \pi_{k,\,i-k} \quad \text{for all } 0 \le i \le I, \tag{2.10}
\]

along with the conservation constraint

\[
\sum_{k \in K} \alpha_k \sum_{j=0}^{\infty} j\, \pi_{kj} = \beta. \tag{2.11}
\]

As in the empty case, it may be established that F is nonempty if and only if (α, ω, β) is feasible.

Theorem 2.7 (Terminal rate function, general case). The rate function 𝒥(ω): S_I → R_+ defined in Corollary 2.3 may be expressed as

\[
\mathcal{J}(\omega) = \min_{\pi \in F(\alpha,\omega,\beta)} \sum_{k \in K} \alpha_k\, D\bigl(\pi^{(k)}\,\|\,\mathcal{P}(\beta)\bigr)
\]

whenever (α, ω, β) is feasible, and is infinite otherwise. The minimizing argument π* ∈ F(α, ω, β) is unique.

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta)\, W_{k+j}, & k \in K,\ k + j \le I,\\
C_k P_j(\rho\beta), & k \in K,\ k + j > I.
\end{cases}
\]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta)\, W_{k+j}, & k \in K,\ j + k \le I,\\
0, & k + j > I.
\end{cases}
\]

As for empty initial conditions, ρ may be associated with the conservation condition. The W_i correspond to the terminal constraints ω_i, and the C_k, D_k are normalization constants. The constants C_k, ρ, W_i and D_k can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory γ(t) may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For k ∈ K, denote the mean of π*^{(k)} by β_k = Σ_{j=0}^∞ j π*_{kj}, and define the terminal condition ω^{(k)} ∈ S_{I−k} by ω_{kj} = π*_{kj} for j = 0, ..., I − k. Then the constraints (1, ω^{(k)}, β_k) are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote γ^{(k)} = {γ_{kj}(x)}. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints (α, ω, β), let π* ∈ S_∞^{|K|} be the unique minimizing distribution in Theorem 2.7, and let the functions γ_{kj}: [0, β_k] → [0, 1] be the minimizing paths corresponding to the subproblems (1, ω^{(k)}, β_k). The infimum of J(γ) over A(α, ω, β) is achieved on the occupancy path γ ∈ A(α, ω, β) defined by

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\, \gamma_{k,\,i-k}(x\beta_k/\beta), \qquad i = 0, \dots, I,\\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x).
\end{aligned}
\]

In addition, γ satisfies the Euler–Lagrange equations.

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For ζ ∈ R^{I+2} and γ ∈ S_I, define

\[
H(\gamma, \zeta) = \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr),
\]

where "·" denotes inner product. Since the support of b_i(γ) is bounded uniformly in i and γ, there exists a function h: R → R such that H(γ, ζ) ≤ h(|ζ|) for all ζ and γ. Also, since the distribution of b_i(γ) is weakly continuous in γ, H(γ, ζ) is jointly continuous. It follows from [10], Theorem 4.1, that the sequence {Γ^n, n = 1, 2, ...} satisfies a large deviation upper bound with a rate function J̃ which we now define. Let L be the Legendre–Fenchel transform of H(γ, ζ) in ζ:

\[
L(\gamma, \eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} \bigl[\zeta \cdot \eta - H(\gamma, \zeta)\bigr].
\]

If {γ(x), 0 ≤ x ≤ β} is an absolutely continuous function that takes values in S_I, then

\[
\tilde J(\gamma) = \int_0^\beta L(\gamma(x), \dot\gamma(x))\,dx.
\]

If γ is not absolutely continuous, then J̃(γ) = ∞. ([10] assumes that the vector fields b_i(γ) are defined for all γ ∈ R^{I+2}. It is easy to check that we can extend the definition to this set, with the bound H(γ, ζ) ≤ h(|ζ|) and the continuity of H(γ, ζ) preserved. However, if the process Γ^n starts in S_I then it stays in S_I, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix M given in (2.1). To complete the proof of the upper bound we must show that J̃ = J, where J is defined as in (2.2). This will hold if we can show that L(γ, η) is finite only when η = Mθ for a unique probability vector θ, and that in this case

\[
L(\gamma, M\theta) = D(\theta\,\|\,\gamma).
\]


It is well known that L(γ, η) is finite if and only if η is in the convex hull of the support of b_i(γ) (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if L(γ, η) < ∞, then η can be written as a convex combination of the form

\[
\theta_0 (e_1 - e_0) + \cdots + \theta_I (e_{I+1} - e_I),
\]

where θ_j ≥ 0 and Σ_{j=0}^{I} θ_j ≤ 1. Since the vectors (e_{j+1} − e_j), j = 0, 1, ..., I, are linearly independent, these values are unique. Setting θ_{I+1} = 1 − Σ_{j=0}^{I} θ_j, we have η = Mθ for a unique probability vector θ. Now assume that η takes this form. Then

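\[
\begin{aligned}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} \Bigl[\zeta \cdot M\theta - \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr)\Bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Biggl[ M^{T}\zeta \cdot \theta - \log\Biggl( \Biggl[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Biggr] + \gamma_{I+1} \Biggr) \Biggr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\,\theta_j - \log\Biggl( \Biggl[\sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j)\Biggr] + \gamma_{I+1} \Biggr) \Biggr].
\end{aligned}
\]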

Given any values μ_0, ..., μ_{I+1}, we can define ζ_0, ..., ζ_{I+1} recursively by ζ_0 = −μ_{I+1} and ζ_{j+1} − ζ_j = μ_j − μ_{I+1}. With these definitions it is apparent that the last display is equal to

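\[
\begin{aligned}
&\sup_{\mu \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I} \mu_j \theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\Biggl( \Biggl[\sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1})\Biggr] + \gamma_{I+1} \Biggr) \Biggr] \\
&\quad= \sup_{\mu \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I} \mu_j \theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\Biggl( \sum_{j=0}^{I+1} \gamma_j \exp \mu_j \Biggr) \Biggr] \\
&\quad= \sup_{\mu \in \mathbb{R}^{I+2}} \Biggl[ \sum_{j=0}^{I+1} \mu_j \theta_j - \log\Biggl( \sum_{j=0}^{I+1} \gamma_j \exp \mu_j \Biggr) \Biggr].
\end{aligned}
\]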

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals D(θ‖γ), thereby completing the proof of the upper bound.
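The variational formula used in this last step can be checked numerically. The sketch below is an illustration, not part of the proof: it evaluates the bracketed expression at the candidate maximizer μ_j = log(θ_j/γ_j) and at random points. The vectors `theta` and `gamma` are hypothetical strictly positive probability vectors chosen only for the example.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors, with 0 log 0 = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dual_objective(mu, theta, gamma):
    """sum_j mu_j theta_j - log sum_j gamma_j exp(mu_j)."""
    linear = sum(m * t for m, t in zip(mu, theta))
    return linear - math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))

# Hypothetical example with I + 2 = 4 strictly positive components.
theta = [0.1, 0.2, 0.3, 0.4]
gamma = [0.4, 0.3, 0.2, 0.1]

d = relative_entropy(theta, gamma)
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]  # candidate maximizer
best = dual_objective(mu_star, theta, gamma)

# Random trial points should never exceed the relative entropy.
rng = random.Random(0)
trials = [dual_objective([rng.uniform(-5, 5) for _ in theta], theta, gamma)
          for _ in range(1000)]
print(d, best, max(trials))
```

The value at `mu_star` coincides with D(θ‖γ) because the logarithmic term reduces to log Σ_j θ_j = 0 there.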

We turn now to the proof of the lower bound. In this proof we will assume that γ_0(0) > 0. Since γ_0 is always nonincreasing, γ_0(0) = 0 implies that γ_0(x) = 0 for all x, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when γ_0(0) = 0, where the role of γ_0(0) here is played by the first positive component of γ(0).

Let P_{Γ^n(0)} [resp. E_{Γ^n(0)}] denote probability (resp. expected value) given a deterministic initial occupancy Γ^n(0). To prove the large deviation lower bound, it suffices to show that given any ε > 0 and δ > 0, there is η > 0 such that for any initial occupancies satisfying |Γ^n(0) − γ(0)| < η,

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Bigl( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \Bigr) \ge -J(\gamma) - \varepsilon. \tag{3.1}
\]

Of course, this inequality is trivial if J(γ) = ∞, and so we assume that J(γ) < ∞.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when Γ^n is near the boundary of S_I. We first show that this can be avoided at all times save x = 0. To do this, we show that for any a > 0 there exist b > 0, K ∈ N and an occupancy function y such that y(0) = γ(0), sup_{0≤x≤β} |y(x) − γ(x)| < a, y_j(x) > b x^K for all j = 0, 1, ..., I, I+ and 0 < x ≤ β, and such that

\[
J(y) \le J(\gamma).
\]

Consider the zero cost trajectory defined by

\[
\dot z(x) = M z(x), \qquad z(0) = \gamma(0).
\]

We have the following expression for z_j when j ≤ I:

\[
z_j(x) = \Biggl[ \sum_{k=0}^{j} \gamma_k(0)\, \frac{x^{j-k}}{(j-k)!} \Biggr] e^{-x}. \tag{3.2}
\]

It is easy to check from this explicit formula that z_j(x) > b x^K for some b > 0, K = I and all j = 0, 1, ..., I, I+ and 0 < x ≤ β. For ρ ∈ (0, 1), let y^ρ = ρz + (1 − ρ)γ. Then y^ρ is the occupancy function that corresponds to the rate ρz + (1 − ρ)θ. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that D(z‖z) = 0, we have

\[
\begin{aligned}
J(y^\rho) &= \int_0^\beta D\bigl(\rho z(x) + (1-\rho)\theta(x)\,\|\,\rho z(x) + (1-\rho)\gamma(x)\bigr)\,dx \\
&\le \rho \int_0^\beta D(z(x)\,\|\,z(x))\,dx + (1-\rho) \int_0^\beta D(\theta(x)\,\|\,\gamma(x))\,dx \\
&= (1-\rho)\,J(\gamma).
\end{aligned}
\]

All required properties are then obtained by letting y = y^ρ for suitably small ρ ∈ (0, 1).
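The explicit zero cost trajectory (3.2) is easy to evaluate and to check against its defining equation. The sketch below assumes, reading M off the convex combination displayed in the upper-bound proof, that the componentwise form of the equation is z_0' = −z_0 and z_j' = z_{j−1} − z_j for 1 ≤ j ≤ I. The initial vector `gamma0` is a hypothetical example and the derivative is approximated by central differences.

```python
import math

def zero_cost(gamma0, x):
    """Evaluate z_0(x), ..., z_I(x) from (3.2) for initial fractions gamma0[0..I]."""
    return [math.exp(-x) * sum(gamma0[k] * x ** (j - k) / math.factorial(j - k)
                               for k in range(j + 1))
            for j in range(len(gamma0))]

def ode_residual(gamma0, x, h=1e-5):
    """Largest violation of z_0' = -z_0, z_j' = z_{j-1} - z_j (central differences)."""
    z = zero_cost(gamma0, x)
    z_plus = zero_cost(gamma0, x + h)
    z_minus = zero_cost(gamma0, x - h)
    worst = 0.0
    for j in range(len(z)):
        derivative = (z_plus[j] - z_minus[j]) / (2 * h)
        rhs = (z[j - 1] if j > 0 else 0.0) - z[j]
        worst = max(worst, abs(derivative - rhs))
    return worst

gamma0 = [0.5, 0.3, 0.2]   # hypothetical initial occupancies with I = 2
print(zero_cost(gamma0, 1.0))
print(ode_residual(gamma0, 1.0))
```

With empty initial conditions, `gamma0 = [1, 0, ..., 0]`, the formula reduces to the Poisson weights z_j(x) = e^{−x} x^j / j!.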

It follows that in proving the lower bound we can assume without loss of generality that, for some fixed constants b > 0 and K ∈ N, γ_i(x) ≥ b x^K for all x ∈ [0, β]. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of γ(τ) (for τ > 0 small) with sufficiently high probability. Given τ ∈ (0, β], σ > 0 and ε > 0, define

\[
h(\gamma) =
\begin{cases}
0, & |\gamma - \gamma(\tau)| < \sigma/2,\\
2\varepsilon, & \text{else}.
\end{cases}
\]

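For n large enough that |γ(⌊nτ⌋/n) − γ(τ)| ≤ σ/2 (and independent of τ ∈ (0, β]), we have the inequality

\[
\begin{aligned}
&P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\lfloor n\tau \rfloor / n)| < \sigma\bigr) + e^{-2n\varepsilon} \\
&\qquad\ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon} \\
&\qquad\ge E_{\Gamma^n(0)}\bigl(\exp\{-n h(\Gamma^n(\lfloor n\tau \rfloor / n))\}\bigr).
\end{aligned}
\]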

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process Γ̄^n(x) constructed as follows. The process dynamics are of the same general structure as those of Γ^n, save that b_i(Γ^n(i/n)) is replaced by a sequence b̄_i^n:

\[
\bar\Gamma^n\Bigl(\frac{i+1}{n}\Bigr) = \bar\Gamma^n\Bigl(\frac{i}{n}\Bigr) + \frac{1}{n}\,\bar b_i^n, \qquad \bar\Gamma^n(0) = \Gamma^n(0).
\]

Furthermore, the distribution of b̄_i^n is allowed to depend in any measurable way upon the set of values {Γ̄^n(j/n), 0 ≤ j ≤ i}. Let μ_γ denote the distribution of b_i(γ) and (without explicitly exhibiting all the dependencies) let μ̄_i^n denote the (random) distribution of b̄_i^n given {Γ̄^n(j/n), 0 ≤ j ≤ i}. We let Ē_{Γ^n(0)} denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl( \exp\Bigl\{-n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr)\Bigr\} \Bigr)
= \inf \bar E_{\Gamma^n(0)} \Biggl[ h\Bigl(\bar\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D\bigl(\bar\mu_i^n\,\|\,\mu_{\bar\Gamma^n(i/n)}\bigr) \Biggr],
\]

where the infimum is over all such processes Γ̄^n(x). In order to obtain a lower bound, we now simply insert a particular choice for the random variables b̄_i^n. We can write γ(τ) − γ(0) = Mvτ for some probability vector v. Define a process b̄_i^n as follows:

\[
\bar b_i^n =
\begin{cases}
e_j - e_{j-1} & \text{if } \Bigl\lfloor \sum_{k=0}^{j-1} v_k \tau n \Bigr\rfloor \le i \le \Bigl\lfloor \sum_{k=0}^{j} v_k \tau n \Bigr\rfloor - 1, \text{ for } 0 \le j \le I,\\[1ex]
0 & \text{if } \Bigl\lfloor \sum_{k=0}^{I} v_k \tau n \Bigr\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}
\]

In other words, b̄_i^n defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses e_0 − e_1 for an amount of time v_0τ, e_1 − e_2 for an amount of time v_1τ, and so on. This continuous time process will move the occupancy process from γ(0) to γ(τ) at time τ. If e_j − e_{j−1} is used at the discrete time step i, then since μ̄_i^n concentrates its mass on e_j − e_{j−1}, the cost is

\[
D\bigl(\bar\mu_i^n\,\|\,\mu_{\bar\Gamma^n(i/n)}\bigr) = \log\Biggl( \frac{1}{\bar\Gamma^n_{j-1}(i/n)} \Biggr) = -\log\Bigl( \bar\Gamma^n_{j-1}\Bigl(\frac{i}{n}\Bigr) \Bigr). \tag{3.3}
\]

The process Γ̄^n(i/n) possesses important monotonicity and convergence properties. Since Γ̄^n_{j−1}((i+1)/n) − Γ̄^n_{j−1}(i/n) = −1/n for ⌊Σ_{k=0}^{j−1} v_kτn⌋ ≤ i ≤ ⌊Σ_{k=0}^{j} v_kτn⌋ − 1,

\[
\bar\Gamma^n_{j-1}\Bigl(\frac{i}{n}\Bigr) \downarrow \bar\Gamma^n_{j-1}\Biggl( \frac{1}{n} \Bigl\lfloor \sum_{k=0}^{j} v_k \tau n \Bigr\rfloor \Biggr)
\]

as i ↑ ⌊Σ_{k=0}^{j} v_kτn⌋. In addition, because the (j − 1)st component is never modified when i ≥ ⌊Σ_{k=0}^{j} v_kτn⌋, it follows that

\[
\bar\Gamma^n_{j-1}\Biggl( \frac{1}{n} \Bigl\lfloor \sum_{k=0}^{j} v_k \tau n \Bigr\rfloor \Biggr) \to \gamma_{j-1}(\tau)
\]

as n → ∞ and η → 0. Furthermore, as observed previously, (3.2) implies the existence of b > 0 and K ∈ N such that γ_j(τ) ≥ bτ^K for j = 0, ..., I. Thus at any given time step i we have a strictly positive lower bound on the relevant component of Γ̄^n, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and γ_j(τ) ≥ bτ^K that for all sufficiently large n and small η > 0 there are C_1, C_2 < ∞ (and independent of τ) such that whenever |Γ^n(0) − γ(0)| < η, for all i,

\[
D\bigl(\bar\mu_i^n\,\|\,\mu_{\bar\Gamma^n(i/n)}\bigr) \le C_1\bigl[-\log \tau^K\bigr] \le -C_2 \log \tau.
\]

In addition, as n → ∞ and η → 0,

\[
\bar\Gamma^n\Bigl( \frac{1}{n} \lfloor \tau n \rfloor \Bigr) \to \gamma(\tau).
\]

By the Lebesgue dominated convergence theorem, for all sufficiently large n and sufficiently small η > 0, |Γ^n(0) − γ(0)| < η implies

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\Bigl( \exp\Bigl\{-n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr)\Bigr\} \Bigr) \le -C_2\, \tau \log \tau.
\]

We now choose τ > 0 so that −C_2 τ log τ ≤ ε/2. Choosing τ > 0 smaller if need be, we can also guarantee that |Γ^n(x) − γ(x)| ≤ δ for all x ∈ [0, τ] w.p.1 if |Γ^n(0) − γ(0)| < η and η > 0 is sufficiently small. The following bound is therefore valid for the given τ > 0: for any σ > 0 and all sufficiently small η > 0, |Γ^n(0) − γ(0)| < η implies

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\Biggl( \Bigl|\Gamma^n\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr) - \gamma\Bigl(\frac{\lfloor n\tau \rfloor}{n}\Bigr)\Bigr| < \sigma,\ \sup_{x \in [0,\tau]} |\Gamma^n(x) - \gamma(x)| < \delta \Biggr) \ge -\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of σ > 0. To obtain the lower bound for all x ∈ [0, β], we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition γ(τ). Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of σ > 0.

Now choose ζ ∈ (0, δ] such that γ(x) is at least distance 2ζ from the boundary of S_I for all x ∈ [τ, β]. Recall that when considered as a function of γ, the distribution of b_i(γ) is continuous in the weak topology, and moreover that the support of this distribution is independent of γ so long as γ_i > 0 for all i ∈ {0, 1, ..., I + 1} (i.e., γ ∈ S_I°, where S_I° denotes interior relative to the smallest affine space that contains S_I). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that {Γ^n, n = 1, 2, ...} satisfies the following uniform large deviations lower bound: given any ε > 0 and ζ > 0 defined above, there is σ > 0 such that as long as |Γ^n(⌊nτ⌋/n) − γ(⌊nτ⌋/n)| < σ,

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau \rfloor / n),\, \lfloor n\tau \rfloor / n}\Bigl( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \Bigr) \ge -J(\gamma) - \frac{\varepsilon}{2},
\]

where P_{Γ^n(⌊nτ⌋/n), ⌊nτ⌋/n} denotes probability given the occupancy levels Γ^n(⌊nτ⌋/n) at time ⌊nτ⌋/n. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice β = γ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that J has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.


In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint Γ^n(β) should lie in a terminal constraint set Ω. To apply the LDP for Γ^n(β) established by Corollary 2.3, one must compute exponents of the form

\[
\mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega).
\]

Using Theorem 2.7, we can write

\[
\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega \in \Omega}\ \inf_{\pi \in F(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k\, D\bigl(\pi^{(k)}\,\|\,\mathcal{P}(\beta)\bigr) \\
&= \inf_{\pi \in F(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k\, D\bigl(\pi^{(k)}\,\|\,\mathcal{P}(\beta)\bigr),
\end{aligned} \tag{4.1}
\]

where we have abused notation to define F(α, Ω, β) = ⋃_{ω∈Ω} F(α, ω, β).

In many cases, for example when the terminal set Ω is convex and defined by linear constraints, the exponent 𝒥 can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining Ω. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case Ω = {ω}. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex Ω, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing 𝒥(Ω) could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set Ω with interior Ω° and closure Ω̄ is a 𝒥-continuity set if and only if inf_{ω∈Ω°} 𝒥(ω) = inf_{ω∈Ω̄} 𝒥(ω). For such sets the large deviations lower and upper bounds coincide, so that −lim (1/n) log P(Γ^n(β) ∈ Ω) = inf_{ω∈Ω} 𝒥(ω). It may readily be verified in each of the following examples that the event of interest is a 𝒥-continuity set.


4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, I = 0. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, Γ^n_0(β) > ω_0 > e^{−β}, or an unusually small number of empty urns, Γ^n_0(β) < ω_0 < e^{−β}. In either case, the calculus of variations problem is that in which γ_0(β) = ω_0. From (2.6) it is immediate that C = 1/ρ and

\[
\gamma_0(x) = \frac{1}{\rho}\, e^{-\rho x} + 1 - \frac{1}{\rho}.
\]

Using (2.5), we find that ρ is determined by the unique positive solution to

\[
\rho\,(1 - \omega_0) = 1 - e^{-\beta\rho}.
\]

Finally, (2.7) provides a simple expression for 𝒥(ω) in terms of ω_0, β, C and ρ.

Fig. 1. Cumulative urn occupancies ψ_0 through ψ_5 for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely γ_i(x) = P_i(ρx)/ρ for i > 0. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, x. In this example the terminal fraction of empty urns, ω_0 = 0.15, is three times larger than the expected value P_0(3.0) = 0.05.
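The computation for this example can be sketched in a few lines. The code below is a hedged illustration rather than the authors' procedure: it finds the positive root of ρ(1 − ω_0) = 1 − e^{−βρ} by bisection (a solver chosen here for simplicity) and then evaluates the paths γ_0(x) and γ_i(x) = P_i(ρx)/ρ. It assumes β > 1 − ω_0 and that P_i(m) is the Poisson weight e^{−m} m^i / i!. The function names are hypothetical.

```python
import math

def solve_rho(omega0, beta, iters=200):
    """Positive root of rho * (1 - omega0) = 1 - exp(-beta * rho), by bisection.

    Assumes beta > 1 - omega0, so that f below dips negative before rising.
    """
    def f(rho):
        return rho * (1.0 - omega0) - (1.0 - math.exp(-beta * rho))
    lo = math.log(beta / (1.0 - omega0)) / beta   # minimizer of f, where f < 0
    hi = 1.0 / (1.0 - omega0) + 1.0               # f(hi) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_path(i, x, rho):
    """Occupancy path: gamma_0 from Section 4.1, gamma_i = P_i(rho x) / rho for i > 0."""
    if i == 0:
        return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho
    m = rho * x
    return math.exp(-m) * m ** i / math.factorial(i) / rho

omega0, beta = 0.15, 3.0          # values quoted for Figure 1
rho = solve_rho(omega0, beta)
print(rho, gamma_path(0, beta, rho))
```

Since ω_0 exceeds e^{−β} here, the root satisfies ρ > 1, and the components γ_i(x) sum to one over all i ≥ 0 at every x.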

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity I > 0. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions α_0 = 1.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than I, minus I times the number of such urns. When r balls have been thrown, the random number of overflowing balls w(r) is thus

\[
w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - I n \Gamma_I(r/n) - I n \Gamma_{I+}(r/n),
\]

or, since Γ_{I+} = 1 − Σ_{i=0}^{I} Γ_i,

\[
w(r) = r - nI + n \sum_{i=0}^{I} (I - i)\,\Gamma_i(r/n).
\]
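The identity for w(r) can be illustrated by a small simulation. The sketch below throws r balls into n urns, counts the balls a finite-capacity system would reject, and compares this with the right-hand side computed from the infinite-capacity occupancies. The function name and the parameter values are hypothetical choices for the example.

```python
import random

def overflow_two_ways(n, r, capacity, seed=0):
    """Return (balls on the floor, value of r - n*I + n * sum (I - i) * Gamma_i)."""
    rng = random.Random(seed)
    counts = [0] * n      # occupancies of the infinite-capacity urns
    on_floor = 0          # balls a capacity-I system would reject
    for _ in range(r):
        urn = rng.randrange(n)
        if counts[urn] >= capacity:
            on_floor += 1
        counts[urn] += 1
    # n * Gamma_i(r/n) is the number of urns holding exactly i balls.
    urns_with = [sum(1 for c in counts if c == i) for i in range(capacity + 1)]
    formula = r - n * capacity + sum((capacity - i) * urns_with[i]
                                     for i in range(capacity + 1))
    return on_floor, formula

direct, from_formula = overflow_two_ways(n=1000, r=3000, capacity=2)
print(direct, from_formula)
```

The two counts agree for every realization, since the overflow from an urn holding c balls is max(c − I, 0) = c − I + max(I − c, 0), and summing over urns gives the displayed formula.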

In order to compute

\[
J_O(\eta, \beta) \doteq -\lim_{n} \frac{1}{n} \log P\Bigl( \frac{w(\beta n)}{n} > \eta \Bigr),
\]

we therefore consider sample paths which satisfy the end constraint

\[
\sum_{i=0}^{I} (I - i)\,\gamma_i(\beta) \ge \eta + I - \beta \doteq \zeta. \tag{4.2}
\]

Note that ζ can be interpreted as the average spare capacity per urn, which must satisfy the bounds [I − β]^+ ≤ ζ < I, and that the average overflow satisfies [β − I]^+ ≤ η < β. Assuming that η (and ζ) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that η > 0, we are in the exponential case, and the large deviations exponent J_O(η, β) will be given by minimizing the divergence between π and 𝒫(β) under the linear equality constraints

\[
\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\,\pi_i = \beta \qquad \text{and} \qquad \sum_{i=0}^{I} (I - i)\,\pi_i = \zeta. \tag{4.3}
\]


We introduce the Lagrangian

\[
L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log \frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\,\pi_i\Bigr) + \lambda\Bigl(\zeta - \sum_{i=0}^{I} (I - i)\,\pi_i\Bigr)
\]

with Lagrange multipliers y, z and λ. On differentiating, it follows that the minimizing distribution π* should satisfy the conditions

\[
\log \pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+ \lambda.
\]

In terms of variables C, ρ and ν, we may write

\[
\pi^*_i = C P_i(\rho\beta) \quad \text{for } i \ge I
\]

and

\[
\pi^*_i = C P_i(\rho\beta)\, \nu^{I-i} \quad \text{for } i \le I.
\]
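The passage from the multipliers (y, z, λ) to the variables (C, ρ, ν) can be spelled out as follows. This is a short derivation from the stationarity condition above, assuming P_i(m) = e^{−m} m^i / i! and the substitutions ρ = e^z and ν = e^λ.

```latex
\begin{align*}
\pi^*_i &= P_i(\beta)\, e^{y-1}\, e^{iz}\, e^{\lambda (I-i)^+}
         = e^{y-1}\, e^{-\beta}\, \frac{(\beta e^{z})^{i}}{i!}\, \bigl(e^{\lambda}\bigr)^{(I-i)^+} \\
        &= e^{\,y-1+(\rho-1)\beta}\; P_i(\rho\beta)\; \nu^{(I-i)^+},
        \qquad \rho = e^{z}, \quad \nu = e^{\lambda}.
\end{align*}
```

Hence C = e^{y−1+(ρ−1)β}, and the factor ν^{(I−i)^+} equals ν^{I−i} for i ≤ I and 1 for i ≥ I, which gives the two displayed forms.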

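The distribution is conditionally Poisson ρβ/ν for i ≤ I, and conditionally Poisson ρβ for i ≥ I. For convenience we introduce the notation

\[
\begin{aligned}
Q_I(\rho) &\doteq \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} i\,P_i(\rho\beta),\\
R_I(\rho, \nu) &\doteq \nu^{I} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\Bigl(\frac{\rho\beta}{\nu}\Bigr) = \frac{\nu^{I+1}}{\rho\beta}\, e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} i\,P_i\Bigl(\frac{\rho\beta}{\nu}\Bigr).
\end{aligned}
\]

Since π* must satisfy the three linear constraints (4.3), the constants C, ρ and ν must solve the equations

\[
\begin{aligned}
C\,\bigl(R_I(\rho, \nu) + Q_I(\rho)\bigr) &= 1,\\
C\,\Bigl(\frac{\rho\beta}{\nu}\, R_I(\rho, \nu) + \rho\beta\, Q_I(\rho)\Bigr) &= \beta,\\
C\,\Bigl(I P_I(\rho\beta) + \Bigl(I - \frac{\rho\beta}{\nu}\Bigr) R_I(\rho, \nu)\Bigr) &= \zeta.
\end{aligned}
\]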

There can be at most one positive triple (C, ρ, ν) satisfying these equations, since each such triple identifies a local minimum of D(π‖𝒫(β)) for π in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain C, ρ and ν. For example, from the first constraint C can be expressed as C = (R_I + Q_I)^{−1}. Substituting this expression into the second and third constraints, we obtain the equations

\[
\begin{aligned}
(\rho/\nu - 1)\,R_I(\rho, \nu) + (\rho - 1)\,Q_I(\rho) &= 0,\\
(I - \rho\beta/\nu - \zeta)\,R_I(\rho, \nu) - \zeta\,Q_I(\rho) + I P_I(\rho\beta) &= 0.
\end{aligned}
\]

Each equation implicitly defines a curve ν as a function of ρ, and the intersection of the two curves gives the desired (ρ, ν). We note that larger than expected values of ζ will lead to ν > ρ > 1, while smaller than expected values of ζ give ν < ρ < 1.
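One possible way to carry out this computation is sketched below. It does not intersect the two curves directly; instead it uses a nested bisection on log ν and log ρ, an alternative chosen here because both the mean and the spare capacity are monotone in these variables. The truncation at `n_terms`, the bracketing intervals, and the example values β = 1, I = 2, ζ = 1.3 are all assumptions made for illustration. The closed-form exponent log C + β(1 − ρ) + β log ρ + ζ log ν is compared with a direct evaluation of the relative entropy.

```python
import math

def tilted(beta, I, log_rho, log_nu, n_terms=120):
    """pi_i proportional to P_i(rho*beta) * nu**max(I - i, 0), for i < n_terms."""
    log_m = log_rho + math.log(beta)
    logs = [i * log_m - math.lgamma(i + 1) + max(I - i, 0) * log_nu
            for i in range(n_terms)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

def bisect_increasing(f, lo, hi, iters=60):
    """Root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_overflow(beta, I, zeta):
    """Find (rho, nu, pi) matching the mean and spare-capacity constraints (4.3)."""
    def log_rho_for(log_nu):
        def mean_gap(log_rho):
            pi = tilted(beta, I, log_rho, log_nu)
            return sum(i * p for i, p in enumerate(pi)) - beta
        return bisect_increasing(mean_gap, -60.0, math.log(30.0 / beta))

    def spare_gap(log_nu):
        pi = tilted(beta, I, log_rho_for(log_nu), log_nu)
        return sum((I - i) * p for i, p in enumerate(pi[:I + 1])) - zeta

    log_nu = bisect_increasing(spare_gap, -10.0, 10.0)
    log_rho = log_rho_for(log_nu)
    return math.exp(log_rho), math.exp(log_nu), tilted(beta, I, log_rho, log_nu)

def poisson(i, m):
    return math.exp(-m + i * math.log(m) - math.lgamma(i + 1))

beta, I, zeta = 1.0, 2, 1.3    # hypothetical example; zero cost spare capacity is about 1.10
rho, nu, pi = solve_overflow(beta, I, zeta)
C = pi[I] / poisson(I, rho * beta)
J_closed = math.log(C) + beta * (1 - rho) + beta * math.log(rho) + zeta * math.log(nu)
J_direct = sum(p * math.log(p / poisson(i, beta)) for i, p in enumerate(pi) if p > 0)
print(rho, nu, J_closed, J_direct)
```

Because ζ is larger than its zero cost value in this example, the solution should satisfy ν > ρ > 1, in line with the remark above.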

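The large deviations exponent for the overflow problem may then be expressed as

\[
\begin{aligned}
J_O(\eta, \beta) &= D(\pi^*\,\|\,\mathcal{P}(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl(C e^{\beta - \rho\beta} \rho^{i} \nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1 - \rho) + \beta \log \rho + \zeta \log \nu.
\end{aligned}
\]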

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the n types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before I + 1 complete sets are obtained. This event corresponds to the constraint ω_i = 0, i ≤ I.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect βn additional coupons, with the goal of obtaining more than I coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected I or fewer coupons is less than ξn.

In terms of the urn problems we have considered, we are given initial occupancies α and wish to compute

\[
J_C(\alpha, \beta, \xi) \doteq -\lim_{n} \frac{1}{n} \log P\Biggl( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \Biggr),
\]

where

\[
\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta)
\]

is an unusually small number of low occupancy urns. The exponent J_C will be given by computing 𝒥(ω) as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi,
\]

where K ≤ I, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega_{kj} = \pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta)\, W, & j + k \le I,\\
C_k P_j(\rho\beta), & j + k > I.
\end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[
J_C(\alpha, \beta, \xi) = \beta(1 - \rho + \log \rho) + \xi \log W + \sum_{k=0}^{K} \alpha_k \log C_k.
\]

Fig. 2. Cumulative urn occupancies ψ_0 through ψ_5 for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves ψ_i for a particular example. Suppose that there are n = 100 types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., I = 3). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions α = [0.5, 0.3, 0.2]. In the zero-cost solution (dashed lines), ψ_3(2.0) ≈ 0.71. Hence, after collecting 200 additional coupons at random (β = 2), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take ξ = 0.55, which gives a large deviations exponent J_C ≈ 0.18 and a probability of about 10^{−8}. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
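The zero-cost figures quoted in this example can be reproduced with a short calculation. The sketch below assumes that the zero-cost value of ψ_3(β) is Σ_k α_k Σ_{i≤I−k} P_i(β), as in the bound on ξ above, and it takes the exponent J_C ≈ 0.18 from the text rather than computing it. The estimate exp(−n J_C) is only the leading-order large deviations approximation.

```python
import math

def poisson_cdf(m, mean):
    """P(Poisson(mean) <= m); the empty sum gives 0 when m is negative."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i)
               for i in range(m + 1))

alpha = [0.5, 0.3, 0.2]    # fractions of types starting with 0, 1 and 2 coupons
beta, I, n = 2.0, 3, 100

# Zero-cost fraction of types still holding I or fewer coupons after beta*n draws.
psi_I = sum(a * poisson_cdf(I - k, beta) for k, a in enumerate(alpha))
expected_types = n * (1.0 - psi_I)     # types with at least I + 1 coupons

# Leading-order estimate exp(-n * J_C), with J_C as quoted in the text.
J_C = 0.18
probability = math.exp(-n * J_C)
print(psi_I, expected_types, probability)
```

The output gives ψ_3(2.0) close to 0.71, roughly 29 types with four or more coupons, and a probability on the order of 10^{−8}.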

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

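A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)}
\tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad \text{(conservation)}.
\tag{A.5}
\]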
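The two conditions are easy to test numerically. The following is a minimal sketch of such a check; the function name, the argument layout (`alpha` holding $\alpha_0, \ldots, \alpha_{I+1}$, `omega` holding $\omega_0, \ldots, \omega_I$, and `omega_plus` standing for $\omega_{I+}$) and the tolerance are assumptions made for the illustration.

```python
def is_feasible(alpha, omega, omega_plus, beta, tol=1e-12):
    """Check the monotonicity (A.4) and conservation (A.5) conditions.

    alpha      -- initial fractions alpha_0, ..., alpha_{I+1}
    omega      -- terminal fractions omega_0, ..., omega_I
    omega_plus -- terminal fraction omega_{I+} of urns with more than I balls
    beta       -- number of balls thrown per urn
    """
    I = len(omega) - 1
    if len(alpha) != I + 2:
        raise ValueError("alpha must have I + 2 entries")

    # (A.4): partial sums of alpha dominate partial sums of omega.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # (A.5): the terminal state cannot require more balls than are available.
    lhs = sum(i * w for i, w in enumerate(omega)) + (I + 1) * omega_plus
    rhs = sum(i * a for i, a in enumerate(alpha)) + beta
    return lhs <= rhs + tol

# Empty initial condition with I = 1.
print(is_feasible([1.0, 0.0, 0.0], [0.2, 0.3], 0.5, beta=2.0))
print(is_feasible([1.0, 0.0, 0.0], [0.2, 0.3], 0.5, beta=1.0))
```

In the second call the terminal state needs at least 1.3 balls per urn, so one ball per urn cannot be enough.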

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4) and property (c) implies (A.5), since

\[
\sum_{i=0}^{I} \Biggl( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \Biggr)
= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j)
= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx.
\]

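Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi, \theta) &= D(\theta \,\|\, \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}
 + \Biggl(1 - \sum_{i=0}^{I} \theta_i\Biggr) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{aligned}
\]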
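As a concrete reading of this formula, the local rate is the relative entropy between the actual entry rates $\theta$ and the expected rates $\gamma$, each completed by its $I+$ component. The sketch below is an illustration only; the function name, the tolerance and the treatment of infeasible inputs (returning infinity) are assumptions.

```python
import math

def local_rate(theta, gamma, tol=1e-15):
    """Relative entropy D(theta || gamma) used as the local rate function.

    theta, gamma -- entries for occupancies 0, ..., I; the I+ entries are
                    implied as one minus the sum of the listed entries.
    """
    thetas = list(theta) + [1.0 - sum(theta)]
    gammas = list(gamma) + [1.0 - sum(gamma)]
    total = 0.0
    for t, g in zip(thetas, gammas):
        if t < -tol:
            return math.inf      # not a valid rate vector
        if t <= tol:
            continue             # convention 0 log 0 = 0
        if g <= 0.0:
            return math.inf      # mass entering a class with no urns
        total += t * math.log(t / g)
    return total

# No deviation from the expected rates costs nothing.
print(local_rate([0.3, 0.2], [0.3, 0.2]))
# Sending extra balls into empty urns has a positive cost.
print(local_rate([0.5, 0.2], [0.3, 0.2]))
```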


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[
\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\,\frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))
\]

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

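In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\Biggl[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Biggr]
\tag{A.7}
\]

for $i = 0, \ldots, I - 1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I}
= \frac{d}{dx}\Biggl[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Biggr].
\tag{A.8}
\]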

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[
L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}.
\tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\Biggl[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \Biggr]
\tag{A.10}
\]

for $i = 0, \ldots, I - 1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1,
\qquad
\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} i C P_i(\rho\beta) = \beta.
\]
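One way to solve this pair of equations numerically is to eliminate $C$: dividing the second equation's tail by the first's shows that the mean of a Poisson($\rho\beta$) variable conditioned to exceed $I$ must equal $(\beta - \sum_i i\omega_i)/(1 - \sum_i \omega_i)$, and this conditional mean increases with $\rho\beta$. The sketch below uses bisection on that one-dimensional equation. It is not the authors' procedure; the function names, the series truncation `n_terms` and the bracketing strategy are all assumptions, and it covers only the exponential case $C > 0$ with moderate $I$ and $\rho\beta$.

```python
import math

def poisson_pmf(i, lam):
    """P_i(lam), evaluated in log space so that large i does not overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def tail_sums(I, lam, n_terms=400):
    """Truncated sums of P_i(lam) and i * P_i(lam) over i > I."""
    s0 = s1 = 0.0
    for i in range(I + 1, I + 1 + n_terms):
        p = poisson_pmf(i, lam)
        s0 += p
        s1 += i * p
    return s0, s1

def twist_parameters(omega, beta):
    """Return (C, rho) for terminal fractions omega_0..omega_I and time beta."""
    I = len(omega) - 1
    mass = 1.0 - sum(omega)                                   # mass left for i > I
    mean_left = beta - sum(i * w for i, w in enumerate(omega))
    if mass <= 0.0:
        raise ValueError("polynomial case: C = 0")
    target = mean_left / mass                                 # required tail mean
    if target <= I + 1:
        raise ValueError("tail mean must exceed I + 1")

    def tail_mean(lam):
        s0, s1 = tail_sums(I, lam)
        return s1 / s0

    lo, hi = 1e-9, 1.0
    while tail_mean(hi) < target:                             # bracket the root
        hi *= 2.0
    for _ in range(200):                                      # bisection on lam = rho * beta
        mid = 0.5 * (lo + hi)
        if tail_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return mass / tail_sums(I, lam)[0], lam / beta

# Sanity check: Poisson terminal fractions correspond to the zero-cost path.
beta = 2.0
omega = [poisson_pmf(i, beta) for i in range(4)]
C, rho = twist_parameters(omega, beta)
print(C, rho)
```

For terminal fractions that are exactly Poisson($\beta$), the solver should return $C$ and $\rho$ both close to 1, the untwisted case.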

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints, and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C P_k(\rho\beta)) \Bigl(1 - \frac{x}{\beta}\Bigr)^{k},
\tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I,
\tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals on $[0, \beta]$ which satisfy the terminal constraints along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th-order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^{k}.
\tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.
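The extremal of Theorem A.1 can be evaluated directly by differentiating $\psi_0$ term by term, which gives $(-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k \ge i} (\omega_k - C P_k(\rho\beta))\,\beta^{-i}\,\frac{k!}{(k-i)!}\,(1 - x/\beta)^{k-i}$. The sketch below implements this; the function names are hypothetical and the parameter values in the usage lines are arbitrary illustrations, not values taken from the paper.

```python
import math

def poisson_pmf(k, lam):
    """P_k(lam) = exp(-lam) * lam**k / k! (small k only)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def signed_derivative(i, x, omega, beta, C, rho):
    """(-1)^i times the i-th derivative of psi_0 at x, for psi_0 of the form (A.8)."""
    I = len(omega) - 1
    value = rho ** i * C * math.exp(-rho * x)          # exponential part
    for k in range(i, I + 1):                          # polynomial part, k >= i
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)
        value += coeff * falling * beta ** (-i) * (1.0 - x / beta) ** (k - i)
    return value

def gamma_i(i, x, omega, beta, C, rho):
    """Occupancy gamma_i(x) = x^i / i! * (-1)^i psi_0^{(i)}(x), as in (A.9)."""
    return x ** i / math.factorial(i) * signed_derivative(i, x, omega, beta, C, rho)

# The terminal condition gamma_i(beta) = omega_i holds for any C and rho.
omega, beta, C, rho = [0.2, 0.3, 0.1], 1.5, 0.7, 1.3
print([gamma_i(i, beta, omega, beta, C, rho) for i in range(3)])

# With C = rho = 1 and Poisson terminal fractions, psi_0(x) = exp(-x)
# and gamma_i(x) reduces to the Poisson probability P_i(x).
poisson_omega = [poisson_pmf(i, 2.0) for i in range(4)]
print(gamma_i(2, 1.0, poisson_omega, 2.0, 1.0, 1.0), poisson_pmf(2, 1.0))
```

The initial condition $\psi_0(0) = 1$ holds only when $C$ and $\rho$ are the twist parameters matching $(1, \omega, \beta)$; the arbitrary values in the first usage line do not satisfy it and serve only to exhibit the terminal condition.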

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[
(-1)^i \varphi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a, b] \text{ and } i > 0.
\tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
+ \sum_{k=i}^{I} (\omega_k - C P_k(\rho\beta))\, \beta^{-i}\, \frac{k!}{(k-i)!} \Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
\tag{A.12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) \doteq (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\begin{aligned}
\varphi(\beta) = (-1)^I \psi_0^{(I)}(\beta)
&= \rho^I C e^{-\rho\beta} + \Bigl( \omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!} \Bigr) \beta^{-I} I! \\
&= \Bigl( \frac{\beta^I}{I!} \Bigr)^{-1} \omega_I \ge 0,
\end{aligned}
\]

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$, and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case, $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \Bigl( \frac{\beta^I}{I!} \Bigr)^{-1} \omega_I > 0 \quad \text{for all } x \in [0, \beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C \Biggl( 1 - \sum_{i=0}^{I} P_i(\rho\beta) \Biggr) + \sum_{i=0}^{I} \omega_i = 1.
\]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\gamma_i(x) \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) = -\dot\psi_i(x)
&= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case, $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

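From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0.
\tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[ -\log\frac{\theta_i}{\gamma_i} \Bigr]
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta),
\tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case, $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).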

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

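Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

\[
(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \bigl[ \gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)| \bigr].
\tag{A.17}
\]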

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then, for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
+ (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]

Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

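Using our convention that $\dot\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
+ \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} (\dot\psi_i(x) - \dot\psi_{i-1}(x))\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]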

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty} \bigl[ \gamma_i \log|\psi_0^{(i)}| \bigr]_{\varepsilon}^{\beta - \varepsilon}
+ \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I - 1,
\]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[ \gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)| \bigr].
\tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero-cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, P(\beta)).
\]

Proof. We first solve the minimization problem, and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $\mathcal{L}$ on this set to be

\[
\mathcal{L}(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)}
+ y\Biggl( 1 - \sum_{i=0}^{\infty} \pi_i \Biggr)
+ z\Biggl( \beta - \sum_{i=0}^{\infty} i\pi_i \Biggr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi_i^* \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi_i^*$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi_i^*$ is the unique global minimizer of $x \to x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[
\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi_i^* \log\frac{\pi_i^*}{P_i(\beta)} - y\pi_i^* - zi\pi_i^*.
\]

Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, P(\beta))
&= \inf_{\pi \in F(1, \omega, \beta)} \mathcal{L}(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} \mathcal{L}(\pi, y, z) \\
&= D(\pi^* \,\|\, P(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

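Substituting the particular form of $\pi_i^*$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \,\|\, P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]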

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)}
+ \Biggl( 1 - \sum_{i=0}^{I} \omega_i \Biggr)(\log C + (1 - \rho)\beta)
+ \Biggl( \beta - \sum_{i=0}^{I} i\omega_i \Biggr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
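The explicit expression can be cross-checked against a direct evaluation of the relative entropy $D(\pi^* \,\|\, P(\beta))$, using $\log\bigl(CP_i(\rho\beta)/P_i(\beta)\bigr) = \log C + i\log\rho + (1-\rho)\beta$ for $i > I$. The sketch below does this for a small hypothetical instance with $I = 1$: the values of $\beta$, $\rho$ and $C$ are arbitrary, and $\omega_0$, $\omega_1$ are then chosen so that the two twist-parameter equations hold. Function names and the truncation `n_terms` are assumptions.

```python
import math

def log_poisson_pmf(i, lam):
    """log P_i(lam), safe for large i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def head_cost(omega, beta):
    """Sum over i <= I of omega_i * log(omega_i / P_i(beta))."""
    return sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0.0)

def extremal_cost(omega, beta, C, rho):
    """Explicit cost of an exponential extremal in terms of omega, beta, C, rho."""
    mass = 1.0 - sum(omega)
    mean_left = beta - sum(i * w for i, w in enumerate(omega))
    return (head_cost(omega, beta)
            + mass * (math.log(C) + (1.0 - rho) * beta)
            + mean_left * math.log(rho))

def relative_entropy_direct(omega, beta, C, rho, n_terms=100):
    """D(pi* || P(beta)) with pi*_i = omega_i (i <= I), C * P_i(rho*beta) (i > I)."""
    I = len(omega) - 1
    total = head_cost(omega, beta)
    for i in range(I + 1, I + 1 + n_terms):
        log_p = math.log(C) + log_poisson_pmf(i, rho * beta)
        total += math.exp(log_p) * (log_p - log_poisson_pmf(i, beta))
    return total

# Hypothetical instance with I = 1; omega is derived from the constraints.
beta, rho, C = 2.0, 1.2, 0.8
lam = rho * beta
tail0 = 1.0 - math.exp(-lam) * (1.0 + lam)     # sum of P_i(lam) over i > 1
tail1 = lam - lam * math.exp(-lam)             # sum of i * P_i(lam) over i > 1
w1 = beta - C * tail1
w0 = 1.0 - C * tail0 - w1
omega = [w0, w1]

print(extremal_cost(omega, beta, C, rho))
print(relative_entropy_direct(omega, beta, C, rho))
```

The two printed values should agree up to truncation and rounding error.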

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

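The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)),
\tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{k,j} = \beta
\tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I.
\tag{A.22}
\]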

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{k,j} = 0, \qquad k + j > I,
\tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
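The ordered filling construction can be sketched as a greedy allocation. The text does not specify how mass is split between classes at each level, so the version below makes one particular choice (lowest class first); the function name, the tolerance and the example values of $\alpha$ and $\omega$ are all assumptions made for the illustration. It assumes polynomial constraints, that is, $\sum_k \alpha_k = \sum_i \omega_i = 1$ with $K \le I$.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints.

    alpha -- initial fractions alpha_0..alpha_K (urns starting with k balls)
    omega -- terminal fractions omega_0..omega_I, summing to 1
    Returns pi with pi[k][j] = fraction of class-k urns receiving j more balls.
    """
    I = len(omega) - 1
    remaining = list(alpha)                       # unassigned mass per class
    flow = [[0.0] * (I - k + 1) for k in range(len(alpha))]
    for i, need in enumerate(omega):
        # Level i can only be reached from classes k <= i.
        for k in range(min(i, len(alpha) - 1) + 1):
            take = min(remaining[k], need)
            flow[k][i - k] += take
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition fails at level %d" % i)
    # Normalize each class to a distribution; classes with alpha_k = 0 stay zero.
    return [[f / a if a > 0.0 else 0.0 for f in row]
            for row, a in zip(flow, alpha)]

alpha = [0.5, 0.3, 0.2]
omega = [0.1, 0.2, 0.3, 0.4]
pi = ordered_filling(alpha, omega)

# Balls used per urn; conservation with equality forces this to equal beta.
beta = sum(a * sum(j * p for j, p in enumerate(row)) for a, row in zip(alpha, pi))
print(pi)
print(beta)
```

In this instance $\sum_i i\omega_i - \sum_k k\alpha_k = 2.0 - 0.7 = 1.3$, so the constructed point uses exactly $\beta = 1.3$ balls per urn, as the conservation equality requires.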

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \Biggl\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \le A \Biggr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha, \omega, \beta) \doteq F(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\pi^{(n)}_{k,j} \le \varepsilon
\tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{(1/\delta)j} P_j(\beta) = \sum_{j=0}^{\infty} e^{(1/\delta)j} e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
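Both facts are easy to confirm numerically. The sketch below (function names and the truncation `n_terms` are assumptions) compares a truncated series for the exponential moment with the closed form, and evaluates the slack in the inequality, which vanishes exactly at $x = e^y$.

```python
import math

def poisson_exp_moment(beta, delta, n_terms=400):
    """Truncated series for the sum over j of exp(j / delta) * P_j(beta)."""
    total = 0.0
    for j in range(n_terms):
        log_term = j / delta - beta + j * math.log(beta) - math.lgamma(j + 1)
        total += math.exp(log_term)
    return total

def closed_form(beta, delta):
    """exp(beta * e^{1/delta} - beta)."""
    return math.exp(beta * math.exp(1.0 / delta) - beta)

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, which should be nonnegative."""
    x_log_x = x * math.log(x) if x > 0.0 else 0.0   # convention 0 log 0 = 0
    return (x_log_x - x + 1.0) + (math.exp(y) - 1.0) - x * y

print(poisson_exp_moment(2.0, 1.0), closed_form(2.0, 1.0))
print(young_gap(0.0, 1.0), young_gap(math.e, 1.0), young_gap(5.0, -2.0))
```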


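Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\pi^{(n)}_{k,j}
&= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j \ge 0} \delta\Biggl[ \pi^{(n)}_{k,j}\log\Biggl( \frac{\pi^{(n)}_{k,j}}{P_j(\beta)} \Biggr) - \pi^{(n)}_{k,j} + P_j(\beta) \Biggr]
+ \delta\sum_{j \ge m} (e^{(1/\delta)j} - 1)P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) + \delta\sum_{j \ge m} (e^{(1/\delta)j} - 1)P_j(\beta) \\
&\le \delta A + \delta\sum_{j \ge m} (e^{(1/\delta)j} - 1)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]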

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta_k^{(n)} \to \beta_k.
\]

Finally,

\[
\sum_{k} \alpha_k\beta_k = \lim_{n} \sum_{k} \alpha_k\beta_k^{(n)} = \lim_{n} \beta = \beta,
\]

so that $\pi \in F(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*(k)} \,\|\, P(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)
\]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta_k^{(n)} \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{k,j} \to \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{k,j}, \qquad k = 0, 1, \ldots, K.
\]

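If $\alpha_k^{(n)} \to 0$ and $\beta_k^{(n)}\alpha_k^{(n)} \not\to 0$, then the term $\alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha_k^{(n)}\beta_k^{(n)} \to \alpha_k\beta_k$ even if $\alpha_k = 0$.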

Thussum

kisinK

βkαk = limn

sum

k

β(n)k α

(n)k = lim

nβ(n) = β

and π satisfies the constraints (αωβ) By joint lower semi-continuity ofthe relative entropy in both arguments ([11] Lemma 143(b)) we have

lim infn

Ksum

k=0

α(n)k D(π

(n)(k)P(β(n)))ge lim inf

n

sum

k αkgt0

α(n)k D(π

(n)(k)P(β(n)))

gesum

k αkgt0

αkD(π(k)P(β))

geJ (αωβ)

44 P DUPUIS C NUZMAN AND P WHITING

as desired
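The closed form quoted in the proof, $\beta-\lambda+\lambda\log(\lambda/\beta)$, is the relative entropy of a Poisson distribution with mean $\lambda$ with respect to $P(\beta)$; the Poisson law with mean $\lambda$ is the exponential tilt of $P(\beta)$ that meets the mean constraint. The following is a minimal numerical sketch of that identity, assuming that a truncated series is an adequate stand-in for the infinite sum. The function names are illustrative and do not come from the paper.

```python
import math


def log_poisson_pmf(j, mean):
    # log P_j(mean) = -mean + j*log(mean) - log(j!)
    return -mean + j * math.log(mean) - math.lgamma(j + 1)


def relative_entropy_poisson(lam, beta, terms=200):
    # Truncated series for D(P(lam) || P(beta)); the truncation level
    # `terms` is an assumption of this sketch, not part of the paper.
    total = 0.0
    for j in range(terms):
        log_p = log_poisson_pmf(j, lam)
        log_q = log_poisson_pmf(j, beta)
        total += math.exp(log_p) * (log_p - log_q)
    return total


def closed_form(lam, beta):
    # beta - lam + lam*log(lam/beta), as quoted in the proof of Corollary A.7
    return beta - lam + lam * math.log(lam / beta)
```

For moderate means the two functions should agree to within floating-point error, for example with `lam = 2.0` and `beta = 3.0`.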

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k+j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S^{|\mathcal{K}|}_{\infty}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k+j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k+j \notin \mathcal{I}(\omega)$.

Proof. For $k+j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S^{|\mathcal{K}|}_{\infty}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m+n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S^{|\mathcal{K}|}_{\infty}$, subject to the restrictions that $\hat\pi_{mn} > 0$, that $\hat\pi_{kj} = 0$ for all $k+j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

\[
\hat\omega_i \doteq \sum_{k=0}^{i} \alpha_k\hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\hat\pi_{kj},
\]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha,\hat\omega,\hat\beta)$. Next, for any $\eta > 0$ we may define

\[
\tilde\beta \doteq (\beta - \eta\hat\beta)/(1-\eta), \qquad \tilde\omega \doteq (\omega - \eta\hat\omega)/(1-\eta).
\]

By construction, the constraints $(\alpha,\tilde\omega,\tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\hat\omega,\hat\beta)$, and hence for $(\alpha,\tilde\omega,\tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha,\tilde\omega,\tilde\beta)$, and finally form $\bar\pi = \eta\hat\pi + (1-\eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left(\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} + 1\right)(\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^{\varepsilon}_{mn}}{P_n(\beta)} + \sum_{k\in\mathcal{K}} \alpha_k \sum_{(k,j)\ne(m,n)} \bar\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{j} \pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}.
\end{aligned}
\]

The second equality holds because each $\bar\pi(k)$ and $\pi(k)$ is a probability distribution, so the terms arising from the constant 1 cancel. In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k} \alpha_k D(\pi(k)\,\|\,P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m+n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m+n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m+n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+ \sum_{k\in\mathcal{K}} z_k\alpha_k\left(1 - \sum_{k+l\in\mathcal{I}} \pi_{kl}\right)
+ \sum_{i\in\mathcal{I}} w_i\left(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\right)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k-1}$ and $W_i = e^{w_i}$.
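The rearrangement in the last step of the proof can be written out explicitly. The following is a sketch of the intermediate algebra, assuming $\alpha_k > 0$ so that the common factor $\alpha_k$ may be divided out; it uses only the Lagrangian displayed in the proof.

```latex
\begin{align*}
\frac{\partial L}{\partial \pi_{kj}}
  &= \alpha_k\left(\log\frac{\pi_{kj}}{P_j(\beta)} + 1\right)
     - z_k\alpha_k - w_{k+j}\alpha_k = 0, \\
\log\frac{\pi^*_{kj}}{P_j(\beta)} &= z_k - 1 + w_{k+j}, \\
\pi^*_{kj} &= P_j(\beta)\,e^{z_k-1}\,e^{w_{k+j}} = D_k\,P_j(\beta)\,W_{k+j}.
\end{align*}
```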

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I}, \\
C_k P_j(\rho\beta), & k\in\mathcal{K},\ k+j > I, \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\,\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l\le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\begin{aligned}
\sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl} &= \beta^{(M)}, \\
\sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k\in\mathcal{K}, \\
\sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k} &= \omega_i, \qquad i\in\mathcal{I}.
\end{aligned}
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the proof of the previous theorem, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} ={}& \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+ y^{(M)}\left(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl}\right) \\
&+ \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k\left(\eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl}\right)
+ \sum_{i\in\mathcal{I}} w^{(M)}_i\left(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k}\right).
\end{aligned}
\]

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\,\exp\bigl(j\,y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}\bigr)
\]

for all $j < M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
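The substitution $C_k = e^{(\rho-1)\beta-1+z_k}$ rests on the identity $P_j(\beta)e^{jy} = e^{(\rho-1)\beta}P_j(\rho\beta)$ with $\rho = e^{y}$: tilting a Poisson weight by $e^{jy}$ rescales its mean. A small numerical sketch of this identity follows; the function names are illustrative and are not taken from the paper.

```python
import math


def poisson_pmf(j, mean):
    # P_j(mean) = exp(-mean) * mean**j / j!
    return math.exp(-mean) * mean**j / math.factorial(j)


def tilted_weight(j, beta, y):
    # P_j(beta) * exp(j*y): the Poisson weight tilted by the multiplier y
    return poisson_pmf(j, beta) * math.exp(j * y)


def rescaled_poisson(j, beta, y):
    # exp((rho - 1)*beta) * P_j(rho*beta), with rho = exp(y)
    rho = math.exp(y)
    return math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta)
```

The two functions should agree for every $j$, which is why the tail of the minimizer is Poisson with mean $\rho\beta$ up to the constant $C_k$.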

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^*(k)$ meets $I-k$ terminal constraints, has mean $\beta_k$, and a Poisson($\rho_k\beta_k$) tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0},\ldots,\omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega(k),\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega(k),\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega(k),\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\tilde\gamma_{k0}(0) = 1$ and $\tilde\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

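In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\tilde\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho_k\beta_k)\bigr)\left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)P_i(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{i}\right].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta)W_{k+i}$ when $i \le I-k$. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi^{(k)}_0(x) &= \alpha_0 C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0 C_0\rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)P_j(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0 C_0\rho^k}{C_k}\,\tilde\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]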
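The passage from the first to the second line of (A.25) uses the re-indexing $i = k+j$ together with the identity $P_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$. The short sketch below checks that identity numerically; the helper names are illustrative assumptions and do not appear in the paper.

```python
import math


def poisson_pmf(j, mean):
    # P_j(mean) = exp(-mean) * mean**j / j!
    return math.exp(-mean) * mean**j / math.factorial(j)


def derivative_coefficient(i, k, rho, beta):
    # Coefficient produced by differentiating (1 - x/beta)**i k times:
    # P_i(rho*beta) * i! / ((i - k)! * beta**k)
    return (poisson_pmf(i, rho * beta) * math.factorial(i)
            / (math.factorial(i - k) * beta**k))


def shifted_poisson_weight(i, k, rho, beta):
    # rho**k * P_{i-k}(rho*beta), the coefficient after re-indexing j = i - k
    return rho**k * poisson_pmf(i - k, rho * beta)
```

Both functions should return the same value for all $0 \le k \le i$, which is what allows the factor $\rho^k$ to be pulled out of the bracket.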

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi^{(i-k)}_{k0}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi^{(i)}_0(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k\psi^{(k)}_0(x) = \frac{\alpha_0 D_0}{D_k}\,\tilde\psi_{k0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x) = \gamma_i(x)\,\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}
\tag{A.29}
\]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x))$, which is finite at both endpoints because $(-1)^i\psi^{(i)}_0(x) > 0$ for $x\in[0,\beta)$, $i\le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

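Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in that proof to satisfy the conditions of Lemma A.3. Hence, the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(\beta)\log|\psi^{(i)}_0(\beta)|\right] - \sum_{k=0}^{K} \alpha_k\log|\psi^{(k)}_0(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta)\log\left|\frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0)\,e^{-\beta}}\right|.
\end{aligned}
\]

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi^{(k)}_0(x) = \psi^{(k)}_0(0)\times\tilde\psi_{k0}(x)$, so that

\[
\frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0)\,e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j}\psi^{(j)}_{k0}\!\left(\frac{x\beta_k}{\beta}\right)e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^{j}e^{-\beta}/j!}.
\]

At $x = \beta$ the absolute value of the right-hand side is $\tilde\gamma_{kj}(\beta)/P_j(\beta)$. Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k\cdot}(\beta)\,\|\,P(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).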

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\bar\gamma)$.

We first introduce some notation. Let $O$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $O(\alpha,\omega,\beta) \doteq \{\gamma\in O : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $O(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $O(\alpha,\omega,\beta)$ is a convex function.
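The omitted proof relies on the joint convexity of $(\theta,\gamma)\to D(\theta\|\gamma)$. A minimal numerical sketch of that property on randomly drawn probability vectors is given below; it is an illustration under the assumption of strictly positive vectors, not a proof, and all names are hypothetical.

```python
import math
import random


def relative_entropy(theta, gamma):
    # D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i), with 0 log 0 = 0
    total = 0.0
    for t, g in zip(theta, gamma):
        if t > 0.0:
            total += t * math.log(t / g)
    return total


def random_probability_vector(size, rng):
    # Strictly positive weights, normalized to sum to one
    weights = [rng.random() + 1e-3 for _ in range(size)]
    norm = sum(weights)
    return [w / norm for w in weights]


def mix(u, v, lam):
    # Convex combination lam*u + (1 - lam)*v, taken componentwise
    return [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]


def convexity_gap(theta1, gamma1, theta2, gamma2, lam):
    # lam*D(theta1||gamma1) + (1-lam)*D(theta2||gamma2) - D(mixed pair);
    # joint convexity says this quantity is nonnegative.
    mixed = relative_entropy(mix(theta1, theta2, lam), mix(gamma1, gamma2, lam))
    endpoints = (lam * relative_entropy(theta1, gamma1)
                 + (1.0 - lam) * relative_entropy(theta2, gamma2))
    return endpoints - mixed
```

Integrating the pointwise inequality over $[0,\beta]$ is what yields the convexity of $J$ in part (b).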

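For a given pair of occupancy functions $\gamma,\bar\gamma\in O(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1-\varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x)\,\|\,\gamma^{\varepsilon}(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\bar\gamma\in O(\alpha,\omega,\beta)$.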

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma)\le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$ with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining

\[
\eta(x) \doteq \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x,\varepsilon)\,dx.
\tag{A.30}
\]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\tilde\gamma_{k0}(x)$ and $\tilde\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) \ge \alpha_i\tilde\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon}\ge(1-\varepsilon)\gamma$ and $\theta^{\varepsilon}\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

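The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^{\varepsilon}(x)}\eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^{\varepsilon}(x)}\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\]

\[
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon}\le 1$, $\theta^{\varepsilon}\le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right| < B \qquad \text{for } \varepsilon\in[0,\varepsilon_0],\ \text{a.e. } x\in[0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.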

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx
\]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[
\int_0^{x} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma}dx' = \int_0^{x} \left.\left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma}dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, and using $\eta(0) = \eta(\beta) = 0$ to discard the boundary terms, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^{\beta} \left(\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right)dx \\
&= \int_0^{\beta} \dot\eta(x)\cdot\left(-\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right)dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i(\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1\downarrow 0$ and $x^{(n)}_2\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x^{(n)}_1)$ and terminal point $\bar\gamma(x^{(n)}_2)$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}\bigl(\bar\gamma(x^{(n)}_1),\ \bar\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1\bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx \\
&= J(\bar\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


along with the conservation constraint

\[
\sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{kj} = \beta.
\tag{2.11}
\]

As in the empty case, it may be established that $F$ is nonempty if and only if $(\alpha,\omega,\beta)$ is feasible.

Theorem 2.7 (Terminal rate function, general case). The rate function $\mathcal{J}(\omega) : S_I\to\mathbb{R}_+$ defined in Corollary 2.3 may be expressed

\[
\mathcal{J}(\omega) = \min_{\pi\in F(\alpha,\omega,\beta)} \sum_{k\in\mathcal{K}} \alpha_k D(\pi(k)\,\|\,P(\beta))
\]

whenever $(\alpha,\beta,\omega)$ are feasible, and is infinite otherwise. The minimizing argument $\pi^*\in F(\alpha,\omega,\beta)$ is unique.

As discussed above, we may suppose the constraints are irreducible, and then, as shown in the Appendix, Lagrange multipliers will always exist for this problem. When we have strict inequality in the conservation condition (2.4) (the exponential case), the solution takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k\in\mathcal{K},\ k+j\le I, \\
C_k P_j(\rho\beta), & k\in\mathcal{K},\ k+j > I.
\end{cases}
\]

In the case of equality in (2.4) (the polynomial case), the corresponding form is

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k\in\mathcal{K},\ j+k\le I, \\
0, & k+j > I.
\end{cases}
\]

As for empty initial conditions, $\rho$ may be associated with the conservation condition. The $W_i$ correspond to the terminal constraints $\omega_i$, and the $C_k$, $D_k$ are normalization constants. The constants $C_k$, $\rho$, $W_i$ and $D_k$ can all be computed numerically using Lagrangian methods for constrained optimization (see, e.g., [4]). Given these constants, the optimizing trajectory $\gamma(t)$ may be constructed explicitly. This construction may be most simply expressed in terms of the minimizing trajectories in the empty case. For $k\in\mathcal{K}$, denote the mean of $\pi^*(k)$ by $\beta_k = \sum_{j=0}^{\infty} j\,\pi^*_{kj}$, and define the terminal condition $\omega(k)\in S_{I-k}$ by $\omega_{kj} = \pi^*_{kj}$ for $j = 0,\ldots,I-k$. Then the constraints $(1,\omega(k),\beta_k)$ are feasible constraints, and Theorem 2.6 determines the associated least cost paths, which we denote $\gamma(k) = \{\gamma_{kj}(x)\}$. The least cost paths for the subproblems combine to form the overall least cost path.

Theorem 2.8 (Globally minimizing path, general case). For irreducible feasible constraints $(\alpha,\omega,\beta)$, let $\pi^*\in S^{|\mathcal{K}|}_{\infty}$ be the unique minimizing distribution in Theorem 2.7, and let the functions $\gamma_{kj} : [0,\beta_k]\to[0,1]$ be the minimizing paths corresponding to the subproblems $(1,\omega(k),\beta_k)$. The infimum of $J(\gamma)$ over $A(\alpha,\omega,\beta)$ is achieved on the occupancy path $\gamma\in A(\alpha,\omega,\beta)$ defined by

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).
\]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.
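The superposition in Theorem 2.8 can be sketched in a few lines of code. The sketch below only illustrates how the subproblem curves are combined; it takes the subproblem paths as a user-supplied callable `sub_path(k, j, s)` (a hypothetical interface, since the explicit formulas of Theorem 2.6 are not reproduced here). As a stand-in for testing, one may use the zero-cost situation in which every subproblem follows the Poisson law, $\gamma_{kj}(s) = P_j(s)$ with $\beta_k = \beta$; that choice is an assumption made for illustration only.

```python
import math


def poisson_pmf(j, mean):
    # P_j(mean) = exp(-mean) * mean**j / j!  (equals 1 at j = 0 when mean = 0)
    return math.exp(-mean) * mean**j / math.factorial(j)


def combine_paths(alpha, beta_k, beta, sub_path, I, x):
    """Return [gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)].

    alpha[k]          : initial fraction of urns holding k balls
    beta_k[k]         : mean of the k-th minimizing distribution
    sub_path(k, j, s) : hypothetical callable returning gamma_{kj}(s)
    """
    gammas = []
    for i in range(I + 1):
        total = 0.0
        # gamma_i(x) = sum_{k=0}^{i} alpha_k * gamma_{k,i-k}(x * beta_k / beta)
        for k in range(min(i, len(alpha) - 1) + 1):
            total += alpha[k] * sub_path(k, i - k, x * beta_k[k] / beta)
        gammas.append(total)
    # The catch-all component collects the remaining mass.
    gammas.append(1.0 - sum(gammas))
    return gammas
```

With the Poisson stand-in, the combined path starts at $\alpha$ when $x = 0$ and remains a probability vector for $x > 0$.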

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta\in\mathbb{R}^{I+2}$ and $\gamma\in S_I$, define

\[
H(\gamma,\zeta) \doteq \log(E[\exp(\zeta\cdot b_i(\gamma))]),
\]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R}\to\mathbb{R}$ such that $H(\gamma,\zeta)\le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma,\zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1,2,\ldots\}$ satisfies a large deviation upper bound with a rate function $\tilde J$, which we now define. Let $\tilde L$ be the Legendre–Fenchel transform of $H(\gamma,\zeta)$ in $\zeta$:

\[
\tilde L(\gamma,\eta) = \sup_{\zeta\in\mathbb{R}^{I+2}} [\zeta\cdot\eta - H(\gamma,\zeta)].
\]

If $\{\gamma(x), 0\le x\le\beta\}$ is an absolutely continuous function that takes values in $S_I$, then

\[
\tilde J(\gamma) = \int_0^{\beta} \tilde L(\gamma(x),\dot\gamma(x))\,dx.
\]

If $\gamma$ is not absolutely continuous, then $\tilde J(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma\in\mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma,\zeta)\le h(|\zeta|)$ and the continuity of $H(\gamma,\zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$, then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\tilde J = J$, where $J$ is defined as in (2.2). This will hold if we can show that $\tilde L(\gamma,\eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[
\tilde L(\gamma,M\theta) = D(\theta\|\gamma).
\]

16 P DUPUIS C NUZMAN AND P WHITING

It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma,\eta)<\infty$, then $\eta$ can be written as a convex combination of the form

$$\theta_0(e_1-e_0)+\cdots+\theta_I(e_{I+1}-e_I),$$

where $\theta_j\ge 0$ and $\sum_{j=0}^{I}\theta_j\le 1$. Since the vectors $(e_{j+1}-e_j)$, $j=0,1,\ldots,I$, are linearly independent, these values are unique. Setting $\theta_{I+1}=1-\sum_{j=0}^{I}\theta_j$, we have $\eta=M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

$$\begin{aligned}
L(\gamma,M\theta)
&=\sup_{\zeta\in\mathbb{R}^{I+2}}\Big[\zeta\cdot M\theta-\log\big(E[\exp(\zeta\cdot b_i(\gamma))]\big)\Big]\\
&=\sup_{\zeta\in\mathbb{R}^{I+2}}\Big[M^{T}\zeta\cdot\theta-\log\Big(\Big[\sum_{j=0}^{I}\gamma_j\exp(\zeta_{j+1}-\zeta_j)\Big]+\gamma_{I+1}\Big)\Big]\\
&=\sup_{\zeta\in\mathbb{R}^{I+2}}\Big[\sum_{j=0}^{I}(\zeta_{j+1}-\zeta_j)\theta_j-\log\Big(\Big[\sum_{j=0}^{I}\gamma_j\exp(\zeta_{j+1}-\zeta_j)\Big]+\gamma_{I+1}\Big)\Big].
\end{aligned}$$

Given any values $\mu_0,\ldots,\mu_{I+1}$, we can define $\zeta_0,\ldots,\zeta_{I+1}$ recursively by $\zeta_0=-\mu_{I+1}$ and $\zeta_{j+1}-\zeta_j=\mu_j-\mu_{I+1}$. With these definitions it is apparent that the last display is equal to

$$\begin{aligned}
&\sup_{\mu\in\mathbb{R}^{I+2}}\Big[\sum_{j=0}^{I}\mu_j\theta_j-\mu_{I+1}(1-\theta_{I+1})-\log\Big(\Big[\sum_{j=0}^{I}\gamma_j\exp(\mu_j-\mu_{I+1})\Big]+\gamma_{I+1}\Big)\Big]\\
&\qquad=\sup_{\mu\in\mathbb{R}^{I+2}}\Big[\sum_{j=0}^{I}\mu_j\theta_j-\mu_{I+1}(1-\theta_{I+1})+\mu_{I+1}-\log\Big(\sum_{j=0}^{I+1}\gamma_j\exp\mu_j\Big)\Big]\\
&\qquad=\sup_{\mu\in\mathbb{R}^{I+2}}\Big[\sum_{j=0}^{I+1}\mu_j\theta_j-\log\Big(\sum_{j=0}^{I+1}\gamma_j\exp\mu_j\Big)\Big].
\end{aligned}$$

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta\|\gamma)$, thereby completing the proof of the upper bound.
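The variational identity used in the last step can be checked numerically on a small example. The following is a minimal sketch, not part of the original argument: the vectors `theta` and `gamma` are hypothetical test values, and the candidate maximizer $\mu_j=\log(\theta_j/\gamma_j)$ is the standard one for strictly positive $\theta$ and $\gamma$. The sketch evaluates the bracketed expression at that point and at a few perturbed points.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) for two probability vectors, with 0 log 0 = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """sum_j mu_j theta_j - log(sum_j gamma_j exp(mu_j))."""
    linear = sum(m * t for m, t in zip(mu, theta))
    log_mgf = math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))
    return linear - log_mgf

# Hypothetical example vectors (not taken from the paper).
theta = [0.2, 0.5, 0.3]
gamma = [0.4, 0.4, 0.2]

# Candidate maximizer mu_j = log(theta_j / gamma_j).
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
value_at_mu_star = dv_objective(mu_star, theta, gamma)

# Perturbed choices of mu; none should exceed the relative entropy.
deltas = ([0.3, -0.1, 0.2], [-1.0, 0.5, 0.0], [2.0, 2.0, -1.0])
perturbed_values = [
    dv_objective([m + d for m, d in zip(mu_star, delta)], theta, gamma)
    for delta in deltas
]
```

Under these assumptions `value_at_mu_star` agrees with `relative_entropy(theta, gamma)` up to rounding, and the perturbed values are no larger, which is what the supremum representation predicts.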

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0)>0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0)=0$ implies that $\gamma_0(x)=0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0)=0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon>0$ and $\delta>0$, there is $\eta>0$ such that for any initial occupancies satisfying $|\Gamma^n(0)-\gamma(0)|<\eta$,

$$\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(0)}\Big(\sup_{0\le x\le\beta}|\Gamma^n(x)-\gamma(x)|<\delta\Big)\ge -J(\gamma)-\varepsilon.\tag{3.1}$$

Of course, this inequality is trivial if $J(\gamma)=\infty$, and so we assume that $J(\gamma)<\infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x=0$. To do this, we show that for any $a>0$ there exist $b>0$, $K\in\mathbb{N}$ and an occupancy function $y$ such that $y(0)=\gamma(0)$, $\sup_{0\le x\le\beta}|y(x)-\gamma(x)|<a$, $y_j(x)>bx^{K}$ for all $j=0,1,\ldots,I,I+$ and $0<x\le\beta$, and such that

$$J(y)\le J(\gamma).$$

Consider the zero cost trajectory defined by

$$\dot z(x)=Mz(x),\qquad z(0)=\gamma(0).$$

We have the following expression for $z_j$ when $j\le I$:

$$z_j(x)=\Big[\sum_{k=0}^{j}\gamma_k(0)\frac{x^{j-k}}{(j-k)!}\Big]e^{-x}.\tag{3.2}$$

It is easy to check from this explicit formula that $z_j(x)>bx^{K}$ for some $b>0$, $K=I$, and all $j=0,1,\ldots,I,I+$ and $0<x\le\beta$. For $\rho\in(0,1)$ let $y^{\rho}=\rho z+(1-\rho)\gamma$. Then $y^{\rho}$ is the occupancy function that corresponds to the rate $\rho z+(1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z\|z)=0$, we have

$$\begin{aligned}
J(y^{\rho})&=\int_0^\beta D\big(\rho z(x)+(1-\rho)\theta(x)\,\big\|\,\rho z(x)+(1-\rho)\gamma(x)\big)\,dx\\
&\le\rho\int_0^\beta D(z(x)\|z(x))\,dx+(1-\rho)\int_0^\beta D(\theta(x)\|\gamma(x))\,dx\\
&=(1-\rho)J(\gamma).
\end{aligned}$$

All required properties are then obtained by letting $y=y^{\rho}$ for suitably small $\rho\in(0,1)$.
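The explicit formula (3.2) for the zero cost trajectory can be sanity-checked numerically. The sketch below is illustrative only: it assumes that the component form of $\dot z=Mz$ is $\dot z_0=-z_0$ and $\dot z_j=z_{j-1}-z_j$ for $1\le j\le I$ (balls enter urns of each occupancy at a rate equal to the current fraction of such urns), and the initial vector `gamma0` is a hypothetical choice.

```python
import math

def zero_cost_path(gamma0, x):
    """Evaluate z_j(x), j = 0..I, from formula (3.2).

    gamma0 holds the initial occupancies gamma_0(0), ..., gamma_I(0).
    """
    return [
        math.exp(-x) * sum(
            gamma0[k] * x ** (j - k) / math.factorial(j - k)
            for k in range(j + 1)
        )
        for j in range(len(gamma0))
    ]

def ode_residuals(gamma0, x, h=1e-5):
    """Central-difference check of z_0' = -z_0 and z_j' = z_{j-1} - z_j."""
    z = zero_cost_path(gamma0, x)
    z_plus = zero_cost_path(gamma0, x + h)
    z_minus = zero_cost_path(gamma0, x - h)
    residuals = []
    for j in range(len(gamma0)):
        derivative = (z_plus[j] - z_minus[j]) / (2 * h)
        inflow = z[j - 1] if j > 0 else 0.0
        residuals.append(derivative - (inflow - z[j]))
    return residuals

# Hypothetical initial occupancies for levels 0, 1, 2.
gamma0 = [0.5, 0.3, 0.2]
start = zero_cost_path(gamma0, 0.0)
residuals = ode_residuals(gamma0, 0.7)
```

Here `start` reproduces `gamma0`, and the entries of `residuals` are zero up to finite-difference error.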

It follows that in proving the lower bound, we can assume without loss of generality that for some fixed constants $b>0$ and $K\in\mathbb{N}$, $\gamma_i(x)\ge bx^{K}$ for all $x\in[0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau>0$ small) with sufficiently high probability. Given $\tau\in(0,\beta]$, $\sigma>0$ and $\varepsilon>0$, define

$$h(\gamma)=\begin{cases}0,&|\gamma-\gamma(\tau)|<\sigma/2,\\ 2\varepsilon,&\text{else.}\end{cases}$$

For $n$ large enough that $|\gamma(\lfloor n\tau\rfloor/n)-\gamma(\tau)|\le\sigma/2$ (and independent of $\tau\in(0,\beta]$), we have the inequality

$$\begin{aligned}
&P_{\Gamma^n(0)}\big(|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\lfloor n\tau\rfloor/n)|<\sigma\big)+e^{-2n\varepsilon}\\
&\qquad\ge P_{\Gamma^n(0)}\big(|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\tau)|<\sigma/2\big)+e^{-2n\varepsilon}\\
&\qquad\ge E_{\Gamma^n(0)}\big(\exp-nh(\Gamma^n(\lfloor n\tau\rfloor/n))\big).
\end{aligned}$$

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b^n_i$:

$$\bar\Gamma^n\Big(\frac{i+1}{n}\Big)=\bar\Gamma^n\Big(\frac{i}{n}\Big)+\frac{1}{n}\bar b^n_i,\qquad\bar\Gamma^n(0)=\Gamma^n(0).$$

Furthermore, the distribution of $\bar b^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n), 0\le j\le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar\mu^n_i$ denote the (random) distribution of $\bar b^n_i$, given $\{\bar\Gamma^n(j/n), 0\le j\le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\Big(\exp-nh\Big(\Gamma^n\Big(\frac{\lfloor n\tau\rfloor}{n}\Big)\Big)\Big)
=\inf\bar E_{\Gamma^n(0)}\Big[h\Big(\bar\Gamma^n\Big(\frac{\lfloor n\tau\rfloor}{n}\Big)\Big)+\frac{1}{n}\sum_{i=1}^{\lfloor n\tau\rfloor-1}D\big(\bar\mu^n_i\,\big\|\,\mu_{\bar\Gamma^n(i/n)}\big)\Big],$$

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b^n_i$. We can write $\gamma(\tau)-\gamma(0)=Mv\tau$ for some probability vector $v$. Define a process $\bar b^n_i$ as follows:

$$\bar b^n_i=\begin{cases}
e_j-e_{j-1},&\text{if }\Big\lfloor\sum_{k=0}^{j-1}v_k\tau n\Big\rfloor\le i\le\Big\lfloor\sum_{k=0}^{j}v_k\tau n\Big\rfloor-1,\ \text{for }0\le j\le I,\\[4pt]
0,&\text{if }\Big\lfloor\sum_{k=0}^{I}v_k\tau n\Big\rfloor\le i\le\lfloor\tau n\rfloor-1.
\end{cases}$$

In other words, $\bar b^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0-e_1$ for an amount of time $v_0\tau$, $e_1-e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j-e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j-e_{j-1}$, the cost is

$$D\big(\bar\mu^n_i\,\big\|\,\mu_{\bar\Gamma^n(i/n)}\big)=\log\Big(\frac{1}{\bar\Gamma^n_{j-1}(i/n)}\Big)=-\log\Big(\bar\Gamma^n_{j-1}\Big(\frac{i}{n}\Big)\Big).\tag{3.3}$$

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n)-\bar\Gamma^n_{j-1}(i/n)=-1/n$ for $\lfloor\sum_{k=0}^{j-1}v_k\tau n\rfloor\le i\le\lfloor\sum_{k=0}^{j}v_k\tau n\rfloor-1$,

$$\bar\Gamma^n_{j-1}\Big(\frac{i}{n}\Big)\downarrow\bar\Gamma^n_{j-1}\Big(\frac{1}{n}\Big\lfloor\sum_{k=0}^{j}v_k\tau n\Big\rfloor\Big)$$

as $i\uparrow\lfloor\sum_{k=0}^{j}v_k\tau n\rfloor$. In addition, because the $(j-1)$st component is never modified when $i\ge\lfloor\sum_{k=0}^{j}v_k\tau n\rfloor$, it follows that

$$\bar\Gamma^n_{j-1}\Big(\frac{1}{n}\Big\lfloor\sum_{k=0}^{j}v_k\tau n\Big\rfloor\Big)\to\gamma_{j-1}(\tau)$$

as $n\to\infty$ and $\eta\to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b>0$ and $K\in\mathbb{N}$ such that $\gamma_j(\tau)\ge b\tau^{K}$ for $j=0,\ldots,I$. Thus, at any given time step $i$, we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau)\ge b\tau^{K}$ that for all sufficiently large $n$ and small $\eta>0$ there are $C_1,C_2<\infty$ (and independent of $\tau$) such that, whenever $|\Gamma^n(0)-\gamma(0)|<\eta$, for all $i$

$$D\big(\bar\mu^n_i\,\big\|\,\mu_{\bar\Gamma^n(i/n)}\big)\le C_1[-\log\tau^{K}]\le-C_2\log\tau.$$

In addition, as $n\to\infty$ and $\eta\to 0$,

$$\bar\Gamma^n\Big(\frac{1}{n}\lfloor\tau n\rfloor\Big)\to\gamma(\tau).$$

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta>0$, $|\Gamma^n(0)-\gamma(0)|<\eta$ implies

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\Big(\exp-nh\Big(\Gamma^n\Big(\frac{\lfloor n\tau\rfloor}{n}\Big)\Big)\Big)\le-C_2\tau\log\tau.$$

We now choose $\tau>0$ so that $-C_2\tau\log\tau\le\varepsilon/2$. Choosing $\tau>0$ smaller if need be, we can also guarantee that $|\Gamma^n(x)-\gamma(x)|\le\delta$ for all $x\in[0,\tau]$ w.p.1, if $|\Gamma^n(0)-\gamma(0)|<\eta$ and $\eta>0$ is sufficiently small. The following bound is therefore valid for the given $\tau>0$: for any $\sigma>0$ and all sufficiently small $\eta>0$, $|\Gamma^n(0)-\gamma(0)|<\eta$ implies

$$\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(0)}\Big(\Big|\Gamma^n\Big(\frac{\lfloor n\tau\rfloor}{n}\Big)-\gamma\Big(\frac{\lfloor n\tau\rfloor}{n}\Big)\Big|<\sigma,\ \sup_{x\in[0,\tau]}|\Gamma^n(x)-\gamma(x)|<\delta\Big)\ge-\frac{\varepsilon}{2}.$$

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma>0$. To obtain the lower bound for all $x\in[0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid that boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma>0$.

Now choose $\zeta\in(0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x\in[\tau,\beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$, so long as $\gamma_i>0$ for all $i\in\{0,1,\ldots,I+1\}$ (i.e., $\gamma\in S_I^{\circ}$, where $S_I^{\circ}$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n=1,2,\ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon>0$ and $\zeta>0$ defined above, there is $\sigma>0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\lfloor n\tau\rfloor/n)|<\sigma$,

$$\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\lfloor n\tau\rfloor/n}\Big(\sup_{\tau\le x\le\beta}|\Gamma^n(x)-\gamma(x)|<\zeta\Big)\ge-J(\gamma)-\frac{\varepsilon}{2},$$

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting, this condition holds with the particularly simple choice $\beta=\gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10], and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

$$\mathcal{J}(\Omega)=\inf_{\omega\in\Omega}\mathcal{J}(\omega).$$

Using Theorem 2.7, we can write

$$\begin{aligned}
\mathcal{J}(\Omega)&=\inf_{\omega\in\Omega}\ \inf_{\pi\in F(\alpha,\omega,\beta)}\ \sum_{k=0}^{K}\alpha_kD(\pi^{(k)}\|P(\beta))\\
&=\inf_{\pi\in F(\alpha,\Omega,\beta)}\ \sum_{k=0}^{K}\alpha_kD(\pi^{(k)}\|P(\beta)),
\end{aligned}\tag{4.1}$$

where we have abused notation to define $F(\alpha,\Omega,\beta)=\bigcup_{\omega\in\Omega}F(\alpha,\omega,\beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third example below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega=\{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^{\circ}$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega\in\Omega^{\circ}}\mathcal{J}(\omega)=\inf_{\omega\in\bar\Omega}\mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim\frac{1}{n}\log P(\Gamma^n(\beta)\in\Omega)=\inf_{\omega\in\Omega}\mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.


4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I=0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta)>\omega_0>e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta)<\omega_0<e^{-\beta}$. In either case the calculus of variations problem is that in which $\gamma_0(\beta)=\omega_0$. From (2.6) it is immediate that $C=1/\rho$ and

$$\gamma_0(x)=\frac{1}{\rho}e^{-\rho x}+1-\frac{1}{\rho}.$$

Using (2.5), we find that $\rho$ is determined by the unique positive solution to

$$\rho(1-\omega_0)=1-e^{-\beta\rho}.$$

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
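The fixed point equation for $\rho$ is easy to solve numerically. The sketch below uses plain bisection, which is one possible choice and not necessarily the authors' method; the function names are hypothetical, and the values $\omega_0=0.15$, $\beta=3$ are the ones quoted for Figure 1. It assumes $0<\omega_0<1$ and $\beta>1-\omega_0$, so that the equation has a positive root in addition to the trivial root $\rho=0$, and that this root is not extremely close to zero.

```python
import math

def solve_twist(omega0, beta, iterations=200):
    """Positive root rho of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection."""
    if not (0.0 < omega0 < 1.0 and beta > 1.0 - omega0):
        raise ValueError("need 0 < omega0 < 1 and beta > 1 - omega0")

    def f(rho):
        # f is concave with f(0) = 0 and f'(0) = beta - (1 - omega0) > 0.
        return -math.expm1(-beta * rho) - rho * (1.0 - omega0)

    lo = 1e-9                         # f(lo) > 0 just to the right of zero
    hi = 1.0 / (1.0 - omega0) + 1.0   # f(hi) < 0 because 1 - exp(.) < 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Constrained empty-urn fraction gamma_0(x) with C = 1/rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

omega0, beta = 0.15, 3.0
rho = solve_twist(omega0, beta)
terminal_value = gamma0(beta, rho)   # should reproduce omega0
```

For these inputs the root lies slightly above 1.1, and `terminal_value` recovers the prescribed terminal fraction.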

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x)=P_i(\rho x)/\rho$ for $i>0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns $\omega_0=0.15$ is three times larger than the expected value $P_0(3.0)=0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I>0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0=1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

$$w(r)=r-n\Gamma_1(r/n)-2n\Gamma_2(r/n)-\cdots-In\Gamma_I(r/n)-In\Gamma_{I+}(r/n),$$

or, since $\Gamma_{I+}=1-\sum_{i=0}^{I}\Gamma_i$,

$$w(r)=r-nI+n\sum_{i=0}^{I}(I-i)\Gamma_i(r/n).$$
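The two expressions for $w(r)$ can be compared against a direct count in a small simulation. This is an illustrative sketch only; the function name, the parameter values and the random seed are hypothetical. Urn counts $N_i=n\Gamma_i$ are kept as integers so that the comparison is exact.

```python
import random

def overflow_three_ways(n, r, capacity, seed=0):
    """Throw r balls into n urns; count overflow directly and via both formulas."""
    rng = random.Random(seed)
    occupancy = [0] * n   # occupancies in the infinite-capacity model
    spilled = 0           # balls that would fall to the floor with finite capacity
    for _ in range(r):
        urn = rng.randrange(n)
        if occupancy[urn] >= capacity:
            spilled += 1
        occupancy[urn] += 1

    # counts[i] = n * Gamma_i for i = 0..I; n_plus = n * Gamma_{I+}
    counts = [sum(1 for c in occupancy if c == i) for i in range(capacity + 1)]
    n_plus = n - sum(counts)

    first = r - sum(i * counts[i] for i in range(capacity + 1)) - capacity * n_plus
    second = r - n * capacity + sum(
        (capacity - i) * counts[i] for i in range(capacity + 1)
    )
    return spilled, first, second

spilled, first, second = overflow_three_ways(n=50, r=200, capacity=3)
```

All three returned values coincide for any seed, since both formulas are exact identities rather than approximations.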

In order to compute

$$J_O(\eta,\beta)=-\lim_{n}\frac{1}{n}\log P\Big(\frac{w(\beta n)}{n}>\eta\Big),$$

we therefore consider sample paths which satisfy the end constraint

$$\sum_{i=0}^{I}(I-i)\gamma_i(\beta)\ge\eta+I-\beta=\zeta.\tag{4.2}$$

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I-\beta]^{+}\le\zeta<I$, and that the average overflow satisfies $[\beta-I]^{+}\le\eta<\beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta>0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

$$\sum_{i=0}^{\infty}\pi_i=1,\qquad\sum_{i=0}^{\infty}i\pi_i=\beta\qquad\text{and}\qquad\sum_{i=0}^{I}(I-i)\pi_i=\zeta.\tag{4.3}$$

We introduce the Lagrangian

$$L(\pi,y,z,\lambda)=\sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)}+y\Big(1-\sum_{i=0}^{\infty}\pi_i\Big)+z\Big(\beta-\sum_{i=0}^{\infty}i\pi_i\Big)+\lambda\Big(\zeta-\sum_{i=0}^{I}(I-i)\pi_i\Big),$$

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^{*}$ should satisfy the conditions

$$\log\pi^{*}_i=\log P_i(\beta)-1+y+iz+(I-i)^{+}\lambda.$$

In terms of variables $C$, $\rho$ and $\nu$, we may write

$$\pi^{*}_i=CP_i(\rho\beta)\quad\text{for }i\ge I$$

and

$$\pi^{*}_i=CP_i(\rho\beta)\nu^{I-i}\quad\text{for }i\le I.$$

The distribution is conditionally Poisson $\rho\beta/\nu$ for $i\le I$ and conditionally Poisson $\rho\beta$ for $i\ge I$. For convenience, we introduce the notation

$$Q_I(\rho)=\sum_{i=I}^{\infty}P_i(\rho\beta)=\frac{1}{\rho\beta}\sum_{i=I+1}^{\infty}iP_i(\rho\beta),$$

$$R_I(\rho,\nu)=\nu^{I}e^{-\rho\beta(1-1/\nu)}\sum_{i=0}^{I-1}P_i\Big(\frac{\rho\beta}{\nu}\Big)=\frac{\nu^{I+1}}{\rho\beta}e^{-\rho\beta(1-1/\nu)}\sum_{i=1}^{I}iP_i\Big(\frac{\rho\beta}{\nu}\Big).$$

Since $\pi^{*}$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

$$C\big(R_I(\rho,\nu)+Q_I(\rho)\big)=1,$$

$$C\Big(\frac{\rho\beta}{\nu}R_I(\rho,\nu)+\rho\beta Q_I(\rho)\Big)=\beta,$$

$$C\Big(IP_I(\rho\beta)+\Big(I-\frac{\rho\beta}{\nu}\Big)R_I(\rho,\nu)\Big)=\zeta.$$

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\|P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed $C=(R_I+Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

$$(\rho/\nu-1)R_I(\rho,\nu)+(\rho-1)Q_I(\rho)=0,$$

$$(I-\rho\beta/\nu-\zeta)R_I(\rho,\nu)-\zeta Q_I(\rho)+IP_I(\rho\beta)=0.$$

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu>\rho>1$, while smaller than expected values of $\zeta$ give $\nu<\rho<1$.

The large deviations exponent for the overflow problem may then be expressed

$$\begin{aligned}
J_O(\eta,\beta)&=D(\pi^{*}\|P(\beta))\\
&=\sum_{i=0}^{\infty}\pi^{*}_i\log\big(Ce^{\beta-\rho\beta}\rho^{i}\nu^{(I-i)^{+}}\big)\\
&=\log C+\beta(1-\rho)+\beta\log\rho+\zeta\log\nu.
\end{aligned}$$
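The quantities $Q_I$, $R_I$, the two reduced equations and the final expression for $J_O$ translate directly into code. The sketch below only evaluates them for a candidate pair $(\rho,\nu)$; it does not implement a root finder, and all function names are hypothetical. As a sanity check it uses the zero cost point $\rho=\nu=1$, where $\zeta$ equals the expected spare capacity $\sum_{i\le I}(I-i)P_i(\beta)$, both residuals should vanish and the exponent should be zero.

```python
import math

def poisson_pmf(i, lam):
    """P_i(lam) = exp(-lam) lam^i / i!."""
    return math.exp(-lam) * lam ** i / math.factorial(i)

def q_term(I, rho, beta):
    """Q_I(rho) = sum_{i >= I} P_i(rho*beta), computed via the complement."""
    return 1.0 - sum(poisson_pmf(i, rho * beta) for i in range(I))

def r_term(I, rho, nu, beta):
    """R_I(rho, nu) = nu^I exp(-rho*beta*(1 - 1/nu)) sum_{i < I} P_i(rho*beta/nu)."""
    lam = rho * beta
    tilt = nu ** I * math.exp(-lam * (1.0 - 1.0 / nu))
    return tilt * sum(poisson_pmf(i, lam / nu) for i in range(I))

def residuals(I, rho, nu, beta, zeta):
    """Left-hand sides of the two reduced equations; both are zero at the solution."""
    q, r = q_term(I, rho, beta), r_term(I, rho, nu, beta)
    mean_eq = (rho / nu - 1.0) * r + (rho - 1.0) * q
    spare_eq = ((I - rho * beta / nu - zeta) * r - zeta * q
                + I * poisson_pmf(I, rho * beta))
    return mean_eq, spare_eq

def overflow_exponent(I, rho, nu, beta, zeta):
    """log C + beta(1 - rho) + beta log rho + zeta log nu, with C = 1/(R_I + Q_I)."""
    c = 1.0 / (r_term(I, rho, nu, beta) + q_term(I, rho, beta))
    return (math.log(c) + beta * (1.0 - rho) + beta * math.log(rho)
            + zeta * math.log(nu))

# Zero cost sanity check with hypothetical values I = 3, beta = 2.
I, beta = 3, 2.0
zeta_expected = sum((I - i) * poisson_pmf(i, beta) for i in range(I + 1))
res = residuals(I, 1.0, 1.0, beta, zeta_expected)
cost = overflow_exponent(I, 1.0, 1.0, beta, zeta_expected)
```

A two-dimensional root finder applied to `residuals` (for instance, tracing the two implicit curves as the text suggests) would then yield $(\rho,\nu)$ for a given $\zeta$; that step is left out here.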

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I+1$ complete sets are obtained. This event corresponds to the constraint $\omega_i=0$, $i\le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

$$J_C(\alpha,\beta,\xi)=-\lim_{n}\frac{1}{n}\log P\Big(\sum_{i=0}^{I}\Gamma^n_i(\beta)<\xi\Big),$$

where

$$\xi<\sum_{k=0}^{K}\alpha_k\sum_{i=0}^{I-k}P_i(\beta)$$

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7 subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

$$\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{I-k}\pi_{kj}=\xi,$$

where $K\le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

$$\omega_{kj}=\pi^{*}_{kj}=\begin{cases}C_kP_j(\rho\beta)W,&j+k\le I,\\ C_kP_j(\rho\beta),&j+k>I.\end{cases}$$

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations, and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

$$J_C(\alpha,\beta,\xi)=\beta(1-\rho+\log\rho)+\xi\log W+\sum_{k=0}^{K}\alpha_k\log C_k.$$

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n=100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I=3$). Initially, one is given a single coupon of 30 types, and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha=[0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0)\approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta=2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi=0.55$, which gives a large deviations exponent $J_C\approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
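The zero-cost figure quoted in this example can be reproduced from the expression $\sum_k\alpha_k\sum_{i=0}^{I-k}P_i(\beta)$. The sketch below is a minimal check with hypothetical function names; the last two lines simply convert the quoted exponent $J_C\approx 0.18$ into the rough estimate $e^{-nJ_C}$, which captures only the leading exponential order and ignores any subexponential prefactor.

```python
import math

def poisson_pmf(i, lam):
    """P_i(lam) = exp(-lam) lam^i / i!."""
    return math.exp(-lam) * lam ** i / math.factorial(i)

def expected_low_occupancy(alpha, I, beta):
    """Zero-cost fraction of urns holding I or fewer balls after beta*n more throws.

    alpha[k] is the initial fraction of urns that already hold k balls.
    """
    return sum(
        a * sum(poisson_pmf(i, beta) for i in range(I - k + 1))
        for k, a in enumerate(alpha)
    )

alpha = [0.5, 0.3, 0.2]
psi3 = expected_low_occupancy(alpha, I=3, beta=2.0)   # roughly 0.71

n, exponent = 100, 0.18          # values quoted in the text
rough_probability = math.exp(-n * exponent)
```

With these inputs `psi3` is about 0.71, so roughly 29 of the 100 types are expected to reach four coupons, and `rough_probability` is of order $10^{-8}$, in line with the figures stated above.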

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0<a<1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii) there are $K>1$ urn classes, with a fraction $\alpha_k$, $k=1,2,\ldots,K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x)>0.5$ for some $x\in(0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^{*}$ satisfying $\mathcal{J}(\omega)=J(\gamma^{*})$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega)\le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^{*}$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega)\ge J(\gamma)$. This bound is proved in Theorem A.14, using the Euler–Lagrange equations together with properties of the relative entropy.

A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

$$\sum_{j=0}^{i}\alpha_j\ge\sum_{j=0}^{i}\omega_j,\qquad i=0,\ldots,I\quad\text{(monotonicity)}\tag{A.4}$$

and

$$\sum_{i=0}^{I}i\omega_i+(I+1)\omega_{I+}\le\sum_{i=0}^{I+1}i\alpha_i+\beta\quad\text{(conservation)}.\tag{A.5}$$

Proof of Lemma 2.4. If a valid occupancy curve $\gamma:[0,\beta]\to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

$$\begin{aligned}
\sum_{i=0}^{I}\Big(\sum_{j=0}^{i}\alpha_j-\sum_{j=0}^{i}\omega_j\Big)&=\sum_{j=0}^{I}(I+1-j)(\alpha_j-\omega_j)\\
&=(I+1)(\omega_{I+}-\alpha_{I+1})+\sum_{j=0}^{I}j(\omega_j-\alpha_j).
\end{aligned}\tag{A.6}$$

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

$$\gamma_i(x)=\alpha_i+(\omega_i-\alpha_i)\frac{x}{\beta},\qquad i=0,\ldots,I,$$

$$\gamma_{I+}(x)=1-\sum_{i=0}^{I}\gamma_i(x)=\alpha_{I+1}+(\omega_{I+}-\alpha_{I+1})\frac{x}{\beta}$$

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i\le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i\le 1$ follows from (A.5).
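The feasibility conditions (A.4) and (A.5) are straightforward to test for concrete endpoint data. The following is a minimal sketch with a hypothetical function name and argument layout: `alpha` and `omega` are lists of length $I+2$ whose last entries are $\alpha_{I+1}$ and $\omega_{I+}$, and a small tolerance `eps` is an added assumption to absorb floating-point rounding.

```python
def is_feasible(alpha, omega, beta, eps=1e-12):
    """Check monotonicity (A.4) and conservation (A.5) for endpoint data.

    alpha = [alpha_0, ..., alpha_I, alpha_{I+1}]
    omega = [omega_0, ..., omega_I, omega_{I+}]
    """
    I = len(alpha) - 2

    # (A.4): cumulative initial occupancies dominate cumulative terminal ones.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - eps:
            return False

    # (A.5): the terminal configuration needs at most beta extra balls per urn.
    balls_needed = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    balls_present = sum(i * alpha[i] for i in range(I + 2))
    return balls_needed <= balls_present + beta + eps

# Classical case I = 0 with empty initial urns and terminal empty fraction 0.15.
enough_balls = is_feasible([1.0, 0.0], [0.15, 0.85], beta=3.0)
too_few_balls = is_feasible([1.0, 0.0], [0.15, 0.85], beta=0.5)
more_empties = is_feasible([0.5, 0.5], [0.8, 0.2], beta=3.0)
```

The first case is feasible; the second violates conservation (85% of urns cannot be occupied with only 0.5 balls per urn); the third violates monotonicity (the fraction of empty urns cannot increase).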

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i=-\theta_i$ for $i=0,\ldots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

$$J(\psi)=\int_0^\beta L(\psi(x),\theta(x))\,dx.$$

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i=\psi_i-\psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

$$\begin{aligned}
L(\psi,\theta)&=D(\theta\|\gamma)\\
&=\sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i}+\theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}}\\
&=\sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}+\Big(1-\sum_{i=0}^{I}\theta_i\Big)\log\frac{1-\sum_{i=0}^{I}\theta_i}{1-\psi_I}.
\end{aligned}$$

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

$$\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))=-\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x))$$

for all $i\in\{0,\ldots,I\}$, $x\in(0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

$$-\frac{\theta_i}{\psi_i-\psi_{i-1}}+\frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}=\frac{d}{dx}\Big(-\log\frac{\theta_i}{\psi_i-\psi_{i-1}}+\log\frac{\theta_{I+}}{1-\psi_I}\Big)\tag{A.7}$$

for $i=0,\ldots,I-1$, and by

$$-\frac{\theta_I}{\psi_I-\psi_{I-1}}+\frac{\theta_{I+}}{1-\psi_I}=\frac{d}{dx}\Big(-\log\frac{\theta_I}{\psi_I-\psi_{I-1}}+\log\frac{\theta_{I+}}{1-\psi_I}\Big).\tag{A.8}$$

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i=1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x)=1$, and therefore $\theta_I(x)=\theta_{I+}(x)=0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

$$L(\psi,\theta)=\sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}.\tag{A.9}$$

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

$$-\frac{\theta_i}{\psi_i-\psi_{i-1}}+\frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}=\frac{d}{dx}\Big(-\log\frac{\theta_i}{\psi_i-\psi_{i-1}}\Big)\tag{A.10}$$

for $i=0,\ldots,I-1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0=1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C\ge 0$, $\rho>0$ which satisfy the equations

$$\sum_{i=0}^{I}\omega_i+\sum_{i=I+1}^{\infty}CP_i(\rho\beta)=1,$$

$$\sum_{i=0}^{I}i\omega_i+\sum_{i=I+1}^{\infty}iCP_i(\rho\beta)=\beta.$$

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

$$\psi_0(x)=Ce^{-\rho x}+\sum_{k=0}^{I}(\omega_k-CP_k(\rho\beta))\Big(1-\frac{x}{\beta}\Big)^{k},\tag{A.8}$$

$$\gamma_i(x)=\frac{x^{i}}{i!}(-1)^{i}\psi_0^{(i)}(x),\qquad 0\le i\le I,\tag{A.9}$$

$$\gamma_{I+}(x)=1-\sum_{i=0}^{I}\gamma_i(x)$$

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0)=1$.

Recall that in the special case $C=0$, the first component of the extremal is simply the $I$th order polynomial

$$\psi_0(x)=\sum_{k=0}^{I}\omega_k\Big(1-\frac{x}{\beta}\Big)^{k}.\tag{A.10}$$

We refer to such paths as polynomial extremals, and to extremals with $C>0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

$$(-1)^{i}\varphi^{(i)}(x)\ge 0\quad\text{for all }x\in[a,b]\text{ and }i>0.\tag{A.11}$$

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x\in[0,\beta)$ and $i=0,\ldots,I$. In the case $C>0$, this can be strengthened to all $i=0,1,\ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C>0$. The $i$th derivative of $\psi_0$ is

$$(-1)^{i}\psi_0^{(i)}(x)=\rho^{i}Ce^{-\rho x}+\sum_{k=i}^{I}(\omega_k-CP_k(\rho\beta))\beta^{-i}\frac{k!}{(k-i)!}\Big(1-\frac{x}{\beta}\Big)^{k-i}$$

for $i=0,\ldots,I$, and

$$(-1)^{i}\psi_0^{(i)}(x)=\rho^{i}Ce^{-\rho x}\tag{A.12}$$

for $i>I$. It is clear that $a(x)=(-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^{i}a^{(i)}(x)>0$, $x\in[0,\infty)$, $i=0,1,\ldots$. We deduce that $\varphi(x)=(-1)^{I}\psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

$$\begin{aligned}
\varphi(\beta)&=(-1)^{I}\psi_0^{(I)}(\beta)\\
&=\rho^{I}Ce^{-\rho\beta}+\Big(\omega_I-Ce^{-\rho\beta}\frac{(\rho\beta)^{I}}{I!}\Big)\beta^{-I}I!\\
&=\Big(\frac{\beta^{I}}{I!}\Big)^{-1}\omega_I\ge 0,
\end{aligned}$$

so that $\varphi(x)>0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$, and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C=0$, the argument proceeds similarly on noting that

$$(-1)^{I}\psi_0^{(I)}(x)=\Big(\frac{\beta^{I}}{I!}\Big)^{-1}\omega_I>0\quad\text{for all }x\in[0,\beta].$$

Proof of Theorem A1 The terminal constraints can be verified im-mediately by inspection The initial constraints follow from the constructionof C in (26) since

ψ0(0) =C

(

1minusIsum

i=0

Pi(ρβ)

)

+Isum

i=0

ωi = 1

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 33

A similar computation also using (26) shows that ψ(1)0 (0) =minus1 a fact that

we will need shortlyTo establish that the given functions are valid occupancy curves it is

useful to introduce the infinite sequence of functions γi and corresponding ψi

obtained by extending (A9) to all i

γi(x) =xi

i(minus1)iψ

(i)0 (x) i= 01 I I +1 (A13)

As the sum of an exponential and a polynomial ψ0 has a Taylor seriesrepresentation of unlimited radius about any point x Then ψ0(0)minusψ0(x) =suminfin

i=1(minusx)i

i ψ(i)0 (x) and thus

infinsum

i=0

γi(x) =infinsum

i=0

(minusx)i

iψ(i)0 (x) = ψ0(0) = 1

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, the sequence $\{\gamma_i(x)\}$ lies in $S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!} (-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!} (-1)^{k+1} \psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!} (-1)^{i+1} \psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.
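The validity argument above can be checked numerically. The following is a minimal sketch, not part of the original proof: it builds an exponential $\psi_0$ from hypothetical parameters (`BETA`, `RHO`, `C`, `I` are illustrative choices, and the two terminal values are solved so that the total mass is 1 and the mean is `BETA`), then evaluates $\gamma_i$ and $\theta_i$ from (A.13) and (A.14).

```python
import math

# Hypothetical parameters chosen for illustration only (not from the paper).
BETA, RHO, C, I = 1.0, 1.2, 1.0, 1

def poisson(i, lam):
    """Poisson probability P_i(lam) = exp(-lam) * lam**i / i!."""
    return math.exp(-lam) * lam**i / math.factorial(i)

# Solve for omega_0, omega_1 so that the terminal distribution
# (omega_0, omega_1, C*P_2(rho*beta), C*P_3(rho*beta), ...) has mass 1 and mean BETA.
tail_mass = 1.0 - sum(poisson(i, RHO * BETA) for i in range(I + 1))
tail_mean = RHO * BETA - sum(i * poisson(i, RHO * BETA) for i in range(I + 1))
omega1 = BETA - C * tail_mean
omega0 = 1.0 - C * tail_mass - omega1
OMEGA = [omega0, omega1]

def signed_derivative(i, x):
    """(-1)**i times the i-th derivative of psi_0 at x (display before (A.12))."""
    value = RHO**i * C * math.exp(-RHO * x)
    for k in range(i, I + 1):
        coeff = OMEGA[k] - C * poisson(k, RHO * BETA)
        falling = math.factorial(k) / math.factorial(k - i)
        value += coeff * BETA**(-i) * falling * (1.0 - x / BETA)**(k - i)
    return value

def gamma(i, x):
    """Occupancy fraction gamma_i(x) from (A.13)."""
    return x**i / math.factorial(i) * signed_derivative(i, x)

def theta(i, x):
    """Rate theta_i(x) from the last line of (A.14)."""
    return x**i / math.factorial(i) * signed_derivative(i + 1, x)
```

Summing `gamma(i, x)` or `theta(i, x)` over the first few dozen indices should give a value close to 1 at any `x` in `[0, BETA]`, and `gamma(i, BETA)` should reproduce `OMEGA[i]` for `i <= I`.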

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

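From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\ i \ge 0. \tag{A.15}
\]

Then

\[
\frac{d}{dx}\left(-\log\frac{\theta_i}{\gamma_i}\right)
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta), \tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8) we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \dots, I$, so that (A.16) implies (A.10).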

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

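Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

\[
(-1)^i \psi_0^{(i)}(x) = C \rho^i e^{-\rho x} \quad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \dots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0, \dots, \gamma_I, \gamma_{I+}\}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \left[\gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)|\right]. \tag{A.17}
\]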

Proof. For all $i > I$ and $x \in [0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
+ (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]

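Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\, dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|\, dx
+ \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log|\psi_0^{(i)}(x)|\, dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta-\varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr) \log|\psi_0^{(i)}(x)|\, dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$, since for $i > I$,

\[
-\dot\psi_i(x) \log|\psi_0^{(i)}(x)| = \rho\,\gamma_i(x)\,[\log C + i \log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty} \Bigl[\gamma_i \log|\psi_0^{(i)}|\Bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\, dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.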

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0, \dots, \gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \dots, I-1,
\]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \left[\gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)|\right]. \tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i |\psi_0^{(i)}(x)| / i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi \in \mathcal{F}(1,\omega,\beta)} D(\pi \,\|\, P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \dots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty} \pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\,\pi_i\right),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi_i^* \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi_i^*$, $z$ and $y$ and the strict convexity of $x \log x$ imply that $\pi_i^*$ is the unique global minimizer of $x \mapsto x \log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[
\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - z i \pi_i \ge \pi_i^* \log\frac{\pi_i^*}{P_i(\beta)} - y\pi_i^* - z i \pi_i^*.
\]

Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in \mathcal{F}(1,\omega,\beta)} D(\pi \,\|\, P(\beta)) &= \inf_{\pi \in \mathcal{F}(1,\omega,\beta)} L(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
&= D(\pi^* \,\|\, P(\beta)).
\end{aligned}
\]

Since $\pi^* \in \mathcal{F}(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

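Substituting the particular form of $\pi_i^*$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \,\|\, P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)}
+ \left(1 - \sum_{i=0}^{I} \omega_i\right)\bigl(\log C + (1-\rho)\beta\bigr)
+ \left(\beta - \sum_{i=0}^{I} i\,\omega_i\right) \log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.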
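As a sanity check on the explicit cost formula, the sketch below compares it with a directly summed relative entropy. It is only an illustration under stated assumptions: the values of `beta`, `rho`, `C` and `I` are hypothetical, the terminal values `omega` are solved so that the terminal distribution has total mass 1 and mean `beta`, and the infinite sum is truncated at 80 terms.

```python
import math

def log_poisson(i, lam):
    """Log of the Poisson probability P_i(lam)."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def explicit_cost(omega, C, rho, beta):
    """Closed-form cost of an exponential extremal in terms of rho, omega, beta."""
    head = sum(w * (math.log(w) - log_poisson(i, beta))
               for i, w in enumerate(omega) if w > 0)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return (head
            + tail_mass * (math.log(C) + (1.0 - rho) * beta)
            + tail_mean * math.log(rho))

def relative_entropy(pi, beta):
    """Truncated D(pi || P(beta)) for a finite list pi_0, pi_1, ..."""
    return sum(p * (math.log(p) - log_poisson(i, beta))
               for i, p in enumerate(pi) if p > 0)

# Hypothetical example with I = 1 (values are illustrative, not from the paper).
beta, rho, C, I = 1.0, 1.2, 0.9, 1
head_probs = [math.exp(log_poisson(i, rho * beta)) for i in range(I + 1)]
tail_mass = 1.0 - sum(head_probs)
tail_mean = rho * beta - sum(i * p for i, p in enumerate(head_probs))
omega1 = beta - C * tail_mean            # enforces mean beta
omega0 = 1.0 - C * tail_mass - omega1    # enforces total mass 1
omega = [omega0, omega1]

# Terminal distribution: omega_i for i <= I, C * P_i(rho * beta) for i > I.
pi_star = omega + [C * math.exp(log_poisson(i, rho * beta)) for i in range(I + 1, 80)]

cost_explicit = explicit_cost(omega, C, rho, beta)
cost_direct = relative_entropy(pi_star, beta)
```

The two numbers `cost_explicit` and `cost_direct` should agree up to floating-point and truncation error, which is the content of the identity $J(\gamma) = D(\pi^* \,\|\, P(\beta))$ for this example.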

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k \beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k \beta_k / \beta$. After rescaling time by the factor $\beta_k / \beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k\, \gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1 with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.
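The superposition of class curves with rescaled time can be sketched in a few lines. This is an illustration only: the subproblem curves used here are Poisson placeholders, $\gamma_{k,j}(y) = P_j(y)$, chosen because they are valid empty-start occupancy curves, and they are not claimed to be the extremals for any particular constraints. The names `overall_curve`, `alpha` and `beta_k` are hypothetical.

```python
import math

def poisson(j, lam):
    """Poisson probability P_j(lam); zero for negative j."""
    if j < 0:
        return 0.0
    return math.exp(-lam) * lam**j / math.factorial(j)

def overall_curve(i, x, alpha, beta_k, beta, sub_curve):
    """gamma_i(x) = sum over classes k <= i of alpha_k * gamma_{k,i-k}(x * beta_k / beta)."""
    total = 0.0
    for k, a in enumerate(alpha):
        if a > 0 and k <= i:
            total += a * sub_curve(k, i - k, x * beta_k[k] / beta)
    return total

# Hypothetical initial fractions: 60% of urns empty, 40% holding two balls.
alpha = [0.6, 0.0, 0.4]
beta_k = [1.0, 0.0, 0.5]                             # additional balls per urn in each class
beta = sum(a * b for a, b in zip(alpha, beta_k))     # total additional balls per urn

def placeholder_curve(k, j, y):
    """Placeholder class-k curve: Poisson occupancy at class time y."""
    return poisson(j, y)
```

With these inputs, `overall_curve(i, 0.0, ...)` returns `alpha[i]` for `i = 0, 1, 2`, and the values over all `i` sum to 1 at every `x` in `[0, beta]`, as required of an occupancy curve.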

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{k,j} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22}
\]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{k,j} = 0, \qquad k + j > I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \dots, \pi^{(k)}, \dots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega \in S_{\bar I}$, where $\bar I > I$. Any feasible point for constraints $(\alpha, \bar\omega, \beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary until the constraint $\alpha_0 \pi_{0,1} + \alpha_1 \pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
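The ordered filling construction can be sketched as a short greedy routine. The proof only says that mass is applied "as necessary", so the specific rule used here (serve each occupancy level from the lowest-index class first) is an assumption made for illustration, and the function name `ordered_fill` is hypothetical. The sketch covers the polynomial case, where $\sum_i \omega_i = 1$ and $K \le I$.

```python
def ordered_fill(alpha, omega):
    """Greedy ordered filling for polynomial constraints.

    alpha[k]: fraction of urns starting with k balls.
    omega[i]: required terminal fraction of urns with i balls, i = 0..I.
    Returns pi[k][j], the fraction of class-k urns receiving j additional balls.
    """
    top = len(omega) - 1
    pi = {k: [0.0] * (top - k + 1) for k, a in enumerate(alpha) if a > 0}
    remaining = {k: 1.0 for k in pi}   # unused probability mass in class k
    for i, target in enumerate(omega):
        need = target                  # mass still required at level i
        for k in sorted(pi):
            if k > i or need <= 0.0:
                break
            give = min(need, alpha[k] * remaining[k])
            pi[k][i - k] = give / alpha[k]
            remaining[k] -= give / alpha[k]
            need -= give
        if need > 1e-12:
            # Not enough mass in classes 0..i: a monotonicity condition fails.
            raise ValueError("constraints are not feasible at level %d" % i)
    return pi

# Hypothetical example: half the urns start empty, half start with one ball.
alpha = [0.5, 0.5]
omega = [0.2, 0.5, 0.3]
pi = ordered_fill(alpha, omega)
```

For this example, each class distribution sums to 1, the terminal constraints (A.22) hold, and the mean number of added balls equals $\sum_i i\,\omega_i - \sum_k k\,\alpha_k = 0.6$, consistent with equality in the conservation condition.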

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i:\, P_i > Q_i} (P_i - Q_i) = \sum_{i:\, Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \left\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \le A \right\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha,\omega,\beta) \doteq \mathcal{F}(\alpha,\omega,\beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\beta,\omega)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\,\pi^{(n)}_{k,j} \le \varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x \log x - x + 1) + (e^y - 1),
\]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta} \frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
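Both ingredients of the uniform integrability estimate are easy to check numerically. The sketch below is an illustration with hypothetical helper names (`young_gap`, `poisson_exp_moment`): it evaluates the slack in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ and compares a truncated Poisson exponential moment with its closed form.

```python
import math

def young_gap(x, y):
    """Slack (x log x - x + 1) + (e^y - 1) - x*y; should be nonnegative for x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0   # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, terms=400):
    """Truncated sum of exp(j/delta) * P_j(beta), computed in log space."""
    return sum(
        math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
        for j in range(terms)
    )

def poisson_exp_moment_closed_form(beta, delta):
    """Closed form exp(beta * e^(1/delta) - beta)."""
    return math.exp(beta * math.exp(1.0 / delta) - beta)
```

The slack vanishes exactly when $x = e^y$, which is where the supremum defining $e^y - 1$ is attained.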

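Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\,\pi^{(n)}_{k,j}
&= \sum_{j \ge m} \delta\, \frac{j}{\delta}\, \frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\, P_j(\beta) \\
&\le \sum_{j \ge 0} \delta \left[ \pi^{(n)}_{k,j} \log\left(\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\right) - \pi^{(n)}_{k,j} + P_j(\beta) \right] + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]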

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta_k^{(n)} \to \beta_k.
\]

Finally,

\[
\sum_k \alpha_k \beta_k = \lim_n \sum_k \alpha_k \beta_k^{(n)} = \lim_n \beta = \beta,
\]

so that $\pi \in \mathcal{F}(\beta,\alpha,\omega)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*(k)} \,\|\, P(\beta)) \le \liminf_n \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) = G,
\]

where in fact we have equality, by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

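Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \beta^{(n)}, \omega^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta_k^{(n)} \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \to \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{k,j}, \qquad k = 0, 1, \dots, K.
\]

If $\alpha_k^{(n)} \to 0$ and $\beta_k^{(n)} \alpha_k^{(n)} \not\to 0$, then the term $\alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda \log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha_k^{(n)} \beta_k^{(n)} \to \alpha_k \beta_k$ even if $\alpha_k = 0$.

Thus

\[
\sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_n \sum_k \beta_k^{(n)} \alpha_k^{(n)} = \lim_n \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)}))
&\ge \liminf_n \sum_{k:\, \alpha_k > 0} \alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)})) \\
&\ge \sum_{k:\, \alpha_k > 0} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.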
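The proof uses the value $\beta - \lambda + \lambda\log(\lambda/\beta)$ for the smallest relative entropy to $P(\beta)$ among distributions with mean $\lambda$. The sketch below checks only that the Poisson distribution with mean $\lambda$ achieves this value, and compares it with one other mean-$\lambda$ distribution (a point mass). The function names are hypothetical, and the sum is truncated.

```python
import math

def log_poisson(j, lam):
    """Log of the Poisson probability P_j(lam)."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def entropy_poisson_vs_poisson(lam, beta, terms=400):
    """Truncated D(P(lam) || P(beta)), summed term by term."""
    total = 0.0
    for j in range(terms):
        lp = log_poisson(j, lam)
        total += math.exp(lp) * (lp - log_poisson(j, beta))
    return total

def stated_infimum(lam, beta):
    """The value beta - lam + lam * log(lam / beta) quoted in the proof."""
    return beta - lam + lam * math.log(lam / beta)

def entropy_point_mass(m, beta):
    """D(delta_m || P(beta)) for a point mass at the integer m."""
    return -log_poisson(m, beta)
```

For example, with `lam = 3.0` and `beta = 1.2` the first two functions agree, while a point mass at 3 (which also has mean 3) has strictly larger relative entropy.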

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints, the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{m,n} > 0$, that $\tilde\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k \tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\tilde\pi_{k,j},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha, \tilde\omega, \tilde\beta)$. Next, for any $\eta > 0$ we may define

\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1 - \eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \hat\omega, \hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \tilde\omega, \tilde\beta)$, and hence for $(\alpha, \hat\omega, \hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha, \hat\omega, \hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1 - \eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{m,n} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left( \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)} + 1 \right) (\bar\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}\, (\bar\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m \bar\pi_{m,n} \log\frac{\pi^\varepsilon_{m,n}}{P_n(\beta)}
+ \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \bar\pi_{k,j} \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}
- \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{k,j} \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_k \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{k,j} \log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{k,j} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$, and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)}
+ \sum_{k \in \mathcal{K}} z_k \alpha_k \left( 1 - \sum_{k+l \in \mathcal{I}} \pi_{k,l} \right)
+ \sum_{i \in \mathcal{I}} w_i \left( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \right)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{k,j} = P_j(\beta)\, e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
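The product form $D_k P_j(\beta) W_{k+j}$ suggests a simple way to compute the polynomial-case minimizer numerically. The sketch below uses iterative proportional fitting, which is a standard technique swapped in here for illustration and is not the argument used in the paper: starting from the reference masses $\alpha_k P_j(\beta)$, it alternately rescales each class (updating an implicit $D_k$) and each occupancy level (updating an implicit $W_i$). It assumes every $\omega_i$ is positive and that class 0 is present; the function name and example values are hypothetical.

```python
import math

def poisson(j, lam):
    """Poisson probability P_j(lam)."""
    return math.exp(-lam) * lam**j / math.factorial(j)

def fit_polynomial_minimizer(alpha, omega, beta, sweeps=2000):
    """Iterative proportional fitting for pi[k][j] of the form D_k * P_j(beta) * W_{k+j}."""
    top = len(omega) - 1
    classes = [k for k, a in enumerate(alpha) if a > 0]
    # mass[k][j] = alpha_k * pi_{k,j}, initialised at the Poisson reference measure.
    mass = {k: [alpha[k] * poisson(j, beta) for j in range(top - k + 1)] for k in classes}
    for _ in range(sweeps):
        for k in classes:                       # each class must carry total mass alpha_k
            s = sum(mass[k])
            mass[k] = [m * alpha[k] / s for m in mass[k]]
        for i in range(top + 1):                # each level i must carry total mass omega_i
            cells = [k for k in classes if k <= i]
            s = sum(mass[k][i - k] for k in cells)
            for k in cells:
                mass[k][i - k] *= omega[i] / s
    return {k: [m / alpha[k] for m in mass[k]] for k in classes}

# Hypothetical example: two classes, terminal levels 0..2, beta fixed by conservation.
alpha = [0.5, 0.5]
omega = [0.2, 0.5, 0.3]
beta = sum(i * w for i, w in enumerate(omega)) - sum(k * a for k, a in enumerate(alpha))
pi = fit_polynomial_minimizer(alpha, omega, beta)
```

In this small example the feasible set has one degree of freedom, and the stationarity condition implied by the product form, $\pi_{0,1}\pi_{1,1} / (\pi_{0,2}\pi_{1,0}) = P_1(\beta)^2 / (P_0(\beta) P_2(\beta)) = 2$, gives $\pi_{0,1} \approx 0.4338$, which the iteration should approach.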

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{k,j} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} = \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\,\pi^*_{k,l}, \qquad \eta_k^{(M)} \doteq \sum_{l \le M} \pi^*_{k,l},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)}
\]

subject to the constraints

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{k,l} = \beta^{(M)},
\]

\[
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{k,l} = \eta_k^{(M)}, \qquad k \in \mathcal{K},
\]

\[
\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)}
+ y^{(M)} \left( \beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{k,l} \right) \\
&\quad + \sum_{k \in \mathcal{K}} z_k^{(M)} \alpha_k \left( \eta_k^{(M)} - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{k,l} \right)
+ \sum_{i \in \mathcal{I}} w_i^{(M)} \left( \omega_i - \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \right).
\end{aligned}
\]

Since the derivative of $L$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

\[
\pi^*_{k,j} = P_j(\beta)\, e^{j y^{(M)} - 1 + z_k^{(M)} + w_{k+j}^{(M)}}
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\, e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w_{k+j}^{(M)} = 0$ if $k + j > I$, it follows that $z_k^{(M)}$ and $w_i^{(M)}$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\,\pi^*_{k,j}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$, and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} \doteq \pi^*_{k,j}$ and $\omega^{(k)} = (\omega_{k,0}, \dots, \omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega^{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega^{(k)}, \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega^{(k)}, \beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \dots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\bar\gamma_{k,j}(x) \doteq \gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\bar\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k \bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{k,j}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k,0}(0) = 1$ and $\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k,0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

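First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\bar\psi_{k,0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k P_i(\rho_k\beta_k)\bigr) \left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^i \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k P_i(\rho\beta)\bigr) \left(1 - \frac{x}{\beta}\right)^i \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left(1 - \frac{x}{\beta}\right)^i \right].
\end{aligned}
\]

Here we have used that $\omega_{k,i} = \pi^*_{k,i} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0 \bar\psi_{0,0}$ is

\[
\begin{aligned}
(-1)^k \psi_0^{(k)}(x)
&= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta)\, \frac{i!}{(i-k)!\,\beta^k} \left(1 - \frac{x}{\beta}\right)^{i-k} \right] \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left(1 - \frac{x}{\beta}\right)^j \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \bar\psi_{k,0}(x).
\end{aligned}
\tag{A.25}
\]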
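The proportionality in (A.25) can be spot-checked for $k = 1$ with a finite difference. The sketch below is illustrative only: the values of `BETA`, `RHO`, `ALPHA0`, `C` and `W` are hypothetical and are not required to satisfy any normalization, since the identity being tested is purely algebraic.

```python
import math

# Hypothetical values for illustration (not from the paper).
BETA, RHO, I = 1.0, 1.3, 3
ALPHA0 = 0.5
C = [0.8, 0.6]               # C_0, C_1
W = [1.4, 0.7, 1.2, 0.9]     # W_0, ..., W_3

def poisson(i, lam):
    """Poisson probability P_i(lam)."""
    return math.exp(-lam) * lam**i / math.factorial(i)

def psi_bar(k, x):
    """Rescaled zero-occupancy curve of the k-th subproblem (last line of its display)."""
    s = math.exp(-RHO * x)
    for i in range(I - k + 1):
        s += (W[k + i] - 1.0) * poisson(i, RHO * BETA) * (1.0 - x / BETA)**i
    return C[k] * s

def psi0(x):
    """Overall zero-occupancy curve psi_0 = alpha_0 * psi_bar_{0,0}."""
    return ALPHA0 * psi_bar(0, x)

def minus_dpsi0(x, h=1e-5):
    """Central-difference approximation to -psi_0'(x)."""
    return -(psi0(x + h) - psi0(x - h)) / (2.0 * h)

def predicted_minus_dpsi0(x):
    """Right-hand side of (A.25) with k = 1."""
    return ALPHA0 * C[0] * RHO / C[1] * psi_bar(1, x)
```

At any interior point, for instance `x = 0.4`, the two functions `minus_dpsi0` and `predicted_minus_dpsi0` should agree to within the finite-difference error.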

Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k,0}^{(i-k)}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{i-k}}{dx^{i-k}} \bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^i \psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{i-k+1}}{dx^{i-k+1}} \bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

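In the polynomial case, similar computations show that

\[
(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\, \psi_{k,0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x) \left( -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \right)
\tag{A.29}
\]

for $i = 0, \dots, I$, establishing the restricted set of equations (A.10).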

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \dots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(\beta) \log\bigl|\psi_0^{(i)}(\beta)\bigr| \right] - \sum_{k=0}^{K} \alpha_k \log\bigl|\psi_0^{(k)}(0)\bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \gamma_{k,j}(\beta) \log \frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}}.
\end{aligned}
\]

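The fact that $\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \psi_{k,0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j} \psi_{k,0}^{(j)}\!\left(\frac{x\beta_k}{\beta}\right) e^{\beta} = \frac{\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D\bigl(\gamma_{k,\cdot}(\beta) \,\|\, P(\beta)\bigr).
\]

By construction, the endpoints $\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).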

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) = \{\gamma \in \mathcal{O} : \gamma(0) = \alpha, \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \| \gamma)$, and is hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] = J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$ with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.
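For completeness, the final implication can be spelled out as follows. This is the standard supporting-line property of a convex function, written in the notation of the proof with $\tilde\gamma$ denoting the competing path; it is a sketch that assumes $G$ is finite on $[0, 1]$, which is the only case requiring proof.

```latex
\[
G[1] \;\ge\; G[0] + G'_+[0]\,(1 - 0) \;=\; G[0],
\qquad\text{that is,}\qquad
J(\tilde\gamma) \;\ge\; J(\gamma).
\]
```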

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\,dx$. After defining

\[
\eta(x) = \tilde\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta, \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x, \varepsilon)\,dx.
\tag{A.30}
\]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k,0}(x)$ and $\theta_{k,0}(x)$ are uniformly bounded away from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x) \ge \alpha_i \gamma_{i,0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded away from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

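The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \psi_i}\right|_{\gamma^\varepsilon}\!(x)\, \eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \theta_i}\right|_{\gamma^\varepsilon}\!(x)\, \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial \psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\qquad
\frac{\partial L}{\partial \theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \right| < B \qquad \text{for } \varepsilon \in [0, \varepsilon_0], \text{ a.e. } x \in [0, \beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.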

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\,dx
\]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[
\int_0^x \left.\frac{\partial L}{\partial \psi_i}\right|_{\gamma} dx' = \int_0^x \left.\left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\right|_{\gamma} dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta \left[ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right] dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x')\,dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i \bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1 - \lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

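Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha, \omega, \beta) \\
&\le \liminf_n \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}), \tilde\gamma(x_2^{(n)}), x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.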

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


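of $J(\gamma)$ over $\mathcal{A}(\alpha, \omega, \beta)$ is achieved on the occupancy path $\gamma \in \mathcal{A}(\alpha, \omega, \beta)$ defined by

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \dots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x).
\]

In addition, $\gamma$ satisfies the Euler–Lagrange equations.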

3. Proof of Theorem 2.2. The purpose of this section is to prove the main large deviations result. We recall the processes and notation defined at the beginning of Section 2.

For $\zeta \in \mathbb{R}^{I+2}$ and $\gamma \in S_I$, define

\[
H(\gamma, \zeta) = \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr),
\]

where "$\cdot$" denotes inner product. Since the support of $b_i(\gamma)$ is bounded uniformly in $i$ and $\gamma$, there exists a function $h : \mathbb{R} \to \mathbb{R}$ such that $H(\gamma, \zeta) \le h(|\zeta|)$ for all $\zeta$ and $\gamma$. Also, since the distribution of $b_i(\gamma)$ is weakly continuous in $\gamma$, $H(\gamma, \zeta)$ is jointly continuous. It follows from [10], Theorem 4.1, that the sequence $\{\Gamma^n, n = 1, 2, \dots\}$ satisfies a large deviation upper bound with a rate function $\bar J$ which we now define. Let $L$ be the Legendre–Fenchel transform of $H(\gamma, \zeta)$ in $\zeta$:

\[
L(\gamma, \eta) = \sup_{\zeta \in \mathbb{R}^{I+2}} [\zeta \cdot \eta - H(\gamma, \zeta)].
\]

If $\{\gamma(x), 0 \le x \le \beta\}$ is an absolutely continuous function that takes values in $S_I$, then

\[
\bar J(\gamma) = \int_0^\beta L(\gamma(x), \dot\gamma(x))\,dx.
\]

If $\gamma$ is not absolutely continuous, then $\bar J(\gamma) = \infty$. ([10] assumes that the vector fields $b_i(\gamma)$ are defined for all $\gamma \in \mathbb{R}^{I+2}$. It is easy to check that we can extend the definition to this set with the bound $H(\gamma, \zeta) \le h(|\zeta|)$ and the continuity of $H(\gamma, \zeta)$ preserved. However, if the process $\Gamma^n$ starts in $S_I$ then it stays in $S_I$, and so the exact form of the extension has no effect on the rate function.)

We recall the definition of the matrix $M$ given in (2.1). To complete the proof of the upper bound, we must show that $\bar J = J$, where $J$ is defined as in (2.2). This will hold if we can show that $L(\gamma, \eta)$ is finite only when $\eta = M\theta$ for a unique probability vector $\theta$, and that in this case

\[
L(\gamma, M\theta) = D(\theta \,\|\, \gamma).
\]

It is well known that $L(\gamma, \eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma, \eta) < \infty$, then $\eta$ can be written as a convex combination of the form

\[
\theta_0(e_1 - e_0) + \cdots + \theta_I(e_{I+1} - e_I),
\]

where $\theta_j \ge 0$ and $\sum_{j=0}^{I} \theta_j \le 1$. Since the vectors $(e_{j+1} - e_j)$, $j = 0, 1, \dots, I$, are linearly independent, these values are unique. Setting $\theta_{I+1} = 1 - \sum_{j=0}^{I} \theta_j$, we have $\eta = M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

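\[
\begin{aligned}
L(\gamma, M\theta) &= \sup_{\zeta \in \mathbb{R}^{I+2}} \bigl[ \zeta \cdot M\theta - \log\bigl(E[\exp(\zeta \cdot b_i(\gamma))]\bigr) \bigr] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ M^T\zeta \cdot \theta - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right] \\
&= \sup_{\zeta \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} (\zeta_{j+1} - \zeta_j)\theta_j - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\zeta_{j+1} - \zeta_j) \right] + \gamma_{I+1} \right) \right].
\end{aligned}
\]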

Given any values $\mu_0, \dots, \mu_{I+1}$, we can define $\zeta_0, \dots, \zeta_{I+1}$ recursively by $\zeta_0 = -\mu_{I+1}$ and $\zeta_{j+1} - \zeta_j = \mu_j - \mu_{I+1}$. With these definitions it is apparent that the last display is equal to

\[
\begin{aligned}
&\sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) - \log\left( \left[ \sum_{j=0}^{I} \gamma_j \exp(\mu_j - \mu_{I+1}) \right] + \gamma_{I+1} \right) \right] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I} \mu_j\theta_j - \mu_{I+1}(1 - \theta_{I+1}) + \mu_{I+1} - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right] \\
&\qquad = \sup_{\mu \in \mathbb{R}^{I+2}} \left[ \sum_{j=0}^{I+1} \mu_j\theta_j - \log\left( \sum_{j=0}^{I+1} \gamma_j \exp\mu_j \right) \right].
\end{aligned}
\]

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta \| \gamma)$, thereby completing the proof of the upper bound.
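The variational formula used in the last step can be checked numerically on a small example. The sketch below is illustrative only: the vectors `theta` and `gamma` are hypothetical data with strictly positive entries, and the candidate maximizer $\mu_j = \log(\theta_j/\gamma_j)$ is the standard choice for this formula rather than something taken from the paper.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors with gamma_j > 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """Objective of the variational formula: mu . theta - log sum_j gamma_j exp(mu_j)."""
    linear = sum(m * t for m, t in zip(mu, theta))
    log_mgf = math.log(sum(g * math.exp(m) for m, g in zip(mu, gamma)))
    return linear - log_mgf

# Hypothetical data with I = 2, so each vector has I + 2 = 4 entries.
theta = [0.1, 0.2, 0.3, 0.4]
gamma = [0.4, 0.3, 0.2, 0.1]

# Candidate maximizer mu_j = log(theta_j / gamma_j).
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
d_value = relative_entropy(theta, gamma)
sup_value = dv_objective(mu_star, theta, gamma)

# Random trial points should never exceed the value at the candidate maximizer.
rng = random.Random(0)
best_random = max(
    dv_objective([rng.uniform(-5.0, 5.0) for _ in theta], theta, gamma)
    for _ in range(2000)
)
print(d_value, sup_value, best_random)
```

The objective is invariant under adding a constant to every $\mu_j$, so the maximizer is not unique; the check only compares values of the objective.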

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0) > 0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0) = 0$ implies that $\gamma_0(x) = 0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0) = 0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -J(\gamma) - \varepsilon.
\tag{3.1}
\]

Of course this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > b x^K$ for all $j = 0, 1, \dots, I, I+$ and $0 < x \le \beta$, and such that

\[
J(y) \le J(\gamma).
\]

Consider the zero cost trajectory defined by

\[
\dot z(x) = M z(x), \qquad z(0) = \gamma(0).
\]

We have the following expression for $z_j$ when $j \le I$:

\[
z_j(x) = \left[ \sum_{k=0}^{j} \gamma_k(0) \frac{x^{(j-k)}}{(j-k)!} \right] e^{-x}.
\tag{3.2}
\]
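Formula (3.2) can be evaluated and sanity-checked numerically. The sketch below is illustrative only: the initial occupancies `gamma0` are hypothetical, and the function `drift` encodes an assumed form of the rows of $M$ for $j \le I$ (inflow from level $j-1$ minus outflow from level $j$), since the definition (2.1) of $M$ is not reproduced in this section.

```python
import math

def zero_cost_path(gamma0, x):
    """Evaluate z_j(x), j = 0..I, from formula (3.2) for initial occupancies gamma0."""
    return [
        math.exp(-x) * sum(
            gamma0[k] * x ** (j - k) / math.factorial(j - k) for k in range(j + 1)
        )
        for j in range(len(gamma0))
    ]

def drift(z):
    """Assumed form of (Mz)_j for j <= I: inflow from level j-1 minus outflow from level j."""
    return [(z[j - 1] if j > 0 else 0.0) - z[j] for j in range(len(z))]

gamma0 = [0.5, 0.3, 0.2]   # hypothetical initial occupancies for levels 0, 1, 2
x, step = 1.3, 1e-6

# Compare a central finite difference of z with the assumed drift M z.
z = zero_cost_path(gamma0, x)
z_plus = zero_cost_path(gamma0, x + step)
z_minus = zero_cost_path(gamma0, x - step)
numeric_derivative = [(p - m) / (2 * step) for p, m in zip(z_plus, z_minus)]
max_error = max(abs(d - f) for d, f in zip(numeric_derivative, drift(z)))
print(z, max_error)
```

With empty initial conditions, `gamma0 = [1.0, 0.0, 0.0]`, the same function returns the Poisson probabilities $e^{-x} x^j / j!$.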

It is easy to check from this explicit formula that $z_j(x) > b x^K$ for some $b > 0$, $K = I$ and all $j = 0, 1, \dots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0, 1)$, let $y^\rho = \rho z + (1 - \rho)\gamma$. Then $y^\rho$ is the occupancy function that corresponds to the rate $\rho z + (1 - \rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \| z) = 0$, we have

\[
\begin{aligned}
J(y^\rho) &= \int_0^\beta D\bigl(\rho z(x) + (1 - \rho)\theta(x) \,\|\, \rho z(x) + (1 - \rho)\gamma(x)\bigr)\,dx \\
&\le \rho \int_0^\beta D(z(x) \,\|\, z(x))\,dx + (1 - \rho) \int_0^\beta D(\theta(x) \,\|\, \gamma(x))\,dx \\
&= (1 - \rho) J(\gamma).
\end{aligned}
\]

All required properties are then obtained by letting $y = y^\rho$ for suitably small $\rho \in (0, 1)$.
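The pointwise inequality behind this bound, $D(\rho z + (1-\rho)\theta \,\|\, \rho z + (1-\rho)\gamma) \le (1-\rho) D(\theta \| \gamma)$, can be probed numerically. The sketch below is a random spot check on hypothetical probability vectors of length 4, not a proof; the helper names are illustrative.

```python
import math
import random

def relative_entropy(p, q):
    """D(p || q) for probability vectors with q_j > 0 wherever p_j > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def random_probability_vector(rng, size):
    """Strictly positive random probability vector."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def mix(weight, first, second):
    """Convex combination weight * first + (1 - weight) * second."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(first, second)]

rng = random.Random(1)
worst_gap = float("inf")
for _ in range(1000):
    z = random_probability_vector(rng, 4)       # zero-cost pair: rate and state coincide
    theta = random_probability_vector(rng, 4)   # rate of the original path
    gamma = random_probability_vector(rng, 4)   # state of the original path
    rho = rng.random()
    lhs = relative_entropy(mix(rho, z, theta), mix(rho, z, gamma))
    rhs = (1.0 - rho) * relative_entropy(theta, gamma)
    worst_gap = min(worst_gap, rhs - lhs)
print(worst_gap)
```

A nonnegative `worst_gap` (up to rounding) is what joint convexity predicts, since the zero-cost pair contributes $D(z \| z) = 0$.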

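It follows that in proving the lower bound, we can assume without loss of generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge b x^K$ for all $x \in [0, \beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0, \beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

\[
h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else}. \end{cases}
\]

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor / n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0, \beta]$), we have the inequality

\[
\begin{aligned}
&P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\lfloor n\tau \rfloor / n)| < \sigma\bigr) + e^{-2n\varepsilon} \\
&\qquad \ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon} \\
&\qquad \ge E_{\Gamma^n(0)}\bigl(\exp -n h(\Gamma^n(\lfloor n\tau \rfloor / n))\bigr).
\end{aligned}
\]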
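The two inequalities can be unpacked as follows. This is a short worked step in the notation of the proof, with the abbreviation $X = \Gamma^n(\lfloor n\tau \rfloor / n)$ introduced here only for readability; $h$ takes the value $0$ on the ball of radius $\sigma/2$ around $\gamma(\tau)$ and $2\varepsilon$ elsewhere.

```latex
\[
E_{\Gamma^n(0)}\bigl[e^{-n h(X)}\bigr]
  = P_{\Gamma^n(0)}\bigl(|X - \gamma(\tau)| < \sigma/2\bigr)
  + e^{-2n\varepsilon}\, P_{\Gamma^n(0)}\bigl(|X - \gamma(\tau)| \ge \sigma/2\bigr)
  \le P_{\Gamma^n(0)}\bigl(|X - \gamma(\tau)| < \sigma/2\bigr) + e^{-2n\varepsilon},
\]
\[
|X - \gamma(\lfloor n\tau \rfloor / n)|
  \le |X - \gamma(\tau)| + |\gamma(\tau) - \gamma(\lfloor n\tau \rfloor / n)|
  < \sigma/2 + \sigma/2 = \sigma
  \quad\text{whenever } |X - \gamma(\tau)| < \sigma/2 .
\]
```

The first line gives the second inequality of the display, and the triangle inequality in the second line gives the first.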

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b_i^n$:

\[
\bar\Gamma^n\!\left(\frac{i+1}{n}\right) = \bar\Gamma^n\!\left(\frac{i}{n}\right) + \frac{1}{n} \bar b_i^n, \qquad \bar\Gamma^n(0) = \Gamma^n(0).
\]

Furthermore, the distribution of $\bar b_i^n$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. Let $\mu_\gamma$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar\mu_i^n$ denote the (random) distribution of $\bar b_i^n$ given $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

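\[
\begin{aligned}
&-\frac{1}{n} \log E_{\Gamma^n(0)}\left( \exp -n h\!\left( \Gamma^n\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) \right) \\
&\qquad = \inf \bar E_{\Gamma^n(0)}\left[ h\!\left( \bar\Gamma^n\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D\bigl(\bar\mu_i^n \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) \right],
\end{aligned}
\]

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b_i^n$. We can write $\gamma(\tau) - \gamma(0) = M v \tau$ for some probability vector $v$. Define a process $\bar b_i^n$ as follows: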

\[
\bar b_i^n = \begin{cases}
e_j - e_{j-1}, & \text{if } \left\lfloor \sum_{k=0}^{j-1} v_k \tau n \right\rfloor \le i \le \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor - 1, \text{ for } 0 \le j \le I, \\[4pt]
0, & \text{if } \left\lfloor \sum_{k=0}^{I} v_k \tau n \right\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}
\]

In other words, $\bar b_i^n$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu_i^n$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[
D\bigl(\bar\mu_i^n \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) = \log\left( \frac{1}{\bar\Gamma^n_{j-1}(i/n)} \right) = -\log\left( \bar\Gamma^n_{j-1}\!\left( \frac{i}{n} \right) \right).
\tag{3.3}
\]

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k \tau n \rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor - 1$,

\[
\bar\Gamma^n_{j-1}\!\left( \frac{i}{n} \right) \downarrow \bar\Gamma^n_{j-1}\!\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor \right)
\]

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$, it follows that

\[
\bar\Gamma^n_{j-1}\!\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor \right) \to \gamma_{j-1}(\tau)
\]

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \dots, I$. Thus at any given time step $i$, we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$, there are $C_1, C_2 < \infty$ (independent of $\tau$) such that whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

\[
D\bigl(\bar\mu_i^n \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) \le C_1[-\log \tau^K] \le -C_2 \log \tau.
\]

In addition, as $n \to \infty$ and $\eta \to 0$,

\[
\bar\Gamma^n\!\left( \frac{1}{n} \lfloor \tau n \rfloor \right) \to \gamma(\tau).
\]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
-\frac{1}{n} \log E_{\Gamma^n(0)}\left( \exp -n h\!\left( \Gamma^n\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) \right) \le -C_2 \tau \log \tau.
\]

We now choose $\tau > 0$ so that $-C_2 \tau \log \tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0, \tau]$, w.p.1, if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)}\left( \left| \Gamma^n\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) - \gamma\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right| < \sigma, \ \sup_{x \in [0, \tau]} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0, \beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0, \delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau, \beta]$. Recall that when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \dots, I + 1\}$ (i.e., $\gamma \in S_I^{\circ}$, where $S_I^{\circ}$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \dots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau \rfloor / n) - \gamma(\lfloor n\tau \rfloor / n)| < \sigma$,

\[
\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau \rfloor / n),\, \lfloor n\tau \rfloor / n}\left( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \right) \ge -J(\gamma) - \frac{\varepsilon}{2},
\]

where $P_{\Gamma^n(\lfloor n\tau \rfloor / n),\, \lfloor n\tau \rfloor / n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau \rfloor / n)$ at time $\lfloor n\tau \rfloor / n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting, this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10], and is therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[
\mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega).
\]

Using Theorem 2.7, we can write

\[
\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega \in \Omega} \ \inf_{\pi \in \mathcal{F}(\alpha, \omega, \beta)} \sum_{k=0}^{K} \alpha_k D\bigl(\pi^{(k)} \,\|\, P(\beta)\bigr) \\
&= \inf_{\pi \in \mathcal{F}(\alpha, \Omega, \beta)} \sum_{k=0}^{K} \alpha_k D\bigl(\pi^{(k)} \,\|\, P(\beta)\bigr),
\end{aligned}
\tag{4.1}
\]

where we have abused notation to define $\mathcal{F}(\alpha, \Omega, \beta) = \bigcup_{\omega \in \Omega} \mathcal{F}(\alpha, \omega, \beta)$.

In many cases, for example when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

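A set $\Omega$ with interior $\Omega^{\circ}$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega \in \Omega^{\circ}} \mathcal{J}(\omega) = \inf_{\omega \in \bar\Omega} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n} \log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.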

4.1. The classical occupancy problem. In the classical occupancy problem the urns are initially empty, and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[
\gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}.
\]

Using (2.5), we find that $\rho$ is determined by the unique nonnegative solution to

\[
\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.
\]

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.
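The twist parameter for this example can be computed with a few lines of code. The sketch below is one possible approach, not the authors' procedure: it uses plain bisection, and the function names `solve_rho` and `gamma` are illustrative. Since $\rho = 0$ satisfies the fixed point equation trivially, the code searches for the positive root, which exists when $\beta > 1 - \omega_0$; the bracket relies on the left side minus the right side being negative at $\rho = 1/(1 - \omega_0)$.

```python
import math

def solve_rho(omega0, beta, tol=1e-12):
    """Positive root of rho * (1 - omega0) = 1 - exp(-beta * rho), found by bisection."""
    def f(rho):
        return 1.0 - math.exp(-beta * rho) - rho * (1.0 - omega0)
    # f is concave with f(0) = 0; it is positive just right of 0 when
    # beta > 1 - omega0 and negative at rho = 1 / (1 - omega0).
    lo, hi = 1e-9, 1.0 / (1.0 - omega0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma(i, x, rho):
    """Constrained occupancy path: gamma_0 as in the text, gamma_i = P_i(rho x) / rho for i > 0."""
    if i == 0:
        return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho
    return math.exp(-rho * x) * (rho * x) ** i / math.factorial(i) / rho

omega0, beta = 0.15, 3.0          # values quoted for Figure 1
rho = solve_rho(omega0, beta)
print(rho, gamma(0, beta, rho), [gamma(i, beta, rho) for i in range(1, 5)])
```

As a consistency check, choosing $\omega_0 = e^{-\beta}$ (the zero cost endpoint) returns $\rho = 1$, and the occupancy fractions $\gamma_i(x)$ sum to one over $i \ge 0$.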

4.2. The overflow problem. In the overflow problem an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[ w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n), \]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

\[ w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\Gamma_i(r/n). \]
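The identity for $w(r)$ can be checked on a simulated experiment. The Python sketch below throws $r$ balls uniformly into $n$ urns, counts the overflow directly as the balls in excess of the capacity in each urn, and compares this with $r - nI + n\sum_{i \le I}(I - i)\Gamma_i(r/n)$. The function name, the seed and the sample sizes are assumptions chosen for illustration.

```python
import random


def overflow_two_ways(n, r, capacity, seed=0):
    """Throw r balls into n urns; count overflow directly and via the occupancy formula."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1

    # Direct count: balls beyond the capacity in each urn.
    direct = sum(max(c - capacity, 0) for c in counts)

    # urns_with[i] = n * Gamma_i(r/n), the number of urns holding exactly i balls.
    urns_with = [sum(1 for c in counts if c == i) for i in range(capacity + 1)]
    via_formula = r - n * capacity + sum(
        (capacity - i) * urns_with[i] for i in range(capacity + 1))
    return direct, via_formula


direct, via_formula = overflow_two_ways(n=1000, r=3000, capacity=2)
print(direct, via_formula)
```

Working with the integer counts $n\Gamma_i$ rather than the fractions $\Gamma_i$ keeps the comparison exact.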

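In order to compute

\[ J_O(\eta, \beta) \doteq -\lim_{n} \frac{1}{n} \log P\left( \frac{w(\beta n)}{n} > \eta \right), \]

we therefore consider sample paths which satisfy the end constraint

\[ \sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta \doteq \zeta. \tag{4.2} \]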

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta, \beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

\[ \sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3} \]


We introduce the Lagrangian

\[ L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty} \pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right) + \lambda\left(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\right), \]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[ \log \pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+\lambda. \]

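In terms of variables $C$, $\rho$ and $\nu$ we may write

\[ \pi^*_i = C P_i(\rho\beta) \quad\text{for } i \ge I \]

and

\[ \pi^*_i = C P_i(\rho\beta)\,\nu^{I-i} \quad\text{for } i \le I. \]

The distribution is conditionally Poisson with parameter $\rho\beta/\nu$ for $i \le I$, and conditionally Poisson with parameter $\rho\beta$ for $i \ge I$. For convenience we introduce the notation

\[ Q_I(\rho) \doteq \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} i P_i(\rho\beta), \]

\[ R_I(\rho, \nu) \doteq \nu^{I} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta}\, e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} i P_i\left(\frac{\rho\beta}{\nu}\right). \]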

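Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[ C\left(R_I(\rho, \nu) + Q_I(\rho)\right) = 1, \]

\[ C\left(\frac{\rho\beta}{\nu} R_I(\rho, \nu) + \rho\beta\, Q_I(\rho)\right) = \beta, \]

\[ C\left(I P_I(\rho\beta) + \left(I - \frac{\rho\beta}{\nu}\right) R_I(\rho, \nu)\right) = \zeta. \]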

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \| P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[ (\rho/\nu - 1) R_I(\rho, \nu) + (\rho - 1) Q_I(\rho) = 0, \]

\[ (I - \rho\beta/\nu - \zeta) R_I(\rho, \nu) - \zeta Q_I(\rho) + I P_I(\rho\beta) = 0. \]

Each equation implicitly defines a curve, $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed as

\[ J_O(\eta, \beta) = D(\pi^* \| P(\beta)) = \sum_{i=0}^{\infty} \pi^*_i \log\left(C e^{\beta - \rho\beta} \rho^{i} \nu^{(I-i)^+}\right) = \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu. \]
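One way to explore these equations numerically is to fix $\rho$, solve the first reduced equation for $\nu$, and read off $C$, $\zeta$ and the exponent. The Python sketch below does this by bisection under the assumptions $I \ge 1$ and $\rho > 1$ (the case of larger than expected $\zeta$); it traces the first implicit curve rather than solving for a prescribed $\zeta$, and all function names, the truncation length and the example values are choices made for this sketch.

```python
import math


def pois(i, lam):
    """Poisson(lam) probability of the value i."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def Q(I, rho, beta, terms=400):
    """Q_I(rho): Poisson(rho*beta) mass on {I, I+1, ...} (truncated sum)."""
    return sum(pois(i, rho * beta) for i in range(I, I + terms))


def R(I, rho, nu, beta):
    """R_I(rho, nu), written as sum_{i<I} P_i(rho*beta) * nu**(I - i)."""
    return sum(pois(i, rho * beta) * nu ** (I - i) for i in range(I))


def solve_for_fixed_rho(I, rho, beta, iters=200):
    """Given rho > 1, find nu from the mean equation, then C, zeta and the exponent."""
    q = Q(I, rho, beta)

    def g(nu):
        return (rho / nu - 1.0) * R(I, rho, nu, beta) + (rho - 1.0) * q

    lo, hi = rho, 2.0 * rho          # g(rho) > 0 when rho > 1
    while g(hi) > 0.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    r = R(I, rho, nu, beta)
    C = 1.0 / (r + q)
    zeta = C * (I * pois(I, rho * beta) + (I - rho * beta / nu) * r)
    J = (math.log(C) + beta * (1.0 - rho) + beta * math.log(rho)
         + zeta * math.log(nu))
    return nu, C, zeta, J


def pi_star(i, I, rho, nu, C, beta):
    """Minimising distribution: C * P_i(rho*beta) * nu**((I - i)^+)."""
    return C * pois(i, rho * beta) * nu ** max(I - i, 0)


I, beta, rho = 2, 2.0, 1.2
nu, C, zeta, J = solve_for_fixed_rho(I, rho, beta)
print(nu, C, zeta, J)
```

To hit a prescribed $\zeta$ one could wrap this routine in an outer one-dimensional search over $\rho$; that step is omitted here.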

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[ J_C(\alpha, \beta, \xi) \doteq -\lim_{n} \frac{1}{n} \log P\left( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \right), \]

where

\[ \xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta) \]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7 subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[ \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi, \]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[ \omega_{kj} = \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W, & j + k \le I, \\ C_k P_j(\rho\beta), & j + k > I. \end{cases} \]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[ J_C(\alpha, \beta, \xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k \log C_k. \]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
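The numbers in this example can be reproduced approximately with a short computation. The Python sketch below takes the stated form $\pi^*_{kj} = C_k P_j(\rho\beta) W$ for $j + k \le I$ and $C_k P_j(\rho\beta)$ otherwise, eliminates the $C_k$ by normalization, and solves for $W$ and $\rho$ by nested bisection. The nested bisection, the bracket for $\rho$, the truncation of the Poisson sums and all function names are assumptions of this sketch; the source only states that the constants solve the constraint equations.

```python
import math


def pois(j, lam):
    """Poisson(lam) probability of the value j."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def tilted_solution(rho, alpha, I, beta, xi, jmax=150):
    """For a trial rho, choose W to meet the xi constraint; return W, C_k and the mean."""
    p = [pois(j, rho * beta) for j in range(jmax)]
    low = [sum(p[:I - k + 1]) for k in range(len(alpha))]    # mass with j + k <= I
    high = [sum(p[I - k + 1:]) for k in range(len(alpha))]   # mass with j + k > I

    def low_fraction(log_w):
        w = math.exp(log_w)
        return sum(a * w * lo / (w * lo + hi)
                   for a, lo, hi in zip(alpha, low, high))

    a, b = -60.0, 60.0          # bisection on log W; low_fraction is increasing
    for _ in range(200):
        m = 0.5 * (a + b)
        if low_fraction(m) < xi:
            a = m
        else:
            b = m
    W = math.exp(0.5 * (a + b))
    C = [1.0 / (W * lo + hi) for lo, hi in zip(low, high)]
    mean = sum(alpha[k] * C[k]
               * sum(j * p[j] * (W if j + k <= I else 1.0) for j in range(jmax))
               for k in range(len(alpha)))
    return W, C, mean


def coupon_exponent(alpha, I, beta, xi):
    lo, hi = 0.05, 5.0          # assumed bracket for rho
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tilted_solution(mid, alpha, I, beta, xi)[2] < beta:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    W, C, mean = tilted_solution(rho, alpha, I, beta, xi)
    J = (beta * (1.0 - rho + math.log(rho)) + xi * math.log(W)
         + sum(a * math.log(c) for a, c in zip(alpha, C)))
    return rho, W, C, mean, J


alpha, I, beta, xi = [0.5, 0.3, 0.2], 3, 2.0, 0.55
zero_cost = sum(a * sum(pois(j, beta) for j in range(I - k + 1))
                for k, a in enumerate(alpha))
rho, W, C, mean, J = coupon_exponent(alpha, I, beta, xi)
print(zero_cost, rho, W, J)
```

With $n = 100$, the probability estimate quoted in the text corresponds to $e^{-nJ_C}$ evaluated at the computed exponent.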

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

A.1. Preliminaries.

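A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[ \sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \quad i = 0, \ldots, I \quad\text{(monotonicity)}, \tag{A.4} \]

and

\[ \sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad\text{(conservation)}. \tag{A.5} \]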
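The two conditions are straightforward to check mechanically. The following Python sketch is one possible encoding, assuming that `alpha` lists $\alpha_0, \ldots, \alpha_{I+1}$, that `omega` lists $\omega_0, \ldots, \omega_I$, and that $\omega_{I+} = 1 - \sum_{i \le I} \omega_i$; the function name and the numerical tolerance are hypothetical.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (A.4) and conservation (A.5) conditions."""
    I = len(omega) - 1
    if len(alpha) != I + 2:
        raise ValueError("alpha must have I + 2 entries, omega I + 1 entries")
    omega_plus = 1.0 - sum(omega)

    # (A.4): partial sums of alpha dominate partial sums of omega.
    monotone = all(sum(alpha[:i + 1]) + tol >= sum(omega[:i + 1])
                   for i in range(I + 1))

    # (A.5): the lower bound on balls held at the end cannot exceed
    # the balls held initially plus the beta balls per urn thrown.
    balls_end = sum(i * w for i, w in enumerate(omega)) + (I + 1) * omega_plus
    balls_start = sum(i * a for i, a in enumerate(alpha))
    conserved = balls_end <= balls_start + beta + tol
    return monotone and conserved


print(is_feasible([1.0, 0.0], [0.15], 3.0))   # classical example, I = 0
print(is_feasible([1.0, 0.0], [0.15], 0.5))   # too few balls to fill 85% of the urns
```

In the second call, at least 0.85 balls per urn are needed to leave only 15% of the urns empty, so the conservation condition fails for $\beta = 0.5$.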

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[ \sum_{i=0}^{I} \left( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \right) = \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) = (I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j). \tag{A.6} \]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[ \gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \quad i = 0, \ldots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta} \]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[ J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx. \]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[ L(\psi, \theta) = D(\theta \| \gamma) = \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} = \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left(1 - \sum_{i=0}^{I} \theta_i\right) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}. \]


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
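To make the definition concrete, the Python sketch below evaluates the local rate function as the relative entropy between the rate vector $(\theta_0, \ldots, \theta_I, \theta_{I+})$ and the occupancy vector $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$ recovered from the cumulative occupancies. The function name, the argument layout and the handling of zero entries (with the convention $0 \log 0 = 0$) are assumptions of this sketch.

```python
import math


def local_rate(psi, theta):
    """L(psi, theta) = D(theta || gamma), with gamma_i = psi_i - psi_{i-1}.

    psi   : cumulative occupancies psi_0, ..., psi_I
    theta : rates theta_0, ..., theta_I (theta_{I+} = 1 - sum(theta))
    """
    I = len(psi) - 1
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, I + 1)]
    gamma.append(1.0 - psi[I])                    # gamma_{I+}
    rates = list(theta) + [1.0 - sum(theta)]      # append theta_{I+}

    total = 0.0
    for t, g in zip(rates, gamma):
        if t < -1e-12:
            return math.inf                       # not a valid rate vector
        if t <= 0.0:
            continue                              # convention 0 log 0 = 0
        if g <= 0.0:
            return math.inf                       # mass where gamma has none
        total += t * math.log(t / g)
    return total


# Zero-cost behaviour: balls enter occupancy-i urns at rate gamma_i.
g = [math.exp(-1.0), math.exp(-1.0), math.exp(-1.0) / 2.0]
psi = [g[0], g[0] + g[1], g[0] + g[1] + g[2]]
print(local_rate(psi, g))                  # zero up to rounding
print(local_rate(psi, [0.5, 0.3, 0.1]))    # strictly positive
```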

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[ \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx} \frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x)) \]

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

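In the case at hand, the Euler–Lagrange equations are given by

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right] \tag{A.7} \]

for $i = 0, \ldots, I - 1$, and by

\[ -\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\left[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right]. \tag{A.8} \]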

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[ L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9} \]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \right] \tag{A.10} \]

for $i = 0, \ldots, I - 1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$, which satisfy the equations

\[ \sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1, \]

\[ \sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} i C P_i(\rho\beta) = \beta. \]
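In the exponential case these two equations reduce to a one-dimensional search: dividing the second by the first shows that the conditional mean of a Poisson($\rho\beta$) variable, given that it exceeds $I$, must equal $(\beta - \sum_{i \le I} i\omega_i)/(1 - \sum_{i \le I} \omega_i)$, after which $C$ follows from the first equation. The Python sketch below implements this reduction by bisection; the function names, the bracket and the truncation of the Poisson tail are assumptions of this sketch rather than a method given in the source.

```python
import math


def pois(i, lam):
    """Poisson(lam) probability of the value i."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def tail(lam, start, weighted=False, terms=400):
    """Sum of P_i(lam), or of i * P_i(lam), over i >= start (truncated)."""
    return sum((i if weighted else 1.0) * pois(i, lam)
               for i in range(start, start + terms))


def twist_parameters(omega, beta, iters=200):
    """Return (C, rho) for empty initial conditions in the exponential case."""
    I = len(omega) - 1
    mass = 1.0 - sum(omega)                                  # omega_{I+}
    balls = beta - sum(i * w for i, w in enumerate(omega))   # balls in those urns
    if mass <= 0.0 or balls <= (I + 1) * mass:
        raise ValueError("not an exponential case (strict conservation needed)")
    target = balls / mass

    def conditional_mean(rho):
        lam = rho * beta
        return tail(lam, I + 1, weighted=True) / tail(lam, I + 1)

    lo, hi = 1e-6, 1.0           # conditional_mean increases from I + 1
    while conditional_mean(hi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if conditional_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = mass / tail(rho * beta, I + 1)
    return C, rho


# Terminal occupancies equal to the Poisson values give the untwisted solution.
beta = 2.0
C, rho = twist_parameters([pois(i, beta) for i in range(3)], beta)
print(C, rho)
```

When the terminal constraints coincide with the zero-cost Poisson occupancies, the search returns $C = \rho = 1$ up to numerical error, which serves as a basic sanity check.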

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[ \psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C P_k(\rho\beta)) \left(1 - \frac{x}{\beta}\right)^{k}, \tag{A.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!} (-1)^i \psi_0^{(i)}(x), \quad 0 \le i \le I, \tag{A.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) \]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[ \psi_0(x) = \sum_{k=0}^{I} \omega_k \left(1 - \frac{x}{\beta}\right)^{k}. \tag{A.10} \]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.
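The extremals of Theorem A.1 are easy to evaluate once the derivatives of $\psi_0$ are written out term by term. The Python sketch below does this for given $(\omega, \beta, C, \rho)$ and covers both the exponential case and the polynomial case $C = 0$. The function names are hypothetical, and the twist parameters are taken as inputs rather than solved for.

```python
import math


def pois(k, lam):
    """Poisson(lam) probability of the value k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def signed_derivative(i, x, omega, beta, C, rho):
    """(-1)^i times the i-th derivative of psi_0 at x."""
    I = len(omega) - 1
    value = C * rho ** i * math.exp(-rho * x)
    for k in range(i, I + 1):          # empty when i > I
        coeff = omega[k] - C * pois(k, rho * beta)
        value += (coeff * beta ** (-i)
                  * math.factorial(k) / math.factorial(k - i)
                  * (1.0 - x / beta) ** (k - i))
    return value


def gamma(i, x, omega, beta, C, rho):
    """gamma_i(x) = x^i / i! * (-1)^i * psi_0^{(i)}(x)."""
    return x ** i / math.factorial(i) * signed_derivative(i, x, omega, beta, C, rho)


# Polynomial extremal: sum(omega) = 1 and beta = sum(k * omega_k), C = 0.
omega = [0.2, 0.3, 0.5]
beta = sum(k * w for k, w in enumerate(omega))
print([gamma(i, beta, omega, beta, 0.0, 1.0) for i in range(3)])   # terminal values
print(sum(gamma(i, 0.7, omega, beta, 0.0, 1.0) for i in range(3)))  # total mass at x = 0.7
```

As a second check, choosing $\omega_k = P_k(\beta)$ with $C = \rho = 1$ makes every polynomial coefficient vanish, so that $\gamma_i(x)$ reduces to the Poisson occupancy $P_i(x)$ of the zero-cost path.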

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[ (-1)^i \varphi^{(i)}(x) \ge 0 \quad\text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11} \]


This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ satisfies

\[ (-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} (\omega_k - C P_k(\rho\beta))\, \beta^{-i} \frac{k!}{(k - i)!} \left(1 - \frac{x}{\beta}\right)^{k-i} \]

for $i = 0, \ldots, I$, and

\[ (-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12} \]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1} \psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[ \varphi(\beta) = (-1)^I \psi_0^{(I)}(\beta) = \rho^I C e^{-\rho\beta} + \left( \omega_I - C e^{-\rho\beta} \frac{(\rho\beta)^I}{I!} \right) \beta^{-I} I! = \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I \ge 0, \]

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[ (-1)^I \psi_0^{(I)}(x) = \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I > 0 \quad\text{for all } x \in [0, \beta]. \]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[ \psi_0(0) = C\left(1 - \sum_{i=0}^{I} P_i(\rho\beta)\right) + \sum_{i=0}^{I} \omega_i = 1. \]


A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[ \gamma_i(x) = \frac{x^i}{i!} (-1)^i \psi_0^{(i)}(x), \quad i = 0, 1, \ldots, I, I + 1, \ldots. \tag{A.13} \]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x)$, and thus

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x) = \psi_0(0) = 1. \]

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$ the rate of decrease of each cumulative occupancy is

\[ \theta_i(x) = -\dot\psi_i(x) = -\sum_{k=0}^{i} \dot\gamma_k(x) = \sum_{k=1}^{i} \frac{-x^{k-1}}{(k-1)!} (-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!} (-1)^{k+1} \psi_0^{(k+1)}(x) = \frac{x^i}{i!} (-1)^{i+1} \psi_0^{(i+1)}(x) \ge 0 \tag{A.14} \]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[ \sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1. \]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[ \sum_{i=0}^{I} \theta_i(x) \le 1 \]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[ \theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x), \]

\[ 1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x), \]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \quad x \in (0, \beta),\ i \ge 0. \tag{A.15} \]

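Then

\[ \frac{d}{dx}\left[ -\log\frac{\theta_i}{\gamma_i} \right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \quad x \in (0, \beta), \tag{A.16} \]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).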

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

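Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$ with

\[ (-1)^i \psi_0^{(i)}(x) = C \rho^i e^{-\rho x} \quad\text{for } i > I, \]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ is an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[ \sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0, \]

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{\infty} \left[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \right]. \tag{A.17} \]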

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[ \theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I). \]

Then for each $x \in [0, \beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[ L(\psi, \dot\psi) = \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} = \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi). \]


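Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[ J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx. \]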

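Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[ \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log|\psi_0^{(i)}(x)|\,dx = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} (\dot\psi_i(x) - \dot\psi_{i-1}(x)) \log|\psi_0^{(i)}(x)|\,dx. \tag{A.18} \]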

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[ -\dot\psi_i(x) \log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)\left[\log C + i\log\rho - \rho x\right], \]

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[ \sum_{i=0}^{\infty} \left[ \gamma_i \log|\psi_0^{(i)}| \right]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx. \]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

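Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[ \sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1, \]

\[ -\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \quad i = 0, \ldots, I - 1, \]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{I} \left[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \right]. \tag{A.19} \]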


Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.
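The endpoint formula (A.19) can be compared with a direct numerical integration of the reduced cost function for a small polynomial extremal. The Python sketch below uses the assumed terminal occupancies $\omega = (0.2, 0.3, 0.5)$ with $\beta = \sum_k k\omega_k$, integrates the local cost by Simpson's rule, and also evaluates the relative entropy of $\omega$ with respect to the Poisson distribution with mean $\beta$. The example values, the quadrature and the function names are choices made for this sketch.

```python
import math

omega = [0.2, 0.3, 0.5]                           # assumed terminal occupancies
I = len(omega) - 1
beta = sum(k * w for k, w in enumerate(omega))    # equality in conservation


def d(i, x):
    """(-1)^i psi_0^{(i)}(x) for psi_0(x) = sum_k omega_k (1 - x/beta)^k."""
    return sum(omega[k] * beta ** (-i)
               * math.factorial(k) / math.factorial(k - i)
               * (1.0 - x / beta) ** (k - i) for k in range(i, I + 1))


def gamma(i, x):
    return x ** i / math.factorial(i) * d(i, x)


def local_cost(x):
    """Reduced rate function, using theta_i / gamma_i = d(i+1, x) / d(i, x)."""
    total = 0.0
    for i in range(I):
        theta = x ** i / math.factorial(i) * d(i + 1, x)
        if theta > 0.0:
            total += theta * math.log(d(i + 1, x) / d(i, x))
    return total


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + (2 * j - 1) * h) for j in range(1, n // 2 + 1))
    s += 2.0 * sum(f(a + 2 * j * h) for j in range(1, n // 2))
    return s * h / 3.0


def endpoint_term(i, x):
    g = gamma(i, x)
    return g * math.log(d(i, x)) if g > 0.0 else 0.0


J_integral = simpson(local_cost, 0.0, beta)
J_endpoint = beta + sum(endpoint_term(i, beta) - endpoint_term(i, 0.0)
                        for i in range(I + 1))
J_entropy = sum(w * math.log(w / (math.exp(-beta) * beta ** i / math.factorial(i)))
                for i, w in enumerate(omega))
print(J_integral, J_endpoint, J_entropy)
```

The three printed values should agree to within the quadrature error.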

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i |\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \| P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[ J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \| P(\beta)). \]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}^{\infty}_{+}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[ L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty} \pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right), \]

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[ \pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i \log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i. \]


Following standard Lagrangian arguments, we thus have

\[ \inf_{\pi \in F(1, \omega, \beta)} D(\pi \| P(\beta)) = \inf_{\pi \in F(1, \omega, \beta)} L(\pi, y, z) \ge \inf_{\pi \in \mathbb{R}^{\infty}_{+}} L(\pi, y, z) = D(\pi^* \| P(\beta)). \]

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

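Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[ J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \| P(\beta)), \]

where

\[ \gamma_i(\beta) = \begin{cases} \omega_i, & 0 \le i \le I, \\ C P_i(\rho\beta), & i > I. \end{cases} \]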

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[ J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \left(1 - \sum_{i=0}^{I} \omega_i\right)(\log C + (1 - \rho)\beta) + \left(\beta - \sum_{i=0}^{I} i\omega_i\right)\log\rho, \]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{ki}\}_i$, where the function $\gamma_{ki} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[ \gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma_{k, i-k}(x\beta_k/\beta). \]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

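The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[ \min_{\pi \in S^{|\mathcal{K}|}_{\infty}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \| P(\beta)) \tag{A.20} \]

subject to the conservation constraint

\[ \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21} \]

and the terminal conditions

\[ \omega_i = \sum_{k \le i,\, k \in \mathcal{K}} \alpha_k \pi_{k, i-k} \quad\text{for all } 0 \le i \le I. \tag{A.22} \]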

Recall that $S^{|\mathcal{K}|}_{\infty}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case the constraints imply that

\[ \pi_{kj} = 0, \quad k + j > I, \tag{A.23} \]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S^{|\mathcal{K}|}_{\infty}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S^{|\mathcal{K}|}_{\infty}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega \in S_{\bar I}$, where $\bar I > I$. Any feasible point for constraints $(\alpha, \bar\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
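The ordered filling construction can be sketched in a few lines of Python. The version below processes the terminal constraints $\omega_0, \omega_1, \ldots, \omega_I$ in order and, for each, draws mass from the classes $k \le i$ starting with the lowest $k$. The order in which classes are drawn upon is an assumption of this sketch (the construction only requires that enough mass remains), and the function name and example data are hypothetical.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Build pi[k][j] with sum_{k<=i} alpha_k * pi[k][i-k] = omega_i (polynomial case)."""
    I = len(omega) - 1
    K = len(alpha) - 1
    if K > I:
        raise ValueError("expected K <= I")
    remaining = list(alpha)            # alpha_k times the unassigned mass of class k
    pi = [[0.0] * (I - k + 1) for k in range(K + 1)]
    for i, need in enumerate(omega):
        for k in range(min(i, K) + 1):
            if alpha[k] <= 0.0:
                continue
            take = max(0.0, min(remaining[k], need))
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition violated at i = %d" % i)
    return pi


alpha = [0.5, 0.3, 0.2]
omega = [0.1, 0.2, 0.3, 0.4]           # sums to one: a polynomial case with I = 3
pi = ordered_filling(alpha, omega)
for row in pi:
    print(row)
```

For these inputs the implied number of additional balls per urn is $\beta = \sum_i i\omega_i - \sum_k k\alpha_k = 1.3$, and the mean constraint (A.21) then holds automatically for the constructed point.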

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P, Q$ in $S_\infty$ this distance is simply

\[ \sum_{i:P_i>Q_i} (P_i - Q_i) = \sum_{i:Q_i>P_i} (Q_i - P_i), \]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A<\infty$ define the set

\[ \mathcal{H}_A = \Big\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_k \alpha_k D(\pi(k)\|P(\beta)) \le A \Big\}. \]
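The two sums in the distance formula above agree because both distributions have total mass one. A minimal numerical check, with illustrative names (`excess_mass`) and made-up example vectors that are not from the paper:

```python
def excess_mass(p, q):
    """Sum of p_i - q_i over the indices where p_i exceeds q_i."""
    return sum(pi - qi for pi, qi in zip(p, q) if pi > qi)


# Two example distributions on {0, 1, 2, 3} (hypothetical values).
P = [0.5, 0.3, 0.2, 0.0]
Q = [0.25, 0.25, 0.25, 0.25]

d_pq = excess_mass(P, Q)  # mass where P exceeds Q
d_qp = excess_mass(Q, P)  # mass where Q exceeds P
half_l1 = 0.5 * sum(abs(a - b) for a, b in zip(P, Q))

if __name__ == "__main__":
    print(d_pq, d_qp, half_l1)
```

All three quantities coincide, which is the familiar statement that this distance equals half the $\ell_1$ distance between the two probability vectors.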

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $\mathcal{H}_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[ \mathcal{Q}_A(\alpha,\omega,\beta) = \mathcal{F}(\alpha,\omega,\beta) \cap \mathcal{H}_A \]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $\mathcal{Q}_A$.

Since there are solutions with finite cost, it is automatic that $\mathcal{Q}_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $\mathcal{Q}_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in \mathcal{Q}_A$, that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\omega,\beta)$. (That it lies in $\mathcal{H}_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi(k)$ is equal to the limit of the means of the $\pi^{(n)}(k)$. This will be the case if we can show, for any sequence in $\mathcal{Q}_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}(k)$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[ \sum_{j \ge m} j\,\pi^{(n)}_{kj} \le \varepsilon \tag{A.24} \]

for all $n$ and $k$. Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

\[ xy \le (x\log x - x + 1) + (e^y - 1), \]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\, e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty. \]
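Both ingredients above are easy to check numerically. The sketch below uses illustrative function names (`young_gap`, `poisson_exp_moment`, `closed_form`) and an arbitrary truncation of the infinite sum; none of these appear in the paper.

```python
import math


def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y; should be >= 0 for x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0  # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y


def poisson_exp_moment(beta, delta, terms=200):
    """Partial sum of sum_j e^{j/delta} P_j(beta), computed in log space."""
    total = 0.0
    for j in range(terms):
        log_term = j / delta - beta + j * math.log(beta) - math.lgamma(j + 1)
        total += math.exp(log_term)
    return total


def closed_form(beta, delta):
    """The closed-form value exp(beta * e^{1/delta} - beta)."""
    return math.exp(beta * math.exp(1.0 / delta) - beta)


if __name__ == "__main__":
    print(young_gap(math.e, 1.0))  # equality case x = e^y
    print(poisson_exp_moment(2.0, 1.0), closed_form(2.0, 1.0))
```

The gap vanishes exactly when $x = e^y$, which is the maximizer in the supremum defining $e^y - 1$.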


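Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\begin{align*}
\sum_{j\ge m} j\,\pi^{(n)}_{kj}
&= \sum_{j\ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j\ge 0} \delta\left[\pi^{(n)}_{kj}\log\left(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\right) - \pi^{(n)}_{kj} + P_j(\beta)\right] + \delta\sum_{j\ge m}\big(e^{j/\delta}-1\big)P_j(\beta) \\
&= \delta\, D(\pi^{(n)}(k)\,\|\,P(\beta)) + \delta\sum_{j\ge m}\big(e^{j/\delta}-1\big)P_j(\beta) \\
&\le \delta A + \delta\sum_{j\ge m}\big(e^{j/\delta}-1\big)P_j(\beta),
\end{align*}

where the second equality uses the fact that both $\pi^{(n)}(k)$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small, and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]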

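We now apply the uniform integrability to analyze the limits of the means. Writing $\beta^{(n)}_k$ and $\beta_k$ for the means of $\pi^{(n)}(k)$ and $\pi(k)$, it follows from [5], Theorem 5.4, that

\[ \beta^{(n)}_k \to \beta_k. \]

Finally,

\[ \sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta, \]

so that $\pi \in \mathcal{F}(\alpha,\omega,\beta)$ and $\mathcal{Q}_A$ is compact.

We have shown that $\mathcal{Q}_A$ is compact under the Prohorov metric and that it is nonempty. Now let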

\[ G = \inf_{\pi\in\mathcal{Q}_A} \sum_{k\in\mathcal{K}} \alpha_k D(\pi(k)\,\|\,P(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)} \in \mathcal{Q}_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}(k)\,\|\,P(\beta)) \le G + 1/n$. Since $\mathcal{Q}_A$ is compact this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in \mathcal{Q}_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[ \sum_{k\in\mathcal{K}} \alpha_k D(\pi^*(k)\,\|\,P(\beta)) \le \liminf_n \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{(n)}(k)\,\|\,P(\beta)) = G, \]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

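Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems, with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$, such that

\[ (\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta) \]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[ \liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta). \]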

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$ we may choose a subsequence of problems such that

\[ \mathcal{J}^{(n)} \le G + 1/n < A, \]

and for each problem in this subsequence we define $\pi^{(n)} \in \mathcal{Q}_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in \mathcal{H}_A$. As in the proof of Lemma A.6, if $\beta < \infty$ then

\[ \beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{kj} \;\to\; \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{kj}, \qquad k = 0,1,\ldots,K. \]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.
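The closed form for the infimum quoted above is attained by the Poisson distribution with mean $\lambda$. The following sketch checks this numerically; the helper names and the truncation at 100 terms are assumptions made for illustration only.

```python
import math


def poisson_pmf(mean, n_terms):
    """First n_terms Poisson probabilities, computed in log space."""
    return [math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))
            for j in range(n_terms)]


def rel_entropy(p, q):
    """Relative entropy D(p || q) = sum_j p_j log(p_j / q_j)."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def min_cost_given_mean(lam, beta):
    """Closed form beta - lam + lam * log(lam / beta) quoted in the proof."""
    return beta - lam + lam * math.log(lam / beta)


lam, beta = 3.0, 1.0
d_poisson = rel_entropy(poisson_pmf(lam, 100), poisson_pmf(beta, 100))
# A competitor with the same mean: unit mass at j = 3.
d_point = rel_entropy([0.0, 0.0, 0.0, 1.0], poisson_pmf(beta, 4))

if __name__ == "__main__":
    print(d_poisson, min_cost_given_mean(lam, beta), d_point)
```

For fixed $\beta$ the closed form grows like $\lambda\log\lambda$, so a group whose mean $\beta^{(n)}_k$ blows up fast enough to keep $\alpha^{(n)}_k\beta^{(n)}_k$ away from zero cannot have bounded weighted cost.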

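Thus

\[ \sum_{k\in\mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta, \]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\begin{align*}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)}))
&\ge \liminf_n \sum_{k:\alpha_k>0} \alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)})) \\
&\ge \sum_{k:\alpha_k>0} \alpha_k D(\pi(k)\,\|\,P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{align*}

as desired.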

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$ and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$ the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{mn} > 0$, that $\tilde\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$ and that $\tilde\pi$ has finite support. If we define

\[ \tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\tilde\pi_{kj}, \]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta > 0$ we may define

\[ \hat\beta \doteq (\beta - \eta\tilde\beta)/(1-\eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1-\eta). \]

By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, the $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$ and finally form $\bar\pi = \eta\tilde\pi + (1-\eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

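We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\begin{align*}
f'(\varepsilon)
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left(\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} + 1\right)(\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{P_n(\beta)}
 + \sum_{k\in\mathcal{K}} \alpha_k \sum_{j:(k,j)\ne(m,n)} \bar\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}
 - \sum_{k\in\mathcal{K}} \alpha_k \sum_{j} \pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}.
\end{align*}

The second equality holds because each $\bar\pi(k)$ and $\pi(k)$ is a probability distribution, so the constant term sums to zero. In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[ -\sum_k \alpha_k D(\pi(k)\,\|\,P(\beta)) \le 0. \]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.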

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise}. \end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$, and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{j:k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
 + \sum_{k\in\mathcal{K}} z_k\alpha_k\Big(1 - \sum_{l:k+l\in\mathcal{I}} \pi_{kl}\Big)
 + \sum_{i\in\mathcal{I}} w_i\Big(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Big) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[ \pi^*_{kj} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
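The product form $D_k P_j(\beta) W_{k+j}$ suggests a simple way to compute the minimizer numerically in small polynomial examples. The sketch below uses iterative proportional fitting, which is a technique swapped in here for illustration and not the method of the paper: starting from the reference weights $\alpha_k P_j(\beta)$, it alternately rescales each group $k$ and each terminal level $i = k + j$ to match the constraints, so every iterate keeps the product form. The function names, the number of sweeps, and the example data are assumptions; the sketch requires $K \le I$ and strictly positive $\alpha$ and $\omega$.

```python
import math


def poisson_weight(beta, j):
    """Poisson probability P_j(beta)."""
    return math.exp(-beta + j * math.log(beta) - math.lgamma(j + 1))


def ipf_minimizer(alpha, omega, beta, sweeps=2000):
    """Approximate pi*[k][j] on the support k + j <= I by alternating scaling."""
    K, I = len(alpha) - 1, len(omega) - 1
    # q[k][j] stands for alpha_k * pi_kj, initialised at alpha_k * P_j(beta).
    q = [[alpha[k] * poisson_weight(beta, j) for j in range(I - k + 1)]
         for k in range(K + 1)]
    for _ in range(sweeps):
        # Match the group masses: sum_j q[k][j] = alpha_k.
        for k in range(K + 1):
            row = sum(q[k])
            q[k] = [v * alpha[k] / row for v in q[k]]
        # Match the terminal levels: sum_k q[k][i-k] = omega_i.
        for i in range(I + 1):
            ks = range(min(i, K) + 1)
            diag = sum(q[k][i - k] for k in ks)
            for k in ks:
                q[k][i - k] *= omega[i] / diag
    return [[v / alpha[k] for v in q[k]] for k in range(K + 1)]


if __name__ == "__main__":
    # Hypothetical example: beta = sum_i i*omega_i - sum_k k*alpha_k = 0.8.
    for row in ipf_minimizer([0.5, 0.5], [0.2, 0.3, 0.5], 0.8):
        print(row)
```

In this two-group example the feasible set is one-dimensional, and the product form reduces to a single cross-ratio identity between $(\pi_{01}, \pi_{10})$ and $(\pi_{02}, \pi_{11})$, which the computed point satisfies.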

We now give the corresponding theorem for the exponential case.

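Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\, i\le I}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\ 0, & \text{otherwise}. \end{cases} \]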

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$, define

\[ \beta^{(M)} \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\,\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l\le M} \pi^*_{kl}, \]

and consider the problem of minimizing

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} \]

subject to the constraints

\begin{align*}
\sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl} &= \beta^{(M)}, \\
\sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{align*}

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\begin{align*}
L^{(M)} = {} & \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
 + y^{(M)}\Big(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl}\Big) \\
& + \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k\Big(\eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl}\Big)
 + \sum_{i\in\mathcal{I}} w^{(M)}_i\Big(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k}\Big).
\end{align*}

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[ \pi^*_{kj} = P_j(\beta)\,e^{jy^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}} \]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
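The substitution for $C_k$ rests on the elementary identity $P_j(\beta)\rho^j = e^{(\rho-1)\beta}P_j(\rho\beta)$, which converts the exponentially tilted Poisson weights into Poisson weights with mean $\rho\beta$. A quick numerical confirmation, with illustrative function names and parameter values:

```python
import math


def poisson_pmf(mean, j):
    """Poisson probability P_j(mean)."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))


def tilted(beta, rho, j):
    """Left-hand side: P_j(beta) * rho^j, i.e. P_j(beta) * e^{j y} with rho = e^y."""
    return poisson_pmf(beta, j) * rho ** j


def via_constant(beta, rho, j):
    """Right-hand side: e^{(rho - 1) beta} * P_j(rho * beta)."""
    return math.exp((rho - 1.0) * beta) * poisson_pmf(rho * beta, j)


if __name__ == "__main__":
    for j in range(5):
        print(j, tilted(1.5, 0.7, j), via_constant(1.5, 0.7, j))
```

Multiplying the identity by $e^{z_k - 1}$ gives $C_k = e^{(\rho-1)\beta - 1 + z_k}$, matching the substitution stated at the end of the proof.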

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case, we define $\rho_k \doteq \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I - k$ terminal constraints, has mean $\beta_k$, and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) \]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define $\bar\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$ and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k\bar\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$ and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence, curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

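In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\begin{align*}
\bar\psi_{k0}(x)
&= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \big(\omega_{ki} - C_kP_i(\rho_k\beta_k)\big)\left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \big(\omega_{ki} - C_kP_i(\rho\beta)\big)\left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)P_i(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{i}\right].
\end{align*}

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_kP_i(\rho\beta)W_{k+i}$ when $i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{00}$ is

\begin{align*}
(-1)^k\psi_0^{(k)}(x)
&= \alpha_0C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \tag{A.25} \\
&= \alpha_0C_0\rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)P_j(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\bar\psi_{k0}(x).
\end{align*}

The second equality uses $P_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$ together with the change of index $j = i - k$.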

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\begin{align*}
\gamma_i(x)
&= \sum_{k=0}^{i} \alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{align*}

Similarly, we obtain

\begin{align*}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i} \alpha_k\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{align*}

so that, with $\theta_i(x) = -\dot\psi_i(x)$, the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[ \frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27} \]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[ (-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\bar\psi_{k0}(x) \tag{A.28} \]

and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29} \]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i \]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11 the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\big(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\big)$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

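Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\begin{align*}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(\beta)\log\big|\psi_0^{(i)}(\beta)\big|\right] - \sum_{k=0}^{K} \alpha_k\log\big|\psi_0^{(k)}(0)\big| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{align*}

The fact that $\bar\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \bar\psi_{k0}(x)$, so that

\[ \frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j}\psi_{k0}^{(j)}\!\left(\frac{x\beta_k}{\beta}\right)e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\bar\gamma_{k\cdot}(\beta)\,\|\,P(\beta)). \]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).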

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

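We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\,\|\,\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.

(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$ we denote $\gamma^\varepsilon \doteq (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[ G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x)\,\|\,\gamma^\varepsilon(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.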
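The joint convexity of $(\theta,\gamma) \to D(\theta\,\|\,\gamma)$ that underlies Lemma A.15 can be spot-checked numerically. This is only a sanity check on made-up probability vectors, with illustrative function names; it is not a proof.

```python
import math


def rel_entropy(theta, gamma):
    """Relative entropy D(theta || gamma) for finite probability vectors."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)


def mix(u, v, lam):
    """Convex combination lam * u + (1 - lam) * v, componentwise."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]


def convexity_gap(theta1, gamma1, theta2, gamma2, lam):
    """lam*D(theta1||gamma1) + (1-lam)*D(theta2||gamma2) - D(mixed||mixed)."""
    lhs = rel_entropy(mix(theta1, theta2, lam), mix(gamma1, gamma2, lam))
    rhs = (lam * rel_entropy(theta1, gamma1)
           + (1.0 - lam) * rel_entropy(theta2, gamma2))
    return rhs - lhs


if __name__ == "__main__":
    t1, g1 = [0.7, 0.2, 0.1], [0.2, 0.5, 0.3]
    t2, g2 = [0.1, 0.3, 0.6], [0.4, 0.4, 0.2]
    for step in range(11):
        print(step / 10, convexity_gap(t1, g1, t2, g2, step / 10))
```

A nonnegative gap for every mixing weight is what joint convexity requires; integrating the pointwise inequality over $x \in [0,\beta]$ gives the convexity of $G$.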

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma) \le J(\tilde\gamma). \]

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

\[ \eta(x) \doteq \tilde\psi(x) - \psi(x), \]

we may write

\begin{align*}
G[\varepsilon] &= \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx \\
&= \int_0^\beta g(x,\varepsilon)\,dx. \tag{A.30}
\end{align*}

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\bar\gamma_{k0}(x)$ and $\bar\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(x) \ge \alpha_i\bar\gamma_{i0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and $\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

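The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^\varepsilon}\!(x)\,\eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^\varepsilon}\!(x)\,\dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\begin{align*}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{align*}

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.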

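We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx \]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[ \int_0^x \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma} dx' = \int_0^x \left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\bigg|_{\gamma} dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\begin{align*}
G'_+[0]
&= \int_0^\beta \left[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right]dx \\
&= \int_0^\beta \dot\eta(x)\cdot\left(-\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right)dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\big(\eta_i(0) - \eta_i(\beta)\big) = 0.
\end{align*}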


The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1-\lambda)\tilde\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

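Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x^{(n)}_1)$ and terminal point $\tilde\gamma(x^{(n)}_2)$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx. \]

It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}\big(\tilde\gamma(x^{(n)}_1),\ \tilde\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1\big) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7 and the last equality from the monotone convergence theorem.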

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation, and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization. Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

Page 16: Large deviation asymptotics for occupancy problems · Introduction. Urn occupancy problems center on the distribution of ... large deviations, calculus of variations, sample paths,

16 P DUPUIS C NUZMAN AND P WHITING

It is well known that $L(\gamma,\eta)$ is finite if and only if $\eta$ is in the convex hull of the support of $b_i(\gamma)$ (see, e.g., [11], Lemma 6.2.3(d)). Therefore, if $L(\gamma,\eta)<\infty$, then $\eta$ can be written as a convex combination of the form

$$\theta_0(e_1-e_0)+\cdots+\theta_I(e_{I+1}-e_I),$$

where $\theta_j\ge 0$ and $\sum_{j=0}^{I}\theta_j\le 1$. Since the vectors $(e_{j+1}-e_j)$, $j=0,1,\dots,I$, are linearly independent, these values are unique. Setting $\theta_{I+1}=1-\sum_{j=0}^{I}\theta_j$, we have $\eta=M\theta$ for a unique probability vector $\theta$. Now assume that $\eta$ takes this form. Then

$$
\begin{aligned}
L(\gamma,M\theta)
&=\sup_{\zeta\in\mathbb{R}^{I+2}}\bigl[\zeta\cdot M\theta-\log\bigl(E[\exp(\zeta\cdot b_i(\gamma))]\bigr)\bigr]\\
&=\sup_{\zeta\in\mathbb{R}^{I+2}}\left[M^{T}\zeta\cdot\theta-\log\left(\left[\sum_{j=0}^{I}\gamma_j\exp(\zeta_{j+1}-\zeta_j)\right]+\gamma_{I+1}\right)\right]\\
&=\sup_{\zeta\in\mathbb{R}^{I+2}}\left[\sum_{j=0}^{I}(\zeta_{j+1}-\zeta_j)\theta_j-\log\left(\left[\sum_{j=0}^{I}\gamma_j\exp(\zeta_{j+1}-\zeta_j)\right]+\gamma_{I+1}\right)\right].
\end{aligned}
$$

Given any values $\mu_0,\dots,\mu_{I+1}$, we can define $\zeta_0,\dots,\zeta_{I+1}$ recursively by $\zeta_0=-\mu_{I+1}$ and $\zeta_{j+1}-\zeta_j=\mu_j-\mu_{I+1}$. With these definitions it is apparent that the last display is equal to

$$
\begin{aligned}
&\sup_{\mu\in\mathbb{R}^{I+2}}\left[\sum_{j=0}^{I}\mu_j\theta_j-\mu_{I+1}(1-\theta_{I+1})-\log\left(\left[\sum_{j=0}^{I}\gamma_j\exp(\mu_j-\mu_{I+1})\right]+\gamma_{I+1}\right)\right]\\
&\qquad=\sup_{\mu\in\mathbb{R}^{I+2}}\left[\sum_{j=0}^{I}\mu_j\theta_j-\mu_{I+1}(1-\theta_{I+1})+\mu_{I+1}-\log\left(\sum_{j=0}^{I+1}\gamma_j\exp\mu_j\right)\right]\\
&\qquad=\sup_{\mu\in\mathbb{R}^{I+2}}\left[\sum_{j=0}^{I+1}\mu_j\theta_j-\log\left(\sum_{j=0}^{I+1}\gamma_j\exp\mu_j\right)\right].
\end{aligned}
$$

According to the Donsker–Varadhan variational formula for relative entropy (e.g., [11], Lemma 1.4.3(a)), the last display equals $D(\theta\|\gamma)$, thereby completing the proof of the upper bound.
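The variational formula used in the last step can be checked numerically. The following is a minimal sketch, not part of the original argument: the vectors `theta` and `gamma` are arbitrary illustrative values, and the candidate maximizer $\mu_j=\log(\theta_j/\gamma_j)$ is the standard one for this formula when all entries are positive.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for probability vectors with gamma > 0 (0 log 0 = 0)."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def dv_objective(mu, theta, gamma):
    """sum_j mu_j theta_j - log(sum_j gamma_j exp(mu_j)), the bracketed quantity."""
    linear = sum(m * t for m, t in zip(mu, theta))
    log_mgf = math.log(sum(g * math.exp(m) for g, m in zip(gamma, mu)))
    return linear - log_mgf

# Illustrative (hypothetical) probability vectors.
theta = [0.5, 0.3, 0.2]
gamma = [0.2, 0.2, 0.6]

d = relative_entropy(theta, gamma)

# The supremum is attained at mu_j = log(theta_j / gamma_j).
mu_star = [math.log(t / g) for t, g in zip(theta, gamma)]
at_maximizer = dv_objective(mu_star, theta, gamma)

# No randomly drawn mu should exceed D(theta || gamma).
rng = random.Random(0)
worst_gap = min(
    d - dv_objective([rng.uniform(-5, 5) for _ in theta], theta, gamma)
    for _ in range(1000)
)
print(d, at_maximizer, worst_gap)
```

The objective at `mu_star` reproduces the relative entropy, and `worst_gap` stays nonnegative, which is what the formula asserts.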

We turn now to the proof of the lower bound. In this proof we will assume that $\gamma_0(0)>0$. Since $\gamma_0$ is always nonincreasing, $\gamma_0(0)=0$ implies that $\gamma_0(x)=0$ for all $x$, and therefore under this condition the first component plays no significant role. A proof analogous to the one given below applies when $\gamma_0(0)=0$, where the role of $\gamma_0(0)$ here is played by the first positive component of $\gamma(0)$.

Let $P_{\Gamma^n(0)}$ [resp., $E_{\Gamma^n(0)}$] denote probability (resp., expected value) given a deterministic initial occupancy $\Gamma^n(0)$. To prove the large deviation lower bound, it suffices to show that given any $\varepsilon>0$ and $\delta>0$, there is $\eta>0$ such that for any initial occupancies satisfying $|\Gamma^n(0)-\gamma(0)|<\eta$,

$$\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(0)}\left(\sup_{0\le x\le\beta}|\Gamma^n(x)-\gamma(x)|<\delta\right)\ge -J(\gamma)-\varepsilon. \tag{3.1}$$

Of course, this inequality is trivial if $J(\gamma)=\infty$, and so we assume that $J(\gamma)<\infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x=0$. To do this, we show that for any $a>0$ there exist $b>0$, $K\in\mathbb{N}$ and an occupancy function $y$ such that $y(0)=\gamma(0)$, $\sup_{0\le x\le\beta}|y(x)-\gamma(x)|<a$, $y_j(x)>bx^{K}$ for all $j=0,1,\dots,I,I+$ and $0<x\le\beta$, and such that

$$J(y)\le J(\gamma).$$

Consider the zero cost trajectory defined by

$$\dot z(x)=Mz(x),\qquad z(0)=\gamma(0).$$

We have the following expression for $z_j$ when $j\le I$:

$$z_j(x)=\left[\sum_{k=0}^{j}\gamma_k(0)\frac{x^{j-k}}{(j-k)!}\right]e^{-x}. \tag{3.2}$$
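The explicit formula for $z_j$ can be checked against the zero cost dynamics by finite differences. This is a sketch under one stated assumption: reading $M\theta=\sum_j\theta_j(e_{j+1}-e_j)$ from the upper bound proof, component $j$ of $Mz$ is taken to be $z_{j-1}-z_j$ (with $z_{-1}=0$). The initial vector `gamma0` is a hypothetical example.

```python
import math

def zero_cost(gamma0, x):
    """z_j(x), j = 0..I, from the explicit formula, given gamma0 = gamma(0)[0..I]."""
    return [
        math.exp(-x) * sum(
            gamma0[k] * x ** (j - k) / math.factorial(j - k) for k in range(j + 1)
        )
        for j in range(len(gamma0))
    ]

gamma0 = [0.6, 0.3, 0.1]  # hypothetical initial fractions for occupancies 0, 1, 2
x, h = 0.7, 1e-6

z = zero_cost(gamma0, x)
z_plus = zero_cost(gamma0, x + h)
z_minus = zero_cost(gamma0, x - h)

# Compare a central-difference derivative with the assumed drift z_{j-1} - z_j.
residuals = []
for j in range(len(gamma0)):
    derivative = (z_plus[j] - z_minus[j]) / (2 * h)
    drift = (z[j - 1] if j > 0 else 0.0) - z[j]
    residuals.append(abs(derivative - drift))

print(max(residuals))
```

Evaluating `zero_cost(gamma0, 0.0)` returns `gamma0` itself, which matches the initial condition $z(0)=\gamma(0)$.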

It is easy to check from this explicit formula that $z_j(x)>bx^{K}$ for some $b>0$, $K=I$, and all $j=0,1,\dots,I,I+$ and $0<x\le\beta$. For $\rho\in(0,1)$, let $y^{\rho}=\rho z+(1-\rho)\gamma$. Then $y^{\rho}$ is the occupancy function that corresponds to the rate $\rho z+(1-\rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z\|z)=0$, we have

$$
\begin{aligned}
J(y^{\rho})&=\int_0^{\beta}D\bigl(\rho z(x)+(1-\rho)\theta(x)\,\big\|\,\rho z(x)+(1-\rho)\gamma(x)\bigr)\,dx\\
&\le\rho\int_0^{\beta}D(z(x)\|z(x))\,dx+(1-\rho)\int_0^{\beta}D(\theta(x)\|\gamma(x))\,dx\\
&=(1-\rho)J(\gamma).
\end{aligned}
$$

All required properties are then obtained by letting $y=y^{\rho}$ for suitably small $\rho\in(0,1)$.

It follows that in proving the lower bound we can assume without loss of generality that for some fixed constants $b>0$ and $K\in\mathbb{N}$, $\gamma_i(x)\ge bx^{K}$ for all $x\in[0,\beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau>0$ small) with sufficiently high probability. Given $\tau\in(0,\beta]$, $\sigma>0$ and $\varepsilon>0$, define

$$h(\gamma)=\begin{cases}0,&|\gamma-\gamma(\tau)|<\sigma/2,\\ 2\varepsilon,&\text{else}.\end{cases}$$

For $n$ large enough that $|\gamma(\lfloor n\tau\rfloor/n)-\gamma(\tau)|\le\sigma/2$ (and independent of $\tau\in(0,\beta]$), we have the inequality

$$
\begin{aligned}
&P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\lfloor n\tau\rfloor/n)|<\sigma\bigr)+e^{-2n\varepsilon}\\
&\qquad\ge P_{\Gamma^n(0)}\bigl(|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\tau)|<\sigma/2\bigr)+e^{-2n\varepsilon}\\
&\qquad\ge E_{\Gamma^n(0)}\bigl(\exp -nh(\Gamma^n(\lfloor n\tau\rfloor/n))\bigr).
\end{aligned}
$$

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b^n_i$:

$$\bar\Gamma^n\left(\frac{i+1}{n}\right)=\bar\Gamma^n\left(\frac{i}{n}\right)+\frac{1}{n}\bar b^n_i,\qquad \bar\Gamma^n(0)=\Gamma^n(0).$$

Furthermore, the distribution of $\bar b^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n),\,0\le j\le i\}$. Let $\mu(\cdot\,|\,\gamma)$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar\mu^n_i(\cdot)$ denote the (random) distribution of $\bar b^n_i$ given $\{\bar\Gamma^n(j/n),\,0\le j\le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

$$
-\frac{1}{n}\log E_{\Gamma^n(0)}\left(\exp -nh\left(\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right)\right)
=\inf\bar E_{\Gamma^n(0)}\left[h\left(\bar\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right)+\frac{1}{n}\sum_{i=1}^{\lfloor n\tau\rfloor-1}D\bigl(\bar\mu^n_i(\cdot)\,\big\|\,\mu(\cdot\,|\,\bar\Gamma^n(i/n))\bigr)\right],
$$

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b^n_i$. We can write $\gamma(\tau)-\gamma(0)=Mv\tau$ for some probability vector $v$. Define a process $\bar b^n_i$ as follows:

$$
\bar b^n_i=\begin{cases}
e_j-e_{j-1}, & \text{if }\left\lfloor\sum_{k=0}^{j-1}v_k\tau n\right\rfloor\le i\le\left\lfloor\sum_{k=0}^{j}v_k\tau n\right\rfloor-1\ \text{ for } 0\le j\le I,\\[4pt]
0, & \text{if }\left\lfloor\sum_{k=0}^{I}v_k\tau n\right\rfloor\le i\le\lfloor\tau n\rfloor-1.
\end{cases}
$$

In other words, $\bar b^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0-e_1$ for an amount of time $v_0\tau$, $e_1-e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j-e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j-e_{j-1}$, the cost is

$$D\bigl(\bar\mu^n_i(\cdot)\,\big\|\,\mu(\cdot\,|\,\bar\Gamma^n(i/n))\bigr)=\log\left(\frac{1}{\bar\Gamma^n_{j-1}(i/n)}\right)=-\log\left(\bar\Gamma^n_{j-1}\left(\frac{i}{n}\right)\right). \tag{3.3}$$
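The per-step cost above is the relative entropy of a point mass with respect to the uncontrolled jump distribution. A small sketch of that identity follows; the vector `occupancy` is a hypothetical set of current occupancy fractions, which double as the probabilities of the corresponding jumps.

```python
import math

def relative_entropy(nu, mu):
    """D(nu || mu) for probability vectors, with the convention 0 log 0 = 0."""
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0)

# Hypothetical occupancy fractions; entry k is the probability that the next
# ball lands in an urn currently holding k balls.
occupancy = [0.4, 0.35, 0.25]

costs = []
for target, prob in enumerate(occupancy):
    # A control that forces the jump `target` with probability one.
    point_mass = [1.0 if k == target else 0.0 for k in range(len(occupancy))]
    costs.append((relative_entropy(point_mass, occupancy), -math.log(prob)))

print(costs)
```

Each pair in `costs` agrees, and the cost grows without bound as the targeted occupancy fraction tends to zero. This is why the proof needs a positive lower bound on the relevant component.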

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n)-\bar\Gamma^n_{j-1}(i/n)=-1/n$ for $\lfloor\sum_{k=0}^{j-1}v_k\tau n\rfloor\le i\le\lfloor\sum_{k=0}^{j}v_k\tau n\rfloor-1$,

$$\bar\Gamma^n_{j-1}\left(\frac{i}{n}\right)\downarrow\bar\Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor\sum_{k=0}^{j}v_k\tau n\right\rfloor\right)$$

as $i\uparrow\lfloor\sum_{k=0}^{j}v_k\tau n\rfloor$. In addition, because the $(j-1)$st component is never modified when $i\ge\lfloor\sum_{k=0}^{j}v_k\tau n\rfloor$, it follows that

$$\bar\Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor\sum_{k=0}^{j}v_k\tau n\right\rfloor\right)\to\gamma_{j-1}(\tau)$$

as $n\to\infty$ and $\eta\to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b>0$ and $K\in\mathbb{N}$ such that $\gamma_j(\tau)\ge b\tau^{K}$ for $j=0,\dots,I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau)\ge b\tau^{K}$ that for all sufficiently large $n$ and small $\eta>0$ there are $C_1,C_2<\infty$ (independent of $\tau$) such that whenever $|\Gamma^n(0)-\gamma(0)|<\eta$, for all $i$,

$$D\bigl(\bar\mu^n_i(\cdot)\,\big\|\,\mu(\cdot\,|\,\bar\Gamma^n(i/n))\bigr)\le C_1[-\log\tau^{K}]\le -C_2\log\tau.$$

In addition, as $n\to\infty$ and $\eta\to 0$,

$$\bar\Gamma^n\left(\frac{1}{n}\lfloor\tau n\rfloor\right)\to\gamma(\tau).$$

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta>0$, $|\Gamma^n(0)-\gamma(0)|<\eta$ implies

$$-\frac{1}{n}\log E_{\Gamma^n(0)}\left(\exp -nh\left(\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right)\right)\le -C_2\tau\log\tau.$$

We now choose $\tau>0$ so that $-C_2\tau\log\tau\le\varepsilon/2$. Choosing $\tau>0$ smaller if need be, we can also guarantee that $|\Gamma^n(x)-\gamma(x)|\le\delta$ for all $x\in[0,\tau]$ w.p.1 if $|\Gamma^n(0)-\gamma(0)|<\eta$ and $\eta>0$ is sufficiently small. The following bound is therefore valid for the given $\tau>0$: for any $\sigma>0$ and all sufficiently small $\eta>0$, $|\Gamma^n(0)-\gamma(0)|<\eta$ implies

$$\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(0)}\left(\left|\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)-\gamma\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right|<\sigma,\ \sup_{x\in[0,\tau]}|\Gamma^n(x)-\gamma(x)|<\delta\right)\ge-\frac{\varepsilon}{2}.$$

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma>0$. To obtain the lower bound for all $x\in[0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma>0$.

Now choose $\zeta\in(0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x\in[\tau,\beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i>0$ for all $i\in\{0,1,\dots,I+1\}$ (i.e., $\gamma\in S_I^{\circ}$, where $S_I^{\circ}$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n,\,n=1,2,\dots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon>0$ and $\zeta>0$ defined above, there is $\sigma>0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\lfloor n\tau\rfloor/n)|<\sigma$,

$$\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}\left(\sup_{\tau\le x\le\beta}|\Gamma^n(x)-\gamma(x)|<\zeta\right)\ge -J(\gamma)-\frac{\varepsilon}{2},$$

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta=\gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

$$\mathcal{J}(\Omega)=\inf_{\omega\in\Omega}\mathcal{J}(\omega).$$

Using Theorem 2.7, we can write

$$
\begin{aligned}
\mathcal{J}(\Omega)&=\inf_{\omega\in\Omega}\ \inf_{\pi\in F(\alpha,\omega,\beta)}\ \sum_{k=0}^{K}\alpha_kD(\pi^{(k)}\|P(\beta))\\
&=\inf_{\pi\in F(\alpha,\Omega,\beta)}\ \sum_{k=0}^{K}\alpha_kD(\pi^{(k)}\|P(\beta)),
\end{aligned}
\tag{4.1}
$$

where we have abused notation to define $F(\alpha,\Omega,\beta)=\bigcup_{\omega\in\Omega}F(\alpha,\omega,\beta)$.

In many cases, for example when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega=\{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^{\circ}$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega\in\Omega^{\circ}}\mathcal{J}(\omega)=\inf_{\omega\in\bar\Omega}\mathcal{J}(\omega)$. For such sets the large deviations lower and upper bounds coincide, so that $-\lim\frac{1}{n}\log P(\Gamma^n(\beta)\in\Omega)=\inf_{\omega\in\Omega}\mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.


4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I=0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta)>\omega_0>e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta)<\omega_0<e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta)=\omega_0$. From (2.6) it is immediate that $C=1/\rho$ and

$$\gamma_0(x)=\frac{1}{\rho}e^{-\rho x}+1-\frac{1}{\rho}.$$

Using (2.5), we find that $\rho$ is determined by the unique nonnegative solution to

$$\rho(1-\omega_0)=1-e^{-\beta\rho}.$$

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
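The equation for $\rho$ has no closed form but is easy to solve numerically. The sketch below uses plain bisection; the function names are illustrative, and the values $\omega_0=0.15$, $\beta=3$ are those of the Figure 1 example. Note that $\rho=0$ always satisfies the equation trivially, so the code brackets the strictly positive root, assuming $0<1-\omega_0<\beta$.

```python
import math

def solve_rho(omega0, beta, tol=1e-12):
    """Positive root of rho * (1 - omega0) = 1 - exp(-beta * rho), by bisection."""
    def f(rho):
        return rho * (1.0 - omega0) - 1.0 + math.exp(-beta * rho)

    # f is convex with f(0) = 0 and f'(0) < 0, so its minimizer lies strictly
    # between 0 and the positive root; use it as the lower end of the bracket.
    lo = math.log(beta / (1.0 - omega0)) / beta
    hi = max(2.0 * lo, 1.0)
    while f(hi) <= 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Fraction of empty urns along the extremal path."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

omega0, beta = 0.15, 3.0
rho = solve_rho(omega0, beta)
C = 1.0 / rho
print(rho, C, gamma0(0.0, rho), gamma0(beta, rho))
```

The computed path starts at $\gamma_0(0)=1$ and ends at $\gamma_0(\beta)=\omega_0$. For this example $\rho$ comes out slightly above 1, consistent with more empty urns than the zero cost value $e^{-\beta}$.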

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x)=P_i(\rho x)/\rho$ for $i>0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns $\omega_0=0.15$ is three times larger than the expected value $P_0(3.0)=0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I>0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0=1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

$$w(r)=r-n\Gamma_1(r/n)-2n\Gamma_2(r/n)-\cdots-In\Gamma_I(r/n)-In\Gamma_{I+}(r/n),$$

or, since $\Gamma_{I+}=1-\sum_{i=0}^{I}\Gamma_i$,

$$w(r)=r-nI+n\sum_{i=0}^{I}(I-i)\Gamma_i(r/n).$$
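The second expression for $w(r)$ can be sanity-checked by simulation. In this sketch (the function name, parameter values and seed are all illustrative), balls are thrown into infinite-capacity urns, and the overflow is computed both directly and from the occupancy fractions.

```python
import random
from collections import Counter

def overflow_two_ways(n, r, capacity, seed=0):
    """Throw r balls into n urns; return the overflow counted two ways."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1

    # Direct count: every ball beyond `capacity` in an urn falls to the floor.
    direct = sum(max(c - capacity, 0) for c in counts)

    # Formula: w(r) = r - n*I + n * sum_{i<=I} (I - i) * Gamma_i(r/n),
    # where n * Gamma_i is the number of urns holding exactly i balls.
    urns_with = Counter(counts)
    via_formula = r - n * capacity + sum(
        (capacity - i) * urns_with.get(i, 0) for i in range(capacity + 1)
    )
    return direct, via_formula

print(overflow_two_ways(n=200, r=500, capacity=2))
```

Both entries of the returned pair coincide for every seed, since the formula is an exact identity and not an approximation.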

In order to compute

$$J_O(\eta,\beta)=-\lim_{n}\frac{1}{n}\log P\left(\frac{w(\beta n)}{n}>\eta\right),$$

we therefore consider sample paths which satisfy the end constraint

$$\sum_{i=0}^{I}(I-i)\gamma_i(\beta)\ge\eta+I-\beta=\zeta. \tag{4.2}$$

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I-\beta]^{+}\le\zeta<I$, and that the average overflow satisfies $[\beta-I]^{+}\le\eta<\beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta>0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

$$\sum_{i=0}^{\infty}\pi_i=1,\qquad\sum_{i=0}^{\infty}i\pi_i=\beta\qquad\text{and}\qquad\sum_{i=0}^{I}(I-i)\pi_i=\zeta. \tag{4.3}$$

We introduce the Lagrangian

$$L(\pi,y,z,\lambda)=\sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)}+y\left(1-\sum_{i=0}^{\infty}\pi_i\right)+z\left(\beta-\sum_{i=0}^{\infty}i\pi_i\right)+\lambda\left(\zeta-\sum_{i=0}^{I}(I-i)\pi_i\right)$$

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^{*}$ should satisfy the conditions

$$\log\pi^{*}_i=\log P_i(\beta)-1+y+iz+(I-i)^{+}\lambda.$$

In terms of variables $C$, $\rho$ and $\nu$, we may write

$$\pi^{*}_i=CP_i(\rho\beta)\qquad\text{for } i\ge I$$

and

$$\pi^{*}_i=CP_i(\rho\beta)\nu^{I-i}\qquad\text{for } i\le I.$$

The distribution is conditionally Poisson $\rho\beta/\nu$ for $i\le I$ and conditionally Poisson $\rho\beta$ for $i\ge I$. For convenience, we introduce the notation

$$Q_I(\rho)=\sum_{i=I}^{\infty}P_i(\rho\beta)=\frac{1}{\rho\beta}\sum_{i=I+1}^{\infty}iP_i(\rho\beta),$$

$$R_I(\rho,\nu)=\nu^{I}e^{-\rho\beta(1-1/\nu)}\sum_{i=0}^{I-1}P_i\left(\frac{\rho\beta}{\nu}\right)=\frac{\nu^{I+1}}{\rho\beta}e^{-\rho\beta(1-1/\nu)}\sum_{i=1}^{I}iP_i\left(\frac{\rho\beta}{\nu}\right).$$

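Since $\pi^{*}$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

$$C\bigl(R_I(\rho,\nu)+Q_I(\rho)\bigr)=1,$$

$$C\left(\frac{\rho\beta}{\nu}R_I(\rho,\nu)+\rho\beta Q_I(\rho)\right)=\beta,$$

$$C\left(IP_I(\rho\beta)+\left(I-\frac{\rho\beta}{\nu}\right)R_I(\rho,\nu)\right)=\zeta.$$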
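The left-hand sides of these three equations are closed forms for the total mass, the mean and the spare capacity of the unnormalized weights $P_i(\rho\beta)\nu^{(I-i)^{+}}$. The sketch below checks this by brute-force summation. The values of `I`, `beta`, `rho` and `nu` are arbitrary test inputs (not a solved instance), and the infinite sums are truncated at `TERMS`, which is an assumption that is harmless for these parameter sizes.

```python
import math

TERMS = 80  # truncation point for the infinite sums (assumed large enough)

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam ** i / math.factorial(i)

def Q(I, rho, beta):
    return sum(poisson_pmf(i, rho * beta) for i in range(I, TERMS))

def R(I, rho, nu, beta):
    lam = rho * beta / nu
    prefactor = nu ** I * math.exp(-rho * beta * (1.0 - 1.0 / nu))
    return prefactor * sum(poisson_pmf(i, lam) for i in range(I))

I, beta, rho, nu = 3, 2.5, 1.2, 1.5  # hypothetical test values

# Unnormalized weights pi*_i / C.
weights = [poisson_pmf(i, rho * beta) * nu ** max(I - i, 0) for i in range(TERMS)]

brute_total = sum(weights)
brute_mean = sum(i * w for i, w in enumerate(weights))
brute_spare = sum((I - i) * weights[i] for i in range(I + 1))

r, q = R(I, rho, nu, beta), Q(I, rho, beta)
closed_total = r + q
closed_mean = (rho * beta / nu) * r + rho * beta * q
closed_spare = I * poisson_pmf(I, rho * beta) + (I - rho * beta / nu) * r

print(brute_total, closed_total)
print(brute_mean, closed_mean)
print(brute_spare, closed_spare)
```

With $C=1/(R_I+Q_I)$, a pair $(\rho,\nu)$ solves the system exactly when $C$ times the mean equals $\beta$ and $C$ times the spare capacity equals $\zeta$; a root finder can be wrapped around these three functions.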

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\|P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C=(R_I+Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

$$(\rho/\nu-1)R_I(\rho,\nu)+(\rho-1)Q_I(\rho)=0,$$

$$(I-\rho\beta/\nu-\zeta)R_I(\rho,\nu)-\zeta Q_I(\rho)+IP_I(\rho\beta)=0.$$

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu>\rho>1$, while smaller than expected values of $\zeta$ give $\nu<\rho<1$.

The large deviations exponent for the overflow problem may then be expressed as

$$
\begin{aligned}
J_O(\eta,\beta)&=D(\pi^{*}\|P(\beta))\\
&=\sum_{i=0}^{\infty}\pi^{*}_i\log\bigl(Ce^{\beta-\rho\beta}\rho^{i}\nu^{(I-i)^{+}}\bigr)\\
&=\log C+\beta(1-\rho)+\beta\log\rho+\zeta\log\nu.
\end{aligned}
$$
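The last step uses only the form of $\pi^{*}$ and the two moment constraints. As a hedged check, the sketch below takes arbitrary (unsolved, hypothetical) values of $\rho$ and $\nu$, normalizes the weights, and compares the relative entropy computed term by term with $\log C+\beta(1-\rho)+m\log\rho+s\log\nu$, where $m$ and $s$ denote the mean and spare capacity of the resulting distribution. When the constraints hold, $m=\beta$ and $s=\zeta$, and this is the displayed expression. Sums are truncated at `TERMS`.

```python
import math

TERMS = 80  # truncation point for the infinite sums (assumed large enough)

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam ** i / math.factorial(i)

I, beta, rho, nu = 3, 2.5, 1.2, 1.5  # hypothetical values, not a solved instance

weights = [poisson_pmf(i, rho * beta) * nu ** max(I - i, 0) for i in range(TERMS)]
C = 1.0 / sum(weights)
pi_star = [C * w for w in weights]

mean = sum(i * p for i, p in enumerate(pi_star))          # equals beta at a solution
spare = sum((I - i) * pi_star[i] for i in range(I + 1))   # equals zeta at a solution

# Relative entropy D(pi* || Poisson(beta)), summed term by term.
direct = sum(
    p * math.log(p / poisson_pmf(i, beta)) for i, p in enumerate(pi_star) if p > 0
)
closed = math.log(C) + beta * (1.0 - rho) + mean * math.log(rho) + spare * math.log(nu)

print(direct, closed)
```

The two numbers agree to rounding error, so once $(C,\rho,\nu)$ has been found, the exponent requires no further summation.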

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I+1$ complete sets are obtained. This event corresponds to the constraint $\omega_i=0$, $i\le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

$$J_C(\alpha,\beta,\xi)=-\lim_{n}\frac{1}{n}\log P\left(\sum_{i=0}^{I}\Gamma^n_i(\beta)<\xi\right),$$

where

$$\xi<\sum_{k=0}^{K}\alpha_k\sum_{i=0}^{I-k}P_i(\beta)$$

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

$$\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{I-k}\pi_{kj}=\xi,$$

where $K\le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

$$\omega_{kj}=\pi^{*}_{kj}=\begin{cases}C_kP_j(\rho\beta)W,& j+k\le I,\\ C_kP_j(\rho\beta),& j+k>I.\end{cases}$$

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

$$J_C(\alpha,\beta,\xi)=\beta(1-\rho+\log\rho)+\xi\log W+\sum_{k=0}^{K}\alpha_k\log C_k.$$

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n=100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I=3$). Initially, one is given a single coupon of 30 types, and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha=[0.5,\,0.3,\,0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0)\approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta=2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi=0.55$, which gives a large deviations exponent $J_C\approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
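The zero-cost numbers quoted in this example can be reproduced from the Poisson sums that bound $\xi$ in the definition of $J_C$. The sketch below does so. The exponent $J_C\approx 0.18$ is taken from the text and not recomputed, and the probability is only the leading-order estimate $e^{-nJ_C}$.

```python
import math

def poisson_cdf(m, lam):
    """P(Poisson(lam) <= m)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(m + 1))

alpha = [0.5, 0.3, 0.2]   # initial fractions of types holding 0, 1, 2 coupons
beta, I, n = 2.0, 3, 100

# Zero-cost fraction of types with I or fewer coupons after beta*n more draws:
# a type starting with k coupons stays at or below I if it gains at most I - k.
psi3 = sum(a * poisson_cdf(I - k, beta) for k, a in enumerate(alpha))
expected_types_with_four = n * (1.0 - psi3)

J_C = 0.18                # exponent quoted in the text (not recomputed here)
rough_probability = math.exp(-n * J_C)

print(psi3, expected_types_with_four, rough_probability)
```

This gives $\psi_3(2.0)\approx 0.71$, about 29 types with four or more coupons, and a probability on the order of $10^{-8}$, in line with the figures quoted above.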

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0<a<1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii), there are $K>1$ urn classes, with a fraction $\alpha_k$, $k=1,2,\dots,K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x)>0.5$ for some $x\in(0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^{*}$ satisfying $\mathcal{J}(\omega)=J(\gamma^{*})$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega)\le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^{*}$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega)\ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

$$\sum_{j=0}^{i}\alpha_j\ge\sum_{j=0}^{i}\omega_j,\qquad i=0,\dots,I\qquad\text{(monotonicity)} \tag{A.4}$$

and

$$\sum_{i=0}^{I}i\omega_i+(I+1)\omega_{I+}\le\sum_{i=0}^{I+1}i\alpha_i+\beta\qquad\text{(conservation)}. \tag{A.5}$$

Proof of Lemma 2.4. If a valid occupancy curve $\gamma:[0,\beta]\to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4) and property (c) implies (A.5), since

$$
\begin{aligned}
\sum_{i=0}^{I}\left(\sum_{j=0}^{i}\alpha_j-\sum_{j=0}^{i}\omega_j\right)&=\sum_{j=0}^{I}(I+1-j)(\alpha_j-\omega_j)\\
&=(I+1)(\omega_{I+}-\alpha_{I+1})+\sum_{j=0}^{I}j(\omega_j-\alpha_j).
\end{aligned}
\tag{A.6}
$$

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

$$\gamma_i(x)=\alpha_i+(\omega_i-\alpha_i)\frac{x}{\beta},\qquad i=0,\dots,I,$$

$$\gamma_{I+}(x)=1-\sum_{i=0}^{I}\gamma_i(x)=\alpha_{I+1}+(\omega_{I+}-\alpha_{I+1})\frac{x}{\beta}$$

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i\le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i\le 1$ follows from (A.5).
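The two feasibility conditions and the summation identity used in the proof are straightforward to encode. The sketch below is illustrative only: the function names are hypothetical, vectors are stored as lists of length $I+2$ whose last entry is the "$I+$" class, and a small tolerance absorbs floating-point rounding.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity and conservation for endpoint constraints.

    alpha, omega: lists of length I + 2 (occupancies 0..I, then the I+ class).
    """
    I = len(alpha) - 2
    monotone = all(
        sum(alpha[: i + 1]) >= sum(omega[: i + 1]) - tol for i in range(I + 1)
    )
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return monotone and lhs <= rhs + tol

def identity_sides(alpha, omega):
    """The three expressions that the summation identity asserts are equal."""
    I = len(alpha) - 2
    left = sum(sum(alpha[: i + 1]) - sum(omega[: i + 1]) for i in range(I + 1))
    middle = sum((I + 1 - j) * (alpha[j] - omega[j]) for j in range(I + 1))
    right = (I + 1) * (omega[I + 1] - alpha[I + 1]) + sum(
        j * (omega[j] - alpha[j]) for j in range(I + 1)
    )
    return left, middle, right

alpha = [1.0, 0.0, 0.0, 0.0]   # all urns initially empty, I = 2
omega = [0.2, 0.3, 0.3, 0.2]   # hypothetical terminal occupancies

print(is_feasible(alpha, omega, beta=2.0))   # enough balls thrown
print(is_feasible(alpha, omega, beta=1.0))   # too few balls for this omega
print(identity_sides(alpha, omega))
```

For this example the terminal state needs at least 1.5 balls per urn, so $\beta=2$ is feasible and $\beta=1$ is not; the three sides of the identity all equal 1.5.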

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i=-\theta_i$ for $i=0,\dots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

$$J(\psi)=\int_0^{\beta}L(\psi(x),\theta(x))\,dx.$$

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i=\psi_i-\psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

$$
\begin{aligned}
L(\psi,\theta)&=D(\theta\|\gamma)\\
&=\sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i}+\theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}}\\
&=\sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}+\left(1-\sum_{i=0}^{I}\theta_i\right)\log\frac{1-\sum_{i=0}^{I}\theta_i}{1-\psi_I}.
\end{aligned}
$$

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

$$\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))=-\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x))$$

for all $i\in\{0,\dots,I\}$, $x\in(0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

$$-\frac{\theta_i}{\psi_i-\psi_{i-1}}+\frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}=\frac{d}{dx}\left\{-\log\frac{\theta_i}{\psi_i-\psi_{i-1}}+\log\frac{\theta_{I+}}{1-\psi_I}\right\} \tag{A.7}$$

for $i=0,\dots,I-1$, and by

$$-\frac{\theta_I}{\psi_I-\psi_{I-1}}+\frac{\theta_{I+}}{1-\psi_I}=\frac{d}{dx}\left\{-\log\frac{\theta_I}{\psi_I-\psi_{I-1}}+\log\frac{\theta_{I+}}{1-\psi_I}\right\}. \tag{A.8}$$

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i=1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x)=1$, and therefore $\theta_I(x)=\theta_{I+}(x)=0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

$$L(\psi,\theta)=\sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}. \tag{A.9}$$

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

$$-\frac{\theta_i}{\psi_i-\psi_{i-1}}+\frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}=\frac{d}{dx}\left\{-\log\frac{\theta_i}{\psi_i-\psi_{i-1}}\right\} \tag{A.10}$$

for $i=0,\dots,I-1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0=1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C\ge 0$, $\rho>0$ which satisfy the equations

$$\sum_{i=0}^{I}\omega_i+\sum_{i=I+1}^{\infty}CP_i(\rho\beta)=1,$$

$$\sum_{i=0}^{I}i\omega_i+\sum_{i=I+1}^{\infty}iCP_i(\rho\beta)=\beta.$$

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints, and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

$$\psi_0(x)=Ce^{-\rho x}+\sum_{k=0}^{I}(\omega_k-CP_k(\rho\beta))\left(1-\frac{x}{\beta}\right)^{k}, \tag{A.8}$$

$$\gamma_i(x)=\frac{x^{i}}{i!}(-1)^{i}\psi_0^{(i)}(x),\qquad 0\le i\le I, \tag{A.9}$$

$$\gamma_{I+}(x)=1-\sum_{i=0}^{I}\gamma_i(x)$$

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0)=1$.

Recall that in the special case $C=0$, the first component of the extremal is simply the $I$th order polynomial

$$\psi_0(x)=\sum_{k=0}^{I}\omega_k\left(1-\frac{x}{\beta}\right)^{k}. \tag{A.10}$$

We refer to such paths as polynomial extremals, and to extremals with $C>0$ as exponential extremals.
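In the polynomial case the construction can be carried out by hand. Differentiating $(1-x/\beta)^{k}$ a total of $i$ times and multiplying by $(-x)^{i}/i!$ gives $\binom{k}{i}(x/\beta)^{i}(1-x/\beta)^{k-i}$, so each $\gamma_i$ is a mixture of binomial weights. The sketch below evaluates this; the terminal vector `omega` is a hypothetical example with $\sum_k\omega_k=1$, and $\beta$ is set to $\sum_k k\omega_k$ so that conservation holds with equality, as the polynomial case requires.

```python
import math

def polynomial_extremal(omega, beta, x):
    """gamma_i(x), i = 0..I, for psi_0(x) = sum_k omega_k (1 - x/beta)^k."""
    I = len(omega) - 1
    s = x / beta
    return [
        sum(
            omega[k] * math.comb(k, i) * s ** i * (1.0 - s) ** (k - i)
            for k in range(i, I + 1)
        )
        for i in range(I + 1)
    ]

omega = [0.1, 0.3, 0.6]                          # hypothetical, sums to 1
beta = sum(k * w for k, w in enumerate(omega))   # conservation with equality

start = polynomial_extremal(omega, beta, 0.0)
middle = polynomial_extremal(omega, beta, 0.4 * beta)
end = polynomial_extremal(omega, beta, beta)

print(start, middle, end)
```

The path starts with all urns empty, ends at `omega`, and its components sum to one at every intermediate time, as a valid occupancy path must.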

The following definition and lemma are useful in the proof of the above theorem

Definition A2 A nonnegative function ϕ is completely monotone onan interval [a b] if it is infinitely differentiable on [a b] with

(minus1)iϕ(i)(x)ge 0 for all x isin [a b] and i gt 0(A11)

32 P DUPUIS C NUZMAN AND P WHITING

This definition is based on the one pertaining to Bernsteinrsquos theorem whichcharacterizes Laplace transforms see [15] However our definition differs inthat it considers only a finite interval [a b]

Lemma A2 The function ψ0(x) given in (A8) is completely mono-tone on [0 β] Moreover the inequality in (A11) is strict for x isin [0 β) andi= 0 I In the case C gt 0 this can be strengthened to all i= 01

Proof Clearly ψ0(x) is infinitely differentiable Now suppose first thatC gt 0 The ith derivative of ψ0 is

(minus1)iψ(i)0 (x) = ρiCeminusρx +

Isum

k=i

(ωk minusCPk(ρβ))βminusi k

(k minus i)

(

1minusx

β

)kminusi

for i= 0 I and

(minus1)iψ(i)0 (x) = ρiCeminusρx(A12)

for i gt I It is clear that a(x)

= (minus1)I+1ψ

(I+1)0 (x) is completely monotone on [0infin)

Moreover (minus1)ia(i)(x) gt 0 x isin [0infin) i= 01 We deduce that ϕ(x) =

(minus1)Iψ(I)0 (x) must be monotonically strictly decreasing and that

ϕ(β) = (minus1)Iψ(I)0 (β)

= ρICeminusρβ +

(

ωI minusCeminusρβ (ρβ)I

I

)

βminusII

=

(

βI

I

)minus1

ωI ge 0

so that ϕ(x) gt 0 on [0 β) It follows that ϕ(x) is completely monotoneon [0 β] and that the derivative constraint is strict on [0 β) Proceedinginductively to I minus 1 and beyond we arrive at the lemma

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[ (-1)^I \psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I > 0 \quad \text{for all } x \in [0,\beta]. \]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[ \psi_0(0) = C\left(1 - \sum_{i=0}^{I} P_i(\rho\beta)\right) + \sum_{i=0}^{I}\omega_i = 1. \]


A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[ \gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0,1,\dots,I,I+1,\dots. \tag{A.13} \]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[ \sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1. \]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\}_{i \ge 0} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[ \theta_i(x) = -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x) \tag{A.14} \]

\[ \phantom{\theta_i(x)} = \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \]

\[ \phantom{\theta_i(x)} = \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0 \]

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[ \sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1. \]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[ \sum_{i=0}^{I}\theta_i(x) \le 1 \]

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.
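The two normalization identities used above, $\sum_i \gamma_i(x) = 1$ and $\sum_i \theta_i(x) = 1$, can be checked numerically. The following is only a minimal sketch under stated assumptions: the parameter values ($I = 1$, $\beta = 1$, $\rho = 1.2$, $C = 0.9$) and all function names are hypothetical, and instead of evaluating (2.6) directly the terminal values $\omega_0, \omega_1$ are chosen so that the terminal distribution has total mass 1 and mean $\beta$. The derivatives of $\psi_0$ are taken from (A.12).

```python
import math

def poisson(k, lam):
    """Poisson probability P_k(lam) = exp(-lam) * lam**k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def signed_derivative(i, x, C, rho, beta, omega):
    """Return (-1)^i psi_0^{(i)}(x), using both cases of (A.12)."""
    I = len(omega) - 1
    value = rho**i * C * math.exp(-rho * x)
    for k in range(i, I + 1):  # this loop is empty when i > I
        coeff = omega[k] - C * poisson(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)  # k!/(k-i)!
        value += coeff * beta**(-i) * falling * (1 - x / beta)**(k - i)
    return value

def gamma(i, x, params):
    """Occupancy gamma_i(x) as in (A.13)."""
    return x**i / math.factorial(i) * signed_derivative(i, x, *params)

def theta(i, x, params):
    """Rate theta_i(x) as in the last line of (A.14)."""
    return x**i / math.factorial(i) * signed_derivative(i + 1, x, *params)

# Hypothetical parameters, chosen for illustration only (I = 1).
beta, rho, C = 1.0, 1.2, 0.9
tail_mass = 1 - poisson(0, rho * beta) - poisson(1, rho * beta)
tail_mean = rho * beta - poisson(1, rho * beta)
omega1 = beta - C * tail_mean        # forces the terminal mean to equal beta
omega0 = 1 - omega1 - C * tail_mass  # forces the terminal masses to sum to 1
params = (C, rho, beta, [omega0, omega1])

N = 80  # truncation level for the infinite sums
for x in (0.0, 0.3, 0.7, 1.0):
    total_gamma = sum(gamma(i, x, params) for i in range(N))
    total_theta = sum(theta(i, x, params) for i in range(N))
    print(f"x={x:.1f}  sum gamma={total_gamma:.12f}  sum theta={total_theta:.12f}")
```

With these choices the terminal values $\gamma_0(\beta)$ and $\gamma_1(\beta)$ reproduce $\omega_0$ and $\omega_1$, and $\gamma_i(\beta) = C P_i(\rho\beta)$ for $i > I$, in line with the terminal constraints.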


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[ \theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty} P_i(\rho x), \]

\[ 1 - \psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty} P_i(\rho x), \]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

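From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\ i \ge 0. \tag{A.15} \]

Then

\[ \frac{d}{dx}\left(-\log\frac{\theta_i}{\gamma_i}\right) = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta), \tag{A.16} \]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\dots,I$, so that (A.16) implies (A.10).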

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$ they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

\[ (-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I, \]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \dots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[ \sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0, \]

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\gamma_0, \dots, \gamma_I, \gamma_{I+}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{\infty}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A.17} \]

Proof. For all $i > I$ and $x \in [0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[ \theta_{I+} = 1 - \sum_{i=0}^{I}(-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1 - \psi_I). \]

Then for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,

\[ L(\psi,\dot\psi) = \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \]

\[ \phantom{L(\psi,\dot\psi)} = \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi). \]


Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[ J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx. \]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[ \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \tag{A.18} \]

\[ \qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}(\dot\psi_i(x) - \dot\psi_{i-1}(x))\log|\psi_0^{(i)}(x)|\,dx. \]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$

\[ -\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x], \]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[ \sum_{i=0}^{\infty}\Bigl[\gamma_i\log|\psi_0^{(i)}|\Bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx. \]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0, \dots, \gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[ \sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1, \]

\[ -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\dots,I-1, \]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{I}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A.19} \]


Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[ J(\gamma) = \min_{\pi \in F(1,\omega,\beta)} D(\pi\,\|\,P(\beta)). \]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\dots,I$, and define the Lagrangian $L$ on this set to be

\[ L(\pi, y, z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right), \]

where $z = \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi_i^* \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi_i^*$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi_i^*$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[ \pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi_i^*\log\frac{\pi_i^*}{P_i(\beta)} - y\pi_i^* - zi\pi_i^*. \]


Following standard Lagrangian arguments, we thus have

\[ \inf_{\pi \in F(1,\omega,\beta)} D(\pi\,\|\,P(\beta)) = \inf_{\pi \in F(1,\omega,\beta)} L(\pi, y, z) \ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) = D(\pi^*\,\|\,P(\beta)). \]

Since $\pi^* \in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi_i^*$ into (A.17), we have

\[ J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,P(\beta)), \]

where

\[ \gamma_i(\beta) = \begin{cases} \omega_i, & 0 \le i \le I, \\ C P_i(\rho\beta), & i > I. \end{cases} \]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[ J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)} + \left(1 - \sum_{i=0}^{I}\omega_i\right)(\log C + (1 - \rho)\beta) + \left(\beta - \sum_{i=0}^{I} i\omega_i\right)\log\rho, \]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
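The agreement between the explicit cost formula and the relative entropy of the terminal distribution can be illustrated numerically. The sketch below is not part of the original argument: the parameter values ($I = 1$, $\beta = 1$, $\rho = 1.2$, $C = 0.9$) and the function names are hypothetical, and $\omega_0, \omega_1$ are chosen so that the terminal distribution has mass 1 and mean $\beta$ rather than by evaluating (2.6). The infinite sums are truncated at a level where the Poisson tails are negligible.

```python
import math

def poisson(k, lam):
    """Poisson probability P_k(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def relative_entropy(pi, beta):
    """D(pi || P(beta)) for a sequence pi indexed from 0, with 0 log 0 = 0."""
    return sum(p * math.log(p / poisson(i, beta)) for i, p in enumerate(pi) if p > 0)

def explicit_cost(omega, C, rho, beta):
    """Closed-form cost of an exponential extremal, as in the display above."""
    head = sum(w * math.log(w / poisson(i, beta)) for i, w in enumerate(omega) if w > 0)
    tail_mass = 1 - sum(omega)
    tail_first_moment = beta - sum(i * w for i, w in enumerate(omega))
    return (head + tail_mass * (math.log(C) + (1 - rho) * beta)
            + tail_first_moment * math.log(rho))

# Hypothetical parameters, for illustration only.
beta, rho, C, I = 1.0, 1.2, 0.9, 1
omega1 = beta - C * (rho * beta - poisson(1, rho * beta))
omega0 = 1 - omega1 - C * (1 - poisson(0, rho * beta) - poisson(1, rho * beta))
omega = [omega0, omega1]

N = 120  # truncation of the Poisson tail
pi_star = omega + [C * poisson(i, rho * beta) for i in range(I + 1, N)]
print(relative_entropy(pi_star, beta), explicit_cost(omega, C, rho, beta))
```

The two printed values should agree up to rounding, because for $i > I$ the log-ratio $\log(C P_i(\rho\beta)/P_i(\beta))$ equals $\log C + (1-\rho)\beta + i\log\rho$.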

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in


any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[ \gamma_i(x) = \sum_{k \in \mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta). \]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[ \min_{\pi \in S_\infty^{|\mathcal{K}|}}\sum_{k \in \mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)), \tag{A.20} \]

subject to the conservation constraint

\[ \sum_{k \in \mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21} \]

and the terminal conditions

\[ \omega_i = \sum_{k \le i,\, k \in \mathcal{K}}\alpha_k\pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22} \]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which


case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[ \pi_{kj} = 0, \qquad k + j > I, \tag{A.23} \]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \dots, \pi^{(k)}, \dots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega \in S_{\bar I}$, where $\bar I > I$. Any feasible point for constraints $(\alpha,\bar\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
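The ordered filling construction can be sketched in code. This is a minimal illustration, not the authors' procedure in detail: the function name `ordered_filling` is hypothetical, the classes are served in increasing order of $k$ (the text does not prescribe an order), the polynomial case $\sum_k \alpha_k = \sum_i \omega_i = 1$ is assumed, and the monotonicity condition of (2.3) is assumed to read $\sum_{j \le i}\alpha_j \ge \sum_{j \le i}\omega_j$.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Build a feasible point pi[k][j] for polynomial constraints.

    alpha[k]: fraction of urns starting with k balls (len(alpha) <= len(omega)).
    omega[i]: required terminal fraction of urns with i balls, i = 0..I.
    Returns a dict mapping each class k with alpha[k] > 0 to its distribution
    over the number j of additional balls, j = 0..I-k.
    """
    I = len(omega) - 1
    remaining = list(alpha)  # unassigned mass alpha_k * (1 - sum_j pi_kj)
    pi = {k: [0.0] * (I - k + 1) for k, a in enumerate(alpha) if a > 0}
    for i, target in enumerate(omega):
        need = target
        for k in sorted(pi):
            if k > i:  # class k cannot end with fewer than k balls
                break
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"assumed monotonicity condition fails at level {i}")
    return pi

# Hypothetical example: half the urns start empty, half start with one ball.
alpha = [0.5, 0.5]
omega = [0.2, 0.3, 0.5]
pi = ordered_filling(alpha, omega)
print(pi)
```

In this example every class distribution sums to one, the terminal conditions (A.22) hold level by level, and the mean number of added balls is $\sum_i i\omega_i - \sum_k k\alpha_k = 0.8$, whichever filling order is used.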

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given


by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[ \sum_{i:\,P_i > Q_i}(P_i - Q_i) = \sum_{i:\,Q_i > P_i}(Q_i - P_i), \]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

\[ H_A = \left\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_k \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le A\right\}. \]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[ Q_A(\alpha,\omega,\beta) \doteq F(\alpha,\omega,\beta) \cap H_A \]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[ \sum_{j \ge m} j\pi^{(n)}_{kj} \le \varepsilon \tag{A.24} \]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[ xy \le (x\log x - x + 1) + (e^y - 1), \]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta}P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty. \]


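Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\[ \sum_{j \ge m} j\pi^{(n)}_{kj} = \sum_{j \ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \]

\[ \qquad \le \sum_{j \ge 0}\delta\left[\pi^{(n)}_{kj}\log\left(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\right) - \pi^{(n)}_{kj} + P_j(\beta)\right] + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \]

\[ \qquad = \delta D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \]

\[ \qquad \le \delta A + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta), \]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]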
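The tail estimate and the Poisson exponential moment formula lend themselves to a quick numerical check. The sketch below is only an illustration under assumptions: the test distribution (a truncated geometric on $\{0,\dots,40\}$), the values of $\delta$ and $m$, the truncation level of the Poisson sums, and the function names are all hypothetical choices, not taken from the paper.

```python
import math

def poisson(j, lam):
    """Poisson probability P_j(lam)."""
    return math.exp(-lam) * lam**j / math.factorial(j)

def relative_entropy(pi, beta):
    """D(pi || P(beta)) for a finitely supported pi, with 0 log 0 = 0."""
    return sum(p * math.log(p / poisson(j, beta)) for j, p in enumerate(pi) if p > 0)

def tail_bound(pi, beta, m, delta, cutoff=150):
    """delta*D(pi||P(beta)) + delta*sum_{j>=m} (e^{j/delta}-1) P_j(beta), truncated."""
    poisson_part = sum((math.exp(j / delta) - 1) * poisson(j, beta)
                       for j in range(m, cutoff))
    return delta * relative_entropy(pi, beta) + delta * poisson_part

beta = 1.0
weights = [0.6**j for j in range(41)]  # hypothetical test distribution
total = sum(weights)
pi = [w / total for w in weights]

for delta in (2.0, 1.0, 0.5):
    for m in (3, 6, 12):
        lhs = sum(j * p for j, p in enumerate(pi) if j >= m)
        rhs = tail_bound(pi, beta, m, delta)
        print(f"delta={delta} m={m}: tail mean {lhs:.6f} <= bound {rhs:.6f}")

# Exponential moment of the Poisson distribution, here with delta = 1.
moment = sum(math.exp(j) * poisson(j, beta) for j in range(150))
closed_form = math.exp(beta * math.exp(1.0) - beta)
print(moment, closed_form)
```

For fixed $\delta$ the Poisson part of the bound vanishes as $m$ grows, which is why (A.24) is obtained by choosing $\delta$ small first and $m$ large second.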

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[ \beta_k^{(n)} \to \beta_k. \]

Finally,

\[ \sum_k \alpha_k\beta_k = \lim_n\sum_k \alpha_k\beta_k^{(n)} = \lim_n \beta = \beta, \]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[ G \doteq \inf_{\pi \in Q_A}\sum_{k \in \mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[ \sum_{k \in \mathcal{K}}\alpha_k D(\pi^*_{(k)}\,\|\,P(\beta)) \le \liminf_n\sum_{k \in \mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) = G, \]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[ (\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta) \]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[ \liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta). \]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[ \mathcal{J}^{(n)} \le G + 1/n < A, \]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[ \beta_k^{(n)} \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{kj}, \qquad k = 0,1,\dots,K. \]

If $\alpha_k^{(n)} \to 0$ and $\beta_k^{(n)}\alpha_k^{(n)} \not\to 0$, then the term $\alpha_k^{(n)} D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha_k^{(n)}\beta_k^{(n)} \to \alpha_k\beta_k$ even if $\alpha_k = 0$.

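Thus

\[ \sum_{k \in \mathcal{K}}\beta_k\alpha_k = \lim_n\sum_k \beta_k^{(n)}\alpha_k^{(n)} = \lim_n\beta^{(n)} = \beta, \]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[ \liminf_n\sum_{k=0}^{K}\alpha_k^{(n)} D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)})) \ge \liminf_n\sum_{k:\,\alpha_k > 0}\alpha_k^{(n)} D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)})) \]

\[ \qquad \ge \sum_{k:\,\alpha_k > 0}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \]

\[ \qquad \ge \mathcal{J}(\alpha,\omega,\beta), \]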


as desired.

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\bar\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\bar\pi_{mn} > 0$, that $\bar\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\bar\pi$ has finite support. If we define

\[ \bar\omega_i \doteq \sum_{k=0}^{i}\alpha_k\bar\pi_{k,i-k}, \qquad \bar\beta \doteq \sum_{k \in \mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\bar\pi_{kj}, \]

then $\bar\pi$ is a feasible solution for the constraints $(\alpha,\bar\omega,\bar\beta)$. Next, for any $\eta > 0$ we may define

\[ \tilde\beta \doteq (\beta - \eta\bar\beta)/(1 - \eta), \qquad \tilde\omega \doteq (\omega - \eta\bar\omega)/(1 - \eta). \]

By construction, the constraints $(\alpha,\tilde\omega,\tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\bar\omega,\bar\beta)$, and hence for $(\alpha,\tilde\omega,\tilde\beta)$.


We may now let $\tilde\pi$ be a feasible point for $(\alpha,\tilde\omega,\tilde\beta)$, and finally form $\hat\pi = \eta\bar\pi + (1 - \eta)\tilde\pi$. By construction, $\hat\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\hat\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\hat\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\hat\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\hat\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[ f'(\varepsilon) = \sum_{k \in \mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\left(\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} + 1\right)(\hat\pi_{kj} - \pi_{kj}) \]

\[ \qquad = \sum_{k \in \mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}\,(\hat\pi_{kj} - \pi_{kj}) \]

\[ \qquad = \alpha_m\hat\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{P_n(\beta)} + \sum_{k \in \mathcal{K}}\alpha_k\sum_{(k,j) \ne (m,n)}\hat\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} - \sum_{k \in \mathcal{K}}\alpha_k\sum_j \pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}. \]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[ -\sum_k \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le 0. \]

The second term is bounded above by the expression $-\sum_k \alpha_k\sum_j \hat\pi_{kj}\log P_j(\beta)$, which is finite because $\hat\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form


\[ \pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise}. \end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k \in \mathcal{K}}\alpha_k\sum_{k+j \in \mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} + \sum_{k \in \mathcal{K}} z_k\alpha_k\left(1 - \sum_{k+l \in \mathcal{I}}\pi_{kl}\right) + \sum_{i \in \mathcal{I}} w_i\left(\omega_i - \sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\right) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[ \pi^*_{kj} = P_j(\beta)e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\ 0, & \text{otherwise}. \end{cases} \]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[ \beta^{(M)} = \sum_{k \in \mathcal{K}}\alpha_k\sum_{l \le M} l\pi^*_{kl}, \qquad \eta_k^{(M)} \doteq \sum_{l \le M}\pi^*_{kl}, \]

and consider the problem of minimizing

\[ \sum_{k \in \mathcal{K}}\alpha_k\sum_{j \le M,\, k+j \in \mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} \]


subject to the constraints

\[ \sum_{k \in \mathcal{K}}\alpha_k\sum_{l \le M,\, k+l \in \mathcal{I}} l\pi_{kl} = \beta^{(M)}, \]

\[ \sum_{l \le M,\, k+l \in \mathcal{I}}\pi_{kl} = \eta_k^{(M)}, \qquad k \in \mathcal{K}, \]

\[ \sum_{k \le i,\, k \in \mathcal{K}}\alpha_k\pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}. \]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the proof of the previous theorem, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[ L^{(M)} = \sum_{k \in \mathcal{K}}\alpha_k\sum_{j \le M,\, k+j \in \mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} + y^{(M)}\left(\beta^{(M)} - \sum_{k \in \mathcal{K}}\alpha_k\sum_{l \le M,\, k+l \in \mathcal{I}} l\pi_{kl}\right) \]

\[ \qquad + \sum_{k \in \mathcal{K}} z_k^{(M)}\alpha_k\left(\eta_k^{(M)} - \sum_{l \le M,\, k+l \in \mathcal{I}}\pi_{kl}\right) + \sum_{i \in \mathcal{I}} w_i^{(M)}\left(\omega_i - \sum_{k \le i,\, k \in \mathcal{K}}\alpha_k\pi_{k,i-k}\right). \]

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[ \pi^*_{kj} = P_j(\beta)\,e^{jy^{(M)} - 1 + z_k^{(M)} + w_{k+j}^{(M)}} \]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w_{k+j}^{(M)} = 0$ if $k + j > I$, it follows that $z_k^{(M)}$ and $w_i^{(M)}$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$ with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*_{(k)}$ by $\beta_k \doteq \sum_j j\pi^*_{kj}$. In the exponential case we define


$\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0}, \dots, \omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[ \gamma_i(x) = \sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\dots,I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x), \]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i}\alpha_k\tilde\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\tilde\gamma_{kj}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1, \]


where the last equality follows from the fact that the πlowastkj satisfy the conser-vation constraint (A21) Hence curves γi(x) i le I and γI+(x) satisfy theconditions of Lemma 21

It is clear that the γi satisfy the desired initial conditions since γk0(0) = 1and γkj(0) = 0 for j gt 0 The terminal conditions are guaranteed by the factthat the values γkj(βk) = πlowastkj satisfy the constraints (A22)

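In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k,0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k \beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

$$\begin{aligned}
\bar\psi_{k,0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega^{(k)}_i - C_k P_i(\rho_k\beta_k)\bigr) \left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega^{(k)}_i - C_k P_i(\rho\beta)\bigr) \left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{i} \right].
\end{aligned}$$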

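Here we have used that $\omega^{(k)}_i = \pi^*_{k,i} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0 \bar\psi_{0,0}$ is

$$\begin{aligned}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \left(1 - \frac{x}{\beta}\right)^{i-k} \right] \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k} \bar\psi_{k,0}(x).
\end{aligned} \tag{A.25}$$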
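The step from the first to the second line of (A.25) uses the Poisson identity $P_i(\rho\beta)\, i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$. The following is a minimal numerical sanity check of that identity, not part of the original argument; the function names and the parameter values are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam) = exp(-lam) * lam**i / i!."""
    return exp(-lam) * lam**i / factorial(i)

def coefficient_after_k_derivatives(i, k, rho, beta):
    """P_i(rho*beta) times i!/((i-k)! beta^k), the factor produced by
    differentiating (1 - x/beta)**i a total of k times (up to sign)."""
    return poisson_pmf(i, rho * beta) * factorial(i) / (factorial(i - k) * beta**k)

def reindexed_coefficient(i, k, rho, beta):
    """rho^k * P_{i-k}(rho*beta), the form used after setting j = i - k."""
    return rho**k * poisson_pmf(i - k, rho * beta)

rho, beta = 1.3, 2.5  # arbitrary illustrative values (assumption)
max_gap = max(
    abs(coefficient_after_k_derivatives(i, k, rho, beta)
        - reindexed_coefficient(i, k, rho, beta))
    for i in range(8) for k in range(i + 1)
)
print(max_gap)  # should be at the level of floating-point rounding
```

Both sides agree to rounding error for every $0 \le k \le i$, which is what allows the sum over $i = k, \ldots, I$ to be reindexed by $j = i - k$.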

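Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, we have

$$\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k,0}^{(i-k)}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{aligned}$$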

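Similarly, we obtain

$$\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta} \dot\psi_{k,i-k}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{aligned}$$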

so that the extremals satisfy the simple relation

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}$$

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

$$\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}$$

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

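In the polynomial case, similar computations show that

$$(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k} \bar\psi_{k,0}(x) \tag{A.28}$$

and that

$$\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x) \left( -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \right) \tag{A.29}$$

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).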

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

$$\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i$$

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

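Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

$$\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(\beta) \log\bigl|\psi_0^{(i)}(\beta)\bigr| \right] - \sum_{k=0}^{K} \alpha_k \log\bigl|\psi_0^{(k)}(0)\bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(\beta) \log \left| \frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}} \right| .
\end{aligned}$$

The fact that $\bar\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \bar\psi_{k,0}(x)$, so that

$$\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j} \psi_{k,0}^{(j)}\!\left(x\beta_k/\beta\right) e^{\beta} = \frac{\bar\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}.$$

Then

$$J(\gamma) = \sum_{k=0}^{K} \alpha_k D\bigl(\bar\gamma_{k,\cdot}(\beta) \,\|\, P(\beta)\bigr).$$

By construction, the endpoints $\bar\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).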

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) = \{\gamma \in \mathcal{O} : \gamma(0) = \alpha, \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \| \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.

(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

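For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

$$G[\varepsilon] = J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x) \,\|\, \gamma^{\varepsilon}(x))\,dx.$$

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.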

For the remainder of this section, we once again restrict attention to irreducible constraints without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

$$J(\gamma) \le J(\tilde\gamma).$$

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi, \theta)\,dx$. After defining

$$\eta(x) = \tilde\psi(x) - \psi(x),$$

we may write

$$G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta, \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x, \varepsilon)\,dx. \tag{A.30}$$

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k,0}(x)$ and $\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) \ge \alpha_i \bar\gamma_{i,0}(x).$$

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1 - \varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

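The partial derivative of the integrand of (A.30) is

$$\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \psi_i}\right|_{\gamma^{\varepsilon}(x)} \eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \theta_i}\right|_{\gamma^{\varepsilon}(x)} \dot\eta_i(x).$$

The partial derivatives of $L$ are (in mixed notation)

$$\frac{\partial L}{\partial \psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},$$

$$\frac{\partial L}{\partial \theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.$$

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

$$\left| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \right| < B \qquad \text{for } \varepsilon \in [0, \varepsilon_0], \text{ a.e. } x \in [0, \beta].$$

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.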

We are now free to differentiate under the integral sign, so that

$$G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\,dx$$

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

$$\int_0^{x} \left.\frac{\partial L}{\partial \psi_i}\right|_{\gamma} dx' = \int_0^{x} \left.\left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\right|_{\gamma} dx'$$

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

$$\begin{aligned}
G'_+[0] &= \int_0^{\beta} \left[ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right] dx \\
&= \int_0^{\beta} \dot\eta(x) \cdot \left( -\int_0^{x} \frac{\partial L}{\partial \psi}(x')\,dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}$$

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

$$J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx.$$

It then follows that

$$\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha, \omega, \beta) \\
&\le \liminf_{n} \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}), \tilde\gamma(x_2^{(n)}), x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned}$$

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis

Lefschetz Center for Dynamical Studies

Brown University

Providence, Rhode Island 02912

USA

e-mail: dupuis@dam.brown.edu

C. Nuzman

P. Whiting

Bell Labs

600 Mountain Avenue

Murray Hill, New Jersey 07974

USA

e-mail: cnuzman@lucent.com

e-mail: pawhiting@lucent.com

url: cm.bell-labs.com/who/nuzman

url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

bound, it suffices to show that, given any $\varepsilon > 0$ and $\delta > 0$, there is $\eta > 0$ such that for any initial occupancies satisfying $|\Gamma^n(0) - \gamma(0)| < \eta$,

$$\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)} \left( \sup_{0 \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -J(\gamma) - \varepsilon. \tag{3.1}$$

Of course, this inequality is trivial if $J(\gamma) = \infty$, and so we assume that $J(\gamma) < \infty$.

As we have remarked, a source of difficulty is the singular behavior of the transition rates of the process when $\Gamma^n$ is near the boundary of $S_I$. We first show that this can be avoided at all times save $x = 0$. To do this, we show that for any $a > 0$ there exist $b > 0$, $K \in \mathbb{N}$ and an occupancy function $y$ such that $y(0) = \gamma(0)$, $\sup_{0 \le x \le \beta} |y(x) - \gamma(x)| < a$, $y_j(x) > b x^K$ for all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$, and such that

$$J(y) \le J(\gamma).$$

Consider the zero cost trajectory defined by

$$\dot z(x) = M z(x), \qquad z(0) = \gamma(0).$$

We have the following expression for $z_j$ when $j \le I$:

$$z_j(x) = \left[ \sum_{k=0}^{j} \gamma_k(0) \frac{x^{j-k}}{(j-k)!} \right] e^{-x}. \tag{3.2}$$

It is easy to check from this explicit formula that $z_j(x) > b x^K$ for some $b > 0$, $K = I$ and all $j = 0, 1, \ldots, I, I+$ and $0 < x \le \beta$. For $\rho \in (0, 1)$, let $y^{\rho} = \rho z + (1 - \rho)\gamma$. Then $y^{\rho}$ is the occupancy function that corresponds to the rate $\rho z + (1 - \rho)\theta$. Using the joint convexity of relative entropy in both variables ([11], Lemma 1.4.3(b)) and the fact that $D(z \| z) = 0$, we have

$$\begin{aligned}
J(y^{\rho}) &= \int_0^{\beta} D\bigl(\rho z(x) + (1 - \rho)\theta(x) \,\|\, \rho z(x) + (1 - \rho)\gamma(x)\bigr)\,dx \\
&\le \rho \int_0^{\beta} D(z(x) \,\|\, z(x))\,dx + (1 - \rho) \int_0^{\beta} D(\theta(x) \,\|\, \gamma(x))\,dx \\
&= (1 - \rho) J(\gamma).
\end{aligned}$$

All required properties are then obtained by letting $y = y^{\rho}$ for suitably small $\rho \in (0, 1)$.
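The explicit zero-cost formula (3.2) can be checked numerically. The sketch below evaluates $z_j(x)$ from (3.2) and compares a finite-difference derivative with $z_{j-1}(x) - z_j(x)$. The matrix $M$ is not written out in this passage, so the form $(Mz)_j = z_{j-1} - z_j$ (with $z_{-1} = 0$) is an assumption here, as are the function names and the sample initial occupancies.

```python
from math import exp, factorial

def zero_cost(gamma0, x):
    """Evaluate (3.2): z_j(x) = [sum_{k<=j} gamma_k(0) x^(j-k)/(j-k)!] e^{-x}."""
    return [
        exp(-x) * sum(gamma0[k] * x ** (j - k) / factorial(j - k) for k in range(j + 1))
        for j in range(len(gamma0))
    ]

def ode_residual(gamma0, x, h=1e-5):
    """Largest gap between a central-difference dz_j/dx and z_{j-1} - z_j.

    The right-hand side z_{j-1} - z_j is the assumed form of (M z)_j.
    """
    z_plus, z_minus, z = zero_cost(gamma0, x + h), zero_cost(gamma0, x - h), zero_cost(gamma0, x)
    worst = 0.0
    for j in range(len(gamma0)):
        derivative = (z_plus[j] - z_minus[j]) / (2 * h)
        inflow = z[j - 1] if j > 0 else 0.0
        worst = max(worst, abs(derivative - (inflow - z[j])))
    return worst

gamma0 = [0.5, 0.3, 0.1]  # hypothetical initial occupancies for levels 0..I with I = 2
worst = max(ode_residual(gamma0, x) for x in (0.1, 0.5, 1.0, 2.0))
start = zero_cost(gamma0, 0.0)
print(worst, start)
```

Under these assumptions the residual is at the level of the finite-difference error, and the trajectory starts at the prescribed initial occupancies.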

It follows that, in proving the lower bound, we can assume without loss of generality that for some fixed constants $b > 0$ and $K \in \mathbb{N}$, $\gamma_i(x) \ge b x^K$ for all $x \in [0, \beta]$. We now return to the proof of the lower bound. Our first objective is to show that the process can be moved into a small neighborhood of $\gamma(\tau)$ (for $\tau > 0$ small) with sufficiently high probability. Given $\tau \in (0, \beta]$, $\sigma > 0$ and $\varepsilon > 0$, define

$$h(\gamma) = \begin{cases} 0, & |\gamma - \gamma(\tau)| < \sigma/2, \\ 2\varepsilon, & \text{else}. \end{cases}$$

For $n$ large enough that $|\gamma(\lfloor n\tau \rfloor/n) - \gamma(\tau)| \le \sigma/2$ (and independent of $\tau \in (0, \beta]$), we have the inequality

$$\begin{aligned}
&P_{\Gamma^n(0)}\bigl( |\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma \bigr) + e^{-2n\varepsilon} \\
&\qquad \ge P_{\Gamma^n(0)}\bigl( |\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\tau)| < \sigma/2 \bigr) + e^{-2n\varepsilon} \\
&\qquad \ge E_{\Gamma^n(0)}\bigl( \exp\{ -n h(\Gamma^n(\lfloor n\tau \rfloor/n)) \} \bigr).
\end{aligned}$$

We next exploit a representation for exponential integrals that will give us an explicit lower bound on the last quantity. Consider a process $\bar\Gamma^n(x)$ constructed as follows. The process dynamics are of the same general structure as those of $\Gamma^n$, save that $b_i(\Gamma^n(i/n))$ is replaced by a sequence $\bar b^n_i$:

$$\bar\Gamma^n\!\left(\frac{i+1}{n}\right) = \bar\Gamma^n\!\left(\frac{i}{n}\right) + \frac{1}{n} \bar b^n_i, \qquad \bar\Gamma^n(0) = \Gamma^n(0).$$

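Furthermore, the distribution of $\bar b^n_i$ is allowed to depend in any measurable way upon the set of values $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. Let $\mu_{\gamma}$ denote the distribution of $b_i(\gamma)$, and (without explicitly exhibiting all the dependencies) let $\bar\mu^n_i$ denote the (random) distribution of $\bar b^n_i$, given $\{\bar\Gamma^n(j/n), 0 \le j \le i\}$. We let $\bar E_{\Gamma^n(0)}$ denote expectation on the space that supports these processes. It follows from [11], Theorem 4.3.1, that

$$\begin{aligned}
&-\frac{1}{n} \log E_{\Gamma^n(0)} \left( \exp\left\{ -n h\!\left( \Gamma^n\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) \right\} \right) \\
&\qquad = \inf \bar E_{\Gamma^n(0)} \left[ h\!\left( \bar\Gamma^n\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) + \frac{1}{n} \sum_{i=1}^{\lfloor n\tau \rfloor - 1} D\bigl( \bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)} \bigr) \right],
\end{aligned}$$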

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b^n_i$. We can write $\gamma(\tau) - \gamma(0) = M v \tau$ for some probability vector $v$. Define a process $\bar b^n_i$ as follows:

$$\bar b^n_i = \begin{cases}
e_j - e_{j-1}, & \text{if } \left\lfloor \sum_{k=0}^{j-1} v_k \tau n \right\rfloor \le i \le \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor - 1 \text{ for } 0 \le j \le I, \\
0, & \text{if } \left\lfloor \sum_{k=0}^{I} v_k \tau n \right\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}$$

In other words, $\bar b^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

$$D\bigl( \bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)} \bigr) = \log\left( \frac{1}{\bar\Gamma^n_{j-1}(i/n)} \right) = -\log\left( \bar\Gamma^n_{j-1}\!\left( \frac{i}{n} \right) \right). \tag{3.3}$$

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k \tau n \rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor - 1$,

$$\bar\Gamma^n_{j-1}\!\left( \frac{i}{n} \right) \downarrow \bar\Gamma^n_{j-1}\!\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor \right)$$

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k \tau n \rfloor$, it follows that

$$\bar\Gamma^n_{j-1}\!\left( \frac{1}{n} \left\lfloor \sum_{k=0}^{j} v_k \tau n \right\rfloor \right) \to \gamma_{j-1}(\tau)$$

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus, at any given time step $i$, we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that, for all sufficiently large $n$ and small $\eta > 0$, there are $C_1, C_2 < \infty$ (and independent of $\tau$) such that, whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

$$D\bigl( \bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)} \bigr) \le C_1 [-\log \tau^K] \le -C_2 \log \tau.$$

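In addition, as $n \to \infty$ and $\eta \to 0$,

$$\bar\Gamma^n\!\left( \frac{1}{n} \lfloor \tau n \rfloor \right) \to \gamma(\tau).$$

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$-\frac{1}{n} \log E_{\Gamma^n(0)} \left( \exp\left\{ -n h\!\left( \Gamma^n\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right) \right\} \right) \le -C_2 \tau \log \tau.$$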

We now choose $\tau > 0$ so that $-C_2 \tau \log \tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0, \tau]$, w.p.1, if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

$$\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(0)} \left( \left| \Gamma^n\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) - \gamma\!\left( \frac{\lfloor n\tau \rfloor}{n} \right) \right| < \sigma, \; \sup_{x \in [0, \tau]} |\Gamma^n(x) - \gamma(x)| < \delta \right) \ge -\frac{\varepsilon}{2}.$$

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0, \beta]$, we will use the Markov property and an existing lower bound for paths which avoid that boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0, \delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau, \beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I + 1\}$ (i.e., $\gamma \in S_I^{\circ}$, where $S_I^{\circ}$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that, as long as $|\Gamma^n(\lfloor n\tau \rfloor/n) - \gamma(\lfloor n\tau \rfloor/n)| < \sigma$,

$$\liminf_{n \to \infty} \frac{1}{n} \log P_{\Gamma^n(\lfloor n\tau \rfloor/n), \lfloor n\tau \rfloor/n} \left( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \right) \ge -J(\gamma) - \frac{\varepsilon}{2},$$

where $P_{\Gamma^n(\lfloor n\tau \rfloor/n), \lfloor n\tau \rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau \rfloor/n)$ at time $\lfloor n\tau \rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10], and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\{\Gamma^n(\beta)\}$ established by Corollary 2.3, one must compute exponents of the form

$$\mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega).$$

Using Theorem 2.7, we can write

$$\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega \in \Omega} \; \inf_{\pi \in F(\alpha, \omega, \beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \\
&= \inf_{\pi \in F(\alpha, \Omega, \beta)} \sum_{k=0}^{K} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)),
\end{aligned} \tag{4.1}$$

where we have abused notation to define $F(\alpha, \Omega, \beta) = \bigcup_{\omega \in \Omega} F(\alpha, \omega, \beta)$.

In many cases, for example when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third example below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^{\circ}$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega \in \Omega^{\circ}} \mathcal{J}(\omega) = \inf_{\omega \in \bar\Omega} \mathcal{J}(\omega)$. For such sets the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n} \log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.

4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variation problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

$$\gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}.$$

Using (2.5), we find that $\rho$ is determined by the unique nonnegative solution to

$$\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.$$

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
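As an illustration of how the twist parameter might be computed numerically, the following sketch solves $\rho(1 - \omega_0) = 1 - e^{-\beta\rho}$ by bisection and then evaluates $\gamma_0(x) = \rho^{-1} e^{-\rho x} + 1 - \rho^{-1}$. The bisection scheme, the function names and the bracketing strategy are assumptions made for this example; the values $\beta = 3.0$ and $\omega_0 = 0.15$ are those of the example plotted in Figure 1. Since $\rho = 0$ always satisfies the equation trivially, the search is restricted to $\rho > 0$, which assumes $\beta > 1 - \omega_0$.

```python
from math import exp

def solve_rho(omega0, beta, tol=1e-12):
    """Bisection for the positive root of rho*(1 - omega0) = 1 - exp(-beta*rho).

    Assumes beta > 1 - omega0, so that the residual is negative just to the
    right of rho = 0 and eventually becomes positive.
    """
    def residual(rho):
        return rho * (1.0 - omega0) - (1.0 - exp(-beta * rho))

    lo, hi = 1e-6, 1.0
    while residual(hi) <= 0.0:      # expand the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Fraction of empty urns along the extremal: e^{-rho x}/rho + 1 - 1/rho."""
    return exp(-rho * x) / rho + 1.0 - 1.0 / rho

beta, omega0 = 3.0, 0.15            # values of the Figure 1 example
rho = solve_rho(omega0, beta)
print(rho, gamma0(beta, rho))       # gamma0(beta) should reproduce omega0
```

For these inputs the root lies between 1.13 and 1.14, and the resulting curve starts at $\gamma_0(0) = 1$ and ends at $\gamma_0(\beta) = \omega_0$.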

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies, conditioned on an unusual number of empty urns, namely, $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

$$w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),$$

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

$$w(r) = r - nI + n \sum_{i=0}^{I} (I - i)\Gamma_i(r/n).$$
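The overflow identity can be checked by direct simulation. The sketch below throws $r$ balls uniformly into $n$ urns, counts the overflow directly as the number of balls beyond capacity $I$, and compares it with $r - nI + \sum_{i=0}^{I}(I - i) N_i$, where $N_i = n\Gamma_i(r/n)$ is the number of urns holding exactly $i$ balls. The function name, the seed and the problem sizes are illustrative assumptions.

```python
import random
from collections import Counter

def overflow_two_ways(n, r, capacity, seed=0):
    """Return (direct overflow count, overflow from the occupancy formula)."""
    rng = random.Random(seed)
    # Balls per urn in an infinite-capacity system; empty urns are absent.
    balls_in_urn = Counter(rng.randrange(n) for _ in range(r))

    # Direct count: every ball beyond the capacity would fall to the floor.
    direct = sum(max(count - capacity, 0) for count in balls_in_urn.values())

    # N_i = n * Gamma_i(r/n): number of urns holding exactly i balls.
    urns_with = Counter(balls_in_urn.values())
    urns_with[0] = n - len(balls_in_urn)
    formula = r - n * capacity + sum(
        (capacity - i) * urns_with[i] for i in range(capacity + 1)
    )
    return direct, formula

direct, formula = overflow_two_ways(n=1000, r=3000, capacity=2)
print(direct, formula)  # the two counts coincide
```

Working with integer counts $N_i$ rather than the fractions $\Gamma_i$ keeps the comparison exact.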

In order to compute

$$J_O(\eta, \beta) = -\lim_{n} \frac{1}{n} \log P\left( \frac{w(\beta n)}{n} > \eta \right),$$

we therefore consider sample paths which satisfy the end constraint

$$\sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2}$$

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta, \beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

$$\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad \text{and} \qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3}$$

We introduce the Lagrangian

$$L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left( 1 - \sum_{i=0}^{\infty} \pi_i \right) + z\left( \beta - \sum_{i=0}^{\infty} i\pi_i \right) + \lambda\left( \zeta - \sum_{i=0}^{I} (I - i)\pi_i \right),$$

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

$$\log \pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+ \lambda.$$

In terms of variables $C$, $\rho$ and $\nu$, we may write

$$\pi^*_i = C P_i(\rho\beta) \qquad \text{for } i \ge I$$

and

$$\pi^*_i = C P_i(\rho\beta) \nu^{I-i} \qquad \text{for } i \le I.$$

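The distribution is conditionally Poisson $\rho\beta/\nu$ for $i \le I$ and conditionally Poisson $\rho\beta$ for $i \ge I$. For convenience, we introduce the notation

$$Q_I(\rho) = \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} i P_i(\rho\beta),$$

$$R_I(\rho, \nu) = \nu^{I} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\!\left( \frac{\rho\beta}{\nu} \right) = \frac{\nu^{I+1}}{\rho\beta} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} i P_i\!\left( \frac{\rho\beta}{\nu} \right).$$

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

$$C (R_I(\rho, \nu) + Q_I(\rho)) = 1,$$

$$C \left( \frac{\rho\beta}{\nu} R_I(\rho, \nu) + \rho\beta Q_I(\rho) \right) = \beta,$$

$$C \left( I P_I(\rho\beta) + \left( I - \frac{\rho\beta}{\nu} \right) R_I(\rho, \nu) \right) = \zeta.$$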
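The three closed-form expressions in terms of $Q_I$ and $R_I$ can be checked against direct (truncated) sums over the unnormalized weights $P_i(\rho\beta)\nu^{(I-i)^+}$. The sketch below does this for one arbitrary choice of $(\rho, \nu, \beta, I)$; these values, the truncation level and the function names are assumptions for illustration, and the pair $(\rho, \nu)$ is not claimed to solve the constraint equations.

```python
from math import exp, factorial

def pois(i, lam):
    """Poisson probability P_i(lam)."""
    return exp(-lam) * lam**i / factorial(i)

def moments_two_ways(rho, nu, beta, I, terms=100):
    """Compare direct sums over pi*_i / C with the Q_I, R_I closed forms."""
    lam = rho * beta
    Q = 1.0 - sum(pois(i, lam) for i in range(I))          # sum_{i >= I} P_i(rho*beta)
    R = nu**I * exp(-lam * (1.0 - 1.0 / nu)) * sum(pois(i, lam / nu) for i in range(I))

    # Unnormalized weights pi*_i / C, truncated after `terms` entries.
    weights = [pois(i, lam) * nu ** max(I - i, 0) for i in range(terms)]
    direct = (
        sum(weights),                                       # total mass / C
        sum(i * w for i, w in enumerate(weights)),          # mean / C
        sum((I - i) * weights[i] for i in range(I + 1)),    # spare capacity / C
    )
    closed = (
        R + Q,
        lam / nu * R + lam * Q,
        I * pois(I, lam) + (I - lam / nu) * R,
    )
    return direct, closed

direct, closed = moments_two_ways(rho=1.2, nu=1.5, beta=3.0, I=2)
print(direct)
print(closed)
```

Agreement of the two tuples confirms that dividing by $C^{-1} = R_I + Q_I$ turns the constraints (4.3) into the three displayed equations.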

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\,\|\,P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[
\Bigl(\frac{\rho}{\nu} - 1\Bigr) R_I(\rho,\nu) + (\rho - 1)\, Q_I(\rho) = 0,
\]

\[
\Bigl(I - \frac{\rho\beta}{\nu} - \zeta\Bigr) R_I(\rho,\nu) - \zeta\, Q_I(\rho) + I P_I(\rho\beta) = 0.
\]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

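The large deviations exponent for the overflow problem may then be expressed

\[
\begin{aligned}
J_O(\eta,\beta) &= D(\pi^*\,\|\,P(\beta)) \\
&= \sum_{i=0}^{\infty} \pi_i^* \log\bigl(C e^{\beta-\rho\beta}\rho^{i}\nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1-\rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}
\]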
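The closed form above can be checked numerically. The following Python sketch is an illustration only and is not part of the original analysis: the function names `log_poisson`, `tilted_overflow` and `exponent_formula`, and the truncation level `n_terms`, are assumptions introduced here. It builds the tilted distribution $\pi_i^* = C P_i(\rho\beta)\nu^{(I-i)^+}$ for a given pair $(\rho,\nu)$, with $C$ fixed by normalization, and returns its mean, its spare capacity and its divergence from $P(\beta)$. It does not solve the two equations for $(\rho,\nu)$; a caller would compare the returned mean with $\beta$ and the returned spare capacity with the target $\zeta$.

```python
import math

def log_poisson(i, lam):
    """log P_i(lam) for the Poisson distribution with parameter lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def tilted_overflow(rho, nu, beta, I, n_terms=200):
    """Tilted distribution pi*_i = C P_i(rho*beta) nu^{(I-i)^+}, truncated (assumption).

    Returns (C, mean, zeta, divergence) where C normalizes pi*,
    mean = sum_i i pi*_i, zeta = sum_i (I-i)^+ pi*_i and
    divergence = D(pi* || P(beta)), all computed by direct summation.
    """
    log_w = [log_poisson(i, rho * beta) + max(I - i, 0) * math.log(nu)
             for i in range(n_terms)]
    C = 1.0 / sum(math.exp(lw) for lw in log_w)
    pi = [C * math.exp(lw) for lw in log_w]
    mean = sum(i * p for i, p in enumerate(pi))
    zeta = sum(max(I - i, 0) * p for i, p in enumerate(pi))
    # Work with log-probabilities so that far-tail terms underflow harmlessly.
    div = sum(p * (math.log(C) + lw - log_poisson(i, beta))
              for i, (p, lw) in enumerate(zip(pi, log_w)))
    return C, mean, zeta, div

def exponent_formula(C, rho, nu, beta, mean, zeta):
    """log C + beta(1-rho) + mean*log(rho) + zeta*log(nu).

    When the mean constraint holds (mean == beta) this is the closed form for J_O.
    """
    return math.log(C) + beta * (1.0 - rho) + mean * math.log(rho) + zeta * math.log(nu)
```

With $\rho = \nu = 1$ the sketch returns the zero cost Poisson distribution, with $C = 1$ and zero divergence, which is a convenient sanity check before searching for the pair $(\rho,\nu)$ that matches a prescribed $\zeta$.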

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I+1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section, we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[
J_C(\alpha,\beta,\xi) = -\lim_{n} \frac{1}{n}\log P\Bigl(\sum_{i=0}^{I}\Gamma_i^{n}(\beta) < \xi\Bigr),
\]

where

\[
\xi < \sum_{k=0}^{K}\alpha_k\sum_{i=0}^{I-k}P_i(\beta)
\]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{I-k}\pi_{kj} = \xi,
\]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega_{kj} = \pi_{kj}^* =
\begin{cases}
C_k P_j(\rho\beta)\, W, & j + k \le I,\\
C_k P_j(\rho\beta), & j + k > I.
\end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

\[
J_C(\alpha,\beta,\xi) = \beta(1-\rho+\log\rho) + \xi\log W + \sum_{k=0}^{K}\alpha_k\log C_k.
\]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
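The zero-cost figures quoted in this example can be reproduced with a few lines of Python. The sketch below is illustrative only: the helper names `poisson_cdf` and `zero_cost_low_occupancy` are assumptions introduced here. It uses the fact that, in the zero-cost case, a type that starts with $k$ coupons receives a Poisson($\beta$) number of additional coupons, so that the expected fraction of types with $I$ or fewer coupons is $\sum_k \alpha_k \sum_{i \le I-k} P_i(\beta)$. It does not solve for $J_C$; the value 0.18 is taken from the text.

```python
import math

def poisson_cdf(m, lam):
    """P(X <= m) for X ~ Poisson(lam); the empty sum gives 0 when m < 0."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(m + 1))

def zero_cost_low_occupancy(alpha, beta, I):
    """Zero-cost fraction of types holding I or fewer coupons after beta*n draws."""
    return sum(a * poisson_cdf(I - k, beta) for k, a in enumerate(alpha))

# Example from the text: alpha = [0.5, 0.3, 0.2], beta = 2, I = 3.
psi3 = zero_cost_low_occupancy([0.5, 0.3, 0.2], 2.0, 3)

# Large deviations estimate exp(-n * J_C) with n = 100 and the quoted J_C = 0.18.
prob = math.exp(-100 * 0.18)
```

Here `psi3` is about 0.71, so roughly 29 of the 100 types are expected to hold four or more coupons, and `prob` is of order $10^{-8}$, in line with the numbers above.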

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and that such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14, using the Euler–Lagrange equations together with properties of the relative entropy.

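A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

\[
\sum_{j=0}^{i}\alpha_j \ge \sum_{j=0}^{i}\omega_j, \qquad i = 0,\ldots,I \quad\text{(monotonicity)} \tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad\text{(conservation)}. \tag{A.5}
\]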

Proof of Lemma 2.4. If a valid occupancy curve $\gamma\colon [0,\beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I}\Bigl(\sum_{j=0}^{i}\alpha_j - \sum_{j=0}^{i}\omega_j\Bigr) &= \sum_{j=0}^{I}(I+1-j)(\alpha_j-\omega_j) \\
&= (I+1)(\omega_{I+}-\alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j-\alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\gamma_i(x) = \alpha_i + (\omega_i-\alpha_i)\frac{x}{\beta}, \qquad i = 0,\ldots,I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) = \alpha_{I+1} + (\omega_{I+}-\alpha_{I+1})\frac{x}{\beta}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i \le 1$ follows from (A.5).
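The feasibility test of Lemma 2.4 and the linear path used in the proof are simple enough to state as code. The following Python sketch is illustrative only; the function names `is_feasible` and `linear_path`, the list layout of the arguments and the tolerance `tol` are assumptions made here. The vector `alpha` holds $\alpha_0,\ldots,\alpha_{I+1}$, the vector `omega` holds $\omega_0,\ldots,\omega_I$, and $\omega_{I+}$ is recovered as $1-\sum_i\omega_i$.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (A.4) and conservation (A.5) conditions."""
    I = len(omega) - 1
    omega_plus = 1.0 - sum(omega)
    # (A.4): partial sums of alpha dominate partial sums of omega.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False
    # (A.5): balls counted at the end cannot exceed initial balls plus beta.
    lhs = sum(i * w for i, w in enumerate(omega)) + (I + 1) * omega_plus
    rhs = sum(i * a for i, a in enumerate(alpha)) + beta
    return lhs <= rhs + tol

def linear_path(alpha, omega, beta, x):
    """Linear occupancy path from the proof: [gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)]."""
    g = [a + (w - a) * x / beta for a, w in zip(alpha, omega)]
    return g + [1.0 - sum(g)]
```

For instance, with empty initial conditions `alpha = [1, 0, 0]` and `omega = [0.2, 0.3]`, the constraints are feasible for `beta = 2` but not for `beta = 1`, since at least 1.3 balls per urn are needed to reach the terminal occupancies.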

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0,\ldots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^{\beta} L(\psi(x),\theta(x))\,dx.
\]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi,\theta) &= D(\theta\,\|\,\gamma) \\
&= \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}} + \Bigl(1-\sum_{i=0}^{I}\theta_i\Bigr)\log\frac{1-\sum_{i=0}^{I}\theta_i}{1-\psi_I}.
\end{aligned}
\]

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
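As a concrete reading of the local rate function, the following Python sketch evaluates $D(\theta\,\|\,\gamma)$ for finite vectors. It is an illustration under assumptions made here: the name `local_rate` is hypothetical, both arguments are lists of length $I+2$ whose last entry is the $I+$ component, and the usual conventions $0\log 0 = 0$ and $t\log(t/0) = \infty$ for $t > 0$ are applied.

```python
import math

def local_rate(gamma, theta):
    """Relative entropy D(theta || gamma) over the categories 0..I and I+."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue            # convention: 0 log 0 = 0
        if g == 0.0:
            return math.inf     # balls entering a category that has no urns
        total += t * math.log(t / g)
    return total
```

The cost vanishes exactly when the actual rates `theta` match the expected rates `gamma`, which corresponds to the zero-cost path.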

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8],

\[
\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x)),
\]

for all $i \in \{0,\ldots,I\}$, $x \in (0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections, we will construct a family of extremal paths for the cost function given above and show that the extremal paths are in fact globally minimal.

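In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i-\psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1}-\psi_i} = \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i-\psi_{i-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\Bigr] \tag{A.7}
\]

for $i = 0,\ldots,I-1$, and by

\[
-\frac{\theta_I}{\psi_I-\psi_{I-1}} + \frac{\theta_{I+}}{1-\psi_I} = \frac{d}{dx}\Bigl[-\log\frac{\theta_I}{\psi_I-\psi_{I-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\Bigr]. \tag{A.8}
\]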

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

\[
L(\psi,\theta) = \sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}. \tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i-\psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1}-\psi_i} = \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i-\psi_{i-1}}\Bigr] \tag{A.10}
\]

for $i = 0,\ldots,I-1$.

A.4. Characterization of the extremals under empty initial conditions.

In this section, we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} CP_i(\rho\beta) = 1,
\]

\[
\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} iCP_i(\rho\beta) = \beta.
\]
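In the exponential case ($C > 0$), these two equations can be solved by a one-dimensional search. Dividing the second equation by the first eliminates $C$ and leaves the requirement that the mean of a Poisson($\rho\beta$) variable conditioned to exceed $I$ equal $(\beta-\sum_i i\omega_i)/(1-\sum_i\omega_i)$; this conditional mean is increasing in $\rho\beta$, so bisection applies. The Python sketch below illustrates this approach and is not taken from the paper: the names `log_poisson`, `tail_stats` and `twist_parameters`, the truncation rule for the infinite sums and the bracketing interval are all assumptions made here.

```python
import math

def log_poisson(i, lam):
    """log P_i(lam) for the Poisson distribution with parameter lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def tail_stats(lam, I, n_max):
    """(sum_{i>I} P_i(lam), sum_{i>I} i P_i(lam)), truncated at n_max (assumption)."""
    t0 = t1 = 0.0
    for i in range(I + 1, n_max + 1):
        p = math.exp(log_poisson(i, lam))
        t0 += p
        t1 += i * p
    return t0, t1

def twist_parameters(omega, beta, iters=200):
    """Return (C, rho) solving the two twist equations in the exponential case."""
    I = len(omega) - 1
    a = 1.0 - sum(omega)                                 # mass on occupancies > I
    b = beta - sum(i * w for i, w in enumerate(omega))   # balls held by those urns
    if a <= 0.0 or b <= (I + 1) * a:
        raise ValueError("not in the exponential case")
    target = b / a                                       # required conditional mean
    n_max = I + 1 + int(target + 12.0 * math.sqrt(target) + 60.0)

    def cond_mean(lam):
        t0, t1 = tail_stats(lam, I, n_max)
        return t1 / t0 if t0 > 0.0 else float(I + 1)

    # The conditional mean tends to I+1 as lam -> 0 and is at least lam,
    # so the root lies in [lo, target].
    lo, hi = 1e-6, target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return a / tail_stats(lam, I, n_max)[0], lam / beta
```

As a consistency check, if $\omega_i = P_i(\beta)$ for $i \le I$ (the zero cost terminal condition), the sketch should return $C = 1$ and $\rho = 1$ up to rounding.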

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I}\bigl(\omega_k - CP_k(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^{k}, \tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x)
\]

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.
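The construction in Theorem A.1 can be evaluated directly, because the derivatives of $\psi_0$ are available in closed form: differentiating $(1-x/\beta)^k$ a total of $i$ times gives $(-1)^i\beta^{-i}\,k!/(k-i)!\,(1-x/\beta)^{k-i}$. The Python sketch below is an illustration only; the names `poisson_pmf`, `signed_derivative` and `gamma_i` are assumptions made here, and the twist parameters $C$ and $\rho$ are taken as given inputs rather than computed.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P_k(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def signed_derivative(i, x, omega, C, rho, beta):
    """(-1)^i times the i-th derivative of psi_0 at x, with psi_0 as in the theorem."""
    I = len(omega) - 1
    value = C * rho ** i * math.exp(-rho * x)      # exponential part
    for k in range(i, I + 1):                      # polynomial part (empty if i > I)
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)
        value += coeff * beta ** (-i) * falling * (1.0 - x / beta) ** (k - i)
    return value

def gamma_i(i, x, omega, C, rho, beta):
    """Occupancy gamma_i(x) = x^i / i! * (-1)^i psi_0^{(i)}(x)."""
    return x ** i / math.factorial(i) * signed_derivative(i, x, omega, C, rho, beta)
```

Two quick checks are available. With $C = \rho = 1$ and $\omega_i = P_i(\beta)$, the polynomial coefficients vanish and `gamma_i(i, x, ...)` reduces to the Poisson probability $P_i(x)$, the zero-cost path. With $C = 0$, `omega = [0.5, 0.5]` and `beta = 0.5`, one obtains $\gamma_0(x) = 1-x$ and $\gamma_1(x) = x$, the polynomial extremal in which every ball lands in an empty urn.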

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I}\omega_k\Bigl(1-\frac{x}{\beta}\Bigr)^{k}. \tag{A.10}
\]

We refer to such paths as polynomial extremals and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[
(-1)^i\varphi^{(i)}(x) \ge 0 \quad\text{for all } x \in [a,b] \text{ and } i > 0. \tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0,\beta)$ and $i = 0,\ldots,I$. In the case $C > 0$, this can be strengthened to all $i = 0,1,\ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} + \sum_{k=i}^{I}\bigl(\omega_k - CP_k(\rho\beta)\bigr)\,\beta^{-i}\frac{k!}{(k-i)!}\Bigl(1-\frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0,\ldots,I$, and

\[
(-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} \tag{A.12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0,1,\ldots$. We deduce that $\varphi(x) \doteq (-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\begin{aligned}
\varphi(\beta) &= (-1)^I\psi_0^{(I)}(\beta) \\
&= \rho^I Ce^{-\rho\beta} + \Bigl(\omega_I - Ce^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I}I! \\
&= \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I \ge 0,
\end{aligned}
\]

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that

\[
(-1)^I\psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I > 0 \quad\text{for all } x \in [0,\beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Bigl(1-\sum_{i=0}^{I}P_i(\rho\beta)\Bigr) + \sum_{i=0}^{I}\omega_i = 1.
\]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad i = 0,1,\ldots,I,I+1,\ldots. \tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I}\theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty}P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty}P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1-\psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

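From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\ i \ge 0. \tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta), \tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\ldots,I$, so that (A.16) implies (A.10).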

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1-\psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1-\psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

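Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

\[
(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad\text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0,\ldots,\gamma_I,\gamma_{I+}\}$, where $\gamma_{I+} = 1-\sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.17}
\]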

Proof. For all $i > I$ and $x \in [0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1-\psi_I).
\]

Then, for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1+\sum_{i=0}^{I}\dot\psi_i)/(1-\psi_I) = \rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i-\psi_{i-1}} + (1+\dot\psi_0+\cdots+\dot\psi_I)\log\frac{1+\dot\psi_0+\cdots+\dot\psi_I}{1-\psi_I} \\
&= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i-\psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]

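Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]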

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x)-\dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)\bigl[\log C + i\log\rho - \rho x\bigr],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty}\bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon\downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\ldots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\ldots,I-1,
\]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\ldots,I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi,y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1-\sum_{i=0}^{\infty}\pi_i\Bigr) + z\Bigl(\beta-\sum_{i=0}^{\infty} i\pi_i\Bigr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi_i^* \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi_i^*$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi_i^*$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi_i^*\log\frac{\pi_i^*}{P_i(\beta)} - y\pi_i^* - zi\pi_i^*.
\]

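Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,P(\beta)) &= \inf_{\pi\in\mathcal{F}(1,\omega,\beta)} L(\pi,y,z) \\
&\ge \inf_{\pi\in\mathbb{R}_+^{\infty}} L(\pi,y,z) \\
&= D(\pi^*\,\|\,P(\beta)).
\end{aligned}
\]

Since $\pi^* \in \mathcal{F}(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi_i^*$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I,\\
CP_i(\rho\beta), & i > I.
\end{cases}
\]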

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)} + \Bigl(1-\sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr) + \Bigl(\beta-\sum_{i=0}^{I} i\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
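The agreement between this explicit cost and the relative entropy of the terminal distribution can be confirmed numerically. The Python sketch below is illustrative only: the names `log_poisson`, `explicit_cost` and `direct_cost` and the truncation level `n_terms` are assumptions introduced here, and the inputs are assumed to satisfy the two twist equations, so that $1-\sum_i\omega_i$ and $\beta-\sum_i i\omega_i$ equal the mass and mean carried by the tail $CP_i(\rho\beta)$, $i > I$.

```python
import math

def log_poisson(i, lam):
    """log P_i(lam) for the Poisson distribution with parameter lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def head_cost(omega, beta):
    """sum_{i<=I} omega_i log(omega_i / P_i(beta)), with 0 log 0 = 0."""
    return sum(w * (math.log(w) - log_poisson(i, beta))
               for i, w in enumerate(omega) if w > 0.0)

def explicit_cost(omega, C, rho, beta):
    """Closed-form cost of an exponential extremal."""
    mass = 1.0 - sum(omega)
    balls = beta - sum(i * w for i, w in enumerate(omega))
    return (head_cost(omega, beta)
            + mass * (math.log(C) + (1.0 - rho) * beta)
            + balls * math.log(rho))

def direct_cost(omega, C, rho, beta, n_terms=200):
    """D(pi* || P(beta)) by direct summation, with pi*_i = C P_i(rho*beta) for i > I."""
    I = len(omega) - 1
    total = head_cost(omega, beta)
    for i in range(I + 1, n_terms):
        log_p = math.log(C) + log_poisson(i, rho * beta)
        total += math.exp(log_p) * (log_p - log_poisson(i, beta))
    return total
```

A consistent test case can be built by choosing $C$, $\rho$, $\beta$ first and then solving the twist equations for $\omega$, as in the small example with $I = 1$ used to exercise this sketch.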

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{ki}\}_i$, where the function $\gamma_{ki}\colon [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point, we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

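The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi\in S_\infty^{|\mathcal{K}|}}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} \quad\text{for all } 0 \le i \le I. \tag{A.22}
\]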

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i:\,P_i>Q_i}(P_i-Q_i) = \sum_{i:\,Q_i>P_i}(Q_i-P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \Bigl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_{k}\alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[
Q_A(\alpha,\omega,\beta) \doteq \mathcal{F}(\alpha,\omega,\beta)\cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi(k)$ is equal to the limit of the means of the $\pi^{(n)}(k)$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}(k)$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[ \sum_{j \ge m} j\,\pi^{(n)}_{kj} \le \varepsilon \tag{A.24} \]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[ xy \le (x\log x - x + 1) + (e^y - 1), \]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty. \]
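Both ingredients of the uniform integrability argument are easy to check numerically. The sketch below is illustrative only: the function names, the truncation at 400 terms, and the sample values of $\beta$ and $\delta$ are assumptions made for the example.

```python
import math


def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, which should be >= 0 for x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0  # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y


def exp_moment(beta, delta, terms=400):
    """Truncated series sum_j e^{j/delta} P_j(beta) for a Poisson(beta) law.

    Each term is evaluated through its logarithm to avoid overflow.
    """
    return sum(
        math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
        for j in range(terms)
    )


beta, delta = 2.0, 1.0
series = exp_moment(beta, delta)
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
tight = young_gap(math.exp(1.5), 1.5)  # equality case x = e^y
```

The inequality is tight exactly at $x = e^y$, which is where the supremum defining $e^y - 1$ is attained.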

Taking $y = j/\delta$ and $x = \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\begin{align*}
\sum_{j \ge m} j\,\pi^{(n)}_{kj}
&= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{kj}\log\Bigl(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\Bigr) - \pi^{(n)}_{kj} + P_j(\beta)\Bigr] + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&= \delta D(\pi^{(n)}(k)\,\|\,P(\beta)) + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&\le \delta A + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta),
\end{align*}

where the second equality uses the fact that both $\pi^{(n)}(k)$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[ \beta^{(n)}_k \to \beta_k. \]

Finally,

\[ \sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta, \]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[ G = \inf_{\pi \in Q_A} \sum_{k \in K} \alpha_k D(\pi(k)\,\|\,P(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}(k)\,\|\,P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[ \sum_{k \in K} \alpha_k D(\pi^*(k)\,\|\,P(\beta)) \le \liminf_n \sum_{k \in K} \alpha_k D(\pi^{(n)}(k)\,\|\,P(\beta)) = G, \]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that

\[ (\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha,\omega,\beta) \]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[ \liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta). \]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} = G$ we may choose a subsequence of problems such that

\[ \mathcal{J}^{(n)} \le G + 1/n < A, \]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$ then

\[ \beta^{(n)}_k = \sum_{j=0}^{\infty} j\,\pi^{(n)}_{kj} \;\to\; \beta_k = \sum_{j=0}^{\infty} j\,\pi_{kj}, \qquad k = 0, 1, \dots, K. \]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$ even if $\alpha_k = 0$.
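The closed form $\beta - \lambda + \lambda\log(\lambda/\beta)$ quoted above is the relative entropy of a Poisson($\lambda$) law with respect to Poisson($\beta$), and the Poisson($\lambda$) law is the minimizer among distributions with mean $\lambda$. The following sketch checks the formula numerically and compares it with one competitor of the same mean. The function names, the truncation at 80 terms, and the sample values of $\lambda$ and $\beta$ are assumptions made for the illustration.

```python
import math


def log_poisson_pmf(j, mean):
    """log P_j(mean) for a Poisson distribution."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)


def kl_to_poisson(pi, beta):
    """D(pi || P(beta)) for a finitely supported (or truncated) distribution pi."""
    return sum(
        p * (math.log(p) - log_poisson_pmf(j, beta))
        for j, p in enumerate(pi)
        if p > 0
    )


def mean_constrained_infimum(lam, beta):
    """Closed form beta - lam + lam * log(lam / beta)."""
    return beta - lam + lam * math.log(lam / beta)


lam, beta = 3.0, 1.5
poisson_lam = [math.exp(log_poisson_pmf(j, lam)) for j in range(80)]  # truncated Poisson(lam)
point_mass = [0.0, 0.0, 0.0, 1.0]                                     # also has mean 3

cost_poisson = kl_to_poisson(poisson_lam, beta)
cost_point = kl_to_poisson(point_mass, beta)
bound = mean_constrained_infimum(lam, beta)
```

Because the bound grows like $\lambda\log\lambda$, multiplying it by $\alpha^{(n)}_k \to 0$ while $\alpha^{(n)}_k\beta^{(n)}_k$ stays bounded away from zero forces the product to diverge, which is the point used in the proof.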

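Thus

\[ \sum_{k \in K} \beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta, \]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\begin{align*}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)}))
&\ge \liminf_n \sum_{k : \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k D(\pi(k)\,\|\,P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{align*}

as desired.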

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|K|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|K|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\bar\pi$ be an arbitrary set of probability distributions in $S_\infty^{|K|}$, subject to the restrictions that $\bar\pi_{mn} > 0$, that $\bar\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\bar\pi$ has finite support. If we define

\[ \bar\omega_i = \sum_{k=0}^{i} \alpha_k\bar\pi_{k,i-k}, \qquad \bar\beta = \sum_{k \in K} \alpha_k \sum_{j=0}^{\infty} j\,\bar\pi_{kj}, \]

then $\bar\pi$ is a feasible solution for the constraints $(\alpha, \bar\omega, \bar\beta)$. Next, for any $\eta > 0$ we may define

\[ \tilde\beta = (\beta - \eta\bar\beta)/(1 - \eta), \qquad \tilde\omega = (\omega - \eta\bar\omega)/(1 - \eta). \]

By construction, the constraints $(\alpha, \tilde\omega, \tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \bar\omega, \bar\beta)$, and hence for $(\alpha, \tilde\omega, \tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha, \tilde\omega, \tilde\beta)$, and finally form $\hat\pi = \eta\bar\pi + (1 - \eta)\tilde\pi$. By construction, $\hat\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\hat\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\hat\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\hat\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\hat\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\begin{align*}
f'(\varepsilon)
&= \sum_{k \in K} \alpha_k \sum_{j=0}^{\infty} \Bigl(\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} + 1\Bigr)(\hat\pi_{kj} - \pi_{kj}) \\
&= \sum_{k \in K} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}\,(\hat\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\hat\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{P_n(\beta)}
 + \sum_{k \in K} \alpha_k \sum_{(k,j) \ne (m,n)} \hat\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}
 - \sum_{k \in K} \alpha_k \sum_{j} \pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}.
\end{align*}

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[ -\sum_k \alpha_k D(\pi(k)\,\|\,P(\beta)) \le 0. \]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \hat\pi_{kj}\log P_j(\beta)$, which is finite because $\hat\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in K}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in K,\ j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise.} \end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in K$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.3.3). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k \in K} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
 + \sum_{k \in K} z_k\alpha_k\Bigl(1 - \sum_{k+l \in \mathcal{I}} \pi_{kl}\Bigr)
 + \sum_{i \in \mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Bigr) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[ \pi^*_{kj} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
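The product form $D_k P_j(\beta) W_{k+j}$ suggests a simple way to approximate the minimizer numerically in the polynomial case: alternately rescale rows (the $D_k$ factors) and antidiagonals (the $W_{k+j}$ factors) of the reference array $\alpha_k P_j(\beta)$ until both families of constraints hold. This is iterative proportional fitting, a standard technique that is swapped in here for illustration; it is not the argument used in the proof. The function names, the fixed number of sweeps, and the example constraints are assumptions, and strictly positive $\alpha$ and $\omega$ with $K \le I$ are assumed.

```python
import math


def poisson_pmf(j, mean):
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))


def ipf_minimizer(alpha, omega, beta, sweeps=2000):
    """Iterative proportional fitting sketch for the polynomial case.

    Works with q[k][j] = alpha_k * pi_kj on the support k + j <= I, starting
    from the reference alpha_k * P_j(beta). Every update multiplies a whole
    row or a whole antidiagonal by a constant, so the product form is kept.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    q = [[alpha[k] * poisson_pmf(j, beta) for j in range(I - k + 1)]
         for k in range(K + 1)]
    for _ in range(sweeps):
        for k in range(K + 1):  # row constraint: sum_j q_kj = alpha_k
            s = sum(q[k])
            q[k] = [v * alpha[k] / s for v in q[k]]
        for i in range(I + 1):  # antidiagonal constraint: sum_k q_{k,i-k} = omega_i
            ks = range(min(i, K) + 1)
            s = sum(q[k][i - k] for k in ks)
            for k in ks:
                q[k][i - k] *= omega[i] / s
    return [[v / alpha[k] for v in q[k]] for k in range(K + 1)]


# Example: conservation gives beta = sum_i i*omega_i - sum_k k*alpha_k = 0.5.
alpha = [0.5, 0.5]
omega = [0.25, 0.5, 0.25]
beta = 0.5
pi_star = ipf_minimizer(alpha, omega, beta)
```

For this small example the feasible set is a one-parameter family, and setting the derivative of the objective to zero along it gives $\pi^*_{02} = \sqrt{1.25} - 1$, which the iteration reproduces.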

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in K}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in K,\ k + j \le I,\ k + j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in K,\ k + j > I, \\ 0, & \text{otherwise.} \end{cases} \]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[ \beta^{(M)} = \sum_{k \in K} \alpha_k \sum_{l \le M} l\,\pi^*_{kl}, \qquad \eta^{(M)}_k = \sum_{l \le M} \pi^*_{kl}, \]

and consider the problem of minimizing

\[ \sum_{k \in K} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} \]

subject to the constraints

\begin{align*}
\sum_{k \in K} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in K, \\
\sum_{k \le i,\ k \in K} \alpha_k\pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{align*}

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.3.3. The Lagrangian for the $M$th problem is

\begin{align*}
L^{(M)} = {} & \sum_{k \in K} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
 + y^{(M)}\Bigl(\beta^{(M)} - \sum_{k \in K} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{kl}\Bigr) \\
 & + \sum_{k \in K} z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl}\Bigr)
 + \sum_{i \in \mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k \le i,\ k \in K} \alpha_k\pi_{k,i-k}\Bigr).
\end{align*}

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[ \pi^*_{kj} = P_j(\beta)\,e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}} \]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$ note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k = \sum_j j\,\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^*(k)$ meets $I - k$ terminal constraints, has mean $\beta_k$, and has a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} = \pi^*_{kj}$ and $\omega(k) = (\omega_{k0}, \dots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \dots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define the rescaled curves $\bar\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k\bar\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

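In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve after rescaling, $\bar\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$, is

\begin{align*}
\bar\psi_{k0}(x)
&= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k P_i(\rho_k\beta_k))\Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^i \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k P_i(\rho\beta))\Bigl(1 - \frac{x}{\beta}\Bigr)^i \\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)P_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^i\Bigr].
\end{align*}

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{00}$ is

\begin{equation}
\begin{split}
(-1)^k\psi^{(k)}_0(x)
&= \alpha_0 C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Bigr] \\
&= \alpha_0 C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)P_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^j\Bigr] \\
&= \frac{\alpha_0 C_0\rho^k}{C_k}\,\bar\psi_{k0}(x).
\end{split}
\tag{A.25}
\end{equation}

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\begin{align*}
\gamma_i(x)
&= \sum_{k=0}^{i} \alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi^{(i-k)}_{k0}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{i-k}}{dx^{i-k}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi^{(i)}_0(x).
\end{align*}

Similarly, we obtain

\begin{align*}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i} \alpha_k\frac{\beta_k}{\beta}\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{i-k+1}}{dx^{i-k+1}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x),
\end{align*}

so that the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}, \tag{A.26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[ \frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27} \]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[ (-1)^k\psi^{(k)}_0(x) = \frac{\alpha_0 D_0}{D_k}\,\bar\psi_{k0}(x) \tag{A.28} \]

and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x) = \gamma_i(x)\,\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)} \tag{A.29} \]

for $i = 0, \dots, I$, establishing the restricted set of equations (A.10).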

In order to demonstrate a strong minimum we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \dots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i \]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x))$, which is finite at both endpoints because $(-1)^i\psi^{(i)}_0(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

\begin{align*}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(\beta_k)\log|\psi^{(i)}_0(\beta)|\Bigr] - \sum_{k=0}^{K} \alpha_k\log|\psi^{(k)}_0(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \gamma_{kj}(\beta_k)\log\frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0)\,e^{-\beta}}.
\end{align*}

The fact that $\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi^{(k)}_0(x) = \psi^{(k)}_0(0) \times \psi_{k0}(x\beta_k/\beta)$, so that

\[ \frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^j\psi^{(j)}_{k0}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\gamma_{kj}(x\beta_k/\beta)}{(-x)^j e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\gamma_{k\cdot}(\beta_k)\,\|\,P(\beta)). \]

By construction, the endpoints $\gamma_{kj}(\beta_k)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $O$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $O(\alpha,\omega,\beta) = \{\gamma \in O : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\,\|\,\gamma)$, and hence omitted.

Lemma A.15.

(a) $O(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $O(\alpha,\omega,\beta)$ is a convex function.
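The convexity of the map $(\theta,\gamma) \to D(\theta\,\|\,\gamma)$ that underlies Lemma A.15 can be illustrated numerically. The sketch below assumes that $D$ is the usual relative entropy between finite probability vectors; the helper names and the two sample pairs are hypothetical.

```python
import math


def rel_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i log(theta_i / gamma_i), with 0 log 0 = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)


def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two vectors."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]


# Two hypothetical (rate, occupancy) pairs.
theta1, gamma1 = [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]
theta2, gamma2 = [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]

# Gap between the chord and the function along the segment joining the pairs.
gaps = [
    lam * rel_entropy(theta1, gamma1)
    + (1 - lam) * rel_entropy(theta2, gamma2)
    - rel_entropy(mix(theta1, theta2, lam), mix(gamma1, gamma2, lam))
    for lam in (0.1, 0.25, 0.5, 0.75, 0.9)
]
```

Joint convexity means every entry of `gaps` is nonnegative; integrating the pointwise inequality over $x$ is what yields the convexity of $J$ on the convex set $O(\alpha,\omega,\beta)$.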

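For a given pair of occupancy functions $\gamma, \bar\gamma \in O(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$

and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[ G[\varepsilon] = J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x)\,\|\,\gamma^\varepsilon(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in O(\alpha,\omega,\beta)$.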

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma) \le J(\bar\gamma). \]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

\[ \eta(x) = \bar\psi(x) - \psi(x), \]

where $\bar\psi$ is the cumulative occupancy function of the competing path $\bar\gamma$, we may write

\[ G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx. \tag{A.30} \]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta) \ge \alpha_i\gamma_{i0}(x\beta_i/\beta). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where the derivative $\bar\theta$ of the competing path may not exist.

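The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\Big|_{\gamma^\varepsilon}(x)\,\eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\Big|_{\gamma^\varepsilon}(x)\,\dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\begin{align*}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{align*}

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \Bigl|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\Bigr| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.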

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx \]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[ \int_0^x \frac{\partial L}{\partial\psi_i}\Big|_{\gamma}\,dx' = \int_0^x \Bigl(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\Bigr)\Big|_{\gamma}\,dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\begin{align*}
G'_+[0]
&= \int_0^\beta \Bigl\{\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\Bigr\}\,dx \\
&= \int_0^\beta \dot\eta(x)\cdot\Bigl(-\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\Bigr)\,dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i(\eta_i(0) - \eta_i(\beta)) = 0.
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that the extremal $\gamma$ and the competing path $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\bar\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x^{(n)}_1)$ and terminal point $\bar\gamma(x^{(n)}_2)$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx. \]

It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}(\bar\gamma(x^{(n)}_1),\ \bar\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx \\
&= J(\bar\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


(for τ gt 0 small) with sufficiently high probability Given τ isin (0 β] σ gt 0and ε gt 0 define

h(γ) =

0 |γ minus γ(τ)|lt σ22ε else

For n large enough that |γ(lfloornτrfloorn) minus γ(τ)| le σ2 (and independent ofτ isin (0 β]) we have the inequality

PΓn(0)(|Γn(lfloornτrfloorn)minus γ(lfloornτrfloorn)|lt σ) + eminus2nε

ge PΓn(0)(|Γn(lfloornτrfloorn)minus γ(τ)|lt σ2) + eminus2nε

geEΓn(0)(expminusnh(Γn(lfloornτrfloorn)))

We next exploit a representation for exponential integrals that will give usan explicit lower bound on the last quantity Consider a process Γn(x) con-structed as follows The process dynamics are of the same general structureas those of Γn save that bi(Γ

n(in)) is replaced by a sequence bni

Γn(

i+1

n

)

= Γn(

i

n

)

+1

nbni Γn(0) = Γn(0)

Furthermore the distribution of bni is allowed to depend in any measurableway upon the set of values Γn(jn)0le j le i Let microγ denote the distribu-tion of bi(γ) and (without explicitly exhibiting all the dependencies) let micronidenote the (random) distribution of bni given Γn(jn)0 le j le i We letEΓn(0) denote expectation on the space that supports these processes Itfollows from [11] Theorem 431 that

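\[
-\frac{1}{n}\log E_{\Gamma^n(0)}\Bigl(\exp\Bigl\{-n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr)\Bigr\}\Bigr)
= \inf \bar E_{\Gamma^n(0)}\Bigl[\, h\Bigl(\bar\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr) + \frac{1}{n}\sum_{i=1}^{\lfloor n\tau\rfloor - 1} D\bigl(\bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) \Bigr],
\]

where the infimum is over all such processes $\bar\Gamma^n(x)$. In order to obtain a lower bound, we now simply insert a particular choice for the random variables $\bar b^n_i$. We can write $\gamma(\tau) - \gamma(0) = Mv\tau$ for some probability vector $v$. Define a process $\bar b^n_i$ as follows: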

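\[
\bar b^n_i = \begin{cases}
e_j - e_{j-1}, & \text{if } \Bigl\lfloor \sum_{k=0}^{j-1} v_k \tau n \Bigr\rfloor \le i \le \Bigl\lfloor \sum_{k=0}^{j} v_k \tau n \Bigr\rfloor - 1 \text{ for } 0 \le j \le I, \\[4pt]
0, & \text{if } \Bigl\lfloor \sum_{k=0}^{I} v_k \tau n \Bigr\rfloor \le i \le \lfloor \tau n \rfloor - 1.
\end{cases}
\]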

In other words, $\bar b^n_i$ defines a deterministic discrete time approximation to the continuous time occupancy rate process that uses $e_0 - e_1$ for an amount of time $v_0\tau$, $e_1 - e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j - e_{j-1}$ is used at the discrete time step $i$, then since $\bar\mu^n_i$ concentrates its mass on $e_j - e_{j-1}$, the cost is

\[
D\bigl(\bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) = \log\Bigl(\frac{1}{\bar\Gamma^n_{j-1}(i/n)}\Bigr) = -\log\Bigl(\bar\Gamma^n_{j-1}\Bigl(\frac{i}{n}\Bigr)\Bigr). \tag{3.3}
\]

The process $\bar\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\bar\Gamma^n_{j-1}((i+1)/n) - \bar\Gamma^n_{j-1}(i/n) = -1/n$ for $\lfloor \sum_{k=0}^{j-1} v_k\tau n\rfloor \le i \le \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor - 1$,

\[
\bar\Gamma^n_{j-1}\Bigl(\frac{i}{n}\Bigr) \downarrow \bar\Gamma^n_{j-1}\Bigl(\frac{1}{n}\Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor\Bigr)
\]

as $i \uparrow \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor$. In addition, because the $(j-1)$st component is never modified when $i \ge \lfloor \sum_{k=0}^{j} v_k\tau n\rfloor$, it follows that

\[
\bar\Gamma^n_{j-1}\Bigl(\frac{1}{n}\Bigl\lfloor \sum_{k=0}^{j} v_k\tau n \Bigr\rfloor\Bigr) \to \gamma_{j-1}(\tau)
\]

as $n \to \infty$ and $\eta \to 0$. Furthermore, as observed previously, (3.2) implies the existence of $b > 0$ and $K \in \mathbb{N}$ such that $\gamma_j(\tau) \ge b\tau^K$ for $j = 0, \ldots, I$. Thus at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\bar\Gamma^n$, which in turn provides a strictly finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau) \ge b\tau^K$ that for all sufficiently large $n$ and small $\eta > 0$ there are $C_1, C_2 < \infty$ (and independent of $\tau$) such that whenever $|\Gamma^n(0) - \gamma(0)| < \eta$, for all $i$,

\[
D\bigl(\bar\mu^n_i \,\|\, \mu_{\bar\Gamma^n(i/n)}\bigr) \le C_1[-\log \tau^K] \le -C_2 \log\tau.
\]

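In addition, as $n \to \infty$ and $\eta \to 0$,

\[
\bar\Gamma^n\Bigl(\frac{1}{n}\lfloor \tau n\rfloor\Bigr) \to \gamma(\tau).
\]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
-\frac{1}{n}\log E_{\Gamma^n(0)}\Bigl(\exp\Bigl\{-n h\Bigl(\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr)\Bigr\}\Bigr) \le -C_2\,\tau\log\tau.
\]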

We now choose $\tau > 0$ so that $-C_2\tau\log\tau \le \varepsilon/2$. Choosing $\tau > 0$ smaller if need be, we can also guarantee that $|\Gamma^n(x) - \gamma(x)| \le \delta$ for all $x \in [0,\tau]$ w.p.1 if $|\Gamma^n(0) - \gamma(0)| < \eta$ and $\eta > 0$ is sufficiently small. The following bound is therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0) - \gamma(0)| < \eta$ implies

\[
\liminf_{n\to\infty} \frac{1}{n}\log P_{\Gamma^n(0)}\Bigl( \Bigl|\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr) - \gamma\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr| < \sigma,\ \sup_{x\in[0,\tau]} |\Gamma^n(x) - \gamma(x)| < \delta \Bigr) \ge -\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x \in [0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta \in (0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x \in [\tau,\beta]$. Recall that when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i \in \{0, 1, \ldots, I+1\}$ (i.e., $\gamma \in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1, 2, \ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n) - \gamma(\lfloor n\tau\rfloor/n)| < \sigma$,

\[
\liminf_{n\to\infty} \frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}\Bigl( \sup_{\tau \le x \le \beta} |\Gamma^n(x) - \gamma(x)| < \zeta \Bigr) \ge -J(\gamma) - \frac{\varepsilon}{2},
\]

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[
\mathcal{J}(\Omega) = \inf_{\omega\in\Omega} \mathcal{J}(\omega).
\]

Using Theorem 2.7, we can write

\[
\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega\in\Omega}\ \inf_{\pi\in F(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr) \\
&= \inf_{\pi\in F(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr),
\end{aligned}
\tag{4.1}
\]

where we have abused notation to define $F(\alpha,\Omega,\beta) = \bigcup_{\omega\in\Omega} F(\alpha,\omega,\beta)$.

In many cases, for example when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega\in\Omega^\circ} \mathcal{J}(\omega) = \inf_{\omega\in\bar\Omega} \mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n}\log P(\Gamma^n(\beta) \in \Omega) = \inf_{\omega\in\Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.


4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[
\gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}.
\]

Using (2.5), we find that $\rho$ is determined by the unique nonnegative solution to

\[
\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.
\]

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = \mathcal{P}_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $\mathcal{P}_0(3.0) = 0.05$.
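As a hedged numerical illustration of this example, the sketch below solves $\rho(1-\omega_0) = 1 - e^{-\beta\rho}$ by bisection for $\omega_0 = 0.15$, $\beta = 3$, and evaluates the extremal $\gamma_0$ and the higher occupancies $\gamma_i(x) = \mathcal{P}_i(\rho x)/\rho$. The function names and the bisection bracket are assumptions made for illustration; the paper does not prescribe a solver. Since $\rho = 0$ always satisfies the equation, the sketch searches for the nonzero root.

```python
import math

def solve_rho(omega0, beta, tol=1e-12):
    """Nonzero root of rho*(1 - omega0) = 1 - exp(-beta*rho), found by bisection."""
    def f(rho):
        # -expm1(-z) equals 1 - exp(-z) but is accurate for tiny z
        return -math.expm1(-beta * rho) - rho * (1.0 - omega0)

    lo, hi = 1e-9, 1.0
    if f(lo) <= 0.0:
        # happens when beta <= 1 - omega0 (boundary or infeasible endpoint)
        raise ValueError("no positive root: need beta > 1 - omega0")
    while f(hi) > 0.0:          # f is concave, so it eventually turns negative
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Fraction of empty urns along the extremal path."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

def gamma_i(i, x, rho):
    """Fraction of urns holding exactly i >= 1 balls: Poisson(rho*x) mass over rho."""
    lam = rho * x
    return math.exp(-lam) * lam**i / math.factorial(i) / rho

omega0, beta = 0.15, 3.0
rho = solve_rho(omega0, beta)
print("rho =", rho)
print("gamma_0(beta) =", gamma0(beta, rho))
```

For these inputs the root lies slightly above 1, which matches the interpretation that an unusually large number of empty urns arises when balls pile into already occupied urns.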

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[
w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),
\]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

\[
w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\Gamma_i(r/n).
\]
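The identity for $w(r)$ can be checked on a simulated configuration. The following is a minimal sketch, assuming illustrative values of $n$, $r$ and the capacity $I$ and a fixed random seed (none of which come from the paper): it throws balls into infinite-capacity urns, counts the overflow directly, and compares it with the formula written in terms of the urn counts $n\Gamma_i$.

```python
import random

def overflow_direct(urn_counts, capacity):
    """Balls that would have hit the floor: excess over capacity, summed over urns."""
    return sum(max(c - capacity, 0) for c in urn_counts)

def overflow_from_occupancy(urn_counts, capacity, r):
    """w(r) = r - n*I + sum_{i<=I} (I - i) * (number of urns holding exactly i balls)."""
    n = len(urn_counts)
    spare = 0
    for i in range(capacity + 1):
        urns_with_i = sum(1 for c in urn_counts if c == i)  # equals n * Gamma_i
        spare += (capacity - i) * urns_with_i
    return r - n * capacity + spare

rng = random.Random(0)            # fixed seed, purely for reproducibility
n, r, capacity = 50, 120, 2       # illustrative sizes
counts = [0] * n
for _ in range(r):
    counts[rng.randrange(n)] += 1  # each ball picks an urn uniformly at random

direct = overflow_direct(counts, capacity)
via_formula = overflow_from_occupancy(counts, capacity, r)
print(direct, via_formula)
```

Working with integer urn counts rather than the fractions $\Gamma_i$ keeps the comparison exact.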

In order to compute

\[
J_O(\eta,\beta) = -\lim_{n} \frac{1}{n}\log P\Bigl(\frac{w(\beta n)}{n} > \eta\Bigr),
\]

we therefore consider sample paths which satisfy the end constraint

\[
\sum_{i=0}^{I} (I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2}
\]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $\mathcal{P}(\beta)$ under the linear equality constraints

\[
\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3}
\]


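We introduce the Lagrangian

\[
L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr) + \lambda\Bigl(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\Bigr),
\]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[
\log \pi^*_i = \log \mathcal{P}_i(\beta) - 1 + y + iz + (I - i)^+\lambda.
\]

In terms of variables $C$, $\rho$ and $\nu$, we may write

\[
\pi^*_i = C\mathcal{P}_i(\rho\beta) \quad\text{for } i \ge I
\]

and

\[
\pi^*_i = C\mathcal{P}_i(\rho\beta)\,\nu^{I-i} \quad\text{for } i \le I.
\]

The distribution is conditionally Poisson $\rho\beta/\nu$ for $i \le I$, and conditionally Poisson $\rho\beta$ for $i \ge I$. For convenience we introduce the notation

\[
Q_I(\rho) = \sum_{i=I}^{\infty} \mathcal{P}_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} i\,\mathcal{P}_i(\rho\beta),
\]

\[
R_I(\rho,\nu) = \nu^I e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} \mathcal{P}_i\Bigl(\frac{\rho\beta}{\nu}\Bigr) = \frac{\nu^{I+1}}{\rho\beta}\, e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} i\,\mathcal{P}_i\Bigl(\frac{\rho\beta}{\nu}\Bigr).
\]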

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[
C\bigl(R_I(\rho,\nu) + Q_I(\rho)\bigr) = 1,
\]

\[
C\Bigl(\frac{\rho\beta}{\nu} R_I(\rho,\nu) + \rho\beta\, Q_I(\rho)\Bigr) = \beta,
\]

\[
C\Bigl(I\,\mathcal{P}_I(\rho\beta) + \Bigl(I - \frac{\rho\beta}{\nu}\Bigr) R_I(\rho,\nu)\Bigr) = \zeta.
\]

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[
(\rho/\nu - 1) R_I(\rho,\nu) + (\rho - 1) Q_I(\rho) = 0,
\]

\[
(I - \rho\beta/\nu - \zeta) R_I(\rho,\nu) - \zeta Q_I(\rho) + I\,\mathcal{P}_I(\rho\beta) = 0.
\]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

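The large deviations exponent for the overflow problem may then be expressed as

\[
\begin{aligned}
J_O(\eta,\beta) &= D\bigl(\pi^* \,\|\, \mathcal{P}(\beta)\bigr) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl(C e^{\beta - \rho\beta} \rho^i \nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}
\]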
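The closed form for $J_O$ can be checked numerically without solving the constraint equations, by working backwards. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it picks a Poisson parameter $m = \rho\beta$ and a tilt $\nu$ (both arbitrary illustrative values), builds $\pi^*_i \propto \mathcal{P}_i(m)\,\nu^{(I-i)^+}$, reads off the $\beta$ and $\zeta$ that this distribution satisfies in (4.3), and compares the closed form with the relative entropy computed term by term. The infinite sums are truncated at a fixed number of terms.

```python
import math

def log_poisson_pmf(i, lam):
    """log of the Poisson(lam) mass at i, via lgamma to avoid huge factorials."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def twisted_distribution(m, nu, I, terms=150):
    """pi_i proportional to P_i(m) * nu^{(I-i)^+}, normalised over a truncated range."""
    weights = [math.exp(log_poisson_pmf(i, m)) * nu ** max(I - i, 0)
               for i in range(terms)]
    C = 1.0 / sum(weights)      # normalising constant, the C of the text
    return C, [C * w for w in weights]

# Illustrative inputs (assumptions): m plays the role of rho*beta.
m, nu, I = 3.0, 1.5, 2
C, pi = twisted_distribution(m, nu, I)

beta = sum(i * p for i, p in enumerate(pi))          # balls per urn implied by pi
zeta = sum((I - i) * pi[i] for i in range(I + 1))    # spare capacity implied by pi
rho = m / beta

closed_form = (math.log(C) + beta * (1.0 - rho)
               + beta * math.log(rho) + zeta * math.log(nu))
direct = sum(p * (math.log(p) - log_poisson_pmf(i, beta))
             for i, p in enumerate(pi) if p > 0.0)

print("beta =", beta, "zeta =", zeta, "rho =", rho)
print("closed form:", closed_form, "direct sum:", direct)
```

With $\nu > 1$ the implied parameters satisfy $\nu > \rho > 1$, in line with the remark that larger than expected spare capacity leads to this ordering.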

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[
J_C(\alpha,\beta,\xi) = -\lim_{n} \frac{1}{n}\log P\Bigl(\sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi\Bigr),
\]

where

\[
\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} \mathcal{P}_i(\beta)
\]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi,
\]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega_{kj} = \pi^*_{kj} = \begin{cases} C_k \mathcal{P}_j(\rho\beta)\, W, & j + k \le I, \\ C_k \mathcal{P}_j(\rho\beta), & j + k > I. \end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[
J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k \log C_k.
\]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
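The two headline numbers of this example can be reproduced with a few lines. The sketch below is only a sanity check: it evaluates the zero-cost value $\psi_3(2.0) = \sum_k \alpha_k \sum_{i \le I-k} \mathcal{P}_i(\beta)$ and converts the quoted exponent $J_C \approx 0.18$ into the probability estimate $e^{-nJ_C}$. It does not solve for $\rho$, $W$ and the $C_k$; the exponent is taken from the text, and the helper names are assumptions.

```python
import math

def poisson_cdf(m, lam):
    """P(X <= m) for X ~ Poisson(lam); an empty sum (zero) when m < 0."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(m + 1))

def expected_low_occupancy(alpha, I, beta):
    """Zero-cost fraction of urns holding I or fewer balls after beta*n more throws.

    An urn that starts with k balls stays at or below I if it receives
    at most I - k of the new balls."""
    return sum(a * poisson_cdf(I - k, beta) for k, a in enumerate(alpha))

alpha = [0.5, 0.3, 0.2]     # initial fractions with 0, 1 and 2 coupons
I, beta, n = 3, 2.0, 100
psi3 = expected_low_occupancy(alpha, I, beta)

J_C = 0.18                  # exponent quoted in the text, not recomputed here
prob = math.exp(-n * J_C)

print("psi_3(2.0) ~", round(psi3, 2))
print("probability estimate ~", f"{prob:.1e}")
```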

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler-Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5-2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler-Lagrange equations together with properties of the relative entropy.

A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad\text{(monotonicity)} \tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad\text{(conservation)}. \tag{A.5}
\]

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0,\beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4) and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I} \Bigl( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \Bigr) &= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).
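The feasibility test of Lemma 2.4 translates directly into code. The following is a minimal sketch, assuming the endpoint data are passed as plain lists, with `alpha` holding $(\alpha_0, \ldots, \alpha_{I+1})$ and `omega` holding $(\omega_0, \ldots, \omega_I, \omega_{I+})$; the function name and the tolerance are illustrative choices.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (A.4) and conservation (A.5) conditions.

    alpha: initial fractions (alpha_0, ..., alpha_{I+1})
    omega: terminal fractions (omega_0, ..., omega_I, omega_{I+})
    beta:  balls thrown per urn
    """
    I = len(omega) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        # (A.4): cumulative low-occupancy fractions can only decrease
        if cum_alpha < cum_omega - tol:
            return False
    # (A.5): balls accounted for at the end cannot exceed initial balls plus beta
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol

# Classical problem (I = 0): start empty, end with 15% empty urns.
print(is_feasible([1.0, 0.0], [0.15, 0.85], beta=3.0))   # enough balls
print(is_feasible([1.0, 0.0], [0.15, 0.85], beta=0.5))   # too few balls
```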

A.3. Euler-Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx.
\]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi,\theta) &= D(\theta \,\|\, \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \Bigl(1 - \sum_{i=0}^{I} \theta_i\Bigr) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{aligned}
\]

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler-Lagrange equations [8]

\[
\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\,\frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))
\]

for all $i \in \{0, \ldots, I\}$, $x \in (0,\beta)$.

Although the Euler-Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections, we will construct a family of extremal paths for the cost function given above and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler-Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Bigr] \tag{A.7}
\]

for $i = 0, \ldots, I-1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\Bigl[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Bigr]. \tag{A.8}
\]

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

\[
L(\psi,\theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}
\]

The Euler-Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Bigl[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \Bigr] \tag{A.10}
\]

for $i = 0, \ldots, I-1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C\mathcal{P}_i(\rho\beta) = 1,
\]

\[
\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} iC\mathcal{P}_i(\rho\beta) = \beta.
\]

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C\mathcal{P}_k(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^k, \tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals on $[0,\beta]$ which satisfy the terminal constraints along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^k. \tag{A.10}
\]

We refer to such paths as polynomial extremals and to extremals with $C > 0$ as exponential extremals.
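The endpoint claims of Theorem A.1 can be checked numerically on a small instance. The sketch below is an illustration under stated assumptions: it takes $I = 1$, $\beta = 2$ and picks twist parameters $\rho = 1.2$, $C = 0.85$ (arbitrary values, not from the paper), derives the terminal fractions $\omega_0, \omega_1$ from the two twist equations, and then evaluates $\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x)$ using the derivative of $\psi_0$ obtained by differentiating the exponential and polynomial parts term by term.

```python
import math

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam**i / math.factorial(i)

def signed_derivative(i, x, C, rho, beta, omega):
    """(-1)^i times the i-th derivative of psi_0 at x.

    Exponential part: C * rho^i * exp(-rho x).
    Polynomial part:  sum_{k>=i} a_k * beta^{-i} * k!/(k-i)! * (1 - x/beta)^{k-i},
    with a_k = omega_k - C * P_k(rho*beta); it is empty for i > I."""
    I = len(omega) - 1
    value = C * rho**i * math.exp(-rho * x)
    for k in range(i, I + 1):
        a_k = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)
        value += a_k * beta**(-i) * falling * (1.0 - x / beta)**(k - i)
    return value

def gamma(i, x, C, rho, beta, omega):
    return x**i / math.factorial(i) * signed_derivative(i, x, C, rho, beta, omega)

# Illustrative instance (assumed values): I = 1, so omega = (omega_0, omega_1).
beta, rho, C = 2.0, 1.2, 0.85
m = rho * beta
tail_mass = 1.0 - poisson_pmf(0, m) - poisson_pmf(1, m)   # sum_{i>1} P_i(m)
tail_mean = m - poisson_pmf(1, m)                         # sum_{i>1} i P_i(m)
omega1 = beta - C * tail_mean                             # second twist equation
omega0 = 1.0 - omega1 - C * tail_mass                     # first twist equation
omega = [omega0, omega1]

print("omega =", omega)
print("psi_0(0) =", signed_derivative(0, 0.0, C, rho, beta, omega))
print("gamma_i(beta) =", [gamma(i, beta, C, rho, beta, omega) for i in range(2)])
print("sum of gamma_i at x = 1:",
      sum(gamma(i, 1.0, C, rho, beta, omega) for i in range(60)))
```

The last line extends the sequence $\gamma_i$ beyond $i = I$ and sums it, which should return a total mass of one at any $x$ in $[0,\beta]$.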

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\phi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[
(-1)^i \phi^{(i)}(x) \ge 0 \quad\text{for all } x \in [a,b] \text{ and } i > 0. \tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0,\beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly, $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C\mathcal{P}_k(\rho\beta)\bigr)\, \beta^{-i}\, \frac{k!}{(k-i)!} \Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12}
\]

for $i > I$. It is clear that $a(x) = (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0, 1, \ldots$. We deduce that $\phi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\begin{aligned}
\phi(\beta) &= (-1)^I \psi_0^{(I)}(\beta) \\
&= \rho^I C e^{-\rho\beta} + \Bigl( \omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!} \Bigr) \beta^{-I} I! \\
&= \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1} \omega_I \ge 0,
\end{aligned}
\]

so that $\phi(x) > 0$ on $[0,\beta)$. It follows that $\phi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1} \omega_I > 0 \quad\text{for all } x \in [0,\beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Bigl(1 - \sum_{i=0}^{I} \mathcal{P}_i(\rho\beta)\Bigr) + \sum_{i=0}^{I} \omega_i = 1.
\]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots. \tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\gamma_i(x) \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus, the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler-Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\ i \ge 0. \tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[ -\log\frac{\theta_i}{\gamma_i} \Bigr] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta), \tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8) we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

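Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[
(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad\text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\gamma_0, \ldots, \gamma_I, \gamma_{I+}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \bigl[ \gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)| \bigr]. \tag{A.17}
\]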

Proof. For all $i > I$ and $x \in [0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that given $\varepsilon > 0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0} \int_\varepsilon^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_\varepsilon^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_\varepsilon^{\beta-\varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_\varepsilon^{\beta-\varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty} \bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_\varepsilon^{\beta-\varepsilon} + \int_\varepsilon^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying
\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]
\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I-1,
\]
for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by
\[
J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[\gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,\mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then
\[
J(\gamma) = \min_{\pi \in \mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,\mathcal{P}(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be
\[
L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)}
 + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr)
 + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr),
\]
where $z = \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1, with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log\mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,
\[
\pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - zi\pi_i
 \ge \pi^*_i \log\frac{\pi^*_i}{\mathcal{P}_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]
Following standard Lagrangian arguments, we thus have
\[
\begin{aligned}
\inf_{\pi \in \mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,\mathcal{P}(\beta))
 &= \inf_{\pi \in \mathcal{F}(1,\omega,\beta)} L(\pi, y, z) \\
 &\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
 &= D(\pi^*\,\|\,\mathcal{P}(\beta)).
\end{aligned}
\]
Since $\pi^* \in \mathcal{F}(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have
\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,\mathcal{P}(\beta)),
\]
where
\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C\,\mathcal{P}_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as
\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)}
 + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr)
 + \Bigl(\beta - \sum_{i=0}^{I} i\omega_i\Bigr)\log\rho,
\]
where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
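The explicit cost formula can be checked numerically in the simplest exponential case. The following is a minimal sketch, not part of the original argument: it takes $I = 0$, where the mean constraint forces $C = 1/\rho$ and the mass constraint forces $\omega_0 = 1 - (1 - e^{-\rho\beta})/\rho$ (both derived here from $\pi_i = C\,\mathcal{P}_i(\rho\beta)$ for $i \ge 1$, rather than from (2.6), which is not reproduced in this section). The function names and the truncation level `n_terms` are illustrative assumptions.

```python
import math

def log_poisson_pmf(i, lam):
    """Log of the Poisson(lam) mass at i, computed in log space for stability."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def exponential_extremal_cost_I0(rho, beta, n_terms=400):
    """Return (closed-form cost, direct relative-entropy sum) for the case I = 0.

    Assumption: with I = 0 the mean constraint gives C = 1/rho and the mass
    constraint gives omega_0 = 1 - (1 - exp(-rho*beta)) / rho.
    """
    C = 1.0 / rho
    omega0 = 1.0 - (1.0 - math.exp(-rho * beta)) / rho
    if not 0.0 < omega0 < 1.0:
        raise ValueError("chosen (rho, beta) does not give a valid omega_0")

    # Closed form: the i <= I terms, then the (log C + (1 - rho) beta) and log rho terms.
    head = omega0 * math.log(omega0 / math.exp(-beta))
    closed = head + (1.0 - omega0) * (math.log(C) + (1.0 - rho) * beta) + beta * math.log(rho)

    # Direct sum of pi_i * log(pi_i / P_i(beta)) with pi_i = C * P_i(rho*beta) for i >= 1.
    direct = head
    for i in range(1, n_terms):
        log_pi = math.log(C) + log_poisson_pmf(i, rho * beta)
        direct += math.exp(log_pi) * (log_pi - log_poisson_pmf(i, beta))
    return closed, direct

closed, direct = exponential_extremal_cost_I0(rho=2.0, beta=1.0)
```

The two returned values should agree up to truncation and rounding error, which is what the termwise identity $\log(\pi_i/\mathcal{P}_i(\beta)) = \log C + (1-\rho)\beta + i\log\rho$ for $i > I$ predicts.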

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i}\colon [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by
\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]
We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.
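To illustrate how the class curves combine, the sketch below evaluates the mixture formula in a deliberately simple special case chosen for illustration only: the untwisted, zero-cost evolution, where every class receives balls at the same per-urn rate ($\beta_k = \beta$) and $\gamma_{k,j}(y) = \mathcal{P}_j(y)$. The function names and the numerical values of $\alpha$ and $x$ are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) mass at j, with Poisson(0) taken to be a point mass at 0."""
    if lam == 0.0:
        return 1.0 if j == 0 else 0.0
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def overall_occupancy(alpha, x, i_max):
    """gamma_i(x) = sum over k <= i of alpha_k * gamma_{k,i-k}(x), for i = 0..i_max.

    Assumption (special case): beta_k = beta for every class and
    gamma_{k,j}(y) = P_j(y), i.e. the untwisted zero-cost evolution.
    """
    gamma = []
    for i in range(i_max + 1):
        total = 0.0
        for k, a_k in enumerate(alpha):
            if a_k > 0.0 and k <= i:
                total += a_k * poisson_pmf(i - k, x)
        gamma.append(total)
    return gamma

alpha = [0.5, 0.3, 0.2]   # initial fractions of urns holding 0, 1 and 2 balls
gamma = overall_occupancy(alpha, x=1.5, i_max=80)
total_mass = sum(gamma)
mean_occupancy = sum(i * g for i, g in enumerate(gamma))
```

In this special case the overall curve is a probability distribution whose mean is the initial mean $\sum_k k\alpha_k$ plus the $x$ balls per urn thrown so far, which is the conservation property that the general construction must also respect.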

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions, and the corresponding extremals, exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be
\[
\min_{\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)), \tag{A.20}
\]
subject to the conservation constraint
\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{k,j} = \beta \tag{A.21}
\]
and the terminal conditions
\[
\omega_i = \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k\pi_{k,i-k} \qquad \text{for all } 0 \le i \le I. \tag{A.22}
\]
Recall that $\mathcal{S}_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that
\[
\pi_{k,j} = 0, \qquad k + j > I, \tag{A.23}
\]
so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in \mathcal{S}_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in \mathcal{S}_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
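The ordered filling construction can be sketched in a few lines. The code below is one possible greedy variant (it always draws from the lowest-numbered class first, a choice the text leaves open), and it assumes polynomial-type constraints, that is, $\sum_k \alpha_k = \sum_i \omega_i = 1$ together with the monotonicity condition $\sum_{j \le i} \omega_j \le \sum_{j \le i} \alpha_j$. The function name, tolerance and example values are illustrative.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy construction of pi[k][j] with sum_{k<=i} alpha[k]*pi[k][i-k] = omega[i].

    Assumes sum(alpha) == sum(omega) == 1 and, for every i,
    sum(omega[:i+1]) <= sum(alpha[:i+1]) (the monotonicity condition).
    """
    K, I = len(alpha) - 1, len(omega) - 1
    pi = [[0.0] * (I - k + 1) for k in range(K + 1)]
    remaining = list(alpha)   # unassigned weighted mass alpha[k] * (1 - sum_j pi[k][j])
    for i, target in enumerate(omega):
        need = target
        for k in range(min(i, K) + 1):
            if alpha[k] <= 0.0:
                continue      # classes with no urns contribute nothing
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"monotonicity condition fails at i={i}")
    return pi

alpha = [0.5, 0.5]            # half the urns start empty, half start with one ball
omega = [0.2, 0.3, 0.5]       # terminal occupancy fractions for 0, 1, 2 balls
pi = ordered_filling(alpha, omega)
mean_added = sum(alpha[k] * j * p for k, row in enumerate(pi) for j, p in enumerate(row))
```

For this example each $\pi^{(k)}$ ends up with unit mass, and `mean_added` equals $\sum_i i\omega_i - \sum_k k\alpha_k = 0.8$, so the conservation constraint holds without being imposed separately.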

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $\mathcal{S}_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $\mathcal{S}_\infty$, this distance is simply
\[
\sum_{i:\,P_i > Q_i} (P_i - Q_i) = \sum_{i:\,Q_i > P_i} (Q_i - P_i),
\]
and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set
\[
H_A \doteq \Bigl\{\pi \in \mathcal{S}_\infty^{|\mathcal{K}|} : \sum_k \alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le A\Bigr\}.
\]
The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that
\[
Q_A(\alpha,\omega,\beta) \doteq \mathcal{F}(\alpha,\omega,\beta) \cap H_A
\]
is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n),(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n),(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$, there is $m < \infty$ such that
\[
\sum_{j \ge m} j\pi^{(n)}_{k,j} \le \varepsilon \tag{A.24}
\]
for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality
\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]
where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,
\[
\sum_{j=0}^{\infty} e^{j/\delta}\,\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\,\frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
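Both ingredients of the uniform integrability estimate are easy to sanity-check numerically. The sketch below compares a truncated version of the exponential moment series with its closed form, and evaluates the gap in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$. The helper names, the truncation level and the particular values of $\beta$ and $\delta$ are illustrative assumptions.

```python
import math

def poisson_exp_moment(beta, delta, n_terms=400):
    """Truncated sum of exp(j/delta) * P_j(beta), with each term formed in log space."""
    total = 0.0
    for j in range(n_terms):
        log_term = j / delta - beta + j * math.log(beta) - math.lgamma(j + 1)
        total += math.exp(log_term)
    return total

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, which the inequality asserts is >= 0."""
    x_log_x = x * math.log(x) if x > 0.0 else 0.0   # convention 0 log 0 = 0
    return (x_log_x - x + 1.0) + (math.exp(y) - 1.0) - x * y

beta, delta = 1.3, 0.5
series = poisson_exp_moment(beta, delta)
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
```

The gap returned by `young_gap` vanishes exactly when $x = e^y$, which is the maximizer in the supremum that defines $e^y - 1$.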

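Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/\mathcal{P}_j(\beta)$, we have the estimate
\[
\begin{aligned}
\sum_{j \ge m} j\pi^{(n)}_{k,j}
 &= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
 &\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{k,j} \log\Bigl(\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\Bigr) - \pi^{(n)}_{k,j} + \mathcal{P}_j(\beta)\Bigr]
 + \delta\sum_{j \ge m} (e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
 &= \delta D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta)) + \delta\sum_{j \ge m} (e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
 &\le \delta A + \delta\sum_{j \ge m} (e^{j/\delta} - 1)\,\mathcal{P}_j(\beta),
\end{aligned}
\]
where the second equality uses the fact that both $\pi^{(n),(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that
\[
\beta_k^{(n)} \to \beta_k.
\]
Finally,
\[
\sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta_k^{(n)} = \lim_n \beta = \beta,
\]
so that $\pi \in \mathcal{F}(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let
\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \ge 0.
\]
Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that
\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*(k)}\,\|\,\mathcal{P}(\beta)) \le \liminf_n \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta)) = G,
\]
where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.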

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that
\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta)
\]
componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then
\[
\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

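Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that
\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]
and for each problem in this subsequence, we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then
\[
\beta_k^{(n)} \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{k,j} \;\to\; \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{k,j}, \qquad k = 0, 1, \ldots, K.
\]
If $\alpha_k^{(n)} \to 0$ and $\beta_k^{(n)}\alpha_k^{(n)} \not\to 0$, then the term $\alpha_k^{(n)} D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,\mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha_k^{(n)}\beta_k^{(n)} \to \alpha_k\beta_k$, even if $\alpha_k = 0$.

Thus,
\[
\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta_k^{(n)}\alpha_k^{(n)} = \lim_n \beta^{(n)} = \beta,
\]
and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have
\[
\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha_k^{(n)} D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta^{(n)}))
 &\ge \liminf_n \sum_{k:\,\alpha_k > 0} \alpha_k^{(n)} D(\pi^{(n),(k)}\,\|\,\mathcal{P}(\beta^{(n)})) \\
 &\ge \sum_{k:\,\alpha_k > 0} \alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \\
 &\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]
as desired.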
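The closed form $\beta - \lambda + \lambda\log(\lambda/\beta)$ used in the proof is the relative entropy of a Poisson($\lambda$) distribution with respect to Poisson($\beta$). A minimal numerical sketch follows; it checks that value and compares it with one other distribution of the same mean. The competing distribution (uniform on $\{1,\ldots,5\}$), the truncation level and the function names are arbitrary illustrative choices.

```python
import math

def log_poisson_pmf(j, lam):
    """Log of the Poisson(lam) mass at j."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def relative_entropy_to_poisson(pi, beta):
    """D(pi || P(beta)) for a finitely supported list pi, where pi[j] is the mass at j."""
    return sum(p * (math.log(p) - log_poisson_pmf(j, beta))
               for j, p in enumerate(pi) if p > 0.0)

def poisson_bound(lam, beta):
    """The stated infimum over distributions with mean lam."""
    return beta - lam + lam * math.log(lam / beta)

lam, beta, n = 3.0, 1.0, 200
poisson_lam = [math.exp(log_poisson_pmf(j, lam)) for j in range(n)]
d_poisson = relative_entropy_to_poisson(poisson_lam, beta)

# A different distribution with the same mean 3: uniform on {1, ..., 5}.
uniform = [0.0] + [0.2] * 5
d_uniform = relative_entropy_to_poisson(uniform, beta)
```

Here `d_poisson` should match `poisson_bound(lam, beta)` up to truncation error, while `d_uniform` should be strictly larger. Since the bound grows superlinearly in $\lambda$, a class with vanishing weight $\alpha_k^{(n)}$ but non-vanishing $\alpha_k^{(n)}\beta_k^{(n)}$ forces the weighted cost to diverge, as the proof asserts.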

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints, the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints and let $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

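Let $\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $\mathcal{S}_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{m,n} > 0$, that $\tilde\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$ and that $\tilde\pi$ has finite support. If we define
\[
\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\tilde\pi_{k,i-k}, \qquad
\tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\tilde\pi_{k,j},
\]
then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta > 0$, we may define
\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1-\eta), \qquad
\hat\omega \doteq (\omega - \eta\tilde\omega)/(1-\eta).
\]
By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1-\eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition, $\bar\pi_{m,n} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is
\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Bigl(\log\frac{\pi^\varepsilon_{k,j}}{\mathcal{P}_j(\beta)} + 1\Bigr)(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{k,j}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m\bar\pi_{m,n} \log\frac{\pi^\varepsilon_{m,n}}{\mathcal{P}_n(\beta)}
 + \sum_{k \in \mathcal{K}} \alpha_k \sum_{j:\,(k,j) \ne (m,n)} \bar\pi_{k,j} \log\frac{\pi^\varepsilon_{k,j}}{\mathcal{P}_j(\beta)}
 - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{k,j} \log\frac{\pi^\varepsilon_{k,j}}{\mathcal{P}_j(\beta)}.
\end{aligned}
\]
In the limit as $\varepsilon \to 0$, the third term in the last display tends to
\[
-\sum_k \alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le 0.
\]
The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{k,j} \log\mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.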

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form
\[
\pi^*_{k,j} =
\begin{cases}
D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k \in \mathcal{K},\; j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian
\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{j:\,k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
 + \sum_{k \in \mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{l:\,k+l \in \mathcal{I}} \pi_{k,l}\Bigr)
 + \sum_{i \in \mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Bigr)
\]
is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields
\[
\pi^*_{k,j} = \mathcal{P}_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]
The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
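The product form $D_k\,\mathcal{P}_j(\beta)\,W_{k+j}$ suggests a simple way to compute the minimizer numerically in small polynomial problems. The sketch below uses iterative proportional fitting, a standard matrix-scaling procedure that is not taken from the paper: it starts from $\alpha_k\mathcal{P}_{i-k}(\beta)$ and alternately rescales rows (absorbed into $D_k$) and columns (absorbed into $W_i$) until the mass and terminal constraints hold. It assumes all $\alpha_k > 0$ and $\omega_i > 0$, and the function names, iteration count and example data are illustrative.

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) mass at j."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def fit_polynomial_minimizer(alpha, omega, n_iter=2000):
    """Iterative proportional fitting for pi[k][j] = D_k * P_j(beta) * W_{k+j}.

    Assumes a polynomial problem with all alpha[k] > 0 and omega[i] > 0, so that
    beta = sum_i i*omega[i] - sum_k k*alpha[k] by the conservation condition.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    beta = sum(i * w for i, w in enumerate(omega)) - sum(k * a for k, a in enumerate(alpha))
    # m[k][i] = alpha[k] * pi[k][i-k]; row k must sum to alpha[k], column i to omega[i].
    m = [[alpha[k] * poisson_pmf(i - k, beta) if i >= k else 0.0 for i in range(I + 1)]
         for k in range(K + 1)]
    for _ in range(n_iter):
        for k in range(K + 1):                 # row rescaling, absorbed into D_k
            s = sum(m[k])
            m[k] = [v * alpha[k] / s for v in m[k]]
        for i in range(I + 1):                 # column rescaling, absorbed into W_i
            s = sum(m[k][i] for k in range(K + 1))
            for k in range(K + 1):
                m[k][i] *= omega[i] / s
    pi = [[m[k][k + j] / alpha[k] for j in range(I - k + 1)] for k in range(K + 1)]
    return pi, beta

alpha, omega = [0.5, 0.5], [0.2, 0.3, 0.5]
pi, beta = fit_polynomial_minimizer(alpha, omega)
cross_ratio = (pi[0][1] * pi[1][1]) / (pi[0][2] * pi[1][0])
```

Every iterate keeps the product form, so the factors $D_k$ and $W_i$ cancel in `cross_ratio`, leaving $\mathcal{P}_1(\beta)^2/(\mathcal{P}_0(\beta)\mathcal{P}_2(\beta)) = 2$ regardless of $\beta$. A point of this form that also satisfies the constraints meets the optimality condition derived in the proof.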

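We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form
\[
\pi^*_{k,j} =
\begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k \in \mathcal{K},\; k + j \le I,\; k + j \in \mathcal{I}, \\
C_k\,\mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\; k + j > I, \\
0, & \text{otherwise.}
\end{cases}
\]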

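Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define
\[
\beta^{(M)} \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\pi^*_{k,l}, \qquad
\eta_k^{(M)} \doteq \sum_{l \le M} \pi^*_{k,l},
\]
and consider the problem of minimizing
\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\; k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
\]
subject to the constraints
\[
\begin{aligned}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\; k+l \in \mathcal{I}} l\pi_{k,l} &= \beta^{(M)}, \\
\sum_{l \le M,\; k+l \in \mathcal{I}} \pi_{k,l} &= \eta_k^{(M)}, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\; k \in \mathcal{K}} \alpha_k\pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{aligned}
\]
By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is
\[
\begin{aligned}
L^{(M)} &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\; k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
 + y^{(M)}\Bigl(\beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\; k+l \in \mathcal{I}} l\pi_{k,l}\Bigr) \\
&\quad + \sum_{k \in \mathcal{K}} z_k^{(M)}\alpha_k\Bigl(\eta_k^{(M)} - \sum_{l \le M,\; k+l \in \mathcal{I}} \pi_{k,l}\Bigr)
 + \sum_{i \in \mathcal{I}} w_i^{(M)}\Bigl(\omega_i - \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k\pi_{k,i-k}\Bigr).
\end{aligned}
\]
Since the derivative of $L^{(M)}$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that
\[
\pi^*_{k,j} = \mathcal{P}_j(\beta)\,e^{jy^{(M)} - 1 + z_k^{(M)} + w_{k+j}^{(M)}}
\]
for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that
\[
\frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\,e^{y^{(M)}},
\]
so that $y^{(M)}$ is independent of $M$. Since $w_{k+j}^{(M)} = 0$ if $k + j > I$, it follows that $z_k^{(M)}$ and $w_i^{(M)}$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k + j \in \mathcal{I}$ with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.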

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\pi^*_{k,j}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} \doteq \pi^*_{k,j}$ and $\omega^{(k)} = (\omega_{k,0}, \ldots, \omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves
\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]
are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{k,j}(x) \doteq \gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation
\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\psi_{k,i-k}(x).
\]
To see that the $\gamma_i$ are valid occupancy curves, note that
\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(x) = 1.
\]
Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and
\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{k,j}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,
\]
where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k,0}(0) = 1$ and $\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k,0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

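First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is
\[
\begin{aligned}
\tilde\psi_{k,0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k\mathcal{P}_i(\rho_k\beta_k)\bigr)\Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^i \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k\mathcal{P}_i(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^i \\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)\,\mathcal{P}_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^i\Bigr].
\end{aligned}
\]
Here we have used that $\omega_{k,i} = \pi^*_{k,i} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i \le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{0,0}$ is
\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0 C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)\,\mathcal{P}_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Bigr] \\
&= \alpha_0 C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)\,\mathcal{P}_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^j\Bigr] \\
&= \frac{\alpha_0 C_0\rho^k}{C_k}\,\tilde\psi_{k,0}(x).
\end{aligned}
\tag{A.25}
\]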
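The reindexing step in (A.25) rests on the Poisson identity $\mathcal{P}_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k\,\mathcal{P}_{i-k}(\rho\beta)$. A short numerical check of this identity follows; the helper names `lhs` and `rhs` and the test values of $\rho$ and $\beta$ are illustrative choices.

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) mass at j."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def lhs(i, k, rho, beta):
    """P_i(rho*beta) * i! / ((i-k)! * beta^k): the coefficient before reindexing."""
    return poisson_pmf(i, rho * beta) * math.factorial(i) / (math.factorial(i - k) * beta ** k)

def rhs(i, k, rho, beta):
    """rho^k * P_{i-k}(rho*beta): the coefficient after substituting j = i - k."""
    return rho ** k * poisson_pmf(i - k, rho * beta)

rho, beta = 1.7, 0.9
max_rel_err = max(abs(lhs(i, k, rho, beta) / rhs(i, k, rho, beta) - 1.0)
                  for i in range(8) for k in range(i + 1))
```

The identity holds because $(\rho\beta)^i/\beta^k = \rho^k(\rho\beta)^{i-k}$, so `max_rel_err` should be at the level of floating-point rounding.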

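Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, we have
\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k,0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^i\,\psi_0^{(i)}(x).
\end{aligned}
\]
Similarly, we obtain
\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{aligned}
\]
so that the extremals satisfy the simple relation
\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]
which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that
\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]
As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that
\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\,\tilde\psi_{k,0}(x) \tag{A.28}
\]
and that
\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]
for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).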

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that
\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]
for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

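Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is
\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \Bigl[\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(\beta) \log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K} \alpha_k \log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(\beta) \log\left|\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}\right|.
\end{aligned}
\]
The fact that $\tilde\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k,0}(x)$, so that
\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^j \psi_{k,0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\tilde\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}.
\]
Then
\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k,\cdot}(\beta)\,\|\,\mathcal{P}(\beta)).
\]
By construction, the endpoints $\tilde\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).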

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \mapsto D(\theta\,\|\,\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.
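The joint convexity of $(\theta,\gamma) \mapsto D(\theta\,\|\,\gamma)$ that underlies Lemma A.15 can be probed numerically. The sketch below is a random spot check, not a proof: it draws pairs of strictly positive probability vectors and records the smallest gap between the chord and the value at the mixture. The dimension, seed, sample count and helper names are arbitrary illustrative choices.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for two strictly positive probability vectors."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma))

def random_distribution(rng, n):
    """A random strictly positive probability vector of length n."""
    weights = [rng.random() + 0.01 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def mix(p, q, eps):
    """The convex combination (1 - eps) * p + eps * q."""
    return [(1.0 - eps) * a + eps * b for a, b in zip(p, q)]

rng = random.Random(0)
worst_gap = float("inf")
for _ in range(1000):
    theta1, gamma1 = random_distribution(rng, 5), random_distribution(rng, 5)
    theta2, gamma2 = random_distribution(rng, 5), random_distribution(rng, 5)
    eps = rng.random()
    mixed = relative_entropy(mix(theta1, theta2, eps), mix(gamma1, gamma2, eps))
    chord = ((1.0 - eps) * relative_entropy(theta1, gamma1)
             + eps * relative_entropy(theta2, gamma2))
    worst_gap = min(worst_gap, chord - mixed)
```

A nonnegative `worst_gap` (up to rounding) is consistent with joint convexity; integrating the pointwise inequality over $[0,\beta]$ is what gives the convexity of $J$ along the segment between two occupancy paths.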

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote
\[
\gamma^\varepsilon \doteq (1-\varepsilon)\gamma + \varepsilon\bar\gamma
\]
and define the function $G\colon [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by
\[
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x)\,\|\,\gamma^\varepsilon(x))\,dx.
\]
To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then
\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon=(1-\varepsilon)\gamma+\varepsilon\bar\gamma$, with $G[\varepsilon]=J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0]=0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.
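
The last step uses only convexity and can be spelled out as follows (a short worked argument, in which $G[0]$ is the cost of the extremal and $G[1]$ is the cost of the competing path). Since $G$ is convex on $[0,1]$, its graph lies above its right tangent line at $0$:

\[ G[\varepsilon]\ \ge\ G[0]+\varepsilon\,G'_+[0],\qquad 0\le\varepsilon\le 1. \]

Taking $\varepsilon=1$ and using $G'_+[0]=0$ gives $G[1]\ge G[0]$, which is the claimed inequality between the two costs. Note that $G'_+[0]\ge 0$ would already suffice for this step.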

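It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma)=\int_0^\beta L(\psi,\theta)\,dx$. After defining

\[ \eta(x)=\bar\psi(x)-\psi(x), \]

we may write

\[ G[\varepsilon]=\int_0^\beta L(\psi+\varepsilon\eta,\ \theta-\varepsilon\dot\eta)\,dx=\int_0^\beta g(x,\varepsilon)\,dx. \tag{A.30} \]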

We can assume without loss that $G[1]=J(\bar\gamma)<\infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x)=-\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta)=\omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta)=\omega_1/\beta$ if $I>0$ and $\theta_0(\beta)=C\mathcal P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k,0}(x)$ and $\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These, in turn, lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[ \gamma_i(x)=\sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x)\ge\alpha_i\gamma_{i,0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1}>0$. The identity $\theta_{I+}(x)=\rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon\ge(1-\varepsilon)\gamma$ and $\theta^\varepsilon\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0<1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero, where $\bar\theta$ may not exist.

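The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)=\sum_{i=0}^{I}\frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^\varepsilon(x)}\eta_i(x)-\sum_{i=0}^{I}\frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^\varepsilon(x)}\dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\[ \frac{\partial L}{\partial\psi_i}(x)=-\frac{\theta_i(x)}{\gamma_i(x)}+\frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},\qquad \frac{\partial L}{\partial\theta_i}(x)=\log\frac{\theta_i(x)}{\gamma_i(x)}-\log\frac{\theta_{I+}(x)}{1-\psi_I(x)}. \]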

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon\le1$, $\theta^\varepsilon\le1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right|<B\qquad\text{for }\varepsilon\in[0,\varepsilon_0],\ \text{a.e. }x\in[0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I=1$ and $\theta_I=0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon]=\int_0^\beta\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx \]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[ \int_0^x\frac{\partial L}{\partial\psi_i}\bigg|_{\gamma}dx'=\int_0^x\left(-\frac{\theta_i}{\gamma_i}+\frac{\theta_{i+1}}{\gamma_{i+1}}\right)\bigg|_{\gamma}dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\begin{align*}
G'_+[0]&=\int_0^\beta\left(\frac{\partial L}{\partial\psi}(x)\cdot\eta(x)-\frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right)dx\\
&=\int_0^\beta\dot\eta(x)\cdot\left(-\int_0^x\frac{\partial L}{\partial\psi}(x')\,dx'-\frac{\partial L}{\partial\theta}(x)\right)dx\\
&=-\int_0^\beta\sum_{i=0}^{I}C_i\dot\eta_i(x)\,dx\\
&=\sum_{i=0}^{I}C_i(\eta_i(0)-\eta_i(\beta))=0.
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0>0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t=0$ and $t=\beta$.

Let $\gamma$ be this extremal and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t=0$ or $t=\beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma+(1-\lambda)\bar\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

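Given $x_1^{(n)}\downarrow0$ and $x_2^{(n)}\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x_1^{(n)})$ and terminal point $\bar\gamma(x_2^{(n)})$. By Lemma A.16,

\[ J(\gamma^{(n)})\le\int_{x_1^{(n)}}^{x_2^{(n)}}D(\bar\theta(x)\|\bar\gamma(x))\,dx. \]

It then follows that

\begin{align*}
J(\gamma)=\mathcal J(\alpha,\omega,\beta)&\le\liminf_n\mathcal J\big(\bar\gamma(x_1^{(n)}),\bar\gamma(x_2^{(n)}),x_2^{(n)}-x_1^{(n)}\big)\\
&=\liminf_n J(\gamma^{(n)})\\
&\le\lim_n\int_{x_1^{(n)}}^{x_2^{(n)}}D(\bar\theta(x)\|\bar\gamma(x))\,dx\\
&=J(\bar\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal J(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.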

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


of time $v_0\tau$, $e_1-e_2$ for an amount of time $v_1\tau$, and so on. This continuous time process will move the occupancy process from $\gamma(0)$ to $\gamma(\tau)$ at time $\tau$. If $e_j-e_{j-1}$ is used at the discrete time step $i$, then, since $\mu^n_i$ concentrates its mass on $e_j-e_{j-1}$, the cost is

\[ D\big(\mu^n_i\,\big\|\,\mu_{\Gamma^n(i/n)}\big)=\log\left(\frac{1}{\Gamma^n_{j-1}(i/n)}\right)=-\log\left(\Gamma^n_{j-1}\left(\frac{i}{n}\right)\right). \tag{3.3} \]

The process $\Gamma^n(i/n)$ possesses important monotonicity and convergence properties. Since $\Gamma^n_{j-1}((i+1)/n)-\Gamma^n_{j-1}(i/n)=-1/n$ for $\lfloor\sum_{k=0}^{j-1}v_k\tau n\rfloor\le i\le\lfloor\sum_{k=0}^{j}v_k\tau n\rfloor-1$,

\[ \Gamma^n_{j-1}\left(\frac{i}{n}\right)\downarrow\Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor\sum_{k=0}^{j}v_k\tau n\right\rfloor\right) \]

as $i\uparrow\lfloor\sum_{k=0}^{j}v_k\tau n\rfloor$. In addition, because the $(j-1)$st component is never modified when $i\ge\lfloor\sum_{k=0}^{j}v_k\tau n\rfloor$, it follows that

\[ \Gamma^n_{j-1}\left(\frac{1}{n}\left\lfloor\sum_{k=0}^{j}v_k\tau n\right\rfloor\right)\to\gamma_{j-1}(\tau) \]

as $n\to\infty$ and $\eta\to0$. Furthermore, as observed previously, (3.2) implies the existence of $b>0$ and $K\in\mathbb N$ such that $\gamma_j(\tau)\ge b\tau^K$ for $j=0,\dots,I$. Thus, at any given time step $i$ we have a strictly positive lower bound on the relevant component of $\Gamma^n$, which in turn provides a finite upper bound on the corresponding relative entropy cost. Indeed, it follows from (3.3) and $\gamma_j(\tau)\ge b\tau^K$ that for all sufficiently large $n$ and small $\eta>0$ there are $C_1,C_2<\infty$ (independent of $\tau$) such that, whenever $|\Gamma^n(0)-\gamma(0)|<\eta$, for all $i$

\[ D\big(\mu^n_i\,\big\|\,\mu_{\Gamma^n(i/n)}\big)\le C_1[-\log\tau^K]\le-C_2\log\tau. \]

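In addition, as $n\to\infty$ and $\eta\to0$,

\[ \Gamma^n\left(\frac{1}{n}\lfloor\tau n\rfloor\right)\to\gamma(\tau). \]

By the Lebesgue dominated convergence theorem, for all sufficiently large $n$ and sufficiently small $\eta>0$, $|\Gamma^n(0)-\gamma(0)|<\eta$ implies

\[ -\frac{1}{n}\log E_{\Gamma^n(0)}\left(\exp\left\{-nh\left(\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right)\right\}\right)\le-C_2\tau\log\tau. \]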

We now choose $\tau>0$ so that $-C_2\tau\log\tau\le\varepsilon/2$. Choosing $\tau>0$ smaller if need be, we can also guarantee that $|\Gamma^n(x)-\gamma(x)|\le\delta$ for all $x\in[0,\tau]$ w.p.1 if $|\Gamma^n(0)-\gamma(0)|<\eta$ and $\eta>0$ is sufficiently small. The following bound is therefore valid for the given $\tau>0$: for any $\sigma>0$ and all sufficiently small $\eta>0$, $|\Gamma^n(0)-\gamma(0)|<\eta$ implies

\[ \liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(0)}\left(\left|\Gamma^n\left(\frac{\lfloor n\tau\rfloor}{n}\right)-\gamma\left(\frac{\lfloor n\tau\rfloor}{n}\right)\right|<\sigma,\ \sup_{x\in[0,\tau]}|\Gamma^n(x)-\gamma(x)|<\delta\right)\ge-\frac{\varepsilon}{2}. \]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma>0$. To obtain the lower bound for all $x\in[0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid the boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma>0$.

Now choose $\zeta\in(0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x\in[\tau,\beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology and, moreover, that the support of this distribution is independent of $\gamma$ so long as $\gamma_i>0$ for all $i\in\{0,1,\dots,I+1\}$ (i.e., $\gamma\in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n,\ n=1,2,\dots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon>0$ and $\zeta>0$ defined above, there is $\sigma>0$ such that as long as $|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\lfloor n\tau\rfloor/n)|<\sigma$,

\[ \liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}\left(\sup_{\tau\le x\le\beta}|\Gamma^n(x)-\gamma(x)|<\zeta\right)\ge-J(\gamma)-\frac{\varepsilon}{2}, \]

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta=\gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[ \mathcal J(\Omega)=\inf_{\omega\in\Omega}\mathcal J(\omega). \]

Using Theorem 2.7, we can write

\begin{align*}
\mathcal J(\Omega)&=\inf_{\omega\in\Omega}\ \inf_{\pi\in\mathcal F(\alpha,\omega,\beta)}\sum_{k=0}^{K}\alpha_kD(\pi^{(k)}\|\mathcal P(\beta)) \tag{4.1}\\
&=\inf_{\pi\in\mathcal F(\alpha,\Omega,\beta)}\sum_{k=0}^{K}\alpha_kD(\pi^{(k)}\|\mathcal P(\beta)),
\end{align*}

where we have abused notation to define $\mathcal F(\alpha,\Omega,\beta)=\bigcup_{\omega\in\Omega}\mathcal F(\alpha,\omega,\beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal J$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega=\{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal J(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar\Omega$ is a $\mathcal J$-continuity set if and only if $\inf_{\omega\in\Omega^\circ}\mathcal J(\omega)=\inf_{\omega\in\bar\Omega}\mathcal J(\omega)$. For such sets the large deviations lower and upper bounds coincide, so that $-\lim\frac{1}{n}\log P(\Gamma^n(\beta)\in\Omega)=\inf_{\omega\in\Omega}\mathcal J(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal J$-continuity set.

4.1. The classical occupancy problem. In the classical occupancy problem the urns are initially empty and one only distinguishes between empty and occupied urns or, in other words, $I=0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta)>\omega_0>e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta)<\omega_0<e^{-\beta}$. In either case the calculus of variations problem is that in which $\gamma_0(\beta)=\omega_0$. From (2.6) it is immediate that $C=1/\rho$ and

\[ \gamma_0(x)=\frac{1}{\rho}e^{-\rho x}+1-\frac{1}{\rho}. \]

Using (2.5), we find that $\rho$ is determined by the unique nonnegative solution to

\[ \rho(1-\omega_0)=1-e^{-\beta\rho}. \]

Finally, (2.7) provides a simple expression for $\mathcal J(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely, $\gamma_i(x)=\mathcal P_i(\rho x)/\rho$ for $i>0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example the terminal fraction of empty urns, $\omega_0=0.15$, is three times larger than the expected value $\mathcal P_0(3.0)=0.05$.
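
The twist parameter for this example can be computed numerically. The following is a minimal sketch, not the authors' procedure: it solves the equation $\rho(1-\omega_0)=1-e^{-\beta\rho}$ by bisection for the values $\omega_0=0.15$, $\beta=3$ quoted above, and evaluates the constrained paths. The function names `solve_rho` and `gamma_path` are ours. Since $\rho=0$ always satisfies the equation trivially, the sketch looks for the strictly positive root and assumes $\max(0,1-\beta)<\omega_0<1$.

```python
import math

def solve_rho(omega0, beta, tol=1e-13):
    """Positive root of rho * (1 - omega0) = 1 - exp(-beta * rho), by bisection."""
    if not (max(0.0, 1.0 - beta) < omega0 < 1.0):
        raise ValueError("need max(0, 1 - beta) < omega0 < 1")

    def f(rho):
        return 1.0 - math.exp(-beta * rho) - rho * (1.0 - omega0)

    hi = 1.0 / (1.0 - omega0)      # here f(hi) = -exp(-beta * hi) < 0
    lo = hi
    while f(lo) <= 0.0:            # f is positive strictly between 0 and the root
        lo *= 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam ** i / math.factorial(i)

def gamma_path(i, x, rho):
    """Constrained occupancy gamma_i(x) for empty initial conditions and I = 0."""
    if i == 0:
        return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho
    return poisson_pmf(i, rho * x) / rho

rho = solve_rho(omega0=0.15, beta=3.0)
print(rho, gamma_path(0, 3.0, rho))
```

A value $\rho>1$ is consistent with an unusually large fraction of empty urns, and the components $\gamma_i(x)$ sum to one at every $x$, which gives a quick check on the formulas.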

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I>0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0=1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[ w(r)=r-n\Gamma_1(r/n)-2n\Gamma_2(r/n)-\cdots-In\Gamma_I(r/n)-In\Gamma_{I+}(r/n) \]

or, since $\Gamma_{I+}=1-\sum_{i=0}^{I}\Gamma_i$,

\[ w(r)=r-nI+n\sum_{i=0}^{I}(I-i)\Gamma_i(r/n). \]
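
The identity for $w(r)$ holds path by path, so it can be checked on a single simulated run. The sketch below is an illustration under our own naming (`overflow_two_ways` is a hypothetical helper): it counts the balls that reach the floor directly and compares the count with $r-nI+\sum_{i=0}^{I}(I-i)\,n\Gamma_i(r/n)$, using integer urn counts $n\Gamma_i$ to avoid rounding.

```python
import random

def overflow_two_ways(n, r, capacity, seed=0):
    """Throw r balls into n urns; count overflow directly and via the occupancy formula."""
    rng = random.Random(seed)
    balls = [0] * n          # occupancies in the unlimited-capacity system
    on_floor = 0             # balls rejected by urns already holding `capacity` balls
    for _ in range(r):
        u = rng.randrange(n)
        if balls[u] >= capacity:
            on_floor += 1
        balls[u] += 1
    # n * Gamma_i(r/n): number of urns holding exactly i balls, for i = 0..capacity
    urns_with = [sum(1 for b in balls if b == i) for i in range(capacity + 1)]
    formula = r - n * capacity + sum((capacity - i) * m for i, m in enumerate(urns_with))
    return on_floor, formula

print(overflow_two_ways(n=50, r=120, capacity=2))
```

The two returned numbers agree for every seed, since the formula is an algebraic identity rather than an approximation.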

In order to compute

\[ J_O(\eta,\beta)=-\lim_n\frac{1}{n}\log P\left(\frac{w(\beta n)}{n}>\eta\right), \]

we therefore consider sample paths which satisfy the end constraint

\[ \sum_{i=0}^{I}(I-i)\gamma_i(\beta)\ge\eta+I-\beta=\zeta. \tag{4.2} \]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I-\beta]^+\le\zeta<I$, and that the average overflow satisfies $[\beta-I]^+\le\eta<\beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta>0$, we are in the exponential case and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $\mathcal P(\beta)$ under the linear equality constraints

\[ \sum_{i=0}^{\infty}\pi_i=1,\qquad\sum_{i=0}^{\infty}i\pi_i=\beta\qquad\text{and}\qquad\sum_{i=0}^{I}(I-i)\pi_i=\zeta. \tag{4.3} \]

We introduce the Lagrangian

\[ L(\pi,y,z,\lambda)=\sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{\mathcal P_i(\beta)}+y\left(1-\sum_{i=0}^{\infty}\pi_i\right)+z\left(\beta-\sum_{i=0}^{\infty}i\pi_i\right)+\lambda\left(\zeta-\sum_{i=0}^{I}(I-i)\pi_i\right) \]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[ \log\pi^*_i=\log\mathcal P_i(\beta)-1+y+iz+(I-i)^+\lambda. \]

In terms of variables $C$, $\rho$ and $\nu$ we may write

\[ \pi^*_i=C\mathcal P_i(\rho\beta)\qquad\text{for }i\ge I \]

and

\[ \pi^*_i=C\mathcal P_i(\rho\beta)\nu^{I-i}\qquad\text{for }i\le I. \]

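The distribution is conditionally Poisson($\rho\beta/\nu$) for $i\le I$ and conditionally Poisson($\rho\beta$) for $i\ge I$. For convenience, we introduce the notation

\begin{align*}
Q_I(\rho)&=\sum_{i=I}^{\infty}\mathcal P_i(\rho\beta)=\frac{1}{\rho\beta}\sum_{i=I+1}^{\infty}i\mathcal P_i(\rho\beta),\\
R_I(\rho,\nu)&=\nu^Ie^{-\rho\beta(1-1/\nu)}\sum_{i=0}^{I-1}\mathcal P_i\left(\frac{\rho\beta}{\nu}\right)=\frac{\nu^{I+1}}{\rho\beta}e^{-\rho\beta(1-1/\nu)}\sum_{i=1}^{I}i\mathcal P_i\left(\frac{\rho\beta}{\nu}\right).
\end{align*}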

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[ C(R_I(\rho,\nu)+Q_I(\rho))=1, \]

\[ C\left(\frac{\rho\beta}{\nu}R_I(\rho,\nu)+\rho\beta Q_I(\rho)\right)=\beta, \]

\[ C\left(I\mathcal P_I(\rho\beta)+\left(I-\frac{\rho\beta}{\nu}\right)R_I(\rho,\nu)\right)=\zeta. \]

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\|\mathcal P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C=(R_I+Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[ (\rho/\nu-1)R_I(\rho,\nu)+(\rho-1)Q_I(\rho)=0, \]

\[ (I-\rho\beta/\nu-\zeta)R_I(\rho,\nu)-\zeta Q_I(\rho)+I\mathcal P_I(\rho\beta)=0. \]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu>\rho>1$, while smaller than expected values of $\zeta$ give $\nu<\rho<1$.

The large deviations exponent for the overflow problem may then be expressed

\begin{align*}
J_O(\eta,\beta)&=D(\pi^*\|\mathcal P(\beta))\\
&=\sum_{i=0}^{\infty}\pi^*_i\log\big(Ce^{\beta-\rho\beta}\rho^i\nu^{(I-i)^+}\big)\\
&=\log C+\beta(1-\rho)+\beta\log\rho+\zeta\log\nu.
\end{align*}
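
One way to obtain $(C,\rho,\nu)$ in practice is sketched below. This is not the curve-intersection method described above: as a swapped-in technique, the sketch minimizes the convex dual of the constrained relative entropy problem by a damped Newton iteration in the multipliers $z=\log\rho$ and $\lambda=\log\nu$. The function name `solve_overflow`, the truncation of the infinite sums at `n_terms`, and the sample values $I=2$, $\beta=1.5$, $\zeta=0.9$ are all assumptions made for illustration.

```python
import math

def solve_overflow(I, beta, zeta, n_terms=200, tol=1e-12, max_iter=100):
    """Minimize D(pi || Poisson(beta)) under the mean and spare-capacity constraints."""
    log_p = [-beta + i * math.log(beta) - math.lgamma(i + 1) for i in range(n_terms)]
    spare = [max(I - i, 0) for i in range(n_terms)]

    def tilt(z, lam):
        # Tilted law pi_i proportional to P_i(beta) * exp(i*z + (I-i)^+ * lam), in log space.
        logw = [lp + i * z + s * lam for i, (lp, s) in enumerate(zip(log_p, spare))]
        m = max(logw)
        w = [math.exp(v - m) for v in logw]
        total = sum(w)
        return m + math.log(total), [v / total for v in w]

    def dual(z, lam):
        return tilt(z, lam)[0] - z * beta - lam * zeta

    z = lam = 0.0
    for _ in range(max_iter):
        _, pi = tilt(z, lam)
        mean_i = sum(i * p for i, p in enumerate(pi))
        mean_s = sum(s * p for s, p in zip(spare, pi))
        g1, g2 = mean_i - beta, mean_s - zeta          # gradient of the dual
        if max(abs(g1), abs(g2)) < tol:
            break
        # Hessian of the dual: covariance matrix of (i, (I-i)^+) under pi.
        h11 = sum(i * i * p for i, p in enumerate(pi)) - mean_i ** 2
        h22 = sum(s * s * p for s, p in zip(spare, pi)) - mean_s ** 2
        h12 = sum(i * s * p for i, (s, p) in enumerate(zip(spare, pi))) - mean_i * mean_s
        det = h11 * h22 - h12 * h12
        dz = -(h22 * g1 - h12 * g2) / det
        dl = -(h11 * g2 - h12 * g1) / det
        # Backtracking line search (Armijo rule with a tiny absolute slack).
        f0, slope, step = dual(z, lam), g1 * dz + g2 * dl, 1.0
        while step > 1e-10:
            if dual(z + step * dz, lam + step * dl) <= f0 + 1e-4 * step * slope + 1e-14:
                break
            step *= 0.5
        z, lam = z + step * dz, lam + step * dl

    rho, nu = math.exp(z), math.exp(lam)
    log_norm, pi = tilt(z, lam)
    C = math.exp(beta * (rho - 1.0) - log_norm)
    J = math.log(C) + beta * (1.0 - rho) + beta * math.log(rho) + zeta * math.log(nu)
    return C, rho, nu, J, pi

C, rho, nu, J, pi = solve_overflow(I=2, beta=1.5, zeta=0.9)
print(C, rho, nu, J)
```

Here $\zeta=0.9$ exceeds the zero-cost spare capacity $\sum_{i\le2}(2-i)\mathcal P_i(1.5)\approx0.78$, so the computed parameters should satisfy $\nu>\rho>1$. The returned `J` uses the closed-form expression for the exponent and can be compared with the relative entropy of `pi` computed directly.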

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I+1$ complete sets are obtained. This event corresponds to the constraint $\omega_i=0$, $i\le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[ J_C(\alpha,\beta,\xi)=-\lim_n\frac{1}{n}\log P\left(\sum_{i=0}^{I}\Gamma^n_i(\beta)<\xi\right), \]

where

\[ \xi<\sum_{k=0}^{K}\alpha_k\sum_{i=0}^{I-k}\mathcal P_i(\beta) \]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal J(\omega)$ as defined in Theorem 2.7 subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[ \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{I-k}\pi_{k,j}=\xi, \]

where $K\le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[ \omega_{k,j}=\pi^*_{k,j}=\begin{cases}C_k\mathcal P_j(\rho\beta)W,& j+k\le I,\\ C_k\mathcal P_j(\rho\beta),& j+k>I.\end{cases} \]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

\[ J_C(\alpha,\beta,\xi)=\beta(1-\rho+\log\rho)+\xi\log W+\sum_{k=0}^{K}\alpha_k\log C_k. \]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n=100$ types of coupons to collect and the goal is to collect at least four coupons of as many types as possible (i.e., $I=3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha=[0.5,0.3,0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0)\approx0.71$. Hence, after collecting 200 additional coupons at random ($\beta=2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi=0.55$, which gives a large deviations exponent $J_C\approx0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
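
The zero-cost figures quoted in this example can be reproduced with a few lines of code. The sketch below is an illustration with our own function names: it evaluates $\sum_k\alpha_k\sum_{i=0}^{I-k}\mathcal P_i(\beta)$ for $\alpha=[0.5,0.3,0.2]$, $\beta=2$, $I=3$, and forms the crude estimate $e^{-nJ_C}$ with $n=100$ and the quoted $J_C\approx0.18$. The estimate ignores any subexponential prefactor, so it indicates only the order of magnitude.

```python
import math

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam ** i / math.factorial(i)

def expected_low_fraction(alpha, beta, I):
    """Zero-cost value of psi_I(beta): sum_k alpha_k * P(Poisson(beta) <= I - k)."""
    return sum(a * sum(poisson_pmf(i, beta) for i in range(I - k + 1))
               for k, a in enumerate(alpha))

psi3 = expected_low_fraction([0.5, 0.3, 0.2], beta=2.0, I=3)
print(psi3)                       # fraction of types with 3 or fewer coupons
print(100 * (1.0 - psi3))         # expected number of types with 4 or more coupons
print(math.exp(-100 * 0.18))      # order-of-magnitude estimate exp(-n * J_C)
```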

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0<a<1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K>1$ urn classes, with a fraction $\alpha_k$, $k=1,2,\dots,K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x)>0.5$ for some $x\in(0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal J(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal J(\omega)=J(\gamma^*)$.

In the first step of the proof we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal J(\omega)\le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal J(\omega)\ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

\[ \sum_{j=0}^{i}\alpha_j\ge\sum_{j=0}^{i}\omega_j,\qquad i=0,\dots,I\qquad\text{(monotonicity)} \tag{A.4} \]

and

\[ \sum_{i=0}^{I}i\omega_i+(I+1)\omega_{I+}\le\sum_{i=0}^{I+1}i\alpha_i+\beta\qquad\text{(conservation)}. \tag{A.5} \]

Proof of Lemma 2.4. If a valid occupancy curve $\gamma:[0,\beta]\to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4) and property (c) implies (A.5), since

\begin{align*}
\sum_{i=0}^{I}\left(\sum_{j=0}^{i}\alpha_j-\sum_{j=0}^{i}\omega_j\right)&=\sum_{j=0}^{I}(I+1-j)(\alpha_j-\omega_j) \tag{A.6}\\
&=(I+1)(\omega_{I+}-\alpha_{I+1})+\sum_{j=0}^{I}j(\omega_j-\alpha_j).
\end{align*}

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[ \gamma_i(x)=\alpha_i+(\omega_i-\alpha_i)\frac{x}{\beta},\qquad i=0,\dots,I, \]

\[ \gamma_{I+}(x)=1-\sum_{i=0}^{I}\gamma_i(x)=\alpha_{I+1}+(\omega_{I+}-\alpha_{I+1})\frac{x}{\beta} \]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i\le1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i\le1$ follows from (A.5).
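
The monotonicity and conservation conditions are easy to test numerically, as is the linear path used in the second half of the proof. The following sketch is illustrative only: the function names and the convention that `alpha` and `omega` are lists of length $I+2$ (occupancies $0,\dots,I$ followed by the catch-all class) are our assumptions.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity and conservation conditions for endpoint constraints.

    alpha and omega list the fractions for occupancies 0..I followed by the
    catch-all class (alpha_{I+1} and omega_{I+}), so both have length I + 2.
    """
    I = len(omega) - 2
    cum_a = cum_w = 0.0
    for i in range(I + 1):
        cum_a += alpha[i]
        cum_w += omega[i]
        if cum_a < cum_w - tol:            # monotonicity fails at level i
            return False
    balls_before = sum(i * a for i, a in enumerate(alpha))
    balls_after = sum(i * w for i, w in enumerate(omega))
    return balls_after <= balls_before + beta + tol   # conservation

def linear_path(alpha, omega, beta, x):
    """The linear interpolation between alpha and omega, evaluated at x in [0, beta]."""
    return [a + (w - a) * x / beta for a, w in zip(alpha, omega)]

print(is_feasible([1.0, 0.0, 0.0], [0.3, 0.4, 0.3], beta=1.0))
print(linear_path([1.0, 0.0, 0.0], [0.3, 0.4, 0.3], 1.0, 0.5))
```

With $I=1$, empty initial conditions and $\omega=(0.3,0.4,0.3)$, the terminal state accounts for at least one ball per urn, so the constraints are feasible for $\beta=1$ but not for any smaller $\beta$.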

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$ and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i=-\theta_i$ for $i=0,\dots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[ J(\psi)=\int_0^\beta L(\psi(x),\theta(x))\,dx. \]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i=\psi_i-\psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\begin{align*}
L(\psi,\theta)&=D(\theta\|\gamma)\\
&=\sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i}+\theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}}\\
&=\sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}+\left(1-\sum_{i=0}^{I}\theta_i\right)\log\frac{1-\sum_{i=0}^{I}\theta_i}{1-\psi_I}.
\end{align*}

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]
\[
\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x)) = -\frac{d}{dx}\,\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x))
\]
for all $i\in\{0,\ldots,I\}$, $x\in(0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections, we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by
\[
-\frac{\theta_i}{\psi_i-\psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}
= \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i-\psi_{i-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\Bigr]
\tag{A.7}
\]
for $i = 0,\ldots,I-1$, and by
\[
-\frac{\theta_I}{\psi_I-\psi_{I-1}} + \frac{\theta_{I+}}{1-\psi_I}
= \frac{d}{dx}\Bigl[-\log\frac{\theta_I}{\psi_I-\psi_{I-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\Bigr].
\tag{A.8}
\]

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to
\[
L(\psi,\theta) = \sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}.
\tag{A.9}
\]
The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just
\[
-\frac{\theta_i}{\psi_i-\psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}
= \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i-\psi_{i-1}}\Bigr]
\tag{A.10}
\]
for $i = 0,\ldots,I-1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty: $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C\ge 0$, $\rho>0$ which satisfy the equations
\[
\sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} C\,\mathcal{P}_i(\rho\beta) = 1,
\qquad
\sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C\,\mathcal{P}_i(\rho\beta) = \beta.
\]
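The two equations for the twist parameters can be solved numerically. The sketch below is one possible approach, not the paper's procedure: dividing the second equation by the first eliminates $C$, leaving a single equation in $\lambda=\rho\beta$, namely that the conditional mean of a Poisson($\lambda$) variable given that it exceeds $I$ equals the ratio of the residual mean to the residual mass. This conditional mean increases with $\lambda$, so bisection applies. The function names, the series truncation in `tail_sums`, and the sample parameters are assumptions made for illustration; only the exponential case (positive residual mass) is handled.

```python
import math

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam) for lam > 0, computed in log space."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def tail_sums(I, lam):
    """Return (sum_{i>I} P_i(lam), sum_{i>I} i*P_i(lam)), truncating the series."""
    upper = I + 60 + int(lam + 12.0 * math.sqrt(lam))  # truncation point (assumption)
    mass = mean = 0.0
    for i in range(I + 1, upper + 1):
        p = poisson_pmf(i, lam)
        mass += p
        mean += i * p
    return mass, mean

def twist_parameters(omega, beta):
    """Solve the two twist equations for (C, rho) in the exponential case."""
    I = len(omega) - 1
    m0 = 1.0 - sum(omega)                                 # mass left for levels > I
    m1 = beta - sum(i * w for i, w in enumerate(omega))   # mean left for levels > I
    if m0 <= 0.0 or m1 <= (I + 1) * m0:
        raise ValueError("not an exponential-case constraint")
    target = m1 / m0

    def ratio(lam):
        mass, mean = tail_sums(I, lam)
        return mean / mass

    lo, hi = 1e-6, 1.0
    while ratio(hi) < target:      # bracket the root
        hi *= 2.0
    for _ in range(200):           # bisection on lam = rho * beta
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return m0 / tail_sums(I, lam)[0], lam / beta
```

A round-trip check is to pick $C$ and $\rho$, compute the $\omega_i$ they imply, and confirm that `twist_parameters` recovers the same pair.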

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by
\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I}\bigl(\omega_k - C\,\mathcal{P}_k(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^{k},
\tag{A.8}
\]
\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0\le i\le I,
\tag{A.9}
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x)
\]
are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.
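The construction in Theorem A.1 can be checked numerically. The sketch below evaluates $(-1)^i\psi_0^{(i)}$ by differentiating the exponential and polynomial parts of $\psi_0$ term by term, and builds the $\gamma_i$ from it. The function names and the sample values $I=1$, $\beta=2$, $C=0.6$, $\rho=1.5$ are hypothetical; the $\omega_i$ are chosen so that the two twist equations hold for these values.

```python
import math

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam) for lam > 0, computed in log space."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def signed_derivative(i, x, omega, beta, C, rho):
    """Return (-1)^i * psi_0^{(i)}(x) for psi_0 as in Theorem A.1."""
    I = len(omega) - 1
    value = C * rho**i * math.exp(-rho * x)          # exponential part
    for k in range(i, I + 1):                        # polynomial part
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) // math.factorial(k - i)   # k!/(k-i)!
        value += coeff * falling * beta**(-i) * (1.0 - x / beta)**(k - i)
    return value

def gamma(i, x, omega, beta, C, rho):
    """Occupancy fraction gamma_i(x) = x^i/i! * (-1)^i psi_0^{(i)}(x)."""
    return x**i / math.factorial(i) * signed_derivative(i, x, omega, beta, C, rho)

# Hypothetical example with I = 1.
beta, C, rho = 2.0, 0.6, 1.5
lam = rho * beta
p0, p1 = poisson_pmf(0, lam), poisson_pmf(1, lam)
omega1 = beta - C * (lam - p1)                  # from the mean equation
omega0 = 1.0 - C * (1.0 - p0 - p1) - omega1     # from the mass equation
omega = [omega0, omega1]

print(gamma(0, 0.0, omega, beta, C, rho))               # initial value psi_0(0)
print([gamma(i, beta, omega, beta, C, rho) for i in (0, 1)], omega)
```

The printed terminal values should agree with $\omega$, and the initial value with $\psi_0(0)=1$, up to rounding.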

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial
\[
\psi_0(x) = \sum_{k=0}^{I}\omega_k\Bigl(1-\frac{x}{\beta}\Bigr)^{k}.
\tag{A.10}
\]
We refer to such paths as polynomial extremals, and to extremals with $C>0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with
\[
(-1)^i\varphi^{(i)}(x)\ge 0 \quad\text{for all } x\in[a,b] \text{ and } i>0.
\tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x\in[0,\beta)$ and $i = 0,\ldots,I$. In the case $C>0$, this can be strengthened to all $i = 0,1,\ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C>0$. The $i$th derivative of $\psi_0$ is
\[
(-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I}\bigl(\omega_k - C\,\mathcal{P}_k(\rho\beta)\bigr)\beta^{-i}\frac{k!}{(k-i)!}\Bigl(1-\frac{x}{\beta}\Bigr)^{k-i}
\]
for $i = 0,\ldots,I$, and
\[
(-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
\tag{A.12}
\]
for $i>I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x)>0$, $x\in[0,\infty)$, $i = 0,1,\ldots$. We deduce that $\varphi(x) \doteq (-1)^{I}\psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that
\[
\begin{aligned}
\varphi(\beta) &= (-1)^{I}\psi_0^{(I)}(\beta) \\
&= \rho^{I} C e^{-\rho\beta} + \Bigl(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^{I}}{I!}\Bigr)\beta^{-I} I! \\
&= \Bigl(\frac{\beta^{I}}{I!}\Bigr)^{-1}\omega_I \ge 0,
\end{aligned}
\]
so that $\varphi(x)>0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that
\[
(-1)^{I}\psi_0^{(I)}(x) = \Bigl(\frac{\beta^{I}}{I!}\Bigr)^{-1}\omega_I > 0 \quad\text{for all } x\in[0,\beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since
\[
\psi_0(0) = C\Bigl(1-\sum_{i=0}^{I}\mathcal{P}_i(\rho\beta)\Bigr) + \sum_{i=0}^{I}\omega_i = 1.
\]
A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:
\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad i = 0,1,\ldots,I,I+1,\ldots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0)-\psi_0(x) = \sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus
\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\}\in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i\ge 0$, the rate of decrease of each cumulative occupancy is
\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x) \\
&= \sum_{k=1}^{i}-\frac{x^{k-1}}{(k-1)!}(-1)^{k}\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^{k}}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^{i}}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]
for $x\in[0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that
\[
\sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty}-\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality
\[
\sum_{i=0}^{I}\theta_i(x)\le 1
\]
over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C>0$. The $I+$ terms satisfy the simple expressions
\[
\theta_{I+}(x) = 1+\sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty}-\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty}\mathcal{P}_i(\rho x),
\]
\[
1-\psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty}\mathcal{P}_i(\rho x),
\]
where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1-\psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

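From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that
\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x\in(0,\beta),\ i\ge 0.
\tag{A.15}
\]
Then
\[
\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr]
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x\in(0,\beta),
\tag{A.16}
\]
verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\ldots,I$, so that (A.16) implies (A.10).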

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1-\psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1-\psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho>1$, balls unusually pile into high-occupancy urns, while if $\rho<1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

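Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with
\[
(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad\text{for } i>I,
\]
for some $I\ge 0$, $C>0$ and $\rho>0$. Further suppose that $\gamma_0,\gamma_1,\ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying
\[
\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x)\le B_0,
\]
\[
\sum_{i=0}^{\infty}-\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]
for all $x\in[0,\beta]$ and some constant $B_0<\infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0,\ldots,\gamma_I,\gamma_{I+}\}$, where $\gamma_{I+} = 1-\sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by
\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr].
\tag{A.17}
\]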

Proof. For all $i>I$ and $x\in[0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that
\[
\theta_{I+} = 1-\sum_{i=0}^{I}-\dot\psi_i = \sum_{i=I+1}^{\infty}-\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1-\psi_I).
\]
Then for each $x\in[0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i>I$ and $(1+\sum_{i=0}^{I}\dot\psi_i)/(1-\psi_I) = \rho$,
\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i-\psi_{i-1}}
+ (1+\dot\psi_0+\cdots+\dot\psi_I)\log\frac{1+\dot\psi_0+\cdots+\dot\psi_I}{1-\psi_I} \\
&= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i-\psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]
Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi)\ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)| = 1$, $x\in[0,\beta]$, and that, given $\varepsilon>0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x\in[\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,
\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

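Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as
\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
+ \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x)-\dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]
The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i>I$
\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\,\gamma_i(x)\bigl[\log C + i\log\rho - \rho x\bigr],
\]
and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is
\[
\sum_{i=0}^{\infty}\bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i\,dx.
\]
The lemma follows on taking limits as $\varepsilon\downarrow 0$.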

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\ldots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying
\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1}-\dot\psi_i(x) = 1,
\]
\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\ldots,I-1,
\]
for all $x\in[0,\beta]$. Then the cost $J(\gamma)$ is given by
\[
J(\gamma) = \beta + \sum_{i=0}^{I}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr].
\tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\|\mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then
\[
J(\gamma) = \min_{\pi\in F(1,\omega,\beta)} D(\pi\|\mathcal{P}(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C>0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty}\in\mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\ldots,I$, and define the Lagrangian $L$ on this set to be
\[
L(\pi,y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Bigl(1-\sum_{i=0}^{\infty}\pi_i\Bigr) + z\Bigl(\beta-\sum_{i=0}^{\infty} i\,\pi_i\Bigr),
\]
where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi_i^* \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i>I$, the definitions of $\pi_i^*$, $z$ and $y$, and the strict convexity of $x\log x$, imply that $\pi_i^*$ is the unique global minimizer of $x\mapsto x\log x - x[\log\mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i>I$ and $\pi_i\in[0,\infty)$,
\[
\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - z\,i\,\pi_i \ \ge\ \pi_i^*\log\frac{\pi_i^*}{\mathcal{P}_i(\beta)} - y\pi_i^* - z\,i\,\pi_i^*.
\]
Following standard Lagrangian arguments, we thus have
\[
\begin{aligned}
\inf_{\pi\in F(1,\omega,\beta)} D(\pi\|\mathcal{P}(\beta)) &= \inf_{\pi\in F(1,\omega,\beta)} L(\pi,y,z) \\
&\ge \inf_{\pi\in\mathbb{R}_+^{\infty}} L(\pi,y,z) \\
&= D(\pi^*\|\mathcal{P}(\beta)).
\end{aligned}
\]
Since $\pi^*\in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

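Substituting the particular form of $\pi_i^*$ into (A.17), we have
\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\|\mathcal{P}(\beta)),
\]
where
\[
\gamma_i(\beta) = \begin{cases} \omega_i, & 0\le i\le I, \\ C\,\mathcal{P}_i(\rho\beta), & i>I. \end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as
\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \Bigl(1-\sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr) + \Bigl(\beta-\sum_{i=0}^{I} i\,\omega_i\Bigr)\log\rho,
\]
where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.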
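As a sanity check on the explicit formula, the sketch below compares it with a direct (truncated) evaluation of the relative entropy between the terminal distribution and the Poisson law with mean $\beta$. The function names, the truncation length, and the sample values $I=1$, $\beta=2$, $C=0.6$, $\rho=1.5$ are hypothetical; the $\omega_i$ are chosen so that the twist equations hold for these values.

```python
import math

def log_pmf(i, lam):
    """Logarithm of the Poisson probability P_i(lam), lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def explicit_cost(omega, beta, C, rho):
    """Closed-form cost of an exponential extremal."""
    head = sum(w * (math.log(w) - log_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return head + tail_mass * (math.log(C) + (1.0 - rho) * beta) + tail_mean * math.log(rho)

def relative_entropy_cost(omega, beta, C, rho, extra_terms=300):
    """Direct evaluation of D(pi* || P(beta)), truncating the infinite tail."""
    I = len(omega) - 1
    total = sum(w * (math.log(w) - log_pmf(i, beta))
                for i, w in enumerate(omega) if w > 0)
    for i in range(I + 1, I + 1 + extra_terms):
        log_pi = math.log(C) + log_pmf(i, rho * beta)   # pi*_i = C * P_i(rho * beta)
        total += math.exp(log_pi) * (log_pi - log_pmf(i, beta))
    return total

# Hypothetical example with I = 1.
beta, C, rho = 2.0, 0.6, 1.5
lam = rho * beta
p0, p1 = math.exp(log_pmf(0, lam)), math.exp(log_pmf(1, lam))
omega1 = beta - C * (lam - p1)
omega0 = 1.0 - C * (1.0 - p0 - p1) - omega1
omega = [omega0, omega1]

print(explicit_cost(omega, beta, C, rho), relative_entropy_cost(omega, beta, C, rho))
```

The two printed values should agree up to truncation and rounding error.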

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k>0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0\in\mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K\ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i}\colon[0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by
\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]
We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be
\[
\min_{\pi\in S_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta))
\tag{A.20}
\]
subject to the conservation constraint
\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\pi_{k,j} = \beta
\tag{A.21}
\]
and the terminal conditions
\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\,\pi_{k,i-k} \quad\text{for all } 0\le i\le I.
\tag{A.22}
\]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that
\[
\pi_{k,j} = 0, \qquad k+j>I,
\tag{A.23}
\]
so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k>0$. Including all indices $0\le k\le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^*\in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi\in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch of the argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega\in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega\in S_{\tilde I}$, where $\tilde I>I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply
\[
\sum_{i:\,P_i>Q_i}(P_i-Q_i) = \sum_{i:\,Q_i>P_i}(Q_i-P_i),
\]
and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A<\infty$, define the set
\[
H_A \doteq \Bigl\{\pi\in S_\infty^{|\mathcal{K}|} : \sum_{k}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta))\le A\Bigr\}.
\]
The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that
\[
Q_A(\alpha,\omega,\beta) \doteq F(\alpha,\omega,\beta)\cap H_A
\]
is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)}\in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n),(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n),(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon>0$ there is $m<\infty$ such that
\[
\sum_{j\ge m} j\,\pi^{(n)}_{k,j}\le\varepsilon
\tag{A.24}
\]
for all $n$ and $k$.

Since $e^{y}-1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality
\[
xy \le (x\log x - x + 1) + (e^{y}-1),
\]
where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x\ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta>0$,
\[
\sum_{j=0}^{\infty} e^{j/\delta}\,\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\,\frac{e^{-\beta}\beta^{j}}{j!} = e^{\beta e^{1/\delta}-\beta} < \infty.
\]
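Both facts are easy to check numerically. The following sketch, with illustrative function names and a series truncation chosen as an assumption, evaluates the slack in the inequality $xy\le(x\log x-x+1)+(e^{y}-1)$ and compares a truncated sum for the Poisson exponential moment with the closed form $e^{\beta e^{1/\delta}-\beta}$.

```python
import math

def inequality_slack(x, y):
    """Return (x log x - x + 1) + (e^y - 1) - x*y, which should be >= 0 for x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0   # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exponential_moment(beta, delta, terms=400):
    """Truncated sum of e^{j/delta} * P_j(beta), with each term formed in log space."""
    return sum(
        math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
        for j in range(terms)
    )

beta, delta = 2.0, 1.0
print(poisson_exponential_moment(beta, delta), math.exp(beta * math.exp(1.0 / delta) - beta))
print(inequality_slack(math.exp(0.7), 0.7))   # the supremum is attained at x = e^y
```

The slack vanishes (up to rounding) exactly when $x=e^{y}$, which is where the supremum in the variational formula is attained.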


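Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/\mathcal{P}_j(\beta)$, we have the estimate
\[
\begin{aligned}
\sum_{j\ge m} j\,\pi^{(n)}_{k,j} &= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
&\le \sum_{j\ge 0}\delta\Bigl[\pi^{(n)}_{k,j}\log\Bigl(\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\Bigr) - \pi^{(n)}_{k,j} + \mathcal{P}_j(\beta)\Bigr] + \delta\sum_{j\ge m}(e^{j/\delta}-1)\mathcal{P}_j(\beta) \\
&= \delta D(\pi^{(n),(k)}\|\mathcal{P}(\beta)) + \delta\sum_{j\ge m}(e^{j/\delta}-1)\mathcal{P}_j(\beta) \\
&\le \delta A + \delta\sum_{j\ge m}(e^{j/\delta}-1)\mathcal{P}_j(\beta),
\end{aligned}
\]
where the second equality uses the fact that both $\pi^{(n),(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta>0$ sufficiently small and then $m<\infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)}\to\beta\in(0,\infty)$.]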

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that
\[
\beta_k^{(n)}\to\beta_k.
\]
Finally,
\[
\sum_{k}\alpha_k\beta_k = \lim_{n}\sum_{k}\alpha_k\beta_k^{(n)} = \lim_{n}\beta = \beta,
\]
so that $\pi\in F(\alpha,\omega,\beta)$, and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let
\[
G \doteq \inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta))\ge 0.
\]
Choose a sequence $\pi^{(n)}\in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n),(k)}\|\mathcal{P}(\beta))\le G+1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^*\in Q_A$ be the limit point. Since relative entropy $D(\pi\|\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that
\[
\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{*(k)}\|\mathcal{P}(\beta)) \le \liminf_{n}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n),(k)}\|\mathcal{P}(\beta)) = G,
\]
where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that
\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)})\to(\alpha,\omega,\beta)
\]
componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0<\beta<\infty$. Then
\[
\liminf_{n}\mathcal{J}^{(n)}\ge\mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A>\liminf_n\mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that
\[
\mathcal{J}^{(n)}\le G+1/n<A,
\]
and for each problem in this subsequence we define $\pi^{(n)}\in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)}\to\pi\in H_A$. As in the proof of Lemma A.6, if $\beta<\infty$, then
\[
\beta_k^{(n)} \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j}\ \to\ \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{k,j}, \qquad k = 0,1,\ldots,K.
\]
If $\alpha_k^{(n)}\to 0$ and $\beta_k^{(n)}\alpha_k^{(n)}\not\to 0$, then the term $\alpha_k^{(n)}D(\pi^{(n),(k)}\|\mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\|\mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta-\lambda+\lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha_k^{(n)}\beta_k^{(n)}\to\alpha_k\beta_k$, even if $\alpha_k = 0$.

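Thus
\[
\sum_{k\in\mathcal{K}}\beta_k\alpha_k = \lim_{n}\sum_{k}\beta_k^{(n)}\alpha_k^{(n)} = \lim_{n}\beta^{(n)} = \beta,
\]
and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have
\[
\begin{aligned}
\liminf_{n}\sum_{k=0}^{K}\alpha_k^{(n)}D(\pi^{(n),(k)}\|\mathcal{P}(\beta^{(n)})) &\ge \liminf_{n}\sum_{k:\,\alpha_k>0}\alpha_k^{(n)}D(\pi^{(n),(k)}\|\mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k>0}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]
as desired.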

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i\in\mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i>0$, or if $i>I$ and $\omega_{I+}>0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{k,j}$ with $k+j\in\mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^*\in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j}>0$ for all $k+j\in\mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k+j\notin\mathcal{I}(\omega)$.

Proof. For $k+j\notin\mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi\in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m,n)$ with $m+n\in\mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{m,n}>0$, that $\tilde\pi_{k,j} = 0$ for all $k+j\notin\mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define
\[
\tilde\omega_i \doteq \sum_{k=0}^{i}\alpha_k\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\,\tilde\pi_{k,j},
\]
then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta>0$ we may define
\[
\hat\beta \doteq \frac{\beta-\eta\tilde\beta}{1-\eta}, \qquad \hat\omega \doteq \frac{\omega-\eta\tilde\omega}{1-\eta}.
\]
By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1-\eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{m,n}>0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

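We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is
\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\Bigl(\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)} + 1\Bigr)(\bar\pi_{k,j}-\pi_{k,j}) \\
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{k,j}-\pi_{k,j}) \\
&= \alpha_m\bar\pi_{m,n}\log\frac{\pi^{\varepsilon}_{m,n}}{\mathcal{P}_n(\beta)} + \sum_{k\in\mathcal{K}}\alpha_k\sum_{(k,j)\ne(m,n)}\bar\pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)} - \sum_{k\in\mathcal{K}}\alpha_k\sum_{j}\pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{\mathcal{P}_j(\beta)}.
\end{aligned}
\]
In the limit as $\varepsilon\to 0$, the third term in the last display tends to
\[
-\sum_{k}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta))\le 0.
\]
The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\bar\pi_{k,j}\log\mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta)<f(0)$ for some sufficiently small $\delta>0$. We have shown that, for any $m+n\in\mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m+n\in\mathcal{I}(\omega)$, it follows that $\pi^*_{m,n}>0$ whenever $m+n\in\mathcal{I}(\omega)$.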

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form
\[
\pi^*_{k,j} = \begin{cases} D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega), \\ 0, & \text{otherwise.} \end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian
\[
L(\pi) = \sum_{k\in\mathcal{K}}\alpha_k\sum_{k+j\in\mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)} + \sum_{k\in\mathcal{K}} z_k\alpha_k\Bigl(1-\sum_{k+l\in\mathcal{I}}\pi_{k,l}\Bigr) + \sum_{i\in\mathcal{I}} w_i\Bigl(\omega_i-\sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\Bigr)
\]
is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields
\[
\pi^*_{k,j} = \mathcal{P}_j(\beta)\,e^{z_k-1+w_{k+j}}.
\]
The result follows on defining $D_k = e^{z_k-1}$ and $W_i = e^{w_i}$.
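The product form $D_k\,\mathcal{P}_j(\beta)\,W_{k+j}$ suggests a simple way to compute the minimizer numerically in the polynomial case. The sketch below uses iterative proportional fitting (alternately rescaling the $D_k$ and the $W_i$), which is a standard technique for entropy minimization under marginal constraints and is not taken from the paper; the function name, the number of sweeps, and the two-class example are assumptions made for illustration.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability P_j(lam) for lam > 0, computed in log space."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def fit_product_form(alpha, omega, beta, sweeps=500):
    """Fit pi[k, j] = D[k] * P_j(beta) * W[k + j] by iterative proportional fitting.

    alpha -- dict mapping initial occupancy k to its weight alpha_k > 0
    omega -- list [omega_0, ..., omega_I] summing to one (polynomial case)
    """
    I = len(omega) - 1
    classes = sorted(alpha)
    support = {(k, j) for k in classes for j in range(I - k + 1) if omega[k + j] > 0}
    D = {k: 1.0 for k in classes}
    W = [1.0] * (I + 1)
    for _ in range(sweeps):
        # Rescale D_k so that each class distribution sums to one.
        for k in classes:
            s = sum(poisson_pmf(j, beta) * W[k + j] for (kk, j) in support if kk == k)
            D[k] = 1.0 / s
        # Rescale W_i so that the terminal conditions (A.22) hold.
        for i in range(I + 1):
            s = sum(alpha[k] * D[k] * poisson_pmf(i - k, beta)
                    for k in classes if (k, i - k) in support)
            if s > 0:
                W[i] = omega[i] / s
    pi = {(k, j): D[k] * poisson_pmf(j, beta) * W[k + j] for (k, j) in support}
    return pi, D, W

# Hypothetical example: half the urns start empty, half start with one ball, I = 2.
alpha = {0: 0.5, 1: 0.5}
omega = [0.2, 0.4, 0.4]
beta = 0.7            # chosen so that conservation holds with equality
pi, D, W = fit_product_form(alpha, omega, beta)
print(sorted(pi.items()))
```

After the final sweep the terminal conditions hold by construction, and the class distributions sum to one up to the convergence error of the iteration.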

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form
\[
\pi^*_{k,j} = \begin{cases} C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I}, \\ C_k\,\mathcal{P}_j(\rho\beta), & k\in\mathcal{K},\ k+j>I, \\ 0, & \text{otherwise.} \end{cases}
\]

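Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M>I$. For each such $M$, define
\[
\beta^{(M)} \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M} l\,\pi^*_{k,l}, \qquad \eta_k^{(M)} \doteq \sum_{l\le M}\pi^*_{k,l},
\]
and consider the problem of minimizing
\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
\]
subject to the constraints
\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{k,l} = \beta^{(M)},
\]
\[
\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{k,l} = \eta_k^{(M)}, \qquad k\in\mathcal{K},
\]
\[
\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} = \omega_i, \qquad i\in\mathcal{I}.
\]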

By construction the minimizer of this problem is obtained simply by trun-cating πlowast As in the previous lemma the strict positivity of the minimizertogether with the linearity of the constraints guarantees the existence ofLagrange multipliers via [4] Proposition 133 The Lagrangian for the M thproblem is

L(M) =sum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

+ y(M)

(

β(M) minussum

kisinK

αk

sum

lleMk+lisinI

lπkl

)

+Ksum

kisinK

z(M)k αk

(

η(M)k minus

sum

lleMk+lisinI

πkl

)

+sum

iisinI

w(M)i

(

ωi minussum

kleikisinK

αkπkiminusk

)

Since the derivative of L with respect to πkj must be zero at the minimizerit follows that

πlowastkj =Pj(β)ejy(M)minus1+z

(M)k

+w(M)k+j

for all j ltM and k + j isin I where for convenience we have defined wi = 0for i gt I For k+ j gt I note that

πlowastkj+1

πlowastkj=

Pj+1(β)

Pj(β)ey

(M)

so that y(M) is independent of M Since w(M)k+j = 0 if k+ j gt I it follows that

z(M)k and w

(M)i are also independent of M Then the above expression for

πlowastkj holds for all k+ j isin I with fixed Lagrange multipliers y zk and wiThe form expressed in the theorem is based on the substitutions ρ = ey Ck = e(ρminus1)βminus1+zk and Wi = ewi
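The substitution in the last sentence rests on an elementary Poisson tilting identity. As a brief sketch, writing $P_j(\lambda)=e^{-\lambda}\lambda^j/j!$ and $\rho=e^{y}$:

```latex
\begin{align*}
P_j(\beta)\,e^{jy}
  &= e^{-\beta}\,\frac{(\rho\beta)^j}{j!}
   = e^{(\rho-1)\beta}\,P_j(\rho\beta), \\
\pi^*_{kj}
  &= P_j(\beta)\,e^{jy-1+z_k+w_{k+j}}
   = \underbrace{e^{(\rho-1)\beta-1+z_k}}_{C_k}\;P_j(\rho\beta)\;
     \underbrace{e^{w_{k+j}}}_{W_{k+j}} .
\end{align*}
```

For $k+j>I$ the convention $w_{k+j}=0$ gives $W_{k+j}=1$, which is the second branch of Theorem A.10.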

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj}\doteq\pi^*_{kj}$ and $\omega(k) = (\omega_{k0},\ldots,\omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega(k),\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega(k),\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\pi^*_{kj}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega(k),\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta),\qquad i = 0,\ldots,I,
\]

\[
\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define $\bar\gamma_{kj}(x)\doteq\gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i}\alpha_k\,\bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\,\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty}-\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}-\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

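In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\bar\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho_k\beta_k)\bigr)\Bigl(1-\frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i}\\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k}(W_{k+i}-1)P_i(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_kP_i(\rho\beta)W_{k+i}$ when $i\le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I}(W_i-1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\Bigl(1-\frac{x}{\beta}\Bigr)^{i-k}\Bigr]\\
&= \alpha_0C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k}(W_{k+j}-1)P_j(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{j}\Bigr]\\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\bar\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]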
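The passage from the first to the second line of (A.25) uses the identity $P_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$. A minimal numerical sanity check is sketched below; the function names and the parameter values are illustrative assumptions, not part of the paper.

```python
import math


def poisson_pmf(i, lam):
    """Poisson probability P_i(lam) = exp(-lam) * lam**i / i!."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def lhs(i, k, rho, beta):
    """Coefficient produced by differentiating (1 - x/beta)**i exactly k times."""
    return (poisson_pmf(i, rho * beta) * math.factorial(i)
            / (math.factorial(i - k) * beta ** k))


def rhs(i, k, rho, beta):
    """Re-indexed coefficient rho**k * P_{i-k}(rho*beta)."""
    return rho ** k * poisson_pmf(i - k, rho * beta)


def max_relative_gap(rho=1.3, beta=2.0, top=8):
    """Largest relative difference between lhs and rhs over 0 <= k <= i <= top."""
    gap = 0.0
    for k in range(top + 1):
        for i in range(k, top + 1):
            a, b = lhs(i, k, rho, beta), rhs(i, k, rho, beta)
            gap = max(gap, abs(a - b) / b)
    return gap
```

Calling `max_relative_gap()` for a few choices of `rho` and `beta` should return values at the level of floating-point rounding.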

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\bar\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i}\alpha_k\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\bar\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1-\psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\bar\psi_{k0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\tag{A.29}
\]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x\in[0,\beta)$, $i\le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i}\alpha_k\,\bar\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K}\alpha_k\log|\psi_0^{(k)}(0)|\\
&= \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\bar\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\times\bar\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^{j}e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K}\alpha_k\,D\bigl(\bar\gamma_{k\cdot}(\beta)\,\|\,P(\beta)\bigr).
\]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta)\doteq\{\gamma\in\mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.

(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.
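The joint convexity of relative entropy invoked for Lemma A.15 can be illustrated numerically. The sketch below draws random pairs of probability vectors and checks the pointwise inequality $D(\theta^\varepsilon\|\gamma^\varepsilon)\le(1-\varepsilon)D(\theta\|\gamma)+\varepsilon D(\tilde\theta\|\tilde\gamma)$ for the mixed vectors; the helper names, the vector length and the random seed are arbitrary choices made for this illustration, and the check is not a substitute for the proof.

```python
import math
import random


def relative_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i)."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)


def random_probability_vector(rng, size):
    """Strictly positive vector summing to one."""
    raw = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(raw)
    return [r / total for r in raw]


def convexity_gap(theta, gamma, theta_alt, gamma_alt, eps):
    """Mixture of the two entropies minus the entropy of the mixtures."""
    mix_theta = [(1 - eps) * a + eps * b for a, b in zip(theta, theta_alt)]
    mix_gamma = [(1 - eps) * a + eps * b for a, b in zip(gamma, gamma_alt)]
    mixed_cost = ((1 - eps) * relative_entropy(theta, gamma)
                  + eps * relative_entropy(theta_alt, gamma_alt))
    return mixed_cost - relative_entropy(mix_theta, mix_gamma)


def smallest_gap(trials=200, size=5, seed=0):
    """Smallest convexity gap seen over random trials (nonnegative up to rounding)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        vectors = [random_probability_vector(rng, size) for _ in range(4)]
        worst = min(worst, convexity_gap(*vectors, eps=rng.random()))
    return worst
```

Integrating the pointwise inequality over $x\in[0,\beta]$ is what yields part (b) of the lemma.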

For a given pair of occupancy functions $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon\doteq(1-\varepsilon)\gamma+\varepsilon\tilde\gamma$ and define the function $G:[0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[
G[\varepsilon]\doteq J(\gamma^\varepsilon) = \int_0^\beta D\bigl(\theta^\varepsilon(x)\,\|\,\gamma^\varepsilon(x)\bigr)\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma)\le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma+\varepsilon\tilde\gamma$ with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

\[
\eta(x)\doteq\tilde\psi(x)-\psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi+\varepsilon\eta,\ \theta-\varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx.
\tag{A.30}
\]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$, for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\,\bar\gamma_{k,i-k}(x)\ge\alpha_i\,\bar\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\,\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon\ge(1-\varepsilon)\gamma$ and $\theta^\varepsilon\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$, these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0<1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

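The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I}\frac{\partial L}{\partial\psi_i}\Big|_{\gamma^\varepsilon(x)}\eta_i(x) - \sum_{i=0}^{I}\frac{\partial L}{\partial\theta_i}\Big|_{\gamma^\varepsilon(x)}\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)}+\frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\]

\[
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)}-\log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon\le1$, $\theta^\varepsilon\le1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\Bigl|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\Bigr| < B\qquad\text{for }\varepsilon\in[0,\varepsilon_0],\ \text{a.e. }x\in[0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.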

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^\beta\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx
\]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[
\int_0^x\frac{\partial L}{\partial\psi_i}\Big|_{\gamma}dx' = \int_0^x\Bigl(-\frac{\theta_i}{\gamma_i}+\frac{\theta_{i+1}}{\gamma_{i+1}}\Bigr)\Big|_{\gamma}dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta\Bigl[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x)-\frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\Bigr]dx\\
&= \int_0^\beta\dot\eta(x)\cdot\Bigl(-\int_0^x\frac{\partial L}{\partial\psi}(x')\,dx'-\frac{\partial L}{\partial\theta}(x)\Bigr)dx\\
&= -\int_0^\beta\sum_{i=0}^{I}C_i\,\dot\eta_i(x)\,dx\\
&= \sum_{i=0}^{I}C_i\bigl(\eta_i(0)-\eta_i(\beta)\bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma+(1-\lambda)\tilde\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)}\downarrow0$ and $x_2^{(n)}\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)})\le\int_{x_1^{(n)}}^{x_2^{(n)}}D\bigl(\tilde\theta(x)\,\|\,\tilde\gamma(x)\bigr)\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha,\omega,\beta) &\le \liminf_n\,\mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\tilde\gamma(x_2^{(n)}),x_2^{(n)}-x_1^{(n)}\bigr)\\
&= \liminf_n J(\gamma^{(n)})\\
&\le \lim_n\int_{x_1^{(n)}}^{x_2^{(n)}}D\bigl(\tilde\theta(x)\,\|\,\tilde\gamma(x)\bigr)\,dx\\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


therefore valid for the given $\tau > 0$: for any $\sigma > 0$ and all sufficiently small $\eta > 0$, $|\Gamma^n(0)-\gamma(0)| < \eta$ implies

\[
\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(0)}\Bigl(\Bigl|\Gamma^n\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)-\gamma\Bigl(\frac{\lfloor n\tau\rfloor}{n}\Bigr)\Bigr| < \sigma,\ \sup_{x\in[0,\tau]}|\Gamma^n(x)-\gamma(x)| < \delta\Bigr)\ge-\frac{\varepsilon}{2}.
\]

Note that the asymptotic lower bound on the normalized log of the probability is independent of $\sigma > 0$. To obtain the lower bound for all $x\in[0,\beta]$, we will use the Markov property and an existing lower bound for paths which avoid that boundary. This latter lower bound will hold uniformly in a neighborhood of the initial condition $\gamma(\tau)$. Since we do not know a priori how small this neighborhood must be, it is important that the lower bound in the last display should be independent of $\sigma > 0$.

Now choose $\zeta\in(0,\delta]$ such that $\gamma(x)$ is at least distance $2\zeta$ from the boundary of $S_I$ for all $x\in[\tau,\beta]$. Recall that, when considered as a function of $\gamma$, the distribution of $b_i(\gamma)$ is continuous in the weak topology, and moreover that the support of this distribution is independent of $\gamma$ so long as $\gamma_i > 0$ for all $i\in\{0,1,\ldots,I+1\}$ (i.e., $\gamma\in S_I^\circ$, where $S_I^\circ$ denotes interior relative to the smallest affine space that contains $S_I$). It then follows from Proposition 6.6.1 of [11] (see also the discussion on [11], page 165, regarding uniformity) that $\{\Gamma^n, n = 1,2,\ldots\}$ satisfies the following uniform large deviations lower bound: given any $\varepsilon > 0$ and $\zeta > 0$ defined above, there is $\sigma > 0$ such that, as long as $|\Gamma^n(\lfloor n\tau\rfloor/n)-\gamma(\lfloor n\tau\rfloor/n)| < \sigma$,

\[
\liminf_{n\to\infty}\frac{1}{n}\log P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}\Bigl(\sup_{\tau\le x\le\beta}|\Gamma^n(x)-\gamma(x)| < \zeta\Bigr)\ge-J(\gamma)-\frac{\varepsilon}{2},
\]

where $P_{\Gamma^n(\lfloor n\tau\rfloor/n),\,\lfloor n\tau\rfloor/n}$ denotes probability given the occupancy levels $\Gamma^n(\lfloor n\tau\rfloor/n)$ at time $\lfloor n\tau\rfloor/n$. Proposition 6.6.1 of [11] assumes Condition 6.3.2. It is worth noting that in the present setting this condition holds with the particularly simple choice $\beta = \gamma$ (using the notation of [11]).

The lower bound (3.1) now follows by the Markov property and the last two displays. The proof that $J$ has compact level sets is as in [10] and therefore omitted.

4. Examples and extensions. In this section we apply the results of the previous sections to three different occupancy problems. We show how the parameters of interest may be computed numerically, and plot the solutions to the associated calculus of variations problems. In Section 4.4 we list some other asymptotic problems of interest which can be solved by relatively straightforward generalizations of the results presented in this paper.

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^n(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^n(\beta)$ established by Corollary 2.3, one must compute exponents of the form

\[
\mathcal{J}(\Omega) = \inf_{\omega\in\Omega}\mathcal{J}(\omega).
\]

Using Theorem 2.7, we can write

\[
\begin{aligned}
\mathcal{J}(\Omega) &= \inf_{\omega\in\Omega}\ \inf_{\pi\in F(\alpha,\omega,\beta)}\ \sum_{k=0}^{K}\alpha_k D\bigl(\pi(k)\,\|\,P(\beta)\bigr)\\
&= \inf_{\pi\in F(\alpha,\Omega,\beta)}\ \sum_{k=0}^{K}\alpha_k D\bigl(\pi(k)\,\|\,P(\beta)\bigr),
\end{aligned}
\tag{4.1}
\]

where we have abused notation to define $F(\alpha,\Omega,\beta) = \bigcup_{\omega\in\Omega}F(\alpha,\omega,\beta)$.

In many cases, for example, when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^\circ$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega\in\Omega^\circ}\mathcal{J}(\omega) = \inf_{\omega\in\bar\Omega}\mathcal{J}(\omega)$. For such sets, the large deviations lower and upper bounds coincide, so that $-\lim\frac{1}{n}\log P(\Gamma^n(\beta)\in\Omega) = \inf_{\omega\in\Omega}\mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.

4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns, or in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6) it is immediate that $C = 1/\rho$ and

\[
\gamma_0(x) = \frac{1}{\rho}e^{-\rho x}+1-\frac{1}{\rho}.
\]

Using (2.5), we find that $\rho$ is determined by the unique nonnegative solution to

\[
\rho(1-\omega_0) = 1-e^{-\beta\rho}.
\]

Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.
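The fixed point equation for $\rho$ is easy to solve numerically. The sketch below uses plain bisection for the example values $\beta = 3$ and $\omega_0 = 0.15$; the function names, the bracketing interval and the tolerance are choices made for this illustration rather than anything prescribed by the paper. Since $\rho = 0$ always satisfies the equation trivially, the code looks for the positive root, which exists when $0 < 1-\omega_0 < \beta$.

```python
import math


def solve_twist(beta, omega0, tol=1e-13):
    """Positive root rho of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection."""
    if not 0.0 < 1.0 - omega0 < beta:
        raise ValueError("need 0 < 1 - omega0 < beta for a positive root")

    def f(rho):
        # -expm1(-t) equals 1 - exp(-t), computed accurately for small t.
        return -math.expm1(-beta * rho) - rho * (1.0 - omega0)

    lo = 1e-9                   # just right of the trivial root rho = 0, where f > 0
    hi = 1.0 / (1.0 - omega0)   # here f(hi) = -exp(-beta*hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma0(x, rho):
    """Constrained fraction of empty urns after x balls per urn."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho


def gamma_i(i, x, rho):
    """Constrained fraction of urns holding i > 0 balls: P_i(rho*x) / rho."""
    lam = rho * x
    return math.exp(-lam) * lam ** i / math.factorial(i) / rho


rho = solve_twist(beta=3.0, omega0=0.15)
```

With these inputs the computed twist is slightly larger than one, and `gamma0(3.0, rho)` reproduces the prescribed endpoint $\omega_0$. The bisection relies on the concavity of the function `f`, so that it is positive between zero and the root and negative beyond it.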

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[
w(r) = r-n\Gamma_1(r/n)-2n\Gamma_2(r/n)-\cdots-In\Gamma_I(r/n)-In\Gamma_{I+}(r/n),
\]

or, since $\Gamma_{I+} = 1-\sum_{i=0}^{I}\Gamma_i$,

\[
w(r) = r-nI+n\sum_{i=0}^{I}(I-i)\Gamma_i(r/n).
\]
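The infinite-capacity representation of the overflow can be checked by simulation. The sketch below throws $r$ balls into $n$ unbounded urns, counts the overflow directly, and compares it with the occupancy formula above; the function name, the seed handling and the sample sizes are illustrative assumptions.

```python
import random
from collections import Counter


def overflow_two_ways(n, r, capacity, seed=0):
    """Overflow counted urn by urn, and via the occupancy-level formula."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1      # each ball picks an urn uniformly

    # Direct count: every ball beyond the capacity of its urn is on the floor.
    direct = sum(max(c - capacity, 0) for c in counts)

    # Formula: w(r) = r - n*I + n * sum_{i<=I} (I - i) * Gamma_i(r/n),
    # where n * Gamma_i(r/n) is the number of urns holding exactly i balls.
    urns_at_level = Counter(counts)
    formula = r - n * capacity + sum(
        (capacity - i) * urns_at_level[i] for i in range(capacity + 1)
    )
    return direct, formula
```

Both counts are integers, so for any seed the two returned values should agree exactly, for example in `overflow_two_ways(n=50, r=200, capacity=3)`.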

In order to compute

\[
J_O(\eta,\beta)\doteq-\lim_{n}\frac{1}{n}\log P\Bigl(\frac{w(\beta n)}{n} > \eta\Bigr),
\]

we therefore consider sample paths which satisfy the end constraint

\[
\sum_{i=0}^{I}(I-i)\gamma_i(\beta)\ge\eta+I-\beta\doteq\zeta.
\tag{4.2}
\]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I-\beta]^+\le\zeta<I$, and that the average overflow satisfies $[\beta-I]^+\le\eta<\beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

\[
\sum_{i=0}^{\infty}\pi_i = 1,\qquad\sum_{i=0}^{\infty}i\,\pi_i = \beta\qquad\text{and}\qquad\sum_{i=0}^{I}(I-i)\pi_i = \zeta.
\tag{4.3}
\]

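We introduce the Lagrangian

\[
L(\pi,y,z,\lambda) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)}+y\Bigl(1-\sum_{i=0}^{\infty}\pi_i\Bigr)+z\Bigl(\beta-\sum_{i=0}^{\infty}i\,\pi_i\Bigr)+\lambda\Bigl(\zeta-\sum_{i=0}^{I}(I-i)\pi_i\Bigr)
\]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[
\log\pi^*_i = \log P_i(\beta)-1+y+iz+(I-i)^+\lambda.
\]

In terms of variables $C$, $\rho$ and $\nu$, we may write

\[
\pi^*_i = CP_i(\rho\beta)\qquad\text{for }i\ge I
\]

and

\[
\pi^*_i = CP_i(\rho\beta)\,\nu^{I-i}\qquad\text{for }i\le I.
\]

The distribution is conditionally Poisson $\rho\beta/\nu$ for $i\le I$ and conditionally Poisson $\rho\beta$ for $i\ge I$. For convenience, we introduce the notation

\[
Q_I(\rho)\doteq\sum_{i=I}^{\infty}P_i(\rho\beta) = \frac{1}{\rho\beta}\sum_{i=I+1}^{\infty}iP_i(\rho\beta),
\]

\[
R_I(\rho,\nu)\doteq\nu^{I}e^{-\rho\beta(1-1/\nu)}\sum_{i=0}^{I-1}P_i\Bigl(\frac{\rho\beta}{\nu}\Bigr) = \frac{\nu^{I+1}}{\rho\beta}e^{-\rho\beta(1-1/\nu)}\sum_{i=1}^{I}iP_i\Bigl(\frac{\rho\beta}{\nu}\Bigr).
\]

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[
C\bigl(R_I(\rho,\nu)+Q_I(\rho)\bigr) = 1,
\]

\[
C\Bigl(\frac{\rho\beta}{\nu}R_I(\rho,\nu)+\rho\beta\,Q_I(\rho)\Bigr) = \beta,
\]

\[
C\Bigl(IP_I(\rho\beta)+\Bigl(I-\frac{\rho\beta}{\nu}\Bigr)R_I(\rho,\nu)\Bigr) = \zeta.
\]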

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\|P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I+Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[
(\rho/\nu-1)R_I(\rho,\nu)+(\rho-1)Q_I(\rho) = 0,
\]

\[
(I-\rho\beta/\nu-\zeta)R_I(\rho,\nu)-\zeta\,Q_I(\rho)+IP_I(\rho\beta) = 0.
\]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu>\rho>1$, while smaller than expected values of $\zeta$ give $\nu<\rho<1$.

The large deviations exponent for the overflow problem may then be expressed as

\[
\begin{aligned}
J_O(\eta,\beta) &= D\bigl(\pi^*\,\|\,P(\beta)\bigr)\\
&= \sum_{i=0}^{\infty}\pi^*_i\log\bigl(Ce^{\beta-\rho\beta}\rho^{i}\nu^{(I-i)^+}\bigr)\\
&= \log C+\beta(1-\rho)+\beta\log\rho+\zeta\log\nu.
\end{aligned}
\]
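The closed form for the exponent can be sanity-checked without solving the nonlinear system, by working in the reverse direction. The sketch below picks illustrative values of $\rho\beta$ and $\nu$, builds the tilted distribution, reads off the implied $\beta$ and $\zeta$ from the constraints (4.3), and compares the directly summed relative entropy with the closed form. All function names, the truncation of the infinite sums at 150 terms, and the parameter values are assumptions made for this illustration.

```python
import math


def poisson_pmf(i, lam):
    """Poisson probability P_i(lam), evaluated through logarithms."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def tilted_overflow_distribution(lam, nu, capacity, n_terms=150):
    """pi_i = C * P_i(lam) * nu**max(capacity - i, 0) on a truncated support."""
    weights = [poisson_pmf(i, lam) * nu ** max(capacity - i, 0)
               for i in range(n_terms)]
    C = 1.0 / sum(weights)
    return C, [C * w for w in weights]


def normalisation_from_R_and_Q(lam, nu, capacity, n_terms=150):
    """C = 1 / (R_I + Q_I), using the definitions of R_I and Q_I with lam = rho*beta."""
    Q = sum(poisson_pmf(i, lam) for i in range(capacity, n_terms))
    R = (nu ** capacity * math.exp(-lam * (1.0 - 1.0 / nu))
         * sum(poisson_pmf(i, lam / nu) for i in range(capacity)))
    return 1.0 / (R + Q)


def exponent_two_ways(lam=3.6, nu=1.5, capacity=2):
    """Relative entropy summed term by term, and via the closed form."""
    C, pi = tilted_overflow_distribution(lam, nu, capacity)
    beta = sum(i * p for i, p in enumerate(pi))      # mean constraint in (4.3)
    rho = lam / beta                                 # so that lam = rho * beta
    zeta = sum((capacity - i) * pi[i] for i in range(capacity + 1))
    direct = sum(p * math.log(p / poisson_pmf(i, beta))
                 for i, p in enumerate(pi))
    closed = (math.log(C) + beta * (1.0 - rho) + beta * math.log(rho)
              + zeta * math.log(nu))
    return direct, closed
```

The two values returned by `exponent_two_ways()` should agree to rounding error, and `normalisation_from_R_and_Q` should reproduce the constant `C` of the tilted distribution. A full solver would instead fix $\beta$ and $\zeta$ and search for $(\rho,\nu)$; that step is not attempted here.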

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I+1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i\le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[
J_C(\alpha,\beta,\xi)\doteq-\lim_{n}\frac{1}{n}\log P\Bigl(\sum_{i=0}^{I}\Gamma^n_i(\beta) < \xi\Bigr),
\]

where

\[
\xi < \sum_{k=0}^{K}\alpha_k\sum_{i=0}^{I-k}P_i(\beta)
\]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{I-k}\pi_{kj} = \xi,
\]

where $K\le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega_{kj} = \pi^*_{kj} =
\begin{cases}
C_kP_j(\rho\beta)W, & j+k\le I,\\
C_kP_j(\rho\beta), & j+k> I.
\end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[ J_C(\alpha,\beta,\xi) = \beta(1-\rho+\log\rho) + \xi\log W + \sum_{k=0}^{K}\alpha_k\log C_k. \]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types, and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
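The numbers in this example can be reproduced approximately with a short numerical sketch. The code below is not the authors' procedure: it assumes the product form $\pi^*_{kj} = C_k\mathcal{P}_j(\rho\beta)W^{1\{j+k\le I\}}$ given above, truncates all Poisson sums at a hypothetical cutoff `JMAX`, and solves for $(\rho, W)$ by nested bisection (an inner solve for $W$ enforcing the $\xi$ constraint, an outer solve for $\rho$ enforcing conservation). All function names are illustrative.

```python
import math

# Parameters taken from the example in the text.
ALPHA, BETA, I, XI = [0.5, 0.3, 0.2], 2.0, 3, 0.55
JMAX = 80  # truncation point for Poisson sums (an assumption of this sketch)

def poisson_pmf(lam, jmax=JMAX):
    """Return [P_0(lam), ..., P_jmax(lam)]."""
    p = [math.exp(-lam)]
    for j in range(1, jmax + 1):
        p.append(p[-1] * lam / j)
    return p

def bisect(f, lo, hi, iters=100):
    """Root of an increasing function f on [lo, hi] by plain bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tilted(rho, log_w):
    """Rows pi_k with pi_kj proportional to P_j(rho*beta) * W^[j + k <= I]."""
    w, p = math.exp(log_w), poisson_pmf(rho * BETA)
    rows = []
    for k in range(len(ALPHA)):
        row = [p[j] * (w if j + k <= I else 1.0) for j in range(JMAX + 1)]
        z = sum(row)  # normalizing constant, equal to 1 / C_k
        rows.append([r / z for r in row])
    return rows

def low_mass(pi):
    """Fraction of urns holding I or fewer balls in total."""
    return sum(a * sum(row[: I - k + 1]) for k, (a, row) in enumerate(zip(ALPHA, pi)))

def mean(pi):
    """Average number of additional balls per urn."""
    return sum(a * sum(j * q for j, q in enumerate(row)) for a, row in zip(ALPHA, pi))

def solve_log_w(rho):
    """Inner solve: log W such that the low-occupancy fraction equals XI."""
    return bisect(lambda lw: low_mass(tilted(rho, lw)) - XI, -50.0, 50.0)

# Zero-cost value psi_3(2.0): no tilting, rho = W = 1.
zero_cost_low = low_mass(tilted(1.0, 0.0))

# Constrained solution: outer bisection on rho, inner bisection on log W.
rho = bisect(lambda r: mean(tilted(r, solve_log_w(r))) - BETA, 0.05, 5.0, iters=60)
log_w = solve_log_w(rho)
pi_star = tilted(rho, log_w)

# Exponent as the alpha-weighted relative entropy with respect to Poisson(beta).
p_beta = poisson_pmf(BETA)
exponent = sum(
    a * sum(q * math.log(q / p_beta[j]) for j, q in enumerate(row) if q > 0.0)
    for a, row in zip(ALPHA, pi_star)
)
print(f"zero-cost low-occupancy fraction: {zero_cost_low:.3f}")
print(f"rho = {rho:.3f}, W = {math.exp(log_w):.3f}, J_C = {exponent:.3f}")
print(f"exp(-n J_C) with n = 100: {math.exp(-100 * exponent):.1e}")
```

The bracketing intervals for $\rho$ and $\log W$ are chosen by hand for this example and would need adjusting for other parameter values.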

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

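A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[ \sum_{j=0}^{i}\alpha_j \ge \sum_{j=0}^{i}\omega_j, \qquad i = 0,\ldots,I \quad \text{(monotonicity)} \tag{A.4} \]

and

\[ \sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad \text{(conservation)}. \tag{A.5} \]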

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[ \sum_{i=0}^{I}\Biggl(\sum_{j=0}^{i}\alpha_j - \sum_{j=0}^{i}\omega_j\Biggr) = \sum_{j=0}^{I}(I+1-j)(\alpha_j - \omega_j) = (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j). \tag{A.6} \]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[ \gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0,\ldots,I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta} \]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i \le 1$ follows from (A.5).
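The two feasibility conditions are easy to test mechanically. The following is a minimal sketch, assuming $\alpha$ and $\omega$ are passed as lists of length $I + 1$ (occupancies $0, \ldots, I$) and that the lumped masses $\alpha_{I+1}$ and $\omega_{I+}$ are the remaining probability; the function name and the tolerance are illustrative choices.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (A.4) and conservation (A.5) for (alpha, omega, beta).

    alpha, omega: lists [x_0, ..., x_I]; the mass above occupancy I is
    taken to be 1 - sum(list) in each case (an assumption of this sketch).
    """
    I = len(omega) - 1
    alpha_rest = 1.0 - sum(alpha)   # alpha_{I+1}
    omega_rest = 1.0 - sum(omega)   # omega_{I+}

    # (A.4): partial sums of alpha dominate partial sums of omega.
    cum_alpha = cum_omega = 0.0
    for a, w in zip(alpha, omega):
        cum_alpha += a
        cum_omega += w
        if cum_alpha < cum_omega - tol:
            return False

    # (A.5): the minimum number of balls implied by omega cannot exceed
    # the initial balls plus the beta balls per urn that are thrown.
    lhs = sum(i * w for i, w in enumerate(omega)) + (I + 1) * omega_rest
    rhs = sum(i * a for i, a in enumerate(alpha)) + (I + 1) * alpha_rest + beta
    return lhs <= rhs + tol


# Empty initial conditions, I = 1.
print(is_feasible([1.0, 0.0], [0.3, 0.4], 1.2))  # enough balls are available
print(is_feasible([1.0, 0.0], [0.3, 0.4], 0.5))  # too few balls for omega
print(is_feasible([0.5, 0.5], [0.6, 0.4], 1.0))  # empty urns cannot increase
```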

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0,\ldots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[ J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx. \]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[ L(\psi, \theta) = D(\theta\|\gamma) = \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} = \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \Biggl(1 - \sum_{i=0}^{I}\theta_i\Biggr)\log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}. \]


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
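As a concrete illustration of the local rate function, the sketch below evaluates $L(\psi, \theta) = D(\theta\|\gamma)$ from a cumulative occupancy vector $(\psi_0, \ldots, \psi_I)$ and a rate vector $(\theta_0, \ldots, \theta_I)$. The function name and the handling of boundary cases (returning infinity when a rate is positive on an empty occupancy level, using $0\log 0 = 0$) are choices made for this sketch.

```python
import math

def local_rate(psi, theta):
    """Relative entropy D(theta || gamma) with gamma_i = psi_i - psi_{i-1}.

    psi:   cumulative occupancies [psi_0, ..., psi_I]
    theta: rates [theta_0, ..., theta_I]; theta_{I+} is 1 - sum(theta)
    """
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, len(psi))]
    gamma.append(1.0 - psi[-1])                  # gamma_{I+}
    theta_full = list(theta) + [1.0 - sum(theta)]  # append theta_{I+}

    total = 0.0
    for t, g in zip(theta_full, gamma):
        if t < -1e-15:
            return math.inf      # not a valid rate vector
        if t <= 0.0:
            continue             # convention 0 log 0 = 0
        if g <= 0.0:
            return math.inf      # balls cannot enter urns that do not exist
        total += t * math.log(t / g)
    return total


psi = [0.5, 0.8]                      # gamma = (0.5, 0.3, 0.2)
print(local_rate(psi, [0.5, 0.3]))    # expected behavior: cost is (numerically) zero
print(local_rate(psi, [0.2, 0.3]))    # balls avoid empty urns: positive cost
```

The cost vanishes exactly when the rates match the current occupancy fractions, which is the zero-cost (law of large numbers) behavior.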

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[ \frac{\partial L}{\partial\psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x), \theta(x)) \]

for all $i \in \{0,\ldots,I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections, we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

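In the case at hand, the Euler–Lagrange equations are given by

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Biggl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Biggr] \tag{A.7} \]

for $i = 0,\ldots,I-1$, and by

\[ -\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\Biggl[-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\Biggr]. \tag{A.8} \]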

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[ L(\psi, \theta) = \sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9} \]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\Biggl[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\Biggr] \tag{A.10} \]

for $i = 0,\ldots,I-1$.


A.4. Characterization of the extremals under empty initial conditions.

In this section we consider the simplest and most important case, in which the urns are all initially empty: $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[ \sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} C\mathcal{P}_i(\rho\beta) = 1, \]

\[ \sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} iC\mathcal{P}_i(\rho\beta) = \beta. \]

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[ \psi_0(x) = Ce^{-\rho x} + \sum_{k=0}^{I}(\omega_k - C\mathcal{P}_k(\rho\beta))\Bigl(1 - \frac{x}{\beta}\Bigr)^{k}, \tag{A.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) \]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[ \psi_0(x) = \sum_{k=0}^{I}\omega_k\Bigl(1 - \frac{x}{\beta}\Bigr)^{k}. \tag{A.10} \]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.
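The construction in Theorem A.1 can be checked numerically in a small case. The sketch below is an illustration rather than the authors' method: it finds the twist parameters for an exponential extremal by bisection on $\lambda = \rho\beta$ (using the fact that the two constraint equations fix the conditional mean of a Poisson($\lambda$) variable given that it exceeds $I$), and then evaluates $\gamma_i(x)$ from the closed-form derivatives of $\psi_0$. The terminal data $\omega = (0.3, 0.4)$, $\beta = 1.2$, the truncation `nmax`, and the bisection bracket are all assumptions chosen for the example.

```python
import math

def poisson_tail(lam, I, nmax=400):
    """Return (sum_{i>I} P_i(lam), sum_{i>I} i*P_i(lam)), truncated at nmax."""
    p = math.exp(-lam)
    mass = mean = 0.0
    for i in range(nmax + 1):
        if i > I:
            mass += p
            mean += i * p
        p *= lam / (i + 1)
    return mass, mean

def twist_parameters(omega, beta):
    """Solve the two constraint equations for (C, rho); exponential case only."""
    I = len(omega) - 1
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    target = tail_mean / tail_mass        # E[X | X > I] under Poisson(rho*beta)
    lo, hi = 1e-3, 50.0                   # bracket for lam = rho*beta (assumption)
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        mass, mean = poisson_tail(lam, I)
        if mean / mass < target:          # conditional mean increases with lam
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    mass, _ = poisson_tail(lam, I)
    return tail_mass / mass, lam / beta

def gamma(i, x, omega, beta, C, rho):
    """gamma_i(x) = x^i / i! * (-1)^i psi_0^{(i)}(x) for 0 <= i <= I."""
    I, lam = len(omega) - 1, rho * beta
    deriv = rho**i * C * math.exp(-rho * x)          # exponential part
    for k in range(i, I + 1):                        # polynomial part
        p_k = math.exp(-lam) * lam**k / math.factorial(k)
        coeff = math.factorial(k) / math.factorial(k - i)
        deriv += (omega[k] - C * p_k) * beta**(-i) * coeff * (1 - x / beta) ** (k - i)
    return x**i / math.factorial(i) * deriv

omega, beta = [0.3, 0.4], 1.2
C, rho = twist_parameters(omega, beta)
print("C, rho:", C, rho)
print("psi_0(0):", gamma(0, 0.0, omega, beta, C, rho))
print("terminal:", [gamma(i, beta, omega, beta, C, rho) for i in range(2)])
```

The printed terminal values should reproduce $\omega$, and $\psi_0(0)$ should equal 1, up to rounding.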

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[ (-1)^i\varphi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11} \]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0,\ldots,I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[ (-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} + \sum_{k=i}^{I}(\omega_k - C\mathcal{P}_k(\rho\beta))\beta^{-i}\frac{k!}{(k-i)!}\Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i} \]

for $i = 0,\ldots,I$, and

\[ (-1)^i\psi_0^{(i)}(x) = \rho^i Ce^{-\rho x} \tag{A.12} \]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) \doteq (-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[ \varphi(\beta) = (-1)^I\psi_0^{(I)}(\beta) = \rho^I Ce^{-\rho\beta} + \Bigl(\omega_I - Ce^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I}I! = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I \ge 0, \]

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$, and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[ (-1)^I\psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I > 0 \quad \text{for all } x \in [0, \beta]. \]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[ \psi_0(0) = C\Biggl(1 - \sum_{i=0}^{I}\mathcal{P}_i(\rho\beta)\Biggr) + \sum_{i=0}^{I}\omega_i = 1. \]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[ \gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots. \tag{A.13} \]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[ \sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1. \]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[ \begin{aligned} \theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i}\dot\gamma_k(x) \\ &= \sum_{k=1}^{i}-\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\ &= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0 \end{aligned} \tag{A.14} \]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[ \sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty}-\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1. \]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[ \sum_{i=0}^{I}\theta_i(x) \le 1 \]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[ \theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty}-\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty}\mathcal{P}_i(\rho x), \]

\[ 1 - \psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty}\mathcal{P}_i(\rho x), \]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15} \]

Then

\[ \frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16} \]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\ldots,I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

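Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

\[ (-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I, \]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[ \sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0, \]

\[ \sum_{i=0}^{\infty}-\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0,\ldots,\gamma_I,\gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{\infty}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.17} \]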

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[ \theta_{I+} = 1 - \sum_{i=0}^{I}-\dot\psi_i = \sum_{i=I+1}^{\infty}-\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1 - \psi_I). \]

Then for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,

\[ \begin{aligned} L(\psi, \dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{(1 + \dot\psi_0 + \cdots + \dot\psi_I)}{1 - \psi_I} \\ &= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi). \end{aligned} \]


Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that given $\varepsilon > 0$, $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[ J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx. \]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[ \begin{aligned} &\int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\ &\qquad = \sum_{i=0}^{\infty}\int_\varepsilon^{\beta-\varepsilon}(\dot\psi_i(x) - \dot\psi_{i-1}(x))\log|\psi_0^{(i)}(x)|\,dx. \end{aligned} \tag{A.18} \]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[ -\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x], \]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[ \sum_{i=0}^{\infty}\bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_\varepsilon^{\beta-\varepsilon} + \int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i\,dx. \]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0,\ldots,\gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[ \sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1}-\dot\psi_i(x) = 1, \]

\[ -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\ldots,I-1, \]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{I}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19} \]


Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\|\mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[ J(\gamma) = \min_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\|\mathcal{P}(\beta)). \]

Proof. We first solve the minimization problem, and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^\infty$ satisfying $\pi_i = \omega_i$ for $i = 0,\ldots,I$, and define the Lagrangian $\mathcal{L}$ on this set to be

\[ \mathcal{L}(\pi, y, z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Biggl(1 - \sum_{i=0}^{\infty}\pi_i\Biggr) + z\Biggl(\beta - \sum_{i=0}^{\infty} i\pi_i\Biggr), \]

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi_i^* \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi_i^*$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi_i^*$ is the unique global minimizer of $x \mapsto x\log x - x[\log\mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[ \pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - zi\pi_i \ge \pi_i^*\log\frac{\pi_i^*}{\mathcal{P}_i(\beta)} - y\pi_i^* - zi\pi_i^*. \]


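Following standard Lagrangian arguments, we thus have

\[ \inf_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\|\mathcal{P}(\beta)) = \inf_{\pi\in\mathcal{F}(1,\omega,\beta)}\mathcal{L}(\pi, y, z) \ge \inf_{\pi\in\mathbb{R}_+^\infty}\mathcal{L}(\pi, y, z) = D(\pi^*\|\mathcal{P}(\beta)). \]

Since $\pi^* \in \mathcal{F}(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi_i^*$ into (A.17), we have

\[ J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\|\mathcal{P}(\beta)), \]

where

\[ \gamma_i(\beta) = \begin{cases} \omega_i, & 0 \le i \le I, \\ C\mathcal{P}_i(\rho\beta), & i > I. \end{cases} \]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[ J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \Biggl(1 - \sum_{i=0}^{I}\omega_i\Biggr)(\log C + (1 - \rho)\beta) + \Biggl(\beta - \sum_{i=0}^{I} i\omega_i\Biggr)\log\rho, \]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.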
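The explicit cost formula can be compared against a direct evaluation of the relative entropy. The sketch below works backwards for convenience: it picks illustrative twist parameters ($\rho = 1.3$, $C = 0.64$, with $I = 1$ and $\beta = 1.2$, all assumptions made for this example), derives the terminal values $\omega_0, \omega_1$ that the two constraint equations then force, and checks that the closed-form cost agrees with $D(\pi^*\|\mathcal{P}(\beta))$ summed term by term. The truncation point `N` is also an assumption.

```python
import math

def pois(lam, i):
    """Poisson probability P_i(lam)."""
    return math.exp(-lam) * lam**i / math.factorial(i)

# Illustrative twist parameters and problem size (assumptions).
beta, rho, C, I = 1.2, 1.3, 0.64, 1
N = 100                      # truncation point for the infinite sums
lam = rho * beta

# Terminal values omega_0, omega_1 implied by the two constraint equations.
tail_mass = sum(pois(lam, i) for i in range(I + 1, N))
tail_mean = sum(i * pois(lam, i) for i in range(I + 1, N))
omega1 = beta - C * tail_mean
omega0 = 1.0 - C * tail_mass - omega1
omega = [omega0, omega1]

# Terminal distribution pi*: omega on {0..I}, twisted Poisson tail above I.
pi_star = omega + [C * pois(lam, i) for i in range(I + 1, N)]

# Direct relative entropy D(pi* || Poisson(beta)).
direct = sum(p * math.log(p / pois(beta, i)) for i, p in enumerate(pi_star) if p > 0.0)

# Closed-form cost for an exponential extremal.
closed = (
    sum(w * math.log(w / pois(beta, i)) for i, w in enumerate(omega))
    + (1.0 - sum(omega)) * (math.log(C) + (1.0 - rho) * beta)
    + (beta - sum(i * w for i, w in enumerate(omega))) * math.log(rho)
)
print("omega:", omega)
print("direct:", direct, "closed form:", closed)
```

The agreement reflects the identity $\mathcal{P}_i(\rho\beta)/\mathcal{P}_i(\beta) = e^{(1-\rho)\beta}\rho^i$ applied to the tail terms.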

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{ki}\}_i$, where the function $\gamma_{ki} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[ \gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta). \]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[ \min_{\pi\in S_\infty^{|\mathcal{K}|}}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta)) \tag{A.20} \]

subject to the conservation constraint

\[ \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21} \]

and the terminal conditions

\[ \omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22} \]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[ \pi_{kj} = 0, \qquad k + j > I, \tag{A.23} \]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$ as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[ \sum_{i:\,P_i>Q_i}(P_i - Q_i) = \sum_{i:\,Q_i>P_i}(Q_i - P_i), \]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[ H_A \doteq \Biggl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_k\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta)) \le A\Biggr\}. \]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[ Q_A(\alpha, \omega, \beta) \doteq \mathcal{F}(\alpha, \omega, \beta) \cap H_A \]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[ \sum_{j\ge m} j\pi^{(n)}_{kj} \le \varepsilon \tag{A.24} \]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

\[ xy \le (x\log x - x + 1) + (e^y - 1), \]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta}\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty. \]
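Both ingredients of the uniform integrability estimate are simple to check numerically. The sketch below sums the series $\sum_j e^{j/\delta}\mathcal{P}_j(\beta)$ term by term (building each term recursively to avoid overflow), compares it with the closed form $e^{\beta e^{1/\delta}-\beta}$, and evaluates the tail quantity $\delta\sum_{j\ge m}(e^{j/\delta}-1)\mathcal{P}_j(\beta)$ for a few values of $m$. The values $\beta = 2$, $\delta = 0.5$ and the truncation `nmax` are illustrative assumptions.

```python
import math

def weighted_terms(beta, t, nmax=400):
    """Terms e^{t j} P_j(beta) = e^{-beta} (beta e^t)^j / j!, for j = 0..nmax."""
    x = beta * math.exp(t)
    terms = [math.exp(-beta)]
    for j in range(1, nmax + 1):
        terms.append(terms[-1] * x / j)   # recursive update avoids overflow
    return terms

def tail_bound(beta, delta, m, nmax=400):
    """delta * sum_{j >= m} (e^{j/delta} - 1) P_j(beta), truncated at nmax."""
    tilted = weighted_terms(beta, 1.0 / delta, nmax)
    plain = weighted_terms(beta, 0.0, nmax)   # ordinary Poisson probabilities
    return delta * sum(a - b for a, b in zip(tilted[m:], plain[m:]))

beta, delta = 2.0, 0.5
series = sum(weighted_terms(beta, 1.0 / delta))
closed = math.exp(beta * math.exp(1.0 / delta) - beta)
print("series:", series, "closed form:", closed)
for m in (0, 20, 40, 60):
    print("m =", m, "tail term:", tail_bound(beta, delta, m))
```

For fixed $\delta$ the tail term decreases to zero as $m$ grows, which is what allows $m$ to be chosen after $\delta$ in the argument.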


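Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/\mathcal{P}_j(\beta)$, we have the estimate

\[ \begin{aligned} \sum_{j\ge m} j\pi^{(n)}_{kj} &= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\ &\le \sum_{j\ge 0}\delta\Biggl[\pi^{(n)}_{kj}\log\Biggl(\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\Biggr) - \pi^{(n)}_{kj} + \mathcal{P}_j(\beta)\Biggr] + \delta\sum_{j\ge m}(e^{j/\delta} - 1)\mathcal{P}_j(\beta) \\ &= \delta D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta)) + \delta\sum_{j\ge m}(e^{j/\delta} - 1)\mathcal{P}_j(\beta) \\ &\le \delta A + \delta\sum_{j\ge m}(e^{j/\delta} - 1)\mathcal{P}_j(\beta), \end{aligned} \]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]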

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[ \beta_k^{(n)} \to \beta_k. \]

Finally,

\[ \sum_k\alpha_k\beta_k = \lim_n\sum_k\alpha_k\beta_k^{(n)} = \lim_n\beta = \beta, \]

so that $\pi \in \mathcal{F}(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[ G \doteq \inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|\mathcal{P}(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\|\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{*(k)}\|\mathcal{P}(\beta)) \le \liminf_n\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\|\mathcal{P}(\beta)) = G, \]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

$$(\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta)$$

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

$$\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).$$

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} = G$, we may choose a subsequence of problems such that

$$\mathcal{J}^{(n)} \le G + 1/n < A,$$

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\beta^{(n)},\omega^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

$$\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{kj}, \qquad k = 0,1,\ldots,K.$$

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.
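The closed form $\beta - \lambda + \lambda\log(\lambda/\beta)$ is the relative entropy of a Poisson distribution with mean $\lambda$ with respect to $P(\beta)$. The sketch below checks this numerically and compares against one other distribution with the same mean. The helper names, the truncation level and the two-point comparison distribution are illustrative assumptions.

```python
import math

def log_poisson(j, beta):
    """log P_j(beta), with P_j(beta) = e^{-beta} beta^j / j!."""
    return -beta + j * math.log(beta) - math.lgamma(j + 1)

def relative_entropy(pi, beta):
    """D(pi || P(beta)) for pi given as {j: mass}; zero masses contribute 0."""
    return sum(p * (math.log(p) - log_poisson(j, beta))
               for j, p in pi.items() if p > 0)

def poisson_dict(lam, terms=200):
    """Poisson(lam) masses on {0, ..., terms - 1} (truncation is an assumption)."""
    return {j: math.exp(log_poisson(j, lam)) for j in range(terms)}

def mean_constrained_infimum(lam, beta):
    """Closed form beta - lam + lam * log(lam / beta) quoted in the proof."""
    return beta - lam + lam * math.log(lam / beta)

beta, lam = 3.0, 1.5                      # illustrative values only
d_poisson = relative_entropy(poisson_dict(lam), beta)
d_two_point = relative_entropy({1: 0.5, 2: 0.5}, beta)   # also has mean 1.5
bound = mean_constrained_infimum(lam, beta)
```

In this example `d_poisson` matches `bound` up to rounding, while the two-point distribution with the same mean has strictly larger relative entropy.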

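Thus

$$\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta,$$

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

$$\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D\big(\pi^{(n)}(k)\,\|\,P(\beta^{(n)})\big)
&\ge \liminf_n \sum_{k:\,\alpha_k > 0} \alpha^{(n)}_k D\big(\pi^{(n)}(k)\,\|\,P(\beta^{(n)})\big) \\
&\ge \sum_{k:\,\alpha_k > 0} \alpha_k D\big(\pi(k)\,\|\,P(\beta)\big) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}$$

as desired.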

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S^{|\mathcal{K}|}_{\infty}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S^{|\mathcal{K}|}_{\infty}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S^{|\mathcal{K}|}_{\infty}$, subject to the restrictions that $\tilde\pi_{mn} > 0$, that $\tilde\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

$$\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\tilde\pi_{kj},$$

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta > 0$ we may define

$$\hat\beta \doteq \frac{\beta - \eta\tilde\beta}{1 - \eta}, \qquad \hat\omega \doteq \frac{\omega - \eta\tilde\omega}{1 - \eta}.$$

By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1 - \eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

$$\begin{aligned}
f'(\varepsilon)
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left(\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} + 1\right)(\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^{\varepsilon}_{mn}}{P_n(\beta)}
 + \sum_{k \in \mathcal{K}} \alpha_k \sum_{j:\,(k,j) \ne (m,n)} \bar\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}
 - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}.
\end{aligned}$$

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

$$-\sum_k \alpha_k D\big(\pi(k)\,\|\,P(\beta)\big) \le 0.$$

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

$$\pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise.} \end{cases}$$

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

$$L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
 + \sum_{k \in \mathcal{K}} z_k\alpha_k\left(1 - \sum_{k+l \in \mathcal{I}} \pi_{kl}\right)
 + \sum_{i \in \mathcal{I}} w_i\left(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\right)$$

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

$$\pi^*_{kj} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}.$$

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

$$\pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\ 0, & \text{otherwise.} \end{cases}$$

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

$$\beta^{(M)} \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\,\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l \le M} \pi^*_{kl},$$

and consider the problem of minimizing

$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}$$

subject to the constraints

$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{kl} = \beta^{(M)},$$

$$\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} = \eta^{(M)}_k, \qquad k \in \mathcal{K},$$

$$\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.$$

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

$$\begin{aligned}
L^{(M)} ={}& \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
 + y^{(M)}\left(\beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{kl}\right) \\
&+ \sum_{k \in \mathcal{K}} z^{(M)}_k\alpha_k\left(\eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl}\right)
 + \sum_{i \in \mathcal{I}} w^{(M)}_i\left(\omega_i - \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\pi_{k,i-k}\right).
\end{aligned}$$

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

$$\pi^*_{kj} = P_j(\beta)\,e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}$$

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

$$\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},$$

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$ with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
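The final substitution can be verified in one line from the Poisson formula $P_j(\beta) = e^{-\beta}\beta^{j}/j!$. The following worked computation is a sketch added for the reader's convenience; it uses only the quantities $\rho = e^{y}$, $z_k$ and $w_i$ defined in the proof.

```latex
\begin{align*}
P_j(\beta)\,e^{jy}
  &= e^{-\beta}\frac{\beta^{j}}{j!}\,\rho^{j}
   = e^{(\rho-1)\beta}\,e^{-\rho\beta}\frac{(\rho\beta)^{j}}{j!}
   = e^{(\rho-1)\beta}\,P_j(\rho\beta), \\
\pi^*_{kj}
  &= P_j(\beta)\,e^{jy - 1 + z_k + w_{k+j}}
   = e^{(\rho-1)\beta - 1 + z_k}\,P_j(\rho\beta)\,e^{w_{k+j}}
   = C_k\,P_j(\rho\beta)\,W_{k+j}.
\end{align*}
```

For $k + j > I$ the convention $w_{k+j} = 0$ gives $W_{k+j} = 1$, which recovers the second case in the statement of the theorem.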

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I - k$ terminal constraints, has mean $\beta_k$, and has a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0},\ldots,\omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega(k),\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega(k),\beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega(k),\beta_k)$. Then the curves

$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)$$

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define the rescaled curves $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

$$\psi_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\psi_{k,i-k}(x).$$

To see that the $\gamma_i$ are valid occupancy curves, note that

$$\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.$$

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$ and

$$\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,$$

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

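In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve after rescaling is

$$\begin{aligned}
\tilde\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)
&= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \big(\omega_{ki} - C_k P_i(\rho_k\beta_k)\big)\left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \big(\omega_{ki} - C_k P_i(\rho\beta)\big)\left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)P_i(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{i}\right].
\end{aligned}$$

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{00}$ is

$$\begin{aligned}
(-1)^{k}\psi_0^{(k)}(x)
&= \alpha_0 C_0\left[\rho^{k} e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^{k}}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0 C_0\rho^{k}\left[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)P_j(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0 C_0\rho^{k}}{C_k}\,\tilde\psi_{k0}(x).
\end{aligned} \tag{A.25}$$

The second equality uses $P_{k+j}(\rho\beta)\,(k+j)!/(j!\,\beta^{k}) = \rho^{k}P_j(\rho\beta)$. Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

$$\begin{aligned}
\gamma_i(x)
&= \sum_{k=0}^{i} \alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^{k}}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{aligned}$$

Similarly, we obtain

$$\begin{aligned}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i} \alpha_k\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^{k}}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{aligned}$$

so that the extremals satisfy the simple relation

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}$$

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

$$\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}$$

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

$$(-1)^{k}\psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\,\tilde\psi_{k0}(x) \tag{A.28}$$

and that

$$\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}$$

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).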

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

$$\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}\big(\psi(x),\theta(x)\big)\,dx + \frac{\partial L}{\partial\theta_i}\big(\psi(x'),\theta(x')\big) = C_i$$

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\big(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\big)$, which is finite at both endpoints because $(-1)^{i}\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

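Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence, writing $\tilde\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ and $\tilde\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the rescaled subproblem curves, the cost is

$$\begin{aligned}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(\beta)\log\big|\psi_0^{(i)}(\beta)\big|\right] - \sum_{k=0}^{K} \alpha_k\log\big|\psi_0^{(k)}(0)\big| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}$$

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k0}(x)$, so that

$$\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j}\psi_{k0}^{(j)}\!\left(x\beta_k/\beta\right)e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^{j}e^{-\beta}/j!}.$$

Then

$$J(\gamma) = \sum_{k=0}^{K} \alpha_k D\big(\tilde\gamma_{k\cdot}(\beta)\,\|\,P(\beta)\big).$$

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).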

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\,\|\,\gamma)$, and is hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.

(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

$$G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D\big(\theta^{\varepsilon}(x)\,\|\,\gamma^{\varepsilon}(x)\big)\,dx.$$

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

$$J(\gamma) \le J(\bar\gamma).$$

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining

$$\eta(x) \doteq \bar\psi(x) - \psi(x),$$

where $\bar\psi$ is the cumulative occupancy function of the competing path $\bar\gamma$, we may write

$$G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x,\varepsilon)\,dx. \tag{A.30}$$

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta) \ge \alpha_i\gamma_{i0}(x\beta_i/\beta).$$

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1 - \varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where the derivative $\bar\theta$ of the competing path may not exist.

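The partial derivative of the integrand of (A.30) is

$$\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^{\varepsilon}(x)}\eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^{\varepsilon}(x)}\dot\eta_i(x).$$

The partial derivatives of $L$ are (in mixed notation)

$$\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},$$

$$\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.$$

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

$$\left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].$$

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.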

We are now free to differentiate under the integral sign, so that

$$G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx$$

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

$$\int_0^{x} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma}dx' = \int_0^{x} \left.\left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma}dx'$$

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 361, page 209) to the first term, and using $\eta(0) = \eta(\beta) = 0$, we obtain

$$\begin{aligned}
G'_+[0]
&= \int_0^{\beta} \left\{\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right\}dx \\
&= \int_0^{\beta} \dot\eta(x)\cdot\left(-\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right)dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\big(\eta_i(0) - \eta_i(\beta)\big) = 0.
\end{aligned}$$

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1 - \lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x_1^{(n)})$ and terminal point $\bar\gamma(x_2^{(n)})$. By Lemma A.16,

$$J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\big(\bar\theta(x)\,\|\,\bar\gamma(x)\big)\,dx.$$

It then follows that

$$\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha,\omega,\beta)
&\le \liminf_n \mathcal{J}\big(\bar\gamma(x_1^{(n)}),\ \bar\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\big) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D\big(\bar\theta(x)\,\|\,\bar\gamma(x)\big)\,dx \\
&= J(\bar\gamma),
\end{aligned}$$

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis

Lefschetz Center for Dynamical Studies

Brown University

Providence, Rhode Island 02912

USA

e-mail: dupuis@dam.brown.edu

C. Nuzman

P. Whiting

Bell Labs

600 Mountain Avenue

Murray Hill, New Jersey 07974

USA

e-mail: cnuzman@lucent.com

e-mail: pawhiting@lucent.com

url: cm.bell-labs.com/who/nuzman

url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
          • Proof of Theorem
          • Examples and extensions
            • The classical occupancy problem
            • The overflow problem
            • Partial coupon collection with initial conditions
            • Extensions
              • Appendix
                • Preliminaries
                • Proof of Lemma
                • Euler--Lagrange equations
                • Characterization of the extremals under empty initial conditions
                • Interpretation of the twist parameter
                • Characterization of the extremal cost
                • Characterization of the extremals under general initial conditions
                • Extremal curves have globally minimal cost
                  • References

In the calculus of variations problems solved in Section 2, precise initial and terminal points were always given. In typical applications, one is interested in the minimum value of the rate function over a constraint set. When the constraint set is sufficiently simple, such problems may still be solved easily using the tools provided in Section 2.2. The three problems of this section are of this type.

Suppose that the event of interest is that the random endpoint $\Gamma^{n}(\beta)$ should lie in a terminal constraint set $\Omega$. To apply the LDP for $\Gamma^{n}(\beta)$ established by Corollary 2.3, one must compute exponents of the form

$$\mathcal{J}(\Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega).$$

Using Theorem 2.7, we can write

$$\begin{aligned}
\mathcal{J}(\Omega)
&= \inf_{\omega \in \Omega}\ \inf_{\pi \in F(\alpha,\omega,\beta)} \sum_{k=0}^{K} \alpha_k D\big(\pi(k)\,\|\,P(\beta)\big) \\
&= \inf_{\pi \in F(\alpha,\Omega,\beta)} \sum_{k=0}^{K} \alpha_k D\big(\pi(k)\,\|\,P(\beta)\big),
\end{aligned} \tag{4.1}$$

where we have abused notation to define $F(\alpha,\Omega,\beta) = \bigcup_{\omega \in \Omega} F(\alpha,\omega,\beta)$.

In many cases, for example when the terminal set $\Omega$ is convex and defined by linear constraints, the exponent $\mathcal{J}$ can be computed directly from (4.1) using Lagrange multipliers. That is, one solves minimization problems of the type given in Theorem 2.7, but with the endpoint constraints (2.10) replaced by constraints defining $\Omega$. This is the approach used in the second and third examples below. We do not prove that appropriate Lagrange multipliers always exist; if needed, existence may be established using methods similar to those used in the Appendix for the case $\Omega = \{\omega\}$. Because of the convexity of relative entropy in (4.1), a local minimum is always a global minimum over convex sets. Hence, in any particular scenario with convex $\Omega$, it is sufficient to establish a local minimum by numerically computing a set of Lagrange multipliers.

An alternative approach for computing $\mathcal{J}(\Omega)$ could be to return to the sample path level and use natural boundary conditions on the extremal curves (see [25]).

A set $\Omega$ with interior $\Omega^{\circ}$ and closure $\bar\Omega$ is a $\mathcal{J}$-continuity set if and only if $\inf_{\omega \in \Omega^{\circ}} \mathcal{J}(\omega) = \inf_{\omega \in \bar\Omega} \mathcal{J}(\omega)$. For such sets the large deviations lower and upper bounds coincide, so that $-\lim \frac{1}{n}\log P(\Gamma^{n}(\beta) \in \Omega) = \inf_{\omega \in \Omega} \mathcal{J}(\omega)$. It may readily be verified in each of the following examples that the event of interest is a $\mathcal{J}$-continuity set.

22 P DUPUIS C NUZMAN AND P WHITING

41 The classical occupancy problem In the classical occupancy prob-lem the urns are initially empty and one only distinguishes between emptyand occupied urns or in other words I = 0 The associated large deviationsproblem was solved using a sample path approach in [28] we show herehow this case may be obtained with our results We might be interested inthe probability of having an unusually large number of empty urns Γn

0 (β)gtω0 gt eminusβ or an unusually small number of empty urns Γn

0 (β) lt ω0 lt eminusβ In either case the calculus of variation problem is that in which γ0(β) = ω0From (26) it is immediate that C = 1ρ and

γ0(x) =1

ρeminusρx + 1minus

1

ρ

Using (25) we find that ρ is determined by the unique nonnegative solutionto

ρ(1minus ω0) = 1minus eminusβρ

Finally (27) provides a simple expression for J (ω) in terms of ω0 β Cand ρ

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies, conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.
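The fixed point equation for $\rho$ is easy to solve numerically. The following is a minimal sketch, not taken from the paper, that uses bisection with the values quoted for Figure 1 ($\omega_0 = 0.15$, $\beta = 3$). The function names `classical_twist` and `gamma0` are illustrative assumptions. Note that $\rho = 0$ satisfies the equation trivially, so the sketch searches for the positive root and assumes $\beta > 1 - \omega_0$.

```python
import math

def classical_twist(omega0, beta, iters=200):
    """Positive root rho of rho*(1 - omega0) = 1 - exp(-beta*rho) (hypothetical helper)."""
    if not (0.0 < omega0 < 1.0 and beta > 1.0 - omega0):
        raise ValueError("need 0 < omega0 < 1 and beta > 1 - omega0")
    # f is concave with f(0) = 0 and f'(0) = beta - (1 - omega0) > 0,
    # so it has exactly one positive root.
    f = lambda rho: -math.expm1(-beta * rho) - rho * (1.0 - omega0)
    hi = 1.0
    while f(hi) > 0.0:       # grow the bracket until f is nonpositive
        hi *= 2.0
    lo = hi
    while f(lo) <= 0.0:      # shrink until f is positive again
        lo *= 0.5
    for _ in range(iters):   # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Fraction of empty urns along the extremal, with C = 1/rho."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

rho = classical_twist(0.15, 3.0)
print("rho =", rho)
print("gamma0(0) =", gamma0(0.0, rho), " gamma0(beta) =", gamma0(3.0, rho))
```

In this example the root lies above 1, which matches the interpretation of $\rho$ in Section A.5: with more empty urns than expected, balls pile into the already occupied urns.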

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

$$w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),$$

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

$$w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\,\Gamma_i(r/n).$$
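The identity for $w(r)$ can be checked on a simulated configuration. The sketch below is an illustration only: the helper names and the use of a seeded `random.Random` are assumptions. It throws $r$ balls into $n$ infinite-capacity urns and compares the direct overflow count with the occupancy formula, working with the integer counts $n\Gamma_i$ so that the comparison is exact.

```python
import random

def throw_balls(n, r, seed=0):
    """Counts per urn after r balls are thrown uniformly into n urns."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(r):
        counts[rng.randrange(n)] += 1
    return counts

def overflow_direct(counts, capacity):
    """Balls that would have fallen to the floor with finite capacity."""
    return sum(max(c - capacity, 0) for c in counts)

def overflow_from_occupancy(counts, capacity):
    """w(r) = r - n*I + sum_{i<=I} (I - i) * (n * Gamma_i)."""
    n, r = len(counts), sum(counts)
    urns_with = [sum(1 for c in counts if c == i) for i in range(capacity + 1)]
    return r - n * capacity + sum((capacity - i) * urns_with[i]
                                  for i in range(capacity + 1))

counts = throw_balls(n=50, r=120)
print(overflow_direct(counts, 2), overflow_from_occupancy(counts, 2))
```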

In order to compute

$$J_O(\eta, \beta) = -\lim_{n} \frac{1}{n} \log P\left(\frac{w(\beta n)}{n} > \eta\right),$$

we therefore consider sample paths which satisfy the end constraint

$$\sum_{i=0}^{I} (I - i)\,\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2}$$

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case and the large deviations exponent $J_O(\eta, \beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

$$\sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad \text{and} \qquad \sum_{i=0}^{I} (I - i)\pi_i = \zeta. \tag{4.3}$$


We introduce the Lagrangian

$$L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty} \pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right) + \lambda\left(\zeta - \sum_{i=0}^{I} (I - i)\pi_i\right),$$

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

$$\log \pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+\lambda.$$

In terms of variables $C$, $\rho$ and $\nu$, we may write

$$\pi^*_i = C P_i(\rho\beta) \quad \text{for } i \ge I$$

and

$$\pi^*_i = C P_i(\rho\beta)\,\nu^{I-i} \quad \text{for } i \le I.$$

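The distribution is conditionally Poisson $\rho\beta/\nu$ for $i \le I$ and conditionally Poisson $\rho\beta$ for $i \ge I$. For convenience, we introduce the notation

$$Q_I(\rho) \doteq \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} i P_i(\rho\beta),$$

$$R_I(\rho, \nu) \doteq \nu^I e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} i P_i\left(\frac{\rho\beta}{\nu}\right).$$

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

$$C\,(R_I(\rho, \nu) + Q_I(\rho)) = 1,$$

$$C\left(\frac{\rho\beta}{\nu} R_I(\rho, \nu) + \rho\beta\, Q_I(\rho)\right) = \beta,$$

$$C\left(I P_I(\rho\beta) + \left(I - \frac{\rho\beta}{\nu}\right) R_I(\rho, \nu)\right) = \zeta.$$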

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \| P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

$$(\rho/\nu - 1) R_I(\rho, \nu) + (\rho - 1) Q_I(\rho) = 0,$$

$$(I - \rho\beta/\nu - \zeta) R_I(\rho, \nu) - \zeta Q_I(\rho) + I P_I(\rho\beta) = 0.$$

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed as

$$J_O(\eta, \beta) = D(\pi^* \| P(\beta)) = \sum_{i=0}^{\infty} \pi^*_i \log\left(C e^{\beta - \rho\beta} \rho^i \nu^{(I-i)^+}\right) = \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.$$
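One way to sanity check the closed form for $J_O$ without a two-dimensional root finder is to run the parametrization forward. This is an illustrative sketch under stated assumptions rather than the authors' procedure: pick $\mu = \rho\beta$ and $\nu$, build $\pi^*_i \propto P_i(\mu)\,\nu^{(I-i)^+}$ on a truncated support, read off $\beta$ as its mean and $\zeta$ as its spare capacity, and compare the closed form with a direct evaluation of the relative entropy. The names `overflow_tilt` and `n_terms` are hypothetical.

```python
import math

def log_poisson(i, lam):
    """log of the Poisson(lam) probability of i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def overflow_tilt(I, mu, nu, n_terms=200):
    """Forward construction of pi* from (mu, nu), with mu = rho * beta."""
    # log of the unnormalized weights P_i(mu) * nu^{(I - i)^+}
    logw = [log_poisson(i, mu) + max(I - i, 0) * math.log(nu)
            for i in range(n_terms)]      # support truncated at n_terms
    C = 1.0 / sum(math.exp(v) for v in logw)
    pi = [C * math.exp(v) for v in logw]
    beta = sum(i * p for i, p in enumerate(pi))          # mean constraint
    zeta = sum((I - i) * pi[i] for i in range(I + 1))    # spare capacity
    rho = mu / beta
    closed = (math.log(C) + beta * (1.0 - rho)
              + beta * math.log(rho) + zeta * math.log(nu))
    # direct relative entropy D(pi* || P(beta)), skipping underflowed terms
    direct = sum(p * (math.log(C) + v - log_poisson(i, beta))
                 for i, (p, v) in enumerate(zip(pi, logw)) if p > 0.0)
    return {"C": C, "rho": rho, "beta": beta, "zeta": zeta,
            "eta": zeta - I + beta, "closed": closed, "direct": direct}

res = overflow_tilt(I=2, mu=2.4, nu=1.5)
print(res)
```

Choosing $\nu > 1$ in this sketch produces $\nu > \rho > 1$, in line with the remark above about larger than expected values of $\zeta$.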

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

$$J_C(\alpha, \beta, \xi) \doteq -\lim_{n} \frac{1}{n} \log P\left(\sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi\right),$$

where

$$\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta)$$

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7 subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

$$\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi,$$

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

$$\omega_{kj} = \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta)\, W, & j + k \le I, \\ C_k P_j(\rho\beta), & j + k > I. \end{cases}$$

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

$$J_C(\alpha, \beta, \xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k \log C_k.$$

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types, and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
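The numbers in this example can be reproduced approximately with a short numerical sketch. The nested bisection below is one possible way to solve the constraint equations and is not claimed to be the authors' method. It assumes that every $\alpha_k$ in the list is positive, that $\xi$ lies below its zero-cost value (so that $0 < W < 1$ and $0 < \rho < 1$), and that truncating each distribution at `n_terms` atoms is harmless. All function names are hypothetical.

```python
import math

def log_poisson(j, lam):
    """log of the Poisson(lam) probability of j."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def zero_cost_low(alpha, I, beta):
    """Expected fraction of types with I or fewer coupons (zero-cost path)."""
    return sum(a * sum(math.exp(log_poisson(j, beta)) for j in range(I - k + 1))
               for k, a in enumerate(alpha))

def class_stats(k, I, beta, rho, W, n_terms=60):
    """(C_k, mean, low mass) of pi_kj = C_k P_j(rho*beta) W^[j + k <= I]."""
    w = [math.exp(log_poisson(j, rho * beta)) * (W if j + k <= I else 1.0)
         for j in range(n_terms)]
    z = sum(w)
    mean = sum(j * wj for j, wj in enumerate(w)) / z
    low = sum(w[: I - k + 1]) / z
    return 1.0 / z, mean, low

def bisect_increasing(f, lo, hi, iters=60):
    """Root of an increasing function with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_coupon(alpha, I, beta, xi):
    def mixed(idx, rho, W):
        # alpha-weighted average of a class statistic (1 = mean, 2 = low mass)
        return sum(a * class_stats(k, I, beta, rho, W)[idx]
                   for k, a in enumerate(alpha))

    def rho_for(W):
        # conservation constraint: the mixed mean must equal beta
        return bisect_increasing(lambda r: mixed(1, r, W) - beta, 1e-9, 1.0)

    # single terminal constraint: the low-occupancy fraction must equal xi
    W = bisect_increasing(lambda w: mixed(2, rho_for(w), w) - xi, 1e-9, 1.0)
    rho = rho_for(W)
    J = (beta * (1.0 - rho + math.log(rho)) + xi * math.log(W)
         + sum(a * math.log(class_stats(k, I, beta, rho, W)[0])
               for k, a in enumerate(alpha)))
    return rho, W, J

alpha = [0.5, 0.3, 0.2]
print("zero-cost psi_3(2.0):", zero_cost_low(alpha, 3, 2.0))
rho, W, J = solve_coupon(alpha, 3, 2.0, 0.55)
print("rho, W, J_C:", rho, W, J, " exp(-100 J_C):", math.exp(-100 * J))
```

The inner bisection relies on the mean of the tilted family increasing with $\rho$, and the outer one on the low-occupancy fraction increasing with $W$ once $\rho$ is re-solved; both follow from convexity of the log-partition function of this exponential family.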

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

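A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

$$\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)} \tag{A.4}$$

and

$$\sum_{i=0}^{I} i\omega_i + (I + 1)\,\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad \text{(conservation)}. \tag{A.5}$$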

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4) and property (c) implies (A.5), since

$$\sum_{i=0}^{I} \left(\sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j\right) = \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) = (I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j). \tag{A.6}$$

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

$$\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}$$

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot{\psi}_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot{\psi}_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot{\psi}_i \le 1$ follows from (A.5).
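The feasibility test of Lemma 2.4 and the linear path used in the proof translate directly into code. The sketch below is illustrative only; the function names and the tolerance `tol` are assumptions. Here `alpha` holds $(\alpha_0, \ldots, \alpha_{I+1})$ and `omega` holds $(\omega_0, \ldots, \omega_I)$, with $\omega_{I+}$ recovered as the remaining mass.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (A.4) and conservation (A.5)."""
    I = len(omega) - 1
    if len(alpha) != I + 2:
        raise ValueError("alpha must have I + 2 entries, omega I + 1")
    omega_plus = 1.0 - sum(omega)
    # monotonicity: cumulative alpha dominates cumulative omega
    a_cum = o_cum = 0.0
    for i in range(I + 1):
        a_cum += alpha[i]
        o_cum += omega[i]
        if a_cum < o_cum - tol:
            return False
    # conservation: the final ball count cannot exceed initial count plus beta
    lhs = sum(i * w for i, w in enumerate(omega)) + (I + 1) * omega_plus
    rhs = sum(i * a for i, a in enumerate(alpha)) + beta
    return lhs <= rhs + tol

def linear_path(alpha, omega, beta, x):
    """gamma_i(x) = alpha_i + (omega_i - alpha_i) x / beta for i = 0..I."""
    t = x / beta
    return [a + (w - a) * t for a, w in zip(alpha, omega)]

print(is_feasible([1.0, 0.0, 0.0], [0.3, 0.3], beta=2.0))   # enough balls
print(is_feasible([1.0, 0.0, 0.0], [0.3, 0.3], beta=1.0))   # too few balls
print(linear_path([1.0, 0.0, 0.0], [0.3, 0.3], 2.0, 1.0))
```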

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot{\gamma}$, $\dot{\psi}$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot{\psi}_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot{\psi}$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

$$J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx.$$

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

$$L(\psi, \theta) = D(\theta \| \gamma) = \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} = \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left(1 - \sum_{i=0}^{I} \theta_i\right) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.$$

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

$$\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))$$

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections, we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

$$-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left(-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\right) \tag{A.7}$$

for $i = 0, \ldots, I - 1$, and by

$$-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\left(-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\right). \tag{A.8}$$

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

$$L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}$$

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

$$-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left(-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\right) \tag{A.10}$$

for $i = 0, \ldots, I - 1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

$$\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1,$$

$$\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} i\,C P_i(\rho\beta) = \beta.$$

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints, and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

$$\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C P_k(\rho\beta))\left(1 - \frac{x}{\beta}\right)^k, \tag{A.8}$$

$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)$$

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

$$\psi_0(x) = \sum_{k=0}^{I} \omega_k\left(1 - \frac{x}{\beta}\right)^k. \tag{A.10}$$

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.
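For an exponential extremal, the construction in Theorem A.1 can be carried out numerically. The sketch below is an illustration under stated assumptions, not the authors' code. It eliminates $C$ from the two twist equations, which leaves the condition that a Poisson($\rho\beta$) variable conditioned to exceed $I$ has mean $(\beta - \sum_i i\omega_i)/\omega_{I+}$, and solves that by bisection. It then evaluates $(-1)^i\psi_0^{(i)}$ by differentiating the definition of $\psi_0$ term by term. The names `twist_parameters`, `neg_deriv` and `gamma` are hypothetical, and the infinite sums are truncated at `n_terms` terms.

```python
import math

def poisson(i, lam):
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def tail(m, mu, n_terms=200):
    """sum_{i >= m} P_i(mu), summed directly to stay accurate for small mu."""
    return sum(poisson(i, mu) for i in range(m, m + n_terms))

def twist_parameters(omega, beta, iters=100):
    """(C, rho) for empty initial conditions, exponential case only."""
    I = len(omega) - 1
    high_mass = 1.0 - sum(omega)                              # omega_{I+}
    high_mean = beta - sum(i * w for i, w in enumerate(omega))
    if high_mass <= 0.0 or high_mean <= (I + 1) * high_mass:
        raise ValueError("not an exponential case")
    target = high_mean / high_mass
    # E[X | X > I] for X ~ Poisson(mu) increases with mu
    gap = lambda mu: mu * tail(I, mu) / tail(I + 1, mu) - target
    lo, hi = 1e-6, 1.0
    while gap(hi) < 0.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return high_mass / tail(I + 1, mu), mu / beta

def neg_deriv(i, x, omega, beta, C, rho):
    """(-1)^i psi_0^{(i)}(x) for 0 <= i <= I."""
    I = len(omega) - 1
    d = rho ** i * C * math.exp(-rho * x)
    for k in range(i, I + 1):
        coeff = omega[k] - C * poisson(k, rho * beta)
        d += (coeff * beta ** (-i) * math.factorial(k) / math.factorial(k - i)
              * (1.0 - x / beta) ** (k - i))
    return d

def gamma(i, x, omega, beta, C, rho):
    """gamma_i(x) = x^i / i! * (-1)^i psi_0^{(i)}(x)."""
    return x ** i / math.factorial(i) * neg_deriv(i, x, omega, beta, C, rho)

omega, beta = [0.15, 0.2], 3.0
C, rho = twist_parameters(omega, beta)
print("C, rho:", C, rho)
print("terminal:", [gamma(i, beta, omega, beta, C, rho) for i in range(2)])
```

With $I = 0$ the bisection condition reduces to the fixed point equation of Section 4.1, so the same routine recovers $C = 1/\rho$ in the classical problem.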

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\phi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

$$(-1)^i \phi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11}$$

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly, $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

$$(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} (\omega_k - C P_k(\rho\beta))\,\beta^{-i}\,\frac{k!}{(k - i)!}\left(1 - \frac{x}{\beta}\right)^{k-i}$$

for $i = 0, \ldots, I$, and

$$(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12}$$

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1} \psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\phi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

$$\phi(\beta) = (-1)^I \psi_0^{(I)}(\beta) = \rho^I C e^{-\rho\beta} + \left(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\right)\beta^{-I} I! = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I \ge 0,$$

so that $\phi(x) > 0$ on $[0, \beta)$. It follows that $\phi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case, $C = 0$, the argument proceeds similarly on noting that

$$(-1)^I \psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I > 0 \quad \text{for all } x \in [0, \beta].$$

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

$$\psi_0(0) = C\left(1 - \sum_{i=0}^{I} P_i(\rho\beta)\right) + \sum_{i=0}^{I} \omega_i = 1.$$

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I + 1, \ldots. \tag{A.13}$$

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

$$\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.$$

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

$$\theta_i(x) = -\dot{\psi}_i(x) = -\sum_{k=0}^{i} \dot{\gamma}_k(x) \tag{A.14}$$

$$= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k - 1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) = \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0$$

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

$$\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot{\psi}_i(x) = -\psi_0^{(1)}(0) = 1.$$

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot{\psi}_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

$$\sum_{i=0}^{I} \theta_i(x) \le 1$$

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case, $C > 0$. The $I+$ terms satisfy the simple expressions

$$\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot{\psi}_i(x) = \sum_{i=I+1}^{\infty} -\dot{\psi}_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),$$

$$1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),$$

where the first display uses $\theta = -\dot{\psi}$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot{\psi}_i(x)$ given in (A.13) and (A.14), it follows that

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15}$$

Then

$$\frac{d}{dx}\left(-\log\frac{\theta_i}{\gamma_i}\right) = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16}$$

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case, $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

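Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

$$(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I,$$

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

$$\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,$$

$$\sum_{i=0}^{\infty} -\dot{\psi}_i(x) = 1, \qquad -\dot{\psi}_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},$$

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0, \ldots, \gamma_I, \gamma_{I+}\}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{\infty} \left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A.17}$$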

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot{\psi}$ with (A.12) and (A.13), it follows that

$$\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot{\psi}_i = \sum_{i=I+1}^{\infty} -\dot{\psi}_i = \rho\sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).$$

Then for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot{\psi}_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot{\psi}_i)/(1 - \psi_I) = \rho$,

$$L(\psi, \dot{\psi}) = \sum_{i=0}^{I} (-\dot{\psi}_i)\log\frac{-\dot{\psi}_i}{\psi_i - \psi_{i-1}} + (1 + \dot{\psi}_0 + \cdots + \dot{\psi}_I)\log\frac{1 + \dot{\psi}_0 + \cdots + \dot{\psi}_I}{1 - \psi_I} = \sum_{i=0}^{\infty} (-\dot{\psi}_i)\log\frac{-\dot{\psi}_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot{\psi}).$$


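Note that since $L_\infty(\psi, \dot{\psi})$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot{\psi}) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot{\psi}_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

$$J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot{\psi}_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.$$

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

$$\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot{\psi}_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot{\psi}_i(x)\log|\psi_0^{(i)}(x)|\,dx = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} (\dot{\psi}_i(x) - \dot{\psi}_{i-1}(x))\log|\psi_0^{(i)}(x)|\,dx. \tag{A.18}$$

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$, since for $i > I$,

$$-\dot{\psi}_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],$$

and similarly for $\dot{\psi}_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

$$\sum_{i=0}^{\infty} \left[\gamma_i\log|\psi_0^{(i)}|\right]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot{\psi}_i\,dx.$$

The lemma follows on taking limits as $\varepsilon \downarrow 0$.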

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

$$\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot{\psi}_i(x) = 1,$$

$$-\dot{\psi}_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I - 1,$$

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{I} \left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A.19}$$


Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \| P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$, and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

$$J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \| P(\beta)).$$

Proof. We first solve the minimization problem, and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

$$L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty} \pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right),$$

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

$$\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i\log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.$$


Following standard Lagrangian arguments, we thus have

$$\inf_{\pi \in F(1, \omega, \beta)} D(\pi \| P(\beta)) = \inf_{\pi \in F(1, \omega, \beta)} L(\pi, y, z) \ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) = D(\pi^* \| P(\beta)).$$

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have

$$J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \| P(\beta)),$$

where

$$\gamma_i(\beta) = \begin{cases} \omega_i, & 0 \le i \le I, \\ C P_i(\rho\beta), & i > I. \end{cases}$$

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

$$J(\gamma) = \sum_{i=0}^{I} \omega_i\log\frac{\omega_i}{P_i(\beta)} + \left(1 - \sum_{i=0}^{I} \omega_i\right)(\log C + (1 - \rho)\beta) + \left(\beta - \sum_{i=0}^{I} i\omega_i\right)\log\rho,$$

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{ki}\}_i$, where the function $\gamma_{ki} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

$$\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k\,\gamma_{k, i-k}(x\beta_k/\beta).$$

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

$$\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \| P(\beta)) \tag{A.20}$$

subject to the conservation constraint

$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21}$$

and the terminal conditions

$$\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\pi_{k, i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22}$$

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

$$\pi_{kj} = 0, \qquad k + j > I, \tag{A.23}$$

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written the vector of distributions π = (π(0) π(k) π(K)) is suchthat k only ranges over indices with αk gt 0 Including all indices 0le k leKyields an equivalent problem in which the minimizing solution is not uniquesince the π(k) with αk = 0 make no contribution to the objective functionor the constraints In the form given however the solution can be shown tobe unique

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch of the argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0 \pi_{01} + \alpha_1 \pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
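The ordered filling construction can be illustrated with a short Python sketch. This is only a sketch under stated assumptions, not code from the paper: the function name `ordered_filling` and the example values are hypothetical, every $\alpha_k$ is assumed strictly positive, and the constraints are assumed to be polynomial (total masses equal, monotonicity conditions satisfied). Level $i$ is filled greedily from the classes $k \le i$ in increasing order of $k$.

```python
from typing import List


def ordered_filling(alpha: List[float], omega: List[float],
                    tol: float = 1e-12) -> List[List[float]]:
    """Build pi[k][j] >= 0 with rows summing to 1 and
    sum_{k <= i} alpha[k] * pi[k][i - k] == omega[i] for every level i.

    Assumptions (not checked beyond the two errors below): every alpha[k] > 0,
    sum(alpha) == sum(omega) == 1, and sum(alpha[:i+1]) >= sum(omega[:i+1]).
    """
    K, I = len(alpha) - 1, len(omega) - 1
    remaining = list(alpha)                       # unassigned mass of class k
    q = [[0.0] * (I + 1) for _ in range(K + 1)]   # q[k][j] = alpha[k] * pi[k][j]
    for i in range(I + 1):
        need = omega[i]                           # mass still required at level i
        for k in range(min(i, K) + 1):
            take = min(need, remaining[k])
            q[k][i - k] += take
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"monotonicity condition fails at level {i}")
    if max(remaining) > tol:
        raise ValueError("total mass of alpha exceeds total mass of omega")
    return [[q[k][j] / alpha[k] for j in range(I + 1)] for k in range(K + 1)]


# Hypothetical example: two initial classes, three terminal levels.
alpha = [0.5, 0.5]
omega = [0.2, 0.5, 0.3]
pi = ordered_filling(alpha, omega)
```

The greedy order over $k$ is one arbitrary choice; any order works because the monotonicity conditions guarantee that enough unassigned mass is available at each level.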

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P, Q$ in $S_\infty$ this distance is simply

$$\sum_{i:\, P_i > Q_i} (P_i - Q_i) = \sum_{i:\, Q_i > P_i} (Q_i - P_i),$$

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

$$H_A = \Big\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \le A \Big\}.$$

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

$$Q_A(\alpha, \omega, \beta) \doteq \mathcal{F}(\alpha, \omega, \beta) \cap H_A$$

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n),(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n),(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

$$\sum_{j \ge m} j \pi^{(n)}_{kj} \le \varepsilon \tag{A.24}$$

for all $n$ and $k$. Since $e^y - 1 = \sup_{x > 0} [xy - x \log x + x - 1]$, we have the inequality

$$xy \le (x \log x - x + 1) + (e^y - 1),$$

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

$$\sum_{j=0}^{\infty} e^{j/\delta} \mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta} \frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.$$
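Both facts just stated are easy to check numerically. The following Python sketch is illustrative only: the helper names `young_gap` and `poisson_exp_moment`, the truncation level, and the test values of $\beta$ and $\delta$ are assumptions made for the example. It evaluates the gap in the inequality $xy \le (x \log x - x + 1) + (e^y - 1)$ and compares a truncated version of the exponential moment sum with the closed form $e^{\beta e^{1/\delta} - \beta}$.

```python
import math


def young_gap(x: float, y: float) -> float:
    """Right side minus left side of x*y <= (x log x - x + 1) + (e^y - 1),
    using the convention 0 log 0 = 0. Nonnegative for all y and all x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y


def poisson_exp_moment(beta: float, delta: float, terms: int = 400) -> float:
    """Truncated sum of e^{j/delta} * P_j(beta) over j = 0, ..., terms - 1.
    The truncation is only accurate when terms is well beyond beta * e^{1/delta}."""
    total, log_factorial = 0.0, 0.0
    for j in range(terms):
        if j > 0:
            log_factorial += math.log(j)
        total += math.exp(j / delta - beta + j * math.log(beta) - log_factorial)
    return total


# Hypothetical test values.
beta, delta = 1.5, 2.0
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
numeric = poisson_exp_moment(beta, delta)
```

Equality in the inequality holds at $x = e^y$, which is the maximizer in the supremum representation of $e^y - 1$.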

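Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj} / \mathcal{P}_j(\beta)$, we have the estimate

$$\begin{aligned}
\sum_{j \ge m} j \pi^{(n)}_{kj}
&= \sum_{j \ge m} \delta \, \frac{j}{\delta} \, \frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)} \, \mathcal{P}_j(\beta) \\
&\le \sum_{j \ge 0} \delta \left[ \pi^{(n)}_{kj} \log\left( \frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)} \right) - \pi^{(n)}_{kj} + \mathcal{P}_j(\beta) \right] + \delta \sum_{j \ge m} (e^{j/\delta} - 1) \mathcal{P}_j(\beta) \\
&= \delta D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1) \mathcal{P}_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1) \mathcal{P}_j(\beta),
\end{aligned}$$

where the second equality uses the fact that both $\pi^{(n),(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]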

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

$$\beta^{(n)}_k \to \beta_k.$$

Finally,

$$\sum_{k} \alpha_k \beta_k = \lim_{n} \sum_{k} \alpha_k \beta^{(n)}_k = \lim_{n} \beta = \beta,$$

so that $\pi \in \mathcal{F}(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

$$G = \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \ge 0.$$

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

$$\sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*(k)} \,\|\, \mathcal{P}(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta)) = G,$$

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that

$$(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)$$

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

$$\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).$$

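Proof. For any $A > \liminf_n \mathcal{J}^{(n)} = G$, we may choose a subsequence of problems such that

$$\mathcal{J}^{(n)} \le G + 1/n < A,$$

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$ then

$$\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j \pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j \pi_{kj}, \qquad k = 0, 1, \ldots, K.$$

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k \alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda \log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k \beta^{(n)}_k \to \alpha_k \beta_k$, even if $\alpha_k = 0$.

Thus

$$\sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k \alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta,$$

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

$$\begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta^{(n)}))
&\ge \liminf_{n} \sum_{k:\, \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k:\, \alpha_k > 0} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}$$

as desired.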
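The proof above uses the fact that the infimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ over distributions with mean $\lambda$ equals $\beta - \lambda + \lambda \log(\lambda/\beta)$. This value is the relative entropy of a Poisson distribution with mean $\lambda$ with respect to $\mathcal{P}(\beta)$, which the following Python sketch checks numerically and compares with a point mass of the same mean. The helper names, the truncation at 200 terms, and the values of $\lambda$ and $\beta$ are hypothetical choices for illustration.

```python
import math
from typing import List


def poisson_log_pmf(j: int, rate: float) -> float:
    """log P_j(rate) for the Poisson distribution with mean rate > 0."""
    return -rate + j * math.log(rate) - math.lgamma(j + 1)


def poisson_pmf_truncated(rate: float, terms: int = 200) -> List[float]:
    """Poisson probabilities P_0, ..., P_{terms-1}; the discarded tail is negligible here."""
    return [math.exp(poisson_log_pmf(j, rate)) for j in range(terms)]


def relative_entropy_to_poisson(pi: List[float], beta: float) -> float:
    """D(pi || P(beta)) for a distribution pi on {0, 1, ..., len(pi) - 1}."""
    return sum(p * (math.log(p) - poisson_log_pmf(j, beta))
               for j, p in enumerate(pi) if p > 0)


lam, beta = 2.0, 1.0
bound = beta - lam + lam * math.log(lam / beta)          # claimed infimum
d_poisson = relative_entropy_to_poisson(poisson_pmf_truncated(lam), beta)
d_point_mass = relative_entropy_to_poisson([0.0, 0.0, 1.0], beta)   # mean 2 as well
```

In this example the Poisson distribution with mean $\lambda$ attains the bound, while the point mass at $2$, which has the same mean, has strictly larger relative entropy.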

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\hat\pi_{mn} > 0$, that $\hat\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

$$\hat\omega_i \doteq \sum_{k=0}^{i} \alpha_k \hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j \hat\pi_{kj},$$

then $\hat\pi$ is a feasible solution for the constraints $(\alpha, \hat\omega, \hat\beta)$. Next, for any $\eta > 0$ we may define

$$\tilde\beta \doteq (\beta - \eta \hat\beta)/(1 - \eta), \qquad \tilde\omega \doteq (\omega - \eta \hat\omega)/(1 - \eta).$$

By construction, the constraints $(\alpha, \tilde\omega, \tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha, \omega, \beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \hat\omega, \hat\beta)$, and hence for $(\alpha, \tilde\omega, \tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha, \tilde\omega, \tilde\beta)$, and finally form $\bar\pi = \eta \hat\pi + (1 - \eta) \tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha, \omega, \beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon \bar\pi + (1 - \varepsilon) \pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

$$\begin{aligned}
f'(\varepsilon)
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left( \log \frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} + 1 \right) (\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log \frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} \, (\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m \bar\pi_{mn} \log \frac{\pi^\varepsilon_{mn}}{\mathcal{P}_n(\beta)}
 + \sum_{k \in \mathcal{K}} \alpha_k \sum_{j:\, (k,j) \ne (m,n)} \bar\pi_{kj} \log \frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}
 - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{kj} \log \frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}.
\end{aligned}$$

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

$$-\sum_{k} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \le 0.$$

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj} \log \mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

$$\pi^*_{kj} = \begin{cases} D_k \mathcal{P}_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise.} \end{cases}$$

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

$$L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{j:\, k+j \in \mathcal{I}} \pi_{kj} \log \frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
 + \sum_{k \in \mathcal{K}} z_k \alpha_k \Big( 1 - \sum_{l:\, k+l \in \mathcal{I}} \pi_{kl} \Big)
 + \sum_{i \in \mathcal{I}} w_i \Big( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \Big)$$

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

$$\pi^*_{kj} = \mathcal{P}_j(\beta) \, e^{z_k - 1 + w_{k+j}}.$$

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
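The product form $\pi^*_{kj} = D_k \mathcal{P}_j(\beta) W_{k+j}$ suggests one way to compute the minimizer numerically in the polynomial case. The Python sketch below uses iterative proportional fitting, a standard matrix-scaling technique that is not taken from the paper: starting from $\alpha_k \mathcal{P}_j(\beta)$ on the support $k + j \le I$, it alternately rescales rows (to meet the normalization of each $\pi^{(k)}$) and anti-diagonals (to meet the terminal conditions (A.22)). Every iterate has the product form by construction. The function name, the number of sweeps, and the example constraints are hypothetical, and all $\alpha_k$ and $\omega_i$ are assumed strictly positive.

```python
import math
from typing import List


def poisson_pmf(j: int, beta: float) -> float:
    return math.exp(-beta + j * math.log(beta) - math.lgamma(j + 1))


def scale_to_constraints(alpha: List[float], omega: List[float], beta: float,
                         sweeps: int = 2000) -> List[List[float]]:
    """Iterative proportional fitting on q[k][j] = alpha[k] * pi[k][j].

    Assumes strictly positive alpha and omega with sum(alpha) == sum(omega) == 1
    (polynomial case), so that a strictly positive feasible point exists.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    q = [[alpha[k] * poisson_pmf(j, beta) if k + j <= I else 0.0
          for j in range(I + 1)] for k in range(K + 1)]
    for _ in range(sweeps):
        for k in range(K + 1):                    # rows: sum_j q[k][j] = alpha[k]
            s = sum(q[k])
            q[k] = [v * alpha[k] / s for v in q[k]]
        for i in range(I + 1):                    # anti-diagonals: sum = omega[i]
            cells = [(k, i - k) for k in range(min(i, K) + 1)]
            s = sum(q[k][j] for k, j in cells)
            for k, j in cells:
                q[k][j] *= omega[i] / s
    return [[q[k][j] / alpha[k] for j in range(I + 1)] for k in range(K + 1)]


# Hypothetical polynomial example; beta is fixed by conservation:
# beta = sum_i i * omega[i] - sum_k k * alpha[k] = 1.1 - 0.5 = 0.6.
alpha, omega, beta = [0.5, 0.5], [0.2, 0.5, 0.3], 0.6
pi = scale_to_constraints(alpha, omega, beta)
```

The row factors accumulated by the iteration play the role of the $D_k$ and the anti-diagonal factors play the role of the $W_i$; the sketch does not return them separately, and it makes no claim about how the constants were computed in the paper.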

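We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$, and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

$$\pi^*_{kj} = \begin{cases} C_k \mathcal{P}_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\ C_k \mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\ 0, & \text{otherwise.} \end{cases}$$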

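Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$ define

$$\beta^{(M)} \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l \pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l \le M} \pi^*_{kl},$$

and consider the problem of minimizing

$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log \frac{\pi_{kj}}{\mathcal{P}_j(\beta)}$$

subject to the constraints

$$\begin{aligned}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l \pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{aligned}$$

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

$$\begin{aligned}
L^{(M)} ={}& \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log \frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
 + y^{(M)} \Big( \beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l \pi_{kl} \Big) \\
&+ \sum_{k \in \mathcal{K}} z^{(M)}_k \alpha_k \Big( \eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} \Big)
 + \sum_{i \in \mathcal{I}} w^{(M)}_i \Big( \omega_i - \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \Big).
\end{aligned}$$

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

$$\pi^*_{kj} = \mathcal{P}_j(\beta) \, e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}$$

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

$$\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)} \, e^{y^{(M)}},$$

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.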

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j \pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$, and a Poisson $\rho_k \beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega^{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega^{(k)}, \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

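Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha, \omega, \beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j \pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega^{(k)}, \beta_k)$. Then the curves

$$\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x \beta_k/\beta), \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{aligned}$$

are extremals which satisfy the constraints $(\alpha, \omega, \beta)$.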

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\bar\gamma_{kj}(x) \doteq \gamma_{kj}(x \beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

$$\psi_i(x) = \sum_{k=0}^{i} \alpha_k \bar\psi_{k,i-k}(x).$$

To see that the $\gamma_i$ are valid occupancy curves, note that

$$\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(x) = 1.$$

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0, \beta]$, and

$$\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,$$

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

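In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k \beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

$$\begin{aligned}
\bar\psi_{k0}(x)
&= C_k e^{-\rho_k (x \beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k \mathcal{P}_i(\rho_k \beta_k)) \left( 1 - \frac{x \beta_k/\beta}{\beta_k} \right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k \mathcal{P}_i(\rho\beta)) \left( 1 - \frac{x}{\beta} \right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) \mathcal{P}_i(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{i} \right].
\end{aligned}$$

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k \mathcal{P}_i(\rho\beta) W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0 \bar\psi_{00}$ is

$$\begin{aligned}
(-1)^k \psi_0^{(k)}(x)
&= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) \mathcal{P}_i(\rho\beta) \frac{i!}{(i-k)!\, \beta^k} \left( 1 - \frac{x}{\beta} \right)^{i-k} \right] \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) \mathcal{P}_j(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k} \bar\psi_{k0}(x).
\end{aligned} \tag{A.25}$$

The second equality uses $\mathcal{P}_i(\rho\beta)\, i!/((i-k)!\, \beta^k) = \rho^k \mathcal{P}_{i-k}(\rho\beta)$. Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

$$\begin{aligned}
\gamma_i(x)
&= \sum_{k=0}^{i} \alpha_k \frac{(x \beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k0}^{(i-k)}\!\left( x \beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{aligned}$$

Similarly, we obtain

$$\begin{aligned}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta} \dot\psi_{k,i-k}\!\left( x \beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{aligned}$$

so that the extremals satisfy the simple relation

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}$$

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

$$\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}$$

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.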

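In the polynomial case, similar computations show that

$$(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k} \bar\psi_{k0}(x) \tag{A.28}$$

and that

$$\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x) \, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}$$

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).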

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

$$\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) \, dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i$$

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

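Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\{\gamma_i\}$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

$$\begin{aligned}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(\beta) \log |\psi_0^{(i)}(\beta)| \right] - \sum_{k=0}^{K} \alpha_k \log |\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(\beta) \log \left| \frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0) \, e^{-\beta}} \right|.
\end{aligned}$$

The fact that $\bar\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \bar\psi_{k0}(x)$, so that

$$\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0) \, e^{-\beta}} = \left( \frac{\beta_k}{\beta} \right)^{j} \psi_{k0}^{(j)}\!\left( x \beta_k/\beta \right) e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.$$

Then

$$J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\bar\gamma_{k\cdot}(\beta) \,\|\, \mathcal{P}(\beta)).$$

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).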

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^\varepsilon \doteq (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

$$G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x)) \, dx.$$

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

$$J(\gamma) \le J(\tilde\gamma).$$

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$ with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta) \, dx$. After defining

$$\eta(x) \doteq \tilde\psi(x) - \psi(x),$$

we may write

$$G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta, \theta - \varepsilon\dot\eta) \, dx = \int_0^\beta g(x, \varepsilon) \, dx. \tag{A.30}$$

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of $0$. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case, studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C \mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential, empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) \ge \alpha_i \bar\gamma_{i0}(x).$$

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho \gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

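The partial derivative of the integrand of (A.30) is

$$\frac{\partial}{\partial \varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma^\varepsilon(x)} \eta_i(x) - \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \theta_i} \right|_{\gamma^\varepsilon(x)} \dot\eta_i(x).$$

The partial derivatives of $L$ are (in mixed notation)

$$\begin{aligned}
\frac{\partial L}{\partial \psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial \theta_i}(x) &= \log \frac{\theta_i(x)}{\gamma_i(x)} - \log \frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{aligned}$$

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

$$\left| \frac{\partial}{\partial \varepsilon} g(x, \varepsilon) \right| < B \quad \text{for } \varepsilon \in [0, \varepsilon_0], \text{ a.e. } x \in [0, \beta].$$

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.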

We are now free to differentiate under the integral sign, so that

$$G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial \varepsilon} g(x, \varepsilon) \, dx$$

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

$$\int_0^x \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma} dx' = \int_0^x \left. \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right) \right|_{\gamma} dx'$$

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

$$\begin{aligned}
G'_+[0]
&= \int_0^\beta \left\{ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right\} dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x') \, dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x) \, dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}$$

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\tilde\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x^{(n)}_1)$ and terminal point $\tilde\gamma(x^{(n)}_2)$. By Lemma A.16,

$$J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\tilde\theta(x) \,\|\, \tilde\gamma(x)) \, dx.$$

It then follows that

$$\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha, \omega, \beta) \\
&\le \liminf_{n} \mathcal{J}(\tilde\gamma(x^{(n)}_1), \tilde\gamma(x^{(n)}_2), x^{(n)}_2 - x^{(n)}_1) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x^{(n)}_1}^{x^{(n)}_2} D(\tilde\theta(x) \,\|\, \tilde\gamma(x)) \, dx \\
&= J(\tilde\gamma),
\end{aligned}$$

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Léonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation, and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithms 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

4.1. The classical occupancy problem. In the classical occupancy problem, the urns are initially empty and one only distinguishes between empty and occupied urns or, in other words, $I = 0$. The associated large deviations problem was solved using a sample path approach in [28]; we show here how this case may be obtained with our results. We might be interested in the probability of having an unusually large number of empty urns, $\Gamma^n_0(\beta) > \omega_0 > e^{-\beta}$, or an unusually small number of empty urns, $\Gamma^n_0(\beta) < \omega_0 < e^{-\beta}$. In either case, the calculus of variations problem is that in which $\gamma_0(\beta) = \omega_0$. From (2.6), it is immediate that $C = 1/\rho$ and
\[
\gamma_0(x) = \frac{1}{\rho} e^{-\rho x} + 1 - \frac{1}{\rho}.
\]
Using (2.5), we find that $\rho$ is determined by the unique nonnegative solution to
\[
\rho(1 - \omega_0) = 1 - e^{-\beta\rho}.
\]
Finally, (2.7) provides a simple expression for $\mathcal{J}(\omega)$ in terms of $\omega_0$, $\beta$, $C$ and $\rho$.
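As a rough numerical illustration of this case, the sketch below solves $\rho(1-\omega_0) = 1 - e^{-\beta\rho}$ by bisection and evaluates the cost using the explicit expression for an exponential extremal stated in Section A.6, specialized to $I = 0$ and $C = 1/\rho$. The function names (`classical_twist`, `classical_cost`, `gamma0`) are hypothetical and not from the paper, and the bracketing interval relies on the assumption $\beta > 1 - \omega_0$.

```python
import math

def classical_twist(omega0, beta, tol=1e-12):
    """Positive root rho of rho*(1 - omega0) = 1 - exp(-beta*rho), by bisection."""
    if not (0.0 < omega0 < 1.0 and beta > 1.0 - omega0):
        raise ValueError("need 0 < omega0 < 1 and beta > 1 - omega0")

    def f(rho):
        return 1.0 - math.exp(-beta * rho) - rho * (1.0 - omega0)

    # f is concave with f(0) = 0 and f'(0) > 0: it is positive at its peak
    # and negative at rho = 1/(1 - omega0), so the positive root lies between.
    lo = math.log(beta / (1.0 - omega0)) / beta
    hi = 1.0 / (1.0 - omega0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma0(x, rho):
    """Constrained fraction of empty urns after x balls per urn (case I = 0)."""
    return math.exp(-rho * x) / rho + 1.0 - 1.0 / rho

def classical_cost(omega0, beta):
    """Cost of the exponential extremal for I = 0, with C = 1/rho."""
    rho = classical_twist(omega0, beta)
    log_c = -math.log(rho)
    return (omega0 * math.log(omega0 / math.exp(-beta))
            + (1.0 - omega0) * (log_c + (1.0 - rho) * beta)
            + beta * math.log(rho))

rho = classical_twist(0.15, 3.0)      # the example with omega0 = 0.15, beta = 3
cost = classical_cost(0.15, 3.0)
```

With $\omega_0 = e^{-\beta}$ the root is $\rho = 1$ and the cost vanishes, which gives a quick sanity check of the zero-cost case.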

Fig. 1. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for the classical occupancy problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example, the terminal fraction of empty urns $\omega_0 = 0.15$ is three times larger than the expected value $P_0(3.0) = 0.05$.

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus
\[
w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n),
\]
or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I}\Gamma_i$,
\[
w(r) = r - nI + n\sum_{i=0}^{I}(I - i)\Gamma_i(r/n).
\]
In order to compute
\[
J_O(\eta, \beta) = -\lim_{n} \frac{1}{n}\log P\left(\frac{w(\beta n)}{n} > \eta\right),
\]
we therefore consider sample paths which satisfy the end constraint
\[
\sum_{i=0}^{I}(I - i)\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2}
\]
Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case and the large deviations exponent $J_O(\eta, \beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints
\[
\sum_{i=0}^{\infty}\pi_i = 1, \qquad \sum_{i=0}^{\infty} i\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I}(I - i)\pi_i = \zeta. \tag{4.3}
\]

We introduce the Lagrangian
\[
L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right) + \lambda\left(\zeta - \sum_{i=0}^{I}(I - i)\pi_i\right),
\]
with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions
\[
\log\pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+\lambda.
\]
In terms of variables $C$, $\rho$ and $\nu$, we may write
\[
\pi^*_i = C P_i(\rho\beta) \quad\text{for } i \ge I
\]
and
\[
\pi^*_i = C P_i(\rho\beta)\,\nu^{I-i} \quad\text{for } i \le I.
\]
The distribution is conditionally Poisson($\rho\beta/\nu$) for $i \le I$ and conditionally Poisson($\rho\beta$) for $i \ge I$. For convenience, we introduce the notation

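\[
Q_I(\rho) = \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta}\sum_{i=I+1}^{\infty} i P_i(\rho\beta),
\]
\[
R_I(\rho, \nu) = \nu^{I} e^{-\rho\beta(1 - 1/\nu)}\sum_{i=0}^{I-1} P_i\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta}\, e^{-\rho\beta(1 - 1/\nu)}\sum_{i=1}^{I} i P_i\left(\frac{\rho\beta}{\nu}\right).
\]
Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations
\[
C\bigl(R_I(\rho, \nu) + Q_I(\rho)\bigr) = 1,
\]
\[
C\left(\frac{\rho\beta}{\nu} R_I(\rho, \nu) + \rho\beta\, Q_I(\rho)\right) = \beta,
\]
\[
C\left(I P_I(\rho\beta) + \left(I - \frac{\rho\beta}{\nu}\right) R_I(\rho, \nu)\right) = \zeta.
\]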

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi \,\|\, P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations
\[
(\rho/\nu - 1) R_I(\rho, \nu) + (\rho - 1) Q_I(\rho) = 0,
\]
\[
(I - \rho\beta/\nu - \zeta) R_I(\rho, \nu) - \zeta Q_I(\rho) + I P_I(\rho\beta) = 0.
\]
Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed
\[
J_O(\eta, \beta) = D(\pi^* \,\|\, P(\beta))
= \sum_{i=0}^{\infty}\pi^*_i\log\bigl(C e^{\beta - \rho\beta}\rho^{i}\nu^{(I-i)^+}\bigr)
= \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.
\]
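The closed form above can be checked numerically without solving the nonlinear system. The sketch below is an illustration only: instead of starting from $(\beta, \zeta)$, it fixes $\mu = \rho\beta$ and $\nu$, builds the tilted distribution $\pi^*_i = C P_i(\mu)\nu^{(I-i)^+}$ on a truncated support, reads off the implied $C$, $\beta$, $\rho$ and $\zeta$ from the three linear constraints, and compares the direct relative entropy with $\log C + \beta(1-\rho) + \beta\log\rho + \zeta\log\nu$. All function names, the parameter `cap` (standing for $I$) and the truncation length `n_terms` are assumptions made for this example.

```python
import math

def poisson_pmf(mu, n_terms):
    """P_0(mu), ..., P_{n_terms-1}(mu), built iteratively to avoid factorials."""
    probs = [math.exp(-mu)]
    for i in range(1, n_terms):
        probs.append(probs[-1] * mu / i)
    return probs

def overflow_minimizer(mu, nu, cap, n_terms=150):
    """Tilted law pi*_i = C P_i(mu) nu^{(cap - i)^+}, with mu = rho * beta.

    Returns (pi, C, beta, rho, zeta) as implied by the constraints (4.3).
    """
    base = poisson_pmf(mu, n_terms)
    weights = [p * nu ** max(cap - i, 0) for i, p in enumerate(base)]
    c = 1.0 / sum(weights)                       # normalization constraint
    pi = [c * w for w in weights]
    beta = sum(i * p for i, p in enumerate(pi))  # mean constraint
    zeta = sum((cap - i) * p for i, p in enumerate(pi[:cap + 1]))
    return pi, c, beta, mu / beta, zeta

def relative_entropy(pi, ref):
    """D(pi || ref), skipping terms that vanish numerically."""
    return sum(p * math.log(p / q) for p, q in zip(pi, ref) if p > 0.0 and q > 0.0)

def overflow_exponent(c, beta, rho, zeta, nu):
    """Closed-form exponent log C + beta(1 - rho) + beta log rho + zeta log nu."""
    return math.log(c) + beta * (1.0 - rho) + beta * math.log(rho) + zeta * math.log(nu)

pi, c, beta, rho, zeta = overflow_minimizer(mu=4.0, nu=1.5, cap=3)
direct = relative_entropy(pi, poisson_pmf(beta, len(pi)))
closed = overflow_exponent(c, beta, rho, zeta, 1.5)
```

To go in the other direction, from a prescribed $(\beta, \zeta)$ to $(\rho, \nu)$, one would wrap this forward map in a two-dimensional root finder, as the text suggests.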

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute
\[
J_C(\alpha, \beta, \xi) = -\lim_{n}\frac{1}{n}\log P\left(\sum_{i=0}^{I}\Gamma^n_i(\beta) < \xi\right),
\]
where
\[
\xi < \sum_{k=0}^{K}\alpha_k\sum_{i=0}^{I-k} P_i(\beta)
\]
is an unusually small number of low occupancy urns. The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint
\[
\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{I-k}\pi_{kj} = \xi,
\]
where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form
\[
\omega_{kj} = \pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W, & j + k \le I,\\
C_k P_j(\rho\beta), & j + k > I.
\end{cases}
\]
As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed
\[
J_C(\alpha, \beta, \xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K}\alpha_k\log C_k.
\]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
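The zero-cost figure quoted in this example can be reproduced directly from the bound $\sum_k \alpha_k \sum_{i \le I-k} P_i(\beta)$. The short sketch below does this and also evaluates the crude estimate $e^{-n J_C}$, taking $J_C \approx 0.18$ from the text rather than solving the constrained problem; the function names are hypothetical.

```python
import math

def poisson_cdf(limit, mean):
    """P(X <= limit) for X ~ Poisson(mean); returns 0 when limit < 0."""
    term, total = math.exp(-mean), 0.0
    for i in range(limit + 1):
        total += term
        term *= mean / (i + 1)
    return total

def expected_low_occupancy(alpha, beta, level):
    """Zero-cost fraction of urns holding at most `level` balls after beta*n throws.

    alpha[k] is the fraction of urns that initially hold k balls.
    """
    return sum(a * poisson_cdf(level - k, beta) for k, a in enumerate(alpha))

alpha = [0.5, 0.3, 0.2]
psi3 = expected_low_occupancy(alpha, beta=2.0, level=3)   # zero-cost psi_3(2.0)
estimate = math.exp(-100 * 0.18)   # exp(-n * J_C), with n = 100 and J_C from the text
```

Here `psi3` comes out close to 0.71, so roughly 29 of the 100 types are expected to reach four or more coupons, and `estimate` is of order $10^{-8}$, in line with the figures quoted above.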

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here, the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14, using the Euler–Lagrange equations together with properties of the relative entropy.

A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if
\[
\sum_{j=0}^{i}\alpha_j \ge \sum_{j=0}^{i}\omega_j, \qquad i = 0, \ldots, I \quad\text{(monotonicity)}, \tag{A.4}
\]
and
\[
\sum_{i=0}^{I} i\omega_i + (I + 1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad\text{(conservation)}. \tag{A.5}
\]

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since
\[
\sum_{i=0}^{I}\left(\sum_{j=0}^{i}\alpha_j - \sum_{j=0}^{i}\omega_j\right) = \sum_{j=0}^{I}(I + 1 - j)(\alpha_j - \omega_j)
= (I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j). \tag{A.6}
\]
On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions
\[
\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I,
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\]
satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i \le 1$ follows from (A.5).
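The two feasibility conditions translate directly into a small check. The sketch below is only an illustration of (A.4) and (A.5); the function name `is_feasible`, the list layout (`alpha` holding $\alpha_0, \ldots, \alpha_{I+1}$, with the last entry standing for the initial fraction of urns above level $I$, and `omega` holding $\omega_0, \ldots, \omega_I$) and the tolerance are assumptions of this example.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (A.4) and conservation (A.5) for endpoint constraints.

    alpha: [alpha_0, ..., alpha_{I+1}]  initial occupancy fractions
    omega: [omega_0, ..., omega_I]      terminal fractions; omega_{I+} is implied
    beta:  balls thrown per urn
    """
    top = len(omega) - 1                      # this is I
    if len(alpha) != top + 2:
        raise ValueError("alpha must have exactly one more entry than omega")
    omega_plus = 1.0 - sum(omega)

    # (A.4): cumulative low-occupancy fractions can only decrease over time.
    cum_alpha = cum_omega = 0.0
    for i in range(top + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # (A.5): balls accounted for at the end cannot exceed initial balls plus beta.
    balls_end = sum(i * w for i, w in enumerate(omega)) + (top + 1) * omega_plus
    balls_start = sum(i * a for i, a in enumerate(alpha))
    return balls_end <= balls_start + beta + tol
```

For instance, with $I = 0$ and empty initial urns, `is_feasible([1.0, 0.0], [0.15], 3.0)` holds, whereas the same terminal condition with `beta = 0.5` fails conservation, since 85% of the urns cannot be occupied with only half a ball per urn.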

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form
\[
J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx.
\]
Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function
\[
L(\psi, \theta) = D(\theta \,\|\, \gamma)
= \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}}
= \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left(1 - \sum_{i=0}^{I}\theta_i\right)\log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}.
\]
The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
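For concreteness, the local rate function can be evaluated pointwise as a relative entropy between the rate vector and the occupancy vector. The sketch below is a minimal illustration: the function name `local_rate`, the use of `math.inf` for pointwise invalid inputs, and the small tolerance on the implied $\theta_{I+}$ are choices made for this example, and it uses the convention $\psi_{-1} = 0$.

```python
import math

def local_rate(psi, theta, tol=1e-12):
    """L(psi, theta) = D(theta || gamma) at a single instant.

    psi:   [psi_0, ..., psi_I]     cumulative occupancies (psi_{-1} = 0)
    theta: [theta_0, ..., theta_I] rates at which balls enter each level
    The I+ components gamma_{I+} = 1 - psi_I and theta_{I+} = 1 - sum(theta)
    are implied.
    """
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, len(psi))]
    gamma.append(1.0 - psi[-1])
    rates = list(theta) + [1.0 - sum(theta)]

    total = 0.0
    for t, g in zip(rates, gamma):
        if t < -tol or (t > tol and g <= 0.0):
            return math.inf            # not a valid rate/occupancy pair
        if t > tol:
            total += t * math.log(t / g)
    return total

# Zero cost when balls enter each level in proportion to its occupancy.
zero = local_rate([0.2, 0.5, 0.9], [0.2, 0.3, 0.4])
# Positive cost when the low-occupancy urns receive more than their share.
tilted = local_rate([0.2, 0.5, 0.9], [0.4, 0.3, 0.2])
```

The first call returns (numerically) zero because $\theta = \gamma$, which is the law-of-large-numbers behavior; any other rate vector gives a strictly positive cost.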

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]
\[
\frac{\partial L}{\partial\psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x), \theta(x))
\]
for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by
\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\right] \tag{A.7}
\]
for $i = 0, \ldots, I - 1$, and by
\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\left[-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\right]. \tag{A.8}
\]
In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to
\[
L(\psi, \theta) = \sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}
\]
The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just
\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\right] \tag{A.10}
\]
for $i = 0, \ldots, I - 1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations
\[
\sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1,
\]
\[
\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} i C P_i(\rho\beta) = \beta.
\]

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by
\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I}\bigl(\omega_k - C P_k(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^{k}, \tag{A.8}
\]
\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9}
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x)
\]
are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.
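To make the construction concrete, the following sketch computes twist parameters for the exponential case and evaluates the resulting $\gamma_i(x)$. It rests on one observation that is not spelled out in the text: dividing the second twist equation by the first shows that the mean of a Poisson($\rho\beta$) variable conditioned to exceed $I$ must equal $(\beta - \sum_i i\omega_i)/(1 - \sum_i\omega_i)$, and since that conditional mean increases with $\rho\beta$, a one-dimensional bisection suffices. The function names, the truncation length `n_terms` and the bracketing choices are assumptions of this example; the polynomial case $C = 0$ is not handled.

```python
import math

def poisson_pmf(mu, n_terms):
    """P_0(mu), ..., P_{n_terms-1}(mu), built iteratively."""
    probs = [math.exp(-mu)]
    for i in range(1, n_terms):
        probs.append(probs[-1] * mu / i)
    return probs

def twist_parameters(omega, beta, n_terms=400, tol=1e-13):
    """Solve the two twist equations for (C, rho); omega = [omega_0..omega_I]."""
    top = len(omega) - 1
    mass = 1.0 - sum(omega)                    # mass left for occupancies > I
    mean = beta - sum(i * w for i, w in enumerate(omega))
    if mass <= 0.0:
        raise ValueError("polynomial case (C = 0) is not handled here")
    target = mean / mass                       # required mean of the tail i > I
    if target <= top + 1:
        raise ValueError("constraints are infeasible or degenerate")

    def tail(mu):
        p = poisson_pmf(mu, n_terms)[top + 1:]
        return sum(p), sum(i * q for i, q in enumerate(p, start=top + 1))

    def cond_mean(mu):
        t0, t1 = tail(mu)
        return t1 / t0

    lo, hi = 1e-6, 1.0
    while cond_mean(hi) < target:              # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return mass / tail(mu)[0], mu / beta

def extremal(omega, beta, c, rho):
    """Return gamma(i, x) for 0 <= i <= I, following the form in Theorem A.1."""
    top = len(omega) - 1
    at_end = poisson_pmf(rho * beta, top + 1)
    coef = [w - c * p for w, p in zip(omega, at_end)]

    def gamma(i, x):
        # (-1)^i psi_0^{(i)}(x): exponential part plus differentiated polynomial
        deriv = c * rho ** i * math.exp(-rho * x)
        for k in range(i, top + 1):
            deriv += (coef[k] * math.factorial(k) / math.factorial(k - i)
                      * (1.0 - x / beta) ** (k - i) / beta ** i)
        return x ** i / math.factorial(i) * deriv

    return gamma

omega, beta = [0.1, 0.2, 0.25], 2.5
c, rho = twist_parameters(omega, beta)
gamma = extremal(omega, beta, c, rho)
```

Evaluating `gamma(i, beta)` returns the prescribed $\omega_i$, and `gamma(0, 0.0)` returns 1, which matches the terminal and initial constraints stated in the theorem.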

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial
\[
\psi_0(x) = \sum_{k=0}^{I}\omega_k\left(1 - \frac{x}{\beta}\right)^{k}. \tag{A.10}
\]
We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with
\[
(-1)^i\varphi^{(i)}(x) \ge 0 \qquad\text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is
\[
(-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I}\bigl(\omega_k - C P_k(\rho\beta)\bigr)\beta^{-i}\frac{k!}{(k-i)!}\left(1 - \frac{x}{\beta}\right)^{k-i}
\]
for $i = 0, \ldots, I$, and
\[
(-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12}
\]
for $i > I$. It is clear that $a(x) = (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) = (-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that
\[
\varphi(\beta) = (-1)^I\psi_0^{(I)}(\beta)
= \rho^I C e^{-\rho\beta} + \left(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\right)\beta^{-I} I!
= \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I \ge 0,
\]
so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that
\[
(-1)^I\psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I > 0 \qquad\text{for all } x \in [0, \beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since
\[
\psi_0(0) = C\left(1 - \sum_{i=0}^{I} P_i(\rho\beta)\right) + \sum_{i=0}^{I}\omega_i = 1.
\]
A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:
\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I + 1, \ldots. \tag{A.13}
\]
As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus
\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since, by Lemma A.2, the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is
\[
\theta_i(x) = -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x)
= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x)
= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0 \tag{A.14}
\]
for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that
\[
\sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]
The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality
\[
\sum_{i=0}^{I}\theta_i(x) \le 1
\]
over an arbitrary subinterval of $[0, \beta]$. Thus, the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions
\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty} P_i(\rho x),
\]
\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty} P_i(\rho x),
\]
where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that
\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15}
\]
Then
\[
\frac{d}{dx}\left[-\log\frac{\theta_i}{\gamma_i}\right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16}
\]
verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$ with
\[
(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \qquad\text{for } i > I,
\]
for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying
\[
\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,
\]
\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]
for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by
\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.17}
\]

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that
\[
\theta_{I+} = 1 - \sum_{i=0}^{I}(-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1 - \psi_I).
\]
Then, for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,
\[
L(\psi, \dot\psi) = \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I}
= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\]

Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,
\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]
Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as
\[
\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx
= \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx. \tag{A.18}
\]
The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,
\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],
\]
and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is
\[
\sum_{i=0}^{\infty}\bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]
The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying
\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]
\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I - 1,
\]
for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by
\[
J(\gamma) = \beta + \sum_{i=0}^{I}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then
\[
J(\gamma) = \min_{\pi\in F(1,\omega,\beta)} D(\pi \,\|\, P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be
\[
L(\pi, y, z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right),
\]
where $z = \log\rho$ and $y = \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i = \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,
\[
\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i\log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]

Following standard Lagrangian arguments, we thus have
\[
\inf_{\pi\in F(1,\omega,\beta)} D(\pi \,\|\, P(\beta)) = \inf_{\pi\in F(1,\omega,\beta)} L(\pi, y, z) \ge \inf_{\pi\in\mathbb{R}_+^{\infty}} L(\pi, y, z) = D(\pi^* \,\|\, P(\beta)).
\]
Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have
\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \,\|\, P(\beta)),
\]
where
\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I,\\
C P_i(\rho\beta), & i > I.
\end{cases}
\]
The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as
\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)} + \left(1 - \sum_{i=0}^{I}\omega_i\right)(\log C + (1 - \rho)\beta) + \left(\beta - \sum_{i=0}^{I} i\omega_i\right)\log\rho,
\]
where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
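The agreement between the explicit cost expression and the relative entropy $D(\pi^* \,\|\, P(\beta))$ can be confirmed numerically. The sketch below is an illustration under stated assumptions: rather than solving for the twist parameters, it fixes $\omega$ and $\mu = \rho\beta$, derives $C$, $\beta$ and $\rho$ from the two twist equations of Section A.4, and then evaluates both sides on a truncated support. The function names and the truncation length are hypothetical.

```python
import math

def poisson_pmf(mu, n_terms):
    """P_0(mu), ..., P_{n_terms-1}(mu), built iteratively."""
    probs = [math.exp(-mu)]
    for i in range(1, n_terms):
        probs.append(probs[-1] * mu / i)
    return probs

def exponential_extremal_cost(omega, mu, n_terms=150):
    """Return (closed_form, direct) costs for terminal law (omega, C P_i(mu), i > I).

    omega = [omega_0..omega_I] with sum < 1; mu plays the role of rho * beta.
    """
    top = len(omega) - 1
    tail = poisson_pmf(mu, n_terms)[top + 1:]
    mass = 1.0 - sum(omega)
    c = mass / sum(tail)                                   # first twist equation
    low_mean = sum(i * w for i, w in enumerate(omega))
    beta = low_mean + c * sum(i * p for i, p in enumerate(tail, start=top + 1))
    rho = mu / beta                                        # second twist equation

    pi_star = list(omega) + [c * p for p in tail]
    ref = poisson_pmf(beta, n_terms)                       # zero-cost Poisson law
    direct = sum(p * math.log(p / q)
                 for p, q in zip(pi_star, ref) if p > 0.0 and q > 0.0)
    closed = (sum(w * math.log(w / ref[i]) for i, w in enumerate(omega) if w > 0.0)
              + mass * (math.log(c) + (1.0 - rho) * beta)
              + (beta - low_mean) * math.log(rho))
    return closed, direct

closed, direct = exponential_extremal_cost([0.1, 0.2, 0.25], mu=5.0)
```

The two returned values agree to rounding error, because for $i > I$ each term $\log(\pi^*_i/P_i(\beta))$ equals $\log C + (1-\rho)\beta + i\log\rho$ exactly.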

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k \beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma_{(k)} = \{\gamma_{ki}\}_i$, where the function $\gamma_{ki} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma_{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1 with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega_{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega_{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S^{|\mathcal{K}|}_{\infty}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi_{(k)} \| P(\beta)) \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22}
\]

Recall that $S^{|\mathcal{K}|}_{\infty}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi_{(k)}\}_{k \in \mathcal{K}}$, where each $\pi_{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi_{(0)}, \ldots, \pi_{(k)}, \ldots, \pi_{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi_{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S^{|\mathcal{K}|}_{\infty}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S^{|\mathcal{K}|}_{\infty}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi_{(0)}$ is applied to $\pi_{01}$, and mass from $\pi_{(1)}$ is applied to $\pi_{10}$, as necessary until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi_{(0)}$ and $\pi_{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi_{(k)}$.
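The ordered filling construction can be sketched in code. This is an illustrative version under stated assumptions: the function name is hypothetical, `alpha` and `omega` are lists of equal length $I+1$ describing polynomial constraints, and the order in which classes are drained at each level (lowest class first) is one possible choice that the text does not prescribe.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints.

    alpha[k]: initial fraction of urns holding k balls, k = 0..I
    omega[i]: terminal fraction of urns holding i balls, i = 0..I
    Returns {k: [pi_k0, ..., pi_{k,I-k}]} for the classes with alpha[k] > 0.
    """
    I = len(omega) - 1
    remaining = list(alpha)  # mass of class k not yet assigned to a level
    mass = {k: [0.0] * (I - k + 1) for k in range(I + 1) if alpha[k] > 0}
    for i in range(I + 1):
        need = omega[i]
        for k in mass:
            if k > i:
                continue  # urns never lose balls, so class k cannot end below k
            take = min(remaining[k], need)
            mass[k][i - k] += take  # this is alpha_k * pi_{k,i-k}
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"monotonicity condition violated at level i={i}")
    return {k: [m / alpha[k] for m in row] for k, row in mass.items()}

pi = ordered_filling([0.5, 0.5, 0.0], [0.2, 0.3, 0.5])
print(pi)
```

In this example the implied number of added balls per urn is $\sum_i i\omega_i - \sum_k k\alpha_k = 0.8$, and the constructed point attains it automatically because every class uses up its full mass.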

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_{\infty}$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_{\infty}$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

\[
H_A = \Bigl\{ \pi \in S^{|\mathcal{K}|}_{\infty} : \sum_{k} \alpha_k D(\pi_{(k)} \| P(\beta)) \le A \Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha,\omega,\beta) = F(\alpha,\omega,\beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi_{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show for any sequence in $Q_A$ and for an arbitrary $k$ that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\pi^{(n)}_{kj} \le \varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^{y} - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^{y} - 1),
\]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta}\frac{\beta^{j}}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]

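Taking $y = j/\delta$ and $x = \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\begin{align*}
\sum_{j \ge m} j\pi^{(n)}_{kj}
&= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{kj}\log\Bigl(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\Bigr) - \pi^{(n)}_{kj} + P_j(\beta)\Bigr] + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)} \| P(\beta)) + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&\le \delta A + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta),
\end{align*}

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]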
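The tail estimate above can be illustrated numerically. The sketch below is an assumption-laden check rather than part of the proof: the function names are hypothetical, the test distribution is a truncated and renormalized Poisson(3) law, and the exponential tail sum is truncated at the support of that distribution, which only makes the right-hand side smaller.

```python
import math

def poisson_pmf(j, lam):
    """Poisson(lam) probability of j, computed in log form for stability."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def relative_entropy(pi, P):
    """D(pi || P) with the convention 0 log 0 = 0."""
    return sum(p * math.log(p / q) for p, q in zip(pi, P) if p > 0)

def tail_mean(pi, m):
    """Left-hand side: sum over j >= m of j * pi_j."""
    return sum(j * p for j, p in enumerate(pi) if j >= m)

def tail_bound(pi, beta, m, delta):
    """Right-hand side: delta * D(pi || P(beta)) + delta * sum_{j>=m} (e^{j/delta} - 1) P_j(beta)."""
    P = [poisson_pmf(j, beta) for j in range(len(pi))]
    exp_tail = sum((math.exp(j / delta) - 1.0) * P[j] for j in range(m, len(pi)))
    return delta * relative_entropy(pi, P) + delta * exp_tail

# Test distribution: Poisson(3) truncated to {0, ..., 59} and renormalized.
raw = [poisson_pmf(j, 3.0) for j in range(60)]
pi = [p / sum(raw) for p in raw]

for m, delta in [(5, 0.5), (10, 1.0), (20, 1.0)]:
    print(m, delta, tail_mean(pi, m), tail_bound(pi, 1.0, m, delta))
```

For fixed $\delta$ the second term of the bound vanishes as $m$ grows, while the first term is controlled by choosing $\delta$ small, which is the order of choices used in the text.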

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4 that

\[
\beta^{(n)}_k \to \beta_k .
\]

Finally,

\[
\sum_{k} \alpha_k\beta_k = \lim_{n} \sum_{k} \alpha_k\beta^{(n)}_k = \lim_{n} \beta = \beta,
\]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G = \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi_{(k)} \| P(\beta)) \ge 0 .
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)} \| P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \| \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^*_{(k)} \| P(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)} \| P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

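Proof. For any $A > \liminf_{n} \mathcal{J}^{(n)} = G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{kj} \;\to\; \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{kj}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)} \| P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \| P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.

Thus

\[
\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k\alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\begin{align*}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \| P(\beta^{(n)}))
&\ge \liminf_{n} \sum_{k : \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \| P(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k D(\pi_{(k)} \| P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{align*}

as desired.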
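The infimum formula used in the proof, $\beta - \lambda + \lambda\log(\lambda/\beta)$, is the relative entropy of a Poisson($\lambda$) law with respect to Poisson($\beta$). The following sketch checks this numerically and compares it with one other distribution of the same mean; the function names and the truncation level are assumptions made for illustration only.

```python
import math

def log_poisson(j, lam):
    """log of the Poisson(lam) probability of j."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def kl_to_poisson(pi, beta):
    """D(pi || P(beta)) for a finitely supported pi, with 0 log 0 = 0."""
    return sum(p * (math.log(p) - log_poisson(j, beta))
               for j, p in enumerate(pi) if p > 0)

def min_cost_given_mean(lam, beta):
    """Closed-form infimum of D(pi || P(beta)) over distributions with mean lam."""
    return beta - lam + lam * math.log(lam / beta)

lam, beta, N = 2.0, 0.7, 120
poisson_lam = [math.exp(log_poisson(j, lam)) for j in range(N)]  # truncated Poisson(lam)
point_mass = [0.0, 0.0, 1.0]                                     # unit mass at j = 2, mean 2

print(kl_to_poisson(poisson_lam, beta), min_cost_given_mean(lam, beta))
print(kl_to_poisson(point_mass, beta))
```

The first two printed values should agree up to truncation error, and the point mass with the same mean should have strictly larger cost.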

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S^{|\mathcal{K}|}_{\infty}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S^{|\mathcal{K}|}_{\infty}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S^{|\mathcal{K}|}_{\infty}$, subject to the restrictions that $\hat\pi_{mn} > 0$, that $\hat\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

\[
\hat\omega_i \doteq \sum_{k=0}^{i} \alpha_k\hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\hat\pi_{kj},
\]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha, \hat\omega, \hat\beta)$. Next, for any $\eta > 0$, we may define

\[
\tilde\beta \doteq (\beta - \eta\hat\beta)/(1 - \eta), \qquad \tilde\omega \doteq (\omega - \eta\hat\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \tilde\omega, \tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \hat\omega, \hat\beta)$, and hence for $(\alpha, \tilde\omega, \tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha, \tilde\omega, \tilde\beta)$, and finally form $\bar\pi = \eta\hat\pi + (1 - \eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

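We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\begin{align*}
f'(\varepsilon)
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Bigl(\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} + 1\Bigr)(\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^{\varepsilon}_{mn}}{P_n(\beta)}
 + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \bar\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}
 - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} .
\end{align*}

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k} \alpha_k D(\pi_{(k)} \| P(\beta)) \le 0 .
\]

The second term is bounded above by the expression $-\sum_{k} \alpha_k \sum_{j} \bar\pi_{kj}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.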

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\; j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
 + \sum_{k \in \mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{k+l \in \mathcal{I}} \pi_{kl}\Bigr)
 + \sum_{i \in \mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}} .
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
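The product form $D_k P_j(\beta) W_{k+j}$ can be explored numerically. The sketch below uses iterative proportional fitting, which is a standard technique swapped in here for illustration and not the method of the proof: writing $m_{k,i} = \alpha_k\pi_{k,i-k}$, the constraints fix the row sums ($\alpha_k$) and column sums ($\omega_i$) of $m$, and alternately rescaling rows and columns of the reference array $\alpha_k P_{i-k}(\beta)$ preserves the product form at every step. The function name and the example data are assumptions.

```python
import math

def poisson_pmf(j, lam):
    return math.exp(-lam) * lam ** j / math.factorial(j)

def fit_polynomial_minimizer(alpha, omega, beta, sweeps=2000):
    """Alternate row/column rescaling of q[k, i] = alpha_k * P_{i-k}(beta).

    Returns (m, q), where m[k, i] approximates alpha_k * pi*_{k, i-k} on the
    support k <= i <= I with alpha_k > 0 and omega_i > 0.
    """
    I = len(omega) - 1
    q = {(k, i): alpha[k] * poisson_pmf(i - k, beta)
         for k in range(I + 1) if alpha[k] > 0
         for i in range(k, I + 1) if omega[i] > 0}
    m = dict(q)
    rows = sorted({k for k, _i in m})
    cols = sorted({i for _k, i in m})
    for _sweep in range(sweeps):
        for k in rows:  # enforce sum_i m[k, i] = alpha_k (each pi_(k) has unit mass)
            total = sum(v for (kk, _i), v in m.items() if kk == k)
            for key in m:
                if key[0] == k:
                    m[key] *= alpha[k] / total
        for i in cols:  # enforce sum_k m[k, i] = omega_i (terminal conditions)
            total = sum(v for (_k, ii), v in m.items() if ii == i)
            for key in m:
                if key[1] == i:
                    m[key] *= omega[i] / total
    return m, q

alpha, omega, beta = [0.5, 0.5, 0.0], [0.2, 0.3, 0.5], 0.8
m, q = fit_polynomial_minimizer(alpha, omega, beta)
print({key: round(m[key] / alpha[key[0]], 6) for key in sorted(m)})  # pi_{k, i-k}
```

Because each step multiplies a whole row or column by a constant, the ratio $m_{k,i}/q_{k,i}$ always factors as a function of $k$ times a function of $i$, matching the roles of $D_k$ and $W_{k+j}$ in the theorem.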

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\; k + j \le I,\; k + j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\; k + j > I, \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[
\beta^{(M)} \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l \le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\; k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\begin{align*}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\; k+l \in \mathcal{I}} l\pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\; k+l \in \mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\; k \in \mathcal{K}} \alpha_k\pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{align*}

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\begin{align*}
L^{(M)} ={}& \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\; k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
 + y^{(M)}\Bigl(\beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\; k+l \in \mathcal{I}} l\pi_{kl}\Bigr) \\
& + \sum_{k \in \mathcal{K}} z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k - \sum_{l \le M,\; k+l \in \mathcal{I}} \pi_{kl}\Bigr)
 + \sum_{i \in \mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k\pi_{k,i-k}\Bigr).
\end{align*}

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\,e^{jy^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$ with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*_{(k)}$ by $\beta_k \doteq \sum_{j} j\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi_{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega_{(k)} = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega_{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega_{(k)}, \beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_{j} j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega_{(k)}, \beta_k)$. Then the curves

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1 .
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence, curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

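In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\begin{align*}
\tilde\psi_{k0}(x)
&= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_kP_i(\rho_k\beta_k)\bigr)\Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_kP_i(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i} \\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)P_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i}\Bigr].
\end{align*}

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_kP_i(\rho\beta)W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{00}$ is

\begin{align*}
(-1)^{k}\psi^{(k)}_0(x)
&= \alpha_0C_0\Bigl[\rho^{k}e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^{k}}\Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Bigr] \tag{A.25} \\
&= \alpha_0C_0\rho^{k}\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)P_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{j}\Bigr] \\
&= \frac{\alpha_0C_0\rho^{k}}{C_k}\,\tilde\psi_{k0}(x).
\end{align*}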
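The passage from the first to the second line of (A.25) rests on a short identity for Poisson weights, which is spelled out here as a worked step. It assumes only the standard definition $P_i(\lambda) = e^{-\lambda}\lambda^{i}/i!$ used throughout.

```latex
% Worked step for (A.25), assuming P_i(\lambda) = e^{-\lambda}\lambda^i / i!.
\begin{align*}
P_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^{k}}
  &= e^{-\rho\beta}\,\frac{(\rho\beta)^{i}}{i!}\cdot\frac{i!}{(i-k)!\,\beta^{k}} \\
  &= \rho^{k}\,e^{-\rho\beta}\,\frac{(\rho\beta)^{i-k}}{(i-k)!}
   = \rho^{k}\,P_{i-k}(\rho\beta).
\end{align*}
% Substituting j = i - k turns \sum_{i=k}^{I} into \sum_{j=0}^{I-k},
% with W_i becoming W_{k+j} and (1 - x/\beta)^{i-k} becoming (1 - x/\beta)^{j}.
```

The factor $i!/((i-k)!\,\beta^{k})$ itself comes from differentiating $(1 - x/\beta)^{i}$ a total of $k$ times and absorbing the sign $(-1)^{k}$ on the left-hand side.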

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\begin{align*}
\gamma_i(x)
&= \sum_{k=0}^{i} \alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi^{(i-k)}_{k0}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{i-k}}{dx^{i-k}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^{k}}\,\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi^{(i)}_0(x).
\end{align*}

Similarly, we obtain

\begin{align*}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i} \alpha_k\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{i-k+1}}{dx^{i-k+1}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^{k}}\,\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x),
\end{align*}

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho . \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

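In the polynomial case, similar computations show that

\[
(-1)^{k}\psi^{(k)}_0(x) = \frac{\alpha_0D_0}{D_k}\,\tilde\psi_{k0}(x) \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_kD_k}{\alpha_0D_0}\,\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x) = \gamma_i(x)\,\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)} \tag{A.29}
\]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).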

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \ldots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x))$, which is finite at both endpoints because $(-1)^{i}\psi^{(i)}_0(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\begin{align*}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(\beta)\log|\psi^{(i)}_0(\beta)|\Bigr] - \sum_{k=0}^{K} \alpha_k\log|\psi^{(k)}_0(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta)\log\frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0)\,e^{-\beta}} .
\end{align*}

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi^{(k)}_0(x) = \psi^{(k)}_0(0) \times \tilde\psi_{k0}(x)$, so that

\[
\frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\psi^{(j)}_{k0}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^{j}e^{-\beta}/j!} .
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k\cdot}(\beta) \| P(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta \| \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.
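The joint convexity of $(\theta,\gamma) \to D(\theta \| \gamma)$ that underlies this lemma can be probed numerically. The sketch below checks the convexity inequality along segments between randomly drawn pairs of positive probability vectors; the function names, the vector length and the sampling scheme are assumptions made for illustration, and a finite check of this kind is of course not a proof.

```python
import math
import random

def rel_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i log(theta_i / gamma_i), with 0 log 0 = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def mix(a, b, lam):
    """Convex combination (1 - lam) * a + lam * b, componentwise."""
    return [(1.0 - lam) * x + lam * y for x, y in zip(a, b)]

def random_simplex_point(n, rng):
    """A strictly positive probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def convexity_gap(theta1, gamma1, theta2, gamma2, lam):
    """Chord value minus function value; nonnegative for a jointly convex map."""
    chord = (1.0 - lam) * rel_entropy(theta1, gamma1) + lam * rel_entropy(theta2, gamma2)
    return chord - rel_entropy(mix(theta1, theta2, lam), mix(gamma1, gamma2, lam))

rng = random.Random(0)
gaps = []
for _ in range(1000):
    pts = [random_simplex_point(5, rng) for _ in range(4)]
    gaps.append(convexity_gap(*pts, lam=rng.random()))
print(min(gaps))
```

The smallest gap should be nonnegative up to rounding, consistent with the convexity of $J$ on $\mathcal{O}(\alpha,\omega,\beta)$, since $J$ integrates $D(\theta(x) \| \gamma(x))$ over $x$.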

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0,1] \to \mathbb{R}_{+} \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x) \| \gamma^{\varepsilon}(x))\,dx .
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_{+}[0] = 0$, where $G'_{+}[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining

\[
\eta(x) \doteq \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x,\varepsilon)\,dx . \tag{A.30}
\]

We can assume without loss that $G[1] = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) \ge \alpha_i\tilde\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1 - \varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

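The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\Big|_{\gamma^{\varepsilon}(x)}\eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\Big|_{\gamma^{\varepsilon}(x)}\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\begin{align*}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)} .
\end{align*}

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\Bigl|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\Bigr| < B \quad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.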

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx
\]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[
\int_0^{x} \frac{\partial L}{\partial\psi_i}\Big|_{\gamma}\,dx' = \int_0^{x} \Bigl(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\Bigr)\Big|_{\gamma}\,dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 361, page 209) to the first term, we obtain

\begin{align*}
G'_{+}[0]
&= \int_0^{\beta} \Bigl\{\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\Bigr\}\,dx \\
&= \int_0^{\beta} \dot\eta(x)\cdot\Bigl(-\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\Bigr)\,dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i(\eta_i(0) - \eta_i(\beta)) = 0 .
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1 - \lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x^{(n)}_1)$ and terminal point $\bar\gamma(x^{(n)}_2)$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x) \| \bar\gamma(x))\,dx .
\]

It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_{n} \mathcal{J}\bigl(\bar\gamma(x^{(n)}_1),\ \bar\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1\bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x) \| \bar\gamma(x))\,dx \\
&= J(\bar\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7 and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Léonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

In addition, our analysis gives the sample paths for the higher occupancies conditioned on an unusual number of empty urns, namely $\gamma_i(x) = P_i(\rho x)/\rho$ for $i > 0$. Figure 1 depicts the first five occupancy levels as a function of balls per urn thrown, $x$. In this example the terminal fraction of empty urns, $\omega_0 = 0.15$, is three times larger than the expected value $P_0(3.0) = 0.05$.
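The numbers in this example can be reproduced with a short sketch. It assumes the case $I = 0$ of the twist equations, which gives $C = 1/\rho$ and hence $\omega_0 = 1 - (1 - e^{-\rho\beta})/\rho$; this reduction is our own reading rather than a formula quoted from the text. The function names (`empty_fraction`, `solve_rho`, `gamma`) and the bisection bracket are illustrative choices.

```python
import math

def poisson_pmf(i, mean):
    """P_i(mean): probability of the value i under a Poisson law with this mean."""
    return math.exp(-mean) * mean ** i / math.factorial(i)

def empty_fraction(rho, beta):
    """Terminal fraction of empty urns implied by twist rho (assumes I = 0, C = 1/rho)."""
    return 1.0 - (1.0 - math.exp(-rho * beta)) / rho

def solve_rho(omega0, beta, lo=1e-9, hi=50.0, iters=200):
    """Bisection on rho; empty_fraction is increasing in rho on (0, infinity)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if empty_fraction(mid, beta) < omega0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma(i, x, rho):
    """Conditioned sample path gamma_i(x) = P_i(rho * x) / rho, valid for i > 0."""
    return poisson_pmf(i, rho * x) / rho

beta, omega0 = 3.0, 0.15
rho = solve_rho(omega0, beta)
# Terminal values of the first five occupancy levels (the curves of Figure 1 at x = beta).
levels = [gamma(i, beta, rho) for i in range(1, 6)]
```

With $\rho = 1$ the same function returns the zero-cost value $P_0(\beta)$, which is a convenient sanity check on the reduction.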

4.2. The overflow problem. In the overflow problem, an urn is considered full when it reaches a finite capacity $I > 0$. Once an urn has been filled, successive balls thrown to that urn fall to the floor. For specificity, we consider the problem of determining the probability that an unusually large number of balls end up on the floor, and assume the empty initial conditions $\alpha_0 = 1$.

The problem can be handled using urns of infinite capacity in the following way. The number of balls that would have fallen on the floor in a finite capacity system is the number of balls in urns with occupancy greater than $I$, minus $I$ times the number of such urns. When $r$ balls have been thrown, the random number of overflowing balls $w(r)$ is thus

\[ w(r) = r - n\Gamma_1(r/n) - 2n\Gamma_2(r/n) - \cdots - In\Gamma_I(r/n) - In\Gamma_{I+}(r/n), \]

or, since $\Gamma_{I+} = 1 - \sum_{i=0}^{I} \Gamma_i$,

\[ w(r) = r - nI + n\sum_{i=0}^{I} (I - i)\,\Gamma_i(r/n). \]
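As an illustration of this bookkeeping, the following sketch throws balls into infinite-capacity urns and compares the direct overflow count with the formula above. The urn counts, the seed and the function names are arbitrary choices for the example; the code works with the integer counts $n\Gamma_i$ rather than the fractions $\Gamma_i$ to avoid rounding.

```python
import random

def overflow_direct(occupancy, capacity):
    """Balls on the floor if every urn held at most `capacity` balls."""
    return sum(max(c - capacity, 0) for c in occupancy)

def overflow_formula(occupancy, capacity):
    """w(r) = r - n*I + sum_{i=0}^{I} (I - i) * (n * Gamma_i)."""
    n = len(occupancy)
    r = sum(occupancy)
    # counts[i] = n * Gamma_i, the number of urns holding exactly i balls.
    counts = [sum(1 for c in occupancy if c == i) for i in range(capacity + 1)]
    spare = sum((capacity - i) * counts[i] for i in range(capacity + 1))
    return r - n * capacity + spare

rng = random.Random(0)
n, r, I = 50, 120, 3
occupancy = [0] * n
for _ in range(r):
    occupancy[rng.randrange(n)] += 1  # each ball picks an urn uniformly at random

w_direct = overflow_direct(occupancy, I)
w_formula = overflow_formula(occupancy, I)
```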

In order to compute

\[ J_O(\eta,\beta) = -\lim_{n} \frac{1}{n} \log P\left( \frac{w(\beta n)}{n} > \eta \right), \]

we therefore consider sample paths which satisfy the end constraint

\[ \sum_{i=0}^{I} (I - i)\,\gamma_i(\beta) \ge \eta + I - \beta = \zeta. \tag{4.2} \]

Note that $\zeta$ can be interpreted as the average spare capacity per urn, which must satisfy the bounds $[I - \beta]^+ \le \zeta < I$, and that the average overflow satisfies $[\beta - I]^+ \le \eta < \beta$. Assuming that $\eta$ (and $\zeta$) is larger than would be expected in the zero cost case, the minimum large deviations exponent will be achieved with equality in the constraint (4.2). Assuming that $\eta > 0$, we are in the exponential case, and the large deviations exponent $J_O(\eta,\beta)$ will be given by minimizing the divergence between $\pi$ and $P(\beta)$ under the linear equality constraints

\[ \sum_{i=0}^{\infty} \pi_i = 1, \qquad \sum_{i=0}^{\infty} i\,\pi_i = \beta \qquad\text{and}\qquad \sum_{i=0}^{I} (I - i)\,\pi_i = \zeta. \tag{4.3} \]

We introduce the Lagrangian

\[ L(\pi, y, z, \lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty} \pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\,\pi_i\right) + \lambda\left(\zeta - \sum_{i=0}^{I} (I - i)\,\pi_i\right), \]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[ \log\pi^*_i = \log P_i(\beta) - 1 + y + iz + (I - i)^+\lambda. \]

In terms of variables $C$, $\rho$ and $\nu$ we may write

\[ \pi^*_i = C\,P_i(\rho\beta) \quad\text{for } i \ge I \]

and

\[ \pi^*_i = C\,P_i(\rho\beta)\,\nu^{I-i} \quad\text{for } i \le I. \]

The distribution is conditionally Poisson($\rho\beta/\nu$) for $i \le I$ and conditionally Poisson($\rho\beta$) for $i \ge I$. For convenience we introduce the notation

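\[ Q_I(\rho) = \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta} \sum_{i=I+1}^{\infty} i\,P_i(\rho\beta), \]

\[ R_I(\rho,\nu) = \nu^{I} e^{-\rho\beta(1 - 1/\nu)} \sum_{i=0}^{I-1} P_i\!\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta}\, e^{-\rho\beta(1 - 1/\nu)} \sum_{i=1}^{I} i\,P_i\!\left(\frac{\rho\beta}{\nu}\right). \]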

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[ C\,(R_I(\rho,\nu) + Q_I(\rho)) = 1, \]

\[ C\left( \frac{\rho\beta}{\nu}\,R_I(\rho,\nu) + \rho\beta\,Q_I(\rho) \right) = \beta, \]

\[ C\left( I\,P_I(\rho\beta) + \left(I - \frac{\rho\beta}{\nu}\right) R_I(\rho,\nu) \right) = \zeta. \]

There can be at most one positive triple $(C, \rho, \nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\,\|\,P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[ (\rho/\nu - 1)\,R_I(\rho,\nu) + (\rho - 1)\,Q_I(\rho) = 0, \]

\[ (I - \rho\beta/\nu - \zeta)\,R_I(\rho,\nu) - \zeta\,Q_I(\rho) + I\,P_I(\rho\beta) = 0. \]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho, \nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

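The large deviations exponent for the overflow problem may then be expressed

\[
\begin{aligned}
J_O(\eta,\beta) &= D(\pi^*\,\|\,P(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl(C\,e^{\beta - \rho\beta}\,\rho^{i}\,\nu^{(I-i)^+}\bigr) \\
&= \log C + \beta(1 - \rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}
\]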
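The closed form can be checked numerically without solving the nonlinear system. The sketch below works in the forward direction: it fixes $I$, $\nu$ and the product $\rho\beta$ (called `lam`, a name introduced here), builds $\pi^*$, and then reads off the $\beta$, $\zeta$ and $\rho$ that this distribution satisfies. Truncating the series at 80 terms is an assumption that is harmless for the small means used; this is a consistency check, not a solver for given $(\eta, \beta)$.

```python
import math

def poisson_pmf(i, mean):
    """P_i(mean) for a Poisson law with the given mean."""
    return math.exp(-mean) * mean ** i / math.factorial(i)

def tilted_overflow_law(lam, nu, I, terms=80):
    """Return (C, pi) with pi_i = C * P_i(lam) * nu**max(I - i, 0), where lam = rho * beta."""
    weights = [poisson_pmf(i, lam) * nu ** max(I - i, 0) for i in range(terms)]
    C = 1.0 / sum(weights)  # first constraint in (4.3): total mass one
    return C, [C * w for w in weights]

def relative_entropy(pi, beta):
    """D(pi || P(beta)), summed over the truncated support."""
    return sum(p * math.log(p / poisson_pmf(i, beta)) for i, p in enumerate(pi) if p > 0.0)

I, lam, nu = 3, 2.4, 1.5
C, pi = tilted_overflow_law(lam, nu, I)
beta = sum(i * p for i, p in enumerate(pi))                  # second constraint in (4.3)
zeta = sum((I - i) * p for i, p in enumerate(pi[: I + 1]))   # third constraint in (4.3)
rho = lam / beta

closed_form = math.log(C) + beta * (1 - rho) + beta * math.log(rho) + zeta * math.log(nu)
direct = relative_entropy(pi, beta)
```

For these values the construction lands in the regime $\nu > \rho > 1$, that is, a larger than expected spare capacity $\zeta$.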

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons, with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[ J_C(\alpha,\beta,\xi) = -\lim_{n} \frac{1}{n} \log P\left( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \right), \]

where

\[ \xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta) \]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[ \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi, \]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[ \omega_{kj} = \pi^*_{kj} = \begin{cases} C_k\,P_j(\rho\beta)\,W, & j + k \le I, \\ C_k\,P_j(\rho\beta), & j + k > I. \end{cases} \]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

\[ J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k \log C_k. \]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
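The zero-cost figures quoted in this example follow from the bound on $\xi$ given earlier, $\sum_k \alpha_k \sum_{i \le I-k} P_i(\beta)$. The sketch below evaluates it and converts the quoted exponent into a probability estimate $e^{-nJ_C}$; the exponent 0.18 is taken from the text rather than computed, and the function names are illustrative.

```python
import math

def poisson_cdf(m, mean):
    """Probability that a Poisson(mean) variable is at most m."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(m + 1))

def expected_low_occupancy(alpha, beta, I):
    """Zero-cost fraction of urns holding at most I balls after beta*n further throws.

    alpha[k] is the initial fraction of urns holding k balls."""
    return sum(a * poisson_cdf(I - k, beta) for k, a in enumerate(alpha) if k <= I)

alpha, beta, I, n = [0.5, 0.3, 0.2], 2.0, 3, 100
psi3 = expected_low_occupancy(alpha, beta, I)   # zero-cost value of psi_3(2.0)
expected_types_with_four = n * (1.0 - psi3)     # types with 4 or more coupons
prob_estimate = math.exp(-n * 0.18)             # exp(-n * J_C), exponent quoted in the text
```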

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets, and when the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible, from the point of view of large deviations, by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

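A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[ \sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad\text{(monotonicity)} \tag{A.4} \]

and

\[ \sum_{i=0}^{I} i\,\omega_i + (I + 1)\,\omega_{I+} \le \sum_{i=0}^{I+1} i\,\alpha_i + \beta \quad\text{(conservation)}. \tag{A.5} \]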

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I} \left( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \right) &= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j\,(\omega_j - \alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[ \gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\,\frac{x}{\beta}, \qquad i = 0, \ldots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\,\frac{x}{\beta} \]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).
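The two feasibility conditions and the linear path used in the proof translate directly into a small check. In this sketch `alpha` and `omega` are lists of length $I + 2$ whose last entry is the "more than $I$" class ($\alpha_{I+1}$ and $\omega_{I+}$ respectively); that layout, the tolerance and the function names are conventions chosen for the example.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check (A.4) and (A.5) for endpoint constraints (alpha, omega, beta)."""
    I = len(alpha) - 2
    # (A.4) monotonicity: partial sums of alpha dominate those of omega for i = 0..I.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False
    # (A.5) conservation: urns in the last class are counted as holding I + 1 balls.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol

def linear_path(alpha, omega, beta, x):
    """The linear occupancy path of the proof, evaluated at x in [0, beta]."""
    return [a + (w - a) * x / beta for a, w in zip(alpha, omega)]

alpha = [1.0, 0.0, 0.0]   # I = 1, all urns initially empty
omega = [0.5, 0.3, 0.2]
feasible_with_enough_balls = is_feasible(alpha, omega, beta=1.0)
feasible_with_too_few_balls = is_feasible(alpha, omega, beta=0.5)
midpoint = linear_path(alpha, omega, 1.0, 0.5)
```

With `beta=0.5` the terminal state would require at least 0.7 balls per urn, so the conservation condition fails.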

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[ J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx. \]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi, \theta) &= D(\theta\,\|\,\gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left(1 - \sum_{i=0}^{I} \theta_i\right) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{aligned}
\]

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
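A minimal sketch of this local rate function follows. It takes the cumulative occupancies $\psi_0, \ldots, \psi_I$ and the rates $\theta_0, \ldots, \theta_I$ at a single instant, uses the convention $\psi_{-1} = 0$, and returns infinity when mass is sent to an empty class or a rate is negative. The pointwise treatment of the infinite case is a simplification assumed here; the text defines it at the level of whole curves.

```python
import math

def local_rate(psi, theta):
    """L(psi, theta) = D(theta || gamma) at one instant.

    psi:   cumulative occupancies psi_0..psi_I (nondecreasing, at most 1)
    theta: rates theta_0..theta_I at which balls enter urns holding i balls
    """
    # gamma_i = psi_i - psi_{i-1}, with psi_{-1} = 0, plus the 'more than I' class.
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, len(psi))]
    gamma.append(1.0 - psi[-1])
    rates = list(theta) + [1.0 - sum(theta)]
    cost = 0.0
    for t, g in zip(rates, gamma):
        if t < 0.0 or (t > 0.0 and g <= 0.0):
            return math.inf  # not a valid occupancy rate at this instant
        if t > 0.0:
            cost += t * math.log(t / g)
    return cost

psi = [0.5, 0.8]                          # gamma = (0.5, 0.3, 0.2)
zero_cost = local_rate(psi, [0.5, 0.3])   # theta equal to gamma: expected behavior
tilted = local_rate(psi, [0.2, 0.3])      # extra mass pushed into the top class
```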

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[ \frac{\partial L}{\partial\psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\,\frac{\partial L}{\partial\theta_i}(\psi(x), \theta(x)) \]

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

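In the case at hand, the Euler–Lagrange equations are given by

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right] \tag{A.7} \]

for $i = 0, \ldots, I - 1$, and by

\[ -\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\left[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right]. \tag{A.8} \]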

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[ L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9} \]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \right] \tag{A.10} \]

for $i = 0, \ldots, I - 1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[ \sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C\,P_i(\rho\beta) = 1, \]

\[ \sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C\,P_i(\rho\beta) = \beta. \]

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints, and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[ \psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C\,P_k(\rho\beta)) \left(1 - \frac{x}{\beta}\right)^{k}, \tag{A.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!}\,(-1)^i\,\psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) \]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[ \psi_0(x) = \sum_{k=0}^{I} \omega_k \left(1 - \frac{x}{\beta}\right)^{k}. \tag{A.10} \]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.
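The construction in Theorem A.1 can be exercised numerically. The sketch below avoids solving the twist equations: it fixes $I = 1$, $\beta$, $\rho$ and $C$, and then chooses $\omega_0, \omega_1$ so that the two twist equations hold, which is an assumption made only to obtain a consistent test case. The derivatives of $\psi_0$ are computed term by term from the explicit form of $\psi_0$; the function names are illustrative.

```python
import math

def poisson_pmf(i, mean):
    """P_i(mean) for a Poisson law with the given mean."""
    return math.exp(-mean) * mean ** i / math.factorial(i)

def extremal_gamma(i, x, omega, C, rho, beta):
    """gamma_i(x) = x**i / i! * (-1)**i * psi0^{(i)}(x) for 0 <= i <= I = len(omega) - 1."""
    I = len(omega) - 1
    signed = rho ** i * C * math.exp(-rho * x)  # exponential part of (-1)^i psi0^{(i)}
    for k in range(i, I + 1):
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) // math.factorial(k - i)  # k! / (k - i)!
        signed += coeff * beta ** (-i) * falling * (1.0 - x / beta) ** (k - i)
    return x ** i / math.factorial(i) * signed

# Forward construction of a consistent example: pick the twist, derive omega.
I, beta, rho, C = 1, 1.0, 1.2, 0.9
tail_mass = 1.0 - sum(poisson_pmf(i, rho * beta) for i in range(I + 1))
tail_mean = rho * beta - sum(i * poisson_pmf(i, rho * beta) for i in range(I + 1))
omega1 = beta - C * tail_mean          # second twist equation (I = 1)
omega0 = 1.0 - C * tail_mass - omega1  # first twist equation
omega = [omega0, omega1]

start = [extremal_gamma(i, 0.0, omega, C, rho, beta) for i in range(I + 1)]
end = [extremal_gamma(i, beta, omega, C, rho, beta) for i in range(I + 1)]
```

Here `start` should be the empty initial condition $(1, 0)$ and `end` should reproduce $(\omega_0, \omega_1)$.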

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\phi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[ (-1)^i \phi^{(i)}(x) \ge 0 \quad\text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11} \]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[ (-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} (\omega_k - C\,P_k(\rho\beta))\,\beta^{-i}\,\frac{k!}{(k - i)!} \left(1 - \frac{x}{\beta}\right)^{k-i} \]

for $i = 0, \ldots, I$, and

\[ (-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12} \]

for $i > I$. It is clear that $a(x) = (-1)^{I+1} \psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\phi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[
\begin{aligned}
\phi(\beta) &= (-1)^I \psi_0^{(I)}(\beta) \\
&= \rho^I C e^{-\rho\beta} + \left( \omega_I - C e^{-\rho\beta}\,\frac{(\rho\beta)^I}{I!} \right) \beta^{-I}\,I! \\
&= \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I \ge 0,
\end{aligned}
\]

so that $\phi(x) > 0$ on $[0, \beta)$. It follows that $\phi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that

\[ (-1)^I \psi_0^{(I)}(x) = \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I > 0 \quad\text{for all } x \in [0, \beta]. \]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[ \psi_0(0) = C \left( 1 - \sum_{i=0}^{I} P_i(\rho\beta) \right) + \sum_{i=0}^{I} \omega_i = 1. \]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[ \gamma_i(x) = \frac{x^i}{i!}\,(-1)^i\,\psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I + 1, \ldots. \tag{A.13} \]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\,\psi_0^{(i)}(x)$, and thus

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\,\psi_0^{(i)}(x) = \psi_0(0) = 1. \]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $(\gamma_i(x))_{i \ge 0} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$ the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) \\
&= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k - 1)!}\,(-1)^k\,\psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}\,(-1)^{k+1}\,\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[ \sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1. \]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[ \sum_{i=0}^{I} \theta_i(x) \le 1 \]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[ \theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x), \]

\[ 1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x), \]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

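From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15} \]

Then

\[ \frac{d}{dx}\left[ -\log\frac{\theta_i}{\gamma_i} \right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16} \]

verifying (A.7). Likewise, to verify (A.8) we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).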

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

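Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

\[ (-1)^i \psi_0^{(i)}(x) = C \rho^i e^{-\rho x} \quad\text{for } i > I, \]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[ \sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0, \]

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{\infty} \left[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \right]. \tag{A.17} \]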

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[ \theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho\,(1 - \psi_I). \]

Then for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]

Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[ J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx. \]

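Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} (\dot\psi_i(x) - \dot\psi_{i-1}(x)) \log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[ -\dot\psi_i(x) \log|\psi_0^{(i)}(x)| = \rho\,\gamma_i(x)\,[\log C + i\log\rho - \rho x], \]

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[ \sum_{i=0}^{\infty} \left[ \gamma_i \log|\psi_0^{(i)}| \right]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx. \]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.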

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[ \sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1, \]

\[ -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I - 1, \]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{I} \left[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \right]. \tag{A.19} \]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i |\psi_0^{(i)}(x)| / i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[ J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi\,\|\,P(\beta)). \]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\dots,I$, and define the Lagrangian $L$ on this set to be
\[
L(\pi; y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty}\pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\,\pi_i\Bigr),
\]
where $z = \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log \mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,
\[
\pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - z\,i\,\pi_i \;\ge\; \pi^*_i \log\frac{\pi^*_i}{\mathcal{P}_i(\beta)} - y\pi^*_i - z\,i\,\pi^*_i.
\]
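The pointwise minimization used above can be checked numerically. The following is a minimal sketch, assuming illustrative values of $\beta$, $\rho$ and $C$ that are not taken from the paper; the function names are hypothetical. Setting the derivative of $x\log x - x[\log \mathcal{P}_i(\beta) + y + iz]$ to zero gives $x^* = \mathcal{P}_i(\beta)e^{y+iz-1}$, and with $z = \log\rho$, $y = \log C + (1-\rho)\beta + 1$ this should coincide with $C\,\mathcal{P}_i(\rho\beta)$.

```python
import math

def poisson_pmf(i: int, mean: float) -> float:
    """P_i(mean) = exp(-mean) * mean**i / i!, evaluated in log space."""
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

def pointwise_objective(x: float, i: int, beta: float, y: float, z: float) -> float:
    """x log x - x [log P_i(beta) + y + i z], with the convention 0 log 0 = 0."""
    if x == 0.0:
        return 0.0
    return x * math.log(x) - x * (math.log(poisson_pmf(i, beta)) + y + i * z)

# Illustrative (assumed) twist parameters and terminal time.
beta, rho, C = 2.0, 1.5, 0.8
z = math.log(rho)
y = math.log(C) + (1.0 - rho) * beta + 1.0

def stationary_point(i: int) -> float:
    """Root of log x + 1 - [log P_i(beta) + y + i z] = 0."""
    return poisson_pmf(i, beta) * math.exp(y + i * z - 1.0)

if __name__ == "__main__":
    for i in range(3, 7):
        print(i, stationary_point(i), C * poisson_pmf(i, rho * beta))
```

Because the objective is strictly convex in $x$, the stationary point is the unique global minimizer, which is what the displayed inequality expresses.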


Following standard Lagrangian arguments, we thus have
\[
\inf_{\pi \in \mathcal{F}(1,\omega,\beta)} D(\pi \,\|\, \mathcal{P}(\beta)) = \inf_{\pi \in \mathcal{F}(1,\omega,\beta)} L(\pi; y, z) \;\ge\; \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi; y, z) = D(\pi^* \,\|\, \mathcal{P}(\beta)).
\]

Since $\pi^* \in \mathcal{F}(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

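Substituting the particular form of $\pi^*_i$ into (A.17), we have
\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \,\|\, \mathcal{P}(\beta)),
\]
where
\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C\,\mathcal{P}_i(\rho\beta), & i > I.
\end{cases}
\]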

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as
\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr) + \Bigl(\beta - \sum_{i=0}^{I} i\,\omega_i\Bigr)\log\rho,
\]
where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
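As a sanity check on the explicit cost formula, the sketch below builds a terminal distribution of the stated form for $I = 2$ and compares the directly summed relative entropy with the closed-form expression. The numerical values and function names are assumptions made for illustration; in particular, $\omega_0$ and $\omega_1$ are solved from the mass and mean constraints rather than taken from the paper, and $C$ is supplied directly instead of through (2.6).

```python
import math

def log_poisson(i: int, mean: float) -> float:
    """log P_i(mean) for a Poisson distribution."""
    return -mean + i * math.log(mean) - math.lgamma(i + 1)

def poisson_pmf(i: int, mean: float) -> float:
    return math.exp(log_poisson(i, mean))

def build_terminal_distribution(beta, rho, C, omega_2, n_terms=100):
    """For I = 2: pi_i = omega_i (i <= 2) and C * P_i(rho*beta) (i > 2).

    omega_1 and omega_0 are chosen so that pi has total mass 1 and mean beta.
    """
    head = [poisson_pmf(i, rho * beta) for i in range(3)]
    tail_mass = C * (1.0 - sum(head))
    tail_mean = C * (rho * beta - sum(i * p for i, p in enumerate(head)))
    omega_1 = beta - tail_mean - 2.0 * omega_2
    omega_0 = 1.0 - tail_mass - omega_1 - omega_2
    omega = [omega_0, omega_1, omega_2]
    pi = omega + [C * poisson_pmf(i, rho * beta) for i in range(3, n_terms)]
    return omega, pi

def relative_entropy_to_poisson(pi, beta):
    """D(pi || P(beta)) summed over the (truncated) support of pi."""
    return sum(p * (math.log(p) - log_poisson(i, beta))
               for i, p in enumerate(pi) if p > 0.0)

def explicit_cost(omega, beta, rho, C):
    """Closed-form cost of an exponential extremal in terms of rho, omega, beta."""
    first = sum(w * (math.log(w) - log_poisson(i, beta))
                for i, w in enumerate(omega) if w > 0.0)
    mass_gap = 1.0 - sum(omega)
    mean_gap = beta - sum(i * w for i, w in enumerate(omega))
    return first + mass_gap * (math.log(C) + (1.0 - rho) * beta) + mean_gap * math.log(rho)

if __name__ == "__main__":
    omega, pi = build_terminal_distribution(beta=2.0, rho=1.5, C=0.8, omega_2=0.01)
    print(relative_entropy_to_poisson(pi, 2.0), explicit_cost(omega, 2.0, 1.5, 0.8))
```

The agreement rests on the identity $\mathcal{P}_i(\rho\beta)/\mathcal{P}_i(\beta) = \rho^i e^{(1-\rho)\beta}$, which turns each tail term of the relative entropy into $\log C + (1-\rho)\beta + i\log\rho$.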

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{ki}\}_i$, where the function $\gamma_{ki} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by
\[
\gamma_i(x) = \sum_{k\in\mathcal{K}} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.
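The superposition of class curves can be illustrated with a short sketch. It assumes the simplest possible subproblem curves, namely the zero cost Poisson path $\gamma_{kj}(s) = \mathcal{P}_j(s)$ with $\beta_k = \beta$ for every class; this is an illustrative special case rather than a general extremal, and the function names are hypothetical.

```python
import math

def poisson_pmf(j: int, mean: float) -> float:
    """P_j(mean), with the convention that mean = 0 puts all mass on j = 0."""
    if mean == 0.0:
        return 1.0 if j == 0 else 0.0
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def zero_cost_class_curve(k: int, j: int, s: float) -> float:
    """Assumed subproblem curve: fraction of class-k urns holding j extra balls."""
    return poisson_pmf(j, s)

def overall_curve(i, x, alpha, beta_k, beta, class_curve):
    """gamma_i(x) = sum over classes k <= i of alpha_k * gamma_{k,i-k}(x*beta_k/beta)."""
    total = 0.0
    for k, a in enumerate(alpha):
        if a > 0.0 and k <= i:
            total += a * class_curve(k, i - k, x * beta_k[k] / beta)
    return total

if __name__ == "__main__":
    alpha = [0.5, 0.0, 0.3, 0.2]      # initial occupancy fractions (assumed)
    beta = 1.5
    beta_k = [beta] * len(alpha)      # every class receives beta balls per urn
    curve = [overall_curve(i, beta, alpha, beta_k, beta, zero_cost_class_curve)
             for i in range(12)]
    print([round(v, 4) for v in curve])
```

At $x = 0$ the mixture reduces to the initial condition $\alpha$, and for every $x$ the curves sum to one, as required of an occupancy path.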

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

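The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be
\[
\min_{\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}} \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \tag{A.20}
\]
subject to the conservation constraint
\[
\sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{kj} = \beta \tag{A.21}
\]
and the terminal conditions
\[
\omega_i = \sum_{k \le i,\; k\in\mathcal{K}} \alpha_k\,\pi_{k,i-k} \qquad \text{for all } 0 \le i \le I. \tag{A.22}
\]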

Recall that $\mathcal{S}_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case the constraints imply that
\[
\pi_{kj} = 0, \qquad k + j > I, \tag{A.23}
\]
so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\dots,\pi^{(k)},\dots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in \mathcal{S}_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in \mathcal{S}_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
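The ordered filling construction described in the proof can be sketched as a small greedy routine. This is one possible reading of the construction, under the assumption of polynomial constraints (total mass of $\omega$ equal to one, and no initial class above level $I$); the function name and the choice to draw from lower classes first are illustrative, not taken from the paper.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints.

    Returns mass[k][j] = alpha_k * pi_{kj}, the fraction of all urns that start
    with k balls and receive j more, so that sum_k mass[k][i-k] = omega_i.
    Raises ValueError when a monotonicity condition fails at some level i.
    """
    I = len(omega) - 1
    remaining = list(alpha)                      # unassigned mass in each class
    mass = [[0.0] * max(I - k + 1, 0) for k in range(len(alpha))]
    for i, target in enumerate(omega):
        need = target
        # Only classes with initial occupancy k <= i can end at level i.
        for k in range(min(i, len(alpha) - 1) + 1):
            take = min(remaining[k], need)
            mass[k][i - k] = take
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"not enough mass available at level {i}")
    return mass

if __name__ == "__main__":
    alpha = [0.6, 0.4]          # initial occupancy fractions (assumed example)
    omega = [0.2, 0.5, 0.3]     # terminal occupancy fractions (assumed example)
    for k, row in enumerate(ordered_filling(alpha, omega)):
        print(k, [round(v / alpha[k], 4) for v in row])   # pi_{kj}
```

The point produced this way has finite support by construction, which is all the proof requires of it; it is generally not the minimizer of (A.20).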

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $\mathcal{S}_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $\mathcal{S}_\infty$ this distance is simply
\[
\sum_{i:\,P_i > Q_i} (P_i - Q_i) = \sum_{i:\,Q_i > P_i} (Q_i - P_i),
\]
and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set
\[
H_A \doteq \Bigl\{ \pi \in \mathcal{S}_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \le A \Bigr\}.
\]
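The two one-sided sums in the distance formula quoted above agree because both distributions have total mass one, and their common value is half the $\ell_1$ distance. A minimal sketch, using two truncated Poisson distributions as assumed test data:

```python
import math

def poisson_pmf(j: int, mean: float) -> float:
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def one_sided_gaps(p, q):
    """Return (sum of P_i - Q_i where P_i > Q_i, sum of Q_i - P_i where Q_i > P_i)."""
    up = sum(pi - qi for pi, qi in zip(p, q) if pi > qi)
    down = sum(qi - pi for pi, qi in zip(p, q) if qi > pi)
    return up, down

if __name__ == "__main__":
    p = [poisson_pmf(j, 1.0) for j in range(60)]
    q = [poisson_pmf(j, 2.0) for j in range(60)]
    print(one_sided_gaps(p, q))
```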

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that
\[
Q_A(\alpha,\omega,\beta) \doteq \mathcal{F}(\alpha,\omega,\beta) \cap H_A
\]
is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)(k)}$ is uniformly integrable. In the present context uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that
\[
\sum_{j \ge m} j\,\pi^{(n)}_{kj} \le \varepsilon \tag{A.24}
\]
for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality
\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]
where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

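\[
\sum_{j=0}^{\infty} e^{j/\delta}\,\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\,\frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]

Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/\mathcal{P}_j(\beta)$, we have the estimate
\begin{align*}
\sum_{j \ge m} j\,\pi^{(n)}_{kj}
&= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
&\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{kj}\log\Bigl(\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\Bigr) - \pi^{(n)}_{kj} + \mathcal{P}_j(\beta)\Bigr] + \delta\sum_{j \ge m}(e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
&= \delta\,D(\pi^{(n)(k)} \,\|\, \mathcal{P}(\beta)) + \delta\sum_{j \ge m}(e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
&\le \delta A + \delta\sum_{j \ge m}(e^{j/\delta} - 1)\,\mathcal{P}_j(\beta),
\end{align*}
where the second equality uses the fact that both $\pi^{(n)(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]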
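The three ingredients of the uniform integrability estimate can be checked numerically. The sketch below is illustrative only: the test distribution (a Poisson with a different mean), the values of $\beta$, $\delta$ and $m$, and all function names are assumptions, and infinite sums are truncated.

```python
import math

def log_poisson(j: int, mean: float) -> float:
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def young_rhs(x: float, y: float) -> float:
    """(x log x - x + 1) + (e^y - 1), with 0 log 0 = 0."""
    entropy_part = 1.0 if x == 0.0 else x * math.log(x) - x + 1.0
    return entropy_part + math.exp(y) - 1.0

def exponential_moment(beta: float, delta: float, n_terms: int = 200) -> float:
    """Truncated sum of e^{j/delta} P_j(beta), each term computed in log space."""
    return sum(math.exp(j / delta + log_poisson(j, beta)) for j in range(n_terms))

def tail_bound(pi, beta, delta, m):
    """Return (sum_{j>=m} j pi_j, delta*D(pi||P(beta)) + delta*sum_{j>=m}(e^{j/delta}-1)P_j)."""
    lhs = sum(j * p for j, p in enumerate(pi) if j >= m)
    entropy = sum(p * (math.log(p) - log_poisson(j, beta))
                  for j, p in enumerate(pi) if p > 0.0)
    tail = sum(math.exp(j / delta + log_poisson(j, beta)) - math.exp(log_poisson(j, beta))
               for j in range(m, len(pi)))
    return lhs, delta * entropy + delta * tail

if __name__ == "__main__":
    pi = [math.exp(log_poisson(j, 3.0)) for j in range(80)]   # assumed test distribution
    print(exponential_moment(2.0, 0.5), math.exp(2.0 * math.exp(2.0) - 2.0))
    print(tail_bound(pi, beta=2.0, delta=0.5, m=40))
```

For small $m$ the right-hand side is dominated by the exponential-moment tail and the bound is very loose; it becomes informative only once $m$ is large, which matches the order of quantifiers in the text (first $\delta$, then $m$).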

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that
\[
\beta^{(n)}_k \to \beta_k.
\]
Finally,
\[
\sum_{k} \alpha_k\beta_k = \lim_{n} \sum_{k} \alpha_k\beta^{(n)}_k = \lim_{n} \beta = \beta,
\]
so that $\pi \in \mathcal{F}(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let
\[
G \doteq \inf_{\pi \in Q_A} \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \ge 0.
\]
Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)(k)} \,\|\, \mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that
\[
\sum_{k\in\mathcal{K}} \alpha_k D(\pi^{*(k)} \,\|\, \mathcal{P}(\beta)) \le \liminf_{n} \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{(n)(k)} \,\|\, \mathcal{P}(\beta)) = G,
\]
where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$, such that
\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta)
\]
componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then
\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that
\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]
and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then
\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{kj} \;\to\; \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{kj}, \qquad k = 0,1,\dots,K.
\]
If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)(k)} \,\|\, \mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.
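The mean-constrained infimum $\beta - \lambda + \lambda\log(\lambda/\beta)$ quoted in the proof is attained by the Poisson distribution with mean $\lambda$. The sketch below checks this numerically and compares against a geometric distribution with the same mean; the parameter values, the choice of a geometric competitor and the function names are assumptions for illustration.

```python
import math

def log_poisson(j: int, mean: float) -> float:
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def relative_entropy_to_poisson(pi, beta):
    """D(pi || P(beta)) over the (truncated) support of pi."""
    return sum(p * (math.log(p) - log_poisson(j, beta))
               for j, p in enumerate(pi) if p > 0.0)

def mean_constrained_infimum(lam: float, beta: float) -> float:
    """beta - lam + lam * log(lam / beta)."""
    return beta - lam + lam * math.log(lam / beta)

if __name__ == "__main__":
    beta, lam = 2.0, 3.0
    poisson = [math.exp(log_poisson(j, lam)) for j in range(120)]
    q = lam / (1.0 + lam)                           # geometric with mean lam
    geometric = [(1.0 - q) * q ** j for j in range(400)]
    print(mean_constrained_infimum(lam, beta))
    print(relative_entropy_to_poisson(poisson, beta))
    print(relative_entropy_to_poisson(geometric, beta))
```

Since the bound grows like $\lambda\log\lambda$, a class whose mean $\beta^{(n)}_k$ blows up faster than $1/\alpha^{(n)}_k$ shrinks would force the weighted entropy term to diverge, which is the point of the remark.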

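Thus
\[
\sum_{k\in\mathcal{K}} \beta_k\alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k\alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta,
\]
and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have
\begin{align*}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)(k)} \,\|\, \mathcal{P}(\beta^{(n)}))
&\ge \liminf_{n} \sum_{k:\,\alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)(k)} \,\|\, \mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k > 0} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{align*}
as desired.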

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $\mathcal{S}_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{mn} > 0$, that $\tilde\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define
\[
\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\,\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\tilde\pi_{kj},
\]
then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta > 0$ we may define
\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1-\eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1-\eta).
\]
By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1-\eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

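We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is
\begin{align*}
f'(\varepsilon)
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Bigl(\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} + 1\Bigr)(\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{\mathcal{P}_n(\beta)} + \sum_{k\in\mathcal{K}} \alpha_k \sum_{j:\,(k,j)\ne(m,n)} \bar\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{j} \pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}.
\end{align*}
In the limit as $\varepsilon \to 0$, the third term in the last display tends to
\[
-\sum_{k} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \le 0.
\]
The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj}\log\mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.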

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form
\[
\pi^*_{kj} =
\begin{cases}
D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k \in \mathcal{K},\; j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian
\[
L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{j:\,k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} + \sum_{k\in\mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{l:\,k+l\in\mathcal{I}} \pi_{kl}\Bigr) + \sum_{i\in\mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\,\pi_{k,i-k}\Bigr)
\]
is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields
\[
\pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]
The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
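The product form $D_k\,\mathcal{P}_j(\beta)\,W_{k+j}$ can be explored numerically on a small polynomial problem. The sketch below uses iterative proportional fitting, alternately rescaling class totals and terminal-level totals of the reference masses $\alpha_k\mathcal{P}_{i-k}(\beta)$. This is a swapped-in numerical technique chosen for illustration, not the argument of the paper, and the example constraints are assumptions. It requires every listed $\alpha_k$ to be positive and a strictly positive feasible point to exist on the support $k \le i \le I$.

```python
import math

def poisson_pmf(j: int, mean: float) -> float:
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def fit_polynomial_minimizer(alpha, omega, beta, sweeps=2000):
    """Return m[k][i] = alpha_k * pi_{k,i-k} with row sums alpha_k and column sums omega_i.

    Each sweep multiplies rows and columns by positive factors, so the result
    keeps the product form (row factor) * alpha_k * P_{i-k}(beta) * (column factor).
    """
    n_classes, I = len(alpha), len(omega) - 1
    m = [[alpha[k] * poisson_pmf(i - k, beta) if i >= k else 0.0
          for i in range(I + 1)] for k in range(n_classes)]
    for _ in range(sweeps):
        for k in range(n_classes):                 # match the class masses alpha_k
            row_sum = sum(m[k])
            m[k] = [v * alpha[k] / row_sum for v in m[k]]
        for i in range(I + 1):                     # match the terminal masses omega_i
            col_sum = sum(m[k][i] for k in range(n_classes))
            for k in range(n_classes):
                m[k][i] *= omega[i] / col_sum
    return m

if __name__ == "__main__":
    alpha, omega = [0.6, 0.4], [0.2, 0.5, 0.3]      # assumed polynomial constraints
    beta = sum(i * w for i, w in enumerate(omega)) - sum(k * a for k, a in enumerate(alpha))
    for row in fit_polynomial_minimizer(alpha, omega, beta):
        print([round(v, 6) for v in row])
```

In this two-class example the product form pins down the solution through a single cross-ratio condition, $m_{01}m_{12}/(m_{02}m_{11}) = \mathcal{P}_1(\beta)^2/(\mathcal{P}_0(\beta)\mathcal{P}_2(\beta)) = 2$, which the fitted masses satisfy.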

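We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form
\[
\pi^*_{kj} =
\begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k \in \mathcal{K},\; k + j \le I,\; k + j \in \mathcal{I}, \\
C_k\,\mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\; k + j > I, \\
0, & \text{otherwise}.
\end{cases}
\]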

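Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$, define
\[
\beta^{(M)} \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{l \le M} l\,\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l \le M} \pi^*_{kl},
\]
and consider the problem of minimizing
\[
\sum_{k\in\mathcal{K}} \alpha_k \sum_{j \le M,\; k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
\]
subject to the constraints
\begin{align*}
\sum_{k\in\mathcal{K}} \alpha_k \sum_{l \le M,\; k+l\in\mathcal{I}} l\,\pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\; k+l\in\mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\; k\in\mathcal{K}} \alpha_k\,\pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{align*}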

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is
\begin{align*}
L^{(M)} &= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j \le M,\; k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} + y^{(M)}\Bigl(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l \le M,\; k+l\in\mathcal{I}} l\,\pi_{kl}\Bigr) \\
&\quad + \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k - \sum_{l \le M,\; k+l\in\mathcal{I}} \pi_{kl}\Bigr) + \sum_{i\in\mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k \le i,\; k\in\mathcal{K}} \alpha_k\,\pi_{k,i-k}\Bigr).
\end{align*}
Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that
\[
\pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]
for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that
\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\,e^{y^{(M)}},
\]
so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0},\dots,\omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves
\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\dots,I,
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]
are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation
\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\psi_{k,i-k}(x).
\]
To see that the $\gamma_i$ are valid occupancy curves, note that
\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
\]
Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and
\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,
\]
where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

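In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is
\begin{align*}
\tilde\psi_{k0}(x)
&= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k\mathcal{P}_i(\rho_k\beta_k)\bigr)\Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^i \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k\mathcal{P}_i(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^i \\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)\,\mathcal{P}_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^i\Bigr].
\end{align*}
Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{00}$ is
\begin{align*}
(-1)^k\psi_0^{(k)}(x)
&= \alpha_0 C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)\,\mathcal{P}_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Bigr] \tag{A.25} \\
&= \alpha_0 C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)\,\mathcal{P}_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^j\Bigr] \\
&= \frac{\alpha_0 C_0\rho^k}{C_k}\,\tilde\psi_{k0}(x).
\end{align*}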

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have
\begin{align*}
\gamma_i(x)
&= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi_0^{(i)}(x).
\end{align*}
Similarly, we obtain
\begin{align*}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{align*}
so that the extremals satisfy the simple relation
\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]
which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that
\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]
As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

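In the polynomial case, similar computations show that
\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\,\tilde\psi_{k0}(x) \tag{A.28}
\]
and that
\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]
for $i = 0,\dots,I$, establishing the restricted set of equations (A.10).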

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0,\dots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that
\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]
for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

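Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is
\begin{align*}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K} \alpha_k\log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{align*}
The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k0}(x)$, so that
\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^j \psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]
Then
\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k\cdot}(\beta) \,\|\, \mathcal{P}(\beta)).
\]
By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).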

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \mapsto D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.
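The joint convexity of $(\theta,\gamma) \mapsto D(\theta \,\|\, \gamma)$ that underlies the lemma can be spot-checked on random finite distributions. This is a minimal numerical sketch with hypothetical helper names; it tests the inequality on samples and is not a proof.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for finite distributions with gamma > 0 on the support."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0.0)

def random_distribution(rng, n):
    """A strictly positive random probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def convexity_gap(theta1, gamma1, theta2, gamma2, lam):
    """lam*D(theta1||gamma1) + (1-lam)*D(theta2||gamma2) - D(mixture||mixture)."""
    mix_theta = [lam * a + (1.0 - lam) * b for a, b in zip(theta1, theta2)]
    mix_gamma = [lam * a + (1.0 - lam) * b for a, b in zip(gamma1, gamma2)]
    return (lam * relative_entropy(theta1, gamma1)
            + (1.0 - lam) * relative_entropy(theta2, gamma2)
            - relative_entropy(mix_theta, mix_gamma))

if __name__ == "__main__":
    rng = random.Random(0)
    gaps = [convexity_gap(*(random_distribution(rng, 5) for _ in range(4)), rng.random())
            for _ in range(1000)]
    print(min(gaps))   # nonnegative up to rounding if the inequality holds
```

Integrating this pointwise inequality over $x \in [0,\beta]$ is what carries convexity from the integrand over to $J$.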

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon \doteq (1-\varepsilon)\gamma + \varepsilon\bar\gamma$, and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by
\[
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\,dx.
\]
To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then
\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$ with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining
\[
\eta(x) \doteq \bar\psi(x) - \psi(x),
\]
we may write
\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx. \tag{A.30}
\]
We can assume without loss that $G[1] = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of $0$. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,
\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x) \ge \alpha_i\,\tilde\gamma_{i0}(x).
\]
The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and $\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

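The partial derivative of the integrand of (A.30) is
\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^\varepsilon}\!(x)\,\eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^\varepsilon}\!(x)\,\dot\eta_i(x).
\]
The partial derivatives of $L$ are (in mixed notation)
\begin{align*}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{align*}
The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that
\[
\left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].
\]
Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.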

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx \]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[ \int_0^x \frac{\partial L}{\partial\psi_i}\Big|_{\gamma}\,dx' = \int_0^x \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\Big|_{\gamma}\,dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[ G'_+[0] = \int_0^\beta \left[ \frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x) \right] dx \]

\[ = \int_0^\beta \dot\eta(x)\cdot\left( -\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \right) dx \]

\[ = -\int_0^\beta \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \]

\[ = \sum_{i=0}^{I} C_i\,(\eta_i(0) - \eta_i(\beta)) = 0. \]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx. \]

It then follows that

\[ J(\gamma) = \mathcal{J}(\alpha,\omega,\beta) \]

\[ \le \liminf_n \mathcal{J}\big(\tilde\gamma(x_1^{(n)}),\, \tilde\gamma(x_2^{(n)}),\, x_2^{(n)} - x_1^{(n)}\big) \]

\[ = \liminf_n J(\gamma^{(n)}) \]

\[ \le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx \]

\[ = J(\tilde\gamma), \]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666
[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.
[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899
[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767
[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396
[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642
[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716
[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142
[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737
[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416
[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744
[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.
[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.
[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020
[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403
[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.
[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.
[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027
[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649
[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211
[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithms 7 59–80. MR1346284
[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.
[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.
[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.
[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325
[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456
[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.
[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


We introduce the Lagrangian

\[ L(\pi,y,z,\lambda) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right) + \lambda\left(\zeta - \sum_{i=0}^{I}(I-i)\pi_i\right), \]

with Lagrange multipliers $y$, $z$ and $\lambda$. On differentiating, it follows that the minimizing distribution $\pi^*$ should satisfy the conditions

\[ \log\pi^*_i = \log P_i(\beta) - 1 + y + iz + (I-i)^+\lambda. \]

In terms of variables $C$, $\rho$ and $\nu$ we may write

\[ \pi^*_i = C P_i(\rho\beta) \quad \text{for } i \ge I \]

and

\[ \pi^*_i = C P_i(\rho\beta)\,\nu^{I-i} \quad \text{for } i \le I. \]

The distribution is conditionally Poisson($\rho\beta/\nu$) for $i \le I$ and conditionally Poisson($\rho\beta$) for $i \ge I$. For convenience we introduce the notation

\[ Q_I(\rho) \doteq \sum_{i=I}^{\infty} P_i(\rho\beta) = \frac{1}{\rho\beta}\sum_{i=I+1}^{\infty} iP_i(\rho\beta), \]

\[ R_I(\rho,\nu) \doteq \nu^I e^{-\rho\beta(1-1/\nu)} \sum_{i=0}^{I-1} P_i\!\left(\frac{\rho\beta}{\nu}\right) = \frac{\nu^{I+1}}{\rho\beta}\, e^{-\rho\beta(1-1/\nu)} \sum_{i=1}^{I} iP_i\!\left(\frac{\rho\beta}{\nu}\right). \]

Since $\pi^*$ must satisfy the three linear constraints (4.3), the constants $C$, $\rho$ and $\nu$ must solve the equations

\[ C\,(R_I(\rho,\nu) + Q_I(\rho)) = 1, \]

\[ C\left(\frac{\rho\beta}{\nu} R_I(\rho,\nu) + \rho\beta\, Q_I(\rho)\right) = \beta, \]

\[ C\left(I P_I(\rho\beta) + \left(I - \frac{\rho\beta}{\nu}\right) R_I(\rho,\nu)\right) = \zeta. \]

There can be at most one positive triple $(C,\rho,\nu)$ satisfying these equations, since each such triple identifies a local minimum of $D(\pi\,\|\,P(\beta))$ for $\pi$ in a convex set, and there can be only one such minimum. The equations can be solved numerically in a number of ways to obtain $C$, $\rho$ and $\nu$. For example, from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

\[ (\rho/\nu - 1)\,R_I(\rho,\nu) + (\rho - 1)\,Q_I(\rho) = 0, \]

\[ (I - \rho\beta/\nu - \zeta)\,R_I(\rho,\nu) - \zeta\,Q_I(\rho) + I P_I(\rho\beta) = 0. \]

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed

\[ J_O(\eta,\beta) = D(\pi^*\,\|\,P(\beta)) \]

\[ = \sum_{i=0}^{\infty} \pi^*_i \log\left(C e^{\beta-\rho\beta}\rho^i \nu^{(I-i)^+}\right) \]

\[ = \log C + \beta(1-\rho) + \beta\log\rho + \zeta\log\nu. \]
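The numerical solution described above can be sketched in a few lines. The following is only an illustrative sketch, not the authors' procedure: it works directly with the tilted weights $P_i(\rho\beta)\nu^{(I-i)^+}$ rather than with $Q_I$ and $R_I$, truncates the infinite sums at a fixed number of terms, and replaces the curve-intersection step with a nested bisection (an assumption that relies on the mean being increasing in $\rho$ and the overflow statistic being increasing in $\nu$ along the mean constraint). All function names and the example values $I = 2$, $\beta = 1$ are hypothetical.

```python
import math

def log_poisson(i, mean):
    """Log of the Poisson(mean) probability of i."""
    return -mean + i * math.log(mean) - math.lgamma(i + 1)

def tilted_distribution(I, beta, rho, nu, n_terms=200):
    """pi_i = C * P_i(rho*beta) * nu**max(I - i, 0), truncated to n_terms.
    Returns (pi, C); C is the normalizing constant 1 / (R_I + Q_I)."""
    w = [math.exp(log_poisson(i, rho * beta) + max(I - i, 0) * math.log(nu))
         for i in range(n_terms)]
    total = sum(w)
    return [x / total for x in w], 1.0 / total

def bisect_log(f, lo, hi, iters=60):
    """Root of an increasing function f on [lo, hi], bisecting on a log scale."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def solve_overflow(I, beta, zeta):
    """Find (C, rho, nu) meeting the mean and overflow constraints, and the exponent."""
    def rho_for(nu):
        # Inner solve: choose rho so that the tilted distribution has mean beta.
        def mean_gap(rho):
            pi, _ = tilted_distribution(I, beta, rho, nu)
            return sum(i * p for i, p in enumerate(pi)) - beta
        return bisect_log(mean_gap, 1e-6, 50.0)  # bracket assumed wide enough

    def zeta_gap(nu):
        # Outer solve: match sum_{i<=I} (I - i) * pi_i to zeta.
        pi, _ = tilted_distribution(I, beta, rho_for(nu), nu)
        return sum((I - i) * pi[i] for i in range(I + 1)) - zeta

    nu = bisect_log(zeta_gap, 1e-3, 1e3)
    rho = rho_for(nu)
    pi, C = tilted_distribution(I, beta, rho, nu)
    J = math.log(C) + beta * (1 - rho) + beta * math.log(rho) + zeta * math.log(nu)
    return C, rho, nu, J, pi

# Hypothetical example: I = 2, beta = 1. The zero-cost value of zeta is 3/e.
zeta_expected = sum((2 - i) * math.exp(log_poisson(i, 1.0)) for i in range(3))
C0, rho0, nu0, J0, _ = solve_overflow(2, 1.0, zeta_expected)
C1, rho1, nu1, J1, pi1 = solve_overflow(2, 1.0, zeta_expected + 0.1)
print(rho0, nu0, J0)
print(rho1, nu1, J1)
```

At the expected value of $\zeta$ the sketch should return $\rho = \nu = 1$ and zero cost, and for a larger $\zeta$ it should reproduce the ordering $\nu > \rho > 1$ noted in the text.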

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I + 1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

\[ J_C(\alpha,\beta,\xi) \doteq -\lim_{n} \frac{1}{n}\log P\left(\sum_{i=0}^{I}\Gamma^n_i(\beta) < \xi\right), \]

where

\[ \xi < \sum_{k=0}^{K}\alpha_k \sum_{i=0}^{I-k} P_i(\beta) \]

is an unusually small number of low occupancy urns.

The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7 subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

\[ \sum_{k=0}^{K}\alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi, \]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[ \omega_{kj} = \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta)\,W, & j + k \le I, \\ C_k P_j(\rho\beta), & j + k > I. \end{cases} \]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed

\[ J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K}\alpha_k\log C_k. \]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
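The zero-cost figures quoted in this example can be reproduced with a short calculation. The sketch below assumes that, in the zero-cost solution, an urn starting with $k$ balls receives a Poisson($\beta$) number of additional balls, so that the expected fraction of urns with at most $I$ balls is $\sum_k \alpha_k \sum_{i \le I-k} P_i(\beta)$, as in the bound on $\xi$ above. The function names are illustrative, and the exponent 0.18 is taken from the text rather than computed.

```python
import math

def poisson_cdf(m, mean):
    """P(N <= m) for N ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(m + 1))

def expected_low_fraction(alpha, beta, I):
    """Zero-cost fraction of urns holding at most I balls at time beta."""
    return sum(a * poisson_cdf(I - k, beta)
               for k, a in enumerate(alpha) if I - k >= 0)

alpha = [0.5, 0.3, 0.2]   # initial occupancies from the example
beta, I, n = 2.0, 3, 100

psi_zero_cost = expected_low_fraction(alpha, beta, I)
expected_types_with_four = n * (1 - psi_zero_cost)

J_C = 0.18                         # exponent reported in the text, not recomputed
probability_estimate = math.exp(-n * J_C)

print(psi_zero_cost, expected_types_with_four, probability_estimate)
```

The estimate $e^{-nJ_C}$ is the leading-order large deviations approximation only; it ignores subexponential prefactors.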

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

\[ \sum_{j=0}^{i}\alpha_j \ge \sum_{j=0}^{i}\omega_j, \quad i = 0,\ldots,I \quad \text{(monotonicity)} \tag{A.4} \]

and

\[ \sum_{i=0}^{I} i\omega_i + (I+1)\,\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad \text{(conservation)}. \tag{A.5} \]

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0,\beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[ \sum_{i=0}^{I}\left(\sum_{j=0}^{i}\alpha_j - \sum_{j=0}^{i}\omega_j\right) = \sum_{j=0}^{I}(I+1-j)(\alpha_j - \omega_j) = (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j). \tag{A.6} \]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[ \gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \quad i = 0,\ldots,I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta} \]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I}\dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I}\dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I}\dot\psi_i \le 1$ follows from (A.5).
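The feasibility conditions (A.4) and (A.5), and the throw rate of the linear path used in the proof, are straightforward to check numerically. The following is a minimal sketch under an assumed encoding: `alpha` lists $\alpha_0,\ldots,\alpha_{I+1}$ and `omega` lists $\omega_0,\ldots,\omega_I,\omega_{I+}$. The function names and the tolerance are illustrative choices.

```python
import math

def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check monotonicity (A.4) and conservation (A.5)."""
    I = len(omega) - 2
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:      # (A.4) violated at index i
            return False
    balls_end = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    balls_start = sum(i * alpha[i] for i in range(I + 2))
    return balls_end <= balls_start + beta + tol   # (A.5)

def linear_path_throw_rate(alpha, omega, beta):
    """Value of -sum_i dpsi_i/dx along the linear path, i.e. LHS of (A.6) over beta."""
    I = len(omega) - 2
    total = 0.0
    cum = 0.0
    for i in range(I + 1):
        cum += alpha[i] - omega[i]
        total += cum
    return total / beta

# Example with I = 1: empty start, Poisson(1) terminal occupancies.
e = math.exp(-1.0)
empty = [1.0, 0.0, 0.0]
poisson_end = [e, e, 1.0 - 2.0 * e]
print(is_feasible(empty, poisson_end, 1.0))           # feasible
print(is_feasible(empty, [0.0, 0.0, 1.0], 1.0))       # needs at least 2 balls per urn
print(linear_path_throw_rate(empty, poisson_end, 1.0))
```

A throw rate of at most 1 for the linear path is exactly property (c) of Lemma 2.1 in this setting.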

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0,\ldots,I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[ J(\psi) = \int_0^\beta L(\psi(x),\theta(x))\,dx. \]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[ L(\psi,\theta) = D(\theta\,\|\,\gamma) \]

\[ = \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} \]

\[ = \sum_{i=0}^{I}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left(1 - \sum_{i=0}^{I}\theta_i\right)\log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}. \]

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
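As a concrete illustration of the local rate function, the sketch below evaluates $L(\psi,\theta) = D(\theta\,\|\,\gamma)$ at a single instant. The names are illustrative, the usual convention $0\log 0 = 0$ is assumed, and the function returns infinity when a positive rate is assigned to an occupancy class with no urns; it does not check the other conditions for a valid cumulative occupancy function.

```python
import math

def local_rate(psi, theta):
    """Relative entropy D(theta || gamma) with gamma_i = psi_i - psi_{i-1}.

    psi   : cumulative occupancies psi_0..psi_I at one instant
    theta : rates theta_0..theta_I; the catch-all rate is 1 - sum(theta)
    """
    I = len(psi) - 1
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, I + 1)]
    gamma.append(1.0 - psi[I])                  # gamma_{I+}
    theta_full = list(theta) + [1.0 - sum(theta)]
    cost = 0.0
    for t, g in zip(theta_full, gamma):
        if t > 0:
            if g <= 0:
                return math.inf                 # rate into an empty class
            cost += t * math.log(t / g)
    return cost

# With I = 1 and gamma = (0.5, 0.3, 0.2): expected behavior costs nothing,
# while diverting balls toward the catch-all class has positive cost.
zero_cost = local_rate([0.5, 0.8], [0.5, 0.3])
tilted_cost = local_rate([0.5, 0.8], [0.3, 0.3])
print(zero_cost, tilted_cost)
```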

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[ \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x)) \]

for all $i \in \{0,\ldots,I\}$, $x \in (0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\right] \tag{A.7} \]

for $i = 0,\ldots,I-1$, and by

\[ -\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1-\psi_I} = \frac{d}{dx}\left[-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1-\psi_I}\right]. \tag{A.8} \]

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

\[ L(\psi,\theta) = \sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9} \]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\right] \tag{A.10} \]

for $i = 0,\ldots,I-1$.

A.4. Characterization of the extremals under empty initial conditions.

In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[ \sum_{i=0}^{I}\omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1, \]

\[ \sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} i\,C P_i(\rho\beta) = \beta. \]

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[ \psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I}(\omega_k - C P_k(\rho\beta))\left(1 - \frac{x}{\beta}\right)^k, \tag{A.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \quad 0 \le i \le I, \tag{A.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I}\gamma_i(x) \]

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$ the first component of the extremal is simply the $I$th order polynomial

\[ \psi_0(x) = \sum_{k=0}^{I}\omega_k\left(1 - \frac{x}{\beta}\right)^k. \tag{A.10} \]

We refer to such paths as polynomial extremals and to extremals with $C > 0$ as exponential extremals.
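To make the construction concrete, the sketch below solves the two twist-parameter equations for a small exponential example and then checks the endpoint properties of the resulting $\psi_0$ numerically. It is an illustration under stated assumptions rather than a general-purpose solver: the infinite tail sums are truncated, the Poisson mean $\rho\beta$ is found by bisection on the conditional mean of the tail (assumed to lie in the bracket used), and the terminal values $\omega = (0.4, 0.3)$ with $\beta = 1$ are hypothetical.

```python
import math

def log_poisson(i, mean):
    return -mean + i * math.log(mean) - math.lgamma(i + 1)

def twist_parameters(omega, beta, n_tail=400):
    """Solve the two twist equations for (C, rho) in the exponential case."""
    I = len(omega) - 1
    mass_above = 1.0 - sum(omega)                        # C * sum_{i>I} P_i
    balls_above = beta - sum(i * w for i, w in enumerate(omega))
    if mass_above <= 1e-15:
        raise ValueError("polynomial case: C = 0")

    def tails(mean):
        idx = range(I + 1, I + 1 + n_tail)
        p = [math.exp(log_poisson(i, mean)) for i in idx]
        return sum(p), sum(i * q for i, q in zip(idx, p))

    target = balls_above / mass_above   # conditional mean of the tail
    lo, hi = 1e-8, 50.0                 # bracket for rho*beta (assumed adequate)
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        t0, t1 = tails(mid)
        if t1 / t0 < target:
            lo = mid
        else:
            hi = mid
    mean = math.sqrt(lo * hi)
    t0, _ = tails(mean)
    return mass_above / t0, mean / beta

def signed_derivative(i, x, omega, beta, C, rho):
    """(-1)^i times the i-th derivative of psi_0 at x."""
    I = len(omega) - 1
    value = rho**i * C * math.exp(-rho * x)
    for k in range(i, I + 1):
        coeff = omega[k] - C * math.exp(log_poisson(k, rho * beta))
        falling = math.factorial(k) / math.factorial(k - i)
        value += coeff * beta**(-i) * falling * (1 - x / beta)**(k - i)
    return value

def gamma_i(i, x, omega, beta, C, rho):
    """Occupancy gamma_i(x) = x^i / i! * (-1)^i psi_0^{(i)}(x), for any i >= 0."""
    return x**i / math.factorial(i) * signed_derivative(i, x, omega, beta, C, rho)

omega, beta = [0.4, 0.3], 1.0          # hypothetical terminal constraints, I = 1
C, rho = twist_parameters(omega, beta)
print(C, rho)
print(signed_derivative(0, 0.0, omega, beta, C, rho))   # psi_0(0)
print([gamma_i(i, beta, omega, beta, C, rho) for i in range(2)])
```

Beyond the endpoint checks, summing `gamma_i` over all $i$ at an interior point should give 1, which is the Taylor series identity used in the proof of the theorem.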

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[ (-1)^i\varphi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a,b] \text{ and } i > 0. \tag{A.11} \]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0,\beta)$ and $i = 0,\ldots,I$. In the case $C > 0$ this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[ (-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I}(\omega_k - C P_k(\rho\beta))\,\beta^{-i}\frac{k!}{(k-i)!}\left(1 - \frac{x}{\beta}\right)^{k-i} \]

for $i = 0,\ldots,I$, and

\[ (-1)^i\psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12} \]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) \doteq (-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[ \varphi(\beta) = (-1)^I\psi_0^{(I)}(\beta) = \rho^I C e^{-\rho\beta} + \left(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\right)\beta^{-I} I! = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I \ge 0, \]

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[ (-1)^I\psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I > 0 \quad \text{for all } x \in [0,\beta]. \]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[ \psi_0(0) = C\left(1 - \sum_{i=0}^{I} P_i(\rho\beta)\right) + \sum_{i=0}^{I}\omega_i = 1. \]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[ \gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x), \quad i = 0, 1, \ldots, I, I+1, \ldots. \tag{A.13} \]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[ \sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1. \]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$ the rate of decrease of each cumulative occupancy is

\[ \theta_i(x) = -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x) \tag{A.14} \]

\[ = \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \]

\[ = \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0 \]

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[ \sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1. \]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[ \sum_{i=0}^{I}\theta_i(x) \le 1 \]

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[ \theta_{I+}(x) = 1 + \sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty} P_i(\rho x), \]

\[ 1 - \psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty} P_i(\rho x), \]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \quad x \in (0,\beta),\ i \ge 0. \tag{A.15} \]

Then

\[ \frac{d}{dx}\left[-\log\frac{\theta_i}{\gamma_i}\right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \quad x \in (0,\beta), \tag{A.16} \]

verifying (A.7). Likewise, to verify (A.8) we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\ldots,I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

\[ (-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I, \]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[ \sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0, \]

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0,\ldots,\gamma_I,\gamma_{I+}\}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{\infty}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A.17} \]

Proof. For all $i > I$ and $x \in [0,\beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[ \theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1 - \psi_I). \]

Then for each $x \in [0,\beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,

\[ L(\psi,\dot\psi) = \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \]

\[ = \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi). \]

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[ J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx. \]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[ \int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \tag{A.18} \]

\[ = \sum_{i=0}^{\infty}\int_\varepsilon^{\beta-\varepsilon}(\dot\psi_i(x) - \dot\psi_{i-1}(x))\log|\psi_0^{(i)}(x)|\,dx. \]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$, since for $i > I$

\[ -\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)\left[\log C + i\log\rho - \rho x\right], \]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[ \sum_{i=0}^{\infty}\left[\gamma_i\log|\psi_0^{(i)}|\right]_\varepsilon^{\beta-\varepsilon} + \int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx. \]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\ldots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[ \sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1, \]

\[ -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \quad i = 0,\ldots,I-1, \]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{I}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A.19} \]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A5 Suppose that γ is an extremal occupancy path constructedaccording to Theorem A1 to meet feasible terminal constraints (1 ω β)Then

J(γ) = minπisinF (1ωβ)

D(πP(β))

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C>0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty}\in\mathbb{R}_+^{\infty}$ satisfying $\pi_i=\omega_i$ for $i=0,\dots,I$, and define the Lagrangian $L$ on this set to be
\[ L(\pi,y,z)=\sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)}+y\Bigl(1-\sum_{i=0}^{\infty}\pi_i\Bigr)+z\Bigl(\beta-\sum_{i=0}^{\infty}i\pi_i\Bigr), \]
where $z=\log\rho$ and $y=\log C+(1-\rho)\beta+1$.

Define $\pi_i^*=\gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A1 with the given $C$, $\rho$. For any $i>I$, the definitions of $\pi_i^*$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi_i^*$ is the unique global minimizer of $x\mapsto x\log x-x[\log P_i(\beta)+y+iz]$. Therefore, for any $i>I$ and $\pi_i\in[0,\infty)$,
\[ \pi_i\log\frac{\pi_i}{P_i(\beta)}-y\pi_i-zi\pi_i\ \ge\ \pi_i^*\log\frac{\pi_i^*}{P_i(\beta)}-y\pi_i^*-zi\pi_i^*. \]


Following standard Lagrangian arguments, we thus have
\[ \inf_{\pi\in F(1,\omega,\beta)}D(\pi\|P(\beta))=\inf_{\pi\in F(1,\omega,\beta)}L(\pi,y,z)\ \ge\ \inf_{\pi\in\mathbb{R}_+^{\infty}}L(\pi,y,z)=D(\pi^*\|P(\beta)). \]
Since $\pi^*\in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

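Substituting the particular form of $\pi_i^*$ into (A17), we have
\[ J(\gamma)=\sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^ie^{-\beta}/i!}=D(\pi^*\|P(\beta)), \]
where
\[ \gamma_i(\beta)=\begin{cases}\omega_i, & 0\le i\le I,\\ CP_i(\rho\beta), & i>I.\end{cases} \]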

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as
\[ J(\gamma)=\sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)}+\Bigl(1-\sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C+(1-\rho)\beta\bigr)+\Bigl(\beta-\sum_{i=0}^{I}i\omega_i\Bigr)\log\rho, \]
where $C$ is defined by (26). In the polynomial case the cost is simply given by the first term in the above expression.
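The agreement between the explicit cost formula and the relative entropy $D(\pi^*\|P(\beta))$ can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names, the truncation point `n_terms`, and the choice to parametrize by the product $\rho\beta$ (so that $C$, $\beta$ and $\rho$ are then recovered from the mass and mean of $\pi^*$) are all assumptions made for illustration.

```python
import math

def poisson_pmf(i, mean):
    """P_i(mean) = exp(-mean) * mean**i / i!, evaluated in log space."""
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

def exponential_extremal_costs(omega, rho_beta, n_terms=120):
    """Return (relative_entropy, explicit_formula, C, rho, beta).

    omega    -- prescribed terminal fractions omega_0..omega_I (sum < 1)
    rho_beta -- the product rho*beta fixing the Poisson tail (illustrative input)
    n_terms  -- truncation point for the infinite sums (assumption)
    """
    I = len(omega) - 1
    tail = [poisson_pmf(i, rho_beta) for i in range(I + 1, n_terms)]
    C = (1.0 - sum(omega)) / sum(tail)          # tail mass equals 1 - sum(omega)
    pi_star = list(omega) + [C * p for p in tail]
    beta = sum(i * p for i, p in enumerate(pi_star))   # mean of pi_star
    rho = rho_beta / beta

    # Relative entropy D(pi_star || P(beta)), summed term by term.
    entropy = sum(p * math.log(p / poisson_pmf(i, beta))
                  for i, p in enumerate(pi_star) if p > 0)
    # Explicit three-term expression for the cost.
    explicit = (sum(w * math.log(w / poisson_pmf(i, beta))
                    for i, w in enumerate(omega) if w > 0)
                + (1.0 - sum(omega)) * (math.log(C) + (1.0 - rho) * beta)
                + (beta - sum(i * w for i, w in enumerate(omega))) * math.log(rho))
    return entropy, explicit, C, rho, beta

entropy, explicit, C, rho, beta = exponential_extremal_costs([0.2, 0.3], 2.5)
print(entropy, explicit)
```

The two printed values should coincide up to rounding, because the tail terms satisfy $\log(CP_i(\rho\beta)/P_i(\beta))=\log C+(1-\rho)\beta+i\log\rho$.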

A7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k>0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0\in\mathcal{K}$), and we may take $K=\max\mathcal{K}$.

In the case $K\ge1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta=\sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)}=\{\gamma_{ki}\}_i$, where the function $\gamma_{ki}:[0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by
\[ \gamma_i(x)=\sum_{k\in\mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta). \]
We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

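The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be
\[ \min_{\pi\in S_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_kD(\pi^{(k)}\|P(\beta)), \tag{A20} \]
subject to the conservation constraint
\[ \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}j\pi_{kj}=\beta \tag{A21} \]
and the terminal conditions
\[ \omega_i=\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\qquad\text{for all }0\le i\le I. \tag{A22} \]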

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+}=0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j=\sum_{j=0}^{I}\omega_j=1$, as well as in the conservation condition (24). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that
\[ \pi_{kj}=0,\qquad k+j>I, \tag{A23} \]
so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A20) subject to (A23) and the endpoint constraints (A22), in which case the conservation constraint (A21) need not be included explicitly.

As written, the vector of distributions $\pi=(\pi^{(0)},\dots,\pi^{(k)},\dots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k>0$. Including all indices $0\le k\le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k=0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^*\in S_\infty^{|\mathcal{K}|}$ which minimizes (A20) subject to constraints (A21) and (A22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi\in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A21) and (A22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega\in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega\in S_{\bar I}$, where $\bar I>I$. Any feasible point for constraints $(\alpha,\bar\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints.

Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00}=\omega_0/\alpha_0$. The first monotonicity condition ($i=0$) in (23) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary until the constraint $\alpha_0\pi_{01}+\alpha_1\pi_{10}=\omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i=1$) condition in (23). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (24) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
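The ordered filling construction can be sketched in code for the polynomial case. This is an illustrative sketch only: the function name `ordered_filling` is hypothetical, and drawing mass from the lowest class first is one possible ordering, since the text leaves the order among classes open. The monotonicity conditions are assumed to take the form $\sum_{j\le i}\alpha_j\ge\sum_{j\le i}\omega_j$.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy construction of a feasible point for polynomial constraints.

    alpha[k] -- initial fraction of urns holding k balls
    omega[i] -- required terminal fraction of urns holding i balls
    Returns a dict k -> list, where entry j is pi_{kj}, the fraction of
    class-k urns that receive j additional balls.
    """
    I = len(omega) - 1
    remaining = list(alpha) + [0.0] * (I + 1 - len(alpha))
    # mass[k][i] is alpha_k * pi_{k, i-k}: class-k urns that end at level i.
    mass = [[0.0] * (I + 1) for _ in range(I + 1)]
    for i in range(I + 1):
        need = omega[i]
        for k in range(i + 1):          # only classes k <= i can reach level i
            take = min(remaining[k], need)
            mass[k][i] += take
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition violated at level %d" % i)
    pi = {}
    for k in range(len(alpha)):
        if alpha[k] > 0:
            pi[k] = [mass[k][k + j] / alpha[k] for j in range(I - k + 1)]
    return pi

alpha, omega = [0.6, 0.4], [0.2, 0.4, 0.4]
pi = ordered_filling(alpha, omega)
print(pi)
```

For this example the conservation condition gives $\beta=\sum_i i\omega_i-\sum_k k\alpha_k=0.8$, and the constructed point reproduces that mean automatically.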

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply
\[ \sum_{i:\,P_i>Q_i}(P_i-Q_i)=\sum_{i:\,Q_i>P_i}(Q_i-P_i), \]
and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A<\infty$, define the set
\[ H_A=\Bigl\{\pi\in S_\infty^{|\mathcal{K}|}:\ \sum_k\alpha_kD(\pi^{(k)}\|P(\beta))\le A\Bigr\}. \]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 143), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that
\[ Q_A(\alpha,\omega,\beta)=F(\alpha,\omega,\beta)\cap H_A \]
is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)}\in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon>0$ there is $m<\infty$ such that
\[ \sum_{j\ge m}j\pi^{(n)}_{kj}\le\varepsilon \tag{A24} \]

for all $n$ and $k$.

Since $e^y-1=\sup_{x>0}[xy-x\log x+x-1]$, we have the inequality
\[ xy\le(x\log x-x+1)+(e^y-1), \]
where by convention $0\log0=0$. Observe that this inequality is valid for all $y$ and all $x\ge0$, and that $x\log x-x+1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta>0$,
\[ \sum_{j=0}^{\infty}e^{j/\delta}P_j(\beta)=\sum_{j=0}^{\infty}e^{j/\delta}e^{-\beta}\frac{\beta^j}{j!}=e^{\beta e^{1/\delta}-\beta}<\infty. \]
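Both facts used here are easy to check numerically. The sketch below evaluates the gap in the inequality $xy\le(x\log x-x+1)+(e^y-1)$ on a grid and compares a truncated version of the Poisson exponential-moment series with its closed form. The function names, the grid, the truncation point and the sample values of $\beta$ and $\delta$ are illustrative assumptions.

```python
import math

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with the convention 0 log 0 = 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, n_terms=200):
    """Truncated sum over j < n_terms of exp(j/delta) * P_j(beta)."""
    total = 0.0
    for j in range(n_terms):
        log_term = j / delta - beta + j * math.log(beta) - math.lgamma(j + 1)
        total += math.exp(log_term)
    return total

# The gap should be nonnegative everywhere, vanishing only when x = e^y.
grid = [(x / 4.0, y / 2.0) for x in range(0, 41) for y in range(-10, 11)]
min_gap = min(young_gap(x, y) for x, y in grid)

series = poisson_exp_moment(beta=1.5, delta=2.0)
closed_form = math.exp(1.5 * math.exp(1.0 / 2.0) - 1.5)
print(min_gap, series, closed_form)
```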


Taking $y=j/\delta$ and $x=\pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate
\begin{align*}
\sum_{j\ge m}j\pi^{(n)}_{kj}
&=\sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta)\\
&\le\sum_{j\ge0}\delta\Bigl[\pi^{(n)}_{kj}\log\Bigl(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\Bigr)-\pi^{(n)}_{kj}+P_j(\beta)\Bigr]+\delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta)\\
&=\delta D(\pi^{(n)}_{(k)}\|P(\beta))+\delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta)\\
&\le\delta A+\delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta),
\end{align*}
where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A24) follows by first picking $\delta>0$ sufficiently small and then $m<\infty$ sufficiently large. [Note also, for use in Corollary A7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)}\to\beta\in(0,\infty)$.]

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 54, that
\[ \beta^{(n)}_k\to\beta_k. \]
Finally,
\[ \sum_k\alpha_k\beta_k=\lim_n\sum_k\alpha_k\beta^{(n)}_k=\lim_n\beta=\beta, \]
so that $\pi\in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let
\[ G=\inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_kD(\pi^{(k)}\|P(\beta))\ge0. \]

Choose a sequence $\pi^{(n)}\in Q_A$ such that $\sum_{k=0}^{K}\alpha_kD(\pi^{(n)}_{(k)}\|P(\beta))\le G+1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^*\in Q_A$ be the limit point. Since relative entropy $D(\pi\|\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 143(b)), it follows that
\[ \sum_{k\in\mathcal{K}}\alpha_kD(\pi^*_{(k)}\|P(\beta))\le\liminf_n\sum_{k\in\mathcal{K}}\alpha_kD(\pi^{(n)}_{(k)}\|P(\beta))=G, \]
where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)}=\mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that
\[ (\alpha^{(n)},\omega^{(n)},\beta^{(n)})\to(\alpha,\omega,\beta) \]
componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0<\beta<\infty$. Then
\[ \liminf_n\mathcal{J}^{(n)}\ge\mathcal{J}(\alpha,\omega,\beta). \]

Proof. For any $A>\liminf_n\mathcal{J}^{(n)}=G$, we may choose a subsequence of problems such that
\[ \mathcal{J}^{(n)}\le G+1/n<A, \]
and for each problem in this subsequence we define $\pi^{(n)}\in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)}\to\pi\in H_A$. As in the proof of Lemma A6, if $\beta<\infty$, then
\[ \beta^{(n)}_k=\sum_{j=0}^{\infty}j\pi^{(n)}_{kj}\ \to\ \beta_k=\sum_{j=0}^{\infty}j\pi_{kj},\qquad k=0,1,\dots,K. \]

If $\alpha^{(n)}_k\to0$ and $\beta^{(n)}_k\alpha^{(n)}_k\not\to0$, then the term $\alpha^{(n)}_kD(\pi^{(n)}_{(k)}\|P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\|P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta-\lambda+\lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k\to\alpha_k\beta_k$, even if $\alpha_k=0$.

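Thus
\[ \sum_{k\in\mathcal{K}}\beta_k\alpha_k=\lim_n\sum_k\beta^{(n)}_k\alpha^{(n)}_k=\lim_n\beta^{(n)}=\beta, \]
and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 143(b)), we have
\begin{align*}
\liminf_n\sum_{k=0}^{K}\alpha^{(n)}_kD(\pi^{(n)}_{(k)}\|P(\beta^{(n)}))
&\ge\liminf_n\sum_{k:\,\alpha_k>0}\alpha^{(n)}_kD(\pi^{(n)}_{(k)}\|P(\beta^{(n)}))\\
&\ge\sum_{k:\,\alpha_k>0}\alpha_kD(\pi^{(k)}\|P(\beta))\\
&\ge\mathcal{J}(\alpha,\omega,\beta),
\end{align*}
as desired.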

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i\in\mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i>0$, or if $i>I$ and $\omega_{I+}>0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k+j\in\mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^*\in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A20) subject to constraints (A21) and (A22). Then $\pi^*_{kj}>0$ for all $k+j\in\mathcal{I}(\omega)$, and $\pi^*_{kj}=0$ if $k+j\notin\mathcal{I}(\omega)$.

Proof. For $k+j\notin\mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi\in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn}=0$ for some particular $(m,n)$ with $m+n\in\mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\hat\pi_{mn}>0$, that $\hat\pi_{kj}=0$ for all $k+j\notin\mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define
\[ \hat\omega_i=\sum_{k=0}^{i}\alpha_k\hat\pi_{k,i-k},\qquad \hat\beta=\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}j\hat\pi_{kj}, \]
then $\hat\pi$ is a feasible solution for the constraints $(\alpha,\hat\omega,\hat\beta)$. Next, for any $\eta>0$, we may define
\[ \tilde\beta=(\beta-\eta\hat\beta)/(1-\eta),\qquad \tilde\omega=(\omega-\eta\hat\omega)/(1-\eta). \]
By construction, the constraints $(\alpha,\tilde\omega,\tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (23) and (24) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (23) and the conservation constraint (24) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\hat\omega,\hat\beta)$, and hence for $(\alpha,\tilde\omega,\tilde\beta)$.


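We may now let $\tilde\pi$ be a feasible point for $(\alpha,\tilde\omega,\tilde\beta)$, and finally form $\bar\pi=\eta\hat\pi+(1-\eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn}>0$. As discussed in the proof of Lemma A6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon}=\varepsilon\bar\pi+(1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is
\begin{align*}
f'(\varepsilon)
&=\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\Bigl(\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}+1\Bigr)(\bar\pi_{kj}-\pi_{kj})\\
&=\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}\,(\bar\pi_{kj}-\pi_{kj})\\
&=\alpha_m\bar\pi_{mn}\log\frac{\pi^{\varepsilon}_{mn}}{P_n(\beta)}
+\sum_{k\in\mathcal{K}}\alpha_k\sum_{(k,j)\ne(m,n)}\bar\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}
-\sum_{k\in\mathcal{K}}\alpha_k\sum_{j}\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}.
\end{align*}
The second equality holds because $\bar\pi^{(k)}$ and $\pi^{(k)}$ both sum to one. In the limit as $\varepsilon\to0$, the third term in the last display tends to
\[ -\sum_k\alpha_kD(\pi^{(k)}\|P(\beta))\le0. \]
The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\bar\pi_{kj}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta)<f(0)$ for some sufficiently small $\delta>0$. We have shown that for any $m+n\in\mathcal{I}(\omega)$, $\pi$ cannot minimize (A20) if $\pi_{mn}=0$. As $\pi_{mn}$ is arbitrary within $m+n\in\mathcal{I}(\omega)$, it follows that $\pi^*_{mn}>0$ whenever $m+n\in\mathcal{I}(\omega)$.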

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form
\[ \pi^*_{kj}=\begin{cases}D_kP_j(\beta)W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega),\\ 0, & \text{otherwise}.\end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A20) may be considered to be a finite-dimensional problem over this set. By Lemma A8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian
\[ L(\pi)=\sum_{k\in\mathcal{K}}\alpha_k\sum_{k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+\sum_{k\in\mathcal{K}}z_k\alpha_k\Bigl(1-\sum_{k+l\in\mathcal{I}}\pi_{kl}\Bigr)
+\sum_{i\in\mathcal{I}}w_i\Bigl(\omega_i-\sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\Bigr) \]
is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields
\[ \pi^*_{kj}=P_j(\beta)e^{z_k-1+w_{k+j}}. \]
The result follows on defining $D_k=e^{z_k-1}$ and $W_i=e^{w_i}$.
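The product form $D_kP_j(\beta)W_{k+j}$ suggests a simple way to compute the polynomial-case minimizer numerically: start from the reference weights $\alpha_kP_{i-k}(\beta)$ and alternately rescale classes $k$ and terminal levels $i$ until both sets of linear constraints hold. This is iterative proportional fitting, a standard technique swapped in here for illustration; it is not the method of the proof, and the function name, sweep count and example data below are assumptions.

```python
import math

def poisson_pmf(j, mean):
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def ipf_minimizer(alpha, omega, beta, sweeps=500):
    """Iterative proportional fitting on m[(k, i)] = alpha_k * pi_{k, i-k}.

    Rows are initial classes k (target total alpha_k); columns are terminal
    levels i (target total omega_i). Only the support i >= k is used.
    Returns a dict (k, j) -> pi_{kj}.
    """
    I = len(omega) - 1
    classes = [k for k, a in enumerate(alpha) if a > 0]
    m = {(k, i): alpha[k] * poisson_pmf(i - k, beta)
         for k in classes for i in range(k, I + 1) if omega[i] > 0}
    for _ in range(sweeps):
        for k in classes:                       # each pi^(k) must sum to one
            row = sum(v for (kk, _), v in m.items() if kk == k)
            for key in m:
                if key[0] == k:
                    m[key] *= alpha[k] / row
        for i in range(I + 1):                  # terminal conditions (A22)
            col = sum(v for (_, ii), v in m.items() if ii == i)
            if col > 0:
                for key in m:
                    if key[1] == i:
                        m[key] *= omega[i] / col
    return {(k, i - k): v / alpha[k] for (k, i), v in m.items()}

alpha, omega, beta = [0.6, 0.4], [0.2, 0.4, 0.4], 0.8
pi = ipf_minimizer(alpha, omega, beta)
# Ratios pi_{kj} / P_j(beta) should factor as D_k * W_{k+j}.
ratio = {kj: v / poisson_pmf(kj[1], beta) for kj, v in pi.items()}
cross = ratio[(0, 1)] * ratio[(1, 1)] / (ratio[(0, 2)] * ratio[(1, 0)])
print(pi, cross)
```

Each rescaling multiplies entries by a factor depending only on $k$ or only on $k+j$, so the iterates keep the product form throughout; the cross ratio printed at the end equals one exactly when the factorization holds.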

We now give the corresponding theorem for the exponential case.

Theorem A10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form
\[ \pi^*_{kj}=\begin{cases}C_kP_j(\rho\beta)W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I},\\ C_kP_j(\rho\beta), & k\in\mathcal{K},\ k+j>I,\\ 0, & \text{otherwise}.\end{cases} \]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M>I$. For each such $M$, define
\[ \beta^{(M)}=\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M}l\pi^*_{kl},\qquad \eta^{(M)}_k=\sum_{l\le M}\pi^*_{kl}, \]
and consider the problem of minimizing
\[ \sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} \]
subject to the constraints
\[ \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}}l\pi_{kl}=\beta^{(M)}, \]
\[ \sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl}=\eta^{(M)}_k,\qquad k\in\mathcal{K}, \]
\[ \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}=\omega_i,\qquad i\in\mathcal{I}. \]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is
\begin{align*}
L^{(M)}={}&\sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+y^{(M)}\Bigl(\beta^{(M)}-\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}}l\pi_{kl}\Bigr)\\
&+\sum_{k\in\mathcal{K}}z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k-\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl}\Bigr)
+\sum_{i\in\mathcal{I}}w^{(M)}_i\Bigl(\omega_i-\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\Bigr).
\end{align*}

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that
\[ \pi^*_{kj}=P_j(\beta)e^{jy^{(M)}-1+z^{(M)}_k+w^{(M)}_{k+j}} \]
for all $j<M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i=0$ for $i>I$. For $k+j>I$, note that
\[ \frac{\pi^*_{k,j+1}}{\pi^*_{kj}}=\frac{P_{j+1}(\beta)}{P_j(\beta)}e^{y^{(M)}}, \]
so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j}=0$ if $k+j>I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho=e^y$, $C_k=e^{(\rho-1)\beta-1+z_k}$ and $W_i=e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*_{(k)}$ by $\beta_k=\sum_jj\pi^*_{kj}$. In the exponential case, we define $\rho_k=\rho\beta/\beta_k$, so that the distribution $\pi^*_{(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj}=\pi^*_{kj}$ and $\omega^{(k)}=(\omega_{k0},\dots,\omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing distribution arising in Theorem A1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$ with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial with $C_k=0$. In either case, Theorem A1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k=\sum_jj\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves
\[ \gamma_i(x)=\sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta),\qquad i=0,\dots,I, \]
\[ \gamma_{I+}(x)=1-\sum_{i=0}^{I}\gamma_i(x) \]
are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i>I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A1 [see (A13)]. Also, for convenience, define $\tilde\gamma_{kj}(x)=\gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation
\[ \psi_i(x)=\sum_{k=0}^{i}\alpha_k\tilde\psi_{k,i-k}(x). \]
To see that the $\gamma_i$ are valid occupancy curves, note that
\[ \sum_{i=0}^{\infty}\gamma_i(x)=\sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(x)=\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\tilde\gamma_{kj}(x)=1. \]
Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and
\[ \sum_{i=0}^{\infty}-\dot\psi_i(x)=\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}-\dot{\tilde\psi}_{kj}(x)=\sum_{k=0}^{K}\alpha_k(\beta_k/\beta)=1, \]


where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A21). Hence, the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 21.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0)=1$ and $\gamma_{kj}(0)=0$ for $j>0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k)=\pi^*_{kj}$ satisfy the constraints (A22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

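First take the exponential case, and recall that $\rho_k\beta_k=\rho\beta$. The $k$th zero-occupancy curve, after rescaling, is
\begin{align*}
\tilde\psi_{k0}(x)
&=C_ke^{-\rho_k(x\beta_k/\beta)}+\sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho_k\beta_k)\bigr)\Bigl(1-\frac{x\beta_k/\beta}{\beta_k}\Bigr)^i\\
&=C_ke^{-\rho x}+\sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^i\\
&=C_k\Bigl[e^{-\rho x}+\sum_{i=0}^{I-k}(W_{k+i}-1)P_i(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^i\Bigr].
\end{align*}
Here we have used that $\omega_{ki}=\pi^*_{ki}=C_kP_i(\rho\beta)W_{k+i}$ when $i\le I-k$, as given by Theorem A10. The $k$th derivative of $\psi_0=\alpha_0\tilde\psi_{00}$ is
\begin{align*}
(-1)^k\psi_0^{(k)}(x)
&=\alpha_0C_0\Bigl[\rho^ke^{-\rho x}+\sum_{i=k}^{I}(W_i-1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\Bigl(1-\frac{x}{\beta}\Bigr)^{i-k}\Bigr] \tag{A25}\\
&=\alpha_0C_0\rho^k\Bigl[e^{-\rho x}+\sum_{j=0}^{I-k}(W_{k+j}-1)P_j(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^j\Bigr]\\
&=\frac{\alpha_0C_0\rho^k}{C_k}\tilde\psi_{k0}(x).
\end{align*}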

Using Theorem A1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have
\begin{align*}
\gamma_i(x)
&=\sum_{k=0}^{i}\alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&=\sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{i-k}}{dx^{i-k}}\tilde\psi_{k0}(x)\\
&=\sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{align*}

Similarly, we obtain
\begin{align*}
-\dot\psi_i(x)
&=-\sum_{k=0}^{i}\alpha_k\frac{\beta_k}{\beta}\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&=\sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{i-k+1}}{dx^{i-k+1}}\tilde\psi_{k0}(x)\\
&=\sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{align*}
so that the extremals satisfy the simple relation
\[ \frac{\theta_i(x)}{\gamma_i(x)}=-\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A26} \]
which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i>I$, which establishes (as in the proof of Lemma A3) that
\[ \frac{\theta_{I+}(x)}{1-\psi_I(x)}=\rho. \tag{A27} \]
As shown in the proof of Theorem A1, the relations (A26) and (A27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that
\[ (-1)^k\psi_0^{(k)}(x)=\frac{\alpha_0D_0}{D_k}\tilde\psi_{k0}(x) \tag{A28} \]
and that
\[ \theta_i(x)=\sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x)=\gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A29} \]
for $i=0,\dots,I$, establishing the restricted set of equations (A10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A12. For irreducible constraints, the occupancy functions defined in Theorem A11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i=0,\dots,I-1$ (and also $i=I$ in the exponential case), depending on $x_1$ only, such that
\[ \int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx+\frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x'))=C_i \]
for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0>0$ without loss of generality. By Theorem A11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x)>0$ for $x\in[0,\beta)$, $i\le I$ (and $i=I+1$ in the exponential case), as shown in Lemma A2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A13. Suppose that $\gamma$ is the extremal defined by Theorem A11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A20), subject to constraints (A21) and (A22).

Proof. Consider first the exponential case. The infinite sequence of functions $\{\gamma_i\}$ defined in the proof of Theorem A11 is shown in that proof to satisfy the conditions of Lemma A3. Hence, the cost is
\begin{align*}
J(\gamma)&=\beta+\sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr]-\sum_{k=0}^{K}\alpha_k\log|\psi_0^{(k)}(0)|\\
&=\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\tilde\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)e^{-\beta}}.
\end{align*}
The fact that $\tilde\psi_{k0}(0)=1$, together with (A25), implies that $\psi_0^{(k)}(x)=\psi_0^{(k)}(0)\times\tilde\psi_{k0}(x)$, so that
\[ \frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)e^{-\beta}}=\Bigl(\frac{\beta_k}{\beta}\Bigr)^j\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta}=\frac{\tilde\gamma_{kj}(x)}{(-x)^je^{-\beta}/j!}. \]
Then
\[ J(\gamma)=\sum_{k=0}^{K}\alpha_kD(\tilde\gamma_{k\cdot}(\beta)\|P(\beta)). \]
By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A4 and (A28).

A8. Extremal curves have globally minimal cost. In this section we prove the following theorem.


Theorem A14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\bar\gamma)$.

We first introduce some notation. Let $O$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 21, and let $O(\alpha,\omega,\beta)=\{\gamma\in O:\gamma(0)=\alpha,\ \gamma(\beta)=\omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\|\gamma)$, and hence its proof is omitted.

Lemma A15.

(a) $O(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $O(\alpha,\omega,\beta)$ is a convex function.
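The joint convexity of $(\theta,\gamma)\to D(\theta\|\gamma)$ that underlies part (b) can be illustrated numerically on a pair of finite distributions. The sketch below is only an illustration: the function names and the two sample pairs are arbitrary choices, and a check along one segment is of course not a proof.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) for finite distributions, with 0 log 0 = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def mix(p, q, eps):
    """Convex combination (1 - eps) * p + eps * q, componentwise."""
    return [(1.0 - eps) * a + eps * b for a, b in zip(p, q)]

theta_a, gamma_a = [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]
theta_b, gamma_b = [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]

gaps = []
for step in range(11):
    eps = step / 10.0
    at_mixture = relative_entropy(mix(theta_a, theta_b, eps),
                                  mix(gamma_a, gamma_b, eps))
    chord = ((1.0 - eps) * relative_entropy(theta_a, gamma_a)
             + eps * relative_entropy(theta_b, gamma_b))
    gaps.append(chord - at_mixture)   # nonnegative under joint convexity
print(gaps)
```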

For a given pair of occupancy functions $\gamma,\bar\gamma\in O(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon}=(1-\varepsilon)\gamma+\varepsilon\bar\gamma$ and define the function $G:[0,1]\to\mathbb{R}_+\cup\{\infty\}$ by
\[ G[\varepsilon]=J(\gamma^{\varepsilon})=\int_0^{\beta}D(\theta^{\varepsilon}(x)\|\gamma^{\varepsilon}(x))\,dx. \]
To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A15 ensures that $G$ is a well-defined convex function for any $\gamma,\bar\gamma\in O(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K=I+1$ in the exponential case, or $K=I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then
\[ J(\gamma)\le J(\bar\gamma). \]

Proof. We construct the family of paths $\gamma^{\varepsilon}=(1-\varepsilon)\gamma+\varepsilon\bar\gamma$, with $G[\varepsilon]=J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0]=0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A3 we will mix notation by writing, for example, $J(\gamma)=\int_0^{\beta}L(\psi,\theta)\,dx$. After defining
\[ \eta(x)=\bar\psi(x)-\psi(x), \]
we may write
\[ G[\varepsilon]=\int_0^{\beta}L(\psi+\varepsilon\eta,\ \theta-\varepsilon\dot\eta)\,dx=\int_0^{\beta}g(x,\varepsilon)\,dx. \tag{A30} \]

We can assume without loss that $G(1)=J(\bar\gamma)<\infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of $0$. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x)=-\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta)=\omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A1 reveals that $\theta_0(\beta)=\omega_1/\beta$ if $I>0$, and $\theta_0(\beta)=CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\tilde\gamma_{k0}(x)$ and $\tilde\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,
\[ \gamma_i(x)=\sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(x)\ge\alpha_i\tilde\gamma_{i0}(x). \]
The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1}>0$. The identity $\theta_{I+}(x)=\rho\gamma_{I+}(x)$, established in the proof of Theorem A11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon}\ge(1-\varepsilon)\gamma$ and $\theta^{\varepsilon}\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0<1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

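The partial derivative of the integrand of (A.30) is

$$\frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^\varepsilon(x)} \eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^\varepsilon(x)} \dot\eta_i(x).$$

The partial derivatives of $L$ are (in mixed notation)

$$\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \qquad \frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.$$

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

$$\left| \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) \right| < B \quad \text{for } \varepsilon \in [0,\varepsilon_0], \text{ a.e. } x \in [0,\beta].$$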

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.

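We are now free to differentiate under the integral sign, so that

$$G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx$$

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

$$\int_0^x \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma} dx' = \int_0^x \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\bigg|_{\gamma} dx'$$

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

$$\begin{aligned}
G'_+[0] &= \int_0^\beta \left( \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \right) dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i \bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0.
\end{aligned}$$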


The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

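Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

$$J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\,dx.$$

It then follows that

$$\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}), \tilde\gamma(x_2^{(n)}), x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\,dx \\
&= J(\tilde\gamma),
\end{aligned}$$

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.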

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666
[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.
[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899
[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767
[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396
[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642
[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716
[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142
[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737
[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416
[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744
[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.
[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.
[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020
[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403
[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.
[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.
[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027
[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649
[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211
[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284
[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.
[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.
[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.
[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325
[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456
[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.
[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting



from the first constraint, $C$ can be expressed as $C = (R_I + Q_I)^{-1}$. Substituting this expression into the second and third constraints, we obtain the equations

$$(\rho\nu - 1) R_I(\rho,\nu) + (\rho - 1) Q_I(\rho) = 0,$$

$$(I - \rho\beta\nu - \zeta) R_I(\rho,\nu) - \zeta Q_I(\rho) + I P_I(\rho\beta) = 0.$$

Each equation implicitly defines a curve $\nu$ as a function of $\rho$, and the intersection of the two curves gives the desired $(\rho,\nu)$. We note that larger than expected values of $\zeta$ will lead to $\nu > \rho > 1$, while smaller than expected values of $\zeta$ give $\nu < \rho < 1$.

The large deviations exponent for the overflow problem may then be expressed as

$$\begin{aligned}
J_O(\eta,\beta) &= D(\pi^* \,\|\, P(\beta)) \\
&= \sum_{i=0}^{\infty} \pi^*_i \log\bigl( C e^{\beta - \rho\beta} \rho^i \nu^{(I-i)^+} \bigr) \\
&= \log C + \beta(1-\rho) + \beta\log\rho + \zeta\log\nu.
\end{aligned}$$

4.3. Partial coupon collection with initial conditions. In the coupon collector's problem, the urns represent the $n$ types of coupons that are required to form a complete collection. The placement of a ball in a given urn corresponds to choosing a new coupon at random, and the problem is to see how many coupons must be collected before $I+1$ complete sets are obtained. This event corresponds to the constraint $\omega_i = 0$, $i \le I$.

In this section we solve a generalization of this problem. Beginning from nonempty initial conditions (a collection already in progress), we collect $\beta n$ additional coupons with the goal of obtaining more than $I$ coupons of as many types as possible. We want to determine how likely it is that the number of types for which we have collected $I$ or fewer coupons is less than $\xi n$.

In terms of the urn problems we have considered, we are given initial occupancies $\alpha$ and wish to compute

$$J_C(\alpha,\beta,\xi) = -\lim_n \frac{1}{n} \log P\left( \sum_{i=0}^{I} \Gamma^n_i(\beta) < \xi \right),$$

where

$$\xi < \sum_{k=0}^{K} \alpha_k \sum_{i=0}^{I-k} P_i(\beta)$$

is an unusually small number of low occupancy urns. The exponent $J_C$ will be given by computing $\mathcal{J}(\omega)$ as defined in Theorem 2.7, subject to the conservation constraint (2.11), replacing the terminal conditions (2.10) with the single constraint

$$\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi,$$

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

$$\omega_{kj} = \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W, & j + k \le I, \\ C_k P_j(\rho\beta), & j + k > I. \end{cases}$$

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

$$J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k \log C_k.$$

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
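The zero-cost value quoted in this example can be checked numerically. The sketch below is illustrative only: the function names are not from the paper, and it assumes that the zero-cost terminal value is the Poisson mixture $\sum_k \alpha_k \sum_{i \le I-k} P_i(\beta)$ that appears as the upper bound on $\xi$ in this section.

```python
import math

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam) = exp(-lam) * lam**i / i!."""
    return math.exp(-lam) * lam**i / math.factorial(i)

def zero_cost_low_occupancy(alpha, beta, I):
    """Expected fraction of types holding I or fewer coupons at time beta.

    alpha[k] is the fraction of types that start with k coupons. A type that
    starts with k coupons stays at or below I if it gains at most I - k more.
    """
    total = 0.0
    for k, a_k in enumerate(alpha):
        total += a_k * sum(poisson_pmf(i, beta) for i in range(I - k + 1))
    return total

psi3 = zero_cost_low_occupancy([0.5, 0.3, 0.2], beta=2.0, I=3)
print(round(psi3, 2))         # 0.71, matching the dashed curve at x = 2
print(math.exp(-100 * 0.18))  # exp(-n * J_C) for n = 100, of order 1e-8
```

The second printed value only exponentiates the rounded exponent $J_C \approx 0.18$; it reproduces the order of magnitude of the quoted probability, not a sharper estimate.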

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii), there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible, from the point of view of large deviations, by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14, using the Euler–Lagrange equations together with properties of the relative entropy.

A.1. Preliminaries.

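A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

$$\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \quad i = 0, \ldots, I \quad \text{(monotonicity)} \tag{A.4}$$

and

$$\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad \text{(conservation)}. \tag{A.5}$$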

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0,\beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

$$\begin{aligned}
\sum_{i=0}^{I} \left( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \right) &= \sum_{j=0}^{I} (I+1-j)(\alpha_j - \omega_j) \\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned} \tag{A.6}$$

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

$$\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \quad i = 0, \ldots, I,$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}$$

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).
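The two feasibility conditions (A.4) and (A.5) are straightforward to test numerically. The following is a minimal sketch under a stated assumption about the data layout: `alpha` and `omega` are sequences of length $I+2$ whose last entry is the catch-all component ($\alpha_{I+1}$ and $\omega_{I+}$, respectively). The function name and the tolerance parameter are illustrative, not part of the paper.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (A.4) and conservation (A.5) conditions.

    alpha, omega: sequences of length I + 2; index I + 1 holds the catch-all
    fraction (alpha_{I+1} and omega_{I+}). This layout is an assumption.
    """
    if len(alpha) != len(omega) or len(alpha) < 2:
        raise ValueError("alpha and omega must have the same length I + 2")
    I = len(alpha) - 2

    # (A.4): cumulative initial occupancy dominates cumulative terminal occupancy.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # (A.5): index I + 1 weights the catch-all term by (I + 1) on both sides.
    lhs = sum(i * omega[i] for i in range(I + 2))
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol

# With I = 1 and empty initial urns, sending every urn above occupancy 1
# needs at least 2 balls per urn, so beta = 1 is infeasible.
print(is_feasible([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], beta=1.0))  # False
print(is_feasible([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], beta=2.0))  # True
```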

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

$$J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx.$$

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

$$\begin{aligned}
L(\psi,\theta) &= D(\theta \,\|\, \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left(1 - \sum_{i=0}^{I}\theta_i\right) \log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}.
\end{aligned}$$

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
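For concreteness, the local rate $D(\theta \,\|\, \gamma)$ can be evaluated as an ordinary relative entropy between two probability vectors. The sketch below assumes that both vectors are passed in full, including the catch-all component ($\gamma_{I+}$ and $\theta_{I+}$) as the last entry; the function name is illustrative only.

```python
import math

def local_rate(gamma, theta):
    """Relative entropy D(theta || gamma) of entry rates against occupancies.

    gamma: occupancy fractions (gamma_0, ..., gamma_I, gamma_{I+}).
    theta: entry rates (theta_0, ..., theta_I, theta_{I+}).
    Both are assumed to be probability vectors of equal length.
    """
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue          # convention 0 * log 0 = 0
        if g == 0.0:
            return math.inf   # positive rate into a class with no urns
        total += t * math.log(t / g)
    return total

# Balls entering each class at its expected rate incur zero cost.
print(local_rate([0.5, 0.3, 0.2], [0.5, 0.3, 0.2]))  # 0.0
# Forcing every ball into a class that holds half the urns costs log 2.
print(local_rate([0.5, 0.5], [1.0, 0.0]))
```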

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

$$\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x))$$

for all $i \in \{0, \ldots, I\}$, $x \in (0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

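In the case at hand, the Euler–Lagrange equations are given by

$$-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right] \tag{A.7}$$

for $i = 0, \ldots, I-1$, and by

$$-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\left[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right]. \tag{A.8}$$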

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths the rate function $L(\psi,\theta)$ then reduces to

$$L(\psi,\theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9}$$

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

$$-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \right] \tag{A.10}$$

for $i = 0, \ldots, I-1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

$$\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1,$$

$$\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} i C P_i(\rho\beta) = \beta.$$
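The two twist equations can be solved numerically. Dividing the second by the first eliminates $C$ and leaves a single equation in $\lambda = \rho\beta$: the mean of a Poisson($\lambda$) variable conditioned to exceed $I$ must equal $(\beta - \sum_i i\omega_i)/(1 - \sum_i \omega_i)$. The sketch below solves this by bisection, relying on the fact that this conditional mean increases with $\lambda$. It is an illustration rather than the authors' procedure; the function names, the truncation of the Poisson tail at `terms` entries and the initial bracket are all assumptions suitable only for moderate $\lambda$.

```python
import math

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam), computed through logarithms."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def tail_sums(lam, I, terms=400):
    """Truncated sums over i > I of P_i(lam) and of i * P_i(lam)."""
    mass = mean = 0.0
    for i in range(I + 1, I + 1 + terms):
        p = poisson_pmf(i, lam)
        mass += p
        mean += i * p
    return mass, mean

def twist_parameters(omega, beta, iters=200):
    """Return (C, rho) for terminal constraints omega_0..omega_I (exponential case)."""
    I = len(omega) - 1
    free_mass = 1.0 - sum(omega)
    free_mean = beta - sum(i * w for i, w in enumerate(omega))
    if free_mass <= 0.0:
        raise ValueError("polynomial case: C = 0 and rho is not determined here")
    target = free_mean / free_mass
    if target <= I + 1:
        raise ValueError("constraints leave no room for an exponential tail")

    def cond_mean(lam):
        mass, mean = tail_sums(lam, I)
        return mean / mass

    lo, hi = 1e-6, 1.0
    while cond_mean(hi) < target:   # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):          # bisection on lam = rho * beta
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    mass, _ = tail_sums(lam, I)
    return free_mass / mass, lam / beta

# Sanity check: if omega is the Poisson(beta) head, nothing is twisted.
beta = 1.5
omega = [poisson_pmf(i, beta) for i in range(3)]
print(twist_parameters(omega, beta))  # close to (1.0, 1.0)
```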

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints, and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

$$\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^k, \tag{A.8}$$

$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \quad 0 \le i \le I, \tag{A.9}$$

$$\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)$$

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

$$\psi_0(x) = \sum_{k=0}^{I} \omega_k \left(1 - \frac{x}{\beta}\right)^k. \tag{A.10}$$

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

$$(-1)^i \varphi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a,b] \text{ and } i > 0. \tag{A.11}$$

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0,\beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

$$(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr) \beta^{-i} \frac{k!}{(k-i)!} \left(1 - \frac{x}{\beta}\right)^{k-i}$$

for $i = 0, \ldots, I$, and

$$(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12}$$

for $i > I$. It is clear that $a(x) = (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

$$\begin{aligned}
\varphi(\beta) &= (-1)^I \psi_0^{(I)}(\beta) \\
&= \rho^I C e^{-\rho\beta} + \left( \omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!} \right) \beta^{-I} I! \\
&= \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I \ge 0,
\end{aligned}$$

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$, and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that

$$(-1)^I \psi_0^{(I)}(x) = \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I > 0 \quad \text{for all } x \in [0,\beta].$$
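The closed-form derivatives in the proof above make the extremal of Theorem A.1 easy to evaluate. The sketch below computes $(-1)^i\psi_0^{(i)}(x)$ and $\gamma_i(x)$ for $i \le I$, then checks the endpoint values on one hand-built example with $I = 1$, $\beta = 1$, $C = 1$, $\rho = 1.3$. The function names and the example are illustrative assumptions; the example's $\omega$ was chosen so that the two twist equations of this section hold exactly.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P_k(lam)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def neg_deriv(i, x, omega, beta, C, rho):
    """(-1)^i times the i-th derivative of psi_0 at x, valid for 0 <= i <= I."""
    I = len(omega) - 1
    value = rho**i * C * math.exp(-rho * x)
    for k in range(i, I + 1):
        coef = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)
        value += coef * beta**(-i) * falling * (1.0 - x / beta)**(k - i)
    return value

def gamma_i(i, x, omega, beta, C, rho):
    """Occupancy fraction gamma_i(x) = x^i / i! * (-1)^i psi_0^(i)(x)."""
    return x**i / math.factorial(i) * neg_deriv(i, x, omega, beta, C, rho)

# Example data (assumed, not from the paper): I = 1, beta = 1, C = 1, rho = 1.3.
p0 = math.exp(-1.3)
omega = [p0 + 0.3, 1.3 * p0 - 0.3]
args = (omega, 1.0, 1.0, 1.3)

print(gamma_i(0, 0.0, *args))    # initial condition psi_0(0), close to 1
print(neg_deriv(1, 0.0, *args))  # -psi_0'(0), close to 1
print([gamma_i(i, 1.0, *args) for i in range(2)], omega)  # terminal values match omega
```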

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

$$\psi_0(0) = C\left( 1 - \sum_{i=0}^{I} P_i(\rho\beta) \right) + \sum_{i=0}^{I} \omega_i = 1.$$

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

$$\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \quad i = 0, 1, \ldots, I, I+1, \ldots. \tag{A.13}$$

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

$$\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.$$

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

$$\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned} \tag{A.14}$$

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

$$\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.$$

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

$$\sum_{i=0}^{I} \theta_i(x) \le 1$$

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

$$\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),$$

$$1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),$$

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

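From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \quad x \in (0,\beta), \ i \ge 0. \tag{A.15}$$

Then

$$\frac{d}{dx}\left[ -\log\frac{\theta_i}{\gamma_i} \right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \quad x \in (0,\beta), \tag{A.16}$$

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).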

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

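Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

$$(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I,$$

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

$$\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,$$

$$\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},$$

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{\infty} \bigl[ \gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)| \bigr]. \tag{A.17}$$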

Proof. For all $i > I$ and $x \in [0,\beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

$$\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).$$

Then for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

$$\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}$$

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$.

Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that given $\varepsilon > 0$, the terms $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

$$J(\gamma) = \lim_{\varepsilon\downarrow 0} \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.$$

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

$$\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta-\varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned} \tag{A.18}$$

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

$$-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],$$

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

$$\sum_{i=0}^{\infty} \bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.$$

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

$$\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,$$

$$-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \quad i = 0, \ldots, I-1,$$

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[ \gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)| \bigr]. \tag{A.19}$$

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

$$J(\gamma) = \min_{\pi\in F(1,\omega,\beta)} D(\pi \,\|\, P(\beta)).$$

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^\infty$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

$$L(\pi,y,z) = \sum_{i=0}^{\infty} \pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left( 1 - \sum_{i=0}^{\infty} \pi_i \right) + z\left( \beta - \sum_{i=0}^{\infty} i\pi_i \right),$$

where $z = \log\rho$ and $y = \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i = \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \to x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

$$\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i\log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.$$
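The claim that $\pi^*_i = C P_i(\rho\beta)$ minimizes $x \to x\log x - x[\log P_i(\beta) + y + iz]$ can be checked directly: setting the derivative $\log x + 1 - [\log P_i(\beta) + y + iz]$ to zero gives $x = P_i(\beta)e^{y + iz - 1}$. The sketch below evaluates this stationary point with $y = \log C + (1-\rho)\beta + 1$ and $z = \log\rho$ and compares it with $C P_i(\rho\beta)$. The function names and the parameter values are illustrative assumptions.

```python
import math

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam)."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def pointwise_minimizer(i, beta, C, rho):
    """Stationary point of x*log(x) - x*(log P_i(beta) + y + i*z)."""
    y = math.log(C) + (1.0 - rho) * beta + 1.0
    z = math.log(rho)
    # Derivative log(x) + 1 - (log P_i(beta) + y + i*z) vanishes here.
    return math.exp(math.log(poisson_pmf(i, beta)) + y + i * z - 1.0)

beta, C, rho = 1.0, 0.8, 1.3   # arbitrary illustrative values
for i in range(3, 7):
    print(i, pointwise_minimizer(i, beta, C, rho), C * poisson_pmf(i, rho * beta))
```

Each printed pair agrees up to rounding, which is the identity $P_i(\beta)\,C e^{(1-\rho)\beta}\rho^i = C P_i(\rho\beta)$.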

Following standard Lagrangian arguments, we thus have

$$\begin{aligned}
\inf_{\pi\in F(1,\omega,\beta)} D(\pi \,\|\, P(\beta)) &= \inf_{\pi\in F(1,\omega,\beta)} L(\pi,y,z) \\
&\ge \inf_{\pi\in\mathbb{R}_+^\infty} L(\pi,y,z) \\
&= D(\pi^* \,\|\, P(\beta)).
\end{aligned}$$

Since $\pi^* \in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

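Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[ J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,P(\beta)), \]

where

\[ \gamma_i(\beta) = \begin{cases} \omega_i, & 0\le i\le I,\\ C P_i(\rho\beta), & i > I. \end{cases} \]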

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[ J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)} + \Big(1-\sum_{i=0}^{I}\omega_i\Big)\big(\log C + (1-\rho)\beta\big) + \Big(\beta-\sum_{i=0}^{I} i\omega_i\Big)\log\rho, \]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
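The explicit cost formula can be compared against a direct evaluation of the relative entropy. The sketch below is illustrative only and makes several assumptions: the values of `beta`, `rho`, `C` and `I` are hypothetical, the infinite tail is truncated at `N` terms, and instead of solving (2.6) for $C$ we fix $C$ and choose $\omega_0,\omega_1$ so that the terminal distribution has unit mass and mean $\beta$.

```python
import math

def poisson_pmf(i, mean):
    """Poisson probability P_i(mean)."""
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

def relative_entropy_to_poisson(dist, beta):
    """D(dist || Poisson(beta)) for a finitely supported list of probabilities."""
    return sum(p * math.log(p / poisson_pmf(i, beta))
               for i, p in enumerate(dist) if p > 0.0)

# Hypothetical parameters; N truncates the Poisson tail (an approximation).
beta, rho, C, I, N = 2.0, 1.5, 0.5, 1, 120

# Tail of the terminal distribution: gamma_i(beta) = C * P_i(rho * beta), i > I.
tail = [C * poisson_pmf(i, rho * beta) for i in range(I + 1, N)]
tail_mass = sum(tail)
tail_mean = sum(i * p for i, p in zip(range(I + 1, N), tail))

# Choose omega_0, omega_1 so that total mass is 1 and the mean is beta.
omega1 = beta - tail_mean
omega0 = 1.0 - tail_mass - omega1
omega = [omega0, omega1]

direct = relative_entropy_to_poisson(omega + tail, beta)
closed_form = (
    sum(w * math.log(w / poisson_pmf(i, beta)) for i, w in enumerate(omega))
    + (1.0 - sum(omega)) * (math.log(C) + (1.0 - rho) * beta)
    + (beta - sum(i * w for i, w in enumerate(omega))) * math.log(rho)
)
```

The two values agree because, for $i > I$, $\log\big(C P_i(\rho\beta)/P_i(\beta)\big) = \log C + (1-\rho)\beta + i\log\rho$, so the tail contributes only through its mass and its mean.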

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[ \gamma_i(x) = \sum_{k\in\mathcal{K}} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta). \]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1 with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

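The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[ \min_{\pi\in S_\infty^{|\mathcal{K}|}} \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \tag{A.20} \]

subject to the conservation constraint

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{k,j} = \beta \tag{A.21} \]

and the terminal conditions

\[ \omega_i = \sum_{k\le i,\; k\in\mathcal{K}} \alpha_k\,\pi_{k,i-k} \quad \text{for all } 0\le i\le I. \tag{A.22} \]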

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[ \pi_{k,j} = 0, \qquad k+j > I, \tag{A.23} \]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0\le k\le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints.

Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i=0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i=1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
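The ordered filling construction can be sketched in code. This is a minimal illustration under stated assumptions rather than the authors' procedure: it treats polynomial constraints only, draws mass from the lowest-indexed class first (the text leaves the order among classes open), and the function name `ordered_fill` and the example inputs are hypothetical.

```python
def ordered_fill(alpha, omega, tol=1e-12):
    """Greedy sketch of an ordered filling for polynomial constraints.

    alpha[k]: initial fraction of urns holding k balls, k = 0..I
    omega[i]: terminal fraction of urns holding i balls, i = 0..I
    Returns pi with pi[k][j] = fraction of class-k urns receiving j extra balls.
    """
    I = len(omega) - 1
    remaining = list(alpha)  # unassigned mass alpha_k * (1 - sum_j pi[k][j])
    pi = {k: [0.0] * (I - k + 1) for k in range(I + 1) if alpha[k] > 0.0}
    for i in range(I + 1):
        need = omega[i]
        for k in range(i + 1):  # only classes k <= i can end at level i
            if k not in pi or need <= tol:
                continue
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition violated at level %d" % i)
    return pi

# Hypothetical example: half the urns start empty, half start with one ball.
alpha = [0.5, 0.5, 0.0]
omega = [0.2, 0.3, 0.5]
pi = ordered_fill(alpha, omega)
mean_added = sum(alpha[k] * j * p for k, row in pi.items() for j, p in enumerate(row))
```

In this example the mean number of added balls comes out as $\sum_i i\omega_i - \sum_k k\alpha_k = 0.8$, so the conservation constraint is met without being imposed separately.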

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$ this distance is simply

\[ \sum_{i:\,P_i>Q_i} (P_i - Q_i) = \sum_{i:\,Q_i>P_i} (Q_i - P_i), \]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A<\infty$ define the set

\[ H_A = \Big\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le A \Big\}. \]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[ Q_A(\alpha,\omega,\beta) = F(\alpha,\omega,\beta)\cap H_A \]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n),(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n),(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[ \sum_{j\ge m} j\,\pi^{(n)}_{k,j} \le \varepsilon \tag{A.24} \]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

\[ xy \le (x\log x - x + 1) + (e^y - 1), \]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\,\frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta}-\beta} < \infty. \]
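Both ingredients of the uniform integrability estimate are easy to check numerically. The sketch below is illustrative only: the grid ranges, the values of `beta` and `delta`, and the truncation of the series at 200 terms are assumptions made for the example. It evaluates the gap in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ on a grid and compares a truncated exponential-moment series with the closed form $e^{\beta e^{1/\delta}-\beta}$.

```python
import math

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with 0 log 0 = 0; expected to be >= 0."""
    xlogx = x * math.log(x) if x > 0.0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

# Smallest gap over x in [0, 10] and y in [-5, 5]; equality holds only at x = e^y.
min_gap = min(young_gap(0.05 * a, -5.0 + 0.1 * b)
              for a in range(0, 201) for b in range(0, 101))

# Exponential moment of the Poisson distribution, with terms formed in log space
# so that e^{j/delta} and P_j(beta) are never computed separately.
beta, delta = 2.0, 0.5
series = sum(
    math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
    for j in range(0, 200)
)
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
```

Forming each term in log space avoids overflow in $e^{j/\delta}$ for large $j$ when $\delta$ is small.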

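Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j\ge m} j\,\pi^{(n)}_{k,j}
&= \sum_{j\ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j\ge 0} \delta\left[\pi^{(n)}_{k,j}\log\!\left(\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\right) - \pi^{(n)}_{k,j} + P_j(\beta)\right] + \delta\sum_{j\ge m}\big(e^{j/\delta}-1\big)P_j(\beta) \\
&= \delta\,D(\pi^{(n),(k)}\,\|\,P(\beta)) + \delta\sum_{j\ge m}\big(e^{j/\delta}-1\big)P_j(\beta) \\
&\le \delta A + \delta\sum_{j\ge m}\big(e^{j/\delta}-1\big)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n),(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]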

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[ \beta^{(n)}_k \to \beta_k. \]

Finally,

\[ \sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta, \]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[ G = \inf_{\pi\in Q_A} \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n),(k)}\,\|\,P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[ \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{*(k)}\,\|\,P(\beta)) \le \liminf_n \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{(n),(k)}\,\|\,P(\beta)) = G, \]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[ (\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta) \]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[ \liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta). \]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[ \mathcal{J}^{(n)} \le G + 1/n < A, \]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$ then

\[ \beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \;\to\; \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{k,j}, \qquad k = 0,1,\ldots,K. \]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n),(k)}\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$ even if $\alpha_k = 0$.
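The stated infimum $\beta - \lambda + \lambda\log(\lambda/\beta)$ is attained by the Poisson distribution with mean $\lambda$, which can be confirmed numerically. The following sketch is an illustration only: the values of `beta` and `lam`, the truncation length `N`, and the two-point competitor are hypothetical choices.

```python
import math

def log_poisson_pmf(j, mean):
    """log P_j(mean) for the Poisson distribution."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def relative_entropy_to_poisson(dist, beta):
    """D(dist || Poisson(beta)) for a finitely supported list of probabilities."""
    return sum(p * (math.log(p) - log_poisson_pmf(j, beta))
               for j, p in enumerate(dist) if p > 0.0)

beta, lam, N = 2.0, 3.5, 150  # hypothetical values; N truncates the support

lower_bound = beta - lam + lam * math.log(lam / beta)

# The Poisson(lam) distribution attains the bound (up to truncation error).
poisson_lam = [math.exp(log_poisson_pmf(j, lam)) for j in range(N)]
attained = relative_entropy_to_poisson(poisson_lam, beta)

# A competitor with the same mean 3.5: equal mass on the points 3 and 4.
two_point = [0.0, 0.0, 0.0, 0.5, 0.5]
competitor = relative_entropy_to_poisson(two_point, beta)
```

Since $\beta - \lambda + \lambda\log(\lambda/\beta)$ grows superlinearly in $\lambda$, a class whose weight vanishes while its share of the balls does not would contribute unbounded cost, which is the point used in the proof.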

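Thus

\[ \sum_{k\in\mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta, \]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n),(k)}\,\|\,P(\beta^{(n)}))
&\ge \liminf_n \sum_{k:\,\alpha_k>0} \alpha^{(n)}_k D(\pi^{(n),(k)}\,\|\,P(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k>0} \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.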

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{k,j}$ with $k+j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k+j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k+j \notin \mathcal{I}(\omega)$.

Proof. For $k+j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m,n)$ with $m+n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$ subject to the restrictions that $\tilde\pi_{m,n} > 0$, that $\tilde\pi_{k,j} = 0$ for all $k+j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[ \tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\,\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\tilde\pi_{k,j}, \]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta > 0$ we may define

\[ \hat\beta \doteq \frac{\beta - \eta\tilde\beta}{1-\eta}, \qquad \hat\omega \doteq \frac{\omega - \eta\tilde\omega}{1-\eta}. \]

By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1-\eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{m,n} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

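We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left(\log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)} + 1\right)(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)}\,(\bar\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m\bar\pi_{m,n}\log\frac{\pi^{\varepsilon}_{m,n}}{P_n(\beta)} + \sum_{k\in\mathcal{K}} \alpha_k \sum_{(k,j)\ne(m,n)} \bar\pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{j} \pi_{k,j}\log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[ -\sum_k \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le 0. \]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{k,j}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m+n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m+n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m+n \in \mathcal{I}(\omega)$.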

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[ \pi^*_{k,j} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega),\\ 0, & \text{otherwise}. \end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{k+j\in\mathcal{I}} \pi_{k,j}\log\frac{\pi_{k,j}}{P_j(\beta)} + \sum_{k\in\mathcal{K}} z_k\alpha_k\Big(1 - \sum_{k+l\in\mathcal{I}} \pi_{k,l}\Big) + \sum_{i\in\mathcal{I}} w_i\Big(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Big) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[ \pi^*_{k,j} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k-1}$ and $W_i = e^{w_i}$.
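The product form $D_k P_j(\beta) W_{k+j}$ suggests a simple way to compute the constants numerically in small polynomial examples. The sketch below uses iterative proportional fitting (alternately rescaling $D_k$ and $W_i$), which is a technique we substitute for illustration and is not taken from the paper; the function name `fit_scalings`, the fixed sweep count, and the example constraints are all assumptions.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability P_j(mean)."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def fit_scalings(alpha, omega, beta, sweeps=500):
    """Alternately rescale D_k and W_i so that pi[k][j] = D_k * P_j(beta) * W_{k+j}
    meets the normalization and terminal constraints (polynomial case only)."""
    I = len(omega) - 1
    classes = [k for k in range(I + 1) if alpha[k] > 0.0]
    levels = [i for i in range(I + 1) if omega[i] > 0.0]
    D = {k: 1.0 for k in classes}
    W = {i: 1.0 for i in levels}
    for _ in range(sweeps):
        for k in classes:  # enforce sum_j pi[k][j] = 1
            D[k] = 1.0 / sum(poisson_pmf(i - k, beta) * W[i]
                             for i in levels if i >= k)
        for i in levels:   # enforce sum_k alpha_k * pi[k][i-k] = omega_i
            W[i] = omega[i] / sum(alpha[k] * D[k] * poisson_pmf(i - k, beta)
                                  for k in classes if k <= i)
    pi = {k: {i - k: D[k] * poisson_pmf(i - k, beta) * W[i]
              for i in levels if i >= k}
          for k in classes}
    return D, W, pi

# Hypothetical polynomial example; beta is fixed by the conservation condition.
alpha = [0.5, 0.5, 0.0]
omega = [0.2, 0.3, 0.5]
beta = sum(i * w for i, w in enumerate(omega)) - sum(k * a for k, a in enumerate(alpha))
D, W, pi = fit_scalings(alpha, omega, beta)
mean_added = sum(alpha[k] * j * p for k, row in pi.items() for j, p in row.items())
```

A fixed point of these updates is feasible and has the product form of Theorem A.9; because the objective is strictly convex and the constraints are linear, such a point should coincide with the minimizer $\pi^*$. The sweep count is a heuristic, and convergence should be checked for each example by inspecting the constraint residuals.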

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[ \pi^*_{k,j} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I},\\ C_k P_j(\rho\beta), & k\in\mathcal{K},\ k+j > I,\\ 0, & \text{otherwise}. \end{cases} \]

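Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[ \beta^{(M)} = \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\,\pi^*_{k,l}, \qquad \eta^{(M)}_k \doteq \sum_{l\le M} \pi^*_{k,l}, \]

and consider the problem of minimizing

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\; k+j\in\mathcal{I}} \pi_{k,j}\log\frac{\pi_{k,j}}{P_j(\beta)} \]

subject to the constraints

\[
\begin{aligned}
&\sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\; k+l\in\mathcal{I}} l\,\pi_{k,l} = \beta^{(M)}, \\
&\sum_{l\le M,\; k+l\in\mathcal{I}} \pi_{k,l} = \eta^{(M)}_k, \qquad k\in\mathcal{K}, \\
&\sum_{k\le i,\; k\in\mathcal{K}} \alpha_k\pi_{k,i-k} = \omega_i, \qquad i\in\mathcal{I}.
\end{aligned}
\]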

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} ={}& \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\; k+j\in\mathcal{I}} \pi_{k,j}\log\frac{\pi_{k,j}}{P_j(\beta)}
+ y^{(M)}\Big(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\; k+l\in\mathcal{I}} l\,\pi_{k,l}\Big) \\
&+ \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k\Big(\eta^{(M)}_k - \sum_{l\le M,\; k+l\in\mathcal{I}} \pi_{k,l}\Big)
+ \sum_{i\in\mathcal{I}} w^{(M)}_i\Big(\omega_i - \sum_{k\le i,\; k\in\mathcal{K}} \alpha_k\pi_{k,i-k}\Big).
\end{aligned}
\]

Since the derivative of $L$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

\[ \pi^*_{k,j} = P_j(\beta)\,e^{\,j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}} \]

for all $j < M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\,\pi^*_{k,j}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} \doteq \pi^*_{k,j}$ and $\omega^{(k)} = (\omega_{k,0},\ldots,\omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\pi^*_{k,j}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{aligned}
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\bar\gamma_{k,j}(x) \doteq \gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\bar\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k\,\bar\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k\,\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{k,j}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k,0}(0) = 1$ and $\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A.22).

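In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k,0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve after rescaling is

\[
\begin{aligned}
\bar\psi_{k,0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \big(\omega_{k,i} - C_k P_i(\rho_k\beta_k)\big)\left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \big(\omega_{k,i} - C_k P_i(\rho\beta)\big)\left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)P_i(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{i}\right].
\end{aligned}
\]

Here we have used that $\omega_{k,i} = \pi^*_{k,i} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I-k$. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{0,0}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0 C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)P_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0 C_0\rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)P_j(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0 C_0\rho^k}{C_k}\,\bar\psi_{k,0}(x).
\end{aligned}
\tag{A.25}
\]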

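Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k,0}^{(i-k)}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[ \frac{\theta_{I+}(x)}{1-\psi_I(x)} = \rho. \tag{A.27} \]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.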

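In the polynomial case, similar computations show that

\[ (-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\,\bar\psi_{k,0}(x) \tag{A.28} \]

and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29} \]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).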

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i \]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\big(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\big)$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x\in[0,\beta)$, $i\le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i} \alpha_k\,\bar\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\right] - \sum_{k=0}^{K} \alpha_k\log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\bar\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\times\bar\psi_{k,0}(x)$, so that

\[ \frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j}\psi_{k,0}^{(j)}\!\left(\frac{x\beta_k}{\beta}\right)e^{\beta} = \frac{\bar\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\bar\gamma_{k,\cdot}(\beta)\,\|\,P(\beta)). \]

By construction, the endpoints $\bar\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma\in\mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \mapsto D(\theta\,\|\,\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+\cup\{\infty\}$ by

\[ G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x)\,\|\,\gamma^{\varepsilon}(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma) \le J(\tilde\gamma). \]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining

\[ \eta(x) \doteq \tilde\psi(x) - \psi(x), \]

we may write

\[ G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x,\varepsilon)\,dx. \tag{A.30} \]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\bar\gamma_{k,0}(x)$ and $\bar\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\bar\gamma_{k,i-k}(x) \ge \alpha_i\,\bar\gamma_{i,0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1-\varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

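The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^{\varepsilon}}(x)\,\eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^{\varepsilon}}(x)\,\dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\[
\begin{aligned}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\end{aligned}
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon}\le 1$, $\theta^{\varepsilon}\le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left|\frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\right| < B \qquad \text{for } \varepsilon\in[0,\varepsilon_0],\ \text{a.e. } x\in[0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.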

We are now free to differentiate under the integral sign so that

Gprime[ε] =

int β

0

part

partεg(x ε)dx

for all ε isin [0 ε0) We note thatint x

0

partL

partψi

γdxprime =

int x

0

minusθiγi

+θi+1

γi+1

γdxprime

is absolutely continuous as a function of x since it is the difference of twocontinuous monotone functions with bounded derivatives Applying integra-tion by parts for absolutely continuous functions ([23] 361 page 209) tothe first term we obtain

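\begin{align*}
G'_{+}[0]
&= \int_0^{\beta} \left[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right] dx \\
&= \int_0^{\beta} \dot\eta(x)\cdot\left(-\int_0^{x}\frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right) dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0.
\end{align*}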


The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

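Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x)\,\|\,\tilde\gamma(x)\bigr)\,dx.
\]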

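It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_{n} \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\,\tilde\gamma(x_2^{(n)}),\,x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x)\,\|\,\tilde\gamma(x)\bigr)\,dx \\
&= J(\tilde\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.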

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


conditions (2.10) with the single constraint

\[
\sum_{k=0}^{K} \alpha_k \sum_{j=0}^{I-k} \pi_{kj} = \xi,
\]

where $K \le I$, since any sets which are initially complete may be left out of the problem. After constructing a Lagrangian and differentiating, we find that the minimizing solution must be of the form

\[
\omega_{kj} = \pi^{*}_{kj} =
\begin{cases}
C_k P_j(\rho\beta)\,W, & j + k \le I, \\
C_k P_j(\rho\beta), & j + k > I.
\end{cases}
\]

As in the previous example, the unknown constants may be determined by substituting the given form of the solution into the constraint equations and solving the resulting system of equations. In terms of these constants, the large deviations exponent may be expressed as

\[
J_C(\alpha,\beta,\xi) = \beta(1 - \rho + \log\rho) + \xi\log W + \sum_{k=0}^{K} \alpha_k \log C_k.
\]

Fig. 2. Cumulative urn occupancies $\psi_0$ through $\psi_5$ for a partial coupon collector's problem, including unconstrained paths (dashed curves) and constrained paths (solid curves).


Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
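The two numbers quoted in this example can be checked with a short computation. The sketch below assumes that, on the zero-cost path, the number of additional coupons received by each type is asymptotically Poisson with mean $\beta$ (the "Poisson zero cost distribution" of the Appendix); the function names are illustrative and not part of the paper.

```python
import math

def poisson_cdf(m, lam):
    """P(N <= m) for N ~ Poisson(lam); returns 0 when m < 0."""
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(m + 1))

def zero_cost_cumulative(alpha, beta, i):
    """psi_i(beta) on the zero-cost path: fraction of urns with at most i balls.

    alpha[k] is the initial fraction of urns holding k balls; an urn that
    starts with k balls ends with at most i balls if it receives at most
    i - k additional balls (assumed Poisson(beta) in the limit).
    """
    return sum(a * poisson_cdf(i - k, beta) for k, a in enumerate(alpha))

alpha = [0.5, 0.3, 0.2]                     # initial conditions from the example
psi3 = zero_cost_cumulative(alpha, beta=2.0, i=3)
prob = math.exp(-100 * 0.18)                # exp(-n * J_C) with n = 100, J_C ~ 0.18

print(round(psi3, 4), prob)
```

The first value agrees with $\psi_3(2.0) \approx 0.71$, and the second is of order $10^{-8}$, the large deviations estimate quoted in the text.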

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0,\beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \ldots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible from the point of view of large deviations by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^{*}$ satisfying $\mathcal{J}(\omega) = J(\gamma^{*})$.

In the first step of the proof we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1,\omega,\beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1,\omega,\beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^{*}$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14 using the Euler–Lagrange equations together with properties of the relative entropy.

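A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha,\omega,\beta)$ are feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad\text{(monotonicity)}
\tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\,\omega_i + (I+1)\,\omega_{I+} \le \sum_{i=0}^{I+1} i\,\alpha_i + \beta \quad\text{(conservation)}.
\tag{A.5}
\]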
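The two conditions are easy to test numerically. The following is a minimal sketch, assuming that `alpha` lists $\alpha_0, \ldots, \alpha_{I+1}$ (the last entry being the initial catch-all fraction), that `omega` lists $\omega_0, \ldots, \omega_I$, and that $\omega_{I+} = 1 - \sum_i \omega_i$; the function name and the tolerance are illustrative choices.

```python
def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (A.4) and conservation (A.5) conditions.

    alpha: [alpha_0, ..., alpha_{I+1}]  initial occupancy fractions
    omega: [omega_0, ..., omega_I]      terminal occupancy fractions
    beta:  number of balls thrown per urn
    """
    I = len(omega) - 1
    if len(alpha) != I + 2:
        raise ValueError("alpha must have I + 2 entries")
    omega_plus = 1.0 - sum(omega)          # terminal catch-all fraction

    # (A.4): cumulative initial mass dominates cumulative terminal mass.
    cum_alpha = cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # (A.5): balls accounted for at the end cannot exceed balls available.
    lhs = sum(i * w for i, w in enumerate(omega)) + (I + 1) * omega_plus
    rhs = sum(i * a for i, a in enumerate(alpha)) + beta
    return lhs <= rhs + tol
```

For example, with all urns initially empty and $\beta = 2$, the terminal condition $\omega = (0,0,0,0)$ (every urn holding at least four balls) violates conservation and is reported as infeasible.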

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0,\beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\begin{align*}
\sum_{i=0}^{I} \left( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \right)
&= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\tag{A.6}
\end{align*}

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\begin{align*}
\gamma_i(x) &= \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\end{align*}

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$ and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx.
\]

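Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\begin{align*}
L(\psi,\theta) &= D(\theta\,\|\,\gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+}\log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}
 + \left(1 - \sum_{i=0}^{I}\theta_i\right) \log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}.
\end{align*}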
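As an illustration, the local rate can be evaluated directly from this formula. The sketch below assumes the conventions $\psi_{-1} = 0$ and $0\log 0 = 0$, and returns infinity when a positive rate is assigned to an occupancy level with zero mass; the function name is hypothetical.

```python
import math

def local_rate(psi, theta):
    """Relative entropy L(psi, theta) = D(theta || gamma).

    psi:   [psi_0, ..., psi_I]     cumulative occupancies
    theta: [theta_0, ..., theta_I] rates at which balls enter urns of occupancy i
    """
    I = len(psi) - 1
    # gamma_i = psi_i - psi_{i-1}, with psi_{-1} = 0, plus the catch-all 1 - psi_I.
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, I + 1)]
    gamma.append(1.0 - psi[I])
    rates = list(theta) + [1.0 - sum(theta)]   # theta_{I+} = 1 - sum(theta_i)

    total = 0.0
    for t, g in zip(rates, gamma):
        if t < 0.0:
            return math.inf                    # not a valid rate vector
        if t == 0.0:
            continue                           # convention 0 * log 0 = 0
        if g <= 0.0:
            return math.inf                    # mass sent to an empty level
        total += t * math.log(t / g)
    return total
```

When $\theta$ equals $\gamma$ the cost vanishes, which corresponds to the zero-cost (law of large numbers) behavior; any other choice of $\theta$ gives a strictly positive value.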


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[
\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x)) = -\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x))
\]

for all $i \in \{0, \ldots, I\}$, $x \in (0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

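In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\right]
\tag{A.7}
\]

for $i = 0, \ldots, I - 1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I}
= \frac{d}{dx}\left[-\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I}\right].
\tag{A.8}
\]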

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths the rate function $L(\psi,\theta)$ then reduces to

\[
L(\psi,\theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}.
\tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\left[-\log\frac{\theta_i}{\psi_i - \psi_{i-1}}\right]
\tag{A.10}
\]

for $i = 0, \ldots, I - 1$.


A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\begin{align*}
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) &= 1, \\
\sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C P_i(\rho\beta) &= \beta.
\end{align*}
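In the exponential case ($C > 0$) these two equations can be solved numerically. Dividing the second by the first shows that $\lambda = \rho\beta$ must make the mean of a Poisson($\lambda$) variable conditioned to exceed $I$ equal to $(\beta - \sum_i i\omega_i)/(1 - \sum_i \omega_i)$, and this conditional mean is increasing in $\lambda$. The sketch below uses bisection on $\lambda$; it assumes $P_i(\lambda) = e^{-\lambda}\lambda^i/i!$, truncates the tail sums after a fixed number of terms (adequate only for moderate $\lambda$), and uses illustrative function names that do not come from the paper.

```python
import math

def poisson_tail(I, lam, terms=400):
    """Return (sum_{i>I} P_i(lam), sum_{i>I} i * P_i(lam)) by direct summation."""
    mass = mean = 0.0
    for i in range(I + 1, I + 1 + terms):
        # P_i(lam) computed in log space to avoid overflow for large i.
        p = math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        mass += p
        mean += i * p
    return mass, mean

def solve_twist(omega, beta, iters=200):
    """Solve the two twist equations for (C, rho) in the exponential case."""
    I = len(omega) - 1
    rem_mass = 1.0 - sum(omega)                              # mass left for i > I
    rem_mean = beta - sum(i * w for i, w in enumerate(omega))
    if rem_mass <= 0.0:
        raise ValueError("polynomial case (C = 0): no tail mass to fit")
    target = rem_mean / rem_mass                             # required tail mean
    if target <= I + 1:
        raise ValueError("tail mean must exceed I + 1")

    def cond_mean(lam):
        mass, mean = poisson_tail(I, lam)
        return mean / mass

    lo, hi = 1e-6, 1.0
    while cond_mean(hi) < target:                            # bracket the root
        hi *= 2.0
    for _ in range(iters):                                   # bisection on lam
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    mass, _ = poisson_tail(I, lam)
    return rem_mass / mass, lam / beta                       # (C, rho)
```

As a sanity check, if $\omega_i = P_i(\beta)$ for $i \le I$ (the zero-cost terminal condition), the solver should return values close to $C = 1$ and $\rho = 1$.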

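Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\begin{align*}
\psi_0(x) &= C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^{k},
\tag{A.8} \\
\gamma_i(x) &= \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I,
\tag{A.9} \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.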
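The construction in the theorem is explicit enough to evaluate directly. The sketch below computes $(-1)^i\psi_0^{(i)}(x)$ by differentiating the exponential and polynomial parts of $\psi_0$ term by term, and then forms $\gamma_i(x)$. It assumes $P_k(\lambda) = e^{-\lambda}\lambda^k/k!$; the function names are illustrative. Note that the terminal identity $\gamma_i(\beta) = \omega_i$ holds for any $C$ and $\rho$, whereas the initial condition $\psi_0(0) = 1$ requires the twist equations.

```python
import math

def poisson_pmf(k, lam):
    """P_k(lam) = exp(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def signed_derivative(i, x, omega, beta, C, rho):
    """(-1)^i * psi_0^{(i)}(x) for the function psi_0 of Theorem A.1."""
    I = len(omega) - 1
    value = rho**i * C * math.exp(-rho * x)           # exponential part
    for k in range(i, I + 1):                         # polynomial part
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)
        value += coeff * falling * beta**(-i) * (1.0 - x / beta)**(k - i)
    return value

def gamma_i(i, x, omega, beta, C, rho):
    """Occupancy gamma_i(x) = x^i / i! * (-1)^i * psi_0^{(i)}(x)."""
    return x**i / math.factorial(i) * signed_derivative(i, x, omega, beta, C, rho)
```

With $C = \rho = 1$ and $\omega_i = P_i(\beta)$ the polynomial coefficients vanish and $\gamma_i(x)$ reduces to the Poisson probability $P_i(x)$, which is the zero-cost path.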

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \left(1 - \frac{x}{\beta}\right)^{k}.
\tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[
(-1)^i \varphi^{(i)}(x) \ge 0 \qquad\text{for all } x \in [a,b] \text{ and } i > 0.
\tag{A.11}
\]


This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0,\beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\,\beta^{-i}\,\frac{k!}{(k-i)!}\left(1 - \frac{x}{\beta}\right)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
\tag{A.12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) \doteq (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\begin{align*}
\varphi(\beta) &= (-1)^I \psi_0^{(I)}(\beta) \\
&= \rho^I C e^{-\rho\beta} + \left(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\right)\beta^{-I} I! \\
&= \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I \ge 0,
\end{align*}

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1}\omega_I > 0 \qquad\text{for all } x \in [0,\beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\left(1 - \sum_{i=0}^{I} P_i(\rho\beta)\right) + \sum_{i=0}^{I}\omega_i = 1.
\]


A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$ the rate of decrease of each cumulative occupancy is

\begin{align*}
\theta_i(x) &= -\dot\psi_i(x) \\
&= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\tag{A.14}
\end{align*}

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\begin{align*}
\theta_{I+}(x) &= 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x), \\
1 - \psi_I(x) &= \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),
\end{align*}

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

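From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\ i \ge 0.
\tag{A.15}
\]

Then

\[
\frac{d}{dx}\left[-\log\frac{\theta_i}{\gamma_i}\right]
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta),
\tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8) we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).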

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

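Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[
(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \qquad\text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\begin{align*}
\sum_{i=0}^{\infty} \gamma_i(x) &= 1, & \sum_{i=0}^{\infty} i\,\gamma_i(x) &\le B_0, \\
\sum_{i=0}^{\infty} -\dot\psi_i(x) &= 1, & -\dot\psi_i(x) &= \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\end{align*}

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\gamma_0, \ldots, \gamma_I, \gamma_{I+}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \left[\gamma_i(\beta)\log\bigl|\psi_0^{(i)}(\beta)\bigr| - \gamma_i(0)\log\bigl|\psi_0^{(i)}(0)\bigr|\right].
\tag{A.17}
\]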

Proof. For all $i > I$ and $x \in [0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} (-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1 - \psi_I) = \rho$,

\begin{align*}
L(\psi,\dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
 + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{align*}


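Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0} \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]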

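Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\begin{align*}
&\int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log\bigl|\psi_0^{(i+1)}(x)\bigr|\,dx
 + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x)\log\bigl|\psi_0^{(i)}(x)\bigr|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta-\varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log\bigl|\psi_0^{(i)}(x)\bigr|\,dx.
\tag{A.18}
\end{align*}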

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[
-\dot\psi_i(x)\log\bigl|\psi_0^{(i)}(x)\bigr| = \rho\gamma_i(x)\bigl[\log C + i\log\rho - \rho x\bigr],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty} \Bigl[\gamma_i \log\bigl|\psi_0^{(i)}\bigr|\Bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

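A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\begin{align*}
\sum_{i=0}^{I} \gamma_i(x) &= 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1, \\
-\dot\psi_i(x) &= \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I - 1,
\end{align*}

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \left[\gamma_i(\beta)\log\bigl|\psi_0^{(i)}(\beta)\bigr| - \gamma_i(0)\log\bigl|\psi_0^{(i)}(0)\bigr|\right].
\tag{A.19}
\]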


Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_{+}^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi; y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\,\pi_i\right),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi^{*}_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^{*}_i$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi^{*}_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[
\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - z\,i\,\pi_i \ge \pi^{*}_i \log\frac{\pi^{*}_i}{P_i(\beta)} - y\pi^{*}_i - z\,i\,\pi^{*}_i.
\]


Following standard Lagrangian arguments, we thus have

\begin{align*}
\inf_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,P(\beta)) &= \inf_{\pi\in\mathcal{F}(1,\omega,\beta)} L(\pi; y, z) \\
&\ge \inf_{\pi\in\mathbb{R}_{+}^{\infty}} L(\pi; y, z) \\
&= D(\pi^{*}\,\|\,P(\beta)).
\end{align*}

Since $\pi^{*} \in \mathcal{F}(1,\omega,\beta)$, it follows that $\pi^{*}$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^{*}$ follows from the strict convexity of the relative entropy with respect to its first argument.

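Substituting the particular form of $\pi^{*}_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^{*}\,\|\,P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]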

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)}
 + \left(1 - \sum_{i=0}^{I}\omega_i\right)\bigl(\log C + (1 - \rho)\beta\bigr)
 + \left(\beta - \sum_{i=0}^{I} i\,\omega_i\right)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
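The explicit cost formula translates directly into code. The sketch below takes the twist parameters $C$ and $\rho$ as inputs (they are assumed to have been obtained from the twist equations beforehand), uses $P_i(\beta) = e^{-\beta}\beta^i/i!$ and the convention $0\log 0 = 0$; the function name is illustrative.

```python
import math

def exponential_extremal_cost(omega, beta, C, rho):
    """Cost J(gamma) of an exponential extremal with twist parameters (C, rho).

    omega: [omega_0, ..., omega_I] terminal occupancy fractions
    beta:  number of balls thrown per urn
    """
    cost = 0.0
    # First term: relative entropy contribution of the constrained levels.
    for i, w in enumerate(omega):
        if w > 0.0:                                   # convention 0 * log 0 = 0
            p = math.exp(-beta) * beta**i / math.factorial(i)
            cost += w * math.log(w / p)
    # Second term: tail mass times (log C + (1 - rho) * beta).
    cost += (1.0 - sum(omega)) * (math.log(C) + (1.0 - rho) * beta)
    # Third term: tail mean times log rho.
    cost += (beta - sum(i * w for i, w in enumerate(omega))) * math.log(rho)
    return cost
```

For the zero-cost terminal condition $\omega_i = P_i(\beta)$ with $C = \rho = 1$, every term vanishes and the function returns zero, as expected for the law of large numbers path.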

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{ki}\}_i$, where the function $\gamma_{ki} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

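The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi\in S_\infty^{|\mathcal{K}|}} \sum_{k\in\mathcal{K}} \alpha_k D\bigl(\pi^{(k)}\,\|\,P(\beta)\bigr)
\tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{kj} = \beta
\tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\,\pi_{k,i-k} \qquad\text{for all } 0 \le i \le I.
\tag{A.22}
\]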

Recall that $\mathcal{S}_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi_{(k)}\}_{k\in\mathcal{K}}$, where each $\pi_{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (24). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case the constraints imply that

\[ \pi_{kj} = 0, \quad k+j > I, \tag{A23} \]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A20) subject to (A23) and the endpoint constraints (A22), in which case the conservation constraint (A21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi_{(0)},\ldots,\pi_{(k)},\ldots,\pi_{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0\le k\le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi_{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ which minimizes (A20) subject to constraints (A21) and (A22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi\in\mathcal{S}_\infty^{|\mathcal{K}|}$ satisfying the constraints (A21) and (A22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega\in\mathcal{S}_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega\in\mathcal{S}_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i=0$) in (23) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi_{(0)}$ is applied to $\pi_{01}$, and mass from $\pi_{(1)}$ is applied to $\pi_{10}$ as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi_{(0)}$ and $\pi_{(1)}$ have sufficient mass to do so follows from the ($i=1$) condition in (23). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (24) implies that the final step uses up all of the probability mass in the distributions $\pi_{(k)}$.
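The ordered filling construction can be sketched in a few lines of Python. This is a minimal illustration, assuming polynomial constraints with $\alpha$ and $\omega$ given as lists of equal length $I+1$; the function name `ordered_filling` and the sample values are hypothetical and not taken from the paper.

```python
def ordered_filling(alpha, omega):
    """Greedy feasible point for polynomial constraints (illustrative sketch).

    alpha[k]: initial fraction of urns holding k balls, k = 0..I.
    omega[i]: terminal fraction of urns holding i balls, i = 0..I.
    Returns pi[k][j], the fraction of class-k urns receiving j extra balls.
    """
    I = len(omega) - 1
    remaining = list(alpha)                      # unassigned mass in each class
    mass = [[0.0] * (I - k + 1) for k in range(I + 1)]
    for i in range(I + 1):                       # fill terminal levels in order
        need = omega[i]
        for k in range(i + 1):                   # classes that can reach level i
            take = min(remaining[k], need)
            mass[k][i - k] = take
            remaining[k] -= take
            need -= take
        if need > 1e-12:                         # monotonicity condition fails
            raise ValueError("constraints infeasible at level %d" % i)
    # Convert joint mass alpha_k * pi_kj back to the distributions pi_(k).
    return [[m / alpha[k] for m in row] if alpha[k] > 0 else row
            for k, row in enumerate(mass)]

# Hypothetical example with I = 2.
alpha = [0.6, 0.4, 0.0]
omega = [0.2, 0.5, 0.3]
pi = ordered_filling(alpha, omega)
```

In this example the classes with positive $\alpha_k$ end up with distributions of total mass one, as the equality in the conservation condition requires.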

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $\mathcal{S}_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $\mathcal{S}_\infty$ this distance is simply

\[ \sum_{i:\, P_i > Q_i} (P_i - Q_i) = \sum_{i:\, Q_i > P_i} (Q_i - P_i), \]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

\[ \mathcal{H}_A = \Bigl\{ \pi\in\mathcal{S}_\infty^{|\mathcal{K}|} : \sum_k \alpha_k D(\pi_{(k)} \,\|\, \mathcal{P}(\beta)) \le A \Bigr\}. \]
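As a quick numerical illustration of the distance between two distributions displayed above, the following sketch evaluates both sums for a pair of finitely supported distributions. The helper name `distance` and the sample vectors are assumptions made for the example.

```python
def distance(p, q):
    """Sum of (p_i - q_i) over the indices where p_i exceeds q_i."""
    return sum(pi - qi for pi, qi in zip(p, q) if pi > qi)

# Two hypothetical distributions on {0, 1, 2}.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

d_pq = distance(p, q)   # sum over i with P_i > Q_i
d_qp = distance(q, p)   # sum over i with Q_i > P_i
```

The two sums agree because both distributions have total mass one, so the mass by which $P$ exceeds $Q$ must equal the mass by which $Q$ exceeds $P$.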

The level sets of the relative entropy are compact under the above topology ([11] Lemma 143), from which it follows that $\mathcal{H}_A$ is also compact. For later reference we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[ \mathcal{Q}_A(\alpha,\omega,\beta) \doteq \mathcal{F}(\alpha,\omega,\beta) \cap \mathcal{H}_A \]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $\mathcal{Q}_A$.

Since there are solutions with finite cost, it is automatic that $\mathcal{Q}_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $\mathcal{Q}_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)}\in\mathcal{Q}_A$ that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\beta,\omega)$. (That it lies in $\mathcal{H}_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi_{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $\mathcal{Q}_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[ \sum_{j\ge m} j\pi^{(n)}_{kj} \le \varepsilon \tag{A24} \]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

\[ xy \le (x\log x - x + 1) + (e^y - 1), \]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x\ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta}\,\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty. \]
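Both facts used here are easy to check numerically. The sketch below evaluates the gap in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ on a small grid and compares a truncated exponential-moment series with its closed form. The helper names, the grid, and the parameter values $\beta = 1.5$, $\delta = 2$ are illustrative assumptions.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability of j, computed in log space for stability."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with the convention 0 log 0 = 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

# The gap should be nonnegative for all x >= 0 and all real y.
gaps = [young_gap(x, y)
        for x in (0.0, 0.1, 1.0, 3.0, 10.0)
        for y in (-2.0, 0.0, 0.5, 2.0)]

# Exponential moment of the Poisson distribution (series truncated at 200 terms).
beta, delta = 1.5, 2.0
series = sum(math.exp(j / delta) * poisson_pmf(j, beta) for j in range(200))
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
```

Equality in the inequality occurs at $x = e^y$, which is why the grid point $(x,y) = (1,0)$ gives a gap of zero.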

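Taking $y = j/\delta$ and $x = \pi^{(n)}_{kj}/\mathcal{P}_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j\ge m} j\pi^{(n)}_{kj}
&= \sum_{j\ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
&\le \sum_{j\ge 0} \delta\left[\pi^{(n)}_{kj}\log\left(\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\right) - \pi^{(n)}_{kj} + \mathcal{P}_j(\beta)\right] + \delta\sum_{j\ge m}\bigl(e^{j/\delta} - 1\bigr)\mathcal{P}_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)) + \delta\sum_{j\ge m}\bigl(e^{j/\delta} - 1\bigr)\mathcal{P}_j(\beta) \\
&\le \delta A + \delta\sum_{j\ge m}\bigl(e^{j/\delta} - 1\bigr)\mathcal{P}_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta\in(0,\infty)$.]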

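We now apply the uniform integrability to analyze the limits of the means. Writing $\beta^{(n)}_k$ and $\beta_k$ for the means of $\pi^{(n)}_{(k)}$ and $\pi_{(k)}$, it follows from [5] Theorem 54 that

\[ \beta^{(n)}_k \to \beta_k. \]

Finally,

\[ \sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta, \]

so that $\pi\in\mathcal{F}(\beta,\alpha,\omega)$ and $\mathcal{Q}_A$ is compact.

We have shown that $\mathcal{Q}_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[ G = \inf_{\pi\in\mathcal{Q}_A} \sum_{k\in\mathcal{K}} \alpha_k D(\pi_{(k)} \,\|\, \mathcal{P}(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)}\in\mathcal{Q}_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)) \le G + 1/n$. Since $\mathcal{Q}_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^*\in\mathcal{Q}_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semicontinuous in $\pi$ ([11] Lemma 143(b)), it follows that

\[ \sum_{k\in\mathcal{K}} \alpha_k D(\pi^*_{(k)} \,\|\, \mathcal{P}(\beta)) \le \liminf_n \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)) = G, \]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.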

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[ (\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta) \]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[ \liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta). \]

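Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[ \mathcal{J}^{(n)} \le G + 1/n < A, \]

and for each problem in this subsequence we define $\pi^{(n)}\in\mathcal{Q}_A(\alpha^{(n)},\beta^{(n)},\omega^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness there is a further subsequence such that $\pi^{(n)}\to\pi\in\mathcal{H}_A$. As in the proof of Lemma A6, if $\beta < \infty$ then

\[ \beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{kj} \;\to\; \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{kj}, \quad k = 0,1,\ldots,K. \]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.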
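The closed form $\beta - \lambda + \lambda\log(\lambda/\beta)$ quoted in this argument is the relative entropy of a Poisson($\lambda$) distribution with respect to Poisson($\beta$). The sketch below checks that numerically, and compares it with one other distribution of the same mean. The parameter values and the truncation at 100 terms are assumptions chosen for illustration.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability of j, computed in log space for stability."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def relative_entropy(p, q):
    """D(p || q) for finitely supported p, with the convention 0 log 0 = 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

lam, beta, N = 2.0, 0.7, 100
reference = [poisson_pmf(j, beta) for j in range(N)]

# Cost of the Poisson(lam) distribution against Poisson(beta).
poisson_lam = [poisson_pmf(j, lam) for j in range(N)]
kl_poisson = relative_entropy(poisson_lam, reference)
formula = beta - lam + lam * math.log(lam / beta)

# A competing distribution with the same mean: a point mass at j = 2.
point_mass = [1.0 if j == 2 else 0.0 for j in range(N)]
point_mass_cost = relative_entropy(point_mass, reference)
```

The point mass has the same mean as Poisson($\lambda$) but a strictly larger cost, consistent with the formula being the infimum over distributions of mean $\lambda$.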

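Thus

\[ \sum_{k\in\mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta, \]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11] Lemma 143(b)), we have

\[
\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)}))
&\ge \liminf_n \sum_{k:\,\alpha_k>0} \alpha^{(n)}_k D(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k>0} \alpha_k D(\pi_{(k)} \,\|\, \mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.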

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i\in\mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k+j\in\mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^*\in\mathcal{S}_\infty^{|\mathcal{K}|}$ be the minimizer of (A20) subject to constraints (A21) and (A22). Then $\pi^*_{kj} > 0$ for all $k+j\in\mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k+j\notin\mathcal{I}(\omega)$.

Proof. For $k+j\notin\mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi\in\mathcal{S}_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m+n\in\mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $\mathcal{S}_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{mn} > 0$, that $\tilde\pi_{kj} = 0$ for all $k+j\notin\mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[ \tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\tilde\pi_{kj}, \]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta > 0$ we may define

\[ \hat\beta \doteq (\beta - \eta\tilde\beta)/(1-\eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1-\eta). \]

By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (23) and (24) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (23) and the conservation constraint (24) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1-\eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

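We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon)
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left(\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} + 1\right)(\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{\mathcal{P}_n(\beta)}
 + \sum_{k\in\mathcal{K}} \alpha_k \sum_{(k,j)\ne(m,n)} \bar\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}
 - \sum_{k\in\mathcal{K}} \alpha_k \sum_{j} \pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon\to 0$, the third term in the last display tends to

\[ -\sum_k \alpha_k D(\pi_{(k)} \,\|\, \mathcal{P}(\beta)) \le 0. \]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj}\log\mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m+n\in\mathcal{I}(\omega)$, $\pi$ cannot minimize (A20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m+n\in\mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m+n\in\mathcal{I}(\omega)$.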

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k\in\mathcal{K},\; j+k\in\mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A20) may be considered to be a finite-dimensional problem over this set. By Lemma A8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4] Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
 + \sum_{k\in\mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{k+l\in\mathcal{I}} \pi_{kl}\Bigr)
 + \sum_{i\in\mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[ \pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
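For small polynomial problems, a minimizer of this product form can be approximated numerically. The sketch below uses iterative proportional fitting, a standard scaling technique that is not the argument used in the proof: starting from $\alpha_k\mathcal{P}_j(\beta)$, it alternately rescales rows (so each $\pi_{(k)}$ sums to one) and anti-diagonals (so the terminal conditions hold). It assumes strictly positive $\omega_i$; the function name and example values are hypothetical.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability of j, computed in log space for stability."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def fit_polynomial_case(alpha, omega, beta, sweeps=500):
    """Iterative proportional fitting on the joint mass m[k, j] = alpha_k * pi_kj.

    Multiplicative row and anti-diagonal updates preserve the product form
    (row factor) * P_j(beta) * (anti-diagonal factor) at every step.
    """
    I = len(omega) - 1
    classes = [k for k, a in enumerate(alpha) if a > 0]
    m = {(k, j): alpha[k] * poisson_pmf(j, beta)
         for k in classes for j in range(I - k + 1)}
    for _ in range(sweeps):
        for k in classes:                        # row step: sum_j m[k, j] = alpha_k
            row = sum(v for (kk, _), v in m.items() if kk == k)
            for key in m:
                if key[0] == k:
                    m[key] *= alpha[k] / row
        for i in range(I + 1):                   # diagonal step: sum = omega_i
            diag = sum(v for (k, j), v in m.items() if k + j == i)
            for key in m:
                if key[0] + key[1] == i:
                    m[key] *= omega[i] / diag
    return {(k, j): v / alpha[k] for (k, j), v in m.items()}

# Hypothetical example with I = 2; beta = sum_i i*omega_i - sum_k k*alpha_k = 0.7.
alpha, omega, beta = [0.6, 0.4], [0.2, 0.5, 0.3], 0.7
pi_star = fit_polynomial_case(alpha, omega, beta)
```

Because every update multiplies a whole row or a whole anti-diagonal by a constant, the ratios $\pi_{kj}/\mathcal{P}_j(\beta)$ keep the factored structure $D_k W_{k+j}$ throughout, which is what makes this scheme a natural fit for the form in Theorem A9.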

We now give the corresponding theorem for the exponential case.

Theorem A10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k\in\mathcal{K},\; k+j\le I,\; k+j\in\mathcal{I}, \\
C_k\,\mathcal{P}_j(\rho\beta), & k\in\mathcal{K},\; k+j > I, \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[ \beta^{(M)} \doteq \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l\le M} \pi^*_{kl}, \]

and consider the problem of minimizing

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\; k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} \]

subject to the constraints

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\; k+l\in\mathcal{I}} l\pi_{kl} = \beta^{(M)}, \]

\[ \sum_{l\le M,\; k+l\in\mathcal{I}} \pi_{kl} = \eta^{(M)}_k, \quad k\in\mathcal{K}, \]

\[ \sum_{k\le i,\; k\in\mathcal{K}} \alpha_k\pi_{k,i-k} = \omega_i, \quad i\in\mathcal{I}. \]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4] Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} ={}& \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\; k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
 + y^{(M)}\Bigl(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\; k+l\in\mathcal{I}} l\pi_{kl}\Bigr) \\
&+ \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k - \sum_{l\le M,\; k+l\in\mathcal{I}} \pi_{kl}\Bigr)
 + \sum_{i\in\mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k\le i,\; k\in\mathcal{K}} \alpha_k\pi_{k,i-k}\Bigr).
\end{aligned}
\]

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[ \pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{jy^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}} \]

for all $j < M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\,e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$, and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$, and $W_i = e^{w_i}$.
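The last substitution rests on the identity $\mathcal{P}_j(\beta)e^{jy} = e^{(\rho-1)\beta}\mathcal{P}_j(\rho\beta)$ with $\rho = e^y$, which converts the multiplier form into the twisted Poisson form of the theorem. A short numerical check, with arbitrary illustrative values of $\beta$ and $y$:

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability of j, computed in log space for stability."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

beta, y = 0.7, 0.4          # illustrative values, not taken from the paper
rho = math.exp(y)

# Left side: multiplier form.  Right side: twisted Poisson form.
lhs = [poisson_pmf(j, beta) * math.exp(j * y) for j in range(12)]
rhs = [math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta) for j in range(12)]

max_rel_err = max(abs(a - b) / b for a, b in zip(lhs, rhs))
```

The factor $e^{(\rho-1)\beta}$ is what gets absorbed into $C_k$ alongside $e^{-1+z_k}$.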

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*_{(k)}$ by $\beta_k \doteq \sum_j j\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi_{(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$, and has a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0},\ldots,\omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing distribution arising in Theorem A1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \quad i = 0,\ldots,I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) \]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A1 [see (A13)]. Also, for convenience define $\bar\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k\bar\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 21.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\bar\gamma_{k0}(0) = 1$ and $\bar\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A22).

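In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve after rescaling is

\[
\begin{aligned}
\bar\psi_{k0}(x)
&= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k\mathcal{P}_i(\rho_k\beta_k)\bigr)\left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k\mathcal{P}_i(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)\mathcal{P}_i(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{i}\right].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i\le I-k$. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi^{(k)}_0(x)
&= \alpha_0 C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)\mathcal{P}_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0 C_0\rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)\mathcal{P}_j(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0 C_0\rho^k}{C_k}\,\bar\psi_{k0}(x).
\end{aligned}
\tag{A25}
\]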
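The passage from the first to the second line of (A25) uses the Poisson identity $\mathcal{P}_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k\mathcal{P}_{i-k}(\rho\beta)$ followed by the index shift $j = i-k$. A short numerical check of that identity, with illustrative values of $\rho$, $\beta$ and $I$:

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability of j, computed in log space for stability."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

rho, beta, I = 1.3, 0.9, 6   # illustrative values, not taken from the paper

max_rel_err = 0.0
for k in range(I + 1):
    for i in range(k, I + 1):
        # Coefficient produced by differentiating (1 - x/beta)^i k times.
        lhs = (poisson_pmf(i, rho * beta) * math.factorial(i)
               / (math.factorial(i - k) * beta ** k))
        # Same coefficient after the index shift j = i - k.
        rhs = rho ** k * poisson_pmf(i - k, rho * beta)
        max_rel_err = max(max_rel_err, abs(lhs - rhs) / rhs)
```

The common factor $\rho^k$ is what allows it to be pulled outside the bracket in the second line of the display.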

Using Theorem A1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x)
&= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi^{(i-k)}_{k0}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{i-k}}{dx^{i-k}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi^{(i)}_0(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{i-k+1}}{dx^{i-k+1}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi^{(i+1)}_0(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}, \tag{A26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A3) that

\[ \frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A27} \]

As shown in the proof of Theorem A1, the relations (A26) and (A27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

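In the polynomial case, similar computations show that

\[ (-1)^k\psi^{(k)}_0(x) = \frac{\alpha_0 D_0}{D_k}\,\bar\psi_{k0}(x) \tag{A28} \]

and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi^{(i+1)}_0(x) = \gamma_i(x)\,\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)} \tag{A29} \]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A10).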

In order to demonstrate a strong minimum we will need the following.

Corollary A12. For irreducible constraints, the occupancy functions defined in Theorem A11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$ there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i \]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x))$, which is finite at both endpoints because $(-1)^i\psi^{(i)}_0(x) > 0$ for $x\in[0,\beta)$, $i\le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

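Theorem A13. Suppose that $\gamma$ is the extremal defined by Theorem A11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A20) subject to constraints (A21) and (A22).

Proof. Consider first the exponential case. The infinite sequence of functions $\{\gamma_i\}$ defined in the proof of Theorem A11 are shown in the proof to satisfy the conditions of Lemma A3. Hence the cost is

\[
\begin{aligned}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(\beta)\log\bigl|\psi^{(i)}_0(\beta)\bigr|\right] - \sum_{k=0}^{K} \alpha_k\log\bigl|\psi^{(k)}_0(0)\bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(\beta)\log\frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0)\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\bar\psi_{k0}(0) = 1$, together with (A25), implies that $\psi^{(k)}_0(x) = \psi^{(k)}_0(0)\times\bar\psi_{k0}(x)$, so that

\[ \frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0)\,e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j}\psi^{(j)}_{k0}\!\left(x\beta_k/\beta\right)e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\bar\gamma_{k\cdot}(\beta) \,\|\, \mathcal{P}(\beta)). \]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A4 and (A28).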

A8 Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 21, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma\in\mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A15.
(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.
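The joint convexity of $(\theta,\gamma)\to D(\theta \,\|\, \gamma)$ that underlies Lemma A15 can be spot-checked numerically. The sketch below draws random pairs of distributions on four points and records the largest value of $D$ at a mixture minus the mixture of the $D$ values; convexity predicts that this gap is never positive. The sampling scheme, support size, and seed are arbitrary choices made for illustration.

```python
import math
import random

def rel_entropy(theta, gamma):
    """D(theta || gamma) for finitely supported distributions."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def random_dist(n, rng):
    """A random strictly positive probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
worst_gap = float("-inf")
for _ in range(1000):
    t1, g1, t2, g2 = (random_dist(4, rng) for _ in range(4))
    eps = rng.random()
    mix_t = [(1 - eps) * a + eps * b for a, b in zip(t1, t2)]
    mix_g = [(1 - eps) * a + eps * b for a, b in zip(g1, g2)]
    gap = rel_entropy(mix_t, mix_g) - (
        (1 - eps) * rel_entropy(t1, g1) + eps * rel_entropy(t2, g2))
    worst_gap = max(worst_gap, gap)
```

A random check of this kind is only a sanity test, not a proof; the convexity itself follows from the log-sum inequality.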

For a given pair of occupancy functions $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon \doteq (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[ G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A15 ensures that $G$ is a well-defined convex function for any $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma)\le J(\tilde\gamma). \]

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A3 we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

\[ \eta(x) \doteq \tilde\psi(x) - \psi(x), \]

we may write

\[ G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx. \tag{A30} \]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23] Corollary 392 if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since for example

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(x) \ge \alpha_i\bar\gamma_{i0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon\ge(1-\varepsilon)\gamma$ and $\theta^\varepsilon\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

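The partial derivative of the integrand of (A30) is

\[ \frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^\varepsilon}\!(x)\,\eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^\varepsilon}\!(x)\,\dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\[ \frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \]

\[ \frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}. \]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon\le 1$, $\theta^\varepsilon\le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right| < B \quad \text{for } \varepsilon\in[0,\varepsilon_0],\ \text{a.e. } x\in[0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.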

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx \]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[ \int_0^x \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma} dx' = \int_0^x \left.\left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma} dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23] 361, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0]
&= \int_0^\beta \left\{\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right\}dx \\
&= \int_0^\beta \dot\eta(x)\cdot\left(-\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right)dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A1 and in Theorem A11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1\downarrow 0$ and $x^{(n)}_2\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x^{(n)}_1)$ and terminal point $\tilde\gamma(x^{(n)}_2)$. By Lemma A16,

\[ J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx. \]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}\bigl(\tilde\gamma(x^{(n)}_1),\ \tilde\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1\bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A7, and the last equality from the monotone convergence theorem.

Acknowledgments We would like to thank an associate editor and tworeferees for suggestions that significantly improved this paper

REFERENCES

[1] Avram F Dai J G and Hasenbein J J (2001) Explicit solutions for variationalproblems in the quadrant Queueing Systems 37 259ndash289 MR1833666

56 P DUPUIS C NUZMAN AND P WHITING

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.


[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

Figure 2 depicts several cumulative occupancy curves $\psi_i$ for a particular example. Suppose that there are $n = 100$ types of coupons to collect, and the goal is to collect at least four coupons of as many types as possible (i.e., $I = 3$). Initially, one is given a single coupon of 30 types and pairs of coupons of a further 20 types, corresponding to the initial conditions $\alpha = [0.5, 0.3, 0.2]$. In the zero-cost solution (dashed lines), $\psi_3(2.0) \approx 0.71$. Hence, after collecting 200 additional coupons at random ($\beta = 2$), one would expect to have 4 or more coupons for only about 29 types. To compute the likelihood that we have at least 4 coupons for more than 45 types, we take $\xi = 0.55$, which gives a large deviations exponent $J_C \approx 0.18$ and a probability of about $10^{-8}$. The corresponding constrained occupancy curves are depicted by solid lines in Figure 2.
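The zero-cost figures quoted in this example can be reproduced with a short numerical sketch. It assumes (consistently with the Poisson zero-cost distribution used throughout the paper, but not spelled out at this point) that an urn starting with $k$ balls receives a Poisson($x$) number of additional balls after $x$ balls per urn have been thrown. The function names are illustrative only.

```python
import math

def poisson_pmf(j, lam):
    """Probability that a Poisson(lam) variable equals j."""
    return math.exp(-lam) * lam**j / math.factorial(j)

def zero_cost_psi(alpha, i, x):
    """Zero-cost fraction of urns holding at most i balls after x balls per urn.

    alpha[k] is the initial fraction of urns holding k balls. Assumption: each
    urn then receives a Poisson(x) number of additional balls.
    """
    total = 0.0
    for k, a_k in enumerate(alpha):
        if k > i:
            continue  # these urns already hold more than i balls
        # an urn starting with k balls stays at occupancy <= i
        # iff it receives at most i - k additional balls
        total += a_k * sum(poisson_pmf(j, x) for j in range(i - k + 1))
    return total

# Example from the text: alpha = [0.5, 0.3, 0.2], I = 3, beta = 2.
psi3 = zero_cost_psi([0.5, 0.3, 0.2], i=3, x=2.0)

# Large deviation estimate exp(-n * J_C) with n = 100 and J_C = 0.18.
probability = math.exp(-100 * 0.18)

print(round(psi3, 2), probability)
```

The first value agrees with $\psi_3(2.0) \approx 0.71$, and the second is of order $10^{-8}$, as stated in the text.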

4.4. Extensions. There are a number of variations of the basic occupancy problem which can be solved by fairly straightforward generalizations of the results of this paper, but which we will only mention briefly. These include the following problems:

(i) a random number of balls are thrown;
(ii) balls have a probability $p$ of not entering any urn;
(iii) balls enter different subsets of urns with differing probabilities;
(iv) an event of interest may occur at any time in the interval $(0, \beta]$, rather than just at time $\beta$.

Some comments are in order. A particular example of (i) appeared in [13], where the number of balls thrown, $r$, was binomially distributed with parameters $0 < a < 1$ and $n$. An urn model proposed by [16] is of type (ii). Here the goal is to determine the distribution of the number of targets hit when $r$ shots are fired at $n$ targets and when the probability of missing the target is $p$. In problem (iii) there are $K > 1$ urn classes, with a fraction $\alpha_k$, $k = 1, 2, \dots, K$, of urns in each class. Balls enter class $k$ with a fixed probability $p_k$, but then enter any urn within that class with uniform probability. Similar analysis to that for nonempty initial conditions can be applied to this problem. For an example of type (iv), suppose that an infinite sequence of balls is thrown into $n$ initially empty urns, and that one would like to know the probability that the number of urns containing exactly one ball ever exceeds $n/2$. The probability of this occurring after $\beta n$ balls have been thrown can be bounded above by the probability that the number of empty and singly occupied urns exceeds $n/2$ when exactly $\beta n$ balls have been thrown. This computation fits into the framework of a partial coupon collector's problem, and the probability can be made negligible, from the point of view of large deviations, by taking $\beta$ sufficiently large. The remaining possibility, that the event occurs before $\beta n$ balls have been thrown, is then a problem of type (iv). The associated calculus of variations problem is to find the lowest cost occupancy curve on $(0, \beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0, \beta]$.
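As a rough illustration of what "sufficiently large" means in the type (iv) example, the sketch below finds the value of $\beta$ beyond which the zero-cost fraction of empty and singly occupied urns, $e^{-\beta}(1+\beta)$, drops below $1/2$, so that exceeding $n/2$ becomes a rare event. This is only the threshold for the event to be a large deviation; it assumes empty initial conditions and the Poisson zero-cost distribution, and the function names are hypothetical.

```python
import math

def zero_cost_low_occupancy(beta):
    """Zero-cost fraction of urns holding 0 or 1 balls after beta balls per urn."""
    return math.exp(-beta) * (1.0 + beta)

def threshold_beta(level=0.5, lo=0.0, hi=50.0, iters=100):
    """Bisection for the beta at which the zero-cost fraction equals `level`.

    The fraction is decreasing in beta (its derivative is -beta * exp(-beta)),
    so the root is unique on [lo, hi] whenever it is bracketed.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if zero_cost_low_occupancy(mid) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta_star = threshold_beta()
print(beta_star)
```

For $\beta$ above this threshold the bound on the probability decays exponentially in $n$; how fast depends on the rate function and is not computed here.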

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3, and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha, \omega, \beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha, \omega, \beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14, using the Euler–Lagrange equations together with properties of the relative entropy.

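A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[
\sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \dots, I \quad \text{(monotonicity)} \tag{A.4}
\]

and

\[
\sum_{i=0}^{I} i\omega_i + (I+1)\omega_{I+} \le \sum_{i=0}^{I+1} i\alpha_i + \beta \quad \text{(conservation)}. \tag{A.5}
\]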

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I} \Biggl( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \Biggr)
&= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[
\gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \dots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta \sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).
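The feasibility conditions (A.4) and (A.5) and the linear path used in the proof are easy to check numerically. The following is a minimal sketch, assuming that `alpha` lists $\alpha_0, \dots, \alpha_{I+1}$ and `omega` lists $\omega_0, \dots, \omega_I$, with $\omega_{I+} = 1 - \sum_i \omega_i$; the function names and the tolerance are illustrative choices, not part of the paper.

```python
from itertools import accumulate

def is_feasible(alpha, omega, beta, tol=1e-12):
    """Check the monotonicity (A.4) and conservation (A.5) conditions.

    alpha: [alpha_0, ..., alpha_{I+1}], omega: [omega_0, ..., omega_I].
    """
    I = len(omega) - 1
    if len(alpha) != I + 2:
        raise ValueError("alpha must have exactly I + 2 entries")
    omega_plus = 1.0 - sum(omega)
    # (A.4): partial sums of alpha dominate partial sums of omega for i = 0..I
    monotone = all(a + tol >= w
                   for a, w in zip(accumulate(alpha[:I + 1]), accumulate(omega)))
    # (A.5): balls present at the end <= balls present initially + balls thrown
    lhs = sum(i * w for i, w in enumerate(omega)) + (I + 1) * omega_plus
    rhs = sum(i * a for i, a in enumerate(alpha)) + beta
    return monotone and lhs <= rhs + tol

def linear_path(alpha, omega, beta, x):
    """Linear interpolation gamma_i(x) for i = 0..I, followed by gamma_{I+}(x)."""
    g = [a + (w - a) * x / beta for a, w in zip(alpha, omega)]
    return g + [1.0 - sum(g)]

print(is_feasible([1.0, 0.0], [0.5], beta=1.0))   # enough balls thrown
print(is_feasible([1.0, 0.0], [0.5], beta=0.4))   # too few balls thrown
print(linear_path([1.0, 0.0], [0.5], beta=1.0, x=1.0))
```

With all urns initially empty and $I = 0$, asking for half of the urns to be nonempty requires at least half a ball per urn, which is what the two feasibility calls illustrate.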

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \dots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[
J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx.
\]

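Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi, \theta) &= D(\theta\|\gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}
 + \Biggl(1 - \sum_{i=0}^{I}\theta_i\Biggr) \log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}.
\end{aligned}
\]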


The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
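The local rate function is a relative entropy between the actual entry rates $\theta$ and the expected rates $\gamma$, which the following sketch evaluates directly. It assumes `psi` lists $\psi_0, \dots, \psi_I$ and `theta` lists $\theta_0, \dots, \theta_I$, uses the convention $0 \log 0 = 0$, and returns infinity whenever a positive rate is assigned to an empty class or a rate is negative. The function name is hypothetical.

```python
import math

def local_rate(psi, theta):
    """L(psi, theta) = D(theta || gamma) with gamma_i = psi_i - psi_{i-1}.

    psi:   cumulative occupancies [psi_0, ..., psi_I]
    theta: entry rates [theta_0, ..., theta_I]; theta_{I+} = 1 - sum(theta).
    """
    I = len(psi) - 1
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, I + 1)]
    gamma.append(1.0 - psi[I])                 # gamma_{I+}
    rates = list(theta) + [1.0 - sum(theta)]   # append theta_{I+}
    cost = 0.0
    for t, g in zip(rates, gamma):
        if t < 0:
            return math.inf      # not a valid rate vector
        if t == 0:
            continue             # convention: 0 log 0 = 0
        if g <= 0:
            return math.inf      # positive rate into an empty class
        cost += t * math.log(t / g)
    return cost

# Rates equal to the expected rates cost nothing ...
print(local_rate([0.5, 0.8], [0.5, 0.3]))
# ... while sending every ball to empty urns (half of all urns) costs log 2.
print(local_rate([0.5, 0.8], [1.0, 0.0]))
```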

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]:

\[
\frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx}\,\frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x))
\]

for all $i \in \{0, \dots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections, we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

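In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\Biggl[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Biggr]
\tag{A.7}
\]

for $i = 0, \dots, I - 1$, and by

\[
-\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I}
= \frac{d}{dx}\Biggl[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \Biggr].
\tag{A.8}
\]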

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[
L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}.
\tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i}
= \frac{d}{dx}\Biggl[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \Biggr]
\tag{A.10}
\]

for $i = 0, \dots, I - 1$.


A.4. Characterization of the extremals under empty initial conditions.

In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1,
\qquad
\sum_{i=0}^{I} i\omega_i + \sum_{i=I+1}^{\infty} i\,C P_i(\rho\beta) = \beta.
\]
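The two equations above can be solved numerically for $(\rho, C)$. The sketch below is one possible approach and is not taken from the paper: dividing the tail of the second equation by the tail of the first gives the conditional mean of a Poisson($\rho\beta$) variable given that it exceeds $I$, which is increasing in $\rho\beta$, so bisection applies. It assumes the exponential case ($C > 0$), takes $P_i(\lambda) = e^{-\lambda}\lambda^i/i!$, and uses hypothetical function names.

```python
import math

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam**i / math.factorial(i)

def tail_mass(I, lam):
    """P(N > I) for N ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(I + 1))

def tail_mean(I, lam):
    """E[N; N > I] for N ~ Poisson(lam), which equals lam * P(N >= I)."""
    return lam * (1.0 - sum(poisson_pmf(i, lam) for i in range(I)))

def twist_parameters(omega, beta, iters=200):
    """Solve the two twist equations for (rho, C) in the exponential case.

    omega: [omega_0, ..., omega_I]. Raises ValueError in the polynomial case,
    where the required conditional mean equals I + 1 and C = 0.
    """
    I = len(omega) - 1
    m0 = 1.0 - sum(omega)                                  # mass left for i > I
    m1 = beta - sum(i * w for i, w in enumerate(omega))    # mean left for i > I
    if m0 <= 0.0 or m1 <= (I + 1) * m0:
        raise ValueError("constraints are not in the exponential case")
    target = m1 / m0            # required value of E[N | N > I]
    lo, hi = 0.0, 1.0
    while tail_mean(I, hi) / tail_mass(I, hi) < target:
        hi *= 2.0               # expand until the root is bracketed
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail_mean(I, mid) / tail_mass(I, mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)       # lam = rho * beta
    return lam / beta, m0 / tail_mass(I, lam)

# Sanity check: Poisson(beta) terminal constraints give the untwisted solution.
omega_poisson = [poisson_pmf(0, 2.0), poisson_pmf(1, 2.0)]
rho, C = twist_parameters(omega_poisson, beta=2.0)
print(rho, C)
```

When the terminal constraints coincide with the Poisson($\beta$) probabilities, the solver returns values numerically indistinguishable from $\rho = 1$ and $C = 1$, corresponding to the zero-cost path.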

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints, and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{k},
\tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I,
\tag{A.9}
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.
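The extremal in Theorem A.1 can be evaluated numerically from the closed form of $(-1)^i\psi_0^{(i)}$, which is obtained by differentiating the expression for $\psi_0$ term by term. The sketch below takes the twist parameters as given inputs (it does not solve for them) and checks the terminal constraint $\gamma_i(\beta) = \omega_i$; the function names are illustrative.

```python
import math

def poisson_pmf(i, lam):
    return math.exp(-lam) * lam**i / math.factorial(i)

def signed_derivative(i, x, omega, beta, rho, C):
    """(-1)^i psi_0^{(i)}(x) for psi_0 = C e^{-rho x} + polynomial in (1 - x/beta)."""
    I = len(omega) - 1
    value = rho**i * C * math.exp(-rho * x)
    for k in range(i, I + 1):
        coeff = omega[k] - C * poisson_pmf(k, rho * beta)
        falling = math.factorial(k) / math.factorial(k - i)   # k! / (k - i)!
        value += coeff * beta**(-i) * falling * (1.0 - x / beta)**(k - i)
    return value

def gamma(i, x, omega, beta, rho, C):
    """gamma_i(x) = (x^i / i!) (-1)^i psi_0^{(i)}(x)."""
    return x**i / math.factorial(i) * signed_derivative(i, x, omega, beta, rho, C)

# Terminal constraint: gamma_i(beta) = omega_i. This part holds for any
# (rho, C); only psi_0(0) = 1 requires the twist equations to be satisfied.
omega = [0.2, 0.3]
terminal = [gamma(i, 2.0, omega, beta=2.0, rho=1.3, C=0.7) for i in range(2)]
print(terminal)
```

In the untwisted case $\rho = C = 1$ with Poisson terminal constraints, the polynomial coefficients vanish and $\gamma_i(x)$ reduces to the Poisson probability $P_i(x)$.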

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^{k}.
\tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$, with

\[
(-1)^i \varphi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a, b] \text{ and } i > 0.
\tag{A.11}
\]


This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \dots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \dots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\,\beta^{-i}\,\frac{k!}{(k-i)!}\Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0, \dots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
\tag{A.12}
\]

for $i > I$. It is clear that $a(x) = (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \dots$. We deduce that $\varphi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[
\begin{aligned}
\varphi(\beta) &= (-1)^I \psi_0^{(I)}(\beta) \\
&= \rho^I C e^{-\rho\beta} + \Bigl(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I} I! \\
&= \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I \ge 0,
\end{aligned}
\]

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$, and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I > 0 \quad \text{for all } x \in [0, \beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Biggl(1 - \sum_{i=0}^{I} P_i(\rho\beta)\Biggr) + \sum_{i=0}^{I} \omega_i = 1.
\]


A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \dots, I, I+1, \dots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

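From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0.
\tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr]
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta),
\tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \dots, I$, so that (A.16) implies (A.10).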

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$, with

\[
(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \dots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \dots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr].
\tag{A.17}
\]

Proof. For all $i > I$ and $x \in [0, \beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} (-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0, \beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
 + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]


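Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
 + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty} \bigl[\gamma_i \log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.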

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \dots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \dots, I - 1,
\]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr].
\tag{A.19}
\]


Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\|P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi\|P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \dots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\Biggl(1 - \sum_{i=0}^{\infty} \pi_i\Biggr) + z\Biggl(\beta - \sum_{i=0}^{\infty} i\pi_i\Biggr),
\]

where $z = \log\rho$ and $y = \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i = \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x \log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \to x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[
\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i \log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]
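The claim that $\pi^*_i$ minimizes the scalar function above can be checked by setting its derivative to zero. The following short derivation uses only the definitions $z = \log\rho$, $y = \log C + (1-\rho)\beta + 1$ and the assumption that $P_i(\lambda) = e^{-\lambda}\lambda^i/i!$ denotes the Poisson probability.

```latex
\begin{align*}
\frac{d}{dx}\Bigl(x\log x - x[\log P_i(\beta) + y + iz]\Bigr)
  &= \log x + 1 - \log P_i(\beta) - y - iz = 0 \\
\Longrightarrow\quad
x &= P_i(\beta)\,e^{\,y - 1 + iz}
   = \frac{e^{-\beta}\beta^i}{i!}\;C e^{(1-\rho)\beta}\,\rho^i
   = C\,\frac{e^{-\rho\beta}(\rho\beta)^i}{i!}
   = C P_i(\rho\beta).
\end{align*}
```

Since $x \log x$ is strictly convex, this stationary point is the unique global minimizer, and it coincides with $\gamma_i(\beta) = C P_i(\rho\beta)$ for $i > I$.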


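Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in F(1, \omega, \beta)} D(\pi\|P(\beta)) &= \inf_{\pi \in F(1, \omega, \beta)} L(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
&= D(\pi^*\|P(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\|P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]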

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)}
 + \Biggl(1 - \sum_{i=0}^{I} \omega_i\Biggr)\bigl(\log C + (1 - \rho)\beta\bigr)
 + \Biggl(\beta - \sum_{i=0}^{I} i\omega_i\Biggr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
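The explicit cost formula can be compared numerically with a direct evaluation of $D(\pi^*\|P(\beta))$. The sketch below picks $(\rho, C)$ first and then constructs terminal constraints $\omega$ (with $I = 1$) that satisfy the twist equations, so that both expressions should agree up to truncation of the infinite sum. The parameter values, the truncation level and the function names are illustrative assumptions.

```python
import math

def log_poisson_pmf(i, lam):
    """log of the Poisson(lam) probability of i, computed in log space."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def explicit_cost(omega, beta, rho, C):
    """Closed-form cost of an exponential extremal in terms of omega, rho, C."""
    head = sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return head + tail_mass * (math.log(C) + (1.0 - rho) * beta) + tail_mean * math.log(rho)

def relative_entropy_cost(omega, beta, rho, C, n_terms=80):
    """D(pi* || P(beta)) with pi*_i = omega_i (i <= I) and C P_i(rho beta) (i > I)."""
    I = len(omega) - 1
    total = sum(w * (math.log(w) - log_poisson_pmf(i, beta))
                for i, w in enumerate(omega) if w > 0)
    for i in range(I + 1, n_terms):            # truncated tail of the series
        log_pi = math.log(C) + log_poisson_pmf(i, rho * beta)
        total += math.exp(log_pi) * (log_pi - log_poisson_pmf(i, beta))
    return total

# Build omega = (omega_0, omega_1) consistent with chosen twist parameters.
rho, C, beta = 1.5, 0.6, 1.0
lam = rho * beta
p0, p1 = math.exp(-lam), lam * math.exp(-lam)
tail_mass = C * (1.0 - p0 - p1)      # sum over i > 1 of C P_i(lam)
tail_mean = C * (lam - p1)           # sum over i > 1 of i C P_i(lam)
omega1 = beta - tail_mean            # mean constraint (only i = 1 contributes)
omega0 = 1.0 - tail_mass - omega1    # mass constraint
omega = [omega0, omega1]

cost_formula = explicit_cost(omega, beta, rho, C)
cost_direct = relative_entropy_cost(omega, beta, rho, C)
print(cost_formula, cost_direct)
```

The agreement reflects the identity $\log(C P_i(\rho\beta)/P_i(\beta)) = \log C + (1-\rho)\beta + i\log\rho$ for $i > I$, summed against $\pi^*_i$.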

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

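The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)}\|P(\beta)),
\tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{k,j} = \beta
\tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I.
\tag{A.22}
\]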

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{k,j} = 0, \qquad k + j > I,
\tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \dots, \pi^{(k)}, \dots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i:\,P_i > Q_i} (P_i - Q_i) = \sum_{i:\,Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A = \Biggl\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)}\|P(\beta)) \le A \Biggr\}.
\]
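The two sums in the distance formula above agree because both sequences sum to one, and their common value is half the $\ell_1$ distance between $P$ and $Q$ (the total variation distance). A minimal numerical illustration, with hypothetical function names and finitely supported distributions, is:

```python
def excess_mass(p, q):
    """Sum of (p_i - q_i) over the indices where p_i > q_i."""
    return sum(a - b for a, b in zip(p, q) if a > b)

def half_l1(p, q):
    """Half of the l1 distance between two distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
print(excess_mass(P, Q), excess_mass(Q, P), half_l1(P, Q))
```

All three printed values coincide up to floating-point rounding.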

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[
Q_A(\alpha, \omega, \beta) = F(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi(k)$ is equal to the limit of the means of the $\pi^{(n)}(k)$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}(k)$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[ \sum_{j \ge m} j\,\pi^{(n)}_{kj} \le \varepsilon \tag{A.24} \]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}\,[xy - x\log x + x - 1]$, we have the inequality

\[ xy \le (x\log x - x + 1) + (e^y - 1), \]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\, \frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty. \]
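Both ingredients of the uniform integrability argument are easy to check numerically. The following is a small sanity check, not part of the proof; the function names and the truncation at 400 terms are assumptions made for illustration.

```python
import math

def poisson_log_pmf(j, beta):
    """log P_j(beta) for the Poisson distribution with mean beta."""
    return -beta + j * math.log(beta) - math.lgamma(j + 1)

def exp_moment_series(beta, delta, terms=400):
    """Truncated sum over j of exp(j/delta) * P_j(beta).

    The exponents are combined before calling exp to avoid overflow.
    """
    return sum(math.exp(j / delta + poisson_log_pmf(j, beta)) for j in range(terms))

def exp_moment_closed_form(beta, delta):
    """exp(beta * e^{1/delta} - beta), the value claimed in the text."""
    return math.exp(beta * math.exp(1.0 / delta) - beta)

def inequality_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with the convention 0 log 0 = 0."""
    ell = x * math.log(x) - x + 1.0 if x > 0 else 1.0
    return ell + (math.exp(y) - 1.0) - x * y
```

The gap returned by `inequality_gap` vanishes exactly when $x = e^y$, which is the maximizer in the supremum representation of $e^y - 1$.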


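Taking $y = j/\delta$ and $x = \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\begin{align*}
\sum_{j \ge m} j\,\pi^{(n)}_{kj}
 &= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \\
 &\le \sum_{j \ge 0} \delta \left[ \pi^{(n)}_{kj} \log\left( \frac{\pi^{(n)}_{kj}}{P_j(\beta)} \right) - \pi^{(n)}_{kj} + P_j(\beta) \right] + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
 &= \delta\, D(\pi^{(n)}(k)\,\|\,P(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
 &\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta),
\end{align*}

where the second equality uses the fact that both $\pi^{(n)}(k)$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]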

We now apply the uniform integrability to analyze the limits of the means. Writing $\beta^{(n)}_k$ and $\beta_k$ for the means of $\pi^{(n)}(k)$ and $\pi(k)$, it follows from [5], Theorem 5.4, that

\[ \beta^{(n)}_k \to \beta_k. \]

Finally,

\[ \sum_k \alpha_k \beta_k = \lim_n \sum_k \alpha_k \beta^{(n)}_k = \lim_n \beta = \beta, \]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[ G = \inf_{\pi \in Q_A} \sum_{k \in K} \alpha_k D(\pi(k)\,\|\,P(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}(k)\,\|\,P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[ \sum_{k \in K} \alpha_k D(\pi^*(k)\,\|\,P(\beta)) \le \liminf_n \sum_{k \in K} \alpha_k D(\pi^{(n)}(k)\,\|\,P(\beta)) = G, \]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

\[ (\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha,\omega,\beta) \]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[ \liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta). \]

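Proof. For any $A > \liminf_n \mathcal{J}^{(n)} = G$, we may choose a subsequence of problems such that

\[ \mathcal{J}^{(n)} \le G + 1/n < A, \]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$ then

\[ \beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{kj} \;\to\; \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{kj}, \qquad k = 0, 1, \dots, K. \]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k \alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k \beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$. Thus

\[ \sum_{k \in K} \beta_k \alpha_k = \lim_n \sum_k \beta^{(n)}_k \alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta, \]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\begin{align*}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)}))
 &\ge \liminf_n \sum_{k:\,\alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}(k)\,\|\,P(\beta^{(n)})) \\
 &\ge \sum_{k:\,\alpha_k > 0} \alpha_k D(\pi(k)\,\|\,P(\beta)) \\
 &\ge \mathcal{J}(\alpha,\omega,\beta),
\end{align*}

as desired.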
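The closed form $\beta - \lambda + \lambda\log(\lambda/\beta)$ used in the proof is the relative entropy of a Poisson($\lambda$) distribution with respect to Poisson($\beta$). It is the infimum because, for any $\pi$ with mean $\lambda$, $D(\pi\,\|\,P(\beta)) = D(\pi\,\|\,P(\lambda)) + \beta - \lambda + \lambda\log(\lambda/\beta)$. The sketch below checks the Poisson value numerically; the function names and the 400-term truncation are illustrative assumptions.

```python
import math

def poisson_log_pmf(j, mean):
    """log P_j(mean) for the Poisson distribution."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def poisson_relative_entropy(lam, beta, terms=400):
    """Truncated series for D(P(lam) || P(beta))."""
    total = 0.0
    for j in range(terms):
        log_p = poisson_log_pmf(j, lam)
        total += math.exp(log_p) * (log_p - poisson_log_pmf(j, beta))
    return total

def min_entropy_given_mean(lam, beta):
    """beta - lam + lam * log(lam / beta), the infimum quoted in the proof."""
    return beta - lam + lam * math.log(lam / beta)
```

As a comparison point, a point mass at $j = 3$ also has mean 3, and its relative entropy with respect to $P(1)$ is $-\log P_3(1) = 1 + \log 6$, which exceeds the Poisson value.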

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|K|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|K|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S_\infty^{|K|}$, subject to the restrictions that $\hat\pi_{mn} > 0$, that $\hat\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

\[ \hat\omega_i \doteq \sum_{k=0}^{i} \alpha_k \hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k \in K} \alpha_k \sum_{j=0}^{\infty} j\,\hat\pi_{kj}, \]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha, \hat\omega, \hat\beta)$. Next, for any $\eta > 0$ we may define

\[ \tilde\beta \doteq (\beta - \eta\hat\beta)/(1 - \eta), \qquad \tilde\omega \doteq (\omega - \eta\hat\omega)/(1 - \eta). \]

By construction, the constraints $(\alpha, \tilde\omega, \tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \hat\omega, \hat\beta)$, and hence for $(\alpha, \tilde\omega, \tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha, \tilde\omega, \tilde\beta)$, and finally form $\bar\pi = \eta\hat\pi + (1 - \eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\begin{align*}
f'(\varepsilon)
 &= \sum_{k \in K} \alpha_k \sum_{j=0}^{\infty} \left( \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} + 1 \right) (\bar\pi_{kj} - \pi_{kj}) \\
 &= \sum_{k \in K} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}\, (\bar\pi_{kj} - \pi_{kj}) \\
 &= \alpha_m \bar\pi_{mn} \log\frac{\pi^\varepsilon_{mn}}{P_n(\beta)}
   + \sum_{k \in K} \alpha_k \sum_{(k,j) \ne (m,n)} \bar\pi_{kj} \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}
   - \sum_{k \in K} \alpha_k \sum_{j} \pi_{kj} \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}.
\end{align*}

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[ -\sum_k \alpha_k D(\pi(k)\,\|\,P(\beta)) \le 0. \]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj} \log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in K}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in K,\ j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise.} \end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in K$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k \in K} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
 + \sum_{k \in K} z_k \alpha_k \left( 1 - \sum_{k+l \in \mathcal{I}} \pi_{kl} \right)
 + \sum_{i \in \mathcal{I}} w_i \left( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \right) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[ \pi^*_{kj} = P_j(\beta)\, e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
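The step of taking partial derivatives and rearranging can be written out explicitly. The following is a worked sketch based only on the Lagrangian displayed in the proof, assuming $\alpha_k > 0$ so that the common factor $\alpha_k$ can be cancelled.

```latex
\begin{align*}
\frac{\partial L}{\partial \pi_{kj}}
  &= \alpha_k \Bigl( \log\frac{\pi_{kj}}{P_j(\beta)} + 1 \Bigr)
     - z_k \alpha_k - w_{k+j}\,\alpha_k = 0 \\
\Longrightarrow\quad
\log\frac{\pi^*_{kj}}{P_j(\beta)} &= z_k - 1 + w_{k+j} \\
\Longrightarrow\quad
\pi^*_{kj} &= P_j(\beta)\, e^{z_k - 1}\, e^{w_{k+j}} = D_k\, P_j(\beta)\, W_{k+j}.
\end{align*}
```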

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in K}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in K,\ k + j \le I,\ k + j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in K,\ k + j > I, \\ 0, & \text{otherwise.} \end{cases} \]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[ \beta^{(M)} \doteq \sum_{k \in K} \alpha_k \sum_{l \le M} l\,\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l \le M} \pi^*_{kl}, \]

and consider the problem of minimizing

\[ \sum_{k \in K} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)} \]

subject to the constraints

\begin{align*}
\sum_{k \in K} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in K, \\
\sum_{k \le i,\ k \in K} \alpha_k \pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{align*}

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous proof, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\begin{align*}
L^{(M)} &= \sum_{k \in K} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
 + y^{(M)} \left( \beta^{(M)} - \sum_{k \in K} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{kl} \right) \\
 &\quad + \sum_{k \in K} z^{(M)}_k \alpha_k \left( \eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} \right)
 + \sum_{i \in \mathcal{I}} w^{(M)}_i \left( \omega_i - \sum_{k \le i,\ k \in K} \alpha_k \pi_{k,i-k} \right).
\end{align*}

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[ \pi^*_{kj} = P_j(\beta)\, e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}} \]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\, e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
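The substitution $\rho = e^y$ rests on the elementary identity $P_j(\beta)\,e^{jy} = e^{(\rho-1)\beta} P_j(\rho\beta)$, which turns an exponentially tilted Poisson weight into a Poisson weight with mean $\rho\beta$. A small numerical check follows; the function names are illustrative assumptions.

```python
import math

def poisson_pmf(j, mean):
    """P_j(mean) for the Poisson distribution."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def tilted_weight(j, beta, y):
    """Left-hand side: P_j(beta) * exp(j * y)."""
    return poisson_pmf(j, beta) * math.exp(j * y)

def rescaled_poisson(j, beta, y):
    """Right-hand side: exp((rho - 1) * beta) * P_j(rho * beta), rho = exp(y)."""
    rho = math.exp(y)
    return math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta)
```

Multiplying the right-hand side by $e^{-1 + z_k}$ gives exactly the constant $C_k = e^{(\rho-1)\beta - 1 + z_k}$ times $P_j(\rho\beta)$.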

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^*(k)$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0}, \dots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \dots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\begin{align*}
\tilde\psi_{k0}(x)
 &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl( \omega_{ki} - C_k P_i(\rho_k\beta_k) \bigr) \left( 1 - \frac{x\beta_k/\beta}{\beta_k} \right)^{i} \\
 &= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl( \omega_{ki} - C_k P_i(\rho\beta) \bigr) \left( 1 - \frac{x}{\beta} \right)^{i} \\
 &= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{i} \right].
\end{align*}

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0 \tilde\psi_{00}$ is

\begin{align*}
(-1)^k \psi_0^{(k)}(x)
 &= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta)\, \frac{i!}{(i-k)!\,\beta^k} \left( 1 - \frac{x}{\beta} \right)^{i-k} \right] \tag{A.25} \\
 &= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{j} \right] \\
 &= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \tilde\psi_{k0}(x).
\end{align*}

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\begin{align*}
\gamma_i(x)
 &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k}\, \psi^{(i-k)}_{k0}\!\left( \frac{x\beta_k}{\beta} \right) \\
 &= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k}\, \frac{d^{\,i-k}}{dx^{\,i-k}} \tilde\psi_{k0}(x) \\
 &= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k}\, \frac{x^{i-k}}{(i-k)!}\, (-1)^{i} \psi_0^{(i)}(x).
\end{align*}

Similarly we obtain

\begin{align*}
-\dot\psi_i(x)
 &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\!\left( \frac{x\beta_k}{\beta} \right) \\
 &= -\sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k}\, \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \tilde\psi_{k0}(x) \\
 &= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k}\, \frac{x^{i-k}}{(i-k)!}\, (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{align*}

so that the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[ \frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27} \]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[ (-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\, \tilde\psi_{k0}(x) \tag{A.28} \]

and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\, \frac{x^{i-k}}{(i-k)!}\, (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29} \]

for $i = 0, \dots, I$, establishing the restricted set of equations (A.10).
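The passage from the first to the second line of (A.25) uses the index shift $i = k + j$ together with the Poisson identity $P_i(\rho\beta)\, i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$. The sketch below checks this identity numerically; the function names are illustrative assumptions, not notation from the paper.

```python
import math

def poisson_pmf(j, mean):
    """P_j(mean) for the Poisson distribution."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def differentiated_coefficient(i, k, rho, beta):
    """Coefficient of (1 - x/beta)^(i-k) after k derivatives, up to sign:
    P_i(rho*beta) * i! / ((i-k)! * beta^k)."""
    return (poisson_pmf(i, rho * beta) * math.factorial(i)
            / (math.factorial(i - k) * beta ** k))

def reindexed_coefficient(i, k, rho, beta):
    """The same coefficient written as rho^k * P_{i-k}(rho*beta)."""
    return rho ** k * poisson_pmf(i - k, rho * beta)
```

The factor $\rho^k$ produced by the shift is what allows $\rho^k$ to be pulled out of the bracket in the second line of (A.25).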

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \dots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i \]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

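Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence, writing $\tilde\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ and $\tilde\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the rescaled subproblem curves, the cost is

\begin{align*}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(\beta) \log\bigl|\psi_0^{(i)}(\beta)\bigr| \right] - \sum_{k=0}^{K} \alpha_k \log\bigl|\psi_0^{(k)}(0)\bigr| \\
 &= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta) \log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}}.
\end{align*}

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k0}(x)$, so that

\[ \frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left( \frac{\beta_k}{\beta} \right)^{j} \psi^{(j)}_{k0}\!\left( \frac{x\beta_k}{\beta} \right) e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k\cdot}(\beta)\,\|\,P(\beta)). \]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).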

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\,\|\,\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon \doteq (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[ G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x)\,\|\,\gamma^\varepsilon(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma) \le J(\bar\gamma). \]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\,dx$. After defining

\[ \eta(x) \doteq \bar\psi(x) - \psi(x), \]

we may write

\begin{align*}
G[\varepsilon] &= \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx \\
 &= \int_0^\beta g(x, \varepsilon)\,dx. \tag{A.30}
\end{align*}

We can assume without loss that $G[1] = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example, with the rescaled subproblem curves $\tilde\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) \ge \alpha_i \tilde\gamma_{i0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\,\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where the derivative $\bar\theta$ of the competing path may not exist.

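The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial \psi_i}\bigg|_{\gamma^\varepsilon(x)} \eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial \theta_i}\bigg|_{\gamma^\varepsilon(x)} \dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\begin{align*}
\frac{\partial L}{\partial \psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial \theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{align*}

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left| \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) \right| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.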

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx \]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[ \int_0^x \frac{\partial L}{\partial \psi_i}\bigg|_{\gamma} dx' = \int_0^x \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\bigg|_{\gamma} dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\begin{align*}
G'_+[0] &= \int_0^\beta \left[ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right] dx \\
 &= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x')\,dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
 &= -\int_0^\beta \sum_{i=0}^{I} C_i\, \dot\eta_i(x)\,dx \\
 &= \sum_{i=0}^{I} C_i \bigl( \eta_i(0) - \eta_i(\beta) \bigr) = 0.
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and the competing path $\bar\gamma$ have the same beginning and end points.
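The closing appeal to convexity can be made explicit. A convex function lies above its one-sided tangent lines, so the following one-line step (a worked sketch using only the convexity of $G$ from Lemma A.15 and the value $G'_+[0] = 0$, with $G[0]$ the cost of the extremal and $G[1]$ the cost of the competing path) completes the proof of the lemma.

```latex
\[
G[1] \;\ge\; G[0] + G'_+[0]\,(1 - 0) \;=\; G[0],
\qquad\text{that is, the competing path costs at least } J(\gamma).
\]
```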

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1 - \lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x^{(n)}_1)$ and terminal point $\bar\gamma(x^{(n)}_2)$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx. \]

It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
 &\le \liminf_n \mathcal{J}\bigl( \bar\gamma(x^{(n)}_1),\ \bar\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1 \bigr) \\
 &= \liminf_n J(\gamma^{(n)}) \\
 &\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx \\
 &= J(\bar\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization: Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting



is to find the lowest cost occupancy curve on $(0,\beta]$ among all curves with $\gamma_1(x) > 0.5$ for some $x \in (0,\beta]$.

APPENDIX

Analysis of the calculus of variations problem. The Appendix is dedicated to proving the calculus of variations results given in Section 2.2. Recall that these results provide explicit representations for the terminal rate function $\mathcal{J}(\omega)$ defined in Corollary 2.3 and for the minimizing occupancy functions $\gamma^*$ satisfying $\mathcal{J}(\omega) = J(\gamma^*)$.

In the first step of the proof, we characterize a set of extremal occupancy paths, that is, paths which satisfy the Euler–Lagrange equations. For all feasible terminal conditions of the form $(1, \omega, \beta)$, Theorem A.1 shows that occupancy paths of the form given in Theorem 2.6 are extremals, and that paths of this form can be constructed to meet any feasible constraints of the form $(1, \omega, \beta)$. Likewise, Lemma A.6 and Theorem A.11 show that occupancy paths of the form given in Theorem 2.8 are extremals, and such paths can be constructed to meet all general feasible conditions $(\alpha,\omega,\beta)$. The special form of the extremals is used in Theorem A.5 and Theorem A.13 to show that the extremals have the costs given in Theorem 2.5 and Theorem 2.7, respectively. If $\gamma$ is the extremal occupancy path constructed for given constraints $(\alpha,\omega,\beta)$, we thus have the upper bound $\mathcal{J}(\omega) \le J(\gamma)$. The assertion in Theorem 2.5 and Theorem 2.7 that the minimum relative entropy is achieved by a unique distribution $\pi^*$ is established in Lemma A.6. The final step needed to prove Theorems 2.5–2.8 is the lower bound $\mathcal{J}(\omega) \ge J(\gamma)$. This bound is proved in Theorem A.14, using the Euler–Lagrange equations together with properties of the relative entropy.

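A.1. Preliminaries.

A.2. Proof of Lemma 2.4. Recall that the lemma states that endpoint constraints $(\alpha, \omega, \beta)$ are feasible if and only if

\[ \sum_{j=0}^{i} \alpha_j \ge \sum_{j=0}^{i} \omega_j, \qquad i = 0, \ldots, I \quad \text{(monotonicity)} \tag{A.4} \]

and

\[ \sum_{i=0}^{I} i\,\omega_i + (I+1)\,\omega_{I+} \le \sum_{i=0}^{I+1} i\,\alpha_i + \beta \quad \text{(conservation)}. \tag{A.5} \]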

Proof of Lemma 2.4. If a valid occupancy curve $\gamma : [0, \beta] \to S_I$ meets the initial and terminal constraints $\alpha$ and $\omega$, then property (b) of Lemma 2.1 implies (A.4), and property (c) implies (A.5), since

\[
\begin{aligned}
\sum_{i=0}^{I} \left( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \right)
&= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \\
&= (I+1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{aligned}
\tag{A.6}
\]

On the other hand, given the constraints (A.4) and (A.5), one can show that the linear functions

\[ \gamma_i(x) = \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta} \]

satisfy the constraints and the conditions of Lemma 2.1. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta \sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A.6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A.5).
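The two feasibility conditions are easy to check numerically. The following is a minimal sketch, not taken from the paper: the function name `is_feasible`, the list layout of `alpha` and `omega` (last entry holding $\alpha_{I+1}$ and $\omega_{I+}$, respectively), and the tolerance `tol` are all assumptions made for illustration.

```python
from typing import Sequence


def is_feasible(alpha: Sequence[float], omega: Sequence[float],
                beta: float, tol: float = 1e-12) -> bool:
    """Check the monotonicity (A.4) and conservation (A.5) conditions.

    Assumed layout (hypothetical, for illustration only):
      alpha = (alpha_0, ..., alpha_I, alpha_{I+1})
      omega = (omega_0, ..., omega_I, omega_{I+})
    """
    if len(alpha) != len(omega) or len(alpha) < 2:
        raise ValueError("alpha and omega must both have I + 2 entries")
    I = len(omega) - 2

    # Monotonicity: partial sums of alpha dominate partial sums of omega.
    cum_alpha = 0.0
    cum_omega = 0.0
    for i in range(I + 1):
        cum_alpha += alpha[i]
        cum_omega += omega[i]
        if cum_alpha < cum_omega - tol:
            return False

    # Conservation: the balls accounted for at the end cannot exceed
    # the initial balls plus the beta balls per urn that were thrown.
    lhs = sum(i * omega[i] for i in range(I + 1)) + (I + 1) * omega[I + 1]
    rhs = sum(i * alpha[i] for i in range(I + 2)) + beta
    return lhs <= rhs + tol
```

For example, with initially empty urns `alpha = [1.0, 0.0, 0.0]` and `omega = [0.2, 0.5, 0.3]`, the left side of the conservation condition is 1.1, so the constraints are feasible for `beta = 1.2` but not for `beta = 1.0`.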

A.3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[ J(\psi) = \int_0^{\beta} L(\psi(x), \theta(x))\,dx. \]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\[
\begin{aligned}
L(\psi, \theta) &= D(\theta \,\|\, \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left(1 - \sum_{i=0}^{I}\theta_i\right) \log\frac{1 - \sum_{i=0}^{I}\theta_i}{1 - \psi_I}.
\end{aligned}
\]

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
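As an illustration of the local rate function, the relative entropy $D(\theta \,\|\, \gamma)$ can be evaluated directly from the two vectors. This is a minimal sketch; the function name `local_rate` is hypothetical, both arguments are assumed to include the final "$I+$" component, and the conventions $0 \log 0 = 0$ and "infinite cost when $\theta_i > 0 = \gamma_i$" are the standard relative entropy conventions, assumed here rather than quoted from the text.

```python
import math
from typing import Sequence


def local_rate(theta: Sequence[float], gamma: Sequence[float]) -> float:
    """Relative entropy D(theta || gamma) of two probability vectors.

    theta[i]: actual rate at which balls enter urns of occupancy i
    gamma[i]: expected rate, i.e. the fraction of urns with occupancy i
    The last entry of each vector is assumed to be the 'I+' component.
    """
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue  # convention: 0 log 0 = 0
        if g == 0.0:
            return math.inf  # mass sent to an occupancy level with no urns
        total += t * math.log(t / g)
    return total
```

The cost is zero exactly when the balls enter each occupancy class at the expected rate, for instance `local_rate([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])`.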

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0, \beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[ \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx} \frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x)) \]

for all $i \in \{0, \ldots, I\}$, $x \in (0, \beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

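In the case at hand, the Euler–Lagrange equations are given by

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right] \tag{A.7} \]

for $i = 0, \ldots, I - 1$, and by

\[ -\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx}\left[ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right]. \tag{A.8} \]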

In the case when we have equality in the conservation constraint (2.4), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$, and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi, \theta)$ then reduces to

\[ L(\psi, \theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A.9} \]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx}\left[ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \right] \tag{A.10} \]

for $i = 0, \ldots, I - 1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty: $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[ \sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1, \]

\[ \sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C P_i(\rho\beta) = \beta. \]
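One way to solve the two twist equations numerically is sketched below. It is an illustration under stated assumptions, not the authors' procedure: dividing the second equation's tail by the first shows that $\lambda = \rho\beta$ must make the mean of a Poisson($\lambda$) variable conditioned to exceed $I$ equal to $(\beta - \sum_{i \le I} i\omega_i)/(1 - \sum_{i \le I}\omega_i)$, and the sketch assumes this conditional mean is increasing in $\lambda$ so that bisection applies. The names `poisson_tail` and `twist_parameters` are hypothetical, and only the exponential case $C > 0$ with moderate values of $\beta$ is handled.

```python
import math
from typing import Sequence, Tuple


def poisson_tail(lam: float, I: int) -> Tuple[float, float]:
    """Return (sum_{i>I} P_i(lam), sum_{i>I} i * P_i(lam)) by direct summation."""
    i = I + 1
    term = math.exp(-lam) * lam ** i / math.factorial(i)  # P_{I+1}(lam)
    mass = 0.0
    mean = 0.0
    # Keep adding terms until they are negligible and past the mode.
    while term > 1e-18 * mass or i <= lam:
        mass += term
        mean += i * term
        i += 1
        term *= lam / i
    return mass, mean


def twist_parameters(omega: Sequence[float], beta: float) -> Tuple[float, float]:
    """Solve the two twist equations for (C, rho); omega = (omega_0..omega_I)."""
    I = len(omega) - 1
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    if tail_mass <= 0.0:
        raise ValueError("polynomial case (C = 0): no twist to solve for")
    target = tail_mean / tail_mass  # required mean occupancy above level I
    if target <= I + 1:
        raise ValueError("constraints are not feasible for an exponential extremal")

    def cond_mean(lam: float) -> float:
        mass, mean = poisson_tail(lam, I)
        return mean / mass

    # The conditional mean is at least lam, so target is a valid upper bracket.
    lo, hi = 1e-6, target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cond_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    C = tail_mass / poisson_tail(lam, I)[0]
    return C, lam / beta
```

As a sanity check, if $\omega_i = P_i(\beta)$ for $i \le I$ (the zero-cost Poisson terminal condition), the sketch returns $C \approx 1$ and $\rho \approx 1$.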

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints, and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[ \psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C P_k(\rho\beta)) \left(1 - \frac{x}{\beta}\right)^{k}, \tag{A.8} \]

\[ \gamma_i(x) = \frac{x^i}{i!} (-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A.9} \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) \]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[ \psi_0(x) = \sum_{k=0}^{I} \omega_k \left(1 - \frac{x}{\beta}\right)^{k}. \tag{A.10} \]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.
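The extremal path can be evaluated pointwise by differentiating $\psi_0$ term by term. The following minimal sketch does this in closed form; the function name `extremal_gamma` and its argument layout are assumptions for illustration, and the twist parameters `C` and `rho` are taken as given inputs (for a polynomial extremal pass `C = 0.0`, in which case `rho` has no effect).

```python
import math
from typing import List, Sequence


def extremal_gamma(omega: Sequence[float], beta: float,
                   C: float, rho: float, x: float) -> List[float]:
    """Return [gamma_0(x), ..., gamma_I(x), gamma_{I+}(x)] for 0 <= x <= beta.

    omega = (omega_0, ..., omega_I) are the terminal constraints.
    """
    I = len(omega) - 1
    lam = rho * beta
    # Polynomial coefficients c_k = omega_k - C * P_k(rho * beta).
    c = [omega[k] - C * math.exp(-lam) * lam ** k / math.factorial(k)
         for k in range(I + 1)]
    u = 1.0 - x / beta
    gammas = []
    for i in range(I + 1):
        # deriv = (-1)^i * (i-th derivative of psi_0 at x)
        deriv = rho ** i * C * math.exp(-rho * x)
        for k in range(i, I + 1):
            deriv += (c[k] * beta ** (-i)
                      * math.factorial(k) / math.factorial(k - i)
                      * u ** (k - i))
        gammas.append(x ** i / math.factorial(i) * deriv)
    gammas.append(1.0 - sum(gammas))  # fraction of urns with more than I balls
    return gammas
```

For the polynomial constraints `omega = [0.2, 0.5, 0.3]` with `beta = 1.1`, the path starts at `[1, 0, 0, 0]` and ends at `[0.2, 0.5, 0.3, 0]`. With Poisson terminal constraints and `C = rho = 1`, the polynomial part vanishes and the path reduces to the Poisson probabilities $P_i(x)$.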

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[ (-1)^i \varphi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a, b] \text{ and } i > 0. \tag{A.11} \]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[ (-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} (\omega_k - C P_k(\rho\beta))\, \beta^{-i} \frac{k!}{(k-i)!} \left(1 - \frac{x}{\beta}\right)^{k-i} \]

for $i = 0, \ldots, I$, and

\[ (-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A.12} \]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1} \psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) \doteq (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[
\begin{aligned}
\varphi(\beta) = (-1)^I \psi_0^{(I)}(\beta)
&= \rho^I C e^{-\rho\beta} + \left( \omega_I - C e^{-\rho\beta} \frac{(\rho\beta)^I}{I!} \right) \beta^{-I} I! \\
&= \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I \ge 0,
\end{aligned}
\]

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$, and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that

\[ (-1)^I \psi_0^{(I)}(x) = \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I > 0 \quad \text{for all } x \in [0, \beta]. \]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[ \psi_0(0) = C \left( 1 - \sum_{i=0}^{I} P_i(\rho\beta) \right) + \sum_{i=0}^{I} \omega_i = 1. \]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[ \gamma_i(x) = \frac{x^i}{i!} (-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots. \tag{A.13} \]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x)$, and thus

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x) = \psi_0(0) = 1. \]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!} (-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!} (-1)^{k+1} \psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!} (-1)^{i+1} \psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[ \sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1. \]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[ \sum_{i=0}^{I} \theta_i(x) \le 1 \]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[ \theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x), \]

\[ 1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x), \]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0, \beta),\ i \ge 0. \tag{A.15} \]

Then

\[ \frac{d}{dx}\left[ -\log\frac{\theta_i}{\gamma_i} \right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0, \beta), \tag{A.16} \]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$ they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$ with

\[ (-1)^i \psi_0^{(i)}(x) = C \rho^i e^{-\rho x} \quad \text{for } i > I, \]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[ \sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0, \]

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{\infty} \left[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \right]. \tag{A.17} \]

Proof. For all $i > I$ and $x \in [0, \beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[ \theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I). \]

Then for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$, and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{(-\dot\psi_i)}{\psi_i - \psi_{i-1}}
+ (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{(1 + \dot\psi_0 + \cdots + \dot\psi_I)}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{(-\dot\psi_i)}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]

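Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$.

Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that given $\varepsilon > 0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[ J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx. \]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|\,dx
+ \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} (\dot\psi_i(x) - \dot\psi_{i-1}(x)) \log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[ -\dot\psi_i(x) \log|\psi_0^{(i)}(x)| = \rho\,\gamma_i(x)\,[\log C + i \log\rho - \rho x], \]

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[ \sum_{i=0}^{\infty} \left[ \gamma_i \log|\psi_0^{(i)}| \right]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx. \]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.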

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[ \sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1, \]

\[ -\dot\psi_i(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I - 1, \]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{I} \left[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \right]. \tag{A.19} \]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i |\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[ J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, P(\beta)). \]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[ L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y \left( 1 - \sum_{i=0}^{\infty} \pi_i \right) + z \left( \beta - \sum_{i=0}^{\infty} i\,\pi_i \right), \]

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$, and the strict convexity of $x \log x$, imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x \log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[ \pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - z i \pi_i \ge \pi^*_i \log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - z i \pi^*_i. \]

Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in F(1, \omega, \beta)} D(\pi \,\|\, P(\beta)) &= \inf_{\pi \in F(1, \omega, \beta)} L(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
&= D(\pi^* \,\|\, P(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[ J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \,\|\, P(\beta)), \]

where

\[ \gamma_i(\beta) = \begin{cases} \omega_i, & 0 \le i \le I, \\ C P_i(\rho\beta), & i > I. \end{cases} \]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[ J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \left( 1 - \sum_{i=0}^{I} \omega_i \right) (\log C + (1 - \rho)\beta) + \left( \beta - \sum_{i=0}^{I} i\,\omega_i \right) \log\rho, \]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
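The explicit cost formula is straightforward to evaluate once the twist parameters are known. The sketch below is illustrative only: the names `poisson_pmf` and `explicit_cost` are hypothetical, `C` and `rho` are assumed to have been obtained separately (for example by solving the two twist equations of Section A.4), and the polynomial case is handled by skipping the tail terms when no mass remains above level $I$.

```python
import math
from typing import Sequence


def poisson_pmf(i: int, lam: float) -> float:
    """Poisson probability P_i(lam)."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def explicit_cost(omega: Sequence[float], beta: float,
                  C: float, rho: float) -> float:
    """Evaluate the explicit cost J(gamma); omega = (omega_0, ..., omega_I)."""
    # First term: entropy of the constrained levels against Poisson(beta).
    cost = 0.0
    for i, w in enumerate(omega):
        if w > 0.0:  # convention: 0 log 0 = 0
            cost += w * math.log(w / poisson_pmf(i, beta))
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    if tail_mass > 0.0:  # exponential case only
        cost += tail_mass * (math.log(C) + (1.0 - rho) * beta)
        cost += tail_mean * math.log(rho)
    return cost
```

When $\omega_i = P_i(\beta)$ and $C = \rho = 1$, every term vanishes and the cost is zero, as expected for the law-of-large-numbers path.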

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k \beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k \beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma^{(k)}_i\}_i$, where the function $\gamma^{(k)}_i : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[ \gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k\, \gamma^{(k)}_{i-k}(x \beta_k/\beta). \]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[ \min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \tag{A.20} \]

subject to the conservation constraint

\[ \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi^{(k)}_j = \beta \tag{A.21} \]

and the terminal conditions

\[ \omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\, \pi^{(k)}_{i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22} \]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[ \pi^{(k)}_j = 0, \qquad k + j > I, \tag{A.23} \]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi^{(0)}_0 = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi^{(0)}_1$, and mass from $\pi^{(1)}$ is applied to $\pi^{(1)}_0$, as necessary until the constraint $\alpha_0 \pi^{(0)}_1 + \alpha_1 \pi^{(1)}_0 = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
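The ordered filling construction can be sketched in a few lines. This is one possible reading of the construction, not the authors' code: the text leaves open how the required mass is split among eligible classes at each level, and the sketch below simply draws from the lowest-numbered class first, which is an assumption. The function name `ordered_filling` is hypothetical, and the input is assumed to be a feasible polynomial problem with $K \le I$ and $\sum_i \omega_i = 1$.

```python
from typing import Dict, List, Sequence


def ordered_filling(alpha: Sequence[float], omega: Sequence[float],
                    tol: float = 1e-12) -> Dict[int, List[float]]:
    """Build one feasible point pi for polynomial constraints.

    alpha[k]: initial fraction of urns holding k balls (k = 0..K, K <= I)
    omega[i]: terminal fraction of urns holding i balls (i = 0..I)
    Returns {k: [pi^(k)_0, ..., pi^(k)_{I-k}]} for the classes with alpha[k] > 0.
    """
    I = len(omega) - 1
    remaining = list(alpha)  # unassigned mass of each class, in units of alpha_k
    pi = {k: [0.0] * (I - k + 1) for k, a in enumerate(alpha) if a > 0.0}
    for i, need in enumerate(omega):
        # Only classes k <= i can end with i balls (j = i - k additional balls).
        for k in sorted(pi):
            if k > i:
                break
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition fails at level %d" % i)
    if any(remaining[k] > tol for k in pi):
        raise ValueError("probability mass left over: omega does not sum to 1")
    return pi
```

For `alpha = [0.5, 0.5]` and `omega = [0.2, 0.3, 0.5]` this yields $\pi^{(0)} = (0.4, 0.6, 0)$ and $\pi^{(1)} = (0, 1)$, which meets the terminal conditions and places $0.5 \cdot 0.6 + 0.5 \cdot 1 = 0.8$ additional balls per urn.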

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[ \sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i), \]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[ H_A \doteq \left\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \le A \right\}. \]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[ Q_A(\alpha, \omega, \beta) \doteq F(\alpha, \omega, \beta) \cap H_A \]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[ \sum_{j \ge m} j\,\pi^{(n)(k)}_j \le \varepsilon \tag{A.24} \]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0} [xy - x \log x + x - 1]$, we have the inequality

\[ xy \le (x \log x - x + 1) + (e^y - 1), \]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta} \frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty. \]
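Both facts used here can be checked numerically. The following minimal sketch (function names `young_gap` and `poisson_exp_moment` are hypothetical, and the series is truncated at an assumed `terms = 400`, adequate only for moderate $\beta e^{1/\delta}$) evaluates the slack in the inequality $xy \le (x \log x - x + 1) + (e^y - 1)$ and compares the exponential moment series with its closed form.

```python
import math


def young_gap(x: float, y: float) -> float:
    """Slack (x log x - x + 1) + (e^y - 1) - x*y, which should be >= 0 for x >= 0."""
    x_log_x = x * math.log(x) if x > 0.0 else 0.0  # convention: 0 log 0 = 0
    return (x_log_x - x + 1.0) + (math.exp(y) - 1.0) - x * y


def poisson_exp_moment(beta: float, delta: float, terms: int = 400) -> float:
    """Truncated series sum_j e^{j/delta} P_j(beta)."""
    t = math.exp(1.0 / delta)
    total = 0.0
    term = math.exp(-beta)  # j = 0 term: e^{-beta} (beta t)^0 / 0!
    for j in range(terms):
        total += term
        term *= beta * t / (j + 1)  # next term: e^{-beta} (beta t)^{j+1} / (j+1)!
    return total
```

The slack vanishes exactly when $x = e^y$, which is where the supremum defining $e^y - 1$ is attained, and `poisson_exp_moment(2.0, 1.0)` agrees with `math.exp(2.0 * math.e - 2.0)` to within rounding.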

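Taking $y = j/\delta$ and $x \doteq \pi^{(n)(k)}_j / P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\,\pi^{(n)(k)}_j
&= \sum_{j \ge m} \delta\, \frac{j}{\delta}\, \frac{\pi^{(n)(k)}_j}{P_j(\beta)}\, P_j(\beta) \\
&\le \sum_{j \ge 0} \delta \left[ \pi^{(n)(k)}_j \log\left( \frac{\pi^{(n)(k)}_j}{P_j(\beta)} \right) - \pi^{(n)(k)}_j + P_j(\beta) \right] + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&= \delta\, D(\pi^{(n)(k)} \,\|\, P(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]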

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[ \beta^{(n)}_k \to \beta_k. \]

Finally,

\[ \sum_{k} \alpha_k \beta_k = \lim_{n} \sum_{k} \alpha_k \beta^{(n)}_k = \lim_{n} \beta = \beta, \]

so that $\pi \in F(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[ G = \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)(k)} \,\|\, P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[ \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*(k)} \,\|\, P(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)(k)} \,\|\, P(\beta)) = G, \]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

\[ (\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta) \]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

\[ \liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta). \]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[ \mathcal{J}^{(n)} \le G + 1/n < A, \]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$ then

\[ \beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)(k)}_j \to \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(k)}_j, \qquad k = 0, 1, \ldots, K. \]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k \alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)(k)} \,\|\, P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda \log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k \beta^{(n)}_k \to \alpha_k \beta_k$, even if $\alpha_k = 0$.
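The closed form $\beta - \lambda + \lambda \log(\lambda/\beta)$ is the relative entropy of Poisson($\lambda$) with respect to Poisson($\beta$), which can be confirmed numerically. The sketch below is illustrative only; the function names are hypothetical, and `math.lgamma` is used for $\log j!$ to avoid overflow. It does not prove minimality, but comparing against any other distribution with the same mean shows a larger value.

```python
import math
from typing import Sequence


def rel_entropy_to_poisson(pi: Sequence[float], beta: float) -> float:
    """D(pi || P(beta)) for a distribution pi on 0, 1, ..., len(pi) - 1."""
    total = 0.0
    for j, p in enumerate(pi):
        if p > 0.0:  # convention: 0 log 0 = 0
            log_poisson = -beta + j * math.log(beta) - math.lgamma(j + 1)
            total += p * (math.log(p) - log_poisson)
    return total


def min_entropy_given_mean(lam: float, beta: float) -> float:
    """Closed form beta - lam + lam * log(lam / beta)."""
    return beta - lam + lam * math.log(lam / beta)


def poisson_vector(lam: float, n: int) -> list:
    """First n Poisson(lam) probabilities, computed in log space."""
    return [math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
            for j in range(n)]
```

For instance, with $\lambda = 3$ and $\beta = 1.5$ the Poisson(3) vector attains the closed-form value, while a unit point mass at 3 (also of mean 3) has a strictly larger relative entropy.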

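Thus

\[ \sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k \alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta, \]

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)(k)} \,\|\, P(\beta^{(n)}))
&\ge \liminf_{n} \sum_{k : \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)(k)} \,\|\, P(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}
\]

as desired.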

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi^{(k)}_j$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^{*(k)}_j > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^{*(k)}_j = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi^{(k)}_j$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi^{(m)}_n = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi^{(m)}_n > 0$, that $\tilde\pi^{(k)}_j = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[ \tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\, \tilde\pi^{(k)}_{i-k}, \qquad \tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\tilde\pi^{(k)}_j, \]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha, \tilde\omega, \tilde\beta)$. Next, for any $\eta > 0$ we may define

\[ \hat\beta \doteq (\beta - \eta\tilde\beta)/(1 - \eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1 - \eta). \]

By construction, the constraints $(\alpha, \hat\omega, \hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha, \omega, \beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \tilde\omega, \tilde\beta)$, and hence for $(\alpha, \hat\omega, \hat\beta)$.

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 45

We may now let π be a feasible point for (α ω β) and finally formπ = ηπ+ (1minus η)π By construction π is a feasible solution correspondingto the original constraints (αωβ) and in addition πmn gt 0 As discussedin the proof of Lemma A6 we may take π to have finite support so that πalso has finite support

We will show that π is not a minimizer by proving that a sufficientlysmall perturbation toward π reduces the objective function Because theconstraints are linear the points πε = επ+(1minus ε)π are feasible solutions tothe original constraints for all 0 le ε le 1 Let f(ε) denote the value of theobjective function evaluated at πε The derivative of this function is

f prime(ε) =sum

kisinK

αk

infinsum

j=0

(

logπεkjPj(β)

+ 1

)

(πkj minus πkj)

=sum

kisinK

αk

infinsum

j=0

logπεkjPj(β)

(πkj minus πkj)

= αmπmn logπεmn

Pn(β)+sum

kisinK

αk

sum

(kj)6=(mn)

πkj logπεkjPj(β)

minussum

kisinK

αk

sum

j

πkj logπεkjPj(β)

In the limit as εrarr 0 the third term in the last display tends to

minussum

k

αkD(π(k)P(β)) le 0

The second term is bounded above by the expression minussum

k αksum

j πkj logPj(β)which is finite because π has finite support The first term in the displaytends to minusinfin which establishes that f(δ)lt f(0) for some sufficiently smallδ gt 0 We have shown that for any m+n isin I(ω) π cannot minimize (A20)if πmn = 0 As πmn is arbitrary withinm+n isin I(ω) it follows that πlowastmn gt 0whenever m+ n isin I(ω)

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[ \pi^*_{k,j} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise.} \end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A20) may be considered to be a finite-dimensional problem over this set. By Lemma A8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)} + \sum_{k \in \mathcal{K}} z_k \alpha_k \left( 1 - \sum_{k+l \in \mathcal{I}} \pi_{k,l} \right) + \sum_{i \in \mathcal{I}} w_i \left( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \right) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[ \pi^*_{k,j} = P_j(\beta)\, e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.

We now give the corresponding theorem for the exponential case.

Theorem A10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[ \pi^*_{k,j} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\ 0, & \text{otherwise.} \end{cases} \]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$ define

\[ \beta^{(M)} = \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\,\pi^*_{k,l}, \qquad \eta^{(M)}_k = \sum_{l \le M} \pi^*_{k,l}, \]

and consider the problem of minimizing

\[ \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)} \]

subject to the constraints

\begin{align*}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{k,l} &= \beta^{(M)}, \\
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{k,l} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{align*}

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers, via [4], Proposition 133. The Lagrangian for the $M$th problem is

\begin{align*}
L^{(M)} &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)} + y^{(M)} \left( \beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{k,l} \right) \\
&\quad + \sum_{k \in \mathcal{K}} z^{(M)}_k \alpha_k \left( \eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{k,l} \right) + \sum_{i \in \mathcal{I}} w^{(M)}_i \left( \omega_i - \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \right).
\end{align*}

Since the derivative of $L^{(M)}$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

\[ \pi^*_{k,j} = P_j(\beta)\, e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}} \]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\, e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
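The substitution for $C_k$ rests on the elementary identity $P_j(\beta)e^{jy} = e^{(\rho-1)\beta}P_j(\rho\beta)$ with $\rho = e^{y}$. The following Python sketch checks this identity numerically. The function names and the sample values of $\beta$ and $y$ are illustrative assumptions and do not appear in the paper.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability P_j(lam) = exp(-lam) * lam**j / j!."""
    return math.exp(-lam) * lam**j / math.factorial(j)

def tilted_mass(j, beta, y):
    """Left-hand side: P_j(beta) * exp(j*y), the exponentially tilted Poisson mass."""
    return poisson_pmf(j, beta) * math.exp(j * y)

def rescaled_mass(j, beta, y):
    """Right-hand side: exp((rho - 1) * beta) * P_j(rho * beta), with rho = exp(y)."""
    rho = math.exp(y)
    return math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta)

if __name__ == "__main__":
    # Hypothetical sample values, chosen only for illustration.
    for j in range(6):
        print(j, tilted_mass(j, 1.7, 0.3), rescaled_mass(j, 1.7, 0.3))
```

The two columns printed by the usage loop should agree up to floating-point rounding, which is why the tail $k + j > I$ of the minimizer is Poisson with parameter $\rho\beta$.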

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k = \sum_j j\,\pi^*_{k,j}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I - k$ terminal constraints, has mean $\beta_k$, and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} = \pi^*_{k,j}$ and $\omega(k) = (\omega_{k,0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions from Lemma A6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A1 [see (A13)]. Also, for convenience, define $\tilde\gamma_{k,j}(x) = \gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{k,j}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 21.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\tilde\gamma_{k,0}(0) = 1$ and $\tilde\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A22).

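In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k,0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\begin{align*}
\tilde\psi_{k,0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{k,i} - C_k P_i(\rho_k\beta_k)) \left( 1 - \frac{x\beta_k/\beta}{\beta_k} \right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{k,i} - C_k P_i(\rho\beta)) \left( 1 - \frac{x}{\beta} \right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{i} \right].
\end{align*}

Here we have used that $\omega_{k,i} = \pi^*_{k,i} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$, as given by Theorem A10. The $k$th derivative of $\psi_0 = \alpha_0 \tilde\psi_{0,0}$ is

\begin{align*}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \left( 1 - \frac{x}{\beta} \right)^{i-k} \right] \tag{A25} \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \tilde\psi_{k,0}(x).
\end{align*}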
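The passage from the first to the second line of (A25) uses the reindexing identity $P_i(\rho\beta)\, i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$ together with $W_i = W_{k+j}$ for $j = i - k$. A minimal Python sketch of a numerical check follows. The function names and the sample values of $\rho$ and $\beta$ are assumptions made for illustration only.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability P_j(lam) = exp(-lam) * lam**j / j!."""
    return math.exp(-lam) * lam**j / math.factorial(j)

def derivative_coefficient(i, k, rho, beta):
    """Coefficient produced by differentiating (1 - x/beta)**i a total of k times:
    P_i(rho*beta) * i! / ((i-k)! * beta**k), with the sign (-1)**k absorbed."""
    return (poisson_pmf(i, rho * beta) * math.factorial(i)
            / (math.factorial(i - k) * beta**k))

def reindexed_coefficient(i, k, rho, beta):
    """The same coefficient written as rho**k * P_{i-k}(rho*beta)."""
    return rho**k * poisson_pmf(i - k, rho * beta)

if __name__ == "__main__":
    # Hypothetical sample values rho = 0.8 and beta = 2.5.
    for i in range(5):
        for k in range(i + 1):
            print(i, k, derivative_coefficient(i, k, 0.8, 2.5),
                  reindexed_coefficient(i, k, 0.8, 2.5))
```

Agreement of the two printed columns for every pair $k \le i$ is what allows the factor $\rho^k$ to be pulled out of the bracket.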

Using Theorem A1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, we have

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k,0}^{(i-k)}\!\left( x\beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{align*}

Similarly, we obtain

\begin{align*}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\!\left( x\beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{align*}

so that the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A3) that

\[ \frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A27} \]

As shown in the proof of Theorem A1, the relations (A26) and (A27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

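In the polynomial case, similar computations show that

\[ (-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\, \tilde\psi_{k,0}(x) \tag{A28} \]

and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x) \left( -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \right) \tag{A29} \]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A10).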

In order to demonstrate a strong minimum, we will need the following.

Corollary A12. For irreducible constraints, the occupancy functions defined in Theorem A11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i \]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

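Theorem A13. Suppose that $\gamma$ is the extremal defined by Theorem A11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A20) subject to constraints (A21) and (A22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A11 are shown in the proof to satisfy the conditions of Lemma A3. Hence the cost is

\begin{align*}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(\beta) \log|\psi_0^{(i)}(\beta)| \right] - \sum_{k=0}^{K} \alpha_k \log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(\beta) \log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}}.
\end{align*}

The fact that $\tilde\psi_{k,0}(0) = 1$, together with (A25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k,0}(x)$, so that

\[ \frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left( \frac{\beta_k}{\beta} \right)^{j} \psi_{k,0}^{(j)}\!\left( x\beta_k/\beta \right) e^{\beta} = \frac{\tilde\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k,\cdot}(\beta) \,\|\, P(\beta)). \]

By construction, the endpoints $\{\tilde\gamma_{k,j}(\beta)\}$ coincide with the optimal arguments $\{\pi^*_{k,j}\}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A4 and (A28).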

A8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

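We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 21, and let $\mathcal{O}(\alpha,\omega,\beta) = \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[ G[\varepsilon] = J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.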

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A11 and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma) \le J(\bar\gamma). \]

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A3 we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

\[ \eta(x) = \bar\psi(x) - \psi(x), \]

we may write

\[ G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx. \tag{A30} \]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\tilde\gamma_{k,0}(x)$ and $\tilde\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) \ge \alpha_i \tilde\gamma_{i,0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and $\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

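The partial derivative of the integrand of (A30) is

\[ \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma^\varepsilon}\!(x)\, \eta_i(x) - \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \theta_i} \right|_{\gamma^\varepsilon}\!(x)\, \dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\begin{align*}
\frac{\partial L}{\partial \psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial \theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{align*}

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left| \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) \right| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.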

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx \]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[ \int_0^x \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma} dx' = \int_0^x \left. \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right) \right|_{\gamma} dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 361, page 209) to the first term, we obtain

\begin{align*}
G'_+[0] &= \int_0^\beta \left\{ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right\} dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x')\,dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A1 and in Theorem A11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x_1^{(n)})$ and terminal point $\bar\gamma(x_2^{(n)})$. By Lemma A16,

\[ J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx. \]

It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}(\bar\gamma(x_1^{(n)}),\ \bar\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x) \,\|\, \bar\gamma(x))\,dx \\
&= J(\bar\gamma),
\end{align*}

with the first two equalities following from Theorem A13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics I: General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


implies (A4) and property (c) implies (A5), since

\begin{align*}
\sum_{i=0}^{I} \left( \sum_{j=0}^{i} \alpha_j - \sum_{j=0}^{i} \omega_j \right) &= \sum_{j=0}^{I} (I + 1 - j)(\alpha_j - \omega_j) \tag{A6} \\
&= (I + 1)(\omega_{I+} - \alpha_{I+1}) + \sum_{j=0}^{I} j(\omega_j - \alpha_j).
\end{align*}

On the other hand, given the constraints (A4) and (A5), one can show that the linear functions

\begin{align*}
\gamma_i(x) &= \alpha_i + (\omega_i - \alpha_i)\frac{x}{\beta}, \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x) = \alpha_{I+1} + (\omega_{I+} - \alpha_{I+1})\frac{x}{\beta}
\end{align*}

satisfy the constraints and the conditions of Lemma 21. Properties (a) and (b) are immediate, and property (c) will be established by showing that $-\sum_{i=0}^{I} \dot\psi_i \le 1$. Indeed, $-\beta\sum_{i=0}^{I} \dot\psi_i$ is equal to the left-hand side of (A6), and therefore $-\sum_{i=0}^{I} \dot\psi_i \le 1$ follows from (A5).
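The chain of equalities in (A6) is a purely algebraic rearrangement, and it can be checked in exact arithmetic. The sketch below assumes, for illustration, that $\alpha = (\alpha_0, \ldots, \alpha_{I+1})$ and $(\omega_0, \ldots, \omega_I, \omega_{I+})$ are both probability vectors with $I + 2$ entries; the function name `a6_sides` is hypothetical.

```python
from fractions import Fraction

def a6_sides(alpha, omega):
    """Return the three expressions appearing in (A6).

    alpha: [alpha_0, ..., alpha_I, alpha_{I+1}], summing to 1 (assumed layout).
    omega: [omega_0, ..., omega_I, omega_{I+}], summing to 1 (assumed layout).
    """
    I = len(alpha) - 2
    # Sum over i of the difference of the partial sums of alpha and omega.
    lhs = sum(sum(alpha[:i + 1]) - sum(omega[:i + 1]) for i in range(I + 1))
    # Same quantity after exchanging the order of summation.
    mid = sum((I + 1 - j) * (alpha[j] - omega[j]) for j in range(I + 1))
    # Same quantity after using that alpha and omega each sum to 1.
    rhs = ((I + 1) * (omega[I + 1] - alpha[I + 1])
           + sum(j * (omega[j] - alpha[j]) for j in range(I + 1)))
    return lhs, mid, rhs

if __name__ == "__main__":
    alpha = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
    omega = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
    print(a6_sides(alpha, omega))
```

Using `Fraction` avoids rounding, so the three returned values can be compared with exact equality.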

A3. Euler–Lagrange equations. Given the numerous descriptions of occupancy processes and rates ($\psi$, $\gamma$, $\theta$, $\dot\gamma$, $\dot\psi$, etc.), it is convenient to abuse notation. Thus, for example, we will write both $J(\gamma)$ and $J(\psi)$, with the understanding that the fundamental object of interest is the occupancy function $\gamma$, and that $J(\psi)$ is merely $J(\gamma)$ when $\psi$ is the cumulative occupancy process that corresponds to $\gamma$. Also, since $\dot\psi_i = -\theta_i$ for $i = 0, \ldots, I$, we can define the local rate function (which is usually written as a function of $\psi$ and $\dot\psi$) as a function of $\psi$ and $\theta$, and represent the overall cost of a cumulative occupancy trajectory $\psi$ as an integral of the form

\[ J(\psi) = \int_0^\beta L(\psi(x), \theta(x))\,dx. \]

Because the balls are thrown uniformly and randomly into the urns, the expected rate for balls to enter urns of occupancy $i$ is $\gamma_i = \psi_i - \psi_{i-1}$. As discussed in Section 2, the cost of a deviation of a given path $\psi$ from its expected behavior at a given instant is given by the rate function

\begin{align*}
L(\psi,\theta) &= D(\theta \,\|\, \gamma) \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\gamma_i} + \theta_{I+} \log\frac{\theta_{I+}}{\gamma_{I+}} \\
&= \sum_{i=0}^{I} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \left( 1 - \sum_{i=0}^{I} \theta_i \right) \log\frac{1 - \sum_{i=0}^{I} \theta_i}{1 - \psi_I}.
\end{align*}

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.
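As an illustration of the local rate function, the following Python sketch evaluates the relative entropy $D(\theta \,\|\, \gamma)$ for finite probability vectors that include the catch-all $I+$ entry. The function name `local_rate` and the handling of zero entries (the usual conventions $0 \log 0 = 0$ and $+\infty$ when $\theta_i > 0 = \gamma_i$) are assumptions made for this sketch.

```python
import math

def local_rate(theta, gamma):
    """Relative entropy D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i).

    theta, gamma: sequences of equal length holding the rates and occupancies,
    including the catch-all I+ component as the last entry.
    Conventions (assumed): terms with theta_i == 0 contribute 0, and the
    result is +inf if some theta_i > 0 while gamma_i == 0.
    """
    total = 0.0
    for t, g in zip(theta, gamma):
        if t == 0.0:
            continue
        if g == 0.0:
            return math.inf
        total += t * math.log(t / g)
    return total

if __name__ == "__main__":
    gamma = [0.5, 0.3, 0.2]
    print(local_rate(gamma, gamma))            # law-of-large-numbers rates: zero cost
    print(local_rate([0.2, 0.3, 0.5], gamma))  # deviating rates: positive cost
```

The cost vanishes exactly when the rates $\theta$ equal the expected rates $\gamma$, which is the sense in which $L$ measures a deviation from the expected behavior.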

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[ \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) = -\frac{d}{dx} \frac{\partial L}{\partial \theta_i}(\psi(x), \theta(x)) \]

for all $i \in \{0, \ldots, I\}$, $x \in (0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above, and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx} \left\{ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right\} \tag{A7} \]

for $i = 0, \ldots, I - 1$, and by

\[ -\frac{\theta_I}{\psi_I - \psi_{I-1}} + \frac{\theta_{I+}}{1 - \psi_I} = \frac{d}{dx} \left\{ -\log\frac{\theta_I}{\psi_I - \psi_{I-1}} + \log\frac{\theta_{I+}}{1 - \psi_I} \right\}. \tag{A8} \]

In the case when we have equality in the conservation constraint (24), and by taking $I + 1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I} \omega_i = 1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x) = 1$ and therefore $\theta_I(x) = \theta_{I+}(x) = 0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

\[ L(\psi,\theta) = \sum_{i=0}^{I-1} \theta_i \log\frac{\theta_i}{\psi_i - \psi_{i-1}}. \tag{A9} \]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[ -\frac{\theta_i}{\psi_i - \psi_{i-1}} + \frac{\theta_{i+1}}{\psi_{i+1} - \psi_i} = \frac{d}{dx} \left\{ -\log\frac{\theta_i}{\psi_i - \psi_{i-1}} \right\} \tag{A10} \]

for $i = 0, \ldots, I - 1$.

A4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty: $\alpha_0 = 1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\begin{align*}
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) &= 1, \\
\sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C P_i(\rho\beta) &= \beta.
\end{align*}
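For the exponential case, the two equations above can be solved numerically. The Python sketch below is one possible approach and is not taken from the paper: dividing the second equation by the first shows that $\lambda = \rho\beta$ must satisfy $E[N \mid N > I] = (\beta - \sum_{i \le I} i\omega_i)/(1 - \sum_{i \le I} \omega_i)$ for $N$ Poisson with mean $\lambda$, and since this conditional mean is increasing in $\lambda$, bisection applies. The function names, the series truncation `terms`, and the bracketing constants are all assumptions of this sketch.

```python
import math

def tail_stats(lam, I, terms=400):
    """For N ~ Poisson(lam), return (P(N > I), E[N | N > I]).

    The tail series is truncated after `terms` terms (an assumption that is
    adequate for moderate lam). Weights are kept relative to P_{I+1}(lam)
    to avoid underflow for small lam.
    """
    w, total, weighted = 1.0, 0.0, 0.0       # w = P_i(lam) / P_{I+1}(lam)
    for i in range(I + 1, I + 1 + terms):
        total += w
        weighted += i * w
        w *= lam / (i + 1)
    log_first = -lam + (I + 1) * math.log(lam) - math.lgamma(I + 2)
    return math.exp(log_first) * total, weighted / total

def twist_parameters(omega, beta):
    """Solve the two twist equations for (C, rho) in the exponential case."""
    I = len(omega) - 1
    a = 1.0 - sum(omega)                                  # mass left for levels > I
    b = beta - sum(i * w for i, w in enumerate(omega))    # balls left for levels > I
    if a <= 0.0:
        raise ValueError("polynomial case: C = 0, rho is not determined here")
    target = b / a
    if target <= I + 1:
        raise ValueError("no solution: tail mean must exceed I + 1")
    lo, hi = 1e-9, 1.0
    while tail_stats(hi, I)[1] < target:                  # grow the bracket
        hi *= 2.0
    for _ in range(200):                                  # bisection on lam = rho*beta
        mid = 0.5 * (lo + hi)
        if tail_stats(mid, I)[1] < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    C = a / tail_stats(lam, I)[0]
    return C, lam / beta
```

As a consistency check, when $\omega_i = P_i(\beta)$ for $i \le I$ (the law-of-large-numbers endpoint), the solver should return values close to $C = 1$ and $\rho = 1$.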

Theorem A1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\begin{align*}
\psi_0(x) &= C e^{-\rho x} + \sum_{k=0}^{I} (\omega_k - C P_k(\rho\beta)) \left( 1 - \frac{x}{\beta} \right)^{k}, \tag{A8} \\
\gamma_i(x) &= \frac{x^i}{i!} (-1)^i \psi_0^{(i)}(x), \qquad 0 \le i \le I, \tag{A9} \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals on $[0,\beta]$ which satisfy the terminal constraints along with the initial constraint $\psi_0(0) = 1$.
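To make the formulas of Theorem A1 concrete, the sketch below evaluates $\gamma_i(x)$ using the closed form of $(-1)^i \psi_0^{(i)}(x)$, namely $\rho^i C e^{-\rho x} + \sum_{k=i}^{I} (\omega_k - C P_k(\rho\beta))\, \beta^{-i}\, \frac{k!}{(k-i)!}\, (1 - x/\beta)^{k-i}$. The function names and the sample constraint values are illustrative assumptions; the twist parameters $(C, \rho)$ must be supplied by the caller.

```python
import math

def poisson(k, lam):
    """Poisson probability P_k(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def gamma_extremal(i, x, omega, beta, C, rho):
    """Evaluate gamma_i(x) = x**i / i! * (-1)**i * psi_0^{(i)}(x).

    omega: [omega_0, ..., omega_I]; C, rho: twist parameters (C = 0 in the
    polynomial case). For i > I only the exponential part remains.
    """
    I = len(omega) - 1
    deriv = rho**i * C * math.exp(-rho * x)           # exponential part
    for k in range(i, I + 1):                          # polynomial part
        coeff = omega[k] - C * poisson(k, rho * beta)
        deriv += (coeff * beta**(-i) * math.factorial(k) / math.factorial(k - i)
                  * (1.0 - x / beta)**(k - i))
    return x**i / math.factorial(i) * deriv

if __name__ == "__main__":
    # Hypothetical polynomial example: omega sums to 1 and has mean beta = 1.3.
    omega, beta = [0.2, 0.3, 0.5], 1.3
    for x in (0.0, 0.7, beta):
        print(x, [gamma_extremal(i, x, omega, beta, 0.0, 1.0) for i in range(3)])
```

In the polynomial example the printed occupancies start at $(1, 0, 0)$, end at $\omega$, and sum to one at every $x$, in line with the initial and terminal constraints of the theorem.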

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[ \psi_0(x) = \sum_{k=0}^{I} \omega_k \left( 1 - \frac{x}{\beta} \right)^{k}. \tag{A10} \]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[ (-1)^i \varphi^{(i)}(x) \ge 0 \qquad \text{for all } x \in [a,b] \text{ and } i > 0. \tag{A11} \]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A2. The function $\psi_0(x)$ given in (A8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A11) is strict for $x \in [0,\beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[ (-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} (\omega_k - C P_k(\rho\beta))\, \beta^{-i} \frac{k!}{(k-i)!} \left( 1 - \frac{x}{\beta} \right)^{k-i} \]

for $i = 0, \ldots, I$, and

\[ (-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \tag{A12} \]

for $i > I$. It is clear that $a(x) = (-1)^{I+1} \psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\begin{align*}
\varphi(\beta) &= (-1)^I \psi_0^{(I)}(\beta) \\
&= \rho^I C e^{-\rho\beta} + \left( \omega_I - C e^{-\rho\beta} \frac{(\rho\beta)^I}{I!} \right) \beta^{-I} I! \\
&= \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I \ge 0,
\end{align*}

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$, and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly, on noting that

\[ (-1)^I \psi_0^{(I)}(x) = \left( \frac{\beta^I}{I!} \right)^{-1} \omega_I > 0 \qquad \text{for all } x \in [0,\beta]. \]

Proof of Theorem A1 The terminal constraints can be verified im-mediately by inspection The initial constraints follow from the constructionof C in (26) since

ψ0(0) =C

(

1minusIsum

i=0

Pi(ρβ)

)

+Isum

i=0

ωi = 1

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 33

A similar computation also using (26) shows that ψ(1)0 (0) =minus1 a fact that

we will need shortlyTo establish that the given functions are valid occupancy curves it is

useful to introduce the infinite sequence of functions γi and corresponding ψi

obtained by extending (A9) to all i

γi(x) =xi

i(minus1)iψ

(i)0 (x) i= 01 I I +1 (A13)

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0)-\psi_0(x)=\sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\}\in S_\infty$ for all $x$ in that interval. It follows from (A13) that for all $i\ge 0$ the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) &= -\dot\psi_i(x) = -\sum_{k=0}^{i}\dot\gamma_k(x) \\
&= \sum_{k=1}^{i}\frac{-x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x) + \sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A14}
\]

for $x\in[0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A14) that

\[
\sum_{i=0}^{\infty}\theta_i(x) = \sum_{i=0}^{\infty}-\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 21 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I}\theta_i(x)\le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.
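The two normalizations just derived, that the $\gamma_i(x)$ and the $\theta_i(x)$ each sum to 1, can be checked numerically in the simplest exponential case. The sketch below is only an illustration under stated assumptions: it takes $I=0$, fixes $C=1/\rho$ so that $\psi_0^{(1)}(0)=-1$, and chooses $\omega_0$ so that $\psi_0(0)=1$, rather than evaluating the construction in (26) directly. The function names are hypothetical.

```python
import math

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam), computed in log space for stability."""
    if lam == 0.0:
        return 1.0 if i == 0 else 0.0
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def occupancy_curves(x, rho, beta, n_terms=200):
    """Return (gamma, theta) at time x for the I = 0 exponential extremal.

    Assumed setup: psi_0(x) = C e^{-rho x} + (omega_0 - C e^{-rho beta}),
    with C = 1/rho (so psi_0'(0) = -1) and omega_0 fixed by psi_0(0) = 1.
    """
    C = 1.0 / rho
    omega0 = 1.0 - C * (1.0 - math.exp(-rho * beta))
    psi0 = C * math.exp(-rho * x) + omega0 - C * math.exp(-rho * beta)
    # gamma_i(x) = x^i/i! (-1)^i psi_0^{(i)}(x); for i >= 1 this equals C P_i(rho x).
    gamma = [psi0] + [C * poisson_pmf(i, rho * x) for i in range(1, n_terms)]
    # theta_i(x) = x^i/i! (-1)^{i+1} psi_0^{(i+1)}(x) = rho C P_i(rho x).
    theta = [rho * C * poisson_pmf(i, rho * x) for i in range(n_terms)]
    return gamma, theta
```

Summing either list at any $x\in[0,\beta]$ should return 1 up to truncation and rounding error, and `gamma[0]` at $x=\beta$ should equal the assumed $\omega_0$.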


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C>0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1+\sum_{i=0}^{I}\dot\psi_i(x) = \sum_{i=I+1}^{\infty}-\dot\psi_i(x) = \rho C\sum_{i=I+1}^{\infty}P_i(\rho x),
\]

\[
1-\psi_I(x) = \sum_{i=I+1}^{\infty}\gamma_i(x) = C\sum_{i=I+1}^{\infty}P_i(\rho x),
\]

where the first display uses $\theta=-\dot\psi$ and equations (A12) and (A14), and the second uses (A12) and (A13). Then the ratio $\theta_{I+}(x)/(1-\psi_I(x))=\rho$ is constant, and the corresponding terms drop out of the right-hand side of (A7) and (A8).

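From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A13) and (A14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x\in(0,\beta),\ i\ge 0. \tag{A15}
\]

Then

\[
\frac{d}{dx}\left(-\log\frac{\theta_i}{\gamma_i}\right)
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x\in(0,\beta), \tag{A16}
\]

verifying (A7). Likewise, to verify (A8) we apply (A16) with $i=I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)}=\rho$ obtained from (A12).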

In the polynomial case $C=0$, note that (A15) holds for $i=0,\dots,I$, so that (A16) implies (A10).

A5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1-\psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1-\psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho>1$, balls unusually pile into high-occupancy urns, while if $\rho<1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A1, as can be readily verified.

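Lemma A3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[
(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad\text{for } i>I,
\]

for some $I\ge 0$, $C>0$ and $\rho>0$. Further suppose that $\gamma_0,\gamma_1,\dots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty}\gamma_i(x)=1, \qquad \sum_{i=0}^{\infty}i\gamma_i(x)\le B_0,
\]

\[
\sum_{i=0}^{\infty}-\dot\psi_i(x)=1, \qquad -\dot\psi_i(x)=\gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]

for all $x\in[0,\beta]$ and some constant $B_0<\infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0,\dots,\gamma_I,\gamma_{I+}\}$, where $\gamma_{I+}=1-\sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A17}
\]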

Proof. For all $i>I$ and $x\in[0,\beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)}=-\rho$. Therefore, using $\theta=-\dot\psi$ with (A12) and (A13), it follows that

\[
\theta_{I+} = 1-\sum_{i=0}^{I}-\dot\psi_i = \sum_{i=I+1}^{\infty}-\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1-\psi_I).
\]

Then for each $x\in[0,\beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i=-\psi_0^{(i+1)}/\psi_0^{(i)}=\rho$ for $i>I$ and $(1+\sum_{i=0}^{I}\dot\psi_i)/(1-\psi_I)=\rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i-\psi_{i-1}}
+ (1+\dot\psi_0+\cdots+\dot\psi_I)\log\frac{(1+\dot\psi_0+\cdots+\dot\psi_I)}{1-\psi_I} \\
&= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i-\psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]


Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi)\ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)|=1$, $x\in[0,\beta]$, and that, given $\varepsilon>0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x\in[\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

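Using our convention that $\psi_{-1}(x)=0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
+ \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x)-\dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A18}
\]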

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i>I$

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)\,[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A15) for each term of (A18), the integral is

\[
\sum_{i=0}^{\infty}\Bigl[\gamma_i\log|\psi_0^{(i)}|\Bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon\downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\dots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I}\gamma_i(x)=1, \qquad \sum_{i=0}^{I-1}-\dot\psi_i(x)=1,
\]

\[
-\dot\psi_i(x)=\gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i=0,\dots,I-1,
\]

for all $x\in[0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A19}
\]


Proof. The cost is obtained by integrating the reduced cost function (A9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A3.

A6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A1 satisfy $\gamma_i(x)=x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A17) and (A19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\|P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero-cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 25, which we restate here.

Theorem A5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi\in F(1,\omega,\beta)} D(\pi\|P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C>0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty}\in\mathbb{R}_+^{\infty}$ satisfying $\pi_i=\omega_i$ for $i=0,\dots,I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi,y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\left(1-\sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta-\sum_{i=0}^{\infty}i\pi_i\right),
\]

where $z=\log\rho$ and $y\doteq\log C+(1-\rho)\beta+1$.

Define $\pi_i^*\doteq\gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A1 with the given $C$, $\rho$. For any $i>I$, the definitions of $\pi_i^*$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi_i^*$ is the unique global minimizer of $x\mapsto x\log x - x[\log P_i(\beta)+y+iz]$. Therefore, for any $i>I$ and $\pi_i\in[0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \;\ge\; \pi_i^*\log\frac{\pi_i^*}{P_i(\beta)} - y\pi_i^* - zi\pi_i^*.
\]


Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi\in F(1,\omega,\beta)} D(\pi\|P(\beta)) &= \inf_{\pi\in F(1,\omega,\beta)} L(\pi,y,z) \\
&\ge \inf_{\pi\in\mathbb{R}_+^{\infty}} L(\pi,y,z) \\
&= D(\pi^*\|P(\beta)).
\end{aligned}
\]

Since $\pi^*\in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

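Substituting the particular form of $\pi_i^*$ into (A17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\|P(\beta)),
\]

where

\[
\gamma_i(\beta) = \begin{cases}\omega_i, & 0\le i\le I,\\ C P_i(\rho\beta), & i>I.\end{cases}
\]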

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)} + \left(1-\sum_{i=0}^{I}\omega_i\right)\bigl(\log C+(1-\rho)\beta\bigr) + \left(\beta-\sum_{i=0}^{I}i\omega_i\right)\log\rho,
\]

where $C$ is defined by (26). In the polynomial case, the cost is simply given by the first term in the above expression.
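As a sanity check on the explicit cost formula, the following sketch compares it with a direct (truncated) evaluation of the relative entropy $D(\pi^*\|P(\beta))$ for $\pi_i^*=\omega_i$ ($i\le I$), $\pi_i^*=CP_i(\rho\beta)$ ($i>I$). It is an illustration only: the helper `example_parameters` is hypothetical and handles just the case $I=0$, choosing $C=1/\rho$ and $\omega_0$ so that $\pi^*$ has total mass 1 and mean $\beta$, instead of evaluating (26).

```python
import math

def log_poisson(i, lam):
    """log P_i(lam) for the Poisson distribution with mean lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def explicit_cost(omega, C, rho, beta):
    """Closed-form cost of an exponential extremal, term by term as in the text."""
    head = sum(w * (math.log(w) - log_poisson(i, beta))
               for i, w in enumerate(omega) if w > 0)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return head + tail_mass * (math.log(C) + (1.0 - rho) * beta) + tail_mean * math.log(rho)

def relative_entropy_cost(omega, C, rho, beta, n_terms=300):
    """Truncated D(pi* || P(beta)), accumulated in log space to avoid underflow."""
    I = len(omega) - 1
    total = 0.0
    for i in range(n_terms):
        if i <= I:
            if omega[i] <= 0:
                continue
            log_p = math.log(omega[i])
        else:
            log_p = math.log(C) + log_poisson(i, rho * beta)
        total += math.exp(log_p) * (log_p - log_poisson(i, beta))
    return total

def example_parameters(rho, beta):
    """Hypothetical I = 0 example: mass and mean constraints fix C and omega_0."""
    C = 1.0 / rho
    omega0 = 1.0 - C * (1.0 - math.exp(-rho * beta))
    return [omega0], C
```

For instance, `omega, C = example_parameters(2.0, 1.5)` followed by both cost functions should give the same nonnegative number up to truncation error.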

A7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k>0$ is denoted $\mathcal{K}$. Without loss of generality we may assume that some urns are initially empty (hence $0\in\mathcal{K}$), and we may take $K=\max\mathcal{K}$.

In the case $K\ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta=\sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)}=\{\gamma_{ki}\}_i$, where the function $\gamma_{ki}:[0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

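The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi\in S_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|P(\beta)), \tag{A20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}j\pi_{kj} = \beta \tag{A21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} \quad\text{for all } 0\le i\le I. \tag{A22}
\]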

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+}=0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j=\sum_{j=0}^{I}\omega_j=1$, as well as in the conservation condition (24). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case the constraints imply that

\[
\pi_{kj}=0, \qquad k+j>I, \tag{A23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A20) subject to (A23) and the endpoint constraints (A22), in which case the conservation constraint (A21) need not be included explicitly.

As written, the vector of distributions $\pi=(\pi^{(0)},\dots,\pi^{(k)},\dots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k>0$. Including all indices $0\le k\le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k=0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^*\in S_\infty^{|\mathcal{K}|}$ which minimizes (A20) subject to constraints (A21) and (A22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi\in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A21) and (A22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega\in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega\in S_{\tilde I}$, where $\tilde I>I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00}=\omega_0/\alpha_0$. The first monotonicity condition ($i=0$) in (23) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01}+\alpha_1\pi_{10}=\omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i=1$) condition in (23). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (24) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
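The ordered filling construction can be sketched in a few lines. This is a minimal illustration, not the authors' procedure in detail: the text leaves open how mass is split among the eligible classes at each level, and the sketch assumes the lowest class is drained first. The function name and the dictionary representation of $\pi$ are hypothetical.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints.

    alpha[k]: fraction of urns initially holding k balls (k = 0..K).
    omega[i]: required final fraction of urns holding i balls (i = 0..I).
    Returns {(k, j): mass}, the probability that a class-k urn gets j more balls.
    Assumption: at each level i, classes are used in the order k = 0, 1, ..., i.
    """
    remaining = [1.0 if a > 0 else 0.0 for a in alpha]  # unused mass of each pi^(k)
    pi = {}
    for i, target in enumerate(omega):
        need = target  # mass still required at occupancy level i
        for k in range(min(i, len(alpha) - 1) + 1):
            if alpha[k] <= 0 or need <= tol:
                continue
            take = min(remaining[k], need / alpha[k])
            if take > 0:
                pi[(k, i - k)] = take
                remaining[k] -= take
                need -= alpha[k] * take
        if need > tol:
            # Not enough mass in classes k <= i: the monotonicity condition fails.
            raise ValueError(f"constraints infeasible at level {i}")
    return pi
```

With `alpha = [0.5, 0.5]` and `omega = [0.2, 0.3, 0.5]`, the loop first sets $\pi_{00}=0.4$, then spends the rest of $\pi^{(0)}$ on $\pi_{01}$, and finally places all of $\pi^{(1)}$ on $\pi_{11}$, so each class distribution ends with total mass 1.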

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$ this distance is simply

\[
\sum_{i:\,P_i>Q_i}(P_i-Q_i) = \sum_{i:\,Q_i>P_i}(Q_i-P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A<\infty$ define the set

\[
H_A = \left\{\pi\in S_\infty^{|\mathcal{K}|} : \sum_{k}\alpha_k D(\pi^{(k)}\|P(\beta))\le A\right\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 143), from which it follows that $H_A$ is also compact. For later reference we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha,\omega,\beta) \doteq F(\alpha,\omega,\beta)\cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)}\in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha,\beta,\omega)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon>0$ there is $m<\infty$ such that

\[
\sum_{j\ge m}j\pi^{(n)}_{kj}\le\varepsilon \tag{A24}
\]

for all $n$ and $k$.

Since $e^y-1=\sup_{x>0}[xy-x\log x+x-1]$, we have the inequality

\[
xy\le(x\log x-x+1)+(e^y-1),
\]

where by convention $0\log 0=0$. Observe that this inequality is valid for all $y$ and all $x\ge 0$, and that $x\log x-x+1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta>0$,

\[
\sum_{j=0}^{\infty}e^{(1/\delta)j}P_j(\beta) = \sum_{j=0}^{\infty}e^{(1/\delta)j}e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta}-\beta}<\infty.
\]


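Taking $y=j/\delta$ and $x\doteq\pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j\ge m}j\pi^{(n)}_{kj} &= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j\ge 0}\delta\left[\pi^{(n)}_{kj}\log\left(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\right)-\pi^{(n)}_{kj}+P_j(\beta)\right] + \delta\sum_{j\ge m}\bigl(e^{(1/\delta)j}-1\bigr)P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)}\|P(\beta)) + \delta\sum_{j\ge m}\bigl(e^{(1/\delta)j}-1\bigr)P_j(\beta) \\
&\le \delta A + \delta\sum_{j\ge m}\bigl(e^{(1/\delta)j}-1\bigr)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A24) follows by first picking $\delta>0$ sufficiently small and then $m<\infty$ sufficiently large. [Note also, for use in Corollary A7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)}\to\beta\in(0,\infty)$.]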
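The two analytic facts used in this estimate, the inequality $xy\le(x\log x-x+1)+(e^y-1)$ and the closed form of the Poisson exponential moment, are easy to probe numerically. The sketch below is illustrative only; the function names and the truncation level are assumptions, and a finite grid of test points is of course not a proof.

```python
import math

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, expected to be >= 0 for x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0  # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, n_terms=400):
    """Truncated sum of e^{j/delta} P_j(beta), with each term built in log space."""
    return sum(
        math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
        for j in range(n_terms)
    )

def poisson_exp_moment_closed_form(beta, delta):
    """The closed form exp(beta * e^{1/delta} - beta) from the display above."""
    return math.exp(beta * math.exp(1.0 / delta) - beta)
```

The gap vanishes exactly when $x=e^y$, which is where the supremum defining $e^y-1$ is attained.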

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 54, that

\[
\beta^{(n)}_k\to\beta_k.
\]

Finally,

\[
\sum_{k}\alpha_k\beta_k = \lim_{n}\sum_{k}\alpha_k\beta^{(n)}_k = \lim_{n}\beta = \beta,
\]

so that $\pi\in F(\beta,\alpha,\omega)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|P(\beta))\ge 0.
\]

Choose a sequence $\pi^{(n)}\in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)}\|P(\beta))\le G+1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^*\in Q_A$ be the limit point. Since relative entropy $D(\pi\|\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 143(b)), it follows that

\[
\sum_{k\in\mathcal{K}}\alpha_k D(\pi^*_{(k)}\|P(\beta)) \le \liminf_{n}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\|P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)}=\mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)})\to(\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0<\beta<\infty$. Then

\[
\liminf_{n}\mathcal{J}^{(n)}\ge\mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A>\liminf_n\mathcal{J}^{(n)}\doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)}\le G+1/n<A,
\]

and for each problem in this subsequence we define $\pi^{(n)}\in Q_A(\alpha^{(n)},\beta^{(n)},\omega^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)}\to\pi\in H_A$. As in the proof of Lemma A6, if $\beta<\infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty}j\pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty}j\pi_{kj}, \qquad k=0,1,\dots,K.
\]

If $\alpha^{(n)}_k\to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k\not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\|P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\|P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta-\lambda+\lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k\to\alpha_k\beta_k$ even if $\alpha_k=0$.

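Thus

\[
\sum_{k\in\mathcal{K}}\beta_k\alpha_k = \lim_{n}\sum_{k}\beta^{(n)}_k\alpha^{(n)}_k = \lim_{n}\beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 143(b)), we have

\[
\begin{aligned}
\liminf_{n}\sum_{k=0}^{K}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\|P(\beta^{(n)})) &\ge \liminf_{n}\sum_{k:\,\alpha_k>0}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\|P(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k>0}\alpha_k D(\pi^{(k)}\|P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.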

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A20). For a given vector $\omega$ it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i\in\mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i>0$, or if $i>I$ and $\omega_{I+}>0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k+j\in\mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^*\in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A20) subject to constraints (A21) and (A22). Then $\pi^*_{kj}>0$ for all $k+j\in\mathcal{I}(\omega)$, and $\pi^*_{kj}=0$ if $k+j\notin\mathcal{I}(\omega)$.

Proof. For $k+j\notin\mathcal{I}(\omega)$ the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi\in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn}=0$ for some particular $(m,n)$ with $m+n\in\mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\hat\pi_{mn}>0$, that $\hat\pi_{kj}=0$ for all $k+j\notin\mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

\[
\hat\omega_i \doteq \sum_{k=0}^{i}\alpha_k\hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}j\hat\pi_{kj},
\]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha,\hat\omega,\hat\beta)$. Next, for any $\eta>0$ we may define

\[
\tilde\beta \doteq (\beta-\eta\hat\beta)/(1-\eta), \qquad \tilde\omega \doteq (\omega-\eta\hat\omega)/(1-\eta).
\]

By construction, the constraints $(\alpha,\tilde\omega,\tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (23) and (24) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (23) and the conservation constraint (24) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\hat\omega,\hat\beta)$, and hence for $(\alpha,\tilde\omega,\tilde\beta)$.


We may now let $\tilde\pi$ be a feasible point for $(\alpha,\tilde\omega,\tilde\beta)$, and finally form $\bar\pi=\eta\hat\pi+(1-\eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn}>0$. As discussed in the proof of Lemma A6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon}=\varepsilon\bar\pi+(1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\left(\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}+1\right)(\bar\pi_{kj}-\pi_{kj}) \\
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}\,(\bar\pi_{kj}-\pi_{kj}) \\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^{\varepsilon}_{mn}}{P_n(\beta)} + \sum_{k\in\mathcal{K}}\alpha_k\sum_{(k,j)\ne(m,n)}\bar\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} - \sum_{k\in\mathcal{K}}\alpha_k\sum_{j}\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon\to 0$, the third term in the last display tends to

\[
-\sum_{k}\alpha_k D(\pi^{(k)}\|P(\beta))\le 0.
\]

The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\bar\pi_{kj}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta)<f(0)$ for some sufficiently small $\delta>0$. We have shown that for any $m+n\in\mathcal{I}(\omega)$, $\pi$ cannot minimize (A20) if $\pi_{mn}=0$. As $\pi_{mn}$ is arbitrary within $m+n\in\mathcal{I}(\omega)$, it follows that $\pi^*_{mn}>0$ whenever $m+n\in\mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega),\\ 0, & \text{otherwise.}\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A20) may be considered to be a finite-dimensional problem over this set. By Lemma A8 the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k\in\mathcal{K}}\alpha_k\sum_{k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} + \sum_{k\in\mathcal{K}}z_k\alpha_k\left(1-\sum_{k+l\in\mathcal{I}}\pi_{kl}\right) + \sum_{i\in\mathcal{I}}w_i\left(\omega_i-\sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\right)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)\,e^{z_k-1+w_{k+j}}.
\]

The result follows on defining $D_k=e^{z_k-1}$ and $W_i=e^{w_i}$.
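The product form $\pi^*_{kj}=D_kP_j(\beta)W_{k+j}$ suggests a simple numerical scheme: alternately rescale the class factors $D_k$ so that each $\pi^{(k)}$ sums to 1, and the level factors $W_i$ so that the terminal conditions (A22) hold. This is iterative proportional fitting, a standard technique that is swapped in here for illustration and is not the method of the paper. The sketch assumes every $\alpha_k>0$ and every $\omega_i>0$, and that $\sum_k\alpha_k=\sum_i\omega_i=1$ with feasible polynomial constraints; all names are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability P_j(lam) for lam > 0."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def fit_polynomial_minimizer(alpha, omega, beta, sweeps=2000):
    """Alternately rescale rows (D_k) and anti-diagonals (W_i) of P_j(beta).

    Returns (pi, D, W) with pi[k][j] = D[k] * P_j(beta) * W[k + j], k + j <= I.
    Assumes all alpha[k] > 0, all omega[i] > 0, and feasible constraints.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    P = [poisson_pmf(j, beta) for j in range(I + 1)]
    D = [1.0] * (K + 1)
    W = [1.0] * (I + 1)
    for _ in range(sweeps):
        # Row step: make each pi^(k) a probability distribution.
        for k in range(K + 1):
            D[k] = 1.0 / sum(P[j] * W[k + j] for j in range(I - k + 1))
        # Level step: enforce omega_i = sum_k alpha_k pi_{k, i-k}.
        for i in range(I + 1):
            W[i] = omega[i] / sum(alpha[k] * D[k] * P[i - k]
                                  for k in range(min(i, K) + 1))
    pi = [[D[k] * P[j] * W[k + j] for j in range(I - k + 1)] for k in range(K + 1)]
    return pi, D, W
```

For `alpha = [0.5, 0.5]`, `omega = [0.2, 0.3, 0.5]` and `beta = 0.8`, the feasible set has one degree of freedom, and the product form forces $\pi_{01}\pi_{11}=2\,\pi_{02}\pi_{10}$, which pins down $\pi_{01}=(2.8-\sqrt{4.96})/2$.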

We now give the corresponding theorem for the exponential case.

Theorem A10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (24), and let $\pi^*$ be the corresponding unique minimizer from Lemma A6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I},\\ C_k P_j(\rho\beta), & k\in\mathcal{K},\ k+j>I,\\ 0, & \text{otherwise.}\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M>I$. For each such $M$, define

\[
\beta^{(M)} \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M}l\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l\le M}\pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}}l\pi_{kl} = \beta^{(M)},
\]

\[
\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl} = \eta^{(M)}_k, \qquad k\in\mathcal{K},
\]

\[
\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} = \omega_i, \qquad i\in\mathcal{I}.
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers, via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} + y^{(M)}\left(\beta^{(M)}-\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}}l\pi_{kl}\right) \\
&\quad + \sum_{k\in\mathcal{K}}z^{(M)}_k\alpha_k\left(\eta^{(M)}_k-\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl}\right) + \sum_{i\in\mathcal{I}}w^{(M)}_i\left(\omega_i-\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\right).
\end{aligned}
\]

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\,e^{jy^{(M)}-1+z^{(M)}_k+w^{(M)}_{k+j}}
\]

for all $j<M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i=0$ for $i>I$. For $k+j>I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j}=0$ if $k+j>I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j\in\mathcal{I}$ with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho=e^y$, $C_k=e^{(\rho-1)\beta-1+z_k}$ and $W_i=e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*_{(k)}$ by $\beta_k\doteq\sum_j j\pi^*_{kj}$. In the exponential case, we define $\rho_k=\rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj}\doteq\pi^*_{kj}$ and $\omega^{(k)}=(\omega_{k0},\dots,\omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing distribution arising in Theorem A1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial with $C_k=0$. In either case, Theorem A1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k=\sum_j j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i=0,\dots,I,
\]

\[
\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i>I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A1 [see (A13)]. Also, for convenience define $\tilde\gamma_{kj}(x)\doteq\gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i}\alpha_k\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty}-\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}-\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 21.

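It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0)=1$ and $\gamma_{kj}(0)=0$ for $j>0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k)=\pi^*_{kj}$ satisfy the constraints (A22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k=\rho\beta$. The $k$th zero-occupancy curve after rescaling is

\[
\begin{aligned}
\tilde\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho_k\beta_k)\bigr)\left(1-\frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho\beta)\bigr)\left(1-\frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k}(W_{k+i}-1)P_i(\rho\beta)\left(1-\frac{x}{\beta}\right)^{i}\right].
\end{aligned}
\]

Here we have used that $\omega_{ki}=\pi^*_{ki}=C_kP_i(\rho\beta)W_{k+i}$ when $i\le I-k$. The $k$th derivative of $\psi_0=\alpha_0\tilde\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I}(W_i-1)P_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\left(1-\frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0C_0\rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k}(W_{k+j}-1)P_j(\rho\beta)\left(1-\frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\tilde\psi_{k0}(x).
\end{aligned}
\tag{A25}
\]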

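Using Theorem A1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\psi_{k0}^{(i-k)}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i}\alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\psi_0^{(i)}(x).
\end{aligned}
\]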

Similarly we obtain

minusψi(x) =minusisum

k=0

αkβkβψkiminusk

(

xβkβ

)

=isum

k=0

αkximinusk

(iminus k)(minus1)iminusk d

iminusk+1

dximinusk+1ψk0(x)

=isum

k=0

αkCk

α0C0ρkximinusk

(iminus k)(minus1)i+1ψ

(i+1)0 (x)

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)}=-\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i>I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1-\psi_I(x)}=\rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x)=\frac{\alpha_0D_0}{D_k}\,\psi_{k0}(x\beta_k/\beta)
\tag{A.28}
\]

and that

\[
\theta_i(x)=\sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x)
=\gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\tag{A.29}
\]

for $i=0,\dots,I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum we will need the following

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i=0,\dots,I-1$ (and also $i=I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx+\frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x'))=C_i
\]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0>0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x)>0$ for $x\in[0,\beta)$, $i\le I$ (and $i=I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma)
&=\beta+\sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(\beta_k)\log|\psi_0^{(i)}(\beta)|\Bigr]-\sum_{k=0}^{K}\alpha_k\log|\psi_0^{(k)}(0)|\\
&=\sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\gamma_{kj}(\beta_k)\log\left|\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}\right|.
\end{aligned}
\]

The fact that $\psi_{k0}(0)=1$, together with (A.25), implies that $\psi_0^{(k)}(x)=\psi_0^{(k)}(0)\,\psi_{k0}(x\beta_k/\beta)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}}
=\Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta}
=\frac{\gamma_{kj}(x\beta_k/\beta)}{(-x)^{j}e^{-\beta}/j!}.
\]

Then

\[
J(\gamma)=\sum_{k=0}^{K}\alpha_kD\bigl(\gamma_{k\cdot}(\beta_k)\,\|\,P(\beta)\bigr).
\]

By construction, the endpoints $\gamma_{kj}(\beta_k)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta)\doteq\{\gamma\in\mathcal{O}:\gamma(0)=\alpha,\ \gamma(\beta)=\omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\,\|\,\gamma)$, and is hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma_\varepsilon\doteq(1-\varepsilon)\gamma+\varepsilon\tilde\gamma$ and define the function $G:[0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[
G[\varepsilon]\doteq J(\gamma_\varepsilon)=\int_0^\beta D(\theta_\varepsilon(x)\,\|\,\gamma_\varepsilon(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.
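The convexity that Lemma A.15 relies on can be checked numerically at a single time point. The following is a minimal sketch, not taken from the paper: the vectors `theta`, `gamma`, `theta_alt`, `gamma_alt` are made-up illustrative values, and `G_pointwise` is a hypothetical name for the integrand of $G$ at one fixed $x$.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i), with 0 log 0 = 0."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t > 0.0:
            if g <= 0.0:
                return math.inf  # mass where the reference has none
            total += t * math.log(t / g)
    return total

def interpolate(u, v, eps):
    """Convex combination (1 - eps) * u + eps * v, componentwise."""
    return [(1.0 - eps) * a + eps * b for a, b in zip(u, v)]

def G_pointwise(eps, theta, gamma, theta_alt, gamma_alt):
    """Integrand of G at one fixed x (hypothetical helper, assumed values)."""
    return relative_entropy(interpolate(theta, theta_alt, eps),
                            interpolate(gamma, gamma_alt, eps))

# Illustrative (assumed) rate and occupancy vectors for two competing paths.
theta, gamma = [0.5, 0.3, 0.2], [0.6, 0.3, 0.1]
theta_alt, gamma_alt = [0.2, 0.2, 0.6], [0.3, 0.4, 0.3]

# Sample the integrand on an evenly spaced grid of eps in [0, 1].
values = [G_pointwise(e / 10.0, theta, gamma, theta_alt, gamma_alt)
          for e in range(11)]
```

Because relative entropy is jointly convex in its two arguments, each sampled value should lie at or below the average of its two neighbours; integrating over $x$ preserves this property.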

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K=I+1$ in the exponential case or $K=I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma)\le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma_\varepsilon=(1-\varepsilon)\gamma+\varepsilon\tilde\gamma$, with $G[\varepsilon]=J(\gamma_\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0]=0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi_\varepsilon$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma)=\int_0^\beta L(\psi,\theta)\,dx$. After defining

\[
\eta(x)\doteq\tilde\psi(x)-\psi(x),
\]

we may write

\[
G[\varepsilon]=\int_0^\beta L(\psi+\varepsilon\eta,\ \theta-\varepsilon\dot\eta)\,dx=\int_0^\beta g(x,\varepsilon)\,dx.
\tag{A.30}
\]

We can assume without loss of generality that $G(1)=J(\tilde\gamma)<\infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x)=-\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta)=\omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta)=\omega_1/\beta$ if $I>0$ and $\theta_0(\beta)=CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}$ and $\theta_{k0}$ are uniformly bounded away from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[
\gamma_i(x)=\sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta)\ge\alpha_i\gamma_{i0}(x\beta_i/\beta).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1}>0$. The identity $\theta_{I+}(x)=\rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma_\varepsilon\ge(1-\varepsilon)\gamma$ and $\theta_\varepsilon\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded away from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0<1$, there are positive lower bounds on $\gamma_\varepsilon$ and $\theta_\varepsilon$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

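The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)
=\sum_{i=0}^{I}\frac{\partial L}{\partial\psi_i}\Big|_{\gamma_\varepsilon}(x)\,\eta_i(x)
-\sum_{i=0}^{I}\frac{\partial L}{\partial\theta_i}\Big|_{\gamma_\varepsilon}(x)\,\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x)=-\frac{\theta_i(x)}{\gamma_i(x)}+\frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\qquad
\frac{\partial L}{\partial\theta_i}(x)=\log\frac{\theta_i(x)}{\gamma_i(x)}-\log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\]

The uniform lower bounds on $\gamma_\varepsilon$ and $\theta_\varepsilon$, together with the upper bounds $\gamma_\varepsilon\le1$, $\theta_\varepsilon\le1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right|<B\qquad\text{for }\varepsilon\in[0,\varepsilon_0],\ \text{a.e. }x\in[0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I=1$ and $\theta_I=0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.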

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon]=\int_0^\beta\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx
\]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[
\int_0^x\frac{\partial L}{\partial\psi_i}\Big|_{\gamma}\,dx'
=\int_0^x\Bigl(-\frac{\theta_i}{\gamma_i}+\frac{\theta_{i+1}}{\gamma_{i+1}}\Bigr)\Big|_{\gamma}\,dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0]
&=\int_0^\beta\Bigl[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x)-\frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\Bigr]dx\\
&=\int_0^\beta\dot\eta(x)\cdot\Bigl(-\int_0^x\frac{\partial L}{\partial\psi}(x')\,dx'-\frac{\partial L}{\partial\theta}(x)\Bigr)dx\\
&=-\int_0^\beta\sum_{i=0}^{I}C_i\dot\eta_i(x)\,dx\\
&=\sum_{i=0}^{I}C_i(\eta_i(0)-\eta_i(\beta))=0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0>0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t=0$ and $t=\beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t=0$ or $t=\beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma+(1-\lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)}\downarrow0$ and $x_2^{(n)}\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)})\le\int_{x_1^{(n)}}^{x_2^{(n)}}D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma)=\mathcal{J}(\alpha,\omega,\beta)
&\le\liminf_{n}\mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\tilde\gamma(x_2^{(n)}),x_2^{(n)}-x_1^{(n)}\bigr)\\
&=\liminf_{n}J(\gamma^{(n)})\\
&\le\lim_{n}\int_{x_1^{(n)}}^{x_2^{(n)}}D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx\\
&=J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
          • Proof of Theorem
          • Examples and extensions
            • The classical occupancy problem
            • The overflow problem
            • Partial coupon collection with initial conditions
            • Extensions
              • Appendix
                • Preliminaries
                • Proof of Lemma
                • Euler--Lagrange equations
                • Characterization of the extremals under empty initial conditions
                • Interpretation of the twist parameter
                • Characterization of the extremal cost
                • Characterization of the extremals under general initial conditions
                • Extremal curves have globally minimal cost
                  • References

The rate function is defined to be infinity if the curve is not a cumulative occupancy function.

The calculus of variations problem is to find the path $\psi$ having least cost among all paths satisfying given initial conditions and endpoint constraints, and to find the cost of such a minimal path. As illustrated in the examples in Section 4, the results extend to cases where the terminal point (or the initial point) is required to lie in a given constraint set.

Definition A.1. An occupancy path defined on $[0,\beta]$ is said to be an extremal if it satisfies the Euler–Lagrange equations [8]

\[
\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))=-\frac{d}{dx}\frac{\partial L}{\partial\theta_i}(\psi(x),\theta(x))
\]

for all $i\in\{0,\dots,I\}$, $x\in(0,\beta)$.

Although the Euler–Lagrange equations are neither necessary nor sufficient conditions for minimality in general, extremals do turn out to be minimal in many cases. In the following sections we will construct a family of extremal paths for the cost function given above and show that the extremal paths are in fact globally minimal.

In the case at hand, the Euler–Lagrange equations are given by

\[
-\frac{\theta_i}{\psi_i-\psi_{i-1}}+\frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}
=\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i-\psi_{i-1}}+\log\frac{\theta_{I+}}{1-\psi_I}\Bigr]
\tag{A.7}
\]

for $i=0,\dots,I-1$, and by

\[
-\frac{\theta_I}{\psi_I-\psi_{I-1}}+\frac{\theta_{I+}}{1-\psi_I}
=\frac{d}{dx}\Bigl[-\log\frac{\theta_I}{\psi_I-\psi_{I-1}}+\log\frac{\theta_{I+}}{1-\psi_I}\Bigr].
\tag{A.8}
\]

In the case when we have equality in the conservation constraint (2.4), and by taking $I+1$ to be $I$ if necessary, we have that $\sum_{i=0}^{I}\omega_i=1$. Such cases of equality are referred to as the polynomial case. When this holds, every valid occupancy path must have $\psi_I(x)=1$ and therefore $\theta_I(x)=\theta_{I+}(x)=0$. For all such occupancy paths, the rate function $L(\psi,\theta)$ then reduces to

\[
L(\psi,\theta)=\sum_{i=0}^{I-1}\theta_i\log\frac{\theta_i}{\psi_i-\psi_{i-1}}.
\tag{A.9}
\]

The Euler–Lagrange equations pertinent to the problem of minimizing over this restricted set of occupancy paths are just

\[
-\frac{\theta_i}{\psi_i-\psi_{i-1}}+\frac{\theta_{i+1}}{\psi_{i+1}-\psi_i}
=\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\psi_i-\psi_{i-1}}\Bigr]
\tag{A.10}
\]

for $i=0,\dots,I-1$.

A.4. Characterization of the extremals under empty initial conditions. In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0=1$. Recall that to each feasible endpoint constraint $(1,\omega,\beta)$ there correspond twist parameters $C\ge0$, $\rho>0$, which satisfy the equations

\[
\sum_{i=0}^{I}\omega_i+\sum_{i=I+1}^{\infty}CP_i(\rho\beta)=1,
\qquad
\sum_{i=0}^{I}i\omega_i+\sum_{i=I+1}^{\infty}iCP_i(\rho\beta)=\beta.
\]

Theorem A.1. Suppose that $(1,\omega,\beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x)=Ce^{-\rho x}+\sum_{k=0}^{I}\bigl(\omega_k-CP_k(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^{k},
\tag{A.8}
\]

\[
\gamma_i(x)=\frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x),\qquad0\le i\le I,
\tag{A.9}
\]

\[
\gamma_{I+}(x)=1-\sum_{i=0}^{I}\gamma_i(x)
\]

are extremals on $[0,\beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0)=1$.
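The construction in Theorem A.1 can be explored numerically. The following is a minimal sketch under stated assumptions, not the authors' procedure: it solves the two twist-parameter equations by bisection on $\lambda=\rho\beta$ (using that the mean of a Poisson variable conditioned to exceed $I$ increases with $\lambda$), truncates the infinite sums at a fixed number of terms, and assumes the exponential case with a tail mean above $I+1$. The function names are hypothetical.

```python
import math

def log_poisson(i, lam):
    """log P_i(lam) for the Poisson pmf P_i(lam) = exp(-lam) lam^i / i!, lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def tail_sums(I, lam, n_terms=300):
    """Truncated sums over i > I of P_i(lam) and of i * P_i(lam)."""
    mass = mean = 0.0
    for i in range(I + 1, I + 1 + n_terms):
        p = math.exp(log_poisson(i, lam))
        mass += p
        mean += i * p
    return mass, mean

def twist_parameters(omega, beta):
    """Solve the two twist equations for (C, rho); exponential case only."""
    I = len(omega) - 1
    mass = 1.0 - sum(omega)                                  # mass beyond I
    mean = beta - sum(i * w for i, w in enumerate(omega))    # balls beyond I
    if mass <= 0.0 or mean / mass <= I + 1:
        raise ValueError("sketch assumes exponential case with tail mean > I + 1")
    target = mean / mass

    def tail_mean(lam):
        m, s = tail_sums(I, lam)
        return s / m

    lo, hi = 1e-6, 1.0
    while tail_mean(hi) < target:        # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):                 # plain bisection on lam = rho * beta
        mid = 0.5 * (lo + hi)
        if tail_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return mass / tail_sums(I, lam)[0], lam / beta

def gamma_i(i, x, omega, beta, C, rho):
    """gamma_i(x) = x^i / i! * (-1)^i psi_0^{(i)}(x), via the closed-form derivative."""
    I = len(omega) - 1
    deriv = rho**i * C * math.exp(-rho * x)
    for k in range(i, I + 1):
        coeff = omega[k] - C * math.exp(log_poisson(k, rho * beta))
        deriv += (coeff * beta**(-i) * math.factorial(k) / math.factorial(k - i)
                  * (1.0 - x / beta)**(k - i))
    return x**i / math.factorial(i) * deriv

# Assumed example: I = 1, terminal fractions omega = (0.2, 0.3), beta = 1.5.
omega, beta = [0.2, 0.3], 1.5
C, rho = twist_parameters(omega, beta)
```

With these parameters, `gamma_i(0, 0.0, ...)` should return the initial value $\psi_0(0)=1$, `gamma_i(i, beta, ...)` should reproduce $\omega_i$, and the values $\gamma_i(x)$ summed over all $i$ should be 1 at any interior point.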

Recall that in the special case $C=0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x)=\sum_{k=0}^{I}\omega_k\Bigl(1-\frac{x}{\beta}\Bigr)^{k}.
\tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C>0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a,b]$ if it is infinitely differentiable on $[a,b]$ with

\[
(-1)^i\varphi^{(i)}(x)\ge0\qquad\text{for all }x\in[a,b]\text{ and }i>0.
\tag{A.11}
\]

This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x\in[0,\beta)$ and $i=0,\dots,I$. In the case $C>0$, this can be strengthened to all $i=0,1,\dots$.

Proof. Clearly, $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C>0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i\psi_0^{(i)}(x)=\rho^iCe^{-\rho x}+\sum_{k=i}^{I}\bigl(\omega_k-CP_k(\rho\beta)\bigr)\beta^{-i}\frac{k!}{(k-i)!}\Bigl(1-\frac{x}{\beta}\Bigr)^{k-i}
\]

for $i=0,\dots,I$, and

\[
(-1)^i\psi_0^{(i)}(x)=\rho^iCe^{-\rho x}
\tag{A.12}
\]

for $i>I$. It is clear that $a(x)\doteq(-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^ia^{(i)}(x)>0$, $x\in[0,\infty)$, $i=0,1,\dots$. We deduce that $\varphi(x)\doteq(-1)^I\psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

\[
\begin{aligned}
\varphi(\beta)=(-1)^I\psi_0^{(I)}(\beta)
&=\rho^ICe^{-\rho\beta}+\Bigl(\omega_I-Ce^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I}I!\\
&=\Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I\ge0,
\end{aligned}
\]

so that $\varphi(x)>0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case, $C=0$, the argument proceeds similarly on noting that

\[
(-1)^I\psi_0^{(I)}(x)=\Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I>0\qquad\text{for all }x\in[0,\beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0)=C\Bigl(1-\sum_{i=0}^{I}P_i(\rho\beta)\Bigr)+\sum_{i=0}^{I}\omega_i=1.
\]

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0)=-1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x)=\frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x),\qquad i=0,1,\dots,I,I+1,\dots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0)-\psi_0(x)=\sum_{i=1}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty}\gamma_i(x)=\sum_{i=0}^{\infty}\frac{(-x)^i}{i!}\psi_0^{(i)}(x)=\psi_0(0)=1.
\]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\}\in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i\ge0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x)=-\dot\psi_i(x)
&=-\sum_{k=0}^{i}\dot\gamma_k(x)\\
&=\sum_{k=1}^{i}-\frac{x^{k-1}}{(k-1)!}(-1)^k\psi_0^{(k)}(x)+\sum_{k=0}^{i}\frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x)\\
&=\frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x)\ge0
\end{aligned}
\tag{A.14}
\]

for $x\in[0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty}\theta_i(x)=\sum_{i=0}^{\infty}-\dot\psi_i(x)=-\psi_0^{(1)}(0)=1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I}\theta_i(x)\le1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case, $C>0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x)=1+\sum_{i=0}^{I}\dot\psi_i(x)=\sum_{i=I+1}^{\infty}-\dot\psi_i(x)=\rho C\sum_{i=I+1}^{\infty}P_i(\rho x),
\]

\[
1-\psi_I(x)=\sum_{i=I+1}^{\infty}\gamma_i(x)=C\sum_{i=I+1}^{\infty}P_i(\rho x),
\]

where the first display uses $\theta=-\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1-\psi_I(x))=\rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)}=-\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},\qquad x\in(0,\beta),\ i\ge0.
\tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr]
=-\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}}+\frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
=\frac{\theta_{i+1}}{\gamma_{i+1}}-\frac{\theta_i}{\gamma_i},\qquad x\in(0,\beta),
\tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8) we apply (A.16) with $i=I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)}=\rho$ obtained from (A.12).

In the polynomial case, $C=0$, note that (A.15) holds for $i=0,\dots,I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1-\psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1-\psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho>1$, balls unusually pile into high-occupancy urns, while if $\rho<1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[
(-1)^i\psi_0^{(i)}(x)=C\rho^ie^{-\rho x}\qquad\text{for }i>I,
\]

for some $I\ge0$, $C>0$ and $\rho>0$. Further suppose that $\gamma_0,\gamma_1,\dots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty}\gamma_i(x)=1,\qquad\sum_{i=0}^{\infty}i\gamma_i(x)\le B_0,
\]

\[
\sum_{i=0}^{\infty}-\dot\psi_i(x)=1,\qquad-\dot\psi_i(x)=\gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\]

for all $x\in[0,\beta]$ and some constant $B_0<\infty$. Let $\gamma$ denote the vector of functions $(\gamma_0,\dots,\gamma_I,\gamma_{I+})$, where $\gamma_{I+}=1-\sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma)=\beta+\sum_{i=0}^{\infty}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)|-\gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr].
\tag{A.17}
\]

Proof. For all $i>I$ and $x\in[0,\beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)}=-\rho$. Therefore, using $\theta=-\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+}=1-\sum_{i=0}^{I}-\dot\psi_i=\sum_{i=I+1}^{\infty}-\dot\psi_i=\rho\sum_{i=I+1}^{\infty}\gamma_i=\rho(1-\psi_I).
\]

Then, for each $x\in[0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i=-\psi_0^{(i+1)}/\psi_0^{(i)}=\rho$ for $i>I$ and $(1+\sum_{i=0}^{I}\dot\psi_i)/(1-\psi_I)=\rho$,

\[
\begin{aligned}
L(\psi,\dot\psi)
&=\sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i-\psi_{i-1}}
+(1+\dot\psi_0+\cdots+\dot\psi_I)\log\frac{1+\dot\psi_0+\cdots+\dot\psi_I}{1-\psi_I}\\
&=\sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i-\psi_{i-1}}\equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi)\ge0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)|=1$, $x\in[0,\beta]$, and that, given $\varepsilon>0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x\in[\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma)=\lim_{\varepsilon\downarrow0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x)=0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
+\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx\\
&\qquad=\sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x)-\dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i>I$,

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)|=\rho\gamma_i(x)[\log C+i\log\rho-\rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty}\bigl[\gamma_i\log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon}+\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon\downarrow0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\dots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I}\gamma_i(x)=1,\qquad\sum_{i=0}^{I-1}-\dot\psi_i(x)=1,
\]

\[
-\dot\psi_i(x)=\gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},\qquad i=0,\dots,I-1,
\]

for all $x\in[0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma)=\beta+\sum_{i=0}^{I}\bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)|-\gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr].
\tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x)=x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma)=\min_{\pi\in F(1,\omega,\beta)}D(\pi\,\|\,P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C>0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty}\in\mathbb{R}_+^{\infty}$ satisfying $\pi_i=\omega_i$ for $i=0,\dots,I$, and define the Lagrangian $\mathcal{L}$ on this set to be

\[
\mathcal{L}(\pi,y,z)=\sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)}+y\Bigl(1-\sum_{i=0}^{\infty}\pi_i\Bigr)+z\Bigl(\beta-\sum_{i=0}^{\infty}i\pi_i\Bigr),
\]

where $z\doteq\log\rho$ and $y\doteq\log C+(1-\rho)\beta+1$.

Define $\pi^*_i\doteq\gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i>I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x\to x\log x-x[\log P_i(\beta)+y+iz]$. Therefore, for any $i>I$ and $\pi_i\in[0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{P_i(\beta)}-y\pi_i-zi\pi_i\ \ge\ \pi^*_i\log\frac{\pi^*_i}{P_i(\beta)}-y\pi^*_i-zi\pi^*_i.
\]

Following standard Lagrangian arguments, we thus have

\[
\inf_{\pi\in F(1,\omega,\beta)}D(\pi\,\|\,P(\beta))
=\inf_{\pi\in F(1,\omega,\beta)}\mathcal{L}(\pi,y,z)
\ge\inf_{\pi\in\mathbb{R}_+^{\infty}}\mathcal{L}(\pi,y,z)
=D(\pi^*\,\|\,P(\beta)).
\]

Since $\pi^*\in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma)=\sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^ie^{-\beta}/i!}=D(\pi^*\,\|\,P(\beta)),
\]

where

\[
\gamma_i(\beta)=
\begin{cases}
\omega_i, & 0\le i\le I,\\
CP_i(\rho\beta), & i>I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma)=\sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)}
+\Bigl(1-\sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C+(1-\rho)\beta\bigr)
+\Bigl(\beta-\sum_{i=0}^{I}i\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
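The agreement between the explicit cost formula and the relative entropy of the terminal distribution can be checked numerically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the infinite sum is truncated at a fixed number of terms, and the caller is assumed to supply twist parameters `C` and `rho` that are consistent with `omega` and `beta` (that is, that satisfy the two twist equations).

```python
import math

def log_poisson(i, lam):
    """log P_i(lam) for the Poisson pmf P_i(lam) = exp(-lam) lam^i / i!, lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def head_cost(omega, beta):
    """sum_{i<=I} omega_i * log(omega_i / P_i(beta)), with 0 log 0 = 0."""
    return sum(w * (math.log(w) - log_poisson(i, beta))
               for i, w in enumerate(omega) if w > 0.0)

def cost_explicit(omega, beta, C, rho):
    """Closed-form cost of an exponential extremal in terms of rho, omega, beta."""
    mass = 1.0 - sum(omega)
    mean = beta - sum(i * w for i, w in enumerate(omega))
    return (head_cost(omega, beta)
            + mass * (math.log(C) + (1.0 - rho) * beta)
            + mean * math.log(rho))

def cost_relative_entropy(omega, beta, C, rho, n_terms=200):
    """D(pi* || P(beta)) with pi*_i = omega_i (i <= I) and C * P_i(rho * beta) (i > I)."""
    I = len(omega) - 1
    total = head_cost(omega, beta)
    for i in range(I + 1, I + 1 + n_terms):
        log_p = math.log(C) + log_poisson(i, rho * beta)   # log pi*_i
        total += math.exp(log_p) * (log_p - log_poisson(i, beta))
    return total

# Assumed consistent example with I = 0: rho = 2, C = 1/rho, beta = 1, so that
# omega_0 = 1 - C * (1 - exp(-rho * beta)) satisfies both twist equations.
rho, C, beta = 2.0, 0.5, 1.0
omega = [1.0 - C * (1.0 - math.exp(-rho * beta))]
j_explicit = cost_explicit(omega, beta, C, rho)
j_entropy = cost_relative_entropy(omega, beta, C, rho)
```

Working with log-probabilities avoids dividing by Poisson masses that underflow for large $i$. In the zero-cost case ($C=\rho=1$ and $\omega_i=P_i(\beta)$) both functions should return a value numerically indistinguishable from zero.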

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k>0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0\in\mathcal{K}$), and we may take $K=\max\mathcal{K}$.

In the case $K\ge1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta=\sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)}=\{\gamma_{ki}\}_i$, where the function $\gamma_{ki}:[0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x)=\sum_{k\in\mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers and that the extremal curves can be constructed using the Lagrange multipliers.

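The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S^{|\mathcal{K}|}_{\infty}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j \pi_{kj} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\, k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22}
\]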

Recall that $S^{|\mathcal{K}|}_{\infty}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S^{|\mathcal{K}|}_{\infty}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S^{|\mathcal{K}|}_{\infty}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0 \pi_{01} + \alpha_1 \pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
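The ordered filling construction can be sketched in a few lines. The proof does not say which class supplies mass first at each level, so the rule used here (lowest class first) is an assumption, as are the function name and the tolerance; the sketch also assumes polynomial constraints with $K \le I$ and $\sum_k \alpha_k = \sum_i \omega_i = 1$.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy construction of a feasible pi for polynomial constraints.

    alpha[k]: initial fraction of urns holding k balls (k = 0..K).
    omega[i]: required terminal fraction of urns holding i balls (i = 0..I).
    Returns {k: [pi_k0, ..., pi_{k,I-k}]} for the classes with alpha[k] > 0.
    """
    I = len(omega) - 1
    remaining = list(alpha)  # urn mass of each class not yet assigned a level
    mass = {k: [0.0] * (I - k + 1) for k, a in enumerate(alpha) if a > 0}
    for i in range(I + 1):
        need = omega[i]
        for k in sorted(mass):  # assumption: draw from the lowest class first
            if k > i or need <= tol:
                break
            take = min(remaining[k], need)
            mass[k][i - k] += take  # alpha_k * pi_{k, i-k}
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"monotonicity condition fails at level i={i}")
    return {k: [m / alpha[k] for m in row] for k, row in mass.items()}

if __name__ == "__main__":
    print(ordered_filling([0.5, 0.5], [0.2, 0.3, 0.5]))
```

Any other order of drawing from the eligible classes would also give a feasible point; the construction only needs the monotonicity conditions to guarantee that enough mass is available at each level.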

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_{\infty}$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_{\infty}$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A = \Bigl\{ \pi \in S^{|\mathcal{K}|}_{\infty} : \sum_{k} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \le A \Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[
Q_A(\alpha,\omega,\beta) = F(\alpha,\omega,\beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j \pi^{(n)}_{kj} \le \varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x \log x - x + 1) + (e^y - 1),
\]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta} \frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]

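Taking $y = j/\delta$ and $x = \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j \pi^{(n)}_{kj}
&= \sum_{j \ge m} \delta \, \frac{j}{\delta} \, \frac{\pi^{(n)}_{kj}}{P_j(\beta)} \, P_j(\beta) \\
&\le \sum_{j \ge 0} \delta \Bigl[ \pi^{(n)}_{kj} \log \Bigl( \frac{\pi^{(n)}_{kj}}{P_j(\beta)} \Bigr) - \pi^{(n)}_{kj} + P_j(\beta) \Bigr]
 + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&= \delta D(\pi^{(n)(k)} \,\|\, P(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]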
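The estimate above can be checked numerically for a concrete distribution. The sketch below evaluates the tail mean $\sum_{j \ge m} j\pi_j$ and the bound $\delta D(\pi \,\|\, P(\beta)) + \delta \sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta)$; the function names, the truncation of the infinite sums and the choice of a Poisson test distribution are illustrative assumptions.

```python
import math

def log_poisson(j, lam):
    """log P_j(lam), with P_j(lam) = exp(-lam) * lam**j / j!."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def tail_mean(pi, m):
    """sum_{j >= m} j * pi_j for a distribution given as a list."""
    return sum(j * p for j, p in enumerate(pi) if j >= m)

def relative_entropy(pi, beta):
    """D(pi || P(beta)), with the convention 0 log 0 = 0."""
    return sum(p * (math.log(p) - log_poisson(j, beta))
               for j, p in enumerate(pi) if p > 0)

def tail_bound(pi, beta, m, delta, n_terms=400):
    """delta * D(pi || P(beta)) + delta * sum_{j >= m} (e^{j/delta} - 1) P_j(beta)."""
    poisson_part = 0.0
    for j in range(m, n_terms):
        lp = log_poisson(j, beta)
        # exponents are combined before calling exp to avoid overflow
        poisson_part += math.exp(j / delta + lp) - math.exp(lp)
    return delta * relative_entropy(pi, beta) + delta * poisson_part

if __name__ == "__main__":
    pi = [math.exp(log_poisson(j, 3.0)) for j in range(60)]  # Poisson(3), truncated
    for m, delta in [(5, 1.0), (10, 1.0), (20, 2.0)]:
        print(m, delta, tail_mean(pi, m), tail_bound(pi, 2.0, m, delta))
```

The bound is far from tight for moderate $\delta$ and $m$; its role in the proof is only that both terms can be made small uniformly over the level set.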

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k} \alpha_k \beta_k = \lim_{n} \sum_{k} \alpha_k \beta^{(n)}_k = \lim_{n} \beta = \beta,
\]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G = \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)(k)} \,\|\, P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*(k)} \,\|\, P(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)(k)} \,\|\, P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} = G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j \pi^{(n)}_{kj} \;\to\; \beta_k \doteq \sum_{j=0}^{\infty} j \pi_{kj}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k \alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)(k)} \,\|\, P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda \log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k \beta^{(n)}_k \to \alpha_k \beta_k$, even if $\alpha_k = 0$.

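Thus,

\[
\sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k \alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)(k)} \,\|\, P(\beta^{(n)}))
&\ge \liminf_{n} \sum_{k : \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)(k)} \,\|\, P(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.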

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints and let $\pi^* \in S^{|\mathcal{K}|}_{\infty}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S^{|\mathcal{K}|}_{\infty}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\bar\pi$ be an arbitrary set of probability distributions in $S^{|\mathcal{K}|}_{\infty}$, subject to the restrictions that $\bar\pi_{mn} > 0$, that $\bar\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$ and that $\bar\pi$ has finite support. If we define

\[
\bar\omega_i \doteq \sum_{k=0}^{i} \alpha_k \bar\pi_{k,i-k}, \qquad
\bar\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j \bar\pi_{kj},
\]

then $\bar\pi$ is a feasible solution for the constraints $(\alpha, \bar\omega, \bar\beta)$. Next, for any $\eta > 0$ we may define

\[
\tilde\beta \doteq \frac{\beta - \eta\bar\beta}{1 - \eta}, \qquad
\tilde\omega \doteq \frac{\omega - \eta\bar\omega}{1 - \eta}.
\]

By construction, the constraints $(\alpha, \tilde\omega, \tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \bar\omega, \bar\beta)$, and hence for $(\alpha, \tilde\omega, \tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha, \tilde\omega, \tilde\beta)$ and finally form $\hat\pi = \eta\bar\pi + (1 - \eta)\tilde\pi$. By construction, $\hat\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\hat\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\hat\pi$ also has finite support.

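We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\hat\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\hat\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon)
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Bigl( \log \frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} + 1 \Bigr) (\hat\pi_{kj} - \pi_{kj}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log \frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} \, (\hat\pi_{kj} - \pi_{kj}) \\
&= \alpha_m \hat\pi_{mn} \log \frac{\pi^{\varepsilon}_{mn}}{P_n(\beta)}
 + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \hat\pi_{kj} \log \frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}
 - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{kj} \log \frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \hat\pi_{kj} \log P_j(\beta)$, which is finite because $\hat\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.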

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj} \log \frac{\pi_{kj}}{P_j(\beta)}
+ \sum_{k \in \mathcal{K}} z_k \alpha_k \Bigl( 1 - \sum_{k+l \in \mathcal{I}} \pi_{kl} \Bigr)
+ \sum_{i \in \mathcal{I}} w_i \Bigl( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta) e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
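The proof above establishes only that the constants $D_k$ and $W_i$ exist. One way to compute them numerically, offered here as a sketch, is iterative proportional fitting: alternately rescale the $D_k$ so that each $\pi^{(k)}$ has unit mass and the $W_i$ so that the terminal conditions (A.22) hold. This technique is not taken from the paper. The sketch assumes the polynomial case with every $\alpha_k > 0$ and $\omega_i > 0$, $K \le I$, $\sum_k \alpha_k = \sum_i \omega_i = 1$ and $\beta > 0$; the function names and the iteration count are arbitrary choices.

```python
import math

def poisson_pmf(j, lam):
    """P_j(lam) = exp(-lam) * lam**j / j!."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def fit_product_form(alpha, omega, n_iter=500):
    """Fit pi_kj = D_k * P_j(beta) * W_{k+j} on the support k + j <= I.

    Returns (pi, D, W, beta), where pi[k][j] is defined for j = 0..I-k.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    # Conservation: balls added per urn = final mean occupancy - initial mean occupancy.
    beta = (sum(i * w for i, w in enumerate(omega))
            - sum(k * a for k, a in enumerate(alpha)))
    P = [poisson_pmf(j, beta) for j in range(I + 1)]
    D = [1.0] * (K + 1)
    W = [1.0] * (I + 1)
    for _ in range(n_iter):
        for k in range(K + 1):  # normalize each class distribution to unit mass
            D[k] = 1.0 / sum(P[j] * W[k + j] for j in range(I - k + 1))
        for i in range(I + 1):  # enforce the terminal condition for level i
            W[i] = omega[i] / sum(alpha[k] * D[k] * P[i - k]
                                  for k in range(min(i, K) + 1))
    pi = [[D[k] * P[j] * W[k + j] for j in range(I - k + 1)] for k in range(K + 1)]
    return pi, D, W, beta

if __name__ == "__main__":
    pi, D, W, beta = fit_product_form([0.5, 0.5], [0.2, 0.3, 0.5])
    print(beta, pi)
```

The pair $(D, W)$ is determined only up to a common scale factor (multiplying every $D_k$ by $c$ and dividing every $W_i$ by $c$ leaves $\pi$ unchanged), so only the fitted $\pi$ should be compared across runs.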

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l \pi^*_{kl}, \qquad
\eta^{(M)}_k \doteq \sum_{l \le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\, k+j \in \mathcal{I}} \pi_{kj} \log \frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\begin{aligned}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\, k+l \in \mathcal{I}} l \pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\, k+l \in \mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\, k \in \mathcal{K}} \alpha_k \pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{aligned}
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

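\[
\begin{aligned}
L^{(M)} ={}& \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\, k+j \in \mathcal{I}} \pi_{kj} \log \frac{\pi_{kj}}{P_j(\beta)}
+ y^{(M)} \Bigl( \beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\, k+l \in \mathcal{I}} l \pi_{kl} \Bigr) \\
&+ \sum_{k \in \mathcal{K}} z^{(M)}_k \alpha_k \Bigl( \eta^{(M)}_k - \sum_{l \le M,\, k+l \in \mathcal{I}} \pi_{kl} \Bigr)
+ \sum_{i \in \mathcal{I}} w^{(M)}_i \Bigl( \omega_i - \sum_{k \le i,\, k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \Bigr).
\end{aligned}
\]

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta) \, e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)} \, e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.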
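The final substitution can be verified numerically: $P_j(\beta)e^{jy - 1 + z_k + w}$ and $C_k P_j(\rho\beta) W$ should coincide for every $j$ once $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W = e^{w}$. The function names and the multiplier values in the sketch below are arbitrary illustrative choices.

```python
import math

def poisson_pmf(j, lam):
    """P_j(lam) = exp(-lam) * lam**j / j!."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def multiplier_form(j, beta, y, z_k, w):
    """P_j(beta) * exp(j*y - 1 + z_k + w), as obtained from the Lagrangian."""
    return poisson_pmf(j, beta) * math.exp(j * y - 1.0 + z_k + w)

def twisted_form(j, beta, y, z_k, w):
    """C_k * P_j(rho*beta) * W after the substitutions stated in the proof."""
    rho = math.exp(y)
    C_k = math.exp((rho - 1.0) * beta - 1.0 + z_k)
    W = math.exp(w)
    return C_k * poisson_pmf(j, rho * beta) * W

if __name__ == "__main__":
    for j in range(5):
        print(j, multiplier_form(j, 1.5, 0.3, -0.2, 0.4), twisted_form(j, 1.5, 0.3, -0.2, 0.4))
```

The agreement reflects the identity $e^{-\beta}\beta^j \rho^j / j! = e^{(\rho-1)\beta} P_j(\rho\beta)$.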

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j \pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega^{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega^{(k)}, \beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

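Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j \pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega^{(k)}, \beta_k)$. Then the curves

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{aligned}
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.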

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence, the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

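First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\tilde\psi_{k0}(x)
&= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl( \omega_{ki} - C_k P_i(\rho_k\beta_k) \bigr) \Bigl( 1 - \frac{x\beta_k/\beta}{\beta_k} \Bigr)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl( \omega_{ki} - C_k P_i(\rho\beta) \bigr) \Bigl( 1 - \frac{x}{\beta} \Bigr)^{i} \\
&= C_k \Bigl[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \Bigl( 1 - \frac{x}{\beta} \Bigr)^{i} \Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0 \tilde\psi_{00}$ is

\[
\begin{aligned}
(-1)^k \psi^{(k)}_0(x)
&= \alpha_0 C_0 \Bigl[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \Bigl( 1 - \frac{x}{\beta} \Bigr)^{i-k} \Bigr] \\
&= \alpha_0 C_0 \rho^k \Bigl[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \Bigl( 1 - \frac{x}{\beta} \Bigr)^{j} \Bigr] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k} \, \tilde\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]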

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x)
&= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi^{(i-k)}_{k0}\Bigl( \frac{x\beta_k}{\beta} \Bigr) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{i-k}}{dx^{i-k}} \tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi^{(i)}_0(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x)
&= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta} \dot\psi_{k,i-k}\Bigl( \frac{x\beta_k}{\beta} \Bigr) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{i-k+1}}{dx^{i-k+1}} \tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi^{(i+1)}_0(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

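In the polynomial case, similar computations show that

\[
(-1)^k \psi^{(k)}_0(x) = \frac{\alpha_0 D_0}{D_k} \, \tilde\psi_{k0}(x) \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi^{(i+1)}_0(x) = \gamma_i(x) \, \frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)} \tag{A.29}
\]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).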

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x)) \, dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x))$, which is finite at both endpoints because $(-1)^i \psi^{(i)}_0(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

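Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence, the cost is

\[
\begin{aligned}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty} \Bigl[ \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(\beta) \log \bigl| \psi^{(i)}_0(\beta) \bigr| \Bigr] - \sum_{k=0}^{K} \alpha_k \log \bigl| \psi^{(k)}_0(0) \bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta) \log \frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0) \, e^{-\beta}}.
\end{aligned}
\]

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi^{(k)}_0(x) = \psi^{(k)}_0(0) \times \tilde\psi_{k0}(x)$, so that

\[
\frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0) \, e^{-\beta}}
= \Bigl( \frac{\beta_k}{\beta} \Bigr)^{j} \psi^{(j)}_{k0}\Bigl( \frac{x\beta_k}{\beta} \Bigr) e^{\beta}
= \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k\cdot}(\beta) \,\|\, P(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).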

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

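We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x) \,\|\, \gamma^{\varepsilon}(x)) \, dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.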

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11 and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi, \theta) \, dx$. After defining

\[
\eta(x) \doteq \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta, \theta - \varepsilon\dot\eta) \, dx = \int_0^{\beta} g(x, \varepsilon) \, dx. \tag{A.30}
\]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\tilde\gamma_{k0}(x)$ and $\tilde\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These, in turn, lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) \ge \alpha_i \tilde\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1 - \varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$, these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

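The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial \psi_i}\Big|_{\gamma^{\varepsilon}(x)} \eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial \theta_i}\Big|_{\gamma^{\varepsilon}(x)} \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\begin{aligned}
\frac{\partial L}{\partial \psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial \theta_i}(x) &= \log \frac{\theta_i(x)}{\gamma_i(x)} - \log \frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{aligned}
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\Bigl| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \Bigr| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0], \text{ a.e. } x \in [0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.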

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \, dx
\]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[
\int_0^{x} \frac{\partial L}{\partial \psi_i}\Big|_{\gamma} \, dx' = \int_0^{x} \Bigl( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \Bigr)\Big|_{\gamma} \, dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 361, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0]
&= \int_0^{\beta} \Bigl\{ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \Bigr\} \, dx \\
&= \int_0^{\beta} \dot\eta(x) \cdot \Bigl( -\int_0^{x} \frac{\partial L}{\partial \psi}(x') \, dx' - \frac{\partial L}{\partial \theta}(x) \Bigr) \, dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i \dot\eta_i(x) \, dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\bar\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

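Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x^{(n)}_1)$ and terminal point $\bar\gamma(x^{(n)}_2)$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x) \,\|\, \bar\gamma(x)) \, dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_{n} \mathcal{J}\bigl( \bar\gamma(x^{(n)}_1), \bar\gamma(x^{(n)}_2), x^{(n)}_2 - x^{(n)}_1 \bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x) \,\|\, \bar\gamma(x)) \, dx \\
&= J(\bar\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7 and the last equality from the monotone convergence theorem.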

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithms 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


A.4. Characterization of the extremals under empty initial conditions.

In this section we consider the simplest and most important case, in which the urns are all initially empty, $\alpha_0 = 1$. Recall that to each feasible end-point constraint $(1, \omega, \beta)$ there correspond twist parameters $C \ge 0$, $\rho > 0$ which satisfy the equations

\[
\sum_{i=0}^{I} \omega_i + \sum_{i=I+1}^{\infty} C P_i(\rho\beta) = 1,
\qquad
\sum_{i=0}^{I} i\,\omega_i + \sum_{i=I+1}^{\infty} i\,C P_i(\rho\beta) = \beta.
\]

Theorem A.1. Suppose that $(1, \omega, \beta)$ are feasible terminal constraints and that $\rho$ and $C$ are the corresponding twist parameters. Then the set of functions $\gamma$ defined by

\[
\psi_0(x) = C e^{-\rho x} + \sum_{k=0}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{k},
\tag{A.8}
\]

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \quad 0 \le i \le I,
\qquad
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x),
\tag{A.9}
\]

are extremals on $[0, \beta]$ which satisfy the terminal constraints, along with the initial constraint $\psi_0(0) = 1$.
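The twist parameters in Theorem A.1 are defined only implicitly by the two equations for $C$ and $\rho$. As a rough numerical illustration (not part of the paper), the sketch below eliminates $C$ using the first equation and solves the second for $\rho$ by bisection. The function names `poisson_tail` and `twist_parameters`, the bracketing interval, and the fixed series length are assumptions made for this example; it also assumes the exponential case ($C > 0$) and moderate values of $\rho\beta$.

```python
import math

def poisson_tail(lam, I, terms=200):
    """Return (sum_{i>I} P_i(lam), sum_{i>I} i*P_i(lam)) by direct summation.

    P_i(lam) = exp(-lam) * lam**i / i! is the Poisson probability.
    The fixed number of terms is an assumption that suits moderate lam.
    """
    p = math.exp(-lam) * lam ** (I + 1) / math.factorial(I + 1)
    mass = mean = 0.0
    for i in range(I + 1, I + 1 + terms):
        mass += p
        mean += i * p
        p *= lam / (i + 1)  # P_{i+1}(lam) = P_i(lam) * lam / (i + 1)
    return mass, mean

def twist_parameters(omega, beta, tol=1e-13):
    """Solve the two twist equations for (C, rho), with omega = [omega_0..omega_I]."""
    I = len(omega) - 1
    tail_mass = 1.0 - sum(omega)                                 # mass left for i > I
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))   # mean left for i > I
    if tail_mass <= 0.0 or tail_mean <= (I + 1) * tail_mass:
        raise ValueError("no exponential extremal for these constraints")

    def residual(rho):
        # The first equation gives C = tail_mass / mass; substitute into the second.
        mass, mean = poisson_tail(rho * beta, I)
        return tail_mass * mean / mass - tail_mean

    lo, hi = 1e-6, 1.0
    while residual(hi) < 0.0:      # expand the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * hi:      # plain bisection on rho
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = tail_mass / poisson_tail(rho * beta, I)[0]
    return C, rho
```

When the terminal constraint $\omega$ coincides with the first $I+1$ Poisson probabilities $P_i(\beta)$, the path is the zero-cost one and the solver should return values close to $C = 1$ and $\rho = 1$.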

Recall that in the special case $C = 0$, the first component of the extremal is simply the $I$th order polynomial

\[
\psi_0(x) = \sum_{k=0}^{I} \omega_k \Bigl(1 - \frac{x}{\beta}\Bigr)^{k}.
\tag{A.10}
\]

We refer to such paths as polynomial extremals, and to extremals with $C > 0$ as exponential extremals.

The following definition and lemma are useful in the proof of the above theorem.

Definition A.2. A nonnegative function $\varphi$ is completely monotone on an interval $[a, b]$ if it is infinitely differentiable on $[a, b]$ with

\[
(-1)^i \varphi^{(i)}(x) \ge 0 \quad \text{for all } x \in [a, b] \text{ and } i > 0.
\tag{A.11}
\]


This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a, b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0, \beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0, \beta)$ and $i = 0, \ldots, I$. In the case $C > 0$, this can be strengthened to all $i = 0, 1, \ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\,\beta^{-i}\,\frac{k!}{(k-i)!}\Bigl(1 - \frac{x}{\beta}\Bigr)^{k-i}
\]

for $i = 0, \ldots, I$, and

\[
(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x}
\tag{A.12}
\]

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1}\psi_0^{(I+1)}(x)$ is completely monotone on $[0, \infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0, \infty)$, $i = 0, 1, \ldots$. We deduce that $\varphi(x) \doteq (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing, and that

\[
\varphi(\beta) = (-1)^I \psi_0^{(I)}(\beta)
= \rho^I C e^{-\rho\beta} + \Bigl(\omega_I - C e^{-\rho\beta}\frac{(\rho\beta)^I}{I!}\Bigr)\beta^{-I} I!
= \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I \ge 0,
\]

so that $\varphi(x) > 0$ on $[0, \beta)$. It follows that $\varphi(x)$ is completely monotone on $[0, \beta]$ and that the derivative constraint is strict on $[0, \beta)$. Proceeding inductively to $I - 1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

\[
(-1)^I \psi_0^{(I)}(x) = \Bigl(\frac{\beta^I}{I!}\Bigr)^{-1}\omega_I > 0 \quad \text{for all } x \in [0, \beta].
\]

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

\[
\psi_0(0) = C\Bigl(1 - \sum_{i=0}^{I} P_i(\rho\beta)\Bigr) + \sum_{i=0}^{I} \omega_i = 1.
\]


A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \quad i = 0, 1, \ldots, I, I+1, \ldots.
\tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!}\psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0, \beta]$, we have $\{\gamma_i(x)\} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that for all $i \ge 0$ the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1}\psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0, \beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]
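The two normalizations just derived, $\sum_i \gamma_i(x) = 1$ and $\sum_i \theta_i(x) = 1$, can be checked numerically on a small instance. The sketch below is an illustration only: the values $I = 1$, $\beta = 1$, $\rho = 2$, $C = 0.4$ are hypothetical, $\omega$ is chosen so that the two twist equations hold for them, and the infinite sums are truncated at an assumed `N = 60` terms. The derivative formula is the one from the proof of Lemma A.2.

```python
import math

def poisson_pmf(i, lam):
    """Poisson probability P_i(lam)."""
    return math.exp(-lam) * lam**i / math.factorial(i)

# Hypothetical parameters for illustration.
I, beta, rho, C = 1, 1.0, 2.0, 0.4
lam = rho * beta

# Pick omega_0, omega_1 so that both twist equations hold for this (C, rho).
tail_mass = 1.0 - sum(poisson_pmf(i, lam) for i in range(I + 1))
tail_mean = lam - sum(i * poisson_pmf(i, lam) for i in range(I + 1))
omega1 = beta - C * tail_mean            # second equation (mean), valid for I = 1
omega0 = 1.0 - C * tail_mass - omega1    # first equation (total mass)
omega = [omega0, omega1]

def signed_derivative(i, x):
    """(-1)^i psi_0^{(i)}(x), using the formula in the proof of Lemma A.2."""
    value = rho**i * C * math.exp(-rho * x)
    for k in range(i, I + 1):
        coeff = omega[k] - C * poisson_pmf(k, lam)
        falling = math.factorial(k) / math.factorial(k - i)
        value += coeff * beta ** (-i) * falling * (1.0 - x / beta) ** (k - i)
    return value

def gamma(i, x):
    """Occupancy gamma_i(x) from (A.13)."""
    return x**i / math.factorial(i) * signed_derivative(i, x)

def theta(i, x):
    """Rate theta_i(x) from the last line of (A.14)."""
    return x**i / math.factorial(i) * signed_derivative(i + 1, x)

N = 60  # truncation level for the infinite sums (an assumption)

def total_gamma(x):
    return sum(gamma(i, x) for i in range(N))

def total_theta(x):
    return sum(theta(i, x) for i in range(N))
```

With these parameters, `total_gamma(x)` and `total_theta(x)` should both be close to 1 for any `x` in $[0, \beta]$, and `gamma(i, beta)` should reproduce `omega[i]` for $i \le I$.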

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0, \beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0, \beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0, \beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \quad x \in (0, \beta),\ i \ge 0.
\tag{A.15}
\]

Then

\[
\frac{d}{dx}\Bigl[-\log\frac{\theta_i}{\gamma_i}\Bigr]
= -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}}
= \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \quad x \in (0, \beta),
\tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8) we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls pile into high-occupancy urns at an unusually high rate, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.


The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0, \beta]$ with

\[
(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ is an infinite sequence of nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]

for all $x \in [0, \beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr].
\tag{A.17}
\]

Proof. For all $i > I$ and $x \in [0, \beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} (-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0, \beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi, \dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}}
+ (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi, \dot\psi).
\end{aligned}
\]


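Note that since $L_\infty(\psi, \dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi, \dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0, \beta]$, and that, given $\varepsilon > 0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx
+ \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta - \varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr)\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$

\[
-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)\,[\log C + i\log\rho - \rho x],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty} \bigl[\gamma_i \log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta - \varepsilon} + \int_{\varepsilon}^{\beta - \varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.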

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0, \beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0, \beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \quad i = 0, \ldots, I - 1,
\]

for all $x \in [0, \beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr].
\tag{A.19}
\]


Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I + 1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1, \omega, \beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1, \omega, \beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1, \omega, \beta)} D(\pi\,\|\,P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1 - \rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$ and the strict convexity of $x \log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[
\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i \log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]


Following standard Lagrangian arguments, we thus have

\[
\inf_{\pi \in F(1, \omega, \beta)} D(\pi\,\|\,P(\beta)) = \inf_{\pi \in F(1, \omega, \beta)} L(\pi, y, z) \ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) = D(\pi^*\,\|\,P(\beta)).
\]

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)\bigl(\log C + (1 - \rho)\beta\bigr) + \Bigl(\beta - \sum_{i=0}^{I} i\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
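The explicit cost formula can be compared with a direct evaluation of the relative entropy $D(\pi^*\,\|\,P(\beta))$. The following sketch does this for one hypothetical instance ($I = 1$, $\beta = 1$, $\rho = 2$, $C = 0.4$, with $\omega$ chosen to satisfy the twist equations); the function names and the truncation of the infinite sum at 150 terms are assumptions made for illustration.

```python
import math

def log_poisson_pmf(i, lam):
    """log P_i(lam) for the Poisson distribution with mean lam."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

# Hypothetical parameters for illustration.
I, beta, rho, C = 1, 1.0, 2.0, 0.4
lam = rho * beta

# Terminal constraint omega consistent with the twist equations for (C, rho).
p0, p1 = math.exp(log_poisson_pmf(0, lam)), math.exp(log_poisson_pmf(1, lam))
omega1 = beta - C * (lam - p1)
omega0 = 1.0 - C * (1.0 - p0 - p1) - omega1
omega = [omega0, omega1]

def head_term():
    """sum_{i<=I} omega_i * log(omega_i / P_i(beta)), common to both evaluations."""
    return sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega))

def explicit_cost():
    """Closed-form cost of the exponential extremal."""
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return (head_term()
            + tail_mass * (math.log(C) + (1.0 - rho) * beta)
            + tail_mean * math.log(rho))

def direct_relative_entropy(terms=150):
    """D(pi* || P(beta)) with pi*_i = omega_i (i <= I) and C * P_i(rho*beta) (i > I)."""
    total = head_term()
    for i in range(I + 1, I + 1 + terms):
        log_pi = math.log(C) + log_poisson_pmf(i, lam)
        total += math.exp(log_pi) * (log_pi - log_poisson_pmf(i, beta))
    return total
```

The two functions should agree to within rounding error, and the common value is positive because $\omega$ here differs from the Poisson probabilities $P_i(\beta)$.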

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i}\colon [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)}\,\|\,P(\beta)),
\tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{k,j} = \beta
\tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I.
\tag{A.22}
\]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{k,j} = 0, \quad k + j > I,
\tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$ and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
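The ordered filling construction described above can be sketched in a few lines. The version below is only one possible reading: at each terminal level $i$ it draws mass from the eligible classes in order of increasing initial occupancy, which is an arbitrary choice, since the text does not fix the order. The function name `ordered_filling` and the list-based representation are assumptions; the sketch assumes polynomial constraints with $\sum_k \alpha_k = \sum_i \omega_i = 1$ and $K \le I$.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Build a feasible point pi for polynomial constraints by ordered filling.

    alpha[k] is the initial fraction of urns with k balls, omega[i] the
    required terminal fraction with i balls.  Returns a dict mapping each
    class k with alpha[k] > 0 to the list [pi_{k,0}, ..., pi_{k,I-k}].
    """
    I = len(omega) - 1
    classes = [k for k, a in enumerate(alpha) if a > 0.0]
    remaining = {k: alpha[k] for k in classes}       # mass of class k not yet assigned
    pi = {k: [0.0] * (I - k + 1) for k in classes}
    for i, target in enumerate(omega):
        need = target
        for k in classes:                            # lowest initial occupancy first
            if k > i or need <= 0.0:
                break                                # class k cannot end below level k
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]           # alpha_k * pi_{k,i-k} = take
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"monotonicity condition fails at level {i}")
    return pi
```

For example, with `alpha = [0.5, 0.5]` and `omega = [0.2, 0.3, 0.5]`, the construction assigns all of class 1 to one additional ball and splits class 0 between zero and one additional ball, so that each row of `pi` sums to one and the terminal conditions (A.22) hold.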

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$ this distance is simply

\[
\sum_{i:\, P_i > Q_i} (P_i - Q_i) = \sum_{i:\, Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \Bigl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha, \omega, \beta) \doteq F(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.
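The distance expression quoted above for two distributions on the nonnegative integers is the total variation distance, and the equality of its two forms follows because both distributions sum to one. A minimal sketch (the function name `positive_part_distance` and the finite-list representation of the distributions are assumptions made for illustration):

```python
def positive_part_distance(p, q):
    """Sum of (p_i - q_i) over indices where p_i > q_i.

    p and q are finite lists of probabilities; the shorter one is padded
    with zeros so that both are defined on the same support.
    """
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    return sum(a - b for a, b in zip(p, q) if a > b)
```

Calling the function as `positive_part_distance(p, q)` and as `positive_part_distance(q, p)` should give the same value, equal to half the $\ell_1$ distance between `p` and `q`.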

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\pi^{(n)}_{k,j} \le \varepsilon
\tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]


Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\pi^{(n)}_{k,j}
&= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{k,j}\log\Bigl(\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\Bigr) - \pi^{(n)}_{k,j} + P_j(\beta)\Bigr] + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta) \\
&\le \delta A + \delta\sum_{j \ge m}(e^{j/\delta} - 1)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta,
\]

so that $\pi \in F(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*}_{(k)}\,\|\,P(\beta)) \le \liminf_n \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)
\]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence, we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{k,j} \to \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{k,j}, \quad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.
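The expression $\beta - \lambda + \lambda\log(\lambda/\beta)$ used in this argument is the relative entropy of a Poisson distribution with mean $\lambda$ with respect to $P(\beta)$. The sketch below checks that identity numerically and compares it with one other distribution of the same mean; it does not prove that the Poisson distribution attains the infimum. The function names and the truncation at 200 terms are assumptions made for illustration.

```python
import math

def log_poisson_pmf(i, lam):
    """log P_i(lam) for the Poisson distribution with mean lam."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def poisson_relative_entropy(lam, beta, terms=200):
    """D(P(lam) || P(beta)) by direct (truncated) summation."""
    total = 0.0
    for i in range(terms):
        log_p = log_poisson_pmf(i, lam)
        total += math.exp(log_p) * (log_p - log_poisson_pmf(i, beta))
    return total

def closed_form(lam, beta):
    """The expression beta - lam + lam * log(lam / beta) quoted in the proof."""
    return beta - lam + lam * math.log(lam / beta)

def point_mass_relative_entropy(j, beta):
    """D(delta_j || P(beta)) = -log P_j(beta), for a point mass at j."""
    return -log_poisson_pmf(j, beta)
```

For instance, a point mass at 1 has mean 1, and `point_mass_relative_entropy(1, 2.0)` is larger than `closed_form(1.0, 2.0)`, consistent with the closed form being the smallest relative entropy among distributions with that mean.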

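Thus

\[
\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))
&\ge \liminf_n \sum_{k:\, \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)})) \\
&\ge \sum_{k:\, \alpha_k > 0} \alpha_k D(\pi^{(k)}\,\|\,P(\beta)) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}
\]

as desired.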

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints, the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{m,n} > 0$, that $\tilde\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\tilde\pi_{k,i-k}, \qquad \tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\tilde\pi_{k,j},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha, \tilde\omega, \tilde\beta)$. Next, for any $\eta > 0$ we may define

\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1 - \eta), \qquad \hat\omega \doteq (\omega - \eta\tilde\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \hat\omega, \hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha, \omega, \beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \tilde\omega, \tilde\beta)$, and hence for $(\alpha, \hat\omega, \hat\beta)$.


We may now let π be a feasible point for (α ω β) and finally formπ = ηπ+ (1minus η)π By construction π is a feasible solution correspondingto the original constraints (αωβ) and in addition πmn gt 0 As discussedin the proof of Lemma A6 we may take π to have finite support so that πalso has finite support

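We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\hat\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\hat\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

$$
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left(\log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)} + 1\right)(\hat\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}\,(\hat\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m \hat\pi_{m,n} \log\frac{\pi^\varepsilon_{m,n}}{P_n(\beta)}
 + \sum_{k\in\mathcal{K}} \alpha_k \sum_{j:(k,j)\ne(m,n)} \hat\pi_{k,j} \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}
 - \sum_{k\in\mathcal{K}} \alpha_k \sum_{j} \pi_{k,j} \log\frac{\pi^\varepsilon_{k,j}}{P_j(\beta)}.
\end{aligned}
$$

The second equality holds because each of $\hat\pi^{(k)}$ and $\pi^{(k)}$ sums to one, so the constant term $+1$ contributes nothing.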

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

$$-\sum_{k} \alpha_k D(\pi^{(k)} \,\|\, P(\beta)) \le 0.$$

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \hat\pi_{k,j} \log P_j(\beta)$, which is finite because $\hat\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m+n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m+n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m+n \in \mathcal{I}(\omega)$.
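The boundary argument in this proof can be illustrated numerically. The following is a minimal sketch, assuming a single three-point distribution in place of the family $\pi^{(k)}$ and an arbitrary reference distribution in place of $P(\beta)$; the names `Q`, `P`, `P_HAT` and their values are hypothetical and chosen only for illustration. It shows the derivative of the relative entropy along the segment toward an interior point diverging to $-\infty$ as $\varepsilon \downarrow 0$, so that a point with a zero entry cannot be a minimizer.

```python
import math

def rel_entropy(p, q):
    """Relative entropy D(p || q) with the convention 0 * log 0 = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def mixture(eps, p_hat, p):
    """The perturbed point pi^eps = eps * p_hat + (1 - eps) * p."""
    return [eps * a + (1 - eps) * b for a, b in zip(p_hat, p)]

# Hypothetical three-point example (not taken from the paper).
Q = [0.2, 0.3, 0.5]      # reference distribution, standing in for P(beta)
P = [0.0, 0.4, 0.6]      # feasible point with a zero entry
P_HAT = [0.1, 0.4, 0.5]  # feasible point that is positive at that entry

def f(eps):
    """Objective evaluated along the segment from P to P_HAT."""
    return rel_entropy(mixture(eps, P_HAT, P), Q)

def f_prime(eps):
    """Derivative of f for eps > 0: sum of (log(pi^eps / q) + 1) * (p_hat - p)."""
    pe = mixture(eps, P_HAT, P)
    return sum((math.log(x / y) + 1) * (a - b)
               for x, y, a, b in zip(pe, Q, P_HAT, P))

if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-8):
        # The derivative decreases without bound; f(eps) - f(0) is negative.
        print(eps, f_prime(eps), f(eps) - f(0.0))
```

The divergence comes entirely from the entry where `P` vanishes, which mirrors the role of the first term in the last display of the proof.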

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

$$
\pi^*_{k,j} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j+k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
$$

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k+j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

$$
L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{k+j\in\mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)}
 + \sum_{k\in\mathcal{K}} z_k \alpha_k \left(1 - \sum_{k+l\in\mathcal{I}} \pi_{k,l}\right)
 + \sum_{i\in\mathcal{I}} w_i \left(\omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k}\right)
$$

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

$$\pi^*_{k,j} = P_j(\beta)\, e^{z_k - 1 + w_{k+j}}.$$

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
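For readers who want the intermediate step, the stationarity condition used in the proof of Theorem A.9 can be written out as follows. This is only a worked restatement of the Lagrangian given in that proof, with no new assumptions: differentiating with respect to a single $\pi_{k,j}$ with $k+j \in \mathcal{I}$ picks out one term from each of the three sums.

```latex
\frac{\partial L}{\partial \pi_{k,j}}
  = \alpha_k\left(\log\frac{\pi_{k,j}}{P_j(\beta)} + 1\right)
    - z_k\,\alpha_k - w_{k+j}\,\alpha_k = 0
\quad\Longrightarrow\quad
\log\frac{\pi_{k,j}}{P_j(\beta)} = z_k - 1 + w_{k+j}
\quad\Longrightarrow\quad
\pi_{k,j} = \underbrace{e^{z_k-1}}_{D_k}\, P_j(\beta)\, \underbrace{e^{w_{k+j}}}_{W_{k+j}} .
```

The common factor $\alpha_k$ cancels because $\alpha_k > 0$ for $k \in \mathcal{K}$, which is why the multipliers enter only through the product form $D_k W_{k+j}$.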

We now give the corresponding theorem for the exponential case.

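Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\, i\le I}$ such that the minimizer takes the form

$$
\pi^*_{k,j} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k+j \le I,\ k+j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k+j > I, \\
0, & \text{otherwise.}
\end{cases}
$$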

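Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$, define

$$\beta^{(M)} = \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\,\pi^*_{k,l}, \qquad \eta^{(M)}_k = \sum_{l\le M} \pi^*_{k,l},$$

and consider the problem of minimizing

$$\sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)}$$

subject to the constraints

$$
\begin{aligned}
\sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{k,l} &= \beta^{(M)}, \\
\sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{k,l} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k\le i,\ k\in\mathcal{K}} \alpha_k \pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{aligned}
$$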

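By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the proof of Theorem A.9, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

$$
\begin{aligned}
L^{(M)} ={}& \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)}
 + y^{(M)} \left(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{k,l}\right) \\
& + \sum_{k\in\mathcal{K}} z^{(M)}_k \alpha_k \left(\eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{k,l}\right)
 + \sum_{i\in\mathcal{I}} w^{(M)}_i \left(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k \pi_{k,i-k}\right).
\end{aligned}
$$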

Since the derivative of $L^{(M)}$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

$$\pi^*_{k,j} = P_j(\beta)\, e^{\,j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}$$

for all $j < M$ and $k+j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$, note that

$$\frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\, e^{y^{(M)}},$$

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k+j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$; here we use $P_j(\beta)\rho^j = e^{(\rho-1)\beta} P_j(\rho\beta)$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\,\pi^*_{k,j}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{*(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} \doteq \pi^*_{k,j}$ and $\omega^{(k)} = (\omega_{k,0},\ldots,\omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

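Theorem A.11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

$$
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{aligned}
$$

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.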

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\bar\gamma_{k,j}(x) \doteq \gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\bar\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

$$\psi_i(x) = \sum_{k=0}^{i} \alpha_k \bar\psi_{k,i-k}(x).$$

To see that the $\gamma_i$ are valid occupancy curves, note that

$$\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(x) = 1.$$

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

$$\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{k,j}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,$$

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k,0}(0) = 1$ and $\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k,0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

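First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

$$
\begin{aligned}
\bar\psi_{k,0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k P_i(\rho_k\beta_k)\bigr) \left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k P_i(\rho\beta)\bigr) \left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{i} \right].
\end{aligned}
$$

Here we have used that $\omega_{k,i} = \pi^*_{k,i} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0 \bar\psi_{0,0}$ is

$$
\begin{aligned}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta)\, \frac{i!}{(i-k)!\,\beta^k} \left(1 - \frac{x}{\beta}\right)^{i-k} \right] \qquad \text{(A.25)} \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \bar\psi_{k,0}(x),
\end{aligned}
$$

where the second line uses $P_i(\rho\beta)\, i!/\bigl((i-k)!\,\beta^k\bigr) = \rho^k P_{i-k}(\rho\beta)$.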

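Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, we have

$$
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k,0}^{(i-k)}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{aligned}
$$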

Similarly, we obtain

$$
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{aligned}
$$

so that the extremals satisfy the simple relation

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad \text{(A.26)}$$

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

$$\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \qquad \text{(A.27)}$$

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

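In the polynomial case, similar computations show that

$$(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\, \bar\psi_{k,0}(x) \qquad \text{(A.28)}$$

and that

$$\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \qquad \text{(A.29)}$$

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).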

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

$$\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial \theta_i}(\psi(x'),\theta(x')) = C_i$$

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists, for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

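Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

$$
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(\beta) \log|\psi_0^{(i)}(\beta)| \right] - \sum_{k=0}^{K} \alpha_k \log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(\beta) \log \frac{|\psi_0^{(k+j)}(\beta)|}{|\psi_0^{(k)}(0)|\, e^{-\beta}},
\end{aligned}
$$

where the second line uses $\gamma_k(0) = \alpha_k$, $\sum_k \alpha_k = 1$ and $\sum_j \bar\gamma_{k,j}(\beta) = 1$.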

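The fact that $\bar\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \bar\psi_{k,0}(x)$, so that

$$\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j} \psi_{k,0}^{(j)}\!\left(x\beta_k/\beta\right) e^{\beta} = \frac{\bar\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}.$$

Then

$$J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\bar\gamma_{k,\cdot}(\beta) \,\|\, P(\beta)).$$

By construction, the endpoints $\bar\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).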

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$, restricted to $\mathcal{O}(\alpha,\omega,\beta)$, is a convex function.

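For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon \doteq (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

$$G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\,dx.$$

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.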
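The convexity on which Lemma A.15 rests can be spot-checked numerically. The sketch below is an illustration rather than a proof: it draws random pairs of probability vectors (a hypothetical stand-in for the pointwise values of $\theta(x)$ and $\gamma(x)$) and records the largest violation of the joint convexity inequality for $(\theta,\gamma) \to D(\theta\|\gamma)$. The function names and the sampling scheme are assumptions made for this example.

```python
import math
import random

def rel_entropy(theta, gamma):
    """Relative entropy D(theta || gamma) for strictly positive gamma."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def random_simplex(n, rng):
    """A random strictly positive probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def convexity_gap(n=5, trials=1000, seed=0):
    """Largest observed value of D(mix) - mix of D; it should not exceed 0."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        t1, g1, t2, g2 = (random_simplex(n, rng) for _ in range(4))
        lam = rng.random()
        tm = [lam * a + (1 - lam) * b for a, b in zip(t1, t2)]
        gm = [lam * a + (1 - lam) * b for a, b in zip(g1, g2)]
        lhs = rel_entropy(tm, gm)
        rhs = lam * rel_entropy(t1, g1) + (1 - lam) * rel_entropy(t2, g2)
        worst = max(worst, lhs - rhs)
    return worst

if __name__ == "__main__":
    print("largest convexity violation:", convexity_gap())
```

Integrating the pointwise inequality over $[0,\beta]$ is what gives the convexity of $G$ along the segment $\gamma^\varepsilon$.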

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

$$J(\gamma) \le J(\tilde\gamma).$$

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

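It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

$$\eta(x) \doteq \tilde\psi(x) - \psi(x),$$

we may write

$$G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx. \qquad \text{(A.30)}$$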

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k,0}(x)$ and $\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

$$\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) \ge \alpha_i \bar\gamma_{i,0}(x).$$

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and $\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero, where $\tilde\theta$ may not exist.

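The partial derivative of the integrand of (A.30) is

$$\frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \psi_i}\right|_{\gamma^\varepsilon(x)} \eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \theta_i}\right|_{\gamma^\varepsilon(x)} \dot\eta_i(x).$$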

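The partial derivatives of $L$ are (in mixed notation)

$$
\begin{aligned}
\frac{\partial L}{\partial \psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial \theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{aligned}
$$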

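The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

$$\left|\frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\right| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].$$

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.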

We are now free to differentiate under the integral sign, so that

$$G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx$$

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

$$\int_0^x \left.\frac{\partial L}{\partial \psi_i}\right|_{\gamma} dx' = \int_0^x \left.\left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma} dx'$$

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

$$
\begin{aligned}
G'_+[0] &= \int_0^\beta \left\{ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right\} dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x')\,dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\, \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i \bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0.
\end{aligned}
$$

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer, even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

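Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

$$J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx.$$

It then follows that

$$
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_{n} \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned}
$$

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.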

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Léonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithms 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting



This definition is based on the one pertaining to Bernstein's theorem, which characterizes Laplace transforms; see [15]. However, our definition differs in that it considers only a finite interval $[a,b]$.

Lemma A.2. The function $\psi_0(x)$ given in (A.8) is completely monotone on $[0,\beta]$. Moreover, the inequality in (A.11) is strict for $x \in [0,\beta)$ and $i = 0,\ldots,I$. In the case $C > 0$, this can be strengthened to all $i = 0,1,\ldots$.

Proof. Clearly $\psi_0(x)$ is infinitely differentiable. Now suppose first that $C > 0$. The $i$th derivative of $\psi_0$ is

$$(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} + \sum_{k=i}^{I} \bigl(\omega_k - C P_k(\rho\beta)\bigr)\, \beta^{-i} \frac{k!}{(k-i)!} \left(1 - \frac{x}{\beta}\right)^{k-i}$$

for $i = 0,\ldots,I$, and

$$(-1)^i \psi_0^{(i)}(x) = \rho^i C e^{-\rho x} \qquad \text{(A.12)}$$

for $i > I$. It is clear that $a(x) \doteq (-1)^{I+1} \psi_0^{(I+1)}(x)$ is completely monotone on $[0,\infty)$. Moreover, $(-1)^i a^{(i)}(x) > 0$, $x \in [0,\infty)$, $i = 0,1,\ldots$. We deduce that $\varphi(x) = (-1)^I \psi_0^{(I)}(x)$ must be monotonically strictly decreasing and that

$$
\begin{aligned}
\varphi(\beta) = (-1)^I \psi_0^{(I)}(\beta) &= \rho^I C e^{-\rho\beta} + \left(\omega_I - C e^{-\rho\beta} \frac{(\rho\beta)^I}{I!}\right) \beta^{-I} I! \\
&= \left(\frac{\beta^I}{I!}\right)^{-1} \omega_I \ge 0,
\end{aligned}
$$

so that $\varphi(x) > 0$ on $[0,\beta)$. It follows that $\varphi(x)$ is completely monotone on $[0,\beta]$ and that the derivative constraint is strict on $[0,\beta)$. Proceeding inductively to $I-1$ and beyond, we arrive at the lemma.

In the polynomial case $C = 0$, the argument proceeds similarly on noting that

$$(-1)^I \psi_0^{(I)}(x) = \left(\frac{\beta^I}{I!}\right)^{-1} \omega_I > 0 \qquad \text{for all } x \in [0,\beta].$$

Proof of Theorem A.1. The terminal constraints can be verified immediately by inspection. The initial constraints follow from the construction of $C$ in (2.6), since

$$\psi_0(0) = C \left(1 - \sum_{i=0}^{I} P_i(\rho\beta)\right) + \sum_{i=0}^{I} \omega_i = 1.$$

A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$, and corresponding $\psi_i$, obtained by extending (A.9) to all $i$:

$$\gamma_i(x) = \frac{x^i}{i!} (-1)^i \psi_0^{(i)}(x), \qquad i = 0,1,\ldots,I,I+1,\ldots. \qquad \text{(A.13)}$$

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x)$, and thus

$$\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x) = \psi_0(0) = 1.$$

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\gamma(x) \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

$$
\begin{aligned}
\theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i} \dot\gamma_k(x) \qquad \text{(A.14)} \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!} (-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!} (-1)^{k+1} \psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!} (-1)^{i+1} \psi_0^{(i+1)}(x) \ge 0
\end{aligned}
$$

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

$$\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.$$

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

$$\sum_{i=0}^{I} \theta_i(x) \le 1$$

over an arbitrary subinterval of $[0,\beta]$. Thus the finite-dimensional occupancy path $\gamma$ is valid.

Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

$$
\begin{aligned}
\theta_{I+}(x) &= 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} P_i(\rho x), \\
1 - \psi_I(x) &= \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} P_i(\rho x),
\end{aligned}
$$

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1 - \psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

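From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\ i \ge 0. \qquad \text{(A.15)}$$

Then

$$\frac{d}{dx}\left[-\log\frac{\theta_i}{\gamma_i}\right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta), \qquad \text{(A.16)}$$

verifying (A.7). Likewise, to verify (A.8) we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\ldots,I$, so that (A.16) implies (A.10).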
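The construction in this proof can be checked numerically in the simplest exponential case $I = 0$, where only the terminal fraction $\omega_0$ of empty urns is prescribed. The sketch below is an illustration under stated assumptions: it does not use (2.6) directly (that equation is not reproduced in this appendix) but instead imposes the two normalizations $\psi_0(0) = 1$ and $\psi_0^{(1)}(0) = -1$ quoted above, which for $I = 0$ reduce to $\rho C = 1$ and $(1 - e^{-\rho\beta})/\rho = 1 - \omega_0$. The function names are hypothetical.

```python
import math

def solve_twist(omega0, beta):
    """Bisection for rho > 0 with (1 - exp(-rho*beta)) / rho = 1 - omega0.

    Returns (rho, C) with C = 1 / rho, as forced by psi_0'(0) = -1 when I = 0.
    """
    target = 1.0 - omega0
    if not 0.0 < target < beta:
        raise ValueError("exponential case needs 0 < 1 - omega0 < beta")
    h = lambda rho: -math.expm1(-rho * beta) / rho  # decreasing in rho
    lo, hi = 1e-12, 1.0
    while h(hi) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    return rho, 1.0 / rho

def poisson(i, mean):
    """Poisson probability P_i(mean)."""
    return math.exp(-mean) * mean ** i / math.factorial(i)

def gamma(i, x, omega0, beta, rho, C):
    """gamma_i(x) = x^i / i! * (-1)^i * psi_0^{(i)}(x), as in (A.13), for I = 0."""
    if i == 0:
        return C * math.exp(-rho * x) + omega0 - C * math.exp(-rho * beta)
    return C * poisson(i, rho * x)

def theta(i, x, rho, C):
    """theta_i(x) = x^i / i! * (-1)^{i+1} * psi_0^{(i+1)}(x), as in (A.14)."""
    return rho * C * poisson(i, rho * x)

if __name__ == "__main__":
    omega0, beta = 0.3, 2.0
    rho, C = solve_twist(omega0, beta)
    x = 1.3
    print("rho =", rho, "C =", C)
    print("sum of gamma_i:", sum(gamma(i, x, omega0, beta, rho, C) for i in range(80)))
    print("sum of theta_i:", sum(theta(i, x, rho, C) for i in range(80)))
```

For $i \ge 1$ the ratio `theta / gamma` equals `rho` exactly, which is the constant ratio $\theta_{I+}/(1 - \psi_I) = \rho$ used above.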

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.
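This interpretation can be seen in a small computation for the case $I = 0$. The following sketch again assumes the normalizations $\psi_0(0) = 1$ and $\psi_0^{(1)}(0) = -1$, which for $I = 0$ give $(1 - e^{-\rho\beta})/\rho = 1 - \omega_0$; the helper name `twist_for_empty_fraction` is hypothetical. Asking for more empty urns than the typical value $e^{-\beta}$ should give $\rho > 1$ (balls pile into urns that are already occupied), and asking for fewer should give $\rho < 1$.

```python
import math

def twist_for_empty_fraction(omega0, beta):
    """Solve (1 - exp(-rho*beta)) / rho = 1 - omega0 for rho > 0 by bisection."""
    target = 1.0 - omega0
    if not 0.0 < target < beta:
        raise ValueError("exponential case needs 0 < 1 - omega0 < beta")
    h = lambda rho: -math.expm1(-rho * beta) / rho  # decreasing in rho
    lo, hi = 1e-12, 1.0
    while h(hi) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    beta = 2.0
    typical = math.exp(-beta)  # zero-cost fraction of empty urns
    for omega0 in (0.05, typical, 0.3):
        rho = twist_for_empty_fraction(omega0, beta)
        print(f"omega0 = {omega0:.4f}  rho = {rho:.4f}")
```

In this one-constraint setting $\rho$ is an increasing function of $\omega_0$, with $\rho = 1$ exactly at the Poisson value $\omega_0 = e^{-\beta}$.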

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

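Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

$$(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \qquad \text{for } i > I,$$

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

$$
\begin{aligned}
\sum_{i=0}^{\infty} \gamma_i(x) &= 1, & \sum_{i=0}^{\infty} i\,\gamma_i(x) &\le B_0, \\
\sum_{i=0}^{\infty} -\dot\psi_i(x) &= 1, & -\dot\psi_i(x) &= \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\end{aligned}
$$

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0,\ldots,\gamma_I,\gamma_{I+}\}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{\infty} \bigl[\gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)|\bigr]. \qquad \text{(A.17)}$$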

Proof. For all $i > I$ and $x \in [0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

$$\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).$$

Then for each $x \in [0,\beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

$$
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
$$

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

$$J(\gamma) = \lim_{\varepsilon\downarrow 0} \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.$$

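Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

$$
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log|\psi_0^{(i)}(x)|\,dx \qquad \text{(A.18)} \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta-\varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr) \log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
$$

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\{\gamma_i(x)\}$, since for $i > I$,

$$-\dot\psi_i(x) \log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)\,[\log C + i\log\rho - \rho x],$$

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), and noting that $\dot\psi_i - \dot\psi_{i-1} = \dot\gamma_i$, the integral is

$$\sum_{i=0}^{\infty} \bigl[\gamma_i \log|\psi_0^{(i)}|\bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.$$

The lemma follows on taking limits as $\varepsilon \downarrow 0$.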

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \quad i = 0, \ldots, I-1,
\]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \bigl[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\bigr]. \tag{A.19}
\]


Proof. The cost is obtained by integrating the reduced cost function (A.9) from $0$ to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\|P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi \in F(1,\omega,\beta)} D(\pi\|P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr) + z\Bigl(\beta - \sum_{i=0}^{\infty} i\pi_i\Bigr),
\]

where $z = \log\rho$ and $y = \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i = \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$ and the strict convexity of $x \log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \to x\log x - x[\log P_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[
\pi_i \log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i \log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]


Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in F(1,\omega,\beta)} D(\pi\|P(\beta)) &= \inf_{\pi \in F(1,\omega,\beta)} L(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
&= D(\pi^*\|P(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

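Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\|P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C P_i(\rho\beta), & i > I.
\end{cases}
\]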

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)(\log C + (1-\rho)\beta) + \Bigl(\beta - \sum_{i=0}^{I} i\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
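The explicit cost formula can be checked numerically against a direct evaluation of the relative entropy. The following Python sketch does this for $I = 1$ under stated assumptions: the values of $C$, $\rho$ and $\beta$ are picked by hand rather than obtained from the fixed point problem (2.6), and the helper names (`build_terminal_distribution`, `explicit_cost`, and so on) are hypothetical, not taken from the paper. The terminal values $\omega_0, \omega_1$ are chosen so that the distribution with tail $C P_i(\rho\beta)$ has total mass one and mean $\beta$.

```python
import math


def log_poisson_pmf(i, mean):
    """Log of the Poisson probability P_i(mean) = exp(-mean) * mean**i / i!."""
    return -mean + i * math.log(mean) - math.lgamma(i + 1)


def build_terminal_distribution(C, rho, beta, n_terms=100):
    """Illustrative helper for I = 1 (an assumption of this sketch).

    The tail is pi_i = C * P_i(rho * beta) for i > 1, truncated at n_terms.
    omega_0 and omega_1 are then fixed by requiring total mass 1 and mean beta.
    """
    tail = [C * math.exp(log_poisson_pmf(i, rho * beta)) for i in range(2, n_terms)]
    tail_mass = sum(tail)
    tail_mean = sum(i * p for i, p in zip(range(2, n_terms), tail))
    omega1 = beta - tail_mean
    omega0 = 1.0 - tail_mass - omega1
    return [omega0, omega1] + tail


def relative_entropy_to_poisson(pi, beta):
    """Direct evaluation of D(pi || P(beta)) for a finitely supported pi."""
    return sum(p * (math.log(p) - log_poisson_pmf(i, beta))
               for i, p in enumerate(pi) if p > 0)


def explicit_cost(omega, C, rho, beta):
    """Closed-form cost: head term plus the two tail terms in log C and log rho."""
    head = sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return head + tail_mass * (math.log(C) + (1.0 - rho) * beta) + tail_mean * math.log(rho)


if __name__ == "__main__":
    C, rho, beta = 1.1, 0.8, 1.0  # hand-picked values, not solved from (2.6)
    pi = build_terminal_distribution(C, rho, beta)
    print(relative_entropy_to_poisson(pi, beta), explicit_cost(pi[:2], C, rho, beta))
```

The two printed values should agree up to rounding, because $\log(C P_i(\rho\beta)/P_i(\beta)) = \log C + (1-\rho)\beta + i\log\rho$ for every tail index.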

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max \mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{ki}\}_i$, where the function $\gamma_{ki} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1 with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point, we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

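The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)}\|P(\beta)) \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\, k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22}
\]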

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \quad k + j > I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega \in S_{\bar I}$, where $\bar I > I$. Any feasible point for constraints $(\alpha, \bar\omega, \beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$ and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
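The ordered filling construction can be sketched in a few lines of Python. This is only an illustration under assumptions not fixed by the text: the function name `ordered_filling` is hypothetical, the constraints are taken to be polynomial (so the $\omega_i$ sum to one and every class index satisfies $k \le I$), and within each occupancy level the classes are drawn on in increasing order of $k$, which is one admissible choice among many.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints (illustrative sketch).

    alpha: dict {k: alpha_k} with alpha_k > 0 summing to 1 (initial fractions).
    omega: list [omega_0, ..., omega_I] summing to 1 (terminal fractions).
    Returns pi as a dict {k: [pi_k0, ..., pi_k,I-k]} satisfying
    sum over k <= i of alpha_k * pi_{k,i-k} == omega_i for every i.
    """
    top = len(omega) - 1
    classes = sorted(alpha)
    pi = {k: [0.0] * (top - k + 1) for k in classes}
    # Weighted mass alpha_k * (1 - sum_j pi_kj) not yet assigned in class k.
    remaining = dict(alpha)
    for i, target in enumerate(omega):
        need = target
        for k in classes:
            if k > i:
                break  # class k urns cannot end with fewer than k balls
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > tol:
            # The i-th monotonicity condition fails: not enough mass available.
            raise ValueError(f"constraints infeasible at occupancy level {i}")
    return pi


if __name__ == "__main__":
    alpha = {0: 0.5, 1: 0.5}
    omega = [0.2, 0.3, 0.5]
    print(ordered_filling(alpha, omega))
```

For the sample input, class 0 supplies all of the mass needed at levels 0 and 1, and class 1 supplies the mass at level 2, so both distributions are used up exactly at the last step.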

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A = \Bigl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi^{(k)}\|P(\beta)) \le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha,\omega,\beta) = F(\alpha,\omega,\beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\pi^{(n)}_{kj} \le \varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
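Both facts used here are easy to check numerically. The Python sketch below (function names are illustrative, not from the paper) evaluates the gap in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$, which should be nonnegative and vanish at $x = e^y$, and compares a truncated series for the Poisson exponential moment with the closed form $e^{\beta e^{1/\delta} - \beta}$.

```python
import math


def young_gap(x, y):
    """Right side minus left side of x*y <= (x log x - x + 1) + (e^y - 1)."""
    x_log_x = x * math.log(x) if x > 0 else 0.0  # convention 0 log 0 = 0
    return (x_log_x - x + 1.0) + (math.exp(y) - 1.0) - x * y


def poisson_exp_moment(beta, delta, n_terms=400):
    """Truncated series for sum_j exp(j / delta) * P_j(beta)."""
    total = 0.0
    for j in range(n_terms):
        # Work in log space so that large factorials do not overflow.
        log_term = j / delta - beta + j * math.log(beta) - math.lgamma(j + 1)
        total += math.exp(log_term)
    return total


def poisson_exp_moment_closed_form(beta, delta):
    """Closed form exp(beta * e^(1/delta) - beta)."""
    return math.exp(beta * math.exp(1.0 / delta) - beta)


if __name__ == "__main__":
    print(young_gap(2.0, 0.3), young_gap(math.exp(0.3), 0.3))
    print(poisson_exp_moment(1.5, 0.5), poisson_exp_moment_closed_form(1.5, 0.5))
```

The truncation length `n_terms` is an assumption of the sketch; it needs to be well beyond $\beta e^{1/\delta}$ for the series to have converged.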


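Taking $y = j/\delta$ and $x = \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\pi^{(n)}_{kj} &= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{kj}\log\Bigl(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\Bigr) - \pi^{(n)}_{kj} + P_j(\beta)\Bigr] + \delta\sum_{j \ge m} (e^{j/\delta} - 1)P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)}\|P(\beta)) + \delta\sum_{j \ge m} (e^{j/\delta} - 1)P_j(\beta) \\
&\le \delta A + \delta\sum_{j \ge m} (e^{j/\delta} - 1)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]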

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k} \alpha_k\beta_k = \lim_{n} \sum_{k} \alpha_k\beta^{(n)}_k = \lim_{n} \beta = \beta,
\]

so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G = \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)}\|P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)}\|P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\|\eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*(k)}\|P(\beta)) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)}\|P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} = G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \beta^{(n)}, \omega^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k = \sum_{j=0}^{\infty} j\pi^{(n)}_{kj} \to \beta_k = \sum_{j=0}^{\infty} j\pi_{kj}, \quad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\|P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\|P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.
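The lower bound $\beta - \lambda + \lambda\log(\lambda/\beta)$ and the resulting divergence can be illustrated numerically. The sketch below relies on the standard fact, not spelled out in the text, that the bound is attained by the Poisson distribution with mean $\lambda$; the function names and the sample values are hypothetical. It compares a direct series evaluation of $D(P(\lambda)\|P(\beta))$ with the closed form, and then shows that $\alpha_k$ times the bound grows without limit when $\alpha_k \to 0$ while the product $\alpha_k\lambda$ stays fixed.

```python
import math


def log_poisson_pmf(j, mean):
    """Log of the Poisson probability P_j(mean)."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)


def poisson_relative_entropy(lam, beta, n_terms=400):
    """Truncated series for D(P(lam) || P(beta))."""
    total = 0.0
    for j in range(n_terms):
        log_p = log_poisson_pmf(j, lam)
        total += math.exp(log_p) * (log_p - log_poisson_pmf(j, beta))
    return total


def entropy_lower_bound(lam, beta):
    """Infimum of D(pi || P(beta)) over distributions pi with mean lam."""
    return beta - lam + lam * math.log(lam / beta)


def weighted_class_cost(alpha_k, mass, beta):
    """alpha_k times the lower bound when the class mean is mass / alpha_k."""
    return alpha_k * entropy_lower_bound(mass / alpha_k, beta)


if __name__ == "__main__":
    print(poisson_relative_entropy(2.5, 1.0), entropy_lower_bound(2.5, 1.0))
    for alpha_k in [1e-1, 1e-2, 1e-3, 1e-4]:
        # The product alpha_k * beta_k is held at 0.1 while alpha_k shrinks.
        print(alpha_k, weighted_class_cost(alpha_k, 0.1, 1.0))
```

The growth in the last loop is only logarithmic in $1/\alpha_k$, but it is unbounded, which is all the argument requires.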

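Thus

\[
\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k\alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}_{(k)}\|P(\beta^{(n)})) &\ge \liminf_{n} \sum_{k : \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}_{(k)}\|P(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k D(\pi^{(k)}\|P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.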

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\bar\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\bar\pi_{mn} > 0$, that $\bar\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\bar\pi$ has finite support. If we define

\[
\bar\omega_i = \sum_{k=0}^{i} \alpha_k\bar\pi_{k,i-k}, \qquad \bar\beta = \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\bar\pi_{kj},
\]

then $\bar\pi$ is a feasible solution for the constraints $(\alpha, \bar\omega, \bar\beta)$. Next, for any $\eta > 0$ we may define

\[
\tilde\beta = (\beta - \eta\bar\beta)/(1 - \eta), \qquad \tilde\omega = (\omega - \eta\bar\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \tilde\omega, \tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \bar\omega, \bar\beta)$, and hence for $(\alpha, \tilde\omega, \tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha, \tilde\omega, \tilde\beta)$, and finally form $\hat\pi = \eta\bar\pi + (1 - \eta)\tilde\pi$. By construction, $\hat\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\hat\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\hat\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\hat\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\hat\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Bigl(\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} + 1\Bigr)(\hat\pi_{kj} - \pi_{kj}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}\,(\hat\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\hat\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{P_n(\beta)} + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \hat\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k} \alpha_k D(\pi^{(k)}\|P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \hat\pi_{kj}\log P_j(\beta)$, which is finite because $\hat\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} + \sum_{k \in \mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{k+l \in \mathcal{I}} \pi_{kl}\Bigr) + \sum_{i \in \mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise}.
\end{cases}
\]

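Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} = \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\pi^*_{kl}, \qquad \eta^{(M)}_k = \sum_{l \le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\, k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\begin{aligned}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\, k+l \in \mathcal{I}} l\pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\, k+l \in \mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \quad k \in \mathcal{K}, \\
\sum_{k \le i,\, k \in \mathcal{K}} \alpha_k\pi_{k,i-k} &= \omega_i, \quad i \in \mathcal{I}.
\end{aligned}
\]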

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the proof of the previous theorem, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\, k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} + y^{(M)}\Bigl(\beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\, k+l \in \mathcal{I}} l\pi_{kl}\Bigr) \\
&\quad + \sum_{k \in \mathcal{K}} z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k - \sum_{l \le M,\, k+l \in \mathcal{I}} \pi_{kl}\Bigr) + \sum_{i \in \mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k \le i,\, k \in \mathcal{K}} \alpha_k\pi_{k,i-k}\Bigr).
\end{aligned}
\]

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\,e^{jy^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k = \sum_j j\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} = \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega^{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega^{(k)}, \beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega^{(k)}, \beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \quad i = 0, \ldots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define the rescaled curves $\tilde\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$ and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\tilde\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{ki} - C_kP_i(\rho_k\beta_k))\Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^i \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{ki} - C_kP_i(\rho\beta))\Bigl(1 - \frac{x}{\beta}\Bigr)^i \\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)P_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^i\Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_kP_i(\rho\beta)W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Bigr] \\
&= \alpha_0C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)P_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^j\Bigr] \\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\tilde\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{i-k}}{dx^{i-k}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}(-1)^i\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\frac{\beta_k}{\beta}\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{i-k+1}}{dx^{i-k+1}}\tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\tilde\psi_{k0}(x) \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_kD_k}{\alpha_0D_0}\,\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

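Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in that proof to satisfy the conditions of Lemma A.3. Writing $\tilde\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ and $\tilde\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the rescaled subproblem curves, the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \Bigl[\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K} \alpha_k\log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^j\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k\cdot}(\beta)\|P(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).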

A.8. Extremal curves have globally minimal cost. In this section, we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) = \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] = J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x)\|\gamma^\varepsilon(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, where $\bar\gamma$ is the competing path, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

\[
\eta(x) = \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx. \tag{A.30}
\]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$ for the competing path $\bar\gamma$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of $0$. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma^k_0(x)$ and $\theta^k_0(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma^k_{i-k}(x) \ge \alpha_i \gamma^i_0(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1-\varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

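The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^{\varepsilon}(x)} \eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^{\varepsilon}(x)} \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\qquad
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left| \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) \right| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.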

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx
\]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[
\int_0^{x} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma} dx' = \int_0^{x} \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\bigg|_{\gamma} dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^{\beta} \left[ \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \right] dx \\
&= \int_0^{\beta} \dot\eta(x) \cdot \left( -\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \right) dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i \bigl( \eta_i(0) - \eta_i(\beta) \bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1-\lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

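Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha,\omega,\beta)
&\le \liminf_{n} \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\,dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.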

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


A similar computation, also using (2.6), shows that $\psi_0^{(1)}(0) = -1$, a fact that we will need shortly.

To establish that the given functions are valid occupancy curves, it is useful to introduce the infinite sequence of functions $\gamma_i$ and corresponding $\psi_i$ obtained by extending (A.9) to all $i$:

\[
\gamma_i(x) = \frac{x^i}{i!}(-1)^i \psi_0^{(i)}(x), \qquad i = 0, 1, \ldots, I, I+1, \ldots. \tag{A.13}
\]

As the sum of an exponential and a polynomial, $\psi_0$ has a Taylor series representation of unlimited radius about any point $x$. Then $\psi_0(0) - \psi_0(x) = \sum_{i=1}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x)$, and thus

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} \psi_0^{(i)}(x) = \psi_0(0) = 1.
\]

Since by Lemma A.2 the $\gamma_i$ are nonnegative on $[0,\beta]$, we have $\{\gamma_i(x)\}_{i \ge 0} \in S_\infty$ for all $x$ in that interval. It follows from (A.13) that, for all $i \ge 0$, the rate of decrease of each cumulative occupancy is

\[
\begin{aligned}
\theta_i(x) = -\dot\psi_i(x) &= -\sum_{k=0}^{i} \dot\gamma_k(x) \\
&= \sum_{k=1}^{i} -\frac{x^{k-1}}{(k-1)!}(-1)^k \psi_0^{(k)}(x) + \sum_{k=0}^{i} \frac{x^k}{k!}(-1)^{k+1} \psi_0^{(k+1)}(x) \\
&= \frac{x^i}{i!}(-1)^{i+1} \psi_0^{(i+1)}(x) \ge 0
\end{aligned}
\tag{A.14}
\]

for $x \in [0,\beta]$. Forming the Taylor series representation of $\psi_0^{(1)}$ about $x$, it follows from (A.14) that

\[
\sum_{i=0}^{\infty} \theta_i(x) = \sum_{i=0}^{\infty} -\dot\psi_i(x) = -\psi_0^{(1)}(0) = 1.
\]

The infinite sequence of functions can thus be thought of as a valid infinite-dimensional occupancy path on $[0,\beta]$. Conditions (a) and (b) of Lemma 2.1 are immediate from the expressions for $\gamma_i$ and $\dot\psi_i$ in terms of $\psi_0^{(i)}$, and condition (c) follows by integrating the inequality

\[
\sum_{i=0}^{I} \theta_i(x) \le 1
\]

over an arbitrary subinterval of $[0,\beta]$. Thus, the finite-dimensional occupancy path $\gamma$ is valid.
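The two normalization identities above can be sanity-checked numerically. The sketch below evaluates (A.13) and (A.14) for a hypothetical $\psi_0(x) = Ce^{-\rho x} + (1-C) - (1-C\rho)x$ with $I = 1$. This $\psi_0$, the parameter values and the helper names are assumptions chosen only for illustration (they are not taken from Theorem A.1); the polynomial part is fixed by requiring $\psi_0(0) = 1$ and $\psi_0^{(1)}(0) = -1$.

```python
import math

C, RHO = 0.5, 1.2          # hypothetical twist parameters (illustration only)
B = 1.0 - C * RHO          # linear coefficient forced by psi0'(0) = -1


def psi0_deriv(i, x):
    """i-th derivative of the assumed psi0(x) = C*exp(-RHO*x) + (1 - C) - B*x."""
    expo = C * (-RHO) ** i * math.exp(-RHO * x)
    if i == 0:
        return expo + (1.0 - C) - B * x
    if i == 1:
        return expo - B
    return expo


def gamma(i, x):
    """Occupancy fraction from (A.13): x^i/i! * (-1)^i * psi0^(i)(x)."""
    return x ** i / math.factorial(i) * (-1) ** i * psi0_deriv(i, x)


def theta(i, x):
    """Rate from (A.14): x^i/i! * (-1)^(i+1) * psi0^(i+1)(x)."""
    return x ** i / math.factorial(i) * (-1) ** (i + 1) * psi0_deriv(i + 1, x)


def totals(x, n_terms=80):
    """Truncated sums of gamma_i(x) and theta_i(x) over i = 0..n_terms-1."""
    g = sum(gamma(i, x) for i in range(n_terms))
    t = sum(theta(i, x) for i in range(n_terms))
    return g, t


for x in (0.0, 0.3, 1.0):
    g, t = totals(x)
    print(f"x={x:.1f}  sum gamma = {g:.12f}  sum theta = {t:.12f}")
```

Both truncated sums should agree with 1 up to floating-point error on $[0,1]$, the range on which this particular $\psi_0$ is positive.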


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case, $C > 0$. The $I+$ terms satisfy the simple expressions

\[
\theta_{I+}(x) = 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x),
\]

\[
1 - \psi_I(x) = \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x),
\]

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1-\psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

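From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad x \in (0,\beta),\ i \ge 0. \tag{A.15}
\]

Then

\[
\frac{d}{dx}\left[ -\log\frac{\theta_i}{\gamma_i} \right] = -\frac{\psi_0^{(i+2)}}{\psi_0^{(i+1)}} + \frac{\psi_0^{(i+1)}}{\psi_0^{(i)}} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta), \tag{A.16}
\]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi_0^{(I+2)}/\psi_0^{(I+1)} = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0, \ldots, I$, so that (A.16) implies (A.10).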
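A finite-difference check can make (A.16) concrete. The sketch below reuses a hypothetical $\psi_0(x) = Ce^{-\rho x} + (1-C) - (1-C\rho)x$ with $I = 1$ (an assumed example, not the paper's solution of (2.6)), forms the ratios $\theta_i/\gamma_i$ through (A.15), and compares a central-difference estimate of the left side of (A.16) with its right side. The function names and step size are illustrative choices.

```python
import math

C, RHO = 0.5, 1.2            # hypothetical parameters, illustration only
B = 1.0 - C * RHO            # chosen so that psi0'(0) = -1


def psi0_deriv(i, x):
    """i-th derivative of the assumed psi0(x) = C*exp(-RHO*x) + (1 - C) - B*x."""
    expo = C * (-RHO) ** i * math.exp(-RHO * x)
    if i == 0:
        return expo + (1.0 - C) - B * x
    if i == 1:
        return expo - B
    return expo


def ratio(i, x):
    """theta_i / gamma_i via (A.15): -psi0^(i+1)(x) / psi0^(i)(x)."""
    return -psi0_deriv(i + 1, x) / psi0_deriv(i, x)


def lhs_a16(i, x, h=1e-5):
    """Central-difference estimate of d/dx [ -log(theta_i / gamma_i) ]."""
    return (-math.log(ratio(i, x + h)) + math.log(ratio(i, x - h))) / (2.0 * h)


def rhs_a16(i, x):
    """Right side of (A.16): theta_{i+1}/gamma_{i+1} - theta_i/gamma_i."""
    return ratio(i + 1, x) - ratio(i, x)


for i in (0, 1):
    for x in (0.2, 0.5, 0.9):
        print(i, x, lhs_a16(i, x), rhs_a16(i, x))
```

For $i = I = 1$ the first term on the right is the constant $\rho$, which is the substitution used to verify (A.8).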

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1 - \psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1 - \psi_I$ to give the actual rate at which balls enter these urns. Thus, if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$ they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form, which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

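Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[
(-1)^i \psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \qquad \text{for } i > I,
\]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{\infty} \gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\gamma_i(x) \le B_0,
\]

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x) \cdot \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\]

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0, \ldots, \gamma_I, \gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{\infty} \left[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \right]. \tag{A.17}
\]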

Proof. For all $i > I$ and $x \in [0,\beta]$, we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[
\theta_{I+} = 1 - \sum_{i=0}^{I} (-\dot\psi_i) = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1 - \psi_I).
\]

Then for each $x \in [0,\beta]$, the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1 - \psi_I) = \rho$,

\[
\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I) \log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1 - \psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i) \log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}
\]

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$ are uniformly bounded for $x \in [\varepsilon, \beta - \varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

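Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|\,dx + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x) \log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta-\varepsilon} \bigl(\dot\psi_i(x) - \dot\psi_{i-1}(x)\bigr) \log|\psi_0^{(i)}(x)|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$

\[
-\dot\psi_i(x) \log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)\bigl[\log C + i\log\rho - \rho x\bigr],
\]

and similarly for $\dot\psi_i(x) \log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty} \Bigl[ \gamma_i \log|\psi_0^{(i)}| \Bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.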

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0, \ldots, \gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I} \gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,
\]

\[
-\dot\psi_i(x) = \gamma_i(x) \cdot \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0, \ldots, I-1,
\]

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I} \left[ \gamma_i(\beta) \log|\psi_0^{(i)}(\beta)| - \gamma_i(0) \log|\psi_0^{(i)}(0)| \right]. \tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i |\psi_0^{(i)}(x)| / i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta) \,\|\, \mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero-cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi \in \mathcal{F}(1,\omega,\beta)} D(\pi \,\|\, \mathcal{P}(\beta)).
\]

Proof. We first solve the minimization problem, and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $\mathcal{L}$ on this set to be

\[
\mathcal{L}(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\left( 1 - \sum_{i=0}^{\infty} \pi_i \right) + z\left( \beta - \sum_{i=0}^{\infty} i\pi_i \right),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \to x\log x - x[\log\mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

\[
\pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i \log\frac{\pi^*_i}{\mathcal{P}_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]

Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in \mathcal{F}(1,\omega,\beta)} D(\pi \,\|\, \mathcal{P}(\beta)) &= \inf_{\pi \in \mathcal{F}(1,\omega,\beta)} \mathcal{L}(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} \mathcal{L}(\pi, y, z) \\
&= D(\pi^* \,\|\, \mathcal{P}(\beta)).
\end{aligned}
\]

Since $\pi^* \in \mathcal{F}(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

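Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^* \,\|\, \mathcal{P}(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C\mathcal{P}_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \left( 1 - \sum_{i=0}^{I} \omega_i \right)\bigl( \log C + (1-\rho)\beta \bigr) + \left( \beta - \sum_{i=0}^{I} i\omega_i \right)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.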
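The three expressions for the extremal cost, namely (A.17), the relative entropy $D(\pi^* \,\|\, \mathcal{P}(\beta))$ and the explicit formula in $\rho$, $\omega$, $\beta$, can be compared numerically. The sketch below is an illustration under stated assumptions: the values of $C$ and $\rho$ are simply postulated (they are not solved from (2.6)), $I = 1$, and $\psi_0(x) = Ce^{-\rho x} + (1-C) - (1-C\rho)x$ is a hypothetical curve whose terminal values define $\omega$. All function names are invented for this example.

```python
import math

C, RHO, BETA, I = 0.5, 1.2, 1.0, 1   # hypothetical values, illustration only
B = 1.0 - C * RHO                    # makes psi0'(0) = -1


def psi0_deriv(i, x):
    """i-th derivative of the assumed psi0(x) = C*exp(-RHO*x) + (1 - C) - B*x."""
    expo = C * (-RHO) ** i * math.exp(-RHO * x)
    if i == 0:
        return expo + (1.0 - C) - B * x
    if i == 1:
        return expo - B
    return expo


def poisson(i, lam):
    """Poisson probability P_i(lam)."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def gamma(i, x):
    """gamma_i(x) = x^i |psi0^(i)(x)| / i!."""
    return x ** i / math.factorial(i) * abs(psi0_deriv(i, x))


def cost_a17(n_terms=80):
    """(A.17); the x = 0 terms vanish since gamma_i(0) = 0 for i >= 1 and psi0(0) = 1."""
    return BETA + sum(
        gamma(i, BETA) * math.log(abs(psi0_deriv(i, BETA))) for i in range(n_terms)
    )


def cost_relative_entropy(n_terms=80):
    """Truncated D(gamma(BETA) || Poisson(BETA))."""
    return sum(
        gamma(i, BETA) * math.log(gamma(i, BETA) / poisson(i, BETA))
        for i in range(n_terms)
    )


def cost_explicit():
    """Explicit formula in terms of omega_0..omega_I, C, RHO and BETA."""
    omega = [gamma(i, BETA) for i in range(I + 1)]
    head = sum(w * math.log(w / poisson(i, BETA)) for i, w in enumerate(omega))
    tail_mass = 1.0 - sum(omega)
    tail_mean = BETA - sum(i * w for i, w in enumerate(omega))
    return head + tail_mass * (math.log(C) + (1.0 - RHO) * BETA) + tail_mean * math.log(RHO)


print(cost_a17(), cost_relative_entropy(), cost_explicit())
```

The three printed values should agree up to truncation and rounding error, since for this $\psi_0$ the terms with $i > I$ are exactly $C\mathcal{P}_i(\rho\beta)$.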

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out, in the solution of the Euler–Lagrange equations, that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma^k_i\}_i$, where the function $\gamma^k_i : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma^k_{i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.
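The composition of class-wise curves into an overall occupancy curve can be illustrated with a small sketch. As a stand-in for the true subproblem extremals, it assumes that each class follows a Poisson curve $\gamma^k_j(t) = \mathcal{P}_j(t)$; this choice, the sample values of $\alpha_k$ and $\beta_k$, and the function names are assumptions made only to show the indexing and time rescaling.

```python
import math


def poisson(i, lam):
    """Poisson probability P_i(lam); P_0(0) = 1 by the convention 0**0 = 1."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def composite_occupancy(alpha, beta_k, x, n_levels):
    """gamma_i(x) = sum_k alpha_k * gamma^k_{i-k}(x * beta_k / beta), i = 0..n_levels-1.

    Stand-in subproblem curves (an assumption): gamma^k_j(t) = P_j(t).
    Classes with alpha_k = 0 are skipped, mirroring the restriction to k in K.
    """
    beta = sum(a * b for a, b in zip(alpha, beta_k))
    levels = []
    for i in range(n_levels):
        total = 0.0
        for k, (a, b) in enumerate(zip(alpha, beta_k)):
            if a > 0 and k <= i:
                total += a * poisson(i - k, x * b / beta)
        levels.append(total)
    return levels


alpha = [0.6, 0.0, 0.4]        # 60% of urns start empty, 40% start with two balls
beta_k = [1.5, 0.0, 0.75]      # additional balls per urn in each class
beta = sum(a * b for a, b in zip(alpha, beta_k))
g = composite_occupancy(alpha, beta_k, beta, 80)
print(sum(g), sum(i * v for i, v in enumerate(g)))
```

At the terminal time the fractions sum to 1 and the mean occupancy equals $\sum_k k\alpha_k + \beta$, which is the conservation of balls expressed in the composite curve.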

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

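The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)), \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \qquad \text{for all } 0 \le i \le I. \tag{A.22}
\]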

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$ as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
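The ordered filling construction can be sketched in a few lines. The version below is one possible greedy reading of the argument for polynomial constraints: at each terminal level $i$ it draws unassigned mass from classes $k \le i$, lowest class first. That ordering, the function name and the tolerance are assumptions (the text only says mass is applied "as necessary"), and feasibility is detected simply by running out of mass.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints.

    alpha[k]: fraction of urns starting with k balls (sums to 1).
    omega[i]: required fraction of urns ending with i balls (sums to 1).
    Returns pi with pi[k][j] = probability that a class-k urn gains j balls.
    Raises ValueError if some level cannot be filled (monotonicity violated).
    """
    n_levels = len(omega)
    remaining = list(alpha)                     # unassigned urn mass per class
    pi = [[0.0] * n_levels for _ in alpha]
    for i, target in enumerate(omega):
        need = target
        for k in range(min(i, len(alpha) - 1) + 1):
            if alpha[k] <= 0:
                continue                        # class not present
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]      # convert urn mass to a probability
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"level {i} cannot be filled: constraints infeasible")
    return pi


alpha = [0.5, 0.5]
omega = [0.2, 0.3, 0.5]
pi = ordered_filling(alpha, omega)
print(pi)
```

Any filling order that respects $k \le i$ yields a feasible point with finite support, which is all the compactness argument needs; the minimizer itself is generally a different point.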

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

\[
H_A \doteq \left\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_k \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \le A \right\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

\[
Q_A(\alpha,\omega,\beta) \doteq \mathcal{F}(\alpha,\omega,\beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n),(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n),(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\pi^{(n)}_{kj} \le \varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta} \mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta} \frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
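Both ingredients of the uniform integrability estimate are easy to check numerically. The sketch below evaluates the gap in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ on a few points and compares a truncated series for the Poisson exponential moment with its closed form; the function names, the truncation length and the sample points are illustrative assumptions.

```python
import math


def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, using the convention 0 log 0 = 0.

    The inequality in the text says this is nonnegative; it vanishes at x = e^y.
    """
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y


def poisson_exp_moment(delta, beta, n_terms=400):
    """Truncated sum over j of e^{j/delta} * P_j(beta), computed in the log domain."""
    return sum(
        math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
        for j in range(n_terms)
    )


for x, y in ((0.0, -2.0), (0.5, 3.0), (math.e, 1.0), (5.0, 3.0)):
    print(x, y, young_gap(x, y))

for delta, beta in ((1.0, 1.0), (0.5, 2.0)):
    closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
    print(delta, beta, poisson_exp_moment(delta, beta), closed_form)
```

The moment grows very quickly as $\delta$ decreases, which is why the estimate first fixes $\delta$ and only then chooses the cutoff $m$.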

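Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/\mathcal{P}_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\pi^{(n)}_{kj} &= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
&\le \sum_{j \ge 0} \delta\left[ \pi^{(n)}_{kj} \log\left( \frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)} \right) - \pi^{(n)}_{kj} + \mathcal{P}_j(\beta) \right] + \delta \sum_{j \ge m} (e^{j/\delta} - 1)\mathcal{P}_j(\beta) \\
&= \delta D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1)\mathcal{P}_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1)\mathcal{P}_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n),(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]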

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta,
\]

so that $\pi \in \mathcal{F}(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric, and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^{*(k)} \,\|\, \mathcal{P}(\beta)) \le \liminf_n \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta)) = G,
\]

where in fact we have equality, by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{kj}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.

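Thus,

\[
\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta^{(n)})) &\ge \liminf_n \sum_{k : \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n),(k)} \,\|\, \mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k D(\pi^{(k)} \,\|\, \mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.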

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible, feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$ and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k+j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\tilde\pi \in S^{|\mathcal{K}|}_{\infty}$ be any feasible solution meeting the given constraints, and suppose that $\tilde\pi_{mn} = 0$ for some particular $(m,n)$ with $m+n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S^{|\mathcal{K}|}_{\infty}$, subject to the restrictions that $\hat\pi_{mn} > 0$, that $\hat\pi_{kj} = 0$ for all $k+j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

\[ \hat\omega_i = \sum_{k=0}^{i} \alpha_k\hat\pi_{k,i-k}, \qquad \hat\beta = \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\hat\pi_{kj}, \]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha,\hat\omega,\hat\beta)$. Next, for any $\eta > 0$ we may define

\[ \bar\beta = (\beta - \eta\hat\beta)/(1-\eta), \qquad \bar\omega = (\omega - \eta\hat\omega)/(1-\eta). \]

By construction, the constraints $(\alpha,\bar\omega,\bar\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\hat\omega,\hat\beta)$, and hence for $(\alpha,\bar\omega,\bar\beta)$.

We may now let $\bar\pi$ be a feasible point for $(\alpha,\bar\omega,\bar\beta)$, and finally form $\pi = \eta\hat\pi + (1-\eta)\bar\pi$. By construction, $\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\bar\pi$ to have finite support, so that $\pi$ also has finite support.

We will show that $\tilde\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\pi + (1-\varepsilon)\tilde\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\begin{align*}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Big( \log\frac{\pi^{\varepsilon}_{kj}}{\mathcal{P}_j(\beta)} + 1 \Big) (\pi_{kj} - \tilde\pi_{kj}) \\
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^{\varepsilon}_{kj}}{\mathcal{P}_j(\beta)}\, (\pi_{kj} - \tilde\pi_{kj}) \\
&= \alpha_m\pi_{mn}\log\frac{\pi^{\varepsilon}_{mn}}{\mathcal{P}_n(\beta)}
 + \sum_{k\in\mathcal{K}} \alpha_k \sum_{(k,j)\ne(m,n)} \pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{\mathcal{P}_j(\beta)}
 - \sum_{k\in\mathcal{K}} \alpha_k \sum_{j} \tilde\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{\mathcal{P}_j(\beta)}.
\end{align*}

The second equality holds because $\sum_j (\pi_{kj} - \tilde\pi_{kj}) = 0$ for each $k$. In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[ -\sum_{k} \alpha_k D(\tilde\pi(k)\,\|\,\mathcal{P}(\beta)) \le 0. \]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \pi_{kj}\log\mathcal{P}_j(\beta)$, which is finite because $\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m+n \in \mathcal{I}(\omega)$, $\tilde\pi$ cannot minimize (A.20) if $\tilde\pi_{mn} = 0$. As $(m,n)$ is arbitrary within $m+n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m+n \in \mathcal{I}(\omega)$.
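The mechanism behind this proof, an infinite negative slope of the relative entropy at a boundary point, can be seen in a small finite example. The sketch below is only an illustration under assumed data: the three-point distributions `reference`, `start` (which has a zero entry) and `target` (strictly positive) are invented, and the slope formula is the finite analogue of the derivative $f'(\varepsilon)$ in the proof, with a single distribution instead of a family indexed by $k$.

```python
import math

def relative_entropy(p, q):
    """D(p || q) for finite distributions, with the convention 0 log 0 = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def mixture(eps, target, start):
    """Point eps * target + (1 - eps) * start on the segment between the two."""
    return [eps * t + (1 - eps) * s for t, s in zip(target, start)]

def slope(eps, target, start, reference):
    """Derivative in eps of D(mixture || reference), valid for eps > 0.

    The '+1' terms cancel because both distributions sum to one.
    """
    mixed = mixture(eps, target, start)
    return sum((t - s) * math.log(m / r)
               for t, s, m, r in zip(target, start, mixed, reference))

reference = [0.2, 0.3, 0.5]   # plays the role of the Poisson reference
start = [0.5, 0.5, 0.0]       # feasible point with a zero component
target = [0.4, 0.3, 0.3]      # feasible point that is positive everywhere

slopes = [slope(eps, target, start, reference) for eps in (1e-3, 1e-6, 1e-9)]
f0 = relative_entropy(start, reference)
f_small = relative_entropy(mixture(1e-3, target, start), reference)
print(slopes, f0, f_small)
```

The slope decreases without bound as `eps` shrinks, driven by the logarithm of the vanishing component, so a small step toward the positive point lowers the objective.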

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega), \\ 0, & \text{otherwise.} \end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
 + \sum_{k\in\mathcal{K}} z_k\alpha_k \Big( 1 - \sum_{k+l\in\mathcal{I}} \pi_{kl} \Big)
 + \sum_{i\in\mathcal{I}} w_i \Big( \omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k} \Big) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[ \pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k-1}$ and $W_i = e^{w_i}$.
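The step from the Lagrangian to the product form can be written out explicitly. The following is a worked sketch of that computation, assuming $\alpha_k > 0$ so that the common factor $\alpha_k$ can be divided out; the multipliers $z_k$ and $w_i$ and the constants $D_k$, $W_i$ are those of the proof of Theorem A.9.

```latex
\begin{align*}
0 = \frac{\partial L}{\partial \pi_{kj}}
  &= \alpha_k\Big(\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} + 1\Big)
     - z_k\alpha_k - w_{k+j}\alpha_k, \\
\log\frac{\pi^*_{kj}}{\mathcal{P}_j(\beta)}
  &= z_k - 1 + w_{k+j}, \\
\pi^*_{kj}
  &= e^{z_k-1}\,\mathcal{P}_j(\beta)\,e^{w_{k+j}}
   = D_k\,\mathcal{P}_j(\beta)\,W_{k+j}.
\end{align*}
```

The factor $D_k$ absorbs the normalization of the $k$th distribution and $W_{k+j}$ the terminal constraint at occupancy level $k+j$.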

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[ \pi^*_{kj} = \begin{cases} C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I}, \\ C_k\,\mathcal{P}_j(\rho\beta), & k\in\mathcal{K},\ k+j> I, \\ 0, & \text{otherwise.} \end{cases} \]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems, indexed by a sequence of integers $M > I$. For each such $M$, define

\[ \beta^{(M)} = \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\,\pi^*_{kl}, \qquad \eta^{(M)}_k = \sum_{l\le M} \pi^*_{kl}, \]

and consider the problem of minimizing

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} \]

subject to the constraints

\begin{align*}
\sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl} &= \beta^{(M)}, \\
\sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k\in\mathcal{K}, \\
\sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k} &= \omega_i, \qquad i\in\mathcal{I}.
\end{align*}

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous proof, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\begin{align*}
L^{(M)} ={}& \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
 + y^{(M)} \Big( \beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl} \Big) \\
&+ \sum_{k\in\mathcal{K}} z^{(M)}_k\alpha_k \Big( \eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl} \Big)
 + \sum_{i\in\mathcal{I}} w^{(M)}_i \Big( \omega_i - \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k} \Big).
\end{align*}

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[ \pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{\,j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}} \]

for all $j < M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\,e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
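The substitutions at the end of this proof can be verified directly. The short computation below is a sketch that assumes only the Poisson formula $\mathcal{P}_j(\beta) = e^{-\beta}\beta^j/j!$ and the multiplier form of $\pi^*_{kj}$ obtained in the proof, with $w_i = 0$ (so $W_i = 1$) for $i > I$.

```latex
\begin{align*}
\pi^*_{kj}
 &= \mathcal{P}_j(\beta)\,e^{\,jy - 1 + z_k + w_{k+j}}
  = \frac{e^{-\beta}\beta^{j}}{j!}\,\rho^{\,j}\,e^{z_k-1}\,W_{k+j} \\
 &= \frac{e^{-\rho\beta}(\rho\beta)^{j}}{j!}\;e^{(\rho-1)\beta-1+z_k}\;W_{k+j}
  = C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}.
\end{align*}
```

The factor $e^{(\rho-1)\beta}$ in $C_k$ is exactly what converts the Poisson weight with parameter $\beta$ into one with parameter $\rho\beta$.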

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k = \sum_j j\,\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} = \pi^*_{kj}$ and $\omega(k) = (\omega_{k0},\ldots,\omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega(k),\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega(k),\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega(k),\beta_k)$. Then the curves

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define the rescaled curves $\bar\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k\bar\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\begin{align*}
\bar\psi_{k0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k\mathcal{P}_i(\rho_k\beta_k)) \Big( 1 - \frac{x\beta_k/\beta}{\beta_k} \Big)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k\mathcal{P}_i(\rho\beta)) \Big( 1 - \frac{x}{\beta} \Big)^{i} \\
&= C_k \Big[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)\mathcal{P}_i(\rho\beta) \Big( 1 - \frac{x}{\beta} \Big)^{i} \Big].
\end{align*}

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i \le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{00}$ is

\begin{align*}
(-1)^k\psi^{(k)}_0(x) &= \alpha_0 C_0 \Big[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)\mathcal{P}_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k} \Big( 1 - \frac{x}{\beta} \Big)^{i-k} \Big] \tag{A.25} \\
&= \alpha_0 C_0\rho^k \Big[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)\mathcal{P}_j(\rho\beta) \Big( 1 - \frac{x}{\beta} \Big)^{j} \Big] \\
&= \frac{\alpha_0 C_0\rho^k}{C_k}\,\bar\psi_{k0}(x).
\end{align*}

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi^{(i-k)}_{k0}\Big( \frac{x\beta_k}{\beta} \Big) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{\,i-k}}{dx^{\,i-k}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi^{(i)}_0(x).
\end{align*}

Similarly, we obtain

\begin{align*}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\frac{\beta_k}{\beta}\dot\psi_{k,i-k}\Big( \frac{x\beta_k}{\beta} \Big) \\
&= \sum_{k=0}^{i} \alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x),
\end{align*}

so that the extremals satisfy the simple relation

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}, \tag{A.26} \]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[ \frac{\theta_{I+}(x)}{1-\psi_I(x)} = \rho. \tag{A.27} \]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[ (-1)^k\psi^{(k)}_0(x) = \frac{\alpha_0 D_0}{D_k}\,\bar\psi_{k0}(x) \tag{A.28} \]

and that

\[ \theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi^{(i+1)}_0(x) = \gamma_i(x)\,\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)} \tag{A.29} \]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[ \int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i \]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x))$, which is finite at both endpoints because $(-1)^i\psi^{(i)}_0(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

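Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in that proof to satisfy the conditions of Lemma A.3. Writing $\bar\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ and $\bar\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the rescaled subproblem curves, as in the proof of Theorem A.11, the cost is

\begin{align*}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \Big[ \sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(\beta)\log|\psi^{(i)}_0(\beta)| \Big] - \sum_{k=0}^{K} \alpha_k\log|\psi^{(k)}_0(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(\beta)\log\frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0)\,e^{-\beta}}.
\end{align*}

The fact that $\bar\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi^{(k)}_0(x) = \psi^{(k)}_0(0)\times\bar\psi_{k0}(x)$, so that

\[ \frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0)\,e^{-\beta}} = \Big( \frac{\beta_k}{\beta} \Big)^{j}\psi^{(j)}_{k0}\Big( \frac{x\beta_k}{\beta} \Big)e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}. \]

Then

\[ J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\bar\gamma_{k\cdot}(\beta)\,\|\,\mathcal{P}(\beta)). \]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).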

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

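We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) = \{\gamma\in\mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\,\|\,\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.

(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma,\tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$, and define the function $G : [0,1] \to \mathbb{R}_+\cup\{\infty\}$ by

\[ G[\varepsilon] = J(\gamma^{\varepsilon}) = \int_0^{\beta} D(\theta^{\varepsilon}(x)\,\|\,\gamma^{\varepsilon}(x))\,dx. \]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.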
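The joint convexity of the relative entropy in both arguments, on which Lemma A.15 rests, can be spot-checked numerically. The sketch below is an illustration under assumptions: it draws random probability vectors of an arbitrary length 5 with a fixed seed and tests the convexity inequality along random chords. It checks only the pointwise integrand, not the path functional $J$ itself, and all names are invented for this example.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) for finite distributions, with 0 log 0 = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def random_distribution(rng, size):
    """A strictly positive random probability vector."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def blend(lam, a, b):
    """Convex combination lam * a + (1 - lam) * b, componentwise."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]

rng = random.Random(0)      # fixed seed so the check is reproducible
smallest_gap = math.inf
for _ in range(1000):
    theta1, gamma1, theta2, gamma2 = (random_distribution(rng, 5) for _ in range(4))
    lam = rng.random()
    chord = (lam * relative_entropy(theta1, gamma1)
             + (1 - lam) * relative_entropy(theta2, gamma2))
    blended = relative_entropy(blend(lam, theta1, theta2),
                               blend(lam, gamma1, gamma2))
    # Joint convexity says the chord lies on or above the blended value.
    smallest_gap = min(smallest_gap, chord - blended)

print(smallest_gap)
```

A nonnegative smallest gap (up to rounding) across all samples is consistent with the convexity of $G$ used in the rest of this section.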

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[ J(\gamma) \le J(\tilde\gamma). \]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining

\[ \eta(x) = \tilde\psi(x) - \psi(x), \]

we may write

\[ G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x,\varepsilon)\,dx. \tag{A.30} \]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential, empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example, with $\bar\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ denoting the rescaled subproblem curves,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k\bar\gamma_{k,i-k}(x) \ge \alpha_i\bar\gamma_{i0}(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1-\varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\Big|_{\gamma^{\varepsilon}(x)}\eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\Big|_{\gamma^{\varepsilon}(x)}\dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\begin{align*}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\end{align*}

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \Big| \frac{\partial}{\partial\varepsilon}g(x,\varepsilon) \Big| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx \]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[ \int_0^{x} \frac{\partial L}{\partial\psi_i}\Big|_{\gamma}\,dx' = \int_0^{x} \Big( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \Big)\Big|_{\gamma}\,dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\begin{align*}
G'_+[0] &= \int_0^{\beta} \Big[ \frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x) \Big]\,dx \\
&= \int_0^{\beta} \dot\eta(x)\cdot\Big( -\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \Big)\,dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i(\eta_i(0) - \eta_i(\beta)) = 0.
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x^{(n)}_1)$ and terminal point $\tilde\gamma(x^{(n)}_2)$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx. \]

It then follows that

\begin{align*}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}\big( \tilde\gamma(x^{(n)}_1),\ \tilde\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1 \big) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Léonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithms 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of Database Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


Finally, we must show that the given curves solve the Euler–Lagrange differential equations in $(0,\beta)$. We begin with the exponential case $C > 0$. The $I+$ terms satisfy the simple expressions

\begin{align*}
\theta_{I+}(x) &= 1 + \sum_{i=0}^{I} \dot\psi_i(x) = \sum_{i=I+1}^{\infty} -\dot\psi_i(x) = \rho C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x), \\
1 - \psi_I(x) &= \sum_{i=I+1}^{\infty} \gamma_i(x) = C \sum_{i=I+1}^{\infty} \mathcal{P}_i(\rho x),
\end{align*}

where the first display uses $\theta = -\dot\psi$ and equations (A.12) and (A.14), and the second uses (A.12) and (A.13). Then the ratio $\theta_{I+}(x)/(1-\psi_I(x)) = \rho$ is constant, and the corresponding terms drop out of the right-hand side of (A.7) and (A.8).

From the expressions for $\gamma_i(x)$ and $\dot\psi_i(x)$ given in (A.13) and (A.14), it follows that

\[ \frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}, \qquad x \in (0,\beta),\ i \ge 0. \tag{A.15} \]

Then

\[ \frac{d}{dx}\Big( -\log\frac{\theta_i}{\gamma_i} \Big) = -\frac{\psi^{(i+2)}_0}{\psi^{(i+1)}_0} + \frac{\psi^{(i+1)}_0}{\psi^{(i)}_0} = \frac{\theta_{i+1}}{\gamma_{i+1}} - \frac{\theta_i}{\gamma_i}, \qquad x \in (0,\beta), \tag{A.16} \]

verifying (A.7). Likewise, to verify (A.8), we apply (A.16) with $i = I$, using the substitution $-\psi^{(I+2)}_0/\psi^{(I+1)}_0 = \rho$ obtained from (A.12).

In the polynomial case $C = 0$, note that (A.15) holds for $i = 0,\ldots,I$, so that (A.16) implies (A.10).

A.5. Interpretation of the twist parameter $\rho$. For an exponential extremal, the twist parameter $\rho$ may be interpreted as follows. The expected rate for balls to enter urns with more than $I$ balls is equal to the proportion of such urns, namely $1-\psi_I$. The twist parameter $\rho$ is then a multiplicative factor applied to $1-\psi_I$ to give the actual rate at which balls enter these urns. Thus if $\rho > 1$, balls unusually pile into high-occupancy urns, while if $\rho < 1$, they instead concentrate on the low-occupancy urns. The occupancy distribution of the high-occupancy urns remains Poisson, but with a modified parameter.

The infinite sequence of occupancy functions introduced in the proof just given is a useful construct. Operations which are technically difficult in infinite dimensions may nevertheless be carried out formally on the infinite sequence of functions, giving insight into the solution for the finite-dimensional system.

The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

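Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$, with

\[ (-1)^i\psi^{(i)}_0(x) = C\rho^i e^{-\rho x} \qquad \text{for } i > I, \]

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0,\gamma_1,\ldots$ are an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

\begin{align*}
\sum_{i=0}^{\infty} \gamma_i(x) &= 1, & \sum_{i=0}^{\infty} i\gamma_i(x) &\le B_0, \\
\sum_{i=0}^{\infty} -\dot\psi_i(x) &= 1, & -\dot\psi_i(x) &= \gamma_i(x)\,\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)},
\end{align*}

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $(\gamma_0,\ldots,\gamma_I,\gamma_{I+})$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I} \gamma_i$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{\infty} \big[ \gamma_i(\beta)\log|\psi^{(i)}_0(\beta)| - \gamma_i(0)\log|\psi^{(i)}_0(0)| \big]. \tag{A.17} \]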

Proof. For all $i > I$ and $x \in [0,\beta]$, we have $\psi^{(i+1)}_0/\psi^{(i)}_0 = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

\[ \theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho \sum_{i=I+1}^{\infty} \gamma_i = \rho(1-\psi_I). \]

Then for each $x \in [0,\beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_{\infty}$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi^{(i+1)}_0/\psi^{(i)}_0 = \rho$ for $i > I$, and $(1 + \sum_{i=0}^{I} \dot\psi_i)/(1-\psi_I) = \rho$,

\begin{align*}
L(\psi,\dot\psi) &= \sum_{i=0}^{I} (-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{(1 + \dot\psi_0 + \cdots + \dot\psi_I)}{1-\psi_I} \\
&= \sum_{i=0}^{\infty} (-\dot\psi_i)\log\frac{(-\dot\psi_i)}{\psi_i - \psi_{i-1}} \equiv L_{\infty}(\psi,\dot\psi).
\end{align*}

Note that since $L_{\infty}(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_{\infty}(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_{\infty}$.

Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that given $\varepsilon > 0$, the quantities $\log(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x))$ are uniformly bounded for $x \in [\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[ J(\gamma) = \lim_{\varepsilon\downarrow 0} \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}\,dx. \]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\begin{align*}
&\int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi^{(i+1)}_0(x)|\,dx + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} \dot\psi_i(x)\log|\psi^{(i)}_0(x)|\,dx \tag{A.18} \\
&\qquad = \sum_{i=0}^{\infty} \int_{\varepsilon}^{\beta-\varepsilon} (\dot\psi_i(x) - \dot\psi_{i-1}(x))\log|\psi^{(i)}_0(x)|\,dx.
\end{align*}

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$

\[ -\dot\psi_i(x)\log|\psi^{(i)}_0(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x], \]

and similarly for $\dot\psi_i(x)\log|\psi^{(i+1)}_0(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[ \sum_{i=0}^{\infty} \big[ \gamma_i\log|\psi^{(i)}_0| \big]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon} \sum_{i=0}^{\infty} -\dot\psi_i\,dx. \]

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\ldots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\begin{align*}
\sum_{i=0}^{I} \gamma_i(x) &= 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1, \\
-\dot\psi_i(x) &= \gamma_i(x)\,\frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}, \qquad i = 0,\ldots,I-1,
\end{align*}

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

\[ J(\gamma) = \beta + \sum_{i=0}^{I} \big[ \gamma_i(\beta)\log|\psi^{(i)}_0(\beta)| - \gamma_i(0)\log|\psi^{(i)}_0(0)| \big]. \tag{A.19} \]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi^{(i)}_0(x)|/i!$. Using this expression to substitute for $\psi^{(i)}_0$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,\mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.
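The substitution described in this paragraph can be spelled out as follows. This is a sketch for the exponential formula (A.17), assuming empty initial conditions, so that $\gamma_0(0) = \psi_0(0) = 1$ and $\gamma_i(0) = 0$ for $i > 0$, with the convention that a zero weight annihilates its logarithm, and using $\sum_i \gamma_i(\beta) = 1$. The polynomial formula (A.19) is handled in the same way with a finite sum.

```latex
\begin{align*}
\log|\psi^{(i)}_0(\beta)| &= \log\frac{\gamma_i(\beta)\,i!}{\beta^{i}},
\qquad
\sum_{i=0}^{\infty} \gamma_i(0)\log|\psi^{(i)}_0(0)| = \log\psi_0(0) = 0, \\
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)\,i!}{\beta^{i}}
 = \sum_{i=0}^{\infty} \gamma_i(\beta)\Big[ \log\frac{\gamma_i(\beta)\,i!}{\beta^{i}} + \beta \Big] \\
 &= \sum_{i=0}^{\infty} \gamma_i(\beta)\log\frac{\gamma_i(\beta)}{e^{-\beta}\beta^{i}/i!}
 = D(\gamma(\beta)\,\|\,\mathcal{P}(\beta)).
\end{align*}
```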

Theorem A5 Suppose that γ is an extremal occupancy path constructedaccording to Theorem A1 to meet feasible terminal constraints (1 ω β)Then

J(γ) = minπisinF (1ωβ)

D(πP(β))

Proof. We first solve the minimization problem, and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1, \omega, \beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0, \ldots, I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi, y, z) = \sum_{i=0}^{\infty} \pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)}
 + y\Bigl(1 - \sum_{i=0}^{\infty} \pi_i\Bigr)
 + z\Bigl(\beta - \sum_{i=0}^{\infty} i\,\pi_i\Bigr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \mapsto x\log x - x[\log\mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0, \infty)$,

\[
\pi_i \log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - z\,i\,\pi_i
 \;\ge\; \pi^*_i \log\frac{\pi^*_i}{\mathcal{P}_i(\beta)} - y\pi^*_i - z\,i\,\pi^*_i.
\]


Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi \in F(1,\omega,\beta)} D(\pi\,\|\,\mathcal{P}(\beta))
 &= \inf_{\pi \in F(1,\omega,\beta)} L(\pi, y, z) \\
 &\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
 &= D(\pi^*\,\|\,\mathcal{P}(\beta)).
\end{aligned}
\]

Since $\pi^* \in F(1, \omega, \beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

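Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta)\,\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,\mathcal{P}(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0 \le i \le I, \\
C\,\mathcal{P}_i(\rho\beta), & i > I.
\end{cases}
\]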

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{\mathcal{P}_i(\beta)}
 + \Bigl(1 - \sum_{i=0}^{I} \omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr)
 + \Bigl(\beta - \sum_{i=0}^{I} i\,\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
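As a numerical sanity check on the closed-form cost, the following minimal Python sketch compares the three-term expression with a direct term-by-term evaluation of the relative entropy of the distribution equal to $\omega_i$ for $i \le I$ and $C\,\mathcal{P}_i(\rho\beta)$ for $i > I$. The function names and the particular values of $C$, $\rho$ and $\beta$ are illustrative assumptions, not taken from the paper; here $C$ and $\rho$ are simply supplied as inputs rather than solved for from the constraints.

```python
import math

def poisson_pmf(i, mean):
    """Poisson probability P_i(mean), computed in log space for stability."""
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

def closed_form_cost(omega, C, rho, beta):
    """Three-term cost formula; omega = (omega_0, ..., omega_I)."""
    head = sum(w * math.log(w / poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return (head + tail_mass * (math.log(C) + (1.0 - rho) * beta)
            + tail_mean * math.log(rho))

def direct_relative_entropy(omega, C, rho, beta, n_terms=80):
    """D(pi* || P(beta)) summed term by term, with the tail cut at n_terms."""
    I = len(omega) - 1
    total = sum(w * math.log(w / poisson_pmf(i, beta))
                for i, w in enumerate(omega) if w > 0)
    for i in range(I + 1, n_terms):
        p = C * poisson_pmf(i, rho * beta)
        total += p * math.log(p / poisson_pmf(i, beta))
    return total

def example_constraints(C=0.8, rho=1.2, beta=1.0):
    """Hypothetical I = 1 example: pick omega_0, omega_1 so that the tail
    C * P_i(rho * beta), i > 1, carries exactly the remaining mass and mean."""
    m = rho * beta
    tail_mass = C * (1.0 - poisson_pmf(0, m) - poisson_pmf(1, m))
    tail_mean = C * (m - poisson_pmf(1, m))
    omega1 = beta - tail_mean
    omega0 = 1.0 - tail_mass - omega1
    return [omega0, omega1], C, rho, beta

omega, C, rho, beta = example_constraints()
print(closed_form_cost(omega, C, rho, beta))
print(direct_relative_entropy(omega, C, rho, beta))
```

The two printed values should agree up to floating-point and truncation error, since the closed form is just the tail of the relative entropy sum evaluated using the total mass and mean of the Poisson tail.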

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k \in \mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0, \beta_k] \to [0, 1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k + i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

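The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{(k)}\,\|\,\mathcal{P}(\beta))
\tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{kj} = \beta
\tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k\,\pi_{k,i-k} \qquad \text{for all } 0 \le i \le I.
\tag{A.22}
\]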

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha, \omega, \beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I,
\tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\bar\omega \in S_{\bar I}$, where $\bar I > I$. Any feasible point for constraints $(\alpha, \bar\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$ as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
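The ordered filling construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ordered_filling` is hypothetical, the classes are indexed $0, \ldots, K$ with $\alpha_k \ge 0$, the constraints are taken to be polynomial (so $\sum_k \alpha_k = \sum_i \omega_i = 1$), and at each level the mass is drawn from the lowest-indexed class first, which is one admissible order among several.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy construction of a feasible point for polynomial constraints.

    alpha[k] is the fraction of urns starting with k balls, omega[i] the
    required terminal fraction with i balls. Returns pi with pi[k][j] the
    probability that a class-k urn receives j additional balls.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    remaining = list(alpha)                      # unassigned mass per class
    mass = [[0.0] * (I + 1) for _ in range(K + 1)]  # mass[k][j] = alpha_k * pi_kj
    for i in range(I + 1):
        need = omega[i]
        # Only classes k <= i can end at level i (balls are never removed).
        for k in range(min(i, K) + 1):
            take = min(need, remaining[k])
            mass[k][i - k] += take
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition violated at level %d" % i)
    return [[m / alpha[k] if alpha[k] > 0 else 0.0 for m in mass[k]]
            for k in range(K + 1)]

alpha = [0.5, 0.5]
omega = [0.2, 0.5, 0.3]
pi = ordered_filling(alpha, omega)
print(pi)
```

For this example each row of `pi` sums to one and the terminal conditions (A.22) hold, so the result is a finitely supported feasible point. The monotonicity conditions are exactly what keeps `need` from staying positive at any level.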

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$ this distance is simply

\[
\sum_{i :\, P_i > Q_i} (P_i - Q_i) = \sum_{i :\, Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[
H_A \doteq \Bigl\{\pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k\,D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha, \omega, \beta) \doteq F(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\,\pi^{(n)}_{kj} \le \varepsilon
\tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta}\,\mathcal{P}_j(\beta)
 = \sum_{j=0}^{\infty} e^{j/\delta}\,\frac{e^{-\beta}\beta^j}{j!}
 = e^{\beta e^{1/\delta} - \beta} < \infty.
\]


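Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/\mathcal{P}_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\,\pi^{(n)}_{kj}
 &= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
 &\le \sum_{j \ge 0} \delta\Bigl[\pi^{(n)}_{kj}\log\Bigl(\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\Bigr) - \pi^{(n)}_{kj} + \mathcal{P}_j(\beta)\Bigr]
   + \delta\sum_{j \ge m} (e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
 &= \delta\,D(\pi^{(n)(k)}\,\|\,\mathcal{P}(\beta)) + \delta\sum_{j \ge m} (e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
 &\le \delta A + \delta\sum_{j \ge m} (e^{j/\delta} - 1)\,\mathcal{P}_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]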
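The two ingredients of this estimate, the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ and the exponential moment of the Poisson distribution, are easy to check numerically. The following Python sketch does so on a small grid; the function names and the grid values are illustrative assumptions only.

```python
import math

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, which should be nonnegative."""
    xlogx = x * math.log(x) if x > 0 else 0.0   # convention 0 log 0 = 0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, n_terms=200):
    """Partial sum of sum_j e^{j/delta} P_j(beta), each term in log space."""
    return sum(math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
               for j in range(n_terms))

# The gap is nonnegative on a grid and vanishes where x = e^y.
gaps = [young_gap(x, y) for x in (0.0, 0.1, 1.0, 2.5, 10.0)
        for y in (-3.0, 0.0, 0.7, 2.0)]
print(min(gaps), young_gap(math.exp(0.7), 0.7))

# The partial sum matches the closed form exp(beta * e^{1/delta} - beta).
beta, delta = 1.0, 2.0
print(poisson_exp_moment(beta, delta), math.exp(beta * math.exp(1.0 / delta) - beta))
```

Equality in the inequality holds exactly when $x = e^y$, which is why the supremum representation of $e^y - 1$ is attained there.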

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k} \alpha_k\beta_k = \lim_{n} \sum_{k} \alpha_k\beta^{(n)}_k = \lim_{n} \beta = \beta,
\]

so that $\pi \in F(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k\,D(\pi^{(n)(k)}\,\|\,\mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{*(k)}\,\|\,\mathcal{P}(\beta))
 \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k\,D(\pi^{(n)(k)}\,\|\,\mathcal{P}(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.


Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)
\]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).
\]

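Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\,\pi^{(n)}_{kj} \;\to\; \beta_k \doteq \sum_{j=0}^{\infty} j\,\pi_{kj}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)(k)}\,\|\,\mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,\mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$. Thus

\[
\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k\alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)(k)}\,\|\,\mathcal{P}(\beta^{(n)}))
 &\ge \liminf_{n} \sum_{k :\, \alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)(k)}\,\|\,\mathcal{P}(\beta^{(n)})) \\
 &\ge \sum_{k :\, \alpha_k > 0} \alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \\
 &\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}
\]

as desired.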
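The proof above uses the fact that the infimum of $D(\pi\,\|\,\mathcal{P}(\beta))$ over distributions with mean $\lambda$ equals $\beta - \lambda + \lambda\log(\lambda/\beta)$. A minimal Python sketch, with hypothetical function names and example values, checks that the Poisson distribution with mean $\lambda$ attains this value and that one other distribution with the same mean does no better.

```python
import math

def poisson_logpmf(j, mean):
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def relative_entropy_to_poisson(pi, beta):
    """D(pi || P(beta)) for a finitely supported pi given as {j: probability}."""
    return sum(p * (math.log(p) - poisson_logpmf(j, beta))
               for j, p in pi.items() if p > 0)

def poisson_vs_poisson(lam, beta, n_terms=200):
    """D(P(lam) || P(beta)) by direct summation over j < n_terms."""
    total = 0.0
    for j in range(n_terms):
        lp, lq = poisson_logpmf(j, lam), poisson_logpmf(j, beta)
        total += math.exp(lp) * (lp - lq)
    return total

def mean_constrained_infimum(lam, beta):
    return beta - lam + lam * math.log(lam / beta)

lam, beta = 2.5, 1.0
print(poisson_vs_poisson(lam, beta), mean_constrained_infimum(lam, beta))
# A competitor with the same mean 2.5: equal mass on 2 and 3.
print(relative_entropy_to_poisson({2: 0.5, 3: 0.5}, beta))
```

As $\lambda \to \infty$ with $\beta$ fixed, the lower bound grows like $\lambda\log\lambda$, which is the superlinear growth the argument relies on when $\alpha^{(n)}_k \to 0$.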

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{mn} > 0$, that $\tilde\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k\,\tilde\pi_{k,i-k}, \qquad
\tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\tilde\pi_{kj},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha, \tilde\omega, \tilde\beta)$. Next, for any $\eta > 0$ we may define

\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1 - \eta), \qquad
\hat\omega \doteq (\omega - \eta\tilde\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \hat\omega, \hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha, \omega, \beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \tilde\omega, \tilde\beta)$, and hence for $(\alpha, \hat\omega, \hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha, \hat\omega, \hat\beta)$, and finally form $\bar\pi = \eta\tilde\pi + (1 - \eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha, \omega, \beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon)
 &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \Bigl(\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} + 1\Bigr)(\bar\pi_{kj} - \pi_{kj}) \\
 &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
 &= \alpha_m\bar\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{\mathcal{P}_n(\beta)}
  + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \bar\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}
  - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k} \alpha_k\,D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj}\log\mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise}.
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
 + \sum_{k \in \mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{k+l \in \mathcal{I}} \pi_{kl}\Bigr)
 + \sum_{i \in \mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\,\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
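The product form $D_k\,\mathcal{P}_j(\beta)\,W_{k+j}$ suggests a simple way to compute the minimizer numerically in small polynomial examples. The sketch below uses iterative proportional fitting, a standard scaling method that is not the construction used in the paper and is swapped in here purely for illustration. It assumes all $\alpha_k$ and $\omega_i$ are strictly positive and that the constraints are polynomial and irreducible; the function name `fit_product_form` and the example values are hypothetical.

```python
import math

def poisson_pmf(j, mean):
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def fit_product_form(alpha, omega, beta, n_iter=2000):
    """Alternately rescale D and W so that pi[k][j] = D[k] * P_j(beta) * W[k+j]
    (supported on k + j <= I) meets the row and terminal constraints."""
    K, I = len(alpha) - 1, len(omega) - 1
    P = [poisson_pmf(j, beta) for j in range(I + 1)]
    D = [1.0] * (K + 1)
    W = [1.0] * (I + 1)
    for _ in range(n_iter):
        # Row step: each distribution pi^(k) must sum to one.
        for k in range(K + 1):
            D[k] = 1.0 / sum(P[i - k] * W[i] for i in range(k, I + 1))
        # Column step: omega_i = sum_{k <= i} alpha_k * pi_{k, i-k}.
        for i in range(I + 1):
            W[i] = omega[i] / sum(alpha[k] * D[k] * P[i - k]
                                  for k in range(min(i, K) + 1))
    pi = [[D[k] * P[j] * W[k + j] for j in range(I - k + 1)]
          for k in range(K + 1)]
    return pi, D, W

alpha = [0.5, 0.5]
omega = [0.2, 0.5, 0.3]
beta = 0.6   # equals sum_i i*omega_i - sum_k k*alpha_k for these values
pi, D, W = fit_product_form(alpha, omega, beta)
print(pi)
```

In this two-class example the constraints leave one degree of freedom, and the product form pins it down through the cross-ratio $\pi_{01}\pi_{11}/(\pi_{02}\pi_{10}) = \mathcal{P}_1^2/(\mathcal{P}_2\mathcal{P}_0) = 2$, which gives $\alpha_0\pi_{01} = 0.8 - \sqrt{0.34}$.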

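We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k\,\mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise}.
\end{cases}
\]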

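Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} = \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\,\pi^*_{kl}, \qquad
\eta^{(M)}_k \doteq \sum_{l \le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\; k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
\]

subject to the constraints

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\; k+l \in \mathcal{I}} l\,\pi_{kl} = \beta^{(M)},
\]
\[
\sum_{l \le M,\; k+l \in \mathcal{I}} \pi_{kl} = \eta^{(M)}_k, \qquad k \in \mathcal{K},
\]
\[
\sum_{k \le i,\; k \in \mathcal{K}} \alpha_k\,\pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.
\]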

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\; k+j \in \mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}
 + y^{(M)}\Bigl(\beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\; k+l \in \mathcal{I}} l\,\pi_{kl}\Bigr) \\
 &\quad + \sum_{k \in \mathcal{K}} z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k - \sum_{l \le M,\; k+l \in \mathcal{I}} \pi_{kl}\Bigr)
 + \sum_{i \in \mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k \le i,\; k \in \mathcal{K}} \alpha_k\,\pi_{k,i-k}\Bigr).
\end{aligned}
\]

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega^{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega^{(k)}, \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha, \omega, \beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega^{(k)}, \beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I,
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha, \omega, \beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x)
 = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0, \beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x)
 = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

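In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve after rescaling is

\[
\begin{aligned}
\tilde\psi_{k0}(x)
 &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k\mathcal{P}_i(\rho_k\beta_k)\bigr)\Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i} \\
 &= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k\mathcal{P}_i(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i} \\
 &= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)\,\mathcal{P}_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i}\Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k\,\mathcal{P}_i(\rho\beta)\,W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x)
 &= \alpha_0 C_0\Bigl[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)\,\mathcal{P}_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Bigr] \\
 &= \alpha_0 C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)\,\mathcal{P}_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{j}\Bigr] \\
 &= \frac{\alpha_0 C_0\rho^k}{C_k}\,\tilde\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]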
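The passage from the first to the second line of (A.25) rests on the elementary Poisson identity $\mathcal{P}_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k\,\mathcal{P}_{i-k}(\rho\beta)$, applied with $j = i - k$. A short Python check of this identity, with a hypothetical function name and arbitrary example values of $\rho$ and $\beta$, is given below.

```python
import math

def poisson_pmf(j, mean):
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def shifted_coefficient(i, k, rho, beta):
    """Left side of the identity: P_i(rho*beta) * i! / ((i-k)! * beta^k)."""
    return (poisson_pmf(i, rho * beta) * math.factorial(i)
            / (math.factorial(i - k) * beta ** k))

rho, beta = 1.3, 0.7
for i in range(5):
    for k in range(i + 1):
        lhs = shifted_coefficient(i, k, rho, beta)
        rhs = rho ** k * poisson_pmf(i - k, rho * beta)
        print(i, k, lhs, rhs)
```

Each printed pair should agree to floating-point accuracy, since both sides reduce to $\rho^k e^{-\rho\beta}(\rho\beta)^{i-k}/(i-k)!$.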

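Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x)
 &= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
 &= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k0}(x) \\
 &= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x)
 &= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr) \\
 &= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\tilde\psi_{k0}(x) \\
 &= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.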

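In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\,\tilde\psi_{k0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x)
 = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\tag{A.29}
\]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).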

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'), \theta(x')) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

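Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(\beta)\,\log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K} \alpha_k\log|\psi_0^{(k)}(0)| \\
 &= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta)\,\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}}
 = \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\,\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\,e^{\beta}
 = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k\,D(\tilde\gamma_{k\cdot}(\beta)\,\|\,\mathcal{P}(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).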

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \mapsto D(\theta\,\|\,\gamma)$, and hence omitted.

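Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^\varepsilon \doteq (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^{\beta} D(\theta^\varepsilon(x)\,\|\,\gamma^\varepsilon(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.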

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi, \theta)\,dx$. After defining

\[
\eta(x) \doteq \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x, \varepsilon)\,dx.
\tag{A.30}
\]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C\,\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x) \ge \alpha_i\,\tilde\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\,\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

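The partial derivative of the integrand of (A.30) is

$$\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^\varepsilon(x)}\eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^\varepsilon(x)}\dot\eta_i(x).$$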

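The partial derivatives of $L$ are (in mixed notation)

$$\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \qquad \frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.$$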

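The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

$$\left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right| < B \quad\text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].$$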

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.

We are now free to differentiate under the integral sign, so that

$$G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx$$

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

$$\int_0^x \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma} dx' = \int_0^x \left.\left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma} dx'$$

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

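$$\begin{aligned}
G'_+[0] &= \int_0^\beta \left(\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right) dx \\
&= \int_0^\beta \dot\eta(x)\cdot\left(-\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\,(\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}$$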

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1-\lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

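Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

$$J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx.$$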

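It then follows that

$$\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha,\omega,\beta) &\le \liminf_n \mathcal{J}\big(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\big) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned}$$

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.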

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
E-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
E-mail: cnuzman@lucent.com
E-mail: pawhiting@lucent.com
URL: cm.bell-labs.com/who/nuzman
URL: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References
The next two lemmas compute the cost of extremal curves in a general form which will also apply to the nonempty case of the next section. The conditions of the lemmas are satisfied by the exponential and polynomial extremals of Theorem A.1, as can be readily verified.

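Lemma A.3. Suppose that $\psi_0$ is completely monotone on $[0,\beta]$ with

$$(-1)^i\psi_0^{(i)}(x) = C\rho^i e^{-\rho x} \quad\text{for } i > I,$$

for some $I \ge 0$, $C > 0$ and $\rho > 0$. Further suppose that $\gamma_0, \gamma_1, \ldots$ is an infinite sequence of nonnegative functions on $[0,\beta]$ satisfying

$$\sum_{i=0}^{\infty}\gamma_i(x) = 1, \qquad \sum_{i=0}^{\infty} i\,\gamma_i(x) \le B_0, \qquad \sum_{i=0}^{\infty} -\dot\psi_i(x) = 1, \qquad -\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}$$

for all $x \in [0,\beta]$ and some constant $B_0 < \infty$. Let $\gamma$ denote the vector of functions $\{\gamma_0,\ldots,\gamma_I,\gamma_{I+}\}$, where $\gamma_{I+} = 1 - \sum_{i=0}^{I}\gamma_i$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{\infty}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A.17}$$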

Proof. For all $i > I$ and $x \in [0,\beta]$ we have $\psi_0^{(i+1)}/\psi_0^{(i)} = -\rho$. Therefore, using $\theta = -\dot\psi$ with (A.12) and (A.13), it follows that

$$\theta_{I+} = 1 - \sum_{i=0}^{I} -\dot\psi_i = \sum_{i=I+1}^{\infty} -\dot\psi_i = \rho\sum_{i=I+1}^{\infty}\gamma_i = \rho(1-\psi_I).$$

Then for each $x \in [0,\beta]$ the cost function $L$ can be interpreted as an infinite-dimensional cost function $L_\infty$. Indeed, since $-\dot\psi_i/\gamma_i = -\psi_0^{(i+1)}/\psi_0^{(i)} = \rho$ for $i > I$ and $(1 + \sum_{i=0}^{I}\dot\psi_i)/(1-\psi_I) = \rho$,

$$\begin{aligned}
L(\psi,\dot\psi) &= \sum_{i=0}^{I}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} + (1 + \dot\psi_0 + \cdots + \dot\psi_I)\log\frac{1 + \dot\psi_0 + \cdots + \dot\psi_I}{1-\psi_I} \\
&= \sum_{i=0}^{\infty}(-\dot\psi_i)\log\frac{-\dot\psi_i}{\psi_i - \psi_{i-1}} \equiv L_\infty(\psi,\dot\psi).
\end{aligned}$$

Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi) \ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$.

Note that $|\sum_i \dot\psi_i(x)| = 1$, $x \in [0,\beta]$, and that, given $\varepsilon > 0$, the quantities $\log\big(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\big)$ are uniformly bounded for $x \in [\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

$$J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.$$

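Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

$$\begin{aligned}
&\int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|\,dx + \int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log|\psi_0^{(i)}(x)|\,dx \\
&\qquad = \sum_{i=0}^{\infty}\int_\varepsilon^{\beta-\varepsilon}(\dot\psi_i(x) - \dot\psi_{i-1}(x))\log|\psi_0^{(i)}(x)|\,dx.
\end{aligned} \tag{A.18}$$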

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

$$-\dot\psi_i(x)\log|\psi_0^{(i)}(x)| = \rho\gamma_i(x)[\log C + i\log\rho - \rho x],$$

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

$$\sum_{i=0}^{\infty}\left[\gamma_i\log|\psi_0^{(i)}|\right]_\varepsilon^{\beta-\varepsilon} + \int_\varepsilon^{\beta-\varepsilon}\sum_{i=0}^{\infty} -\dot\psi_i\,dx.$$

The lemma follows on taking limits as $\varepsilon \downarrow 0$.

A similar result holds in the case of the polynomial extremal.

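Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\ldots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

$$\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1} -\dot\psi_i(x) = 1,$$

$$-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\ldots,I-1,$$

for all $x \in [0,\beta]$. Then the cost $J(\gamma)$ is given by

$$J(\gamma) = \beta + \sum_{i=0}^{I}\left[\gamma_i(\beta)\log|\psi_0^{(i)}(\beta)| - \gamma_i(0)\log|\psi_0^{(i)}(0)|\right]. \tag{A.19}$$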

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,\mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $\mathcal{F}(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

$$J(\gamma) = \min_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,\mathcal{P}(\beta)).$$

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $\mathcal{F}(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty} \in \mathbb{R}_+^\infty$ satisfying $\pi_i = \omega_i$ for $i = 0,\ldots,I$, and define the Lagrangian $L$ on this set to be

$$L(\pi,y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\left(1 - \sum_{i=0}^{\infty}\pi_i\right) + z\left(\beta - \sum_{i=0}^{\infty} i\pi_i\right),$$

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $y$ and $z$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x \to x\log x - x[\log\mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i \in [0,\infty)$,

$$\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i\log\frac{\pi^*_i}{\mathcal{P}_i(\beta)} - y\pi^*_i - zi\pi^*_i.$$
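The pointwise minimization above can be checked numerically. The following is a minimal sketch, assuming illustrative (hypothetical) values of $\beta$, $\rho$ and $C$ that are not taken from the paper: setting the derivative of $x\log x - x[\log\mathcal{P}_i(\beta) + y + iz]$ to zero gives $x = \mathcal{P}_i(\beta)e^{y-1+iz}$, and with $z = \log\rho$ and $y = \log C + (1-\rho)\beta + 1$ this should coincide with $C\mathcal{P}_i(\rho\beta)$.

```python
import math

def poisson_pmf(i, mean):
    """Poisson probability P_i(mean) = mean**i * exp(-mean) / i!."""
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

def tilted_minimizer(i, beta, rho, C):
    """Stationary point of x -> x*log(x) - x*(log P_i(beta) + y + i*z)."""
    z = math.log(rho)
    y = math.log(C) + (1.0 - rho) * beta + 1.0
    return poisson_pmf(i, beta) * math.exp(y - 1.0 + i * z)

# Hypothetical twist parameters, chosen only for illustration.
beta, rho, C = 2.0, 1.3, 0.8
for i in range(3, 8):
    lhs = tilted_minimizer(i, beta, rho, C)
    rhs = C * poisson_pmf(i, rho * beta)   # the claimed form C * P_i(rho*beta)
    assert math.isclose(lhs, rhs, rel_tol=1e-12)
```

The check only confirms the algebra for the tail indices $i > I$; it does not determine $\rho$ and $C$ from the constraints.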

Following standard Lagrangian arguments, we thus have

$$\begin{aligned}
\inf_{\pi\in\mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,\mathcal{P}(\beta)) &= \inf_{\pi\in\mathcal{F}(1,\omega,\beta)} L(\pi,y,z) \\
&\ge \inf_{\pi\in\mathbb{R}_+^\infty} L(\pi,y,z) \\
&= D(\pi^*\,\|\,\mathcal{P}(\beta)).
\end{aligned}$$

Since $\pi^* \in \mathcal{F}(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

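Substituting the particular form of $\pi^*_i$ into (A.17), we have

$$J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,\mathcal{P}(\beta)),$$

where

$$\gamma_i(\beta) = \begin{cases}\omega_i, & 0 \le i \le I,\\ C\mathcal{P}_i(\rho\beta), & i > I.\end{cases}$$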

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

$$J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \left(1 - \sum_{i=0}^{I}\omega_i\right)(\log C + (1-\rho)\beta) + \left(\beta - \sum_{i=0}^{I} i\omega_i\right)\log\rho,$$

where $C$ is defined by (2.6). In the polynomial case the cost is simply given by the first term in the above expression.
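The explicit cost formula can be compared numerically with a direct evaluation of the relative entropy. The sketch below makes an assumption worth stating loudly: equation (2.6) is not reproduced in this section, so instead of using it, the code recovers $(\rho, C)$ from the two conditions that the tail $C\mathcal{P}_i(\rho\beta)$, $i > I$, carries the leftover mass $1 - \sum_i\omega_i$ and the leftover mean $\beta - \sum_i i\omega_i$. The function names, the bisection bracket and the example values of $\omega$ and $\beta$ are all hypothetical.

```python
import math

def log_poisson_pmf(i, mean):
    """log of the Poisson probability P_i(mean)."""
    return -mean + i * math.log(mean) - math.lgamma(i + 1)

def tail_mass_and_moment(I, mean, terms=200):
    """(sum_{i>I} P_i(mean), sum_{i>I} i*P_i(mean)), truncated after `terms` terms."""
    mass = moment = 0.0
    for i in range(I + 1, I + 1 + terms):
        p = math.exp(log_poisson_pmf(i, mean))
        mass += p
        moment += i * p
    return mass, moment

def twist_parameters(omega, beta, lo=1e-6, hi=20.0, iters=200):
    """Assumed characterization: match the leftover mass and mean of the tail."""
    I = len(omega) - 1
    free_mass = 1.0 - sum(omega)
    free_mean = beta - sum(i * w for i, w in enumerate(omega))
    target = free_mean / free_mass        # conditional mean required of the tail
    for _ in range(iters):                # bisection: the conditional mean grows with rho
        mid = 0.5 * (lo + hi)
        mass, moment = tail_mass_and_moment(I, mid * beta)
        if moment / mass < target:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    C = free_mass / tail_mass_and_moment(I, rho * beta)[0]
    return rho, C

def head_entropy(omega, beta):
    """First term of the cost: sum over i <= I of omega_i * log(omega_i / P_i(beta))."""
    return sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)

def explicit_cost(omega, beta):
    """Closed-form cost in terms of rho, C, omega and beta."""
    rho, C = twist_parameters(omega, beta)
    free_mass = 1.0 - sum(omega)
    free_mean = beta - sum(i * w for i, w in enumerate(omega))
    return (head_entropy(omega, beta)
            + free_mass * (math.log(C) + (1.0 - rho) * beta)
            + free_mean * math.log(rho))

def relative_entropy_cost(omega, beta, terms=200):
    """Direct (truncated) evaluation of D(pi* || P(beta))."""
    rho, C = twist_parameters(omega, beta)
    I = len(omega) - 1
    total = head_entropy(omega, beta)
    for i in range(I + 1, I + 1 + terms):
        log_pi = math.log(C) + log_poisson_pmf(i, rho * beta)
        total += math.exp(log_pi) * (log_pi - log_poisson_pmf(i, beta))
    return total

omega, beta = [0.2, 0.3], 2.0             # hypothetical constraints with I = 1
print(explicit_cost(omega, beta), relative_entropy_cost(omega, beta))
```

The two printed numbers should agree up to truncation and rounding error. The bisection bracket is adequate for the example only; other constraints may need a wider range.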

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$ we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma^k_i\}_i$, where the function $\gamma^k_i : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

$$\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\gamma^k_{i-k}(x\beta_k/\beta).$$

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and the corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

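The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

$$\min_{\pi\in\mathcal{S}_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \tag{A.20}$$

subject to the conservation constraint

$$\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\pi_{kj} = \beta \tag{A.21}$$

and the terminal conditions

$$\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} \quad\text{for all } 0 \le i \le I. \tag{A.22}$$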
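To make the objective (A.20) and the constraints (A.21) and (A.22) concrete, the following minimal sketch evaluates them for a candidate tuple of finitely supported distributions. The data layout (a dict mapping each class $k$ to a list of probabilities $\pi_{k0}, \pi_{k1}, \ldots$) and the numerical example are assumptions made for illustration; they do not come from the paper.

```python
import math

def log_poisson_pmf(j, mean):
    """log of the Poisson probability P_j(mean)."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def relative_entropy_to_poisson(dist, beta):
    """D(dist || P(beta)) for a finitely supported dist, with 0*log(0) = 0."""
    return sum(p * (math.log(p) - log_poisson_pmf(j, beta))
               for j, p in enumerate(dist) if p > 0)

def objective(alpha, pi, beta):
    """Objective (A.20): sum over classes k of alpha_k * D(pi^(k) || P(beta))."""
    return sum(alpha[k] * relative_entropy_to_poisson(pi[k], beta) for k in pi)

def total_mean(alpha, pi):
    """Left side of the conservation constraint (A.21)."""
    return sum(alpha[k] * sum(j * p for j, p in enumerate(pi[k])) for k in pi)

def terminal_fractions(alpha, pi, I):
    """Right side of (A.22): omega_i = sum over k <= i of alpha_k * pi_{k,i-k}."""
    omega = [0.0] * (I + 1)
    for k, dist in pi.items():
        for j, p in enumerate(dist):
            if k + j <= I:
                omega[k + j] += alpha[k] * p
    return omega

# Hypothetical example: half the urns start empty, half start with one ball.
alpha = {0: 0.5, 1: 0.5}
pi = {0: [0.5, 0.5, 0.0], 1: [0.5, 0.5]}
print(terminal_fractions(alpha, pi, I=2))   # terminal fractions omega_0, omega_1, omega_2
print(total_mean(alpha, pi))                # additional balls per urn, beta
print(objective(alpha, pi, beta=0.5))       # value of (A.20) at this feasible point
```

A candidate $\pi$ is feasible for $(\alpha,\omega,\beta)$ when `terminal_fractions` reproduces $\omega$ and `total_mean` equals $\beta$; the objective value at a feasible point is an upper bound on the minimum in (A.20).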

Recall that $\mathcal{S}_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case the constraints imply that

$$\pi_{kj} = 0, \qquad k+j > I, \tag{A.23}$$

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support and thus finite cost. A sketch of the argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in \mathcal{S}_I$ can be reformulated as a set of polynomial constraints with terminal condition $\hat\omega \in \mathcal{S}_{\hat I}$, where $\hat I > I$. Any feasible point for constraints $(\alpha,\hat\omega,\beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary, until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
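The ordered filling construction can be sketched in code. This is one possible greedy reading of the construction, not necessarily the authors' exact allocation rule: at each terminal level $i$ it draws mass from the lowest-numbered eligible class first. The sketch assumes polynomial constraints with $\sum_k\alpha_k = \sum_i\omega_i = 1$, and it assumes that the monotonicity conditions (2.3), which are not reproduced in this section, read $\sum_{j\le i}\alpha_j \ge \sum_{j\le i}\omega_j$. The function name and data layout are hypothetical.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point for polynomial constraints (illustrative sketch).

    alpha: dict {k: alpha_k} with alpha_k > 0, initial fraction of urns holding k balls
    omega: list [omega_0, ..., omega_I] of terminal fractions, summing to one
    Returns {k: [pi_k0, ..., pi_k,I-k]} satisfying the terminal conditions (A.22).
    """
    I = len(omega) - 1
    classes = sorted(alpha)
    if classes[-1] > I:
        raise ValueError("a class starts above the highest terminal level I")
    unused = dict(alpha)                      # alpha_k times the unassigned mass of pi^(k)
    pi = {k: [0.0] * (I - k + 1) for k in classes}
    for i, need in enumerate(omega):
        for k in classes:
            if k > i or need <= tol:          # class k cannot end below level k
                break
            take = min(need, unused[k])       # use as much of class k as is available
            pi[k][i - k] += take / alpha[k]
            unused[k] -= take
            need -= take
        if need > tol:
            raise ValueError(f"not enough mass to fill terminal level {i}")
    return pi

# Hypothetical example: half the urns start empty, half start with one ball.
alpha = {0: 0.5, 1: 0.5}
omega = [0.25, 0.5, 0.25]
print(ordered_filling(alpha, omega))
```

Any rule for splitting the demand at level $i$ among the eligible classes $k \le i$ gives a feasible point, since only the total available mass enters the monotonicity conditions; the greedy choice is simply the easiest to write down.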

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $\mathcal{S}_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $\mathcal{S}_\infty$ this distance is simply

$$\sum_{i:\,P_i > Q_i}(P_i - Q_i) = \sum_{i:\,Q_i > P_i}(Q_i - P_i),$$

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

$$H_A = \left\{\pi \in \mathcal{S}_\infty^{|\mathcal{K}|} : \sum_k \alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le A\right\}.$$
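The distance formula for two distributions on the nonnegative integers is easy to illustrate. The short sketch below, with hypothetical distributions `P` and `Q`, checks that the two sums in the formula coincide (both distributions have total mass one) and that each equals half the $\ell_1$ distance.

```python
import math

def positive_part_distance(P, Q):
    """Sum of P_i - Q_i over the indices where P_i > Q_i."""
    return sum(p - q for p, q in zip(P, Q) if p > q)

# Hypothetical finitely supported distributions, padded to the same length.
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]

d_pq = positive_part_distance(P, Q)
d_qp = positive_part_distance(Q, P)
half_l1 = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))
assert math.isclose(d_pq, d_qp) and math.isclose(d_pq, half_l1)
```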

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

$$Q_A(\alpha,\omega,\beta) \doteq \mathcal{F}(\alpha,\omega,\beta) \cap H_A$$

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)} \in Q_A$ that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

$$\sum_{j\ge m} j\pi^{(n)}_{kj} \le \varepsilon \tag{A.24}$$

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

$$xy \le (x\log x - x + 1) + (e^y - 1),$$

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

$$\sum_{j=0}^{\infty} e^{j/\delta}\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\,\frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.$$
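Both facts used here are easy to check numerically. The following minimal sketch, with hypothetical grid and parameter values, verifies that the gap in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ is nonnegative on a grid and vanishes at $x = e^y$, and compares a truncated series for the Poisson exponential moment with the closed form $e^{\beta e^{1/\delta} - \beta}$.

```python
import math

def inequality_gap(x, y):
    """(x*log(x) - x + 1) + (exp(y) - 1) - x*y, with the convention 0*log(0) = 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, terms=200):
    """Truncated series for sum_j exp(j/delta) * P_j(beta)."""
    return sum(math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
               for j in range(terms))

# The gap is nonnegative on a grid of (x, y) values and vanishes at x = exp(y).
grid_x = [0.0, 0.1, 0.5, 1.0, 2.0, 10.0]
grid_y = [-3.0, -1.0, 0.0, 0.7, 2.0]
assert all(inequality_gap(x, y) >= -1e-12 for x in grid_x for y in grid_y)
assert abs(inequality_gap(math.exp(0.7), 0.7)) < 1e-12

# Exponential moment of the Poisson distribution, for hypothetical beta and delta.
beta, delta = 2.0, 1.0
print(poisson_exp_moment(beta, delta), math.exp(beta * math.exp(1.0 / delta) - beta))
```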

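Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/\mathcal{P}_j(\beta)$, we have the estimate

$$\begin{aligned}
\sum_{j\ge m} j\pi^{(n)}_{kj} &= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
&\le \sum_{j\ge 0}\delta\left[\pi^{(n)}_{kj}\log\left(\frac{\pi^{(n)}_{kj}}{\mathcal{P}_j(\beta)}\right) - \pi^{(n)}_{kj} + \mathcal{P}_j(\beta)\right] + \delta\sum_{j\ge m}(e^{j/\delta} - 1)\mathcal{P}_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta)) + \delta\sum_{j\ge m}(e^{j/\delta} - 1)\mathcal{P}_j(\beta) \\
&\le \delta A + \delta\sum_{j\ge m}(e^{j/\delta} - 1)\mathcal{P}_j(\beta),
\end{aligned}$$

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]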

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

$$\beta^{(n)}_k \to \beta_k.$$

Finally,

$$\sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta,$$

so that $\pi \in \mathcal{F}(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

$$G \doteq \inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \ge 0.$$

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

$$\sum_{k\in\mathcal{K}}\alpha_k D(\pi^*_{(k)}\,\|\,\mathcal{P}(\beta)) \le \liminf_n \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta)) = G,$$

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

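Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

$$(\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta)$$

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

$$\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).$$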

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$ we may choose a subsequence of problems such that

$$\mathcal{J}^{(n)} \le G + 1/n < A,$$

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\beta^{(n)},\omega^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

$$\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{kj}, \qquad k = 0,1,\ldots,K.$$

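If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,\mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$ even if $\alpha_k = 0$.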

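Thus

$$\sum_{k\in\mathcal{K}}\beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta,$$

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

$$\begin{aligned}
\liminf_n \sum_{k=0}^{K}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta^{(n)})) &\ge \liminf_n \sum_{k:\,\alpha_k>0}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,\mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k>0}\alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}$$

as desired.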

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k+j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in \mathcal{S}_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k+j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k+j \notin \mathcal{I}(\omega)$.

Proof. For $k+j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in \mathcal{S}_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m,n)$ with $m+n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $\mathcal{S}_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\hat\pi_{mn} > 0$, that $\hat\pi_{kj} = 0$ for all $k+j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

$$\hat\omega_i \doteq \sum_{k=0}^{i}\alpha_k\hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\hat\pi_{kj},$$

then $\hat\pi$ is a feasible solution for the constraints $(\alpha,\hat\omega,\hat\beta)$. Next, for any $\eta > 0$ we may define

$$\tilde\beta \doteq (\beta - \eta\hat\beta)/(1-\eta), \qquad \tilde\omega \doteq (\omega - \eta\hat\omega)/(1-\eta).$$

By construction, the constraints $(\alpha,\tilde\omega,\tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\hat\omega,\hat\beta)$, and hence for $(\alpha,\tilde\omega,\tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha,\tilde\omega,\tilde\beta)$, and finally form $\bar\pi = \eta\hat\pi + (1-\eta)\tilde\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\bar\pi$ also has finite support.

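We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

$$\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\left(\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} + 1\right)(\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}\,(\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^\varepsilon_{mn}}{\mathcal{P}_n(\beta)} + \sum_{k\in\mathcal{K}}\alpha_k\sum_{j:\,(k,j)\ne(m,n)}\bar\pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)} - \sum_{k\in\mathcal{K}}\alpha_k\sum_j \pi_{kj}\log\frac{\pi^\varepsilon_{kj}}{\mathcal{P}_j(\beta)}.
\end{aligned}$$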

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

$$-\sum_k \alpha_k D(\pi^{(k)}\,\|\,\mathcal{P}(\beta)) \le 0.$$

The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\bar\pi_{kj}\log\mathcal{P}_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m+n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m+n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m+n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

$$\pi^*_{kj} = \begin{cases} D_k\mathcal{P}_j(\beta)W_{k+j}, & k \in \mathcal{K},\ j+k \in \mathcal{I}(\omega),\\ 0, & \text{otherwise.}\end{cases}$$

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k+j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

$$L(\pi) = \sum_{k\in\mathcal{K}}\alpha_k\sum_{k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)} + \sum_{k\in\mathcal{K}} z_k\alpha_k\left(1 - \sum_{k+l\in\mathcal{I}}\pi_{kl}\right) + \sum_{i\in\mathcal{I}} w_i\left(\omega_i - \sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\right)$$

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

$$\pi^*_{kj} = \mathcal{P}_j(\beta)\,e^{z_k - 1 + w_{k+j}}.$$

The result follows on defining $D_k = e^{z_k-1}$ and $W_i = e^{w_i}$.
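The product form $D_k\mathcal{P}_j(\beta)W_{k+j}$ suggests a simple way to compute the minimizer numerically in the polynomial case. The sketch below uses iterative proportional fitting (alternately rescaling $D_k$ and $W_i$ to meet the normalization and terminal constraints). This is a swapped-in numerical technique, not a method used in the paper. It assumes polynomial constraints with every $\omega_i > 0$ for $0 \le i \le I$, and the function names and example values are hypothetical.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability P_j(mean)."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def polynomial_minimizer(alpha, omega, beta, sweeps=2000):
    """Fit pi_kj = D_k * P_j(beta) * W_{k+j} by iterative proportional fitting.

    alpha: dict {k: alpha_k} with alpha_k > 0 and 0 among the keys
    omega: list [omega_0, ..., omega_I], all positive, summing to one
    """
    I = len(omega) - 1
    classes = sorted(alpha)
    D = {k: 1.0 for k in classes}
    W = [1.0] * (I + 1)
    for _ in range(sweeps):
        # Normalization step: each pi^(k) must sum to one.
        for k in classes:
            D[k] = 1.0 / sum(poisson_pmf(i - k, beta) * W[i] for i in range(k, I + 1))
        # Terminal step: enforce (A.22) at every level i.
        for i in range(I + 1):
            W[i] = omega[i] / sum(alpha[k] * D[k] * poisson_pmf(i - k, beta)
                                  for k in classes if k <= i)
    pi = {k: [D[k] * poisson_pmf(j, beta) * W[k + j] for j in range(I - k + 1)]
          for k in classes}
    return D, W, pi

# Hypothetical example; beta = sum_i i*omega_i - sum_k k*alpha_k = 0.5.
alpha = {0: 0.5, 1: 0.5}
omega = [0.25, 0.5, 0.25]
D, W, pi = polynomial_minimizer(alpha, omega, beta=0.5)
print(pi)
```

When the iteration converges, the fitted $\pi$ has the product form of the theorem and satisfies the constraints, so under the stated assumptions it should coincide with the unique minimizer; the sketch does not prove convergence.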

We now give the corresponding theorem for the exponential case.

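Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

$$\pi^*_{kj} = \begin{cases} C_k\mathcal{P}_j(\rho\beta)W_{k+j}, & k \in \mathcal{K},\ k+j \le I,\ k+j \in \mathcal{I},\\ C_k\mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k+j > I,\\ 0, & \text{otherwise.}\end{cases}$$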

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

$$\beta^{(M)} = \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M} l\pi^*_{kl}, \qquad \eta^{(M)}_k \doteq \sum_{l\le M}\pi^*_{kl},$$

and consider the problem of minimizing

$$\sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\ k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{\mathcal{P}_j(\beta)}$$

subject to the constraints

$$\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\ k+l\in\mathcal{I}} l\pi_{kl} = \beta^{(M)},$$

$$\sum_{l\le M,\ k+l\in\mathcal{I}}\pi_{kl} = \eta^{(M)}_k, \qquad k \in \mathcal{K},$$

$$\sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.$$

By construction the minimizer of this problem is obtained simply by trun-cating πlowast As in the previous lemma the strict positivity of the minimizertogether with the linearity of the constraints guarantees the existence ofLagrange multipliers via [4] Proposition 133 The Lagrangian for the M thproblem is

L(M) =sum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

+ y(M)

(

β(M) minussum

kisinK

αk

sum

lleMk+lisinI

lπkl

)

+Ksum

kisinK

z(M)k αk

(

η(M)k minus

sum

lleMk+lisinI

πkl

)

+sum

iisinI

w(M)i

(

ωi minussum

kleikisinK

αkπkiminusk

)

Since the derivative of L with respect to πkj must be zero at the minimizerit follows that

πlowastkj =Pj(β)ejy(M)minus1+z

(M)k

+w(M)k+j

for all j ltM and k + j isin I where for convenience we have defined wi = 0for i gt I For k+ j gt I note that

πlowastkj+1

πlowastkj=

Pj+1(β)

Pj(β)ey

(M)

so that y(M) is independent of M Since w(M)k+j = 0 if k+ j gt I it follows that

z(M)k and w

(M)i are also independent of M Then the above expression for

πlowastkj holds for all k+ j isin I with fixed Lagrange multipliers y zk and wiThe form expressed in the theorem is based on the substitutions ρ = ey Ck = e(ρminus1)βminus1+zk and Wi = ewi

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{k,j}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I-k$ terminal constraints, has mean $\beta_k$, and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} \doteq \pi^*_{k,j}$ and $\omega(k) = (\omega_{k,0},\ldots,\omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega(k),\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega(k),\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\pi^*_{k,j}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$ and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega(k),\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\ldots,I,
\]
\[
\gamma_{I+}(x) = 1-\sum_{i=0}^{I}\gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\bar\gamma_{k,j}(x) \doteq \gamma_{k,j}(x\beta_k/\beta)$ and likewise define $\bar\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i}\alpha_k\,\bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\,\bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{k,j}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty}-\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}-\dot{\bar\psi}_{k,j}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k,0}(0) = 1$ and $\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A.22).

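In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k,0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\bar\psi_{k,0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k}\bigl(\omega_{k,i} - C_k\mathcal{P}_i(\rho_k\beta_k)\bigr)\left(1-\frac{x\beta_k/\beta}{\beta_k}\right)^{i}\\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k}\bigl(\omega_{k,i} - C_k\mathcal{P}_i(\rho\beta)\bigr)\left(1-\frac{x}{\beta}\right)^{i}\\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k}(W_{k+i}-1)\,\mathcal{P}_i(\rho\beta)\left(1-\frac{x}{\beta}\right)^{i}\right].
\end{aligned}
\]

Here we have used that $\omega_{k,i} = \pi^*_{k,i} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i\le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\bar\psi_{0,0}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I}(W_i-1)\,\mathcal{P}_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\left(1-\frac{x}{\beta}\right)^{i-k}\right]\\
&= \alpha_0C_0\rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k}(W_{k+j}-1)\,\mathcal{P}_j(\rho\beta)\left(1-\frac{x}{\beta}\right)^{j}\right]\\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\bar\psi_{k,0}(x).
\end{aligned}
\tag{A.25}
\]

The second equality uses $\mathcal{P}_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k\,\mathcal{P}_{i-k}(\rho\beta)$ together with the change of index $j = i-k$.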

Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k,0}^{(i-k)}\!\left(x\beta_k/\beta\right)\\
&= \sum_{k=0}^{i}\alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\bar\psi_{k,0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i}\alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(x\beta_k/\beta\right)\\
&= \sum_{k=0}^{i}\alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\bar\psi_{k,0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1-\psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

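In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\bar\psi_{k,0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\tag{A.29}
\]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).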

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x\in[0,\beta)$, $i\le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

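Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i}\alpha_k\,\bar\gamma_{k,i-k}(\beta)\log\bigl|\psi_0^{(i)}(\beta)\bigr|\right] - \sum_{k=0}^{K}\alpha_k\log\bigl|\psi_0^{(k)}(0)\bigr|\\
&= \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\bar\gamma_{k,j}(\beta)\log\left|\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}\right|.
\end{aligned}
\]

The fact that $\bar\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\times\bar\psi_{k,0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j}\psi_{k,0}^{(j)}\!\left(x\beta_k/\beta\right)e^{\beta} = \frac{\bar\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K}\alpha_k\,D\bigl(\bar\gamma_{k,\cdot}(\beta)\,\big\|\,\mathcal{P}(\beta)\bigr).
\]

By construction, the endpoints $\bar\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).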

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma\in\mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\,\|\,\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D\bigl(\theta^{\varepsilon}(x)\,\big\|\,\gamma^{\varepsilon}(x)\bigr)\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\tilde\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma)\le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta}L(\psi,\theta)\,dx$. After defining

\[
\eta(x) \doteq \tilde\psi(x)-\psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta}L(\psi+\varepsilon\eta,\ \theta-\varepsilon\dot\eta)\,dx = \int_0^{\beta}g(x,\varepsilon)\,dx.
\tag{A.30}
\]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\bar\gamma_{k,0}(x)$ and $\bar\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\,\bar\gamma_{k,i-k}(x)\ge\alpha_i\,\bar\gamma_{i,0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon}\ge(1-\varepsilon)\gamma$ and $\theta^{\varepsilon}\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

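The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon}g(x,\varepsilon) = \sum_{i=0}^{I}\left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^{\varepsilon}(x)}\eta_i(x) - \sum_{i=0}^{I}\left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^{\varepsilon}(x)}\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\qquad
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon}\le 1$, $\theta^{\varepsilon}\le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left|\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\right| < B \qquad\text{for } \varepsilon\in[0,\varepsilon_0],\ \text{a.e. } x\in[0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.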

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta}\frac{\partial}{\partial\varepsilon}g(x,\varepsilon)\,dx
\]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[
\int_0^{x}\left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma}dx' = \int_0^{x}\left.\left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma}dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, and using $\eta(0) = \eta(\beta) = 0$, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^{\beta}\left[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right]dx\\
&= \int_0^{\beta}\dot\eta(x)\cdot\left(-\int_0^{x}\frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right)dx\\
&= -\int_0^{\beta}\sum_{i=0}^{I}C_i\,\dot\eta_i(x)\,dx\\
&= \sum_{i=0}^{I}C_i\bigl(\eta_i(0)-\eta_i(\beta)\bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path, of the form $\lambda\tilde\gamma + (1-\lambda)\gamma$, which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)}\downarrow 0$ and $x_2^{(n)}\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)})\le\int_{x_1^{(n)}}^{x_2^{(n)}}D\bigl(\tilde\theta(x)\,\big\|\,\tilde\gamma(x)\bigr)\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha,\omega,\beta)
&\le \liminf_{n}\ \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)}-x_1^{(n)}\bigr)\\
&= \liminf_{n}\ J(\gamma^{(n)})\\
&\le \lim_{n}\int_{x_1^{(n)}}^{x_2^{(n)}}D\bigl(\tilde\theta(x)\,\big\|\,\tilde\gamma(x)\bigr)\,dx\\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization: Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


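Note that since $L_\infty(\psi,\dot\psi)$ can be interpreted as a relative entropy, we always have $L_\infty(\psi,\dot\psi)\ge 0$. The total cost $J(\gamma)$ may be computed by integrating $L_\infty$. Note that $|\sum_i\dot\psi_i(x)| = 1$, $x\in[0,\beta]$, and that given $\varepsilon > 0$, the quantities $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$ are uniformly bounded for $x\in[\varepsilon,\beta-\varepsilon]$. By the monotone convergence theorem,

\[
J(\gamma) = \lim_{\varepsilon\downarrow 0}\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}\,dx.
\]

Using our convention that $\psi_{-1}(x) = 0$, the integral on the right may be written as

\[
\begin{aligned}
&\int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i(x)\log\bigl|\psi_0^{(i+1)}(x)\bigr|\,dx + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}\dot\psi_i(x)\log\bigl|\psi_0^{(i)}(x)\bigr|\,dx\\
&\qquad= \sum_{i=0}^{\infty}\int_{\varepsilon}^{\beta-\varepsilon}\bigl(\dot\psi_i(x)-\dot\psi_{i-1}(x)\bigr)\log\bigl|\psi_0^{(i)}(x)\bigr|\,dx.
\end{aligned}
\tag{A.18}
\]

The above expression is valid as long as the left and right series in the first line converge. But this follows from the finite mean condition on $\gamma_i(x)$, since for $i > I$,

\[
-\dot\psi_i(x)\log\bigl|\psi_0^{(i)}(x)\bigr| = \rho\,\gamma_i(x)\bigl[\log C + i\log\rho - \rho x\bigr],
\]

and similarly for $\dot\psi_i(x)\log|\psi_0^{(i+1)}(x)|$. Applying integration by parts and using (A.15) for each term of (A.18), the integral is

\[
\sum_{i=0}^{\infty}\Bigl[\gamma_i\log\bigl|\psi_0^{(i)}\bigr|\Bigr]_{\varepsilon}^{\beta-\varepsilon} + \int_{\varepsilon}^{\beta-\varepsilon}\sum_{i=0}^{\infty}-\dot\psi_i\,dx.
\]

The lemma follows on taking limits as $\varepsilon\downarrow 0$.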

A similar result holds in the case of the polynomial extremal.

Lemma A.4. Suppose that $\psi_0$ is a degree $I$ polynomial which is completely monotone on $[0,\beta]$. Let $\gamma_0,\ldots,\gamma_I$ be nonnegative functions on $[0,\beta]$ satisfying

\[
\sum_{i=0}^{I}\gamma_i(x) = 1, \qquad \sum_{i=0}^{I-1}-\dot\psi_i(x) = 1,
\]
\[
-\dot\psi_i(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad i = 0,\ldots,I-1,
\]

for all $x\in[0,\beta]$. Then the cost $J(\gamma)$ is given by

\[
J(\gamma) = \beta + \sum_{i=0}^{I}\Bigl[\gamma_i(\beta)\log\bigl|\psi_0^{(i)}(\beta)\bigr| - \gamma_i(0)\log\bigl|\psi_0^{(i)}(0)\bigr|\Bigr].
\tag{A.19}
\]

Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Lemma A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\,\|\,\mathcal{P}(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$, and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi\in F(1,\omega,\beta)} D(\pi\,\|\,\mathcal{P}(\beta)).
\]

Proof. We first solve the minimization problem, and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C > 0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty}\in\mathbb{R}_+^{\infty}$ satisfying $\pi_i = \omega_i$ for $i = 0,\ldots,I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi;y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} + y\Biggl(1-\sum_{i=0}^{\infty}\pi_i\Biggr) + z\Biggl(\beta-\sum_{i=0}^{\infty}i\,\pi_i\Biggr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C$, $\rho$. For any $i > I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x\to x\log x - x[\log\mathcal{P}_i(\beta) + y + iz]$. Therefore, for any $i > I$ and $\pi_i\in[0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{\mathcal{P}_i(\beta)} - y\pi_i - zi\pi_i \ge \pi^*_i\log\frac{\pi^*_i}{\mathcal{P}_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]

Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi\in F(1,\omega,\beta)}D(\pi\,\|\,\mathcal{P}(\beta)) &= \inf_{\pi\in F(1,\omega,\beta)}L(\pi;y,z)\\
&\ge \inf_{\pi\in\mathbb{R}_+^{\infty}}L(\pi;y,z)\\
&= D(\pi^*\,\|\,\mathcal{P}(\beta)).
\end{aligned}
\]

Since $\pi^*\in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

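Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,\mathcal{P}(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0\le i\le I,\\
C\,\mathcal{P}_i(\rho\beta), & i > I.
\end{cases}
\]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{\mathcal{P}_i(\beta)} + \Biggl(1-\sum_{i=0}^{I}\omega_i\Biggr)\bigl(\log C + (1-\rho)\beta\bigr) + \Biggl(\beta-\sum_{i=0}^{I}i\,\omega_i\Biggr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.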
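The explicit cost formula can be checked numerically in the simplest exponential setting, $I = 0$, where only the fraction $\omega_0$ of empty urns is constrained. The sketch below is illustrative and makes its own assumptions: for $I = 0$ the mean constraint $\sum_{i\ge 1} i\,C\mathcal{P}_i(\rho\beta) = \beta$ gives $C = 1/\rho$, and normalization then gives $(1-e^{-\rho\beta})/\rho = 1-\omega_0$, which is solved by bisection (this requires $\beta > 1-\omega_0$). These relations are derived here from the two constraints and are assumed, not verified, to agree with (2.6). All function names are hypothetical.

```python
import math

def poisson_pmf(i, mean):
    """Poisson probability P_i(mean)."""
    return math.exp(-mean) * mean ** i / math.factorial(i)

def solve_rho(omega0, beta, tol=1e-12):
    """Solve (1 - exp(-rho*beta)) / rho = 1 - omega0 for rho > 0 by bisection."""
    target = 1.0 - omega0

    def f(rho):
        return (1.0 - math.exp(-rho * beta)) / rho - target

    lo, hi = 1e-9, 1.0
    while f(hi) > 0:          # f is decreasing in rho, so grow the bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def explicit_cost(omega0, beta):
    """Three-term cost formula specialized to I = 0 (so sum_i i*omega_i = 0)."""
    rho = solve_rho(omega0, beta)
    C = 1.0 / rho
    return (omega0 * math.log(omega0 / poisson_pmf(0, beta))
            + (1.0 - omega0) * (math.log(C) + (1.0 - rho) * beta)
            + beta * math.log(rho))

def direct_cost(omega0, beta, terms=80):
    """Truncated relative entropy D(pi* || P(beta)) summed term by term."""
    rho = solve_rho(omega0, beta)
    C = 1.0 / rho
    total = omega0 * math.log(omega0 / poisson_pmf(0, beta))
    for i in range(1, terms):
        p = C * poisson_pmf(i, rho * beta)
        total += p * math.log(p / poisson_pmf(i, beta))
    return total

print(explicit_cost(0.5, 1.0), direct_cost(0.5, 1.0))
```

When $\omega_0 = e^{-\beta}$, the Poisson value, the solver returns $\rho \approx 1$ and the cost is approximately zero, as expected for the zero cost distribution.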

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0\in\mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K\ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes, according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma(k) = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma(k)$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega(k),\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and the corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega(k),\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi\in S_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k\,D(\pi(k)\,\|\,\mathcal{P}(\beta))
\tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}j\,\pi_{k,j} = \beta
\tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\ k\in\mathcal{K}}\alpha_k\,\pi_{k,i-k} \qquad\text{for all } 0\le i\le I.
\tag{A.22}
\]

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi(k)\}_{k\in\mathcal{K}}$, where each $\pi(k)$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case the constraints imply that

\[
\pi_{k,j} = 0, \qquad k+j > I,
\tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi(0),\ldots,\pi(k),\ldots,\pi(K))$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0\le k\le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi(k)$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^*\in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi\in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch of the argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega\in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega\in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi(0)$ is applied to $\pi_{0,1}$ and mass from $\pi(1)$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi(0)$ and $\pi(1)$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi(k)$.
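The ordered filling construction can be sketched in a few lines. This is an illustration under stated assumptions, not the construction as the authors would necessarily implement it: the monotonicity conditions (2.3) are taken to read $\sum_{j\le i}\alpha_j \ge \sum_{j\le i}\omega_j$, the constraints are polynomial with $K\le I$ and $\sum_k\alpha_k = \sum_i\omega_i = 1$, and mass is drawn from the lowest initial occupancy class first (one of several valid orders). The function name is hypothetical, and exact rational arithmetic is used so that the constraints can be checked without rounding.

```python
from fractions import Fraction

def ordered_filling(alpha, omega):
    """Return pi[k][j] (j = 0..I-k) meeting the terminal conditions.

    remaining[k] tracks alpha_k times the probability mass of pi(k)
    that has not yet been assigned.
    """
    K, I = len(alpha) - 1, len(omega) - 1
    remaining = list(alpha)
    pi = [[Fraction(0)] * (I - k + 1) for k in range(K + 1)]
    for i in range(I + 1):
        need = omega[i]
        for k in range(min(i, K) + 1):      # lowest initial occupancy first
            if alpha[k] == 0:
                continue                    # classes with alpha_k = 0 are skipped
            take = min(need, remaining[k])
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need != 0:
            raise ValueError("monotonicity condition violated at i = %d" % i)
    if any(r != 0 for r in remaining):
        raise ValueError("mass left over; constraints are not polynomial")
    return pi

# Hypothetical example: half the urns start empty, half start with one ball.
alpha = [Fraction(1, 2), Fraction(1, 2)]
omega = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
pi = ordered_filling(alpha, omega)
```

In this example each $\pi(k)$ sums to one and the implied number of added balls per urn is $\sum_k\alpha_k\sum_j j\,\pi_{k,j} = 3/5$, which matches $\sum_i i\,\omega_i - \sum_k k\,\alpha_k$.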

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P, Q$ in $S_\infty$ this distance is simply

$$\sum_{i:\,P_i > Q_i} (P_i - Q_i) = \sum_{i:\,Q_i > P_i} (Q_i - P_i),$$

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

$$\mathcal{H}_A = \Bigl\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_k \alpha_k D(\pi(k) \,\|\, \mathcal{P}(\beta)) \le A \Bigr\}.$$
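The displayed distance between two distributions can be checked numerically. The following is a minimal Python sketch, not part of the paper: the function name `positive_part_gap` and the two example vectors are assumptions chosen for illustration. It confirms that summing $P_i - Q_i$ over indices with $P_i > Q_i$ gives the same value as summing $Q_i - P_i$ over indices with $Q_i > P_i$, which holds because both vectors sum to one.

```python
def positive_part_gap(p, q):
    """Sum of (p_i - q_i) over the indices where p_i > q_i."""
    return sum(pi - qi for pi, qi in zip(p, q) if pi > qi)

# Two hypothetical probability vectors on {0, 1, 2, 3}.
p = [0.5, 0.3, 0.2, 0.0]
q = [0.25, 0.25, 0.25, 0.25]

d_pq = positive_part_gap(p, q)  # mass by which p exceeds q
d_qp = positive_part_gap(q, p)  # mass by which q exceeds p
print(d_pq, d_qp)
```

Both quantities agree up to floating-point rounding, so either sum may be used to evaluate the distance.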

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $\mathcal{H}_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show, for some finite $A$, that

$$\mathcal{Q}_A(\alpha,\omega,\beta) = \mathcal{F}(\alpha,\omega,\beta) \cap \mathcal{H}_A$$

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $\mathcal{Q}_A$.

Since there are solutions with finite cost, it is automatic that $\mathcal{Q}_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $\mathcal{Q}_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in \mathcal{Q}_A$, that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\omega,\beta)$. (That it lies in $\mathcal{H}_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi(k)$ is equal to the limit of the means of the $\pi^{(n)}(k)$. This will be the case if we can show, for any sequence in $\mathcal{Q}_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}(k)$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

$$\sum_{j \ge m} j\,\pi^{(n)}_{k,j} \le \varepsilon \tag{A.24}$$

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

$$xy \le (x\log x - x + 1) + (e^y - 1),$$

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

$$\sum_{j=0}^{\infty} e^{j/\delta}\,\mathcal{P}_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\, e^{-\beta}\frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.$$
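Both facts used here, the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ and the closed form for the Poisson exponential moment, are easy to sanity-check numerically. The sketch below is an illustration only: the helper names (`young_gap`, `exp_moment`, `exp_moment_closed_form`), the grid, and the parameter values $\beta = 2$, $\delta \in \{1, 0.5\}$ are assumptions, not taken from the paper. The sum is truncated at 400 terms and evaluated in log space to avoid overflow.

```python
import math

def log_poisson_pmf(j, beta):
    """Logarithm of P_j(beta) = exp(-beta) * beta**j / j!."""
    return -beta + j * math.log(beta) - math.lgamma(j + 1)

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with the convention 0 log 0 = 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def exp_moment(beta, delta, terms=400):
    """Truncated sum of exp(j/delta) * P_j(beta), each term computed in log space."""
    return sum(math.exp(j / delta + log_poisson_pmf(j, beta)) for j in range(terms))

def exp_moment_closed_form(beta, delta):
    """The closed form exp(beta * e^(1/delta) - beta) stated in the text."""
    return math.exp(beta * math.exp(1.0 / delta) - beta)

# Smallest gap over a grid with x in [0, 10] and y in [-3, 3]; it should be nonnegative.
min_gap = min(young_gap(0.25 * a, 0.5 * b - 3.0) for a in range(41) for b in range(13))
print(min_gap)
print(exp_moment(2.0, 1.0), exp_moment_closed_form(2.0, 1.0))
```

The gap vanishes only where $x = e^y$, which is the point attaining the supremum in the variational formula for $e^y - 1$.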

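Taking $y = j/\delta$ and $x = \pi^{(n)}_{k,j}/\mathcal{P}_j(\beta)$, we have the estimate

$$
\begin{aligned}
\sum_{j \ge m} j\,\pi^{(n)}_{k,j}
&= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\,\mathcal{P}_j(\beta) \\
&\le \sum_{j \ge 0} \delta\left[\pi^{(n)}_{k,j}\log\left(\frac{\pi^{(n)}_{k,j}}{\mathcal{P}_j(\beta)}\right) - \pi^{(n)}_{k,j} + \mathcal{P}_j(\beta)\right] + \delta\sum_{j \ge m}(e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
&= \delta D(\pi^{(n)}(k) \,\|\, \mathcal{P}(\beta)) + \delta\sum_{j \ge m}(e^{j/\delta} - 1)\,\mathcal{P}_j(\beta) \\
&\le \delta A + \delta\sum_{j \ge m}(e^{j/\delta} - 1)\,\mathcal{P}_j(\beta),
\end{aligned}
$$

where the second equality uses the fact that both $\pi^{(n)}(k)$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]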

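We now apply the uniform integrability to analyze the limits of the means. Writing $\beta^{(n)}_k$ and $\beta_k$ for the means of $\pi^{(n)}(k)$ and $\pi(k)$, it follows from [5], Theorem 5.4, that

$$\beta^{(n)}_k \to \beta_k.$$

Finally,

$$\sum_k \alpha_k\beta_k = \lim_n \sum_k \alpha_k\beta^{(n)}_k = \lim_n \beta = \beta,$$

so that $\pi \in \mathcal{F}(\alpha,\omega,\beta)$ and $\mathcal{Q}_A$ is compact.

We have shown that $\mathcal{Q}_A$ is compact under the Prohorov metric and that it is nonempty. Now let

$$G = \inf_{\pi \in \mathcal{Q}_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi(k) \,\|\, \mathcal{P}(\beta)) \ge 0.$$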

Choose a sequence $\pi^{(n)} \in \mathcal{Q}_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}(k) \,\|\, \mathcal{P}(\beta)) \le G + 1/n$. Since $\mathcal{Q}_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in \mathcal{Q}_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

$$\sum_{k \in \mathcal{K}} \alpha_k D(\pi^*(k) \,\|\, \mathcal{P}(\beta)) \le \liminf_n \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)}(k) \,\|\, \mathcal{P}(\beta)) = G,$$

where in fact we have equality, by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

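Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

$$(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha,\omega,\beta)$$

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

$$\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).$$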

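Proof. For any $A > \liminf_n \mathcal{J}^{(n)} = G$, we may choose a subsequence of problems such that

$$\mathcal{J}^{(n)} \le G + 1/n < A,$$

and for each problem in this subsequence we define $\pi^{(n)} \in \mathcal{Q}_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in \mathcal{H}_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then

$$\beta^{(n)}_k = \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \to \beta_k = \sum_{j=0}^{\infty} j\,\pi_{k,j}, \qquad k = 0, 1, \ldots, K.$$

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}(k) \,\|\, \mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, \mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.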
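The expression $\beta - \lambda + \lambda\log(\lambda/\beta)$ quoted in this argument coincides with the relative entropy of a Poisson($\lambda$) distribution with respect to Poisson($\beta$). The sketch below illustrates this numerically under assumed values $\lambda = 3$, $\beta = 1$; the function names are hypothetical and the infinite sum is truncated at 300 terms. It also compares against one competing distribution with the same mean (a point mass at 3), which has a larger relative entropy, consistent with the infimum claim.

```python
import math

def log_poisson_pmf(j, mean):
    """Logarithm of the Poisson probability exp(-mean) * mean**j / j!."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)

def relative_entropy_poisson(lam, beta, terms=300):
    """Truncated D(P(lam) || P(beta)) = sum_j p_j * (log p_j - log q_j)."""
    total = 0.0
    for j in range(terms):
        log_p = log_poisson_pmf(j, lam)
        log_q = log_poisson_pmf(j, beta)
        total += math.exp(log_p) * (log_p - log_q)
    return total

def closed_form(lam, beta):
    """The expression beta - lam + lam * log(lam / beta) from the text."""
    return beta - lam + lam * math.log(lam / beta)

def point_mass_cost(m, beta):
    """D(delta_m || P(beta)) = -log P_m(beta) for a point mass at integer m."""
    return -log_poisson_pmf(m, beta)

print(relative_entropy_poisson(3.0, 1.0), closed_form(3.0, 1.0))
print(point_mass_cost(3, 1.0))
```

Working with log-probabilities avoids dividing by Poisson probabilities that underflow to zero far in the tail.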

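Thus,

$$\sum_{k \in \mathcal{K}} \beta_k\alpha_k = \lim_n \sum_k \beta^{(n)}_k\alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta,$$

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

$$
\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}(k) \,\|\, \mathcal{P}(\beta^{(n)}))
&\ge \liminf_n \sum_{k:\,\alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}(k) \,\|\, \mathcal{P}(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k > 0} \alpha_k D(\pi(k) \,\|\, \mathcal{P}(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
$$

as desired.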

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\bar\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\bar\pi_{m,n} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\hat\pi_{m,n} > 0$, that $\hat\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

$$\hat\omega_i = \sum_{k=0}^{i} \alpha_k\hat\pi_{k,i-k}, \qquad \hat\beta = \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\hat\pi_{k,j},$$

then $\hat\pi$ is a feasible solution for the constraints $(\alpha, \hat\omega, \hat\beta)$. Next, for any $\eta > 0$ we may define

$$\tilde\beta = (\beta - \eta\hat\beta)/(1 - \eta), \qquad \tilde\omega = (\omega - \eta\hat\omega)/(1 - \eta).$$

By construction, the constraints $(\alpha, \tilde\omega, \tilde\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \hat\omega, \hat\beta)$, and hence for $(\alpha, \tilde\omega, \tilde\beta)$.

We may now let $\tilde\pi$ be a feasible point for $(\alpha, \tilde\omega, \tilde\beta)$, and finally form $\check\pi = \eta\hat\pi + (1 - \eta)\tilde\pi$. By construction, $\check\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\check\pi_{m,n} > 0$. As discussed in the proof of Lemma A.6, we may take $\tilde\pi$ to have finite support, so that $\check\pi$ also has finite support.

We will show that $\bar\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\check\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\check\pi + (1 - \varepsilon)\bar\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

$$
\begin{aligned}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left(\log\frac{\pi^\varepsilon_{k,j}}{\mathcal{P}_j(\beta)} + 1\right)(\check\pi_{k,j} - \bar\pi_{k,j}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{k,j}}{\mathcal{P}_j(\beta)}\,(\check\pi_{k,j} - \bar\pi_{k,j}) \\
&= \alpha_m\check\pi_{m,n}\log\frac{\pi^\varepsilon_{m,n}}{\mathcal{P}_n(\beta)} + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \check\pi_{k,j}\log\frac{\pi^\varepsilon_{k,j}}{\mathcal{P}_j(\beta)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \bar\pi_{k,j}\log\frac{\pi^\varepsilon_{k,j}}{\mathcal{P}_j(\beta)}.
\end{aligned}
$$

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

$$-\sum_k \alpha_k D(\bar\pi(k) \,\|\, \mathcal{P}(\beta)) \le 0.$$

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \check\pi_{k,j}\log\mathcal{P}_j(\beta)$, which is finite because $\check\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that for any $m + n \in \mathcal{I}(\omega)$, $\bar\pi$ cannot minimize (A.20) if $\bar\pi_{m,n} = 0$. As $\bar\pi_{m,n}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

$$
\pi^*_{k,j} =
\begin{cases}
D_k\,\mathcal{P}_j(\beta)\,W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
$$

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

$$
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
+ \sum_{k \in \mathcal{K}} z_k\alpha_k\left(1 - \sum_{k+l \in \mathcal{I}} \pi_{k,l}\right)
+ \sum_{i \in \mathcal{I}} w_i\left(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\right)
$$

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

$$\pi^*_{k,j} = \mathcal{P}_j(\beta)\,e^{z_k - 1 + w_{k+j}}.$$

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
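For readers who want the intermediate step, the stationarity computation behind this optimality condition can be written out as follows. This is a worked sketch derived from the Lagrangian as stated in the proof, using $\alpha_k > 0$ for $k \in \mathcal{K}$; it is not an additional result.

```latex
\begin{align*}
\frac{\partial L}{\partial \pi_{k,j}}
  &= \alpha_k\left(\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)} + 1\right)
     - z_k\alpha_k - w_{k+j}\alpha_k = 0, \\
\log\frac{\pi^*_{k,j}}{\mathcal{P}_j(\beta)}
  &= z_k - 1 + w_{k+j}
  \qquad\text{(dividing by } \alpha_k > 0\text{)}, \\
\pi^*_{k,j}
  &= e^{z_k - 1}\,\mathcal{P}_j(\beta)\,e^{w_{k+j}}
   = D_k\,\mathcal{P}_j(\beta)\,W_{k+j}.
\end{align*}
```

The factor $\alpha_k$ appears in every term because each constraint involving $\pi_{k,j}$ carries the weight $\alpha_k$, which is why the multipliers enter the exponent without any dependence on $\alpha$.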

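We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

$$
\pi^*_{k,j} =
\begin{cases}
C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k\,\mathcal{P}_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise.}
\end{cases}
$$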

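Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

$$\beta^{(M)} = \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\,\pi^*_{k,l}, \qquad \eta^{(M)}_k = \sum_{l \le M} \pi^*_{k,l},$$

and consider the problem of minimizing

$$\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}$$

subject to the constraints

$$
\begin{aligned}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{k,l} &= \beta^{(M)}, \\
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{k,l} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{aligned}
$$

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

$$
\begin{aligned}
L^{(M)} = {} & \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{k,j}\log\frac{\pi_{k,j}}{\mathcal{P}_j(\beta)}
+ y^{(M)}\left(\beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\,\pi_{k,l}\right) \\
& + \sum_{k \in \mathcal{K}} z^{(M)}_k\alpha_k\left(\eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{k,l}\right)
+ \sum_{i \in \mathcal{I}} w^{(M)}_i\left(\omega_i - \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k\pi_{k,i-k}\right).
\end{aligned}
$$

Since the derivative of $L^{(M)}$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

$$\pi^*_{k,j} = \mathcal{P}_j(\beta)\,\exp\bigl(j\,y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}\bigr)$$

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

$$\frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{\mathcal{P}_{j+1}(\beta)}{\mathcal{P}_j(\beta)}\,e^{y^{(M)}},$$

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.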
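The substitution at the end of this proof rests on the identity $\mathcal{P}_j(\beta)\,e^{jy} = e^{(\rho - 1)\beta}\,\mathcal{P}_j(\rho\beta)$ with $\rho = e^y$, which is what turns the exponentially tilted Poisson($\beta$) weights into Poisson($\rho\beta$) weights. The following minimal Python sketch checks the identity numerically; the function names and the values $\beta = 1.7$, $y = 0.4$ are assumptions chosen only for illustration.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability exp(-mean) * mean**j / j!."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))

def tilted_weight(j, beta, y):
    """Left-hand side: P_j(beta) * exp(j * y)."""
    return poisson_pmf(j, beta) * math.exp(j * y)

def rescaled_weight(j, beta, y):
    """Right-hand side: exp((rho - 1) * beta) * P_j(rho * beta), with rho = exp(y)."""
    rho = math.exp(y)
    return math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta)

beta, y = 1.7, 0.4  # hypothetical parameter values
max_rel_err = max(
    abs(tilted_weight(j, beta, y) / rescaled_weight(j, beta, y) - 1.0)
    for j in range(15)
)
print(max_rel_err)
```

Multiplying both sides by $e^{-1 + z_k + w_{k+j}}$ then gives $\pi^*_{k,j} = C_k\,\mathcal{P}_j(\rho\beta)\,W_{k+j}$ with $C_k = e^{(\rho - 1)\beta - 1 + z_k}$, as stated.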

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k = \sum_j j\,\pi^*_{k,j}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I - k$ terminal constraints, has mean $\beta_k$, and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} = \pi^*_{k,j}$ and $\omega(k) = (\omega_{k,0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

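Theorem A.11. Suppose that $\{\pi^*_{k,j}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

$$
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{aligned}
$$

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.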

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define the rescaled curves $\tilde\gamma_{k,j}(x) = \gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

$$\psi_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\psi_{k,i-k}(x).$$

To see that the $\gamma_i$ are valid occupancy curves, note that

$$\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(x) = 1.$$

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0, \beta]$, and

$$\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{k,j}(x) = \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,$$

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k,0}(0) = 1$ and $\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A.22).

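In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k,0}(x) = \psi_{k,0}(x\beta_k/\beta)$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

$$
\begin{aligned}
\tilde\psi_{k,0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k\mathcal{P}_i(\rho_k\beta_k)\bigr)\left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^i \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k\mathcal{P}_i(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^i \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)\,\mathcal{P}_i(\rho\beta)\left(1 - \frac{x}{\beta}\right)^i\right].
\end{aligned}
$$

Here we have used that $\omega_{k,i} = \pi^*_{k,i} = C_k\mathcal{P}_i(\rho\beta)W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\psi_{0,0}$ is

$$
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)\,\mathcal{P}_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0C_0\rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)\,\mathcal{P}_j(\rho\beta)\left(1 - \frac{x}{\beta}\right)^j\right] \\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\tilde\psi_{k,0}(x).
\end{aligned}
\tag{A.25}
$$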
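The passage from the first to the second line of (A.25) uses a coefficient identity that is worth spelling out. The following worked step is a sketch based only on the definition $\mathcal{P}_i(\rho\beta) = e^{-\rho\beta}(\rho\beta)^i/i!$ and the reindexing $j = i - k$.

```latex
\begin{align*}
(-1)^k \frac{d^k}{dx^k}\left(1 - \frac{x}{\beta}\right)^{i}
  &= \frac{i!}{(i-k)!\,\beta^{k}}\left(1 - \frac{x}{\beta}\right)^{i-k},
  \qquad i \ge k, \\
\mathcal{P}_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^{k}}
  &= e^{-\rho\beta}\,\frac{(\rho\beta)^{i}}{(i-k)!\,\beta^{k}}
   = \rho^{k}\,e^{-\rho\beta}\,\frac{(\rho\beta)^{i-k}}{(i-k)!}
   = \rho^{k}\,\mathcal{P}_{i-k}(\rho\beta).
\end{align*}
```

Terms with $i < k$ vanish under $k$-fold differentiation, which is why the sum in (A.25) starts at $i = k$ and becomes a sum over $j = 0, \ldots, I - k$ after reindexing.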

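Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, and writing $\tilde\psi_{k,0}(x) = \psi_{k,0}(x\beta_k/\beta)$, we have

$$
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k,0}^{(i-k)}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{i-k}}{dx^{i-k}}\tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi_0^{(i)}(x).
\end{aligned}
$$

Similarly, we obtain

$$
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{i-k+1}}{dx^{i-k+1}}\tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{aligned}
$$

so that the extremals satisfy the simple relation

$$\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}$$

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

$$\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}$$

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.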

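In the polynomial case, similar computations show that

$$(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\tilde\psi_{k,0}(x), \tag{A.28}$$

where $\tilde\psi_{k,0}(x) = \psi_{k,0}(x\beta_k/\beta)$, and that

$$\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_kD_k}{\alpha_0D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}$$

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).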

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

$$\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'), \theta(x')) = C_i$$

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists, for the same reason.

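Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence, writing $\tilde\gamma_{k,j}(x) = \gamma_{k,j}(x\beta_k/\beta)$ and $\tilde\psi_{k,0}(x) = \psi_{k,0}(x\beta_k/\beta)$ for the rescaled subproblem curves, the cost is

$$
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\right] - \sum_{k=0}^{K} \alpha_k\log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}
$$

The fact that $\tilde\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \tilde\psi_{k,0}(x)$, so that

$$\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^j \psi_{k,0}^{(j)}\!\left(\frac{x\beta_k}{\beta}\right)e^{\beta} = \frac{\tilde\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}.$$

Then

$$J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k,\cdot}(\beta) \,\|\, \mathcal{P}(\beta)).$$

By construction, the endpoints $\tilde\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).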

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\bar\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\bar\gamma) \le J(\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) = \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\bar\gamma, \gamma \in \mathcal{O}(\alpha,\omega,\beta)$ we denote $\gamma^\varepsilon = (1 - \varepsilon)\bar\gamma + \varepsilon\gamma$, and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

$$G[\varepsilon] = J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\,dx.$$

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\bar\gamma$ and $\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\bar\gamma, \gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\bar\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\gamma$ is any competing occupancy function satisfying the same constraints. Then

$$J(\bar\gamma) \le J(\gamma).$$

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\bar\gamma + \varepsilon\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\,dx$. After defining

$$\eta(x) = \psi(x) - \bar\psi(x),$$

we may write

$$G[\varepsilon] = \int_0^\beta L(\bar\psi + \varepsilon\eta,\ \bar\theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x, \varepsilon)\,dx. \tag{A.30}$$

We can assume without loss that $G(1) = J(\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$, for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\bar\gamma_i(x)$ and derivatives $\bar\theta_i(x) = -\dot{\bar\psi}_i(x)$ of the extremal are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C\mathcal{P}_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so the rescaled subproblem curves $\tilde\gamma_{k,0}(x)$ and $\tilde\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\bar\gamma_i$ and $\bar\theta_i$, since, for example,

$$\bar\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x) \ge \alpha_i\tilde\gamma_{i,0}(x).$$

The catch-all term $\bar\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\bar\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\bar\theta_{I+}(x) = \rho\bar\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\bar\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\bar\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\bar\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\theta$ may not exist.

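The partial derivative of the integrand of (A.30) is

$$\frac{\partial}{\partial\varepsilon}g(x, \varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^\varepsilon(x)}\eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^\varepsilon(x)}\dot\eta_i(x).$$

The partial derivatives of $L$ are (in mixed notation)

$$
\begin{aligned}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{aligned}
$$

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

$$\left|\frac{\partial}{\partial\varepsilon}g(x, \varepsilon)\right| < B \qquad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta].$$

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.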

We are now free to differentiate under the integral sign, so that

$$G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon}g(x, \varepsilon)\,dx$$

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

$$\int_0^x \frac{\partial L}{\partial\psi_i}\bigg|_{\bar\gamma}\,dx' = \int_0^x \left(-\frac{\bar\theta_i}{\bar\gamma_i} + \frac{\bar\theta_{i+1}}{\bar\gamma_{i+1}}\right)dx'$$

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

$$
\begin{aligned}
G'_+[0] &= \int_0^\beta \left\{\frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x)\right\}dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left(-\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right)dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\,(\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
$$

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\bar\gamma$ and $\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\bar\gamma$ be this extremal, and suppose that $\gamma$ is an alternate occupancy function. It may be supposed that $\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\gamma$ with lower cost than $\bar\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\bar\gamma$ which also has lower cost than $\bar\gamma$, and which avoids the boundary everywhere that $\bar\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\bar\gamma^{(n)}$ be the extremal curve with initial point $\gamma(x_1^{(n)})$ and terminal point $\gamma(x_2^{(n)})$. By Lemma A.16,

$$J(\bar\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\theta(x) \,\|\, \gamma(x))\,dx.$$

It then follows that

$$
\begin{aligned}
J(\bar\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal{J}\bigl(\gamma(x_1^{(n)}), \gamma(x_2^{(n)}), x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_n J(\bar\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\theta(x) \,\|\, \gamma(x))\,dx \\
&= J(\gamma),
\end{aligned}
$$

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


Proof. The cost is obtained by integrating the reduced cost function (A.9) from 0 to $\beta$, using the same substitutions and integration by parts as in the proof of Theorem A.3.

A.6. Characterization of the extremal cost. In the case of empty initial conditions, recall that the $\gamma_i$ defined in the proof of Theorem A.1 satisfy $\gamma_i(x) = x^i|\psi_0^{(i)}(x)|/i!$. Using this expression to substitute for $\psi_0^{(i)}$ in (A.17) and (A.19), we find that the cost $J(\gamma)$ is simply the relative entropy $D(\gamma(\beta)\|P(\beta))$ between the $\gamma_i(\beta)$ and the Poisson zero cost distribution. It turns out that the given $\gamma_i(\beta)$ minimize the relative entropy among all distributions for which the first $I+1$ elements are determined by $\omega$ and which have mean $\beta$. Denoting the set of all such distributions by $F(1,\omega,\beta)$, we may prove Theorem 2.5, which we restate here.

Theorem A.5. Suppose that $\gamma$ is an extremal occupancy path constructed according to Theorem A.1 to meet feasible terminal constraints $(1,\omega,\beta)$. Then

\[
J(\gamma) = \min_{\pi\in F(1,\omega,\beta)} D(\pi\|P(\beta)).
\]

Proof. We first solve the minimization problem and then relate the problem to $J(\gamma)$. The result is trivially true for polynomial extremals, since then $F(1,\omega,\beta)$ has only one element. In the case of an exponential extremal, the given minimization problem can be solved using Lagrange multipliers, which turn out to be simple functions of the twist parameters $\rho$ and $C>0$ associated with the given endpoint constraints. We consider the set of nonnegative sequences $\{\pi_j\}_{j=0}^{\infty}\in\mathbb{R}_+^{\infty}$ satisfying $\pi_i=\omega_i$ for $i=0,\ldots,I$, and define the Lagrangian $L$ on this set to be

\[
L(\pi;y,z) = \sum_{i=0}^{\infty}\pi_i\log\frac{\pi_i}{P_i(\beta)} + y\Bigl(1-\sum_{i=0}^{\infty}\pi_i\Bigr) + z\Bigl(\beta-\sum_{i=0}^{\infty} i\pi_i\Bigr),
\]

where $z \doteq \log\rho$ and $y \doteq \log C + (1-\rho)\beta + 1$.

Define $\pi^*_i \doteq \gamma_i(\beta)$, where the latter are determined as in the proof of Theorem A.1 with the given $C,\rho$. For any $i>I$, the definitions of $\pi^*_i$, $z$ and $y$ and the strict convexity of $x\log x$ imply that $\pi^*_i$ is the unique global minimizer of $x\to x\log x - x[\log P_i(\beta)+y+iz]$. Therefore, for any $i>I$ and $\pi_i\in[0,\infty)$,

\[
\pi_i\log\frac{\pi_i}{P_i(\beta)} - y\pi_i - zi\pi_i \;\ge\; \pi^*_i\log\frac{\pi^*_i}{P_i(\beta)} - y\pi^*_i - zi\pi^*_i.
\]
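The pointwise minimization above can be checked numerically. The following is a minimal sketch, not part of the original argument: the values of $\beta$, $\rho$ and $C$ are arbitrary illustrative choices, and the helper names are hypothetical. It confirms that $x^* = CP_i(\rho\beta)$ gives a smaller value of $x\log x - x[\log P_i(\beta)+y+iz]$ than several rescaled competitors.

```python
import math

def log_poisson_pmf(i, lam):
    """Log of the Poisson(lam) probability of i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def pointwise_objective(x, i, beta, y, z):
    """x log x - x [log P_i(beta) + y + i z], with the convention 0 log 0 = 0."""
    if x == 0.0:
        return 0.0
    return x * math.log(x) - x * (log_poisson_pmf(i, beta) + y + i * z)

def twisted_minimizer(i, beta, rho, C):
    """Candidate minimizer C * P_i(rho * beta)."""
    return C * math.exp(log_poisson_pmf(i, rho * beta))

def is_pointwise_minimum(i, beta, rho, C):
    """Compare the candidate against rescaled competitors s * x_star."""
    z = math.log(rho)
    y = math.log(C) + (1.0 - rho) * beta + 1.0
    x_star = twisted_minimizer(i, beta, rho, C)
    f_star = pointwise_objective(x_star, i, beta, y, z)
    scales = (0.0, 0.5, 0.9, 1.1, 2.0, 10.0)
    return all(pointwise_objective(s * x_star, i, beta, y, z) > f_star
               for s in scales)

# Illustrative (assumed) parameters: beta = 1.0, rho = 1.5, C = 0.5, I = 1.
print(all(is_pointwise_minimum(i, 1.0, 1.5, 0.5) for i in range(2, 12)))
```

The check works because, with these $y$ and $z$, the bracket $\log P_i(\beta)+y+iz$ equals $\log x^* + 1$, so the objective at $sx^*$ exceeds the minimum by $x^*(s\log s - s + 1)\ge 0$.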

Following standard Lagrangian arguments, we thus have

\[
\begin{aligned}
\inf_{\pi\in F(1,\omega,\beta)} D(\pi\|P(\beta)) &= \inf_{\pi\in F(1,\omega,\beta)} L(\pi;y,z)\\
&\ge \inf_{\pi\in\mathbb{R}_+^{\infty}} L(\pi;y,z)\\
&= D(\pi^*\|P(\beta)).
\end{aligned}
\]

Since $\pi^*\in F(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

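Substituting the particular form of $\pi^*_i$ into (A.17), we have

\[
J(\gamma) = \sum_{i=0}^{\infty}\gamma_i(\beta)\log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\|P(\beta)),
\]

where

\[
\gamma_i(\beta) =
\begin{cases}
\omega_i, & 0\le i\le I,\\
CP_i(\rho\beta), & i>I.
\end{cases}
\]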

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[
J(\gamma) = \sum_{i=0}^{I}\omega_i\log\frac{\omega_i}{P_i(\beta)} + \Bigl(1-\sum_{i=0}^{I}\omega_i\Bigr)\bigl(\log C + (1-\rho)\beta\bigr) + \Bigl(\beta-\sum_{i=0}^{I} i\omega_i\Bigr)\log\rho,
\]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.
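As a sanity check on the explicit cost formula, the sketch below compares it with a direct evaluation of the relative entropy. It rests on assumptions that are not in the original: $I=1$, the values of $\beta$, $\rho$ and $C$ are picked by hand, and $\omega_0,\omega_1$ are then back-solved so that the distribution has unit mass and mean $\beta$ (rather than obtaining $C$ from (2.6)). The infinite tail is truncated at a hypothetical cutoff `n_terms`.

```python
import math

def log_poisson_pmf(i, lam):
    """Log of the Poisson(lam) probability of i."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def relative_entropy(pi, beta):
    """D(pi || P(beta)) for a finitely supported sequence pi[0], pi[1], ..."""
    return sum(p * (math.log(p) - log_poisson_pmf(i, beta))
               for i, p in enumerate(pi) if p > 0.0)

def closed_form_cost(omega, C, rho, beta):
    """Explicit exponential-extremal cost in terms of rho, omega, beta."""
    head = sum(w * (math.log(w) - log_poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0.0)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return (head + tail_mass * (math.log(C) + (1.0 - rho) * beta)
            + tail_mean * math.log(rho))

def build_example(beta=1.0, rho=1.5, C=0.5, n_terms=120):
    """Tail C * P_i(rho*beta) for i > 1; omega_0, omega_1 fixed by mass and mean."""
    indices = range(2, n_terms)
    tail = [C * math.exp(log_poisson_pmf(i, rho * beta)) for i in indices]
    tail_mass = sum(tail)
    tail_mean = sum(i * p for i, p in zip(indices, tail))
    omega1 = beta - tail_mean
    omega0 = 1.0 - tail_mass - omega1
    return [omega0, omega1], tail

omega, tail = build_example()
direct = relative_entropy(omega + tail, 1.0)
explicit = closed_form_cost(omega, 0.5, 1.5, 1.0)
print(abs(direct - explicit) < 1e-9)
```

The two values agree because, for $i>I$, $\log(\pi^*_i/P_i(\beta)) = \log C + (1-\rho)\beta + i\log\rho$, which is linear in $i$.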

A.7. Characterization of the extremals under general initial conditions. We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k>0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0\in\mathcal{K}$), and we may take $K=\max\mathcal{K}$.

In the case $K\ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta=\sum_{k\in\mathcal{K}}\alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)}=\{\gamma_{ki}\}_i$, where the function $\gamma_{ki}:[0,\beta_k]\to[0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[
\gamma_i(x) = \sum_{k\in\mathcal{K}}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta).
\]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1 with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

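The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[
\min_{\pi\in S_\infty^{|\mathcal{K}|}}\ \sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|P(\beta)), \tag{A.20}
\]

subject to the conservation constraint

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\pi_{kj} = \beta, \tag{A.21}
\]

and the terminal conditions

\[
\omega_i = \sum_{k\le i,\,k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\qquad\text{for all } 0\le i\le I. \tag{A.22}
\]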

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k\in\mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+}=0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I}\alpha_j=\sum_{j=0}^{I}\omega_j=1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that

\[
\pi_{kj}=0,\qquad k+j>I, \tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi=(\pi^{(0)},\ldots,\pi^{(k)},\ldots,\pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k>0$. Including all indices $0\le k\le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k=0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^*\in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi\in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega\in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega\in S_{\tilde I}$, where $\tilde I>I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{00}=\omega_0/\alpha_0$. The first monotonicity condition ($i=0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{01}$, and mass from $\pi^{(1)}$ is applied to $\pi_{10}$, as necessary until the constraint $\alpha_0\pi_{01}+\alpha_1\pi_{10}=\omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i=1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
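The ordered filling construction can be sketched in a few lines. The version below is illustrative only: the text does not say in which order mass is drawn from the classes $k\le i$ at level $i$, so drawing from the lowest $k$ first is an assumption, and the function name, tolerance and example values are hypothetical. Inputs are polynomial constraints with $\sum_i\omega_i=1$.

```python
def ordered_filling(alpha, omega, tol=1e-12):
    """Build a feasible pi[(k, j)] for polynomial constraints (alpha, omega).

    alpha[k]: initial fraction of urns holding k balls.
    omega[i]: terminal fraction of urns holding i balls (sums to 1).
    Returns a dict mapping (k, j) to pi_kj, where j counts additional balls.
    """
    # Unassigned probability mass in each class distribution pi^(k).
    remaining = [1.0 if a > 0.0 else 0.0 for a in alpha]
    pi = {}
    for i, need in enumerate(omega):
        # Assumed order: draw from the lowest-occupancy classes first.
        for k in range(min(i, len(alpha) - 1) + 1):
            if alpha[k] <= 0.0:
                continue
            take = min(need, alpha[k] * remaining[k])  # in units of urn fraction
            if take <= tol:
                continue
            pi[(k, i - k)] = take / alpha[k]
            remaining[k] = max(0.0, remaining[k] - take / alpha[k])
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition fails at level %d" % i)
    return pi

# Hypothetical example: half the urns start empty, half start with one ball.
alpha = [0.5, 0.5]
omega = [0.2, 0.3, 0.5]
pi = ordered_filling(alpha, omega)
for key in sorted(pi):
    print(key, round(pi[key], 6))
```

In this example the terminal conditions (A.22) and the normalization of each $\pi^{(k)}$ hold by construction, and the resulting mean $\sum_k\alpha_k\sum_j j\pi_{kj}$ equals $\sum_i i\omega_i-\sum_k k\alpha_k = 0.8$.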

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P,Q$ in $S_\infty$, this distance is simply

\[
\sum_{i:\,P_i>Q_i}(P_i-Q_i) = \sum_{i:\,Q_i>P_i}(Q_i-P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A<\infty$, define the set

\[
H_A \doteq \Bigl\{\pi\in S_\infty^{|\mathcal{K}|}:\ \sum_k\alpha_k D(\pi^{(k)}\|P(\beta))\le A\Bigr\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha,\omega,\beta) \doteq F(\alpha,\omega,\beta)\cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify for each convergent sequence of distributions $\pi^{(n)}\in Q_A$ that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show for any sequence in $Q_A$ and for an arbitrary $k$ that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon>0$ there is $m<\infty$ such that

\[
\sum_{j\ge m} j\pi^{(n)}_{kj}\le\varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^y-1=\sup_{x>0}[xy-x\log x+x-1]$, we have the inequality

\[
xy\le(x\log x-x+1)+(e^y-1),
\]

where by convention $0\log 0=0$. Observe that this inequality is valid for all $y$ and all $x\ge 0$, and that $x\log x-x+1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta>0$,

\[
\sum_{j=0}^{\infty}e^{j/\delta}P_j(\beta) = \sum_{j=0}^{\infty}e^{j/\delta}\,\frac{e^{-\beta}\beta^j}{j!} = e^{\beta e^{1/\delta}-\beta}<\infty.
\]
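Both facts used here are easy to confirm numerically. The sketch below is a hedged illustration: the grid, the values of $\beta$ and $\delta$, and the truncation length `n_terms` are arbitrary assumptions, and the helper names are hypothetical. It evaluates the gap in the inequality $xy\le(x\log x-x+1)+(e^y-1)$ and compares a truncated Poisson exponential moment with the closed form $e^{\beta e^{1/\delta}-\beta}$.

```python
import math

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with 0 log 0 = 0; should be >= 0."""
    xlogx = x * math.log(x) if x > 0.0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(beta, delta, n_terms=400):
    """Truncated sum of e^{j/delta} P_j(beta), each term computed in log space."""
    total = 0.0
    for j in range(n_terms):
        log_term = j / delta - beta + j * math.log(beta) - math.lgamma(j + 1)
        total += math.exp(log_term)
    return total

# Smallest gap over an illustrative grid of (x, y) values.
grid_x = [0.1 * a for a in range(0, 51)]
grid_y = [-3.0 + 0.1 * b for b in range(0, 61)]
smallest_gap = min(young_gap(x, y) for x in grid_x for y in grid_y)
print(smallest_gap >= -1e-12)

beta, delta = 0.8, 2.0
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
print(abs(poisson_exp_moment(beta, delta) - closed_form) < 1e-9)
```

Equality in the inequality holds exactly at $x=e^y$, which is why the grid minimum is compared against a small negative tolerance rather than zero.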

Taking $y=j/\delta$ and $x\doteq\pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j\ge m} j\pi^{(n)}_{kj} &= \sum_{j\ge m}\delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\,P_j(\beta)\\
&\le \sum_{j\ge 0}\delta\Bigl[\pi^{(n)}_{kj}\log\Bigl(\frac{\pi^{(n)}_{kj}}{P_j(\beta)}\Bigr)-\pi^{(n)}_{kj}+P_j(\beta)\Bigr] + \delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta)\\
&= \delta D(\pi^{(n)}_{(k)}\|P(\beta)) + \delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta)\\
&\le \delta A + \delta\sum_{j\ge m}(e^{j/\delta}-1)P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta>0$ sufficiently small and then $m<\infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)}\to\beta\in(0,\infty)$.]

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k\to\beta_k.
\]

Finally,

\[
\sum_k\alpha_k\beta_k = \lim_n\sum_k\alpha_k\beta^{(n)}_k = \lim_n\beta = \beta,
\]

so that $\pi\in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi\in Q_A}\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(k)}\|P(\beta))\ge 0.
\]

Choose a sequence $\pi^{(n)}\in Q_A$ such that $\sum_{k=0}^{K}\alpha_k D(\pi^{(n)}_{(k)}\|P(\beta))\le G+1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^*\in Q_A$ be the limit point. Since relative entropy $D(\pi\|\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{*(k)}\|P(\beta)) \le \liminf_n\sum_{k\in\mathcal{K}}\alpha_k D(\pi^{(n)}_{(k)}\|P(\beta)) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)}=\mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[
(\alpha^{(n)},\omega^{(n)},\beta^{(n)})\to(\alpha,\omega,\beta)
\]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0<\beta<\infty$. Then

\[
\liminf_n\mathcal{J}^{(n)}\ge\mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A>\liminf_n\mathcal{J}^{(n)}\doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)}\le G+1/n<A,
\]

and for each problem in this subsequence, we define $\pi^{(n)}\in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)}\to\pi\in H_A$. As in the proof of Lemma A.6, if $\beta<\infty$, then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\pi^{(n)}_{kj}\ \to\ \beta_k \doteq \sum_{j=0}^{\infty} j\pi_{kj},\qquad k=0,1,\ldots,K.
\]

If $\alpha^{(n)}_k\to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k\not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\|P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\|P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta-\lambda+\lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k\to\alpha_k\beta_k$, even if $\alpha_k=0$.
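The value $\beta-\lambda+\lambda\log(\lambda/\beta)$ is the relative entropy of a Poisson($\lambda$) distribution with respect to $P(\beta)$, which can be checked directly. The sketch below is illustrative only: $\lambda=3$, $\beta=1$, the truncation length and the point-mass competitor are assumptions chosen for the example, and the helper names are hypothetical.

```python
import math

def log_poisson_pmf(j, lam):
    """Log of the Poisson(lam) probability of j."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def relative_entropy(pi, beta):
    """D(pi || P(beta)) for a finitely supported sequence, in log space."""
    return sum(p * (math.log(p) - log_poisson_pmf(j, beta))
               for j, p in enumerate(pi) if p > 0.0)

def mean_constrained_value(lam, beta):
    """Stated infimum of D(pi || P(beta)) over distributions with mean lam."""
    return beta - lam + lam * math.log(lam / beta)

lam, beta = 3.0, 1.0
poisson_lam = [math.exp(log_poisson_pmf(j, lam)) for j in range(200)]
point_mass = [0.0, 0.0, 0.0, 1.0]  # another distribution with mean 3

print(abs(relative_entropy(poisson_lam, beta) - mean_constrained_value(lam, beta)) < 1e-9)
print(relative_entropy(point_mass, beta) > mean_constrained_value(lam, beta))
```

For fixed $\beta$, the value grows like $\lambda\log\lambda$ as $\lambda\to\infty$, which is why a vanishing weight $\alpha^{(n)}_k$ with $\alpha^{(n)}_k\beta^{(n)}_k$ bounded away from zero forces the weighted term to diverge.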

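Thus

\[
\sum_{k\in\mathcal{K}}\beta_k\alpha_k = \lim_n\sum_k\beta^{(n)}_k\alpha^{(n)}_k = \lim_n\beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_n\sum_{k=0}^{K}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\|P(\beta^{(n)})) &\ge \liminf_n\sum_{k:\,\alpha_k>0}\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\|P(\beta^{(n)}))\\
&\ge \sum_{k:\,\alpha_k>0}\alpha_k D(\pi^{(k)}\|P(\beta))\\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned}
\]

as desired.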

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i\in\mathcal{I}(\omega)$ if $0\le i\le I$ and $\omega_i>0$, or if $i>I$ and $\omega_{I+}>0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints, the optimal $\pi_{kj}$ with $k+j\in\mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^*\in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj}>0$ for all $k+j\in\mathcal{I}(\omega)$, and $\pi^*_{kj}=0$ if $k+j\notin\mathcal{I}(\omega)$.

Proof. For $k+j\notin\mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi\in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn}=0$ for some particular $(m,n)$ with $m+n\in\mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\tilde\pi_{mn}>0$, that $\tilde\pi_{kj}=0$ for all $k+j\notin\mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i}\alpha_k\tilde\pi_{k,i-k},\qquad \tilde\beta \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty} j\tilde\pi_{kj},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha,\tilde\omega,\tilde\beta)$. Next, for any $\eta>0$ we may define

\[
\hat\beta \doteq (\beta-\eta\tilde\beta)/(1-\eta),\qquad \hat\omega \doteq (\omega-\eta\tilde\omega)/(1-\eta).
\]

By construction, the constraints $(\alpha,\hat\omega,\hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\tilde\omega,\tilde\beta)$, and hence for $(\alpha,\hat\omega,\hat\beta)$.

We may now let $\hat\pi$ be a feasible point for $(\alpha,\hat\omega,\hat\beta)$, and finally form $\bar\pi=\eta\tilde\pi+(1-\eta)\hat\pi$. By construction, $\bar\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\bar\pi_{mn}>0$. As discussed in the proof of Lemma A.6, we may take $\hat\pi$ to have finite support, so that $\bar\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon}=\varepsilon\bar\pi+(1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0\le\varepsilon\le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\Bigl(\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}+1\Bigr)(\bar\pi_{kj}-\pi_{kj})\\
&= \sum_{k\in\mathcal{K}}\alpha_k\sum_{j=0}^{\infty}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}\,(\bar\pi_{kj}-\pi_{kj})\\
&= \alpha_m\bar\pi_{mn}\log\frac{\pi^{\varepsilon}_{mn}}{P_n(\beta)} + \sum_{k\in\mathcal{K}}\alpha_k\sum_{(k,j)\ne(m,n)}\bar\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)} - \sum_{k\in\mathcal{K}}\alpha_k\sum_{j}\pi_{kj}\log\frac{\pi^{\varepsilon}_{kj}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon\to 0$, the third term in the last display tends to

\[
-\sum_k\alpha_k D(\pi^{(k)}\|P(\beta))\le 0.
\]

The second term is bounded above by the expression $-\sum_k\alpha_k\sum_j\bar\pi_{kj}\log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta)<f(0)$ for some sufficiently small $\delta>0$. We have shown that for any $m+n\in\mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn}=0$. As $\pi_{mn}$ is arbitrary within $m+n\in\mathcal{I}(\omega)$, it follows that $\pi^*_{mn}>0$ whenever $m+n\in\mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_kP_j(\beta)W_{k+j}, & k\in\mathcal{K},\ j+k\in\mathcal{I}(\omega),\\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k\in\mathcal{K}$ and $k+j\in\mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 133). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k\in\mathcal{K}}\alpha_k\sum_{k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} + \sum_{k\in\mathcal{K}}z_k\alpha_k\Bigl(1-\sum_{k+l\in\mathcal{I}}\pi_{kl}\Bigr) + \sum_{i\in\mathcal{I}}w_i\Bigl(\omega_i-\sum_{k=0}^{i}\alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging, the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)e^{z_k-1+w_{k+j}}.
\]

The result follows on defining $D_k=e^{z_k-1}$ and $W_i=e^{w_i}$.
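A small worked instance may help make the product form concrete. The sketch below is an illustration under stated assumptions, not the construction used in the proof: it takes hypothetical polynomial constraints $\alpha=(0.5,0.5)$, $\omega=(0.2,0.3,0.5)$ (so $I=2$ and $\beta=0.8$), for which the feasible set reduces to a one-parameter family indexed by $t=\pi_{02}$, minimizes (A.20) over $t$ by a simple ternary search, and checks the cross-ratio identity $(\pi_{01}/P_1)(\pi_{11}/P_1)=(\pi_{10}/P_0)(\pi_{02}/P_2)$ implied by $\pi^*_{kj}=D_kP_j(\beta)W_{k+j}$.

```python
import math

ALPHA = [0.5, 0.5]   # assumed initial occupancy fractions
BETA = 0.8           # sum_i i*omega_i - sum_k k*alpha_k for omega = (0.2, 0.3, 0.5)

def log_poisson_pmf(j, lam):
    """Log of the Poisson(lam) probability of j."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def feasible_point(t):
    """All feasible pi for this example, parametrized by t = pi_02 in (0, 0.6)."""
    return {(0, 0): 0.4, (0, 1): 0.6 - t, (0, 2): t, (1, 0): t, (1, 1): 1.0 - t}

def objective(t):
    """Weighted relative entropy sum_k alpha_k D(pi^(k) || P(beta))."""
    return sum(ALPHA[k] * p * (math.log(p) - log_poisson_pmf(j, BETA))
               for (k, j), p in feasible_point(t).items() if p > 0.0)

def minimize_convex(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

def log_cross_ratio(t):
    """Zero exactly when pi has the product form D_k * P_j(beta) * W_{k+j}."""
    pi = feasible_point(t)
    r = {key: math.log(p) - log_poisson_pmf(key[1], BETA) for key, p in pi.items()}
    return (r[(0, 1)] + r[(1, 1)]) - (r[(1, 0)] + r[(0, 2)])

t_star = minimize_convex(objective, 1e-9, 0.6 - 1e-9)
print(round(t_star, 4), abs(log_cross_ratio(t_star)) < 1e-5)
```

For these constraints the stationarity condition reduces to $t^2+1.6t-0.6=0$, so the search should land near $t=(-1.6+\sqrt{4.96})/2$.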

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_kP_j(\rho\beta)W_{k+j}, & k\in\mathcal{K},\ k+j\le I,\ k+j\in\mathcal{I},\\
C_kP_j(\rho\beta), & k\in\mathcal{K},\ k+j>I,\\
0, & \text{otherwise.}
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M>I$. For each such $M$, define

\[
\beta^{(M)} \doteq \sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M} l\pi^*_{kl},\qquad \eta^{(M)}_k \doteq \sum_{l\le M}\pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\,k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\begin{aligned}
\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\,k+l\in\mathcal{I}} l\pi_{kl} &= \beta^{(M)},\\
\sum_{l\le M,\,k+l\in\mathcal{I}}\pi_{kl} &= \eta^{(M)}_k,\qquad k\in\mathcal{K},\\
\sum_{k\le i,\,k\in\mathcal{K}}\alpha_k\pi_{k,i-k} &= \omega_i,\qquad i\in\mathcal{I}.
\end{aligned}
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 133. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} ={}& \sum_{k\in\mathcal{K}}\alpha_k\sum_{j\le M,\,k+j\in\mathcal{I}}\pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)} + y^{(M)}\Bigl(\beta^{(M)}-\sum_{k\in\mathcal{K}}\alpha_k\sum_{l\le M,\,k+l\in\mathcal{I}} l\pi_{kl}\Bigr)\\
&+ \sum_{k\in\mathcal{K}}z^{(M)}_k\alpha_k\Bigl(\eta^{(M)}_k-\sum_{l\le M,\,k+l\in\mathcal{I}}\pi_{kl}\Bigr) + \sum_{i\in\mathcal{I}}w^{(M)}_i\Bigl(\omega_i-\sum_{k\le i,\,k\in\mathcal{K}}\alpha_k\pi_{k,i-k}\Bigr).
\end{aligned}
\]

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\,e^{jy^{(M)}-1+z^{(M)}_k+w^{(M)}_{k+j}}
\]

for all $j<M$ and $k+j\in\mathcal{I}$, where for convenience we have defined $w_i=0$ for $i>I$. For $k+j>I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j}=0$ if $k+j>I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j\in\mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho=e^y$, $C_k=e^{(\rho-1)\beta-1+z_k}$ and $W_i=e^{w_i}$.
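The final substitution can be verified directly: with $\rho=e^y$ and $C_k=e^{(\rho-1)\beta-1+z_k}$, the multiplier form $P_j(\beta)e^{jy-1+z_k}$ coincides with $C_kP_j(\rho\beta)$. The sketch below checks this numerically for arbitrary illustrative multiplier values (the numbers and helper names are assumptions, not values from the paper), taking $W_{k+j}=1$, as in the tail $k+j>I$.

```python
import math

def log_poisson_pmf(j, lam):
    """Log of the Poisson(lam) probability of j."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def multiplier_form(j, beta, y, z_k):
    """P_j(beta) * exp(j*y - 1 + z_k), the tail form before substitution."""
    return math.exp(log_poisson_pmf(j, beta) + j * y - 1.0 + z_k)

def twisted_form(j, beta, y, z_k):
    """C_k * P_j(rho*beta) with rho = e^y and C_k = exp((rho-1)*beta - 1 + z_k)."""
    rho = math.exp(y)
    c_k = math.exp((rho - 1.0) * beta - 1.0 + z_k)
    return c_k * math.exp(log_poisson_pmf(j, rho * beta))

# Illustrative (assumed) multipliers.
beta, y, z_k = 0.8, 0.3, 0.25
print(all(math.isclose(multiplier_form(j, beta, y, z_k),
                       twisted_form(j, beta, y, z_k), rel_tol=1e-12)
          for j in range(15)))
```

The identity is just $e^{-\beta}\beta^j\rho^j/j! = e^{(\rho-1)\beta}\,e^{-\rho\beta}(\rho\beta)^j/j!$, so the factor $e^{(\rho-1)\beta}$ absorbed into $C_k$ accounts for the change of Poisson parameter from $\beta$ to $\rho\beta$.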

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k\doteq\sum_j j\pi^*_{kj}$. In the exponential case, we define $\rho_k=\rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj}\doteq\pi^*_{kj}$ and $\omega^{(k)}=(\omega_{k0},\ldots,\omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial with $C_k=0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k\doteq\sum_j j\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\gamma_{k,i-k}(x\beta_k/\beta),\qquad i=0,\ldots,I,\\
\gamma_{I+}(x) &= 1-\sum_{i=0}^{I}\gamma_i(x)
\end{aligned}
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i>I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{kj}(x)\doteq\gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i}\alpha_k\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty}\gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty}-\dot\psi_i(x) = \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}-\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K}\alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0)=1$ and $\gamma_{kj}(0)=0$ for $j>0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k)=\pi^*_{kj}$ satisfy the constraints (A.22).

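In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k=\rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\tilde\psi_{k0}(x) &= C_ke^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho_k\beta_k)\bigr)\Bigl(1-\frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i}\\
&= C_ke^{-\rho x} + \sum_{i=0}^{I-k}\bigl(\omega_{ki}-C_kP_i(\rho\beta)\bigr)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\\
&= C_k\Bigl[e^{-\rho x} + \sum_{i=0}^{I-k}(W_{k+i}-1)P_i(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{i}\Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki}=\pi^*_{ki}=C_kP_i(\rho\beta)W_{k+i}$ when $i\le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0=\alpha_0\tilde\psi_{00}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x) &= \alpha_0C_0\Bigl[\rho^ke^{-\rho x} + \sum_{i=k}^{I}(W_i-1)P_i(\rho\beta)\frac{i!}{(i-k)!\,\beta^k}\Bigl(1-\frac{x}{\beta}\Bigr)^{i-k}\Bigr]\\
&= \alpha_0C_0\rho^k\Bigl[e^{-\rho x} + \sum_{j=0}^{I-k}(W_{k+j}-1)P_j(\rho\beta)\Bigl(1-\frac{x}{\beta}\Bigr)^{j}\Bigr]\\
&= \frac{\alpha_0C_0\rho^k}{C_k}\,\tilde\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]

The second equality uses $P_{j+k}(\rho\beta)\,(j+k)!/(j!\,\beta^k)=\rho^kP_j(\rho\beta)$.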

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i}\alpha_k\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}(-1)^{i-k}\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k}\frac{d^{i-k}}{dx^{i-k}}\tilde\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i}\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i}\alpha_k\frac{\beta_k}{\beta}\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i}\alpha_k\frac{x^{i-k}}{(i-k)!}(-1)^{i-k+1}\frac{d^{i-k+1}}{dx^{i-k+1}}\tilde\psi_{k0}(x)\\
&= \sum_{k=0}^{i}\frac{\alpha_kC_k}{\alpha_0C_0\rho^k}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i>I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1-\psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0D_0}{D_k}\,\tilde\psi_{k0}(x), \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i}\frac{\alpha_kD_k}{\alpha_0D_0}\frac{x^{i-k}}{(i-k)!}(-1)^{i+1}\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]

for $i=0,\ldots,I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i=0,\ldots,I-1$ (and also $i=I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'}\frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0>0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x)>0$ for $x\in[0,\beta)$, $i\le I$ (and $i=I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\Bigl[\sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\Bigr] - \sum_{k=0}^{K}\alpha_k\log|\psi_0^{(k)}(0)|\\
&= \sum_{k=0}^{K}\alpha_k\sum_{j=0}^{\infty}\tilde\gamma_{kj}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\tilde\psi_{k0}(0)=1$ together with (A.25) implies that $\psi_0^{(k)}(x)=\psi_0^{(k)}(0)\times\tilde\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^je^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K}\alpha_kD(\tilde\gamma_{k\cdot}(\beta)\|P(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).

A.8. Extremal curves have globally minimal cost. In this section, we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta)\doteq\{\gamma\in\mathcal{O}:\gamma(0)=\alpha,\ \gamma(\beta)=\omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.
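The joint convexity of $(\theta,\gamma)\to D(\theta\|\gamma)$ that underlies Lemma A.15 can be spot-checked numerically. The following sketch is only an illustration: it draws random pairs of probability vectors with a fixed seed (the dimension, seed, sample count and helper names are arbitrary assumptions) and confirms that the relative entropy of a mixture never exceeds the mixture of the relative entropies.

```python
import math
import random

def rel_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i log(theta_i / gamma_i), with 0 log 0 = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0.0)

def random_simplex_point(rng, n):
    """Random strictly positive probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def convexity_gap(rng, n=5):
    """Mixture of entropies minus entropy of mixture; nonnegative under convexity."""
    th0, th1, g0, g1 = (random_simplex_point(rng, n) for _ in range(4))
    lam = rng.random()
    mix = lambda a, b: [(1.0 - lam) * u + lam * v for u, v in zip(a, b)]
    lhs = rel_entropy(mix(th0, th1), mix(g0, g1))
    rhs = (1.0 - lam) * rel_entropy(th0, g0) + lam * rel_entropy(th1, g1)
    return rhs - lhs

rng = random.Random(0)
print(all(convexity_gap(rng) >= -1e-12 for _ in range(1000)))
```

Since $J$ integrates $D(\theta(x)\|\gamma(x))$ pointwise in $x$ and the map $\gamma\mapsto(\theta,\gamma)$ is linear, this pointwise convexity carries over to $J$ on the convex set $\mathcal{O}(\alpha,\omega,\beta)$.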

For a given pair of occupancy functions $\gamma,\bar\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon}\doteq(1-\varepsilon)\gamma+\varepsilon\bar\gamma$ and define the function $G:[0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta}D(\theta^{\varepsilon}(x)\|\gamma^{\varepsilon}(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\bar\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K=I+1$ in the exponential case, or $K=I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma)\le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon}=(1-\varepsilon)\gamma+\varepsilon\bar\gamma$ with $G[\varepsilon]=J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0]=0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma)=\int_0^{\beta}L(\psi,\theta)\,dx$. After defining

\[
\eta(x) \doteq \bar\psi(x)-\psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta}L(\psi+\varepsilon\eta,\ \theta-\varepsilon\dot\eta)\,dx = \int_0^{\beta}g(x,\varepsilon)\,dx. \tag{A.30}
\]

We can assume without loss that $G(1)=J(\bar\gamma)<\infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x)=-\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta)=\omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta)=\omega_1/\beta$ if $I>0$, and $\theta_0(\beta)=CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\tilde\gamma_{k0}(x)$ and $\tilde\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i}\alpha_k\tilde\gamma_{k,i-k}(x)\ge\alpha_i\tilde\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1}>0$. The identity $\theta_{I+}(x)=\rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon}\ge(1-\varepsilon)\gamma$ and $\theta^{\varepsilon}\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0<1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

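The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^{\varepsilon}(x)} \eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^{\varepsilon}(x)} \dot\eta_i(x). \]

The partial derivatives of $L$ are (in mixed notation)

\[ \frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \qquad \frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}. \]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left| \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) \right| < B \quad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.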

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx \]

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

\[ \int_0^{x} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma} dx' = \int_0^{x} \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\bigg|_{\gamma} dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[ \begin{aligned}
G'_+[0] &= \int_0^{\beta} \left( \frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x) \right) dx \\
&= \int_0^{\beta} \dot\eta(x)\cdot\left( -\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \right) dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\,(\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned} \]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1-\lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

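Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx. \]

It then follows that

\[ \begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_{n} \mathcal{J}\big(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\big) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x)\,\|\,\tilde\gamma(x))\,dx \\
&= J(\tilde\gamma),
\end{aligned} \]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.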

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Léonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithms 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


Following standard Lagrangian arguments, we thus have

\[ \begin{aligned}
\inf_{\pi \in \mathcal{F}(1,\omega,\beta)} D(\pi\,\|\,P(\beta)) &= \inf_{\pi \in \mathcal{F}(1,\omega,\beta)} L(\pi, y, z) \\
&\ge \inf_{\pi \in \mathbb{R}_+^{\infty}} L(\pi, y, z) \\
&= D(\pi^*\,\|\,P(\beta)).
\end{aligned} \]

Since $\pi^* \in \mathcal{F}(1,\omega,\beta)$, it follows that $\pi^*$, the terminal distribution, is the minimizer of the relative entropy. The uniqueness of $\pi^*$ follows from the strict convexity of the relative entropy with respect to its first argument.

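Substituting the particular form of $\pi_i^*$ into (A.17), we have

\[ J(\gamma) = \sum_{i=0}^{\infty} \gamma_i(\beta) \log\frac{\gamma_i(\beta)}{\beta^i e^{-\beta}/i!} = D(\pi^*\,\|\,P(\beta)), \]

where

\[ \gamma_i(\beta) = \begin{cases} \omega_i, & 0 \le i \le I, \\ C P_i(\rho\beta), & i > I. \end{cases} \]

The cost for an exponential extremal can be written explicitly in terms of $\rho$, $\omega$ and $\beta$ as

\[ J(\gamma) = \sum_{i=0}^{I} \omega_i \log\frac{\omega_i}{P_i(\beta)} + \left( 1 - \sum_{i=0}^{I} \omega_i \right) (\log C + (1-\rho)\beta) + \left( \beta - \sum_{i=0}^{I} i\,\omega_i \right) \log\rho, \]

where $C$ is defined by (2.6). In the polynomial case, the cost is simply given by the first term in the above expression.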
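As a numerical sanity check of the explicit cost formula, the following minimal sketch compares it with a direct (truncated) evaluation of the relative entropy sum. The helper `example_constraints` and the parameter values are illustrative assumptions, not taken from the paper: it builds a consistent pair $(\omega_0,\omega_1)$ for $I = 1$ from chosen values of $C$, $\rho$ and $\beta$ instead of solving the fixed point problem (2.6).

```python
import math


def poisson_pmf(i, mean):
    """P_i(mean) = exp(-mean) * mean**i / i!, evaluated through logarithms."""
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))


def example_constraints(C, rho, beta):
    """Hypothetical helper: choose (omega_0, omega_1) for I = 1 so that the
    terminal distribution (omega_0, omega_1, C*P_2(rho*beta), ...) has total
    mass 1 and mean beta."""
    m = rho * beta
    tail_mass = C * (1.0 - poisson_pmf(0, m) - poisson_pmf(1, m))
    tail_mean = C * (m - poisson_pmf(1, m))
    omega1 = beta - tail_mean
    omega0 = 1.0 - tail_mass - omega1
    return [omega0, omega1]


def cost_closed_form(omega, C, rho, beta):
    """Three-term expression for the cost of an exponential extremal."""
    head = sum(w * math.log(w / poisson_pmf(i, beta))
               for i, w in enumerate(omega) if w > 0)
    tail_mass = 1.0 - sum(omega)
    tail_mean = beta - sum(i * w for i, w in enumerate(omega))
    return (head + tail_mass * (math.log(C) + (1.0 - rho) * beta)
            + tail_mean * math.log(rho))


def cost_direct(omega, C, rho, beta, terms=80):
    """Truncated sum of gamma_i(beta) * log(gamma_i(beta) / P_i(beta))."""
    total = 0.0
    for i in range(terms):
        g = omega[i] if i < len(omega) else C * poisson_pmf(i, rho * beta)
        if g > 0:
            total += g * math.log(g / poisson_pmf(i, beta))
    return total


C, rho, beta = 0.9, 1.2, 1.0          # illustrative values only
omega = example_constraints(C, rho, beta)
closed = cost_closed_form(omega, C, rho, beta)
direct = cost_direct(omega, C, rho, beta)
print(omega, closed, direct)
```

The two evaluations should agree up to truncation and rounding error, because for $i > I$ the log ratio $\log(C P_i(\rho\beta)/P_i(\beta))$ equals $\log C + (1-\rho)\beta + i\log\rho$.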

A.7. Characterization of the extremals under general initial conditions.

We now generalize the solution to the Euler–Lagrange equations for the case when the urns are not all initially empty, but in which they may contain up to $K$ balls. Recall that the fraction of urns having $k$ balls initially is denoted $\alpha_k$, and the set of urn occupancies $k$ with $\alpha_k > 0$ is denoted $\mathcal{K}$. Without loss of generality, we may assume that some urns are initially empty (hence $0 \in \mathcal{K}$), and we may take $K = \max\mathcal{K}$.

In the case $K \ge 1$, we regard the urns as belonging to $|\mathcal{K}|$ classes according to their initial occupancies. We denote the final number of additional balls per urn entering the $k$th class by $\beta_k$, in which case the total number of additional balls per urn is $\beta = \sum_{k\in\mathcal{K}} \alpha_k\beta_k$. It turns out in the solution of the Euler–Lagrange equations that the fraction of balls entering the $k$th class in any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by

\[ \gamma_i(x) = \sum_{k\in\mathcal{K}} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta). \]

We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I-k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.

At this point we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1,\omega^{(k)},\beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be

\[ \min_{\pi \in \mathcal{S}_{\infty}^{|\mathcal{K}|}} \sum_{k\in\mathcal{K}} \alpha_k D(\pi_{(k)}\,\|\,P(\beta)) \tag{A.20} \]

subject to the conservation constraint

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\pi_{k,j} = \beta \tag{A.21} \]

and the terminal conditions

\[ \omega_i = \sum_{k \le i,\ k\in\mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22} \]

Recall that $\mathcal{S}_{\infty}^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi_{(k)}\}_{k\in\mathcal{K}}$, where each $\pi_{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition, $\sum_{j=0}^{I}\alpha_j = \sum_{j=0}^{I}\omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case the constraints imply that

\[ \pi_{k,j} = 0, \quad k + j > I, \tag{A.23} \]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi_{(0)}, \ldots, \pi_{(k)}, \ldots, \pi_{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi_{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in \mathcal{S}_{\infty}^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in \mathcal{S}_{\infty}^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in \mathcal{S}_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in \mathcal{S}_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha,\tilde\omega,\beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi_{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi_{(1)}$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi_{(0)}$ and $\pi_{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi_{(k)}$.
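The ordered filling construction can be sketched in a few lines. The version below is one possible greedy reading of the argument (lowest class first), under the assumptions that the constraints are polynomial and feasible and that every listed $\alpha_k$ is strictly positive. The function name `ordered_filling` and the numerical values are illustrative, not taken from the paper.

```python
def ordered_filling(alpha, omega):
    """Greedy feasible point for polynomial constraints.

    alpha[k]: fraction of urns starting with k balls (assumed > 0).
    omega[i]: required terminal fraction of urns holding i balls.
    Returns pi with pi[k][j] = fraction of class-k urns receiving j balls.
    """
    top = len(omega) - 1
    remaining = list(alpha)                      # unassigned mass per class
    pi = [[0.0] * (top - k + 1) for k in range(len(alpha))]
    for i, target in enumerate(omega):
        need = target
        for k in range(min(i, len(alpha) - 1) + 1):
            take = min(need, remaining[k])       # mass moved to level k + (i - k)
            pi[k][i - k] = take / alpha[k]
            remaining[k] -= take
            need -= take
        if need > 1e-12:
            raise ValueError("monotonicity condition fails at level %d" % i)
    return pi


alpha = [0.5, 0.5]            # half the urns start empty, half with one ball
omega = [0.2, 0.5, 0.3]       # terminal occupancy fractions, I = 2
pi = ordered_filling(alpha, omega)

row_sums = [sum(row) for row in pi]
levels = [sum(alpha[k] * pi[k][i - k] for k in range(min(i, 1) + 1))
          for i in range(3)]
mean_added = sum(alpha[k] * j * p for k, row in enumerate(pi)
                 for j, p in enumerate(row))
print(pi, row_sums, levels, mean_added)
```

In this example the number of added balls per urn is $\sum_i i\,\omega_i - \sum_k k\,\alpha_k = 0.6$, and the construction recovers it without imposing the conservation constraint separately, as the proof indicates.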

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $\mathcal{S}_{\infty}$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $\mathcal{S}_{\infty}$ this distance is simply

\[ \sum_{i:\,P_i > Q_i} (P_i - Q_i) = \sum_{i:\,Q_i > P_i} (Q_i - P_i), \]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set

\[ H_A = \left\{ \pi \in \mathcal{S}_{\infty}^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi_{(k)}\,\|\,P(\beta)) \le A \right\}. \]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[ Q_A(\alpha,\omega,\beta) = \mathcal{F}(\alpha,\omega,\beta) \cap H_A \]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi_{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[ \sum_{j \ge m} j\,\pi^{(n)}_{k,j} \le \varepsilon \tag{A.24} \]

for all $n$ and $k$.

Since $e^{y} - 1 = \sup_{x>0}[xy - x\log x + x - 1]$, we have the inequality

\[ xy \le (x\log x - x + 1) + (e^{y} - 1), \]

where by convention $0\log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[ \sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta} \frac{\beta^{j}}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty. \]
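Both facts are easy to check numerically. The sketch below evaluates the gap in the inequality $xy \le (x\log x - x + 1) + (e^{y} - 1)$ on a small grid and compares a truncated series for the Poisson exponential moment with its closed form. The function names, the grid and the values of $\delta$ and $\beta$ are illustrative choices.

```python
import math


def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with the convention 0 log 0 = 0."""
    entropy_term = 1.0 if x == 0 else x * math.log(x) - x + 1.0
    return entropy_term + (math.exp(y) - 1.0) - x * y


def exponential_moment(delta, beta, terms=200):
    """Truncated series sum_j e^{j/delta} P_j(beta), summed through logarithms."""
    return sum(math.exp(j / delta - beta + j * math.log(beta) - math.lgamma(j + 1))
               for j in range(terms))


grid_x = [0.0, 0.1, 0.5, 1.0, 2.0, 10.0]
grid_y = [-3.0, -1.0, 0.0, 0.5, 2.0]
smallest_gap = min(young_gap(x, y) for x in grid_x for y in grid_y)

delta, beta = 0.5, 1.0
series = exponential_moment(delta, beta)
closed = math.exp(beta * math.exp(1.0 / delta) - beta)
print(smallest_gap, series, closed)
```

The gap vanishes exactly when $x = e^{y}$, which is the maximizer in the supremum defining $e^{y} - 1$.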

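Taking $y = j/\delta$ and $x = \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate

\[ \begin{aligned}
\sum_{j \ge m} j\,\pi^{(n)}_{k,j} &= \sum_{j \ge m} \delta\,\frac{j}{\delta}\,\frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\,P_j(\beta) \\
&\le \sum_{j \ge 0} \delta \left[ \pi^{(n)}_{k,j} \log\left( \frac{\pi^{(n)}_{k,j}}{P_j(\beta)} \right) - \pi^{(n)}_{k,j} + P_j(\beta) \right] + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&= \delta\,D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta),
\end{aligned} \]

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]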

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[ \beta^{(n)}_k \to \beta_k. \]

Finally,

\[ \sum_{k} \alpha_k\beta_k = \lim_{n} \sum_{k} \alpha_k\beta^{(n)}_k = \lim_{n} \beta = \beta, \]

so that $\pi \in \mathcal{F}(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[ G = \inf_{\pi \in Q_A} \sum_{k\in\mathcal{K}} \alpha_k D(\pi_{(k)}\,\|\,P(\beta)) \ge 0. \]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\,\|\,\eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[ \sum_{k\in\mathcal{K}} \alpha_k D(\pi^*_{(k)}\,\|\,P(\beta)) \le \liminf_{n} \sum_{k\in\mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta)) = G, \]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ such that

\[ (\alpha^{(n)},\omega^{(n)},\beta^{(n)}) \to (\alpha,\omega,\beta) \]

componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then

\[ \liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta). \]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} = G$, we may choose a subsequence of problems such that

\[ \mathcal{J}^{(n)} \le G + 1/n < A, \]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)},\omega^{(n)},\beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$ then

\[ \beta^{(n)}_k = \sum_{j=0}^{\infty} j\,\pi^{(n)}_{k,j} \to \beta_k = \sum_{j=0}^{\infty} j\,\pi_{k,j}, \quad k = 0, 1, \ldots, K. \]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k\alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\,\|\,P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k\beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.

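Thus

\[ \sum_{k\in\mathcal{K}} \beta_k\alpha_k = \lim_{n} \sum_{k} \beta^{(n)}_k\alpha^{(n)}_k = \lim_{n} \beta^{(n)} = \beta, \]

and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[ \begin{aligned}
\liminf_{n} \sum_{k=0}^{K} \alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)})) &\ge \liminf_{n} \sum_{k:\,\alpha_k > 0} \alpha^{(n)}_k D(\pi^{(n)}_{(k)}\,\|\,P(\beta^{(n)})) \\
&\ge \sum_{k:\,\alpha_k > 0} \alpha_k D(\pi_{(k)}\,\|\,P(\beta)) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{aligned} \]

as desired.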

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in \mathcal{S}_{\infty}^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in \mathcal{S}_{\infty}^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $\mathcal{S}_{\infty}^{|\mathcal{K}|}$, subject to the restrictions that $\hat\pi_{m,n} > 0$, that $\hat\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define

\[ \hat\omega_i = \sum_{k=0}^{i} \alpha_k \hat\pi_{k,i-k}, \qquad \hat\beta = \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\,\hat\pi_{k,j}, \]

then $\hat\pi$ is a feasible solution for the constraints $(\alpha,\hat\omega,\hat\beta)$. Next, for any $\eta > 0$ we may define

\[ \bar\beta = (\beta - \eta\hat\beta)/(1-\eta), \qquad \bar\omega = (\omega - \eta\hat\omega)/(1-\eta). \]

By construction, the constraints $(\alpha,\bar\omega,\bar\beta)$ are feasible for sufficiently small $\eta$. In the exponential case this is immediately true, because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha,\hat\omega,\hat\beta)$, and hence for $(\alpha,\bar\omega,\bar\beta)$.

We may now let $\bar\pi$ be a feasible point for $(\alpha,\bar\omega,\bar\beta)$, and finally form $\tilde\pi = \eta\hat\pi + (1-\eta)\bar\pi$. By construction, $\tilde\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\tilde\pi_{m,n} > 0$. As discussed in the proof of Lemma A.6, we may take $\bar\pi$ to have finite support, so that $\tilde\pi$ also has finite support.

We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\tilde\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\tilde\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is

\[ \begin{aligned}
f'(\varepsilon) &= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left( \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)} + 1 \right) (\tilde\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k\in\mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)}\,(\tilde\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m \tilde\pi_{m,n} \log\frac{\pi^{\varepsilon}_{m,n}}{P_n(\beta)} + \sum_{k\in\mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \tilde\pi_{k,j} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{j} \pi_{k,j} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)}.
\end{aligned} \]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[ -\sum_{k} \alpha_k D(\pi_{(k)}\,\|\,P(\beta)) \le 0. \]

The second term is bounded above by the expression $-\sum_{k} \alpha_k \sum_{j} \tilde\pi_{k,j} \log P_j(\beta)$, which is finite because $\tilde\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I}}$ such that the minimizer takes the form

\[ \pi^*_{k,j} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise}. \end{cases} \]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[ L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{k+j\in\mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)} + \sum_{k\in\mathcal{K}} z_k\alpha_k \left( 1 - \sum_{k+l\in\mathcal{I}} \pi_{k,l} \right) + \sum_{i\in\mathcal{I}} w_i \left( \omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k} \right) \]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[ \pi^*_{k,j} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}. \]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
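The product form $D_k P_j(\beta) W_{k+j}$ can be observed numerically on a small polynomial example. The sketch below does not follow the Lagrangian argument of the proof; it swaps in iterative proportional fitting (alternately rescaling classes and occupancy levels, starting from the reference weights $\alpha_k P_j(\beta)$), which is a standard way to compute a minimum relative entropy point under linear constraints of this type. It assumes all $\alpha_k$ and $\omega_i$ are strictly positive; the function name `ipf_polynomial` and the numerical values are illustrative.

```python
import math


def poisson_pmf(j, mean):
    """P_j(mean) = exp(-mean) * mean**j / j!, evaluated through logarithms."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))


def ipf_polynomial(alpha, omega, beta, sweeps=500):
    """Iterative proportional fitting on the support k + j <= I.

    Works with the masses m[k][j] = alpha_k * pi_{k,j} and alternates between
    the class constraints (sum_j m[k][j] = alpha_k) and the terminal
    constraints (sum_k m[k][i-k] = omega_i)."""
    top, last_class = len(omega) - 1, len(alpha) - 1
    m = [[alpha[k] * poisson_pmf(j, beta) for j in range(top - k + 1)]
         for k in range(last_class + 1)]
    for _ in range(sweeps):
        for k in range(last_class + 1):              # rescale each class
            s = sum(m[k])
            m[k] = [v * alpha[k] / s for v in m[k]]
        for i in range(top + 1):                     # rescale each occupancy level
            ks = range(min(i, last_class) + 1)
            s = sum(m[k][i - k] for k in ks)
            for k in ks:
                m[k][i - k] *= omega[i] / s
    return [[v / alpha[k] for v in m[k]] for k in range(last_class + 1)]


alpha = [0.5, 0.5]
omega = [0.2, 0.5, 0.3]
beta = 0.6                      # = sum_i i*omega_i - sum_k k*alpha_k
pi = ipf_polynomial(alpha, omega, beta)

row_sums = [sum(row) for row in pi]
# ratio[k][j] = pi_{k,j} / P_j(beta) should factor as D_k * W_{k+j}
ratio = [[p / poisson_pmf(j, beta) for j, p in enumerate(row)] for row in pi]
cross_left = ratio[0][1] * ratio[1][1]    # D_0 W_1 * D_1 W_2
cross_right = ratio[0][2] * ratio[1][0]   # D_0 W_2 * D_1 W_1
print(pi, row_sums, cross_left, cross_right)
```

The equality of the two cross products is exactly what the factorization $D_k W_{k+j}$ predicts for this support.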

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[ \pi^*_{k,j} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\ 0, & \text{otherwise}. \end{cases} \]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[ \beta^{(M)} = \sum_{k\in\mathcal{K}} \alpha_k \sum_{l \le M} l\,\pi^*_{k,l}, \qquad \eta^{(M)}_k = \sum_{l \le M} \pi^*_{k,l}, \]

and consider the problem of minimizing

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{j \le M,\ k+j\in\mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)} \]

subject to the constraints

\[ \sum_{k\in\mathcal{K}} \alpha_k \sum_{l \le M,\ k+l\in\mathcal{I}} l\,\pi_{k,l} = \beta^{(M)}, \]

\[ \sum_{l \le M,\ k+l\in\mathcal{I}} \pi_{k,l} = \eta^{(M)}_k, \quad k \in \mathcal{K}, \]

\[ \sum_{k \le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k} = \omega_i, \quad i \in \mathcal{I}. \]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\[ \begin{aligned}
L^{(M)} ={}& \sum_{k\in\mathcal{K}} \alpha_k \sum_{j \le M,\ k+j\in\mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)} + y^{(M)} \left( \beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l \le M,\ k+l\in\mathcal{I}} l\,\pi_{k,l} \right) \\
&+ \sum_{k\in\mathcal{K}} z^{(M)}_k \alpha_k \left( \eta^{(M)}_k - \sum_{l \le M,\ k+l\in\mathcal{I}} \pi_{k,l} \right) + \sum_{i\in\mathcal{I}} w^{(M)}_i \left( \omega_i - \sum_{k \le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k} \right).
\end{aligned} \]

Since the derivative of $L$ with respect to $\pi_{k,j}$ must be zero at the minimizer, it follows that

\[ \pi^*_{k,j} = P_j(\beta)\,e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}} \]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[ \frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}}, \]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{k,j}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
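The final substitutions rest on the identity $P_j(\beta)\,\rho^{j} = e^{(\rho-1)\beta} P_j(\rho\beta)$, which converts the multiplier form $P_j(\beta)e^{jy}$ into a rescaled Poisson($\rho\beta$) term. A short numerical check, with illustrative values of $\rho$ and $\beta$:

```python
import math


def poisson_pmf(j, mean):
    """P_j(mean) = exp(-mean) * mean**j / j!, evaluated through logarithms."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))


rho, beta = 1.7, 0.8           # illustrative values; rho plays the role of e^y
tilted = [poisson_pmf(j, beta) * rho ** j for j in range(12)]
rescaled = [math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta)
            for j in range(12)]

# Successive ratios of the tilted terms match those of a Poisson(rho*beta) tail.
ratios = [tilted[j + 1] / tilted[j] for j in range(11)]
expected_ratios = [rho * beta / (j + 1) for j in range(11)]
print(tilted[:3], rescaled[:3], ratios[:3])
```

This is why the factor $e^{(\rho-1)\beta}$ is absorbed into $C_k$ in the statement of the theorem.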

Now that we have the general form of the minimizing endpoint values $\pi^*_{k,j}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*_{(k)}$ by $\beta_k = \sum_j j\,\pi^*_{k,j}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi_{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{k,j} = \pi^*_{k,j}$ and $\omega^{(k)} = (\omega_{k,0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\pi^*_{k,j}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{k,j}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \quad i = 0, \ldots, I, \]

\[ \gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x) \]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define $\bar\gamma_{k,j}(x) = \gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\bar\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[ \psi_i(x) = \sum_{k=0}^{i} \alpha_k \bar\psi_{k,i-k}(x). \]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[ \sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(x) = 1. \]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[ \sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{k,j}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1, \]

where the last equality follows from the fact that the $\pi^*_{k,j}$ satisfy the conservation constraint (A.21). Hence, the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k,0}(0) = 1$ and $\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{k,j}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the EulerndashLagrange equationswe first show that the rescaled zero-occupancy curve ψk0 for the kth sub-problem is proportional to the kth derivative of the overall zero-occupancycurve ψ0

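First take the exponential case and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is
\begin{align*}
\bar\psi_{k,0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k P_i(\rho_k\beta_k)\bigr) \left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{k,i} - C_k P_i(\rho\beta)\bigr) \left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{i} \right].
\end{align*}
Here we have used that $\omega_{k,i} = \pi^*_{k,i} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0 \bar\psi_{0,0}$ is
\begin{align*}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \left(1 - \frac{x}{\beta}\right)^{i-k} \right] \tag{A.25} \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k} \bar\psi_{k,0}(x).
\end{align*}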
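The passage from the first to the second line of (A.25) rests on the Poisson identity $P_i(\rho\beta)\, i! / ((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$ together with the reindexing $j = i - k$. The following is a minimal numerical sketch of that identity only; the function names and the test values of $\rho$ and $\beta$ are illustrative assumptions and do not come from the paper.

```python
import math


def poisson_pmf(j, mean):
    """Poisson probability P_j(mean) = exp(-mean) * mean**j / j!."""
    return math.exp(-mean) * mean ** j / math.factorial(j)


def differentiated_coefficient(i, k, rho, beta):
    """Coefficient produced by differentiating (1 - x/beta)**i exactly k times."""
    falling = math.factorial(i) / math.factorial(i - k)
    return poisson_pmf(i, rho * beta) * falling / beta ** k


def reindexed_coefficient(i, k, rho, beta):
    """The same coefficient written as rho**k * P_{i-k}(rho * beta)."""
    return rho ** k * poisson_pmf(i - k, rho * beta)


def max_relative_gap(rho, beta, top_index):
    """Largest relative discrepancy over all 0 <= k <= i <= top_index."""
    worst = 0.0
    for i in range(top_index + 1):
        for k in range(i + 1):
            a = differentiated_coefficient(i, k, rho, beta)
            b = reindexed_coefficient(i, k, rho, beta)
            worst = max(worst, abs(a - b) / b)
    return worst
```

Calling `max_relative_gap(0.7, 2.5, 8)` should return a value at the level of floating-point rounding.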

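Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, we have
\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k,0}^{(i-k)}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{align*}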

Similarly, we obtain
\begin{align*}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta} \dot\psi_{k,i-k}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \bar\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{align*}
so that the extremals satisfy the simple relation
\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]
which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that
\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]
As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

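In the polynomial case, similar computations show that
\[
(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k} \bar\psi_{k,0}(x) \tag{A.28}
\]
and that
\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x) \left( -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \right) \tag{A.29}
\]
for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).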

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that
\[
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\,dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
\]
for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

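Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is
\begin{align*}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(\beta) \log\bigl|\psi_0^{(i)}(\beta)\bigr| \right] - \sum_{k=0}^{K} \alpha_k \log\bigl|\psi_0^{(k)}(0)\bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(\beta) \log \left| \frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}} \right| .
\end{align*}
The fact that $\bar\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \bar\psi_{k,0}(x)$, so that
\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j} \psi_{k,0}^{(j)}\!\left(x\beta_k/\beta\right) e^{\beta} = \frac{\bar\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}.
\]
At $x = \beta$ the denominator on the right is $(-1)^j P_j(\beta)$. Then
\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D\bigl(\bar\gamma_{k,\cdot}(\beta) \,\|\, \mathcal{P}(\beta)\bigr).
\]
By construction, the endpoints $\bar\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).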

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.
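The convexity invoked for Lemma A.15 is the joint convexity of $(\theta,\gamma) \to D(\theta\|\gamma)$. As a hedged illustration (not a proof), the sketch below checks the defining inequality on randomly drawn finite distributions. The helper names, the vector length and the random seed are assumptions made only for this example.

```python
import math
import random


def relative_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i), with 0 log 0 = 0."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t > 0:
            total += t * math.log(t / g)
    return total


def random_distribution(rng, size):
    """A strictly positive probability vector of the given length."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    norm = sum(weights)
    return [w / norm for w in weights]


def convexity_gap(theta1, gamma1, theta2, gamma2, lam):
    """lam*D(theta1||gamma1) + (1-lam)*D(theta2||gamma2) - D(mixture); >= 0 if jointly convex."""
    mix_theta = [lam * a + (1 - lam) * b for a, b in zip(theta1, theta2)]
    mix_gamma = [lam * a + (1 - lam) * b for a, b in zip(gamma1, gamma2)]
    chord = lam * relative_entropy(theta1, gamma1) + (1 - lam) * relative_entropy(theta2, gamma2)
    return chord - relative_entropy(mix_theta, mix_gamma)


def smallest_gap(trials=200, size=5, seed=0):
    """Smallest convexity gap seen over a batch of random test cases."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        args = [random_distribution(rng, size) for _ in range(4)]
        worst = min(worst, convexity_gap(*args, lam=rng.random()))
    return worst
```

A nonnegative value of `smallest_gap()` (up to rounding) is consistent with the convexity used in the lemma.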

For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote
\[
\gamma^{\varepsilon} \doteq (1-\varepsilon)\gamma + \varepsilon\tilde\gamma
\]
and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by
\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D\bigl(\theta^{\varepsilon}(x) \,\|\, \gamma^{\varepsilon}(x)\bigr)\,dx.
\]
To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then
\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\tilde\gamma$ with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining
\[
\eta(x) \doteq \tilde\psi(x) - \psi(x),
\]
we may write
\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x,\varepsilon)\,dx. \tag{A.30}
\]
We can assume without loss that $G[1] = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\bar\gamma_{k,0}(x)$ and $\bar\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,
\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) \ge \alpha_i \bar\gamma_{i,0}(x).
\]
The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1-\varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

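The partial derivative of the integrand of (A.30) is
\[
\frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^{\varepsilon}(x)} \eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^{\varepsilon}(x)} \dot\eta_i(x).
\]
The partial derivatives of $L$ are (in mixed notation)
\begin{align*}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{align*}
The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that
\[
\left| \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) \right| < B \quad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].
\]
Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.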

We are now free to differentiate under the integral sign, so that
\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx
\]
for all $\varepsilon \in [0,\varepsilon_0)$. We note that
\[
\int_0^{x} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma} dx' = \int_0^{x} \left.\left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\right|_{\gamma} dx'
\]
is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain
\begin{align*}
G'_+[0] &= \int_0^{\beta} \left[ \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \right] dx \\
&= \int_0^{\beta} \dot\eta(x) \cdot \left( -\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \right) dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i \bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0.
\end{align*}
The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

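Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,
\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\,dx.
\]
It then follows that
\begin{align*}
J(\gamma) = \mathcal{J}(\alpha,\omega,\beta) &\le \liminf_{n} \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\,dx \\
&= J(\tilde\gamma),
\end{align*}
with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.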

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation, and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
url: cm.bell-labs.com/who/nuzman
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

any time period is $\alpha_k\beta_k/\beta$. After rescaling time by the factor $\beta_k/\beta$, the evolution of additional balls entering each class of urns becomes an occupancy problem with initially empty conditions and terminal time $\beta_k$.

We can thus define $|\mathcal{K}|$ occupancy curves $\gamma^{(k)} = \{\gamma_{k,i}\}_i$, where the function $\gamma_{k,i} : [0,\beta_k] \to [0,1]$ represents the fraction of class $k$ urns which contain $i$ additional balls (thus $k+i$ total balls) after $x$ balls per urn have been given to this class. The overall extremal occupancy curves for the general initial conditions are then given by
\[
\gamma_i(x) = \sum_{k \in \mathcal{K}} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta).
\]
We will see that the occupancy curve $\gamma^{(k)}$ for the $k$th subproblem is an extremal of the form given by Theorem A.1, with $I - k$ terminal constraints. Given the appropriate subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$, the results of the previous section determine the twist parameters $C_k$ and $\rho_k$ and corresponding extremals.
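To make the indexing in the mixture formula concrete, the sketch below assembles overall curves from per-class curves. It is only an illustration under loudly stated assumptions: the subproblem curves are taken to be the Poisson curves $\gamma_{k,j}(t) = P_j(t)$ (a hypothetical stand-in, not the constrained extremals of Theorem A.1), every $\beta_k$ is set equal to $\beta$, and all function names are invented for the example.

```python
import math


def poisson_pmf(j, mean):
    """Poisson probability P_j(mean); equals 1 at j = 0 and 0 otherwise when mean = 0."""
    return math.exp(-mean) * mean ** j / math.factorial(j)


def overall_curve(i, x, alpha, beta, beta_k, sub_curve):
    """gamma_i(x) = sum over classes k <= i with alpha_k > 0 of
    alpha_k * gamma_{k, i-k}(x * beta_k / beta)."""
    total = 0.0
    for k, a in enumerate(alpha):
        if a > 0 and k <= i:
            total += a * sub_curve(k, i - k, x * beta_k[k] / beta)
    return total


def poisson_sub_curve(k, j, t):
    """Hypothetical subproblem curve: fraction of class-k urns holding j extra balls."""
    return poisson_pmf(j, t)


def total_mass(x, alpha, beta, beta_k, levels=80):
    """Sum of gamma_i(x) over occupancy levels 0..levels-1; should be close to 1."""
    return sum(overall_curve(i, x, alpha, beta, beta_k, poisson_sub_curve)
               for i in range(levels))
```

With `alpha = [0.6, 0.0, 0.4]` and `beta_k = [2.0, 2.0, 2.0]`, the curves start at `alpha` when `x = 0` and keep total mass one for later `x`, which is what the initial condition and the validity of the occupancy curves require.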

At this point, we come to the main obstacle in determining the extremals for nonempty initial conditions. This is to show that there exist subproblem terminal conditions $(1, \omega^{(k)}, \beta_k)$ which yield the extremal solution in the overall problem. To show that such conditions and the corresponding extremals exist, we will first give the form of the final cost function (which can be obtained by formal arguments). We then show that the cost function is the solution of a minimization problem, that the problem has a unique minimizing argument, that it has corresponding Lagrange multipliers, and that the extremal curves can be constructed using the Lagrange multipliers.

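The large deviations exponent for the case of nonempty initial conditions $\alpha$ turns out to be
\[
\min_{\pi \in S_\infty^{|\mathcal{K}|}} \sum_{k \in \mathcal{K}} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr) \tag{A.20}
\]
subject to the conservation constraint
\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j \pi_{k,j} = \beta \tag{A.21}
\]
and the terminal conditions
\[
\omega_i = \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \quad \text{for all } 0 \le i \le I. \tag{A.22}
\]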

Recall that $S_\infty^{|\mathcal{K}|}$ is the set of all $|\mathcal{K}|$-tuples of distributions $\{\pi^{(k)}\}_{k \in \mathcal{K}}$, where each $\pi^{(k)}$ is a distribution on the nonnegative integers. It is straightforward to show that this problem is feasible whenever $(\alpha,\omega,\beta)$ are feasible constraints, as we discuss in the proof of Lemma A.6. As in the empty case, each polynomial problem can be formulated in a standard way so that $\omega_{I+} = 0$, in which case equality holds in the monotonicity condition $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case, the constraints imply that
\[
\pi_{k,j} = 0, \quad k + j > I, \tag{A.23}
\]
so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi^{(0)}, \ldots, \pi^{(k)}, \ldots, \pi^{(K)})$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi^{(k)}$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha,\omega,\beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S_\infty^{|\mathcal{K}|}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $F(\alpha,\omega,\beta)$ denotes the set of all distributions $\pi \in S_\infty^{|\mathcal{K}|}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha,\omega,\beta)$ implies that $F$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\hat\omega \in S_{\hat I}$, where $\hat I > I$. Any feasible point for constraints $(\alpha, \hat\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{0,0} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi^{(0)}$ is applied to $\pi_{0,1}$, and mass from $\pi^{(1)}$ is applied to $\pi_{1,0}$, as necessary, until the constraint $\alpha_0\pi_{0,1} + \alpha_1\pi_{1,0} = \omega_1$ is satisfied. That $\pi^{(0)}$ and $\pi^{(1)}$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi^{(k)}$.
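The ordered filling construction described above can be sketched in a few lines. This is one possible reading of the procedure rather than the authors' exact recipe: the rule of drawing mass from the lowest-indexed class first is an assumption (the text only requires that enough mass be available), and the names `ordered_filling`, `terminal_levels` and `mean_extra_balls` are hypothetical. Inputs are polynomial constraints, so `alpha` and `omega` each sum to one.

```python
import math


def ordered_filling(alpha, omega, tol=1e-12):
    """Greedy feasible point pi[k][j] with sum_{k<=i} alpha_k * pi[k][i-k] = omega_i.

    Level i is filled with unused mass from classes k <= i, lowest k first
    (an assumed tie-breaking rule). Raises if the monotonicity condition fails.
    """
    top = len(omega) - 1
    remaining = list(alpha)                        # unassigned mass alpha_k * (1 - sum_j pi_kj)
    mass = [[0.0] * (top + 1) for _ in alpha]      # mass[k][j] = alpha_k * pi_kj
    for i, target in enumerate(omega):
        need = target
        for k in range(min(i, len(alpha) - 1) + 1):
            take = min(need, remaining[k])
            mass[k][i - k] += take
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition violated at level %d" % i)
    return [[m / a if a > 0 else 0.0 for m in row] for row, a in zip(mass, alpha)]


def terminal_levels(alpha, pi, top):
    """Recompute omega_i = sum_{k<=i} alpha_k * pi[k][i-k] from a candidate pi."""
    return [sum(alpha[k] * pi[k][i - k] for k in range(min(i, len(alpha) - 1) + 1))
            for i in range(top + 1)]


def mean_extra_balls(alpha, pi):
    """Left side of the conservation constraint: sum_k alpha_k * sum_j j * pi_kj."""
    return sum(a * sum(j * p for j, p in enumerate(row)) for a, row in zip(alpha, pi))
```

For `alpha = [0.5, 0.5]` and `omega = [0.2, 0.3, 0.5]` the construction returns rows that each sum to one, reproduces `omega`, and has mean `sum(i * omega_i) - sum(k * alpha_k) = 0.8`, the value that the conservation condition forces in the polynomial case.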

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_\infty$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P, Q$ in $S_\infty$, this distance is simply
\[
\sum_{i:\, P_i > Q_i} (P_i - Q_i) = \sum_{i:\, Q_i > P_i} (Q_i - P_i),
\]
and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$, define the set
\[
H_A \doteq \left\{ \pi \in S_\infty^{|\mathcal{K}|} : \sum_{k} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr) \le A \right\}.
\]
The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that
\[
Q_A(\alpha,\omega,\beta) \doteq F(\alpha,\omega,\beta) \cap H_A
\]
is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha,\omega,\beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha,\omega,\beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi^{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that
\[
\sum_{j \ge m} j \pi^{(n)}_{k,j} \le \varepsilon \tag{A.24}
\]
for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality
\[
xy \le (x\log x - x + 1) + (e^y - 1),
\]
where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,
\[
\sum_{j=0}^{\infty} e^{(1/\delta) j} P_j(\beta) = \sum_{j=0}^{\infty} e^{(1/\delta) j} e^{-\beta} \frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
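Both ingredients just stated are easy to probe numerically. The sketch below evaluates the slack in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$, which vanishes exactly when $x = e^y$, and compares a truncated Poisson exponential-moment sum with the closed form $e^{\beta e^{1/\delta} - \beta}$. The function names and the truncation level are assumptions made for illustration.

```python
import math


def young_gap(x, y):
    """Slack (x log x - x + 1) + (e^y - 1) - x*y, with the convention 0 log 0 = 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y


def poisson_exp_moment(beta, delta, terms=200):
    """Truncated sum over j of exp(j / delta) * P_j(beta), built term by term."""
    ratio = beta * math.exp(1.0 / delta)
    term = math.exp(-beta)           # j = 0 term
    total = 0.0
    for j in range(terms):
        total += term
        term *= ratio / (j + 1)      # move from the j term to the j + 1 term
    return total


def poisson_exp_moment_closed_form(beta, delta):
    """Closed form exp(beta * e^{1/delta} - beta) quoted in the text."""
    return math.exp(beta * math.exp(1.0 / delta) - beta)
```

For moderate values such as `beta = 2.0` and `delta = 1.0` the truncated sum and the closed form should agree to many digits; smaller `delta` makes the moment grow very quickly, which is why the proof first fixes `delta` and only then chooses `m`.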

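Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{k,j}/P_j(\beta)$, we have the estimate
\begin{align*}
\sum_{j \ge m} j \pi^{(n)}_{k,j} &= \sum_{j \ge m} \delta\, \frac{j}{\delta}\, \frac{\pi^{(n)}_{k,j}}{P_j(\beta)}\, P_j(\beta) \\
&\le \sum_{j \ge 0} \delta \left[ \pi^{(n)}_{k,j} \log\left( \frac{\pi^{(n)}_{k,j}}{P_j(\beta)} \right) - \pi^{(n)}_{k,j} + P_j(\beta) \right] + \delta \sum_{j \ge m} \bigl(e^{(1/\delta) j} - 1\bigr) P_j(\beta) \\
&= \delta D\bigl(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)\bigr) + \delta \sum_{j \ge m} \bigl(e^{(1/\delta) j} - 1\bigr) P_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} \bigl(e^{(1/\delta) j} - 1\bigr) P_j(\beta),
\end{align*}
where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $\mathcal{P}(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0,\infty)$.]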

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that
\[
\beta_k^{(n)} \to \beta_k.
\]
Finally,
\[
\sum_{k} \alpha_k \beta_k = \lim_{n} \sum_{k} \alpha_k \beta_k^{(n)} = \lim_{n} \beta = \beta,
\]
so that $\pi \in F(\alpha,\omega,\beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr) \ge 0.
\]
Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)} \| \mathcal{P}(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi\|\eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that
\[
\sum_{k \in \mathcal{K}} \alpha_k D\bigl(\pi^*_{(k)} \,\|\, \mathcal{P}(\beta)\bigr) \le \liminf_{n} \sum_{k \in \mathcal{K}} \alpha_k D\bigl(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta)\bigr) = G,
\]
where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha,\omega,\beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that
\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha,\omega,\beta)
\]
componentwise, and such that $(\alpha,\omega,\beta)$ is feasible with $0 < \beta < \infty$. Then
\[
\liminf_{n} \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha,\omega,\beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that
\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]
and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$, then
\[
\beta_k^{(n)} \doteq \sum_{j=0}^{\infty} j \pi^{(n)}_{k,j} \to \beta_k \doteq \sum_{j=0}^{\infty} j \pi_{k,j}, \quad k = 0, 1, \ldots, K.
\]
If $\alpha_k^{(n)} \to 0$ and $\beta_k^{(n)} \alpha_k^{(n)} \not\to 0$, then the term $\alpha_k^{(n)} D(\pi^{(n)}_{(k)} \| \mathcal{P}(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi\|\mathcal{P}(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha_k^{(n)} \beta_k^{(n)} \to \alpha_k\beta_k$ even if $\alpha_k = 0$.

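Thus
\[
\sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_{n} \sum_{k} \beta_k^{(n)} \alpha_k^{(n)} = \lim_{n} \beta^{(n)} = \beta,
\]
and $\pi$ satisfies the constraints $(\alpha,\omega,\beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have
\begin{align*}
\liminf_{n} \sum_{k=0}^{K} \alpha_k^{(n)} D\bigl(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)})\bigr) &\ge \liminf_{n} \sum_{k:\, \alpha_k > 0} \alpha_k^{(n)} D\bigl(\pi^{(n)}_{(k)} \,\|\, \mathcal{P}(\beta^{(n)})\bigr) \\
&\ge \sum_{k:\, \alpha_k > 0} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr) \\
&\ge \mathcal{J}(\alpha,\omega,\beta),
\end{align*}
as desired.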

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{k,j}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints, but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints, and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{k,j} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{k,j} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{k,j}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S_\infty^{|\mathcal{K}|}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{m,n} = 0$ for some particular $(m,n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\hat\pi$ be an arbitrary set of probability distributions in $S_\infty^{|\mathcal{K}|}$, subject to the restrictions that $\hat\pi_{m,n} > 0$, that $\hat\pi_{k,j} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\hat\pi$ has finite support. If we define
\[
\hat\omega_i \doteq \sum_{k=0}^{i} \alpha_k \hat\pi_{k,i-k}, \qquad \hat\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j \hat\pi_{k,j},
\]
then $\hat\pi$ is a feasible solution for the constraints $(\alpha, \hat\omega, \hat\beta)$. Next, for any $\eta > 0$ we may define
\[
\check\beta \doteq (\beta - \eta\hat\beta)/(1-\eta), \qquad \check\omega \doteq (\omega - \eta\hat\omega)/(1-\eta).
\]
By construction, the constraints $(\alpha, \check\omega, \check\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha,\omega,\beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha,\omega,\beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \hat\omega, \hat\beta)$, and hence for $(\alpha, \check\omega, \check\beta)$.

We may now let $\check\pi$ be a feasible point for $(\alpha, \check\omega, \check\beta)$, and finally form $\tilde\pi = \eta\hat\pi + (1-\eta)\check\pi$. By construction, $\tilde\pi$ is a feasible solution corresponding to the original constraints $(\alpha,\omega,\beta)$, and in addition $\tilde\pi_{m,n} > 0$. As discussed in the proof of Lemma A.6, we may take $\check\pi$ to have finite support, so that $\tilde\pi$ also has finite support.

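We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\tilde\pi$ reduces the objective function. Because the constraints are linear, the points $\pi^{\varepsilon} = \varepsilon\tilde\pi + (1-\varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^{\varepsilon}$. The derivative of this function is
\begin{align*}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left( \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)} + 1 \right) (\tilde\pi_{k,j} - \pi_{k,j}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)}\, (\tilde\pi_{k,j} - \pi_{k,j}) \\
&= \alpha_m \tilde\pi_{m,n} \log\frac{\pi^{\varepsilon}_{m,n}}{P_n(\beta)} + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \tilde\pi_{k,j} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{k,j} \log\frac{\pi^{\varepsilon}_{k,j}}{P_j(\beta)}.
\end{align*}
In the limit as $\varepsilon \to 0$, the third term in the last display tends to
\[
-\sum_{k} \alpha_k D\bigl(\pi^{(k)} \,\|\, \mathcal{P}(\beta)\bigr) \le 0.
\]
The second term is bounded above by the expression $-\sum_{k} \alpha_k \sum_{j} \tilde\pi_{k,j} \log P_j(\beta)$, which is finite because $\tilde\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{m,n} = 0$. As $\pi_{m,n}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{m,n} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.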

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form
\[
\pi^*_{k,j} = \begin{cases} D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\ 0, & \text{otherwise.} \end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian
\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{k,j} \log\frac{\pi_{k,j}}{P_j(\beta)} + \sum_{k \in \mathcal{K}} z_k \alpha_k \left( 1 - \sum_{k+l \in \mathcal{I}} \pi_{k,l} \right) + \sum_{i \in \mathcal{I}} w_i \left( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \right)
\]
is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields
\[
\pi^*_{k,j} = P_j(\beta)\, e^{z_k - 1 + w_{k+j}}.
\]
The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form
\[
\pi^*_{k,j} = \begin{cases} C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\ C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\ 0, & \text{otherwise.} \end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[
\beta^{(M)} = \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l\, \pi^*_{kl},
\qquad
\eta^{(M)}_k = \sum_{l \le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log \frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\, \pi_{kl} = \beta^{(M)},
\]

\[
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} = \eta^{(M)}_k, \qquad k \in \mathcal{K},
\]

\[
\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} = \omega_i, \qquad i \in \mathcal{I}.
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} ={}& \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log \frac{\pi_{kj}}{P_j(\beta)}
+ y^{(M)} \Bigl( \beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l\, \pi_{kl} \Bigr) \\
&+ \sum_{k \in \mathcal{K}} z^{(M)}_k \alpha_k \Bigl( \eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} \Bigr)
+ \sum_{i \in \mathcal{I}} w^{(M)}_i \Bigl( \omega_i - \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \Bigr).
\end{aligned}
\]

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\, e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\, e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
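The substitution for $C_k$ rests on the elementary identity $P_j(\beta)\, e^{jy} = e^{(\rho - 1)\beta} P_j(\rho\beta)$ with $\rho = e^{y}$. The short sketch below checks this identity numerically; the function names and the sample values of $\beta$ and $y$ are chosen only for illustration.

```python
import math

def poisson_pmf(j, mean):
    """P_j(mean) = exp(-mean) * mean**j / j!"""
    return math.exp(-mean) * mean**j / math.factorial(j)

def tilted_weight(j, beta, y):
    """Left-hand side: P_j(beta) * exp(j * y)."""
    return poisson_pmf(j, beta) * math.exp(j * y)

def rescaled_poisson(j, beta, y):
    """Right-hand side: exp((rho - 1) * beta) * P_j(rho * beta), rho = exp(y)."""
    rho = math.exp(y)
    return math.exp((rho - 1.0) * beta) * poisson_pmf(j, rho * beta)

# Illustrative values only.
beta, y = 0.6, 0.4
pairs = [(tilted_weight(j, beta, y), rescaled_poisson(j, beta, y)) for j in range(12)]
```

Multiplying both sides by $e^{-1 + z_k}$ gives $C_k P_j(\rho\beta)$ with $C_k = e^{(\rho - 1)\beta - 1 + z_k}$, which is the form stated in Theorem A.10.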

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\, \pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0}, \dots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\pi^*_{kj}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha, \omega, \beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\, \pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \dots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha, \omega, \beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0, \beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

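First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\tilde\psi_{k0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl( \omega_{ki} - C_k P_i(\rho_k\beta_k) \bigr) \Bigl( 1 - \frac{x\beta_k/\beta}{\beta_k} \Bigr)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl( \omega_{ki} - C_k P_i(\rho\beta) \bigr) \Bigl( 1 - \frac{x}{\beta} \Bigr)^{i} \\
&= C_k \Bigl[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \Bigl( 1 - \frac{x}{\beta} \Bigr)^{i} \Bigr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0 \tilde\psi_{00}$ is

\[
\begin{aligned}
(-1)^k \psi^{(k)}_0(x) &= \alpha_0 C_0 \Bigl[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\, \beta^k} \Bigl( 1 - \frac{x}{\beta} \Bigr)^{i-k} \Bigr] \\
&= \alpha_0 C_0 \rho^k \Bigl[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \Bigl( 1 - \frac{x}{\beta} \Bigr)^{j} \Bigr] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \tilde\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]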
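The passage from the first to the second line of (A.25) uses the Poisson identity $P_i(\rho\beta)\, i!/((i-k)!\, \beta^k) = \rho^k P_{i-k}(\rho\beta)$, followed by the change of index $j = i - k$. A small numerical check of that identity is sketched below; the helper names and the sample parameters are illustrative assumptions.

```python
import math

def poisson_pmf(j, mean):
    """P_j(mean) = exp(-mean) * mean**j / j!"""
    return math.exp(-mean) * mean**j / math.factorial(j)

def differentiated_coefficient(i, k, rho, beta):
    """P_i(rho*beta) * i! / ((i-k)! * beta**k): the coefficient produced by
    differentiating (1 - x/beta)**i a total of k times (sign absorbed)."""
    return (poisson_pmf(i, rho * beta) * math.factorial(i)
            / (math.factorial(i - k) * beta**k))

def shifted_coefficient(i, k, rho, beta):
    """rho**k * P_{i-k}(rho*beta): the same coefficient after reindexing."""
    return rho**k * poisson_pmf(i - k, rho * beta)

# Illustrative parameters only.
rho, beta = 1.7, 0.6
checks = [(differentiated_coefficient(i, k, rho, beta),
           shifted_coefficient(i, k, rho, beta))
          for i in range(8) for k in range(i + 1)]
```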

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi^{(i-k)}_{k0}\Bigl( \frac{x\beta_k}{\beta} \Bigr) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi^{(i)}_0(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\Bigl( \frac{x\beta_k}{\beta} \Bigr) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi^{(i+1)}_0(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

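In the polynomial case, similar computations show that

\[
(-1)^k \psi^{(k)}_0(x) = \frac{\alpha_0 D_0}{D_k}\, \tilde\psi_{k0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi^{(i+1)}_0(x) = \gamma_i(x)\, \frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}
\tag{A.29}
\]

for $i = 0, \dots, I$, establishing the restricted set of equations (A.10).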

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \dots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\, dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\bigl( -\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x) \bigr)$, which is finite at both endpoints because $(-1)^i \psi^{(i)}_0(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

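Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \Bigl[ \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(\beta) \log\bigl| \psi^{(i)}_0(\beta) \bigr| \Bigr] - \sum_{k=0}^{K} \alpha_k \log\bigl| \psi^{(k)}_0(0) \bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta) \log \frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0)\, e^{-\beta}}.
\end{aligned}
\]

The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi^{(k)}_0(x) = \psi^{(k)}_0(0) \times \tilde\psi_{k0}(x)$, so that

\[
\frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0)\, e^{-\beta}} = \Bigl( \frac{\beta_k}{\beta} \Bigr)^{j} \psi^{(j)}_{k0}\Bigl( \frac{x\beta_k}{\beta} \Bigr) e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D\bigl( \tilde\gamma_{k\cdot}(\beta) \,\|\, P(\beta) \bigr).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).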

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) \doteq \{ \gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega \}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^{\varepsilon} \doteq (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D\bigl( \theta^{\varepsilon}(x) \,\|\, \gamma^{\varepsilon}(x) \bigr)\, dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi, \theta)\, dx$. After defining

\[
\eta(x) \doteq \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\, dx = \int_0^{\beta} g(x, \varepsilon)\, dx.
\tag{A.30}
\]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\tilde\gamma_{k0}(x)$ and $\tilde\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) \ge \alpha_i \tilde\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1 - \varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

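The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\Big|_{\gamma^{\varepsilon}(x)} \eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\Big|_{\gamma^{\varepsilon}(x)} \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\qquad
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\Bigl| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \Bigr| < B \qquad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.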

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\, dx
\]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[
\int_0^{x} \frac{\partial L}{\partial\psi_i}\Big|_{\gamma}\, dx' = \int_0^{x} \Bigl( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \Bigr)\Big|_{\gamma}\, dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^{\beta} \Bigl[ \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \Bigr] dx \\
&= \int_0^{\beta} \dot\eta(x) \cdot \Bigl( -\int_0^{x} \frac{\partial L}{\partial\psi}(x')\, dx' - \frac{\partial L}{\partial\theta}(x) \Bigr) dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i \dot\eta_i(x)\, dx \\
&= \sum_{i=0}^{I} C_i \bigl( \eta_i(0) - \eta_i(\beta) \bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\bar\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x^{(n)}_1)$ and terminal point $\bar\gamma(x^{(n)}_2)$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D\bigl( \bar\theta(x) \,\|\, \bar\gamma(x) \bigr)\, dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha, \omega, \beta)
&\le \liminf_n \mathcal{J}\bigl( \bar\gamma(x^{(n)}_1),\ \bar\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1 \bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D\bigl( \bar\theta(x) \,\|\, \bar\gamma(x) \bigr)\, dx \\
&= J(\bar\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References

case equality holds in the monotonicity condition, $\sum_{j=0}^{I} \alpha_j = \sum_{j=0}^{I} \omega_j = 1$, as well as in the conservation condition (2.4). Unlike in the empty case, the minimization problem does not become trivial in the polynomial case, since degrees of freedom remain in allocating balls among the $|\mathcal{K}|$ classes. In the polynomial case the constraints imply that

\[
\pi_{kj} = 0, \qquad k + j > I,
\tag{A.23}
\]

so that the minimization problem is finite dimensional. The polynomial problem can be stated equivalently as minimization of (A.20) subject to (A.23) and the endpoint constraints (A.22), in which case the conservation constraint (A.21) need not be included explicitly.

As written, the vector of distributions $\pi = (\pi(0), \dots, \pi(k), \dots, \pi(K))$ is such that $k$ only ranges over indices with $\alpha_k > 0$. Including all indices $0 \le k \le K$ yields an equivalent problem in which the minimizing solution is not unique, since the $\pi(k)$ with $\alpha_k = 0$ make no contribution to the objective function or the constraints. In the form given, however, the solution can be shown to be unique.

Lemma A.6. Suppose that $(\alpha, \omega, \beta)$ are feasible constraints. Then there is a unique vector of distributions $\pi^* \in S^{|\mathcal{K}|}_{\infty}$ which minimizes (A.20) subject to constraints (A.21) and (A.22).

Proof. Recall that $\mathcal{F}(\alpha, \omega, \beta)$ denotes the set of all distributions $\pi \in S^{|\mathcal{K}|}_{\infty}$ satisfying the constraints (A.21) and (A.22). The feasibility of $(\alpha, \omega, \beta)$ implies that $\mathcal{F}$ has at least one element. Moreover, it has at least one element with finite support, and thus finite cost. A sketch argument for this point is as follows. First of all, any set of exponential constraints with terminal condition $\omega \in S_I$ can be reformulated as a set of polynomial constraints with terminal condition $\tilde\omega \in S_{\tilde I}$, where $\tilde I > I$. Any feasible point for constraints $(\alpha, \tilde\omega, \beta)$ would also be feasible for the original constraints, and would have finite support. It remains to show that there is at least one feasible point for each set of feasible polynomial constraints. Such a feasible point may be constructed by an ordered filling construction. First, we assign $\pi_{00} = \omega_0/\alpha_0$. The first monotonicity condition ($i = 0$) in (2.3) ensures that this is possible with less than or equal to unit mass. Next, some of the remaining mass from the distribution $\pi(0)$ is applied to $\pi_{01}$, and mass from $\pi(1)$ is applied to $\pi_{10}$, as necessary until the constraint $\alpha_0\pi_{01} + \alpha_1\pi_{10} = \omega_1$ is satisfied. That $\pi(0)$ and $\pi(1)$ have sufficient mass to do so follows from the ($i = 1$) condition in (2.3). This process continues until all constraints $\omega_i$ have been satisfied. The equality in the conservation condition (2.4) implies that the final step uses up all of the probability mass in the distributions $\pi(k)$.
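The ordered filling construction can be sketched in a few lines. The version below is one possible reading of the construction, not the authors' procedure: it fills level $i$ greedily from classes $k = 0, 1, \dots, i$ in that order, whereas the text leaves the order of use unspecified. The function name, the tolerance and the sample constraints are assumptions, and every $\alpha_k$ is assumed positive.

```python
def ordered_fill(alpha, omega, tol=1e-12):
    """Greedy 'ordered filling' for polynomial constraints.

    alpha[k]: initial fraction of urns holding k balls (assumed > 0).
    omega[i]: required terminal fraction of urns holding i balls.
    Returns pi with pi[k][j] = fraction of class-k urns that receive j balls.
    """
    n = len(alpha)
    remaining = list(alpha)               # mass of class k not yet assigned
    q = [[0.0] * n for _ in range(n)]     # q[k][i] = alpha_k * pi_{k, i-k}
    for i in range(n):                    # satisfy omega_0, omega_1, ... in order
        need = omega[i]
        for k in range(i + 1):            # only classes k <= i can end at level i
            take = min(remaining[k], need)
            q[k][i] = take
            remaining[k] -= take
            need -= take
        if need > tol:
            raise ValueError("monotonicity condition fails at level %d" % i)
    return [[q[k][k + j] / alpha[k] for j in range(n - k)] for k in range(n)]

# Hypothetical polynomial constraints (partial sums of omega never exceed those of alpha).
alpha = [0.5, 0.3, 0.2]
omega = [0.2, 0.3, 0.5]
pi = ordered_fill(alpha, omega)
```

The point produced this way is feasible but generally not optimal; it only serves to show that the feasible set contains an element with finite support.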

It will be useful to consider $|\mathcal{K}|$-tuples of distributions in $S_{\infty}$ under the topology of weak convergence, with the distance between distributions given by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_{\infty}$, this distance is simply

\[
\sum_{i :\, P_i > Q_i} (P_i - Q_i) = \sum_{i :\, Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

\[
H_A \doteq \Bigl\{ \pi \in S^{|\mathcal{K}|}_{\infty} : \sum_{k} \alpha_k D\bigl( \pi(k) \,\|\, P(\beta) \bigr) \le A \Bigr\}.
\]
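The expression given above for the distance between two distributions on the nonnegative integers is easy to compute directly. The sketch below evaluates both sides of the displayed equality and compares them with half the $\ell_1$ distance; the function name and the sample distributions are made up for illustration.

```python
def positive_part_distance(p, q):
    """Sum of (p_i - q_i) over the indices where p_i > q_i."""
    return sum(pi - qi for pi, qi in zip(p, q) if pi > qi)

# Two hypothetical distributions on {0, 1, 2, 3}.
P = [0.5, 0.3, 0.2, 0.0]
Q = [0.2, 0.3, 0.4, 0.1]

d_pq = positive_part_distance(P, Q)   # mass where P exceeds Q
d_qp = positive_part_distance(Q, P)   # mass where Q exceeds P
half_l1 = 0.5 * sum(abs(pi - qi) for pi, qi in zip(P, Q))
```

The two one-sided sums agree because both distributions have total mass one, so the excess of $P$ over $Q$ must be balanced by the excess of $Q$ over $P$.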

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference, we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semi-continuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha, \omega, \beta) \doteq \mathcal{F}(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $\mathcal{F}(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi(k)$ is equal to the limit of the means of the $\pi^{(n)}(k)$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}(k)$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j\, \pi^{(n)}_{kj} \le \varepsilon
\tag{A.24}
\]

for all $n$ and $k$.

Since $e^{y} - 1 = \sup_{x > 0} [\, xy - x\log x + x - 1 \,]$, we have the inequality

\[
xy \le (x\log x - x + 1) + (e^{y} - 1),
\]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x\log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta}\, \frac{e^{-\beta}\beta^{j}}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
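Both ingredients used here, the inequality $xy \le (x\log x - x + 1) + (e^{y} - 1)$ and the closed form for the Poisson exponential moment, can be checked numerically. The sketch below does so on a grid and for one choice of $\delta$ and $\beta$; the function names, the grid and the parameter values are illustrative assumptions.

```python
import math

def young_gap(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y with 0 log 0 = 0; nonnegative for x >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1.0) + (math.exp(y) - 1.0) - x * y

def poisson_exp_moment(t, beta, terms=200):
    """Truncated sum over j of exp(t*j) * P_j(beta), built by a term recurrence."""
    total = 0.0
    term = math.exp(-beta)                 # j = 0 term: P_0(beta)
    for j in range(terms):
        total += term
        term *= beta * math.exp(t) / (j + 1)
    return total

# Grid check of the inequality (equality holds at x = e^y).
gaps = [young_gap(0.1 * a, 0.25 * b) for a in range(0, 51) for b in range(-12, 13)]

# Exponential moment with t = 1/delta.
delta, beta = 0.5, 0.6
moment = poisson_exp_moment(1.0 / delta, beta)
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
```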

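Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\[
\begin{aligned}
\sum_{j \ge m} j\, \pi^{(n)}_{kj}
&= \sum_{j \ge m} \delta\, \frac{j}{\delta}\, \frac{\pi^{(n)}_{kj}}{P_j(\beta)}\, P_j(\beta) \\
&\le \sum_{j \ge 0} \delta \Bigl[ \pi^{(n)}_{kj} \log\Bigl( \frac{\pi^{(n)}_{kj}}{P_j(\beta)} \Bigr) - \pi^{(n)}_{kj} + P_j(\beta) \Bigr] + \delta \sum_{j \ge m} \bigl( e^{j/\delta} - 1 \bigr) P_j(\beta) \\
&= \delta D\bigl( \pi^{(n)}(k) \,\|\, P(\beta) \bigr) + \delta \sum_{j \ge m} \bigl( e^{j/\delta} - 1 \bigr) P_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} \bigl( e^{j/\delta} - 1 \bigr) P_j(\beta),
\end{aligned}
\]

where the second equality uses the fact that both $\pi^{(n)}(k)$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small, and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]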

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta^{(n)}_k \to \beta_k.
\]

Finally,

\[
\sum_{k} \alpha_k \beta_k = \lim_n \sum_{k} \alpha_k \beta^{(n)}_k = \lim_n \beta = \beta,
\]

so that $\pi \in \mathcal{F}(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G \doteq \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D\bigl( \pi(k) \,\|\, P(\beta) \bigr) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D\bigl( \pi^{(n)}(k) \,\|\, P(\beta) \bigr) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semi-continuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D\bigl( \pi^*(k) \,\|\, P(\beta) \bigr) \le \liminf_n \sum_{k \in \mathcal{K}} \alpha_k D\bigl( \pi^{(n)}(k) \,\|\, P(\beta) \bigr) = G,
\]

where in fact we have equality by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

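Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)
\]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} \doteq G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$ then

\[
\beta^{(n)}_k \doteq \sum_{j=0}^{\infty} j\, \pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j\, \pi_{kj}, \qquad k = 0, 1, \dots, K.
\]

If $\alpha^{(n)}_k \to 0$ and $\beta^{(n)}_k \alpha^{(n)}_k \not\to 0$, then the term $\alpha^{(n)}_k D\bigl( \pi^{(n)}(k) \,\|\, P(\beta^{(n)}) \bigr)$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda\log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha^{(n)}_k \beta^{(n)}_k \to \alpha_k\beta_k$, even if $\alpha_k = 0$.

Thus

\[
\sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_n \sum_{k} \beta^{(n)}_k \alpha^{(n)}_k = \lim_n \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semi-continuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\[
\begin{aligned}
\liminf_n \sum_{k=0}^{K} \alpha^{(n)}_k D\bigl( \pi^{(n)}(k) \,\|\, P(\beta^{(n)}) \bigr)
&\ge \liminf_n \sum_{k :\, \alpha_k > 0} \alpha^{(n)}_k D\bigl( \pi^{(n)}(k) \,\|\, P(\beta^{(n)}) \bigr) \\
&\ge \sum_{k :\, \alpha_k > 0} \alpha_k D\bigl( \pi(k) \,\|\, P(\beta) \bigr) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{aligned}
\]

as desired.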

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints, and let $\pi^* \in S^{|\mathcal{K}|}_{\infty}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive, and the result will then follow from the infinite derivative of the objective function near the boundary.

Let $\pi \in S^{|\mathcal{K}|}_{\infty}$ be any feasible solution meeting the given constraints, and suppose that $\pi_{mn} = 0$ for some particular $(m, n)$ with $m + n \in \mathcal{I}(\omega)$. Let $\tilde\pi$ be an arbitrary set of probability distributions in $S^{|\mathcal{K}|}_{\infty}$, subject to the restrictions that $\tilde\pi_{mn} > 0$, that $\tilde\pi_{kj} = 0$ for all $k + j \notin \mathcal{I}(\omega)$, and that $\tilde\pi$ has finite support. If we define

\[
\tilde\omega_i \doteq \sum_{k=0}^{i} \alpha_k \tilde\pi_{k,i-k},
\qquad
\tilde\beta \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} j\, \tilde\pi_{kj},
\]

then $\tilde\pi$ is a feasible solution for the constraints $(\alpha, \tilde\omega, \tilde\beta)$. Next, for any $\eta > 0$ we may define

\[
\hat\beta \doteq (\beta - \eta\tilde\beta)/(1 - \eta),
\qquad
\hat\omega \doteq (\omega - \eta\tilde\omega)/(1 - \eta).
\]

By construction, the constraints $(\alpha, \hat\omega, \hat\beta)$ are feasible for sufficiently small $\eta$. In the exponential case, this is immediately true because $(\alpha, \omega, \beta)$ satisfies all constraints in (2.3) and (2.4) with strict inequality. In the polynomial case, $(\alpha, \omega, \beta)$ satisfies the $I$th monotonicity constraint of (2.3) and the conservation constraint (2.4) with equality. However, it may easily be verified that these two constraints also hold with equality for $(\alpha, \tilde\omega, \tilde\beta)$, and hence for $(\alpha, \hat\omega, \hat\beta)$.

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 45

We may now let π be a feasible point for (α ω β) and finally formπ = ηπ+ (1minus η)π By construction π is a feasible solution correspondingto the original constraints (αωβ) and in addition πmn gt 0 As discussedin the proof of Lemma A6 we may take π to have finite support so that πalso has finite support

We will show that π is not a minimizer by proving that a sufficientlysmall perturbation toward π reduces the objective function Because theconstraints are linear the points πε = επ+(1minus ε)π are feasible solutions tothe original constraints for all 0 le ε le 1 Let f(ε) denote the value of theobjective function evaluated at πε The derivative of this function is

f prime(ε) =sum

kisinK

αk

infinsum

j=0

(

logπεkjPj(β)

+ 1

)

(πkj minus πkj)

=sum

kisinK

αk

infinsum

j=0

logπεkjPj(β)

(πkj minus πkj)

= αmπmn logπεmn

Pn(β)+sum

kisinK

αk

sum

(kj)6=(mn)

πkj logπεkjPj(β)

minussum

kisinK

αk

sum

j

πkj logπεkjPj(β)

In the limit as εrarr 0 the third term in the last display tends to

minussum

k

αkD(π(k)P(β)) le 0

The second term is bounded above by the expression minussum

k αksum

j πkj logPj(β)which is finite because π has finite support The first term in the displaytends to minusinfin which establishes that f(δ)lt f(0) for some sufficiently smallδ gt 0 We have shown that for any m+n isin I(ω) π cannot minimize (A20)if πmn = 0 As πmn is arbitrary withinm+n isin I(ω) it follows that πlowastmn gt 0whenever m+ n isin I(ω)
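The boundary argument in this proof can be illustrated numerically. The following is a minimal sketch, not part of the original argument: the three-point reference distribution `p` stands in for the Poisson weights, and `pi_bar`, `pi_check` are hypothetical distributions chosen only so that `pi_bar` has a zero entry where `pi_check` is positive. It shows that mixing a small amount of `pi_check` into `pi_bar` lowers the relative entropy, with a difference quotient that becomes more negative as the mixing weight shrinks.

```python
import math

def relative_entropy(pi, p):
    """D(pi || p) with the convention 0 log 0 = 0."""
    return sum(x * math.log(x / q) for x, q in zip(pi, p) if x > 0)

def mix(eps, pi_bar, pi_check):
    """The perturbed point eps * pi_check + (1 - eps) * pi_bar."""
    return [(1 - eps) * a + eps * b for a, b in zip(pi_bar, pi_check)]

# Hypothetical illustrative data (assumptions, not taken from the paper).
p = [0.5, 0.3, 0.2]          # reference distribution
pi_bar = [0.6, 0.4, 0.0]     # candidate with a zero entry
pi_check = [0.3, 0.3, 0.4]   # competitor that is positive at that entry

f0 = relative_entropy(pi_bar, p)
quotients = []
for eps in (1e-2, 1e-3, 1e-4):
    f_eps = relative_entropy(mix(eps, pi_bar, pi_check), p)
    quotients.append((f_eps - f0) / eps)
    print(eps, f_eps - f0, quotients[-1])
```

The printed difference quotients are negative and decrease as `eps` shrinks, which is the finite-dimensional shadow of the term $\alpha_m \check\pi_{mn} \log(\pi^\varepsilon_{mn}/P_n(\beta)) \to -\infty$.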

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
 + \sum_{k \in \mathcal{K}} z_k \alpha_k \Big( 1 - \sum_{k+l \in \mathcal{I}} \pi_{kl} \Big)
 + \sum_{i \in \mathcal{I}} w_i \Big( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \Big)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta) e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[
\beta^{(M)} \doteq \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l \pi^*_{kl}, \qquad
\eta_k^{(M)} \doteq \sum_{l \le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\, k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\begin{align*}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\, k+l \in \mathcal{I}} l \pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\, k+l \in \mathcal{I}} \pi_{kl} &= \eta_k^{(M)}, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\, k \in \mathcal{K}} \alpha_k \pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{align*}

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\begin{align*}
L^{(M)} &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\, k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
 + y^{(M)} \Big( \beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\, k+l \in \mathcal{I}} l \pi_{kl} \Big) \\
&\quad + \sum_{k \in \mathcal{K}} z_k^{(M)} \alpha_k \Big( \eta_k^{(M)} - \sum_{l \le M,\, k+l \in \mathcal{I}} \pi_{kl} \Big)
 + \sum_{i \in \mathcal{I}} w_i^{(M)} \Big( \omega_i - \sum_{k \le i,\, k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \Big).
\end{align*}

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta) \exp\big( j y^{(M)} - 1 + z_k^{(M)} + w_{k+j}^{(M)} \big)
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$ note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)} e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w_{k+j}^{(M)} = 0$ if $k + j > I$, it follows that $z_k^{(M)}$ and $w_i^{(M)}$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^y$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
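The final substitution rests on the identity $P_j(\beta) e^{jy} = e^{(\rho - 1)\beta} P_j(\rho\beta)$ with $\rho = e^y$, which turns an exponentially tilted Poisson weight into a Poisson weight with mean $\rho\beta$. A small numerical sketch of this identity follows; the values of `beta` and `y` are arbitrary illustrative choices, and `poisson_pmf` is a helper introduced here, not a name from the paper.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability P_j(mean) = exp(-mean) * mean**j / j!."""
    return math.exp(-mean) * mean**j / math.factorial(j)

def max_tilt_error(beta, y, j_max=10):
    """Largest relative gap between P_j(beta) e^{jy} and e^{(rho-1)beta} P_j(rho beta)."""
    rho = math.exp(y)
    worst = 0.0
    for j in range(j_max + 1):
        lhs = poisson_pmf(j, beta) * math.exp(j * y)
        rhs = math.exp((rho - 1) * beta) * poisson_pmf(j, rho * beta)
        worst = max(worst, abs(lhs - rhs) / rhs)
    return worst

# Arbitrary illustrative parameters (assumptions).
tilt_error = max_tilt_error(beta=1.7, y=0.4)
print(tilt_error)  # of the order of floating-point rounding
```

This is why the constant $e^{(\rho-1)\beta}$ is absorbed into $C_k$ in the statement of the theorem.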

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*_{(k)}$ by $\beta_k \doteq \sum_j j \pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi_{(k)}$ meets $I - k$ terminal constraints, has mean $\beta_k$, and has a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega_{(k)} = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*_{(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega_{(k)}, \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega_{(k)}, \beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha, \omega, \beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j \pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega_{(k)}, \beta_k)$. Then the curves

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{align*}

are extremals which satisfy the constraints $(\alpha, \omega, \beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define $\bar\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k \bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0, \beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\bar\gamma_{k0}(0) = 1$ and $\bar\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

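In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve after rescaling is

\begin{align*}
\bar\psi_{k0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \big( \omega_{ki} - C_k P_i(\rho_k\beta_k) \big) \left( 1 - \frac{x\beta_k/\beta}{\beta_k} \right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \big( \omega_{ki} - C_k P_i(\rho\beta) \big) \left( 1 - \frac{x}{\beta} \right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{i} \right].
\end{align*}

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0 \bar\psi_{00}$ is

\begin{align*}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \left( 1 - \frac{x}{\beta} \right)^{i-k} \right] \tag{A.25} \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k} \bar\psi_{k0}(x).
\end{align*}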
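The passage from the first to the second line of (A.25) uses the Poisson identity $P_i(\rho\beta)\, i! / ((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$ followed by the index shift $j = i - k$. A brief numerical sketch of that identity is given below; the parameter values are arbitrary and `poisson_pmf` is a helper defined here for illustration only.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability P_j(mean)."""
    return math.exp(-mean) * mean**j / math.factorial(j)

def max_shift_error(rho, beta, k_max=4, i_max=9):
    """Largest relative gap in P_i(rho beta) i!/((i-k)! beta^k) = rho^k P_{i-k}(rho beta)."""
    worst = 0.0
    for k in range(k_max + 1):
        for i in range(k, i_max + 1):
            lhs = (poisson_pmf(i, rho * beta) * math.factorial(i)
                   / (math.factorial(i - k) * beta**k))
            rhs = rho**k * poisson_pmf(i - k, rho * beta)
            worst = max(worst, abs(lhs - rhs) / rhs)
    return worst

# Arbitrary illustrative parameters (assumptions).
shift_error = max_shift_error(rho=1.3, beta=2.0)
print(shift_error)  # of the order of floating-point rounding
```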

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\begin{align*}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k0}^{(i-k)}\!\left( \frac{x\beta_k}{\beta} \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{i-k}}{dx^{i-k}} \bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{align*}

Similarly, we obtain

\begin{align*}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta} \dot\psi_{k,i-k}\!\left( \frac{x\beta_k}{\beta} \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{i-k+1}}{dx^{i-k+1}} \bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{align*}

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

In the polynomial case, similar computations show that

\[
(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k} \bar\psi_{k0}(x) \tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x) \, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \tag{A.29}
\]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\, dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence the cost is

\begin{align*}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(\beta) \log|\psi_0^{(i)}(\beta)| \right] - \sum_{k=0}^{K} \alpha_k \log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(\beta) \log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}}.
\end{align*}

The fact that $\bar\psi_{k0}(0) = 1$ together with (A.25) implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \bar\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left( \frac{\beta_k}{\beta} \right)^{j} \psi_{k0}^{(j)}\!\left( \frac{x\beta_k}{\beta} \right) e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\bar\gamma_{k\cdot}(\beta) \,\|\, P(\beta)).
\]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11 and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $O$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $O(\alpha, \omega, \beta) \doteq \{\gamma \in O : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A.15.

(a) $O(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $O(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \tilde\gamma \in O(\alpha, \omega, \beta)$, we denote $\gamma^\varepsilon \doteq (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\, dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in O(\alpha, \omega, \beta)$.
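The convexity of $G$ comes from the joint convexity of $(\theta, \gamma) \to D(\theta \,\|\, \gamma)$. A minimal numerical sketch of midpoint convexity is given below, assuming finite probability vectors drawn at random; the helper names and the dimension `n=5` are illustrative choices, not notation from the paper.

```python
import math
import random

def relative_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i log(theta_i / gamma_i), with 0 log 0 = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0)

def random_distribution(rng, n):
    """A strictly positive random probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def midpoint_gap(rng, n=5):
    """Average of the two endpoint costs minus the cost of the midpoint pair."""
    th0, th1 = random_distribution(rng, n), random_distribution(rng, n)
    ga0, ga1 = random_distribution(rng, n), random_distribution(rng, n)
    th_mid = [(a + b) / 2 for a, b in zip(th0, th1)]
    ga_mid = [(a + b) / 2 for a, b in zip(ga0, ga1)]
    average = 0.5 * (relative_entropy(th0, ga0) + relative_entropy(th1, ga1))
    return average - relative_entropy(th_mid, ga_mid)

rng = random.Random(0)  # fixed seed so the run is reproducible
gaps = [midpoint_gap(rng) for _ in range(1000)]
print(min(gaps))  # nonnegative, up to rounding, under joint convexity
```

Integrating this pointwise inequality over $x \in [0, \beta]$ gives the convexity of $J$ along the segment $\gamma^\varepsilon$.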

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\, dx$. After defining

\[
\eta(x) \doteq \tilde\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta, \theta - \varepsilon\dot\eta)\, dx = \int_0^\beta g(x, \varepsilon)\, dx. \tag{A.30}
\]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\bar\gamma_{k0}(x)$ and $\bar\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) \ge \alpha_i \bar\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

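Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^\varepsilon}\!(x)\, \eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^\varepsilon}\!(x)\, \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\begin{align*}
\frac{\partial L}{\partial\psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial\theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{align*}

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \right| < B \qquad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.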

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\, dx
\]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[
\int_0^x \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma} dx' = \int_0^x \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\bigg|_{\gamma} dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\begin{align*}
G'_+[0] &= \int_0^\beta \left( \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \right) dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial\psi}(x')\, dx' - \frac{\partial L}{\partial\theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x)\, dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{align*}

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1 - \lambda)\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\, dx.
\]

It then follows that

\begin{align*}
J(\gamma) = \mathcal{J}(\alpha, \omega, \beta)
&\le \liminf_n \mathcal{J}\big(\tilde\gamma(x_1^{(n)}), \tilde\gamma(x_2^{(n)}), x_2^{(n)} - x_1^{(n)}\big) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\, dx \\
&= J(\tilde\gamma),
\end{align*}

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


by the Prohorov metric [5]. For two distributions $P$, $Q$ in $S_\infty$, this distance is simply

\[
\sum_{i : P_i > Q_i} (P_i - Q_i) = \sum_{i : Q_i > P_i} (Q_i - P_i),
\]

and the metric extends to $|\mathcal{K}|$-tuples by treating them as elements of a product space. For each $A < \infty$ define the set

\[
H_A = \Big\{ \pi \in S^{|\mathcal{K}|} : \sum_{k} \alpha_k D(\pi_{(k)} \,\|\, P(\beta)) \le A \Big\}.
\]

The level sets of the relative entropy are compact under the above topology ([11], Lemma 1.4.3), from which it follows that $H_A$ is also compact. For later reference we note that a finite sum of relative entropy functions also inherits from the relative entropy the properties of lower semicontinuity and of strict convexity with respect to its first argument. As the next step in proving the lemma, we wish to show for some finite $A$ that

\[
Q_A(\alpha, \omega, \beta) \doteq F(\alpha, \omega, \beta) \cap H_A
\]

is compact and nonempty. This will enable us to find the minimum as a limit of a sequence of distributions in $Q_A$.
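The two expressions given above for the distance between $P$ and $Q$ agree because both distributions sum to one, so the mass by which $P$ exceeds $Q$ must equal the mass by which $Q$ exceeds $P$. A small sketch with hypothetical four-point distributions (chosen with dyadic entries so the arithmetic is exact in floating point):

```python
def positive_part_sum(p, q):
    """Sum of p_i - q_i over the indices where p_i > q_i."""
    return sum(a - b for a, b in zip(p, q) if a > b)

# Hypothetical illustrative distributions (assumptions, not from the paper).
P = [0.5, 0.25, 0.25, 0.0]
Q = [0.25, 0.25, 0.125, 0.375]

left = positive_part_sum(P, Q)    # mass where P exceeds Q
right = positive_part_sum(Q, P)   # mass where Q exceeds P
half_l1 = 0.5 * sum(abs(a - b) for a, b in zip(P, Q))
print(left, right, half_l1)
```

All three quantities coincide, so either one-sided sum may be used to compute the distance.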

Since there are solutions with finite cost, it is automatic that $Q_A(\alpha, \omega, \beta)$ is nonempty for large enough $A$. Since the set $Q_A$ is bounded, it is enough to verify, for each convergent sequence of distributions $\pi^{(n)} \in Q_A$, that the limiting distribution $\pi$ lies in $F(\alpha, \omega, \beta)$. (That it lies in $H_A$ is immediate, since this set is compact.) The only difficulty lies in showing that the mean of the $k$th subdistribution $\pi_{(k)}$ is equal to the limit of the means of the $\pi^{(n)}_{(k)}$. This will be the case if we can show, for any sequence in $Q_A$ and for an arbitrary $k$, that the sequence $\pi^{(n)}_{(k)}$ is uniformly integrable. In the present context, uniform integrability means that for each $\varepsilon > 0$ there is $m < \infty$ such that

\[
\sum_{j \ge m} j \pi^{(n)}_{kj} \le \varepsilon \tag{A.24}
\]

for all $n$ and $k$.

Since $e^y - 1 = \sup_{x > 0}[xy - x\log x + x - 1]$, we have the inequality

\[
xy \le (x \log x - x + 1) + (e^y - 1),
\]

where by convention $0 \log 0 = 0$. Observe that this inequality is valid for all $y$ and all $x \ge 0$, and that $x \log x - x + 1$ is nonnegative. We will also make use of the fact that the Poisson distribution has exponential moments: for any $\delta > 0$,

\[
\sum_{j=0}^{\infty} e^{j/\delta} P_j(\beta) = \sum_{j=0}^{\infty} e^{j/\delta} e^{-\beta} \frac{\beta^j}{j!} = e^{\beta e^{1/\delta} - \beta} < \infty.
\]
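Both facts used here are easy to check numerically. The sketch below evaluates the slack in the inequality $xy \le (x\log x - x + 1) + (e^y - 1)$ on a small grid, and compares a truncated series for the Poisson exponential moment with the closed form $e^{\beta e^{1/\delta} - \beta}$. The grid points, the values of `beta` and `delta`, and the truncation length are illustrative assumptions.

```python
import math

def slack(x, y):
    """(x log x - x + 1) + (e^y - 1) - x*y, with 0 log 0 = 0; should be >= 0."""
    xlogx = x * math.log(x) if x > 0 else 0.0
    return (xlogx - x + 1) + (math.exp(y) - 1) - x * y

grid_x = [0.0, 0.1, 0.5, 1.0, 2.0, 7.5]
grid_y = [-3.0, -0.5, 0.0, 0.7, 2.0]
min_slack = min(slack(x, y) for x in grid_x for y in grid_y)

def poisson_exp_moment(beta, delta, terms=200):
    """Truncated sum over j of e^{j/delta} P_j(beta), built term by term."""
    total, term = 0.0, math.exp(-beta)  # j = 0 term: P_0(beta)
    for j in range(terms):
        total += term
        term *= beta * math.exp(1.0 / delta) / (j + 1)
    return total

beta, delta = 1.5, 2.0
series = poisson_exp_moment(beta, delta)
closed_form = math.exp(beta * math.exp(1.0 / delta) - beta)
print(min_slack, series, closed_form)
```

The slack vanishes exactly when $x = e^y$ (for instance at $x = 1$, $y = 0$), which is where the supremum in the variational formula is attained.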

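Taking $y = j/\delta$ and $x \doteq \pi^{(n)}_{kj}/P_j(\beta)$, we have the estimate

\begin{align*}
\sum_{j \ge m} j \pi^{(n)}_{kj} &= \sum_{j \ge m} \delta \, \frac{j}{\delta} \, \frac{\pi^{(n)}_{kj}}{P_j(\beta)} P_j(\beta) \\
&\le \sum_{j \ge 0} \delta \left[ \pi^{(n)}_{kj} \log\left( \frac{\pi^{(n)}_{kj}}{P_j(\beta)} \right) - \pi^{(n)}_{kj} + P_j(\beta) \right] + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&= \delta D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta) \\
&\le \delta A + \delta \sum_{j \ge m} (e^{j/\delta} - 1) P_j(\beta),
\end{align*}

where the second equality uses the fact that both $\pi^{(n)}_{(k)}$ and $P(\beta)$ are probability distributions. Thus (A.24) follows by first picking $\delta > 0$ sufficiently small and then $m < \infty$ sufficiently large. [Note also, for use in Corollary A.7, that the analogous result holds if $\beta$ is replaced by a sequence $\beta^{(n)} \to \beta \in (0, \infty)$.]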

We now apply the uniform integrability to analyze the limits of the means. It follows from [5], Theorem 5.4, that

\[
\beta_k^{(n)} \to \beta_k.
\]

Finally,

\[
\sum_{k} \alpha_k \beta_k = \lim_n \sum_{k} \alpha_k \beta_k^{(n)} = \lim_n \beta = \beta,
\]

so that $\pi \in F(\alpha, \omega, \beta)$ and $Q_A$ is compact.

We have shown that $Q_A$ is compact under the Prohorov metric and that it is nonempty. Now let

\[
G = \inf_{\pi \in Q_A} \sum_{k \in \mathcal{K}} \alpha_k D(\pi_{(k)} \,\|\, P(\beta)) \ge 0.
\]

Choose a sequence $\pi^{(n)} \in Q_A$ such that $\sum_{k=0}^{K} \alpha_k D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) \le G + 1/n$. Since $Q_A$ is compact, this has a convergent subsequence, and to simplify the notation we index this subsequence by $n$. Let $\pi^* \in Q_A$ be the limit point. Since relative entropy $D(\pi \,\|\, \eta)$ is lower semicontinuous in $\pi$ ([11], Lemma 1.4.3(b)), it follows that

\[
\sum_{k \in \mathcal{K}} \alpha_k D(\pi^*_{(k)} \,\|\, P(\beta)) \le \liminf_n \sum_{k \in \mathcal{K}} \alpha_k D(\pi^{(n)}_{(k)} \,\|\, P(\beta)) = G,
\]

where in fact we have equality, by definition of $G$. The uniqueness of this minimizing vector of distributions follows from the strict convexity of the objective function.

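Before continuing with the question of the existence of the extremals, we pause to obtain a corollary which will be useful in establishing the strong minimum in Theorem A.14. Define $\mathcal{J}(\alpha, \omega, \beta)$ to be the minimum cost in the problem (A.20) if the constraints are feasible, and $\infty$ otherwise.

Corollary A.7. Suppose that $(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ is a sequence of feasible optimization problems with costs $\mathcal{J}^{(n)} = \mathcal{J}(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$, such that

\[
(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)}) \to (\alpha, \omega, \beta)
\]

componentwise, and such that $(\alpha, \omega, \beta)$ is feasible with $0 < \beta < \infty$. Then

\[
\liminf_n \mathcal{J}^{(n)} \ge \mathcal{J}(\alpha, \omega, \beta).
\]

Proof. For any $A > \liminf_n \mathcal{J}^{(n)} = G$, we may choose a subsequence of problems such that

\[
\mathcal{J}^{(n)} \le G + 1/n < A,
\]

and for each problem in this subsequence we define $\pi^{(n)} \in Q_A(\alpha^{(n)}, \omega^{(n)}, \beta^{(n)})$ to be the minimizing solution $\pi^*$ given in the above lemma. By compactness, there is a further subsequence such that $\pi^{(n)} \to \pi \in H_A$. As in the proof of Lemma A.6, if $\beta < \infty$ then

\[
\beta_k^{(n)} \doteq \sum_{j=0}^{\infty} j \pi^{(n)}_{kj} \to \beta_k \doteq \sum_{j=0}^{\infty} j \pi_{kj}, \qquad k = 0, 1, \ldots, K.
\]

If $\alpha_k^{(n)} \to 0$ and $\beta_k^{(n)} \alpha_k^{(n)} \not\to 0$, then the term $\alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)}))$ would go to infinity; to see this, note that the infimum of $D(\pi \,\|\, P(\beta))$ over distributions $\pi$ with mean $\lambda$ is $\beta - \lambda + \lambda \log(\lambda/\beta)$. Since our sequence has bounded cost, we must have $\alpha_k^{(n)} \beta_k^{(n)} \to \alpha_k \beta_k$, even if $\alpha_k = 0$.

Thus

\[
\sum_{k \in \mathcal{K}} \beta_k \alpha_k = \lim_n \sum_{k} \beta_k^{(n)} \alpha_k^{(n)} = \lim_n \beta^{(n)} = \beta,
\]

and $\pi$ satisfies the constraints $(\alpha, \omega, \beta)$. By joint lower semicontinuity of the relative entropy in both arguments ([11], Lemma 1.4.3(b)), we have

\begin{align*}
\liminf_n \sum_{k=0}^{K} \alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)}))
&\ge \liminf_n \sum_{k : \alpha_k > 0} \alpha_k^{(n)} D(\pi^{(n)}_{(k)} \,\|\, P(\beta^{(n)})) \\
&\ge \sum_{k : \alpha_k > 0} \alpha_k D(\pi_{(k)} \,\|\, P(\beta)) \\
&\ge \mathcal{J}(\alpha, \omega, \beta),
\end{align*}

as desired.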
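The infimum quoted in this proof, $\beta - \lambda + \lambda\log(\lambda/\beta)$, is attained by the Poisson distribution with mean $\lambda$. The sketch below checks the closed form against a truncated sum and compares it with one other mean-$\lambda$ distribution; the parameter values, the truncation at 100 terms, and the two-point competitor are illustrative assumptions.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability P_j(mean)."""
    return math.exp(-mean) * mean**j / math.factorial(j)

def relative_entropy_to_poisson(pi, beta):
    """D(pi || P(beta)) for a finitely supported pi given as a list of weights."""
    return sum(p * math.log(p / poisson_pmf(j, beta))
               for j, p in enumerate(pi) if p > 0)

lam, beta = 2.5, 1.0
closed_form = beta - lam + lam * math.log(lam / beta)

# Poisson(lam) truncated at 100 terms; the neglected tail mass is negligible here.
poisson_lam = [poisson_pmf(j, lam) for j in range(100)]
d_poisson = relative_entropy_to_poisson(poisson_lam, beta)

# A competing distribution with the same mean 2.5: equal mass on 2 and 3.
two_point = [0.0, 0.0, 0.5, 0.5]
d_two_point = relative_entropy_to_poisson(two_point, beta)

print(closed_form, d_poisson, d_two_point)
```

Since the closed form grows like $\lambda\log\lambda$, a subdistribution whose mean $\beta_k^{(n)}$ blows up faster than $1/\alpha_k^{(n)}$ would force the weighted cost to diverge, which is the step used in the proof.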

We now turn to the existence of the Lagrange multipliers corresponding to the minimization problem (A.20). For a given vector $\omega$, it will be helpful to define the set of integers $\mathcal{I}(\omega)$, where $i \in \mathcal{I}(\omega)$ if $0 \le i \le I$ and $\omega_i > 0$, or if $i > I$ and $\omega_{I+} > 0$. This is the set of terminal urn occupancy levels which the constraints do not force to be empty. As we will show, for irreducible constraints the optimal $\pi_{kj}$ with $k + j \in \mathcal{I}(\omega)$ are always strictly positive. This result does not hold directly for reducible constraints but, as we discussed earlier, any problem with reducible constraints can be replaced by a finite number of subproblems with irreducible constraints.

Lemma A.8. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints and let $\pi^* \in S_\infty^{|\mathcal{K}|}$ be the minimizer of (A.20) subject to constraints (A.21) and (A.22). Then $\pi^*_{kj} > 0$ for all $k + j \in \mathcal{I}(\omega)$, and $\pi^*_{kj} = 0$ if $k + j \notin \mathcal{I}(\omega)$.

Proof. For $k + j \notin \mathcal{I}(\omega)$, the constraints force the $\pi_{kj}$ to be zero for all feasible points of the problem, and hence for the minimizer in particular. The main point is to show that it is feasible for any other element to be positive; the result will then follow from the infinite derivative of the objective function near the boundary.

Let π isin S|K|infin be any feasible solution meeting the given constraints and

suppose that πmn = 0 for some particular (mn) with m+ n isin I(ω) Let

π be an arbitrary set of probability distributions in S|K|infin subject to the

restrictions that πmn gt 0 that πkj = 0 for all k+ j isin I(ω) and that π hasfinite support If we define

ωi=

isum

k=0

αkπkiminusk β=sum

kisinK

αk

infinsum

j=0

jπkj

then π is a feasible solution for the constraints (α ω β) Next for any η gt 0we may define

β= (β minus ηβ)(1minus η) ω

= (ω minus ηω)(1minus η)

By construction the constraints (α ω β) are feasible for sufficiently smallη In the exponential case this is immediately true because (αωβ) satisfiesall constraints in (23) and (24) with strict inequality In the polynomialcase the (αωβ) satisfies the Ith monotonicity constraint of (23) and theconservation constraint (24) with equality However it may easily be verifiedthat these two constraints also hold with equality for (α ω β) and hence

for (α ω β)

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 45

We may now let π be a feasible point for (α ω β) and finally formπ = ηπ+ (1minus η)π By construction π is a feasible solution correspondingto the original constraints (αωβ) and in addition πmn gt 0 As discussedin the proof of Lemma A6 we may take π to have finite support so that πalso has finite support

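We will show that $\pi$ is not a minimizer by proving that a sufficiently small perturbation toward $\bar\pi$ (the feasible point with $\bar\pi_{mn} > 0$ and finite support constructed above) reduces the objective function. Because the constraints are linear, the points $\pi^\varepsilon = \varepsilon\bar\pi + (1 - \varepsilon)\pi$ are feasible solutions to the original constraints for all $0 \le \varepsilon \le 1$. Let $f(\varepsilon)$ denote the value of the objective function evaluated at $\pi^\varepsilon$. The derivative of this function is

\[
\begin{aligned}
f'(\varepsilon) &= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \left( \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} + 1 \right) (\bar\pi_{kj} - \pi_{kj}) \\
&= \sum_{k \in \mathcal{K}} \alpha_k \sum_{j=0}^{\infty} \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} \, (\bar\pi_{kj} - \pi_{kj}) \\
&= \alpha_m \bar\pi_{mn} \log\frac{\pi^\varepsilon_{mn}}{P_n(\beta)} + \sum_{k \in \mathcal{K}} \alpha_k \sum_{(k,j) \ne (m,n)} \bar\pi_{kj} \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{j} \pi_{kj} \log\frac{\pi^\varepsilon_{kj}}{P_j(\beta)}.
\end{aligned}
\]

In the limit as $\varepsilon \to 0$, the third term in the last display tends to

\[
-\sum_{k} \alpha_k D(\pi(k) \,\|\, P(\beta)) \le 0.
\]

The second term is bounded above by the expression $-\sum_k \alpha_k \sum_j \bar\pi_{kj} \log P_j(\beta)$, which is finite because $\bar\pi$ has finite support. The first term in the display tends to $-\infty$, which establishes that $f(\delta) < f(0)$ for some sufficiently small $\delta > 0$. We have shown that, for any $m + n \in \mathcal{I}(\omega)$, $\pi$ cannot minimize (A.20) if $\pi_{mn} = 0$. As $\pi_{mn}$ is arbitrary within $m + n \in \mathcal{I}(\omega)$, it follows that $\pi^*_{mn} > 0$ whenever $m + n \in \mathcal{I}(\omega)$.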
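The boundary argument in the proof above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a single initial class ($\alpha_0 = 1$), a hypothetical feasible point `pi` with two zero entries, and a hypothetical strictly positive competitor `pi_bar` with the same mean, so that the whole segment between them keeps the mean fixed. The difference quotient $(f(\varepsilon) - f(0))/\varepsilon$ becomes more and more negative as $\varepsilon \to 0$, which is the behavior the proof exploits.

```python
import math

def poisson_pmf(j, beta):
    """Poisson(beta) probability of the value j."""
    return math.exp(-beta) * beta ** j / math.factorial(j)

def objective(pi, beta):
    """sum_j pi_j log(pi_j / P_j(beta)), with the convention 0 log 0 = 0."""
    return sum(p * math.log(p / poisson_pmf(j, beta))
               for j, p in enumerate(pi) if p > 0.0)

def mix(eps, pi, pi_bar):
    """Point eps * pi_bar + (1 - eps) * pi on the segment between the two."""
    return [eps * b + (1.0 - eps) * a for a, b in zip(pi, pi_bar)]

beta = 1.0
pi = [0.5, 0.0, 0.5, 0.0]       # hypothetical feasible point with zeros, mean 1
pi_bar = [0.4, 0.3, 0.2, 0.1]   # hypothetical positive competitor, mean 1

f0 = objective(pi, beta)
slopes = []
for eps in (1e-2, 1e-4, 1e-6):
    slope = (objective(mix(eps, pi, pi_bar), beta) - f0) / eps
    slopes.append(slope)
    print(f"eps={eps:.0e}  (f(eps) - f(0)) / eps = {slope:.3f}")
```

The quotient grows like a multiple of $\log \varepsilon$, driven by the entries where `pi` vanishes and `pi_bar` does not.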

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I}}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j + k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k + j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k \in \mathcal{K}} \alpha_k \sum_{k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
+ \sum_{k \in \mathcal{K}} z_k \alpha_k \left( 1 - \sum_{k+l \in \mathcal{I}} \pi_{kl} \right)
+ \sum_{i \in \mathcal{I}} w_i \left( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \right)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)\, e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.
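The product form $D_k P_j(\beta) W_{k+j}$ suggests a simple way to compute the minimizer numerically in the polynomial case. The following is a hedged sketch using iterative proportional fitting (alternating rescaling of the $D_k$ and $W_i$), which is a standard technique and not the method of the paper. The constraint values `alpha` and `omega` are hypothetical, and the sketch assumes that, with equality in the conservation constraint, the mean constraint follows from the normalization and terminal constraints (the code checks this numerically).

```python
import math

def poisson_pmf(j, beta):
    """Poisson(beta) probability of the value j."""
    return math.exp(-beta) * beta ** j / math.factorial(j)

alpha = [0.6, 0.4]        # hypothetical initial occupancy fractions, k = 0, 1
omega = [0.2, 0.3, 0.5]   # hypothetical terminal occupancy fractions, i = 0, 1, 2
I = len(omega) - 1
# Conservation with equality: balls thrown per urn = change in mean occupancy.
beta = (sum(i * w for i, w in enumerate(omega))
        - sum(k * a for k, a in enumerate(alpha)))

# Pairs (k, i) with i = k + j: an urn starting at level k ends at a level i >= k.
support = [(k, i) for k in range(len(alpha)) for i in range(k, I + 1)]
ref = {(k, i): poisson_pmf(i - k, beta) for (k, i) in support}

D = [1.0] * len(alpha)
W = [1.0] * (I + 1)
for _ in range(2000):
    # Row step: rescale D_k so that each distribution pi(k) sums to one.
    for k in range(len(alpha)):
        D[k] = 1.0 / sum(ref[k, i] * W[i] for i in range(k, I + 1))
    # Column step: rescale W_i so that the terminal constraints hold.
    for i in range(I + 1):
        W[i] = omega[i] / sum(alpha[k] * D[k] * ref[k, i]
                              for k in range(len(alpha)) if k <= i)

# pi[(k, j)] = D_k * P_j(beta) * W_{k+j}
pi = {(k, i - k): D[k] * ref[k, i] * W[i] for (k, i) in support}
for (k, j), p in sorted(pi.items()):
    print(f"pi[{k},{j}] = {p:.6f}")
```

Any fixed point of the two rescaling steps has the product form by construction and satisfies the linear constraints, so under the assumptions above it is a candidate for the minimizer described in Theorem A.9.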

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha, \omega, \beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k \in \mathcal{K}}$ and $\{W_i\}_{i \in \mathcal{I},\, i \le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k + j \le I,\ k + j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k + j > I, \\
0, & \text{otherwise.}
\end{cases}
\]

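Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$, define

\[
\beta^{(M)} = \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M} l \pi^*_{kl}, \qquad \eta^{(M)}_k = \sum_{l \le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\begin{aligned}
\sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l \pi_{kl} &= \beta^{(M)}, \\
\sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in \mathcal{K}, \\
\sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal{I}.
\end{aligned}
\]

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} ={}& \sum_{k \in \mathcal{K}} \alpha_k \sum_{j \le M,\ k+j \in \mathcal{I}} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
+ y^{(M)} \left( \beta^{(M)} - \sum_{k \in \mathcal{K}} \alpha_k \sum_{l \le M,\ k+l \in \mathcal{I}} l \pi_{kl} \right) \\
&+ \sum_{k \in \mathcal{K}} z^{(M)}_k \alpha_k \left( \eta^{(M)}_k - \sum_{l \le M,\ k+l \in \mathcal{I}} \pi_{kl} \right)
+ \sum_{i \in \mathcal{I}} w^{(M)}_i \left( \omega_i - \sum_{k \le i,\ k \in \mathcal{K}} \alpha_k \pi_{k,i-k} \right).
\end{aligned}
\]

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta) \exp\left( j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j} \right)
\]

for all $j < M$ and $k + j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k + j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\, e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k + j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k + j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho - 1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.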
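The final substitution rests on the elementary identity $P_j(\beta)\, e^{jy} = e^{(\rho - 1)\beta} P_j(\rho\beta)$ with $\rho = e^{y}$, which turns the multiplier form into a twisted Poisson form. As a quick sanity check, the sketch below compares the two sides for the tail region where $w_{k+j} = 0$; the values of `beta`, `y` and `z_k` are arbitrary and purely illustrative.

```python
import math

def poisson_pmf(j, mean):
    """Poisson(mean) probability of the value j."""
    return math.exp(-mean) * mean ** j / math.factorial(j)

beta, y, z_k = 1.3, 0.4, -0.2   # hypothetical values of beta and the multipliers
rho = math.exp(y)
C_k = math.exp((rho - 1.0) * beta - 1.0 + z_k)

pairs = []
for j in range(10):
    # Multiplier form P_j(beta) * exp(j*y - 1 + z_k), taking w_{k+j} = 0.
    multiplier_form = poisson_pmf(j, beta) * math.exp(j * y - 1.0 + z_k)
    # Twisted Poisson form C_k * P_j(rho * beta) from the theorem.
    twisted_form = C_k * poisson_pmf(j, rho * beta)
    pairs.append((multiplier_form, twisted_form))
    print(j, multiplier_form, twisted_form)
```

The two columns agree up to rounding, which is why the tail of each $\pi^*(k)$ is proportional to a Poisson distribution with mean $\rho\beta$.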

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j \pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi(k)$ meets $I - k$ terminal constraints, has mean $\beta_k$ and a Poisson $\rho_k\beta_k$ tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha, \omega, \beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j \pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \quad i = 0, \ldots, I, \qquad
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha, \omega, \beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define $\bar\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\bar\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k \bar\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0, \beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\bar\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

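In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\bar\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\bar\psi_{k0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k P_i(\rho_k\beta_k)) \left( 1 - \frac{x\beta_k/\beta}{\beta_k} \right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k P_i(\rho\beta)) \left( 1 - \frac{x}{\beta} \right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{i} \right].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0 \bar\psi_{00}$ is

\[
\begin{aligned}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\, \beta^k} \left( 1 - \frac{x}{\beta} \right)^{i-k} \right] \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \bar\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]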

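Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, and writing $\bar\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k0}^{(i-k)}\!\left( x\beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\!\left( x\beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \bar\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{aligned}
\]

so that, with $\theta_i = -\dot\psi_i$, the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.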

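In the polynomial case, similar computations show that

\[
(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\, \bar\psi_{k0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\tag{A.29}
\]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).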

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\, dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

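Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence, writing $\bar\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ and $\bar\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$, the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(\beta) \log\left|\psi_0^{(i)}(\beta)\right| \right] - \sum_{k=0}^{K} \alpha_k \log\left|\psi_0^{(k)}(0)\right| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{kj}(\beta) \log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}}.
\end{aligned}
\]

The fact that $\bar\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \bar\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left( \frac{\beta_k}{\beta} \right)^{j} \psi_{k0}^{(j)}\!\left( x\beta_k/\beta \right) e^{\beta} = \frac{\bar\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\bar\gamma_{k\cdot}(\beta) \,\|\, P(\beta)).
\]

By construction, the endpoints $\bar\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).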

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.
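The joint convexity of $(\theta, \gamma) \to D(\theta \,\|\, \gamma)$ that underlies Lemma A.15 can be spot-checked numerically. The sketch below is illustrative only: it draws random pairs of strictly positive probability vectors (a hypothetical dimension of 4 and a fixed seed are assumed) and records the smallest observed value of $\lambda D(\theta^1 \| \gamma^1) + (1 - \lambda) D(\theta^2 \| \gamma^2) - D(\theta^\lambda \| \gamma^\lambda)$, which joint convexity requires to be nonnegative.

```python
import math
import random

def rel_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i log(theta_i / gamma_i), positive vectors."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma))

def random_prob_vector(n, rng):
    """Random strictly positive probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def convex_mix(lam, a, b):
    """Componentwise combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

rng = random.Random(0)
worst_gap = float("inf")
for _ in range(1000):
    theta1, gamma1 = random_prob_vector(4, rng), random_prob_vector(4, rng)
    theta2, gamma2 = random_prob_vector(4, rng), random_prob_vector(4, rng)
    lam = rng.random()
    mixed = rel_entropy(convex_mix(lam, theta1, theta2),
                        convex_mix(lam, gamma1, gamma2))
    gap = (lam * rel_entropy(theta1, gamma1)
           + (1.0 - lam) * rel_entropy(theta2, gamma2) - mixed)
    worst_gap = min(worst_gap, gap)

print("smallest convexity gap observed:", worst_gap)
```

Integrating this pointwise inequality over $x \in [0, \beta]$ is what gives the convexity of $J$ along segments of occupancy paths.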

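For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^\varepsilon \doteq (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\, dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.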

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\, dx$. After defining

\[
\eta(x) \doteq \tilde\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\, dx = \int_0^\beta g(x, \varepsilon)\, dx.
\tag{A.30}
\]

We can assume without loss that $G(1) = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\bar\gamma_{k0}(x)$ and $\bar\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \bar\gamma_{k,i-k}(x) \ge \alpha_i \bar\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

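The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma^\varepsilon(x)} \eta_i(x) - \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \theta_i} \right|_{\gamma^\varepsilon(x)} \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial \psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \qquad
\frac{\partial L}{\partial \theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \right| < B \quad \text{for } \varepsilon \in [0, \varepsilon_0], \text{ a.e. } x \in [0, \beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_{I+} = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.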

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\, dx
\]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[
\int_0^x \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma} dx' = \int_0^x \left. \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right) \right|_{\gamma} dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta \left( \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right) dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x')\, dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x)\, dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\tilde\gamma$ which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\, dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha, \omega, \beta) \\
&\le \liminf_n \mathcal{J}\left( \tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)} \right) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D(\tilde\theta(x) \,\|\, \tilde\gamma(x))\, dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithms 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting



We now turn to the existence of the Lagrange multipliers correspondingto the minimization problem (A20) For a given vector ω it will be helpfulto define the set of integers I(ω) where i isin I(ω) if 0le ile I and ωi gt 0 orif i gt I and ωI+ gt 0 This is the set of terminal urn occupancy levels whichthe constraints do not force to be empty As we will show for irreducibleconstraints the optimal πkj with k + j isin I(ω) are always strictly posi-tive This result does not hold directly for reducible constraints but as wediscussed earlier any problem with reducible constraints can be replaced bya finite number of subproblems with irreducible constraints

Lemma A8 Suppose that (αωβ) are irreducible feasible constraints

and let πlowast isin S|K|infin be the minimizer of (A20) subject to constraints (A21) and (A22)

Then πlowastkj gt 0 for all k+ j isin I(ω) and πlowastkj = 0 if k+ j isin I(ω)

Proof For k+ j isin I(ω) the constraints force the πkj to be zero for allfeasible points of the problem and hence for the minimizer in particularThe main point is to show that it is feasible for any other element to bepositive and the result will then follow from the infinite derivative of theobjective function near the boundary

Let π isin S|K|infin be any feasible solution meeting the given constraints and

suppose that πmn = 0 for some particular (mn) with m+ n isin I(ω) Let

π be an arbitrary set of probability distributions in S|K|infin subject to the

restrictions that πmn gt 0 that πkj = 0 for all k+ j isin I(ω) and that π hasfinite support If we define

ωi=

isum

k=0

αkπkiminusk β=sum

kisinK

αk

infinsum

j=0

jπkj

then π is a feasible solution for the constraints (α ω β) Next for any η gt 0we may define

β= (β minus ηβ)(1minus η) ω

= (ω minus ηω)(1minus η)

By construction the constraints (α ω β) are feasible for sufficiently smallη In the exponential case this is immediately true because (αωβ) satisfiesall constraints in (23) and (24) with strict inequality In the polynomialcase the (αωβ) satisfies the Ith monotonicity constraint of (23) and theconservation constraint (24) with equality However it may easily be verifiedthat these two constraints also hold with equality for (α ω β) and hence

for (α ω β)

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 45

We may now let π be a feasible point for (α ω β) and finally formπ = ηπ+ (1minus η)π By construction π is a feasible solution correspondingto the original constraints (αωβ) and in addition πmn gt 0 As discussedin the proof of Lemma A6 we may take π to have finite support so that πalso has finite support

We will show that π is not a minimizer by proving that a sufficientlysmall perturbation toward π reduces the objective function Because theconstraints are linear the points πε = επ+(1minus ε)π are feasible solutions tothe original constraints for all 0 le ε le 1 Let f(ε) denote the value of theobjective function evaluated at πε The derivative of this function is

f prime(ε) =sum

kisinK

αk

infinsum

j=0

(

logπεkjPj(β)

+ 1

)

(πkj minus πkj)

=sum

kisinK

αk

infinsum

j=0

logπεkjPj(β)

(πkj minus πkj)

= αmπmn logπεmn

Pn(β)+sum

kisinK

αk

sum

(kj)6=(mn)

πkj logπεkjPj(β)

minussum

kisinK

αk

sum

j

πkj logπεkjPj(β)

In the limit as εrarr 0 the third term in the last display tends to

minussum

k

αkD(π(k)P(β)) le 0

The second term is bounded above by the expression minussum

k αksum

j πkj logPj(β)which is finite because π has finite support The first term in the displaytends to minusinfin which establishes that f(δ)lt f(0) for some sufficiently smallδ gt 0 We have shown that for any m+n isin I(ω) π cannot minimize (A20)if πmn = 0 As πmn is arbitrary withinm+n isin I(ω) it follows that πlowastmn gt 0whenever m+ n isin I(ω)

We now establish the existence of Lagrange multipliers and the form of the optimal solution for the polynomial case.

Theorem A.9. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding equality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\{D_k\}_{k\in\mathcal K}$ and $\{W_i\}_{i\in\mathcal I}$ such that the minimizer takes the form

$$
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal K,\ j+k \in \mathcal I(\omega), \\
0, & \text{otherwise.}
\end{cases}
$$

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal K$ and $k+j \in \mathcal I$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

$$
L(\pi) = \sum_{k\in\mathcal K} \alpha_k \sum_{k+j\in\mathcal I} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
+ \sum_{k\in\mathcal K} z_k \alpha_k \left( 1 - \sum_{k+l\in\mathcal I} \pi_{kl} \right)
+ \sum_{i\in\mathcal I} w_i \left( \omega_i - \sum_{k=0}^{i} \alpha_k \pi_{k,i-k} \right)
$$

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

$$
\pi^*_{kj} = P_j(\beta)\, e^{z_k - 1 + w_{k+j}}.
$$

The result follows on defining $D_k = e^{z_k - 1}$ and $W_i = e^{w_i}$.

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal K}$ and $\{W_i\}_{i\in\mathcal I,\, i\le I}$ such that the minimizer takes the form

$$
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal K,\ k+j \le I,\ k+j \in \mathcal I, \\
C_k P_j(\rho\beta), & k \in \mathcal K,\ k+j > I, \\
0, & \text{otherwise.}
\end{cases}
$$

Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

$$
\beta^{(M)} = \sum_{k\in\mathcal K} \alpha_k \sum_{l\le M} l\, \pi^*_{kl}, \qquad
\eta^{(M)}_k = \sum_{l\le M} \pi^*_{kl},
$$

and consider the problem of minimizing

$$
\sum_{k\in\mathcal K} \alpha_k \sum_{j\le M,\ k+j\in\mathcal I} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
$$

subject to the constraints

$$
\begin{aligned}
\sum_{k\in\mathcal K} \alpha_k \sum_{l\le M,\ k+l\in\mathcal I} l\, \pi_{kl} &= \beta^{(M)}, \\
\sum_{l\le M,\ k+l\in\mathcal I} \pi_{kl} &= \eta^{(M)}_k, \qquad k \in \mathcal K, \\
\sum_{k\le i,\ k\in\mathcal K} \alpha_k \pi_{k,i-k} &= \omega_i, \qquad i \in \mathcal I.
\end{aligned}
$$

By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer together with the linearity of the constraints guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

$$
\begin{aligned}
L^{(M)} ={}& \sum_{k\in\mathcal K} \alpha_k \sum_{j\le M,\ k+j\in\mathcal I} \pi_{kj} \log\frac{\pi_{kj}}{P_j(\beta)}
+ y^{(M)} \left( \beta^{(M)} - \sum_{k\in\mathcal K} \alpha_k \sum_{l\le M,\ k+l\in\mathcal I} l\, \pi_{kl} \right) \\
&+ \sum_{k\in\mathcal K} z^{(M)}_k \alpha_k \left( \eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal I} \pi_{kl} \right)
+ \sum_{i\in\mathcal I} w^{(M)}_i \left( \omega_i - \sum_{k\le i,\ k\in\mathcal K} \alpha_k \pi_{k,i-k} \right).
\end{aligned}
$$

Since the derivative of $L^{(M)}$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

$$
\pi^*_{kj} = P_j(\beta)\, e^{\,j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
$$

for all $j < M$ and $k+j \in \mathcal I$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$ note that

$$
\frac{\pi^*_{k,j+1}}{\pi^*_{kj}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\, e^{y^{(M)}},
$$

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j \in \mathcal I$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
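The substitution $C_k = e^{(\rho-1)\beta - 1 + z_k}$ rests on the elementary identity $P_j(\beta)\rho^j = e^{(\rho-1)\beta} P_j(\rho\beta)$, which turns the exponentially tilted Poisson($\beta$) weights into Poisson($\rho\beta$) weights. The following Python sketch checks this identity numerically. It assumes that $P_j(\lambda)$ denotes the Poisson probability $e^{-\lambda}\lambda^j/j!$; the function names are illustrative only and do not come from the paper.

```python
import math


def poisson_pmf(j: int, mean: float) -> float:
    """Poisson probability P_j(mean) = exp(-mean) * mean**j / j! (assumed notation)."""
    return math.exp(-mean) * mean**j / math.factorial(j)


def tilted_weight(j: int, beta: float, y: float) -> float:
    """Unnormalized tilted weight P_j(beta) * exp(j * y) from the stationarity condition."""
    return poisson_pmf(j, beta) * math.exp(j * y)


def tilt_identity_gap(beta: float, y: float, j_max: int = 30) -> float:
    """Largest relative gap between P_j(beta) e^{jy} and e^{(rho-1) beta} P_j(rho beta)."""
    rho = math.exp(y)
    scale = math.exp((rho - 1.0) * beta)
    worst = 0.0
    for j in range(j_max + 1):
        lhs = tilted_weight(j, beta, y)
        rhs = scale * poisson_pmf(j, rho * beta)
        worst = max(worst, abs(lhs - rhs) / rhs)
    return worst


if __name__ == "__main__":
    # Positive and negative tilts should both give a gap at rounding-error level.
    print(tilt_identity_gap(beta=2.0, y=0.3))
    print(tilt_identity_gap(beta=0.7, y=-1.1))
```

The factor $e^{(\rho-1)\beta}$ is independent of $j$, which is why it can be absorbed into the constant $C_k$.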

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^*(k)$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson($\rho_k\beta_k$) tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0}, \ldots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

$$
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \ldots, I, \\
\gamma_{I+}(x) &= 1 - \sum_{i=0}^{I} \gamma_i(x)
\end{aligned}
$$

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define the rescaled curves $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

$$
\psi_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\psi_{k,i-k}(x).
$$

To see that the $\gamma_i$ are valid occupancy curves, note that

$$
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
$$

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

$$
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,
$$

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

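In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

$$
\begin{aligned}
\tilde\psi_{k0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k P_i(\rho_k\beta_k)) \left( 1 - \frac{x\beta_k/\beta}{\beta_k} \right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{ki} - C_k P_i(\rho\beta)) \left( 1 - \frac{x}{\beta} \right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{i} \right].
\end{aligned}
$$

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0 \tilde\psi_{00}$ is

$$
\begin{aligned}
(-1)^k \psi^{(k)}_0(x)
&= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \left( 1 - \frac{x}{\beta} \right)^{i-k} \right] \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left( 1 - \frac{x}{\beta} \right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \tilde\psi_{k0}(x).
\end{aligned}
\tag{A.25}
$$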
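The second equality in (A.25) uses the index shift $i = k+j$ together with the identity $P_{k+j}(\rho\beta)\,(k+j)!/(j!\,\beta^k) = \rho^k P_j(\rho\beta)$. The Python sketch below checks that identity numerically, again assuming that $P_j(\lambda)$ is the Poisson probability $e^{-\lambda}\lambda^j/j!$; the helper names are hypothetical and serve only as an illustration.

```python
import math


def poisson_pmf(j: int, mean: float) -> float:
    """Poisson probability P_j(mean) = exp(-mean) * mean**j / j! (assumed notation)."""
    return math.exp(-mean) * mean**j / math.factorial(j)


def differentiated_coefficient(k: int, j: int, rho: float, beta: float) -> float:
    """Coefficient of (1 - x/beta)**j after k derivatives, written with i = k + j."""
    i = k + j
    return poisson_pmf(i, rho * beta) * math.factorial(i) / (
        math.factorial(i - k) * beta**k
    )


def shifted_coefficient(k: int, j: int, rho: float, beta: float) -> float:
    """The same coefficient after the index shift: rho**k * P_j(rho * beta)."""
    return rho**k * poisson_pmf(j, rho * beta)


if __name__ == "__main__":
    # Compare the two forms on a small grid of (k, j) for one choice of rho and beta.
    for k in range(4):
        for j in range(4):
            a = differentiated_coefficient(k, j, rho=1.7, beta=2.5)
            b = shifted_coefficient(k, j, rho=1.7, beta=2.5)
            print(k, j, math.isclose(a, b, rel_tol=1e-12))
```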

Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

$$
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi^{(i-k)}_{k0}\!\left( x\beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi^{(i)}_0(x).
\end{aligned}
$$

Similarly, we obtain

$$
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\!\left( x\beta_k/\beta \right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \tilde\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi^{(i+1)}_0(x),
\end{aligned}
$$

so that the extremals satisfy the simple relation

$$
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)},
\tag{A.26}
$$

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

$$
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho.
\tag{A.27}
$$

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

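In the polynomial case, similar computations show that

$$
(-1)^k \psi^{(k)}_0(x) = \frac{\alpha_0 D_0}{D_k}\, \tilde\psi_{k0}(x)
\tag{A.28}
$$

and that

$$
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi^{(i+1)}_0(x) = \gamma_i(x)\, \frac{-\psi^{(i+1)}_0(x)}{\psi^{(i)}_0(x)}
\tag{A.29}
$$

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).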

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0, \ldots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

$$
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\, dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
$$

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi^{(i+1)}_0(x)/\psi^{(i)}_0(x))$, which is finite at both endpoints because $(-1)^i \psi^{(i)}_0(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

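Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\{\gamma_i\}$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

$$
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(\beta) \log|\psi^{(i)}_0(\beta)| \right] - \sum_{k=0}^{K} \alpha_k \log|\psi^{(k)}_0(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta) \log\frac{\psi^{(k+j)}_0(\beta)}{\psi^{(k)}_0(0)\, e^{-\beta}},
\end{aligned}
$$

where $\tilde\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ and $\tilde\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ denote the rescaled subproblem curves. The fact that $\tilde\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi^{(k)}_0(x) = \psi^{(k)}_0(0) \times \tilde\psi_{k0}(x)$, so that

$$
\frac{\psi^{(k+j)}_0(x)}{\psi^{(k)}_0(0)\, e^{-\beta}} = \left( \frac{\beta_k}{\beta} \right)^{j} \psi^{(j)}_{k0}\!\left( x\beta_k/\beta \right) e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
$$

Then

$$
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k\cdot}(\beta) \,\|\, P(\beta)).
$$

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).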

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal O$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal O(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal O : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta \,\|\, \gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal O(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal O(\alpha,\omega,\beta)$ is a convex function.
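Lemma A.15 relies on the joint convexity of the map $(\theta,\gamma) \to D(\theta \,\|\, \gamma)$. The Python sketch below spot-checks that property on randomly drawn probability vectors. It assumes that $D$ is the usual relative entropy $\sum_i \theta_i \log(\theta_i/\gamma_i)$ on finite probability vectors; the function names are hypothetical, and a random check of this kind illustrates the inequality without proving it.

```python
import math
import random


def relative_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i), with 0 log 0 = 0."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t > 0.0:
            total += t * math.log(t / g)
    return total


def random_probability_vector(size, rng):
    """Draw a strictly positive probability vector of the given size."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    norm = sum(weights)
    return [w / norm for w in weights]


def convexity_gap(theta0, gamma0, theta1, gamma1, lam):
    """lam*D(pair 0) + (1-lam)*D(pair 1) - D(mixture); nonnegative under joint convexity."""
    theta_mix = [lam * a + (1.0 - lam) * b for a, b in zip(theta0, theta1)]
    gamma_mix = [lam * a + (1.0 - lam) * b for a, b in zip(gamma0, gamma1)]
    return (
        lam * relative_entropy(theta0, gamma0)
        + (1.0 - lam) * relative_entropy(theta1, gamma1)
        - relative_entropy(theta_mix, gamma_mix)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    gaps = []
    for _ in range(1000):
        vectors = [random_probability_vector(5, rng) for _ in range(4)]
        gaps.append(convexity_gap(*vectors, lam=rng.random()))
    # The smallest gap should be nonnegative up to floating-point rounding.
    print(min(gaps) >= -1e-12)
```

Since $J$ integrates $D(\theta(x) \,\|\, \gamma(x))$ over $x$, and both $\theta$ and $\gamma$ depend linearly on the path, pointwise joint convexity carries over to $J$.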

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal O(\alpha,\omega,\beta)$, we denote

$$
\gamma^\varepsilon \doteq (1-\varepsilon)\gamma + \varepsilon\bar\gamma
$$

and define the function $G : [0,1] \to \mathbb R_+ \cup \{\infty\}$ by

$$
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x) \,\|\, \gamma^\varepsilon(x))\, dx.
$$

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal O(\alpha,\omega,\beta)$.

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive irreducible feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

$$
J(\gamma) \le J(\bar\gamma).
$$

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\, dx$. After defining

$$
\eta(x) \doteq \bar\psi(x) - \psi(x),
$$

we may write

$$
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\, dx = \int_0^\beta g(x,\varepsilon)\, dx.
\tag{A.30}
$$

We can assume without loss that $G[1] = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of $0$. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

$$
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \tilde\gamma_{k,i-k}(x) \ge \alpha_i \tilde\gamma_{i0}(x),
$$

where $\tilde\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ denotes the rescaled subproblem curve. The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and $\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0,\beta]$ and $\varepsilon \in [0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero, where $\bar\theta$ may not exist.

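The partial derivative of the integrand of (A.30) is

$$
\frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma^\varepsilon(x)} \eta_i(x) - \sum_{i=0}^{I} \left. \frac{\partial L}{\partial \theta_i} \right|_{\gamma^\varepsilon(x)} \dot\eta_i(x).
$$

The partial derivatives of $L$ are (in mixed notation)

$$
\begin{aligned}
\frac{\partial L}{\partial \psi_i}(x) &= -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \\
\frac{\partial L}{\partial \theta_i}(x) &= \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\end{aligned}
$$

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

$$
\left| \frac{\partial}{\partial\varepsilon} g(x,\varepsilon) \right| < B \qquad \text{for } \varepsilon \in [0,\varepsilon_0],\ \text{a.e. } x \in [0,\beta].
$$

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.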

We are now free to differentiate under the integral sign, so that

$$
G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\, dx
$$

for all $\varepsilon \in [0,\varepsilon_0)$. We note that

$$
\int_0^x \left. \frac{\partial L}{\partial \psi_i} \right|_{\gamma} dx' = \int_0^x \left. \left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right) \right|_{\gamma} dx'
$$

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, and using $\eta(0) = \eta(\beta) = 0$, we obtain

$$
\begin{aligned}
G'_+[0] &= \int_0^\beta \left[ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right] dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x')\, dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\, \dot\eta_i(x)\, dx \\
&= \sum_{i=0}^{I} C_i (\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
$$

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1-\lambda)\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x^{(n)}_1)$ and terminal point $\bar\gamma(x^{(n)}_2)$. By Lemma A.16,

$$
J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x) \,\|\, \bar\gamma(x))\, dx.
$$

It then follows that

$$
\begin{aligned}
J(\gamma) &= \mathcal J(\alpha,\omega,\beta) \\
&\le \liminf_n \mathcal J\big(\bar\gamma(x^{(n)}_1),\ \bar\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1\big) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D(\bar\theta(x) \,\|\, \bar\gamma(x))\, dx \\
&= J(\bar\gamma),
\end{aligned}
$$

with the first two equalities following from Theorem A.13 and the definition of $\mathcal J(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

By construction the constraints (α ω β) are feasible for sufficiently smallη In the exponential case this is immediately true because (αωβ) satisfiesall constraints in (23) and (24) with strict inequality In the polynomialcase the (αωβ) satisfies the Ith monotonicity constraint of (23) and theconservation constraint (24) with equality However it may easily be verifiedthat these two constraints also hold with equality for (α ω β) and hence

for (α ω β)

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 45

We may now let π be a feasible point for (α ω β) and finally formπ = ηπ+ (1minus η)π By construction π is a feasible solution correspondingto the original constraints (αωβ) and in addition πmn gt 0 As discussedin the proof of Lemma A6 we may take π to have finite support so that πalso has finite support

We will show that π is not a minimizer by proving that a sufficientlysmall perturbation toward π reduces the objective function Because theconstraints are linear the points πε = επ+(1minus ε)π are feasible solutions tothe original constraints for all 0 le ε le 1 Let f(ε) denote the value of theobjective function evaluated at πε The derivative of this function is

f prime(ε) =sum

kisinK

αk

infinsum

j=0

(

logπεkjPj(β)

+ 1

)

(πkj minus πkj)

=sum

kisinK

αk

infinsum

j=0

logπεkjPj(β)

(πkj minus πkj)

= αmπmn logπεmn

Pn(β)+sum

kisinK

αk

sum

(kj)6=(mn)

πkj logπεkjPj(β)

minussum

kisinK

αk

sum

j

πkj logπεkjPj(β)

In the limit as εrarr 0 the third term in the last display tends to

minussum

k

αkD(π(k)P(β)) le 0

The second term is bounded above by the expression minussum

k αksum

j πkj logPj(β)which is finite because π has finite support The first term in the displaytends to minusinfin which establishes that f(δ)lt f(0) for some sufficiently smallδ gt 0 We have shown that for any m+n isin I(ω) π cannot minimize (A20)if πmn = 0 As πmn is arbitrary withinm+n isin I(ω) it follows that πlowastmn gt 0whenever m+ n isin I(ω)

We now establish the existence of Lagrange multipliers and the form ofthe optimal solution for the polynomial case

Theorem A9 Suppose that (αωβ) are irreducible feasible constraintsyielding equality in (24) and let πlowast be the corresponding unique minimizerfrom Lemma A6 Then there exist positive constants DkkisinK and WiiisinIsuch that the minimizer takes the form

46 P DUPUIS C NUZMAN AND P WHITING

πlowastkj =

DkPj(β)Wk+j k isinK j + k isin I(ω)0 otherwise

Proof The strict equality in the constraints requires every feasiblepoint to be supported on the finite set satisfying k isin K and k + j isin Iand hence the minimization problem (A20) may be considered to be afinite-dimensional problem over this set By Lemma A8 the minimizer πlowast isstrictly positive Since the objective function is continuously differentiablein a neighborhood of πlowast and the constraints are linear Lagrange multipli-ers are guaranteed to exist for this problem (see eg [4] Proposition 133)Specifically there are constants zk and wi such that the Lagrangian

L(π) =sum

kisinK

αk

sum

k+jisinI

πkj logπkjPj(β)

+sum

kisinK

zkαk

(

1minussum

k+lisinI

πkl

)

+sum

iisinI

wi

(

ωi minusisum

k=0

αkπkiminusk

)

is also minimized at πlowast with all partial derivatives of L being zero at theoptimal point Taking partial derivatives and rearranging the optimalitycondition yields

πlowastkj =Pj(β)ezkminus1+wk+j

The result follows on defining Dk = ezkminus1 and Wi = ewi

We now give the corresponding theorem for the exponential case

Theorem A10 Suppose that (αωβ) are irreducible feasible constraintsyielding strict inequality in (24) and let πlowast be the corresponding unique min-imizer from Lemma A6 Then there exist positive constants ρ CkkisinK andWiiisinIileI such that the minimizer takes the form

πlowastkj =

CkPj(ρβ)Wk+j k isinK k+ j le I k+ j isin ICkPj(ρβ) k isinK k+ j gt I0 otherwise

Proof To avoid difficulties with infinite-dimensional Lagrangians con-sider a sequence of truncated problems indexed by a sequence of integersM gt I For each such M define

β(M) =sum

kisinK

αk

sum

lleM

lπlowastkl η(M)k

=sum

lleM

πlowastkl

and consider the problem of minimizingsum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 47

subject to the constraintssum

kisinK

αk

sum

lleMk+lisinI

lπkl = β(M)

sum

lleMk+lisinI

πkl = η(M)k k isinK

sum

kleikisinK

αkπkiminusk = ωi i isin I

By construction the minimizer of this problem is obtained simply by trun-cating πlowast As in the previous lemma the strict positivity of the minimizertogether with the linearity of the constraints guarantees the existence ofLagrange multipliers via [4] Proposition 133 The Lagrangian for the M thproblem is

L(M) =sum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

+ y(M)

(

β(M) minussum

kisinK

αk

sum

lleMk+lisinI

lπkl

)

+Ksum

kisinK

z(M)k αk

(

η(M)k minus

sum

lleMk+lisinI

πkl

)

+sum

iisinI

w(M)i

(

ωi minussum

kleikisinK

αkπkiminusk

)

Since the derivative of L with respect to πkj must be zero at the minimizerit follows that

πlowastkj =Pj(β)ejy(M)minus1+z

(M)k

+w(M)k+j

for all j ltM and k + j isin I where for convenience we have defined wi = 0for i gt I For k+ j gt I note that

πlowastkj+1

πlowastkj=

Pj+1(β)

Pj(β)ey

(M)

so that y(M) is independent of M Since w(M)k+j = 0 if k+ j gt I it follows that

z(M)k and w

(M)i are also independent of M Then the above expression for

πlowastkj holds for all k+ j isin I with fixed Lagrange multipliers y zk and wiThe form expressed in the theorem is based on the substitutions ρ = ey Ck = e(ρminus1)βminus1+zk and Wi = ewi

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^*(k)$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^*(k)$ meets $I-k$ terminal constraints, has mean $\beta_k$ and has a Poisson($\rho_k\beta_k$) tail. Denoting the terminal constraints by $\omega_{kj} \doteq \pi^*_{kj}$ and $\omega(k) = (\omega_{k0}, \dots, \omega_{k,I-k})$, we see that $\pi^*(k)$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1, \omega(k), \beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1, \omega(k), \beta_k)$ mean that each subproblem is polynomial, with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{kj}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha, \omega, \beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1, \omega(k), \beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0, \dots, I,
\]

\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha, \omega, \beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience, define the rescaled curves $\hat\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$ and likewise define $\hat\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k \hat\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty} \sum_{k=0}^{i} \alpha_k \hat\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \hat\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0, \beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\hat\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k (\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

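In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\hat\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\hat\psi_{k0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho_k\beta_k)\bigr) \left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho\beta)\bigr) \left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k \left[ e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{i} \right].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I-k$. The $k$th derivative of $\psi_0 = \alpha_0 \hat\psi_{00}$ is

\[
\begin{aligned}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0 \left[ \rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta) \frac{i!}{(i-k)!\,\beta^k} \left(1 - \frac{x}{\beta}\right)^{i-k} \right] \\
&= \alpha_0 C_0 \rho^k \left[ e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta) \left(1 - \frac{x}{\beta}\right)^{j} \right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\, \hat\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]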

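Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, and writing $\hat\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ for the rescaled curve, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!} (-1)^{i-k} \psi_{k0}^{(i-k)}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k} \frac{d^{\,i-k}}{dx^{\,i-k}} \hat\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i} \psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\, \dot\psi_{k,i-k}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!} (-1)^{i-k+1} \frac{d^{\,i-k+1}}{dx^{\,i-k+1}} \hat\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.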
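The algebra behind (A.26) and (A.27) can be checked numerically in the simplest setting, a single subproblem with empty initial conditions ($\alpha_0 = 1$). The Python sketch below assumes the exponential form $\psi_0(x) = C\,[e^{-\rho x} + \sum_{i \le I} (W_i - 1) P_i(\rho\beta)(1 - x/\beta)^i]$ and the representation $\gamma_j(x) = \frac{x^j}{j!}(-1)^j \psi_0^{(j)}(x)$ used above. The parameter values `beta`, `rho` and `W` are hypothetical and are not obtained from the fixed point problem, so the sketch illustrates only the algebraic identities (total mass one, terminal values $C W_j P_j(\rho\beta)$, and the constant ratio $\rho$ beyond level $I$), not feasibility or nonnegativity.

```python
import math

def poisson_pmf(j, mean):
    """Poisson probability P_j(mean)."""
    return math.exp(-mean) * mean**j / math.factorial(j)

# Hypothetical parameters, for illustration only.
beta, rho = 1.5, 0.8
W = [1.3, 0.7, 1.1]            # W_0, ..., W_I with I = 2
I = len(W) - 1
# Choose C so that psi_0(0) = 1 (all urns empty at time 0).
C = 1.0 / (1.0 + sum((W[i] - 1.0) * poisson_pmf(i, rho * beta) for i in range(I + 1)))

def psi0_deriv(n, x):
    """n-th derivative of psi_0 at x, computed term by term in closed form."""
    total = (-rho) ** n * math.exp(-rho * x)
    for i in range(n, I + 1):
        falling = math.factorial(i) / math.factorial(i - n)
        total += ((W[i] - 1.0) * poisson_pmf(i, rho * beta) * falling
                  * (-1.0 / beta) ** n * (1.0 - x / beta) ** (i - n))
    return C * total

def gamma_curve(j, x):
    """Occupancy curve gamma_j(x) = x^j / j! * (-1)^j * psi_0^{(j)}(x)."""
    return x**j / math.factorial(j) * (-1) ** j * psi0_deriv(j, x)

def theta_curve(j, x):
    """Flow theta_j(x) = x^j / j! * (-1)^(j+1) * psi_0^{(j+1)}(x)."""
    return x**j / math.factorial(j) * (-1) ** (j + 1) * psi0_deriv(j + 1, x)

x = 0.9
total_mass = sum(gamma_curve(j, x) for j in range(60))   # Taylor expansion of psi_0 about x, evaluated at 0
ratio_above_I = theta_curve(I + 2, x) / gamma_curve(I + 2, x)
print(total_mass, ratio_above_I)
```

The total mass equals $\psi_0(0)$ because $\sum_j \frac{(-x)^j}{j!}\psi_0^{(j)}(x)$ is the Taylor expansion of $\psi_0$ about $x$ evaluated at $0$, and the ratio above level $I$ reduces to $\rho$ because only the exponential term survives differentiation beyond order $I$.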

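In the polynomial case, similar computations show that

\[
(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\, \hat\psi_{k0}(x),
\tag{A.28}
\]

where $\hat\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$, and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0} \frac{x^{i-k}}{(i-k)!} (-1)^{i+1} \psi_0^{(i+1)}(x) = \gamma_i(x)\, \frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\tag{A.29}
\]

for $i = 0, \dots, I$, establishing the restricted set of equations (A.10).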

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \dots, I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\, dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

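Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence, with $\hat\gamma_{kj}(x) = \gamma_{kj}(x\beta_k/\beta)$ and $\hat\psi_{k0}(x) = \psi_{k0}(x\beta_k/\beta)$ denoting the rescaled subproblem curves, the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty} \left[ \sum_{k=0}^{i} \alpha_k \hat\gamma_{k,i-k}(\beta) \log\bigl|\psi_0^{(i)}(\beta)\bigr| \right] - \sum_{k=0}^{K} \alpha_k \log\bigl|\psi_0^{(k)}(0)\bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \hat\gamma_{kj}(\beta) \log \frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}}.
\end{aligned}
\]

The fact that $\hat\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \hat\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j} \psi_{k0}^{(j)}\!\left(\frac{x\beta_k}{\beta}\right) e^{\beta} = \frac{\hat\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D\bigl(\hat\gamma_{k\cdot}(\beta) \,\|\, P(\beta)\bigr).
\]

By construction, the endpoints $\hat\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).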

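A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\bar\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\bar\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha, \omega, \beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \| \gamma)$ and is hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

For a given pair of occupancy functions $\bar\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote $\gamma^{\varepsilon} \doteq (1-\varepsilon)\bar\gamma + \varepsilon\tilde\gamma$ and define the function $G : [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D\bigl(\theta^{\varepsilon}(x) \,\|\, \gamma^{\varepsilon}(x)\bigr)\, dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\bar\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\bar\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.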
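The convexity of $G$ comes from the joint convexity of relative entropy in its two arguments, applied pointwise in $x$ before integrating. As a rough illustration, the Python sketch below checks midpoint convexity of $\varepsilon \mapsto D(\theta^{\varepsilon} \| \gamma^{\varepsilon})$ at a single fixed $x$. It assumes that $D$ is the usual relative entropy $\sum_i \theta_i \log(\theta_i/\gamma_i)$; the two pairs of probability vectors are hypothetical and do not come from any occupancy path.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i), with 0 log 0 = 0."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma) if t > 0.0)

def mix(a, b, eps):
    """Componentwise convex combination (1 - eps) * a + eps * b."""
    return [(1.0 - eps) * u + eps * v for u, v in zip(a, b)]

# Hypothetical values of (theta, gamma) at one fixed x for the two paths.
theta_bar, gamma_bar = [0.5, 0.3, 0.2], [0.6, 0.3, 0.1]
theta_til, gamma_til = [0.2, 0.2, 0.6], [0.3, 0.4, 0.3]

def g(eps):
    """Integrand of G at the fixed x, as a function of the interpolation parameter."""
    return relative_entropy(mix(theta_bar, theta_til, eps), mix(gamma_bar, gamma_til, eps))

grid = [k / 20 for k in range(21)]
# Largest violation of g((a + b)/2) <= (g(a) + g(b))/2 over the grid; should not exceed rounding error.
worst = max(g((a + b) / 2) - (g(a) + g(b)) / 2 for a in grid for b in grid)
print(worst)
```

Since the inequality holds for the integrand at every $x$, it is preserved by integration over $[0, \beta]$, which is the content of Lemma A.15(b) along the segment $\gamma^{\varepsilon}$.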

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case or $K = I$ in the polynomial case. Suppose that $\bar\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\bar\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\bar\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi, \theta)\, dx$. After defining

\[
\eta(x) \doteq \tilde\psi(x) - \bar\psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\bar\psi + \varepsilon\eta,\ \bar\theta - \varepsilon\dot\eta)\, dx = \int_0^{\beta} g(x, \varepsilon)\, dx.
\tag{A.30}
\]

We can assume without loss that $G[1] = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\bar\gamma_i(x)$ and derivatives $\bar\theta_i(x) = -\dot{\bar\psi}_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$, and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so the rescaled subproblem curves $\hat\gamma_{k0}(x) = \gamma_{k0}(x\beta_k/\beta)$ and the corresponding $\hat\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\bar\gamma_i$ and $\bar\theta_i$, since for example

\[
\bar\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \hat\gamma_{k,i-k}(x) \ge \alpha_i \hat\gamma_{i0}(x).
\]

The catch-all term $\bar\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\bar\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\bar\theta_{I+}(x) = \rho\,\bar\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\bar\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1-\varepsilon)\bar\gamma$ and $\theta^{\varepsilon} \ge (1-\varepsilon)\bar\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

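The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \psi_i}\right|_{\gamma^{\varepsilon}(x)} \eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial \theta_i}\right|_{\gamma^{\varepsilon}(x)} \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial \psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\]

\[
\frac{\partial L}{\partial \theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \right| < B \qquad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.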

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\, dx
\]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[
\int_0^{x} \left.\frac{\partial L}{\partial \psi_i}\right|_{\bar\gamma} dx' = \int_0^{x} \left.\left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\right|_{\bar\gamma} dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^{\beta} \left( \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right) dx \\
&= \int_0^{\beta} \dot\eta(x) \cdot \left( -\int_0^{x} \frac{\partial L}{\partial \psi}(x')\, dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\, \dot\eta_i(x)\, dx \\
&= \sum_{i=0}^{I} C_i \bigl( \eta_i(0) - \eta_i(\beta) \bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\bar\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\bar\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\bar\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\tilde\gamma + (1-\lambda)\bar\gamma$ which also has lower cost than $\bar\gamma$ and which avoids the boundary everywhere that $\bar\gamma$ does.

Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\, dx.
\]

It then follows that

\[
\begin{aligned}
J(\bar\gamma) &= \mathcal{J}(\alpha, \omega, \beta) \\
&\le \liminf_{n} \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\, dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhauser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting


44 P DUPUIS C NUZMAN AND P WHITING

as desired

We now turn to the existence of the Lagrange multipliers correspondingto the minimization problem (A20) For a given vector ω it will be helpfulto define the set of integers I(ω) where i isin I(ω) if 0le ile I and ωi gt 0 orif i gt I and ωI+ gt 0 This is the set of terminal urn occupancy levels whichthe constraints do not force to be empty As we will show for irreducibleconstraints the optimal πkj with k + j isin I(ω) are always strictly posi-tive This result does not hold directly for reducible constraints but as wediscussed earlier any problem with reducible constraints can be replaced bya finite number of subproblems with irreducible constraints

Lemma A8 Suppose that (αωβ) are irreducible feasible constraints

and let πlowast isin S|K|infin be the minimizer of (A20) subject to constraints (A21) and (A22)

Then πlowastkj gt 0 for all k+ j isin I(ω) and πlowastkj = 0 if k+ j isin I(ω)

Proof For k+ j isin I(ω) the constraints force the πkj to be zero for allfeasible points of the problem and hence for the minimizer in particularThe main point is to show that it is feasible for any other element to bepositive and the result will then follow from the infinite derivative of theobjective function near the boundary

Let π isin S|K|infin be any feasible solution meeting the given constraints and

suppose that πmn = 0 for some particular (mn) with m+ n isin I(ω) Let

π be an arbitrary set of probability distributions in S|K|infin subject to the

restrictions that πmn gt 0 that πkj = 0 for all k+ j isin I(ω) and that π hasfinite support If we define

ωi=

isum

k=0

αkπkiminusk β=sum

kisinK

αk

infinsum

j=0

jπkj

then π is a feasible solution for the constraints (α ω β) Next for any η gt 0we may define

β= (β minus ηβ)(1minus η) ω

= (ω minus ηω)(1minus η)

By construction the constraints (α ω β) are feasible for sufficiently smallη In the exponential case this is immediately true because (αωβ) satisfiesall constraints in (23) and (24) with strict inequality In the polynomialcase the (αωβ) satisfies the Ith monotonicity constraint of (23) and theconservation constraint (24) with equality However it may easily be verifiedthat these two constraints also hold with equality for (α ω β) and hence

for (α ω β)

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 45

We may now let π be a feasible point for (α ω β) and finally formπ = ηπ+ (1minus η)π By construction π is a feasible solution correspondingto the original constraints (αωβ) and in addition πmn gt 0 As discussedin the proof of Lemma A6 we may take π to have finite support so that πalso has finite support

We will show that π is not a minimizer by proving that a sufficientlysmall perturbation toward π reduces the objective function Because theconstraints are linear the points πε = επ+(1minus ε)π are feasible solutions tothe original constraints for all 0 le ε le 1 Let f(ε) denote the value of theobjective function evaluated at πε The derivative of this function is

f prime(ε) =sum

kisinK

αk

infinsum

j=0

(

logπεkjPj(β)

+ 1

)

(πkj minus πkj)

=sum

kisinK

αk

infinsum

j=0

logπεkjPj(β)

(πkj minus πkj)

= αmπmn logπεmn

Pn(β)+sum

kisinK

αk

sum

(kj)6=(mn)

πkj logπεkjPj(β)

minussum

kisinK

αk

sum

j

πkj logπεkjPj(β)

In the limit as εrarr 0 the third term in the last display tends to

minussum

k

αkD(π(k)P(β)) le 0

The second term is bounded above by the expression minussum

k αksum

j πkj logPj(β)which is finite because π has finite support The first term in the displaytends to minusinfin which establishes that f(δ)lt f(0) for some sufficiently smallδ gt 0 We have shown that for any m+n isin I(ω) π cannot minimize (A20)if πmn = 0 As πmn is arbitrary withinm+n isin I(ω) it follows that πlowastmn gt 0whenever m+ n isin I(ω)

We now establish the existence of Lagrange multipliers and the form ofthe optimal solution for the polynomial case

Theorem A9 Suppose that (αωβ) are irreducible feasible constraintsyielding equality in (24) and let πlowast be the corresponding unique minimizerfrom Lemma A6 Then there exist positive constants DkkisinK and WiiisinIsuch that the minimizer takes the form

46 P DUPUIS C NUZMAN AND P WHITING

πlowastkj =

DkPj(β)Wk+j k isinK j + k isin I(ω)0 otherwise

Proof The strict equality in the constraints requires every feasiblepoint to be supported on the finite set satisfying k isin K and k + j isin Iand hence the minimization problem (A20) may be considered to be afinite-dimensional problem over this set By Lemma A8 the minimizer πlowast isstrictly positive Since the objective function is continuously differentiablein a neighborhood of πlowast and the constraints are linear Lagrange multipli-ers are guaranteed to exist for this problem (see eg [4] Proposition 133)Specifically there are constants zk and wi such that the Lagrangian

L(π) =sum

kisinK

αk

sum

k+jisinI

πkj logπkjPj(β)

+sum

kisinK

zkαk

(

1minussum

k+lisinI

πkl

)

+sum

iisinI

wi

(

ωi minusisum

k=0

αkπkiminusk

)

is also minimized at πlowast with all partial derivatives of L being zero at theoptimal point Taking partial derivatives and rearranging the optimalitycondition yields

πlowastkj =Pj(β)ezkminus1+wk+j

The result follows on defining Dk = ezkminus1 and Wi = ewi

We now give the corresponding theorem for the exponential case

Theorem A10 Suppose that (αωβ) are irreducible feasible constraintsyielding strict inequality in (24) and let πlowast be the corresponding unique min-imizer from Lemma A6 Then there exist positive constants ρ CkkisinK andWiiisinIileI such that the minimizer takes the form

πlowastkj =

CkPj(ρβ)Wk+j k isinK k+ j le I k+ j isin ICkPj(ρβ) k isinK k+ j gt I0 otherwise

Proof To avoid difficulties with infinite-dimensional Lagrangians con-sider a sequence of truncated problems indexed by a sequence of integersM gt I For each such M define

β(M) =sum

kisinK

αk

sum

lleM

lπlowastkl η(M)k

=sum

lleM

πlowastkl

and consider the problem of minimizingsum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 47

subject to the constraintssum

kisinK

αk

sum

lleMk+lisinI

lπkl = β(M)

sum

lleMk+lisinI

πkl = η(M)k k isinK

sum

kleikisinK

αkπkiminusk = ωi i isin I

By construction the minimizer of this problem is obtained simply by trun-cating πlowast As in the previous lemma the strict positivity of the minimizertogether with the linearity of the constraints guarantees the existence ofLagrange multipliers via [4] Proposition 133 The Lagrangian for the M thproblem is

L(M) =sum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

+ y(M)

(

β(M) minussum

kisinK

αk

sum

lleMk+lisinI

lπkl

)

+Ksum

kisinK

z(M)k αk

(

η(M)k minus

sum

lleMk+lisinI

πkl

)

+sum

iisinI

w(M)i

(

ωi minussum

kleikisinK

αkπkiminusk

)

Since the derivative of L with respect to πkj must be zero at the minimizerit follows that

πlowastkj =Pj(β)ejy(M)minus1+z

(M)k

+w(M)k+j

for all j ltM and k + j isin I where for convenience we have defined wi = 0for i gt I For k+ j gt I note that

πlowastkj+1

πlowastkj=

Pj+1(β)

Pj(β)ey

(M)

so that y(M) is independent of M Since w(M)k+j = 0 if k+ j gt I it follows that

z(M)k and w

(M)i are also independent of M Then the above expression for

πlowastkj holds for all k+ j isin I with fixed Lagrange multipliers y zk and wiThe form expressed in the theorem is based on the substitutions ρ = ey Ck = e(ρminus1)βminus1+zk and Wi = ewi

Now that we have the general form of the minimizing endpoint values πlowastkjwe are ready to characterize the extremal curves We denote the mean ofthe kth distribution πlowast(k) by βk

=sum

j jπlowastkj In the exponential case we define

48 P DUPUIS C NUZMAN AND P WHITING

ρk = ρββk so that the distribution π(k) meets I minus k terminal constraintshas mean βk and a Poisson ρkβk tail Denoting the terminal constraints byωkj

= πlowastkj and ω(k) = (ωk0 ωkIminusk) we see that πlowast(k) is the minimizing

distribution arising in Theorem A1 for the terminal constraints (1 ω(k) βk)with associated twist parameters ρk and Ck In the polynomial case theterminal constraints (1 ω(k) βk) mean that each subproblem is polynomialwith Ck = 0 In either case Theorem A1 gives the form of the extremaloccupancy curves γkj(x) for each of these k subproblems We now showthat the subproblem curves sum to form the general extremals

Theorem A11 Suppose that πlowastkj are the minimizing distributionsfrom Lemma A6 for feasible constraints (αωβ) Denote the means ofthe minimizing arguments by βk =

sum

j jπlowastkj and let γkj(x) be the extremal

curves from Theorem A1 for the subproblems (1 ω(k) βk) Then the curves

γi(x) =isum

k=0

αkγkiminusk(xβkβ) i= 0 I

γI+(x) = 1minusIsum

i=0

γi(x)

are extremals which satisfy the constraints (αωβ)

Proof We assume without loss of generality that the constraints areirreducible Otherwise each irreducible subproblem may be treated sepa-rately

We extend the definition of γi(x) for i gt I by using the extended defi-nition of γkj(x) used in the proof of Theorem A1 [see (A13)] Also forconvenience define γkj(x)

= γkj(xβkβ) and likewise define ψkj The ψi

inherit from the γi the relation

ψi(x) =isum

k=0

αkψkiminusk(x)

To see that the γi are valid occupancy curves note that

infinsum

i=0

γi(x) =infinsum

i=0

isum

k=0

αkγkiminusk(x) =Ksum

k=0

αk

infinsum

j=0

γkj(x) = 1

Also the minusψi(x) are all nonnegative on [0 β] and

infinsum

i=0

minusψi(x) =Ksum

k=0

αk

infinsum

j=0

minusψkj(x) =Ksum

k=0

αk(βkβ) = 1

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 49

where the last equality follows from the fact that the πlowastkj satisfy the conser-vation constraint (A21) Hence curves γi(x) i le I and γI+(x) satisfy theconditions of Lemma 21

It is clear that the γi satisfy the desired initial conditions since γk0(0) = 1and γkj(0) = 0 for j gt 0 The terminal conditions are guaranteed by the factthat the values γkj(βk) = πlowastkj satisfy the constraints (A22)

In order to establish that the curves satisfy the EulerndashLagrange equationswe first show that the rescaled zero-occupancy curve ψk0 for the kth sub-problem is proportional to the kth derivative of the overall zero-occupancycurve ψ0

First take the exponential case and recall that ρkβk = ρβ The kth zero-occupancy curve after rescaling is

ψk0(x) =Ckeminusρk(xβkβ) +

Iminusksum

i=0

(ωki minusCkPi(ρkβk))

(

1minusxβkβ

βk

)i

=Ckeminusρx +

Iminusksum

i=0

(ωki minusCkPi(ρβ))

(

1minusx

β

)i

=Ck

[

eminusρx +Iminusksum

i=0

(Wk+i minus 1)Pi(ρβ)

(

1minusx

β

)i]

Here we have used that ωki = πlowastki =CkPi(β)Wk+i when ile I minus k The kth

derivative of ψ0 = α0ψ00 is

(minus1)kψ(k)0 (x)

= α0C0

[

ρkeminusρx +Isum

i=k

(Wi minus 1)Pi(ρβ)i

(iminus k)βk

(

1minusx

β

)iminusk]

(A25)

= α0C0ρk

[

eminusρx +Iminusksum

j=0

(Wk+j minus 1)Pj(ρβ)

(

1minusx

β

)j]

=α0C0ρ

k

Ckψk0(x)

Using Theorem A1 to express γkj in terms of ψk0 we have

γi(x) =isum

k=0

αk(xβkβ)

iminusk

(iminus k)(minus1)iminuskψ

(iminusk)k0

(

xβkβ

)

=isum

k=0

αkximinusk

(iminus k)(minus1)iminusk d

iminusk

dximinuskψk0(x)

50 P DUPUIS C NUZMAN AND P WHITING

=isum

k=0

αkCk

α0C0ρkximinusk

(iminus k)(minus1)iψ

(i)0 (x)

Similarly we obtain

minusψi(x) =minusisum

k=0

αkβkβψkiminusk

(

xβkβ

)

=isum

k=0

αkximinusk

(iminus k)(minus1)iminusk d

iminusk+1

dximinusk+1ψk0(x)

=isum

k=0

αkCk

α0C0ρkximinusk

(iminus k)(minus1)i+1ψ

(i+1)0 (x)

so that the extremals satisfy the simple relation

θi(x)

γi(x)=minus

ψ(i+1)0 (x)

ψ(i)0 (x)

(A26)

which also arose in the case of empty initial conditions Because the polyno-mial portion of ψ0 has degree I the ratio θi(x)γi(x) is given by the constantρ for i gt I which establishes (as in the proof of Lemma A3) that

θI+(x)

1minusψI(x)= ρ(A27)

As shown in the proof of Theorem A1 the relations (A26) and (A27) aresufficient to show that the γi satisfy the EulerndashLagrange equations

In the polynomial case similar computations show that

(minus1)kψ(k)0 (x) =

α0D0

Dkψk0(x)(A28)

and that

θi(x) =isum

k=0

αkDk

α0D0

ximinusk

(iminus k)(minus1)i+1ψ

(i+1)0 (x) = γi(x)

minusψ(i+1)0 (x)

ψ(i)0 (x)

(A29)

for i= 0 I establishing the restricted set of equations (A10)

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \dots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that
\[
\int_{x_1}^{x'} \frac{\partial L}{\partial \psi_i}(\psi(x), \theta(x))\, dx + \frac{\partial L}{\partial \theta_i}(\psi(x'), \theta(x')) = C_i
\]
for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i \psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

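Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is
\[
\begin{aligned}
J(\gamma)
&= \beta + \sum_{i=0}^{\infty} \Biggl[ \sum_{k=0}^{i} \alpha_k\, \bar\gamma_{k,i-k}(\beta)\, \log\bigl|\psi_0^{(i)}(\beta)\bigr| \Biggr] - \sum_{k=0}^{K} \alpha_k \log\bigl|\psi_0^{(k)}(0)\bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \bar\gamma_{k,j}(\beta)\, \log\Biggl| \frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\, e^{-\beta}} \Biggr|.
\end{aligned}
\]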

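The fact that $\bar\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\, \bar\psi_{k,0}(x)$, so that
\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\, e^{-\beta}}
= \Bigl( \frac{\beta_k}{\beta} \Bigr)^{j} \psi_{k,0}^{(j)}\Bigl( \frac{x\beta_k}{\beta} \Bigr) e^{\beta}
= \frac{\bar\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}.
\]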

Then
\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k\, D\bigl( \bar\gamma_{k,\cdot}(\beta) \,\big\|\, P(\beta) \bigr).
\]

By construction, the endpoints $\bar\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{k,j}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).
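To make the cost formula $J(\gamma) = \sum_k \alpha_k D(\bar\gamma_{k,\cdot}(\beta) \,\|\, P(\beta))$ concrete, here is a small sketch that evaluates a weighted sum of relative entropies against the Poisson($\beta$) law. The helper names, the truncation of the infinite support to `n_terms` entries, and the example distributions are assumptions made for illustration; the example rows are merely distributions whose weighted mean equals $\beta$, not the minimizers $\pi^*$ of the paper.

```python
import math

def poisson_pmf(mean, n_terms):
    """First n_terms Poisson probabilities P_j(mean), j = 0, ..., n_terms - 1."""
    return [math.exp(-mean) * mean**j / math.factorial(j) for j in range(n_terms)]

def relative_entropy(p, q):
    """D(p || q) = sum_j p_j log(p_j / q_j), with the convention 0 log 0 = 0."""
    total = 0.0
    for pj, qj in zip(p, q):
        if pj > 0.0:
            if qj == 0.0:
                return math.inf
            total += pj * math.log(pj / qj)
    return total

def terminal_cost(alpha, pi, beta):
    """Weighted cost sum_k alpha_k * D(pi_k || Poisson(beta)) on a truncated support."""
    reference = poisson_pmf(beta, len(pi[0]))
    return sum(a * relative_entropy(row, reference) for a, row in zip(alpha, pi))

beta, n_terms = 1.5, 40
alpha = [0.6, 0.4]

# Untwisted case: every row is already Poisson(beta), so the cost vanishes.
zero_cost = terminal_cost(alpha, [poisson_pmf(beta, n_terms)] * 2, beta)

# Rows with means 1.0 and 2.25; the weighted mean is 0.6*1.0 + 0.4*2.25 = 1.5 = beta.
twisted_cost = terminal_cost(
    alpha, [poisson_pmf(1.0, n_terms), poisson_pmf(2.25, n_terms)], beta
)
print(zero_cost, twisted_cost)
```

For two Poisson laws the relative entropy has the closed form $\lambda \log(\lambda/\mu) - \lambda + \mu$, which gives an independent check on the truncated sum.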

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let
\[
\mathcal{O}(\alpha, \omega, \beta) \doteq \{ \gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega \}
\]
be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta \| \gamma)$ and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.

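For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote
\[
\gamma^{\varepsilon} \doteq (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma
\]
and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by
\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D\bigl( \theta^{\varepsilon}(x) \,\big\|\, \gamma^{\varepsilon}(x) \bigr)\, dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.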
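The convexity of $G$ comes from the joint convexity of the map $(\theta, \gamma) \to D(\theta \| \gamma)$ applied pointwise under the integral. The sketch below illustrates this at a single value of $x$: it evaluates the relative entropy along a straight line between two pairs of probability vectors and inspects the second differences on a uniform grid. The vectors are made-up examples, and the check is a numerical illustration rather than a proof.

```python
import math

def relative_entropy(theta, gamma):
    """D(theta || gamma) for strictly positive probability vectors."""
    return sum(t * math.log(t / g) for t, g in zip(theta, gamma))

def mix(u, v, eps):
    """Convex combination (1 - eps) * u + eps * v, componentwise."""
    return [(1.0 - eps) * a + eps * b for a, b in zip(u, v)]

# Two hypothetical (theta, gamma) pairs; each vector sums to one.
theta_a, gamma_a = [0.5, 0.3, 0.2], [0.6, 0.3, 0.1]
theta_b, gamma_b = [0.2, 0.2, 0.6], [0.1, 0.5, 0.4]

def g(eps):
    """Integrand along the segment, the pointwise analogue of G[eps]."""
    return relative_entropy(mix(theta_a, theta_b, eps), mix(gamma_a, gamma_b, eps))

grid = [k / 20 for k in range(21)]
values = [g(eps) for eps in grid]

# For a convex function, second differences on a uniform grid are nonnegative.
second_differences = [
    values[k - 1] - 2.0 * values[k] + values[k + 1] for k in range(1, 20)
]
print(min(second_differences))
```

Integrating such pointwise convex integrands over $[0, \beta]$ preserves convexity, which is the property of $G$ used in the rest of the section.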

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then
\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.
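The final implication uses only convexity and can be written out in one line. The display below is the standard argument, stated here as a worked step rather than quoted from the proof; it writes $\tilde\gamma$ for the competing path, so that $G[0] = J(\gamma)$ and $G[1] = J(\tilde\gamma)$, and assumes $G[1] < \infty$.

```latex
% Convexity of G on [0,1] gives, for 0 < \varepsilon \le 1,
\[
G[\varepsilon] \le (1-\varepsilon)\,G[0] + \varepsilon\,G[1]
\quad\Longrightarrow\quad
\frac{G[\varepsilon] - G[0]}{\varepsilon} \le G[1] - G[0].
\]
% Letting \varepsilon \downarrow 0 and using G'_+[0] = 0:
\[
0 = G'_+[0] \le G[1] - G[0] = J(\tilde\gamma) - J(\gamma).
\]
```

In other words, a convex function whose right derivative at the left endpoint is nonnegative cannot decrease along the segment, which is exactly the inequality $J(\gamma) \le J(\tilde\gamma)$.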

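It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3 we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi, \theta)\, dx$. After defining
\[
\eta(x) \doteq \tilde\psi(x) - \psi(x),
\]
we may write
\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\, dx = \int_0^{\beta} g(x, \varepsilon)\, dx.
\tag{A.30}
\]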

We can assume without loss that $G[1] = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\bar\gamma_{k,0}(x)$ and $\bar\theta_{k,0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,
\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\, \bar\gamma_{k,i-k}(x) \ge \alpha_i\, \bar\gamma_{i,0}(x).
\]
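The endpoint values of $\theta_0$ quoted in this argument can be reproduced directly. The sketch assumes the Theorem A.1 form $\psi_0(x) = C e^{-\rho x} + \sum_{i=0}^{I} (\omega_i - C P_i(\rho\beta))(1 - x/\beta)^i$ and differentiates it term by term. The numerical values of `C`, `rho`, `beta` and `omega` are hypothetical and are not checked for feasibility.

```python
import math

def poisson(i, mean):
    """Poisson probability P_i(mean)."""
    return math.exp(-mean) * mean**i / math.factorial(i)

def theta0(x, C, rho, beta, omega):
    """theta_0(x) = -psi_0'(x) for the exponential-plus-polynomial psi_0."""
    value = C * rho * math.exp(-rho * x)
    for i in range(1, len(omega)):
        coeff = omega[i] - C * poisson(i, rho * beta)
        # d/dx of (1 - x/beta)**i is -(i/beta) * (1 - x/beta)**(i-1).
        value += coeff * (i / beta) * (1.0 - x / beta) ** (i - 1)
    return value

C, rho, beta = 0.4, 1.3, 2.0      # hypothetical twist parameters and time
omega = [0.2, 0.3, 0.25]          # hypothetical terminal values, I = 2

end_value = theta0(beta, C, rho, beta, omega)          # case I > 0
end_value_I0 = theta0(beta, C, rho, beta, omega[:1])   # case I = 0

print(end_value, omega[1] / beta)
print(end_value_I0, C * poisson(1, rho * beta) / beta)
```

At $x = \beta$ only the $i = 1$ polynomial term survives, and its Poisson part cancels the exponential term, leaving $\omega_1/\beta$; with $I = 0$ there is no polynomial term to cancel against, which gives $C P_1(\rho\beta)/\beta$.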

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon} \ge (1 - \varepsilon)\gamma$ and $\theta^{\varepsilon} \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\tilde\theta$ may not exist.

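The partial derivative of the integrand of (A.30) is
\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon)
= \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^{\varepsilon}}(x)\, \eta_i(x)
- \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^{\varepsilon}}(x)\, \dot\eta_i(x).
\]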

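The partial derivatives of $L$ are (in mixed notation)
\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\qquad
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]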
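These two formulas can be sanity-checked by finite differences. The sketch below assumes that $L(\psi, \theta)$ is the relative entropy $D(\theta \| \gamma)$ over the classes $0, \dots, I$ and the catch-all class $I+$, with $\gamma_i = \psi_i - \psi_{i-1}$, $\gamma_{I+} = 1 - \psi_I$ and $\theta_{I+} = 1 - \sum_{i \le I} \theta_i$, and that the index $i + 1 = I + 1$ refers to the catch-all class. That reading of the mixed notation, and the sample point, are assumptions made for illustration.

```python
import math

def classes(psi, theta):
    """Return (gamma, rates) including the catch-all class I+ as the last entry."""
    gamma = [psi[0]] + [psi[i] - psi[i - 1] for i in range(1, len(psi))]
    gamma.append(1.0 - psi[-1])
    rates = list(theta) + [1.0 - sum(theta)]
    return gamma, rates

def lagrangian(psi, theta):
    """L(psi, theta) = sum over all classes of rate * log(rate / gamma)."""
    gamma, rates = classes(psi, theta)
    return sum(r * math.log(r / g) for r, g in zip(rates, gamma))

def dL_dpsi(psi, theta, i):
    """Closed form: -theta_i/gamma_i + theta_{i+1}/gamma_{i+1}."""
    gamma, rates = classes(psi, theta)
    return -rates[i] / gamma[i] + rates[i + 1] / gamma[i + 1]

def dL_dtheta(psi, theta, i):
    """Closed form: log(theta_i/gamma_i) - log(theta_{I+}/gamma_{I+})."""
    gamma, rates = classes(psi, theta)
    return math.log(rates[i] / gamma[i]) - math.log(rates[-1] / gamma[-1])

def central_difference(f, vec, i, h=1e-6):
    """Numerical partial derivative of f with respect to vec[i]."""
    up, down = list(vec), list(vec)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2.0 * h)

psi = [0.3, 0.55, 0.8]    # hypothetical cumulative occupancies, I = 2
theta = [0.4, 0.3, 0.2]   # hypothetical rates; the catch-all rate is 0.1

errors = []
for i in range(len(psi)):
    num_psi = central_difference(lambda p: lagrangian(p, theta), psi, i)
    num_theta = central_difference(lambda t: lagrangian(psi, t), theta, i)
    errors.append(abs(num_psi - dL_dpsi(psi, theta, i)))
    errors.append(abs(num_theta - dL_dtheta(psi, theta, i)))
print(max(errors))
```

Both formulas involve only ratios $\theta/\gamma$ and their logarithms, which is why lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ together with the trivial upper bounds are enough to bound the derivative of $g$.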

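The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with upper bounds $\gamma^{\varepsilon} \le 1$, $\theta^{\varepsilon} \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that
\[
\biggl| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \biggr| < B \quad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.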

We are now free to differentiate under the integral sign, so that
\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\, dx
\]
for all $\varepsilon \in [0, \varepsilon_0)$. We note that
\[
\int_0^{x} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma} dx'
= \int_0^{x} \biggl( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \biggr)\bigg|_{\gamma} dx'
\]
is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0]
&= \int_0^{\beta} \biggl( \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \biggr) dx \\
&= \int_0^{\beta} \dot\eta(x) \cdot \biggl( -\int_0^{x} \frac{\partial L}{\partial\psi}(x')\, dx' - \frac{\partial L}{\partial\theta}(x) \biggr) dx \\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\, \dot\eta_i(x)\, dx \\
&= \sum_{i=0}^{I} C_i \bigl( \eta_i(0) - \eta_i(\beta) \bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

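Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,
\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl( \tilde\theta(x) \,\big\|\, \tilde\gamma(x) \bigr)\, dx.
\]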

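It then follows that
\[
\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha, \omega, \beta)
&\le \liminf_{n} \mathcal{J}\bigl( \tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)} \bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl( \tilde\theta(x) \,\big\|\, \tilde\gamma(x) \bigr)\, dx \\
&= J(\tilde\gamma),
\end{aligned}
\]
with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.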

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization: Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

Page 45: Large deviation asymptotics for occupancy problems · Introduction. Urn occupancy problems center on the distribution of ... large deviations, calculus of variations, sample paths,

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 45

We may now let π be a feasible point for (α ω β) and finally formπ = ηπ+ (1minus η)π By construction π is a feasible solution correspondingto the original constraints (αωβ) and in addition πmn gt 0 As discussedin the proof of Lemma A6 we may take π to have finite support so that πalso has finite support

We will show that π is not a minimizer by proving that a sufficientlysmall perturbation toward π reduces the objective function Because theconstraints are linear the points πε = επ+(1minus ε)π are feasible solutions tothe original constraints for all 0 le ε le 1 Let f(ε) denote the value of theobjective function evaluated at πε The derivative of this function is

f prime(ε) =sum

kisinK

αk

infinsum

j=0

(

logπεkjPj(β)

+ 1

)

(πkj minus πkj)

=sum

kisinK

αk

infinsum

j=0

logπεkjPj(β)

(πkj minus πkj)

= αmπmn logπεmn

Pn(β)+sum

kisinK

αk

sum

(kj)6=(mn)

πkj logπεkjPj(β)

minussum

kisinK

αk

sum

j

πkj logπεkjPj(β)

In the limit as εrarr 0 the third term in the last display tends to

minussum

k

αkD(π(k)P(β)) le 0

The second term is bounded above by the expression minussum

k αksum

j πkj logPj(β)which is finite because π has finite support The first term in the displaytends to minusinfin which establishes that f(δ)lt f(0) for some sufficiently smallδ gt 0 We have shown that for any m+n isin I(ω) π cannot minimize (A20)if πmn = 0 As πmn is arbitrary withinm+n isin I(ω) it follows that πlowastmn gt 0whenever m+ n isin I(ω)

We now establish the existence of Lagrange multipliers and the form ofthe optimal solution for the polynomial case

Theorem A9 Suppose that (αωβ) are irreducible feasible constraintsyielding equality in (24) and let πlowast be the corresponding unique minimizerfrom Lemma A6 Then there exist positive constants DkkisinK and WiiisinIsuch that the minimizer takes the form

46 P DUPUIS C NUZMAN AND P WHITING

πlowastkj =

DkPj(β)Wk+j k isinK j + k isin I(ω)0 otherwise

Proof The strict equality in the constraints requires every feasiblepoint to be supported on the finite set satisfying k isin K and k + j isin Iand hence the minimization problem (A20) may be considered to be afinite-dimensional problem over this set By Lemma A8 the minimizer πlowast isstrictly positive Since the objective function is continuously differentiablein a neighborhood of πlowast and the constraints are linear Lagrange multipli-ers are guaranteed to exist for this problem (see eg [4] Proposition 133)Specifically there are constants zk and wi such that the Lagrangian

L(π) =sum

kisinK

αk

sum

k+jisinI

πkj logπkjPj(β)

+sum

kisinK

zkαk

(

1minussum

k+lisinI

πkl

)

+sum

iisinI

wi

(

ωi minusisum

k=0

αkπkiminusk

)

is also minimized at πlowast with all partial derivatives of L being zero at theoptimal point Taking partial derivatives and rearranging the optimalitycondition yields

πlowastkj =Pj(β)ezkminus1+wk+j

The result follows on defining Dk = ezkminus1 and Wi = ewi

We now give the corresponding theorem for the exponential case

Theorem A10 Suppose that (αωβ) are irreducible feasible constraintsyielding strict inequality in (24) and let πlowast be the corresponding unique min-imizer from Lemma A6 Then there exist positive constants ρ CkkisinK andWiiisinIileI such that the minimizer takes the form

πlowastkj =

CkPj(ρβ)Wk+j k isinK k+ j le I k+ j isin ICkPj(ρβ) k isinK k+ j gt I0 otherwise

Proof To avoid difficulties with infinite-dimensional Lagrangians con-sider a sequence of truncated problems indexed by a sequence of integersM gt I For each such M define

β(M) =sum

kisinK

αk

sum

lleM

lπlowastkl η(M)k

=sum

lleM

πlowastkl

and consider the problem of minimizingsum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 47

subject to the constraintssum

kisinK

αk

sum

lleMk+lisinI

lπkl = β(M)

sum

lleMk+lisinI

πkl = η(M)k k isinK

sum

kleikisinK

αkπkiminusk = ωi i isin I

By construction the minimizer of this problem is obtained simply by trun-cating πlowast As in the previous lemma the strict positivity of the minimizertogether with the linearity of the constraints guarantees the existence ofLagrange multipliers via [4] Proposition 133 The Lagrangian for the M thproblem is

L(M) =sum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

+ y(M)

(

β(M) minussum

kisinK

αk

sum

lleMk+lisinI

lπkl

)

+Ksum

kisinK

z(M)k αk

(

η(M)k minus

sum

lleMk+lisinI

πkl

)

+sum

iisinI

w(M)i

(

ωi minussum

kleikisinK

αkπkiminusk

)

Since the derivative of L with respect to πkj must be zero at the minimizerit follows that

πlowastkj =Pj(β)ejy(M)minus1+z

(M)k

+w(M)k+j

for all j ltM and k + j isin I where for convenience we have defined wi = 0for i gt I For k+ j gt I note that

πlowastkj+1

πlowastkj=

Pj+1(β)

Pj(β)ey

(M)

so that y(M) is independent of M Since w(M)k+j = 0 if k+ j gt I it follows that

z(M)k and w

(M)i are also independent of M Then the above expression for

πlowastkj holds for all k+ j isin I with fixed Lagrange multipliers y zk and wiThe form expressed in the theorem is based on the substitutions ρ = ey Ck = e(ρminus1)βminus1+zk and Wi = ewi

Now that we have the general form of the minimizing endpoint values πlowastkjwe are ready to characterize the extremal curves We denote the mean ofthe kth distribution πlowast(k) by βk

=sum

j jπlowastkj In the exponential case we define

48 P DUPUIS C NUZMAN AND P WHITING

ρk = ρββk so that the distribution π(k) meets I minus k terminal constraintshas mean βk and a Poisson ρkβk tail Denoting the terminal constraints byωkj

= πlowastkj and ω(k) = (ωk0 ωkIminusk) we see that πlowast(k) is the minimizing

distribution arising in Theorem A1 for the terminal constraints (1 ω(k) βk)with associated twist parameters ρk and Ck In the polynomial case theterminal constraints (1 ω(k) βk) mean that each subproblem is polynomialwith Ck = 0 In either case Theorem A1 gives the form of the extremaloccupancy curves γkj(x) for each of these k subproblems We now showthat the subproblem curves sum to form the general extremals

Theorem A11 Suppose that πlowastkj are the minimizing distributionsfrom Lemma A6 for feasible constraints (αωβ) Denote the means ofthe minimizing arguments by βk =

sum

j jπlowastkj and let γkj(x) be the extremal

curves from Theorem A1 for the subproblems (1 ω(k) βk) Then the curves

γi(x) =isum

k=0

αkγkiminusk(xβkβ) i= 0 I

γI+(x) = 1minusIsum

i=0

γi(x)

are extremals which satisfy the constraints (αωβ)

Proof We assume without loss of generality that the constraints areirreducible Otherwise each irreducible subproblem may be treated sepa-rately

We extend the definition of γi(x) for i gt I by using the extended defi-nition of γkj(x) used in the proof of Theorem A1 [see (A13)] Also forconvenience define γkj(x)

= γkj(xβkβ) and likewise define ψkj The ψi

inherit from the γi the relation

ψi(x) =isum

k=0

αkψkiminusk(x)

To see that the γi are valid occupancy curves note that

infinsum

i=0

γi(x) =infinsum

i=0

isum

k=0

αkγkiminusk(x) =Ksum

k=0

αk

infinsum

j=0

γkj(x) = 1

Also the minusψi(x) are all nonnegative on [0 β] and

infinsum

i=0

minusψi(x) =Ksum

k=0

αk

infinsum

j=0

minusψkj(x) =Ksum

k=0

αk(βkβ) = 1

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 49

where the last equality follows from the fact that the πlowastkj satisfy the conser-vation constraint (A21) Hence curves γi(x) i le I and γI+(x) satisfy theconditions of Lemma 21

It is clear that the γi satisfy the desired initial conditions since γk0(0) = 1and γkj(0) = 0 for j gt 0 The terminal conditions are guaranteed by the factthat the values γkj(βk) = πlowastkj satisfy the constraints (A22)

In order to establish that the curves satisfy the EulerndashLagrange equationswe first show that the rescaled zero-occupancy curve ψk0 for the kth sub-problem is proportional to the kth derivative of the overall zero-occupancycurve ψ0

First take the exponential case and recall that ρkβk = ρβ The kth zero-occupancy curve after rescaling is

ψk0(x) =Ckeminusρk(xβkβ) +

Iminusksum

i=0

(ωki minusCkPi(ρkβk))

(

1minusxβkβ

βk

)i

=Ckeminusρx +

Iminusksum

i=0

(ωki minusCkPi(ρβ))

(

1minusx

β

)i

=Ck

[

eminusρx +Iminusksum

i=0

(Wk+i minus 1)Pi(ρβ)

(

1minusx

β

)i]

Here we have used that ωki = πlowastki =CkPi(β)Wk+i when ile I minus k The kth

derivative of ψ0 = α0ψ00 is

(minus1)kψ(k)0 (x)

= α0C0

[

ρkeminusρx +Isum

i=k

(Wi minus 1)Pi(ρβ)i

(iminus k)βk

(

1minusx

β

)iminusk]

(A25)

= α0C0ρk

[

eminusρx +Iminusksum

j=0

(Wk+j minus 1)Pj(ρβ)

(

1minusx

β

)j]

=α0C0ρ

k

Ckψk0(x)

Using Theorem A1 to express γkj in terms of ψk0 we have

γi(x) =isum

k=0

αk(xβkβ)

iminusk

(iminus k)(minus1)iminuskψ

(iminusk)k0

(

xβkβ

)

=isum

k=0

αkximinusk

(iminus k)(minus1)iminusk d

iminusk

dximinuskψk0(x)

50 P DUPUIS C NUZMAN AND P WHITING

=isum

k=0

αkCk

α0C0ρkximinusk

(iminus k)(minus1)iψ

(i)0 (x)

Similarly we obtain

minusψi(x) =minusisum

k=0

αkβkβψkiminusk

(

xβkβ

)

=isum

k=0

αkximinusk

(iminus k)(minus1)iminusk d

iminusk+1

dximinusk+1ψk0(x)

=isum

k=0

αkCk

α0C0ρkximinusk

(iminus k)(minus1)i+1ψ

(i+1)0 (x)

so that the extremals satisfy the simple relation

θi(x)

γi(x)=minus

ψ(i+1)0 (x)

ψ(i)0 (x)

(A26)

which also arose in the case of empty initial conditions Because the polyno-mial portion of ψ0 has degree I the ratio θi(x)γi(x) is given by the constantρ for i gt I which establishes (as in the proof of Lemma A3) that

θI+(x)

1minusψI(x)= ρ(A27)

As shown in the proof of Theorem A1 the relations (A26) and (A27) aresufficient to show that the γi satisfy the EulerndashLagrange equations

In the polynomial case similar computations show that

(minus1)kψ(k)0 (x) =

α0D0

Dkψk0(x)(A28)

and that

θi(x) =isum

k=0

αkDk

α0D0

ximinusk

(iminus k)(minus1)i+1ψ

(i+1)0 (x) = γi(x)

minusψ(i+1)0 (x)

ψ(i)0 (x)

(A29)

for i= 0 I establishing the restricted set of equations (A10)

In order to demonstrate a strong minimum we will need the following

Corollary A12 For irreducible constraints the occupancy functionsdefined in Theorem A11 satisfy the integrated version of the EulerndashLagrangeequations given x1 isin [0 β) there are constants Ci i= 0 I minus 1 (and alsoi= I in the exponential case) depending on x1 only such that

int xprime

x1

partL

partψi(ψ(x) θ(x))dx+

partL

partθi(ψ(xprime) θ(xprime)) =Ci

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 51

for all xprime isin (x1 β)

Proof We may take α0 gt 0 without loss of generality By Theorem A11the extremals satisfy the usual form of the EulerndashLagrange equations The

above indefinite integral is log(minusψ(i+1)0 (x)ψ

(i)0 (x)) which is finite at both

endpoints because (minus1)iψ(i)0 (x) gt 0 for x isin [0 β) i le I (and i = I + 1 in

the exponential case) as shown in Lemma A2 The partial derivative withrespect to θi also exists for the same reason

Theorem A13 Suppose that γ is the extremal defined by Theorem A11for the feasible constraints (αωβ) Then the cost J(γ) is the solution tothe minimization problem (A20) subject to constraints (A21) and (A22)

Proof Consider first the exponential case The infinite sequence offunctions γi defined in the proof of Theorem A11 are shown in the proof tosatisfy the conditions of Lemma A3 Hence the cost is

J(γ) = β +infinsum

i=0

[

isum

k=0

αkγkiminusk(β) log|ψ(i)0 (β)|

]

minusKsum

k=0

αk log|ψ(k)0 (0)|

=Ksum

k=0

αk

infinsum

j=0

γkj(β) log

ψ(k+j)0 (β)

ψ(k)0 (0)eminusβ

The fact that ψk0(0) = 1 together with (A25) implies that ψ(k)0 (x) = ψ

(k)0 (0)times

ψk0(x) so that

ψ(k+j)0 (x)

ψ(k)0 (0)eminusβ

=

(

βkβ

)j

ψ(j)k0

(

xβkβ

)

eβ =γkj(x)

(minusx)jeminusβj

Then

J(γ) =Ksum

k=0

αkD(γkmiddot(β)P(β))

By construction the endpoints γkj(β) coincide with the optimal argu-ments πlowastkj of the minimization problem

The proof of the polynomial case is almost identical except that we useLemma A4 and (A28)

A8 Extremal curves have globally minimal cost In this section weprove the following theorem

52 P DUPUIS C NUZMAN AND P WHITING

Theorem A14 (Strong minimum) Given feasible constraints (αωβ)let γ be the corresponding extremal occupancy path defined in Theorem A11and let γ be any other occupancy path satisfying the same constraints ThenJ(γ)le J(γ)

We first introduce some notation Let O denote the set of valid occupancyfunctions that is vector functions γ such that the cumulative occupancyfunctions ψ satisfy the conditions of Lemma 21 and let O(αωβ)

= γ isin

O γ(0) = αγ(β) = ω be the subset of valid occupancy functions satisfyingfeasible constraints (αωβ) The proof of the following lemma is a straight-forward consequence of the convexity of the map (θ γ)rarrD(θγ) and henceomitted

Lemma A15

(a) O(αωβ) is a convex set(b) J(γ) restricted to O(αωβ) is a convex function

For a given pair of occupancy functions γ γ isinO(αωβ) we denote γε= (1minus ε)γ + εγ

and define the function G [01]rarrR+ cup infin by

G[ε]= J(γε) =

int β

0D(θε(x)γε(x))dx

To simplify the notation we do not explicitly indicate the dependence of Gon γ and γ Lemma A15 ensures that G is a well-defined convex functionfor any γ γ isinO(αωβ)

For the remainder of this section we once again restrict attention to irre-ducible constraints without loss of generality The following lemma estab-lishes the minimality of the extremals in an important special case The proofuses the fact that the extremal curve and its derivative can be bounded awayfrom zero

Lemma A16 Suppose that (αωβ) are strictly positive irreduciblefeasible constraints where the upper indices K and I of α and ω satisfyK = I+1 in the exponential case or K = I in the polynomial case Supposethat γ is the extremal curve for these constraints defined in Theorem A11and that γ is any competing occupancy function satisfying the same con-straints Then

J(γ)le J(γ)

Proof We construct the family of paths γε = (1minusε)γ+εγ with G[ε] = J(γε)It follows from convexity that G is left and right differentiable wherever it

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 53

is finite We will show that Gprime+[0] = 0 where Gprime

+[ε] denotes the right deriva-tive of G The convexity of G then implies the desired result

It will be convenient to work with the cumulative occupancy functions ψεand as in Section A3 we will mix notation by writing for example J(γ) =int β0 L(ψ θ)dxAfter defining

η(x)= ψ(x)minusψ(x)

we may write

G[ε] =

int β

0L(ψ+ εη θ minus εη)dx

=

int β

0g(x ε)dx(A30)

We can assume without loss that G(1) = J(γ)ltinfin since there is nothingto prove otherwise We wish to show that differentiation under the integralsign with respect to ε is valid in a neighborhood of 0 The validity of thisoperation will follow from [23] Corollary 392 if we can provide a constantbound on the partial derivative of g with respect to ε for almost everyx isin [0 β]

To construct this bound we will first establish that the components γi(x)and derivatives θi(x) =minusψi(x) are uniformly bounded away from zero Forspecificity we first assume that (αωβ) are exponential constraints Notethat in the empty case studied in Section A4 γ0(x) decreases monoton-ically to γ0(β) = ω0 and θ0(x) decreases monotonically to θ0(β) Inspec-tion of the formula for ψ0 in Theorem A1 reveals that θ0(β) = ω1β ifI gt 0 and θ0(β) =CP1(ρβ)β otherwise Hence for any exponential prob-lem with empty initial conditions and positive terminal conditions ωi γ0and θ0 are uniformly bounded away from zero For the constraints (αωβ)under consideration each of the associated subproblems is an exponentialempty problem with positive terminal conditions and so γk0(x) and θk0(x)are uniformly bounded from zero for each subproblem These in turn lowerbound the overall γi and θi since for example

γi(x) =isum

k=0

αkγkiminusk(x)ge αiγi0(x)

The catch-all term γI+(x) is monotonically increasing and hence satisfiesγI+(x)ge αI+1 gt 0 The identity θI+(x) = ργI+(x) established in the proofof Theorem A11 ensures a similar bound on θI+(x)

Note that γε ge (1minus ε)γ and θε ge (1minus ε)θ so that for each 0le ε lt 1 thesefunctions are also uniformly bounded from zero on [0 β] Indeed given anarbitrary ε0 lt 1 there are positive lower bounds on γε and θε which holduniformly for x isin [0 β] and ε isin [0 ε0] To be precise these inequalities and

54 P DUPUIS C NUZMAN AND P WHITING

bounds hold everywhere except possibly on a set of measure zero whereθ may not exist

The partial derivative of the integrand of (A30) is

part

partεg(x ε) =

Isum

i=0

partL

partψi

γε(x)ηi(x)minus

Isum

i=0

partL

partθi

γε(x)ηi(x)

The partial derivatives of L are (in mixed notation)

partL

partψi(x) =minus

θi(x)

γi(x)+θi+1(x)

γi+1(x)

partL

partθi(x) = log

θi(x)

γi(x)minus log

θI+(x)

1minus ψI(x)

The uniform lower bounds on γε and θε together with upper bounds γε le 1θε le 1 and the boundedness of η and η combine to establish that there is afinite B such that

part

partεg(x ε)

ltB for ε isin [0 ε0] ae x isin [0 β]

Similar arguments may be used to establish this bound in the polynomialcase In that case the terms ψI = 1 and θI = 0 do not play a role and theexpression for the partial derivative of L with respect to θi simplifies

We are now free to differentiate under the integral sign so that

Gprime[ε] =

int β

0

part

partεg(x ε)dx

for all ε isin [0 ε0) We note thatint x

0

partL

partψi

γdxprime =

int x

0

minusθiγi

+θi+1

γi+1

γdxprime

is absolutely continuous as a function of x since it is the difference of twocontinuous monotone functions with bounded derivatives Applying integra-tion by parts for absolutely continuous functions ([23] 361 page 209) tothe first term we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta \left[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\right] dx \\
&= \int_0^\beta \dot\eta(x)\cdot\left(-\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\,(\eta_i(0) - \eta_i(\beta)) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.
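The second equality in the chain for the right derivative of $G$ at $0$ compresses an integration by parts. The following is a sketch of that step written componentwise, where $\eta$ is the difference of the two cumulative occupancy functions, $\dot\eta$ is its derivative, and the shorthand $F_i$ for the running integral is introduced here only for illustration (it does not appear in the original).

```latex
F_i(x) \doteq \int_0^x \frac{\partial L}{\partial\psi_i}(x')\,dx' ,
\qquad
\int_0^\beta \frac{\partial L}{\partial\psi_i}(x)\,\eta_i(x)\,dx
  = \bigl[F_i(x)\,\eta_i(x)\bigr]_0^\beta - \int_0^\beta F_i(x)\,\dot\eta_i(x)\,dx
  = -\int_0^\beta F_i(x)\,\dot\eta_i(x)\,dx ,
\\[4pt]
\text{since } \eta_i(0) = \eta_i(\beta) = 0 .
\text{ By Corollary A.12 with } x_1 = 0,\quad
F_i(x) + \frac{\partial L}{\partial\theta_i}(x) = C_i ,
\\[4pt]
\text{so that}\quad
G'_+[0] = -\sum_{i=0}^{I}\int_0^\beta
  \Bigl(F_i(x) + \frac{\partial L}{\partial\theta_i}(x)\Bigr)\dot\eta_i(x)\,dx
  = -\sum_{i=0}^{I} C_i \int_0^\beta \dot\eta_i(x)\,dx
  = \sum_{i=0}^{I} C_i\,\bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0 .
```

The boundary term vanishes because both paths satisfy the same initial and terminal constraints, which is the only place where the shared endpoints are used before the final equality.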

We now extend the lemma to show that the extremal is a global minimizer, even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t=0$ and $t=\beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t=0$ or $t=\beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1-\lambda)\bar\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

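Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x_1^{(n)})$ and terminal point $\bar\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta) \\
&\le \liminf_{n} \mathcal{J}\bigl(\bar\gamma(x_1^{(n)}),\,\bar\gamma(x_2^{(n)}),\,x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_{n} J(\gamma^{(n)}) \\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D(\bar\theta(x)\,\|\,\bar\gamma(x))\,dx \\
&= J(\bar\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.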

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
E-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
E-mail: cnuzman@lucent.com
E-mail: pawhiting@lucent.com
URL: cm.bell-labs.com/who/nuzman
URL: cm.bell-labs.com/who/pawhiting


\[
\pi^*_{kj} =
\begin{cases}
D_k P_j(\beta) W_{k+j}, & k \in \mathcal{K},\ j+k \in \mathcal{I}(\omega), \\
0, & \text{otherwise.}
\end{cases}
\]

Proof. The strict equality in the constraints requires every feasible point to be supported on the finite set satisfying $k \in \mathcal{K}$ and $k+j \in \mathcal{I}$, and hence the minimization problem (A.20) may be considered to be a finite-dimensional problem over this set. By Lemma A.8, the minimizer $\pi^*$ is strictly positive. Since the objective function is continuously differentiable in a neighborhood of $\pi^*$ and the constraints are linear, Lagrange multipliers are guaranteed to exist for this problem (see, e.g., [4], Proposition 1.33). Specifically, there are constants $z_k$ and $w_i$ such that the Lagrangian

\[
L(\pi) = \sum_{k\in\mathcal{K}} \alpha_k \sum_{k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+ \sum_{k\in\mathcal{K}} z_k\alpha_k\Bigl(1 - \sum_{k+l\in\mathcal{I}} \pi_{kl}\Bigr)
+ \sum_{i\in\mathcal{I}} w_i\Bigl(\omega_i - \sum_{k=0}^{i} \alpha_k\pi_{k,i-k}\Bigr)
\]

is also minimized at $\pi^*$, with all partial derivatives of $L$ being zero at the optimal point. Taking partial derivatives and rearranging the optimality condition yields

\[
\pi^*_{kj} = P_j(\beta)\,e^{z_k - 1 + w_{k+j}}.
\]

The result follows on defining $D_k = e^{z_k-1}$ and $W_i = e^{w_i}$.
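For readers who want the intermediate step, the optimality condition for the finite-dimensional Lagrangian can be written out as follows. This is a sketch that assumes $\alpha_k > 0$ for every $k \in \mathcal{K}$, so that the common factor $\alpha_k$ can be divided out.

```latex
\frac{\partial L}{\partial \pi_{kj}}
  = \alpha_k\Bigl(\log\frac{\pi_{kj}}{P_j(\beta)} + 1\Bigr)
    - z_k\,\alpha_k - w_{k+j}\,\alpha_k = 0
\;\Longrightarrow\;
\log\frac{\pi_{kj}}{P_j(\beta)} = z_k - 1 + w_{k+j}
\;\Longrightarrow\;
\pi_{kj} = P_j(\beta)\,e^{z_k - 1}\,e^{w_{k+j}} .
```

The multiplier $w_{k+j}$ enters because $\pi_{kj}$ appears only in the terminal constraint with index $i = k+j$, which is why the minimizer factors into a term depending on $k$ and a term depending on $k+j$.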

We now give the corresponding theorem for the exponential case.

Theorem A.10. Suppose that $(\alpha,\omega,\beta)$ are irreducible feasible constraints yielding strict inequality in (2.4), and let $\pi^*$ be the corresponding unique minimizer from Lemma A.6. Then there exist positive constants $\rho$, $\{C_k\}_{k\in\mathcal{K}}$ and $\{W_i\}_{i\in\mathcal{I},\,i\le I}$ such that the minimizer takes the form

\[
\pi^*_{kj} =
\begin{cases}
C_k P_j(\rho\beta) W_{k+j}, & k \in \mathcal{K},\ k+j \le I,\ k+j \in \mathcal{I}, \\
C_k P_j(\rho\beta), & k \in \mathcal{K},\ k+j > I, \\
0, & \text{otherwise.}
\end{cases}
\]

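Proof. To avoid difficulties with infinite-dimensional Lagrangians, consider a sequence of truncated problems indexed by a sequence of integers $M > I$. For each such $M$ define

\[
\beta^{(M)} = \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M} l\,\pi^*_{kl},
\qquad
\eta^{(M)}_k = \sum_{l\le M} \pi^*_{kl},
\]

and consider the problem of minimizing

\[
\sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
\]

subject to the constraints

\[
\sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl} = \beta^{(M)},
\qquad
\sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl} = \eta^{(M)}_k,\quad k \in \mathcal{K},
\qquad
\sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k} = \omega_i,\quad i \in \mathcal{I}.
\]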

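By construction, the minimizer of this problem is obtained simply by truncating $\pi^*$. As in the previous lemma, the strict positivity of the minimizer, together with the linearity of the constraints, guarantees the existence of Lagrange multipliers via [4], Proposition 1.33. The Lagrangian for the $M$th problem is

\[
\begin{aligned}
L^{(M)} ={}& \sum_{k\in\mathcal{K}} \alpha_k \sum_{j\le M,\ k+j\in\mathcal{I}} \pi_{kj}\log\frac{\pi_{kj}}{P_j(\beta)}
+ y^{(M)}\Bigl(\beta^{(M)} - \sum_{k\in\mathcal{K}} \alpha_k \sum_{l\le M,\ k+l\in\mathcal{I}} l\,\pi_{kl}\Bigr) \\
&+ \sum_{k\in\mathcal{K}} z^{(M)}_k \alpha_k\Bigl(\eta^{(M)}_k - \sum_{l\le M,\ k+l\in\mathcal{I}} \pi_{kl}\Bigr)
+ \sum_{i\in\mathcal{I}} w^{(M)}_i\Bigl(\omega_i - \sum_{k\le i,\ k\in\mathcal{K}} \alpha_k\pi_{k,i-k}\Bigr).
\end{aligned}
\]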

Since the derivative of $L$ with respect to $\pi_{kj}$ must be zero at the minimizer, it follows that

\[
\pi^*_{kj} = P_j(\beta)\,e^{j y^{(M)} - 1 + z^{(M)}_k + w^{(M)}_{k+j}}
\]

for all $j < M$ and $k+j \in \mathcal{I}$, where for convenience we have defined $w_i = 0$ for $i > I$. For $k+j > I$, note that

\[
\frac{\pi^*_{k,j+1}}{\pi^*_{k,j}} = \frac{P_{j+1}(\beta)}{P_j(\beta)}\,e^{y^{(M)}},
\]

so that $y^{(M)}$ is independent of $M$. Since $w^{(M)}_{k+j} = 0$ if $k+j > I$, it follows that $z^{(M)}_k$ and $w^{(M)}_i$ are also independent of $M$. Then the above expression for $\pi^*_{kj}$ holds for all $k+j \in \mathcal{I}$, with fixed Lagrange multipliers $y$, $z_k$ and $w_i$. The form expressed in the theorem is based on the substitutions $\rho = e^{y}$, $C_k = e^{(\rho-1)\beta - 1 + z_k}$ and $W_i = e^{w_i}$.
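The substitutions at the end of the proof rest on the elementary identity $P_j(\beta)e^{jy} = e^{(\rho-1)\beta}P_j(\rho\beta)$ with $\rho = e^{y}$. A minimal numerical sketch of the two equivalent forms is given below; the function names and the multiplier values used for testing are illustrative assumptions, not quantities taken from the paper.

```python
import math

def poisson_pmf(j, mean):
    """P_j(mean): Poisson probability of the value j."""
    return math.exp(-mean) * mean**j / math.factorial(j)

def multiplier_form(j, beta, y, z_k, w):
    """pi*_{kj} written with the Lagrange multipliers y, z_k, w_{k+j}."""
    return poisson_pmf(j, beta) * math.exp(j * y - 1.0 + z_k + w)

def twisted_form(j, beta, y, z_k, w):
    """pi*_{kj} written as C_k * P_j(rho*beta) * W_{k+j}."""
    rho = math.exp(y)
    C_k = math.exp((rho - 1.0) * beta - 1.0 + z_k)
    W = math.exp(w)
    return C_k * poisson_pmf(j, rho * beta) * W
```

Both functions should return the same value for any $j$, which is why the multiplier $y$ can be read as the logarithm of a twist of the Poisson mean from $\beta$ to $\rho\beta$.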

Now that we have the general form of the minimizing endpoint values $\pi^*_{kj}$, we are ready to characterize the extremal curves. We denote the mean of the $k$th distribution $\pi^{*(k)}$ by $\beta_k \doteq \sum_j j\,\pi^*_{kj}$. In the exponential case, we define $\rho_k = \rho\beta/\beta_k$, so that the distribution $\pi^{(k)}$ meets $I-k$ terminal constraints, has mean $\beta_k$ and a Poisson($\rho_k\beta_k$) tail. Denoting the terminal constraints by $\omega_{k,j} \doteq \pi^*_{kj}$ and $\omega^{(k)} = (\omega_{k,0},\ldots,\omega_{k,I-k})$, we see that $\pi^{*(k)}$ is the minimizing distribution arising in Theorem A.1 for the terminal constraints $(1,\omega^{(k)},\beta_k)$, with associated twist parameters $\rho_k$ and $C_k$. In the polynomial case, the terminal constraints $(1,\omega^{(k)},\beta_k)$ mean that each subproblem is polynomial with $C_k = 0$. In either case, Theorem A.1 gives the form of the extremal occupancy curves $\gamma_{k,j}(x)$ for each of these $k$ subproblems. We now show that the subproblem curves sum to form the general extremals.

Theorem A.11. Suppose that $\{\pi^*_{kj}\}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{k,j}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\gamma_{k,i-k}(x\beta_k/\beta),\quad i = 0,\ldots,I,
\qquad
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{k,j}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define $\tilde\gamma_{k,j}(x) \doteq \gamma_{k,j}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{k,j}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(x)
= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{k,j}(x)
= \sum_{k=0}^{K} \alpha_k(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\tilde\gamma_{k,0}(0) = 1$ and $\tilde\gamma_{k,j}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{k,j}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).
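The normalization argument above is an index shift: cells that start with $k$ balls and receive $j$ more end with $i = k+j$ balls. The sketch below illustrates this with finite-support distributions, so that the sums are exact up to rounding. The function name and the numbers used for testing are hypothetical; they are not extremal curves from the paper.

```python
def combine_subproblems(alpha, sub_distributions):
    """Return gamma with gamma[i] = sum_k alpha[k] * sub_distributions[k][i - k].

    alpha[k] is the initial fraction of cells holding k balls, and
    sub_distributions[k][j] is the fraction of those cells that receive
    j additional balls (each row is assumed to sum to one).
    """
    longest = max(len(row) for row in sub_distributions)
    gamma = [0.0] * (len(alpha) + longest - 1)
    for k, (a_k, row) in enumerate(zip(alpha, sub_distributions)):
        for j, mass in enumerate(row):
            gamma[k + j] += a_k * mass
    return gamma
```

If `alpha` sums to one and every row sums to one, the combined vector also sums to one, which mirrors the exchange of summation order used in the proof.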

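In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k,0}(x) = \psi_{k,0}(x\beta_k/\beta)$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\tilde\psi_{k,0}(x) &= C_k e^{-\rho_k(x\beta_k/\beta)} + \sum_{i=0}^{I-k} (\omega_{k,i} - C_k P_i(\rho_k\beta_k))\left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} (\omega_{k,i} - C_k P_i(\rho\beta))\left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)P_i(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{i}\right].
\end{aligned}
\]

Here we have used that $\omega_{k,i} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I-k$, as given by Theorem A.10. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{0,0}$ is

\[
\begin{aligned}
(-1)^k\psi_0^{(k)}(x)
&= \alpha_0 C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)P_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0 C_0\rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)P_j(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0 C_0\rho^k}{C_k}\,\tilde\psi_{k,0}(x).
\end{aligned}
\tag{A.25}
\]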
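The passage from the first to the second line of (A.25) uses the reindexing identity $P_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$, applied with $j = i-k$. A small numerical sketch of that identity follows; the function names and test parameters are illustrative only.

```python
import math

def poisson_pmf(j, mean):
    """P_j(mean): Poisson probability of the value j."""
    return math.exp(-mean) * mean**j / math.factorial(j)

def derivative_coefficient(i, k, rho, beta):
    """Coefficient of (1 - x/beta)**(i - k) after k derivatives: P_i(rho*beta) * i! / ((i-k)! * beta**k)."""
    return poisson_pmf(i, rho * beta) * math.factorial(i) / (math.factorial(i - k) * beta**k)

def reindexed_coefficient(i, k, rho, beta):
    """The same coefficient written as rho**k * P_{i-k}(rho*beta)."""
    return rho**k * poisson_pmf(i - k, rho * beta)
```

The two functions should agree for all $0 \le k \le i$, which is what allows the factor $\rho^k$ to be pulled out of the bracket.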

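Using Theorem A.1 to express $\gamma_{k,j}$ in terms of $\psi_{k,0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k,0}^{(i-k)}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(x\beta_k/\beta\right) \\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\tilde\psi_{k,0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0\rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{aligned}
\]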

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.

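In the polynomial case, similar computations show that

\[
(-1)^k\psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\,\tilde\psi_{k,0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x)
= \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\tag{A.29}
\]

for $i = 0,\ldots,I$, establishing the restricted set of equations (A.10).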

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0,\beta)$, there are constants $C_i$, $i = 0,\ldots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}(\psi(x),\theta(x))\,dx + \frac{\partial L}{\partial\theta_i}(\psi(x'),\theta(x')) = C_i
\]

for all $x' \in (x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x))$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0,\beta)$, $i \le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

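Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 are shown in the proof to satisfy the conditions of Lemma A.3. Hence, the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i} \alpha_k\tilde\gamma_{k,i-k}(\beta)\log|\psi_0^{(i)}(\beta)|\right] - \sum_{k=0}^{K} \alpha_k\log|\psi_0^{(k)}(0)| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{k,j}(\beta)\log\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\tilde\psi_{k,0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\times\tilde\psi_{k,0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}}
= \left(\frac{\beta_k}{\beta}\right)^{j}\psi_{k,0}^{(j)}\!\left(x\beta_k/\beta\right)e^{\beta}
= \frac{\tilde\gamma_{k,j}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k D(\tilde\gamma_{k,\cdot}(\beta)\,\|\,P(\beta)).
\]

By construction, the endpoints $\tilde\gamma_{k,j}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).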

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\bar\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma \in \mathcal{O}: \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma) \to D(\theta\,\|\,\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^\varepsilon \doteq (1-\varepsilon)\gamma + \varepsilon\bar\gamma$, and define the function $G: [0,1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D(\theta^\varepsilon(x)\,\|\,\gamma^\varepsilon(x))\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \bar\gamma \in \mathcal{O}(\alpha,\omega,\beta)$.
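The convexity of $G$ comes from the joint convexity of relative entropy applied pointwise in $x$. The sketch below checks this at a single point using finite probability vectors in place of $\theta(x)$ and $\gamma(x)$; the function names and the vectors used for testing are illustrative assumptions and are not occupancy paths from the paper.

```python
import math

def relative_entropy(p, q):
    """D(p || q) = sum_i p_i log(p_i / q_i), with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mix(a, b, eps):
    """Convex combination (1 - eps) * a + eps * b, taken componentwise."""
    return [(1.0 - eps) * ai + eps * bi for ai, bi in zip(a, b)]

def g_pointwise(theta, gamma, theta_bar, gamma_bar, eps):
    """Integrand of G at one point: D(theta^eps || gamma^eps)."""
    return relative_entropy(mix(theta, theta_bar, eps), mix(gamma, gamma_bar, eps))
```

For any `eps` in $[0,1]$ the value of `g_pointwise` should lie on or below the chord joining its values at `eps = 0` and `eps = 1`. Integrating this inequality over $x$ gives the convexity of $G$.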

For the remainder of this section we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi,\theta)\,dx$. After defining

\[
\eta(x) \doteq \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x,\varepsilon)\,dx.
\tag{A.30}
\]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of $0$. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0,\beta]$.


  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
          • Proof of Theorem
          • Examples and extensions
            • The classical occupancy problem
            • The overflow problem
            • Partial coupon collection with initial conditions
            • Extensions
              • Appendix
                • Preliminaries
                • Proof of Lemma
                • Euler--Lagrange equations
                • Characterization of the extremals under empty initial conditions
                • Interpretation of the twist parameter
                • Characterization of the extremal cost
                • Characterization of the extremals under general initial conditions
                • Extremal curves have globally minimal cost
                  • References
Page 47: Large deviation asymptotics for occupancy problems · Introduction. Urn occupancy problems center on the distribution of ... large deviations, calculus of variations, sample paths,

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 47

subject to the constraintssum

kisinK

αk

sum

lleMk+lisinI

lπkl = β(M)

sum

lleMk+lisinI

πkl = η(M)k k isinK

sum

kleikisinK

αkπkiminusk = ωi i isin I

By construction the minimizer of this problem is obtained simply by trun-cating πlowast As in the previous lemma the strict positivity of the minimizertogether with the linearity of the constraints guarantees the existence ofLagrange multipliers via [4] Proposition 133 The Lagrangian for the M thproblem is

L(M) =sum

kisinK

αk

sum

jleMk+jisinI

πkj logπkjPj(β)

+ y(M)

(

β(M) minussum

kisinK

αk

sum

lleMk+lisinI

lπkl

)

+Ksum

kisinK

z(M)k αk

(

η(M)k minus

sum

lleMk+lisinI

πkl

)

+sum

iisinI

w(M)i

(

ωi minussum

kleikisinK

αkπkiminusk

)

Since the derivative of L with respect to πkj must be zero at the minimizerit follows that

πlowastkj =Pj(β)ejy(M)minus1+z

(M)k

+w(M)k+j

for all j ltM and k + j isin I where for convenience we have defined wi = 0for i gt I For k+ j gt I note that

πlowastkj+1

πlowastkj=

Pj+1(β)

Pj(β)ey

(M)

so that y(M) is independent of M Since w(M)k+j = 0 if k+ j gt I it follows that

z(M)k and w

(M)i are also independent of M Then the above expression for

πlowastkj holds for all k+ j isin I with fixed Lagrange multipliers y zk and wiThe form expressed in the theorem is based on the substitutions ρ = ey Ck = e(ρminus1)βminus1+zk and Wi = ewi

Now that we have the general form of the minimizing endpoint values πlowastkjwe are ready to characterize the extremal curves We denote the mean ofthe kth distribution πlowast(k) by βk

=sum

j jπlowastkj In the exponential case we define

48 P DUPUIS C NUZMAN AND P WHITING

ρk = ρββk so that the distribution π(k) meets I minus k terminal constraintshas mean βk and a Poisson ρkβk tail Denoting the terminal constraints byωkj

= πlowastkj and ω(k) = (ωk0 ωkIminusk) we see that πlowast(k) is the minimizing

distribution arising in Theorem A1 for the terminal constraints (1 ω(k) βk)with associated twist parameters ρk and Ck In the polynomial case theterminal constraints (1 ω(k) βk) mean that each subproblem is polynomialwith Ck = 0 In either case Theorem A1 gives the form of the extremaloccupancy curves γkj(x) for each of these k subproblems We now showthat the subproblem curves sum to form the general extremals

Theorem A.11. Suppose that $\pi^*_{kj}$ are the minimizing distributions from Lemma A.6 for feasible constraints $(\alpha,\omega,\beta)$. Denote the means of the minimizing arguments by $\beta_k = \sum_j j\,\pi^*_{kj}$, and let $\gamma_{kj}(x)$ be the extremal curves from Theorem A.1 for the subproblems $(1,\omega^{(k)},\beta_k)$. Then the curves

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\gamma_{k,i-k}(x\beta_k/\beta), \qquad i = 0,\dots,I,
\]
\[
\gamma_{I+}(x) = 1 - \sum_{i=0}^{I} \gamma_i(x)
\]

are extremals which satisfy the constraints $(\alpha,\omega,\beta)$.

Proof. We assume without loss of generality that the constraints are irreducible. Otherwise, each irreducible subproblem may be treated separately.

We extend the definition of $\gamma_i(x)$ for $i > I$ by using the extended definition of $\gamma_{kj}(x)$ used in the proof of Theorem A.1 [see (A.13)]. Also, for convenience define $\tilde\gamma_{kj}(x) \doteq \gamma_{kj}(x\beta_k/\beta)$, and likewise define $\tilde\psi_{kj}$. The $\psi_i$ inherit from the $\gamma_i$ the relation

\[
\psi_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\psi_{k,i-k}(x).
\]

To see that the $\gamma_i$ are valid occupancy curves, note that

\[
\sum_{i=0}^{\infty} \gamma_i(x) = \sum_{i=0}^{\infty}\sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(x) = 1.
\]

Also, the $-\dot\psi_i(x)$ are all nonnegative on $[0,\beta]$, and

\[
\sum_{i=0}^{\infty} -\dot\psi_i(x) = \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} -\dot{\tilde\psi}_{kj}(x) = \sum_{k=0}^{K} \alpha_k\,(\beta_k/\beta) = 1,
\]

where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i\le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

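In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\tilde\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\tilde\psi_{k0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho_k\beta_k)\bigr)\Bigl(1 - \frac{x\beta_k/\beta}{\beta_k}\Bigr)^{i}\\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho\beta)\bigr)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i}\\
&= C_k\Biggl[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1)\,P_i(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{i}\Biggr].
\end{aligned}
\]

Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I-k$. The $k$th derivative of $\psi_0 = \alpha_0\tilde\psi_{00}$ is

\[
\begin{aligned}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0\Biggl[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1)\,P_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\Bigl(1 - \frac{x}{\beta}\Bigr)^{i-k}\Biggr] \qquad \text{(A.25)}\\
&= \alpha_0 C_0 \rho^k\Biggl[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1)\,P_j(\rho\beta)\Bigl(1 - \frac{x}{\beta}\Bigr)^{j}\Biggr]\\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\,\tilde\psi_{k0}(x).
\end{aligned}
\]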
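The passage from the first to the second line of (A.25) rests on the identity $P_i(\rho\beta)\,i!/((i-k)!\,\beta^k) = \rho^k P_{i-k}(\rho\beta)$. The following is a minimal numerical sketch of that step, assuming $P_i(\lambda)$ is the Poisson probability $e^{-\lambda}\lambda^i/i!$; the function names and the sample values of $\rho$, $\beta$ and $W_i$ are hypothetical and chosen only for illustration.

```python
import math


def poisson_pmf(i, lam):
    """Assumed meaning of P_i(lam): exp(-lam) * lam**i / i!."""
    return math.exp(-lam) * lam**i / math.factorial(i)


def bracket_first_line(k, x, rho, beta, W):
    """Bracket in the first line of (A.25): term-by-term k-th derivative."""
    I = len(W) - 1
    total = rho**k * math.exp(-rho * x)
    for i in range(k, I + 1):
        falling = math.factorial(i) / math.factorial(i - k)  # i!/(i-k)!
        total += ((W[i] - 1.0) * poisson_pmf(i, rho * beta)
                  * falling / beta**k * (1.0 - x / beta) ** (i - k))
    return total


def bracket_second_line(k, x, rho, beta, W):
    """rho**k times the bracket in the second line of (A.25)."""
    I = len(W) - 1
    total = math.exp(-rho * x)
    for j in range(0, I - k + 1):
        total += ((W[k + j] - 1.0) * poisson_pmf(j, rho * beta)
                  * (1.0 - x / beta) ** j)
    return rho**k * total


# Hypothetical sample parameters (not taken from the paper).
RHO, BETA, W_SAMPLE = 1.3, 2.0, [0.7, 1.4, 0.9, 1.1]
```

Evaluating both functions on a grid of $x \in [0,\beta]$ for $k = 0,\dots,I$ should give matching values up to floating-point error, which is a quick sanity check on the index shift $j = i-k$.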

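Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k\,\frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k0}^{(i-k)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\tilde\psi_{k0}(x)\\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi_0^{(i)}(x).
\end{aligned}
\]

Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k\,\frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\\
&= \sum_{k=0}^{i} \alpha_k\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\tilde\psi_{k0}(x)\\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{aligned}
\]

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}, \qquad \text{(A.26)}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho. \qquad \text{(A.27)}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.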

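In the polynomial case, similar computations show that

\[
(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\,\tilde\psi_{k0}(x) \qquad \text{(A.28)}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)} \qquad \text{(A.29)}
\]

for $i = 0,\dots,I$, establishing the restricted set of equations (A.10).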

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1\in[0,\beta)$, there are constants $C_i$, $i = 0,\dots,I-1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}\bigl(\psi(x),\theta(x)\bigr)\,dx + \frac{\partial L}{\partial\theta_i}\bigl(\psi(x'),\theta(x')\bigr) = C_i
\]

for all $x'\in(x_1,\beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x\in[0,\beta)$, $i\le I$ (and $i = I+1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

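Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha,\omega,\beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20) subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\Biggl[\sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(\beta)\,\log\bigl|\psi_0^{(i)}(\beta)\bigr|\Biggr] - \sum_{k=0}^{K} \alpha_k \log\bigl|\psi_0^{(k)}(0)\bigr|\\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \tilde\gamma_{kj}(\beta)\,\log\left|\frac{\psi_0^{(k+j)}(\beta)}{\psi_0^{(k)}(0)\,e^{-\beta}}\right|.
\end{aligned}
\]

The fact that $\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0)\times\tilde\psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \Bigl(\frac{\beta_k}{\beta}\Bigr)^{j}\,\psi_{k0}^{(j)}\Bigl(\frac{x\beta_k}{\beta}\Bigr)\,e^{\beta} = \frac{\tilde\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k\,D\bigl(\tilde\gamma_{k\cdot}(\beta)\,\big\|\,P(\beta)\bigr).
\]

By construction, the endpoints $\tilde\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).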

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha,\omega,\beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\bar\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma)\le J(\bar\gamma)$.

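We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let $\mathcal{O}(\alpha,\omega,\beta) \doteq \{\gamma\in\mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}$ be the subset of valid occupancy functions satisfying feasible constraints $(\alpha,\omega,\beta)$. The proof of the following lemma is a straightforward consequence of the convexity of the map $(\theta,\gamma)\to D(\theta\|\gamma)$, and hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha,\omega,\beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha,\omega,\beta)$ is a convex function.

For a given pair of occupancy functions $\gamma,\bar\gamma\in\mathcal{O}(\alpha,\omega,\beta)$, we denote $\gamma^{\varepsilon} \doteq (1-\varepsilon)\gamma + \varepsilon\bar\gamma$ and define the function $G : [0,1]\to\mathbb{R}_+\cup\{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^{\varepsilon}) = \int_0^{\beta} D\bigl(\theta^{\varepsilon}(x)\,\big\|\,\gamma^{\varepsilon}(x)\bigr)\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\bar\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma,\bar\gamma\in\mathcal{O}(\alpha,\omega,\beta)$.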
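The convexity of $G$ comes from the joint convexity of the integrand in the pair $(\theta,\gamma)$, and this can be illustrated at a single value of $x$. The following is a minimal sketch, assuming $D(\theta\|\gamma) = \sum_i \theta_i\log(\theta_i/\gamma_i)$ with the convention $0\log 0 = 0$; the helper names and the sample vectors are hypothetical and are not taken from the paper.

```python
import math


def rel_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i), 0 log 0 = 0."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t > 0.0:
            if g <= 0.0:
                return math.inf  # positive mass against zero reference mass
            total += t * math.log(t / g)
    return total


def mix(a, b, eps):
    """Convex combination (1 - eps) * a + eps * b, componentwise."""
    return [(1.0 - eps) * ai + eps * bi for ai, bi in zip(a, b)]


def g_pointwise(eps, theta, gamma, theta_bar, gamma_bar):
    """Integrand of G[eps] at one fixed x, for the interpolated pair."""
    return rel_entropy(mix(theta, theta_bar, eps), mix(gamma, gamma_bar, eps))


# Hypothetical values of (theta, gamma) and a competitor at one point x.
THETA, GAMMA = [0.5, 0.3, 0.2], [0.6, 0.3, 0.1]
THETA_BAR, GAMMA_BAR = [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]
```

For every $\varepsilon\in[0,1]$ the value `g_pointwise(eps, ...)` should lie on or below the chord joining its values at $\varepsilon = 0$ and $\varepsilon = 1$; integrating this pointwise inequality over $x$ gives the convexity of $G$.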

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha,\omega,\beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I+1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\bar\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma)\le J(\bar\gamma).
\]

Proof. We construct the family of paths $\gamma^{\varepsilon} = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$ with $G[\varepsilon] = J(\gamma^{\varepsilon})$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.

It will be convenient to work with the cumulative occupancy functions $\psi^{\varepsilon}$, and as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^{\beta} L(\psi,\theta)\,dx$. After defining

\[
\eta(x) \doteq \bar\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^{\beta} L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^{\beta} g(x,\varepsilon)\,dx. \qquad \text{(A.30)}
\]

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 392, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x\in[0,\beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha,\omega,\beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$, and $\theta_0(\beta) = CP_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha,\omega,\beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\tilde\gamma_{k0}(x)$ and $\tilde\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k\,\tilde\gamma_{k,i-k}(x) \ge \alpha_i\,\tilde\gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing, and hence satisfies $\gamma_{I+}(x)\ge\alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^{\varepsilon}\ge(1-\varepsilon)\gamma$ and $\theta^{\varepsilon}\ge(1-\varepsilon)\theta$, so that for each $0\le\varepsilon<1$, these functions are also uniformly bounded from zero on $[0,\beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$ which hold uniformly for $x\in[0,\beta]$ and $\varepsilon\in[0,\varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.

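The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x,\varepsilon) = \sum_{i=0}^{I} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma^{\varepsilon}(x)} \eta_i(x) - \sum_{i=0}^{I} \frac{\partial L}{\partial\theta_i}\bigg|_{\gamma^{\varepsilon}(x)} \dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\]
\[
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1-\psi_I(x)}.
\]

The uniform lower bounds on $\gamma^{\varepsilon}$ and $\theta^{\varepsilon}$, together with the upper bounds $\gamma^{\varepsilon}\le 1$, $\theta^{\varepsilon}\le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left|\frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\right| < B \qquad \text{for } \varepsilon\in[0,\varepsilon_0],\ \text{a.e. } x\in[0,\beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.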

We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^{\beta} \frac{\partial}{\partial\varepsilon} g(x,\varepsilon)\,dx
\]

for all $\varepsilon\in[0,\varepsilon_0)$. We note that

\[
\int_0^{x} \frac{\partial L}{\partial\psi_i}\bigg|_{\gamma}\,dx' = \int_0^{x} \Bigl(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\Bigr)\bigg|_{\gamma}\,dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 361, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^{\beta} \Bigl[\frac{\partial L}{\partial\psi}(x)\cdot\eta(x) - \frac{\partial L}{\partial\theta}(x)\cdot\dot\eta(x)\Bigr]\,dx\\
&= \int_0^{\beta} \dot\eta(x)\cdot\Bigl(-\int_0^{x} \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\Bigr)\,dx\\
&= -\int_0^{\beta} \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx\\
&= \sum_{i=0}^{I} C_i\bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.
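For completeness, the closing step from $G'_+[0] = 0$ to the inequality of Lemma A.16 can be written out as follows. This is a short sketch that uses only the convexity of $G$ from Lemma A.15, with $\bar\gamma$ denoting the competing path (so that $G[0] = J(\gamma)$ and $G[1] = J(\bar\gamma)$) and with $G[1] < \infty$ as assumed in the proof.

```latex
\begin{aligned}
G[\varepsilon] &\le (1-\varepsilon)\,G[0] + \varepsilon\,G[1]
  \quad\Longrightarrow\quad
  \frac{G[\varepsilon]-G[0]}{\varepsilon} \le G[1]-G[0],
  \qquad 0<\varepsilon\le 1,\\
0 = G'_+[0] &= \lim_{\varepsilon\downarrow 0}\frac{G[\varepsilon]-G[0]}{\varepsilon}
  \le G[1]-G[0] = J(\bar\gamma)-J(\gamma).
\end{aligned}
```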

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive, except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal, and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path, of the form $\lambda\gamma + (1-\lambda)\bar\gamma$, which also has lower cost than $\gamma$, and which avoids the boundary everywhere that $\gamma$ does.

Given $x_1^{(n)}\downarrow 0$ and $x_2^{(n)}\uparrow\beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x_1^{(n)})$ and terminal point $\bar\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\bar\theta(x)\,\big\|\,\bar\gamma(x)\bigr)\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha,\omega,\beta)\\
&\le \liminf_{n}\ \mathcal{J}\bigl(\bar\gamma(x_1^{(n)}),\ \bar\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr)\\
&= \liminf_{n}\ J(\gamma^{(n)})\\
&\le \lim_{n} \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\bar\theta(x)\,\big\|\,\bar\gamma(x)\bigr)\,dx\\
&= J(\bar\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha,\omega,\beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Léonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithms 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
E-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
E-mail: cnuzman@lucent.com
E-mail: pawhiting@lucent.com
URL: cm.bell-labs.com/who/nuzman
URL: cm.bell-labs.com/who/pawhiting


[11] Dupuis P and Ellis R (1997) A Weak Convergence Approach to Large DeviationsWiley New York MR1431744

[12] Eramo V and Listanti M (2000) Packet loss in a bufferless optical WDMswitch employing shared tuneable wavelength converters Lightwave Technology18 1818ndash1833

[13] Eramo V Listanti M Nuzman C and Whiting P (2002) Optical switchdimensioning and the classical occupancy problem Int J Commun Syst 15127ndash141

[14] Feller W (1968) An Introduction to Probability Theory and Its Applications 1Wiley New York MR228020

[15] Feller W (1971) An Introduction to Probability Theory and Its Applications 2Wiley New York MR270403

[16] Harkness W L (1971) The classical occupancy problem revisited In RandomCounts in Physical Sciences (G P Patil ed) 107ndash126 Pennsylvania State UnivPress

[17] He F (2000) Estimating species abundance from occurrence American Naturalist156 453ndash459

[18] Holst L (1977) Some asymptotic results for occupancy problems Ann Probab 51028ndash1035 MR443027

[19] Holst L (1986) On the coupon collectors and other urn problems Internat StatistRev 54 15ndash27 MR959649

[20] Johnson N L and Kotz S (1977) Urn Models and their Applications WileyNew York MR488211

[21] Kamath A Motwani R Palem K and Spirakis P (1995) Tail bounds foroccupancy and the satisfiability threshold conjecture Random Structures Algo-rithm 7 59ndash80 MR1346284

[22] Mandjes M and Ridder A (2002) A large deviations analysis of the transient ofa queue with many Markov fluid inputs Approximations and fast simulationACM Transactions on Modeling and Computer Simulation 12 1ndash26

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 57

[23] McShane E J (1947) Integration Princeton Univ Press[24] Ramakrishna M V and Mukhopadhyay P (1988) Analysis of bounded disorder

file organisation In Proc of the 7th ACM Symposium on Principles of DataBaseSystems Austin Texas 117ndash125

[25] Sagan H (1969) Introduction to the Calculus of Variations Dover New YorkMR1210325

[26] Shwartz A and Weiss A (1995) Large Deviations for Performance AnalysisChapman and Hall New York MR1335456

[27] Shwartz A and Weiss A (2002) Large deviations with diminishing rates Un-published manuscript

[28] Weiss A (1994) Large deviations for the occupancy problem Private communica-tion

P Dupuis

Lefschetz Center for Dynamical Studies

Brown University

Providence Rhode Island 02912

USA

e-mail dupuisdambrowneduurl cmbell-labscomwhonuzman

C Nuzman

P Whiting

Bell Labs

600 Mountain Avenue

Murray Hill New Jersey 07974

USA

e-mail cnuzmanlucentcome-mail pawhitinglucentcomurl cmbell-labscomwhopawhitinge-mail


where the last equality follows from the fact that the $\pi^*_{kj}$ satisfy the conservation constraint (A.21). Hence the curves $\gamma_i(x)$, $i \le I$, and $\gamma_{I+}(x)$ satisfy the conditions of Lemma 2.1.

It is clear that the $\gamma_i$ satisfy the desired initial conditions, since $\gamma_{k0}(0) = 1$ and $\gamma_{kj}(0) = 0$ for $j > 0$. The terminal conditions are guaranteed by the fact that the values $\gamma_{kj}(\beta_k) = \pi^*_{kj}$ satisfy the constraints (A.22).

In order to establish that the curves satisfy the Euler–Lagrange equations, we first show that the rescaled zero-occupancy curve $\psi_{k0}$ for the $k$th subproblem is proportional to the $k$th derivative of the overall zero-occupancy curve $\psi_0$.

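First take the exponential case, and recall that $\rho_k\beta_k = \rho\beta$. The $k$th zero-occupancy curve, after rescaling, is

\[
\begin{aligned}
\psi_{k0}(x) &= C_k e^{-\rho_k (x\beta_k/\beta)} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho_k\beta_k)\bigr)\left(1 - \frac{x\beta_k/\beta}{\beta_k}\right)^{i} \\
&= C_k e^{-\rho x} + \sum_{i=0}^{I-k} \bigl(\omega_{ki} - C_k P_i(\rho\beta)\bigr)\left(1 - \frac{x}{\beta}\right)^{i} \\
&= C_k\left[e^{-\rho x} + \sum_{i=0}^{I-k} (W_{k+i} - 1) P_i(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{i}\right].
\end{aligned}
\]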

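Here we have used that $\omega_{ki} = \pi^*_{ki} = C_k P_i(\rho\beta) W_{k+i}$ when $i \le I - k$. The $k$th derivative of $\psi_0 = \alpha_0\psi_{00}$ is

\[
\begin{aligned}
(-1)^k \psi_0^{(k)}(x) &= \alpha_0 C_0\left[\rho^k e^{-\rho x} + \sum_{i=k}^{I} (W_i - 1) P_i(\rho\beta)\,\frac{i!}{(i-k)!\,\beta^k}\left(1 - \frac{x}{\beta}\right)^{i-k}\right] \\
&= \alpha_0 C_0 \rho^k\left[e^{-\rho x} + \sum_{j=0}^{I-k} (W_{k+j} - 1) P_j(\rho\beta)\left(1 - \frac{x}{\beta}\right)^{j}\right] \\
&= \frac{\alpha_0 C_0 \rho^k}{C_k}\,\psi_{k0}(x).
\end{aligned}
\tag{A.25}
\]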
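The passage from the first to the second line of (A.25) is a reindexing $i = k + j$ of the Poisson weights. The sketch below checks that step numerically, assuming that $P_i(m)$ denotes the Poisson weight $e^{-m} m^i / i!$ with mean $m$; the function names are illustrative and do not come from the paper.

```python
import math


def poisson_weight(i: int, mean: float) -> float:
    """Poisson weight P_i(mean) = exp(-mean) * mean**i / i! (assumed meaning of P_i)."""
    return math.exp(-mean) * mean**i / math.factorial(i)


def weighted_poisson_term(k: int, j: int, rho: float, beta: float) -> float:
    """Coefficient in the first line of (A.25) with i = k + j:
    P_i(rho*beta) * i! / ((i - k)! * beta**k)."""
    i = k + j
    return (poisson_weight(i, rho * beta) * math.factorial(i)
            / (math.factorial(i - k) * beta**k))


def reindexed_term(k: int, j: int, rho: float, beta: float) -> float:
    """Coefficient in the second line of (A.25): rho**k * P_j(rho*beta)."""
    return rho**k * poisson_weight(j, rho * beta)


if __name__ == "__main__":
    # Arbitrary illustrative parameter values (hypothetical, not from the paper).
    for k, j, rho, beta in [(0, 0, 1.0, 1.0), (2, 3, 0.7, 1.9), (4, 1, 1.3, 0.5)]:
        a = weighted_poisson_term(k, j, rho, beta)
        b = reindexed_term(k, j, rho, beta)
        print(k, j, rho, beta, math.isclose(a, b, rel_tol=1e-12))
```

The two coefficients agree because $(\rho\beta)^{k+j}/\beta^k = \rho^k (\rho\beta)^j$, which is what allows the factor $\rho^k$ to be pulled out of the bracket.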

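Using Theorem A.1 to express $\gamma_{kj}$ in terms of $\psi_{k0}$, we have

\[
\begin{aligned}
\gamma_i(x) &= \sum_{k=0}^{i} \alpha_k \frac{(x\beta_k/\beta)^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\psi_{k0}^{(i-k)}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k}\,\frac{d^{\,i-k}}{dx^{\,i-k}}\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i}\,\psi_0^{(i)}(x).
\end{aligned}
\]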

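Similarly, we obtain

\[
\begin{aligned}
-\dot\psi_i(x) &= -\sum_{k=0}^{i} \alpha_k \frac{\beta_k}{\beta}\,\dot\psi_{k,i-k}\!\left(\frac{x\beta_k}{\beta}\right) \\
&= \sum_{k=0}^{i} \alpha_k \frac{x^{i-k}}{(i-k)!}\,(-1)^{i-k+1}\,\frac{d^{\,i-k+1}}{dx^{\,i-k+1}}\psi_{k0}(x) \\
&= \sum_{k=0}^{i} \frac{\alpha_k C_k}{\alpha_0 C_0 \rho^k}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x),
\end{aligned}
\]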

so that the extremals satisfy the simple relation

\[
\frac{\theta_i(x)}{\gamma_i(x)} = -\frac{\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)},
\tag{A.26}
\]

which also arose in the case of empty initial conditions. Because the polynomial portion of $\psi_0$ has degree $I$, the ratio $\theta_i(x)/\gamma_i(x)$ is given by the constant $\rho$ for $i > I$, which establishes (as in the proof of Lemma A.3) that

\[
\frac{\theta_{I+}(x)}{1 - \psi_I(x)} = \rho.
\tag{A.27}
\]

As shown in the proof of Theorem A.1, the relations (A.26) and (A.27) are sufficient to show that the $\gamma_i$ satisfy the Euler–Lagrange equations.
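As a sanity check on (A.26), consider the simplest empty-initial-condition instance. The sketch below assumes $\psi_0(x) = e^{-x}$ (that is, $\rho = 1$, unit constant and no polynomial part; this choice is an illustration, not a case singled out in the text) and uses the $k = 0$ term of the formulas above, $\gamma_i(x) = \frac{x^i}{i!}(-1)^i\psi_0^{(i)}(x)$ and $\theta_i(x) = \frac{x^i}{i!}(-1)^{i+1}\psi_0^{(i+1)}(x)$. Under that assumption the occupancy fractions reduce to Poisson weights, the ratio $\theta_i/\gamma_i$ equals 1, and $\theta_i$ agrees with $-\dot\psi_i$ up to finite-difference error.

```python
import math


def psi0_derivative(n: int, x: float) -> float:
    """n-th derivative of the assumed curve psi_0(x) = exp(-x)."""
    return (-1) ** n * math.exp(-x)


def gamma(i: int, x: float) -> float:
    """Occupancy fraction gamma_i(x) = x**i / i! * (-1)**i * psi_0^{(i)}(x)."""
    return x**i / math.factorial(i) * (-1) ** i * psi0_derivative(i, x)


def theta(i: int, x: float) -> float:
    """Rate theta_i(x) = x**i / i! * (-1)**(i+1) * psi_0^{(i+1)}(x)."""
    return x**i / math.factorial(i) * (-1) ** (i + 1) * psi0_derivative(i + 1, x)


def cumulative(i: int, x: float) -> float:
    """Cumulative occupancy psi_i(x) = gamma_0(x) + ... + gamma_i(x)."""
    return sum(gamma(j, x) for j in range(i + 1))


def minus_psi_dot(i: int, x: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of -d/dx psi_i(x)."""
    return -(cumulative(i, x + h) - cumulative(i, x - h)) / (2.0 * h)


if __name__ == "__main__":
    x = 0.7  # arbitrary illustrative time
    for i in range(5):
        ratio = theta(i, x) / gamma(i, x)
        rhs = -psi0_derivative(i + 1, x) / psi0_derivative(i, x)
        print(i, ratio, rhs, abs(theta(i, x) - minus_psi_dot(i, x)))
```

In this special case the printed ratio and the right-hand side of (A.26) are both 1, consistent with the statement that the ratio is the constant $\rho$ once the polynomial portion no longer contributes.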

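In the polynomial case, similar computations show that

\[
(-1)^k \psi_0^{(k)}(x) = \frac{\alpha_0 D_0}{D_k}\,\psi_{k0}(x)
\tag{A.28}
\]

and that

\[
\theta_i(x) = \sum_{k=0}^{i} \frac{\alpha_k D_k}{\alpha_0 D_0}\,\frac{x^{i-k}}{(i-k)!}\,(-1)^{i+1}\,\psi_0^{(i+1)}(x) = \gamma_i(x)\,\frac{-\psi_0^{(i+1)}(x)}{\psi_0^{(i)}(x)}
\tag{A.29}
\]

for $i = 0, \ldots, I$, establishing the restricted set of equations (A.10).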

In order to demonstrate a strong minimum, we will need the following.

Corollary A.12. For irreducible constraints, the occupancy functions defined in Theorem A.11 satisfy the integrated version of the Euler–Lagrange equations: given $x_1 \in [0, \beta)$, there are constants $C_i$, $i = 0, \ldots, I - 1$ (and also $i = I$ in the exponential case), depending on $x_1$ only, such that

\[
\int_{x_1}^{x'} \frac{\partial L}{\partial\psi_i}\bigl(\psi(x), \theta(x)\bigr)\,dx + \frac{\partial L}{\partial\theta_i}\bigl(\psi(x'), \theta(x')\bigr) = C_i
\]

for all $x' \in (x_1, \beta)$.

Proof. We may take $\alpha_0 > 0$ without loss of generality. By Theorem A.11, the extremals satisfy the usual form of the Euler–Lagrange equations. The above indefinite integral is $\log\bigl(-\psi_0^{(i+1)}(x)/\psi_0^{(i)}(x)\bigr)$, which is finite at both endpoints because $(-1)^i\psi_0^{(i)}(x) > 0$ for $x \in [0, \beta)$, $i \le I$ (and $i = I + 1$ in the exponential case), as shown in Lemma A.2. The partial derivative with respect to $\theta_i$ also exists for the same reason.

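Theorem A.13. Suppose that $\gamma$ is the extremal defined by Theorem A.11 for the feasible constraints $(\alpha, \omega, \beta)$. Then the cost $J(\gamma)$ is the solution to the minimization problem (A.20), subject to constraints (A.21) and (A.22).

Proof. Consider first the exponential case. The infinite sequence of functions $\gamma_i$ defined in the proof of Theorem A.11 is shown in that proof to satisfy the conditions of Lemma A.3. Hence the cost is

\[
\begin{aligned}
J(\gamma) &= \beta + \sum_{i=0}^{\infty}\left[\sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(\beta)\,\log\bigl|\psi_0^{(i)}(\beta)\bigr|\right] - \sum_{k=0}^{K} \alpha_k \log\bigl|\psi_0^{(k)}(0)\bigr| \\
&= \sum_{k=0}^{K} \alpha_k \sum_{j=0}^{\infty} \gamma_{kj}(\beta)\,\log\frac{\bigl|\psi_0^{(k+j)}(\beta)\bigr|}{\bigl|\psi_0^{(k)}(0)\bigr|\,e^{-\beta}}.
\end{aligned}
\]

The fact that $\psi_{k0}(0) = 1$, together with (A.25), implies that $\psi_0^{(k)}(x) = \psi_0^{(k)}(0) \times \psi_{k0}(x)$, so that

\[
\frac{\psi_0^{(k+j)}(x)}{\psi_0^{(k)}(0)\,e^{-\beta}} = \left(\frac{\beta_k}{\beta}\right)^{j} \psi_{k0}^{(j)}\!\left(\frac{x\beta_k}{\beta}\right) e^{\beta} = \frac{\gamma_{kj}(x)}{(-x)^j e^{-\beta}/j!}.
\]

Then

\[
J(\gamma) = \sum_{k=0}^{K} \alpha_k\, D\bigl(\gamma_{k\cdot}(\beta)\,\|\,P(\beta)\bigr).
\]

By construction, the endpoints $\gamma_{kj}(\beta)$ coincide with the optimal arguments $\pi^*_{kj}$ of the minimization problem.

The proof of the polynomial case is almost identical, except that we use Lemma A.4 and (A.28).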

A.8. Extremal curves have globally minimal cost. In this section we prove the following theorem.

Theorem A.14 (Strong minimum). Given feasible constraints $(\alpha, \omega, \beta)$, let $\gamma$ be the corresponding extremal occupancy path defined in Theorem A.11, and let $\tilde\gamma$ be any other occupancy path satisfying the same constraints. Then $J(\gamma) \le J(\tilde\gamma)$.

We first introduce some notation. Let $\mathcal{O}$ denote the set of valid occupancy functions, that is, vector functions $\gamma$ such that the cumulative occupancy functions $\psi$ satisfy the conditions of Lemma 2.1, and let

\[
\mathcal{O}(\alpha, \omega, \beta) \doteq \{\gamma \in \mathcal{O} : \gamma(0) = \alpha,\ \gamma(\beta) = \omega\}
\]

be the subset of valid occupancy functions satisfying feasible constraints $(\alpha, \omega, \beta)$. The following lemma is a straightforward consequence of the convexity of the map $(\theta, \gamma) \to D(\theta\,\|\,\gamma)$, and its proof is hence omitted.

Lemma A.15.

(a) $\mathcal{O}(\alpha, \omega, \beta)$ is a convex set.
(b) $J(\gamma)$ restricted to $\mathcal{O}(\alpha, \omega, \beta)$ is a convex function.
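The convexity used in Lemma A.15 can be illustrated numerically. The sketch below assumes that $D(\theta\,\|\,\gamma)$ is the usual relative entropy $\sum_i \theta_i \log(\theta_i/\gamma_i)$ between finite probability vectors (with $0 \log 0 = 0$) and checks the joint convexity inequality on random pairs; the helper names are hypothetical and the check is an illustration, not a proof.

```python
import math
import random


def relative_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i), with 0 log 0 = 0."""
    total = 0.0
    for t, g in zip(theta, gamma):
        if t > 0.0:
            if g <= 0.0:
                return math.inf  # theta puts mass where gamma has none
            total += t * math.log(t / g)
    return total


def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def random_probability_vector(n, rng):
    """Strictly positive random probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def convexity_gap(t1, g1, t2, g2, lam):
    """lam*D(t1||g1) + (1-lam)*D(t2||g2) - D(mixed theta || mixed gamma).

    Joint convexity of relative entropy says this is nonnegative."""
    mixed = relative_entropy(mix(t1, t2, lam), mix(g1, g2, lam))
    return (lam * relative_entropy(t1, g1)
            + (1.0 - lam) * relative_entropy(t2, g2) - mixed)


if __name__ == "__main__":
    rng = random.Random(0)
    gaps = []
    for _ in range(200):
        t1, g1, t2, g2 = (random_probability_vector(4, rng) for _ in range(4))
        gaps.append(convexity_gap(t1, g1, t2, g2, rng.random()))
    print("smallest gap:", min(gaps))
```

Since $J$ integrates $D(\theta(x)\,\|\,\gamma(x))$ over $x$ and both $\theta$ and $\gamma$ depend linearly on the path, pointwise joint convexity of this kind carries over to $J$.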

For a given pair of occupancy functions $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$, we denote

\[
\gamma^\varepsilon \doteq (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma
\]

and define the function $G : [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ by

\[
G[\varepsilon] \doteq J(\gamma^\varepsilon) = \int_0^\beta D\bigl(\theta^\varepsilon(x)\,\|\,\gamma^\varepsilon(x)\bigr)\,dx.
\]

To simplify the notation, we do not explicitly indicate the dependence of $G$ on $\gamma$ and $\tilde\gamma$. Lemma A.15 ensures that $G$ is a well-defined convex function for any $\gamma, \tilde\gamma \in \mathcal{O}(\alpha, \omega, \beta)$.

For the remainder of this section, we once again restrict attention to irreducible constraints, without loss of generality. The following lemma establishes the minimality of the extremals in an important special case. The proof uses the fact that the extremal curve and its derivative can be bounded away from zero.

Lemma A.16. Suppose that $(\alpha, \omega, \beta)$ are strictly positive, irreducible, feasible constraints, where the upper indices $K$ and $I$ of $\alpha$ and $\omega$ satisfy $K = I + 1$ in the exponential case, or $K = I$ in the polynomial case. Suppose that $\gamma$ is the extremal curve for these constraints defined in Theorem A.11, and that $\tilde\gamma$ is any competing occupancy function satisfying the same constraints. Then

\[
J(\gamma) \le J(\tilde\gamma).
\]

Proof. We construct the family of paths $\gamma^\varepsilon = (1 - \varepsilon)\gamma + \varepsilon\tilde\gamma$, with $G[\varepsilon] = J(\gamma^\varepsilon)$. It follows from convexity that $G$ is left and right differentiable wherever it is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.
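The final step of this argument rests on a general fact: a convex function $G$ on $[0, 1]$ satisfies $G[1] \ge G[0] + G'_+[0]$, so a vanishing right derivative at 0 forces $G[0] \le G[1]$. The sketch below illustrates this on a toy stand-in for $G$, namely $G[\varepsilon] = D\bigl((1-\varepsilon)p + \varepsilon q\,\|\,p\bigr)$ for two probability vectors $p$ and $q$. This toy cost, the vectors and the function names are assumptions made for illustration only; it is not the functional $J$ of the paper.

```python
import math


def relative_entropy(p, r):
    """D(p || r) = sum_i p_i * log(p_i / r_i), with 0 log 0 = 0."""
    total = 0.0
    for a, b in zip(p, r):
        if a > 0.0:
            if b <= 0.0:
                return math.inf
            total += a * math.log(a / b)
    return total


def toy_cost(eps, minimizer, competitor):
    """Toy G[eps] = D((1 - eps) * minimizer + eps * competitor || minimizer)."""
    path = [(1.0 - eps) * a + eps * b for a, b in zip(minimizer, competitor)]
    return relative_entropy(path, minimizer)


def right_derivative_at_zero(minimizer, competitor, h=1e-6):
    """One-sided finite-difference approximation of G'_+[0]."""
    return (toy_cost(h, minimizer, competitor)
            - toy_cost(0.0, minimizer, competitor)) / h


if __name__ == "__main__":
    # Hypothetical probability vectors chosen only for illustration.
    minimizer = [0.5, 0.3, 0.2]
    competitor = [0.2, 0.2, 0.6]
    slope = right_derivative_at_zero(minimizer, competitor)
    g0 = toy_cost(0.0, minimizer, competitor)
    g1 = toy_cost(1.0, minimizer, competitor)
    print("approximate right derivative at 0:", slope)
    print("G[0] <= G[1]:", g0 <= g1)
```

In the lemma itself the hard part is showing that the right derivative vanishes for the actual cost $J$, which is what the remainder of the proof establishes.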

It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$ and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\,dx$. After defining

\[
\eta(x) \doteq \tilde\psi(x) - \psi(x),
\]

we may write

\[
G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x, \varepsilon)\,dx.
\tag{A.30}
\]

We can assume without loss that $G[1] = J(\tilde\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential empty problem with positive terminal conditions, and so $\gamma_{k0}(x)$ and $\theta_{k0}(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$ since, for example,

\[
\gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma_{k,i-k}(x) \ge \alpha_i \gamma_{i0}(x).
\]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$ established in the proof of Theorem A.11 ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1 - \varepsilon)\gamma$ and $\theta^\varepsilon \ge (1 - \varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$ these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero, where $\tilde\theta$ may not exist.

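The partial derivative of the integrand of (A.30) is

\[
\frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^\varepsilon}\!(x)\,\eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^\varepsilon}\!(x)\,\dot\eta_i(x).
\]

The partial derivatives of $L$ are (in mixed notation)

\[
\frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)},
\qquad
\frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}.
\]

The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with the upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$ and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[
\left|\frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\right| < B \quad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta].
\]

Similar arguments may be used to establish this bound in the polynomial case. In that case the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.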

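We are now free to differentiate under the integral sign, so that

\[
G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\,dx
\]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[
\int_0^x \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma} dx' = \int_0^x \left.\left(-\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}}\right)\right|_{\gamma} dx'
\]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta \left[\frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x)\right] dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left(-\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x)\right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i\,\dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i\bigl(\eta_i(0) - \eta_i(\beta)\bigr) = 0.
\end{aligned}
\]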

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1 - \lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

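Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x)\,\|\,\tilde\gamma(x)\bigr)\,dx.
\]

It then follows that

\[
\begin{aligned}
J(\gamma) = \mathcal{J}(\alpha, \omega, \beta)
&\le \liminf_n\, \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}),\ \tilde\gamma(x_2^{(n)}),\ x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_n\, J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x)\,\|\,\tilde\gamma(x)\bigr)\,dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.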

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
          • Proof of Theorem
          • Examples and extensions
            • The classical occupancy problem
            • The overflow problem
            • Partial coupon collection with initial conditions
            • Extensions
              • Appendix
                • Preliminaries
                • Proof of Lemma
                • Euler--Lagrange equations
                • Characterization of the extremals under empty initial conditions
                • Interpretation of the twist parameter
                • Characterization of the extremal cost
                • Characterization of the extremals under general initial conditions
                • Extremal curves have globally minimal cost
                  • References
Page 50: Large deviation asymptotics for occupancy problems · Introduction. Urn occupancy problems center on the distribution of ... large deviations, calculus of variations, sample paths,

50 P DUPUIS C NUZMAN AND P WHITING

=isum

k=0

αkCk

α0C0ρkximinusk

(iminus k)(minus1)iψ

(i)0 (x)

Similarly we obtain

minusψi(x) =minusisum

k=0

αkβkβψkiminusk

(

xβkβ

)

=isum

k=0

αkximinusk

(iminus k)(minus1)iminusk d

iminusk+1

dximinusk+1ψk0(x)

=isum

k=0

αkCk

α0C0ρkximinusk

(iminus k)(minus1)i+1ψ

(i+1)0 (x)

so that the extremals satisfy the simple relation

θi(x)

γi(x)=minus

ψ(i+1)0 (x)

ψ(i)0 (x)

(A26)

which also arose in the case of empty initial conditions Because the polyno-mial portion of ψ0 has degree I the ratio θi(x)γi(x) is given by the constantρ for i gt I which establishes (as in the proof of Lemma A3) that

θI+(x)

1minusψI(x)= ρ(A27)

As shown in the proof of Theorem A1 the relations (A26) and (A27) aresufficient to show that the γi satisfy the EulerndashLagrange equations

In the polynomial case similar computations show that

(minus1)kψ(k)0 (x) =

α0D0

Dkψk0(x)(A28)

and that

θi(x) =isum

k=0

αkDk

α0D0

ximinusk

(iminus k)(minus1)i+1ψ

(i+1)0 (x) = γi(x)

minusψ(i+1)0 (x)

ψ(i)0 (x)

(A29)

for i= 0 I establishing the restricted set of equations (A10)

In order to demonstrate a strong minimum we will need the following

Corollary A12 For irreducible constraints the occupancy functionsdefined in Theorem A11 satisfy the integrated version of the EulerndashLagrangeequations given x1 isin [0 β) there are constants Ci i= 0 I minus 1 (and alsoi= I in the exponential case) depending on x1 only such that

int xprime

x1

partL

partψi(ψ(x) θ(x))dx+

partL

partθi(ψ(xprime) θ(xprime)) =Ci

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 51

for all xprime isin (x1 β)

Proof We may take α0 gt 0 without loss of generality By Theorem A11the extremals satisfy the usual form of the EulerndashLagrange equations The

above indefinite integral is log(minusψ(i+1)0 (x)ψ

(i)0 (x)) which is finite at both

endpoints because (minus1)iψ(i)0 (x) gt 0 for x isin [0 β) i le I (and i = I + 1 in

the exponential case) as shown in Lemma A2 The partial derivative withrespect to θi also exists for the same reason

Theorem A13 Suppose that γ is the extremal defined by Theorem A11for the feasible constraints (αωβ) Then the cost J(γ) is the solution tothe minimization problem (A20) subject to constraints (A21) and (A22)

Proof Consider first the exponential case The infinite sequence offunctions γi defined in the proof of Theorem A11 are shown in the proof tosatisfy the conditions of Lemma A3 Hence the cost is

J(γ) = β +infinsum

i=0

[

isum

k=0

αkγkiminusk(β) log|ψ(i)0 (β)|

]

minusKsum

k=0

αk log|ψ(k)0 (0)|

=Ksum

k=0

αk

infinsum

j=0

γkj(β) log

ψ(k+j)0 (β)

ψ(k)0 (0)eminusβ

The fact that ψk0(0) = 1 together with (A25) implies that ψ(k)0 (x) = ψ

(k)0 (0)times

ψk0(x) so that

ψ(k+j)0 (x)

ψ(k)0 (0)eminusβ

=

(

βkβ

)j

ψ(j)k0

(

xβkβ

)

eβ =γkj(x)

(minusx)jeminusβj

Then

J(γ) =Ksum

k=0

αkD(γkmiddot(β)P(β))

By construction the endpoints γkj(β) coincide with the optimal argu-ments πlowastkj of the minimization problem

The proof of the polynomial case is almost identical except that we useLemma A4 and (A28)

A8 Extremal curves have globally minimal cost In this section weprove the following theorem

52 P DUPUIS C NUZMAN AND P WHITING

Theorem A14 (Strong minimum) Given feasible constraints (αωβ)let γ be the corresponding extremal occupancy path defined in Theorem A11and let γ be any other occupancy path satisfying the same constraints ThenJ(γ)le J(γ)

We first introduce some notation Let O denote the set of valid occupancyfunctions that is vector functions γ such that the cumulative occupancyfunctions ψ satisfy the conditions of Lemma 21 and let O(αωβ)

= γ isin

O γ(0) = αγ(β) = ω be the subset of valid occupancy functions satisfyingfeasible constraints (αωβ) The proof of the following lemma is a straight-forward consequence of the convexity of the map (θ γ)rarrD(θγ) and henceomitted

Lemma A15

(a) O(αωβ) is a convex set(b) J(γ) restricted to O(αωβ) is a convex function

For a given pair of occupancy functions γ γ isinO(αωβ) we denote γε= (1minus ε)γ + εγ

and define the function G [01]rarrR+ cup infin by

G[ε]= J(γε) =

int β

0D(θε(x)γε(x))dx

To simplify the notation we do not explicitly indicate the dependence of Gon γ and γ Lemma A15 ensures that G is a well-defined convex functionfor any γ γ isinO(αωβ)

For the remainder of this section we once again restrict attention to irre-ducible constraints without loss of generality The following lemma estab-lishes the minimality of the extremals in an important special case The proofuses the fact that the extremal curve and its derivative can be bounded awayfrom zero

Lemma A16 Suppose that (αωβ) are strictly positive irreduciblefeasible constraints where the upper indices K and I of α and ω satisfyK = I+1 in the exponential case or K = I in the polynomial case Supposethat γ is the extremal curve for these constraints defined in Theorem A11and that γ is any competing occupancy function satisfying the same con-straints Then

J(γ)le J(γ)

Proof We construct the family of paths γε = (1minusε)γ+εγ with G[ε] = J(γε)It follows from convexity that G is left and right differentiable wherever it

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 53

is finite We will show that Gprime+[0] = 0 where Gprime

+[ε] denotes the right deriva-tive of G The convexity of G then implies the desired result

It will be convenient to work with the cumulative occupancy functions ψεand as in Section A3 we will mix notation by writing for example J(γ) =int β0 L(ψ θ)dxAfter defining

η(x)= ψ(x)minusψ(x)

we may write

G[ε] =

int β

0L(ψ+ εη θ minus εη)dx

=

int β

0g(x ε)dx(A30)

We can assume without loss that G(1) = J(γ)ltinfin since there is nothingto prove otherwise We wish to show that differentiation under the integralsign with respect to ε is valid in a neighborhood of 0 The validity of thisoperation will follow from [23] Corollary 392 if we can provide a constantbound on the partial derivative of g with respect to ε for almost everyx isin [0 β]

To construct this bound we will first establish that the components γi(x)and derivatives θi(x) =minusψi(x) are uniformly bounded away from zero Forspecificity we first assume that (αωβ) are exponential constraints Notethat in the empty case studied in Section A4 γ0(x) decreases monoton-ically to γ0(β) = ω0 and θ0(x) decreases monotonically to θ0(β) Inspec-tion of the formula for ψ0 in Theorem A1 reveals that θ0(β) = ω1β ifI gt 0 and θ0(β) =CP1(ρβ)β otherwise Hence for any exponential prob-lem with empty initial conditions and positive terminal conditions ωi γ0and θ0 are uniformly bounded away from zero For the constraints (αωβ)under consideration each of the associated subproblems is an exponentialempty problem with positive terminal conditions and so γk0(x) and θk0(x)are uniformly bounded from zero for each subproblem These in turn lowerbound the overall γi and θi since for example

γi(x) =isum

k=0

αkγkiminusk(x)ge αiγi0(x)

The catch-all term γI+(x) is monotonically increasing and hence satisfiesγI+(x)ge αI+1 gt 0 The identity θI+(x) = ργI+(x) established in the proofof Theorem A11 ensures a similar bound on θI+(x)

Note that γε ge (1minus ε)γ and θε ge (1minus ε)θ so that for each 0le ε lt 1 thesefunctions are also uniformly bounded from zero on [0 β] Indeed given anarbitrary ε0 lt 1 there are positive lower bounds on γε and θε which holduniformly for x isin [0 β] and ε isin [0 ε0] To be precise these inequalities and

54 P DUPUIS C NUZMAN AND P WHITING

bounds hold everywhere except possibly on a set of measure zero whereθ may not exist

The partial derivative of the integrand of (A30) is

part

partεg(x ε) =

Isum

i=0

partL

partψi

γε(x)ηi(x)minus

Isum

i=0

partL

partθi

γε(x)ηi(x)

The partial derivatives of L are (in mixed notation)

partL

partψi(x) =minus

θi(x)

γi(x)+θi+1(x)

γi+1(x)

partL

partθi(x) = log

θi(x)

γi(x)minus log

θI+(x)

1minus ψI(x)

The uniform lower bounds on γε and θε together with upper bounds γε le 1θε le 1 and the boundedness of η and η combine to establish that there is afinite B such that

part

partεg(x ε)

ltB for ε isin [0 ε0] ae x isin [0 β]

Similar arguments may be used to establish this bound in the polynomialcase In that case the terms ψI = 1 and θI = 0 do not play a role and theexpression for the partial derivative of L with respect to θi simplifies

We are now free to differentiate under the integral sign so that

Gprime[ε] =

int β

0

part

partεg(x ε)dx

for all ε isin [0 ε0) We note thatint x

0

partL

partψi

γdxprime =

int x

0

minusθiγi

+θi+1

γi+1

γdxprime

is absolutely continuous as a function of x since it is the difference of twocontinuous monotone functions with bounded derivatives Applying integra-tion by parts for absolutely continuous functions ([23] 361 page 209) tothe first term we obtain

\[
\begin{aligned}
G'_+[0] &= \int_0^\beta \left[ \frac{\partial L}{\partial \psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial \theta}(x) \cdot \dot\eta(x) \right] dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial \psi}(x')\, dx' - \frac{\partial L}{\partial \theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x)\, dx \\
&= \sum_{i=0}^{I} C_i \bigl( \eta_i(0) - \eta_i(\beta) \bigr) = 0.
\end{aligned}
\]

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\tilde\gamma$ have the same beginning and end points.
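The argument relies on $G$ being convex on $[0, 1]$, which follows from the joint convexity of relative entropy in its two arguments: a convex function whose right derivative at $0$ vanishes cannot take a smaller value at $1$. The sketch below checks midpoint convexity of a discretized $G$ numerically. It is only an illustration under simplifying assumptions: the paths are random probability vectors on a grid, the names `make_path`, `STEPS`, `SIZE` and `DX` are hypothetical, and the constraint $\theta = -\dot\psi$ linking the two arguments in the occupancy problem is ignored.

```python
import math
import random


def relative_entropy(theta, gamma):
    """D(theta || gamma) = sum_i theta_i * log(theta_i / gamma_i), with 0 log 0 = 0."""
    total = 0.0
    for t, c in zip(theta, gamma):
        if t > 0.0:
            total += t * math.log(t / c)
    return total


def random_probability_vector(rng, size):
    """Strictly positive probability vector (weights are shifted away from zero)."""
    weights = [rng.random() + 0.05 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def mix(a, b, eps):
    """Convex combination (1 - eps) * a + eps * b, componentwise."""
    return [(1.0 - eps) * x + eps * y for x, y in zip(a, b)]


def make_path(rng, steps, size):
    """Discretized toy path: one (theta, gamma) pair per grid point."""
    return [
        (random_probability_vector(rng, size), random_probability_vector(rng, size))
        for _ in range(steps)
    ]


def G(eps, path, path_alt, dx):
    """Riemann sum for the integral of D(theta_eps || gamma_eps)."""
    total = 0.0
    for (th, ga), (th_alt, ga_alt) in zip(path, path_alt):
        total += relative_entropy(mix(th, th_alt, eps), mix(ga, ga_alt, eps)) * dx
    return total


def is_midpoint_convex(path, path_alt, dx, grid=20, tol=1e-12):
    """Check G((a + b) / 2) <= (G(a) + G(b)) / 2 for all grid pairs a <= b in [0, 1]."""
    for i in range(grid + 1):
        for j in range(i, grid + 1):
            a, b = i / grid, j / grid
            mid_value = G(0.5 * (a + b), path, path_alt, dx)
            chord = 0.5 * (G(a, path, path_alt, dx) + G(b, path, path_alt, dx))
            if mid_value > chord + tol:
                return False
    return True


rng = random.Random(0)
STEPS, SIZE = 40, 4
DX = 1.0 / STEPS
path = make_path(rng, STEPS, SIZE)
path_alt = make_path(rng, STEPS, SIZE)
```

Calling `is_midpoint_convex(path, path_alt, DX)` checks the convexity inequality on a grid of pairs in $[0, 1]$; it is a numerical check, not a proof.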

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$ and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\tilde\gamma$ is an alternate occupancy function. It may be supposed that $\tilde\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\tilde\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\gamma + (1-\lambda)\tilde\gamma$ which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

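Given $x_1^{(n)} \downarrow 0$ and $x_2^{(n)} \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\tilde\gamma(x_1^{(n)})$ and terminal point $\tilde\gamma(x_2^{(n)})$. By Lemma A.16,

\[
J(\gamma^{(n)}) \le \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\, dx.
\]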

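It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha, \omega, \beta) \\
&\le \liminf_n \mathcal{J}\bigl(\tilde\gamma(x_1^{(n)}), \tilde\gamma(x_2^{(n)}), x_2^{(n)} - x_1^{(n)}\bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x_1^{(n)}}^{x_2^{(n)}} D\bigl(\tilde\theta(x) \,\|\, \tilde\gamma(x)\bigr)\, dx \\
&= J(\tilde\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.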

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithm 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting




  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
          • Proof of Theorem
          • Examples and extensions
            • The classical occupancy problem
            • The overflow problem
            • Partial coupon collection with initial conditions
            • Extensions
              • Appendix
                • Preliminaries
                • Proof of Lemma
                • Euler--Lagrange equations
                • Characterization of the extremals under empty initial conditions
                • Interpretation of the twist parameter
                • Characterization of the extremal cost
                • Characterization of the extremals under general initial conditions
                • Extremal curves have globally minimal cost
                  • References
Page 52: Large deviation asymptotics for occupancy problems · Introduction. Urn occupancy problems center on the distribution of ... large deviations, calculus of variations, sample paths,

52 P DUPUIS C NUZMAN AND P WHITING

Theorem A14 (Strong minimum) Given feasible constraints (αωβ)let γ be the corresponding extremal occupancy path defined in Theorem A11and let γ be any other occupancy path satisfying the same constraints ThenJ(γ)le J(γ)

We first introduce some notation Let O denote the set of valid occupancyfunctions that is vector functions γ such that the cumulative occupancyfunctions ψ satisfy the conditions of Lemma 21 and let O(αωβ)

= γ isin

O γ(0) = αγ(β) = ω be the subset of valid occupancy functions satisfyingfeasible constraints (αωβ) The proof of the following lemma is a straight-forward consequence of the convexity of the map (θ γ)rarrD(θγ) and henceomitted

Lemma A15

(a) O(αωβ) is a convex set(b) J(γ) restricted to O(αωβ) is a convex function

For a given pair of occupancy functions γ γ isinO(αωβ) we denote γε= (1minus ε)γ + εγ

and define the function G [01]rarrR+ cup infin by

G[ε]= J(γε) =

int β

0D(θε(x)γε(x))dx

To simplify the notation we do not explicitly indicate the dependence of Gon γ and γ Lemma A15 ensures that G is a well-defined convex functionfor any γ γ isinO(αωβ)

For the remainder of this section we once again restrict attention to irre-ducible constraints without loss of generality The following lemma estab-lishes the minimality of the extremals in an important special case The proofuses the fact that the extremal curve and its derivative can be bounded awayfrom zero

Lemma A16 Suppose that (αωβ) are strictly positive irreduciblefeasible constraints where the upper indices K and I of α and ω satisfyK = I+1 in the exponential case or K = I in the polynomial case Supposethat γ is the extremal curve for these constraints defined in Theorem A11and that γ is any competing occupancy function satisfying the same con-straints Then

J(γ)le J(γ)

Proof We construct the family of paths γε = (1minusε)γ+εγ with G[ε] = J(γε)It follows from convexity that G is left and right differentiable wherever it

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 53

is finite We will show that Gprime+[0] = 0 where Gprime

+[ε] denotes the right deriva-tive of G The convexity of G then implies the desired result

It will be convenient to work with the cumulative occupancy functions ψεand as in Section A3 we will mix notation by writing for example J(γ) =int β0 L(ψ θ)dxAfter defining

η(x)= ψ(x)minusψ(x)

we may write

G[ε] =

int β

0L(ψ+ εη θ minus εη)dx

=

int β

0g(x ε)dx(A30)

We can assume without loss that G(1) = J(γ)ltinfin since there is nothingto prove otherwise We wish to show that differentiation under the integralsign with respect to ε is valid in a neighborhood of 0 The validity of thisoperation will follow from [23] Corollary 392 if we can provide a constantbound on the partial derivative of g with respect to ε for almost everyx isin [0 β]

To construct this bound we will first establish that the components γi(x)and derivatives θi(x) =minusψi(x) are uniformly bounded away from zero Forspecificity we first assume that (αωβ) are exponential constraints Notethat in the empty case studied in Section A4 γ0(x) decreases monoton-ically to γ0(β) = ω0 and θ0(x) decreases monotonically to θ0(β) Inspec-tion of the formula for ψ0 in Theorem A1 reveals that θ0(β) = ω1β ifI gt 0 and θ0(β) =CP1(ρβ)β otherwise Hence for any exponential prob-lem with empty initial conditions and positive terminal conditions ωi γ0and θ0 are uniformly bounded away from zero For the constraints (αωβ)under consideration each of the associated subproblems is an exponentialempty problem with positive terminal conditions and so γk0(x) and θk0(x)are uniformly bounded from zero for each subproblem These in turn lowerbound the overall γi and θi since for example

γi(x) =isum

k=0

αkγkiminusk(x)ge αiγi0(x)

The catch-all term γI+(x) is monotonically increasing and hence satisfiesγI+(x)ge αI+1 gt 0 The identity θI+(x) = ργI+(x) established in the proofof Theorem A11 ensures a similar bound on θI+(x)

Note that γε ge (1minus ε)γ and θε ge (1minus ε)θ so that for each 0le ε lt 1 thesefunctions are also uniformly bounded from zero on [0 β] Indeed given anarbitrary ε0 lt 1 there are positive lower bounds on γε and θε which holduniformly for x isin [0 β] and ε isin [0 ε0] To be precise these inequalities and

54 P DUPUIS C NUZMAN AND P WHITING

bounds hold everywhere except possibly on a set of measure zero whereθ may not exist

The partial derivative of the integrand of (A30) is

part

partεg(x ε) =

Isum

i=0

partL

partψi

γε(x)ηi(x)minus

Isum

i=0

partL

partθi

γε(x)ηi(x)

The partial derivatives of L are (in mixed notation)

partL

partψi(x) =minus

θi(x)

γi(x)+θi+1(x)

γi+1(x)

partL

partθi(x) = log

θi(x)

γi(x)minus log

θI+(x)

1minus ψI(x)

The uniform lower bounds on γε and θε together with upper bounds γε le 1θε le 1 and the boundedness of η and η combine to establish that there is afinite B such that

part

partεg(x ε)

ltB for ε isin [0 ε0] ae x isin [0 β]

Similar arguments may be used to establish this bound in the polynomialcase In that case the terms ψI = 1 and θI = 0 do not play a role and theexpression for the partial derivative of L with respect to θi simplifies

We are now free to differentiate under the integral sign so that

Gprime[ε] =

int β

0

part

partεg(x ε)dx

for all ε isin [0 ε0) We note thatint x

0

partL

partψi

γdxprime =

int x

0

minusθiγi

+θi+1

γi+1

γdxprime

is absolutely continuous as a function of x since it is the difference of twocontinuous monotone functions with bounded derivatives Applying integra-tion by parts for absolutely continuous functions ([23] 361 page 209) tothe first term we obtain

Gprime+[0] =

int β

0

partL

partψ(x) middot η(x)minus

partL

partθ(x) middot η(x)

dx

=

int β

0η(x) middot

(

minus

int x

0

partL

partψ(xprime)dxprime minus

partL

partθ(x)

)

dx

=minus

int β

0

Isum

i=0

Ciηi(x)dx

=Isum

i=0

Ci(ηi(0)minus ηi(β)) = 0

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 55

The constants Ci appearing in the third equality come from the integratedversion of the EulerndashLagrange equations (Corollary A12) while the lastequality is due to the fact that γ and γ have the same beginning and endpoints

We now extend the lemma to show that the extremal is a global minimizereven when some of the initial and terminal points are zero Without loss ofgenerality we suppose that α0 gt 0 and note that the extremals defined inTheorem A1 and in Theorem A11 are strictly positive except possibly atthe initial and final times t= 0 and t= β

Let γ be this extremal and suppose that γ is an alternate occupancyfunction It may be supposed that γ also lies on the boundary only at t= 0or t= β since if there is a γ with lower cost than γ the convexity of J impliesthat there is another occupancy path of the form λγ + (1minus λ)γ which alsohas lower cost than γ and which avoids the boundary everywhere that γdoes

Given x(n)1 darr 0 and x

(n)2 uarr β let γ(n) be the extremal curve with initial

point γ(x(n)1 ) and terminal point γ(x

(n)2 ) By Lemma A16

J(γ(n))le

int x(n)2

x(n)1

D(θ(x)γ(x))dx

It then follows that

J(γ) = J (αωβ)

le lim infn

J (γ(x(n)1 ) γ(x

(n)2 ) x

(n)2 minus x

(n)1 )

= lim infn

J(γ(n))

le limn

int x(n)2

x(n)1

D(θ(x)γ(x))dx

= J(γ)

with the first two equalities following from Theorem A13 and the definitionof J (αωβ) the first inequality following from Corollary A7 and the lastequality from the monotone convergence theorem

Acknowledgments We would like to thank an associate editor and tworeferees for suggestions that significantly improved this paper

REFERENCES

[1] Avram F Dai J G and Hasenbein J J (2001) Explicit solutions for variationalproblems in the quadrant Queueing Systems 37 259ndash289 MR1833666

56 P DUPUIS C NUZMAN AND P WHITING

[2] Barton D E and David F N (1959) Contagious occupancy J Roy Stat SocSer B Stat Methodol 21 120ndash133

[3] Barton D E and David F N (1959) Sequential occupancy Biometrika 46 218ndash223 MR100899

[4] Bertsekas D P (1982) Constrained Optimization and Lagrange Multiplier Meth-ods Academic Press New York MR690767

[5] Billingsley P (1968) Convergence of Probability Measures Wiley New YorkMR233396

[6] Boucheron S Gamboa F and Leonard C (2002) Bins and balls Large devia-tions of the empirical occupancy process Ann Appl Probab 2 1ndash30 MR1910642

[7] Bucklew J (1990) Large Deviations Techniques in Decision Simulation and Es-timation Wiley New York MR1067716

[8] Cesari L (1983) Optimization Theory and Applications Springer New YorkMR688142

is finite. We will show that $G'_+[0] = 0$, where $G'_+[\varepsilon]$ denotes the right derivative of $G$. The convexity of $G$ then implies the desired result.
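
The final step can be spelled out as follows. This is only a sketch, under the assumption (taken from the surrounding argument) that $G[\varepsilon] = J(\gamma^\varepsilon)$ with $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$, where $\bar\gamma$ denotes the alternate path, so that $G[0] = J(\gamma)$ and $G[1] = J(\bar\gamma)$.

```latex
% For convex G on [0,1], the difference quotient (G[eps] - G[0])/eps is
% nondecreasing in eps, so its limit as eps -> 0+ bounds its value at eps = 1.
\[
G[1] - G[0] \;\ge\; \lim_{\varepsilon \downarrow 0} \frac{G[\varepsilon] - G[0]}{\varepsilon}
\;=\; G'_+[0] \;=\; 0
\quad\Longrightarrow\quad
J(\bar\gamma) = G[1] \;\ge\; G[0] = J(\gamma).
\]
```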

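It will be convenient to work with the cumulative occupancy functions $\psi^\varepsilon$, and, as in Section A.3, we will mix notation by writing, for example, $J(\gamma) = \int_0^\beta L(\psi, \theta)\,dx$. After defining

\[ \eta(x) = \bar\psi(x) - \psi(x), \]

we may write

\[ G[\varepsilon] = \int_0^\beta L(\psi + \varepsilon\eta,\ \theta - \varepsilon\dot\eta)\,dx = \int_0^\beta g(x, \varepsilon)\,dx. \tag{A.30} \]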

We can assume without loss that $G(1) = J(\bar\gamma) < \infty$, since there is nothing to prove otherwise. We wish to show that differentiation under the integral sign with respect to $\varepsilon$ is valid in a neighborhood of 0. The validity of this operation will follow from [23], Corollary 39.2, if we can provide a constant bound on the partial derivative of $g$ with respect to $\varepsilon$ for almost every $x \in [0, \beta]$.

To construct this bound, we will first establish that the components $\gamma_i(x)$ and derivatives $\theta_i(x) = -\dot\psi_i(x)$ are uniformly bounded away from zero. For specificity, we first assume that $(\alpha, \omega, \beta)$ are exponential constraints. Note that in the empty case studied in Section A.4, $\gamma_0(x)$ decreases monotonically to $\gamma_0(\beta) = \omega_0$ and $\theta_0(x)$ decreases monotonically to $\theta_0(\beta)$. Inspection of the formula for $\psi_0$ in Theorem A.1 reveals that $\theta_0(\beta) = \omega_1/\beta$ if $I > 0$ and $\theta_0(\beta) = C P_1(\rho\beta)/\beta$ otherwise. Hence, for any exponential problem with empty initial conditions and positive terminal conditions $\omega_i$, $\gamma_0$ and $\theta_0$ are uniformly bounded away from zero. For the constraints $(\alpha, \omega, \beta)$ under consideration, each of the associated subproblems is an exponential, empty problem with positive terminal conditions, and so $\gamma^k_0(x)$ and $\theta^k_0(x)$ are uniformly bounded from zero for each subproblem. These in turn lower bound the overall $\gamma_i$ and $\theta_i$, since, for example,

\[ \gamma_i(x) = \sum_{k=0}^{i} \alpha_k \gamma^k_{i-k}(x) \ge \alpha_i \gamma^i_0(x). \]

The catch-all term $\gamma_{I+}(x)$ is monotonically increasing and hence satisfies $\gamma_{I+}(x) \ge \alpha_{I+1} > 0$. The identity $\theta_{I+}(x) = \rho\gamma_{I+}(x)$, established in the proof of Theorem A.11, ensures a similar bound on $\theta_{I+}(x)$.

Note that $\gamma^\varepsilon \ge (1-\varepsilon)\gamma$ and $\theta^\varepsilon \ge (1-\varepsilon)\theta$, so that for each $0 \le \varepsilon < 1$, these functions are also uniformly bounded from zero on $[0, \beta]$. Indeed, given an arbitrary $\varepsilon_0 < 1$, there are positive lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$ which hold uniformly for $x \in [0, \beta]$ and $\varepsilon \in [0, \varepsilon_0]$. To be precise, these inequalities and bounds hold everywhere except possibly on a set of measure zero where $\bar\theta$ may not exist.
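
The comparison with $(1-\varepsilon)\gamma$ is elementary. A brief sketch, assuming that $\gamma^\varepsilon = (1-\varepsilon)\gamma + \varepsilon\bar\gamma$ with $\bar\gamma \ge 0$ (as the definition $\psi^\varepsilon = \psi + \varepsilon\eta$ suggests), and writing $c > 0$ for a hypothetical uniform lower bound on the components of $\gamma$ (a symbol introduced here only for illustration):

```latex
% Dropping the nonnegative term eps * \bar\gamma_i gives the lower bound;
% c is an assumed uniform lower bound on gamma_i over [0, beta].
\[
\gamma^\varepsilon_i(x) = (1-\varepsilon)\gamma_i(x) + \varepsilon\bar\gamma_i(x)
\;\ge\; (1-\varepsilon)\gamma_i(x)
\;\ge\; (1-\varepsilon_0)\,c \;>\; 0,
\qquad 0 \le \varepsilon \le \varepsilon_0,\ x \in [0,\beta].
\]
```

The same reasoning would apply to $\theta^\varepsilon = (1-\varepsilon)\theta + \varepsilon\bar\theta$ wherever $\bar\theta$ exists, since the rates are nonnegative.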

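The partial derivative of the integrand of (A.30) is

\[ \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) = \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma^\varepsilon(x)} \eta_i(x) - \sum_{i=0}^{I} \left.\frac{\partial L}{\partial\theta_i}\right|_{\gamma^\varepsilon(x)} \dot\eta_i(x). \]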

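The partial derivatives of $L$ are (in mixed notation)

\[ \frac{\partial L}{\partial\psi_i}(x) = -\frac{\theta_i(x)}{\gamma_i(x)} + \frac{\theta_{i+1}(x)}{\gamma_{i+1}(x)}, \]

\[ \frac{\partial L}{\partial\theta_i}(x) = \log\frac{\theta_i(x)}{\gamma_i(x)} - \log\frac{\theta_{I+}(x)}{1 - \psi_I(x)}. \]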

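The uniform lower bounds on $\gamma^\varepsilon$ and $\theta^\varepsilon$, together with upper bounds $\gamma^\varepsilon \le 1$, $\theta^\varepsilon \le 1$, and the boundedness of $\eta$ and $\dot\eta$, combine to establish that there is a finite $B$ such that

\[ \left| \frac{\partial}{\partial\varepsilon} g(x, \varepsilon) \right| < B \quad \text{for } \varepsilon \in [0, \varepsilon_0],\ \text{a.e. } x \in [0, \beta]. \]

Similar arguments may be used to establish this bound in the polynomial case. In that case, the terms $\psi_I = 1$ and $\theta_I = 0$ do not play a role, and the expression for the partial derivative of $L$ with respect to $\theta_i$ simplifies.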

We are now free to differentiate under the integral sign, so that

\[ G'[\varepsilon] = \int_0^\beta \frac{\partial}{\partial\varepsilon} g(x, \varepsilon)\,dx \]

for all $\varepsilon \in [0, \varepsilon_0)$. We note that

\[ \int_0^x \left.\frac{\partial L}{\partial\psi_i}\right|_{\gamma} dx' = \int_0^x \left.\left( -\frac{\theta_i}{\gamma_i} + \frac{\theta_{i+1}}{\gamma_{i+1}} \right)\right|_{\gamma} dx' \]

is absolutely continuous as a function of $x$, since it is the difference of two continuous monotone functions with bounded derivatives. Applying integration by parts for absolutely continuous functions ([23], 36.1, page 209) to the first term, we obtain

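\[
\begin{aligned}
G'_+[0] &= \int_0^\beta \left[ \frac{\partial L}{\partial\psi}(x) \cdot \eta(x) - \frac{\partial L}{\partial\theta}(x) \cdot \dot\eta(x) \right] dx \\
&= \int_0^\beta \dot\eta(x) \cdot \left( -\int_0^x \frac{\partial L}{\partial\psi}(x')\,dx' - \frac{\partial L}{\partial\theta}(x) \right) dx \\
&= -\int_0^\beta \sum_{i=0}^{I} C_i \dot\eta_i(x)\,dx \\
&= \sum_{i=0}^{I} C_i \bigl( \eta_i(0) - \eta_i(\beta) \bigr) = 0.
\end{aligned}
\]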

The constants $C_i$ appearing in the third equality come from the integrated version of the Euler–Lagrange equations (Corollary A.12), while the last equality is due to the fact that $\gamma$ and $\bar\gamma$ have the same beginning and end points.
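
For the second equality in the display for $G'_+[0]$, the integration by parts can be written out as follows. This is only a sketch: the symbol $F$ is introduced here for illustration, dots denote derivatives in $x$, and the step uses $\eta(0) = \eta(\beta) = 0$, which holds because the two paths share their end points.

```latex
% F is the running integral of dL/dpsi; the boundary term vanishes
% because eta(0) = eta(beta) = 0.
\[
F(x) = \int_0^x \frac{\partial L}{\partial\psi}(x')\,dx',
\qquad
\int_0^\beta \frac{\partial L}{\partial\psi}(x)\cdot\eta(x)\,dx
= \Bigl[ F(x)\cdot\eta(x) \Bigr]_0^\beta - \int_0^\beta F(x)\cdot\dot\eta(x)\,dx
= -\int_0^\beta \dot\eta(x)\cdot F(x)\,dx .
\]
```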

We now extend the lemma to show that the extremal is a global minimizer even when some of the initial and terminal points are zero. Without loss of generality, we suppose that $\alpha_0 > 0$, and note that the extremals defined in Theorem A.1 and in Theorem A.11 are strictly positive except possibly at the initial and final times, $t = 0$ and $t = \beta$.

Let $\gamma$ be this extremal and suppose that $\bar\gamma$ is an alternate occupancy function. It may be supposed that $\bar\gamma$ also lies on the boundary only at $t = 0$ or $t = \beta$, since if there is a $\bar\gamma$ with lower cost than $\gamma$, the convexity of $J$ implies that there is another occupancy path of the form $\lambda\bar\gamma + (1-\lambda)\gamma$, which also has lower cost than $\gamma$ and which avoids the boundary everywhere that $\gamma$ does.

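Given $x^{(n)}_1 \downarrow 0$ and $x^{(n)}_2 \uparrow \beta$, let $\gamma^{(n)}$ be the extremal curve with initial point $\bar\gamma(x^{(n)}_1)$ and terminal point $\bar\gamma(x^{(n)}_2)$. By Lemma A.16,

\[ J(\gamma^{(n)}) \le \int_{x^{(n)}_1}^{x^{(n)}_2} D\bigl(\bar\theta(x) \,\|\, \bar\gamma(x)\bigr)\,dx. \]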

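It then follows that

\[
\begin{aligned}
J(\gamma) &= \mathcal{J}(\alpha, \omega, \beta) \\
&\le \liminf_n \mathcal{J}\bigl(\bar\gamma(x^{(n)}_1),\ \bar\gamma(x^{(n)}_2),\ x^{(n)}_2 - x^{(n)}_1\bigr) \\
&= \liminf_n J(\gamma^{(n)}) \\
&\le \lim_n \int_{x^{(n)}_1}^{x^{(n)}_2} D\bigl(\bar\theta(x) \,\|\, \bar\gamma(x)\bigr)\,dx \\
&= J(\bar\gamma),
\end{aligned}
\]

with the first two equalities following from Theorem A.13 and the definition of $\mathcal{J}(\alpha, \omega, \beta)$, the first inequality following from Corollary A.7, and the last equality from the monotone convergence theorem.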

Acknowledgments. We would like to thank an associate editor and two referees for suggestions that significantly improved this paper.

REFERENCES

[1] Avram, F., Dai, J. G. and Hasenbein, J. J. (2001). Explicit solutions for variational problems in the quadrant. Queueing Systems 37 259–289. MR1833666

[2] Barton, D. E. and David, F. N. (1959). Contagious occupancy. J. Roy. Stat. Soc. Ser. B Stat. Methodol. 21 120–133.

[3] Barton, D. E. and David, F. N. (1959). Sequential occupancy. Biometrika 46 218–223. MR100899

[4] Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. MR690767

[5] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR233396

[6] Boucheron, S., Gamboa, F. and Leonard, C. (2002). Bins and balls: Large deviations of the empirical occupancy process. Ann. Appl. Probab. 2 1–30. MR1910642

[7] Bucklew, J. (1990). Large Deviations Techniques in Decision, Simulation and Estimation. Wiley, New York. MR1067716

[8] Cesari, L. (1983). Optimization Theory and Applications. Springer, New York. MR688142

[9] Charalambides, A. (1997). A unified derivation of occupancy and sequential occupancy distributions. In Advances in Combinatorial Methods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 259–273. Birkhäuser, Boston. MR1456737

[10] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov processes with discontinuous statistics. I. General upper bounds. Ann. Probab. 19 1280–1297. MR1112416

[11] Dupuis, P. and Ellis, R. (1997). A Weak Convergence Approach to Large Deviations. Wiley, New York. MR1431744

[12] Eramo, V. and Listanti, M. (2000). Packet loss in a bufferless optical WDM switch employing shared tuneable wavelength converters. Lightwave Technology 18 1818–1833.

[13] Eramo, V., Listanti, M., Nuzman, C. and Whiting, P. (2002). Optical switch dimensioning and the classical occupancy problem. Int. J. Commun. Syst. 15 127–141.

[14] Feller, W. (1968). An Introduction to Probability Theory and Its Applications 1. Wiley, New York. MR228020

[15] Feller, W. (1971). An Introduction to Probability Theory and Its Applications 2. Wiley, New York. MR270403

[16] Harkness, W. L. (1971). The classical occupancy problem revisited. In Random Counts in Physical Sciences (G. P. Patil, ed.) 107–126. Pennsylvania State Univ. Press.

[17] He, F. (2000). Estimating species abundance from occurrence. American Naturalist 156 453–459.

[18] Holst, L. (1977). Some asymptotic results for occupancy problems. Ann. Probab. 5 1028–1035. MR443027

[19] Holst, L. (1986). On the coupon collectors and other urn problems. Internat. Statist. Rev. 54 15–27. MR959649

[20] Johnson, N. L. and Kotz, S. (1977). Urn Models and their Applications. Wiley, New York. MR488211

[21] Kamath, A., Motwani, R., Palem, K. and Spirakis, P. (1995). Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures Algorithms 7 59–80. MR1346284

[22] Mandjes, M. and Ridder, A. (2002). A large deviations analysis of the transient of a queue with many Markov fluid inputs: Approximations and fast simulation. ACM Transactions on Modeling and Computer Simulation 12 1–26.

[23] McShane, E. J. (1947). Integration. Princeton Univ. Press.

[24] Ramakrishna, M. V. and Mukhopadhyay, P. (1988). Analysis of bounded disorder file organisation. In Proc. of the 7th ACM Symposium on Principles of DataBase Systems, Austin, Texas 117–125.

[25] Sagan, H. (1969). Introduction to the Calculus of Variations. Dover, New York. MR1210325

[26] Shwartz, A. and Weiss, A. (1995). Large Deviations for Performance Analysis. Chapman and Hall, New York. MR1335456

[27] Shwartz, A. and Weiss, A. (2002). Large deviations with diminishing rates. Unpublished manuscript.

[28] Weiss, A. (1994). Large deviations for the occupancy problem. Private communication.

P. Dupuis
Lefschetz Center for Dynamical Studies
Brown University
Providence, Rhode Island 02912
USA
e-mail: dupuis@dam.brown.edu

C. Nuzman
P. Whiting
Bell Labs
600 Mountain Avenue
Murray Hill, New Jersey 07974
USA
e-mail: cnuzman@lucent.com
e-mail: pawhiting@lucent.com
url: cm.bell-labs.com/who/nuzman
url: cm.bell-labs.com/who/pawhiting

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
          • Proof of Theorem
          • Examples and extensions
            • The classical occupancy problem
            • The overflow problem
            • Partial coupon collection with initial conditions
            • Extensions
              • Appendix
                • Preliminaries
                • Proof of Lemma
                • Euler--Lagrange equations
                • Characterization of the extremals under empty initial conditions
                • Interpretation of the twist parameter
                • Characterization of the extremal cost
                • Characterization of the extremals under general initial conditions
                • Extremal curves have globally minimal cost
                  • References
Page 54: Large deviation asymptotics for occupancy problems · Introduction. Urn occupancy problems center on the distribution of ... large deviations, calculus of variations, sample paths,

54 P DUPUIS C NUZMAN AND P WHITING

bounds hold everywhere except possibly on a set of measure zero whereθ may not exist

The partial derivative of the integrand of (A30) is

part

partεg(x ε) =

Isum

i=0

partL

partψi

γε(x)ηi(x)minus

Isum

i=0

partL

partθi

γε(x)ηi(x)

The partial derivatives of L are (in mixed notation)

partL

partψi(x) =minus

θi(x)

γi(x)+θi+1(x)

γi+1(x)

partL

partθi(x) = log

θi(x)

γi(x)minus log

θI+(x)

1minus ψI(x)

The uniform lower bounds on γε and θε together with upper bounds γε le 1θε le 1 and the boundedness of η and η combine to establish that there is afinite B such that

part

partεg(x ε)

ltB for ε isin [0 ε0] ae x isin [0 β]

Similar arguments may be used to establish this bound in the polynomialcase In that case the terms ψI = 1 and θI = 0 do not play a role and theexpression for the partial derivative of L with respect to θi simplifies

We are now free to differentiate under the integral sign so that

Gprime[ε] =

int β

0

part

partεg(x ε)dx

for all ε isin [0 ε0) We note thatint x

0

partL

partψi

γdxprime =

int x

0

minusθiγi

+θi+1

γi+1

γdxprime

is absolutely continuous as a function of x since it is the difference of twocontinuous monotone functions with bounded derivatives Applying integra-tion by parts for absolutely continuous functions ([23] 361 page 209) tothe first term we obtain

Gprime+[0] =

int β

0

partL

partψ(x) middot η(x)minus

partL

partθ(x) middot η(x)

dx

=

int β

0η(x) middot

(

minus

int x

0

partL

partψ(xprime)dxprime minus

partL

partθ(x)

)

dx

=minus

int β

0

Isum

i=0

Ciηi(x)dx

=Isum

i=0

Ci(ηi(0)minus ηi(β)) = 0

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 55

The constants Ci appearing in the third equality come from the integratedversion of the EulerndashLagrange equations (Corollary A12) while the lastequality is due to the fact that γ and γ have the same beginning and endpoints

We now extend the lemma to show that the extremal is a global minimizereven when some of the initial and terminal points are zero Without loss ofgenerality we suppose that α0 gt 0 and note that the extremals defined inTheorem A1 and in Theorem A11 are strictly positive except possibly atthe initial and final times t= 0 and t= β

Let γ be this extremal and suppose that γ is an alternate occupancyfunction It may be supposed that γ also lies on the boundary only at t= 0or t= β since if there is a γ with lower cost than γ the convexity of J impliesthat there is another occupancy path of the form λγ + (1minus λ)γ which alsohas lower cost than γ and which avoids the boundary everywhere that γdoes

Given x(n)1 darr 0 and x

(n)2 uarr β let γ(n) be the extremal curve with initial

point γ(x(n)1 ) and terminal point γ(x

(n)2 ) By Lemma A16

J(γ(n))le

int x(n)2

x(n)1

D(θ(x)γ(x))dx

It then follows that

J(γ) = J (αωβ)

le lim infn

J (γ(x(n)1 ) γ(x

(n)2 ) x

(n)2 minus x

(n)1 )

= lim infn

J(γ(n))

le limn

int x(n)2

x(n)1

D(θ(x)γ(x))dx

= J(γ)

with the first two equalities following from Theorem A13 and the definitionof J (αωβ) the first inequality following from Corollary A7 and the lastequality from the monotone convergence theorem

Acknowledgments We would like to thank an associate editor and tworeferees for suggestions that significantly improved this paper

REFERENCES

[1] Avram F Dai J G and Hasenbein J J (2001) Explicit solutions for variationalproblems in the quadrant Queueing Systems 37 259ndash289 MR1833666

56 P DUPUIS C NUZMAN AND P WHITING

[2] Barton D E and David F N (1959) Contagious occupancy J Roy Stat SocSer B Stat Methodol 21 120ndash133

[3] Barton D E and David F N (1959) Sequential occupancy Biometrika 46 218ndash223 MR100899

[4] Bertsekas D P (1982) Constrained Optimization and Lagrange Multiplier Meth-ods Academic Press New York MR690767

[5] Billingsley P (1968) Convergence of Probability Measures Wiley New YorkMR233396

[6] Boucheron S Gamboa F and Leonard C (2002) Bins and balls Large devia-tions of the empirical occupancy process Ann Appl Probab 2 1ndash30 MR1910642

[7] Bucklew J (1990) Large Deviations Techniques in Decision Simulation and Es-timation Wiley New York MR1067716

[8] Cesari L (1983) Optimization Theory and Applications Springer New YorkMR688142

[9] Charalambides A (1997) A unified derivation of occupancy and sequential occu-pancy distributions In Advances in Combinatorial Methods and Applications toProbability and Statistics (N Balakrishnan ed) 259ndash273 Birkhauser BostonMR1456737

[10] Dupuis P Ellis R S and Weiss A (1991) Large deviations for Markov pro-cesses with discontinuous statistics I General upper bounds Ann Probab 191280ndash1297 MR1112416

[11] Dupuis P and Ellis R (1997) A Weak Convergence Approach to Large DeviationsWiley New York MR1431744

[12] Eramo V and Listanti M (2000) Packet loss in a bufferless optical WDMswitch employing shared tuneable wavelength converters Lightwave Technology18 1818ndash1833

[13] Eramo V Listanti M Nuzman C and Whiting P (2002) Optical switchdimensioning and the classical occupancy problem Int J Commun Syst 15127ndash141

[14] Feller W (1968) An Introduction to Probability Theory and Its Applications 1Wiley New York MR228020

[15] Feller W (1971) An Introduction to Probability Theory and Its Applications 2Wiley New York MR270403

[16] Harkness W L (1971) The classical occupancy problem revisited In RandomCounts in Physical Sciences (G P Patil ed) 107ndash126 Pennsylvania State UnivPress

[17] He F (2000) Estimating species abundance from occurrence American Naturalist156 453ndash459

[18] Holst L (1977) Some asymptotic results for occupancy problems Ann Probab 51028ndash1035 MR443027

[19] Holst L (1986) On the coupon collectors and other urn problems Internat StatistRev 54 15ndash27 MR959649

[20] Johnson N L and Kotz S (1977) Urn Models and their Applications WileyNew York MR488211

[21] Kamath A Motwani R Palem K and Spirakis P (1995) Tail bounds foroccupancy and the satisfiability threshold conjecture Random Structures Algo-rithm 7 59ndash80 MR1346284

[22] Mandjes M and Ridder A (2002) A large deviations analysis of the transient ofa queue with many Markov fluid inputs Approximations and fast simulationACM Transactions on Modeling and Computer Simulation 12 1ndash26

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 57

[23] McShane E J (1947) Integration Princeton Univ Press[24] Ramakrishna M V and Mukhopadhyay P (1988) Analysis of bounded disorder

file organisation In Proc of the 7th ACM Symposium on Principles of DataBaseSystems Austin Texas 117ndash125

[25] Sagan H (1969) Introduction to the Calculus of Variations Dover New YorkMR1210325

[26] Shwartz A and Weiss A (1995) Large Deviations for Performance AnalysisChapman and Hall New York MR1335456

[27] Shwartz A and Weiss A (2002) Large deviations with diminishing rates Un-published manuscript

[28] Weiss A (1994) Large deviations for the occupancy problem Private communica-tion

P Dupuis

Lefschetz Center for Dynamical Studies

Brown University

Providence Rhode Island 02912

USA

e-mail dupuisdambrowneduurl cmbell-labscomwhonuzman

C Nuzman

P Whiting

Bell Labs

600 Mountain Avenue

Murray Hill New Jersey 07974

USA

e-mail cnuzmanlucentcome-mail pawhitinglucentcomurl cmbell-labscomwhopawhitinge-mail

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
          • Proof of Theorem
          • Examples and extensions
            • The classical occupancy problem
            • The overflow problem
            • Partial coupon collection with initial conditions
            • Extensions
              • Appendix
                • Preliminaries
                • Proof of Lemma
                • Euler--Lagrange equations
                • Characterization of the extremals under empty initial conditions
                • Interpretation of the twist parameter
                • Characterization of the extremal cost
                • Characterization of the extremals under general initial conditions
                • Extremal curves have globally minimal cost
                  • References
Page 55: Large deviation asymptotics for occupancy problems · Introduction. Urn occupancy problems center on the distribution of ... large deviations, calculus of variations, sample paths,

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 55

The constants Ci appearing in the third equality come from the integratedversion of the EulerndashLagrange equations (Corollary A12) while the lastequality is due to the fact that γ and γ have the same beginning and endpoints

We now extend the lemma to show that the extremal is a global minimizereven when some of the initial and terminal points are zero Without loss ofgenerality we suppose that α0 gt 0 and note that the extremals defined inTheorem A1 and in Theorem A11 are strictly positive except possibly atthe initial and final times t= 0 and t= β

Let γ be this extremal and suppose that γ is an alternate occupancyfunction It may be supposed that γ also lies on the boundary only at t= 0or t= β since if there is a γ with lower cost than γ the convexity of J impliesthat there is another occupancy path of the form λγ + (1minus λ)γ which alsohas lower cost than γ and which avoids the boundary everywhere that γdoes

Given x(n)1 darr 0 and x

(n)2 uarr β let γ(n) be the extremal curve with initial

point γ(x(n)1 ) and terminal point γ(x

(n)2 ) By Lemma A16

J(γ(n))le

int x(n)2

x(n)1

D(θ(x)γ(x))dx

It then follows that

J(γ) = J (αωβ)

le lim infn

J (γ(x(n)1 ) γ(x

(n)2 ) x

(n)2 minus x

(n)1 )

= lim infn

J(γ(n))

le limn

int x(n)2

x(n)1

D(θ(x)γ(x))dx

= J(γ)

with the first two equalities following from Theorem A13 and the definitionof J (αωβ) the first inequality following from Corollary A7 and the lastequality from the monotone convergence theorem

Acknowledgments We would like to thank an associate editor and tworeferees for suggestions that significantly improved this paper

REFERENCES

[1] Avram F Dai J G and Hasenbein J J (2001) Explicit solutions for variationalproblems in the quadrant Queueing Systems 37 259ndash289 MR1833666

56 P DUPUIS C NUZMAN AND P WHITING

[2] Barton D E and David F N (1959) Contagious occupancy J Roy Stat SocSer B Stat Methodol 21 120ndash133

[3] Barton D E and David F N (1959) Sequential occupancy Biometrika 46 218ndash223 MR100899

[4] Bertsekas D P (1982) Constrained Optimization and Lagrange Multiplier Meth-ods Academic Press New York MR690767

[5] Billingsley P (1968) Convergence of Probability Measures Wiley New YorkMR233396

[6] Boucheron S Gamboa F and Leonard C (2002) Bins and balls Large devia-tions of the empirical occupancy process Ann Appl Probab 2 1ndash30 MR1910642

[7] Bucklew J (1990) Large Deviations Techniques in Decision Simulation and Es-timation Wiley New York MR1067716

[8] Cesari L (1983) Optimization Theory and Applications Springer New YorkMR688142

[9] Charalambides A (1997) A unified derivation of occupancy and sequential occu-pancy distributions In Advances in Combinatorial Methods and Applications toProbability and Statistics (N Balakrishnan ed) 259ndash273 Birkhauser BostonMR1456737

[10] Dupuis P Ellis R S and Weiss A (1991) Large deviations for Markov pro-cesses with discontinuous statistics I General upper bounds Ann Probab 191280ndash1297 MR1112416

[11] Dupuis P and Ellis R (1997) A Weak Convergence Approach to Large DeviationsWiley New York MR1431744

[12] Eramo V and Listanti M (2000) Packet loss in a bufferless optical WDMswitch employing shared tuneable wavelength converters Lightwave Technology18 1818ndash1833

[13] Eramo V Listanti M Nuzman C and Whiting P (2002) Optical switchdimensioning and the classical occupancy problem Int J Commun Syst 15127ndash141

[14] Feller W (1968) An Introduction to Probability Theory and Its Applications 1Wiley New York MR228020

[15] Feller W (1971) An Introduction to Probability Theory and Its Applications 2Wiley New York MR270403

[16] Harkness W L (1971) The classical occupancy problem revisited In RandomCounts in Physical Sciences (G P Patil ed) 107ndash126 Pennsylvania State UnivPress

[17] He F (2000) Estimating species abundance from occurrence American Naturalist156 453ndash459

[18] Holst L (1977) Some asymptotic results for occupancy problems Ann Probab 51028ndash1035 MR443027

[19] Holst L (1986) On the coupon collectors and other urn problems Internat StatistRev 54 15ndash27 MR959649

[20] Johnson N L and Kotz S (1977) Urn Models and their Applications WileyNew York MR488211

[21] Kamath A Motwani R Palem K and Spirakis P (1995) Tail bounds foroccupancy and the satisfiability threshold conjecture Random Structures Algo-rithm 7 59ndash80 MR1346284

[22] Mandjes M and Ridder A (2002) A large deviations analysis of the transient ofa queue with many Markov fluid inputs Approximations and fast simulationACM Transactions on Modeling and Computer Simulation 12 1ndash26

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 57

[23] McShane E J (1947) Integration Princeton Univ Press[24] Ramakrishna M V and Mukhopadhyay P (1988) Analysis of bounded disorder

file organisation In Proc of the 7th ACM Symposium on Principles of DataBaseSystems Austin Texas 117ndash125

[25] Sagan H (1969) Introduction to the Calculus of Variations Dover New YorkMR1210325

[26] Shwartz A and Weiss A (1995) Large Deviations for Performance AnalysisChapman and Hall New York MR1335456

[27] Shwartz A and Weiss A (2002) Large deviations with diminishing rates Un-published manuscript

[28] Weiss A (1994) Large deviations for the occupancy problem Private communica-tion

P Dupuis

Lefschetz Center for Dynamical Studies

Brown University

Providence Rhode Island 02912

USA

e-mail dupuisdambrowneduurl cmbell-labscomwhonuzman

C Nuzman

P Whiting

Bell Labs

600 Mountain Avenue

Murray Hill New Jersey 07974

USA

e-mail cnuzmanlucentcome-mail pawhitinglucentcomurl cmbell-labscomwhopawhitinge-mail

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
          • Proof of Theorem
          • Examples and extensions
            • The classical occupancy problem
            • The overflow problem
            • Partial coupon collection with initial conditions
            • Extensions
              • Appendix
                • Preliminaries
                • Proof of Lemma
                • Euler--Lagrange equations
                • Characterization of the extremals under empty initial conditions
                • Interpretation of the twist parameter
                • Characterization of the extremal cost
                • Characterization of the extremals under general initial conditions
                • Extremal curves have globally minimal cost
                  • References
Page 56: Large deviation asymptotics for occupancy problems · Introduction. Urn occupancy problems center on the distribution of ... large deviations, calculus of variations, sample paths,

56 P DUPUIS C NUZMAN AND P WHITING

[2] Barton D E and David F N (1959) Contagious occupancy J Roy Stat SocSer B Stat Methodol 21 120ndash133

[3] Barton D E and David F N (1959) Sequential occupancy Biometrika 46 218ndash223 MR100899

[4] Bertsekas D P (1982) Constrained Optimization and Lagrange Multiplier Meth-ods Academic Press New York MR690767

[5] Billingsley P (1968) Convergence of Probability Measures Wiley New YorkMR233396

[6] Boucheron S Gamboa F and Leonard C (2002) Bins and balls Large devia-tions of the empirical occupancy process Ann Appl Probab 2 1ndash30 MR1910642

[7] Bucklew J (1990) Large Deviations Techniques in Decision Simulation and Es-timation Wiley New York MR1067716

[8] Cesari L (1983) Optimization Theory and Applications Springer New YorkMR688142

[9] Charalambides A (1997) A unified derivation of occupancy and sequential occu-pancy distributions In Advances in Combinatorial Methods and Applications toProbability and Statistics (N Balakrishnan ed) 259ndash273 Birkhauser BostonMR1456737

[10] Dupuis P Ellis R S and Weiss A (1991) Large deviations for Markov pro-cesses with discontinuous statistics I General upper bounds Ann Probab 191280ndash1297 MR1112416

[11] Dupuis P and Ellis R (1997) A Weak Convergence Approach to Large DeviationsWiley New York MR1431744

[12] Eramo V and Listanti M (2000) Packet loss in a bufferless optical WDMswitch employing shared tuneable wavelength converters Lightwave Technology18 1818ndash1833

[13] Eramo V Listanti M Nuzman C and Whiting P (2002) Optical switchdimensioning and the classical occupancy problem Int J Commun Syst 15127ndash141

[14] Feller W (1968) An Introduction to Probability Theory and Its Applications 1Wiley New York MR228020

[15] Feller W (1971) An Introduction to Probability Theory and Its Applications 2Wiley New York MR270403

[16] Harkness W L (1971) The classical occupancy problem revisited In RandomCounts in Physical Sciences (G P Patil ed) 107ndash126 Pennsylvania State UnivPress

[17] He F (2000) Estimating species abundance from occurrence American Naturalist156 453ndash459

[18] Holst L (1977) Some asymptotic results for occupancy problems Ann Probab 51028ndash1035 MR443027

[19] Holst L (1986) On the coupon collectors and other urn problems Internat StatistRev 54 15ndash27 MR959649

[20] Johnson N L and Kotz S (1977) Urn Models and their Applications WileyNew York MR488211

[21] Kamath A Motwani R Palem K and Spirakis P (1995) Tail bounds foroccupancy and the satisfiability threshold conjecture Random Structures Algo-rithm 7 59ndash80 MR1346284

[22] Mandjes M and Ridder A (2002) A large deviations analysis of the transient ofa queue with many Markov fluid inputs Approximations and fast simulationACM Transactions on Modeling and Computer Simulation 12 1ndash26

LARGE DEVIATION FOR OCCUPANCY PROBLEMS 57

[23] McShane E J (1947) Integration Princeton Univ Press[24] Ramakrishna M V and Mukhopadhyay P (1988) Analysis of bounded disorder

file organisation In Proc of the 7th ACM Symposium on Principles of DataBaseSystems Austin Texas 117ndash125

[25] Sagan H (1969) Introduction to the Calculus of Variations Dover New YorkMR1210325

[26] Shwartz A and Weiss A (1995) Large Deviations for Performance AnalysisChapman and Hall New York MR1335456

[27] Shwartz A and Weiss A (2002) Large deviations with diminishing rates Un-published manuscript

[28] Weiss A (1994) Large deviations for the occupancy problem Private communica-tion

P Dupuis

Lefschetz Center for Dynamical Studies

Brown University

Providence Rhode Island 02912

USA

e-mail dupuisdambrowneduurl cmbell-labscomwhonuzman

C Nuzman

P Whiting

Bell Labs

600 Mountain Avenue

Murray Hill New Jersey 07974

USA

e-mail cnuzmanlucentcome-mail pawhitinglucentcomurl cmbell-labscomwhopawhitinge-mail

  • Introduction
  • Main results
    • Statement of the LDP
    • Characterization of the terminal rate function and minimizing paths
      • Empty initial conditions
      • General initial conditions
  • Proof of Theorem
  • Examples and extensions
    • The classical occupancy problem
    • The overflow problem
    • Partial coupon collection with initial conditions
    • Extensions
  • Appendix
    • Preliminaries
    • Proof of Lemma
    • Euler–Lagrange equations
    • Characterization of the extremals under empty initial conditions
    • Interpretation of the twist parameter
    • Characterization of the extremal cost
    • Characterization of the extremals under general initial conditions
    • Extremal curves have globally minimal cost
  • References