
arXiv:2108.02101v1 [math.PR] 4 Aug 2021

CONCENTRATION INEQUALITIES FROM MONOTONE COUPLINGS

FOR GRAPHS, WALKS, TREES AND BRANCHING PROCESSES

TOBIAS JOHNSON AND EROL PEKÖZ

Abstract. Generalized gamma distributions arise as limits in many settings involving random graphs, walks, trees, and branching processes. Peköz, Röllin, and Ross (2016) exploited characterizing distributional fixed point equations to obtain uniform error bounds for generalized gamma approximations using Stein's method. Here we show how monotone couplings arising with these fixed point equations can be used to obtain sharper tail bounds that, in many cases, outperform competing moment-based bounds and the uniform bounds obtainable with Stein's method. Applications are given to concentration inequalities for preferential attachment random graphs, branching processes, random walk local time statistics and the size of random subtrees of uniformly random binary rooted plane trees.

2020 Mathematics Subject Classification. 60E15; 60J80, 05C80.
Key words and phrases. Concentration inequality, tail bound, Stein's method, preferential attachment graph, Galton–Watson process.

1. Introduction

Stein's method (see [Ros11] and the references therein) is used to obtain distributional approximation error bounds in a wide variety of settings in applied probability where normal, Poisson, gamma and other limits arise. One variant of the approach (see [GR97], [PRR16] and the references therein) starts with a distributional fixed point equation that the limit distribution satisfies and obtains error bounds in terms of how closely both sides of the fixed point equation can be coupled together; the fixed point equation often has a probabilistic interpretation that can be leveraged to achieve a close coupling. To illustrate, we define a generalization of the size-bias transform for a random variable.

Definition 1. If $X$ is a non-negative random variable with $E[X^\beta] < \infty$, we say that $X^{(\beta)}$ is the $\beta$-power bias transform of $X$ if
\[ E[X^\beta]\,E[f(X^{(\beta)})] = E[X^\beta f(X)] \]
holds for all bounded $f$.
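To make the transform concrete: for a discrete distribution, β-power biasing just reweights each atom $k$ by $k^\beta$ and renormalizes by $E[X^\beta]$. The following minimal Python sketch (our illustration, not part of the original text; the truncated geometric example is an arbitrary choice) computes the pmf of $X^{(\beta)}$ this way.

import numpy as np

def power_bias_pmf(support, pmf, beta):
    # Reweight each atom x by x**beta and renormalize by E[X^beta]; this is
    # exactly the defining relation E[X^beta] E[f(X^(beta))] = E[X^beta f(X)].
    support = np.asarray(support, dtype=float)
    pmf = np.asarray(pmf, dtype=float)
    weights = support**beta * pmf
    return weights / weights.sum()

# Example: the size-bias transform (beta = 1) of a geometric law truncated to {1, ..., 50}.
support = np.arange(1, 51)
pmf = 0.5**support
pmf /= pmf.sum()
print(power_bias_pmf(support, pmf, beta=1.0)[:5])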

If $X^* \stackrel{d}{=} X^{(1)}$, the familiar size-bias transform of $X$, it can be shown that $X$ satisfies the distributional fixed point equation $X \stackrel{d}{=} X^* - 1$ if and only if $X$ follows a Poisson distribution; this is the foundation for the Stein–Chen method for obtaining Poisson approximation error bounds (see [Ros11]). The exponential distribution is similarly characterized by the fixed-point equation $X \stackrel{d}{=} U X^*$ for an independent Uniform(0,1) variable $U$; see [PR11b] for its application to obtaining error bounds using Stein's method.

[GG11b], [GG11a] and the references therein give concentration results showing how upper and lower tail probability bounds can be derived in settings where there is monotonicity in couplings. For example, when there is a coupling of $(X, X^*)$ so that $X^* - X$ is almost surely nonnegative and bounded, tail bounds for $X$ can be obtained. [CGJ18] loosened this monotonicity condition to $P[X^* - X \le c \mid X^* \ge x] \ge p$ and applied it to bounding tail probabilities of the second largest eigenvalue of the adjacency matrix of a random regular graph.

The monotonicity condition we study in the current article involves the following definition from [PRR16].

Definition 2. The variable $X^*$ is the $(\alpha, \beta)$-generalized equilibrium distribution of $X$ if
\[ X^* \stackrel{d}{=} V_\alpha X^{(\beta)}, \]
where $V_\alpha$ has density $\alpha x^{\alpha-1}\,dx$ on $[0,1]$ and is independent of $X^{(\beta)}$.
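For later experiments it is useful to sample $X^*$ directly. A minimal sketch (ours, for a finitely supported $X$): draw $X^{(\beta)}$ from the reweighted pmf as above, and draw $V_\alpha$ by the inverse transform, since the density $\alpha x^{\alpha-1}$ on $[0,1]$ has CDF $x^\alpha$, so $V_\alpha \stackrel{d}{=} U^{1/\alpha}$ for $U \sim \mathrm{Unif}(0,1)$.

import numpy as np

rng = np.random.default_rng(0)

def sample_generalized_equilibrium(support, pmf, alpha, beta, size=1):
    # X* = V_alpha * X^(beta) with V_alpha and X^(beta) independent.
    support = np.asarray(support, dtype=float)
    weights = support**beta * np.asarray(pmf, dtype=float)
    weights /= weights.sum()                           # pmf of X^(beta)
    x_beta = rng.choice(support, p=weights, size=size)
    v_alpha = rng.uniform(size=size) ** (1.0 / alpha)  # density alpha * x**(alpha-1)
    return v_alpha * x_beta

# With alpha = beta = 1 this is the usual equilibrium transform from renewal theory.
support = np.arange(1, 51)
pmf = 0.5**support
pmf /= pmf.sum()
print(sample_generalized_equilibrium(support, pmf, alpha=1, beta=1, size=5))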

The familiar equilibrium distribution $X^e$ of $X$ from renewal theory is the case where α = β = 1 above. In what follows we let $X^*$ have the $(\alpha, \beta)$-generalized equilibrium distribution of $X$. In [PRR16, Theorem 5.8] it was shown that $X^* \stackrel{d}{=} X$ if and only if $X$ has a generalized gamma distribution, which has density function $\beta x^{\alpha-1} e^{-x^\beta}/\Gamma(\alpha/\beta)\,dx$ for $x > 0$. The main result of [PRR16] is a bound for approximating $P[X \ge t]$ using the appropriate generalized gamma distribution when $X$ and $X^*$ are not identical but are close in the Lévy–Prokhorov metric. Since this bound is uniform in $t$, it is too large to be useful for small tail probabilities in applications. This is the launching point for the current article.

Let $X \preceq Y$ denote that $X$ is stochastically dominated by $Y$ in the usual stochastic order. We introduce two stochastic orders as follows. For a constant $p \in (0,1]$, we write $X \preceq_p Y$ to denote that $P[X > t] \le P[Y > t]/p$ for all $t \in \mathbb{R}$. Similarly, $X \preceq^p Y$ denotes that $P[X \le t] \ge p\,P[Y \le t]$ for all $t \in \mathbb{R}$. In the $p = 1$ case, both orders are the usual one. For $p < 1$, they represent two different relaxations of the usual order and have alternate characterizations given in Lemma 8.

Our two main results are that stochastic domination of $X^*$ by $X$ in these orders leads to upper and lower tail probability bounds:

Theorem 3. Suppose that $X^* \preceq_p X$ for some $p \in (0,1]$. Let $\mu = \big(\tfrac{\beta}{\alpha} E X^\beta\big)^{1/\beta}$. If α ≠ β, then for any $t \ge 1$,
\[ P[X \ge \mu t] \le t^\alpha e^{2 - p t^\beta}. \]
If α = β, then for all $t > 0$,
\[ P[X \ge \mu t] \le e^{1 - p t^\beta}. \]

Theorem 4. Suppose $X^* \preceq^p X$ for some $p \in (0,1]$. Let $\mu = \big(\tfrac{\beta}{\alpha} E X^\beta\big)^{1/\beta}$. Then
\[ P[X \le \mu t] \le \Big(\frac{\alpha}{\beta}\Big)^{1-\alpha/\beta} \frac{t^\alpha}{\frac{\alpha}{\beta}p - t^\beta}, \tag{1} \]
so long as $t^\beta < p\alpha/\beta$. If α = β, an improved bound holds for all $t \ge 0$:
\[ P[X \le \mu t] \le \frac{t^\alpha}{p}. \tag{2} \]

We mention that the α = β = p = 1 case of Theorem 3 was already proven by Mark Brown [Bro06, Theorem 3.2]; see Section 2.2.

As an application of these concentration theorems, we prove several results for graphs, walks, trees and branching processes. We next define several quantities from such models and show that the above inequalities apply.


Preferential attachment random graphs. Consider a preferential attachment random graph model (see [BA99] and [PRR16]) that starts with an initial "seed graph" consisting of one node, or a collection of nodes grouped together, having total weight $w$. Additional nodes are added sequentially, and when a node is added it attaches $l$ edges, one at a time, directed from it to either itself or to nodes in the existing graph according to the following rule: each edge attaches to a potential node with chance proportional to that node's weight right before the moment of attachment, where incoming edges contribute weight one to a node and each node other than the initial one has initial weight one when added. The case $l = 1$ is the usual Barabási–Albert tree with loops, but started from a node with initial weight $w$. Let $W$ be the total weight of the initial "seed graph" after an additional $n$ edges have been added to the graph.
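The seed weight is simple to simulate; as noted in the proof of Theorem 5 below, $W$ has the urn distribution $P^l_n(1, w)$ of Section 3, and the following sketch (ours; the parameter values are illustrative) implements exactly that urn, with white balls tracking the seed's weight.

import numpy as np

rng = np.random.default_rng(1)

def seed_weight(n_edges, w, l):
    # Urn form of the seed weight (W ~ P^l_n(1, w); see Section 3): start with
    # w white balls (seed) and 1 black ball; each draw adds a ball of the drawn
    # color, and after every l-th draw an extra black ball (a new node) arrives.
    white, black = w, 1
    for draw in range(1, n_edges + 1):
        if rng.random() < white / (white + black):
            white += 1
        else:
            black += 1
        if draw % l == 0:
            black += 1
    return white

samples = [seed_weight(n_edges=1000, w=2, l=1) for _ in range(200)]
print(np.mean(samples))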

Random binary rooted plane trees. Let $V$ be the number of vertices in the minimal spanning tree spanned by the root and $k$ randomly chosen distinct leaves of a uniformly chosen binary, rooted plane tree with $2n - 1$ nodes, that is, with $n$ leaves and $n - 1$ internal nodes.

Random walk local times. Consider the one-dimensional simple symmetric random walk $S_n = (S_n(0), \ldots, S_n(n))$ of length $n$ starting at the origin. Define
\[ L_n = \sum_{i=0}^{n} \mathbf{1}\{S_n(i) = 0\} \]
to be the number of times the random walk visits the origin by time $n$. Let
\[ L^b_{2n} \sim [L_{2n} \mid S_{2n}(0) = S_{2n}(2n) = 0] \]
be the local time of a random walk bridge. Here we use the notation $[X \mid E]$ to denote the distribution of a random variable $X$ conditional on an event $E$ with nonzero probability.
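A quick way to look at these statistics numerically (our sketch, not from the paper): simulate $L_n$ directly, and simulate $L^b_{2n}$ by rejection, resampling until the walk ends at the origin.

import numpy as np

rng = np.random.default_rng(2)

def local_time(n):
    # Number of visits to the origin (time 0 included) of a simple symmetric walk.
    steps = rng.choice([-1, 1], size=n)
    path = np.concatenate(([0], np.cumsum(steps)))
    return int(np.sum(path == 0))

def bridge_local_time(two_n):
    # Sample [L_{2n} | S_{2n}(2n) = 0] by rejection on the endpoint.
    while True:
        steps = rng.choice([-1, 1], size=two_n)
        path = np.concatenate(([0], np.cumsum(steps)))
        if path[-1] == 0:
            return int(np.sum(path == 0))

print(np.mean([local_time(1000) for _ in range(200)]))
print(np.mean([bridge_local_time(1000) for _ in range(200)]))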

The next result shows that the above concentration inequalities hold for these quantities.

Theorem 5. With the above definitions,
(a) $W^* \preceq W$ with $\alpha = w$, $\beta = l + 1$;
(b) $V^* \preceq V$ with $\alpha = 2k$, $\beta = 2$;
(c) $(L_n)^* \preceq L_n$ with $\alpha = 1$, $\beta = 2$;
(d) $(L^b_{2n})^* \preceq L^b_{2n}$ with $\alpha = 2$, $\beta = 2$;
and thus the conditions and inequalities of Theorem 3 and Theorem 4 hold for $W, V, L_n$, and $L^b_{2n}$ with $p = 1$ and the corresponding values in (a)–(d) for α, β.

We also give two tail bounds for Galton–Watson branching processes. See Section 4.4 for a review of previous bounds.

Theorem 6. Let $Z_n$ be the size of the $n$th generation of a Galton–Watson process, and let $\mu = E Z_1$, the mean of its child distribution. If $Z_1^* \preceq Z_1$ with α = β = 1, then $Z_n^* \preceq Z_n$ for all $n$, and hence
\[ P[Z_n \ge t\mu^n] \le e^{1-t}, \tag{3} \]
and
\[ P[Z_n \le t\mu^n] \le t \tag{4} \]
for all $t > 0$.


When the child distribution of the process is the geometric distribution with success probability $p$ on $\{1, 2, \ldots\}$, the size of the $n$th generation is geometric with success probability $p^n$. Then $Z_n/\mu^n \to \mathrm{Exp}(1)$ in law as $n \to \infty$, and both upper and lower bounds are sharp (other than an extra factor of $e$ in the upper bound).
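This sharpness is easy to see in simulation. The sketch below (ours; the generation count and sample size are arbitrary choices) compares the empirical upper tail of $Z_n$ in the geometric case with the bound $e^{1-t}$ of Theorem 6.

import numpy as np

rng = np.random.default_rng(3)

p, n_gen = 0.5, 5       # geometric children on {1, 2, ...}; mu = 1/p
mu = 1.0 / p

def generation_size(n):
    z = 1
    for _ in range(n):
        z = int(rng.geometric(p, size=z).sum())  # children of the current generation
    return z

samples = np.array([generation_size(n_gen) for _ in range(2000)])
for t in [1.0, 2.0, 3.0]:
    empirical = np.mean(samples >= t * mu**n_gen)
    print(f"t={t}: empirical {empirical:.4f} vs bound e^(1-t) = {np.exp(1 - t):.4f}")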

Theorem 6 requires the child distribution to be supported on $\{1, 2, \ldots\}$, since this is necessary for $Z_1^* \preceq Z_1$. The next result relaxes this requirement and allows us to consider Galton–Watson trees with a nonzero extinction probability.

For a random variable $X$ taking values in the positive integers, we say that $X$ is IFR (which stands for increasing failure rate) if $P[X = k \mid X \ge k]$ is increasing in $k$ for $k \ge 1$. As we will discuss in Section 4.1, this condition implies that $X^* \preceq X$ with α = β = 1. Let $X^>$ denote a random variable with distribution $[X \mid X > 0]$, and note that $(X^>)^* \stackrel{d}{=} X^*$.

Theorem 7. Let $Z_n$ be the size of the $n$th generation of a Galton–Watson tree, and suppose that $Z_1^>$ is IFR. Then $Z_n^* \preceq Z_n^>$ with α = β = 1, and with $m(n) = E[Z_n \mid Z_n > 0]$,
\[ P[Z_n \ge t\,m(n) \mid Z_n > 0] \le e^{1-t}, \tag{5} \]
and
\[ P[Z_n \le t\,m(n) \mid Z_n > 0] \le t \tag{6} \]
for all $t > 0$.

This theorem holds in subcritical, critical, and supercritical cases alike. The difference comes only in the mean $m(n)$, which grows exponentially for a supercritical tree, grows linearly for a critical tree, and remains bounded for a subcritical tree.

2. Proof of concentration theorems

Recall that $X \preceq_p Y$ and $X \preceq^p Y$ denote that $P[X > t] \le P[Y > t]/p$ and $P[X \le t] \ge p\,P[Y \le t]$, respectively, for all $t \in \mathbb{R}$. We start by characterizing these orders in terms of couplings, along the same lines as the standard fact that $X \preceq Y$ if and only if there exists a coupling of $X$ and $Y$ so that $X \le Y$ a.s. We have not seen these stochastic orders defined before, but they are used in form (ii) in [CGJ18] and [DJ18].

Lemma 8. The following statements are equivalent:
(i) there exists a coupling $(X, Y)$ for which $P[X \le Y \mid X] \ge p$ a.s.;
(ii) there exists a coupling $(X, Y)$ for which $P[X \le Y \mid X \ge t] \ge p$ for all $t \in \mathbb{R}$ where the conditional probability is defined;
(iii) $X \preceq_p Y$;
as are the following statements:
(i) there exists a coupling $(X, Y)$ for which $P[X \le Y \mid Y] \ge p$ a.s.;
(ii) there exists a coupling $(X, Y)$ for which $P[X \le Y \mid Y \le t] \ge p$ for all $t \in \mathbb{R}$ where the conditional probability is defined;
(iii) $X \preceq^p Y$.

Proof. It is clear that (i) implies (ii) in both sets of statements. To go from (ii) to (iii) in the first set of statements, observe that for any $s \in \mathbb{R}$,
\[ P[Y \ge s] \ge P[X \le Y \text{ and } X \ge s] \ge p\,P[X \ge s], \]
applying (ii) in the second inequality. Now let $s$ approach $t$ downward to prove (iii). A similar argument proves that (ii) implies (iii) for the second set of statements.


Now we show that (iii) $\implies$ (i) for the first set of statements. Let $B \sim \mathrm{Bernoulli}(p)$ be independent of $X$, and define a random variable $X'$ taking values in $[-\infty, \infty]$ by
\[ X' = \begin{cases} X & \text{if } B = 1, \\ -\infty & \text{if } B = 0. \end{cases} \]
Then
\[ P[X' > t] = p\,P[X > t] \le P[Y > t] \]
by (iii). Thus $X'$ is stochastically dominated by $Y$ in the standard sense, and therefore there exists a coupling of $X'$ and $Y$ such that $X' \le Y$ a.s. (The standard fact that $P[U > t] \le P[V > t]$ for all $t$ is equivalent to the existence of a coupling for which $U \le V$ a.s. holds when $U$ and $V$ take values in $[-\infty, \infty]$, and in fact under considerably more general conditions [Str65, Theorem 11].) Under this coupling, given $X$, it holds with probability at least $p$ that $X = X'$, in which case $X \le Y$. Thus $X \preceq_p Y$ implies (i).

To show that (iii) $\implies$ (i) for the second set of statements, we use the same idea but define $Y'$ to be equal to $Y$ with probability $p$ and equal to $\infty$ with probability $1 - p$. Then $X \preceq Y'$, yielding a coupling of $X$ and $Y'$ such that $X \le Y'$ a.s., and under this coupling, given $Y$, we have $Y = Y' \ge X$ with probability at least $p$.

Before we get started with our concentration estimates, we make an observation that allows us to rescale α and β.

Lemma 9. Let α, β, and γ be positive real numbers. If $X^*$ has the $(\alpha, \beta)$-generalized equilibrium distribution of $X$, then $(X^*)^\gamma$ has the $(\alpha/\gamma, \beta/\gamma)$-generalized equilibrium distribution of $X^\gamma$.

Proof. For $a > 0$, let $V_a$ denote a random variable with density $a x^{a-1}\,dx$, and recall that $X^* \stackrel{d}{=} V_\alpha X^{(\beta)}$. Now observe that $V_\alpha^\gamma \stackrel{d}{=} V_{\alpha/\gamma}$ and $(X^{(\beta)})^\gamma \stackrel{d}{=} (X^\gamma)^{(\beta/\gamma)}$.

2.1. Upper tail bounds for α ≠ β. We start with a technical lemma.

Lemma 10. Suppose that $X^* \preceq_p X$ and $EX^\beta = \alpha/\beta$. Let $G(t) = P[X > t]$. For all $t > 0$,
\[ \int_0^1 u^{\alpha-\beta-1} G(t/u)\,du \le \frac{G(t)}{p\beta t^\beta}. \tag{7} \]

Proof. Since $X^* \preceq_p X$,
\[ G(t) \ge p\,P[V_\alpha X^{(\beta)} > t] = p \int_0^1 \alpha u^{\alpha-1}\,P[X^{(\beta)} > t/u]\,du. \]
By definition of the β-power bias,
\[
p \int_0^1 \alpha u^{\alpha-1}\,P[X^{(\beta)} > t/u]\,du
= p\beta \int_0^1 u^{\alpha-1}\,E\big[X^\beta \mathbf{1}\{X > t/u\}\big]\,du
\ge p\beta t^\beta \int_0^1 u^{\alpha-\beta-1}\,P[X > t/u]\,du
= p\beta t^\beta \int_0^1 u^{\alpha-\beta-1} G(t/u)\,du.
\]

Next, we apply this lemma to deduce bounds on $E[X \mid X > t]$ in the $\beta - \alpha = 1$ case and on $E[X^{-1} \mid X > t]$ in the $\beta - \alpha = -1$ case. To give a sense of the purpose of this lemma, the mean residual life of a random variable $X$ is the function $m_X(t) = E[X - t \mid X > t]$, with $m_X(t)$ defined as $0$ if $P[X > t] = 0$. Bounds on the mean residual life can then be translated into tail bounds. Lemma 10 can be used to bound the mean residual life of $X^{\beta-\alpha}$ when β > α, of $-X^{\beta-\alpha}$ when β < α, or of $\log X$ when β = α. We ignore β = α here because a different approach gives better results in that case (see Section 2.2). And since we will later use Lemma 9 to rescale α and β, we only need to consider $\alpha - \beta = \pm 1$.

Lemma 11. Suppose that $X^* \preceq_p X$ and $EX^\beta = \alpha/\beta$. For all $t \ge 0$ such that $P[X > t] > 0$,
\[ E[X - t \mid X > t] \le \frac{1}{p\beta t^\alpha} \quad \text{if } \beta - \alpha = 1, \tag{8} \]
and
\[ E[X^{-1} - t^{-1} \mid X > t] \ge -\frac{1}{p\beta t^\alpha} \quad \text{if } \beta - \alpha = -1. \tag{9} \]

Proof. Let $G(t) = P[X > t]$. When $\beta - \alpha = 1$, we apply Lemma 10 and then switch the order of integration to obtain
\[
\frac{G(t)}{p\beta t^\beta} \ge \int_0^1 u^{-2} G(t/u)\,du
= E \int_0^1 u^{-2}\,\mathbf{1}\{X > t/u\}\,du
= E\Big[\mathbf{1}\{X > t\} \int_{t/X}^1 u^{-2}\,du\Big]
= E\big[\mathbf{1}\{X > t\}\,(X/t - 1)\big]
= G(t)\big(t^{-1} E[X \mid X > t] - 1\big).
\]
Canceling the $G(t)$ factors and rearranging terms gives (8). When $\beta - \alpha = -1$, the same approach yields
\[
\frac{G(t)}{p\beta t^\beta} \ge \int_0^1 G(t/u)\,du = E\big[\mathbf{1}\{X > t\}(1 - t/X)\big] = G(t)\big(1 - t\,E[X^{-1} \mid X > t]\big),
\]
proving (9).

Proof of Theorem 3 for α < β. First, we prove the theorem under the assumption that $\mu = 1$ (i.e., $EX^\beta = \alpha/\beta$) and that $\beta - \alpha = 1$. Let $t_0 = (\alpha/p\beta)^{1/\beta}$, and let $Z$ be the random variable supported on the interval $(t_0, \infty)$ with
\[ P[Z > t] = \Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} t^\alpha e^{-p t^\beta}. \tag{10} \]
We now compute the mean residual life function of $Z$. First, by Fubini's theorem,
\[ E\big[(Z - t)\mathbf{1}\{Z > t\}\big] = \int_t^\infty P[Z > u]\,du = \frac{1}{p\beta}\Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} e^{-p t^\beta}. \tag{11} \]
Hence, for $t \ge t_0$,
\[ E[Z - t \mid Z > t] = \frac{1}{p\beta t^\alpha}. \]
By Lemma 11, we have $E[X - t \mid X > t] \le E[Z - t \mid Z > t]$ for all $t$ where $P[X > t] > 0$. By [SS07, Theorem 4.A.26], it holds for any increasing convex function $\varphi(x)$ that $E\varphi(X) \le E\varphi(Z)$.

Now, we apply this statement with the right choice of $\varphi(x)$ to obtain information about the tail of $X$. Fix $t$ and define $\varphi(x) = \max\big(0, (x - t + \gamma)/\gamma\big)$ for γ to be chosen later satisfying $0 < \gamma \le t$. Then $\varphi$ is an increasing convex function, and hence $E\varphi(X) \le E\varphi(Z)$. Observing that $\mathbf{1}\{x \ge t\} \le \varphi(x)$,
\[
P[X \ge t] \le E\varphi(X) \le E\varphi(Z) = \gamma^{-1} E\big[(Z - (t - \gamma))\mathbf{1}\{Z > t - \gamma\}\big] = \frac{1}{\gamma p\beta}\Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} e^{-p(t-\gamma)^\beta}
\]
by (11), assuming that $t - \gamma \ge t_0$. Since β > 1, the function $x^\beta$ is convex and its graph lies above its tangent lines. Hence $x^\beta \ge t^\beta + (x - t)\beta t^{\beta-1}$, and setting $x = t - \gamma$ we have $(t - \gamma)^\beta \ge t^\beta - \gamma\beta t^\alpha$. Using this inequality and then minimizing by setting $\gamma = 1/p\beta t^\alpha$ yields
\[
P[X \ge t] \le \frac{1}{\gamma p\beta}\Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} e^{-p(t^\beta - \gamma\beta t^\alpha)} = t^\alpha \Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} e^{-p t^\beta + 1},
\]
so long as $t - 1/p\beta t^\alpha \ge t_0$. To make this bound simpler, first observe that $-x \log x \le 1 - x$ for all $x > 0$, since $-x \log x$ is concave and $1 - x$ is its tangent line at $x = 1$. Exponentiating both sides of this inequality, setting $x = \alpha/\beta p$, and raising both sides to the $p$th power gives
\[ \Big(\frac{p\beta}{\alpha}\Big)^{\alpha/\beta} \le e^{p - \alpha/\beta}, \tag{12} \]
from which we obtain
\[ P[X \ge t] \le t^\alpha e^{p - p t^\beta + 1} \le t^\alpha e^{2 - p t^\beta}, \tag{13} \]
still assuming
\[ t - 1/p\beta t^\alpha \ge t_0. \tag{14} \]

Now, we show that either (14) is satisfied or the right-hand side of (13) exceeds 1. Suppose $t^\alpha e^{2 - p t^\beta} < 1$. Since we are assuming $t \ge 1$, we have $t^\beta \ge 2/p$. Thus our goal is to show that (14) holds when $t^\beta \ge 2/p$. Since the left-hand side of (14) is increasing in $t$, it suffices to prove that (14) holds when $t = (2/p)^{1/\beta}$. To obtain this, start with the inequality $\log(1 - x/2) \ge -(\log 2)x$ for $x \in (0, 1)$, which holds since $\log(1 - x/2)$ is concave and equals $-(\log 2)x$ at $x = 0, 1$. Setting $x = 1/\beta$ for β > 1 and exponentiating, we obtain $1 - \frac{1}{2\beta} \ge 2^{-1/\beta}$. Rearranging terms, $2^{1/\beta}\big(1 - \frac{1}{2\beta}\big) \ge 1$. Hence,
\[ 2^{1/\beta}\Big(1 - \frac{1}{2\beta}\Big) \ge \Big(\frac{\alpha}{\beta}\Big)^{1/\beta}, \]
since the right-hand side is smaller than 1. Multiplying both sides of this inequality by $p^{-1/\beta}$ and substituting $t = (2/p)^{1/\beta}$,
\[ t - \frac{t}{2\beta} \ge \Big(\frac{\alpha}{p\beta}\Big)^{1/\beta}. \]
And this is exactly (14), since $t/2\beta = 1/p\beta t^\alpha$ for $t = (2/p)^{1/\beta}$ and $\alpha = \beta - 1$. This completes the proof assuming $EX^\beta = \alpha/\beta$ and $\beta - \alpha = 1$.

Now, suppose we are given $X^* \preceq_p X$ without these extra assumptions. Let $Y = (X/\mu)^{\beta-\alpha}$ and let $Y^\dagger = (X^*/\mu)^{\beta-\alpha}$, and observe that $(X/\mu)^*$ is given by $X^*/\mu$. By Lemma 9, the random variable $Y^\dagger$ has the $(\alpha', \beta')$-generalized equilibrium distribution of $Y$, where $\alpha' = \alpha/(\beta - \alpha)$ and $\beta' = \beta/(\beta - \alpha)$. Now $\beta' - \alpha' = 1$ and $E Y^{\beta'} = E(X/\mu)^\beta = \alpha/\beta$. We now apply the special case of the theorem already proven to obtain
\[ P[X \ge \mu t] = P[Y \ge t^{\beta-\alpha}] \le t^{(\beta-\alpha)\alpha'} e^{2 - p t^{(\beta-\alpha)\beta'}} = t^\alpha e^{2 - p t^\beta} \]
for any $t \ge 1$.

The proof when α > β follows the same structure, differing only in some analytical details.

Proof of Theorem 3 for α > β. As in the α < β case, it suffices to prove the theorem under the assumptions $\mu = 1$ (or equivalently $EX^\beta = \alpha/\beta$) and $\alpha - \beta = 1$. Let $t_0 = (\alpha/p\beta)^{1/\beta}$ and $Z$ be the random variable supported on $(t_0, \infty)$ defined by (10). The formula (11) for the mean residual life of $Z$ does not hold in the $\alpha - \beta = 1$ case, since the computation of the integral in (11) relies on $\beta - \alpha = 1$, but we can instead observe that
\[ P[Z^{-1} \le t] = \Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} t^{-\alpha} e^{-p t^{-\beta}} \]
for $0 < t < t_0^{-1}$ and then compute
\[ E\big[(Z^{-1} - t)\mathbf{1}\{Z^{-1} < t\}\big] = -\int_0^t P[Z^{-1} \le u]\,du = -\frac{1}{p\beta}\Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} e^{-p t^{-\beta}}. \]
Hence,
\[ E[Z^{-1} - t \mid Z^{-1} \le t] = -\frac{t^\alpha}{p\beta}. \]

We then have
\[ E[X^{-1} - t \mid X^{-1} \le t] \ge E[X^{-1} - t \mid X^{-1} < t] \ge E[Z^{-1} - t \mid Z^{-1} \le t] \]
by Lemma 11. By [SS07, Theorem 4.A.27], it holds for all decreasing convex functions $\varphi(x)$ that $E\varphi(X^{-1}) \le E\varphi(Z^{-1})$. Taking $\varphi(x) = \max\big(1 + (t - x)/\gamma,\, 0\big)$ for γ > 0 to be chosen later and observing that $\varphi(x) \ge \mathbf{1}\{x \le t\}$, we obtain
\[
P[X^{-1} \le t] \le E\varphi(X^{-1}) \le E\varphi(Z^{-1}) = \frac{1}{\gamma} E\big[(\gamma + t - Z^{-1})\,\mathbf{1}\{Z^{-1} \le \gamma + t\}\big] = \frac{1}{p\beta\gamma}\Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} e^{-p(\gamma + t)^{-\beta}},
\]
assuming $\gamma + t < t_0^{-1}$. Applying the inequality $(\gamma + t)^{-\beta} \ge t^{-\beta} - \beta\gamma t^{-\beta-1}$, which holds by convexity of $x^{-\beta}$, and then optimizing by setting $\gamma = t^\alpha/p\beta$,
\[
P[X^{-1} \le t] \le \frac{1}{p\beta\gamma}\Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} e^{-p(t^{-\beta} - \beta\gamma t^{-\alpha})} = \Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} t^{-\alpha} e^{1 - p t^{-\beta}}.
\]

Thus, under the assumption
\[ \frac{1}{p\beta t^\alpha} + t^{-1} < \Big(\frac{p\beta}{\alpha}\Big)^{1/\beta}, \tag{15} \]
we obtain
\[ P[X \ge t] = P[X^{-1} \le t^{-1}] \le \Big(\frac{p\beta e}{\alpha}\Big)^{\alpha/\beta} t^\alpha e^{1 - p t^\beta} \le t^\alpha e^{p + 1 - p t^\beta} \le t^\alpha e^{2 - p t^\beta}, \tag{16} \]

applying (12) in the last step.

Finally, we show that either (15) holds or the right-hand side of (16) exceeds 1. Let $f(t) = 2 - p t^\beta + (\beta + 1)\log t$, so that the right-hand side of (16) is equal to $e^{f(t)}$. Let $g(t) = 1/p\beta t^{\beta+1} + t^{-1}$, the left-hand side of (15). Our goal is to show for all $t \ge 1$ that if $f(t) < 0$, then $g(t) < (p\beta/(\beta+1))^{1/\beta}$. It is easy to check that for $t \ge 1$, the function $f(t)$ increases and then decreases, with $f(1) > 0$. Since $g(t)$ is decreasing, it suffices to find a value $s \ge 1$ so that $f(s) \ge 0$ and $g(s) \le (p\beta/(\beta+1))^{1/\beta}$. We claim that this holds for
\[ s = \Big(\frac{2(\beta+1)}{p\beta}\Big)^{1/\beta}. \]
To confirm this, we compute
\[ f(s) = 2 + \frac{\beta+1}{\beta}\Big(\log\frac{2(\beta+1)}{p\beta} - 2\Big) \ge 2 + \frac{\beta+1}{\beta}\Big(\log\frac{2(\beta+1)}{\beta} - 2\Big). \]
A bit of calculus shows that the right-hand side of this inequality is minimized when $(\beta+1)/\beta = e/2$ and is equal to $2 - e/2 > 0$ in this case. Finally, we must confirm that $g(s) \le (p\beta/(\beta+1))^{1/\beta}$. We compute
\[ g(s) = \Big(\frac{p\beta}{2(\beta+1)}\Big)^{1/\beta}\,\frac{2\beta+3}{2\beta+2}. \]
Cancelling a factor of $(p\beta/(\beta+1))^{1/\beta}$, we must show that
\[ 2^{-1/\beta}\,\frac{2\beta+3}{2\beta+2} \le 1. \tag{17} \]
Using the inequality $\log(x+1) - \log x \le 1/x$,
\[ \log(2\beta+3) - \log(2\beta+2) \le \frac{1}{2\beta+2} \le \frac{1}{2\beta} \le \frac{\log 2}{\beta}, \]
and exponentiating both sides of this inequality confirms (17).

2.2. Upper tail bounds for α = β. As we mentioned in the introduction, Theorem 3 in the case α = β = p = 1 was first proven in [Bro06]. The general α = β case with p = 1 then follows by an application of Lemma 9, and it is not difficult to modify Brown's argument to allow p < 1. To save the reader the effort of going back and forth between Brown's paper and this one, and to highlight his elegant proof, we will present the full argument here.

Let $Z(u)$ be a random variable with the distribution $[X^{(\beta-\alpha)} \mid X^{(\beta-\alpha)} \ge u]$. If $U \sim \mu$ is a random variable, let $Z(U)$ denote a random variable whose distribution is the mixture governed by μ, i.e.,
\[ P\big[Z(U) \in B\big] = \int P\big[X^{(\beta-\alpha)} \in B \mid X^{(\beta-\alpha)} \ge u\big]\,\mu(du). \]

Proposition 12. Let π be the law of $X$, and define $(U, W)$ as the random variables with density
\[ \mathbf{1}\{u \le w\}\,\frac{\alpha u^{\alpha-1} w^{\beta-\alpha}}{E X^\beta}\,du\,d\pi(w). \]


Then
(a) $U \stackrel{d}{=} X^*$;
(b) $W \stackrel{d}{=} X^{(\beta)}$;
(c) for any Borel set $B \subset \mathbb{R}$,
\[ P[U \in B \mid W] = P[V_\alpha W \in B \mid W] \text{ a.s.}, \]
where $V_\alpha$ has density $\alpha x^{\alpha-1}$ on $[0,1]$ and is independent of $W$;
(d) for any Borel set $B \subset \mathbb{R}$,
\[ P[W \in B \mid U] = P[Z(U) \in B \mid U] \text{ a.s.}; \]
(e) $Z(X^*) \stackrel{d}{=} X^{(\beta)}$.

Proof. The density of $W$ is $(w^\beta/E X^\beta)\,d\pi(w)$, proving (b). Looking at the conditional density of $U$ given $W = w$, we see that (c) holds. Taking expectations in (c), we have $U \stackrel{d}{=} V_\alpha W$, and together with (b) this implies (a). Fact (d) is proven by observing that the conditional density of $W$ given $U = u$ is
\[ \mathbf{1}\{w \ge u\}\,\frac{w^{\beta-\alpha}}{E\big[X^{\beta-\alpha}\mathbf{1}\{X \ge u\}\big]}\,d\pi(w), \]
which is the density of $Z(u)$, the $(\beta-\alpha)$-power bias transform of $[X \mid X \ge u]$. Finally, taking expectations in (d) gives $W \stackrel{d}{=} Z(U)$, and then (a) and (b) prove (e).

Lemma 13. For any $p \in (0,1]$,
\[ X^* \preceq_p X \implies X^{(\beta)} \preceq_p Z(X), \]
and
\[ X^* \preceq^p X \implies X^{(\beta)} \preceq^p Z(X). \]

Proof. First, we observe that $Z(u)$ is stochastically increasing in $u$. Thus we can couple the random variables $(Z(u))_{u \ge 0}$ so that $Z(u) \le Z(v)$ whenever $u \le v$ (for example, by coupling all $Z(u)$ to the same Unif[0, 1] random variable by the inverse probability transform). Under such a coupling, $X^* \le X$ implies that $Z(X^*) \le Z(X)$.

Now, suppose $X^* \preceq_p X$. By Lemma 8, there exists a coupling $(X, X^*)$ under which $P[X^* \le X \mid X^*] \ge p$ a.s. Hence,
\[ P[Z(X^*) \le Z(X) \mid X^*] \ge P[X^* \le X \mid X^*] \ge p \text{ a.s.} \]
Thus $P[Z(X^*) \le Z(X) \mid Z(X^*)] \ge p$ a.s. as well. Since $Z(X^*) \stackrel{d}{=} X^{(\beta)}$ by Proposition 12(e), this proves the lemma when $X^* \preceq_p X$. The proof when $X^* \preceq^p X$ is identical except we take conditional expectations given $X$ rather than $X^*$.

The next lemma will be used to prove a tail estimate for $Z(X)$ when α = β.

Lemma 14. Let μ be a probability measure on $[0, \infty)$. Then for any $t > 0$,
\[ \int_{[0,t)} \frac{1}{\mu[x,\infty)}\,\mu(dx) \le -\log \mu[t,\infty). \]


Proof. Assume without loss of generality $\mu[t,\infty) > 0$. For some partition $0 = x_0 < \cdots < x_n = t$, let $\varphi(x)$ be the step function taking value $1/\mu[x_i,\infty)$ on the interval $[x_i, x_{i+1})$ for $i = 0, \ldots, n-1$. Then
\[ \int_{[0,t)} \varphi(x)\,\mu(dx) = \sum_{i=0}^{n-1} \frac{\mu[x_i, x_{i+1})}{\mu[x_i,\infty)} \le \sum_{i=0}^{n-1} -\log\frac{\mu[x_{i+1},\infty)}{\mu[x_i,\infty)} = -\log \mu[t,\infty), \tag{18} \]
with the inequality holding because $x \le -\log(1-x)$ for $x \in [0,1)$.

Now, consider any sequence $\varphi_n(x)$ of such step functions where each partition refines the last and the mesh size of the partition goes to zero. Then $\varphi_n(x)$ converges upward to $1/\mu[x,\infty)$ for all $x \in [0,t)$, and
\[ \int_{[0,t)} \varphi_n(x)\,\mu(dx) \to \int_{[0,t)} \frac{1}{\mu[x,\infty)}\,\mu(dx) \]
by the monotone convergence theorem. This proves the lemma by (18).

Lemma 15. For α = β and any random variable $X$,
\[ P[Z(X) \ge t] \le P[X \ge t]\big(1 - \log P[X \ge t]\big). \]

Proof. Let $G(t) = P[X \ge t]$. For any $u \ge 0$,
\[ P[Z(u) \ge t] = \mathbf{1}\{u \ge t\} + \mathbf{1}\{u < t\}\,\frac{G(t)}{G(u)}. \]
Hence,
\[ P[Z(X) \ge t] = G(t) + G(t)\,E\Big[\frac{\mathbf{1}\{X < t\}}{G(X)}\Big], \]
and by Lemma 14 we have
\[ P[Z(X) \ge t] \le G(t) + G(t)\big(-\log G(t)\big). \]

Proof of Theorem 3 for α = β. As in the α ≠ β cases, it suffices to prove the theorem under the assumption $\mu = 1$, or equivalently $EX^\beta = \alpha/\beta = 1$. From the definition of $X^{(\beta)}$,
\[ p t^\beta\,P[X \ge t] \le p\,E\big[X^\beta \mathbf{1}\{X \ge t\}\big] = p\,P[X^{(\beta)} \ge t]. \]
By Lemma 13 followed by Lemma 15,
\[ p\,P\big[X^{(\beta)} \ge t\big] \le P[Z(X) \ge t] \le P[X \ge t]\big(1 - \log P[X \ge t]\big). \]
We have now shown that $p t^\beta P[X \ge t] \le P[X \ge t](1 - \log P[X \ge t])$. Assuming $P[X \ge t] > 0$, we can divide to obtain $p t^\beta \le 1 - \log P[X \ge t]$, demonstrating that $P[X \ge t] \le e^{1 - p t^\beta}$.

Theorem 3 is sharp when α = β, including the constant factor of $e$. We show this with an example that is a discrete counterpart to the one given in [Bro13]. Choose integers μ and $n$ and let $p = 1/\mu$, and let $X$ have a capped version of the geometric distribution with success probability $p$ as follows:
\[ P[X = k] = \begin{cases} (1-p)^{k-1} p & \text{if } 1 \le k \le n, \\ (1-p)^n & \text{if } k = n + \mu, \\ 0 & \text{otherwise.} \end{cases} \]


Then $X^* \preceq X$ with α = β = 1 (easy to check with Lemma 22), and $EX = \mu$. Now, set $n = (t-1)\mu$ for some integer $t \ge 2$, and we have
\[ P[X \ge \mu t] = P[X \ge n + \mu] = \big(1 - \mu^{-1}\big)^{(t-1)\mu}. \]
This converges to $e^{1-t}$ as $\mu \to \infty$, confirming that there exist examples in which $P[X \ge \mu t]$ comes arbitrarily close to $e^{1-t}$.
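The convergence is easy to evaluate exactly; here is a small sketch (ours, with arbitrary parameter values).

import numpy as np

def capped_tail(mu, t):
    # P[X >= mu*t] = P[X = n + mu] = (1 - 1/mu)^((t-1)*mu), with n = (t-1)*mu.
    return (1.0 - 1.0 / mu) ** ((t - 1) * mu)

t = 3
for mu in [10, 100, 1000, 10000]:
    print(mu, capped_tail(mu, t), np.exp(1 - t))  # tail approaches e^(1-t) = e^(-2)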

When α ≠ β, one would hope for a tail bound of $O(t^{\alpha-\beta}e^{-pt^\beta})$ rather than the $O(t^\alpha e^{-pt^\beta})$ achieved in Theorem 3, which would match the tail of the generalized gamma distribution. Perhaps Brown's proof for the α = β case could be adapted to this one, though we are not sure what the replacement for Lemma 15 should be when α ≠ β.

2.3. Lower tail bounds.

Proof of Theorem 4. As in the proof of Theorem 3, by rescaling it suffices to prove the theorem when $\mu = 1$, i.e., $EX^\beta = \alpha/\beta$. From the definition of $\preceq^p$,
\[
p\,P[X \le t] \le P[X^* \le t] = \int_0^1 \alpha u^{\alpha-1}\,P\big[X^{(\beta)} \le t/u\big]\,du
= \frac{\beta}{\alpha}\,E\Big[\int_0^1 \alpha u^{\alpha-1} X^\beta\,\mathbf{1}\{X \le t/u\}\,du\Big]
= \frac{\beta}{\alpha}\,E\Big[X^\beta \int_0^{\min(t/X,\,1)} \alpha u^{\alpha-1}\,du\Big]
= \frac{\beta}{\alpha}\,E\big[X^\beta \min\big((t/X)^\alpha, 1\big)\big]
= \frac{\beta}{\alpha}\,E\big[t^\alpha X^{\beta-\alpha}\,\mathbf{1}\{X > t\} + X^\beta\,\mathbf{1}\{X \le t\}\big].
\]
If α = β, then the expectation is bounded by $t^\alpha$, proving (2). Otherwise, we bound the two terms in the expectation separately to get
\[ p\,P[X \le t] \le \frac{\beta}{\alpha}\big(t^\alpha\,EX^{\beta-\alpha} + t^\beta\,P[X \le t]\big). \]
By Jensen's inequality, $EX^{\beta-\alpha} \le (EX^\beta)^{(\beta-\alpha)/\beta} = (\alpha/\beta)^{1-\alpha/\beta}$. Applying this bound and rearranging terms yields (1).

If $X$ has the $(\alpha, \beta)$-generalized gamma distribution, then Theorem 4 applies to $X$ with $\mu = p = 1$. Up to constants, the $O(t^\alpha)$ bound for $P[X \le t]$ shown in Theorem 4 matches the true tail behavior of $X$ as $t \to 0$.

3. Concentration for urns, graphs, walks, and trees

Each of the examples in Theorem 5 can be expressed as an urn model that we describe now. An urn starts with black and white balls and draws are made sequentially. After a ball is drawn, it is replaced and another ball of the same color is added to the urn. Also, after every $l$th draw an additional black ball is added to the urn, for some $l \ge 1$. As defined in Section 1.2 of [PRR16], let $P^l_n(b, w)$ denote the distribution of the number of white balls in the urn after $n$ draws have been made when the urn starts with $b \ge 0$ black balls and $w > 0$ white balls, and let $N_n(b, w) \sim P^l_n(b, w)$. We start with a technical lemma whose proof we defer to the appendix.


Lemma 16. For all $n, k \ge 0$,
\[ P[N_n(b,w) = k-1]\,P[N_{n+1}(b,w) = k+1] \le P[N_n(b,w) = k]\,P[N_{n+1}(b,w) = k]. \tag{19} \]

Let $N^{[r]}_n(b,w)$ be a rising factorial biased version of $N_n(b,w)$, as defined in Lemma 4.2 of [PRR16], so that
\[ P\big[N^{[r]}_n(b,w) = k\big] = c \prod_{i=0}^{r-1}(k+i)\,P[N_n(b,w) = k] \]
for some constant $c$. There it is shown that $N^{[r]}_n(b,w) + r \stackrel{d}{=} N_n(b, w+r)$. We will use this fact to prove concentration for $N_n(1,w)$, but first we must relate the rising factorial bias to the usual power bias:

prove concentration for Nn(1, w), but first we must relate the rising factorial bias to theusual power bias:

Lemma 17. For all $l \ge 1$ we have $N^{[l+1]}_{n-l}(b,w) + l \succeq N^{(l+1)}_n(b,w)$.

Proof. We will show that $P\big[N^{[l+1]}_{n-l}(b,w) + l = k\big]/P\big[N^{(l+1)}_n(b,w) = k\big]$ is increasing in $k$ on the union of the supports of $N^{[l+1]}_{n-l}(b,w) + l$ and $N^{(l+1)}_n(b,w)$, which is $\{w, w+1, \ldots, w+n\}$. Up to a constant not depending on $k$, we have
\[ \frac{P\big[N^{[l+1]}_{n-l}(b,w) + l = k\big]}{P\big[N^{(l+1)}_n(b,w) = k\big]} = \frac{(k-l)\cdots k\,P[N_{n-l}(b,w) = k-l]}{k^{l+1}\,P[N_n(b,w) = k]}. \]
The expression $(k-l)\cdots k/k^{l+1}$ is increasing in $k$, as is
\[ \frac{P[N_{n-l}(b,w) = k-l]}{P[N_n(b,w) = k]} = \prod_{j=n-l}^{n-1} \frac{P[N_j(b,w) = j-n+k]}{P[N_{j+1}(b,w) = j-n+k+1]} \]
for $w \le k \le w+n$, since it is $0$ for $w \le k < w+l$ and each factor on the right-hand side in the product is increasing in $k$ by Lemma 16 for $w+l \le k \le w+n$ (in this range of $k$, the denominators of the fractions in the product are all nonzero). This proves the desired stochastic domination by [SS07, Theorem 1.C.1].

Proposition 18. For all $l, w, n \ge 1$,
\[ N^*_n(1,w) \preceq N_n(1,w), \]
where $N^*_n(1,w)$ is the $(w, l+1)$-generalized equilibrium transform of $N_n(1,w)$.

Proof. Let $Q_w(n)$ have the distribution of the number of white balls in a regular Pólya urn after $n$ draws starting with 1 black and $w$ white balls. We then use Lemma 4.5 of [PRR16] in the first line below to get
\[
N_n(1,w) \stackrel{d}{=} Q_w(N_n(0, w+1) - w - 1)
\stackrel{d}{=} Q_w(N_{n-l}(1, w+1+l) - w - 1)
\stackrel{d}{=} Q_w\big(N^{[l+1]}_{n-l}(1,w) + l + 1 - w - 1\big)
\succeq V_w\big(N^{[l+1]}_{n-l}(1,w) + l\big),
\]
where in the second line we use the trivial relation
\[ N_n(0, w+1) \stackrel{d}{=} N_{n-l}(1, w+1+l), \]
in the third line we use
\[ N_n(1, w+r) \stackrel{d}{=} N^{[r]}_n(1,w) + r, \]
which follows from Lemma 4.2 of [PRR16], and in the last line we use
\[ Q_w(n) \succeq (n+w)V_w, \]
which follows from the fact, taken from the proof of Lemma 4.4 of [PRR16], that for independent and identically distributed Uniform(0,1) variables $U_0, U_1, \ldots, U_{w-1}$ we can write
\[ Q_w(n) \stackrel{d}{=} \max_{i=0,1,\ldots,w-1}\big(i + \lceil (n+w-i)U_i \rceil\big), \]
and this implies
\[ Q_w(n) \succeq \max_{i=0,1,\ldots,w-1} (n+w)U_i = (n+w)V_w. \]
We now apply Lemma 17 to obtain
\[ N_n(1,w) \succeq V_w N^{(l+1)}_n(1,w) \stackrel{d}{=} N^*_n(1,w). \]

Proof of Theorem 5. We have $W \sim P^l_n(1,w)$, $V \sim P^1_{n-k-1}(1, 2k)$, $L^b_{2n} \sim P^1_n(0,1)$, and $L_{2n} \sim P^1_n(1,1)$, respectively from Remark 1.3, Proposition 2.1, Proposition 3.2, and Proposition 3.4 in [PRR16]. The result then follows from Proposition 18 and noting that the conditions of Theorems 3 and 4 hold.

Remark 19. The factorial moments of $N_n(1,w)$ are explicitly computed in [PRR16, Lemma 4.1]. When $w = l+1$, the concentration bound given in Theorem 5 is better than the one obtained from these moments. For the sake of simplicity, we illustrate with the case $l = 1$, $w = 2$. The result of part (a) of Theorem 5 along with Theorem 3 shows that
\[ P\big[N_n(1,w) \ge \gamma_n t\big] \le e^{1-t^2}, \tag{20} \]
where $\gamma_n^2 = E N_n(1,w)^2$. From [PRR16, Theorem 1.2], we know that $\gamma_n \sim 2\sqrt{n}$.

Now, we compute the concentration inequality given by the factorial moments of $N_n(1,w)$. Using the notation $x^{[n]} = x(x+1)\cdots(x+n-1)$, from [PRR16, Lemma 4.1] we have
\[ E\big(N_n(1,w)\big)^{[2m]} = 2^{[2m]} \prod_{i=0}^{n-1}\Big(1 + \frac{2m}{2i+3}\Big) = 2^{[2m]} \prod_{i=0}^{m-1} \frac{2n+2i+3}{2i+3}. \]

The bound given by applying Markov's inequality to this is
\[ P\big[N_n(1,w) \ge \gamma_n t\big] \le \frac{E\big(N_n(1,w)\big)^{[2m]}}{(\gamma_n t)^{[2m]}} = \frac{2^{[2m]}}{(\gamma_n t)^{[2m]}} \prod_{i=0}^{m-1} \frac{2n+2i+3}{2i+3} = \prod_{i=0}^{m-1} \frac{(2i+2)(2n+2i+3)}{(\gamma_n t + 2i)(\gamma_n t + 2i+1)}. \tag{21} \]

Let $m^*$ be the minimizing choice of $m$ in (21). Some algebra shows that the multiplicand in this expression is bounded by 1 if and only if
\[ i \le \frac{\frac{\gamma_n^2}{n}t^2 + \frac{\gamma_n}{n}t - 4 - \frac{6}{n}}{4 + \frac{8}{n} - \frac{4\gamma_n t}{n}}. \]
Take $t$ to be fixed with respect to $n$. From the asymptotics for $\gamma_n$, the right-hand side of this inequality converges to $t^2 - 1$ as $n \to \infty$. Hence either $m^* = \lceil t^2 \rceil - 1$ or $m^* = \lceil t^2 \rceil$ when $n$ is sufficiently large. The optimal tail bound obtained from the factorial moments is then

\[ \prod_{i=0}^{m^*-1} \frac{(2i+2)\big(2 + \frac{2i+3}{n}\big)}{\big(\frac{\gamma_n}{\sqrt{n}}\,t + \frac{2i}{\sqrt{n}}\big)\big(\frac{\gamma_n}{\sqrt{n}}\,t + \frac{2i+1}{\sqrt{n}}\big)}, \]
which converges as $n \to \infty$ to
\[ \prod_{i=0}^{m^*-1} \frac{(2i+2)(2)}{4t^2} = \frac{(m^*)!}{(t^2)^{m^*}} = \Omega(t e^{-t^2}), \]
applying Stirling's approximation to obtain the last estimate. Thus this bound is worse than (20) by a factor of $t$ when $n$ and $t$ are large.

A more involved calculation for the general case $w = \alpha$, $l = \beta - 1$ shows that the tail bound from moments is on the order of $t^{\alpha-\beta/2}e^{-t^\beta}$. Outside of the α = β case, our bound is on the order of $t^\alpha e^{-t^\beta}$ and is outperformed by the moment bound.

4. Concentration for Galton–Watson processes

In this section, we will use only the standard equilibrium transform (α = β = 1) for integer-valued random variables. We use the notation $X^e$ to denote the discrete version of this transform, which we can define by setting $X^e = \lceil X^* \rceil$ with $X^*$ the standard α = β = 1 equilibrium transform. Equivalently, we can define $X^e$ to be chosen uniformly at random from $\{1, 2, \ldots, X^s\}$, where $X^s$ is the size-bias transform of $X$. Observe that $X^e \preceq X$ holds if and only if $X^* \preceq X$ when $X$ takes values in the nonnegative integers, which follows from the coupling interpretation of stochastic dominance.
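In terms of probability mass functions, the proof of Lemma 22 below shows that $P[X^e = n] = P[X \ge n]/EX$ for $n \ge 1$. The sketch here (ours) computes the discrete equilibrium transform that way for a finitely supported $X$; note that a geometric law is a fixed point.

import numpy as np

def discrete_equilibrium_pmf(pmf):
    # pmf[k] = P[X = k] for k = 0, ..., len(pmf) - 1.
    # Returns q with q[n] = P[X^e = n] = P[X >= n] / E[X] for n >= 1 (and q[0] = 0).
    pmf = np.asarray(pmf, dtype=float)
    mean = np.sum(np.arange(len(pmf)) * pmf)
    tails = pmf[::-1].cumsum()[::-1]          # tails[k] = P[X >= k]
    return np.concatenate(([0.0], tails[1:])) / mean

# Geometric(1/2) on {1, 2, ...}, truncated far out in the tail: X^e = X in distribution.
pmf = np.zeros(60)
pmf[1:] = 0.5 ** np.arange(1, 60)
pmf[1:] /= pmf[1:].sum()
print(np.max(np.abs(discrete_equilibrium_pmf(pmf)[:10] - pmf[:10])))  # ~ 0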

4.1. Some concepts from reliability theory. Let $X$ take positive integer values. As we mentioned in the introduction, the random variable $X$ is IFR if $P[X = k \mid X \ge k]$ is increasing for $k \ge 1$. If $X^e \preceq X$, then $X$ is said to be NBUE, which stands for "new better than used in expectation" (see Lemma 22 for the source of this name). A sequence $t_0, t_1, \ldots$ is called log-concave if
(i) $t_n^2 \ge t_{n-1} t_{n+1}$ for all $n \ge 1$, and
(ii) $t_0, t_1, \ldots$ has no internal zeroes (i.e., if $t_i > 0$ and $t_k > 0$ for some $i < k$, then $t_j > 0$ for all $i < j < k$).
For $X$ taking nonnegative integer values, we say that $X$ is log-concave if the sequence $P[X = k]$ for $k \ge 0$ is log-concave.

These definitions are standard in reliability theory, though their analogues for continuous random variables are more common. We collect and sketch the proofs of some facts about them, for the sake of convenience.

Lemma 20. A positive integer–valued random variable $X$ is IFR if and only if the distributions $[X - k \mid X > k]$ are stochastically decreasing for integers $k \ge 0$.

Proof. Let $X$ be IFR and fix a nonnegative integer $k$. We define $T \sim [X - k \mid X > k]$ and $T' \sim [X - k - 1 \mid X > k+1]$ coupled together by the following procedure: Let $p_n = P[X = n \mid X \ge n]$, and let $(B_n)_{n \ge 1}$ be independent with $B_n \sim \mathrm{Bernoulli}(p_n)$. Let $(B'_n)_{n \ge 1}$ have the same distribution as $(B_n)_{n \ge 1}$, but couple the two collections so that $B_n \le B'_{n+1}$, which we can do because $p_n \le p_{n+1}$ by the IFR property. Then let
\[ T = \min\{n \ge 1 : B_{n+k} = 1\}, \qquad T' = \min\{n \ge 1 : B'_{n+k+1} = 1\}. \]
Then $T$ and $T'$ have the right distributions, and $T' \le T$ since $B_n \le B'_{n+1}$, thus demonstrating that $[X - k \mid X > k]$ is stochastically decreasing in $k$.

Conversely, suppose that $[X - k \mid X > k]$ is stochastically decreasing in $k$. Then
\[ P\big[X - k + 1 \le 1 \mid X > k - 1\big] \le P\big[X - k \le 1 \mid X > k\big] \]
by the definition of stochastic dominance, which proves that
\[ P[X = k \mid X \ge k] \le P[X = k+1 \mid X \ge k+1]. \]

Lemma 21. If a positive integer–valued random variable is log-concave, then it is IFR.

Proof. Suppose $X$ takes values in the positive integers and is log-concave. Let $p_n = P[X = n]$, and let $N$ be the highest value such that $p_N > 0$, with $N = \infty$ a possibility. From the definition of log-concave,
\[ \frac{p_{n-1}}{p_n} \le \frac{p_n}{p_{n+1}} \]
for all $1 \le n \le N$. This implies that for any fixed $k$, the ratio
\[ \frac{P[X - k = n \mid X > k]}{P[X - k - 1 = n \mid X > k+1]} = \frac{p_{n+k}}{p_{n+k+1}} \cdot \frac{P[X > k+1]}{P[X > k]} \]
is increasing in $n$, and this condition implies that $[X - k \mid X > k]$ stochastically dominates $[X - k - 1 \mid X > k+1]$ (see [SS07, Theorem 1.C.1]). Hence $X$ is IFR by Lemma 20.

Lemma 22. A positive integer–valued random variable $X$ is NBUE if and only if
\[ E[X - k \mid X > k] \le EX \]
for all integers $k \ge 1$.

Proof. From the definition of the discrete equilibrium transform,
\[ P[X^e = n] = \sum_{k=n}^\infty \frac{1}{k}\,P[X^s = k] = \sum_{k=n}^\infty \frac{1}{EX}\,P[X = k] = \frac{1}{EX}\,P[X \ge n]. \]
Hence
\[ P[X^e > k] = \frac{1}{EX} \sum_{n=k+1}^\infty P[X \ge n] = \frac{1}{EX}\,E\big[(X-k)\mathbf{1}\{X > k\}\big] = P[X > k]\,\frac{E[X - k \mid X > k]}{EX}. \]
Therefore $P[X^e > k] \le P[X > k]$ holds for all $k \ge 1$ if and only if $E[X - k \mid X > k] \le EX$ for all $k \ge 1$.
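Lemma 22 turns NBUE into a finite check. The sketch below (ours) performs it, and confirms the claim from Section 2.2 that the capped geometric distribution there is NBUE.

import numpy as np

def is_nbue(pmf, tol=1e-12):
    # pmf[k] = P[X = k]; check E[X - k | X > k] <= E[X] for all k with P[X > k] > 0.
    pmf = np.asarray(pmf, dtype=float)
    ks = np.arange(len(pmf))
    mean = np.sum(ks * pmf)
    for k in range(len(pmf)):
        tail = pmf[k + 1:].sum()
        if tail <= 0:
            break
        mrl = np.sum((ks[k + 1:] - k) * pmf[k + 1:]) / tail  # mean residual life at k
        if mrl > mean + tol:
            return False
    return True

mu, n = 10, 20
p = 1.0 / mu
pmf = np.zeros(n + mu + 1)
pmf[1:n + 1] = (1 - p) ** np.arange(n) * p   # geometric part on {1, ..., n}
pmf[n + mu] = (1 - p) ** n                   # the capping atom at n + mu
print(is_nbue(pmf))  # True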

These three lemmas show that for $X$ taking values in the positive integers,
\[ X \text{ is log-concave} \implies X \text{ is IFR} \implies X \text{ is NBUE}. \tag{22} \]
For $X$ taking nonnegative integer values, we say that $X$ is NBUEC if $X^>$ is NBUE, or equivalently if $X^e \preceq X^>$. This is our coinage; all other terminology in this section is standard.


In the language defined here (and taking into account the equivalence of $X^e \preceq X$ and $X^* \preceq X$), Theorem 6 states that if the child distribution of a Galton–Watson process is NBUE, then all generations are NBUE. Theorem 7 states that with $L$ the child distribution, if $L^>$ is IFR then all generations of the process are NBUEC.

4.2. Forming the equilibrium transform. First, we give a recipe for forming the equilibrium transform of a sum:

Lemma 23. Let $X_1, \ldots, X_n$ be i.i.d. nonnegative integer-valued random variables, and let $S = X_1 + \cdots + X_n$. Then
\[ S^e \stackrel{d}{=} \sum_{k=1}^{I-1} X_k + X^e_I, \tag{23} \]
where $I$ is chosen uniformly at random from $\{1, \ldots, n\}$, independent of all else.

Proof. The special case of [PR11a, Theorem 4.1] in which $X_1, \ldots, X_n$ are i.i.d. shows the analogous statement to (23) for the usual equilibrium transform, and applying the ceiling function to both sides of the resulting equation gives (23).

Next, we consider the equilibrium transform of a mixture. To give notation for a mixture, let $h$ be a probability measure on the real numbers. Suppose that for each $b$ in the support of $h$, we have a random variable $X_b$ with distribution $\nu_b$ and mean $m_b \in [0, \infty)$. Also assume that $b \mapsto m_b$ is measurable. The random variable $X$ is the mixture of $(X_b)$ governed by $h$ if for all bounded measurable functions $g$,
\[ E g(X) = \int E g(X_b)\,dh(b). \]
The basic recipe for the equilibrium transform $X^e$ is that it is a mixture of the equilibrium transforms $X^e_b$, governed by a biased version of $h$. The analogous recipe works for forming the size-bias transform of a mixture, and this result follows from that.

Lemma 24. Let $X$ be the mixture of $(X_b)$ governed by $h$ as described above. Define the measure $h^s$ by its Radon–Nikodym derivative:
\[ \frac{dh^s(b)}{dh} = \frac{m_b}{EX}. \]
Then the distribution of $X^e$ is the mixture of $(X^e_b)$ governed by $h^s$.

Proof. By [AGK19, Lemma 2.4], the size-bias transform $X^s$ is distributed as the mixture of $X^s_b$ governed by $h^s$. With $U \sim \mathrm{Unif}[0,1]$ independent of all else, the equilibrium transform $\lceil U X^s \rceil$ is thus the mixture of $\lceil U X^s_b \rceil$ governed by $h^s$.

4.3. Proofs of the concentration theorems for Galton–Watson trees.

Proof of Theorem 6. Let $L$ be a random variable whose distribution is the child distribution of the tree, independent of all else, and let $Z^{(i)}_n$ for $i \ge 1$ denote independent copies of $Z_n$. Proceeding by induction, we assume that $Z_n$ is NBUE and aim to show that
\[ Z_{n+1} \stackrel{d}{=} \sum_{j=1}^{L} Z^{(j)}_n \]
is NBUE as well. The concentration inequalities (3) and (4) then follow from Theorems 3 and 4.


Let $S_k = \sum_{j=1}^k Z^{(j)}_n$, so that $S_L \stackrel{d}{=} Z_{n+1}$. Let $T_k \stackrel{d}{=} S^e_k$. By Lemma 23,
\[ T_k \stackrel{d}{=} Z^{(1)}_n + \cdots + Z^{(I_k - 1)}_n + \big(Z^{(I_k)}_n\big)^e, \]
where $I_k$ is chosen uniformly at random from $\{1, \ldots, k\}$.

By Lemma 24, the equilibrium transform of the mixture $S_L$ is a mixture of $T_k$, governed by a biased version of $L$. In fact, this biased version is exactly $L^s$, since the bias on $k$ is proportional to $E S_k$, which is proportional to $k$. Hence,
\[ \big(S_L\big)^e \stackrel{d}{=} T_{L^s} = Z^{(1)}_n + \cdots + Z^{(I_{L^s} - 1)}_n + \big(Z^{(I_{L^s})}_n\big)^e \preceq Z^{(1)}_n + \cdots + Z^{(I_{L^s} - 1)}_n + Z^{(I_{L^s})}_n, \]
with the inequality following because $Z^{(k)}_n$ is NBUE. Since $I_{L^s}$ is a uniform selection from $\{1, \ldots, L^s\}$, it is the discrete equilibrium transform of $L$. Hence $I_{L^s} \preceq L$ (as $L$ is NBUE), and
\[ \big(S_L\big)^e \preceq Z^{(1)}_n + \cdots + Z^{(L)}_n = S_L. \]

The main idea in the proof of Theorem 6 is that a random sum consisting of an NBUE quantity of NBUE summands remains NBUE. For Theorem 7, it would be nice to argue that an NBUEC quantity of NBUEC summands remains NBUEC, but we have not been able to prove or disprove this (see Remark 27). But we can show the following weaker statement. For a random variable $X$ taking nonnegative integer values, we write $\mathrm{Bin}(X, p)$ to denote the distribution obtained by thinning $X$ by $p$ (i.e., the sum of $X$ independent Bernoulli($p$) random variables).

Proposition 25. Let $X_1, X_2, \ldots$ be i.i.d. and NBUEC. Let $p = P[X_i \ge 1]$, and let $L$ be a random variable taking nonnegative integer values, independent of $X_1, X_2, \ldots$, such that $\mathrm{Bin}(L, p)$ is NBUEC. Then $X_1 + \cdots + X_L$ is NBUEC.

Proof. Let $S = X_1 + \cdots + X_L$, and let $M$ be the number of the random variables $X_1, \ldots, X_L$ that are nonzero. Then
\[ S \stackrel{d}{=} \sum_{k=1}^{M} X^>_k, \tag{24} \]
where $M \sim \mathrm{Bin}(L, p)$ is independent of $(X^>_i)_{i \ge 1}$.

Now, we form the equilibrium transform of $S$. Let $Y_k = (X^>_k)^e = X^e_k$. By using Lemmas 23 and 24 as we did in the proof of Theorem 6,
\[ S^e \stackrel{d}{=} X^>_1 + \cdots + X^>_{M^e - 1} + Y_{M^e}. \]
Since $X_k$ is given as NBUEC, we have $Y_{M^e} \preceq X^>_{M^e}$. And since $M$ is given as NBUEC, we have $M^e \preceq M^>$, and we arrive at
\[ S^e \preceq \sum_{k=1}^{M^>} X^>_k. \]
Finally, viewing $S$ as a sum of $M$-many strictly positive random variables according to (24), conditioning $S$ to be positive is the same as conditioning $M$ to be positive. That is,
\[ S^> \stackrel{d}{=} \sum_{k=1}^{M^>} X^>_k, \]
and we arrive at $S^e \preceq S^>$.


To apply Proposition 25, the NBUEC property for $L$ must be preserved under thinning. We now show that this holds when $L^>$ is IFR.

Lemma 26. Let $L$ be a random variable taking nonnegative integer values. If $L^>$ is IFR, then $\mathrm{Bin}(L, p)^>$ is IFR for all $0 < p \le 1$.

Proof. Let $(B_k)_{k \ge 1}$ be i.i.d. Bernoulli($p$) for arbitrary $p \in (0,1)$, and let $M = B_1 + \cdots + B_L$, so that $M \sim \mathrm{Bin}(L, p)$. Our goal is to show that $P[M = n \mid M \ge n]$ is increasing for $n \ge 1$.

Define
\[ \varphi(t) = E\big[(1-p)^{L-t} \mid L \ge t\big], \]
which is the conditional probability that $B_{t+1} = \cdots = B_L = 0$ given that $L \ge t$. Let $T_n$ be the smallest index $t$ such that $B_1 + \cdots + B_t = n$. We make the following claims:
(i) $P[M = n \mid M \ge n] = E\big[\varphi(T_n) \mid L \ge T_n\big]$;
(ii) the function $\varphi(t)$ is increasing for integers $t \ge 1$;
(iii) the distributions $[T_n \mid L \ge T_n]$ are stochastically increasing in $n$.

To prove (i), we start by observing that $M = n$ holds if and only if $L \ge T_n$ and $B_{T_n+1}, \ldots, B_L$ are all zero. Thus,
\[
P[M = n] = P[B_{T_n+1} = \cdots = B_L = 0 \text{ and } L \ge T_n]
= \sum_{t,\ell} P[T_n = t,\, L = \ell,\, B_{t+1} = \cdots = B_\ell = 0]\,\mathbf{1}\{\ell \ge t\}
= \sum_{t,\ell} P[T_n = t,\, L = \ell]\,(1-p)^{\ell-t}\,\mathbf{1}\{\ell \ge t\}
= \sum_t P[T_n = t] \sum_\ell P[L = \ell]\,(1-p)^{\ell-t}\,\mathbf{1}\{\ell \ge t\}
= \sum_t P[T_n = t]\,E\big[(1-p)^{L-t}\mathbf{1}\{L \ge t\}\big]
= \sum_t P[T_n = t]\,\varphi(t)\,P[L \ge t].
\]
Since $L$ and $T_n$ are independent, we obtain
\[
P[M = n] = \sum_t P[T_n = t,\, L \ge t]\,\varphi(t) = \sum_{t,\ell} P[T_n = t,\, L = \ell]\,\varphi(t)\,\mathbf{1}\{\ell \ge t\} = E\big[\varphi(T_n)\mathbf{1}\{L \ge T_n\}\big].
\]
Observing that the events $\{M \ge n\}$ and $\{L \ge T_n\}$ are the same and dividing both sides of the above equation by the probability of this event yields (i).

Now we prove (ii). Since $L^>$ is IFR, the distributions $[L - t \mid L \ge t]$ are stochastically decreasing in $t \ge 1$ by Lemma 20. Thus $\varphi(t)$ is obtained by taking the expectation of the decreasing function $x \mapsto (1-p)^x$ under a stochastically decreasing sequence of distributions, showing that $\varphi(t)$ is increasing in $t$.

To prove (iii), it suffices (see [SS07, Theorem 1.C.1]) to show that
\[ \frac{P[T_{n+1} = k \mid L \ge T_{n+1}]}{P[T_n = k \mid L \ge T_n]} \text{ is increasing in } k \text{ for } k \ge n. \tag{25} \]


Thus we consider
\[
\frac{P[T_{n+1} = k \mid L \ge T_{n+1}]}{P[T_n = k \mid L \ge T_n]} = \frac{P[T_{n+1} = k,\, L \ge k]}{P[L \ge T_{n+1}]} \cdot \frac{P[L \ge T_n]}{P[T_n = k,\, L \ge k]} = \frac{P[T_{n+1} = k]}{P[T_n = k]} \cdot \frac{P[L \ge T_n]}{P[L \ge T_{n+1}]},
\]
with the second equality following from the independence of $T_n$ and $T_{n+1}$ from $L$. The final bit is to compute probabilities for the negative binomial distribution:
\[ \frac{P[T_{n+1} = k]}{P[T_n = k]} = \frac{\binom{k-1}{n}\,p^{n+1}(1-p)^{k-n-1}}{\binom{k-1}{n-1}\,p^n(1-p)^{k-n}} = \frac{(k-n)p}{n(1-p)}. \]
This is increasing in $k$ for $k \ge n$, which proves (25).

Now, statements (i)–(iii) combine to prove the lemma: the quantity $E[\varphi(T_n) \mid L \ge T_n]$ is increasing in $n$ by (ii) and (iii), and hence $M^>$ is IFR by (i).

Proof of Theorem 7. Let $L$ be the child distribution of the tree. By (22) and Lemma 26, all thinnings of $L$ are NBUEC. Hence Proposition 25 applies and shows that
\[ \sum_{k=1}^{L} X_k \]
is NBUEC whenever $(X_k)_{k \ge 1}$ are an i.i.d. family of NBUEC random variables. Applying this inductively to each generation $Z_n$ of the Galton–Watson process shows that $Z_n$ is NBUEC for all $n$. Therefore Theorems 3 and 4 apply to $Z^>_n$ and prove (5) and (6).

Remark 27. Theorem 6 states that if the child distribution of a Galton–Watson process is NBUE, then all its generations are NBUE and hence satisfy concentration inequalities. Log-concave and IFR distributions are not preserved in this way. For a counterexample, consider a child distribution placing probability 1/8 on 1, probability 49/64 on 2, and probability 7/64 on 3. This distribution is log-concave and IFR, but the size of the second generation is neither.
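The counterexample can be verified mechanically; the sketch below (ours) builds the law of the second generation by convolution and tests the IFR condition, matching the remark's claim.

import numpy as np

child = np.zeros(4)
child[1], child[2], child[3] = 1/8, 49/64, 7/64

def random_sum_pmf(count_pmf, step_pmf):
    # pmf of X_1 + ... + X_N, with N ~ count_pmf and X_i ~ step_pmf, all independent.
    out = np.zeros((len(count_pmf) - 1) * (len(step_pmf) - 1) + 1)
    conv = np.array([1.0])                    # pmf of the empty sum
    for prob in count_pmf:
        out[:len(conv)] += prob * conv
        conv = np.convolve(conv, step_pmf)
    return out

def is_ifr(pmf):
    # P[X = k | X >= k] should be increasing in k for k >= 1.
    hazards = [pmf[k] / pmf[k:].sum() for k in range(1, len(pmf)) if pmf[k:].sum() > 0]
    return all(a <= b + 1e-12 for a, b in zip(hazards, hazards[1:]))

z2 = random_sum_pmf(child, child)             # Z_2: a child-distributed number of children
print(is_ifr(child), is_ifr(z2))              # expect True, False per the remark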

Lemma 26 states that if $L^>$ is IFR, then $\mathrm{Bin}(L,p)^>$ is IFR. The NBUEC property is not preserved under thinning in this way. Let
\[ L = \begin{cases} 1 & \text{with probability } 89/100, \\ 2 & \text{with probability } 109/1000, \\ 3 & \text{with probability } 9/10000, \\ 4 & \text{with probability } 1/11250, \\ 5 & \text{with probability } 1/90000. \end{cases} \]
We leave it as an exercise that this distribution is NBUEC (in fact, NBUE) but that all of its thinnings fail to be NBUEC.

Because the NBUEC property is not preserved by thinning, our proof of Theorem 7 will not work if the condition that $L^>$ is IFR is relaxed to $L$ being NBUEC. In fact, we are truly unsure whether the theorem holds with that condition.

4.4. Previous concentration results for Galton–Watson processes. Let $Z_n$ be the $n$th generation of a Galton–Watson process whose child distribution has mean μ > 1. Let $W$ be the almost sure limit of $Z_n/\mu^n$, which exists and is nondegenerate when $E[Z_1 \log Z_1] < \infty$. In Theorems 6 and 7, properties of the child distribution continue to hold for $Z_n/\mu^n$ at all generations. This is in a similar spirit to many results linking properties of the child distribution to those of $W$. For example, for α > 1 it holds that $E Z_1^\alpha$ is finite if and only if $E W^\alpha$ is finite [BD74]. Similarly, $Z_1$ has a regularly varying distribution with index α > 1 if and only if $W$ does [DM82].

One line of results is on the right tail when the child distribution is bounded. Let $d$ be its maximum value, and let $\gamma = \log d/\log \mu > 1$. Biggins and Bingham [BB93] used a classic result of Harris [Har48] to show that
\[ -\log P[W \ge x] = x^{\gamma/(\gamma-1)} N(x) + o\big(x^{\gamma/(\gamma-1)}\big), \tag{26} \]
where $N(x)$ is a continuous, multiplicatively periodic function. Hence, in the limit the tail of $Z_n/\mu^n$ decays faster than exponentially. Fleischmann and Wachtel give a more precise version of this result [FW09, Remark 3], showing that the tail of $W$ decays as
\[ N_2(x)\,x^{-\gamma/(2(\gamma-1))} \exp\big(-x^{\gamma/(\gamma-1)} N(x)\big), \tag{27} \]
where $N(x)$ and $N_2(x)$ are continuous, multiplicatively periodic functions. Biggins and Bingham give a version of their result that applies directly to $Z_n$ rather than its limit, and more detailed results on the right tail of $Z_n$ in this situation can also be obtained from combinatorial results of Flajolet and Odlyzko [FO84, Theorem 1].

Results on the right tail are also available when the child distribution is heavy-tailed. When $Z_1$ satisfies
\[ \sup_x \frac{P[Z_1 > x/2]}{P[Z_1 > x]} < \infty, \]
the tails of $Z_n$ satisfy
\[ c_1\,P[Z_1 > x] \le P[Z_n > \mu^n x] \le c_2\,P[Z_1 > x] \]
for constants $c_1 > 0$ and $c_2 < \infty$ independent of $x$ and $n$ [VDK13, Theorem 1]. This result applies, for instance, when the tail of $Z_1$ has polynomial decay. A similar result [VDK13, Theorem 3] holds when the tail of $Z_1$ behaves like $e^{-x^\alpha}$ for $0 < \alpha < 1$.

For the left tail of $Z_n$, the behavior depends on the weight that the child distribution places on 0 and 1. It is known as the Schröder case when positive weight is placed on those values and as the Böttcher case when it is not. Roughly speaking, the left tail in the Schröder case behaves similarly to the right tail in the heavy-tailed case, while the left tail in the Böttcher case behaves similarly to the right tail in the bounded child distribution case. For example, suppose that the child distribution places no weight on 0 and weight $p_1$ on 1. In the Schröder case, where $p_1 > 0$, let $\alpha = -\log p_1/\log \mu$. Then $P[W \le x]$ behaves like $x^\alpha$ as $x \to 0$ [Dub71]. Note that α = 1 for a geometric child distribution, coinciding with the lower tail bound we prove in Theorem 6. For the Böttcher case, where $p_1 = 0$, let $d \ge 2$ be the minimum value taken by the child distribution, and let $\beta = \log d/\log \mu \in (0,1)$. Then $-\log P[W \le x]$ behaves like $x^{-\beta/(1-\beta)}$. A result like (26) is shown in [BB93, Theorem 3], and finer asymptotics along the lines of (27) are given in [FW09, Theorem 1].

Our results apply best to distributions that are unbounded but have exponential tails, a case that seems poorly covered by the existing literature. Our bound is also more explicit than any we have encountered, with no limits or unspecified constants.

Appendix

In this appendix, we carry out the proof of Lemma 16. In fact, we show that Lemma 16 holds for a slightly generalized version of the urn process described in Section 3. As with that process, start with $b \ge 1$ black balls and $w \ge 1$ white balls, and after each draw add an extra ball with the same color as the ball drawn. Instead of adding an additional black ball after every $l$th draw, we allow black balls to be added at arbitrary but predetermined times. Thus the number of balls in the urn after $n$ draws, denoted by $B_n$, is an arbitrary but deterministic strictly increasing sequence with $B_0 = b + w$. Let $N_n$ be the number of white balls in the urn after $n$ draws. Let $t^{(n)}_k = P[N_n = k]$. The dynamics of the urn process give
\[ t^{(n+1)}_k = \Big(\frac{k-1}{B_n}\Big)\,t^{(n)}_{k-1} + \Big(1 - \frac{k}{B_n}\Big)\,t^{(n)}_k. \tag{28} \]
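The recurrence (28) is convenient for exact computation. The sketch below (ours; the parameters are illustrative) iterates it for the Section 3 schedule (an extra black ball after every $l$th draw) and checks the log-concavity asserted in Lemma 28 below.

import numpy as np

def urn_pmf(n, b, w, l):
    # Iterate (28) with B_m the ball count after m draws; under the Section 3
    # schedule a scheduled black ball arrives after every l-th draw.
    t = np.zeros(w + n + 1)
    t[w] = 1.0
    balls = b + w                                # B_0
    for draw in range(1, n + 1):
        ks = np.arange(len(t))
        new_t = (1 - ks / balls) * t             # a black ball was drawn
        new_t[1:] += (ks[:-1] / balls) * t[:-1]  # a white ball was drawn
        balls += 1                               # the ball added after the draw
        if draw % l == 0:
            balls += 1                           # the scheduled extra black ball
        t = new_t
    return t

t = urn_pmf(n=30, b=1, w=2, l=1)
print(all(t[k]**2 >= t[k-1] * t[k+1] - 1e-15 for k in range(1, len(t) - 1)))  # True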

First, we show that $t^{(n)}_k$ is log-concave in $k$ for each fixed $n$:

Lemma 28. For all $n \ge 0$ and all $k$,
\[ \big(t^{(n)}_k\big)^2 \ge t^{(n)}_{k-1}\,t^{(n)}_{k+1}. \tag{29} \]

Proof. We prove this by induction. For the base case, we have $t^{(0)}_k = \mathbf{1}\{k = w\}$, and hence the right-hand side of (29) is always zero when $n = 0$. Now, we expand $\big(t^{(n+1)}_k\big)^2 - t^{(n+1)}_{k-1}\,t^{(n+1)}_{k+1}$ using (28) as $(A_1 + A_2 + A_3)/B_n^2$ for
\[ A_1 = (k-1)^2\,t_{k-1}^2 - (k-2)k\,t_{k-2}t_k, \tag{30} \]
\[ A_2 = (B_n - k)^2\,t_k^2 - (B_n - k + 1)(B_n - k - 1)\,t_{k-1}t_{k+1}, \tag{31} \]
\[ A_3 = 2(k-1)(B_n - k)\,t_{k-1}t_k - (k-2)(B_n - k - 1)\,t_{k-2}t_{k+1} - k(B_n - k + 1)\,t_{k-1}t_k, \tag{32} \]
where we have simplified notation by writing $t_k$ for $t^{(n)}_k$. Applying the inductive hypothesis to (30) and (31) gives
\[ A_1 \ge \big((k-1)^2 - (k-2)k\big)\,t_{k-1}^2 = t_{k-1}^2, \]
and
\[ A_2 \ge \big((B_n - k)^2 - (B_n - k + 1)(B_n - k - 1)\big)\,t_k^2 = t_k^2. \]
To bound $A_3$, we note that the inductive hypothesis implies $t_{k-2}t_{k+1} \le t_{k-1}t_k$, which together with (32) gives
\[ A_3 \ge \big(2(k-1)(B_n - k) - (k-2)(B_n - k - 1) - k(B_n - k + 1)\big)\,t_{k-1}t_k = -2\,t_{k-1}t_k. \]
Hence
\[ A_1 + A_2 + A_3 \ge t_{k-1}^2 + t_k^2 - 2\,t_{k-1}t_k = (t_{k-1} - t_k)^2 \ge 0, \]
thus extending the induction.

Next, we establish a variant of log-concavity with a similar but more complicated proof.

Lemma 29. For all $n \ge 0$ and all $k$,
\[ (B_n - k)\big(t^{(n)}_k\big)^2 - (B_n - k - 1)\,t^{(n)}_{k-1}t^{(n)}_{k+1} - t^{(n)}_{k-1}t^{(n)}_k \ge 0. \tag{33} \]

Proof. We proceed by induction. Let $E^{(n)}_k$ be the left-hand side of (33). Since $t^{(0)}_k = \mathbf{1}\{k = w\}$ and $B_0 = w + b$, we have $E^{(0)}_k = b\,\mathbf{1}\{k = w\}$, demonstrating (33) when $n = 0$. Now we assume $E^{(n)}_k \ge 0$ for all $k$, and we show $E^{(n+1)}_k \ge 0$ for all $k$. It suffices to prove $E^{(n+1)}_k \ge 0$ under the assumption that $B_{n+1} = B_n + 1$, because $B_{n+1}$ is at least this large, and we can see that $E^{(n+1)}_k$ is increasing in $B_{n+1}$ by writing it as
\[ E^{(n+1)}_k = (B_{n+1} - k - 1)\Big[\big(t^{(n+1)}_k\big)^2 - t^{(n+1)}_{k-1}t^{(n+1)}_{k+1}\Big] + \big(t^{(n+1)}_k\big)^2 - t^{(n+1)}_{k-1}t^{(n+1)}_k \]
and applying Lemma 28.

For the sake of readability, we write $t_k$ for $t^{(n)}_k$ and $B$ for $B_n$ in this proof. We apply (28) to obtain
\[
\big(t^{(n+1)}_k\big)^2 = \frac{1}{B^2}\Big((k-1)^2\,t_{k-1}^2 + 2(k-1)(B-k)\,t_{k-1}t_k + (B-k)^2\,t_k^2\Big),
\]
\[
t^{(n+1)}_{k-1}t^{(n+1)}_{k+1} = \frac{1}{B^2}\Big((k-2)k\,t_{k-2}t_k + (k-2)(B-k-1)\,t_{k-2}t_{k+1} + k(B-k+1)\,t_{k-1}t_k + (B-k-1)(B-k+1)\,t_{k-1}t_{k+1}\Big),
\]
\[
t^{(n+1)}_{k-1}t^{(n+1)}_k = \frac{1}{B^2}\Big((k-2)(k-1)\,t_{k-2}t_{k-1} + (k-2)(B-k)\,t_{k-2}t_k + (k-1)(B-k+1)\,t_{k-1}^2 + (B-k)(B-k+1)\,t_{k-1}t_k\Big).
\]

Now, under the assumption that $B_{n+1} = B + 1$, we expand $E^{(n+1)}_k$ as $(A_1 + A_2 + A_3)/B^2$, where
\[
A_1 = (k-2)\Big((B-k+1)(k-1)\,t_{k-1}^2 - (k+1)(B-k)\,t_{k-2}t_k - (k-1)\,t_{k-2}t_{k-1}\Big)
= (k-2)\Big((k-1)E^{(n)}_{k-1} - 2(B-k)\,t_{k-2}t_k\Big)
\ge -2(k-2)(B-k)\,t_{k-2}t_k,
\]
and
\[
A_2 = (B-k+1)(B-k)\Big((B-k)\,t_k^2 - (B-k-1)\,t_{k-1}t_{k+1}\Big)
= (B-k+1)(B-k)\Big(E^{(n)}_k + t_{k-1}t_k\Big)
\ge (B-k+1)(B-k)\,t_{k-1}t_k,
\]
and
\[
A_3 = (B-k+1)(B-k)(k-3)\,t_{k-1}t_k - (k-2)(B-k)(B-k-1)\,t_{k-2}t_{k+1}
= (B-k)\Big((k-2)\big[(B-k+1)\,t_{k-1}t_k - (B-k-1)\,t_{k-2}t_{k+1}\big] - (B-k+1)\,t_{k-1}t_k\Big)
= (B-k)\bigg((k-2)\Big[\frac{E^{(n)}_k t_{k-2} + E^{(n)}_{k-1} t_k}{t_{k-1}} + 2\,t_{k-2}t_k\Big] - (B-k+1)\,t_{k-1}t_k\bigg)
\ge (B-k)\Big(2(k-2)\,t_{k-2}t_k - (B-k+1)\,t_{k-1}t_k\Big),
\]
where we have applied the inductive hypothesis in each final step. Combining these bounds,
\[
E^{(n+1)}_k \ge \frac{B-k}{B^2}\Big(-2(k-2)\,t_{k-2}t_k + (B-k+1)\,t_{k-1}t_k + 2(k-2)\,t_{k-2}t_k - (B-k+1)\,t_{k-1}t_k\Big) = 0.
\]

Proof of Lemma 16. First, we dispense with the case that any of $t^{(n)}_{k-1}$, $t^{(n+1)}_{k+1}$, $t^{(n)}_k$, or $t^{(n+1)}_k$ are equal to zero. If either of $t^{(n)}_{k-1}$ or $t^{(n+1)}_{k+1}$ equals zero, then the left-hand side of (19) is zero and the inequality holds. If $t^{(n)}_k$ or $t^{(n+1)}_k$ equals $0$ then $t^{(n+1)}_{k+1} = 0$, since the support of $N_n$ is $\{w, \ldots, w+n\}$; in this case both sides of (19) are zero. Thus we assume from now on that these four terms are all nonzero.

Now, proving the lemma is equivalent to showing $t^{(n+1)}_k/t^{(n+1)}_{k+1} - t^{(n)}_{k-1}/t^{(n)}_k \ge 0$. We compute
\[
\frac{t^{(n+1)}_k}{t^{(n+1)}_{k+1}} - \frac{t^{(n)}_{k-1}}{t^{(n)}_k}
= \frac{1}{t^{(n+1)}_{k+1}}\bigg(\Big(\frac{k-1}{B_n}\Big)t^{(n)}_{k-1} + \Big(1-\frac{k}{B_n}\Big)t^{(n)}_k - \frac{t^{(n)}_{k-1}}{t^{(n)}_k}\Big[\Big(\frac{k}{B_n}\Big)t^{(n)}_k + \Big(1-\frac{k+1}{B_n}\Big)t^{(n)}_{k+1}\Big]\bigg)
= \frac{1}{B_n\,t^{(n)}_k t^{(n+1)}_{k+1}}\Big(-t^{(n)}_{k-1}t^{(n)}_k + (B_n-k)\big(t^{(n)}_k\big)^2 - (B_n-k-1)\,t^{(n)}_{k-1}t^{(n)}_{k+1}\Big),
\]
which is nonnegative by Lemma 29.

Remark 30. It is possible to avoid all the work of this appendix, at the cost of a slightly inferior concentration bound for $N_n(1,w)$. The result of this appendix (Lemma 16) is used to prove that $N^{[l+1]}_{n-l}(b,w) + l \succeq N^{(l+1)}_n(b,w)$ (Lemma 17), which is then applied in the proof of Proposition 18. An alternate path is to invoke the following stochastic inequality between the factorial and power bias transformations, which holds for any nonnegative random variable:
\[ X^{(l+1)} \preceq_p X^{[l+1]}, \tag{34} \]
where
\[ p = \frac{E X^{l+1}}{E\big[X(X+1)\cdots(X+l)\big]}. \]
Modifying the derivation in Proposition 18 slightly, we get
\[ N_n(1,w) \stackrel{d}{=} Q_w(N_{n-l}(1, w+1+l) - w - 1) \succeq Q_w(N_n(1, w+1+l) - l - w - 1), \]
with the second line holding since at most $l$ white balls can be added from steps $n-l$ to $n$. Then following the same steps as in Proposition 18,
\[ N_n(1,w) \succeq Q_w\big(N^{[l+1]}_n(1,w) - w\big) \succeq V_w N^{[l+1]}_n(1,w). \]
Finally, invoking (34), we have
\[ N^*_n(1,w) \stackrel{d}{=} V_w N^{(l+1)}_n(1,w) \preceq_p N_n(1,w). \]
The concentration bounds obtained from this are worse because of the factor of $p$ in the exponent, but it does illustrate how the $p < 1$ versions of our concentration bounds can be used.


Acknowledgments

The initial portion of this work was conducted at the meeting Stein's method and applications in high-dimensional statistics held at the American Institute of Mathematics in August 2018. We would also like to express our gratitude to John Fry and staff Estelle Basor, Brian Conrey, and Harpreet Kaur at the American Institute of Mathematics for their generosity and excellent hospitality in hosting this meeting at the Fry's Electronics corporate headquarters in San Jose, CA, and to Jay Bartroff, Larry Goldstein, Stanislav Minsker and Gesine Reinert for organizing such a stimulating meeting. T.J. received support from NSF grant DMS-1811952 and PSC-CUNY Award #62628-00 50.

References

[AGK19] Richard Arratia, Larry Goldstein, and Fred Kochman, Size bias for one and all, Probab. Surv. 16(2019), 1–61. MR 3896143

[BA99] Albert-Laszlo Barabasi and Reka Albert, Emergence of scaling in random networks, Science 286

(1999), no. 5439, 509–512. MR 2091634[BB93] J. D. Biggins and N. H. Bingham, Large deviations in the supercritical branching process, Adv. in

Appl. Probab. 25 (1993), no. 4, 757–772. MR 1241927[BD74] N. H. Bingham and R. A. Doney, Asymptotic properties of supercritical branching processes. I.

The Galton-Watson process, Advances in Appl. Probability 6 (1974), 711–731. MR 362525[Bro06] Mark Brown, Exploiting the waiting time paradox: applications of the size-biasing transformation,

Probab. Engrg. Inform. Sci. 20 (2006), no. 2, 195–230. MR 2261286[Bro13] , Sharp bounds for NBUE distributions, Ann. Oper. Res. 208 (2013), 245–250. MR 3100632[CGJ18] Nicholas Cook, Larry Goldstein, and Tobias Johnson, Size biased couplings and the spectral gap

for random regular graphs, Ann. Probab. 46 (2018), no. 1, 72–125. MR 3758727[DJ18] Fraser Daly and Oliver Johnson, Relaxation of monotone coupling conditions: Poisson approxi-

mation and beyond, J. Appl. Probab. 55 (2018), no. 3, 742–759. MR 3877881[DM82] A. De Meyer, On a theorem of Bingham and Doney, J. Appl. Probab. 19 (1982), no. 1, 217–220.

MR 644434[Dub71] M. Serge Dubuc, La densite de la loi-limite d’un processus en cascade expansif, Z. Wahrschein-

lichkeitstheorie und Verw. Gebiete 19 (1971), 281–290. MR 300353

[FO84] P. Flajolet and A. M. Odlyzko, Limit distributions for coefficients of iterates of polynomials with

applications to combinatorial enumerations, Math. Proc. Cambridge Philos. Soc. 96 (1984), no. 2,237–253. MR 757658

[FW09] Klaus Fleischmann and Vitali Wachtel, On the left tail asymptotics for the limit law of supercritical

Galton-Watson processes in the Bottcher case, Ann. Inst. Henri Poincare Probab. Stat. 45 (2009),no. 1, 201–225. MR 2500235

[GG11a] Subhankar Ghosh and Larry Goldstein, Applications of size biased couplings for concentration of

measures, Electron. Commun. Probab. 16 (2011), 70–83. MR 2763529[GG11b] , Concentration of measures via size-biased couplings, Probab. Theory Related Fields 149

(2011), no. 1-2, 271–278. MR 2773032[GR97] Larry Goldstein and Gesine Reinert, Stein’s method and the zero bias transformation with appli-

cation to simple random sampling, Ann. Appl. Probab. 7 (1997), no. 4, 935–952. MR 1484792[Har48] T. E. Harris, Branching processes, Ann. Math. Statistics 19 (1948), 474–494. MR 27465[PR11a] Erol Pekoz and Adrian Rollin, Exponential approximation for the nearly critical Galton-Watson

process and occupation times of Markov chains, Electron. J. Probab. 16 (2011), no. 51, 1381–1393.MR 2827464

[PR11b] Erol A. Pekoz and Adrian Rollin, New rates for exponential approximation and the theorems of

Renyi and Yaglom, Ann. Probab. 39 (2011), no. 2, 587–608. MR 2789507[PRR16] Erol A. Pekoz, Adrian Rollin, and Nathan Ross, Generalized gamma approximation with rates for

urns, walks and trees, Ann. Probab. 44 (2016), no. 3, 1776–1816. MR 3502594[Ros11] Nathan Ross, Fundamentals of Stein’s method, Probab. Surv. 8 (2011), 210–293. MR 2861132[SS07] Moshe Shaked and J. George Shanthikumar, Stochastic orders, Springer Series in Statistics,

Springer, New York, 2007. MR 2265633 (2008g:60005)

Page 26: TOBIAS JOHNSON AND EROL PEKOZ¨ arXiv:2108.02101v1 …

26 TOBIAS JOHNSON AND EROL PEKOZ

[Str65] V. Strassen, The existence of probability measures with given marginals, Ann. Math. Statist. 36(1965), 423–439. MR 177430

[VDK13] V. I. Vakhtel’, D. E. Denisov, and D. A. Korshunov, On the asymptotics of the tail of distribution

of a supercritical Galton-Watson process in the case of heavy tails, Tr. Mat. Inst. Steklova 282

(2013), no. Vetvyashchiesya Protsessy, Sluchaınye Bluzhdaniya, i Smezhnye Voprosy, 288–314,English version published in Proc. Steklov Inst. Math. 282 (2013), no. 1, 273–297. MR 3308596

College of Staten Island

Email address: [email protected]

Boston University

Email address: [email protected]