
New Lower Bounds on the Total Variation Distance and Relative Entropy for the Poisson Approximation

Igal Sason
Department of Electrical Engineering
Technion - Israel Institute of Technology
Haifa 32000, Israel

ITA 2013, San Diego, CA, USA, February 2013.

Problem Statement

Let X_1, ..., X_n be n independent Bernoulli random variables with

p_i ≜ P(X_i = 1) = 1 − P(X_i = 0) > 0.

Let

W ≜ ∑_{i=1}^n X_i,  λ ≜ E(W) = ∑_{i=1}^n p_i,

where it is assumed that λ ∈ (0, ∞).

Let Po(λ) denote the Poisson distribution with parameter λ.

Problem: Obtain tight bounds on the total variation distance and the relative entropy between the distribution of W and Po(λ).

In this talk: New (improved) lower bounds on the total variation distance and relative entropy are derived, and their possible use is outlined.

Related Work

1. A. D. Barbour and P. Hall, "On the rate of Poisson convergence," Math. Proc. of the Cambridge Philosophical Society, 1984.

2. Barbour et al., "Compound Poisson approximation via information functionals," EJP, August 2010.

3. Kontoyiannis et al., "Entropy and the law of small numbers," IEEE Trans. on IT, Feb. 2005.

4. Harremoes and Ruzankin, "Rate of convergence to Poisson law in terms of information divergence," IEEE Trans. on IT, Sept. 2004.

5. P. Harremoes and C. Vignat, "Lower bounds on information divergence," Feb. 2011 (http://arxiv.org/pdf/1102.2536.pdf).

6. O. Johnson, Information Theory and the Central Limit Theorem, Imperial College Press, 2004.

Total Variation Distance

Let P and Q be two probability measures defined on a set X.

The total variation distance between P and Q is defined by

dTV(P, Q) ≜ sup_{Borel A ⊆ X} |P(A) − Q(A)|,

where the supremum is taken over all the Borel subsets A of X.

If X is a countable set, then this simplifies to

dTV(P, Q) = (1/2) ∑_{x ∈ X} |P(x) − Q(x)| = ||P − Q||_1 / 2.

⇒ The total variation distance is equal to one-half of the L1-distance between the two probability distributions.
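Since W is supported on {0, ..., n}, its law (the Poisson-binomial distribution) can be computed exactly by convolving the Bernoulli factors, and the half-L1 formula above can then be evaluated directly. A minimal Python sketch (function names and the choice p_i = 0.1 are illustrative, not from the talk):

```python
import math

def poisson_binomial_pmf(ps):
    # Exact PMF of W = X_1 + ... + X_n for independent Bernoulli(p_i),
    # built by convolving one Bernoulli factor at a time.
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)
            nxt[k + 1] += q * p
        pmf = nxt
    return pmf

def poisson_pmf_table(lam, kmax):
    # Po(lam) probabilities for k = 0, ..., kmax - 1 (iterative recursion
    # p_{k} = p_{k-1} * lam / k avoids factorial overflow).
    table = [math.exp(-lam)]
    for k in range(1, kmax):
        table.append(table[-1] * lam / k)
    return table

def tv_distance_to_poisson(ps, tail=200):
    # d_TV(P_W, Po(lam)) = (1/2) * sum_k |P_W(k) - Po(lam)(k)|,
    # including the Poisson mass beyond the support {0, ..., n} of W.
    lam = sum(ps)
    pw = poisson_binomial_pmf(ps)
    po = poisson_pmf_table(lam, len(pw) + tail)
    half_l1 = sum(abs(pw[k] - po[k]) for k in range(len(pw)))
    half_l1 += sum(po[len(pw):])
    return 0.5 * half_l1

print(tv_distance_to_poisson([0.1] * 20))
```

The truncation parameter only accounts for the Poisson tail beyond the support of W; it can be taken moderately large since that tail decays super-exponentially.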

Theorem 1 - Barbour and Hall, 1984

Let W = ∑_{i=1}^n X_i be a sum of n independent Bernoulli random variables with E(X_i) = p_i for i ∈ {1, . . . , n}, and E(W) = λ. Then, the total variation distance between the probability distribution of W and the Poisson distribution with mean λ satisfies

(1/32) (1 ∧ 1/λ) ∑_{i=1}^n p_i² ≤ dTV(PW, Po(λ)) ≤ ((1 − e^{−λ})/λ) ∑_{i=1}^n p_i²

where a ∧ b ≜ min{a, b} for every a, b ∈ R.

The derivation of the upper and lower bounds is based on the Chen-Stein method for Poisson approximation.
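The two constants in Theorem 1 are elementary to evaluate; a small sketch (the function name is mine) computing both sides of the sandwich for a given vector of p_i:

```python
import math

def barbour_hall_bounds(ps):
    # Theorem 1 (Barbour-Hall, 1984):
    #   (1/32) * min(1, 1/lam) * sum p_i^2
    #     <= d_TV(P_W, Po(lam)) <=
    #   ((1 - e^{-lam}) / lam) * sum p_i^2
    lam = sum(ps)
    s2 = sum(p * p for p in ps)
    lower = (1.0 / 32.0) * min(1.0, 1.0 / lam) * s2
    upper = ((1.0 - math.exp(-lam)) / lam) * s2
    return lower, upper

lo, hi = barbour_hall_bounds([0.1] * 20)   # lam = 2, sum p_i^2 = 0.2
print(lo, hi)
```

For λ ≥ 1 the ratio between the upper and lower constants is 32(1 − e^{−λ}), i.e., roughly a factor of 30; narrowing this gap is the motivation for the improved lower bounds that follow.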

Chen-Stein Method

The Chen-Stein method forms a powerful probabilistic tool to calculate error bounds for the Poisson approximation (Chen 1975).

Key Idea:

Z ∼ Po(λ) with λ ∈ (0, ∞) if and only if

λ E[f(Z + 1)] − E[Z f(Z)] = 0

for all bounded functions f that are defined on N0 ≜ {0, 1, . . .}.

The idea behind this method is to treat analytically (via bounds) the functional

Af(W) ≜ λ E[f(W + 1)] − E[W f(W)].

⇒ This leads to rigorous bounds on dTV(W, Z) as a function of the choice of the bounded function f : N0 → R.
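The characterizing identity can be checked numerically: for Z ∼ Po(λ) and any bounded f, the functional below vanishes up to truncation error. A short sketch, with an arbitrary bounded test function (the function name and parameters are illustrative):

```python
import math

def stein_functional(f, lam, kmax=120):
    # A_f = lam * E[f(Z + 1)] - E[Z f(Z)] for Z ~ Po(lam),
    # with the expectations truncated at k = kmax - 1.
    pmf = [math.exp(-lam)]          # Po(lam) PMF, built iteratively
    for k in range(1, kmax):
        pmf.append(pmf[-1] * lam / k)
    term1 = sum(lam * f(k + 1) * pmf[k] for k in range(kmax))
    term2 = sum(k * f(k) * pmf[k] for k in range(kmax))
    return term1 - term2

# For any bounded f, the functional vanishes under the Poisson law:
print(stein_functional(lambda k: math.sin(k), 3.0))
```

The identity holds term by term: E[Z f(Z)] = λ E[f(Z + 1)] follows from the Poisson recursion P(Z = k) = (λ/k) P(Z = k − 1), which is exactly why a non-zero value of Af(W) measures the deviation of W from Po(λ).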

Chen-Stein Method (2)

The proof of Theorem 1 (Barbour and Hall, 1984) relies on the choice

f(k) ≜ (k − λ) exp(−(k − λ)²/λ), ∀ k ∈ N0.

The new (and rather surprising) result is that the lower bound on the total variation distance is much improved by generalizing f to

f(k) ≜ (k − α1) exp(−(k − α2)²/(θλ)), ∀ k ∈ N0

where α1, α2 ∈ R and θ ∈ R+ are optimized to get the tightest lower bound on dTV(W, Z). However, this complicates the analysis (Th. 2).

Special case: Setting α1 = α2 ≜ λ and optimizing θ leads to a simplified lower bound (Th. 3) that achieves almost the same improvement for λ ≥ 10.

Theorem 2 - Improved Lower Bound on the Total Variation Distance

In the setting of Theorem 1, the total variation distance between the probability distribution of W and Po(λ) satisfies

K1(λ) ∑_{i=1}^n p_i² ≤ dTV(PW, Po(λ)) ≤ ((1 − e^{−λ})/λ) ∑_{i=1}^n p_i²

where

K1(λ) ≜ sup { (1 − h_λ(α1, α2, θ)) / (2 g_λ(α1, α2, θ)) : α1, α2 ∈ R, α2 ≤ λ + 3/2, θ > 0 }

for some functions h_λ, g_λ that are given in Th. 2 of the conference paper.

Theorem 3 - Simple Lower Bound on the Total Variation Distance

Under the assumptions in Theorem 1, the following inequality holds:

K1(λ) ∑_{i=1}^n p_i² ≤ dTV(PW, Po(λ)) ≤ ((1 − e^{−λ})/λ) ∑_{i=1}^n p_i²

where

K1(λ) ≜ (1 − (1/θ)(3 + 7/λ)) / (e(θ + 2e^{−1/2})),

θ ≜ 3 + 7/λ + (1/λ)·√((3λ + 7)[(3 + 2e^{−1/2})λ + 7]).

Theorem 4 - Improved Lower Bound on the Relative Entropy

In the setting of Th. 1, the divergence between the probability distribution of W and the Poisson distribution with mean λ = E(W) satisfies

K2(λ) (∑_{i=1}^n p_i²)² ≤ D(PW || Po(λ)) ≤ (1/λ) ∑_{i=1}^n p_i³/(1 − p_i)

where

K2(λ) ≜ m(λ) (K1(λ))²,

m(λ) ≜ (1/(2e^{−λ} − 1)) · log(1/(e^{λ} − 1))  if λ ∈ (0, log 2),
m(λ) ≜ 2  if λ ≥ log 2.
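Assuming the piecewise form of m(λ) displayed above (with natural logarithms), the coefficient is easy to evaluate; note that it exceeds the classical Pinsker constant 2 for λ < log 2, which is where the distribution-dependent refinement helps:

```python
import math

def m(lam):
    # Coefficient m(lambda) in Theorem 4 (natural logarithm).
    # For lam < log 2 the refinement of Pinsker's inequality yields a
    # coefficient strictly larger than the classical value 2.
    if 0.0 < lam < math.log(2.0):
        return (1.0 / (2.0 * math.exp(-lam) - 1.0)) * math.log(1.0 / (math.exp(lam) - 1.0))
    return 2.0

print(m(0.1), m(0.5), m(1.0))
```

As λ increases to log 2 the piecewise expression approaches 2 continuously, so the bound never falls below what classical Pinsker would give.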

Theorem 4 - Proof Outline

The lower bound on the relative entropy is based on the new lower bound on the total variation distance and the distribution-dependent refinement of Pinsker's inequality (Ordentlich & Weinberger, IEEE Trans. on IT, 2005).

The upper bound on the relative entropy was derived by Kontoyiannis et al. (IEEE Trans. on IT, 2005).

Corollary

If {X_i} are i.i.d. binary RVs, W ≜ ∑_{i=1}^n X_i, and Z ∼ Po(λ) with fixed λ = ∑_{i=1}^n p_i, then

dTV(W, Z) = O(1/n),  D(PW || Po(λ)) = O(1/n²).
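The rates in the corollary can be observed numerically in the i.i.d. case W ∼ Bin(n, λ/n). The sketch below (the parameters n = 20, 40 and λ = 1 are illustrative, not from the talk) computes dTV and the divergence exactly and exhibits the expected halving/quartering when n doubles:

```python
import math

def binomial_pmf(n, p):
    # Exact PMF of Bin(n, p) by repeated convolution.
    pmf = [1.0]
    for _ in range(n):
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)
            nxt[k + 1] += q * p
        pmf = nxt
    return pmf

def tv_and_divergence(n, lam):
    # d_TV(P_W, Po(lam)) and D(P_W || Po(lam)) for W ~ Bin(n, lam/n).
    pw = binomial_pmf(n, lam / n)
    po = [math.exp(-lam)]                 # Po(lam) PMF, iterative
    for k in range(1, n + 200):
        po.append(po[-1] * lam / k)
    tv = 0.5 * (sum(abs(pw[k] - po[k]) for k in range(n + 1)) + sum(po[n + 1:]))
    # Terms with P_W(k) = 0 contribute nothing to the divergence.
    kl = sum(pw[k] * math.log(pw[k] / po[k]) for k in range(n + 1) if pw[k] > 0.0)
    return tv, kl

tv1, kl1 = tv_and_divergence(20, 1.0)
tv2, kl2 = tv_and_divergence(40, 1.0)
# Doubling n roughly halves d_TV and quarters the divergence,
# matching the O(1/n) and O(1/n^2) rates in the corollary.
print(tv1 / tv2, kl1 / kl2)
```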

Improved Lower Bound on the Relative Entropy (2)

The combination of the original lower bound on dTV(W, Z) (see Theorem 1) with Pinsker's inequality gives:

D(PW || Po(λ)) ≥ (1/512) (1 ∧ 1/λ²) (∑_{i=1}^n p_i²)².

The improvement of the new lower bound on the relative entropy is by a factor of 179.7 log(1/λ) for λ ≈ 0, by a factor of 9.22 for λ → ∞, and at least by a factor of 6.14 for all λ ∈ (0, ∞).

Personal Communications with P. Harremoes

There exists another recent lower bound on the relative entropy (a work in preparation by Kontoyiannis et al.). The bounds are derived by different approaches.

The two lower bounds on the relative entropy scale like (∑_{i=1}^n p_i²)², but with different scaling factors.

Possible Uses of the New Bounds

The paper (whose shortened version is submitted to ISIT 2013):

I. Sason, "Entropy bounds for discrete random variables via coupling," submitted to the IEEE Trans. on Information Theory, September 2012. [Online]. Available: http://arxiv.org/abs/1209.5259

introduces new entropy bounds for discrete random variables via maximal coupling, providing bounds on the difference between the entropies of two discrete random variables in terms of the local and total variation distances between their probability mass functions.

The new lower bound on the total variation distance dTV(W, Z) was involved in a rigorous estimation of the entropy of W (via bounds).

Application: Getting numerical bounds on the sum-rate capacity of a noiseless K-user multiple-access channel with binary inputs subject to a constraint on the total power.

Possible Uses of the New Bounds (2)

The use of the new lower bound on the relative entropy for the Poisson approximation of a sum of Bernoulli random variables is exemplified in the full paper version of this work (see Section 4.E) in the context of a problem in binary hypothesis testing.

The upper bound on the error probability depends exponentially on the relative entropy between the two probability distributions.

The improvement of the lower bound on the relative entropy ⇒ a remarkable improvement in the minimal block length that ensures a fixed error probability (numerical results appear in the full paper).

Conclusions

New lower bounds on the total variation distance between the distribution of a sum of independent Bernoulli random variables and the Poisson random variable (with the same mean) were introduced via the Chen-Stein method.

Corresponding lower bounds on the relative entropy were introduced, based on the lower bounds on the total variation distance and an existing distribution-dependent refinement of Pinsker's inequality.

Two uses of these bounds were outlined.

The full paper version is available at http://arxiv.org/abs/1206.6811.

The conference paper is available at http://arxiv.org/abs/1301.7504.