
Page 1:

Mutual Information as a Function of Matrix SNR for Linear Gaussian Channels

Galen Reeves (Duke)
Henry Pfister (Duke)

Alex Dytso (Princeton)

ISIT, June 2018

Page 2:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 3:

Interpolation with a Gaussian

P(x) ∗ φ_t(x)  ⟺  X + √t N

- Heat flow, Gaussian channel, Ornstein–Uhlenbeck channel
- Functional properties of entropy and Fisher information:
  De Bruijn's identity, I-MMSE relation [Guo et al. 2005]
- Simple proofs:
  - Entropy power inequality [Rioul 2011]
  - Gaussian logarithmic Sobolev inequality and Gaussian
    hypercontractivity [Raginsky & Sason 2014]
- Connections with the replica method and free probability theory:
  additivity of information in multilayer networks via additive
  Gaussian noise transforms [Reeves, Allerton 2017]
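As a concrete check of the correspondence at the top of this slide, smoothing a density with the Gaussian kernel φ_t matches the histogram of X + √t N. A minimal NumPy/SciPy sketch, assuming a two-point prior for X (all names illustrative):

```python
import numpy as np
from scipy.stats import norm

t = 0.5
grid = np.linspace(-6, 6, 2001)

# Prior P(x): equiprobable two-point mass at x = -1 and x = +1.
atoms, weights = np.array([-1.0, 1.0]), np.array([0.5, 0.5])

# Left side: P(x) * phi_t(x), the prior smoothed by a Gaussian kernel.
density_conv = sum(w * norm.pdf(grid, loc=a, scale=np.sqrt(t))
                   for a, w in zip(atoms, weights))

# Right side: empirical density of X + sqrt(t) * N with N ~ N(0, 1).
rng = np.random.default_rng(0)
x = rng.choice(atoms, size=500_000, p=weights)
samples = x + np.sqrt(t) * rng.standard_normal(x.size)
hist, edges = np.histogram(samples, bins=200, range=(-6, 6), density=True)

centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(np.interp(centers, grid, density_conv) - hist)))  # small
```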

Page 4:

Gaussian Channel

Y = √s X + N

- Random vector X = (X_1, ..., X_n)
- Additive Gaussian noise N ∼ N(0, I)
- Scalar SNR parameter s

Page 5:

Linear Gaussian Channel

Y = AX + N

X ∼ P(x) ──▶ [ A ] ──▶ [ AWGN ] ──▶ Y

- Random vector X = (X_1, ..., X_n)
- Additive Gaussian noise N ∼ N(0, I)
- k × n channel matrix A

Page 6:

Linear Gaussian Channel

[Diagram: Y = AX + N drawn twice, once with a short, fat A (k < n) and once with a tall, skinny A (k > n).]

Page 7:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 8:

Parallel combination of channels

Given scalars a and b, define

Y = aX + N,    Z = bX + N',

where N and N' are independent standard Gaussian vectors.

Is the following statement true?

I(X; Y, Z) = I(X; (a^2 + b^2)^{1/2} X + N'')

YES!

Page 9:

Parallel combination of channels

Given matrices A and B, define

Y = AX + N,    Z = BX + N',

where N and N' are independent standard Gaussian vectors.

Is the following statement true?

I(X; Y, Z) = I(X; (A^T A + B^T B)^{1/2} X + N'')

YES!
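For a Gaussian X the answer can be confirmed in closed form: the joint observation gives ½ log det(I + [A; B] Σ [A; B]^T), the combined channel gives ½ log det(I + S^{1/2} Σ S^{1/2}) with S = A^T A + B^T B, and Sylvester's determinant identity shows the two agree. A minimal numerical sketch (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, ka, kb = 4, 3, 5
A = rng.normal(size=(ka, n))
B = rng.normal(size=(kb, n))

G = rng.normal(size=(n, n))
Sigma = G @ G.T + np.eye(n)        # covariance of a Gaussian X

def gaussian_mi(C, Sigma):
    """I(X; CX + N) = 0.5 * logdet(I + C Sigma C^T) for X ~ N(0, Sigma)."""
    k = C.shape[0]
    return 0.5 * np.linalg.slogdet(np.eye(k) + C @ Sigma @ C.T)[1]

stacked = np.vstack([A, B])        # observe Y and Z jointly
S = A.T @ A + B.T @ B              # combined matrix SNR

w, V = np.linalg.eigh(S)           # S^{1/2} via eigendecomposition (S is PSD)
S_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

print(gaussian_mi(stacked, Sigma), gaussian_mi(S_half, Sigma))  # equal
```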

Page 10:

Basic fact I

A linear Gaussian channel with matrix A,

X ∼ P(x) ──▶ [ A ] ──▶ [ AWGN ] ──▶ Y

is statistically equivalent to a channel with the square matrix (A^T A)^{1/2}:

X ∼ P(x) ──▶ [ (A^T A)^{1/2} ] ──▶ [ AWGN ] ──▶ Y'

Page 11:

Proof of basic fact I (short fat case)

Consider the singular value decomposition A = U D V^T.

    Y  = A X + N
    Y' = (A^T A)^{1/2} X + N'

Page 12:

Proof of basic fact I (short fat case)

Consider the singular value decomposition A = U D V^T.

    Y  = U D V^T X + N
    Y' = V (D^T D)^{1/2} V^T X + N'

Page 13:

Proof of basic fact I (short fat case)

Apply orthogonal transformations.

    Y  = U D V^T X + N
    Y' = V (D^T D)^{1/2} V^T X + N'

Page 14:

Proof of basic fact I (short fat case)

Apply orthogonal transformations.

    U^T Y  = D V^T X + U^T N
    V^T Y' = (D^T D)^{1/2} V^T X + V^T N'

Page 15:

Proof of basic fact I (skinny case)

Note the Gaussian noise is orthogonally invariant.

    U^T Y  = D V^T X + U^T N
    V^T Y' = (D^T D)^{1/2} V^T X + V^T N'

Page 16:

Proof of basic fact I (skinny case)

Note the Gaussian noise is orthogonally invariant.

    U^T Y  = D V^T X + N
    V^T Y' = (D^T D)^{1/2} V^T X + N'
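As a numerical confirmation of the argument, one can compare the mutual information of the two channels for a Gaussian X, computing (A^T A)^{1/2} directly from the SVD as V (D^T D)^{1/2} V^T. A sketch (illustrative names):

```python
import numpy as np

rng = np.random.default_rng(11)
k, n = 3, 5                                   # short, fat A
A = rng.normal(size=(k, n))
G = rng.normal(size=(n, n))
Sigma = G @ G.T + np.eye(n)                   # covariance of a Gaussian X

# (A^T A)^{1/2} from the SVD A = U D V^T: equal to V (D^T D)^{1/2} V^T.
U, d, Vt = np.linalg.svd(A, full_matrices=False)
root = Vt.T @ np.diag(d) @ Vt

# For Gaussian X, I(X; CX + N) = 0.5 * logdet(I + C Sigma C^T).
mi = lambda C: 0.5 * np.linalg.slogdet(np.eye(C.shape[0]) + C @ Sigma @ C.T)[1]
print(mi(A), mi(root))                        # identical
```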

Page 17:

Basic fact II

The parallel combination of linear Gaussian channels,

X ∼ P(x) ──▶ [ A ] ──▶ [ AWGN ] ──▶ Y
         └─▶ [ B ] ──▶ [ AWGN ] ──▶ Z

is equivalent to a single channel:

X ∼ P(x) ──▶ [ (A^T A + B^T B)^{1/2} ] ──▶ [ AWGN ] ──▶ Y'

Page 18:

Proof of basic fact II

Consider the concatenation of the vector channels:

    [Y; Z] = [A; B] X + [N; N']

By basic fact I, this channel is equivalent to one with matrix

    ( [A; B]^T [A; B] )^{1/2} = (A^T A + B^T B)^{1/2}
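The final reduction is pure linear algebra and can be verified in a couple of lines (illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = rng.normal(size=(3, 4)), rng.normal(size=(5, 4))
stacked = np.vstack([A, B])
# Gram matrix of the stacked channel equals the sum of the Gram matrices.
assert np.allclose(stacked.T @ stacked, A.T @ A + B.T @ B)
```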

Page 19:

Matrix SNR Functions

For any matrix A, the mutual information satisfies

    I(X; AX + N) = I(X; (A^T A)^{1/2} X + N')

Key idea: parameterize using a positive semidefinite matrix SNR.

Mutual information function I_X : S_+ → R_+

    I_X(S) ≜ I(X; S^{1/2} X + N)

MMSE function M_X : S_+ → S_+

    M_X(S) ≜ E[ Cov(X | S^{1/2} X + N) ]

Page 20:

Matrix SNR Functions

Lamarca [2009] showed that I_X(S) is twice differentiable, with

    ∇_S I_X(S) = (1/2) M_X(S)

    ∇²_S I_X(S) = −(1/2) E_Y[ Cov(X | Y) ⊗ Cov(X | Y) ]

where Y = S^{1/2} X + N.

Payaró, Gregori & Palomar [2011] used this result to give a matrix
generalization of Costa's entropy power inequality.

One can give simple proofs of these results based on the additivity
rule for parallel combination + the chain rule + a low-SNR expansion.
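The gradient identity is easy to check by finite differences in the Gaussian case, using the closed forms sketched earlier: the directional derivative of I_X along a symmetric direction E should equal ½ tr(M_X(S) E). Illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, eps = 3, 1e-6
G = rng.normal(size=(n, n))
Sigma = G @ G.T + np.eye(n)
S = np.diag([0.3, 1.0, 2.5])

I = lambda S: 0.5 * np.linalg.slogdet(np.eye(n) + S @ Sigma)[1]
M = np.linalg.inv(np.linalg.inv(Sigma) + S)        # M_X(S), Gaussian case

E = rng.normal(size=(n, n)); E = E + E.T           # symmetric direction
fd = (I(S + eps * E) - I(S - eps * E)) / (2 * eps) # finite difference
print(fd, 0.5 * np.trace(M @ E))                   # agree: grad I = M / 2
```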

Page 21:

The Gaussian distribution plays an extremal role

Mutual information:

    I_X(S) ≤ (1/2) log det(I + S Cov(X))

Gradient of mutual information:

    ∇I_X(S) ⪯ (1/2) (Cov(X)^{-1} + S)^{-1}

Hessian of mutual information:

    ∇²I_X(S) ⪯ −∇I_X(S) ⊗ ∇I_X(S)

The Gaussian distribution minimizes the relative curvature
of the mutual information!
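Even the scalar case illustrates the first bound: for X = ±1 equiprobable (unit variance), a Monte Carlo estimate of I_X(s) sits below the Gaussian value ½ log(1 + s) at every SNR. A rough sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def mi_binary(s, m=400_000):
    """Monte Carlo estimate of I(X; sqrt(s) X + N) in nats for X = +/-1."""
    x = rng.choice([-1.0, 1.0], size=m)
    y = np.sqrt(s) * x + rng.standard_normal(m)
    # Normalizing constants of the Gaussian density cancel in the ratio.
    log_cond = -0.5 * (y - np.sqrt(s) * x) ** 2
    log_marg = np.logaddexp(-0.5 * (y - np.sqrt(s)) ** 2,
                            -0.5 * (y + np.sqrt(s)) ** 2) - np.log(2)
    return np.mean(log_cond - log_marg)

for s in [0.1, 1.0, 10.0]:
    print(s, mi_binary(s), 0.5 * np.log1p(s))   # binary MI <= Gaussian MI
```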

Page 22:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 23:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 24:

Monotonicity of Effective Fisher Information

Effective Fisher information matrix K_X : S_+^n → S_+^n

    K_X(S) ≜ M_X^{-1}(S) − S.

This is the inverse covariance of a Gaussian with matched MMSE:

    Z ∼ N(0, K_X^{-1}(S))  ⟹  M_X(S) = M_Z(S)

Lemma
The effective Fisher information K_X(S) is monotone increasing, with

    K_X(0) = Cov(X)^{-1},        lim_{λ_min(S) → ∞} K_X(S) = J(X).

Furthermore, K_X(S) is constant if and only if X is Gaussian.
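In the scalar case K_X(s) = 1/mmse(s) − s, and for X = ±1 the conditional mean is tanh, giving the closed form mmse(s) = 1 − E[tanh²(s + √s Z)]. A quick numerical look at the monotonicity (illustrative; note that for this discrete X, J(X) = ∞, so K_X(s) grows without bound):

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.standard_normal(1_000_000)

def mmse_bpsk(s):
    # mmse(s) = 1 - E[tanh^2(s + sqrt(s) Z)] for X = +/-1, Z ~ N(0, 1)
    return 1.0 - np.mean(np.tanh(s + np.sqrt(s) * z) ** 2)

for s in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
    K = 1.0 / mmse_bpsk(s) - s          # effective Fisher information
    print(f"s = {s:4.1f}   K_X(s) = {K:10.3f}")
# K_X(0) = 1 = 1/Var(X); K_X(s) is increasing and diverges here since
# a discrete X has infinite Fisher information J(X).
```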

Page 25:

Bounds on MMSE

Theorem
For all 0 ⪯ R ⪯ S ⪯ T, the MMSE matrix satisfies

    (K_X(T) + S)^{-1} ⪯ M_X(S) ⪯ (K_X(R) + S)^{-1}.

Equality holds if and only if X is Gaussian.

- Generalization of the single-crossing property [Bustin et al. 2013]
- Implies the Bayesian Cramér–Rao matrix lower bound:

      λ_min(T) → ∞  ⟹  (J(X) + S)^{-1} ⪯ M_X(S)

- Implies the linear MMSE matrix upper bound:

      R = 0  ⟹  M_X(S) ⪯ (Cov(X)^{-1} + S)^{-1}.
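The R = 0 corollary is easy to test in the same scalar example: for X = ±1 with unit variance, mmse(s) must stay below the linear MMSE value 1/(1 + s). Sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
z = rng.standard_normal(1_000_000)
# mmse(s) for X = +/-1: conditional mean is tanh, as on the previous slide.
mmse = lambda s: 1.0 - np.mean(np.tanh(s + np.sqrt(s) * z) ** 2)

for s in [0.1, 1.0, 5.0, 20.0]:
    print(f"s = {s:5.1f}   mmse = {mmse(s):.4f}   LMMSE bound = {1/(1+s):.4f}")
```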

Page 26:

Bounds on Mutual Information

Mutual information is the trace of a matrix integral:

    I_X(S) = (1/2) tr( S ∫₀¹ M_X(tS) dt )
           = (1/2) tr( S ∫₀¹ (K_X(tS) + tS)^{-1} dt )

Thus monotonicity of K_X(·) yields upper and lower bounds.

One can also obtain a simple proof of the multivariate Gaussian
log-Sobolev inequality.
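The trace-integral representation can be checked numerically in the Gaussian case, where M_X(tS) = (Σ^{-1} + tS)^{-1} in closed form. A sketch using simple quadrature (illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
G = rng.normal(size=(n, n))
Sigma = G @ G.T + np.eye(n)
S = np.diag([0.5, 1.0, 3.0])

I_direct = 0.5 * np.linalg.slogdet(np.eye(n) + S @ Sigma)[1]

# Quadrature over t of 0.5 * tr(S @ M_X(tS)), with M_X from the
# Gaussian closed form (Sigma^{-1} + t S)^{-1}.
ts = np.linspace(0, 1, 10_001)
vals = [0.5 * np.trace(S @ np.linalg.inv(np.linalg.inv(Sigma) + t * S))
        for t in ts]
print(I_direct, np.trapz(vals, ts))   # agree to quadrature accuracy
```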

Page 27:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 28:

Low-dimensional linear projections

X ↦ AX,    A ∈ R^{k×n},    k ≪ n

- Summary of a high-dimensional vector: projection pursuit in
  statistics, sketching, compressed sensing
- Central limit theorem (CLT)
- High-dimensional geometry: Dvoretzky's theorem
- Conditional CLT for random projections [Reeves 2017]

Page 29:

[Diagram: an n-dimensional random vector X and its i.i.d. Gaussian surrogate X* with matched power; the k-dimensional random projection AX, with A uniform on the Stiefel manifold, is compared against AX*, again i.i.d. Gaussian with matched power.]

Page 30:

[Diagram: the same four objects with added noise, X + √t N, X* + √t N, AX + √t N, and AX* + √t N; the labels δ_EPI and δ_CCLT mark the information gaps between them.]

D(P_{X+√t N} ‖ P_{X*+√t N}) = I_{X*}(sI) − I_X(sI),   where s = 1/t

    = [ I_{X*}(sI) − (n/k) E[I_X(s A^T A)] ] + [ (n/k) E[I_X(s A^T A)] − I_X(sI) ]
                   = δ_CCLT                                 = δ_EPI

The terms δ_CCLT and δ_EPI are both non-negative, and equal to zero
if and only if X is i.i.d. Gaussian.
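The projection side of this picture is easy to simulate: draw A uniformly from the Stiefel manifold via a QR decomposition and check that the marginals of AX look Gaussian for a non-Gaussian i.i.d. X when k ≪ n. A rough sketch (illustrative; it probes low-order statistics only, not the full conditional CLT):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k, trials = 1000, 3, 20_000

def stiefel(k, n):
    """Uniform k x n matrix with orthonormal rows (QR of a Gaussian)."""
    Q, R = np.linalg.qr(rng.normal(size=(n, k)))
    return (Q * np.sign(np.diag(R))).T

A = stiefel(k, n)
# Non-Gaussian i.i.d. prior with unit power: Rademacher entries.
X = rng.choice([-1.0, 1.0], size=(n, trials))
proj = A @ X                                # k-dimensional projections

# Marginals of AX should be close to N(0, 1) when k << n.
print("mean", proj.mean(axis=1))            # ~0
print("var ", proj.var(axis=1))             # ~1
print("kurt", np.mean(proj**4, axis=1) / proj.var(axis=1)**2)  # ~3
```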

Page 31:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 32:

Additivity for parallel combination

The parallel combination of linear Gaussian channels,

X ∼ P(x) ──▶ [ A ] ──▶ [ AWGN ] ──▶ Y
         └─▶ [ B ] ──▶ [ AWGN ] ──▶ Z

is equivalent to a single channel:

X ∼ P(x) ──▶ [ (A^T A + B^T B)^{1/2} ] ──▶ [ AWGN ] ──▶ Y'

Page 33:

Given a probability measure μ on [0, ∞), define

    I_n(μ) ≜ (1/n) E[ I_X(U^T Λ U) ],

where U is uniform on the orthogonal group and Λ_ii ∼ μ.

Theorem
If X has bounded second moments, and A_n and B_n are independent
right-orthogonally invariant random matrices whose empirical spectral
distributions converge to compactly supported probability measures μ
and ν, then

    | (1/n) I_{X_n}(A_n^T A_n + B_n^T B_n) − I_n(μ ⊞ ν) | → 0,

where μ ⊞ ν denotes the free additive convolution.
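No free-probability library is used in the sketch below; it only illustrates the mechanism behind the theorem: after independent Haar rotations, the spectrum of the sum depends (asymptotically) only on the two input spectra, concentrating on the free additive convolution μ ⊞ ν. Illustrative:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 500

def haar_orthogonal(n):
    """Haar-distributed orthogonal matrix via sign-corrected QR."""
    Q, R = np.linalg.qr(rng.normal(size=(n, n)))
    return Q * np.sign(np.diag(R))

# Two fixed spectra (the roles of mu and nu).
mu_eigs = np.linspace(0.0, 1.0, n)          # uniform on [0, 1]
nu_eigs = np.repeat([0.5, 2.0], n // 2)     # two-point spectrum

def summed_spectrum():
    U, V = haar_orthogonal(n), haar_orthogonal(n)
    M = U @ np.diag(mu_eigs) @ U.T + V @ np.diag(nu_eigs) @ V.T
    return np.linalg.eigvalsh(M)

# Independent draws give nearly identical histograms for large n,
# concentrating on the free additive convolution mu ⊞ nu.
h1, _ = np.histogram(summed_spectrum(), bins=30, range=(0, 3.5), density=True)
h2, _ = np.histogram(summed_spectrum(), bins=30, range=(0, 3.5), density=True)
print(np.max(np.abs(h1 - h2)))              # small
```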

Page 34:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 35:

Conclusion

- Functional properties of mutual information and MMSE as a function
  of matrix SNR
- The Gaussian distribution minimizes the relative curvature
- Some initial applications:
  - Bounds on mutual information and MMSE
  - Gaussian approximation via low-dimensional projections
  - Additivity of information via free probability
- Future outlook: analysis of high-dimensional inference problems
  with random mixing

Page 36:

References I

D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005.

M. Payaró and D. Palomar, "Hessian and concavity of mutual information, differential entropy, and entropy power in linear vector Gaussian channels," IEEE Trans. Inform. Theory, vol. 55, no. 8, pp. 3613–3628, 2009.

O. Rioul, "Information theoretic proofs of entropy power inequalities," IEEE Trans. Inform. Theory, vol. 57, no. 1, pp. 33–55, Jan. 2011.

A. Dytso, R. Bustin, H. V. Poor, and S. Shamai, "Comment on the equality condition for the I-MMSE proof of entropy power inequality," 2017. [Online]. Available: https://arxiv.org/pdf/1703.07442.pdf

M. Raginsky and I. Sason, Concentration of Measure Inequalities in Information Theory, Communications, and Coding, 2nd ed. Foundations and Trends in Communications and Information Theory, 2014.

G. Reeves, "Additivity of information in multilayer networks via additive Gaussian noise transforms," in Proc. Annual Allerton Conf. on Commun., Control, and Comp., Monticello, IL, 2017. [Online]. Available: https://arxiv.org/abs/1710.04580

Page 37:

References II

M. Lamarca, "Linear precoding for mutual information maximization in MIMO systems," in Proc. International Conference on Wireless Communication Systems, Tuscany, Italy, Sep. 2009.

M. Payaró, M. Gregori, and D. Palomar, "Yet another entropy power inequality with an application," in Proc. International Conference on Wireless Communications and Signal Processing, Nanjing, China, Nov. 2011.
