
Page 1:

Mutual Information as a Function of Matrix SNR for Linear Gaussian Channels

Galen Reeves (Duke)
Henry Pfister (Duke)

Alex Dytso (Princeton)

ISIT, June 2018

Page 2:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 3:

Interpolation with a Gaussian

P(x) ∗ φ_t(x)  ⟺  X + √t N

- Heat flow, Gaussian channel, Ornstein–Uhlenbeck channel
- Functional properties of entropy and Fisher information:
  De Bruijn's identity, I-MMSE relation [Guo et al. 2005]
- Simple proofs:
  - Entropy power inequality [Rioul 2011]
  - Gaussian logarithmic Sobolev inequality and Gaussian
    hypercontractivity [Raginsky & Sason 2014]
- Connections with the replica method and free probability theory:
  additivity of information in multilayer networks via additive
  Gaussian noise transforms [Reeves, Allerton 2017]
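As a concrete check of the correspondence at the top of this slide, smoothing a density with the Gaussian kernel φ_t matches the histogram of X + √t N. A minimal NumPy/SciPy sketch, assuming a two-point prior for X (all names illustrative):

```python
import numpy as np
from scipy.stats import norm

t = 0.5
grid = np.linspace(-6, 6, 2001)

# Prior P(x): equiprobable two-point mass at x = -1 and x = +1.
atoms, weights = np.array([-1.0, 1.0]), np.array([0.5, 0.5])

# Left side: P(x) * phi_t(x), the prior smoothed by a Gaussian kernel.
density_conv = sum(w * norm.pdf(grid, loc=a, scale=np.sqrt(t))
                   for a, w in zip(atoms, weights))

# Right side: empirical density of X + sqrt(t) * N with N ~ N(0, 1).
rng = np.random.default_rng(0)
x = rng.choice(atoms, size=500_000, p=weights)
samples = x + np.sqrt(t) * rng.standard_normal(x.size)
hist, edges = np.histogram(samples, bins=200, range=(-6, 6), density=True)

centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(np.interp(centers, grid, density_conv) - hist)))  # small
```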

Page 4:

Gaussian Channel

Y = √s X + N

- Random vector X = (X_1, ..., X_n)
- Additive Gaussian noise N ∼ N(0, I)
- Scalar SNR parameter s

Page 5:

Linear Gaussian Channel

Y = AX + N

X ∼ P(x) ──▶ [ A ] ──▶ [ AWGN ] ──▶ Y

- Random vector X = (X_1, ..., X_n)
- Additive Gaussian noise N ∼ N(0, I)
- k × n channel matrix A

Page 6:

Linear Gaussian Channel

[Diagram: Y = AX + N drawn twice, once with a short, fat A (k < n) and once with a tall, skinny A (k > n).]

Page 7:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 8:

Parallel combination of channels

Given scalars a and b, define

Y = aX + N,    Z = bX + N',

where N and N' are independent standard Gaussian vectors.

Is the following statement true?

I(X; Y, Z) = I(X; (a^2 + b^2)^{1/2} X + N'')

YES!

Page 9:

Parallel combination of channels

Given matrices A and B, define

Y = AX + N,    Z = BX + N',

where N and N' are independent standard Gaussian vectors.

Is the following statement true?

I(X; Y, Z) = I(X; (A^T A + B^T B)^{1/2} X + N'')

YES!
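For a Gaussian X the answer can be confirmed in closed form: the joint observation gives ½ log det(I + [A; B] Σ [A; B]^T), the combined channel gives ½ log det(I + S^{1/2} Σ S^{1/2}) with S = A^T A + B^T B, and Sylvester's determinant identity shows the two agree. A minimal numerical sketch (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, ka, kb = 4, 3, 5
A = rng.normal(size=(ka, n))
B = rng.normal(size=(kb, n))

G = rng.normal(size=(n, n))
Sigma = G @ G.T + np.eye(n)        # covariance of a Gaussian X

def gaussian_mi(C, Sigma):
    """I(X; CX + N) = 0.5 * logdet(I + C Sigma C^T) for X ~ N(0, Sigma)."""
    k = C.shape[0]
    return 0.5 * np.linalg.slogdet(np.eye(k) + C @ Sigma @ C.T)[1]

stacked = np.vstack([A, B])        # observe Y and Z jointly
S = A.T @ A + B.T @ B              # combined matrix SNR

w, V = np.linalg.eigh(S)           # S^{1/2} via eigendecomposition (S is PSD)
S_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

print(gaussian_mi(stacked, Sigma), gaussian_mi(S_half, Sigma))  # equal
```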

Page 10:

Basic fact I

A linear Gaussian channel with matrix A,

X ∼ P(x) ──▶ [ A ] ──▶ [ AWGN ] ──▶ Y

is statistically equivalent to a channel with the square matrix (A^T A)^{1/2}:

X ∼ P(x) ──▶ [ (A^T A)^{1/2} ] ──▶ [ AWGN ] ──▶ Y'

Page 11:

Proof of basic fact I (short fat case)

Consider the singular value decomposition A = U D V^T.

    Y  = A X + N
    Y' = (A^T A)^{1/2} X + N'

Page 12:

Proof of basic fact I (short fat case)

Consider the singular value decomposition A = U D V^T.

    Y  = U D V^T X + N
    Y' = V (D^T D)^{1/2} V^T X + N'

Page 13:

Proof of basic fact I (short fat case)

Apply orthogonal transformations.

    Y  = U D V^T X + N
    Y' = V (D^T D)^{1/2} V^T X + N'

Page 14:

Proof of basic fact I (short fat case)

Apply orthogonal transformations.

    U^T Y  = D V^T X + U^T N
    V^T Y' = (D^T D)^{1/2} V^T X + V^T N'

Page 15:

Proof of basic fact I (skinny case)

Note the Gaussian noise is orthogonally invariant.

    U^T Y  = D V^T X + U^T N
    V^T Y' = (D^T D)^{1/2} V^T X + V^T N'

Page 16:

Proof of basic fact I (skinny case)

Note the Gaussian noise is orthogonally invariant.

    U^T Y  = D V^T X + N
    V^T Y' = (D^T D)^{1/2} V^T X + N'
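As a numerical confirmation of the argument, one can compare the mutual information of the two channels for a Gaussian X, computing (A^T A)^{1/2} directly from the SVD as V (D^T D)^{1/2} V^T. A sketch (illustrative names):

```python
import numpy as np

rng = np.random.default_rng(11)
k, n = 3, 5                                   # short, fat A
A = rng.normal(size=(k, n))
G = rng.normal(size=(n, n))
Sigma = G @ G.T + np.eye(n)                   # covariance of a Gaussian X

# (A^T A)^{1/2} from the SVD A = U D V^T: equal to V (D^T D)^{1/2} V^T.
U, d, Vt = np.linalg.svd(A, full_matrices=False)
root = Vt.T @ np.diag(d) @ Vt

# For Gaussian X, I(X; CX + N) = 0.5 * logdet(I + C Sigma C^T).
mi = lambda C: 0.5 * np.linalg.slogdet(np.eye(C.shape[0]) + C @ Sigma @ C.T)[1]
print(mi(A), mi(root))                        # identical
```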

Page 17:

Basic fact II

The parallel combination of linear Gaussian channels,

X ∼ P(x) ──▶ [ A ] ──▶ [ AWGN ] ──▶ Y
         └─▶ [ B ] ──▶ [ AWGN ] ──▶ Z

is equivalent to a single channel:

X ∼ P(x) ──▶ [ (A^T A + B^T B)^{1/2} ] ──▶ [ AWGN ] ──▶ Y'

Page 18:

Proof of basic fact II

Consider the concatenation of the vector channels:

    [Y; Z] = [A; B] X + [N; N']

By basic fact I, this channel is equivalent to one with matrix

    ( [A; B]^T [A; B] )^{1/2} = (A^T A + B^T B)^{1/2}
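The final reduction is pure linear algebra and can be verified in a couple of lines (illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = rng.normal(size=(3, 4)), rng.normal(size=(5, 4))
stacked = np.vstack([A, B])
# Gram matrix of the stacked channel equals the sum of the Gram matrices.
assert np.allclose(stacked.T @ stacked, A.T @ A + B.T @ B)
```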

Page 19:

Matrix SNR Functions

For any matrix A, the mutual information satisfies

    I(X; AX + N) = I(X; (A^T A)^{1/2} X + N')

Key idea: parameterize using a positive semidefinite matrix SNR.

Mutual information function I_X : S_+ → R_+

    I_X(S) ≜ I(X; S^{1/2} X + N)

MMSE function M_X : S_+ → S_+

    M_X(S) ≜ E[ Cov(X | S^{1/2} X + N) ]

Page 20:

Matrix SNR Functions

Lamarca [2009] showed that I_X(S) is twice differentiable, with

    ∇_S I_X(S) = (1/2) M_X(S)

    ∇²_S I_X(S) = −(1/2) E_Y[ Cov(X | Y) ⊗ Cov(X | Y) ]

where Y = S^{1/2} X + N.

Payaró, Gregori & Palomar [2011] used this result to give a matrix
generalization of Costa's entropy power inequality.

One can give simple proofs of these results based on the additivity
rule for parallel combination + the chain rule + a low-SNR expansion.
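The gradient identity is easy to check by finite differences in the Gaussian case, using the closed forms sketched earlier: the directional derivative of I_X along a symmetric direction E should equal ½ tr(M_X(S) E). Illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, eps = 3, 1e-6
G = rng.normal(size=(n, n))
Sigma = G @ G.T + np.eye(n)
S = np.diag([0.3, 1.0, 2.5])

I = lambda S: 0.5 * np.linalg.slogdet(np.eye(n) + S @ Sigma)[1]
M = np.linalg.inv(np.linalg.inv(Sigma) + S)        # M_X(S), Gaussian case

E = rng.normal(size=(n, n)); E = E + E.T           # symmetric direction
fd = (I(S + eps * E) - I(S - eps * E)) / (2 * eps) # finite difference
print(fd, 0.5 * np.trace(M @ E))                   # agree: grad I = M / 2
```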

Page 21:

The Gaussian distribution plays an extremal role

Mutual information:

    I_X(S) ≤ (1/2) log det(I + S Cov(X))

Gradient of mutual information:

    ∇I_X(S) ⪯ (1/2) (Cov(X)^{-1} + S)^{-1}

Hessian of mutual information:

    ∇²I_X(S) ⪯ −∇I_X(S) ⊗ ∇I_X(S)

The Gaussian distribution minimizes the relative curvature
of the mutual information!
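Even the scalar case illustrates the first bound: for X = ±1 equiprobable (unit variance), a Monte Carlo estimate of I_X(s) sits below the Gaussian value ½ log(1 + s) at every SNR. A rough sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def mi_binary(s, m=400_000):
    """Monte Carlo estimate of I(X; sqrt(s) X + N) in nats for X = +/-1."""
    x = rng.choice([-1.0, 1.0], size=m)
    y = np.sqrt(s) * x + rng.standard_normal(m)
    # Normalizing constants of the Gaussian density cancel in the ratio.
    log_cond = -0.5 * (y - np.sqrt(s) * x) ** 2
    log_marg = np.logaddexp(-0.5 * (y - np.sqrt(s)) ** 2,
                            -0.5 * (y + np.sqrt(s)) ** 2) - np.log(2)
    return np.mean(log_cond - log_marg)

for s in [0.1, 1.0, 10.0]:
    print(s, mi_binary(s), 0.5 * np.log1p(s))   # binary MI <= Gaussian MI
```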

Page 22:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 23:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 24:

Monotonicity of Effective Fisher Information

Effective Fisher information matrix K_X : S_+^n → S_+^n

    K_X(S) ≜ M_X^{-1}(S) − S.

This is the inverse covariance of a Gaussian with matched MMSE:

    Z ∼ N(0, K_X^{-1}(S))  ⟹  M_X(S) = M_Z(S)

Lemma
The effective Fisher information K_X(S) is monotone increasing, with

    K_X(0) = Cov(X)^{-1},        lim_{λ_min(S) → ∞} K_X(S) = J(X).

Furthermore, K_X(S) is constant if and only if X is Gaussian.
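In the scalar case K_X(s) = 1/mmse(s) − s, and for X = ±1 the conditional mean is tanh, giving the closed form mmse(s) = 1 − E[tanh²(s + √s Z)]. A quick numerical look at the monotonicity (illustrative; note that for this discrete X, J(X) = ∞, so K_X(s) grows without bound):

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.standard_normal(1_000_000)

def mmse_bpsk(s):
    # mmse(s) = 1 - E[tanh^2(s + sqrt(s) Z)] for X = +/-1, Z ~ N(0, 1)
    return 1.0 - np.mean(np.tanh(s + np.sqrt(s) * z) ** 2)

for s in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
    K = 1.0 / mmse_bpsk(s) - s          # effective Fisher information
    print(f"s = {s:4.1f}   K_X(s) = {K:10.3f}")
# K_X(0) = 1 = 1/Var(X); K_X(s) is increasing and diverges here since
# a discrete X has infinite Fisher information J(X).
```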

Page 25:

Bounds on MMSE

Theorem
For all 0 ⪯ R ⪯ S ⪯ T, the MMSE matrix satisfies

    (K_X(T) + S)^{-1} ⪯ M_X(S) ⪯ (K_X(R) + S)^{-1}.

Equality holds if and only if X is Gaussian.

- Generalization of the single-crossing property [Bustin et al. 2013]
- Implies the Bayesian Cramér–Rao matrix lower bound:

      λ_min(T) → ∞  ⟹  (J(X) + S)^{-1} ⪯ M_X(S)

- Implies the linear MMSE matrix upper bound:

      R = 0  ⟹  M_X(S) ⪯ (Cov(X)^{-1} + S)^{-1}.
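The R = 0 corollary is easy to test in the same scalar example: for X = ±1 with unit variance, mmse(s) must stay below the linear MMSE value 1/(1 + s). Sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
z = rng.standard_normal(1_000_000)
# mmse(s) for X = +/-1: conditional mean is tanh, as on the previous slide.
mmse = lambda s: 1.0 - np.mean(np.tanh(s + np.sqrt(s) * z) ** 2)

for s in [0.1, 1.0, 5.0, 20.0]:
    print(f"s = {s:5.1f}   mmse = {mmse(s):.4f}   LMMSE bound = {1/(1+s):.4f}")
```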

Page 26:

Bounds on Mutual Information

Mutual information is the trace of a matrix integral:

    I_X(S) = (1/2) tr( S ∫₀¹ M_X(tS) dt )
           = (1/2) tr( S ∫₀¹ (K_X(tS) + tS)^{-1} dt )

Thus monotonicity of K_X(·) yields upper and lower bounds.

One can also obtain a simple proof of the multivariate Gaussian
log-Sobolev inequality.
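The trace-integral representation can be checked numerically in the Gaussian case, where M_X(tS) = (Σ^{-1} + tS)^{-1} in closed form. A sketch using simple quadrature (illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
G = rng.normal(size=(n, n))
Sigma = G @ G.T + np.eye(n)
S = np.diag([0.5, 1.0, 3.0])

I_direct = 0.5 * np.linalg.slogdet(np.eye(n) + S @ Sigma)[1]

# Quadrature over t of 0.5 * tr(S @ M_X(tS)), with M_X from the
# Gaussian closed form (Sigma^{-1} + t S)^{-1}.
ts = np.linspace(0, 1, 10_001)
vals = [0.5 * np.trace(S @ np.linalg.inv(np.linalg.inv(Sigma) + t * S))
        for t in ts]
print(I_direct, np.trapz(vals, ts))   # agree to quadrature accuracy
```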

Page 27:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 28:

Low-dimensional linear projections

X ↦ AX,    A ∈ R^{k×n},    k ≪ n

- Summary of a high-dimensional vector: projection pursuit in
  statistics, sketching, compressed sensing
- Central limit theorem (CLT)
- High-dimensional geometry: Dvoretzky's theorem
- Conditional CLT for random projections [Reeves 2017]

Page 29:

[Diagram: an n-dimensional random vector X and its i.i.d. Gaussian surrogate X* with matched power; the k-dimensional random projection AX, with A uniform on the Stiefel manifold, is compared against AX*, again i.i.d. Gaussian with matched power.]

Page 30:

[Diagram: the same four objects with added noise, X + √t N, X* + √t N, AX + √t N, and AX* + √t N; the labels δ_EPI and δ_CCLT mark the information gaps between them.]

D(P_{X+√t N} ‖ P_{X*+√t N}) = I_{X*}(sI) − I_X(sI),   where s = 1/t

    = [ I_{X*}(sI) − (n/k) E[I_X(s A^T A)] ] + [ (n/k) E[I_X(s A^T A)] − I_X(sI) ]
                   = δ_CCLT                                 = δ_EPI

The terms δ_CCLT and δ_EPI are both non-negative, and equal to zero
if and only if X is i.i.d. Gaussian.
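The projection side of this picture is easy to simulate: draw A uniformly from the Stiefel manifold via a QR decomposition and check that the marginals of AX look Gaussian for a non-Gaussian i.i.d. X when k ≪ n. A rough sketch (illustrative; it probes low-order statistics only, not the full conditional CLT):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k, trials = 1000, 3, 20_000

def stiefel(k, n):
    """Uniform k x n matrix with orthonormal rows (QR of a Gaussian)."""
    Q, R = np.linalg.qr(rng.normal(size=(n, k)))
    return (Q * np.sign(np.diag(R))).T

A = stiefel(k, n)
# Non-Gaussian i.i.d. prior with unit power: Rademacher entries.
X = rng.choice([-1.0, 1.0], size=(n, trials))
proj = A @ X                                # k-dimensional projections

# Marginals of AX should be close to N(0, 1) when k << n.
print("mean", proj.mean(axis=1))            # ~0
print("var ", proj.var(axis=1))             # ~1
print("kurt", np.mean(proj**4, axis=1) / proj.var(axis=1)**2)  # ~3
```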

Page 31:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 32:

Additivity for parallel combination

The parallel combination of linear Gaussian channels,

X ∼ P(x) ──▶ [ A ] ──▶ [ AWGN ] ──▶ Y
         └─▶ [ B ] ──▶ [ AWGN ] ──▶ Z

is equivalent to a single channel:

X ∼ P(x) ──▶ [ (A^T A + B^T B)^{1/2} ] ──▶ [ AWGN ] ──▶ Y'

Page 33:

Given a probability measure μ on [0, ∞), define

    I_n(μ) ≜ (1/n) E[ I_X(U^T Λ U) ],

where U is uniform on the orthogonal group and Λ_ii ∼ μ.

Theorem
If X has bounded second moments, and A_n and B_n are independent
right-orthogonally invariant random matrices whose empirical spectral
distributions converge to compactly supported probability measures μ
and ν, then

    | (1/n) I_{X_n}(A_n^T A_n + B_n^T B_n) − I_n(μ ⊞ ν) | → 0,

where μ ⊞ ν denotes the free additive convolution.
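No free-probability library is used in the sketch below; it only illustrates the mechanism behind the theorem: after independent Haar rotations, the spectrum of the sum depends (asymptotically) only on the two input spectra, concentrating on the free additive convolution μ ⊞ ν. Illustrative:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 500

def haar_orthogonal(n):
    """Haar-distributed orthogonal matrix via sign-corrected QR."""
    Q, R = np.linalg.qr(rng.normal(size=(n, n)))
    return Q * np.sign(np.diag(R))

# Two fixed spectra (the roles of mu and nu).
mu_eigs = np.linspace(0.0, 1.0, n)          # uniform on [0, 1]
nu_eigs = np.repeat([0.5, 2.0], n // 2)     # two-point spectrum

def summed_spectrum():
    U, V = haar_orthogonal(n), haar_orthogonal(n)
    M = U @ np.diag(mu_eigs) @ U.T + V @ np.diag(nu_eigs) @ V.T
    return np.linalg.eigvalsh(M)

# Independent draws give nearly identical histograms for large n,
# concentrating on the free additive convolution mu ⊞ nu.
h1, _ = np.histogram(summed_spectrum(), bins=30, range=(0, 3.5), density=True)
h2, _ = np.histogram(summed_spectrum(), bins=30, range=(0, 3.5), density=True)
print(np.max(np.abs(h1 - h2)))              # small
```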

Page 34:

Table of Contents

Additive Gaussian Noise Channels

Matrix SNR Functions

Applications
  Bounds on mutual information and MMSE
  Gaussian approximation via low-dimensional projections
  Additivity of information via free probability

Conclusion

Page 35:

Conclusion

- Functional properties of mutual information and MMSE as a function
  of matrix SNR
- The Gaussian distribution minimizes the relative curvature
- Some initial applications:
  - Bounds on mutual information and MMSE
  - Gaussian approximation via low-dimensional projections
  - Additivity of information via free probability
- Future outlook: analysis of high-dimensional inference problems
  with random mixing

Page 36:

References I

D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005.

M. Payaró and D. Palomar, "Hessian and concavity of mutual information, differential entropy, and entropy power in linear vector Gaussian channels," IEEE Trans. Inform. Theory, vol. 55, no. 8, pp. 3613–3628, 2009.

O. Rioul, "Information theoretic proofs of entropy power inequalities," IEEE Trans. Inform. Theory, vol. 57, no. 1, pp. 33–55, Jan. 2011.

A. Dytso, R. Bustin, H. V. Poor, and S. Shamai, "Comment on the equality condition for the I-MMSE proof of entropy power inequality," 2017. [Online]. Available: https://arxiv.org/pdf/1703.07442.pdf

M. Raginsky and I. Sason, Concentration of Measure Inequalities in Information Theory, Communications, and Coding, 2nd ed. Foundations and Trends in Communications and Information Theory, 2014.

G. Reeves, "Additivity of information in multilayer networks via additive Gaussian noise transforms," in Proc. Annual Allerton Conf. on Commun., Control, and Comp., Monticello, IL, 2017. [Online]. Available: https://arxiv.org/abs/1710.04580

Page 37:

References II

M. Lamarca, "Linear precoding for mutual information maximization in MIMO systems," in Proc. International Conference on Wireless Communication Systems, Tuscany, Italy, Sep. 2009.

M. Payaró, M. Gregori, and D. Palomar, "Yet another entropy power inequality with an application," in Proc. International Conference on Wireless Communications and Signal Processing, Nanjing, China, Nov. 2011.
