Mutual Information as a Function of Matrix SNR for Linear Gaussian Channels
Galen Reeves (Duke), Henry Pfister (Duke), Alex Dytso (Princeton)
ISIT, June 2018
Table of Contents
- Additive Gaussian Noise Channels
- Matrix SNR Functions
- Applications
  - Bounds on mutual information and MMSE
  - Gaussian approximation via low-dimensional projections
  - Additivity of information via free probability
- Conclusion
2 / 37
Interpolation with a Gaussian
P(x) ∗ φ_t(x) ⟺ X + √t N
- Heat flow, Gaussian channel, Ornstein–Uhlenbeck channel
- Functional properties of entropy and Fisher information: De Bruijn's identity, I-MMSE relation [Guo et al. 2005]
- Simple proofs: entropy power inequality [Rioul 2011]; Gaussian logarithmic Sobolev inequality and Gaussian hypercontractivity [Raginsky & Sason 2014]
- Connections with the replica method and free probability theory: additivity of information in multilayer networks via additive Gaussian noise transforms [Reeves, Allerton 2017]
3 / 37
Gaussian Channel
Y = √s X + N
- Random vector X = (X₁, …, Xₙ)
- Additive Gaussian noise N ∼ N(0, I)
- Scalar SNR parameter s
4 / 37
Linear Gaussian Channel
Y = AX + N
[Block diagram: X ∼ P(x) → A → AWGN → Y]
- Random vector X = (X₁, …, Xₙ)
- Additive Gaussian noise N ∼ N(0, I)
- k × n channel matrix A
5 / 37
Linear Gaussian Channel
[Diagram: Y = AX + N illustrated for a short fat A (k < n) and a tall skinny A (k > n)]
6 / 37
Parallel combination of channels
Given scalars a and b, define
Y = aX + N,  Z = bX + N′,
where N, N′ are independent standard Gaussian vectors.
Is the following statement true?
I(X; Y, Z) = I(X; (a² + b²)^{1/2} X + N′′)
YES!
8 / 37
Parallel combination of channels
Given matrices A and B, define
Y = AX + N,  Z = BX + N′,
where N, N′ are independent standard Gaussian vectors.
Is the following statement true?
I(X; Y, Z) = I(X; (AᵀA + BᵀB)^{1/2} X + N′′)
YES!
9 / 37
Basic fact I
A linear Gaussian channel with matrix A,
[Block diagram: X ∼ P(x) → A → AWGN → Y]
is statistically equivalent to a channel with the square matrix (AᵀA)^{1/2}:
[Block diagram: X ∼ P(x) → (AᵀA)^{1/2} → AWGN → Y′]
10 / 37
Proof of basic fact I (short fat case)
Consider the singular value decomposition A = UDVᵀ.
Y = A X + N
Y′ = (AᵀA)^{1/2} X + N′
11 / 37
Proof of basic fact I (short fat case)
Consider the singular value decomposition A = UDVᵀ.
Y = U D Vᵀ X + N
Y′ = V (DᵀD)^{1/2} Vᵀ X + N′
12 / 37
Proof of basic fact I (short fat case)
Apply orthogonal transformations.
Y = U D Vᵀ X + N
Y′ = V (DᵀD)^{1/2} Vᵀ X + N′
13 / 37
Proof of basic fact I (short fat case)
Apply orthogonal transformations.
Uᵀ Y = D Vᵀ X + Uᵀ N
Vᵀ Y′ = (DᵀD)^{1/2} Vᵀ X + Vᵀ N′
14 / 37
Proof of basic fact I (Skinny case)
Note the Gaussian noise is orthogonally invariant.
Uᵀ Y = D Vᵀ X + Uᵀ N
Vᵀ Y′ = (DᵀD)^{1/2} Vᵀ X + Vᵀ N′
15 / 37
Proof of basic fact I (Skinny case)
Note the Gaussian noise is orthogonally invariant.
Uᵀ Y = D Vᵀ X + N
Vᵀ Y′ = (DᵀD)^{1/2} Vᵀ X + N′
16 / 37
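As a quick numerical sanity check (added here, not part of the deck), the equivalence can be verified for a Gaussian input, where I(X; AX + N) = (1/2) log det(I + A Σ Aᵀ) in closed form. A minimal sketch assuming NumPy is available; the helper names `gauss_mi` and `psd_sqrt` are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def psd_sqrt(S):
    """Symmetric PSD square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

def gauss_mi(A, Sigma):
    """I(X; AX + N) in nats for X ~ N(0, Sigma), N ~ N(0, I)."""
    k = A.shape[0]
    return 0.5 * np.linalg.slogdet(np.eye(k) + A @ Sigma @ A.T)[1]

n, k = 5, 3
A = rng.standard_normal((k, n))        # short fat channel matrix
G = rng.standard_normal((n, n))
Sigma = G @ G.T                        # input covariance

S_half = psd_sqrt(A.T @ A)             # equivalent square channel (A^T A)^{1/2}
print(gauss_mi(A, Sigma), gauss_mi(S_half, Sigma))  # the two values agree
```

The agreement follows from Sylvester's determinant identity: det(I + AΣAᵀ) = det(I + ΣAᵀA) = det(I + S^{1/2}ΣS^{1/2}) with S = AᵀA.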
Basic fact II
The parallel combination of linear Gaussian channels,
[Block diagram: X ∼ P(x) feeds two AWGN channels with matrices A and B, producing Y and Z]
is equivalent to a single channel
[Block diagram: X ∼ P(x) → (AᵀA + BᵀB)^{1/2} → AWGN → Y′]
17 / 37
Proof of basic fact II
Consider the concatenation of vector channels:
[Y; Z] = [A; B] X + [N; N′]
By basic fact I, this channel is equivalent to one with matrix
([A; B]ᵀ [A; B])^{1/2} = (AᵀA + BᵀB)^{1/2}
18 / 37
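The additivity rule can likewise be checked numerically in the Gaussian case (a sanity check added here, assuming NumPy; the helper names are illustrative): the stacked channel [A; B] and the single channel (AᵀA + BᵀB)^{1/2} give the same mutual information.

```python
import numpy as np

rng = np.random.default_rng(1)

def psd_sqrt(S):
    """Symmetric PSD square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

def gauss_mi(A, Sigma):
    """I(X; AX + N) in nats for X ~ N(0, Sigma), N ~ N(0, I)."""
    k = A.shape[0]
    return 0.5 * np.linalg.slogdet(np.eye(k) + A @ Sigma @ A.T)[1]

n = 4
A = rng.standard_normal((3, n))
B = rng.standard_normal((2, n))
G = rng.standard_normal((n, n))
Sigma = G @ G.T

C = np.vstack([A, B])                    # stacked channel producing [Y; Z]
S_half = psd_sqrt(A.T @ A + B.T @ B)     # equivalent single channel
print(gauss_mi(C, Sigma), gauss_mi(S_half, Sigma))  # the two values agree
```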
Matrix SNR functions
For any matrix A, mutual information satisfies
I(X; AX + N) = I(X; (AᵀA)^{1/2} X + N′)
Key idea: parameterize using a positive semidefinite matrix SNR S.
Mutual information function I_X : S₊ → R₊
I_X(S) ≜ I(X; S^{1/2} X + N)
MMSE function M_X : S₊ → S₊
M_X(S) ≜ E[Cov(X | S^{1/2} X + N)]
19 / 37
Matrix SNR functions
Lamarca, 2009 showed that I_X(S) is twice differentiable, with
∇_S I_X(S) = (1/2) M_X(S)
∇²_S I_X(S) = −(1/2) E_Y[Cov(X | Y) ⊗ Cov(X | Y)]
where Y = S^{1/2} X + N.
Payaro, Gregori & Palomar, 2011 use this result to give a matrix generalization of Costa's entropy power inequality.
One can give simple proofs of these results based on the additivity rule for parallel combinations, the chain rule, and a low-SNR expansion.
20 / 37
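The gradient identity ∇_S I_X(S) = (1/2) M_X(S) can be tested by finite differences in the Gaussian case, where I_X(S) = (1/2) log det(I + SΣ) and M_X(S) = (Σ⁻¹ + S)⁻¹ are available in closed form. A sketch added here (not from the slides), assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)

def I_gauss(S, Sigma):
    """I_X(S) = 1/2 log det(I + S Sigma) for Gaussian X ~ N(0, Sigma)."""
    n = S.shape[0]
    return 0.5 * np.linalg.slogdet(np.eye(n) + S @ Sigma)[1]

def M_gauss(S, Sigma):
    """MMSE matrix (Sigma^{-1} + S)^{-1} for Gaussian X."""
    return np.linalg.inv(np.linalg.inv(Sigma) + S)

n = 4
G = rng.standard_normal((n, n))
Sigma = G @ G.T
H = rng.standard_normal((n, n))
S = H @ H.T                  # PSD matrix SNR
D = rng.standard_normal((n, n))
Delta = D + D.T              # symmetric perturbation direction

# Directional derivative of I_X at S along Delta, two ways:
eps = 1e-6
fd = (I_gauss(S + eps * Delta, Sigma) - I_gauss(S - eps * Delta, Sigma)) / (2 * eps)
grad = 0.5 * np.trace(M_gauss(S, Sigma) @ Delta)   # <grad I, Delta> = tr(M Delta)/2
print(fd, grad)  # agree to finite-difference accuracy
```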
Gaussian distribution plays an extremal role
Mutual information:
I_X(S) ≤ (1/2) log det(I + S Cov(X))
Gradient of mutual information:
∇I_X(S) ⪯ (1/2) (Cov(X)⁻¹ + S)⁻¹
Hessian of mutual information:
∇²I_X(S) ⪯ −∇I_X(S) ⊗ ∇I_X(S)
The Gaussian distribution minimizes the relative curvature of the mutual information!
21 / 37
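The Gaussian upper bound on mutual information can be illustrated with a Monte Carlo estimate in the scalar binary case (an example added here, not from the deck; assumes NumPy). For X = ±1 with unit variance, I(X; √s X + N) stays below the Gaussian value (1/2) log(1 + s):

```python
import numpy as np

rng = np.random.default_rng(3)
s = 2.0                                  # scalar SNR
m = 200_000
x = rng.choice([-1.0, 1.0], size=m)      # binary input, unit variance
y = np.sqrt(s) * x + rng.standard_normal(m)

# I(X;Y) = E[log p(y|x) - log p(y)] by Monte Carlo, in nats.
# The common Gaussian normalization constant cancels in the difference.
log_cond = -0.5 * (y - np.sqrt(s) * x) ** 2
log_marg = np.logaddexp(-0.5 * (y - np.sqrt(s)) ** 2,
                        -0.5 * (y + np.sqrt(s)) ** 2) - np.log(2)
mi_binary = np.mean(log_cond - log_marg)

gauss_bound = 0.5 * np.log1p(s)          # (1/2) log(1 + s Var(X))
print(mi_binary, gauss_bound)            # mi_binary < gauss_bound
```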
Monotonicity of Effective Fisher Information
Effective Fisher information matrix K_X : Sⁿ₊ → Sⁿ₊
K_X(S) ≜ M_X(S)⁻¹ − S.
This is the inverse covariance of the Gaussian input with matched MMSE:
Z ∼ N(0, K_X(S)⁻¹) ⟹ M_X(S) = M_Z(S)
Lemma. The effective Fisher information K_X(S) is monotone increasing, with
K_X(0) = Cov(X)⁻¹ and lim_{λ_min(S)→∞} K_X(S) = J(X).
Furthermore, K_X(S) is constant if and only if X is Gaussian.
24 / 37
Bounds on MMSE
Theorem. For all 0 ⪯ R ⪯ S ⪯ T, the MMSE matrix satisfies
(K_X(T) + S)⁻¹ ⪯ M_X(S) ⪯ (K_X(R) + S)⁻¹.
Equality holds if and only if X is Gaussian.
- Generalization of the single-crossing property [Bustin et al. 2013]
- Implies the Bayesian Cramér–Rao matrix lower bound:
λ_min(T) → ∞ ⟹ (J(X) + S)⁻¹ ⪯ M_X(S)
- Implies the linear MMSE matrix upper bound:
R = 0 ⟹ M_X(S) ⪯ (Cov(X)⁻¹ + S)⁻¹.
25 / 37
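The linear MMSE upper bound is easy to see numerically in the scalar binary case, where the conditional mean E[X | Y] = tanh(√s Y) is known in closed form (a sanity check added here, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
s = 2.0
m = 200_000
x = rng.choice([-1.0, 1.0], size=m)      # binary input, unit variance
y = np.sqrt(s) * x + rng.standard_normal(m)

est = np.tanh(np.sqrt(s) * y)            # conditional mean E[X|Y] for binary X
mmse = np.mean((x - est) ** 2)           # Monte Carlo MMSE
lmmse = 1.0 / (1.0 + s)                  # (Cov(X)^{-1} + s)^{-1} with Var(X)=1
print(mmse, lmmse)                       # mmse strictly below the LMMSE bound
```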
Bounds on Mutual Information
Mutual information is the trace of a matrix integral:
I_X(S) = (1/2) tr(S ∫₀¹ M_X(tS) dt) = (1/2) tr(S ∫₀¹ (K_X(tS) + tS)⁻¹ dt)
Thus monotonicity of K_X(·) yields upper and lower bounds.
One can also obtain a simple proof of the multivariate Gaussian log-Sobolev inequality.
26 / 37
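The trace-integral representation can be verified by quadrature in the Gaussian case, where both sides are computable in closed form (an added sanity check, assuming NumPy; not part of the deck):

```python
import numpy as np

rng = np.random.default_rng(5)

n = 4
G = rng.standard_normal((n, n))
Sigma = G @ G.T
H = rng.standard_normal((n, n))
S = H @ H.T

def M_gauss(S_, Sigma_):
    """MMSE matrix (Sigma^{-1} + S)^{-1} for Gaussian X."""
    return np.linalg.inv(np.linalg.inv(Sigma_) + S_)

# Left side: I_X(S) = 1/2 log det(I + S Sigma) for Gaussian X
I_direct = 0.5 * np.linalg.slogdet(np.eye(n) + S @ Sigma)[1]

# Right side: 1/2 tr(S * int_0^1 M_X(tS) dt) by Gauss-Legendre quadrature
nodes, weights = np.polynomial.legendre.leggauss(40)
t = 0.5 * (nodes + 1.0)                  # map [-1, 1] -> [0, 1]
w = 0.5 * weights
I_integral = 0.5 * sum(wi * np.trace(S @ M_gauss(ti * S, Sigma))
                       for ti, wi in zip(t, w))
print(I_direct, I_integral)              # the two values agree
```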
Low-dimensional linear projections
X ↦ AX,  A ∈ R^{k×n},  k ≪ n
- Summary of a high-dimensional vector: projection pursuit in statistics, sketching, compressed sensing
- Central limit theorem (CLT)
- High-dimensional geometry: Dvoretzky's theorem
- Conditional CLT for random projections [Reeves 2017]
28 / 37
[Diagram: X, an n-dimensional random vector, vs. X*, i.i.d. Gaussian with matched power; AX, a k-dimensional random projection with A uniform on the Stiefel manifold, vs. AX*, i.i.d. Gaussian with matched power]
29 / 37
[Diagram: the same comparison with added noise — X + √t N vs. X* + √t N, and AX + √t N vs. AX* + √t N, with gaps labeled δ_EPI and δ_CCLT]

D(P_{X+√t N} ‖ P_{X*+√t N}) = I_{X*}(sI) − I_X(sI), where s = 1/t
  = [I_{X*}(sI) − (n/k) E[I_X(s AᵀA)]] + [(n/k) E[I_X(s AᵀA)] − I_X(sI)],
where the first bracketed term is δ_CCLT and the second is δ_EPI.

The terms δ_CCLT and δ_EPI are both non-negative and equal to zero if and only if X is i.i.d. Gaussian.
30 / 37
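For i.i.d. Gaussian input both gap terms vanish, and the middle quantity (n/k) E[I_X(s AᵀA)] matches I_{X*}(sI) exactly for every Stiefel matrix A, since AᵀA is a rank-k projection. A small numerical illustration added here (assumes NumPy; the helper `I_gauss` is illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, s = 8, 3, 1.5

def I_gauss(S):
    """I_X(S) = 1/2 log det(I + S) for i.i.d. standard Gaussian X."""
    return 0.5 * np.linalg.slogdet(np.eye(S.shape[0]) + S)[1]

# A uniform on the Stiefel manifold (orthonormal rows) via QR
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
A = Q.T

full = I_gauss(s * np.eye(n))            # I_{X*}(sI) = (n/2) log(1+s)
proj = (n / k) * I_gauss(s * A.T @ A)    # (n/k) I_X(s A^T A); A^T A has rank k
print(full, proj)                        # equal: both gap terms are zero
```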
Additivity for parallel combination
The parallel combination of linear Gaussian channels,
[Block diagram: X ∼ P(x) feeds two AWGN channels with matrices A and B, producing Y and Z]
is equivalent to a single channel
[Block diagram: X ∼ P(x) → (AᵀA + BᵀB)^{1/2} → AWGN → Y′]
32 / 37
Given a probability measure µ on [0, ∞), define
I_n(µ) ≜ (1/n) E[I_X(UᵀΛU)],
where U is uniform on the orthogonal group and Λ_ii ∼ µ.

Theorem. If X has bounded second moments and A_n and B_n are independent right-orthogonally invariant random matrices whose empirical spectral distributions converge to compactly supported probability measures µ and ν, then
|(1/n) I_{X_n}(AₙᵀAₙ + BₙᵀBₙ) − I_n(µ ⊞ ν)| → 0,
where µ ⊞ ν denotes the free additive convolution.
33 / 37
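A small illustration of free additive convolution, added here as background (not from the deck; assumes NumPy): for independently Haar-rotated spectra, the mean and variance of the sum's eigenvalue distribution are additive, as free convolution predicts.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400

def haar(n):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))       # sign fix makes the distribution Haar

lam1 = rng.uniform(0.0, 2.0, n)          # spectrum drawn from mu
lam2 = rng.exponential(1.0, n)           # spectrum drawn from nu
U, V = haar(n), haar(n)
C = U.T @ np.diag(lam1) @ U + V.T @ np.diag(lam2) @ V
ev = np.linalg.eigvalsh(C)

# Mean and variance are additive under free additive convolution
print(ev.mean(), lam1.mean() + lam2.mean())   # exactly equal (trace identity)
print(ev.var(), lam1.var() + lam2.var())      # equal up to O(1/n) fluctuations
```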
Conclusion
- Functional properties of mutual information and MMSE as a function of matrix SNR
- Gaussian distribution minimizes the relative curvature
- Some initial applications:
  - Bounds on mutual information and MMSE
  - Gaussian approximation via low-dimensional projections
  - Additivity of information via free probability
- Future outlook: analysis of high-dimensional inference problems with random mixing
35 / 37
References I
D. Guo, S. Shamai, and S. Verdu, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005.
M. Payaro and D. Palomar, "Hessian and concavity of mutual information, differential entropy, and entropy power in linear vector Gaussian channels," IEEE Trans. Inform. Theory, vol. 55, no. 8, pp. 3613–3628, 2009.
O. Rioul, "Information theoretic proofs of entropy power inequalities," IEEE Trans. Inform. Theory, vol. 57, no. 1, pp. 33–55, Jan. 2011.
A. Dytso, R. Bustin, H. V. Poor, and S. Shamai, "Comment on the equality condition for the I-MMSE proof of entropy power inequality," 2017. [Online]. Available: https://arxiv.org/pdf/1703.07442.pdf
M. Raginsky and I. Sason, Concentration of Measure Inequalities in Information Theory, Communications, and Coding, 2nd ed. Foundations and Trends in Communications and Information Theory, 2014.
G. Reeves, "Additivity of information in multilayer networks via additive Gaussian noise transforms," in Proc. Annual Allerton Conf. on Commun., Control, and Comp., Monticello, IL, 2017. [Online]. Available: https://arxiv.org/abs/1710.04580
36 / 37
References II
M. Lamarca, "Linear precoding for mutual information maximization in MIMO systems," in Proceedings of the International Conference on Wireless Communication Systems, Tuscany, Italy, Sep. 2009.
M. Payaro, M. Gregori, and D. Palomar, "Yet another entropy power inequality with an application," in Proceedings of the International Conference on Wireless Communications and Signal Processing, Nanjing, China, Nov. 2011.
37 / 37