Information Geometry and Its Applications
Shun‐ichi Amari, RIKEN Brain Science Institute
1. Divergence Function and Dually Flat Riemannian Structure
2. Invariant Geometry on Manifold of Probability Distributions
3. Geometry and Statistical Inference: semi‐parametrics
4. Applications to Machine Learning and Signal Processing
Information Geometry
-- Manifolds of Probability Distributions
$M = \{p(x)\}$
Information Geometry
[Diagram: information geometry and its related fields]
Related fields: Systems Theory, Information Theory, Statistics, Neural Networks, Combinatorics, Physics, Information Sciences, Math., AI, Vision, Optimization
Core structure: Riemannian Manifold, Dual Affine Connections, Manifold of Probability Distributions
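Information Geometry?
Gaussian distributions:
$S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$
$S = \{p(x; \theta)\}$, $\theta = (\mu, \sigma)$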
Manifold of Probability Distributions
$x = 1, 2, 3$: $S_n = \{p(x)\}$, $p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$
[Figure: probability simplex with axes $p_1$, $p_2$, $p_3$ and a point $p$]
$M = \{p(x; \theta)\}$
Manifold and Coordinate System
coordinate transformation
Examples of Coordinate systems
Euclidean space
Gaussian distributions
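$S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$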
Discrete Distributions
Positive measures
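Divergence: $D[z : y]$
$D[z : y] \ge 0$
$D[z : y] = 0$ iff $z = y$
$D[z : z + dz] = \frac{1}{2} \sum_{i,j} g_{ij}\, dz_i\, dz_j$ (Taylor expansion), with $(g_{ij})$ positive‐definite
Not necessarily symmetric: $D[z : y] \ne D[y : z]$ in general
[Figure: points $z$ and $y$ on the manifold $M$]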
Various Divergences
Euclidean
f‐divergence
KL‐divergence
(α‐β)‐divergence
Kullback‐Leibler Divergence: quasi‐distance
$D[p(x) : q(x)] = \sum_x p(x) \log \frac{p(x)}{q(x)}$
$D[p(x) : q(x)] \ge 0$, $= 0$ iff $p(x) = q(x)$
$D[p : q] \ne D[q : p]$
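The non‐negativity and asymmetry of the KL‐divergence can be checked numerically. The following is a minimal sketch in Python, assuming discrete distributions stored as lists; the function name `kl_divergence` and the example vectors `p` and `q` are illustrative choices, not taken from the slides.

```python
import math

def kl_divergence(p, q):
    """D[p : q] = sum_x p(x) log(p(x) / q(x)) for discrete distributions."""
    # Terms with p(x) = 0 contribute 0; q(x) is assumed positive where p(x) > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

d_pq = kl_divergence(p, q)  # D[p : q]
d_qp = kl_divergence(q, p)  # D[q : p], generally a different value
d_pp = kl_divergence(p, p)  # zero, since the arguments coincide

print(d_pq, d_qp, d_pp)
```

Both directed values are positive and differ from each other, which is why the slides call the KL‐divergence a quasi‐distance.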
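(α, β)‐divergence $D_{\alpha,\beta}[p : q]$: a two‐parameter family built from the terms $p_i^{\alpha+\beta}$, $q_i^{\alpha+\beta}$ and $p_i^{\alpha} q_i^{\beta}$; it contains the α‐divergence and the β‐divergence as special cases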
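Manifold with Convex Function
$S$: coordinates $\theta = (\theta^1, \theta^2, \ldots, \theta^n)$
$\psi(\theta)$: convex function
Examples: negative entropy $\varphi(p) = \int p(x) \log p(x)\, dx$; energy $\psi(\theta) = \frac{1}{2} \sum_i (\theta^i)^2$
mathematical programming, control systems, physics, engineering, vision, economics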
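Riemannian metric and flatness (affine structure)
Bregman divergence: $D[\theta : \theta'] = \psi(\theta) - \psi(\theta') - \operatorname{grad} \psi(\theta') \cdot (\theta - \theta')$
$D[\theta : \theta + d\theta] = \frac{1}{2} \sum_{i,j} g_{ij}\, d\theta^i d\theta^j$
$g_{ij} = \partial_i \partial_j \psi(\theta)$, $\partial_i = \partial / \partial\theta^i$
Flatness (affine): $\theta$ is an affine coordinate system; the straight line in $\theta$ is a geodesic (not Levi‐Civita)
$\{S, \psi(\theta), \theta\}$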
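As an illustration of how a convex function induces a Bregman divergence, here is a small Python sketch. It assumes finite‐dimensional coordinates stored as lists; the helper names (`bregman`, `energy`, `neg_entropy`) and the test points `a`, `b` are hypothetical. The negative‐entropy example uses the discrete form $\sum_i (p_i \log p_i - p_i)$ on positive vectors, which is an assumption made here so that the result is the generalized KL‐divergence.

```python
import math

def bregman(psi, grad_psi, theta, theta_prime):
    """D[theta : theta'] = psi(theta) - psi(theta') - grad psi(theta') . (theta - theta')."""
    g = grad_psi(theta_prime)
    inner = sum(gi * (a - b) for gi, a, b in zip(g, theta, theta_prime))
    return psi(theta) - psi(theta_prime) - inner

# Energy potential: psi(theta) = 1/2 sum_i theta_i^2, gradient = theta.
def energy(t):
    return 0.5 * sum(x * x for x in t)

def grad_energy(t):
    return list(t)

# Discrete negative entropy on positive vectors (assumed form): sum_i (p_i log p_i - p_i).
def neg_entropy(p):
    return sum(x * math.log(x) - x for x in p)

def grad_neg_entropy(p):
    return [math.log(x) for x in p]

a = [0.2, 0.7, 1.5]
b = [0.4, 0.4, 1.0]

d_energy = bregman(energy, grad_energy, a, b)
half_squared_distance = 0.5 * sum((x - y) ** 2 for x, y in zip(a, b))

d_entropy = bregman(neg_entropy, grad_neg_entropy, a, b)
generalized_kl = sum(x * math.log(x / y) - x + y for x, y in zip(a, b))

print(d_energy, half_squared_distance)
print(d_entropy, generalized_kl)
```

The energy potential reproduces half the squared Euclidean distance, and the entropy potential reproduces the KL‐divergence extended to positive measures.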
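Legendre Transformation
$\eta_i = \partial_i \psi(\theta)$, $\theta \leftrightarrow \eta$: one-to-one
$\psi(\theta) + \varphi(\eta) - \theta^i \eta_i = 0$
$\theta^i = \partial^i \varphi(\eta)$, $\partial^i = \partial / \partial\eta_i$
$\varphi(\eta) = \max_\theta \{\theta^i \eta_i - \psi(\theta)\}$
$D[\theta : \theta'] = \psi(\theta) + \varphi(\eta') - \theta^i \eta'_i$
Proof
$D[\theta : \theta'] = \psi(\theta) - \psi(\theta') - \operatorname{grad} \psi(\theta') \cdot (\theta - \theta')$, with $\eta' = \operatorname{grad} \psi(\theta')$
$\psi(\theta') + \varphi(\eta') - \theta'^i \eta'_i = 0$
Hence $D[\theta : \theta'] = \psi(\theta) + \varphi(\eta') - \theta^i \eta'_i$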
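The Legendre relations can be checked on a one‐dimensional example. The sketch below assumes the potential $\psi(\theta) = \log(1 + e^\theta)$ (the Bernoulli family), for which $\eta$ is the logistic function of $\theta$ and $\varphi(\eta)$ is the negative entropy; this choice of potential and all function names are illustrative assumptions, not taken from the slides.

```python
import math

def psi(theta):
    """Convex potential psi(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))

def eta_of(theta):
    """eta = d psi / d theta (logistic function)."""
    return 1.0 / (1.0 + math.exp(-theta))

def phi(eta):
    """Dual potential: negative entropy of a Bernoulli distribution."""
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

def theta_of(eta):
    """theta = d phi / d eta (inverse map)."""
    return math.log(eta / (1.0 - eta))

theta, theta_prime = 0.8, -0.5
eta, eta_prime = eta_of(theta), eta_of(theta_prime)

# psi(theta) + phi(eta) - theta * eta should vanish.
legendre_gap = psi(theta) + phi(eta) - theta * eta

# Bregman form and canonical form of D[theta : theta'] should agree.
bregman_form = psi(theta) - psi(theta_prime) - eta_prime * (theta - theta_prime)
canonical_form = psi(theta) + phi(eta_prime) - theta * eta_prime

print(legendre_gap, bregman_form, canonical_form)
```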
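Two affine coordinate systems $\theta$, $\eta$
$\theta$: geodesic (e-geodesic)
$\eta$: dual geodesic (m-geodesic)
“dually orthogonal”: $\langle \partial_i, \partial^j \rangle = \delta_i^j$, $\partial_i = \partial / \partial\theta^i$, $\partial^j = \partial / \partial\eta_j$
$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$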
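Bi‐orthogonality
Dually flat manifold
$\theta$-coordinates, $\eta$-coordinates; potential functions $\psi(\theta)$, $\varphi(\eta)$
$g_{ij} = \frac{\partial^2 \psi}{\partial\theta^i \partial\theta^j}$, $g^{ij} = \frac{\partial^2 \varphi}{\partial\eta_i \partial\eta_j}$
$\psi(\theta) + \varphi(\eta) - \theta^i \eta_i = 0$
exponential family: $p(x, \theta) = \exp\{\theta^i x_i - \psi(\theta)\}$
$\psi(\theta)$: cumulant generating function; $\varphi(\eta)$: negative entropy
canonical divergence: $D(P : P') = \psi(\theta) + \varphi(\eta') - \theta^i \eta'_i$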
Exponential Family
$p(x, \theta) = \exp\{\theta \cdot x - \psi(\theta)\}$
Gaussian:
Negative entropy
$\theta$: natural parameter; $\eta$: expectation parameter
$\psi(\theta)$: convex function, free-energy
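$x$: discrete, $X = \{0, 1, \ldots, n\}$
$S_n = \{p(x) \mid x \in X\}$: exponential family
$p(x) = \sum_{i=0}^{n} p_i \delta_i(x) = \exp\left[\sum_{i=1}^{n} \theta^i x_i - \psi(\theta)\right]$
$\theta^i = \log(p_i / p_0)$; $x_i = \delta_i(x)$; $\psi(\theta) = -\log p_0$
$\eta_i = E[x_i] = p_i$; $\varphi(\eta) = \sum_{i=0}^{n} p_i \log p_i$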
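The two coordinate systems of the discrete family can be computed directly from a probability vector. A minimal sketch, assuming $p = (p_0, \ldots, p_n)$ with all entries positive; the function names and the example vector are hypothetical.

```python
import math

def to_theta(p):
    """Natural coordinates theta^i = log(p_i / p_0), i = 1..n."""
    return [math.log(pi / p[0]) for pi in p[1:]]

def to_eta(p):
    """Expectation coordinates eta_i = E[x_i] = p_i, i = 1..n."""
    return list(p[1:])

def psi(theta):
    """Cumulant generating function psi(theta) = log(1 + sum_i exp(theta^i)) = -log p_0."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def phi(p):
    """Negative entropy sum_i p_i log p_i, summed over i = 0..n."""
    return sum(pi * math.log(pi) for pi in p)

def from_theta(theta):
    """Recover the probability vector from the natural coordinates."""
    z = 1.0 + sum(math.exp(t) for t in theta)
    return [1.0 / z] + [math.exp(t) / z for t in theta]

p = [0.1, 0.2, 0.3, 0.4]  # hypothetical distribution on X = {0, 1, 2, 3}
theta = to_theta(p)
eta = to_eta(p)
p_back = from_theta(theta)

# Legendre relation psi(theta) + phi(eta) - theta . eta = 0.
gap = psi(theta) + phi(p) - sum(t * e for t, e in zip(theta, eta))

print(theta, eta, p_back, gap)
```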
Two geodesics
Tangent directions
Function space of probability distributions $\{p(x)\}$: topology
Exponential Family
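Pythagorean Theorem (dually flat manifold)
$D[P : Q] + D[Q : R] = D[P : R]$, when the two geodesics $PQ$ and $QR$ (one primal, one dual) are orthogonal at $Q$
Euclidean space: self-dual, $\psi(\theta) = \frac{1}{2} \sum_i (\theta^i)^2$
Proof
With the canonical divergence $D[P : Q] = \psi(\theta_P) + \varphi(\eta_Q) - \theta_P \cdot \eta_Q$,
$D[P : Q] + D[Q : R] - D[P : R] = (\theta_P - \theta_Q) \cdot (\eta_R - \eta_Q) = 0$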
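Projection Theorem
$q = \arg\min_{s \in M} D[p : s]$: m-geodesic projection
$q' = \arg\min_{s \in M} D[s : p]$: e-geodesic projection
[Figure: a point $p$ in $S$ projected onto the submanifold $M$ along the m-geodesic and along the e-geodesic]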
Projection Theorem
$\min_{Q \in M} D[P : Q]$: $Q$ = m-geodesic projection of $P$ to $M$; unique when $M$ is e-flat
$\min_{Q \in M} D[Q : P]$: $Q'$ = e-geodesic projection of $P$ to $M$; unique when $M$ is m-flat
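A small numerical illustration of the m‐projection, under assumptions chosen here for simplicity: $D$ is the KL‐divergence, $S$ is the set of joint distributions of two binary variables, and $M$ is the e‐flat family of independent distributions. The brute‐force grid search is only a check, not a method from the slides; all names and numbers are hypothetical.

```python
import math

def kl(p, q):
    """KL-divergence D[p : q] for discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def independent(a, b):
    """Independent joint distribution, ordered (p00, p01, p10, p11),
    with P(x1 = 1) = a and P(x2 = 1) = b."""
    return [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]

p = [0.4, 0.1, 0.2, 0.3]  # hypothetical joint distribution (p00, p01, p10, p11)
r1 = p[2] + p[3]           # marginal P(x1 = 1)
r2 = p[1] + p[3]           # marginal P(x2 = 1)

# Minimize D[p : q] over independent q on a grid of marginals.
grid = [k / 100 for k in range(1, 100)]
best_d, best_a, best_b = min(
    (kl(p, independent(a, b)), a, b) for a in grid for b in grid
)

print(best_a, best_b, best_d)
print(r1, r2, kl(p, independent(r1, r2)))
```

The grid minimizer coincides with the product of the marginals of `p`, which is the m‐projection of `p` onto the independent family.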
Convex function – Bregman divergence – Dually flat Riemannian divergence
Dually flat R‐manifold – convex function – canonical divergence (KL‐divergence)
Exponential family – Bregman divergence (Banerjee et al.)
Invariance: $S = \{p(x, \theta)\}$
Invariant under different representation: $y = y(x)$, $\{\bar{p}(y, \theta)\}$
$\int |p(x, \theta_1) - p(x, \theta_2)|^2\, dx \ne \int |\bar{p}(y, \theta_1) - \bar{p}(y, \theta_2)|^2\, dy$
Invariant divergence (manifold of probability distributions; $S = \{p(x, \theta)\}$)
$y = k(x)$: sufficient statistics
$D[p_X(x) : q_X(x)] = D[p_Y(y) : q_Y(y)]$
Chentsov; Amari‐Nagaoka
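Invariance: characterization of f‐divergence (Csiszar)
Coarse graining: partition $\{1, \ldots, n\}$ into subsets $A_1, A_2, \ldots, A_m$
$p = (p_1, \ldots, p_n) \to \bar{p} = (p_{A_1}, \ldots, p_{A_m})$, $p_A = \sum_{i \in A} p_i$
$D[p : q] \ge D[\bar{p} : \bar{q}]$
$D[p : q] = D[\bar{p} : \bar{q}]$ when $p_i = c_A q_i$, $i \in A$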
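Invariance ⇒ f‐divergence
Csiszar f‐divergence (Ali‐Silvey, Morimoto)
$D_f[p : q] = \sum_i p_i f\left(\frac{q_i}{p_i}\right)$
$f(u)$: convex, $f(1) = 0$
$D_{cf}[p : q] = c D_f[p : q]$
$\bar{f}(u) = f(u) - c(u - 1)$ gives the same divergence
Standard form: $f(1) = f'(1) = 0$; $f''(1) = 1$
[Figure: graph of $f(u)$ against $u$]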
Theorem
An invariant separable divergence belongs to the class of f‐divergences.
Separable divergence: $D[p : q] = \sum_i k(p_i, q_i)$
$k(p_i, q_i) = p_i f\left(\frac{q_i}{p_i}\right)$
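The properties of the f‐divergence listed on these slides can be verified numerically. The sketch below uses the definition $D_f[p : q] = \sum_i p_i f(q_i / p_i)$ with the assumed choice $f(u) = -\log u$, which yields the KL‐divergence $D[p : q]$ under this convention; the function names, the constant `c = 3.0`, and the coarse‐graining partition are illustrative.

```python
import math

def f_divergence(f, p, q):
    """D_f[p : q] = sum_i p_i f(q_i / p_i)."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))

def f_kl(u):
    """Convex f with f(1) = 0; under the convention above it gives KL[p : q]."""
    return -math.log(u)

def f_shifted(u, c=3.0):
    """f(u) - c (u - 1): the same divergence on probability distributions."""
    return f_kl(u) - c * (u - 1.0)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

d = f_divergence(f_kl, p, q)
kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
d_shifted = f_divergence(f_shifted, p, q)
d_scaled = f_divergence(lambda u: 2.0 * f_kl(u), p, q)

# Coarse graining: merge outcomes 2 and 3 into a single cell A.
p_coarse = [p[0], p[1] + p[2]]
q_coarse = [q[0], q[1] + q[2]]
d_coarse = f_divergence(f_kl, p_coarse, q_coarse)

print(d, kl_direct, d_shifted, d_scaled, d_coarse)
```

The shifted function leaves the value unchanged because $\sum_i (q_i - p_i) = 0$ for probability distributions, and coarse graining does not increase the divergence.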
[Diagram: two routes that meet at the KL‐divergence]
dually flat space: convex functions, Bregman divergence (flat divergence)
invariance: invariant divergence, f‐divergence, Fisher information metric, alpha connection
$S = \{p\}$: space of probability distributions $(n > 1)$
KL‐divergence: $D[p : q] = \int p(x) \log \frac{p(x)}{q(x)}\, dx$
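α‐Divergence: why? Flat and invariant in the space of positive measures $\tilde{S}$
$f_\alpha(u) = \frac{4}{1 - \alpha^2}\{1 - u^{(1+\alpha)/2}\} - \frac{2}{1 - \alpha}(1 - u)$, $\alpha \ne \pm 1$
KL-divergence: $f(u) = u \log u - (u - 1)$
$D[p : q] = \sum_i \{p_i \log \frac{p_i}{q_i} - p_i + q_i\}$ (this equals $\sum_i q_i f(p_i / q_i)$)
$S_n = \{p : p_i > 0\}$, where $\sum_i p_i = 1$ holds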
Space of positive measures : vectors, matrices, arrays
f‐divergence
α‐divergence
Bregman divergence
$D_f[p : q] = \sum_i p_i f\left(\frac{q_i}{p_i}\right) \ge 0$
$D_f[p : q] = 0 \iff p = q$
not invariant under $\bar{f}(u) = f(u) - c(u - 1)$
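f‐divergence of $\tilde{S}$
α‐divergence:
$D_\alpha[p : q] = \sum_i \left\{\frac{1 - \alpha}{2} p_i + \frac{1 + \alpha}{2} q_i - p_i^{(1-\alpha)/2} q_i^{(1+\alpha)/2}\right\}$
KL‐divergence:
$D[p : q] = \sum_i \left\{p_i \log \frac{p_i}{q_i} - p_i + q_i\right\}$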
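A short sketch of the α‐divergence on positive measures, written in the unnormalized form $\sum_i \{\frac{1-\alpha}{2} p_i + \frac{1+\alpha}{2} q_i - p_i^{(1-\alpha)/2} q_i^{(1+\alpha)/2}\}$. The vectors `p` and `q` are hypothetical positive measures (they need not sum to 1), and the function name is illustrative.

```python
import math

def alpha_divergence(p, q, alpha):
    """Unnormalized alpha-divergence between positive vectors p and q."""
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    return sum(a * pi + b * qi - (pi ** a) * (qi ** b) for pi, qi in zip(p, q))

p = [2.0, 0.5, 1.5]  # positive measures, not normalized
q = [1.0, 1.0, 0.5]

# alpha = 0 reduces to half the squared distance between square roots.
d0 = alpha_divergence(p, q, 0.0)
hellinger_like = 0.5 * sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(p, q))

# Duality: swapping the arguments corresponds to alpha -> -alpha.
d_plus = alpha_divergence(p, q, 0.5)
d_minus_swapped = alpha_divergence(q, p, -0.5)

print(d0, hellinger_like, d_plus, d_minus_swapped)
```

For $|\alpha| < 1$ each summand is non‐negative by the weighted arithmetic‐geometric mean inequality, so the divergence is non‐negative and vanishes when `p == q`.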
$\tilde{S}$: dually flat; $S$: not dually flat (except $\alpha = \pm 1$)
21
1
1
i
i
p
r
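Metric and Connections Induced by Divergence (Eguchi)
Riemannian metric: $g_{ij}(z) = -\partial_i \partial'_j D[z : y]\,|_{y=z}$, so that $D[z : y] \approx \frac{1}{2} g_{ij}(z)(z_i - y_i)(z_j - y_j)$
affine connections $\{\nabla, \nabla^*\}$:
$\Gamma_{ijk}(z) = -\partial_i \partial_j \partial'_k D[z : y]\,|_{y=z}$
$\Gamma^*_{ijk}(z) = -\partial'_i \partial'_j \partial_k D[z : y]\,|_{y=z}$
$\partial_i = \partial / \partial z_i$, $\partial'_i = \partial / \partial y_i$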
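The metric induced by a divergence can be approximated with finite differences. The sketch below assumes a one‐parameter example: the KL‐divergence between Bernoulli distributions with means $z$ and $y$, for which the induced metric should equal the Fisher information $1 / (z(1 - z))$. The step size `h` and all names are illustrative choices.

```python
import math

def bernoulli_kl(z, y):
    """KL-divergence between Bernoulli(z) and Bernoulli(y)."""
    return z * math.log(z / y) + (1.0 - z) * math.log((1.0 - z) / (1.0 - y))

def metric_from_divergence(D, z, h=1e-4):
    """g(z) = - d/dz d/dy D[z : y] at y = z, via a central mixed difference."""
    mixed = (D(z + h, z + h) - D(z + h, z - h)
             - D(z - h, z + h) + D(z - h, z - h)) / (4.0 * h * h)
    return -mixed

z = 0.3
g_numeric = metric_from_divergence(bernoulli_kl, z)
g_fisher = 1.0 / (z * (1.0 - z))  # Fisher information in the mean parameter

print(g_numeric, g_fisher)
```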
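Invariant geometrical structure: alpha‐geometry (derived from invariant divergence)
$S = \{p(x, \theta)\}$
Fisher information: $g_{ij}(\theta) = E[\partial_i l\, \partial_j l]$
$T_{ijk}(\theta) = E[\partial_i l\, \partial_j l\, \partial_k l]$
$l = \log p(x, \theta)$; $\partial_i = \partial / \partial\theta^i$
α‐connection: $\Gamma^{(\alpha)}_{ijk} = \{i, j; k\} - \frac{\alpha}{2} T_{ijk}$
Levi‐Civita: $\alpha = 0$, $\Gamma^{(0)}_{ijk} = \{i, j; k\}$
$\nabla^{(\alpha)}$, $\nabla^{(-\alpha)}$: dually coupled
$X \langle Y, Z \rangle = \langle \nabla^{(\alpha)}_X Y, Z \rangle + \langle Y, \nabla^{(-\alpha)}_X Z \rangle$
Duality: $\partial_k g_{ij} = \Gamma_{kij} + \Gamma^*_{kji}$; $T_{ijk} = \Gamma^*_{ijk} - \Gamma_{ijk}$
$\{M, g, T\}$: $X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$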
Riemannian Structure
$ds^2 = \sum_{i,j} g_{ij}(\theta)\, d\theta^i d\theta^j = d\theta^T G(\theta)\, d\theta$
$G(\theta) = (g_{ij}(\theta))$
Euclidean: $G = E$ (identity matrix)
Fisher information
Affine Connection
covariant derivative: $\nabla_X Y$, $\nabla_c X$
geodesic: $\nabla_{\dot{X}} \dot{X} = 0$, $X = X(t)$; straight line (non-metric)
minimal distance: $s = \int \sqrt{g_{ij}\, d\theta^i d\theta^j}$
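Duality
$\langle X, Y \rangle = g_{ij} X^i Y^j$
$\langle X, Y \rangle = \langle \Pi X, \Pi^* Y \rangle$ under the parallel transports $\Pi$, $\Pi^*$
Riemannian geometry: $\Pi = \Pi^*$
$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$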
Dual Affine Connections $(\nabla, \nabla^*)$
e‐geodesic: $\log r(x, t) = t \log p(x) + (1 - t) \log q(x) + c(t)$
m‐geodesic: $r(x, t) = t\, p(x) + (1 - t)\, q(x)$
[Figure: the two geodesics joining $q(x)$ and $p(x)$]
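The two geodesics can be traced numerically for discrete distributions. A minimal sketch, assuming strictly positive probability vectors; the normalization constant $c(t)$ of the e‐geodesic is obtained by summing the unnormalized weights. Function names and example vectors are illustrative.

```python
import math

def m_geodesic(p, q, t):
    """Mixture path: r(x, t) = t p(x) + (1 - t) q(x)."""
    return [t * pi + (1.0 - t) * qi for pi, qi in zip(p, q)]

def e_geodesic(p, q, t):
    """Exponential path: log r(x, t) = t log p(x) + (1 - t) log q(x) + c(t)."""
    weights = [math.exp(t * math.log(pi) + (1.0 - t) * math.log(qi))
               for pi, qi in zip(p, q)]
    z = sum(weights)  # exp(-c(t)), the normalizer
    return [w / z for w in weights]

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

mid_m = m_geodesic(p, q, 0.5)
mid_e = e_geodesic(p, q, 0.5)

print(mid_m)
print(mid_e)
```

Both paths start at `q` for `t = 0` and end at `p` for `t = 1`, but their midpoints differ, which reflects the two different notions of straight line.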
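Mathematical structure of $S = \{p(x, \theta)\}$
$g_{ij}(\theta) = E[\partial_i l\, \partial_j l]$
$T_{ijk}(\theta) = E[\partial_i l\, \partial_j l\, \partial_k l]$
$l = \log p(x, \theta)$; $\partial_i = \partial / \partial\theta^i$
α-connection: $\Gamma^{(\alpha)}_{ijk} = \{i, j; k\} - \frac{\alpha}{2} T_{ijk}$
$\nabla^{(\alpha)}$, $\nabla^{(-\alpha)}$: dually coupled
$X \langle Y, Z \rangle = \langle \nabla^{(\alpha)}_X Y, Z \rangle + \langle Y, \nabla^{(-\alpha)}_X Z \rangle$
$\{M, g, T\}$: α‐geometry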
Dual Foliations
k‐cut
Correlations of Neural Firing
[Figure: binary spike trains of neurons $x_1$, $x_2$, $x_3$]
firing rates: correlation or covariance?
Two neurons: $p(x_1, x_2)$ with probabilities $\{p_{00}, p_{10}, p_{01}, p_{11}\}$; coordinates $r_1$, $r_2$; $r_{12}$
$r_1 = p_{1\cdot} = p_{10} + p_{11}$
$r_2 = p_{\cdot 1} = p_{01} + p_{11}$
$\theta = \log \frac{p_{11} p_{00}}{p_{10} p_{01}}$
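The firing rates and the log‐odds correlation parameter can be computed from the four joint probabilities. A minimal sketch with hypothetical numbers; the function name `mixed_coordinates` is illustrative.

```python
import math

def mixed_coordinates(p00, p01, p10, p11):
    """Return (r1, r2, theta): firing rates and the log-odds correlation."""
    r1 = p10 + p11  # P(x1 = 1)
    r2 = p01 + p11  # P(x2 = 1)
    theta = math.log((p11 * p00) / (p10 * p01))
    return r1, r2, theta

# Hypothetical correlated pair of neurons.
r1, r2, theta = mixed_coordinates(0.4, 0.1, 0.2, 0.3)

# Independent pair with the same firing rates: theta should vanish.
q00, q01, q10, q11 = (1 - r1) * (1 - r2), (1 - r1) * r2, r1 * (1 - r2), r1 * r2
_, _, theta_independent = mixed_coordinates(q00, q01, q10, q11)

print(r1, r2, theta, theta_independent)
```

The parameter `theta` is zero exactly when the two neurons fire independently, whatever their firing rates.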
orthogonal coordinates: $\{(r_1, r_2), \theta\}$
$r_1$, $r_2$: firing rates; $\theta$: correlations
$S = \{p(x_1, x_2)\}$, $x_1, x_2 \in \{0, 1\}$
$M = \{q(x_1) q(x_2)\}$: Independent Distributions
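two neuron case
coordinates: $(r_1, r_2, r_{12})$; $(r_1, r_2, \theta_{12})$
$\theta_{12} = \log \frac{p_{00} p_{11}}{p_{01} p_{10}} = \log \frac{r_{12}(1 - r_1 - r_2 + r_{12})}{(r_1 - r_{12})(r_2 - r_{12})}$
$r_{12} = f(r_1, r_2, \theta_{12})$
$r_{12}(t) = f(r_1(t), r_2(t), \theta_{12})$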
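The map $r_{12} = f(r_1, r_2, \theta_{12})$ has no need for a closed form in practice, since the log‐odds expression is monotone increasing in $r_{12}$. The sketch below inverts it by bisection; this numerical approach, the iteration count, and the function names are assumptions made for illustration, not the method of the slides.

```python
import math

def log_odds(r1, r2, r12):
    """theta_12 as a function of the firing rates and the joint rate r12."""
    p11 = r12
    p10 = r1 - r12
    p01 = r2 - r12
    p00 = 1.0 - r1 - r2 + r12
    return math.log((p11 * p00) / (p10 * p01))

def r12_from_theta(r1, r2, theta, iters=100):
    """Solve log_odds(r1, r2, r12) = theta for r12 by bisection."""
    lo = max(0.0, r1 + r2 - 1.0)  # all four probabilities must stay positive
    hi = min(r1, r2)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_odds(r1, r2, mid) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# theta = 0 corresponds to independence, so r12 should equal r1 * r2.
r12_independent = r12_from_theta(0.5, 0.4, 0.0)

# Round trip for a correlated pair with joint rate 0.3.
theta_correlated = log_odds(0.5, 0.4, 0.3)
r12_recovered = r12_from_theta(0.5, 0.4, theta_correlated)

print(r12_independent, theta_correlated, r12_recovered)
```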
Decomposition of KL-divergence
D[p:r] = D[p:q]+D[q:r]
p,q: same marginals
r,q: same correlations
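[Figure: $p$, its projection $q$ with the same marginals $(r_1, r_2)$, and $r$ on the submanifold of independent distributions; the orthogonal direction carries the correlations]
$D[p : r] = \sum_x p(x) \log \frac{p(x)}{r(x)}$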
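The decomposition $D[p : r] = D[p : q] + D[q : r]$ can be checked numerically for two binary neurons. In this sketch `q` is the independent distribution with the same marginals as `p`, and `r` is another independent distribution; the specific numbers and helper names are hypothetical.

```python
import math

def kl(p, q):
    """KL-divergence D[p : q] = sum_x p(x) log(p(x) / q(x))."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def independent(a, b):
    """Independent joint distribution, ordered (p00, p01, p10, p11)."""
    return [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]

p = [0.4, 0.1, 0.2, 0.3]  # correlated joint distribution
r1 = p[2] + p[3]           # firing rate of neuron 1
r2 = p[1] + p[3]           # firing rate of neuron 2

q = independent(r1, r2)    # same marginals as p, no correlation
r = independent(0.7, 0.2)  # another independent distribution

lhs = kl(p, r)
rhs = kl(p, q) + kl(q, r)

print(lhs, rhs)
```

The term `kl(p, q)` measures the correlation carried by `p`, and `kl(q, r)` measures the difference in firing rates.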
pairwise correlations: $c_{ij} = r_{ij} - r_i r_j$
independent distributions: $r_{ij} = r_i r_j$, $r_{ijk} = r_i r_j r_k$, $\ldots$
How to generate correlated spikes? (Niebur, Neural Computation [2007])
higher-order correlations
covariance: not orthogonal
Orthogonal higher‐order correlations
$\theta = (\theta_i; \theta_{ij}; \ldots; \theta_{1 \ldots n})$
$r = (r_i; r_{ij}; \ldots; r_{1 \ldots n})$
Neurons $x_1, x_2, \ldots, x_n$
$x_i = 1(u_i)$
$u_i$: Gaussian, with correlations $E[u_i u_j]$
Population and Synfire
Synfiring
$p(x) = p(x_1, \ldots, x_n)$
$r = \frac{1}{n} \sum_i x_i$, with distribution $q(r)$
[Figure: $q(r)$ plotted against $r$]
Input‐output Analysis
Gross product consumption
Relations among industries (K. Tsuda and R. Morioka)
Mathematical Problems
$M$ submanifold of $S$? (Hong van Le)
$\{M, g\}$, $\{M, g, T\}$: dually flat (J. Armstrong)
Affine differential geometry; Hessian manifold; Almost complex structure