information geometry and its applicationsimage.diku.dk/mllab/summerschools/slidesamariig1.pdf ·...

Post on 22-Nov-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Information Geometry and Its ApplicationsShun‐ichi Amari    RIKEN Brain Science Institute

1.Divergence Function and Dually Flat Riemannian Structure2.Invariant Geometry on Manifold of Probability Distributions3.Geometry and Statistical Inference

semi‐parametrics4.   Applications to Machine Learning and Signal Processing

Information Geometry

-- Manifolds of Probability Distributions

{ ( )}M p x

Information GeometryInformation Geometry

Systems Theory Information Theory

Statistics Neural Networks

Combinatorics PhysicsInformation Sciences

Riemannian ManifoldDual Affine Connections

Manifold of Probability Distributions

Math. AIVision

Optimization

2

2

1; , ; , exp22

xS p x p x

Information Geometry ?Information Geometry ?

p x

;S p x θ

Gaussian distributions

( , ) θ

Manifold of Probability DistributionsManifold of Probability Distributions

1 2 3 1 2 3

1, 2,3 ={ ( )}, , 1

nx S p xp p p p p p

3p

2p1p

p

;M p x

Manifold and Coordinate System

coordinate transformation

Examples of Coordinate systems

Euclidean space

Gaussian distributions

2

2

1; , ; , exp22

xS p x p x

Discrete Distributions

Positive measures

Divergence: :D z y

: 0

: 0, iff

: ij i j

D

D

D d g dz dz

z y

z y z y

z z z

positive‐definite

Y

Z

M

Not necessarily symmetricD[z : y] = D[y : z]

Taylor expansion

Various Divergences

Euclidean

f‐divergence

KL‐divergence

(α‐β)‐divergence

Kullback‐Leibler Divergencequasi‐distance

( )[ ( ) : ( )] ( ) log( )

[ ( ) : ( )] 0 =0 iff ( ) ( )[ : ] [ : ]

x

p xD p x q x p xq x

D p x q x p x q xD p q D q p

( , ) divergence

, [ : ] { }i i i iD p q p q p q

: divergence1: -divergence

Manifold with Convex Function

S : coordinates 1 2, , , n

: convex function

negative entropy logp p x p x dx energy

212

i

mathematical programming, control systemsphysics, engineering, vision, economics

Riemannian metric and flatness (affine structure)

Bregman divergence ', grad D

1,2

i jijD d g d d

, ij i j i ig

: geodesic (not Levi-Civita)Flatness (affine)

{ , ( ), }S

Legendre Transformation

, i i i i

one-to-one

0ii

,i i i

i

,D

( ) max { ( )}ii

, ' ' grad '

' ' ' ' 0

' '

,

' '

ii

ii

D

D

Proof

Two affine coordinate systems ,

: geodesic (e-geodesic)

: dual geodesic (m-geodesic)

“dually orthogonal”,

,

j ji i

ii i

i

*, , ,X XX Y Z Y Z Y Z

Bi‐orthogonality

Dually flat manifold

2 2

exponen

-coordin

tial fam

ates -coordinatespotential functions ,

0

, exp

: cumulant generating function: negative entropy

canon

i

ical d v

:

i

ly

ijij

i j i j

i i

i i

g g

p x x

ergence D(P: P')= ' 'i i

Exponential Family 

( , ) exp{ ( )}p x x

Gaussian:

Negative entropy

natural parameterexpectation parameter

( ) : convex function, free-energy

x : discrete X = {0, 1, …, n}

0 1

0 0

{ ( ) | }:

( ) ( ) exp[ ( )] exp

log( / ); ( ); ( ) log

[ ] ( )= log

n

n ni

i i ii i

ii i i

i i i i i

S p x x X

p x p x x

p p x x p

E x p p p

exponential family

η

x

Two geodesics

Tangent directions

Function space of probability distributions:  topology{p(x)}

Exponential Family

Pythagorean Theorem (dually flat manifold)

: : :D P Q D Q R D P R

Euclidean space: self-dual

212 i

Proof

: : :

[ : ]

)(

(

( )

) ( ) 0

P Q P Q

P Q Q R

P Q Q R

D P Q D Q R D P R

D P Q

Projection Theorem

p

sq

arg min [ : ]s Mq D p s

arg min [ : ]s Mq D s p

m-geodesic

e-geodesic

M

S

Projection Theorem

min :Q M

D P Q

Q = m-geodesic projection of P to Munique when M is e-flat min :

Q MD Q P

Q’ = e-geodesic projection of P to Munique when M is m-flat

Convex function – Bregman divergence– Dually flat Riemannian divergence

Dually flat R‐manifold – convex function – canonical divergenceKL‐divergence

Exponential family – Bregman divergenceBanerjee et al

InvarianceInvariance ,S p x

Invariant under different representation

, ,y y x p y 2

1 2

21 2

, ,

| ( , ) ( , ) |

p x p x dx

p y p y dy

Invariant divergence (manifold of probability distributions;                 ) 

: sufficient statisticsy k x

: :X X Y YD p x q x D p y q y

{ ( , )}S p x

ChentsovAmari ‐Nagaoka

Invariance ‐‐‐ characterization of f‐divergence

:ip

:p

1 n

1 2 m

ii A

p p

( )Ap p

Csiszar

: :

: :

A A

A A

D D

D D

p q p q

p q p q

; i ip c q i A

:p

:q

Invariance ⇒ f‐divergence

Csiszar f‐divergence

: ,if i

i

qD p fp

p q

: convex, 1 0,f u f

: :cf fD cDp q p q

1f u f u c u

' ''1 1 0 ; 1 1f f f 1

( )f u

u

Ali‐SilveyMorimoto

Theorem

An invariant separable divergence belongs to the class of f‐divergence.

Separable divergence: [ : ] ( , )

( , ) ( )

i i

ii i i

i

D k p q

qk p q p fp

p q

dually flat space

convex functionsBregman

divergence

invariance

invariant divergence Flat divergence

KL‐divergenceF‐divergenceFisher inf metricAlpha connection

: space of probability distributions}{pS

logp(x)D[p : q] = p(x) { }dxq(x)

(n > 1)

‐Divergence:  why? flat & invariant in

12

2

4 2( ) {1 } (1 ), 11 1

f u u u

KL-divergence( ) log ( 1)

[ : ] { log }ii i i

i

f u u u upD p q p p qq

1nS

, 0 : ( 1 holds)i iS p p nn p

Space of positive measures :  vectors, matrices, arrays

f‐divergence

α‐divergence

Bregman divergence

: 0if i

i

qD p fp

p q

: 0fD p q p q

not invariant under 1f u f u c u

divergence of f S

divergence

1 12 21 1[ : ] { }

2 2i i i iD p q p q p q

[ : ] { log }ii i i

i

pD p q p p qq

KL‐divergence

: dually flat: not dually flat (except 1)

SS

21

1

1

i

i

p

r

Metric and Connections Induced by Divergence(Eguchi)

'

' '

1: : : = (z - y )(z - y ) 2

:

:

ij i j ij i i j j

ijk i j k

ijk i j k

g D D g

D

D

y z

y z

y z

z z y z y z

z z y

z z y

*

'

{ , }

, i ii iz y

Riemannian metric

affine connections

Invariant geometrical structurealpha‐geometry(derived from invariant divergence)

,S p x

ij i j

ijk i j k

g E l l

T E l l l

log , ; i il p x

‐connection

, ;ijk ijki j k T

: dually coupled

, , ,X XX Y Z Y Z Y Z

α

Fisher information

Levi‐civita: 

Duality:

, ,

k ij kij kji

ijk ijk ijk

g

T

M g T

*, , ,X XX Y Z Y Z Y Z

Riemannian Structure

2 ( )

( )

( ) ( )

Euclidean

i jij

T

ij

ds g d d

d G d

G g

G E

Fisher information

AffineConnection

covariant derivative,

0, X=X(t)

(

-

)

X c

X

i jij

Y X Y

X

s g d d

minimal dista

ge

nce non

odesi

me

c

tric

straight line

DualityDuality

, , , i jijX Y X Y X Y g X Y

Riemannian geometry:

X

Y

X

Y

**, , ,X XX Y Z Y Z Y Z

Dual Affine Connections

e‐geodesic

m‐geodesic

log , log 1 logr x t t p x t q x c t

, 1r x t tp x t q x

,

q x

p x

*( , )

Mathematical structure of ,S p x

ij i j

ijk i j k

g E l l

T E l l l

log , ; i il p x

-connection

, ;ijk ijki j k T

: dually coupled

, , ,X XX Y Z Y Z Y Z

{M,g,T}

α‐geometry

Dual Foliations

k‐cut

00110001011010100100110100

0101101001010

firing rates:correlation—covariance?

1x2x

3x

00 01 10 11{ , , , }p p p p

1 2 12, ;r r r

Two neurons:

Correlations of Neural FiringCorrelations of Neural Firing

1 2

00 10 01 11

1 1 10 11

2 1 01 11

,

, , ,

p x x

p p p pr p p pr p p p

11 00

10 01

log p pp p

1x 2x 2

1

1 2{( , ), }r r

orthogonal coordinates

firing ratescorrelations

1 2{ ( , )}S p x x1 2, 0,1x x

1 2{ ( ) ( )}M q x q x

Independent Distributions

two neuron case

1 2 12 1 2 12

12 12 1 200 1112

01 10 1 12 2 12

12 1 2

12 1 2

, , ; , ,1

log log

, ,

, ,

r r rr r r rp p

p p r r r r

r f r r

r t f r t r t

Decomposition of KL-divergence

D[p:r] = D[p:q]+D[q:r]

p,q: same marginals

r,q: same correlations

1 2,

p

qr

independent

correlations

( )[ : ] ( ) log( )x

p xD p r p xq x

pairwise correlations

ij ij i jc r r r

independent distributions

, ,ij i j ijk i j kr r r r rr r

How to generate correlated spikes?(Niebur, Neural Computation [2007])

higher-order correlations

covariance: not orthogonal

Orthogonal higher‐order correlations

1

1

;

;

,

,

,

,

i i n

i n

j

i j rr r

r

Neurons

1x nx

1i ix u

Gaussian [ ]i i ju E u u

2x

Population and Synfire

Synfiring

1( ) ( ,..., )

1n

i

p p x x

r x q rn

x

( )q r

r

Input‐output AnalysisGross product consumptionRelations among industires(K. Tsuda and R. Morioka)

Mathematical Problems

M         submanifold of S ?Hong van Le

{M, g}        {M, g, T}  dually flat   J. Armstrong

Affine differential geometryHessian manifoldAlmost complex structure

top related