The Geometry of Distributions I
Classification of Distances
Suresh Venkatasubramanian, University of Utah
Histograms And Distributions
Finite (and fixed) domain: { the, apple, orange, and }
“Text” over that domain: “the apple and the orange and the orange”
(Normalized) “frequency counts” over the domain: { 3/8, 1/8, 1/4, 1/4 }
A distribution is a point on the d-simplex:
$$\Delta_d = \{(x_1, x_2, \dots, x_{d+1}) \mid \sum_i x_i = 1,\ x_i \ge 0\}$$
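A minimal Python sketch of the counting step above, assuming whitespace tokenization (the slide does not say how the text is tokenized); `normalized_counts` is a hypothetical helper, not part of the lecture. Exact fractions make it easy to check that the result lies on the simplex.

```python
from collections import Counter
from fractions import Fraction


def normalized_counts(text, domain):
    """Map a whitespace-tokenized text to normalized counts over a fixed domain.

    Words outside `domain` are ignored; assumes at least one domain word occurs.
    """
    counts = Counter(text.split())
    total = sum(counts[w] for w in domain)
    return [Fraction(counts[w], total) for w in domain]


domain = ["the", "apple", "orange", "and"]
text = "the apple and the orange and the orange"
point = normalized_counts(text, domain)
print(point)       # [Fraction(3, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 4)]
print(sum(point))  # 1
```

The entries are non-negative and sum to 1, so `point` is a vertex-free interior point of $\Delta_3$.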
Comparing Distributions
Data Analysis ≡ Geometry
Problem: Find interesting patterns in a collection of data
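Geometry Must Possess the Right Properties
Assume that all points lie in Euclidean space,
except... Euclidean space is closed under operations that can take us off the simplex:
if $v \in \mathbb{R}^d$, then $c \cdot v \in \mathbb{R}^d$ for all $c \in \mathbb{R}$, yet the result can have negative entries:
$-1 \cdot (0.1, 0.2, 0.5) = (-0.1, -0.2, -0.5)$
if $v, w \in \mathbb{R}^d$, then $v + w \in \mathbb{R}^d$, yet the result can have entries above 1 and no longer sum to 1:
$(0.6, 0.1, 0.7) + (0.6, 0.2, 0.1) = (1.2, 0.3, 0.8)$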
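A small sketch of this failure of closure, using a hypothetical `on_simplex` test and two example distributions chosen for illustration (they are not from the slide). Scaling and addition leave the simplex; a convex combination, included for contrast, stays on it.

```python
def on_simplex(v, tol=1e-9):
    """True if v has non-negative entries that sum to 1 (up to tol)."""
    return all(x >= -tol for x in v) and abs(sum(v) - 1.0) <= tol


p = [0.2, 0.3, 0.5]  # illustrative distributions, not from the slide
q = [0.6, 0.1, 0.3]

scaled = [-1 * x for x in p]                         # c * v with c = -1
summed = [a + b for a, b in zip(p, q)]               # v + w
mixture = [0.5 * a + 0.5 * b for a, b in zip(p, q)]  # convex combination

print(on_simplex(p), on_simplex(scaled), on_simplex(summed), on_simplex(mixture))
# True False False True
```

The simplex is convex but not a linear subspace, which is why vector-space operations alone are the wrong geometry for distributions.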
Distance Must Have Meaning
Instead of
“The distance between the two objects is 5.3”
we need
“The distance between the two objects is 5.3, and against the null hypothesis that they are the same, this has a p-value of 0.001”
Comparing Models Rather Than Data
View data as being generated by models, and compare models instead of data.
Lecture Plan
Information geometry
Algorithms for information distances
Spatially-aware information distances
Distributions are generated by parameters
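Gaussian: $x \sim \mathcal{N}(\mu, \sigma^2 I)$, $p(x; \mu, \sigma) \propto \exp\left(-\frac{\|x - \mu\|^2}{2\sigma^2}\right)$
Poisson: $k \sim \mathrm{Pois}(\lambda)$, $p(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$
Multinomial: $p(x_1, \dots, x_k; n, \theta_1, \dots, \theta_k) = \frac{n!}{x_1! \cdots x_k!}\, \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_k^{x_k}$, with $\sum_i \theta_i = 1$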
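The three families can be evaluated directly. This is a sketch using only the Python standard library; the function names are hypothetical, and the Gaussian is shown in one dimension rather than in the isotropic multivariate form above.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of the 1-D normal N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Pois(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def multinomial_pmf(xs, theta):
    """P(counts xs) for n = sum(xs) trials with cell probabilities theta."""
    coef = math.factorial(sum(xs))
    for x in xs:
        coef //= math.factorial(x)  # exact integer multinomial coefficient
    return coef * math.prod(t ** x for t, x in zip(theta, xs))


print(gaussian_pdf(0.0, 0.0, 1.0))  # 1 / sqrt(2 pi)
print(poisson_pmf(2, 3.0))
print(multinomial_pmf([3, 1, 2, 2], [3 / 8, 1 / 8, 1 / 4, 1 / 4]))
```

The last call scores the counts (3, 1, 2, 2) from the word-count example under their own normalized frequencies.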
Manifold of distributions
The space of parameters of a family of distributions forms a manifold.
Geodesics (shortest paths on the manifold) measure distance.
Riemannian Geometry
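The length of a curve $\gamma$ is
$$L(\gamma) = \int_\gamma \sqrt{\|ds\|^2}$$
The length of a tangent vector is set by an inner product:
$$\|ds\|^2 = \sum_{i,j} g_{ij}\, ds_i\, ds_j$$
$g_{ij}$ is the metric tensor.
For example, in Euclidean space, $g_{ij} = \delta_{ij}$, and $\|ds\|^2 = dx_1^2 + dx_2^2 + \cdots$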
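A rough numerical version of $L(\gamma)$, assuming the curve is given as a function of $t \in [0, 1]$ and the metric as a function returning $g_{ij}$ at a point. The helper names are hypothetical. The second test metric, $g_{ij} = \delta_{ij}/y^2$, has the same form as the metric in the Gaussian example; along a vertical path from $y = 1$ to $y = e$ its length is $\int_1^e dy/y = 1$.

```python
import math


def curve_length(gamma, metric, n=10_000):
    """Approximate L(gamma) = integral of sqrt(sum_ij g_ij ds_i ds_j).

    gamma maps t in [0, 1] to a point (tuple of floats); metric maps a point
    to the matrix g as a list of rows. The curve is split into n straight
    segments and g is evaluated at each segment's midpoint.
    """
    total = 0.0
    prev = gamma(0.0)
    for step in range(1, n + 1):
        cur = gamma(step / n)
        ds = [c - p for c, p in zip(cur, prev)]
        mid = [(c + p) / 2.0 for c, p in zip(cur, prev)]
        g = metric(mid)
        k = len(ds)
        total += math.sqrt(sum(g[i][j] * ds[i] * ds[j] for i in range(k) for j in range(k)))
        prev = cur
    return total


def euclidean_metric(point):
    """g_ij = delta_ij."""
    k = len(point)
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def half_space_metric(point):
    """g_ij = delta_ij / y^2, where y is the last coordinate."""
    k = len(point)
    y = point[-1]
    return [[1.0 / y ** 2 if i == j else 0.0 for j in range(k)] for i in range(k)]


def segment(t):
    return (3.0 * t, 4.0 * t)               # straight line from (0, 0) to (3, 4)


def vertical(t):
    return (0.0, 1.0 + (math.e - 1.0) * t)  # from (0, 1) up to (0, e)


print(curve_length(segment, euclidean_metric))    # close to 5.0
print(curve_length(vertical, half_space_metric))  # close to 1.0
```

The midpoint rule is chosen for simplicity; any quadrature of $\int \sqrt{\dot\gamma^\top G\, \dot\gamma}\, dt$ would do.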
Fisher Information
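Let $p(x; \theta)$ be a parametric family of distributions. Set $s = (s_1, s_2, \dots, s_k)^\top$, where $s_i = \frac{\partial \log p(x; \theta)}{\partial \theta_i}$.
$$g_{ij} = E[s_i s_j] = \int p(x; \theta)\, \frac{\partial \log p(x; \theta)}{\partial \theta_i}\, \frac{\partial \log p(x; \theta)}{\partial \theta_j}\, dx = -\int p(x; \theta)\, \frac{\partial^2 \log p(x; \theta)}{\partial \theta_i\, \partial \theta_j}\, dx$$
$G = \{g_{ij}\}$ is the Fisher information.
Fisher information acts like a “curvature”: by the second form above, it is the expected curvature (negative Hessian) of the log-likelihood.
High Fisher information implies easier estimation of θ from data (by the Cramér-Rao bound, any unbiased estimator of θ has covariance at least $G^{-1}$).
Given a parametric family, the Fisher information induces a metric structure on the manifold of its parameters.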
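For the one-parameter Poisson family, both expressions for $g_{ij}$ reduce to sums over $k$, and both should equal the known closed form $1/\lambda$. A sketch, assuming the infinite sum can be truncated at a hypothetical cutoff `kmax` (the discarded tail is negligible for moderate $\lambda$):

```python
import math


def poisson_fisher(lam, kmax=200):
    """Fisher information of Pois(lam) in lam, computed both ways.

    Returns (E[s^2], -E[d^2 log p / dlam^2]) summed over k = 0..kmax;
    the tail beyond kmax is ignored.
    """
    score_sq = 0.0
    neg_hess = 0.0
    for k in range(kmax + 1):
        p = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))  # Poisson pmf
        s = k / lam - 1.0                # score: d/dlam log p(k; lam)
        score_sq += p * s * s
        neg_hess += p * k / lam ** 2     # -d^2/dlam^2 log p(k; lam) = k / lam^2
    return score_sq, neg_hess


print(poisson_fisher(4.0))  # both entries close to 1 / 4
```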
Example: Gaussian distributions
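Consider $\{\mathcal{N}(\mu, \sigma^2 I) \mid \mu \in \mathbb{R}^{d-1}, \sigma \in \mathbb{R}^+\}$, with parameters $\theta = (\mu_1, \dots, \mu_{d-1}, \sigma)$.
$$\log p(x; \theta) = -\sum_{l=1}^{d-1} \frac{(x_l - \mu_l)^2}{2\sigma^2} - (d-1)\log\sigma + \text{const}$$
$$-\frac{\partial^2 \log p(x; \theta)}{\partial \theta_i\, \partial \theta_j} = \frac{1}{\sigma^2}\delta_{ij}, \quad i, j < d$$
$$E\left[-\frac{\partial^2 \log p(x; \theta)}{\partial \sigma^2}\right] = \frac{2(d-1)}{\sigma^2}$$
The mixed $\mu$-$\sigma$ terms have zero expectation. After rescaling the mean coordinates by $1/\sqrt{2(d-1)}$ and dropping the overall constant factor $2(d-1)$,
$$g_{ij} = \frac{1}{\sigma^2}\delta_{ij},$$
which induces $d$-dimensional hyperbolic space (the Poincaré half-space model, with $\sigma$ as the height coordinate).
Note: If $\sigma$ is fixed at 1, we recover Euclidean geometry on the means.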
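Because the metric is hyperbolic, distances between isotropic Gaussians have a closed form via the Poincaré half-space distance $\operatorname{arccosh}\left(1 + \frac{\|\Delta u\|^2 + \Delta\sigma^2}{2\sigma_1\sigma_2}\right)$. The sketch below assumes the unscaled metric $ds^2 = (\sum_l d\mu_l^2 + 2(d-1)\, d\sigma^2)/\sigma^2$ assembled from the entries on this slide, undoes the rescaling explicitly, and uses a hypothetical function name.

```python
import math


def fisher_rao_isotropic(mu1, sigma1, mu2, sigma2):
    """Geodesic distance between N(mu1, sigma1^2 I) and N(mu2, sigma2^2 I).

    Assumes ds^2 = (sum_l dmu_l^2 + 2(d-1) dsigma^2) / sigma^2 with d - 1 = len(mu1).
    """
    k = 2 * len(mu1)  # the factor 2(d - 1)
    # u = mu / sqrt(k) turns the metric into k * (sum du^2 + dsigma^2) / sigma^2,
    # i.e. k times the Poincare half-space metric.
    du_sq = sum((a - b) ** 2 for a, b in zip(mu1, mu2)) / k
    arg = 1.0 + (du_sq + (sigma1 - sigma2) ** 2) / (2.0 * sigma1 * sigma2)
    return math.sqrt(k) * math.acosh(arg)


print(fisher_rao_isotropic([0.0], 1.0, [0.0], math.e))  # sqrt(2) * ln(e)
print(fisher_rao_isotropic([0.0, 0.0], 1.0, [1.0, -1.0], 2.0))
```

For equal means the distance reduces to $\sqrt{2(d-1)}\, |\ln(\sigma_2/\sigma_1)|$, which the first call checks with $d - 1 = 1$.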
Example: Multinomials
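Consider $\{p(x_1, \dots, x_d; n, \theta_1, \dots, \theta_d) \mid \sum_i \theta_i = 1\}$.
$$\log p(x; \theta) = \sum_i x_i \log \theta_i + \text{const}$$
$$\frac{\partial^2 \log p(x; \theta)}{\partial \theta_i\, \partial \theta_j} = -\frac{x_i}{\theta_i^2}\delta_{ij}$$
A few steps later (using $E[x_i] = n\theta_i$ and dropping the constant factor $n$),
$$\sum_{i,j} g_{ij}\, ds_i\, ds_j = \sum_i \frac{ds_i^2}{\theta_i}$$
By the standard substitution $z_i = 2\sqrt{\theta_i}$, which maps the simplex onto the positive orthant of the sphere of radius 2, this becomes $\sum_i dz_i^2$: the Euclidean inner product!
Geodesics between distributions are therefore great circles on the sphere.
Hellinger distance is now chordal distance on the sphere (exactly so on the unit sphere, reached by $\theta_i \mapsto \sqrt{\theta_i}$).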
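A sketch of the sphere picture, assuming the unit-sphere embedding $\theta \mapsto \sqrt{\theta}$ (the radius-2 embedding $z = 2\sqrt{\theta}$ just doubles every length). The Hellinger distance is the chord between the embedded points, and the great-circle angle is $\arccos \sum_i \sqrt{p_i q_i}$; the two are related by chord $= 2\sin(\text{angle}/2)$. Function names are hypothetical.

```python
import math


def hellinger(p, q):
    """d_H(p, q) = [sum_i (sqrt p_i - sqrt q_i)^2]^(1/2)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def sphere_angle(p, q):
    """Great-circle angle between sqrt(p) and sqrt(q) on the unit sphere."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # inner product <sqrt p, sqrt q>
    return math.acos(min(1.0, bc))                     # clamp tiny rounding above 1


p, q = [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]
angle = sphere_angle(p, q)
print(hellinger(p, q), 2 * math.sin(angle / 2))  # chord length, computed two ways
print(2 * angle)  # geodesic length on the radius-2 sphere, z_i = 2 sqrt(theta_i)
```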
From a metric to a distance
The metric tensor only gives infinitesimal distance ($\|ds\|^2$). To find shortest paths, we need to minimize path length.

| Structure | What you get |
|---|---|
| Manifold | Topology |
| Differentiability | Tangent space |
| Metric tensor | Infinitesimal length |
| Affine connection | Globally minimum paths |

The metric tensor induces a “natural” connection (the Levi-Civita connection). In statistical manifolds, many connections can be defined (parametrized by α).
Different values of α yield different divergences: Bregman divergences, f-divergences, α-divergences.
A Rogues’ Gallery
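Kullback-Leibler Distance
$$KL(p, q) = \sum_i p_i \log \frac{p_i}{q_i}$$
The Jensen-Shannon Distance
$$JS_{\alpha,\beta}(p, q) = \alpha\, KL(p, m) + \beta\, KL(q, m), \quad \text{where } m = \alpha p + \beta q,\ \alpha + \beta = 1$$
χ²-Distance
$$\chi^2(p, q) = \sum_i \frac{(p_i - q_i)^2}{q_i}$$
∆-Distance
$$\Delta(p, q) = \sum_i \frac{(p_i - q_i)^2}{p_i + q_i}$$
Hellinger-Matusita-Bhattacharyya Distance
$$d_H(p, q) = \left[\sum_i \left(\sqrt{p_i} - \sqrt{q_i}\right)^2\right]^{1/2}$$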
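Straightforward Python versions of the five formulas, as a sketch: they assume strictly positive $q_i$ for KL and $\chi^2$, adopt the usual convention $0 \log 0 = 0$, and use hypothetical names.

```python
import math


def kl(p, q):
    """Kullback-Leibler: sum_i p_i log(p_i / q_i), with 0 log 0 = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def js(p, q, alpha=0.5):
    """Jensen-Shannon with weights alpha and beta = 1 - alpha."""
    beta = 1.0 - alpha
    m = [alpha * a + beta * b for a, b in zip(p, q)]
    return alpha * kl(p, m) + beta * kl(q, m)


def chi2(p, q):
    """Chi-squared: sum_i (p_i - q_i)^2 / q_i; needs q_i > 0."""
    return sum((a - b) ** 2 / b for a, b in zip(p, q))


def delta(p, q):
    """Delta distance: sum_i (p_i - q_i)^2 / (p_i + q_i), skipping empty cells."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def hellinger(p, q):
    """Hellinger: [sum_i (sqrt p_i - sqrt q_i)^2]^(1/2)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


p, q = [0.5, 0.5], [0.25, 0.75]
for name, d in [("KL", kl), ("JS", js), ("chi2", chi2), ("Delta", delta), ("Hellinger", hellinger)]:
    print(name, d(p, q), d(q, p))
```

Printing both argument orders shows that KL and χ² are not symmetric, while ∆, Hellinger, and JS with α = β = 1/2 are.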
The Rogues’ Club
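Bregman divergence: For convex $\phi : \mathbb{R}^d \to \mathbb{R}$,
$$D_\phi(p, q) = \phi(p) - \phi(q) - \langle \nabla\phi(q), p - q \rangle$$
α-divergence: For $|\alpha| < 1$,
$$D_\alpha(p, q) = \frac{4}{1 - \alpha^2}\left[1 - \int p^{(1-\alpha)/2}\, q^{(1+\alpha)/2}\right]$$
f-divergence: For convex $f : \mathbb{R} \to \mathbb{R}$ with $f(1) = 0$,
$$D_f(p, q) = \sum_i p_i\, f\!\left(\frac{q_i}{p_i}\right)$$
Taking $f(x) = -\log x$, $\phi(x) = \sum_i x_i \log x_i$, or the limit $\alpha \to -1$ each gives $KL(p, q)$.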
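The last line can be checked numerically. This sketch, with hypothetical names and an illustrative pair of distributions, evaluates the Bregman divergence with $\phi(x) = \sum_i x_i \log x_i$, the f-divergence with $f(x) = -\log x$, and the α-divergence (a sum replacing the integral) at α close to −1. All three should match KL up to rounding, plus an error of order $1 + \alpha$ for the α-divergence.

```python
import math


def kl(p, q):
    """KL(p, q) = sum_i p_i log(p_i / q_i), with 0 log 0 = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def bregman_neg_entropy(p, q):
    """Bregman divergence for phi(x) = sum_i x_i log x_i (grad_i = log x_i + 1).

    Equals KL(p, q) when p and q both sum to 1; assumes q_i > 0.
    """
    def phi(v):
        return sum(x * math.log(x) for x in v if x > 0)

    inner = sum((math.log(b) + 1.0) * (a - b) for a, b in zip(p, q))
    return phi(p) - phi(q) - inner


def f_divergence(p, q, f):
    """D_f(p, q) = sum_i p_i f(q_i / p_i), skipping terms with p_i = 0."""
    return sum(a * f(b / a) for a, b in zip(p, q) if a > 0)


def alpha_divergence(p, q, alpha):
    """Discrete D_alpha for |alpha| < 1 (a sum replaces the integral)."""
    s = sum(a ** ((1 - alpha) / 2) * b ** ((1 + alpha) / 2) for a, b in zip(p, q))
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - s)


p, q = [0.5, 0.5], [0.25, 0.75]
print(kl(p, q))
print(bregman_neg_entropy(p, q))
print(f_divergence(p, q, lambda x: -math.log(x)))
print(alpha_divergence(p, q, -0.99999))
```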
Invariance Properties and Cencov’s theorem
Euclidean Invariants
[Figure: scaling, shearing, rotation]
Markov Transformations
A is a column-stochastic matrix if $a_{ij} \ge 0$ and
$$\forall j, \quad \sum_i a_{ij} = 1$$
If p is a distribution, then so is Ap.
Information cannot be increased: for any such A,
$$d(Ap, Aq) \le d(p, q)$$
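A sketch of this contraction for KL, which, like every f-divergence, satisfies this data-processing inequality. The matrix and distributions are illustrative values, and the helper names are hypothetical.

```python
import math


def kl(p, q):
    """KL(p, q) with the convention 0 log 0 = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def apply_markov(A, p):
    """(Ap)_i = sum_j a_ij p_j for a matrix given as a list of rows."""
    return [sum(a_ij * p_j for a_ij, p_j in zip(row, p)) for row in A]


def is_column_stochastic(A, tol=1e-12):
    """Non-negative entries, and every column sums to 1."""
    nonneg = all(a >= 0 for row in A for a in row)
    cols_ok = all(abs(sum(row[j] for row in A) - 1.0) <= tol for j in range(len(A[0])))
    return nonneg and cols_ok


A = [[0.9, 0.2],
     [0.1, 0.8]]  # illustrative: each column sums to 1
p, q = [0.7, 0.3], [0.2, 0.8]
Ap, Aq = apply_markov(A, p), apply_markov(A, q)

print(is_column_stochastic(A), sum(Ap))  # Ap is still a distribution
print(kl(p, q), kl(Ap, Aq))              # the divergence shrinks after A
```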
Sufficient Statistics
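Let $X \sim p(x; \theta)$, and let T be a transformation of X. T is a sufficient statistic if
$$p(x \mid T(X); \theta) = p(x \mid T(X)),$$
that is, the conditional distribution of X given T(X) does not depend on θ.
Example: Let $X \sim \mathcal{N}(\mu, \sigma^2)$, let $x_1, \dots, x_n$ be samples, and
$$T(X) = \left(\frac{1}{n}\sum_i x_i,\ \frac{1}{n-1}\sum_i \Big(x_i - \frac{1}{n}\sum_j x_j\Big)^2\right).$$
Informally, a sufficient statistic captures all the information the data carry about θ.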
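The Gaussian example can be checked through the factorization view of sufficiency: the log-likelihood depends on the sample only through $T(X)$ and $n$, because $\sum_i (x_i - \mu)^2 = (n-1)s^2 + n(\bar{x} - \mu)^2$. A sketch with hypothetical function names and an illustrative sample:

```python
import math


def T(xs):
    """Sufficient statistic from the slide: sample mean and unbiased variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


def loglik_direct(xs, mu, sigma):
    """Gaussian log-likelihood computed from the raw sample."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi) - ss / (2 * sigma ** 2)


def loglik_from_T(t, n, mu, sigma):
    """Same log-likelihood, using only T(X) and n."""
    mean, var = t
    ss = (n - 1) * var + n * (mean - mu) ** 2
    return -n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi) - ss / (2 * sigma ** 2)


xs = [1.0, 2.0, 3.0, 4.0]
print(T(xs))  # (2.5, 1.6666666666666667)
print(loglik_direct(xs, 0.5, 2.0), loglik_from_T(T(xs), len(xs), 0.5, 2.0))
```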
Cencov’s theorem
Theorem. The Fisher information is the unique (modulo scaling) metric tensor that remains invariant under Markov transformations that are sufficient.
Coming up
Information geometry
Algorithms for information distances
Spatially-aware information distances
References
Shun-Ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. Oxford University Press, 2000.
L. L. Campbell. An extended Cencov characterization of the information metric. Proc. Amer. Math. Soc., 98(1):135–141, 1986.
Guy Lebanon. Riemannian Geometry and Statistical Machine Learning. PhD thesis, CMU, 2005.
N. N. Cencov. Statistical Decision Rules and Optimal Inference. American Mathematical Society, 1982. Originally published in Russian, Nauka, 1972.