INCREMENTAL ALGORITHMS FOR STATISTICAL ANALYSIS OF MANIFOLD VALUED DATA

By

HESAMODDIN SALEHIAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2014

© 2014 Hesamoddin Salehian

To the memory of my mother, who devoted her life to my education and has always been

truly my encouragement. To my wife, who has always been supportive and proud of my

work, and shared many challenges and sacrifices towards completing my PhD. To my

father, who taught me to persist and work hard throughout my life. To my brothers, who

have always been my leaders in education and taught me to be ambitious with high

goals.

ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor, Dr. Baba C. Vemuri, for his

persistent support in making this dissertation possible. His creativity, excellent knowledge and

patience encouraged me throughout my PhD study. This dissertation would not have been

completed without his support.

I would also like to thank my committee, Dr. Arunava Banerjee, Dr. Anand

Rangarajan, Dr. William Hager and Dr. John Forder, for making valuable comments and

providing wonderful advice. Dr. Banerjee and Dr. Rangarajan have always been very

supportive and generous with their time, and taught me fundamental and advanced

machine learning concepts. Dr. Hager had a great impact on my knowledge of linear

algebra and matrix analysis. Dr. Forder kindly provided data for medical imaging

applications.

Also, special thanks to Dr. Jeffrey Ho, for his excellent support through my PhD. I

had the honor to collaborate with him in several publications, and I would like to thank

him for his insightful guidance, dedication, and his wonderful attitude.

I cannot express my gratitude enough to my deceased mother, Zahra Khatibi, who

devoted her entire life to my education, and was always an excellent encouragement

and support all along this road. I never got a chance to say goodbye to her when

she passed away overseas, but her memory was the strongest encouragement to

overcome all the difficulties towards completing this degree and to make her wishes

come true.

I am very thankful to my kind wife, Pegah, who has always been proud of my

accomplishments and has been by my side through the highest highs and lowest lows. I

cannot imagine how this dissertation could have been completed without her persistent

help and support.

Special thanks to my father, Manouchehr Salehian, who has always been my role

model of hard work, strength and great personality, and my older brothers, Hamid

and Hamed, who were truly my leaders in education, in music arts and in sport, from my

childhood to the present.

Last, but not least, I want to thank my former lab-mate, Dr. Guang Cheng, for his

help and guidance and his excellent work in our several collaborations. Besides, I am

thankful to my friendly and knowledgeable colleagues in the CVGMI Laboratory, Yuchen,

Meizhu, Ting, Dohyung, Wenxing, Yan, Yuanxiang, Jiaqi, Ted, Rudrasis, Monami, and

others.

The research in this dissertation was in part supported by NIH grant NS066340

to Dr. Baba C. Vemuri. I also received the Student Travel Award from MICCAI’14

Conference, and the Internship Program at Google. I gratefully acknowledge the

permission granted by IEEE and Springer to reuse materials from my previous

publications in this dissertation.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION

2 INCREMENTAL ESTIMATION OF THE STEIN CENTER OF SPD MATRICES AND ITS APPLICATIONS

2.1 Background
2.2 Incremental Stein Mean Computation
2.3 Properties of Pn Equipped with the Stein Distance
2.3.1 Global Non-Positive Curvature Spaces
2.3.2 Discussion
2.4 Experiments
2.4.1 Performance of the Incremental Stein Center
2.4.2 Application to K-means Clustering
2.4.3 Application to Image Retrieval
2.4.4 Application to Shape Retrieval

3 INCREMENTAL FRECHET MEAN ESTIMATOR ON SPHERE

3.1 Background
3.2 Preliminaries
3.2.1 Riemannian Geometry of Sphere
3.2.2 Gnomonic Projection
3.3 Incremental Frechet Mean Estimator on Sphere
3.3.1 Angle Bisector Theorem
3.3.2 Lower Bound for tn
3.3.3 Upper Bound for tn
3.3.4 Convergence of iFME
3.4 Experiments
3.4.1 Synthetic Experiments
3.4.2 Application to Incremental Shape-Preserving Frechet Mean of SPD Matrices

4 IPGA: INCREMENTAL PRINCIPAL GEODESIC ANALYSIS WITH APPLICATIONS TO MOVEMENT DISORDER CLASSIFICATION

4.1 Background
4.2 Preliminaries
4.2.1 Riemannian Geometry of the Space of SPD Tensor Fields
4.2.2 Schild’s Ladder Approximation of Parallel Transport
4.3 iPGA: Incremental Principal Geodesic Analysis
4.3.1 Incremental Frechet Mean Estimator
4.3.2 Incremental Principal Geodesic Analysis on Pmn
4.3.3 Incremental Principal Geodesic Analysis on Sk
4.4 Synthetic Experiments
4.4.1 Manifold of SPD Tensor Fields
4.4.2 Unit Sphere Sk
4.5 Real Data Experiments: Classification of PD vs. ET vs. Controls
4.5.1 Classification Results using Deformation Tensor Features
4.5.2 Classification Results using Shape Features

5 SUMMARY AND DISCUSSION

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

2-1 Average shape retrieval precision (%) for the MPEG7 database, for different Binary Code (BC) lengths.

2-2 Time (in seconds) comparison for shape retrieval.

4-1 Summary of Riemannian geometry of the space of n × n positive definite matrices, Pn, as well as the unit k-dimensional sphere, Sk.

4-2 Incremental PGA Algorithm for SPD Tensor Fields

4-3 Incremental PGA Algorithm on Unit Sphere

4-4 Classification results of iPGA, PGA, PCA using SPD tensor field features

4-5 Classification results of iPGA, PGA, PCA using shape descriptor features

LIST OF FIGURES

Figure

2-1 Schematic view of x1, x2, x3, x4 in Reshetnyak’s quadruple comparison.

2-2 Illustration of the proof of Reshetnyak’s inequality for the quadruple (I, D2↓, X3, X4↓), from the quadruple (I, D2↓, X3↓, X4↓).

2-3 Error comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.

2-4 Time comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.

2-5 Illustration of the incremental mean updates in K-means clustering.

2-6 Time comparison of the K-means clustering using various methods.

2-7 Error comparison of the K-means clustering.

2-8 Time consumption in initializing hashing functions.

2-9 Comparison of retrieval accuracy, for techniques specified in Fig. 2-8.

2-10 Example results of proposed retrieval system, based on the incremental Stein mean, with 640-bit binary codes.

3-1 Gnomonic Projection

3-2 Use of Euclidean weights to update iFME in Sk does not necessarily correspond to the same weights in the tangent space.

3-3 Frechet mean of samples on Sk does not necessarily coincide with the arithmetic mean of projected points in the tangent space.

3-4 The comparison of the ratio of variances (defined in Eq. 3–25) between iFME and FM, for different values of ϕ.

3-5 The time comparison between iFME and FM, for different values of ϕ.

3-6 Visual comparison of the mean tensor obtained from shape preserving iFME on the product manifold (top row), and iFME applied on P(3) (bottom row).

3-7 Comparison of FA values between iFME on P(3), and iFME on the product manifold.

4-1 Illustration of Schild’s Ladder algorithm, described in Eq. 4–9.

4-2 Schematic illustration of the algorithm in Table 4-2.

4-3 Step by step illustration of the iPGA algorithm on Sk, summarized in Table 4-3.

4-4 Estimation of the projection πS(X) to the 1-D principal geodesic submanifold (red curve).

4-5 Time consumption and residual error comparison between iPGA (proposed) and PGA on Pmn.

4-6 Mean angular error of iPGA estimates w.r.t. PGA on S10000.

4-7 Time comparison of incremental and non-incremental PGA estimators on S10000.

4-8 S0 images of a control and a Parkinson subject, along with the computed atlas.

4-9 Population of Substantia Nigra regions extracted from the control brain images.

4-10 Comparison of incremental (bottom row) and non-incremental (top row) results of (1) Frechet Means (left column), (2) PGA with the coefficient 1.5√λ (middle column), and (3) PGA with the coefficient 3√λ (right column).

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

INCREMENTAL ALGORITHMS FOR STATISTICAL ANALYSIS OF MANIFOLD VALUED DATA

By

Hesamoddin Salehian

December 2014

Chair: Baba C. Vemuri
Major: Computer Engineering

Manifold-valued features are ubiquitous in many applications in computer vision,

machine learning and medical image analysis. Statistical analysis of a population

of such data is commonly encountered in many tasks in the aforementioned fields,

such as object recognition, shape analysis, facial expression analysis, longitudinal

studies quantifying, for example, disease-related changes in structure/function, and

many others. In this dissertation we present a suite of efficient incremental tools and

techniques for statistical analysis of a given population of manifold-valued data. Most of

the existing tools suffer from computational and storage (memory) inefficiency, due to

the complexities introduced when dealing with manifold-valued features. Therefore, an

incremental technique is an appealing choice in these applications, because, when the

input population is augmented, one only needs to update the most recently estimated

statistical feature (e.g., mean, principal component, etc.), without having to recompute it

from scratch.

We start the dissertation with efficient statistical analysis algorithms of a population

of Symmetric Positive Definite (SPD) matrices. In this regard, we first propose a novel

incremental algorithm to compute the mean of a population of SPD matrices, based on

the recently introduced Stein distance. It is known that the compute time of the Stein

distance between two SPD matrices is far less than that required for computing the

geodesic distance using the canonical GL-invariant metric. However, there is no closed

form solution for the Stein mean of a group of SPD tensors, which is defined as the

minimizer of the sum of squared Stein distances. Therefore, our incremental Stein mean

estimator plays a crucial role to speed up many applications dealing with SPD matrices.

In a wide variety of applications the input data lies on a sphere which is an example

of Riemannian manifolds with positive constant sectional curvature. We develop a novel

incremental mean computation algorithm for features lying on a sphere, which is one

of the most widely used manifolds in science and engineering problems. Although

there are several convergence results in recent literature for many manifestations of

an incremental mean estimator, these analyses are all limited to non-positively

curved spaces. We analytically show the convergence of the incremental method to

the true mean on the sphere, when the number of samples tends to infinity. To the best

of our knowledge, there is no similar convergence analysis introduced in literature, for

positively curved spaces. We provide several synthetic and real data experiments to

illustrate the effectiveness and efficiency of the proposed incremental method.

Next, we continue the statistical analysis of manifold-valued data, with the

introduction of a novel incremental Principal Geodesic Analysis (PGA) algorithm.

PGA is the non-linear counterpart of the well-known Principal Component Analysis

(PCA), and is applicable to manifold-valued data. However, the existing PGA algorithms

are computationally very expensive, especially for very large data. Using our incremental

method, we show considerable gains in computation time over the standard PGA

algorithm, while retaining the same accuracy.

CHAPTER 1
INTRODUCTION

In many applications in computer vision, machine learning and medical imaging,

features do not belong to a vector space. For instance, a unit-norm constraint is

frequently imposed on a group of vectors, but it is easy to verify that the set of

unit-norm vectors is not closed under linear operations. Therefore,

these types of data are best interpreted as features belonging to some manifold. To

mention a few examples, Symmetric Positive Definite (SPD) matrices, which frequently appear in

computer vision and medical imaging, belong to a Riemannian manifold with negative

sectional curvature [47], and most of the popular image features such as SIFT [32] are often

defined on spheres, due to normalization.
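
As a quick numerical check of the closure claim above, the following minimal sketch shows that the sum of two unit vectors is generally not a unit vector, and that it has to be re-normalized to return to the sphere. The vectors `u` and `v` are arbitrary illustrative choices, not data from this dissertation.

```python
import numpy as np

# Two unit-norm vectors in R^3 (illustrative values, assumed for this sketch).
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])

# A linear combination of unit vectors leaves the unit sphere ...
s = u + v
norm_of_sum = np.linalg.norm(s)   # equals sqrt(2), not 1

# ... so it must be projected back to satisfy the unit-norm constraint again.
s_on_sphere = s / norm_of_sum
```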

Statistical analysis of manifold-valued features is encountered in most of the

applications mentioned above, either to characterize the uncertainty of the noisy data,

or to compare and classify the observations in group difference and longitudinal studies.

However, due to the lack of the vector space structure, standard statistical analysis tools,

e.g., arithmetic mean, Principal Component Analysis (PCA), etc., cannot be directly

applied to a group of these features. In this dissertation, we introduce computationally

efficient tools for statistical analysis of a given population of manifold-valued data. This is

achieved by developing incremental algorithms for computing the statistics.

Finding the mean of a population of manifold-valued features has received a lot of

attention in recent years. Computing the mean of data lying on a manifold can be

achieved through minimization of the sum of squared geodesic distances between the

manifold-valued data points and the unknown mean. Mathematically speaking, for a set

of given points, xi , on a Riemannian manifold M,

$$\mu^* = \arg\min_{\mu \in M} \sum_{i=1}^{n} d^2(x_i, \mu) \tag{1–1}$$

This cost function is usually called the Frechet function in the literature, and its global

minimizer is referred to as the Frechet mean [15]. The uniqueness of the Frechet mean

for general manifolds cannot be guaranteed unless some conditions are satisfied [52].

Consequently, any point that is a local minimizer of the above sum of squared distances

is known as a Karcher mean. For Riemannian manifolds with non-positive sectional

curvatures, Cartan showed that the Frechet mean always exists and is unique [38, p.

222]. Later, Grove and Karcher in [16] tried to generalize Cartan’s theorem, and proved

the uniqueness of this center of mass in general Riemannian manifolds, but for the

samples within a geodesic ball with small enough radius. We refer the interested reader

to [1, 15, 52] for further details.
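
To make the non-uniqueness remark concrete, the sketch below evaluates the Frechet function of (1–1) on the unit sphere for two antipodal samples. The helper names `geodesic_dist` and `frechet_function`, and the choice of samples, are assumptions made for this illustration. Every point on the equator attains the same value π²/2, which is smaller than the value π² at either pole, so under these assumptions the minimizer is not unique.

```python
import numpy as np

def geodesic_dist(x, y):
    """Great-circle distance between two unit vectors x and y."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def frechet_function(mu, points):
    """Sum of squared geodesic distances from mu to the samples, as in (1-1)."""
    return sum(geodesic_dist(x, mu) ** 2 for x in points)

# Two antipodal samples on the unit sphere in R^3.
north = np.array([0.0, 0.0, 1.0])
south = np.array([0.0, 0.0, -1.0])
samples = [north, south]

# Candidate means spread along the equator, plus one of the samples itself.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
equator = [np.array([np.cos(t), np.sin(t), 0.0]) for t in angles]
values_on_equator = [frechet_function(mu, samples) for mu in equator]
value_at_pole = frechet_function(north, samples)
```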

Among various examples of Riemannian manifolds, we are particularly interested in

the statistical analysis of the features lying on one of these two well-known manifolds,

which widely appear in computer vision, medical image analysis and machine learning

literature: (i) the space of (n × n) Symmetric Positive Definite (SPD) matrices which is

denoted by P(n), and is a Riemannian manifold with negative sectional curvature [47],

(ii) the k-dimensional unit sphere embedded in Rk+1, which is denoted by Sk , and is a

standard instance of positively curved spaces [11].

Symmetric Positive Definite (SPD) matrices have been widely used in many

computer vision and medical imaging applications. For instance, structure tensors

and covariance descriptors are ubiquitous in computer vision problems, including but

not limited to classification, object tracking and recognition. Also, in medical imaging,

they are often encountered in Diffusion Tensor Imaging (DTI), Conductance Imaging,

elastography, etc. In DTI, they are used to characterize the diffusion of water molecules;

in elastography, the elasticity tensor is used to describe the material properties of the

tissue, and so on. Cauchy-Green deformation tensors are another example

of such matrices which appear in fluid and solid mechanics.

On the other hand, spherical features are frequently used in many applications in

computer vision and machine learning. To mention a few examples, any probability density

function can be parameterized using its square root, thus mapping it to a point

on a hyper-sphere in an infinite dimensional Hilbert space [45]. (3 × 3) orthogonal

matrices can be represented by unit quaternions, which are points on the unit sphere S3

embedded in R4 [18]. Also, any directional feature, due to normalization, inherently lies on a

3-dimensional unit sphere [33].

It is known that the geodesic distance computation on P(n) is computationally

inefficient, especially for large matrix dimensions. The Stein distance is a recently

proposed alternative [9], which is more efficient. However, the lack of a closed form solution

for the Stein mean of more than two SPD matrices makes it less appealing, because

iterative optimization techniques must be employed to compute the mean. In Chapter

2, we present a novel incremental algorithm to compute the Frechet mean of a group

of SPD matrices, based on the Stein distance. Through several synthetic and real

data experiments, we demonstrate significant time gains achieved by our incremental

method, compared to its non-incremental counterpart, while the accuracy of the two

methods is very similar.

Further, in Chapter 3, the incremental Frechet mean estimator for data lying

on the sphere is presented. The existing incremental mean computation techniques in the

literature, e.g., [6, 21, 30, 46], are applicable to non-positively curved Riemannian

manifolds, while the sphere is a space with positive sectional curvature [11]. Therefore,

convergence results in the aforementioned references are not directly applicable to this

case. We analytically prove the convergence of the incremental estimator to the true

Frechet mean for symmetric distributions, when the number of samples tends to infinity.

To the best of our knowledge, there are no similar convergence results for positively

curved manifolds in the literature. We demonstrate the efficiency of our incremental

method, in several applications.

Principal Component Analysis (PCA) is a well-known statistical analysis tool which

is widely used in the literature. The non-linear version of PCA is called Principal Geodesic

Analysis (PGA) and was first introduced in [14]. PGA has been applied to many

problems in the past decade. To mention a few, in medical imaging literature, it was

used in [13, 14, 57] and [55] for statistical shape analysis and tensor field classification,

respectively. Also, in computer vision it was applied to facial gender classification [53]

and motion compression [48]. We continue the statistical analysis of manifold-valued

data, by presenting a novel incremental PGA (iPGA) algorithm for both a population of

SPD tensor fields, as well as spherical features, in Chapter 4. To this end, we present a

novel iPGA method using the incremental Frechet mean estimation technique presented

in [21], and reformulate the PGA algorithm in [55] in an incremental form. In order

to illustrate the effectiveness and accuracy of the proposed method we compare the

performance of iPGA and the batch-mode PGA via synthetic and real data experiments.

CHAPTER 2
INCREMENTAL ESTIMATION OF THE STEIN CENTER OF SPD MATRICES AND ITS APPLICATIONS

2.1 Background

Finding the mean of data lying on Pn can be achieved through a minimization

process. More formally, the mean of a set of N data xi ∈ Pn is defined by

$$x^* = \arg\min_{x} \sum_{i=1}^{N} d^2(x_i, x) \tag{2–1}$$

where d is the chosen distance/divergence. Depending on the choice of d , different

types of means are obtained. Many techniques have been published on computing

the mean SPD matrix based on different kinds of similarity distances/divergences. In

[51], symmetrized Kullback-Leibler divergence was used to measure the similarities

between SPD matrices, and the mean was computed in closed-form and applied to

texture and diffusion tensor image (DTI) segmentation. The Frechet mean was obtained

by using the GL-invariant (GL denotes the general linear group, i.e., the group of (n × n)

invertible matrices) Riemannian metric on Pn and used for DTI segmentation in [28]

and for interpolation in [34]. Another popular distance is the so called Log-Euclidean

distance introduced in [12] and used for computing the mean. More recently, in [9] the

LogDet divergence was introduced and applied for tensor clustering and covariance

tracking. Each of these distances and divergences possesses its own properties

with regard to invariance to group transformations/operations. For instance, the

natural geodesic distance derived from the GL-invariant metric is GL-invariant. The

Log-Euclidean distance is invariant to the group of rigid motions, and so on. Among

these distances/divergences, the LogDet divergence was shown in [9] to possess interesting

bounding properties with regard to the natural Riemannian distance and to be much

more computationally attractive for computing the mean. However, no closed form

expression exists for computing the mean using the LogDet divergence, for more than

two matrices. When the number of samples in the population is large and the size of the

SPD matrices is large, it would be desirable to have a computationally more attractive

algorithm for computing the mean using this divergence.

© 2013 IEEE. Reprinted with minor changes, with permission, from H. Salehian, G. Cheng, B.C. Vemuri and J. Ho, "Recursive Estimation of the Stein Center of SPD Matrices and Its Applications", In Computer Vision (ICCV), 2013 IEEE International Conference on, pp. 1793-1800. IEEE, December 2013. [39]

An incremental form can effectively address this problem. Incremental formulation

leads to considerable efficiency in mean computation, because for each new sample,

all one needs to do is to update the old mean. Consequently, the algorithm only needs to

keep track of the most recently computed mean, while computing the mean in a batch

mode requires one to store all previously given samples. This can prove to be quite

storage intensive for large problems. Thus, by using an incremental formula we can

significantly reduce the time and storage consumption. Recently, in [6] recursive

algorithms to estimate the mean SPD matrix based on the natural GL-invariant

Riemannian metric and symmetrized KL-divergence were proposed and applied to

the task of DTI segmentation. Also in [54] a recursive form of Log-Euclidean based

mean was introduced. In this chapter we present a novel incremental algorithm for

computing the mean of a set of SPD matrices, using the Stein metric.

The Jensen-Bregman LogDet (JBLD) divergence was recently introduced in [9] for

(n × n) SPD matrices. Compared to the standard approaches, the JBLD has a much

lower computational cost since the formula does not require any eigen decompositions

of the SPD matrices. Moreover, it has been shown to be useful for nearest

neighbor retrieval [9]. However, JBLD is not a metric on Pn, since it does not satisfy the

triangle inequality. In [44] the authors proved that the square root of JBLD is a metric,

which is called the Stein metric. Unfortunately, the mean of SPD matrices based on the

Stein metric cannot be computed in a closed form, for more than two matrices [5, 9].

Therefore, iterative optimization schemes are applied to find the mean for a given set

of SPD matrices. The computational efficiency of these iterative schemes is affected

considerably, especially when the number of samples and the size of the matrices are large. This

makes the Stein based mean inefficient for computer vision applications which deal with

huge amounts of data. In this chapter, we introduce an efficient incremental formula

to compute the Stein mean. To illustrate the effectiveness of the proposed algorithm we

first show that applying the incremental Stein mean estimator to the task of K-means

clustering leads to significant gain in compute time when compared to using the batch

mode Stein center, as well as other recursive mean estimators based on aforementioned

distances/divergences. Furthermore, we develop a novel hashing technique which is a

generalization of the work in [20] to SPD matrices.

The key contributions are: (i) derivation of a closed form solution to the weighted

Stein center of two matrices which is then used in the formulation of the incremental

form for the Stein center estimation of more than two SPD matrices. (ii) Empirical

evidence of convergence of the incremental estimator of Stein mean to the true

Stein mean is shown. (iii) A new hashing technique for image indexing and retrieval

using covariance descriptors. (iv) Synthetic and real data experiments depicting

significant gains in computation time for SPD matrix clustering and image retrieval

(using covariance descriptor features), using our incremental Stein center estimator.

The rest of this chapter is organized as follows: in Section 2.2 we present the

incremental algorithm to find the Stein distance based mean of a set of SPD matrices.

Then in Section 2.3 we provide an overview of the important properties of Pn equipped

with the Stein distance. Section 2.4 includes the empirical evidence of the convergence

of the incremental Stein mean estimator to the true Stein mean. Further, we present a set of

synthetic and real data experiments showing the improvements in compute time of SPD

matrix clustering and hashing.

2.2 Incremental Stein Mean Computation

The action of the general linear group of n × n invertible matrices (denoted by GL(n))

on Pn defines the natural group action and is defined as follows: for all g ∈ GL(n) and X ∈ Pn,

$X[g] = g X g^T$, where T denotes the matrix transpose operation. Let A and B be

any two points in Pn. The geodesic distance on this manifold, induced by the

GL(n)-invariant Riemannian metric, is given by

$$d_R(A, B)^2 = \operatorname{trace}\left(\operatorname{Log}(A^{-1}B)^2\right), \tag{2–2}$$

where Log is the matrix logarithm. The mean of a set of N SPD matrices based on the

above Riemannian metric is called the Frechet mean, and is defined as

$$X^* = \arg\min_{X} \sum_{i=1}^{N} d_R^2(X, X_i), \tag{2–3}$$

where X ∗ is the Frechet mean, and Xi are the given matrix-valued data. However,

computation of the distance using (2–2), requires eigen decomposition of the matrix,

which for large matrices slows down the computation considerably. Furthermore, the

minimization problem (2–3) does not have a closed form solution in general (for more

than two matrices) and iterative schemes such as the gradient descent technique are

employed to find the solution.
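
The eigen decomposition cost mentioned above can be seen in a minimal sketch of (2–2). The function names `riemannian_distance` and `random_spd` are hypothetical, and the Cholesky-based whitening is one possible way to evaluate the formula, not necessarily the one used in this work. It relies on the fact that $A^{-1}B$ and $L^{-1} B L^{-T}$ (with $A = LL^T$) share the same positive eigenvalues $\lambda_i$, so that $d_R(A,B)^2 = \sum_i \log^2 \lambda_i$.

```python
import numpy as np

def random_spd(n, rng):
    """Draw a well-conditioned random SPD matrix (helper for illustration)."""
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

def riemannian_distance(A, B):
    """GL(n)-invariant distance of (2-2): sqrt(trace(Log(A^{-1} B)^2))."""
    L = np.linalg.cholesky(A)          # A = L L^T
    L_inv = np.linalg.inv(L)
    S = L_inv @ B @ L_inv.T            # symmetric, same eigenvalues as A^{-1} B
    S = 0.5 * (S + S.T)                # remove round-off asymmetry
    eigvals = np.linalg.eigvalsh(S)    # the eigen decomposition step
    return np.sqrt(np.sum(np.log(eigvals) ** 2))
```

Under these assumptions, `riemannian_distance(g @ A @ g.T, g @ B @ g.T)` should agree with `riemannian_distance(A, B)` for any invertible `g`, which is the GL(n)-invariance stated in the text.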

Recently in [9], the Jensen-Bregman LogDet (JBLD) divergence was introduced to

measure similarity/dissimilarity between SPD matrices. It is defined as

$$D_{LD}(A, B) = \log\det\left(\frac{A + B}{2}\right) - \frac{1}{2}\log\det(AB), \tag{2–4}$$

where A and B are two given SPD matrices. It can be seen that JBLD is much more

computationally efficient than the Riemannian metric, as no eigen decomposition

is required. JBLD is not a metric, however, because it does not satisfy the triangle

inequality. Nevertheless, in [44], it was shown that the square root of the JBLD divergence is a

metric, i.e., it is non-negative definite, symmetric and satisfies the triangle inequality.

This new metric is called the Stein metric and is defined by

$$d_S(A, B) = \sqrt{D_{LD}(A, B)}, \tag{2–5}$$

where DLD is defined in (2–4). Clearly, the Stein metric can also be computed efficiently.

Accordingly, the mean of a set of SPD tensors, based on the Stein metric, is defined by

$$X^* = \arg\min_{X} \sum_{i=1}^{N} d_S^2(X, X_i). \tag{2–6}$$
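
The divergence in (2–4) and the metric in (2–5) can be sketched in a few lines, assuming NumPy's `slogdet` for the log-determinants. The function names `jbld` and `stein_distance` are hypothetical. The sketch uses $\log\det(AB) = \log\det A + \log\det B$ and involves no eigen decomposition.

```python
import numpy as np

def jbld(A, B):
    """Jensen-Bregman LogDet divergence of (2-4) for SPD matrices A and B."""
    _, logdet_mid = np.linalg.slogdet(0.5 * (A + B))
    _, logdet_a = np.linalg.slogdet(A)
    _, logdet_b = np.linalg.slogdet(B)
    # log det(AB) = log det(A) + log det(B)
    return logdet_mid - 0.5 * (logdet_a + logdet_b)

def stein_distance(A, B):
    """Stein metric of (2-5): square root of the JBLD divergence."""
    # Clamp at zero to guard against tiny negative values from round-off.
    return np.sqrt(max(jbld(A, B), 0.0))
```

For 1 × 1 inputs the formula can be checked by hand: with A = 1 and B = 4, (2–4) gives log(2.5) − ½ log(4) = log(1.25).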

Let X1, X2, ..., XN ∈ Pn be a set of SPD matrices. The incremental Stein mean can be

defined as

$$M_1 = X_1 \tag{2–7}$$

$$M_{k+1}(w_{k+1}) = \arg\min_{M} \; (1 - w_{k+1})\, d_S^2(M_k, M) + w_{k+1}\, d_S^2(X_{k+1}, M) \tag{2–8}$$

where $w_{k+1} = \frac{1}{k+1}$, Mk is the old mean of k SPD matrices, Xk+1 is the new incoming

sample and Mk+1 is the updated mean for k +1 matrices. Note that (2–8) can be thought

of as a weighted Stein mean between the old mean and the new sample point, with the

weight being set to be the same as in Euclidean mean update.
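
For reference, the Euclidean update that motivates this choice of weight can be sketched as follows. The random data and variable names are illustrative assumptions. With $w_{k+1} = 1/(k+1)$, the running average reproduces the batch arithmetic mean while storing only the current estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal((50, 3))   # illustrative Euclidean data

mean = samples[0].copy()                 # M_1 = X_1
for k in range(1, len(samples)):
    w = 1.0 / (k + 1)                    # w_{k+1} = 1 / (k + 1)
    # Convex combination of the old mean and the new sample.
    mean = (1.0 - w) * mean + w * samples[k]

batch_mean = samples.mean(axis=0)        # non-incremental counterpart
```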

Now, we show that (2–8) has a closed form solution for SPD matrices. Let A and B
be two matrices in $P_n$. The weighted mean of A and B, denoted by C, with the weights
being $w_a$ and $w_b$ such that $w_a + w_b = 1$, should minimize the objective in (2–8)
(with $A = M_k$, $B = X_{k+1}$, $w_a = 1 - w_{k+1}$ and $w_b = w_{k+1}$). Therefore, one can
compute the gradient of this objective function and set it to zero to find the minimizer C:

$$w_a\left[\left(\frac{C+A}{2}\right)^{-1} - C^{-1}\right] + w_b\left[\left(\frac{C+B}{2}\right)^{-1} - C^{-1}\right] = 0 \tag{2–9}$$

Multiplying both sides of (2–9) by the matrices C, C + A and C + B in the right order yields:

$$C A^{-1} C + (w_b - w_a)\, C\,(I - A^{-1}B) - B = 0 \tag{2–10}$$
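The order of the multiplications is not spelled out in the text. One order that leads from (2–9) to (2–10) is sketched below; it is a reconstruction under the stated constraint $w_a + w_b = 1$, not necessarily the route taken by the author.

```latex
\begin{align*}
2w_a (C+A)^{-1} + 2w_b (C+B)^{-1} &= C^{-1}
  && \text{(2--9), using } w_a + w_b = 1 \\
2w_a (C+B) + 2w_b (C+A) &= (C+A)\,C^{-1}(C+B)
  && \text{left-multiply by } C+A,\ \text{right-multiply by } C+B \\
2C + 2w_a B + 2w_b A &= C + A + B + A C^{-1} B
  && \text{expand both sides} \\
C + (w_b - w_a)(A - B) - A C^{-1} B &= 0
  && \text{since } 2w_a - 1 = w_a - w_b \\
C A^{-1} C + (w_b - w_a)\, C\,(I - A^{-1}B) - B &= 0
  && \text{left-multiply by } C A^{-1}
\end{align*}
```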

It can be verified that for any matrices A, B and C in $P_n$ satisfying (2–10), the matrices
$A^{-1/2} C A^{-1/2}$ and $A^{-1/2} B A^{-1/2}$ commute. In other words,

$$A^{-1} C A^{-1} B = A^{-1} B A^{-1} C \tag{2–11}$$

Left multiplication of (2–10) by $A^{-1}$ yields

$$A^{-1}CA^{-1}C + (w_b - w_a)\,A^{-1}C\,(I - A^{-1}B) = A^{-1}B \tag{2–12}$$

The equation above can be rewritten in a matrix quadratic form as the following, by using
the equality in (2–11):

$$\left(A^{-1}C + \frac{w_b - w_a}{2}\,(I - A^{-1}B)\right)^2 = A^{-1}B + \frac{(w_b - w_a)^2}{4}\,(I - A^{-1}B)^2 \tag{2–13}$$

Taking the square root of both sides and rearranging yields

$$A^{-1}C = \sqrt{A^{-1}B + \frac{(w_b - w_a)^2}{4}\,(I - A^{-1}B)^2} - \frac{w_b - w_a}{2}\,(I - A^{-1}B) \tag{2–14}$$

Therefore, the solution of (2–10) for C can be written in the following closed form

$$C = A\left[\sqrt{A^{-1}B + \frac{(w_b - w_a)^2}{4}\,(I - A^{-1}B)^2} - \frac{w_b - w_a}{2}\,(I - A^{-1}B)\right] \tag{2–15}$$

It can be verified that the solution in (2–15) satisfies Eq. (2–11). Therefore, Eq. (2–8) for
incremental Stein mean estimation can be rewritten as

$$M_{k+1} = M_k\left[\sqrt{M_k^{-1}X_{k+1} + \frac{(2w_{k+1} - 1)^2}{4}\,(I - M_k^{-1}X_{k+1})^2} - \frac{2w_{k+1} - 1}{2}\,(I - M_k^{-1}X_{k+1})\right] \tag{2–16}$$

with $w_{k+1}$, $M_k$, $M_{k+1}$ and $X_{k+1}$ being the same as in (2–8).
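The closed-form update can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the author's implementation: the function names are hypothetical, and instead of taking the square root of the non-symmetric matrix $A^{-1}B$ directly, the code uses the algebraically equivalent symmetric form $C = A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2}$, where f is the scalar version of (2–15) applied to the eigenvalues.

```python
import numpy as np

def _sym_power(A, p):
    """Return A**p for a symmetric positive definite matrix A."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals**p) @ vecs.T

def weighted_stein_mean(A, B, w_b):
    """Closed-form minimizer C of (1 - w_b) d_S^2(A, C) + w_b d_S^2(B, C)."""
    delta = 2.0 * w_b - 1.0                    # w_b - w_a with w_a = 1 - w_b
    A_half = _sym_power(A, 0.5)
    A_inv_half = _sym_power(A, -0.5)
    S = A_inv_half @ B @ A_inv_half            # symmetric, similar to A^{-1} B
    s, Q = np.linalg.eigh((S + S.T) / 2.0)
    # Scalar form of Eq. (2-15), applied to each eigenvalue of A^{-1} B.
    f = np.sqrt(s + 0.25 * delta**2 * (1.0 - s)**2) - 0.5 * delta * (1.0 - s)
    C = A_half @ ((Q * f) @ Q.T) @ A_half
    return (C + C.T) / 2.0                     # remove round-off asymmetry

def incremental_stein_mean(samples):
    """Run Eqs. (2-7) and (2-16) over a sequence of SPD matrices."""
    M = None
    for k, X in enumerate(samples):
        # k samples are already absorbed, so the new weight is 1 / (k + 1).
        M = X.copy() if k == 0 else weighted_stein_mean(M, X, 1.0 / (k + 1))
    return M
```

Each update costs a fixed number of eigendecompositions of n × n matrices, independent of how many samples have already been absorbed. A quick sanity check is that the returned C satisfies the stationarity condition (2–9).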

2.3 Properties of Pn Equipped with the Stein Distance

In this section we briefly remark on the metric geometry of Pn equipped with the

Stein metric. Both the Stein metric dS and the GL(n)-invariant Riemannian metric dR

are GL(n)-invariant. However, their similarity does not go beyond this GL(n)-invariance.

In particular, we first show in this section that Pn equipped with the Stein metric is not

a global Non-Positive Curvature (NPC) space defined in [46]. Lack of this important

property makes it impossible to directly apply the convergence results of the incremental

mean estimators on global NPC spaces, provided in [46], to our incremental Stein mean

estimator. However, we will show that the Stein metric still shares important similarities

and features with global NPC spaces that can serve as a strong piece of evidence in favor

of the algorithm’s convergence.

2.3.1 Global Non-Positive Curvature Spaces

In [46], Sturm provided a study of probability theory on metric spaces of
non-positive curvature (so-called global NPC spaces). An important requirement for
this type of space is that, aside from being a metric space, the distance between two
arbitrary points in the space M, denoted by $d_M$, can be realized as the arc-length of a
length-minimizing path (geodesic) joining the two points. Non-positive curvature, in this
broader context, is formulated using several important inequalities, the foremost
of which is the following inequality among three arbitrary points x, y, z ∈ M and the
geodesic path γ(t) joining x, y (with γ(0) = x, γ(1) = y):

$$d_M^2(z, \gamma(t)) \le (1-t)\, d_M^2(z, x) + t\, d_M^2(z, y) - t(1-t)\, d_M^2(x, y). \tag{2–17}$$

This important inequality then implies the following well-known Reshetnyak’s quadruple
comparison: for all $x_1, x_2, x_3, x_4 \in M$, we have

$$d_M^2(x_1, x_3) + d_M^2(x_2, x_4) \le d_M^2(x_2, x_3) + d_M^2(x_1, x_4) + d_M^2(x_1, x_2) + d_M^2(x_3, x_4).$$

Figure 2-1. Schematic view of x1, x2, x3, x4 in Reshetnyak’s quadruple comparison.

Reshetnyak’s quadruple comparison is a particularly useful result for deducing important

theorems for global NPC spaces (see [46] and the references therein). In particular,

for any global NPC space M and a set of samples, x1, x2, ... defined on M, its Frechet

mean (or barycenter in [46]) will be a unique point on M. Besides, the incremental

mean estimator (similar to [6]) will asymptotically converge to the true Frechet mean.

Proposition 2.1. Pn with Stein metric is not a global NPC space.

Proof. (Sketch) Proposition 2.3 in [46] states that if a metric space (M,dM) is a global

NPC space, then it is a geodesic space. However we show in the following proposition

that (Pn, dS) is not a geodesic space.

Proposition 2.2. Let x, y be two arbitrary points in $P_n$. Their midpoints, $m_a$, $m_s$, with
respect to the affine-invariant Riemannian metric and the Stein metric, respectively,
coincide:

$$m_a = m_s.$$

However, in general, we have $d_S(x, m_s) = d_S(y, m_s)$ but

$$d_S(x, m_s) \neq \frac{1}{2}\, d_S(x, y).$$

Proof. (Sketch) The coincidence of the midpoints is a consequence of [5]. The difference
between $d_S(x, m_s)$ and $\frac{1}{2} d_S(x, y)$ can be easily shown with a counter-example. Let x = 1
and y = 4, where $x, y \in P_1$; then the coincidence of the midpoints implies that $m_s = m_a = 2$.
But it can be verified that $d_S(x, m_s) = d_S(y, m_s) = 0.2427$, while $\frac{1}{2} d_S(x, y) = 0.2362$,
hence $d_S(x, m_s) \neq \frac{1}{2} d_S(x, y)$. Therefore, based on Proposition 1.2 in [46], $(P_n, d_S)$ is not
a geodesic space.
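The counter-example is easy to reproduce. The short script below evaluates the scalar Stein distance for x = 1, y = 4 and their midpoint 2; the helper name `stein_distance_scalar` is introduced here only for illustration.

```python
import math

def stein_distance_scalar(x, y):
    """Stein distance between positive reals, i.e. 1 x 1 SPD matrices."""
    return math.sqrt(math.log((x + y) / 2.0) - 0.5 * math.log(x * y))

x, y = 1.0, 4.0
m = math.sqrt(x * y)                             # shared midpoint, here 2
d_xm = stein_distance_scalar(x, m)
d_ym = stein_distance_scalar(y, m)
half_d_xy = 0.5 * stein_distance_scalar(x, y)
print(round(d_xm, 4), round(d_ym, 4), round(half_d_xy, 4))
# prints 0.2427 0.2427 0.2362
```

The midpoint is equidistant from both endpoints, but that distance exceeds half of the endpoint distance, so no point can split the Stein distance evenly the way a geodesic midpoint would.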

However, the following proposition illustrates that Pn with Stein metric shares an

important similarity with global NPC spaces, although it is not one.

Proposition 2.3. Pn with Stein metric satisfies Reshetnyak’s quadruple comparison. In

other words, for all x1, x2, x3, x4 ∈ Pn, the inequality in 2.3.1 is satisfied.

To prove the proposition, we will make use of the following lemmas.

Lemma 1. For any quadruple of positive real numbers (matrices in P1) the Reshetnyak’s

inequality holds.

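Proof. For positive real numbers x and y, the Stein distance can be rewritten as:

$$d_S(x, y) = \sqrt{\log\frac{x + y}{2\sqrt{xy}}} \tag{2–18}$$

Therefore, Reshetnyak’s inequality can be expressed by the following chain of equivalent
inequalities between real log functions

$$\begin{aligned}
& \log\frac{x_1 + x_3}{2\sqrt{x_1x_3}} + \log\frac{x_2 + x_4}{2\sqrt{x_2x_4}} \le \log\frac{x_1 + x_2}{2\sqrt{x_1x_2}} + \log\frac{x_2 + x_3}{2\sqrt{x_2x_3}} + \log\frac{x_3 + x_4}{2\sqrt{x_3x_4}} + \log\frac{x_4 + x_1}{2\sqrt{x_4x_1}} \\
\Leftrightarrow\; & \log\frac{(x_1 + x_3)(x_2 + x_4)}{4\sqrt{x_1x_2x_3x_4}} \le \log\frac{(x_1 + x_2)(x_2 + x_3)(x_3 + x_4)(x_4 + x_1)}{16\,x_1x_2x_3x_4} \\
\Leftrightarrow\; & \frac{(x_1 + x_3)(x_2 + x_4)}{4\sqrt{x_1x_2x_3x_4}} \le \frac{(x_1 + x_2)(x_2 + x_3)(x_3 + x_4)(x_4 + x_1)}{16\,x_1x_2x_3x_4} \\
\Leftrightarrow\; & \sqrt{\frac{x_1x_2}{x_3x_4}} + \sqrt{\frac{x_3x_4}{x_1x_2}} + \sqrt{\frac{x_2x_3}{x_1x_4}} + \sqrt{\frac{x_1x_4}{x_2x_3}} \\
& \quad \le \frac{1}{4}\left(\sqrt{\frac{x_1x_3}{x_2x_4}} + \sqrt{\frac{x_2x_4}{x_1x_3}} + \sqrt{\frac{x_2x_3}{x_1x_4}} + \sqrt{\frac{x_1x_4}{x_2x_3}}\right)\left(\sqrt{\frac{x_2x_4}{x_1x_3}} + \sqrt{\frac{x_1x_3}{x_2x_4}} + \sqrt{\frac{x_1x_2}{x_3x_4}} + \sqrt{\frac{x_3x_4}{x_1x_2}}\right) \\
\Leftrightarrow\; & \left(a + \frac{1}{a} + b + \frac{1}{b}\right) \le \frac{1}{4}\left(b + \frac{1}{b} + c + \frac{1}{c}\right)\left(a + \frac{1}{a} + c + \frac{1}{c}\right)
\end{aligned} \tag{2–19}$$

where $a = \sqrt{\frac{x_1x_2}{x_3x_4}}$, $b = \sqrt{\frac{x_1x_4}{x_2x_3}}$ and $c = \sqrt{\frac{x_1x_3}{x_2x_4}}$.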

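But, for any positive number x, $x + \frac{1}{x} \ge 2$. Therefore,

$$A = a + \frac{1}{a} \ge 2, \qquad B = b + \frac{1}{b} \ge 2, \qquad C = c + \frac{1}{c} \ge 2.$$

So, the inequality 2–19 can be rewritten as

$$\begin{aligned}
& 4(A + B) \le (C + A)(C + B) \\
\Leftrightarrow\; & C^2 + C(A + B) + AB - 4(A + B) \ge 0 \\
\Leftrightarrow\; & (A + B)(C - 4) + C^2 + AB \ge 0
\end{aligned} \tag{2–20}$$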

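This inequality follows from A ≥ 2, B ≥ 2 and C ≥ 2. Since (C + A)(C + B) is increasing
in C for positive A and B, it is bounded below by its value at C = 2:

$$(C + A)(C + B) \ge (2 + A)(2 + B) = 4(A + B) + (A - 2)(B - 2) \ge 4(A + B), \tag{2–21}$$

where the last step uses (A − 2)(B − 2) ≥ 0. This is the first line of Eq. 2–20, which
shows the correctness of Eq. 2–20.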
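Lemma 1 can also be spot-checked numerically. The sketch below draws random positive quadruples over several orders of magnitude and records the smallest gap between the two sides of Reshetnyak’s inequality; the sampling range and the function names are arbitrary choices made for this illustration, and a numerical check of this kind complements the proof rather than replacing it.

```python
import math
import random

def stein_sq(x, y):
    """Squared Stein distance between two positive reals, from Eq. (2-18)."""
    return math.log((x + y) / (2.0 * math.sqrt(x * y)))

def reshetnyak_gap(x1, x2, x3, x4):
    """Right-hand side minus left-hand side of the quadruple comparison."""
    lhs = stein_sq(x1, x3) + stein_sq(x2, x4)
    rhs = (stein_sq(x1, x2) + stein_sq(x2, x3)
           + stein_sq(x3, x4) + stein_sq(x4, x1))
    return rhs - lhs

random.seed(0)
gaps = [reshetnyak_gap(*(math.exp(random.uniform(-5, 5)) for _ in range(4)))
        for _ in range(10000)]
min_gap = min(gaps)  # Lemma 1 predicts that this is never negative
```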

Lemma 2. For any quadruple of diagonal matrices on Pn, the Reshetnyak’s inequality is

satisfied.

Proof. The previous result can be immediately extended to the diagonal matrices
in $P_n$. Let X and Y be diagonal matrices, and $x_i$ and $y_i$ be their diagonal elements,
respectively. Then the Stein distance between X and Y can be obtained as

$$d_S^2(X, Y) = \sum_{i=1}^{n}\left[\log\left(\frac{x_i + y_i}{2}\right) - \frac{1}{2}\log(x_i y_i)\right] = \sum_{i=1}^{n} d_S^2(x_i, y_i)$$

Now, let X, Y, Z and W be diagonal matrices, with diagonal elements $x_i$, $y_i$, $z_i$
and $w_i$, respectively. Based on Lemma 1, the inequality is satisfied for each i, resulting in n
inequalities for real numbers. Summing up these inequalities and using the identity above
completes the proof.

Lemma 3. Let A and B be two SPD matrices. There is a matrix P for which $P^TAP = I$
and $P^TBP = D^{\downarrow}$, where I is the identity matrix and $D^{\downarrow}$ is a diagonal matrix whose
diagonal elements are sorted in decreasing order.

Proof. (Based on the intuition from [44]) Let $A = U\Lambda U^T$, and define $S = \Lambda^{-1/2}U$. Now
define $C = S^TU^TBUS$; since C is symmetric, there exists an orthogonal matrix V such that
$C = VD^{\downarrow}V^T$, where $D^{\downarrow}$ is diagonal with elements sorted in decreasing order.

The proof follows by setting P = USV, because:

$$P^TAP = V^TS^TU^TU\Lambda U^TUSV = V^TU^T\Lambda^{-1/2}\Lambda\Lambda^{-1/2}UV = I \tag{2–22}$$

also, by construction of P,

$$P^TBP = V^TS^TU^TBUSV = V^TCV = D^{\downarrow} \tag{2–23}$$
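The construction in Lemma 3 can be sketched with two symmetric eigendecompositions. The code below is a variant of the proof's construction, not a literal transcription: it uses the whitening matrix $W = U\Lambda^{-1/2}$ in place of the product US, which is an assumption made here for brevity, and the function name is hypothetical.

```python
import numpy as np

def simultaneous_diagonalizer(A, B):
    """Return P with P.T @ A @ P = I and P.T @ B @ P diagonal, decreasing."""
    lam, U = np.linalg.eigh(A)           # A = U diag(lam) U^T
    W = U / np.sqrt(lam)                 # whitening step: W.T @ A @ W = I
    C = W.T @ B @ W                      # still symmetric positive definite
    d, V = np.linalg.eigh((C + C.T) / 2.0)
    V = V[:, ::-1]                       # eigh sorts ascending; flip to decreasing
    return W @ V
```

Because V is orthogonal, the second step leaves the identity obtained in the first step untouched while diagonalizing the transformed B.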

Proof of Proposition 2.3. Let $A_1$, $A_2$, $A_3$ and $A_4$ be the given quadruple. Based on
Lemma 3, there exists a matrix P such that $P^TA_1P = I$ and $P^TA_2P = D_2^{\downarrow}$, where I is the
identity matrix and $D_2^{\downarrow}$ is a diagonal matrix in which the diagonal elements are sorted in
decreasing order. Assume that $P^TA_3P = X_3$ and $P^TA_4P = X_4$. Therefore, based on the
congruence invariance of the Stein metric, it will be sufficient to prove the inequality for
the new quadruple $(I, D_2^{\downarrow}, X_3, X_4)$.

Let $X_i^{\downarrow}$ be the diagonal matrix with diagonal elements being the eigenvalues of $X_i$,
sorted in decreasing order. Based on Lemma 2, Reshetnyak’s inequality holds for the
quadruple $(I, D_2^{\downarrow}, X_3^{\downarrow}, X_4^{\downarrow})$, as all these matrices are diagonal. Mathematically,

$$d_S(I, X_4^{\downarrow})^2 + d_S(D_2^{\downarrow}, X_3^{\downarrow})^2 \le d_S(I, D_2^{\downarrow})^2 + d_S(D_2^{\downarrow}, X_4^{\downarrow})^2 + d_S(X_4^{\downarrow}, X_3^{\downarrow})^2 + d_S(X_3^{\downarrow}, I)^2 \tag{2–24}$$

Now, we want to show the inequality for $(I, D_2^{\downarrow}, X_3, X_4^{\downarrow})$, where $X_3^{\downarrow}$ is replaced by $X_3$.
To this end, we make use of the congruence invariance property of the Stein metric.
There exists a matrix Q for which $Q^TD_2^{\downarrow}Q = I$ and $Q^TX_3^{\downarrow}Q = Y_3^{\downarrow}$, where I is the identity
and $Y_3^{\downarrow}$ is a diagonal matrix with decreasing diagonal elements. Suppose I, $X_3$ and $X_4^{\downarrow}$
are moved to $Y_1$, $Y_3$ and $Y_4$ by the congruent transform Q, respectively. Based on the
congruence invariance, the inequality holds for $(Y_1, I, Y_3^{\downarrow}, Y_4)$:

$$d_S(Y_1, Y_4)^2 + d_S(I, Y_3^{\downarrow})^2 \le d_S(Y_1, I)^2 + d_S(I, Y_4)^2 + d_S(Y_4, Y_3^{\downarrow})^2 + d_S(Y_3^{\downarrow}, Y_1)^2 \tag{2–25}$$

Moreover, it has been shown in [44] that for all pairs of SPD matrices, $d_S(A, B) \ge d_S(A^{\downarrow}, B^{\downarrow})$,
and in the special case, $d_S(I, A) = d_S(I, A^{\downarrow})$. Accordingly,
$d_S(X_3^{\downarrow}, X_4^{\downarrow}) \le d_S(X_3, X_4^{\downarrow})$ and $d_S(I, X_3^{\downarrow}) = d_S(I, X_3)$. Based on the congruence invariance property,
these two relations can be extended to $d_S(Y_3^{\downarrow}, Y_4) \le d_S(Y_3, Y_4)$ and
$d_S(Y_1, Y_3^{\downarrow}) = d_S(Y_1, Y_3)$. Furthermore, in the new quadruple we can see that
$d_S(I, Y_3^{\downarrow}) = d_S(I, Y_3)$. According to these relations we can replace $Y_3^{\downarrow}$ by $Y_3$ in Eq. 2–25, which
implies that

$$d_S(Y_1, Y_4)^2 + d_S(I, Y_3)^2 \le d_S(Y_1, I)^2 + d_S(I, Y_4)^2 + d_S(Y_4, Y_3)^2 + d_S(Y_3, Y_1)^2 \tag{2–26}$$

At the end, we can apply the group action, $Q^{-1}$, to get the original quadruple, which
proves the inequality for $(I, D_2^{\downarrow}, X_3, X_4^{\downarrow})$. The sequence of the above group actions is
illustrated in Fig. 2-2. Note that the curves between each pair of points are drawn
only for demonstration of the corresponding Stein distances, and they do not represent
geodesic curves.

Figure 2-2. Illustration of the proof of Reshetnyak’s inequality for the quadruple $(I, D_2^{\downarrow}, X_3, X_4^{\downarrow})$, from the quadruple $(I, D_2^{\downarrow}, X_3^{\downarrow}, X_4^{\downarrow})$.

In the last step we will prove the inequality for $(I, D_2^{\downarrow}, X_3, X_4)$, where $X_4^{\downarrow}$ is replaced
by $X_4$. Similar to above, we apply the congruence invariance in the following manner:
there exists a matrix R for which $R^TX_3R = I$ and $R^TX_4^{\downarrow}R = Z_4^{\downarrow}$. The matrices I, $X_4$
and $D_2^{\downarrow}$ are moved to $Z_1$, $Z_4$ and $Z_2$, respectively, under this transformation. Congruence
invariance implies that

$$d_S(Z_1, Z_4^{\downarrow})^2 + d_S(Z_2, I)^2 \le d_S(Z_1, Z_2)^2 + d_S(Z_2, Z_4^{\downarrow})^2 + d_S(Z_4^{\downarrow}, I)^2 + d_S(I, Z_1)^2 \tag{2–27}$$

In a similar fashion to the last part we can say that $d_S(Z_1, Z_4^{\downarrow}) = d_S(Z_1, Z_4)$ and also
$d_S(Z_2, Z_4^{\downarrow}) \le d_S(Z_2, Z_4)$. Using these relations we will end up with the following inequality

$$d_S(Z_1, Z_4)^2 + d_S(Z_2, I)^2 \le d_S(Z_1, Z_2)^2 + d_S(Z_2, Z_4)^2 + d_S(Z_4, I)^2 + d_S(I, Z_1)^2 \tag{2–28}$$

Applying the group action, $R^{-1}$, asserts that

$$d_S(I, X_4)^2 + d_S(D_2^{\downarrow}, X_3)^2 \le d_S(I, D_2^{\downarrow})^2 + d_S(D_2^{\downarrow}, X_4)^2 + d_S(X_4, X_3)^2 + d_S(X_3, I)^2 \tag{2–29}$$

Finally, we will use the group action $P^{-1}$ to get the original quadruple

$$d_S(A_1, A_4)^2 + d_S(A_2, A_3)^2 \le d_S(A_1, A_2)^2 + d_S(A_2, A_4)^2 + d_S(A_4, A_3)^2 + d_S(A_3, A_1)^2 \tag{2–30}$$

which completes the proof. □

2.3.2 Discussion

If $P_n$ equipped with the Stein metric were a global Non-Positive Curvature (NPC)
space, the results of Sturm [46] would imply that $M_{k+1}$ obtained from (2–16) converges
to the unique Stein expectation as k → ∞. Unfortunately, as shown in this section, it is not
a geodesic space, and consequently not a global NPC space. Therefore, the proof of convergence
for our case requires further effort. However, we present empirical evidence for 100

SPD matrices randomly drawn from a log-Normal distribution to indicate that the

incremental estimates of the Stein mean converge to the batch mode Stein mean (see

Fig. 2-3).

2.4 Experiments

In this section, we present several synthetic and real data experiments. All of the

execution times reported in this section are for experiments performed on a machine

with a 2.67GHz Intel-7 CPU with 8GB RAM.

2.4.1 Performance of the Incremental Stein Center

To illustrate the performance of the proposed incremental algorithm, we generate

100 i.i.d. samples from a log-normal distribution [41] on P3 with the variance and

expectation set to 0.25 and the identity matrix respectively. Then, we input these random

samples to the incremental Stein based mean estimator (ISM) and its non-incremental

counterpart (SM). To compare the accuracy of ISM and SM we compute the Stein

distance between the ground truth and the computed estimate. Further, the computation

time for each newly acquired sample is recorded. We repeat this experiment 20 times

and plot the average error and the average computation time at each step. Fig. 2-3

depicts the accuracies of ISM and SM in the same plot. It can be seen that for the
given 100 samples, as desired, the accuracies of the incremental and non-incremental
algorithms are almost the same. It should be noted that ISM computes the new mean
using simple matrix operations, e.g., summations and multiplications, which makes it very
fast for any number of samples. This means that the incremental Stein based mean
is computationally far more efficient, especially when the number of samples is very
large and the samples are input incrementally, for example as in clustering and some
segmentation algorithms.

Figure 2-3. Error comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.

2.4.2 Application to K-means Clustering

In this section we evaluate the performance of our proposed incremental algorithm

applied to K-means clustering. The two fundamental components of the K-means

algorithm at each step are: (i) distance computation and (ii) the mean update. Due

to the computational efficiency involved in evaluating the Stein metric, the distances

can be efficiently computed. However, due to the lack of a closed form formula for

computing the Stein mean, the cluster center update is more time consuming. To tackle

this problem we employ our incremental Stein mean estimator.

Figure 2-4. Time comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.

To this end, at the end of each K-means iteration, only the matrices that changed
cluster membership in the previous iteration are considered. Then, each cluster center
is updated only by applying the changes imposed by the matrices that most recently
changed cluster memberships. For instance, let $C_1^i$ and $C_2^i$ be the centers of the first
and second clusters at the end of the i-th iteration. Also, let X be a matrix which has
moved from the first cluster to the second one. Then we can directly update $C_1^i$ by
removing X from it to get $C_1^{i+1}$, and update $C_2^i$ by adding X to it to get $C_2^{i+1}$. This
significantly decreases the computation time of the K-means algorithm, especially for
huge datasets. This process is shown in Fig. 2-5.

To illustrate the efficiency resulting from using our proposed incremental Stein

mean (ISM) update, we compared its performance to the non-incremental Stein

mean (SM), as well as the following three widely used mean computation techniques:

Frechet mean (FM), symmetric Kullback-Leibler mean (KLsM) and Log-Euclidean
mean (LEM). Furthermore, to show the effectiveness of the Stein metric in K-means
distance computation, we included comparisons to the following recursive mean
estimators recently introduced in the literature: Recursive Log-Euclidean mean (RLEM) [54],
Incremental Frechet Expectation Estimator (IFEE) and Recursive KLs mean (RKLsM)
in [6]. We should emphasize that for each of these mean estimators we used the
corresponding distance/divergence in the K-means algorithm.

Figure 2-5. Illustration of the incremental mean updates in K-means clustering.

The efficiency of the proposed K-means algorithm is investigated in the following

set of experiments. We tested our algorithm in three different scenarios namely, with

increasing (i) number of samples, (ii) matrix size, and (iii) number of clusters. For each

scenario we generated samples from a mixture of Log-normal distributions, where the

expectation of each component is assumed to be the true cluster center. To measure the

error in clustering, we compute the geodesic distance between each estimated cluster

center and its true value, and take the summation of error values over all clusters.

Fig. 2-6 depicts the time comparison between the aforementioned K-means

clustering techniques. It is clearly evident that the proposed method (ISM) is significantly

faster than other competing methods, in all the aforementioned settings of the


experiment. There are two reasons that support the time efficiency of ISM: (i) incremental

update of the Stein mean, which is achieved via the closed form expression in Eq. 2–16,

(ii) fast distance computation, by exploiting the Stein metric, as the Stein distance is

computed using a simple matrix determinant followed by a scalar logarithm, while

the Log-Euclidean, GL-invariant Riemannian distances and the KLs divergence,

require complicated matrix operations, e.g., matrix logarithm, inverse and square

root. Consequently, it can be seen in Fig. 2-6 that for large datasets, the recursive

Log-Euclidean, Frechet and KLs mean methods are as slow as their non-recursive

counterparts, since a substantial portion of time is consumed in the distance computation

task involved in the algorithm.

Furthermore, Fig. 2-7 depicts the error defined earlier, for each experiment. It can

be seen that, in all the cases, the accuracy of the ISM estimator is very close to the

other competing methods, and in particular to the non-incremental Stein mean (SM) and

Frechet mean (FM). Thus, accuracy-wise, the proposed ISM estimator is as good as the

best in the class but far more computationally efficient. These experiments verify that the

proposed incremental method is a computationally attractive candidate for the task of

K-means clustering in the space of SPD matrices.

2.4.3 Application to Image Retrieval

In this section, we present results of applying our incremental Stein mean estimator

to the image hashing and retrieval problem. To this end, we present a novel hashing

function which is a generalization of spherical hashing applied to SPD matrices. The

spherical hashing was introduced in [20] for binary encoding of large scale image

databases. However, it cannot be applied as is (without modifications) to the space of
SPD matrices, since it has been developed for inputs in a vector space. In this section
we describe our extensions to the spherical hashing technique in order to deal with
SPD matrices (which are elements of a Riemannian manifold with non-positive sectional
curvature).

Figure 2-6. Time comparison of the K-means clustering using various methods. Figure (a) is the result for increasing number of clusters, with 1000 samples on P2. In (b) the database size is increased from 400 to 2000, with 5 clusters, on P2. Finally, in (c) the matrix dimension is increasing with 1000 samples and 3 clusters.

Figure 2-7. Error comparison of the K-means clustering using techniques specified in Fig. 2-6. (a), (b) and (c) are the results for varying number of clusters, number of samples and matrix dimensions, respectively.

Given a population of SPD matrices, our hashing function is based on the distances
to a set of fixed pivot points. Let $P_1, P_2, \ldots, P_k$ be the set of pivot points produced for the
given population. The hashing function is denoted by $H(X) = (h_1(X), \ldots, h_k(X))$, with X
being the given SPD matrix, and each $h_i$ defined by

$$h_i(X) = \begin{cases} 0 & \text{if } dist(P_i, X) > r_i \\ 1 & \text{if } dist(P_i, X) \le r_i \end{cases} \tag{2–31}$$

where dist(., .) denotes any distance defined on the manifold of SPD matrices. The

value of hi(X ) indicates whether the given matrix X is inside the geodesic ball formed

around Pi , with the radius ri . In our experiments we used the Stein distance defined in

Equation (2–5), because it is more computationally appealing for large datasets.

An appropriate choice of pivot points as well as radii is crucial to guarantee the

accuracy of the hashing. In order to locate the pivot points we have employed the

K-means clustering based on the Stein mean, which was discussed in Section 2.4.2.

Furthermore, the radius $r_i$ is picked such that the hashing function $h_i$ satisfies

$$\Pr[h_i(X) = 1] = \frac{1}{2} \tag{2–32}$$

which guarantees that each geodesic ball contains half of the samples. Based on this

framework, each member of a set of (n × n) SPD matrices is mapped to a binary code

with the length k . To measure similarity/dissimilarity between binary codes the spherical

Hamming distance described in [20] is used.
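The hashing scheme of (2–31) and (2–32) can be sketched as follows. The pivots are taken as given (in the text they come from K-means clustering), and choosing each radius as the median sample distance to the pivot is an assumption made here as one empirical way to satisfy (2–32); all function names are illustrative.

```python
import numpy as np

def stein_distance(A, B):
    """Stein distance of Eq. (2-5) between SPD matrices A and B."""
    _, ld_mid = np.linalg.slogdet((A + B) / 2.0)
    _, ld_a = np.linalg.slogdet(A)
    _, ld_b = np.linalg.slogdet(B)
    return np.sqrt(max(ld_mid - 0.5 * (ld_a + ld_b), 0.0))

def distance_table(samples, pivots):
    """Matrix of distances with one row per sample and one column per pivot."""
    return np.array([[stein_distance(P, X) for P in pivots] for X in samples])

def fit_radii(samples, pivots):
    """Median distance to each pivot, an empirical version of Eq. (2-32)."""
    return np.median(distance_table(samples, pivots), axis=0)

def hash_codes(samples, pivots, radii):
    """Binary codes of Eq. (2-31): bit i is 1 when dist(P_i, X) <= r_i."""
    return (distance_table(samples, pivots) <= radii).astype(np.uint8)
```

With k pivots, each matrix is mapped to a k-bit code, and each bit splits the fitted population into two halves of equal size when the number of samples is even and the distances are distinct.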

In order to evaluate the performance of the proposed incremental Stein mean

algorithm in this image hashing framework, we first located the pivot points by exploiting

four of the K-means clustering techniques discussed in Section 2.4.2: ISM, SM, IFEE

and RLEM. Then, the retrieval precision for each method is measured and compared.

Experiments were performed on the COREL image database [29], which contains

10K images categorized into 80 classes. For each image, a set of feature vectors was
computed, each of the form

$$f = [I_r, I_g, I_b, I_L, I_A, I_B, I_x, I_y, I_{xx}, I_{yy}, |G_{0,0}(x, y)|, \ldots, |G_{2,1}(x, y)|] \tag{2–33}$$

where the first three components represent the RGB color channels, the second three

encode the Lab color dimensions, and the next four specify the first and second order

gradients at each pixel. Further, as in [17], the Gu,v(x , y) represent the response of a 2D

Gabor wavelet, centered at (x , y) with scale v and orientation u. Finally, for the set of N

feature vectors extracted from each image, f1, f2, ..., fN , a covariance matrix was created

using

$$Cov = \frac{1}{N}\sum_{i=1}^{N} (f_i - \bar{f})(f_i - \bar{f})^T \tag{2–34}$$

where $\bar{f}$ is the mean vector. Therefore, from this dataset ten thousand 16×16 covariance

matrices were extracted.
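The covariance descriptor of (2–34) is a few lines of NumPy once the per-pixel feature vectors are stacked into an array. In this sketch the array `F` with one feature vector per row is an assumed input layout, and the feature extraction itself (color channels, gradients, Gabor responses) is not shown.

```python
import numpy as np

def covariance_descriptor(F):
    """Covariance of Eq. (2-34); F holds one feature vector per row."""
    F = np.asarray(F, dtype=float)
    centered = F - F.mean(axis=0)        # subtract the mean vector from every row
    # 1/N normalization, matching Eq. (2-34) rather than the 1/(N-1) estimator.
    return centered.T @ centered / F.shape[0]
```

The result is symmetric positive semi-definite by construction; it is strictly positive definite only when the centered feature vectors span the full feature space, which typically requires many more pixels than feature dimensions.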

To compare the time efficiency, we record the total time to compute the pivots,

and also to find the radii, for each aforementioned technique. Furthermore, a set of

1000 random queries were picked from the dataset, and for each query its 10 nearest

neighbors were retrieved based on the spherical Hamming distance. The retrieval

precision for each query was measured as the ratio of the number of correct matches to the total

number of retrieved images, namely 10. Total precision is then computed by averaging

these accuracies.

Fig. 2-8 shows the time taken by each method. As expected, it can be observed

that the incremental Stein mean estimator significantly outperforms other methods,

especially for longer binary codes. The incremental framework provides an efficient way

to update the mean covariance matrix. Further, IFEE which is based on the GL-invariant

Riemannian metric is much more computationally expensive than our incremental

Stein method. Fig. 2-9 depicts the accuracy for each technique. It can be seen that

the incremental Stein mean estimator provides almost the same accuracy as the

non-incremental Stein as well as the IFEE. Therefore, the accuracy and computational
efficiency of our proposed method make it an appealing choice for image indexing and
retrieval on huge datasets. Fig. 2-10 shows the outputs of the proposed system for four
sample queries. Note that all of the retrieved images shown in Fig. 2-10 belong to the
same class in the provided ground truth.

Figure 2-8. Time consumption in initializing hashing functions, for incremental Stein mean (ISM), non-incremental Stein mean (SM), recursive Log-Euclidean mean (RLEM) and Incremental Frechet expectation estimator (IFEE), over increasing binary code lengths.

2.4.4 Application to Shape Retrieval

In this section, the image hashing technique presented in Section 2.4.3 is evaluated

in a shape retrieval experiment, using the MPEG-7 database [27], which consists of 70

different objects with 20 shapes per object, for a total of 1400 shapes. To extract the

covariance features from each shape, we first partition the image into four equal areas

and compute the 2 × 2 covariance matrices constructed from (x , y) coordinates of the

edge points, in each region. Finally, we combined these matrices into a single block

diagonal matrix, resulting in an 8× 8 covariance descriptor.

Figure 2-9. Comparison of retrieval accuracy, for techniques specified in Fig. 2-8.

Table 2-1. Average shape retrieval precision (%) for the MPEG-7 database, for different Binary Code (BC) lengths.

BC Length   ISM     SM      IFEE    RLEM
64          60.67   62.10   61.46   61.15
128         63.59   64.65   64.69   63.23
192         69.69   69.63   70.10   68.19
256         73.13   73.13   73.84   70.14

We used the same methods as in Section 2.4.3 to compare the shape retrieval

speed and precision. Table 2-1 contains the retrieval precision comparison, and it can be

seen that the ISM provides roughly the same retrieval accuracy as IFEE, while Table 2-2

shows that ISM is significantly faster than all the competing methods.

Figure 2-10. Example results of the proposed retrieval system, based on the incremental Stein mean, with 640-bit binary codes. The leftmost column in each row represents the query image, and the rest of the columns show the 5 most similar images retrieved. The retrieved images are sorted in increasing order with respect to the Hamming distance to the query, where the Hamming distance is specified below each image.

Table 2-2. Time (in seconds) comparison for shape retrieval.

BC Length   ISM      SM       IFEE     RLEM
64          48.76    104.61   381.14   397.66
128         53.44    185.80   366.60   415.62
192         89.04    189.89   380.41   397.66
256         105.33   196.61   368.63   398.23

CHAPTER 3
INCREMENTAL FRECHET MEAN ESTIMATOR ON SPHERE

3.1 Background

In many applications in computer vision, machine learning and medical imaging,
the data lie on a sphere. To mention a few, the directional data which often appear in
computer vision are points on the unit sphere $S^2$ [33]. Furthermore, any 3 × 3 rotation
matrix can be parameterized by a unit quaternion, which can be represented by a point
on the 3-dimensional unit sphere $S^3$ [18]. Also, the square root density functions are
points on a hyper-sphere embedded in an infinite dimensional Hilbert space [45].

In most of the aforementioned applications, mean computation is a fundamental
component. Examples include the interpolation and smoothing of Orientation Distribution
Functions (ODFs) [8], the estimation of the mean rotation from several corresponding pairs
of points in multi-view geometry [18], and the statistical analysis of directional data [33].
The Riemannian geometry of the sphere has been well studied in the past decades
[11, 38]. Given a set of n points, $X_1, X_2, \ldots, X_n$, on the sphere, the Riemannian center of
mass, M, is defined as the (global) minimizer of the sum of squared geodesic distances,

$$M = \arg\min_Y \sum_{i=1}^{n} d^2(X_i, Y) \tag{3–1}$$

where d(·, ·) is the intrinsic distance defined on the sphere. We will henceforth refer to this
center of mass as the Frechet mean, as opposed to the Karcher mean which is frequently
used in the literature, because the Karcher mean often refers to a local solution, while the
Frechet mean is the global minimizer of this cost function. For detailed discussions we refer
the reader to [1, 24]. It is known that there is no closed form solution for this objective
function, the so-called Frechet function, on the sphere, and iterative schemes like
gradient descent must be employed. Therefore, the task of Frechet mean computation
can be computationally expensive, especially for very large datasets.

The material in this chapter with minor changes is going to be submitted to the Information Processing in Medical Imaging (IPMI), Springer, 2015.

In this chapter, we propose an incremental method to estimate the Frechet

mean of a set of samples on the sphere. The incremental way to update the mean is
computationally efficient, because, given the mean estimated for n samples, $M_n$,
and the new given sample $X_{n+1}$, one can update the mean to $M_{n+1}$ in one shot, and
no iterative optimization algorithm needs to be employed to compute the new mean
from scratch. Therefore, the incremental technique reduces the computation time
significantly. Moreover, an incremental method only needs to keep track of the most

recently computed Frechet mean, and this provides considerable efficiency in space

consumption. Although this significant time/space efficiency comes with the cost of

lower accuracy, the major part of this chapter is devoted to showing that in the limit (over

the number of samples), our incremental technique converges to the true Frechet mean,

for symmetric distributions.

In [6], the authors proposed an incremental Frechet mean estimator for the manifold of
(n × n) SPD matrices, denoted by P(n), and provided the convergence analysis of the
incremental estimator to the true Frechet mean. However, it is known that the space of
SPD matrices is a Riemannian manifold with non-positive sectional curvature [34], while
the sphere is an example of a positively curved Riemannian manifold [38]. This makes
a significant difference in proving the convergence. Specifically, the following two
items are the most important obstacles in extending the convergence analysis in [6] to a
similar estimator on the sphere:

First, the existence and uniqueness of the minimizer of the Frechet function for a set
of samples on a complete Riemannian manifold with positive sectional curvature is
not guaranteed [1]. This is a consequence of the fact that the Frechet function is not
necessarily convex on the entire manifold. Several authors have restricted the geodesic
ball containing the data points to guarantee the convexity of the Frechet function
[1, 25]. It was shown in [25] that if the sample points belong to a geodesic ball with radius
$\frac{\pi}{2}$ on a unit sphere $S^k$, the ($L^2$) minimizer of the Frechet function will exist and will be
unique. Therefore, in the rest of the chapter we assume that the samples belong only to
the (northern) hemisphere of $S^k$.

Second, the well-known parallelogram law in Euclidean space has its counterpart,
the so-called semi-parallelogram law, in any complete negatively curved Riemannian
manifold $\mathcal{M}$ [46]: for any pair of points $X, Y \in \mathcal{M}$, there exists a point $M \in \mathcal{M}$ such that

$$\forall Z \in \mathcal{M}, \quad d^2(Z, M) \le \frac{1}{2} d^2(X, Z) + \frac{1}{2} d^2(Y, Z) - \frac{1}{4} d^2(X, Y) \tag{3–2}$$

Note that the equality is satisfied only in a Euclidean space. This inequality is of

crucial importance in the convergence analysis of the incremental Frechet mean on

non-positively curved spaces [6, 21, 46]. However, for a positively curved space, e.g.,

sphere, the opposite inequality holds, hence, further efforts must be made to prove the

convergence of incremental Frechet mean estimator on sphere.

To the best of our knowledge, there is no convergence analysis proposed in

literature for the incremental Frechet mean estimator, on any positively curved

Riemannian manifold. In this chapter, we show that the incremental estimator converges

to the true Frechet mean in the limit over the number of samples. We employ the

well-known concept of Gnomonic Projection in computer vision [22] to project the

sample points to a (linear) projection space, in order to simplify the convergence proof.

The rest of this chapter is organized as follows. In section 3.2 we briefly introduce

the Riemannian geometry of sphere as well as gnomonic projection, and provide the

notations that are used in the rest of the chapter. The main convergence result will be

provided in section 3.3, along with the necessary theorems and lemmas. Finally, section

3.4 contains the experiments illustrating the efficiency and accuracy of our incremental

method.

3.2 Preliminaries

3.2.1 Riemannian Geometry of Sphere

Here, we provide a brief introduction to the Riemannian geometry of sphere. For

more details, reader is referred to [8, 45]. Let Sk denote the k-dimensional unit sphere,

embedded in Rk+1, i.e., Sk = {X ∈ Rk+1 : ||X|| = 1}, where ||.|| is the L2 norm of

a vector. It is evident that sphere is not closed under vector operations, e.g., given

X ,Y ∈ Sk , X + Y does not necessarily belong to Sk , hence it is not a vector space, but

a Riemannian metric space with positive constant sectional curvature [38]. Let TXSk

denote the tangent space of Sk , at point X . For any two tangent vectors U,V ∈ TXSk ,

the inner product between U = [u1, u2, ..., uk+1] and V = [v1, v2, ..., vk+1] is defined by:

$$\langle U, V \rangle = \sum_{i=1}^{k+1} u_i v_i \tag{3–3}$$

The curve length on sphere can be measured and the geodesic distance between

any given points X ,Y ∈ Sk can be computed by

$$d(X, Y) = \cos^{-1}(\langle X, Y \rangle) \tag{3–4}$$

The exponential map of a given vector V ∈ TXSk is defined by

$$\mathrm{Exp}_X(V) = X \cos(\|V\|) + \frac{V}{\|V\|} \sin(\|V\|) \tag{3–5}$$

and the log map of Y ∈ Sk at any point X ∈ Sk is obtained by

$$\mathrm{Log}_X(Y) = \frac{Y - X\cos(\phi)}{\|Y - X\cos(\phi)\|}\,\phi \tag{3–6}$$

where ϕ = cos−1(⟨X, Y⟩) = d(X, Y) is the angle between X and Y. Using the exponential and log map, the geodesic curve

between any pair of points X ,Y ∈ Sk is given by

$$\gamma(t) = X \#_t Y = \mathrm{Exp}_X(t\,\mathrm{Log}_X(Y)) \tag{3–7}$$

with γ(0) = X and γ(1) = Y . The geodesic curve is a part of the great circle, i.e.,

circle with unit radius, that connects X and Y .
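The maps in Eqs. 3–4 to 3–7 can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function names are hypothetical, the angle ϕ is taken to be the geodesic distance between the two points, and the antipodal case (where the log map is not unique) is not handled.

```python
import numpy as np

def sphere_dist(x, y):
    """Geodesic distance on the unit sphere (Eq. 3-4)."""
    # Clipping guards against round-off pushing the inner product outside [-1, 1].
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def sphere_exp(x, v):
    """Exponential map at x applied to a tangent vector v (Eq. 3-5)."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:            # a zero tangent vector maps to x itself
        return x.copy()
    return x * np.cos(norm_v) + (v / norm_v) * np.sin(norm_v)

def sphere_log(x, y):
    """Log map of y at x (Eq. 3-6); phi is the angle between x and y."""
    phi = sphere_dist(x, y)
    w = y - x * np.cos(phi)       # part of y orthogonal to x
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:            # y coincides with x (antipodal case not handled)
        return np.zeros_like(x)
    return (w / norm_w) * phi

def geodesic(x, y, t):
    """Point X #_t Y on the geodesic from x (t = 0) to y (t = 1) (Eq. 3-7)."""
    return sphere_exp(x, t * sphere_log(x, y))

# Example on S^2: north pole and a point on the equator.
x = np.array([0.0, 0.0, 1.0])
y = np.array([1.0, 0.0, 0.0])
mid = geodesic(x, y, 0.5)
```

In this example `mid` lies halfway along the quarter great circle joining the two points, so its distance to `x` is π/4.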

Using the geodesic distance provided above, one can define the Frechet mean

of a set of points on sphere as the minimizer of sum of squared geodesic distances.

Formally speaking, let X1,X2, ...,Xn ∈ Sk be n given points. Then, the Frechet mean is

defined by:

$$\mu^* = \arg\min_{\mu \in S^k} \sum_{i=1}^{n} d^2(X_i, \mu) \tag{3–8}$$

Let B(C, ρ) be the geodesic ball centered at C with radius ρ, i.e., B(C, ρ) = {Q ∈

Sk | d(C, Q) < ρ}. The authors in [1] showed that for any C ∈ Sk and for data samples in

B(C, π/2), the minimizer of the Frechet function exists and is unique (and also belongs

to B(C, π/2)). Therefore, in the rest of the chapter, we assume that this condition is

satisfied for any set of given points, Xi. For simplicity, we are particularly interested in

the samples belonging to the northern hemisphere, in which case C is the north pole,

e.g., C = [0, 0, 1] ∈ S2, and ρ = π/2. Note that, because of the strict inequality in the definition of

B(C, ρ), d(C, Q) < π/2; hence the equator is excluded from the geodesic ball.
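For reference, the minimizer in Eq. 3–8 is usually computed iteratively. The sketch below uses a plain Riemannian gradient descent with unit step size, which is a common choice and an assumption here, not necessarily the batch solver used in this chapter. The helper names, tolerance and iteration cap are illustrative.

```python
import numpy as np

def _exp(x, v):
    n = np.linalg.norm(v)
    return x if n < 1e-12 else x * np.cos(n) + (v / n) * np.sin(n)

def _log(x, y):
    phi = np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))
    w = y - x * np.cos(phi)
    n = np.linalg.norm(w)
    return np.zeros_like(x) if n < 1e-12 else (w / n) * phi

def frechet_mean(points, max_iter=100, tol=1e-10):
    """Batch Frechet mean on the sphere by gradient descent (unit step)."""
    mu = points[0]
    for _ in range(max_iter):
        # Mean of the log maps; it is proportional to the negative gradient
        # of the sum of squared geodesic distances at mu.
        step = np.mean([_log(mu, p) for p in points], axis=0)
        if np.linalg.norm(step) < tol:
            break
        mu = _exp(mu, step)
    return mu

# Four points at polar angle 0.5, spread evenly around the north pole of S^2.
azimuths = [0.0, np.pi / 2, np.pi, 3 * np.pi / 2]
points = [np.array([np.sin(0.5) * np.cos(a), np.sin(0.5) * np.sin(a), np.cos(0.5)])
          for a in azimuths]
mu = frechet_mean(points)
```

Because the four samples are symmetric about the north pole and lie well inside the hemisphere, the iteration settles at the pole.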

3.2.2 Gnomonic Projection

On a unit k-dimensional sphere Sk , the Gnomonic Projection of any point X ∈ Sk ,

is defined as the intersection of the tangent plane at the north pole and the line which

passes through the origin, i.e., O = [0, 0, ..., 0], and X [22]. For instance, in Fig. 3-1, xn+1

is the projection of Xn+1 ∈ Sk .

The gnomonic projection is not well-defined for the points on the equator, because

they are projected to infinity in the tangent plane, but this will not affect our statistical

analysis, since we assume that the data points belong to the hemisphere, with the

equator being excluded.

Using this gnomonic projection, the geodesic curve between any pair of points,

X and Y , on the hemisphere is projected to a straight line connecting x and y in the

Figure 3-1. Gnomonic Projection

projection space [18], where x and y are the projections of X and Y , respectively. We

employed the gnomonic projection to simplify the statistical analysis of points on sphere.
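A short sketch of the gnomonic projection may help fix ideas. It assumes the north pole is the last coordinate axis and the tangent plane is the set of points whose last coordinate equals one; the function names are hypothetical. The example also checks numerically that points on a great-circle arc project onto the straight line through the projected endpoints.

```python
import numpy as np

def gnomonic(X):
    """Project a point of the open northern hemisphere onto the tangent plane at the north pole."""
    X = np.asarray(X, dtype=float)
    if X[-1] <= 0:
        raise ValueError("point must lie strictly in the northern hemisphere")
    # The line through the origin and X meets the plane {last coordinate = 1} here.
    return X[:-1] / X[-1]

def inverse_gnomonic(x):
    """Map a point of the tangent plane back to the sphere."""
    p = np.append(np.asarray(x, dtype=float), 1.0)
    return p / np.linalg.norm(p)

X = np.array([0.6, 0.0, 0.8])
Y = np.array([0.0, 0.6, 0.8])
x, y = gnomonic(X), gnomonic(Y)

# Normalized convex combinations of X and Y stay on the great-circle arc between them.
arc = [(1 - s) * X + s * Y for s in np.linspace(0.0, 1.0, 5)]
arc = [p / np.linalg.norm(p) for p in arc]

# 2-D cross product of (p - x) and (y - x); zero means p is on the line through x and y.
cross = [(gnomonic(p) - x)[0] * (y - x)[1] - (gnomonic(p) - x)[1] * (y - x)[0]
         for p in arc]
```

On the unit circle the same map reduces to x = tan(θ), where θ is the angle of the point from the north pole.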

3.3 Incremental Frechet Mean Estimator on Sphere

With the background materials established so far, we are now ready to present

our incremental Frechet Mean Estimator (iFME) on sphere. The proposed method is

motivated by the idea in [6] which is similar to the Euclidean case; given the old mean,

Mn−1, and the new sample, Xn, define the new mean, Mn, as the weighted mean of Mn−1

and Xn with the weights being (n−1)/n and 1/n, respectively. From a geometric viewpoint, this

corresponds to the choice of the point on geodesic curve between Mn−1 and Xn, with the

parameter t = 1/n.

Formally speaking, let X1,X2, ...,XN be a set of N samples on sphere Sk , which all

belong to the geodesic ball B(C, π/2), where C is the north pole. Also, let Mn be the iFME

estimate for nth given sample, Xn, which is defined by:

$$M_1 = X_1 \tag{3–9}$$

$$M_n = M_{n-1} \#_{\frac{1}{n}} X_n \tag{3–10}$$

where A#tB is the point at parameter t on the geodesic curve from A to B (∈ Sk ), and 1/n is

our weighting scheme which is henceforth called the Euclidean weight. In the rest of the

chapter, we will show that if the number of given samples, N, tends to infinity, the iFME

estimates will converge to the Frechet mean of the distribution from which the samples

are drawn.
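The recursion in Eqs. 3–9 and 3–10 can be sketched as a generator that consumes one sample at a time. This is an illustrative sketch: the geodesic point is evaluated with the closed-form spherical interpolation, which agrees with Exp and Log for unit vectors that are not antipodal, and the names `geodesic` and `ifme` are hypothetical.

```python
import numpy as np

def geodesic(x, y, t):
    """Point x #_t y on the great-circle arc from x (t = 0) to y (t = 1)."""
    phi = np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))
    if phi < 1e-12:
        return x.copy()
    # Closed form of Exp_x(t Log_x(y)) for unit vectors.
    return (np.sin((1 - t) * phi) * x + np.sin(t * phi) * y) / np.sin(phi)

def ifme(samples):
    """Yield the running estimates M_1, M_2, ... of Eqs. 3-9 and 3-10."""
    mean = None
    for n, x in enumerate(samples, start=1):
        mean = x.copy() if n == 1 else geodesic(mean, x, 1.0 / n)
        yield mean

# Samples on a single great circle, given by their angles from the north pole.
angles = [0.3, -0.1, 0.4]
samples = [np.array([np.sin(a), 0.0, np.cos(a)]) for a in angles]
estimates = list(ifme(samples))
```

When all samples lie on one great circle, each update is linear in the angle, so the last estimate sits at the arithmetic mean of the angles (0.2 in this example). Each update costs a single geodesic evaluation, independent of the number of samples seen so far.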

Our strategy is based on the idea of projecting the spherical samples, Xi , to

the tangent plane and performing the convergence analysis in this linear space on the

projected samples, i.e., xi, instead. We take advantage of the fact that the geodesic

curve between any pair of points on hemisphere, is projected to a straight line in the

tangent space at the north pole, via the gnomonic projection [18]. According to the

law of large numbers in Euclidean space [3], the arithmetic mean of a set of samples

converges to the mean of the distribution from which the samples are drawn, as number

of samples tends to infinity.

Despite the simplifications followed in the statistical analysis of iFME estimates

on sphere using gnomonic projection, there are two important obstacles that must be

considered. Suppose the true Frechet mean of the input samples, Xi , is the north pole.

Then, it can be shown by counter examples that:

(1) The use of Euclidean weights, 1/n, to update the iFME estimates on Sk , does not

necessarily correspond to the same weighting scheme between the old mean and the new sample, in the projection space.

(2) The mean of the projected samples, xi’s, does not necessarily coincide with the north pole.

The first fact above can be illustrated using two sample points on a unit circle

(S1), X1 = π/6 and X2 = π/3, whose midpoint is M = π/4. Then, the midpoint

Figure 3-2. Illustration of the counterexample showing that the use of Euclidean weights to update iFME in Sk , does not necessarily correspond to the same weights in the tangent space.

of the gnomonic projections of X1 and X2, which are denoted by x1 and x2, is

(tan(π/3) + tan(π/6))/2 = 1.1547 ≠ tan(π/4) = m, where m is the projection of M (see Fig. 3-2).

To observe the second fact, consider three points, X1,X2,X3, in S1, respectively

equal to π/4, π/12 and −π/3 (Fig. 3-3). Although the Frechet mean of these points

is located at the north pole (c), the arithmetic mean of the gnomonic projections, c,

is not. Nevertheless, in Lemma 1, we will show that for the sample points which are

symmetrically distributed around the north pole, the mean of the projected samples

coincides with the north pole.
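Both counterexamples are easy to reproduce numerically. The sketch below works on the unit circle, where a point is identified with its angle from the north pole and its gnomonic projection is the tangent of that angle. It assumes that, for angles inside an open half circle, the Frechet mean is the arithmetic mean of the angles.

```python
import math

# Counterexample 1: geodesic midpoint versus midpoint of the projections.
theta1, theta2 = math.pi / 6, math.pi / 3
mid_on_circle = (theta1 + theta2) / 2                       # pi/4
proj_of_mid = math.tan(mid_on_circle)                       # projection of the midpoint
mid_of_projs = (math.tan(theta1) + math.tan(theta2)) / 2    # midpoint of the projections

# Counterexample 2: Frechet mean at the pole versus mean of the projections.
angles = [math.pi / 4, math.pi / 12, -math.pi / 3]
frechet_mean_angle = sum(angles) / len(angles)              # zero, i.e. the north pole
mean_of_projs = sum(math.tan(a) for a in angles) / len(angles)
```

The midpoint of the projections is about 1.1547 while the projection of the midpoint is 1, and the mean of the three projections is about −0.1547 even though the angles average to zero.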

Lemma 1. For a set of samples, Xi ∈ Sk which are symmetrically distributed

around the north pole, C , the arithmetic mean of the projected points, xi, in the tangent

plane at the north pole, is the north pole. By symmetry we mean that ∀Xi ∈ X =

{X1,X2, ...,XN}, ∃Xj ∈ X, such that $X_i \#_{1/2} X_j = C$.

Proof Sketch. By the symmetry assumption of the input, one can divide the

samples in X, into N/2 disjoint pairs of points on Sk , i.e., $P_m = \{X_{m,1}, X_{m,2}\}$, 1 ≤ m ≤ N/2,

such that ∀m, $X_{m,1} \#_{1/2} X_{m,2} = C$, and $\cup_{m=1}^{N/2} P_m = X$. Then, for the gnomonic projection of

each pair of points, the midpoint coincides with the north pole, using the fact that ∀ϕ, tan(ϕ) +

tan(−ϕ) = 0. Therefore, the mean of projected points in the tangent plane will be

Figure 3-3. Demonstration of the counterexample to prove that the Frechet mean of samples on Sk , does not necessarily coincide with the arithmetic mean of projected points in the tangent space.

reduced to the mean of N/2 sample points, all located at the north pole. Hence, the result

holds. ■

In the rest of this section, we assume that the population of the samples are

symmetrically distributed around the Frechet mean. Besides, without loss of generality,

we assume that the true Frechet mean of N given samples is located at the north pole.

Since the gnomonic projection space is centered at the north pole, this assumption

makes significant simplifications in our convergence analysis. However a similar

convergence proof can be worked out for any arbitrary Frechet mean, with the projection

space established at the mean location.

In what follows, we prove that the use of Euclidean weights, i.e., wn = 1/n, to

update the incremental Frechet mean on sphere, corresponds to a set of weights in the

projection space, denoted henceforth by tn, for which the convergence of incremental

mean to the true Frechet mean, can be shown.

3.3.1 Angle Bisector Theorem

The relation between the weights on sphere, and the corresponding weights on the

projection space, can be obtained in closed form, depending upon the point where the

projection space has been anchored.

In Fig. 3-1, Mn and Mn+1 denote the iFME estimates for n and n + 1 given samples,

respectively, and Xn+1 denotes the (n + 1)st sample. Further, mn,mn+1, xn+1 are the

corresponding points in the projection space. Based on the Angle Bisector Theorem [2]:

$$t_n = \frac{\|m_n - m_{n+1}\|}{\|x_{n+1} - m_{n+1}\|} = \frac{\|O - m_n\|}{\|O - x_{n+1}\|} \times \frac{\sin(d(M_n, M_{n+1}))}{\sin(d(M_{n+1}, X_{n+1}))} \tag{3–11}$$

where d(.) is the geodesic distance on hemisphere. Note that in the standard law

of large numbers, tn = 1/n. In the next sections, we assume that the input samples, Xi , are

within the geodesic ball, B(C ,ϕ), where 0 < ϕ < π/2. Then, we bound the values that tn

can possibly take, with respect to the radius ϕ.
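Eq. 3–11 can be checked numerically on S2. The sketch below treats the projected points as 3-D points on the plane z = 1, so that ||O − m|| is simply the Euclidean norm of the projected point. The sample coordinates, the value of n and the helper names are arbitrary choices for illustration.

```python
import numpy as np

def slerp(a, b, t):
    """Point a #_t b on the great-circle arc from a to b."""
    phi = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return (np.sin((1 - t) * phi) * a + np.sin(t * phi) * b) / np.sin(phi)

def project(p):
    """Gnomonic projection, kept as a 3-D point on the plane z = 1."""
    return p / p[2]

def dist(a, b):
    return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))

n = 4
M_n = np.array([np.sin(0.4), 0.0, np.cos(0.4)])       # current estimate
X_next = np.array([0.0, np.sin(0.9), np.cos(0.9)])    # (n+1)-st sample
M_next = slerp(M_n, X_next, 1.0 / (n + 1))            # iFME update

m_n, x_next, m_next = project(M_n), project(X_next), project(M_next)

# Left-hand side: ratio of segment lengths in the projection space.
lhs = np.linalg.norm(m_n - m_next) / np.linalg.norm(x_next - m_next)
# Right-hand side: the angle bisector expression of Eq. 3-11.
rhs = (np.linalg.norm(m_n) / np.linalg.norm(x_next)) \
    * np.sin(dist(M_n, M_next)) / np.sin(dist(M_next, X_next))
```

The two sides agree up to round-off, and in general the resulting ratio differs from the Euclidean value 1/n.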

3.3.2 Lower Bound for tn

To find the lower bound for tn, we find the lower bounds for each fraction in right

hand side of Eq. 3–11. The first term reaches its minimum value, if Mn is located at the

north pole, and Xn+1 is located on the boundary of the geodesic ball, B(C ,ϕ). In this

case, ||O − mn|| = 1 and ||O − xn+1|| = 1/cos(ϕ). This implies that:

$$\frac{\|O - m_n\|}{\|O - x_{n+1}\|} \ge \cos(\phi) \tag{3–12}$$

Next, note that based on the definition of iFME, this second fraction in 3–11 can be

rewritten as:

$$\frac{\sin(d(M_n, M_{n+1}))}{\sin(d(M_{n+1}, X_{n+1}))} = \frac{\sin(d(M_n, M_{n+1}))}{\sin(n \times d(M_n, M_{n+1}))} = \frac{1}{U_{n-1}(\cos(d(M_n, M_{n+1})))} \tag{3–13}$$

where Un−1(x) is the Chebyshev polynomial of the second kind [42]. For any

x ∈ [−1, 1], the maximum of Un−1(x) is reached when x = 1, for which Un−1(1) = n.

Therefore, Un−1(x) ≤ n and 1/Un−1(x) ≥ 1/n. This implies that:

$$\frac{\sin(d(M_n, M_{n+1}))}{\sin(n \times d(M_n, M_{n+1}))} = \frac{1}{U_{n-1}(\cos(d(M_n, M_{n+1})))} \ge \frac{1}{n} \tag{3–14}$$

From inequalities 3–12 and 3–14,

$$t_n \ge \frac{\cos(\phi)}{n} \tag{3–15}$$

Note that when ϕ tends to zero, cos(ϕ) converges to one, and the above ratio tends

to 1/n, which is the case in Euclidean space. On the other hand, if ϕ tends to π/2, then

cos(ϕ) tends to zero, and this ratio becomes very small.
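The Chebyshev identity used in Eq. 3–13 can be verified with a few lines of standard-library Python. The function `chebyshev_u` below is a hypothetical helper that evaluates the polynomial of the second kind through its three-term recurrence; the values of n and θ are arbitrary.

```python
import math

def chebyshev_u(k, x):
    """Chebyshev polynomial of the second kind U_k(x) via the three-term recurrence."""
    u_prev, u = 1.0, 2.0 * x          # U_0 and U_1
    if k == 0:
        return u_prev
    for _ in range(k - 1):
        # U_{j+1}(x) = 2 x U_j(x) - U_{j-1}(x)
        u_prev, u = u, 2.0 * x * u - u_prev
    return u

n = 5
theta = 0.2
ratio = math.sin(n * theta) / math.sin(theta)      # sin(n theta) / sin(theta)
u_val = chebyshev_u(n - 1, math.cos(theta))        # U_{n-1}(cos(theta))
u_at_one = chebyshev_u(n - 1, 1.0)                 # equals n
```

Here `ratio` and `u_val` coincide, and neither exceeds `u_at_one`, which is the value n used in the lower bound.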

3.3.3 Upper Bound for tn

First, the upper bound for the first term in 3–11 is reached when Mn is on the edge

of geodesic ball, and Xn+1 is given at the north pole. Therefore,

$$\frac{\|O - m_n\|}{\|O - x_{n+1}\|} \le \frac{1}{\cos(\phi)} \tag{3–16}$$

Finding the upper bound for the sin term however is quite involved. Note that the

maximum of the angle between OMn and OXn+1, denoted by α, is reached when Mn and

Xn+1 are both on the edge of the geodesic ball, i.e., α ≤ 2ϕ. Therefore, ϕ ∈ [0, π/2) implies

that α ∈ [0,π).

Further, it has been shown in the Appendix that the following inequality holds for any

α ∈ (0, π).

$$\frac{\sin(\frac{n\alpha}{n+1})}{\sin(\frac{\alpha}{n+1})} \ge n\cos^2\left(\frac{\alpha}{2}\right) \ge n\cos^2(\phi) \tag{3–17}$$

From 3–16 and 3–17,

$$t_n \le \frac{1}{\cos^3(\phi)\, n} \tag{3–18}$$

In summary, we showed that once iFME algorithm is employed using Euclidean

weights on the sphere, the sequence of the corresponding weights, tn, in the projection

space satisfy the following inequality. In the next section, we prove the main theorem of

convergence, using these bounds.

$$\frac{\cos(\phi)}{n} \le t_n \le \frac{1}{\cos^3(\phi)\, n} \tag{3–19}$$
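As a sanity check of the bounds in Eq. 3–19, one can sample random configurations inside B(C, ϕ) and evaluate Eq. 3–11 directly. The sketch below is a Monte Carlo illustration under stated assumptions: it uses ||O − m|| = 1/cos(d(C, M)), obtains the angle between the old mean and the new sample from the spherical law of cosines on S2, and picks the radius, sample count and seed arbitrarily.

```python
import math
import random

def t_n(a, b, alpha, n):
    """Eq. 3-11 with ||O - m_n|| = 1/cos(a) and ||O - x_{n+1}|| = 1/cos(b)."""
    first = math.cos(b) / math.cos(a)
    second = math.sin(alpha / (n + 1)) / math.sin(n * alpha / (n + 1))
    return first * second

random.seed(0)
phi = 1.0                                   # radius of the geodesic ball, phi < pi/2
violations = 0
for _ in range(10000):
    n = random.randint(1, 50)
    a = random.uniform(0.0, phi)            # d(C, M_n)
    b = random.uniform(0.0, phi)            # d(C, X_{n+1})
    psi = random.uniform(0.0, math.pi)      # azimuth difference between M_n and X_{n+1}
    # Spherical law of cosines: angle alpha between M_n and X_{n+1}.
    cos_alpha = math.cos(a) * math.cos(b) + math.sin(a) * math.sin(b) * math.cos(psi)
    alpha = math.acos(max(-1.0, min(1.0, cos_alpha)))
    if alpha < 1e-9:                        # skip degenerate, coincident points
        continue
    t = t_n(a, b, alpha, n)
    lower = math.cos(phi) / n
    upper = 1.0 / (math.cos(phi) ** 3 * n)
    if not (lower <= t <= upper):
        violations += 1
```

No sampled configuration falls outside the two bounds, which is consistent with Eq. 3–19 but is of course not a proof.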

3.3.4 Convergence of iFME

So far, we have shown analytical bounds for the sequence of weights, tn, on

projection space, corresponding to Euclidean weights on sphere (Eq. 3–19). We

now prove the convergence of iFME estimates to the true Frechet mean of samples,

when the sample size tends to infinity. We first show that the incremental mean in the

projection space using tn, is unbiased.

Theorem 1. Let x1, x2, ... be i.i.d. samples from a distribution in Rk . Also, let mn be

the incremental estimate corresponding to nth given sample, xn, which is defined by: (i)

m1 = x1, (ii) mn = tnxn + (1− tn)mn−1. Then, mn is an unbiased estimator of E [x].

Proof. For n = 2; m2 = t2x2+(1− t2)x1, hence E [m2] = t2E [x]+ (1− t2)E [x] = E [x].

Now, by induction hypothesis E [mn−1] = E [x]. Then, E [mn] = tnE [x] + (1− tn)E [x] =

E [x], hence the result. ■

Theorem 2. Let var[mn] denote the variance of the nth incremental estimate

(defined above), with cos(ϕ)/n ≤ tn ≤ 1/(cos³(ϕ) n), ∀ϕ ∈ [0, π/2). Then, ∃p ∈ (0, 1], such that

$$\frac{\mathrm{var}[m_n]}{\mathrm{var}[x]} \le (n^p \cos^6(\phi))^{-1}.$$

First note that $\mathrm{var}[m_n] = t_n^2\,\mathrm{var}[x] + (1 - t_n)^2\,\mathrm{var}[m_{n-1}]$. Since 0 ≤ tn ≤ 1, one can see

that var [mn] ≤ var [x] for all n. Besides, for each n, the maximum of the right hand side is

achieved, when tn attains either its minimum or its maximum value. Therefore, we need

to prove the theorem for the following two values of tn, (i) tn = cos(ϕ)/n and (ii) tn = 1/(n cos³(ϕ)).

These two cases will be discussed in Lemma 2 and Lemma 3, respectively.

Lemma 2. With the same assumptions as in Theorem 2, and tn = 1/(n cos³(ϕ)), ∀n and

∀ϕ ∈ [0, π/2), the following inequality is satisfied: $\frac{\mathrm{var}[m_n]}{\mathrm{var}[x]} \le (n \cos^6(\phi))^{-1}$.

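Proof. For n = 1, var[m1] = var[x] which yields the result, since cos(ϕ) ≤ 1. Now,

assume by induction that $\frac{\mathrm{var}[m_{n-1}]}{\mathrm{var}[x]} \le ((n-1)\cos^6(\phi))^{-1}$. Then,

$$\begin{aligned}
\frac{\mathrm{var}[m_n]}{\mathrm{var}[x]} &= t_n^2 + (1 - t_n)^2 \frac{\mathrm{var}[m_{n-1}]}{\mathrm{var}[x]} \le t_n^2 + (1 - t_n)^2 \frac{1}{(n-1)\cos^6(\phi)} \\
&\le \frac{1}{\cos^6(\phi)\, n^2} + \left(1 - \frac{1}{\cos^3(\phi)\, n}\right)^2 \times \frac{1}{(n-1)\cos^6(\phi)} \\
&\le \frac{1}{\cos^6(\phi)\, n^2} + \left(1 - \frac{1}{n}\right)^2 \times \frac{1}{(n-1)\cos^6(\phi)} \\
&= \frac{1}{\cos^6(\phi)\, n^2} + \frac{n-1}{n^2 \cos^6(\phi)} \\
&= \frac{1}{n \cos^6(\phi)}
\end{aligned} \tag{3–20}$$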

■

Lemma 3. With the same assumptions as in Theorem 1, and tn = cos(ϕ)/n, ∀n and

∀ϕ ∈ [0, π/2), the following inequality is satisfied: $\frac{\mathrm{var}[m_n]}{\mathrm{var}[x]} \le n^{-p}$ for some 0 < p ≤ 1.

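Proof. For n = 1, var[m1] = var[x] which yields the result, since cos(ϕ) ≤ 1. Now,

assume by induction that $\frac{\mathrm{var}[m_{n-1}]}{\mathrm{var}[x]} \le (n-1)^{-p}$. Then,

$$\begin{aligned}
\frac{\mathrm{var}[m_n]}{\mathrm{var}[x]} &= t_n^2 + (1 - t_n)^2 \frac{\mathrm{var}[m_{n-1}]}{\mathrm{var}[x]} \le t_n^2 + (1 - t_n)^2 \frac{1}{(n-1)^p} \\
&\le \frac{\cos^2(\phi)}{n^2} + \frac{(n - \cos(\phi))^2}{n^2} \times \frac{1}{(n-1)^p} \\
&= \frac{(n-1)^p \cos^2(\phi) + \cos^2(\phi) - 2n\cos(\phi) + n^2}{n^2 (n-1)^p}
\end{aligned} \tag{3–21}$$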

Now, it suffices to show that the numerator of the above expression is not greater

than $n^{2-p}(n-1)^p$. In other words:

$$(n-1)^p \cos^2(\phi) + \cos^2(\phi) - 2n\cos(\phi) + n^2 - n^{2-p}(n-1)^p \le 0 \tag{3–22}$$

The above quadratic function with respect to cos(ϕ) is less than zero, when

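$$n\left(\frac{1 - (n-1)^{p/2}\sqrt{\left(\frac{n-1}{n}\right)^p + \frac{1}{n^p} - 1}}{1 + (n-1)^p}\right) \le \cos(\phi) \le n\left(\frac{1 + (n-1)^{p/2}\sqrt{\left(\frac{n-1}{n}\right)^p + \frac{1}{n^p} - 1}}{1 + (n-1)^p}\right) \tag{3–23}$$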

The inequality on the right is satisfied for all cos values. Besides, it is easy to see that

the function in the left hand side is increasing w.r.t. n, hence attains its minimum over all

n > 1, when n = 2. This implies that:

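$$\begin{aligned}
& 1 - \sqrt{2^{1-p} - 1} \le \cos(\phi) \\
\rightarrow\ & \phi \le \cos^{-1}\left(1 - \sqrt{2^{1-p} - 1}\right) \\
\rightarrow\ & 0 < p \le 1 - \log_2[(1 - \cos(\phi))^2 + 1]
\end{aligned} \tag{3–24}$$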

Note that p > 0, for all ϕ < π/2.

■

Proof of Theorem 2. With the above two results, it is easy to see that ∀ϕ ∈ [0,π/2),

there exists a p satisfying 0 < p ≤ 1, such that

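- If tn = cos(ϕ)/n, then $\frac{\mathrm{var}[m_n]}{\mathrm{var}[x]} \le \frac{1}{n^p} \le \frac{1}{n^p \cos^6(\phi)}$, because cos(ϕ) ≤ 1.

- If tn = 1/(n cos³(ϕ)), then $\frac{\mathrm{var}[m_n]}{\mathrm{var}[x]} \le \frac{1}{n \cos^6(\phi)} \le \frac{1}{n^p \cos^6(\phi)}$, because p ≤ 1.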

These two pieces together complete the proof of convergence.

■

The inequality in Theorem 2 implies that when n → ∞, for any ϕ ∈ [0, π/2) the

variance of iFME estimates in the projection space tends to zero. Besides, when ϕ

approaches π/2, the corresponding power of n, as well as cos(ϕ), become very small,

hence the rate of convergence gets slower.
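The variance recursion used in the proof can be iterated numerically. The sketch below does this for the lower weight tn = cos(ϕ)/n only and compares the result with the n^(−p) bound of Lemma 3, taking p at its largest admissible value from Eq. 3–24. The function name, the horizon N = 200 and the small relative tolerance (needed because the bound is attained with equality at n = 2) are illustrative choices.

```python
import math

def variance_ratios(weights):
    """Iterate var[m_n]/var[x] = t_n^2 + (1 - t_n)^2 * var[m_{n-1}]/var[x], starting from 1 at n = 1."""
    ratio = 1.0
    ratios = [ratio]
    for t in weights:
        ratio = t * t + (1.0 - t) ** 2 * ratio
        ratios.append(ratio)
    return ratios

N = 200
results = {}
final_ratio = {}
for phi in (0.70, 1.0, 1.21, 1.40):
    c = math.cos(phi)
    p = 1.0 - math.log2((1.0 - c) ** 2 + 1.0)       # largest p allowed by Eq. 3-24
    weights = [c / n for n in range(2, N + 1)]       # t_n = cos(phi)/n for n >= 2
    ratios = variance_ratios(weights)                # ratios[n-1] is the ratio after n samples
    results[phi] = all(r <= n ** (-p) * (1 + 1e-9)
                       for n, r in enumerate(ratios, start=1))
    final_ratio[phi] = ratios[-1]
```

For each of the four radii the iterated ratio stays below n^(−p), and the ratio reached after N samples grows with ϕ, in line with the remark that convergence slows down as ϕ approaches π/2.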

3.4 Experiments

3.4.1 Synthetic Experiments

We now evaluate the effectiveness of iFME algorithm, compared to the non-incremental

Frechet Mean (FM) of a set of samples on sphere, using synthetically generated data.

To this end, a set of samples, Xi ∈ S2, are generated on the boundary of the geodesic

ball, B(C ,ϕ), where ϕ < π/2, and C is the north pole.

Note that the value of ϕ controls the variance of the input samples. Further, the

variance of any given set of samples on the boundary of B(C ,ϕ) can be computed in

closed form and is equal to Var[X] = ϕ², since ∀i, d(Xi, C) = ϕ.

We tried 4 different values of ϕ, i.e., ϕ ∈ {0.70, 1, 1.21, 1.40}. For each value of ϕ, a

set of 20 points are randomly picked on the boundary of B(C ,ϕ), and fed into both iFME

and FM algorithms. Because of the randomness in generating the samples, we repeated

this experiment 100 times for each ϕ.

Let iFMn,i and FMn,i respectively denote the iFME and FM estimates of the mean,

for n given samples, in i th trial, where 1 ≤ i ≤ 100 and 1 ≤ n ≤ 20. Therefore, for each

number of samples, we obtain a population of iFME and FM estimates, from different

trials. Accordingly, for both methods, we are able to compute the ratio of the estimator

variance to the data variance, i.e., for any 1 ≤ n ≤ 20,

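$$\begin{aligned}
iR_n &= \left(\frac{1}{\mathrm{Var}[X]}\right)\left(\frac{1}{100}\sum_{i=1}^{100} d^2(iFM_{n,i}, C)\right) \\
R_n &= \left(\frac{1}{\mathrm{Var}[X]}\right)\left(\frac{1}{100}\sum_{i=1}^{100} d^2(FM_{n,i}, C)\right)
\end{aligned} \tag{3–25}$$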

where iRn and Rn are the ratio of variances for iFME and FM, respectively, and

Var[X] = ϕ² (see above).
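The iFME half of this protocol can be sketched as follows. This is a rough reconstruction under stated assumptions, not the code used to produce the figures: samples are drawn with a uniformly random azimuth on the boundary of B(C, ϕ), the seed is arbitrary, and the function names are hypothetical.

```python
import numpy as np

def slerp(a, b, t):
    """Point a #_t b on the great-circle arc from a to b."""
    phi = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if phi < 1e-12:
        return a.copy()
    return (np.sin((1 - t) * phi) * a + np.sin(t * phi) * b) / np.sin(phi)

def boundary_samples(phi, count, rng):
    """Random points on the boundary of B(C, phi) in S^2 (uniform azimuth is an assumption)."""
    az = rng.uniform(0.0, 2.0 * np.pi, size=count)
    return np.stack([np.sin(phi) * np.cos(az),
                     np.sin(phi) * np.sin(az),
                     np.full(count, np.cos(phi))], axis=1)

def ifme_ratio(phi, n_samples=20, n_trials=100, seed=0):
    """Estimate iR_n of Eq. 3-25 for n = 1, ..., n_samples."""
    rng = np.random.default_rng(seed)
    north = np.array([0.0, 0.0, 1.0])
    sq_dist = np.zeros(n_samples)
    for _ in range(n_trials):
        mean = None
        for n, x in enumerate(boundary_samples(phi, n_samples, rng), start=1):
            mean = x.copy() if n == 1 else slerp(mean, x, 1.0 / n)
            # Squared geodesic distance of the running estimate to the true mean C.
            sq_dist[n - 1] += np.arccos(np.clip(np.dot(mean, north), -1.0, 1.0)) ** 2
    return sq_dist / n_trials / phi ** 2       # divide by Var[X] = phi^2

iR = ifme_ratio(phi=0.70)
```

By construction the first entry equals one, since the first estimate is a sample at distance ϕ from the pole, and later entries shrink as more samples are absorbed.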

Note that if iRn tends to zero for large values of n, then variance of iFME tends to

zero, hence iFME estimates converge to the true Frechet mean. We want to emphasize

Figure 3-4. The comparison of the ratio of variances (defined in Eq. 3–25) between iFME and FM, for different values of ϕ.

that in a Euclidean space, Rn = iRn = 1/n, for any population of sample points. Besides,

in [6, 21] it was shown that for non-positively curved spaces, e.g., P(n), the following

inequality holds for any n, iRn ≤ 1/n.

Fig. 3-4 illustrates the ratios defined in Eq. 3–25 for iFME and FM, over different

values of ϕ. It is evident from the plots that the iFME’s ratio is close to the non-incremental

version, i.e., FM, especially for smaller ϕ’s. In the right-most column, ϕ = 1.4 which is

relatively close to π/2 and the input variance is very large. It can be seen that even in

this case, iFME is still competitive with FM, with respect to the accuracy.

Fig. 3-5 compares the time consumptions of iFME and FM, in the above experiments.

We need to emphasize that the FM computes the mean iteratively, and its speed

depends upon the initial value. Therefore, in order to make a fair comparison, for each

new sample Xn, we used FMn−1 as the initial value of the gradient descent method, to

compute the mean over the augmented dataset. From the figure, one can see that iFME

is significantly faster than FM, especially for large numbers of samples. More importantly,

the time consumed by iFME for all values of ϕ, remains roughly the same, while FM

gets considerably slower when the sample variance increases. This is not surprising,

because our incremental method updates the mean in one shot, while FM re-computes

the mean from scratch. It is also worth mentioning that for n = 2, the Frechet Mean can

be computed in closed form, and no iterative scheme is needed. This justifies the jumps

in the time plots of FM in Fig. 3-5.

Figure 3-5. The time comparison between iFME and FM, for different values of ϕ.

3.4.2 Application to Incremental Shape-Preserving Frechet Mean of SPD Matrices

In this section, we illustrate the effectiveness and accuracy of iFME on sphere, in

the shape preserving Frechet mean computation of a group of 3 × 3 SPD matrices. As

described earlier, the space of n × n SPD matrices , denoted by P(n), is not a vector

space, but a Riemannian manifold with negative sectional curvature [47].

The Frechet mean is defined as the minimizer of the sum of squared geodesic

distances on P(n) [34]. Authors in [6] proposed an incremental method to estimate

the Frechet mean on P(n), and provided the convergence results, in the limit over

the number of samples. However, it is known that the Frechet mean on P(n) does

not necessarily preserve the diffusion anisotropy which depends on the shape of the

tensor. For a more detailed discussion, we refer the reader to Fig. 1 in [50]. In many

applications including interpolation of diffusion MR data [4], it is more appealing to

compute a shape preserving mean, over the given population.

The idea of separating shape and orientation in the diffusion data was motivated

by the authors in [35] and later in [4]. More recently, Wang et al [50], applied this idea to

3× 3 diffusion tensors and presented a Kalman filter on this new product manifold.

The eigen-decomposition of a 3× 3 SPD matrix, D, is D = UΛUT , where U belongs

to the space of 3× 3 special orthogonal matrices, denoted by SO(3), and Λ is a diagonal

matrix, with positive elements. The matrix Λ controls the shape of the tensor, and U

models the orientation. Following the idea in [4], we break down the mean computation

of SPD matrices, into the separated mean computation of orientations and shapes.

We now present a novel incremental shape-preserving mean for a group of

3 × 3 SPD matrices. First, the mean of the positive diagonal elements of the shape

components can be computed incrementally, as the space of such matrices is

isomorphic to R+3. Besides, the elements in SO(3) can be parameterized by unit

quaternions which belong to the northern hemisphere in a 3-dimensional unit sphere, S3

[18], hence our iFME technique is applicable to these elements.

Formally speaking, let X1,X2, ... be a population of matrices in P(3). Also, assume

that U∗n−1 and Λ∗n−1, respectively, denote the orientation and shape components of the

incremental mean of n − 1 given samples. Then,

$$U_n^* = U_{n-1}^* \#_{\frac{1}{n}} U_n \tag{3–26}$$

where Un is the orientation part of the sample Xn. Further, the mean of the shape

part, Λ∗n, is updated using geometric mean of the diagonal elements.
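One way to read this update is sketched below. Several choices are assumptions made for illustration: orientations are passed in directly as unit quaternions, a new quaternion is sign-flipped into the hemisphere of the current mean because q and −q encode the same rotation, and the shape update is written as a running geometric mean of the eigenvalues. The function name `update_mean` is hypothetical.

```python
import numpy as np

def slerp(a, b, t):
    """Point a #_t b on the great-circle arc from a to b (unit vectors)."""
    phi = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if phi < 1e-12:
        return a.copy()
    return (np.sin((1 - t) * phi) * a + np.sin(t * phi) * b) / np.sin(phi)

def update_mean(q_mean, lam_mean, q_new, lam_new, n):
    """One incremental step for the n-th sample (n >= 2)."""
    # q and -q encode the same rotation; keep q_new in the hemisphere of q_mean.
    if np.dot(q_mean, q_new) < 0:
        q_new = -q_new
    q_next = slerp(q_mean, q_new, 1.0 / n)                         # Eq. 3-26 on S^3
    lam_next = lam_mean ** ((n - 1) / n) * lam_new ** (1.0 / n)    # running geometric mean
    return q_next, lam_next

quats = [np.array([1.0, 0.0, 0.0, 0.0]),
         np.array([np.cos(0.1), np.sin(0.1), 0.0, 0.0]),
         np.array([np.cos(0.2), 0.0, np.sin(0.2), 0.0])]
lams = [np.array([1.00, 0.25, 0.25]),
        np.array([1.05, 0.30, 0.28]),
        np.array([1.10, 0.27, 0.33])]

q_mean, lam_mean = quats[0], lams[0]
for n in range(2, len(quats) + 1):
    q_mean, lam_mean = update_mean(q_mean, lam_mean, quats[n - 1], lams[n - 1], n)
```

After the loop, `lam_mean` equals the elementwise geometric mean of the three eigenvalue triples and `q_mean` is still a unit quaternion. Converting between rotation matrices and quaternions, and fixing the ordering and signs of eigenvectors, is left out of this sketch.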

We evaluated the accuracy of this novel incremental estimator in a synthetic data

experiment. A set of 150 SPD matrices on P(3) are randomly generated, in the following

manner; the shape component of each tensor is assigned 1 + r , 0.25 + r and 0.25 + r

as its diagonal elements, where r ∈ [0, 0.1] is picked randomly. Moreover, the orientation

part was sampled from a log-Normal distribution on S3, centered at [1, 0, 0, 0] which

corresponds to the identity rotation matrix, with the variance set to 0.2.

We then input each sample SPD matrix to both iFME on P(3), as well as proposed

shape-preserving iFME on the manifold of shapes and orientations, i.e., SO(3) × R+3.

For each increment, the mean of both methods are computed and are displayed in Fig.

3-6, along with the ground-truth mean. Furthermore, to compare the accuracy of these

two methods, we measured the Fractional Anisotropy (FA) of the output tensor, at each

Figure 3-6. Visual comparison of the mean tensor obtained from shape preserving iFME on the product manifold (top row), and iFME applied on P(3) (bottom row). The rightmost column shows the ground truth.

increment. The FA value for a SPD matrix is a scalar measuring the anisotropy of a

tensor, and is defined by

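$$FA = \sqrt{\frac{1}{2}}\,\frac{\sqrt{(\lambda_1 - \lambda_2)^2 + (\lambda_2 - \lambda_3)^2 + (\lambda_1 - \lambda_3)^2}}{\sqrt{\lambda_1^2 + \lambda_2^2 + \lambda_3^2}} \tag{3–27}$$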
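Eq. 3–27 translates directly into code. In the sketch below the three arguments are taken to be the eigenvalues of the tensor, and the function name is illustrative.

```python
import math

def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy of a 3 x 3 SPD matrix with eigenvalues l1, l2, l3 (Eq. 3-27)."""
    num = math.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l1 - l3) ** 2)
    den = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    return math.sqrt(0.5) * num / den

fa_iso = fractional_anisotropy(1.0, 1.0, 1.0)       # isotropic tensor
fa_line = fractional_anisotropy(1.0, 0.0, 0.0)      # degenerate, fully anisotropic limit
fa_sample = fractional_anisotropy(1.0, 0.25, 0.25)  # shape used in the synthetic data with r = 0
```

The isotropic case gives zero, the fully anisotropic limit gives one, and the synthetic shape with r = 0 gives a value of about 0.707.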

Since the sample matrices were generated with very similar shapes, it is expected

that the FA value of the mean sample does not drastically change. Fig. 3-7 illustrates

the FA values computed from the iFME on P(3) as well as the iFME on the product

manifold. Although both of the incremental techniques are initialized equally, it is evident

that the FA values of iFME on P(3) rapidly drop after only 15 increments. In contrast,

the shape preserving version of iFME remains close to the ground-truth, for any number

of given samples. Fig. 3-6 demonstrates the significant differences between these two

estimates, visually.

Appendix

Lemma¹: For any angle α ∈ (0, π), the following inequality holds:

$$\frac{\sin(\frac{n\alpha}{n+1})}{\sin(\frac{\alpha}{n+1})} \ge n\cos^2\left(\frac{\alpha}{2}\right) \tag{3–28}$$

1 This lemma has been proven by Mr. Rudrasis Chakraborty.

Figure 3-7. Comparison of FA values between iFME on P(3), and iFME on the product manifold. The ground-truth is the incremental geometric mean of the samples’ FA values, at each increment.

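Proof: Let

$$f = \sin(n\theta) - n\cos^2\left(\frac{n+1}{2}\theta\right)\sin(\theta), \quad \theta \in (0, \alpha/(n+1)),\ \alpha \in (0, \pi),\ n \ge 1 \tag{3–29}$$

$$f_\theta = n\cos(n\theta) + 2n\cos\left(\frac{n+1}{2}\theta\right)\sin(\theta)\sin\left(\frac{n+1}{2}\theta\right)\left(\frac{n+1}{2}\right) - n\cos^2\left(\frac{n+1}{2}\theta\right)\cos(\theta) \tag{3–30}$$

Solving equation 3–30, as θ ∈ (0, π/(n + 1)) we get

$$\theta = 0$$

But,

$$f_{\theta\theta}|_{\theta=0} = 0$$

So, we check $f_{\theta\theta\theta}$.

$$f_{\theta\theta\theta}|_{\theta=0} = -n^3 + 1.5n(n+1)^2 + n > 0, \quad n \ge 1$$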

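So, at θ = 0, f has a minimum where θ ∈ (0, α/(n + 1)).

$$f|_{\theta=0} = 0 \tag{3–31}$$

Thus, f ≥ 0 as n ≥ 1.

Between θ ∈ (0, α/(n + 1)), sin(θ) > 0.

Thus,

$$\frac{f}{\sin(\theta)} \ge 0 \tag{3–32}$$

$$\frac{f}{\sin(\theta)} = \frac{\sin(n\theta)}{\sin(\theta)} - n\cos^2\left(\frac{n+1}{2}\theta\right)$$

Hence,

$$\frac{\sin(n\theta)}{\sin(\theta)} - n\cos^2\left(\frac{n+1}{2}\theta\right) \ge 0$$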
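The inequality of Eq. 3–28 can also be spot-checked on a grid, which complements the analytic argument without replacing it. The grid resolution, the range of n and the function name below are arbitrary choices.

```python
import math

def lemma_gap(n, alpha):
    """Left side minus right side of Eq. 3-28; the lemma states this is non-negative."""
    lhs = math.sin(n * alpha / (n + 1)) / math.sin(alpha / (n + 1))
    rhs = n * math.cos(alpha / 2) ** 2
    return lhs - rhs

# Evaluate the gap for n = 1..100 and 999 equally spaced angles inside (0, pi).
min_gap = min(
    lemma_gap(n, k * math.pi / 1000)
    for n in range(1, 101)
    for k in range(1, 1000)
)
```

On this grid the smallest gap is non-negative, in agreement with the lemma.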

CHAPTER 4

IPGA: INCREMENTAL PRINCIPAL GEODESIC ANALYSIS WITH APPLICATIONS TO

MOVEMENT DISORDER CLASSIFICATION

4.1 Background

Principal Geodesic Analysis (PGA) captures variability in the data by using the

concept of principal geodesic subspaces which in this case are sub-manifolds of the

Riemannian manifold on which the given data lie. In order to achieve this goal, it is

required to know the Riemannian structure of the manifold, specifically, the geodesic

distance, the Riemannian log and exp maps and the Frechet mean. For definitions of

Riemannian log and exp maps, the geodesic distance as well as the Frechet mean, see

section 4.2. PGA relies on use of the linear vector space structure of the tangent space

at the Frechet mean by projecting all of the data points to this tangent space and then

performing standard PCA in this tangent space followed by projection of the principal

vectors back to the manifold using the Riemannian exp map yielding principal geodesic

subspaces. The representation of each manifold-valued data point in the principal

geodesic subspace has to be achieved by finding the closest (in the sense of geodesic

distance) point in the subspace to the given data point. This however involves a hard

optimization problem. The standard PGA however does a linear approximation by

projecting the given data point to the aforementioned tangent space, finding the closest

point to the principal linear subspace defined by the principal vectors in this tangent

space and then projecting it back to the manifold using the exp map [13, 14, 36]. Exact

PGA reported in literature by several researchers tries to solve this hard optimization

© 2014 Springer. Reprinted with minor changes, with permission, from H. Salehian, D. Vaillancourt, and B. C. Vemuri. "iPGA: Incremental Principal Geodesic Analysis with Applications to Movement Disorder Classification." In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014, pp. 765-772. Springer International Publishing, October 2014. [40]

without the linear approximation [37, 43]. A generalization of the PGA reported in

[14, 36] to symmetric positive definite diffusion tensor fields was presented in [55]. In

[55], it was demonstrated that the Frechet mean of several given (registered) tensor

fields computed using a voxel-wise Frechet mean over the field is equivalent to the

Frechet mean computed using the Frechet mean in a product space representation

of the tensor fields. However, for higher order statistics, such as variance, such an

equivalence does not hold. This statement however holds for any manifold-valued fields,

not just for the diffusion tensor fields.

When dealing with large amounts of data, specifically, manifold-valued fields e.g.,

diffusion tensor fields, deformation tensor fields, ODF fields, etc., performing PGA can

be computationally quite expensive. That said, if we have a large number of tensor

fields to perform statistical analysis upon, and if we are provided the data incrementally,

rather than performing PGA from scratch in a batch mode each time a new data set is

provided, it would be computationally more efficient to perform PGA once for a given

data pool and then simply update the PGA each time a new data set is provided. To this

end, we propose a novel incremental PGA or iPGA algorithm in which we incrementally

update the Frechet mean and the principal sub-manifolds rather than performing PGA in

a batch mode. This will lead to significant savings in computation time.

In the past few decades, the problem of incrementally updating the PCA has been

well studied in literature e.g., [56]. However, these methods require the data samples to

live in a Euclidean space, and hence are not directly applicable to the PGA problem. On

the other hand, Cheng et al. [6] and Ho et al. [21] have reported incremental algorithms

for computing the Frechet expectation of a given set of SPD matrices. Besides, we

have shown in the previous chapter the convergence of a similar incremental Frechet mean

estimator, for samples living on a sphere.

Our iPGA algorithm is a novel combination of the incremental Frechet expectation

algorithm of [6, 21], and the linearized PGA in [55]. We apply our iPGA to two types

of popular manifold-valued data: (1) a group of SPD tensor fields derived from high

angular resolution diffusion magnetic resonance images (HARDI), (2) a population of

samples on a high-dimensional unit sphere, derived from the 3-D shapes. Based on

these two iPGA techniques the classification of patients with movement disorders is

performed. We present synthetic experiments depicting the effectiveness and accuracy

of iPGA, compared to the batch-mode PGA. Furthermore, in the real data experiments,

given 67 human brain HARDI data, our iPGA based nearest neighbor classifier aims

to distinguish between controls, Parkinson’s Disease (PD) and Essential Tremor (ET)

patients. Our results demonstrate the effectiveness of iPGA, compared to the batch

mode scheme.

The rest of the chapter is organized as follows. Section 4.2 contains background

material on differential geometry of the space of SPD tensor fields. Further, a brief

review of the differential geometry of sphere is provided. Next, in section 4.3 the

proposed iPGA techniques applicable to both SPD tensor fields, and the spherical

samples, are described in detail. Moreover, sections 4.4 and 4.5 contain synthetic and

real data experiments, comparing PGA and iPGA with respect to computation time and

accuracy.

4.2 Preliminaries

4.2.1 Riemannian Geometry of the Space of SPD Tensor Fields

The Riemannian geometry of k−dimensional unit sphere, Sk , has been discussed

in section 3.2. Table 4-1 summarizes the Riemannian operations on Sk , as well as the

space of n × n SPD matrices, Pn, for convenience.

Based on the Riemannian geometry of Pn summarized in Table 4-1, we now

briefly introduce the basic relevant concepts of Riemannian geometry of the space

of SPD tensor fields denoted by Pmn following the notation from [55]. For details on

the Riemannian geometry of Pn we refer the reader to [13]. Pn is the space of n × n

symmetric positive definite (SPD) matrices, which is a Riemannian manifold with GL(n),

the general linear group as the symmetry group. This can be easily generalized to

Pmn , the product space of Pn using the product Riemannian structure. In particular,

expressions for the Riemannian geodesic distance, log and exponential maps can be

easily derived. Specifically, the group GL(n)m acts transitively on Pmn with the group

action specified by

ϕG(X) = (G1X1GT1 , ... ,GmXmG

Tm ) (4–1)

where each $G_i \in GL(n)$ is an $n \times n$ invertible matrix and each $X_i$ is an $n \times n$ positive-definite matrix. The tangent space of $P_n^m$ at any point can be identified with $Sym(n)^m$, because the tangent space of a product manifold is the product of the tangent spaces. Let $Y, Z \in T_M P_n^m$ be two tangent vectors at $M \in P_n^m$. The inner product between the two vectors under the product Riemannian metric is given by

$$\langle Y, Z \rangle_M = \sum_{i=1}^{m} \operatorname{tr}\left(Y_i M_i^{-1} Z_i M_i^{-1}\right) \tag{4–2}$$

The Riemannian exponential map at $M$ maps the tangent vector $Y$ to a point in $P_n^m$ and is given by

$$\operatorname{Exp}_M(Y) = \left(G_1 \exp\left(G_1^{-1} Y_1 G_1^{-T}\right) G_1^T, \ldots, G_m \exp\left(G_m^{-1} Y_m G_m^{-T}\right) G_m^T\right) \tag{4–3}$$

where each $G_i \in GL(n)$ is such that $M = \left(G_1 G_1^T, \ldots, G_m G_m^T\right)$.

Given $X \in P_n^m$, the log map at $M$ is given by

$$\operatorname{Log}_M(X) = \left(G_1 \log\left(G_1^{-1} X_1 G_1^{-T}\right) G_1^T, \ldots, G_m \log\left(G_m^{-1} X_m G_m^{-T}\right) G_m^T\right) \tag{4–4}$$

Using this definition of the log map in $P_n^m$, the geodesic distance between $M$ and $X$ is computed as

$$d(M, X) = \lVert \operatorname{Log}_M(X) \rVert = \sqrt{\sum_{i=1}^{m} \operatorname{tr}\left(\log^2\left(G_i^{-1} X_i G_i^{-T}\right)\right)} \tag{4–5}$$


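Table 4-1. Summary of the Riemannian geometry of the space of $n \times n$ positive definite matrices, $P_n$, as well as the unit $k$-dimensional sphere, $S^k$. In the table, $X, Y \in P_n$ and $U, V \in T_X P_n$. Similarly, $x, y \in S^k$ and $u, v \in T_x S^k$.

| Operation | $P_n$ | $S^k$ |
|---|---|---|
| Inner product | $\langle U, V \rangle_X = \operatorname{tr}(U X^{-1} V X^{-1})$ | $\langle u, v \rangle = \sum_{i=1}^{k+1} u_i v_i$ |
| Exponential map | $\operatorname{Exp}_X(U) = X^{1/2} \exp(X^{-1/2} U X^{-1/2}) X^{1/2}$ | $\operatorname{Exp}_x(u) = x \cos(\lVert u \rVert) + \frac{u}{\lVert u \rVert} \sin(\lVert u \rVert)$ |
| Log map | $\operatorname{Log}_X(Y) = X^{1/2} \log(X^{-1/2} Y X^{-1/2}) X^{1/2}$ | $\operatorname{Log}_x(y) = \frac{y - x \cos(\phi)}{\lVert y - x \cos(\phi) \rVert}\, \phi$, with $\phi = \cos^{-1}(\langle x, y \rangle)$ |
| Geodesic distance | $d_{P_n}(X, Y) = \sqrt{\operatorname{tr}\left(\log^2(X^{-1/2} Y X^{-1/2})\right)}$ | $d_{S^k}(x, y) = \cos^{-1}(\langle x, y \rangle)$ |
| Geodesic | $\gamma(t) = \operatorname{Exp}_X(t \operatorname{Log}_X(Y))$ | $\alpha(t) = \operatorname{Exp}_x(t \operatorname{Log}_x(y))$ |
| Frechet mean | $\bar{X} = \arg\min_{X \in P_n} \frac{1}{N} \sum_{i=1}^{N} d^2_{P_n}(X, X_i)$ | $\bar{x} = \arg\min_{x \in S^k} \frac{1}{N} \sum_{i=1}^{N} d^2_{S^k}(x, x_i)$ |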

Using the expression for the geodesic distance given above, we can define the

(intrinsic) mean of N tensor fields as that tensor field which minimizes the following sum

of squared geodesic distances expression:

$$M = \arg\min_{M \in P_n^m} \frac{1}{N} \sum_{i=1}^{N} d(M, X_i)^2 \tag{4–6}$$

Since the squared distance on the product space is the sum of the component-wise squared distances, the minimization decouples over the $m$ components. The Frechet mean is unique on $P_n$ [13], so $M$ is unique as well, and it can be computed using an iterative algorithm similar to the one in [13]. After obtaining the intrinsic mean $M$ of the input tensor fields $X_1, \ldots, X_N$, we compute the modes of variation using the PGA algorithm for tensor fields described in [55].
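For reference, a batch (non-incremental) mean of this kind is commonly computed by a gradient-descent iteration: average the log maps at the current estimate and step along the exponential map. The sketch below shows that generic scheme for SPD matrices; the step size, stopping rule and the name `frechet_mean` are assumptions for illustration and are not taken from [13].

```python
import numpy as np

def _T(A):
    return np.swapaxes(A, -1, -2)

def _sym_fun(S, fun):
    """Apply a scalar function to symmetric matrices through their eigenvalues."""
    w, V = np.linalg.eigh(0.5 * (S + _T(S)))
    return (V * fun(w)[..., None, :]) @ _T(V)

def spd_exp(M, Y):
    G = _sym_fun(M, np.sqrt)
    Gi = np.linalg.inv(G)
    return G @ _sym_fun(Gi @ Y @ _T(Gi), np.exp) @ _T(G)

def spd_log(M, X):
    G = _sym_fun(M, np.sqrt)
    Gi = np.linalg.inv(G)
    return G @ _sym_fun(Gi @ X @ _T(Gi), np.log) @ _T(G)

def frechet_mean(samples, step=1.0, tol=1e-10, max_iter=100):
    """Batch intrinsic mean: repeat M <- Exp_M(step * mean_i Log_M(X_i))."""
    M = np.array(samples[0], dtype=float)
    for _ in range(max_iter):
        grad = sum(spd_log(M, X) for X in samples) / len(samples)
        # Frobenius norm used as a simple stopping test (assumption);
        # it vanishes exactly when the Riemannian gradient vanishes.
        if np.linalg.norm(grad) < tol:
            break
        M = spd_exp(M, step * grad)
    return M
```

Every iteration touches all $N$ samples, which is the cost that the incremental estimator of Section 4.3.1 avoids.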

4.2.2 Schild’s Ladder Approximation of Parallel Transport

Given two points $X_0$ and $X_p$ on a Riemannian manifold $\mathcal{M}$, with the geodesic curve $\gamma(t)$ such that $\gamma(0) = X_0$ and $\gamma(1) = X_p$, Schild’s Ladder algorithm approximates the parallel transport of any vector $V \in T_{X_0}\mathcal{M}$ along $\gamma$ [31].

This algorithm requires only the geodesic curve, log map and exp map defined on the manifold, and hence is applicable to both $S^k$ and $P_n^m$ using their corresponding Riemannian operations, summarized in Table 4-1.


Figure 4-1. Illustration of Schild’s Ladder algorithm, described in Eq. 4–9.

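Let $X_1, X_2, \ldots, X_{p-1}$ be some intermediate points on $\gamma(t)$. Then the parallel transport of $V$ to $T_{X_p}\mathcal{M}$, denoted by $\Gamma_{X_0 \to X_p}(V)$, is approximated by:

$$A_0 = \operatorname{Exp}_{X_0}(V) \tag{4–7}$$

$$\forall\, 1 \le i \le p: \quad B_i = X_i \#_{1/2} A_{i-1}, \qquad A_i = X_{i-1} \#_2 B_i \tag{4–8}$$

$$\Gamma_{X_0 \to X_p}(V) = \operatorname{Log}_{X_p}(A_p) \tag{4–9}$$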

where $X \#_{1/2} Y$ denotes the midpoint of the geodesic curve between $X$ and $Y$, and $X \#_2 Y$ is obtained by following the geodesic from $X$ through $Y$ for twice its length. For more information, the reader is referred to [19, 31]. Figure 4-1 illustrates the algorithm described above.
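The construction in Eqs. 4–7 to 4–9 can be sketched on the unit sphere using the operations of Table 4-1. This is an illustrative sketch only: the intermediate points are assumed to be equally spaced on the geodesic, the number of rungs `p` is a free parameter, and the function names are hypothetical. The approximation is most accurate when $V$ is short relative to the curvature scale of the manifold.

```python
import numpy as np

def sphere_exp(x, u):
    """Exponential map on the unit sphere (Table 4-1)."""
    nu = np.linalg.norm(u)
    if nu < 1e-15:
        return x.copy()
    return np.cos(nu) * x + np.sin(nu) * (u / nu)

def sphere_log(x, y):
    """Log map on the unit sphere; the angle is computed with arctan2 for stability."""
    c = float(np.dot(x, y))
    w = y - c * x                      # part of y orthogonal to x
    nw = np.linalg.norm(w)
    if nw < 1e-15:
        return np.zeros_like(x)
    return np.arctan2(nw, c) * (w / nw)

def schild_ladder(x0, xp, v, p=10):
    """Approximate the parallel transport of v from T_{x0} to T_{xp} (Eqs. 4-7 to 4-9)."""
    direction = sphere_log(x0, xp)
    # Assumption: equally spaced rungs X_0, ..., X_p on the geodesic from x0 to xp.
    X = [sphere_exp(x0, (i / p) * direction) for i in range(p + 1)]
    A = sphere_exp(x0, v)                                         # Eq. 4-7
    for i in range(1, p + 1):
        B = sphere_exp(X[i], 0.5 * sphere_log(X[i], A))           # B_i = X_i #_{1/2} A_{i-1}
        A = sphere_exp(X[i - 1], 2.0 * sphere_log(X[i - 1], B))   # A_i = X_{i-1} #_2 B_i
    return sphere_log(xp, A)                                      # Eq. 4-9
```

Only `sphere_exp` and `sphere_log` are manifold-specific, so the same loop applies to $P_n^m$ once its exp and log maps are substituted.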

On the manifold of SPD matrices, the parallel transport from an arbitrary point $X_0$ to the identity matrix, $I$, is equivalent to the transformation by the group action [26]. Therefore, for the case of SPD tensor fields, we apply the group action wherever applicable, as it is computationally more efficient and more accurate than parallel transport approximated by Schild’s ladder.

4.3 iPGA: Incremental Principal Geodesic Analysis

In order to develop the incremental Principal Geodesic Analysis on the space of

SPD tensor fields and the unit sphere, we first need to develop incremental Frechet


mean update techniques applicable to tensor fields and the spherical samples. We will

address this sub-problem in the following paragraphs.

4.3.1 Incremental Frechet Mean Estimator

As described earlier, the Frechet mean of a group of manifold valued features is

defined as the minimizer of the sum of squared geodesic distances. Unfortunately,

this minimization problem does not have a closed form solution for a population of size

greater than two on most Riemannian manifolds, including $P_n^m$ and $S^k$.

In Section 3.3 we introduced an incremental algorithm to estimate the Frechet mean of a group of samples on the sphere, and proved its convergence to the mean of the distribution from which the samples are drawn, as the number of samples tends to infinity.

Similarly, in [21], the authors presented an incremental Frechet mean estimator, IFME, for SPD matrices (not SPD tensor fields). Given the estimated Frechet mean of the first $k$ SPD tensors, denoted by $M_k$, and the new sample $X_{k+1}$, IFME locates the new mean, $M_{k+1}$, on the geodesic curve between $M_k$ and $X_{k+1}$ using the Euclidean weight. More formally,

$$M_{k+1} = \operatorname{Exp}_{M_k}\left(t \operatorname{Log}_{M_k}(X_{k+1})\right) \tag{4–10}$$

where $t = \frac{1}{k+1}$.

We now generalize the above incremental Frechet mean formula to the case where the data samples are SPD tensor fields (not just SPD matrices), using the exp and log maps defined earlier on the product manifold of SPD tensor fields. Let $M_k = (M_{k,1}, \ldots, M_{k,m})$ denote the estimated Frechet mean of the first $k$ samples, and let $X_{k+1} = (X_{k+1,1}, \ldots, X_{k+1,m})$ be the new tensor field. Because of the product space representation chosen here, IFME applies component-wise, and the new mean is obtained by updating the old mean via the following equation:

$$M_{k+1} = \left(\operatorname{Exp}_{M_{k,1}}\left(\tfrac{1}{k+1} \operatorname{Log}_{M_{k,1}}(X_{k+1,1})\right), \ldots, \operatorname{Exp}_{M_{k,m}}\left(\tfrac{1}{k+1} \operatorname{Log}_{M_{k,m}}(X_{k+1,m})\right)\right) \tag{4–11}$$


4.3.2 Incremental Principal Geodesic Analysis on $P_n^m$

In this section we develop the incremental version of the PGA algorithm in [55] applicable to SPD tensor fields. Very briefly, in [55], the PGA computation problem on the space of SPD tensor fields is approximated by applying PCA in the tangent plane anchored at the Frechet mean, in the following manner. First, the Frechet mean, $M$, of the set of tensor fields is computed. Next, each tensor field is projected to the tangent space at the mean (i.e., $T_M P_n^m$) using the log map, and then transformed to the tangent space at the identity. This tangent space is a standard Euclidean space denoted by $T_I P_n^m$, where $I$ is the tensor field consisting of $m$ identity matrices. The ordinary PCA algorithm is then performed in $T_I P_n^m$, and the obtained principal components are transformed back to $T_M P_n^m$. Note that this operation of transforming to the identity is crucial, since the inner product defined for $P_n^m$ corresponds to the inner product in the Euclidean space only at the identity $I$.
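To make the last remark concrete, the short derivation below checks that moving two tangent vectors to the identity with the group action turns the metric of Eq. 4–2 into the ordinary Euclidean (Frobenius) inner product. The symbols $W$, $\tilde{Y}$ and $\tilde{W}$ are introduced here only for illustration; $G_i$ is any factor with $M_i = G_i G_i^T$.

```latex
\begin{aligned}
\langle Y, W \rangle_M
  &= \sum_{i=1}^{m} \operatorname{tr}\!\left(Y_i M_i^{-1} W_i M_i^{-1}\right),
     \qquad M_i^{-1} = G_i^{-T} G_i^{-1} \\
  &= \sum_{i=1}^{m} \operatorname{tr}\!\left(Y_i G_i^{-T} G_i^{-1} W_i G_i^{-T} G_i^{-1}\right) \\
  &= \sum_{i=1}^{m} \operatorname{tr}\!\left(\big(G_i^{-1} Y_i G_i^{-T}\big)\big(G_i^{-1} W_i G_i^{-T}\big)\right)
     \quad \text{(cyclic property of the trace)} \\
  &= \sum_{i=1}^{m} \operatorname{tr}\!\left(\tilde{Y}_i \tilde{W}_i\right)
   = \operatorname{vec}(\tilde{Y})^{T} \operatorname{vec}(\tilde{W}),
     \qquad \tilde{Y}_i = G_i^{-1} Y_i G_i^{-T},\ \ \tilde{W}_i = G_i^{-1} W_i G_i^{-T}.
\end{aligned}
```

The last equality uses the symmetry of $\tilde{Y}_i$ and $\tilde{W}_i$, so PCA on the vectorized, transformed log vectors respects the Riemannian metric at $M$.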

Equipped with the incremental Frechet mean estimator, IFME, on the space of SPD tensor fields, we are ready to reformulate this algorithm in an incremental form. In a similar fashion, each SPD tensor field is projected using the log map and transformed (by applying the group action) to $T_I P_n^m$. More formally, let $X_i$ denote the $i$th tensor field, and $M_k$ be the Frechet mean of the $k$ given samples. Define $Y_i = \operatorname{Log}_{M_k}(X_i) \in T_{M_k} P_n^m$. Each $Y_i$ is then transformed to $T_I P_n^m$ to obtain $Z_i$. Accordingly, the data matrix in $T_I P_n^m$, denoted by $A_k$, can be constructed so that its $i$th column corresponds to $Z_i$ in vectorized form.

In our algorithm, we keep track of the data matrix, $A_k$, in $T_I P_n^m$. Let $X_{k+1}$ and $M_k$ denote the new SPD tensor field and the Frechet mean over all previous $k$ tensor fields, respectively. Then, to update the principal components we need to augment the data matrix with an appropriate vector which represents $X_{k+1}$ in $T_I P_n^m$.

In order to find this vector, we first locate the new Frechet mean $M_{k+1}$ using Eq. 4–11, then project $X_{k+1}$ to the tangent space at $M_{k+1}$, i.e., $Y_{k+1} = \operatorname{Log}_{M_{k+1}}(X_{k+1})$. This tangent vector is moved to $T_I P_n^m$ using the group action on $P_n^m$ as shown below, where $G = (G_1, \ldots, G_m)$ is such that $M_{k+1,i} = G_i G_i^T$ for all $i$.

$$Z_{k+1} = \Phi_{G^{-1}}(Y_{k+1}) = \left(G_1^{-1} Y_{k+1,1} G_1^{-T}, \ldots, G_m^{-1} Y_{k+1,m} G_m^{-T}\right) \tag{4–12}$$

Now the old data matrix $A_k$ and the vector $Z_{k+1}$ are both in $T_I P_n^m$, which is the standard Euclidean space.

However, we should emphasize that the data matrix $A_k$ contains the transformed log maps of the first $k$ data points at the old mean, $M_k$, while $Z_{k+1}$ is the transformed log vector of the $(k+1)$st sample at the new mean, $M_{k+1}$. Consequently, while the mean of the log vectors in $A_k$ is the zero vector, the columns of $[A_k \;\; Z_{k+1}]$ will no longer be zero-mean. This affects the estimation accuracy of the principal components, especially for smaller values of $k$, for which $M_k$ and $M_{k+1}$ are further from each other. Hence, the data matrix $A_k$ should first be updated accordingly, before it is augmented by the new log vector.

Given the old data matrix Ak , the basic algorithm for this update problem consists

of the following steps: (1) compute the exp maps of all k log vectors at the identity, to

retrieve the first k data samples, (2) obtain the log maps of the data matrices at the new

location. It is evident that this method significantly slows down the incremental PGA,

hence is not a reasonable choice.

Instead, we apply the following faster heuristic solution. Let $Y_i = \operatorname{Log}_{M_k}(X_i)$ be the log map of the $i$th data matrix at the old mean, and $Z_i$ be the corresponding transformed vector in $T_I P_n^m$. Also, let $L_{k+1} = \operatorname{Log}_{M_{k+1}}(M_k)$, and let $T_{k+1}$ be its translated vector in $T_I P_n^m$. Then the updated vector is obtained by $\hat{Z}_i = Z_i + T_{k+1}$. Note that this algorithm gives an accurate solution in linear spaces. Also, as will be shown shortly in the experiments, it does not sacrifice much accuracy in estimating PGA, especially when $k$ gets larger. Besides, this method is significantly faster, because for each new sample


Table 4-2. Incremental PGA Algorithm for SPD Tensor Fields

1: Input the data matrix $A_k$ for $k$ samples, the new tensor field $X_{k+1}$, and the old mean $M_k$

2: Compute $M_{k+1}$ from $X_{k+1}$ and $M_k$, using Eq. 4–11

3: $Y_{k+1} = \operatorname{Log}_{M_{k+1}}(X_{k+1})$

4: $Z_{k+1} = \Phi_{G^{-1}}(Y_{k+1})$, defined in Eq. 4–12

5: Compute $L_{k+1} = \operatorname{Log}_{M_{k+1}}(M_k)$ and $T_{k+1} = \Phi_{G^{-1}}(L_{k+1})$

6: Add $T_{k+1}$ to every column of $A_k$ to obtain $\hat{A}_k$

7: Perform standard PCA on $A_{k+1} = [\hat{A}_k \;\; Z_{k+1}]$

8: Translate the $j$th principal component, $P_j$, back to $T_{M_{k+1}} P_n^m$, via $Q_j = \Phi_G(P_j)$

Figure 4-2. Schematic illustration of the algorithm in Table 4-2.

$T_{k+1}$ is only computed once, and is added to all columns of $A_k$. This way, the old data matrix $A_k$ is updated to $\hat{A}_k$.
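The claim that the shift is exact in linear spaces can be verified directly. In a Euclidean space the maps reduce to $\operatorname{Log}_M(X) = X - M$ and $\operatorname{Exp}_M(Y) = M + Y$, and no transformation between tangent spaces is needed, so for every earlier sample $X_i$:

```latex
\begin{aligned}
M_{k+1} &= M_k + \tfrac{1}{k+1}\,(X_{k+1} - M_k)
          && \text{(incremental mean, the running arithmetic mean)} \\
\operatorname{Log}_{M_{k+1}}(X_i)
        &= X_i - M_{k+1} \\
        &= (X_i - M_k) + (M_k - M_{k+1}) \\
        &= \operatorname{Log}_{M_k}(X_i) + \operatorname{Log}_{M_{k+1}}(M_k).
\end{aligned}
```

Adding the single vector $\operatorname{Log}_{M_{k+1}}(M_k)$ to every stored column therefore re-centers the data matrix exactly in the flat case; on a curved manifold the same shift is only an approximation, which is why it is described as a heuristic.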

Now we can augment the updated data matrix with the new log vector, $A_{k+1} = [\hat{A}_k \;\; Z_{k+1}]$, and perform PCA on the new data matrix. Finally, the new principal components are transformed back to $T_{M_{k+1}} P_n^m$ using the transformation $\Phi_G$, where $\Phi$ and $G$ are the same as in Eq. 4–12. This method is summarized in Table 4-2. Also, Fig. 4-2 illustrates the variables used in the algorithm.

4.3.3 Incremental Principal Geodesic Analysis on $S^k$

We now introduce the iPGA algorithm on $S^k$, in a very similar fashion to the iPGA algorithm on $P_n^m$ proposed so far. Specifically, we discuss the modifications that should be made to the previously discussed iPGA in order to make it suitable for spherical samples.

Table 4-3. Incremental PGA Algorithm on Unit Sphere

1: Input the data matrix $A_k = [v_1, \ldots, v_k]$ for $k$ samples, the new sample $x_{k+1}$, and the old mean $m_k$

2: Compute $m_{k+1}$ from $x_{k+1}$ and $m_k$, using Eq. 3–10

3: $y_{k+1} = \operatorname{Log}_{m_{k+1}}(x_{k+1})$

4: Parallel transport $z_{k+1} = \Gamma_{m_{k+1} \to n}(y_{k+1})$, defined in Eq. 4–9, where $n$ is the north pole

5: Compute $r_{k+1} = \operatorname{Log}_{m_{k+1}}(m_k)$ and $t_{k+1} = \Gamma_{m_{k+1} \to n}(r_{k+1})$

6: Add $t_{k+1}$ to every column of $A_k$ to obtain $\hat{A}_k = [\hat{v}_1, \ldots, \hat{v}_k]$

7: Perform standard PCA on $A_{k+1} = [\hat{A}_k \;\; z_{k+1}]$

8: Parallel transport the $j$th principal component, $p_j$, back to $T_{m_{k+1}} S^k$, via $q_j = \Gamma_{n \to m_{k+1}}(p_j)$

First, note that the convergence analysis of IFME on $P_n$ in [21] is not directly applicable to the unit sphere. However, in Section 3.3 we provided the convergence proof of IFME on the sphere, using tools from gnomonic projection. As an application, the IFME method is used here to develop the iPGA algorithm on the sphere.

Second, the inner product between any two tangent vectors of $S^k$ is equivalent to the standard Euclidean inner product (see Table 4-1), and is independent of the point at which the vectors are anchored. Consequently, standard PCA can be employed on the tangent plane at any point in $S^k$, in contrast to the PGA algorithm on $P_n^m$. However, in our incremental PGA technique, we always keep track of the data matrix at the north pole (or any other arbitrary point on the sphere), because this way only the new log vector needs to be translated for each new sample.

Third, the group action applied in the case of $P_n^m$ is replaced with parallel transport, approximated by the Schild’s Ladder technique described in Section 4.2.2. With these modifications, the new iPGA technique on $S^k$ is summarized in Table 4-3.
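The steps of Table 4-3 can be sketched as follows. Several choices here are assumptions made to keep the example short and are stated plainly: Eq. 3–10 is not reproduced in this chapter, so the mean update is assumed to be a geodesic step of length $1/(k+1)$ toward the new sample; the closed-form parallel transport along a great-circle arc is swapped in for the Schild’s Ladder approximation; and PCA is an SVD of the column-centered data matrix. All function names are hypothetical.

```python
import numpy as np

def sphere_exp(x, u):
    nu = np.linalg.norm(u)
    if nu < 1e-15:
        return x.copy()
    return np.cos(nu) * x + np.sin(nu) * (u / nu)

def sphere_log(x, y):
    c = float(np.dot(x, y))
    w = y - c * x
    nw = np.linalg.norm(w)
    if nw < 1e-15:
        return np.zeros_like(x)
    return np.arctan2(nw, c) * (w / nw)

def transport(x, y, v):
    """Closed-form parallel transport of v from T_x to T_y along the geodesic.

    Stand-in for the Schild's Ladder approximation of Eq. 4-9 (swapped technique).
    """
    u = sphere_log(x, y)
    d2 = float(u @ u)
    if d2 < 1e-30:
        return v.copy()
    return v - (u @ v) / d2 * (u + sphere_log(y, x))

def ipga_sphere_update(A_k, m_k, k, x_new, north, n_components=1):
    """One incremental PGA step on the sphere; columns of A_k live in T_north."""
    m_new = sphere_exp(m_k, sphere_log(m_k, x_new) / (k + 1))   # step 2 (assumed form)
    z = transport(m_new, north, sphere_log(m_new, x_new))       # steps 3-4
    t = transport(m_new, north, sphere_log(m_new, m_k))         # step 5
    A_next = np.column_stack([A_k + t[:, None], z])             # steps 6-7
    centered = A_next - A_next.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    Q = [transport(north, m_new, U[:, j])                       # step 8
         for j in range(n_components)]
    return A_next, m_new, Q
```

As in the SPD case, a run can start from `m = x_1` with a single zero column in the data matrix.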

Figure 4-3. Step by step illustration of the iPGA algorithm on $S^k$, summarized in Table 4-3. From left to right, and top to bottom, steps 1 through 8 are shown, respectively.

4.4 Synthetic Experiments

In this section we present several experiments on synthetically generated data, using the proposed iPGA methods on both $S^k$ and $P_n^m$. The accuracy and efficiency of the proposed algorithms are evaluated against their non-incremental PGA counterparts.

4.4.1 Manifold of SPD Tensor Fields

Data Description: We synthetically generated a group of 25 SPD tensor fields of size $16 \times 16$. The $3 \times 3$ SPD matrices in all tensor fields are ellipsoidal. There are two types of SPD matrices in each tensor field, whose principal eigenvectors differ by 90 degrees. In the generated tensor fields, the angles of the principal eigenvectors of the first and the second matrices are uniformly chosen in $[0, \pi]$ and $[\pi/2, 3\pi/2]$, respectively.

Time Consumption: Given a pool of tensor fields, they are incrementally input

(in random order) to both iPGA and PGA algorithms and the CPU time consumed (on

an Intel-7 2.76GHz CPU with 8GB RAM) by each method to compute the principal

components is recorded. We repeat this experiment 10 times on the data pool of 25

tensor fields and plot the average time and accuracy for each method. The left plot in Fig. 4-5 demonstrates that the time consumption of iPGA is significantly less than that of PGA, especially for a large number of input data samples.

Error Measurement: In order to measure the accuracy of each method, we computed the residual sum defined in [43] for the estimated principal components. For $N$ input tensor fields, the residual sum is defined by $\frac{1}{N} \sum_{j=1}^{N} d^2\left(X_j, \pi_{S_U}(X_j)\right)$, where $d$ is the geodesic distance on $P_n^m$, and $\pi_{S_U}(X_j)$ is the estimated projection of $X_j$ onto the geodesic subspace spanned by the principal components, denoted by $S_U$. The projection, $\pi_{S_U}$, is estimated in the tangent space (see Eq. 6 in [43] for details). This estimation is illustrated in Fig. 4-4. The bar chart on the right in Fig. 4-5 depicts the error comparison between PGA and iPGA at each iteration. It can be observed that iPGA’s residual error is very

Figure 4-4. Estimation of the projection $\pi_{S}(X)$ to the 1-D principal geodesic submanifold (red curve).

Figure 4-5. Time consumption and residual error comparison between iPGA (proposed) and PGA on $P_n^m$.

close to PGA’s. Thus, from an accuracy viewpoint, iPGA is on an equal footing with PGA

but from a computational efficiency viewpoint, it is significantly better.

4.4.2 Unit Sphere $S^k$

We generated a group of 25 random samples on a high-dimensional unit sphere, namely $S^{10000}$. We picked this very high-dimensional space in order to simulate the data points we are going to deal with in the real data experiments.

Figure 4-6. Mean angular error of iPGA estimates w.r.t. PGA on $S^{10000}$.

We fed the samples into both PGA and iPGA methods defined on sphere,

incrementally, and recorded the time consumed by each method to estimate the

principal components. Also, in order to evaluate the accuracy of iPGA for each new

sample, we considered the PGA estimate as ground-truth, and measured the angle

between the first principal components obtained from iPGA and PGA, in the tangent

plane at the north pole. This error is henceforth called the angular error.
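The angular error can be computed as in the sketch below. Since a principal direction is only defined up to sign, the sketch takes the absolute value of the cosine; this sign convention and the function name `angular_error_deg` are assumptions.

```python
import numpy as np

def angular_error_deg(p, q):
    """Angle in degrees between two principal directions, ignoring their sign."""
    c = abs(float(np.dot(p, q))) / (np.linalg.norm(p) * np.linalg.norm(q))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))
```

For example, `angular_error_deg(np.array([1.0, 0.0]), np.array([1.0, 1.0]))` is approximately 45 degrees, and a direction compared with its negation has zero error.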

The experiment is repeated 500 times and the average plots are shown here.

Figure 4-6 illustrates the angular error of iPGA over the number of samples. It can be

seen that the angular error of iPGA with respect to PGA is bounded by 10 degrees and

keeps decreasing, as the sample size gets larger. Besides, it is evident from figure 4-7

that the time consumed by iPGA is significantly less than the non-incremental version,

which makes it an appealing choice especially for large data dimensionality.

4.5 Real Data Experiments: Classification of PD vs. ET vs. Controls

In this section we present an application of iPGA to real data sets. We applied

proposed iPGA techniques on both the unit sphere, as well as the space of SPD tensor

fields. Our real data consists of HARDI acquisitions from patients with Parkinson’s

disease (PD), essential tremor (ET) and controls. The goal here is to be able to

automatically discriminate between these groups using features derived from the

Figure 4-7. Time comparison of incremental and non-incremental PGA estimators on $S^{10000}$.

data. Earlier work in this context in the field of movement disorders involved use of

DTI based ROI analysis specifically using scalar valued measures such as fractional

anisotropy [49]. They showed that DTI had high potential of being a non-invasive early

trait biomarker. All our HARDI data were acquired using a 3T Philips MR scanner with the following parameters: TR = 7748 ms, TE = 86 ms, b-values: 0, 1000 s/mm$^2$, 64 gradient directions and voxel size = $2 \times 2 \times 2$ mm$^3$.

4.5.1 Classification Results using Deformation Tensor Features

In the first part, we perform the classification task using SPD tensor field features.

We use the ensemble average propagators (EAP) at each voxel estimated using the

technique in [23]. We extract the Cauchy deformation tensor field which is computed

from a non-rigid registration of the given EAP fields to the control atlas EAP field

(constructed using the approach in [7]); see Figure 4-8. The Cauchy deformation tensor is defined as $\sqrt{J J^{t}}$, where $J$ is the Jacobian of the deformation at each voxel. The Cauchy deformation tensor is an SPD matrix of size $3 \times 3$ in this case. This gives us an

SPD field as a derived feature corresponding to each given EAP field. We use the iPGA

described earlier and use the nearest geodesic distance-based neighbor to classify the

probe data set. Note that the geodesic distance in this case is the distance between the

probe data set and the geodesic submanifold representation of each class namely, PD,

Figure 4-8. (a) and (b) are the corresponding S0 (zero magnetic gradient) slices of the atlas and a control subject, respectively, and (c) shows the EAPs of the same slice as in (b), with the Substantia Nigra as the ROI. Similarly, (d) and (e) are the corresponding S0 slices of the atlas and a Parkinson subject, respectively, and (f) illustrates the EAPs computed for the slice in (e), with the Substantia Nigra as the ROI.

Table 4-4. Classification results of iPGA, PGA, PCA using SPD tensor field features

| Measure (%) | Control vs. PD: iPGA | Control vs. PD: PGA | Control vs. PD: PCA | Control vs. ET: iPGA | Control vs. ET: PGA | Control vs. ET: PCA | PD vs. ET: iPGA | PD vs. ET: PGA | PD vs. ET: PCA |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 89.00 | 89.95 | 56.37 | 86.44 | 87.13 | 63.43 | 89.18 | 90.28 | 58.53 |
| Sensitivity | 92.72 | 93.33 | 65.29 | 87.01 | 88.94 | 66.27 | 95.57 | 96.47 | 64.71 |
| Specificity | 85.28 | 86.57 | 47.45 | 85.87 | 85.32 | 60.59 | 82.79 | 84.09 | 52.35 |

ET and Controls. The probe is assigned the label of that class with smallest geodesic

distance.
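The per-voxel feature described above, the matrix square root of $J J^{t}$, can be computed through an eigendecomposition, as in the sketch below. The function name is hypothetical, and the Jacobian is assumed to be invertible so that the result is strictly positive definite.

```python
import numpy as np

def cauchy_deformation_tensor(J):
    """Return sqrt(J J^t) for a square Jacobian J via the eigendecomposition of J J^t."""
    S = J @ J.T
    w, V = np.linalg.eigh(0.5 * (S + S.T))
    w = np.clip(w, 0.0, None)          # guard against tiny negative round-off
    return (V * np.sqrt(w)) @ V.T
```

For a pure rotation the result is the identity matrix, which reflects that this feature discards the rotational part of the deformation and keeps only the stretch.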

Classification is performed on 26 PD, 16 ET and 25 control subjects using the PGA

of the Cauchy deformation tensor fields described above, where 10 subjects from PD

and control, as well as 6 subjects from ET were randomly picked as test group, and the

rest of the subjects were used for training. The experiment is repeated 300 times and the mean values are reported. Table 4-4 summarizes the accuracy for each method, where Accuracy $= \frac{TP + TN}{TP + TN + FP + FN}$, Sensitivity $= \frac{TP}{TP + FN}$ and Specificity $= \frac{TN}{FP + TN}$; FN denotes the number of False Negatives, and similarly for TP, TN and FP. For comparison, we also used the standard PCA method, which is applied to a vectorized version of the tensor fields.

The size of the tensor fields was restricted to the ROIs instead of the whole image.

Thus, the dimensionality was 600 ∗ 6 = 3600 and we used just the first two principal

components in all competing methods to achieve the classification reported in the table.

From the table, it is evident that iPGA and PGA provide very similar accuracies in all

three classifications. Further, iPGA is considerably more accurate than PCA, because in the latter method the non-linearity of $P_n^m$ is not taken into account.
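For completeness, the three reported measures can be computed from the confusion counts of a two-class experiment as sketched below, using the standard definitions and expressing the results in percent. The function name and the dictionary layout are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity (in percent) from confusion counts."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }
```

For instance, with 9 true positives, 8 true negatives, 2 false positives and 1 false negative, the accuracy is 85%, the sensitivity 90% and the specificity 80%.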

4.5.2 Classification Results using Shape Features

In the second part, we evaluated the iPGA algorithm on the unit sphere in the task of movement disorder classification. To this end, we used the shape of the

Substantia Nigra region in the brain images, as the discriminant feature. Recently, in

[10], a Schrodinger Distance Transform (SDT) was introduced and applied to represent

the point clouds (in 2-D or 3-D) as points on an infinite dimensional Hilbert sphere.

The shape of Substantia Nigra region was hand-segmented in all rigidly aligned

datasets, consisting of 25 controls, 24 PD and 15 ET images. We first collected the

same number of random samples on the boundary of each 3-D shape, and applied the

SDT technique to represent each shape as a point on a unit sphere. The 3-D shape

domain was set to $28 \times 28 \times 15$, resulting in 11760-dimensional unit vectors from SDT. Therefore, the samples now live on the manifold $S^{11759}$. Figure 4-9 shows the extracted shapes of the Substantia Nigra in the 25 control images.

Once all shapes are represented as points on the unit sphere, we can apply our

incremental PGA method for spherical features. Figure 4-10 illustrates the mean shape, along with the first principal components from the PGA and iPGA methods, with coefficients $1.5\sqrt{\lambda}$ and $3\sqrt{\lambda}$, where $\lambda$ is the corresponding coefficient of the first principal component estimated from each method.

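Table 4-5. Classification results of iPGA, PGA, PCA using shape descriptor features

| Measure (%) | Control vs. PD: iPGA | Control vs. PD: PGA | Control vs. PD: PCA | Control vs. ET: iPGA | Control vs. ET: PGA | Control vs. ET: PCA | PD vs. ET: iPGA | PD vs. ET: PGA | PD vs. ET: PCA |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 91.46 | 92.95 | 67.32 | 88.28 | 90.14 | 75.69 | 86.13 | 87.58 | 64.60 |
| Sensitivity | 87.98 | 90.93 | 51.96 | 86.34 | 88.18 | 77.87 | 80.54 | 82.38 | 48.36 |
| Specificity | 94.94 | 94.98 | 82.69 | 92.16 | 94.05 | 71.32 | 97.32 | 98.00 | 97.08 |

Figure 4-9. Population of Substantia Nigra regions extracted from the control brain images.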

Next, a PGA-based classification was performed, in a similar manner to the

previous section. We randomly selected 10 Control, 10 PD and 5 ET images as the

test set, and used the rest of the images for training. The classification task is repeated

300 times using various training sets and the average accuracy is computed. The

classification results using the shape descriptors are summarized in Table 4-5. It can

be seen that the accuracy of iPGA is reasonably close to that of PGA, while both outperform the standard linear PCA.

Figure 4-10. Comparison of incremental (bottom row) and non-incremental (top row) results of (1) Frechet Means (left column), (2) PGA with the coefficient $1.5\sqrt{\lambda}$ (middle column), and (3) PGA with the coefficient $3\sqrt{\lambda}$ (right column)

CHAPTER 5
SUMMARY AND DISCUSSION

In this dissertation we developed novel incremental algorithms for statistical analysis

of manifold-valued data. In the first part, we proposed an incremental (intrinsic) mean

computation technique for the space of Symmetric Positive Definite (SPD) matrices,

based on the Stein distance. The key contribution entailed the derivation of a closed

form solution for the computation of a weighted Stein mean for two SPD matrices

which was then used in developing an incremental algorithm for computing the Stein

mean of a population of SPD matrices. Further, using this incremental Stein mean

estimator, we experimentally demonstrated significant gains in computation time over the non-incremental counterpart while maintaining approximately the same accuracy.

Second, we presented a new incremental algorithm for computing the Frechet mean

for samples on a sphere. We presented the proof of convergence for this incremental

algorithm, when the number of samples tends to infinity. Several applications of sample

data that live on the sphere are considered and results depict superior performance of

our incremental algorithm over the non-incremental counterpart. Finally, we presented a

novel incremental algorithm to perform Principal Geodesic Analysis (PGA) applicable to

the manifold of SPD matrices as well as a sphere. Further, we demonstrated significant time gains using our incremental algorithm, while keeping the accuracy approximately the same as that of the non-incremental counterpart.


REFERENCES

[1] Afsari, Bijan. “Riemannian Lp center of mass: Existence, uniqueness, and convexity.” Proceedings of the American Mathematical Society 139 (2011).2: 655–673.

[2] Amarasinghe, GW. “On the standard lengths of angle bisectors and the angle bisector theorem.” Global Journal of Advanced Research on Classical and Modern Geometries 1 (2012).1.

[3] Baisnab, AP and Jas, AP Baisnab Manoranjan. Elements of Probability and Statistics. Tata McGraw-Hill Education, 1993.

[4] Cetingul, Hasan Ertan, Afsari, Bijan, Wright, Margaret J, Thompson, Paul M, and Vidal, Rene. “Group action induced averaging for HARDI processing.” Biomedical Imaging (ISBI), 2012 9th IEEE International Symposium on. IEEE, 2012, 1389–1392.

[5] Chebbi, Z. and Moakher, M. “Means of Hermitian positive-definite matrices based on the log-determinant divergence function.” Linear Algebra and its Applications 40 (2012).

[6] Cheng, Guang, Salehian, Hesamoddin, and Vemuri, Baba C. “Efficient recursive algorithms for computing the mean diffusion tensor and applications to DTI segmentation.” ECCV. Springer, 2012.

[7] Cheng, Guang, Vemuri, Baba C, Hwang, Min-Sig, Howland, Dena, and Forder, John R. “Atlas construction from high angular resolution diffusion imaging data represented by Gaussian Mixture fields.” Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on. IEEE, 2011, 549–552.

[8] Cheng, Jian, Ghosh, Aurobrata, Jiang, Tianzi, and Deriche, Rachid. “A Riemannian framework for orientation distribution function computing.” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009. Springer, 2009. 911–918.

[9] Cherian, A., Sra, S., Banerjee, A., and Papanikolopoulos, N. “Efficient similarity search for covariance matrices via the JB LogDet Divergence.” ICCV. 2011, 2399–2406.

[10] Deng, Yan, Rangarajan, Anand, Eisenschenk, Stephan, and Vemuri, Baba C. “A Riemannian Framework for Matching Point Clouds Represented by the Schrodinger Distance Transform.” 2014.

[11] Do Carmo, Manfredo P. Riemannian geometry. Springer, 1992.

[12] Fillard, P., Arsigny, V., Pennec, X., Thompson, M., and Ayache, N. “Extrapolation of sparse tensor fields: application to the modeling of brain variability.” International Conference on Information Processing in Medical Imaging (IPMI). 2005.


[13] Fletcher, P Thomas and Joshi, Sarang. “Riemannian geometry for the statistical analysis of diffusion tensor data.” Signal Processing 87 (2007).2: 250–262.

[14] Fletcher, P.T., Lu, C., Pizer, S.M., and Joshi, S. “Principal geodesic analysis for the study of nonlinear statistics of shape.” Medical Imaging, IEEE Transactions on 23 (2004).8: 995–1005.

[15] Frechet, Maurice. “Les elements aleatoires de nature quelconque dans un espace distancie.” Annales de l’institut Henri Poincare. vol. 10. Presses universitaires de France, 1948, 215–310.

[16] Grove, Karsten and Karcher, Hermann. “How to conjugate $C^1$-close group actions.” Mathematische Zeitschrift 132 (1973).1: 11–20.

[17] Harandi, M., Sanderson, C., Hartley, R., and Lovell, B.C. “Sparse Coding and Dictionary Learning for Symmetric Positive Definite Matrices: A Kernel Approach.” European Conference on Computer Vision (ECCV). 2012.

[18] Hartley, Richard, Trumpf, Jochen, Dai, Yuchao, and Li, Hongdong. “Rotation averaging.” International journal of computer vision 103 (2013).3: 267–305.

[19] Hauberg, Søren, Lauze, Francois, and Pedersen, Kim Steenstrup. “Unscented kalman filtering on riemannian manifolds.” Journal of mathematical imaging and vision 46 (2013).1: 103–120.

[20] Heo, Jae-Pil, Lee, YoungWoon, He, Junfeng, Chang, Shih-Fu, and Yoon, Sung-eui. “Spherical Hashing.” IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). 2012.

[21] Ho, Jeffrey, Cheng, Guang, Salehian, Hesamoddin, and Vemuri, Baba. “Recursive Karcher Expectation Estimators And Geometric Law of Large Numbers.” Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics. 2013, 325–332.

[22] Horn, Berthold. Robot vision. MIT press, 1986.

[23] Jian, Bing and Vemuri, Baba C. “A Unified Computational Framework for Deconvolution to Reconstruct Multiple Fibers From DWMRI.” IEEE TMI 26 (2007): 1464–1471.

[24] Karcher, Hermann. “Riemannian Center of Mass and so called karcher mean.” arXiv preprint arXiv:1407.2087 (2014).

[25] Kendall, Wilfrid S. “Probability, convexity, and harmonic maps with small image I: uniqueness and fine existence.” Proceedings of the London Mathematical Society 3 (1990).2: 371–406.

[26] Kim, Hyunwoo J, Adluru, Nagesh, Bendlin, Barbara B, Johnson, Sterling C, Vemuri, Baba C, and Singh, Vikas. “Canonical Correlation Analysis on Riemannian Manifolds and Its Applications.” Computer Vision–ECCV 2014. Springer, 2014. 251–267.

[27] Latecki, Longin Jan, Lakamper, Rolf, and Eckhardt, T. “Shape descriptors for non-rigid shapes with a single closed contour.” CVPR. 2000, 424–429.

[28] Lenglet, C., Rousson, M., and Deriche, R. “DTI segmentation by statistical surface evolution.” IEEE Transactions on Medical Imaging 25 (2006).6: 685–700.

[29] Li, Jia and Wang, James Z. “Automatic linguistic indexing of pictures by a statistical modeling approach.” PAMI (2003).

[30] Lim, Yongdo and Palfia, Miklos. “Weighted inductive means.” Linear Algebra and its Applications 453 (2014): 59–83.

[31] Lorenzi, Marco, Ayache, Nicholas, and Pennec, Xavier. “Schilds Ladder for the parallel transport of deformations in time series of images.” Information Processing in Medical Imaging. Springer, 2011, 463–474.

[32] Lowe, David G. “Object recognition from local scale-invariant features.” Computer vision, 1999. The proceedings of the seventh IEEE international conference on. vol. 2. IEEE, 1999, 1150–1157.

[33] Mardia, Kanti V and Jupp, Peter E. Directional statistics, vol. 494. John Wiley & Sons, 2009.

[34] Moakher, M. and Batchelor, P. G. SPD Matrices: From Geometry to Applications and Visualization. Visual. and Proc. of Tensor Fields, 2006, 285–298.

[35] Ncube, Sentibaleng and Srivastava, Anuj. “A novel Riemannian metric for analyzing HARDI data.” SPIE Medical Imaging. International Society for Optics and Photonics, 2011, 79620Q–79620Q.

[36] Pennec, Xavier. “Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements.” JMIV 25 (2006).1: 127–154.

[37] Said, Salem, Courty, Nicolas, Le Bihan, Nicolas, Sangwine, Stephen J, et al. “Exact principal geodesic analysis for data on so (3).” Proceedings of the 15th European Signal Processing Conference, EUSIPCO-2007. 2007, 1700–1705.

[38] Sakai, Takashi. Riemannian geometry, vol. 149. American Mathematical Soc., 1996.

[39] Salehian, Hesamoddin, Cheng, Guang, Vemuri, Baba C, and Ho, Jeffrey. “Recursive Estimation of the Stein Center of SPD Matrices and Its Applications.” Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013, 1793–1800.

[40] Salehian, Hesamoddin, Vaillancourt, David, and Vemuri, Baba C. “iPGA: Incremental Principal Geodesic Analysis with Applications to Movement Disorder Classification.” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014. Springer, 2014. 765–772.

[41] Schwartzman, Armin. Random ellipsoids and false discovery rates: Statistics for diffusion tensor imaging data. Ph.D. thesis, Stanford University, 2006.

[42] Sloane, Neil JA et al. “The on-line encyclopedia of integer sequences.” 2003.

[43] Sommer, Stefan, Lauze, Francois, Hauberg, Søren, and Nielsen, Mads. “Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations.” Computer Vision–ECCV 2010. Springer, 2010. 43–56.

[44] Sra, S. “Positive Definite Matrices and the Symmetric Stein Divergence.” Available in author’s website at “http://people.kyb.tuebingen.mpg.de/suvrit/” (2011).

[45] Srivastava, Anuj, Jermyn, Ian, and Joshi, Shantanu. “Riemannian analysis of probability density functions with applications in vision.” Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on. IEEE, 2007, 1–8.

[46] Sturm, K. T. “Probability Measures on Metric Spaces of Nonpositive Curvature.” Heat Kernels and Analysis on Manifolds, Graphs, and Metric Spaces. 2003.

[47] Terras, A. Harmonic Analysis on Symmetric Spaces and Applications. Springer-Verlag, 1985.

[48] Tournier, Maxime, Wu, Xiaomao, Courty, Nicolas, Arnaud, Elise, and Reveret, Lionel. “Motion compression using principal geodesics analysis.” Computer Graphics Forum. vol. 28. Wiley Online Library, 2009, 355–364.

[49] Vaillancourt, DE, Spraker, MB, Prodoehl, J, Abraham, I, Corcos, DM, Zhou, XJ, Comella, CL, and Little, DM. “High-resolution diffusion tensor imaging in the substantia nigra of de novo Parkinson disease.” Neurology 72 (2009).16: 1378–1384.

[50] Wang, Yuanxiang, Salehian, Hesamoddin, Cheng, Guang, and Vemuri, Baba. “Tracking on the Product Manifold of Shape and Orientation for Tractography from Diffusion MRI.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013, 3051–3056.

[51] Wang, Z. and Vemuri, B. “Tensor field segmentation using region based active contour model.” European Conference on Computer Vision (ECCV). 2004, 304–315.

[52] Woods, Roger P. “Characterizing volume and surface deformations in an atlas framework: theory, applications, and implementation.” NeuroImage 18 (2003).3: 769–788.

[53] Wu, Jing, Smith, William AP, and Hancock, Edwin R. “Weighted principal geodesic analysis for facial gender classification.” IAPR. Springer, 2007, 331–339.

[54] Wu, Yi, Wang, Jinqiao, and Lu, Hanqing. “Real-Time Visual Tracking via Incremental Covariance Model Update on Log-Euclidean Riemannian Manifold.” CCPR. 2009.

[55] Xie, Yuchen, Vemuri, Baba C, and Ho, Jeffrey. “Statistical analysis of tensor fields.” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2010. Springer, 2010. 682–689.

[56] Zha, Hongyuan and Simon, Horst D. “On updating problems in latent semantic indexing.” SIAM Journal on Scientific Computing 21 (1999).2: 782–791.

[57] Zhang, Miaomiao and Fletcher, P Thomas. “Probabilistic Principal Geodesic Analysis.” NIPS. 2013.

BIOGRAPHICAL SKETCH

Hesamoddin Salehian was born in 1987 in Tehran, Iran. He graduated from high school in Semnan, Iran, in 2006. He received his Bachelor of Science degree in Computer Engineering from Sharif University of Technology, Tehran, Iran, in June 2010. He earned his Master of Science degree in Computer Engineering from the University of Florida, Gainesville, in September 2014. He received his Doctor of Philosophy degree in Computer Engineering from the University of Florida in December 2014. His research interests revolve around Medical Image Analysis, Computer Vision and Machine Learning.
