
SOLVING MINIMAL-DISTANCE PROBLEMS OVER THE

MANIFOLD OF REAL SYMPLECTIC MATRICES

SIMONE FIORI∗

Abstract. The present paper discusses the question of formulating and solving minimal-distance problems over the group-manifold of real symplectic matrices. In order to tackle the related optimization problem, the real symplectic group is regarded as a pseudo-Riemannian manifold and a metric is chosen that affords the computation of geodesic arcs in closed form. Then, the considered minimal-distance problem can be solved numerically via a gradient steepest-descent algorithm implemented through a geodesic-stepping method. The minimal-distance problem investigated in this paper relies on a suitable notion of distance, induced by the Frobenius norm, as opposed to the natural pseudo-distance that corresponds to the pseudo-Riemannian metric that the real symplectic group is endowed with. Numerical tests about the computation of the empirical mean of a collection of symplectic matrices illustrate the discussed framework.

Key words. Group of real symplectic matrices. Pseudo-Riemannian geometry. Gradient-based optimization on manifolds. Geodesic stepping method.

Citation: Cite this paper as: S. Fiori, Solving minimal-distance problems over the manifold of real symplectic matrices, SIAM Journal on Matrix Analysis and Applications, Vol. 32, No. 3, pp. 938–968, 2011.

1. Introduction. Optimization problems over smooth manifolds have received considerable attention due to their broad application range (see, for instance, [2, 9, 10, 16] and references therein). In particular, the formulation of minimal-distance problems on compact Riemannian Lie groups relies on closed forms of geodesic curves and geodesic distances. Notwithstanding, on certain manifolds the problem of formulating a minimal-distance criterion function and of optimizing it is substantially more involved, because it might be hard to compute geodesic distances in closed form. One such manifold is the real symplectic group.

Real symplectic matrices form an algebraic group denoted by Sp(2n,R), with n ≥ 1. The real symplectic group is defined as follows:

Sp(2n,\mathbb{R}) \overset{\mathrm{def}}{=} \{X \in \mathbb{R}^{2n\times 2n} \mid X^T Q_{2n} X = Q_{2n}\}, \quad\text{with}\quad Q_{2n} \overset{\mathrm{def}}{=} \begin{bmatrix} 0_n & I_n \\ -I_n & 0_n \end{bmatrix}, \tag{1.1}

where I_n denotes the n × n identity matrix and 0_n denotes the n × n zero matrix. A noticeable application of real symplectic matrices is to the control of beam systems in particle accelerators [12, 29], where Lie-group tools are applied to the characterization of beam dynamics in charged-particle optical systems. Such methods are applicable to accelerator design, charged-particle beam transport and electron microscopes. In the context of vibration analysis, real symplectic 'transfer matrices' are widely used for the dynamic analysis of engineering structures as well as for static analysis, and are particularly useful in the treatment of repetitive structures [38]. Other notable applications of real symplectic matrices are to coding theory [8], quantum computing [6, 22], time-series prediction [3], and automatic control [17]. A further list of applications of symplectic matrices is reported in [14].
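The membership condition (1.1) is straightforward to check numerically. A minimal sketch (Python with NumPy/SciPy; the test matrix `X` is generated, for illustration, as the exponential of a matrix of the form Q·S with S symmetric, a Hamiltonian matrix in the sense recalled in Section 2.3, which is guaranteed to be symplectic):

```python
import numpy as np
from scipy.linalg import expm

def Q(n):
    """The skew-symmetric matrix Q_2n from (1.1)."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

def is_symplectic(X, tol=1e-8):
    """Check the defining condition X^T Q X = Q up to a tolerance."""
    n = X.shape[0] // 2
    return np.linalg.norm(X.T @ Q(n) @ X - Q(n)) < tol

rng = np.random.default_rng(0)
n = 2
A = rng.standard_normal((2 * n, 2 * n))
S = A + A.T                      # symmetric
X = expm(Q(n) @ S)               # exponential of a Hamiltonian matrix
assert is_symplectic(X)
assert not is_symplectic(2.0 * np.eye(2 * n))
```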

The minimal-distance problem over the set of real symplectic matrices plays an important role in applied fields. A recent application which involves the solution

∗Dipartimento di Ingegneria Biomedica, Elettronica e Telecomunicazioni (DiBET), Facoltà di Ingegneria, Università Politecnica delle Marche, Via Brecce Bianche, Ancona I-60131, Italy. E-mail: s.fiori@univpm.it


of a minimal-distance problem in the real symplectic group of matrices is found in the study of optical systems in computational ophthalmology, where it is assumed that the optical nature of a centered optical system is completely described by a real symplectic matrix [23, 25] and the involved symplectic matrices are of size 4 × 4 (n = 2). A notable application is found in the control of beam systems in particle accelerators [15], where the size of the involved symplectic matrices is 6 × 6 (n = 3), and in the assessment of the fidelity of dynamical gates in quantum analog computation [41], where the integer n corresponds to the number of quantum observables and may be quite large (for example, n = 20).

Although results are available about the real symplectic group [13, 14, 30], optimization on the manifold of real symplectic matrices appears to be far less studied than for other Lie groups. The present manuscript aims to discuss such a problem and to present a solution based on the key idea of treating the space Sp(2n,R) as a manifold endowed with a pseudo-Riemannian metric that affords the computation of geodesic arcs in closed form. Then, the considered minimal-distance problem can be solved numerically via a gradient steepest-descent algorithm implemented through a geodesic-stepping method. A proper minimal-distance objective function on the manifold Sp(2n,R) is constructed by building on a result available from the scientific literature about the Frobenius norm of the difference of two symplectic matrices. Given a collection X_1, X_2, ..., X_N of real symplectic matrices of size 2n × 2n, the goal of this paper is the minimization of the function

f(X) \overset{\mathrm{def}}{=} \frac{1}{2N}\sum_{k=1}^{N} \|X - X_k\|_F^2 \tag{1.2}

over the manifold Sp(2n,R), to find the minimizing symplectic matrix.

The remainder of this paper is organized as follows. Section 2 recalls those notions from differential geometry (with particular emphasis on pseudo-Riemannian geometry) that are instrumental in the development of an optimization algorithm on the manifold of symplectic matrices. Results about the computation of geodesic arcs and of the pseudo-Riemannian gradient of a regular function on the manifold of real symplectic matrices are presented. Section 3 defines minimal-distance problems on manifolds and recalls the basic idea of the gradient-steepest-descent optimization method on pseudo-Riemannian manifolds. It deals explicitly with minimal-distance problems on the manifold of real symplectic matrices and recalls how to measure the discrepancy between real symplectic matrices. Section 4 presents the results of numerical tests performed on an averaging problem over the manifold Sp(2n,R). Section 5 concludes the paper.
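As a concrete reading of the objective (1.2), a minimal sketch follows (Python/NumPy; the sample matrices here are arbitrary, since (1.2) can be evaluated at any matrix, while the constraint X ∈ Sp(2n,R) only enters through the search space):

```python
import numpy as np

def f(X, samples):
    """Criterion (1.2): (1/2N) * sum_k ||X - X_k||_F^2."""
    return sum(np.linalg.norm(X - Xk, 'fro') ** 2 for Xk in samples) / (2 * len(samples))

rng = np.random.default_rng(0)
samples = [rng.standard_normal((4, 4)) for _ in range(5)]

# Without the manifold constraint, the minimizer of (1.2) is the arithmetic
# mean of the samples; on Sp(2n,R) the minimizer must instead be sought on
# the manifold, e.g. by geodesic stepping.
mean = sum(samples) / len(samples)
assert all(f(mean, samples) <= f(Xk, samples) + 1e-12 for Xk in samples)
```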

2. The real symplectic matrix group and its geometry. The present section recalls notions of differential geometry that are instrumental in the development of the following topics, such as affine geodesic curves and the pseudo-Riemannian gradient. For a general-purpose reference on differential geometry, see [39], while for a specific reference on pseudo-Riemannian geometry and the calculus of variations on manifolds, see [32]. Throughout the present section and the rest of the paper, matrices are denoted by upper-case letters (e.g., X) while their entries are denoted by indexed lower-case letters (e.g., x_{ij}).

2.1. Notes on pseudo-Riemannian geometry. Let M denote a p-dimensional pseudo-Riemannian manifold. Associated with each point x in a p-dimensional differentiable manifold M is a tangent space, denoted by T_xM. This is a p-dimensional vector space whose elements can be thought of as equivalence classes of curves passing through the point x. The symbol TM \overset{\mathrm{def}}{=} \{(x, v) \mid x \in M,\ v \in T_xM\} denotes the tangent bundle associated with the manifold M. The symbol Ω¹(M) denotes the set of analytic 1-forms on M and the symbol X(M) denotes the set of vector fields on M. A vector field F ∈ X(M) is a map F : x ∈ M ↦ F(x) ∈ T_xM. The symbol ⟨·,·⟩_E denotes the Euclidean inner product.

The pseudo-Riemannian manifold M ∋ x is endowed with an indefinite inner product ⟨·,·⟩_x : T_xM × T_xM → R. An indefinite inner product is a non-degenerate, smooth, symmetric, bilinear map which assigns a real number to pairs of tangent vectors at each tangent space of the manifold. That the metric is non-degenerate means that there is no non-zero v ∈ T_xM such that ⟨v, w⟩_x = 0 for all w ∈ T_xM. Let x = (x¹, x², ..., x^p) denote a local parametrization and ∂_i denote the canonical basis of T_xM. The metric tensor field G : M → R^{p×p} of components g_{ij}(x) associated with the indefinite inner product ⟨·,·⟩_x is a covariant tensor field defined by:

g_{ij}(x) \overset{\mathrm{def}}{=} \langle\partial_i, \partial_j\rangle_x, \tag{2.1}

which is smooth in x. Given tangent vectors u, v ∈ T_xM parameterized by u = u^i∂_i and v = v^i∂_i (Einstein summation convention used), their inner product expresses as:

\langle u, v\rangle_x = g_{ij}(x)\,u^i v^j. \tag{2.2}

In pseudo-Riemannian geometry, the metric tensor g_{ij} is symmetric and invertible (but not necessarily positive-definite; positive-definiteness is the additional property that characterizes Riemannian geometry). Therefore, it might hold that ⟨u, u⟩_x = 0 even for nonzero tangent vectors u. The inverse G^{−1} of the metric tensor field is a contravariant tensor field whose components are denoted by g^{ij}(x).
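A toy coordinate example of the phenomenon just mentioned, with a constant indefinite metric on R² (hypothetical, for illustration only):

```python
import numpy as np

# A constant indefinite metric tensor on R^2, signature (+, -).
G = np.diag([1.0, -1.0])          # symmetric, invertible, not positive-definite
inner = lambda u, v: float(u @ G @ v)

u = np.array([1.0, 1.0])          # u != 0, yet <u,u>_x = 0 (a "null" vector)
assert inner(u, u) == 0.0
assert np.linalg.det(G) != 0.0    # the metric is nevertheless non-degenerate
```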

Given a regular function f : M → R, the differential df : TM → Ω¹(M) is expressed by:

df_x(v) = \langle\nabla_x f, v\rangle_x, \quad v \in T_xM, \tag{2.3}

where ∇_x f ∈ T_xM denotes the pseudo-Riemannian gradient. In local coordinates, it holds that:

df_x(v) = g_{ij}(x)\,(\nabla_x f)^i v^j. \tag{2.4}

As the differential does not depend on the choice of coordinates, it holds that df_x(v) = (∂_i f)_x v^i. It follows that g_{ij}(x)(∇_x f)^j = (∂_i f)_x. The sharp isomorphism ♯ : Ω¹(M) → X(M) takes a 1-form on M and returns a vector field on M. It holds that:

(df_x)^\sharp = \nabla_x f. \tag{2.5}

The pseudo-Riemannian gradient may be computed by the metric compatibility condition:

\langle\partial_x f, v\rangle_E = \langle\nabla_x f, v\rangle_x, \quad \forall v \in T_xM. \tag{2.6}
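In coordinates, condition (2.6) reads g_{ij}(∇_x f)^j = (∂_i f)_x, so the pseudo-Riemannian gradient is obtained by solving a linear system with the metric tensor. A sketch (Python/NumPy; the metric tensor and the Euclidean gradient at the point x are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
# A random symmetric (in general indefinite) metric tensor G at a point x.
A = rng.standard_normal((3, 3))
G = A + A.T
df = rng.standard_normal(3)        # Euclidean gradient (partial derivatives)
grad = np.linalg.solve(G, df)      # pseudo-Riemannian gradient: G grad = df

# Check the compatibility condition (2.6) on random tangent vectors.
for _ in range(5):
    v = rng.standard_normal(3)
    assert np.isclose(df @ v, grad @ (G @ v))
```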

The notion of differential generalizes to the notion of pushforward map. Given differentiable manifolds M and N and a regular function f : M → N, the pushforward map df : TM → TN is a linear map such that df|_x : T_xM → T_{f(x)}N for every x ∈ M. Let γ : [−a, a] → M be a smooth curve on the manifold M for some a > 0 and let x \overset{\mathrm{def}}{=} γ(0). The pushforward map df|_x is defined by:

\left.\frac{d}{dt} f(\gamma(t))\right|_{t=0} = df|_x\!\left(\left.\frac{d\gamma}{dt}\right|_{t=0}\right). \tag{2.7}


The following result holds true.

Lemma 2.1 (Adapted from [5]). Let M denote a manifold of n × n real-valued matrices and let f : M → R^{n×n} denote an analytic map about the point X_0 ∈ M, namely:

f(X) = a_0 I_n + \sum_{k=1}^{\infty} a_k (X - X_0)^k, \quad a_k \in \mathbb{R}, \tag{2.8}

in a neighborhood of X_0. Then it holds that:

df|_X(V) = \sum_{k=1}^{\infty} a_k \sum_{r=1}^{k} (X - X_0)^{r-1}\, V\, (X - X_0)^{k-r}, \tag{2.9}

for any V ∈ T_XM.

The covariant derivative (or connection) of a vector field F ∈ X(M) in the direction of a vector v ∈ T_xM is denoted by ∇_vF. The covariant derivative is defined axiomatically by the following properties:

\nabla_{fv+gw}F = f\nabla_v F + g\nabla_w F, \tag{2.10}
\nabla_v(F + G) = \nabla_v F + \nabla_v G, \tag{2.11}
\nabla_v(fF) = f\nabla_v F + F\, df_x(v), \tag{2.12}

for any vector fields F, G, tangent vectors v, w and scalar functions f, g. The covariant derivative of a vector field F ∈ X(M) along a vector v ∈ T_xM may be extended to the covariant derivative of F along a vector field G by \nabla_G F(x) \overset{\mathrm{def}}{=} \nabla_{G(x)}F. Such a rule defines a connection ∇ : X(M) × X(M) → X(M). The fundamental relationship for the connection is:

\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\,\partial_k, \tag{2.13}

where the quantities Γ^k_{ij} : M → R are termed Christoffel symbols of the second kind and describe the structure of the connection. The derivative ∇_{∂_i}∂_j measures the change of the elementary vector field ∂_j = ∂_j(x) in the direction ∂_i. By the properties (2.10)–(2.12), it is readily obtained that:

\nabla_{v^i\partial_i}(u^j\partial_j) = v^i\left(\Gamma^k_{ij}u^j + \frac{\partial u^k}{\partial x^i}\right)\partial_k, \tag{2.14}

where the functions v^i = v^i(x) and u^i = u^i(x) are the components of vector fields in X(M) in the basis {∂_i}. The Christoffel symbols of the second kind may be specified arbitrarily and give rise to an arbitrary connection. In the Levi-Civita geometry, the Christoffel symbols of the second kind are associated with the metric tensor of components g_{ij} and are defined as:

\Gamma^k_{ij} \overset{\mathrm{def}}{=} \frac{1}{2}\, g^{kh}\left(\frac{\partial g_{hj}}{\partial x^i} + \frac{\partial g_{ih}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^h}\right). \tag{2.15}

The Christoffel symbols of the second kind defined as in (2.15) are symmetric in the covariant indices, namely, Γ^k_{ij} = Γ^k_{ji}. The associated Christoffel form Γ_x : T_xM × T_xM → R^p is defined in local coordinates by [\Gamma_x(v, w)]^k \overset{\mathrm{def}}{=} \Gamma^k_{ij} v^i w^j. In the present paper, the notion of connection refers to a Levi-Civita connection, unless otherwise stated.
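Formula (2.15) can be checked numerically. The sketch below evaluates the Christoffel symbols of the polar-coordinate metric on the punctured plane, ds² = dr² + r² dθ², approximating the partial derivatives of g_{ij} by central finite differences; the choice of metric is illustrative and not taken from the paper:

```python
import numpy as np

def g(x):
    # Polar-coordinate metric on R^2 minus the origin: x = (r, theta).
    return np.diag([1.0, x[0] ** 2])

def christoffel(x, step=1e-6):
    """Gamma[k, i, j] from (2.15), with dg/dx by central finite differences."""
    p = len(x)
    dg = np.zeros((p, p, p))          # dg[k, i, j] = d g_ij / d x^k
    for k in range(p):
        e = np.zeros(p); e[k] = step
        dg[k] = (g(x + e) - g(x - e)) / (2 * step)
    ginv = np.linalg.inv(g(x))
    Gamma = np.zeros((p, p, p))
    for k in range(p):
        for i in range(p):
            for j in range(p):
                Gamma[k, i, j] = 0.5 * sum(
                    ginv[k, h] * (dg[i, h, j] + dg[j, i, h] - dg[h, i, j])
                    for h in range(p))
    return Gamma

x = np.array([2.0, 0.3])
Gamma = christoffel(x)
assert np.isclose(Gamma[0, 1, 1], -x[0])        # Gamma^r_{theta,theta} = -r
assert np.isclose(Gamma[1, 0, 1], 1.0 / x[0])   # Gamma^theta_{r,theta} = 1/r
```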

The notion of covariant derivative is intimately tied to the notion of parallel translation (or transport) along a curve. Parallel translation allows, e.g., comparing vectors belonging to different tangent spaces. On a smooth pseudo-Riemannian manifold M with connection ∇, fix a smooth curve γ : [−a, a] → M (a > 0). The parallel translation map Γ^t_s(γ) : T_{γ(s)}M → T_{γ(t)}M associated with the curve is a linear isomorphism for every s, t ∈ [−a, a]. The parallel translation map depends smoothly on its arguments and is such that Γ^s_s(γ) is an identity map and Γ^t_u(γ)Γ^u_s(γ) = Γ^t_s(γ) for every s, u, t ∈ [−a, a]. Let x = γ(0) and v = γ̇(0). The covariant derivative of a vector field F ∈ X(M) in the direction v ∈ T_xM is given by:

\nabla_v F = \lim_{\varepsilon\to 0}\frac{\Gamma^0_\varepsilon(\gamma)F(\gamma(\varepsilon)) - F(\gamma(0))}{\varepsilon} = \left.\frac{d}{dt}\,\Gamma^0_t(\gamma)F(\gamma(t))\right|_{t=0}. \tag{2.16}

The notion of geodesic curve generalizes the notion of straight line of Euclidean spaces. A distinguishing feature of a straight line of a Euclidean space is that it translates parallel to itself, namely, it is self-parallel. The notion of 'straight line' on a curved space inherits such a distinguishing feature. An affine geodesic on a manifold M with connection ∇ and associated parallel translation map Γ^·_·(·) is a curve γ such that γ̇ is parallel translated along γ itself, namely, for every s, t ∈ [−a, a], it holds that:

\Gamma^t_s(\gamma)\,\dot\gamma(s) = \dot\gamma(t). \tag{2.17}

Setting s = 0, F(\gamma(t)) \overset{\mathrm{def}}{=} \dot\gamma(t), and invoking the relationship (2.16), it is seen that an affine geodesic curve must satisfy the condition:

\nabla_{\dot\gamma}\,\dot\gamma = 0. \tag{2.18}

A geodesic curve is denoted throughout the paper by ρ(t). On a pseudo-Riemannian manifold M, the parallel translation map along a curve γ : [−a, a] → M preserves the angle between tangent vectors, namely:

\langle\Gamma^t_s(\gamma)v,\ \Gamma^t_s(\gamma)w\rangle_{\gamma(t)} = \langle v, w\rangle_{\gamma(s)}, \tag{2.19}

for every pair of tangent vectors v, w ∈ T_{γ(s)}M and for every s, t ∈ [−a, a]. For a geodesic curve ρ : [0, 1] → M, setting v = w = ρ̇(0), thanks to the self-translation property (2.17), it is immediate to verify that:

\frac{d}{dt}\langle\dot\rho(t), \dot\rho(t)\rangle_{\rho(t)} = 0, \tag{2.20}

for every t ∈ [0, 1]. (A regular re-parametrization does not modify the distinguishing properties of an affine geodesic curve.)

The 'total action' associated with a curve γ : [0, 1] → M is defined as:

\mathcal{A}(\gamma) \overset{\mathrm{def}}{=} \int_0^1 \langle\dot\gamma(t), \dot\gamma(t)\rangle_{\gamma(t)}\,dt. \tag{2.21}

On a geodesic arc ρ : [0, 1] → M, the integrand is constant, therefore the total action takes on the value A(ρ) = ⟨ρ̇(0), ρ̇(0)⟩_{ρ(0)}. On a Riemannian manifold, the length of a curve γ : [0, 1] → M is defined as:

\mathcal{L}(\gamma) \overset{\mathrm{def}}{=} \int_0^1 \langle\dot\gamma(t), \dot\gamma(t)\rangle_{\gamma(t)}^{1/2}\,dt, \tag{2.22}

hence, the length of a geodesic arc ρ on a Riemannian manifold is computed by L(ρ) = ⟨ρ̇(0), ρ̇(0)⟩^{1/2}_{ρ(0)} = A^{1/2}(ρ). On a Riemannian manifold, the total action is positive-definite and the distance between two points may be calculated as the length of a geodesic arc which joins them (if it exists). On a pseudo-Riemannian manifold, however, such a definition is no longer valid because the total action has indefinite sign.

2.2. An equivalence principle of pseudo-Riemannian geometry. The geodesic arc associated with a given metric on a pseudo-Riemannian manifold may be calculated through a variational stationary-action principle rather than via covariant derivation.

Lemma 2.2. The equation (2.18) is equivalent to the equation:

\delta\int_0^1 \langle\dot\gamma, \dot\gamma\rangle_\gamma\,dt = 0, \tag{2.23}

where the symbol δ denotes the variation of the integral functional.

Proof. Substituting u^k = v^k = γ̇^k in the equation (2.14) leads to the expression \nabla_{\dot\gamma}\dot\gamma = \dot\gamma^i\dot\gamma^j\Gamma^k_{ij}\partial_k + \ddot\gamma^k\partial_k. Setting the above covariant derivative to zero leads to the geodesic equations for the components γ^k:

\ddot\gamma^k + \Gamma^k_{ij}\,\dot\gamma^i\dot\gamma^j = 0. \tag{2.24}

The integral functional in (2.23) represents the total action associated with the parameterized smooth curve γ : [0, 1] → M of local coordinates x^k = γ^k(t) and may be written explicitly as:

\int_0^1 \langle\dot\gamma, \dot\gamma\rangle_\gamma\,dt = \int_0^1 g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t)\,dt. \tag{2.25}

The variation δ in the expression (2.23) corresponds to a perturbation of the total action (2.25). Let η : [0, 1] → M denote an arbitrary parameterized smooth curve of local coordinates η^k(t) such that η^k(0) = η^k(1) = 0. Define the perturbed action:

I_\eta(\varepsilon) \overset{\mathrm{def}}{=} \int_0^1 g_{ij}(\gamma + \varepsilon\eta)\,(\dot\gamma^i + \varepsilon\dot\eta^i)(\dot\gamma^j + \varepsilon\dot\eta^j)\,dt, \tag{2.26}

with ε ∈ [−a, a], a > 0. The condition (2.23) may be expressed explicitly in terms of the perturbed action as:

\lim_{\varepsilon\to 0}\frac{I_\eta(\varepsilon) - I_\eta(0)}{\varepsilon} = 0, \quad \forall\eta. \tag{2.27}

The perturbed total action may be expanded around ε = 0 as follows:

I_\eta(\varepsilon) = \int_0^1 \Big[g_{ij}(x) + \varepsilon\,\frac{\partial g_{ij}}{\partial x^k}\eta^k + o(\varepsilon)\Big]\Big[\dot\gamma^i\dot\gamma^j + \varepsilon(\dot\gamma^i\dot\eta^j + \dot\gamma^j\dot\eta^i) + o(\varepsilon)\Big]\,dt
= \int_0^1 g_{ij}(x)\,\dot\gamma^i\dot\gamma^j\,dt + \varepsilon\int_0^1\Big[g_{ij}(x)(\dot\gamma^i\dot\eta^j + \dot\gamma^j\dot\eta^i) + \eta^k\,\frac{\partial g_{ij}}{\partial x^k}\dot\gamma^i\dot\gamma^j\Big]\,dt + o(\varepsilon)
= I_\eta(0) + \varepsilon\int_0^1\Big[g_{ik}(x)\,\dot\gamma^i\dot\eta^k + g_{kj}(x)\,\dot\gamma^j\dot\eta^k + \frac{\partial g_{ij}}{\partial x^k}\dot\gamma^i\dot\gamma^j\,\eta^k\Big]\,dt + o(\varepsilon), \tag{2.28}


where o(ε) is such that lim_{ε→0} o(ε)/ε = 0. The first term within the integral on the right-hand side of the expression (2.28) may be integrated by parts, namely:

\int_0^1 g_{ik}(x)\,\dot\gamma^i\dot\eta^k\,dt = \Big.g_{ik}(\gamma)\,\dot\gamma^i\eta^k\Big|_0^1 - \int_0^1 \frac{d\big(g_{ik}(\gamma)\dot\gamma^i\big)}{dt}\,\eta^k\,dt. \tag{2.29}

The first term on the right-hand side vanishes because η^k(0) = η^k(1) = 0, hence:

\int_0^1 g_{ik}(x)\,\dot\gamma^i\dot\eta^k\,dt = -\int_0^1\left(\frac{\partial g_{ik}}{\partial x^j}\dot\gamma^i\dot\gamma^j + g_{ik}(x)\,\ddot\gamma^i\right)\eta^k\,dt. \tag{2.30}

A similar result holds true for the second term within the integral on the right-hand side of the expression (2.28). Therefore, it holds that:

\frac{I_\eta(\varepsilon) - I_\eta(0)}{\varepsilon} = \int_0^1\left[\left(\frac{\partial g_{ij}}{\partial x^k} - \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{kj}}{\partial x^i}\right)\dot\gamma^i\dot\gamma^j - 2\,g_{ik}(x)\,\ddot\gamma^i\right]\eta^k\,dt + \frac{o(\varepsilon)}{\varepsilon}. \tag{2.31}

The condition (2.27), therefore, implies that:

g_{ik}\,\ddot\gamma^i + \frac{1}{2}\left(-\frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{kj}}{\partial x^i}\right)\dot\gamma^i\dot\gamma^j = 0. \tag{2.32}

As the metric tensor is invertible, the equation (2.32) is equivalent to the equation (2.24).

The geodesic equation (2.24) may be written in compact form as:

\ddot\gamma(t) + \Gamma_{\gamma(t)}\big(\dot\gamma(t), \dot\gamma(t)\big) = 0. \tag{2.33}

In the above expressions, the over-dot and the double over-dot denote first-order and second-order derivation with respect to the parameter t, respectively. The solution of the geodesic equation may be written in terms of two parameters, such as the initial values γ(0) = x ∈ M and γ̇(0) = v ∈ T_xM, in which case the solution of the geodesic equation (2.33) will be denoted by ρ_{x,v}(t). A geodesic arc ρ_{x,v} : [0, 1] → M is associated with a map exp_x : T_xM → M defined as:

\exp_x(v) \overset{\mathrm{def}}{=} \rho_{x,v}(1). \tag{2.34}

The function (2.34) is termed exponential map with pole x ∈ M. The notion of exponential map is illustrated in Figure 2.1.

The equivalence stated in Lemma 2.2 proves useful when working in intrinsic coordinates. Intrinsic coordinates appear more appealing when it comes to implementing an optimization method on a computation platform (for a detailed discussion see, e.g., [11]). In order to make such equivalence profitable from a computational point of view, it pays to examine the variational method invoked in Lemma 2.2 on smooth manifolds from an intrinsic-coordinate perspective.

Lemma 2.3. The first variation of the integral functional

\mathcal{F}(\gamma) \overset{\mathrm{def}}{=} \int_0^1 F(\gamma, \dot\gamma)\,dt, \tag{2.35}

with γ : [0, 1] → M denoting a curve on a pseudo-Riemannian manifold M and F : TM → R integrable, reads:

\delta\mathcal{F}(\gamma) = \int_0^1 \left\langle \frac{\partial F}{\partial\gamma} - \frac{d}{dt}\frac{\partial F}{\partial\dot\gamma},\ \delta\gamma \right\rangle_E dt. \tag{2.36}

Fig. 2.1. Notion of exponential mapping on a manifold M. The point exp_x(v), image of the tangent vector v ∈ T_xM, lies in a neighborhood of the pole x denoted by the dashed circle.

Proof. Define a smooth family of smooth curves c : [0, 1] × [−a, a] → M, a > 0, such that:

c(t, 0) = \gamma(t), \quad \forall t \in [0, 1], \tag{2.37}
c(0, \varepsilon) = \gamma(0), \quad c(1, \varepsilon) = \gamma(1), \quad \forall\varepsilon \in [-a, a], \tag{2.38}

for a fixed ε, t ↦ c(t, ε) is a smooth curve on the manifold M; for a fixed t, ε ↦ c(t, ε) is a smooth curve on the manifold M; and ∂²c/∂t∂ε = ∂²c/∂ε∂t. Define:

\delta\gamma(t) \overset{\mathrm{def}}{=} \left.\frac{\partial}{\partial\varepsilon}\, c(t, \varepsilon)\right|_{\varepsilon=0}. \tag{2.39}

Note that δγ ∈ T_γM and δγ(0) = δγ(1) = 0. The variation δF(γ) may be written as:

\delta\int_0^1 F(\gamma, \dot\gamma)\,dt \overset{\mathrm{def}}{=} \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\left[\int_0^1 F\big(c(t,\varepsilon), \dot c(t,\varepsilon)\big)\,dt - \int_0^1 F\big(c(t,0), \dot c(t,0)\big)\,dt\right]
= \int_0^1 \lim_{\varepsilon\to 0}\frac{F\big(c(t,\varepsilon), \dot c(t,\varepsilon)\big) - F\big(c(t,0), \dot c(t,0)\big)}{\varepsilon}\,dt
= \int_0^1 \left.\frac{\partial}{\partial\varepsilon}\, F\big(c(t,\varepsilon), \dot c(t,\varepsilon)\big)\right|_{\varepsilon=0} dt. \tag{2.40}

The partial derivative within the integral in equation (2.40) may be written as:

\left.\frac{\partial F(c, \dot c)}{\partial\varepsilon}\right|_{\varepsilon=0} = \left.\left\langle\frac{\partial F(c, \dot c)}{\partial c},\ \frac{\partial c}{\partial\varepsilon}\right\rangle_E\right|_{\varepsilon=0} + \left.\left\langle\frac{\partial F(c, \dot c)}{\partial \dot c},\ \frac{\partial \dot c}{\partial\varepsilon}\right\rangle_E\right|_{\varepsilon=0}. \tag{2.41}

The variation of the integral functional reads, therefore:

\delta\int_0^1 F(\gamma, \dot\gamma)\,dt = \int_0^1\left\langle\frac{\partial F(\gamma, \dot\gamma)}{\partial\gamma},\ \delta\gamma\right\rangle_E dt + \int_0^1\left\langle\frac{\partial F(\gamma, \dot\gamma)}{\partial\dot\gamma},\ \frac{d\,\delta\gamma}{dt}\right\rangle_E dt. \tag{2.42}


Integration by parts yields:

\int_0^1\left\langle\frac{\partial F}{\partial\dot\gamma},\ \frac{d}{dt}\delta\gamma\right\rangle_E dt = \left.\left\langle\frac{\partial F}{\partial\dot\gamma},\ \delta\gamma\right\rangle_E\right|_0^1 - \int_0^1\left\langle\frac{d}{dt}\frac{\partial F}{\partial\dot\gamma},\ \delta\gamma\right\rangle_E dt. \tag{2.43}

The first term on the right-hand side equals zero, hence the variation of the integral functional assumes the final expression (2.36).

A noticeable consequence of Lemma 2.3 is that the variation δF(γ) depends only on the tangential component δγ ∈ T_γM of the perturbation.

It is worth noting the difference between Riemannian geodesics, which minimize the total action, and pseudo-Riemannian geodesics, which are only stationary points of the total action.

2.3. A pseudo-Riemannian geometric structure of the real symplectic group. The skew-symmetric matrix Q_{2n} defined in (1.1) enjoys the properties Q_{2n}² = −I_{2n} and Q_{2n}^{−1} = Q_{2n}^T = −Q_{2n}. From now on, the subscript 2n on the symbol Q_{2n} will be dropped for a tidier notation. The dimension subscript of the zero matrix 0_p will be dropped as well whenever its dimension is clear from the context.

The space Sp(2n,R) is a curved smooth manifold that may also be endowed with an algebraic-group structure (namely, group multiplication, group inverse and a unique identity element) in a manner that is compatible with the manifold structure. Therefore, the space Sp(2n,R) has the structure of a Lie group of dimension (2n+1)n. Standard matrix multiplication and inversion work as algebraic-group operations. (Any symplectic matrix X ∈ Sp(2n,R) is such that det²(X) = 1, where det(·) denotes the determinant; therefore, symplectic matrices are invertible.) The identity element of the group Sp(2n,R) is the matrix I_{2n}. The following identities hold: det(X) = 1, X^T = −QX^{−1}Q, X^{−T} = −QXQ. It follows that the group Sp(2n,R) is a subgroup of Sl(2n,R) as well as of Gl(2n,R).
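The identity X^T = −QX^{−1}Q is computationally convenient: rearranging it gives X^{−1} = −QX^T Q, i.e., the inverse of a symplectic matrix is available without numerical matrix inversion. A quick check (Python with NumPy/SciPy; the test matrix is generated as the exponential of QS with S symmetric, a Hamiltonian matrix in the sense recalled just below):

```python
import numpy as np
from scipy.linalg import expm

n = 2
I, Z = np.eye(n), np.zeros((n, n))
Q = np.block([[Z, I], [-I, Z]])

rng = np.random.default_rng(3)
A = rng.standard_normal((2 * n, 2 * n))
X = expm(Q @ (A + A.T))          # a symplectic matrix

assert np.isclose(np.linalg.det(X), 1.0)             # det(X) = 1
assert np.allclose(np.linalg.inv(X), -Q @ X.T @ Q)   # X^{-1} = -Q X^T Q
```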

The tangent space T_XSp(2n,R) has the structure:

T_X Sp(2n,\mathbb{R}) = \{V \in \mathbb{R}^{2n\times 2n} \mid V^T Q X + X^T Q V = 0\}. \tag{2.44}

The tangent space at the identity of the Lie group, namely the Lie algebra sp(2n,R), has the structure:

sp(2n,\mathbb{R}) = \{H \in \mathbb{R}^{2n\times 2n} \mid H^T Q + Q H = 0\}, \tag{2.45}

and the elements of the Lie algebra sp(2n,R) are termed Hamiltonian matrices. By the embedding of the manifold Sp(2n,R) into the Euclidean space R^{2n×2n}, to any point X ∈ Sp(2n,R) is associated the normal space:

N_X Sp(2n,\mathbb{R}) \overset{\mathrm{def}}{=} \{P \in \mathbb{R}^{2n\times 2n} \mid \mathrm{tr}(P^T V) = 0,\ \forall V \in T_X Sp(2n,\mathbb{R})\}, \tag{2.46}

where tr(·) denotes the trace operator. The tangent space and the Lie algebra associated with the real symplectic group may be characterized as follows:

T_X Sp(2n,\mathbb{R}) = \{XQS \mid S \in \mathbb{R}^{2n\times 2n},\ S^T = S\},
sp(2n,\mathbb{R}) = \{QS \mid S \in \mathbb{R}^{2n\times 2n},\ S^T = S\},
N_X Sp(2n,\mathbb{R}) = \{QX\Omega \mid \Omega \in so(2n)\}, \tag{2.47}

where so(p) denotes the Lie algebra:

so(p) \overset{\mathrm{def}}{=} \{\Omega \in \mathbb{R}^{p\times p} \mid \Omega^T = -\Omega\}. \tag{2.48}
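The characterization (2.47) of the tangent space can be verified directly: for V = XQS with S symmetric, the defining condition (2.44) holds identically. A numerical sketch (Python with NumPy/SciPy; random X and S for illustration):

```python
import numpy as np
from scipy.linalg import expm

n = 2
I, Z = np.eye(n), np.zeros((n, n))
Q = np.block([[Z, I], [-I, Z]])

rng = np.random.default_rng(4)
A = rng.standard_normal((2 * n, 2 * n))
X = expm(Q @ (A + A.T))              # a point on Sp(2n,R)
B = rng.standard_normal((2 * n, 2 * n))
S = B + B.T                          # arbitrary symmetric S
V = X @ Q @ S                        # candidate tangent vector at X

# Tangency condition (2.44): V^T Q X + X^T Q V = 0.
assert np.allclose(V.T @ Q @ X + X.T @ Q @ V, 0)
```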


Consider the following indefinite inner product on the general linear group of matrices Gl(p,R):

\langle U, V\rangle_X \overset{\mathrm{def}}{=} \mathrm{tr}(X^{-1}U X^{-1}V), \quad X \in Gl(p,\mathbb{R}),\ U, V \in T_X Gl(p,\mathbb{R}), \tag{2.49}

referred to as the Khvedelidze-Mladenov metric [26]. The above inner product gives rise to a metric which is not positive-definite on the space Sp(2n,R) ⊂ Gl(2n,R). To verify such a property, it is sufficient to evaluate the structure of the squared norm ‖V‖²_X = tr((X^{−1}V)²) with X ∈ Sp(2n,R) and V ∈ T_XSp(2n,R). By the structure of the tangent space T_XSp(2n,R), it is known that X^{−1}V = QS with S ∈ R^{2n×2n} symmetric. It holds that:

QS = \begin{bmatrix} 0_n & I_n \\ -I_n & 0_n \end{bmatrix}\begin{bmatrix} S_1 & S_2 \\ S_2^T & S_3 \end{bmatrix} = \begin{bmatrix} S_2^T & S_3 \\ -S_1 & -S_2 \end{bmatrix},

with S_1, S_3 ∈ R^{n×n} symmetric and S_2 ∈ R^{n×n} arbitrary. Hence, tr((QS)²) = 2tr(S_2²) − 2tr(S_1 S_3), which has indefinite sign.

Under the pseudo-Riemannian metric (2.49), it is indeed possible to solve the geodesic equation in closed form. The solution makes use of the exponential of a matrix X ∈ R^{p×p}, which is defined by the series:

\exp(X) \overset{\mathrm{def}}{=} I_p + \sum_{r=1}^{\infty}\frac{X^r}{r!}. \tag{2.50}
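The indefiniteness of the metric (2.49) on Sp(2n,R) is easy to reproduce numerically from the block computation above: choosing S_1 = S_3 = 0 and S_2 = I gives tr((QS)²) = +2n, while S_1 = S_3 = I and S_2 = 0 gives −2n. A sketch (Python/NumPy):

```python
import numpy as np

n = 2
I, Z = np.eye(n), np.zeros((n, n))
Q = np.block([[Z, I], [-I, Z]])

def km_sq_norm(S):
    """||V||_X^2 = tr((X^{-1}V)^2) = tr((QS)^2) for V = XQS, S symmetric."""
    return np.trace((Q @ S) @ (Q @ S))

S_plus = np.block([[Z, I], [I, Z]])   # S1 = S3 = 0, S2 = I  ->  +2n
S_minus = np.eye(2 * n)               # S1 = S3 = I, S2 = 0  ->  -2n
assert km_sq_norm(S_plus) > 0
assert km_sq_norm(S_minus) < 0
```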

Theorem 2.4. The geodesic curve ρ_{X,V} : [0, 1] → Sp(2n,R), with X ∈ Sp(2n,R) and V ∈ T_XSp(2n,R), corresponding to the indefinite Khvedelidze-Mladenov metric (2.49) has the expression:

\rho_{X,V}(t) = X\exp(tX^{-1}V). \tag{2.51}

Proof. By the equivalence principle stated in Lemma 2.2, the geodesic equation in variational form writes:

\delta\int_0^1 \mathrm{tr}(\gamma^{-1}\dot\gamma\,\gamma^{-1}\dot\gamma)\,dt = 0, \tag{2.52}

where the natural parametrization is assumed. The above variation is computed on the basis of the variational method on manifolds recalled in Lemma 2.3 and is facilitated by the following rules of the calculus of variations:

\delta(X^{-1}) = -X^{-1}(\delta X)X^{-1}, \tag{2.53}
\delta\left(\frac{dX}{dt}\right) = \frac{d}{dt}(\delta X), \tag{2.54}
\delta(XZ) = (\delta X)Z + X(\delta Z), \tag{2.55}

for curves X, Z : [0, 1] → Sp(2n,R). By computing the variations, integrating by parts and recalling that the variation vanishes at the endpoints, it is found that the geodesic equation in variational form reads:

\int_0^1 \mathrm{tr}\big(\delta\gamma\,(\gamma^{-1}\ddot\gamma\,\gamma^{-1} - \gamma^{-1}\dot\gamma\,\gamma^{-1}\dot\gamma\,\gamma^{-1})\big)\,dt = 0. \tag{2.56}


The variation δγ ∈ T_γSp(2n,R) is arbitrary. By the structure of the normal space N_XSp(2n,R), the equation tr(P^T δγ) = 0, with δγ ∈ T_γSp(2n,R), implies that P^T = ΩQγ^{−1} with Ω ∈ so(2n). Therefore, the equation (2.56) is satisfied if and only if:

\gamma^{-1}\ddot\gamma\,\gamma^{-1} - \gamma^{-1}\dot\gamma\,\gamma^{-1}\dot\gamma\,\gamma^{-1} = \Omega Q\gamma^{-1}, \quad \Omega \in so(2n), \tag{2.57}

or, equivalently,

\ddot\gamma - \dot\gamma\,\gamma^{-1}\dot\gamma = \gamma\Omega Q, \tag{2.58}

for some Ω ∈ so(2n). In order to determine the value of the matrix Ω, note that:

\gamma^T Q\gamma - Q = 0 \ \Rightarrow\ \ddot\gamma^T Q\gamma + 2\dot\gamma^T Q\dot\gamma + \gamma^T Q\ddot\gamma = 0.

Substituting the expression γ̈ = γ̇γ^{−1}γ̇ + γΩQ into the above equation yields the condition QΩQ = 0. Hence Ω = 0 and the geodesic equation in Christoffel form reads:

\ddot\gamma - \dot\gamma\,\gamma^{-1}\dot\gamma = 0. \tag{2.59}

Its solution, with initial conditions γ(0) = X ∈ Sp(2n,R) and γ̇(0) = V ∈ T_XSp(2n,R), is found to be of the form (2.51).
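The closed form (2.51) can be validated numerically: the curve remains on Sp(2n,R) for all t and, consistently with (2.20), the quantity ⟨ρ̇, ρ̇⟩_ρ = tr((ρ^{−1}ρ̇)²) stays constant along it. A sketch (Python with NumPy/SciPy; the base point and tangent vector are randomly generated and scaled for illustration):

```python
import numpy as np
from scipy.linalg import expm

n = 2
I, Z = np.eye(n), np.zeros((n, n))
Q = np.block([[Z, I], [-I, Z]])

rng = np.random.default_rng(5)
A = rng.standard_normal((2 * n, 2 * n))
X = expm(Q @ (A + A.T) * 0.1)        # base point on Sp(2n,R)
B = rng.standard_normal((2 * n, 2 * n))
V = X @ Q @ (B + B.T) * 0.1          # tangent vector at X

H = np.linalg.solve(X, V)            # H = X^{-1} V
rho = lambda t: X @ expm(t * H)      # geodesic (2.51)
drho = lambda t: V @ expm(t * H)     # its velocity

for t in (0.0, 0.5, 1.0):
    R = rho(t)
    # The geodesic stays on the manifold ...
    assert np.allclose(R.T @ Q @ R, Q)
    # ... and tr((rho^{-1} rho')^2) is constant along it (cf. (2.20)).
    W = np.linalg.solve(R, drho(t))
    assert np.isclose(np.trace(W @ W), np.trace(H @ H))
```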

By the definition of the matrix exponential, it follows that ρ̇_{X,V}(t) = V exp(tX^{−1}V). Lemma 2.1 applies to the map exp : sp(2n,R) → Sp(2n,R) around 0 ∈ sp(2n,R) and implies, in particular, that d exp|_0(H) = H for every H ∈ sp(2n,R).

The structure of the pseudo-Riemannian gradient associated with the Khvedelidze-Mladenov metric (2.49), applied to the case of the real symplectic group Sp(2n,R), is given by the following result.

Theorem 2.5. The pseudo-Riemannian gradient of a sufficiently regular functionf : Sp(2n,R) → R associated to the Khvedelidze-Mladenov metric (2.49) reads:

∇_Xf = (1/2) XQ (X^T ∂_Xf Q − Q ∂_Xf^T X).    (2.60)

Proof. The pseudo-Riemannian gradient of a regular function f : Sp(2n,R) → R associated to the metric (2.49) is computed as the solution of the following system of equations:

tr(X^{-1}∇_Xf X^{-1}V) = tr(∂_Xf^T V), ∀V ∈ T_XSp(2n,R),
∇_Xf^T QX + X^TQ∇_Xf = 0.    (2.61)

The first equation expresses the compatibility of the pseudo-Riemannian gradient with the metric, while the second equation expresses the requirement that the pseudo-Riemannian gradient be a tangent vector. The metric compatibility condition may be rewritten as:

tr(V^T(∂_Xf − X^{-T}∇_Xf^T X^{-T})) = 0,    (2.62)

where the superscript −T denotes the inverse of the transposed matrix. The above condition implies that ∂_Xf − X^{-T}∇_Xf^T X^{-T} ∈ N_XSp(2n,R), hence that ∂_Xf − X^{-T}∇_Xf^T X^{-T} = QXΩ with Ω ∈ so(2n). Therefore, the pseudo-Riemannian gradient of the criterion function f has the expression:

∇_Xf = X∂_Xf^TX − XΩQ.    (2.63)


In order to determine the value of the matrix Ω, it is sufficient to substitute the expression (2.63) of the gradient within the tangency condition, which becomes:

(X∂_Xf^TX − XΩQ)^T QX + X^TQ (X∂_Xf^TX − XΩQ) = 0.    (2.64)

Solving for Ω gives:

Ω = −(1/2)(QX^T∂_Xf + ∂_Xf^TXQ).    (2.65)

Substituting the above expression of the variable Ω into the expression (2.63) of the pseudo-Riemannian gradient gives the result (2.60).
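A numerical verification of the Theorem 2.5 (our sketch; all names are illustrative) consists in checking that the formula (2.60) satisfies both equations of the system (2.61):

```python
import numpy as np
from scipy.linalg import expm

# Numerical verification (ours; names are illustrative) of Theorem 2.5: the
# matrix (2.60) solves both equations of the system (2.61). The test function
# is f(X) = (1/2)||X - C||_F^2, whose Euclidean gradient is dfdX = X - C.
rng = np.random.default_rng(1)
n = 2
Q = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])
S = rng.normal(size=(2 * n, 2 * n), scale=0.5)
X = expm(Q @ (S + S.T) / 2)               # a point of Sp(2n,R)
C = rng.normal(size=(2 * n, 2 * n))       # an arbitrary fixed matrix
dfdX = X - C                              # Euclidean gradient of f

grad = 0.5 * X @ Q @ (X.T @ dfdX @ Q - Q @ dfdX.T @ X)    # formula (2.60)

# tangency (second equation of (2.61)): grad^T Q X + X^T Q grad = 0
assert np.allclose(grad.T @ Q @ X + X.T @ Q @ grad, 0, atol=1e-9)

# metric compatibility (first equation of (2.61)) on a random tangent V = X H
T = rng.normal(size=(2 * n, 2 * n))
V = X @ (Q @ (T + T.T) / 2)
Xi = np.linalg.inv(X)
assert np.isclose(np.trace(Xi @ grad @ Xi @ V), np.trace(dfdX.T @ V),
                  atol=1e-9)
print("formula (2.60) passes both checks of (2.61)")
```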

2.4. A Riemannian structure of the real symplectic group. For completeness of exposition, it is instructive to recall an advanced result about a Riemannian structure of the real symplectic group of matrices.

Theorem 2.6 ([7]). Let σ : sp(2n,R) → sp(2n,R) be a symmetric positive-definite operator with respect to the Euclidean inner product on the space sp(2n,R). The minimizing curve of the integral

∫_0^1 ⟨H(t), σ(H(t))⟩_E dt

over all curves γ : [0, 1] → Sp(2n,R) with fixed endpoints γ(0), γ(1), having defined H(t) := γ^{-1}(t)γ̇(t), is the solution of the system:

γ̇(t) = γ(t)H(t),
Ṁ(t) = H^T(t)M(t) − M(t)H^T(t),
H(t) = σ^{-1}(M(t)),    (2.66)

where the symbol σ^{-1} denotes the inverse of the operator σ. The simplest choice for the symmetric positive-definite operator σ is the identity map of the space sp(2n,R), which corresponds to a Riemannian metric for the real symplectic group Sp(2n,R) given by the inner product:

⟨U, V⟩_X = tr((X^{-1}U)^T(X^{-1}V)), X ∈ Sp(2n,R), U, V ∈ T_XSp(2n,R).    (2.67)

Such a metric leads to the 'natural gradient' on the space of real invertible matrices Gl(p,R) studied in [1]. The above choice for the operator σ implies that M = H and that the corresponding curve on the real symplectic group satisfies the equations:

Ḣ(t) = H^T(t)H(t) − H(t)H^T(t),
H(t) = γ^{-1}(t)γ̇(t),    (2.68)

or, in Christoffel form:

γ̈ − γ̇γ^{-1}γ̇ + γγ̇^TQγQγ^{-1}γ̇ − γ̇γ̇^TQγQ = 0.    (2.69)

The above equations describe geodesic arcs on the real symplectic group corresponding to the metric (2.67). Closed-form solutions of the above equations are unknown to the authors of [7] and to the present author.

The structure of the Riemannian gradient is given by the following result.


Theorem 2.7. The Riemannian gradient ∇_Xf of a sufficiently regular function f : Sp(2n,R) → R corresponding to the metric (2.67) reads:

∇_Xf = (1/2) XQ (∂_Xf^TXQ − QX^T∂_Xf).    (2.70)

Proof. The Riemannian gradient ∇_Xf of a regular function f : Sp(2n,R) → R must satisfy the condition:

∇_Xf = XQ(Ω − X^{-1}Q∂_Xf), with Ω = (1/2)(X^{-1}Q∂_Xf + ∂_Xf^TXQ),    (2.71)

from which the expression of the Riemannian gradient associated to the metric (2.67) follows.

Clearly, the expressions of the gradients (2.60) and (2.70) differ from each other because the metrics that they correspond to are different.

3. Minimal-distance optimization on pseudo-Riemannian manifolds. Several numerical applications require the solution of a least-squares problem on a pseudo-Riemannian manifold. As an example, given a 'cloud' of points x_k ∈ M on a manifold M endowed with a distance function d(·,·), its empirical mean µ ∈ M is defined as:

µ := argmin_{x∈M} Σ_k d²(x, x_k).    (3.1)

For a recent review, see, e.g., [19]. The empirical mean value µ of a distribution of points on a manifold is instrumental in several applications. The mean value µ is, by definition, close to all points in the distribution. Therefore, the tangent space T_µM associated to a cloud of data-points may serve as a reference tangent space in the development of algorithms to process those data (see, for example, the discussion in [37] about 'principal geodesic analysis' of data). The notion of averaging over a curved manifold is depicted in the Figure 3.1. In addition, given a cloud of N points x_k ∈ M on a manifold M endowed with a distance function d(·,·), its empirical mth-order (centered) moment is defined as:

µ_m := (1/N) Σ_k d^m(µ, x_k), m ≥ 2,    (3.2)

where µ ∈ M denotes the empirical mean of the cloud. In particular, the moment µ_2 denotes the empirical variance of the manifold-valued samples, which measures the width of the cloud around its center. The definition of the empirical moments µ_m is paired with the definition of the empirical mean, as in applied statistics on manifolds they jointly provide a characterization of the distribution of a data-cloud.
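As an elementary illustration of the definitions (3.1)-(3.2) (ours, not from the paper), the empirical mean and variance can be computed by brute force on the simplest curved manifold, the unit circle:

```python
import numpy as np

# Illustrative sketch (ours, not from the paper) of the definitions
# (3.1)-(3.2) on the unit circle, where d is the arc-length distance;
# the empirical mean is located by brute-force search over a fine grid.
angles = np.array([0.1, 0.3, -0.2, 0.25])            # the sample cloud x_k

def arc_dist(a, b):
    # geodesic (arc-length) distance between points of the unit circle
    return np.abs(np.angle(np.exp(1j * (a - b))))

grid = np.linspace(-np.pi, np.pi, 200001)
cost = (arc_dist(grid[:, None], angles[None, :]) ** 2).sum(axis=1)
mu = grid[np.argmin(cost)]                           # empirical mean, eq. (3.1)
mu2 = np.mean(arc_dist(mu, angles) ** 2)             # empirical variance, (3.2)

# for a cloud confined to a small arc, the mean reduces to the arithmetic
# mean of the angles
assert np.isclose(mu, np.mean(angles), atol=1e-4)
print(f"empirical mean = {mu:.4f}, empirical variance = {mu2:.4f}")
```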

3.1. Optimization on manifolds. Optimization on manifolds is a modern formulation of some classical constrained optimization problems. In particular, given a differentiable manifold M and a sufficiently regular function f : M → R, minimization on a manifold is expressed as:

min_{x∈M} f(x).    (3.3)

The main tool for building algorithms to solve the optimization problem (3.3) has been Riemannian geometry. Two seminal contributions in this area are the geodesic-stepping


Fig. 3.1. Notion of averaging over a manifold M. Dots represent the samples x_k ∈ M to average while the open circle represents the empirical mean µ ∈ M. The dashed lines represent the distances among the sample-points and the midpoint.

Riemannian-gradient-steepest-descent method of Luenberger [28] and its extension to Riemannian-Newtonian and quasi-Newtonian methods by Gabay [21], which have been inspiring further research in the area of optimization on curved spaces since their inception.

In particular, the Riemannian-gradient-steepest-descent numerical optimization method is based on the knowledge of the explicit form of the geodesic arc emanating from a given point along a given direction and of the Riemannian gradient of the (sufficiently regular) criterion function to optimize. The geodesic-based Riemannian-gradient-steepest-descent optimization method may be expressed as:

x^{(ℓ+1)} = exp_{x^{(ℓ)}}(−t^{(ℓ)}∇_{x^{(ℓ)}}f), with    (3.4)

t^{(ℓ)} = argmin_{t>0} f(exp_{x^{(ℓ)}}(−t∇_{x^{(ℓ)}}f)),    (3.5)

with ℓ ≥ 0 and with the initial guess x^{(0)} ∈ M being arbitrarily chosen. The rule (3.5) to compute a stepsize schedule is well defined for any integer ℓ such that x^{(ℓ)} is not a critical point. In fact, it holds:

(d/dt) f(exp_{x^{(ℓ)}}(−t∇_{x^{(ℓ)}}f)) = ⟨∇_{exp_{x^{(ℓ)}}(−t∇_{x^{(ℓ)}}f)}f, d exp_{x^{(ℓ)}}|_{−t∇_{x^{(ℓ)}}f}(−∇_{x^{(ℓ)}}f)⟩_{exp_{x^{(ℓ)}}(−t∇_{x^{(ℓ)}}f)},    (3.6)

hence, it holds that:

(d/dt) f(exp_{x^{(ℓ)}}(−t∇_{x^{(ℓ)}}f))|_{t=0} = ⟨∇_{x^{(ℓ)}}f, −∇_{x^{(ℓ)}}f⟩_{x^{(ℓ)}} = −‖∇_{x^{(ℓ)}}f‖²_{x^{(ℓ)}} < 0,    (3.7)

provided x^{(ℓ)} is not a critical point of the function f. The convergence of the geodesic-based Riemannian-gradient-steepest-descent optimization method (3.4), endowed with the stepsize selection rule (3.5), is ensured by the following result.

Theorem 3.1 ([21]). Assume that the function f is continuously differentiable, that its critical values are distinct and that the connected subset of the level-set {x ∈ M | f(x) < f(x^{(0)})} containing x^{(0)} is compact. Then the sequence x^{(ℓ)} constructed by the method (3.4) and (3.5) converges to a critical point of the function f.

Moreover, under appropriate conditions (see, e.g., Theorem 4.4 in [21]), the gradient-based optimization method converges linearly.
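The scheme (3.4)-(3.5) is easy to exercise on a manifold where every needed object is available in closed form; the sketch below (ours, with an illustrative cost function and a coarse sampling of the stepsize in place of the exact line search) runs it on the unit sphere:

```python
import numpy as np

# Sketch (ours) of the geodesic-stepping scheme (3.4)-(3.5) on the unit
# sphere in R^3 with the illustrative cost f(x) = -a.x, minimized at x = a.
# The exact line search (3.5) is replaced by coarse sampling of t.
a = np.array([0.0, 0.0, 1.0])
f = lambda x: -a @ x

def exp_map(x, v):
    # Riemannian exponential map of the unit sphere
    nv = np.linalg.norm(v)
    return x if nv < 1e-15 else np.cos(nv) * x + np.sin(nv) * v / nv

def rgrad(x):
    # Riemannian gradient = tangential projection of the Euclidean gradient
    return -a + (a @ x) * x

x = np.array([1.0, 0.0, 0.0])            # initial guess on the sphere
ts = np.linspace(0.0, 1.0, 201)          # crude surrogate of rule (3.5)
for _ in range(50):
    g = rgrad(x)
    t = min(ts, key=lambda s: f(exp_map(x, -s * g)))
    x = exp_map(x, -t * g)

assert np.isclose(np.linalg.norm(x), 1.0)   # iterates never leave the sphere
assert np.allclose(x, a, atol=1e-3)         # the descent reached the minimizer
print("converged to", np.round(x, 4))
```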

The application of such an optimization method to the computation of the intrinsic mean of a set of points on a manifold is based on the knowledge of the explicit form of the geodesic distance between two given points on a manifold. The short discussion presented in §2.4 shows that the Riemannian setting is not always suitable to such a purpose, as some quantities needed to set up an optimization procedure may be unavailable.

For such a reason, it is of interest to devise optimization algorithms that abstract from the Riemannian context. A notable contribution in this regard is given by the work [34], which suggests how to tackle optimization on affine manifolds. The ideas conveyed by the paper [34] are summarized in the following.

Let M be a p-dimensional differentiable manifold and ∇ a connection on M. According to what has been recalled in §2.1, a connection is completely specified by Christoffel symbols, which, in turn, completely specify parallel translation. Parallel translation defines geodesic arcs which, in turn, define exponential maps. Denote by exp^∇_x : T_xM → M the exponential map determined by the connection ∇ that carries vectors from the tangent space T_xM to ∇-self-parallel curves through the pole x ∈ M. The descent method on an affine manifold proposed in [34] to numerically solve the optimization problem (3.3) is summarized in the Algorithm 1. The method applies

Algorithm 1 Descent method on an affine manifold proposed in [34] to numerically solve the optimization problem (3.3). The method is expressed in a completely coordinate-free fashion.

Set x^{(0)} to an initial guess in M
Set ℓ = 0
repeat
  Choose a tangent vector v^{(ℓ)} ∈ T_{x^{(ℓ)}}M such that df_{x^{(ℓ)}}(v^{(ℓ)}) < 0
  Choose a connection ∇^{(ℓ)}
  Determine a value t^{(ℓ)} ∈ R such that f(exp^{∇^{(ℓ)}}_{x^{(ℓ)}}(t^{(ℓ)}v^{(ℓ)})) < f(x^{(ℓ)})
  Set x^{(ℓ+1)} = exp^{∇^{(ℓ)}}_{x^{(ℓ)}}(t^{(ℓ)}v^{(ℓ)})
  Set ℓ = ℓ + 1
until x^{(ℓ)} is close enough to a critical point of f

to the case that the initial guess x^{(0)} is not a critical point of the criterion function f already (a preventive check of this situation was omitted from the Algorithm 1). At any optimization step, a different connection ∇^{(ℓ)} may be chosen (not necessarily a Levi-Civita connection), implying that no metrics are selected a priori and that the method depends on local affine structures. The paper [34] does not comment on how to choose the move-along direction v^{(ℓ)} nor a stepsize schedule t^{(ℓ)} but suggests a stopping criterion to halt the iteration. Let Γ^∇ denote the parallel translation operation defined by the connection ∇ and t ↦ exp^∇_x(tv) the curve associated to the exponential map exp^∇_x emanating from the point x along the direction v ∈ T_xM of an extent t. At x^{(0)}, select a basis e^{(0)}_1, ..., e^{(0)}_p of the tangent space T_{x^{(0)}}M and a precision constant ε > 0. A basis e^{(ℓ)}_1, ..., e^{(ℓ)}_p of the tangent space T_{x^{(ℓ)}}M may be parallel translated along the curve γ_{x^{(ℓ)},v^{(ℓ)}}(t) := exp^{∇^{(ℓ)}}_{x^{(ℓ)}}(tv^{(ℓ)}) to give a basis e^{(ℓ+1)}_1, ..., e^{(ℓ+1)}_p of the tangent space T_{x^{(ℓ+1)}}M, namely, by

e^{(ℓ+1)}_i = Γ^{∇^{(ℓ)}}(γ_{x^{(ℓ)},v^{(ℓ)}})_0^{t^{(ℓ)}} e^{(ℓ)}_i.    (3.8)

The proposed stopping criterion is to halt the iteration at step ℓ if it holds that:

|df_{x^{(ℓ)}}(e^{(ℓ)}_i)| < ε for every i ∈ {1, ..., p}.    (3.9)

The analysis of convergence of the method in the Algorithm 1 with the stopping criterion (3.9) is summarized as follows.

Theorem 3.2 ([34]). Let M be a p-dimensional differentiable manifold, f be a sufficiently regular function and x^{(ℓ)} be the sequence generated by the method of the Algorithm 1. Suppose that the function f is bounded from below and that the connected component of the level set {x ∈ M | f(x) < f(x^{(0)})} containing x^{(0)} is compact. Denote by γ the piecewise differentiable curve constructed by joining together all the ∇^{(ℓ)}-self-parallel curves properly re-parameterized. Let (E_1, ..., E_p) denote a piecewise continuous frame along the curve γ, where each vector field E_j ∈ X(M) is obtained by the prolongation of the tangent vectors e^{(ℓ)}_j along the curve γ in the intervals (t^{(ℓ)}, t^{(ℓ+1)}).
(i) If there exists a frame (E_1, ..., E_p) along the curve γ such that for every ε > 0 the method halts according to the stopping criterion (3.9), then there exists a subsequence of x^{(ℓ)} converging to a critical point of f.
(ii) If the function f is convex along γ, then the sequence x^{(ℓ)} converges to a unique minimum point of the function f.

It is to be underlined that such a method relies on parallel translation to propagate a basis of the tangent spaces, which is not always available or efficiently computable for manifolds of interest.

3.2. Gradient-based optimization on pseudo-Riemannian manifolds. Let M denote a p-dimensional pseudo-Riemannian manifold. A key step in pseudo-Riemannian geometry is to decompose each tangent space T_xM as follows:

T^+_xM := {v ∈ T_xM | ⟨v, v⟩_x > 0},
T^0_xM := {v ∈ T_xM | ⟨v, v⟩_x = 0},
T^−_xM := {v ∈ T_xM | ⟨v, v⟩_x < 0}.    (3.10)

The above three components may be characterized by defining Q^α_x := {v ∈ T_xM | ⟨v, v⟩_x = α}. If the set Q^α_x with α ≠ 0 is nonempty, it is a submanifold of the space T_xM of dimension p − 1. The set Q^0_x = T^0_xM is nonempty, as 0 ∈ Q^0_x, and is a set of null measure in T_xM.

Example 1. The space Sp(2,R) is a 3-dimensional manifold (in fact, Sp(2,R) = Sl(2,R), the special linear group), therefore the following parametrization may be taken advantage of:

(x11, x12, x21) ↔ [ x11  x12 ; x21  (1 + x12x21)/x11 ] ∈ Sp_I(2,R), for x11 ≠ 0,

(0, x12, x22) ↔ [ 0  x12 ; −1/x12  x22 ] ∈ Sp_II(2,R), for x12 ≠ 0,    (3.11)


where (Sp_I(2,R), Sp_II(2,R)) is a partition of the space Sp(2,R). For X ∈ Sp_II(2,R), it holds that:

X^{-1} = [ 0  x12 ; −1/x12  x22 ]^{-1} = [ x22  −x12 ; 1/x12  0 ],    (3.12)

T_XSp_II(2,R) ∋ V = [ v11  v12 ; v11x22/x12 + v12/x12²  v22 ].    (3.13)

Hence, straightforward calculations lead to:

⟨V, V⟩_X = 2 (v12²/x12² + v11x22v12/x12 − v11v22).    (3.14)

As a consequence, it holds that:

T^0_XSp_II(2,R) = { [ v11  v12 ; v11x22/x12 + v12/x12²  v22 ] | v12² + v11v12x22x12 − v11v22x12² = 0 }.    (3.15)

When a pseudo-Riemannian manifold of interest M and a regular criterion function f : M → R are specified, an optimization rule is 'gradient steepest descent', expressed by [20]:

ẋ = −∇_xf if ∇_xf ∈ T^+_xM ∪ T^0_xM,
ẋ = ∇_xf if ∇_xf ∈ T^−_xM,    (3.16)

whose solution induces the dynamics:

ḟ = −‖∇_xf‖²_x if ∇_xf ∈ T^+_xM ∪ T^0_xM,
ḟ = ‖∇_xf‖²_x if ∇_xf ∈ T^−_xM,    (3.17)

which is non-positive in both cases.

The differential equation (3.16) on the manifold M may be solved numerically by a geodesic-based stepping method. It generalizes the forward Euler method, which moves a point along a straight line in the direction of the Euclidean gradient. In the Euler method, the extension of each step is controlled by a parameter termed stepsize. In the same manner, a geodesic-based stepping method moves a point along a short arc in the direction of the pseudo-Riemannian gradient.

The optimization method based on geodesic stepping reads:

x^{(ℓ+1)} = exp_{x^{(ℓ)}}(−t^{(ℓ)}∇_{x^{(ℓ)}}f) if ∇_{x^{(ℓ)}}f ∈ T^+_{x^{(ℓ)}}M ∪ T^0_{x^{(ℓ)}}M,
x^{(ℓ+1)} = exp_{x^{(ℓ)}}(t^{(ℓ)}∇_{x^{(ℓ)}}f) if ∇_{x^{(ℓ)}}f ∈ T^−_{x^{(ℓ)}}M,    (3.18)

where ℓ ≥ 0 denotes the iteration step-counter, the symbol exp_x(v) denotes an exponential map on M at a point x ∈ M applied to the tangent vector v ∈ T_xM, while the succession t^{(ℓ)} > 0 denotes an optimization stepsize schedule and the succession x^{(ℓ)} ∈ M denotes an approximation of the actual solution of the differential equation (3.16) at time Σ_{i=0}^{ℓ-1} t^{(i)}. The initial guess to start the iteration is denoted by x^{(0)} ∈ M.

The stepsize t^{(ℓ)} at any step may be defined by the counterpart of the 'line search' method of the geodesic-based gradient-steepest-descent optimization on Riemannian manifolds (3.5) adapted to the present geometrical setting. Such a method may be expressed as:

t^{(ℓ)} := argmin_{t>0} f(exp_{x^{(ℓ)}}(−t∇_{x^{(ℓ)}}f)) if ∇_{x^{(ℓ)}}f ∈ T^+_{x^{(ℓ)}}M,
t^{(ℓ)} := 1 if ∇_{x^{(ℓ)}}f ∈ T^0_{x^{(ℓ)}}M,
t^{(ℓ)} := argmin_{t>0} f(exp_{x^{(ℓ)}}(t∇_{x^{(ℓ)}}f)) if ∇_{x^{(ℓ)}}f ∈ T^−_{x^{(ℓ)}}M,    (3.19)

provided x^{(ℓ)} is not a critical point. The method (3.19) involves a nonlinear optimization sub-problem in the parameter t that does not admit any closed-form solution, in general. Moreover, the definition (3.19) is given in a way that takes into account that the function f(exp_{x^{(ℓ)}}(∓t∇_{x^{(ℓ)}}f)) is locally decreasing for ∇_{x^{(ℓ)}}f ∈ T^±_{x^{(ℓ)}}M while it is locally flat for ∇_{x^{(ℓ)}}f ∈ T^0_{x^{(ℓ)}}M. A consequence of such a phenomenon is that the line-search method (3.5) cannot be extended directly, otherwise the resulting stepsize would not be well-defined. The obstruction just encountered is due to the presence of the tangent-space part T^0_xM, as moving along a direction v ∈ T^0_xM might increase the value of the criterion function for any value of the stepsize t, unless t = 0. The case t = 0 would clearly imply that the optimization method has reached an impasse point, namely, the method cannot proceed any further in the optimization of the criterion function f.

In the hypothesis that over an optimization trajectory it holds that ∇_xf ∉ T^0_xM, a stepsize schedule t^{(ℓ)} that approximates the result of the line-search method (3.19) may be chosen on the basis of the following argument. Define the function:

f_{(ℓ)} := f ∘ ρ_{x^{(ℓ)},v^{(ℓ)}} : [0, 1] → R,    (3.20)

where v^{(ℓ)} = −∇_{x^{(ℓ)}}f if ∇_{x^{(ℓ)}}f ∈ T^+_{x^{(ℓ)}}M while v^{(ℓ)} = ∇_{x^{(ℓ)}}f if ∇_{x^{(ℓ)}}f ∈ T^−_{x^{(ℓ)}}M. The value f_{(ℓ)}(0) − f_{(ℓ)}(t) ≥ 0 denotes the decrease of the criterion function f subjected to a pseudo-geodesic step of stepsize t. If the value of the geodesic stepsize t is small enough, such decrease may be expanded in a Taylor series truncated to the second order:

f_{(ℓ)}(0) − f_{(ℓ)}(t) = −f_{1(ℓ)}t − (1/2)f_{2(ℓ)}t² + o(t²),    (3.21)

where, by the definition of the Taylor series expansion:

f_{1(ℓ)} := (df_{(ℓ)}/dt)|_{t=0}, f_{2(ℓ)} := (d²f_{(ℓ)}/dt²)|_{t=0}.    (3.22)

Under such a second-order Taylor approximation, the stepsize value that maximizes the decrease rate is:

t^{(ℓ)} := argmax_t (−f_{1(ℓ)}t − (1/2)f_{2(ℓ)}t²) = −f_{1(ℓ)}f_{2(ℓ)}^{-1}.    (3.23)

(The above choice of the optimization stepsize is sound only if f_{1(ℓ)} ≤ 0 and f_{2(ℓ)} > 0, hence t^{(ℓ)} ≥ 0.) The above procedure may increase significantly the computational complexity of the optimization algorithm, as the repeated computation of the coefficient f_{2(ℓ)} at every iteration may be computationally cumbersome. Therefore, as far as the computational complexity of the optimization algorithm is of concern, the above procedure should be limited to cases where the computation of second derivatives is simple.


A stopping criterion for the iteration (3.18) may be adapted from criterion (3.9).

Denote by {e^{(ℓ)}_i} a frame of the tangent space T_{x^{(ℓ)}}M of the pseudo-Riemannian manifold M. Given a precision value ε > 0, the stopping criterion (3.9) halts the iteration at step ℓ if:

|⟨∇_{x^{(ℓ)}}f, e^{(ℓ)}_i⟩_{x^{(ℓ)}}| < ε for every i ∈ {1, ..., p},    (3.24)

where the initial frame {e^{(0)}_i} is propagated via the rule (3.8).

Another stopping criterion, given a precision value ε > 0 and under the hypothesis that over an optimization trajectory it holds that ∇_xf ∉ T^0_xM, halts the iteration at step ℓ if:

−f_{1(ℓ)} < ε.    (3.25)

Such a stopping criterion is simpler to implement than (3.24) and will actually be used in the following sections.

Unlike the method (3.4)-(3.5), which converges under appropriate conditions, the method (3.18) is not necessarily convergent, in general, due to the presence of the tangent-space component T^0_xM. However, numerical experiments suggest that optimization trajectories ℓ ↦ x^{(ℓ)} are such that ∇_{x^{(ℓ)}}f ∉ T^0_{x^{(ℓ)}}M and that, as a consequence, the optimization method (3.18) endowed with the stepsize selection method (3.19) enjoys the convergence properties expressed by Theorem 3.1 and, provided that it generates a sequence converging to a critical point of the criterion function, its convergence is linear.

3.3. A distance function on the real symplectic group. In the present paper, the real symplectic group is treated as a pseudo-Riemannian manifold with a bi-invariant pseudo-Riemannian metric. Although it is possible to introduce a pseudo-distance function that is compatible with the pseudo-Riemannian metric (see, e.g., [33]), such a function is not positive definite and cannot be interpreted as a distance function.

The research work [42] is concerned with the distance function on the manifold of real symplectic matrices defined by:

d²(X, Y) := ‖X − Y‖²_F = tr((X − Y)^T(X − Y)), X, Y ∈ Sp(2n,R).    (3.26)

The paper [42] investigated the nature of the critical points of the function d²(X, Y) with Y ∈ Sp(2n,R) fixed, namely, of the points that satisfy the equation ∇_Xd²(X, Y) = 0. Generally, there exist multiple solutions for the critical points, which form a submanifold of the space Sp(2n,R). A topological study of the critical submanifold based on Hessian analysis reveals that it is formed by a number of separate submanifolds, each of which is characterized by a dimension and a local optimality status (local maximum, minimum or saddle). The result of such a non-trivial analysis is summarized by the following theorem.

Theorem 3.3 ([42]). The least-squares distance function X ↦ d²(X, Y) with fixed Y ∈ Sp(2n,R) has a unique minimum X = Y in Sp(2n,R) and the rest of the critical submanifolds are all saddles.

On the manifold Sp(2n,R), the empirical mean µ ∈ Sp(2n,R) of a data-set {X_k} ⊂ Sp(2n,R) of cardinality N may be defined as in equation (3.1) with the distance function chosen as in definition (3.26). Likewise, the empirical variance of the data-set may be defined as in equation (3.2) with m = 2. The criterion function f : Sp(2n,R) → R to minimize is:

f(X) := (1/2N) Σ_k ‖X − X_k‖²_F.    (3.27)

For the sake of notational convenience, set:

C := (1/N) Σ_k X_k ∈ R^{2n×2n}.    (3.28)

The criterion function (3.27) may be recast as f(X) = (1/2)‖X − C‖²_F + constant. The analytical study of the critical landscape topology of such a function is nontrivial and the present author is not aware of any result about the critical points of the function ‖X − C‖²_F over X ∈ Sp(2n,R). Hence, the problem of its minimization is treated numerically in the present paper. It holds that ∂_Xf = X − C, hence, according to the Theorem 2.5, the pseudo-Riemannian gradient of the criterion function (3.27) on the real symplectic group endowed with the Khvedelidze-Mladenov metric reads:

∇_Xf = (1/2)Q(X − C)Q + (1/2)X(X − C)^TX.    (3.29)

In order to compute the partition (3.10), it is worth noting that:

2tr((X^{-1}∇_Xf)²) = tr((X − C)^TX(X − C)^TX + (X − C)^TQ(X − C)Q).    (3.30)

The computation of the stepsize at each iteration of the algorithm (3.18) to optimize the criterion function (3.27) according to the rule (3.23) is based on the following results:

(1/2)(d/dt)‖X exp(tX^{-1}V) − Y‖²_F = tr((X exp(tX^{-1}V) − Y)^T V exp(tX^{-1}V)),    (3.31)

(1/2)(d²/dt²)‖X exp(tX^{-1}V) − Y‖²_F = ‖V exp(tX^{-1}V)‖²_F + tr((X exp(tX^{-1}V) − Y)^T VX^{-1}V exp(tX^{-1}V)).    (3.32)

The coefficients f_1 and f_2 of the function f ∘ ρ_{X,V}, with f as in definition (3.27), read:

f_1 = tr((X − C)^TV),    (3.33)

f_2 = tr(V^TV + (X − C)^TVX^{-1}V).    (3.34)

The sign of the coefficients f_1 and f_2 may be evaluated as follows. In the case that ∇_Xf ∈ T^+_XSp(2n,R) ∪ T^0_XSp(2n,R), the algorithm (3.18) takes V = −∇_Xf, hence f_1 = −tr((X − C)^T∇_Xf) = −(1/2)tr((X − C)^TX(X − C)^TX + (X − C)^TQ(X − C)Q), therefore, by equation (3.30), it holds that f_1 = −tr((X^{-1}∇_Xf)²) ≤ 0. Conversely, when ∇_Xf ∈ T^−_XSp(2n,R), the algorithm (3.18) takes the search direction V = ∇_Xf, hence f_1 = tr((X^{-1}∇_Xf)²) ≤ 0. The coefficient f_2 has a quadratic dependence on the matrix V, hence it has the same value for V = ±∇_Xf. The coefficient f_2 is computed


as the sum of two terms, ‖V‖²_F and tr((X − C)^TVX^{-1}V). The first term is non-negative for every V ∈ T_XSp(2n,R), while the second term is indefinite. To evaluate the magnitude of the second term, note that:

|tr((X − C)^TVX^{-1}V)| ≤ ‖X − C‖_F‖VX^{-1}V‖_F ≤ ‖X − C‖_F‖V‖²_F‖X^{-1}‖_F.    (3.35)

By the identity X^{-1} = −QX^TQ, it follows that ‖X^{-1}‖_F = ‖X‖_F. Moreover, by the triangle inequality it follows that ‖X‖_F ≤ ‖X − C‖_F + ‖C‖_F. Hence:

|tr((X − C)^TVX^{-1}V)| ≤ (‖X − C‖²_F + ‖X − C‖_F‖C‖_F)‖V‖²_F.    (3.36)

For fixed ‖C‖_F and ‖X − C‖_F sufficiently small, the term ‖V‖²_F dominates the sum in the expression (3.34), hence f_2 turns out to be non-negative as well.

The initial guess X_{(0)} may be chosen or randomly generated in Sp(2n,R), provided it meets the condition:

tr(∇^T_{X_{(0)}}f ∇_{X_{(0)}}f + (X_{(0)} − C)^T ∇_{X_{(0)}}f X^{-1}_{(0)} ∇_{X_{(0)}}f) > 0.    (3.37)

The proposed procedure to optimize the criterion function (3.27) may be summarized by the pseudocode listed in the Algorithm 2, where it is assumed that the optimization sequence ℓ ↦ X_{(ℓ)} is such that ∇_{X_{(ℓ)}}f ∉ T^0_{X_{(ℓ)}}M. In the Algorithm 2, the quantity ℓ denotes a step counter, the matrix J_{(ℓ)} represents the Euclidean gradient of the criterion function (3.27), the matrix U_{(ℓ)} represents its pseudo-Riemannian gradient and the sign of the scalar quantity s_{(ℓ)} determines whether the matrix U_{(ℓ)} belongs to the space T^+_{X_{(ℓ)}}Sp(2n,R) or to the space T^−_{X_{(ℓ)}}Sp(2n,R).

3.4. Computational issues. From a computational viewpoint, the evaluation of the geodesic function (2.51) is the most expensive operation required in the implementation of the optimization algorithm (3.18).

By definition, the exponential of a Hamiltonian matrix H ∈ sp(2n,R) is given by the infinite series (2.50). A fundamental result to ease such computation is the Cayley-Hamilton theorem. The characteristic polynomial of H ∈ sp(2n,R) is defined as q(λ) := det(λI_{2n} − H), where λ ∈ C. According to the Cayley-Hamilton theorem applied to the present case, the matrix H satisfies the equation q(H) = 0. A direct consequence of such a result is that the power H^{2n} may be expressed as a linear combination of the powers H^r with r = 0, 1, ..., 2n − 1. Hence, the geodesic equation (2.51) may be expressed as the finite sum:

ρ_{X,V}(t) = X[ρ_0(t, X^{-1}V)I_{2n} + ρ_1(t, X^{-1}V)X^{-1}V + ··· + ρ_{2n−1}(t, X^{-1}V)(X^{-1}V)^{2n−1}],    (3.38)

with ρ_i : [0, 1] × sp(2n,R) → R. The advantages of the expression (3.38) of the matrix exponential are that the expression is exact and that its computational complexity is very reasonable, as the coefficients ρ_i are scalar functions of the entries of the matrix X^{-1}V, and a careful implementation of the matrix powers in the expression (3.38) costs as much as the computation of the highest-order term (X^{-1}V)^{2n−1}. The problem of the computation of the coefficients ρ_i has been studied and expressions of such coefficients are available from the literature [35, 36]. The computation of the scalar coefficients ρ_i is facilitated by the highly-structured form of the Hamiltonian matrices.

Example 2. From the expression of the tangent space T_XSp_I(2,R) evaluated at the identity (X = I_2), it follows that the Lie algebra sp(2,R) coincides with the


Algorithm 2 Pseudocode to implement averaging over the real symplectic group according to the optimization rule (3.18) endowed with the stepsize-selection rule (3.23) and the stopping criterion (3.25).

Set Q = [ 0_n  I_n ; −I_n  0_n ]
Set ℓ = 0
Set C = (1/N) Σ_k X_k
Set X_{(0)} to an initial guess in Sp(2n,R)
Set ε to the desired precision
repeat
  Compute J_{(ℓ)} = X_{(ℓ)} − C
  Compute U_{(ℓ)} = (1/2)(QJ_{(ℓ)}Q + X_{(ℓ)}J^T_{(ℓ)}X_{(ℓ)})
  Compute s_{(ℓ)} = tr((X^{-1}_{(ℓ)}U_{(ℓ)})²)
  if s_{(ℓ)} > 0 then
    Set V_{(ℓ)} = −U_{(ℓ)}
  else
    Set V_{(ℓ)} = U_{(ℓ)}
  end if
  Compute f_{1(ℓ)} = tr(J^T_{(ℓ)}V_{(ℓ)})
  Compute f_{2(ℓ)} = tr(V^T_{(ℓ)}V_{(ℓ)}) + tr(J^T_{(ℓ)}V_{(ℓ)}X^{-1}_{(ℓ)}V_{(ℓ)})
  Set t_{(ℓ)} = −f_{1(ℓ)}/f_{2(ℓ)}
  Set X_{(ℓ+1)} = X_{(ℓ)} exp(t_{(ℓ)}X^{-1}_{(ℓ)}V_{(ℓ)})
  Set ℓ = ℓ + 1
until −f_{1(ℓ)} < ε

set of real 2 × 2 matrices with zero trace. Namely, any matrix H ∈ sp(2,R) may be represented as:

H = [ h11  h12 ; h21  −h11 ].    (3.39)

It follows that H² = (h11² + h12h21)I_2, H³ = (h11² + h12h21)H, H⁴ = (h11² + h12h21)²I_2, and so forth. Hence, by the definition of the matrix exponential, it follows that:

exp(H) = [1 + (1/2!)(h11² + h12h21) + (1/4!)(h11² + h12h21)² + ···] I_2
       + [1 + (1/3!)(h11² + h12h21) + (1/5!)(h11² + h12h21)² + ···] H,    (3.40)

namely:

exp(H) = I_2 cosh√(−det(H)) + H sinh(√(−det(H)))/√(−det(H)), if det(H) < 0,
exp(H) = I_2 + H, if det(H) = 0,
exp(H) = I_2 cos√(det(H)) + H sin(√(det(H)))/√(det(H)), if det(H) > 0.    (3.41)

Another exact method applies when the matrix to be exponentiated is diagonalizable. As a reference on the eigen-structure of a Hamiltonian matrix see, e.g., [40]. Assume that there exist matrices U, D ∈ R^{2n×2n} with D diagonal such that X^{-1}V = UDU^{-1}. Then, it holds that:

ρ_{X,V}(t) = XU exp(tD)U^{-1}.    (3.42)

In addition, in numerical linear algebra, there is a large body of numerical recipes to compute the exponential of a matrix (see, for example, [24]). An interesting example is based on the following approximation:

exp(H) = exp(H/2)(exp(−H/2))^{-1} ≈ (I_{2n} + H/2)(I_{2n} − H/2)^{-1},    (3.43)

for H ∈ sp(2n,R) and H − 2I_{2n} ∈ Gl(2n,R). The map on the rightmost side is known as the Cayley map, which is defined as cay(X) := (I_p + X)(I_p − X)^{-1} for X ∈ R^{p×p} and X − I_p nonsingular. Note that cay((1/2)H) is an approximation of exp(H), yet the Cayley function maps a Hamiltonian matrix into a real symplectic matrix¹ [4].

The computation burden associated to the operations listed in the Algorithm 2,

besides the exponential function, may be evaluated in terms of the size 2n of the involved matrices. A few preliminary observations are in order:

• The implementation of the Algorithm 2 requires the evaluation of the inverse X^{-1} of a symplectic matrix X ∈ Sp(2n,R). By the definition of symplectic matrices, it is inferred that X^{-1} = −QX^TQ, which essentially corresponds to block swapping and sign-switching. Namely, upon the n × n 4-block partition X = [ A  B ; C  D ], it results that X^{-1} = [ D^T  −B^T ; −C^T  A^T ]. Hence, in the present evaluation of the computation burden, the inversion of a symplectic matrix is considered as costless.

• Some terms repeat across the lines of the pseudocode of the Algorithm 2, hence a careful implementation is assumed, which does not perform the same computation more than once.
• Except for matrix exponentiation and one scalar division, the rest of the operations are multiplications and additions. The latter are grouped into MAC (Multiply & ACcumulate) operations.

On the basis of the above observations, the computational complexity of the terms involved in Algorithm 2 has been estimated as shown in Table 3.1. The total computational burden of the algorithm is estimated at 64n³ MACs, 1 scalar division, and 1 matrix-exponential evaluation.
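As an aside, the Cayley approximation (3.43) is easy to check numerically; the sketch below (Python/NumPy in place of the paper's MATLAB; names and sizes are illustrative) shows that cay(H/2) is symplectic up to round-off even though it only approximates exp(H):

```python
import numpy as np

def cay(M):
    # Cayley map cay(M) = (I + M)(I - M)^{-1}, defined when I - M is invertible
    I = np.eye(len(M))
    return (I + M) @ np.linalg.inv(I - M)

def expm_series(M, terms=40):
    # truncated Taylor series of the matrix exponential, used only as a check
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

rng = np.random.default_rng(1)
n = 2
Q = np.block([[np.zeros((n, n)), np.eye(n)], [-np.eye(n), np.zeros((n, n))]])
A = 0.05 * rng.standard_normal((2 * n, 2 * n))
H = Q @ (A + A.T)                    # Hamiltonian: H^T Q + Q H = 0
Y = cay(H / 2)
assert np.allclose(Y.T @ Q @ Y, Q, atol=1e-12)    # symplectic up to round-off
assert np.linalg.norm(Y - expm_series(H)) < 1e-2  # close to exp(H) for small H
```

The first assertion holds to machine precision for any Hamiltonian H, while the second degrades as ‖H‖ grows, reflecting that cay(H/2) agrees with exp(H) only up to third-order terms.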

4. Numerical tests. The numerical behavior of the developed minimal-distance algorithm will be illustrated via different tests. A first numerical test concerns the computation of the empirical mean of a set of real symplectic matrices of size 2×2. The space Sp(2,R) is 3-dimensional; therefore, a graphical representation of the quantities of interest is possible. A further numerical test concerns the computation of the empirical mean of a set of given real symplectic matrices in Sp(2n,R) with n > 1. The optimization algorithm was coded in MATLAB® 6.0.

The numerical experiments presented in the following sections rely on the availability of a way to generate pseudo-random samples on the real symplectic group Sp(2n,R). Given a point X ∈ Sp(2n,R), which will be referred to in the following as

¹Indeed, the Cayley map may be applied in the numerical integration of ordinary differential equations on quadratic matrix group-manifolds [27] that generalize the notion of symplectic group.


Term      MACs   div   exp
U(ℓ)      16n³   –     –
s(ℓ)      16n³   –     –
f1(ℓ)     8n³    –     –
f2(ℓ)     16n³   –     –
t(ℓ)      –      1     –
X(ℓ+1)    8n³    –     1

Table 3.1. Estimate of the computational complexity of the terms involved in Algorithm 2. (The symbol 'div' denotes scalar division, while 'exp' denotes matrix exponential.)

'center of mass' or simply center of the random distribution, it is possible to generate a random sample Y ∈ Sp(2n,R) in a neighborhood of the matrix X by the rule:

Y = X exp(X^{-1} V),  (4.1)

where the direction V ∈ T_X Sp(2n,R) is randomly generated around 0 ∈ T_X Sp(2n,R). That the points Y generated by the exponential rule (4.1) distribute around the point X is confirmed by the following result.

Lemma 4.1. Let (X, V) ∈ T Sp(2n,R) and Y = X exp(X^{-1}V). Then it holds that:

‖Y − X‖_F ≤ ‖X‖_F [exp(‖X^{-1}V‖_F) − 1].  (4.2)

Proof. As ‖Y − X‖_F = ‖X(exp(X^{-1}V) − I_{2n})‖_F, it follows that:

‖Y − X‖_F ≤ ‖X‖_F ‖Σ_{k=1}^∞ (X^{-1}V)^k / k!‖_F ≤ ‖X‖_F Σ_{k=1}^∞ ‖X^{-1}V‖_F^k / k!.  (4.3)

The series in the rightmost term converges to exp(‖X^{-1}V‖_F) − 1.

Recall that, given a point X ∈ Sp(2n,R) and a symmetric matrix S ∈ R^{2n×2n},

because of the structure of the tangent space T_X Sp(2n,R), the matrix V = XQS belongs to the space T_X Sp(2n,R). Hence, it suffices to generate a random symmetric matrix S ∈ R^{2n×2n} to get a random point X exp(QS) in Sp(2n,R). In turn, a symmetric 2n×2n matrix may be generated by the rule S = A^T + A, with A ∈ R^{2n×2n} having zero-mean randomly-generated entries. All in one:

Y = X exp(Q(A^T + A)).  (4.4)

Note that ‖Q(A^T + A)‖_F² = 2(tr(A²) + ‖A‖_F²); hence, by Lemma 4.1, it holds that ‖Y − X‖_F ≤ ‖X‖_F [exp(√2 (tr(A²) + ‖A‖_F²)^{1/2}) − 1] ≈ √2 ‖X‖_F (tr(A²) + ‖A‖_F²)^{1/2} for small A, namely, the variance of the entries a_ij controls the spread of the random real-symplectic sample matrices Y around the center of mass X.
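The sampling rule (4.4) and the bound of Lemma 4.1 can be sketched together (Python/NumPy instead of the paper's MATLAB; the function names and the series-based exponential are illustrative assumptions):

```python
import numpy as np

def expm_series(M, terms=60):
    # truncated Taylor series of the matrix exponential (adequate for small M)
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def random_symplectic_near(X, Q, sigma, rng):
    # sample Y = X exp(Q(A^T + A)) around the center X, as in rule (4.4);
    # sigma, the entrywise std of A, controls the spread around X
    A = sigma * rng.standard_normal(Q.shape)
    return X @ expm_series(Q @ (A.T + A)), A

rng = np.random.default_rng(3)
n = 1
Q = np.block([[np.zeros((n, n)), np.eye(n)], [-np.eye(n), np.zeros((n, n))]])
X = np.eye(2 * n)                       # center of mass, itself symplectic
Y, A = random_symplectic_near(X, Q, 0.05, rng)
assert np.allclose(Y.T @ Q @ Y, Q, atol=1e-10)   # sample stays on Sp(2,R)
v = np.linalg.norm(Q @ (A.T + A))       # = ||X^{-1} V||_F, since X = I here
assert np.linalg.norm(Y - X) <= np.linalg.norm(X) * (np.exp(v) - 1) + 1e-12
```

Shrinking sigma tightens the Frobenius-norm bound (4.2), concentrating the samples around the chosen center of mass.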

4.1. Averaging over the manifold Sp(2,R). A close-up of the numerical behavior of the minimal-distance optimization algorithm (3.18) stems from the examination of the case of averaging over the manifold Sp(2n,R) for n = 1 and N = 100. The elements of the group Sp(2,R) may be rendered in a 3-dimensional drawing.


Figure 4.1 shows a result obtained with the iterative algorithm (3.18) and the 100 samples Xk generated randomly around a randomly-selected center of mass. Figure 4.1 shows the location of the target matrices Xk (circles), the location of the center of mass (cross), the trajectory of the optimization algorithm over the search space (solid-dotted line), and the location of the final point computed by the algorithm (diamond). In order to emphasize the behavior of the optimization method (3.18), in this experiment a constant stepsize schedule t(ℓ) = 1/2 was used and no stopping criterion was employed, in order to evidence the numerical stability of the method.


Fig. 4.1. Averaging over the real symplectic group Sp(2,R). Target matrices Xk are denoted by circles. The center of mass is denoted by a cross mark. The trajectory of the optimization algorithm is denoted by a solid-dotted line, where each dot corresponds to an approximate solution X(ℓ). The last point of the trajectory is denoted by a diamond mark and represents the computed empirical mean μ.

Figure 4.1 shows that the algorithm is convergent toward the center of mass. Figure 4.2 shows the value of the criterion function (1/2N) Σ_k d²(X, Xk) during iteration, the value of the Frobenius norm of its pseudo-Riemannian gradient during iteration, and the distances d(X, Xk) before iteration (with initial guess chosen as X(0) = I_2) and after iteration.

4.2. Averaging over the manifold Sp(2n,R) with n > 1. Figure 4.3 shows a result obtained with the iterative algorithm (3.18) for n = 5 and N = 50. Figure 4.3 shows the value of the criterion function (1/2N) Σ_k d²(X, Xk) as well as the value of the Frobenius norm of its pseudo-Riemannian gradient during iteration. In the shown example, the squared pseudo-norm of the pseudo-Riemannian gradient of the criterion function to optimize may assume negative, zero, as well as positive values during optimization; therefore, the Frobenius norm was selected for display in order not to miss the link with the zero-gradient condition at convergence. Figure 4.3 also shows the distances d(X, Xk) before iteration (with initial guess chosen as X(0) = I_10) and after iteration. Such numerical results were obtained by the



Fig. 4.2. Averaging over the real symplectic group Sp(2,R). Criterion function (3.27) during iteration, value of the Frobenius norm of the pseudo-Riemannian gradient of the criterion function during iteration, distances d(X, Xk) before iteration (X = X(0)) and after iteration (X = μ).

adaptive optimization stepsize schedule explained in subsection 3.2 and by employing the stopping condition (3.25) with precision ε = 10⁻⁶. Figure 4.3 shows that the algorithm converges steadily toward the minimal-distance configuration (in fact, the distances from the found empirical mean are much smaller than the distances from the initial guess). Figure 4.4 shows the value of the 'symplecticity' index ‖X^T Q X − Q‖_F during iteration as well as the value of the optimization stepsize schedule. The values of the coefficients f1 and f2 during iteration are displayed as well. The panels show that the matrix sequence X(ℓ) remains on the manifold Sp(10,R) at each iteration (up to machine precision).

Figure 4.5 shows the result of an empirical statistical analysis of the behavior of the optimization algorithm over 500 independent trials. In each trial, the algorithm starts from a randomly generated initial guess X(0). In particular, Figure 4.5 shows the distribution of the number of iterations that the optimization takes to reach the desired precision on each trial. The convergence speed varies with the initial guess, while the algorithm converges in every trial to the same value of the criterion function (namely, to (1/2)μ² ≈ 0.9827 in this test) and takes no more than 10⁻³ seconds to run on an average computation platform (4GB RAM, 2.17 GHz clock).

5. Conclusion. Gradient-based optimization methods can be used to solve applied-mathematics problems and are relatively easy to implement on a computer. The present paper discusses a minimal-distance problem formulated over the manifold of real symplectic matrices. The present research stems from the following observations:

• Minimal-distance problems may be extended from flat spaces to curved smooth metrizable manifolds by the choice of an appropriate distance function.
• The resulting minimization problem on a manifold may be tackled via a gradient-



Fig. 4.3. Optimization over the symplectic group Sp(10,R). Criterion function (3.27) during iteration, value of the Frobenius norm of the pseudo-Riemannian gradient of the criterion function during iteration, distances d(X, Xk) before iteration (X = X(0)) and after iteration (X = μ).

descent algorithm tailored to the geometry of the parameter manifold through a geodesic-stepping method.

The real symplectic group is regarded as a pseudo-Riemannian manifold and a metric is chosen that affords the computation of geodesic arcs in closed form. Within such a geometric setting, the considered minimal-distance problem may be set up and the geodesic-based numerical stepping method may be implemented. The paper also presents a way of selecting an appropriate stepsize schedule based on a local quadratic approximation of the criterion function restricted to a geodesic arc.

The present manuscript is related to the previous manuscript [18]. The present manuscript is entirely dedicated to optimization over the manifold of real symplectic matrices. The mathematical instruments needed within the present paper are expressed in §2 in a fully differential-geometric fashion, while paper [18] was based on different concepts (for example, the Lagrange-multipliers method). The optimization problem tackled here differs from the problem considered in the previous paper [18] and is fully supported by the study conducted in [42]. §3 presents a discussion of the difficulties encountered when trying to extend the Riemannian optimization setting to a pseudo-Riemannian optimization setting and, more generally, to a general manifold, along with a discussion of the selection of a stepsize schedule as well as of stopping criteria, of convergence, and of the computational issues related to the discussed optimization method. In particular, §3 shows that the computational complexity of the discussed optimization algorithm on the space Sp(2n,R) is of order (2n)³ (without any specifically-optimized linear-algebra tools).

Although the present paper focuses on the study of minimal-distance problemson the manifold of real symplectic matrices, the main results of the paper and the



Fig. 4.4. Optimization over the symplectic group Sp(10,R). Value of the symplecticity index ‖X(ℓ)^T Q X(ℓ) − Q‖_F, of the coefficient f1(ℓ) and of the coefficient f2(ℓ) during iteration, and value of the optimization stepsize schedule t(ℓ) (the index ℓ is denoted as 'Iteration').

relevant calculations were presented in a rather general fashion, so that they could beapplied to other manifolds of interest to the readers.

Numerical tests have been performed with reference to the computation of the empirical mean of a collection of symplectic matrices. The obtained numerical results show that the pseudo-Riemannian-gradient-based algorithm, along with a pseudo-geodesic-based stepping method, is suitable for the numerical solution of the posed minimal-distance problem.

Acknowledgments. The author wishes to gratefully thank the associate editor for handling the review of the present manuscript and the anonymous referees for offering precious and stimulating comments and suggestions about the technical content of the paper and for substantially contributing to the quality of its final version.

REFERENCES

[1] S.-i. Amari, Natural gradient works efficiently in learning, Neural Computation, Vol. 10, pp. 251 – 276, 1998
[2] P.A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008
[3] M. Aoki, Two complementary representations of multiple time series in state-space innovation forms, Journal of Forecasting (Special Issue on "Time Series Advances in Economic Forecasting"), Vol. 13, No. 2, pp. 69 – 90, 2006
[4] V.I. Arnol'd and A.B. Givental', Symplectic geometry, in "Dynamical Systems IV: Symplectic Geometry & Its Applications" (V.I. Arnol'd and S.P. Novikov, Ed.s), Encyclopaedia of Mathematical Sciences, Vol. 4, pp. 1 – 138, Springer-Verlag (Second Edition), 2001
[5] V. Arsigny, P. Fillard, X. Pennec and N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM Journal on Matrix Analysis and Applications, Vol. 29, No. 1, pp. 328 – 347, 2006



Fig. 4.5. Optimization over the symplectic group Sp(10,R). Distribution of the number of iterations to converge on each trial over a total of 500 independent trials.

[6] S.D. Bartlett, B. Sanders, S. Braunstein and K. Nemoto, Efficient classical simulation of continuous variable quantum information processes, Physical Review Letters, Vol. 88, 097904/1-4, 2002
[7] A.M. Bloch, P.E. Crouch, J.E. Marsden and A.K. Sayal, Optimal control and geodesics on quadratic matrix Lie groups, Foundations of Computational Mathematics, Vol. 8, pp. 469 – 500, 2008
[8] Y. Borissov, Minimal/nonminimal codewords in the second order binary Reed-Muller codes: revisited, Proceedings of the Eleventh International Workshop on Algebraic and Combinatorial Coding Theory (June 16-22, 2008, Pamporovo, Bulgaria), pp. 29 – 34, 2008
[9] E. Celledoni and S. Fiori, Neural learning by geometric integration of reduced 'rigid-body' equations, Journal of Computational and Applied Mathematics (JCAM), Vol. 172, No. 2, pp. 247 – 269, December 2004
[10] E. Celledoni and S. Fiori, Descent methods for optimization on homogeneous manifolds, Journal of Mathematics and Computers in Simulation (Special issue on "Structural Dynamical Systems: Computational Aspects", Guest Editors: N. Del Buono, L. Lopez and T. Politi), Vol. 79, No. 4, pp. 1298 – 1323, December 2008
[11] A. Edelman, T.A. Arias and S.T. Smith, The geometry of algorithms with orthogonality constraints, SIAM Journal on Matrix Analysis and Applications, Vol. 20, No. 2, pp. 303 – 353, 1998
[12] A.J. Draft, F. Neri, G. Rangarajan, D.R. Douglas, L.M. Healy and R.D. Ryne, Lie algebraic treatment of linear and nonlinear beam dynamics, Annual Review of Nuclear and Particle Science, Vol. 38, pp. 455 – 496, December 1988
[13] L. Dieci and L. Lopez, Smooth singular value decomposition on the symplectic group and Lyapunov exponents approximation, CALCOLO, Vol. 43, No. 1, pp. 1 – 15, March 2006
[14] F.M. Dopico and C.R. Johnson, Parametrization of the matrix symplectic group and applications, SIAM Journal on Matrix Analysis and Applications, Vol. 31, pp. 650 – 673, 2009
[15] A.J. Dragt, F. Neri, G. Rangarajan, D.R. Douglas, L.M. Healy and R.D. Ryne, Lie algebraic treatment of linear and nonlinear beam dynamics, Annual Review of Nuclear and Particle Science, Vol. 38, pp. 455 – 496, 1988
[16] J. Fan and P. Nie, Quadratic problems over the Stiefel manifold, Operations Research Letters, Vol. 34, pp. 135 – 141, 2006

[17] H. Fassbender, Symplectic Methods for the Symplectic Eigenproblem, Kluwer Academic/Plenum Publishers, New York, 2000
[18] S. Fiori, Learning by natural gradient on noncompact matrix-type pseudo-Riemannian manifolds, IEEE Transactions on Neural Networks, Vol. 21, No. 5, pp. 841 – 852, May 2010
[19] S. Fiori and T. Tanaka, An algorithm to compute averages on matrix Lie groups, IEEE Transactions on Signal Processing, Vol. 57, No. 12, pp. 4734 – 4743, December 2009
[20] S. Fiori, Learning by natural gradient on noncompact matrix-type pseudo-Riemannian manifolds, IEEE Transactions on Neural Networks, Vol. 21, No. 5, pp. 841 – 852, May 2010
[21] D. Gabay, Minimizing a differentiable function over a differentiable manifold, Journal of Optimization Theory and Applications, Vol. 37, No. 2, pp. 177 – 219, 1982
[22] V. Guillemin and S. Sternberg, Symplectic Techniques in Physics, Cambridge University Press, 1984
[23] W.F. Harris, Astigmatic optical systems with separated and prismatic or noncoaxial elements: system matrices and system vectors, Optometry and Vision Science, Vol. 70, No. 6, pp. 545 – 551, 1994
[24] N.J. Higham, Functions of Matrices: Theory and Computation, SIAM, 2008
[25] M.P. Keating, A matrix formulation of spectacle magnification, Ophthalmic and Physiological Optics, Vol. 2, No. 2, pp. 145 – 158, 1982
[26] A. Khvedelidze and D. Mladenov, Generalized Calogero-Moser-Sutherland models from geodesic motion on GL+(n,R) group manifold, Physics Letters A, Vol. 299, Nos. 5-6, pp. 522 – 530, July 2002
[27] L. Lopez, Numerical methods for ordinary differential equations on matrix manifolds, Journal of Computational and Applied Mathematics, Vol. 210, pp. 232 – 243, 2007
[28] D.G. Luenberger, The gradient projection method along geodesics, Management Science, Vol. 18, No. 11, pp. 620 – 631, July 1972
[29] W.W. MacKay and A.U. Luccio, Symplectic interpolation, Proceedings of the Tenth European Particle Accelerator Conference (EPAC 2006, June 26 - 30, 2006, Edinburgh, Scotland), 2006
[30] C. Mehl, V. Mehrmann, A.C.M. Ran and L. Rodman, Perturbation analysis of Lagrangian invariant subspaces of symplectic matrices, Linear and Multilinear Algebra, Vol. 57, No. 2, pp. 141 – 184, January 2009
[31] J. Nocedal and S.J. Wright, Numerical Optimization, Springer Series in Operations Research and Financial Engineering (2nd Edition), July 2006
[32] B. O'Neill, Semi-Riemannian Geometry - With Applications to Relativity, Academic Press, 1983
[33] J. Peleska, A characterization for isometries and conformal mappings of pseudo-Riemannian manifolds, Aequationes Mathematicae, Vol. 27, No. 1, pp. 20 – 31, 1984
[34] C.L. Pripoae and G.T. Pripoae, General descent algorithm on affine manifolds, Proceedings of "The Conference of Geometry and its Applications in Technology" and "The Workshop on Global Analysis, Differential Geometry and Lie Algebras" (Editor: Gr. Tsagas, 1999), pp. 165 – 169, Balkan Society of Geometers, Geometry Balkan Press, 2001
[35] V. Ramakrishna and F. Costa, On the exponentials of some structured matrices, Journal of Physics A: Mathematical and General, Vol. 37, pp. 11613 – 11627, 2004
[36] F. Silva-Leite and P. Crouch, Closed forms for the exponential mapping on matrix Lie groups, based on Putzer's method, Journal of Mathematical Physics, Vol. 40, No. 7, pp. 3561 – 3568, July 1999
[37] S. Sommer, F. Lauze, S. Hauberg and M. Nielsen, Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations, Proceedings of the 11th European Conference on Computer Vision (ECCV'10, Crete, Greece, September 5-11, 2010), Part VI, pp. 43 – 56, 2010
[38] N.G. Stephen, Transfer matrix analysis of the elastostatics of one-dimensional repetitive structures, Proceedings of the Royal Society A, Vol. 462, No. 2072, pp. 2245 – 2270, August 2006
[39] M. Spivak, A Comprehensive Introduction to Differential Geometry, 2nd Edition, Berkeley, CA: Publish or Perish Press, 1979
[40] C.F. Van Loan, A symplectic method for approximating all the eigenvalues of a Hamiltonian matrix, Linear Algebra and Its Applications, Vol. 61, pp. 233 – 251, 1984
[41] R.-B. Wu, R. Chakrabarti and H. Rabitz, Optimal control theory for continuous-variable quantum gates, Physical Review A, Vol. 77, 052303, 2008
[42] R.-B. Wu, R. Chakrabarti and H. Rabitz, Critical landscape topology for optimization on the symplectic group, Journal of Optimization Theory and Applications, Vol. 145, No. 2, pp. 387 – 406, January 2010