Source: proceedings.mlr.press/v48/lig16-supp.pdf
Gauss quadrature for matrix inverse forms with applications
A. Further Background on Gauss Quadrature
We present below a more detailed summary of material on Gauss quadrature to make the paper self-contained.
A.1. Selecting weights and nodes
We have described that the Riemann-Stieltjes integral can be expressed as
\[
I[f] := Q_n + R_n = \sum_{i=1}^{n} \omega_i f(\theta_i) + \sum_{i=1}^{m} \nu_i f(\tau_i) + R_n[f],
\]
where $Q_n$ denotes the $n$th degree approximation and $R_n$ denotes a remainder term. The weights $\{\omega_i\}_{i=1}^{n}$, $\{\nu_i\}_{i=1}^{m}$ and nodes $\{\theta_i\}_{i=1}^{n}$ are chosen such that for all polynomials of degree less than $2n+m-1$, denoted $f \in \mathcal{P}_{2n+m-1}$, we have exact interpolation $I[f] = Q_n$. One way to compute the weights and nodes is to set $f(x) = x^i$ for $i \le 2n+m-1$ and then solve the resulting nonlinear system exactly. But there is an easier way to obtain weights and nodes, namely by using polynomials orthogonal with respect to the measure $\alpha$. Specifically, we construct a sequence of orthogonal polynomials $p_0(\lambda), p_1(\lambda), \ldots$ such that $p_i(\lambda)$ is a polynomial in $\lambda$ of degree exactly $i$, and $p_i$, $p_j$ are orthogonal, i.e., they satisfy
\[
\int_{\lambda_{\min}}^{\lambda_{\max}} p_i(\lambda)\,p_j(\lambda)\,d\alpha(\lambda) =
\begin{cases}
1, & i = j,\\
0, & \text{otherwise.}
\end{cases}
\]
The roots of $p_n$ are distinct, real, and lie in the interval $[\lambda_{\min}, \lambda_{\max}]$; they form the nodes $\{\theta_i\}_{i=1}^{n}$ for Gauss quadrature (see, e.g., (Golub & Meurant, 2009, Ch. 6)).
Consider the two monic polynomials whose roots serve as quadrature nodes:
\[
\pi_n(\lambda) = \prod_{i=1}^{n} (\lambda - \theta_i), \qquad \rho_m(\lambda) = \prod_{i=1}^{m} (\lambda - \tau_i),
\]
where $\rho_0 \equiv 1$ for consistency. We further denote $\rho_m^+ = \pm\rho_m$, where the sign is taken to ensure $\rho_m^+ \ge 0$ on $[\lambda_{\min}, \lambda_{\max}]$. Then, for $m > 0$, we calculate the quadrature weights as
\[
\omega_i = I\!\left[\frac{\rho_m^+(\lambda)\,\pi_n(\lambda)}{\rho_m^+(\theta_i)\,\pi_n'(\theta_i)\,(\lambda - \theta_i)}\right], \qquad
\nu_j = I\!\left[\frac{\rho_m^+(\lambda)\,\pi_n(\lambda)}{(\rho_m^+)'(\tau_j)\,\pi_n(\tau_j)\,(\lambda - \tau_j)}\right],
\]
where $f'(\lambda)$ denotes the derivative of $f$ with respect to $\lambda$. When $m = 0$ the quadrature degenerates to Gauss quadrature and we have
\[
\omega_i = I\!\left[\frac{\pi_n(\lambda)}{\pi_n'(\theta_i)\,(\lambda - \theta_i)}\right].
\]
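For $m = 0$ the weight $\omega_i$ is the integral of the $i$th Lagrange basis polynomial built on the nodes. A hedged numerical check for the Gauss-Legendre case ($d\alpha(\lambda) = d\lambda$ on $[-1, 1]$; the degree $n = 4$ is arbitrary):

```python
import numpy as np
from numpy.polynomial import Polynomial, legendre

n = 4
theta, w_ref = legendre.leggauss(n)          # reference nodes and weights
pi_n = Polynomial.fromroots(theta)           # monic node polynomial pi_n
dpi = pi_n.deriv()
w = []
for i in range(n):
    # pi_n(l) / (l - theta_i), obtained by dropping the i-th root,
    # normalized by pi_n'(theta_i): the i-th Lagrange basis polynomial.
    ell_i = Polynomial.fromroots(np.delete(theta, i)) / dpi(theta[i])
    F = ell_i.integ()                        # antiderivative
    w.append(F(1.0) - F(-1.0))               # I[...] for d(alpha) = d(lambda)
```

The computed `w` reproduces the standard Gauss-Legendre weights, illustrating that the formula above is interpolatory.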
Although we have specified how to select nodes and weights for quadrature, these ideas cannot be applied to our problem because the measure $\alpha$ is unknown. Indeed, calculating the measure explicitly would require knowing the entire spectrum of $A$, which is as good as explicitly computing $f(A)$, hence untenable for us. The next section shows how to circumvent the difficulties due to the unknown $\alpha$.
A.2. Gauss Quadrature Lanczos (GQL)
The key idea to circumvent our lack of knowledge of $\alpha$ is to recursively construct polynomials called Lanczos polynomials. The construction ensures their orthogonality with respect to $\alpha$. Concretely, we construct Lanczos polynomials via the following three-term recurrence:
\[
\beta_i p_i(\lambda) = (\lambda - \alpha_i)\,p_{i-1}(\lambda) - \beta_{i-1}\,p_{i-2}(\lambda), \quad i = 1, 2, \ldots, n, \qquad p_{-1}(\lambda) \equiv 0, \; p_0(\lambda) \equiv 1, \tag{A.1}
\]
while ensuring $\int_{\lambda_{\min}}^{\lambda_{\max}} d\alpha(\lambda) = 1$. We can express (A.1) in matrix form by writing
\[
\lambda P_n(\lambda) = J_n P_n(\lambda) + \beta_n p_n(\lambda)\,e_n,
\]
where $P_n(\lambda) := [p_0(\lambda), \ldots, p_{n-1}(\lambda)]^\top$, $e_n$ is the $n$th canonical unit vector, and $J_n$ is the tridiagonal matrix
\[
J_n =
\begin{bmatrix}
\alpha_1 & \beta_1 & & & \\
\beta_1 & \alpha_2 & \beta_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & \beta_{n-2} & \alpha_{n-1} & \beta_{n-1} \\
 & & & \beta_{n-1} & \alpha_n
\end{bmatrix}. \tag{A.2}
\]
This matrix is known as the Jacobi matrix, and is closely related to Gauss quadrature. The following well-known theorem makes this relation precise.
Theorem 10 ((Wilf, 1962; Golub & Welsch, 1969)). The eigenvalues of $J_n$ form the nodes $\{\theta_i\}_{i=1}^{n}$ of Gauss-type quadratures. The weights $\{\omega_i\}_{i=1}^{n}$ are given by the squares of the first elements of the normalized eigenvectors of $J_n$.
Thus, if $J_n$ has the eigendecomposition $J_n = P_n^\top \Lambda P_n$, then for Gauss quadrature Theorem 10 yields
\[
Q_n = \sum_{i=1}^{n} \omega_i f(\theta_i) = e_1^\top P_n^\top f(\Lambda) P_n e_1 = e_1^\top f(J_n)\, e_1. \tag{A.3}
\]
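Identity (A.3) together with Theorem 10 is easy to verify numerically: take a small symmetric tridiagonal $J$, read nodes off its eigenvalues and weights off the squared first eigenvector components, and compare against $e_1^\top f(J) e_1$. The $5\times 5$ Jacobi matrix below is a hypothetical example, with $f(x) = 1/x$ as in our setting:

```python
import numpy as np

# A hypothetical 5x5 positive definite Jacobi (symmetric tridiagonal) matrix.
alpha = np.array([2.0, 3.0, 2.5, 4.0, 3.5])
beta = np.array([0.8, 1.1, 0.6, 0.9])
J = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

theta, P = np.linalg.eigh(J)       # nodes = eigenvalues of J (Theorem 10)
w = P[0, :] ** 2                   # weights = squared first components

f = lambda x: 1.0 / x              # f(A) = A^{-1}, the paper's focus
lhs = float(np.sum(w * f(theta)))  # sum_i w_i f(theta_i)
rhs = float(np.linalg.inv(J)[0, 0])  # e_1^T f(J) e_1 for f = inverse
```

The two quantities agree to machine precision, since both are exact expressions of the same quadrature.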
Specialization. We now specialize to our main focus, $f(A) = A^{-1}$, for which we prove more precise results. In this case, (A.3) becomes $Q_n = [J_n^{-1}]_{1,1}$. The task now is to compute $Q_n$, i.e., given $A$ and $u$, to obtain the Jacobi matrix $J_n$.
Fortunately, we can efficiently calculate $J_n$ iteratively using the Lanczos algorithm (Lanczos, 1950). Suppose we have an estimate $J_i$; in iteration $(i+1)$ of Lanczos, we compute the tridiagonal coefficients $\alpha_{i+1}$ and $\beta_{i+1}$ and add them to this estimate to form $J_{i+1}$. As to $Q_n$, assuming we have already computed $[J_i^{-1}]_{1,1}$, letting $j_i = J_i^{-1} e_i$ and invoking the Sherman-Morrison identity (Sherman & Morrison, 1950) we obtain the recursion:
\[
[J_{i+1}^{-1}]_{1,1} = [J_i^{-1}]_{1,1} + \frac{\beta_i^2\,([j_i]_1)^2}{\alpha_{i+1} - \beta_i^2\,[j_i]_i}, \tag{A.4}
\]
where $[j_i]_1$ and $[j_i]_i$ can be recursively computed using a Cholesky-like factorization of $J_i$ (Golub & Meurant, 2009, p. 31).
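Recursion (A.4) can be checked against direct inversion of the leading tridiagonal blocks; the coefficients below are hypothetical:

```python
import numpy as np

alpha = [2.0, 3.0, 2.5, 4.0]   # hypothetical diagonal of the Jacobi matrix
beta = [0.8, 1.1, 0.6]         # hypothetical off-diagonal

def jacobi(i):
    """Leading i-by-i tridiagonal block J_i."""
    return np.diag(alpha[:i]) + np.diag(beta[:i - 1], 1) + np.diag(beta[:i - 1], -1)

ok = True
for i in range(1, 4):
    Ji = jacobi(i)
    ji = np.linalg.solve(Ji, np.eye(i)[:, -1])      # j_i = J_i^{-1} e_i
    direct = np.linalg.inv(jacobi(i + 1))[0, 0]     # [J_{i+1}^{-1}]_{1,1}
    recursed = (np.linalg.inv(Ji)[0, 0]
                + beta[i - 1] ** 2 * ji[0] ** 2
                / (alpha[i] - beta[i - 1] ** 2 * ji[-1]))  # update (A.4)
    ok = ok and np.isclose(direct, recursed)
```

This is just the block-inverse form of Sherman-Morrison applied to the bordered matrix $J_{i+1}$, so the match is exact up to rounding.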
For Gauss-Radau quadrature, we need to modify $J_i$ so that it has a prescribed eigenvalue. More precisely, we extend $J_i$ to $J_i^{lr}$ for left Gauss-Radau ($J_i^{rr}$ for right Gauss-Radau) with $\beta_i$ on the off-diagonal and $\alpha_i^{lr}$ ($\alpha_i^{rr}$) on the diagonal, so that $J_i^{lr}$ ($J_i^{rr}$) has a prescribed eigenvalue of $\lambda_{\min}$ ($\lambda_{\max}$).
For Gauss-Lobatto quadrature, we extend $J_i$ to $J_i^{lo}$ with values $\beta_i^{lo}$ and $\alpha_i^{lo}$ chosen to ensure that $J_i^{lo}$ has the prescribed eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$. For more details on the construction, see (Golub, 1973).
For all methods, the approximated values are calculated as $[(J_i')^{-1}]_{1,1}$, where $J_i' \in \{J_i^{lr}, J_i^{rr}, J_i^{lo}\}$ is the modified Jacobi matrix. Here $J_i'$ is constructed at the $i$th iteration of the algorithm.
The algorithm for computing Gauss, Gauss-Radau, and Gauss-Lobatto quadrature rules with the help of the Lanczos iteration is called Gauss Quadrature Lanczos (GQL) and is shown in (Golub & Meurant, 1997). We recall its pseudocode in Algorithm 1 to make our presentation self-contained (and for our proofs in Section 4).
The error of approximating $I[f]$ by Gauss-type quadratures can be expressed as
\[
R_n[f] = \frac{f^{(2n+m)}(\xi)}{(2n+m)!}\, I[\rho_m\,\pi_n^2],
\]
for some $\xi \in [\lambda_{\min}, \lambda_{\max}]$ (see, e.g., (Stoer & Bulirsch, 2013)). Note that $\rho_m$ does not change sign in $[\lambda_{\min}, \lambda_{\max}]$; but with different values of $m$ and $\tau_j$ we obtain different (but fixed) signs for $R_n[f]$, using $f(\lambda) = 1/\lambda$ and $\lambda_{\min} > 0$. Concretely, for Gauss quadrature $m = 0$ and $R_n[f] \ge 0$; for left Gauss-Radau $m = 1$ and $\tau_1 = \lambda_{\min}$, so we have $R_n[f] \le 0$; for right Gauss-Radau we have $m = 1$ and $\tau_1 = \lambda_{\max}$, thus $R_n[f] \ge 0$; while for Gauss-Lobatto we have $m = 2$, $\tau_1 = \lambda_{\min}$ and $\tau_2 = \lambda_{\max}$, so that $R_n[f] \le 0$. This behavior of the errors clearly shows the ordering relations between the target values and the approximations made by the different quadrature rules. Lemma 11 below (Lemma 2 in the main text; see, e.g., (Meurant, 1997)) makes this claim precise.
Algorithm 5 Gauss Quadrature Lanczos (GQL)
Input: $u$ and $A$ the corresponding vector and matrix; $\lambda_{\min}$ and $\lambda_{\max}$ lower and upper bounds for the spectrum of $A$
Output: $g_i$, $g_i^{rr}$, $g_i^{lr}$ and $g_i^{lo}$, the Gauss, right Gauss-Radau, left Gauss-Radau and Gauss-Lobatto quadratures computed at the $i$th iteration
Initialize: $u_{-1} = 0$, $u_0 = u/\|u\|$, $\alpha_1 = u_0^\top A u_0$, $\beta_1 = \|(A - \alpha_1 I)u_0\|$, $g_1 = \|u\|^2/\alpha_1$, $c_1 = 1$, $\delta_1 = \alpha_1$, $\delta_1^{lr} = \alpha_1 - \lambda_{\min}$, $\delta_1^{rr} = \alpha_1 - \lambda_{\max}$, $u_1 = (A - \alpha_1 I)u_0/\beta_1$, $i = 2$
while $i \le N$ do
  $\alpha_i = u_{i-1}^\top A u_{i-1}$  {Lanczos iteration}
  $\tilde u_i = A u_{i-1} - \alpha_i u_{i-1} - \beta_{i-1} u_{i-2}$
  $\beta_i = \|\tilde u_i\|$
  $u_i = \tilde u_i / \beta_i$
  $g_i = g_{i-1} + \dfrac{\|u\|^2 \beta_{i-1}^2 c_{i-1}^2}{\delta_{i-1}(\alpha_i \delta_{i-1} - \beta_{i-1}^2)}$  {update $g_i$ with the Sherman-Morrison formula}
  $c_i = c_{i-1}\beta_{i-1}/\delta_{i-1}$
  $\delta_i = \alpha_i - \dfrac{\beta_{i-1}^2}{\delta_{i-1}}$, $\quad \delta_i^{lr} = \alpha_i - \lambda_{\min} - \dfrac{\beta_{i-1}^2}{\delta_{i-1}^{lr}}$, $\quad \delta_i^{rr} = \alpha_i - \lambda_{\max} - \dfrac{\beta_{i-1}^2}{\delta_{i-1}^{rr}}$
  $\alpha_i^{lr} = \lambda_{\min} + \dfrac{\beta_i^2}{\delta_i^{lr}}$, $\quad \alpha_i^{rr} = \lambda_{\max} + \dfrac{\beta_i^2}{\delta_i^{rr}}$  {solve for $J_i^{lr}$ and $J_i^{rr}$}
  $\alpha_i^{lo} = \dfrac{\delta_i^{lr}\delta_i^{rr}}{\delta_i^{rr} - \delta_i^{lr}}\left(\dfrac{\lambda_{\max}}{\delta_i^{lr}} - \dfrac{\lambda_{\min}}{\delta_i^{rr}}\right)$, $\quad (\beta_i^{lo})^2 = \dfrac{\delta_i^{lr}\delta_i^{rr}}{\delta_i^{rr} - \delta_i^{lr}}(\lambda_{\max} - \lambda_{\min})$  {solve for $J_i^{lo}$}
  $g_i^{lr} = g_i + \dfrac{\beta_i^2 c_i^2 \|u\|^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)}$, $\quad g_i^{rr} = g_i + \dfrac{\beta_i^2 c_i^2 \|u\|^2}{\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2)}$, $\quad g_i^{lo} = g_i + \dfrac{(\beta_i^{lo})^2 c_i^2 \|u\|^2}{\delta_i(\alpha_i^{lo}\delta_i - (\beta_i^{lo})^2)}$  {update $g_i^{rr}$, $g_i^{lr}$ and $g_i^{lo}$ with the Sherman-Morrison formula}
  $i = i + 1$
end while
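A compact NumPy sketch of part of GQL, restricted to the Gauss lower bound and the left Gauss-Radau upper bound (the two bounds that bracket $u^\top A^{-1} u$ in Lemma 11). The test matrix, vector, seed, and iteration count are all hypothetical:

```python
import numpy as np

def gql_bounds(A, u, lam_min, iters):
    """Per-iteration Gauss lower bound g_i and left Gauss-Radau upper bound
    glr_i on u^T A^{-1} u (sketch of the g and g^{lr} updates of GQL).
    Assumes A is symmetric positive definite with spectrum >= lam_min > 0."""
    nrm2 = float(u @ u)
    u_cur = u / np.sqrt(nrm2)
    alpha = float(u_cur @ (A @ u_cur))
    w = A @ u_cur - alpha * u_cur
    beta = float(np.linalg.norm(w))
    g = nrm2 / alpha                       # g_1 = ||u||^2 [J_1^{-1}]_{1,1}
    c, delta, delta_lr = 1.0, alpha, alpha - lam_min
    out = []
    for _ in range(iters):
        alpha_lr = lam_min + beta ** 2 / delta_lr
        glr = g + nrm2 * beta ** 2 * c ** 2 / (delta * (alpha_lr * delta - beta ** 2))
        out.append((g, glr))
        u_prev, u_cur = u_cur, w / beta    # next Lanczos vector
        w = A @ u_cur - beta * u_prev
        a_new = float(u_cur @ w)           # alpha_{i+1}
        w = w - a_new * u_cur
        # Sherman-Morrison updates of g and the auxiliary quantities
        g = g + nrm2 * beta ** 2 * c ** 2 / (delta * (a_new * delta - beta ** 2))
        c = c * beta / delta
        delta_lr = a_new - lam_min - beta ** 2 / delta_lr
        delta = a_new - beta ** 2 / delta
        beta = float(np.linalg.norm(w))    # beta_{i+1}
    return out

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((30, 30)))
lam = np.linspace(0.5, 10.0, 30)
A = Q @ np.diag(lam) @ Q.T                 # hypothetical SPD test matrix
u = rng.standard_normal(30)
bounds = gql_bounds(A, u, lam.min(), 8)
truth = float(u @ np.linalg.solve(A, u))
```

After a handful of iterations the pairs `(g, glr)` already bracket `truth` tightly, with the lower bounds increasing monotonically as Theorem 16 below predicts.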
Lemma 11. Let $g_i$, $g_i^{lr}$, $g_i^{rr}$, and $g_i^{lo}$ be the approximations at the $i$th iteration of Gauss, left Gauss-Radau, right Gauss-Radau, and Gauss-Lobatto quadrature, respectively. Then, $g_i$ and $g_i^{rr}$ provide lower bounds on $u^\top A^{-1} u$, while $g_i^{lr}$ and $g_i^{lo}$ provide upper bounds.
The final connection we recall as background is the method of conjugate gradients. This helps us analyze the speed at which quadrature converges to the true value (assuming exact arithmetic).
A.3. Relation with Conjugate Gradient
While Gauss-type quadratures relate to the Lanczos algorithm, Lanczos itself is closely related to conjugate gradient (CG) (Hestenes & Stiefel, 1952), a well-known method for solving $Ax = b$ for positive definite $A$.
We recap this connection below. Let $x_k$ be the estimated solution at the $k$th CG iteration. If $x^*$ denotes the true solution to $Ax = b$, then the error $\varepsilon_k$ and residual $r_k$ are defined as
\[
\varepsilon_k := x^* - x_k, \qquad r_k := A\varepsilon_k = b - Ax_k. \tag{A.5}
\]
At the $k$th iteration, $x_k$ is chosen such that $r_k$ is orthogonal to the $k$th Krylov space, i.e., the linear space $\mathcal{K}_k$ spanned by $\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$. It can be shown (Meurant, 2006) that $r_k$ is a scaled Lanczos vector from the $k$th iteration of Lanczos started with $r_0$. Noting the relation between Lanczos and Gauss quadrature applied to approximate $r_0^\top A^{-1} r_0$, one obtains the following theorem that relates CG with GQL.
Theorem 12 (CG and GQL; (Meurant, 1999)). Let $\varepsilon_k$ be the error as in (A.5), and let $\|\varepsilon_k\|_A^2 := \varepsilon_k^\top A \varepsilon_k$. Then, it holds that
\[
\|\varepsilon_k\|_A^2 = \|r_0\|^2\big([J_N^{-1}]_{1,1} - [J_k^{-1}]_{1,1}\big),
\]
where $J_k$ is the Jacobi matrix at the $k$th Lanczos iteration starting with $r_0$.
Finally, the rate at which $\|\varepsilon_k\|_A^2$ shrinks has also been well studied, as noted below.
Theorem 13 (CG rate; see, e.g., (Shewchuk, 1994)). Let $\varepsilon_k$ be the error made by CG at iteration $k$ when started with $x_0$. Let $\kappa$ be the condition number of $A$, i.e., $\kappa = \lambda_N/\lambda_1$. Then, the error norm at iteration $k$ satisfies
\[
\|\varepsilon_k\|_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \|\varepsilon_0\|_A.
\]
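The rate is straightforward to observe empirically. A hedged sketch: textbook CG on a hypothetical SPD system, checking the bound of Theorem 13 at every step:

```python
import numpy as np

# Textbook conjugate gradient on a hypothetical SPD system, checking the
# sqrt-condition-number rate of Theorem 13 at every iteration.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((40, 40)))
lam = np.linspace(1.0, 25.0, 40)
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(40)
x_star = np.linalg.solve(A, b)
kappa = lam.max() / lam.min()
q = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)

def err_A(x):
    e = x_star - x
    return float(np.sqrt(e @ A @ e))

x = np.zeros(40)
r = b - A @ x
p = r.copy()
e0 = err_A(x)
ok = True
for k in range(1, 15):
    Ap = A @ p
    gamma = float(r @ r) / float(p @ Ap)
    x = x + gamma * p
    r_new = r - gamma * Ap
    p = r_new + (float(r_new @ r_new) / float(r @ r)) * p
    r = r_new
    ok = ok and err_A(x) <= 2.0 * q ** k * e0 + 1e-10
```

In practice CG often converges much faster than this worst-case bound, but the bound holds at every iteration.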
Due to these explicit relations between CG and Lanczos, as well as between Lanczos and Gauss quadrature, we readily obtain the following convergence rate for the relative error of Gauss quadrature.
Theorem 14 (Gauss quadrature rate). The $i$th iterate of Gauss quadrature satisfies the relative error bound
\[
\frac{g_N - g_i}{g_N} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i}.
\]
Proof. This is obtained by exploiting relations among CG, Lanczos and Gauss quadrature. Set $x_0 = 0$ and $b = u$. Then $\varepsilon_0 = x^*$ and $r_0 = u$. An application of Theorem 12 and Theorem 13 thus yields the bound
\[
\|\varepsilon_i\|_A^2 = \|u\|^2\big([J_N^{-1}]_{1,1} - [J_i^{-1}]_{1,1}\big) = g_N - g_i \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} \|\varepsilon_0\|_A^2 = 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} u^\top A^{-1} u = 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N,
\]
where the last equality draws on Lemma 15.
In other words, Theorem 14 shows that the iterates of Gauss quadrature converge linearly.
B. Proofs for Main Theoretical Results
We begin by proving an exactness property of Gauss and Gauss-Radau quadrature.
Lemma 15 (Exactness). With $A$ symmetric positive definite with simple eigenvalues, the iterates $g_N$, $g_N^{lr}$, and $g_N^{rr}$ are exact. Namely, after $N$ iterations they satisfy
\[
g_N = g_N^{lr} = g_N^{rr} = u^\top A^{-1} u.
\]
Proof. Observe that the Jacobi tridiagonal matrix can be computed via the Lanczos iteration, and Lanczos is essentially an iterative tridiagonalization of $A$. At the $i$th iteration we have $J_i = V_i^\top A V_i$, where $V_i \in \mathbb{R}^{N \times i}$ collects the first $i$ Lanczos vectors (i.e., a basis for the $i$th Krylov space). Thus, $J_N = V_N^\top A V_N$, where $V_N$ is an $N \times N$ orthonormal matrix, showing that $J_N$ has the same eigenvalues as $A$. As a result $\pi_N(\lambda) = \prod_{i=1}^{N}(\lambda - \lambda_i)$, and it follows that the remainder
\[
R_N[f] = \frac{f^{(2N)}(\xi)}{(2N)!}\, I[\pi_N^2] = 0,
\]
for some scalar $\xi \in [\lambda_{\min}, \lambda_{\max}]$, which shows that $g_N$ is exact for $u^\top A^{-1} u$. For left and right Gauss-Radau quadrature, we have $\beta_N = 0$, $\alpha_N^{lr} = \lambda_{\min}$, and $\alpha_N^{rr} = \lambda_{\max}$, while all other elements of the $(N+1)$th row or column of $J_N'$ are zeros. Thus, the eigenvalues of $J_N'$ are $\lambda_1, \ldots, \lambda_N, \tau_1$, and $\pi_N(\lambda)$ again equals $\prod_{i=1}^{N}(\lambda - \lambda_i)$. As a result, the remainder satisfies
\[
R_N[f] = \frac{f^{(2N)}(\xi)}{(2N)!}\, I[(\lambda - \tau_1)\,\pi_N^2] = 0,
\]
from which it follows that both $g_N^{rr}$ and $g_N^{lr}$ are exact.
The convergence rate in Theorem 13 and the final exactness of the iterations in Lemma 15 do not necessarily indicate that we are making progress at each iteration. However, by exploiting the relations to CG we can indeed conclude that Gauss quadrature makes progress in each iteration.
Theorem 16. The approximation $g_i$ generated by Gauss quadrature is monotonically nondecreasing, i.e.,
\[
g_i \le g_{i+1}, \quad \text{for } i < N.
\]
Proof. At each iteration $r_i$ is taken to be orthogonal to the $i$th Krylov space $\mathcal{K}_i = \mathrm{span}\{u, Au, \ldots, A^{i-1}u\}$. Let $\Pi_i$ be the projection onto the complement of $\mathcal{K}_i$. The residual then satisfies
\[
\|\varepsilon_{i+1}\|_A^2 = \varepsilon_{i+1}^\top A \varepsilon_{i+1} = r_{i+1}^\top A^{-1} r_{i+1} = (\Pi_{i+1} r_i)^\top A^{-1}\,\Pi_{i+1} r_i = r_i^\top \big(\Pi_{i+1}^\top A^{-1} \Pi_{i+1}\big) r_i \le r_i^\top A^{-1} r_i,
\]
where the last inequality follows from $\Pi_{i+1}^\top A^{-1} \Pi_{i+1} \preceq A^{-1}$. Thus $\|\varepsilon_i\|_A^2$ is monotonically nonincreasing, whereby $g_N - g_i \ge 0$ is monotonically nonincreasing and thus $g_i$ is monotonically nondecreasing.
Before we proceed to Gauss-Radau, let us recall a useful theorem and its corollary.
Theorem 17 (Lanczos Polynomial (Golub & Meurant, 2009)). Let $u_i$ be the vector generated by Algorithm 1 at the $i$th iteration; let $p_i$ be the Lanczos polynomial of degree $i$. Then we have
\[
u_i = p_i(A)\,u_0, \qquad \text{where } p_i(\lambda) = (-1)^i\,\frac{\det(J_i - \lambda I)}{\prod_{j=1}^{i} \beta_j}.
\]
From the expression of the Lanczos polynomial we have the following corollary specifying the sign of the polynomial at specific points.
Corollary 18. Assume $i < N$. If $i$ is odd, then $p_i(\lambda_{\min}) < 0$; for even $i$, $p_i(\lambda_{\min}) > 0$; while $p_i(\lambda_{\max}) > 0$ for any $i < N$.
Proof. Since $J_i = V_i^\top A V_i$ is a compression of $A$, its spectrum is bounded by $\lambda_{\min}$ and $\lambda_{\max}$ from left and right. Thus, $J_i - \lambda_{\min} I$ is positive semidefinite, and $J_i - \lambda_{\max} I$ is negative semidefinite. Taking the factor $(-1)^i$ into consideration, we obtain the desired conclusions.
We are ready to state our main result that compares (right) Gauss-Radau with Gauss quadrature.
Theorem 19 (Theorem 4 in the main text). Let $i < N$. Then, $g_i^{rr}$ gives better bounds than $g_i$ but worse bounds than $g_{i+1}$; more precisely,
\[
g_i \le g_i^{rr} \le g_{i+1}, \quad i < N. \tag{B.1}
\]
Proof. We prove inequality (B.1) using the recurrences satisfied by $g_i$ and $g_i^{rr}$ (see Alg. 1).
Upper bound: $g_i^{rr} \le g_{i+1}$. The iterative quadrature algorithm uses the recursive updates
\[
g_i^{rr} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2)}, \qquad
g_{i+1} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)}.
\]
It thus suffices to compare $\alpha_i^{rr}$ and $\alpha_{i+1}$. The three-term recursion for Lanczos polynomials shows that
\[
\beta_{i+1}\, p_{i+1}(\lambda_{\max}) = (\lambda_{\max} - \alpha_{i+1})\,p_i(\lambda_{\max}) - \beta_i\,p_{i-1}(\lambda_{\max}) > 0,
\]
\[
\beta_{i+1}\, p_{i+1}^*(\lambda_{\max}) = (\lambda_{\max} - \alpha_i^{rr})\,p_i(\lambda_{\max}) - \beta_i\,p_{i-1}(\lambda_{\max}) = 0,
\]
where $p_{i+1}$ is the original Lanczos polynomial, and $p_{i+1}^*$ is the modified polynomial that has $\lambda_{\max}$ as a root. Noting that $p_i(\lambda_{\max}) > 0$, we see that $\alpha_{i+1} \le \alpha_i^{rr}$. Moreover, from Theorem 16 we know that the $g_i$'s are monotonically nondecreasing, whereby $\delta_i(\alpha_{i+1}\delta_i - \beta_i^2) > 0$. It follows that
\[
0 < \delta_i(\alpha_{i+1}\delta_i - \beta_i^2) \le \delta_i(\alpha_i^{rr}\delta_i - \beta_i^2),
\]
and from this inequality it is clear that $g_i^{rr} \le g_{i+1}$.
Lower bound: $g_i \le g_i^{rr}$. Since $\beta_i^2 c_i^2 \ge 0$ and $\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2) \ge \delta_i(\alpha_{i+1}\delta_i - \beta_i^2) > 0$, we readily obtain
\[
g_i \le g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2)} = g_i^{rr}.
\]
Combining Theorem 19 with the convergence rate of the relative error for Gauss quadrature (Theorem 14) immediately yields the following convergence rate for right Gauss-Radau quadrature:
Theorem 20 (Relative error of right Gauss-Radau, Theorem 5 in the main text). For each $i$, the right Gauss-Radau iterates $g_i^{rr}$ satisfy
\[
\frac{g_N - g_i^{rr}}{g_N} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i}.
\]
This result shows that with the same number of iterations, right Gauss-Radau gives a superior approximation over Gauss quadrature, though they share the same relative error convergence rate.
Our second main result compares Gauss-Lobatto with (left) Gauss-Radau quadrature.
Theorem 21 (Theorem 6 in the main text). Let $i < N$. Then, $g_i^{lr}$ gives better upper bounds than $g_i^{lo}$ but worse than $g_{i+1}^{lo}$; more precisely,
\[
g_{i+1}^{lo} \le g_i^{lr} \le g_i^{lo}, \quad i < N.
\]
Proof. We prove these inequalities using the recurrences for $g_i^{lr}$ and $g_i^{lo}$ from Algorithm 5.
$g_i^{lr} \le g_i^{lo}$: From Algorithm 5 we observe that $\alpha_i^{lo} = \lambda_{\min} + (\beta_i^{lo})^2/\delta_i^{lr}$. Thus we can write $g_i^{lr}$ and $g_i^{lo}$ as
\[
g_i^{lr} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)} = g_i + \frac{\beta_i^2 c_i^2}{\lambda_{\min}\delta_i^2 + \beta_i^2(\delta_i^2/\delta_i^{lr} - \delta_i)},
\]
\[
g_i^{lo} = g_i + \frac{(\beta_i^{lo})^2 c_i^2}{\delta_i(\alpha_i^{lo}\delta_i - (\beta_i^{lo})^2)} = g_i + \frac{(\beta_i^{lo})^2 c_i^2}{\lambda_{\min}\delta_i^2 + (\beta_i^{lo})^2(\delta_i^2/\delta_i^{lr} - \delta_i)}.
\]
To compare these quantities, as before it is helpful to begin with the original three-term recursion for the Lanczos polynomial, namely
\[
\beta_{i+1}\, p_{i+1}(\lambda) = (\lambda - \alpha_{i+1})\,p_i(\lambda) - \beta_i\,p_{i-1}(\lambda).
\]
In the construction of Gauss-Lobatto, to make a new polynomial of order $i+1$ that has roots $\lambda_{\min}$ and $\lambda_{\max}$, we add $\gamma_1 p_i(\lambda)$ and $\gamma_2 p_{i-1}(\lambda)$ to the original polynomial to ensure
\[
\begin{cases}
\beta_{i+1}\, p_{i+1}(\lambda_{\min}) + \gamma_1 p_i(\lambda_{\min}) + \gamma_2 p_{i-1}(\lambda_{\min}) = 0,\\
\beta_{i+1}\, p_{i+1}(\lambda_{\max}) + \gamma_1 p_i(\lambda_{\max}) + \gamma_2 p_{i-1}(\lambda_{\max}) = 0.
\end{cases}
\]
Since $\beta_{i+1}$, $p_{i+1}(\lambda_{\max})$, $p_i(\lambda_{\max})$ and $p_{i-1}(\lambda_{\max})$ are all greater than $0$, we have $\gamma_1 p_i(\lambda_{\max}) + \gamma_2 p_{i-1}(\lambda_{\max}) < 0$. To determine the sign of the polynomials at $\lambda_{\min}$, consider the two cases:
1. Odd $i$. In this case $p_{i+1}(\lambda_{\min}) > 0$, $p_i(\lambda_{\min}) < 0$, and $p_{i-1}(\lambda_{\min}) > 0$;
2. Even $i$. In this case $p_{i+1}(\lambda_{\min}) < 0$, $p_i(\lambda_{\min}) > 0$, and $p_{i-1}(\lambda_{\min}) < 0$.
Thus, if $S = (\mathrm{sgn}(\gamma_1), \mathrm{sgn}(\gamma_2))$, where the signs take values in $\{0, \pm 1\}$, then $S \ne (1, 1)$, $S \ne (-1, 1)$ and $S \ne (0, 1)$. Hence, $\gamma_2 \le 0$ must hold, and thus $(\beta_i^{lo})^2 = (\beta_i - \gamma_2)^2 \ge \beta_i^2$, given that $\beta_i > 0$ for $i < N$.
Using $(\beta_i^{lo})^2 \ge \beta_i^2$ together with $\lambda_{\min}\, c_i^2\, \delta_i^2 \ge 0$, an application of the monotonicity of the univariate function $g(x) = \frac{ax}{b + cx}$ for $ab \ge 0$ to the recurrences defining $g_i^{lr}$ and $g_i^{lo}$ yields the desired inequality $g_i^{lr} \le g_i^{lo}$.
$g_{i+1}^{lo} \le g_i^{lr}$: From the recursion formulas we have
\[
g_i^{lr} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)}, \qquad
g_{i+1}^{lo} = g_{i+1} + \frac{(\beta_{i+1}^{lo})^2 c_{i+1}^2}{\delta_{i+1}(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2)}.
\]
Establishing $g_i^{lr} \ge g_{i+1}^{lo}$ thus amounts to showing that (noting the relations among $g_i$, $g_i^{lr}$ and $g_i^{lo}$):
\begin{align*}
& \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)} - \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \ge \frac{(\beta_{i+1}^{lo})^2 c_{i+1}^2}{\delta_{i+1}(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2)}\\
\iff{}& \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)} - \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \ge \frac{(\beta_{i+1}^{lo})^2 c_i^2 \beta_i^2}{\delta_i^2\,\delta_{i+1}(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2)}\\
\iff{}& \frac{1}{\alpha_i^{lr}\delta_i - \beta_i^2} - \frac{1}{\alpha_{i+1}\delta_i - \beta_i^2} \ge \frac{(\beta_{i+1}^{lo})^2}{\delta_i\,\delta_{i+1}(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2)}\\
\iff{}& \frac{1}{(\alpha_{i+1} - \delta_{i+1}^{lr}) - \beta_i^2/\delta_i} - \frac{1}{\alpha_{i+1} - \beta_i^2/\delta_i} \ge \frac{1}{\delta_{i+1}\big(\alpha_{i+1}^{lo}\delta_{i+1}/(\beta_{i+1}^{lo})^2 - 1\big)} \quad \text{(Lemma 23)}\\
\iff{}& \frac{1}{\delta_{i+1} - \delta_{i+1}^{lr}} - \frac{1}{\delta_{i+1}} \ge \frac{1}{\delta_{i+1}\Big(\frac{\lambda_{\min}\delta_{i+1}}{(\beta_{i+1}^{lo})^2} + \frac{\delta_{i+1}}{\delta_{i+1}^{lr}} - 1\Big)}\\
\iff{}& \frac{\lambda_{\min}\delta_{i+1}}{(\beta_{i+1}^{lo})^2} + \frac{\delta_{i+1}}{\delta_{i+1}^{lr}} - 1 \ge \frac{\delta_{i+1}}{\delta_{i+1}^{lr}} - 1\\
\iff{}& \frac{\lambda_{\min}\delta_{i+1}}{(\beta_{i+1}^{lo})^2} \ge 0,
\end{align*}
where the last inequality is obviously true; hence the proof is complete.
In summary, we have the following corollary for all four quadrature rules:
Corollary 22 (Monotonicity of Lower and Upper Bounds, Corr. 7 in the main text). As the iteration proceeds, $g_i$ and $g_i^{rr}$ give increasingly better lower bounds, and $g_i^{lr}$ and $g_i^{lo}$ give increasingly better upper bounds; namely,
\[
g_i \le g_{i+1}; \quad g_i^{rr} \le g_{i+1}^{rr}; \quad g_i^{lr} \ge g_{i+1}^{lr}; \quad g_i^{lo} \ge g_{i+1}^{lo}.
\]
Proof. Directly drawn from Theorem 16, Theorem 19 and Theorem 21.
Before proceeding further to our analysis of the convergence rates of left Gauss-Radau and Gauss-Lobatto, we note two technical results that we will need.
Lemma 23. Let $\alpha_{i+1}$ and $\alpha_i^{lr}$ be as in Alg. 1. The difference $\Delta_{i+1} := \alpha_{i+1} - \alpha_i^{lr}$ satisfies $\Delta_{i+1} = \delta_{i+1}^{lr}$.
Proof. From the Lanczos polynomials in the definition of left Gauss-Radau quadrature we have
\begin{align*}
\beta_{i+1}\, p_{i+1}^*(\lambda_{\min}) &= \big(\lambda_{\min} - \alpha_i^{lr}\big)\,p_i(\lambda_{\min}) - \beta_i\,p_{i-1}(\lambda_{\min})\\
&= \big(\lambda_{\min} - (\alpha_{i+1} - \Delta_{i+1})\big)\,p_i(\lambda_{\min}) - \beta_i\,p_{i-1}(\lambda_{\min})\\
&= \beta_{i+1}\, p_{i+1}(\lambda_{\min}) + \Delta_{i+1}\,p_i(\lambda_{\min}) = 0.
\end{align*}
Rearranging this equation we write $\Delta_{i+1} = -\beta_{i+1}\,\frac{p_{i+1}(\lambda_{\min})}{p_i(\lambda_{\min})}$, which can be further rewritten as
\[
\Delta_{i+1} \overset{\text{Theorem 17}}{=} -\beta_{i+1}\,\frac{(-1)^{i+1}\det(J_{i+1} - \lambda_{\min} I)\big/\prod_{j=1}^{i+1}\beta_j}{(-1)^{i}\det(J_{i} - \lambda_{\min} I)\big/\prod_{j=1}^{i}\beta_j} = \frac{\det(J_{i+1} - \lambda_{\min} I)}{\det(J_i - \lambda_{\min} I)} = \delta_{i+1}^{lr}.
\]
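The determinant-ratio identity in the last step is easy to check numerically: the recursion defining $\delta_i^{lr}$ in Algorithm 5 reproduces ratios of shifted tridiagonal determinants. The coefficients and shift below are hypothetical:

```python
import numpy as np

a = [2.0, 3.0, 2.5, 4.0]   # hypothetical alpha_i
b = [0.8, 1.1, 0.6]        # hypothetical beta_i
s = 0.5                    # plays the role of lambda_min

def J(i):
    """Leading i-by-i tridiagonal block."""
    return np.diag(a[:i]) + np.diag(b[:i - 1], 1) + np.diag(b[:i - 1], -1)

d = a[0] - s               # delta_1^{lr}
ok = True
for i in range(2, 5):
    d = (a[i - 1] - s) - b[i - 2] ** 2 / d   # recursion for delta_i^{lr}
    ratio = (np.linalg.det(J(i) - s * np.eye(i))
             / np.linalg.det(J(i - 1) - s * np.eye(i - 1)))
    ok = ok and np.isclose(d, ratio)
```

This works because determinants of tridiagonal matrices satisfy the same three-term recurrence as the Lanczos polynomials.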
Remark 24. Lemma 23 has an implication beyond its utility for the subsequent proofs: it provides a new way of calculating $\alpha_{i+1}$ given the quantities $\delta_{i+1}^{lr}$ and $\alpha_i^{lr}$; this saves calculation in Algorithm 5.
The following lemma relates $\delta_i$ to $\delta_i^{lr}$, which will prove useful in the subsequent analysis.
Lemma 25. Let $\delta_i^{lr}$ and $\delta_i$ be computed in the $i$th iteration of Algorithm 1. Then, we have the following:
\[
\delta_i^{lr} < \delta_i, \tag{B.2}
\]
\[
\frac{\delta_i^{lr}}{\delta_i} \le 1 - \frac{\lambda_{\min}}{\lambda_N}. \tag{B.3}
\]
Proof. We prove (B.2) by induction. Since $\lambda_{\min} > 0$, $\delta_1 = \alpha_1 > \lambda_{\min}$ and $\delta_1^{lr} = \alpha_1 - \lambda_{\min}$, we know that $\delta_1^{lr} < \delta_1$. Assume that $\delta_i^{lr} < \delta_i$ holds for all $i \le k$, and consider the $(k+1)$th iteration:
\[
\delta_{k+1}^{lr} = \alpha_{k+1} - \lambda_{\min} - \frac{\beta_k^2}{\delta_k^{lr}} < \alpha_{k+1} - \frac{\beta_k^2}{\delta_k} = \delta_{k+1}.
\]
To prove (B.3), simply observe the following:
\[
\frac{\delta_i^{lr}}{\delta_i} = \frac{\alpha_i - \lambda_{\min} - \beta_{i-1}^2/\delta_{i-1}^{lr}}{\alpha_i - \beta_{i-1}^2/\delta_{i-1}} \overset{\text{(B.2)}}{\le} \frac{\alpha_i - \lambda_{\min}}{\alpha_i} \le 1 - \frac{\lambda_{\min}}{\lambda_N}.
\]
With the aforementioned lemmas we can show how fast the difference between $g_i^{lr}$ and $g_i$ decays. Note that $g_i^{lr}$ gives an upper bound on the objective while $g_i$ gives a lower bound.
Lemma 26. The difference between $g_i^{lr}$ and $g_i$ decreases linearly. More specifically, we have
\[
g_i^{lr} - g_i \le 2\kappa_+\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N,
\]
where $\kappa_+ = \lambda_N/\lambda_{\min}$ and $\kappa$ is the condition number of $A$, i.e., $\kappa = \lambda_N/\lambda_1$.
Proof. We rewrite the difference $g_i^{lr} - g_i$ as follows:
\begin{align*}
g_i^{lr} - g_i &= \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)}
= \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)}\\
&= \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{\alpha_{i+1} - \beta_i^2/\delta_i}{\alpha_i^{lr} - \beta_i^2/\delta_i}
= \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{1}{1 - \Delta_{i+1}/\delta_{i+1}},
\end{align*}
where $\Delta_{i+1} = \alpha_{i+1} - \alpha_i^{lr}$. Next, recall that $\frac{g_N - g_i}{g_N} \le 2\big(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\big)^{i}$. Since $g_i$ lower bounds $g_N$, we have
\[
\left(1 - 2\Big(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\Big)^{i}\right) g_N \le g_i \le g_N, \qquad
\left(1 - 2\Big(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\Big)^{i+1}\right) g_N \le g_{i+1} \le g_N.
\]
Thus, we can conclude that
\[
\frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} = g_{i+1} - g_i \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N.
\]
Now we focus on the term $\big(1 - \Delta_{i+1}/\delta_{i+1}\big)^{-1}$. Using Lemma 23 we know that $\Delta_{i+1} = \delta_{i+1}^{lr}$. Hence,
\[
1 - \Delta_{i+1}/\delta_{i+1} = 1 - \delta_{i+1}^{lr}/\delta_{i+1} \ge 1 - (1 - \lambda_{\min}/\lambda_N) = \lambda_{\min}/\lambda_N =: \frac{1}{\kappa_+}.
\]
Finally we have
\[
g_i^{lr} - g_i = \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{1}{1 - \Delta_{i+1}/\delta_{i+1}} \le 2\kappa_+\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N.
\]
Theorem 27 (Relative error of left Gauss-Radau, Theorem 8 in the main text). For left Gauss-Radau quadrature, where the preassigned node is $\lambda_{\min}$, we have the following bound on the relative error:
\[
\frac{g_i^{lr} - g_N}{g_N} \le 2\kappa_+\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i},
\]
where $\kappa_+ := \lambda_N/\lambda_{\min}$ and $i < N$.
Proof. Write $g_i^{lr} = g_i + (g_i^{lr} - g_i)$. Since $g_i \le g_N$, using Lemma 26 to bound the second term we obtain
\[
g_i^{lr} \le g_N + 2\kappa_+\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N,
\]
from which the claim follows upon rearrangement.
Due to the relations between left Gauss-Radau and Gauss-Lobatto, we have the following corollary:
Corollary 28 (Relative error of Gauss-Lobatto, Corr. 9 in the main text). For Gauss-Lobatto quadrature, we have the following bound on the relative error:
\[
\frac{g_i^{lo} - g_N}{g_N} \le 2\kappa_+\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i-1}, \tag{B.4}
\]
where $\kappa_+ := \lambda_N/\lambda_{\min}$ and $i < N$.
C. Generalization: Symmetric Matrices
In this section we consider the case where $u$ lies in the column space of several top eigenvectors of $A$, and discuss how the aforementioned theorems vary. In particular, note that the previous analysis assumes that $A$ is positive definite. With our analysis in this section we relax this assumption to the more general case where $A$ is symmetric with simple eigenvalues, though we require $u$ to lie in the space spanned by eigenvectors of $A$ corresponding to positive eigenvalues.
We consider the case where $A$ is symmetric and has the eigendecomposition $A = Q\Lambda Q^\top = \sum_{i=1}^{N} \lambda_i q_i q_i^\top$, where the $\lambda_i$'s are the eigenvalues of $A$ increasing with $i$ and the $q_i$'s are the corresponding eigenvectors. Assume that $u$ lies in the column space spanned by the top $k$ eigenvectors of $A$, where all these $k$ eigenvectors correspond to positive eigenvalues; namely, we have $u \in \mathrm{Span}\{q_i\}_{i=N-k+1}^{N}$ and $0 < \lambda_{N-k+1}$.
Since we only assume that $A$ is symmetric, it is possible that $A$ is singular, and thus we consider the value of $u^\top A^\dagger u$, where $A^\dagger$ is the pseudo-inverse of $A$. Due to the constraints on $u$ we have
\[
u^\top A^\dagger u = u^\top Q \Lambda^\dagger Q^\top u = u^\top Q_k \Lambda_k^\dagger Q_k^\top u = u^\top B^\dagger u,
\]
where $B = \sum_{i=N-k+1}^{N} \lambda_i q_i q_i^\top$. Namely, if $u$ lies in the column space spanned by the top $k$ eigenvectors of $A$, then it is equivalent to substitute $A$ with $B$, which is the truncation of $A$ to its top $k$ eigenvalues and corresponding eigenvectors.
Another key observation is that, given that $u$ lies only in the space spanned by $\{q_i\}_{i=N-k+1}^{N}$, the Krylov space starting at $u$ becomes
\[
\mathrm{Span}\{u, Au, A^2u, \ldots\} = \mathrm{Span}\{u, Bu, B^2u, \ldots, B^{k-1}u\}. \tag{C.1}
\]
This indicates that the Lanczos iteration starting with matrix $A$ and vector $u$ will finish constructing the corresponding Krylov space after the $k$th iteration. Thus, under this condition, Algorithm 1 will run at most $k$ iterations and then stop. At that point, the eigenvalues of $J_k$ are exactly the eigenvalues of $B$, i.e., exactly $\{\lambda_i\}_{i=N-k+1}^{N}$ of $A$. Using a similar proof as in Lemma 15, we can obtain the following generalized exactness result.
Corollary 29 (Generalized Exactness). $g_k$, $g_k^{rr}$ and $g_k^{lr}$ are exact for $u^\top A^\dagger u = u^\top B^\dagger u$; namely,
\[
g_k = g_k^{rr} = g_k^{lr} = u^\top A^\dagger u = u^\top B^\dagger u.
\]
The monotonicity and the relations between the bounds given by the various Gauss-type quadratures remain the same as in the original case in Section 4, but the original convergence rate no longer applies, because now we may have $\lambda_{\min}(B) = 0$, making $\kappa$ undefined. This breakdown of the convergence rate results from the breakdown of convergence of the corresponding conjugate gradient algorithm for solving $Ax = u$. However, by inspecting the proof of, e.g., (Shewchuk, 1994), and noting that $\lambda_1(B) = \ldots = \lambda_{N-k}(B) = 0$, a slight modification of the proof actually yields the bound
\[
\|\varepsilon_i\|_A^2 \le \min_{P_i}\; \max_{\lambda \in \{\lambda_i\}_{i=N-k+1}^{N}} [P_i(\lambda)]^2\, \|\varepsilon_0\|_A^2,
\]
where $P_i$ is a polynomial of order $i$. By using properties of Chebyshev polynomials and following the original proof (e.g., (Golub & Meurant, 2009) or (Shewchuk, 1994)) we obtain the following lemma for conjugate gradient.
Lemma 30. Let $\varepsilon_k$ be as before (for conjugate gradient). Then,
\[
\|\varepsilon_k\|_A \le 2\left(\frac{\sqrt{\kappa'}-1}{\sqrt{\kappa'}+1}\right)^{k} \|\varepsilon_0\|_A, \qquad \text{where } \kappa' := \lambda_N/\lambda_{N-k+1}.
\]
Following this new convergence rate and the connections between conjugate gradient, Lanczos iterations and Gauss quadrature mentioned in Section 4, we have the following convergence bounds.
Corollary 31 (Convergence Rate for the Special Case). Under the above assumptions on $A$ and $u$, due to the connection between Gauss quadrature, the Lanczos algorithm and conjugate gradient, the relative convergence rates of $g_i$, $g_i^{rr}$, $g_i^{lr}$ and $g_i^{lo}$ are given by
\[
\frac{g_k - g_i}{g_k} \le 2\left(\frac{\sqrt{\kappa'}-1}{\sqrt{\kappa'}+1}\right)^{i}, \qquad
\frac{g_k - g_i^{rr}}{g_k} \le 2\left(\frac{\sqrt{\kappa'}-1}{\sqrt{\kappa'}+1}\right)^{i},
\]
\[
\frac{g_i^{lr} - g_k}{g_k} \le 2\kappa_m'\left(\frac{\sqrt{\kappa'}-1}{\sqrt{\kappa'}+1}\right)^{i}, \qquad
\frac{g_i^{lo} - g_k}{g_k} \le 2\kappa_m'\left(\frac{\sqrt{\kappa'}-1}{\sqrt{\kappa'}+1}\right)^{i},
\]
where $\kappa_m' = \lambda_N/\lambda_{\min}'$ and $0 < \lambda_{\min}' < \lambda_{N-k+1}$ is a lower bound on the nonzero eigenvalues of $B$.
D. Accelerating MCMC for k-DPP
We present details of a retrospective Markov chain Monte Carlo (MCMC) sampler in Algorithm 6 and Algorithm 7 for efficiently drawing samples from a k-DPP, accelerated using our results on Gauss-type quadratures.
Algorithm 6 Gauss-kDPP($L$, $k$)
Input: $L$ the kernel matrix we want to sample the DPP from, $k$ the size of the subset, and $\mathcal{Y} = [N]$ the ground set
Output: $Y$ sampled from the exact k-DPP($L$) where $|Y| = k$
Randomly initialize $Y \subseteq \mathcal{Y}$ where $|Y| = k$
while not mixed do
  Pick $v \in Y$ and $u \in \mathcal{Y}\setminus Y$ uniformly at random
  Pick $p \in (0, 1)$ uniformly at random
  $Y' = Y\setminus\{v\}$
  Get lower and upper bounds $\lambda_{\min}$, $\lambda_{\max}$ of the spectrum of $L_{Y'}$
  if kDPP-JudgeGauss($pL_{v,v} - L_{u,u}$, $p$, $L_{Y',u}$, $L_{Y',v}$, $L_{Y'}$, $\lambda_{\min}$, $\lambda_{\max}$) = True then
    $Y = Y' \cup \{u\}$
  end if
end while
Algorithm 7 kDPP-JudgeGauss($t$, $p$, $u$, $v$, $A$, $\lambda_{\min}$, $\lambda_{\max}$)
Input: $t$ the target value, $p$ the scaling factor, $u$, $v$ and $A$ the corresponding vectors and matrix, $\lambda_{\min}$ and $\lambda_{\max}$ lower and upper bounds for the spectrum of $A$
Output: Return True if $t < p\,(v^\top A^{-1} v) - u^\top A^{-1} u$, False otherwise
$u_{-1} = 0$, $u_0 = u/\|u\|$, $i_u = 1$, $\beta_0^u = 0$, $d_u = 1$
$v_{-1} = 0$, $v_0 = v/\|v\|$, $i_v = 1$, $\beta_0^v = 0$, $d_v = 1$
while True do
  if $d_u > p\,d_v$ then
    Run one more iteration of Gauss-Radau on $u^\top A^{-1} u$ to get tighter $(g^{lr})_u$ and $(g^{rr})_u$
    $d_u = (g^{lr})_u - (g^{rr})_u$
  else
    Run one more iteration of Gauss-Radau on $v^\top A^{-1} v$ to get tighter $(g^{lr})_v$ and $(g^{rr})_v$
    $d_v = (g^{lr})_v - (g^{rr})_v$
  end if
  if $t < p\,\|v\|^2 (g^{rr})_v - \|u\|^2 (g^{lr})_u$ then
    Return True
  else if $t \ge p\,\|v\|^2 (g^{lr})_v - \|u\|^2 (g^{rr})_u$ then
    Return False
  end if
end while
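The control flow of the judge, divorced from quadrature details, is a lazy interval comparison. In the sketch below, `tighten_u`/`tighten_v` are hypothetical stand-ins for the Gauss-Radau bound updates: callables assumed to return successively tighter (lower, upper) brackets for the two quadratic forms. The toy `bracket` oracle and the numbers in the usage are likewise hypothetical:

```python
def judge_gauss(t, p, tighten_u, tighten_v):
    """Sketch of the retrospective comparison in kDPP-JudgeGauss.  Decides
    t < p * q_v - q_u, where tighten_u()/tighten_v() return successively
    tighter (lower, upper) brackets for q_u and q_v (in GQL these would be
    the right/left Gauss-Radau bounds on the two quadratic forms)."""
    lu, uu = tighten_u()
    lv, uv = tighten_v()
    while True:
        if t < p * lv - uu:       # certified even with worst-case bounds
            return True
        if t >= p * uv - lu:      # refuted even with best-case bounds
            return False
        # tighten whichever interval currently dominates the uncertainty
        if (uu - lu) > p * (uv - lv):
            lu, uu = tighten_u()
        else:
            lv, uv = tighten_v()

def bracket(value, width=8.0):
    """Toy bound oracle: halves a symmetric interval around `value`."""
    state = [width]
    def tighten():
        state[0] /= 2.0
        return value - state[0], value + state[0]
    return tighten

# Hypothetical exact quantities q_u = 3, q_v = 7, p = 0.5: threshold is 0.5.
res_true = judge_gauss(0.2, 0.5, bracket(3.0), bracket(7.0))
res_false = judge_gauss(0.9, 0.5, bracket(3.0), bracket(7.0))
```

As long as the brackets shrink to zero width, the loop terminates with the exact decision for any target not exactly on the threshold, which is the point of the retrospective construction.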
E. Accelerating Stochastic Double Greedy
We present details of retrospective stochastic double greedy in Algorithm 8 and Algorithm 9, which efficiently selects a subset $Y \subseteq \mathcal{Y}$ that approximately maximizes $\log\det(L_Y)$.
Algorithm 8 Gauss-DG($L$)
Input: $L$ the kernel matrix and $\mathcal{Y} = [N]$ the ground set
Output: $X \subseteq \mathcal{Y}$ that approximately maximizes $\log\det(L_X)$
$X_0 = \emptyset$, $Y_0 = \mathcal{Y}$
for $i = 1, 2, \ldots, N$ do
  $Y_i' = Y_{i-1}\setminus\{i\}$
  Sample $p \in (0, 1)$ uniformly at random
  Get lower and upper bounds $\lambda_{\min}^-$, $\lambda_{\max}^-$, $\lambda_{\min}^+$, $\lambda_{\max}^+$ of the spectra of $L_{X_{i-1}}$ and $L_{Y_i'}$, respectively
  if DG-JudgeGauss($L_{X_{i-1}}$, $L_{Y_i'}$, $L_{X_{i-1},i}$, $L_{Y_i',i}$, $L_{i,i}$, $p$, $\lambda_{\min}^-$, $\lambda_{\max}^-$, $\lambda_{\min}^+$, $\lambda_{\max}^+$) = True then
    $X_i = X_{i-1} \cup \{i\}$
  else
    $Y_i = Y_i'$
  end if
end for
Algorithm 9 DG-JudgeGauss($A$, $B$, $u$, $v$, $t$, $p$, $\lambda_{\min}^A$, $\lambda_{\max}^A$, $\lambda_{\min}^B$, $\lambda_{\max}^B$)
Input: $t$ the target value, $p$ the scaling factor, $u$, $v$, $A$ and $B$ the corresponding vectors and matrices, $\lambda_{\min}^A$, $\lambda_{\max}^A$, $\lambda_{\min}^B$, $\lambda_{\max}^B$ lower and upper bounds for the spectra of $A$ and $B$
Output: Return True if $p\,|\log(t - u^\top A^{-1} u)|_+ \le (1-p)\,|\log(t - v^\top B^{-1} v)|_+$, False otherwise
$d_u = 1$, $d_v = 1$
while True do
  if $p\,d_u > (1-p)\,d_v$ then
    Run one more iteration of Gauss-Radau on $u^\top A^{-1} u$ to get tighter lower and upper bounds $l_u$, $u_u$ for $|\log(t - u^\top A^{-1} u)|_+$
    $d_u = u_u - l_u$
  else
    Run one more iteration of Gauss-Radau on $v^\top B^{-1} v$ to get tighter lower and upper bounds $l_v$, $u_v$ for $|\log(t - v^\top B^{-1} v)|_+$
    $d_v = u_v - l_v$
  end if
  if $p\,u_u \le (1-p)\,l_v$ then
    Return True
  else if $p\,l_u > (1-p)\,u_v$ then
    Return False
  end if
end while