TRANSCRIPT

Spectral divide-and-conquer algorithms for matrix eigenvalue problems and the SVD

Yuji Nakatsukasa
Department of Mathematical Informatics, University of Tokyo

Collaborators: Roland Freund, Nicholas J. Higham and Francoise Tisseur
Symmetric eigendecomposition and SVD

- Symmetric eigenvalue decomposition: for A = A^T ∈ R^{n×n},

      A = V diag(λ1, λ2, . . . , λn) V^T (≡ VΛV^T),

  where V^T V = I_n. λ1 ≥ · · · ≥ λn are the eigenvalues of A.

- SVD: for any A ∈ R^{m×n} (m ≥ n),

      A = U diag(σ1, σ2, . . . , σn) V^T (≡ UΣV^T),

  where U^T U = I_m, V^T V = I_n. σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0 are the singular values of A.
Sidetrack: Randomized SVD in a nutshell  –[Halko, Martinsson, Tropp (SIREV 2011)]

A: m × n, rank k. Algorithm randsvd for computing A ≈ UΣV*:
1. generate Gaussian random matrix Ω, n × k
2. form Y = AΩ
3. find QR factorization Y = QR
4. form B = Q*A
5. compute SVD B = U_B ΣV*, then U = Q U_B

- arithmetic cost O(mnk + (m + n)k²); can be made O(mn log(k) + (m + n)k²) by FFT-type and row-extraction techniques
- oversampling k := k + δ with δ small (≈ 5) is sufficient in practice
- power-like methods are effective when σi(A) decays slowly
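The five steps above can be sketched directly in NumPy. This is a minimal illustration of the randsvd recipe, not the authors' code; the seed and the oversampling parameter p are arbitrary choices.

```python
import numpy as np

def randsvd(A, k, p=5):
    """Randomized SVD sketch (Halko-Martinsson-Tropp): A ~ U @ diag(s) @ Vt,
    with target rank k and a small oversampling parameter p."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    Omega = rng.standard_normal((n, k + p))   # step 1: Gaussian test matrix
    Y = A @ Omega                             # step 2: sample the range of A
    Q, _ = np.linalg.qr(Y)                    # step 3: orthonormal basis for range(Y)
    B = Q.T @ A                               # step 4: small (k+p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # step 5: small SVD
    U = Q @ Ub                                # lift back to m dimensions
    return U[:, :k], s[:k], Vt[:k]
```

For a matrix of exact rank k the reconstruction is accurate to roundoff; for general matrices the error is governed by the decay of σ_{k+1}.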
Background: Communication-minimizing algorithms

Total cost = Arithmetic + Communication

Trend: the gap between arithmetic and communication speeds is growing exponentially with time.

Algorithms that attain communication lower bounds are available for  –[Ballard, Demmel, Holtz, Schwartz (2011)]
- Matrix multiplication
- Cholesky factorization
- QR decomposition (without pivoting)

Talk focus: communication-minimizing algorithms for the symmetric eigendecomposition and the singular value decomposition (SVD)
Hermitian eigendecomposition: Standard algorithm

1. Reduce matrix to tridiagonal form ⇒ expensive in communication cost

   [figure: a full symmetric 5×5 matrix A is reduced column by column via two-sided Householder transformations H_L, H_R until only the tridiagonal part remains, A → · · · → T]

   Accumulate orthogonal factors: A = V_A T V_A^H where V_A = (∏ H_L)^H
2. Compute eigendecomposition T = V_B Λ V_B^H (QR, divide-and-conquer, MRRR, ...)
3. Eigendecomposition: A = (V_A V_B) Λ (V_B^H V_A^H) = VΛV^H

Goal: design QR-based Hermitian eigendecomposition algorithm
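The three-step standard route can be sketched with SciPy building blocks: for a symmetric matrix, Hessenberg reduction yields the tridiagonal form. This is the textbook algorithm only, not the communication-avoiding variant the talk proposes.

```python
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal

def standard_symeig(A):
    """Textbook three-step symmetric eigensolver: tridiagonalize,
    solve the tridiagonal problem, multiply orthogonal factors."""
    T, VA = hessenberg(A, calc_q=True)        # step 1: A = VA @ T @ VA.T, T tridiagonal
    lam, VB = eigh_tridiagonal(np.diag(T), np.diag(T, 1))  # step 2: T = VB @ diag(lam) @ VB.T
    return lam, VA @ VB                       # step 3: A = V @ diag(lam) @ V.T
```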
Standard SVD algorithm

1. Reduce matrix to bidiagonal form ⇒ expensive in communication cost  –[Golub and Kahan (1965)]

   [figure: a full 4×4 matrix A is reduced by alternating left and right Householder transformations H_L, H_R until only the bidiagonal part remains, A → · · · → B]

   Accumulate orthogonal factors: A = U_A B V_A^H where U_A = (∏ H_L)^H, V_A = ∏ H_R
2. Compute SVD B = U_B Σ V_B^H (divide-and-conquer, QR, dqds + twisted factorization)
3. SVD: A = (U_A U_B) Σ (V_B^H V_A^H) = UΣV^H

Goal: design QR-based SVD algorithm
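Step 1, the Golub–Kahan reduction, can be written out explicitly with alternating Householder reflectors; the following is a compact teaching sketch (dense accumulation of U and V, no blocking), not a production implementation.

```python
import numpy as np

def householder(x):
    """Return v, beta such that (I - beta v v^T) x = -+||x|| e1."""
    v = x.astype(float).copy()
    sigma = np.linalg.norm(x)
    if sigma == 0:
        return v, 0.0
    v[0] += np.copysign(sigma, x[0])
    return v, 2.0 / (v @ v)

def bidiagonalize(A):
    """Golub-Kahan bidiagonalization: A = U @ B @ V.T, B upper bidiagonal (m >= n)."""
    A = A.astype(float).copy()
    m, n = A.shape
    U, V = np.eye(m), np.eye(n)
    for k in range(n):
        # left reflector H_L: zero out A[k+1:, k]
        v, beta = householder(A[k:, k])
        A[k:, k:] -= beta * np.outer(v, v @ A[k:, k:])
        U[:, k:] -= beta * np.outer(U[:, k:] @ v, v)
        if k < n - 2:
            # right reflector H_R: zero out A[k, k+2:]
            v, beta = householder(A[k, k+1:])
            A[k:, k+1:] -= beta * np.outer(A[k:, k+1:] @ v, v)
            V[:, k+1:] -= beta * np.outer(V[:, k+1:] @ v, v)
    return U, A, V
```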
Spectral divide-and-conquer algorithm for symeig

[figure: a full 8×8 symmetric matrix A is split recursively: V1* A V1 is block diagonal with two 4×4 blocks, diag(V21, V22)* V1* A V1 diag(V21, V22) has four 2×2 blocks, and one more level of splitting with diag(V31, V32, V33, V34) gives diag(λi)]

– not divide-and-conquer for the symmetric tridiagonal eigenproblem!
Executing spectral divide-and-conquer

To obtain

    [V+ V−]* A [V+ V−] = diag(A+, A−)   (block diagonal),

- V+, V− must each span an invariant subspace of A corresponding to a subset of eigenvalues.
- A natural idea: let V+ and V− span the positive and negative eigenspaces.
  → [V+ V−] diag(I, −I) [V+ V−]^T = sign(A), the matrix sign function of A. Since A = A^T, sign(A) is equal to U, the unitary polar factor in the polar decomposition.
Polar decomposition

For any A ∈ R^{m×n} (m ≥ n),

    A = UH,

where U^T U = I_n and H is symmetric positive semidefinite.

- U is called the unitary polar factor of A
- Connection with SVD: A = U_* Σ V_*^T = (U_* V_*^T) · (V_* Σ V_*^T) = UH
- Can be used as the first step for computing the SVD  –[Higham and Papadimitriou (1993)]
Polar/eigen decompositions of symmetric matrices

The polar and eigendecompositions of symmetric A,

    A = UH = [V+ V−] diag(Λ+, Λ−) [V+ V−]^T

where diag(Λ−) < 0, are related by U = [V+ V−] diag(I, −I) [V+ V−]^T:

    UH = [V+ V−] diag(I, −I) [V+ V−]^T · [V+ V−] diag(Λ+, |Λ−|) [V+ V−]^T

- U = f(A), where f(λ) maps λ > 0 to 1 and λ < 0 to −1
Algorithm for symmetric eigendecomposition  –[with N. J. Higham]

Algorithm symeig
1. compute polar decomposition A − σI = UH
2. compute unitary [V+ V−] s.t. (U + I)/2 = V+ V+^T
3. compute [V+ V−]^T A [V+ V−] = [A+ E^T; E A−]
4. repeat steps 1–3 with A := A+, A−

- ideal choice of σ: median of eig(A) ⇒ dim(A+) ≃ dim(A−)
- dominant cost is in step 1, computing the polar decomposition
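One splitting step can be sketched as follows. SciPy's general-purpose polar() stands in for the QDWH polar routine of the talk, and column-pivoted QR of the projector (U + I)/2 supplies the orthonormal basis [V+ V−]; these are illustrative substitutions, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import polar, qr

def symeig_split(A, sigma):
    """One spectral divide-and-conquer step for symmetric A: block-diagonalize
    by the invariant subspaces for eigenvalues above/below the shift sigma."""
    n = A.shape[0]
    U, _ = polar(A - sigma * np.eye(n))   # U = sign(A - sigma*I) since A = A^T
    P = (U + np.eye(n)) / 2               # orthogonal projector onto the "+" eigenspace
    k = int(round(np.trace(P)))           # its dimension
    Q, _, _ = qr(P, pivoting=True)        # first k columns of Q span range(P)
    B = Q.T @ A @ Q                       # [A+ E^T; E A-], with E ~ 0
    return Q, B, k
```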
Backward stability  –[with N. J. Higham]

Theorem
If subspace iteration is successful and the polar decomposition A ≃ UH is computed backward stably, i.e.,

    A = UH + ε1‖A‖2,   U^T U − I = ε2,

then A − VΛV^T = ε‖A‖2, that is, symeig is backward stable.
QR-based algorithm for polar decomposition  –[N., Bai, Gygi (2010)]

DWH (Dynamically Weighted Halley iteration):

    X_{k+1} = X_k (a_k I + b_k X_k^T X_k)(I + c_k X_k^T X_k)^{−1}
            = (b_k/c_k) X_k + (a_k − b_k/c_k) X_k (I + c_k X_k^T X_k)^{−1},   X0 = A/‖A‖2

– lim_{k→∞} X_k = U in A = UH
– a_k, b_k, c_k chosen dynamically to optimize convergence

- Lemma: Q1 Q2^T = X(I + X^T X)^{−1}, where [X; I] = [Q1; Q2] R

- QR-based DWH (QDWH):

      [√c_k X_k; I] = [Q1; Q2] R,
      X_{k+1} = (b_k/c_k) X_k + (1/√c_k)(a_k − b_k/c_k) Q1 Q2^T
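A compact sketch of the dynamically weighted iteration, using the parameter formulas from the QDWH paper. For brevity it performs a linear solve per step; the QR-based form above replaces the solve by a QR factorization for communication efficiency and stability.

```python
import numpy as np

def qdwh_polar(A, cond_est=None, tol=1e-12, maxit=10):
    """DWH iteration for the unitary polar factor of square, full-rank A.
    Inverse-based variant for brevity (the talk's QDWH uses QR instead)."""
    alpha = np.linalg.norm(A, 2)                 # estimate of sigma_max(A)
    if cond_est is None:
        cond_est = np.linalg.cond(A)
    X = A / alpha
    l = 1.0 / cond_est                           # lower bound for sigma_min(X)
    I = np.eye(A.shape[1])
    for _ in range(maxit):
        # dynamic weights a_k, b_k, c_k; l = 1 recovers Halley's (3, 1, 3)
        d = (4 * (1 - l**2) / l**4) ** (1 / 3)
        a = np.sqrt(1 + d) + 0.5 * np.sqrt(8 - 4 * d + 8 * (2 - l**2) / (l**2 * np.sqrt(1 + d)))
        b = (a - 1) ** 2 / 4
        c = a + b - 1
        XtX = X.T @ X
        # X(aI + b XtX)(I + c XtX)^{-1}; the two factors commute, so one solve suffices
        Xnew = X @ np.linalg.solve(I + c * XtX, a * I + b * XtX)
        l = l * (a + b * l**2) / (1 + c * l**2)  # mapped lower bound, "for free"
        if np.linalg.norm(Xnew - X, 'fro') < tol * np.linalg.norm(X, 'fro'):
            X = Xnew
            break
        X = Xnew
    U = X
    H = (U.T @ A + (U.T @ A).T) / 2              # symmetrized: A = U @ H
    return U, H
```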
Halley vs. Dynamically weighted Halley

[figure: the singular-value map σi(X_k) → σi(X_{k+1}) on [0, 1]; Halley uses f(x) = x(3 + x²)/(1 + 3x²), dynamical weighting uses f(x) = x(a_k + b_k x²)/(1 + c_k x²), which is much closer to the "optimal" map σi(X_{k+1}) = 1]

- Dynamical weighting brings σi(X_{k+1}) much closer to 1, which speeds up convergence
- ℓ_{k+1} such that σi(X_{k+1}) ∈ [ℓ_{k+1}, 1] is obtained for free
Backward stability of iterations for polar decomposition  –[with N. J. Higham, MIMS ePrint Dec. 2011]

An iteration X_{k+1} = f_k(X_k), X0 = A, lim_{k→∞} X_k = U for computing the polar decomposition is backward stable (A − UH = ε‖A‖) if
1. each iteration is mixed backward–forward stable:

       X_{k+1} = f_k(X̃_k) + ε‖X_{k+1}‖,   X̃_k = X_k + ε‖X_k‖,

2. the mapping function f_k(x) does not significantly decrease the relative size of σi for i = 1 : n:

       d · f_k(σi)/‖f_k(X)‖2 ≥ σi/‖X‖2   for some d = O(1).

(we prove A − UH = dε‖A‖, H̃ − H = dε‖H‖)
Mapping functions: stable (top) and unstable (bottom)

[figure, four panels: the normalized mapping functions f(x)/‖f(x)‖∞ on [m, M]; top row: QDWH (d = 1) and scaled Newton (d = 1); bottom row: inverse Newton (d ≥ κ2(A)^{1/2}) and Newton–Schulz (shown on [m, √3])]
Previous communication-minimizing eigensolver

Algorithm BDD  –[Ballard, Demmel, Dumitriu (2010, LAWN 237)]
1. A0 = A, B0 = I
2. for j = 0, 1, . . .,

       [B_j; −A_j] = [Q11 Q12; Q21 Q22] [R_j; 0],
       A_{j+1} = Q12^T A_j,   B_{j+1} = Q22^T B_j

3. rank-revealing QR decomposition (A_j + B_j)^{−1} A_j = QR
4. form Q^T A Q = [A11 E^T; E A22], confirm ‖E‖F ≲ ε‖A‖F, repeat 1–4 with A := A11, A22

- (A_j + B_j)^{−1} A_j = (I + A^{−2^j})^{−1}: maps |λ| < 1 to 0 and |λ| > 1 to 1.
- SVD can be computed via the eigendecomposition of [0 A; A^T 0].
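The inverse-free repeated squaring in step 2 can be demonstrated in a few lines; the recursion keeps B_j^{-1} A_j = A^{2^j} while all stored matrices stay bounded. A minimal sketch (fixed iteration count, no rank-revealing QR or recursion on the blocks):

```python
import numpy as np

def bdd_projector(A, iters=8):
    """Inverse-free repeated squaring at the heart of BDD: maintains (A_j, B_j)
    with B_j^{-1} A_j = A^{2^j}, so (A_j + B_j)^{-1} A_j tends to the spectral
    projector onto the eigenspace for |lambda| > 1."""
    n = A.shape[0]
    Aj, Bj = A.copy(), np.eye(n)
    for _ in range(iters):
        M = np.vstack([Bj, -Aj])
        Q, _ = np.linalg.qr(M, mode='complete')   # full 2n x 2n orthogonal factor
        Aj = Q[:n, n:].T @ Aj                     # A_{j+1} = Q12^T A_j
        Bj = Q[n:, n:].T @ Bj                     # B_{j+1} = Q22^T B_j
    return np.linalg.solve(Aj + Bj, Aj)           # ~ projector onto |lambda| > 1
```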
Communication-minimizing algorithms: QDWH vs. BDD

100-by-100 random symmetric matrices

[figure, left: iteration counts (0 to 55) against log10 κ2(A) for BDD and QDWH-eig; right: backward error ‖A − VΛV^H‖F/‖A‖F (10^{−16} to 10^{−8}) against log10 κ2(A) for BDD and QDWH-eig]
BDD vs QDWH convergence

[figure, animated over iterations: the mapping functions f(λ) of BDD with static parameters (it = 0, 1, . . . , 9) and f(σ) of QDWH with dynamical parameters (it = 0, 1, 2) on [0, 1]]
Dynamical parameters speed up convergence significantly: in fact, no more than 6 iterations are needed for QDWH for any A with κ2(A) < 10^16.
Polar decomposition + symmetric eigenproblem = SVD  –[with N. J. Higham]

Algorithm SVD
1. compute polar decomposition A = U_P H   (1)
2. compute symmetric eigendecomposition H = VΣV^T   (2)
3. get SVD A = (U_P V)ΣV^T = UΣV^T

- The computed SVD is backward stable if both (1) and (2) are  –[Higham and Papadimitriou (1993)]
  ⇒ backward stability again rests on that of the polar decomposition
- For n × n matrices, this SVD is less expensive than 2 runs of symeig
- note: SVD via eig([0 A; A^T 0]) costs ≃ 8×
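The three steps translate directly to code. As a sketch, SciPy's polar() stands in for the talk's QDWH polar routine and eigh for the symmetric eigensolver:

```python
import numpy as np
from scipy.linalg import polar

def svd_via_polar(A):
    """SVD through polar decomposition + symmetric eigendecomposition
    (the Higham-Papadimitriou route)."""
    Up, H = polar(A)                  # step 1: A = Up @ H
    lam, V = np.linalg.eigh(H)        # step 2: H = V @ diag(lam) @ V.T (ascending)
    idx = np.argsort(lam)[::-1]       # reorder to descending singular values
    lam, V = lam[idx], V[:, idx]
    return Up @ V, lam, V             # step 3: A = (Up V) @ diag(lam) @ V.T
```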
Comparison

Symmetric eigenproblem:
                           standard   BDD        Proposed
  minimize communication?  ×          √          √
  arithmetic               9n³        ≃ 1000n³   26n³
  backward stable?         √          ×          √

SVD:
                           standard   BDD        Proposed
  minimize communication?  ×          √          √
  arithmetic               26n³       ≃ 8000n³   34n³ ∼ 52n³
  backward stable?         √          ×          √

nb: BDD is applicable to generalized non-symmetric eigenproblems
Numerical experiments: symeig accuracy

NAG MATLAB toolbox experiments with n-by-n matrices A^T = A, uniform eigenvalues, κ2(A) = 10^5

Backward error ‖A − VΛV^T‖F/‖A‖F:
                  n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  QDWH-eig        2.1e-15    2.4e-15    2.6e-15    2.8e-15    2.9e-15
  Divide-Conquer  5.3e-15    7.3e-15    8.7e-15    9.8e-15    1.1e-14
  MATLAB          1.4e-14    1.8e-14    2.3e-14    2.7e-14    3.0e-14
  MR3             4.2e-15    5.8e-15    7.1e-15    8.1e-15    8.9e-15
  QR              1.5e-14    2.8e-14    2.5e-14    2.9e-14    3.2e-14

Distance from orthogonality ‖V^T V − I‖F/√n:
                  n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  QDWH-eig        7.7e-16    8.0e-16    8.2e-16    8.4e-16    8.5e-16
  Divide-Conquer  4.6e-15    6.3e-15    7.6e-15    8.7e-15    9.6e-15
  MATLAB          1.1e-14    1.5e-14    1.9e-14    2.1e-14    2.3e-14
  MR3             2.6e-15    3.6e-15    4.2e-15    4.8e-15    5.3e-15
  QR              1.1e-14    2.4e-14    1.9e-14    2.2e-14    2.5e-14
Numerical experiments: symeig runtime

NAG MATLAB toolbox experiments with n-by-n matrices A^T = A, uniform eigenvalues, κ2(A) = 10^5

Runtime (s):
                  n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  QDWH-eig        13.6       89.7       308        589        1100
  Divide-Conquer  2.29       18.1       61.5       141        277
  MATLAB          3.08       25.9       81.9       176        336
  MR3             12.0       106        355        834        1620
  QR              4.25       34.4       113        272        520
Numerical experiments: SVD accuracy

NAG MATLAB toolbox experiments with n-by-n MATLAB randsvd matrices, κ2(A) = 10^5

Backward error ‖A − UΣV^T‖F/‖A‖F:
                  n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  QDWH-SVD        2.0e-15    2.3e-15    2.5e-15    2.7e-15    2.8e-15
  Divide-Conquer  5.6e-15    7.8e-15    9.2e-15    1.0e-14    1.2e-14
  MATLAB          5.8e-15    7.8e-15    9.2e-15    1.1e-14    1.2e-14
  QR              1.6e-14    2.2e-14    2.7e-14    3.1e-14    3.5e-14

Distance from orthogonality max{‖U^T U − I‖F/√n, ‖V^T V − I‖F/√n}:
                  n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  QDWH-SVD        7.7e-16    8.0e-16    8.2e-16    8.4e-16    8.5e-16
  Divide-Conquer  5.2e-15    7.2e-15    8.4e-15    9.5e-15    1.1e-14
  MATLAB          8.1e-15    7.1e-15    8.4e-15    9.8e-15    1.1e-14
  QR              1.2e-14    1.7e-14    2.1e-14    2.5e-14    2.8e-14
Numerical experiments: SVD runtime

NAG MATLAB toolbox experiments with n-by-n MATLAB randsvd matrices, κ2(A) = 10^5

Runtime (s):
                  n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  QDWH-SVD        20.4       124        373        838        1600
  Divide-Conquer  7.5        51.5       161        364        688
  MATLAB          4.56       36.2       117        272        527
  QR              19.5       140        446        998        1880
Higher-order iteration  –[with R. Freund]

Recall the DWH iteration

    X_{k+1} = X_k (a_k I + b_k X_k^T X_k)(I + c_k X_k^T X_k)^{−1}

A natural extension is an iteration of the form

    X_{k+1} = X_k (a_k I + b_k X_k* X_k + c_k (X_k* X_k)²)(d_k I + e_k X_k* X_k + f_k (X_k* X_k)²)^{−1}

Idea: choose the parameters such that f(x) = x (a_k + b_k x² + c_k x⁴)/(d_k + e_k x² + f_k x⁴) is as close as possible to the "optimal" mapping function

[figure: the optimal map sends every σi(X_k) ∈ [0, 1] to σi(X_{k+1}) = 1]
Zolotarev's best rational approximation to sgn(x)

For arbitrary degree r, Zolotarev (1877) gives an explicit formula for argmin_{R(x)∈R_{r+1,r}} ‖R(x) − sgn(x)‖∞ on [−κ, −1] ∪ [1, κ]:

    R(x) = x P(x²)/Q(x²) = M x ∏_{i=1}^{r}(x² + c_{2i}) / ∏_{i=1}^{r}(x² + c_{2i−1}),

where the c_i are given by

    c_i = sn²(iK/(2r + 1); k) / cn²(iK/(2r + 1); k),

with k = √(1 − 1/κ²), K the complete elliptic integral for modulus k, and M a constant such that R(1) = 1.
Zolotarev's best rational approximation to sgn(x)

    R(x) = x P(x²)/Q(x²) = M x ∏_{i=1}^{r}(x² + c_{2i}) / ∏_{i=1}^{r}(x² + c_{2i−1})

[figure: six panels on [0, 100] showing the (r, r)-approximations for r = 1, . . . , 6, which converge rapidly to the unit step]
Best rational approximation

For continuous f(x) : [−1, 1] → R, the type-(m, n) best rational approximation to f(x) is

    R*_{m,n}(x) := argmin_{R(x) = p_m(x)/q_n(x)} ‖f(x) − R(x)‖∞

Unique characterization: R(x) = R*_{m,n}(x) iff the error R(x) − f(x) equioscillates with at least m + n + 2 extreme points (in the generic case, in the absence of defects).
[figure, animated: f(x) and R*_{m,n}(x) on [−1, 1] with equioscillating error curves for (m, n) = (1, 1) through (5, 5), the error level dropping from about 10^{−2} to 10^{−13}; a final panel shows a (20, 0) polynomial fit to f(x) = cumsum(sign(sin(20*exp(x))))]
Two-step convergence

Iteration count for convergence for varying κ2(A) and degree r:
  κ2(A)          1.001  1.01  1.1  1.2  1.5  2  10  10²  10³  10⁴  10⁵  10⁶  10⁷  10¹⁴
  r = 1 (QDWH)   2      2     2    3    3    3  4   4    4    4    5    5    5    6
  r = 2          1      2     2    2    2    2  3   3    3    3    3    3    4    4
  r = 3          1      1     2    2    2    2  2   2    3    3    3    3    3    3
  r = 4          1      1     1    2    2    2  2   2    2    2    3    3    3    3
  r = 5          1      1     1    1    2    2  2   2    2    2    2    2    3    3
  r = 6          1      1     1    1    1    2  2   2    2    2    2    2    2    3
  r = 7          1      1     1    1    1    1  2   2    2    2    2    2    2    3
  r = 8          1      1     1    1    1    1  2   2    2    2    2    2    2    2

[figure: degree r needed for convergence in two iterations, plotted against log(κ2(A) − 1)]

⇒ Algorithm converges in two steps for any κ2(A) ≤ 10^16
Parallel computation via partial fraction form

    ∏_{i=1}^{r}(x² + c_{2i}) / ∏_{i=1}^{r}(x² + c_{2i−1}) = 1 − Σ_{i=1}^{r} α_{2i−1}/(x² + c_{2i−1}),

where α_{2i−1} = (∏_{j=1}^{r}(c_{2i−1} − c_{2j})) / (∏_{j=1, j≠i}^{r}(c_{2i−1} − c_{2j−1})). Hence

    R(X) = (1/R(1)) X ∏_{i=1}^{r}(X*X + c_{2i} I) ∏_{i=1}^{r}(X*X + c_{2i−1} I)^{−1}
         = (1/R(1)) (X + Σ_{i=1}^{r} α_{2i−1} X (X*X + c_{2i−1} I)^{−1}).

This can be computed via QR decompositions in parallel:

    [X; √c_{2i−1} I] = [Q_{i1}; Q_{i2}] R   for i = 1, . . . , r,

    R(X) = (1/R(1)) (X + Σ_{i=1}^{r} (α_{2i−1}/√c_{2i−1}) Q_{i1} Q_{i2}*).
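The partial-fraction identity is purely algebraic, so it can be checked numerically for any distinct positive coefficients c_1, . . . , c_{2r}; the values below are arbitrary placeholders (Zolotarev's c_i are the choice used by the algorithm):

```python
import numpy as np

r = 3
c = np.array([0.1, 0.5, 1.2, 2.0, 3.5, 5.0])   # c[0] = c_1, ..., c[5] = c_6
c_odd, c_even = c[0::2], c[1::2]               # c_{2i-1} and c_{2i}

def alpha(i):
    """alpha_{2i-1} from the residue formula, i = 1, ..., r."""
    num = np.prod(c_odd[i - 1] - c_even)
    den = np.prod([c_odd[i - 1] - c_odd[j] for j in range(r) if j != i - 1])
    return num / den

# check: prod(x^2 + c_even)/prod(x^2 + c_odd) = 1 - sum alpha/(x^2 + c_odd)
for x in [0.3, 0.7, 2.5]:
    lhs = np.prod(x**2 + c_even) / np.prod(x**2 + c_odd)
    rhs = 1 - sum(alpha(i) / (x**2 + c_odd[i - 1]) for i in range(1, r + 1))
    assert abs(lhs - rhs) < 1e-12
```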
Algorithm

Algorithm 1 Zolo: compute the polar decomposition A = U_p H
1: Obtain estimates α ≳ σmax(A), β ≲ σmin(A) ≃ α/condest(A)
2: Choose r based on κ = α/β such that after two iterations κ2 − 1 ≤ 10^{−15}
3: X0 = A/β; compute X1, X2:
   (i). Compute R(x), the best rational approximation to sign(x) in R_{2r+1,2r} on the interval [1, κ]
   (ii). Compute X1 by

            [X0; √c_{2i−1} I] = [Q_{i1}; Q_{i2}] R   for i = 1, . . . , r,

         then X1 = (1/R(1)) (X0 + Σ_{i=1}^{r} (α_{2i−1}/√c_{2i−1}) Q_{i1} Q_{i2}*).
   (iii). Update κ ← R(κ)/R(1), compute c_i and R(x) as in step (i) with the new interval [1, κ]
   (iv). Compute X2 by (Cholesky can safely be used since κ2(X1) ≤ 5, and is faster than QR)

            Z_{2i−1} = X1* X1 + c_{2i−1} I,   W_{2i−1} = chol(Z_{2i−1}),
            X2 = (1/R(1)) (X1 + Σ_{i=1}^{r} α_{2i−1} (X1 W_{2i−1}^{−1}) W_{2i−1}^{−*}).

4: U_p = X2, H = (1/2)(U_p* A + (U_p* A)*)
Numerical experiments: symeig accuracy

NAG MATLAB toolbox experiments with n-by-n matrices A^T = A, uniform eigenvalues, κ2(A) = 10^5

Backward error ‖A − VΛV^T‖F/‖A‖F:
                     n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  Zolo-eig           2.0e-15    2.4e-15    2.6e-15    2.8e-15    3.0e-15
  QDWH-eig (r = 1)   2.1e-15    2.4e-15    2.6e-15    2.8e-15    2.9e-15
  Divide-Conquer     5.3e-15    7.3e-15    8.7e-15    9.8e-15    1.1e-14
  MATLAB             1.4e-14    1.8e-14    2.3e-14    2.7e-14    3.0e-14
  MR3                4.2e-15    5.8e-15    7.1e-15    8.1e-15    8.9e-15
  QR                 1.5e-14    2.8e-14    2.5e-14    2.9e-14    3.2e-14

Distance from orthogonality ‖V^T V − I‖F/√n:
                     n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  Zolo-eig           7.7e-16    8.0e-16    8.2e-16    8.4e-16    8.5e-16
  QDWH-eig (r = 1)   7.7e-16    8.0e-16    8.2e-16    8.4e-16    8.5e-16
  Divide-Conquer     4.6e-15    6.3e-15    7.6e-15    8.7e-15    9.6e-15
  MATLAB             1.1e-14    1.5e-14    1.9e-14    2.1e-14    2.3e-14
  MR3                2.6e-15    3.6e-15    4.2e-15    4.8e-15    5.3e-15
  QR                 1.1e-14    2.4e-14    1.9e-14    2.2e-14    2.5e-14
Numerical experiments: symeig runtime

NAG MATLAB toolbox experiments with n-by-n matrices A^T = A, uniform eigenvalues, κ2(A) = 10^5

Runtime (s):
                     n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  Zolo-eig           29.2       154        495        1090       2250
  (ideal parallel)   8.64       48.4       144        337        666
  QDWH-eig (r = 1)   13.6       89.7       308        589        1100
  Divide-Conquer     2.29       18.1       61.5       141        277
  MATLAB             3.08       25.9       81.9       176        336
  MR3                12.0       106        355        834        1620
  QR                 4.25       34.4       113        272        520
Numerical experiments: SVD accuracy

NAG MATLAB toolbox experiments with n-by-n MATLAB randsvd matrices, κ2(A) = 10^5

Backward error ‖A − UΣV^T‖F/‖A‖F:
                     n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  Zolo-SVD           2.2e-15    2.8e-15    2.8e-15    3.1e-15    3.2e-15
  QDWH-SVD (r = 1)   2.0e-15    2.3e-15    2.5e-15    2.7e-15    2.8e-15
  Divide-Conquer     5.6e-15    7.8e-15    9.2e-15    1.0e-14    1.2e-14
  MATLAB             5.8e-15    7.8e-15    9.2e-15    1.1e-14    1.2e-14
  QR                 1.6e-14    2.2e-14    2.7e-14    3.1e-14    3.5e-14

Distance from orthogonality max{‖U^T U − I‖F/√n, ‖V^T V − I‖F/√n}:
                     n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  Zolo-SVD           7.7e-16    8.0e-16    8.2e-16    8.4e-16    8.5e-16
  QDWH-SVD (r = 1)   7.7e-16    8.0e-16    8.2e-16    8.4e-16    8.5e-16
  Divide-Conquer     5.2e-15    7.2e-15    8.4e-15    9.5e-15    1.1e-14
  MATLAB             8.1e-15    7.1e-15    8.4e-15    9.8e-15    1.1e-14
  QR                 1.2e-14    1.7e-14    2.1e-14    2.5e-14    2.8e-14
Numerical experiments: SVD runtime

NAG MATLAB toolbox experiments with n-by-n MATLAB randsvd matrices, κ2(A) = 10^5

Runtime (s):
                     n = 2000   n = 4000   n = 6000   n = 8000   n = 10000
  Zolo-SVD           43.2       256        780        1764       3490
  (ideal parallel)   14.4       80.6       294        565        980
  QDWH-SVD (r = 1)   20.4       124        373        838        1600
  Divide-Conquer     7.5        51.5       161        364        688
  MATLAB             4.56       36.2       117        272        527
  QR                 19.5       140        446        998        1880
Comparison with approach via contour integrals

A general function f(A) has the Cauchy integral expression

    f(A) = (1/(2πi)) ∫_Γ f(z)(zI − A)^{−1} dz   (3)

Numerically integrating (3) using conformal maps yields  –[Hale, Higham, Trefethen (2008)]

    f(A) ≃ A Σ_{j=1}^{N} γ_j (z_j I + A)^{−1}.

Since U = (2/π) A ∫_0^∞ (t²I + A*A)^{−1} dt in A = UH, this approach gives

    U ≃ A Σ_{j=1}^{N} γ_j (z_j I + A*A)^{−1}.

Compared with our X_{k+1} = X_k + Σ_{i=1}^{r} α_{2i−1} X_k (X_k* X_k + c_{2i−1} I)^{−1}, U = X2, we
- use the best rational approximation (not semi-best).
- replace inverses by QR factorizations, thereby establishing backward stability.
- allow two iterations, thereby reducing flop counts and improving stability.
Two approaches

Rational and polynomial best approximation to sgn(x):
- Rational (Zolotarev, 1877):

      ‖sgn(x) − R_{r+1,r}(x)‖∞ ≃ exp(−cr/ln(κ2(A)))

- Polynomial (Eremenko and Yuditskii, 2006):

      ‖sgn(x) − P_r(x)‖∞ ≃ c ((κ2(A) − 1)/(κ2(A) + 1))^r

[figure, left: R_{5,4}(x) and P_9(x) for κ2(A) = 100; right: log ‖sgn(x) − R_{r,r}(x)‖∞ and log ‖sgn(x) − P_{2r}(x)‖∞ against r, with the rational error decaying far faster]
Summary and future work

Summary
- Communication increasingly dominates computation cost ⇒ need to redesign linear algebra algorithms
- Known communication-minimizing algorithms for symmetric eigenproblems and the SVD are unstable and have high arithmetic cost
- Proposed backward stable, highly parallelizable, arithmetic/communication-minimizing algorithms
- Key idea from approximation theory (Zolotarev's best rational approximation)

Future work
- Good scheme for choosing σ ≃ median(eig(A))
- Fortran/C and parallel, communication-minimizing implementation
- Updating polar/eigen/singular value decompositions