-
Golub-Kahan iterative bidiagonalization and
determining the noise level in the data
Iveta Hnětynková ∗,∗∗, Martin Plešinger ∗∗,∗∗∗, Zdeněk Strakoš ∗,∗∗
* Charles University, Prague
** Academy of Sciences of the Czech Republic, Prague
*** Technical University, Liberec
with thanks to P. C. Hansen, M. Kilmer and many others
SIAM ALA, Monterey, October 2009
1
-
A six-ton large-scale real-world ill-posed problem:
2
-
Solving large-scale discrete ill-posed problems is frequently based on
orthogonal-projection model reduction using Krylov subspaces; see,
e.g., hybrid methods. This can be viewed as an approximation of a
Riemann-Stieltjes distribution function via matching moments.
3
-
Consider the Riemann-Stieltjes distribution function ω(λ) with the
n points of increase associated with the HPD matrix B and the
normalized initial vector s (or with the transfer function given by
the Laplace transform of a linear dynamical system determined by
B, s). Then

s^*(λI − B)^{−1} s = \sum_{j=1}^{n} \frac{ω_j}{λ − λ_j} ≡ F_n(λ) ,

where λ_j, j = 1, …, n, denote the eigenvalues of B and ω_j the
squared size of the component of s in the corresponding invariant
subspace.
4
-
The continued fraction on the right-hand side is given by

F_n(λ) ≡ \frac{R_n(λ)}{P_n(λ)} ≡ \cfrac{1}{λ − γ_1 − \cfrac{δ_2^2}{λ − γ_2 − \cfrac{δ_3^2}{λ − γ_3 − \ddots − \cfrac{δ_n^2}{λ − γ_n}}}}

and the entries γ_1, …, γ_n and δ_2, …, δ_n form the Jacobi matrix
5
-
T_n ≡ \begin{bmatrix} γ_1 & δ_2 & & \\ δ_2 & γ_2 & \ddots & \\ & \ddots & \ddots & δ_n \\ & & δ_n & γ_n \end{bmatrix} ,  δ_ℓ > 0 ,  ℓ = 2, …, n .
Consider the kth Gauss-Christoffel quadrature approximation ω^{(k)}(λ)
of the Riemann-Stieltjes distribution function ω(λ). Its algebraic
degree is 2k − 1, i.e., it matches the first 2k moments

ξ_{ℓ−1} = ∫ λ^{ℓ−1} dω(λ) = \sum_{j=1}^{k} ω_j^{(k)} (λ_j^{(k)})^{ℓ−1} ,  ℓ = 1, …, 2k .
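This moment-matching property can be checked numerically. The following small NumPy experiment (an illustration, not code from the talk; the SPD matrix B and the vector s are randomly generated) runs k Lanczos steps and compares the first 2k moments of ω(λ) with those of the quadrature ω^{(k)}(λ):

```python
import numpy as np

def lanczos(B, s, k):
    """k steps of Lanczos on symmetric B with unit start vector s.
    Returns the k x k Jacobi matrix T_k (full reorthogonalization for stability)."""
    n = len(s)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k + 1)
    Q[:, 0] = s / np.linalg.norm(s)
    for j in range(k):
        w = B @ Q[:, j] - (beta[j] * Q[:, j - 1] if j > 0 else 0)
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # reorthogonalize
        beta[j + 1] = np.linalg.norm(w)
        if j < k - 1:
            Q[:, j + 1] = w / beta[j + 1]
    return np.diag(alpha) + np.diag(beta[1:k], 1) + np.diag(beta[1:k], -1)

rng = np.random.default_rng(0)
n, k = 12, 4
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)                  # SPD matrix
s = rng.standard_normal(n)
s /= np.linalg.norm(s)

lam, U = np.linalg.eigh(B)                   # nodes and weights of omega(lambda)
omega = (U.T @ s) ** 2

Tk = lanczos(B, s, k)
theta, P = np.linalg.eigh(Tk)                # nodes and weights of omega^{(k)}
w_k = P[0, :] ** 2

# the k-point Gauss quadrature matches the first 2k moments xi_{l-1}
for ell in range(1, 2 * k + 1):
    exact = np.sum(omega * lam ** (ell - 1))
    quad = np.sum(w_k * theta ** (ell - 1))
    assert abs(exact - quad) <= 1e-8 * abs(exact)
```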
6
-
The nodes and weights of ω(k)(λ) are given by the eigenvalues and
the corresponding squared first elements of the normalized eigenvec-
tors of Tk .
Expansion of the continued fraction Fn(λ) in terms of the decreasing
powers of λ and the approximation by its kth convergent Fk(λ)
gives
F_n(λ) = \sum_{ℓ=1}^{2k} \frac{ξ_{ℓ−1}}{λ^ℓ} + O\left(\frac{1}{λ^{2k+1}}\right) = F_k(λ) + O\left(\frac{1}{λ^{2k+1}}\right) .
Here Fk(λ) approximates Fn(λ) with the error proportional to
λ−(2k+1), which represents the minimal partial realization matching
the first 2k moments, cf. [Stieltjes - 1894, Chebyshev - 1855].
7
-
Discrete ill-posed problem,
the smallest node and weight in approximation of ω(λ):
[Figure: the leftmost node (horizontal axis) versus its weight (vertical
axis); ω(λ): nodes σ_j^2, weights |(b/β_1, u_j)|^2; approximations: nodes
(θ_1^{(k)})^2, weights |(p_1^{(k)}, e_1)|^2; the level δ_noise^2 is marked.]
8
-
Outline
1. Problem formulation
2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
and approximation of the Riemann-Stieltjes distribution function
3. Propagation of the noise in the Golub-Kahan bidiagonalization
4. Determination of the noise level
5. Numerical illustration
6. Summary and future work
9
-
Consider an ill-posed square nonsingular linear algebraic system

A x ≈ b ,  A ∈ R^{n×n} ,  b ∈ R^n ,

with the right-hand side corrupted by white noise,

b = b_exact + b_noise ≠ 0 ,  ‖b_exact‖ ≫ ‖b_noise‖ ,

and the goal to approximate x_exact ≡ A^{−1} b_exact . The noise level is

δ_noise ≡ ‖b_noise‖ / ‖b_exact‖ ≪ 1 .
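Such a right-hand side is easy to generate for experiments. A minimal NumPy sketch (the smooth b_exact below is a hypothetical example, not the talk's data) scales white noise to a prescribed noise level:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# hypothetical smooth exact data; any smooth b_exact serves the illustration
t = np.linspace(0, 1, n)
b_exact = np.exp(-5 * (t - 0.5) ** 2)

delta_noise = 1e-4                          # prescribed noise level
e = rng.standard_normal(n)                  # white noise
b_noise = delta_noise * np.linalg.norm(b_exact) * e / np.linalg.norm(e)
b = b_exact + b_noise

# check: ||b_noise|| / ||b_exact|| equals the prescribed delta_noise
assert np.isclose(np.linalg.norm(b_noise) / np.linalg.norm(b_exact), delta_noise)
```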
10
-
Properties (assumptions):
• matrices A, A^T, A A^T have a smoothing property;
• left singular vectors uj of A represent increasing frequencies
as j increases;
• the system A x = bexact satisfies the discrete Picard condition.
Discrete Picard condition (DPC):
On average, the components |(bexact, uj)| of the true right-hand
side bexact in the left singular subspaces of A decay faster than
the singular values σj of A, j = 1, . . . , n .
White noise:
The components |(bnoise, uj)|, j = 1, . . . , n do not exhibit any trend.
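The DPC can be examined numerically from the SVD of A. The sketch below (a toy operator with a prescribed SVD, purely illustrative; none of it comes from the talk's test problems) computes the Picard-plot quantities σ_j, |(b_exact, u_j)|, and their ratios:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
# toy ill-posed operator built from a prescribed SVD (hypothetical example)
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** (-0.4 * np.arange(n))       # rapidly decaying singular values
A = U @ np.diag(sigma) @ V.T

# right-hand side satisfying the DPC: |(b_exact, u_j)| decays faster than sigma_j
coef = sigma ** 1.5
b_exact = U @ coef

# Picard-plot quantities: sigma_j, |(b, u_j)|, and their ratio
proj = np.abs(U.T @ b_exact)
ratio = proj / sigma                         # decays when the DPC holds
assert np.all(np.diff(np.log10(ratio[:20])) < 0)
```

For a noisy b the projections |(b, u_j)| level off at the noise, and the ratios start to grow; that is exactly the behavior shown on the next slide.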
11
-
Problem Shaw: Noise level, Singular values, and DPC:
[Hansen – Regularization Tools]
[Figure, left: σ_j in double precision and in high precision arithmetic,
and |(b_exact, u_j)| in high precision arithmetic, versus the singular
value number j. Right: σ_j and |(b, u_j)| for δ_noise = 10^{−14}, 10^{−8},
and 10^{−4}.]
12
-
Regularization is used to suppress the effect of errors in the data
and extract the essential information about the solution.
In hybrid methods, see [O'Leary, Simmons – 81], [Hansen – 98], or
[Fierro, Golub, Hansen, O'Leary – 97], [Kilmer, O'Leary – 01], [Kilmer,
Hansen, Español – 06], the outer bidiagonalization is combined with an
inner regularization – e.g., truncated SVD (TSVD) or Tikhonov
regularization – of the projected small problem (i.e., of the reduced
model).
The bidiagonalization is stopped when the regularized solution of the
reduced model matches some selected stopping criteria.
13
-
Stopping criteria are typically based, among others (see [Björck – 88],
[Björck, Grimme, Van Dooren – 94]), on
• estimation of the L-curve [Calvetti, Golub, Reichel – 99], [Calvetti,
Morigi, Reichel, Sgallari – 00], [Calvetti, Reichel – 04];
• estimation of the distance between the exact and regularized so-
lution [O’Leary – 01];
• the discrepancy principle [Morozov – 66], [Morozov – 84];
• cross validation methods [Chung, Nagy, O’Leary – 04], [Golub,
Von Matt – 97], [Nguyen, Milanfar, Golub – 01].
For an extensive study and comparison see [Hansen – 98], [Kilmer,
O’Leary – 01].
14
-
Stopping criteria based on information from residual vectors:
A vector x̂ is a good approximation to x_exact = A^{−1} b_exact if the
corresponding residual approximates the (white) noise in the data,

r̂ ≡ b − A x̂ ≈ b_noise .
Behavior of r̂ can be expressed in the frequency domain using
• discrete Fourier transform, see [Rust – 98], [Rust – 00],
[Rust, O’Leary – 08], or
• discrete cosine transform, see [Hansen, Kilmer, Kjeldsen – 06],
and then analyzed using statistical tools – cumulative periodograms.
15
-
This talk:
Under the given (quite natural) assumptions, the Golub-Kahan itera-
tive bidiagonalization reveals the noise level δnoise .
16
-
Outline
1. Problem formulation
2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
and approximation of the Riemann-Stieltjes distribution function
3. Propagation of the noise in the Golub-Kahan bidiagonalization
4. Determination of the noise level
5. Numerical illustration
6. Summary and future work
17
-
Golub-Kahan iterative bidiagonalization (GK) of A :
given w_0 = 0 , s_1 = b / β_1 , where β_1 = ‖b‖ , for j = 1, 2, …

α_j w_j = A^T s_j − β_j w_{j−1} ,  ‖w_j‖ = 1 ,
β_{j+1} s_{j+1} = A w_j − α_j s_j ,  ‖s_{j+1}‖ = 1 .

Let S_k = [s_1, …, s_k] , W_k = [w_1, …, w_k] be the associated
matrices with orthonormal columns.
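The recurrence above can be sketched directly in NumPy (an illustrative implementation with full reorthogonalization for clarity, not the authors' code; the random A and b are placeholders):

```python
import numpy as np

def golub_kahan(A, b, k):
    """k steps of Golub-Kahan bidiagonalization of A started from b.
    Returns S_{k+1}, W_k, and the (k+1) x k bidiagonal matrix L_{k+}."""
    m, n = A.shape
    S = np.zeros((m, k + 1))
    W = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k + 1)
    beta[0] = np.linalg.norm(b)              # beta_1 = ||b||
    S[:, 0] = b / beta[0]                    # s_1 = b / beta_1
    for j in range(k):
        w = A.T @ S[:, j] - (beta[j] * W[:, j - 1] if j > 0 else 0)
        w -= W[:, :j] @ (W[:, :j].T @ w)     # reorthogonalize
        alpha[j] = np.linalg.norm(w)
        W[:, j] = w / alpha[j]
        s = A @ W[:, j] - alpha[j] * S[:, j]
        s -= S[:, :j + 1] @ (S[:, :j + 1].T @ s)
        beta[j + 1] = np.linalg.norm(s)
        S[:, j + 1] = s / beta[j + 1]
    Lplus = np.zeros((k + 1, k))
    Lplus[np.arange(k), np.arange(k)] = alpha          # alpha_1, ..., alpha_k
    Lplus[np.arange(1, k + 1), np.arange(k)] = beta[1:]  # beta_2, ..., beta_{k+1}
    return S, W, Lplus

# small random example; check the matrix relation A W_k = S_{k+1} L_{k+}
rng = np.random.default_rng(3)
A = rng.standard_normal((30, 30))
b = rng.standard_normal(30)
S, W, Lp = golub_kahan(A, b, 5)
assert np.allclose(A @ W, S @ Lp)
assert np.allclose(W.T @ W, np.eye(5))
```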
18
-
Denote

L_k = \begin{bmatrix} α_1 & & \\ β_2 & α_2 & \\ & \ddots & \ddots \\ & & β_k & α_k \end{bmatrix} ,  L_{k+} = \begin{bmatrix} α_1 & & \\ β_2 & α_2 & \\ & \ddots & \ddots \\ & & β_k & α_k \\ & & & β_{k+1} \end{bmatrix} = \begin{bmatrix} L_k \\ β_{k+1} e_k^T \end{bmatrix} ,

the lower bidiagonal matrices containing the normalization coefficients.
Then GK can be written in the matrix form as

A^T S_k = W_k L_k^T ,
A W_k = [S_k, s_{k+1}] L_{k+} = S_{k+1} L_{k+} .
19
-
GK is closely related to the Lanczos tridiagonalization of the symmetric
matrix A A^T with the starting vector s_1 = b / β_1 ,

A A^T S_k = S_k T_k + α_k β_{k+1} s_{k+1} e_k^T ,

T_k = L_k L_k^T = \begin{bmatrix} α_1^2 & α_1 β_2 & & \\ α_1 β_2 & α_2^2 + β_2^2 & \ddots & \\ & \ddots & \ddots & α_{k−1} β_k \\ & & α_{k−1} β_k & α_k^2 + β_k^2 \end{bmatrix} ,

i.e., the matrix L_k from GK represents a Cholesky factor of the
symmetric tridiagonal matrix T_k from the Lanczos process.
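The Cholesky relation T_k = L_k L_k^T is easy to verify entrywise (a minimal sketch; the α's and β's here are arbitrary positive numbers, not coefficients from an actual GK run):

```python
import numpy as np

rng = np.random.default_rng(4)
k = 5
alpha = rng.uniform(0.5, 2.0, k)        # stand-ins for alpha_1, ..., alpha_k
beta = rng.uniform(0.5, 2.0, k - 1)     # stand-ins for beta_2, ..., beta_k
L = np.diag(alpha) + np.diag(beta, -1)  # lower bidiagonal L_k

T = L @ L.T                             # Lanczos matrix T_k = L_k L_k^T
# diagonal: alpha_1^2, then alpha_j^2 + beta_j^2; off-diagonal: alpha_j beta_{j+1}
assert np.isclose(T[0, 0], alpha[0] ** 2)
assert np.allclose(np.diag(T)[1:], alpha[1:] ** 2 + beta ** 2)
assert np.allclose(np.diag(T, -1), alpha[:k - 1] * beta)
assert np.allclose(T, np.tril(np.triu(T, -1), 1))  # T is tridiagonal
```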
20
-
Approximation of the distribution function:

The Lanczos tridiagonalization of the given (SPD) matrix B ∈ R^{n×n}
generates at each step k a non-decreasing piecewise constant distribution
function ω^{(k)} , with the nodes being the (distinct) eigenvalues
of the Lanczos matrix T_k and the weights ω_j^{(k)} being the squared
first entries of the corresponding normalized eigenvectors [Hestenes,
Stiefel – 52].

The distribution functions ω^{(k)}(λ) , k = 1, 2, … represent Gauss-
Christoffel quadrature (i.e., minimal partial realization) approximations
of the distribution function ω(λ) , [Hestenes, Stiefel – 52],
[Fischer – 96], [Meurant, Strakoš – 06].
21
-
Consider the SVD

L_k = P_k Θ_k Q_k^T ,
P_k = [p_1^{(k)}, …, p_k^{(k)}] ,  Q_k = [q_1^{(k)}, …, q_k^{(k)}] ,  Θ_k = diag(θ_1^{(k)}, …, θ_k^{(k)}) ,

with the singular values ordered in increasing order,

0 < θ_1^{(k)} < … < θ_k^{(k)} .

Then T_k = L_k L_k^T = P_k Θ_k^2 P_k^T is the spectral decomposition of T_k ,
(θ_ℓ^{(k)})^2 are its eigenvalues (the Ritz values of A A^T) and
p_ℓ^{(k)} its eigenvectors (which determine the Ritz vectors of A A^T),
ℓ = 1, …, k .
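The correspondence between the SVD of L_k and the spectral decomposition of T_k can be checked numerically (again with arbitrary positive bidiagonal entries, purely as an illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
k = 6
alpha = rng.uniform(0.5, 2.0, k)
beta = rng.uniform(0.5, 2.0, k - 1)
L = np.diag(alpha) + np.diag(beta, -1)  # hypothetical GK bidiagonal L_k

P, theta, QT = np.linalg.svd(L)         # L_k = P_k Theta_k Q_k^T
T = L @ L.T
evals, evecs = np.linalg.eigh(T)

# squared singular values of L_k are the eigenvalues of T_k (Ritz values of A A^T)
assert np.allclose(np.sort(theta ** 2), evals)
# quadrature weights: squared first entries of the normalized eigenvectors of T_k
assert np.allclose(np.sort(P[0, :] ** 2), np.sort(evecs[0, :] ** 2))
```

Note that np.linalg.svd returns the singular values in decreasing order, so the smallest θ_1^{(k)} corresponds to the last column of P.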
22
-
Summarizing:

The GK bidiagonalization generates at each step k the distribution function

ω^{(k)}(λ) with nodes (θ_ℓ^{(k)})^2 and weights ω_ℓ^{(k)} = |(p_ℓ^{(k)}, e_1)|^2

that approximates the distribution function

ω(λ) with nodes σ_j^2 and weights ω_j = |(b/β_1, u_j)|^2 ,

where σ_j^2, u_j are the eigenpairs of A A^T , for j = n, …, 1 .

Note that unlike the Ritz values (θ_ℓ^{(k)})^2, the squared singular values
σ_j^2 are enumerated in descending order.
23
-
Outline
1. Problem formulation
2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
and approximation of the Riemann-Stieltjes distribution function
3. Propagation of the noise in the Golub-Kahan bidiagonaliza-
tion
4. Determination of the noise level
5. Numerical illustration
6. Summary and future work
24
-
GK starts with the normalized noisy right-hand side s1 = b / ‖b‖ .
Consequently, vectors sj contain information about the noise.
Can this information be used to determine the level of the noise in
the observation vector b ?
Consider the problem Shaw from [Hansen – Regularization Tools]
(computed via [A, b_exact, x] = shaw(400)) with a noisy right-hand
side (the noise was artificially added using the MATLAB function
randn). As an example we set

δ_noise ≡ ‖b_noise‖ / ‖b_exact‖ = 10^{−14} .
25
-
Components of several bidiagonalization vectors sj
computed via GK with double reorthogonalization:
[Figure: ten panels with the components of the vectors
s_1, s_6, s_11, s_16, s_17, s_18, s_19, s_20, s_21, s_22.]
26
-
The first 80 spectral coefficients of the vectors sj
in the basis of the left singular vectors uj of A:
[Figure: ten panels with |U^T s_j| for j = 1, 6, 11, 16, 17, 18, 19, 20, 21, 22.]
27
-
Signal space – noise space diagrams:
[Figure: twelve panels for the transitions s_1 → s_2, s_5 → s_6, s_6 → s_7,
s_10 → s_11, s_13 → s_14, s_15 → s_16, s_16 → s_17, s_17 → s_18,
s_18 → s_19, s_19 → s_20, s_20 → s_21, s_21 → s_22.]

s_k (triangle) and s_{k+1} (circle) in the signal space span{u_1, …, u_{k+1}}
(horizontal axis) and the noise space span{u_{k+2}, …, u_n} (vertical axis).
28
-
The noise is amplified with the ratio α_k/β_{k+1} :

GK for the spectral components:

α_1 (V^T w_1) = Σ (U^T s_1) ,
β_2 (U^T s_2) = Σ (V^T w_1) − α_1 (U^T s_1) ,

and for k = 2, 3, …

α_k (V^T w_k) = Σ (U^T s_k) − β_k (V^T w_{k−1}) ,
β_{k+1} (U^T s_{k+1}) = Σ (V^T w_k) − α_k (U^T s_k) .
29
-
Since the dominance in Σ (U^T s_k) and in (V^T w_{k−1}) is shifted by one
component, in α_k (V^T w_k) = Σ (U^T s_k) − β_k (V^T w_{k−1}) one cannot
expect a significant cancellation, and therefore

α_k ≈ β_k .

Σ (V^T w_k) and (U^T s_k), on the other hand, do exhibit dominance in
the same components. If this dominance is strong enough, then the
required orthogonality of s_{k+1} and s_k in
β_{k+1} (U^T s_{k+1}) = Σ (V^T w_k) − α_k (U^T s_k) cannot be achieved
without a significant cancellation, and one can expect

β_{k+1} ≪ α_k .
30
-
Absolute values of the first 25 components of Σ (V^T w_k), α_k (U^T s_k),
and β_{k+1} (U^T s_{k+1}) for k = 7, β_8/α_7 = 0.0524 (left),
and for k = 12, β_{13}/α_{12} = 0.677 (right).

[Figure: two semilogarithmic plots, left with |Σ V^T w_7|, |α_7 U^T s_7|,
|β_8 U^T s_8|, right with |Σ V^T w_12|, |α_12 U^T s_12|, |β_13 U^T s_13|.]

Shaw with the noise level δ_noise = 10^{−14}.
31
-
Outline
1. Problem formulation
2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
and approximation of the Riemann-Stieltjes distribution function
3. Propagation of the noise in the Golub-Kahan bidiagonalization
4. Determination of the noise level
5. Numerical illustration
6. Summary and future work
32
-
Depending on the noise level, the smaller nodes of ω(λ) are completely
dominated by noise, i.e., there exists an index J_noise such that
for j ≥ J_noise

|(b/β_1, u_j)|^2 ≈ |(b_noise/β_1, u_j)|^2 ,

and the weight of the set of the associated nodes is given by

δ^2 ≡ \sum_{j=J_noise}^{n} |(b_noise/β_1, u_j)|^2 .

Recall that the large nodes σ_1^2, σ_2^2, … are well-separated (relative
to the small ones) and their weights on average decrease faster than
σ_1^2, σ_2^2, … , see (DPC). Therefore the large nodes essentially control
the behavior of the early stages of the Lanczos process.
33
-
At any iteration step, the weight corresponding to the smallest node
(θ_1^{(k)})^2 must be larger than the sum of the weights of all σ_j^2
smaller than this (θ_1^{(k)})^2 , see [Fischer, Freund – 94]. As k
increases, some (θ_1^{(k)})^2 eventually approaches (or becomes smaller
than) the node σ_{J_noise}^2 , and its weight becomes

|(p_1^{(k)}, e_1)|^2 ≈ δ^2 ≈ δ_noise^2 .

The weight |(p_1^{(k)}, e_1)|^2 corresponding to the smallest Ritz value
(θ_1^{(k)})^2 is strictly decreasing. At some iteration step it sharply
starts to (almost) stagnate at a level close to the squared noise level
δ_noise^2 .
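The resulting noise-level estimator can be sketched as follows (an illustrative NumPy implementation under the stated assumptions: full reorthogonalization, a toy ill-posed operator with a prescribed SVD and a DPC-satisfying right-hand side; this is not the authors' code):

```python
import numpy as np

def gk_smallest_weights(A, b, kmax):
    """Run kmax GK steps; at each step k return |(p_1^{(k)}, e_1)|, the first
    entry of the left singular vector of L_k belonging to the smallest theta."""
    m, n = A.shape
    S = np.zeros((m, kmax + 1)); W = np.zeros((n, kmax))
    alpha = np.zeros(kmax); beta = np.zeros(kmax + 1)
    beta[0] = np.linalg.norm(b); S[:, 0] = b / beta[0]
    weights = []
    for j in range(kmax):
        w = A.T @ S[:, j] - (beta[j] * W[:, j - 1] if j > 0 else 0)
        w -= W[:, :j] @ (W[:, :j].T @ w)          # full reorthogonalization
        alpha[j] = np.linalg.norm(w); W[:, j] = w / alpha[j]
        s = A @ W[:, j] - alpha[j] * S[:, j]
        s -= S[:, :j + 1] @ (S[:, :j + 1].T @ s)
        beta[j + 1] = np.linalg.norm(s); S[:, j + 1] = s / beta[j + 1]
        Lk = np.diag(alpha[:j + 1]) + np.diag(beta[1:j + 1], -1)
        P, _, _ = np.linalg.svd(Lk)               # singular values descending
        weights.append(abs(P[0, -1]))             # last column: smallest theta
    return np.array(weights)

# toy ill-posed problem with known noise level (hypothetical setup)
rng = np.random.default_rng(6)
n = 60
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** (-0.3 * np.arange(n))
A = U @ np.diag(sigma) @ V.T
b_exact = U @ sigma ** 1.2                        # satisfies the DPC
delta = 1e-6
e = rng.standard_normal(n)
b = b_exact + delta * np.linalg.norm(b_exact) * e / np.linalg.norm(e)

w = gk_smallest_weights(A, b, 30)
# the stagnation level of the weights approximates delta_noise
assert delta / 30 < w.min() < 30 * delta
```

Here the plateau of |(p_1^{(k)}, e_1)| is only compared against the known δ_noise; in practice, where δ_noise is unknown, the onset of stagnation is detected from the computed sequence itself.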
34
-
Square roots of the weights |(p_1^{(k)}, e_1)|^2 , k = 1, 2, … :

[Figure: semilogarithmic plot of |(p_1^{(k)}, e_1)| versus the iteration
number k; the values stagnate at δ_noise from k_noise = 18 − 1 = 17.]

Shaw with the noise level δ_noise = 10^{−14}.
35
-
Square roots of the weights |(p_1^{(k)}, e_1)|^2 , k = 1, 2, … :

[Figure: semilogarithmic plot of |(p_1^{(k)}, e_1)| versus the iteration
number k; the values stagnate at δ_noise from k_noise = 8 − 1 = 7.]

Shaw with the noise level δ_noise = 10^{−4}.
36
-
The smallest node and weight in approximation of ω(λ):

[Figure: the leftmost node (horizontal axis) versus its weight (vertical
axis); ω(λ): nodes σ_j^2, weights |(b/β_1, u_j)|^2; approximations: nodes
(θ_1^{(k)})^2, weights |(p_1^{(k)}, e_1)|^2; the level δ_noise^2 is marked.]
37
-
The smallest node and weight in approximation of ω(λ):

[Figure: the leftmost node (horizontal axis) versus its weight (vertical
axis); ω(λ): nodes σ_j^2, weights |(b/β_1, u_j)|^2; approximations: nodes
(θ_1^{(k)})^2, weights |(p_1^{(k)}, e_1)|^2; the level δ_noise^2 is marked.]
38
-
Outline
1. Problem formulation
2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
and approximation of the Riemann-Stieltjes distribution function
3. Propagation of the noise in the Golub-Kahan bidiagonalization
4. Determination of the noise level
5. Numerical illustration
6. Summary and future work
39
-
Image deblurring problem, image size 324 × 470 pixels,
problem dimension n = 152280, the exact solution (left) and
the noisy right-hand side (right), δ_noise = 3 × 10^{−3}.

[Images: x_exact (left) and b_exact + b_noise (right).]
40
-
Square roots of the weights |(p_1^{(k)}, e_1)|^2 , k = 1, 2, … (top)
and the error history of the LSQR solutions (bottom):

[Figure, top: |(p_1^{(k)}, e_1)| versus k, with
‖b_noise‖_2 / ‖b_exact‖_2 = 0.003 and the level δ_noise marked.
Bottom: ‖x_k^{LSQR} − x_exact‖ with the minimal error at k = 41.]
41
-
The best LSQR reconstruction (left), x_41^{LSQR} ,
and the corresponding componentwise error (right).
GK without any reorthogonalization!

[Images: the LSQR reconstruction with minimal error, x_41^{LSQR}, and
the error of the best LSQR reconstruction, |x_exact − x_41^{LSQR}|.]
42
-
Outline
1. Problem formulation
2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
and approximation of the Riemann-Stieltjes distribution function
3. Propagation of the noise in the Golub-Kahan bidiagonalization
4. Determination of the noise level
5. Numerical illustration
6. Summary and future work
43
-
Message:
Using GK, information about the noise can be obtained in a straight-
forward way.
Future work:
• Large scale problems;
• Behavior in finite precision arithmetic
(GK without reorthogonalization);
• Regularization;
• Denoising;
• Colored noise.
44
-
References
• Golub, Kahan: Calculating the singular values and pseudoinverse of a matrix, SIAM J. Numer. Anal. Ser. B 2, 1965.
• Hansen: Rank-deficient and discrete ill-posed problems, SIAM Monographs Math. Modeling Comp., 1998.
• Hansen, Kilmer, Kjeldsen: Exploiting residual information in the parameter choice for discrete ill-posed problems, BIT, 2006.
• Hnětynková, Strakoš: Lanczos tridiag. and core problem, LAA, 2007.
• Meurant, Strakoš: The Lanczos and CG algorithms in finite precision arithmetic, Acta Numerica, 2006.
• Paige, Strakoš: Core problem in linear algebraic systems, SIMAX, 2006.
• Rust: Truncating the SVD for ill-posed problems, Technical Report, 1998.
• Rust, O'Leary: Residual periodograms for choosing regularization parameters for ill-posed problems, Inverse Problems, 2008.
• Hnětynková, Plešinger, Strakoš: The regularizing effect of the Golub-Kahan iterative bidiagonalization and revealing the noise level, BIT, 2009.
• ...
45
-
Main message:
Whenever you see a blurred elephant which is a bit too noisy,
the best thing is to apply the GK iterative bidiagonalization.
Full version of the talk can be found at
www.cs.cas.cz/strakos
46
-
Thank you for your kind attention!
47