Application of hierarchical matrices for partial inverse
Alexander Litvinenko
KAUST, SRI-UQ Center, http://sri-uq.kaust.edu.sa/
www.hlib.org
November 26, 2013
Content
1. Problem Setup
2. Hierarchical Domain Decomposition (HDD)
3. HDD in the H-matrix arithmetic
4. Computational resources of HDD
5. Modifications of HDD
6. Numerical results
Happy Birthday Prof. Hackbusch !!!
www.mis.mpg.de/calendar/conferences/2013/wh65.html
Randolph E. Bank (Uni of California), Susanne C. Brenner (Louisiana State Uni), Eric Cances (Ecole des Ponts ParisTech), Albert Cohen (Uni Pierre et Marie Curie), Wolfgang Dahmen (RWTH Aachen), Bjorn Engquist (Uni of Texas at Austin), Christian Lubich (Uni Tubingen), Yvon Maday (Uni Pierre et Marie Curie), Reinhold Schneider (TU Berlin), Rob Stevenson (Uni Amsterdam), Endre Suli (Uni Oxford), Gabriel Wittum (Uni Frankfurt), Jinchao Xu (Pennsylvania State Uni)
Problem setup
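The elliptic boundary value problem: find $u \in H^1(\Omega)$ s.t.
$$\sum_{1\le i,j\le 2} \frac{\partial}{\partial x_i}\,\alpha_{i,j}(x)\,\frac{\partial}{\partial x_j}u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega, \qquad (1)$$
where $\alpha_{i,j} \in L^\infty(\Omega)$ such that $A(x) = (\alpha_{i,j})_{i,j=1,2}$ satisfies $0 < \underline{\lambda} \le \lambda_{\min}(A(x)) \le \lambda_{\max}(A(x)) \le \overline{\lambda}$, $\forall x \in \Omega$.
⇒ Oscillatory or jumping coefficients are allowed.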
The motivation and goals
E.g.
a) compute the solution on $\gamma$,
b) compute the solution in a subdomain $\nu$,
c) compute the solution on the interface,
d) let $A_h x_h = b_h$ and $h \ll H$; one may be interested only in $x_H = R_{H\leftarrow h} A_h^{-1} b_h$ or $x_H = R_{H\leftarrow h} A_h^{-1} P b_H$ [see also Hackbusch and Drechsler, 2012].
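Goal d) can be illustrated with a small dense sketch. It compares "solve on the fine grid, then restrict" with applying a precomputed coarse operator $R A_h^{-1} P$. The 1D Laplacian, the linear-interpolation prolongation and the choice $R = \tfrac12 P^T$ are assumptions made only for this illustration; they are not taken from the slides.

```python
import numpy as np

def laplace_1d(n):
    """Tridiagonal 1D Laplacian, an illustrative stand-in for A_h."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolongation(n_coarse):
    """Linear interpolation P from n_coarse coarse nodes to 2*n_coarse + 1 fine nodes."""
    P = np.zeros((2 * n_coarse + 1, n_coarse))
    for j in range(n_coarse):
        i = 2 * j + 1              # fine index of coarse node j
        P[i - 1, j] = 0.5
        P[i, j] = 1.0
        P[i + 1, j] = 0.5
    return P

n_H = 7
P = prolongation(n_H)              # assumed P_{h<-H}
R = 0.5 * P.T                      # assumed R_{H<-h} (scaled transpose)
A_h = laplace_1d(P.shape[0])

# Partial inverse: a small n_H x n_H operator, computed once.
partial_inverse = R @ np.linalg.solve(A_h, P)

b_H = np.ones(n_H)
x_H_direct = R @ np.linalg.solve(A_h, P @ b_H)   # fine solve, then restrict
x_H_partial = partial_inverse @ b_H              # apply the stored coarse operator
print(np.allclose(x_H_direct, x_H_partial))
```

Once the small operator is stored, every further coarse right-hand side costs only one small matrix-vector product.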
The idea of HDD
Apply Galerkin FE discretisation to (1). We construct the discrete solution in the form
$u_h = F_h f_h + G_h g_h$, (2)
where $F_h$, $G_h$ are two solution operators, $f_h$ is the FE right-hand side and $g_h$ are the FE Dirichlet boundary values.
Often only a few functionals of the solution are of interest!
IDEA: Leaves to Root and Root to Leaves algorithms
Domain decomposition tree ($T_{\mathcal{T}_h}$)
FE discretisation: triangulation $\mathcal{T}_h$, $\Omega := \Omega_h = \bigcup_{t\in\mathcal{T}_h} t$.
[Figure: example of a domain decomposition tree.]
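• $\Omega$ is the root of the tree,
• $T_{\mathcal{T}_h}$ is a binary tree,
• if $\omega \in T_{\mathcal{T}_h}$ has two sons $\omega_1, \omega_2 \in T_{\mathcal{T}_h}$, then $\omega = \omega_1 \cup \omega_2$ and $\gamma_\omega = \partial\omega_1 \cap \partial\omega_2$,
• $\omega \in T_{\mathcal{T}_h}$ is a leaf if and only if $\omega \in \mathcal{T}_h$.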
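The tree properties listed above can be sketched with a small data structure. This is only an illustration: the class name `Subdomain` and the splitting rule (halving a list of element indices instead of bisecting geometrically) are assumptions, not the construction used in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Subdomain:
    """A node omega of the tree, stored as the list of its element indices."""
    elements: List[int]
    left: Optional["Subdomain"] = None
    right: Optional["Subdomain"] = None

    def is_leaf(self) -> bool:
        return self.left is None

def build_tree(elements: List[int]) -> Subdomain:
    """Binary tree: each non-leaf omega is the union of its two sons."""
    node = Subdomain(list(elements))
    if len(node.elements) > 1:            # a leaf is a single element of the triangulation
        mid = len(node.elements) // 2     # assumed splitting rule (index halving)
        node.left = build_tree(node.elements[:mid])
        node.right = build_tree(node.elements[mid:])
    return node

def leaves(node: Subdomain) -> List[Subdomain]:
    if node.is_leaf():
        return [node]
    return leaves(node.left) + leaves(node.right)

root = build_tree(list(range(1, 16)))     # 15 elements, numbered 1..15
print(len(leaves(root)))
```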
Notation
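Let $\omega \in T_{\mathcal{T}_h}$, $\omega = \omega_1 \cup \omega_2$.
$\Gamma_{\omega,1} := \partial\omega \cap \omega_1$, $\Gamma_{\omega,2} := \partial\omega \cap \omega_2$,
$\gamma_\omega := \partial\omega_1 \setminus \partial\omega = \partial\omega_2 \setminus \partial\omega$.
$I := I(\Omega)$ = set of all nodal points in $\Omega$.
$I(\omega) := \{ i \in I : x_i \in \omega \}$.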
FE Galerkin Discretisation
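For $\omega \in T_{\mathcal{T}_h}$ define $f_\omega := (f_i)_{i\in I(\omega)}$, $g_\omega := (g_i)_{i\in I(\partial\omega)}$, $d_\omega := (f_\omega, g_\omega)$.
Let $b_j$, $j = 1, \ldots, N$, be a piecewise linear basis, $V_h := \mathrm{span}\{b_1, \ldots, b_N\}$, $V_h \subset V = H^1(\Omega)$.
Variational Galerkin formulation of (1): find $u_h \in V_h$ such that
$$a_\omega(u_h, b_j) = (f_\omega, b_j)_{L^2(\omega)} \quad \forall j \in I(\omega), \qquad u_h(x_j) = g_j \quad \forall j \in I(\partial\omega), \qquad (3)$$
where
$$a_\omega(b_i, b_j) = \int_\omega \alpha(x)\,(\nabla b_i, \nabla b_j)\,dx, \qquad (f_\omega, b_j) = \int_{\mathrm{supp}\, b_j} f_\omega\, b_j\,dx.$$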
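On a leaf (a single triangle) the bilinear form $a_\omega$ reduces to a $3\times 3$ element matrix. The sketch below computes it for piecewise linear basis functions, assuming a scalar coefficient $\alpha$ that is constant on the triangle. The function name `p1_stiffness` is hypothetical.

```python
import numpy as np

def p1_stiffness(vertices, alpha=1.0):
    """3x3 matrix with entries alpha * area * (grad b_i, grad b_j) on one triangle."""
    v = np.asarray(vertices, dtype=float)          # shape (3, 2): one vertex per row
    M = np.hstack([np.ones((3, 1)), v])            # rows [1, x_i, y_i]
    area = 0.5 * abs(np.linalg.det(M))
    # Column j of inv(M) holds the coefficients of b_j = c0 + c1*x + c2*y,
    # so rows 1 and 2 of inv(M) contain the (constant) gradients of the b_j.
    grads = np.linalg.inv(M)[1:, :]                # shape (2, 3)
    return alpha * area * grads.T @ grads

K = p1_stiffness([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
print(K)
```

Each row of the element matrix sums to zero, because constant functions lie in the kernel of $a_\omega$.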
Main point of HDD
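Main point of HDD is to build the mapping $\Phi_\omega = (\Phi^g_\omega, \Phi^f_\omega)$, where $\Phi^g_\omega : \mathbb{R}^{I(\partial\omega)} \to \mathbb{R}^{I(\gamma_\omega)}$ and $\Phi^f_\omega : \mathbb{R}^{I(\omega)} \to \mathbb{R}^{I(\gamma_\omega)}$ for each $\omega \in T_{\mathcal{T}_h}$.
1. Definition of the mapping $\Phi_\omega := (\Phi^g_\omega, \Phi^f_\omega)$:
$(\Phi_\omega(d_\omega))_i := u_h(x_i)$, $\forall i \in I(\gamma_\omega)$.
Hence, $\Phi_\omega(d_\omega)$ is the trace of $u_h$ on $\gamma_\omega$. Actually, $\Phi_\omega d_\omega = \Phi^g_\omega g_\omega + \Phi^f_\omega f_\omega$.
2. Definition of the auxiliary mapping $\Psi_\omega := (\Psi^g_\omega, \Psi^f_\omega)$:
$\Psi_\omega(d_\omega) = \big((\Psi_\omega(d_\omega))_i\big)_{i\in I(\partial\omega)}$ with $(\Psi_\omega(d_\omega))_i := a_\omega(u_h, b_i) - (f_\omega, b_i)_{L^2(\omega)}$,
$\Psi_\omega d_\omega = \Psi^f_\omega f_\omega + \Psi^g_\omega g_\omega$.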
Construction of the mappings Ψω and Φω
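Lemma 1: Let $\omega_1$ and $\omega_2$ be two sons of $\omega \in T_{\mathcal{T}_h}$. Let $d_{\omega_1}$ and $d_{\omega_2}$ be the data associated to $\omega_1$ and $\omega_2$ s.t.:
• (consistency conditions for the Dirichlet data) $g_{1,i} = g_{2,i}$, $\forall i \in I(\omega_1) \cap I(\omega_2)$,
• (consistency conditions for the right-hand side) $f_{1,i} = f_{2,i}$, $\forall i \in I(\omega_1) \cap I(\omega_2)$.
[Figure: subdomain $\omega$ split into $\omega_1$ and $\omega_2$ with the interface $\gamma_\omega$ and a nodal point $x_j$.]
Let $u_{\omega_1}$ and $u_{\omega_2}$ be the local FE solutions of the problem (3) for the data $d_{\omega_1}$, $d_{\omega_2}$.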
Construction of the mappings Ψω and Φω
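If $u_{\omega_1}$, $u_{\omega_2}$ satisfy
$$\gamma\Psi_{\omega_1}(d_{\omega_1}) + \gamma\Psi_{\omega_2}(d_{\omega_2}) = 0$$
(the prefix $\gamma$ denotes the restriction to $I(\gamma_\omega)$), then $u_\omega$ defined by assembling
$$u_\omega(x_i) := \begin{cases} u_{\omega_1}(x_i) & \text{for } i \in I(\omega_1), \\ u_{\omega_2}(x_i) & \text{for } i \in I(\omega_2) \end{cases}$$
is the solution of (3) for the data $d_\omega = (f_\omega, g_\omega)$ given by
$$(f_\omega)_i := \begin{cases} f_{1,i} & \text{for } i \in I(\omega_1), \\ f_{2,i} & \text{for } i \in I(\omega_2), \end{cases} \qquad (g_\omega)_i := \begin{cases} g_{1,i} & \text{for } i \in I(\partial\omega_1), \\ g_{2,i} & \text{for } i \in I(\partial\omega_2). \end{cases}$$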
Construction of Φω
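Given: $d_1 := d_{\omega_1} = (f_1, g_{1,\Gamma}, g_{1,\gamma})$, where $g_{1,\Gamma} := (g_1)_{i\in I(\Gamma_{\omega,1})}$, $g_{1,\gamma} := (g_1)_{i\in I(\gamma)}$. Then
$$\Psi_{\omega_1} d_1 = \Psi^f_{\omega_1} f_1 + \Psi^\Gamma_{\omega_1} g_{1,\Gamma} + \Psi^\gamma_{\omega_1} g_{1,\gamma},$$
$$\Psi_{\omega_2} d_2 = \Psi^f_{\omega_2} f_2 + \Psi^\Gamma_{\omega_2} g_{2,\Gamma} + \Psi^\gamma_{\omega_2} g_{2,\gamma}.$$
Restricting to $I(\gamma)$, summing, and using $\gamma\Psi_{\omega_1} d_1 + \gamma\Psi_{\omega_2} d_2 = 0$ together with $g_{1,\gamma} = g_{2,\gamma} =: g_\gamma$:
$$\big(\gamma\Psi^\gamma_{\omega_1} + \gamma\Psi^\gamma_{\omega_2}\big)\, g_\gamma = \big(-\Psi^f_{\omega_1} f_1 - \Psi^\Gamma_{\omega_1} g_{1,\Gamma} - \Psi^f_{\omega_2} f_2 - \Psi^\Gamma_{\omega_2} g_{2,\Gamma}\big)\big|_\gamma.$$
We set $M := -\big(\gamma\Psi^\gamma_{\omega_1} + \gamma\Psi^\gamma_{\omega_2}\big)$, compute $M^{-1}$ and solve for $g_\gamma$:
$$g_\gamma = M^{-1}\big(\Psi^f_{\omega_1} f_1 + \Psi^\Gamma_{\omega_1} g_{1,\Gamma} + \Psi^f_{\omega_2} f_2 + \Psi^\Gamma_{\omega_2} g_{2,\Gamma}\big)\big|_\gamma.$$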
HDD consists of two algorithms
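I. Leaves to Root
1. Compute $\Psi_\omega$ for all leaves of $T_{\mathcal{T}_h}$ ($3\times 3$ matrices).
2. Recursion from the leaves to the root (end if $\omega = \Omega$):
2.1 Compute $\Psi_\omega$ and $\Phi_\omega$ from $\Psi_{\omega_1}$, $\Psi_{\omega_2}$.
2.2 Store $\Phi_\omega$ and delete $\Psi_{\omega_1}$, $\Psi_{\omega_2}$.
II. Root to Leaves: application of $\Phi_\omega$
1. Given $d_\omega = (f_\omega, g_\omega)$, compute the solution $u_h$ on the interface $\gamma$ by $\Phi_\omega(d_\omega)$.
2. Build the data $d_{\omega_1} = (f_{\omega_1}, g_{\omega_1})$, $d_{\omega_2} = (f_{\omega_2}, g_{\omega_2})$ from $d_\omega = (f_\omega, g_\omega)$ and $g_\gamma = \Phi_\omega(d_\omega)$.
3. Repeat for sons of $\omega_1$ and $\omega_2$.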
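The two passes can be sketched on a 1D model problem. Everything below is a simplified illustration under stated assumptions: the problem is $-u'' = f$ on $(0,1)$ with linear elements (leaves are $2\times 2$ instead of $3\times 3$ matrices), each interface is a single node, and the right-hand side is folded into a fixed vector `r`, so the $f$-part of $\Phi_\omega$ is a number rather than an operator. Dense NumPy arithmetic replaces the H-matrix arithmetic.

```python
import numpy as np

def leaves_to_root(lo, hi, h, f, phi):
    """Return (S, r) with Psi_omega(g) = S @ g + r on the end nodes (lo, hi) of omega.

    Stores Phi_omega = (interface node, phi_g, phi_f) in the dict phi.
    """
    if hi - lo == 1:                                   # leaf: one element
        S = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h   # element stiffness matrix
        r = -f[lo] * h / 2.0 * np.ones(2)              # minus the element load vector
        return S, r
    mid = (lo + hi) // 2                               # interface node gamma
    S1, r1 = leaves_to_root(lo, mid, h, f, phi)
    S2, r2 = leaves_to_root(mid, hi, h, f, phi)
    m = S1[1, 1] + S2[0, 0]                            # equals -M in the notation above
    phi_g = -np.array([S1[1, 0], S2[0, 1]]) / m        # acts on (g_lo, g_hi)
    phi_f = -(r1[1] + r2[0]) / m                       # contribution of the rhs
    phi[(lo, hi)] = (mid, phi_g, phi_f)
    # Eliminate the interface value to obtain Psi_omega on (lo, hi).
    S = np.array([[S1[0, 0] + S1[0, 1] * phi_g[0], S1[0, 1] * phi_g[1]],
                  [S2[1, 0] * phi_g[0], S2[1, 1] + S2[1, 0] * phi_g[1]]])
    r = np.array([r1[0] + S1[0, 1] * phi_f, r2[1] + S2[1, 0] * phi_f])
    return S, r

def root_to_leaves(lo, hi, u, phi):
    """Given u[lo] and u[hi], fill in the interface value and recurse into the sons."""
    if hi - lo == 1:
        return
    mid, phi_g, phi_f = phi[(lo, hi)]
    u[mid] = phi_g @ np.array([u[lo], u[hi]]) + phi_f
    root_to_leaves(lo, mid, u, phi)
    root_to_leaves(mid, hi, u, phi)

n = 8                                  # number of elements
h = 1.0 / n
f = np.ones(n)                         # f = 1 on every element
phi = {}
leaves_to_root(0, n, h, f, phi)

u = np.zeros(n + 1)                    # Dirichlet data u(0) = u(1) = 0
root_to_leaves(0, n, u, phi)
x = np.linspace(0.0, 1.0, n + 1)
print(np.allclose(u, x * (1.0 - x) / 2.0))
```

In this 1D setting the FE nodal values coincide with the exact solution $x(1-x)/2$, which gives a simple check of both passes.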
HDD in the H-matrix arithmetic
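Exact HDD requires expensive matrix arithmetic.
Let the system of linear equations for $\omega \in T_{\mathcal{T}_h}$ be $Au = Fc$. Rewrite it in the block matrix form:
$$\begin{pmatrix} A_{BB} & A_{BI} \\ A_{IB} & A_{II} \end{pmatrix} \begin{pmatrix} u_B \\ u_I \end{pmatrix} = \begin{pmatrix} F_B \\ F_I \end{pmatrix} c,$$
where $u_B \in \mathbb{R}^{I(\partial\omega)}$, $u_I \in \mathbb{R}^{I(\gamma)}$, $A_{BB} : \mathbb{R}^{I(\partial\omega)} \to \mathbb{R}^{I(\partial\omega)}$, $A_{II} : \mathbb{R}^{I(\gamma)} \to \mathbb{R}^{I(\gamma)}$.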
Eliminate uI via the Schur complement
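$$\begin{pmatrix} A_{BB} - A_{BI}A_{II}^{-1}A_{IB} & 0 \\ A_{IB} & A_{II} \end{pmatrix} \begin{pmatrix} u_B \\ u_I \end{pmatrix} = \begin{pmatrix} F_B - A_{BI}A_{II}^{-1}F_I \\ F_I \end{pmatrix} c.$$
$$(A_{BB} - A_{BI}A_{II}^{-1}A_{IB})\,u_B = (F_B - A_{BI}A_{II}^{-1}F_I)\,c,$$
$$u_I = A_{II}^{-1}F_I\,c - A_{II}^{-1}A_{IB}\,u_B,$$
$\Psi^g_\omega := A_{BB} - A_{BI}A_{II}^{-1}A_{IB}$ (Schur complement),
$\Psi^f_\omega := F_B - A_{BI}A_{II}^{-1}F_I$,
$\Phi^g_\omega := A_{II}^{-1}A_{IB}$,
$\Phi^f_\omega := A_{II}^{-1}F_I$.
Apply the H-matrix techniques.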
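The four block formulas can be checked numerically with dense matrices. The sketch below uses a random symmetric positive definite matrix as a stand-in for $A$ and a random $F$; the sizes and the random data are assumptions for illustration, and dense NumPy arithmetic replaces the H-matrix arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_b, n_i, n_c = 4, 6, 3                      # boundary dofs, eliminated dofs, size of c
n = n_b + n_i

X = rng.standard_normal((n, n))
A = X @ X.T + n * np.eye(n)                  # SPD stand-in for the FE matrix
F = rng.standard_normal((n, n_c))
c = rng.standard_normal(n_c)

A_BB, A_BI = A[:n_b, :n_b], A[:n_b, n_b:]
A_IB, A_II = A[n_b:, :n_b], A[n_b:, n_b:]
F_B, F_I = F[:n_b], F[n_b:]

Phi_g = np.linalg.solve(A_II, A_IB)          # A_II^{-1} A_IB
Phi_f = np.linalg.solve(A_II, F_I)           # A_II^{-1} F_I
Psi_g = A_BB - A_BI @ Phi_g                  # Schur complement
Psi_f = F_B - A_BI @ Phi_f

u = np.linalg.solve(A, F @ c)                # reference: solve the full system
u_B, u_I = u[:n_b], u[n_b:]

print(np.allclose(Psi_g @ u_B, Psi_f @ c))          # reduced system for u_B
print(np.allclose(u_I, Phi_f @ c - Phi_g @ u_B))    # back-substitution for u_I
```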
Rank-k matrices
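$R \in \mathbb{R}^{n\times m}$, $A \in \mathbb{R}^{n\times k}$, $B \in \mathbb{R}^{m\times k}$, $k \ll \min(n,m)$. The storage of $R = AB^T$ is $k(n+m)$ instead of $n \cdot m$ for $R$ represented in the full matrix format.
[Figure: $R = A \cdot B^T$ with $R$ of size $n\times m$, $A$ of size $n\times k$ and $B^T$ of size $k\times m$.]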
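A rank-$k$ factorisation and its storage count can be sketched with a truncated SVD. The kernel $1/|x-y|$ on two well-separated point sets is an assumed test matrix, and the truncated SVD is one common way to obtain the factors $A$ and $B$; neither is stated on the slide.

```python
import numpy as np

def low_rank_factors(R, k):
    """Best rank-k approximation R ~ A @ B.T from a truncated SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = U[:, :k] * s[:k]           # n x k, singular values absorbed into A
    B = Vt[:k].T                   # m x k
    return A, B

x = np.linspace(0.0, 1.0, 200)     # cluster of n = 200 points
y = np.linspace(3.0, 4.0, 150)     # well-separated cluster of m = 150 points
R = 1.0 / np.abs(x[:, None] - y[None, :])

k = 5
A, B = low_rank_factors(R, k)
rel_err = np.linalg.norm(R - A @ B.T) / np.linalg.norm(R)

storage_full = R.size              # n * m entries
storage_low_rank = A.size + B.size # k * (n + m) entries
print(storage_full, storage_low_rank, rel_err)
```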
H-matrices (Hackbusch ’99)
1. Build the cluster tree $T_I$ and the block cluster tree $T_{I\times I}$.
[Figure: cluster tree with root $I$, sons $I_1$, $I_2$ and grandsons $I_{11}$, $I_{12}$, $I_{21}$, $I_{22}$.]
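Admissibility condition
2. For each $(t\times s) \in T_{I\times I}$, $t, s \in T_I$, check the standard admissibility condition
$$\min\{\mathrm{diam}(Q_t), \mathrm{diam}(Q_s)\} \le \eta \cdot \mathrm{dist}(Q_t, Q_s).$$
If adm = true, then $M|_{t\times s}$ is a rank-$k$ matrix block.
If adm = false, then divide $M|_{t\times s}$ further or define it as a dense matrix block, if small enough.
[Figure: clusters $Q_t$ and $Q_s$ at distance dist, and the corresponding block $t\times s$ of the H-matrix.]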
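The standard admissibility check can be written in a few lines. The sketch assumes that $Q_t$ and $Q_s$ are axis-aligned bounding boxes given as pairs of corner points; the function names are hypothetical.

```python
import math

def diam(box):
    """Diameter of an axis-aligned box given as (lower corner, upper corner)."""
    lo, hi = box
    return math.dist(lo, hi)

def dist(box_t, box_s):
    """Euclidean distance between two axis-aligned boxes (0 if they touch or overlap)."""
    (lo_t, hi_t), (lo_s, hi_s) = box_t, box_s
    gaps = [max(0.0, ls - ht, lt - hs)
            for lt, ht, ls, hs in zip(lo_t, hi_t, lo_s, hi_s)]
    return math.hypot(*gaps)

def is_admissible(box_t, box_s, eta=1.0):
    """Standard condition: min(diam(Q_t), diam(Q_s)) <= eta * dist(Q_t, Q_s)."""
    return min(diam(box_t), diam(box_s)) <= eta * dist(box_t, box_s)

Q_t = ((0.0, 0.0), (1.0, 1.0))
Q_far = ((3.0, 0.0), (4.0, 1.0))       # separated by a gap of 2
Q_near = ((1.0, 0.0), (2.0, 1.0))      # shares an edge with Q_t

print(is_admissible(Q_t, Q_far))       # admissible: low-rank block
print(is_admissible(Q_t, Q_near))      # not admissible: subdivide or store dense
```

A smaller $\eta$ makes the condition stricter, so fewer blocks are approximated by low-rank matrices.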
Summary: Grid → cluster tree ($T_I$) + admissibility condition → block cluster tree ($T_{I\times I}$) → H-matrix → H-matrix arithmetic.
[Figure: example of an H-matrix block structure.]
Definition of H-matrices
Definition: $\mathcal{H}(T_{I\times J}, k) := \{ M \in \mathbb{R}^{I\times J} \mid \mathrm{rank}(M|_{t\times s}) \le k$ for all admissible leaves $t\times s$ of $T_{I\times J} \}$.
Let $n := \max(|I|, |J|)$, $d = 1, 2, 3$ be the spatial dimension.

Operation | Sequential complexity (Hackbusch et al. ’99-’06) | Parallel complexity (Kriemann ’05)
storage($M$) | $N = O(kn\log n)$ | $N/q$
$Mx$ | $N = O(kn\log n)$ | $N/q$
$M_1 \oplus M_2$ | $N = O(k^2 n\log n)$ | $N/q$
$M_1 \odot M_2$, $M^{-1}$ | $N = O(k^2 n\log^2 n)$ | $N/q + O(n)$
H-LU | $N = O(k^2 n\log^2 n)$ | $N/q + O(k^2 n\log^2 n / n^{1/d})$
H-matrix conversion | $N = O(k^2 n\log^2 n)$ | $N/q$
H-matrix conversion
[Figure: H-matrix conversion.] (right) An H-matrix approximation to $\Psi^g_\omega$ in the HDD method, $k \le 12$. The weak admissibility condition is used.
H-matrix conversion
[Figure] Matrix $\Psi^f$ with the standard admissibility condition.
[Figure] Building $(\Psi^g_\omega)^{\mathcal{H}} \in \mathbb{R}^{513\times 513}$ from $(\Psi^g_{\omega_1})^{\mathcal{H}}$ and $(\Psi^g_{\omega_2})^{\mathcal{H}} \in \mathbb{R}^{384\times 384}$.
HDD and Multiscales
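HDD with $f_H \in V_H \subset V_h$
Given: $h \ll H$, $f_H \in V_H \subset V_h$,
mappings $\Psi^f_\omega : \mathbb{R}^{I(\omega_h)} \to \mathbb{R}^{I(\partial\omega_h)}$, $\Phi^f_\omega : \mathbb{R}^{I(\omega_h)} \to \mathbb{R}^{I(\gamma_{\omega_h})}$;
want to build $\tilde\Psi^f_\omega : \mathbb{R}^{I(\omega_H)} \to \mathbb{R}^{I(\partial\omega_h)}$, $\tilde\Phi^f_\omega : \mathbb{R}^{I(\omega_H)} \to \mathbb{R}^{I(\gamma_{\omega_h})}$.
$\tilde\Phi^f_\omega = \Phi^f_\omega \cdot P_{h\leftarrow H}$
[Figure: matrix product $A \cdot P = B$.]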
HDD with truncation of the small scales
[Figure] (left) Domain decomposition tree $T_{\mathcal{T}_h}$; (right) $2\sqrt{n_h n_H}$ dofs.
Application: multiscale problems (e.g. the skin problem, porous medium).
Use the microscopic model to extract all microscale details and then compute the macroscale behaviour.
Computational resources for ω ∈ TTh
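Lemma 2: Let $\omega \in T_{\mathcal{T}_h}$, $n := |I(\omega)|$ and $\sqrt{n}$ be the number of dofs on the interface. Then the storage costs and computational complexities of $\Psi^g_\omega$, $\Psi^f_\omega$, $\Phi^g_\omega$, $\Phi^f_\omega$ are as shown in the table.

Mapping | Storage | Computational complexity
$\Psi^g_\omega$ | $O(k\sqrt{n}\log\sqrt{n})^*$ | $O(k^2\sqrt{n}\log^2\sqrt{n})$
$\Psi^f_\omega$ | $O(kn\log n)^*$ | $O(k^2 n\log^2 n)$
$\Phi^g_\omega$ | $O(k\sqrt{n})$ | 0
$\Phi^f_\omega$ | $O(kn\log n)$ | 0

Lemma 3: Application of $\Phi^g_\omega$ costs $O(k^2\sqrt{n})$. Application of $\Phi^f_\omega$ costs $O(k^2 n\log n)$.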
The mean value of the solution in ω
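Lemma 4: Let $\omega, \omega_1, \omega_2 \in T_{\mathcal{T}_h}$ and $\omega = \omega_1 \cup \omega_2$. Let $\lambda_{\omega_i}(d_{\omega_i}) = (\lambda^g_{\omega_i}, g_{\omega_i}) + (\lambda^f_{\omega_i}, f_{\omega_i})$ compute the mean value in $\omega_i$, $i = 1, 2$. Then
$$\lambda_\omega(d_\omega) = (\lambda^f_\omega, f_\omega) + (\lambda^g_\omega, g_\omega)$$
computes the mean value in $\omega$. Here
$\lambda^f_\omega : \mathbb{R}^{I(\omega)} \to \mathbb{R}$, $f_\omega \in \mathbb{R}^{I(\omega)}$,
$\lambda^g_\omega : \mathbb{R}^{I(\partial\omega)} \to \mathbb{R}$, $g_\omega \in \mathbb{R}^{I(\partial\omega)}$,
$\lambda^f_\omega = c_1\lambda^f_{\omega_1} + c_2\lambda^f_{\omega_2}$,
$\lambda^g_\omega = c_1\lambda^g_{\omega_1} + c_2\lambda^g_{\omega_2}$,
$g_\omega$ is built from $g_{\omega_1}$, $g_{\omega_2}$ and $g|_\gamma := \Phi_\omega(d_\omega)$.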
Numerical results
I. Skin Problem in 2D
Elliptic diffusion problem with highly jumping coefficients.
[Figure: panels a and b of the skin problem, showing the lipid layer and the coefficients $\alpha$ and $\beta$.]
[Khoromskij, Wittum 02]
Dependence of the relative error on α
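$\alpha$ | 1.0 | $10^{-1}$ | $10^{-2}$ | $10^{-3}$ | $10^{-4}$ | $10^{-5}$
$\|u-\tilde u\|_2 / \|u\|_2$ | $6.6\cdot10^{-9}$ | $2.0\cdot10^{-8}$ | $6.6\cdot10^{-8}$ | $7.4\cdot10^{-7}$ | $4.2\cdot10^{-6}$ | $7.0\cdot10^{-5}$

$129^2$ dofs, $\varepsilon = 10^{-8}$, $\beta = 1.0$, residual $\|Au - c\| = 10^{-10}$.
$\varepsilon$ is responsible for the H-matrix approximation accuracy.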
Dependence of the absolute and relative errors on ε
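$\varepsilon$ | $\|u-\tilde u\|_2 / \|u\|_2$ | $\|u-\tilde u\|_\infty$ | $\|u-\tilde u\|_A$
$10^{-6}$ | $4.4\cdot10^{-1}$ | $6.67\cdot10^{2}$ | $1.1\cdot10^{3}$
$10^{-8}$ | $7.27\cdot10^{-5}$ | $2.3\cdot10^{-1}$ | $9.0\cdot10^{-1}$
$10^{-10}$ | $5.1\cdot10^{-7}$ | $1.0\cdot10^{-3}$ | $3.0\cdot10^{-3}$
$10^{-12}$ | $3.9\cdot10^{-9}$ | $1.2\cdot10^{-5}$ | $2.9\cdot10^{-5}$
$10^{-14}$ | $1.2\cdot10^{-11}$ | $6.6\cdot10^{-7}$ | $1.2\cdot10^{-7}$
$10^{-16}$ | $1.6\cdot10^{-12}$ | $1.1\cdot10^{-8}$ | $1.7\cdot10^{-8}$

$129^2$ dofs, $\alpha = 10^{-5}$, residual $\|Au - c\| = 10^{-10}$.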
II. Comparison of storage costs for H-Cholesky, HDD and H-matrix inverse (in MB)

$\varepsilon$ | H-$LL^T$ | HDD | $(A^{-1})^{\mathcal{H}}$
$10^{-3}$ | 13.3 | 19.7 | 51.0
$10^{-4}$ | 14.7 | 20.1 | 64.0
$10^{-5}$ | 16.0 | 20.4 | 75.2
$10^{-6}$ | 17.2 | 20.6 | 87.4

$129^2$ dofs.
Computational times
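dofs | HDD | pre, H-$LL^T$, PCG | $(A^{-1})^{\mathcal{H}}$ | pre, H-$LL^T$
$33^2$ | 0.19 | 0.1 | 0.24 | 0.11
$65^2$ | 0.96 | 0.6 | 3.54 | 0.5
$129^2$ | 10.6 | 5 | 65.8 | 4.7
$257^2$ | 36 | 53 | n.e.m. | 50
$513^2$ | 218 | not enough memory | n.e.m. | n.e.m.

Computational times for the skin problem with $\alpha = 10^{-5}$, $\varepsilon = 10^{-8}$, $\|Au - c\| = 10^{-8}$, $H/h = 2$ (n.e.m. = not enough memory).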
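III. Problems with oscillatory coefficients
$$-\mathrm{div}(\alpha\nabla u) = 1 \ \text{in } \Omega \subset \mathbb{R}^2, \qquad u = 0 \ \text{on } \partial\Omega, \qquad (4)$$
where $\alpha = 1 + 0.5\sin(50x)\sin(50y)$.

global $k$ | $\|u_{40} - u_k\|_2 / \|u_{40}\|_2$ | $\|u_{40} - u_k\|_\infty$
2 | 7 | $7\cdot10^{-2}$
4 | $2\cdot10^{-2}$ | $1.8\cdot10^{-3}$
6 | $5.4\cdot10^{-4}$ | $4.5\cdot10^{-5}$
8 | $6.6\cdot10^{-5}$ | $6.3\cdot10^{-6}$
10 | $7.6\cdot10^{-6}$ | $9\cdot10^{-7}$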
Dependence on the frequency w
$w$ | $\|u_{40} - u_k\|_2 / \|u_{40}\|_2$ | $\|u_{40} - u_k\|_\infty$
10 | $1.65\cdot10^{-4}$ | $1.76\cdot10^{-5}$
50 | $1.8\cdot10^{-4}$ | $1.9\cdot10^{-5}$

Table: $257^2$ dofs, $f = 1$, $\alpha(x,y) = 1 + 0.5\sin(wx)\sin(wy)$.
IV. Truncation of the scales < H
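Memory costs of all $\Phi^g_\omega$ and $\Phi^f_\omega$ (in kB). H-matrix rank $k = 7$.

dofs | $\Phi^g$, $H = h$ | $\Phi^g$, $H = 0.125$
$33^2$ | $2.45\cdot10^{2}$ | $2\cdot10^{2}$
$65^2$ | $1.1\cdot10^{3}$ | $7.9\cdot10^{2}$
$129^2$ | $5\cdot10^{3}$ | $2.6\cdot10^{3}$
$257^2$ | $2.1\cdot10^{4}$ | $7.4\cdot10^{3}$

dofs | $\Phi^f$, $H = h$ | $\Phi^f$, $H = 0.125$
$33^2$ | $4\cdot10^{2}$ | $2.8\cdot10^{2}$
$65^2$ | $2.4\cdot10^{3}$ | $1.8\cdot10^{3}$
$129^2$ | $1.4\cdot10^{4}$ | $1.2\cdot10^{4}$
$257^2$ | $7.86\cdot10^{4}$ | $6.9\cdot10^{4}$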
V. Many right-hand sides
$Au^{(i)} = c^{(i)}$, $129^2$ dofs, $c^{(i)}$, $i = 1, \ldots, i_{max}$, PCG method.
“Leaves to Root” ⇒ $t_1$, “Root to Leaves” ⇒ $t_2$.

$i_{max}$ | $t_1 + t_2$, sec. | $t_{cg}$, sec.
10 | 38 + 2.8 | 29
100 | 38 + 27 | 117
1000 | 38 + 240 | 1048

The total computational times of HDD and PCG for $i_{max}$ right-hand sides.
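The trade-off in the table is plain arithmetic: HDD pays the one-off cost $t_1$ and then a small cost per right-hand side. The snippet below only re-adds the numbers reported above; the variable names are illustrative.

```python
t1 = 38.0                                   # one-off "Leaves to Root" time, sec.
# i_max -> (t2, t_cg), both in seconds, taken from the table above
timings = {10: (2.8, 29.0), 100: (27.0, 117.0), 1000: (240.0, 1048.0)}

totals = {}
winner = {}
for i_max, (t2, t_cg) in timings.items():
    totals[i_max] = t1 + t2                 # total HDD time
    winner[i_max] = "HDD" if totals[i_max] < t_cg else "PCG"
    print(i_max, totals[i_max], t_cg, winner[i_max])
```

With these numbers PCG is faster for 10 right-hand sides, while HDD is faster for 100 and 1000.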
Conclusion
1. HDD computes $u_h := B_h f_h + C_h g_h$ or $u_h := B_H f_H + C_h g_h$.
2. $B_h$, $B_H$ and $C_h$ have H-matrix format.
3. The complexities are $O(k^2 n_h \log^3 n_h)$ and $O(k^2\sqrt{n_h n_H}\,\log^3\sqrt{n_h n_H})$.
4. The storages are $O(k n_h \log^2 n_h)$ and $O(k\sqrt{n_h n_H}\,\log^2\sqrt{n_h n_H})$.
5. HDD computes functionals of the solution (mean values $\int_\omega u_h\,dx$, $\omega \subset \Omega$, the solution at a point, the solution in a small subdomain $\omega$).
Thanks to
Prof. W. Hackbusch (for the whole idea and support),
B. N. Khoromskij,
L. Grasedyck and S. Borm (for detailed explanation of the H-matrix technique),
all colleagues at Max Planck Institute for Applied Mathematics and Sciences.
Thanks for your attention!