
DATA-SPARSE ALGORITHMS FOR STRUCTURED MATRICES

    A DISSERTATION

    SUBMITTED TO THE INSTITUTE FOR COMPUTATIONAL

    AND MATHEMATICAL ENGINEERING

    AND THE COMMITTEE ON GRADUATE STUDIES

    OF STANFORD UNIVERSITY

    IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

    FOR THE DEGREE OF

    DOCTOR OF PHILOSOPHY

    Victor Lawrence Minden

    May 2017

This dissertation is online at: http://purl.stanford.edu/nb571rs5647

© 2017 by Victor Lawrence Minden. All Rights Reserved.

Re-distributed by Stanford University under license with the author.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

    Lexing Ying, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

    Eric Darve

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

    George Papanicolaou

    Approved for the Stanford University Committee on Graduate Studies.

    Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

In the first part of the dissertation, we present a method for updating certain hierarchical factorizations for solving linear integral equations with elliptic kernels. In particular, given a factorization corresponding to some initial geometry or material parameters, we can locally perturb the geometry or coefficients and update the initial factorization to reflect this change with asymptotic complexity that is polylogarithmic in the total number of unknowns and linear in the number of perturbed unknowns. We apply our method to the recursive skeletonization factorization and hierarchical interpolative factorization and demonstrate scaling results for a number of different 2D problem setups.

In the second part, we consider the application of hierarchical factorizations to the problem of spatial Gaussian process maximum likelihood estimation, i.e., parameter fitting for kriging. We present a framework for scattered (quasi-)two-dimensional observations using skeletonization factorizations to quickly evaluate the Gaussian process log-likelihood. To efficiently evaluate the gradient of the log-likelihood we introduce two approaches, the first based on matrix peeling and the second based on what we deem selected sparse algebra. This gives a fast, nearly linear time framework for computing both the log-likelihood and its gradient that can be used in the context of black-box numerical optimization for parameter fitting of low-dimensional Gaussian processes.

Finally, we introduce the strong recursive skeletonization factorization (RS-S), a new approximate matrix factorization based on recursive skeletonization for solving discretizations of linear integral equations associated with elliptic partial differential equations in two and three dimensions (and other matrices with similar hierarchical rank structure). Unlike previous skeletonization-based factorizations, RS-S uses a simple modification of skeletonization, strong skeletonization, which compresses only far-field interactions. This leads to an approximate factorization in the form of a product of many block unit-triangular matrices that may be used as a preconditioner or moderate-accuracy direct solver, with dramatically reduced rank growth. We further combine the strong skeletonization procedure with alternating near-field compression to obtain the hybrid recursive skeletonization factorization (RS-WS), a modification of RS-S that exhibits reduced storage cost in many settings. Under suitable rank assumptions both RS-S and RS-WS exhibit linear computational complexity, which we demonstrate with a number of numerical examples.

Acknowledgments

To begin, I would like to acknowledge the mentorship of my advisor, Lexing Ying. Earning a doctorate is a long and (occasionally) arduous process, and I feel like I lucked out in getting to work with a mentor so supportive. During weekly meetings in his office he has offered guidance on everything from big-picture items like overall career direction, research focus, and professional presentation skills to small-scale aspects such as manuscript aesthetics, notation, and profiling and debugging code (among many other things). I don't like to be too effusive, so I'll be brief: thanks. I have learned a lot from you.

After my advisor, I'd like to thank my co-authors on much of my work in graduate school, Ken L. Ho and Anil Damle. I've had many late-night email chains and hours of whiteboard-based discussion with them, and couldn't ask for better colleagues. A well-deserved shout-out as well to others I have worked with on small projects during my graduate career: Phil Colella, Boris Lo, and David Donoho.

I would be remiss if I didn't mention the rest of my committee: Eric Darve, George Papanicolaou, Michael Saunders, and Sanjiva Lele. Eric, George, and Michael, you shaped my first year at ICME, and I'm thankful for the knowledge you've imparted and the opportunities you've given me. Sanjiva, I never had a course with you (which makes you the wildcard of this committee), but thank you for stepping in so readily.

During my time in graduate school, I was supported both by Stanford via a Stanford Graduate Fellowship in Science & Engineering and by the Department of Energy through the Computational Science Graduate Fellowship (CSGF) program (grant number DE-FG02-97ER25308). I certainly appreciate the financial support, but I would also like to thank my CSGF mentors and the CSGF community (and Krell Institute) in general, which taught me a lot about not only the technical details of computational science, but also how to present myself and my work, the interplay of science and civics, and the broader role of computational science in society. Thanks to all, and particularly to Jay Bardhan, Ashlee Ford Versypt, Oliver Fringer, Jeff Hammond, Mary Ann Leung, Dan Martin, Lindsey Eilts, David Keyes, Matt Reuter, and Jim Corones.

All the computational results I obtained during my graduate program were run on ridiculously large computing resources (at least for the time) here at Stanford. Thanks to Lenya Ryzhik for offering time on wave4, Brian Tempero for maintaining all the ICME-specific resources (icme-share, icme-sharedmem, icme-gpu), and the folks at the Stanford Research Computing Center for keeping up the university-wide machines (sherlock, rice, corn, barley).

While I was putting together this dissertation, many folks around ICME offered up chunks of their time for proof-reading and let me bounce my ideas off of them. In particular, I would like to thank Austin Benson, Nolan Skochdopole, Brad Nelson, Ron Estrin, Yingzhou Li, and Xiaotong Suo. Without their keen eyes, this dissertation would have many more typographical errors than it inevitably already does.

Now we turn to personal acknowledgments. I am grateful for the support of my family and friends outside of Stanford: my parents, brother, sisters, brother-in-law, girlfriend, and high school and college friends here around San Francisco. Thanks for helping keep me sane and making sure that my life has at least some non-math-related components. To my parents, in particular: thank you. I wouldn't be here without you.

I wouldn't have made it as far as graduate school without the guidance of many of my professors back at Tufts, specifically my undergraduate advisors Scott MacLachlan and Doug Preis and my math and EE professors Misha Kilmer, Ron Lasser, Usman Khan, Tom Vandervelde, and Eric Miller. I also owe a special thanks to my first-year undergraduate advisor Loring Tu, without whom I likely would have studied international relations or psychology or something else not-math.

Most of my days at Stanford were spent in or around ICME. Thank you to all those responsible for keeping it going day after day: Emily, Antoinette, Claudine, Judy, Karen, and especially Indira and Margot. With ICME in mind, I feel especially thankful for all the friends here who have made my years here fun, educational, and never boring. A shout-out to everyone, but especially Austin, Anil, Sven, Ryan, Yingzhou, Zhiyu, Rikel, Xiaotong, Anjan, Ron, Nolan, Lan, Casey, Arun, Neel, Carson, Laura, Eileen, Brad, Cindy, Nurbek, Ruoxi, Han, Dave, Gil, Evan, Mike, Chris, Konstantin, Dangna, Kari, Milinda, Fei, Fai, Yuekai, Jason, and Tania.

Contents

    Abstract v

    Acknowledgments vii

    1 Introduction 1

    1.1 Solving rank-structured linear systems . . . . . . . . . . . . . . . . . 2

    1.2 Contributions of this thesis . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.2.1 Updating skeletonization factorizations . . . . . . . . . . . . . 5

    1.2.2 Maximum likelihood estimation for Gaussian processes . . . . 6

    1.2.3 Strong-admissibility-based skeletonization . . . . . . . . . . . 7

    2 Background material 9

    2.1 Hierarchical decomposition of space . . . . . . . . . . . . . . . . . . . 10

    2.2 Block-structured elimination . . . . . . . . . . . . . . . . . . . . . . . 12

    2.3 The interpolative decomposition . . . . . . . . . . . . . . . . . . . . . 13

    3 Review of some skeletonization-based factorizations 15

    3.1 Skeletonization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

    3.1.1 Group skeletonization . . . . . . . . . . . . . . . . . . . . . . 18

    3.2 The recursive skeletonization factorization . . . . . . . . . . . . . . . 19

    3.2.1 Level L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2.2 Level L − 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.3 Higher levels . . . . . . . . . . . . . . . . . . . . . . . . . 22

    3.2.4 The use of a proxy surface . . . . . . . . . . . . . . . . . . . . 23

3.2.5 Complexity of RS using the proxy trick . . . . . . . . . . 25

    3.3 The hierarchical interpolative factorization . . . . . . . . . . . . . . . 27

3.3.1 Level L − 1/2 . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.2 Higher levels . . . . . . . . . . . . . . . . . . . . . . . . . 28

    3.3.3 Complexity of HIF using the proxy trick . . . . . . . . . . . . 29

    4 Updating skeletonization-based factorizations 33

    4.1 Approaches to updating . . . . . . . . . . . . . . . . . . . . . . . . . 35

    4.1.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . 36

    4.2 Updating algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

    4.2.1 Initial observation on updating a skeletonization . . . . . . . . 37

    4.2.2 Propagation rules for higher levels . . . . . . . . . . . . . . . . 39

    4.2.3 Updating a group skeletonization . . . . . . . . . . . . . . . . 40

    4.2.4 Updating RS . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    4.2.5 Complexity of updating RS . . . . . . . . . . . . . . . . . . . 44

    4.2.6 Modifications for updating HIF . . . . . . . . . . . . . . . . . 46

    4.2.7 Complexity of updating HIF . . . . . . . . . . . . . . . . . . . 47

    4.3 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.3.1 Example 1: Laplace double-layer potential on a circle with a bump . . . . . . . . . . . . . . . . . . . . . . . . . . 49

    4.3.2 Example 2: the Lippmann-Schwinger equation . . . . . . . . . 50

    4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

    5 Gaussian process MLE through skeletonization 57

    5.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    5.1.1 Alternative approaches . . . . . . . . . . . . . . . . . . . . . . 60

    5.2 Factorization of the covariance matrix . . . . . . . . . . . . . . . . . . 61

    5.2.1 Modified proxy trick . . . . . . . . . . . . . . . . . . . . . . . 62

    5.2.2 Operations for MLE using RS . . . . . . . . . . . . . . . . . . 63

    5.3 Computing the trace terms with peeling . . . . . . . . . . . . . . . . 66

    5.3.1 Randomized low-rank approximations . . . . . . . . . . . . . . 67

    5.3.2 Matrix peeling for weakly-admissible matrices . . . . . . . . . 68

5.3.3 Computational complexity . . . . . . . . . . . . . . . . . 74

    5.4 Summary of MLE framework . . . . . . . . . . . . . . . . . . . . . . 76

    5.5 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

    5.5.1 Runtime scaling of the peeling algorithm . . . . . . . . . . . . 77

    5.5.2 Relative efficiency of peeling versus the Hutchinson estimator . 80

5.5.3 Synthetic data example, Matérn kernel . . . . . . . . . . 82

5.5.4 Ocean data example, Matérn kernel . . . . . . . . . . . . 83

    5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

6 Selected sparse algebra and the product trace for faster Gaussian process MLE 89

    6.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

    6.1.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . 90

    6.1.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

    6.2 Selected sparse algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 92

    6.2.1 Fast application of skeletonization factorizations . . . . . . . . 92

    6.2.2 Selected sparse algebra with RS . . . . . . . . . . . . . . . . . 94

    6.2.3 Modifications for HIF . . . . . . . . . . . . . . . . . . . . . . . 103

    6.3 Computing the product trace . . . . . . . . . . . . . . . . . . . . . . 107

    6.3.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

    6.3.2 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

    6.4 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

    6.4.1 Selected sparse algebra: diagonal extraction . . . . . . . . . . 111

    6.4.2 The full product trace . . . . . . . . . . . . . . . . . . . . . . 115

    6.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

    7 The strong recursive skeletonization factorization 123

    7.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

    7.2 Strong skeletonization . . . . . . . . . . . . . . . . . . . . . . . . . . 125

    7.2.1 The use of a proxy surface in the strong case . . . . . . . . . . 128

    7.3 Algorithm and complexity . . . . . . . . . . . . . . . . . . . . . . . . 130

    7.3.1 The general case: first level . . . . . . . . . . . . . . . . . . . 133

7.3.2 The general case: subsequent levels . . . . . . . . . . . . 138

    7.3.3 The final factorization . . . . . . . . . . . . . . . . . . . . . . 140

    7.3.4 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

    7.3.5 Extension: hybrid skeletonization . . . . . . . . . . . . . . . . 144

    7.4 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    7.4.1 Example 1: unit square in 2D . . . . . . . . . . . . . . . . . . 147

    7.4.2 Example 2: unit cube in 3D . . . . . . . . . . . . . . . . . . . 151

    7.4.3 Example 3: unit sphere in 3D . . . . . . . . . . . . . . . . . . 153

    7.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

List of Tables

    4.1 Timing results for quasi-1D factorization updating . . . . . . . . . . . 51

    4.2 Timing results for 2D factorization updating . . . . . . . . . . . . . . 54

    5.1 Complexity of operations with RS in 2D . . . . . . . . . . . . . . . . 65

    5.2 Runtime and storage complexity of peeling algorithm . . . . . . . . . 76

    5.3 Runtime of peeling algorithm on squared-exponential kernel . . . . . 79

5.4 Runtime of peeling algorithm on Matérn kernel . . . . . . . . . . . . 80

    5.5 Runtime of one iteration for Gaussian process MLE with gridded data 83

    5.6 Runtime of one iteration for Gaussian process MLE with scattered data 85

    6.1 Runtime of selected sparse algebra with RS . . . . . . . . . . . . . . . 112

    6.2 Runtime of selected sparse algebra with HIF . . . . . . . . . . . . . . 114

    6.3 Error of selected sparse algebra with HIF . . . . . . . . . . . . . . . . 114

    6.4 Runtime of product trace for 2D problem . . . . . . . . . . . . . . . . 116

    6.5 Error of product trace for 2D problem . . . . . . . . . . . . . . . . . 117

    6.6 Runtime of product trace for quasi-2D problem . . . . . . . . . . . . 119

    6.7 Error of product trace for quasi-2D problem . . . . . . . . . . . . . . 120

    7.1 Timing and memory results for RS-S with 2D grid . . . . . . . . . . . 150

    7.2 Accuracy results for RS-S with 2D grid . . . . . . . . . . . . . . . . . 150

    7.3 Timing and memory results for RS-S with 3D grid . . . . . . . . . . . 154

    7.4 Accuracy results for RS-S with 3D grid . . . . . . . . . . . . . . . . . 154

    7.5 Timing and memory results for RS-S with quasi-2D sphere . . . . . . 157

    7.6 Accuracy results for RS-S with quasi-2D sphere . . . . . . . . . . . . 157

List of Figures

    1.1 Strong admissibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    2.1 Perfect hierarchical partitioning of space . . . . . . . . . . . . . . . . 10

    2.2 Nonuniform hierarchical partitioning of space . . . . . . . . . . . . . 11

    3.1 Active DOFs for each level of RS . . . . . . . . . . . . . . . . . . . . 23

    3.2 The proxy surface (left) and Voronoi tessellation for HIF (right) . . . 26

    3.3 Active DOFs for each level of HIF . . . . . . . . . . . . . . . . . . . . 30

    4.1 A localized geometric perturbation . . . . . . . . . . . . . . . . . . . 34

    4.2 Propagation of the marked set after a local perturbation . . . . . . . 40

    4.3 Quasi-1D and 2D domains for updating examples . . . . . . . . . . . 49

    4.4 Timing plot for quasi-1D factorization updating . . . . . . . . . . . . 51

    4.5 Timing results for 2D factorization updating . . . . . . . . . . . . . . 53

    5.1 Modified proxy trick for covariance kernels . . . . . . . . . . . . . . . 64

    5.2 Labeling of the quadtree subdomains for peeling . . . . . . . . . . . . 68

    5.3 Realizations of Gaussian processes with two different parameters . . . 79

    5.4 Runtime plots for peeling algorithm versus Hutchinson trace estimator 81

5.5 Relative error plots for peeling versus Hutchinson trace estimator . . 82

    5.6 Timing plots for one iteration of Gaussian process MLE . . . . . . . . 84

    5.7 Visualization of sea surface temperature kriging . . . . . . . . . . . . 87

    6.1 Tree traversals for selected sparse algebra . . . . . . . . . . . . . . . . 102

    6.2 Generalized ancestors for selected sparse algebra with HIF . . . . . . 106

6.3 Results for diagonal extraction with HIF . . . . . . . . . . . . . . . . 115

    6.4 Runtime plots for product trace computation for 2D problem . . . . . 118

    6.5 Plots of the 3D surface for the product trace example . . . . . . . . . 119

    6.6 Runtime plots for product trace computation for quasi-2D problem . 120

    7.1 Near-field and far-field DOFs (left) and proxy surface (right) . . . . . 125

    7.2 Illustration of RS-S in 1D, first level . . . . . . . . . . . . . . . . . . . 132

    7.3 Illustration of RS-S in 1D, second level . . . . . . . . . . . . . . . . . 134

    7.4 A situation illustrating why modified interactions must be compressed 136

    7.5 Illustration of RS-S in 2D, first level . . . . . . . . . . . . . . . . . . . 139

    7.6 Illustration of RS-S in 2D, second level . . . . . . . . . . . . . . . . . 140

    7.7 Timing and memory plots for RS-S with 2D grid . . . . . . . . . . . . 149

    7.8 Timing and memory plots for RS-S with 3D grid . . . . . . . . . . . . 152

    7.9 Timing and memory plots for RS-S with quasi-2D sphere . . . . . . . 156

    7.10 Domain coloring for parallelization of RS-S . . . . . . . . . . . . . . . 159

List of Algorithms

    1 The recursive skeletonization factorization (RS) . . . . . . . . . . . . 24

    2 The hierarchical interpolative factorization (HIF) . . . . . . . . . . . 30

    3 RS in modified form . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    4 Updating RS after a local perturbation . . . . . . . . . . . . . . . . . 42

    5 Computing the Gaussian process log-likelihood and gradient . . . . . 77

6 Applying F for RS to M ∈ C^{n×m} . . . . . . . . . . . . . . . . . . . 95
7 Applying F^{1/2} for RS to M ∈ C^{n×m} . . . . . . . . . . . . . . . . 96
8 Applying (F^{1/2})^* for RS to M ∈ C^{n×m} . . . . . . . . . . . . . . 97
9 Applying F for RS to M ∈ C^{n×m} with input restricted to Ω_j ∈ 𝓛_L and output restricted to Ω_k ∈ 𝓛_L . . . . . . . . . . . . . . . . . . . 104
10 Computing T = Tr(A^{-1}B) using F_A and F_B . . . . . . . . . . . . 109
11 The strong recursive skeletonization factorization (RS-S) . . . . . . . 142

    12 The hybrid recursive skeletonization factorization (RS-WS) . . . . . . 146

Chapter 1

    Introduction

The focus of this dissertation is the development of fast computational algorithms for working with kernel matrices exhibiting various forms of hierarchical block low-rank structure such as arise in a number of settings in the physical sciences and statistics. A standard application-rich example is integral equations coming from elliptic partial differential equations. We discuss this below, deferring discussion of statistical examples to later in this thesis.

In particular, consider the general-form integral equation

\[
a(x)u(x) + b(x)\int_{\Omega} K(x - y)\,c(y)\,u(y)\,dy = f(x), \qquad x \in \Omega \subseteq \mathbb{R}^d, \tag{1.1}
\]

in dimension d = 2 or 3, where the kernel function K(z) is associated with some underlying elliptic partial differential equation (PDE), i.e., it is the Green's function or its derivative. Here, a(x), b(x), and c(y) are given functions that typically represent material parameters, f(x) is some known right-hand side, and u(x) is the unknown function to be determined. We make the additional stipulation that the kernel K(z) should not exhibit significant oscillation away from the origin, though this is not strictly necessary to apply the algorithms outlined in this thesis. In this setting, (1.1) remains rather general and includes problems such as the Laplace equation, the Lippmann-Schwinger equation, and the Helmholtz equation in the low- to moderate-frequency regime. Further, while we concentrate on the case where u(x) is scalar-valued, extension to the vector-valued case (e.g., the Stokes or elasticity equations) is straightforward.

Discretization of (1.1) using typical approaches such as collocation, the Nyström method, or the Galerkin method leads to a linear system with N degrees of freedom (DOFs)

\[
Ku = f, \tag{1.2}
\]

where the entries of the matrix K ∈ C^{N×N} are dictated by the kernel K(z) and the discretization scheme. For example, in the case where our domain is the unit square [0,1]^2, a simple Nyström approximation to the integral using a regular grid with √N points in each direction yields the discrete system

\[
\left[a(x_i) + w_i\right]u_i + \frac{b(x_i)}{N}\sum_{j \neq i} K(x_i - x_j)\,c(x_j)\,u_j = f(x_i), \qquad i = 1, \ldots, N, \tag{1.3}
\]

where the discrete solution {u_i} ≈ {u(x_i)} approximates the continuous solution on the grid and each term w_i u_i corresponds to some discretization of diagonal entries of K. Because K(z) is frequently singular at the origin, this discretization may be more involved than that of the off-diagonal entries. While more complicated and higher-order discretization schemes exist, (1.3) illustrates the key feature that off-diagonal entries of K are given essentially by kernel interactions between distinct points in space. A toy assembly of (1.3) appears below.
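The following minimal Python sketch makes (1.3) concrete. It assumes, purely for illustration, the 2D Laplace kernel K(z) = −log‖z‖/(2π), constant coefficients a = b = c = 1 by default, and the crude choice w_i = 0 for the kernel-dependent diagonal correction; none of these choices are prescribed by the text.

    import numpy as np

    def nystrom_matrix(n, a=lambda x: 1.0, b=lambda x: 1.0, c=lambda x: 1.0):
        # Regular sqrt(N)-by-sqrt(N) grid of N = n*n points on [0,1]^2.
        h = 1.0 / n
        grid = (np.arange(n) + 0.5) * h
        X = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)
        N = X.shape[0]
        r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(r, 1.0)                  # placeholder; diagonal set below
        K = -np.log(r) / (2.0 * np.pi)            # kernel interactions K(x_i - x_j)
        bs = np.array([b(x) for x in X])
        cs = np.array([c(x) for x in X])
        A = (bs[:, None] / N) * K * cs[None, :]   # quadrature weight 1/N per point
        np.fill_diagonal(A, [a(x) for x in X])    # [a(x_i) + w_i] u_i with w_i = 0
        return A

    # Example: assemble the dense discrete system matrix for N = 400 unknowns.
    A = nystrom_matrix(20)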

1.1 Solving rank-structured linear systems

Because K in (1.2) is dense and generally large in practice, traditional direct factorizations of K such as the LU factorization are typically too expensive due to the associated O(N^3) time complexity and O(N^2) storage cost.

Figure 1.1: Given two boxes in R^2, each with sidelength D and with corresponding DOF sets B_1 and B_2, in the strong admissibility setting the associated off-diagonal blocks K_{B_1B_2} and K_{B_2B_1} are assumed to be numerically low-rank as long as the boxes are separated by a distance of at least D. In contrast, in the weak admissibility setting the boxes need only be non-overlapping.

Given the availability of fast schemes for applying K such as fast multipole methods (FMMs) [24, 30, 32, 84], iterative methods such as the conjugate gradient method (CG) [40] form a tempting alternative to direct methods. For first-kind integral equations or problems where a(x), b(x), or c(x) exhibit high contrast, however, convergence is typically slow, leading to a lack of robustness. In other words, while each iteration is relatively fast, the number of iterations necessary to attain reasonable accuracies can be unreasonably large.

The above considerations have led to the development of a plethora of alternative methods for solving (1.2) approximately by exploiting properties of the kernel K(z) and the underlying physical structure of the problem. In particular, such methods take advantage of the fact that K exhibits hierarchical block low-rank structure. A large body of work pioneered by Hackbusch and collaborators on the algebra of H-matrices (and H^2-matrices) provides an important and principled theoretical framework for obtaining linear or quasilinear complexity when working with matrices exhibiting such structure [34, 36, 37]. Inside the asymptotic scaling of this approach, however, lurk large constant factors that hamper practical performance, particularly in the 3D case.

The H-matrix literature classifies matrices with hierarchical block low-rank structure into two categories based on which off-diagonal blocks of the matrix are compressed. Given a quadtree or octree data structure partitioning the domain into small boxes,¹ let B_1 and B_2 be sets of DOFs corresponding to distinct boxes at the same level of the tree, each with sidelength D. For strongly-admissible hierarchical matrices, the off-diagonal block K_{B_1B_2} is compressed only if B_1 and B_2 are well-separated as in the FMM, that is, if B_1 and B_2 are separated by a distance of at least D as in Figure 1.1. In contrast, weakly-admissible hierarchical matrices compress not only well-separated interactions but also interactions corresponding to DOFs in adjacent boxes. The inclusion of nearby interactions under weak admissibility typically increases the required approximation rank, but it also affords a much simpler geometric and algorithmic structure; a toy distance check contrasting the two criteria appears below.
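The following sketch contrasts the two criteria for same-level, axis-aligned boxes given by their lower-left corners. The L∞ (Chebyshev) box distance is used for simplicity, and the function names are ours, not the literature's.

    import numpy as np

    def strongly_admissible(corner1, corner2, D):
        # Well-separated as in the FMM: the boxes are at least a distance D
        # apart, i.e., the largest per-axis gap |delta| - D is at least D.
        delta = np.abs(np.asarray(corner1) - np.asarray(corner2))
        return np.max(delta - D) >= D

    def weakly_admissible(corner1, corner2, D):
        # Merely non-overlapping: distinct (possibly adjacent) same-level boxes.
        delta = np.abs(np.asarray(corner1) - np.asarray(corner2))
        return np.max(delta) >= D

    # Two sidelength-1 boxes sharing an edge: weakly but not strongly admissible.
    print(weakly_admissible([0, 0], [1, 0], 1.0),
          strongly_admissible([0, 0], [1, 0], 1.0))   # True False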

A number of more recent methods have been developed for hierarchically rank-structured matrices with the aim of more efficient practical performance based on weakly-admissible rank structure. Examples include algorithms for hierarchical semiseparable (HSS) matrices [13, 14, 82], hierarchical off-diagonal low-rank (HODLR) matrices [1, 52], and methods based on recursive skeletonization [27, 41, 53], among other related schemes [11, 15]. In general, methods based strictly on weak admissibility require allowing ranks of off-diagonal blocks to grow non-negligibly with N to attain a fixed target accuracy. This has led to the development of more involved methods such as the hierarchical interpolative factorization of Ho & Ying [43] and the method of Corona et al. [17], which combine skeletonization with additional compression steps based on geometric considerations to obtain greater efficiency at the cost of a more complicated algorithm.

The skeletonization process that forms the core of the skeletonization-based methods used and developed in this thesis was introduced by Martinsson & Rokhlin [53] and Cheng et al. [16] based on observations by Starr & Rokhlin [67] and Greengard & Rokhlin [31]. At their core, the methods attain fast compression using a so-called interpolative decomposition to perform low-rank compression of off-diagonal blocks of K without modifying many other entries. This important feature means that such algorithms preserve interpretability of rows and columns of the partially-compressed matrix, leading to intuitive and efficient algorithms. In this context, we refer to such factorizations as data-sparse.

¹We review this in section 2.1.

1.2 Contributions of this thesis

To begin, we spend a small amount of time in Chapter 2 reviewing some very basic material underlying hierarchical factorizations, particularly some terminology concerning hierarchical tree-based decompositions of space and block linear algebra. We additionally review the notion of an interpolative decomposition, the compression technique of choice for our algorithms.

We then delve into some of the more complicated material on which the work of this thesis is based in Chapter 3. First, we review the recursive skeletonization factorization, which serves as the prototypical example of a skeletonization-based factorization. This serves to introduce the general idea of the method of skeletonization, as well as how to apply skeletonization in a level-by-level fashion to obtain a factorization for discretized integral equations. Following this, we review the hierarchical interpolative factorization. Based on similar ideas, this modification of the recursive skeletonization factorization introduces additional geometric information beyond the tree-based decomposition to obtain better asymptotic complexity for problems with higher intrinsic dimension.

With the background material and notation established, we then dive into the core contributions of the thesis, which are broken into four different chapters and three subject areas. These chapters, which we preview below, are based on joint work with Ken L. Ho, Anil Damle, and Lexing Ying. In all cases, the corresponding papers were the primary work of the dissertation author, including design and analysis of methods, coding and numerical simulation, and manuscript writing and editing [57-59].

1.2.1 Updating skeletonization factorizations

The primary focus in the existing literature for hierarchical matrix factorizations has been on the speed with which these factorizations can be constructed and subsequently used to solve linear systems, either via preconditioning an iterative method (at low accuracies) or directly (at high accuracies). The structure of these factorizations, however, admits a number of other efficient operations that can be useful in practical settings.

We exploit the same hierarchical structure that makes these factorizations useful to explore the idea of updating a factorization in response to a local perturbation. By this we mean: given a factorization F that corresponds to, for example, a discretization of an integral equation on some geometric domain, how do we construct a modified factorization F̌ corresponding to the same integral equation on a slightly modified geometry? We can of course simply construct a brand-new factorization using the standard algorithm, but as we observe in Chapter 4 there are many cases where we can update this factorization in an asymptotically (and practically) more efficient way [58].

We describe our approach in the context of sequences of modifications to two-dimensional integral equations, such as might arise, for example, in the context of a design problem where the optimization variables parameterize a portion of the domain and the objective function involves quantities governed by, for example, diffusion or low-frequency scattering. However, the methods we describe are also highly applicable to the case of kernelized covariance matrices, and allow fast updated factorizations in response to, for example, the addition or removal of observations.

1.2.2 Maximum likelihood estimation for Gaussian processes

Based on the observation that skeletonization-based factorizations give fast ways to compute quantities such as the log-determinant, we next turn to an application of such factorizations: parameterized Gaussian processes. Low-dimensional Gaussian processes have wide applicability in the geosciences in the context of kriging, where they are used to predict the value of some spatially-indexed quantity of interest given observations at other locations [69]. Many covariance kernels of interest are structurally similar to the Green's functions of elliptic PDEs in the sense that, empirically, they exhibit similar hierarchical rank structure and admit the use of hierarchical factorizations.

In Chapter 5, we consider the task of maximum likelihood estimation for parameterized Gaussian processes: given some observations of a spatial field that are assumed to be modeled by a Gaussian process whose covariance kernel is specified up to some parameter, find the value of that parameter that maximizes the Gaussian process log-likelihood. This can be a computationally-demanding task due to the fact that the log-determinant of the covariance matrix (and the derivative of said quantity) both play a role, but we demonstrate that skeletonization-based factorizations can be an efficient method by outlining a general framework based on these ideas [57]. The standard expressions we have in mind are recalled below.
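For concreteness (in generic notation, not necessarily that of Chapter 5): for a zero-mean Gaussian process with observation vector z and parameterized covariance matrix Σ(θ), the standard log-likelihood and gradient expressions making the log-determinant and trace terms explicit are

\[
\log p(z \mid \theta) = -\tfrac{1}{2}\, z^{\top}\Sigma(\theta)^{-1}z \;-\; \tfrac{1}{2}\log\det\Sigma(\theta) \;-\; \tfrac{N}{2}\log 2\pi,
\]
\[
\frac{\partial}{\partial\theta_i}\log p(z \mid \theta) = \tfrac{1}{2}\, z^{\top}\Sigma^{-1}\Sigma_i\,\Sigma^{-1}z \;-\; \tfrac{1}{2}\operatorname{Tr}\!\left(\Sigma^{-1}\Sigma_i\right), \qquad \Sigma_i \equiv \frac{\partial\Sigma(\theta)}{\partial\theta_i}.
\]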

In follow-up work, we explore more deeply in Chapter 6 some of the intricacies of the structure of skeletonization-based factorizations, with an eye toward obtaining an optimally efficient method for computing the gradient of the log-likelihood to high accuracy. To accomplish this, we introduce the idea of selected sparse algebra for skeletonization-based factorizations, which, like updating, uses the hierarchical structure of skeletonization in a nuanced way to perform computations that are inherently data-sparse. For example, we consider the task of extracting the diagonal of F^{-1} when F is a skeletonization-based factorization. Based on these techniques, we outline a fast method for computing the product trace Tr(Σ^{-1}Σ_i) appearing in the gradient of the Gaussian process log-likelihood, which substantially improves upon the method of Chapter 5.

1.2.3 Strong-admissibility-based skeletonization

Finally, in Chapter 7 we build upon the recursive skeletonization factorization [43, 53] to construct a new skeletonization-based factorization using strong admissibility [59]. Succinctly, previous factorizations based on skeletonization are based on the idea of compressing blocks of a matrix corresponding to kernel interactions between distinct convex subdomains: the case of weak admissibility. In contrast, we use a modified skeletonization process to restrict compression only to blocks of a matrix corresponding to kernel interactions between well-separated subdomains, i.e., strong admissibility.

While previous skeletonization-based algorithms have used iterative dimensionality reduction (i.e., more complicated algorithms) to get at the asymptotic efficiency of algorithms based on strong admissibility while maintaining the practical efficiency of skeletonization, our strong-admissibility-based skeletonization factorization is the first to treat this case directly with skeletonization, leading to a simpler algorithm that can be thought of as a linear algebraic inverse to the fast multipole method [59]. We find our algorithm to be relatively simple and easy to understand and implement compared to competing approaches, with competitive performance.

Chapter 2

Background material

Because of the different application and algorithmic particulars discussed in different chapters, we will occasionally use different notation to express the same quantities. This is for the sake of clarity within each chapter. We attempt to mention explicitly any overloaded notation at the point it arises. In general, however, we follow the following notational conventions as closely as possible.

For a positive integer N, the index set {1, 2, ..., N} is denoted by [N]. We write matrices or matrix-valued functions in the sans serif font (e.g., A ∈ C^{N×N}) but make no such distinction for vectors (e.g., x ∈ C^N). Given a vector or matrix, the norms ‖x‖ or ‖A‖ refer to the standard Euclidean vector norm and corresponding induced matrix norm, respectively. The math-calligraphic font is used to indicate index sets (e.g., I = {i_1, i_2, ..., i_r} with each i_j a positive integer) that we use to index blocks of a matrix (e.g., A_{IJ} = A(I, J) ∈ C^{|I|×|J|}, using MATLAB notation). Therefore, each index set has an implicit ordering, though we use the term "set" as opposed to "vector" to avoid conflation. Because we are working with matrices discretizing integral equations, indices in an index set are typically associated with points in R^d (e.g., Nyström or collocation points or centroids of elements). As such, we will use the more general term "DOF sets" to refer to both the index set B and the corresponding points {x_i}_{i∈B} in R^d. This leads to one notation that will appear strange at first blush: if I is an index set and Ω_i ⊂ R^d is a subdomain, then I ∩ Ω_i is well-defined when understood in this sense. Finally, to denote ordered sets that are neither associated with points in the domain nor used to index matrices, we use the math-script font (e.g., 𝓛).

Figure 2.1: In a perfect quadtree, each level ℓ has 4^ℓ subdomains. Here we visualize ℓ = 1 (left) and ℓ = 2 (right).

2.1 Hierarchical decomposition of space

A key data structure for factorizations based on exploiting spatial structure is a tree-based hierarchical decomposition of space, which hierarchically partitions the domain Ω. Here, we briefly review the use of a quadtree (octree in 3D) and the corresponding terminology, but direct the reader to Samet [65] for a more thorough description.

Suppose for simplicity that the domain Ω is a rectangle in R^2 and define the root of the quadtree as Ω itself with corresponding index set [N], denoting the fact that {x_j}_{j∈[N]} ⊂ Ω. At level ℓ = 1, we partition Ω uniformly into four child subdomains, and at level ℓ = 2 we recursively further partition each of these subdomains into four more subdomains. This leads to the decomposition shown in Figure 2.1. For each subdomain Ω_i, the corresponding index set I_i ⊆ [N] is the largest possible such that {x_j}_{j∈I_i} ⊂ Ω_i.

It is evident that we may continue to subdivide the domain in this fashion an arbitrary number of times, such that at level ℓ we have 4^ℓ subdomains. Subdividing uniformly such that the tree has levels ℓ = 0, 1, ..., L leads to a perfect quadtree with 4^L subdomains at the bottom level. The three-dimensional case of an octree is analogous, except that the domain is divided into rectangular prisms instead of rectangles, leading to 8^ℓ subdomains at level ℓ.

Figure 2.2: Suppose that our integral equation is posed on the boundary of the black region (left). Using a regular discretization of the boundary and building an adaptive quadtree on the resulting DOFs, we can visualize the spatial hierarchy (right), where each square leaf subdomain Ω_i is colored according to its level, ℓ.

In the entirety of this work, we will visualize examples in terms of a perfect quadtree, which is what our spatial hierarchy will look like if our points {x_j}_{j∈[N]} form a regular grid in space. However, most problems are not discretized on uniform grids, and the algorithms we describe do not rely on this assumption. Instead, an adaptive quadtree (or octree) is constructed, such that subdomains that have sufficiently few points are not subdivided. Concretely, given an occupancy parameter n_occ, we only subdivide subdomains Ω_i such that |I_i| > n_occ, which leads to a partitioning such as is visualized in Figure 2.2. A minimal code sketch of this construction follows.
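Here is a minimal adaptive-quadtree sketch in Python; the class and function names are illustrative only, and boundary ties between sibling boxes are ignored for brevity.

    import numpy as np

    class Box:
        def __init__(self, center, width, idx, level):
            self.center, self.width = np.asarray(center), width
            self.idx = idx           # the index set I_i of points in this subdomain
            self.level = level
            self.children = []       # empty for leaf subdomains

    def build_quadtree(points, center, width, idx, level, n_occ):
        box = Box(center, width, idx, level)
        if len(idx) <= n_occ:        # occupancy condition: stop subdividing
            return box
        for dx in (-0.25, 0.25):
            for dy in (-0.25, 0.25):
                c = box.center + width * np.array([dx, dy])
                inside = np.all(np.abs(points[idx] - c) <= width / 4, axis=1)
                box.children.append(build_quadtree(
                    points, c, width / 2, idx[inside], level + 1, n_occ))
        return box

    # Example: 1000 points in the unit square, at most n_occ = 32 per leaf.
    pts = np.random.rand(1000, 2)
    root = build_quadtree(pts, center=[0.5, 0.5], width=1.0,
                          idx=np.arange(1000), level=0, n_occ=32)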

We define the following notions on a hierarchical tree-based partitioning of space.

• For any given subdomain Ω_i at level ℓ, the children of Ω_i are all subdomains Ω_j at level ℓ + 1 such that Ω_j ⊂ Ω_i. We write the collection of all children of Ω_i as child(Ω_i). We call Ω_i the parent of Ω_j and write Ω_i = parent(Ω_j).

• If child(Ω_i) = ∅ then we say Ω_i is a leaf subdomain.

• We say that Ω_i and Ω_j are siblings if parent(Ω_i) = parent(Ω_j).

• We say that Ω_j at level ℓ′ is a neighbor of Ω_i at level ℓ if (i) the subdomain Ω_j is adjacent to Ω_i and (ii) either ℓ′ = ℓ, or ℓ′ < ℓ and child(Ω_j) = ∅. We write the collection of all neighbors of Ω_i as nbor(Ω_i). Note that this is not a symmetric relation, i.e., it is possible that Ω_j ∈ nbor(Ω_i) and Ω_i ∉ nbor(Ω_j) due to the dependence on ℓ and ℓ′.

• The collection of ancestors of a subdomain Ω_i is defined recursively as the set of all Ω_j such that either Ω_j = parent(Ω_i) or Ω_j = parent(Ω_k) for some Ω_k that is an ancestor of Ω_i. We write the collection of ancestors as anc(Ω_i).

• The descendants of a subdomain Ω_i are defined to be all nodes Ω_j such that Ω_i ∈ anc(Ω_j). We write the collection of descendants as desc(Ω_i).

In our algorithms, we typically require a fixed but arbitrary bottom-up level-by-level traversal of the tree and order the subdomains accordingly such that a subdomain at level L is ordered before any subdomain at level L − 1 and so on. This total ordering of the subdomains induces corresponding orderings on the boxes within each level of the tree, 𝓛_ℓ for ℓ = 1, ..., L. For example, in the case of a regular tree with 2^{dℓ} subdomains at level ℓ we obtain the orderings

\[
\mathscr{L}_L = \left\{ 1, 2, \ldots, 2^{dL} \right\}, \qquad
\mathscr{L}_{L-1} = \left\{ 2^{dL} + 1,\; 2^{dL} + 2,\; \ldots,\; 2^{dL} + 2^{d(L-1)} \right\},
\]

and so on. A small traversal helper in this spirit appears below.
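The following helper, written against the hypothetical Box class of the quadtree sketch above, collects boxes level by level in the bottom-up order just described.

    def levels_bottom_up(root):
        # Group boxes by level, then return [level L, level L-1, ..., level 0].
        by_level, stack = {}, [root]
        while stack:
            box = stack.pop()
            by_level.setdefault(box.level, []).append(box)
            stack.extend(box.children)
        L = max(by_level)
        return [by_level[l] for l in range(L, -1, -1)]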

2.2 Block-structured elimination

We begin with a brief review of block-structured elimination and its efficiency, which is central to the skeletonization algorithm.

Let A ∈ C^{N×N} be an N × N matrix and suppose [N] = I ∪ J ∪ K is a partition of the index set of A such that both A_{IK} = 0 and A_{KI} = 0, i.e., we have the block structure

\[
A = \begin{bmatrix} A_{II} & A_{IJ} & \\ A_{JI} & A_{JJ} & A_{JK} \\ & A_{KJ} & A_{KK} \end{bmatrix},
\]

up to permutation. Assuming that the block A_{II} is invertible, the DOFs I can be decoupled as follows. First, define the matrices L and U as

\[
L = \begin{bmatrix} I & & \\ -A_{JI}A_{II}^{-1} & I & \\ & & I \end{bmatrix}, \qquad
U = \begin{bmatrix} I & -A_{II}^{-1}A_{IJ} & \\ & I & \\ & & I \end{bmatrix}, \tag{2.1}
\]

with the same block partitioning as A. Then, applying these operators on the left and right of A yields

\[
LAU = \begin{bmatrix} A_{II} & & \\ & S_{JJ} & A_{JK} \\ & A_{KJ} & A_{KK} \end{bmatrix}, \tag{2.2}
\]

where S_{JJ} = A_{JJ} − A_{JI}A_{II}^{-1}A_{IJ} is the only nonzero block of the resulting matrix that has been modified.

We say that S_{JJ} is related to A_{JJ} through a Schur complement update. Note that, while we choose here to write block elimination in its simplest form, in practice it can be numerically advantageous to work with a factorization of A_{II} as is done by Ho & Ying [43, Lemma 2.1] as opposed to inverting the submatrix directly. Either way, the cost of computing S_{JJ} is O(|I|^3 + |I| |J|^2). A short numerical check of (2.2) follows.
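This small sketch verifies (2.1)-(2.2) on a random matrix with the required zero blocks: the elimination decouples the DOFs I and produces the Schur complement S_JJ. Block sizes are arbitrary.

    import numpy as np

    nI, nJ, nK = 3, 4, 5
    rng = np.random.default_rng(0)
    A = rng.standard_normal((nI + nJ + nK,) * 2)
    A[:nI, nI + nJ:] = 0.0                       # A_IK = 0
    A[nI + nJ:, :nI] = 0.0                       # A_KI = 0

    AII_inv = np.linalg.inv(A[:nI, :nI])
    L = np.eye(nI + nJ + nK)
    L[nI:nI + nJ, :nI] = -A[nI:nI + nJ, :nI] @ AII_inv
    U = np.eye(nI + nJ + nK)
    U[:nI, nI:nI + nJ] = -AII_inv @ A[:nI, nI:nI + nJ]

    B = L @ A @ U                                # right-hand side of (2.2)
    S_JJ = A[nI:nI + nJ, nI:nI + nJ] \
         - A[nI:nI + nJ, :nI] @ AII_inv @ A[:nI, nI:nI + nJ]
    assert np.allclose(B[nI:nI + nJ, nI:nI + nJ], S_JJ)  # Schur complement block
    assert np.allclose(B[:nI, nI:], 0) and np.allclose(B[nI:, :nI], 0)  # I decoupled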

2.3 The interpolative decomposition

Another key linear algebra tool of which we will make heavy use is the interpolative decomposition [16].

Definition 2.1. Given both a matrix A_{IJ} ∈ C^{|I|×|J|} with rows indexed by I and columns indexed by J and a tolerance ε > 0, an ε-accurate interpolative decomposition (ID) of A_{IJ} is a partitioning of J into DOF sets associated with so-called skeleton columns S ⊂ J and redundant columns R = J \ S and a corresponding interpolation matrix T such that

\[
\left\| A_{IR} - A_{IS}T \right\| \le \epsilon \left\| A_{IJ} \right\|,
\]

or equivalently, assuming A_{IJ} = [\,A_{IR}\ \ A_{IS}\,],

\[
\left\| A_{IJ} - A_{IS}\begin{bmatrix} T & I \end{bmatrix} \right\| \le \epsilon \left\| A_{IJ} \right\|.
\]

In other words, the redundant columns are approximated as a linear combination of the skeleton columns to within the prescribed relative accuracy, leading to a low-rank factorization of A_{IJ}.

Note that, while the ID error bound can be attained trivially by taking S = J, it is desirable to keep |S| as small as possible. The typical algorithm to compute an ID uses a strong rank-revealing QR factorization as detailed by Gu & Eisenstat [33], though in practice a standard greedy column-pivoted QR tends to be sufficient. In either case, the computational complexity is O(|I| |J|^2). In the case where the entries of A_{IJ} are given in terms of K(x_i − x_j) for x_i ∈ I and x_j ∈ J (with additional factors depending only on x_i and x_j alone), the ID of A_{IJ} has the nice property that it preserves this data-sparse representation. In particular, by forcing the interpolation basis to be a subset of columns S ⊂ J, we maintain the property that A_{IS} is given in terms of kernel interactions.

In what follows, we will occasionally use the notation

\[
[S, R, T_{J}] = \mathrm{id}(A, J, \epsilon) \tag{2.3}
\]

to denote a function that returns the relevant pieces of an ε-accurate ID of a matrix A. A pivoted-QR sketch of such a function follows.
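Below is a sketch of id(A, J, ε) using greedy column-pivoted QR, the practical choice mentioned above (a strong rank-revealing QR would replace it for robustness). Here J is taken to be all columns of A, and the diagonal of R is used as a cheap proxy for the norm bound of Definition 2.1.

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def interp_decomp(A, eps):
        Q, R, perm = qr(A, pivoting=True, mode='economic')
        diag = np.abs(np.diag(R))
        k = max(1, int(np.searchsorted(-diag, -eps * diag[0])))  # numerical rank
        S, Rd = perm[:k], perm[k:]                 # skeleton / redundant columns
        T = solve_triangular(R[:k, :k], R[:k, k:])  # so A[:, Rd] ~ A[:, S] @ T
        return S, Rd, T

    # Example on a numerically rank-3 matrix.
    M = np.random.rand(50, 3) @ np.random.rand(3, 40)
    S, Rd, T = interp_decomp(M, 1e-10)
    print(len(S), np.linalg.norm(M[:, Rd] - M[:, S] @ T))  # rank ~3, tiny error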

Chapter 3

Review of some skeletonization-based factorizations

With the previous definitions in tow, we now review the key compression technique used in this thesis: skeletonization. In particular, what we describe here was introduced by Ho & Ying [43] to cast the (additive) skeletonization process of Martinsson & Rokhlin [53] as a multiplicative factorization.

3.1 Skeletonization

Consider a matrix A ∈ C^{N×N} indexed by points in our domain, and let B be a set of DOFs corresponding to a leaf box at the finest level of the spatial decomposition. With B^c as the relative complement of B in [N], we have [N] = B ∪ B^c by definition, so under an appropriate permutation P we can write A in block form as

\[
PAP^* = \begin{bmatrix} A_{BB} & A_{BB^c} \\ A_{B^cB} & A_{B^cB^c} \end{bmatrix}. \tag{3.1}
\]

We proceed by assuming that A_{BB^c} and A_{B^cB} are numerically low-rank to some specified tolerance ε. We partition B into its redundant and skeleton DOFs B = R ∪ S according to the ID

\[
\begin{bmatrix} A_{B^cB} \\ A_{BB^c}^* \end{bmatrix}
= \begin{bmatrix} A_{B^cR} & A_{B^cS} \\ A_{RB^c}^* & A_{SB^c}^* \end{bmatrix}
\approx \begin{bmatrix} A_{B^cS} \\ A_{SB^c}^* \end{bmatrix}
\begin{bmatrix} T & I \end{bmatrix},
\]

where we have assumed that the redundant DOFs R are ordered first within B for pedagogical purposes such that no further permutation is necessary. Inserting this ID into (3.1) by explicitly partitioning B into R ∪ S, we obtain

\[
PAP^* \approx \begin{bmatrix} A_{RR} & A_{RS} & T^*A_{SB^c} \\ A_{SR} & A_{SS} & A_{SB^c} \\ A_{B^cS}T & A_{B^cS} & A_{B^cB^c} \end{bmatrix}.
\]

Defining the elimination matrices

\[
U_T = \begin{bmatrix} I & -T^* & \\ & I & \\ & & I \end{bmatrix}, \qquad
L_T = \begin{bmatrix} I & & \\ -T & I & \\ & & I \end{bmatrix} \tag{3.2}
\]

and multiplying on the left and right as appropriate, we obtain

\[
U_T \begin{bmatrix} A_{RR} & A_{RS} & T^*A_{SB^c} \\ A_{SR} & A_{SS} & A_{SB^c} \\ A_{B^cS}T & A_{B^cS} & A_{B^cB^c} \end{bmatrix} L_T
= \begin{bmatrix} X_{RR} & X_{RS} & \\ X_{SR} & A_{SS} & A_{SB^c} \\ & A_{B^cS} & A_{B^cB^c} \end{bmatrix},
\]

where

\[
\begin{aligned}
X_{RS} &= A_{RS} - T^*A_{SS}, \\
X_{SR} &= A_{SR} - A_{SS}T, \\
X_{RR} &= A_{RR} - T^*A_{SR} - A_{RS}T + T^*A_{SS}T
\end{aligned}
\]

are modified nonzero blocks arising due to the block elimination.

Following section 2.2 with I = R, J = S, and K = B^c, we use X_{RR} as a pivot block to eliminate the other blocks in the first row and column to obtain

\[
L\,U_T\,PAP^*\,L_T\,U \approx \begin{bmatrix} X_{RR} & & \\ & X_{SS} & A_{SB^c} \\ & A_{B^cS} & A_{B^cB^c} \end{bmatrix} \equiv \tilde{Z}(A; B), \tag{3.3}
\]

with appropriate definition of L and U as in (2.1), so that X_{SS} = A_{SS} − X_{SR}X_{RR}^{-1}X_{RS} is the corresponding Schur complement. At this point, the redundant DOFs are completely decoupled from the rest of the problem, and the last block row and block column have not been otherwise modified.

We refer to the process of forming \tilde{Z}(A; B) as skeletonization of A with respect to the DOFs B. Notationally, we define the left and right skeletonization operators \tilde{V} and \tilde{W} as

\[
\tilde{V} = P^*\,U_T^{-1}\,L^{-1}, \qquad \tilde{W} = U^{-1}\,L_T^{-1}\,P. \tag{3.4}
\]

Using this shorthand, we have

\[
\tilde{Z}(A; B) \approx \tilde{V}^{-1} A\, \tilde{W}^{-1}. \tag{3.5}
\]

Clearly the matrices \tilde{V} and \tilde{W} are highly structured, since they are each the product of block unit-triangular matrices with one non-trivial block each. This means that working with them in block form explicitly is very efficient, both for computation and storage. In particular, we recall that the block unit-triangular matrices may be inverted by toggling the sign of the nonzero off-diagonal block, giving (for example)

\[
U_T^{-1} = \begin{bmatrix} I & T^* & \\ & I & \\ & & I \end{bmatrix}, \qquad
L_T^{-1} = \begin{bmatrix} I & & \\ T & I & \\ & & I \end{bmatrix} \tag{3.6}
\]

in the case of (3.2).

To make it explicit that only a small amount of information is needed to represent skeletonization of a matrix A with respect to a DOF set, we will occasionally write the process in functional form as

\[
[S, R, X_{SS}, X_{RR}, \tilde{V}, \tilde{W}] = \mathrm{skel}(A, B, \epsilon), \tag{3.7}
\]

where \tilde{V} and \tilde{W} are understood to be stored as a product of operators in block form that can be applied and inverted cheaply. Clearly one can construct \tilde{Z}(A; B) implicitly from the information returned by skel(A, B, ε). A dense-matrix sketch of this routine, composed from the pieces above, follows.

3.1.1 Group skeletonization

Notationally, it will be useful to extend the notion of skeletonization to multiple disjoint index sets (see also Ho & Ying [43]). Consider two index sets B_i and B_j with B_i ∩ B_j = ∅, e.g., two DOF sets corresponding to distinct subdomains at the same level of the spatial hierarchy. Performing the skeletonizations of A with respect to B_i and B_j independently, we obtain

\[
[S_i, R_i, X_{S_iS_i}, X_{R_iR_i}, \tilde{V}_i, \tilde{W}_i] = \mathrm{skel}(A, B_i, \epsilon),
\]
\[
[S_j, R_j, X_{S_jS_j}, X_{R_jR_j}, \tilde{V}_j, \tilde{W}_j] = \mathrm{skel}(A, B_j, \epsilon),
\]

where the fact that B_i and B_j are disjoint means that \tilde{V}_i and \tilde{V}_j commute (and similarly for \tilde{W}_i and \tilde{W}_j). It is evident in (3.3) that skeletonization does not affect blocks of A indexed by B_i^c except to introduce zeros in place of the A_{R_iB_i^c} and A_{B_i^cR_i} blocks. Thus, we find that, with K = ([N] \ B_i) \ B_j as the "rest of the world",

\[
\tilde{V}_j^{-1}\tilde{V}_i^{-1} A\, \tilde{W}_i^{-1}\tilde{W}_j^{-1}
= \tilde{V}_i^{-1}\tilde{V}_j^{-1} A\, \tilde{W}_j^{-1}\tilde{W}_i^{-1}
\approx \begin{bmatrix}
X_{R_iR_i} & & & & \\
& X_{S_iS_i} & & A_{S_iS_j} & A_{S_iK} \\
& & X_{R_jR_j} & & \\
& A_{S_jS_i} & & X_{S_jS_j} & A_{S_jK} \\
& A_{KS_i} & & A_{KS_j} & A_{KK}
\end{bmatrix}
\equiv \tilde{Z}(A; \{B_i, B_j\}).
\]

The subtlety of this is that we have performed the skeletonizations in parallel as opposed to first computing \tilde{Z}(A; B_i) and then computing \tilde{Z}(\tilde{Z}(A; B_i); B_j). This can lead to slightly different results in the computation of the ID to determine S_j and R_j, but this is not due to the introduction of additional error; it is simply different error.

More generally, given a pairwise-disjoint collection of index sets C = {B_1, ..., B_m} with each B_i ⊂ [N], we let

\[
[S_i, R_i, X_{S_iS_i}, X_{R_iR_i}, \tilde{V}_i, \tilde{W}_i] = \mathrm{skel}(A, B_i, \epsilon)
\]

for each i = 1, ..., m and define the simultaneous group skeletonization of A with respect to C as B = \tilde{Z}(A; C) such that:

• For each i = 1, ..., m, B_{R_iR_i} = X_{R_iR_i} is the only nonzero block in its block row and block column.

• For each i = 1, ..., m, B_{S_iS_i} = X_{S_iS_i}.

• All other blocks of B are unmodified, such that, e.g., B_{S_iS_j} = A_{S_iS_j} for i ≠ j.

We write

\[
\tilde{Z}(A; C) = \tilde{V}_C^{-1} A\, \tilde{W}_C^{-1}
= \left( \prod_{B_i \in C} \tilde{V}_i^{-1} \right) A \left( \prod_{B_i \in C} \tilde{W}_i^{-1} \right), \tag{3.8}
\]

where the matrices \tilde{V}_C and \tilde{W}_C are again not stored explicitly but are useful notationally. Similarly to before, we may define the functional form

\[
[\{S_i, R_i, X_{S_iS_i}, X_{R_iR_i}\}_{B_i \in C}, \tilde{V}_C, \tilde{W}_C] = \mathrm{skel\_group}(A, C, \epsilon).
\]

3.2 The recursive skeletonization factorization

We now combine the idea of skeletonization with the hierarchical tree decomposition of the domain to construct the recursive skeletonization factorization [43, 54]. We will suppose that the ID tolerance ε and tree occupancy parameter n_occ are specified, and that we have a tree decomposition of the domain Ω with levels ℓ = 0, ..., L. We assume we are operating in the context of a Nyström discretization of (1.1) for exposition, though the factorization is more general.

3.2.1 Level L

We begin at level ℓ = L. For each subdomain (or box) at level L, we define the active¹ DOFs B_i = I_i and identify a box with its active DOFs. In this way, we may think of 𝓛_L in section 2.1 as a collection of active DOF sets, one corresponding to each box at level L, such that

\[
\mathscr{L}_L = \{ B_i \mid \Omega_i \text{ is a subdomain at level } L \text{ of the tree} \}. \tag{3.9}
\]

Group skeletonization of K with respect to 𝓛_L gives (via a rearrangement of (3.8))

\[
K \approx \tilde{V}_{\mathscr{L}_L}\, \tilde{Z}(K; \mathscr{L}_L)\, \tilde{W}_{\mathscr{L}_L}
= \left( \prod_{B_i \in \mathscr{L}_L} \tilde{V}_i \right) \tilde{Z}(K; \mathscr{L}_L) \left( \prod_{B_i \in \mathscr{L}_L} \tilde{W}_i \right), \tag{3.10}
\]

where in \tilde{Z}(K; 𝓛_L) we have identified and decoupled all redundant DOFs at level ℓ = L while leaving unchanged the off-diagonal blocks of K corresponding to kernel interactions between skeleton DOFs in distinct leaf-level subdomains, as described in subsection 3.1.1. We will define the set of all thus-far decoupled DOFs as

\[
\mathsf{R}_{\mathscr{L}_L} = \bigcup_{B_i \in \mathscr{L}_L} R_i, \tag{3.11}
\]

as each R_i now indexes only a decoupled diagonal block in the new matrix \tilde{Z}(K; 𝓛_L) and will not play a role in further levels.

We note a technical point here in successive levels of skeletonization. In section 3.1, when skeletonizing with respect to B we considered compressing the block K_{BB^c}, where B^c was the relative complement of B in [N]. Due to the fact that we have now completely decoupled the DOFs R_{𝓛_L}, we use B^c at level ℓ = L − 1 to refer to a different relative complement: the relative complement of B in [N] \ R_{𝓛_L}. In other words, any DOFs at level ℓ = L that have been decoupled already no longer need to be considered in future skeletonizations. We continue to use B^c in this manner throughout the thesis.

¹The notion of active DOFs will be made clear soon.

3.2.2 Level $L - 1$

At the next level of the tree, corresponding to $\ell = L-1$, we aim to repeat the same group skeletonization process that we performed for level $\ell = L$. We must, however, take into account the DOFs that have already been decoupled. With this in mind, we define the active DOFs at this level as $\mathcal{B}_i \equiv \mathcal{I}_i\setminus\mathcal{R}_{\mathcal{L}_L}$ for each box in the quadtree at this level, and once again identify boxes with active DOF sets such that we may consider

$$\mathcal{L}_{L-1} \equiv \{\mathcal{B}_i \mid i \text{ is a subdomain at level } L-1 \text{ of the tree}\}.$$

Note that, while at level $L$ there is no distinction between DOFs and active DOFs for a box, at higher levels the distinction is that the active DOFs do not include DOFs marked redundant at previous levels.

Due to the structure of $\tilde Z(K;\mathcal{L}_L)$ that we identified in subsection 3.1.1, we see that, for any two distinct sets of active DOFs in $\mathcal{L}_{L-1}$, the corresponding off-diagonal block of $\tilde Z(K;\mathcal{L}_L)$ is unmodified from what it was in $K$. This is due to the nesting property of DOF sets: off-diagonal blocks corresponding to distinct sets of active DOFs in $\mathcal{L}_L$ were unmodified, and sets of active DOFs in $\mathcal{L}_{L-1}$ are given by taking the union of skeleton DOFs from the previous level. Mathematically, we have

$$\mathcal{B}_i \equiv \mathcal{I}_i\setminus\mathcal{R}_{\mathcal{L}_L} = \bigcup_{j\in\mathrm{child}(i)}S_j$$

for a subdomain $i$ at level $L-1$.

Thus, we may skeletonize $\tilde Z(K;\mathcal{L}_L)$ with respect to $\mathcal{L}_{L-1}$ to obtain

$$\tilde Z(K;\mathcal{L}_L,\mathcal{L}_{L-1}) \equiv \tilde Z\bigl(\tilde Z(K;\mathcal{L}_L);\mathcal{L}_{L-1}\bigr) \equiv \tilde V_{\mathcal{L}_{L-1}}^{-1}\,\tilde Z(K;\mathcal{L}_L)\,\tilde W_{\mathcal{L}_{L-1}}^{-1} = \Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_{L-1}}\tilde V_i^{-1}\Bigr)\tilde Z(K;\mathcal{L}_L)\Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_{L-1}}\tilde W_i^{-1}\Bigr),$$

which we rearrange as

$$\tilde Z(K;\mathcal{L}_L) \approx \tilde V_{\mathcal{L}_{L-1}}\,\tilde Z(K;\mathcal{L}_L,\mathcal{L}_{L-1})\,\tilde W_{\mathcal{L}_{L-1}}$$

analogously to (3.10). We define

$$\mathcal{R}_{\mathcal{L}_{L-1}} \equiv \mathcal{R}_{\mathcal{L}_L}\cup\Bigl(\bigcup_{\mathcal{B}_i\in\mathcal{L}_{L-1}}R_i\Bigr) \qquad (3.12)$$

to be the set of all thus-far decoupled DOFs.

3.2.3 Higher levels

For higher levels up to and including $\ell = 1$, we define $\mathcal{B}_i \equiv \mathcal{I}_i\setminus\mathcal{R}_{\mathcal{L}_{\ell+1}}$ and continue to identify active DOF sets with their corresponding boxes as

$$\mathcal{L}_\ell \equiv \{\mathcal{B}_i \mid i \text{ is a subdomain at level } \ell \text{ of our tree}\}.$$

Skeletonizing $\tilde Z(K;\mathcal{L}_L,\ldots,\mathcal{L}_{\ell+1})$ with respect to $\mathcal{L}_\ell$ decouples new redundant DOFs at level $\ell$, which we use to define $\mathcal{R}_{\mathcal{L}_\ell}$ analogously to (3.12). It is informative to visualize the active DOFs at each level as in Figure 3.1 to get a sense of the algorithm's progression.

After finishing this process for level $\ell = 1$, we obtain

$$K \approx \tilde V_{\mathcal{L}_L}\tilde V_{\mathcal{L}_{L-1}}\cdots\tilde V_{\mathcal{L}_2}\tilde V_{\mathcal{L}_1}\,P_tDP_t^*\,\tilde W_{\mathcal{L}_1}\tilde W_{\mathcal{L}_2}\cdots\tilde W_{\mathcal{L}_{L-1}}\tilde W_{\mathcal{L}_L},$$

where $P_t$ is a permutation for the top level $\ell = 0$ such that all skeleton DOFs from level $\ell = 1$ are contiguous.

[Figure 3.1: Active DOFs before skeletonizing each level $\ell = 3, 2, 1, 0$ of RS on a quasi-1D problem (top) and a true 2D problem (bottom). The DOFs cluster near the edges of the boxes of the quadtree at each level.]

The middle matrix $D$ is given by

$$D \equiv P_t^*\,\tilde Z(K;\mathcal{L}_L,\mathcal{L}_{L-1},\ldots,\mathcal{L}_1)\,P_t \qquad (3.13)$$

and thus has block-diagonal structure, since each set of redundant DOFs $R_i$ is decoupled from all other DOFs. Letting $F_t$ be some factorization of $D$ (e.g., a Cholesky, LU, or $\mathrm{LDL}^*$ factorization, as appropriate), we may finally define the full recursive skeletonization factorization (RS) as

$$K \approx \Bigl(\prod_{\ell=L}^{1}\tilde V_{\mathcal{L}_\ell}\Bigr)P_tF_tP_t^*\Bigl(\prod_{\ell=1}^{L}\tilde W_{\mathcal{L}_\ell}\Bigr) \equiv F, \qquad (3.14)$$

where the first product is taken with $\ell$ decreasing from left to right and the second with $\ell$ increasing, matching the expanded form above. We give this in algorithmic form in Algorithm 1.
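To make concrete how (3.14) is used, here is a Python sketch of applying $F^{-1}$ to a vector. The operator interface is an assumed, illustrative one rather than the dissertation's actual data structures: lists Vs and Ws of level operators ordered from level $L$ down to 1, each with a cheap .solve method exploiting the unit-triangular structure via (3.6), a permutation vector perm realizing $P_t$, and a block-diagonal solver Ft_solve.

    import numpy as np

    def rs_solve(Vs, Ws, perm, Ft_solve, b):
        # x = W_L^{-1} ... W_1^{-1} Pt Ft^{-1} Pt^* V_1^{-1} ... V_L^{-1} b,
        # applying factors right-to-left.
        y = b.copy()
        for V in Vs:                 # V_L^{-1} acts first, then up to V_1^{-1}
            y = V.solve(y)
        y = Ft_solve(y[perm])        # gather realizes Pt^*, then block-diagonal solve
        z = np.empty_like(y)
        z[perm] = y                  # scatter realizes Pt
        for W in reversed(Ws):       # W_1^{-1} acts first, then up to W_L^{-1}
            z = W.solve(z)
        return z

Because every factor is block unit-triangular or block-diagonal, each pass costs no more than the storage of the factorization, which is what yields the apply/solve complexities quoted below.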

3.2.4 The use of a proxy surface

As we saw in the previous section, RS can be effectively summarized as repeatedly identifying redundant DOFs using an ID of off-diagonal blocks and then decoupling those DOFs using block row and column operations. What makes this efficient in practice is the use of what has come to be known as the proxy trick, a prominent computational acceleration in the literature [16, 17, 27, 29, 41, 43, 53, 61, 84].


Algorithm 1 The recursive skeletonization factorization (RS)

 1: // Initialize
 2: $A := K$
 3: for $\ell := L$ down to 1 do
 4:   for each box $i \in \mathcal{L}_\ell$ do
 5:     // Identify relevant DOFs for skeletonization
 6:     $[\mathcal{B}_i, \mathcal{B}_i^c]$ := {active DOFs in box and rest of world}
 7:   end for
 8:   // Perform group skeletonization with respect to DOFs
 9:   $A := \tilde Z(A;\mathcal{L}_\ell) \equiv \tilde V_{\mathcal{L}_\ell}^{-1}A\tilde W_{\mathcal{L}_\ell}^{-1} = \bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde V_i^{-1}\bigr)A\bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde W_i^{-1}\bigr)$
10: end for
11: // Store middle block diagonal matrix and permutation
12: $D := P_t^* A P_t$
13: $F_t$ := {some factorization of $D$}
14: Output: $F$ as in (3.14)


The proxy trick takes advantage of the fact that the matrix $K$ is typically related to an underlying elliptic PDE, and thus the kernel function $K(z)$ typically satisfies some form of a Green's identity wherein the values of the kernel inside a domain can be recovered from those on the boundary of that domain. We will not justify this rigorously here, but defer to Ho & Ying [43] for a thorough description (see also subsection 7.2.1). Instead, we focus on the computational consequences.

Suppose that $B$ is some subdomain in our quadtree with DOFs $\mathcal{J}$ and that we wish to compute an interpolative decomposition of $K_{\mathcal{J}^c\mathcal{J}}$. In principle, the cost of this is $O(|\mathcal{J}^c|\,|\mathcal{J}|^2)$, where $|\mathcal{J}^c|$ can be as big as $O(N)$ at early levels. Using the proxy trick, we draw a circle (sphere in 3D) around $B$ as seen in Figure 3.2a. This proxy surface, $\Gamma$, partitions the DOF set $\mathcal{J}^c$ into two sets: DOFs $O$ associated with subdomains intersected by $\Gamma$, and DOFs $P$ associated with subdomains fully outside $\Gamma$ relative to $B$. Discretizing the proxy surface with $n_{\mathrm{prox}}$ points $\{y_i\}_{i=1}^{n_{\mathrm{prox}}}$ and defining the matrix $G_{\mathcal{J}}$ with entries $[G_{\mathcal{J}}]_{ij} = K(y_i - x_j)$ for $i = 1,\ldots,n_{\mathrm{prox}}$ and $x_j \in \mathcal{J}$, we


begin by computing an ID

$$\begin{bmatrix}K_{O\mathcal{J}}\\ G_{\mathcal{J}}\end{bmatrix} = \begin{bmatrix}K_{OR} & K_{OS}\\ G_R & G_S\end{bmatrix} \approx \begin{bmatrix}K_{OS}\\ G_S\end{bmatrix}\begin{bmatrix}T & I\end{bmatrix}. \qquad (3.15)$$

Using the same partitioning $\mathcal{J} = R \cup S$ and interpolation matrix $T$, the beauty of the proxy trick is that we get an ID of the full off-diagonal block for free,

$$K_{\mathcal{J}^c\mathcal{J}} = \begin{bmatrix}K_{\mathcal{J}^cR} & K_{\mathcal{J}^cS}\end{bmatrix} \approx K_{\mathcal{J}^cS}\begin{bmatrix}T & I\end{bmatrix}.$$

The use of the proxy surface is discussed in more detail in subsection 7.2.1 for an analogous trick in the case of strong admissibility. The key aspects, at this point, are the following (see the sketch after this list):

• The cost of the ID has been reduced from $O(|\mathcal{J}^c|\,|\mathcal{J}|^2)$ to $O(|\mathcal{J}|^3 + n_{\mathrm{prox}}|\mathcal{J}|^2)$, which is substantially smaller when $|P|$ is large.

• The ID in (3.15) does not depend on $K_{P\mathcal{J}}$, only on the block $K_{O\mathcal{J}}$ corresponding to interactions between $B$ and DOFs belonging to boxes in $\mathrm{nbor}(B)$.
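A Python sketch of the proxy-accelerated ID follows. The kernel callback, the choice of a circle of twice the box radius, and the default n_prox are illustrative assumptions, with SciPy's randomized ID again standing in for whatever ID routine is used in practice.

    import numpy as np
    import scipy.linalg.interpolative as sli

    def proxy_id(kernel, x, J, O, eps, n_prox=64):
        # Compress [K_OJ; G_J] as in (3.15) instead of the full K_{J^c, J}.
        # kernel(Y, X) returns the matrix of kernel interactions between
        # point sets Y and X; x is the N-by-2 array of all points.
        J, O = np.asarray(J), np.asarray(O)
        center = x[J].mean(axis=0)
        radius = 2.0 * np.linalg.norm(x[J] - center, axis=1).max()
        th = 2.0 * np.pi * np.arange(n_prox) / n_prox
        y = center + radius * np.stack([np.cos(th), np.sin(th)], axis=1)
        A = np.vstack([kernel(x[O], x[J]), kernel(y, x[J])])   # [K_OJ; G_J]
        k, idx, proj = sli.interp_decomp(A, eps)
        S, R, T = J[idx[:k]], J[idx[k:]], proj
        return S, R, T          # then K_{J^c R} ~= K_{J^c S} @ T for free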

3.2.5 Complexity of RS using the proxy trick

Using the proxy trick, the cost of RS is essentially determined by the number of skeleton DOFs for each subdomain in the tree, i.e., $|S_i|$ for each $i$. Letting $k_\ell$ be an upper bound on $|S_i|$ for all boxes on level $\ell$, we find that the growth of $k_\ell$ with the number of points $N$ and dimensionality $d$ depends on the distribution of points. As discussed by Ho & Ying [43, subsection 3.4], the skeleton DOFs tend to line the boundaries of their corresponding subdomains, a phenomenon we observe in Figure 3.1. Assuming the points $\{x_i\}_{i=1}^N \subset \mathbb{R}^d$ lie on a $\delta$-dimensional manifold ($\delta \le d$), they give the result

$$k_\ell = \begin{cases} O(L-\ell), & \delta = 1,\\ O\bigl(2^{(\delta-1)(L-\ell)}\bigr), & \delta \ge 2,\end{cases} \qquad (3.16)$$

which can be justified by standard multipole estimates. In particular, we note that in 2D $k_\ell$ tends to double with each level we move up the tree. This matches the observation


[Figure 3.2: (a) Using the dotted black circle as a proxy surface when skeletonizing DOFs in the dark gray box $B$, only interactions between that box and the light gray boxes adjacent to it need be considered; all interactions between the dark gray box and the white boxes are represented by equivalent interactions using the proxy surface. (b) If the black grid corresponds to edges of boxes at level $\ell$, then the assignment of DOFs to edges is given using the Voronoi tessellation about the edge centers (gray rotated grid).]

that skeletons line the boundary of a subdomain, since the side length of a subdomain doubles with each level we go up the tree.

The full complexity estimate assuming this growth of $k_\ell$ is given in Theorem 3.1. We remark that these complexity estimates do not include the cost of constructing an initial tree decomposition with a specified occupancy parameter $n_{\mathrm{occ}}$, which technically has complexity $O(N\log N)$ but is negligible in practice.

Theorem 3.1 ([41, 43, 53]). Assume that (3.16) holds and that we use the proxy trick for accelerated compression. Then the computational complexity of constructing the recursive skeletonization factorization $F$ in (3.14) is

$$T_{\mathrm{factor}} = \begin{cases} O(N), & \delta = 1,\\ O\bigl(N^{3(1-1/\delta)}\bigr), & \delta \ge 2,\end{cases}$$

while the cost of applying or solving systems with $F$ by exploiting block structure and (3.6) is

$$T_{\mathrm{apply}} = T_{\mathrm{solve}} = \begin{cases} O(N), & \delta = 1,\\ O(N\log N), & \delta = 2,\\ O\bigl(N^{2(1-1/\delta)}\bigr), & \delta \ge 3,\end{cases}$$

with constants in all cases that depend on the ID tolerance $\epsilon$. The storage complexity is the same as the apply cost.
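For intuition, a back-of-the-envelope version of the factor cost for the uniform 2D case $\delta = d = 2$ (a rough count under the stated assumptions, not a substitute for the cited proofs) goes as follows. With $N \sim 4^L$ points, level $\ell$ has $4^\ell$ boxes, and the dense ID and elimination for a box cost $O(k_\ell^3)$ with $k_\ell \sim 2^{L-\ell}$ by (3.16), so

$$T_{\mathrm{factor}} \sim \sum_{\ell=1}^{L} 4^\ell k_\ell^3 \sim \sum_{\ell=1}^{L} 4^\ell\,2^{3(L-\ell)} = 8^L\sum_{\ell=1}^{L}2^{-\ell} = O\bigl(8^L\bigr) = O\bigl(N^{3/2}\bigr),$$

matching $O(N^{3(1-1/\delta)})$ with $\delta = 2$. The top levels dominate the sum, which is precisely the rank growth that HIF, described next, is designed to combat.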

3.3 The hierarchical interpolative factorization

While RS has asymptotically optimal complexity for quasi-1D problems, in higher dimensions the growth of $k_\ell$ in (3.16) leads to suboptimal complexity. The hierarchical interpolative factorization (HIF) [42, 43] was developed on top of RS to remedy this rank growth. In what follows we describe HIF for 2D problems, though a more complete treatment is given by Ho & Ying [43].

At level $\ell = L$ we begin just as in RS, defining $\mathcal{L}_L$ as in (3.9) and skeletonizing $K$ with respect to $\mathcal{L}_L$ to obtain

$$K \approx \tilde V_{\mathcal{L}_L}\,\tilde Z(K;\mathcal{L}_L)\,\tilde W_{\mathcal{L}_L} = \Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_L}\tilde V_i\Bigr)\tilde Z(K;\mathcal{L}_L)\Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_L}\tilde W_i\Bigr),$$

identically to (3.10). The set of thus-far decoupled DOFs is again $\mathcal{R}_{\mathcal{L}_L}$ as in (3.11).

3.3.1 Level $L - 1/2$

It is at this point that things differ from RS. Rather than proceeding directly to level $\ell = L-1$, HIF first introduces an extra level of skeletonization with the goal of decoupling additional DOFs. We denote this level with a half-integer, such that after $\ell = L$ we move to $\ell = L - 1/2$.

The key to HIF is using additional geometry outside of the tree decomposition of space, moving from skeletonization based on boxes to skeletonization based on edges.


Each box in the quadtree decomposition at level $L$ has four edges, though we note that adjacent boxes share an edge. Performing a Voronoi tessellation of the domain with the Voronoi cells centered on each of these edges defines a new decomposition of space, as we see in Figure 3.2b. In general, the Voronoi cells are square subdomains with the diagonals of the squares given by edges of the quadtree boxes.

Each Voronoi cell $V_i$ geometrically contains active DOFs indexed by

$$E_i \equiv \bigl([N]\setminus\mathcal{R}_{\mathcal{L}_L}\bigr)\cap V_i.$$

In particular, the DOFs $E_i$ come from the two boxes sharing the corresponding edge on which $V_i$ is centered. We note in passing that, in the case of a non-uniform quadtree, it is possible that these boxes are of different sizes and only one is on level $L$, so $E_i$ may contain DOFs that are not part of $S_j$ or $R_j$ for any $\mathcal{B}_j\in\mathcal{L}_L$.
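In code, the assignment of active DOFs to edge-centered Voronoi cells reduces to a nearest-edge-center query. The following Python sketch (with an assumed array edge_centers of edge midpoints at the current level; the helper name is illustrative) conveys the idea behind Figure 3.2b.

    import numpy as np

    def edge_groups(x, active, edge_centers):
        # Assign each active DOF to the Voronoi cell of its nearest edge
        # center; E_i is then the set of active DOFs owned by edge i.
        active = np.asarray(active)
        d2 = ((x[active, None, :] - edge_centers[None, :, :]) ** 2).sum(axis=-1)
        owner = d2.argmin(axis=1)                     # index of nearest edge center
        return {i: active[owner == i] for i in range(len(edge_centers))}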

Defining the half-integer level $L - 1/2$ as

$$\mathcal{L}_{L-1/2} \equiv \{E_i \mid V_i \text{ is an edge subdomain at level } L-1/2\}, \qquad (3.17)$$

we skeletonize $\tilde Z(K;\mathcal{L}_L)$ with respect to $\mathcal{L}_{L-1/2}$ to obtain

$$\tilde Z(K;\mathcal{L}_L) \approx \tilde V_{\mathcal{L}_{L-1/2}}\,\tilde Z\bigl(K;\mathcal{L}_L,\mathcal{L}_{L-1/2}\bigr)\,\tilde W_{\mathcal{L}_{L-1/2}} = \Bigl(\prod_{E_i\in\mathcal{L}_{L-1/2}}\tilde V_i\Bigr)\tilde Z\bigl(K;\mathcal{L}_L,\mathcal{L}_{L-1/2}\bigr)\Bigl(\prod_{E_i\in\mathcal{L}_{L-1/2}}\tilde W_i\Bigr).$$

We define the full set of redundant DOFs so far, $\mathcal{R}_{\mathcal{L}_{L-1/2}}$, analogously to (3.12).

3.3.2 Higher levels

HIF continues for higher levels by alternating between integer box levels and half-integer edge levels of skeletonization, stopping at level $1/2$, where we have the approximation

$$K \approx \tilde V_{\mathcal{L}_L}\tilde V_{\mathcal{L}_{L-1/2}}\tilde V_{\mathcal{L}_{L-1}}\cdots\tilde V_{\mathcal{L}_{3/2}}\tilde V_{\mathcal{L}_1}\tilde V_{\mathcal{L}_{1/2}}\,P_tDP_t^*\,\tilde W_{\mathcal{L}_{1/2}}\tilde W_{\mathcal{L}_1}\tilde W_{\mathcal{L}_{3/2}}\cdots\tilde W_{\mathcal{L}_{L-1}}\tilde W_{\mathcal{L}_{L-1/2}}\tilde W_{\mathcal{L}_L}.$$


Here, $P_t$ is a permutation for the top level $\ell = 0$ such that all skeleton DOFs from level $\ell = 1/2$ are contiguous. The middle matrix $D$ is given by

$$D \equiv P_t^*\,\tilde Z\bigl(K;\mathcal{L}_L,\mathcal{L}_{L-1/2},\mathcal{L}_{L-1},\ldots,\mathcal{L}_{3/2},\mathcal{L}_1,\mathcal{L}_{1/2}\bigr)\,P_t$$

and again has block-diagonal structure as in RS. Letting $F_t$ be a factorization of $D$ as before, we define the full hierarchical interpolative factorization as

$$K \approx \Bigl(\prod_{\ell=L}^{1}\tilde V_{\mathcal{L}_\ell}\tilde V_{\mathcal{L}_{\ell-1/2}}\Bigr)P_tF_tP_t^*\Bigl(\prod_{\ell=1}^{L}\tilde W_{\mathcal{L}_{\ell-1/2}}\tilde W_{\mathcal{L}_\ell}\Bigr) \equiv F, \qquad (3.18)$$

with products ordered as in (3.14).

    We give this in algorithmic form in Algorithm 2. As with RS, we plot the active

    DOFs at each level for visualization purposes in Figure 3.3.

    There are a few extra points to Algorithm 2 that we do not address here in detail.

    For example, whereas in RS all IDs can be shown to be applied to original blocks of the

    matrix K, in HIF these blocks will include rows and columns that have been modified

    by Schur complement updates from previous levels. Additionally, HIF in 3D has an

    extra level of complication on top of what we describe here, wherein we alternate not

    between boxes and edges but between boxes, faces, and edges, iteratively reducing

    the dimensionality of the DOF sets from 3D to quasi-2D to quasi-1D to quasi-0D. We

    direct the reader to Ho & Ying for a thorough discussion [43].

    3.3.3 Complexity of HIF using the proxy trick

Using the proxy trick as we did with RS, we again require a bound on the number of skeleton DOFs at each level. The additional levels of compression in HIF compared to RS are intended to keep this number small, which Figure 3.3 suggests is effective in practice. Letting $k_\ell$ be as in (3.16) and assuming

$$k_\ell = O(L-\ell), \qquad (3.19)$$

we have the following result from Ho & Ying [43].


[Figure 3.3: Active DOFs before skeletonizing each level $\ell = 3, 2.5, 2, 1.5, 1, 0.5, 0$ of HIF in 2D. The growth of $k_\ell$ observed in the bottom of Figure 3.1 appears to have been reduced dramatically.]

Algorithm 2 The hierarchical interpolative factorization (HIF)

 1: // Initialize
 2: $A := K$
 3: for $\ell := L$ down to 1 do
 4:   for each box $i \in \mathcal{L}_\ell$ do
 5:     // Identify relevant DOFs for skeletonization
 6:     $[\mathcal{B}_i, \mathcal{B}_i^c]$ := {active DOFs in box and rest of world}
 7:   end for
 8:   // Perform group skeletonization with respect to box DOFs
 9:   $A := \tilde Z(A;\mathcal{L}_\ell) \equiv \tilde V_{\mathcal{L}_\ell}^{-1}A\tilde W_{\mathcal{L}_\ell}^{-1} = \bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde V_i^{-1}\bigr)A\bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde W_i^{-1}\bigr)$
10:   for each edge $i \in \mathcal{L}_{\ell-1/2}$ do
11:     // Identify relevant DOFs for skeletonization
12:     $[E_i, E_i^c]$ := {active DOFs in edge and rest of world}
13:   end for
14:   // Perform group skeletonization with respect to edge DOFs
15:   $A := \tilde Z\bigl(A;\mathcal{L}_{\ell-1/2}\bigr) \equiv \tilde V_{\mathcal{L}_{\ell-1/2}}^{-1}A\tilde W_{\mathcal{L}_{\ell-1/2}}^{-1} = \bigl(\prod_{E_i\in\mathcal{L}_{\ell-1/2}}\tilde V_i^{-1}\bigr)A\bigl(\prod_{E_i\in\mathcal{L}_{\ell-1/2}}\tilde W_i^{-1}\bigr)$
16: end for
17: // Store middle block diagonal matrix and permutation
18: $D := P_t^* A P_t$
19: $F_t$ := {some factorization of $D$}
20: Output: $F$ as in (3.18)


Theorem 3.2 ([43]). Assume that (3.19) holds and that we use the proxy trick for accelerated compression. Then the computational complexity of constructing the hierarchical interpolative factorization $F$ in (3.18) is

$$T_{\mathrm{factor}} = O(N),$$

while the cost of applying or solving systems with $F$ by exploiting block structure and (3.6) is

$$T_{\mathrm{apply}} = T_{\mathrm{solve}} = O(N),$$

with constants in all cases that depend on the ID tolerance $\epsilon$. The storage complexity is the same as the apply cost.

Again, the complexity estimate of Theorem 3.2 does not include the (negligible) cost of constructing an initial tree decomposition with a specified occupancy parameter $n_{\mathrm{occ}}$, which has complexity $O(N\log N)$.


Chapter 4

Updating skeletonization-based factorizations

Having reviewed RS and HIF and how they are used to factor matrices $K$ as in (1.2), we now turn to the first major contribution of this thesis: efficient updating of the factorization $F$ of $K$ in response to a sequence of localized perturbations. By localized perturbation we mean that, given a matrix $K$ discretizing an original problem of the form (1.1) and a matrix $\check K$ discretizing a new perturbed problem of the same form, there is a small local subdomain $\Omega_{\mathrm{loc}}$ such that for all DOF sets $\mathcal{I}$ and $\mathcal{J}$ with $\mathcal{I}\cap\Omega_{\mathrm{loc}} = \emptyset$ and $\mathcal{J}\cap\Omega_{\mathrm{loc}} = \emptyset$, we have

$$\check K(\mathcal{I},\mathcal{J}) = K(\mathcal{I},\mathcal{J}), \qquad (4.1)$$

where we explicitly use the MATLAB® index notation for clarity. Put simply, blocks of the system matrix that correspond to DOFs away from the modifications are unchanged. Such local perturbations include (but are not limited to):

• Localized geometric perturbations (see Figure 4.1), wherein the domain of integration $\Omega$ is modified and therefore a subset of the discretization points of $\Omega$ may move, or discretization points may be added or removed.

• Localized coefficient perturbations, wherein the material parameters $a(x)$, $b(x)$, or $c(x)$ are modified in a local region.


[Figure 4.1: As an example of a localized perturbation to the geometry, we start with the quasi-1D domain $\Omega_1$, the square with rounded corners following the dashed curve. Then, for updating, we adjust the rounding parameter to obtain $\Omega_1 \to \Omega_2$, the square with the sharper, solid corners.]

By a sequence of localized perturbations, we mean that we are interested in applications where there are a number of localized perturbations

$$K \to K^{(1)} \to K^{(2)} \to \cdots \to K^{(i-1)} \to K^{(i)} \to \cdots, \qquad (4.2)$$

where each perturbation $K^{(i-1)} \to K^{(i)}$ is localized to some subdomain $\Omega_{\mathrm{loc}}^{(i)}$ that we allow to be different for each $i$. Such sequences of problems can arise, e.g., in design problems where the physical system described by the linear operator is a device that we want to design in an effort to optimize some objective function. We make the following observations:

• Localized perturbations lead to a global low-rank modification, in the sense that entire rows and columns of the new matrix $K^{(i)}$ are different from the corresponding rows and columns in $K^{(i-1)}$, if such a correspondence even exists.

• Because each perturbation can be localized to a different subdomain, for large $i$ the matrix $K^{(i)}$ is not necessarily given by a low-rank modification to $K$.

Because the perturbations we consider respect the same physical structure used in the construction of hierarchical factorizations (i.e., spatial locality), it is not unreasonable to believe it might be possible to take a hierarchical factorization of $K^{(i-1)}$ and update it to obtain a hierarchical factorization of $K^{(i)}$. This is what the method we describe in this work accomplishes, efficiently, for certain factorizations.


4.1 Approaches to updating

The idea of updating matrix factorizations to solve sequences of related systems is not a new one. For example, in the linear programming community it is common practice to maintain an LU factorization of a sparse matrix $A$ that permits the addition or deletion of rows/columns of $A$, or a general rank-one update [26]. Further, it is well known how to update the QR factorization of a matrix after any of those same operations [28].

The updating techniques described above, however, do not apply to fast hierarchical factorizations. Updating factorizations in the $\mathcal{H}$-matrix format in response to local modifications has been previously studied in the thesis of Djokic [21], wherein a process similar to that of this work is used to update the representation of the forward operator, which allows for a post-processing step to obtain the updated inverse in the same format. Updating of the skeletonization-based formats we consider here has not appeared thus far in the literature, and, as we show, these formats admit efficient one-pass updating.

In the case where the number of unknowns does not change and $\Omega_{\mathrm{loc}}^{(i)}$ is the same for all $i$, it is possible to order the unknowns in an LU decomposition such that those that will be modified are eliminated last as in [63], which can be used to update LU factorizations for integral equation design problems where only one small portion of the geometry is to be changed across all updates. Similarly, if the total number of unknowns modified between $K$ and $K^{(i)}$ is small and one is interested only in solving systems and not in updating factorizations, then for any factorization of the base system $K$ it is relatively efficient to keep track of the updates as a global rank-$k$ update $K^{(i)} = K + UCV$ with $U\in\mathbb{C}^{N\times k}$, $C\in\mathbb{C}^{k\times k}$, and $V\in\mathbb{C}^{k\times N}$ and use the Sherman-Morrison-Woodbury (SMW) formula,

$$(K+UCV)^{-1} = K^{-1} - K^{-1}U\bigl(C^{-1}+VK^{-1}U\bigr)^{-1}VK^{-1}, \qquad (4.3)$$

taking advantage of the initial factorization of $K$ as is done by Greengard et al. [29] for hierarchical factorizations.
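As a point of comparison for what follows, a minimal Python sketch of an SMW-based solve using (4.3) is below. Here solveK stands for a fast solver built from an existing factorization of the base matrix $K$ (e.g., RS or HIF), and the dense handling of the $k\times k$ capacitance matrix assumes $k \ll N$; the function name and interface are illustrative.

    import numpy as np

    def smw_solve(solveK, U, C, V, b):
        # (K + U C V)^{-1} b via (4.3), reusing solveK ~ K^{-1}.
        Kinv_b = solveK(b)
        Kinv_U = solveK(U)                    # N-by-k; amortizable across solves
        cap = np.linalg.inv(C) + V @ Kinv_U   # k-by-k capacitance matrix
        return Kinv_b - Kinv_U @ np.linalg.solve(cap, V @ Kinv_b)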


4.1.1 Our contribution

In this work we present a method to efficiently update skeletonization-based hierarchical factorizations in response to localized perturbations, i.e., to take a factorization corresponding to $K^{(i-1)}$ in (4.2) and obtain a factorization of $K^{(i)}$. We illustrate our approach using the language of skeletonization, in particular RS and HIF, though it is simple to generalize to any factorization using bottom-up hierarchical compression of off-diagonal subblocks.

There are a number of advantages to our approach over using the SMW formula to solve a system with $K^{(i)}$. In the case where the number of unknowns that have been modified between $K$ and $K^{(i)}$ is bounded by a small constant $m$ and the cost of solving a system with the existing factorization of $K$ is $O(N)$, the cost of a solve using (4.3) (dropping terms that don't depend on $N$) is $O(N + mN)$, where the second term can be amortized across multiple right-hand sides. However, if the number of total modified unknowns $m$ comprises any substantial fraction of $N$, then this is not a viable strategy.

In contrast, under certain assumptions on the attainable compression of off-diagonal blocks in the factorizations considered in this thesis, if the number of modified unknowns between two factorizations is bounded by $m$ then the asymptotic cost of our updating method is $O(m\log^p N)$ for some small $p$. Furthermore, one obtains a factorization of the new matrix and not just a method for solving systems. This factorization can of course be subsequently updated efficiently, but it is also useful for other purposes, such as computing determinants or applying or solving with a matrix square root.

We focus in this work on the 2D case $d = 2$.

    4.2 Updating algorithm

Given RS in Algorithm 1 and HIF in Algorithm 2, we consider updating existing instantiations of these factorizations in response to a localized modification to the problem. Concretely, we suppose that we have on hand a factorization corresponding to the initial problem with matrix $K$ and assume a new matrix $\check K$ is obtained by discretizing a locally-perturbed problem. For simplicity of exposition, we initially assume that the perturbation does not change the total number of points $N$ and does not necessitate a change in the structure of the hierarchical decomposition of space, i.e., the old quadtree is still valid for the new problem with the same occupancy bound $n_{\mathrm{occ}}$. Extension to the more general case is straightforward.

    4.2.1 Initial observation on updating a skeletonization

We will begin with a detailed discussion of updating when the base factorization is RS, and later describe the necessary modifications for HIF. It will be useful to write RS in a slightly different way than described in Algorithm 1 to make explicit exactly which blocks are modified as a consequence of each step of skeletonization. Thus, we rewrite the RS algorithm in modified form in Algorithm 3.

Algorithm 3 RS in modified form

 1: // Initialize
 2: $A := K$
 3: for $\ell := L$ down to 1 do
 4:   // Get blocks and operators
 5:   $[\{S_i, R_i, X_{S_iS_i}, X_{R_iR_i}\}_{\mathcal{B}_i\in\mathcal{L}_\ell}, \tilde V_{\mathcal{L}_\ell}, \tilde W_{\mathcal{L}_\ell}]$ := skel_group$(A, \mathcal{L}_\ell, \epsilon)$
 6:   // Assemble skeletonization
 7:   for each $\mathcal{B}\in\mathcal{L}_\ell$ with $\mathcal{B} = S\cup R$ do
 8:     $A(:,R) := 0$
 9:     $A(R,:) := 0$
10:     $A(S,S) := X_{SS}$
11:     $A(R,R) := X_{RR}$
12:   end for
13: end for
14: // Store middle block diagonal matrix and permutation
15: $D := P_t^* A P_t$
16: $F_t$ := {some factorization of $D$}
17: Output: $F$ as in (3.14)

Recall from subsection 3.2.4 that the use of a proxy surface gives a strong notion of locality to the skeletonization process, wherein the ID in (3.15) does not depend on $K_{P\mathcal{J}}$ but only on $K_{O\mathcal{J}}$, where $\mathcal{J}$ is the set of DOFs for a box $B$ and $O$ is the set of DOFs belonging to boxes in $\mathrm{nbor}(B)$. This locality has implications for updating.

Namely, suppose that we are considering a subdomain $B = \Omega_i$ at the lowest level of our quadtree with corresponding DOFs $\mathcal{J} = \mathcal{B}$ and neighbor DOFs $O$. Assuming that $\mathcal{B}\cap\Omega_{\mathrm{loc}} = \emptyset$ and $O\cap\Omega_{\mathrm{loc}} = \emptyset$, i.e., that

$$\check K(\mathcal{B},\mathcal{B}\cup O) = K(\mathcal{B},\mathcal{B}\cup O) \quad\text{and}\quad \check K(\mathcal{B}\cup O,\mathcal{B}) = K(\mathcal{B}\cup O,\mathcal{B}), \qquad (4.4)$$

we find that the ID

$$[S, R, T_{\mathcal{B}}] = \mathrm{id}(K,\mathcal{B},\epsilon)$$

is identical to the ID

$$[S, R, T_{\mathcal{B}}] = \mathrm{id}(\check K,\mathcal{B},\epsilon)$$

in the sense shown, i.e., the output is exactly the same. We remark that this is not a statement of the form "both IDs are accurate to tolerance $\epsilon$" but rather a stronger statement: assuming the same proxy surface and deterministic floating-point arithmetic, the output is the same.

By following the algebra in section 3.1, we see that the same statement holds for the full skeletonization with respect to $\mathcal{B}$: if $\mathcal{B}\cap\Omega_{\mathrm{loc}} = \emptyset$ and $O\cap\Omega_{\mathrm{loc}} = \emptyset$, then the skeletonization

$$[S, R, X_{SS}, X_{RR}, \tilde V, \tilde W] = \mathrm{skel}(K,\mathcal{B},\epsilon)$$

returns the same result as the skeletonization

$$[S, R, X_{SS}, X_{RR}, \tilde V, \tilde W] = \mathrm{skel}(\check K,\mathcal{B},\epsilon).$$

The fact that the proxy surface "shields" the subdomain $B$ from the effect of perturbations to the problem that are far from $B$, and thus that the local skeletonizations do not change, is not a complicated mathematical result, but it is a powerful one that forms the core of the updating algorithm. We have described already how to apply this idea at the lowest level of the quadtree; now we must discuss some propagation rules for applying this idea at higher levels.
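This shielding property is easy to test numerically. The sketch below reuses the illustrative proxy_id from subsection 3.2.4 and seeds SciPy's randomized ID to make it deterministic, per the caveat above; it checks that a perturbation touching neither $\mathcal{B}$ nor $O$ leaves the ID output bit-for-bit unchanged. It is a demonstration, not part of the algorithm.

    import numpy as np
    import scipy.linalg.interpolative as sli

    def ids_match(kernel_old, kernel_new, x, J, O, eps):
        # If rows/columns (B, B u O) are untouched by the perturbation,
        # id(K, B, eps) and id(qK, B, eps) coincide exactly
        # (same proxy surface, same random seed).
        sli.seed(1234)
        S0, R0, T0 = proxy_id(kernel_old, x, J, O, eps)
        sli.seed(1234)
        S1, R1, T1 = proxy_id(kernel_new, x, J, O, eps)
        return (np.array_equal(S0, S1) and np.array_equal(R0, R1)
                and np.array_equal(T0, T1))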

    4.2.2 Propagation rules for higher levels

From our discussion in subsection 4.2.1, we know that when (4.4) does not hold we must possibly recompute the skeletonization with respect to $\mathcal{B}$. We define the collection of marked DOF sets (or simply marked boxes) at level $L$ as

$$\mathcal{M}_L \equiv \{\mathcal{B}_i \mid i \text{ is a subdomain at level } L \text{ of the tree and (4.4) does not hold}\}, \qquad (4.5)$$

such that $\mathcal{M}_L\subseteq\mathcal{L}_L$ contains all the DOF sets with respect to which we must redo skeletonization. The remainder of this section is dedicated to describing how to define the marked set $\mathcal{M}_\ell$ at higher levels.

For $\ell < L$, suppose that $\mathcal{B}_i$ is the DOF set for a box $i$ at level $\ell+1$ with respect to which we have redone skeletonization. Further, let $j = \mathrm{parent}(i)$ have active DOFs $\mathcal{B}_j$. If $\check S_i$ is the new set of skeleton DOFs arising from the reskeletonization with respect to $\mathcal{B}_i$, then we know by definition of $\mathcal{B}_j$ that $\check S_i\subseteq\mathcal{B}_j$. This leads to the most self-evident propagation rule: if $\mathcal{B}_i$ at level $\ell+1$ is marked, then so is its parent $\mathcal{B}_j$ at level $\ell$. Based on this, we define for each $\ell$ the collection

$$\mathcal{P}_\ell \equiv \{\mathcal{B}_j \mid j = \mathrm{parent}(i) \text{ for some } i \text{ with } \mathcal{B}_i\in\mathcal{M}_{\ell+1}\},$$

which contains all parent boxes of marked boxes at the previous level.

The next propagation rule accounts for the fact that boxes must be reskeletonized if any of their neighbors are modified. In particular, if $\mathcal{B}_i\in\mathcal{M}_{\ell+1}$ and $\mathcal{B}_j\in\mathcal{L}_\ell$ are such that $j = \mathrm{parent}(i)$, then we see from before that $\mathcal{B}_j\in\mathcal{P}_\ell$. However, this implies that for any $\mathcal{B}_k$ such that $j\in\mathrm{nbor}(k)$ we must also perform reskeletonization with respect to $\mathcal{B}_k$, because the DOFs $\mathcal{B}_j$ are possibly inside the proxy circle around $\mathcal{B}_k$, so there is no longer any guarantee of shielding.

[Figure 4.2: Left: suppose the local perturbations are contained in box $B$ with DOFs $\mathcal{B}$, so that possibly $\check K(:,\mathcal{B})\neq K(:,\mathcal{B})$ or $\check K(\mathcal{B},:)\neq K(\mathcal{B},:)$. Initially, $\mathcal{M}_L$ contains the DOF sets corresponding to the shaded boxes. Center: at level $L-1$, DOF sets corresponding to the dark gray boxes are in $\mathcal{P}_{L-1}$ and thus $\mathcal{M}_{L-1}$ because they have marked children, and the light gray boxes are in $\mathcal{U}_{L-1}$ and thus $\mathcal{M}_{L-1}$ because they have neighbors in $\mathcal{P}_{L-1}$. Right: the corresponding quadtree with nodes shaded the same as their associated boxes.]

Thus, we define the collection

$$\mathcal{U}_\ell \equiv \{\mathcal{B}_k \mid j\in\mathrm{nbor}(k) \text{ for some } \mathcal{B}_j\in\mathcal{P}_\ell\},$$

which corresponds to neighbors of parents of marked boxes at level $\ell+1$.

If the quadtree is perfect, the previous two rules are sufficient to define the new marked set. As a technical detail, however, we note that due to heterogeneous refinement it is possible that there are leaf boxes $i$ at levels $\ell < L$ that have been directly modified, i.e., for which (4.4) does not hold. Such boxes are also clearly marked, though they may not be covered by the previous two rules. Combining this rule with the previous two leads us to define the collection of marked DOF sets for levels $\ell < L$ as

$$\mathcal{M}_\ell \equiv \{\mathcal{B}_i \mid i\in\mathcal{L}_\ell \text{ is a leaf box for which (4.4) does not hold}\}\cup\mathcal{P}_\ell\cup\mathcal{U}_\ell.$$

We see an example of the evolution of the marked set $\mathcal{M}_\ell$ in Figure 4.2.
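The three rules combine into a simple upward sweep over the tree. The Python sketch below assumes a hypothetical tree object exposing parent(i) and nbor(j) (with a symmetric neighbor relation) and a map directly_modified from level to the boxes violating (4.4) at that level; all names are illustrative.

    def marked_sets(tree, L, directly_modified):
        # M_L from (4.5); then, for l < L,
        # M_l = {directly modified leaves at level l} | P_l | U_l.
        M = {L: set(directly_modified.get(L, ()))}
        for l in range(L - 1, 0, -1):
            P = {tree.parent(i) for i in M[l + 1]}       # parents of marked boxes
            U = {k for j in P for k in tree.nbor(j)}     # neighbors of those parents
            M[l] = set(directly_modified.get(l, ())) | P | U
        return M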

    4.2.3 Updating a group skeletonization

At this point, we have identified the collection of DOF sets $\mathcal{M}_\ell$ at each level such that in updating RS we possibly need to reskeletonize with respect to $\mathcal{B}_i\in\mathcal{M}_\ell$ but do not need to reskeletonize with respect to $\mathcal{B}_j\in\mathcal{L}_\ell\setminus\mathcal{M}_\ell$. So, consider the skeletonization of $A \equiv \tilde Z\bigl(\check K;\mathcal{L}_L,\mathcal{L}_{L-1},\ldots,\mathcal{L}_{\ell+1}\bigr)$ with respect to $\mathcal{L}_\ell$,

$$[\{S_i,R_i,X_{S_iS_i},X_{R_iR_i}\}_{\mathcal{B}_i\in\mathcal{L}_\ell},\tilde V_{\mathcal{L}_\ell},\tilde W_{\mathcal{L}_\ell}] = \mathrm{skel\_group}(A,\mathcal{L}_\ell,\epsilon),$$

with corresponding factorization

$$\tilde Z(A;\mathcal{L}_\ell) \equiv \tilde V_{\mathcal{L}_\ell}^{-1}A\tilde W_{\mathcal{L}_\ell}^{-1} = \Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde V_i^{-1}\Bigr)A\Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde W_i^{-1}\Bigr).$$

Using the previous observation and the fact that the factors inside the parentheses above commute, we may write

$$\tilde Z(A;\mathcal{L}_\ell) = \Bigl(\prod_{\mathcal{B}_j\in\mathcal{L}_\ell\setminus\mathcal{M}_\ell}\tilde V_j^{-1}\Bigr)\Bigl(\prod_{\mathcal{B}_i\in\mathcal{M}_\ell}\tilde V_i^{-1}\Bigr)A\Bigl(\prod_{\mathcal{B}_i\in\mathcal{M}_\ell}\tilde W_i^{-1}\Bigr)\Bigl(\prod_{\mathcal{B}_j\in\mathcal{L}_\ell\setminus\mathcal{M}_\ell}\tilde W_j^{-1}\Bigr),$$

where we observe that the matrices in the products over $\mathcal{L}_\ell\setminus\mathcal{M}_\ell$ are the same as in the initial skeletonization of $\tilde Z(K;\mathcal{L}_L,\mathcal{L}_{L-1},\ldots,\mathcal{L}_{\ell+1})$ with respect to $\mathcal{L}_\ell$. This observation leads us to decompose the skeletonization with respect to $\mathcal{L}_\ell$ as

$$\tilde Z(A;\mathcal{L}_\ell) = \tilde Z\bigl(\tilde Z(A;\mathcal{M}_\ell);\mathcal{L}_\ell\setminus\mathcal{M}_\ell\bigr),$$

where the outer skeletonization need not be recomputed since we already know what the result will be. Thus, the necessary computation to update this skeletonization is simply

$$[\{S_i,R_i,X_{S_iS_i},X_{R_iR_i}\}_{\mathcal{B}_i\in\mathcal{M}_\ell},\tilde V_{\mathcal{M}_\ell},\tilde W_{\mathcal{M}_\ell}] = \mathrm{skel\_group}(A,\mathcal{M}_\ell,\epsilon),$$

with $\tilde V_{\mathcal{M}_\ell}$ and $\tilde W_{\mathcal{M}_\ell}$ defined analogously to $\tilde V_{\mathcal{L}_\ell}$ and $\tilde W_{\mathcal{L}_\ell}$.
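In other words, per level the update recomputes skel only over the marked sets and reuses cached results everywhere else. A schematic Python fragment conveying this reuse follows; the helpers skel and the cache keyed by DOF set are hypothetical stand-ins, not the dissertation's data structures.

    def update_level(A, level_sets, marked, cache, eps):
        # Z(A; L_l) = Z( Z(A; M_l); L_l \ M_l ): recompute the inner factor
        # over the marked boxes, reuse the outer one, since shielding
        # guarantees it is unchanged.
        out = {}
        for B in level_sets:                   # DOF sets B in L_l
            key = frozenset(B)
            if key in marked:
                out[key] = skel(A, B, eps)     # hypothetical skel(), as in 3.1
                cache[key] = out[key]
            else:
                out[key] = cache[key]          # cached from the old factorization
        return out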

    4.2.4 Updating RS

In light of the previous discussion, it is straightforward to write the process of updating RS in algorithmic form; we do so in Algorithm 4.


Algorithm 4 Updating RS after a local perturbation

 1: // Initialize
 2: $A := \check K$
 3: for $\ell := L$ down to 1 do
 4:   // Get new blocks and operators
 5:   $[\{S_i, R_i, X_{S_iS_i}, X_{R_iR_i}\}_{\mathcal{B}_i\in\mathcal{M}_\ell}, \tilde V_{\mathcal{M}_\ell}, \tilde W_{\mathcal{M}_\ell}]$ := skel_group$(A, \mathcal{M}_\ell, \epsilon)$
 6:   // Assemble new parts of skeletonization
 7:   for each $\mathcal{B}\in\mathcal{M}_\ell$ with $\mathcal{B} = S\cup R$ do
 8:     $A(:,R) := 0$
 9:     $A(R,:) := 0$
10:     $A(S,S) := X_{SS}$
11:     $A(R,R) := X_{RR}$
12:   end for
13:   // Assemble old parts of skeletonization
14:   for each $\mathcal{B}\in\mathcal{L}_\ell\setminus\mathcal{M}_\ell$ with $\mathcal{B} = S\cup R$ do
15:     $A(:,R) := 0$
16:     $A(R,:) := 0$
17:     $A(S,S) := X_{SS}$
18:     $A(R,R) := X_{RR}$
19:   end for
20: end for