
DATA-SPARSE ALGORITHMS FOR STRUCTURED MATRICES

    A DISSERTATION

    SUBMITTED TO THE INSTITUTE FOR COMPUTATIONAL

    AND MATHEMATICAL ENGINEERING

    AND THE COMMITTEE ON GRADUATE STUDIES

    OF STANFORD UNIVERSITY

    IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

    FOR THE DEGREE OF

    DOCTOR OF PHILOSOPHY

    Victor Lawrence Minden

    May 2017

This dissertation is online at: http://purl.stanford.edu/nb571rs5647

© 2017 by Victor Lawrence Minden. All Rights Reserved.

Re-distributed by Stanford University under license with the author.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

    Lexing Ying, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

    Eric Darve

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

    George Papanicolaou

    Approved for the Stanford University Committee on Graduate Studies.

    Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

In the first part of the dissertation, we present a method for updating certain hierarchical factorizations for solving linear integral equations with elliptic kernels. In particular, given a factorization corresponding to some initial geometry or material parameters, we can locally perturb the geometry or coefficients and update the initial factorization to reflect this change with asymptotic complexity that is polylogarithmic in the total number of unknowns and linear in the number of perturbed unknowns. We apply our method to the recursive skeletonization factorization and hierarchical interpolative factorization and demonstrate scaling results for a number of different 2D problem setups.

In the second part, we consider the application of hierarchical factorizations to the problem of spatial Gaussian process maximum likelihood estimation, i.e., parameter fitting for kriging. We present a framework for scattered (quasi-)two-dimensional observations using skeletonization factorizations to quickly evaluate the Gaussian process log-likelihood. To efficiently evaluate the gradient of the log-likelihood we introduce two approaches, the first based on matrix peeling and the second based on what we deem selected sparse algebra. This gives a fast, nearly linear time framework for computing both the log-likelihood and its gradient that can be used in the context of black-box numerical optimization for parameter fitting of low-dimensional Gaussian processes.

Finally, we introduce the strong recursive skeletonization factorization (RS-S), a new approximate matrix factorization based on recursive skeletonization for solving discretizations of linear integral equations associated with elliptic partial differential equations in two and three dimensions (and other matrices with similar hierarchical rank structure). Unlike previous skeletonization-based factorizations, RS-S uses a simple modification of skeletonization, strong skeletonization, which compresses only far-field interactions. This leads to an approximate factorization in the form of a product of many block unit-triangular matrices that may be used as a preconditioner or moderate-accuracy direct solver, with dramatically reduced rank growth. We further combine the strong skeletonization procedure with alternating near-field compression to obtain the hybrid recursive skeletonization factorization (RS-WS), a modification of RS-S that exhibits reduced storage cost in many settings. Under suitable rank assumptions both RS-S and RS-WS exhibit linear computational complexity, which we demonstrate with a number of numerical examples.

Acknowledgments

To begin, I would like to acknowledge the mentorship of my advisor, Lexing Ying. Earning a doctorate is a long and (occasionally) arduous process, and I feel like I lucked out in getting to work with a mentor so supportive. During weekly meetings in his office he has offered guidance on everything from big-picture items like overall career direction, research focus, and professional presentation skills to small-scale aspects such as manuscript aesthetics, notation, and profiling and debugging code (among many other things). I don't like to be too effusive, so I'll be brief: thanks. I have learned a lot from you.

After my advisor, I'd like to thank my co-authors on much of my work in graduate school, Ken L. Ho and Anil Damle. I've had many late-night email chains and hours of whiteboard-based discussion with them, and couldn't ask for better colleagues. A well-deserved shout-out as well to others I have worked with on small projects during my graduate career: Phil Colella, Boris Lo, and David Donoho.

I would be remiss if I didn't mention the rest of my committee: Eric Darve, George Papanicolaou, Michael Saunders, and Sanjiva Lele. Eric, George, and Michael, you shaped my first year at ICME, and I'm thankful for the knowledge you've imparted and the opportunities you've given me. Sanjiva, I never had a course with you (which makes you the wildcard of this committee), but thank you for stepping in so readily.

During my time in graduate school, I was supported both by Stanford via a Stanford Graduate Fellowship in Science & Engineering and by the Department of Energy through the Computational Science Graduate Fellowship (CSGF) program (grant number DE-FG02-97ER25308). I certainly appreciate the financial support, but I would also like to thank my CSGF mentors and the CSGF community (and Krell Institute) in general, which taught me a lot about not only the technical details of computational science, but also how to present myself and my work, the interplay of science and civics, and the broader role of computational science in society. Thanks to all, and particularly to Jay Bardhan, Ashlee Ford Versypt, Oliver Fringer, Jeff Hammond, Mary Ann Leung, Dan Martin, Lindsey Eilts, David Keyes, Matt Reuter, and Jim Corones.

All the computational results I obtained during my graduate program were run on ridiculously large computing resources (at least for the time) here at Stanford. Thanks to Lenya Ryzhik for offering time on wave4, Brian Tempero for maintaining all the ICME-specific resources (icme-share, icme-sharedmem, icme-gpu), and the folks at the Stanford Research Computing Center for keeping up the university-wide machines (sherlock, rice, corn, barley).

While I was putting together this dissertation, many folks around ICME offered up chunks of their time for proof-reading and let me bounce my ideas off of them. In particular, I would like to thank Austin Benson, Nolan Skochdopole, Brad Nelson, Ron Estrin, Yingzhou Li, and Xiaotong Suo. Without their keen eyes, this dissertation would have many more typographical errors than it inevitably already does.

Now we turn to personal acknowledgments. I am grateful for the support of my family and friends outside of Stanford: my parents, brother, sisters, brother-in-law, girlfriend, and high school and college friends here around San Francisco. Thanks for helping keep me sane and making sure that my life has at least some non-math-related components. To my parents, in particular: thank you. I wouldn't be here without you.

I wouldn't have made it as far as graduate school without the guidance of many of my professors back at Tufts, specifically my undergraduate advisors Scott MacLachlan and Doug Preis and my math and EE professors Misha Kilmer, Ron Lasser, Usman Khan, Tom Vandervelde, and Eric Miller. I also owe a special thanks to my first-year undergraduate advisor Loring Tu, without whom I likely would have studied international relations or psychology or something else not-math.

Most of my days at Stanford were spent in or around ICME. Thank you to all those responsible for keeping it going day after day: Emily, Antoinette, Claudine, Judy, Karen, and especially Indira and Margot. With ICME in mind, I feel especially thankful for all the friends here who have made my years here fun, educational, and never boring. A shout-out to everyone, but especially Austin, Anil, Sven, Ryan, Yingzhou, Zhiyu, Rikel, Xiaotong, Anjan, Ron, Nolan, Lan, Casey, Arun, Neel, Carson, Laura, Eileen, Brad, Cindy, Nurbek, Ruoxi, Han, Dave, Gil, Evan, Mike, Chris, Konstantin, Dangna, Kari, Milinda, Fei, Fai, Yuekai, Jason, and Tania.

Contents

    Abstract v

    Acknowledgments vii

    1 Introduction 1

    1.1 Solving rank-structured linear systems . . . . . . . . . . . . . . . . . 2

    1.2 Contributions of this thesis . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.2.1 Updating skeletonization factorizations . . . . . . . . . . . . . 5

    1.2.2 Maximum likelihood estimation for Gaussian processes . . . . 6

    1.2.3 Strong-admissibility-based skeletonization . . . . . . . . . . . 7

    2 Background material 9

    2.1 Hierarchical decomposition of space . . . . . . . . . . . . . . . . . . . 10

    2.2 Block-structured elimination . . . . . . . . . . . . . . . . . . . . . . . 12

    2.3 The interpolative decomposition . . . . . . . . . . . . . . . . . . . . . 13

    3 Review of some skeletonization-based factorizations 15

    3.1 Skeletonization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

    3.1.1 Group skeletonization . . . . . . . . . . . . . . . . . . . . . . 18

    3.2 The recursive skeletonization factorization . . . . . . . . . . . . . . . 19

    3.2.1 Level L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2.2 Level L − 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.3 Higher levels . . . . . . . . . . . . . . . . . . . . . . . . . 22

    3.2.4 The use of a proxy surface . . . . . . . . . . . . . . . . . . . . 23

3.2.5 Complexity of RS using the proxy trick . . . . . . . . . . 25

    3.3 The hierarchical interpolative factorization . . . . . . . . . . . . . . . 27

3.3.1 Level L − 1/2 . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.2 Higher levels . . . . . . . . . . . . . . . . . . . . . . . . . 28

    3.3.3 Complexity of HIF using the proxy trick . . . . . . . . . . . . 29

    4 Updating skeletonization-based factorizations 33

    4.1 Approaches to updating . . . . . . . . . . . . . . . . . . . . . . . . . 35

    4.1.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . 36

    4.2 Updating algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

    4.2.1 Initial observation on updating a skeletonization . . . . . . . . 37

    4.2.2 Propagation rules for higher levels . . . . . . . . . . . . . . . . 39

    4.2.3 Updating a group skeletonization . . . . . . . . . . . . . . . . 40

    4.2.4 Updating RS . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    4.2.5 Complexity of updating RS . . . . . . . . . . . . . . . . . . . 44

    4.2.6 Modifications for updating HIF . . . . . . . . . . . . . . . . . 46

    4.2.7 Complexity of updating HIF . . . . . . . . . . . . . . . . . . . 47

    4.3 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.3.1 Example 1: Laplace double-layer potential on a circle with a bump . . . . . . . . . . . . . . . . . . . . . . . . . . 49

    4.3.2 Example 2: the Lippmann-Schwinger equation . . . . . . . . . 50

    4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

    5 Gaussian process MLE through skeletonization 57

    5.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    5.1.1 Alternative approaches . . . . . . . . . . . . . . . . . . . . . . 60

    5.2 Factorization of the covariance matrix . . . . . . . . . . . . . . . . . . 61

    5.2.1 Modified proxy trick . . . . . . . . . . . . . . . . . . . . . . . 62

    5.2.2 Operations for MLE using RS . . . . . . . . . . . . . . . . . . 63

    5.3 Computing the trace terms with peeling . . . . . . . . . . . . . . . . 66

    5.3.1 Randomized low-rank approximations . . . . . . . . . . . . . . 67

    5.3.2 Matrix peeling for weakly-admissible matrices . . . . . . . . . 68

5.3.3 Computational complexity . . . . . . . . . . . . . . . . . 74

    5.4 Summary of MLE framework . . . . . . . . . . . . . . . . . . . . . . 76

    5.5 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

    5.5.1 Runtime scaling of the peeling algorithm . . . . . . . . . . . . 77

    5.5.2 Relative efficiency of peeling versus the Hutchinson estimator . 80

5.5.3 Synthetic data example, Matérn kernel . . . . . . . . . . 82

5.5.4 Ocean data example, Matérn kernel . . . . . . . . . . . . 83

    5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

6 Selected sparse algebra and the product trace for faster Gaussian process MLE 89

    6.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

    6.1.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . 90

    6.1.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

    6.2 Selected sparse algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 92

    6.2.1 Fast application of skeletonization factorizations . . . . . . . . 92

    6.2.2 Selected sparse algebra with RS . . . . . . . . . . . . . . . . . 94

    6.2.3 Modifications for HIF . . . . . . . . . . . . . . . . . . . . . . . 103

    6.3 Computing the product trace . . . . . . . . . . . . . . . . . . . . . . 107

    6.3.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

    6.3.2 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

    6.4 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

    6.4.1 Selected sparse algebra: diagonal extraction . . . . . . . . . . 111

    6.4.2 The full product trace . . . . . . . . . . . . . . . . . . . . . . 115

    6.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

    7 The strong recursive skeletonization factorization 123

    7.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

    7.2 Strong skeletonization . . . . . . . . . . . . . . . . . . . . . . . . . . 125

    7.2.1 The use of a proxy surface in the strong case . . . . . . . . . . 128

    7.3 Algorithm and complexity . . . . . . . . . . . . . . . . . . . . . . . . 130

    7.3.1 The general case: first level . . . . . . . . . . . . . . . . . . . 133

7.3.2 The general case: subsequent levels . . . . . . . . . . . . 138

    7.3.3 The final factorization . . . . . . . . . . . . . . . . . . . . . . 140

    7.3.4 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

    7.3.5 Extension: hybrid skeletonization . . . . . . . . . . . . . . . . 144

    7.4 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    7.4.1 Example 1: unit square in 2D . . . . . . . . . . . . . . . . . . 147

    7.4.2 Example 2: unit cube in 3D . . . . . . . . . . . . . . . . . . . 151

    7.4.3 Example 3: unit sphere in 3D . . . . . . . . . . . . . . . . . . 153

    7.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

List of Tables

    4.1 Timing results for quasi-1D factorization updating . . . . . . . . . . . 51

    4.2 Timing results for 2D factorization updating . . . . . . . . . . . . . . 54

    5.1 Complexity of operations with RS in 2D . . . . . . . . . . . . . . . . 65

    5.2 Runtime and storage complexity of peeling algorithm . . . . . . . . . 76

    5.3 Runtime of peeling algorithm on squared-exponential kernel . . . . . 79

5.4 Runtime of peeling algorithm on Matérn kernel . . . . . . . . . . . . 80

    5.5 Runtime of one iteration for Gaussian process MLE with gridded data 83

    5.6 Runtime of one iteration for Gaussian process MLE with scattered data 85

    6.1 Runtime of selected sparse algebra with RS . . . . . . . . . . . . . . . 112

    6.2 Runtime of selected sparse algebra with HIF . . . . . . . . . . . . . . 114

    6.3 Error of selected sparse algebra with HIF . . . . . . . . . . . . . . . . 114

    6.4 Runtime of product trace for 2D problem . . . . . . . . . . . . . . . . 116

    6.5 Error of product trace for 2D problem . . . . . . . . . . . . . . . . . 117

    6.6 Runtime of product trace for quasi-2D problem . . . . . . . . . . . . 119

    6.7 Error of product trace for quasi-2D problem . . . . . . . . . . . . . . 120

    7.1 Timing and memory results for RS-S with 2D grid . . . . . . . . . . . 150

    7.2 Accuracy results for RS-S with 2D grid . . . . . . . . . . . . . . . . . 150

    7.3 Timing and memory results for RS-S with 3D grid . . . . . . . . . . . 154

    7.4 Accuracy results for RS-S with 3D grid . . . . . . . . . . . . . . . . . 154

    7.5 Timing and memory results for RS-S with quasi-2D sphere . . . . . . 157

    7.6 Accuracy results for RS-S with quasi-2D sphere . . . . . . . . . . . . 157

List of Figures

    1.1 Strong admissibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    2.1 Perfect hierarchical partitioning of space . . . . . . . . . . . . . . . . 10

    2.2 Nonuniform hierarchical partitioning of space . . . . . . . . . . . . . 11

    3.1 Active DOFs for each level of RS . . . . . . . . . . . . . . . . . . . . 23

    3.2 The proxy surface (left) and Voronoi tessellation for HIF (right) . . . 26

    3.3 Active DOFs for each level of HIF . . . . . . . . . . . . . . . . . . . . 30

    4.1 A localized geometric perturbation . . . . . . . . . . . . . . . . . . . 34

    4.2 Propagation of the marked set after a local perturbation . . . . . . . 40

    4.3 Quasi-1D and 2D domains for updating examples . . . . . . . . . . . 49

    4.4 Timing plot for quasi-1D factorization updating . . . . . . . . . . . . 51

    4.5 Timing results for 2D factorization updating . . . . . . . . . . . . . . 53

    5.1 Modified proxy trick for covariance kernels . . . . . . . . . . . . . . . 64

    5.2 Labeling of the quadtree subdomains for peeling . . . . . . . . . . . . 68

    5.3 Realizations of Gaussian processes with two different parameters . . . 79

    5.4 Runtime plots for peeling algorithm versus Hutchinson trace estimator 81

5.5 Relative error plots for peeling versus Hutchinson trace estimator . . 82

    5.6 Timing plots for one iteration of Gaussian process MLE . . . . . . . . 84

    5.7 Visualization of sea surface temperature kriging . . . . . . . . . . . . 87

    6.1 Tree traversals for selected sparse algebra . . . . . . . . . . . . . . . . 102

    6.2 Generalized ancestors for selected sparse algebra with HIF . . . . . . 106

6.3 Results for diagonal extraction with HIF . . . . . . . . . . . . . . . . 115

    6.4 Runtime plots for product trace computation for 2D problem . . . . . 118

    6.5 Plots of the 3D surface for the product trace example . . . . . . . . . 119

    6.6 Runtime plots for product trace computation for quasi-2D problem . 120

    7.1 Near-field and far-field DOFs (left) and proxy surface (right) . . . . . 125

    7.2 Illustration of RS-S in 1D, first level . . . . . . . . . . . . . . . . . . . 132

    7.3 Illustration of RS-S in 1D, second level . . . . . . . . . . . . . . . . . 134

    7.4 A situation illustrating why modified interactions must be compressed 136

    7.5 Illustration of RS-S in 2D, first level . . . . . . . . . . . . . . . . . . . 139

    7.6 Illustration of RS-S in 2D, second level . . . . . . . . . . . . . . . . . 140

    7.7 Timing and memory plots for RS-S with 2D grid . . . . . . . . . . . . 149

    7.8 Timing and memory plots for RS-S with 3D grid . . . . . . . . . . . . 152

    7.9 Timing and memory plots for RS-S with quasi-2D sphere . . . . . . . 156

    7.10 Domain coloring for parallelization of RS-S . . . . . . . . . . . . . . . 159

List of Algorithms

    1 The recursive skeletonization factorization (RS) . . . . . . . . . . . . 24

    2 The hierarchical interpolative factorization (HIF) . . . . . . . . . . . 30

    3 RS in modified form . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    4 Updating RS after a local perturbation . . . . . . . . . . . . . . . . . 42

    5 Computing the Gaussian process log-likelihood and gradient . . . . . 77

6 Applying F for RS to M ∈ C^{n×m} . . . . . . . . . . . . . . . . . . . 95
7 Applying F^{1/2} for RS to M ∈ C^{n×m} . . . . . . . . . . . . . . . . 96
8 Applying (F^{1/2})^* for RS to M ∈ C^{n×m} . . . . . . . . . . . . . . 97
9 Applying F for RS to M ∈ C^{n×m} with input restricted to Ω_j ∈ 𝓛_L and output restricted to Ω_k ∈ 𝓛_L . . . . . . . . . . . . . . . . . . . 104
10 Computing T = Tr(A^{-1}B) using F_A and F_B . . . . . . . . . . . . 109
11 The strong recursive skeletonization factorization (RS-S) . . . . . . . 142

    12 The hybrid recursive skeletonization factorization (RS-WS) . . . . . . 146

Chapter 1

    Introduction

The focus of this dissertation is the development of fast computational algorithms for working with kernel matrices exhibiting various forms of hierarchical block low-rank structure such as arise in a number of settings in the physical sciences and statistics. A standard application-rich example is integral equations coming from elliptic partial differential equations. We discuss this below, deferring discussion of statistical examples to later in this thesis.

In particular, consider the general-form integral equation

\[
a(x)u(x) + b(x)\int_{\Omega} K(x - y)\,c(y)\,u(y)\,dy = f(x), \qquad x \in \Omega \subseteq \mathbb{R}^d, \tag{1.1}
\]

in dimension d = 2 or 3, where the kernel function K(z) is associated with some underlying elliptic partial differential equation (PDE), i.e., it is the Green's function or its derivative. Here, a(x), b(x), and c(y) are given functions that typically represent material parameters, f(x) is some known right-hand side, and u(x) is the unknown function to be determined. We make the additional stipulation that the kernel K(z) should not exhibit significant oscillation away from the origin, though this is not strictly necessary to apply the algorithms outlined in this thesis. In this setting, (1.1) remains rather general and includes problems such as the Laplace equation, the Lippmann-Schwinger equation, and the Helmholtz equation in the low- to moderate-frequency regime. Further, while we concentrate on the case where u(x) is scalar-valued, extension to the vector-valued case (e.g., the Stokes or elasticity equations) is straightforward.

Discretization of (1.1) using typical approaches such as collocation, the Nyström method, or the Galerkin method leads to a linear system with N degrees of freedom (DOFs)

\[
Ku = f, \tag{1.2}
\]

where the entries of the matrix K ∈ C^{N×N} are dictated by the kernel K(z) and the discretization scheme. For example, in the case where our domain is the unit square [0,1]^2, a simple Nyström approximation to the integral using a regular grid with √N points in each direction yields the discrete system

\[
\left[a(x_i) + w_i\right]u_i + \frac{b(x_i)}{N}\sum_{j \neq i} K(x_i - x_j)\,c(x_j)\,u_j = f(x_i), \qquad i = 1, \ldots, N, \tag{1.3}
\]

where the discrete solution {u_i} ≈ {u(x_i)} approximates the continuous solution on the grid and each term w_i u_i corresponds to some discretization of diagonal entries of K. Because K(z) is frequently singular at the origin, this discretization may be more involved than that of the off-diagonal entries. While more complicated and higher-order discretization schemes exist, (1.3) illustrates the key feature that off-diagonal entries of K are given essentially by kernel interactions between distinct points in space. A toy assembly of (1.3) appears below.
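The following minimal Python sketch makes (1.3) concrete. It assumes, purely for illustration, the 2D Laplace kernel K(z) = −log‖z‖/(2π), constant coefficients a = b = c = 1 by default, and the crude choice w_i = 0 for the kernel-dependent diagonal correction; none of these choices are prescribed by the text.

    import numpy as np

    def nystrom_matrix(n, a=lambda x: 1.0, b=lambda x: 1.0, c=lambda x: 1.0):
        # Regular sqrt(N)-by-sqrt(N) grid of N = n*n points on [0,1]^2.
        h = 1.0 / n
        grid = (np.arange(n) + 0.5) * h
        X = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)
        N = X.shape[0]
        r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(r, 1.0)                  # placeholder; diagonal set below
        K = -np.log(r) / (2.0 * np.pi)            # kernel interactions K(x_i - x_j)
        bs = np.array([b(x) for x in X])
        cs = np.array([c(x) for x in X])
        A = (bs[:, None] / N) * K * cs[None, :]   # quadrature weight 1/N per point
        np.fill_diagonal(A, [a(x) for x in X])    # [a(x_i) + w_i] u_i with w_i = 0
        return A

    # Example: assemble the dense discrete system matrix for N = 400 unknowns.
    A = nystrom_matrix(20)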

1.1 Solving rank-structured linear systems

Because K in (1.2) is dense and generally large in practice, traditional direct factorizations of K such as the LU factorization are typically too expensive due to the associated O(N^3) time complexity and O(N^2) storage cost.

Figure 1.1: Given two boxes in R^2, each with sidelength D and with corresponding DOF sets B_1 and B_2, in the strong admissibility setting the associated off-diagonal blocks K_{B_1B_2} and K_{B_2B_1} are assumed to be numerically low-rank as long as the boxes are separated by a distance of at least D. In contrast, in the weak admissibility setting the boxes need only be non-overlapping.

Given the availability of fast schemes for applying K such as fast multipole methods (FMMs) [24, 30, 32, 84], iterative methods such as the conjugate gradient method (CG) [40] form a tempting alternative to direct methods. For first-kind integral equations or problems where a(x), b(x), or c(x) exhibit high contrast, however, convergence is typically slow, leading to a lack of robustness. In other words, while each iteration is relatively fast, the number of iterations necessary to attain reasonable accuracies can be unreasonably large.

The above considerations have led to the development of a plethora of alternative methods for solving (1.2) approximately by exploiting properties of the kernel K(z) and the underlying physical structure of the problem. In particular, such methods take advantage of the fact that K exhibits hierarchical block low-rank structure. A large body of work pioneered by Hackbusch and collaborators on the algebra of H-matrices (and H^2-matrices) provides an important and principled theoretical framework for obtaining linear or quasilinear complexity when working with matrices exhibiting such structure [34, 36, 37]. Inside the asymptotic scaling of this approach, however, lurk large constant factors that hamper practical performance, particularly in the 3D case.

The H-matrix literature classifies matrices with hierarchical block low-rank structure into two categories based on which off-diagonal blocks of the matrix are compressed. Given a quadtree or octree data structure partitioning the domain into small boxes,¹ let B_1 and B_2 be sets of DOFs corresponding to distinct boxes at the same level of the tree, each with sidelength D. For strongly-admissible hierarchical matrices, the off-diagonal block K_{B_1B_2} is compressed only if B_1 and B_2 are well-separated as in the FMM, that is, if B_1 and B_2 are separated by a distance of at least D as in Figure 1.1. In contrast, weakly-admissible hierarchical matrices compress not only well-separated interactions but also interactions corresponding to DOFs in adjacent boxes. The inclusion of nearby interactions under weak admissibility typically increases the required approximation rank, but it also affords a much simpler geometric and algorithmic structure; a toy distance check contrasting the two criteria appears below.
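The following sketch contrasts the two criteria for same-level, axis-aligned boxes given by their lower-left corners. The L∞ (Chebyshev) box distance is used for simplicity, and the function names are ours, not the literature's.

    import numpy as np

    def strongly_admissible(corner1, corner2, D):
        # Well-separated as in the FMM: the boxes are at least a distance D
        # apart, i.e., the largest per-axis gap |delta| - D is at least D.
        delta = np.abs(np.asarray(corner1) - np.asarray(corner2))
        return np.max(delta - D) >= D

    def weakly_admissible(corner1, corner2, D):
        # Merely non-overlapping: distinct (possibly adjacent) same-level boxes.
        delta = np.abs(np.asarray(corner1) - np.asarray(corner2))
        return np.max(delta) >= D

    # Two sidelength-1 boxes sharing an edge: weakly but not strongly admissible.
    print(weakly_admissible([0, 0], [1, 0], 1.0),
          strongly_admissible([0, 0], [1, 0], 1.0))   # True False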

A number of more recent methods have been developed for hierarchically rank-structured matrices with the aim of more efficient practical performance based on weakly-admissible rank structure. Examples include algorithms for hierarchical semiseparable (HSS) matrices [13, 14, 82], hierarchical off-diagonal low-rank (HODLR) matrices [1, 52], and methods based on recursive skeletonization [27, 41, 53], among other related schemes [11, 15]. In general, methods based strictly on weak admissibility require allowing ranks of off-diagonal blocks to grow non-negligibly with N to attain a fixed target accuracy. This has led to the development of more involved methods such as the hierarchical interpolative factorization of Ho & Ying [43] and the method of Corona et al. [17], which combine skeletonization with additional compression steps based on geometric considerations to obtain greater efficiency at the cost of a more complicated algorithm.

The skeletonization process that forms the core of the skeletonization-based methods used and developed in this thesis was introduced by Martinsson & Rokhlin [53] and Cheng et al. [16] based on observations by Starr & Rokhlin [67] and Greengard & Rokhlin [31]. At their core, the methods attain fast compression using a so-called interpolative decomposition to perform low-rank compression of off-diagonal blocks of K without modifying many other entries. This important feature means that such algorithms preserve interpretability of rows and columns of the partially-compressed matrix, leading to intuitive and efficient algorithms. In this context, we refer to such factorizations as data-sparse.

¹We review this in section 2.1.

1.2 Contributions of this thesis

To begin, we spend a small amount of time in Chapter 2 reviewing some very basic material underlying hierarchical factorizations, particularly some terminology concerning hierarchical tree-based decompositions of space and block linear algebra. We additionally review the notion of an interpolative decomposition, the compression technique of choice for our algorithms.

We then delve into some of the more complicated material on which the work of this thesis is based in Chapter 3. First, we review the recursive skeletonization factorization, which serves as the prototypical example of a skeletonization-based factorization. This serves to introduce the general idea of the method of skeletonization, as well as how to apply skeletonization in a level-by-level fashion to obtain a factorization for discretized integral equations. Following this, we review the hierarchical interpolative factorization. Based on similar ideas, this modification of the recursive skeletonization factorization introduces additional geometric information beyond the tree-based decomposition to obtain better asymptotic complexity for problems with higher intrinsic dimension.

With the background material and notation established, we then dive into the core contributions of the thesis, which are broken into four different chapters and three subject areas. These chapters, which we preview below, are based on joint work with Ken L. Ho, Anil Damle, and Lexing Ying. In all cases, the corresponding papers were the primary work of the dissertation author, including design and analysis of methods, coding and numerical simulation, and manuscript writing and editing [57-59].

1.2.1 Updating skeletonization factorizations

The primary focus in the existing literature for hierarchical matrix factorizations has been on the speed with which these factorizations can be constructed and subsequently used to solve linear systems, either via preconditioning an iterative method (at low accuracies) or directly (at high accuracies). The structure of these factorizations, however, admits a number of other efficient operations that can be useful in practical settings.

We exploit the same hierarchical structure that makes these factorizations useful to explore the idea of updating a factorization in response to a local perturbation. By this we mean: given a factorization F that corresponds to, for example, a discretization of an integral equation on some geometric domain, how do we construct a modified factorization F̌ corresponding to the same integral equation on a slightly modified geometry? We can of course simply construct a brand-new factorization using the standard algorithm, but as we observe in Chapter 4 there are many cases where we can update this factorization in an asymptotically (and practically) more efficient way [58].

We describe our approach in the context of sequences of modifications to two-dimensional integral equations, such as might arise, for example, in the context of a design problem where the optimization variables parameterize a portion of the domain and the objective function involves quantities governed by, for example, diffusion or low-frequency scattering. However, the methods we describe are also highly applicable to the case of kernelized covariance matrices, and allow fast updated factorizations in response to, for example, the addition or removal of observations.

1.2.2 Maximum likelihood estimation for Gaussian processes

Based on the observation that skeletonization-based factorizations give fast ways to compute quantities such as the log-determinant, we next turn to an application of such factorizations: parameterized Gaussian processes. Low-dimensional Gaussian processes have wide applicability in the geosciences in the context of kriging, where they are used to predict the value of some spatially-indexed quantity of interest given observations at other locations [69]. Many covariance kernels of interest are structurally similar to the Green's functions of elliptic PDEs in the sense that, empirically, they exhibit similar hierarchical rank structure and admit the use of hierarchical factorizations.

In Chapter 5, we consider the task of maximum likelihood estimation for parameterized Gaussian processes: given some observations of a spatial field that are assumed to be modeled by a Gaussian process whose covariance kernel is specified up to some parameter, find the value of that parameter that maximizes the Gaussian process log-likelihood. This can be a computationally-demanding task due to the fact that the log-determinant of the covariance matrix (and the derivative of said quantity) both play a role, but we demonstrate that skeletonization-based factorizations can be an efficient method by outlining a general framework based on these ideas [57]. The standard expressions we have in mind are recalled below.
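For concreteness (in generic notation, not necessarily that of Chapter 5): for a zero-mean Gaussian process with observation vector z and parameterized covariance matrix Σ(θ), the standard log-likelihood and gradient expressions making the log-determinant and trace terms explicit are

\[
\log p(z \mid \theta) = -\tfrac{1}{2}\, z^{\top}\Sigma(\theta)^{-1}z \;-\; \tfrac{1}{2}\log\det\Sigma(\theta) \;-\; \tfrac{N}{2}\log 2\pi,
\]
\[
\frac{\partial}{\partial\theta_i}\log p(z \mid \theta) = \tfrac{1}{2}\, z^{\top}\Sigma^{-1}\Sigma_i\,\Sigma^{-1}z \;-\; \tfrac{1}{2}\operatorname{Tr}\!\left(\Sigma^{-1}\Sigma_i\right), \qquad \Sigma_i \equiv \frac{\partial\Sigma(\theta)}{\partial\theta_i}.
\]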

In follow-up work, we explore more deeply in Chapter 6 some of the intricacies of the structure of skeletonization-based factorizations, with an eye toward obtaining an optimally efficient method for computing the gradient of the log-likelihood to high accuracy. To accomplish this, we introduce the idea of selected sparse algebra for skeletonization-based factorizations, which, like updating, uses the hierarchical structure of skeletonization in a nuanced way to perform computations that are inherently data-sparse. For example, we consider the task of extracting the diagonal of F^{-1} when F is a skeletonization-based factorization. Based on these techniques, we outline a fast method for computing the product trace Tr(Σ^{-1}Σ_i) appearing in the gradient of the Gaussian process log-likelihood, which substantially improves upon the method of Chapter 5.

1.2.3 Strong-admissibility-based skeletonization

Finally, in Chapter 7 we build upon the recursive skeletonization factorization [43, 53] to construct a new skeletonization-based factorization using strong admissibility [59]. Succinctly, previous factorizations based on skeletonization are based on the idea of compressing blocks of a matrix corresponding to kernel interactions between distinct convex subdomains: the case of weak admissibility. In contrast, we use a modified skeletonization process to restrict compression only to blocks of a matrix corresponding to kernel interactions between well-separated subdomains, i.e., strong admissibility.

While previous skeletonization-based algorithms have used iterative dimensionality reduction (i.e., more complicated algorithms) to get at the asymptotic efficiency of algorithms based on strong admissibility while maintaining the practical efficiency of skeletonization, our strong-admissibility-based skeletonization factorization is the first to treat this case directly with skeletonization, leading to a simpler algorithm that can be thought of as a linear algebraic inverse to the fast multipole method [59]. We find our algorithm to be relatively simple and easy to understand and implement compared to competing approaches, with competitive performance.

Chapter 2

Background material

Because of the different application and algorithmic particulars discussed in different chapters, we will occasionally use different notation to express the same quantities. This is for the sake of clarity within each chapter. We attempt to mention explicitly any overloaded notation at the point it arises. In general, however, we follow the following notational conventions as closely as possible.

For a positive integer N, the index set {1, 2, ..., N} is denoted by [N]. We write matrices or matrix-valued functions in the sans serif font (e.g., A ∈ C^{N×N}) but make no such distinction for vectors (e.g., x ∈ C^N). Given a vector or matrix, the norms ‖x‖ or ‖A‖ refer to the standard Euclidean vector norm and corresponding induced matrix norm, respectively. The math-calligraphic font is used to indicate index sets (e.g., I = {i_1, i_2, ..., i_r} with each i_j a positive integer) that we use to index blocks of a matrix (e.g., A_{IJ} = A(I, J) ∈ C^{|I|×|J|}, using MATLAB notation). Therefore, each index set has an implicit ordering, though we use the term "set" as opposed to "vector" to avoid conflation. Because we are working with matrices discretizing integral equations, indices in an index set are typically associated with points in R^d (e.g., Nyström or collocation points or centroids of elements). As such, we will use the more general term "DOF sets" to refer to both the index set B and the corresponding points {x_i}_{i∈B} in R^d. This leads to one notation that will appear strange at first blush: if I is an index set and Ω_i ⊂ R^d is a subdomain, then I ∩ Ω_i is well-defined when understood in this sense. Finally, to denote ordered sets that are neither associated with points in the domain nor used to index matrices, we use the math-script font (e.g., 𝓛).

Figure 2.1: In a perfect quadtree, each level ℓ has 4^ℓ subdomains. Here we visualize ℓ = 1 (left) and ℓ = 2 (right).

2.1 Hierarchical decomposition of space

A key data structure for factorizations based on exploiting spatial structure is a tree-based hierarchical decomposition of space, which hierarchically partitions the domain Ω. Here, we briefly review the use of a quadtree (octree in 3D) and the corresponding terminology, but direct the reader to Samet [65] for a more thorough description.

Suppose for simplicity that the domain Ω is a rectangle in R^2 and define the root of the quadtree as Ω itself with corresponding index set [N], denoting the fact that {x_j}_{j∈[N]} ⊂ Ω. At level ℓ = 1, we partition Ω uniformly into four child subdomains, and at level ℓ = 2 we recursively further partition each of these subdomains into four more subdomains. This leads to the decomposition shown in Figure 2.1. For each subdomain Ω_i, the corresponding index set I_i ⊆ [N] is the largest possible such that {x_j}_{j∈I_i} ⊂ Ω_i.

It is evident that we may continue to subdivide the domain in this fashion an arbitrary number of times, such that at level ℓ we have 4^ℓ subdomains. Subdividing uniformly such that the tree has levels ℓ = 0, 1, ..., L leads to a perfect quadtree with 4^L subdomains at the bottom level. The three-dimensional case of an octree is analogous, except that the domain is divided into rectangular prisms instead of rectangles, leading to 8^ℓ subdomains at level ℓ.

Figure 2.2: Suppose that our integral equation is posed on the boundary of the black region (left). Using a regular discretization of the boundary and building an adaptive quadtree on the resulting DOFs, we can visualize the spatial hierarchy (right), where each square leaf subdomain Ω_i is colored according to its level, ℓ.

In the entirety of this work, we will visualize examples in terms of a perfect quadtree, which is what our spatial hierarchy will look like if our points {x_j}_{j∈[N]} form a regular grid in space. However, most problems are not discretized on uniform grids, and the algorithms we describe do not rely on this assumption. Instead, an adaptive quadtree (or octree) is constructed, such that subdomains that have sufficiently few points are not subdivided. Concretely, given an occupancy parameter n_occ, we only subdivide subdomains Ω_i such that |I_i| > n_occ, which leads to a partitioning such as is visualized in Figure 2.2. A minimal code sketch of this construction follows.
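Here is a minimal adaptive-quadtree sketch in Python; the class and function names are illustrative only, and boundary ties between sibling boxes are ignored for brevity.

    import numpy as np

    class Box:
        def __init__(self, center, width, idx, level):
            self.center, self.width = np.asarray(center), width
            self.idx = idx           # the index set I_i of points in this subdomain
            self.level = level
            self.children = []       # empty for leaf subdomains

    def build_quadtree(points, center, width, idx, level, n_occ):
        box = Box(center, width, idx, level)
        if len(idx) <= n_occ:        # occupancy condition: stop subdividing
            return box
        for dx in (-0.25, 0.25):
            for dy in (-0.25, 0.25):
                c = box.center + width * np.array([dx, dy])
                inside = np.all(np.abs(points[idx] - c) <= width / 4, axis=1)
                box.children.append(build_quadtree(
                    points, c, width / 2, idx[inside], level + 1, n_occ))
        return box

    # Example: 1000 points in the unit square, at most n_occ = 32 per leaf.
    pts = np.random.rand(1000, 2)
    root = build_quadtree(pts, center=[0.5, 0.5], width=1.0,
                          idx=np.arange(1000), level=0, n_occ=32)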

We define the following notions on a hierarchical tree-based partitioning of space.

• For any given subdomain Ω_i at level ℓ, the children of Ω_i are all subdomains Ω_j at level ℓ + 1 such that Ω_j ⊂ Ω_i. We write the collection of all children of Ω_i as child(Ω_i). We call Ω_i the parent of Ω_j and write Ω_i = parent(Ω_j).

• If child(Ω_i) = ∅ then we say Ω_i is a leaf subdomain.

• We say that Ω_i and Ω_j are siblings if parent(Ω_i) = parent(Ω_j).

• We say that Ω_j at level ℓ′ is a neighbor of Ω_i at level ℓ if (i) the subdomain Ω_j is adjacent to Ω_i and (ii) either ℓ′ = ℓ, or ℓ′ < ℓ and child(Ω_j) = ∅. We write the collection of all neighbors of Ω_i as nbor(Ω_i). Note that this is not a symmetric relation, i.e., it is possible that Ω_j ∈ nbor(Ω_i) and Ω_i ∉ nbor(Ω_j) due to the dependence on ℓ and ℓ′.

• The collection of ancestors of a subdomain Ω_i is defined recursively as the set of all Ω_j such that either Ω_j = parent(Ω_i) or Ω_j = parent(Ω_k) for some Ω_k that is an ancestor of Ω_i. We write the collection of ancestors as anc(Ω_i).

• The descendants of a subdomain Ω_i are defined to be all nodes Ω_j such that Ω_i ∈ anc(Ω_j). We write the collection of descendants as desc(Ω_i).

In our algorithms, we typically require a fixed but arbitrary bottom-up level-by-level traversal of the tree and order the subdomains accordingly such that a subdomain at level L is ordered before any subdomain at level L − 1 and so on. This total ordering of the subdomains induces corresponding orderings on the boxes within each level of the tree, 𝓛_ℓ for ℓ = 1, ..., L. For example, in the case of a regular tree with 2^{dℓ} subdomains at level ℓ we obtain the orderings

\[
\mathscr{L}_L = \left\{ 1, 2, \ldots, 2^{dL} \right\}, \qquad
\mathscr{L}_{L-1} = \left\{ 2^{dL} + 1,\; 2^{dL} + 2,\; \ldots,\; 2^{dL} + 2^{d(L-1)} \right\},
\]

and so on. A small traversal helper in this spirit appears below.
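The following helper, written against the hypothetical Box class of the quadtree sketch above, collects boxes level by level in the bottom-up order just described.

    def levels_bottom_up(root):
        # Group boxes by level, then return [level L, level L-1, ..., level 0].
        by_level, stack = {}, [root]
        while stack:
            box = stack.pop()
            by_level.setdefault(box.level, []).append(box)
            stack.extend(box.children)
        L = max(by_level)
        return [by_level[l] for l in range(L, -1, -1)]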

2.2 Block-structured elimination

We begin with a brief review of block-structured elimination and its efficiency, which is central to the skeletonization algorithm.

Let A ∈ C^{N×N} be an N × N matrix and suppose [N] = I ∪ J ∪ K is a partition of the index set of A such that both A_{IK} = 0 and A_{KI} = 0, i.e., we have the block structure

\[
A = \begin{bmatrix} A_{II} & A_{IJ} & \\ A_{JI} & A_{JJ} & A_{JK} \\ & A_{KJ} & A_{KK} \end{bmatrix},
\]

up to permutation. Assuming that the block A_{II} is invertible, the DOFs I can be decoupled as follows. First, define the matrices L and U as

\[
L = \begin{bmatrix} I & & \\ -A_{JI}A_{II}^{-1} & I & \\ & & I \end{bmatrix}, \qquad
U = \begin{bmatrix} I & -A_{II}^{-1}A_{IJ} & \\ & I & \\ & & I \end{bmatrix}, \tag{2.1}
\]

with the same block partitioning as A. Then, applying these operators on the left and right of A yields

\[
LAU = \begin{bmatrix} A_{II} & & \\ & S_{JJ} & A_{JK} \\ & A_{KJ} & A_{KK} \end{bmatrix}, \tag{2.2}
\]

where S_{JJ} = A_{JJ} − A_{JI}A_{II}^{-1}A_{IJ} is the only nonzero block of the resulting matrix that has been modified.

We say that S_{JJ} is related to A_{JJ} through a Schur complement update. Note that, while we choose here to write block elimination in its simplest form, in practice it can be numerically advantageous to work with a factorization of A_{II} as is done by Ho & Ying [43, Lemma 2.1] as opposed to inverting the submatrix directly. Either way, the cost of computing S_{JJ} is O(|I|^3 + |I| |J|^2). A short numerical check of (2.2) follows.
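This small sketch verifies (2.1)-(2.2) on a random matrix with the required zero blocks: the elimination decouples the DOFs I and produces the Schur complement S_JJ. Block sizes are arbitrary.

    import numpy as np

    nI, nJ, nK = 3, 4, 5
    rng = np.random.default_rng(0)
    A = rng.standard_normal((nI + nJ + nK,) * 2)
    A[:nI, nI + nJ:] = 0.0                       # A_IK = 0
    A[nI + nJ:, :nI] = 0.0                       # A_KI = 0

    AII_inv = np.linalg.inv(A[:nI, :nI])
    L = np.eye(nI + nJ + nK)
    L[nI:nI + nJ, :nI] = -A[nI:nI + nJ, :nI] @ AII_inv
    U = np.eye(nI + nJ + nK)
    U[:nI, nI:nI + nJ] = -AII_inv @ A[:nI, nI:nI + nJ]

    B = L @ A @ U                                # right-hand side of (2.2)
    S_JJ = A[nI:nI + nJ, nI:nI + nJ] \
         - A[nI:nI + nJ, :nI] @ AII_inv @ A[:nI, nI:nI + nJ]
    assert np.allclose(B[nI:nI + nJ, nI:nI + nJ], S_JJ)  # Schur complement block
    assert np.allclose(B[:nI, nI:], 0) and np.allclose(B[nI:, :nI], 0)  # I decoupled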

2.3 The interpolative decomposition

Another key linear algebra tool of which we will make heavy use is the interpolative decomposition [16].

Definition 2.1. Given both a matrix A_{IJ} ∈ C^{|I|×|J|} with rows indexed by I and columns indexed by J and a tolerance ε > 0, an ε-accurate interpolative decomposition (ID) of A_{IJ} is a partitioning of J into DOF sets associated with so-called skeleton columns S ⊂ J and redundant columns R = J \ S and a corresponding interpolation matrix T such that

\[
\left\| A_{IR} - A_{IS}T \right\| \le \epsilon \left\| A_{IJ} \right\|,
\]

or equivalently, assuming A_{IJ} = [\,A_{IR}\ \ A_{IS}\,],

\[
\left\| A_{IJ} - A_{IS}\begin{bmatrix} T & I \end{bmatrix} \right\| \le \epsilon \left\| A_{IJ} \right\|.
\]

In other words, the redundant columns are approximated as a linear combination of the skeleton columns to within the prescribed relative accuracy, leading to a low-rank factorization of A_{IJ}.

Note that, while the ID error bound can be attained trivially by taking S = J, it is desirable to keep |S| as small as possible. The typical algorithm to compute an ID uses a strong rank-revealing QR factorization as detailed by Gu & Eisenstat [33], though in practice a standard greedy column-pivoted QR tends to be sufficient. In either case, the computational complexity is O(|I| |J|^2). In the case where the entries of A_{IJ} are given in terms of K(x_i − x_j) for x_i ∈ I and x_j ∈ J (with additional factors depending only on x_i and x_j alone), the ID of A_{IJ} has the nice property that it preserves this data-sparse representation. In particular, by forcing the interpolation basis to be a subset of columns S ⊂ J, we maintain the property that A_{IS} is given in terms of kernel interactions.

In what follows, we will occasionally use the notation

\[
[S, R, T_{J}] = \mathrm{id}(A, J, \epsilon) \tag{2.3}
\]

to denote a function that returns the relevant pieces of an ε-accurate ID of a matrix A. A pivoted-QR sketch of such a function follows.
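Below is a sketch of id(A, J, ε) using greedy column-pivoted QR, the practical choice mentioned above (a strong rank-revealing QR would replace it for robustness). Here J is taken to be all columns of A, and the diagonal of R is used as a cheap proxy for the norm bound of Definition 2.1.

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def interp_decomp(A, eps):
        Q, R, perm = qr(A, pivoting=True, mode='economic')
        diag = np.abs(np.diag(R))
        k = max(1, int(np.searchsorted(-diag, -eps * diag[0])))  # numerical rank
        S, Rd = perm[:k], perm[k:]                 # skeleton / redundant columns
        T = solve_triangular(R[:k, :k], R[:k, k:])  # so A[:, Rd] ~ A[:, S] @ T
        return S, Rd, T

    # Example on a numerically rank-3 matrix.
    M = np.random.rand(50, 3) @ np.random.rand(3, 40)
    S, Rd, T = interp_decomp(M, 1e-10)
    print(len(S), np.linalg.norm(M[:, Rd] - M[:, S] @ T))  # rank ~3, tiny error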

Chapter 3

Review of some skeletonization-based factorizations

With the previous definitions in tow, we now review the key compression technique used in this thesis: skeletonization. In particular, what we describe here was introduced by Ho & Ying [43] to cast the (additive) skeletonization process of Martinsson & Rokhlin [53] as a multiplicative factorization.

3.1 Skeletonization

Consider a matrix A ∈ C^{N×N} indexed by points in our domain, and let B be a set of DOFs corresponding to a leaf box at the finest level of the spatial decomposition. With B^c as the relative complement of B in [N], we have [N] = B ∪ B^c by definition, so under an appropriate permutation P we can write A in block form as

\[
PAP^* = \begin{bmatrix} A_{BB} & A_{BB^c} \\ A_{B^cB} & A_{B^cB^c} \end{bmatrix}. \tag{3.1}
\]

We proceed by assuming that A_{BB^c} and A_{B^cB} are numerically low-rank to some specified tolerance ε. We partition B into its redundant and skeleton DOFs B = R ∪ S according to the ID

\[
\begin{bmatrix} A_{B^cB} \\ A_{BB^c}^* \end{bmatrix}
= \begin{bmatrix} A_{B^cR} & A_{B^cS} \\ A_{RB^c}^* & A_{SB^c}^* \end{bmatrix}
\approx \begin{bmatrix} A_{B^cS} \\ A_{SB^c}^* \end{bmatrix}
\begin{bmatrix} T & I \end{bmatrix},
\]

where we have assumed that the redundant DOFs R are ordered first within B for pedagogical purposes such that no further permutation is necessary. Inserting this ID into (3.1) by explicitly partitioning B into R ∪ S, we obtain

\[
PAP^* \approx \begin{bmatrix} A_{RR} & A_{RS} & T^*A_{SB^c} \\ A_{SR} & A_{SS} & A_{SB^c} \\ A_{B^cS}T & A_{B^cS} & A_{B^cB^c} \end{bmatrix}.
\]

Defining the elimination matrices

\[
U_T = \begin{bmatrix} I & -T^* & \\ & I & \\ & & I \end{bmatrix}, \qquad
L_T = \begin{bmatrix} I & & \\ -T & I & \\ & & I \end{bmatrix} \tag{3.2}
\]

and multiplying on the left and right as appropriate, we obtain

\[
U_T \begin{bmatrix} A_{RR} & A_{RS} & T^*A_{SB^c} \\ A_{SR} & A_{SS} & A_{SB^c} \\ A_{B^cS}T & A_{B^cS} & A_{B^cB^c} \end{bmatrix} L_T
= \begin{bmatrix} X_{RR} & X_{RS} & \\ X_{SR} & A_{SS} & A_{SB^c} \\ & A_{B^cS} & A_{B^cB^c} \end{bmatrix},
\]

where

\[
\begin{aligned}
X_{RS} &= A_{RS} - T^*A_{SS}, \\
X_{SR} &= A_{SR} - A_{SS}T, \\
X_{RR} &= A_{RR} - T^*A_{SR} - A_{RS}T + T^*A_{SS}T
\end{aligned}
\]

are modified nonzero blocks arising due to the block elimination.

Following section 2.2 with I = R, J = S, and K = B^c, we use X_{RR} as a pivot block to eliminate the other blocks in the first row and column to obtain

\[
L\,U_T\,PAP^*\,L_T\,U \approx \begin{bmatrix} X_{RR} & & \\ & X_{SS} & A_{SB^c} \\ & A_{B^cS} & A_{B^cB^c} \end{bmatrix} \equiv \tilde{Z}(A; B), \tag{3.3}
\]

with appropriate definition of L and U as in (2.1), so that X_{SS} = A_{SS} − X_{SR}X_{RR}^{-1}X_{RS} is the corresponding Schur complement. At this point, the redundant DOFs are completely decoupled from the rest of the problem, and the last block row and block column have not been otherwise modified.

We refer to the process of forming \tilde{Z}(A; B) as skeletonization of A with respect to the DOFs B. Notationally, we define the left and right skeletonization operators \tilde{V} and \tilde{W} as

\[
\tilde{V} = P^*\,U_T^{-1}\,L^{-1}, \qquad \tilde{W} = U^{-1}\,L_T^{-1}\,P. \tag{3.4}
\]

Using this shorthand, we have

\[
\tilde{Z}(A; B) \approx \tilde{V}^{-1} A\, \tilde{W}^{-1}. \tag{3.5}
\]

Clearly the matrices \tilde{V} and \tilde{W} are highly structured, since they are each the product of block unit-triangular matrices with one non-trivial block each. This means that working with them in block form explicitly is very efficient, both for computation and storage. In particular, we recall that the block unit-triangular matrices may be inverted by toggling the sign of the nonzero off-diagonal block, giving (for example)

\[
U_T^{-1} = \begin{bmatrix} I & T^* & \\ & I & \\ & & I \end{bmatrix}, \qquad
L_T^{-1} = \begin{bmatrix} I & & \\ T & I & \\ & & I \end{bmatrix} \tag{3.6}
\]

in the case of (3.2).

To make it explicit that only a small amount of information is needed to represent skeletonization of a matrix A with respect to a DOF set, we will occasionally write the process in functional form as

\[
[S, R, X_{SS}, X_{RR}, \tilde{V}, \tilde{W}] = \mathrm{skel}(A, B, \epsilon), \tag{3.7}
\]

where \tilde{V} and \tilde{W} are understood to be stored as a product of operators in block form that can be applied and inverted cheaply. Clearly one can construct \tilde{Z}(A; B) implicitly from the information returned by skel(A, B, ε). A dense-matrix sketch of this routine, composed from the pieces above, follows.

3.1.1 Group skeletonization

Notationally, it will be useful to extend the notion of skeletonization to multiple disjoint index sets (see also Ho & Ying [43]). Consider two index sets B_i and B_j with B_i ∩ B_j = ∅, e.g., two DOF sets corresponding to distinct subdomains at the same level of the spatial hierarchy. Performing the skeletonizations of A with respect to B_i and B_j independently, we obtain

\[
[S_i, R_i, X_{S_iS_i}, X_{R_iR_i}, \tilde{V}_i, \tilde{W}_i] = \mathrm{skel}(A, B_i, \epsilon),
\]
\[
[S_j, R_j, X_{S_jS_j}, X_{R_jR_j}, \tilde{V}_j, \tilde{W}_j] = \mathrm{skel}(A, B_j, \epsilon),
\]

where the fact that B_i and B_j are disjoint means that \tilde{V}_i and \tilde{V}_j commute (and similarly for \tilde{W}_i and \tilde{W}_j). It is evident in (3.3) that skeletonization does not affect blocks of A indexed by B_i^c except to introduce zeros in place of the A_{R_iB_i^c} and A_{B_i^cR_i} blocks. Thus, we find that, with K = ([N] \ B_i) \ B_j as the "rest of the world",

\[
\tilde{V}_j^{-1}\tilde{V}_i^{-1} A\, \tilde{W}_i^{-1}\tilde{W}_j^{-1}
= \tilde{V}_i^{-1}\tilde{V}_j^{-1} A\, \tilde{W}_j^{-1}\tilde{W}_i^{-1}
\approx \begin{bmatrix}
X_{R_iR_i} & & & & \\
& X_{S_iS_i} & & A_{S_iS_j} & A_{S_iK} \\
& & X_{R_jR_j} & & \\
& A_{S_jS_i} & & X_{S_jS_j} & A_{S_jK} \\
& A_{KS_i} & & A_{KS_j} & A_{KK}
\end{bmatrix}
\equiv \tilde{Z}(A; \{B_i, B_j\}).
\]

The subtlety of this is that we have performed the skeletonizations in parallel as opposed to first computing \tilde{Z}(A; B_i) and then computing \tilde{Z}(\tilde{Z}(A; B_i); B_j). This can lead to slightly different results in the computation of the ID to determine S_j and R_j, but this is not due to the introduction of additional error; it is simply different error.

More generally, given a pairwise-disjoint collection of index sets C = {B_1, ..., B_m} with each B_i ⊂ [N], we let

\[
[S_i, R_i, X_{S_iS_i}, X_{R_iR_i}, \tilde{V}_i, \tilde{W}_i] = \mathrm{skel}(A, B_i, \epsilon)
\]

for each i = 1, ..., m and define the simultaneous group skeletonization of A with respect to C as B = \tilde{Z}(A; C) such that:

• For each i = 1, ..., m, B_{R_iR_i} = X_{R_iR_i} is the only nonzero block in its block row and block column.

• For each i = 1, ..., m, B_{S_iS_i} = X_{S_iS_i}.

• All other blocks of B are unmodified, such that, e.g., B_{S_iS_j} = A_{S_iS_j} for i ≠ j.

We write

\[
\tilde{Z}(A; C) = \tilde{V}_C^{-1} A\, \tilde{W}_C^{-1}
= \left( \prod_{B_i \in C} \tilde{V}_i^{-1} \right) A \left( \prod_{B_i \in C} \tilde{W}_i^{-1} \right), \tag{3.8}
\]

where the matrices \tilde{V}_C and \tilde{W}_C are again not stored explicitly but are useful notationally. Similarly to before, we may define the functional form

\[
[\{S_i, R_i, X_{S_iS_i}, X_{R_iR_i}\}_{B_i \in C}, \tilde{V}_C, \tilde{W}_C] = \mathrm{skel\_group}(A, C, \epsilon).
\]

3.2 The recursive skeletonization factorization

We now combine the idea of skeletonization with the hierarchical tree decomposition of the domain to construct the recursive skeletonization factorization [43, 54]. We will suppose that the ID tolerance ε and tree occupancy parameter n_occ are specified, and that we have a tree decomposition of the domain Ω with levels ℓ = 0, ..., L. We assume we are operating in the context of a Nyström discretization of (1.1) for exposition, though the factorization is more general.

3.2.1 Level L

We begin at level ℓ = L. For each subdomain (or box) at level L, we define the active¹ DOFs B_i = I_i and identify a box with its active DOFs. In this way, we may think of 𝓛_L in section 2.1 as a collection of active DOF sets, one corresponding to each box at level L, such that

\[
\mathscr{L}_L = \{ B_i \mid \Omega_i \text{ is a subdomain at level } L \text{ of the tree} \}. \tag{3.9}
\]

Group skeletonization of K with respect to 𝓛_L gives (via a rearrangement of (3.8))

\[
K \approx \tilde{V}_{\mathscr{L}_L}\, \tilde{Z}(K; \mathscr{L}_L)\, \tilde{W}_{\mathscr{L}_L}
= \left( \prod_{B_i \in \mathscr{L}_L} \tilde{V}_i \right) \tilde{Z}(K; \mathscr{L}_L) \left( \prod_{B_i \in \mathscr{L}_L} \tilde{W}_i \right), \tag{3.10}
\]

where in \tilde{Z}(K; 𝓛_L) we have identified and decoupled all redundant DOFs at level ℓ = L while leaving unchanged the off-diagonal blocks of K corresponding to kernel interactions between skeleton DOFs in distinct leaf-level subdomains, as described in subsection 3.1.1. We will define the set of all thus-far decoupled DOFs as

\[
\mathsf{R}_{\mathscr{L}_L} = \bigcup_{B_i \in \mathscr{L}_L} R_i, \tag{3.11}
\]

as each R_i now indexes only a decoupled diagonal block in the new matrix \tilde{Z}(K; 𝓛_L) and will not play a role in further levels.

We note a technical point here in successive levels of skeletonization. In section 3.1, when skeletonizing with respect to B we considered compressing the block K_{BB^c}, where B^c was the relative complement of B in [N]. Due to the fact that we have now completely decoupled the DOFs R_{𝓛_L}, we use B^c at level ℓ = L − 1 to refer to a different relative complement: the relative complement of B in [N] \ R_{𝓛_L}. In other words, any DOFs at level ℓ = L that have been decoupled already no longer need to be considered in future skeletonizations. We continue to use B^c in this manner throughout the thesis.

¹The notion of active DOFs will be made clear soon.

3.2.2 Level $L - 1$

At the next level of the tree, corresponding to $\ell = L-1$, we aim to repeat the same group skeletonization process that we performed for level $\ell = L$. We must, however, take into account the DOFs that have already been decoupled. With this in mind, we define the active DOFs at this level as $\mathcal{B}_i \equiv \mathcal{I}_i\setminus\mathcal{R}_{\mathcal{L}_L}$ for each box in the quadtree at this level, and once again identify boxes with active DOF sets such that we may consider

$$\mathcal{L}_{L-1} \equiv \{\mathcal{B}_i \mid i \text{ is a subdomain at level } L-1 \text{ of the tree}\}.$$

Note that, while at level $L$ there is no distinction between DOFs and active DOFs for a box, at higher levels the distinction is that the active DOFs do not include DOFs marked redundant at previous levels.

Due to the structure of $\tilde Z(K;\mathcal{L}_L)$ that we identified in subsection 3.1.1, we see that, for any two distinct sets of active DOFs in $\mathcal{L}_{L-1}$, the corresponding off-diagonal block of $\tilde Z(K;\mathcal{L}_L)$ is unmodified from what it was in $K$. This is due to the nesting property of DOF sets: off-diagonal blocks corresponding to distinct sets of active DOFs in $\mathcal{L}_L$ were unmodified, and sets of active DOFs in $\mathcal{L}_{L-1}$ are given by taking the union of skeleton DOFs from the previous level. Mathematically, we have

$$\mathcal{B}_i \equiv \mathcal{I}_i\setminus\mathcal{R}_{\mathcal{L}_L} = \bigcup_{j\in\mathrm{child}(i)}S_j$$

for a subdomain $i$ at level $L-1$.

Thus, we may skeletonize $\tilde Z(K;\mathcal{L}_L)$ with respect to $\mathcal{L}_{L-1}$ to obtain

$$\tilde Z(K;\mathcal{L}_L,\mathcal{L}_{L-1}) \equiv \tilde Z\bigl(\tilde Z(K;\mathcal{L}_L);\mathcal{L}_{L-1}\bigr) \equiv \tilde V_{\mathcal{L}_{L-1}}^{-1}\,\tilde Z(K;\mathcal{L}_L)\,\tilde W_{\mathcal{L}_{L-1}}^{-1} = \Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_{L-1}}\tilde V_i^{-1}\Bigr)\tilde Z(K;\mathcal{L}_L)\Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_{L-1}}\tilde W_i^{-1}\Bigr),$$

which we rearrange as

$$\tilde Z(K;\mathcal{L}_L) \approx \tilde V_{\mathcal{L}_{L-1}}\,\tilde Z(K;\mathcal{L}_L,\mathcal{L}_{L-1})\,\tilde W_{\mathcal{L}_{L-1}}$$

analogously to (3.10). We define

$$\mathcal{R}_{\mathcal{L}_{L-1}} \equiv \mathcal{R}_{\mathcal{L}_L}\cup\Bigl(\bigcup_{\mathcal{B}_i\in\mathcal{L}_{L-1}}R_i\Bigr) \qquad (3.12)$$

to be the set of all thus-far decoupled DOFs.

3.2.3 Higher levels

For higher levels up to and including $\ell = 1$, we define $\mathcal{B}_i \equiv \mathcal{I}_i\setminus\mathcal{R}_{\mathcal{L}_{\ell+1}}$ and continue to identify active DOF sets with their corresponding boxes as

$$\mathcal{L}_\ell \equiv \{\mathcal{B}_i \mid i \text{ is a subdomain at level } \ell \text{ of our tree}\}.$$

Skeletonizing $\tilde Z(K;\mathcal{L}_L,\ldots,\mathcal{L}_{\ell+1})$ with respect to $\mathcal{L}_\ell$ decouples new redundant DOFs at level $\ell$, which we use to define $\mathcal{R}_{\mathcal{L}_\ell}$ analogously to (3.12). It is informative to visualize the active DOFs at each level as in Figure 3.1 to get a sense of the algorithm's progression.

After finishing this process for level $\ell = 1$, we obtain

$$K \approx \tilde V_{\mathcal{L}_L}\tilde V_{\mathcal{L}_{L-1}}\cdots\tilde V_{\mathcal{L}_2}\tilde V_{\mathcal{L}_1}\,P_tDP_t^*\,\tilde W_{\mathcal{L}_1}\tilde W_{\mathcal{L}_2}\cdots\tilde W_{\mathcal{L}_{L-1}}\tilde W_{\mathcal{L}_L},$$

where $P_t$ is a permutation for the top level $\ell = 0$ such that all skeleton DOFs from level $\ell = 1$ are contiguous.

[Figure 3.1: Active DOFs before skeletonizing each level $\ell = 3, 2, 1, 0$ of RS on a quasi-1D problem (top) and a true 2D problem (bottom). The DOFs cluster near the edges of the boxes of the quadtree at each level.]

The middle matrix $D$ is given by

$$D \equiv P_t^*\,\tilde Z(K;\mathcal{L}_L,\mathcal{L}_{L-1},\ldots,\mathcal{L}_1)\,P_t \qquad (3.13)$$

and thus has block-diagonal structure, since each set of redundant DOFs $R_i$ is decoupled from all other DOFs. Letting $F_t$ be some factorization of $D$ (e.g., a Cholesky, LU, or $\mathrm{LDL}^*$ factorization, as appropriate), we may finally define the full recursive skeletonization factorization (RS) as

$$K \approx \Bigl(\prod_{\ell=L}^{1}\tilde V_{\mathcal{L}_\ell}\Bigr)P_tF_tP_t^*\Bigl(\prod_{\ell=1}^{L}\tilde W_{\mathcal{L}_\ell}\Bigr) \equiv F, \qquad (3.14)$$

where the first product is taken with $\ell$ decreasing from left to right and the second with $\ell$ increasing, matching the expanded form above. We give this in algorithmic form in Algorithm 1.
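To make concrete how (3.14) is used, here is a Python sketch of applying $F^{-1}$ to a vector. The operator interface is an assumed, illustrative one rather than the dissertation's actual data structures: lists Vs and Ws of level operators ordered from level $L$ down to 1, each with a cheap .solve method exploiting the unit-triangular structure via (3.6), a permutation vector perm realizing $P_t$, and a block-diagonal solver Ft_solve.

    import numpy as np

    def rs_solve(Vs, Ws, perm, Ft_solve, b):
        # x = W_L^{-1} ... W_1^{-1} Pt Ft^{-1} Pt^* V_1^{-1} ... V_L^{-1} b,
        # applying factors right-to-left.
        y = b.copy()
        for V in Vs:                 # V_L^{-1} acts first, then up to V_1^{-1}
            y = V.solve(y)
        y = Ft_solve(y[perm])        # gather realizes Pt^*, then block-diagonal solve
        z = np.empty_like(y)
        z[perm] = y                  # scatter realizes Pt
        for W in reversed(Ws):       # W_1^{-1} acts first, then up to W_L^{-1}
            z = W.solve(z)
        return z

Because every factor is block unit-triangular or block-diagonal, each pass costs no more than the storage of the factorization, which is what yields the apply/solve complexities quoted below.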

3.2.4 The use of a proxy surface

As we saw in the previous section, RS can be effectively summarized as repeatedly identifying redundant DOFs using an ID of off-diagonal blocks and then decoupling those DOFs using block row and column operations. What makes this efficient in practice is the use of what has come to be known as the proxy trick, a prominent computational acceleration in the literature [16, 17, 27, 29, 41, 43, 53, 61, 84].


Algorithm 1 The recursive skeletonization factorization (RS)

 1: // Initialize
 2: $A := K$
 3: for $\ell := L$ down to 1 do
 4:   for each box $i \in \mathcal{L}_\ell$ do
 5:     // Identify relevant DOFs for skeletonization
 6:     $[\mathcal{B}_i, \mathcal{B}_i^c]$ := {active DOFs in box and rest of world}
 7:   end for
 8:   // Perform group skeletonization with respect to DOFs
 9:   $A := \tilde Z(A;\mathcal{L}_\ell) \equiv \tilde V_{\mathcal{L}_\ell}^{-1}A\tilde W_{\mathcal{L}_\ell}^{-1} = \bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde V_i^{-1}\bigr)A\bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde W_i^{-1}\bigr)$
10: end for
11: // Store middle block diagonal matrix and permutation
12: $D := P_t^* A P_t$
13: $F_t$ := {some factorization of $D$}
14: Output: $F$ as in (3.14)


The proxy trick takes advantage of the fact that the matrix $K$ is typically related to an underlying elliptic PDE, and thus the kernel function $K(z)$ typically satisfies some form of a Green's identity wherein the values of the kernel inside a domain can be recovered from those on the boundary of that domain. We will not justify this rigorously here, but defer to Ho & Ying [43] for a thorough description (see also subsection 7.2.1). Instead, we focus on the computational consequences.

Suppose that $B$ is some subdomain in our quadtree with DOFs $\mathcal{J}$ and that we wish to compute an interpolative decomposition of $K_{\mathcal{J}^c\mathcal{J}}$. In principle, the cost of this is $O(|\mathcal{J}^c|\,|\mathcal{J}|^2)$, where $|\mathcal{J}^c|$ can be as big as $O(N)$ at early levels. Using the proxy trick, we draw a circle (sphere in 3D) around $B$ as seen in Figure 3.2a. This proxy surface, $\Gamma$, partitions the DOF set $\mathcal{J}^c$ into two sets: DOFs $O$ associated with subdomains intersected by $\Gamma$, and DOFs $P$ associated with subdomains fully outside $\Gamma$ relative to $B$. Discretizing the proxy surface with $n_{\mathrm{prox}}$ points $\{y_i\}_{i=1}^{n_{\mathrm{prox}}}$ and defining the matrix $G_{\mathcal{J}}$ with entries $[G_{\mathcal{J}}]_{ij} = K(y_i - x_j)$ for $i = 1,\ldots,n_{\mathrm{prox}}$ and $x_j \in \mathcal{J}$, we


begin by computing an ID

$$\begin{bmatrix}K_{O\mathcal{J}}\\ G_{\mathcal{J}}\end{bmatrix} = \begin{bmatrix}K_{OR} & K_{OS}\\ G_R & G_S\end{bmatrix} \approx \begin{bmatrix}K_{OS}\\ G_S\end{bmatrix}\begin{bmatrix}T & I\end{bmatrix}. \qquad (3.15)$$

Using the same partitioning $\mathcal{J} = R \cup S$ and interpolation matrix $T$, the beauty of the proxy trick is that we get an ID of the full off-diagonal block for free,

$$K_{\mathcal{J}^c\mathcal{J}} = \begin{bmatrix}K_{\mathcal{J}^cR} & K_{\mathcal{J}^cS}\end{bmatrix} \approx K_{\mathcal{J}^cS}\begin{bmatrix}T & I\end{bmatrix}.$$

The use of the proxy surface is discussed in more detail in subsection 7.2.1 for an analogous trick in the case of strong admissibility. The key aspects, at this point, are the following (see the sketch after this list):

• The cost of the ID has been reduced from $O(|\mathcal{J}^c|\,|\mathcal{J}|^2)$ to $O(|\mathcal{J}|^3 + n_{\mathrm{prox}}|\mathcal{J}|^2)$, which is substantially smaller when $|P|$ is large.

• The ID in (3.15) does not depend on $K_{P\mathcal{J}}$, only on the block $K_{O\mathcal{J}}$ corresponding to interactions between $B$ and DOFs belonging to boxes in $\mathrm{nbor}(B)$.
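A Python sketch of the proxy-accelerated ID follows. The kernel callback, the choice of a circle of twice the box radius, and the default n_prox are illustrative assumptions, with SciPy's randomized ID again standing in for whatever ID routine is used in practice.

    import numpy as np
    import scipy.linalg.interpolative as sli

    def proxy_id(kernel, x, J, O, eps, n_prox=64):
        # Compress [K_OJ; G_J] as in (3.15) instead of the full K_{J^c, J}.
        # kernel(Y, X) returns the matrix of kernel interactions between
        # point sets Y and X; x is the N-by-2 array of all points.
        J, O = np.asarray(J), np.asarray(O)
        center = x[J].mean(axis=0)
        radius = 2.0 * np.linalg.norm(x[J] - center, axis=1).max()
        th = 2.0 * np.pi * np.arange(n_prox) / n_prox
        y = center + radius * np.stack([np.cos(th), np.sin(th)], axis=1)
        A = np.vstack([kernel(x[O], x[J]), kernel(y, x[J])])   # [K_OJ; G_J]
        k, idx, proj = sli.interp_decomp(A, eps)
        S, R, T = J[idx[:k]], J[idx[k:]], proj
        return S, R, T          # then K_{J^c R} ~= K_{J^c S} @ T for free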

3.2.5 Complexity of RS using the proxy trick

Using the proxy trick, the cost of RS is essentially determined by the number of skeleton DOFs for each subdomain in the tree, i.e., $|S_i|$ for each $i$. Letting $k_\ell$ be an upper bound on $|S_i|$ for all boxes on level $\ell$, we find that the growth of $k_\ell$ with the number of points $N$ and dimensionality $d$ depends on the distribution of points. As discussed by Ho & Ying [43, subsection 3.4], the skeleton DOFs tend to line the boundaries of their corresponding subdomains, a phenomenon we observe in Figure 3.1. Assuming the points $\{x_i\}_{i=1}^N \subset \mathbb{R}^d$ lie on a $\delta$-dimensional manifold ($\delta \le d$), they give the result

$$k_\ell = \begin{cases} O(L-\ell), & \delta = 1,\\ O\bigl(2^{(\delta-1)(L-\ell)}\bigr), & \delta \ge 2,\end{cases} \qquad (3.16)$$

which can be justified by standard multipole estimates. In particular, we note that in 2D $k_\ell$ tends to double with each level we move up the tree. This matches the observation


[Figure 3.2: (a) Using the dotted black circle as a proxy surface when skeletonizing DOFs in the dark gray box $B$, only interactions between that box and the light gray boxes adjacent to it need be considered; all interactions between the dark gray box and the white boxes are represented by equivalent interactions using the proxy surface. (b) If the black grid corresponds to edges of boxes at level $\ell$, then the assignment of DOFs to edges is given using the Voronoi tessellation about the edge centers (gray rotated grid).]

that skeletons line the boundary of a subdomain, since the side length of a subdomain doubles with each level we go up the tree.

The full complexity estimate assuming this growth of $k_\ell$ is given in Theorem 3.1. We remark that these complexity estimates do not include the cost of constructing an initial tree decomposition with a specified occupancy parameter $n_{\mathrm{occ}}$, which technically has complexity $O(N\log N)$ but is negligible in practice.

Theorem 3.1 ([41, 43, 53]). Assume that (3.16) holds and that we use the proxy trick for accelerated compression. Then the computational complexity of constructing the recursive skeletonization factorization $F$ in (3.14) is

$$T_{\mathrm{factor}} = \begin{cases} O(N), & \delta = 1,\\ O\bigl(N^{3(1-1/\delta)}\bigr), & \delta \ge 2,\end{cases}$$

while the cost of applying or solving systems with $F$ by exploiting block structure and (3.6) is

$$T_{\mathrm{apply}} = T_{\mathrm{solve}} = \begin{cases} O(N), & \delta = 1,\\ O(N\log N), & \delta = 2,\\ O\bigl(N^{2(1-1/\delta)}\bigr), & \delta \ge 3,\end{cases}$$

with constants in all cases that depend on the ID tolerance $\epsilon$. The storage complexity is the same as the apply cost.
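For intuition, a back-of-the-envelope version of the factor cost for the uniform 2D case $\delta = d = 2$ (a rough count under the stated assumptions, not a substitute for the cited proofs) goes as follows. With $N \sim 4^L$ points, level $\ell$ has $4^\ell$ boxes, and the dense ID and elimination for a box cost $O(k_\ell^3)$ with $k_\ell \sim 2^{L-\ell}$ by (3.16), so

$$T_{\mathrm{factor}} \sim \sum_{\ell=1}^{L} 4^\ell k_\ell^3 \sim \sum_{\ell=1}^{L} 4^\ell\,2^{3(L-\ell)} = 8^L\sum_{\ell=1}^{L}2^{-\ell} = O\bigl(8^L\bigr) = O\bigl(N^{3/2}\bigr),$$

matching $O(N^{3(1-1/\delta)})$ with $\delta = 2$. The top levels dominate the sum, which is precisely the rank growth that HIF, described next, is designed to combat.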

3.3 The hierarchical interpolative factorization

While RS has asymptotically optimal complexity for quasi-1D problems, in higher dimensions the growth of $k_\ell$ in (3.16) leads to suboptimal complexity. The hierarchical interpolative factorization (HIF) [42, 43] was developed on top of RS to remedy this rank growth. In what follows we describe HIF for 2D problems, though a more complete treatment is given by Ho & Ying [43].

At level $\ell = L$ we begin just as in RS, defining $\mathcal{L}_L$ as in (3.9) and skeletonizing $K$ with respect to $\mathcal{L}_L$ to obtain

$$K \approx \tilde V_{\mathcal{L}_L}\,\tilde Z(K;\mathcal{L}_L)\,\tilde W_{\mathcal{L}_L} = \Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_L}\tilde V_i\Bigr)\tilde Z(K;\mathcal{L}_L)\Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_L}\tilde W_i\Bigr),$$

identically to (3.10). The set of thus-far decoupled DOFs is again $\mathcal{R}_{\mathcal{L}_L}$ as in (3.11).

3.3.1 Level $L - 1/2$

It is at this point that things differ from RS. Rather than proceeding directly to level $\ell = L-1$, HIF first introduces an extra level of skeletonization with the goal of decoupling additional DOFs. We denote this level with a half-integer, such that after $\ell = L$ we move to $\ell = L - 1/2$.

The key to HIF is using additional geometry outside of the tree decomposition of space, moving from skeletonization based on boxes to skeletonization based on edges.


Each box in the quadtree decomposition at level $L$ has four edges, though we note that adjacent boxes share an edge. Performing a Voronoi tessellation of the domain with the Voronoi cells centered on each of these edges defines a new decomposition of space, as we see in Figure 3.2b. In general, the Voronoi cells are square subdomains with the diagonals of the squares given by edges of the quadtree boxes.

Each Voronoi cell $V_i$ geometrically contains active DOFs indexed by

$$E_i \equiv \bigl([N]\setminus\mathcal{R}_{\mathcal{L}_L}\bigr)\cap V_i.$$

In particular, the DOFs $E_i$ come from the two boxes sharing the corresponding edge on which $V_i$ is centered. We note in passing that, in the case of a non-uniform quadtree, it is possible that these boxes are of different sizes and only one is on level $L$, so $E_i$ may contain DOFs that are not part of $S_j$ or $R_j$ for any $\mathcal{B}_j\in\mathcal{L}_L$.
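In code, the assignment of active DOFs to edge-centered Voronoi cells reduces to a nearest-edge-center query. The following Python sketch (with an assumed array edge_centers of edge midpoints at the current level; the helper name is illustrative) conveys the idea behind Figure 3.2b.

    import numpy as np

    def edge_groups(x, active, edge_centers):
        # Assign each active DOF to the Voronoi cell of its nearest edge
        # center; E_i is then the set of active DOFs owned by edge i.
        active = np.asarray(active)
        d2 = ((x[active, None, :] - edge_centers[None, :, :]) ** 2).sum(axis=-1)
        owner = d2.argmin(axis=1)                     # index of nearest edge center
        return {i: active[owner == i] for i in range(len(edge_centers))}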

Defining the half-integer level $L - 1/2$ as

$$\mathcal{L}_{L-1/2} \equiv \{E_i \mid V_i \text{ is an edge subdomain at level } L-1/2\}, \qquad (3.17)$$

we skeletonize $\tilde Z(K;\mathcal{L}_L)$ with respect to $\mathcal{L}_{L-1/2}$ to obtain

$$\tilde Z(K;\mathcal{L}_L) \approx \tilde V_{\mathcal{L}_{L-1/2}}\,\tilde Z\bigl(K;\mathcal{L}_L,\mathcal{L}_{L-1/2}\bigr)\,\tilde W_{\mathcal{L}_{L-1/2}} = \Bigl(\prod_{E_i\in\mathcal{L}_{L-1/2}}\tilde V_i\Bigr)\tilde Z\bigl(K;\mathcal{L}_L,\mathcal{L}_{L-1/2}\bigr)\Bigl(\prod_{E_i\in\mathcal{L}_{L-1/2}}\tilde W_i\Bigr).$$

We define the full set of redundant DOFs so far, $\mathcal{R}_{\mathcal{L}_{L-1/2}}$, analogously to (3.12).

3.3.2 Higher levels

HIF continues for higher levels by alternating between integer box levels and half-integer edge levels of skeletonization, stopping at level $1/2$, where we have the approximation

$$K \approx \tilde V_{\mathcal{L}_L}\tilde V_{\mathcal{L}_{L-1/2}}\tilde V_{\mathcal{L}_{L-1}}\cdots\tilde V_{\mathcal{L}_{3/2}}\tilde V_{\mathcal{L}_1}\tilde V_{\mathcal{L}_{1/2}}\,P_tDP_t^*\,\tilde W_{\mathcal{L}_{1/2}}\tilde W_{\mathcal{L}_1}\tilde W_{\mathcal{L}_{3/2}}\cdots\tilde W_{\mathcal{L}_{L-1}}\tilde W_{\mathcal{L}_{L-1/2}}\tilde W_{\mathcal{L}_L}.$$


Here, $P_t$ is a permutation for the top level $\ell = 0$ such that all skeleton DOFs from level $\ell = 1/2$ are contiguous. The middle matrix $D$ is given by

$$D \equiv P_t^*\,\tilde Z\bigl(K;\mathcal{L}_L,\mathcal{L}_{L-1/2},\mathcal{L}_{L-1},\ldots,\mathcal{L}_{3/2},\mathcal{L}_1,\mathcal{L}_{1/2}\bigr)\,P_t$$

and again has block-diagonal structure as in RS. Letting $F_t$ be a factorization of $D$ as before, we define the full hierarchical interpolative factorization as

$$K \approx \Bigl(\prod_{\ell=L}^{1}\tilde V_{\mathcal{L}_\ell}\tilde V_{\mathcal{L}_{\ell-1/2}}\Bigr)P_tF_tP_t^*\Bigl(\prod_{\ell=1}^{L}\tilde W_{\mathcal{L}_{\ell-1/2}}\tilde W_{\mathcal{L}_\ell}\Bigr) \equiv F, \qquad (3.18)$$

with products ordered as in (3.14).

    We give this in algorithmic form in Algorithm 2. As with RS, we plot the active

    DOFs at each level for visualization purposes in Figure 3.3.

    There are a few extra points to Algorithm 2 that we do not address here in detail.

    For example, whereas in RS all IDs can be shown to be applied to original blocks of the

    matrix K, in HIF these blocks will include rows and columns that have been modified

    by Schur complement updates from previous levels. Additionally, HIF in 3D has an

    extra level of complication on top of what we describe here, wherein we alternate not

    between boxes and edges but between boxes, faces, and edges, iteratively reducing

    the dimensionality of the DOF sets from 3D to quasi-2D to quasi-1D to quasi-0D. We

    direct the reader to Ho & Ying for a thorough discussion [43].

    3.3.3 Complexity of HIF using the proxy trick

Using the proxy trick as we did with RS, we again require a bound on the number of skeleton DOFs at each level. The additional levels of compression in HIF compared to RS are intended to keep this number small, which Figure 3.3 suggests is effective in practice. Letting $k_\ell$ be as in (3.16) and assuming

$$k_\ell = O(L-\ell), \qquad (3.19)$$

we have the following result from Ho & Ying [43].


[Figure 3.3: Active DOFs before skeletonizing each level $\ell = 3, 2.5, 2, 1.5, 1, 0.5, 0$ of HIF in 2D. The growth of $k_\ell$ observed in the bottom of Figure 3.1 appears to have been reduced dramatically.]

Algorithm 2 The hierarchical interpolative factorization (HIF)

 1: // Initialize
 2: $A := K$
 3: for $\ell := L$ down to 1 do
 4:   for each box $i \in \mathcal{L}_\ell$ do
 5:     // Identify relevant DOFs for skeletonization
 6:     $[\mathcal{B}_i, \mathcal{B}_i^c]$ := {active DOFs in box and rest of world}
 7:   end for
 8:   // Perform group skeletonization with respect to box DOFs
 9:   $A := \tilde Z(A;\mathcal{L}_\ell) \equiv \tilde V_{\mathcal{L}_\ell}^{-1}A\tilde W_{\mathcal{L}_\ell}^{-1} = \bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde V_i^{-1}\bigr)A\bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde W_i^{-1}\bigr)$
10:   for each edge $i \in \mathcal{L}_{\ell-1/2}$ do
11:     // Identify relevant DOFs for skeletonization
12:     $[E_i, E_i^c]$ := {active DOFs in edge and rest of world}
13:   end for
14:   // Perform group skeletonization with respect to edge DOFs
15:   $A := \tilde Z\bigl(A;\mathcal{L}_{\ell-1/2}\bigr) \equiv \tilde V_{\mathcal{L}_{\ell-1/2}}^{-1}A\tilde W_{\mathcal{L}_{\ell-1/2}}^{-1} = \bigl(\prod_{E_i\in\mathcal{L}_{\ell-1/2}}\tilde V_i^{-1}\bigr)A\bigl(\prod_{E_i\in\mathcal{L}_{\ell-1/2}}\tilde W_i^{-1}\bigr)$
16: end for
17: // Store middle block diagonal matrix and permutation
18: $D := P_t^* A P_t$
19: $F_t$ := {some factorization of $D$}
20: Output: $F$ as in (3.18)


Theorem 3.2 ([43]). Assume that (3.19) holds and that we use the proxy trick for accelerated compression. Then the computational complexity of constructing the hierarchical interpolative factorization $F$ in (3.18) is

$$T_{\mathrm{factor}} = O(N),$$

while the cost of applying or solving systems with $F$ by exploiting block structure and (3.6) is

$$T_{\mathrm{apply}} = T_{\mathrm{solve}} = O(N),$$

with constants in all cases that depend on the ID tolerance $\epsilon$. The storage complexity is the same as the apply cost.

Again, the complexity estimate of Theorem 3.2 does not include the (negligible) cost of constructing an initial tree decomposition with a specified occupancy parameter $n_{\mathrm{occ}}$, which has complexity $O(N\log N)$.


Chapter 4

Updating skeletonization-based factorizations

Having reviewed RS and HIF and how they are used to factor matrices $K$ as in (1.2), we now turn to the first major contribution of this thesis: efficient updating of the factorization $F$ of $K$ in response to a sequence of localized perturbations. By localized perturbation we mean that, given a matrix $K$ discretizing an original problem of the form (1.1) and a matrix $\check K$ discretizing a new perturbed problem of the same form, there is a small local subdomain $\Omega_{\mathrm{loc}}$ such that for all DOF sets $\mathcal{I}$ and $\mathcal{J}$ with $\mathcal{I}\cap\Omega_{\mathrm{loc}} = \emptyset$ and $\mathcal{J}\cap\Omega_{\mathrm{loc}} = \emptyset$, we have

$$\check K(\mathcal{I},\mathcal{J}) = K(\mathcal{I},\mathcal{J}), \qquad (4.1)$$

where we explicitly use the MATLAB® index notation for clarity. Put simply, blocks of the system matrix that correspond to DOFs away from the modifications are unchanged. Such local perturbations include (but are not limited to):

• Localized geometric perturbations (see Figure 4.1), wherein the domain of integration $\Omega$ is modified and therefore a subset of the discretization points of $\Omega$ may move, or discretization points may be added or removed.

• Localized coefficient perturbations, wherein the material parameters $a(x)$, $b(x)$, or $c(x)$ are modified in a local region.


[Figure 4.1: As an example of a localized perturbation to the geometry, we start with the quasi-1D domain $\Omega_1$, the square with rounded corners following the dashed curve. Then, for updating, we adjust the rounding parameter to obtain $\Omega_1 \to \Omega_2$, the square with the sharper, solid corners.]

By a sequence of localized perturbations, we mean that we are interested in applications where there are a number of localized perturbations

$$K \to K^{(1)} \to K^{(2)} \to \cdots \to K^{(i-1)} \to K^{(i)} \to \cdots, \qquad (4.2)$$

where each perturbation $K^{(i-1)} \to K^{(i)}$ is localized to some subdomain $\Omega_{\mathrm{loc}}^{(i)}$ that we allow to be different for each $i$. Such sequences of problems can arise, e.g., in design problems where the physical system described by the linear operator is a device that we want to design in an effort to optimize some objective function. We make the following observations:

• Localized perturbations lead to a global low-rank modification, in the sense that entire rows and columns of the new matrix $K^{(i)}$ are different from the corresponding rows and columns in $K^{(i-1)}$, if such a correspondence even exists.

• Because each perturbation can be localized to a different subdomain, for large $i$ the matrix $K^{(i)}$ is not necessarily given by a low-rank modification to $K$.

Because the perturbations we consider respect the same physical structure used in the construction of hierarchical factorizations (i.e., spatial locality), it is not unreasonable to believe it might be possible to take a hierarchical factorization of $K^{(i-1)}$ and update it to obtain a hierarchical factorization of $K^{(i)}$. This is what the method we describe in this work accomplishes, efficiently, for certain factorizations.


4.1 Approaches to updating

The idea of updating matrix factorizations to solve sequences of related systems is not a new one. For example, in the linear programming community it is common practice to maintain an LU factorization of a sparse matrix $A$ that permits the addition or deletion of rows/columns of $A$, or a general rank-one update [26]. Further, it is well known how to update the QR factorization of a matrix after any of those same operations [28].

The updating techniques described above, however, do not apply to fast hierarchical factorizations. Updating factorizations in the $\mathcal{H}$-matrix format in response to local modifications has been previously studied in the thesis of Djokic [21], wherein a process similar to that of this work is used to update the representation of the forward operator, which allows for a post-processing step to obtain the updated inverse in the same format. Updating of the skeletonization-based formats we consider here has not appeared thus far in the literature, and, as we show, these formats admit efficient one-pass updating.

In the case where the number of unknowns does not change and $\Omega_{\mathrm{loc}}^{(i)}$ is the same for all $i$, it is possible to order the unknowns in an LU decomposition such that those that will be modified are eliminated last as in [63], which can be used to update LU factorizations for integral equation design problems where only one small portion of the geometry is to be changed across all updates. Similarly, if the total number of unknowns modified between $K$ and $K^{(i)}$ is small and one is interested only in solving systems and not in updating factorizations, then for any factorization of the base system $K$ it is relatively efficient to keep track of the updates as a global rank-$k$ update $K^{(i)} = K + UCV$ with $U\in\mathbb{C}^{N\times k}$, $C\in\mathbb{C}^{k\times k}$, and $V\in\mathbb{C}^{k\times N}$ and use the Sherman-Morrison-Woodbury (SMW) formula,

$$(K+UCV)^{-1} = K^{-1} - K^{-1}U\bigl(C^{-1}+VK^{-1}U\bigr)^{-1}VK^{-1}, \qquad (4.3)$$

taking advantage of the initial factorization of $K$ as is done by Greengard et al. [29] for hierarchical factorizations.
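As a point of comparison for what follows, a minimal Python sketch of an SMW-based solve using (4.3) is below. Here solveK stands for a fast solver built from an existing factorization of the base matrix $K$ (e.g., RS or HIF), and the dense handling of the $k\times k$ capacitance matrix assumes $k \ll N$; the function name and interface are illustrative.

    import numpy as np

    def smw_solve(solveK, U, C, V, b):
        # (K + U C V)^{-1} b via (4.3), reusing solveK ~ K^{-1}.
        Kinv_b = solveK(b)
        Kinv_U = solveK(U)                    # N-by-k; amortizable across solves
        cap = np.linalg.inv(C) + V @ Kinv_U   # k-by-k capacitance matrix
        return Kinv_b - Kinv_U @ np.linalg.solve(cap, V @ Kinv_b)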


4.1.1 Our contribution

In this work we present a method to efficiently update skeletonization-based hierarchical factorizations in response to localized perturbations, i.e., to take a factorization corresponding to $K^{(i-1)}$ in (4.2) and obtain a factorization of $K^{(i)}$. We illustrate our approach using the language of skeletonization, in particular RS and HIF, though it is simple to generalize to any factorization using bottom-up hierarchical compression of off-diagonal subblocks.

There are a number of advantages to our approach over using the SMW formula to solve a system with $K^{(i)}$. In the case where the number of unknowns that have been modified between $K$ and $K^{(i)}$ is bounded by a small constant $m$ and the cost of solving a system with the existing factorization of $K$ is $O(N)$, the cost of a solve using (4.3) (dropping terms that don't depend on $N$) is $O(N + mN)$, where the second term can be amortized across multiple right-hand sides. However, if the number of total modified unknowns $m$ comprises any substantial fraction of $N$, then this is not a viable strategy.

In contrast, under certain assumptions on the attainable compression of off-diagonal blocks in the factorizations considered in this thesis, if the number of modified unknowns between two factorizations is bounded by $m$ then the asymptotic cost of our updating method is $O(m\log^p N)$ for some small $p$. Furthermore, one obtains a factorization of the new matrix and not just a method for solving systems. This factorization can of course be subsequently updated efficiently, but it is also useful for other purposes, such as computing determinants or applying or solving with a matrix square root.

We focus in this work on the 2D case $d = 2$.

    4.2 Updating algorithm

Given RS in Algorithm 1 and HIF in Algorithm 2, we consider updating existing instantiations of these factorizations in response to a localized modification to the problem. Concretely, we suppose that we have on hand a factorization corresponding to the initial problem with matrix $K$ and assume a new matrix $\check K$ is obtained by discretizing a locally-perturbed problem. For simplicity of exposition, we initially assume that the perturbation does not change the total number of points $N$ and does not necessitate a change in the structure of the hierarchical decomposition of space, i.e., the old quadtree is still valid for the new problem with the same occupancy bound $n_{\mathrm{occ}}$. Extension to the more general case is straightforward.

    4.2.1 Initial observation on updating a skeletonization

We will begin with a detailed discussion of updating when the base factorization is RS, and later describe the necessary modifications for HIF. It will be useful to write RS in a slightly different way than described in Algorithm 1 to make explicit exactly which blocks are modified as a consequence of each step of skeletonization. Thus, we rewrite the RS algorithm in modified form in Algorithm 3.

Algorithm 3 RS in modified form

 1: // Initialize
 2: $A := K$
 3: for $\ell := L$ down to 1 do
 4:   // Get blocks and operators
 5:   $[\{S_i, R_i, X_{S_iS_i}, X_{R_iR_i}\}_{\mathcal{B}_i\in\mathcal{L}_\ell}, \tilde V_{\mathcal{L}_\ell}, \tilde W_{\mathcal{L}_\ell}]$ := skel_group$(A, \mathcal{L}_\ell, \epsilon)$
 6:   // Assemble skeletonization
 7:   for each $\mathcal{B}\in\mathcal{L}_\ell$ with $\mathcal{B} = S\cup R$ do
 8:     $A(:,R) := 0$
 9:     $A(R,:) := 0$
10:     $A(S,S) := X_{SS}$
11:     $A(R,R) := X_{RR}$
12:   end for
13: end for
14: // Store middle block diagonal matrix and permutation
15: $D := P_t^* A P_t$
16: $F_t$ := {some factorization of $D$}
17: Output: $F$ as in (3.14)

Recall from subsection 3.2.4 that the use of a proxy surface gives a strong notion of locality to the skeletonization process, wherein the ID in (3.15) does not depend on $K_{P\mathcal{J}}$ but only on $K_{O\mathcal{J}}$, where $\mathcal{J}$ is the set of DOFs for a box $B$ and $O$ is the set of DOFs belonging to boxes in $\mathrm{nbor}(B)$. This locality has implications for updating.

Namely, suppose that we are considering a subdomain $B = \Omega_i$ at the lowest level of our quadtree with corresponding DOFs $\mathcal{J} = \mathcal{B}$ and neighbor DOFs $O$. Assuming that $\mathcal{B}\cap\Omega_{\mathrm{loc}} = \emptyset$ and $O\cap\Omega_{\mathrm{loc}} = \emptyset$, i.e., that

$$\check K(\mathcal{B},\mathcal{B}\cup O) = K(\mathcal{B},\mathcal{B}\cup O) \quad\text{and}\quad \check K(\mathcal{B}\cup O,\mathcal{B}) = K(\mathcal{B}\cup O,\mathcal{B}), \qquad (4.4)$$

we find that the ID

$$[S, R, T_{\mathcal{B}}] = \mathrm{id}(K,\mathcal{B},\epsilon)$$

is identical to the ID

$$[S, R, T_{\mathcal{B}}] = \mathrm{id}(\check K,\mathcal{B},\epsilon)$$

in the sense shown, i.e., the output is exactly the same. We remark that this is not a statement of the form "both IDs are accurate to tolerance $\epsilon$" but rather a stronger statement: assuming the same proxy surface and deterministic floating-point arithmetic, the output is the same.

By following the algebra in section 3.1, we see that the same statement holds for the full skeletonization with respect to $\mathcal{B}$: if $\mathcal{B}\cap\Omega_{\mathrm{loc}} = \emptyset$ and $O\cap\Omega_{\mathrm{loc}} = \emptyset$, then the skeletonization

$$[S, R, X_{SS}, X_{RR}, \tilde V, \tilde W] = \mathrm{skel}(K,\mathcal{B},\epsilon)$$

returns the same result as the skeletonization

$$[S, R, X_{SS}, X_{RR}, \tilde V, \tilde W] = \mathrm{skel}(\check K,\mathcal{B},\epsilon).$$

The fact that the proxy surface "shields" the subdomain $B$ from the effect of perturbations to the problem that are far from $B$, and thus that the local skeletonizations do not change, is not a complicated mathematical result, but it is a powerful one that forms the core of the updating algorithm. We have described already how to apply this idea at the lowest level of the quadtree; now we must discuss some propagation rules for applying this idea at higher levels.
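This shielding property is easy to test numerically. The sketch below reuses the illustrative proxy_id from subsection 3.2.4 and seeds SciPy's randomized ID to make it deterministic, per the caveat above; it checks that a perturbation touching neither $\mathcal{B}$ nor $O$ leaves the ID output bit-for-bit unchanged. It is a demonstration, not part of the algorithm.

    import numpy as np
    import scipy.linalg.interpolative as sli

    def ids_match(kernel_old, kernel_new, x, J, O, eps):
        # If rows/columns (B, B u O) are untouched by the perturbation,
        # id(K, B, eps) and id(qK, B, eps) coincide exactly
        # (same proxy surface, same random seed).
        sli.seed(1234)
        S0, R0, T0 = proxy_id(kernel_old, x, J, O, eps)
        sli.seed(1234)
        S1, R1, T1 = proxy_id(kernel_new, x, J, O, eps)
        return (np.array_equal(S0, S1) and np.array_equal(R0, R1)
                and np.array_equal(T0, T1))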

    4.2.2 Propagation rules for higher levels

From our discussion in subsection 4.2.1, we know that when (4.4) does not hold we must possibly recompute the skeletonization with respect to $\mathcal{B}$. We define the collection of marked DOF sets (or simply marked boxes) at level $L$ as

$$\mathcal{M}_L \equiv \{\mathcal{B}_i \mid i \text{ is a subdomain at level } L \text{ of the tree and (4.4) does not hold}\}, \qquad (4.5)$$

such that $\mathcal{M}_L\subseteq\mathcal{L}_L$ contains all the DOF sets with respect to which we must redo skeletonization. The remainder of this section is dedicated to describing how to define the marked set $\mathcal{M}_\ell$ at higher levels.

For $\ell < L$, suppose that $\mathcal{B}_i$ is the DOF set for a box $i$ at level $\ell+1$ with respect to which we have redone skeletonization. Further, let $j = \mathrm{parent}(i)$ have active DOFs $\mathcal{B}_j$. If $\check S_i$ is the new set of skeleton DOFs arising from the reskeletonization with respect to $\mathcal{B}_i$, then we know by definition of $\mathcal{B}_j$ that $\check S_i\subseteq\mathcal{B}_j$. This leads to the most self-evident propagation rule: if $\mathcal{B}_i$ at level $\ell+1$ is marked, then so is its parent $\mathcal{B}_j$ at level $\ell$. Based on this, we define for each $\ell$ the collection

$$\mathcal{P}_\ell \equiv \{\mathcal{B}_j \mid j = \mathrm{parent}(i) \text{ for some } i \text{ with } \mathcal{B}_i\in\mathcal{M}_{\ell+1}\},$$

which contains all parent boxes of marked boxes at the previous level.

The next propagation rule accounts for the fact that boxes must be reskeletonized if any of their neighbors are modified. In particular, if $\mathcal{B}_i\in\mathcal{M}_{\ell+1}$ and $\mathcal{B}_j\in\mathcal{L}_\ell$ are such that $j = \mathrm{parent}(i)$, then we see from before that $\mathcal{B}_j\in\mathcal{P}_\ell$. However, this implies that for any $\mathcal{B}_k$ such that $j\in\mathrm{nbor}(k)$ we must also perform reskeletonization with respect to $\mathcal{B}_k$, because the DOFs $\mathcal{B}_j$ are possibly inside the proxy circle around $\mathcal{B}_k$, so there is no longer any guarantee of shielding.

[Figure 4.2: Left: suppose the local perturbations are contained in box $B$ with DOFs $\mathcal{B}$, so that possibly $\check K(:,\mathcal{B})\neq K(:,\mathcal{B})$ or $\check K(\mathcal{B},:)\neq K(\mathcal{B},:)$. Initially, $\mathcal{M}_L$ contains the DOF sets corresponding to the shaded boxes. Center: at level $L-1$, DOF sets corresponding to the dark gray boxes are in $\mathcal{P}_{L-1}$ and thus $\mathcal{M}_{L-1}$ because they have marked children, and the light gray boxes are in $\mathcal{U}_{L-1}$ and thus $\mathcal{M}_{L-1}$ because they have neighbors in $\mathcal{P}_{L-1}$. Right: the corresponding quadtree with nodes shaded the same as their associated boxes.]

Thus, we define the collection

$$\mathcal{U}_\ell \equiv \{\mathcal{B}_k \mid j\in\mathrm{nbor}(k) \text{ for some } \mathcal{B}_j\in\mathcal{P}_\ell\},$$

which corresponds to neighbors of parents of marked boxes at level $\ell+1$.

If the quadtree is perfect, the previous two rules are sufficient to define the new marked set. As a technical detail, however, we note that due to heterogeneous refinement it is possible that there are leaf boxes $i$ at levels $\ell < L$ that have been directly modified, i.e., for which (4.4) does not hold. Such boxes are also clearly marked, though they may not be covered by the previous two rules. Combining this rule with the previous two leads us to define the collection of marked DOF sets for levels $\ell < L$ as

$$\mathcal{M}_\ell \equiv \{\mathcal{B}_i \mid i\in\mathcal{L}_\ell \text{ is a leaf box for which (4.4) does not hold}\}\cup\mathcal{P}_\ell\cup\mathcal{U}_\ell.$$

We see an example of the evolution of the marked set $\mathcal{M}_\ell$ in Figure 4.2.
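The three rules combine into a simple upward sweep over the tree. The Python sketch below assumes a hypothetical tree object exposing parent(i) and nbor(j) (with a symmetric neighbor relation) and a map directly_modified from level to the boxes violating (4.4) at that level; all names are illustrative.

    def marked_sets(tree, L, directly_modified):
        # M_L from (4.5); then, for l < L,
        # M_l = {directly modified leaves at level l} | P_l | U_l.
        M = {L: set(directly_modified.get(L, ()))}
        for l in range(L - 1, 0, -1):
            P = {tree.parent(i) for i in M[l + 1]}       # parents of marked boxes
            U = {k for j in P for k in tree.nbor(j)}     # neighbors of those parents
            M[l] = set(directly_modified.get(l, ())) | P | U
        return M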

    4.2.3 Updating a group skeletonization

At this point, we have identified the collection of DOF sets $\mathcal{M}_\ell$ at each level such that in updating RS we possibly need to reskeletonize with respect to $\mathcal{B}_i\in\mathcal{M}_\ell$ but do not need to reskeletonize with respect to $\mathcal{B}_j\in\mathcal{L}_\ell\setminus\mathcal{M}_\ell$. So, consider the skeletonization of $A \equiv \tilde Z\bigl(\check K;\mathcal{L}_L,\mathcal{L}_{L-1},\ldots,\mathcal{L}_{\ell+1}\bigr)$ with respect to $\mathcal{L}_\ell$,

$$[\{S_i,R_i,X_{S_iS_i},X_{R_iR_i}\}_{\mathcal{B}_i\in\mathcal{L}_\ell},\tilde V_{\mathcal{L}_\ell},\tilde W_{\mathcal{L}_\ell}] = \mathrm{skel\_group}(A,\mathcal{L}_\ell,\epsilon),$$

with corresponding factorization

$$\tilde Z(A;\mathcal{L}_\ell) \equiv \tilde V_{\mathcal{L}_\ell}^{-1}A\tilde W_{\mathcal{L}_\ell}^{-1} = \Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde V_i^{-1}\Bigr)A\Bigl(\prod_{\mathcal{B}_i\in\mathcal{L}_\ell}\tilde W_i^{-1}\Bigr).$$

Using the previous observation and the fact that the factors inside the parentheses above commute, we may write

$$\tilde Z(A;\mathcal{L}_\ell) = \Bigl(\prod_{\mathcal{B}_j\in\mathcal{L}_\ell\setminus\mathcal{M}_\ell}\tilde V_j^{-1}\Bigr)\Bigl(\prod_{\mathcal{B}_i\in\mathcal{M}_\ell}\tilde V_i^{-1}\Bigr)A\Bigl(\prod_{\mathcal{B}_i\in\mathcal{M}_\ell}\tilde W_i^{-1}\Bigr)\Bigl(\prod_{\mathcal{B}_j\in\mathcal{L}_\ell\setminus\mathcal{M}_\ell}\tilde W_j^{-1}\Bigr),$$

where we observe that the matrices in the products over $\mathcal{L}_\ell\setminus\mathcal{M}_\ell$ are the same as in the initial skeletonization of $\tilde Z(K;\mathcal{L}_L,\mathcal{L}_{L-1},\ldots,\mathcal{L}_{\ell+1})$ with respect to $\mathcal{L}_\ell$. This observation leads us to decompose the skeletonization with respect to $\mathcal{L}_\ell$ as

$$\tilde Z(A;\mathcal{L}_\ell) = \tilde Z\bigl(\tilde Z(A;\mathcal{M}_\ell);\mathcal{L}_\ell\setminus\mathcal{M}_\ell\bigr),$$

where the outer skeletonization need not be recomputed since we already know what the result will be. Thus, the necessary computation to update this skeletonization is simply

$$[\{S_i,R_i,X_{S_iS_i},X_{R_iR_i}\}_{\mathcal{B}_i\in\mathcal{M}_\ell},\tilde V_{\mathcal{M}_\ell},\tilde W_{\mathcal{M}_\ell}] = \mathrm{skel\_group}(A,\mathcal{M}_\ell,\epsilon),$$

with $\tilde V_{\mathcal{M}_\ell}$ and $\tilde W_{\mathcal{M}_\ell}$ defined analogously to $\tilde V_{\mathcal{L}_\ell}$ and $\tilde W_{\mathcal{L}_\ell}$.
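In other words, per level the update recomputes skel only over the marked sets and reuses cached results everywhere else. A schematic Python fragment conveying this reuse follows; the helpers skel and the cache keyed by DOF set are hypothetical stand-ins, not the dissertation's data structures.

    def update_level(A, level_sets, marked, cache, eps):
        # Z(A; L_l) = Z( Z(A; M_l); L_l \ M_l ): recompute the inner factor
        # over the marked boxes, reuse the outer one, since shielding
        # guarantees it is unchanged.
        out = {}
        for B in level_sets:                   # DOF sets B in L_l
            key = frozenset(B)
            if key in marked:
                out[key] = skel(A, B, eps)     # hypothetical skel(), as in 3.1
                cache[key] = out[key]
            else:
                out[key] = cache[key]          # cached from the old factorization
        return out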

    4.2.4 Updating RS

In light of the previous discussion, it is straightforward to write the process of updating RS in algorithmic form; we do so in Algorithm 4.


Algorithm 4 Updating RS after a local perturbation

 1: // Initialize
 2: $A := \check K$
 3: for $\ell := L$ down to 1 do
 4:   // Get new blocks and operators
 5:   $[\{S_i, R_i, X_{S_iS_i}, X_{R_iR_i}\}_{\mathcal{B}_i\in\mathcal{M}_\ell}, \tilde V_{\mathcal{M}_\ell}, \tilde W_{\mathcal{M}_\ell}]$ := skel_group$(A, \mathcal{M}_\ell, \epsilon)$
 6:   // Assemble new parts of skeletonization
 7:   for each $\mathcal{B}\in\mathcal{M}_\ell$ with $\mathcal{B} = S\cup R$ do
 8:     $A(:,R) := 0$
 9:     $A(R,:) := 0$
10:     $A(S,S) := X_{SS}$
11:     $A(R,R) := X_{RR}$
12:   end for
13:   // Assemble old parts of skeletonization
14:   for each $\mathcal{B}\in\mathcal{L}_\ell\setminus\mathcal{M}_\ell$ with $\mathcal{B} = S\cup R$ do
15:     $A(:,R) := 0$
16:     $A(R,:) := 0$
17:     $A(S,S) := X_{SS}$
18:     $A(R,R) := X_{RR}$
19:   end for
20: end for