
Foundations of Data Science, doi:10.3934/fods.2019001. © American Institute of Mathematical Sciences. Volume 1, Number 1, March 2019, pp. 1–38.

CONSISTENT MANIFOLD REPRESENTATION FOR TOPOLOGICAL DATA ANALYSIS

Tyrus Berry and Timothy Sauer∗

Department of Mathematical Sciences, George Mason University, Fairfax, VA 22030, USA

Abstract. For data sampled from an arbitrary density on a manifold embedded in Euclidean space, the Continuous k-Nearest Neighbors (CkNN) graph construction is introduced. It is shown that CkNN is geometrically consistent in the sense that under certain conditions, the unnormalized graph Laplacian converges to the Laplace-Beltrami operator, spectrally as well as pointwise. It is proved for compact (and conjectured for noncompact) manifolds that CkNN is the unique unweighted construction that yields a geometry consistent with the connected components of the underlying manifold in the limit of large data. Thus CkNN produces a single graph that captures all topological features simultaneously, in contrast to persistent homology, which represents each homology generator at a separate scale. As applications we derive a new fast clustering algorithm and a method to identify patterns in natural images topologically. Finally, we conjecture that CkNN is topologically consistent, meaning that the homology of the Vietoris-Rips complex (implied by the graph Laplacian) converges to the homology of the underlying manifold (implied by the Laplace-de Rham operators) in the limit of large data.

1. Introduction. Building a discrete representation of a manifold from a finite data set is a fundamental problem in machine learning. Particular interest pertains to the case where a set of data points in a possibly high-dimensional Euclidean space is assumed to lie on a relatively low-dimensional manifold. The field of topological data analysis (TDA) concerns the extraction of topological invariants such as homology from discrete measurements.

Currently, there are two major methodologies for representing manifolds from data sets. One approach is an outgrowth of Kernel PCA [37], using graphs with weighted edges formed by localized kernels to produce an operator that converges to the Laplace-Beltrami operator of the manifold. These methods include versions of diffusion maps [1, 12, 5, 4] that reconstruct the geometry of the manifold with respect to a desired metric. Convergence of the weighted graph to the Laplace-Beltrami operator in the large data limit is called consistency of the graph construction. Unfortunately, while such constructions implicitly contain all topological information about the manifold, it is not yet clear how to use a weighted graph to build a simplicial complex from which simple information like the Betti numbers can be extracted.

2010 Mathematics Subject Classification. Primary: 58J65, 62H30; Secondary: 62G07.
Key words and phrases. Topological data analysis, Laplace-de Rham operator, manifold learning, spectral clustering, geometric prior.
The authors are partially supported by NSF grant DMS-1723128.
∗ Corresponding author: Timothy Sauer.


A second approach, known as persistent homology [8, 16, 18], produces a series of unweighted graphs that reconstructs topology one scale at a time, tracking homology generators as a scale parameter is varied. The great advantage of an unweighted graph is that the connection between the graph and a simplicial complex is immediate, since the Vietoris-Rips construction can be used to build an abstract simplicial complex from the graph. However, the persistent homology approach customarily creates a family of graphs, of which none is guaranteed to contain all topological information. A consistent theory is therefore not possible, since there is no single unified homology in the large data limit.

In this article we propose replacing persistent homology with consistent homology in data analysis applications. In other words, our goal is to show that it is possible to construct a single unweighted graph from which the underlying manifold's topological information can be extracted. We introduce a specific graph construction from a set of data points, called continuous k-nearest neighbors (CkNN), that achieves this goal for any compact Riemannian manifold. Theorem 2 states that the CkNN is the unique unweighted graph construction for which the (unnormalized) graph Laplacian converges spectrally to a Laplace-Beltrami operator on the manifold in the large data limit. Furthermore, even when the manifold is not compact, we compute the optimal bias-variance tradeoff and show that CkNN always results in a geometry where the Laplace-Beltrami operator has a discrete spectrum.

The proof of consistency for the CkNN graph construction is carried out in Section 6 for both weighted and unweighted graphs. There we complete the theory of graphs constructed from variable bandwidth kernels, computing for the first time the bias and variance of both pointwise and spectral estimators. Combined with existing work on spectral convergence [48, 2, 45, 46, 39] we obtain consistency. Our analysis reveals the surprising fact that the optimal bandwidth for spectral estimation is significantly smaller than the optimal choice for pointwise estimation (see Fig. 12). This is crucial because existing statistical estimates [40, 4] imply very different parameter choices that are not optimal for the spectral convergence desired in most applications. Moreover, requiring the spectral variance to be finite allows us to specify which geometries are accessible on non-compact manifolds. Details on the relationship of our new results to previous work are given in Section 6.

As mentioned above, the reason for focusing on unweighted graphs is their relative simplicity for topological investigation. In an unweighted graph construction, one can apply the depth-first search algorithm to determine the zero homology. On the other hand, there are many weighted graph constructions that converge to the Laplace-Beltrami operator with respect to various geometries [1, 12, 5, 4]. Although these methods are very powerful, they are not convenient for extracting topological information. For example, determining the zero homology from a weighted graph requires numerically estimating the dimension of the zero eigenspace of the graph Laplacian, which is much less efficient than a depth-first search. Secondly, determination of the number of zero eigenvalues requires setting a nuisance parameter as a numerical threshold. For higher-order homology generators, the problem is even worse, as weighted graphs require the construction of the Laplace-de Rham operators, which act on differential forms. (We note that the 0-th Laplace-de Rham operator acts on functions, or 0-forms, and is called the Laplace-Beltrami operator.) In contrast, the unweighted graph construction allows the manifold to be studied using topological data analysis methods that are based on simplicial homology (e.g. computed from the Vietoris-Rips complex).


The practical advantages of the CkNN are: (1) a single graph representation of the manifold that captures topological features at multiple scales simultaneously (see Fig. 5) and (2) identification of the correct topology even for non-compact sampling measures (see Fig. 6). In CkNN, the length parameter ε is eliminated and replaced with a unitless scale parameter δ. Fortunately, for computational purposes, the consistent homology in terms of δ uses the same efficient computational homology algorithms as conventional persistent homology.

For some applications, the unitless parameter δ may be a disadvantage; for example, if the distance scales of particular features need to be explicitly separated. In such a case, absolute distances are meaningful, units are important, and the standard ε-ball persistence diagram may be more relevant to the problem than a single consistent topology. The point of this article is that if one is truly interested only in the topology in TDA, a unitless δ is more appropriate and leads to a consistency theory. Finally, for a fixed data set, the consistent homology approach requires choosing the parameter δ (which determines the CkNN graph), and we re-interpret the classical persistence diagram as a tool for selecting δ.

We introduce the CkNN in Section 2, and demonstrate its advantages in topological data analysis by considering a simple but illustrative example. In Section 3 we show that consistent spectral estimation of the Laplace-de Rham operators is the key to consistent estimation of topological features. In particular, the key to estimating the connected components of a manifold is spectral estimation of the Laplace-Beltrami operator. We show how these results guarantee consistency of the connected components (clustering), and, combined with recent results [6] on the spectral exterior calculus (SEC), imply consistency of the higher-order homology. In Section 4, these results are used to show that the CkNN is the unique unweighted graph construction which yields a consistent geometry via the Laplace-Beltrami operator on functions. We give several examples that demonstrate the consistency of the CkNN construction in Section 5, including a fast and consistent clustering algorithm that allows more general sampling densities than existing theories. Theoretical results are given in Section 6. We conclude in Section 7 by discussing the relationship of CkNN to classical persistence. In this article, we focus on applications to TDA, but the theoretical results will be of independent interest to those studying the geometry as well as the topology of data.

2. Continuous scaling for unweighted graphs. We begin by describing the CkNN graph construction and comparing it to other approaches. Then we discuss the main issues of this article as applied to a simple example of data points arranged into three rectangles with nonuniform sampling.

2.1. Continuous k-Nearest Neighbors. Our goal is to create an unweighted, undirected graph from a point set with interpoint distances given by a metric d. Since the data points naturally form the vertices of a graph representation, for each pair of points we only need to decide whether or not to connect these points with an edge. There are two standard approaches for constructing the graph:

1. Fixed ε-balls: For a fixed ε, connect the points x, y if d(x, y) < ε.
2. k-Nearest Neighbors (kNN): For a fixed integer k, connect the points x, y if either d(x, y) ≤ d(x, x_k) or d(x, y) ≤ d(y, y_k), where x_k and y_k are the k-th nearest neighbors of x and y respectively.

The fixed ε-balls choice works best when the data is uniformly distributed on the manifold, whereas the kNN approach adapts to the local sampling density of points.


However, we will see that even when answering the simplest topological questions, both standard approaches have severe limitations. For example, when clustering a data set into connected components, they may underconnect one part of the graph and overestimate the number of components, while overconnecting another part of the graph and bridging parts of the data set that should not be connected. Despite these drawbacks, the simplicity of these two graph constructions has led to their widespread use in manifold learning and topological data analysis methods [8].

Our main point is that a less discrete version of kNN sidesteps these problems, and can be proved to lead to a consistent theory in the large data limit. Define the Continuous k-Nearest Neighbors (CkNN) graph construction by:

3. CkNN: Connect the points x, y if d(x, y) < δ √(d(x, x_k) d(y, y_k)),

where the parameter δ is allowed to vary continuously. Of course, the discrete nature of the (finite) data set implies that the graph will change at only finitely many values of δ. The continuous parameter δ has two uses. First, it allows asymptotic analysis of the graph Laplacian in terms of δ, where we interpret the CkNN graph construction as a kernel method. Second, it allows the parameter k to be fixed for each data set, which allows us to interpret d(x, x_k) as a local density estimate.
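To make the rule concrete, here is a minimal numpy sketch of the CkNN construction just described (the function name and the brute-force distance computation are our own illustration, not from the paper):

import numpy as np

def cknn_adjacency(X, k=10, delta=1.0):
    # Connect x_i and x_j when d(x_i, x_j) < delta * sqrt(d(x_i, x_i_k) * d(x_j, x_j_k)).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    rho = np.sort(D, axis=1)[:, k]   # column 0 is the point itself, so column k is the k-th neighbor
    W = (D < delta * np.sqrt(np.outer(rho, rho))).astype(int)
    np.fill_diagonal(W, 0)           # no self-loops
    return W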

The CkNN construction is closely related to the “self-tuning” kernel introduced in [49] for the purposes of spectral clustering, which was defined as

K(x, y) = exp( −d(x, y)² / (d(x, x_k) d(y, y_k)) ). (1)

The kernel (1) leads to a weighted graph, but replacing the exponential kernel with the indicator function

K(x, y) = 1{ d(x, y)² / (d(x, x_k) d(y, y_k)) < 1 } (2)

and introducing the continuous parameter δ yields the CkNN unweighted graph construction. The limiting operator of the graph Laplacian based on the kernels (1) and (2) was first analyzed pointwise in [44, 4]. In Sec. 6 we provide the first complete analysis of the spectral convergence of these graph Laplacians, along with the bias and variance of the spectral estimates.

The CkNN is an instance of a broader class of multi-scale graph constructions:

(*) Multi-scale: Connect the points x, y if d(x, y) < δ √(ρ(x) ρ(y)),

where ρ(x) defines the local scaling near the point x. In Section 4 we will show that

ρ(x) ∝ q(x)^{−1/m} (3)

is the unique multi-scale graph construction that yields a consistent limiting geometry, where q(x) is the sampling density and m is the intrinsic dimension of the data.

In applications to data, neither q(x) nor m may be known beforehand. Fortunately, for points on a manifold embedded in Euclidean space, the kNN itself provides a very simple density estimator, which for k sufficiently small approximately satisfies

||x − x_k|| ∝ q(x)^{−1/m} (4)

where x_k is the k-th nearest neighbor of x and m is the dimension of the underlying manifold [26]. Although more sophisticated kernel density estimators could be used (see for example [43]), a significant advantage of (4) is that it implicitly incorporates the exponent −1/m without the need to estimate the intrinsic dimension m of the underlying manifold.


Figure 1. Rectangular regions indicate the true clusters. (a) Circles of a fixed radius ε show that bridging of the dense regions occurs before any connections are made in the sparse region. (b) Graph connecting all points with distance less than ε.

In the next section, we demonstrate the advantages of the CkNN on a simple example before turning to the theory of consistent topological estimation.

2.2. Example: Representing non-uniform data. In this section we start with a very simple example where both the fixed ε-ball and simple kNN graph constructions fail to identify the correct topology.

Example 1. Fig. 1 shows a simple “cartoon” of non-uniform sampling of data that is common in real applications, and reveals the weakness of standard graph constructions. All the data points lie in one of the three rectangular connected components outlined in Fig. 1(a). The left and middle components are densely sampled and the right component is more sparsely sampled. Consider the radius ε indicated by the circles around the data points in Fig. 1(a). At this radius, the points in the sparse component are not connected to any other points in that component. This ε is too small for the connectivity of the sparse component to be realized, but at the same time is too large to distinguish the two densely sampled components. A graph built by connecting all points within the radius ε, shown in Fig. 1(b), would find many spurious components in the sparse region while simultaneously improperly connecting the two dense components. We are left with a serious failure: the graph cannot be tuned, with any fixed ε, to identify the “correct” three boxes as components.

The kNN approach to local scaling is to replace the fixed ε approach with the establishment of edges between each point and its k nearest neighbors. While this is an improvement, in Fig. 2 we show that it still fails to reconstitute the topology even for the very simple data set considered in Fig. 1. Notice that in Fig. 2(a) the graph built based on the nearest neighbor (k = 1) leaves all regions disconnected, while using two nearest neighbors (k = 2) incorrectly bridges the sparse region with a dense region, as shown in Fig. 2(b). Of course, using kNN with k > 2 will have the same problem as k = 2. Fig. 2 shows that simple kNN is not a panacea for nonuniformly sampled data.

Finally, we demonstrate the CkNN graph construction in Fig. 3. An edge is added between points x and y when d(x, y) < δ √(d(x, x_k) d(y, y_k)). We denote the k-th nearest neighbor of x (resp., y) by x_k (resp., y_k). The coloring in Fig. 3(a) exhibits the varying density between boxes.


Figure 2. Data set from Fig. 1. (a) Graph based on connecting each point to its (first) nearest neighbor leaves all regions disconnected internally. (b) Connecting each point to its two nearest neighbors bridges the sparse region to one of the dense regions before fully connecting the left and center boxes.

Figure 3. Data set from Fig. 1. (a) Color indicates the relative values of the bandwidth function ρ(x) = d(x, x_k) using k = 10 and optimally tuning δ (blue = low, red = high). (b) Graph connecting pairs of data points with ||x − y|| < δ √(ρ(x) ρ(y)). Notice that the connected components are fully connected and triangulated, yielding the correct homology in dimensions 0, 1, and 2.

Assigning edges according to CkNN with k = 10 and optimally tuning δ, shown in Fig. 3(b), yields an unweighted graph whose connected components reflect the manifold in the large data limit. Theorem 2 guarantees the existence of such a δ, yielding an unweighted CkNN graph with the correct topology.

Although we have focused on the connected components of the point set, the CkNN graph in Fig. 3 fully triangulates all regions, which implies that the 1-homology is correctly identified as trivial. Clearly, the graph constructions in Figures 1 and 2 are very far from identifying the correct 1-homology.

To complete our analysis of the three-box example, we compare CkNN to a further alternative. Two crucial features of the CkNN graph construction are (1) symmetry in x and y, which implies an undirected graph construction, and (2) introduction of the continuous parameter δ, which allows k to be fixed so that ρ(x) = ||x − x_k|| is an estimator of q(x)^{−1/m}. There are many alternative ways of combining the local scaling function ρ(x) with the continuous parameter δ. Our detailed theoretical analysis in Sec. 6 shows that the geometric average δ √(ρ(x) ρ(y)) is consistent in the large data limit, but it does not discount all alternatives.


Figure 4. Non-compact version of the data set from Fig. 1; the density of the leftmost ‘box’ decays continuously to zero. (a) The kNN ‘AND’ construction bridges the components while the 1-homology still has spurious generators (holes) in the sparse regions. (b) The CkNN is capable of capturing the correct homology for this data set. Both methods use k = 10 and the value of δ was tuned to give the best results for each method. Decreasing δ for (a) would obtain the correct clusters but introduce more spurious holes.

For example, we briefly consider the much less common ‘AND’ construction for kNN, where points are connected when d(x, y) ≤ min{d(x, x_k), d(y, y_k)} (as opposed to standard kNN, which uses the max). Intuitively, the advantage of the ‘AND’ construction is that it will not incorrectly connect dense regions to sparse regions, because it takes the smaller of the two kNN distances. However, on a non-compact domain, shown in Fig. 4, this construction does not identify the correct homology, whereas the CkNN does. We conclude that the ‘AND’ version of kNN is not generally superior to CkNN. Moreover, our analysis in Sec. 6 does not apply to this alternative, due to the fact that the max and min functions are not differentiable, making their analysis more difficult than the geometric average used by CkNN.

2.3. Multiscale homology. In Section 4 we will see that the geometry represented by the CkNN graph construction captures the true topology with comparatively little data by implicitly choosing a geometry that is adapted to the sampling measure. The CkNN construction yields a natural multi-scale geometry, which is assumed to be very smooth in regions of low density and can have finer features in regions of dense sampling. Since all geometries yield the same topology, this multi-scale geometry is a natural choice for studying the topology of the underlying manifold, and this advantage is magnified for small data sets. In Fig. 5 we demonstrate the effect of this geometry on the persistent homology for a small data set with multiple scales. Following that, in Fig. 6 we show how the CkNN graph construction can capture the homology even for a non-compact manifold.

Example 2. To form the data set in Fig. 5 we sampled 60 uniformly random points on a large annulus in the plane centered at (−1, 0) with radii in [2/3, 1] and another 60 uniformly random points on a much smaller annulus centered at (1/5, 0) with radii in [1/5, 3/10]. Together, these 120 points form a “figure eight” with a sparsely sampled large hole and a densely sampled small hole, as shown in Fig. 5(b,d). We then used the JavaPlex package [42] to compute the H1 persistence diagram for the VR complex based on the standard ε-ball graph construction, shown in Fig. 5(a). Note that two major generators of H1 are found along with some “topological noise”, and the generators do not exist for a common ε.

Since JavaPlex can build the VR complex persistence diagram for any distance matrix, we could easily compute the CkNN persistence diagram, shown in Fig. 5(c), by using the ‘distance’ matrix d(x, y) = ||x − y|| / √(||x − x_k|| ||y − y_k||), where we used k = 10 to form the matrix. The CkNN captures both generators in a single graph, giving a multi-scale representation, whereas the standard ε-ball graph captures only one scale at a time. Fig. 5(d) shows the edges for δ = 0.1, at which the graph captures the correct topology in all dimensions. Moreover, the CkNN construction is more efficient in this example, requiring only 934 edges to form a single connected component, whereas the ε-ball construction requires 2306 edges.
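The normalized matrix is equally simple to form in practice. A short sketch (ours), assuming X holds the 120 sample points as rows; JavaPlex itself is a Java package, so we show only the matrix construction:

import numpy as np

# X is assumed to hold the sample points, one per row (hypothetical variable).
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
rho = np.sort(D, axis=1)[:, 10]       # k = 10, as in the text
C = D / np.sqrt(np.outer(rho, rho))   # CkNN 'distance' matrix for the persistence software
np.fill_diagonal(C, 0.0)              # zero diagonal, as a distance matrix requires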

Figure 5. (a) Standard ε-ball persistent 1-homology reveals that no value of ε captures both holes simultaneously: each homology class persists at separate ε scales. (b) When ε = 0.16 we see that the small hole is completely triangulated (the black segment completes the triangulation), while the large hole is still not connected. (c) CkNN persistent 1-homology as a function of δ shows that both homology classes (top two lines) are present simultaneously over a large range of δ values. (d) When δ = 0.1, the resulting graph represents the true topology of the manifold.

In the above examples, we saw that CkNN outperforms the standard ε-ball approach, and kNN methods, for a given finite data set sampled nonuniformly from a compact manifold. However, in the large data limit, all of the above approaches converge to the correct homology, so the advantage of CkNN is only one of efficiency. Next we examine an example of non-compact data, where the CkNN construction converges to the correct topological information in the large data limit, and the ε-ball method cannot.


Figure 6. (a) Data sampled from a 2-dimensional Gaussian with a radial gap, forming a non-compact manifold. (b) The standard fixed ε-ball persistence diagram has no persistent region with 2 connected components and one non-contractible hole. (c) When ε = 0.135 the 1-homology is correct, but outliers are still not connected. (d) For ε = 0.16 the outliers are still not connected and the middle section is bridged. For any size gap this bridge will happen in the limit of large data as the outliers become more spaced out. (e) CkNN persistence in terms of δ shows the correct homology (β_0 = 2 and β_1 = 1) for 0.180 < δ < 0.215. (f) The CkNN construction for δ = 0.20.


Example 3. To form the data set in Fig. 6(a) we sampled 150 points from a 2-dimensional Gaussian distribution and then removed the points of radius between [1/4, 3/4], leaving 120 points lying on two connected components with a single non-contractible hole. In this case the standard ε-ball persistence does not even capture the correct 0-homology for the manifold, due to the decreasing density near the outlying points, as shown in Fig. 6(c-d). Furthermore, since the true manifold is non-compact, there is no reason to expect the ε-ball construction to converge to the correct topology even in the limit of large data. In fact, as the amount of data is increased, the outlying points will become increasingly spaced out, leading to worse performance for the fixed ε-ball construction, even for the H0 homology. In contrast, the CkNN construction is able to capture all the correct topological features for a large range of δ values, as shown in Fig. 6(e-f).

In Sections 3 and 4 we prove that the CkNN is the unique graph construction that provides a consistent representation of the geometry of the underlying compact (or noncompact, under some technical assumptions) manifold in the limit of large data. An immediate consequence is the consistency of the connected components.

3. Manifold topology from graph topology. Our goal is to access the true topology of the underlying manifold using the Vietoris-Rips (VR) complex, which is an abstract simplicial complex on the finite data set. The VR complex is constructed inductively, first adding a triangle whenever all the faces are in the graph and then adding a higher-order simplex whenever all the faces of the simplex are included.
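As a small illustration of this inductive rule (our own sketch, not from the paper), the 2-skeleton of the VR complex can be read directly off an adjacency matrix W such as the one produced by the CkNN:

from itertools import combinations

def vr_two_skeleton(W):
    # Edges are the pairs joined in the graph; a triangle enters the VR complex
    # exactly when all three of its edges are present.
    n = W.shape[0]
    edges = [(i, j) for i, j in combinations(range(n), 2) if W[i, j]]
    triangles = [(i, j, l) for i, j, l in combinations(range(n), 3)
                 if W[i, j] and W[j, l] and W[i, l]]
    return edges, triangles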

Definition 1. We say that a graph construction from a random sampling is topologically consistent if the homology computed from the VR complex is isomorphic to the homology of the underlying manifold with probability approaching 1 as the number of data points goes to infinity.

Topological consistency has been shown directly for the ε-ball graph construction on compact manifolds without boundary [25, 34, 7]. However, Example 3 above shows why the ε-ball construction cannot be guaranteed to be consistent for non-compact manifolds. In this section we delineate our notion of graph consistency for CkNN on Riemannian manifolds, compact and noncompact, based on spectral consistency of graph Laplacians with the Laplace-Beltrami operator. Of course, this will establish geometric properties of the data that go beyond topological data analysis.

We first note that on a Riemannian manifold, the Laplace-Beltrami operator completely determines the Riemannian metric and thus the entire topology of a Riemannian manifold. To make this connection explicit, in coordinates x_1, ..., x_m we can compute the Riemannian metric by

g_ij = g_x(∇x_i, ∇x_j) = (1/2) (∆(x_i x_j) − x_i ∆x_j − x_j ∆x_i).

The Riemannian metric g is used to define the lengths of curves. In turn, the distance between points is defined as the infimum of these lengths over all curves between the points [35]. The notion of distance determines the natural metric topology on the manifold. In this formal way, perfect knowledge of the Laplace-Beltrami operator implies perfect knowledge of the topology.

Of course, this formal construction falls short of practical usage for two reasons. First, our discrete approximation does not yield perfect knowledge of the Laplace-Beltrami operator, but only convergence in a certain sense in the limit of large data.


Second, extracting a given topological invariant from knowledge of the Laplace-Beltrami operator may be very difficult.

In this article, we are concerned mainly with addressing the first problem by showing the spectral convergence required to obtain certain types of topological invariants, namely cohomology. In Sec. 6, combined with a result of [48], we prove pointwise and spectral convergence of the CkNN graph Laplacian to the Laplace-Beltrami operator, which we refer to as spectral consistency of the graph Laplacian. In particular, for a manifold without boundary or with a smooth boundary, convergence of the CkNN graph construction is guaranteed, and the manifold topology is uniquely determined.

The question remains: does spectral consistency with the Laplace-Beltrami operator imply topological consistency? The question can be visualized in the following diagram.

Graph VR homology H_n(G)  --- topological consistency --->  Manifold homology H_n(M)
        ^                                                            ^
        | Graph theory                                               | Hodge theory + SEC [6]
        |                                                            |
Graph Laplacian L_un = ∂∂^⊤ --- spectral consistency --->   Laplace-Beltrami ∆ = δd

The left vertical arrow corresponds to the fact that the graph Laplacian L_un = D − W can trivially be used to reconstruct the entire graph, since the non-diagonal entries are simply the negative of the adjacency matrix. Since the graph determines the entire VR complex, the graph Laplacian completely determines the VR homology of the graph. This connection is explicitly spectral for the zero homology, since the zero-homology of a graph corresponds exactly to the zero-eigenspace of the graph Laplacian [47].
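Since this correspondence is exactly what our clustering results rest on, a small numerical sanity check may help; the following sketch (ours, not from the paper) verifies that dim ker(L_un) equals the number of components for two disjoint triangles:

import numpy as np
from scipy.sparse.csgraph import connected_components

# Two disjoint triangles: the kernel of L_un should be 2-dimensional.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0

L_un = np.diag(W.sum(axis=1)) - W                    # unnormalized graph Laplacian
n_zero = int(np.sum(np.abs(np.linalg.eigvalsh(L_un)) < 1e-10))
n_comp = connected_components(W, directed=False)[0]  # graph-search component count
assert n_zero == n_comp == 2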

For higher homology, one would need to establish that the VR homology computed from the graph Laplacian agrees with the homology uniquely determined by the Laplace-Beltrami operator (which is uniquely determined by the converging CkNN graph Laplacians). We next explore the plausibility of this conjecture.

The right vertical arrow follows from the fact that, from the Laplace-Beltrami operator, it is possible to reconstruct the metric on a Riemannian manifold, which completely determines the homology of the manifold. The Riemannian metric lifts to an inner product on forms g(ω, ν), which defines the Hodge inner products

⟨ω, ν⟩ = ∫_M g(ω, ν) dV

on differential forms. From the Hodge inner products, one defines the codifferential operators δ_k as the formal adjoint of the exterior derivative d_k on k-forms. Finally, the Laplace-de Rham operator on differential k-forms is defined by ∆_k = δ_{k+1} d_k + d_{k−1} δ_k, and the Hodge theorem [35] states that the kernel of ∆_k is isomorphic to the k-th de Rham cohomology group and the k-th singular cohomology group, ker(∆_k) ≅ H^k_dR(M) ≅ H^k_sing(M). For closed manifolds without boundary, Poincaré duality relates the homology to the cohomology, H_{n−k}(M) ≅ H^k(M). In general, cohomology is considered a stronger invariant.

A partial solution to this problem is given in [6], with a method of extracting cohomology information that relies on constructing estimators of Laplace-de Rham operators on k-forms. The construction of these estimators and their consistency relies on the spectral convergence results shown here. This construction is called the spectral exterior calculus (SEC), since the Laplace-de Rham operators are approximated by Galerkin truncation on a spectral basis (as opposed to a finite element basis) constructed using the eigenfunctions of the Laplace-Beltrami operator. While a finite element basis would require a simplicial complex, the SEC construction is valid for an abstract complex such as the VR complex, since it relies only on the spectral consistency of the graph Laplacian. Estimators of the Laplace-de Rham operators on 1-forms are constructed and are shown to converge spectrally, thus ensuring that the kernel of these estimators yields the true cohomology in the limit of large data. In [6] a general strategy is outlined for lifting the results to the Laplace-de Rham operators on k-forms for k > 1. The restriction to 1-forms was simply due to the complexity of the explicit formulations. The SEC, together with the spectral convergence of the graph Laplacian to the Laplace-Beltrami operator, completes the right vertical arrow by giving an explicit construction.

An alternative approach, not yet fully explored, would be to construct the Laplace-de Rham operators on k-forms directly from the VR complex. This is an alternative to the SEC construction, which represents these operators on a basis built from eigenfunctions of the Laplace-Beltrami operator. Such a construction would allow the top horizontal arrow to be shown directly. Establishing this connection requires defining a discrete analog of differential forms and discrete analogs of the higher-order Laplace-de Rham operators, and then showing spectral convergence to the corresponding operators on the manifold. One promising construction is the discrete exterior calculus [15, 23], but so far only very restricted consistency results have been shown [36, 38].

To summarize the above discussion, just as the discrete Laplacian completely determines the graph VR homology, the Laplace-Beltrami operator ∆ completely determines the manifold homology. In perfect analogy to the discrete case, the zero-homology of the manifold corresponds to the zero-eigenspace of the Laplace-Beltrami operator. This connection is explicitly spectral, and we conjecture that the isomorphism generalizes to higher-order homology groups via the Laplace-de Rham operators on differential forms.

Similar results exist for weighted graphs, which are less helpful to topological data analysis because they are not directly interpretable through VR calculations. For example, [12] proves pointwise convergence on compact manifolds for any smooth sampling density, but their construction requires a weighted graph in general. For graph Laplacian methods (including the results developed here), the dependence on the curvature and nearness to self-intersection appears in the bias term of the estimator, as shown in [12, 20]. Using more complicated weighted graph constructions, recent results show that for a large class of non-compact manifolds [20] with smooth sampling densities that are allowed to be arbitrarily close to zero [4], the graph Laplacian can converge pointwise to the Laplace-Beltrami operator. These results require weighted graph constructions due to several normalizations which are meant to remove the influence of the sampling density on the limiting operator.

4. The unique consistent unweighted graph construction. In this section we show that the continuous k-nearest neighbors (CkNN) construction is the unique unweighted graph construction that yields a consistent unweighted graph Laplacian for any smooth sampling density on manifolds of the class defined in [20], including many non-compact manifolds.

Consider a data set {x_i}_{i=1}^N of independent samples from a probability distribution q(x) that is supported on an m-dimensional manifold M embedded in Euclidean space. For a smooth function ρ(x) on M, we will consider the CkNN graph construction, where two data points x_i and x_j are connected by an edge if

d(x_i, x_j) < δ √(ρ(x_i) ρ(x_j)). (5)

This construction leads to the N × N adjacency matrix W whose ij-th entry is 1 if x_i and x_j have an edge in common, and 0 otherwise. Let D be the diagonal matrix of row sums of W, and define the “unnormalized” graph Laplacian L_un = D − W. In Sec. 6 we show that for an appropriate factor c depending only on δ and N, in the limit of large data, c^{−1} L_un converges both pointwise and spectrally to the operator defined by

L_{q,ρ} f ≡ q ρ^{m+2} ( ∆f − ∇log(q² ρ^{m+2}) · ∇f ), (6)

where ∆ is the positive definite Laplace-Beltrami operator and ∇ is the gradient, both with respect to the Riemannian metric inherited from the ambient space. In fact, pointwise convergence follows from a theorem of [44], and the pointwise bias and a high probability estimate of the variance were first computed in [4]. Both of these results hold for a larger class of kernels than the one defined in (5).

Although the operator in (6) appears complicated, we now show that it is still a Laplace-Beltrami operator on the same manifold M, but with respect to a different metric. A conformal change of metric corresponds to a new Riemannian metric g̃ ≡ ϕg, where ϕ(x) > 0, which has Laplace-Beltrami operator

∆_g̃ f = (1/ϕ) ( ∆f − (m − 2) ∇log√ϕ · ∇f ). (7)

For expressions (6) and (7) to match, the function ϕ must satisfy

1/ϕ = q ρ^{m+2}   and   ϕ^{(m−2)/2} = q² ρ^{m+2}. (8)
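The elimination of ϕ from (8) is routine algebra; for the reader's convenience we write out the intermediate steps (our expansion of the step stated in the next sentence):

\begin{align*}
\varphi = (q\rho^{m+2})^{-1}
\;\Rightarrow\; (q\rho^{m+2})^{-\frac{m-2}{2}} &= q^{2}\rho^{m+2} \\
\;\Rightarrow\; q^{-\frac{m-2}{2}-2} &= \rho^{(m+2)+\frac{(m-2)(m+2)}{2}} \\
\;\Rightarrow\; q^{-\frac{m+2}{2}} &= \rho^{\frac{m(m+2)}{2}},
\end{align*}

and squaring both sides gives the relation stated next.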

Eliminating ϕ in the two equations results in q^{−(m+2)} = ρ^{m(m+2)}, which implies ρ ≡ q^{−1/m} as the only choice that makes the operator (6) equal to a Laplace-Beltrami operator ∆_g̃. The new metric is g̃ = q^{2/m} g. Computing the volume form dṼ of the new metric g̃ we find

dṼ = √|g̃| = √|q^{2/m} g| = q √|g| = q dV, (9)

which is precisely the sampling measure. Moreover, the volume form dṼ is exactly consistent with the discrete inner product

E[f⃗ · f⃗] = E[∑_{i=1}^N f(x_i)²] = N ∫ f(x)² q(x) dV = N ⟨f, f⟩_{dṼ}.

This consistency is crucial, since the discrete spectrum of L_un consists of the minimizers of the functional

Λ(f) = (f⃗^⊤ c^{−1} L_un f⃗) / (f⃗^⊤ f⃗) →_{N→∞} ⟨f, ∆_g̃ f⟩_{dṼ} / ⟨f, f⟩_{dṼ},

where c is a scalar depending only on δ and N (see Theorem 3 in Sec. 6). If the Hilbert space norm implied by f⃗^⊤ f⃗ were not the volume form of the Riemannian metric g̃, then the eigenvectors of L_un would not minimize the correct functional in the limit of large data. This shows why it is important that the Hilbert space norm is consistent with the Laplace-Beltrami operator estimated by L_un.

Another advantage of the geometry g̃ concerns the spectral convergence of L_un to L_{q,ρ} shown in Theorem 7 in Sec. 6, which requires the spectrum to be discrete. Assuming a smooth boundary, the Laplace-Beltrami operator on any manifold with finite volume will have a discrete spectrum [11]. This ensures that spectral convergence always holds for the Riemannian metric g̃ = q^{2/m} g, since the volume form is dṼ = q dV, and therefore the volume of the manifold is exactly

vol_g̃(M) = ∫_M dṼ = ∫_M q dV = 1.

Since all geometries on a manifold have the same topology, this shows once again that the metric g̃ is completely natural for topological investigations, since spectral convergence is guaranteed, and spectral convergence is crucial to determining the homology.

The fact that ρ = q^{−1/m} is the unique solution to (8), along with the spectral consistency, implies the following result.

Theorem 2 (Unique consistent geometry). Consider data sampled from a compact Riemannian manifold satisfying Assumption 1 (see Section 6). Among unweighted graph constructions (5), ρ = q^{−1/m} is the unique choice which yields a consistent geometry, in the sense that the unnormalized graph Laplacian converges spectrally to a Laplace-Beltrami operator. Thus, the CkNN graph construction yields a consistent clustering for any density q.

Based on the results in Section 6, we conjecture that spectral convergence also holds for non-compact manifolds. This would imply that CkNN is the unique consistent graph construction for clustering, since for any other ρ there will exist densities q for which M has infinite volume with respect to q^{4/(m−2)} ρ^{(2m+4)/(m−2)} dV, precluding consistency. Based on the discussion in the previous section, and in particular the fact that the Laplace-Beltrami operator determines the entire cohomology of the manifold, we propose the following conjecture:

Conjecture 1 (Unique consistent topology). CkNN is the unique graph construction (5) with an associated VR complex that is topologically consistent in the limit of large data.

As mentioned in the previous section, completing the proof of this conjecture would require explicit estimation of the Laplace-de Rham operators ∆_k and corresponding spectral convergence proofs. In this paper we establish this fact for the Laplace-Beltrami operator ∆ = ∆_0, which is the key first step to supporting the conjecture. The spectral exterior calculus (SEC) [6] provides a construction which lifts these results to general ∆_k, and explicitly proves spectral convergence for ∆_1, which is the most challenging theoretical barrier. The only remaining challenge in lifting the result to ∆_k for all k is simply the problem of formulating the explicit construction.

The theorem and conjecture above have practical ramifications since (as shown in the previous section) a consistent graph construction will have the same limiting topology as the underlying manifold. In Examples 5 and 6 we will empirically illustrate the consistency of the CkNN choice ρ = q^{−1/m}, as well as the failure of alternative constructions.


As a side note, we mention that the unnormalized graph Laplacian is not the only graph Laplacian. With the same notation as above, the “normalized”, or “random-walk”, graph Laplacian is often defined as L_rw = I − D^{−1}W = D^{−1}L_un, and it has the limiting operator

c^{−1} L_rw ≡ c^{−1} D^{−1} L_un →_{N→∞} ρ² ( ∆ − ∇log(q² ρ^{m+2}) · ∇ ) = q^{4/(m−2)} ρ^{4m/(m−2)} ∆_g̃

(see for example [44, 4]; the constant c is different from the unnormalized case). Note that again ρ = q^{−1/m} is the unique choice leading to a Laplace-Beltrami operator. This choice implies that to leading order D f⃗ ≈ q ρ^m f = f, so the corresponding Hilbert space has norm f⃗^⊤ D f⃗ →_{N→∞} ⟨f, f⟩_{dṼ}. This implies spectral consistency, since L_rw f⃗ = λ f⃗ is equivalent to L_un f⃗ = λ D f⃗, which is related to the functional

Λ(f) = (f⃗^⊤ c^{−1} L_un f⃗) / (f⃗^⊤ D f⃗) →_{N→∞} ⟨f, ∆_g̃ f⟩_{dṼ} / ⟨f, f⟩_{dṼ}.

Therefore, for the choice ρ = q^{−1/m}, both the unnormalized and normalized graph Laplacians are consistent with the same underlying Laplace-Beltrami operator with respect to the metric g̃.
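As a quick consistency check (our verification, not a computation from the paper): with ρ = q^{−1/m} one has q²ρ^{m+2} = q^{(m−2)/m} and ρ² = q^{−2/m}, so

\[
\rho^{2}\Big(\Delta - \nabla\log(q^{2}\rho^{m+2})\cdot\nabla\Big)
= q^{-2/m}\Big(\Delta - \tfrac{m-2}{m}\,\nabla\log q\cdot\nabla\Big)
= \Delta_{\tilde g},
\]

which is exactly (7) with ϕ = q^{2/m}.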

We emphasize that we are not using either graph Laplacian directly for computations. Instead, we are using the convergence of the graph Laplacian to show convergence of the graph connected components to those of the underlying manifold. Since this consistency holds for an unweighted graph construction, we can make use of more computationally efficient methods to find the topology of the graph, such as depth-first search to compute the zero-level homology. More generally, we compute the higher-order homology from the VR complex of the graph, which our conjecture suggests should converge to that of the underlying manifold. A wider class of geometries is accessible via weighted graph constructions (see for example [12, 4, 5]), but fast algorithms for analyzing graph topology only apply to unweighted graphs.

5. Further applications to topological data analysis. The fundamental idea of extracting topology from a point cloud by building unweighted graphs relies on determining what is considered an edge in the graph as a function of a parameter, and then considering the graph as a VR complex. In the ε-ball, kNN and CkNN procedures, edges are added as a parameter is increased, from no edges for extremely small values of the parameter to full connectivity for sufficiently large values. From this point of view, the procedures differ mainly by the order in which the edges are added.

Classical persistence orders the addition of possible edges by ||x − y||, whereas the CkNN orders the edges by ||x − y|| / √(||x − x_k|| ||y − y_k||). More generally, a multi-scale graph construction with bandwidth function ρ(x) orders the edges by ||x − y|| / √(ρ(x) ρ(y)). Our claim is that CkNN gives an order that allows graph consistency to be proved. In addition, we have seen in Figs. 5 and 6 that the CkNN ordering is more efficient. In this section we show further examples illustrating this fact. We will quantify the persistence or stability of a feature by the percentage of edges (out of the total N(N − 1)/2 possible in the given ordering) for which the feature persists. This measure is an objective way to compare different orderings of the possible edges.
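The orderings just described are straightforward to compute; a brief sketch (ours, brute force for clarity rather than efficiency):

import numpy as np
from itertools import combinations

def cknn_edge_ordering(X, k=10):
    # Sort all N(N-1)/2 edges by ||x - y|| / sqrt(d(x, x_k) d(y, y_k)), smallest first.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.sort(D, axis=1)[:, k]                      # k-th nearest neighbor distances
    pairs = list(combinations(range(len(X)), 2))
    key = [D[i, j] / np.sqrt(rho[i] * rho[j]) for i, j in pairs]
    return [pairs[idx] for idx in np.argsort(key)]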

The consistent homology approach differs from the persistent homology approach by using a single graph construction to simultaneously represent all the topological features of the underlying manifold. This requires selecting the parameter δ which determines the CkNN graph. The asymptotically optimal choice of δ in terms of the number of data points is derived in Sec. 6; however, the constants depend on the geometry of the unknown manifold. There are many existing methods of tuning δ for learning the geometry of data [14, 4]. As a practical method of tuning δ for topological data analysis, we can use the classical persistence diagram to find the longest range of δ values such that all of the homological generators do not change. In a sense we are using the classical persistence diagram in reverse, looking for a single value of δ where all the homology classes are stable. In Examples 5 and 6 below we validate this approach by showing that the percentage of edges which capture the true homology is largest when using the ordering defined by ρ = q^{−1/m}, which is equivalent to the CkNN.

5.1. A fast graph-based clustering algorithm. We consider the problem of identifying the connected components of a manifold from a data set, using the connected components of the CkNN graph construction. While clustering connected components is generally less difficult than segmenting a connected domain, outliers can easily confuse many clustering algorithms. Many rigorous methods, including any results based on existing kernel methods [27, 49, 48, 29], require the sampling density to be bounded away from zero. In other words, rigorous clustering algorithms require the underlying manifold to be compact. A common work-around for this problem is to estimate the density of the data points and then remove points of low density. However, this leaves the removed points unclustered [10, 9, 41, 28]. We have shown that the CkNN method is applicable to a wide class of non-compact manifolds, and in particular the connected components of the CkNN graph will converge to the connected components of the underlying manifold.

Here we use the CkNN to put an ordering on the potential edges of the graph. While the full persistent homology of a large data set can be computationally very expensive, the 0-homology is easily accessible using fast graph-theoretic algorithms. First, the connected components of a graph can be quickly identified by the depth-first search algorithm. Second, unlike the other homology classes, the 0-homology is monotonic; as edges are added, the number of connected components can only decrease or stay the same. This monotonicity allows us to easily identify the entire 0-homology δ-sequence by only finding the transitions, meaning the numbers of edges where the 0-th Betti number changes. We can quickly identify these transitions using a binary search algorithm, as outlined below.

Algorithm 1. Fast Binary Search Clustering.

Inputs: Ordering of the N(N − 1)/2 possible edges; number of clusters C̃ > 1.
Outputs: Number of edges L such that the graph has C̃ components and adding an edge yields C̃ − 1 components.

1. Initialize the endpoints L = 0 and R = N(N − 1)/2.
2. While L < R − 1:
   (a) Set M = floor((L + R)/2).
   (b) Build a graph using the first M edges from the ordering.
   (c) Use depth-first search to find the number of components C.
   (d) If C ≥ C̃, set L = M; otherwise set R = M.
3. Return L.
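Below is a sketch of Algorithm 1 in Python (our own illustration; scipy's connected_components stands in for the depth-first search, and the edge ordering could come from the cknn_edge_ordering sketch above):

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def binary_search_clustering(edges, n_points, n_clusters):
    # Largest number of edges L (from the given ordering) such that the graph
    # on the first L edges still has at least n_clusters components.
    def n_components(m):
        rows = [i for i, _ in edges[:m]]
        cols = [j for _, j in edges[:m]]
        W = csr_matrix((np.ones(m), (rows, cols)), shape=(n_points, n_points))
        return connected_components(W, directed=False)[0]

    L, R = 0, len(edges)
    while L < R - 1:
        M = (L + R) // 2
        if n_components(M) >= n_clusters:
            L = M
        else:
            R = M
    return L

For example, binary_search_clustering(cknn_edge_ordering(X), len(X), 3) would recover the three-cluster graph of the spiral example below; each probe costs one connected-components pass, so the search needs only O(log N) such passes.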


Figure 7. The fast clustering algorithm applied to a set of three spirals with densities which decay exponentially along the length of the spiral. (a) 1000 total points sampled from three 2-dimensional spirals in the plane. (b) Diagram showing the number of clusters as a function of the percentage of possible edges added to the graph (a unitless measure of persistence). (c) Graph corresponding to the maximum number of edges with 3 components, colored according to identification by depth-first search. (d) 2000 total points sampled from three 3-dimensional spirals in R3. (e) A large interval identifies the correct number of clusters. (f) Graph with the maximum number of edges having 3 components, colored by the clustering algorithm.

When the goal is to find all of the transition points, Algorithm 1 can easily be improved by storing all the numbers of clusters from previous computations and using these to find the best available left and right endpoints for the binary search.

Example 4. In Fig. 7, we illustrate the use of Algorithm 1 on a point set consisting of the union of three spiral-shaped subsets with nonuniform sampling. In fact, the density of points falls off exponentially in the radial direction. Fig. 7(a) shows the original set, and panel (b) shows the number of components as a function of the proportion of edges. When the number of edges is between one and two percent of the possible pairs of points, the persistence diagram in (b) detects three components, shown in (c) along with the edges needed. A three-dimensional version of three spiral-shaped subsets is depicted in Fig. 7(d)-(f), with similar results.

In the next two examples, we illustrate the theoretical result that the choice of β = −1/m in the bandwidth function ρ = q^β is optimal, where m is the dimension of the data.

Example 5. We begin with the zero-order homology. We will demonstrate empirically that the choice β = −1/m maximizes the persistence of the correct clustering for 1 ≤ m ≤ 4.


Figure 8. Persistence of the clustering of a cut-Gaussian distribution using the multi-scale graph construction with ρ = q^β, where the distribution is in dimension (a) 1, (b) 2, (c) 3, (d) 4, for N = 500, 1000, 2000 points. Persistence is measured as a percentage of the total number of possible edges in the graph, N(N − 1)/2, as a function of β. Each data point represents an average over 500 data sets; each data set starts with sufficiently many points randomly sampled from an m-dimensional Gaussian so that after points in the gap region are rejected there are N points remaining. The correct homology (two connected components) is most persistent when β is near −1/m, implying the optimal multi-scale graph construction is the CkNN construction where ρ ∝ q^{−1/m}.

Consider data sampled from an m-dimensional Gaussian distribution with a gap of radial width w = 0.1^{1/m} centered at radius w + 3m/10. (The dependence on the dimension m is necessary to ensure that there are two connected components for small data sets.) The radial gap separates R^m into two connected components: the compact interior m-ball, and the non-compact shell extending to infinity with density decaying exponentially to zero.

Given a data set sampled from this density, we can construct a graph using the multi-scale graph construction, which connects two points x, y if ||x − y|| < δ√(ρ(x)ρ(y)). Since the true density is known, we consider the bandwidth functions ρ = q^β for β ∈ [−3/2, −1/8].
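The construction itself is straightforward to sketch in code. The snippet below is a minimal illustration, not the exact experimental setup: the gap placement follows the description above, the sample size and the value δ = 0.5 are arbitrary illustrative choices, and the unnormalized density exp(−||x||^2/2) is used since a constant factor in q is absorbed into δ.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def multiscale_graph(X, q, beta, delta):
    # Connect x_i and x_j when ||x_i - x_j|| < delta * sqrt(rho_i * rho_j),
    # with the bandwidth function rho = q**beta at the sample points.
    rho = q ** beta
    D = squareform(pdist(X))
    A = D < delta * np.sqrt(np.outer(rho, rho))
    np.fill_diagonal(A, False)
    return csr_matrix(A)

# Cut 2-dimensional Gaussian: reject points in a radial gap of width w.
rng = np.random.default_rng(0)
m = 2
w = 0.1 ** (1 / m)
center = w + 3 * m / 10
X = rng.standard_normal((4000, m))
r = np.linalg.norm(X, axis=1)
X = X[np.abs(r - center) > w / 2][:1000]          # keep N = 1000 points
q = np.exp(-np.linalg.norm(X, axis=1) ** 2 / 2)   # known sampling density
A = multiscale_graph(X, q, beta=-1 / m, delta=0.5)
print(connected_components(A, directed=False)[0])  # ideally 2 components
```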


Figure 9. Persistence of the true homology using the multi-scale graph construction with ρ = q^β. The underlying data sets are (a) a cut 1-dimensional Gaussian embedded in the plane with a loop introduced, and (b) the same 2-dimensional cut Gaussian density as in Fig. 8(b). Persistence is measured as a percentage of the total number of possible edges in the graph, N(N − 1)/2, as a function of β. Each data point represents an average over 200 data sets. The correct homology (β_0 = 2 and β_1 = 1 for both (a) and (b)) is most persistent when β is near −1/m, implying the optimal multi-scale graph construction is the CkNN construction, where ρ ∝ q^{−1/m}.

For each value of β we used the fast clustering algorithm to identify the minimum and maximum numbers of edges which would identify the correct clusters. We measured the persistence of the correct clustering as the difference between the minimum and maximum numbers of edges which identified the correct clusters, divided by the total number of possible edges N(N − 1)/2. We then repeated this experiment for 500 random samples of the distribution and averaged the persistence of the correct clustering for each value of β. The results are shown in Fig. 8. Notice that for each dimension m = 1, ..., 4 the persistence has a distinctive peak centered near β = −1/m, which indicates that the true clustering is the most persistent using the multi-scale graph construction that is equivalent to the CkNN.

Example 6. Next, we examine the discovery of the full homology for a 1-dimensional and a 2-dimensional example using JavaPlex [42]. To obtain a one-dimensional example with two connected components, we generated a set of points from a standard Gaussian on the t-axis with points with 0.4 < t < 0.8 removed, and then mapped these points into the plane via t ↦ (t^3 − t, 1/(t^2 + 1))^⊤. The embedding induces a loop, and so there is non-trivial 1-homology in the 1-dimensional example. The correct homology for this example has Betti numbers β_0 = 2 and β_1 = 1, which is exactly the same as the true homology for the 2-dimensional cut Gaussian from Fig. 8, which will be our 2-dimensional example. In Fig. 9 we show the persistence of the correct homology in terms of the percentage of edges as a function of the parameter β that defines the multi-scale graph construction. As with the clustering example, the results clearly show that the correct homology is most persistent when β is near −1/m, which corresponds to the CkNN graph construction.
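Generating the one-dimensional data set is simple enough to spell out; the sketch below (with an arbitrary sample size and seed) produces the point cloud whose CkNN graph should recover β_0 = 2 and β_1 = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.standard_normal(2000)
t = t[(t < 0.4) | (t > 0.8)]   # cut Gaussian: remove the gap 0.4 < t < 0.8
# Embed into the plane; t = 1 and t = -1 land on the same point,
# so the image has a loop in addition to its two components.
X = np.column_stack((t**3 - t, 1.0 / (t**2 + 1)))
```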

5.2. Identifying patterns in images with homology. In this section we consider the identification of periodic patterns or textures from image data.


Figure 10. (a) Four simple patterns whose sub-image orbifolds exhibit nontrivial homology. The true values of β_1 for the four patterns are 1, 2, 2, and 3, respectively. (b) Using the fixed ε-ball graph construction and selecting ε from the region where the homology generators span the largest proportion L of total edges without changing (the most persistent homology). (c) Using the CkNN construction and selecting δ from the region where the homology generators span the largest L without changing. The homology was computed with JavaPlex [42].

We take a topological approach to the problem, and attempt to classify the orbifold (the quotient of the plane by the group of symmetries) by its topological signature. Note that to achieve this, we will not need to learn the symmetry group, but will directly analyze the orbifold by processing the point cloud of small s × s pixel subimages of the complete image in R^{s^2}, without regard to the original location of the subimages.

Example 7. In Fig. 10(a) we show four simple patterns that can be distinguished by homology. To make the problem more difficult, the patterns are corrupted by a 'brightness' gradient which makes identifying the correct homology difficult.


From left to right, the patterns are: first, stripes have a single periodicity, so that β_1 = 1; second, a pattern that is periodic in both the vertical and horizontal directions, implying β_1 = 2; third, a checkerboard pattern also has only two periodicities, β_1 = 2, but with different periods than the previous pattern; fourth, a hexagonal pattern has 3 periodicities, so that β_1 = 3. To see the three periodicities in the fourth pattern, notice that the pattern repeats when moving right two blocks, or down three blocks, or right one block and down two blocks; each of these periodicities yields a distinct homology class.

To identify the pattern in each image, we cut each full image into 9-by-9 subimages, yielding 121 points in R^81. In Fig. 10(b) we show the results of applying the fixed ε-ball graph construction to each of the four sets of subimages. In order to choose ε we used JavaPlex [42] to compute the persistent homology and then chose the region with the longest persistence (meaning the region of ε where the homology went the longest without changing). In the title of each plot we show the first two Betti numbers for the graph constructed with this value of ε; we also show the length of the persistence in terms of the percentage of edges for which the homology is unchanged. In Fig. 10(c) we repeated this experiment using the CkNN construction, choosing δ from the region with the longest unchanging homology. In this set of examples, the CkNN construction is more efficient, and finds the correct orbifold homology.
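Extracting the subimage point cloud is the only preprocessing step; a minimal sketch follows, assuming a square grayscale image stored as a 2-d array (a 99 × 99 image with s = 9 and a stride of 9 gives the 121 points in R^81 used above).

```python
import numpy as np

def subimage_cloud(img, s=9, stride=9):
    # Collect every s-by-s subimage (flattened to a vector in R^(s*s)),
    # stepping by `stride` pixels in each direction.
    H, W = img.shape
    return np.array([img[r:r + s, c:c + s].ravel()
                     for r in range(0, H - s + 1, stride)
                     for c in range(0, W - s + 1, stride)])
```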

When the 'brightness' gradient is removed, both the fixed ε-ball and CkNN constructions identify the correct homology in the most persistent region. However, the 'brightness' gradient means that the patterns do not exactly meet (see the leftmost panels in Figs. 10(b,c)). For the simple stripe pattern, the ε-ball construction can still bridge the gap and identify the correct homology; however, for the more complex patterns, the ε-ball construction finds many spurious homology classes which obscure the correct homology.

Example 8. We applied the CkNN graph construction to identify patterns in real images of zebra stripes and fish scales in Figure 11. The images shown in Fig. 11 were taken from larger images [17, 24]. In order to analyze the subimage spaces, we first decimated the images to reduce the resolution (by a factor of 2 for the stripes and a factor of 25 for the scales), in each case to yield a 40 × 40 pixel image. We then formed the set of all 23-pixel by 23-pixel subimages, shifting by two pixels in the vertical and horizontal directions, to obtain 136 subimages, considered as points in R^529. We built the rescaled distance matrix

||x − y|| / √( ||x − x_k|| ||y − y_k|| )

using k = 5. In Fig. 11(b,e) we show the persistent homology of the VR complex associated to the respective distance matrices in terms of the parameter δ of the CkNN. Using this diagram, we chose the maximal interval of δ for which the homology was stable. In Fig. 11(c,f) we show the CkNN graph for δ chosen from this region, along with the correct Betti numbers.
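The rescaled distance matrix is a one-liner given the pairwise distances; the following sketch computes it from an arbitrary point cloud (the choice k = 5 matches the example above).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def cknn_rescaled_distances(X, k=5):
    # ||x - y|| / sqrt(||x - x_k|| ||y - y_k||), where x_k denotes the
    # k-th nearest neighbor of x (column 0 after sorting is the point itself).
    D = squareform(pdist(X))
    dk = np.sort(D, axis=1)[:, k]
    return D / np.sqrt(np.outer(dk, dk))

# The CkNN graph at scale delta keeps exactly the pairs whose
# rescaled distance is below delta.
```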

6. Convergence of graph Laplacians. We approach the problem of consistent graph representations of manifolds by assuming we have data points which are sampled from a smooth probability distribution defined on the manifold. We view this assumption as establishing a "geometric prior" for the problem. Our main goal is to approximate the Laplace-Beltrami operator on the manifold, independent of the sampling distribution, and using as little data as possible. This is a natural extension of ideas developed by [1] and Coifman and collaborators [12, 32, 13, 31, 30].


Figure 11. (a) Image of zebra stripes [17]. (b) Persistence diagram for the space of subimages. (c) The CkNN graph construction on the subimage space with δ chosen from the longest region where the homology is constant (β_0 = 1, β_1 = 1, L = 31.5%). (d) Image of fish scales [24]. (e) Persistence diagram. (f) The CkNN graph construction on the PCA projection of the subimage space with δ chosen from the longest region where the homology is constant (β_0 = 1, β_1 = 2, L = 18.3%). The homology was computed with JavaPlex [42].

The proof of consistency for any graph construction has three parts, two statistical and one analytic: (1) showing that the (discrete) graph Laplacian is a consistent statistical estimator of a (continuous) integral operator (either pointwise or spectrally), (2) showing that this estimator has finite variance, and (3) an asymptotic expansion of the integral operator that reveals the Laplace-Beltrami operator as the leading order term. The theory of convergence of kernel-weighted graph Laplacians to their continuous counterparts was initiated with [1], which proved parts (1) and (3) for uniform sampling on compact manifolds, and part (2) was later completed in [40]. Parts (1) and (3) were then extended to non-uniform sampling in [12], and part (2) was completed in [4]. The extension to noncompact manifolds was similarly divided. First, [20] provided the proof of parts (1) and (3), and introduced the necessary additional geometric assumptions which were required for the asymptotic analysis on non-compact manifolds. However, [4] showed that the pointwise errors on non-compact manifolds could be unbounded, and so additional restrictions had to be imposed on the kernel used to construct the graph Laplacian. In order to construct the desired operators on non-compact manifolds, [4] showed that variable bandwidth kernels were required (part (1) for variable bandwidth kernels was previously achieved in [44]). In all of this previous work, parts (1) and (2) are always proven pointwise, despite the fact that most applications require spectral convergence.


The geometric prior assumes that the set of points that have positive sampling density is a smooth manifold M ≡ {x ∈ R^n : q(x) > 0}, where q is a smooth sampling density. (This is a natural definition since regions of zero density will not be observed in data sets.) Some weak assumptions on the manifold M are required. The theory of [1, 12] assumes that the manifold is compact, implying that the density q must be bounded away from zero on M. The theory of [20, 21, 22] showed that this assumption could be relaxed, and together with the statistical analysis in [4] allows a large class of noncompact manifolds with densities that are not bounded away from zero. (Note that since q is continuous, if M is compact the minimum value of q cannot be zero, since it is attained on M, which is defined to have nonzero density.) In this article we require M to have injectivity radius bounded below and curvature (intrinsic and extrinsic) bounded above; these technical assumptions hold for all compact manifolds and were introduced to allow application to noncompact manifolds in [20].

Assumption 1. The data points are sampled from a density q : R^n → [0, ∞) such that M ≡ {x ∈ R^n : q(x) > 0} is a C^3 Riemannian manifold with finitely many connected components, and q restricted to M is a C^3 function. The manifold M inherits a metric g from the ambient space, and the conformal metric q^{2/m} g has injectivity radius bounded away from zero and curvature and second fundamental form bounded above. (Note that M may be noncompact and may have a boundary.)

There are several algorithms for estimating the Laplace-Beltrami operator; however, the particularly powerful construction in [4] is currently the only estimator which allows the sampling density q to be arbitrarily close to zero. Since we are interested in addressing the problem of non-uniform sampling, we will apply the result of [4] for variable bandwidth kernels. However, the method of [4] used a special weighted graph Laplacian in order to approximate the Laplace-Beltrami operator. The weighted graph Laplacian uses additional normalizations which were first introduced in the diffusion maps algorithm [12] in order to counteract the effect of the sampling density. The goal of this paper is to use an unweighted graph Laplacian to approximate the Laplace-Beltrami operator, since this allows us to compute the topology of the graph using fast combinatorial algorithms. In fact, the unweighted graph Laplacian will converge to the Laplace-Beltrami operator of the embedded manifold in the special case of uniform sampling. The power of our new graph construction is that we can recover a Laplace-Beltrami operator for the manifold from an unweighted graph Laplacian, even when the sampling is not uniform.

Let q(x) represent the sampling density of a data set {x_i}_{i=1}^N ⊂ M ⊂ R^n. We combine a global scaling parameter δ with a local scaling function ρ(x) to define the combined bandwidth δρ(x). Consider the symmetric variable bandwidth kernel

W_δ(x, y) = h( ||x − y||^2 / (δ^2 ρ(x)ρ(y)) )     (10)

for any shape function h : [0, ∞) → [0, ∞) that has exponential decay. For a function f and any point x ∈ M we can form a Monte-Carlo estimate of the integral operator given by

E[ (1/N) Σ_{j=1}^N W_δ(x, x_j) f(x_j) ] = ∫_M W_δ(x, y) f(y) q(y) dV(y).     (11)


From [4] the expression in (11) has the asymptotic expansion

δ^{−m} ∫_M W_δ(x, y) f(y) q(y) dV(y) = m_0 f q ρ^m + δ^2 (m_2 ρ^{m+2}/2) [ ω f q ρ^{−2} + L(fq) ] + O(δ^4)     (12)

where

L h ≡ −∆h + (m + 2) ∇log(ρ) · ∇h,

∆ is the positive definite Laplace-Beltrami operator, and ∇ is the gradient operator, both with respect to the Riemannian metric that M inherits from the ambient space R^n. (Note that in [4] the expansion (12) is written with respect to the negative definite Laplace-Beltrami operator, whereas here we consider the positive definite version.) In the right-hand side of (12), all functions are evaluated at x. The function ω = ω(x) depends on the shape function h and the curvature of the manifold at x.

6.1. Pointwise and integrated bias and variance. The standard graph Laplacian construction starts with a symmetric affinity matrix W whose ij entry quantifies the similarity between nodes x_i and x_j. We assume in the following that W = W_δ from (10), which includes the CkNN construction as a special case. Define the diagonal normalization matrix D as the row sums of W, i.e. D_ii = Σ_j W_ij. The unnormalized graph Laplacian matrix is then

L_un = D − W.     (13)

We interpret this matrix as the discrete analog of an operator. Given a function f, by forming a vector f_j = f(x_j) the matrix-vector product is

(L_un f)_i = f_i D_ii − Σ_{j=1}^N W_ij f_j = Σ_{j≠i} W_ij (f_i − f_j)     (14)

so that L_un f is also a vector that represents a function on the manifold. Theorem 3 below makes this connection rigorous by showing that for an appropriate factor c that depends on N and δ, the vector c^{−1}(L_un f)_i is a consistent statistical estimator of the differential operator

L_{q,ρ} f ≡ ρ^{m+2} [ f L q − L(fq) ] = q ρ^{m+2} [ ∆f − ∇log(q^2 ρ^{m+2}) · ∇f ]     (15)

where all functions are evaluated at the point x_i. The last equality in (15) follows from the definition of L and applying the product rules for the positive definite Laplacian, ∆(fg) = f∆g + g∆f − 2∇f · ∇g, and the gradient, ∇(fg) = f∇g + g∇f. Theorem 3 shows that L_un is a pointwise consistent statistical estimator of a differential operator. In fact, consistency was first shown in [44] and the bias of this estimator was first computed in [4]. Here we include the variance of this estimator as well.
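As a quick illustration of (13) and (14), the following sketch builds L_un from an arbitrary symmetric affinity matrix and numerically checks the identity (L_un f)_i = Σ_{j≠i} W_ij (f_i − f_j).

```python
import numpy as np

def unnormalized_laplacian(W):
    # L_un = D - W, with D the diagonal matrix of row sums of W.
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(3)
W = rng.random((6, 6))
W = (W + W.T) / 2          # symmetric affinities
np.fill_diagonal(W, 0.0)
f = rng.random(6)
L = unnormalized_laplacian(W)
check = np.array([np.sum(W[i] * (f[i] - f)) for i in range(6)])
assert np.allclose(L @ f, check)   # identity (14)
```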

Theorem 3 (Pointwise bias and variance of L_un). Let M be an m-dimensional Riemannian manifold embedded in R^n, let q : M → R be a smooth density function, and define L_un as in (13). For {x_i}_{i=1}^N independent samples of q, and f ∈ C^3(M), we have

E[ c^{−1} (L_un f)_i ] = L_{q,ρ} f(x_i) + O(δ^2)     (16)


and

var[ c^{−1} (L_un f)_i ] = a (δ^{−m−2}/(N − 1)) ρ(x_i)^{m+2} q(x_i) ||∇f(x_i)||^2 + O(N^{−1}, δ^2)     (17)

where c ≡ (m_2/2)(N − 1)δ^{m+2} and a ≡ 4 m_{2,2}/m_2^2 are constants defined in the table below.

Before proving Theorem 3, we comment on the constants such as m_0, m_2 and m_{2,2}, which depend on the shape of the kernel function. It was originally shown in [12] that the expansion (12) holds for any kernel with exponential decay at infinity. A common kernel used in geometric applications is the Gaussian h(||z||^2) = exp(−||z||^2/2), due to its smoothness. For topological applications, we are interested in building an unnormalized graph, whose construction corresponds to a kernel defined by the indicator function 1_{||z||^2 < 1}. The sharp cutoff function enforces the fact that each pair of points is either connected or not. (The indicator kernel satisfies (12) since it has compact support and thus has exponential decay.)

In the table below we give formulas for all the constants which appear in the results, along with their values for the Gaussian and indicator functions (note that B^m is the unit ball in R^m).

Constant | Formula                        | h(x) = e^{−x/2}  | h(x) = 1_{x<1}
m_0      | ∫_{R^m} h(||z||^2) dz          | (2π)^{m/2}       | vol(B^m)
m_2      | ∫_{R^m} z_1^2 h(||z||^2) dz    | (2π)^{m/2}       | (m + 2)^{−1} vol(B^m)
m_{2,2}  | ∫_{R^m} z_1^2 h(||z||^2)^2 dz  | 2^{−1} π^{m/2}   | (m + 2)^{−1} vol(B^m)
a        | 4 m_{2,2} (m_2)^{−2}           | 2^{1−m} π^{−m/2} | 4(m + 2) vol(B^m)^{−1}

It turns out that the variances of the statistical estimators considered below are all proportional to the constant a. As a function of the intrinsic dimension m, the constant a decreases exponentially for the Gaussian kernel. On the other hand, since the volume of a unit ball decays like m^{−m/2−1/2} for large m, the constant a increases exponentially for the indicator kernel.
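The indicator-kernel column of the table is easy to evaluate numerically; the sketch below computes vol(B^m) in closed form and prints how quickly the variance constant a grows with the dimension.

```python
import numpy as np
from scipy.special import gamma

def indicator_constants(m):
    # Table values for the cutoff kernel h(x) = 1_{x<1}.
    vol_Bm = np.pi ** (m / 2) / gamma(m / 2 + 1)   # volume of the unit m-ball
    m0 = vol_Bm
    m2 = m22 = vol_Bm / (m + 2)
    a = 4 * m22 / m2 ** 2                          # = 4(m + 2)/vol(B^m)
    return m0, m2, m22, a

for m in (1, 2, 4, 8, 16):
    print(m, indicator_constants(m)[3])            # a grows rapidly with m
```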

Proof of Theorem 3. Notice that the term i = j is zero and can be left out of the summation; this allows us to consider the expectation of L_un f only over the terms i ≠ j, which are identically distributed. From (12) the i-th entry of the vector L_un f has expected value

E[ (L_un f)_i ] = (m_2/2)(N − 1)(δρ)^{m+2} ( f L q − L(fq) ) + O(Nδ^{m+4})

and by (15) we have L_{q,ρ} f ≡ ρ^{m+2} ( f L q − L(fq) ). Dividing by the constant c yields (16), which shows that L_un is a pointwise consistent estimator with bias of order δ^2.

We can also compute the variance of this estimator, defined as

var( c^{−1}(L_un f)_i ) ≡ E[ ( c^{−1}(L_un f)_i − E[ c^{−1}(L_un f)_i ] )^2 ]
= c^{−2} E[ ( Σ_{j≠i} ( f_i W_ij − W_ij f_j − E[ f_i W_ij − W_ij f_j ] ) )^2 ].     (18)


Since the x_j are independent, we have

var( c^{−1}(L_un f)_i ) = c^{−2}(N − 1) E[ ( f_i W_ij − W_ij f_j − E[ f_i W_ij − W_ij f_j ] )^2 ]
= c^{−2}(N − 1) E[ ( f_i W_ij − W_ij f_j )^2 ] − c^{−2}(N − 1) ( E[ f_i W_ij − W_ij f_j ] )^2.

Notice that for the last term we have

c^{−2}(N − 1) ( E[ f_i W_ij − W_ij f_j ] )^2 = (1/(N − 1)) ( E[ ((N − 1)/c) ( f_i W_ij − W_ij f_j ) ] )^2     (19)
= (1/(N − 1)) ( L_{q,ρ} f(x_i) )^2 + O(δ^2);

this term will be higher order, so we summarize it as O(N^{−1}, δ^2). Computing the remaining term, we find the variance to be

var( c^{−1}(L_un f)_i ) = c^{−2}(N − 1) E[ ( f_i W_ij − W_ij f_j )^2 ] + O(N^{−1}, δ^2)
= c^{−2}(N − 1) E[ f_i^2 W_ij^2 − 2 f_i W_ij^2 f_j + W_ij^2 f_j^2 ] + O(N^{−1}, δ^2).     (20)

Since W_ij^2 is also a local kernel, with moments m_{0,2} and m_{2,2}, the above asymptotic expansions apply and we find

E[ f_i^2 W_ij^2 − 2 f_i W_ij^2 f_j + W_ij^2 f_j^2 ] = (m_{2,2}/2)(δρ)^{m+2} ( f^2 L q − 2 f L(fq) + L(f^2 q) ) + O(δ^{m+4})
= m_{2,2} (δρ)^{m+2} q ||∇f||^2 + O(δ^{m+4}),

where the last equality follows from applying the product rule. Substituting the previous expression into (20) verifies (17).

Theorem 3 gives a complete description of the pointwise convergence of the discrete operator L_un. While the expansion (16) shows that the matrix c^{−1}L_un approximates the operator L_{q,ρ}, it does not tell us how the eigenvectors of c^{−1}L_un approximate the eigenfunctions of L_{q,ρ}. That relationship is the subject of Theorem 4 below.

Since c^{−1}L_un is a symmetric matrix, we can interpret the eigenvectors f as the sequential orthogonal minimizers of the functional

Λ(f) = ( f^⊤ c^{−1} L_un f ) / ( f^⊤ f ).     (21)

By dividing the numerator and denominator by N, we can interpret the inner product N^{−1} f^⊤ f as an integral over the manifold, since

E[ N^{−1} f^⊤ f ] = E[ (1/N) Σ_{i=1}^N f(x_i)^2 ] = ∫_M f(x)^2 q(x) dV(x) = ⟨f^2, q⟩_{dV}.

It is easy to see that the above estimator has variance N^{−1} [ ⟨f^4, q⟩_{dV} − ⟨f^2, q⟩_{dV}^2 ]. Similarly, in Theorem 4 we will interpret (1/N) f^⊤ c^{−1} L_un f as approximating an integral over the manifold.


Theorem 4 (Integrated bias and variance of L_un). Let M be an m-dimensional Riemannian manifold embedded in R^n and let q : M → R be a smooth density function. For {x_i}_{i=1}^N independent samples of q, and f ∈ C^3(M), we have

E[ (cN)^{−1} f^⊤ L_un f ] = ⟨f, q L_{q,ρ} f⟩_{dV} + O(δ^2)     (22)

and

var( (cN)^{−1} f^⊤ L_un f ) = (a δ^{−m−2}/(N(N − 1))) ⟨f^2, q L_{q,ρ}(f^2)⟩_{dV} + (4/N) ⟨f^2, q (L_{q,ρ} f)^2⟩_{dV} + O(δ^2).     (23)

So, assuming the inner products are finite (for example if M is a compact manifold), we have var( (cN)^{−1} f^⊤ L_un f ) = O(N^{−2} δ^{−m−2}, N^{−1}, δ^2).

Proof. By definition we have

E[ (c^{−1}/N) f^⊤ L_un f ] = E[ (c^{−1}/N) Σ_{i≠j} (L_un)_{ij} f_i f_j ] = E[ f(x_i) c^{−1} (L_un f)_i ],

where the term i = j is zero, and so the sum is over the N(N − 1) terms where i ≠ j. Since the x_i are sampled according to the density q, we have

E[ (c^{−1}/N) f^⊤ L_un f ] = ∫ f q^2 ρ^{m+2} ( ∆f − ∇log(q^2 ρ^{m+2}) · ∇f ) dV + O(δ^2)
= ⟨ f, q^2 ρ^{m+2} ( ∆f − ∇log(q^2 ρ^{m+2}) · ∇f ) ⟩_{L^2(M,dV)} + O(δ^2)
= ⟨f, q L_{q,ρ} f⟩_{dV} + O(δ^2).     (24)

We can now compute the variance of the estimator (c^{−1}/N) f^⊤ L_un f, which estimates ⟨f, q L_{q,ρ} f⟩_{dV}. To find the variance we need to compute

E[ ( (c^{−1}/N) f^⊤ L_un f )^2 ] = (c^{−2}/N^2) E[ ( Σ_{i≠j} (L_un)_{ij} f_i f_j ) ( Σ_{k≠l} (L_un)_{kl} f_k f_l ) ].     (25)

Notice that when i, j, k, l are all distinct, by independence we can rewrite these terms of (25) as

(a_1/N^2) E[ Σ_{i≠j} c^{−1} (L_un)_{ij} f_i f_j ] E[ Σ_{k≠l} c^{−1} (L_un)_{kl} f_k f_l ] = a_1 ⟨f, q L_{q,ρ} f⟩_{dV}^2.     (26)

The constant a_1 ≡ (N − 2)(N − 3)/(N(N − 1)) accounts for the fact that of the N^2(N − 1)^2 total terms in (25), only N(N − 1)(N − 2)(N − 3) terms have distinct indices. Since i ≠ j and k ≠ l, we next consider the terms where either i ∈ {k, l} or j ∈ {k, l} but not both. Using the symmetry of L_un, by changing index names we can rewrite all four


combinations as i = k, so that these terms of (25) can be written as

4 E_{x_i}[ (1/N^2) Σ_i E_{x_j}[ Σ_j c^{−1} (L_un)_{ij} f_i f_j ] E_{x_l}[ Σ_l c^{−1} (L_un)_{il} f_i f_l ] ]
= (4/N^2) E_{x_i}[ Σ_i f(x_i)^2 ( L_{q,ρ} f(x_i) )^2 ]
= (4/N) ⟨f^2, q (L_{q,ρ} f)^2⟩_{dV}.     (27)

Finally, we consider the terms where i ∈ {k, l} and j ∈ {k, l}. By symmetry we rewrite the two possibilities as i = k and j = l, and these terms become

(2c^{−2}/N^2) E[ Σ_{i≠j} ( (L_un)_{ij} f_i f_j )^2 ] = (2 a_2 c^{−1}/N) ⟨f^2, q L_{q,ρ}(f^2)⟩_{dV},     (28)

where the constant a_2 ≡ m_{2,2}/m_2 is the ratio between the second moment m_2 of the kernel W and the second moment m_{2,2} of the squared kernel W^2. We can now compute the variance of the estimator:

var( (c^{−1}/N) f^⊤ L_un f ) = E[ ( (c^{−1}/N) f^⊤ L_un f )^2 ] − E[ (c^{−1}/N) f^⊤ L_un f ]^2
= E[ ( (c^{−1}/N) f^⊤ L_un f )^2 ] − ⟨f, q L_{q,ρ} f⟩_{dV}^2 + O(δ^2)
= ((−4N + 6)/(N(N − 1))) ⟨f, q L_{q,ρ} f⟩_{dV}^2 + (4/N) ⟨f^2, q (L_{q,ρ} f)^2⟩_{dV} + 4 a_2 m_2^{−1} (δ^{−m−2}/(N(N − 1))) ⟨f^2, q L_{q,ρ}(f^2)⟩_{dV} + O(δ^2).

In particular, this says that

var( (c^{−1}/N) f^⊤ L_un f ) = O( N^{−2} δ^{−m−2}, N^{−1}, δ^2 ),

assuming that all the inner products are finite.

We should note that determining whether the inner products are finite is nontrivial when the manifold in question is unbounded and when the sampling density q is not bounded away from zero. We will return to this issue below. First, we compute the bias and variance of the spectral estimates. For wider applicability, we consider the generalized eigenvalue problem c^{−1} L_un f = λ M f for any diagonal matrix M_ii = µ(x_i), which corresponds to the functional

Λ(f) = ( f^⊤ c^{−1} L_un f ) / ( f^⊤ M f ).

Notice that N^{−1} E[ f^⊤ M f ] = ⟨f^2, µq⟩_{dV}, and this estimator has variance

var( N^{−1} f^⊤ M f ) = N^{−1} ( ⟨f^4, µ^2 q⟩_{dV} − ⟨f^2, µq⟩_{dV}^2 ) = O(N^{−1}).

A particular example which draws significant interest is the so-called 'normalized graph Laplacian', where M = D, implying that µ(x_i) = q(x_i) ρ(x_i)^m + O(δ^2).


Theorem 5 (Spectral bias and variance). Under the same assumptions as Theorem 4, we have

E[Λ(f)] = E[ ( f^⊤ c^{−1} L_un f ) / ( f^⊤ M f ) ] = ⟨f, q L_{q,ρ} f⟩_{dV} / ⟨f^2, µq⟩_{dV} + O(δ^2, N^{−1})     (29)

and

var(Λ(f)) = (a δ^{−m−2}/(N(N − 1))) ⟨f^2, q L_{q,ρ}(f^2)⟩_{dV} / ⟨f^2, µq⟩_{dV}^2 + (4/N) ⟨f^2, q (L_{q,ρ} f)^2⟩_{dV} / ⟨f^2, µq⟩_{dV}^2 + O(δ^2, N^{−1}).     (30)

So, assuming the inner products are finite (for example if M is a compact manifold), we have var(Λ(f)) = O(N^{−2} δ^{−m−2}, N^{−1}, δ^2).

Proof. We consider Λ(f) to be a ratio estimator of the form Λ(f) = a/b, where a = N^{−1} f^⊤ c^{−1} L_un f and b = N^{−1} f^⊤ M f. The correlation of a and b is given by

E[(a − ā)(b − b̄)] = (2 δ^{−m−2}/(m_2 N^2 (N − 1))) E[ Σ_{i≠j,k} f(x_i) (L_un)_{ij} f(x_j) f(x_k)^2 µ(x_k) ] − ā b̄ = O(N^{−1}),

since the sum of the terms with k = i or k = j is clearly order N^{−1}, and when both k ≠ i and k ≠ j the expectation is equal to ā b̄ by independence. Since the variance of b and the correlation are both order N^{−1}, by the standard ratio estimates we have

E[a/b] = ā/b̄ + O(N^{−1})  and  var(a/b) = var(a)/b̄^2 + O(N^{−1}).

Combined with Theorem 4, these equations yield the desired result.

Comparing Theorems 3 and 5 we find a surprising result, namely that the optimal δ for spectral approximation of the operator L_{q,ρ} is different from the optimal δ for pointwise approximation. To our knowledge this has not been noted before in the literature. We find the optimal choice of δ by balancing the squared bias with the variance of the respective estimators. For the optimal pointwise approximation we need δ^4 = N^{−1} δ^{−m−2}, so the optimal choice is δ ∝ N^{−1/(m+6)}, and the combined error of the estimator is then O(N^{−2/(m+6)}). In contrast, for optimal spectral approximation we need δ^4 = N^{−2} δ^{−m−2}, so the optimal choice is

δ ∝ N^{−2/(m+6)}

and the combined error (bias and standard deviation) is

O(N^{−4/(m+6)}).

The one exception to this rule is the case m = 1, where the second term in the spectral variance dominates, so that the optimal choice is δ ∝ N^{−1/3} and the combined error is order N^{−1}.
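These scalings are easy to encode; the helper below is a sketch that returns the bandwidth up to an unknown problem-dependent constant C, which must still be tuned (the examples that follow use hand-tuned constants).

```python
def optimal_delta(N, m, spectral=True, C=1.0):
    # Bandwidth scalings from balancing squared bias against variance:
    # pointwise: delta ~ N^(-1/(m+6)); spectral: delta ~ N^(-2/(m+6)),
    # except delta ~ N^(-1/3) in the special case m = 1.
    if spectral:
        return C * N ** (-1 / 3) if m == 1 else C * N ** (-2 / (m + 6))
    return C * N ** (-1 / (m + 6))
```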

Since graph Laplacians are often used spectrally (for example to find low-dimensional coordinates with the diffusion map embedding [1, 12], spectral clustering [33, 48], applications to time series analysis [19, 3], and the spectral exterior calculus [6]), this implies that the choice of bandwidth δ should be significantly smaller in these applications than suggested in the literature [40, 4]. We demonstrate this for the cutoff kernel on a circle in the example below.


Figure 12. For N ∈ {1000, 5000, 20000} uniformly random data points on a unit circle. (a) Using δ = 3N^{−1/7} and letting f_i = sin(θ_i), we compare the pointwise estimate (c^{−1} L_un f)_i (red, dashed) to the true operator ∆f(x_i) = −sin(θ_i) (black, solid). (b) Using δ = 3N^{−1/7} we compare the first 9 eigenvalues of c^{−1} L_un (red, dashed) to the first 9 eigenvalues of ∆ (black, solid). (c,d) Same as (a,b) respectively, but with δ = 3N^{−1/3}.

Example 9 (Pointwise and spectral estimation on the unit circle). We illustrate by example the difference in bandwidth required for pointwise versus spectral approximation that is implied by Theorems 3 and 5. Consider data sets consisting of N ∈ {1000, 5000, 20000} points {x_i = (sin(θ_i), cos(θ_i))^⊤}_{i=1}^N sampled uniformly from the unit circle, and the unnormalized graph Laplacian c^{−1} L_un constructed using the cutoff kernel. We first chose δ = 3N^{−1/(m+6)} = 3N^{−1/7} (the constant 3 was selected by hand tuning), which is optimal for pointwise estimation of the Laplace-Beltrami operator ∆, as shown in Figure 12(a)(b). Next, we set δ = 3N^{−1/3}, which is optimal for spectral estimation, as shown in Figure 12(c)(d). This demonstrates how spectral estimation is optimal for a much smaller value of δ than pointwise estimation. Despite the extremely poor pointwise estimation when δ = 3N^{−1/3},


the spectral estimation, which involves an integrated quantity, is superior using this significantly smaller value of δ. Notice that the relative error in the eigenvalues increases as the eigenvalues increase. This phenomenon is explained at the end of the next section.
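A minimal version of this experiment fits in a few lines; the sketch below (dense linear algebra, arbitrary seed) builds the cutoff-kernel Laplacian on the circle with ρ ≡ 1, normalizes by c = (m_2/2)(N − 1)δ^{m+2} with m_2 = 2/3 for the indicator kernel in dimension m = 1, and compares the low-lying spectrum to the true eigenvalues 0, 1, 1, 4, 4, ... of the circle.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

m, N = 1, 1000
rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, N)
X = np.column_stack((np.sin(theta), np.cos(theta)))

def laplacian_spectrum(X, delta, num=9):
    # Cutoff kernel with rho = 1: connect points closer than delta.
    W = (squareform(pdist(X)) < delta).astype(float)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    c = (2.0 / 3.0) / 2.0 * (N - 1) * delta ** (m + 2)  # m_2 = 2/3 for m = 1
    return np.sort(np.linalg.eigvalsh(L / c))[:num]

print(laplacian_spectrum(X, 3.0 * N ** (-1.0 / 3.0)))  # ≈ 0, 1, 1, 4, 4, ...
```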

In general we only know that the optimal choice is δ ∝ N^{−2/(m+6)} for m > 1 and δ ∝ N^{−1/3} for m = 1, and the constants for the optimal choice are quite complex. However, this does indicate the correct order of magnitude for δ, especially for large data sets. Figure 12 dramatically illustrates the differences in optimal pointwise and optimal spectral estimation. As mentioned above, graph Laplacians are often used for spectral approximation, so previous analyses which tune δ for optimal pointwise estimation [40, 4] are misleading for many applications.

In Figure 13 we verify the power laws for the optimal choice of δ for pointwise and spectral estimation. For N ∈ {250, 500, 1000, 2000, 4000, 8000} we compute the pointwise and spectral root mean squared error (RMSE) for a wide range of values of δ and then plot the optimal value of δ as a function of N. To estimate the pointwise error for a fixed N and δ, we generated N uniformly random points on a circle {x_i = (sin(θ_i), cos(θ_i))^⊤}_{i=1}^N and then constructed the unnormalized graph Laplacian c^{−1} L_un using the cutoff kernel. We then multiplied this Laplacian matrix by the vector f_i = sin(θ_i), so that (c^{−1} L_un f)_i ≈ ∆ sin(θ_i) = −sin(θ_i), and we computed the RMSE

( (1/N) Σ_{i=1}^N ( (c^{−1} L_un f)_i − (−sin(θ_i)) )^2 )^{1/2}

between the estimator and the limiting expectation. We repeated this numerical experiment 10 times for each value of N and δ, and the average RMSE is shown in Figure 13(a). To estimate the spectral error, we computed the smallest 5 eigenvalues of c^{−1} L_un and found the RMSE between these eigenvalues and the true eigenvalues [0, 1, 1, 4, 4] of the limiting operator ∆. This numerical experiment was repeated 10 times for each value of N and δ, and the average RMSE is shown in Figure 13(b). Finally, in Figure 13(c) we plot the value of δ which minimized the pointwise error (black) and the spectral error (red) for each value of N, and we compare these data points to the theoretical power laws for the pointwise error, δ ∝ N^{−1/7} (black, dashed line), and the spectral error, δ ∝ N^{−1/3} (red, dashed line), respectively.

6.2. Limiting geometries and spectral convergence. We now show that the operator that is approximated spectrally in Theorem 5 is a Laplace-Beltrami operator on the manifold M, but with respect to a conformal change of metric. Consider the functional approximated spectrally by L_un, as shown in (29), which is

Λ(f) → ⟨f, q L_{q,ρ} f⟩_{dV} / ⟨f^2, µq⟩_{dV} = ⟨ f, q^2 ρ^{m+2} ( ∆f − ∇log(q^2 ρ^{m+2}) · ∇f ) ⟩_{dV} / ⟨f^2, µq⟩_{dV}.     (31)

A conformal change of metric corresponds to a new Riemannian metric ḡ ≡ ϕg, where ϕ(x) > 0, which has Laplace-Beltrami operator

∆_ḡ f = (1/ϕ) ( ∆f − (m − 2) ∇log(√ϕ) · ∇f ).     (32)

In order to rewrite the operator in (31) as a Laplace-Beltrami operator, the function ϕ must be chosen so that ϕ^{(m−2)/2} = q^2 ρ^{m+2}.



Figure 13. For N ∈ {250, 500, 1000, 2000, 4000, 8000} we show (a) the root mean squared error (RMSE) of the pointwise approximation c^{−1} L_un f ≈ −sin(θ), where f_i = sin(θ_i), and (b) the RMSE of the first 5 eigenvalues of c^{−1} L_un. Both (a) and (b) are averaged over 10 random data sets for each N and each δ, and the minimum RMSE in each curve is highlighted with a black point. (c) For each N, we plot the optimal δ for pointwise approximation (black points) and the theoretical power law δ ∝ N^{−1/7} (black, dashed line), and the optimal δ for spectral approximation (red points) and the theoretical power law δ ∝ N^{−1/3} (red, dashed line).

Moreover, we need to change the inner product in (31) to dV̄, the volume form of the new metric, given by

dV̄ = √|ḡ| = √|ϕg| = ϕ^{m/2} √|g| = ϕ^{m/2} dV.     (33)

Changing the volume form and substituting ∆_ḡ in (31), we find

Λ(f) → ⟨ f, q^2 ρ^{m+2} ϕ^{1−m/2} ∆_ḡ f ⟩_{dV̄} / ⟨ f^2, µ q ϕ^{−m/2} ⟩_{dV̄} = ⟨f, ∆_ḡ f⟩_{dV̄} / ⟨ f^2, µ q ϕ^{−m/2} ⟩_{dV̄},

where the second equality follows from the definition ϕ^{(m−2)/2} = q^2 ρ^{m+2}. Finally, in order for the denominator to represent the appropriate spectral normalization, we


require

µ = q^{−1} ϕ^{m/2} = q^{(m+2)/(m−2)} ρ^{m(m+2)/(m−2)},

which implies that

Λ(f) → ⟨f, ∆_ḡ f⟩_{dV̄} / ⟨f, f⟩_{dV̄}.

This immediately shows that when m ≠ 2 we can always choose ρ, µ to estimate the operator ∆_ḡ for any conformally isometric geometry ḡ = ϕg. In the special case m = 2 this can also be achieved by setting ρ = q^{−1/2} and µ = q^{−1}ϕ. However, if we assume that ρ = q^β for some power β, then there are three choices of ρ, µ which have the same geometry for every dimension m; we summarize these in the table below.

Geometry                  | ḡ         | dV̄          | ρ            | µ
Sampling measure geometry | q^{2/m} g | q dV         | q^{−1/m}     | 1
Embedding geometry        | g         | dV           | q^{−2/(m+2)} | q^{−1}
Inverse sampling geometry | q^{−1} g  | q^{−m/2} dV  | q^{−1/2}     | q^{−m/2−1}

For the choice β = −1/m we find µ = 1, which is explored in the main body of the paper. The choice µ = 1 is the only one allowing an unweighted graph construction. To reconstruct the embedding geometry, one can estimate the Laplace-Beltrami operator ∆_g using β = −2/(m+2). Finally, if we select β = −1/2 we find µ = q^{−m/2−1}, which is closely related to the results of [4].

The above construction shows that an appropriately chosen graph Laplacian is a consistent estimator of the normalized Dirichlet energy

Λ(f) → ⟨f, ∆_ḡ f⟩_{dV̄} / ⟨f, f⟩_{dV̄} = ∫_M ||∇_ḡ f||^2 dV̄ / ∫_M f^2 dV̄

for any conformally isometric geometry ḡ = ϕg. The minimizers of this normalized Dirichlet energy are exactly the eigenfunctions of the Laplace-Beltrami operator ∆_ḡ, so Theorem 5 is the first step towards spectral convergence. However, in the details of Theorem 5 there is a significant barrier to spectral convergence. This barrier is a very subtle effect of the second term in the variance of the estimator. Rewriting (30) from Theorem 5 in terms of the new geometry, we find

var(Λ(f)) = (a δ^{−m−2}/(N(N − 1))) ⟨f^2, ∆_ḡ(f^2)⟩_{dV̄} / ⟨f, f⟩_{dV̄}^2 + (4/N) ⟨f^2, q^{−1} ϕ^{m/2} (∆_ḡ f)^2⟩_{dV̄} / ⟨f, f⟩_{dV̄}^2 + O(δ^2, N^{−1}).     (34)

The first term in (34) normally controls the bias-variance trade-off, since it diverges as δ → 0, and the second term is typically higher order. However, the integral which defines the constant in the second error term has the potential to be infinite when q is not bounded below. Ignoring the terms which depend on f, we need q^{−1} ϕ^{m/2} dV̄ = q^{−1} ϕ^m dV to be integrable in order for the variance of the estimator to be well-defined, which proves the following result.

Theorem 6 (Spectral Convergence, Part 1). Let {x_i}_{i=1}^N be sampled from a density q. Consider a conformally equivalent metric ḡ = ϕg such that q^{−1} ϕ^m is integrable with respect to dV. Define the unnormalized Laplacian L_un using a kernel K_δ(x, y) = h( ||x − y||^2 / (δ^2 ρ(x)ρ(y)) ) for any h with exponential decay, where

ρ = q^{−2/(m+2)} ϕ^{(m−2)/(2(m+2))} for m ≠ 2, and ρ = q^{−1/2} for m = 2,

and define M_ii = µ(x_i), where

µ = q^{−1} ϕ^{m/2} for m ≠ 2, and µ = q^{−1} ϕ for m = 2.

Then for f ∈ C^3(M, ḡ) the functional Λ(f) = ( f^⊤ c^{−1} L_un f ) / ( f^⊤ M f ), which corresponds to the generalized eigenvalue problem c^{−1} L_un f = λ M f, is a consistent estimator of the normalized Dirichlet energy functional:

E[Λ(f)] →_{N→∞} ⟨f, ∆_ḡ f⟩_{dV̄} / ⟨f, f⟩_{dV̄} = ∫_M ||∇_ḡ f||^2 dV̄ / ∫_M f^2 dV̄,

and the estimator Λ(f) has bias of leading order δ^2 and finite variance of leading order δ^{−m−2} N^{−2}. The optimal choice δ ∝ N^{−2/(m+6)} yields a root mean squared error of order N^{−4/(m+6)}.

In particular, for the choice ρ = q^{−2/(m+2)} we find q^{−1} ϕ^m = q^{−1}, which will often not be integrable on a non-compact manifold when q is not bounded away from zero. This implies that we cannot spectrally approximate the Laplace-Beltrami operator with respect to the embedding metric for many non-compact manifolds. Notice that this variance barrier only applies to spectral convergence, so we can still approximate the operator pointwise using Theorem 3. Similarly, when β = −1/2 we find q^{−1} ϕ^m = q^{−1−m}, and the exponent on q is negative, also leading to the possibility of divergence on non-compact manifolds. This shows yet another advantage of the choice β = −1/m, since then q^{−1} ϕ^m = q, which is simply the sampling measure and is always integrable with respect to dV by definition.

Theorem 6 shows that the functional Λ(f) converges to the normalized Dirichlet energy functional, whose minimizers are the eigenfunctions of the Laplace-Beltrami operator ∆_ḡ. It remains only to show that vectors which minimize the discrete functional Λ(f) converge to the minimizers of the normalized Dirichlet energy functional. In fact, this has already been shown in [48], assuming that the spectrum of ∆_ḡ is discrete.

Next, we will address an issue of spectral convergence that was introduced in [48], which suggests that unnormalized graph Laplacians have worse spectral convergence properties than normalized Laplacians. The theory in that article is the basis of our spectral convergence result below. However, a subtle detail reveals that the unnormalized Laplacian c^{−1} L_un does not suffer from the spectral convergence issue they consider. For an unnormalized Laplacian L_un = D − W, where D_ii = Σ_{j=1}^N W_ij is the degree function, "Result 2" of [48] states that if the first r eigenvalues of the limiting operator L_{q,ρ} do not lie in the range of the degree function

d(x) = lim_{N→∞} Σ_{j=1}^N W(x, x_j),

then the first r eigenvalues of the unnormalized Laplacian L_un converge to those of the limiting operator (spectral convergence). This would suggest that spectral convergence only holds for eigenvalues which are separated from the range of D_ii.


However, notice that we divide L_un by the normalization constant c = O(Nδ^{m+2}), whereas lim_{N→∞} Σ_{j=1}^N W(x, x_j) ∝ q(x) N δ^m, which implies that

c^{−1} d(x) = O(δ^{−2}).

Since δ → 0 as N → ∞, this implies that the range of the true degree function c^{−1} d(x) approaches ∞ as N grows. This special class of unnormalized Laplacians avoids this difficulty because the first order term of the degree function is exactly cancelled by the first order term of the kernel W_ij in the limit of large data, which is why the constant c is higher order than the degree function in terms of δ.

Finally, while Theorem 6 gives an optimal bias-variance tradeoff of δ ∝ N^{−2/(m+6)}, this may not be sufficient for spectral convergence in high dimensions. Taking δ → 0 and N → ∞ simultaneously was first addressed in [2] and more recently in [45, 46, 39]. The best results [46] require a bounded manifold and also (N^{−1} log N)^{1/m} < δ, but the optimal bias-variance tradeoff only satisfies this constraint for m < 6, so for m ≥ 6 we require δ = (N^{−1} log N)^{1/m}. While [46] has the best current result, we conjecture that spectral convergence still holds on unbounded manifolds when the spectrum of ∆_ḡ is discrete, and also when δ ∝ N^{−2/(m+6)} for m ≥ 6.

Theorem 7 (Spectral Convergence, Part 2). Under the assumptions of Theorem 6, in the limit of large data the eigenvalues of c^{−1} M^{−1} L_un converge to those of the limiting operator ∆_ḡ, assuming the spectrum is discrete and M is bounded.

In particular, for a manifold without boundary or with a smooth boundary, spectral convergence holds for the unnormalized Laplacian when ρ = q^{−1/m} and µ = 1.

Proof. Notice that the eigenvalues of c^{−1} M^{−1} L_un f = λ f exactly correspond to the minimizers of the functional Λ(f) in Theorem 6. The first claim follows from the convergence of the functional Λ(f) to the normalized Dirichlet energy, combined with the convergence results of [46] as discussed above.

The guarantee of spectral convergence for ρ = q^{−1/m} and µ = 1 follows from the fact that q^{−1} ϕ^m = q is always integrable. Moreover, the volume of the manifold with respect to dV̄ = q dV is always vol_{dV̄}(M) = ∫_M q dV = 1, and for any manifold of finite volume with a smooth boundary the spectrum of the Laplace-Beltrami operator is discrete [11].

Since the matrix c^{−1} M^{−1} L_un is not symmetric, it is numerically preferable to solve the symmetric generalized eigenvalue problem c^{−1} L_un f = λ M f.
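In practice this amounts to handing the symmetric pencil (c^{−1} L_un, M) to a generalized eigensolver rather than forming the nonsymmetric product; a minimal dense sketch follows (for the CkNN choice µ = 1 the matrix M is simply the identity).

```python
import numpy as np
from scipy.linalg import eigh

def generalized_spectrum(L, mu, num=10):
    # Solve the symmetric-definite generalized problem L f = lambda M f,
    # where L = c^{-1} L_un is symmetric and M = diag(mu) is positive.
    M = np.diag(mu)
    vals, vecs = eigh(L, M)
    return vals[:num], vecs[:, :num]
```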

Finally, it is often noticed empirically that the error in the spectral approximations increases as the value of the eigenvalue increases. To understand this phenomenon, we will use the variance formula (34). First, if f is an eigenfunction with ∆_ḡ f = λf, we note that

⟨f^2, ∆_ḡ(f^2)⟩_{dV̄} = ⟨f^2, 2f ∆_ḡ f − 2∇_ḡ f · ∇_ḡ f⟩_{dV̄}
= 2λ ⟨f^2, f^2⟩_{dV̄} − 2 ⟨f^2, ∇_ḡ f · ∇_ḡ f⟩_{dV̄}
= 2λ ⟨f^2, f^2⟩_{dV̄} − 2 ⟨ −div_ḡ( f^2 ∇_ḡ f ), f ⟩_{dV̄}
= 2λ ⟨f^2, f^2⟩_{dV̄} − (2/3) ⟨ −div_ḡ( ∇_ḡ f^3 ), f ⟩_{dV̄}
= 2λ ⟨f^2, f^2⟩_{dV̄} − (2/3) ⟨ ∆_ḡ(f^3), f ⟩_{dV̄}
= 2λ ⟨f^2, f^2⟩_{dV̄} − (2/3) ⟨ f^3, ∆_ḡ f ⟩_{dV̄}
= (4/3) λ ⟨f^2, f^2⟩_{dV̄}.


Applying this formula to (34), we find that Λ(f) is a consistent estimator of the eigenvalue λ with relative variance

var(Λ(f))/λ = a (δ^{−m−2}/(N(N − 1))) ( 4⟨f^2, f^2⟩_{dV̄} / (3⟨f, f⟩_{dV̄}^2) ) + (4λ/N) ⟨f^2, q^{−1} ϕ^{m/2} f^2⟩_{dV̄} / ⟨f, f⟩_{dV̄}^2 + O(δ^2, N^{−1}).

Although the first term (which is typically the leading order term, especially on compact manifolds) can be tuned to give a constant relative error, the second term still grows proportionally to the eigenvalue λ. We should expect the spectrum to be fairly accurate until the order of the second term is equal to that of the first term, at which point the relative errors in the eigenvalues will grow rapidly. When ϕ = q^{2/m}, as in the CkNN construction, the inner products in the two terms are the same, so the spectrum will be accurate when

λ < λ_max ≡ a δ^{−m−2}/(3(N − 1)) = a N^{2(m+2)/(m+6)}/(3(N − 1)) = O( N^{(m−2)/(m+6)} ),

where the second and third equalities hold for the optimal choice δ ∝ N^{−2/(m+6)}. This constraint does not apply to the case m = 1, where the second term in (34) dominates, or to the case m = 2, where the second and first terms are of equal order for the optimal choice of δ. In all cases, the relative error increases as the eigenvalues increase.

7. Conclusion. We have introduced a new method called continuous k-nearest neighbors as a way to construct a single graph from a point cloud that is provably consistent on connected components. By proving the consistency of the geometry (spectral convergence of the graph Laplacian to the Laplace-Beltrami operator), we support our conjecture that CkNN is topologically consistent, meaning that the correct topology can be extracted in the large data limit. For many finite-data examples from compact Riemannian manifolds, we have shown that CkNN compares favorably to persistent homology approaches.

The proposed method replaces a small ε radius, or k in the k-nearest neighbors method, with a unitless continuous parameter δ. The theory proves the existence of a correct choice of δ, and it needs to be tuned in specific examples. While the difference between the CkNN and the kNN constructions is fairly simple, the crucial difference is that the approximation (4) only holds for k small relative to N. By varying the parameter δ and holding k constant at a small value, we can construct multi-scale approximations of our manifold that are still consistent with the underlying manifold. This contrasts with the standard kNN approach, where the parameter k is varied and both the coarseness of the manifold approximation and the underlying manifold geometry are changing simultaneously (because the scaling function is changing).

Surprisingly, a careful analysis of the bias and variance of the graph Laplacian as a spectral estimator of the Laplace-Beltrami operator is a key element of the proof of consistency. The variance can be infinite on non-compact manifolds, depending on the geometry, creating a previously unknown barrier to spectral convergence. The variance calculation also allows us to explain why the relative error of an eigenvalue increases along with the eigenvalue, and we determine the part of the spectrum that can be estimated with constant relative error, as a function of the data size N.

We would like to thank D. Giannakis for helpful conversations.


REFERENCES

[1] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, 15 (2003), 1373–1396.
[2] M. Belkin and P. Niyogi, Convergence of Laplacian eigenmaps, Advances in Neural Information Processing Systems, (2007), 129–136.
[3] T. Berry, J. R. Cressman, Z. G. Ferencek and T. Sauer, Time-scale separation from diffusion-mapped delay coordinates, SIAM J. Appl. Dyn. Sys., 12 (2013), 618–649.
[4] T. Berry and J. Harlim, Variable bandwidth diffusion kernels, Appl. Comp. Harmonic Anal., 40 (2016), 68–96.
[5] T. Berry and T. Sauer, Local kernels and the geometric structure of data, Appl. Comp. Harmonic Anal., 40 (2016), 439–469.
[6] T. Berry and D. Giannakis, Spectral exterior calculus, arXiv preprint, arXiv:1802.01209.
[7] O. Bobrowski, S. Mukherjee, J. E. Taylor et al., Topological consistency via kernel estimation, Bernoulli, 23 (2017), 288–328.
[8] G. Carlsson, Topology and data, Bulletin of the American Mathematical Society, 46 (2009), 255–308.
[9] J. Chacon, A population background for nonparametric density-based clustering, Statistical Science, 30 (2015), 518–532.
[10] K. Chaudhuri, S. Dasgupta, S. Kpotufe and U. von Luxburg, Consistent procedures for cluster tree estimation and pruning, Information Theory, IEEE Transactions on, 60 (2014), 7900–7912.
[11] A. Cianchi and V. Maz'ya, On the discreteness of the spectrum of the Laplacian on noncompact Riemannian manifolds, J. Differential Geom., 87 (2011), 469–492, URL http://projecteuclid.org/euclid.jdg/1312998232.
[12] R. Coifman and S. Lafon, Diffusion maps, Appl. Comp. Harmonic Anal., 21 (2006), 5–30.
[13] R. Coifman, S. Lafon, B. Nadler and I. Kevrekidis, Diffusion maps, spectral clustering and reaction coordinates of dynamical systems, Appl. Comp. Harmonic Anal., 21 (2006), 113–127.
[14] R. Coifman, Y. Shkolnisky, F. Sigworth and A. Singer, Graph Laplacian tomography from unknown random projections, IEEE Trans. on Image Proc., 17 (2008), 1891–1899.
[15] M. Desbrun, E. Kanso and Y. Tong, Discrete differential forms for computational modeling, in Discrete Differential Geometry, Springer, 38 (2008), 287–324.
[16] H. Edelsbrunner and J. Harer, Computational Topology: An Introduction, American Mathematical Soc., 2010.
[17] S. Fazel, Zebra in mikumi.jpg, 2012, URL https://commons.wikimedia.org/wiki/File:Zebra_in_Mikumi.JPG; accessed June 3, 2016; Creative Commons License.
[18] R. Ghrist, Barcodes: The persistent topology of data, Bulletin of the American Mathematical Society, 45 (2008), 61–75.
[19] D. Giannakis and A. J. Majda, Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability, Proceedings of the National Academy of Sciences, 109 (2012), 2222–2227.
[20] M. Hein, Geometrical Aspects of Statistical Learning Theory, Thesis, URL http://elib.tu-darmstadt.de/diss/000673.
[21] M. Hein, Uniform convergence of adaptive graph-based regularization, in Learning Theory, Springer, 4005 (2006), 50–64.
[22] M. Hein, J.-Y. Audibert and U. Von Luxburg, From graphs to manifolds – weak and strong pointwise consistency of graph Laplacians, in Learning Theory, Springer, 3559 (2005), 470–485.
[23] A. N. Hirani, Discrete Exterior Calculus, PhD thesis, California Institute of Technology, 2003.
[24] Kallerna, Scale common roach.jpg, 2009, URL https://commons.wikimedia.org/wiki/File:Scale_Common_Roach.JPG; accessed June 3, 2016; Creative Commons License.
[25] J. Latschev, Vietoris-Rips complexes of metric spaces near a closed Riemannian manifold, Archiv der Mathematik, 77 (2001), 522–528.
[26] D. Loftsgaarden, C. Quesenberry et al., A nonparametric estimate of a multivariate density function, The Annals of Mathematical Statistics, 36 (1965), 1049–1051.
[27] M. Maier, M. Hein and U. Von Luxburg, Cluster identification in nearest-neighbor graphs, in Algorithmic Learning Theory, Springer, 2007, 196–210.
[28] M. Maier, M. Hein and U. von Luxburg, Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters, Theoretical Computer Science, 410 (2009), 1749–1764.
[29] M. Maier, U. Von Luxburg and M. Hein, How the result of graph clustering methods depends on the construction of the graph, ESAIM: Probability and Statistics, 17 (2013), 370–418.
[30] B. Nadler and M. Galun, Fundamental limitations of spectral clustering methods, in Advances in Neural Information Processing Systems 19 (eds. B. Scholkopf, J. Platt and T. Hoffman), MIT Press, Cambridge, MA, 2007.
[31] B. Nadler, S. Lafon, R. Coifman and I. Kevrekidis, Diffusion maps – a probabilistic interpretation for spectral embedding and clustering algorithms, in Principal Manifolds for Data Visualization and Dimension Reduction, Springer, NY, 58 (2008), 238–260.
[32] B. Nadler, S. Lafon, I. Kevrekidis and R. Coifman, Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators, in Advances in Neural Information Processing Systems, MIT Press, 2005, 955–962.
[33] A. Ng, M. Jordan and Y. Weiss, On spectral clustering: Analysis and an algorithm, Neur. Inf. Proc. Soc.
[34] P. Niyogi, S. Smale and S. Weinberger, Finding the homology of submanifolds with high confidence from random samples, Discrete & Computational Geometry, 39 (2008), 419–441.
[35] S. Rosenberg, The Laplacian on a Riemannian Manifold, Cambridge University Press, 1997.
[36] D. S. Rufat, Spectral Exterior Calculus and Its Implementation, PhD thesis, California Institute of Technology, 2017, URL http://resolver.caltech.edu/CaltechTHESIS:05302017-094600781.
[37] B. Scholkopf, A. Smola and K. Muller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10 (1998), 1299–1319.
[38] E. Schulz and G. Tsogtgerel, Convergence of discrete exterior calculus approximations for Poisson problems, 2016.
[39] Z. Shi, Convergence of Laplacian spectra from random samples, arXiv preprint, arXiv:1507.00151.
[40] A. Singer, From graph to manifold Laplacian: The convergence rate, Applied and Computational Harmonic Analysis, 21 (2006), 128–134.
[41] I. Steinwart, Fully adaptive density-based clustering, The Annals of Statistics, 43 (2015), 2132–2167.
[42] A. Tausz, M. Vejdemo-Johansson and H. Adams, JavaPlex: A research software package for persistent (co)homology, in Proceedings of ICMS 2014 (eds. H. Hong and C. Yap), Lecture Notes in Computer Science 8592, 2014, 129–136. Software available at http://appliedtopology.github.io/javaplex/.
[43] D. G. Terrell and D. W. Scott, Variable kernel density estimation, Annals of Statistics, 20 (1992), 1236–1265.
[44] D. Ting, L. Huang and M. Jordan, An analysis of the convergence of graph Laplacians, in Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
[45] N. G. Trillos, M. Gerlach, M. Hein and D. Slepcev, Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace-Beltrami operator, arXiv preprint, arXiv:1801.10108.
[46] N. G. Trillos and D. Slepcev, A variational approach to the consistency of spectral clustering, Applied and Computational Harmonic Analysis, 45 (2018), 239–281.
[47] U. Von Luxburg, A tutorial on spectral clustering, Statistics and Computing, 17 (2007), 395–416.
[48] U. Von Luxburg, M. Belkin and O. Bousquet, Consistency of spectral clustering, Annals of Statistics, 36 (2008), 555–586.
[49] L. Zelnick-Manor and P. Perona, Self-tuning spectral clustering, Adv. Neur. Inf. Proc. Sys. (2005).

E-mail address: [email protected]

E-mail address: [email protected]