


Homology computation of large point clouds using quantum annealing

Raouf Dridi∗ and Hedayat Alghassi†

1QB Information Technologies (1QBit), 458 – 550 Burrard Street

Vancouver, British Columbia V6C 2B5, Canada

Abstract

Homology is a tool in topological data analysis which measures the shape of the data. In many cases, these measurements translate into new insights which are not available by other means. To compute homology, we rely on mathematical constructions which scale exponentially with the size of the data. Therefore, for large point clouds, the computation is infeasible using classical computers. In this paper, we present a quantum annealing pipeline for the computation of the homology of large point clouds. The pipeline takes as input a graph approximating the given point cloud. It uses quantum annealing to compute a clique covering of the graph and then uses this cover to construct a Mayer-Vietoris complex. The pipeline terminates by performing a simplified homology computation of the Mayer-Vietoris complex. We introduce three different clique coverings and their quantum annealing formulations. Our pipeline scales polynomially in the size of the data, once the covering step is solved. To demonstrate the correctness of our algorithm, we have also included tests using the D-Wave 2X quantum processor.

[email protected][email protected]

arXiv:1512.09328v3 [quant-ph] 6 Jun 2016


1 Introduction

The abundance of data, of all sorts, undoubtedly represents an exceptional and unprecedented wealth of knowledge for humanity to benefit from. Yet, the extent of this abundance, combined with the inherent complexity of the data, makes the deciphering and extraction of this knowledge tremendously difficult. A data scientist therefore faces two nontrivial challenges: first, to design models and algorithms appropriate to the complexity of the data, and second, to leverage them at large scales. In our opinion, the appropriate algorithmics are to be found in advanced mathematics, where concepts like the “correct glueing of local statistical information into a global insight” are captured, precisely defined, and solved. Topological data analysis (TDA) is one such branch of mathematics. It uses algebraic topology, a branch of modern mathematics which investigates topological features through algebraic lenses. In this work, we leverage TDA algorithms at large scales using quantum annealing.

The main concept in TDA is homology, an invariant consisting of a sequence of vector spaces which measure the shape of a given point cloud. These measurements usually translate into valuable insights about the data which are not available by other means. An excellent survey of TDA and its applications, in addition to a gentle introduction to the notions of homotopy equivalence, simplicial complexes, and their homologies, can be found in [Car09] and [Zom12]; we also refer to [Mas91] and [Hat02] for more advanced reading. The objective of this paper is to propose and test a quantum algorithm for computing the homology of large point clouds.

Our approach rests on the Mayer-Vietoris blow-up complex [Seg68]. It starts, as does any homology computation method, by approximating the given point cloud X with a simplicial complex K, as in Figure 1. Commonly used complexes are the so-called witness and Vietoris-Rips complexes, reviewed in the next section.

Figure 1: The left graph is the 1-skeleton of the witness complex of the unit torus with 100 landmarks and 100 000 data points. The corresponding witness complex is the 1-skeleton in addition to all its cliques. The right graph is the 1-skeleton of the witness complex (with 80 landmarks) of the NKI data set ([vdVHvtV+02]), which contains 24 496 gene expressions of 295 female patients diagnosed with breast cancer.

Next, we cover the complex K with smaller subcomplexes and then blow up their overlaps. The blow-up operation is a homotopy equivalence, which implies that K and its Mayer-Vietoris complex have the same homology. Clearly, the efficiency of this construction depends on the covering step.

Both Vietoris-Rips and witness complexes are clique complexes. Recall that a clique complex is an abstract complex given by the set of all cliques of some graph (called the 1-skeleton), ordered by set-theoretic inclusion. In practice, we assume that only the 1-skeleton graph G of K is given, and we burden the homology pipeline with the heavy task of constructing K, that is, enumerating all cliques of G.


The problem is now to cover the 1-skeleton in a way which makes the subsequent computations of the pipeline (i.e., the construction of K and the homology computation of the Mayer-Vietoris complex) scale “nicely” with the size of G. In the context of quantum computation, we seek an exponential speed-up. It is important to note that, in general, a brute-force graph partitioning allows only parallelization, and the computations still scale exponentially with the size of the data.

We prove that the type of covering needed here is a clique-based covering; that is, one needs to cover G with cliques. The key points are: firstly, cliques are homotopy equivalent to spheres and thus have known, simple topology; secondly, a large portion of the simplices of K is computed within this covering step, i.e., a large fraction of the simplices are confined inside the covering cliques.

We present three constructions for such a covering: minimum edge clique cover, minimum vertex clique cover, and an iterative method to compute a specific edge clique cover, which we call the edge disjoint-edge clique cover. We express the three methods as quadratic unconstrained binary optimization (QUBO) formulations [BHT07, BH02]. We argue in the Discussion section that our pipeline (simplicial complex construction and homology computation) scales polynomially in the size of the data, once the covering step is solved. The covering step, which is a binary optimization problem, can be solved via any optimization oracle; it is thus solver agnostic. Although we have tested our algorithm using the D-Wave 2X processor, any quantum annealing processor can be utilized.

The present work is the first pipeline for homology computation using quantum annealing. A pipeline based on the gate model has recently been proposed in [LGZ16]. The key point there is the compression of the simplicial complex K into a quantum state in a log2(|K|)-dimensional Hilbert space, spanned by the simplices in K. Within this space, Betti numbers are computed in polynomial time using quantum phase estimation. It is interesting to mention that, in our paper, we also have some form of compression: the covering cliques compress a lot of the simplicial data.

We usually track the dimensions of the homology spaces (called Betti numbers) over a range of values of a persistence parameter ε (see the definition in the next section); that is, we compute the so-called bar codes. Meaningful insights persist over a long range of ε; noise, on the contrary, does not. On the other hand, a sudden change in the bar codes might point to an outlier. Real-life applications for such a pipeline include subpopulation detection in cancer genomics, fraud detection in financial transactions ([LSL+13]), brain networks ([LCK+11]), and robot sensor networks ([dSGM05]), to name a few.

In Section 2.1, we review the witness and Vietoris-Rips complexes. In Section 2.2, we discuss the Mayer-Vietoris complex and explain how its homology is computed. Section 2.3 contains our pipeline. The complexity analysis is discussed in Section 2.4. We have included tests using the D-Wave 2X processor, as well as a basic description of how such quantum processors work.


2 Results

2.1 From data sets to simplicial complexes

In order to compute homology algorithmically, one needs to map the given data set into a simplicial complex. Here we give two conversion maps which are commonly used in TDA: the so-called Vietoris-Rips and witness complexes. Before diving into their definitions, it is helpful to give a bird's-eye view of the overall process. In general, a mapping which assigns a simplicial complex to a data set is of the form

dataSets → simplicialComplexes,  X ↦ K.  (2.1)

For the particular cases of Vietoris-Rips and witness complexes, the map (2.1) factors as

dataSets → Graphs → simplicialComplexes,  X ↦ G ↦ K,  (2.2)

where K is the clique complex of G. The computational cost of this construction is the highly non-trivial cost of enumerating all cliques in G.

The difference between Vietoris-Rips and witness complexes is in the definition of the graph G. In the case of Vietoris-Rips complexes, G is the neighbourhood graph: two points in the data set are connected if their distance is less than some parameter ε > 0. In the case of witness complexes, G is defined as follows [dSC04]. Suppose we are given a set L ⊂ X, called the landmark set. For every point x ∈ X, we let mx denote the distance from this point to the set L, i.e., mx = min_{l∈L} d(x, l). The graph G is the graph whose vertex set is L, and where a pair {l, l′} ⊂ L is an edge if and only if there is a point x ∈ X (the witness) such that d(x, {l, l′}) ≤ mx + ε. Finding the “right” set of landmarks is discussed in [dSC04].
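As a concrete illustration, the witness graph construction above can be sketched in a few lines. This is not code from the paper; the helper name `witness_graph`, the interpretation of d(x, {l, l′}) as the larger of the two landmark distances (one common reading of the witness condition for an edge), and the brute-force distance computations are our own simplifying assumptions for small Euclidean point clouds.

```python
import itertools
import math

def witness_graph(X, L, eps):
    """Sketch of the witness 1-skeleton.  X: list of points (tuples of
    coordinates), L: indices into X serving as landmarks, eps: the
    persistence parameter.  Returns the witness edges as index pairs."""
    def d(p, q):
        return math.dist(p, q)

    # m_x: distance from each data point to the landmark set L
    m = [min(d(x, X[l]) for l in L) for x in X]

    edges = set()
    for l1, l2 in itertools.combinations(L, 2):
        # {l1, l2} is an edge iff some witness x has both landmarks
        # within m_x + eps (our reading of d(x, {l, l'}) <= m_x + eps)
        for i, x in enumerate(X):
            if max(d(x, X[l1]), d(x, X[l2])) <= m[i] + eps:
                edges.add((l1, l2))
                break
    return edges
```

On a point cloud along a line, two nearby landmarks witnessed by a midpoint are connected, while distant landmark pairs with no witness are not.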

Homology is another mapping

simplicialComplexes → AbelianGroups,  K ↦ H∗(K),  (2.3)

which comes after (2.2) and assigns the Abelian groups H∗(K) = {H0(K), H1(K), · · · } to the simplicial complex K. Assuming either witness or Vietoris-Rips complexes are being used, any homology computation pipeline must implement the two steps (2.2) and (2.3). In order to be efficient, the pipeline must also scale “nicely” with the size of the data set.

2.2 Mayer-Vietoris blow-up complex

We recall here the definition of the Mayer-Vietoris blow-up complex and then describe its homology computation. For the convenience of the non-expert reader, we present this technical section through a simple example.

Let K be a simplicial complex and suppose C = {Ki}i∈I is a cover of K by simplicial subcomplexes


Ki ⊆ K, that is, K = ∪_{i∈I} Ki. For J ⊆ I, we define KJ = ∩_{j∈J} Kj. The Mayer-Vietoris blow-up complex ([Seg68, ZC08]) of the simplicial complex K and cover C is defined by

K^C = ∪_{J⊆I} ∪_{σ∈K_J} σ × J.  (2.4)

A basis for the k-chains Ck(K^C) is {σ ⊗ J ∈ K^C | dim σ + card J − 1 = k}. The boundary of a cell σ ⊗ J is given by ∂(σ ⊗ J) = ∂σ ⊗ J + (−1)^{dim σ} σ ⊗ ∂J. We will not provide a proof here, but it is a fact that the projection K^C → K is a homotopy equivalence and induces an isomorphism H∗(K^C) ≅ H∗(K).

The definition above boils down to the following: the simplicial complex K^C is the set of “original” simplices, in addition to the ones we get by blowing up common simplices; these are of the form σ ⊗ J in the definition above. In Figure 2, the vertex d common to the two subcomplexes {K1, K2} is blown up into an edge d ⊗ 12, and the edge bc is blown up into the triangle-like cell bc ⊗ 01. In Figure 3, the vertex a common to the three subcomplexes {K0, K1, K2} is blown up into the triangle a ⊗ 012.
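The cells σ ⊗ J of the definition above can be enumerated mechanically. The sketch below is our own illustration, not the paper's code (the function name `blowup_cells` and the encoding of simplices as frozensets are assumptions); it reproduces the blown-up cells of the Figure 2 example.

```python
from itertools import combinations

def blowup_cells(cover):
    """Enumerate the cells sigma x J of the Mayer-Vietoris blow-up
    complex.  cover: dict i -> set of simplices (frozensets of vertices).
    A cell (sigma, J) exists whenever sigma lies in every K_j, j in J."""
    I = sorted(cover)
    cells = set()
    for r in range(1, len(I) + 1):
        for J in combinations(I, r):
            common = set.intersection(*(cover[j] for j in J))
            for sigma in common:
                cells.add((sigma, J))
    return cells

# Cover of the path a-b-c-d-e from the Figure 2 example
s = lambda *v: frozenset(v)
K0 = {s('a'), s('b'), s('c'), s('a', 'b'), s('b', 'c')}
K1 = {s('b'), s('c'), s('d'), s('b', 'c'), s('c', 'd')}
K2 = {s('d'), s('e'), s('d', 'e')}
cells = blowup_cells({0: K0, 1: K1, 2: K2})

# The blown-up cells (|J| > 1): b x 01, c x 01, bc x 01, d x 12
blown = {(tuple(sorted(sig)), J) for sig, J in cells if len(J) > 1}
```

The enumeration yields 17 cells in total: the 16 “original” and blown-up cells of dimensions 0 and 1 plus the single 2-cell bc ⊗ 01, matching the example in the text.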

Figure 2: Top: The simplicial complex K is the depicted graph. Middle: K is covered with K0, K1, and K2. Bottom: The blow-up complex of the cover depicted in the middle image. After the blow-up, the edges b ⊗ 01, c ⊗ 01, d ⊗ 12, and the 2-simplex bc ⊗ 01 appear.

Now, the key point is that the boundary map of the simplicial complex K^C (which replaces K via the homotopy equivalence) has a block form suitable for parallel rank computation. As an example, let us consider again the simplicial complex K depicted in Figure 2. First, the space C0(K^C) is spanned by the vertices a ⊗ 0, b ⊗ 0, c ⊗ 0, b ⊗ 1, c ⊗ 1, d ⊗ 1, d ⊗ 2, e ⊗ 2; that is, all vertices of K, taking into account the partition to which they belong. The space of edges C1(K^C) is spanned by ab ⊗ 0, bc ⊗ 0, bc ⊗ 1, cd ⊗ 1, de ⊗ 2, b ⊗ 01, c ⊗ 01, d ⊗ 12. That is, first the “original” edges (i.e., those of the form σ ⊗ j, with j ∈ J = {0, 1, 2} and σ an edge in K) are constructed; then the new edges that result from blow-ups (i.e., those of the form v ⊗ ij, where v is a vertex in Ki ∩ Kj; if the intersection is empty, the value of the boundary map is 0) are constructed. The matrix ∂0 with respect to the given ordering is then:


Figure 3: The triangle a⊗ 012 appears after blowing up the cover of the middle picture.

[ −1   0   0   0   0   0   0   0 ]
[  1  −1   0   0   0  −1   0   0 ]
[  0   1   0   0   0   0  −1   0 ]
[  0   0  −1   0   0   1   0   0 ]
[  0   0   1  −1   0   0   1   0 ]
[  0   0   0   1   0   0   0  −1 ]
[  0   0   0   0  −1   0   0   1 ]
[  0   0   0   0   1   0   0   0 ]

We see that one can now row-reduce each coloured block independently. There might be remainders, i.e., rows that are zero except for the intersection part. We collect all such rows in one extra matrix, row-reduce it at the end, and aggregate. For the second boundary matrix, we need to determine C2(K^C). The 2-simplices are of three forms: the original ones (those of the form σ ⊗ j with σ ∈ C2(K); in this example there are none); those of the form σ ⊗ {i, j}, with σ ∈ Ki ∩ Kj; and those of the form v ⊗ {i, j, k}, with v ∈ Ki ∩ Kj ∩ Kk (there are none in this example, but Figure 3 has one). We get C2(K^C) = ⟨bc ⊗ 01⟩.

2.3 The quantum pipeline

In this part of the paper, we describe our quantum pipeline, which rests on the Mayer-Vietoris construction reviewed in the previous section. We assume that we have been given a large data set X and have used one of the mapping procedures of Section 2.1 to map X into a graph G. The (to be found) simplicial complex K, approximating X, is the clique complex of G.


The key point in our pipeline is finding an optimal cover of K, where the subcomplexes Ki are homotopy equivalent to a sphere¹. This is the same requirement as covering the graph G with cliques. It is optimal because: 1) it reduces the rank computation discussed in the previous section to only the intersection part, which is itself minimized; and 2) it minimizes the construction of the simplicial subcomplexes Ki (a consequence of the fact that the covering cliques confine a lot of the simplicial data). We formulate three different QUBOs for finding such a cover, all of which can be implemented on any quantum annealing processor.

The outline of this part is as follows. In Section 2.3.1, we present our three different QUBOs, starting with the minimum edge clique cover, which is more complete than the other two in the sense that it covers all the edges. This QUBO, however, needs many binary variables (binary variables for vertices and edges in each solution). Due to the practical limitation of qubit resources, we present two alternatives: the minimum vertex clique cover (binary variables for vertices only in each solution) and an iterative edge disjoint-edge clique cover method (only one set of binary variables for vertices). The edge disjoint-edge clique cover does cover all the edges, but uses the quantum annealer more than once. Later, in Sections 2.3.2 and 2.3.3, we discuss the simplified simplicial complex construction from the given set of cliques and the local rank computation for these three covering methods. The complexity analysis is detailed in the Discussion section. For completeness, we have also presented a QUBO formulation of the (minimum k-cut) graph partitioning (see Supplementary materials, Section 4.1). The subgraphs in a graph partitioning covering do not have any specific structure, so their local rank computation cannot be simplified and the homology computation is still exponential; it also assumes K is given, which is problematic. We conclude in Section 2.3.4 by testing our pipeline on the D-Wave 2X, the only available quantum annealing processor, as a proof of concept and correctness.

2.3.1 Covering step

We denote by V(G) and E(G) the vertex set and edge set of G, respectively, and also define n := |V(G)| and m := |E(G)|.

1. Edge clique cover. The first method uses the Edge Clique Cover (ECC), which is one of Karp's 21 NP-complete problems ([Kar72]). The problem is to cover the set of edges of G using a given number of cliques k. For each (clique) solution Ui, 0 ≤ i ≤ k − 1, we need n binary decision variables to represent the vertices. The row vector xi = (xi1, xi2, . . . , xin) is the solution vector indicating membership: xij is equal to 1 if the vertex vj of G belongs to the ith clique solution Ui, and 0 otherwise. Similarly, each row vector ei = (ei1, ei2, . . . , eim) is the solution vector indicating which of the m edges are in the clique solution Ui. We define a binary variable vector X of size k(n + m):

X = [x0 x1 · · · xk−1 e0 e1 · · · ek−1]^T

The edge clique cover QUBO formulation is then [Alg15]:

min X^T Q X,

Q = [ Ik ⊗ (Jn − In)     −2 Ik ⊗ B        ]
    [ −2 Ik ⊗ B^T        (Jk + 3 Ik) ⊗ Im ]

¹ Similar to the covering step in the Čech construction, but the cover will be used in the Mayer-Vietoris construction instead.


where B is the incidence matrix of G. The matrix Ik is the k × k identity matrix and Jk is the k × k matrix with all entries 1. Here, for simplicity, we have bounded the number of cliques that can cover an edge to two; thus, k(n + m) binary variables are required. To increase the bound on the number of overlapping cliques (that can cover an edge), we need to add ⌈k/2⌉m extra slack variables.

The smallest number of cliques that cover E(G) is called the edge clique cover number, or intersection number, θe(G). An upper bound for θe(G) is 2e²(d + 1)² ln(n), where d = ∆(Ḡ) is the maximum degree of the complement of G [Alo86]. If we assume the graph is dense, which is the case for many hard homology problems, d is small and the intersection number is on the order of log(n).

2. Vertex clique cover. An approach with a lower number of variables is to compute a Vertex Clique Cover (VCC) of the graph G. This problem, which is also among Karp's 21 NP-complete problems [Kar72], consists of covering the vertex set with cliques such that each vertex is assigned to a unique clique. It can be translated into a vertex colouring problem on the complement graph: we colour Ḡ with a given number of colours such that no edge of Ḡ connects two vertices of the same colour. Let A be the adjacency matrix of the graph G, and let X denote

X = [x0 x1 · · · xk−1]^T.

The QUBO formulation of the vertex clique cover is then ([Alg15]):

min X^T (Qmain + α Qorth) X,
Qmain = Ik ⊗ (Jn − In − A),
Qorth = (Jk − 2 Ik) ⊗ In,

where k, the number of cliques in the problem, is chosen greater than or equal to the clique covering number θ(G) of G (equal to χ(Ḡ), the chromatic number of the complement of G). An upper bound for θ(G) is ∆(Ḡ), the maximum degree of the complement graph (Brooks' theorem [Bro41]).

3. Edge disjoint-edge clique cover. The third clique covering method is a variation of the edge clique cover, which we call the Edge Disjoint-Edge Clique Cover (ED-ECC). Here, the covering subgraphs intersect only at vertices. The algorithm takes as input the graph G and a stopping criterion. The idea is to iteratively find the maximum clique and, at each iteration, remove the clique's edges from the graph of the previous iteration. Each run gives one maximum clique. At step i, we get a new graph Gi with adjacency matrix A(i). We stop when the clique found is small (stopping criterion 1) or after a certain number of clique computations (stopping criterion 2). The QUBO formulation for finding the maximum clique (at iteration i) is

min x^T (A(i) − In) x,

where x = (x1, . . . , xn)^T, A(i) is the updated adjacency matrix at step i, and n is the dimension of A(i). The adjacency matrix of the maximum clique is then C(i) = x(i) x(i)^T ◦ (Jn − In), where ◦ is the Hadamard product and x(i) is the solution of the QUBO problem at iteration i. There is an obvious gain in terms of the size of the problems we can handle: the number of variables involved here is only n, making this covering method more practical given the current limitation on the size of quantum annealing processors.

2.3.2 Construction of the Mayer-Vietoris complex

We now describe how the Mayer-Vietoris complex is constructed for the three different covers presented above. A large number of the simplices are confined inside the covering cliques. One needs, however, to find the few remaining simplices outside these covering cliques, depending on the covering method. The set of these remaining simplices will be the last subcomplex Kk in the cover K = ∪_{i=0}^{k} Ki. The subcomplexes Ki, for 0 ≤ i ≤ k − 1, are the power sets of the cliques Ui.

For the edge clique cover (and the edge disjoint-edge clique cover), the last subcomplex Kk is the clique complex of the subgraph Uk defined as follows. Its vertex set V(Uk) is the set of vertices inside the pairwise intersections between the covering cliques {Ui}i=0...k−1. The edge set E(Uk) is the restriction of E(G) to V(Uk). For the vertex clique cover, the last subcomplex Kk is the clique complex of the graph Uk whose vertex set V(Uk) is the set of all the vertices of the connecting edges, and whose edge set E(Uk) is the restriction of E(G) to V(Uk).

The complexity of this step is dictated by the size of the intersections, since one needs to connect the vertices in these intersections. This is the same as analyzing the size of the matrices Bi we introduce next.

2.3.3 The rank calculation

As described in Section 2.2, the boundary matrix of the Mayer-Vietoris complex, independently of the nature of the cover we choose, has the form

[ A0   0   · · ·  0    B0 ]
[ 0    A1  · · ·  0    B1 ]
[ ...              ...    ]   (2.5)
[ 0    0   · · ·  Ak   Bk ]

where the matrix Ai is the boundary matrix of the subcomplex Ki. In general, this form does not yield any speed-up; it only allows parallelization of the computation. The situation changes substantially using one of the clique-based covers above. The fact that each Ki for i ∈ {0, · · · , k − 1} is homotopy equivalent to a sphere results in a reduced rank computation which scales polynomially with the size of the graph G (see Discussion). For i ∈ {0, · · · , k − 1},

rank Ai = rank ∂ℓ^i = Σ_{α=0}^{ℓ} (−1)^{ℓ−α} C(m, α),

with m being the size of the clique Ui and C(m, α) the binomial coefficient. Additionally, the passage matrix Pi which makes Ai upper triangular (AiPi is upper triangular) is also known. To find the remainders (see Section 2.2), we let ri = rank Ai be the precomputed rank of Ai. The remainder is then given by the product Bi[ri + 1, end] Pi, where Bi[ri + 1, end] is the submatrix of Bi containing the rows ri + 1 downward.

In the case of the edge disjoint-edge clique cover, by construction, the covering subcomplexes {Ki}i=0...k intersect only at vertices (however, a vertex can be blown up into a high-dimensional simplex if it belongs to several covering subgraphs). This translates into a considerable reduction in the size of the matrices Bi, which now involve only simplices of the form v ⊗ J, where v is a vertex in G and J is the set of all subcomplexes containing the vertex v. Obviously, the complexity of the computation is defined by the dimension of the submatrices Bi (see the Discussion section).

2.3.4 Implementation on D-Wave quantum processor

We have tested our algorithm on the D-Wave 2X machine over many instances of the solid torus, as a proof of concept. Here, we report some statistics. Due to the embedding limitations of the D-Wave 2X processor (Appendix A.2), some of the instances were not successfully embedded into the processor; thus, we could not calculate their Betti numbers. The columns of the tables below represent 1) n, the number of vertices; 2) m, the number of edges; 3) D, the density of the graph; 4) the intersection number: θe(G) for ECC and θ(G) for VCC; 5) |P|, the problem size (i.e., the number of binary variables in the QUBO); 6) Embed, the embedding and solving status; and 7) Betti, the Betti number calculation status. The samples are sorted by number of vertices and problem size, so the reader can see the border of embeddable graphs for each method. Since some of the QUBOs are sparser than others, they can be embedded at larger problem sizes.

ECC
n   m   D     θe  |P |  Embed  Betti
6   13  0.87  4   76    √      √
8   12  0.43  4   80    √      √
8   20  0.71  4   112   ×      NA
10  15  0.33  5   125   ×      NA
12  24  0.30  4   144   ×      NA

VCC
n   m   D     θ   |P |  Embed  Betti
12  36  0.55  3   36    √      √
12  24  0.36  4   48    √      √
15  45  0.43  4   60    √      √
16  40  0.33  4   64    ×      NA
16  56  0.47  4   64    ×      NA

ED-ECC
n   m     D     |P |  Embed  Betti
40  580   0.75  40    √      √
50  725   0.59  50    √      √
60  1320  0.75  60    √      √
70  1435  0.59  70    ×      NA
80  1560  0.47  80    ×      NA

For all instances in which the problem was successfully embedded into the D-Wave 2X processor, our algorithm successfully calculated the Betti numbers. Note that we only observed the minimum energy solution among the many reads of each annealing process, since our task is to prove the concept. The reader should also note that these tests only show the correctness of each method's implemented algorithm. The performance and scaling characteristics of the discussed algorithms cannot be evaluated at the current size of quantum annealing processors.

3 Discussion

In this paper, we have discussed how quantum annealing can be used to speed up the homology computation of large point clouds and presented proof-of-concept tests using the newest D-Wave 2X quantum processor. Additionally, we have presented our work as a complete data mining pipeline.

Our pipeline is dedicated to dense graphs; sparse cases should be treated classically. Clearly, the complexity of the pipeline is determined by the dimension of the submatrices Bi of the boundary matrix (2.5). This dimension is given by the number of blown-up simplices, that is, simplices of the form σ ⊗ J with |J| > 1. Precisely, to compute the ℓth Betti number βℓ, we count all (ℓ + 1)-simplices of the form σ ⊗ J such that dim(σ) + |J| − 1 = ℓ + 1, in addition to |J| > 1. For the vertex and edge covers, the intersection between the subgraphs {Uj}j=0..k is always a clique. The column dimension of Bi needed for βℓ is then

Σ_{i=0}^{ℓ} C(ω, ℓ + 1 − i) C(κ, i + 2),

with ω := max {|∩j Uj|} the size of the maximum intersection and κ the maximum number of subgraphs Uj with nonempty overlap (i.e., the size of the maximum simplex in the nerve complex N{Uj}). Obviously, κ is less than θe(G), which, for dense graphs, is on the order of O(log(n)) in the edge clique cover case. If κ is very big, then the subgraphs Uj are small and thus ℓ is small (βℓ is also the Betti number of the initial witness complex K, and thus ℓ is limited by the size of the maximum subgraph). If ω is very big, then the graph is covered with only a small number of cliques. Almost all other cliques are confined inside these covering cliques and thus in the image of the boundary map. This implies that, after we mod out by the image of the boundary map, we do not have enough simplices to bound high-dimensional voids, and thus ℓ is small. The conclusion is that our algorithm, using the edge clique cover, is polynomial in time for graphs in which κ and ω are not very big. These are the type of graphs which are intractable classically. For the vertex clique cover, the argument and conclusion are the same, with θe(G) replaced by θ(G). By Brooks' theorem, θ(G) is bounded by the degree of the complement graph, and since the complement is sparse, θ(G) is small. For the edge disjoint-edge clique cover, the column dimension of Bi reduces to ω × ν with ν := card{J ⊂ {0, · · · , k} | |J| = ℓ + 2, ∩_{j∈J} Cj ≠ ∅}, since the intersection is forced to be at the vertex level only. In this case, our algorithm takes O(ω² × ν²), i.e., polynomial in the size of the graph, since ω < n.


4 Supplementary materials

4.1 Partitioning based parallelization of the homology computation

Partitioning the graph G using the minimum k-cut will not yield any significant speed-up; the computation is still exponentially expensive, and partitioning only allows parallelization. Additionally, it assumes the simplicial complex K is given, which is problematic. We present it here for completeness (it has been discussed in [LZ14], where the METIS library [KK98] is used).

The minimum k-cut is another NP-complete problem from Karp's list of 21 ([Kar72]). It consists of partitioning the vertex set of G into k non-empty, fixed-size subsets so that the total weight of the edges connecting distinct subsets is minimized. The QUBO formulation of the minimum k-cut problem is given in ([FP10]) (see also [Alg15]). For the k partitions Ui, 0 ≤ i ≤ k − 1, we need k binary decision vectors of size n. The row vector xi = (xi1, xi2, . . . , xin) is the part of the solution indicating membership: xij is equal to 1 if the vertex vj of G belongs to the ith partition, and 0 otherwise. Let X denote:

X = [x0 x1 · · · xk−1]T

The minimum k-cut graph partitioning problem is then the QUBO:

min_X  XT (Qmain + αQorth + βQcard) X,

Qmain = −Ik ⊗ A,
Qorth = (Jk − 2Ik) ⊗ In,
Qcard = Ik ⊗ (Jn − 2sav In).

The matrix Ik is the k × k identity matrix, Jk is the k × k matrix with all entries equal to 1, and sav = (⌈n/k⌉ + ⌊n/k⌋)/2 represents the average size (cardinality) of the partitions. Also, α and β are the orthogonality and cardinality constraint balancing factors.
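The QUBO above can be assembled directly with Kronecker products. The sketch below (our own numpy illustration; the penalty weights and the toy path graph are arbitrary, not taken from the paper) builds the matrix and checks that a balanced one-edge cut of a four-vertex path beats the balanced cut that severs all three edges:

```python
import numpy as np

def kcut_qubo(A, k, alpha=2.0, beta=2.0):
    """Assemble Qmain + alpha*Qorth + beta*Qcard for the minimum k-cut QUBO.

    A is the n x n adjacency matrix of G; alpha and beta are illustrative
    penalty weights, not values from the paper.
    """
    n = A.shape[0]
    Ik, In = np.eye(k), np.eye(n)
    Jk, Jn = np.ones((k, k)), np.ones((n, n))
    s_av = (np.ceil(n / k) + np.floor(n / k)) / 2.0
    Q_main = -np.kron(Ik, A)
    Q_orth = np.kron(Jk - 2.0 * Ik, In)
    Q_card = np.kron(Ik, Jn - 2.0 * s_av * In)
    return Q_main + alpha * Q_orth + beta * Q_card

# Path graph 0-1-2-3 with k = 2.  X stacks the indicator rows x0, x1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Q = kcut_qubo(A, 2)
good = np.array([1, 1, 0, 0, 0, 0, 1, 1], dtype=float)  # parts {0,1} | {2,3}
bad = np.array([1, 0, 1, 0, 0, 1, 0, 1], dtype=float)   # parts {0,2} | {1,3}
assert good @ Q @ good < bad @ Q @ bad  # the one-edge cut has lower energy
```

The Qorth term penalizes a vertex belonging to zero or several partitions, and the Qcard term penalizes partitions whose size deviates from sav.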

Once we pass this QUBO to the quantum annealer, we obtain subgraphs {Ui}0≤i≤k−1 which, together with an extra subgraph containing all of the edges between them (now minimized), define a covering of G. To execute the parallel computation, we need to complete this graph covering into a cover of K in terms of subcomplexes for which K = ∪ᵢ₌₀ᵏ Ki. For this, we assign the simplex σ ∈ K to the subcomplex Ki if its vertices belong to the subgraph Ui. Otherwise, the simplex is put in the extra cover Kk. Similar to the tables in Section 2.3.4, the table below shows some statistics for the graph partitioning.

n    m    D     k   |P|   Embed   Betti
12   24   0.36  4   48    √       √
12   36   0.55  4   48    √       √
12   48   0.73  4   48    √       √
16   40   0.33  4   64    ×       NA
16   56   0.47  4   64    ×       NA
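The assignment of simplices to subcomplexes described above can be sketched in a few lines. The function below is a hypothetical illustration of that step, not the paper's implementation:

```python
def cover_complex(simplices, part):
    """Distribute simplices over the subcomplexes K_0, ..., K_{k-1}, K_k.

    part maps each vertex to its partition index in 0..k-1.  A simplex whose
    vertices all lie in one subgraph U_i goes to K_i; a simplex spanning
    several subgraphs goes to the extra cover K_k.
    """
    k = max(part.values()) + 1
    covers = [[] for _ in range(k + 1)]
    for sigma in simplices:
        owners = {part[v] for v in sigma}
        covers[owners.pop() if len(owners) == 1 else k].append(sigma)
    return covers

# A filled triangle on vertices {0, 1, 2}, with vertex 2 in its own partition.
K = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
parts = {0: 0, 1: 0, 2: 1}
K0, K1, K_extra = cover_complex(K, parts)
```

Every simplex touching the cut between U0 and U1 lands in K_extra, which is exactly the extra cover that the minimized k-cut keeps small.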


4.2 Quantum annealing

Here we introduce the concept of quantum annealing, which ultimately solves a general Ising (QUBO) problem, and then discuss the important topic of embedding a QUBO problem into the specific quantum annealer (the D-Wave 2X processor).

Quantum annealing (QA), along with the D-Wave processor, has been the focus of much research. We refer the interested reader to [JAG+11, CCD15, BAS+13, BRI+14, LPS+14]. QA is a paradigm designed to find the ground state of systems of interacting spins represented by a time-evolving Hamiltonian:

S(s) = E(s) HP − (1/2) ∑_i ∆(s) σ_i^x,

HP = −∑_i h_i σ_i^z + ∑_{i<j} J_ij σ_i^z σ_j^z.

The parameters hi and Jij encode the particular QUBO problem P in its Ising formulation. QA is performed by first setting ∆ ≫ E, which results in a ground state into which the spins can be easily initialized. Then ∆ is slowly reduced and E is increased until E ≫ ∆. At this point the system is dominated by HP, which encodes the optimization problem, and thus the ground state represents the solution to the optimization problem.
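Since HP contains only σ^z operators, it is diagonal in the computational basis, so its ground state is simply the spin configuration minimizing the classical Ising energy. The toy sketch below (our own check of that statement, not a simulation of the anneal; all names are illustrative) makes this explicit for a two-spin instance:

```python
import numpy as np

def sigma_z_diag(i, N):
    """Diagonal of sigma^z on qubit i of N qubits: +1 for bit 0, -1 for bit 1."""
    return np.array([1 - 2 * ((b >> (N - 1 - i)) & 1) for b in range(2 ** N)])

def ising_ground(h, J):
    """Ground state of HP = -sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    N = len(h)
    diag = np.zeros(2 ** N)
    for i in range(N):
        diag -= h[i] * sigma_z_diag(i, N)
        for j in range(i + 1, N):
            diag += J[i][j] * sigma_z_diag(i, N) * sigma_z_diag(j, N)
    b = int(np.argmin(diag))
    spins = [1 - 2 * ((b >> (N - 1 - i)) & 1) for i in range(N)]
    return spins, float(diag[b])

# Two spins: the field pushes spin 0 up; the coupling then pushes spin 1 down.
spins, energy = ising_ground([1.0, 0.0], [[0.0, 1.0], [0.0, 0.0]])
```

Here the minimizer is spins = [1, -1] with energy -2, matching a hand calculation of the four candidate configurations.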

4.3 Embedding

An embedding is a mapping of the nodes of an input graph to the nodes of a destination graph. The graph representing the problem's QUBO matrix needs to be embedded into the actual physical qubits on the processor in order for the processor to solve the QUBO problem. The specific connectivity pattern of the qubits in the D-Wave chip is called the Chimera graph. Embedding an input graph (a QUBO problem graph) into the hardware graph (the Chimera graph) is in general NP-hard ([Cho08]).

Figure 4 shows a sample embedding into the Chimera graph of the D-Wave 2X chip, consisting of a 12 × 12 lattice of 4 × 4 bipartite blocks. The Chimera graph is structured so that the vertical and horizontal couplers in its lattice are connected only to either side of each bipartite block. Each node in this graph represents one qubit and each edge represents a coupling between two qubits. Adjacent nodes in the Chimera graph can be grouped together to form new effective (i.e., logical) nodes, creating nodes of higher degree. Such a grouping is performed on the processor by setting the coupler between two qubits to a large negative value, forcing the two Ising spins to align such that the two qubits end up with the same values. These effective qubits are expected to behave identically and to remain in the same binary state at the time of measurement. The act of grouping adjacent qubits (hence forming new effective qubits) is called chain creation or identification.
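The effect of a strong chain coupler can be checked by brute force on a toy Ising instance. The sketch below is our own illustration (the chain strength of -5.0 and the small instance are arbitrary choices, not values from the paper); it verifies that every ground state keeps the chained qubits aligned:

```python
from itertools import product

def ground_states(h, J):
    """Enumerate all spin assignments of a small Ising problem, keep the minima."""
    N = len(h)
    best, argbest = None, []
    for s in product((-1, 1), repeat=N):
        e = -sum(h[i] * s[i] for i in range(N))
        e += sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
        if best is None or e < best - 1e-9:
            best, argbest = e, [s]
        elif abs(e - best) <= 1e-9:
            argbest.append(s)
    return argbest

# Two logical spins on three physical qubits: the chain {0, 1} represents one
# logical spin, and qubit 2 the other.  The large negative coupler J(0,1) = -5.0
# makes any chain-breaking assignment energetically unfavourable.
h = [0.4, 0.0, -0.3]
J = {(0, 1): -5.0, (1, 2): 0.8}
for s in ground_states(h, J):
    assert s[0] == s[1]  # the chain is intact in every ground state
```

On hardware the chain strength cannot be made arbitrarily large (couplers are bounded), which is why choosing it is a delicate part of the embedding in practice.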

An embedding strategy consists of two tasks: mapping and identification. Mapping is the assignment of the nodes of the input graph to the single or effective nodes of the destination graph. Solving such problems optimally is in general NP-hard, but one can devise various approximation and enhancement strategies to overcome these difficulties, for example, using statistical search methods like simulated annealing, structure-based methods, or a combination of both. For a better understanding of current embedding approaches, we refer the reader to [Cho08], [BCI+14], [JWA14], [TAA15].

Figure 4: Top: A sample 1-skeleton, consisting of five cliques of size 14 overlapping on five nodes. Bottom: The actual embedding of the corresponding edge-disjoint edge clique cover problem inside the current D-Wave 2X Chimera graph. The QUBO problem has 50 variables, with 280 quadratic terms used to map the problem and 586 quadratic terms used to map chains. The colouring of the nodes (and edges, respectively) represents the h parameter values (the J values, respectively) according to the colouring scheme on the interval [−1, 1], represented by the colour interval [blue, red] in the D-Wave 2X processor API.

In Figure 4 (bottom), the blue lines indicate the identified couplers, the yellow lines indicate the problem couplers (i.e., the edges of the problem graph), and the grey lines indicate empty couplers.

Acknowledgements

We would like to thank M. Elsheikh for constructive discussions about the parallelization, S. S. Rezaei for his helpful comments, P. Haghnegahdar, J. Oberoi, and P. Ronagh for helpful discussions on the topic, and M. Bucyk for proofreading the manuscript.

References

[Alg15] Hedayat Alghassi, The algebraic QUBO design framework, Preprint (2015).

[Alo86] Noga Alon, Covering graphs by the minimum number of equivalence relations, Combi-natorica 6 (1986), no. 3, 201–206.

[BAS+13] Sergio Boixo, Tameem Albash, Federico M. Spedalieri, Nicholas Chancellor, andDaniel A. Lidar, Experimental signature of programmable quantum annealing, NatCommun 4 (2013).

[BCI+14] Zhengbing Bian, Fabian Chudak, Robert Israel, Brad Lackey, William G. Macready, and Aidan Roy, Discrete optimization using quantum annealing on sparse Ising models, Frontiers in Physics 2 (2014), no. 56.

[BH02] Endre Boros and Peter L. Hammer, Pseudo-Boolean optimization, Discrete Applied Mathematics 123 (2002), no. 1-3, 155–225.

[BHT07] Endre Boros, Peter L. Hammer, and Gabriel Tavares, Local search heuristics for quadratic unconstrained binary optimization (QUBO), Journal of Heuristics 13 (2007), no. 2, 99–132 (English).

[BRI+14] Sergio Boixo, Troels F. Rønnow, Sergei V. Isakov, Zhihui Wang, David Wecker, Daniel A. Lidar, John M. Martinis, and Matthias Troyer, Evidence for quantum annealing with more than one hundred qubits, Nat Phys 10 (2014), no. 3, 218–224.

[Bro41] R. L. Brooks, On colouring the nodes of a network, Mathematical Proceedings of theCambridge Philosophical Society 37 (1941), 194–197.

[Car09] Gunnar Carlsson, Topology and data., Bull. Am. Math. Soc., New Ser. 46 (2009), no. 2,255–308 (English).

[CCD15] Cristian S. Calude, Elena Calude, and Michael J. Dinneen, Guest column: Adiabaticquantum computing challenges, SIGACT News 46 (2015), no. 1, 40–61.

[Cho08] Vicky Choi, Minor-embedding in adiabatic quantum computation: I. the parametersetting problem, Quantum Information Processing 7 (2008), no. 5, 193–209.


[dSC04] Vin de Silva and Gunnar Carlsson, Topological estimation using witness complexes,Proceedings of the First Eurographics Conference on Point-Based Graphics (Aire-la-Ville, Switzerland, Switzerland), SPBG’04, Eurographics Association, 2004, pp. 157–166.

[dSGM05] Vin de Silva, Robert Ghrist, and Abubakr Muhammad, Blind swarms for coveragein 2-d., Robotics: Science and Systems (Sebastian Thrun, Gaurav S. Sukhatme, andStefan Schaal, eds.), The MIT Press, 2005, pp. 335–342.

[FP10] Neng Fan and Panos M. Pardalos, Linear and quadratic programming approaches forthe general graph partitioning problem, Journal of Global Optimization 48 (2010),no. 1, 57–71 (English).

[Hat02] Allen Hatcher, Algebraic topology., Cambridge: Cambridge University Press, 2002 (En-glish).

[JAG+11] M. W. Johnson, M. H. S. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Har-ris, A. J. Berkley, J. Johansson, P. Bunyk, E. M. Chapple, C. Enderud, J. P. Hilton,K. Karimi, E. Ladizinsky, N. Ladizinsky, T. Oh, I. Perminov, C. Rich, M. C. Thom,E. Tolkacheva, C. J. S. Truncik, S. Uchaikin, J. Wang, B. Wilson, and G. Rose, Quan-tum annealing with manufactured spins, Nature 473 (2011), no. 7346, 194–198.

[JWA14] Jun Cai, William G. Macready, and Aidan Roy, A practical heuristic for finding graph minors, Preprint arXiv:1406.2741 (2014).

[Kar72] Richard M. Karp, Reducibility among combinatorial problems, Complexity of ComputerComputations, ed. R.E. Miller, J.W. Thatcher and J.D. Bohlinger, 1972, pp. 85–103.

[KK98] George Karypis and Vipin Kumar, A fast and high quality multilevel scheme for par-titioning irregular graphs, SIAM J. Sci. Comput. 20 (1998), no. 1, 359–392.

[LCK+11] Hyekyoung Lee, M.K. Chung, Hyejin Kang, Bung-Nyun Kim, and Dong Soo Lee,Discriminative persistent homology of brain networks, Biomedical Imaging: From Nanoto Macro, 2011 IEEE International Symposium on, 2011, pp. 841–844.

[LGZ16] Seth Lloyd, Silvano Garnerone, and Paolo Zanardi, Quantum algorithms for topologicaland geometric analysis of data, Nat Commun 7 (2016).

[LPS+14] T. Lanting, A. J. Przybysz, A. Yu. Smirnov, F. M. Spedalieri, M. H. Amin, A. J.Berkley, R. Harris, F. Altomare, S. Boixo, P. Bunyk, N. Dickson, C. Enderud, J. P.Hilton, E. Hoskinson, M. W. Johnson, E. Ladizinsky, N. Ladizinsky, R. Neufeld, T. Oh,I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, S. Uchaikin, A. B. Wilson, andG. Rose, Entanglement in a quantum annealing processor, Phys. Rev. X 4 (2014),021041.

[LSL+13] P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan,J. Carlsson, and G. Carlsson, Extracting insights from the shape of complex data usingtopology, Scientific Reports 3 (2013), 1236 EP –.


[LZ14] Ryan Lewis and Afra Zomorodian, Multicore homology via Mayer–Vietoris, Preprint arXiv:1407.2275 (2014).

[Mas91] William S. Massey, A basic course in algebraic topology, Graduate Texts in Mathemat-ics, vol. 127, Springer-Verlag, New York, 1991. MR 1095046 (92c:55001)

[Seg68] Graeme Segal, Equivariant K-theory, Publications Mathématiques de l'Institut des Hautes Études Scientifiques 34 (1968), no. 1, 129–151.

[TAA15] Tomas Boothby, Andrew D. King, and Aidan Roy, Fast clique minor generation in Chimera qubit connectivity graphs, Preprint arXiv:1507.04774 (2015).

[vdVHvtV+02] Marc J. van de Vijver, Yudong D. He, Laura J. van ’t Veer, Hongyue Dai, Augusti-nus A.M. Hart, Dorien W. Voskuil, George J. Schreiber, Johannes L. Peterse, ChrisRoberts, Matthew J. Marton, Mark Parrish, Douwe Atsma, Anke Witteveen, An-nuska Glas, Leonie Delahaye, Tony van der Velde, Harry Bartelink, Sjoerd Rodenhuis,Emiel T. Rutgers, Stephen H. Friend, and Rene Bernards, A gene-expression signa-ture as a predictor of survival in breast cancer, New England Journal of Medicine 347(2002), no. 25, 1999–2009.

[ZC08] Afra Zomorodian and Gunnar Carlsson, Localized homology, Computational Geometry 41 (2008), no. 3, 126–148.

[Zom12] Afra Zomorodian, Advances in applied and computational topology, American Mathe-matical Society, Boston, MA, USA, 2012.
