QuickCSG: Fast Arbitrary Boolean Combinations of N Solids

HAL Id: hal-01587902

https://hal.inria.fr/hal-01587902

Submitted on 14 Sep 2017


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License


Matthijs Douze, Jean-Sébastien Franco, Bruno Raffin

To cite this version:

Matthijs Douze, Jean-Sébastien Franco, Bruno Raffin. QuickCSG: Fast Arbitrary Boolean Combinations of N Solids. [Research Report] ArXiv. 2017. ⟨hal-01587902⟩


http://creativecommons.org/licenses/by-nc-nd/4.0/



https://hal.archives-ouvertes.fr


June 7, 2017

Figure 1: Intersection of 6 Buddhas with the union of 100,000 spheres (total 24 million triangles). Computed in 8 seconds on a desktop machine.

Abstract

QuickCSG computes the result for general N-polyhedron boolean expressions without an intermediate tree of solids. We propose a vertex-centric view of the problem, which simplifies the identification of final geometric contributions, and facilitates its spatial decomposition. The problem is then cast in a single KD-tree exploration, geared toward the result by early pruning of any region of space not contributing to the final surface. We assume strong regularity properties on the input meshes and that they are in general position. This simplifying assumption, in combination with our vertex-centric approach, improves the speed of the approach. Complemented with a task-stealing parallelization, the algorithm achieves breakthrough performance, with speedups of one to two orders of magnitude with respect to state-of-the-art CPU algorithms, on boolean operations over two to dozens of polyhedra. The algorithm also outperforms GPU implementations with approximate discretizations, while producing an output without redundant facets. Despite the restrictive assumptions on the input, we show the usefulness of QuickCSG for applications with large CSG problems and strong temporal constraints, e.g. modeling for 3D printers, reconstruction from visual hulls and collision detection.

1 Introduction

Solid modeling using boolean operations is an emblematic problem in computer graphics and computational geometry, almost as old as these research topics themselves. It has found its way into every solid modeler in the industry, whether applied to model design for aviation, transportation, manufacturing, architecture, or entertainment. It is also a ubiquitous building block and subject of interest for many fields of research, including computer graphics, computer vision, robotics, virtual reality, and generally any topic where geometric models of subjects of interest are to be manipulated, constructed, truncated or combined.

Since the first introduction of boundary representations (B-Rep) Baumgart (1974), the problem has received considerable attention and been the subject of extensive work over more than 40 years. It is all the more striking that, despite the many existing algorithms and variants in this huge corpus, the vast majority of algorithms rely on a common principle and canvas found in the earliest formalizations of the problem Requicha and Voelcker (1985); Laidlaw et al. (1986), summarized hereafter. First and foremost, boolean B-Rep merging algorithms are written for the case of two solids. Second, the computation is divided in three stages: an initial subdivision stage, where the boundaries of both objects are split in two component groups along their intersection with the other object's boundary. A classification stage follows, where each group is classified as belonging inside or outside the other object. In the final reconstruction stage, the relevant primitives are gathered and connected to build the final model in accordance with the boolean expression. Note that the subdivision and classification require intersecting and situating all primitives of an object's boundary with respect to the primitives of the other object's boundary, which if done naively leads to impractical quadratic-time algorithms. Thus, a third aspect of most algorithms is the use of spatial decomposition structures, often hierarchical, to enable sublinear $O(\log m)$ access to each of the $m$ object primitives. The construction of this data structure becomes the bottleneck of the algorithm, giving it its typically quasilinear time complexity $O(m \log m)$ in the number of input primitives Naylor et al. (1990); Hachenberger et al. (2007). An inherent drawback of this dominant approach is that all input solids' primitives are fully decomposed, but typically only a fraction of those primitives contribute to the output result; the time spent computing hierarchical decompositions for non-contributing primitives is thus useless and can be eliminated, as we will show.

The more general case of n input solids is classically addressed by combining pairwise operations in arbitrary boolean expressions. The approach has been formalized as Constructive Solid Geometry (CSG) Requicha (1980); Mantyla (1987), where these expressions map to CSG-trees of boolean operations. Evaluating the result for B-rep solids then relies on combining two boundaries at each node of the tree using an existing two-solid boolean algorithm. This approach however has some significant drawbacks, as it involves computing and storing a set of intermediate results, which are then recombined until the tree expression is fully resolved. The successive recombinations can be error prone, in fact leading to inherently degenerate situations for some expressions. Figure 2 shows for example all possible boolean function outcomes for the case of three solids and illustrates such cases. Another inherent limitation is that certain expressions lead to combinatorial size trees which are impractical to compute. Consider for example identifying the solid whose volume is the intersection of at least k solids, an operation later referred to as min-k: this operation involves computing the union of all possible intersections over k or more solids, yielding a combinatorial CSG tree size.

In this paper, we challenge these dominant views of boolean modeling to eliminate these drawbacks. First, our algorithm directly computes the result of arbitrary n-ary boolean expressions, avoiding the overhead and intermediate results of binary CSG-tree approaches. This is achieved by directly identifying vertices of the final polyhedron among all relevant vertex candidates, with an efficient vertex classification test using bitvector evaluations of the n-ary CSG function.

Second, our algorithm performs the classification and subdivision stages simultaneously: as opposed to existing approaches, our hierarchical decomposition and exploration is performed on the combined set of all input solids' primitives and guided by their participation in the final solid B-rep.

Both key aspects of our algorithm are made possible by embedding all input solids in a single KD-tree exploration. The algorithm is able to retrieve the final result directly because each new KD-tree node explored is classified as soon as it is created, by inferring whether its content is completely inside or outside the final solid, or if it may participate in the final solid boundary instead. The KD-tree is only subdivided in the latter case, pruning large sets of primitives that do not need additional work, and focusing all computational effort and refinement on those space cells containing intersecting primitives participating in the final result.

Consequently, the depth and size of our KD-tree depend on the output model size instead of the input model size, leading to gains in time and space complexity that grow with the number of input solids and the input-to-output primitive ratio. This means that our algorithm also outperforms the traditional two-solid boolean algorithms as soon as the result size is smaller than the input size. Finally, as our KD-tree decomposes space into non-intersecting cells that can be processed independently, the classification stage and subsequent subdivision and reconstruction stages naturally lend themselves to parallel evaluation, further substantiating the temporal gain over all state-of-the-art polyhedral boolean evaluation algorithms tested, including recent GPU implementations.

Contribution summary.

• Complexity: we improve asymptotic upper bounds for running time and memory consumption, both analytically and experimentally, with a simple and parsimonious output-sensitive algorithm;

• Any boolean operation: QuickCSG directly computes the result of arbitrary boolean expressions over a set of input solids, certain classes of which cannot be realistically computed with existing binary boolean approaches;

• Applications on practical use cases: solid modeling for 3D printing, collision detection, and silhouette-based 3D reconstruction from visual inputs;

• Benchmark: we introduce a benchmark comprised of several dozen datasets, made available to the research community with our implementation.

The paper is organized as follows. We first review the state of the art of boolean operations (Section 2), then propose a formalization of the N-solid boolean problem (Section 3). We then explain how vertices (Section 4) and facets (Section 5) of the final polyhedron can be directly identified. We show how the problem can be cast as a KD-tree divide-and-conquer exploration (Section 6), and how the algorithm and this exploration in particular can be performed in parallel (Section 7). We validate the algorithm experimentally in Section 8 and discuss applications in Section 9.


Figure 2: There are $2^{2^3} = 256$ possible boolean functions of three inputs, of which we show 128, applied to 3D cuboids (the 128 other ones are the same with inside and outside flipped). In our figures, each input solid is assigned a color, which is inherited by the output facets it contributed to the result. Some functions cannot be computed from binary CSG operations without producing degeneracies. For example, the 2nd solid in reading order represents the masking operation $(P_1 \cap P_2) \cup (P_3 \cap P_2)$ with $P_2$ in green, $P_3$ in red and $P_1$ in blue. The union operation in this expression is degenerate because facets of $P_2$ appear on both sides.

2 Related Work

2.1 Boolean Solid Modelling Background

In the 1970's, boolean solid modeling and boundary representations (B-Reps) were simultaneously pioneered in the context of computer graphics Braid (1975) and computer vision Baumgart (1974). Both proposed discrete representations of solid boundaries as a conjunction of simpler polygonal primitives, either winged edges (Baumgart) or loops (Braid). While many of the ideas are already present in Braid's work, the idea that solids could be specified as a tree of boolean operations (Constructive Solid Geometry or CSG) was theorized by Requicha (1980), and various practical implementations were proposed for polyhedral boundaries Requicha and Voelcker (1985); Laidlaw et al. (1986), in particular setting the standard for the aforementioned 3-stage canvas. Robustness was by then identified as a recurring issue, due to the lack of a formal description of degenerate solid configurations, and numerical computation error in near-coincident situations. Most algorithms thereafter, including industrial implementations, thus conform to a set-based formalism with algebraically closed regularized boolean set operations Requicha (1977), ensuring results exclude any non-volume enclosing (dangling) surface primitives. Several works also took on the task of painstakingly accounting for all degenerate relative configurations of solid primitives Hoffmann (1989); Mantyla (1987), leading to tedious algorithm descriptions. They notably formalize the B-Rep primitive hierarchy as vertices, edges, faces and shells, and the two-polyhedra intersections and degeneracy cases as arising from the possible intersection combination of each primitive type of solid A to each primitive type of solid B. To avoid the complete enumeration, many implementations focus on generic triangle-to-triangle or polygon-to-polygon as their central intersection unit. Even with this simplification, the complexity of dealing with all cases is known to yield unreliable implementations, including in commercial software, as reported in various test cases Wang (2011); Feito et al. (2013). Our algorithm has a significantly simplified core that focuses all classification and subdivision efforts on producing the final output vertices, excluding higher order primitives or intermediate vertices. The final reconstruction stage operates on these vertices, identifying the topologically correct final edges and loops through a posteriori logical vertex-to-vertex reconnections. This improves both the clarity and regularity of the proposed algorithm, and paves the way for tackling the otherwise unaffordable exact topology retrieval in the general N-polyhedron case.

Robustness has remained a dominant issue, with the various proposed solutions reviewed by e.g. Hoffmann (2001); Li et al. (2004), such as geometric predicate analysis and fixed or arbitrary precision exact arithmetics. This effort has culminated with Hachenberger's work on CGAL Hachenberger et al. (2007), which uses arbitrary precision arithmetic, with the particularity that it follows Nef's formalism Nef (1978) instead of regularized booleans Requicha (1977), i.e. it explicitly represents dangling primitives. While now standing out as a reference implementation of the research community, it is notoriously slow, and as most exact schemes, tedious to re-implement, leading to a somewhat paradoxical status: while the perception of the community is that the polyhedral B-Rep boolean problem is solved, free and commercial code is still being crafted and distributed using fragile but fast and memory-efficient geometric predicate evaluations.

Most new contributions in this area are focusing on speeding up or easing the implementation of various algorithmic work cases of the classic two-polyhedron boolean algorithms, with e.g. new specialized data structures Campen and Kobbelt (2010), faster exact arithmetic types Bernstein and Fussell (2009), optimization for particular inputs such as triangular meshes Feito et al. (2013) or polyhedral cones of arbitrary basis Franco and Boyer (2009). Each implementation relies on specific and non-optimal tradeoffs between implementation complexity, speed, memory footprint, input genericity, robustness (or lack thereof). We simultaneously improve over all problematic aspects: our greedy pruning scheme eliminates the need to compute any intermediate geometry and thus improves the complexity, robustness, memory footprint and execution time while enabling multi-arity boolean operations on n polyhedra, including but not limited to binary boolean trees. As this result strongly relies on the careful use of hierarchical subdivision structures, we specifically review this aspect of prior art in the following section.

2.2 Subdivision Structures for Efficient Computation

Because of the need for efficient subdivision and classification stages in the algorithm, a substantial research effort has been devoted to hierarchical structures in the context of boolean solid modelling. Some of the earliest axis-aligned plane-separation structures in this context are the polytrees Carlbom (1987) and extended octrees, which embed the polyhedral B-Rep primitives in their nodes Brunet and Navazo (1990). Binary space partitions (BSP) of polyhedral B-Reps were devised as a way to more efficiently store the polyhedron, where separating planes are based on input facets Thibault and Naylor (1987). The most common strategy to compute boolean combinations of two solids with these representations is to perform simultaneous traversal of both hierarchies to isolate intersecting primitives Brunet and Navazo (1990), or similarly compute a merged BSP tree itself representing the result Naylor et al. (1990). Recent reference implementations continue to use variants of these seminal approaches, e.g. CGAL uses KD-trees as accelerated axis-aligned plane separating search structures Hachenberger et al. (2007), GTS uses axis-aligned bounding box (AABB) trees Popinet (2006), while Carve CSG uses octrees Sargeant (2011). A number of hybrid variants exist, which seek simultaneous benefit from the access simplicity of the octree structure and the representational flexibility of BSPs Adams and Dutre (2003).

Of significant interest among such hybrid methods, Pavic et al. (2010) examine the boolean CSG binary tree with a single octree to embed all input geometry, subdivide cells down to a fixed cell size as long as two input solids are volumetrically present, then classify each leaf cell after subdivision by evaluating the CSG boolean tree expression. A key difference with our proposal is that the resulting meshes are stitched with an approximate local triangulation at intersecting leaf cells, while we compute true surface-to-surface boolean contributions, for arbitrary boolean expressions that need not be expressed with a boolean tree. Feito et al. (2013) use a similar octree subdivision triggered by general two-surface presence, but focus only on triangular meshes and the two-solid case. Fundamentally, for both approaches it can be noted that classification is still independently computed and not used to guide the subdivision.

A compelling case we make in this paper is that separating subdivision and classification stages, as done by all boolean B-Rep algorithms we are aware of, leads to an inherently suboptimal boolean algorithm. In light of the review of prior art, this is because the hierarchical structures proposed are in the vast majority of cases constructed as alternate representations of each individual input solid, decomposing its geometric details with trees of logarithmic depth in that input solid's size. In contrast, our algorithm builds a single geometric decomposition, splitting nodes according to a partial classification of their content computed on the fly. Branches not contributing to the resulting solid are pruned away, building a tree whose nodes are focused on the final surface geometry, with a depth logarithmic in the number of intersections present in the resulting solid instead. This yields two fundamental improvements over the state of the art. First, the algorithmic complexity is improved as it is now proportional to the logarithm of the resulting solid size. Second, because this tree classifies final contributions on the fly during subdivision, our algorithm stores only the information of the currently explored tree branch. This frees the algorithm from storing the subdivision structures of each input solid in preparation for a separate classification stage, as with previous methods.

3 N-Polyhedron CSG Formalization

This section introduces the representation of the input solids and the CSG operation.

3.1 Definitions and Assumptions

We consider n input polyhedra $\{P_i\}_{i \in \{1, \cdots, n\}}$, whose surfaces are assumed to be closed orientable 2-manifolds embedded in $\mathbb{R}^3$, i.e. surfaces with no holes and with a consistent normal orientation. These classical assumptions ensure every polyhedron non-ambiguously defines a closed volume of $\mathbb{R}^3$. Each input polyhedron $P_i = (V_i, F_i)$ is defined by its set of vertices $V_i$ and facets $F_i$, each described as a loop of vertex indices whose order is consistent, e.g. given with counterclockwise orientation as seen from its outer region. We assume uniqueness of vertices, i.e. no vertex coordinates are duplicated and adjacent loops share common vertices. A polyhedron may have various connected components. Facets are assumed convex and described with a single loop to simplify the explanation and implementation, although the reasoning extends to general, non-convex polygons that may contain holes.

The resulting shape may be complex, as any output facet may be shaped by arbitrary primitives of all inputs. The complexity of possible degeneracies between all types of primitives for two-polyhedron booleans is already quite daunting and error-prone to implement Hoffmann (1989); ?. Generalizing Hoffmann's analysis to n-case degeneracies is not desirable, because the combinatorial possibilities of coincidental positioning of vertices, edges and facets are arbitrarily large. As an example, degenerate output vertices may arise from the coincidental positioning of anywhere from 4 to n input facets chosen among any input solid's facets, all of which would result in different special cases for reconstructing the vertex neighborhood. In practice, implementations most often get away with using double-precision floating-point arithmetic, as degeneracy cases are shown to be highly unlikely when dealing with noisy inputs, either resulting from an acquisition process Curless and Levoy (1996); Franco and Boyer (2009) or artificially generated with jittering for this purpose. We follow this approach in QuickCSG.

3.2 Boolean Functions of n inputs

Instead of the usual CSG tree form of boolean expressions, we provide a framework for arbitrary expressions. We express a boolean solid operation using a boolean-valued function over n boolean inputs, $f : \{0, 1\}^n \to \{0, 1\}$. We denote by $I_i(x) \in \{0, 1\}$ the indicator function of polyhedron $P_i$, whose value reflects whether a point $x \in \mathbb{R}^3$ is in polyhedron $P_i$'s inner volume. The indicator function $I_f(x)$ of the final solid $P_f$ can then be computed using $f$:

$$I_f(x) = f(I_1(x), \cdots, I_n(x)). \tag{1}$$

We define the indicator vector of point $x$ as the tuple of its n indicator functions, $I(x) = (I_1(x), \cdots, I_n(x))$, and denote the CSG operation as occurring over its indicator vector, i.e. $I_f(x) = f(I(x))$. Note that, as $P_i$ is given as a set of vertices and faces $(V_i, F_i)$, indicator function values $I_i(x)$ can be computed by shooting a ray and counting the winding numbers of $x$ Schneider and Eberly (2003). If a vertex is known to belong to the surface of a polyhedron, we denote the corresponding boolean value as 's', see Figure 3. Note that we never compute function $f$ on s inputs.
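To make the indicator computation concrete, the following Python sketch evaluates $I_i(x)$ for a triangulated polyhedron by ray shooting. It is an illustration under stated assumptions, not the QuickCSG implementation: it counts the parity of ray crossings (which agrees with the winding count for a closed, non-self-intersecting surface), uses the Möller-Trumbore ray-triangle test, and assumes the fixed ray direction does not graze any edge or vertex. The names `ray_hits_triangle`, `indicator`, `indicator_vector` and the tetrahedron data are hypothetical.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_hits_triangle(origin, direction, a, b, c, eps=1e-12):
    """Moller-Trumbore test: does origin + t * direction (t > 0) cross triangle abc?"""
    e1, e2 = _sub(b, a), _sub(c, a)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:              # ray parallel to the triangle's plane
        return False
    s = _sub(origin, a)
    u = _dot(s, p) / det            # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    q = _cross(s, e1)
    v = _dot(direction, q) / det    # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    return _dot(e2, q) / det > eps  # the hit must lie in front of the origin

# Arbitrary direction, assumed to be in general position w.r.t. the mesh.
GENERIC_DIRECTION = (0.3141, 0.5926, 0.7405)

def indicator(x, vertices, triangles, direction=GENERIC_DIRECTION):
    """I_i(x): 1 if x is inside the closed triangle mesh, 0 otherwise."""
    crossings = sum(
        ray_hits_triangle(x, direction, vertices[i], vertices[j], vertices[k])
        for (i, j, k) in triangles
    )
    return crossings % 2

def indicator_vector(x, polyhedra):
    """I(x) = (I_1(x), ..., I_n(x)) for a list of (vertices, triangles) pairs."""
    return tuple(indicator(x, V, T) for (V, T) in polyhedra)

# Hypothetical test solid: the unit tetrahedron.
TETRA_VERTICES = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
TETRA_TRIANGLES = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
```

Each query is linear in the number of input facets, which is why the text later counts ray shooting as a linear operation inside the vertex predicates.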

Figure 3: Combination of three solids, with the orders of the vertices (in black circles). Left: the indicator vector for some of the vertices.

Vertex indicators: v0 = (s, 0, s), v4 = (s, s, s), v6 = (s, 0, s), v7 = (s, 0, s), v8 = (0, s, s), v9 = (0, s, 0).

Any indicator function $I_f(x)$ can be evaluated from classical binary boolean operators (e.g. as a conjunction of disjunctions), but alternative evaluations are also possible based on, for instance, higher arity boolean operators or arithmetic operations. The rationale is to simplify the expression of some operations and speed up the evaluation. The following examples show a few operators whose n-ary formulation enables efficient evaluations:

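$$
\begin{aligned}
\text{$n$-Intersection:} \quad & I_{\cap}(x) = \min(I_1(x), \cdots, I_n(x)), & (2) \\
\text{$n$-Union:} \quad & I_{\cup}(x) = \max(I_1(x), \cdots, I_n(x)), & (3) \\
\text{Mutual exclusion:} \quad & I_{\mathrm{xor}}(x) = I_1(x) \;\mathrm{xor}\; \cdots \;\mathrm{xor}\; I_n(x), & (4) \\
\text{In $k$ or more solids:} \quad & I_{\text{min-}k}(x) = \Big(\sum_i I_i(x)\Big) \geq k. & (5)
\end{aligned}
$$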

The min-k operation retrieves the solid that is part of at least k input polyhedra. This operation in particular would be tedious to decompose over a binary CSG tree: it requires evaluating the union of all possible intersections of k solids, leading to a tree of combinatorial size.
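The size argument can be checked on small cases. The sketch below, with hypothetical helper names, compares the direct evaluation of Equation (5) with the tree-style expansion (a union over all k-subsets of the intersection of the chosen inputs) and counts the number of intersection terms that expansion needs, namely $\binom{n}{k}$.

```python
from itertools import combinations, product
from math import comb

def min_k_direct(bits, k):
    """Equation (5): 1 if at least k indicator bits are set."""
    return int(sum(bits) >= k)

def min_k_tree(bits, k):
    """Tree-style expansion: union over all k-subsets of their intersection."""
    n = len(bits)
    return int(any(all(bits[i] for i in subset)
                   for subset in combinations(range(n), k)))

def tree_terms(n, k):
    """Number of k-wise intersection terms in the expanded expression."""
    return comb(n, k)

def expansions_agree(n, k):
    """Exhaustively compare both evaluations over all 2^n indicator vectors."""
    return all(min_k_direct(bits, k) == min_k_tree(bits, k)
               for bits in product((0, 1), repeat=n))
```

For instance, `tree_terms(20, 10)` is 184756, whereas the direct form remains a single population count and comparison.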

Since the indicator vector can be efficiently represented as a bit vector stored in machine words, evaluating typical boolean functions $f$ is for most practical purposes a constant time operation in n, either by directly evaluating a boolean test expression over the machine word, or by building lookup/hash tables for compatible expressions. Any binary tree of CSG operations can be expressed as a single boolean function. Therefore, models built by binary CSG operations can be directly constructed by our algorithm. We will see below how these definitions are used to make final boundary surface decisions.
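A minimal sketch of this bit-vector view follows, assuming a packing convention of our own choosing (bit $i-1$ of the word stores $I_i$) and hypothetical function names. It evaluates Equations (2) to (5) on a packed word, and flattens a small binary CSG tree, $(P_1 \cup P_2) \setminus P_3$, into a lookup table indexed by the word.

```python
def pack(bits):
    """Pack (I_1, ..., I_n) into an integer, with I_1 stored in bit 0."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word

def popcount(word):
    return bin(word).count("1")

def f_intersection(word, n):
    return int(word == (1 << n) - 1)   # all n bits set

def f_union(word, n):
    return int(word != 0)              # at least one bit set

def f_xor(word, n):
    return popcount(word) & 1          # odd number of bits set

def f_min_k(word, n, k):
    return int(popcount(word) >= k)    # at least k bits set

def f_tree(word, n):
    """Binary CSG tree (P1 union P2) minus P3, written as one boolean function."""
    p1, p2, p3 = word & 1, (word >> 1) & 1, (word >> 2) & 1
    return int((p1 or p2) and not p3)

def make_lookup(f, n):
    """Tabulate f over all 2^n indicator words (practical for small n only)."""
    return [f(word, n) for word in range(1 << n)]
```

The lookup table trades $2^n$ entries of memory for a single indexed read per evaluation, so it is only a reasonable choice when n is small.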

4 Final Polyhedron Vertices

We now analyze the structure of the final polyhedron $P_f$, focusing on its vertices. With primitives in generic (i.e. non coincidental) position as assumed, vertices of the final polyhedron can be of only three types (Figure 3):

• First order vertices are vertices already present in one of the input polyhedra $P_i$'s vertex set $V_i$.

• Second order vertices result from the intersection of an edge of a polyhedron $P_i$ and the facet of another polyhedron $P_j$.

• Third order vertices result from the intersection of three facets of three different polyhedra $P_i$, $P_j$, and $P_k$.

Thus, the order of a vertex is the number of s bits in its indicator vector. A trivial way to generate all possible vertex candidates is to examine all input vertices, edge-to-facet combinations, and three-facet combinations, and compute the resulting geometric intersections using standard algorithms Schneider and Eberly (2003). Input primitives may intersect at various locations in space, without necessarily participating in the final surface, as determined by the CSG function. We thus propose a classification process to select the candidate vertices participating in the output result. Our description is illustrated on the left column of Figure 4, which summarizes the geometry of vertices of each order, and the notations used. In this figure, orientation information is given in red, and classification information in green. Small green grids are given to break down local subvolume configurations around the vertex $v$, and their corresponding classification bits, which are defined hereunder.
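As a small illustration of the first sentence, the order can be read directly off an indicator vector. The sketch below uses the indicator vectors listed with Figure 3; the representation (a tuple mixing 0, 1 and the string "s") and the function name are assumptions made for the example.

```python
def vertex_order(indicator):
    """Order of a vertex: the number of 's' (on-surface) entries in its indicator vector."""
    return sum(1 for value in indicator if value == "s")

# Indicator vectors as listed with Figure 3.
FIGURE3_INDICATORS = {
    "v0": ("s", 0, "s"),
    "v4": ("s", "s", "s"),
    "v6": ("s", 0, "s"),
    "v7": ("s", 0, "s"),
    "v8": (0, "s", "s"),
    "v9": (0, "s", 0),
}

FIGURE3_ORDERS = {name: vertex_order(ind) for name, ind in FIGURE3_INDICATORS.items()}
```

With these inputs, v9 comes out as first order, v4 as third order, and the remaining vertices as second order.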

4.1 Vertex Classification

Intuitively, a necessary condition for a vertex candidate to be kept is for it to lie on the border of the final solid; in other words, it should be part of a surface transition from inside to outside $P_f$. But this condition is not always sufficient.

The condition is sufficient for a first order vertex. By definition, the candidate vertex $v$ participates in the boundary of one input polyhedron $P_i$, which separates the vicinity of the vertex into two subvolumes, inside and outside $P_i$. For this vertex to lie on the boundary of $P_f$, it must also partition the surrounding volume into inside and outside regions of $P_f$. This information can be obtained by examining how $f(I(v))$ transitions at the boundary of $P_i$, i.e. when the i-th bit of the indicator vector, initially an s bit, is flipped between 0 and 1:

$$f(I_1(v), \cdots, 0, \cdots, I_n(v)) \neq f(I_1(v), \cdots, 1, \cdots, I_n(v)). \tag{6}$$

This condition is denoted isFinal1(v), and tests whether there is a final indicator change when the boundary of $P_i$ is traversed. If the two expressions were equal, then the vertex $v$ would be completely inside (both 1) or outside (both 0) $P_f$. This assumes all other bits $I_j(v)$, with $j \neq i$, were computed for vertex $v$ by e.g. ray shooting.
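Equation (6) translates almost literally into code. The sketch below is a hedged illustration in which the CSG function `f` takes a tuple of bits, the indicator vector is a tuple whose i-th entry (0-based here, unlike the 1-based indices of the text) is the 's' placeholder, and all names are hypothetical.

```python
def is_final1(f, indicator, i):
    """Equation (6): does f change when the i-th ('s') bit is flipped from 0 to 1?"""
    outside = list(indicator)
    inside = list(indicator)
    outside[i] = 0   # just outside P_i
    inside[i] = 1    # just inside P_i
    return f(tuple(outside)) != f(tuple(inside))

def f_union(bits):
    return int(any(bits))

def f_difference(bits):
    """P1 minus P2."""
    return int(bits[0] and not bits[1])
```

For example, with `f_union` over three solids, a vertex of the first solid with indicator `("s", 0, 0)` is kept, while the same vertex with indicator `("s", 1, 0)` is discarded because it is buried inside the second solid.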

Second order vertex. Because facets of two polyhedra $P_i$ and $P_j$ are involved, the volume surrounding the vertex candidate is locally partitioned in four subvolumes, each of which may be decided to be inside or outside the final polyhedron $P_f$ by the CSG function $f$. We must therefore examine how each combination of boundary traversals at vertex $v$ influences the final indicator function, by evaluating the corresponding i and j bit-flippings of $I(v)$:

$$
\begin{aligned}
b_{00} &= f(I_1(v), \cdots, 0, \cdots, 0, \cdots, I_n(v)) \\
b_{01} &= f(I_1(v), \cdots, 0, \cdots, 1, \cdots, I_n(v)) \\
b_{10} &= f(I_1(v), \cdots, 1, \cdots, 0, \cdots, I_n(v)) \\
b_{11} &= f(I_1(v), \cdots, 1, \cdots, 1, \cdots, I_n(v)).
\end{aligned} \tag{7}
$$

We call $b(v) = (b_{00}, b_{01}, b_{10}, b_{11}) \in \{0, 1\}^4$ the classification vector of second order vertex $v$. Trivially, if all four bits turn out equal, the vertex $v$ is either completely inside (all 1's) or outside (all 0's) of $P_f$ and does not participate in the final surface. Vertex candidates may lie on the final boundary and still not participate in the final surface description if the vertex is introduced in the middle of a final edge. This happens as soon as the bit pattern of $v$ is symmetric along one of the components i or j, which means that traversing the vertex along this border does not change the final primitive participation of the other polyhedron. A sufficient condition can thus be written as the predicate isFinal2(v), which rules out any topological symmetries along the i or j components, as underlined in subscripts:

$$
\big( (b_{\underline{0}0}, b_{\underline{0}1}) \neq (b_{\underline{1}0}, b_{\underline{1}1}) \big) \wedge \big( (b_{0\underline{0}}, b_{1\underline{0}}) \neq (b_{0\underline{1}}, b_{1\underline{1}}) \big) \tag{8}
$$

It can be noted that this condition includes the necessary conditions, since complete inclusion or exclusion is also a case of pattern symmetry.
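Equations (7) and (8) can be sketched as follows, under the same illustrative conventions as before: `f` takes a tuple of bits, positions `i` and `j` are 0-based and hold 's' placeholders, and the function names are hypothetical.

```python
def classification_vector2(f, indicator, i, j):
    """Equation (7): evaluate f for the four (i, j) bit combinations."""
    b = {}
    for bi in (0, 1):
        for bj in (0, 1):
            probe = list(indicator)
            probe[i], probe[j] = bi, bj
            b[(bi, bj)] = f(tuple(probe))
    return b

def is_final2(f, indicator, i, j):
    """Equation (8): reject patterns that are symmetric along i or along j."""
    b = classification_vector2(f, indicator, i, j)
    changes_along_i = (b[0, 0], b[0, 1]) != (b[1, 0], b[1, 1])
    changes_along_j = (b[0, 0], b[1, 0]) != (b[0, 1], b[1, 1])
    return changes_along_i and changes_along_j

def f_union(bits):
    return int(any(bits))

def f_first_only(bits):
    """A function that ignores every input except the first."""
    return bits[0]
```

With `f_union`, the candidate `("s", "s", 0)` is kept and `("s", "s", 1)` is rejected. With `f_first_only`, the candidate `("s", "s")` is rejected even though it lies on the final boundary: the pattern is symmetric along j, which corresponds to the mid-edge case described above.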

Third order vertex. At the intersection locus of three facets from three input polyhedra $P_i$, $P_j$, $P_k$, the vertex's neighborhood is locally split in eight subvolumes. The analysis is analogous to second order vertices, and requires examining the influence of crossing the three boundaries. The corresponding 8 combinations of i, j and k bit-flippings are:

$$
\begin{aligned}
b_{000} &= f(I_1(v), \cdots, 0, \cdots, 0, \cdots, 0, \cdots, I_n(v)) \\
b_{001} &= f(I_1(v), \cdots, 0, \cdots, 0, \cdots, 1, \cdots, I_n(v)) \\
&\;\;\vdots \\
b_{111} &= f(I_1(v), \cdots, 1, \cdots, 1, \cdots, 1, \cdots, I_n(v)).
\end{aligned} \tag{9}
$$

We call $b(v) = (b_{000}, \cdots, b_{111}) \in \{0, 1\}^8$ the classification vector of third order vertex $v$. Similarly to the order-2 case, complete inclusion or exclusion of the volume rules out the vertex, as well as any axis symmetries, which can be jointly evaluated with the predicate isFinal3(v):

$$
\begin{aligned}
& \big( (b_{000}, b_{001}, b_{010}, b_{011}) \neq (b_{100}, b_{101}, b_{110}, b_{111}) \big) \\
\wedge\; & \big( (b_{000}, b_{001}, b_{100}, b_{101}) \neq (b_{010}, b_{011}, b_{110}, b_{111}) \big) \\
\wedge\; & \big( (b_{000}, b_{010}, b_{100}, b_{110}) \neq (b_{001}, b_{011}, b_{101}, b_{111}) \big)
\end{aligned} \tag{10}
$$

If none of these conditions is met, the vertex is completely inside or outside the result polyhedron. If one (resp. two) of the conditions is (resp. are) met, the vertex is on a facet (resp. edge) of the final polyhedron.
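The three conjuncts of Equation (10) and the interpretation of partial matches can be sketched as follows. As before, this is an illustration with hypothetical names: `f` takes a tuple of bits and `i`, `j`, `k` are 0-based positions of the 's' entries.

```python
from itertools import product

def classification_vector3(f, indicator, i, j, k):
    """Equation (9): evaluate f for the eight (i, j, k) bit combinations."""
    b = {}
    for bits in product((0, 1), repeat=3):
        probe = list(indicator)
        for position, bit in zip((i, j, k), bits):
            probe[position] = bit
        b[bits] = f(tuple(probe))
    return b

def asymmetric_axes(b):
    """Count the conjuncts of Equation (10) that hold."""
    count = 0
    for axis in range(3):
        # Both halves are listed in the same order of the two remaining bits.
        low = [b[bits] for bits in sorted(b) if bits[axis] == 0]
        high = [b[bits] for bits in sorted(b) if bits[axis] == 1]
        count += low != high
    return count

def is_final3(f, indicator, i, j, k):
    return asymmetric_axes(classification_vector3(f, indicator, i, j, k)) == 3

# Reading of the number of satisfied conditions, following the text.
LOCATION = {0: "inside or outside", 1: "facet", 2: "edge", 3: "vertex"}

def locate(f, indicator, i, j, k):
    return LOCATION[asymmetric_axes(classification_vector3(f, indicator, i, j, k))]
```

For the intersection of three solids, `locate(lambda b: int(all(b)), ("s", "s", "s"), 0, 1, 2)` yields `"vertex"`, whereas a function that ignores the third solid can reach `"edge"` at most.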

Specificity of the 2-polyhedron case. Interestingly, in this situation, there are only first and second order vertices with no axis symmetries, i.e. all order two vertex candidates participate in $P_f$, which has been analyzed by Franco et al. (2013).
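This property can be checked by brute force over the 16 boolean functions of two inputs. The sketch below applies the predicate of Equation (8) directly to each truth table $(b_{00}, b_{01}, b_{10}, b_{11})$; the names are hypothetical, and the reading that the property concerns functions depending on both inputs is our assumption.

```python
from itertools import product

def is_final2_bits(b00, b01, b10, b11):
    """Equation (8) applied to a classification vector given as four bits."""
    return (b00, b01) != (b10, b11) and (b00, b10) != (b01, b11)

# Usual two-solid operations, written as truth tables (b00, b01, b10, b11).
NAMED_OPERATIONS = {
    "union": (0, 1, 1, 1),
    "intersection": (0, 0, 0, 1),
    "difference": (0, 0, 1, 0),
    "xor": (0, 1, 1, 0),
}

def count_accepting_functions():
    """Number of two-input boolean functions that keep second order candidates."""
    return sum(is_final2_bits(*table) for table in product((0, 1), repeat=4))
```

Ten of the sixteen tables pass, including all four named operations. The six that fail are the two constants and the four functions that ignore one of the inputs, which do not describe a genuine two-solid operation.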

Figure 4: Vertex configurations and their corresponding possible looplets.

Panels: first order vertex (predicate: $b_0 \neq b_1$); second order vertex (predicate: $((b_{00}, b_{01}) \neq (b_{10}, b_{11})) \wedge ((b_{00}, b_{10}) \neq (b_{01}, b_{11}))$); third order vertex (looplets shown on quadrant $(r, s) = (0, 0)$, where for $r, s \in \{0, 1\}$ we define $t^+_{rs} := ((b_{0rs}, b_{1rs}) = (0, 1))$ and $t^-_{rs} := ((b_{0rs}, b_{1rs}) = (1, 0))$). Each panel lists the looplets generated on the facets involved, written in the form $(d_{13}, v, d_{12} \mid F_1^+)$.

4.2 Vertex Retrieval Summary

We illustrate in Figure 5 how the set of $P_f$'s vertices may be retrieved using a simple algorithm with quartic worst-case complexity (three nested loops over facets, each candidate requiring a linear-time indicator evaluation). It loops over all 1-, 2- and 3-facet combinations, using the previously defined isFinal1, isFinal2 and isFinal3 predicates. These predicates include ray-shooting operations to compute the vertex's indicator vector, an operation linear in the number of input facets. The algorithm uses classical intersection functions (Schneider and Eberly, 2003): intersect2facets computes the intersection edge between two convex facets, as a pair of vertices giving the edge extremities, and intersectSegmentFacet computes the vertex representing the intersection of a segment and a facet, if any. Note that the quartic behaviour is in practice mitigated by the fact that the third order loop is only executed if a pair of intersecting input faces was already encountered.

5 Final Polyhedron Connectivity

Once the subset of final vertices $V_f$ is known through the classification process, the main task left is to identify how vertices are connected together to form the faces $F_f$ of the final polyhedron $P_f$. Our method makes a clear distinction between computing the geometric coordinates of final vertices and building the final topology. While the former involves numerical coordinate construction, the latter relies only on orientation and ordering predicates.

The classification vectors introduced previously not only inform us of the participation of a given vertex $v$ in $P_f$, they also give a snapshot of volume and surface adjacencies around the vertex, as illustrated in Figure 4. We show here how to find all the polygons $v$ participates in, by introducing the looplet construct. We define a looplet of $v$ as a loop fragment running through this vertex, represented by a symbolic incoming and outgoing edge direction. Its purpose is to compactly represent all partial adjacency information available for individual unconnected vertices, while being independently computable at each vertex.

5.1 Surface and Edge Orientation

Surfaces and edges need to be oriented to define looplets. Surface boundaries contributing to the final result $P_f$ may change orientation (reversed normal), e.g. when participating in a boolean subtraction. For a facet $F_a$ of any polyhedron, we denote the orientation of its contributions to the final surface as $F_a^+$ if the contribution conserves the initial orientation, and $F_a^-$ if the orientation of the contribution is inverted. In a similar spirit, we need to define an intrinsic edge orientation $d_{ab}$ for every edge adjacent to two faces $F_a$ and $F_b$. For this purpose we distinguish two cases:

• if the edge is pre-existing from an input $P_i$, $d_{ab}$ is the edge direction with polygon $F_a$ on its left and $F_b$ on its right on the oriented surface.

• if the edge arises from the intersection of $F_a$ and $F_b$, we define the edge direction $d_{ab} = F_a \times F_b$, as the cross product of the corresponding face normals.

In both cases $d_{ab} = -d_{ba}$. With these definitions we can introduce a concise notation for looplets, as $(d_{ab}, v, d_{ac} | F_a^+)$, a looplet characterized as a positive contribution in $F_a$, centered on vertex $v$, with incoming edge direction $d_{ab}$ and outgoing edge direction $d_{ac}$. Looplets can be symbolically and compactly stored as a vertex reference for $v$, an orientation bit, and three ordered face references ($a$, $b$, $c$ in the above example), since the incoming and outgoing directions share the face $a$ in their adjacencies. In the following, we describe how classification vectors determine which looplets are present at a vertex, upon which the final polyhedron facets can be built. For this purpose, we break down the presentation of looplet generation cases for each vertex order. This breakdown is illustrated in the right column of Figure 4, where each subfigure shows in green the bit classification state deciding the presence of the corresponding looplets, annotated under each figure in blue.
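To make the notation concrete, here is a minimal Python sketch of a looplet record. The class name, field names and the choice of storing each direction as an explicit ordered pair of facet indices are assumptions of this example; the three-reference encoding described above is more compact, but its exact sign convention is not spelled out in this section.

```python
from dataclasses import dataclass

def reverse(d):
    """d_ab = -d_ba: reversing a direction swaps its two facet indices."""
    a, b = d
    return (b, a)

@dataclass(frozen=True)
class Looplet:
    d_in: tuple      # incoming direction, as an ordered pair of facet indices
    vertex: int      # reference to the vertex v
    d_out: tuple     # outgoing direction, same convention
    facet: int       # facet F_a carrying the looplet
    positive: bool   # orientation bit: True for F_a^+, False for F_a^-

    def __post_init__(self):
        # Both directions must be adjacent to the facet carrying the looplet.
        if self.facet not in self.d_in or self.facet not in self.d_out:
            raise ValueError("directions must share the looplet's facet")

    def __str__(self):
        sign = "+" if self.positive else "-"
        return (f"(d{self.d_in[0]}{self.d_in[1]}, v{self.vertex}, "
                f"d{self.d_out[0]}{self.d_out[1]} | F{self.facet}{sign})")

example = Looplet(d_in=(0, 6), vertex=0, d_out=(0, 1), facet=0, positive=True)
```

Printing `example` gives the same textual form as the notation used in the text.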

5.2 First Order Vertex Looplets

A first order vertex $v$ that passed the classification test is on the surface boundary of a polyhedron $P_i$ and also on the boundary of $P_f$. Each of its adjacent facets thus at least partially participates in $P_f$, and contributes a looplet for this vertex. The directions of this looplet are given by the edges adjacent to the facets of the looplet. If there is no change of orientation at the vertex $v$, i.e. $b_0 = 0$ and $b_1 = 1$, the edges and facet keep the orientation they had on the initial polyhedron, e.g. yielding a looplet $(d_{13}, v, d_{12} | F_1^+)$ for facet $F_1$ in Figure 4. In contrast, looplets are inverted in case of surface orientation change, i.e. when $b_0 = 1$ and $b_1 = 0$, e.g. yielding looplet $(d_{21}, v, d_{31} | F_1^-)$ for facet $F_1$.
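The first order rule can be sketched as follows. The function name and the tuple representation of directions (an ordered pair of facet indices, with reversal implementing $d_{ab} = -d_{ba}$) are assumptions of this example; the inversion rule is read off the two looplets quoted above, where the inverted looplet swaps incoming and outgoing edges and reverses both.

```python
def reverse(d):
    """d_ab = -d_ba, with a direction stored as a pair of facet indices."""
    a, b = d
    return (b, a)

def first_order_looplet(b0, b1, d_in, d_out, facet):
    """Return (incoming, outgoing, facet, sign) for one facet adjacent to a
    first order vertex, or None if the vertex is not on the final surface.
    d_in and d_out are the edge directions on the input polyhedron."""
    if (b0, b1) == (0, 1):      # orientation preserved
        return (d_in, d_out, facet, "+")
    if (b0, b1) == (1, 0):      # orientation inverted: traverse the other way
        return (reverse(d_out), reverse(d_in), facet, "-")
    return None                 # b0 == b1: vertex does not pass the predicate
```

With `d_in = (1, 3)` and `d_out = (1, 2)` on facet 1, the two cases reproduce the looplets $(d_{13}, v, d_{12} | F_1^+)$ and $(d_{21}, v, d_{31} | F_1^-)$ of the text.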

5.3 Second Order Vertex Looplets

Second order vertices are the intersection of the edge of a polyhedron $P_i$ and the facet of a polyhedron $P_j$. As such, each involves three facets: one facet labeled $F_2$ from $P_j$, and two facets $F_1$ and $F_3$ from $P_i$, adjacent to the edge yielding $v$ from its intersection with facet $F_2$. We here assume the facet orientation of Figure 4, without loss of generality: if the normal of one of the two surfaces is inverted, swapping $F_1$ and $F_3$ brings us back to this reference configuration.

The crossing surfaces at the vertex define four boundaries between four subvolumes, each with two possible orientations. In the case of $F_2$, each of the two subvolume boundaries may yield one looplet in each orientation, e.g. for the top subvolume boundary in Figure 4 the two possible looplets are $(d_{12}, v, d_{32} | F_2^+)$ and $(d_{23}, v, d_{21} | F_2^-)$. One of the two possible looplets is generated for a subvolume boundary as soon as the classification bits of the two subvolumes it separates have different values. The looplet generated is the one consistent with the orientation of the final surface, with positive normal going from


function CSGVertices
    Input: V, F: set of vertices and facets of input polyhedra
    Output: V_f: corresponding set of final output vertices
    for F_1 in F do                                      ▷ Enumerate input faces
        for v in F_1 do                                  ▷ Order-1 candidates
            if isFinal1(v) then V_f ← V_f ∪ {v}
        end for
        for F_2 in F do
            v_1, v_2 ← intersect2facets(F_1, F_2)        ▷ Order-2 candidates
            if {v_1, v_2} = ∅ then continue F_2 loop     ▷ No intersection
            if isFinal2(v_1) then V_f ← V_f ∪ {v_1}
            if isFinal2(v_2) then V_f ← V_f ∪ {v_2}
            for F_3 in F do                              ▷ Order-3 candidates
                v ← intersectSegmentFacet(v_1, v_2, F_3)
                if v ≠ ∅ and isFinal3(v) then V_f ← V_f ∪ {v}
            end for
        end for
    end for
end function

Figure 5: Brute-force algorithm to find all vertices of the result polyhedron.

the inside ($b_{\ast\ast} = 1$) to the outside ($b_{\ast\ast} = 0$) of the final volume.

The decision scheme is analogous for looplets of $F_1$ and $F_3$. Looplets of these facets are always decided simultaneously, as they are determined by the same classification bits.

5.4 Third Order Vertex Looplets

The eight subvolumes surrounding $v$ are separated by the three facets $F_1$, $F_2$, $F_3$. The configuration shown in Figure 4 assumes that the normals of these three facets form a right-handed trihedron. This is again without loss of generality: should one of the facets have an opposite normal, a single permutation in the order in which facets are considered brings us back to this reference configuration. Because the looplet possibilities are analogous for all three facet planes, we shall only enumerate the configurations for $F_1$. The enumeration also has a rotational symmetry within the facet plane, since it is divided into four quadrants by the other two facets. The four quadrants are indexed with tuples $(r, s) \in \{0, 1\}^2$. Figure 4 shows that there are four possible looplets for a quadrant, bringing the total possible order-three looplet count to $3 \times 4 \times 4 = 48$ for a vertex. We focus our description on one quadrant of $F_1$, where $(r, s) = (0, 0)$. To ease the description, we introduce two intermediate boolean predicates $t^+_{rs}$, $t^-_{rs}$ for each quadrant $(r, s)$ of $F_1$. They are computed as $t^+_{rs} := ((b_{0rs}, b_{1rs}) = (0, 1))$ and $t^-_{rs} := ((b_{0rs}, b_{1rs}) = (1, 0))$, indicating whether the surface portion corresponding to the quadrant participates in the final surface as a positive or negative contribution in the plane of $F_1$.

Looplet decisions involve examining several quadrant boundary predicates. Concave looplets exist if three of the four quadrant boundaries exist for an orientation and the fourth does not, i.e. $(d_{13}, v, d_{12} | F_1^+)$ exists if $(t^+_{00}, t^+_{01}, t^+_{10}, t^+_{11}) = (0, 1, 1, 1)$, and $(d_{21}, v, d_{31} | F_1^-)$ exists if $(t^-_{00}, t^-_{01}, t^-_{10}, t^-_{11}) = (0, 1, 1, 1)$. On the other

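[Figure 6 (image not reproduced). The drawing labels facets $F_0$, $F_1$, $F_2$, $F_3$, $F_5$, vertices $v_0$ to $v_8$ and edge directions $d_{01}$, $d_{02}$, $d_{03}$, $d_{04}$, $d_{06}$. Looplets listed in the table:

$(d_{06}, v_0, d_{01} | F_0^+)$
$(d_{04}, v_1, d_{06} | F_0^+)$
$(d_{05}, v_2, d_{04} | F_0^+)$
$(d_{01}, v_3, d_{05} | F_0^+)$
$(d_{30}, v_4, d_{01} | F_0^+)$
$(d_{04}, v_5, d_{30} | F_0^+)$
$(d_{02}, v_6, d_{04} | F_0^+)$
$(d_{01}, v_7, d_{02} | F_0^+)$
$(d_{05}, v_2, d_{40} | F_0^-)$
$(d_{10}, v_3, d_{05} | F_0^-)$
$(d_{03}, v_4, d_{10} | F_0^-)$
$(d_{40}, v_5, d_{03} | F_0^-)$]

Figure 6: Result of the $(P_1 \;\mathrm{xor}\; P_2) \cap P_3$ operation on the example of Figure 3. The table lists the looplets for facet $F_0$.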

hand, convex looplet conditions depend only on three quadrant boundaries, the diagonally opposite quadrant in the facet having no influence. Concerning quadrant $(0, 0)$ in Figure 4, the convex looplet $(d_{21}, v, d_{31} | F_1^+)$ exists if $(t^+_{00}, t^+_{01}, t^+_{10}) = (1, 0, 0)$, while the opposing looplet $(d_{13}, v, d_{12} | F_1^-)$ exists if $(t^-_{00}, t^-_{01}, t^-_{10}) = (1, 0, 0)$.
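The four conditions stated for quadrant $(0, 0)$ of $F_1$ can be written down directly. In the sketch below, the dictionary layout for the bits (keys (i, r, s) mapping to $b_{irs}$), the function names and the string form of the looplets are assumptions made for illustration; only the quadrant $(0, 0)$ rules quoted in the text are covered.

```python
def quadrant_predicates(b):
    """Compute t+_rs and t-_rs for the four quadrants of F1.
    `b` maps (i, r, s) to the classification bit b_irs (hypothetical layout)."""
    quads = [(0, 0), (0, 1), (1, 0), (1, 1)]
    t_plus = tuple((b[0, r, s], b[1, r, s]) == (0, 1) for r, s in quads)
    t_minus = tuple((b[0, r, s], b[1, r, s]) == (1, 0) for r, s in quads)
    return t_plus, t_minus

def looplets_quadrant00(b):
    """Looplets generated on quadrant (0, 0) of F1, as strings."""
    tp, tm = quadrant_predicates(b)
    out = []
    # Concave looplets: three quadrants present, quadrant (0, 0) absent.
    if tp == (False, True, True, True):
        out.append("(d13, v, d12 | F1+)")
    if tm == (False, True, True, True):
        out.append("(d21, v, d31 | F1-)")
    # Convex looplets: quadrant (0, 0) present, its two neighbours absent;
    # the diagonally opposite quadrant (1, 1) is ignored.
    if tp[:3] == (True, False, False):
        out.append("(d21, v, d31 | F1+)")
    if tm[:3] == (True, False, False):
        out.append("(d13, v, d12 | F1-)")
    return out

zeros = {(i, r, s): 0 for i in (0, 1) for r in (0, 1) for s in (0, 1)}
convex = dict(zeros)
convex[1, 0, 0] = 1                      # only quadrant (0, 0) is positive
concave = dict(zeros)
for rs in [(0, 1), (1, 0), (1, 1)]:      # every quadrant except (0, 0)
    concave[(1,) + rs] = 1
```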

5.5 Retrieving Final Polyhedron Facets

Once all vertices and their looplets have been generated, they can be re-indexed for each facet to generate its contributions to $P_f$. We process each input facet separately, which improves the locality of the algorithm and reduces the problem to 2D. The corresponding algorithm is given in Figure 7 and an example result is illustrated in Figure 6, where the focus is on the contributions of $F_0$. Within a contributing facet, an arbitrary seed looplet is chosen and its outgoing direction followed, searching for sequentially matching looplets to close the loop. In some occurrences, two or more looplets may match for a given direction: see for instance looplet $(d_{06}, v_0, d_{01} | F_0^+)$ in Figure 6, for which both $(d_{01}, v_3, d_{05} | F_0^+)$ and $(d_{01}, v_7, d_{02} | F_0^+)$ match. In this case, the First function selects the closest looplet in the search direction $d_{01}$, here $(d_{01}, v_3, d_{05} | F_0^+)$. The whole process may be repeated until there are no looplets left in the face. For $F_0$, this results in the following loops:

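\[
\begin{aligned}
F_0^+ &: v_0 \xrightarrow{d_{01}} v_3 \xrightarrow{d_{05}} v_2 \xrightarrow{d_{04}} v_1 \xrightarrow{d_{06}} v_0 \\
F_0^+ &: v_4 \xrightarrow{d_{01}} v_7 \xrightarrow{d_{02}} v_6 \xrightarrow{d_{04}} v_5 \xrightarrow{d_{30}} v_4 \\
F_0^- &: v_4 \xrightarrow{d_{10}} v_3 \xrightarrow{d_{05}} v_2 \xrightarrow{d_{40}} v_5 \xrightarrow{d_{30}} v_4
\end{aligned}
\]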

Two convex, diagonally opposing looplets may be triggered for the same vertex, as is the case for e.g. $v_3$ or $v_4$ in $F_1$. Both negative and positive orientation facets may be generated for a single input facet, and may even be adjacent and share an edge, as for $F_0$ in our example. Both of these configurations are typical of exclusive-or operations, but may happen with other operations. More generally, the algorithm can generate arbitrary output facets, with non-convex loops or several loops per facet (holes). Remarkably, our vertex-centered framework transparently accounts for all such possibilities.
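The chaining step can be sketched in Python as follows. This is a simplified illustration under several assumptions: looplets of one oriented facet are given as (incoming label, vertex, outgoing label) tuples, every direction label identifies a single supporting line with a known 2D direction vector, and `first` scans all candidates linearly instead of using a sorted index. All names are hypothetical.

```python
def first(cur, candidates, pos, dirs):
    """Among looplets whose incoming label equals the outgoing label of `cur`,
    return the one whose vertex is closest ahead in that direction."""
    d = cur[2]
    ox, oy = pos[cur[1]]
    dx, dy = dirs[d]
    ahead = []
    for c in candidates:
        if c[0] != d:
            continue
        px, py = pos[c[1]]
        t = (px - ox) * dx + (py - oy) * dy   # signed offset along the edge
        if t > 0:
            ahead.append((t, c))
    return min(ahead)[1]                      # raises if the input is malformed

def chain_loops(looplets, pos, dirs):
    """Group the looplets of one oriented facet into closed vertex loops."""
    remaining = set(looplets)
    loops = []
    while remaining:
        seed = min(remaining)                 # arbitrary but deterministic seed
        loop, cur = [], seed
        while True:
            loop.append(cur[1])
            remaining.discard(cur)
            nxt = first(cur, remaining | {seed}, pos, dirs)
            if nxt == seed:                   # back to the start: loop closed
                break
            cur = nxt
        loops.append(loop)
    return loops

# Two unit squares whose bottom edges lie on one line (label "e") and whose
# top edges lie on another (label "w"), so that several looplets match.
pos = {"a0": (0, 0), "a1": (1, 0), "a2": (1, 1), "a3": (0, 1),
       "b0": (2, 0), "b1": (3, 0), "b2": (3, 1), "b3": (2, 1)}
dirs = {"e": (1, 0), "w": (-1, 0), "nA": (0, 1), "nB": (0, 1),
        "sA": (0, -1), "sB": (0, -1)}
looplets = [("sA", "a0", "e"), ("e", "a1", "nA"), ("nA", "a2", "w"), ("w", "a3", "sA"),
            ("sB", "b0", "e"), ("e", "b1", "nB"), ("nB", "b2", "w"), ("w", "b3", "sB")]
loops = chain_loops(looplets, pos, dirs)
```

From vertex a0, both a1 and b1 have incoming direction "e"; the nearest one ahead (a1) is chosen, which keeps the two squares as separate loops.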

Since facets may be arbitrarily large (see e.g. Figure 21 or Dithering in Figure 22), the cost of First searches may be quadratic if implemented naively. To avoid this, output vertices can be sorted lexicographically by vertex index and offset on the edge, to obtain quasilinear searching, in $O(r \log r)$ with $r$ the number of vertices in the output facet. At this stage the algorithm has produced no superfluous geometry: all vertices and edges are required to represent the output polyhedron. However, non-convex and non-zero genus polygons are hard to manipulate, render or even feed back as input to the algorithm. Therefore, we typically tessellate the output facets into triangles or convex polygons, also an $O(r \log r)$ operation. This only happens once at finalization and never at an intermediate stage.

6 Hierarchical Algorithm

A fully functional approach can be implemented based on the two simple algorithms, CSGVertices and CSGFacets. CSGFacets needs little tuning as it already runs quasilinearly over the output primitives identified by CSGVertices. However, as previously noted, the naive CSGVertices has an impractical quartic worst-case run time. We thus rely on a hierarchical space exploration to trigger CSGVertices only on tightly bounded space regions where output geometry is expected to be found. We choose a KD-tree based exploration for this purpose because of its simplicity, and its favorable reported performance compared to other data structures for similar tasks (Havran, 2000).

The KD-tree (Bentley, 1975) is a binary space partition tree, whose nodes each represent a particular axis-aligned cuboid cell, containing all cells of its child nodes. It is typically built as a search data structure for a set of points in a $d$-dimensional space, by recursively subdividing the input point set into two subsets of equal size using an axis-aligned split plane, achieving build complexities of $O(m \log m)$ time and $O(m)$ space with $m$ input points (de Berg et al., 2008). As applied to polyhedra, the KD-tree cells contain a list of polygons, cropped to the cell's bounding box. When polygons sit across a split plane, they are divided into two fragments inherited by both children of the parent node (Figure 8). Typically the cell subdivision is pursued down to a certain depth or until the number of primitives in the cell falls under a chosen bound, below which brute force search is more efficient.

6.1 KD-Tree Exploration for the Boolean Problem

Our procedure is similar to KD-tree construction, with the key differences that we never need to store the tree, and that our tree exploration is adaptive and output-sensitive. We want to subdivide only the nodes that may contain order-2 or order-3 vertices of the final polyhedron, while pruning others, see Figure 10. This requires classifying KD-nodes during construction for every polyhedron involved. A volumetric cell can be classified as being fully inside or outside a polyhedron, but it can also straddle the polyhedron surface as soon as it contains surface primitives, and thus be undecided. We therefore need to extend the binary boolean logic to ternary logic, as in Pavic et al. (2010). We define three corresponding indicator states $I_i(c)$ for a cell $c$ with respect to polyhedron $P_i$: $I_i(c) \in \{0, 1, u\}$, and the cell's indicator vector $I(c) = (I_1(c), \ldots, I_n(c))$ as the tuple of the cell's ternary indicators. In turn, final classifications $I_f(c) = f(I(c))$ of a cell $c$ are computed with ternary logic, with:

\[
f : \{0, 1, u\}^n \to \{0, 1, u\}. \tag{11}
\]

Although the indicator vector of the cell may contain undefined bits, it is often possible to get a definite answer for the cell classification with $f$. For example, consider the intersection of $n$ polyhedra, where $f_\cap(I(c)) = 1$ iff $I(c) = (1, 1, \cdots, 1)$. It suffices for the cell to be known to be outside any one polyhedron to conclude that the cell is outside the intersection volume, i.e. $f_\cap(\cdots, u, \cdots, 0, \cdots) = 0$. Likewise for unions, $f_\cup(\cdots, u, \cdots, 1, \cdots) = 1$: regardless of other bits, the cell is known to be inside the union as soon as it is in one of the input polyhedra. Truth tables for ternary versions of usual boolean operations are easy to build. Note that for specific hard functions such as $f_{\mathrm{xor}}$, all bits of $I(c)$ must be defined ($\neq u$) to compute a definite classification, in other words all order 2 and 3 vertices participate in the final surface. Nevertheless, whatever the boolean function $f$, it always benefits from the other generic pruning features of the algorithm discussed below.
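The ternary truth tables mentioned above are short enough to write out. The following Python sketch uses the string "u" for the undefined state; the function names are assumptions of this example and are not taken from the QuickCSG code.

```python
from functools import reduce

U = "u"   # undefined: the surface of the polyhedron runs through the cell

def and3(a, b):
    if a == 0 or b == 0:      # a known 0 decides the conjunction
        return 0
    return U if U in (a, b) else 1

def or3(a, b):
    if a == 1 or b == 1:      # a known 1 decides the disjunction
        return 1
    return U if U in (a, b) else 0

def not3(a):
    return U if a == U else 1 - a

def xor3(a, b):
    # Exclusive or can never be decided while one operand is undefined.
    return U if U in (a, b) else a ^ b

def f_inter(I):
    return reduce(and3, I)

def f_union(I):
    return reduce(or3, I)

def f_xor(I):
    return reduce(xor3, I)

def f_diff(I):
    """P1 minus the union of the remaining polyhedra."""
    return and3(I[0], not3(f_union(I[1:])))
```

For instance, `f_inter((U, 0, U))` is decided (0) although two bits are undefined, whereas `f_xor((1, U))` stays undefined.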

6.2 Algorithm Summary

The algorithm is summarized in Figure 11. There are two pruning conditions. First, if the cell final classification $f(I(c))$ becomes completely determined, then it does not contain final surface primitives. Second, if the


function CSGFacets
    Input: V_f: set of vertices of the output
    Output: L: final polyhedron facets as set of loops
    ℱ ← ∅, 𝒱 ← {}                                ▷ Set of contributing facets, and their vertices
    for v in V_f do                              ▷ Collect looplets for all vertices
        for F in AdjacentFacets(v) do
            ℱ ← ℱ ∪ {F+} ∪ {F−}                  ▷ Keep both orientations of facets
            𝒱[F] ← 𝒱[F] ∪ {v}                    ▷ Facet vertex contributions
        end for
    end for
    for F in ℱ do                                ▷ Process each facet's two orientations
        l ← ∅                                    ▷ Looplets indexed by incoming direction
        for v in 𝒱[F] do                         ▷ Collect looplets for all vertices of F
            l ← l ∪ ComputeLooplets(∗, v, ∗ | F)
        end for
        while l ≠ ∅ do                           ▷ Looplets left for this facet
            (d_1, v, d | F) = pop(l)             ▷ Pick and remove a looplet
            F′ ← []                              ▷ Build this final facet
            repeat                               ▷ Chain looplet vertices
                F′ ← [F′, v]
                (d′, v, d | F) ← First((d, ∗, ∗ | F) in l)
            until d = d_1                        ▷ Until back to start
            L ← L ∪ F′                           ▷ Add this facet to final set
        end while
    end for
end function

Figure 7: Algorithm to find all facets of the result polyhedron from looplets.

[Figure 8 (image not reproduced). Panels: Parent, Child 1, Child 2.

node               parent    child 1    child 2
indicator vector   (u, u)    (u, u)     (0, u)]

Figure 8: A KD-tree cell containing facets from $P_1$ and $P_2$ is split. The polygons stored in the KD-tree cells are cropped to the cell.

cell is still undetermined but only one polyhedron participates, then all of its primitives in the cell are part of the final polyhedron and no recursion is needed. This condition is expressed by counting the undefined bits of $I(c)$, where Undefined($I$) denotes the set of indices of polyhedra whose surface still runs through the cell. As evidenced by the example of Figure 10, for large meshes, most of the explored nodes fall into these two pruning cases, and the algorithm's subdivisions can be seen as converging to the final surface's order-2 and order-3 intersections.

If the number of facets falls under a threshold $F_{\max}$, we fall back to CSGVertices to report final vertices. Experimentally, $F_{\max} = 20$ is found to be efficient in all situations. Specifics of local indicator evaluation in CSGVertices will be discussed in Section 6.4. Finally, if none of the previous conditions were met, the primitive set can be split, and the two subtrees explored once their indicator status has been updated. isInside($V_\ast$, $P_i$) returns a value in $\{0, 1, u\}$: 0 if all vertices of $V_\ast$ are outside $P_i$, 1 if they are all inside, $u$ otherwise. Since splitting is at the heart of the algorithm complexity, and isInside is performed jointly for the two children, we specifically address their implementation in Section 6.3.
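The order in which the four cases are tested can be summarized by a small decision function. This is only a sketch of the control flow described above: the function name, the returned labels and the use of `<=` for the threshold comparison (the text only says the facet count "falls under" $F_{\max}$) are assumptions of this example.

```python
U = "u"

def node_action(I, f, n_facets, f_max=20):
    """Decide what to do with a KD-tree cell.
    I: ternary indicator vector of the cell; f: ternary boolean function;
    n_facets: number of facets in the cell."""
    if f(I) != U:
        return "prune"        # cell entirely inside or outside the result
    if sum(1 for bit in I if bit == U) == 1:
        return "copy"         # a single active polyhedron: keep its primitives
    if n_facets <= f_max:     # assumed non-strict comparison
        return "leaf"         # small enough: run the brute-force search
    return "split"            # otherwise subdivide and recurse

def f_inter(I):
    """Ternary intersection, used here as an example of f."""
    if 0 in I:
        return 0
    return U if U in I else 1
```

For the intersection of three polyhedra, a cell with indicator (u, 0, u) is pruned immediately, while (1, u, 1) only requires copying the primitives of the second polyhedron.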

6.3 Computing Splits and Node Indicators

Various split heuristics exist and are widely documented in the literature (Havran, 2000). They determine the tree balancing and the number of split polygon fragments, which may clutter the tree and degrade its performance. Yet the split must be performed as rapidly as possible to keep overall runtime under control. The complexity should be $O(m)$ with $m$ the number of primitives. Pre-sorting input vertices along the axis to compute medians without re-sorting at each node, or minimizing polygon splits with the Surface Area Heuristic (SAH) (MacDonald and Booth, 1990), are typical optimizations used to achieve $O(m \log m)$ tree build performance with small overhead for various tasks (Wald and Havran, 2006). However, our objective is different, because entire subtrees are to be pruned and the split algorithm can hardly anticipate where. Minimizing the number of splits or finding the median of input polygon vertices is not as important to our algorithm as favoring large pruning possibilities. We have tried several advanced heuristics for pruning, but found experimentally that simply splitting

[Figure 9 (tree diagram not reproduced). Node labels have the form indicator vector|number of polygons, e.g. uuu|300 at the root.]

Figure 9: View of the KD-tree for operation $P_f = P_1 \cap (P_2 \cup P_3)$ applied to three simple meshes. Each box represents a node, whose width is proportional to its number of polygons. When space allows, text in the box indicates the node's indicator vector and the number of polygons. Color code (the color swatches were lost in extraction): node that was split; leaf where vertices were found with CSGVertices; facets were just copied to the output; node was found completely inside or outside the mesh. Each dashed box represents a parallel task.

Figure 10: Left: Boolean difference between a red and a green dragon mesh; center: bounding boxes of leaf nodes of the KD-tree; right: the leaves where CSGVertices is called (on average, 0.56 order-2 vertices are found in each of these). Color code, center and right (the color swatches were lost in extraction): leaf where vertices were found with CSGVertices; facets were just copied to the output; node was found completely inside or outside the mesh. The yellow boxes are barely visible in the central representation because the vast majority of the output does not require intersection computations.

function KDVertices
    Input: V, F, I: cell's input vertices, facets and indicator vector
    Output: V_f: cell's output vertices
    if f(I) ≠ u then                             ▷ Cell is completely inside or outside P_f
        return ∅                                 ▷ Pruning, no surface primitive contained
    else if |Undefined(I)| = 1 then              ▷ Only one active input P_i
        return V                                 ▷ Pruning, all cell vertices participate to P_f
    else if |F| ≤ F_max then                     ▷ Threshold cell size is reached
        return CSGVertices(V, F)                 ▷ Find cell's order 1,2,3 vertices
    else                                         ▷ Cell may still have final vertices
        V_1, F_1, V_2, F_2 ← Split(V, F)         ▷ Choose and execute split
        I_1, I_2 ← I                             ▷ Subnode indicators are updated from I
        for i in Undefined(I) do                 ▷ Enumerate still active P_i's
            I_1[i] ← isInside(V_1, P_i)          ▷ Update i-th bit
            I_2[i] ← isInside(V_2, P_i)          ▷ Update i-th bit
        end for
        return KDVertices(V_1, F_1, I_1) ∪ KDVertices(V_2, F_2, I_2)
    end if
end function

Figure 11: Hierarchical algorithm to find all vertices of the result polyhedron.


in the middle of the bounding box's largest dimension yields excellent overall performance.
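The split rule retained here (middle of the bounding box's largest dimension) is easy to state in code. The function name and the representation of the bounding box as two coordinate tuples are assumptions of this sketch.

```python
def middle_split(bbox_min, bbox_max):
    """Return (axis, position) of the split plane: the midpoint of the
    bounding box along its largest dimension."""
    extents = [hi - lo for lo, hi in zip(bbox_min, bbox_max)]
    axis = max(range(len(extents)), key=extents.__getitem__)
    return axis, 0.5 * (bbox_min[axis] + bbox_max[axis])
```

For a box spanning (0, 0, 0) to (4, 2, 1), the plane is x = 2. Ties between equal extents are resolved in favour of the lowest axis index, which is an arbitrary choice of this sketch.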

As soon as the splitting plane is decided, a single $O(m)$ pass can build the split primitive sets. This pass can also be used to compute new bounding boxes for each sub-tree. Checking whether child nodes intersect the $i$-th bounding box provides information to update the $i$-th bit of the child node's indicator vector (which is denoted isInside in the algorithm of Figure 11). If it still intersects, the $i$-th bit stays undefined, $u$. If $P_i$ is no longer active, the child node contains no polygon fragment of $P_i$, but we still need to determine whether the cell is completely inside or outside the $i$-th polyhedron. Shooting a ray outside the current cell would require examining all input primitives and would degrade performance. Fortunately, it is possible to answer the question locally by keeping a reference to extremal vertices in the split direction during the splitting pass. Namely, we examine the facets adjacent to the extremal vertex, and if they do not have the same normal orientation, we select the facet closest (most parallel) to the splitting plane for the orientation decision. In the example of Fig. 8, the normals of the $P_1$ polygons in child 1 are used to compute bit 1 of the indicator vector of child 2.

6.4 Computing the Indicator Vector of Leaf Points

Recall that CSGVertices requires ray shooting to compute indicators of candidate vertices in the general algorithm. In the context of the KD-exploration, this ray shooting occurs for calls of CSGVertices in leaves, and can be made local to keep the computational time bounded. Given the indicator vector $I(c)$ for a leaf node $c$, the $i$-th bit of the indicator vector $I_i(x)$ can be computed as follows:

• if $x$ is on a facet of polyhedron $P_i$, then $I_i(x) = s$ (surface bit),

• if the cell indicator is known, $I_i(c) \neq u$, all points inside inherit the indicator bit $I_i(x) = I_i(c)$,

• otherwise $I_i(c) = u$, and we consider the non-empty set $P$ of polygons of $P_i$ in the node. We shoot a ray from $x$ to an arbitrary point of one of the polygons to ensure at least one intersection with $P_i$. Then we compute the intersection of this ray with all polygons of $P$ and keep the intersection nearest to $x$. The sign of the dot product of the ray's direction with the normal at the closest point gives the indicator bit $I_i(x)$.
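The third case can be sketched with a small ray caster restricted to the polygons of the leaf. The sketch below makes several assumptions that are not stated in the text: polygons are triangles given as three vertex tuples, their counter-clockwise winding defines outward normals, the ray is aimed at the centroid of the first triangle, and the ray-triangle test is the standard Möller-Trumbore algorithm rather than whatever routine QuickCSG actually uses.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, tri, eps=1e-12):
    """Moller-Trumbore test: ray parameter t > 0 of the hit, or None."""
    p0, p1, p2 = tri
    e1, e2 = sub(p1, p0), sub(p2, p0)
    h = cross(direction, e2)
    det = dot(e1, h)
    if abs(det) < eps:                 # ray parallel to the triangle plane
        return None
    s = sub(origin, p0)
    u = dot(s, h) / det
    q = cross(s, e1)
    v = dot(direction, q) / det
    if u < 0 or v < 0 or u + v > 1:    # hit point outside the triangle
        return None
    t = dot(e2, q) / det
    return t if t > eps else None

def local_indicator(x, triangles):
    """Indicator bit of point x w.r.t. one polyhedron, using only the
    (non-empty) list of its triangles stored in the leaf."""
    target = tuple(sum(c) / 3.0 for c in zip(*triangles[0]))
    direction = sub(target, x)         # guarantees at least one intersection
    hits = []
    for tri in triangles:
        t = ray_triangle(x, direction, tri)
        if t is not None:
            normal = cross(sub(tri[1], tri[0]), sub(tri[2], tri[0]))
            hits.append((t, dot(direction, normal)))
    _, facing = min(hits)              # nearest intersection decides
    # Leaving through a front face (outward normal along the ray): x is inside.
    return 1 if facing > 0 else 0

tri = ((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))   # outward normal +z
```

With this single triangle, a point below the plane z = 1 is classified inside and a point above it outside.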

6.4.1 Jittering

To avoid feeding the algorithm degenerate configurations, two workarounds are implemented:

• for CAD meshes, it is useful to translate each mesh independently by a random vector. The vector should be small enough not to change the topology of the output, but still the same order of magnitude as the input mesh size. The random translation is reverted, i.e. the true intersection vertices are recomputed from the faces they are the intersection of.

• the KD-tree is axis-aligned, so axis-aligned facets may also produce degeneracies. A simple workaround is to apply a random rotation to the input and revert this rotation at the end.

No explicit check is made that the random jitter does not introduce a new degeneracy (for instance, the random rotation could align another facet with the bounding boxes), because the probability computation above shows that such a coincidence is almost impossible.
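The rotation workaround amounts to applying a random rotation before processing and its inverse afterwards. The following Python sketch draws a rotation from a random unit quaternion and reverts it with the transposed matrix; the helper names and the use of Python's standard random generator are assumptions of this example, and the way QuickCSG itself samples the rotation is not described in the text.

```python
import math
import random

def random_rotation(rng):
    """3x3 rotation matrix built from a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return (
        (1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)),
        (2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)),
        (2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)),
    )

def apply(R, p):
    """Rotate point p by matrix R."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))

def transpose(R):
    """The inverse of a rotation matrix is its transpose."""
    return tuple(zip(*R))

rng = random.Random(0)
R = random_rotation(rng)
p = (1.0, 2.0, 3.0)
rotated = apply(R, p)                      # coordinates fed to the algorithm
restored = apply(transpose(R), rotated)    # rotation reverted at the end
```

Up to floating-point rounding, `restored` equals `p`, and the rotation preserves lengths.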

7 Parallel Implementation

Performance being a main concern, parallelization is mandatory to benefit from today's mainstream multi-core computer architectures. Parallelization has been studied in the context of boolean solid modeling, with elementary operations queued and balanced among processors (Krishnan et al., 2001), by concurrently computing the result of non-dependent node operations in the CSG tree, or by using the GPU-friendly Layered Depth Image as CSG approximation (Wang, 2011). KD-tree algorithms have also been parallelized in the context of ray-tracing, e.g. Choi et al. (2010) and Shevtsov et al. (2007). The biggest common issue these algorithms face is the irregularity of the tree exploration, because the work associated with each node of the KD-tree is data-dependent and difficult to predict. A good workload balancing strategy is necessary for optimal resource and core usage. Instead of crafting very specific code, we use the work-stealing paradigm for tree exploration, for which off-the-shelf scheduling algorithms exist.

7.1 Work Stealing Principle

Work stealing frameworks allow the programmer to express potential parallelism by delimiting dynamically created tasks that can be executed concurrently. Each processing core maintains a list of tasks. When a core generates a task, it pushes it onto its local list. A task of this list is ready for execution once its synchronization constraints have been resolved. When a core becomes idle (i.e. no local task left), it randomly selects another core and steals part of the tasks ready to be executed from the task list of its target. If no task can be stolen, another victim is targeted. This scheduling algorithm has proven performance guarantees (Blumofe and Leiserson, 1999). Today, several parallel programming environments are based on work stealing (Cilk, TBB, OpenMP, KAAPI). They come with high-level constructs easing the parallelization of common patterns (loops with independent iterations, for instance). Their implementations ensure high performance on multi-core processors and shared memory machines. We parallelized QuickCSG with Threading Building Blocks (TBB).

[Figure 12 (Gantt chart not reproduced). Panels: T1 (47 ms), T2 (37 ms), H (125 ms).]

Figure 12: Gantt chart of QuickCSG running on a 48-core AMD machine on 3 different datasets (left: T1, center: T2, right: H). Datasets are detailed in Section 8.2. Each line corresponds to a core's activity (white: idle). Node splitting is performed in the green and yellow bars, while internal node parallelization is represented by dark green bars. Red bars are leaf computations (calls to CSGVertices). A light blue sequential step concatenates results from the different cores before building facets in parallel (calls to CSGFacets, shades of blue). Notice that total execution times (indicated on top) are degraded due to the code instrumentation used to gather these statistics.

7.2 Proposed Implementation

The recursive nature of the KD-tree construction fits the task model well. We encapsulate the KDVertices node processing function in a task. These tasks can be executed concurrently and work stealing ensures they are dynamically spread amongst enrolled cores.

The KD-tree exploration starts with a single task. Enough tasks become available to keep all cores busy only once a certain depth is reached. Meanwhile, many cores will stall. To circumvent this bottleneck, Split calls in the top-level nodes are parallelized internally with a simple parallel for, with results accumulated in separate vectors for each thread. Since this is less efficient than the node-level parallelization, this internal parallelization is enabled only in the very upper levels of the tree. Creating a task comes with some overhead, which can become significant for nodes with a light compute load. This is the case for deep nodes, where the number of tasks is much higher than the number of enrolled cores. Thus, to shave off overheads, we turn to a sequential sub-tree exploration once the number of facets to process in a node is below a given threshold (80). The results, spread over thread-local data structures, are then sequentially concatenated into a global data structure before the call to the CSGFacets function, which is easily parallelized with a parallel for.

Figure 12 shows a Gantt chart for executions of QuickCSG on three different datasets. Though top node splitting is parallelized internally (dark green), it is not as efficient as the node-level parallelization once enough nodes have been generated (yellow, green and red bars). The sequential concatenation of results (light blue bars) incurs a non-negligible cost at 48 cores. This step is very memory-intensive, drastically limiting the efficiency of any parallelization. The scene geometry greatly influences the execution. For instance, the number of output facets is much higher than the number of input facets for T2 (central chart), which explains the relatively high cost of the calls to CSGFacets (blue bars).

8 Experiments

We implemented the QuickCSG algorithm in C++, using 64-bit floats for coordinates. We rely on TBB for the work-stealing implementation, with optimized thread-local memory allocations. The parallel implementation reverts to sequential exploration when there are fewer than 80 faces in the node, and brute-force CSGVertices is called as soon as the number of faces drops below 20. We store indicator vectors for nodes and points as 64-bit word bitfields. We use the particularly robust and efficient GLU implementation for the output polygon tessellation.

Similarly to Wang (2011) and Feito et al. (2013), we observed that many CSG implementations (e.g. CGAL, 3DSMax ProBoolean and Carve CSG) crash, generate empty results or refuse to process meshes that they cannot handle completely. Non-robustness to near degeneracies, inability to cope with large and dense primitives, and incompatibility of inputs with processing hypotheses are the typical causes. Violation of QuickCSG's input hypotheses may also occur for certain input meshes, e.g. in Figure 13 the input polyhedra are self-intersecting, which affects isInside, leading to a hole in the bounding box of an incorrectly cancelled node. Regarding numeric robustness, we resort to ε-coordinate jittering, which, combined with the fact that only output primitives are computed, is observed to vastly reduce and in most cases eliminate the occurrence of degeneracy problems. In both cases QuickCSG fails gracefully: it records the occurrence of errors while outputting the constructed result, even if partially incomplete, as in Figure 13. In the experiments presented, we state when errors have occurred. On QuickCSG's web page and in the supplementary material, we provide the data and command lines that reproduce the experiments on QuickCSG.


Figure 13: Failure case: when computing the union between two meshes, the algorithm makes a mistake on the mesh position of a KD-tree node because the input mesh is invalid (self-intersecting). Left: the result; right: close-up of the hole in the mesh (the example is “Buddha ∪ Vase-Lion” from Section 8.1.2).

In this section, we compare QuickCSG to state-of-the-art CSG implementations on their own provided benchmarks. Because such benchmarks are typically limited to a few operations on medium to large size meshes, we introduce a new set of benchmarks with more meshes and more complex CSG operations. We experimentally probe the main characteristics of QuickCSG: its complexity, parallel performance, and gains of n-ary versus binary operators.

Unless stated otherwise, we ran the experiments on an i5 CPU 750 at 2.7 GHz (4 cores) with 4 GB of RAM. Reported execution times encompass the processing from input meshes to the output mesh, including tessellation to convex polygons, but excluding startup time and disk I/O. The reported timings are wall-clock times in seconds, measured by the gettimeofday function. As timings for several runs are within 1% of each other, we do not report standard deviations.

8.1 Comparison With State of the Art

Our comparisons focus on recent boolean methods Wang (2011,?); Pavic et al. (2010); Feito et al. (2013) and on one software package (Carve). To the best of our knowledge these implementations are currently the most computationally efficient ones. The related papers provide timing comparisons with other public (CGAL, GTS) Feito et al. (2013) or commercial packages (Rhino, ACIS Wang (2011), Houdini Pavic et al. (2010), and 3DS Max Feito et al. (2013)), which were consistently found to be significantly slower. We thus focus our comparisons on the former, with input data obtained either directly from the authors, or from the Stanford 3D scanning repository, resizing the meshes to the detail level reported in the papers if necessary. In most cases, the authors did not provide their implementation, so we directly compare QuickCSG to the published execution times, with some comments about the differences between the hardware. For these experiments, we use single-core sequential runs of QuickCSG, unless otherwise stated.

8.1.1 Carve CSG

Carve is the CSG library¹ used in the Blender modeler. It uses an octree accelerator structure and produces clean output meshes, with few useless vertices. We compare Carve and QuickCSG on the most time-consuming benchmarks of the Carve CSG test suite (test_intersect.cpp). QuickCSG is 4 to 14 times faster than Carve on these examples, which handle large meshes and/or many meshes:

| Example | m | Carve (s) | QuickCSG (s) |
|---|---|---|---|
| 21: sphere − translated sphere | 19600 | 0.342 | 0.077 |
| 29: union of 30 rotated cubes | 180 | 0.495 | 0.077 |
| 30: sphere − sphere ∩ cube | 19606 | 0.469 | 0.032 |
| 34: cow ∪ translated cow | 185728 | 3.601 | 0.313 |

8.1.2 MeshWorks

MeshWorks² is a GPU implementation of an approximate Layered Depth Image algorithm Wang (2011). Because the intersection of surfaces is only resolved up to the resolution of the depth layer images used, several resolutions are tested in their paper. The range of timings in the following table reflects the range from coarse to fine resolution, with the finer scale taking the most time. The finest resolution is the most relevant to our tests, since QuickCSG does not perform any approximation and retrieves the exact topology. QuickCSG, executed on 4 cores, is from 10 to 40 times faster than MeshWorks executed on an nVidia GTX 260 GPU (+ 4 CPU cores), on the examples provided along with the software:

| Example | m | Wang (2011) (s) | QuickCSG (s) |
|---|---|---|---|
| Dragon ∪ Bunny | 941k | 55.4 | 3.4 |
| Small dragon − Bunny | 347k | 3.06 – 8.97 | 0.253 |
| Buddha ∪ Vase-Lion | 1.48M | 10.68 – 21.81 | 1.027 |

We compare both with the provided MeshWorks implementation (first example) and with timings from the original paper (last two examples). Notice that the input meshes have small self-intersections, which violate the input assumptions, so the QuickCSG output contains holes, see Figure 13.

MeshWorks is optimized for large and detailed meshes. In this case, most faces do not intersect another face. Once identified, these faces can be directly copied to the output. Both MeshWorks and QuickCSG support this, but it seems that uploading to and downloading from the GPU hurts MeshWorks’ performance.

8.1.3 Hybrid Booleans

The “Hybrid Booleans” method of Pavic et al. (2010) subdivides the input space with an octree, then constructs an approximate output mesh by remeshing the surface at the resolution of leaf nodes. We run QuickCSG on the paper’s test data, and compare it against the reported timings, as the implementation is not available.

¹ Carve 1.4 can be found at http://code.google.com/p/carve/downloads/list.

² Executable at http://www2.mae.cuhk.edu.hk/~cwang/projMeshWorks.html

Depending on the quality settings for Pavic et al. (2010), QuickCSG is 5 to more than 100 times faster. Again the higher times are more relevant to the comparison, as QuickCSG retrieves exact topology on all these examples:

| Example | m | Pavic et al. (2010) (s) | QuickCSG (s) |
|---|---|---|---|
| Chair | 1.5k | 1.3 – 13 | 0.003 |
| Sprocket | 11k | 5 – 47 | 0.069 |
| Organic | 219k | 1.6 – 24 (+1) | 0.488 |

The Hybrid Booleans method requires exploring the tree to a predefined depth even for the simplest of input meshes (the “Chair” example), hampering performance. It also generates a large amount of over-tessellated polygons. The “Organic” example is based on a CSG operation with six solids, of the form $(P_1 \setminus P_2) \cup (P_3 \setminus P_4) \cup (P_5 \setminus P_6)$. QuickCSG directly computes the result, while it is computed with intermediate meshes in Pavic et al. (2010), hence the additional second for the intermediate mesh computation.

8.1.4 Feito et al

The algorithm from Feito et al. (2013) implements two-component boolean expressions on triangular meshes, which they resolve by exploring an octree with parallel threads. The authors did not provide their implementation, and thus we compare QuickCSG with the times published in the original paper. Compared to our test machine, they use a higher-end processor (Xeon X5550 2.7 GHz), with more cores and memory (12 GB). The evaluation of the original paper relies on combining standard meshes (dragon, armadillo) with a translated version of the same mesh, and averaging the timing results over four CSG operations (union, intersection, and the two possible differences). QuickCSG is 4 to 5 times faster for the same core count (cr = # cores):

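| Example | m | Feito et al. (2013), 1 cr | Feito et al. (2013), 4 cr | Feito et al. (2013), 16 cr | QuickCSG, 1 cr | QuickCSG, 4 cr |
|---|---|---|---|---|---|---|
| Armadillo | 2 × 150k | 2.71 | 1.46 | 0.68 | 0.57 | 0.24 |
| Dragon | 2 × 871k | 12.64 | 6.48 | 2.72 | 2.61 | 1.18 |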

Possible reasons for this performance gap include the intermediate geometry generated (over-tessellated triangles and vertices that must be merged in a later stage), and an octree fully stored in memory as required for ray shooting, unlike our approach, which does not require storing the tree.

8.2 Performance & Comparisons on Huge Datasets

Importantly, all previous tests were performed on the datasets of the original papers, which only process a few, sparsely intersecting inputs, and a small number of boolean operations, i.e. in the “comfort zone” of existing algorithms. The algorithm is already shown to significantly outperform them in this favorable situation. To illustrate the even larger gain in the more general situations our algorithm can tackle, we introduce new test cases involving large meshes, very dense intersection areas, with several dozen operations:

• T1 is a set of 50 random toruses. We compute the difference between the union of the first 25 toruses and the union of the next 25: $P_f = (A_1 \cup \dots \cup A_{25}) \setminus (A_{26} \cup \dots \cup A_{50})$. This is a typical CSG case, where many facets intersect, but the geometry is regular (small compact facets).

• T2 is a set of 50 concentric narrow random toruses. The toruses follow the great circles on a sphere, so each torus intersects each other torus in two locations. We compute the volumes where at least two of the toruses are present ($f_{\text{min-2}}$). In this case, most facets intersect another. This generates many disconnected components with many more facets than there are on input.

• H is a set of 42 cones with arbitrary bases corresponding to the silhouettes of a piece of rope seen from 42 cameras. The silhouettes define cones whose apexes are the optical centres of the cameras, and that pass through the silhouette’s shape on the camera’s image planes. An approximation of the piece of rope can be reconstructed by intersecting ($f_\cap$) the cones Franco and Boyer (2009). The facets are very elongated, and the output mesh has no order-1 vertices.

Figure 14 gives some statistics about these three test cases and QuickCSG’s performance. The performance timings are broken down into three stages: the “topology” stage is a necessary preprocessing pass of our algorithm over the data for facet normals and adjacencies, and the other timings report the execution time of KDVertices and CSGFacets. The behavior can be different depending on the mesh. For T1 and H, the slowest stage is CSGVertices; for T2 it is CSGFacets, because it generates many facets.

8.3 Comparisons

Most existing approaches do not provide their implementation or simply fail in this case. For example, 3DSMax ProBoolean produced an incorrect result for T1, after 12 s of computation. Therefore, we compare results on T1, T2 and H against one of the fastest and most robust methods available, the Carve library. The operations on T1 and H were expressed as trees of binary operations and the operation on T2 as the union of all intersections of 2 meshes to enable Carve to process them. This last operation has $\binom{n}{2}$ components:

$$f_{\text{min-2}}(a_1, \dots, a_n) = f_\cup\big(f_\cap(a_1, a_2), f_\cap(a_1, a_3), \dots, f_\cap(a_1, a_n), f_\cap(a_2, a_3), \dots, f_\cap(a_2, a_n), \dots, f_\cap(a_{n-1}, a_n)\big) \qquad (12)$$

Sequential execution times are found to be:


| | T1 | T2 | H |
|---|---|---|---|
| Input | n = 50, 40k vertices, 40k facets | n = 50, 3500 vertices, 3500 facets | n = 42, 16k vertices, 33k facets |
| Output (for vertices: #order-1 + #order-2 + #order-3 = total # vertices) | 9207 + 6116 + 372 = 16k vertices, 32k facets | 699 + 38220 + 7876 = 47k vertices, 94k facets | 0 + 16508 + 10728 = 27k vertices, 14k facets |
| QuickCSG runtime (topology + CSGVertices + CSGFacets = total) | 7.1 + 69.9 + 10.4 = 87.4 ms | 0.9 + 78.3 + 72.5 = 151.7 ms | 5.1 + 437.6 + 26.2 = 468.9 ms |
| Memory usage | 61 MB RAM | 45 MB RAM | 96 MB RAM |

Figure 14: The T1, T2 and H test cases and QuickCSG performance. RAM is measured as the maximum resident set size reported by the unix time utility.

| Example | m | Carve CSG (s) | QuickCSG (s) |
|---|---|---|---|
| T1 | 40000 | 4.652 | 0.297 |
| T2 | 3500 | (94.795) | 0.596 |
| H | 33108 | 26.330 | 1.720 |

Times are given in parentheses when Carve was only able to provide a partial result. For T1 and H, both QuickCSG and Carve provide the expected results, QuickCSG being about 15 times faster than Carve. For T2, despite its careful handling of degenerate cases, Carve was not able to compute $f_{\text{min-2}}$ on more than 38 input meshes. Indeed, computing the unions of the intermediate intersections generates many degeneracies because Carve relies on several binary operations on meshes that use exactly the same vertices but with different incidence, which fails for non-exact methods such as Carve.

Additionally, we evaluated how the speedup evolves as a function of the size of the input meshes, by generating increasingly subdivided versions of the T1 dataset. We report timings in Figure 15 against runs of Carve on the same datasets. Carve fails over one million input polygons in this example. With 100k polygons, mono-core QuickCSG is 30× faster than Carve, and 4-core executions are 70× faster. With one million input facets, this speedup reaches 60× mono-core and 150× with 4 cores.

Figure 15: Speed comparison between Carve and QuickCSG for meshes of increasing complexity based on T1 (which corresponds to the point at 40k facets). [Plot: QuickCSG speedup w.r.t. Carve against the number of facets (10k to 1M), for 1 thread and 4 threads.]

8.4 Probing the Characteristics of QuickCSG

We validate some properties of the algorithm to support the claims of the previous sections: namely, it is efficient to directly compute the result of CSG operations on multiple polyhedra, the complexity of the CSG operations is O(m log h), and the parallelization is efficient.

8.4.1 Comparison with binary CSG operations

We evaluate the performance improvement that QuickCSG can provide by directly computing the final output without intermediate polyhedra, compared to the classical approach that combines polyhedra by pairs. Indeed, any associative boolean operation f over n input polyhedra can be expressed as a sequence or binary tree of n − 1 binary operations:

$$f(a_1, \dots, a_n) = f(a_1, f(a_2, \dots f(a_{n-1}, a_n) \dots)) = f(\dots f(a_1, a_2) \dots, \dots f(a_{n-1}, a_n) \dots) \qquad (13)$$

Such decompositions produce n − 2 intermediate polyhedra.
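The two decompositions of an associative operation can be sketched as follows. The example stands in for solids with small Python sets and for the CSG operation with set union, both assumptions chosen only so that the code runs without a geometry library. It counts the binary calls made by a sequential (nested) evaluation and by a balanced binary tree.

```python
def sequential(op, solids):
    """Right-nested evaluation f(a1, f(a2, ... f(a_{n-1}, a_n)))."""
    result, calls = solids[-1], 0
    for s in reversed(solids[:-1]):
        result = op(s, result)
        calls += 1
    return result, calls

def balanced(op, solids):
    """Combine neighbours pairwise, level by level, up to a single result."""
    level, calls = list(solids), 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        calls += len(nxt)
        if len(level) % 2:          # odd element is carried to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0], calls

# Stand-in "solids": 50 singleton sets combined with set union.
solids = [frozenset({i}) for i in range(50)]
union = lambda a, b: a | b
r_seq, c_seq = sequential(union, solids)
r_bal, c_bal = balanced(union, solids)
print(c_seq, c_bal, r_seq == r_bal)
```

Both orderings perform n − 1 binary calls, and every call except the last one produces an intermediate result, which gives the n − 2 intermediate polyhedra mentioned above.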

Table 1 compares the execution for different combinations. The CSG operations on T1 and H can be expressed with binary operations applied sequentially or through a binary tree. Directly evaluating the equivalent n-ary operation to compute the output polyhedron clearly outperforms binary operations that produce many facets of intermediate polyhedra that are discarded later on. Producing intermediate polyhedra also increases the probability of errors caused by degeneracies.

Table 1: QuickCSG execution times when expressing the same CSG operation differently. Tests performed on the T1 and H test cases with 4 threads. Execution times slightly differ from Figure 14 because they are obtained via the Python interface of QuickCSG.

| Test case | Ordering | Time | Errors |
|---|---|---|---|
| T1 | single | 0.125 | 0 |
| T1 | binary tree | 0.363 | 0 |
| T1 | sequential | 0.619 | 0 |
| T1 | 5,5,2 | 0.180 | 0 |
| T1 | 25,2 | 0.111 | 0 |
| H | single | 0.524 | 0 |
| H | binary tree | 2.396 | 9147 |
| H | sequential | 2.244 | 50258 |
| H | 8,6 | 0.674 | 0 |
| H | 4,11 | 0.872 | 0 |

We investigate alternate trees of operations by varying the arities at different levels. In Table 1, for example, “25,2” in T1 means that we first compute the unions of 25 polyhedra and next combine the resulting two polyhedra to get the final result. The results show that it is generally faster to perform the CSG operation in one pass, except for T1, where the “25,2” ordering gives the best results. In this case the reason could be that operating on fewer meshes at a time means that fewer vertex/cell indicator bits need to be computed for unrelated components.
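One way to read the arity notation is sketched below. The sketch assumes that arities are listed from the leaves to the root, following the “25,2” explanation above, and that the last group at a level may be smaller when the sizes do not divide evenly (as would happen for the 42 cones of H). The function names are illustrative and are not part of QuickCSG's Python interface.

```python
def group_by_arities(items, arities):
    """Build a nested grouping: arities[0] inputs per first-level group, etc."""
    level = list(items)
    for k in arities:
        level = [level[i:i + k] for i in range(0, len(level), k)]
    if len(level) != 1:
        raise ValueError("arities do not reduce the inputs to a single root")
    return level[0]

def count_calls(tree):
    """Number of n-ary CSG calls, i.e. internal nodes (lists) of the grouping."""
    if not isinstance(tree, list):
        return 0
    return 1 + sum(count_calls(child) for child in tree)

t1_25_2 = group_by_arities(range(50), [25, 2])      # "25,2" on T1
t1_5_5_2 = group_by_arities(range(50), [5, 5, 2])   # "5,5,2" on T1
h_8_6 = group_by_arities(range(42), [8, 6])         # "8,6" on H
print(count_calls(t1_25_2), count_calls(t1_5_5_2), count_calls(h_8_6))
```

Under this reading, “25,2” needs three n-ary calls (two unions of 25 inputs and one final combination), against 49 calls for a purely binary evaluation.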

8.4.2 Evaluation of complexity

We experimentally probe the expected asymptotic complexity of QuickCSG, O(m log h), by timing a large number of random operations. We randomly sample 1000 groups of 5 meshes from the “3D Mesh Segmentation Benchmark” collection³, consisting of 379 meshes, from 2600 up to 55k facets each. This collection originally comes from the watertight track of the 2007 shape retrieval contest Giorgi et al. (2007). The CSG operation applied to each group is randomly chosen among union, intersection, or union of the first 3 minus union of the last 2 (labeled diff 3 in the figure).

As in most cases involving complex meshes, the dominant step of the algorithm is the KDVertices stage, which our experimental analysis thus focuses on. Even though the total number of split polygons is found to be s = O(m), as shown in Figure 16 (left), in practice complexity plots are significantly clarified by explicitly accounting for the influence of the number s of splits occurring during the entire exploration, which is volatile from dataset to dataset. Plotting the execution time versus (m + s) log h on the right of Figure 16 shows the proportionality relation between both. The fact that the proportionality is verified over all input sizes and heterogeneous boolean operations brings a strong validation to our analysis of the algorithm complexity in Section ??.

³ http://segeval.cs.princeton.edu/

Figure 16: Test on 1000 CSG operations on 5 random meshes. Left: Relationship between the number of split polygons (s, y-axis) and the number of input facets (m, x-axis). Right: the runtime for the main computation (KDVertices time, in seconds) as a function of (m + s) log h. Series: inter, union, diff 3.

Figure 17: Speedup obtained with more threads, with respect to a sequential run, for T1, T2 and H, with the linear speedup shown for reference. This is measured on a 48-core AMD Opteron(tm) Processor 6174.
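A proportionality check of this kind can be sketched as a least-squares fit through the origin of the measured time against (m + s) log h. The code below is only an illustration: the (m, s, h) triples and timings are synthetic values invented for the example, not measurements from the experiments, and the natural logarithm is an arbitrary choice of base.

```python
import math

def predictor(m, s, h):
    """x = (m + s) * log(h); the base of the logarithm only rescales the slope."""
    return (m + s) * math.log(h)

def fit_through_origin(xs, times):
    """Least-squares slope c of the model times ~ c * xs, with no intercept."""
    return sum(x * t for x, t in zip(xs, times)) / sum(x * x for x in xs)

# Synthetic (m, s, h) triples, made up for illustration only.
runs = [(10_000, 2_000, 5_000), (50_000, 8_000, 30_000), (200_000, 40_000, 90_000)]
xs = [predictor(m, s, h) for m, s, h in runs]
times = [2e-7 * x for x in xs]   # synthetic timings, exactly proportional to x
c = fit_through_origin(xs, times)

# Relative residuals near zero indicate a good proportional fit.
residuals = [abs(t - c * x) / t for x, t in zip(xs, times)]
```

On real measurements, small relative residuals across all input sizes would support the proportionality described above.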

8.4.3 Parallel Execution

We evaluate QuickCSG’s parallelization on a machine with four 12-core AMD processors. Figure 17 plots the speedups for the T1, T2 and H tests. The Gantt charts in Figure 12 give a detailed insight into the parallelization behavior at 48 cores. Despite the irregular nature of the computations, parallelization is more than 80% efficient up to 8 cores for T1 and 20 cores for T2 and H. At large core counts, the performance is impacted by the exploration of the top KD-tree levels and the data gathering before building facets. The parallelization efficiency depends on the complexity of the input and output meshes. For instance the T1 output consists of a large majority of first-order vertices, with few order-2 and order-3 vertices, so there are too few CSGVertices tasks in the KD-tree for the distribution over so many cores to be efficient, see Figure 12. The parallel performance improves for large meshes with complex outputs.
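For reference, the usual definitions behind these figures are speedup = sequential time / parallel time and efficiency = speedup / number of cores. A minimal sketch follows; the timings in the example are hypothetical and are not measurements from the paper.

```python
def speedup(t_sequential, t_parallel):
    """Ratio of the sequential wall-clock time to the parallel one."""
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, cores):
    """Fraction of the ideal linear speedup that is actually obtained."""
    return speedup(t_sequential, t_parallel) / cores

# Hypothetical timings in seconds: 8.0 s sequential, 1.25 s on 8 cores.
s8 = speedup(8.0, 1.25)
e8 = efficiency(8.0, 1.25, 8)
print(s8, e8)
```

An efficiency above 0.8 on p cores corresponds to a measured speedup of more than 0.8 p.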

9 Applications

In this section we demonstrate several use cases where the performance of QuickCSG opens new possibilities for usage of boolean combinations of polyhedra. We first examine the computer vision problem of 3D modeling from a set of silhouettes, then solid modeling in the context of 3D printing, and collision detection in interactive systems. We conclude the section with extreme uses of boolean operations, in real time or over million-polygon inputs.

Figure 18: (Left) QuickCSG vs. EPVH runtime: 3D reconstruction time in seconds against the number of cameras, for EPVH and for QuickCSG with 1 thread and 4 threads. (Right) Resulting visual hull when computed from 68 cameras (22k facets).

9.1 3D Modeling

Given n real photographic frames of an object acquired from different camera viewpoints, it is possible to build 3D reconstructions of the object by extracting its silhouettes in the obtained images, and building the visual hull of the object. The visual hull is the maximal 3D volume that projects onto the input silhouettes. Previous works have shown that it can be built by intersecting a set of polyhedral viewing cones Baumgart (1974), i.e. cones whose apex is the optical center of the cameras, and whose basis is the silhouette itself. Because the quality of the model increases with the number of input views, new dedicated multi-camera acquisition platforms are being crafted with several dozen cameras, as is the case for example of the Kinovis 68-camera studio⁴. These platforms provide huge amounts of complex data, challenging even the fastest existing implementations. In this context, we compare our algorithm with one of the fastest state-of-the-art methods specialized in this task, EPVH Franco and Boyer (2009). We use a 68-camera dataset produced on the Kinovis platform, plotting the execution time of both methods against the number of input cones, see Figure 18. QuickCSG significantly outperforms the dedicated method whatever the number of input views considered, reaching a ten to twenty-fold speed increase above 30 cameras with four threads.
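The definition of the visual hull as an intersection of viewing cones can be illustrated pointwise: a 3D point belongs to the hull exactly when it projects inside every silhouette. The sketch below is not the polyhedral construction used by QuickCSG or EPVH. It assumes pinhole cameras given as 3×4 projection matrices and silhouettes given as predicates on image coordinates, both hypothetical inputs chosen for the example, and it ignores points behind a camera.

```python
def project(P, X):
    """Project the 3D point X with a 3x4 camera matrix P; returns (u, v)."""
    x, y, z = X
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    return u / w, v / w

def in_visual_hull(X, cameras):
    """cameras: list of (P, silhouette), with silhouette(u, v) -> bool."""
    return all(silhouette(*project(P, X)) for P, silhouette in cameras)

# Two hypothetical cameras, 5 units from the origin, looking along +z and +x.
P_z = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 5]]
P_x = [[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 5]]
disk = lambda u, v: u * u + v * v <= 0.04   # circular silhouette, radius 0.2
cameras = [(P_z, disk), (P_x, disk)]

print(in_visual_hull((0, 0, 0), cameras))   # on both optical axes
print(in_visual_hull((2, 0, 0), cameras))   # outside the first cone
print(in_visual_hull((0, 0, 2), cameras))   # outside the second cone
```

Each camera contributes one membership test per point, which mirrors the fact that the hull is the intersection of one cone per view.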

9.2 Solid Modeling

With the increasing popularity of 3D printing, there is a demand for fast and convenient modeling tools that enable any user to sketch and put together their own 3D objects for printing. OpenSCAD⁵ is a popular software and scene description language used in this context. Solids are modeled through a tree of binary CSG operations from primitive shapes (cylinder, sphere, extruded 2D shapes, etc.). OpenSCAD relies on CGAL Hachenberger et al. (2007) to compute the resulting polygonal mesh. We compare the performance of QuickCSG versus OpenSCAD when both process the same binary tree of operations. Tested on two complex models, Balljoint and Doggie from IceSL Lefebvre (2013), OpenSCAD’s mesh computation requires 16 minutes and 7 minutes respectively, while QuickCSG only needs 1.46 s and 0.3 s. We printed the Balljoint mesh generated by QuickCSG using Makerware on a Makerbot Replicator 2 (Figure 19). All the balljoints are functional.

⁴ http://kinovis.inrialpes.fr

⁵ http://www.openscad.org/documentation.html

It is also possible to compute the whole result at once, by using a boolean function f that evaluates the binary CSG tree. This approach does not perform as well: 3.8 s for Balljoint. To evaluate the trade-off between binary and all-at-once computation, we traverse the OpenSCAD CSG tree, returning an intermediate sub-tree for each node. For a given node, we collect the result sub-trees of its two child nodes. If the total number of meshes in these sub-trees is above some threshold G (the grouping factor), we call QuickCSG to compute the CSG operation and return a 1-node sub-tree with the CSG result. Otherwise, we return a sub-tree built with the two child mesh results and the binary operation. The two baselines, binary evaluation and all-at-once evaluation, are obtained for G = 2 and G = ∞ respectively.
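The grouped traversal can be sketched as follows. The tree encoding, the `evaluate` stand-in for a QuickCSG call and the final flush at the root are assumptions made for this example. The sketch also reads “above G” as “at least G”, an interpretation chosen here so that G = 2 reproduces strictly binary evaluation.

```python
import math

calls = []  # number of meshes passed to each (stand-in) CSG call

def count_leaves(expr):
    """Number of meshes in a pending sub-tree."""
    if expr[0] == "mesh":
        return 1
    return count_leaves(expr[1]) + count_leaves(expr[2])

def evaluate(expr):
    """Stand-in for one QuickCSG call: record it and return a 1-node sub-tree."""
    calls.append(count_leaves(expr))
    return ("mesh", f"result{len(calls)}")

def group(node, G):
    """Return the pending sub-tree of `node`, evaluated once it holds >= G meshes."""
    if node[0] == "mesh":
        return node
    op, left, right = node
    pending = (op, group(left, G), group(right, G))
    return evaluate(pending) if count_leaves(pending) >= G else pending

def run(tree, G):
    """Traverse the whole tree and return the sizes of the CSG calls made."""
    calls.clear()
    pending = group(tree, G)
    if pending[0] != "mesh":    # evaluate whatever is still pending at the root
        evaluate(pending)
    return list(calls)

def balanced_tree(names):
    """Hypothetical input: a balanced binary tree of unions over the names."""
    if len(names) == 1:
        return ("mesh", names[0])
    mid = len(names) // 2
    return ("union", balanced_tree(names[:mid]), balanced_tree(names[mid:]))

tree = balanced_tree([f"p{i}" for i in range(8)])
print(run(tree, 2), run(tree, 4), run(tree, math.inf))
```

On this 8-leaf tree, G = 2 yields seven binary calls, G = ∞ a single call on all eight meshes, and intermediate values of G yield a few calls of moderate arity.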

Figure 19 plots the execution time as a function of the grouping factor G. The optimum occurs at G = 8, with a 30% gain compared to the binary evaluation: 0.96 s for Balljoint and 0.2 s for Doggie (or 0.52 s and 0.12 s respectively with 4 threads). The binary tree built by OpenSCAD already conveys some spatial clustering. Two intersecting primitive shapes are very likely close in the binary tree. By grouping operations according to their location in the binary tree, we indirectly benefit from a better space partitioning than the one the KD-tree performs when considering all operations at once.

QuickCSG is 500 to 1000 times faster than CGAL. Because of the perfect CAD inputs, which exhibit more regularity than acquired datasets, the mesh produced by QuickCSG occasionally suffers from localized degeneracies, even with jittering, unlike CGAL, which relies on exact methods. However, when dealing with higher polygon counts, CGAL computation times quickly become impractical. The huge performance gap offered by our algorithm leaves room for building an exact version of the method with still significantly faster runtimes.

9.3 Collision Detection

Many interactive systems rely on virtual object simulations that necessitate inter-object collision detection. A vast array of dedicated methods have been designed just for this problem, often based on hierarchical structures that reduce the otherwise quadratic number of object interpenetration tests Weller (2013). Interestingly however, the existence of generic high-performance CSG tools can provide a new basis to reformulate the problem. Given a set of solids, the set of object interpenetration subvolumes can be obtained by computing the min-2 operation over all objects in a given scene. Each connected component of the output is the interpenetration volume of at least two solids (this approach is as yet incomplete, since it does not handle self-intersections and interpenetrations of more than 3 solids). We ran QuickCSG on an example of the SOFA physics engine⁶, see Figure 20. QuickCSG on 4 cores takes 15.5 ms (excluding the 6 ms topology pass, which can be run once at the beginning of the animation), while the state-of-the-art LDI method from Allard et al. (2010) computes collisions and interpenetration volumes in 5 ms on a GPU (Quadro 4000). Thus, our generic CPU-only implementation is only a factor of 3 to 4 away from a specialized approximate algorithm running on dedicated hardware.

Figure 19: Top: The Balljoint (154 primitive objects) and Doggie (51 primitive objects) models. Bottom left: QuickCSG runtime (single thread) with different grouping strategies, plotted as CSG computation time in seconds against the grouping factor (red squares indicate that there were degeneracy errors). Bottom right: printed Balljoint. All balljoints are functional.

Figure 20: QuickCSG computes min-2 on two flabby octopuses and four rings. (Left) 33k input faces in total. (Right) Interpenetration volumes.

9.4 Extreme CSG

The speed of QuickCSG makes it well suited for interactive applications. We developed a small Python OpenGL application that animates a set of moving or deforming polyhedra, and combines them with a user-specified boolean function. Performing the min-2 or union of three sets of quasi-parallel boxes (see Figure 21) runs at 30 fps for 3 × 10 boxes, while the output has a large number of triangles (10 × 10 × 10 grid of 3D crosses for min-2).

⁶ http://www.sofa-framework.org

To give an idea of the broader applicability of the method on million-polygon datasets, we test QuickCSG on huge and intrinsically dense and complex examples, see Figure 22.

The Dithering test mixes two dragon meshes $P_1$ and $P_2$ with a 3D dithering pattern. The pattern is defined as the union of three orthogonal combs: $D = P_3 \cup P_4 \cup P_5$. Then the dragon meshes are combined using the pattern as a mask: $P_f = (P_1 \cap D) \cup (P_2 \setminus D)$. This is a function that would generate degeneracies by definition if evaluated as a CSG tree. Indeed, in the intersection volume $P_1 \cap P_2$ it computes $D \cup \neg D$. The result is computed in 2.5 s on our test machine (input: 1.74M facets, output: 1.69M).
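The remark about the intersection volume can be verified with a truth table. The sketch below reads the mask formula as (P1 ∩ D) ∪ (P2 \ D), an interpretation consistent with the statement that D ∪ ¬D is computed inside P1 ∩ P2, and evaluates it on boolean membership flags rather than on meshes.

```python
from itertools import product

def dithering(p1, p2, d):
    """Membership in (P1 ∩ D) ∪ (P2 \\ D), given membership in P1, P2 and D."""
    return (p1 and d) or (p2 and not d)

# Full truth table of the function over the three membership flags.
table = {(p1, p2, d): dithering(p1, p2, d)
         for p1, p2, d in product([False, True], repeat=3)}

# Inside both dragons the result does not depend on the mask D at all.
independent_of_d = table[(True, True, False)] == table[(True, True, True)]
print(independent_of_d)
```

Inside both dragons the surface of D therefore contributes nothing to the result, whereas a tree evaluation would first build facets of D in each branch and then have to cancel exactly coincident geometry in the final union.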

The Serpent dataset is another such example, built as a fractal, where a tube, $P_1$, is wound around a torus. Then another tube, $P_2$, winds around $P_1$ and so on until $P_5$. We compute the min-2 operation. From 31M input facets, QuickCSG outputs a 10M triangle mesh with a topological genus Agoston (2005) of 701, i.e., it can be transformed without tearing into a sphere with this many handles. This dataset overwhelms the memory of our standard test machine, so we computed it on a 12-core Mac Pro machine with 64 GB of RAM in only 15 seconds.
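For a closed, connected, orientable triangle mesh, the genus g follows from the Euler characteristic: V − E + F = 2 − 2g. The sketch below applies this formula to a list of triangles given as vertex-index triples; it is a generic illustration under those assumptions, not the procedure used to obtain the figure of 701.

```python
def genus(triangles):
    """Genus of a closed, connected, orientable triangle mesh (assumed valid)."""
    vertices = {v for tri in triangles for v in tri}
    edges = {frozenset((tri[i], tri[(i + 1) % 3]))
             for tri in triangles for i in range(3)}
    chi = len(vertices) - len(edges) + len(triangles)   # Euler characteristic
    assert chi % 2 == 0, "not a closed orientable surface"
    return (2 - chi) // 2

# A tetrahedron is topologically a sphere.
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

# A 3 x 3 periodic grid, two triangles per cell, is topologically a torus.
n = 3
torus = []
for i in range(n):
    for j in range(n):
        a, b = (i, j), ((i + 1) % n, j)
        c, d = ((i + 1) % n, (j + 1) % n), (i, (j + 1) % n)
        torus += [(a, b, c), (a, c, d)]

print(genus(tetrahedron), genus(torus))
```

A mesh with several connected components would need the formula applied per component.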

The last example is built from six instances of the Happy Buddha mesh Curless and Levoy (1996), the largest mesh from the Stanford repository. We intersect these with the union of 100,000 random spheres. The spheres were labeled with a greedy graph coloring algorithm to group them into 37 disjoint subsets, so there are a total of 43 input meshes and 24M triangles. The CSG operation computes the union of the 6 Buddhas and intersects this with the union of all spheres: $P_f = (P_1 \cup \dots \cup P_6) \cap (P_7 \cup \dots \cup P_{43})$. On the Mac Pro this last example runs in 8 s and generates 5M triangles. It is shown in Figure 1.
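A greedy graph coloring of this kind can be sketched as follows. Two spheres are linked when they overlap, and each sphere receives the smallest color not used by an overlapping, already colored sphere, so that spheres sharing a color are pairwise disjoint and can be stored together as one non-self-intersecting input. The quadratic neighbor search, the processing order and the random test data are assumptions made for this example; the text above only states that a greedy coloring was used.

```python
import math
import random

def overlaps(a, b):
    """Spheres are (center, radius) pairs; True when their interiors intersect."""
    (ca, ra), (cb, rb) = a, b
    return math.dist(ca, cb) < ra + rb

def greedy_coloring(spheres):
    """Assign to each sphere the smallest color unused by overlapping earlier ones."""
    colors = []
    for i, s in enumerate(spheres):
        used = {colors[j] for j in range(i) if overlaps(s, spheres[j])}
        c = 0
        while c in used:
            c += 1
        colors.append(c)
    return colors

# Hypothetical input: 200 random spheres of radius 0.1 in the unit cube.
rng = random.Random(1)
spheres = [((rng.random(), rng.random(), rng.random()), 0.1) for _ in range(200)]
colors = greedy_coloring(spheres)
print(max(colors) + 1, "disjoint subsets")
```

The number of colors obtained depends on the sphere density and on the processing order, so the count of 37 subsets reported above is specific to that dataset.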

10 Discussion & Conclusion

We have presented a new output-sensitive approach to boolean solid modeling, which generalizes previously known methods to the N-polyhedron case with arbitrary expressions, by directly computing the result with a simplified, vertex-centric algorithm. Thanks to its straightforward divide-and-conquer and pruning scheme, the algorithm achieves a performance breakthrough on datasets of all sizes. This has been extensively verified experimentally against a vast array of state-of-the-art approaches. The speedup not only materializes on typical cases previous algorithms would be used in, it proves to be groundbreaking on extremely dense and large datasets with up to tens of millions of polygons. The performance speedup in these cases reaches up to three orders of magnitude with respect to commonly available approaches, when the latter do not fail due to the inability to deal with the large data.

Figure 21: Union and min-2 running at 30 fps on 3 × 10 undulating boxes, generating 17k and 27k facets respectively. The number of output facets h grows as $h \sim m^3$ with the number of input facets m.

The efficiency and expressiveness of the approach open new possibilities, as it also enables computation of results for arbitrary expressions, some of which simply cannot be processed with existing approaches. For this reason we believe many use cases of boolean modeling for research problems, initially ruled out for feasibility and performance reasons, now become accessible. We have shown some possible applications of the algorithm in the context of solid modeling for 3D printing, computer vision, and interactive systems. We make the program, datasets, experimental protocol, and additional results available to the research community in the supplementary material and on the following page: http://kinovis.inrialpes.fr/static/QuickCSG.

Many future developments and ramifications of this method are possible. First, the demonstration here was done with polyhedral solids, but the method could be extended to other B-Rep representations, such as parametric surfaces, as the topological analysis shown in the paper is identical with curved facets, edges, and vertices at the intersection of curved edges and facets. Second, although the emphasis here is on performance with fast but non-robust predicates, one can imagine deriving an exact version of the algorithm relying on the Simulation of Simplicity paradigm Edelsbrunner and Mucke (1990), as the algorithm relies on a small number of well identified core geometric constructs and predicates (ray shooting in isFinal, trihedron orientation test in CSGVertices, intersection routines Intersect2Facets and intersectSegmentFacet, vertex ordering in First, axis-aligned plane splitting in Split). The inherent property of the algorithm to compute only final geometric primitives and no intermediate results will necessarily benefit this use case, as a substantial fraction of the speed penalty in converting to exact predicates would thus be avoided. Finally, the fact that the algorithm improves the known upper bound in complexity of boolean solid operations raises the broader theoretical question of its optimality for this problem, which we will investigate in future work.

References

Adams, B. and Dutre, P. 2003. Interactive boolean operations on surfel-bounded solids. ACM Trans. Graph. 22, 3 (July), 651–656.

Agoston, M. K. 2005. Computer Graphics and Geometric Modeling. Springer, London.

Allard, J., Faure, F., Courtecuisse, H., Falipou, F., Duriez, C., and Kry, P. 2010. Volume Contact Constraints at Arbitrary Resolution. ACM Transactions on Graphics 29, 3 (Aug.).

Baumgart, B. G. 1974. Geometric modeling for computer vision. Ph.D. thesis, Stanford, CA, USA. AAI7506806.

Bentley, J. L. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (Sept.), 509–517.

Bernstein, G. and Fussell, D. 2009. Fast, exact, linear booleans. Computer Graphics Forum 28, 5, 1269–1278.

Blumofe, R. D. and Leiserson, C. E. 1999. Scheduling multithreaded computations by work stealing. J. ACM 46, 5 (Sept.), 720–748.

Braid, I. C. 1975. The synthesis of solids bounded by many faces. Commun. ACM 18, 4 (Apr.), 209–216.

Brunet, P. and Navazo, I. 1990. Solid representation and operation using extended octrees. ACM Trans. Graph. 9, 2 (Apr.), 170–197.

Campen, M. and Kobbelt, L. 2010. Exact and robust (self) intersections for polygonal meshes. Comput. Graph. Forum 29, 2, 397–406.

Carlbom, I. 1987. An algorithm for geometric set operations using cellular subdivision techniques. IEEE Computer Graphics and Applications 7, 5, 44–55.

Choi, B., Komuravelli, R., Lu, V., Sung, H., Bocchino, R. L., Adve, S. V., and Hart, J. C. 2010. Parallel SAH k-d tree construction. In High Performance Graphics.

Curless, B. and Levoy, M. 1996. A volumetric method for building complex models from range images. In SIGGRAPH.

de Berg, M., Cheong, O., van Kreveld, M., and Overmars, M. 2008. Computational Geometry: Algorithms and Applications, 3rd ed. Springer-Verlag TELOS, Santa Clara, CA, USA.

Edelsbrunner, H. and Mucke, E. P. 1990. Simulation of simplicity: A technique to cope with degenerate cases in geometric algorithms. ACM Trans. Graph. 9, 1, 66–104.

http://kinovis.inrialpes.fr/static/QuickCSG

Figure 22: The Dithering (1.5M triangles) and Serpent (10M triangles) tests.

Feito, F., Ogayar, C., Segura, R., and Rivero, M. 2013. Fast and accurate evaluation of regularized boolean operations on triangulated solids. Computer-Aided Design 45, 3, 705–716.

Franco, J. and Boyer, E. 2009. Efficient polyhedral modeling from silhouettes. Pattern Analysis and Machine Intelligence, IEEE Transactions on 31, 3 (March), 414–427.

Franco, J.-S., Petit, B., and Boyer, E. 2013. 3D Shape Cropping. In Vision, Modeling and Visualization. Eurographics Association, Lugano, Switzerland, 65–72.

Giorgi, D., Biasotti, S., and Paraboschi, L. 2007. Shape retrieval contest 2007: Watertight models track. SHREC competition.

Hachenberger, P., Kettner, L., and Mehlhorn, K. 2007. Boolean Operations on 3D Selective Nef Complexes: Data Structure, Algorithms, Optimized Implementation and Experiments. Comput. Geom. Theory Appl. 38, 1-2 (Sept.), 64–99.

Havran, V. 2000. Heuristic ray shooting algorithms. Ph.D. thesis, Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague.

Hoffmann, C. 2001. Robustness in Geometric Computations. JCISE 1, 143–155.

Hoffmann, C. M. 1989. Geometric and Solid Modeling: An Introduction. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Krishnan, S., Manocha, D., Gopi, M., Culver, T., and Keyser, J. 2001. Boole: A boundary evaluation system for boolean combinations of sculptured solids. Int. J. Comput. Geometry Appl. 11, 1, 105–144.

Laidlaw, D. H., Trumbore, W. B., and Hughes, J. F. 1986. Constructive solid geometry for polyhedral objects. In Computer Graphics (Proceedings of SIGGRAPH 86). Vol. 20. 161–170.

Lefebvre, S. 2013. IceSL: A GPU Accelerated modeler and slicer. In 18th European Forum on Additive Manufacturing. http://webloria.loria.fr/~slefebvr/icesl.

Li, C., Pion, S., and Yap, C. 2004. Recent progress in exact geometric computation. J. of Logic and Algebraic Programming 64, 1, 85–111. Special issue on “Practical Development of Exact Real Number Computation”.

MacDonald, J. D. and Booth, K. S. 1990. Heuristics for ray tracing using space subdivision. The Visual Computer 6, 3, 153–166.

Mantyla, M. 1987. An Introduction to Solid Modeling. Computer Science Press, Inc., New York, NY, USA.

Naylor, B., Amanatides, J., and Thibault, W. 1990. Merging bsp trees yields polyhedral set operations. In Proceedings of the 17th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’90. ACM, New York, NY, USA, 115–124.

Nef, W. 1978. Beitrage zur Theorie der Polyeder: mit Anwendungen in der Computergraphik. Beitrage zur Mathematik, Informatik und Nachrichtentechnik. Lang.

Pavic, D., Campen, M., and Kobbelt, L. 2010. Hybrid booleans. Computer Graphics Forum 29.

Popinet, S. 2006. GNU triangulated surface library.

Requicha, A. A. G. 1977. Mathematical Models of Rigid Solid Objects. Tech. Rep. TR-28, Production Automation Project, University of Rochester. Nov.

Requicha, A. A. G. and Voelcker, H. 1985. Boolean operations in solid modeling: Boundary evaluation and merging algorithms. Proceedings of the IEEE 73, 1 (Jan), 30–44.

Requicha, A. G. 1980. Representations for rigid solids: Theory, methods, and systems. ACM Comput. Surv. 12, 4 (Dec.), 437–464.

Sargeant, T. 2011. Carve CSG boolean library, version 1.4.

Schneider, P. and Eberly, D. 2003. Geometric tools for computer graphics. Morgan Kaufmann, San Francisco.

Shevtsov, M., Soupikov, A., and Kapustin, A. 2007. Highly Parallel Fast KD-tree Construction for Interactive Ray Tracing of Dynamic Scenes. In Computer Graphics Forum. Vol. 26. Wiley Online Library, 395–404.

Thibault, W. C. and Naylor, B. F. 1987. Set operations on polyhedra using binary space partitioning trees. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’87. ACM, New York, NY, USA, 153–162.

Wald, I. and Havran, V. 2006. On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N). In IEEE Symposium on interactive ray tracing. 61–70.

Wang, C. L. 2011. Approximate boolean operations on large polyhedral solids with partial mesh reconstruction. IEEE Transactions on Visualization and Computer Graphics 17, 6, 836–849.

Weller, R. 2013. A brief overview of collision detection. In New Geometric Data Structures for Collision Detection and Haptics. Springer Series on Touch and Haptic Systems. Springer International Publishing, 9–46.
