
Efficient Substitutes for Subdivision Surfaces

SIGGRAPH 2009 Course Notes

August 5, 2009

Course Organizer and Notes Editor:

Tianyun Ni, Ignacio Castaño, NVIDIA Corporation

Instructors:

Jörg Peters, University of Florida
Tianyun Ni, Ignacio Castaño, NVIDIA Corporation
Jason Mitchell, Valve
Philip Schneider, Vivek Verma, Industrial Light and Magic


About This Course

The goal of this course is to familiarize attendees with the practical aspects of subdivision surfaces, for which we introduce substitutes for increased efficiency in real-time applications. The course starts by highlighting the properties that make SubD modeling attractive and introduces some recent techniques to capture these properties by alternative surface representations with a smaller footprint. We list and compare the new surface representations and focus on their implementation on current and next-generation GPUs. Among the advantages and disadvantages of each approach, we address crucial practical issues, such as watertight evaluation, creases and corners, seamless displacement mapping, and cache optimization. Finally and most importantly, Valve and Industrial Light & Magic will present a few breathtaking practical examples and demonstrate how these advanced techniques have been adopted into their gaming and movie production pipelines.

Prerequisites

Basic knowledge of geometric modeling algorithms, in particular subdivision surfaces, and some familiarity with the graphics pipeline of modern GPUs.

Instructors

Jörg Peters, University of Florida
Dr. Jörg Peters is Professor of Computer and Information Sciences at the University of Florida. He is interested in representing, analyzing and computing with geometry. To this end, he has developed new tools for free-form modeling and design in spline, Bézier, subdivision and implicit representations. He obtained his Ph.D. in 1990 in Computer Sciences from the University of Wisconsin, with Carl de Boor as advisor. In 1994, he received a National Young Investigator Award. He was tenured at Purdue University in 1997 and moved to the University of Florida in 1998, where he became full professor. He also serves as associate editor for the journals CAGD, APNUM and ACM ToG and on program committees (see http://www.cise.ufl.edu/~jorg).

Tianyun Ni, NVIDIA Corporation
Tianyun Ni is a member of NVIDIA’s Developer Technology team. She received her Ph.D. from the University of Florida in 2008. She is passionate about developing new graphics techniques and helping game developers to incorporate these techniques into their games. Her recent work involves finding applications for Direct3D 11 and developing advanced technologies to harness the computing power of next-generation GPUs, especially in the area of hardware tessellation. Her expertise is real-time rendering of higher-order surfaces on modern GPUs, where she has published a number of papers. Her publications can be found at http://www.cise.ufl.edu/~tni

Ignacio Castaño, NVIDIA Corporation
Ignacio is an engineer in the Developer Technology group at NVIDIA, where he helps game developers adopt the latest technologies and harness the computing power of modern GPUs. His latest work is focused on finding applications that take advantage of GPU tessellation. He has applied existing research, developed new algorithms, and built robust tools to prototype and experiment with the new hardware tessellation pipeline. He frequently gives talks about his work at public events and provides guidance to developers on an individual basis. Ignacio was editor of the Rendering section of GPU Gems 3. Before joining NVIDIA, he worked for several game companies, including Crytek, Relic Entertainment, and Oddworld Inhabitants. Ignacio frequently blogs about his work and other life events at http://castano.ludicon.com/blog/

Jason Mitchell, Valve Corporation
Jason is a Software Engineer at Valve, where he works on real-time graphics techniques across all of Valve’s projects. Prior to joining Valve, Jason was the lead of the 3D Application Research Group at ATI Research for eight years. Jason has published on a variety of topics from higher-order surfaces to non-photorealistic rendering and regularly speaks at graphics and game development conferences around the world. Jason received a B.S. in Computer Engineering from Case Western Reserve University and an M.S. in Electrical Engineering from the University of Cincinnati. Jason’s previous publications and other materials can be found at http://www.pixelmaven.com/jason/

Philip Schneider, Industrial Light and Magic
Philip is a Senior Software Engineer in the Research and Development division at Industrial Light + Magic, where he is the lead of the Geometry, Modeling, and Sculpting Group. Prior to joining ILM, he worked at Digital Equipment Corporation, Apple Computer, Digital Domain, and Walt Disney Feature Animation; at all but the first of these he led groups working in the areas of geometry, modeling, and/or physics simulation. He is co-author (along with David Eberly) of the Morgan Kaufmann book "Geometric Tools for Computer Graphics".

Vivek Verma, Industrial Light and Magic
Vivek is a Software Engineer in the Research and Development division at Industrial Light + Magic, a Lucasfilm Company, where he is a member of the Geometry, Modeling, and Sculpting Group. Vivek’s background is in computer graphics, scientific visualization, and computer vision. He obtained his PhD in 2002 from the University of California, Santa Cruz. In the past Vivek has worked at the Vision Technologies lab at Sarnoff Corporation and the 3D group at Autodesk, Inc. His professional interests include geometric modeling and computer vision.


Contents

1 Fundamentals of efficient substitutes for Catmull-Clark subdivision surfaces
  1.1 Why do we want smooth surfaces?
  1.2 Surface smoothness
  1.3 Filling the normal channel
  1.4 Evaluation or approximation?
  1.5 Polynomial patches of degree bi-3 and bi-cubic splines
  1.6 Subdivision surfaces
  1.7 Evaluation of subdivision surfaces
    1.7.1 Standard Evaluation
    1.7.2 Tabulation of Generating Functions
    1.7.3 Patch selection (ii) in eigenspace
    1.7.4 Eigensystem evaluation
  1.8 Can it be done simpler? Efficient Substitutes
    1.8.1 Control polyhedra and proxy splines
    1.8.2 Separate geometry and normal channels
    1.8.3 $C^1$ surface constructions
  1.9 Efficiency
  1.10 Higher-quality surfaces?
  1.11 Summary

2 Implementation
  2.1 The Direct3D 11 tessellation pipeline
    2.1.1 Overview of the pipeline
    2.1.2 Motivation
  2.2 Direct3D 11 Implementation of PN Triangles
  2.3 Approximating Catmull-Clark Subdivision Surfaces on Direct3D 11 Pipeline
    2.3.1 Introduction
    2.3.2 Patch Construction
    2.3.3 Surface Evaluation
    2.3.4 Displacement mapping
  2.4 Instanced tessellation on current hardware
    2.4.1 Instancing in Direct3D 9
    2.4.2 Instancing in Direct3D 10
    2.4.3 Storage of control points in constant or texture memory
    2.4.4 Instancing in OpenGL
    2.4.5 Instanced tessellation with adaptive refinement
    2.4.6 Emulating the Vertex Shader and the Hull Shader
    2.4.7 Vertex cache optimizations
  2.5 Watertight Tessellation
    2.5.1 Watertight Position Evaluation
    2.5.2 Watertight Normal Evaluation
    2.5.3 Watertight Texture Seams

3 Approximate Subdivision Surfaces in Valve’s Source Engine
  3.1 Introduction
  3.2 Motivation
  3.3 Software Pipeline
    3.3.1 Native Tessellation
    3.3.2 Performance
  3.4 Creases
  3.5 Displacement Mapping
    3.5.1 Wrinkle Mapping
  3.6 Moving from Polygons to Subdivision Surfaces
    3.6.1 Quality Tangent Frames
    3.6.2 Manageability
  3.7 Future Work
  3.8 Conclusion

4 Approximating Subdivision Surfaces in ILM’s Tool chain
  4.1 Introduction
  4.2 Motivation
    4.2.1 Viewpaint
    4.2.2 Catmull-Clark Limit Surface Evaluation
    4.2.3 Convergence
  4.3 Displacement Display Implementation
    4.3.1 CPU-side Subdivision Pipeline
    4.3.2 CPU-side ACC Subdivision Pipeline
    4.3.3 Displacement Display
  4.4 Limit Surface Evaluation Implementation
  4.5 Results
    4.5.1 Displacement Display
    4.5.2 Limit Surface Evaluation
  4.6 Conclusions and Future Work
  4.7 Acknowledgements

1 Fundamentals of efficient substitutes for Catmull-Clark subdivision surfaces

Jörg Peters, University of Florida

As real-time graphics aspires to movie-quality rendering, higher-order, smooth surface representations take center stage. Besides tensor-product splines, Catmull-Clark subdivision has become a widely accepted standard whose advantages we now want to replicate in real-time environments.

Recently, efficient substitutes for recursive subdivision have been embraced by the industry. These notes discuss the theory justifying the use of efficient substitutes for recursive subdivision. (Three other sections discuss their current and future support in the graphics pipeline, in the movie production pipeline and for gaming implementations.)

Below we therefore explore the motivation and the properties that surfaces and representations should satisfy to be used alongside or in place of Catmull-Clark subdivision.

1.1 Why do we want smooth surfaces?

Figure 1: Smoothness and creases (from [PK09]).

The ability to have continuously changing normals (complemented by creases where we choose to have them) is important both artistically and to avoid errors in downstream algorithms.

Artistic shape considerations require the ability to create smooth surfaces and transitions: sharp turns and sharply changing normals do not match our experience of, say, faces and limbs. On the other hand, where the curvature is high compared to the surroundings, smoothed-out creases are often crucial to bring to life an object or an animation character.

Downstream algorithms convey realism via lighting, silhouettes, and various forms of texturing. In particular, the diffuse and specular components of the lighting computation rely on well-defined directions (normals) $n$ associated with points $v$ of the object. This is evident in the OpenGL lighting model (see footnote 1). Downstream algorithms relying on $n$ and $v$ include, for example (click on the hyperrefs if you have the electronic version of the notes):

– Gouraud shading (http://en.wikipedia.org/wiki/Gouraud_shading)

– Phong shading (http://en.wikipedia.org/wiki/Phong_shading)

– Bump mapping (http://en.wikipedia.org/wiki/Bump_mapping)

– higher resolution near the silhouette (http://en.wikipedia.org/wiki/Silhouette)

– Normal mapping (http://en.wikipedia.org/wiki/Normal_mapping)

– Displacement mapping (http://en.wikipedia.org/wiki/Displacement_mapping)

1.2 Surface smoothness

Surfaces that can locally be parameterized over their tangent plane are called regular $C^1$ surfaces (or manifolds). Such surfaces provide a unique normal $n$ at every point, computable as the cross product of two independent directions $t_1$ and $t_2$ in the tangent plane: $n \parallel t_1 \times t_2$. To characterize smoothness of piecewise surfaces, as they occur in graphics applications, the area of geometric modeling (http://www.siam.org/activity/gd) has developed the notion of ‘geometric continuity’. Essentially, two patches $a$ and $b$ join $G^1$ to form a part of a $C^1$ surface if their (partial) derivatives match along a common curve after a change of variables.

Formally (see e.g. [Pet02]), the patches $a$ and $b$ map a subset $\square$ of $\mathbb{R}^2$ to $\mathbb{R}^3$. That is, $a, b : \square \subsetneq \mathbb{R}^2 \to \mathbb{R}^3$. Let $\rho$ be a map that suitably connects the domains, i.e. changes the variables. We call such a $\rho$ a (regular) reparameterization $\rho : \mathbb{R}^2 \to \mathbb{R}^2$. Let $E = [0..1] \times 0$ be an edge of $\square$, let $\mathbb{Z}$ denote the non-negative integers, and let $\circ$ denote composition, i.e. the image of the function on its right provides the parameters of the function to its left.

Patches $a$ and $b$ join $G^k$ if there exists a (regular) reparameterization $\rho$ so that, for the parameter restricted to $E$,

$$\text{for } i, j \in \mathbb{Z},\ i + j \le k: \qquad \partial_1^i \partial_2^j (a \circ \rho) = \partial_1^i \partial_2^j b. \tag{1}$$

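Footnote 1: The red, green or blue intensity of OpenGL lighting is

$$\text{intensity} := \text{emission}_m + \text{ambient}_l \cdot \text{ambient}_m + \sum_{\text{lights}} \frac{1}{k_0 + k_1 d + k_2 d^2} \cdot \text{spot}_\ell \cdot \Big( \text{ambient}_\ell \cdot \text{ambient}_m + \max\{(p - v) \cdot n, 0\} \cdot \text{diffuse}_\ell \cdot \text{diffuse}_m + \max\{s \cdot n, 0\}^{\text{shininess}} \cdot \text{specular}_\ell \cdot \text{specular}_m \Big)$$

where $v$ is the vertex, $n$ the normal, $p$ the light position, $e$ the eye position, $m$ the material, $\ell$ the light source, $l$ the lighting model, $d := \|p - v\|$, and

$$s := \frac{s'}{\|s'\|}, \qquad s' := \frac{v - p}{\|v - p\|} + \frac{v - e}{\|v - e\|}.$$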


Smooth surfaces must in particular satisfy $G^1$ continuity, i.e. (1) for $k = 1$. That is, the surfaces need continuity along the common curve and matching transversal derivatives (across the edge):

$$(a \circ \rho)(E) = b(E), \qquad \partial_2^j (a \circ \rho)(E) = \partial_2^j b(E). \tag{2}$$

(Matching derivatives along the boundary curve, $\partial_1^j (a \circ \rho)(E) = \partial_1^j b(E)$, already follow from $(a \circ \rho)(E) = b(E)$. Proofs are therefore usually concerned with establishing $\partial_2^j (a \circ \rho)(E) = \partial_2^j b(E)$ for patches with a common boundary curve (segment).)

To join $n$ patches $G^k$ at a vertex, two additional constraints come into play: (a) the vertex enclosure constraint must hold for the normal component of the boundary curves; and (b) the reparameterizations $\rho$ must satisfy a consistency constraint. Both constraints arise from the periodicity when visiting the patches, respectively the reparameterizations, surrounding a vertex. For a detailed explanation of these constraints and an in-depth look at geometric continuity see for example [Pet02].

Figure 2: Normal channel defined separate from the geometry (from [VPBM01]). Linear interpolation of the normals at the endpoints (top) ignores inflections in the curve while the quadratic normal construction (bottom) can pick up such shape variations.

A complex of $G^1$-connected patches admits a $C^1$ manifold structure. $G^1$ constructions differ from the approach of classical differential geometry in that they do not require fully defined charts. $G^1$ continuity only regulates differential quantities along an interface, whereas charts require overlapping domains.

1.3 Filling the normal channel

The separation of the position and the normal channel in the graphics pipeline makes it possible to substitute, for the true normal field of the surface, a field not necessarily orthogonal to the surface. This ‘field of directions’ can be used, as in bump mapping, to make a surface less smooth or to make it appear smoother (under lighting but not in its silhouette) than it truly is.

Of course, the geometry and the shape implied by lighting with the ‘field of directions’ declared to be the ‘normal field’ will be (slightly) inconsistent. But we may hope that this does not attract attention (see PN triangles and ACC patches below). The visual impact of the ‘field of directions’ in the normal channel may be so good that a careful designer will have to check surface quality partly by examining silhouettes.

For polyhedral models, determining vertex normals is an underconstrained problem and various heuristics, such as averaging face normals, can be used to fill the normal channel (for a comparison see e.g. [JLW05]). Figure 2 illustrates that some level of consistency of the normal channel with the true surface geometry is important. Substitutes of subdivision surfaces (Section 1.8) therefore typically use more sophisticated approaches to fill the normal channel.

Figure 3: Which polygon represents the circle better? (a) interpolation (b) approximation

1.4 Evaluation or approximation?

Due to the pixel resolution, we ultimately render an averaged, linearized approximation of surfaces. As Figure 3 illustrates, exact evaluation followed by piecewise linear completion need not be superior to another approximation where no point lies exactly on the circle. For another example in 2D, consider the U-shape $y := x^2$ for $x \in [-1..1]$. The line segment that connects $(-1, y(-1))$ to $(1, y(1))$ is based on exact evaluation at the parameters $-1$ and $1$ but is a much poorer approximation (in the max-norm) to the parabola piece than the line segment from $(-1, 1/2)$ to $(1, 1/2)$: the maximal error is 1 for the former and 1/2 for the latter.
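The parabola example can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper name `max_error` and the sample count are illustrative assumptions. It samples $y = x^2$ densely on $[-1, 1]$ and measures the max-norm distance to each of the two horizontal segments.

```python
def max_error(segment_height, samples=2001):
    """Max-norm distance between y = x^2 on [-1, 1] and a horizontal segment.

    `segment_height` is the constant y-value of the segment; `samples` is an
    assumed sampling density (hypothetical, chosen so that x = 0 is sampled).
    """
    worst = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        worst = max(worst, abs(x * x - segment_height))
    return worst


# Segment through the exactly evaluated endpoints (-1, 1) and (1, 1).
error_interpolating = max_error(1.0)
# Segment from (-1, 1/2) to (1, 1/2): neither endpoint lies on the parabola.
error_approximating = max_error(0.5)
print(error_interpolating, error_approximating)
```

The interpolating segment is worst at the apex $x = 0$, while the shifted segment balances its error between the apex and the two ends.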

On a philosophical level, if one ultimately renders a triangulation of the surface, there is no reason to believe that a triangulation with exact values at the vertices is a ‘best’ approximation to the true surface. All we know is that the maximal error does not occur at the vertices but in the interior of the approximating triangles. The error in the interior of the triangle may be far more than the distance between a control point and the surface or a control triangle and the surface.

So, while ‘exact’ evaluation may sound better than ‘approximate’ evaluation, there is often no reason to prefer one to the other. In fact, if we stay with the control net of a surface rather than projecting it to the limit, we preserve the full information of the spline or subdivision representation.

One attempt at quantifying and minimizing this error is the mid-structures of Subdividable Linear Efficient Function Enclosures (slefes) [Pet04]. Mid-structures link the curved geometry of the surface to a two-sided (sandwiching) piecewise linear approximation. For a subclass of surfaces the approximation is optimal in the max-norm (http://en.wikipedia.org/wiki/Supremum_norm).


The main justification for positioning points as exactly as possible on a surface is that, when two abutting patches are tessellated independently, it is good to agree on a rule that yields the same point in $\mathbb{R}^3$ so that the resulting surface has no holes, i.e. is watertight. Mandating the point to be exactly on the surface (and being careful in its computation) is an easy-to-agree-on strategy for a consistent set of points. Of course, any other well-known rule of approximation would do as well.

1.5 Polynomial patches of degree bi-3 and bi-cubic splines

Figure 4: Univariate uniform cubic spline (from [Myl08]). (A) Control points $q := [1, 3, 1, 2, -1]$ (red) and knots $t := [-1, 0, 1, 2, 3, 4, 5, 6, 7]$ define a cubic spline $x(t)$ as the sum of uniform B-spline bases $f_\ell$ scaled by their respective control points (blue, green, magenta, cyan). (B) An equivalent definition of the spline is as the limit of iterative control polygon refinement (subdivision).

Figure 5: Commutativity of tensor-product spline subdivision (from [MKP07]). Bi-3 spline subdivision (A) in one direction followed by (B) the other, or (C) simultaneous refinement as in Catmull-Clark.


If we want to avoid linearization, we need to use quadratic patches at a minimum. Quadratics offer a rich source of shapes (after all, $C^2$ surfaces can locally be well-approximated by them), but smoothly stitching pieces together is generally only possible for regular partitions. Moreover, enforcing $G^1$ continuity can force flat spots for higher-order saddles, such as a monkey saddle (http://en.wikipedia.org/wiki/Monkey_saddle). [PR98] lists all classes of quadratic shapes.

Many curved objects are therefore modeled with cubic splines $x(t) := \sum_\ell q_\ell f_\ell(t)$ as illustrated in Figure 4. Cubic spline curves in B-spline form are available in OpenGL as gluNurbsCurve. By tracing out cubic splines in two independent variables $(u, v)$, we obtain a tensor-product spline, available in OpenGL as gluNurbsSurface. We call the tensor of cubic splines a bi-3 spline or bi-cubic spline:

$$\sum_{i=0}^{3} \sum_{j=0}^{3} q_{i,j}\, f_i(u)\, f_j(v). \tag{3}$$

Bi-3 splines in B-spline form can be evaluated efficiently, for example by de Boor’s algorithm (http://en.wikipedia.org/wiki/De_Boor_algorithm).
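As an illustration of de Boor's algorithm, here is a minimal sketch for the univariate case, using the control points and knots of Figure 4. The function names `find_span` and `de_boor` are illustrative choices, not an API from the notes, and scalar control points are assumed for brevity; a bi-3 spline would apply the same recurrence first in one variable and then in the other.

```python
def find_span(knots, degree, num_ctrl, x):
    """Index k with knots[k] <= x < knots[k + 1], clamped to the valid domain."""
    if not knots[degree] <= x <= knots[num_ctrl]:
        raise ValueError("parameter outside the spline's domain")
    k = degree
    while k < num_ctrl - 1 and knots[k + 1] <= x:
        k += 1
    return k


def de_boor(knots, ctrl, degree, x):
    """Evaluate a B-spline curve with scalar control points at parameter x."""
    k = find_span(knots, degree, len(ctrl), x)
    # The degree + 1 control points that influence the knot interval k.
    d = [ctrl[j + k - degree] for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            lo = knots[j + k - degree]
            hi = knots[j + 1 + k - r]
            alpha = (x - lo) / (hi - lo)
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[degree]


# Data of Figure 4: control points q and uniform knots t.
q = [1.0, 3.0, 1.0, 2.0, -1.0]
t = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x_at_3 = de_boor(t, q, 3, 3.0)
print(x_at_3)
```

At a knot of a uniform cubic spline the value reduces to the weights 1/6, 4/6, 1/6 applied to three consecutive control points, so at $t = 3$ the result should agree with $(q_1 + 4 q_2 + q_3)/6 = 1.5$ up to rounding.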

Just as for curves, each tensor-product spline can be split into its parts by averaging control points. This is the $C^2$ bicubic subdivision illustrated in Figure 5, whose limit is the bi-3 spline patch.

Figure 6: Bi-3 conversion from B-spline coefficients $q_{ij}$ to BB-coefficients $b_{ij}$.

An alternative representation of a polynomial piece is the Bernstein-Bézier form or, shorter, BB-form (footnote 2). Cubic curves in BB-form are available in OpenGL as glMap1. It, too, can be tensored:

$$\sum_{i=0}^{3} \sum_{j=0}^{3} b_{i,j}\, h_i(u)\, h_j(v), \qquad h_k(t) := \frac{3!}{(3 - k)!\, k!} (1 - t)^{3-k} t^k. \tag{4}$$

Bi-3 splines in BB-form are available in OpenGL as glMap2. Every surface in B-spline form can be represented in BB-form using one patch in BB-form for every quadrilateral of the B-spline control net. Due to combinatorial symmetry in the positions, there are three types of formulas in B-spline form to BB-form conversion:

$$\begin{aligned}
9\, b_{11} &:= 4 q_{11} + 2 (q_{12} + q_{21}) + q_{22}, \\
18\, b_{10} &:= 8 q_{11} + 2 q_{12} + 2 q_{10} + 4 q_{21} + q_{20} + q_{22}, \\
36\, b_{00} &:= 16 q_{11} + 4 (q_{21} + q_{12} + q_{01} + q_{10}) + q_{22} + q_{02} + q_{00} + q_{20}.
\end{aligned} \tag{5}$$
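The three formulas are the tensor of a single univariate conversion. The sketch below is an illustration under assumptions: the names `B_TO_BB`, `bspline_to_bb` and `tensor_eval` are hypothetical, control points are scalars, and the sample net `q` is arbitrary. It converts a 4x4 B-spline net to the BB net of its central quadrilateral and checks that both forms describe the same polynomial piece.

```python
from math import comb

# Rows map four uniform cubic B-spline coefficients to four BB coefficients.
B_TO_BB = [
    [1 / 6, 4 / 6, 1 / 6, 0.0],
    [0.0, 4 / 6, 2 / 6, 0.0],
    [0.0, 2 / 6, 4 / 6, 0.0],
    [0.0, 1 / 6, 4 / 6, 1 / 6],
]


def bspline_to_bb(q):
    """Convert a 4x4 B-spline net q[i][j] to the BB net b[i][j] of its central quad."""
    return [[sum(B_TO_BB[i][k] * B_TO_BB[j][l] * q[k][l]
                 for k in range(4) for l in range(4))
             for j in range(4)] for i in range(4)]


def bspline_basis(s):
    """Uniform cubic B-spline basis functions on one knot interval, s in [0, 1]."""
    return [(1 - s) ** 3 / 6,
            (3 * s ** 3 - 6 * s ** 2 + 4) / 6,
            (-3 * s ** 3 + 3 * s ** 2 + 3 * s + 1) / 6,
            s ** 3 / 6]


def bernstein_basis(s):
    """Cubic Bernstein polynomials h_k of equation (4)."""
    return [comb(3, k) * (1 - s) ** (3 - k) * s ** k for k in range(4)]


def tensor_eval(coeffs, basis, u, v):
    """Evaluate sum_ij coeffs[i][j] * basis_i(u) * basis_j(v)."""
    fu, fv = basis(u), basis(v)
    return sum(coeffs[i][j] * fu[i] * fv[j] for i in range(4) for j in range(4))


# Arbitrary sample net (one coordinate only), chosen just for the check.
q = [[i * i - 2.0 * j + 0.5 * i * j for j in range(4)] for i in range(4)]
b = bspline_to_bb(q)
print(tensor_eval(q, bspline_basis, 0.3, 0.7), tensor_eval(b, bernstein_basis, 0.3, 0.7))
```

Because the conversion is separable, applying the 4x4 matrix once per direction reproduces each entry of (5); for example `9 * b[1][1]` should equal `4*q[1][1] + 2*(q[1][2] + q[2][1]) + q[2][2]` up to rounding.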

Conversely, if patches in BB-form are arranged in checkerboard form, they can be represented in B-spline form. To obtain the simplest representation, we remove knots where the surface is sufficiently smooth. (If we do this locally and carefully keep track of where we removed knots, we arrive at T-splines [SZBN03].) That is, the B-spline form and the BB-form are equally powerful, but one may choose B-splines to have fewer coefficients and built-in smoothness, while the BB-form provides interpolation at the corners.

Footnote 2: http://en.wikipedia.org/wiki/Bezier_curve

Figure 7: Control point structure of (left) a polynomial bi-3 patch and (right) a Gregory patch.

Additionally, the BB-form can be generalized to two variables so that the natural domain is a triangle, i.e. to total degree BB-form (footnote 3). Polynomials in BB-form can be evaluated by de Casteljau’s algorithm (http://en.wikipedia.org/wiki/De_Casteljau's_algorithm). As a byproduct of evaluation, de Casteljau’s algorithm provides the derivatives at the evaluation point, from which the normal direction can be obtained by a simple cross product. For a detailed exposition of these useful representations see the textbooks [Far97, PBP02].
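The following sketch shows how de Casteljau's algorithm yields the position, both partial derivatives and hence a normal direction for a bi-3 patch in tensor-product BB-form. It is a minimal illustration, not the implementation used in the notes; the names `de_casteljau` and `evaluate_patch` and the sample patch are assumptions made for the example.

```python
def lerp(a, b, t):
    """Affine combination (1 - t) * a + t * b of two points."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def de_casteljau(points, t):
    """Return (point, derivative) of a Bezier curve of degree >= 1 at parameter t."""
    degree = len(points) - 1
    pts = [list(p) for p in points]
    while len(pts) > 2:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    # The last two intermediate points span the tangent direction.
    derivative = [degree * (b - a) for a, b in zip(pts[0], pts[1])]
    return lerp(pts[0], pts[1], t), derivative


def cross(a, b):
    """Cross product of two 3D vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def evaluate_patch(b, u, v):
    """Position and unnormalized normal of a bi-3 patch with control points b[i][j]."""
    # Collapse each row in v, keeping both the point and its v-derivative.
    rows = [de_casteljau(row, v) for row in b]
    row_points = [r[0] for r in rows]
    row_dv = [r[1] for r in rows]
    position, du = de_casteljau(row_points, u)
    dv, _ = de_casteljau(row_dv, u)
    return position, cross(du, dv)


# Hypothetical test patch: the tilted plane z = x sampled as a bi-3 control net.
patch = [[[i / 3.0, j / 3.0, i / 3.0] for j in range(4)] for i in range(4)]
position, normal = evaluate_patch(patch, 0.25, 0.5)
print(position, normal)
```

For this planar test patch the tangents are $(1, 0, 1)$ and $(0, 1, 0)$, so the normal direction is $(-1, 0, 1)$ everywhere; a renderer would normalize it before use.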

There are a number of classic bi-3 surface constructions [Bez77, vW86, Pet91], but, due to fundamental lower bounds, they work in general only if we split facets into several polynomial pieces.

The $C^1$ bi-3 Gregory patch [Gre74, BG75] is a rational surface patch $x : [0..1]^2 \to \mathbb{R}^3$ such that $\partial_u \partial_v x \neq \partial_v \partial_u x$ can hold at the corners. This allows separate definition of first-order derivatives along the two edges emanating from a corner point; this can be viewed as splitting certain control points into two (see Figure 7). The resulting lack of higher-order smoothness contributed to it not being widely used in geometric design but should not be a problem for real-time graphics. High evaluation cost and cost of computing normals require careful use.

1.6 Subdivision surfaces

We sketch here only the basics of subdivision surfaces (footnote 4) sufficient to explain their evaluation and approximation. A full account of the mathematical structure of subdivision surfaces can be found in [PR08]. The SIGGRAPH course notes [ZS00] of Schröder and Zorin, and the book ‘Subdivision Methods for Geometric Design’ by Warren and Weimer [WW02] complement the more formal analysis by a collection of applications and data structures. See also the generic CGAL implementation [SAUK04].

Footnote 3: http://wapedia.mobi/en/Bezier_triangle

Footnote 4: subdivision surfaces (http://en.wikipedia.org/wiki/Subdivision_surface)

Figure 8: Mesh refinement by the Catmull-Clark algorithm.

Figure 9: Subdivision surfaces consist of a nested sequence of surface rings.

Algorithmically, subdivision presents itself as a mesh refinement procedure that applies rules to determine (a) the position of new mesh points from old ones and (b) the new connectivity. These rules are often represented graphically as weights (summing to one) associated with a local graph or stencil that links the old mesh points combined to form one new one: summing the weighted old mesh points yields the new point. On the GPU, recursive subdivision naturally maps to several shader passes (see e.g. [SJP05, Bun05] and footnote 5).

Alternatively, the weights can be arranged as a row of a subdivision matrix $A$. This subdivision matrix maps a mesh of initial points $q_\ell \in \mathbb{R}^3$, collected into a vector $q$, to an ($m$ times) refined mesh

$$q^m = A^m q. \tag{6}$$
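Equation (6) can be illustrated with a small stand-in. The sketch below does not use the Catmull-Clark matrix; as a simplifying assumption it uses the 3x3 local subdivision matrix of uniform cubic spline curves, whose rows are the edge stencil (1/2, 1/2) and the vertex stencil (1/8, 6/8, 1/8) and whose eigenvalues are 1, 1/2 and 1/4. The helper names `mat_mul` and `mat_pow` are hypothetical.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(a, m):
    """m-th power of a square matrix by repeated multiplication."""
    size = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(m):
        result = mat_mul(result, a)
    return result


# Each row holds the weights (summing to one) of one stencil.
A = [[0.5, 0.5, 0.0],
     [0.125, 0.75, 0.125],
     [0.0, 0.5, 0.5]]

q = [[3.0], [1.0], [2.0]]  # q1, q2, q3 of Figure 4 as a column vector
q_refined = mat_mul(mat_pow(A, 40), q)
limit = (q[0][0] + 4.0 * q[1][0] + q[2][0]) / 6.0
print(q_refined, limit)
```

All three entries of the refined neighborhood contract to the same limit point, here $(q_1 + 4 q_2 + q_3)/6$, without any recursion over mesh levels. The same idea applies to the larger matrices of Catmull-Clark subdivision near an extraordinary point.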

The mesh can have extraordinary points. An extraordinary point is one that has an unusual number of direct neighbors $n$; $n$ is often referred to as the valence of the extraordinary point. For example, $n \neq 4$ is unusual for Catmull-Clark subdivision (see Figure 8).

Mathematically, however, a subdivision surface is a spline surface with isolated singularities. Each singularity is the limit of one extraordinary point under subdivision. In particular, the neighborhood of any such singularity consists of a nested sequence of surface rings as illustrated in Figure 9.

In the case of Catmull-Clark subdivision, the nested surface rings consist of $n$ L-shaped sectors with three bi-3 polynomial pieces each. Let $\square := [0..1]^2$ be the unit square. Then each sector of the $m$th ring can be associated with a parameter range $\frac{1}{2^m}\left(\square - \frac{1}{2}\square\right)$ (see Figures 4.2, 4.3, 4.4, 4.5 of [PR08] for a nice illustration of this natural parameterization and the fact that the union of rings then forms a spline with a central singularity). An alternative parameterization associates $\lambda_n^m \left(\square - \lambda_n \square\right)$ with an L-segment, where $\lambda_n$ is the subdominant eigenvalue of $A$ for valence $n$.

Footnote 5: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter07.html

1.7 Evaluation of subdivision surfaces

Since subdivision surfaces are splines with singularities, there are a number of evaluation methods that also work near extraordinary points. We list four methods below.

1.7.1 Standard Evaluation

(i) determine the ring $m$ (by taking the logarithm base 2);
(ii) apply $m$ subdivision steps (either by matrix or stencil applications);
(iii) interpret the resulting control net at level $m$ as those of the $n$ L-shaped sectors in B-spline form; and
(iv) evaluate the bi-3 spline (by de Boor’s algorithm).

While step (ii) seems to require recursion, it can be replaced by the (non-recursive) matrix multiplication (6).
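Step (i) can be sketched as follows. The fragment assumes the natural parameterization in which ring $m$ of a sector covers the unit square scaled by $2^{-m}$ with its lower-left quarter removed; the numbering of the three pieces of the L (0, 1, 2) and the function name `locate` are conventions chosen here for illustration, not taken from the notes.

```python
import math

def locate(u, v):
    """Return (m, piece, (s, t)) for a parameter (u, v) of one sector.

    Assumes 0 <= u, v <= 1 and (u, v) != (0, 0); the origin is the
    extraordinary point itself and lies in no ring.
    """
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0) or max(u, v) == 0.0:
        raise ValueError("expected (u, v) in the unit square, excluding (0, 0)")
    # Step (i): ring m satisfies 2^-(m+1) < max(u, v) <= 2^-m.
    m = math.floor(-math.log2(max(u, v)))
    # Rescale to the L-shaped domain, so that max(us, vs) lies in (1/2, 1].
    us, vs = u * 2.0**m, v * 2.0**m
    # Pick one of the three pieces of the L and its local coordinates in [0, 1]^2.
    if vs <= 0.5:
        return m, 0, (2.0 * us - 1.0, 2.0 * vs)
    if us <= 0.5:
        return m, 2, (2.0 * us, 2.0 * vs - 1.0)
    return m, 1, (2.0 * us - 1.0, 2.0 * vs - 1.0)
```

For example, `locate(0.3, 0.1)` falls into ring 1, since 1/4 < 0.3 <= 1/2. The returned ring index is the number of subdivision steps needed in step (ii), and the local coordinates feed the bi-3 evaluation of step (iv).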

This is typically the most efficient strategy to evaluate a subdivision surface (and it cannot be patented ;-) ). It is particularly efficient when many points on a regular grid are to be evaluated, for example when, for even coverage, we want to evaluate 4 times more points in ring $m$ than in ring $m-1$. It is also most efficient when the surfaces have adjustable creases [DKT98], i.e. where Catmull-Clark refinement rules are averaged with curve refinement rules.

Some special scenarios, however, invite different evaluation strategies. Before settling for a strategy, it is good to verify the conditions under which they are appropriate and efficient.

1.7.2 Tabulation of Generating Functions

If the crease ratios are restricted to a few cases and the depth of the subdivision is restricted, then we can trade storage for speed by pre-tabulating the evaluation. The idea is to write the subdivision surface $x$ locally, in the neighborhood of an extraordinary point, as

$$x(u, v, j) = \sum_{\ell=1}^{L} q_\ell\, b_\ell(u, v, j), \qquad (7)$$


where the $q_\ell \in \mathbb{R}^3$ are the subdivision input mesh points; each $b_\ell \in \mathbb{R}$ is a generating spline, i.e. a function that we may think of as obtained by applying the rules of subdivision to a single coordinate of the points $q_j$, setting all $q_j = 0$ except for $q_\ell = 1$; the summation over $\ell$ is over all $b_\ell$ that are nonzero at the point $(u, v, j)$ of evaluation; and $j \in \{1, 2, \ldots, n\}$ denotes one of the $n$ sectors of the spline (ring). If, for each valence separately, we pre-tabulate the $b_\ell(u, v, j)$ for $\ell = 1, \ldots, L$, then we can look up and combine these values with the subdivision input mesh points $q_\ell$ at run-time. When stored as textures, approximate ‘in-between’ values can be obtained by bi-linear averaging [BS02].
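The lookup-and-combine step of (7) can be sketched as below. This is only an illustration under stated assumptions: the tables here hold four bilinear functions sampled on a 3x3 grid as toy stand-ins for the generating splines $b_\ell$ of one sector, and the helper names `sample`, `bilinear` and `evaluate` are hypothetical. Real tables would be precomputed per valence and sector.

```python
def sample(f, n):
    """Tabulate f on an n x n uniform grid over [0, 1]^2 (n >= 2)."""
    return [[f(i / (n - 1), j / (n - 1)) for j in range(n)] for i in range(n)]

def bilinear(table, u, v):
    """Bi-linear 'in-between' lookup, as a texture unit would perform it."""
    rows, cols = len(table), len(table[0])
    x, y = u * (rows - 1), v * (cols - 1)
    i, j = min(int(x), rows - 2), min(int(y), cols - 2)
    fx, fy = x - i, y - j
    return ((1 - fx) * (1 - fy) * table[i][j] + fx * (1 - fy) * table[i + 1][j]
            + (1 - fx) * fy * table[i][j + 1] + fx * fy * table[i + 1][j + 1])

def evaluate(tables, q, u, v):
    """x(u, v) = sum over l of q_l * b_l(u, v), with b_l read from its table."""
    weights = [bilinear(t, u, v) for t in tables]
    dim = len(q[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, q)) for k in range(dim))

# Toy stand-ins for the generating functions (assumption, not real splines).
basis = [lambda u, v: (1 - u) * (1 - v), lambda u, v: u * (1 - v),
         lambda u, v: (1 - u) * v, lambda u, v: u * v]
tables = [sample(b, 3) for b in basis]
q = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
x = evaluate(tables, q, 0.25, 0.75)
```

The run-time cost is one table lookup per nonzero generating function plus a weighted sum, which is the storage-for-speed trade described above.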

1.7.3 Patch selection (ii) in eigenspace

If several but irregularly distributed parameters are to be evaluated and if they lie very close to the extraordinary point, it is worth converting the subdivision input mesh points $q_\ell$ to eigencoefficients $p_\ell \in \mathbb{R}^3$. For this, we need to form the Jordan decomposition $A^m = V J^m V^{-1}$ (just once for any given subdivision matrix $A$ of valence $n$) and set $p := V^{-1} q$ so that

$$A^m q = V J^m p. \qquad (8)$$

If the Jordan matrix $J$ is diagonal then the computational effort at run time of step (ii) reduces to taking $m$th powers of its diagonal entries [DS78]. In step (iii) we need to apply $V$ to the scaled coefficients $J^m p$ and can then proceed as before with step (iv) to evaluate a bi-3 spline [Sta98]. Note that this method is no more exact than any of the other evaluation methods and that exact evaluation at individual points does not mean that a polyhedron based on the values exactly matches the non-linear subdivision limit surface.
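A small sketch of equation (8) for a diagonal $J$ follows. As an assumption for illustration, it uses the 3x3 local subdivision matrix of the uniform cubic B-spline curve (rows (1/2, 1/2, 0), (1/8, 3/4, 1/8), (0, 1/2, 1/2)), whose eigenvalues are 1, 1/2 and 1/4, in place of a Catmull-Clark matrix. `V` and `V_INV` are written out by hand for this stand-in, and the function names are hypothetical.

```python
# Eigen-decomposition A = V J V^-1 of the stand-in matrix (assumption).
EIGENVALUES = [1.0, 0.5, 0.25]                       # diagonal of J
V = [[1.0, -1.0, 2.0], [1.0, 0.0, -1.0], [1.0, 1.0, 2.0]]      # right eigenvectors as columns
V_INV = [[1/6, 2/3, 1/6], [-1/2, 0.0, 1/2], [1/6, -1/3, 1/6]]  # left eigenvectors as rows

def matvec(M, q):
    """Multiply a matrix with a column of points."""
    dim = len(q[0])
    return [tuple(sum(w * p[k] for w, p in zip(row, q)) for k in range(dim))
            for row in M]

def eigen_coefficients(q):
    """p := V^-1 q, computed once per control net."""
    return matvec(V_INV, q)

def refined_points(p, m):
    """A^m q = V J^m p: scale each eigencoefficient by lambda^m, then apply V."""
    scaled = [tuple(lam**m * c for c in coeff)
              for lam, coeff in zip(EIGENVALUES, p)]
    return matvec(V, scaled)

q = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 0.0, 0.0)]
p = eigen_coefficients(q)
q5 = refined_points(p, 5)
```

The cost of reaching level $m$ no longer grows with $m$, and the first eigencoefficient `p[0]` is the limit position, since all other eigenvalues are smaller than one.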

1.7.4 Eigensystem evaluation

For parameters on a grid, Cavaretta et al. showed that, for functions satisfying refinement relations, the exact values on a lattice can be computed by solving an eigenvalue problem [CDM91, page 18], [dB93, page 11]. Schaefer and Warren [SW07] apply this approach to irregular settings.

We note that neither the standard evaluation using (6) nor any of the three approaches just listed requires recursion or uniform refinement (with its concomitant high use of memory and possibly of CPU–GPU bandwidth). However, they do not provide convenient short formulas.

1.8 Can it be done simpler? Efficient Substitutes

A surface construction can provide a substitute for the subdivision algorithm if the resulting surfaces have similar properties.


Figure 10: Control point structure of PN triangles (from [VPBM01]). (left) the positional channel; (right) the normal channel.

1.8.1 Control polyhedra and proxy splines

The classic substitute is to render, at a finite level of resolution, either the refined control polyhedron or a polyhedron obtained by projecting the refined control vertices to the limit (using the left eigenvectors of the subdivision matrix $A$). This is based on the fact that the distance between control polyhedron and limit surface decreases fast. One of the challenges here is to correctly estimate the distance of the (projected) control polyhedron to the surface in order to determine the (adaptive) subdivision level that gives sufficient resolution for the application. By characterizing control polyhedra as (the images of) proxy splines with the same structure as subdivision surfaces, [PR08, Chapter 8] gives general bounds on this distance for all subdivision schemes. Tighter bounds, specifically for Catmull-Clark subdivision surfaces, can be found in [PW08]. Also available is a plug-in by Wu for (pov-)ray tracing based on the bounds in [WP04, WP05]. This class of substitutes is only efficient if it can be applied adaptively (see, e.g. [Bun05]).

1.8.2 Separate geometry and normal channels

A second class of substitutes takes advantage of the separation of the position and the normal channel in the graphics pipeline. That is, the entries in the normal channel are only approximately ‘normal’ to the (geometry of the) surface.

— original geometry, refined normals: To create a denser field for the normal channel that would then be used by Gouraud shading, we can apply subdivision (averaging) to the polyhedral normals [AB08].

— refined geometry, refined normals: Replacing an input triangle with normals specified at its vertices, PN triangles [VPBM01] consist of a total degree 3 geometry patch that joins continuously with its neighbor and has a common normal at the vertices. To convey the impression of smoothness, a separate quadratic normal patch interpolates the vertex normals (Figure 10). Reducing the patch degree to quadratics trades flexibility of the geometry for faster evaluation [BA08] (see also [BS07]). Since the quadratic pieces have no inflections, this is particularly useful when the triangulation is already more refined.

For four-sided facets, the corresponding (family of) PN quads was known but not published at the time of PN triangles. Just like the triangles, its bi-3 patches are constructed based solely on the points $v$ and normals $n$ at the patch vertices so that a patch need not look up the neighbor quads.

Better shape can be achieved when the neighbor patch(es) can be accessed. For example, the inner BB coefficients $b_{ij}$ can be derived from a bi-3 spline [Pet08]. One can use Equations 5 for the inner coefficients of type $b_{11}$ and set $b_{10}$ on an edge between two patches as an average of their closest inner points. A good heuristic is to set the corner control points to the Catmull-Clark limit point (with $q_0$ the central control point and, for $\ell = 0, \ldots, n-1$, $q_{2\ell-1}$ the direct neighbor points and $q_{2\ell}$ the face neighbor points):

$$n(n+5)\, b^{CC}_{00} := \sum_{\ell=0}^{n-1} \left( n q_0 + 4 q_{2\ell-1} + q_{2\ell} \right). \qquad (9)$$

Up to perturbation of interior control points near extraordinary points,

$$(n+5)\, b^{ACC}_{11} := n q_{11} + 2(q_{12} + q_{21}) + q_{22}, \qquad (10)$$

this is how ACC patches [LS08a] are derived (see also Section 2.3 of these lecture notes).

Figure 11: Mesh-to-patch conversion (from [MNP08]). The input mesh (top) is converted to patches (bottom) as follows. (a) An ordinary facet is converted to a bi-cubic patch with 16 control points $f_{ij}$. (b) Every triangle in polar configuration becomes a singular bi-cubic patch represented by 13 control points ◦. (c) An extraordinary facet with $n$ sides is converted to a $P_n$-patch defined by $6n+1$ control points shown as ◦. The $P_n$-patch is equivalent to $n$ $C^1$-connected degree-4 triangular patches $b_i$, $i = 0, \ldots, n-1$, having cubic outer boundaries.
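The corner heuristic of equation (9) can be sketched as follows. The function name and argument layout are hypothetical: `q0` is the central control point, `direct` the $n$ direct (edge) neighbors and `face` the $n$ face neighbors of a vertex of valence $n$.

```python
def cc_limit_point(q0, direct, face):
    """Catmull-Clark limit position of a vertex of valence n, following (9):
    n(n+5) * b = n^2 * q0 + 4 * sum(direct) + sum(face)."""
    n = len(direct)
    if n < 3 or len(face) != n:
        raise ValueError("expected n >= 3 direct and n face neighbors")
    dim = len(q0)
    return tuple(
        (n * n * q0[k]
         + 4.0 * sum(e[k] for e in direct)
         + sum(f[k] for f in face)) / (n * (n + 5))
        for k in range(dim))

# Regular vertex (n = 4) lifted above a flat 3x3 neighborhood (toy data).
q0 = (0.0, 0.0, 6.0)
direct = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
face = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
corner = cc_limit_point(q0, direct, face)
```

For $n = 4$ the weights reduce to 16/36, 4/36 and 1/36, so the lifted vertex above is pulled down from height 6 to 8/3. The weights sum to one for every valence, which keeps the corner affinely invariant.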


Figure 12: Quad/tri/pent polar models (from [MNP08]). (a) Axe handle, using a triangle and a pentagon to transition between detailed and coarser areas. The axe head (left) features a sharp crease. (b) Polar configurations naturally terminate parallel feature lines along elongations, like fingers. (c) Smooth surface consisting of bi-cubic patches (yellow), polar patches (orange), and p-patches with $n = 3$ (green), $n = 4$ (red), $n = 5$ (gray).

1.8.3 C1 surface constructions

A third class of substitutes are proper $C^1$ surfaces, i.e. their normals can be computed everywhere (e.g. in the pixel shader) as the cross product of tangents (derivatives obtained as a byproduct of de Casteljau’s evaluation) without recourse to a separate normal channel.
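The following sketch shows how de Casteljau's algorithm yields tangents as a byproduct and how the normal follows as their cross product. It assumes a tensor-product Bezier patch given as a grid `ctrl[i][j]` of control points (index `i` along $u$, `j` along $v$); all function names are hypothetical and the code is a CPU illustration, not shader code.

```python
import math

def lerp(a, b, t):
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))

def de_casteljau(points, t):
    """Evaluate a Bezier curve (degree >= 1); return (point, derivative)."""
    degree = len(points) - 1
    pts = list(points)
    while len(pts) > 2:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    a, b = pts  # the last two intermediate points span the tangent
    return lerp(a, b, t), tuple(degree * (y - x) for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def patch_point_and_normal(ctrl, u, v):
    """Position and unit normal of a tensor-product Bezier patch at (u, v)."""
    rows = [de_casteljau(row, v) for row in ctrl]          # evaluate in v first
    pos, du = de_casteljau([r[0] for r in rows], u)        # then in u
    dv, _ = de_casteljau([r[1] for r in rows], u)          # v-tangent at (u, v)
    n = cross(du, dv)
    length = math.sqrt(sum(c * c for c in n))
    return pos, tuple(c / length for c in n)

# Toy bi-3 patch: a tilted plane z = x (hypothetical data).
ctrl = [[(i / 3.0, j / 3.0, i / 3.0) for j in range(4)] for i in range(4)]
pos, normal = patch_point_and_normal(ctrl, 0.25, 0.5)
```

The sketch assumes the two tangents are linearly independent at the evaluation point; at a singular point (such as the collapsed edge of a polar patch) the cross product vanishes and the normal has to be obtained differently.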

These patches are typically polynomial, although a rational construction like Gregory’s patch and its triangular equivalent could be used just as well. The patch corners and normals can moreover be adjusted to approximate Catmull-Clark limit surfaces.

Just as the second class, c-patches [YNM+] and the many-sided p(m)-patches [MNP08] (Figures 11 and 12) can be constructed and displayed in real time. [MNP08] comes with shader code and allows for (rounded) creases and polar configurations (see Figure 12(d) and footnote 6). The third class of surface constructions is related to surface splines [Pet95], Loop’s construction [Loo92] and localized hierarchical surface splines [GP99].

6 One concern is that such creases and polar configurations result in ‘parametric distortion’ when texture mapping. Applying the same crease or polar mapping (in $\mathbb{R}^2$) when looking up texture coordinates, however, shows this concern to be unfounded.

1.9 Efficiency

Whether a particular representation or evaluation strategy is time and space efficient depends on the software/hardware setup. However, we can observe the following in the context of GPU rendering.

Fixed, fine triangulations are expensive to transfer to the GPU and require animation of each vertex. They lack refinability. Subdivision surfaces approximated by recursive refinement, possibly followed by projection of the control points to their limit, require multiple passes with increasing bandwidth and intermediate memory storage. Subdivision surfaces approximated by non-recursive evaluation as listed in Section 1.7 require the inversion of (moderately sized) matrices. These matrices need to be adapted for different types of creases. Subdivision surfaces approximated by tabulation require storage that limits the representable crease configurations. The (efficient) substitutes listed in Section 1.8 allow for creases, adaptive evaluation (by instancing or the tessellation engine) and, as low degree polynomials, have been created to be both space-efficient and time-efficient, in their construction as well as in their evaluation.

efficiency space time comment
triangulation – – fixed resolution
recursive subD – adaptivity?
non-recursive subD – creases?
tabulation – + creases?
efficient substitutes + + crease ✓, adapt ✓

1.10 Higher-quality surfaces?

For high-end design, $C^1$ continuity is not sufficient. One can feel (and sometimes see) the lack of curvature continuity. In fact, Catmull-Clark subdivision does not meet the requirements of high-end design: generically, near extraordinary points, the curvature lines diverge, and the surface becomes hyperbolic [KPR04]. Guided surfacing [KP07, KPN1], Loop and Schaefer [Loo04, LS08b] and most recently a bi-3 $C^2$ polar subdivision [MP09] promise better shape. Yet, it is not clear that real-time or movie applications can benefit from such high-quality surfaces.

Curiously, displacement mapping, which often increases the roughness of the surfaces, formally requires derivatives of normals and therefore higher-order continuity.


1.11 Summary

Besides the classical rendering of the control polyhedron, possibly projected onto the surface, there are two classes of surface constructions that can be used as efficient substitutes of subdivision surfaces or as primitives in their own right. Both triangular patches and quad patches are available (as well as polar configurations) to give the designer broad-ranging options and mimic both Catmull-Clark and triangle-based subdivision. The next chapters will explain the use of these constructions in more detail and may inspire additional short-cuts and innovations (see for example footnote 7), made all the more relevant by the imminent availability of tessellation hardware.

Acknowledgements: This work was supported in part by NSF Grant CCF-0728797.

7 http://castano.ludicon.com/blog/2009/01/07/approximate-subdivision-shading/


2 Implementation

Tianyun Ni and Ignacio Castano, NVIDIA

In this chapter we start with an overview of the Direct3D 11 graphics pipeline, followed by the motivation behind the design of the pipeline through two case studies: PN Triangles and Approximating Catmull-Clark subdivision surfaces. We then show how to implement and emulate portions of this pipeline on current hardware for backwards compatibility. Finally, we discuss some of the practical implementation details.

2.1 The Direct3D 11 tessellation pipeline

Direct3D 11 is the latest version of Microsoft’s graphics API and it provides access to the latest features of modern graphics hardware. The most notable of those features is support for an extended pipeline that enables programmable hardware tessellation.

While here we refer to this new graphics pipeline as the Direct3D 11 pipeline, we expect the same features will also be exposed under OpenGL. However, we use the Direct3D terminology, since the details of the corresponding OpenGL extensions are not publicly available yet.

2.1.1 Overview of the pipeline

Figure 13: The Direct3D 11 pipeline.

Direct3D 11 extends the Direct3D 10 pipeline with support for programmable tessellation. This is accomplished with the addition of three new stages: the Hull Shader, the Tessellator, and the Domain Shader (Figure 13). The goal of these new stages is to enable efficient rendering of higher order surfaces, such as the approximations to subdivision surfaces described in the previous chapter. In fact, one could say that the pipeline was designed with this particular application in mind.

These three new stages stand between the Vertex Shader and the Geometry Shader. As their names imply, the Hull Shader and Domain Shader are programmable stages, while the Tessellator, although fairly flexible and configurable, is a fixed function stage.

In addition to these new stages, Direct3D 11 also adds a new primitive type: the patch. Patches are the only primitive type supported when the tessellation stages are enabled. Patches have an arbitrary number of vertices between 1 and 32, and unlike any of the other primitive types, patches do not have any implied topology. That is, a patch with three vertices is not necessarily a triangle; it’s up to the programmer to write shaders that decide how the patch vertices are interpreted. In this setting, a patch is just a disconnected set of vertices.

In the tessellation pipeline the Vertex Shader is still the first programmable stage. Its purpose, however, is reduced to transforming vertices from object to world space. That is, it allows you to apply animation and deformations at a lower frequency. The idea is that by performing animation and simulation on the control mesh it will be possible to drastically reduce animation storage, and to implement much more realistic simulation algorithms.

On the other hand, one also has to take into account that the larger the size of the patch, the lower the effectiveness of the post-transform cache. While in triangle meshes the number of transforms per vertex is typically between 1.0 and 1.5, when using quad patches composed of an average of 16 vertices, the number of transforms per vertex is generally about 6 times higher. In spite of that, performing animation in the Vertex Shader is in many cases more efficient than computing the animations in a separate pass.

After the Vertex Shader, the Hull Shader (Figure 14) is invoked for each patch with all its transformed vertices. In this regard the Hull Shader is similar to the Geometry Shader since it can perform per primitive operations. However, as opposed to the Geometry Shader, the Hull Shader has a fixed output that has to be declared in advance.

Figure 14: The Hull Shader.


The Hull Shader serves two purposes. One is to compute per edge tessellation factors that are provided to the Tessellator stage. The other is to perform computations that are invariant for all the vertices that will be generated in the Domain Shader. The most common example is to transform the input vertices from one basis to another, which is useful when the input representation is not practical for direct evaluation.

In order to perform these tasks efficiently, the Hull Shader needs to be parallelized explicitly. That is, instead of having a single thread per patch compute all control points and tessellation factors, the Hull Shader is divided into three parallel phases (Figure 15), and each of these phases is composed of a user defined number of threads between 1 and 32. These threads cannot communicate with each other, but each phase can see the output of the previous phase.

Figure 15: Hull Shader phases.

That allows you to, for example, compute control points in the first phase, the Control Point Phase; based on these control points compute the edge tessellation factors in the second phase, the Fork Phase; and finally based on the edge tessellation factors compute the interior tessellation factors in the last phase, the Join Phase.

To simplify the programming of the Hull Shader, the shader code that corresponds to the Fork and Join Phases is not provided explicitly. Instead, a single shader function to compute all the patch attributes is provided by the programmer and the HLSL compiler automatically extracts parallelism from it.

The Tessellator is the stage in which the data expansion happens. The only input of the Tessellator is the set of edge and interior tessellation factors computed in the Hull Shader. The Tessellator generates a semi-regular tessellation pattern for each patch based on these factors. The actual pattern also depends on the user-selected configuration.


Figure 16: Some examples of tessellation patterns generated by the Tessellator.

The Tessellator supports tessellation in the triangle and quad domains (Figure 16). In both cases the Tessellator can generate triangles in clockwise or counter-clockwise order. The generated tessellation patterns are symmetric along the edges and support fractional tessellation factors as in [Mor01]. However, as opposed to previous implementations, new triangles are not always inserted from the center of the tessellation domain, but follow a more uniform pattern that’s based on the least significant bit of the tessellation factor. That produces smoother transitions between tessellation levels and minimizes the aspect ratio of the triangles in the transition regions.

Fractional tessellation can sometimes result in sampling artifacts, because the sampling location is constantly updated as the tessellation factors change. These sampling errors sometimes manifest themselves as temporal swimming artifacts. In order to avoid these problems the Tessellator also supports a power of two tessellation mode that is stationary, that is, it has the nice property of not moving vertices around as they are inserted or removed.

Finally, the Domain Shader (Figure 17) takes the parametric coordinates of the vertices generated by the Tessellator and the control points output by the Hull Shader and uses them to evaluate the surface. The Domain Shader stage creates one thread for each generated vertex. These threads are similar to Vertex Shader threads; they evaluate the surface in parallel and cannot communicate with each other.

To evaluate the surface in the Domain Shader it’s necessary to use a surface representation that is amenable to direct evaluation. That is, given a parametric coordinate, it should be possible to evaluate the position and normal of the surface at that location. In addition to position and normal, the Domain Shader also interpolates texture coordinates and can sample textures in order to apply displacement maps.

Figure 17: The Domain Shader.

Instead of directly using the parametric coordinates provided by the Tessellator, it’s also possible to adjust them with perspective correction in the Domain Shader to generate a more uniform triangle distribution in screen-space [MHAM08].

In addition to the surface evaluation, the Domain Shader also has other responsibilities that would traditionally correspond to the Vertex Shader. That includes projecting the vertex position to screen, transforming normals and light vectors to the same space, computing view/eye vectors, etc.

Once the vertices have been transformed by the Domain Shader, the primitives generated by the Tessellator are optionally processed by the Geometry Shader or directly sent to the triangle setup stage for rasterization.

The combination of programmable and fixed function stages of the tessellation pipeline provides a powerful surface rendering model that can be efficiently implemented in hardware. We think this pipeline provides developers with a great amount of flexibility, and will create a framework for innovation and development of alternate evaluation algorithms and new surface representations.

2.1.2 Motivation

The rendering of the efficient substitutes outlined in Chapter 1 can usually be decomposed into two stages. First, the input coarse mesh is converted to a set of low degree parametric patches. We refer to this stage as Patch Construction. Second, the positions and normals are evaluated at arbitrary locations of the parametric domains. We call this stage Surface Evaluation.

Since each facet-to-patch conversion is independent of the others, each input facet is converted to one (in some cases, a few) patches in parallel during Patch Construction. This stage maps naturally to the Hull Shader with a facet (and possibly its 1-ring) as an input patch primitive. The output patch of the Hull Shader is the converted parametric patch in the form of its control points. In the Hull Shader, every control point is computed as the weighted sum of the vertices in the patch primitive. Since the control point computations are independent of each other, the control point phase invokes multiple threads and computes one control point per thread. Some of the interior control points and all tessellation factors are determined by multiple control points, so they are computed in the Fork Phase and Join Phase. During Surface Evaluation, the position and normal at each parametric location can be evaluated in parallel. We map Surface Evaluation to the Domain Shader, which takes the control points as well as the parametric coordinates generated by the Tessellator as inputs.

Next, we will discuss the Direct3D 11 implementation in more detail through two of the most popular efficient substitutes: PN Triangles and Approximating Catmull-Clark subdivision surfaces.

2.2 Direct3D 11 Implementation of PN Triangles

PN Triangles [VPBM01] provide a simple tessellation scheme for triangular meshes. This interpolation scheme replaces input flat triangles with triangular cubic Bezier patches and quadratic normal variation. It can be easily integrated into the Direct3D 11 GPU pipeline. To implement PN Triangles, we first compute 10 geometry and 6 normal coefficients (also known as control points) for each input triangle. This stage is called Patch Construction and is mapped to the Hull Shader. Afterwards, the positions (and normals) are evaluated using the geometry (and normal) coefficients respectively. This stage is mapped to the Domain Shader. The discussion of the Direct3D 11 implementation focuses on the data flow between the two new shader stages and the fixed function unit for tessellation (see Figure 18).

Figure 18: Hardware Tessellation of PN Triangle on Direct3D 11 Pipeline.

Figure 19: Workload distribution among three threads in the Hull Shader. Each thread computes three position control points (left) and two normal control points (right).

The input patch of the Hull Shader is a patch that consists of three vertices where each vertex contains its position ($P_i$) and normal ($N_i$). The Hull Shader converts a triangle into an output patch in the form of two Bezier patches that define the geometry and normal of the surface piece. While a single thread could be used, it’s more efficient to take advantage of the symmetry of the construction by using multiple threads to parallelize the computations. We divide the workload (see Figure 19) based on the observation that from one edge pair $(P_i, N_i)$ and $(P_{i+1}, N_{i+1})$ in a flat triangle, we can derive one vertex geometry (and normal) coefficient and two tangent geometry (and one tangent normal) coefficients. The coefficients computed from each edge pair are circled by an ellipse in Figure 19. In this way, three threads are invoked inside the Hull Shader, each computing one third of an output patch. Specifically, each thread computes three control points for positions (i.e. $b_{300}$, $b_{201}$, $b_{102}$), and two normal control points (i.e. $n_{200}$, $n_{101}$). The detailed formulae for computing $b$ and $n$ are available in [VPBM01]. The HLSL shader code in the Hull Shader is as follows:

    [domain("tri")]
    [outputtopology("triangle_cw")]
    [outputcontrolpoints(3)]
    [partitioning("fractional_odd")]
    [patchconstantfunc("HullShaderPatchConstant")]
    HS_CONTROL_POINT HullShaderControlPointPhase(
        InputPatch<HS_DATA_INPUT, 3> inputPatch,
        uint tid : SV_OutputControlPointID,
        uint pid : SV_PrimitiveID)
    {
        int next = (1 << tid) & 3; // (tid + 1) % 3

        float3 p1 = inputPatch[tid].position;
        float3 p2 = inputPatch[next].position;
        float3 n1 = inputPatch[tid].normal;
        float3 n2 = inputPatch[next].normal;

        HS_CONTROL_POINT output;

        // Position control points
        output.pos1 = (float[3])p1;
        output.pos2 = (float[3])(2 * p1 + p2 - dot(p2-p1, n1) * n1);
        output.pos3 = (float[3])(2 * p2 + p1 - dot(p1-p2, n2) * n2);

        // Normal control points
        float3 v12 = 4 * dot(p2-p1, n1+n2) / dot(p2-p1, p2-p1);
        output.nor1 = n1;
        output.nor2 = n1 + n2 - v12 * (p2 - p1);

        // Texture coordinates
        output.tex = inputPatch[tid].texcoord;

        return output;
    }

The center coefficient $b_{111}$ is a function of all vertex and tangent coefficients. In order to avoid redundant computations, we defer the derivation of $b_{111}$ to the patch constant function stage, where all other control points have been derived. During this stage, we also determine the edge/interior tessellation factors according to the computed control points. As described in Section 2.1.1, the Hull Shader is composed of multiple phases. The control point phase is parallelized explicitly, while the other phases are parallelized automatically. The user only provides a serial function to compute all the “patch constant attributes”.

    // Patch constant data
    HS_PATCH_DATA HullShaderPatchConstant( OutputPatch<HS_CONTROL_POINT, 3> controlPoints )
    {
        HS_PATCH_DATA patch = (HS_PATCH_DATA)0;

        // Compute the edge tessellation factors
        for (int i = 0; i < 3; i++) {
            HullShaderCalcTessFactor(patch, controlPoints, i);
        }

        // Compute the interior tessellation factor
        patch.inside = max(max(patch.edges[0], patch.edges[1]), patch.edges[2]);

        // Calculate the center control point.
        for (int i = 0; i < 3; i++) {
            patch.center += (controlPoints[i].pos2 + controlPoints[i].pos3) * 0.5 - controlPoints[i].pos1;
        }

        return patch;
    }

The Tessellator unit takes the edge tessellation factors of the triangle as an input, and generates a semi-uniform tessellation pattern. The Domain Shader then takes the parametric coordinates of the generated vertices and the patch control points and attributes computed in the Hull Shader to evaluate both position and normal at each parametric location.

    [domain("tri")]
    DS_DATA_OUTPUT DomainShaderPN(HS_PATCH_DATA patchData,
        const OutputPatch<HS_CONTROL_POINT, 3> input,
        float3 uvw : SV_DomainLocation)
    {
        float u = uvw.x;
        float v = uvw.y;
        float w = uvw.z;

        // Output position is a weighted combination of the 9 position control points and the center.
        float3 pos = input[0].pos1 * w*w*w + input[1].pos1 * u*u*u + input[2].pos1 * v*v*v +
                     input[0].pos2 * w*w*u + input[0].pos3 * w*u*u + input[1].pos2 * u*u*v +
                     input[1].pos3 * u*v*v + input[2].pos2 * v*v*w + input[2].pos3 * v*w*w +
                     patchData.center * u*v*w;

        // Output normal is a weighted combination of the 6 normal control points.
        float3 nor = input[0].nor1 * w*w + input[1].nor1 * u*u + input[2].nor1 * v*v +
                     input[0].nor2 * w*u + input[1].nor2 * u*v + input[2].nor2 * v*w;

        // Project position to screen, transform normal, and interpolate texture coordinates.
        DS_DATA_OUTPUT output;
        output.position = mul(float4(pos,1), g_mViewProjection);
        output.normal = mul(float4(normalize(nor),1), g_mNormal).xyz;
        output.texCoord = input[0].tex * w + input[1].tex * u + input[2].tex * v;

        return output;
    }
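As a cross-check of the position channel, the following CPU sketch in Python mirrors the three listings above: the per-edge work of the control point phase, the center accumulation of the patch constant function, and the weighted sum of the domain shader. The function names are hypothetical; the comments note how the stored coefficients relate to the Bernstein form, which is an observation about the listing rather than a statement from [VPBM01].

```python
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def edge_coefficients(p1, n1, p2, n2):
    """Work of one hull shader thread: pos1, pos2, pos3 for one edge.
    pos2 and pos3 are stored without the division by 3, i.e. they already
    contain the Bernstein factor 3 of the cubic tangent terms."""
    pos2 = sub(add(scale(p1, 2.0), p2), scale(n1, dot(sub(p2, p1), n1)))
    pos3 = sub(add(scale(p2, 2.0), p1), scale(n2, dot(sub(p1, p2), n2)))
    return p1, pos2, pos3

def build_patch(P, N):
    """Three 'threads' plus the center term of the patch constant function."""
    cps = [edge_coefficients(P[i], N[i], P[(i + 1) % 3], N[(i + 1) % 3])
           for i in range(3)]
    center = (0.0, 0.0, 0.0)
    for pos1, pos2, pos3 in cps:
        center = add(center, sub(scale(add(pos2, pos3), 0.5), pos1))
    return cps, center

def evaluate(cps, center, u, v, w):
    """Same weighted sum as the domain shader (w, u, v belong to vertices 0, 1, 2)."""
    terms = [(cps[0][0], w*w*w), (cps[1][0], u*u*u), (cps[2][0], v*v*v),
             (cps[0][1], w*w*u), (cps[0][2], w*u*u), (cps[1][1], u*u*v),
             (cps[1][2], u*v*v), (cps[2][1], v*v*w), (cps[2][2], v*w*w),
             (center, u*v*w)]
    pos = (0.0, 0.0, 0.0)
    for point, weight in terms:
        pos = add(pos, scale(point, weight))
    return pos

# Flat triangle whose vertex normals all equal the face normal.
P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
N = [(0.0, 0.0, 1.0)] * 3
cps, center = build_patch(P, N)
x = evaluate(cps, center, 0.2, 0.3, 0.5)
```

With normals perpendicular to the triangle the patch reproduces the flat triangle, so `x` equals `0.5*P[0] + 0.2*P[1] + 0.3*P[2]`; bending the normals outward lifts the tangent and center coefficients and curves the patch.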

PN triangles have been extended by [BRS05] to support creases (but not corners) by attaching scalar tags to the mesh vertices in order to control the way geometry and normals are interpolated. Among the three types of position control points, only tangent coefficients are modified by these shape controllers. The artifacts along the silhouette of meshes can be reduced by the method proposed in [DRS08], in the spirit of only increasing the geometry complexity where needed.


2.3 Approximating Catmull-Clark Subdivision Surfaces on Direct3D 11 Pipeline

2.3.1 Introduction

The Catmull-Clark subdivision algorithm [CC78] can model surfaces of arbitrary topological type and has become part of standard modeling packages (e.g., 3DMax, Maya, Softimage, Mirai, Lightwave, etc.). Catmull-Clark subdivision surfaces are widely used as modeling primitives in computer generated motion pictures, particularly for modeling characters. This method begins with a coarse mesh that approximates a 3D model. This coarse mesh is referred to as the base mesh or control mesh. The mesh is refined iteratively by inserting new vertices into the mesh, refining existing point positions, and updating the connectivity. Each refinement step produces a denser mesh than the previous one. The subdivision limit surface is the smooth surface produced from this process after an infinite number of refinements. Highly detailed surfaces are generated by applying displacement maps to the smooth subdivision limit surfaces.

The techniques for implementing exact evaluation of Catmull-Clark subdivision surfaces on modern GPUs fall roughly into three categories, each with its own set of advantages and disadvantages:

• Recursive evaluation [Bun05, DKT98, LMH00, SJP05] is perhaps the most intuitive approach, since it most closely models the mathematical definition of subdivision. Unfortunately it is far from the most efficient method. Despite heroic efforts by researchers such as Shiue [SJP05], who implemented Catmull-Clark subdivision in multiple pixel shader passes using spiral-enumerated mesh fragments to maximize parallelism, recursive evaluation is not a particularly good fit for GPU hardware due to large bandwidth and storage requirements. Even if it were easier to parallelize, the recursive method is incompatible with hardware tessellation. Recursive evaluations split one edge into two at each subdivision step. This refinement produces sampling patterns which are compatible only with binary subdivision, not with future tessellation hardware.

• Direct evaluation [Sta98]. Stam's algorithm is a better fit for programmable tessellation hardware since it directly evaluates subdivision surfaces at arbitrary parameter values. However, the performance is unsatisfying for two reasons. Firstly, this algorithm requires branching, which reduces SIMD efficiency. Secondly, the required projection of control points into the eigenspace is too complex for large meshes on the GPU. Stam's evaluation is over an order of magnitude more expensive than the evaluation proposed in [LS08a, NYM+08, MNP08].

• Precomputed basis functions [BS02]. One way to think about subdivision surface evaluation is as a linear combination of basis functions with the vertices in the base mesh. The basis functions can be pretabulated at uniform samples since they only depend on the topology of a quad and its 1-ring. The total number of topological configurations is possibly a large number. This approach has the advantage of being both more inherently parallel and far more cache friendly. However, it requires extensive preprocessing, and the properties of the input mesh must be tightly controlled in order to keep the size of the lookup tables manageable.

Many [BS02, Bun05, SJP05, Sta98] of these approaches share an additional flaw: the inability to evaluate quads with multiple extraordinary vertices on the GPU. Getting rid of such multiple extraordinary quads requires at least one iteration of the Catmull-Clark subdivision on the CPU before the mesh is even seen by the GPU. This in turn means a four-fold expansion in the number of vertices that must be sent to and stored by the graphics hardware.

Although subdivision surfaces are popular in Digital Content Creation (DCC) packages and feature films, their use has been hindered in real-time applications such as games because the exact evaluation of Catmull-Clark subdivision surfaces on modern GPUs is neither memory nor performance efficient. The state of the art in current games is to refine Catmull-Clark subdivision surfaces offline using a DCC application such as Maya. The resulting dense meshes do not require further runtime refinement, but they demand significant bus bandwidth and consume large amounts of video memory. In addition, the offline refinement process requires the artist to choose a fixed level of detail (LOD). At run time, this can result in objects being drawn with more triangles than are actually needed. The lack of dynamic LOD support obviously lowers performance. The cost of animating a fixed-LOD mesh is even worse because each vertex in the dense mesh needs to be updated independently during animation/deformation. The computations are performed at a high frequency. If subdivision were deferred until after control mesh animation on the GPU, the computational overhead would be greatly reduced.

The difficulty of implementing exact evaluation of Catmull-Clark surfaces efficiently on the GPU led to interest in efficient substitutes for subdivision surfaces. With the advent of Direct3D 11, a short explicit surface definition is preferred over recursively defined Catmull-Clark subdivision surfaces for hardware tessellation. The alternative surface representations [LS08a, NYM+08, MNP08] have recently been advocated as better suited to the highly parallel SIMD nature of modern GPU hardware, and to non-uniform, adaptive hardware tessellation. The idea is to replace the infinite collection of bi-cubic patches with a single parametric patch to simplify the surface evaluation. The replacement patches produce a smooth surface and closely mimic the shape of the Catmull-Clark subdivision surface. Those surfaces can be entirely constructed and evaluated by local parallel operations on the GPU in real-time.

2.3.2 Patch Construction

In the Patch Construction stage, each facet with its 1-ring (Figure 20) in a base mesh is converted to one or more parametric patches in parallel. The conversion requires the computation of the control points that define a parametric patch. We distinguish two types of facets: regular and extraordinary. A regular facet is a quad where each vertex has 4 neighbors and is only adjacent to quads. A facet which is not a regular facet is called an extraordinary facet. It is well known that a regular facet can be converted to a bicubic patch using standard B-Spline to Bezier conversion rules [Far97]. Therefore, any two adjacent patches derived from regular facets will join C2 and reproduce exact Catmull-Clark surfaces. These regular facets should be separated from extraordinary facets so that they can be converted and evaluated in a simpler, optimized way. The challenging part is how to convert the extraordinary facets to parametric patches. Generally speaking, a desirable conversion scheme should ensure at least G1 continuity across adjacent patches, and should closely approximate the equivalent subdivision surface. Of course, cost of evaluation is also an important factor. The overall cost of such schemes is usually influenced by the number of control points per patch, the degree of the patch, and the number of patches. The recent publications [LS08a, NYM+08, MNP08] have made important contributions to improving shape quality and lowering evaluation cost. The key ideas of these schemes are summarized in the following. We encourage readers to refer to the original papers for the more detailed conversion rules.
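For the regular case, the B-Spline to Bezier conversion mentioned above can be sketched as follows. This is a minimal Python illustration, assuming a uniform bicubic B-spline whose 4x4 control grid is stored one coordinate at a time; the function names are hypothetical. It applies the standard 1D change-of-basis matrix along the rows and then along the columns, using exact fractions so the result can be checked by hand.

```python
from fractions import Fraction

# Rows of the 1D uniform cubic B-spline -> Bezier change of basis, scaled by 6.
BSPLINE_TO_BEZIER = [
    [1, 4, 1, 0],
    [0, 4, 2, 0],
    [0, 2, 4, 0],
    [0, 1, 4, 1],
]

def convert_1d(values):
    """Convert four B-spline control values to four Bezier control values."""
    return [sum(Fraction(m, 6) * x for m, x in zip(row, values))
            for row in BSPLINE_TO_BEZIER]

def bspline_to_bezier_patch(grid):
    """grid: 4x4 list holding one coordinate of the facet and its 1-ring."""
    rows = [convert_1d(row) for row in grid]               # convert along u
    cols = [convert_1d(list(col)) for col in zip(*rows)]   # convert along v
    return [list(r) for r in zip(*cols)]                   # transpose back

# x-coordinates of a flat, evenly spaced 4x4 control grid (x = column index).
GRID_X = [[0, 1, 2, 3] for _ in range(4)]
BEZIER_X = bspline_to_bezier_patch(GRID_X)
```

For this evenly spaced grid the Bezier points of the central quad come out at x = 1, 4/3, 5/3, 2 in every row, which is the expected uniform spacing across the facet between the two middle columns.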

Figure 20: A facet with its 1-ring.

The patch construction maps to the Hull Shader based on the observation that each control point is a linear combination of the vertices in a facet with its 1-ring, and the weights contributed by the vertices are always the same for all the patches sharing the same connectivity. With the patch being entirely constructed in the Hull Shader, the Vertex Shader is freed to perform animation/deformation in the same rendering pass. In this approach, we sort all patches based on their connectivity type and then, for each connectivity type, we pre-compute the weight of each vertex. The weights are stored in a stencil texture. The HLSL code is shown below.

ACC_CONTROL_POINT SubDToParametricPatchHS(InputPatch<CONTROL_POINT_OUTPUT, M> p,
                                          uint tid : SV_OutputControlPointID,
                                          uint pid : SV_PrimitiveID)
{
    ACC_CONTROL_POINT output;
    // the connectivity type ID
    int topo = topologyIndex[pid];
    // the number of input vertices (the facet and its 1-ring) for this patch
    int num = vertexCount[pid];
    // invoke each thread to compute a control point
    ControlPointsEvaluation(pid, tid, topo, num);

    return output;
}

Given a list of the vertices in the 1-ring, we fetch the necessary set of weights from the stencil texture using the connectivity id, control point id, and input patch id. Each control point is computed independently per thread. To guarantee consistent evaluation of the weighted sum of input vertices, a global ordering of the vertices is also required. This approach maps well to the SIMD nature of the Direct3D GPU pipeline. To optimize the overall performance, we could reduce the number of texture fetches, since only a subset of the vertices in the 1-ring is actually involved in the patch construction. A boolean stencil mask is precomputed to indicate which vertices have zero contribution, so that the corresponding texture fetches can be skipped. The size of the stencil texture relates to the complexity of the connectivity, and the connectivity can be simplified by restricting the maximum valence as well as the number of triangles in the base mesh. Three major Approximating Catmull-Clark Subdivision schemes are summarized below:

Figure 21: A geometry patch (top) and a pair of tangent patches (bottom).

ACC patches [LS08a] This scheme has been implemented in Valve's Source engine and ILM's tool chain (more details are available in Chapter 3 and Chapter 4). Each extraordinary quad (one configuration in Figure 20) is converted to a geometry patch and a pair of tangent patches (see Figure 21). The geometry patch is a simple bicubic Bezier patch. The tangent patches derived directly from it cannot produce well-defined normals along the edges emanating from any vertex that does not have 4 neighbors. In order to achieve smooth shading in these areas, both the corner vectors (u00, u03, u20, u23, v00, v30, v02, v32) and tangent vectors (u01, u02, u21, u22, v10, v20, v12, v22) need to be adjusted to satisfy smoothness constraints. Specifically, the corner vectors are modified to satisfy the vertex enclosure at vertices pi, i = 0..3, and interpolate the normal of the Catmull-Clark limit surface. The tangent vectors are derived from solving G1 continuity between two adjacent patches.

The total number of control points is 40 (16 from the geometry patch and 24 from the tangent patches). The more control points that have to be derived, the more expensive the computation in the Hull Shader. Since many of them are redundant, we can reduce the number of control points to 32. This is because the interior vectors (u10, u11, u12, u13, v01, v11, v21, v31) do not need to be modified and can be derived directly from the geometry patch, so they are eliminated from the output control points. For a further reduction in control points, Loop and Schaefer suggested computing only b00, b30, b03, b33 in Figure 21 top, 8 corner vectors in Figure 21 bottom, and 12 vertices pi, i = 0..11 from the facet and its 1-ring (Figure 20). The remaining 12 geometry control points and 16 tangent control points are just functions of these output control points. Outputting fewer control points from the Hull Shader, however, increases the workload in the Domain Shader, which has to compute the remaining patch control points. Choosing how many control points to compute in the Hull Shader depends on which shader (the Hull Shader or the Domain Shader) is more likely to become the performance bottleneck. The ControlPointsEvaluation function using 32 output points in HLSL for this scheme is the following:

output.pos = float3(0,0,0);
output.tan = float3(0,0,0);
for (int i = 0; i < num; i++)
{
    // use index global ordering for
    // consistent control points evaluation
    int idx = stencilIndex[pid][i];
    // fetch the weight of the tid_th control point
    // in the patch for position control point
    int index = (topo * MAX_CONTROL_POINTS + idx) * 32 + tid;
    output.pos += p[i] * gStencil.Load(int3(index, 0, 0));
    // fetch the weight of the tid_th control point
    // in the patch for tangent control point
    output.tan += p[i] * gStencil.Load(int3(index + 16, 0, 0));
}
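The stencil idea behind the loop above, including the boolean mask that skips zero-weight vertices, can be sketched on the CPU as follows. This is a minimal Python illustration under stated assumptions: the function name, the flat weight list and the sample data are hypothetical, and the weights shown are those of the regular-case interior Bezier point b11 = (4 p0 + 2 p1 + p2 + 2 p3) / 9 rather than an entry from an actual stencil texture.

```python
def evaluate_control_point(vertices, weights, mask):
    """Weighted sum of the input vertices for one output control point.

    mask[i] is False where the precomputed weight is known to be zero, so the
    corresponding weight fetch and multiply can be skipped. Vertices are
    visited in a fixed order so every patch accumulates in the same sequence.
    """
    out = [0.0, 0.0, 0.0]
    for i, vertex in enumerate(vertices):
        if not mask[i]:
            continue  # zero contribution: no fetch needed
        w = weights[i]
        for k in range(3):
            out[k] += w * vertex[k]
    return tuple(out)

# Hypothetical input: the four quad vertices followed by two 1-ring vertices
# that do not contribute to this particular control point.
VERTS = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 3.0, 0.0), (0.0, 3.0, 0.0),
         (6.0, 0.0, 0.0), (6.0, 3.0, 0.0)]
WEIGHTS = [4.0 / 9.0, 2.0 / 9.0, 1.0 / 9.0, 2.0 / 9.0, 0.0, 0.0]
MASK = [w != 0.0 for w in WEIGHTS]

B11 = evaluate_control_point(VERTS, WEIGHTS, MASK)
```

With these inputs the control point lands at approximately (1, 1, 0), one third of the way into the quad from p0, and the two masked vertices are never touched.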

Figure 22: The labeling used for one sector of a c-patch is shown here.

C-patches [NYM+08] Instead of using separate channels for geometry and normal, each extraordinary quad could be converted to a single c-patch. A c-patch is a C1 piecewise polynomial patch with cubic boundary and defined by 24 control points. The surface generated using c-patches has


For more information about parametrization methods in general, [FH05] and [SPR06] provide an overview of most parametrization methods available to date.

Figure 35: Texture seams cause holes in the mesh.

Water-tightness and Texture Mapping Precision Modern hardware uses a floating point representation to interpolate texture coordinates. That can cause problems, because floating point values have more precision closer to the origin than farther from it [Goldberg91]. As a result, interpolation of texture coordinates along an edge closer to the origin will produce a different set of samples than interpolation along an edge that is farther from it. This is exactly what happens on texture seams and will result in small cracks in the mesh even when using a seamless parametrization. When using programmable tessellation hardware as specified by Direct3D 11, interpolation is performed explicitly in the Domain Shader (or in the Vertex Shader when using instanced tessellation on older GPUs). That's what enables the use of higher order interpolation, but it also allows the use of fixed point instead of floating point for interpolation. However, fixed point interpolation alone does not solve all the problems. Another problem is that bilinear interpolation of texture samples is not symmetric. Sampling at λ between two adjacent texels does not produce the same result as sampling at 1-λ when the values of the texels are reversed. This is also true for nearest filtering, because the result of sampling at 0.5 is undefined. For this reason, none of the seamless texture mapping algorithms solve the water-tightness problem entirely. So, it's necessary to use other methods.
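The precision argument above can be made concrete with a small Python sketch. It assumes IEEE-754 single precision for the floating point case and a hypothetical 16-bit fractional fixed point format for the alternative; the helper names (f32, ulp32, lerp_fixed) are illustrative only. The first part measures the spacing of representable single-precision values at two texture coordinates, and the second part shows a fixed point interpolation whose result does not depend on edge direction or on where the edge sits in texture space.

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ulp32(x):
    """Gap between f32(x) and the next larger single-precision value (x > 0)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0] - f32(x)

FRAC_BITS = 16  # assumed fixed point format: 16 fractional bits

def lerp_fixed(a, b, t):
    """Interpolate fixed point integers a and b, with t in [0, 2**FRAC_BITS]."""
    one = 1 << FRAC_BITS
    return (a * (one - t) + b * t) >> FRAC_BITS

# Floating point: the representable grid is twice as coarse at 0.75 as at 0.25.
SPACING_NEAR = ulp32(0.25)
SPACING_FAR = ulp32(0.75)

# Fixed point: same result for either edge direction, and shifting both end
# points by a whole offset shifts the result by exactly that offset.
A, B, T = 100, 70000, 12345
OFFSET = 5 << FRAC_BITS
```

Because the fixed point grid has the same spacing everywhere, two patches that traverse a shared edge in opposite directions, or that see the edge at different places in the atlas, can still agree bit for bit on the interpolated coordinate.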

Zippering Method A different approach is to introduce a triangle strip connecting the verticesalong the seam. These strips can be generated with the same tessellation unit used to generate thepatches, by setting the tessellation level equal to 1 in one of the parametric directions. This solvesthe problem nicely, but requires rendering more patches, and introduces additional triangles that inmost cases are nearly degenerate.

Figure 36: Seamless parameterizations remove bilinear artifacts, but do not solve floating point precision and bilinear filtering issues.

Another interesting solution is the zippering method proposed in [Sander03]. The idea is to sample the displacement (or the geometry image) on both sides of the seam and to use the average of the two samples. The main problem of this approach is that it requires two texture samples along the seams, which means you have to take two samples in all cases, or use branching to take an extra sample on the seam vertices only. However, the averaging method does not work for corners. Along the edges there are only two possible displacement values, one for each side of the seam, but on corners there are more than two. Storing an arbitrary number of texture coordinates, and taking an arbitrary number of texture samples would be too expensive. A simple solution is to snap the corner texture coordinates to the nearest texel, and make sure that the displacement value for that vertex is the same for all patches that meet at that corner.
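The averaging step along a seam edge can be sketched as follows. This is a minimal Python illustration, assuming point sampling of a small displacement map as a stand-in for a texture fetch; the function names and the sample data are hypothetical. Each seam vertex carries its own texture coordinate and the coordinate of the same point on the other side of the seam, and both patches evaluate the same symmetric expression.

```python
def sample_nearest(texture, uv):
    """Point-sample a displacement map stored as rows of floats."""
    height, width = len(texture), len(texture[0])
    x = min(int(uv[0] * width), width - 1)
    y = min(int(uv[1] * height), height - 1)
    return texture[y][x]

def zippered_displacement(texture, uv_own, uv_across):
    """Average the displacement sampled on both sides of the seam."""
    return 0.5 * (sample_nearest(texture, uv_own) +
                  sample_nearest(texture, uv_across))

# Hypothetical 2x2 displacement map; the two sides of the seam disagree.
DISPLACEMENT = [[0.30, 0.10],
                [0.20, 0.34]]
UV_SIDE_A = (0.1, 0.1)  # seam vertex as seen from patch A
UV_SIDE_B = (0.9, 0.9)  # the same vertex as seen from patch B

FROM_A = zippered_displacement(DISPLACEMENT, UV_SIDE_A, UV_SIDE_B)
FROM_B = zippered_displacement(DISPLACEMENT, UV_SIDE_B, UV_SIDE_A)
```

Since floating point addition is commutative, both patches obtain the same displacement even though the two raw samples differ, which is what closes the crack along the edge. The cost is the second sample per seam vertex noted above.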

A cheaper solution that only requires a single texture sample and handles corners more gracefully is to define patch ownership of the seams [Cas08b, Cas08a]. By designating the patch owner for every edge and corner, all patches can agree what texture coordinate to use when sampling the displacement at those locations. That means that for every edge and for every corner we need to store the texture coordinates of the owner of those features. That is a total of 4 texture coordinates per vertex (16 for quads and 12 for triangles). At runtime, only a single texture sample is needed; the corresponding texture coordinate can be selected with a simple calculation:

// Compute texture coordinate indices (0: interior, 1,2: edges, 3: corner)
int idx0 = 2 * (uv.x == 1) + (uv.y == 1);
int idx1 = 2 * (uv.y == 1) + (uv.x == 0);
int idx2 = 2 * (uv.x == 0) + (uv.y == 0);
int idx3 = 2 * (uv.y == 0) + (uv.x == 1);

// Barycentric interpolation of texture coordinates
float2 tc = bar.x * texCoord[0][idx0] +
            bar.y * texCoord[1][idx1] +
            bar.z * texCoord[2][idx2] +
            bar.w * texCoord[3][idx3];

In the averaging method we would have to store the texture coordinate of every patch that contributes to a shared feature. Edges are shared by only two patches, but corners can be shared by many patches. By defining the ownership of the shared features (corners and edges), we only have to store the texture coordinates of the patch that owns the corresponding feature. So, we have:

• 4 texture coordinates for the interior (4).

• 2 texture coordinates for each edge (8).

• 1 texture coordinate for each corner (4).

Therefore, the total number of texture coordinates per patch is: 4 + 8 + 4 = 16. Deciding what patch owns a certain edge or corner is done as a pre-process, so that the patch texture coordinates can be computed in advance. The way we store these texture coordinates is shown in Figure 37.

Figure 37: The texture coordinates at 4 corner vertices.

Each vertex has:

• one interior texture coordinate. (index 0)

• one edge texture coordinate for each of the edges. (index 1 and 2)

• one corner texture coordinate. (index 3)

On the interior, we interpolate the interior texture coordinates bi-linearly:

float2 tc = bar.x * texCoord[0][0] +
            bar.y * texCoord[1][0] +
            bar.z * texCoord[2][0] +
            bar.w * texCoord[3][0];

where bar stands for the barycentric coordinates:

bar.x = (    uv.x) * (    uv.y);
bar.y = (1 - uv.x) * (    uv.y);
bar.z = (1 - uv.x) * (1 - uv.y);
bar.w = (    uv.x) * (1 - uv.y);

On the edges we interpolate the edge texture coordinates linearly:

if (uv.y == 1) tc = texCoord[0][1] * bar.x + texCoord[1][2] * bar.y;
if (uv.y == 0) tc = texCoord[2][1] * bar.z + texCoord[3][2] * bar.w;
if (uv.x == 1) tc = texCoord[3][1] * bar.w + texCoord[0][2] * bar.x;
if (uv.x == 0) tc = texCoord[1][1] * bar.y + texCoord[2][2] * bar.z;

And at the corners we simply select the appropriate corner texture coordinate:

if (bar.x == 1) tc = texCoord[0][3];
if (bar.y == 1) tc = texCoord[1][3];
if (bar.z == 1) tc = texCoord[2][3];
if (bar.w == 1) tc = texCoord[3][3];

The same thing can be done more efficiently using a single bilinear interpolation preceded by some predicated assignments:

// Interior
float2 t0 = texCoord[0][0];
float2 t1 = texCoord[1][0];
float2 t2 = texCoord[2][0];
float2 t3 = texCoord[3][0];

// Edges
if (uv.y == 1) { t0 = texCoord[0][1]; t1 = texCoord[1][2]; }
if (uv.y == 0) { t2 = texCoord[2][1]; t3 = texCoord[3][2]; }
if (uv.x == 1) { t3 = texCoord[3][1]; t0 = texCoord[0][2]; }
if (uv.x == 0) { t1 = texCoord[1][1]; t2 = texCoord[2][2]; }

// Corners
if (bar.x == 1) t0 = texCoord[0][3];
if (bar.y == 1) t1 = texCoord[1][3];
if (bar.z == 1) t2 = texCoord[2][3];
if (bar.w == 1) t3 = texCoord[3][3];

float2 tc = bar.x * t0 + bar.y * t1 + bar.z * t2 + bar.w * t3;

And finally, the predicated assignments can be simplified and replaced by an index calculation:

// Compute texture coordinate indices (0: interior, 1,2: edges, 3: corner)
int idx0 = 2 * (uv.x == 1) + (uv.y == 1);
int idx1 = 2 * (uv.y == 1) + (uv.x == 0);
int idx2 = 2 * (uv.x == 0) + (uv.y == 0);
int idx3 = 2 * (uv.y == 0) + (uv.x == 1);

float2 tc = bar.x * texCoord[0][idx0] +
            bar.y * texCoord[1][idx1] +
            bar.z * texCoord[2][idx2] +
            bar.w * texCoord[3][idx3];
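To check that the index calculation for quads selects the same texture coordinates as the predicated assignments, both variants can be transcribed to Python and compared over interior, edge and corner locations. This is a minimal sketch; the table TEX holds arbitrary distinct values invented for the test, and the function names are hypothetical.

```python
def bary(u, v):
    """Bilinear weights, matching bar.x .. bar.w in the text."""
    return (u * v, (1 - u) * v, (1 - u) * (1 - v), u * (1 - v))

def blend(bar, t):
    """Weighted sum of four 2D texture coordinates."""
    return tuple(sum(b * tc[k] for b, tc in zip(bar, t)) for k in range(2))

def tc_predicated(tex, u, v):
    bar = bary(u, v)
    t = [tex[i][0] for i in range(4)]            # interior
    if v == 1: t[0], t[1] = tex[0][1], tex[1][2]  # edges
    if v == 0: t[2], t[3] = tex[2][1], tex[3][2]
    if u == 1: t[3], t[0] = tex[3][1], tex[0][2]
    if u == 0: t[1], t[2] = tex[1][1], tex[2][2]
    for i in range(4):                           # corners
        if bar[i] == 1:
            t[i] = tex[i][3]
    return blend(bar, t)

def tc_indexed(tex, u, v):
    bar = bary(u, v)
    idx = (2 * (u == 1) + (v == 1),
           2 * (v == 1) + (u == 0),
           2 * (u == 0) + (v == 0),
           2 * (v == 0) + (u == 1))
    return blend(bar, [tex[i][idx[i]] for i in range(4)])

# TEX[vertex][slot]: slot 0 interior, 1 and 2 edges, 3 corner (arbitrary values).
TEX = [[(float(10 * i + j), float(i - j)) for j in range(4)] for i in range(4)]
SAMPLES = [(u, v) for u in (0.0, 0.25, 1.0) for v in (0.0, 0.5, 1.0)]
AGREE = all(tc_predicated(TEX, u, v) == tc_indexed(TEX, u, v) for u, v in SAMPLES)
```

The two functions feed identical operands to the same blend, so they agree exactly at every sample, including the four corners where only the corner slot carries a non-zero weight.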

The same idea also applies to triangles:

// Interior
float2 t0 = texCoord[0][0];
float2 t1 = texCoord[1][0];
float2 t2 = texCoord[2][0];

// Edges
if (bar.x == 0) { t1 = texCoord[1][1]; t2 = texCoord[2][2]; }
if (bar.y == 0) { t2 = texCoord[2][1]; t0 = texCoord[0][2]; }
if (bar.z == 0) { t0 = texCoord[0][1]; t1 = texCoord[1][2]; }

// Corners
if (bar.x == 1) t0 = texCoord[0][3];
if (bar.y == 1) t1 = texCoord[1][3];
if (bar.z == 1) t2 = texCoord[2][3];

float2 tc = bar.x * t0 + bar.y * t1 + bar.z * t2;

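And the resulting code can be optimized the same way:

// Index 1 is selected on the edge towards the next vertex, index 2 on the edge
// from the previous vertex, matching the predicated assignments for triangles.
int idx0 = 2 * (bar.y == 0) + (bar.z == 0);
int idx1 = 2 * (bar.z == 0) + (bar.x == 0);
int idx2 = 2 * (bar.x == 0) + (bar.y == 0);

float2 tc = bar.x * texCoord[0][idx0] +
            bar.y * texCoord[1][idx1] +
            bar.z * texCoord[2][idx2];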

Partition of Unity The zippering methods produce watertight results independently of the parameterization. However, if the parameterization is not seamless or if the features of the displacement map are different on each side of the seam, then that will result in sharp discontinuities: not holes, but undesirable creases along the seams. These problems can be avoided by using a seamless parameterization and generating the displacement maps so that the displacements match along the seams. However, another solution is to use a partition of unity as proposed by [PB00]. A partition of unity is a method to combine multiple texture parameterizations to produce smooth transitions between them. The idea is to define transition regions around the seams, so that in those regions both parameterizations are used to sample the texture and the results are blended smoothly. The zippering methods described before are just a special case of a partition of unity in which the blend function is the unit step function.
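The relationship between a smooth partition of unity and the step-function special case can be sketched in a few lines of Python. This is an illustration only: the cubic smoothstep weight, the choice of 0.5 as the step threshold and the function names are assumptions, not the blend functions used in [PB00]. The parameter t is taken to be the normalized position across the transition region, with 0 fully inside the first parameterization and 1 fully inside the second.

```python
def smoothstep(t):
    """Smooth blend weight: 0 at t <= 0, 1 at t >= 1, with zero end slopes."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def unit_step(t):
    """Hard switch between parameterizations (assumed threshold at 0.5)."""
    return 1.0 if t >= 0.5 else 0.0

def blend_samples(sample_a, sample_b, t, weight):
    """Blend two displacement samples; the two weights always sum to one."""
    w = weight(t)
    return (1.0 - w) * sample_a + w * sample_b

# Hypothetical displacement values sampled through two parameterizations.
SAMPLE_A, SAMPLE_B = 2.0, 4.0
MID_SMOOTH = blend_samples(SAMPLE_A, SAMPLE_B, 0.5, smoothstep)
NEAR_A_STEP = blend_samples(SAMPLE_A, SAMPLE_B, 0.3, unit_step)
```

With the smooth weight the displacement moves gradually from one sample to the other across the transition region, while the step weight simply picks one side, which corresponds to choosing a single owner for the shared feature.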

Conclusion There are many different solutions to achieve watertightness when sampling displacement maps. We advocate the use of zippering methods, since they do not impose any restriction on the parameterization of the mesh and work with arbitrary displacement maps. They are easy to implement and do not add much overhead to the shaders, even though they increase the number of texture coordinates. Note that even when using zippering methods to guarantee watertightness, the use of seamless (or nearly seamless) parameterizations is still valuable, because they eliminate any visible crease or discontinuity along the seams. These artifacts can also be avoided by combining multiple parameterizations using a partition of unity, but these methods are too expensive to be practical.


3 Approximate Subdivision Surfaces in Valve’s Source Engine

Jason Mitchell, Valve

Figure 38: Character from the game Team Fortress 2 modeled as a Catmull-Clark subdivision surface and rendered with Valve's subdivision surface approximation (based on ACC patches). The character and his weapon contain sharp features which require crease support to render correctly. In the second image, the black lines indicate patch edges with tagged crease edges highlighted in green.

3.1 Introduction

At Valve, we have invested early in GPU-friendly approximations to displaced Catmull-Clark subdivision surfaces with the expectation that this will accelerate our ability to exploit hardware tessellation as it becomes available in Direct3D 11 hardware [Gee08]. Our software architecture follows the Direct3D 11 pipeline architecture in order to ease the eventual migration to hardware. As a result, we have been able to address implementation details specific to mapping Loop and Schaefer's Approximate Catmull-Clark (ACC) [LS08a] scheme to Direct3D 11 such as those discussed in Section 2.3 and have even extended ACC to include support for hard creases [KMDZ09]. In this chapter, we will describe our system, including the run-time characteristics of two different implementations, extensions necessary to support displacement mapping and implications for our authoring pipeline.

3.2 Motivation

Higher-order surfaces have long been standard in the film industry due to their ability to compactly represent high quality smooth surfaces. In the real-time space, we anticipate widespread adoption of higher-order surface schemes as GPU compute density continues to outstrip memory and memory bandwidth, particularly when it comes to console designs which tend to be especially memory constrained. In addition to the pure compute-related motivations for moving tessellation and surface evaluation onto the GPU, higher-order surfaces have a number of other desirable properties. Higher-order surfaces have a natural LOD mechanism, as they can be arbitrarily tessellated to trade off quality and performance. We are interested in the ability to author assets which allow us to scale both up and down. That is, we want to build a model once and be able to scale it up to film quality using tessellation and displacement mapping for use in offline-rendered movies as well as future hardware. Conversely, we want to be able to naturally scale the quality of an asset down to meet the needs of real-time rendering on a given system. We expect that such models can be tailored to dovetail with traditional polygon rendering both at runtime and in the art pipeline, which is essential when assets must be reused across hardware with varying performance levels and feature support (including no higher-order surface support at all). Another nice advantage of higher order surfaces is that their use is orthogonal to most shading techniques, including many popular rendering trends such as screen space techniques and deferred rendering, which are agnostic to the upstream geometry representation that populates their image-space inputs [ST90][DWS+88][Mit07][Val07]. For these reasons, we have advocated the implementation of tessellation hardware and have chosen to take on the risk of investing in this technology prior to hardware availability.

3.3 Software Pipeline

Figure 39: DX11 pipeline using instanced & native tessellation on DX9. Panels: Instanced Tessellation and Native Tessellation. Stages shown: Vertex Shader, Hull Shader, Tessellator emulated with instanced patches (instanced path) or Native Tessellator (native path), Domain Shader implemented by hardware Vertex Shader, Pixel Shader.

We have mapped the Direct3D 11 pipeline onto Direct3D 9, including instanced and native tessellation codepaths, where the vertex shader and hull shader are implemented in software on the CPU and the remaining stages are executed on the GPU as shown in Figure 39. The CPU-side vertex shading operations include skinning, vertex morphing and other operations which are appropriate to perform at the control mesh level. Post-transform control mesh vertices are then sent to a threaded and SIMD-optimized software hull shader where they are mapped to a set of Bezier patches using our technique. This data is copied asynchronously to a floating point texture map in GPU memory for subsequent processing by the GPU.

With the control points sitting in a floating point texture, domain points are instantiated with appropriate mesh connectivity on the GPU using either hardware instancing or ATI’s native hardware tessellator. After this on-chip data amplification, the GPU’s vertex shader, playing the role of domain shader, evaluates Bezier patch positions and tangent frames at the newly generated vertices using Bezier control points fetched from the floating point texture generated by the CPU-side hull shader. In our case, the Catmull-Clark to ACC conversion was done on the CPU, though it can be performed on the GPU as is done by the SubD10 sample in the DirectX SDK as well as by [NYM+08].
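As a rough illustration of what the domain shader stage computes for positions, the following Python sketch evaluates a bicubic Bezier patch at a domain point using the cubic Bernstein basis. It is not the shader code used in the Source engine; the function names, the 4x4 row/column layout of the control points, and the flat test patch are all assumptions made for the example.

```python
from math import comb


def bernstein3(t):
    """Return the four cubic Bernstein basis values at parameter t."""
    return [comb(3, i) * (t ** i) * ((1.0 - t) ** (3 - i)) for i in range(4)]


def eval_bicubic_patch(control_points, u, v):
    """Evaluate a bicubic Bezier patch position at the domain point (u, v).

    control_points is a 4x4 grid (list of rows) of (x, y, z) tuples.
    Assumed layout: the column index is weighted by u, the row index by v.
    """
    bu, bv = bernstein3(u), bernstein3(v)
    point = [0.0, 0.0, 0.0]
    for j in range(4):
        for i in range(4):
            weight = bv[j] * bu[i]
            for k in range(3):
                point[k] += weight * control_points[j][i][k]
    return tuple(point)


# Hypothetical test patch: evenly spaced control points in the z = 0 plane.
flat_patch = [[(float(i), float(j), 0.0) for i in range(4)] for j in range(4)]
center = eval_bicubic_patch(flat_patch, 0.5, 0.5)
```

Because the control points of the test patch are evenly spaced, the patch reproduces the plane exactly, which makes it a convenient sanity check before feeding in real ACC control points.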

We can then instantiate domain points using either instanced tessellation or native tessellation. As discussed in Section 2.4, instanced tessellation is available in shader model 3.0 hardware with vertex texture fetch capabilities such as NVIDIA GeForce 8x00 and ATI RADEON HD 2x00 and newer GPUs. Native tessellation is available in the Xbox 360 as well as ATI’s Direct3D 10 class GPUs [Lee06] [Tat07].

3.3.1 Native Tessellation

ATI’s hardware tessellator instantiates the vertex shader at u, v points in the [0..1]^2 domain and provides the shader with access to all of the “super-primitive” data from the input vertices [Lee06] [Tat07]. In ATI’s terminology, “super-primitive” refers to the notion that the shader has access to the attributes of the vertices defining the whole primitive (in our case, all four vertices defining each quad). This is in contrast to a traditional vertex shader, which only has access to the attributes of a single vertex, with no primitive-level information. The ATI model behaves much like a limited form of domain shader in that the shader has access to primitive-level data, albeit much less data than is made available to a Direct3D 11 hardware domain shader. In our implementation of native tessellation, the domain/vertex shader uses this super-primitive data and fetched Bezier patch data to evaluate patch attributes.
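To make the notion of instantiating domain points concrete, here is a small Python sketch that generates a uniform grid of (u, v) samples over the unit square together with triangle connectivity. This only mimics the output of a uniform tessellation level; the function name, the vertex ordering and the way each cell is split into two triangles are assumptions for illustration, not a description of ATI’s tessellator or of the instanced patch meshes.

```python
def domain_grid(n):
    """Return (points, triangles) for an n x n grid of (u, v) samples in [0, 1]^2.

    n is the number of tessellated vertices per patch edge (n >= 2).
    Points are stored row by row; triangles index into the points list.
    """
    if n < 2:
        raise ValueError("need at least two vertices per edge")
    points = [(i / (n - 1), j / (n - 1)) for j in range(n) for i in range(n)]
    triangles = []
    for j in range(n - 1):
        for i in range(n - 1):
            a = j * n + i   # corner at (i, j)
            b = a + 1       # corner at (i + 1, j)
            c = a + n       # corner at (i, j + 1)
            d = c + 1       # corner at (i + 1, j + 1)
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return points, triangles


points, triangles = domain_grid(3)
```

For n vertices per edge this yields n * n domain points and 2 * (n - 1) * (n - 1) triangles per quad patch.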

The remainder of the graphics pipeline is unchanged, so an implementor need only alter existing vertex shaders and vertex buffer layouts to take advantage of tessellation. In Valve’s Source engine, minimal changes were necessary to add this functionality to existing production-tested vertex shaders, though we did run into some limitations of the Direct3D 9 vertex shader programming model which we will discuss below.

3.3.2 Performance

In our architecture, it has been fairly straightforward to maintain both instanced and native tessellation codepaths so that we can measure tradeoffs of the two approaches. For example, as we will discuss below, we have measured performance advantages in the instancing path. Despite this, it is convenient to maintain the native tessellation method, both as a means of cross-checking algorithmic details and to gain access to the additional features provided by native tessellation, such as easy access to floating point tessellation and the separate per-edge LOD functionality necessary to implement adaptive subdivision.

CPU Performance. As discussed in Section 3.3, we perform vertex and hull shading on the CPU using a software architecture modeled on the Direct3D 11 pipeline. In our measurements, the primary bottleneck in the CPU hull shader’s conversion from Catmull-Clark to Bezier patches using ACC is the computation of the tangent patches. As mentioned in Section 2.3.2, quad meshes are typically comprised of a mix of regular and extraordinary patches, each with different properties. Because of the nature of ACC, we can avoid the significant performance cost of computing tangent patches for the regular patches of the mesh since they are not required. Hence, the overall performance is dependent on the mix of valences in the mesh being converted, where a mesh consisting of only regular patches could see as much as a twofold performance increase over a wholly extraordinary mesh. Additionally, we have vectorized the conversion math using CPU SIMD operations, lookup tables and loop unrolling, resulting in roughly a 2x speed improvement relative to our original CPU implementation. Further, since hull shader invocations are independent of one another in the Direct3D 11 pipeline architecture, it is natural to split this computation across multiple CPU cores. Threading the hull shader invocations resulted in an additional 3.58x performance improvement on 4 cores for meshes between 1000 and 10,000 patches. Naturally, these CPU speedups are independent of the chosen GPU data amplification method (instanced or native tessellation).
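To put the threading figure in context, the short Python calculation below derives the parallel efficiency implied by a 3.58x speedup on 4 cores. The second function goes beyond the text: it assumes Amdahl’s law applies and solves it for the parallelizable fraction, and the last line assumes that the SIMD and threading speedups compose multiplicatively. Both are modeling assumptions, not measurements.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


def amdahl_parallel_fraction(speedup, cores):
    """Solve Amdahl's law S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


efficiency = parallel_efficiency(3.58, 4)       # about 0.895
fraction = amdahl_parallel_fraction(3.58, 4)    # about 0.961 under Amdahl's law
combined = 2.0 * 3.58                           # about 7.2x if the speedups multiply
```

Under these assumptions, roughly 96% of the hull shader work would be parallelizable, which is consistent with the observation that hull shader invocations are independent of one another.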

GPU Performance. On the GPU side of the bus, we can compare the performance of instanced and native tessellation. In Table 4, we compare instanced and native tessellation performance of the datasets shown in Figure 40 using the ATI RADEON 4870 X2, which is capable of running both codepaths.

| Mesh | Native N=3 | Native N=9 | Native N=15 | Instanced N=3 | Instanced N=9 | Instanced N=15 |
|------|-----------:|-----------:|------------:|--------------:|--------------:|---------------:|
| Car  | 1344 | 1296 | 589 | 1550 | 1301 | 846 |
| Ship | 1245 | 326 | 137 | 1196 | 473 | 222 |
| Poly | 747 | 160 | 65 | 532 | 304 | 132 |

Table 4: Performance Comparisons. The Car, Ship and Poly models contain 1164, 5180 and 10618 quad faces. Performance numbers are in frames per second, measured on an Intel Quad Core Q9450 2.66 GHz and ATI RADEON 4870 X2. N = number of tessellated vertices per control mesh edge.

Figure 40: Car, Rocket Frog, Ship and Poly models used in performance measurements. The models contain 1164, 1292, 5180 and 10618 quad faces respectively.

In our tests, we have often seen the instanced path perform twice as fast as the native path. As you would expect, both techniques perform the same number of texture operations. Due to differences in the hardware interfaces, however, the native tessellation shader uses roughly 16% more instructions than the instanced patch shader. This minor difference does not explain the 2x performance delta, leading us to conclude that the performance difference is not entirely related to shader length, but rather is related to deeper hardware implementation details.
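The ratios behind this comparison can be read directly off Table 4. The Python snippet below simply divides the instanced frame rate by the native frame rate for each mesh and tessellation level; the dictionary layout and names are ours, while the numbers are copied from the table.

```python
# Frames per second from Table 4, listed for N = 3, 9, 15.
FPS = {
    "Car":  {"native": (1344, 1296, 589), "instanced": (1550, 1301, 846)},
    "Ship": {"native": (1245, 326, 137),  "instanced": (1196, 473, 222)},
    "Poly": {"native": (747, 160, 65),    "instanced": (532, 304, 132)},
}
LEVELS = (3, 9, 15)


def instanced_speedup(fps):
    """Return instanced/native frame-rate ratios per mesh, rounded to 2 places."""
    return {
        mesh: tuple(round(i / n, 2) for i, n in zip(row["instanced"], row["native"]))
        for mesh, row in fps.items()
    }


ratios = instanced_speedup(FPS)
for mesh, values in ratios.items():
    print(mesh, dict(zip(LEVELS, values)))
```

On these particular numbers the instanced path is ahead in seven of the nine cells, the native path is ahead at N=3 for the Ship and Poly meshes, and the ratio reaches about 2x only for the largest mesh (Poly) at N=9 and N=15.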

Graphics Pipeline State. Our measurements indicate that ATI’s native tessellation path seems to be more impacted by the rest of the pipeline state, notably the complexity of the pixel shader and the number of interpolators output from the vertex shading unit to triangle setup. The numbers in Table 4 were generated with a vertex shader which outputs two interpolators to a trivial pixel shader. If we output five interpolators to a 22-instruction pixel shader, we measure a 1.8x to 2x performance hit when using native tessellation. The instanced tessellation path sees no performance penalty with the same change. Given that our production shaders frequently max out the number of vertex shader outputs (up to 10 4D vectors in Direct3D 9), the instanced tessellation path has a significant performance advantage in practice.

Regular vs Extraordinary Patches. Through our own tests and using hardware analysis tools such as NVPerfHUD, we have determined that both the instanced and native hardware tessellation shaders are vertex texture fetch bound. Each invocation of the domain shader performs 30 fetches of packed control point data (12 for the control points, and 9 for each of the two tangent patches). For regular patches (with all vertices of valence 4), we can avoid fetching the tangent patch control points and use the de Casteljau algorithm (described in Section 2.3.3) to compute both positions and normals. This saves 18 texture fetches for these patches at the expense of drawing regular and extraordinary patches with two API calls rather than one. In this case, we measured a 20% (1.9 ms) performance improvement in GPU evaluation cost at N = 33 for the rocket frog mesh by splitting evaluation of regular and extraordinary patches. In addition, we avoid calculating tangent patches for regular patches when converting from Catmull-Clark to ACC in the (CPU) hull shader. In our implementation, this saves an additional 0.26 ms of CPU time on the rocket frog.
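The regular-patch shortcut can be sketched as follows. The Python code below uses de Casteljau’s algorithm to obtain both a position and a unit normal from the 4x4 geometry patch alone, and then reproduces the fetch counts quoted above. Two things here are assumptions rather than statements from the text: that control points are packed as xyz triples into four-channel float texels, and that each ACC tangent patch has 3x4 = 12 control points. With those assumptions the arithmetic matches the 12 + 9 + 9 = 30 figure. All function and variable names are hypothetical.

```python
def lerp(p, q, t):
    return tuple((1.0 - t) * a + t * b for a, b in zip(p, q))


def de_casteljau(points, t):
    """Return (point, derivative) of a Bezier curve (degree >= 1) at t."""
    degree = len(points) - 1
    pts = list(points)
    while len(pts) > 2:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    # The last two intermediate points span the tangent direction.
    derivative = tuple(degree * (b - a) for a, b in zip(pts[0], pts[1]))
    return lerp(pts[0], pts[1], t), derivative


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = sum(c * c for c in v) ** 0.5
    return tuple(c / length for c in v)


def eval_regular_patch(grid, u, v):
    """Position and unit normal of a bicubic patch from its 4x4 grid alone."""
    # Collapse each row in u, then the resulting column in v (yields dP/dv).
    row_points = [de_casteljau(row, u)[0] for row in grid]
    position, dv = de_casteljau(row_points, v)
    # Collapse each column in v, then the resulting row in u (yields dP/du).
    columns = [[grid[j][i] for j in range(4)] for i in range(4)]
    col_points = [de_casteljau(col, v)[0] for col in columns]
    _, du = de_casteljau(col_points, u)
    return position, normalize(cross(du, dv))


TEXEL_CHANNELS = 4  # assumption: xyz data packed tightly into RGBA float texels


def packed_fetches(num_points, components=3):
    total = num_points * components
    return (total + TEXEL_CHANNELS - 1) // TEXEL_CHANNELS  # ceiling division


GEOMETRY_FETCHES = packed_fetches(16)   # 4x4 geometry patch
TANGENT_FETCHES = packed_fetches(12)    # assumed 3x4 tangent patch
FULL_FETCHES = GEOMETRY_FETCHES + 2 * TANGENT_FETCHES
SAVED_FETCHES = FULL_FETCHES - GEOMETRY_FETCHES

# Hypothetical flat test patch in the z = 0 plane.
flat = [[(float(i), float(j), 0.0) for i in range(4)] for j in range(4)]
position, normal = eval_regular_patch(flat, 0.5, 0.5)
```

Note that the cross product of the two partial derivatives is only a valid normal where the patch is smooth across its boundaries, which is why this shortcut is restricted to regular patches in the text.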

In practice, care should be taken with small meshes (<2000 patches) with few regular patches. Separating regular and extraordinary patches requires two API calls (as opposed to just one), and the CPU-side overhead of this extra API call can outweigh the savings gained by reducing the shader load for regular patches. Additionally, we have found it advantageous (and in some cases necessary) to keep the use of vertex shader general purpose registers (GPRs) to a minimum, particularly when combining patch evaluation with some of the more advanced vertex shaders that we have used in recent games such as Team Fortress 2, Portal, Left 4 Dead and the Half-Life 2 series. To reduce the number of GPRs, we reorganized the shader code to split the loading and evaluation of the geometry patch from the loading and evaluation of the tangent patches, allowing GPRs to be reused between position evaluation and tangent evaluation. This made the implementation somewhat awkward, but the Direct3D 9 vertex shader programming model simply exhausted its general purpose register bank without such shader massaging.

Staying “On Chip”. It is worth noting that the Direct3D 11 hardware pipeline is designed to eliminate the need for the domain shader to fetch control point data from memory. That is, once all stages of this approximate Catmull-Clark rendering method have migrated to Direct3D 11 GPUs, the transmission of control point data from the hull shader to the domain shader will not require the control points to ever reside in off-chip memory. The control points will only exist fleetingly “on chip,” computed at patch granularity by the hull shader and used for evaluation at post-tessellated vertex granularity by the domain shader. It is the hope of API, hardware and game designers that this drastic reduction in memory traffic will greatly increase the GPU performance of any higher-order surface scheme which is executed on Direct3D 11 hardware. It will be exciting to see how this plays out over the coming years.

3.4 Creases

Though Microsoft and its hardware partners had Loop and Schaefer’s ACC technique in mind when developing the Direct3D 11 pipeline, each new stage has remained programmable so that developers can customize the functionality to suit their needs in a variety of ways. At Valve, the programmability of the Direct3D 11 architecture has allowed us to extend ACC to support hard creases [KMDZ09]. This additional functionality has no measurable impact on performance and, in fact, the tessellator and domain shader are no different than they are in the original ACC technique.

While subdivision surfaces allow us to compactly represent smooth geometry, we sometimes wish to incorporate hard creases into our art. On the left side of Figure 41, we see an example of a car modeled with Catmull-Clark subdivision surfaces and rendered with ACC. While this yields a smooth, high quality result with no visible polygonal artifacts, many areas of the car are overly smooth and do not capture the shape that the artist intended to convey. On the right side of Figure 41, we see the same car with numerous edges tagged as hard creases by the artist and rendered with our system to better convey the intended shape. For example, when rendered with regular ACC, the front bumper turns into a cylindrical shape with tapered ends, but this shape is rendered as a square extruded along a smooth B-Spline path using our system. The artist has even modeled two prominent “pinches” (darts) in the hood of the car by tagging edges which happen to be isolated and do not form part of a corner or creased edge loop.

Figure 41: Left: A car model rendered with ACC. Right: The same model with creases and corners added using our method.

In Figure 42, we see a closeup of the dashboard of the car, rendered with ACC and with our method. You can easily see a variety of areas where the artist used the expressive power of hard creases to convey the desired shape of this mechanical object. Even for non-mechanical models such as the Heavy Weapons Guy from Team Fortress 2 shown in Figures 38 and 43, hard creases are useful for modeling hard edges at areas such as the character’s clothing and even his fingernails.

3.5 Displacement Mapping

In addition to approximating the Catmull-Clark limit surface, it is possible to compactly represent high frequency detail by displacing the vertices from the approximate limit surface [Coo84] [LMH00]. We have written an extractor which processes the Catmull-Clark control mesh and a separate high-resolution detail mesh to generate a scalar displacement map relative to our approximation to the Catmull-Clark subdivision surface [COM98]. Each invocation of the domain shader performs 30 fetches of packed control point data, and the inclusion of an additional data fetch to access our displacement map has negligible incremental performance impact. Likewise, the few additional ALU operations necessary to displace the vertex from the approximate limit surface are insignificant.
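The extra work per vertex amounts to one lookup and one multiply-add per component. The Python sketch below shows scalar displacement along the unit normal; the nearest-texel lookup, the `scale` parameter and the tiny height map are assumptions for illustration (a real shader would typically use filtered texture sampling), and this is not the Source engine’s shader code.

```python
def sample_nearest(height_map, u, v):
    """Nearest-texel lookup into a 2D height map given (u, v) in [0, 1]."""
    rows, cols = len(height_map), len(height_map[0])
    x = min(int(u * cols), cols - 1)
    y = min(int(v * rows), rows - 1)
    return height_map[y][x]


def displace(position, normal, height, scale=1.0):
    """Move a surface point along its unit normal by a scalar displacement."""
    return tuple(p + scale * height * n for p, n in zip(position, normal))


# Hypothetical 2x2 scalar displacement map.
height_map = [[0.0, 0.5],
              [1.0, 0.25]]
h = sample_nearest(height_map, 0.75, 0.25)
displaced = displace((1.0, 2.0, 3.0), (0.0, 0.0, 1.0), h)
```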

Figure 42: Left: The dashboard of our car model rendered with ACC. Right: The same model with creases and corners added using our method.

Figure 43: Closeups of Heavy Weapons Guy’s hands using creases to concisely model details such as fingernails and the edges of the gloves.

The barriers to the adoption of displacement mapping are the additional memory burden of storing the displacement maps and the tool investment necessary to integrate displacement mapping into the art pipeline. There are available commercial tools such as ZBrush, Mudbox and others which can output heightfield textures suitable for use as displacement maps. In these tools, however, the computation of such height maps is performed relative to the Catmull-Clark limit surface of the underlying control mesh. The Catmull-Clark limit surface is not what we are rendering, however. We are rendering an approximation made up of bicubic patches using a separate normal field which makes up for the fact that the geometry patches are not necessarily C1 at patch boundaries. As a result, we have written our own displacement map baker which uses the creased ACC geometry and normal fields in the baking process to ensure that the displacement maps are computed relative to the approximate limit surface that we actually render. In Figure 44, we see a Vortigaunt character from the game Half-Life 2 rendered as an approximate Catmull-Clark subdivision surface. In the top row of images, we see the smooth approximate limit surface shaded with a simple Phong shader, using a normal map to provide some detail in the lighting. In the bottom row of images, displacement mapping has been applied to the character to add surface detail.

Figure 44: Top row: A Vortigaunt from Half-Life 2 rendered as an approximate Catmull-Clark subdivision surface. Bottom row: The Vortigaunt rendered with displacement mapping.

Future-proof Assets. As you can see in Figure 44, we tend to use displacement maps to capture so-called meso-structure details rather than large-scale object structures such as appendages. We feel that this is appropriate because these details can safely be LOD’d away or even omitted as part of the scalability required of interactive games which must ship into a marketplace with widely varying GPU capabilities and performance characteristics. So, while players of future Half-Life games may initially only see the displacement mapped Vortigaunt in non-interactive movies or on very high-end hardware, we anticipate being able to phase in the displacement mapped assets as hardware improves without having to go back and rebuild the character again.

3.5.1 Wrinkle Mapping

For a number of years, we have been employing a technique we call wrinkle mapping to the normal and base color textures of our characters’ faces in order to give the impression of complex surface deformation. During facial animation, an additional scalar channel (the wrinkle weight) is accumulated per vertex along with our geometric morph target deformations. The pixel shader uses this interpolated wrinkle weight to perform a simple blend between three normal and color textures: the neutral, compress and stretch maps, as described previously by [Dia08]. As you would expect, wrinkle mapping naturally combines with displacement mapping with very little shader modification. In Figure 45, we can see the neutral, compress and stretch poses for the Heavy Weapons Guy from Team Fortress 2, including displacements which add fine-grained dynamic geometry deformations at very low cost.
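A minimal sketch of such a three-way blend is shown below in Python. The sign convention (negative weights blend toward the compress map, positive weights toward the stretch map, with the weight clamped to [-1, 1]) is an assumption made for this example and is not confirmed by the text; the same function applies equally to a color, a normal, or a scalar displacement sample stored as a tuple.

```python
def blend_wrinkle(neutral, compress, stretch, weight):
    """Blend one texel from the neutral, compress and stretch maps.

    Assumed convention: weight in [-1, 1]; 0 selects the neutral map,
    -1 the compress map and +1 the stretch map.
    """
    w = max(-1.0, min(1.0, weight))
    target = stretch if w > 0.0 else compress
    t = abs(w)
    return tuple((1.0 - t) * n + t * c for n, c in zip(neutral, target))


# Hypothetical single-texel samples from the three maps.
neutral_px = (0.5, 0.5, 1.0)
compress_px = (0.0, 0.5, 1.0)
stretch_px = (1.0, 0.5, 1.0)
half_stretch = blend_wrinkle(neutral_px, compress_px, stretch_px, 0.5)
```

If the blended value is a normal, it would usually be renormalized after blending.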

Figure 45: Wrinkle displacement maps for neutral, compress and stretch poses on the Heavy Weapons Guy.

3.6 Moving from Polygons to Subdivision Surfaces

Besides improvements in surface smoothness, the move from polygonal to subdivision surface models has additional advantages for game developers, including improvements in both tangent frame quality and skin weight management.

3.6.1 Quality Tangent Frames

One fact that non-game-developers are often surprised to learn is that games generally have inaccurate tangent frames due to the nature of the current GPU pipeline. Typically, games must skin their precomputed tangent frames and, even worse, accumulate morph offsets to their tangent frames. Traditionally, this has resulted in awkward lighting artifacts and the need to generally constrain the range of possible character performance, particularly in the case of facial animation. Due to the way that the new Direct3D 11 pipeline stages integrate into the graphics pipeline after vertex animation operations such as skinning and morphing, surface normals can now be computed relative to animated control meshes, improving shading quality. An example of a high quality normal field is shown in Figure 46, which illustrates per-pixel normals of the Heavy Weapons Guy from Team Fortress 2.

Figure 46: High quality normals

3.6.2 Manageability

At Valve, the modelers who have begun the switch from polygonal modeling to subdivision surface modeling have experienced an increase in both productivity and surface control. This is primarily due to the low number of control mesh vertices that must be managed relative to a polygonal mesh of sufficient complexity. Naturally, skin weighting is much more manageable as there are relatively few vertices in a control mesh. Additionally, the approximate limit surface tends to behave predictably when the control mesh is animated. In many cases, achieving similar animation results with a purely polygonal model with polygon count comparable to our post-tessellated limit surface approximation would be not only impractical but virtually impossible.

3.7 Future Work

Obviously, a critical next step in our adoption of approximate Catmull-Clark subdivision surfaces is migration to Direct3D 11 hardware as it becomes available. Once we understand the performance characteristics of Direct3D 11 GPUs, we can determine how aggressive we should be in our adoption of approximate Catmull-Clark subdivision surfaces and how quickly we can realistically ship it to our game customers. At this point, we are far enough along in our implementation that we have shipped GPU-rendered approximate Catmull-Clark subdivision surfaces in the form of some rendered elements in the animated shorts Meet the Sandvich and Meet the Spy.

In the future, we intend to address the topic of adaptive subdivision, as this will be critical for performance and level of detail (LOD) management. Bunnell has demonstrated extremely compelling results using adaptive tessellation of displaced Catmull-Clark subdivision surfaces [Bun05]. We anticipate that this will be particularly important as we move beyond isolated character and object meshes and into the trickier problem of environment and terrain rendering with approximate subdivision surfaces. Given the new programming model introduced in Direct3D 11, we expect that it will be necessary to develop new error metrics and schemes for determining the appropriate level of detail for a given primitive or primitive edge, particularly when performing displacement mapping. To date, we have intentionally put off exploration of adaptive tessellation schemes since they complicate the Direct3D 9 instanced (or ATI native) implementation in a way that doesn’t remain useful once we have a real GPU hull shader to tessellator connection [BS08].

Naturally, it is necessary for an interactive game to integrate any model representation with game systems such as real-time collision detection and decal/damage rendering. Some of the same functionality will be required for integration with other data amplification schemes such as hair rendering and simulations, as discussed in the next chapter. In addition to the optimizations described in Section 3.3.2, we would like to explore culling operations appropriate to a displaced patch representation; [LMH00] has reported compelling speedups from the use of normal masks [ZH97].

We have integrated our technique with the Source engine’s skeletal and facial morphing systems as shown in Figure 38, but we look forward to exploring additional animation techniques such as fluid simulation, cloth simulation or free-form deformation (FFD) of the low-resolution quad mesh. We anticipate having to make changes to such simulations based on the fact that we are animating a subdivision surface control mesh rather than the final polygonal primitives to be displayed.

3.8 Conclusion

We are at a critical point in the evolution of surface representations for real-time graphics, and the investment being made by the GPU vendors in their Direct3D 11 hardware aims to address memory and memory bandwidth issues by performing programmable data amplification on chip. It still remains to be seen, however, whether hardware will live up to its potential in this area. After all, hardware tessellation (PN Triangles) has shipped in mainstream graphics hardware before [VPBM01] and has languished due to lack of adoption by tool vendors and game developers. This time around, however, there is reason to be optimistic that this is the big switch to a new high quality real-time surface representation, as we are working with the more established base primitive (Catmull-Clark Subdivision Surfaces) running on more performant and programmable hardware units.

4 Approximating Subdivision Surfaces in ILM’s Tool chain

Philip Schneider and Vivek Verma, ILM

4.1 Introduction

Industrial Light + Magic (ILM) is like many other animation and effects companies in that we used to rely on NURBS surfaces for model representation. And like these other companies, we spent a lot of effort dealing with continuity issues at the seams between patches and the general issues involved with the topological restrictions of NURBS. And again like many of these other companies, some years ago we switched over to using Catmull-Clark subdivision surfaces. Catmull-Clark surfaces free modelers from the restrictions imposed by the patchwork-of-grids topology of NURBS surface models, free software engineers from the difficulties of maintaining continuity between patches, and are well supported in widely used rendering packages such as Pixar’s RenderMan®. Recently, however, we have incorporated the use of an approximation to Catmull-Clark subdivision surfaces. In this chapter, we discuss the context and motivations that led to our use of that approximation, as well as details about how we use it.

4.2 Motivation

ILM makes extensive use of displacement maps to add surface detail to characters, creatures, and hard-surface models. Figure 47 shows an example of the head of the Mulgarath creature from The Spiderwick Chronicles: on the top is a basic “gray plastic shader” rendering of the limit surface, and on the bottom the displacement map has been added. Displacement, along with numerous other rendering effects defined by texture maps (color, bump, opacity, specularity, etc.), is rendered with custom RenderMan® shaders for both the final frames of the film and for TDs and artists to preview work in progress. The use of Catmull-Clark subdivision surfaces makes this a straightforward process. Models are typically created and (if needed) animated in any one of several commercial packages. ILM’s own proprietary Zeno application is used for setting up lighting, materials, textures, etc. and for funneling all the data down to RenderMan® for rendering. Zeno acts as the hub of ILM’s production pipeline, and provides a vast range of functionality.

ILM’s models are typically partitioned into sets of faces, each of which is UV-mapped separately and has assigned to it per-partition texture maps; that is, the usual atlas of charts approach. So, it’s just a simple task of generating the requisite maps, shipping them off to RenderMan®, and we’re done, right? Well, of course the interesting part here is how the texture maps are generated, and how this fits into the artists’ workflow and the production pipeline.

In order to understand the why and how of ILM’s use of an approximation to Catmull-Clark subdivision surfaces, we have to take a step back and explain one of the tools that’s been an important part of ILM’s pipeline for more than a decade: ILM’s Viewpaint.

4.2.1 Viewpaint

ILM’s Viewpaint was created in 1991-1992 by John Schlag, Brian Knep, Zoran Kacic-Alesic, and Tom Williams. Its first major use was in Jurassic Park. Prior to the development of Viewpaint, artists created and modified textures by simply painting on the texture image itself using a 2D painting program such as Parallax Software Limited’s Matador Paint System®. Because the surfaces of models are not generally flat, there is some amount of distortion from the flat image to the rendered texture, even in the case of a good conformal mapping. In any case the process is very indirect: the artist has to use experience and intuition to “reverse-engineer” the distortion to get the paint to go where it should on the model’s surface. The Viewpaint scheme allows a more direct approach, and its success can be measured by its continued use at ILM, as well as its developers being awarded a Scientific and Engineering Award from the Academy of Motion Picture Arts and Sciences in 1997.

The Viewpaint approach allows artists to effectively paint textures directly on the surface of a rendered model. However, this is not literally painting in 3D space: at the time of its development, display and computation speed would have been a significant impediment. While such direct 3D painting tools have been developed since then, the Viewpaint approach offers an advantage in that the artist can continue to make full use of all the power and flexibility of full-featured painting and image processing tools such as Adobe® Photoshop® or GIMP.

The basic idea behind Viewpaint is this:

1. Create an image that’s a snapshot of some texture-mapped model.

2. Export that image to the image-editing program of your choice, and paint or modify the desired texture.

3. When done, map the painted pixels of that image back into the appropriate pixels in the texture map, and re-render the model.

This process is shown in Figure 48. The upper left-hand image (a) is a screen snapshot of Zeno, showing the face of Davy Jones from Disney’s Pirates of the Caribbean, and Zeno’s Texture Map Editor displaying the UV mapping. The user interactively positions the model to show the region that is to be painted, and then simply hits a button to create a snapshot of that image; this is sent to (in this case) GIMP (b). The artist can then modify the image (c), and when it’s saved, the modifications made to that Viewpaint image are applied back into the color texture in Zeno using the inverse mapping (d). Of course, this entire process is iterated numerous times, with the Viewpaint artist repeatedly positioning the model, painting from this angle or that, and so on. This smooth workflow allows much of the ease of direct 3D painting, but with the ability to take advantage of the power of the artist’s familiar and powerful image-editing/painting tools.

Figure 47: Character rendered without (top) and with (bottom) displacement maps.

Figure 48: ILM’s Viewpaint workflow. Panels (a) to (d), with the transitions between them labeled “Snapshot to GIMP”, “Paint...” and “Apply inverse”.

The crux of this approach is that we need to know what texture map UV coordinate goes with each pixel in the Viewpaint snapshot image, so we know where the Viewpaint snapshot pixel values are to be applied in the texture map image. Viewpaint mapping can either be a forward or inverse mapping, each having advantages and disadvantages. In inverse mapping, we draw the texture coordinates as flat geometry in texture space, using the painted image as the texture, and inverting the perspective distortion at each vertex. This approach has the advantage of being fast, and the rasterization is handled robustly by the graphics library. Occlusion can be handled using a classic shadow buffer, commonly supported in hardware. The trick is to transform the depth into the orthographic texture space of the coordinates, so that the depth comparison is meaningful. This, however, has depth aliasing problems similar to using shadow buffers on regular 3D geometry. A forward mapping approach is essentially a “deep buffer” scheme in which any mapping that can be drawn in the same space as the painted image itself can be mapped per pixel, by splatting each pixel’s paint into the corresponding area of its texture. Hybrid approaches are also possible, and different variations have been employed at ILM.
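The forward-mapping idea can be illustrated with a very small Python sketch. It assumes a per-pixel UV buffer rendered alongside the snapshot, with `None` marking background or occluded pixels, and writes each painted pixel to the single nearest texel. This is a deliberate simplification: a production tool would splat over an area and filter, and nothing here describes ILM’s actual implementation. All names are hypothetical.

```python
def apply_paint(texture, snapshot, uv_buffer):
    """Copy painted snapshot pixels into the texture via a per-pixel UV buffer.

    texture:   list of rows of colors, modified in place
    snapshot:  painted image, list of rows of colors
    uv_buffer: same size as snapshot; (u, v) in [0, 1] per pixel,
               or None where no surface is visible
    Returns the number of snapshot pixels that were written.
    """
    rows, cols = len(texture), len(texture[0])
    written = 0
    for y, row in enumerate(snapshot):
        for x, color in enumerate(row):
            uv = uv_buffer[y][x]
            if uv is None:
                continue  # background or occluded: nothing to paint
            u, v = uv
            tx = min(int(u * cols), cols - 1)
            ty = min(int(v * rows), rows - 1)
            texture[ty][tx] = color
            written += 1
    return written


# Hypothetical 2x2 texture and 1x2 painted snapshot.
texture = [[0, 0], [0, 0]]
snapshot = [[7, 9]]
uv_buffer = [[(0.1, 0.1), None]]
count = apply_paint(texture, snapshot, uv_buffer)
```

The `None` entries stand in for the occlusion handling discussed above: pixels that do not see the surface contribute no paint.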

We are finally coming closer to our first motivation for using an approximation to Catmull-Clark surfaces. The observant reader will note that we haven’t mentioned displacement maps for a while. Displacement maps can be created by painting, by ad hoc processes, procedurally, and more recently with programs such as Pixologic™ ZBrush® and Autodesk® Mudbox™, which offer a sculpting-like interface; again, these can simply be passed on to the final rendering software as just another texture effect.

But what happens if the Viewpaint artist is working on painting, say, a color texture on a model with nontrivial displacement? Because the Viewpaint rendering does not include the effect of the displacements, the artist has to use experience, intuition, and trial-and-error to paint the color texture in a way that accommodates geometry they cannot visualize. The tremendously impressive results we see in images like that shown in Figure 49 are a tribute to the great level of skill of our Viewpaint artists. But, if the displacement is large enough, even great skill and experience may not be sufficient, and in some cases, ILM modelers have had to add new topology (i.e. more geometric detail) in the areas of the displacement; that is, they add enough topology so that some amount of the displacement is represented explicitly in the geometry of the model. This, of course, tends to make the models very heavy, thereby defeating one of the advantages of using displacement maps in the first place, which is the ability to make lighter models by putting some of the high-frequency details into the displacement.

From both productivity and model footprint standpoints, burdening Viewpaint artists with having to “imagine” where the displacements are, or simply adding more topology, are both highly undesirable. So, last year, ILM’s R&D Geometry, Modeling, and Sculpting Group was approached by artists from ILM’s Digital Model Shop (DMS) with a request to enable Viewpaint artists to paint on displaced Catmull-Clark surfaces. This was our first, and perhaps most significant, motivation leading to our use of a Catmull-Clark approximation scheme.

Figure 49: Displacement Image.

We considered some alternative solutions that would have only addressed the specific issue of allowing the Viewpaint artists to get a displaced model image to paint on. If we could get Zeno to display displaced Catmull-Clark subdivision surface models in general, however, then we’d have not only the ability to paint color (etc.) textures on them, but other artists and TDs would be able to see models that were more “true” to their final film-rendered shape in Zeno itself. One often-used feature in Zeno is its ability to capture GL-rendered playback of the timeline (like a Maya playblast); this is highly useful for generating quick turnaround clips of animation, simulation runs, etc., or simply generating a video that will run at film speed from a scene that’s too heavy to play interactively at that rate. In that workflow, the ability to see the displacements would be tremendously beneficial (imagine the benefit of being able to quickly see what effect displacement has on facial animation with shapes, for example).

4.2.2 Catmull-Clark Limit Surface Evaluation

The second motivation leading to the use of a Catmull-Clark approximation came from the desire to evaluate Catmull-Clark subdivision limit surfaces.

NURBS models have several characteristics that can offer significant advantages over Catmull-Clark subdivision surface models:

• They have a built-in parameterization.

• They have a simple closed formula, so commonly needed operations such as exact evaluation of position, normals, tangents, etc. are trivial, and operations like ray intersection and closest point queries are relatively straightforward.

• There is a wealth of literature supporting them (e.g. [PT97] [BBB87]).

Catmull-Clark subdivision surfaces offer none of these. Indeed, while the subdivision surfaces were first defined in 1978 by Edwin Catmull and Jim Clark [CC78], it was not until 1998 that Jos Stam published a practical method for Catmull-Clark subdivision surface evaluation in the neighborhood of extraordinary points [Sta98].

Surface evaluation of models can be important in a number of places in an animation, special effects, and simulation pipeline; for example:

• Interactive placement of objects on the surface of other objects.

• Hair or fur on a creature needs to be constrained to a particular parametric point on a limit surface, so that each hair sticks to the surface even when the surface deforms.

• A simulated fluid’s surface may need to spawn particles for spray or bubble effects.

In lieu of a good Catmull-Clark subdivision limit surface evaluation library, ILM R&D engineers have resorted to various ad hoc workarounds when their tools required points on the limit surface. Such approaches typically involved creating a copy of the Catmull-Clark subdivision mesh, subdividing it in place a few times, triangulating the resulting quads, and then using point-on-triangle schemes, ray-triangle intersections, and the like. Because Catmull-Clark subdivision converges on the limit surface so rapidly, these techniques can be serviceable, but there are significant drawbacks to such approaches:

• The piecewise linear approximation to the limit surface is just that, an approximation, and as such can suffer from imprecision and inaccuracy.

• For large meshes, or many such meshes, the memory footprint can be daunting.

• The linear approximation constitutes a cache, and as such has cache validity problems (imagine one used for a deforming surface). The mechanism for creating and maintaining this data adds to code size and complexity, and impacts performance.

An obvious answer to the need for Catmull-Clark limit surface evaluation might be to simply implement Stam’s algorithm and be done with it. But this approach is not without its own problems:

• It doesn’t deal with hard edges or open boundaries.

• The computational load can be significant in the vicinity of extraordinary vertices, particularly if the algorithm is invoked many times.

• It doesn’t address such operations as ray intersection.

4.2.3 Convergence

The need for displaying displaced Catmull-Clark subdivision surfaces in Zeno was long-standing. By last year, the need for a good Catmull-Clark limit surface evaluation library went from a “would be nice to have” to a “we really need this yesterday” state: the drawbacks of the ad hoc approaches mentioned earlier were becoming significant problems rather than just annoying or inelegant. So, we were in the situation of having two high-priority projects, and fairly constrained resources. A solution that helped fulfill both needs would be highly desirable.

A few years ago, one of us (Philip) was talking to Charles Loop about a paper he was working on (later published as [LB06]). Charles mentioned that he was working on another project (which was later published as [LS08a]) and that he’d send Philip a draft when it became available. After considering various other options for the two ILM projects, we realized that the Loop/Schaefer approximation scheme might offer us a solution to both of our seemingly disparate problems.

For the Viewpaint artists’ need to paint color (and other effects) on displaced geometry, it would suffice for Zeno to be capable of displaying displaced Catmull-Clark subdivision surface models with the displacement faithfully applied. The Viewpaint system would then simply “just work”; whether the surface shape was due to displacement or to highly detailed dense geometric manipulation would be entirely irrelevant. The Loop/Schaefer scheme provided a means to accomplish this display within the confines of the existing software architecture.

The Loop/Schaefer ACC algorithm creates an approximation to the Catmull-Clark subdivision limit surface using bicubic Bezier patches. Bezier surfaces (and B-spline surfaces in general) have those nice characteristics mentioned earlier that make them relatively easy to deal with for limit surface evaluation. We realized that if we had a Bezier approximation to the Catmull-Clark limit surface, and could re-cast the limit surface queries on the original model to limit surface queries on that Bezier approximation, we could leverage much of the work we needed to do for the displacement display and get the limit surface evaluation functionality for a relatively modest amount of additional effort.
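To make concrete why evaluation on a bicubic Bezier patch is cheap, here is a minimal sketch of evaluating position and first derivatives with the cubic Bernstein basis. The type names (`Vec3`, `BezierPatch`, `SurfaceSample`) and the row-major control point layout are assumptions made for this illustration, not part of Zeno. Note also that this sketch differentiates the geometry patch itself, whereas the ACC scheme supplies separate tangent patches for shading.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }

// Hypothetical layout: 16 control points, row-major, cp[4 * row + col],
// with u running along a row and v running across rows.
using BezierPatch = std::array<Vec3, 16>;

// Cubic Bernstein basis values at t.
inline std::array<double, 4> bernstein3(double t) {
    const double s = 1.0 - t;
    return {s * s * s, 3.0 * s * s * t, 3.0 * s * t * t, t * t * t};
}

// Derivatives of the cubic Bernstein basis at t.
inline std::array<double, 4> bernstein3Deriv(double t) {
    const double s = 1.0 - t;
    return {-3.0 * s * s, 3.0 * s * s - 6.0 * s * t,
            6.0 * s * t - 3.0 * t * t, 3.0 * t * t};
}

struct SurfaceSample { Vec3 position, dU, dV; };

// Evaluate position and both first partial derivatives at (u, v) in [0,1]^2.
inline SurfaceSample evaluatePatch(const BezierPatch& cp, double u, double v) {
    assert(u >= 0.0 && u <= 1.0 && v >= 0.0 && v <= 1.0);
    const auto bu = bernstein3(u), bv = bernstein3(v);
    const auto du = bernstein3Deriv(u), dv = bernstein3Deriv(v);
    SurfaceSample s{{0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}};
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            const Vec3& p = cp[4 * r + c];
            s.position = s.position + (bu[c] * bv[r]) * p;
            s.dU = s.dU + (du[c] * bv[r]) * p;
            s.dV = s.dV + (bu[c] * dv[r]) * p;
        }
    }
    return s;
}

// Unit normal from the two partial derivatives (dU x dV).
inline Vec3 normalAt(const SurfaceSample& s) {
    const Vec3 n{s.dU.y * s.dV.z - s.dU.z * s.dV.y,
                 s.dU.z * s.dV.x - s.dU.x * s.dV.z,
                 s.dU.x * s.dV.y - s.dU.y * s.dV.x};
    const double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return {n.x / len, n.y / len, n.z / len};
}
```

Every query costs the same fixed amount of arithmetic regardless of the valence of nearby vertices, which is the property the text relies on.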

4.3 Displacement Display Implementation

At this point it would be good to present some basic issues regarding ILM’s models and pipeline that influenced our implementation:

• Models are mostly, but not exclusively, comprised of quadrilateral faces.

• Models are partitioned into sets of faces that are separately UV-mapped and textured.

• These individual partitions can have separate interactive shaders associated with them.

• The visibility of models can be controlled on a per-partition basis (that is, some may be hidden while others are displayed).

• Extensive use of shapes, enveloping, and interactive sculpting/modeling tools has led to quite a lot of code optimization that attempts to update only those vertices on a model that differ from their rest or previous positions, and only on partitions that are visible.

• Models occasionally have geometric discontinuities in the form of “hard edges” (aka creases).

We already have an existing tessellation subsystem that we use for interactively rendering Catmull-Clark subdivision surfaces. The problem with displacement within this existing system was that displacement needed to be applied at the per-vertex level. Because the displacement texture can contain very high-frequency detail, the displaced vertices must be very close together in screen space. If we were simply to tessellate the surface to the level necessary to avoid subsampling artifacts, the amount of CPU-side memory would be spectacular, as would be the time needed to transmit all of that data to the GPU and the time needed for the GPU to process it.
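A back-of-the-envelope estimate shows how quickly CPU-side tessellation grows. The sketch below counts the vertices of the square grid produced by subdividing one quad a given number of times, and multiplies by a per-vertex size. The 32-byte vertex (position, normal, and UV as floats) is an assumption for illustration, and sharing of vertices between neighboring grids is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Vertices in the topologically square grid obtained by subdividing one quad
// `level` times: (2^level + 1)^2.
inline std::uint64_t gridVertexCount(unsigned level) {
    assert(level < 31);  // keep side * side within 64 bits
    const std::uint64_t side = (std::uint64_t{1} << level) + 1;
    return side * side;
}

// Rough upper bound on the bytes needed for `quadCount` base quads.
// bytesPerVertex is an assumed figure: 3 + 3 + 2 floats = 32 bytes.
inline std::uint64_t tessellationBytes(std::uint64_t quadCount, unsigned level,
                                       std::uint64_t bytesPerVertex = 32) {
    return quadCount * gridVertexCount(level) * bytesPerVertex;
}
```

Under these assumptions, 100,000 base quads at level 6 already come to roughly 13.5 billion bytes, before any transfer to the GPU.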

4.3.1 CPU-side Subdivision Pipeline

The existing Catmull-Clark rendering system in Zeno (prior to our implementation of the ACC algorithm) basically consisted of repeated application of the Catmull-Clark split-then-smooth algorithm on the mesh, and then sending the resulting quadrilaterals to OpenGL for rendering. At the time this was implemented, GL quad strips were a reasonably good choice for output primitives; the Catmull-Clark scheme also splits faces into quadrilaterals at each step, leading to topologically square grids of quadrilaterals for each face.

A number of issues made this simple-sounding scheme fairly complex. In addition to the model and pipeline issues we just listed, GL primitives like quad or tri strips require a matching topology for all per-vertex data (normals, colors, etc.), and must share a single color or texture. This means that the original mesh must, for the purposes of rendering as quad strips, be decomposed into contiguous regions. So, in this initial decomposition, regions must at least be bounded by UV discontinuities, partition boundaries, and hard edges (geometric discontinuities). In addition, extraordinary vertices were used to bound regions.

The solution to all of these constraints was to decompose the base Catmull-Clark mesh into strips of quadrilaterals, whose boundaries were defined by any of the various criteria just enumerated.

Figure 50 shows a schematic of this mesh-stripping scheme. The vertices in red are extraordinary vertices, and the color indicates separate mesh partitions, with a UV discontinuity between them shown in green. Non-quadrilateral faces (such as faces j and o) were handled by decomposition into separate strips consisting of a single quad each.

Figure 50: Stripping a mesh. (Schematic with faces labeled a through y, shown before and after stripping; the non-quadrilateral faces j and o are decomposed into single-quad strips.)

Once we have this decomposition into strips, we can apply the Catmull-Clark split-then-smooth algorithm as many times as necessary; this turns each quad in each strip into a topologically square grid of quadrilaterals, as shown in Figure 51. This process ensures that each strip can be turned into OpenGL quad strips. As a final step, we push the vertices in the grids to the limit surface before sending them to OpenGL. The reason is this: if all the vertices of the tessellation are always at the limit surface, then as we interactively change the subdivision level of the rendered surface the effective shape doesn’t appear to change much, but rather looks as if details are being filled in.
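The text does not say which rule is used to push vertices to the limit surface. A common choice for interior vertices is the standard Catmull-Clark limit position mask, which for a vertex v of valence n weights v by n^2, each edge neighbor by 4, and each face-diagonal neighbor by 1, all divided by n(n + 5). The sketch below assumes that mask, an interior vertex with no creases, and all-quad incident faces; the function and parameter names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

// Limit position of an interior, crease-free vertex of valence n, assuming
// every incident face is a quad (true after one Catmull-Clark step).
// edgeNeighbors: the n vertices that share an edge with v.
// faceNeighbors: the n vertices diagonally opposite v in each incident quad.
inline Vec3 limitPosition(const Vec3& v,
                          const std::vector<Vec3>& edgeNeighbors,
                          const std::vector<Vec3>& faceNeighbors) {
    assert(edgeNeighbors.size() == faceNeighbors.size());
    assert(edgeNeighbors.size() >= 3);
    const double n = static_cast<double>(edgeNeighbors.size());
    Vec3 sum{n * n * v.x, n * n * v.y, n * n * v.z};
    for (const Vec3& e : edgeNeighbors) {
        sum.x += 4.0 * e.x; sum.y += 4.0 * e.y; sum.z += 4.0 * e.z;
    }
    for (const Vec3& f : faceNeighbors) {
        sum.x += f.x; sum.y += f.y; sum.z += f.z;
    }
    const double w = 1.0 / (n * (n + 5.0));
    return {w * sum.x, w * sum.y, w * sum.z};
}
```

For a regular vertex (n = 4) this reduces to the familiar 16, 4, 1 over 36 weights of a uniform bicubic B-spline. Boundary and crease vertices need different masks, which this sketch does not cover.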

Figure 51: Strips are subdivided into grids for output, using the Catmull-Clark split-then-smooth algorithm. Panels: Strip; Level 1 Subdivision; Level 2 Subdivision; Level 3 Subdivision.

The ability to display or hide the different partitions of a subdivided mesh is provided by keeping track of which partition goes with which output grid. Similarly, if there are geometric changes (due to modeling/sculpting operations, deformers, etc.), data structures that map from base mesh vertex to output grids allow us to update only the subdivided positions (and normals, UVs, colors, etc.) that are affected by the base vertex change; such updates are also restricted to visible partitions.

4.3.2 CPU-side ACC Subdivision Pipeline

In the previous section, we described the pre-ACC subdivision pipeline. Here, we describe how we utilized part of that existing mechanism to implement the ACC approach. In the subdivision pipeline, we used the appropriate Catmull-Clark rules (depending on vertex valence, boundary condition, etc.) on the strips to generate the subdivided grids. For the ACC approach, we start with the Level 1 subdivision grids: each quad face is converted to a bicubic Bezier patch and two tangent patches according to the ACC algorithm [LS08a] (see Section 2.3). Figure 52 shows this process; the cyan quadrilateral is used to generate the corresponding Bezier patch.

We employ a symmetric computation in the generation of the ACC Bezier patches and tangent patches in order to avoid the problems discussed in Section 2.5; the infrastructure supporting this was already present in the previous subdivision pipeline in order to ensure exact correspondence of vertices at strip boundaries.

Figure 52: Converting a strip to bicubic Bezier patches via the ACC algorithm. Each Bezier patch corresponds to a single quad in the Level 1 Subdivision grid. Panels: Strip; Level 1 Subdivision; Bicubic Bezier Patches.

4.3.3 Displacement Display

The previous sections outlined the architecture of our tessellation scheme that converts a Catmull-Clark cage into a collection of bicubic Bezier patches. A straightforward way to render these Bezier patches would be to send all of them to the GPU where they could be tessellated and rendered using the instancing scheme outlined in Section 2.4.

At ILM, we typically have very large models with large textures, and as such the Bezier patch CVs generated for them can’t all fit in the available GPU memory. Given this memory constraint, we designed our rendering scheme to use constant buffers to transfer the Bezier patches to the GPU in manageable chunks. The data for the Bezier patches (geometry and tangent field CVs) are stored in Parameter Buffer objects. The buffers are created and set up front for a model, and at the time of rendering they are bound to their uniform parameter buffer variables as needed. When the geometry changes, we update the appropriate buffers. Since the buffers reside on the server side, the data is transferred from the CPU to the GPU more efficiently. See http://www.opengl.org/registry/specs/NV/parameter_buffer_object.txt for more details on Parameter Buffers.
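The “manageable chunks” idea can be sketched independently of any graphics API: given a per-patch size and a buffer capacity, split the patch list into consecutive ranges that each fit in one buffer. The function below is a hypothetical helper, not ILM’s code, and it deliberately makes no OpenGL calls; the sizes are supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A run of consecutive patches assigned to one buffer.
struct PatchRange { std::size_t first, count; };

// Split patchCount patches into consecutive ranges that each fit in a buffer
// of bufferBytes, given bytesPerPatch (both supplied by the caller).
inline std::vector<PatchRange> chunkPatches(std::size_t patchCount,
                                            std::size_t bytesPerPatch,
                                            std::size_t bufferBytes) {
    assert(bytesPerPatch > 0 && bufferBytes >= bytesPerPatch);
    const std::size_t perChunk = bufferBytes / bytesPerPatch;
    std::vector<PatchRange> ranges;
    for (std::size_t first = 0; first < patchCount; first += perChunk) {
        const std::size_t remaining = patchCount - first;
        ranges.push_back({first, remaining < perChunk ? remaining : perChunk});
    }
    return ranges;
}
```

For instance, if a patch took 640 bytes (a guess: 16 geometry CVs plus two tangent patches of 12 CVs each, stored as four-float vectors) and a buffer held 64 KB, each chunk would carry 102 patches. When geometry changes, only the ranges containing modified patches would need to be re-uploaded.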

A great advantage of using the ACC algorithm is that the underlying Catmull-Clark surface can be represented by a collection of bicubic Bezier patches. The Bezier patches make it easy to modify the surface using a displacement map. The reader will remember that the ACC algorithm not only generates a bicubic Bezier approximation to the Catmull-Clark surface but also generates a C2 continuous tangent field. This tangent field can be employed to give the appearance of a smooth surface. However, after a displacement map is applied to the surface, we can no longer use this tangent field for smoothly shading the surface. A typical workflow that generates the displacement map (using software packages like ZBrush® and Mudbox™) can also generate a normal map. The normal map can be generated in object space or tangent space depending on the application. The object space normal map is faster to evaluate but cannot be used for deforming surfaces; in such a situation a tangent space normal map is needed. However, we can’t always guarantee the availability of a normal map. When a normal map is not available, we use the GPU shader described next to provide the shading information for a displaced surface.

Figure 53: After displacement of the surface in the normal direction, the shading normals change.

Figure 53 illustrates the change in shading after a displacement map has been applied to a surface. In the absence of a normal map we approximate the shading of a displaced surface by computing the normal at each vertex of the tessellated surface using the surrounding vertices. Figure 54 shows the tessellated surface generated from a strip of Bezier patches. We generate a new normal for each vertex in the vertex shader. Given that we send the Bezier CVs (position, tangent, and UVs) to the GPU using a constant buffer, for each vertex that passes through the vertex shader, we can easily determine the position of the neighboring vertices. Our approximation algorithm effectively computes an approximate faceted shading for the tessellated Bezier patches. For a given vertex that passes through the vertex shader, we determine the position of the other three vertices such that the given vertex is the bottom right vertex of a tessellated face of the surface. We have implemented Newell’s algorithm [SSRS74] to approximate the normal of the plane for the tessellated face. This provides a sufficiently good approximation for the shading of the displaced surface, as can be seen in Figure 55. In areas of the displacement map where the displaced quad has a very small area, we don’t want to use the expensive Newell’s algorithm. In those areas we use a threshold to determine whether the quad’s area is small; if it is, we just approximate the shading by the surface normal at the center of the face.
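The following CPU-side sketch illustrates the faceted-normal idea described above: Newell’s method on the four displaced corners of a tessellated quad, with a fallback normal for very small quads. It is written in C++ rather than as a vertex shader. The area test via the quad’s diagonals is an assumption, since the text does not say how the area is estimated, and all names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

inline double length(const Vec3& a) {
    return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
}

// Newell's method for a quad given in counterclockwise order.
// The result is unnormalized; for a planar quad its length is twice the area.
inline Vec3 newellNormal(const std::array<Vec3, 4>& q) {
    Vec3 n{0.0, 0.0, 0.0};
    for (int i = 0; i < 4; ++i) {
        const Vec3& a = q[i];
        const Vec3& b = q[(i + 1) % 4];
        n.x += (a.y - b.y) * (a.z + b.z);
        n.y += (a.z - b.z) * (a.x + b.x);
        n.z += (a.x - b.x) * (a.y + b.y);
    }
    return n;
}

// Assumed cheap area estimate: half the length of the cross product of the
// two diagonals (exact for planar quads).
inline double quadArea(const std::array<Vec3, 4>& q) {
    const Vec3 ac{q[2].x - q[0].x, q[2].y - q[0].y, q[2].z - q[0].z};
    const Vec3 bd{q[3].x - q[1].x, q[3].y - q[1].y, q[3].z - q[1].z};
    const Vec3 c{ac.y * bd.z - ac.z * bd.y,
                 ac.z * bd.x - ac.x * bd.z,
                 ac.x * bd.y - ac.y * bd.x};
    return 0.5 * length(c);
}

// Unit shading normal for a displaced quad; quads with area below minArea
// (or with a degenerate Newell result) fall back to the supplied normal.
inline Vec3 facetedNormal(const std::array<Vec3, 4>& q,
                          const Vec3& fallbackNormal, double minArea) {
    assert(minArea >= 0.0);
    if (quadArea(q) < minArea) return fallbackNormal;
    const Vec3 n = newellNormal(q);
    const double len = length(n);
    if (len == 0.0) return fallbackNormal;
    return {n.x / len, n.y / len, n.z / len};
}
```

Newell’s method is used here because it stays well defined when the four displaced corners are not coplanar, which is the usual case after displacement.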

We can demonstrate this scheme in a more obvious fashion with a pair of images. In Figure 56 we show a Zeno rendering of Thimbletack with no displacement, and the same model rendered with displacement, at the lowest resolution possible: there is a single quadrilateral rendered per Bezier patch (of which there are four per Level 1 subdivision quadrilateral).

Figure 54: Computing the tessellation’s shading normals. (Diagram labels: A, B, C, D and A', B', C', D'.)

4.4 Limit Surface Evaluation Implementation

The bicubic Bezier control points and tangent patches that we create for the purposes of rendering can also be used for (approximate, of course) Catmull-Clark limit surface evaluation. As stated in Section 4.2.2, computing Bezier surface points, normals, and tangents, given a UV value, is entirely trivial; ray intersection, nearest point, and similar calculations are relatively straightforward. Bezier surfaces have the advantage of having a built-in parametrization, and so an API for a module/library that provides this sort of functionality can exploit this to great advantage: as a trivial example, the location of a hair root on a piecewise Bezier surface could be defined unambiguously as a tuple of patch index and UV coordinates.

Catmull-Clark subdivision surfaces have no built-in parametrization. Of course, for purposes of texture-mapping, we can assign UV coordinates to Catmull-Clark control cage vertices, and apply the Catmull-Clark subdivision and smoothing rules to them as well. However, we wished to evaluate surfaces without requiring a UV mapping to be present. Also, even if a UV mapping were present, using those UV coordinates associated with the vertices would be undesirable. Consider a Catmull-Clark mesh consisting of a single square face, as shown in Figure 57. If we apply a couple of steps of the Catmull-Clark subdivision algorithm, we get the somewhat circular surface shown in the interior. Let v1 have the UV value (0, 0), v2 the UV value (1, 0), and so on. The point p is the “lower left corner” of the limit surface, and thus should correspond to the limit surface evaluated at (0, 0).

Figure 55: Before and after the shading approximation.

Figure 56: The top image shows the model without displacement. The bottom image adds displacement, showing the faceted shading approximation.

Figure 57: Parameterizing a single Catmull-Clark quadrilateral. (Diagram labels: vertices v1, v2, v3, v4; axes u and v; point p.)

Figure 58: UV texture space does not suffice for a surface parametrization. (Panels (a) and (b); labels v1, v2, v3, v4 and p.)

Figure 58 shows Zeno’s UV map editor (a), with a simple checkerboard texture applied to this mesh, and the resulting rendered mesh (b). Because we apply the Catmull-Clark algorithm to the UV coordinates for the purposes of rendering, the point p is actually (1/6, 1/6) in texture space. So, either we try to make the texture UVs do “double duty” for a positional surface parametrization space and for texture space, in which case p in surface parametrization space is (0, 0) while p in UV texture space is (1/6, 1/6); or we use the UV texture space directly for surface parametrization, and then some portions of parameter space don’t correspond to any point on the surface (for example, parameter value (0, 0)). If a mesh is watertight, then these boundary-related issues don’t pose a problem, but even then using the UV texture coordinates by themselves would be insufficient for meshes that have multiple UV charts. So, the problems became:

1. Given only vertex, edge, and/or face indices, how can we unambiguously specify a “parametric position” on a Catmull-Clark mesh that defines a point on the limit surface?

2. How do we map such a specification to the data used for the Bezier approximation (that is, a particular Bezier patch identifier and a UV value on it)?
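The texture-space value (1/6, 1/6) quoted for p in the single-square example can be checked by hand. The derivation below assumes the standard boundary treatment, in which boundary vertices follow the uniform cubic B-spline curve rule (limit mask 1/6, 4/6, 1/6), that v1 is not tagged as a sharp corner, and that v4 carries the UV value (0, 1), consistent with “and so on” in the text:

```latex
\[
p_{uv} \;=\; \tfrac{1}{6}\,v_4 + \tfrac{4}{6}\,v_1 + \tfrac{1}{6}\,v_2
\;=\; \tfrac{1}{6}(0,1) + \tfrac{4}{6}(0,0) + \tfrac{1}{6}(1,0)
\;=\; \left(\tfrac{1}{6},\, \tfrac{1}{6}\right)
\]
```

The same mask applied to positions is what pulls the limit surface inward from the cage corner, which is why the corner of the surface and the corner of UV space cannot coincide.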

If every face were a triangle, then barycentric coordinates of each triangle would be a convenient way to describe all points on the face; but we needed a solution that would apply to faces of arbitrary degree. The solution comes from this observation: in the first step of Catmull-Clark subdivision, each face of the original base mesh is subdivided into a set of quadrilaterals by splitting each edge at its midpoint, inserting a new vertex at the centroid of the face, and connecting the edges appropriately.

Our Catmull-Clark Mesh implementation has at its heart the quad-edge data structure [GS85], with explicit vertex, face, and edge abstractions surrounding it. This approach allows us to define, for example, a “base” edge for each face. It also allows us to define convenient iterators, for example (extending the previous example) over the edges in a face, which always start at the same edge, and are guaranteed to iterate in a counterclockwise order. Because we apply the edge-splitting steps in that deterministic iterator’s order, we have a one-to-one ordered mapping from a face’s edges to the quads generated by a split of those edges. While we found this scheme to be convenient, in principle any scheme that can uniquely identify these faces and edges can be used.

Figure 59 demonstrates this approach. In (a) we see a quadrilateral Catmull-Clark mesh face. The edges are labeled e_0, ..., e_{n-1}, in the order in which they’re split, and the origin of e_i is vertex v_i. The new vertices v_{i,i+1} resulting from splitting each edge e_i, along with the new centroid vertex v_C, define a set of four sub-quadrilaterals, as shown in (b). We treat each such sub-quadrilateral as a separate parametric domain. The shaded sub-quadrilateral in the lower left of (b) has its parametric origin at v_0, with u increasing in the direction of e_0, and v increasing in the direction −e_3. The parameters u and v lie between 0 and 1 and vary bilinearly over the subdivided regions. A given point p can then be described by the 4-tuple (F, e, u, v), where F is the face index, e is the edge index, and u, v are the parameters that vary between 0 and 1; in (b), the tuple would be (F, e_0, u, v). To demonstrate how this approach generalizes to n-gons, consider the pentagon shown in (c) and (d); triangular faces are similar.
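As an illustration of the (F, e, u, v) addressing just described, the sketch below maps a tuple to a point on the flat base face by bilinear interpolation over the selected sub-quadrilateral. This shows the parametrization only, not the limit surface, and the struct and function names are assumptions introduced for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// The 4-tuple (F, e, u, v) from the text; field names are illustrative.
struct SurfaceParam {
    int face;     // F: index of the base-mesh face
    int edge;     // e: index i of edge e_i, selecting the sub-quad at v_i
    double u, v;  // parameters in [0, 1] within that sub-quad
};

inline Vec3 lerp(const Vec3& a, const Vec3& b, double t) {
    return {a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z)};
}

// Bilinear point on the base face for the sub-quad with corners
// v_i (origin), v_{i,i+1} (u = 1), v_C (u = v = 1), v_{i-1,i} (v = 1).
// faceVerts holds v_0..v_{n-1} of face F in edge-iteration order, so that
// e_i runs from v_i to v_{i+1}.
inline Vec3 pointOnBaseFace(const std::vector<Vec3>& faceVerts,
                            const SurfaceParam& p) {
    const std::size_t n = faceVerts.size();
    assert(n >= 3 && p.edge >= 0 && static_cast<std::size_t>(p.edge) < n);
    const std::size_t i = static_cast<std::size_t>(p.edge);
    const Vec3& vi = faceVerts[i];
    const Vec3& vNext = faceVerts[(i + 1) % n];
    const Vec3& vPrev = faceVerts[(i + n - 1) % n];
    Vec3 centroid{0.0, 0.0, 0.0};
    for (const Vec3& q : faceVerts) {
        centroid.x += q.x / n; centroid.y += q.y / n; centroid.z += q.z / n;
    }
    const Vec3 midNext = lerp(vi, vNext, 0.5);      // v_{i,i+1}
    const Vec3 midPrev = lerp(vPrev, vi, 0.5);      // v_{i-1,i}
    const Vec3 bottom = lerp(vi, midNext, p.u);     // the v = 0 side
    const Vec3 top = lerp(midPrev, centroid, p.u);  // the v = 1 side
    return lerp(bottom, top, p.v);
}
```

The same code handles triangles and pentagons, since only the face degree n changes; each face of degree n simply exposes n sub-quad domains.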

In the ACC algorithm, each sub-quadrilateral Q_i of a subdivided base mesh face is represented by a Bezier patch B_i, whose corner vertices correspond to the vertices of Q_i. For B_i, define the parametric (UV) origin to be the point b_i corresponding to v_i, with the orientation defined by associating increasing u with e_i (that is, from the origin of e_i toward the origin of e_{i+1}). That is, we make the origin and orientation of the Bezier patch match that of the face sub-quadrilateral. So, any point defined by a tuple (F, e_i, u, v) corresponds to a point B_i(u, v), and vice versa, as shown in Figure 60. We have now solved the problems we enumerated earlier. In order to evaluate an approximate point on the Catmull-Clark subdivision mesh’s limit surface that is defined by such a tuple, we simply evaluate the correct Bezier patch at the specified (u, v) parameter; for the case of returning the normal or tangents as well as the position, we utilize the tangent patches defined by the ACC algorithm. Texture coordinates are typically assigned to mesh vertices, so we can evaluate them in the same way, returning the texture UV values for the point on the limit surface.
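One simple way to turn a (face, edge) pair into a patch identifier is a prefix sum over face degrees. This sketch assumes a storage layout in which patches are stored face by face, one per edge, in each face’s edge-iteration order. That layout is hypothetical; the text does not describe how Zeno indexes its patches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// offsets[F] is the index of the first patch belonging to face F;
// offsets.back() is the total number of patches.
inline std::vector<std::size_t> buildPatchOffsets(
        const std::vector<std::size_t>& faceDegrees) {
    std::vector<std::size_t> offsets(faceDegrees.size() + 1, 0);
    for (std::size_t f = 0; f < faceDegrees.size(); ++f)
        offsets[f + 1] = offsets[f] + faceDegrees[f];
    return offsets;
}

// Patch index for the sub-quad of face `face` selected by edge `edge`.
inline std::size_t patchIndex(const std::vector<std::size_t>& offsets,
                              std::size_t face, std::size_t edge) {
    assert(face + 1 < offsets.size());
    assert(edge < offsets[face + 1] - offsets[face]);
    return offsets[face] + edge;
}
```

With this layout the lookup is constant time, and the inverse mapping from patch index back to (face, edge) is a binary search over the offsets.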

Figure 59: Defining a per-face parametrization. Panels (a) and (b) show a quadrilateral face F with edges e_0 to e_3, vertices v_0 to v_3, and a point p at (u,v); panels (c) and (d) show a pentagonal face with edges e_0 to e_4. Sub-quadrilateral corners are labeled with the original vertex, the adjacent edge vertices (e.g. v_{0,1} and v_{3,0}), and the centroid v_C.

Figure 60: Mapping from a mesh face parametrization to a Bezier patch. Panel (a): sub-quadrilateral with corners v_1, v_{1,2}, v_C, v_{0,1}; panel (b): Bezier patch with corners b_1, b_{1,2}, b_C, b_{0,1}.

One of the main motivations for our creating this functionality was simply to enable us to evaluate points on Catmull-Clark subdivision limit surfaces. But, because we have these underlying Bezier patches and tangent patches approximating that surface, essentially any of the vast family of query, measure, intersection, etc. type algorithms for Bezier patches are available for Catmull-Clark subdivision surfaces. We have implemented broadly useful algorithms such as ray intersection, distance from ray to surface, and nearest point on surface, to name a few.

Here’s an example of how we can use this scheme: say we wish to intersect a character’s cheek with a ray, and then place a hair on the intersection point, and constrain it so that it stays put on the cheek as the character is animated. We start off by intersecting a ray with the Bezier patches that we’ve created to approximate the Catmull-Clark subdivision surface. The intersection computation gives us back the tuple (F, e, u, v) identifying that point, and the actual (approximate) limit surface position, normal, and tangents. We record this tuple with the data structure for the hair in question. Whenever we wish to draw that hair, we look up that tuple to find the relevant Bezier patch, and evaluate that patch at the given UV value, yielding the new position at which to constrain the hair’s origin, as well as the orientation. This evaluation is very inexpensive computationally (since it’s always a bicubic), and so applying this technique for even a fairly large number of hairs is not problematic.

The API for these query operations is rather unsurprising. Because these queries require that we create the Bezier patches for the approximation, and we wouldn’t wish to create them over and over again when many queries are made sequentially, we require that the caller “bracket” the queries with a begin/end pair of methods. Figure 61 shows some example pseudocode query method signatures for a Catmull-Clark mesh.

void evaluateLimitSurfaceBegin();

void evaluateLimitSurface(const Edge *edge, const UV &uv,
                          Point *position,
                          Tangent *dU = 0, Tangent *dV = 0,
                          Color *color = 0, UV *mappedUV = 0);

void evaluateLimitSurfaceEnd();

void nearestPointOnLimitSurface(const Point &localPoint,
                                int &faceIndex, int &edgeIndex, UV &uv,
                                Point *nearestPt,
                                Tangent *dU = 0, Tangent *dV = 0,
                                Color *color = 0, UV *mappedUV = 0);

bool rayLimitSurfaceIntersection(const Line3d &localRay,
                                 int &faceIndex, int &edgeIndex, UV &uv,
                                 Point *nearestPt,
                                 Tangent *dU = 0, Tangent *dV = 0,
                                 Color *color = 0, UV *mappedUV = 0);

Figure 61: Mesh limit surface query API.

Note that the UV &uv argument is the parametric location of the query point, in the domain of the sub-quads described earlier, while the UV *mappedUV argument returns the texture coordinates (if any) on the mesh. As you can see, from the caller’s point of view, the fact that there is a Bezier approximation being used is completely hidden. In the Python interface to Zeno, there are analogous Python functions with essentially the same signatures.
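One way a caller might guarantee that the begin/end bracket is always balanced is a small scope guard. The sketch below is an illustration only: LimitSurfaceQueryScope, FakeMesh, and runQueries are hypothetical names, and FakeMesh is a stand-in that merely counts calls. The only assumption taken from Figure 61 is that the mesh exposes evaluateLimitSurfaceBegin() and evaluateLimitSurfaceEnd().

```cpp
#include <cassert>

// Scope guard: calls Begin on construction and End on destruction, so the
// bracket stays balanced even if the query loop exits early.
// MeshT is any type with evaluateLimitSurfaceBegin()/evaluateLimitSurfaceEnd().
template <typename MeshT>
class LimitSurfaceQueryScope {
public:
    explicit LimitSurfaceQueryScope(MeshT &mesh) : mesh_(mesh) {
        mesh_.evaluateLimitSurfaceBegin();
    }
    ~LimitSurfaceQueryScope() { mesh_.evaluateLimitSurfaceEnd(); }

    // Non-copyable: exactly one End per Begin.
    LimitSurfaceQueryScope(const LimitSurfaceQueryScope &) = delete;
    LimitSurfaceQueryScope &operator=(const LimitSurfaceQueryScope &) = delete;

private:
    MeshT &mesh_;
};

// Hypothetical stand-in for a mesh class; it only records how often the
// (expensive) patch construction would run and how many queries were made.
struct FakeMesh {
    int patchBuilds = 0;
    int queries = 0;
    bool active = false;

    void evaluateLimitSurfaceBegin() {
        assert(!active);   // nested brackets are not supported in this sketch
        active = true;
        ++patchBuilds;     // patches are built once per bracket
    }
    void evaluateLimitSurfaceEnd() {
        assert(active);
        active = false;
    }
    void evaluateLimitSurface() {
        assert(active);    // queries are only legal inside the bracket
        ++queries;
    }
};

// Issue many queries inside a single bracket.
inline void runQueries(FakeMesh &mesh, int count) {
    LimitSurfaceQueryScope<FakeMesh> scope(mesh);
    for (int i = 0; i < count; ++i) {
        mesh.evaluateLimitSurface();
    }
}
```

With this pattern, a hundred sequential queries trigger one patch construction rather than a hundred, which is the purpose of the bracket described above.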

4.5 Results

4.5.1 Displacement Display

For the display of displaced Catmull-Clark surfaces, our goal was to render them with sufficient fidelity to Renderman® so as to allow the Viewpaint artists to paint textures for color and other effects on displaced surfaces. In Figure 62 the top image of Thimbletack was rendered with Renderman®, and the bottom image was rendered with Zeno. In Figure 63 the top image of Mulgarath was rendered with Renderman®, and the bottom image was rendered with Zeno.

4.5.2 Limit Surface Evaluation

Here we show the use of limit surface query and evaluation functionality. Figure 64 shows a Catmull-Clark subdivision surface on which we’ve placed a few hundred particles rendered as small spheres. The tuple-defined positions were generated by a particle simulation system to create particle emission points on the limit surface of the torus, using a blue noise function. A more sophisticated variant of this simple scheme could be used to populate a character with hair, or to emit spray from the surface of a fluid simulation, etc.

A demonstration of the nearest-point-on-surface query is shown in Figure 65. To generate this image, we placed particles in a vector “swirl” field and evaluated their simulated positions over time. We sampled the particles’ positions and used the positions as the sources for nearest-point queries on the limit surface of the human model, drawing a curve with the sequences of limit surface points.
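The notes do not describe how the nearest-point query is computed internally. As one plausible illustration only, the sketch below finds the nearest point on a single bicubic Bezier patch by sampling a coarse UV grid and then repeatedly shrinking the search window around the best sample. All names (Vec3, BezierPatch, nearestPointOnPatch, and the gridSize and levels parameters) are hypothetical; a full mesh query would additionally have to choose among candidate patches, and a production implementation might prefer Newton iteration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct Vec3 { double x, y, z; };

// 4x4 control net of one bicubic Bezier patch, indexed cp[row (v)][column (u)].
using BezierPatch = std::array<std::array<Vec3, 4>, 4>;

inline std::array<double, 4> bernstein3(double t) {
    const double s = 1.0 - t;
    return {s * s * s, 3.0 * s * s * t, 3.0 * s * t * t, t * t * t};
}

inline Vec3 evalPosition(const BezierPatch &cp, double u, double v) {
    const auto bu = bernstein3(u);
    const auto bv = bernstein3(v);
    Vec3 p{0.0, 0.0, 0.0};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            const double w = bv[i] * bu[j];
            p.x += w * cp[i][j].x;
            p.y += w * cp[i][j].y;
            p.z += w * cp[i][j].z;
        }
    }
    return p;
}

inline double distSquared(Vec3 a, Vec3 b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

struct NearestResult {
    double u, v;     // parametric location of the nearest sample
    Vec3 point;      // position on the patch
    double distSq;   // squared distance to the query point
};

// Grid search with window refinement. Each level samples a
// (gridSize+1) x (gridSize+1) grid, then narrows the window to one cell on
// either side of the best sample, clamped to the unit square.
inline NearestResult nearestPointOnPatch(const BezierPatch &cp, Vec3 query,
                                         int gridSize = 8, int levels = 12) {
    assert(gridSize >= 2 && levels >= 1);
    double uLo = 0.0, uHi = 1.0, vLo = 0.0, vHi = 1.0;
    NearestResult best{0.0, 0.0, evalPosition(cp, 0.0, 0.0), 0.0};
    best.distSq = distSquared(best.point, query);
    for (int level = 0; level < levels; ++level) {
        const double du = (uHi - uLo) / gridSize;
        const double dv = (vHi - vLo) / gridSize;
        for (int i = 0; i <= gridSize; ++i) {
            for (int j = 0; j <= gridSize; ++j) {
                const double u = std::min(1.0, uLo + j * du);
                const double v = std::min(1.0, vLo + i * dv);
                const Vec3 p = evalPosition(cp, u, v);
                const double d = distSquared(p, query);
                if (d < best.distSq) best = {u, v, p, d};
            }
        }
        uLo = std::max(0.0, best.u - du);
        uHi = std::min(1.0, best.u + du);
        vLo = std::max(0.0, best.v - dv);
        vHi = std::min(1.0, best.v + dv);
    }
    return best;
}
```

This search is only guaranteed to find a local minimum near the best coarse sample; on strongly curved patches a finer initial grid may be needed. Tracing a streamline as in Figure 65 would amount to calling such a query once per sampled particle position and connecting the returned points.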

4.6 Conclusions and Future Work

While we have successfully implemented Loop and Schaefer’s scheme to approximate subdivision surfaces, there are some remaining challenges:

• The existing CPU-side tessellation architecture, which was highly optimized for the graphics cards at the time of its development, is no longer optimal: new features in OpenGL and Cg, and new graphics card capabilities and capacities mean that a new approach can be undertaken. Some of the model characteristics that led to architectural decisions are still present (UV and geometric discontinuities, the occasional non-quadrilateral faces, very large models with many large texture maps, etc.), and we need to efficiently manage both deforming and static models.

• Tools that create displacement textures can optionally create a normal map that corresponds to that displacement. Considering that one of the algorithmically and computationally complex aspects of displacement shading is related to shading normals, using both displacement and normal maps may allow for both better performance and visual quality.

4.7 Acknowledgements

A lot of folks at ILM, both in R&D and production, contributed to the Zeno displacement display and limit surface evaluation projects. Colette Mullenhoff and Vivek Verma of R&D did the bulk of the implementation of both projects. Lana Lan, Michael Koperwas, and Geoff Campbell of ILM’s Digital Model Shop helped specify functionality and interfaces for the displacement display project, and tirelessly tested our work in progress. Don Hatch, Stephen Bowline, and Nick Rasmussen of ILM R&D contributed to the functional specification of the limit surface project, and helped with testing the functionality and refining/extending the API. Aaron Elder was the Project Manager for this work.


Figure 62: The top image of Thimbletack was rendered by Renderman, and the bottom image was rendered in Zeno.


Figure 63: The top image of Mulgarath was rendered by Renderman®, and the bottom image was rendered in Zeno.


Figure 64: Random “blue noise” particles generated on Catmull-Clark limit surface.


Figure 65: Streamline traces from nearest-point-on-surface queries, from particles embedded in a vector field.


References

[AB08] Marc Alexa and Tamy Boubekeur. Subdivision Shading. ACM Trans. Graph., Siggraph Asia, 27(5):142, 2008.

[BA08] Tamy Boubekeur and Marc Alexa. Phong Tessellation. ACM Trans. Graph., Siggraph Asia, 27(5):141, 2008.

[BBB87] Richard H. Bartels, John C. Beatty, and Brian A. Barsky. An Introduction to Splines for Use in Computer Graphics & Geometric Modeling. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1987.

[Bez77] Pierre E. Bezier. Essai de Definition Numerique des Courbes et des Surfaces Experimentales. Ph.D. thesis, Universite Pierre et Marie Curie, February 1977.

[BG75] R. Barnhill and J. Gregory. Compatible Smooth Interpolation in Triangles. J. of Approx. Theory, 15(3):214–225, 1975.

[BRS05] Tamy Boubekeur, Patrick Reuter, and Christophe Schlick. Scalar Tagged PN Triangles. In EUROGRAPHICS 2005 (Short Papers). Eurographics, 2005.

[BS02] J. Bolz and P. Schroder. Rapid Evaluation of Catmull-Clark Subdivision Surfaces. In Proceedings of the Web3D 2002 Symposium, pages 11–18. ACM Press, 2002.

[BS05] Tamy Boubekeur and Christophe Schlick. Generic Mesh Refinement on GPU. In ACM SIGGRAPH/Eurographics Graphics Hardware, 2005.

[BS07] Tamy Boubekeur and Christophe Schlick. QAS: Real-Time Quadratic Approximation of Subdivision Surfaces. In Marc Alexa, Steven J. Gortler, and Tao Ju, editors, Proceedings of the Pacific Conference on Computer Graphics and Applications, Pacific Graphics 2007, Maui, Hawaii, USA, October 29 - November 2, 2007, pages 453–456. IEEE Computer Society, 2007.

[BS08] Tamy Boubekeur and Christophe Schlick. A Flexible Kernel for Adaptive Mesh Refinement on GPU. Computer Graphics Forum, 27(1):102–114, 2008.

[Bun05] Michael Bunnell. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, chapter Adaptive Tessellation of Subdivision Surfaces With Displacement Mapping. Addison-Wesley, Reading, MA, 2005.

[Cas08a] Ignacio Castano. Next-Generation Rendering of Subdivision Surfaces, 2008. SIGGRAPH.

[Cas08b] Ignacio Castano. Tessellation of Subdivision Surfaces in DirectX 11, 2008. Gamefest.


[CC78] Edwin Catmull and James Clark. Recursively Generated B-Spline Surfaces on Arbitrary Topological Meshes. Computer-Aided Design, pages 350–355, 1978.

[CDM91] A.S. Cavaretta, W. Dahmen, and C.A. Micchelli. Stationary Subdivision. Memoirs of the American Mathematical Society, 93(453):1–186, 1991.

[CH02] Nathan A. Carr and John C. Hart. Meshed Atlases for Real-Time Procedural Solid Texturing. ACM Trans. Graph., 21(2):106–131, 2002.

[CHCH06] Nathan A. Carr, Jared Hoberock, Keenan Crane, and John C. Hart. Rectangular Multi-chart Geometry Images. In SGP ’06: Proceedings of the fourth Eurographics symposium on Geometry processing, pages 181–190, Aire-la-Ville, Switzerland, 2006. Eurographics Association.

[COM98] Jonathan Cohen, Marc Olano, and Dinesh Manocha. Appearance-Preserving Simplification. In SIGGRAPH ’98: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 115–122, New York, NY, USA, 1998. ACM.

[Coo84] Robert L. Cook. Shade Trees. SIGGRAPH Comput. Graph., 18(3):223–231, 1984.

[dB93] Carl de Boor. On the Evaluation of Box Splines. Numerical Algorithms, 5(1–4):5–23, 1993.

[DBG+06] Shen Dong, Peer-Timo Bremer, Michael Garland, Valerio Pascucci, and John C. Hart. Spectral Surface Quadrangulation. ACM Trans. Graph., 25(3):1057–1066, 2006.

[Dia08] Rich Diamant. Autodesk Mudbox: Integration and Use with Autodesk 3ds Max and Autodesk Maya. In Game Developer’s Conference, 2008.

[DKT98] T. DeRose, M. Kass, and T. Truong. Subdivision Surfaces in Character Animation. In SIGGRAPH ’98: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 85–94, New York, NY, USA, 1998. ACM Press.

[DRS08] C. Dyken, M. Reimers, and J. Seland. Real-Time GPU Silhouette Refinement Using Adaptively Blended Bezier Patches. In Computer Graphics Forum 27 (1), pages 1–12, 2008.

[DRSar] Christopher Dyken, Martin Reimers, and Johan Seland. Semi-Uniform Adaptive Patch Tessellation. Computer Graphics Forum, to appear.

[DS78] D. Doo and M. Sabin. Behaviour of recursive division surfaces near extraordinary points. Computer-Aided Design, 10:356–360, September 1978.

[DWS+88] Michael Deering, Stephanie Winner, Bic Schediwy, Chris Duffy, and Neil Hunt. The Triangle Processor and Normal Vector Shader: a VLSI System for High Performance Graphics. In SIGGRAPH ’88: Proceedings of the 15th annual conference on Computer graphics and interactive techniques, pages 21–30, New York, NY, USA, 1988. ACM.

[Far97] Gerald Farin. Curves and Surfaces for Computer-Aided Geometric Design: A Practical Guide. Academic Press, fourth edition, 1997.

[FH05] Michael S. Floater and Kai Hormann. Surface Parameterization: a Tutorial and Survey. pages 157–186, 2005.

[For03] Tom Forsyth. Practical Displacement Mapping. In Game Developers Conference, 2003.

[For06] Tom Forsyth. Linear-Speed Vertex Cache Optimization. 2006.

[Gee08] Kev Gee. DirectX 11 Tessellation. In Microsoft GameFest, 2008.

[GP99] Carlos Gonzalez and Jorg Peters. Localized Hierarchy Surface Splines. In S.N. Spencer J. Rossignac, editor, ACM Symposium on Interactive 3D Graphics, 1999.

[Gre74] J. A. Gregory. Smooth Interpolation Without Twist Constraints, pages 71–88. Academic Press, 1974.

[GS85] Leonidas Guibas and Jorge Stolfi. Primitives for the Manipulation of General Subdivisions and the Computation of Voronoi Diagrams. ACM Trans. Graph., 4(2):74–123, 1985.

[JH08] Jin Huang, Muyang Zhang, Jin Ma, Xinguo Liu, Leif Kobbelt, and Hujun Bao. Spectral Quadrangulation with Orientation and Alignment Control. SIGGRAPH Asia, 2008.

[JLW05] Shuangshuang Jin, Robert R. Lewis, and David West. A Comparison of Algorithms for Vertex Normal Computation. The Visual Computer, 21(1-2):71–82, 2005.

[KMDZ09] Denis Kovacs, Jason Mitchell, Shanon Drone, and Denis Zorin. Real-time Creased Approximate Subdivision Surfaces. In I3D ’09: Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games, pages 155–160, New York, NY, USA, 2009. ACM.

[KP07] Kestutis Karciauskas and Jorg Peters. Concentric Tessellation Maps and Curvature Continuous Guided Surfaces. Computer Aided Geometric Design, 24(2):99–111, Feb 2007.

[KPN1] Kestutis Karciauskas and Jorg Peters. Guided Spline Surfaces. Computer Aided Geometric Design, pages 1–20, 2009 N1.


[KPR04] K. Karciauskas, J. Peters, and U. Reif. Shape Characterization of Subdivision Surfaces – Case Studies. Computer-Aided Geometric Design, 21(6):601–614, July 2004.

[KSG03] Vladislav Kraevoy, Alla Sheffer, and Craig Gotsman. Matchmaker: Constructing Constrained Texture Maps. 2003.

[LB06] Charles Loop and Jim Blinn. Real-time GPU Rendering of Piecewise Algebraic Surfaces. ACM Trans. Graph., 25(3):664–670, 2006.

[Lee06] Matt Lee. Next-Generation Graphics Programming on XBox 360. In Microsoft GameFest, 2006.

[LMH00] Aaron Lee, Henry Moreton, and Hugues Hoppe. Displaced Subdivision Surfaces. In SIGGRAPH ’00: Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 85–94, 2000.

[Loo92] Charles Loop. Generalized B-Spline Surfaces of Arbitrary Topological Type. PhD thesis, University of Washington, 1992.

[Loo04] Charles Loop. Second Order Smoothness over Extraordinary Vertices. In Symposium on Geometry Processing, pages 169–178, 2004.

[LS08a] Charles Loop and Scott Schaefer. Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches. ACM Trans. Graph., 27(1):1–11, 2008.

[LS08b] Charles T. Loop and Scott Schaefer. G2 Tensor Product Splines Over Extraordinary Vertices. Comput. Graph. Forum, 27(5):1373–1382, 2008.

[LY06] Gang Lin and Thomas P. Y. Yu. An improved vertex caching scheme for 3d mesh rendering. IEEE Transactions on Visualization and Computer Graphics, 12(4):640–648, 2006.

[MHAM08] Jacob Munkberg, Jon Hasselgren, and Tomas Akenine-Möller. Non-Uniform Fractional Tessellation. In ACM SIGGRAPH/Graphics Hardware 2008, 2008.

[Mit07] Martin Mittring. Finding Next Gen: CryEngine 2. In SIGGRAPH ’07: ACM SIGGRAPH 2007 courses, pages 97–121, New York, NY, USA, 2007. ACM.

[MKP07] Ashish Myles, Kestutis Karciauskas, and Jorg Peters. Extending Catmull-Clark Subdivision and PCCM with Polar Structures. In PG ’07: Proceedings of the 15th Pacific Conference on Computer Graphics and Applications, pages 313–320, Washington, DC, USA, 2007. IEEE Computer Society.

[MNP08] Ashish Myles, Tianyun Ni, and Jorg Peters. Fast Parallel Construction of Smooth Surfaces from Meshes with Tri/Quad/Pent Facets. In Symposium on Geometry Processing, July 2 - 4, 2008, Copenhagen, Denmark, pages 1–8. Blackwell, 2008.


[Mor01] Henry Moreton. Watertight Tessellation Using Forward Differencing. In HWWS ’01: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware, pages 25–32, New York, NY, USA, 2001. ACM.

[MP09] Ashish Myles and Jorg Peters. Bi-3 C2 Polar Subdivision. ACM Transactions on Graphics, 2009.

[Myl08] Ashish Myles. Curvature-Continuous Bicubic Subdivision Surfaces for Polar Configurations. PhD thesis, University of Florida, December 2008.

[NYM+08] Tianyun Ni, Young In Yeo, Ashish Myles, Vineet Goel, and Jorg Peters. GPU Smoothing of Quad Meshes. In IEEE International Conference on Shape Modeling and Applications, 2008.

[PB00] Dan Piponi and George Borshukov. Seamless Texture Mapping of Subdivision Surfaces by Model Pelting and Texture Blending. In SIGGRAPH ’00: Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 471–478, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.

[PBP02] H. Prautzsch, W. Boehm, and M. Paluszny. Bezier and B-Spline Techniques. Mathematics and Visualization. Springer-Verlag, Berlin, 2002.

[PCK04] Budirijanto Purnomo, Jonathan D. Cohen, and Subodh Kumar. Seamless Texture Atlases. In SGP ’04: Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing, pages 65–74, New York, NY, USA, 2004. ACM.

[Pet91] Jorg Peters. Smooth Interpolation of a Mesh of Curves. Constructive Approximation, 7:221–247, 1991. Winner of SIAM Student Paper Competition 1989.

[Pet95] Jorg Peters. C1-Surface Splines. SIAM Journal on Numerical Analysis, 32(2):645–666, 1995.

[Pet02] Jorg Peters. Geometric Continuity. In Handbook of Computer Aided Geometric Design, pages 193–229. Elsevier, 2002.

[Pet04] Jorg Peters. Mid-Structures of Subdividable Linear Efficient Function Enclosures Linking Curved and Linear Geometry. In Miriam Lucian and Marian Neamtu, editors, Proceedings of SIAM conference, Seattle, Nov 2003. Nashboro, 2004.

[Pet08] Jorg Peters. PN-Quads. Technical Report 2008-421, Dept CISE, University of Florida, 2008.

[PK09] Jorg Peters and K. Karciauskas. An introduction to guided and polar surfacing. In Mathematics of Curves and Surfaces, pages 1–26, 2009. Seventh International Conference on Mathematical Methods for Curves and Surfaces Toensberg, Norway.


[PR98] J. Peters and U. Reif. The 42 Equivalence Classes of Quadratic Surfaces in Affine N-Space. Computer-Aided Geometric Design, 15:459–473, 1998.

[PR08] J. Peters and U. Reif. Subdivision Surfaces, volume 3 of Geometry and Computing. Springer-Verlag, New York, 2008.

[PT97] Les Piegl and Wayne Tiller. The NURBS Book (2nd ed.). Springer-Verlag New York, Inc., New York, NY, USA, 1997.

[PW08] J. Peters and X. Wu. Net-to-Surface Distance of Subdivision Functions. JAT, page xx, 2008. in press.

[SAUK04] Le-Jeng Shiue, Pierre Alliez, Radu Ursu, and Lutz Kettner. A Tutorial on CGAL Polyhedron for Subdivision Algorithms. In 2nd CGAL User Workshop, 2004. http://www.cgal.org/Tutorials/Polyhedron/.

[SJP05] Le-Jeng Shiue, Ian Jones, and J. Peters. A Realtime GPU Subdivision Kernel. In Marcus Gross, editor, Siggraph 2005, Computer Graphics Proceedings, Annual Conference Series, pages 1010–1015. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2005.

[SNB07] Pedro V. Sander, Diego Nehab, and Joshua Barczak. Fast Triangle Reordering for Vertex Locality and Reduced Overdraw. ACM Trans. Graph., 26(3):89, 2007.

[SPR06] Alla Sheffer, Emil Praun, and Kenneth Rose. Mesh Parameterization Methods and Their Applications. Found. Trends. Comput. Graph. Vis., 2(2):105–171, 2006.

[SSRS74] Ivan E. Sutherland, Robert F. Sproull, and Robert A. Schumacker. A Characterization of Ten Hidden-Surface Algorithms. ACM Computing Surveys, 6:1–55, 1974.

[ST90] Takafumi Saito and Tokiichiro Takahashi. Comprehensible Rendering of 3D Shapes. SIGGRAPH Comput. Graph., 24(4):197–206, 1990.

[Sta98] Jos Stam. Exact Evaluation of Catmull-Clark Subdivision Surfaces at Arbitrary Parameter Values. In M. Cohen, editor, SIGGRAPH 98 Proceedings, pages 395–404. Addison Wesley, 1998.

[SW07] S. Schaefer and J. Warren. Exact Evaluation of Non-Polynomial Subdivision Schemes at Rational Parameter Values. In PG ’07: 15th Pacific Conference on Computer Graphics and Applications, pages 321–330, Los Alamitos, CA, USA, 2007. IEEE Computer Society.

[SZBN03] Thomas W. Sederberg, Jianmin Zheng, Almaz Bakenov, and Ahmad Nasri. T-splines and T-NURCCs. In Jessica Hodgins and John C. Hart, editors, Proceedings of ACM SIGGRAPH 2003, volume 22(3) of ACM Transactions on Graphics, pages 477–484. ACM Press, 2003.


[TACSD06] Y. Tong, P. Alliez, D. Cohen-Steiner, and M. Desbrun. Designing Quadrangulations with Discrete Harmonic Forms. Eurographics Symposium on Geometry Processing, 2006.

[Tat07] Natasha Tatarchuk. Real-Time Tessellation on the GPU. In SIGGRAPH Advanced Real-Time Rendering in 3D Graphics and Games Course, 2007.

[Tat08] Andrei Tatarinov. Instanced Tessellation in DirectX 10, 2008. GDC, http://developer.nvidia.com/object/gamefest-2008-subdiv.html.

[Val07] Michal Valient. Deferred Rendering in Killzone 2. In DEVELOP Conference, Brighton, UK, 2007.

[VPBM01] Alex Vlachos, Jorg Peters, Chas Boyd, and Jason Mitchell. Curved PN Triangles. In I3D 2001: Proceedings of the 2001 Symposium on Interactive 3D Graphics, pages 159–166, 2001.

[vW86] J. van Wijk. Bicubic Patches for Approximating Non-Rectangular Control-Point Meshes. Computer Aided Geometric Design, 3(1):1–13, 1986.

[WP04] X. Wu and J. Peters. Interference Detection for Subdivision Surfaces. Computer Graphics Forum, Eurographics 2004, 23(3):577–585, 2004.

[WP05] X. Wu and J. Peters. An Accurate Error Measure for Adaptive Subdivision Surfaces. In Proceedings of The International Conference on Shape Modeling and Applications 2005, pages 51–57, 2005.

[WW02] J. Warren and H. Weimer. Subdivision Methods for Geometric Design. Morgan Kaufmann, New York, 2002.

[YNM+] Young In Yeo, Tianyun Ni, Ashish Myles, Vineet Goel, and Jorg Peters. Parallel Smoothing of Quad Meshes. The Visual Computer, pages x–x. accepted, in press, TVCJ-267.

[ZH97] Hansong Zhang and Kenneth E. Hoff, III. Fast Backface Culling Using Normal Masks. In SI3D ’97: Proceedings of the 1997 symposium on Interactive 3D graphics, pages 103–ff., New York, NY, USA, 1997. ACM.

[ZS00] Denis Zorin and Peter Schroder, editors. Subdivision for Modeling and Animation, Course Notes. ACM SIGGRAPH, 2000.
