interactive subdivision of smooth surfaces on gpus · interactive subdivision of smooth surfaces on...

Results

Model      Input patches   Output grids   Split time   Dice time   Rendering performance
Teapot     32              4,823          2.69 ms      1.27 ms     60.1 fps
Killeroo   11,532          14,426         6.30 ms      3.46 ms     29.7 fps

[Figure: Memory Usage (MB) and Subdivision Time (ms) versus Bucket width (pixels), for bucket widths from 0 to 512]

[Figure: Time (ms) versus Number of grids, for 0 to 20000 grids]

Interactive Subdivision of Smooth Surfaces on GPUs

Anjul Patney, Mohamed S. Ebeida*, John D. Owens

University of California, Davis, *Carnegie Mellon University

Parametric Surfaces

Abstract

We present a GPU-based implementation of Reyes-style adaptive surface subdivision, known in Reyes terminology as the Bound/Split and Dice stages. The performance of this task is important for the Reyes pipeline to map efficiently to graphics hardware, but its recursive nature and irregular and unbounded memory requirements present a challenge to an efficient implementation. Our solution begins by characterizing Reyes subdivision as a work queue with irregular computation, targeted to a massively parallel GPU. We propose efficient solutions to these general problems by casting our solution in terms of the fundamental primitives of prefix-sum and reduction, often encountered in parallel and GPGPU environments.

Our results indicate that real-time Reyes subdivision can indeed be obtained on today's GPUs. We are able to subdivide a complex model to subpixel accuracy within 15 ms. Our measured performance is several times better than that of Pixar's RenderMan. Our implementation scales well with the input size and depth of subdivision. We also address concerns of memory size and bandwidth, and analyze the feasibility of conventional ideas on screen-space buckets.

Subdivision Surfaces

Abstract

We present a strategy for performing view-adaptive, crack-free tessellation of Catmull-Clark subdivision surfaces entirely on programmable graphics hardware. Our scheme extends the concept of breadth-first subdivision, which up to this point has only been applied to parametric patches. While mesh representations designed for a CPU often involve pointer-based structures and irregular per-element storage, neither of these is well-suited to GPU execution. To solve this problem, we use a simple yet effective data structure for representing a subdivision mesh, and design a careful algorithm to update the mesh in a completely parallel manner. We demonstrate that in spite of the complexities of the subdivision procedure, real-time tessellation to pixel-sized primitives can be done.

Our implementation does not rely on any approximation of the limit surface, and avoids both subdivision cracks and T-junctions in the subdivided mesh. Using the approach in this paper, we are able to perform real-time subdivision for several static as well as animated models. Rendering performance is scalable for increasingly complex models.

Algorithm

[Flowchart: Input surface -> Evaluate screen-space bound -> Is bound > threshold? Yes: Split, and each split surface is bounded again. No: Dice -> Output micropolygons.]
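The bound/split/dice loop in the flowchart, run breadth-first, can be sketched as below. This is a minimal CPU illustration and not the poster's GPU implementation: `Patch`, `bound`, `split`, and `breadth_first_subdivide` are hypothetical names, and a patch is reduced to a 1D screen-space interval whose bound is simply its length. The point is the control flow: every pass handles the whole queue at once, which is the part that maps to one GPU thread per element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Toy stand-in for a surface patch: an interval [lo, hi] on screen (assumption)."""
    lo: float
    hi: float

    def bound(self) -> float:
        # Screen-space bound of the toy patch: its length in pixels.
        return self.hi - self.lo

    def split(self):
        # Split the patch into two halves.
        mid = 0.5 * (self.lo + self.hi)
        return Patch(self.lo, mid), Patch(mid, self.hi)


def breadth_first_subdivide(patches, threshold):
    """Return (patches ready for dicing, number of passes over the queue)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive, or splitting never stops")
    queue = list(patches)
    ready = []
    passes = 0
    while queue:
        next_queue = []
        # One pass = one subdivision level. The loop body has no dependency
        # between elements, so each element could be handled by its own thread.
        for patch in queue:
            if patch.bound() > threshold:
                next_queue.extend(patch.split())   # too large: split again
            else:
                ready.append(patch)                # small enough: dice
        queue = next_queue
        passes += 1
    return ready, passes


ready, passes = breadth_first_subdivide([Patch(0.0, 8.0)], threshold=1.0)
print(len(ready), passes)  # 8 patches of width 1.0 after 4 passes
```

The number of passes depends on the depth of subdivision rather than on the number of patches, which is why the breadth-first formulation suits wide parallel hardware.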

Breadth-first Subdivision

[Figure: Breadth-first Subdivision. Two diagrams with nodes numbered 1 to 5.]

Dynamic Work-Queue

[Figure: Dynamic Work-Queue. Rows of queue elements: A B C D E F G H I; A B C E F H; A C E. Labels: Fast scan-based compaction (Sengupta '07); Coalesced Scatter.]

Summary

Conclusion

• Recursive Subdivision in real-time
  – Breadth-first formulation
  – Maps well to GPUs
• First step towards a real-time Reyes pipeline

Future Work

• Resolving subdivision cracks
• Displacement mapping
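The Dynamic Work-Queue panel names scan-based compaction followed by a scatter. A minimal sequential sketch of that idea follows; it is not the cited GPU implementation. `exclusive_scan` and `compact` are hypothetical names, and the choice of which letters survive is an assumption made only so that the output matches the second row of the figure.

```python
from itertools import accumulate


def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] is the sum of flags[0..i-1]."""
    if not flags:
        return []
    inclusive = list(accumulate(flags))
    return [0] + inclusive[:-1]


def compact(items, keep):
    """Keep the flagged items, preserving order, via scan then scatter."""
    flags = [1 if k else 0 for k in keep]
    if not flags:
        return []
    address = exclusive_scan(flags)          # output slot of each survivor
    total = address[-1] + flags[-1]          # number of survivors
    out = [None] * total
    # Scatter: every iteration writes to a distinct slot, so on a GPU each
    # element could be written by its own thread without conflicts.
    for i, item in enumerate(items):
        if flags[i]:
            out[address[i]] = item
    return out


queue = list("ABCDEFGHI")
keep = [1, 1, 1, 0, 1, 1, 0, 1, 0]           # assumed flags: drop D, G, I
print(compact(queue, keep))                  # ['A', 'B', 'C', 'E', 'F', 'H']
```

Because the scan gives each surviving element its final address up front, the queue can shrink or be repacked every pass without any element waiting on another.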

Approach

Maintain three arrays: Vertices, Faces, and Edges.

Array    Stored per element   Generates
Face     v0, v1, v2, v3       Face-point
Edge     v0, v1, f0, f1       Edge-point
Vertex   x, y, z, val.        Vertex-point

[Figure: the Face, Edge, and Vertex records above, repeated in three panels. Other labels: "+", "Atomic add".]

Fixing Cracks

Cracks due to view-adaptive subdivision are fixed entirely in parallel.

Results

Model          Input faces   Output faces   Rendering time
Big Guy        1,450         91,992         37.12 ms
Monster Frog   1,292         80,452         34.17 ms
Killeroo       2,894         80,227         31.34 ms

Rendering time grows linearly with scene complexity.

Performance grows with complexity, and settles at 2-2.5 Mfaces/sec.

Summary

Conclusion

• Parallel GPU tessellation of Catmull-Clark surfaces
• Robust data management for subdivision
• Dynamic view-dependence
• Fixing cracks in parallel

Future Work

• Efficient memory management
• Programmable geometry caching
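The three-array representation in the Approach panel (Faces as v0..v3, Edges as v0, v1, f0, f1, Vertices as x, y, z, val.) can be illustrated with one Catmull-Clark step on a closed quad mesh. The poster does not list the update rules, so the sketch below assumes the standard Catmull-Clark rules and reads "val." as vertex valence. The function names and the cube test mesh are hypothetical. The per-vertex sums are the kind of accumulation a GPU version might perform with atomic adds, which is a guess based on the "Atomic add" labels.

```python
from itertools import product


def add(p, q):
    return tuple(a + b for a, b in zip(p, q))


def scale(p, s):
    return tuple(a * s for a in p)


def build_edges(faces):
    """Derive edge records (v0, v1, f0, f1) from the quads of a closed mesh."""
    edge_faces = {}
    for f, quad in enumerate(faces):
        for i in range(4):
            a, b = quad[i], quad[(i + 1) % 4]
            edge_faces.setdefault((min(a, b), max(a, b)), []).append(f)
    return [(a, b, fs[0], fs[1]) for (a, b), fs in edge_faces.items()]


def catmull_clark_points(vertices, faces, edges):
    """Return (face-points, edge-points, vertex-points) for one subdivision step."""
    pos = [tuple(v[:3]) for v in vertices]
    zero = (0.0, 0.0, 0.0)

    # Face-point: centroid of the four corners.
    face_pts = []
    for quad in faces:
        total = zero
        for v in quad:
            total = add(total, pos[v])
        face_pts.append(scale(total, 0.25))

    # Edge-point: average of the two endpoints and the two adjacent face-points.
    edge_pts = [
        scale(add(add(pos[v0], pos[v1]), add(face_pts[f0], face_pts[f1])), 0.25)
        for v0, v1, f0, f1 in edges
    ]

    # Per-vertex accumulators; each += below would be an atomic add on a GPU.
    face_sum = [zero] * len(vertices)
    mid_sum = [zero] * len(vertices)
    for f, quad in enumerate(faces):
        for v in quad:
            face_sum[v] = add(face_sum[v], face_pts[f])
    for v0, v1, _, _ in edges:
        mid = scale(add(pos[v0], pos[v1]), 0.5)
        mid_sum[v0] = add(mid_sum[v0], mid)
        mid_sum[v1] = add(mid_sum[v1], mid)

    # Vertex-point: (Q + 2R + (n - 3) P) / n for a vertex of valence n.
    vertex_pts = []
    for i, (_, _, _, n) in enumerate(vertices):
        q = scale(face_sum[i], 1.0 / n)
        r = scale(mid_sum[i], 1.0 / n)
        p = add(add(q, scale(r, 2.0)), scale(pos[i], n - 3))
        vertex_pts.append(scale(p, 1.0 / n))
    return face_pts, edge_pts, vertex_pts


# Hypothetical test mesh: a cube with corners at (+-1, +-1, +-1), valence 3.
CUBE_VERTICES = [(x, y, z, 3) for x, y, z in product((-1.0, 1.0), repeat=3)]
CUBE_FACES = [(0, 1, 3, 2), (4, 5, 7, 6), (0, 1, 5, 4),
              (2, 3, 7, 6), (0, 2, 6, 4), (1, 3, 7, 5)]
CUBE_EDGES = build_edges(CUBE_FACES)
face_pts, edge_pts, vertex_pts = catmull_clark_points(
    CUBE_VERTICES, CUBE_FACES, CUBE_EDGES)
print(vertex_pts[7])  # corner (1, 1, 1) moves to roughly (0.556, 0.556, 0.556)
```

Flat index arrays like these avoid the pointer-based structures the abstract describes as poorly suited to GPU execution: every face, edge, and vertex can be processed independently given only array indices.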
