Results
Model | Input patches | Output grids | Split time | Dice time | Rendering performance
Teapot | 32 | 4,823 | 2.69 ms | 1.27 ms | 60.1 fps
Killeroo | 11,532 | 14,426 | 6.30 ms | 3.46 ms | 29.7 fps
[Plot: Memory usage (MB) and subdivision time (ms) versus bucket width (pixels), 0 to 512]
[Plot: Time (ms) versus number of grids, 0 to 20,000]
Interactive Subdivision of Smooth Surfaces on GPUs
Anjul Patney, Mohamed S. Ebeida*, John D. Owens
University of California, Davis, *Carnegie Mellon University
Parametric Surfaces
Subdivision Surfaces
Abstract (Subdivision Surfaces)
We present a strategy for performing view-adaptive, crack-free tessellation of Catmull-Clark
subdivision surfaces entirely on programmable graphics hardware. Our scheme
extends the concept of breadth-first subdivision, which up to this point has only been
applied to parametric patches. While mesh representations designed for a CPU often
involve pointer-based structures and irregular per-element storage, neither of these is
well-suited to GPU execution. To solve this problem, we use a simple yet effective data
structure for representing a subdivision mesh, and design a careful algorithm to update
the mesh in a completely parallel manner. We demonstrate that, despite the
complexity of the subdivision procedure, real-time tessellation to pixel-sized primitives
is achievable.
Our implementation does not rely on any approximation of the limit surface, and avoids
both subdivision cracks and T-junctions in the subdivided mesh. Using the approach in
this paper, we are able to perform real-time subdivision for several static as well as
animated models. Rendering performance is scalable for increasingly complex models.
Abstract (Parametric Surfaces)
We present a GPU-based implementation of Reyes-style adaptive surface
subdivision, known in Reyes terminology as the Bound/Split and Dice
stages. The performance of this task is important for the Reyes pipeline to
map efficiently to graphics hardware, but its recursive nature and irregular
and unbounded memory requirements present a challenge to an efficient
implementation. Our solution begins by characterizing Reyes subdivision as
a work queue with irregular computation, targeted to a massively parallel
GPU. We propose efficient solutions to these general problems by casting
our solution in terms of the fundamental primitives of prefix-sum and
reduction, often encountered in parallel and GPGPU environments.
Our results indicate that real-time Reyes subdivision can indeed be obtained
on today's GPUs. We are able to subdivide a complex model to subpixel
accuracy within 15 ms. Our measured performance is several times better
than that of Pixar's RenderMan. Our implementation scales well with the
input size and depth of subdivision. We also address concerns of memory
size and bandwidth, and analyze the feasibility of conventional ideas on
screen-space buckets.
Algorithm
[Flowchart: Input surface -> Evaluate screen-space bound -> Is bound > threshold? Yes: Split, and each split surface is bounded again. No: Dice -> Output micropolygons]
Breadth-first Subdivision
[Figure: two diagrams with elements numbered 1 to 5]
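The bound/split/dice loop and its breadth-first formulation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the poster's GPU implementation: the patch representation (a rectangle in parameter space), the flat `screen_bound` estimate, and the names `split` and `breadth_first_subdivide` are all hypothetical. Each pass handles the whole queue at once, which is the step a GPU would run with one thread per patch.

```python
from typing import List, Tuple

# Hypothetical patch: a rectangle (u0, u1, v0, v1) in parameter space.
Patch = Tuple[float, float, float, float]

def screen_bound(p: Patch, pixels_per_unit: float) -> float:
    """Toy screen-space bound: the longest side of the patch in pixels.
    A real renderer would project the patch's control points instead."""
    u0, u1, v0, v1 = p
    return max(u1 - u0, v1 - v0) * pixels_per_unit

def split(p: Patch) -> List[Patch]:
    """Split a patch in half along its longer parametric direction."""
    u0, u1, v0, v1 = p
    if (u1 - u0) >= (v1 - v0):
        um = 0.5 * (u0 + u1)
        return [(u0, um, v0, v1), (um, u1, v0, v1)]
    vm = 0.5 * (v0 + v1)
    return [(u0, u1, v0, vm), (u0, u1, vm, v1)]

def breadth_first_subdivide(patches: List[Patch],
                            pixels_per_unit: float,
                            threshold: float):
    """Process the queue one level at a time until every patch can be diced.
    Returns the diceable patches and the number of passes taken."""
    queue = list(patches)
    ready: List[Patch] = []
    passes = 0
    while queue:
        next_queue: List[Patch] = []
        for p in queue:  # iterations are independent: one GPU thread each
            if screen_bound(p, pixels_per_unit) > threshold:
                next_queue.extend(split(p))   # too large: split and requeue
            else:
                ready.append(p)               # small enough: send to Dice
        queue = next_queue
        passes += 1
    return ready, passes
```

With a single unit patch, 64 pixels per unit, and a 16-pixel threshold, this sketch yields 16 diceable patches after 5 passes. The number of passes depends on the subdivision depth rather than on the number of patches, which is the property that makes the breadth-first form attractive for wide parallel hardware.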
Summary
Dynamic Work-Queue
[Figure: work-queue elements A to I are compacted with fast scan-based compaction (Sengupta '07) and written out with a coalesced scatter]
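Scan-based compaction of a work queue can be sketched as an exclusive prefix sum over keep-flags followed by a scatter. The version below is a sequential Python illustration only; the names `exclusive_scan` and `compact` and the choice of which elements survive are assumptions made for the example, and a GPU version would run the scan and the scatter as parallel kernels.

```python
def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] is the sum of flags[0..i-1].
    Also returns the grand total, which is the compacted length."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out, running

def compact(items, keep):
    """Keep items whose flag is truthy, preserving their order."""
    flags = [1 if k else 0 for k in keep]
    offsets, total = exclusive_scan(flags)
    result = [None] * total
    # Scatter step: every iteration writes to a distinct slot, so on a GPU
    # each element could be handled by its own thread without conflicts.
    for i, item in enumerate(items):
        if flags[i]:
            result[offsets[i]] = item
    return result

# Hypothetical queue contents and flags (D, G, and I are dropped).
queue = list("ABCDEFGHI")
keep = [1, 1, 1, 0, 1, 1, 0, 1, 0]
survivors = compact(queue, keep)
```

Because surviving elements land in consecutive slots, neighbouring threads write to neighbouring addresses, which is what allows the scatter to be coalesced. The same pattern extends to splitting if each flag is replaced by the number of outputs an element produces.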
Conclusion
• Recursive subdivision in real time
– Breadth-first formulation
– Maps well to GPUs
• First step towards a real-time Reyes pipeline
Future Work
• Resolving subdivision cracks
• Displacement mapping
Approach
Summary
Results
Fixing Cracks
[Figure: mesh data structure. Each Face stores v0, v1, v2, v3 and yields a face-point; each Edge stores v0, v1, f0, f1 and yields an edge-point; each Vertex stores x, y, z, val. and yields a vertex-point]
Maintain three arrays: Vertices, Faces, and Edges.
Cracks due to view-adaptive subdivision are fixed entirely in parallel.
Rendering time grows linearly with scene complexity.
Performance grows with complexity and settles at 2-2.5 Mfaces/sec.
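The three-array layout can be illustrated with the standard Catmull-Clark rules for a closed quad mesh. This is a sketch under stated assumptions rather than the poster's implementation: it is sequential Python, the function names and the cube test data are hypothetical, and "val." is taken to mean vertex valence. The per-vertex accumulation loops mark where a parallel version would need atomic adds.

```python
def face_points(vertices, faces):
    """Face-point: average of the four corners of each face (v0, v1, v2, v3)."""
    return [tuple(sum(vertices[v][k] for v in f) / 4.0 for k in range(3))
            for f in faces]

def build_edges(faces):
    """Derive edge records (v0, v1, f0, f1) from a closed quad mesh."""
    owners = {}
    for fi, f in enumerate(faces):
        for i in range(4):
            a, b = f[i], f[(i + 1) % 4]
            owners.setdefault((min(a, b), max(a, b)), []).append(fi)
    return [(a, b, fs[0], fs[1]) for (a, b), fs in sorted(owners.items())]

def edge_points(vertices, edges, fpts):
    """Edge-point: average of both endpoints and both adjacent face-points."""
    return [tuple((vertices[v0][k] + vertices[v1][k]
                   + fpts[f0][k] + fpts[f1][k]) / 4.0 for k in range(3))
            for v0, v1, f0, f1 in edges]

def vertex_points(vertices, faces, edges, fpts):
    """Vertex-point: (F + 2R + (n - 3)P) / n, with n the valence, F the mean
    adjacent face-point, R the mean adjacent edge midpoint, P the old position."""
    n = len(vertices)
    fsum = [[0.0] * 3 for _ in range(n)]
    msum = [[0.0] * 3 for _ in range(n)]
    val = [0] * n
    for fi, f in enumerate(faces):
        for v in f:
            for k in range(3):
                fsum[v][k] += fpts[fi][k]      # an atomic add when parallel
    for v0, v1, _, _ in edges:
        for v in (v0, v1):
            val[v] += 1                        # an atomic add when parallel
            for k in range(3):
                msum[v][k] += 0.5 * (vertices[v0][k] + vertices[v1][k])
    return [tuple((fsum[v][k] / val[v] + 2.0 * msum[v][k] / val[v]
                   + (val[v] - 3) * vertices[v][k]) / val[v] for k in range(3))
            for v in range(n)]

# Hypothetical test mesh: a cube with corners at (+-1, +-1, +-1).
CUBE_VERTICES = [(-1, -1, -1), (1, -1, -1), (1, 1, -1), (-1, 1, -1),
                 (-1, -1, 1), (1, -1, 1), (1, 1, 1), (-1, 1, 1)]
CUBE_FACES = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
              (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]

cube_edges = build_edges(CUBE_FACES)
cube_fpts = face_points(CUBE_VERTICES, CUBE_FACES)
cube_epts = edge_points(CUBE_VERTICES, cube_edges, cube_fpts)
cube_vpts = vertex_points(CUBE_VERTICES, CUBE_FACES, cube_edges, cube_fpts)
```

Keeping faces, edges, and vertices in flat index-based arrays avoids pointer chasing, so each face-point, edge-point, and vertex-point can be computed by an independent thread. Boundary edges and creases are not handled in this sketch.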
Conclusion
• Parallel GPU tessellation of Catmull-Clark surfaces
• Robust data management for subdivision
• Dynamic view-dependence
• Fixing cracks in parallel
Future Work
• Efficient memory management
• Programmable geometry caching
[Figure label: Atomic add]
Model | Input faces | Output faces | Rendering time
Big Guy | 1,450 | 91,992 | 37.12 ms
Monster Frog | 1,292 | 80,452 | 34.17 ms
Killeroo | 2,894 | 80,227 | 31.34 ms