Post on 19-Dec-2015

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Cache-Oblivious Mesh Layouts

Sung-Eui Yoon (1), Peter Lindstrom (2), Valerio Pascucci (2), Dinesh Manocha (1)

1: University of North Carolina - Chapel Hill
2: Lawrence Livermore National Laboratory

http://gamma.cs.unc.edu/COL

Goal

• Compute cache-coherent layouts of polygonal meshes
  ♦ For geometric processing and visualization
  ♦ Handle any kind of polygonal model (e.g., irregular geometry)

Motivation

• High growth rate of computational power of CPUs and GPUs

Growth rate during 1993 – 2004

Courtesy: http://www.hcibook.com/e3/online/moores-law/

Memory Hierarchies and Caches

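[Figure: memory hierarchy. CPU or GPU, fast memory or cache, slow memory, and disk, connected by block transfers]

Access time: 10^0 ns (fast memory or cache), 10^2 ns (slow memory), 10^6 ns (disk)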

Cache-Coherent Layouts

• Cache-Aware
  ♦ Optimized for particular cache parameters (e.g., block size)
• Cache-Oblivious
  ♦ Minimizes data access time without any knowledge of cache parameters
  ♦ Directly applicable to various hardware and memory hierarchies

CAD Model – Double Eagle Tanker Model

• 82 million triangles
• Irregular distribution of geometry

Isosurface and Scanned Models

Isosurface: 100M triangles

St. Matthew: 372M triangles

Main Contribution

• Algorithm to compute cache-oblivious layouts of polygonal meshes

Cache-oblivious metric

Multilevel optimization framework

Applicable to hierarchical representations

Live Demo – View-Dependent Rendering (VDR)

GeForce Go 6800 Ultra

• Based on multiresolution hierarchy
  ♦ Dynamically computes simplification
  ♦ Cache-oblivious layout is used to minimize GPU vertex cache misses

Related Work

• Cache-coherent algorithms
• Mesh layouts

Cache-Coherent Algorithms

• Cache-aware [Coleman and McKinley 95, Vitter 01, Sen et al. 02]

• Cache-oblivious [Frigo et al. 99, Arge et al. 04]

Focus on specific problems such as sorting and linear algebra computations

Mesh Layouts

• Rendering sequences
  ♦ Triangle strips
  ♦ [Deering 95, Hoppe 99, Bogomjakov and Gotsman 02]
• Processing sequences
  ♦ [Isenburg and Gumhold 03, Isenburg and Lindstrom 04]

Assume that access pattern globally follows the layout order!

Mesh Layouts

• Space-filling curves
  ♦ [Sagan 94, Velho and Gomes 91, Pascucci and Frank 01, Lindstrom and Pascucci 01, Gopi and Eppstein 04]

Assume geometric regularity!

Outline

• Overview
• Cache-oblivious metric
• Results

Overview

[Figure: pipeline from an input graph (vertices va, vb, vc, vd) to a result 1D layout (va vb vd vc)]

• Multilevel optimization
• Cache-oblivious metric
• Local permutations

Graph-based Representation

• Undirected graph, G = (V, E)
  ♦ Represents access patterns of applications
• Vertex
  ♦ Data element (e.g., mesh vertex or mesh triangle)
• Edge
  ♦ Connects two vertices if they are likely to be accessed sequentially

[Figure: example graph with vertices va, vb, vc, vd]
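One simple way to obtain such a graph from a triangle mesh is to reuse the mesh connectivity: every mesh vertex becomes a graph vertex and every triangle side becomes an undirected edge. The sketch below assumes an indexed triangle list; the names `Edge`, `Triangle`, and `buildGraphFromTriangles` are hypothetical and are not taken from the talk or from any library.

```cpp
#include <algorithm>
#include <array>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper types for illustration only.
using Edge = std::pair<int, int>;     // undirected edge, stored as (min, max)
using Triangle = std::array<int, 3>;  // three mesh vertex indices

// Builds the undirected edge set E of G = (V, E) from a triangle list.
// Two mesh vertices are connected if they share a triangle side, on the
// assumption that neighboring vertices tend to be accessed one after another.
std::vector<Edge> buildGraphFromTriangles(const std::vector<Triangle>& tris) {
    std::set<Edge> unique;  // drops duplicates from sides shared by two triangles
    for (const Triangle& t : tris) {
        for (int k = 0; k < 3; ++k) {
            int a = t[k];
            int b = t[(k + 1) % 3];
            if (a != b) unique.insert({std::min(a, b), std::max(a, b)});
        }
    }
    return std::vector<Edge>(unique.begin(), unique.end());
}
```

For two triangles sharing a side, (0, 1, 2) and (1, 3, 2), this yields five edges, with the shared side (1, 2) counted once.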

Problem Statement

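• Vertex layout of G = (V, E)
  ♦ One-to-one mapping of vertices to indices in the 1D layout: $\varphi : V \to \{1, \ldots, |V|\}$
• Compute a $\varphi$ that minimizes the expected number of cache misses

[Figure: example graph with vertices va, vb, vc, vd and its 1D layout va vb vd vc at indices 1 2 3 4]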

Local Permutation

Vertex layout

Terminology

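• Edge span of (va, vb): $|\varphi(v_a) - \varphi(v_b)|$, where $\varphi$ is the layout mapping

Example: $\varphi(v_a) = 1$ and $\varphi(v_c) = 5$, so $|\varphi(v_a) - \varphi(v_c)| = 4$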

Terminology

• $E_i$
  ♦ Set of edges having edge span i in the layout

Example: $(v_a, v_c) \in E_4$

Terminology

• Edge span distribution
  ♦ $|E_i|$, where i is in [1, n]

Example: $|E_1| = 4$, $|E_2| = 1$, $|E_3| = 1$, $|E_4| = 1$

[Figure: histogram of the number of edges against edge span (1 to 4) for the example]
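The edge span and the edge span distribution can be computed directly from a layout and an edge list. The following is a minimal sketch under stated assumptions: vertices are numbered from 0, `phi[v]` holds the 1-based layout index of vertex `v`, and the function names `edgeSpan` and `edgeSpanDistribution` are hypothetical.

```cpp
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

// Edge span |phi(a) - phi(b)| of edge (a, b), where phi[v] is the 1-based
// index of vertex v in the 1D layout.
int edgeSpan(const std::vector<int>& phi, int a, int b) {
    return std::abs(phi[a] - phi[b]);
}

// Edge span distribution: maps each span i to |E_i|, the number of edges
// with that span. Spans with no edges are simply absent from the map.
std::map<int, int> edgeSpanDistribution(
        const std::vector<int>& phi,
        const std::vector<std::pair<int, int>>& edges) {
    std::map<int, int> dist;
    for (const auto& e : edges) {
        ++dist[edgeSpan(phi, e.first, e.second)];
    }
    return dist;
}
```

For instance, with the layout va vb vd vc (so `phi = {1, 2, 4, 3}` for va, vb, vc, vd) and an illustrative edge list (va, vb), (va, vc), (va, vd), (vc, vd), two edges have span 1, one has span 2, and one has span 3. This edge list is made up for the example and is not the graph drawn on the slide.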

Cache Miss Ratio Function (CMRF), $p_i$

• Probability of a cache miss for a given edge span i

[Figure: cache miss ratio $p_i$ (the probability of having a cache miss, between 0 and 1) plotted against edge span i, from 1 to n-1]

Number of Cache Misses at Runtime

• Estimated by multiplying two factors
  ♦ Runtime edge span distribution
  ♦ CMRF

1D Layout: three accesses with edge span 2, edge span 4, and edge span 2

$$p_2 + p_4 + p_2 = (2, 1) \cdot (p_2, p_4)$$

where (2, 1) is the runtime edge span distribution and $(p_2, p_4)$ is the CMRF.
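The product of the two factors is a dot product over edge spans. The sketch below assumes both factors are stored as maps keyed by edge span; the function name `estimatedCacheMisses` and the probability values in the usage note are hypothetical.

```cpp
#include <map>

// Estimated number of cache misses: the dot product of an edge span
// distribution (span -> number of accesses with that span) and a CMRF
// (span -> probability of a cache miss). Spans missing from the CMRF
// are treated as probability 0 in this sketch.
double estimatedCacheMisses(const std::map<int, int>& spanCounts,
                            const std::map<int, double>& cmrf) {
    double misses = 0.0;
    for (const auto& kv : spanCounts) {
        auto it = cmrf.find(kv.first);
        if (it != cmrf.end()) misses += kv.second * it->second;
    }
    return misses;
}
```

With the runtime distribution from the slide (two accesses of span 2, one of span 4) and made-up probabilities $p_2 = 0.25$ and $p_4 = 0.5$, the estimate is 2(0.25) + 0.5 = 1 cache miss.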

Expected Number of Cache Misses

♦ Approximate the runtime edge span distribution with that of the layout

$$\sum_{i=1}^{n-1} |E_i| \, p_i$$

where $|E_i|$ is the edge span distribution of the layout and n is the number of vertices.
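As a worked instance, take the example edge span distribution from the Terminology slides, $|E_1| = 4$ and $|E_2| = |E_3| = |E_4| = 1$, and assume n = 5 so that the sum runs over spans 1 to 4. The CMRF values are hypothetical and chosen only to make the arithmetic concrete.

```latex
\sum_{i=1}^{4} |E_i| \, p_i = 4 p_1 + p_2 + p_3 + p_4
% hypothetical CMRF: p_1 = 0.1, p_2 = 0.2, p_3 = 0.3, p_4 = 0.4
= 4(0.1) + 0.2 + 0.3 + 0.4 = 1.3
```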

Outline

• Overview
• Cache-oblivious metric
• Results

Cache-Oblivious Metric

• Decides if a local permutation reduces the number of cache misses
  ♦ Probabilistic formulation
  ♦ Reduces to geometric volume computation

Does a Local Permutation Decrease Cache Misses?

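Expected number of cache misses before the local permutation, with edge span distribution $|E_i|$:

$$\sum_{i=1}^{n-1} |E_i| \, p_i$$

Expected number of cache misses after the local permutation, with edge span distribution $|E_i| + \Delta|E_i|$:

$$\sum_{i=1}^{n-1} \left( |E_i| + \Delta|E_i| \right) p_i$$

The local permutation decreases cache misses if the second sum is smaller than the first, that is, if

$$\sum_{i=1}^{n-1} \Delta|E_i| \, p_i < 0$$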
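When a particular CMRF is known, the condition $\sum_i \Delta|E_i| \, p_i < 0$ can be checked directly. The sketch below computes $\Delta|E_i|$ from the layouts before and after a permutation and evaluates the sum; the names `spanDelta` and `decreasesMisses` are hypothetical, and the CMRF passed in is an assumption of the example, since the talk's metric is designed for the case where the CMRF is not known.

```cpp
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

using EdgeList = std::vector<std::pair<int, int>>;

// Delta|E_i|: change in the number of edges with span i when the layout
// changes from phiBefore to phiAfter. Zero entries are dropped.
std::map<int, int> spanDelta(const std::vector<int>& phiBefore,
                             const std::vector<int>& phiAfter,
                             const EdgeList& edges) {
    std::map<int, int> delta;
    for (const auto& e : edges) {
        --delta[std::abs(phiBefore[e.first] - phiBefore[e.second])];
        ++delta[std::abs(phiAfter[e.first] - phiAfter[e.second])];
    }
    for (auto it = delta.begin(); it != delta.end();) {
        if (it->second == 0) it = delta.erase(it); else ++it;
    }
    return delta;
}

// True if sum_i Delta|E_i| * p_i < 0 for the given CMRF, where p[i] is the
// miss probability for span i (p[0] is unused).
bool decreasesMisses(const std::map<int, int>& delta,
                     const std::vector<double>& p) {
    double change = 0.0;
    for (const auto& kv : delta) change += kv.second * p[kv.first];
    return change < 0.0;
}
```

Only the edges incident to the permuted vertices change span, so a practical implementation would restrict the loop to those edges; the full loop is kept here for brevity.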

Monotonicity of CMRF, $p_i$

• Assume CMRF is a monotonically increasing function of edge span

[Figure: cache miss ratio $p_i$, between 0 and 1, plotted against edge span i from 1 to ∞]

Exact Cache-Oblivious Metric

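$$\sum_{i=1}^{n-1} \Delta|E_i| \, p_i < 0$$

where the CMRF ranges over all the possible cache configurations, subject to the monotonicity of the CMRF:

$$0 \le p_1 \le p_2 \le \ldots \le p_{n-2} \le p_{n-1} \le 1$$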

Geometric Formulation

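Half hyperspace:

$$\sum_{i=1}^{n-1} \Delta|E_i| \, p_i < 0$$

Closed hyperspace:

$$0 \le p_1 \le p_2 \le \ldots \le p_{n-2} \le p_{n-1} \le 1$$

[Figure: the half hyperspace and the closed hyperspace, each drawn on the p1 and p2 axes]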

Geometric Volume Computation

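• Assume each CMRF to be equally likely
• Half hyperspace (blue area)
  ♦ Space of CMRFs that reduce cache misses

$$\sum_{i=1}^{n-1} \Delta|E_i| \, p_i < 0$$

where

$$0 \le p_1 \le p_2 \le \ldots \le p_{n-2} \le p_{n-1} \le 1$$

[Figure: the half hyperspace shown as a blue area inside the closed hyperspace, on the p1 and p2 axes]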
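The volume in question can be illustrated with a Monte Carlo estimate: sample monotone CMRFs uniformly and count how often the inequality holds. This is a stand-in used here for intuition, not the exact or approximate volume algorithm cited in the talk. The function name `fractionReducingMisses` is hypothetical, and every span in `delta` is assumed to lie in [1, n-1].

```cpp
#include <algorithm>
#include <map>
#include <random>
#include <vector>

// Estimates the fraction of monotone CMRFs (0 <= p_1 <= ... <= p_{n-1} <= 1,
// each assumed equally likely) for which sum_i Delta|E_i| * p_i < 0.
// Sorting n-1 independent uniform samples gives a uniform sample from the
// monotone region.
double fractionReducingMisses(const std::map<int, int>& delta, int n,
                              int samples, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::vector<double> p(n - 1);  // p[i - 1] holds p_i
    int hits = 0;
    for (int s = 0; s < samples; ++s) {
        for (double& x : p) x = uniform(rng);
        std::sort(p.begin(), p.end());
        double change = 0.0;
        for (const auto& kv : delta) change += kv.second * p[kv.first - 1];
        if (change < 0.0) ++hits;
    }
    return static_cast<double>(hits) / samples;
}
```

A fraction above one half would indicate that the permutation reduces cache misses for most monotone CMRFs; reading the decision rule as a comparison against one half is an interpretation of the volume comparison described in the talk.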

Geometric Volume Computation

Time complexity
  ♦ Exact: $O(n^{n+1})$ [Lasserre and Zeron 01]
  ♦ Approximate: $O(n^5)$ [Kannan et al. 97]

Fast and Approximate Volume Comparison

• Define a top polytope in closed hyperspace
• Compute the centroid, C, of the top polytope

[Figure: the top polytope and its centroid, C, on the p1 and p2 axes]

Fast and Approximate Volume Comparison

• Use the centroid for approximate volume comparison
  ♦ The volume containing the centroid is likely to be larger

[Figure: the centroid, C, on the p1 and p2 axes]

Bound of Approximation

• 0.1% ~ 0.3% compared to the exact metric

Final Approximate Metric

$$\sum_{j=1}^{m} \Delta|E_{l(j)}| \, j < 0$$

where the factor j comes from the centroid, and l(j) packs the non-zero $\Delta|E_i|$ to indices 1, …, m.
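The approximate metric needs only the non-zero changes in the edge span distribution. The sketch below follows the formula $\sum_{j=1}^{m} \Delta|E_{l(j)}| \, j < 0$ under one assumption that the slide does not spell out: the non-zero entries are packed in increasing order of edge span. The function name `acceptsPermutation` is hypothetical.

```cpp
#include <map>

// Approximate test for a local permutation: the non-zero Delta|E_i| entries
// are packed, in increasing order of span i, into ranks j = 1..m, and rank j
// takes the place of the unknown miss probability.
bool acceptsPermutation(const std::map<int, int>& delta) {
    long long score = 0;
    int j = 0;
    for (const auto& kv : delta) {     // std::map iterates in increasing span
        if (kv.second == 0) continue;  // only non-zero entries get a rank
        ++j;
        score += static_cast<long long>(kv.second) * j;
    }
    return score < 0;
}
```

A permutation that turns one edge of span 2 into an edge of span 1 gives a score of 1(1) + (-1)(2) = -1 and is accepted, while one that turns an edge of span 3 into an edge of span 10 gives (-1)(1) + 1(2) = 1 and is rejected.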

Layout Optimization

• Find an optimal layout that minimizes our metric
  ♦ Combinatorial optimization problem

Multilevel Minimization

Step 1: Coarsening

Multilevel Minimization

Step 2: Ordering of coarsest graph

Multilevel Minimization

Step 3: Refinement and local optimization

Outline

• Overview
• Cache-oblivious layouts
• Results

Layout Computation Time

• Process 70 million vertices per hour
  ♦ Takes 2.6 hours to lay out St. Matthew model (372 million triangles)
  ♦ 2.4 GHz Pentium 4 PC with 1 GB main memory

Edge Span Distributions of Different Layouts

[Figure: number of edges against edge span for the cache-oblivious layout, the spectral layout, and the original layout]

Applications

• View-dependent rendering
• Collision detection
• Isocontour extraction

View-Dependent Rendering

• Lay out vertices and triangles of CHPM [Yoon et al. 04]
  ♦ Reduce misses of GPU vertex cache

View-Dependent Rendering

Models                # of Tri.   Our layout   Simplification layout [Yoon et al. 04]
St. Matthew           372M        106 M/s      23 M/s
Isosurface            100M        90 M/s       20 M/s
Double Eagle Tanker   82M         47 M/s       22 M/s

Improvement labels on the slide: 4.5X, 2.1X

Peak performance: 145 M tri / s on GeForce 6800 Ultra

Realtime Captured Video – St. Matthew Model

Comparison with Other Rendering Sequences

Our layout

Universal rendering sequences [Bogomjakov and Gotsman 2002]

Comparison with Other Rendering Sequences

Our layout: optimized for no particular cache size

[Hoppe 99]: optimized for a vertex cache size of 16 with FIFO replacement

Performance during View-Dependent Rendering

Our layout

[Hoppe 99]

Optimized for various resolutions

Optimized for full resolution

Comparison with Space Filling Curve on Power Plant Model

Our layout

Space filling curve (Z-curve)

Collision Detection

• Bounding volume hierarchies
  ♦ Widely used to accelerate the performance of collision detection
  ♦ Traversed to find contacting area
  ♦ Uses pre-computed layouts of OBB trees [Gottschalk et al. 96]

Rigid Body Simulation

Collision Detection Time

2X on average

Depth-first layout

Cache-oblivious layout

Isocontour Extraction

• Contour tree [van Kreveld et al. 97]

• Use mesh as the input graph

• Extract an isocontour that is orthogonal to z-axis

Puget Sound, 134 M triangles

Isocontour z(x,y) = 500m

Comparison – First Extraction of Z(x,y) = 500m

[Figure: bar chart of relative performance over the Z-axis sorted layout; bar labels 2, 21, 13, 1]

Nearly optimized for particular isocontour

Disk access time is bottleneck

Comparison – Second Extraction of Z(x,y) = 500m

[Figure: bar chart of relative performance over the Z-axis sorted layout; bar labels 2, 21, 13, 379, 212, 10.8]

Memory and L1/L2 cache access times are bottleneck

Limitations

• Assumptions on CMRF
  ♦ May not work well for all applications
• Does not compute global optimum
  ♦ Greedy solution

Advantages

• General
  ♦ Applicable to all kinds of polygonal models
  ♦ Works well for various applications
• Cache-oblivious
  ♦ Can benefit every level, from CPU/GPU cache to memory and disk
• No modification of runtime application
  ♦ Only layout computation

OpenCCL: Cache-Coherent Layouts of Graphs and Meshes

• Source codes for computing a cache-coherent layout
• Easy to use

    // Graph with NumVertex vertices
    CLayoutGraph Graph (NumVertex);

    // Edges of the example graph
    Graph.AddEdge (0, 1);
    Graph.AddEdge (0, 2);
    Graph.AddEdge (1, 2);

    // Compute the cache-coherent ordering into Order
    int Order [NumVertex];
    Graph.ComputeOrdering (Order);

[Figure: example graph with vertices 0, 1, 2]

Google “Cache Oblivious Mesh Layout” or http://gamma.cs.unc.edu/COL

Conclusion

• Novel algorithm for computing cache-oblivious mesh layouts
  ♦ Cast the problem as an optimization
  ♦ Probabilistically compute the expected number of cache misses
  ♦ Achieve significant improvements (2 to 20X) without modifying runtime applications

Ongoing and Future Work

• Apply to other applications
  ♦ Simplification and approximate collision detection [Yoon et al. 04]
  ♦ Shortest path computation, etc.
• Investigate optimality

Ongoing and Future Work

• Cache-Oblivious Layouts of Bounding Volume Hierarchies [Yoon and Manocha 05]
  ♦ Tech. Report, University of North Carolina at Chapel Hill

Acknowledgements

• Anonymous donor
  ♦ Power plant model
• Digital Michelangelo Project
  ♦ St. Matthew model at Stanford University
• LLNL ASCI VIEWS
  ♦ Isosurface model
• Newport News Shipbuilding
  ♦ Double Eagle tanker

Acknowledgements

• Army Research Office
• DARPA
• Intel Corporation
• Lawrence Livermore Nat’l Lab.
• National Science Foundation
• Office of Naval Research
• RDECOM

Acknowledgements

• Martin Isenburg
• Dawoon Jung
• Brandon Lloyd
• Elise London
• Brian Salomon
• Avneesh Sud

Questions?

Project URL: http://gamma.cs.unc.edu/COL
