Post on 21-Mar-2016
DDDDRRaW: A Prototype Toolkit for Distributed Real-Time Rendering on Commodity Clusters
Thu D. Nguyen and Christopher Peery
Department of Computer Science
Rutgers University
John Zahorjan
Department of Computer Science & Engineering
University of Washington
IPDPS 2001
Overview
Improve real-time rendering performance using distributed rendering on commodity clusters
• Real-time rendering -> interactive rendering applications
• Improve performance -> render more complex scenes at interactive rates
Why real-time rendering?
• A critical component of an increasing number of continuous media applications
  Virtual reality, data visualization, CAD, flight simulators, etc.
• Rendering performance will continue to be a bottleneck
  Model complexity is increasing as fast as (or faster than) hardware performance
  Part of the challenge is to leverage increasingly powerful hardware accelerators
Challenges
How to structure the distributed renderer to leverage hardware-assisted rendering
• Information that is useful for work partitioning and assignment may be hidden in the hardware rendering pipeline
How to minimize non-parallelizable overheads (avoiding Amdahl’s Law)
How to decouple the bandwidth requirement from the complexity of the scene and the cluster size
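The concern about non-parallelizable overheads can be made concrete with Amdahl's Law. The sketch below is illustrative only: the function name `amdahl_speedup` and the 20% serial fraction are assumptions chosen for the example, not measurements from DDDDRRaW.

```python
def amdahl_speedup(serial_fraction: float, nodes: int) -> float:
    """Upper bound on speed-up when `serial_fraction` of the per-frame work
    cannot be parallelized and the rest is split evenly over `nodes`."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


if __name__ == "__main__":
    # Hypothetical 20% serial share (e.g. partitioning, compositing, display).
    for p in (1, 4, 8, 12):
        print(f"P={p:2d}  bound on speed-up = {amdahl_speedup(0.2, p):.2f}")
```

With a serial fraction of 0.2 the bound can never exceed 5, however many rendering nodes are added, which is why the per-frame overheads on the display node matter so much.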
Image Layer Decomposition (ILD)
Per-frame rendering load is partitioned using ILD
• Presented in IPDPS 2000
Briefly review ILD because it affects DDDDRRaW’s architecture and performance
Basic idea: assign scene objects such that sets of objects assigned to different nodes are not mutually occlusive
Advantages of using ILD
• Do not need position of polygons in 2D
  This information may be hidden inside the graphics pipeline
• Do not need Z-buffer information
  This reduces the required bandwidth by at least 50%
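The "at least 50%" figure can be checked with simple per-pixel arithmetic. The sketch below assumes a depth value that is at least as wide as the color value (for example 32-bit depth with 24- or 32-bit color); the slides do not state the pixel formats, so these sizes and the 640x480 window are assumptions.

```python
def bytes_per_frame(width: int, height: int,
                    color_bytes: int, depth_bytes: int = 0) -> int:
    """Bytes a rendering node would ship for one full-window image."""
    return width * height * (color_bytes + depth_bytes)


def savings_without_depth(color_bytes: int, depth_bytes: int) -> float:
    """Fraction of bandwidth saved by sending color only."""
    return depth_bytes / (color_bytes + depth_bytes)


if __name__ == "__main__":
    # Assumed formats: 32-bit depth with 32-bit RGBA or 24-bit RGB color.
    for color in (4, 3):
        with_z = bytes_per_frame(640, 480, color, 4)
        without_z = bytes_per_frame(640, 480, color)
        print(f"color={color} B: {with_z} -> {without_z} bytes, "
              f"saving {savings_without_depth(color, 4):.0%}")
```

If the depth value were narrower than the color value (say 16-bit depth with 32-bit color), the saving under this model would fall below 50%.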
Image Layer Decomposition (ILD)
[Figure: spatial partitioning of a scene into six numbered regions (1-6)]
ILD: Work Assignment
Non-mutually occlusive assignment -> legal for back-to-front compositing
Use a heuristic-based algorithm to
• Balance load across the cluster
• Minimize the screen real estate covered by each assignment
[Figure: a legal assignment of regions 1-6]
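Because the assigned sets are not mutually occlusive, the display node can merge the image layers in back-to-front order without any depth comparison. The following is a minimal sketch of that idea, not DDDDRRaW's compositing code: representing an undrawn pixel as `None` and the name `composite_back_to_front` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]
Pixel = Optional[Color]          # None marks a pixel the layer did not draw
Layer = List[List[Pixel]]


def composite_back_to_front(layers: List[Layer],
                            background: Color = (0, 0, 0)) -> List[List[Color]]:
    """Overlay layers ordered from farthest to nearest; nearer layers win."""
    if not layers:
        raise ValueError("need at least one layer")
    height, width = len(layers[0]), len(layers[0][0])
    frame = [[background] * width for _ in range(height)]
    for layer in layers:                      # farthest first
        for y in range(height):
            for x in range(width):
                pixel = layer[y][x]
                if pixel is not None:         # drawn pixels overwrite what is behind
                    frame[y][x] = pixel
    return frame


if __name__ == "__main__":
    far = [[(255, 0, 0), (255, 0, 0)]]
    near = [[None, (0, 0, 255)]]
    print(composite_back_to_front([far, near]))
```

No Z values are read anywhere in the loop; correctness rests entirely on the ordering of the layers.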
Implementation: Architecture
[Figure: the application (App.) and the display sit on the display node; the display node and each rendering node run a copy of the DDDDRRaW library. Labeled data flows: VRML scene, display window, viewpoint, work assignment, partial image.]
Display node
• Partitioning
• Assignment
• Decompress
• Compositing
Rendering nodes
• Rendering
• Compress
Implementation Details
Implemented an optimization to ILD: dynamic selection of octants to be rendered
• Minimizes the overhead of geometric transformation due to polygon splitting (in scene decomposition)
Compression of image layers before communication
• Reduces the bandwidth requirement to accommodate slower networks (e.g., 100 Mb/s LANs)
Use dynamic clipping to enforce octant boundaries for scenes with smooth shading and/or texturing
• Simplification to ease implementation of the prototype; this clipping could/should be done statically
• 20-25 percent overhead for 5 of our 6 test scenes that would not be present in a production system
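To see why compressing image layers helps on a 100 Mb/s LAN, the sketch below compresses a mostly empty layer and estimates its transfer time. The slides do not name the compression scheme, so `zlib` is only a stand-in, and the 640x480 RGB window, the helper names, and the layer contents are all assumptions.

```python
import zlib


def make_sparse_layer(width: int = 640, height: int = 480) -> bytes:
    """Hypothetical RGB layer: black background with one drawn horizontal band."""
    layer = bytearray(width * height * 3)
    for y in range(height // 4, height // 2):
        start = y * width * 3
        layer[start:start + width * 3] = bytes([200, 120, 40]) * width
    return bytes(layer)


def compress_layer(rgb: bytes, level: int = 1) -> bytes:
    """Compress a layer before sending (low level favors speed over ratio)."""
    return zlib.compress(rgb, level)


def decompress_layer(blob: bytes) -> bytes:
    """Recover the original layer bytes on the display node."""
    return zlib.decompress(blob)


def transfer_time_ms(num_bytes: int, link_mbps: float) -> float:
    """Ideal wire time, ignoring protocol overhead and contention."""
    return num_bytes * 8 / (link_mbps * 1e6) * 1e3


if __name__ == "__main__":
    raw = make_sparse_layer()
    packed = compress_layer(raw)
    print(f"raw {len(raw)} B: {transfer_time_ms(len(raw), 100):.1f} ms at 100 Mb/s")
    print(f"compressed {len(packed)} B: {transfer_time_ms(len(packed), 100):.3f} ms")
```

An uncompressed 640x480 RGB layer needs about 74 ms of wire time at 100 Mb/s under this model, so several uncompressed layers per frame would rule out interactive rates on such a network.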
Performance Measurement
Application: VRML viewer
• VRweb – http://www.iicm.edu/vrwave
Collected 6 VRML scenes from the web
• Use fixed paths through the scenes to measure performance in terms of average frame rate (frames/sec)
Two clusters representing different points in the technology spectrum
• Cluster of 5 SGI O2s
  180 MHz MIPS R5000, 256 MB memory, SGI graphics accelerator, 100 Mb/s switched Ethernet LAN
  IRIX 6.5.7
• Cluster of 13 PCs
  Pentium III 800 MHz, 512 MB memory, Giganet 1 Gb/s cLAN
  Red Hat Linux (kernel 2.2.14), Mesa 3D library version 3.2
Two Test Scenes
[Figure: two of the test scenes]
Overheads on SGI O2s
Time (ms)
                        Display Node            Rendering Node
Operation                P=1     P=2     P=4     P=1     P=2     P=4
ILD                     2.08    1.97    8.68
Clear Image Buffer      3.50    3.50    3.50
Decompress             18.08   22.84   30.28
Display Frame           0.18    0.18    0.18
Compress                                       36.03   27.13   17.70
Overheads on PCs
Time (ms)
                        Display Node                    Rendering Node
Operation                P=1     P=4     P=8    P=12     P=1     P=4     P=8    P=12
ILD                     2.62    2.63    2.63    2.70
Clear Image Buffer      4.98    5.01    5.37    5.24
Decompress              3.29    4.11    4.33    4.46
Display Frame          15.79   15.34   15.73   15.73
Compress                                                7.42    7.52    7.46    7.79
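The display-node timings for the PC cluster can be added up to estimate the fixed per-frame cost that no number of rendering nodes removes. This is a rough sketch under two assumptions not stated on the slide: that the four display-node operations run one after another once per frame, and that each tabulated value is the total for that operation per frame. The function names are hypothetical.

```python
# Display-node times in ms on the PC cluster, keyed by number of rendering nodes P.
DISPLAY_NODE_MS = {
    "ILD":                {1: 2.62,  4: 2.63,  8: 2.63,  12: 2.70},
    "Clear Image Buffer": {1: 4.98,  4: 5.01,  8: 5.37,  12: 5.24},
    "Decompress":         {1: 3.29,  4: 4.11,  8: 4.33,  12: 4.46},
    "Display Frame":      {1: 15.79, 4: 15.34, 8: 15.73, 12: 15.73},
}


def display_node_overhead_ms(p: int) -> float:
    """Sum of the display-node operations for a given P (assumed serial)."""
    return sum(times[p] for times in DISPLAY_NODE_MS.values())


def frame_rate_ceiling(p: int) -> float:
    """Frame rate if the display node were the only cost in each frame."""
    return 1000.0 / display_node_overhead_ms(p)


if __name__ == "__main__":
    for p in (1, 4, 8, 12):
        print(f"P={p:2d}: {display_node_overhead_ms(p):.2f} ms per frame, "
              f"ceiling {frame_rate_ceiling(p):.1f} fps")
```

Under these assumptions the display node spends roughly 27 to 28 ms per frame regardless of P, with Display Frame as the largest single term.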
Speed-up of Average Frame Rate on O2s
[Figure: bar chart of speed-up (y-axis, 0 to 6) for the scenes Aztec City, Chamber, Hall, Coronary, Left Lung, and CS Building; series: Sequential, P=1, P=4]
Speed-up of Average Frame Rate on PCs
[Figure: speed-up (y-axis, 0 to 9) versus number of rendering nodes P (x-axis, 1 to 12); series: CS Building, Hall, Chamber, Aztec City, Coronary]
Speed-up of Rendering Component on PCs
[Figure: speed-up (y-axis, 0 to 12) versus number of rendering nodes P (x-axis, 1 to 12); series: Aztec City, Coronary]
Conclusions
Can build an ILD-based distributed renderer to significantly improve real-time rendering performance on commodity hardware
DDDDRRaW currently scales to modestly sized clusters
• This limitation is due to non-optimal hardware configurations
• This is NOT because more suitable hardware is not available!
• Expect good scalability to clusters of 16-32 nodes
Overlapping communication with computation increases average frame rate but ONLY at the expense of increasing frame latency
• Problem is CPU contention for rendering & communication
• Either need dedicated hardware or can only optimize after reaching 10-15 fps, the nominal interactive frame rate
Project URL: www.cs.washington.edu/research/ddddrraw/
Overlapping Communication & Computation
Communication and compression are significant sources of overhead
Apply standard parallel optimization technique: overlap communication of rendered image layers for one frame with rendering of the next
Requires pipelining of DDDDRRaW
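The overlap described above can be sketched with a producer thread that renders and a consumer thread that sends, joined by a small bounded queue. This is an illustrative sketch only: `run_pipelined`, `render`, and `send` are hypothetical names, and DDDDRRaW's actual threading structure is not given on the slides.

```python
import queue
import threading
from typing import Any, Callable

_DONE = object()  # sentinel that tells the sender thread to stop


def run_pipelined(num_frames: int,
                  render: Callable[[int], Any],
                  send: Callable[[Any], None],
                  depth: int = 1) -> None:
    """Render frame n+1 while the layer for frame n is still being sent."""
    handoff: "queue.Queue[Any]" = queue.Queue(maxsize=depth)

    def sender() -> None:
        while True:
            layer = handoff.get()
            if layer is _DONE:
                break
            send(layer)

    worker = threading.Thread(target=sender)
    worker.start()
    for n in range(num_frames):
        handoff.put(render(n))   # blocks when the sender falls `depth` frames behind
    handoff.put(_DONE)
    worker.join()


if __name__ == "__main__":
    sent = []
    run_pipelined(5, render=lambda n: f"layer-{n}", send=sent.append)
    print(sent)
```

The bounded queue limits how far rendering can run ahead of sending, which caps the extra latency the pipeline introduces. In CPython the two threads overlap only while `send` is blocked on I/O; when both stages compete for the same CPU they slow each other down.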
The DDDDRRaW Pipeline
[Figure: three-stage pipeline spanning the display node and the rendering nodes]
Display node: ILD, Send (Stage 1); Receive, Decompress, Composite & Display (Stage 3)
Rendering nodes: Receive, Render, Compress, Send (Stage 2)
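A toy timing model shows how a three-stage pipeline can raise frame rate while also raising latency. Everything in it is assumed for illustration: the stage times are made up, and the `contention` factor is a crude stand-in for stages slowing each other down when they share a CPU.

```python
from typing import Sequence, Tuple


def sequential_metrics(stage_ms: Sequence[float]) -> Tuple[float, float]:
    """(frames per second, latency in ms) when stages run one after another."""
    total = sum(stage_ms)
    return 1000.0 / total, total


def pipelined_metrics(stage_ms: Sequence[float],
                      contention: float = 1.0) -> Tuple[float, float]:
    """(frames per second, latency in ms) for a full pipeline.

    Throughput is set by the slowest stage; latency is the sum of all stages.
    `contention` >= 1 inflates every stage to model CPU sharing.
    """
    slowed = [t * contention for t in stage_ms]
    return 1000.0 / max(slowed), sum(slowed)


if __name__ == "__main__":
    stages = [5, 60, 35]  # hypothetical Stage 1, 2, 3 times in ms
    print("sequential:", sequential_metrics(stages))
    print("pipelined :", pipelined_metrics(stages, contention=1.3))
```

With these made-up numbers the sequential renderer delivers 10 fps at 100 ms latency, while the pipelined one exceeds 10 fps but each frame takes longer than 100 ms to reach the screen.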
Average Frame Rates
[Figure: bar chart of average frame rate (fps, y-axis 0 to 9) for the scenes Aztec City, Chamber, Hall, Coronary, Left Lung, and CS Building; series: Avg Seq, Avg ST, Avg MT]
Average Frame Latency
[Figure: bar chart of average frame latency (ms, y-axis 0 to 900) for the scenes Aztec City, Chamber, Hall, Coronary, Left Lung, and CS Building; series: Avg ST, Avg MT]