
Lecture 27

November 26, 2003

AA220/CS238 - Parallel Methods in Numerical Analysis

Parallel Visualization in the ASCI Program

Overview

• Visualization of large-scale datasets generated with massively parallel machines is a very compute-intensive task:

– Large datasets

– Usually time-dependent

– Complex solution features yield large I/O requirements

– Floating point operations needed to render the image

• Advancements are required in several areas

– Basic improvements in visualization algorithms

– Parallel implementation of visualization algorithms

– Parallel visualization hardware (scalable and cost-effective)

Overview - Cont’d

• Examples in this lecture drawn from:

– Stanford ASCI work in unsteady turbomachinery flow simulations

– University of Utah Scientific Computing and Imaging Institute

– Collaboration with MIT on parallel pV3

• A number of research groups are working on parallel visualization techniques (both hardware and software):

– Stanford University

– U. of Utah

– DoE National Laboratories

– Etc…

Large-Scale Scientific Visualization

Scientific Computing and Imaging Institute

University of Utah

Chris Johnson

Interactive Large-Scale Visualization

Medical

Scientific Computing

GeoScience

The Visualization Pipeline

[Pipeline diagram: an offline pre-process step, followed by online Construct, Search, and Render stages driven by the selected isovalue]

• Dynamic extraction of isosurfaces
• Rapid extractions

Visualization Process: Pre-process → Generate → Render

Isosurface Extraction

• Marching Cubes
• Octree
• Extrema Graphs
• Sweeping Simplices
• The Span Space (Livnat, Shen, Johnson)

  – NOISE: O(√n + k) (see the sketch below)

[Span-space diagram: each cell is plotted as the point (min, max) of its scalar range; cells crossed by a given isovalue lie above the min = max diagonal]
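To make the span-space test concrete, here is a minimal Python sketch: a brute-force O(n) filter over per-cell (min, max) ranges rather than the O(√n + k) kd-tree search that NOISE builds over span space. The array layout and function name are illustrative assumptions, not code from the papers.

```python
import numpy as np

def span_space_candidates(cell_values, isovalue):
    """Return indices of cells whose scalar range brackets the isovalue.

    cell_values : (n_cells, n_vertices) array of per-vertex scalars.
    In span space each cell is the point (min, max); the cells crossed by
    the isosurface are exactly those with min <= isovalue <= max.
    """
    cmin = cell_values.min(axis=1)
    cmax = cell_values.max(axis=1)
    return np.nonzero((cmin <= isovalue) & (isovalue <= cmax))[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cells = rng.random((1000, 8))   # 1000 hexahedral cells, 8 vertex scalars each
    print(span_space_candidates(cells, 0.5)[:10])
```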

The Visualization Pipeline

[Pipeline diagram: the same offline pre-process and online Construct, Search, and Render stages, now driven by both the isovalue and the view point; the stage outputs are annotated O(k) and O(V(k))]

• Reduce the amount of data
  – Reduce during the search...

A View-dependent Approach

• Attractive for:

– Large datasets

– High depth complexity

– Remote visualization

A View-dependent Approach

• Three-step method (sketched below):

1) Traverse front to back
2) Project onto a virtual screen
3) Render triangles on graphics hardware
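A minimal Python sketch of these three steps, under simplifying assumptions: a flat list of cells with precomputed depths and screen-space bounding boxes, and a coarse occupancy mask standing in for the hierarchical virtual screen. The class and function names are illustrative, not taken from the Livnat-Hansen implementation.

```python
class VirtualScreen:
    """Coarse occupancy mask standing in for the hierarchical virtual screen."""
    def __init__(self, width, height):
        self.covered = [[False] * width for _ in range(height)]

    def fully_covered(self, bbox):
        x0, y0, x1, y1 = bbox
        return all(self.covered[y][x] for y in range(y0, y1) for x in range(x0, x1))

    def mark(self, bbox):
        x0, y0, x1, y1 = bbox
        for y in range(y0, y1):
            for x in range(x0, x1):
                self.covered[y][x] = True

def view_dependent_extract(cells, isovalue, screen):
    """1) traverse front to back, 2) project onto the virtual screen and prune
    non-intersected or hidden cells, 3) return the cells whose triangles
    would be handed to the graphics hardware."""
    to_render = []
    for cell in sorted(cells, key=lambda c: c["depth"]):   # front to back
        if not (cell["min"] <= isovalue <= cell["max"]):    # value-space prune
            continue
        if screen.fully_covered(cell["bbox"]):              # image-space prune
            continue
        screen.mark(cell["bbox"])                           # project onto virtual screen
        to_render.append(cell)                              # triangles go to hardware
    return to_render

if __name__ == "__main__":
    cells = [
        {"min": 0.2, "max": 0.8, "depth": 1.0, "bbox": (0, 0, 2, 2)},
        {"min": 0.9, "max": 1.0, "depth": 2.0, "bbox": (2, 2, 4, 4)},  # not intersected
        {"min": 0.1, "max": 0.7, "depth": 3.0, "bbox": (0, 0, 2, 2)},  # hidden behind first
    ]
    kept = view_dependent_extract(cells, isovalue=0.5, screen=VirtualScreen(4, 4))
    print(len(kept), "cell(s) sent to the graphics hardware")   # -> 1
```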

A View-dependent Approach

• Flow chart:
  – Value Space: prune non-intersecting cells
  – Object Space: front-to-back traversal
  – Image Space: prune non-visible cells (Visibility Part I and Part II)
  – Graphics Engine: z-buffer and rendering, producing the final image
  – The traversal and pruning steps run in software; the graphics engine is hardware

Visible Woman

             Full View     View-dependent
Polygons     2,246,000     246,000
Create       177 sec       72 sec
Render       2.32 sec      0.25 sec

Why Not Always Use Polygons?

• Marching cubes and similar algorithms can generate millions of polygons for large datasets
  – Reduce by decimation (e.g. Shekhar et al. '96)
  – View-dependent extraction (e.g. Livnat and Hansen '98)

Real-Time Ray Tracer

Real-Time Ray Tracer (RTRT)

• Implemented on SGI Origin 3000 ccNUMA architecture - up to 512 processors (now working on a distributed version)
• Approximately linear speedup
• Load balancing and memory coherence are key to performance

Algorithm - 3 Phases (sketched below)

• Traversing a ray through cells that do not contain an isosurface
• Analytically computing the isosurface when the intersected volume contains an isosurface
• Shading the resulting intersection point
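A simplified Python sketch of the three phases on a regular grid follows; a numerical bisection stands in for the analytic trilinear-cell intersection used in the real RTRT, and the toy distance-field volume, step size, and function names are illustrative assumptions only.

```python
import numpy as np

def trilinear(vol, p):
    """Trilinearly interpolate the scalar volume at a continuous point p."""
    i, j, k = (int(c) for c in np.floor(p))
    f = p - np.array([i, j, k], dtype=float)
    c = vol[i:i + 2, j:j + 2, k:k + 2].astype(float)
    c = c[0] * (1 - f[0]) + c[1] * f[0]      # collapse x
    c = c[0] * (1 - f[1]) + c[1] * f[1]      # collapse y
    return c[0] * (1 - f[2]) + c[1] * f[2]   # collapse z

def gradient(vol, p, h=0.5):
    """Central-difference gradient, used as the shading normal."""
    e = np.eye(3) * h
    return np.array([trilinear(vol, p + e[a]) - trilinear(vol, p - e[a])
                     for a in range(3)]) / (2 * h)

def trace_ray(vol, origin, direction, isovalue, light, t_max, step=0.5):
    d = np.asarray(direction, float) / np.linalg.norm(direction)
    t = step
    prev = trilinear(vol, origin + t * d)
    while t + step < t_max:
        t2 = t + step
        p = origin + t2 * d
        cell = vol[tuple(slice(int(c), int(c) + 2) for c in np.floor(p))]
        # Phase 1: cells whose value range does not bracket the isovalue are skipped
        if cell.min() <= isovalue <= cell.max():
            val = trilinear(vol, p)
            if (prev - isovalue) * (val - isovalue) <= 0.0:
                # Phase 2: locate the crossing; bisection stands in for the
                # analytic trilinear solve used in the real RTRT
                lo, hi = t, t2
                for _ in range(30):
                    mid = 0.5 * (lo + hi)
                    if (prev - isovalue) * (trilinear(vol, origin + mid * d) - isovalue) <= 0.0:
                        hi = mid
                    else:
                        lo = mid
                hit = origin + hi * d
                # Phase 3: Lambertian shading with the gradient as the normal
                n = gradient(vol, hit)
                n /= np.linalg.norm(n) + 1e-12
                return float(max(np.dot(n, light / np.linalg.norm(light)), 0.0))
            prev = val
        else:
            prev = trilinear(vol, p)
        t = t2
    return None  # the ray missed the isosurface

if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 32)
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    vol = np.sqrt(X**2 + Y**2 + Z**2)        # distance field: isosurface is a sphere
    shade = trace_ray(vol, origin=np.array([2.0, 15.5, 15.5]),
                      direction=np.array([1.0, 0.0, 0.0]),
                      isovalue=0.5, light=np.array([-1.0, 1.0, 0.5]), t_max=12.0)
    print("shaded intensity:", shade)
```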

Real-Time Ray Tracer - Scalability

[Plot: frames per second (32 processors) vs. frame number (time)]

RTRT Time Varying Visualization

Real-Time Volume Rendering

Volume Rendering - 3D Transfer Function

[Tooth dataset: material boundaries enamel/background, dentin/background, dentin/enamel, dentin/pulp]

• 1D transfer function: separating these boundaries is not possible
• 2D transfer function: specificity not as good (a 2D lookup sketch follows)
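As an illustration of why extra transfer-function dimensions help, here is a minimal Python sketch that classifies a volume through a 2D lookup table indexed by scalar value and gradient magnitude (a 3D transfer function would add one more axis, e.g. a second-derivative measure). The table, ranges, and function name are illustrative assumptions only.

```python
import numpy as np

def classify_2d(volume, table, v_range, g_range):
    """Per-voxel RGBA lookup from a 2D transfer function.

    A 1D table indexed by scalar value alone cannot separate materials whose
    value ranges overlap; indexing by (value, gradient magnitude) recovers
    the boundaries between them with better specificity.
    """
    gx, gy, gz = np.gradient(volume.astype(float))
    gmag = np.sqrt(gx**2 + gy**2 + gz**2)
    nv, ng = table.shape[:2]
    vi = np.clip(((volume - v_range[0]) / (v_range[1] - v_range[0]) * (nv - 1)).astype(int), 0, nv - 1)
    gi = np.clip(((gmag - g_range[0]) / (g_range[1] - g_range[0]) * (ng - 1)).astype(int), 0, ng - 1)
    return table[vi, gi]

if __name__ == "__main__":
    vol = np.random.default_rng(1).random((16, 16, 16))
    table = np.random.default_rng(2).random((64, 32, 4))   # toy 64x32 value-gradient table, RGBA
    rgba = classify_2d(vol, table, v_range=(0.0, 1.0), g_range=(0.0, 1.0))
    print(rgba.shape)                                      # (16, 16, 16, 4)
```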

Vector Fields

[Images © ZIB, © UofU]

LIC Flow (Banks and Interrante)

Illuminated Lines - C. Hege, ZIB

Tensor Visualization - Hesselink

Brush Strokes (Laidlaw `98)

Lecture 27

November 26, 2003

AA220/CS238 - Parallel Methods in Numerical Analysis

Large-Scale Visualization of Turbomachinery Flows Using pV3

Objectives

• Utilize existing software and hardware technologies to visualize large datasets with proper scalability in both:
  – Display size / resolution
  – Rendering speed
• Interactive visualization of large-scale datasets for useful investigation of simulation results
• Understand what can be done with the kind of visualization systems that will be available on the desktop in 2-3 years

Motivation

• At Stanford, in the DoE ASCI (Accelerated Strategic Computing Initiative) program, we are trying to simulate very large-scale flows in turbomachinery. The visualization of these flows is rather difficult and time consuming.
• Our CS group has a lot of expertise in software and hardware for parallel rendering.
• Can we leverage these tools in the context of an engineering-usable visualization package?

Objective - Demonstrate Potential of Hi-Fi Gas Turbine Engine Simulation

• Integrated fan/compressor/combustor/turbine/secondaries unsteady flow and turbulent combustion simulation
  – RANS Turbomachinery
  – Combustor
    • RANS (NASA-NCC)
    • LES (CITS)
  – Multi-Code Interface
    • Complex code coupling
• Will require 100 TFLOPS
• Have industry and NASA participation and interest

P&W 6000 Engine

Flamelet-progress variable model for combustion LES

[Figures: mixture fraction and product mass fraction on the P&W combustor 2.5D grid 1]

Stanford-ASCI TFLO Project Goals

• To develop a scalable code (TFLO) that is capable of:
  – tackling large-scale unsteady flow simulations of multistage turbomachinery, as well as interactions between compressor, combustor, and turbine
  – rapid and cost-effective steady and unsteady analyses required in a design environment (single blade passages, multiple stage simulation with low blade counts) comparable to existing industrial practice
  – incorporating advanced turbulence models with corrections to account for effects typical in turbomachinery (streamline curvature, rotation, etc.)
• To contribute to the development of numerical simulation techniques that make this type of calculation computationally affordable
• To demonstrate integrated calculations simulating the interaction between the compressor, combustor, and HP/LP turbine

Gas-Turbine Components

TFLO performance on P&W 6000 turbine

Unsteady Simulation of Aachen Turbine Rig (TFLO)

[Figures: entropy field; Aachen blade unsteady pressure envelopes, p/p_ref vs. x/C, for passage counts 1-1-1 and 6-7-6]

• Simulation Completed, AIAA Paper Presented
• 13.5 M Points
• 374 Blocks
• 187 Processors
• 2,800 Time-Steps (w/ 30 inner iterations per time-step) Required
• 1,985 Hours (clock time), 371,000 Hours (cpu time) Required (see the note below)
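As a quick consistency check (not stated on the slide), the quoted CPU time is just the wall-clock time multiplied by the processor count; the same relation holds for the later runs in this lecture:

```latex
\text{CPU hours} \approx \text{clock hours} \times N_{\text{proc}} = 1{,}985 \times 187 \approx 371{,}000
```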

[Figures: predicted vs. measured frequency spectrum, pressure amplitude (Pa) vs. frequency/BPF; amplitude of harmonics around the blade (L.E. to T.E., suction and pressure sides); secondary velocity field at plane 0, time index 1]

• Estimated T.E. vortex shedding frequency: 10 BPF

Unsteady Flow Simulation of P&W Turbine Rig (TFLO)

[Figures: pressure and entropy fields showing blade trailing-edge shocks, shock/blade interaction (reflected waves from the vane), viscous wake/blade interaction, and vane/blade potential interaction]

• One Global (1/6 Circumference) (33% of Total) Completed
• 31.2 M Points
• 652 Blocks
• 196 Processors
• 4,200 Time-Steps (w/ 30 inner iterations per time-step) Required
• 4,125 Hours (clock time), 808,500 Hours (cpu time) Required
• Pressure Loading Compares Well with Experiment and PW Prediction
• Predicted Aerodynamic Losses Compare Favorably with PW Prediction

Unsteady Flow Simulation of PW6000 Turbine (TFLO)

[Figure: entropy field; HPT (1, 2, 3) and LPT (5, 6, 7)]

• 63% of One Global Cycle (1/6 Circumference) (21% of Total) Completed
• 93.8 M Points
• 2192 Blocks
• 512 - 1024 Processors
• 5,700 Time-Steps (w/ 30 inner iterations per time-step) Required
• 5,970 Hours (clock time), 3,060,000 Hours (cpu time) Required

Main/Secondary Flow Path Integration

Direct Coupling (SPMD)

[Figures: temperature and streamlines (projected in constant …); temperature and 3D blade-relative streamlines; pressure and streamlines (projected in constant …)]

• Simulation Complete
• 9.4 M Points, 238 Blocks, 144 Processors
• 1-200,000 Time-Steps Required
• 3,700 Hours (clock time), 532,800 Hours (cpu time) Required

Key Technologies

• Hardware
  – High resolution displays
    • Powerwall
    • Super-high resolution displays (5000x3000 kind)
  – High speed network interconnects for commodity clusters
• Software
  – Support for tiled / high resolution displays with wireGL
  – Parallel software implementation for scalable rendering using pV3

Why pV3?

• pV3 is already set up for:

– Parallel feature extraction

– Concurrent visualization

– Distributed visualization

– Computational steering

• Work to be done

– Use of wireGL for tiled displays (completed)

– Parallelization of renderer (almost completed)

Current Large-Scale Visualization Setup

[Diagram: 16 compute CPUs run the pV3 clients (feature extraction); extracted data travels over a WAN to a 4-CPU pV3 server, which renders through wireGL to four graphics pipes (GR 1-4)]

• Bottlenecks: WAN (avoidable), single renderer (parallelization in progress), internal network (see the sketch below)
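A minimal mpi4py-style sketch of this client/server split follows; extract_features() and render_frame() are hypothetical stand-ins, not the actual pV3 or wireGL APIs.

```python
# Minimal sketch of the client/server split above, assuming an MPI layout (mpi4py).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
SERVER = 0                        # rank 0 plays the pV3 server / wireGL host

def extract_features(block, isovalue=0.5):
    """Client-side feature extraction on the locally owned solution block
    (placeholder: keep only the points whose scalar exceeds the isovalue)."""
    return block[block[:, 3] > isovalue, :3]

def render_frame(geometry):
    """Server-side rendering; a real setup would hand the gathered geometry
    to the graphics pipes through wireGL for the tiled display."""
    print(f"rendering {len(geometry)} extracted points on the tiled display")

if rank != SERVER:
    # pV3 clients: each solver process extracts features from its own block,
    # so only reduced geometry (not the full dataset) crosses the WAN.
    local_block = np.random.default_rng(rank).random((10_000, 4))
    geometry = extract_features(local_block)
else:
    geometry = np.empty((0, 3))

# All extracted geometry is gathered at the server; with a single renderer
# this gather plus the WAN link is the bottleneck noted above.
pieces = comm.gather(geometry, root=SERVER)
if rank == SERVER:
    render_frame(np.vstack(pieces))
```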

Future Large-Scale Visualization Setup

[Diagram: the same layout of 16 pV3 clients (feature extraction), WAN link, 4-CPU pV3 server, wireGL, and four graphics pipes (GR 1-4)]

Advantages / Expected Outcome

• Rendering speed 12x on current display (best case scenario)
• High resolution images for flow details
• Large degree of interactivity for turbomachinery flow visualizations
• Parallel I/O will be necessary for unsteady flow visualizations
