Parallel System for Interactive Multi-Experiment Computational Studies (pSIMECS)

Upload: clement-may

Post on 04-Jan-2016

Page 1: Parallel System for Interactive Multi-Experiment Computational Studies (pSIMECS)

Parallel System for Interactive Multi-Experiment Computational Studies (pSIMECS)

Simecs – Problem Description

● Multi-Experiment Computational Studies:
– Computational studies involving multiple experiments, each corresponding to an individual execution of a simulation program

● Example: Design Space Exploration
– Goal: Given a set of possible parameter values (a parameter space) and an experiment that maps a parameter value to a performance metric, find a subset of the parameter space whose performance metrics fit certain criteria.

Simecs – Problem Description

● Model Application: Pareto Frontier Discovery.

● The Pareto frontier is the set of points in the parameter space that are not completely dominated by any other point in the parameter space.
– p “completely dominates” q iff every component of p's performance metric is better than the corresponding component of q's.
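The dominance test can be written down in a few lines. The sketch below is illustrative only: it assumes that "better" means "smaller" for every component of the performance metric (which suits metrics such as deflection and cost), and the names `completely_dominates` and `pareto_frontier` are hypothetical, not part of Simecs. It filters performance metrics directly; in the study each metric would be attached to a parameter point.

```python
from typing import List, Sequence, Tuple

Metric = Tuple[float, ...]


def completely_dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if every component of p is strictly better than q's.

    Assumption: smaller is better for every component.
    """
    return all(a < b for a, b in zip(p, q))


def pareto_frontier(metrics: Sequence[Metric]) -> List[Metric]:
    """Keep the metrics that no other metric completely dominates (O(n^2))."""
    return [
        q for q in metrics
        if not any(completely_dominates(p, q) for p in metrics)
    ]


# Made-up metrics: (3, 4) is dominated by (2, 3), and (5, 6) by (1, 5).
points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 6.0)]
frontier = pareto_frontier(points)
print(frontier)
```

A point never dominates itself under this definition, because the comparison is strict in every component.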

Simecs – Pareto Frontier Insights

● Simulations are independent – embarrassingly parallel

● An experiment corresponds to an execution of a simulation software, which can itself be parallel or sequential

● Results from one simulation can be used to speed up simulations at nearby parameter values (e.g., as the initial guess for Newton iteration).
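The benefit of reusing a neighbouring result as the initial guess can be seen on a toy problem. The sketch below is not the bridge simulation: it assumes a scalar "simulation" that solves x³ = a with Newton iteration, and the function names are hypothetical. It only illustrates that a guess taken from a nearby parameter value needs fewer iterations than a generic one.

```python
from typing import Callable, Tuple


def newton(f: Callable[[float], float], df: Callable[[float], float],
           x0: float, tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, int]:
    """Scalar Newton iteration; returns (root estimate, iterations used)."""
    x = x0
    for i in range(1, max_iter + 1):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x, i
    raise RuntimeError("Newton iteration did not converge")


def solve(a: float, guess: float) -> Tuple[float, int]:
    """Toy 'simulation' (an assumption): solve x**3 = a for parameter a."""
    return newton(lambda x: x**3 - a, lambda x: 3 * x**2, guess)


root_26, _ = solve(26.0, guess=1.0)                # result at a nearby parameter value
_, cold_iters = solve(27.0, guess=1.0)             # generic starting guess
root_27, warm_iters = solve(27.0, guess=root_26)   # start from the neighbour's result

print(f"cold start: {cold_iters} iterations, warm start: {warm_iters} iterations")
```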

Simecs – Pareto Frontier Insights

● Decisions can be made with imprecise results: precision can be traded off against resources

● If the parameter space is large, sweeps are inefficient.

● Need to prune portions of the space as the study progresses, either automatically or interactively.

● An active sampler can automatically pick "interesting" simulations (e.g., close to the boundary)

Simecs – Example Problem

● Bridge design computational study: a 1D bridge in 2D space, with its end points clamped. Two elastic supports are added in the middle of the bridge.

● Parameter space: distance of the two supports from the end of the bridge.

● Performance measures: maximum deflection of the bridge, and the cost of supports

● Bridge is clamped at all support points, with bending and stretching forces, and uniform load.

Simecs – Example Problem

Test problem.

Parameter: $\langle r_0, r_1 \rangle$

Performance metric: $\langle \max_{0<r<L} f(r),\; c(r_0) + c(r_1) \rangle$

Cost function: $c(r)$

Simecs – Goal

● Simecs: Software on parallel systems that manages simulation processes in a Multi-Experiment Computational Study.

● Frees users and application developers from micromanaging every simulation process

● Goal: Interactive, Steerable Design Space Exploration

Simecs – User View

● Two types of parameters
– technique parameters (e.g., discretisation of nodes, convergence tolerance)
– model parameters (e.g., Young's modulus of a material, viscosity of a fluid).

● Goal: As the Pareto frontier obtained from one set of parameters is forming, the user can switch to another setup and continue the study.
– e.g., limit the exploration space but increase the resolution.

Simecs – Developer View

● Application developer provides 3 modules:
– Simulation: Maps a parameter space point to a performance space point
– Visualisation & interaction: Displays the relevant information to the user; collects information from the user and maps it into the Simulation module
– Transformation: Transforms a simulation state computed under one technique-parameter setting into a state for another.

● e.g., interpolate checkpoints from different resolutions
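One way to picture the three developer-supplied modules is as abstract interfaces. The slides do not give the actual Simecs API, so every class, method, and parameter name below is a hypothetical stand-in. The toy `LinearResample` class shows what a Transformation could look like when the technique parameter is the number of nodes in a 1D discretisation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]    # a point in parameter space
Metric = Tuple[float, ...]   # a point in performance space


class Simulation(ABC):
    @abstractmethod
    def run(self, point: Point, technique: Dict[str, Any]) -> Metric:
        """Map a parameter-space point to a performance-space point."""


class Visualisation(ABC):
    @abstractmethod
    def display(self, results: Dict[Point, Metric]) -> None:
        """Show the relevant information to the user."""

    @abstractmethod
    def collect(self) -> Dict[str, Any]:
        """Return user input as settings for the Simulation module."""


class Transformation(ABC):
    @abstractmethod
    def transform(self, state: Sequence[float], old: Dict[str, Any],
                  new: Dict[str, Any]) -> List[float]:
        """Convert a simulation state between two technique-parameter settings."""


class LinearResample(Transformation):
    """Toy example: linearly interpolate a 1D state onto a new node count.

    Assumes both node counts are at least 2.
    """

    def transform(self, state, old, new):
        n_old, n_new = old["nodes"], new["nodes"]
        out = []
        for i in range(n_new):
            t = i * (n_old - 1) / (n_new - 1)   # position in old index units
            lo = min(int(t), n_old - 2)         # left neighbour in the old state
            frac = t - lo
            out.append((1 - frac) * state[lo] + frac * state[lo + 1])
        return out


coarse = [0.0, 1.0, 4.0]
fine = LinearResample().transform(coarse, {"nodes": 3}, {"nodes": 5})
print(fine)
```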

Simecs – System View

● Shared object layer, Active sampler, Resource Allocator

Simecs – System View

● Shared object space layer: System-wide repository of shared objects (e.g., checkpoints, error estimations, results)

● Sampler: Based on users' specifications, issues sample points where simulations will be run

● Resource Allocator / Manager: Maps simulations onto computing elements and decides whether to use a checkpoint.

Simecs – SISOL

● Spatially-Indexed Shared Object Layer (SISOL)

● Used for storing system-wide shared objects.

● For the model problem: checkpoints and results (performance metric at each parameter point).

● <Index, object set id> names a unique object in the system.

Simecs – SISOL

● Objects are typed: SISOL requires pack() and unpack() implementations for each type. For parallel object types, it also requires a function that maps parallel objects onto different decompositions.

● Supports split-phase create, delete, read and write: to enforce read-modify-write consistency

● Supports neighborhood query

Simecs – SISOL Implementation

● Ideal implementation: directory-based cache, where each node participates in storing objects.

● Current implementation:
– Single TCP server
– In core
– Hash-map-based lookup
– Linear lookup for nearest neighbor
– Supports only sequential objects

Simecs – SISOL Implementation

– Object sets created on server
– Nearest neighbor query retrieves coordinates only
– Supports the sequential PETSc vector object type by default.

● Sufficient for small sets, small objects
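The storage scheme described above (hash-map lookup by index, linear scan for the nearest neighbor, objects named by index plus object set id) can be sketched in memory. This is not the SISOL code: the class and method names are hypothetical, `pickle` merely stands in for the per-type pack() and unpack() functions, and the TCP server and split-phase operations are left out.

```python
import math
import pickle
from typing import Any, Dict, Optional, Tuple

Index = Tuple[float, ...]


class ObjectSet:
    """One spatially indexed object set, held in memory (in core)."""

    def __init__(self) -> None:
        self._objects: Dict[Index, bytes] = {}   # hash-map lookup by index

    def write(self, index: Index, obj: Any) -> None:
        self._objects[index] = pickle.dumps(obj)     # stand-in for pack()

    def read(self, index: Index) -> Any:
        return pickle.loads(self._objects[index])    # stand-in for unpack()

    def nearest(self, query: Index) -> Optional[Index]:
        """Linear scan; returns only the coordinates of the closest index."""
        if not self._objects:
            return None
        return min(self._objects, key=lambda idx: math.dist(idx, query))


class Store:
    """Names each object by the pair <index, object set id>."""

    def __init__(self) -> None:
        self._sets: Dict[str, ObjectSet] = {}

    def object_set(self, set_id: str) -> ObjectSet:
        return self._sets.setdefault(set_id, ObjectSet())


store = Store()
checkpoints = store.object_set("checkpoints")
checkpoints.write((0.2, 0.8), [0.0, 0.1, 0.0])   # made-up checkpoint vectors
checkpoints.write((0.6, 0.4), [0.0, 0.3, 0.0])

near = checkpoints.nearest((0.5, 0.5))   # coordinates only
vector = checkpoints.read(near)          # a second call fetches the object
print(near, vector)
```

Returning only coordinates from the neighbor query keeps the query cheap; the caller decides whether fetching the object itself is worthwhile.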

Simecs – SISOL Use

● Current Pareto Frontier problem uses two object sets:
– Result set (parameter point => performance metric)
– Checkpoint set (parameter point => sequential PETSc vectors)

● In the test problem, the parameter point is a 2D vector, so the result set and the checkpoint set have 2D indices.

Simecs – FUEL

● Frame/Update Exchange Layer: Control layer between the manager and simulation processes

● Code that represents one functional aspect of a steerable application is grouped together into a unit called a Satellite.

● Event-based on manager process; Poll-based on simulation processes

● Dynamic model: Satellites can be activated and decommissioned as a simulation is running

Simecs – FUEL Interaction

● While the simulator runs the simulation for one parameter point, the manager is processing the result(s) of the previous one(s).

[Timeline figure: simulator process and manager process against time]

Simulator process: Calculate point X, then point Y, then point Z

Manager process: Query sampler, get point Y; register X result, query sampler, get point Z; register Y result, query sampler, get point A

Messages exchanged: X result / Y, Y result / Z, Z result / A

Simecs – Active Sampler

● Resolves the Pareto frontier progressively
– Maintains a task queue and a result set
– Task queue = points in parameter space of interest; result set = points discovered so far that are undominated (i.e., current Pareto set candidates)
– Seeds the task queue with points from a lattice on the parameter space.
– Runs the task queue.

Simecs – Active Sampler

– For each result that comes back, decide if the point is undominated by all points in the result set. If so, remove all points in the result set that are dominated by it, add it to the result set, and insert its lattice neighbors into the task queue.

– Continue until the task queue is empty.
– Refine the lattice, then repeat

● Effect: the result set contains Pareto point candidates that originated from a lattice. The lattice becomes finer as more time is spent.
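The sampler loop described on these slides can be sketched as follows. Several details are assumptions, not taken from the slides: the parameter space is the unit square, refinement halves the lattice spacing, the queue is reseeded after each refinement with the finer-lattice neighbors of the current candidates, smaller metric values are better, and `toy` is a made-up stand-in for a real simulation. Exact `Fraction` coordinates are used so lattice points can be compared and hashed safely.

```python
from collections import deque
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

Point = Tuple[Fraction, Fraction]
Metric = Tuple[float, ...]


def dominates(p: Metric, q: Metric) -> bool:
    """Complete domination, assuming smaller is better in every component."""
    return all(a < b for a, b in zip(p, q))


def neighbors(pt: Point, h: Fraction) -> List[Point]:
    """Lattice neighbors of pt at spacing h that lie inside the unit square."""
    x, y = pt
    cand = [(x + dx * h, y + dy * h)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return [(a, b) for a, b in cand if 0 <= a <= 1 and 0 <= b <= 1]


def active_sample(simulate: Callable[[Point], Metric], cells: int = 4,
                  levels: int = 3) -> Tuple[Dict[Point, Metric], int]:
    """Return (result set, number of simulations run)."""
    h = Fraction(1, cells)
    done: Dict[Point, Metric] = {}     # every point simulated so far
    result: Dict[Point, Metric] = {}   # current undominated candidates
    # Seed the task queue with the whole coarse lattice.
    queue = deque((i * h, j * h)
                  for i in range(cells + 1) for j in range(cells + 1))
    for level in range(levels):
        if level > 0:
            h /= 2                         # refine the lattice
            for pt in list(result):        # assumed reseeding rule
                queue.extend(neighbors(pt, h))
        while queue:                       # run the task queue
            pt = queue.popleft()
            if pt in done:
                continue
            m = done[pt] = simulate(pt)
            if any(dominates(r, m) for r in result.values()):
                continue                   # dominated: discard
            for q in [q for q, r in result.items() if dominates(m, r)]:
                del result[q]              # drop candidates it dominates
            result[pt] = m
            queue.extend(neighbors(pt, h))
    return result, len(done)


def toy(pt: Point) -> Metric:
    """Made-up metric: squared distances to two opposite corners."""
    x, y = float(pt[0]), float(pt[1])
    return (x * x + y * y, (x - 1) ** 2 + (y - 1) ** 2)


result, n_runs = active_sample(toy)
print(f"{len(result)} candidates after {n_runs} simulations")
```

With these defaults a full sweep of the finest lattice would need 17 × 17 = 289 simulations; the sampler skips lattice points that are never adjacent to a candidate.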

Simecs – Active Sampler

[Figures, in order: initial grid; 1st-level results; 1st-level Pareto frontier; first refinement; 2nd-level results; 2nd-level Pareto frontier; second refinement; 3rd-level results; 3rd-level Pareto frontier]

Simecs – Manager

● Spawns off simulation processes

● When the result of a simulation comes back (via a FUEL callback):
– Registers the result
– Asks the active sampler for the next point to run
– Looks up SISOL for a checkpoint to jump-start the next point
– Sends the parameters of the next simulation, the coordinates of the checkpoint, and error tolerances to the simulation process.
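The callback steps above might look roughly like this. Everything here is a hypothetical stand-in: a plain list plays the role of the active sampler, a list of coordinates plays the role of the SISOL checkpoint set, and the returned dictionary stands for the message sent over FUEL. The sketch also assumes that a finished simulation leaves a checkpoint at its own parameter point, which the slides do not state.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Metric = Tuple[float, ...]


class Manager:
    def __init__(self, pending: List[Point], tolerance: float) -> None:
        self.pending = pending                  # stand-in for the active sampler
        self.results: Dict[Point, Metric] = {}
        self.checkpoints: List[Point] = []      # checkpoint coordinates in the store
        self.tolerance = tolerance

    def on_result(self, point: Point, metric: Metric) -> Optional[dict]:
        """Callback for a finished simulation; returns the next work message."""
        self.results[point] = metric            # register the result
        self.checkpoints.append(point)          # assumed: checkpoint saved here
        if not self.pending:
            return None                         # nothing left to run
        nxt = self.pending.pop(0)               # ask the sampler for the next point
        # Look up the nearest stored checkpoint to jump-start the next point.
        ckpt = min(self.checkpoints, key=lambda c: math.dist(c, nxt))
        return {"point": nxt, "checkpoint": ckpt, "tolerance": self.tolerance}


manager = Manager(pending=[(0.3, 0.3), (0.9, 0.9)], tolerance=1e-6)
first = manager.on_result((0.25, 0.25), (1.0, 2.0))
second = manager.on_result((0.3, 0.3), (0.9, 2.1))
third = manager.on_result((0.9, 0.9), (0.5, 3.0))
print(first, second, third, sep="\n")
```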

Simecs – Test System

● Single-server implementation of SISOL to store the checkpoint set

● 3 versions of samplers: Active, Random, and Sweep

● TCP-based FUEL

● Simulation implemented with the PETSc SNES solver.

● Jump-start from checkpoints = use the checkpoint's configuration as the starting guess

Simecs – Test System

● Heterogeneous cluster:
– 1 × 1.5 GHz Athlon node (manager, SISOL server)
– 22 × 1.2 GHz Duron nodes (simulation processes)
– 10 × 3 GHz Pentium 4 nodes (simulation processes)
– 100 Mbps switched Ethernet network between Athlon and Duron nodes, 10 Mbps Ethernet between Pentium 4 nodes.

Simecs – Test Result (Sampler)

● Active Sampler compared against: 1) Grid-based sampler, which performs a parameter sweep on the grid with increasing refinement, 2) Random sampler

● All samplers are run for 1500 simulations, and the partial frontiers are dumped at periodic intervals. The Hausdorff distance to the ground truth is measured, where the ground truth is the final Active Sampler frontier after 1500 simulations.
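For reference, the Hausdorff distance between two finite point sets can be computed by brute force. The sketch assumes the symmetric form with Euclidean distance; the slides do not say which variant was used or in which space the frontiers were compared, and the sample points are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


truth = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]   # stand-in for the final frontier
partial = [(0.0, 0.0), (1.0, 1.0)]             # stand-in for a partial frontier
d = hausdorff(partial, truth)
print(d)
```

Here the partial frontier misses the middle point, so the distance equals the gap from (0.5, 0.5) to its nearest neighbor in the partial set.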

Simecs – Test Result (Sampler)

[Figures: sampler test results (17 slides)]

Simecs – Test Result (Checkpoints)

● Using checkpoints cuts down the number of iterations per simulation.

Simecs – Test Result (Scaling)

Duron nodes added (Slower speed, faster communication)

Simecs – Conclusions

● Multiple experiments can be managed automatically

● Interactive speed can be achieved via re-use of checkpoints, active sampling, and partial results: run time goes from 3088 seconds down to 17 seconds, and lower if partial frontiers can be used
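As a quick check on the quoted run times, the ratio between them is

$$\frac{3088\ \text{s}}{17\ \text{s}} \approx 181.6,$$

that is, a reduction in run time of roughly 180 times.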

Simecs – Conclusions

● The TCP-based communication framework makes the system portable: it can be used on heterogeneous clusters

● Spatially-indexed object sets are a useful communication substrate

Simecs – Future work

● Distributed implementation of SISOL

● Parallelise individual simulations (SISOL support for parallel objects)

● MPI-based communication for SISOL and FUEL

● Interactivity