Using Application-Domain Knowledge in the Runtime
Support of Multi-Experiment Computational Studies
Siu Yau
Dissertation Defense, Dec 08
Multi-Experiment Study (MES)
• Simulation software rarely runs in isolation
• Multi-Experiment Computational Study
  – Multiple executions of a simulation experiment
  – Goal: Identify interesting regions in the input space of the simulation code
• Examples in engineering, science, medicine, finance
• Interested in the aggregate result
  – Not individual experiments
MES Challenges
• Systematically cover input space
  – Refinement + high dimensionality
  ⇒ Large number of experiments (100s or 1000s) and/or user interaction
• Accurate individual experiments
  – Spatial + temporal refinement
  ⇒ Long-running individual experiments (days or weeks per experiment)
• Subjective goal
  – Requires study-level user guidance
MES on Parallel Architectures
• Parallel architectures map well to MES
• Dedicated, local access to small- to medium-sized parallel computers
  – Interactive MES: user-directed coverage of the exploration space
• Massively parallel systems
  – Multiple concurrent parallel experiments exploit the power of massively parallel systems
• Traditional systems lack a high-level view
Thesis Statement
To meet the interactive and computational requirements of Multi-Experiment Studies, a parallel run-time system must view an entire study as a single entity, and use application-level knowledge that is made available from the study context to inform its scheduling and resource allocation decisions.
Outline
• MES Formulation, motivating examples
  – Defibrillator Design, Helium Model Validation
• Related Work
• Research Methodology
• Research Test bed: SimX
• Optimization techniques
  – Sampling, Result reuse, Resource allocation
• Contributions
MES Formulation
• Simulation Code: maps input to result
  – Design Space: space of possible inputs to the simulation code
• Evaluation Code: maps result to performance metric
  – Performance Space: space of outputs of the evaluation code
• Goal: Find Region of Interest in Design & Performance Space
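In code, this formulation amounts to two composable functions swept over the design space. A minimal Python sketch (the function names and the stand-in physics are illustrative, not SimX API):

```python
# Hypothetical sketch of the MES formulation: a study composes a
# simulation code and an evaluation code over a design space.

def simulation_code(design_point):
    # maps an input (design point) to a simulation result
    x, y = design_point
    return x * x + y * y          # stand-in physics

def evaluation_code(result):
    # maps a simulation result to a performance metric
    return abs(result - 1.0)

def run_study(design_space):
    # the study's aggregate result: the metric at every design point
    return {p: evaluation_code(simulation_code(p)) for p in design_space}

# Region of interest: design points whose metric falls below a threshold
design_space = [(i / 4, j / 4) for i in range(5) for j in range(5)]
metrics = run_study(design_space)
roi = [p for p, m in metrics.items() if m < 0.25]
```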
Example: Defibrillator Design
• Help design implantable defibrillators
• Simulation Code:
  – Electrode placements + shock voltage → torso potential
• Evaluation Code:
  – Torso potential + activation/damage thresholds → % activated & damaged heart tissue
• Goal: Placement + voltage combination to maximize activation, minimize damage
Example: Gas Model Validation
• Validate gas-mixing model
• Simulation Code:
  – Prandtl number + gas inlet velocity → helium plume motion
• Evaluation Code:
  – Helium plume motion → velocity-profile deviation from real-life data
• Goal: Find Prandtl number + inlet velocity to minimize deviation
Example: Pareto Optimization
• Set of inputs that cannot be improved in all objectives
[Figure: Pareto frontier in performance space; axes: Activation vs. Damage]
Example: Pareto Optimization
• Set of inputs that cannot be improved in all objectives
• Interactive Exploration of Pareto Frontier
  – Change setup (voltage, back electrode, etc.) → new study
  – Interactive exploration of “study space”
• One user action → one aggregate result
• Need study-level view, interactive rate
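The Pareto-frontier notion used in these slides can be sketched directly. Here activation is maximized and damage minimized; the dominance test and sample points are illustrative, not data from the study:

```python
# Hypothetical sketch: computing a Pareto frontier over (activation, damage)
# pairs. A point is Pareto-optimal if no other point is at least as good in
# every objective and strictly better in one (maximize activation,
# minimize damage).

def dominates(a, b):
    # a dominates b: a is no worse in both objectives and better in one
    act_a, dmg_a = a
    act_b, dmg_b = b
    no_worse = act_a >= act_b and dmg_a <= dmg_b
    better = act_a > act_b or dmg_a < dmg_b
    return no_worse and better

def pareto_frontier(points):
    # keep every point not dominated by any other point
    return [p for p in points if not any(dominates(q, p) for q in points)]

points = [(0.9, 0.4), (0.8, 0.2), (0.7, 0.1), (0.6, 0.3), (0.9, 0.5)]
frontier = pareto_frontier(points)
```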
Challenge: Defibrillator Design
Challenge: Model Validation
• Multiple executions of long-running code
  – 6x6 grid = 36 experiments
  – ~3000 timesteps per experiment @ 8 seconds per timestep → 6.5 hours per experiment
  ⇒ 10 days per study
• Schedule and allocate resources as a single entity: how to distribute parallel resources?
Related Work: Grid Schedulers
• Grid Schedulers
  – Condor, Globus
  – Each experiment treated as a “black box”
• Application-aware grid infrastructures
  – Nimrod/O and Virtual Instrument
  – Take advantage of application knowledge, but in an ad-hoc fashion
  – No consistent set of APIs reusable across different MESs
Related Work: Parallel Steering
• Grid-based Steering
  – RealityGrid, WEDS
  – Steer execution of inter-dependent tasks
  – Different focus: Grid vs. cluster
• Parallel Steering Systems
  – Falcon, CUMULVS, CSE
  – Steer single executions (not collections) on parallel machines
Methodology
• Four example MESs, varying properties
  Study                 Bridge Design        Defibrillator Design  Animation Design   Gas Model Validation
  User interaction      No                   Yes                   Yes                No
  No. of experiments    100K                 65K                   ~100K              36
  Time per experiment   7 secs               2 secs                < 1 sec            6.5 hours
  Parallel code?        No                   No                    No                 Yes
  Study goal            Pareto Optimization  Pareto Optimization   Aesthetic Measure  Pareto Optimization
Methodology (cont’d)
• Identify application-aware system policies
  – Scheduling, Resource allocation, User interface, Storage support
• Construct research test bed (SimX)
  – API to import application knowledge
  – Implemented on parallel clusters
• Conduct example MESs
  – Implement techniques, measure effect of application-aware system policies
Test bed: SimX
• Parallel System for Interactive Multi-Experiment Studies (SIMECS)
• Supports MESs on parallel clusters
• Functionality-based components
  – UI, Sampler, Task Queue, Resource Allocator, Simulation Container, SISOL
• Each component has a specific API
• Adapt the APIs to the needs of the MES
Test bed: SimX
[Architecture diagram: a Front-end Manager Process (User Interface: Visualisation & Interaction; Sampler; Resource Allocator; Task Queue) dispatches experiments to a Worker Process Pool, where each Simulation Container runs the simulation code and evaluation code behind the FUEL interface; both sides exchange data through the SISOL API with a SISOL Server Pool (Directory Server and Data Servers)]
Optimization techniques
• Reduce number of experiments needed
  – Automatic sampling
  – Study-level user steering
  – Study-level result reuse
• Reduce run time of individual experiments
  – Reuse results from another experiment: checkpoints, internal states
• Improve resource utilization rate
  – Minimize parallelization overhead & maximize reuse potential
  – Preemption: claim idle resources
Active Sampling
• If the MES is an optimization study (i.e., the region of interest optimizes a function)
  – Incorporate the search algorithm in the scheduler
• Pareto optimizations: Active Sampling
  – Cover design space from coarse to fine grid
  – Use aggregate results from the coarse level to identify promising regions
• Reduces the number of experiments needed
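The coarse-to-fine loop can be sketched as follows. This is a hypothetical illustration of the idea, not the SimX sampler: the objective, the grid, and the "most promising quarter" rule are all stand-ins:

```python
# Hypothetical sketch of active sampling: evaluate a coarse grid first,
# then refine only around promising points at each halved resolution.

def metric(point):
    x, y = point
    return (x - 0.3) ** 2 + (y - 0.7) ** 2   # stand-in objective

def refine(point, step):
    # children of a grid point at the next (halved) resolution
    x, y = point
    return [(x + dx, y + dy) for dx in (0, step) for dy in (0, step)]

def active_sample(levels=3):
    step = 0.25
    frontier = [(x * step, y * step) for x in range(5) for y in range(5)]
    evaluated = {}
    for _ in range(levels):
        for p in frontier:
            evaluated.setdefault(p, metric(p))
        # keep only the most promising quarter of points, refine around them
        ranked = sorted(frontier, key=evaluated.get)
        step /= 2
        frontier = [c for p in ranked[: max(1, len(ranked) // 4)]
                    for c in refine(p, step)]
    return evaluated

results = active_sample()
# far fewer experiments than exhaustively covering the finest grid
```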
Active Sampler (cont’d)
[Figure: coarse-to-fine refinement: initial grid → 1st-level results → first refinement → 2nd-level results → 2nd refinement → 3rd-level results]
Support for Sampling
[Architecture diagram as in the SimX slide, with pluggable samplers in the front-end manager process: Naïve (Sweep) Sampler, Random Sampler, Custom Sampler, and Active (Pareto) Sampler, all behind the same Sampler API]
SimX Sampler API:
void setStudy(StudySpec)
void registerResult(experiment, performance)
experiment getNextPointToRun()
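A minimal sketch of a sampler behind this API, written as a Python stand-in for the signatures above (the sweep logic and class name are illustrative, not SimX code):

```python
# Illustrative stand-in for the SimX Sampler API: a naive sweep sampler
# that hands out every grid point once and records results.

class SweepSampler:
    def set_study(self, study_spec):
        # study_spec: iterable of design points to cover
        self.pending = list(study_spec)
        self.results = {}

    def get_next_point_to_run(self):
        # returns the next experiment, or None when the sweep is done
        return self.pending.pop(0) if self.pending else None

    def register_result(self, experiment, performance):
        self.results[experiment] = performance

sampler = SweepSampler()
sampler.set_study([(0, 0), (0, 1), (1, 0), (1, 1)])
while (point := sampler.get_next_point_to_run()) is not None:
    sampler.register_result(point, sum(point))   # stand-in experiment
```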
Evaluation: Active Sampling
• Helium validation study
  – Resolve Pareto frontier on 6x6 grid
  – Reduces no. of experiments from 36 to 24
• Defibrillator study
  – Resolve Pareto frontier on 256x256 grid
  – Reduces no. of experiments from 65K to 7.3K
  – Non-perfect scaling due to dependencies
  – At 128 workers: active sampling 349 secs; grid sampling 900 secs
Result reuse
• MES: many similar runs of simulation code
• Share information between experiments
  – Speeds up experiments that reuse information
  – Only need to calculate deltas
• Many types, depending on the information reused
  – Varying degrees of generality
• Reduce individual experiment run time
  – Except study-level reuse
Result reuse types

  Type                       Result reused              Applicability
  Checkpoint reuse           Simulation code output     Time-stepping code, iterative solver
  Preconditioner reuse       Preconditioner             Iterative linear solver
  Intermediate result reuse  Internal state             Simulation code with shared internal states
  Simulation result reuse    Simulation code output     Interactive MESs
  Performance metric reuse   Evaluation code output     Interactive MESs
  Study-level reuse          Aggregate result of study  Interactive MESs
Intermediate Result Reuse
• Defibrillator simulation code solves 3 linear systems and linearly combines the solutions
• The same systems are needed by different experiments
• Cache the solutions
  – e.g., experiments solve Aa·x = ba, Ab·x = bb, Ac·x = bc, Ad·x = bd
  – Store Ab⁻¹bb and Ac⁻¹bc for reuse
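The caching idea reduces to memoizing expensive solves by a key that identifies the system. A hypothetical sketch (the keys and the stand-in "solvers" are illustrative; real solutions would be vectors stored in SISOL):

```python
# Hypothetical sketch of intermediate-result reuse: memoize the solution of
# each linear system by a key identifying (matrix, right-hand side), so an
# experiment that needs an already-solved system reuses the cached solution.

solve_count = 0
cache = {}

def solve(system_key, solver):
    # solver() performs the expensive solve; it only runs on a cache miss
    global solve_count
    if system_key not in cache:
        solve_count += 1
        cache[system_key] = solver()
    return cache[system_key]

# two experiments share the system "Ab x = bb"; it is solved only once
exp1 = solve("Ab:bb", lambda: 4) + solve("Ac:bc", lambda: 1)
exp2 = solve("Ab:bb", lambda: 4) + solve("Ad:bd", lambda: 2)
```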
Support for Result Reuse
[Architecture diagram as before: worker processes publish cached solutions (e.g., Aa⁻¹ba, Ab⁻¹bb) through the SISOL API to the SISOL Server Pool, where other experiments can read them]
SISOL API:
object StartRead(objSet, coord)
void EndRead(object)
object StartWrite(objSet, coord)
void EndWrite(objSet, object)
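The read/write pattern of this API can be illustrated with a Python stand-in; the in-memory dict replaces the SISOL data servers, and the buffer shape and blocking behavior are assumptions, not the real SISOL semantics:

```python
# Illustrative Python stand-in for the SISOL Start/End read-write pattern:
# a coordinate-indexed object store shared between experiments.

class SisolStore:
    def __init__(self):
        self.sets = {}

    def start_write(self, obj_set, coord):
        # returns a fresh buffer the caller fills in
        return {"set": obj_set, "coord": coord, "data": None}

    def end_write(self, obj_set, obj):
        # publishes the object so other experiments can read it
        self.sets.setdefault(obj_set, {})[obj["coord"]] = obj["data"]

    def start_read(self, obj_set, coord):
        # non-blocking stand-in: return the published data, if any
        return self.sets.get(obj_set, {}).get(coord)

    def end_read(self, obj):
        pass   # a release/unpin would happen here in the real system

store = SisolStore()
buf = store.start_write("checkpoints", (2, 3))
buf["data"] = "timestep-1641 state"
store.end_write("checkpoints", buf)
ckpt = store.start_read("checkpoints", (2, 3))
store.end_read(ckpt)
```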
Checkpoint Result Reuse
• Helium code terminates when kinetic energy (KE) stabilizes
• Start from another experiment’s checkpoint – stabilizes faster
• Must have the same inlet velocity
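Why a warm start converges in fewer timesteps can be sketched with a toy relaxation model; the dynamics, tolerance, and starting values are purely illustrative, not the helium code:

```python
# Hypothetical sketch of checkpoint reuse: a time-stepping run that starts
# from a compatible checkpoint (same inlet velocity) reaches the stable
# state in fewer timesteps than a run from scratch.

def run_until_stable(state, tol=1e-3):
    # step the "kinetic energy" toward its steady value; count timesteps
    steps = 0
    while abs(state - 1.0) > tol:
        state += 0.01 * (1.0 - state)   # stand-in relaxation dynamics
        steps += 1
    return state, steps

_, cold_steps = run_until_stable(0.0)   # from scratch
_, warm_steps = run_until_stable(0.9)   # from a reused checkpoint
```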
Study-level Result Reuse
• Interactive study: two similar studies
• Use Pareto frontier from first study as a guide for next study
Evaluation: Result Reuse
• Checkpoint reuse in Helium Model Study
  – No reuse: 3000 timesteps; with reuse: 1641
  – 18 experiments out of 24 able to reuse
  – 28% improvement overall
• Defibrillator study
  – No reuse: 7.3K experiments @ 2 secs each = 349 secs total on 128 procs
  – With reuse: 6.5K experiments @ 1.5 secs = 123 secs total on 128 procs
  – 35% improvement overall
Resource Allocation
• MES made up of parallel simulation codes
• How to divide the cluster among experiments?
  – Parallelization overhead ⇒ fewer processes per experiment
  – Active sampling + reuse ⇒ some experiments more important; more processes for those experiments
• Adapt allocation policy to the MES
  – Use application knowledge to decide which experiments are prioritized
Resource Allocation
• Batching strategy: select a subset (batch), assign it high priority, run concurrently
  – Considerations for batching policies
    • Scaling behavior: maximize batch size
    • Sampling policy: prioritize “useful” samples
    • Reuse potential: prioritize experiments with reuse
• Preemption strategy
  – Claim unused processing elements and assign them to experiments in progress
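The reuse-aware batching rule (no two concurrent from-scratch experiments in the same reuse class) can be sketched as a simple greedy grouping. A hypothetical illustration; the experiments and the reuse-class key are stand-ins for the helium study's inlet-velocity classes:

```python
# Hypothetical sketch of reuse-class batching: group experiments so that no
# two members of a batch share a reuse class (the first of each class runs
# from scratch; later members of the class can reuse its checkpoint).

def build_batches(experiments, reuse_class):
    # experiments: list of design points; reuse_class(p): hashable class key
    remaining = list(experiments)
    batches = []
    while remaining:
        batch, seen, rest = [], set(), []
        for p in remaining:
            c = reuse_class(p)
            if c in seen:
                rest.append(p)      # class already in this batch: defer
            else:
                seen.add(c)
                batch.append(p)
        batches.append(batch)
        remaining = rest
    return batches

# e.g., checkpoints are reusable only within the same inlet velocity
experiments = [(pr, v) for pr in (0.5, 0.7, 0.9) for v in (1, 2)]
batches = build_batches(experiments, reuse_class=lambda p: p[1])
```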
Resource Allocation: Batching
• Batch for Active Sampling
• Identify independent experiments in the sampler
• Maximize parallelism while allowing active sampling
[Figure: batches over the (Prandtl number, inlet velocity) design space: first batch → 1st Pareto-optimal set; second batch → 1st & 2nd; 3rd batch → 1st to 3rd; 4th batch → full Pareto frontier]
Resource Allocation: Batching
• Active sampling batching
[Figure: schedule of workers over time, colored by batch, 1st through 4th batches]
Resource Allocation: Batching
• Batch for reuse class
• Sub-divide each batch into 2 smaller batches:
  – 1st sub-batch: first experiment in each reuse class; no two belong to the same reuse class
  – No two concurrent from-scratch experiments can reuse each other’s checkpoints (maximizes reuse potential)
  – Experiments in the same batch have comparable run times (reduces holes)
[Figure: reuse-class batches over the (Prandtl number, inlet velocity) design space]
Resource Allocation: Batching
• Batching for reuse classes
[Figure: schedule of workers over time, colored by batch, 1st through 6th batches]
Resource Allocation: Preemption
• With preemption
[Figure: the same schedule with preemption: idle workers are claimed by in-progress experiments, 1st through 6th batches]
Support for Resource Allocation
[Architecture diagram as before: the Sampler, Resource Allocator, and Task Queue in the front-end manager process direct the worker process pool through the Task Queue API]
Task Queue API:
TaskQueue::AddTask(Experiment)
TaskQueue::CreateBatch(set<Experiment>&)
TaskQueue::GetIdealGroupSize()
TaskQueue::AssignNextTask(GroupID)
Reconfigure(const int* assignment)
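The batch-priority behavior behind this API can be illustrated with a Python stand-in for the C++-style calls above (the two-queue policy is an assumption made for the sketch, not the SimX implementation):

```python
# Illustrative Python stand-in for the Task Queue API: a current batch gets
# priority over singly-queued tasks, and worker groups pull the next task.

from collections import deque

class TaskQueue:
    def __init__(self):
        self.batch = deque()      # current high-priority batch
        self.tasks = deque()      # everything else

    def add_task(self, experiment):
        self.tasks.append(experiment)

    def create_batch(self, experiments):
        # promote a set of experiments to run concurrently, ahead of the rest
        self.batch.extend(experiments)

    def assign_next_task(self, group_id):
        # group_id identifies the requesting worker group (unused in sketch)
        if self.batch:
            return self.batch.popleft()
        return self.tasks.popleft() if self.tasks else None

q = TaskQueue()
q.add_task("exp-low-priority")
q.create_batch(["exp-batch-1", "exp-batch-2"])
order = [q.assign_next_task(0) for _ in range(3)]
```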
Evaluation: Resource Allocation

  Knowledge used           Total time    Utilization rate  Avg. time per run  Improvement
  None (run on 1 worker)   12 hr 35 min  56.3%             6 hr 17 min        N/A
  None (run 1 experiment)  20 hr 35 min  100%              34.3 min           N/A
  + Active Sampling        6 hr 10 min   71.1%             63.4 min           51% / 70%
  + Reuse classes          5 hr 10 min   71.3%             39.7 min           59% / 75%
  + Preemption             4 hr 30 min   91.8%             34.5 min           64% / 78%
Contributions
• Demonstrate the need to consider the entire end-to-end system
• Identify system policies that can benefit from application-level knowledge
  – Scheduling (Sampling): for optimization MESs
  – User steering: for MESs with subjective goals and MESs with high design-space dimensionality
  – Result reuse: for MESs made up of similar executions of simulation code
  – Resource allocation: for MESs made up of parallel simulation codes
Contributions
• Demonstrate with a prototype system
  – API to import relevant application knowledge
• Quantify the benefits of application-aware techniques
  – Sampling: orders-of-magnitude improvement in bridge design and defibrillator studies; 33% improvement in helium model validation study
  – User steering: enables interactivity in animation design study and defibrillator study
  – Result reuse: multi-fold improvement in bridge design, defibrillator, and helium model validation studies
  – Application-aware resource allocation: multi-fold improvement in helium model validation study