21 September 2010
Gene Wagenbreth for Robert F. Lucas
{genew, rflucas}@isi.edu
(310) 448-8213, -9449
Approved for public release; distribution is unlimited.
Equation-Based Models in Discrete-Element and Agent-Based Simulations
Overview
• Introduction
• Goal: equation-based modules in agent-based sims
• JFCOM experience in large-scale ABM simulations
• New power from GPUs
• Implementing equation-based modules in ABM using GPU accelerators
• Summary
Basic Concept
Newly emerging heterogeneous computing is proving useful with General Purpose Graphics Processing Units (GPGPUs).
Experience is showing these are effective in addressing many issues that have been problematic in DoD simulations for some time.
GPUs can accelerate the speed of equation solutions. This speed-up should allow the inclusion of equation-based modules:
• CFD for chemical agent dispersion
• MCAE for physical changes due to collisions/detonations
• Illumination simulations for environmental changes
• …
GPUs can roughly double the throughput of many equation solutions when measured in a realistic "end-to-end" performance test.
This in turn promises to offer a very cost-effective speedup to many problems facing DoD simulators.
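One way to see why a large speed-up of the solver alone can shrink to roughly 2x "end-to-end" is Amdahl's law. The sketch below is illustrative only: the function name, the 60% solver fraction, and the 10x kernel speed-up are assumptions chosen for the example, not measurements from this work.

```python
def end_to_end_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: overall speed-up when only part of the run time is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / kernel_speedup)


# Hypothetical numbers: 60% of the run time is in the equation solver,
# and the GPU runs that part 10x faster.
example = end_to_end_speedup(0.6, 10.0)
print(f"end-to-end speed-up: {example:.2f}x")  # prints: end-to-end speed-up: 2.17x
```

Under these assumed numbers, the unaccelerated 40% of the run dominates, which is why end-to-end tests give a more realistic picture than kernel-only timings.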
HPC Large-Scale JFCOM ABM Simulation Net
JFCOM running urban ABM simulations on trans-continental net for years
Used for projecting future for:
• Analysis
• Evaluation
• Training
Not enough horsepower to simulate an urban area AND do the real-time equation solutions suggested above.
GPUs should provide extra power needed.
Parallelism
• Vector processing – SSE instructions
• Multicore
  • OMP
  • Pthreads
• Multiple CPU
  • MPI
  • TCP
  • Batch jobs (!!)
• GPGPU – latest entry in the field
Projects at JFCOM
• Entity simulation extended from a network of workstations to multiple geographically distributed workstations
• HLA – RTI
• Distributed MySQL database
  • real-time queries
  • post-event processing
• Human in the loop
• 24 hr operation
Map View from JESSP
Stealth View: JESSP
Joshua GPGPU-Enhanced Cluster
20 of 28 racks
256 Nodes
256 GPGPUs
1024 Cores
Suffolk Va.
Even HPC has Limits to Performance
Figure: growth over time in the number and complexity of JSAF entities (vertical axis: Number and Complexity of JSAF Entities).
Exercises shown: SPP Proof of Principle (DARPA / Caltech), SAF Express (1997), UE 98-1 (1997), J9901 (1999), AO-00 (2000), JSAF/SPP Tests (2004), JSAF/SPP Urban Resolve (2004), JSAF/SPP Capability (2006), JSAF/SPP Joshua (2008).
Entity counts labelled on the chart: 3,600; 12,000; 107,000; 50,000; 1,400,000; 1,000,000; 250,000; 10,000,000.
Platforms noted: DC Clusters at MHPCC & ASC MSRC; DHPI GPU-Enhanced Cluster.
Experiments continue to require orders of magnitude larger & more complex battlespaces
SCALE and FIDELITY
MCAE using LS-DYNA
End-to-End Results in LS-DYNA
EXPOSE Project to Examine CPUs
• CAT/Image processing to detect altered chips
• GPGPU
  • CUDA
  • CUDA FFT library
  • CUDA within OMP loops
  • Multicore/multiple GPU
• Multicore – OMP
• Multiple CPUs via batch jobs (!!!)
Use whatever works and makes sense
Parallel Software Technologies
• Compilers – Much touted, still waiting
• Code Analysis – Important step
• Translators – Quick results, low performance
• Libraries – Increasingly available, useful
• Directives – More smoke than fire?
• Multiple languages – To each their own
Several Levels of Parallel Processing
• Parallel among the Linux cluster nodes and parallel within the GPU
• Experience with all the segments of HPC/Parallel Processing necessary
• Must analyse problem for best design
• Implement effective GPU acceleration of equation solutions
• Optimize integration into HLA/RTI structure of ABM Simulations such as JSAF or OneSAF
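The two levels of parallelism named above (across cluster nodes, then within each node's GPU) can be pictured as a nested partition of the entity list. This is a minimal sketch under stated assumptions: the function names and the idea of splitting by contiguous entity IDs are hypothetical, and a real system would map the outer level to MPI ranks or HLA federates and the inner level to GPU thread blocks.

```python
def split_evenly(items, parts):
    """Split a list into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def two_level_partition(entity_ids, num_nodes, blocks_per_gpu):
    """Level 1: entities to cluster nodes. Level 2: each node's share to GPU work blocks."""
    return [split_evenly(node_share, blocks_per_gpu)
            for node_share in split_evenly(entity_ids, num_nodes)]


# Hypothetical example: 10 entities, 2 nodes, 2 work blocks per GPU.
plan = two_level_partition(list(range(10)), num_nodes=2, blocks_per_gpu=2)
print(plan)  # prints: [[[0, 1, 2], [3, 4]], [[5, 6, 7], [8, 9]]]
```

Contiguous splitting is only one possible choice; a spatial decomposition may balance load better when entities cluster in one part of the terrain.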
Candidates for Accelerated Solutions in SAFs
Line of Sight – Currently using ray tracing or similar algorithms; this is the pacing factor in entity count (the next few slides address this issue)
Route Finding – Another factor sometimes stressing the cluster, e.g. when a sudden event causes panic
Illumination – Currently an “under-served” need in SAF simulations, with simple illumination programs being used to conserve CPU cycles
Weather – Manually input and almost entirely static for the duration of the event; needs GPU power
Plume Dispersal – Currently rule-based, but would like to make it realistic and add atmospheric chemistry
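To make the line-of-sight candidate concrete, here is a minimal sketch of a visibility test over a grid of terrain heights. It is not the ray-tracing code used in JSAF or any SAF; the grid format, the `eye_height` default, and the sampling scheme are all assumptions made for illustration.

```python
def line_of_sight(heights, start, end, eye_height=2.0, samples_per_cell=2):
    """Return True if `end` is visible from `start` over a 2D grid of terrain heights.

    heights: list of rows, heights[r][c] in metres (hypothetical terrain format).
    start, end: (row, col) cells; observer and target sit eye_height above terrain.
    """
    (r0, c0), (r1, c1) = start, end
    z0 = heights[r0][c0] + eye_height
    z1 = heights[r1][c1] + eye_height
    steps = max(abs(r1 - r0), abs(c1 - c0)) * samples_per_cell
    for i in range(1, steps):
        t = i / steps
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        if (r, c) in (start, end):
            continue  # the endpoints' own cells never block the view
        ray_z = z0 + t * (z1 - z0)  # height of the sight line at this sample
        if heights[r][c] > ray_z:
            return False
    return True


# Hypothetical terrain: a 10 m wall between two entities on flat ground.
print(line_of_sight([[0, 0, 10, 0, 0]], (0, 0), (0, 4)))  # prints: False
```

Each observer-target pair is tested independently of every other pair, which is the property that makes this kind of check a plausible fit for one GPU thread per pair.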
Why GPUs?
• GPU can’t improve serial code
  • Unless there are enough threads to parallelize
  • I/O overhead getting to and from GPU costly
• Millions of route planning solutions needed
  • Rush hour
  • After explosion
  • Any disruption
• Causes spikes in computation
• Humans-In-The-Loop preclude pausing simulation while computing new routes for all impacted entities in urban setting
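The point about transfer overhead can be expressed as a simple break-even estimate: offloading pays off only when the transfer time plus the accelerated compute time is less than the CPU time. The sketch below uses a deliberately simplified cost model and hypothetical timings; the function names and numbers are assumptions, not measurements.

```python
def gpu_time(cpu_time, kernel_speedup, transfer_time):
    """Estimated wall time when offloading: host<->GPU transfer plus accelerated compute."""
    return transfer_time + cpu_time / kernel_speedup


def worth_offloading(cpu_time, kernel_speedup, transfer_time):
    """True when the offloaded estimate beats running on the CPU alone."""
    return gpu_time(cpu_time, kernel_speedup, transfer_time) < cpu_time


# Hypothetical timings in milliseconds.
print(worth_offloading(cpu_time=1.0, kernel_speedup=20.0, transfer_time=2.0))    # prints: False
print(worth_offloading(cpu_time=500.0, kernel_speedup=20.0, transfer_time=2.0))  # prints: True
```

Under this model a single small route query is not worth shipping to the GPU, while a large batch (such as the spike after a disruption) amortizes the transfer cost.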
Route Planning Performance Impact
Time Spent in Route Planning is Critical Bottleneck
Same With Line of Sight
Time Spent in Line of Sight not Great, but Critical
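As an illustration of how a route-planning spike might be batched, the sketch below computes one breadth-first distance field from a shared destination and lets every entity read its route from it. This is a hypothetical approach, not the JSAF route planner: it assumes a grid map with 0 for open and 1 for blocked cells, unit step costs, and entities heading for the same goal.

```python
from collections import deque


def distance_field(grid, goal):
    """Breadth-first search outward from `goal` over a grid of 0 (open) and 1 (blocked).

    Returns a dict mapping each reachable (row, col) to its step count from the goal.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def route(dist, start):
    """Follow decreasing distances from `start` to the goal; None if unreachable."""
    if start not in dist:
        return None
    path = [start]
    while dist[path[-1]] > 0:
        r, c = path[-1]
        neighbours = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
        path.append(min((n for n in neighbours if n in dist), key=dist.get))
    return path


# Hypothetical map: a wall across the middle row with a gap on the right.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
field = distance_field(grid, goal=(2, 0))
print(route(field, (0, 0)))
```

The search runs once per destination rather than once per entity, and each entity's path extraction is independent of the others, so that second step could in principle be spread across many threads.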
Conclusions
New accelerator technologies support the analysis of complex equation-based simulations of both man-made and natural phenomena.
This new power can be focused to prevent unacceptable delays in analysis.
This should enable interactive participation by Humans-In-The-Loop, providing a capability that is required for a higher-level, system-wide understanding during the test.
Heterogeneous programming, e.g. using Graphics Processing Units as accelerators, has shown practical promise.
The area of equation-based simulation modules resident in ABM simulations seems particularly amenable to this approach.
This material is based on research sponsored by the U.S. Joint Forces Command via a contract with the Lockheed Martin Corporation and SimIS, Inc., and on research sponsored by the Air Force Research Laboratory under agreement numbers F30602-02-C-0213 and FA8750-05-2-0204. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government. Approved for public release; distribution is unlimited.
Research Funded by JFCOM and AFRL