Post on 02-Feb-2016
What Mum Never Told Me about Parallel Simulation
Karim Djemame
Informatics Research Lab. & School of Computing
University of Leeds
Plan of the Lecture
Goals
• Learn about issues in the design and execution of Parallel Discrete Event Simulation (PDES)
Overview
• Discrete Event Simulation – a Review
• Parallel Simulation – a Definition
• Applications
• Synchronisation Algorithms
  • Conservative
  • Optimistic
  • Synchronous
• Parallel Simulation Languages
• Performance Issues
• Conclusion
Why Simulation?
Mathematical models too abstract for complex systems
Building real systems with multiple configurations too expensive
Simulation is a good compromise!
Discrete Event Simulation (DES)
• A DES system can be viewed as a collection of simulated objects and a sequence of event computations
• Changes in the state of the model occur at discrete points in time
• The passage of time is modelled using a simulation clock
• Event scheduling is the most widely used approach
  • provides locality in time: each event describes related actions that may all occur in a single instant
• The model maintains a list of events (Event List) that
  • have been scheduled
  • have not occurred yet
Processing the Event List on a Uni-processor Computer
• An event contains two fields of information:
  - the event it represents (e.g. arrival in a queue)
  - its time of occurrence: the time when the event should happen, also called its timestamp
[Figure: event list (EVL) holding events e1, e2, …, en with timestamps 7, 9, …, 20]
• The event list
  - contains the events
  - is always ordered by increasing occurrence time
• The events are processed sequentially by a single processor
Event-Driven Simulation Engine
(1) Remove the first event (lowest time of occurrence) from the EVL
(2) Execute the corresponding event routine; modify the state (S) accordingly
(3) Based on the new S, schedule new future events
[Figure: EVL before (e1@7, e2@9, …, en@20) and after processing e1, which schedules a new event e3@14: (e2@9, e3@14, …, en@20)]
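The three-step loop above can be sketched as a minimal sequential engine. All names here are illustrative, not from any particular simulator; a heap keeps the EVL ordered by timestamp.

```python
import heapq

def run_simulation(initial_events, handlers, end_time):
    """Minimal sequential DES engine (a sketch with illustrative names).

    `initial_events` is a list of (timestamp, event_name) pairs;
    `handlers` maps an event name to a routine that, given the current
    state and time, returns newly scheduled (timestamp, event) pairs.
    """
    evl = list(initial_events)      # the Event List (EVL)
    heapq.heapify(evl)              # ordered by increasing timestamp
    state = {"clock": 0, "log": []}
    while evl:
        t, ev = heapq.heappop(evl)  # (1) remove event with lowest time
        if t > end_time:
            break
        state["clock"] = t          # advance the simulation clock
        state["log"].append((t, ev))
        for new in handlers[ev](state, t):   # (2) execute event routine
            heapq.heappush(evl, new)         # (3) schedule future events
    return state
```

For example, a single "arrival" handler that re-schedules itself every 5 time units produces events at times 0, 5, 10, 15 and 20.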
Why change? It's so simple!
Models become larger and larger; the simulation time becomes overwhelming, or the simulation is simply intractable. Examples:
• parallel programs with millions of lines of code
• mobile networks with millions of mobile hosts
• networks with hundreds of complex switches and routers
• multicast models with thousands of sources
• the ever-growing Internet, and much more...
Some Figures to Convince...
ATM network models: simulation at the cell level, 200 switches, 1000 traffic sources at 50 Mbit/s, 155 Mbit/s links, 1 simulation event per cell arrival.
Simulation time increases as link speed increases, and there is usually more than 1 event per cell arrival. How scalable is traditional simulation?
More than 26 billion events to simulate 1 second! 30 hours if 1 event is processed in 1 µs.
Motivation for Parallel Simulation
Sequential simulation is very slow. Sequential simulation does not exploit the parallelism inherent in models.
So why not use multiple processors?
• Variety of parallel simulation protocols
• Availability of parallel simulation tools to achieve a certain speedup over the sequential simulator
Processing the Event List on a Multi-Processor Computer
• The events are processed by many processors. Example:
• Processor 1 generates event 3 at time 9, to be processed by processor 2
[Figure: timeline for processors p1 and p2: p1 processes event 1 at time 7 and generates event 3 at time 9, while p2 processes event 2 at time 14 in parallel]
• Processor 2 has already processed event 2 at time 14
• Problem:
  - the future can affect the past!
  - this is the causality problem
Causal Dependencies
[Figure: the six events e1@7, e2@9, e3@14, e4@20, e5@27, e6@40 shown once as a single EVL in timestamp order, and once split into two chains ordered by causal dependencies]
• Scheduled events in timestamp order
• Sequence ordered by causal dependencies
• Causal dependencies mean restrictions
• The sequence of events (e1, e2, e4, e6) can be executed in parallel with (e3, e5)
• If any event were simulated together with e1: violation of causal dependencies
Parallel Simulation - Principles
Execution of a discrete event simulation on a parallel or distributed system with several physical processors.
The simulation model is decomposed into several sub-models (Logical Processes, LPs) that can be executed in parallel (spatial partitioning). LPs communicate by sending timestamped messages.
Fundamental concepts:
• each LP can be at a different simulation time
• local causality constraint: events in each LP must be executed in timestamp order
Parallel Simulation – example 1
[Figure: a model decomposed into logical processes (LPs); a packet hop becomes a timestamped event exchanged between LPs executed in parallel]
Parallel Simulation – example 2
Logical processes (LPs) modelling airports, air traffic sectors, aircraft, etc.
LPs interact by exchanging messages (events modelling aircraft departures, landings, etc.)
[Figure: air-traffic model decomposed into interacting LPs]
Synchronisation Mechanisms
Synchronisation Algorithms
• Conservative: avoids local causality violations by waiting until it is safe to process a message or event
• Optimistic: allows local causality violations, but provisions are made to recover from them at runtime
• Synchronous: all LPs process messages/events with the same timestamp in parallel
PDES Applications
VLSI circuit simulation, parallel computing, communication networks, combat scenarios, health care systems, road traffic
Simulation of models: queueing networks, Petri nets, finite state machines
Conservative Protocols
Architecture of a conservative LP
The Chandy-Misra-Bryant protocol
The lookahead ability
Architecture of a Conservative LP
• LPs communicate by sending messages with non-decreasing timestamps
• each LP keeps a static FIFO channel for each LP with incoming communication
• each FIFO channel (input channel, IC) has a clock ci that ticks according to the timestamp of its topmost message, if any; otherwise it keeps the timestamp of the last message
[Figure: LP A receives timestamped messages from LPs B, C and D over three input channels; each channel clock ci holds the timestamp of that channel's topmost message]
A Simple Conservative Algorithm
each LP has to process events in timestamp order to avoid local causality violations
The Chandy-Misra-Bryant algorithm
while (simulation is not over) {
    determine the ICi with the smallest ci
    if (ICi is empty)
        wait for a message
    else {
        remove topmost event from ICi
        process event
    }
}
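A sketch of one such LP (class and method names are illustrative, not from any PDES library): each input channel is a FIFO with a clock, and a step either processes the event from the channel with the smallest clock or blocks when that channel is empty.

```python
from collections import deque

class ConservativeLP:
    """Sketch of one Chandy-Misra-Bryant LP (illustrative structure).

    Each input channel is a FIFO of (timestamp, event) messages; a
    channel's clock is the timestamp of its topmost message, or the
    timestamp of the last message removed if it is currently empty.
    """
    def __init__(self, channel_names):
        self.ic = {name: deque() for name in channel_names}
        self.clock = {name: 0 for name in channel_names}  # channel clocks
        self.processed = []

    def receive(self, channel, timestamp, event):
        self.ic[channel].append((timestamp, event))
        self.clock[channel] = self.ic[channel][0][0]  # topmost timestamp

    def step(self):
        """Process one event if safe; return "blocked" otherwise."""
        # the channel with the smallest clock bounds all future input
        ch = min(self.clock, key=self.clock.get)
        if not self.ic[ch]:
            return "blocked"            # must wait for a message on ch
        ts, ev = self.ic[ch].popleft()
        self.clock[ch] = self.ic[ch][0][0] if self.ic[ch] else ts
        self.processed.append((ts, ev))
        return "processed"
```

With messages (3, 7) queued from B and (6) from C, the LP safely processes timestamps 3 and 6, then blocks: C's channel is empty with the smallest clock, so a message with a timestamp below 7 could still arrive on it.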
Safe but Has to Block
[Figure: LP A with input channels IC1, IC2 and IC3 from LPs B, C and D; at each step A removes the event from the channel with the smallest clock, and must BLOCK when that channel is empty]
Blocks and Even Deadlocks!
[Figure: source S feeds LPs A and B, which merge at point M; if S sends all its messages to B, A's channel towards M stays empty and M stays BLOCKED]
How to Solve Deadlock: Null-Messages
Use of null-messages for artificial propagation of simulation time
[Figure: the same S/A/B/M topology; S sends null-messages along the empty path through A, advancing the channel clocks until M is UNBLOCKED]
What frequency?
How to Solve Deadlock: Null-Messages
A null-message indicates a Lower Bound Time Stamp (LBTS)
Example: LPs A, B and C form a ring; the minimum delay on each link is 4; LP C is initially at simulation time 0
[Figure: pending events with timestamps 11 (LP A), 9 and 10 (LP B), 7 (LP C)]
• LP C sends a null-message with timestamp 4
• LP A sends a null-message with timestamp 8
• LP B sends a null-message with timestamp 12
• LP C can now process its event with timestamp 7
The Lookahead Ability
Null-messages are sent by an LP to indicate a lower bound time stamp on the future messages it will send
Null-messages rely on the "lookahead" ability:
• communication link delays
• server processing time (FIFO)
Lookahead is very application-model dependent and needs to be explicitly identified
Conservative: Pros & Cons
Pros
• simple, easy to implement
• good performance when lookahead is large (communication networks, FIFO queues)
Cons
• pessimistic in many cases
• large lookahead is essential for performance
• no transparent exploitation of parallelism
• performance may drop even with small changes in the model (adding preemption, adding one small-lookahead link…)
Optimistic Protocols
Architecture of an optimistic LP
Time Warp
Architecture of an Optimistic LP
LPs send timestamped messages, not necessarily in non-decreasing time stamp order
no static communication channels between LPs, dynamic creation of LPs is easy
each LP processes events as they are received, no need to wait for safe events
local causality violations are detected and corrected at runtime
Most well known optimistic mechanism: Time Warp
[Figure: LP A receives timestamped messages from LPs B, C and D; messages may arrive out of timestamp order]
Processing Events as They Arrive
[Figure: LP A processes incoming events in arrival order: timestamps 11 (from LP B), 13 (LP D), 18 (LP B), 22 (LP C), 25 (LP D), 28 (LP C) and 36 (LP B) are all processed; then a message with timestamp 32 arrives from LP D]
What to do with late messages?
Time Warp: Do, Undo, Redo
TimeWarp Rollback - How?
Late messages (stragglers) are handled with a rollback mechanism
• undo false/incorrect local computations
  • state saving: save the state variables of an LP
  • reverse computation
• undo false/incorrect remote computations
  • anti-messages: anti-messages and (real) messages annihilate each other
• process late messages
• re-process previous messages: processed events are NOT discarded!
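A minimal sketch of the state-saving variant, assuming a toy state with a single counter (anti-messages and remote undo are not modelled; all names are illustrative):

```python
import copy

class OptimisticLP:
    """Sketch of Time Warp state saving and rollback (illustrative).

    The LP greedily processes messages as they arrive, saving a copy of
    its state before each event; a straggler triggers a rollback to the
    last state older than the straggler, after which the straggler and
    the undone events are re-processed in timestamp order.
    """
    def __init__(self):
        self.state = {"count": 0}
        self.lvt = 0                    # local virtual time
        self.history = []               # [(timestamp, saved_state, event)]

    def _execute(self, ts, event):
        self.history.append((ts, copy.deepcopy(self.state), event))
        self.state["count"] += 1        # illustrative state change
        self.lvt = ts

    def receive(self, ts, event):
        if ts >= self.lvt:
            self._execute(ts, event)    # normal, in-order message
            return 0
        # straggler: roll back to just before ts, then do and redo
        redo = [(t, e) for (t, s, e) in self.history if t > ts]
        while self.history and self.history[-1][0] > ts:
            t, saved, e = self.history.pop()
            self.state = saved          # restore the pre-event state
        self._execute(ts, event)        # do the late message
        for t, e in redo:               # redo the undone events
            self._execute(t, e)
        return len(redo)                # number of rolled-back events
```

After processing timestamps 11, 13 and 18, a straggler at 12 rolls back two events, processes the straggler, then re-processes 13 and 18; no processed event is discarded.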
Need for a Global Virtual Time
Motivations:
• an indicator that the simulation time advances
• reclaim memory (fossil collection)
Basically, GVT is the minimum of:
• all LPs' logical simulation times
• the timestamps of messages in transit
GVT guarantees that events below GVT are definitive:
• no rollback can occur before the GVT
• state points before GVT can be reclaimed
• anti-messages before GVT can be reclaimed
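The definition translates directly into a sketch (illustrative names): GVT is a minimum over LP clocks and in-transit message timestamps, and fossil collection reclaims history entries strictly below it.

```python
def compute_gvt(lp_times, in_transit_timestamps):
    """Sketch: GVT is the minimum over all LPs' local virtual times and
    the timestamps of messages still in transit (illustrative names)."""
    return min(list(lp_times.values()) + list(in_transit_timestamps))

def fossil_collect(history, gvt):
    """Reclaim saved states / anti-messages older than GVT: no rollback
    can ever reach them, so they are definitive and can be freed."""
    return [entry for entry in history if entry[0] >= gvt]
```

Note that a message in transit can carry a timestamp below every LP's clock, which is why the minimum must include in-transit messages.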
Time Warp - Overheads
Periodic state savings
• states may be large, very large! copies are very costly
Periodic GVT computations
• costly in a distributed architecture, may block computations
Rollback thrashing
• cascaded rollbacks, no advancement!
Memory!
• memory is THE limitation
Optimistic Mechanisms: Pros & Cons
Pros
• exploits all the parallelism in the model; lookahead is less important
• transparent to the end-user
• can be general-purpose
Cons
• very complex, needs lots of memory
• large overheads (state saving, GVT, rollbacks…)
Mixed/Adaptive Approaches
General framework that (automatically) switches to conservative or optimistic
Adaptive approaches may determine at runtime the amount of conservatism or optimism
[Figure: spectrum of protocols from conservative to optimistic, with mixed approaches in between; performance depends on the message pattern]
Synchronous Protocols
Architecture of a synchronous LP
Synchronous Protocols
"ALL for ONE and ONE for ALL!"
The Three Musketeers, Alexandre Dumas (1802 – 1870)
A Simple Synchronous Algorithm
Avoids local causality violations
• each LP uses the same data structures as a single sequential simulator
• a global clock is shared among all LPs (same value everywhere)
• some data structures are private
[Figure: four LPs report their minimum timestamps (5, 12, 10, 8); the global clock is set to the minimum: 5]
A Simple Synchronous Algorithm
clock = 0;
while (simulation is not over) {
    t = minimum_timestamp();      /* local minimum */
    clock = global_minimum(t);    /* reduction across all LPs */
    simulate_events(clock);
    synchronise();                /* barrier */
}
Basic operations:
1. Computation of the minimum timestamp (a reduction operation)
2. Event consumption
3. Message distribution
4. Message reception (a barrier operation)
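One round of the algorithm can be sketched as follows (the data layout is illustrative): compute the global minimum of the LPs' local minima (the reduction), then let every LP consume exactly its events at that timestamp; in this sequential sketch the barrier is implicit.

```python
def synchronous_round(lps):
    """Sketch of one synchronous step (illustrative structure).

    Each LP is a dict holding a sorted list of pending (timestamp,
    event) pairs; the global clock becomes the minimum of all local
    minima, and every LP then processes its events at that timestamp.
    """
    local_minima = [lp["pending"][0][0] for lp in lps if lp["pending"]]
    if not local_minima:
        return None                       # simulation is over
    clock = min(local_minima)             # reduction: global minimum
    for lp in lps:                        # in parallel on a real machine
        while lp["pending"] and lp["pending"][0][0] == clock:
            lp["done"].append(lp["pending"].pop(0))
    return clock                          # barrier would follow here
```

With pending timestamps {5, 9}, {12} and {5}, the first round sets the clock to 5 and two LPs each consume one event; the next round advances the clock to 9.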
Synchronous Mechanisms: Pros & Cons
Pros
• simple, easy to implement
• good performance if parallelism is exploited with a moderate synchronisation cost
Cons
• pessimistic in many cases
• worst case: the simulator behaves like the sequential one
• performance may drop if the cost of LP synchronisation (reduction, barrier) is high
PDES Simulation Languages
• a number of PDES languages have been developed in recent years
• PARSEC• Compose• ModSim• etc
• Most of these languages are general purpose languages
PARSEC
• Developed at UCLA Parallel Computing Lab.
• Availability: http://pcl.cs.ucla.edu/projects/parsec/
• Simplicity
• Efficient event scheduling mechanism
PDES Languages
Georgia Tech Time Warp (GTW)
• Optimistic discrete event simulator developed by the PADS group at the Georgia Institute of Technology
• http://www.cc.gatech.edu/computing/pads/tech-parallel-gtw.html
• Supports small-granularity simulation
• Runs on shared-memory multiprocessor machines (Sun Enterprise, SGI Origin)
TeD: Telecommunications Description Language
• A language developed mainly for modelling telecommunication network elements and protocols
Jane
• A simulator-independent client/server-based graphical interface and scripting tool for interactive parallel simulations
• TeD/GTW simulations can be executed using the Jane system
BYOwS !
• BYOwS: Build Your Own Simulator
• Choose a programming language
  • C, C++, Java
• Learn basic MPI
  • MPI: Message Passing Interface
  • Point-to-Point Communication
  • Available on the school Linux machines
• Implement a simple PDES protocol
  • Case study: a simple queueing network
Parallel Simulation Today
Lots of algorithms have been proposed: variations on conservative and optimistic, adaptive approaches
Few end-users: must compete with sequential simulators in terms of user interface, generality, ease of use, etc.
Research mainly focuses on:
• applications, ultra-large-scale simulations
• tools and execution environments (clusters)
• federated simulations
  • different simulators interoperate with each other in executing a single simulation
  • battlefield simulation, distributed multi-user games
Parallel Simulation - Conclusion
Pros
• reduction of the simulation time
• increase of the model size
Cons
• causality constraints are difficult to maintain
• need for special mechanisms to synchronise the different processors
• increases both the model and the simulation kernel complexity
Challenges
• ease of use, transparency
References
Parallel simulation
• R. Fujimoto, Parallel and Distributed Simulation Systems, John Wiley & Sons, 2000
• R. Fujimoto, "Parallel Discrete Event Simulation", Communications of the ACM, Vol. 33(10), Oct. 1990, pp. 31-53
Parallel simulation links
• http://www.cs.utsa.edu/research/ParSim/