
1

Parallel Simulation Made Easy With OMNeT++

Y. Ahmet Şekerciuğlu (1), András Varga (2), Gregory K. Egan (1)

(1) CTIE, Monash University, Melbourne, Australia
(2) Omnest Global, Inc.

2

What is OMNeT++?

An open-source, generic simulation framework -- best suited for simulation of communication networks, multiprocessors and other complex distributed systems (further examples: queuing systems, hardware architectures, server farms, business processes, call centers)

C++-based simulation kernel plus a set of libraries and tools (GUI and command-line); platform: Unix, Windows

Active user community (mailing list has about 240 subscribers)
Home page: www.omnetpp.org
Commercial version also exists: www.omnest.com

3

Model Structure

Component-oriented approach:
  The basic building block is a simple module (programmed in C++).
  Simple modules can be grouped to form compound modules.
  Modules are connected with each other.

4

Defining the Topology

NED (Network Description Language) defines the topology: what modules exist, how they are connected and assembled to form larger modules

//
// Host with an Ethernet interface
//
module EtherStation
    parameters: ...
    gates: ...
    submodules:
        app: EtherTrafficGen;
        llc: EtherLLC;
        mac: EtherMAC;
    connections:
        app.out --> llc.hl_in;
        app.in <-- llc.hl_out;
        llc.ll_in <-- mac.hl_out;
        llc.ll_out --> mac.hl_in;
        mac.ll_in <-- in;
        mac.ll_out --> out;
endmodule

The graphical editor GNED operates directly on NED files

5

Defining the Behaviour

Behaviour is encapsulated in simple modules.

A simple module:
  sends messages, reacts to received messages
  collects statistics

Simple modules are programmed in C++:
  one can choose between process-oriented or event-oriented programming
  the simulation class library covers commonly needed functionality, such as: random number generation, statistics collection (histograms, etc.), queues and other containers, support for topology discovery and routing, etc.
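The event-oriented style mentioned above can be sketched without the simulation library. The following is a minimal, self-contained illustration, not OMNeT++ code: `Msg`, `RunningStats` and `SinkModule` are hypothetical names chosen for this sketch, and the module simply reacts to each received message by recording its end-to-end delay.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical message type: it only carries the time at which it was sent.
struct Msg {
    double sentAt;
};

// Running statistics over observed values (count, mean, maximum).
class RunningStats {
  public:
    void collect(double x) {
        ++count_;
        sum_ += x;
        max_ = (count_ == 1) ? x : std::max(max_, x);
    }
    std::size_t count() const { return count_; }
    double mean() const { return count_ ? sum_ / count_ : 0.0; }
    double max() const { return max_; }

  private:
    std::size_t count_ = 0;
    double sum_ = 0.0;
    double max_ = 0.0;
};

// Event-oriented module sketch: all behaviour lives in one callback that
// is invoked once per received message.
class SinkModule {
  public:
    void handleMessage(const Msg& msg, double now) {
        assert(now >= msg.sentAt);          // a message cannot arrive before it was sent
        delay_.collect(now - msg.sentAt);   // statistics collection
    }
    const RunningStats& delayStats() const { return delay_; }

  private:
    RunningStats delay_;
};
```

A driver would call `handleMessage()` once per delivered message and read `delayStats()` at the end of the run; in a process-oriented module the same logic would instead sit in a loop that blocks on message reception.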

6

Running the Model Under the GUI

Without extra programming, one can:

run or single-step the simulation

examine scheduled events

explore modules and see message flow

monitor state of simulation and execution speed

see what’s happening inside the model

examine model object tree

7

Exploring Model Internals

examine contents of queues, messages and other objects

look at state variables and statistics

trace what one module is doing

step to next event in a module

find out pointer values for C++ debugging (gdb)

8

Exploring Model Internals

look at results being recorded

find all messages, all statistics objects or all queues, or any objects by their names (NEW), and inspect them

and much more…

9

Modular Architecture

UI and simulations are separated, and interact via a well-defined API
  provides command-line and graphical user interface; user interfaces can be customized, or specialized ones can be created
  enables embedding of simulations into larger applications

[Figure: architecture blocks: user interface (ENVIR with main(), as CMDENV or TKENV or ...), simulation kernel (SIM), simulation model, model component library]

10

Large-Scale Network Simulations

PDES: Parallel Discrete Event Simulation

Motivation:
  speedup: make use of multiple CPUs to reduce execution time
  ability to run large models by distributing resource requirements

We want to use clusters that can provide supercomputing power at affordable costs -- inexpensive workstations connected via a high-speed network
  example: VPAC Linux cluster contains 96 IBM xSeries (dual 2.8GHz Xeon) PCs running Linux 2.4 yielding 629.7 Gflops; Myrinet interconnection provides 4μs end-to-end delay
  communication method: MPI
    MPI (Message Passing Interface) is a standard for high-performance computing
    several implementations exist: LAM/MPI, MPICH, plus vendor-specific implementations

11

Why do we need large-scale simulations?

Research on Internet protocols and technologies extensively relies on simulation

Systems are too large and too complex for analytic treatment

Small experimental networks do not reflect large-scale dynamics

Large-scale simulations (10,000-1,000,000 nodes) are needed to:
  … properly understand dynamics of routing protocols
  … test various extensions proposed to improve performance of current Internet protocols
  … demonstrate scalability of multicast protocols
  … plus more

12

Parallel DES

Partitioning to Logical Processes (LPs):

[Figure: model partitioned into LP1 (on CPU1), LP2 (on CPU2) and LP3 (on CPU3)]

Each partition maps to a separate LP with its own virtual time and list of scheduled events (Future Events Set)

LPs are executed on different processors

A synchronization mechanism (e.g. null messages; Chandy-Misra-Bryant 1979) is needed to prevent causality violations
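The interplay of a per-LP Future Events Set and conservative synchronization can be sketched as follows. This is a simplified illustration of the null message idea, not the OMNeT++ implementation: `Event`, `LogicalProcess` and all member names are assumptions made for this sketch. Each input link remembers the latest timestamp promised by its sender; the LP only processes local events that are strictly earlier than the smallest of those promises, and it advertises its own promise (lower bound on the next event plus lookahead) in null messages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

// Hypothetical event record: a timestamp plus an opaque payload id.
struct Event {
    double time;
    int payload;
};

// Orders the priority queue so that the earliest event is on top.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

// One logical process with its own virtual time and Future Events Set (FES).
class LogicalProcess {
  public:
    LogicalProcess(std::size_t numInputLinks, double lookahead)
        : channelClock_(numInputLinks, 0.0), lookahead_(lookahead) {}

    void schedule(const Event& e) { fes_.push(e); }

    // A real or null message stamped t arrived on an input link: the sender
    // promises not to send anything earlier than t on that link.
    void onMessage(std::size_t link, double t) {
        assert(link < channelClock_.size());
        channelClock_[link] = std::max(channelClock_[link], t);
    }

    // Earliest input time: no remote message can arrive before this.
    double safeTime() const {
        if (channelClock_.empty()) return std::numeric_limits<double>::infinity();
        return *std::min_element(channelClock_.begin(), channelClock_.end());
    }

    // Processes all local events strictly earlier than safeTime();
    // returns how many events were processed.
    int runSafeEvents() {
        int processed = 0;
        const double limit = safeTime();
        while (!fes_.empty() && fes_.top().time < limit) {
            now_ = fes_.top().time;   // advance virtual time
            fes_.pop();               // (a real model would execute the event here)
            ++processed;
        }
        return processed;
    }

    // Timestamp for an outgoing null message: nothing will be sent earlier,
    // because the next processed event cannot precede the earlier of
    // "first local event" and "earliest input time".
    double nullMessageTime() const {
        const double firstLocal =
            fes_.empty() ? std::numeric_limits<double>::infinity() : fes_.top().time;
        return std::min(firstLocal, safeTime()) + lookahead_;
    }

    double now() const { return now_; }

  private:
    std::priority_queue<Event, std::vector<Event>, LaterFirst> fes_;
    std::vector<double> channelClock_;
    double lookahead_;
    double now_ = 0.0;
};
```

With zero lookahead the promise never moves ahead of the receiver's own clock and the LPs can block each other indefinitely, which is why positive link delays matter for this class of algorithms.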

13

PDES Support in OMNeT++

To try running existing OMNeT++ models in parallel, you only need to:

1. enable parallel simulation: parallel-simulation=true

2. specify partitioning in configuration file

3. run

14

PDES Support in OMNeT++

Nearly every model can be run in parallel. Constraints:
  modules may communicate via sending messages only (no direct method call or member access) unless mapped to the same processor
  no global variables
  limitations on direct sending (no sending to a submodule of another module, unless mapped to the same processor)
  lookahead must be present in the form of link delays
  currently we only support static topologies (this can be improved)

Models run without modification (no special instrumentation needed)

Partitioning is part of configuration, no model change required
  follows “separation of model from experiments” principle

Code will be publicly released before end 2003 (available on request until then)

15

Extensible PDES Architecture

Pluggable communication library (“transport layer”); currently implemented:
  MPI (Message Passing Interface)
  named pipes
  shared directory (for demonstration and debug purposes only)

Pluggable PDES algorithm; currently implemented:
  Null Message Algorithm
  Ideal Simulation Protocol (for benchmarking)
  no synchronization (to demonstrate the need for synchronization)

16

Parallel Simulation Architecture

[Figure: layers within a partition (LP): simulation model; simulation kernel (event scheduling, sending, receiving); parallel simulation subsystem (synchronization, communication); communications library (MPI, sockets, etc.)]

17

Communication Layer

Must implement the following abstract interface:

/**
 * Provides an abstraction layer above MPI,
 * PVM, shared-memory communications, etc…
 */
class cParsimCommunications
{
  public:
    // lifecycle
    virtual void init() = 0;
    virtual void shutdown() = 0;

    // buffer management
    virtual cCommBuffer *createCommBuffer() = 0;
    virtual void recycleCommBuffer(cCommBuffer *buffer) = 0;

    // sending, receiving and barrier synchronization
    virtual void send(cCommBuffer *buffer, int tag, int destination) = 0;
    virtual void broadcast(cCommBuffer *buffer, int tag) = 0;
    virtual void receiveBlocking(cCommBuffer *buffer, int& rcvdTag, int& srcProcId) = 0;
    virtual bool receiveNonblocking(cCommBuffer *buffer, int& rcvdTag, int& srcProcId) = 0;
    virtual void synchronize() = 0;
};

class cMPICommunications : public cParsimCommunications { … };
class cNamedPipeCommunications : public cParsimCommunications { … };
class cFileCommunications : public cParsimCommunications { … };

Communication buffers encapsulate pack/unpack operations. The cCommBuffer interface (abstract class) has multiple implementations for MPI, etc.

Simulation objects are able to pack/unpack themselves to/from communication buffers, using methods from the cCommBuffer interface.
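The pack/unpack idea can be illustrated with a small in-memory buffer. This is only a sketch under stated assumptions: `ByteBuffer` and `Job` are hypothetical names, the buffer is not the real cCommBuffer, and the byte layout is the host's native one (so it would only be valid between processes of identical architecture).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory buffer; values are appended and read back in order.
class ByteBuffer {
  public:
    void pack(int v) { append(&v, sizeof v); }
    void pack(double v) { append(&v, sizeof v); }
    void pack(const std::string& s) {
        pack(static_cast<int>(s.size()));   // length prefix
        append(s.data(), s.size());
    }

    void unpack(int& v) { extract(&v, sizeof v); }
    void unpack(double& v) { extract(&v, sizeof v); }
    void unpack(std::string& s) {
        int n = 0;
        unpack(n);
        assert(n >= 0);
        const std::size_t len = static_cast<std::size_t>(n);
        assert(readPos_ + len <= data_.size());
        s.assign(data_.data() + readPos_, len);
        readPos_ += len;
    }

    bool fullyRead() const { return readPos_ == data_.size(); }

  private:
    void append(const void* p, std::size_t n) {
        const char* c = static_cast<const char*>(p);
        data_.insert(data_.end(), c, c + n);
    }
    void extract(void* p, std::size_t n) {
        assert(readPos_ + n <= data_.size());
        std::memcpy(p, data_.data() + readPos_, n);
        readPos_ += n;
    }

    std::vector<char> data_;
    std::size_t readPos_ = 0;
};

// Hypothetical simulation object that can pack/unpack itself.
struct Job {
    std::string name;
    int priority = 0;
    double timestamp = 0.0;

    void packInto(ByteBuffer& b) const {
        b.pack(name);
        b.pack(priority);
        b.pack(timestamp);
    }
    void unpackFrom(ByteBuffer& b) {   // must mirror the order used in packInto()
        b.unpack(name);
        b.unpack(priority);
        b.unpack(timestamp);
    }
};
```

Keeping the field order in one place per class is the main design point: the transport only ever sees an opaque buffer, so the same object code works regardless of which communication library carries the bytes.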

18

Model Partitioning

OMNeT++ uses placeholder modules and proxy gates:

[Figure: CPU0 holds nodeA and a placeholder for nodeB; CPU1 holds nodeB and a placeholder for nodeA; the partitions exchange messages through the communication layer (MPI, pipe, etc.)]

19

Model Partitioning, cont’d

If compound modules themselves are distributed across LPs, the solution is slightly more complicated:

[Figure: a compound module distributed over CPU0, CPU1 and CPU2; each partition holds its own simple module plus placeholders for the remaining simple modules and for the compound module]

20

Placeholder Approach

Advantage of placeholder approach: when simulating telecommunication networks, all nodes (routers, ASes, hosts, etc.) are present (at least as placeholders) in all LPs, so algorithms such as topology discovery for routing can proceed unhindered.

[Figure: LP1 (on CPU1) with the remote nodes shown as placeholders]
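One way to picture placeholders and proxy gates is as two implementations of a common gate interface. The sketch below is an assumption-laden illustration, not the OMNeT++ classes: `Gate`, `LocalGate`, `ProxyGate`, `Message` and `RemoteSend` are hypothetical names, and the "transport" is just a vector standing in for MPI or pipes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message: a name plus its arrival timestamp.
struct Message {
    std::string name;
    double arrivalTime;
};

// Record of a message handed to the (stand-in) transport layer.
struct RemoteSend {
    int destPartition;
    std::string destModule;
    Message msg;
};

class Gate {
  public:
    virtual ~Gate() = default;
    virtual void deliver(const Message& m) = 0;
};

// Gate of a module that is really instantiated in this partition.
class LocalGate : public Gate {
  public:
    void deliver(const Message& m) override { inbox.push_back(m); }
    std::vector<Message> inbox;
};

// Gate of a placeholder module: forwards the message to the partition
// that hosts the real module instead of delivering it locally.
class ProxyGate : public Gate {
  public:
    ProxyGate(int destPartition, std::string destModule, std::vector<RemoteSend>& outbox)
        : destPartition_(destPartition), destModule_(std::move(destModule)), outbox_(outbox) {
        assert(destPartition >= 0);
    }
    void deliver(const Message& m) override {
        outbox_.push_back(RemoteSend{destPartition_, destModule_, m});
    }

  private:
    int destPartition_;
    std::string destModule_;
    std::vector<RemoteSend>& outbox_;
};
```

Because the sender only sees the `Gate` interface, model code is the same whether the peer is local or remote, and code that walks the topology still finds every node.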

21

Synchronization Layer

Parallel simulation protocols must implement the following abstract interface:

/**
 * Abstract base class for parallel simulation algorithms...
 */
class cParsimSynchronizer : public cScheduler
{
  public:
    virtual void startRun() = 0;
    virtual void endRun() = 0;

    /**
     * Scheduler function -- it comes from cScheduler interface...
     */
    virtual cMessage *getNextEvent() = 0;

    /**
     * Hook, called when a cMessage is sent out of the segment...
     */
    virtual void processOutgoingMessage(cMessage *msg, int procId, int moduleId, int gateId, void *data) = 0;
};

22

Synchronization Layer

Currently implemented parallel simulation algorithms:

23

Example: Distributed CQN

Closed Queuing Network (CQN) described in “Performance Evaluation of Conservative Algorithms”, R. Bagrodia et al., 2000

N tandem queues (switch+queues); exponential service times; propagation delay on all links

Lookahead: propagation delay on links

[Figure: tandem queues distributed over CPU0, CPU1 and CPU2]

24

Example: Distributed CQN

OMNeT++ model for CQN wraps tandems into compound modules

25

Configuring for Parallel Execution

Configuration file:
  enable parallel simulation
  select communication library and parallel simulation protocol
  assign modules to processors

[General]
parallel-simulation=true
#parsim-communications-class="cFileCommunications"
parsim-communications-class="cMPICommunications"
parsim-synchronization-class="cNullMessageProtocol"

[Partitioning]
*.tandemQueue[0]*.segment-id=0

Each partition is simulated in its own process.
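How a pattern-based assignment such as the [Partitioning] entry above could be resolved is sketched below. This is an illustration only: `globMatch`, `PartitionRule` and `partitionOf` are hypothetical names, and the matcher treats `*` as "any sequence of characters", which is a simplification and may differ from the wildcard rules of the actual configuration reader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified glob: '*' matches any (possibly empty) character sequence,
// every other pattern character must match literally.
bool globMatch(const std::string& pat, const std::string& text) {
    const std::size_t npos = std::string::npos;
    std::size_t p = 0, t = 0, star = npos, mark = 0;
    while (t < text.size()) {
        if (p < pat.size() && pat[p] == '*') {
            star = p++;          // remember the wildcard and try matching zero chars
            mark = t;
        } else if (p < pat.size() && pat[p] == text[t]) {
            ++p;
            ++t;
        } else if (star != npos) {
            p = star + 1;        // backtrack: let the last '*' absorb one more char
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pat.size() && pat[p] == '*') ++p;
    return p == pat.size();
}

// Hypothetical rule: modules whose full path matches go to the given partition.
struct PartitionRule {
    std::string pattern;
    int partition;
};

// Returns the partition of the first matching rule, or -1 if unassigned.
int partitionOf(const std::string& modulePath, const std::vector<PartitionRule>& rules) {
    for (const PartitionRule& rule : rules) {
        assert(rule.partition >= 0);   // ids are non-negative so -1 can mean "unassigned"
        if (globMatch(rule.pattern, modulePath)) return rule.partition;
    }
    return -1;
}
```

For instance, with a rule `{"*.tandemQueue[0]*", 0}`, a module path such as `cqn.tandemQueue[0].queue[3]` (a made-up path) resolves to partition 0, while paths that match no rule stay unassigned.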

26

CQN Partitioning in Tkenv

If simulation executes under the GUI, placeholder modules and proxy gates are shown

27

Running Parallel Simulation

If GUI is used, operation of the Null Message Algorithm can be followed in trace windows

28

Experimental Results

The present simulation framework was used to verify the efficiency criterion for the Null Message Algorithm: LE >> 1 and λ=LE/P >> 1 are necessary for efficient PDES execution

See the paper “A Practical Efficiency Criterion For The Null Message Algorithm”, András Varga, Y. Ahmet Şekerciuğlu, Gregory K. Egan, in the Proceedings

29

Ongoing Work

Optimisations on the parallel simulation kernel

Create support for node mobility across LPs

Test large-scale IPv6 simulations (using the IPv6Suite for OMNeT++, developed at CTIE, Monash University, Australia)

Further verification and refinement of the efficiency criteria for the Null Message Algorithm
