Distributed simulation with MPI in ns-3

Joshua Pelkey and Dr. George Riley

WNS3, March 25, 2011

Overview

• Standard sequential simulation techniques with substantial network traffic
  – Lengthy execution times
  – Large amount of computer memory

• Parallel and distributed discrete event simulation [1]
  – Allows a single simulation program to run on multiple interconnected processors
  – Reduced execution time! Larger topologies!

Overview (cont.)

• Important Note
  – It is mandatory that distributed simulations produce the same results as identical sequential simulations

Overview: terminology

• Logical Process (LP)
  – An individual sequential simulation

• Rank or system id
  – The unique number assigned to each LP

Figure 1. Simple point-to-point topology, distributed

Overview: related work

• Parallel/Distributed ns (PDNS) [2]
• Georgia Tech Network Simulator (GTNetS) [3]
  – Both use a federated approach and a conservative (blocking) mechanism

Implementation Details in ns-3

• LP communication
  – Message Passing Interface (MPI) standard
  – Send/Receive time-stamped messages
  – MpiInterface in ns-3

• Synchronization
  – Conservative algorithm using lookahead
  – DistributedSimulator in ns-3
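
The conservative, lookahead-based synchronization named above can be illustrated with a small standalone sketch. It does not reproduce ns-3's DistributedSimulator or MpiInterface: the two-LP ping-pong model, the constants, and the function names are all assumptions made for illustration. Each round, the LPs agree on the earliest pending event, process everything earlier than that time plus the lookahead, and only then exchange time-stamped messages.

```python
import heapq

LINK_DELAY = 5   # delay of the remote link, in arbitrary integer time units (assumed)
LOOKAHEAD = 5    # must not exceed the smallest remote link delay
MAX_HOPS = 3     # a packet stops bouncing after this many hops (assumed)


def run_sequential(initial_events):
    """Reference run: one event queue holding (time, rank, hop) tuples."""
    queue = list(initial_events)
    heapq.heapify(queue)
    log = {0: [], 1: []}
    while queue:
        time, rank, hop = heapq.heappop(queue)
        log[rank].append((time, hop))
        if hop < MAX_HOPS:
            # Forward the packet to the other node across the link.
            heapq.heappush(queue, (time + LINK_DELAY, 1 - rank, hop + 1))
    return log


def run_distributed(initial_events):
    """Two LPs, each with its own queue, synchronized in lookahead windows."""
    queues = {0: [], 1: []}
    for time, rank, hop in initial_events:
        heapq.heappush(queues[rank], (time, hop))
    log = {0: [], 1: []}
    rounds = 0
    while any(queues.values()):
        rounds += 1
        # Global reduction: earliest pending event over all LPs.
        earliest = min(q[0][0] for q in queues.values() if q)
        # No remote message can carry a timestamp earlier than this bound.
        safe_until = earliest + LOOKAHEAD
        outbox = []
        for rank, queue in queues.items():
            while queue and queue[0][0] < safe_until:
                time, hop = heapq.heappop(queue)
                log[rank].append((time, hop))
                if hop < MAX_HOPS:
                    outbox.append((time + LINK_DELAY, 1 - rank, hop + 1))
        # Deliver the time-stamped messages after the window closes.
        for time, rank, hop in outbox:
            assert time >= safe_until  # never lands in a receiver's past
            heapq.heappush(queues[rank], (time, hop))
    return log, rounds
```

With the initial events `[(0, 0, 0), (3, 1, 0)]`, both functions produce the same per-LP event order, which is the requirement stated earlier that distributed runs match identical sequential runs.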

Implementation Details in ns-3 (cont.)

• Assigning rank to nodes
  – Handled manually in simulation script

• Remote point-to-point links
  – Created automatically between nodes with different ranks through point-to-point helper
  – When a packet is set to cross a remote point-to-point link, the packet is transmitted via MPI using our interface

• Merged since ns-3.8
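
The remote-link rule above can be sketched in a few lines. This is an illustrative model only, not the ns-3 point-to-point helper: `Node`, `is_remote_link`, `transmit`, and the two lists standing in for the local event queue and the MPI send buffer are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    system_id: int  # rank of the LP that owns this node


def is_remote_link(a: Node, b: Node) -> bool:
    """A point-to-point link is remote when its endpoints have different ranks."""
    return a.system_id != b.system_id


def transmit(packet, src, dst, rx_time, local_events, mpi_outbox):
    """Schedule a reception locally, or queue the packet for the owning LP."""
    if is_remote_link(src, dst):
        # Stand-in for an MPI send: (destination rank, timestamp, node, payload).
        mpi_outbox.append((dst.system_id, rx_time, dst.node_id, packet))
    else:
        local_events.append((rx_time, dst.node_id, packet))
```

Keeping the decision in one place means the rest of the script does not need to know whether a given link crosses an LP boundary.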

Implementation Details in ns-3: limitations

• All nodes created on all LPs, regardless of rank
  – It is up to the user to only install applications on the correct rank

• Nodes are assigned rank manually
  – An MpiHelper class could be used to assign rank to nodes automatically. This would enable easy distribution of existing simulation scripts.

• Pure distributed wireless is currently not supported
  – At least one point-to-point link must exist in order to divide the simulation
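
The first limitation means every LP runs the same script and sees every node, so the script itself has to guard application installs by rank. A minimal sketch of that guard follows; the node names, the application plan, and `applications_for_rank` are hypothetical and not part of ns-3.

```python
def applications_for_rank(my_rank, node_rank, app_plan):
    """Return the (node, application) pairs this LP should actually install.

    node_rank maps every node name to the rank that owns it; every LP holds
    the full map because all nodes exist on all LPs.
    """
    return [(node, app) for node, app in app_plan if node_rank[node] == my_rank]


# Hypothetical two-LP script: ranks are assigned by hand, as in the slides.
NODE_RANK = {"client": 0, "router": 0, "server": 1}
APP_PLAN = [("client", "OnOffSource"), ("server", "PacketSink")]
```

Calling `applications_for_rank(0, NODE_RANK, APP_PLAN)` yields only the client-side application, and rank 1 gets only the server-side one, so each application is installed exactly once across the whole run.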

Performance Study

• DARPA NMS campus network simulation
  – Using nms-p2p-nix-distributed example available in ns-3
  – Allows creation of very large topologies
  – Any number of campus networks (CNs) are created and connected together
  – Different campus networks can be placed on different LPs
  – Tested with 2 CNs, 4 CNs, 6 CNs, 8 CNs, and 10 CNs

Performance Study: campus network topology

Figure 2. Campus network topology block [4] (figure labels: 200 ms, 10 µs)

Performance Study: Georgia Tech clusters used

• Hogwarts Cluster
  – 6 nodes, each with 2 quad-core processors and 48 GB of RAM

• Ferrari Cluster
  – Mix of machines, including 3 quad-core nodes and 8 dual-core nodes

Performance Study: simulations on Hogwarts

Figure 3. Campus network simulations on Hogwarts with (A) 2 CNs (B) 4 CNs (C) 6 CNs (D) 8 CNs (E) 10 CNs

Performance Study: simulations on Ferrari

Figure 4. Campus network simulations on Ferrari with (A) 2 CNs (B) 4 CNs (C) 6 CNs (D) 8 CNs (E) 10 CNs

Performance Study: speedup

Figure 5. Speedup using distributed simulation for campus network topologies on the (A) Hogwarts cluster and (B) Ferrari cluster

Performance Study: speedup (cont.)

• Near-linear speedup on Hogwarts, but not on Ferrari. Further investigation revealed that Ferrari consisted of a mix of machines, with the first two nodes considerably faster than the rest

            2 CNs   4 CNs   6 CNs   8 CNs   10 CNs
Hogwarts    1.8     3.3     5.8     6.9     8.3
Ferrari     1.9     1.6     2.0     2.3     2.4

Table 1: Speedup for Hogwarts and Ferrari
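
To put the Table 1 numbers on a common scale, the sketch below converts each speedup into a parallel efficiency (speedup divided by the number of LPs). It assumes one campus network per LP, so that N CNs run on N LPs; the slides do not state this mapping explicitly, so treat it as an assumption.

```python
# Speedup values copied from Table 1.
CAMPUS_NETWORKS = [2, 4, 6, 8, 10]
SPEEDUP = {
    "Hogwarts": [1.8, 3.3, 5.8, 6.9, 8.3],
    "Ferrari": [1.9, 1.6, 2.0, 2.3, 2.4],
}


def efficiency(speedups, lp_counts):
    """Parallel efficiency: 1.0 would be ideal linear speedup.

    Assumes one campus network per LP, so lp_counts equals the CN counts.
    """
    return [s / n for s, n in zip(speedups, lp_counts)]


for cluster, speedups in SPEEDUP.items():
    values = efficiency(speedups, CAMPUS_NETWORKS)
    print(cluster, " ".join(f"{v:.2f}" for v in values))
```

Under that assumption, Hogwarts stays above 0.8 in every configuration, while Ferrari drops from 0.95 at 2 CNs to 0.24 at 10 CNs, consistent with the observation that only its first two nodes were fast.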

Performance Study: changing the lookahead

• By changing the delay between campus networks, the lookahead was varied (from 200 ms to 10 µs)

• For Hogwarts and Ferrari, the 10 µs simulations ran, on average, 25% and 47% slower, respectively

• As expected, a smaller lookahead time decreases the potential speedup, as the simulators must synchronize more frequently
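
A rough count shows why the smaller lookahead costs more. The sketch assumes one global synchronization per lookahead-sized window and a hypothetical simulated duration of 1 second; neither figure comes from the slides, and a real run may need fewer rounds when LPs are idle.

```python
def sync_windows(sim_time_us: int, lookahead_us: int) -> int:
    """Upper bound on synchronization rounds: ceil(sim_time / lookahead)."""
    return -(-sim_time_us // lookahead_us)  # ceiling division on integers


SIM_TIME_US = 1_000_000  # hypothetical 1 s of simulated time

coarse = sync_windows(SIM_TIME_US, 200_000)  # 200 ms lookahead
fine = sync_windows(SIM_TIME_US, 10)         # 10 µs lookahead
print(coarse, fine, fine // coarse)          # prints: 5 100000 20000
```

The ratio of 20000 depends only on the two lookahead values, not on the assumed duration, which is why shrinking the inter-campus delay increases synchronization overhead regardless of how long the simulation runs.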

Future Work

• MpiHelper class to facilitate creating distributed topologies
  – Nodes assigned rank automatically
  – Existing simulation scripts could be distributed easily

• Distributing the topology could occur at the node level, rather than the application level
  – Ghost nodes to save memory

• Pure distributed wireless support

Summary

• Distributed simulation in ns-3 allows a user to run a single simulation in parallel on multiple processors

• Very large-scale simulations can be run in ns-3 using the distributed simulator

• Distributed simulation in ns-3 can approach the optimal linear speedup compared to identical sequential simulations

References

[1] R. M. Fujimoto. Parallel and Distributed Simulation Systems. Wiley Interscience, 2000.

[2] PDNS - Parallel/Distributed ns. http://www.cc.gatech.edu/computing/compass/pdns, March 2004.

[3] G. F. Riley. The Georgia Tech Network Simulator. In Proceedings of the ACM SIGCOMM Workshop on Models, Methods and Tools for Reproducible Network Research, MoMeTools ’03, pages 5-12, New York, NY, USA, 2003. ACM.

[4] Standard baseline NMS challenge topology. http://www.ssfnet.org/Exchange/gallery/baseline, July 2002.
