Active reliable multicast: how it works, how it can be used on computational grids


ACTIVE RELIABLE MULTICAST: HOW IT WORKS, HOW IT CAN BE USED ON COMPUTATIONAL GRIDS. Congduc PHAM. SUN's "Gourmandise Cérébrale", SUN Labs Europe, Thursday, February 14th, 2002. http://www.ens-lyon.fr/LIP/RESAM. Outline: Introduction; How it works; How it can be used on computational grids.


ACTIVE RELIABLE MULTICAST

HOW IT WORKS, HOW IT CAN BE USED ON COMPUTATIONAL GRIDS

Congduc PHAM

SUN's "Gourmandise Cérébrale"

SUN Labs Europe, Thursday, February 14th, 2002

http://www.ens-lyon.fr/LIP/RESAM

Outline

Introduction
How it works
How it can be used on computational grids

Multicast! Multicast! Multicast!

Everybody's talking about multicast! Really annoying! Why would I need multicast, by the way?

Challenges for the Internet

Think about…

high-speed WWW
video-conferencing
video-on-demand
interactive TV programs
remote archival systems
telemedicine, white board
high-performance computing, grids
virtual reality, immersion systems
distributed interactive simulations/gaming…

From unicast…

Problem: sending the same data to many receivers via unicast is inefficient
Example: popular WWW sites become serious bottlenecks

[Figure: a sender unicasting data to three receivers]

…to multicast on the Internet.

Not n-unicast from the sender perspective
Efficient one-to-many data distribution
Towards low latency, high bandwidth

[Figure: a sender multicasting data to three receivers]

User perspective of the Internet

from UREC, http://www.urec.fr


What it is in reality…

from UREC, http://www.urec.fr

Links: the basic element in networks

Backbone links
  optical fibers
  10 to 160 Gbits/s with DWDM techniques

End-user access
  V.90 56Kbits/s modem on twisted pair
  512Kbits/s to 2Mbits/s with xDSL modem
  1Mbits/s to 10Mbits/s cable modem
  64Kbits/s to 1930Kbits/s ISDN access
  9.6Kbits/s (GSM) to 2Mbits/s (UMTS)
  155Mbits/s to 1Gbits/s SDH

Routers: key elements of internetworking

Routers
  run routing protocols and build routing tables,
  receive data packets and perform relaying,
  may have to consider Quality of Service constraints for scheduling packets,
  are highly optimized for packet forwarding functions.

The Wild Wild Web

important data

heterogeneity, link failures, congested routers, packet loss, packet drop, bit errors…

?

Multicast difficulties

At the routing level
  management of the group address (IGMP)
  dynamic nature of the group membership
  construction of the multicast tree (DVMRP, PIM, CBT…)
  multicast packet forwarding

At the transport level
  reliability, loss recovery strategies
  flow control
  congestion avoidance

Reliable multicast

What is the problem of loss recovery?
  feedback (ACK or NACK) implosion
  replies/repairs duplications
  difficult adaptability to dynamic membership changes

Design goals
  reduce recovery latencies
  reduce the feedback traffic
  improve recovery isolation

How does it work?

Active Reliable Multicast

What is active networking?

Programmable nodes/routers
Customized computations on packets
Standardized execution environment and programming interface
No killer applications, only a different way to offer high-value services, in an elegant manner
However, adds extra processing cost

Motivations behind active networking

user applications can implement and deploy customized services and protocols
  specific data filtering criteria (DIS, HLA)
  fast collective and gather operations…

globally better performance by reducing the amount of traffic
  high throughput
  low end-to-end latency

Active networks implementations

Discrete approach (operator's approach)
  Adds dynamic deployment features in nodes/routers
  New services can be downloaded into router's kernel

Integrated approach
  Adds executable code to data packets
  Capsule = data + code
  Granularity set to the packets

The discrete approach

Separates the injection of programs from the processing of packets

[Figure: data packets; active code A1 and active code A2 loaded in the routers]

The integrated approach

User packets carry code to be applied on the data part of the packet
High flexibility to define new services

[Figure: packets carrying both data and code]

An active router

[Figure: router datapath with IP input processing, filter, action, forwarding table, routing agent, IP output processing and packet scheduler; IP packets and AL packets]

Some layer for executing code. Let's call it Active Layer.

Solutions for Reliable Multicast

Traditional
  end-to-end retransmission schemes
  scoped retransmission with the TTL fields
  receiver-based local NACK suppression

Active contributions
  cache of data to allow local recoveries
  feedback aggregation
  subcast
  …

A step toward active services: LBRM

Active local recovery

routers cache data packets
repair packets are sent by routers, when available

[Figure: the source sends data1 to data5; a receiver that got data1, data2, data3 and data5 sends NACK4, and the repair data4 is sent back]
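The local recovery idea above can be sketched as a toy router model. This is only an illustration under stated assumptions: the class name `CachingRouter`, the fixed-size oldest-first cache and the return tuples are hypothetical and are not taken from any active networking platform.

```python
from collections import OrderedDict


class CachingRouter:
    """Toy active router that caches recent data packets (hypothetical model)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.cache = OrderedDict()  # seq -> payload, oldest entry first

    def on_data(self, seq, payload):
        # Store the packet, evicting the oldest entry when the cache is full.
        self.cache[seq] = payload
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def on_nack(self, seq):
        # Repair locally when the packet is cached, otherwise pass the NACK upstream.
        if seq in self.cache:
            return ("repair", seq, self.cache[seq])
        return ("forward_nack", seq, None)


router = CachingRouter(capacity=4)
for seq in range(1, 6):
    router.on_data(seq, f"data{seq}")

print(router.on_nack(4))  # ('repair', 4, 'data4')
print(router.on_nack(1))  # ('forward_nack', 1, None)
```

With a cache of four packets, data1 has already been evicted when data5 arrives, so a NACK for it still has to travel toward the source. This is the caching requirement that cache-based schemes have to pay for.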

Global NACKs suppression

[Figure: several receivers send NACK4; the source answers with data4]

only one NACK is forwarded to the source
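A minimal sketch of this suppression in a router, assuming the router keeps one pending entry per requested sequence number. The names `NackAggregator`, `on_nack` and `on_repair` are hypothetical.

```python
class NackAggregator:
    """Forwards only the first NACK for a given packet (hypothetical model)."""

    def __init__(self):
        self.pending = {}  # seq -> set of downstream links that asked for it

    def on_nack(self, seq, link):
        # Return True when this NACK must be forwarded toward the source.
        first = seq not in self.pending
        self.pending.setdefault(seq, set()).add(link)
        return first

    def on_repair(self, seq):
        # The repair has arrived: clear the state and report who asked for it.
        return self.pending.pop(seq, set())


agg = NackAggregator()
forwarded = [agg.on_nack(4, link) for link in (0, 1, 2)]
print(forwarded)         # [True, False, False]
print(agg.on_repair(4))  # {0, 1, 2}
```

Three NACK4 packets reach the router but only the first one goes upstream. The set returned by `on_repair` is also the information a subcast service would need.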

Local NACKs suppression

[Figure: data and NACK packets]

Active subcast features

Send repair packet only to the relevant set of receivers

[Figure: NACK4 packets travel upstream; the data4 repair is sent only on the branches that requested it]
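The link selection behind subcast can be sketched as follows, assuming the router remembers on which downstream links a NACK arrived. The function name `subcast` and the integer link identifiers are hypothetical.

```python
def subcast(repair_seq, downstream_links, nack_links):
    """Return the (link, seq) transmissions for a subcast repair.

    Only links from which a NACK arrived receive the repair; a plain
    multicast would use every link in downstream_links instead.
    """
    return [(link, repair_seq) for link in downstream_links if link in nack_links]


links = [0, 1, 2, 3]
print(subcast(4, links, nack_links={1, 3}))  # [(1, 4), (3, 4)]
```

In this example two of the four downstream branches are spared a duplicate data4.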

How can it be used?

Active Reliable Multicast

Computational grids
The DyRAM framework
Some simulation results
Conclusions and perspectives

GRID?


What is a computational grid?

application user

from Dorian Arnold: Netsolve Happenings

Some grid applications

Astrophysics: black holes, neutron stars, supernovae
Mechanics: fluid dynamics, CAD, simulation
Chemistry & biology: molecular simulations, genomic simulations
Distributed & interactive simulations: DIS, HLA, training

[Figure: wide-area interactive simulations over the Internet: human in the loop, flight simulator, battle field simulation, display, computer-based sub-marine simulator]

Reliable multicast: a big win for grids

Data replications
Code & data transfers, interactive job submissions
Data communications for distributed applications (collective & gather operations, sync. barrier)
Databases, directories services

[Figure: multicast address group 224.2.0.1]
SDSC IBM SP: 1024 procs, 5x12x17 = 1020
NCSA Origin Array: 256+128+128, 5x12x(4+2+2) = 480
CPlant cluster: 256 nodes

From reliable multicast to Nobel prize!

We see something, but too weak. Please simulate to enhance signal!

OK! Resource Estimator says need 5TB, 2TF. Where can I do this?

Resource Broker: LANL is best match… but down for the moment

Resource Broker: 7 sites OK, but need to send data fast…

From President@earth.org

Congratulations, you have done a great job, it's the discovery of the century!!

The phenomenon was short but we managed to react quickly. This would not have been possible without efficient multicast facilities to enable quick reaction and fast distribution of data.

Nobel Prize is on the way :-)

Multicast communications on grids

Dynamic groups are very difficult to handle with the reliability constraint

Mixture of high-throughput (data replication) and low latencies (distributed applications) needs

The application under consideration can have a great impact on the protocol design (i.e. local recoveries)

A one-protocol-fits-all solution is difficult!

The DyRAM framework (M. Maimour)

Receiver-based: use of NACKs.
No cache in routers, receivers perform local recoveries…
…which are based on a tree structure constructed on a per-packet basis.
Routers play an active role.
Low-overhead active services
Focus on low latency
Load balancing features

[Figure: a server and active routers around a Gbits-rate core network, with 1000 Base FX and 100 Base TX links. Where to put active components?]

Related works on local recovery

SRM: any receiver in the neighborhood
RMTP, TMTP, LMS, PGM, TRAM: a designated receiver
LBRM: a logging server

Active services in DyRAM

Designed to provide low latencies
Session initialization
Early packet loss detection
NACK aggregation
Subcast of repair packets
Dynamic replier election

DyRAM and IP multicast

Relies on IP multicast but has few interactions

Runs its own simple session protocol to gather additional topological information at the DyRAM level, compensating for the group anonymity imposed by IP multicast

DyRAM: session initialization

[Figure: INIT is sent down the IP multicast tree through the DyRAM active routers D0 and D1; receivers R1 to R7 answer with a Reply carrying their address (@), and each active router records the vif on which each reply arrived]

Active router D0: Total Replies=5
  @R1, vif 1
  @R2, vif 2
  @R3, vif 2
  @R4, vif 2
  @D1, vif 0

Active router D1: Total Replies=3
  @R5, vif 1
  @R6, vif 1
  @R7, vif 0

How and where losses can occur

Packet losses occur mainly in edge routers

In this case, all downstream links would most likely be affected by a packet loss

On medium speed LAN, when a packet has been sent on the wire all computers will usually be able to receive it

On very high-speed LAN, computers can be the bottleneck

DyRAM: early packet loss detection

The repair latency can be reduced if the lost packet is requested as soon as possible

DyRAM realizes this functionality by enabling some routers to detect losses and therefore to generate NACKs towards the source

This loss detection service should be located near the source, but not too near!
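Early loss detection can be pictured as a router watching for gaps in the sequence numbers that pass through it. The sketch below assumes packets are numbered from 1 and that every gap is reported immediately. The class name `LossDetector` is hypothetical, and a real service would also have to deal with reordering and timers.

```python
class LossDetector:
    """Detects sequence gaps on the data path (hypothetical model)."""

    def __init__(self):
        self.last_received = 0  # highest sequence number seen so far

    def on_data(self, seq):
        # Every number skipped between the previous maximum and seq is
        # presumed lost; the router would send one NACK per missing number.
        missing = list(range(self.last_received + 1, seq))
        self.last_received = max(self.last_received, seq)
        return missing


detector = LossDetector()
for seq in (1, 2, 5):
    print(seq, detector.on_data(seq))
# 1 []
# 2 []
# 5 [3, 4]
```

The router generates the NACKs for packets 3 and 4 as soon as packet 5 goes by, without waiting for the receivers to notice the loss themselves.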

DyRAM: replier election

A receiver is elected to be a replier for each lost packet
Several recovery trees at a given time
Load balancing can be taken into account, several optimizations possible
Uses the topological information gathered during the session initialization

DyRAM: replier election

[Figure: replier election on the session initialization topology. Receivers R1 to R4 send NAK 2; the active routers D0 and D1 use their reply tables (D0: @R1 vif 1, @R2 vif 2, @R3 vif 2, @R4 vif 2, @D1 vif 0; D1: @R5 vif 1, @R6 vif 1, @R7 vif 0) to forward NAK 2 to an elected replier, and Repair 2 is sent back]
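One possible election rule is sketched below. It assumes that a downstream link from which no NACK arrived leads to a receiver that holds the packet, and it balances load by choosing the candidate link used least often so far. Both the rule and the names (`elect_replier`, `load`) are assumptions made for illustration, not the DyRAM specification.

```python
def elect_replier(downstream_links, nack_links, load):
    """Pick a link toward a replier for one lost packet (hypothetical rule).

    downstream_links: links known from session initialization
    nack_links: links from which a NACK for this packet arrived
    load: dict counting how often each link was already elected
    """
    candidates = [l for l in downstream_links if l not in nack_links]
    if not candidates:
        return None  # nobody downstream seems to have the packet
    # Least-used link first; ties are broken by link number.
    chosen = min(candidates, key=lambda l: (load.get(l, 0), l))
    load[chosen] = load.get(chosen, 0) + 1
    return chosen


load = {}
print(elect_replier([0, 1, 2], {2}, load))        # 0
print(elect_replier([0, 1, 2], {2}, load))        # 1
print(elect_replier([0, 1, 2], {0, 1, 2}, load))  # None
```

Because the choice is made again for every lost packet, two consecutive losses are repaired through different links here, which is how several recovery trees can exist at the same time. When the function returns `None`, the NACK would have to be forwarded upstream.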

DyRAM: subcasting

Tries to solve the exposure problem
Using the NACK pattern to select relevant links cannot avoid exposure
Use of IP addresses is more costly but allows for an exact matching
Several optimizations possible, including a dynamic selection of the appropriate mechanism

Routers' soft state

The NACK State (NS) structure, which maintains for each lost packet:
  seq: the sequence number of the requested packet.
  rank: the number of NACKs received.
  subList: list of the links from which similar NACKs arrived (or IP addresses).

Routers' soft state (cont.)

The Track List (TL) structure, which maintains for each multicast session:
  lastOrdered: the sequence number of the last packet received in order
  lastReceived: the sequence number of the last received data packet
  lostList: a bit vector that keeps track of received packets

Reduces the replier election delay.
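The two structures can be written down as plain records. The field names follow the slides, adapted to Python naming; the update methods, the use of a Python integer as the bit vector and the assumption that sequence numbers start at 1 are illustrative choices only.

```python
from dataclasses import dataclass, field


@dataclass
class NackState:
    """NS: kept by a router for each lost packet."""
    seq: int                                      # requested sequence number
    rank: int = 0                                 # number of NACKs received
    sub_list: set = field(default_factory=set)    # links the NACKs came from

    def record(self, link):
        self.rank += 1
        self.sub_list.add(link)


@dataclass
class TrackList:
    """TL: kept by a router for each multicast session."""
    last_ordered: int = 0    # last packet received in order
    last_received: int = 0   # highest sequence number received
    lost_list: int = 0       # bit vector: bit i set means packet i was received

    def on_data(self, seq):
        self.lost_list |= 1 << seq
        self.last_received = max(self.last_received, seq)
        # Advance the in-order mark over every consecutive received packet.
        while (self.lost_list >> (self.last_ordered + 1)) & 1:
            self.last_ordered += 1

    def has(self, seq):
        return bool((self.lost_list >> seq) & 1)


tl = TrackList()
for seq in (1, 2, 4):
    tl.on_data(seq)
print(tl.last_ordered, tl.last_received, tl.has(3))  # 2 4 False
```

A router holding such a track list can tell immediately whether a requested packet went through it, which is the kind of lookup that shortens the replier election.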

DyRAM overview

[Figure: sources and active routers around a Gbits-rate core network, with 1000 Base FX and 100 Base TX links]

The backbone is fast, very fast (DWDM, 10Gbits/s not uncommon), so nothing else than fast forwarding functions.

The active router associated to the source can perform early processing on packets. For instance our DyRAM protocol uses subcast and loss detection facilities in order to reduce the end-to-end latency.

A hierarchy of active routers can be used for processing specific functions at different layers of the hierarchy. For instance, having an active router at the nearest location from the source/destination could perform very efficient NACK packets suppression.

Any receiver can be designated as a replier for a lost packet. The election is performed by the associated upstream active router on a per-packet basis. Therefore several loss recovery trees can co-exist in parallel at a given time.

DyRAM can increase performance by associating a dedicated active router to a pool of computing resources.

One benefit of active networking is to unload the source from heavy retransmission overheads.

Some simulation results

Network model and used metrics
Local recovery from the receivers
DyRAM vs. ARM
DyRAM combined with cache at routers

Network model

10 MBytes file transfer


Metrics

Load at the source : the number of the retransmissions from the source.

Load at the network : the consumed bandwidth.

Completion time per packet (latency).

Local recovery from the receivers (1)

Local recoveries reduce the load at the source (especially for high loss rates and a large number of receivers).

p=0.25, #grp: 6…24, 4 receivers/group

Local recovery from the receivers (2)

As the group size increases, doing the recoveries from the receivers greatly reduces the bandwidth consumption

48 receivers distributed in g groups, #grp: 2…24

Local recovery from the receivers (3)

Local recoveries reduce the end-to-end delay (per packet)

p=0.25, #grp: 6…24, 4 receivers/group

DyRAM vs ARM

ARM performs better than DyRAM only for very low loss rates and with considerable caching requirements


DyRAM with cache at the routers (1)

When DyRAM benefits from the cache at the routers in addition to the recovery from the receivers, it always performs better than ARM.

p=0.25

ARM without cache


DyRAM with cache at the routers (2)

When DyRAM benefits from the cache at the routers in addition to the recovery from the receivers, it always performs better than ARM.

p=0.25

ARM without cache

DyRAM: early loss detection

p=0.25 and p=0.5

#grp: 6…24, 4 receivers/group

Conclusions

Reliability on large-scale multicast sessions is difficult. Active services can provide efficient solutions for avoiding implosion and exposure.

The main design goal for DyRAM is to reduce the end-to-end delays (recovery for instance) to enable large distributed applications on computational grids.

References

D. L. Tennenhouse, J. M. Smith, W. D. Sincoskie, D. J. Wetherall, and G. J. Minden. A survey of active network research. IEEE Communications Magazine, pages 80--86, January 1997.

L. H. Lehman, S. J. Garland, and D. L. Tennenhouse. Active reliable multicast. IEEE INFOCOM'98, March 1998.

M. Maimour, C. Pham. A Throughput Analysis of Reliable Multicast Protocols in an Active Networking Environment. IEEE ISCC'2001, Hammamet, Tunisia.
