1 introduction - association aristote fault tolerance " existing protocols (tcp, ... "...


Upload: phamcong

Post on 16-Apr-2018


1. Introduction
2. The decentralized environment P2PDC
3. The communication protocol P2PSAP
4. Numerical problem and distributed algorithms
5. Computational experiments
6. Conclusions and perspectives

- Need for intensive computation:
  - Numerical simulation, optimization
  - Simulation of complex systems, e.g. aircraft and space vehicles
  - Meteorology
  - Telecommunications
- Supercomputers, clusters, grids:
  - Expensive
  - Limited access
  - Inflexible

- Important evolution of network technologies.
- Convergence of parallel and distributed computing.
- New concepts, distributed HPC applications: solution of numerical simulation problems on networks of processors such as peer-to-peer networks.
- Use of parallel iterative methods.
- Fast data exchanges.

- Great development of peer-to-peer applications:
  - File sharing, video, ...
- Challenges:
  - Scalability
  - Heterogeneity
  - Volatility
  - Fault tolerance
  - Existing protocols (TCP, UDP, etc.) are not well suited to HPC.

- Existing P2P environments:
  - Centralized architecture
  - Bag-of-tasks applications: neither synchronization nor communication between peers.
  - Parallel iterative methods and recursive methods are not considered.
  - Java: not efficient for HPC applications

- Coordinated by Dr. Didier EL BAZ, LAAS-CNRS.
- Duration: January 2008 to December 2011.
- Goals: proposing innovative tools and demonstrators for the implementation of HPC applications, e.g. numerical simulation and optimization, on peer-to-peer networks.
- 3 sub-projects:
  - P2PDC: designing an environment for peer-to-peer HPC, LAAS;
  - dPerf: designing a simulation tool for performance prediction of peer-to-peer applications, Femto-ST, University of Franche-Comté;
  - P2PDem: building demonstrators for applicative challenges:
    - P2PRo: complex combinatorial problems, e.g. cutting stock problems, University of Picardie, LAAS-CNRS;
    - P2PSimul: numerical simulation applications, e.g. financial applications, process engineering, ENSEEIHT-IRIT, LAAS.

- Reduced set of communication operations:
  - P2P_Send, P2P_Receive, P2P_wait
  - The programmer selects the scheme of computation.
  - Communication mode: determined by the protocol according to the context at application level and transport level.

→ Facilitates programming, hides complexity

- Application programming model:
  - 3 functions:
    - Task_Definition()
    - Calculate()
    - Results_Aggregation()
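The three-function model can be illustrated with a minimal sketch. It is written in Python for brevity, whereas P2PDC applications are compiled dynamic libraries. The function names only mirror the slide: every signature, the `run_locally` driver and the toy summation task are assumptions made for illustration, not the P2PDC API.

```python
"""Toy illustration of a three-function application model.

Hypothetical signatures: the slides only name the three functions.
"""
from typing import List, Tuple

Subtask = Tuple[int, int]  # half-open range [start, stop) to be summed


def task_definition(n: int, workers: int) -> List[Subtask]:
    """Split the range [0, n) into at most `workers` contiguous subtasks."""
    size = max(1, -(-n // workers))  # ceiling division, at least 1
    return [(lo, min(lo + size, n)) for lo in range(0, n, size)]


def calculate(subtask: Subtask) -> int:
    """Work done by one peer: sum the integers of its range."""
    lo, hi = subtask
    return sum(range(lo, hi))


def results_aggregation(partials: List[int]) -> int:
    """Combine the partial results returned by the peers."""
    return sum(partials)


def run_locally(n: int, workers: int) -> int:
    """Stand-in for the environment: runs every subtask in one process."""
    subtasks = task_definition(n, workers)
    return results_aggregation([calculate(s) for s in subtasks])


if __name__ == "__main__":
    print(run_locally(1000, 4))
```

In a real deployment the environment, not the application, would decide where each subtask runs; the sketch only shows how the three roles divide the work.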

- Decentralized environment (P2PDC) for peer-to-peer HPC.
- Based on a self-adaptive communication protocol (P2PSAP) dedicated to P2P HPC applications.

- Server: manages information regarding tracker connection and disconnection.
- Tracker: manages information regarding a set of peers, called a zone.
- Peers: donors of computational resources.

- Tracker topology:
  - Line topology
  - Ni: set of the trackers closest to tracker Ti
    - |Ni|/2 trackers with an IP greater than the IP of Ti
    - |Ni|/2 trackers with an IP smaller than the IP of Ti
  - Each tracker maintains 2 connections, one with its closest tracker on the left side and one with its closest tracker on the right side.
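The neighbor set Ni described above can be sketched as follows. This is a minimal illustration, assuming that proximity is measured by comparing IPv4 addresses as integers; the function name `neighbor_set` and its parameters are hypothetical and not taken from P2PDC.

```python
import ipaddress
from typing import List, Tuple


def _to_int(address: str) -> int:
    """Read an IPv4 address as an integer so that addresses can be ordered."""
    return int(ipaddress.IPv4Address(address))


def neighbor_set(me: str, trackers: List[str], size: int) -> Tuple[List[str], List[str]]:
    """Return (left, right): the size // 2 closest trackers on each side of `me`.

    Assumption: trackers sit on a line ordered by IP address.
    """
    mine = _to_int(me)
    ordered = sorted((t for t in trackers if t != me), key=_to_int)
    half = size // 2
    smaller = [t for t in ordered if _to_int(t) < mine]
    greater = [t for t in ordered if _to_int(t) > mine]
    left = smaller[-half:] if half else []   # closest trackers with a smaller IP
    right = greater[:half]                   # closest trackers with a greater IP
    return left, right


if __name__ == "__main__":
    known = ["10.0.0.1", "10.0.0.5", "10.0.0.9", "10.0.1.2", "10.0.2.7"]
    left, right = neighbor_set("10.0.0.9", known, size=4)
    print(left, right)
    # The two maintained connections would go to left[-1] and right[0].
```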

- IP-based proximity metric

- Tracker joins:

- Tracker leaves:

- Peer collection:
  - The submitter joins the overlay network.
  - The submitter sends a request with peer requirements to its tracker.
  - The tracker filters the connected peers in its zone that satisfy the requirements of the request and sends the addresses of these peers back to the submitter.
  - If the number of peers collected from this tracker is not sufficient, the submitter requests peers from the trackers in its local tracker list.
  - If the number of collected peers is still not sufficient after requests have been sent to all trackers in its local tracker list, the submitter requests more tracker addresses from the two farthest trackers on the two sides of its local tracker list and requests peers from the newly received trackers.
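The collection steps above can be sketched with in-memory objects. This is a simplified illustration under stated assumptions: the `Tracker` class, the single "free cores" requirement, and the repetition of the last step until no new tracker is found are hypothetical choices, not details given in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Tracker:
    """Toy tracker: a zone of peers and the tracker names it can hand out."""
    name: str
    zone: Dict[str, int] = field(default_factory=dict)   # peer address -> free cores
    known: List[str] = field(default_factory=list)       # other tracker names

    def matching_peers(self, min_cores: int) -> List[str]:
        return [p for p, cores in self.zone.items() if cores >= min_cores]


def collect_peers(own: str, local_list: List[str], directory: Dict[str, Tracker],
                  wanted: int, min_cores: int) -> List[str]:
    """Collect up to `wanted` peers, widening the search as described above."""
    collected: List[str] = []
    asked: Set[str] = set()

    def ask(name: str) -> None:
        if name in asked:
            return
        asked.add(name)
        for peer in directory[name].matching_peers(min_cores):
            if peer not in collected:
                collected.append(peer)

    ask(own)                                  # request to the submitter's own tracker
    frontier = list(local_list)               # local tracker list, ordered along the line
    while len(collected) < wanted and frontier:
        for name in frontier:
            if len(collected) >= wanted:
                break
            ask(name)
        if len(collected) >= wanted:
            break
        # Ask the two farthest trackers (both ends of the list) for more trackers.
        farthest = [frontier[0], frontier[-1]]
        frontier = []
        for end in farthest:
            for name in directory[end].known:
                if name not in asked and name not in frontier:
                    frontier.append(name)
    return collected[:wanted]
```

The loop stops either when enough peers have been found or when the farthest trackers return no tracker that has not already been asked.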

- Creation of groups with coordinators
- Cmax: maximum number of peers in a group
- The submitter sends subtasks to the group coordinators; subtasks are then sent by the coordinators to the peers.
- Subtask results are sent in the reverse direction, i.e. peers send their subtask results to their coordinator, then the coordinator transfers the results to the submitter.

=> More efficient: avoids a bottleneck at the submitter
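A short sketch shows why grouping relieves the submitter. It assumes, for illustration only, that the first peer of each group acts as coordinator and counts toward Cmax; the slides do not say how coordinators are chosen, and the names `build_groups` and `submitter_messages` are hypothetical.

```python
from typing import Dict, List


def build_groups(peers: List[str], cmax: int) -> Dict[str, List[str]]:
    """Partition peers into groups of at most `cmax` members.

    Returns a mapping coordinator -> members (coordinator included).
    """
    if cmax < 1:
        raise ValueError("cmax must be at least 1")
    groups: Dict[str, List[str]] = {}
    for start in range(0, len(peers), cmax):
        members = peers[start:start + cmax]
        groups[members[0]] = members
    return groups


def submitter_messages(n_peers: int, cmax: int) -> int:
    """Number of destinations the submitter contacts: one per coordinator."""
    return -(-n_peers // cmax)  # ceiling division


if __name__ == "__main__":
    peers = [f"p{i}" for i in range(10)]
    for coordinator, members in build_groups(peers, cmax=4).items():
        print(coordinator, members)
    print(submitter_messages(256, 32))  # 8 coordinators instead of 256 peers
```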

- Dynamic application repository:
  - Applications: compiled independently of P2PDC as dynamic libraries (.so on Linux or .dll on Windows).
  - Loaded dynamically at runtime
- File transfer:
  - Automatic transfer of the input data file from the submitter to the workers and of the results file from the workers to the submitter.
  - 2 types of files transferred from the submitter to the workers:
    - Common file: transferred via the hierarchical allocation architecture.
    - Private file: transferred directly from the submitter to the workers.

- Introduced in x-kernel
- A micro-protocol: building block implementing a functionality
  - Communication: Synchronous, Asynchronous.
  - Fragmentation: FixedSize, Resize.
  - Reliability: Retransmission, PositiveAck, NegativeAck, DuplicateAck.
  - Order: LossyFifo, ReliableFifo, etc.
  - Congestion control: NewRenoTCP Congestion Control, etc.
- Protocol: composition of micro-protocols

→ Reuse code, facilitate design, configure dynamically

- Flexible, efficient.
- Two grain levels:
  - Composite protocols: individual protocols made of micro-protocols.
  - Protocol stack: composite protocols layered on top of each other.
- Reconfiguration: substitution of protocols or micro-protocols.
- Event-based framework:
  - Events: state changes, e.g. arrival of messages.
  - Micro-protocol: a collection of event handlers.
  - Event handler: procedure-like segment of code bound to an event.
  - An event occurs → execution of all handlers bound to that event.
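The event-based model can be sketched in a few lines. The classes and method names below (`MicroProtocol`, `CompositeProtocol`, `raise_event`, and the event name `segment_from_net`) are hypothetical and only illustrate the idea of handlers bound to events and of run-time reconfiguration; they are not the API of the actual framework.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], None]


class MicroProtocol:
    """A named collection of event handlers (hypothetical API)."""

    def __init__(self, name: str, handlers: Dict[str, Handler]) -> None:
        self.name = name
        self.handlers = handlers


class CompositeProtocol:
    """Keeps the bindings between events and the handlers of its micro-protocols."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[Tuple[str, Handler]]] = {}

    def add(self, mp: MicroProtocol) -> None:
        for event, handler in mp.handlers.items():
            self._bindings.setdefault(event, []).append((mp.name, handler))

    def remove(self, name: str) -> None:
        for event in self._bindings:
            self._bindings[event] = [b for b in self._bindings[event] if b[0] != name]

    def substitute(self, old: str, new: MicroProtocol) -> None:
        """Reconfiguration: replace one micro-protocol by another."""
        self.remove(old)
        self.add(new)

    def raise_event(self, event: str, message: dict) -> None:
        """An event occurs: run every handler bound to it, in binding order."""
        for _, handler in list(self._bindings.get(event, [])):
            handler(message)


trace: List[str] = []
positive_ack = MicroProtocol(
    "positive_ack", {"segment_from_net": lambda m: trace.append(f"ack {m['seq']}")})
reliable_fifo = MicroProtocol(
    "reliable_fifo", {"segment_from_net": lambda m: trace.append(f"deliver {m['seq']}")})

protocol = CompositeProtocol()
protocol.add(positive_ack)
protocol.add(reliable_fifo)
protocol.raise_event("segment_from_net", {"seq": 1})  # both handlers run
protocol.remove("positive_ack")                        # reconfiguration at run time
protocol.raise_event("segment_from_net", {"seq": 2})  # only the remaining handler runs
print(trace)
```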

- CTP protocol:
  - Configurable Transport Protocol
  - Includes a wide range of micro-protocols

- Example of scenario

- Some modifications to Cactus:
  - Concurrent handler execution (multicore machines)
  - Elimination of unnecessary copies between layers (use of pointers)
  - Operation for removing micro-protocols

[Figure: protocol architecture diagram. Labels: API, transport layer, physical layer, data transfer; manages session opening and closure; captures context information; reconfigures data channel; coordinates machines; stack of composite transport protocols (CTP*).]

Design of an Infiniband physical layer composite protocol

[Figure labels: machine, Host Channel Adapter, Infiniband network, CTP*/IB/Verbs/Infiniband, communication operations.]

Context: Synchronous scheme (Intra cluster, Inter cluster); Asynchronous scheme (Intra cluster, Inter cluster)
Protocol columns: IB Eth IB Eth | IB Eth IB Eth

synchronous:      X X X X
asynchronous:     X X X X
transportDriver:  X X X X X X X X
resize:           X X X X X X X X
sequencedSeg.:    X X
positiveAck:      X X
retransmit:       X X
reliableFIFO:     X X
duplicateAck:     X
DCCPAck:          X
TCPconavoid:      X

[Figure labels: search; write the data segment; retrieve the result of the Send Work Request; release the block.]

[Figure labels: receive buffer; delivering the segment; copy data; calculates position of the block.]

- Obstacle problem:
  - Numerical simulation problem
  - Occurs in mechanics.
  - Occurs as a sub-problem in financial mathematics, e.g. option pricing.
- Problem formulation

- Fixed point problem
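As a reminder, an obstacle problem is commonly written as a variational inequality with an equivalent fixed-point form. The notation below ($A$, $f$, $\phi$, $K$, $P_K$, $\delta$) is generic textbook notation and is an assumption here; it is not necessarily the notation of the slides.

```latex
% Obstacle problem as a variational inequality on the closed convex set K
\text{Find } u^{*} \in K \text{ such that } \quad
\langle A u^{*} - f,\; v - u^{*} \rangle \ge 0 \quad \forall v \in K,
\qquad K = \{ v \mid v \ge \phi \}.

% Equivalent fixed-point problem, for any \delta > 0,
% where P_K is the orthogonal projection onto K
u^{*} = F(u^{*}), \qquad F(u) = P_K\bigl(u - \delta\,(A u - f)\bigr).
```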

- 3D obstacle problem
- Parallel projected Richardson method.
- Pillar decomposition:
  - Reduces the size of the messages exchanged between workers.
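To make the projected Richardson iteration concrete, here is a minimal sequential sketch on a one-dimensional obstacle problem. It only illustrates the update u ← P_K(u − δ(Au − f)), not the parallel 3D method or the pillar decomposition; the discretization, the step size δ = h²/2 and the test data are assumptions chosen for the example.

```python
from typing import List, Tuple


def projected_richardson(f: List[float], phi: List[float], h: float,
                         tol: float = 1e-10,
                         max_iter: int = 100_000) -> Tuple[List[float], int]:
    """Solve a 1D obstacle problem with A = tridiag(-1, 2, -1) / h^2.

    Iterates u <- max(phi, u - delta * (A u - f)) with zero boundary values.
    Returns the final iterate and the number of iterations performed.
    """
    n = len(f)
    delta = h * h / 2.0  # satisfies 0 < delta < 2 / lambda_max(A), lambda_max(A) < 4 / h^2
    u = [max(0.0, p) for p in phi]  # starting point inside K = {v >= phi}
    for iteration in range(1, max_iter + 1):
        new = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0       # homogeneous Dirichlet boundary
            right = u[i + 1] if i < n - 1 else 0.0
            residual = (2.0 * u[i] - left - right) / (h * h) - f[i]  # (A u - f)_i
            new.append(max(phi[i], u[i] - delta * residual))         # projection onto K
        change = max(abs(a - b) for a, b in zip(new, u))
        u = new
        if change < tol:
            return u, iteration
    return u, max_iter


if __name__ == "__main__":
    n = 19
    h = 1.0 / (n + 1)
    load = [-8.0] * n        # pushes the solution down toward the obstacle
    obstacle = [-0.5] * n    # flat obstacle
    solution, iterations = projected_richardson(load, obstacle, h)
    print(iterations, min(solution))
```

Without the obstacle, the solution of this test case would reach −1 at the midpoint, so the constraint is active in the middle of the interval.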

- Fixed point problem:

- Successive approximation method:

- Synchronous distributed iterations

- Asynchronous distributed iterations
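For reference, these notions are usually stated as follows in the literature on asynchronous iterations. The notation (block index $i$, number of blocks $\alpha$, update sets $s(p)$, delay functions $\rho_j$) is the customary one and is an assumption here, not a transcription of the slides.

```latex
% Fixed-point problem on a product space, u = (u_1, \dots, u_\alpha)
u^{*} = F(u^{*}).

% Successive approximation; synchronous distributed iterations update
% every block from the same iterate u^p
u^{p+1} = F(u^{p}), \qquad p = 0, 1, \dots

% Asynchronous distributed iterations: only the blocks in s(p) are updated,
% possibly from delayed values of the other blocks
u_i^{p+1} =
\begin{cases}
F_i\bigl(u_1^{\rho_1(p)}, \dots, u_\alpha^{\rho_\alpha(p)}\bigr) & \text{if } i \in s(p),\\[2pt]
u_i^{p} & \text{otherwise,}
\end{cases}

% with 0 \le \rho_j(p) \le p, \ \lim_{p \to \infty} \rho_j(p) = +\infty,
% and every block i belonging to s(p) infinitely often.
```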

- Asynchronous model

- Implementation of the termination method of Bertsekas and Tsitsiklis for the asynchronous scheme of computation

- Grid’5000 platform: 2970 processors with a total of 6906 cores distributed over 9 sites in France
- Considered problem:
  - n=256 (17,000,000 points)
  - 8 clusters in 5 sites
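The point counts quoted for the experiments are consistent with a cubic grid of n points per dimension, which is an assumption based on the 3D problem being described as 256x256x256. A quick check:

```python
def grid_points(n: int) -> int:
    """Number of discretization points of an n x n x n grid."""
    return n ** 3


if __name__ == "__main__":
    for n in (128, 192, 256):
        print(n, grid_points(n))
    # 128 -> 2097152, 192 -> 7077888, 256 -> 16777216
```

These values round to the 2,100,000, 7,100,000 and 17,000,000 points quoted for n=128, n=192 and n=256.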

- Ethernet
- Grid’5000 platform: 256 peers in 5 sites (Lyon, Nancy, Orsay, Sophia, Toulouse)
- Problem: n=256 (17,000,000 points)

- Ethernet vs Infiniband
- Grid’5000 platform: Graphene cluster, Nancy
- Intel Xeon X3440, Gigabit Ethernet, Infiniband-20G
- Problem: n=256 (17,000,000 points)

- Ethernet vs Myrinet
- Grid’5000 platform: Helios cluster, Sophia
- Sun X4100, Gigabit Ethernet, Myri-2G
- Problem: n=128 (2,100,000 points)

- Considered problem:
  - 3D obstacle problem with size 256x256x256
  - Grid’5000 platform: cluster Sagittaire at Lyon and cluster gdx at Orsay
- Coordinator replication overhead: negligible
- Worker checkpointing and recovery overhead

- Influence of worker failures on computational time

- PlanetLab
- 24 machines from 12 sites (4 in the US, 8 in Europe)
- Latency: 0.1 ms (same site), 30 ms up to 330 ms (different sites)
- Problem: n=192 (7,100,000 points)

- Black-Scholes equation, European options
- Grid’5000, 128 peers in two sites (Lille, Orsay)
- Problem: n=256 (17,000,000 points)

- Communication protocol P2PSAP for P2P HPC.
- Decentralized and robust environment P2PDC for P2P HPC.
- Asynchronous and hybrid iterative schemes.
- Combination of asynchronous iterative schemes and P2PDC → very efficient.
- Experiments with fast networks like Infiniband and Myrinet
- Obstacle problem and financial applications

- Improvements on codes and protocols → better efficiency.
- Tests on mixed networks: Infiniband, Myrinet and Ethernet.
- Tests of hybrid methods on fast networks.
- Large-scale application deployment, e.g. on PlanetLab.
- Web portal for P2PDC application deployment.
- Combination of P2PDC with GPUs.
- Other applications: logistics, process engineering…

- Grid’5000:
  - Sequential time


- Definition

Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the application. (wiki)

P2P is a class of applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy from central servers. (Oram 2001)

The term peer-to-peer refers to a class of systems and applications that use distributed resources to perform a function in a decentralized manner. (Dejan 2003)

=> all participants play a similar role

Characteristics:

- Decentralization
- Scalability
- Transparency
- Robustness
- Performance

Architectures:

[Figure panels: Decentralization, Centralization, Hybrid.]

- Grid computing:
  - Makes use of supercomputers, clusters and pools of workstations owned by universities and research labs, interconnected by high-bandwidth network links in order to form a virtual supercomputer.
  - Resources: turned on all the time.
  - Middleware: Globus, Condor, etc.
  - Characteristics:
    - Limited access.
    - Inflexible

- Global computing:
  - Makes use of the idle computing power of volunteer computers or institutional computers connected to the Internet in order to solve large-granularity applications.
  - Projects SETI@home, GENOME@home => BOINC framework
  - XtremWeb
  - Characteristics:
    - Centralized architecture => single point of failure
    - Direct communication: not yet supported

- Peer-to-peer high performance computing:
  - All participants, i.e. peers, can carry out their application.
- P2P environments:
  - JNJI
  - OurGrid
  - ParCop
  - MapReduce
  - Vishwa
  - P2P-MPI

- Centralized architecture
- Bag-of-tasks applications: neither synchronization nor communication.
- Java: not efficient for HPC applications
- Asynchronous iterative algorithms are not considered.

- Convergence of asynchronous iterations:
  - Linear case: Chazan 1969
  - Nonlinear case: El Baz 1990, El Baz 1994
  - Nonlinear case for H-accretive mappings: Miellou 1985b
  - Nonlinear fixed point problems: Miellou 1975, Baudet 1978
  - Sub-structuring methods: Venet 2010
  - Convergence theorem of Bertsekas: Bertsekas 1983, Bertsekas 1989
  - Bounded delay iterations: Lubachevsky 1986
  - Multisplitting methods: Frommer 1997, Szyld 1998
  - Order intervals: El Baz 1996b, Miellou 1998, El Baz 1998

- Convergence detection and termination of asynchronous iterations:
  - Empirical methods: Bertsekas 1989, Miellou 1989, Chajakis 1991
  - Method of Bertsekas and Tsitsiklis: Bertsekas 1989, Bertsekas 1991
  - Method of Savari and Bertsekas: Savari 1996
  - Method of level sets: El Baz 1996a, El Baz 1998
  - Other termination methods:
    - Supervised termination: Savari 1996
    - Use of a secondary algorithm: Miellou 1975, Miellou 1990


- Great development of peer-to-peer applications:
  - File sharing, video, ...
  - Recent advances in microprocessor architecture and high-bandwidth networks → new applications like distributed HPC computing/computing on the Internet.
- Challenges:
  - Scalability
  - Heterogeneity
  - Volatility
  - Fault tolerance
  - Existing protocols (TCP, UDP, etc.) are not well suited to HPC.


- Protocol composition framework → deployment of architecture
  - Hierarchical model:
    - Stack of protocols
    - x-kernel, APPIA
  - Nonhierarchical model:
    - No particular order
    - SAMOA
  - Hybrid model:
    - Combination of the two models
    - Cactus, FPTP

[Figures: x-kernel, FPTP.]