TRANSCRIPT
Association Aristote seminar
1. Introduction
2. The decentralized environment P2PDC
3. The communication protocol P2PSAP
4. Numerical problem and distributed algorithms
5. Computational experiments
6. Conclusions and perspectives
! Need for intensive computation:
" Numerical simulation, optimization
" Simulation of complex systems, e.g. aircraft and space vehicles
" Meteorology
" Telecommunications
! Supercomputers, clusters, grids:
" Expensive
" Limited access
" Inflexible
! Important evolution of network technologies.
! Convergence of parallel and distributed computing.
! New concepts, distributed HPC applications: solution of numerical simulation problems on networks of processors like peer-to-peer networks.
! Use of parallel iterative methods.
! Fast data exchanges.
! Great development of peer-to-peer applications:
" File sharing, video, ...
! Challenges:
" Scalability
" Heterogeneity
" Volatility
" Fault tolerance
" Existing protocols (TCP, UDP, etc.) are not well suited to HPC.
! Existing P2P environments:
" Centralized architecture
" Bag-of-tasks applications: neither synchronization nor communication between peers.
" Parallel iterative methods and recursive methods are not considered.
" Java: not efficient for HPC applications.
! Coordinated by Dr. Didier El Baz, LAAS-CNRS.
! Duration: January 2008 to December 2011.
! Goals: proposing innovative tools and demonstrators for the implementation of HPC applications, e.g. numerical simulation and optimization, on peer-to-peer networks.
! 3 sub-projects:
" P2PDC: designing an environment for peer-to-peer HPC (LAAS);
" dPerf: designing a simulation tool for performance prediction of peer-to-peer applications (Femto-ST, University of Franche-Comté);
" P2PDem: building demonstrators for applicative challenges:
# P2PRo: complex combinatorial problems, e.g. cutting stock problems (University of Picardie, LAAS-CNRS);
# P2PSimul: numerical simulation applications, e.g. financial applications, process engineering (ENSEEIHT-IRIT, LAAS).
! Reduced set of communication operations:
" P2P_Send, P2P_Receive, P2P_Wait
" Programmer selects the scheme of computation.
" Communication mode: determined by the protocol according to the context at application level and transport level.
=> Facilitate programming, hide complexity.
! Application programming model:
" 3 functions:
# Task_Definition()
# Calculate()
# Results_Aggregation()
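This three-function model can be sketched with a toy mock-up (illustrative Python, not the P2PDC API; the partial-sum task and the `run` driver are invented for the example, and message exchange via P2P_Send/P2P_Receive is replaced by plain function calls):

```python
# Toy mock-up of the P2PDC application programming model (illustrative
# Python). The three user-supplied functions mirror Task_Definition(),
# Calculate() and Results_Aggregation().

def task_definition(data, n_workers):
    """Split the global problem into one subtask per peer
    (toy split: assumes len(data) is a multiple of n_workers)."""
    chunk = len(data) // n_workers
    return [data[i * chunk:(i + 1) * chunk] for i in range(n_workers)]

def calculate(subtask):
    """Per-peer computation: here, a toy partial sum."""
    return sum(subtask)

def results_aggregation(partials):
    """Combine the subtask results on the submitter."""
    return sum(partials)

def run(data, n_workers):
    subtasks = task_definition(data, n_workers)   # submitter side
    partials = [calculate(t) for t in subtasks]   # worker side
    return results_aggregation(partials)          # submitter side
```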
! Decentralized environment (P2PDC) for peer-to-peer HPC.
! Based on a self-adaptive communication protocol (P2PSAP) dedicated to P2P HPC applications.
! Server: manages information regarding tracker connection/disconnection.
! Tracker: manages information regarding a set of peers, called a zone.
! Peers: donors of computational resources.
! Tracker topology:
" Line topology
" Ni: set of closest trackers of tracker Ti
# |Ni|/2 trackers having an IP address greater than the IP of Ti
# |Ni|/2 trackers having an IP address smaller than the IP of Ti
" Each tracker maintains 2 connections, with the 2 closest trackers on its left and right sides.
! IP-based proximity metric
! Peer collection:
" The submitter joins the overlay network.
" The submitter sends a request with peer requirements to its tracker.
" The tracker filters the connected peers in its zone that satisfy the requirements of the request and sends the addresses of these peers back to the submitter.
" If the number of peers collected by this tracker is not sufficient, the submitter requests peers from the trackers in its local tracker list.
" If the number of collected peers is still not sufficient after requests have been sent to all trackers in its local tracker list, the submitter requests more tracker addresses from the two farthest trackers on the two sides of its local tracker list and requests peers from the newly received trackers.
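The first two escalation steps can be sketched as follows (illustrative Python; the `Tracker` class and method names are assumptions, not the P2PDC implementation, and the third step of discovering additional trackers is omitted for brevity):

```python
# Sketch of peer collection with escalation to the local tracker list.

class Tracker:
    def __init__(self, peers):
        self.peers = peers

    def matching_peers(self, requirements):
        # Filter the connected peers of this zone against the request.
        return [p for p in self.peers if requirements(p)]

def collect_peers(my_tracker, other_trackers, n_wanted, requirements):
    peers = my_tracker.matching_peers(requirements)
    for t in other_trackers:        # escalate only while short of peers
        if len(peers) >= n_wanted:
            break
        peers += t.matching_peers(requirements)
    return peers[:n_wanted]
```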
! Creation of groups with coordinators.
! Cmax: maximum number of peers in a group.
! The submitter sends subtasks to group coordinators; subtasks are then sent by the coordinators to the peers.
! Subtask results are sent in the reverse direction, i.e. peers send their subtask results to their coordinator, then the coordinator transfers the results to the submitter.
=> More efficient: avoids a bottleneck at the submitter.
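The grouping step can be sketched as follows (illustrative Python; taking the first peer of each group as coordinator is an assumption, not necessarily how P2PDC picks coordinators):

```python
# Sketch of hierarchical allocation: split peers into groups of at most
# cmax members and designate one coordinator per group.

def form_groups(peers, cmax):
    groups = [peers[i:i + cmax] for i in range(0, len(peers), cmax)]
    return [(group[0], group) for group in groups]  # (coordinator, members)
```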
! Dynamic application repository:
" Applications are compiled independently with P2PDC as dynamic libraries (.so on Linux, .dll on Windows).
" Loaded dynamically at runtime.
! File transfer:
" Input data files are transferred automatically from the submitter to the workers, and result files from the workers to the submitter.
" 2 types of file are transferred from submitter to workers:
# Common files: transferred via the hierarchical allocation architecture.
# Private files: transferred directly from the submitter to each worker.
! Introduced in the x-kernel.
! A micro-protocol: a building block implementing a single functionality.
" Communication: Synchronous, Asynchronous
" Fragmentation: FixedSize, Resize
" Reliability: Retransmission, PositiveAck, NegativeAck, DuplicateAck
" Order: LossyFifo, ReliableFifo, etc.
" Congestion control: NewReno TCP congestion control, etc.
! Protocol: a composition of micro-protocols.
=> Reuse code, facilitate design, configure dynamically.
! Flexible, efficient.
! Two grain levels:
" Composite protocols: individual protocols made of micro-protocols.
" Protocol stack: composite protocols layered on top of each other.
! Reconfiguration: substitution of protocols or micro-protocols.
! Event-based framework:
" Events: state changes, e.g. arrival of messages.
" Micro-protocol: a collection of event handlers.
" Event handler: procedure-like segment of code bound to an event.
" When an event occurs => execution of all handlers bound to that event.
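The binding mechanism can be sketched as a minimal analogue (hypothetical Python, not the actual Cactus/x-kernel code):

```python
# Minimal sketch of the event-based framework: micro-protocols bind
# handlers to named events; raising an event runs every bound handler
# in binding order.

class EventFramework:
    def __init__(self):
        self.handlers = {}                  # event name -> handler list

    def bind(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def raise_event(self, event, *args):
        for handler in self.handlers.get(event, []):
            handler(*args)
```

Raising a message-arrival event would, for instance, run both a reassembly handler and an acknowledgement handler contributed by two different micro-protocols.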
! Example of scenario.
! Some modifications to Cactus:
" Concurrent handler execution (multicore machines)
" Elimination of unnecessary copies between layers (use of pointers)
" An operation for removing micro-protocols
[Figure: P2PSAP socket architecture. Below the API, a control layer manages session opening and closure, captures context information, reconfigures the data channel and coordinates machines; the data channel handles data transfer through a stack of composite transport protocols (CTP*) between the transport layer and the physical layer.]
[Figure: design of an Infiniband physical layer composite protocol (CTP*/IB/Verbs/Infiniband). Communication operations pass through each machine's Host Channel Adapter onto the Infiniband network.]
Micro-protocol composition according to context (IB = Infiniband, Eth = Ethernet):

                        Synchronous scheme    | Asynchronous scheme
                        Intra      Inter      | Intra      Inter
Micro-protocol          IB   Eth   IB   Eth   | IB   Eth   IB   Eth
Synchronous             X    X     X    X     |
Asynchronous                                  | X    X     X    X
TransportDriver         X    X     X    X     | X    X     X    X
Resize                  X    X     X    X     | X    X     X    X
SequencedSegment                        X     |                 X
PositiveAck                             X     |                 X
Retransmit                              X     |                 X
ReliableFIFO                            X     |                 X
DuplicateAck                                  |                 X
DCCPAck                                       |                 X
TCPCongestionAvoidance                  X     |
! Obstacle problem:
" A numerical simulation problem.
" Occurs in mechanics.
" Occurs as a sub-problem in financial mathematics, e.g. option pricing.
! Problem formulation.
! Fixed point problem.
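The equations on the slide are not reproduced in the transcript; in the standard complementarity form (with obstacle φ and right-hand side f) the problem and the associated fixed point mapping read:

```latex
% Obstacle problem in complementarity form, and the fixed point mapping
% used by projected iterative methods.
\text{Find } u \text{ with } u \ge \varphi,\qquad
-\Delta u - f \ge 0,\qquad
(u-\varphi)\,(-\Delta u - f) = 0 \ \text{in } \Omega,\qquad
u = 0 \ \text{on } \partial\Omega ;
\qquad\text{after discretization: } u^{*} = F(u^{*}),\quad
F(v) = \max\bigl(\varphi,\; v - \lambda\,(A v - b)\bigr),
```

where A and b come from the discretization and the componentwise max is the projection onto the convex set {v : v ≥ φ}.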
! 3D obstacle problem.
! Parallel projected Richardson method.
! Pillar decomposition:
" Reduces the size of messages exchanged between workers.
! Fixed point problem.
! Successive approximation method.
! Synchronous distributed iterations.
! Asynchronous distributed iterations.
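The difference between the two schemes can be illustrated on a toy contracting fixed point map u = Mu + c (illustrative Python; in the asynchronous scheme each update uses whatever component values are currently available, emulated here by in-place updates of randomly chosen components):

```python
# Synchronous vs asynchronous fixed point iterations on a contraction
# F(u) = M u + c; both converge to the same fixed point.

import random

M = [[0.0, 0.3], [0.2, 0.0]]
c = [1.0, 1.0]

def f_component(u, i):
    return sum(M[i][j] * u[j] for j in range(len(u))) + c[i]

def synchronous(u, iters):
    # All components updated together from the previous iterate.
    for _ in range(iters):
        u = [f_component(u, i) for i in range(len(u))]
    return u

def asynchronous(u, iters, seed=0):
    # One randomly chosen component updated at a time, in place,
    # always using the freshest available values (no global sync).
    rng = random.Random(seed)
    u = list(u)
    for _ in range(iters):
        i = rng.randrange(len(u))
        u[i] = f_component(u, i)
    return u
```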
! Implementation of the termination method of Bertsekas and Tsitsiklis for the asynchronous scheme of computation.
! Grid’5000 platform: 2970 processors with a total of 6906 cores distributed over 9 sites in France.
! Considered problem:
" n=256 (17,000,000 points)
" 8 clusters in 5 sites
! Ethernet
! Grid’5000 platform: 256 peers in 5 sites (Lyon, Nancy, Orsay, Sophia, Toulouse)
! Problem: n=256 (17,000,000 points)
! Ethernet vs Infiniband
! Grid’5000 platform: Graphene cluster, Nancy
! Intel Xeon X3440, Gigabit Ethernet, Infiniband-20G
! Problem: n=256 (17,000,000 points)
! Ethernet vs Myrinet
! Grid’5000 platform: Helios cluster, Sophia
! Sun X4100, Gigabit Ethernet, Myri-2G
! Problem: n=128 (2,100,000 points)
! Considered problem:
" 3D obstacle problem with size 256x256x256
" Grid’5000 platform: cluster Sagittaire at Lyon and cluster gdx at Orsay
! Coordinator replication overhead: negligible.
! Worker checkpointing and recovery overhead.
! PlanetLab
! 24 machines from 12 sites (4 in the US, 8 in Europe)
! Latency: 0.1 ms (same site), 30 ms up to 330 ms (different sites)
! Problem: n=192 (7,100,000 points)
! Black-Scholes equation, European options
! Grid’5000 platform: 128 peers in two sites (Lille, Orsay)
! Problem: n=256 (17,000,000 points)
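For European options the Black-Scholes equation has a closed-form solution, which gives a reference value that numerical runs can be checked against (illustrative Python; the parameter values below are arbitrary, not from the talk):

```python
# Closed-form Black-Scholes price of a European call option:
# C = S N(d1) - K e^{-rT} N(d2).

from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s, k, r, sigma, t):
    d1 = (log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)
```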
! Communication protocol P2PSAP for P2P HPC.
! Decentralized and robust environment P2PDC for P2P HPC.
! Asynchronous and hybrid iterative schemes.
! Combination of asynchronous iterative schemes and P2PDC => very efficient.
! Experiments with fast networks like Infiniband and Myrinet.
! Obstacle problem and financial applications.
! Improvements to codes and protocols => better efficiency.
! Tests on mixed networks: Infiniband, Myrinet and Ethernet.
! Tests of hybrid methods on fast networks.
! Large scale application deployment, e.g. PlanetLab.
! Web portal for P2PDC application deployment.
! Combination of P2PDC with GPUs.
! Other applications: logistics, process engineering, ...
! Definition:
Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the application. (Wikipedia)
P2P is a class of applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy from central servers. (Oram 2001)
The term peer-to-peer refers to a class of systems and applications that use distributed resources to perform a function in a decentralized manner. (Dejan 2003)
=> All participants play a similar role.
! Grid computing:
" Makes use of supercomputers, clusters and pools of workstations owned by universities and research labs, inter-connected by high bandwidth network links in order to form a super virtual computer.
" Resources: switched on all the time.
" Middleware: Globus, Condor, etc.
" Characteristics:
# Limited access.
# Inflexible.
! Global computing:
" Makes use of the idle computing power of volunteer or institutional computers connected to the Internet in order to solve large granularity applications.
" Projects: SETI@home, GENOME@home => BOINC framework; XtremWeb.
" Characteristics:
# Centralized architecture => single point of failure.
# Direct communication: not yet supported.
! Peer-to-peer high performance computing:
" All participants, i.e. peers, can carry out their own applications.
! P2P environments:
" JNGI
" OurGrid
" ParCop
" MapReduce
" Vishwa
" P2P-MPI
" Centralized architecture " Bag-of-tasks applications: neither synchronization nor communication. " Java: non efficient for HPC applications " Asynchronous iterative algorithms are not considered
! Convergence of asynchronous iterations:
" Linear case: Chazan 1969
" Nonlinear case: El Baz 1990, El Baz 1994
" Nonlinear case for H-accretive mappings: Miellou 1985b
" Nonlinear fixed point problems: Miellou 1975, Baudet 1978
" Sub-structuring methods: Venet 2010
" Convergence theorem of Bertsekas: Bertsekas 1983, Bertsekas 1989
" Bounded delay iterations: Lubachevsky 1986
" Multisplitting methods: Frommer 1997, Szyld 1998
" Order intervals: El Baz 1996b, Miellou 1998, El Baz 1998
! Convergence detection and termination of asynchronous iterations:
" Empirical methods: Bertsekas 1989, Miellou 1989, Chajakis 1991
" Method of Bertsekas and Tsitsiklis: Bertsekas 1989, Bertsekas 1991
" Method of Savari and Bertsekas: Savari 1996
" Method of level sets: El Baz 1996a, El Baz 1998
" Other termination methods:
# Supervised termination: Savari 1996
# Use of secondary algorithm: Miellou 1975, Miellou 1990
! Great development of peer-to-peer applications:
" File sharing, video, ...
" Recent advances in microprocessor architecture and high bandwidth networks → new applications like distributed HPC / computing on the Internet.
! Challenges:
" Scalability
" Heterogeneity
" Volatility
" Fault tolerance
" Existing protocols (TCP, UDP, etc.) are not well suited to HPC.