branch-and-bound with peer-to-peer for large-scale grid

198
Branch-and-Bound with Peer-to-Peer for Large-Scale Grids Alexandre di Costanzo INRIA - CNRS - I3S - Université de Sophia Antipolis Ph.D. defense Friday 12th October 2007 Advisor: Prof. Denis Caromel

Upload: others

Post on 24-Oct-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound with Peer-to-Peerfor Large-Scale Grids

Alexandre di Costanzo

INRIA - CNRS - I3S - Université de Sophia Antipolis

Ph.D. defenseFriday 12th October 2007

Advisor: Prof. Denis Caromel

Page 2: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Big Picture

2

Page 3: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Big PictureObjective

2

Page 4: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Big PictureObjective

Solving combinatorial optimization problems with Grids

2

Page 5: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Big PictureObjective

Solving combinatorial optimization problems with Grids

Approach

2

Page 6: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Big PictureObjective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

2

Page 7: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Big PictureObjective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

Contributions

2

Page 8: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Big PictureObjective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

Contributions

1. Branch-and-Bound framework for Grids

2

Page 9: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Big PictureObjective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

Contributions

1. Branch-and-Bound framework for Grids

2. Peer-to-Peer Infrastructure for Grids

2

Page 10: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Big PictureObjective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

Contributions

1. Branch-and-Bound framework for Grids

2. Peer-to-Peer Infrastructure for Grids

3. Large-scale experiments2

Page 11: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Agenda

Context, Problem, and Related Work

Contributions

Branch-and-Bound Framework for Grids

Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters

Perspectives & Conclusion

3

Page 12: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Context

4

Page 13: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Context

Combinatorial Optimization Problems (COPs)

costly to solve (finding the best solution)

4

Page 14: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Context

Combinatorial Optimization Problems (COPs)

costly to solve (finding the best solution)

Branch-and-Bound (BnB)

well adapted for solving COPs

relatively easy to provide parallel version

4

Page 15: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Context

Combinatorial Optimization Problems (COPs)

costly to solve (finding the best solution)

Branch-and-Bound (BnB)

well adapted for solving COPs

relatively easy to provide parallel version

Grid Computing

large pool of resources

large-scale environment

4

Page 16: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

5

Page 17: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

5

Consists of a partial enumeration of all feasible solutions and returns the guaranteed

optimal solution

Page 18: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

5

Consists of a partial enumeration of all feasible solutions and returns the guaranteed

optimal solution

Page 19: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

5

Consists of a partial enumeration of all feasible solutions and returns the guaranteed

optimal solution

Page 20: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

5

Consists of a partial enumeration of all feasible solutions and returns the guaranteed

optimal solution

Page 21: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

5

Consists of a partial enumeration of all feasible solutions and returns the guaranteed

optimal solution

Page 22: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

Pruning: eliminate bad branches

5

Consists of a partial enumeration of all feasible solutions and returns the guaranteed

optimal solution

Page 23: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

Pruning: eliminate bad branches

Well adapted for solving COPs [Papadimitriou 98]

5

Consists of a partial enumeration of all feasible solutions and returns the guaranteed

optimal solution

Page 24: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

Pruning: eliminate bad branches

Well adapted for solving COPs [Papadimitriou 98]

return the best combinaison out of all5

Consists of a partial enumeration of all feasible solutions and returns the guaranteed

optimal solution

Page 25: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

Pruning: eliminate bad branches

Well adapted for solving COPs [Papadimitriou 98]

return the best combinaison out of all5

Consists of a partial enumeration of all feasible solutions and returns the guaranteed

optimal solution

Branching: split in sub-problems

Bounding: compute local lower/upper bounds

Pruning: local lower bound higher than the global one

Not generated and explored parts of the tree

Page 26: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Parallel Branch-and-Bound

6

Page 27: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Parallel Branch-and-BoundCOPs are difficult to solve:

enumeration size & NP-hard class

6

Page 28: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Parallel Branch-and-BoundCOPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems

tree-based: building the tree in parallelmulti-search: several trees are generated in parallel

6

Page 29: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Parallel Branch-and-BoundCOPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems

tree-based: building the tree in parallelmulti-search: several trees are generated in parallel

Tree-based is the most studied [Authié 95 , Crainic 06]

the solution tree is not known beforehand

no part of the tree may be estimated at compilation

tasks are dynamically generatedtask allocations to processors must be done dynamically

distributing issues, such as load-balancing & information sharing

6

Page 30: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Parallel Branch-and-BoundCOPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems

tree-based: building the tree in parallelmulti-search: several trees are generated in parallel

Tree-based is the most studied [Authié 95 , Crainic 06]

the solution tree is not known beforehand

no part of the tree may be estimated at compilation

tasks are dynamically generatedtask allocations to processors must be done dynamically

distributing issues, such as load-balancing & information sharing

Sharing a global bound for optimizing the prune operation

6

Page 31: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Parallel Branch-and-BoundCOPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems

tree-based: building the tree in parallelmulti-search: several trees are generated in parallel

Tree-based is the most studied [Authié 95 , Crainic 06]

the solution tree is not known beforehand

no part of the tree may be estimated at compilation

tasks are dynamically generatedtask allocations to processors must be done dynamically

distributing issues, such as load-balancing & information sharing

Sharing a global bound for optimizing the prune operation

6

Page 32: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Parallel Branch-and-BoundCOPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems

tree-based: building the tree in parallelmulti-search: several trees are generated in parallel

Tree-based is the most studied [Authié 95 , Crainic 06]

the solution tree is not known beforehand

no part of the tree may be estimated at compilation

tasks are dynamically generatedtask allocations to processors must be done dynamically

distributing issues, such as load-balancing & information sharing

Sharing a global bound for optimizing the prune operation

6

Proposition: Parallel BnB + Grid

Tree-based is the most known by users &Related difficulties are known

Page 33: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Related Work

7

Frameworks Algorithms Parallelization Machines PUBB Low-level, Basic B&B SPMD Cluster/PVM BOB++ Low-level, Basic B&B SPMD Cluster/MPI PPBB-Lib Basic B&B SPMD Cluster/PVM PICO Basic B&B, Mixed-integer LP hier. master-worker Cluster/MPI MallBa Low-level, Basic B&B SPMD Cluster/MPI ZRAM Low-level, Basic B&B SPMD Cluster/PVM

ALPS/BiCePS •Low-level•Basic B&B•Mixed-integer LP•Branch&Price&Cut

hier. master-worker Cluster/MPI

Metacomputing MW Basic B&B master-worker Grids/Condor Symphony Mixed-integer LP, Branch&Price&Cut master-worker Cluster/PVM

Page 34: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Related Work

7

Frameworks Algorithms Parallelization Machines PUBB Low-level, Basic B&B SPMD Cluster/PVM BOB++ Low-level, Basic B&B SPMD Cluster/MPI PPBB-Lib Basic B&B SPMD Cluster/PVM PICO Basic B&B, Mixed-integer LP hier. master-worker Cluster/MPI MallBa Low-level, Basic B&B SPMD Cluster/MPI ZRAM Low-level, Basic B&B SPMD Cluster/PVM

ALPS/BiCePS •Low-level•Basic B&B•Mixed-integer LP•Branch&Price&Cut

hier. master-worker Cluster/MPI

Metacomputing MW Basic B&B master-worker Grids/Condor Symphony Mixed-integer LP, Branch&Price&Cut master-worker Cluster/PVM

Most Previous work are based on SPMD and target clustersbetter for sharing bounds

Page 35: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid Computing

8

Page 36: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid ComputingDistributed shared computing infrastructure

multi-institutional virtual organization

8

Page 37: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid ComputingDistributed shared computing infrastructure

multi-institutional virtual organization

Provide large pool of resources

8

Page 38: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid ComputingDistributed shared computing infrastructure

multi-institutional virtual organization

Provide large pool of resources

New challenges

geographically distributed

deployment

scalability

communication

fault-tolerance

multiple administrative domains, heterogeneity, high performance, programming model, etc.

8

Page 39: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid ComputingDistributed shared computing infrastructure

multi-institutional virtual organization

Provide large pool of resources

New challenges

geographically distributed

deployment

scalability

communication

fault-tolerance

multiple administrative domains, heterogeneity, high performance, programming model, etc.

8

Parallel BnB related work: SPMDGrids are not adapted for SPMD (heterogeneity, latency, etc.)

Page 40: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid Computing

9

Grids involve new concepts for development & execution of applications

Page 41: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid Computing

9

Grid FabricSchedulers, Networking, OSFederated Hardware Resources

Grids involve new concepts for development & execution of applications

Page 42: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid Computing

9

Grid FabricSchedulers, Networking, OSFederated Hardware Resources

Grid Middleware Infrastructure

Super-Schedulers, Resource Trading, Information, Security, etc.

Grids involve new concepts for development & execution of applications

Page 43: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid Computing

9

Grid FabricSchedulers, Networking, OSFederated Hardware Resources

Grid Middleware Infrastructure

Super-Schedulers, Resource Trading, Information, Security, etc.

Grid ProgrammingModels, Tools,High-Level Access to Middleware

Grids involve new concepts for development & execution of applications

Page 44: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid Computing

9

Grid Applications & Portals

Grid FabricSchedulers, Networking, OSFederated Hardware Resources

Grid Middleware Infrastructure

Super-Schedulers, Resource Trading, Information, Security, etc.

Grid ProgrammingModels, Tools,High-Level Access to Middleware

Grids involve new concepts for development & execution of applications

Page 45: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid Computing

9

Grid Applications & Portals

Grid FabricSchedulers, Networking, OSFederated Hardware Resources

Grid Middleware Infrastructure

Super-Schedulers, Resource Trading, Information, Security, etc.

Grid ProgrammingModels, Tools,High-Level Access to Middleware

Grids involve new concepts for development & execution of applications

Branch-and-Bound API

Page 46: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Challenges

10

Combinatorial Optimization Problemscostly to solve

Page 47: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Challenges

10

Branch-and-Boundadapted for solving COPs

easy to parallelize

Combinatorial Optimization Problemscostly to solve

Page 48: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Challenges

10

Branch-and-Boundadapted for solving COPs

easy to parallelize

Grid Computinghuge number of resources

Combinatorial Optimization Problemscostly to solve

Page 49: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Challenges

10

Branch-and-Boundadapted for solving COPs

easy to parallelize

Grid Computinghuge number of resources

Combinatorial Optimization Problemscostly to solve

Use Branch-and-Bound on Grids for solving COPs

+

Page 50: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Challenges

10

Branch-and-Boundadapted for solving COPs

easy to parallelize

Grid Computinghuge number of resources

Combinatorial Optimization Problemscostly to solve

Use Branch-and-Bound on Grids for solving COPs

+

Efficient communications with Grids is difficult Problem with sharing bounds

Page 51: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Agenda

Context, Problem, and Related Work

Contributions

Branch-and-Bound Framework for Grids

Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters

Perspectives & Conclusion

11

Page 52: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB for Grids Related Work

12

Page 53: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB for Grids Related Work

Aida et al. focus on the design:hierarchical master-worker scales on Grids

12

Page 54: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB for Grids Related Work

Aida et al. focus on the design:hierarchical master-worker scales on Grids

Foster et al. fully decentralized approach:communication overhead

12

Page 55: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB for Grids Related Work

Aida et al. focus on the design:hierarchical master-worker scales on Grids

Foster et al. fully decentralized approach:communication overhead

ParadisEO focus on meta-heuristics (not exact solution):master-worker

12

Page 56: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB for Grids Related Work

Aida et al. focus on the design:hierarchical master-worker scales on Grids

Foster et al. fully decentralized approach:communication overhead

ParadisEO focus on meta-heuristics (not exact solution):master-worker

12

Skeletons with farm or divide-and-conquerSatin for divide-and-conquer

Page 57: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Page 58: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Page 59: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 60: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 61: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 62: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 63: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 64: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 65: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 66: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 67: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 68: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Page 69: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB on Grids: Problems and Solutions

13

Latency Asynchronous communications

Scalability Hierarchical master-worker

Solution tree size Dynamically generated by splitting tasks

Share the best bounds

Efficient parallelism & communication

Faults Fault-tolerance

Objective: hide Grid difficulties to usersEspecially communication problems

Page 70: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - Entities and Roles

14

Page 71: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - Entities and Roles

Root Task:implemented by usersobjective-function and splitting/branching operation

14

Page 72: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - Entities and Roles

Root Task:implemented by usersobjective-function and splitting/branching operation

Master: Entry Pointsplits the problem in taskscollects partial-results ➟ the best solution

14

Page 73: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - Entities and Roles

Root Task:implemented by usersobjective-function and splitting/branching operation

Master: Entry Pointsplits the problem in taskscollects partial-results ➟ the best solution

Sub-Master:intermediary between master and workers

14

Page 74: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - Entities and Roles

Root Task:implemented by usersobjective-function and splitting/branching operation

Master: Entry Pointsplits the problem in taskscollects partial-results ➟ the best solution

Sub-Master:intermediary between master and workers

Worker:computes tasks

14

Page 75: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 76: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Problem

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 77: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Problem

First splitting

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 78: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

First splitting

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 79: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 80: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Ask task

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 81: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 82: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Ask task

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 83: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 84: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Got task

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 85: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 86: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Got task

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 87: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 88: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

ComputingWorker

Worker

Worker Worker

BnB Framework - Architecture

Page 89: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

Computing

Sending result

Worker

Worker

Worker Worker

BnB Framework - Architecture

Page 90: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

ComputingWorker

Worker

Worker Worker

BnB Framework - Architecture

Page 91: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasksProblem

ComputingWorker

Worker

Worker Worker

BnB Framework - Architecture

Page 92: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

15

Master

Sub-master

Cluster

Worker Worker

Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Cluster

Worker

Worker Worker

Sub-master

Pending tasks Partial resultsProblem

ComputingWorker

Worker

Worker Worker

BnB Framework - Architecture

Page 93: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - Solutions

16

Page 94: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - SolutionsContext ProActive Java Grid middleware:

latency ➟ asynchronous communicationunderlaying Grid infrastructure ➟ deployment framework (abstraction)

16

Page 95: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - SolutionsContext ProActive Java Grid middleware:

latency ➟ asynchronous communicationunderlaying Grid infrastructure ➟ deployment framework (abstraction)

16

Implement the tree-based parallelMaster-worker architectureProblem: workers need to share bounds

difficult to adapt SPMD for Grids (heterogeneity, distribution, etc.)

Page 96: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - SolutionsContext ProActive Java Grid middleware:

latency ➟ asynchronous communicationunderlaying Grid infrastructure ➟ deployment framework (abstraction)

16

Solution 1: Master keeps the bound

Implement the tree-based parallelMaster-worker architectureProblem: workers need to share bounds

difficult to adapt SPMD for Grids (heterogeneity, distribution, etc.)

Page 97: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - SolutionsContext ProActive Java Grid middleware:

latency ➟ asynchronous communicationunderlaying Grid infrastructure ➟ deployment framework (abstraction)

16

Solution 1: Master keeps the bound- previous work shows that not scale [Aida 2003]

Implement the tree-based parallelMaster-worker architectureProblem: workers need to share bounds

difficult to adapt SPMD for Grids (heterogeneity, distribution, etc.)

Page 98: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - SolutionsContext ProActive Java Grid middleware:

latency ➟ asynchronous communicationunderlaying Grid infrastructure ➟ deployment framework (abstraction)

16

Solution 1: Master keeps the bound- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)

Implement the tree-based parallelMaster-worker architectureProblem: workers need to share bounds

difficult to adapt SPMD for Grids (heterogeneity, distribution, etc.)

Page 99: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - SolutionsContext ProActive Java Grid middleware:

latency ➟ asynchronous communicationunderlaying Grid infrastructure ➟ deployment framework (abstraction)

16

Solution 1: Master keeps the bound- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)- Grid middleware dependent / Good for SOA

Implement the tree-based parallelMaster-worker architectureProblem: workers need to share bounds

difficult to adapt SPMD for Grids (heterogeneity, distribution, etc.)

Page 100: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - SolutionsContext ProActive Java Grid middleware:

latency ➟ asynchronous communicationunderlaying Grid infrastructure ➟ deployment framework (abstraction)

16

Solution 1: Master keeps the bound- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)- Grid middleware dependent / Good for SOA Solution 3: Broadcasting

Implement the tree-based parallelMaster-worker architectureProblem: workers need to share bounds

difficult to adapt SPMD for Grids (heterogeneity, distribution, etc.)

Page 101: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - SolutionsContext ProActive Java Grid middleware:

latency ➟ asynchronous communicationunderlaying Grid infrastructure ➟ deployment framework (abstraction)

16

Solution 1: Master keeps the bound- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)- Grid middleware dependent / Good for SOA Solution 3: Broadcasting- 1 to n communication cannot scale

Implement the tree-based parallelMaster-worker architectureProblem: workers need to share bounds

difficult to adapt SPMD for Grids (heterogeneity, distribution, etc.)

Page 102: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework - SolutionsContext ProActive Java Grid middleware:

latency ➟ asynchronous communicationunderlaying Grid infrastructure ➟ deployment framework (abstraction)

16

Solution 1: Master keeps the bound- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)- Grid middleware dependent / Good for SOA Solution 3: Broadcasting- 1 to n communication cannot scale

✓ hierarchical broadcasting scale [Baduel 05]

Implement the tree-based parallelMaster-worker architectureProblem: workers need to share bounds

difficult to adapt SPMD for Grids (heterogeneity, distribution, etc.)

Page 103: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Organizing Communications for Broadcasting

17

Page 104: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

17

Page 105: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

clusters are high-performance communication environments

17

Page 106: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

clusters are high-performance communication environments

17

Page 107: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

clusters are high-performance communication environments

Solution:

add a new entity for organizing communications: Leader

Leader is a Worker chose by the Master for each group

17

Page 108: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

clusters are high-performance communication environments

Solution:

add a new entity for organizing communications: Leader

Leader is a Worker chose by the Master for each group

17

Page 109: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

clusters are high-performance communication environments

Solution:

add a new entity for organizing communications: Leader

Leader is a Worker chose by the Master for each group

Process to update Bounds:

1. the Worker broadcasts the new Bound inside its group

2. the group Leader broadcasts the new Bound to all Leaders

3. each Leader broadcasts the new value inside their groups

17

Page 110: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

BnB framework - Communications for Sharing Bound

Page 111: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

BnB framework - Communications for Sharing Bound

Page 112: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

BnB framework - Communications for Sharing Bound

Page 113: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

BnB framework - Communications for Sharing Bound

Page 114: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

new best bound!

BnB framework - Communications for Sharing Bound

Page 115: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

new best bound!

BnB framework - Communications for Sharing Bound

Page 116: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

new best bound!

BnB framework - Communications for Sharing Bound

Page 117: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

new best bound!

BnB framework - Communications for Sharing Bound

Page 118: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

new best bound!

BnB framework - Communications for Sharing Bound

Page 119: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

18

Cluster

Worker Worker

Worker Leader

Cluster

Worker Leader

Worker Worker

Cluster

Worker Leader

Worker Worker

Cluster

Leader Worker

Worker Worker

Sub-master

Sub-master

new best bound!

BnB framework - Communications for Sharing Bound

Grid middleware provides communications

Page 120: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Search StrategiesThe improvement of the Bound

1. depends of how it is shared (communication)

2. depends of how the search-tree is generated:

Classical

Depth-First Search

Breadth-First Search

Contribution

First-In-First-Out (FIFO)

Priority

open API ...19

Page 121: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework: Fault-Tolerance

20

Page 122: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework: Fault-Tolerance

20

Manage user exception ➟ computation stopped

Page 123: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework: Fault-Tolerance

20

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop

Page 124: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework: Fault-Tolerance

20

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop

Worker fault ➟ handled by (sub-)master

Page 125: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework: Fault-Tolerance

20

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop

Worker fault ➟ handled by (sub-)master

Leader fault ➟ master choose new one

Page 126: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework: Fault-Tolerance

20

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop

Worker fault ➟ handled by (sub-)master

Leader fault ➟ master choose new one

Sub-master fault ➟ manager change a worker to sub-master

Page 127: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework: Fault-Tolerance

20

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop

Worker fault ➟ handled by (sub-)master

Leader fault ➟ master choose new one

Sub-master fault ➟ manager change a worker to sub-master

Master fault ➟ back-up file

Page 128: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework: Fault-Tolerance

20

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop

Worker fault ➟ handled by (sub-)master

Leader fault ➟ master choose new one

Sub-master fault ➟ manager change a worker to sub-master

Master fault ➟ back-up file

Load-Balancing is natural with master-worker

Page 129: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

BnB Framework: Fault-Tolerance

20

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop

Worker fault ➟ handled by (sub-)master

Leader fault ➟ master choose new one

Sub-master fault ➟ manager change a worker to sub-master

Master fault ➟ back-up file

Load-Balancing is natural with master-worker

the framework provides a function to get the number of free Workers ➟ users use it to decide branching

Page 130: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid’BnB [HiPC07] Features

21

Page 131: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid’BnB [HiPC07] Features

21

Design

Page 132: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid’BnB [HiPC07] FeaturesAsynchronous communicationsHierarchical master-worker with com.Dynamic task splittingEfficient communications with groupsFault-tolerance

21

Design

Page 133: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid’BnB [HiPC07] FeaturesAsynchronous communicationsHierarchical master-worker with com.Dynamic task splittingEfficient communications with groupsFault-tolerance

21

Design

Users

Page 134: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid’BnB [HiPC07] FeaturesAsynchronous communicationsHierarchical master-worker with com.Dynamic task splittingEfficient communications with groupsFault-tolerance

21

Hidden parallelism and Grid difficultiesAPI for COPsEase of deploymentPrincipally tree-basedImplementing and testing search strategiesFocus on objective function

Design

Users

Page 135: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid’BnB [HiPC07] FeaturesAsynchronous communicationsHierarchical master-worker with com.Dynamic task splittingEfficient communications with groupsFault-tolerance

21

Hidden parallelism and Grid difficultiesAPI for COPsEase of deploymentPrincipally tree-basedImplementing and testing search strategiesFocus on objective function

Design

Users

Validate and Test Grid'BnB by experiments

Page 136: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Flow-Shop Experiments

NP-hard permutation optimization problem

22

Execution time

m1

m2

m3

m4

o1,1

o1,2

o1,3

o1,4

o2,1

o2,2

o2,3

o2,4

o3,1

o3,2

o3,3

o3,4

o4,1

o4,2

o4,3

o4,4

J = {j1, j2, . . . jn}ji = { oi1, oi2, . . . oim }M = {m1,m2, . . .mm}

Page 137: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

23

Flow-Shop: 16 Jobs / 20 Machines

0

50

100

150

200

250

20 30 40 50 60

CPUs

Tim

e in

min

utes

FIFO Depth-first search Breadth-first search Priority

Single Cluster: Search Strategies

Page 138: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

24

Flow-Shop: 16 Jobs / 20 Machines

0

10

20

30

40

50

60

70

80

20 30 40 50 60

CPUs

Tim

e in

min

utes

1

2

3

Ratio

Facto

r

With Group Communications Without Group Communications

Ratio (With Com / No Com)

Single Cluster: Communications

Page 139: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid'5000: the French Grid

25

Sophia

GrenobleLyon

Nancy

Orsay

Lille

Rennes

Bordeaux

Toulouse

9 sites - 14 clusters - 3586 CPUs

Page 140: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Flow-Shop: 17 Jobs / 17 Machines

0

20

40

60

80

100

120

0 100 200 300 400 500 600 700

CPUs

Tim

e in

min

ute

s

Execution Time

Grid Experimentations

26

up to 621 CPUs on 5 sites

Page 141: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Flow-Shop: 17 Jobs / 17 Machines

0

20

40

60

80

100

120

0 100 200 300 400 500 600 700

CPUs

Tim

e in

min

ute

s

Execution Time

Grid Experimentations

26

Flow-Shop: 17 Jobs / 17 Machines

0

20

40

60

80

100

120

0 100 200 300 400 500 600 700

CPUs

Tim

e in

min

ute

s

0

0,05

0,1

0,15

0,2

0,25

0,3

% o

f exp

lored

search

tree

Execution Time Total Work

up to 621 CPUs on 5 sites

Page 142: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Flow-Shop: 17 Jobs / 17 Machines

0

20

40

60

80

100

120

0 100 200 300 400 500 600 700

CPUs

Tim

e in

min

ute

s

Execution Time

Grid Experimentations

26

Flow-Shop: 17 Jobs / 17 Machines

0

20

40

60

80

100

120

0 100 200 300 400 500 600 700

CPUs

Tim

e in

min

ute

s

0

0,05

0,1

0,15

0,2

0,25

0,3

% o

f exp

lored

search

tree

Execution Time Total Work

up to 621 CPUs on 5 sites

Speedup anomalies [Lai 84, Li 86]

Page 143: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Speedup Anomalies & Efficiency

Parallel tree-based speedup may sometimes quite spectacular (> or < linear)[Mans 95]

Speedup Anomalies in BnB [Roucairol 87, Lai 84, Li 86]

speedup depends of how the tree is dynamically built

Efficiency: is a related measure computed as the speedup divided by the number of processors.

27

Page 144: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Speedup Anomalies & Efficiency

Parallel tree-based speedup may sometimes quite spectacular (> or < linear)[Mans 95]

Speedup Anomalies in BnB [Roucairol 87, Lai 84, Li 86]

speedup depends of how the tree is dynamically built

Efficiency: is a related measure computed as the speedup divided by the number of processors.

27

Flow-Shop: 17 Jobs / 17 Machines

0

0,2

0,4

0,6

0,8

1

1,2

90 140 190 240 290 340 390 440 490 540 590 640

CPUs

Eff

icie

ncy

Page 145: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid'BnB: Results

28

Page 146: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid'BnB: Results

Experimentally validate our BnB framework for Gridsvalidity of organizing communicationsscalability on Grid (up to 621 CPUs on 5 sites)

28

Page 147: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid'BnB: Results

Experimentally validate our BnB framework for Gridsvalidity of organizing communicationsscalability on Grid (up to 621 CPUs on 5 sites)

28

Page 148: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid'BnB: Results

Experimentally validate our BnB framework for Gridsvalidity of organizing communicationsscalability on Grid (up to 621 CPUs on 5 sites)

Problems:deployment on Grids is difficultdynamically acquiring new resources is difficultpopularity of Grid'5000 cannot mix Grid'5000 and under-utilized lab desktops

28

Page 149: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Grid'BnB: Results

Experimentally validate our BnB framework for Gridsvalidity of organizing communicationsscalability on Grid (up to 621 CPUs on 5 sites)

Problems:deployment on Grids is difficultdynamically acquiring new resources is difficultpopularity of Grid'5000 cannot mix Grid'5000 and under-utilized lab desktops

28

Grid middleware needs a better supporting ofdynamic infrastructure

Page 150: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Agenda

Context, Problem, and Related Work

Contributions

Branch-and-Bound Framework for Grids

Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters

Perspectives & Conclusion

29

Page 151: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Peer-to-Peer as Grid Middleware

30

Grid Computing and Peer-to-Peer share a common goal:sharing resources [Foster 03, Goux 00]

Grid related work: [Globus]

✓ providing computational resources- installing/deploying Grid middleware is difficultP2P related work: [Gnutella, Freenet, DHT]

- focusing on sharing data & mono-application✓ dynamic & easy to deploy

Page 152: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Peer-to-Peer as Grid Middleware

30

Grid Computing and Peer-to-Peer share a common goal:sharing resources [Foster 03, Goux 00]

Grid related work: [Globus]

✓ providing computational resources- installing/deploying Grid middleware is difficultP2P related work: [Gnutella, Freenet, DHT]

- focusing on sharing data & mono-application✓ dynamic & easy to deploy

Objective: provide a P2P infrastructure for Grids andsharing computational resources

Page 153: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Which P2P Architecture?Master-Worker (SETI@home)

• centralized

- targets only desktops

• good for embarrassingly parallelPure/Unstructured Peer-to-Peer (Gnutella)

- flooding problems

✓ supports high-churn (good for Desktop Grids)

✓ supports many kind of application (data, computational)Hybrid Peer-to-Peer (JXTA)

• uses central servers

✓ limits the flooding

- has to manage churnStructured Peer-to-Peer (Chord)

- high cost for managing churn

• efficient for data sharing (Distributed Hash Table)

31

Worker

Master

Worker WorkerWorkerWorker ... ...

T' T'' T'''

pending tasks

R' R'' R'''

partial results

asking for a task

sending a task

sending a partial result

computing task computing task

B

A

Resource Y

B

A

Resource Y

B

A

Resource Y

Peers in acquaintance Message queryPeer

FloodingUnstructured P2P Peer A accesses directly to resource Y on B

B

A

Resource Y

B

A

Resource Y

B

A

Resource Y

Peers in acquaintance

Message query

Peer

FloodingHybrid P2P Peer A accesses directly to resource Y on B

Server

B -> Y

...

N1

N38

N8

N14

N21

N32N42

N48

N51

N56

lookup(K54)

Peer Query

02 -1m

K54

Finger table

N42N8+32

N32N8+16

N21N8+8

N14N8+4

N8+2 N14

N14N8+1

m=6

Page 154: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Which P2P Architecture?Master-Worker (SETI@home)

• centralized

- targets only desktops

• good for embarrassingly parallelPure/Unstructured Peer-to-Peer (Gnutella)

- flooding problems

✓ supports high-churn (good for Desktop Grids)

✓ supports many kind of application (data, computational)Hybrid Peer-to-Peer (JXTA)

• uses central servers

✓ limits the flooding

- has to manage churnStructured Peer-to-Peer (Chord)

- high cost for managing churn

• efficient for data sharing (Distributed Hash Table)

31

Worker

Master

Worker WorkerWorkerWorker ... ...

T' T'' T'''

pending tasks

R' R'' R'''

partial results

asking for a task

sending a task

sending a partial result

computing task computing task

B

A

Resource Y

B

A

Resource Y

B

A

Resource Y

Peers in acquaintance Message queryPeer

FloodingUnstructured P2P Peer A accesses directly to resource Y on B

B

A

Resource Y

B

A

Resource Y

B

A

Resource Y

Peers in acquaintance

Message query

Peer

FloodingHybrid P2P Peer A accesses directly to resource Y on B

Server

B -> Y

...

N1

N38

N8

N14

N21

N32N42

N48

N51

N56

lookup(K54)

Peer Query

02 -1m

K54

Finger table

N42N8+32

N32N8+16

N21N8+8

N14N8+4

N8+2 N14

N14N8+1

m=6

Pure Peer-to-Peer is the most adaptedNeeds to avoid the flooding problem

Page 155: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Contribution & Positioning

32

Grid Applications & Portals

Grid Programming

Grid Middleware Infrastructure

Grid Fabric

Branch-and-Bound API

Page 156: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Contribution & Positioning

32

Grid Applications & Portals

Grid Programming

Grid Middleware Infrastructure

Grid Fabric

Branch-and-Bound API

P2P Infrastructure

Page 157: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Peer-to-Peer Infrastructure [CMST06]

33

Page 158: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

The Peer-to-Peer Infrastructure [CMST06]

33

Pure Peer-to-Peer overlay networkUsing it as Grid middleware infrastructure

The proposed solution to avoid the flooding problem:3 node-request protocols:

1 node: Random walk algorithmn nodes: Breadth-First-Search (BFS) algorithm with acknowledgementmax nodes: BFS without acknowledgement

Best-effort

Page 159: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

INRIA Sophia P2P Desktop GridNeed to validate the infrastructure with desktops

260 desktops at INRIA Sophia lab

No disturbing normal users:

running in low priority

working schedules:

24/24 ≈ 50 machines (INRIA-2424)

night/weekend ≈ 260 machines (INRIA-ALL)

34

Page 160: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

INRIA Sophia P2P Desktop GridNeed to validate the infrastructure with desktops

260 desktops at INRIA Sophia lab

No disturbing normal users:

running in low priority

working schedules:

24/24 ≈ 50 machines (INRIA-2424)

night/weekend ≈ 260 machines (INRIA-ALL)

34

Deployed our P2P infrastructure as permanentDesktop Grid at INRIA Sophia

Page 161: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

INRIA Sophia P2P Desktop GridNeed to validate the infrastructure with desktops

260 desktops at INRIA Sophia lab

No disturbing normal users:

running in low priority

working schedules:

24/24 ≈ 50 machines (INRIA-2424)

night/weekend ≈ 260 machines (INRIA-ALL)

34

Deployed our P2P infrastructure as permanentDesktop Grid at INRIA Sophia

INRIA Sophia Antipolis - Desktop Machines Network

Desktop Machines - INRIA-2424 Desktop Machines - INRIA-All

Aquaintance

INRIA Sub Network

INRIA Sub Network

INRIA Sub Network

INRIA Sub Network

Page 162: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Long Running Experiments with the P2P Desktop Grid

Context: ETSI Grid Plugtests contest ➟ n-queens

n-queens:

embarrassingly parallel / CPU intensive / master-worker

35

Page 163: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Long Running Experiments with the P2P Desktop Grid

Context: ETSI Grid Plugtests contest ➟ n-queens

n-queens:

embarrassingly parallel / CPU intensive / master-worker

35

How many solution for 25 queens ?

Page 164: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

n-queens Experiment Results

World Record [Sloane Integers Sequence A000170]

What we learn from this experiments:

validate the workability of the infrastructure

validate the robustness of the infrastructure

hard to forecast machine's performances36

Total # of Tasks 12,125,199

Task Computation ≈ 138''

Computation Time ≈ 185 days

Cumulated Time ≈ 53 years

# of Desktop Machines 260

Total of Solution Found 2,207,893,435,808,352≈ 2 quadrillions

Page 165: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Agenda

Context, Problem, and Related Work

Contributions

Branch-and-Bound Framework for Grids

Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters

Perspectives & Conclusion

37

Page 166: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Mixing Desktops & Clusters [PARCO07]

38

Grid'BnBParallel BnB Framework for Grid

P2P Infrastructureas Grid Midlleware

Page 167: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Mixing Desktops & Clusters [PARCO07]

38

Grid'BnBParallel BnB Framework for Grid

P2P Infrastructureas Grid Midlleware

API for solving COPs &Dynamic Grid Infrastructure

+

Page 168: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Mixing Desktops & Clusters [PARCO07]

38

Grid'BnBParallel BnB Framework for Grid

P2P Infrastructureas Grid Midlleware

API for solving COPs &Dynamic Grid Infrastructure

+

Validate by experiments on a Grid of Desktops & Clusters

Page 169: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Mixing Desktops & Clusters [PARCO07]

38

Grid'BnBParallel BnB Framework for Grid

P2P Infrastructureas Grid Midlleware

API for solving COPs &Dynamic Grid Infrastructure

+

Validate by experiments on a Grid of Desktops & Clusters

New Problems:firewalls ➟ forwardersharing clusters with the P2P infrastructure

Page 170: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Sharing Cluster's Nodes with P2P

39

Host

shared node

Peer

Peer sharing local node

Page 171: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Sharing Cluster's Nodes with P2P

39

Host

shared node

Peer

Peer sharing local node

Peer sharing remote nodes

ClusterHost

shared node

Host

shared node

Host

shared node

Host

shared node

...

Host

Peer

Page 172: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Testbed

40

G5K PlatformINRIA Sophia - Desktops

Orsay GDX

Sophia

Lyon

Toulouse

Page 173: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Testbed

40

G5K PlatformINRIA Sophia - Desktops

Orsay GDX

Sophia

Lyon

Toulouse

G5K PlatformINRIA Sophia - Desktops

Orsay GDX

Sophia

Lyon

Toulouse

Peer

Peer Peer

Peer

Node Node

NodeNode

Peer

Node

Page 174: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Testbed

40

G5K PlatformINRIA Sophia - Desktops

Orsay GDX

Sophia

Lyon

Toulouse

G5K PlatformINRIA Sophia - Desktops

Orsay GDX

Sophia

Lyon

Toulouse

Peer

Peer Peer

Peer

Node Node

NodeNode

Peer

Node

G5K PlatformINRIA Sophia - Desktops

Orsay GDX

Sophia

Lyon

Toulouse

Peer

Peer Peer

Peer

Peer

Peer

Peer

Peer

Node

Node

Node

Node

Node

Node

Node

Node

Node

Node

Node

Node

Node

Node

Node

Node Node

NodeNode

RMI/SSH

Page 175: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Large-Scale Experiments

Goal: validate the infrastructure by experiments

With n-queens:

no communication between workers

test the workability of the infrastructure

With flow-shop:

communications between workers

test solving COPs

41

Page 176: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

42

1007

859

572

380

250

0 60 120 180 240 300

238,68

190,63

63,68

39,01

24,49

NQueens with n=22

Time in minutes

Tota

l N

um

ber

of C

PU

s for

Experim

enta

tions

1007

859

572

380

250

0 210 420 630 840 1!050

50

298

302

70

70

140

175

176

156

80

80

80

250

240

247

305

349

INRIA-ALL G5K Lyon

G5K Sophia G5K Bordeaux

G5K Orsay G5K Toulouse

Total !of CPUs

Lyon

Sophia

Orsay

Bordeaux

Toulouse

Sophia

Sophia

Sophia

Lyon

Lyon Bord. Orsay

N-Queens Results

Page 177: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

43

628

346

321

313

220

201

80

0 21,667 65,000 108,333

125,55

61,31

86,19

56,03

83,73

59,14

61,15

Flow-Shop 17 jobs / 17 machines

Time in minutes

Tota

l N

um

ber

of C

PU

s for

Experim

enta

tions 628

346

321

313

220

201

80

0 140 280 420 560 700

14218680

256

290

90

62

146

163

173

116

5624

55

57

57

58

56

42

INRIA-2424 G5K Lyon

G5K Sophia G5K Bordeaux

G5K Orsay G5K Nancy

G5K Rennes G5K Toulouse

Total # of CPUs

Lyon

Sophia

Sophia

Sophia

Sophia

Orsay

Orsay

Bordeaux

Bord.Toulouse

RennesNancy

Flow-Shop Results

Page 178: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

43

628

346

321

313

220

201

80

0 21,667 65,000 108,333

125,55

61,31

86,19

56,03

83,73

59,14

61,15

Flow-Shop 17 jobs / 17 machines

Time in minutes

Tota

l N

um

ber

of C

PU

s for

Experim

enta

tions 628

346

321

313

220

201

80

0 140 280 420 560 700

14218680

256

290

90

62

146

163

173

116

5624

55

57

57

58

56

42

INRIA-2424 G5K Lyon

G5K Sophia G5K Bordeaux

G5K Orsay G5K Nancy

G5K Rennes G5K Toulouse

Total # of CPUs

Lyon

Sophia

Sophia

Sophia

Sophia

Orsay

Orsay

Bordeaux

Bord.Toulouse

RennesNancy

Flow-Shop Results

With desktops & clusters speedup anomalies occur more

Page 179: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Mixing - AnalysisN-Queens problem scales well up to 1007 CPUs

349 Desktops + 5 Clusters

Flow-Shop up to 628 CPUs

42 Desktops + 5 Clusters

worse performances than using only clusters (anomalies are more frequents)

Experimented in closed environments -- security

Grid'5000 platform's success ➟ hard for running long exp.44

Page 180: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

P2P as a Meta-Grid infrastructure

45

Page 181: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

P2P as a Meta-Grid infrastructureObservation from Grid'5000:

300 nodes available for 2 minutes

provide best-effort queue

45

Page 182: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

P2P as a Meta-Grid infrastructureObservation from Grid'5000:

300 nodes available for 2 minutes

provide best-effort queue

Idea: take benefit from these nodes [Condor]

by hand: not easy

with the P2P infrastructure:

deploying peers in best-effort queue

45

Page 183: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

P2P as a Meta-Grid infrastructureObservation from Grid'5000:

300 nodes available for 2 minutes

provide best-effort queue

Idea: take benefit from these nodes [Condor]

by hand: not easy

with the P2P infrastructure:

deploying peers in best-effort queue

Result: a permanent P2P infrastructure over Grid'5000

45

Page 184: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

P2P as a Meta-Grid infrastructureObservation from Grid'5000:

300 nodes available for 2 minutes

provide best-effort queue

Idea: take benefit from these nodes [Condor]

by hand: not easy

with the P2P infrastructure:

deploying peers in best-effort queue

Result: a permanent P2P infrastructure over Grid'5000

45

P2P infrastructure as a dynamic Grid middleware

Page 185: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

P2P with N-Queens on Grid'5000

46

Experimentation time in minutes

200

400

600

800

1000

1200

1400#

of

CP

Us

Experimentation: n22.log.40

0 5 10 15 20 25 30 35 40200

400

600

800

1000

1200

1400

1600

1800

# o

f T

ask

s

1384 CPUs - 9 sites# of workers by minutes

tasks computed by minutes

Page 186: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

P2P with N-Queens on Grid'5000

46

Experimentation time in minutes

200

400

600

800

1000

1200

1400#

of

CP

Us

Experimentation: n22.log.40

0 5 10 15 20 25 30 35 40200

400

600

800

1000

1200

1400

1600

1800

# o

f T

ask

s

1384 CPUs - 9 sites# of workers by minutes

tasks computed by minutes

Embarrassingly applications can take benefit fromMeta-Grid Infrastructure

Page 187: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Agenda

Context, Problem, and Related Work

Contributions

Branch-and-Bound Framework for Grids

Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters

Perspectives & Conclusion

47

Page 188: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Summary

Grid'BnB: a BnB framework for Grids

communication between workers

organizing workers in groups

Grid infrastructure

based on P2P architecture

mixing desktops and clusters

deployed at INRIA Sophia lab

48

Page 189: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

PerspectivesPeer-to-Peer Infrastructure:

Job SchedulerResource Localization (PhD)

Large-Scale Experiments:International Grid: France, Japan, and NetherlandsGrid Pugtests

Deployment: Contracts in Grids (GCM Standard)

Industrialization:P2P ➟ Desktop Resource Virtualization

CPER - P2P1M€/4years to professionalize the INRIA Grid

49

Page 190: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Conclusion

50

Page 191: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Conclusion

Branch-and-Bound for Gridssolve COPshide Grid difficultiescommunication between workers

50

Page 192: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Conclusion

Branch-and-Bound for Gridssolve COPshide Grid difficultiescommunication between workers

50

Page 193: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Conclusion

Branch-and-Bound for Gridssolve COPshide Grid difficultiescommunication between workers

Peer-to-Peer as Grid infrastructuremixing desktops and clusters

50

Page 194: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Conclusion

Branch-and-Bound for Gridssolve COPshide Grid difficultiescommunication between workers

Peer-to-Peer as Grid infrastructuremixing desktops and clusters

50

Page 195: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Conclusion

Branch-and-Bound for Gridssolve COPshide Grid difficultiescommunication between workers

Peer-to-Peer as Grid infrastructuremixing desktops and clusters

Tested and experimented

50

Page 196: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Conclusion

Branch-and-Bound for Gridssolve COPshide Grid difficultiescommunication between workers

Peer-to-Peer as Grid infrastructuremixing desktops and clusters

Tested and experimentedAvailable in open source

50

Page 197: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Conclusion

Branch-and-Bound for Gridssolve COPshide Grid difficultiescommunication between workers

Peer-to-Peer as Grid infrastructuremixing desktops and clusters

Tested and experimentedAvailable in open source

50

➟ Provides framework and infrastructureto hide Grid difficulties

Page 198: Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

[SCCC05] Balancing Active Objects on a Peer to Peer InfrastructureJavier Bustos-Jimenez, Denis Caromel, Alexandre di Costanzo, Mario Leyton and Jose M. Piquer. Proceedings of the XXV International Conference of the Chilean Computer Science Society (SCCC 2005), Valdivia, Chile, November 2005. [HIC06] Executing Hydrodynamic Simulation on Desktop Grid with ObjectWeb ProActiveDenis Caromel, Vincent Cavé, Alexandre di Costanzo, Céline Brignolles, Bruno Grawitz, and Yann Viala. HIC2006: Proceedings of the 7th International Conference on HydroInformatics, Nice, France, September 2006.

[HiPC07] A Parallel Branch & Bound Framework for Grids Denis Caromel, Alexandre di Costanzo, Laurent Baduel, and Satoshi Matsuoka. Grid’BnB: HiPC’07, Goa, India, December 2007.

[CMST06] ProActive: an Integrated platform for programming and running applications on grids and P2P systemsDenis Caromel, Christian Delbé, Alexandre di Costanzo, and Mario Leyton. Journal on Computational Methods in Science and Technology, volume 12, 2006.

[PARCO07] Peer-to-Peer for Computational Grids: Mixing Clusters and Desktop MachinesDenis Caromel, Alexandre di Costanzo, and Clément Mathieu. Parallel Computing Journal on Large Scale Grid, 2007. [FGCS07] Peer-to-Peer and Fault-tolerance: Towards Deployment-based Technical ServicesDenis Caromel, Alexandre di Costanzo, and Christian Delbé. Journal Future Generation Computer Systems, to appear, 2007.

and Workshops, Master report, ...