the suitability of bsp/cgm model for hpc on clouds · 2012. 6. 30. · euler tour 0 5 10 15 20 25...

25
The suitability of BSP/CGM model for HPC on Clouds Alfredo Goldman, Daniel Cordeiro and Alessandro Kraemer IME - USP Cetraro, Italy, 28/06/2012 quinta-feira, 28 de junho de 2012

Upload: others

Post on 15-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

The suitability of BSP/CGM model for HPC on Clouds

Alfredo Goldman, Daniel Cordeiro and Alessandro KraemerIME - USP

Cetraro, Italy, 28/06/2012

quinta-feira, 28 de junho de 2012

Page 2: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Great WorkshopHigh Quality Presentations

Amazing location

even without the old elevator

Great face to face contacts

Jogging with Ian Foster

Histories of Steve Wallach

Discussion about flash with Frank Baetke

Talk on teamwork with Natalie Bates

....

quinta-feira, 28 de junho de 2012

Page 3: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Distributed Systems

Two main conferences

SBRC - Distributed Systems and Networks

30th Edition

1000 participants

SBAC-PAD - Computer Architecture and HPC

24th Edition

Papers in English

2012 Edition in New York

quinta-feira, 28 de junho de 2012

Page 4: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Back to the WIP

Agenda

Motivation

Previous Experience

Some Related Works

Preliminary Experiments

Future Work

quinta-feira, 28 de junho de 2012

Page 5: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Motivation

Paper from HP labs

Evaluation of HPC Applications on Cloud

A. Gupta and D. Milojivic

Cloud would be suitable for some HPC apps

quinta-feira, 28 de junho de 2012

Page 6: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Main Points

On the Cloud

poor network performance / OS noise

can be cost-effective

Clouds are more cost-effective for:

Embarrassingly parallel/tree structured

Applications where comm. cost is hidden by computation

quinta-feira, 28 de junho de 2012

Page 7: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Other applications ?

Map Reduce

Widely spread with hadoop

Compared to BSP has limitations

(Pace - ICCS, 2012)

How to deal with the Communication ?

Try to “minimize” them...

quinta-feira, 28 de junho de 2012

Page 8: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Previous Experience

Integrade

www.integrade.org.br

Opportunistic Grid Middleware

With support for Parallel Computing

Bag of Tasks

Either MPI and BSP

quinta-feira, 28 de junho de 2012

Page 9: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

BSP

Bulk Synchronous Parallel

Valiant’90

Model that links software and hardware

Given the machine parameters it is easy to estimate the execution time

quinta-feira, 28 de junho de 2012

Page 10: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

BSP main points

Execution performed in super-steps

Computation and synchronization phases

Two communication mechanisms:

Direct Remote Memory Access (DRMA)

Bulk Synchronous Message Passing (BSPM)

Several existing implementations

BSPLib, Green BSPLib, PUB, BSP-G

quinta-feira, 28 de junho de 2012

Page 11: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Schematics quinta-feira, 28 de junho de 2012

Page 12: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Integrade - Checkpointing

Essential in opportunistic environments

Checkpoints are stored periodically

Using BSP

Checkpointing on InteGrade is portable and transparent to the programmer

quinta-feira, 28 de junho de 2012

Page 13: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

CGM

Coarse Grained Model

Theoretical model proposed by Dehne ’93

n data size, p processors with memory O(n/p)

n/p >> p

At each step processors exchange O(n/p) data

Goal: minimize the number of steps

quinta-feira, 28 de junho de 2012

Page 14: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

CGM algorithms

Randomized List Ranking

O(p log n) with high probability

All-Substrings longest common subsequence

O(log p)

Euler Tour

Efficient ways to do the h-relation

more than 10 thousand results on Google Scholar

quinta-feira, 28 de junho de 2012

Page 15: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Back to BSP

Interest on large graphs

Pregel (2010)

suitable for large-scale graph computing

Vertex centric approach

designed to be

efficient, scalable and fault-tolerant

quinta-feira, 28 de junho de 2012

Page 16: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Pregel (1/2)

Each process/core is assigned to one vertex

Loop, for each vertex

Receive data from the previous step

Change state

Send data to other vertices

May vote to halt

quinta-feira, 28 de junho de 2012

Page 17: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Pregel (2/2)

Was applied in clusters with thousands of commodity computers

Applications:

Page Rank

Shortest Path

Bipartite-Matching

quinta-feira, 28 de junho de 2012

Page 18: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Apache Hama

Apache Hama is a pure BSP computing framework on top of HDFS

For massive scientific computations such as matrix, graph and network algorithms

Computation Engines:

Map Reduce - for matrix computations

BSP, Dryad - for graph computations

quinta-feira, 28 de junho de 2012

Page 19: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Several others

Apache Giraph

GPS: Graph Processing System

API for global comm., load balancing & distribution

Golden ORB

Phoebus

Bagelquinta-feira, 28 de junho de 2012

Page 20: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Preliminary results

We have conducted some experiments with two classical graph problems:

Connected Components and Eulerian Path.

With one twist: the MapReduce algorithm only tests if it exists a Eulerian Path and find a single connected component while the BSP computes the path and find all connected components.

quinta-feira, 28 de junho de 2012

Page 21: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Experimental environment

Private cloud

11 Intel Core Duo 2.66 GHz, 2GBytes, interconnected by a FastEthernet network

The PCs are shared by 33 Virtual Machines

Software used:

For BSP/CGM: mpich2, cgmlib 0.9.5 and NFS.

For MapReduce: sun java 5, hadoop 1.0.1 and HDFS.

quinta-feira, 28 de junho de 2012

Page 22: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Euler Tour

0 5 10 15 20 25 30 350,0000

500,0000

1000,0000

1500,0000

2000,0000

2500,0000

3000,0000

Euler tour - 1,000 trees and 500,000 nodes

BSP/CGMMapReduce

Number of Virtual Machines

exec

utio

n tim

e (s

)

quinta-feira, 28 de junho de 2012

Page 23: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Connected Components

0 5 10 15 20 25 30 350,0000

500,0000

1000,0000

1500,0000

2000,0000

2500,0000

Connected Components - 1,000 trees and 500,000 nodes

BSP/CGMMapReduce

Number of Virtual Machines

exec

utio

n tim

e (s

)

quinta-feira, 28 de junho de 2012

Page 24: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Communication times for BSP

0 5 10 15 20 25 30 350,0000

40,0000

80,0000

120,0000

160,0000

200,0000

BSP communication times for Euler Tour

Average

Number of Virtual Machines

time

(s)

0 5 10 15 20 25 30 350,0000

20,0000

40,0000

60,0000

80,0000

100,0000

120,0000

BSP communication times for Connected Components

Average

Number of Virtual Machines

time

(s)

quinta-feira, 28 de junho de 2012

Page 25: The suitability of BSP/CGM model for HPC on Clouds · 2012. 6. 30. · Euler Tour 0 5 10 15 20 25 30 35 0,0000 500,0000 1000,0000 1500,0000 2000,0000 2500,0000 3000,0000 Euler tour

Future Directions

Explore Scalability

Apply Locality to place the BSP processes

Use partial synchronization

quinta-feira, 28 de junho de 2012