the ibis project: simplifying grid programming & deployment henri bal [email protected] vrije...

48
The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal [email protected] Vrije Universiteit Amsterdam

Post on 21-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

The Ibis Project:Simplifying Grid Programming &

Deployment

Henri [email protected]

Vrije Universiteit Amsterdam

Page 2: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam
Page 3: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

The ‘Promise of the Grid’

Efficient and transparent (i.e. easy-to-use) wall-socket computing over a distributed set of resources [Sunderam ICCS’2004, based on Foster/Kesselman]

Page 4: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Parallel computing on grids

● Mostly limited to● trivially parallel applications

● parameter sweeps, master/worker● applications that run on one cluster at a

time● use grid to schedule application on a suitable

cluster

● Our goal: run real parallel applications on a large-scale grid, using co-allocated resources

Page 5: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Efficient wide-area algorithms

● Latency-tolerant algorithms with asynchronous communication● Search algorithms (Awari-solver [CCGrid’08])● Model checkers (DiVinE [PDMC’08])

● Algorithms with hierarchical communication● Divide-and-conquer● Broadcast trees

● …..

Page 6: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

● Performance & scalability

● Heterogeneous● Low-level & changing

programming interfaces

● writing & deploying grid applications is hard

Reality: ‘Problems of the Grid’

● Connectivity issues

● Fault tolerance

● Malleability

Wide-Area Grid Systems

UseUserr

!

Page 7: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

The Ibis Project

● Goal:● drastically simplify grid

programming/deployment

● write and go!

Page 8: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Approach (1)

● Write & go: minimal assumptions about execution environment● Virtual Machines (Java) deal with

heterogeneity

● Use middleware-independent APIs● Mapped automatically onto middleware

● Different programming abstractions● Low-level message passing

● High-level divide-and-conquer

Page 9: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Approach (2)

● Designed to run in dynamic/hostile grid environment● Handle fault-tolerance and malleability● Solve connectivity problems automatically

(SmartSockets)

● Modular and flexible: can replace Ibis components by external ones● Scheduling: Zorilla P2P system or external

broker

Page 10: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Global picture

Page 11: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Rest of talk

Satin: divide & conquer

Communication layer (IPL)

SmartSockets

Applications

JavaGATZorilla P2P

Page 12: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Outline

● Grid programming● IPL● Satin● SmartSockets

● Grid deployment● JavaGAT● Zorilla

● Applications and experiments

Page 13: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Ibis Design

Page 14: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Ibis Portability Layer

(IPL)● Java-centric “run-anywhere” library

● Sent along with the application (jar-files)● Point-to-point, multicast, streaming, ….

● Efficient communication● Configured at startup, based on

capabilities (multicast, ordering, reliability, callbacks)

● Bytecode rewriter avoids serialization overhead

Page 15: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Serialization

● Based on bytecode-rewriting ● Adds (de)serialization code to serializable

types● Prevents reflection overhead

during runtimeJava

compilerbytecoderewriter JVM

JVM

JVM

source bytecodebytecode

Page 16: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Membership Model

● JEL (Join-Elect-Leave) model● Simple model for tracking resources,

supports malleability & fault-tolerance● Notifications of nodes joining or leaving● Elections

● Supports all common programming models● Centralized and distributed

implementations● Broadcast trees, gossiping

Page 17: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Programming models

● Remote Method Invocation (RMI)● Group Method Invocation (GMI)● MPJ (MPI Java 'standard')● Satin (Divide & Conquer)

Page 18: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Satin: divide-and-conquer

● Divide-and-conquer isinherently hierarchical

● More general thanmaster/worker

● Cilk-like primitives (spawn/sync) in Java● Supports malleability and fault-tolerance● Supports data-sharing between different

branches through Shared Objects

fib(1) fib(0) fib(0)

fib(0)

fib(4)

fib(1)

fib(2)

fib(3)

fib(3)

fib(5)

fib(1) fib(1)

fib(1)

fib(2) fib(2)cpu 2

cpu 1cpu 3

cpu 1

Page 19: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Satin implementation

● Load-balancing is done automatically● Cluster-aware Random Stealing (CRS)● Combines Cilk’s Random Stealing with

asynchronous wide-area steals

● Self-adaptive malleability and fault-tolerance● Add/remove machines on the fly● Survive crashes by efficient

recomputations/checkpointing

Page 20: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Self-adaptation with Satin

● Adapt #CPUs to level of parallelism● Migrate work from overloaded to idle CPUs● Remove CPUs with poor network connectivity● Add CPUs dynamically when

● Level of parallelism increases● CPUs were removed or crashed

● Can also remove/add entire clusters● E.g., for network problems

[Wrzesinska et al., PPoPP’07 ]

Page 21: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Approach

● Weighted Average Efficiency (WAE):

1/#CPUs * Σ speedi * (1 – overheadi )

overhead is fraction idle+communication time

speedi= relative speed of CPUi (measured periodically)

● General idea:

Keep WAE between Emin (30%) and Emax(50%)

Page 22: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Overloaded network link

● Uplink of 1 cluster reduced to 100 KB/s● Remove badly connected cluster, get new one

Iter

atio

n d

ura

tion

Iteration

Page 23: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Connectivity Problems

● Firewalls & Network Address Translation (NAT) restrict incoming traffic

● Addressing problems● Machines with >1 network interface (IP

address)● Machine on a private network (e.g., NAT)

● No direct communication allowed● E.g., between compute nodes and external

world

Page 24: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

SmartSockets library

● Detects connectivity problems● Tries to solve them automatically

● With as little help from the user as possible● Integrates existing and several new

solutions● Reverse connection setup, STUN, TCP splicing,

SSH tunneling, smart addressing, etc.● Uses network of hubs as a side channel

Page 25: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Example

Page 26: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Example

[Maassen et al., HPDC’07 ]

Page 27: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Overview

JavaGATZorilla P2P

Page 28: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

JavaGAT

● GAT: Grid Application Toolkit● Makes grid applications independent of the

underlying grid infrastructure

● Used by applications to access grid services● File copying, resource discovery, job

submission & monitoring, user authentication

● API is currently standardized (SAGA)● SAGA implemented on JavaGAT

Page 29: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Grid Applications with GAT

GAT Engine

Remote

FilesMonitoring

Info

service

Resource

Management

GridLab Globus Unicore SSH P2P LocalGAT

Grid ApplicationFile.copy(...)

submitJob(...)

gridftp globus

Intelligentdispatching

[van Nieuwpoort et al., SC’07 ]

Page 30: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Zorilla: Java P2P supercomputing

middleware

Page 31: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Zorilla components● Job management

● Handling malleability and crashes

● Robust Random Gossiping● Periodic information exchange between nodes● Robust against Firewalls, NATs, failing nodes

● Clustering: nearest neighbor● Flood scheduling

● Incrementally search for resources at more and more distant nodes

[Drost et al., HPDC’07 ]

Page 32: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Overview

Page 33: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Ibis applications● e-Science (VL-e)

● Brain MEG-imaging● Mass spectroscopy

● Multimedia content analysis● Various parallel applications

● SAT-solver, N-body, grammar learning, …

● Other programming systems● Workflow engine for astronomy (D-grid),

grid file system, ProActive, Jylab, …

Page 34: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Overview experiments

● DAS-3: Dutch Computer Science grid

● Satin applications on DAS-3● Zorilla desktop grid experiment● Multimedia content analysis● High resolution video processing

Page 35: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

DAS-3

272 nodes(AMD Opterons)

792 cores1TB memoryLAN: Myrinet 10G Gigabit

EthernetWAN

(StarPlane): 20-40 Gb/s

OPN

Heterogeneous: 2.2-2.6 GHz Single/dual-

core Delft no

Myrinet

Page 36: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Gene sequence comparison in Satin (on

DAS-3)

Speedup on 1 cluster Run times on 5 clusters

• Divide&conquer scales much better than master-worker

• 78% efficiency on 5 clusters (with 1462 WAN-msgs/sec)

Page 37: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Barnes-Hut (Satin) on DAS-3

• Shared object extension to D&C model improves scalability

• 57% efficiency on 5 clusters (with 1371 WAN-msgs/sec)

Speedup on 1 cluster Run times on 5 clusters

Page 38: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Zorilla Desktop Grid Experiment

● Small experimental desktop grid setup● Student PCs running Zorilla overnight ● PCs with 1 CPU, 1GB memory, 1Gb/s

Ethernet

● Experiment: gene sequence application● 16 cores of DAS-3 with Globus● 16 core desktop grid with Zorilla ● Combination, using Ibis-Deploy

Page 39: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Ibis-Deploy deployment tool

3574 sec 1099 sec

877 sec

• Easy deployment with Zorilla, JavaGAT & Ibis-Deploy

Page 40: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Multimedia content analysis

● Analyzes video streams to recognize objects

● Extract feature vectors from images● Describe properties (color, shape)● Data-parallel task implemented with C+

+/MPI

● Compute on consecutive images● Task-parallelism on a grid

Page 41: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

MMCA application

Client

(Java)

Broker

(Java)

Parallel

Horus

Server

Parallel

Horus

Servers

Servers

(C++)

(any machine world-wide)

(local desk-top machine)IbisIbis

(Java)

(grid)

Page 42: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

MMCA with Ibis● Initial implementation with TCP was unstable● Ibis simplifies communication, fault tolerance● SmartSockets solves connectivity problems● Clickable deployment interface● Demonstrated at many

conferences (SC’07)● 20 clusters on 3 continents, 500-800 cores

● Frame rate increased from 1/30 to 15 frames/sec

[Seinstra et al., IEEE Multimedia’07 ]

Page 43: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

MMCA

‘Most Visionary Research’ award at AAAI 2007, (Frank Seinstra et al.)

Page 44: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

High Resolution Video Processing

● Realtime processing of CineGrid movie data● 3840x2160 (4xHD) @ 30 fps = 1424 MB/sec

● Multi-cluster processing pipeline ● Using DAS-3, StarPlane and Ibis

Page 45: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

CineGrid with Ibis● Use of StarPlane requires no configuration

● StarPlane is connected to local Myrinet network● Detected & used automatically by SmartSockets

● Easy setup of application pipeline● Connection administration of application is simplified by the IPL

election mechanism

● Simple multi-cluster deployment (Ibis-Deploy)● Uses Ibis serialization for high throughput

Page 46: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Summary● Goal: Simplify grid

programming/deployment

● Key ideas in Ibis● Virtual machines (JVM) deal with

heterogeneity● High-level programming abstractions

(Satin)● Handle fault-tolerance, malleability,

connectivity problems automatically (Satin, SmartSockets)

● Middleware-independent APIs (JavaGAT)● Modular

Page 47: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

Acknowledgements Current members Rob van Nieuwpoort Jason Maassen Thilo Kielmann Frank Seinstra Niels Drost Ceriel Jacobs Kees Verstoep

Roelof Kemp Kees van Reeuwijk

Past members John Romein Gosia Wrzesinska Rutger Hofman Maik Nijhuis Olivier Aumage Fabrice Huet Alexandre Denis

Page 48: The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal bal@cs.vu.nl Vrije Universiteit Amsterdam

More information● Ibis can be downloaded from

● http://www.cs.vu.nl/ibis

● Papers:● Satin [PPoPP’07], SmartSockets [HPDC’07],

Gossiping [HPDC’07], JavaGAT [SC’07],MMCA [IEEE Multimedia’07]

• Ibis tutorials• Next one at CCGrid 2008 (19 May, Lyon)