mw: a framework to support master worker applications

27
MW: A framework to support Master Worker Applications Sanjeev R. Kulkarni Computer Sciences Department University of Wisconsin-Madison [email protected]

Upload: gent

Post on 14-Jan-2016

43 views

Category:

Documents


0 download

DESCRIPTION

Sanjeev R. Kulkarni Computer Sciences Department University of Wisconsin-Madison [email protected]. MW: A framework to support Master Worker Applications. Outline of the talk. Introduction to MW Architecture of MW How to use MW? Records shattered by MW! Extensions. What is MW?. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: MW: A framework to support Master Worker Applications

MW: A framework to support Master Worker Applications

Sanjeev R. KulkarniComputer Sciences Department

University of [email protected]

Page 2: MW: A framework to support Master Worker Applications

2www.cs.wisc.edu/condor

Outline of the talk

› Introduction to MW

› Architecture of MW

› How to use MW?

› Records shattered by MW!

› Extensions

Page 3: MW: A framework to support Master Worker Applications

3www.cs.wisc.edu/condor

What is MW?

› MW = Master Worker

› Object Oriented, Fault Tolerant framework for Master-Worker Applications

› Can run on a variety of resource managers like Condor, PVM, Globus, ...

Page 4: MW: A framework to support Master Worker Applications

4www.cs.wisc.edu/condor

MW

› Object Oriented MW is a set of C++ base classes Users write a few virtual functions

› Fault Tolerant Handles workers joining/leaving Handles suspended/resumed workers Checkpointing

Page 5: MW: A framework to support Master Worker Applications

5www.cs.wisc.edu/condor

MW Framework

Application

MWLayer

Resource Management and Communication Layer

E.g. Condor, PVM, Globus,...

Page 6: MW: A framework to support Master Worker Applications

6www.cs.wisc.edu/condor

Why use MW?

› Handles communication layer

› Handles resource Management Useful especially in an opportunistic

environment like Condor

› Same application code can run on various resource managers

Page 7: MW: A framework to support Master Worker Applications

7www.cs.wisc.edu/condor

MW Layer

› Three main entities MWDriver

• corresponds to the (Tyrant) Master MWWorker

• corresponds to the (exploited) Worker MWTask

• The work itself!

Page 8: MW: A framework to support Master Worker Applications

8www.cs.wisc.edu/condor

MWDriver: The Master

› Setup

› Manages workers joining/leaving

› Deals with HostSuspend/HostDelete

› Maintains lists of tasks ToDo, Done, Running

› Acts on completed tasks

› Checkpointing

Page 9: MW: A framework to support Master Worker Applications

9www.cs.wisc.edu/condor

MWWorker: The Worker

› Executes the given task Unpacks the task got from Master Executes the task Packs results and sends back to

Master

Page 10: MW: A framework to support Master Worker Applications

10www.cs.wisc.edu/condor

MWTask: The Work

› Definition of a Unit of work

› Holds work to be done and results

› Follows the path Master-Worker-Master

Page 11: MW: A framework to support Master Worker Applications

11www.cs.wisc.edu/condor

Master

Workers

Running

To DoT1 T2 T3 ...

Global Data

Done

Page 12: MW: A framework to support Master Worker Applications

12www.cs.wisc.edu/condor

Master

Workers

Running

To DoT2 T3 T4 ...

Global Data

Wid 1

W1

T1

Done

Page 13: MW: A framework to support Master Worker Applications

13www.cs.wisc.edu/condor

Master

Workers

Running

To DoT5 T6 T7 ...

Global Data

Wid 1

W1

T1 T2 T3 T4

Wid 2 Wid 3

Wid 4

W2 W3

W4

Done

Page 14: MW: A framework to support Master Worker Applications

14www.cs.wisc.edu/condor

Master

Workers

Running

To DoT6 T7 T8 ...

Global Data

Wid 1

W1

T1 T2 T5 T4

Wid 2 Wid 3

Wid 4

W2 W3

W4

T3Done

Page 15: MW: A framework to support Master Worker Applications

15www.cs.wisc.edu/condor

Master

Workers

Running

To DoT2 T6 T7 ...

Global Data

Wid 1

W1

T1 T5 T4

Wid 2 Wid 3

Wid 4

W2 W3

W4

T3Done

Page 16: MW: A framework to support Master Worker Applications

16www.cs.wisc.edu/condor

Intelligent Scheduling of Tasks

› Some tasks may be harder

› Some machines may be faster

› Task ordering based on user defined MWKey

› Machine ordering based on Resource Manager information e.g. Condor ClassAds

Page 17: MW: A framework to support Master Worker Applications

17www.cs.wisc.edu/condor

Checkpointing

› Save master state in case of failure

› Automatic restart from checkpoint file in case of master failure

› User controlled checkpoint frequency

› Users need to implement two additional functions to take advantage of these.

Page 18: MW: A framework to support Master Worker Applications

18www.cs.wisc.edu/condor

How can you use MW?

› MWDriver’s virtual functions get_user_info() - Processes arguments

and does basic setup setup_initial_tasks() - Fills in the ToDo list pack_worker_init_data() - packs init data

for worker act_on_completed_tasks() - called every

time a task completes. Optional.

Page 19: MW: A framework to support Master Worker Applications

19www.cs.wisc.edu/condor

Using MW (cont.)

› MWWorker unpack_init_data() - unpack the init data

sent by the master execute_task() - execute a task and fill in

the results

› MWTask pack_work(), unpack_work() pack_results(), unpack_results()

Page 20: MW: A framework to support Master Worker Applications

20www.cs.wisc.edu/condor

Resource Management and Communication

Layer› Presently two implementations

exist MW-PVM

• Communication :PVM• Resource Manager : Condor-PVM

MW-File• Communication : Files• Resource Manager : Condor

Page 21: MW: A framework to support Master Worker Applications

21www.cs.wisc.edu/condor

MW-Independent

› Single process version master single worker

› sends and receives become memcpy

› No changes in source!

› Why? Faster debugging

Page 22: MW: A framework to support Master Worker Applications

22www.cs.wisc.edu/condor

Data Management

› Exploitation of available memory as compared to CPU cycles

› Aims to reduce access to storage site by caching data on network

› Aimed at applications involving extremely large data sets

Page 23: MW: A framework to support Master Worker Applications

23www.cs.wisc.edu/condor

Data Manager› Partitions large data sets into chunks

› Workers work on chunks allocated by manager

Data Manager

W1

W2

W3

Storage Device Memory

Page 24: MW: A framework to support Master Worker Applications

24www.cs.wisc.edu/condor

World Record Spree!!

› Stochastic Linear Programming “Record breaking” problem of over

8.5M rows and 35M columns solved

› Quadratic Assignment Problem nug27 in a little more than two days!

› Jeff will talk more in his talk

Page 25: MW: A framework to support Master Worker Applications

25www.cs.wisc.edu/condor

Extensions

› More Resource Manager ports Globus A thread based version for SMPs <put your suggestions here>

Page 26: MW: A framework to support Master Worker Applications

26www.cs.wisc.edu/condor

Scalability Issues

› Scales linearly with respect to the number of workers

› OK for 200 workers but for 1000?

› Solution Hierarchical MW?

Page 27: MW: A framework to support Master Worker Applications

27www.cs.wisc.edu/condor

More about MW

› http://www.cs.wisc.edu/condor/mw

› Demos of MW in action using Condor pool tomorrow in Room 3397