7th Biennial Ptolemy Miniconference Berkeley, CA February 13, 2007 Scheduling Data-Intensive Workflows Tim H. Wong, Daniel Zinn, Bertram Ludäscher (UC Davis)

Post on 22-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 7th Biennial Ptolemy Miniconference Berkeley, CA February 13, 2007 Scheduling Data-Intensive Workflows Tim H. Wong, Daniel Zinn, Bertram Ludäscher (UC

7th Biennial Ptolemy Miniconference

Berkeley, CA, February 13, 2007

Scheduling Data-Intensive Workflows

Tim H. Wong, Daniel Zinn, Bertram Ludäscher

(UC Davis)

Ptolemy Miniconference 2007, Daniel Zinn

Outline

Problem motivation
Assumptions
Cost model
Problem formalization
Different “simplifications” and their complexity
Prototypical Java implementation for Kepler
Summary

Motivation: Distributed Execution of Scientific Workflows

Process a set of data on a set of machines

GOAL: Minimize workflow (WF) execution time!
Allocation problem: Which actors are computed on which hosts?

Assumptions

Arbitrary data size
Arbitrary machine speed
Arbitrary bandwidth
Arbitrary number of inputs
Scientific workflow is a DAG (!)

GRID COMPUTING

Cost Model

Communication Time: TC

Function Execution Time: TE

Total Time: TT = TC + TE

Shipping and Handling Problem (SHP): Schedule all tasks such that the total time TT is minimal.
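To make TT = TC + TE concrete, here is a minimal sketch of evaluating one actor-to-host allocation, plus an exhaustive SHP search for tiny inputs. It assumes a purely additive model (no overlap between transfers and computation, no parallelism), that execution time is work divided by host speed, and that communication time is data size divided by the bandwidth between two hosts. The function names, parameters, and all numbers are hypothetical.

```python
from itertools import product


def total_time(edges, work, data_size, speed, bandwidth, alloc):
    """Return (T_C, T_E, T_T) for one actor-to-host allocation.

    Assumption: a purely additive model (no overlap between transfers and
    computation, no parallelism), i.e. TT = TC + TE taken literally.
    """
    # T_E: every actor's work divided by the speed of the host it runs on.
    t_e = sum(work[a] / speed[alloc[a]] for a in work)
    # T_C: data is shipped only when producer and consumer run on different hosts.
    t_c = 0.0
    for u, v in edges:
        hu, hv = alloc[u], alloc[v]
        if hu != hv:
            t_c += data_size[(u, v)] / bandwidth[frozenset((hu, hv))]
    return t_c, t_e, t_c + t_e


def best_allocation(edges, work, data_size, speed, bandwidth):
    """Exhaustive search over all host assignments (exponential; tiny inputs only)."""
    actors, hosts = list(work), list(speed)
    best = None
    for choice in product(hosts, repeat=len(actors)):
        alloc = dict(zip(actors, choice))
        tt = total_time(edges, work, data_size, speed, bandwidth, alloc)[2]
        if best is None or tt < best[0]:
            best = (tt, alloc)
    return best


# Made-up example: workflow A -> B -> C on two hosts h1 and h2.
edges = [("A", "B"), ("B", "C")]
work = {"A": 10, "B": 20, "C": 10}
data_size = {("A", "B"): 100, ("B", "C"): 50}
speed = {"h1": 10, "h2": 20}
bandwidth = {frozenset(("h1", "h2")): 50}

print(total_time(edges, work, data_size, speed, bandwidth,
                 {"A": "h1", "B": "h2", "C": "h2"}))  # (2.0, 2.5, 4.5)
print(best_allocation(edges, work, data_size, speed, bandwidth))
# (2.0, {'A': 'h2', 'B': 'h2', 'C': 'h2'})
```

The brute-force search enumerates |hosts|^|actors| allocations, which is only viable for toy workflows; the following slides explain why no efficient exact method should be expected.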

Problem Variants and Complexities

Shipping and Handling Problem (SHP): communication cost non-uniform; function execution cost non-uniform; complexity NP-complete; reduction from the Task Scheduling Problem [ERLA94]

Task Handling Problem (THP): communication cost zero; function execution cost non-uniform; complexity NP-complete; reduction from the Multiprocessor Scheduling Problem [KA99]

Data Shipping Problem (DSP): communication cost non-uniform; function execution cost zero; complexity NP-complete; reduction from 1-Multiterminal Cut

easy-DSP: Uniform Transfer Rate, Uniform Data Size

Given: a directed acyclic graph (DAG) and a set of colors (the hosts); some vertices are already colored.

Edge weight = 1 if the two adjacent vertices have different colors; edge weight = 0 otherwise.

TASK: Color the remaining vertices such that the total weight is minimal!

Cost model: Minimize the total shipped volume!
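A minimal sketch of easy-DSP as stated above: the cost of a coloring is the number of edges whose endpoints have different colors, and an exact solution can be found by trying every coloring of the uncolored vertices. This is exponential and only meant to pin down the problem definition on tiny inputs; the function names and the example graph are hypothetical.

```python
from itertools import product


def shipped_volume(edges, coloring):
    # Uniform data size and transfer rate: each edge whose endpoints have
    # different colors (hosts) costs 1, every other edge costs 0.
    return sum(1 for u, v in edges if coloring[u] != coloring[v])


def solve_easy_dsp(vertices, edges, colors, precolored):
    """Exact easy-DSP solution by trying every coloring of the free vertices.

    Exponential in the number of uncolored vertices, so only for tiny examples.
    """
    free = [v for v in vertices if v not in precolored]
    best_cost, best = None, None
    for choice in product(colors, repeat=len(free)):
        coloring = dict(precolored)
        coloring.update(zip(free, choice))
        cost = shipped_volume(edges, coloring)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, coloring
    return best_cost, best


# Hypothetical DAG: a, b are pinned to "red", c to "blue"; x and y are free.
vertices = ["a", "b", "c", "x", "y"]
edges = [("a", "x"), ("b", "x"), ("x", "y"), ("c", "y")]
cost, coloring = solve_easy_dsp(vertices, edges, ["red", "blue"],
                                {"a": "red", "b": "red", "c": "blue"})
print(cost, coloring["x"], coloring["y"])  # 1 red red
```

At least one edge must be cut here, because a red vertex and the blue vertex c both feed into the path through y.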

1 - Multi-Terminal CUT

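Given: an undirected graph G = (V, E), a set of terminals S ⊆ V, and edge weights of 1

TASK: Find a multiway cut of G (a set of edges whose removal disconnects every pair of terminals) with a minimum number of edges

NP-complete for 3 or more terminals! (With 2 terminals it is an ordinary minimum s-t cut, solvable in polynomial time.)

Minimize the number of edges between different terminals!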
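A minimal sketch of the 1-MTC definition: a union-find check that a candidate edge set separates all terminals, and an exhaustive search that tries edge subsets in order of increasing size. This is a brute-force illustration for tiny graphs, not a practical algorithm; the function names and the example graph are hypothetical.

```python
from itertools import combinations


def separates(vertices, edges, removed, terminals):
    """True if deleting `removed` leaves every terminal in its own component."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        if (u, v) in removed:
            continue
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    roots = [find(t) for t in terminals]
    return len(set(roots)) == len(roots)


def min_multiway_cut(vertices, edges, terminals):
    """Smallest multiway cut by trying edge subsets of increasing size (exponential)."""
    for k in range(len(edges) + 1):
        for cut in combinations(edges, k):
            if separates(vertices, edges, set(cut), terminals):
                return set(cut)


# Star with three terminals around a non-terminal hub c.
vertices = ["c", "t1", "t2", "t3"]
edges = [("t1", "c"), ("t2", "c"), ("t3", "c")]
terminals = ["t1", "t2", "t3"]
cut = min_multiway_cut(vertices, edges, terminals)
print(len(cut))  # 2
```

In the star, the hub can stay attached to one terminal, so two of the three edges must be cut.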

Reduction: 1-MTC <= DSP

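[Figure: a DSP instance and the corresponding 1-MTC instance]

“Order graph, color terminals”: fix an order of the vertices, orient every edge from the earlier to the later vertex (which always yields a DAG), and give each terminal its own color.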
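The “order graph, color terminals” construction can be sketched as follows, assuming the vertex order is simply the order of the input list. The crossing edges of any coloring form a multiway cut, and conversely any multiway cut yields a coloring with no more crossing edges, so the DSP optimum equals the 1-MTC optimum. The names `mtc_to_dsp` and `min_coloring_cost` are hypothetical, and the brute-force solver is for tiny inputs only.

```python
from itertools import product


def mtc_to_dsp(vertices, edges, terminals):
    """Hypothetical construction for "order graph, color terminals"."""
    # Order the vertices (here: the order of the input list) ...
    position = {v: i for i, v in enumerate(vertices)}
    # ... and orient each undirected edge from the earlier to the later vertex.
    # Every edge points "forward", so the result cannot contain a cycle.
    dag_edges = [(u, v) if position[u] < position[v] else (v, u) for u, v in edges]
    # One distinct color per terminal; these are the pre-colored vertices.
    colors = [f"color{i}" for i in range(len(terminals))]
    precolored = dict(zip(terminals, colors))
    return dag_edges, colors, precolored


def min_coloring_cost(vertices, dag_edges, colors, precolored):
    """Exact easy-DSP optimum by brute force (tiny inputs only)."""
    free = [v for v in vertices if v not in precolored]
    best = None
    for choice in product(colors, repeat=len(free)):
        coloring = {**precolored, **dict(zip(free, choice))}
        cost = sum(coloring[u] != coloring[v] for u, v in dag_edges)
        best = cost if best is None else min(best, cost)
    return best


vertices = ["c", "t1", "t2", "t3"]
edges = [("t1", "c"), ("t2", "c"), ("t3", "c")]
dag_edges, colors, precolored = mtc_to_dsp(vertices, edges, ["t1", "t2", "t3"])
print(dag_edges)  # [('c', 't1'), ('c', 't2'), ('c', 't3')]
print(min_coloring_cost(vertices, dag_edges, colors, precolored))  # 2
```

For this three-terminal star, the DSP optimum of 2 matches the size of the minimum multiway cut.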

[Figure: the same two instances, with edge weights of 1 annotated]

NP-hard, ... but we still need to solve it:

Greedy algorithm
Dynamic programming algorithm

Investigate approximation algorithms for MTC and related problems!
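The slide names a greedy algorithm without details. Below is a minimal sketch of one possible greedy heuristic for easy-DSP, not necessarily the authors' algorithm: visit vertices in topological order and give each uncolored vertex the color that disagrees with the fewest already-colored neighbors. The function name `greedy_dsp` and the tie-breaking rule (first color in the list wins) are assumptions.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def greedy_dsp(vertices, edges, colors, precolored):
    """One possible greedy heuristic for easy-DSP (no optimality guarantee)."""
    preds = {v: set() for v in vertices}
    neighbors = defaultdict(set)
    for u, v in edges:
        preds[v].add(u)
        neighbors[u].add(v)
        neighbors[v].add(u)

    coloring = dict(precolored)
    # Topological order: every vertex is visited after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        if v in coloring:
            continue
        # Pick the color that disagrees with the fewest already-colored neighbors
        # (ties go to the color listed first).
        coloring[v] = min(
            colors,
            key=lambda c: sum(1 for w in neighbors[v]
                              if w in coloring and coloring[w] != c),
        )
    cost = sum(1 for u, v in edges if coloring[u] != coloring[v])
    return cost, coloring


vertices = ["a", "b", "c", "x", "y"]
edges = [("a", "x"), ("b", "x"), ("x", "y"), ("c", "y")]
cost, coloring = greedy_dsp(vertices, edges, ["red", "blue"],
                            {"a": "red", "b": "red", "c": "blue"})
print(cost, coloring["x"])  # 1 red
```

Each vertex is decided once by looking only at its already-colored neighbors, so the heuristic runs in roughly O(|E| · |colors|) time but can miss the optimum when a later precolored vertex should have influenced an earlier choice.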

Prototypical Implementation ...

Abstract: only some nodes assigned to hosts

Concrete: all nodes assigned to hosts

Scheduling turns the abstract workflow into a concrete one.

Prototypical Implementation ... in Kepler!

Abstract Workflow ...

SCHEDULING

Concrete Workflow ...

Future Work

Use heuristics about looping to guess multiplicities (then the workflow graph is no longer acyclic!)

Investigate approximation algorithms with error guarantees for 1-MTC and try to apply them to DSP

Also relevant for COMAD workflows: they can be “compiled” into a low-level conventional workflow

Summary

Bad news: Scheduling is hard; DSP is hard (for BEST plans).

Good news: Finding a reasonably good plan is easy with greedy or dynamic programming algorithms.

Open problems: What is the approximation quality of these “simple algorithms”? When do they perform badly? Does this occur often in real-life workflows?

References

Thank You. Questions?