starting work ow tasks before they’re...

Starting Workflow TasksBefore They’re Ready

Wladislaw Gusew, Bjorn Scheuermann

Computer Engineering Group, Humboldt University of Berlin

Agenda

I Introduction

I Execution semantics

I Methods and tools

I Simulation results

I Experimental results

I Conclusion

1 / 21

Big data in research

2 / 21

Scientific workflow example

I Directed Acyclic Graph(DAG)

I Executed on distributedsystems

I Aggregation and broadcasttypes of tasks

I Demanding for networkresources

3 / 21

Execution semantics

4 / 21

Execution semantics

I But in reality resources are limited

I Execute only a subset of parent tasks concurrently(insufficient number of workers)

I Congestion of network (all parent tasks have the same priority)

4 / 21

Example execution

5 / 21

Example execution

I Network congestion can slow down processing even further(effects of data losses at the transport protocol layer)

I High delay to the start of the aggregation task

I Low performance andhigh execution costs (e.g., in computation clouds)

5 / 21

What can we do to improve this?

6 / 21

What can we do to improve this?

List of actions:

1. Obtain information on task’s input characteristics

2. Refine the workflow and inform the execution engine

3. Let the aggregation task ”feel comfortable” in changed setting

6 / 21

Obtaining input characteristics

1. Annotations to workflows

2. Manual code review

3. Automated profiling

7 / 21

Automated profiling

I Operating system instrumentation tool

I Enables interception of system calls(file open, read/write, file close)

I Record and evaluate logfiles withtraces of conducted file accesses.

0

0.5

1

1.5

2

2.5

3

0 0.5 1 1.5 2 2.5 3

Read accesses [M

B]

Execution progress [108 CPU cycles]

Reads by mAdd in a small workflow

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

0 2 4 6 8 10 12 14 16 18

Read accesses [M

B]

Execution progress [108 CPU cycles]

Reads by mAdd in a medium sized workflow

8 / 21

Refining workflow by transforming DAG

9 / 21

Realizing virtual task split

I Real task is transparently wrapped

I FUSE enables the setup of a virtualFile system in USEr space

I Access to input files is performedthrough our wrapper

I Wrapper is responsible for maintainingthe correct execution logic

10 / 21

Evaluation with the Montage workflow

11 / 21

Simulating workflow execution

I Java-based simulation framework for scientific workflows

I Simulates an execution on a Pegasus/HTCondor stack

I Use provided Montage workflows with 25, 50, 100, 1000 tasks

I Python script conducted DAG transformation of DAX files

I Network configured as bottleneck (by bandwidth limitation)

W. Chen and E. Deelman, ”WorkflowSim: A toolkit for simulating scientific workflowsin distributed environments,” in eScience’12.

12 / 21

Simulation results

13 / 21

Variation of number of tasks

1

10

100

1000

25 50 100 1000

Total workflow runtime (log.) [s]

Number of tasks

Simulation results for 50 workers and max-min

Normal Split

15% 19%25%

31%

14 / 21

Variation of workers

15 / 21

Variation of workers

100

150

200

250

300

350

400

450

5 10 50 100

Total workflow runtime [s]

Number of workers

Simulation results for Montage100 and min-min

Normal Split

10%

14%

26% 25%

16 / 21

Variation of scheduling algorithms

17 / 21

Variation of scheduling algorithms

0

50

100

150

200

250

300

350

Min-minMax-min

Round-robin

HEFTDHEFT

Random


Scheduling algorithm

Simulation results for Montage100 on 100 workers

Normal Split

25% 27% 28% 25%

17% 34%

18 / 21

Evaluation in a computing cluster

I Small cluster of up to 10 compute nodes

I Intel i7 CPU@ 2.5GHz, 8GB RAM, connected to commonnetwork switch with 1Gbit/s

I Execute Montage 133 workflow in Pegasus/HTCondor

I Network bandwidth was limited on application layer to10Mbit/s

I 10 repetitions, mean values with 95% confidence intervals

19 / 21

Measurement results

0

20

40

60

80

100

120

140

160

180

200

1 2 3 4 5 6 7 8 9 10


Number of computing nodes

Computing cluster results for 1...10 workers

Original Montage133Transformed Montage133

20 / 21

Conclusion

I Many ”legacy” workflows exist which are executed with classicsemantics

I Our approach is applicable to aggregation tasks that are oftenthe most time intensive tasks in a workflow

I By using DAG transformation, no changes to taskimplementations and execution engines are required

I Simulation and real experiment show that performance can beimproved by up to 15%

I Potential of outperforming the original workflow grows withincreasing #workers and #tasks

21 / 21

starting work ow tasks before they’re...

Documents