Page 1: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

X. Liu , Z. Ni , Z. Wu , D. Yuan , J. Chen , Y. Yang, ICPADS10, 10-12-2010, Shanghai, China

An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Xiao Liu¹, Zhiwei Ni², Zhangjun Wu², Dong Yuan¹, Jinjun Chen¹, Yun Yang¹

¹ SUCCESS (Centre for Computing and Engineering Software Systems), Swinburne University of Technology, Melbourne, Australia

² Institute of Intelligent Management, Hefei University of Technology, Hefei, China

Page 2: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Outline

> Background

– Workflow Technology Group

– SwinDeW Family, SwinGrid, SwinCloud

> Brief Overview: Workflow Temporal QoS Support

> Handling Temporal Violations in Scientific Workflows

– Problem Analysis

– An Effective Light-Weight Handling Framework

– Two-Stage Local Workflow Rescheduling Strategy

> Evaluation

> Summary

Page 3: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Workflow Technology Group Overview

> WT group is a part of SUCCESS (Centre for Computing and Engineering Software Systems), a Tier-1 university research centre at Swinburne University of Technology. Our group conducts research into workflow technologies for complex software systems and services including peer-to-peer, grid, and cloud computing based e-science, e-business, transactional and inter-organisational workflows.

> Leader: Prof Yun Yang

> Visitors (7-8/09): Prof Lee Osterweil, Prof Lori Clarke

> Researchers: Dr Jinjun Chen (Senior Lecturer), Xiao Liu (PostDoc), Dong Yuan (PhD), Gaofeng Zhang (PhD), Wenhao Li (PhD), Dahai Cao (PhD), Xuyun Zhang (PhD)

> Others: Prof Ryszard Kowalczyk, Prof Chengfei Liu, Dr Jun Yan (Wollongong), Prof Hai Jin (HUST), Prof Mingshu Li (ISCAS), Prof Qing Wang (ISCAS), Prof Zhiwei Ni (HFUT), Prof Jinpeng Huai (BUAA)

Page 4: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

SwinDeW Family

> SwinDeW (Swinburne Decentralised Workflow) – foundation prototype based on p2p

– SwinDeW – past

– SwinDeW-A (for Agents) – ARC DP06

– SwinDeW-G (for Grid) – past

– SwinDeW-V (for Verification) – current (ARC DP)

– SwinDeW-C (for cloud) – current (ARC LP)

– Others: SwinDeW-B / -S / -P / -G – past

> Current Projects:

– ARC DP110101340, Cost effective storage of massive intermediate data in cloud computing applications, Duration: 2011-2013

– ARC LP0990393, Novel cloud computing based on workflow technology for managing large numbers of process instances, Duration: 2010-2012.

Page 5: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

SwinGrid to SwinCloud

[Figure: SwinGrid to SwinCloud architecture. Layers shown: Application Layer, Platform Layer, Unified Resource Layer, Fabric Layer. Labelled elements: Swinburne Computing Facilities (Astrophysics Supercomputer, Swinburne CS3, Swinburne ESR, each running GT4 on SuSE or CentOS Linux); VMware; Cloud Simulation Environment; Data Centres with Hadoop; SwinDeW-G Grid Computing Infrastructure (nodes labelled UK, VPAC, Hong Kong, Swinburne CS3, Swinburne ESR, Astrophysics Supercomputer, Beihang CROWN, PfC); Commercial Cloud Infrastructure (Amazon, Google and Microsoft data centres); VMs; SwinDeW-C Peer; SwinDeW-C Coordinator Peer; Activity; Workflow Execution.]

Page 7: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Scientific Workflows

> Scientific workflows often underlie large-scale, complex e-science applications such as climate modeling, astrophysics, structural biology and chemistry, earthquake simulation and disaster recovery.

> Scientific workflows are usually deployed in distributed high performance computing infrastructures such as clusters, grids and clouds.

> Compared with conventional business workflows, most scientific workflows are more data- and/or computation-intensive, involve less human interaction, and have larger scale and more complex process structures.

Page 8: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Temporal QoS Support for Scientific Workflows

> Motivation: most e-science applications are time constrained with global temporal constraints (deadlines) and local temporal constraints (milestones) to achieve some pre-defined goals on schedule.

> Basic requirements: automation and cost-effectiveness.

> Challenges: highly dynamic system environments, changing process structures, charges for resource usage.

> Solution: A Novel Probabilistic Temporal Framework and Its Strategies for Cost-Effective Delivery of High QoS in Scientific Cloud Workflow Systems [PhD Thesis - Xiao Liu]

Page 9: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Lifecycle Support of Temporal QoS

Page 10: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Lifecycle Support of Temporal QoS

> At workflow build-time modeling stage

– Component 1: temporal constraint setting

• Forecasting activity durations [eScience08], [JSS10b]

• Setting both coarse-grained and fine-grained temporal constraints [BPM08], [CCPE09], [JCSS10]

– Component 2: temporal consistency monitoring

• Temporal checkpoint selection [ICSE08], [TAAS07]

• Temporal verification [CCPE07], [ToSEM09]

– Component 3: temporal violation handling

• Temporal violation handling point selection [TSE]

• Temporal violation handling [CCGrid], [JSS10a], [TSE], [ICPADS]

Page 12: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Problem Analysis

> Basic requirements: automation and cost-effectiveness

> 1) How to define fine-grained recoverable temporal violations?

– Distinguish statistically recoverable from non-recoverable temporal violations, so that heavy-weight exception handling strategies can be avoided and light-weight ones used instead

– Divide recoverable temporal violations into fine-grained levels, so that handling strategies of different capability can be chosen (higher capability, higher cost)

> 2) Which light-weight yet effective exception handling strategies should be employed?

– Employ or design a set of light-weight handling strategies, ranging from low capability to high capability (low cost to high cost)

Page 13: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

An Effective Light-Weight Handling Framework

> Three levels of temporal violations

– Level I, Level II and Level III

> Corresponding three levels of temporal violation handling strategies

– TDA, ACOWR and TDA+ACOWR
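
The pairing of violation levels with handling strategies can be pictured as a small dispatch table. The following is only an illustrative sketch, not the authors' implementation: the slides do not say how the three levels are delimited, so the classification by time-deficit size and the thresholds `level1_max` and `level2_max` are hypothetical placeholders. Only the level names and strategy names come from the slides.

```python
from enum import Enum


class Level(Enum):
    I = 1
    II = 2
    III = 3


# Level-to-strategy pairing as listed on the slide.
STRATEGY = {Level.I: "TDA", Level.II: "ACOWR", Level.III: "TDA+ACOWR"}


def classify(deficit, level1_max, level2_max):
    """Classify a recoverable violation by the size of its time deficit.

    The thresholds are hypothetical: the slides do not state where one
    level ends and the next begins.
    """
    if deficit <= 0:
        return None  # no violation, nothing to handle
    if deficit <= level1_max:
        return Level.I
    if deficit <= level2_max:
        return Level.II
    return Level.III


def select_strategy(deficit, level1_max=5.0, level2_max=20.0):
    """Return the name of the handling strategy for a given time deficit."""
    level = classify(deficit, level1_max, level2_max)
    return None if level is None else STRATEGY[level]
```

Keeping classification separate from the strategy table means the level boundaries can be changed without touching the mapping, which matches the idea of escalating from low-cost to high-cost strategies.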

Page 14: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Three Levels of Handling Strategies

> TDA (Time Deficit Allocation) [CCPE07]

– TDA actively propagates small time deficits to the subsequent workflow activities, so that the deficits may be compensated by the execution time those activities save.

> ACOWR (Ant Colony Optimisation based Workflow Rescheduling) [CCGrid10]

– Based on our general two-stage local workflow rescheduling strategy

– Using ACO as the metaheuristic algorithm

> TDA+ACOWR (the hybrid strategy of TDA and ACOWR)

– One round of TDA followed by multiple rounds of ACOWR (normally fewer than 3)
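
The idea behind TDA, spreading a time deficit over later activities and letting their saved time absorb it, can be sketched as follows. This is a minimal sketch under stated assumptions: the proportional-to-redundancy allocation rule and the inputs `redundancies` and `saved_times` are hypothetical and chosen for illustration. The actual allocation rule is the one defined in [CCPE07], which the slides do not reproduce.

```python
def allocate_time_deficit(deficit, redundancies):
    """Split `deficit` across subsequent activities in proportion to redundancy.

    `redundancies[i]` is the slack that activity i is expected to have
    (a hypothetical input). Returns the share of the deficit that each
    activity is asked to absorb.
    """
    total = sum(redundancies)
    if total <= 0:
        raise ValueError("no redundancy available to absorb the deficit")
    return [deficit * r / total for r in redundancies]


def residual_deficit(shares, saved_times):
    """Deficit left after each activity offsets its share with the time it saved."""
    return sum(max(share - saved, 0.0) for share, saved in zip(shares, saved_times))
```

For example, `allocate_time_deficit(6.0, [1.0, 2.0, 3.0])` assigns shares of 1, 2 and 3 time units. If the three activities then save 1, 1 and 4 units respectively, `residual_deficit` reports 1 unit still uncompensated, which is the kind of remainder a stronger strategy such as ACOWR would then have to handle.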

Page 15: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

A General Two-Stage Workflow Local Rescheduling Strategy

> Handling temporal violations with workflow rescheduling

> Key objective: reduce or ideally remove the time deficit at the current checkpoint, i.e. to reduce the execution time of the subsequent activities after the checkpoint in the violated workflow segment as much as possible

> Requirement 1: striking a good balance between time deficit compensation and the completion time of other activities (workflow activities and general tasks, with or without temporal constraints) – from the overall makespan perspective

> Requirement 2: utilising available resources in the system rather than recruiting additional resources – from the overall cost perspective
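
One way to read the objective and the two requirements above is as a two-step selection over candidate schedules. The sketch below is an assumption-laden illustration, not the authors' algorithm: the slides do not spell out the two stages, so it assumes that stage 1 searches for schedules with a good overall makespan and stage 2 picks, among those, the schedule that finishes the violated segment earliest. Plain random search stands in for the metaheuristic (ACO in ACOWR), tasks are treated as independent (precedence constraints are ignored), and the task fields `duration` and `violated` are hypothetical names.

```python
import random


def evaluate(assignment, tasks, resources):
    """Return (overall makespan, finish time of the violated segment)."""
    free_at = {r: 0.0 for r in resources}
    segment_finish = 0.0
    for task, resource in zip(tasks, assignment):
        # Each resource runs its assigned tasks one after another.
        free_at[resource] += task["duration"][resource]
        if task["violated"]:
            segment_finish = max(segment_finish, free_at[resource])
    return max(free_at.values()), segment_finish


def two_stage_reschedule(tasks, resources, n_candidates=200, keep=10, seed=0):
    rng = random.Random(seed)
    # Stage 1: sample candidate task-to-resource assignments (random search
    # stands in for ACO/GA/PSO) and keep those with the best overall makespan.
    candidates = [
        tuple(rng.choice(resources) for _ in tasks) for _ in range(n_candidates)
    ]
    candidates.sort(key=lambda a: evaluate(a, tasks, resources)[0])
    shortlist = candidates[:keep]
    # Stage 2: among the shortlisted schedules, pick the one that finishes the
    # violated workflow segment earliest, i.e. compensates the most time.
    return min(shortlist, key=lambda a: evaluate(a, tasks, resources)[1])
```

The resource list is fixed for the whole search, reflecting Requirement 2 (no additional resources are recruited), while shortlisting by overall makespan before optimising the violated segment reflects Requirement 1.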

Page 16: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Integrated Task Resource List

Page 17: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Pseudo-code for An Abstract Strategy

Page 19: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Evaluation

> Performance analysis and comparison (with GA) for ACOWR

– Optimisation on Total Makespan

– Optimisation on Total Cost

– Time Compensation on Violated Workflow Segment

– CPU Time

> Effectiveness evaluation of the three-level handling framework

– Violation Rate of Global Temporal Constraints and Local Temporal Constraints

– Cost Analysis

Page 20: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Optimisation on Total Makespan

Page 21: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Optimisation on Total Cost

Page 22: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Time Compensation on Violated Workflow Segment

Page 23: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

CPU Time

Page 24: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Experiment Results on Temporal Violation Rates

Page 25: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Cost Analysis

Page 27: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Summary

> Temporal QoS Support is Critical in e-Science Applications

> Temporal Violation Handling in Scientific Workflows

– Automatic, Cost-Effective

– Level I, Level II and Level III

– TDA, ACOWR, TDA+ACOWR

> A Two-Stage Workflow Local Rescheduling Strategy

– ACO, GA, PSO and many other metaheuristics

> Future Work

– Data movement cost

– More scheduling algorithms

Page 28: An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

The End – Thank You!

> Any questions or comments?

> Email: [email protected]

> Website: http://www.ict.swin.edu.au/personal/xliu/

> An extension of this paper, titled “A Novel General Framework for Automatic and Cost-Effective Handling of Recoverable Temporal Violations in Scientific Workflow Systems,” has been accepted by Journal of Systems and Software (JSS), http://dx.doi.org/10.1016/j.jss.2010.10.027.
