Swinburne University of Technology | CRICOS Provider 00111D | swinburne.edu.au
Swinburne Research Bank http://researchbank.swinburne.edu.au
Liu, X., Chen, J., & Yang, Y. (2008). A probabilistic strategy for setting temporal
constraints in scientific workflows.
Originally published in M. Dumas, M. Reichert, & M.-C. Shan (eds.) Business process management: Proceedings of the 6th International Conference on
Business Process Management, BPM 2008, Milan, Italy, 02–04 September 2008. Lecture notes in computer science (Vol. 5240, pp. 180–195). Berlin: Springer.
Available from: http://dx.doi.org/10.1007/978-3-540-85758-7_15
Copyright © Springer-Verlag Berlin Heidelberg 2008. This is the author’s version of the work, posted here with the permission of the publisher for your personal use. No further distribution is permitted. You may also be able to access the published version from your library. The definitive version is available at http://www.springerlink.com/.
* The initial work was published in the Proceedings of the 6th International Conference on Business Process Management (BPM 2008), Lecture Notes in Computer Science, vol. 5240, pp. 180-195, Sept. 2008, Milan, Italy.
A Probabilistic Strategy for Temporal Constraint Management in
Scientific Workflow Systems*
Xiao Liu¹, Zhiwei Ni², Jinjun Chen¹, Yun Yang¹
¹Faculty of Information and Communication Technologies
Swinburne University of Technology
Hawthorn, Melbourne, Australia 3122
{xliu, yyang, jchen}@swin.edu.au
²Institute of Intelligent Management, School of Management
Hefei University of Technology
Hefei, Anhui, China 230009
Abstract: In scientific workflow systems, it is critical to ensure the timely completion of scientific
workflows. Therefore, temporal constraints, as a type of QoS (Quality of Service) specification, are
usually required to be managed in scientific workflow systems. Specifically, temporal constraint
management includes two basic tasks: setting temporal constraints at workflow build time and
updating temporal constraints at workflow run time. For constraint setting, the current work mainly
adopts user specified temporal constraints without considering system performance. Hence, it may
result in frequent temporal violations which deteriorate the overall workflow execution
effectiveness. As for constraint updating, which so far has not been well investigated, it is in fact of
great importance to workflow management tasks such as workflow scheduling and exception
handling. In this paper, with a systematic analysis of the above issues, we propose a probabilistic
strategy for temporal constraint management which utilises a novel probability based temporal
consistency model. Specifically, for constraint setting, a negotiation process between the client and
the service provider is designed to support the setting of coarse-grained temporal constraints and
then automatically derive the fine-grained temporal constraints; for constraint updating, the
probability time deficit/redundancy propagation process is proposed to update run-time fine-grained
temporal constraints when workflow execution is either ahead of or behind schedule.
The effectiveness of our strategy is demonstrated through a case study on an example scientific
workflow process in our scientific workflow system.
Keywords: Scientific Workflow System, Workflow QoS, Temporal Constraints, Temporal
Constraint Setting, Temporal Constraint Updating, Probabilistic Strategy
1. Introduction
Scientific workflow is a special type of workflow that often underlies many large-scale complex e-
science applications such as climate modelling, structural biology and chemistry, medical surgery or
disaster recovery simulation [15, 43, 48]. Real-world scientific processes normally reside in a temporal
context and are often time constrained to achieve on-time fulfilment of certain scientific or business
targets. Otherwise, the usefulness of their execution results will be severely deteriorated. For example, a
daily weather forecast scientific workflow has to be finished before the broadcasting of the weather
forecast program every day at, for instance, 6:00pm. Furthermore, due to scientific research
requirements, scientific workflows are usually deployed on high performance computing
infrastructures, e.g. peer-to-peer, cluster, grid and cloud computing, to deal with a huge number of data
intensive and computation intensive activities [4, 17, 27-28, 47, 51]. Therefore, as an important
dimension of workflow QoS (Quality of Service) constraints, temporal constraints are often set to
ensure satisfactory efficiency of scientific workflow executions [9, 12, 16, 53].
In traditional business workflows, workflow systems usually maintain an overall deadline (a global
temporal constraint for the entire workflow instance) and several milestones (local temporal constraints
for some important workflow segments) [1, 16, 20]. Most business workflows involve a lot of tasks
which require execution and decision making by human resources. Since the performance of human
resources is normally difficult to predict and control [6, 19], it is neither effective nor realistic
to set too many temporal constraints along business workflow processes. In the real world, most
business workflows are also partially controlled by human managers. Therefore, a user-defined global
deadline and several milestones are normally enough for human managers, who can perform dynamic
control over workflow executions to ensure on-time completion based on their own experience [26].
In contrast to business workflow systems, scientific workflow systems are designed to be highly
automatic to conduct large scale scientific processes [43]. Instead of human managers, scientific
workflows are controlled by workflow execution engines where predefined scheduling and exception
handling strategies are implemented to control underlying high performance computing resources [8,
13, 39, 44, 50]. For example, in many scientific computing environments such as grid and cloud
computing, resources are shared and competed for by many users. Resources such as clusters and
supercomputers usually maintain their own job queues and are managed by local schedulers rather than
being under the full control of a specific workflow execution engine outside the organisation [8, 45, 51].
Therefore, to meet specific QoS requirements in scientific workflows, hierarchical scheduling is often
employed [46]. For hierarchical scheduling, the central scheduler in the workflow execution engine is
responsible for controlling the workflow execution based on the global QoS constraint and assigning
workflow segments to local schedulers with local QoS constraints. Each local scheduler is responsible
for scheduling activities in a workflow segment onto one single resource or multiple resources owned
by one organisation. Therefore, to facilitate hierarchical scheduling and many other tasks for delivering
satisfactory temporal QoS in scientific workflow systems, besides a global temporal constraint, a large
number of local temporal constraints are required. In order to maintain these temporal constraints, the
issue of temporal constraint management is brought up in scientific workflow systems.
Specifically, temporal constraint management in scientific workflow systems includes two basic tasks:
setting temporal constraints at build time and updating temporal constraints at run time. Here, to
illustrate the requirement for these two tasks, we take workflow temporal verification as an example.
As an important means to deliver satisfactory temporal QoS, many efforts have been dedicated to
workflow temporal verification in recent years. Different approaches for checkpoint selection and
dynamic temporal verification are proposed as scientific workflow functionalities to improve the
efficiency of temporal verification with given temporal constraints [7, 10-12]. However, with the
assumption that temporal constraints are pre-defined, most work focuses on run-time temporal
verification while neglecting the fact that efforts put in at run time will be mostly in vain without the
build-time setting of high-quality temporal constraints [25]. The reason is obvious, since the purpose of
temporal verification is to identify potential violations of temporal constraints to minimise the
exception handling cost. Therefore, if the temporal constraints themselves are of low quality, temporal
violations are highly likely no matter how much effort has been put into temporal verification.
Meanwhile, with the assumption that temporal constraints are unchanged during workflow run-time,
the task of updating temporal constraints at run time is neglected. However, to support the many run-time
functionalities such as temporal verification and exception handling on temporal violations, local
temporal constraints should be updated dynamically according to real activity durations (the global
temporal constraint, which serves as a type of QoS contract between clients and service providers,
should normally stay unchanged unless a new contract is signed). As will be discussed later in Section 2,
both the time deficit (the time delay between the execution time and the temporal constraint) and the
time redundancy (the time saving between the execution time and the temporal constraint) should be
propagated to subsequent activities to support the time related decision making process [49]. Therefore,
setting build-time temporal constraints and updating run-time temporal constraints are the two basic
tasks in temporal constraint management in scientific workflow systems.
Temporal constraints mainly include three types, i.e. upper bound, lower bound and fixed-time. An
upper bound constraint between two activities is a relative time value such that the duration between them
must be less than or equal to it. A lower bound constraint between two activities is a relative time value
such that the duration between them must be greater than or equal to it. A fixed-time constraint at an
activity is an absolute time value by which the activity must be completed. As discussed in [12],
conceptually, a lower bound constraint is symmetrical to an upper bound constraint, and a fixed-time
constraint can be viewed as a special case of an upper bound constraint whose start activity is exactly the
start activity of the whole workflow instance; hence they can be treated similarly. In the scientific
workflow area, upper bound constraints are often used as a general case to facilitate research
investigation [7, 11]. Therefore, in this paper, we focus on upper bound constraints only.
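To make the three constraint types concrete, the following sketch (our own illustration, not part of the paper; all function names are hypothetical) represents each type and checks it against observed times:

```python
# Hypothetical sketch: checking the three temporal constraint types
# described above. Times are in minutes from the workflow start.

def check_upper_bound(start_time, end_time, bound):
    """Upper bound: the duration between two activities must be <= bound."""
    return (end_time - start_time) <= bound

def check_lower_bound(start_time, end_time, bound):
    """Lower bound: the duration between two activities must be >= bound."""
    return (end_time - start_time) >= bound

def check_fixed_time(end_time, deadline):
    """Fixed-time: a special case of upper bound whose start activity is
    the start of the whole workflow instance (start time = 0)."""
    return check_upper_bound(0, end_time, deadline)

print(check_upper_bound(10, 40, 45))   # True: duration 30 <= 45
print(check_lower_bound(10, 40, 45))   # False: duration 30 < 45
print(check_fixed_time(100, 120))      # True: completed by the absolute deadline
```

The last function illustrates why fixed-time constraints can be treated like upper bound constraints, as noted above.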
In this paper, our target is to investigate and address the issue of temporal constraint management in
scientific workflows. Specifically, as mentioned above, there are two basic tasks for temporal
constraint management: setting temporal constraints and updating temporal constraints. The task of
setting temporal constraints is to assign a set of coarse-grained and fine-grained upper bound
constraints to scientific workflows at workflow build time. Here, coarse-grained constraints refer to
those assigned to the entire workflow instance or workflow sub-processes, while fine-grained
constraints refer to those assigned to individual activities. The task of updating temporal constraints is
to update fine-grained temporal constraints according to real activity durations at workflow run time.
To address the above issues, in this paper, a probabilistic strategy for temporal constraint
management in scientific workflow systems is proposed. Our strategy utilises a novel probability based
temporal consistency model where workflow activity durations are modelled as random variables with
their structure weights. Here, the duration of a specific activity is defined as the time period from the
submission of the activity until its completion [30]. Based on system historic data, the structure weight
of a specific activity is defined according to its contribution to the completion time of the entire
workflow instance. Its value is specified as the choice probability or statistical iteration times associated
with the workflow path to which the activity belongs. The basic idea of the structure weight is
illustrated in Section 3 with the weighted joint distribution of four basic Stochastic Petri Nets [2-3]
based building blocks, i.e. sequence, iteration, parallelism and choice. Our strategy supports an iterative
and interactive negotiation process between the client (e.g. a user) and the service provider (e.g. a
workflow system) in either a time-oriented or a probability-oriented fashion for setting coarse-grained
upper bound temporal constraints. Thereafter, fine-grained temporal constraints associated with
each activity can be propagated automatically. Our strategy also provides a probability time deficit
propagation process and a probability time redundancy propagation process to update fine-grained
temporal constraints in an automatic fashion at run time. The effectiveness of our strategy is further
demonstrated by a weather forecast scientific workflow in our scientific workflow management system.
The remainder of the paper is organised as follows. Section 2 presents a motivating example and the
problem analysis. Section 3 proposes a novel probability based temporal consistency model which
facilitates our probabilistic strategy for temporal constraint management in scientific workflow
systems. Section 4 presents the negotiation based probabilistic strategy for setting temporal constraints
at build time. Section 5 presents the probability time deficit and time redundancy propagation process
for updating temporal constraints at run time. Section 6 demonstrates both the setting and updating
process by a case study with the motivating example to verify the effectiveness of our strategy. Section
7 introduces the implementation of the strategy in our scientific workflow system. Section 8 presents
the related work. Finally, Section 9 presents our conclusions and points out future work.
2. Motivating Example and Problem Analysis
In this section, we introduce a weather forecast scientific workflow to demonstrate the motivation for
temporal constraint management in scientific workflow systems. In addition, two basic requirements
for temporal constraint management are presented.
2.1 Motivating Example
The entire weather forecast workflow contains hundreds of thousands of data intensive and
computation intensive activities [5]. Major data intensive activities include the collection of
meteorological information, e.g. surface data, atmospheric humidity, temperature, cloud area and wind
speed, from satellites, radars and ground observatories at distributed geographic locations. These data
files are transferred via various kinds of network. Computation intensive activities mainly consist of
solving complex meteorological equations, e.g. meteorological dynamics equations, thermodynamic
equations, pressure equations, turbulent kinetic energy equations and so forth which require high
performance computing resources. Due to space limits, it is not possible to present the whole
forecasting process in detail. Here, we only focus on one of its segments for radar data collection. The
graphic notations for Stochastic Petri Nets are illustrated in Figure 1 and the example workflow
segment is depicted in Figure 2.
Figure 1. Graphic notations for Stochastic Petri Nets
The classical Petri Net is a directed bipartite graph. It contains two types of nodes, called places
(representing conditions) and transitions (representing events/activities), which are connected via arcs
(representing control flows). A Stochastic Petri Net is a type of high-level Petri Net which is extended with
timing and probability features [2]. The notations of constraint start and constraint end represent the
start point and the end point of a temporal constraint respectively. The notation of structure weight
represents the structure weight for the duration of an activity as defined in Section 3. The notation of
duration distribution represents the distribution model for the duration of an activity.
For simplicity, we denote the activities in our example scientific workflow segment as X_1 to X_12.
The workflow process structures are composed of four Stochastic Petri Nets based building blocks,
i.e. a choice block for data collection from two radars at different locations (activities X_1 ~ X_4), a
compound block of parallelism and iteration for data updating and pre-processing
(activities X_6 ~ X_10), and two sequence blocks for data transferring (activities X_5, X_11, X_12).
Figure 2. Example scientific workflow segment
It is evident that the durations of these scientific workflow activities are highly dynamic in nature due
to their data complexity and the computation environment. However, to ensure that the weather forecast can
be broadcast on time, every scientific workflow instance must be completed within a specific time
duration. Therefore, a set of temporal constraints must be set to monitor the workflow execution time.
For our example workflow segment, to ensure that the radar data can be collected in time and
transferred for further processing, at least one overall upper bound temporal constraint is required.
However, a coarse-grained temporal constraint is not effective enough to control fine-grained workflow
execution time, e.g. the completion time of each workflow activity. It is evident that without the
support of local enforcements, the overall workflow duration can hardly be guaranteed. For example,
we set a two-hour temporal constraint for this radar data collection process. But due to some technical
problems, the connections to the two radars are broken and blocked in a state of retry and timeout for
more than 30 minutes whilst the normal duration should be far less. Therefore, the two-hour overall
temporal constraint for this workflow segment will probably be violated since its subsequent activities
normally require more than 90 minutes to accomplish. However, no actions are taken due to the
ignorance of fine-grained temporal constraints. The exception handling cost for compensation of
this time deficit, e.g. workflow re-scheduling and recruitment of additional resources, is hence
inevitable. Similar problems also take place in hierarchical workflow scheduling. If we only set a two-hour
upper bound temporal constraint for the whole radar data collection process, it is difficult for a
local scheduler to allocate a suitable time slot for activities X_6 ~ X_10 in order to complete the data
updating and pre-processing segment on time. That is why we also need to set fine-grained temporal
constraints for each activity. Specifically, for this example workflow segment, at least one overall
coarse-grained temporal constraint and, ideally, 12 fine-grained temporal constraints for
activities X_1 to X_12 are required to be set.
Meanwhile, at workflow run-time, fine-grained temporal constraints need to be updated. For example,
suppose the local temporal constraint for the data collection sub-process (activities X_1 ~ X_4) is 10 minutes and
the local temporal constraint for the data updating and pre-processing sub-process (activities X_6 ~ X_10)
is 60 minutes, given the overall 2-hour constraint for the whole workflow segment. If at workflow run-time,
due to some unexpected technical problems, the actual duration of the data collection sub-process
is 30 minutes, then at activity X_5 a time deficit of 20 minutes will be detected. In order to
resolve such a temporal violation, the durations of the subsequent activities X_6 ~ X_12 need to be
decreased to compensate for this 20-minute delay. Normally, an exception handling strategy such as a
local workflow rescheduling strategy will be triggered to tackle the occurring temporal violation.
Therefore, the original temporal constraints for activities X_6 ~ X_12 need to be updated. The new
temporal constraints are required to facilitate the central scheduler to allocate faster resources or to
facilitate local schedulers to assign closer time slots to decrease the queuing time. For another example,
if the actual duration of activities X_6 ~ X_10 is 30 minutes, then at activity X_11 we can detect that not
only has the 20-minute delay been compensated, but there is also a 10-minute time redundancy. In such a
case, the original temporal constraint for the data transferring activities X_11 ~ X_12, for instance 10
minutes, can be increased to 20 minutes. Then, one possible strategy is that the priority of the
data transferring activities is decreased so that the network can first meet the requirements of other
urgent tasks. Another possible strategy is that the system manager chooses another network with less
bandwidth so that the cost for data transfer can be reduced. With the former strategy, the overall
temporal QoS of the scientific workflow system can be improved. With the latter strategy, the cost for
scientific workflow execution can be reduced. Therefore, it is also important to update the local
temporal constraints so as to fully utilise the time redundancy.
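The run-time bookkeeping in this example can be sketched as follows (a minimal illustration using the numbers above; the function name is our own, not part of the strategy):

```python
# Hypothetical sketch of time deficit/redundancy detection, using the
# numbers from the motivating example (all times in minutes).

def time_balance(constraint, actual):
    """Positive -> time deficit (behind schedule);
    negative -> time redundancy (ahead of schedule)."""
    return actual - constraint

# Data collection sub-process (X_1 ~ X_4): constraint 10, actual 30.
deficit_at_x5 = time_balance(10, 30)           # 20-minute deficit detected at X_5
print(deficit_at_x5)                           # 20

# Data updating and pre-processing (X_6 ~ X_10): constraint 60, actual 30,
# carrying the earlier 20-minute deficit forward to X_11.
balance_at_x11 = deficit_at_x5 + time_balance(60, 30)
print(balance_at_x11)                          # -10 -> 10-minute redundancy

# The constraint for data transferring (X_11 ~ X_12) can absorb the redundancy:
new_transfer_constraint = 10 - balance_at_x11  # 10 + 10 = 20 minutes
print(new_transfer_constraint)                 # 20
```

This is exactly the deficit/redundancy propagation that Section 5 automates.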
2.2 Problem Analysis
From the illustration of the motivating example, it is evident that temporal constraint management
plays an important role in scientific workflow systems. However, setting and updating temporal
constraints are not straightforward tasks. Many factors such as workflow structures, system
performance and user requirements should be taken into consideration. Here, we present the basic
requirements for temporal constraint management by analysing two criteria for high quality temporal
constraints.
1) Temporal constraints should be well balanced between user requirements and system
performance. It is common that clients often suggest coarse-grained temporal constraints based on their
own interests but with limited knowledge about the actual performance of workflow systems. In
our example, it is not rational to set a 60-minute temporal constraint on a segment which normally
needs two hours to finish. Therefore, user-specified constraints are normally prone to cause frequent
temporal violations. To address this problem, a negotiation process between the client and the service
provider, who is well aware of the system performance, is desirable to derive balanced coarse-grained
temporal constraints that both sides are satisfied with.
2) Temporal constraints should facilitate both overall coarse-grained control and local fine-grained
control. As analysed above, this criterion actually means that temporal constraint management should
support both coarse-grained temporal constraints and fine-grained temporal constraints. Specifically,
the task of setting build time temporal constraints includes setting both coarse-grained temporal
constraints (an overall deadline for the entire workflow instance and local temporal constraints for local
workflow segments) and fine-grained temporal constraints (temporal constraints for individual
workflow activities). However, although the overall workflow process is composed of individual
workflow activities, coarse-grained temporal constraints and fine-grained temporal constraints are not
in a simple relationship of linear accumulation and decomposition. Meanwhile, it is impractical to set or
update fine-grained temporal constraints manually for a large number of activities in scientific
workflows. Since coarse-grained temporal constraints can be obtained through the negotiation process,
the problem for setting fine-grained temporal constraints is how to derive them automatically based on
the coarse-grained temporal constraints. Similarly, the problem for updating fine-grained temporal
constraints is how to propagate the time deficit/redundancy automatically in an efficient fashion.
To conclude, the basic requirements for temporal constraint management in scientific workflow
systems can be put as follows: at build time, effective negotiation for setting coarse-grained temporal
constraints and automatic derivation of fine-grained temporal constraints; at run time, automatic
propagation of time deficit/redundancy for updating local temporal constraints. To the best of our
knowledge, the problem of temporal constraint management in scientific workflows has so far not been
systematically investigated.
3. Probability Based Temporal Consistency Model
In this section, we propose a novel probability based temporal consistency model which utilises the
weighted joint distribution of workflow activity durations to facilitate temporal constraint management
in scientific workflow systems.
3.1 Weighted Joint Normal Distribution for Workflow Activity Durations
To define the weighted joint distribution of workflow activity durations, we first present two
assumptions on the probability distribution of activity durations.
Assumption 1: The distribution of activity durations can be obtained from workflow system logs
through statistical analysis [24]. Without losing generality, we assume that all the activity durations
follow the normal distribution model, denoted as N(μ, σ²), where μ is the expected value, σ² is the
variance and σ is the standard deviation [42].
Assumption 2: The activity durations are independent from each other.
For the convenience of analysis, Assumption 1 chooses the normal distribution to model activity
durations without losing generality. If most of the activity durations follow non-normal distributions,
e.g. the uniform, exponential, lognormal or Beta distribution [21], the idea of our strategy can still be
applied in a similar way given different joint distribution models. However, we leave the detailed
investigation of different distribution models as our future work. Furthermore, as is commonly applied
in the area of system simulation and performance analysis, Assumption 2 requires that the activity
durations be independent from each other to facilitate the analysis of the joint normal distribution.
Durations which do not follow the above assumptions can be treated by normal transformation and
correlation analysis [42], or they can be ignored first when calculating the joint distribution and then
added up afterwards.
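Assumption 1 can be illustrated with a short sketch (our own, using made-up log data) that fits a normal model N(μ, σ²) to activity durations recorded in system logs:

```python
# Hypothetical sketch for Assumption 1: estimating N(mu, sigma^2) for one
# activity's duration from (made-up) workflow system log data.
import statistics

# Durations (in minutes) of one activity, as they might appear in the logs.
logged_durations = [11.2, 9.8, 10.5, 12.1, 10.0, 9.5, 11.7, 10.9]

mu = statistics.mean(logged_durations)      # expected value mu
sigma = statistics.stdev(logged_durations)  # sample standard deviation sigma

print(f"duration ~ N({mu:.2f}, {sigma:.2f}^2)")
```

In practice the fit would be validated (e.g. against the non-normal alternatives mentioned above) before being used for constraint setting.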
Furthermore, we present an important formula of joint normal distribution, Formula 1.

Formula 1: If there are n independent variables X_i ~ N(μ_i, σ_i²) and n real numbers θ_i, where
n is a natural number, then the joint distribution of these variables can be obtained with the following
formula [42]:

Z = θ_1X_1 + θ_2X_2 + … + θ_nX_n ~ N( Σ_{i=1}^{n} θ_iμ_i , Σ_{i=1}^{n} θ_i²σ_i² )   (1)
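Formula 1 can be checked numerically (an illustrative sketch of our own, not from the paper): the analytic mean and variance of a weighted sum of independent normals should match a simulation estimate:

```python
# Hypothetical sketch verifying Formula 1: for independent X_i ~ N(mu_i, sigma_i^2)
# and weights theta_i, Z = sum(theta_i * X_i) has mean sum(theta_i * mu_i)
# and variance sum(theta_i^2 * sigma_i^2).
import random

mus = [10.0, 20.0, 5.0]      # expected values mu_i
sigmas = [2.0, 3.0, 1.0]     # standard deviations sigma_i
thetas = [1.0, 0.5, 2.0]     # weights theta_i

# Analytic parameters of Z according to Formula 1.
mu_z = sum(t * m for t, m in zip(thetas, mus))             # 10 + 10 + 10 = 30.0
var_z = sum(t**2 * s**2 for t, s in zip(thetas, sigmas))   # 4 + 2.25 + 4 = 10.25

# Monte Carlo estimate of the same quantities.
random.seed(42)
samples = [sum(t * random.gauss(m, s) for t, m, s in zip(thetas, mus, sigmas))
           for _ in range(200_000)]
est_mu = sum(samples) / len(samples)
est_var = sum((z - est_mu)**2 for z in samples) / len(samples)

print(mu_z, var_z)      # 30.0 10.25
print(est_mu, est_var)  # close to 30.0 and 10.25
```

The same computation underlies the weighted joint distributions of the building blocks in this section, with the structure weights playing the role of the θ_i.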
Based on this formula, we define the weighted joint distribution of workflow activity durations as
follows.

Definition 1: (Weighted joint distribution). For a scientific workflow process SW which consists of
n activities, we denote the activity duration distribution of activity a_i as N(μ_i, σ_i²) with 1 ≤ i ≤ n.
Then the weighted joint distribution is defined as
N(μ_sw, σ_sw²) = N( Σ_{i=1}^{n} w_iμ_i , Σ_{i=1}^{n} w_i²σ_i² ), where w_i stands for the weight of
activity a_i, which denotes the choice probability or iteration times associated with the workflow path
to which a_i belongs.
The weight of each activity under different workflow structures is illustrated through the calculation of
the weighted joint distribution for the basic Stochastic Petri Nets based building blocks, i.e. sequence,
iteration, parallelism and choice. These four building blocks consist of basic control flow patterns and
are widely used in workflow modelling and structure analysis [1-2]. Most workflow process models
can be easily built by their composition, and similarly for the weighted joint distribution of most
workflow processes. Here, as introduced in Section 2, Stochastic Petri Nets based modelling is
employed to incorporate time and probability attributes with additional graphic notations, i.e. the
probability notation, which stands for the probability of a path, and the duration distribution notation,
which stands for the normal duration distribution of the associated activity. For simplicity, we illustrate
with two paths for the iteration, parallelism and choice building blocks, except the sequence building
block which has only one path by nature. However, the results can be effectively extended to more
than two paths in a similar way.
1) Sequence building block. As depicted in Figure 3, the sequence building block is composed of
adjacent activities from a_i to a_j in a sequential relationship, which means the successor activity will
not be executed until its predecessor activity is finished. The structure weight for each activity in the
sequence building block is 1 since each activity only needs to be executed once. Therefore, according
to Formula 1, the weighted joint distribution is
Z = Σ_{k=i}^{j} X_k ~ N( Σ_{k=i}^{j} μ_k , Σ_{k=i}^{j} σ_k² ).
Figure 3. Sequence building block
2) Iteration building block. As depicted in Figure 4, the iteration building block contains two paths
which are executed iteratively until certain end conditions are satisfied. Without the context of run-time
workflow execution, it is difficult, if not impossible, to obtain the number of iteration times at
workflow build time. Therefore, in practice, the number of iteration times is usually estimated with the
mean iteration times or with some probability distribution models such as the normal, uniform or
exponential distribution. In this paper, we use the mean iteration times to calculate the weighted joint
distribution in the iteration building block. The major advantage of this simplification is to avoid the
complex joint distribution (if it exists) of activity durations (normal distribution) and the number of
iteration times (which may follow a normal or other non-normal distribution) in order to facilitate the
setting of temporal constraints at build time in an efficient fashion [42]. Here, to be consistent with the
Stochastic Petri Nets, we assume the probability of meeting the end conditions for a single iteration is
γ (i.e. the mean iteration times is 1/γ) as denoted by the probability notation. Therefore, the lower path
is expected to be executed 1/γ times and hence the upper path is executed (1/γ) + 1 times.
Accordingly, the structure weight for each activity in the iteration building block is the expected
execution times of the path it belongs to. Therefore, the weighted joint distribution here is
Z = ((1/γ) + 1) Σ_{p=i}^{j} X_p + (1/γ) Σ_{q=k}^{l} X_q
~ N( ((1/γ) + 1) Σ_{p=i}^{j} μ_p + (1/γ) Σ_{q=k}^{l} μ_q , ((1/γ) + 1)² Σ_{p=i}^{j} σ_p² + (1/γ)² Σ_{q=k}^{l} σ_q² ).
Figure 4. Iteration building block
3) Parallelism building block. As depicted in Figure 5, the parallelism building block contains two
paths which are executed in parallel. Since the activity durations are modelled by normally distributed
variables, the overall duration of the parallelism building block is equal to the distribution of the
maximum duration of the two parallel paths. However, calculating the exact distribution of the
maximum of two random variables is a complex issue [29] which requires fundamental knowledge of
statistics and a non-trivial computation cost. Therefore, in practice, approximation is often applied
instead of using the exact distribution. Since the overall completion time of the parallelism building
block is dominated by the path with the longer duration [2], in this paper, we define the joint
distribution of the parallelism building block as the joint distribution of the path with the larger expected
duration, i.e. if Σ_{p=i}^{j} μ_p ≥ Σ_{q=k}^{l} μ_q then Z = Σ_{p=i}^{j} X_p, otherwise
Z = Σ_{q=k}^{l} X_q. Accordingly, the structure weight for each activity on the path with the longer
duration is 1 while on the other path it is 0. Therefore, the weighted joint distribution of this block is
Z ~ N( Σ_{p=i}^{j} μ_p , Σ_{p=i}^{j} σ_p² ) if Σ_{p=i}^{j} μ_p ≥ Σ_{q=k}^{l} μ_q;
otherwise Z ~ N( Σ_{q=k}^{l} μ_q , Σ_{q=k}^{l} σ_q² ).
Figure 5. Parallelism building block
4) Choice building block. As depicted in Figure 6, the choice building block contains two paths in
an exclusive relationship which means only one path will be executed at run-time. The probability
notation denotes that the probability for the choice of the upper path is β and hence the choice
probability for the lower path is β−1 . In the real world, β may also follow some probability
distribution. However, similar to the iteration building block, in order to avoid the complex joint
distribution, β is estimated by the mean probability for selecting a specific path, i.e. the number of
times that the path has been selected divided by the total number of workflow instances. Accordingly,
the structure weight for each activity in the choice building block is the probability of the path it
belongs to. Therefore, the weighted joint distribution is $Z=\beta\sum_{p=i}^{j}X_p+(1-\beta)\sum_{q=k}^{l}X_q \sim N\left(\beta\sum_{p=i}^{j}\mu_p+(1-\beta)\sum_{q=k}^{l}\mu_q,\ \beta^2\sum_{p=i}^{j}\sigma_p^2+(1-\beta)^2\sum_{q=k}^{l}\sigma_q^2\right)$.
Figure 6. Choice building block
Note that the purpose of presenting the weighted joint normal distribution of the four basic building blocks is twofold. The first is to illustrate the definition of structure weight for workflow activity durations. The second is to facilitate the efficient calculation of the weighted joint normal distribution of
scientific workflows or workflow segments at build time by the composition of the four basic building
blocks. Furthermore, following the common practice in the workflow area [2, 41], approximations have
been made to avoid calculating complex joint distribution. Since it is not the focus of this paper, the
discussion on the exact distribution of these complex joint distribution models can be found in [29, 42].
3.2 Probability Based Temporal Consistency Model
The weighted joint distribution enables us to analyse the completion time of the entire workflow from
an overall perspective. Here, we need to define some notations. For a workflow activity ia , its
maximum duration, mean duration and minimum duration are defined as )( iaD , )( iaM and )( iad
respectively. For a scientific workflow SW which consists of n activities, its build-time upper bound
temporal constraint is denoted as $U(SW)$. In addition, we employ the "$3\sigma$" rule which has been widely used in statistical data analysis to specify the possible intervals of activity durations [21]. The "$3\sigma$" rule states that any sample from a normal distribution model has a probability of 99.73% of falling into the range $[\mu-3\sigma,\ \mu+3\sigma]$, which is a systematic interval of 3 standard deviations around the mean, where $\mu$ and $\sigma$ are the sample mean and sample standard deviation respectively. This statistical information can be obtained from scientific workflow system logs through statistical analysis [24]. Therefore, in this paper, we define the maximum duration, the mean duration and the minimum duration as $D(a_i)=\mu_i+3\sigma_i$, $M(a_i)=\mu_i$ and $d(a_i)=\mu_i-3\sigma_i$ respectively. Accordingly, samples from the scientific workflow system logs which are above $D(a_i)$ or below $d(a_i)$ are discarded as outliers. The actual run-time duration of $a_i$ is denoted as $R(a_i)$. Now, we propose the
definition of probability based temporal consistency which is based on the weighted joint distribution
of activity durations. Note that, since temporal constraint management includes both setting temporal
constraints at build time and updating temporal constraints at run time, our probability based temporal
consistency model also includes both definitions for build-time temporal consistency and run-time
temporal consistency.
Definition 2: (Probability Based Temporal Consistency Model).
At the build-time stage, $U(SW)$ is said to be:
1) Absolute Consistency (AC), if $\sum_{i=1}^{n}w_i(\mu_i+3\sigma_i)<U(SW)$;
2) Absolute Inconsistency (AI), if $\sum_{i=1}^{n}w_i(\mu_i-3\sigma_i)>U(SW)$;
3) $\alpha\%$ Consistency ($\alpha\%$C), if $\sum_{i=1}^{n}w_i(\mu_i+\lambda\sigma_i)=U(SW)$.
At the run-time stage, at a workflow activity $a_p$ ($1<p<n$), $U(SW)$ is said to be:
1) Absolute Consistency (AC), if $\sum_{i=1}^{p}R(a_i)+\sum_{j=p+1}^{n}w_j(\mu_j+3\sigma_j)<U(SW)$;
2) Absolute Inconsistency (AI), if $\sum_{i=1}^{p}R(a_i)+\sum_{j=p+1}^{n}w_j(\mu_j-3\sigma_j)>U(SW)$;
3) $\alpha\%$ Consistency ($\alpha\%$C), if $\sum_{i=1}^{p}R(a_i)+\sum_{j=p+1}^{n}w_j(\mu_j+\lambda\sigma_j)=U(SW)$.
Here $w_i$ stands for the weight of activity $a_i$, and $\lambda$ ($-3\le\lambda\le 3$) is defined as the $\alpha\%$ confidence percentile with the cumulative normal distribution function $F(\mu_i+\lambda\sigma_i)=\int_{-\infty}^{\mu_i+\lambda\sigma_i}\frac{1}{\sigma_i\sqrt{2\pi}}e^{-\frac{(x-\mu_i)^2}{2\sigma_i^2}}dx=\alpha\%$ ($0<\alpha<100$). As depicted in Figure 7,
different from the conventional multiple temporal consistency model where only four discrete coarse-grained temporal consistency states are defined [7, 12], in our temporal consistency model, every probability temporal consistency state is represented by a unique probability value, and together they compose a curve the same as the cumulative normal distribution [21]. Therefore, they can effectively support the requirements of both coarse-grained control and fine-grained control in scientific workflow systems as discussed in Section 2.2. The probability consistency states outside the confidence percentile interval of $[-3,+3]$ take continuous values infinitely approaching 0% or 100% respectively. However, since there is only a scarce chance (i.e. 100%−99.73%=0.27%) that the probability temporal consistency state will fall outside this interval, we name these states absolute consistency (AC) and absolute inconsistency (AI) in order to distinguish them from the others.
Figure 7. Probability based temporal consistency
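As a small illustration (our own sketch, not part of the paper), the α% consistency state of a proposed upper bound constraint can be read off the standard normal CDF, with percentiles beyond ±3 mapped to AC and AI:

```python
import math

def consistency_state(mu_sw, sigma_sw, upper_bound):
    """Return the temporal consistency state of constraint U(SW), given
    the weighted joint distribution N(mu_sw, sigma_sw**2). Percentiles
    above +3 / below -3 map to AC / AI respectively."""
    lam = (upper_bound - mu_sw) / sigma_sw                 # percentile lambda
    alpha = 0.5 * (1.0 + math.erf(lam / math.sqrt(2.0)))   # Phi(lambda)
    if lam > 3:
        return "AC", alpha
    if lam < -3:
        return "AI", alpha
    return f"{100 * alpha:.0f}% consistency", alpha
```

For example, with the weighted joint distribution N(6190, 217²) used later in the case study, a proposed constraint of 6250s gives λ≈0.28 and a consistency state of about 61%.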
The purpose of probability based temporal consistency is to facilitate the management of temporal
constraints in scientific workflow systems. The advantage of the novel temporal consistency model
mainly includes three aspects. First, clients normally cannot distinguish between qualitative expressions such as weak consistency and weak inconsistency due to the lack of background knowledge, which deteriorates the negotiation process for setting coarse-grained temporal constraints at build time. In contrast, a quantitative temporal consistency state of 90% or 80% makes much more sense. Second, it is better to model activity durations as random variables instead of static time attributes in system environments with highly dynamic performance, so as to facilitate statistical analysis. Third, to
facilitate the setting of fine-grained temporal constraints at build time and the updating of fine-grained
temporal constraints at run time, continuous states based temporal consistency model where any fine-
grained temporal consistency state is represented by a unique probability value is required rather than
discrete multiple states based temporal consistency model where temporal consistency states are
represented by coarse-grained qualitative expressions. Therefore, in this paper, we propose the novel
probability based temporal consistency model.
4. Setting Build-Time Temporal Constraints
In this section, we present our negotiation based probabilistic strategy for setting temporal constraints
at build time. The strategy aims to effectively produce a set of coarse-grained and fine-grained
temporal constraints which are well balanced between user requirements and system performance. As
depicted in Table 1, the strategy requires the input of process model and system logs. It consists of
three steps, i.e. calculating weighted joint distribution, setting coarse-grained temporal constraints and
setting fine-grained temporal constraints. We illustrate them accordingly in the following sub-sections.
Table 1: Negotiation based probabilistic setting strategy
4.1 Calculating Weighted Joint Distribution
The first step is to calculate the weighted joint distribution. The statistical information, i.e. the activity duration distributions and activity weights, can be obtained from system logs by statistical analysis [2, 24]. Afterwards, given the input process model for the scientific workflow, the weighted joint distribution of activity durations for the entire scientific workflow and its workflow segments can be efficiently obtained by the composition of the four basic building blocks as illustrated in Section 3.1.
4.2 Setting Coarse-Grained Temporal Constraints
The second step is to set coarse-grained upper bound temporal constraints at build time. Based on the
four basic building blocks, the weighted joint distribution of an entire workflow or workflow segment
can be obtained efficiently to facilitate the negotiation process for setting coarse-grained temporal
constraints. Here, we denote the obtained weighted joint distribution of the target scientific workflow (or workflow segment) $SW$ as $N(\mu_{sw},\sigma_{sw}^2)$ where $\mu_{sw}=\sum_{i=1}^{n}w_i\mu_i$ and $\sigma_{sw}^2=\sum_{i=1}^{n}w_i^2\sigma_i^2$. Meanwhile, we assume the minimum threshold for the probability consistency is $\beta\%$, which implies the client's acceptable bottom-line probability, namely the confidence for timely completion of the workflow instance; and the maximum threshold for the upper bound constraint is $\max(SW)$, which denotes the client's acceptable latest completion time. The actual negotiation process can be conducted in two alternative ways, i.e. the time oriented way and the probability oriented way.
The time oriented negotiation process starts with the client's initial suggestion of an upper bound temporal constraint $U(SW)$ and the evaluation of the corresponding temporal consistency state by the service provider. If $U(SW)=\mu_{sw}+\lambda\sigma_{sw}$ with $\lambda$ as the $\alpha\%$ percentile, and $\alpha\%$ is below the threshold of $\beta\%$, then the upper bound temporal constraint needs to be adjusted; otherwise the negotiation process terminates. The subsequent process is an iteration in which the client proposes a new upper bound temporal constraint which is less constrained than the previous one and the service provider re-evaluates the consistency state, until it reaches or exceeds the minimum probability threshold.
In contrast, the probability oriented negotiation process begins with the client's initial suggestion of a probability value $\alpha\%$; the service provider evaluates the execution time $R(SW)$ of the entire workflow process $SW$ as the sum of all weighted activity durations, $\sum_{i=1}^{n}w_i(\mu_i+\lambda\sigma_i)$, where $\lambda$ is the $\alpha\%$ percentile. If $R(SW)$ is above the client's maximum upper bound constraint $\max(SW)$, the probability value needs to be adjusted; otherwise the negotiation process terminates. The following process is an iteration in which the client proposes a new probability value which is lower than the previous one and the service provider re-evaluates the workflow duration, until it reaches or falls below the upper bound constraint.
Figure 8. Negotiation process for setting coarse-grained temporal constraints
As depicted in Figure 8, with the probability based temporal consistency, the time oriented
negotiation process is normally where increasing upper bound constraints are proposed and evaluated
with their temporal probability consistency states until the probability is above the client’s bottom-line
confidence, while the probability oriented negotiation process is normally where decreasing temporal
probability consistency states are proposed and estimated with their upper bound constraints until the
constraint is below the client’s acceptable latest completion time. In real practice, the client and service
provider can choose either of the two negotiation processes, or even interchange dynamically if they
want. However, on one hand, for clients who have some background knowledge about the execution
time of the entire workflow or some of the workflow segments, they may prefer to choose time oriented
negotiation process since it is relatively easier for them to estimate and adjust the coarse-grained
constraints. On the other hand, for clients who do not have enough background knowledge, the probability oriented negotiation process is a better choice since they can make the decision by comparing the probability values of temporal consistency states with their personal bottom-line confidence values.
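The time oriented negotiation loop can be sketched as follows. This is an illustrative simulation under assumed names; the real process is an interactive negotiation between client and service provider.

```python
import math

def phi(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def time_oriented_negotiation(mu_sw, sigma_sw, proposals, beta):
    """Walk through the client's successive upper-bound proposals and
    return the first one whose alpha% consistency reaches the client's
    bottom-line confidence beta. Returns (constraint, alpha) or None."""
    for u in proposals:
        alpha = phi((u - mu_sw) / sigma_sw)
        if alpha >= beta:
            return u, alpha
    return None
```

With the case-study distribution N(6190, 217²), the proposal series 6250s, 6300s, 6360s, 6380s and a bottom-line confidence of 80% terminates at 6380s with roughly 81% consistency.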
4.3 Setting Fine-Grained Temporal Constraints
The third step is to set fine-grained temporal constraints. In fact, this process is straightforward with the
probability based temporal consistency model. Since our temporal consistency actually defines that if all the activities are executed with durations of $\alpha\%$ probability and their total weighted duration equals the upper bound constraint, we say that the workflow process is $\alpha\%$ consistent at build time. For example, if the obtained probability consistency is 90% with a confidence percentile $\lambda$ of 1.28 (the percentile value can be obtained from any normal distribution table or most statistics programs [42]), it means that all activities are expected to finish within their durations of 90% probability. However, to ensure
that the coarse-grained and fine-grained temporal constraints are consistent with the overall workflow
execution time, the sum of weighted fine-grained temporal constraints should be approximate to their
coarse-grained temporal constraint. Otherwise, even if the duration of every workflow activity satisfies its fine-grained temporal constraint, there is still a good chance that the overall coarse-grained temporal constraint will be violated, i.e. the workflow cannot complete on time. Therefore, based on the same
percentile value, the fine-grained temporal constraint for each activity is defined with Formula 2 to
make them consistent with their overall coarse-grained temporal constraint.
Formula 2: For a scientific workflow or workflow segment $SW$ which has a coarse-grained temporal constraint of $U(SW)$ with $\alpha\%$ consistency of $\lambda$ percentile, if $SW$ consists of $n$ workflow activities with $a_i \sim N(\mu_i,\sigma_i^2)$, the fine-grained upper bound temporal constraint for activity $a_i$ is $U(a_i)$ and can be obtained with the following formula:

$U(a_i)=\mu_i+\lambda\sigma_i\times\left(1-\frac{\sum_{i=1}^{n}w_i\sigma_i-\sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2}}{\sum_{i=1}^{n}\sigma_i}\right)$    (2)
Here, $\mu_i$ and $\sigma_i$ are obtained directly from the mean value and standard deviation of activity $a_i$, and $\lambda$ denotes the same probability percentile as the coarse-grained temporal constraint. Based on Formula 2, we can claim that with our setting strategy, the sum of the weighted fine-grained temporal constraints is approximately the same as the overall coarse-grained temporal constraint. Here, we present a theoretical proof to verify this claim.
Proof: Assume the distribution model for the duration of activity $a_i$ is $N(\mu_i,\sigma_i^2)$; hence, with Formula 1, the coarse-grained constraint is set as $U(SW)=\mu_{sw}+\lambda\sigma_{sw}$ where $\mu_{sw}=\sum_{i=1}^{n}w_i\mu_i$ and $\sigma_{sw}^2=\sum_{i=1}^{n}w_i^2\sigma_i^2$. As defined in Formula 2, the sum of the weighted fine-grained constraints is
$\sum_{i=1}^{n}w_iU(a_i)=\sum_{i=1}^{n}w_i\left(\mu_i+\lambda\sigma_i\times\left(1-\frac{\sum_{i=1}^{n}w_i\sigma_i-\sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2}}{\sum_{i=1}^{n}\sigma_i}\right)\right)$.
Evidently, since $w_i$ and $\sigma_i$ are all positive values, $\sum_{i=1}^{n}w_i\sigma_i\ge\sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2}$ holds, and $\sum_{i=1}^{n}\sigma_i$ is normally large for a large-size scientific workflow $SW$; hence the right hand side can be rewritten as $\sum_{i=1}^{n}w_i\left(\mu_i+\lambda\sigma_i(1-A)\right)$ where $A=\left(\sum_{i=1}^{n}w_i\sigma_i-\sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2}\right)\Big/\sum_{i=1}^{n}\sigma_i$. Therefore, it can be expressed as
$\sum_{i=1}^{n}w_iU(a_i)=\sum_{i=1}^{n}w_i\mu_i+\lambda\sum_{i=1}^{n}w_i\sigma_i-\Delta t_1$ (Equation I)
where $\Delta t_1=\lambda A\sum_{i=1}^{n}w_i\sigma_i$. Meanwhile, since $\sum_{i=1}^{n}w_i\sigma_i\ge\sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2}$, we have $\sum_{i=1}^{n}w_i\mu_i+\lambda\sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2}\le\sum_{i=1}^{n}w_i\mu_i+\lambda\sum_{i=1}^{n}w_i\sigma_i$. Therefore, it can be expressed as
$U(SW)=\sum_{i=1}^{n}w_i\mu_i+\lambda\sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2}=\sum_{i=1}^{n}w_i\mu_i+\lambda\sum_{i=1}^{n}w_i\sigma_i-\Delta t_2$ (Equation II)
where $\Delta t_2=\lambda\left(\sum_{i=1}^{n}w_i\sigma_i-\sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2}\right)$. Furthermore, if we denote $\sum_{i=1}^{n}w_i\sigma_i-\sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2}$ as $B$, then $\Delta t_1=\lambda B\sum_{i=1}^{n}w_i\sigma_i\big/\sum_{i=1}^{n}\sigma_i$ and $\Delta t_2=\lambda B$. Since in real-world scientific workflows $\sum_{i=1}^{n}w_i\sigma_i\big/\sum_{i=1}^{n}\sigma_i$ is smaller than 1, and $\lambda$ is a small positive value ($\lambda=1$ means a probability consistency of 84.13%, which is acceptable for most clients) [42], $\Delta t_1$ and $\Delta t_2$ are both relatively small positive values compared with the major components of Equation I and Equation II. Evidently, we can deduce that $\sum_{i=1}^{n}w_iU(a_i)=\sum_{i=1}^{n}w_i\mu_i+\lambda\sum_{i=1}^{n}w_i\sigma_i-\Delta t_1\approx\sum_{i=1}^{n}w_i\mu_i+\lambda\sum_{i=1}^{n}w_i\sigma_i-\Delta t_2=U(SW)$. Therefore, the sum of the weighted fine-grained temporal constraints is approximately the same as the coarse-grained temporal constraint, and thus our claim holds.
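The claim can also be checked numerically. The sketch below (with an assumed data layout) implements Formula 2 and compares the weighted sum of fine-grained bounds against the coarse-grained constraint $\mu_{sw}+\lambda\sigma_{sw}$; with unit weights the two coincide exactly.

```python
import math

def fine_grained_constraints(acts, lam):
    """Formula 2 sketch: derive the fine-grained upper bounds U(a_i) from
    the lambda percentile shared with the coarse-grained constraint.
    acts is a list of (weight, mu, sigma) triples (assumed layout)."""
    sum_w_sigma = sum(w * s for w, m, s in acts)
    sum_w2_var = sum((w * s) ** 2 for w, m, s in acts)
    sum_sigma = sum(s for w, m, s in acts)
    # The shrink factor is the (1 - ...) term of Formula 2.
    shrink = 1.0 - (sum_w_sigma - math.sqrt(sum_w2_var)) / sum_sigma
    return [m + lam * s * shrink for w, m, s in acts]
```

For example, with acts = [(1, 100, 10), (1, 200, 20), (1, 150, 15)] and λ=1.28, the weighted sum of the returned bounds matches the coarse-grained constraint 450 + 1.28·√725.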
5. Updating Run-Time Temporal Constraints
In this section, we propose our probabilistic updating strategy for run-time temporal constraints. At
scientific workflow run time, build time temporal constraints need to be updated according to run-time
activity durations. As depicted in Table 2, our probabilistic updating strategy consists of two major
steps including calculating the probability time deficit/redundancy and updating fine-grained
constraints. We illustrate them accordingly in the following sub-sections.
Table 2: Probabilistic updating strategy
5.1 Calculating the Probability Time Deficit/Redundancy
Due to limited or no context knowledge, it is difficult, if not impossible, to determine the concrete workflow structure of a workflow instance at build time. Therefore, we utilise structure weights based on historical system data to facilitate the setting of temporal constraints. In contrast, at scientific
workflow run time, there is normally some context knowledge available which can be used to
determine the previous unknown information such as the execution path which will be selected in a
choice building block. In such a case, the previously specified choice probability becomes ineffective.
For instance, the probability needs to be modified to 1 for the selected path and 0 for others at run time.
Therefore, at scientific workflow run time, activity structure weights are subject to change according to
run time execution results. However, for the workflow activities of those workflow paths which have
not been determined yet, their structure weights are still effective as at build time.
The task of updating run-time temporal constraints is to automatically propagate the time deficit/redundancy in order to update fine-grained temporal constraints. Therefore, the time deficit/redundancy needs to be calculated first, before the propagation process. Here, the effective workflow segment for
time deficit/redundancy propagation is from the next activity point to the last activity point of the
coarse-grained temporal constraint which covers the current activity point. During the effective
workflow segment, as analysed above, there are probably some workflow paths which have been
determined but others have not. Therefore, the issue of estimating the execution time at run time is very
different from its build time counterpart since there is a mixture of determined and non-determined
workflow paths. To solve such an issue, we define the run-time workflow critical path which can
facilitate the estimation of the duration for the effective workflow segment.
Definition 3: (Run-Time Workflow Critical Path).
Within the effective workflow segment for time deficit/redundancy propagation, the run-time workflow
critical path is defined as the longest execution path from the start node to the end node of the
workflow segment. Specifically, for those workflow paths which have been determined, all the
activities are included; for those workflow paths which have not been determined, only the activities of
the longest path are included. Here, the longest path is the path which has the maximum mean duration.
For calculating the probability time deficit/redundancy, the choice probability for those longest paths in
the previous choice building blocks is changed to 1.
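Definition 3 can be sketched as a simple selection over the segment's blocks; the dictionary layout below is our own assumption for illustration.

```python
# Sketch of Definition 3: a segment is a list of blocks. A determined
# block contributes all of its activities; an undetermined choice or
# parallelism block contributes only its longest path, i.e. the path
# with the maximum mean duration. Activities are (mu, sigma) pairs.

def run_time_critical_path(segment):
    critical = []
    for block in segment:
        if block["determined"]:
            critical.extend(block["activities"])
        else:
            longest = max(block["paths"],
                          key=lambda path: sum(mu for mu, sigma in path))
            critical.extend(longest)
    return critical
```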
Based on the definition of run-time workflow critical path, the probability time deficit and
probability time redundancy are defined as follows.
Definition 4: (Probability Time Deficit).
Given a scientific workflow or workflow segment $SW$ with an upper bound constraint of $U(SW)$, at activity point $a_p$, let $U(SW)$ be of $\alpha\%$C with the percentile of $\lambda_\alpha$, which is below the threshold of $\beta\%$ with the percentile of $\lambda_\beta$ ($\beta\%$ is the initial probability temporal consistency state agreed by clients and service providers at build time through the negotiation process). Then the probability time deficit of $U(SW)$ at $a_p$ is defined as $PTD(U(SW),a_p)=R(a_1,a_p)+\sum_{k\in CriticalPath}w_kU(a_k)-U(SW)$, where $w_k$ and $U(a_k)$ are the structure weight and the fine-grained upper bound temporal constraint for activity $a_k$ respectively.
The probability time deficit is defined to measure the occurring time deficit at the activity point given
the upper bound temporal constraint which is set on the last activity of a scientific workflow or
workflow segment. In order to ensure on-time completion of scientific workflows and workflow
segments, the probability time deficit needs to be propagated to decrease the subsequent fine-grained
temporal constraints. As illustrated in Section 2.1 with the motivating example, in some cases, if the
expected activity durations exceed their fine-grained temporal constraints, some exception handling
strategies such as workflow rescheduling and resource recruitment [40, 52] may be triggered so as to
avoid the possible violations of coarse-grained temporal constraints.
Definition 5: (Probability Time Redundancy).
At activity point $a_p$, let $U(SW)$ be of $\alpha\%$C with the percentile of $\lambda_\alpha$, which is above the threshold of $\beta\%$ with the percentile of $\lambda_\beta$. Then the probability time redundancy of $U(SW)$ at $a_p$ is $PTR(U(SW),a_p)=U(SW)-\left(R(a_1,a_p)+\sum_{k\in CriticalPath}w_kU(a_k)\right)$, where $w_k$ and $U(a_k)$ are the structure weight and the fine-grained upper bound temporal constraint for activity $a_k$ respectively.
The probability time redundancy is defined to measure the time redundancy at the current activity
point given the upper bound temporal constraints. In order to save the execution cost or improve the
overall temporal QoS, probability time redundancy needs to be propagated to increase the subsequent
fine-grained temporal constraints. As illustrated in Section 2.1 with the motivating example, in some
cases, if the fine-grained temporal constraints are large enough compared with the expected activity
durations, activities can be re-allocated to less expensive resources to save the execution cost, or be
postponed intentionally to decrease the queuing time of other urgent activities so as to improve the
overall temporal QoS in the scientific workflow systems.
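Definitions 4 and 5 reduce to a few lines once the run-time critical path is known; the sketch below assumes the elapsed time R(a_1, a_p) and a list of (w_k, U(a_k)) pairs as inputs.

```python
def ptd(u_sw, elapsed, remaining):
    """Probability time deficit (Definition 4 sketch): elapsed is
    R(a_1, a_p); remaining holds (w_k, U(a_k)) pairs for the activities
    on the run-time critical path."""
    return elapsed + sum(w * u for w, u in remaining) - u_sw

def ptr(u_sw, elapsed, remaining):
    """Probability time redundancy (Definition 5 sketch): the mirror case."""
    return u_sw - (elapsed + sum(w * u for w, u in remaining))
```

A positive `ptd` signals a deficit to be propagated; a positive `ptr` signals redundancy that can be redistributed.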
5.2 Updating Fine-Grained Temporal Constraints
After the probability time deficit/redundancy has been obtained, the next step is to propagate it to the
subsequent fine-grained temporal constraints. Note that since the probability time deficit/redundancy is
defined based on the critical path, the probability time deficit/redundancy should only apply to the fine-
grained temporal constraints of those activities on the critical path. However, the fine-grained temporal
constraints of those activities which are on the non-critical paths should also be updated. The reason
can be explained as follows. For choice building blocks (which have not been determined), those non-
critical paths still have the probability to be executed and hence require the update of fine-grained
temporal constraints. For parallelism building blocks, those non-critical paths also need to be updated
in case that the durations of those non-critical paths may exceed that of the critical path, i.e. non-critical
paths may become the critical path, when large time deficits occur on non-critical paths. Similar
situations may also occur on those sequence and iteration building blocks on the non-critical paths.
Therefore, we not only need to update the fine-grained temporal constraints for the activities on the
critical path, but also those for the activities on the non-critical paths. To address such an issue, in our
strategy, we first conduct the probability time deficit/redundancy propagation process for activities on
the critical path. Afterwards, based on the propagation results, the fine-grained temporal constraints for
activities on the non-critical paths can be updated accordingly.
1) Probability Time Deficit/Redundancy Propagation Process for Activities on Critical Path
To ensure fairness among subsequent activities, the probability time deficit/redundancy quota is
defined which is based on the ratio of the mean time redundancy (i.e. the difference between the
maximum and mean activity durations) with the mean activity durations. Given the current activity
point pa , the effective range for time deficit/redundancy propagation is from the next activity point
1+pa to the last activity, e.g. mpa + , of the coarse-grained temporal constraint which covers pa . Here,
we denote the critical path in the effective range as thCriticalPa . Since the probability time
deficit/redundancy is defined based on the critical path, the coefficient for the deficit quota of each
activity on the critical path is hence defined as
$\frac{(D(a_i)-M(a_i))/M(a_i)}{\sum_{i\in CriticalPath}\left((D(a_i)-M(a_i))/M(a_i)\right)}=\frac{((\mu_i+3\sigma_i)-\mu_i)/\mu_i}{\sum_{i\in CriticalPath}\left(((\mu_i+3\sigma_i)-\mu_i)/\mu_i\right)}=\frac{3\sigma_i/\mu_i}{\sum_{i\in CriticalPath}(3\sigma_i/\mu_i)}$.
Therefore, given the probability time deficit $PTD(a_p)$ or probability time redundancy $PTR(a_p)$ at activity point $a_p$, the time deficit quota $PTDQ(a_i)$ or time redundancy quota $PTRQ(a_i)$ propagated to activity $a_i$ on the critical path are defined with Formula 3 and Formula 4 respectively:
$PTDQ(a_i)=PTD(a_p)\times\frac{w_i\sigma_i/\mu_i}{\sum_{j\in CriticalPath}(w_j\sigma_j/\mu_j)}$    (3)

$PTRQ(a_i)=PTR(a_p)\times\frac{w_i\sigma_i/\mu_i}{\sum_{j\in CriticalPath}(w_j\sigma_j/\mu_j)}$    (4)
Given the time deficit quota $PTDQ(a_i)$ or time redundancy quota $PTRQ(a_i)$ for $a_i$, and the build-time upper bound fine-grained temporal constraint $U(a_i)$ for $a_i$, the updated constraint $F(a_i)$ is obtained according to Formula 5 or Formula 6 respectively:

$F(a_i)=U(a_i)-PTDQ(a_i)$    (5)

$F(a_i)=U(a_i)+PTRQ(a_i)$    (6)
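Formulas 3 to 6 can be combined into one small propagation routine; the sketch below uses an assumed data layout and encodes redundancy as a negative total for brevity.

```python
def propagate_quota(total, critical_path):
    """Formulas 3-6 sketch: split a probability time deficit (total > 0)
    or redundancy (total < 0) over the critical path in proportion to
    w_i * sigma_i / mu_i, then return the updated fine-grained bounds.
    critical_path is a list of (w, mu, sigma, u) tuples (assumed layout)."""
    denom = sum(w * s / m for w, m, s, u in critical_path)
    updated = []
    for w, m, s, u in critical_path:
        quota = total * (w * s / m) / denom
        updated.append(u - quota)   # a deficit shrinks U(a_i); a
                                    # redundancy (negative total) enlarges it
    return updated
```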
2) Probability Time Deficit/Redundancy Propagation Process for Activities on Non-Critical Paths
Here, the basic idea is to apply the probability time deficit/redundancy quota of the longest path to the other paths. The motivation for applying the same probability time deficit/redundancy quota to non-
critical paths can be explained as follows. Since the critical path is the longest path which has the
maximum mean duration among all choice or parallel paths in the choice or parallelism building blocks,
its probability time deficit quota will be the maximum one among all the paths if we calculate the probability time deficit quota for all the paths according to Formula 3. Similarly, the time redundancy quota of the longest path will be the minimum one according to Formula 4. Therefore, in
such a condition, if time deficit occurs, the sum of the updated fine-grained temporal constraints for all
the activities on the non-critical paths will compensate for the time deficit since the maximum
probability time deficit quota is propagated. Similarly, if time redundancy occurs, the sum of the
updated fine-grained temporal constraints for all the activities on the non-critical paths will not exceed
the coarse-grained temporal constraints since the minimum probability time redundancy quota is
propagated. For example, if we assume the mean duration for the longest path in a choice building block is 100 minutes and its probability time deficit quota is 10 minutes, then for the other paths, e.g. a path with a mean duration of 60 minutes, the probability time deficit quota will be less than 10 minutes, e.g. 5 minutes, according to Formula 3. However, based on our probability time deficit propagation process, the probability time deficit quota for those non-longest paths is also set as 10 minutes. Therefore, given our method, no matter at run time whether the longest path or the other non-
longest paths are selected, the sum of the updated fine-grained temporal constraints can compensate for
the occurring probability time deficit since the maximum probability time deficit quota of 10 minutes
has already been propagated. Similarly, suppose the probability time redundancy quota for the longest path is 5 minutes; then according to Formula 4, the probability time redundancy quota for the other non-longest paths will be larger than 5 minutes, e.g. 8 minutes. However, based on our
probability time redundancy propagation process, the probability time redundancy quota for those non-
longest paths is also set as 5 minutes, i.e. the same as the longest path. Therefore, no matter at run time
whether the longest path or the other non-longest paths are selected, the sum of the updated fine-
grained temporal constraints will not exceed the coarse-grained temporal constraints since only the
minimum probability time redundancy quota of 5 minutes has been propagated.
Specifically, the probability time deficit/redundancy propagation process for activities on non-critical
paths is described as follows:
After the fine-grained temporal constraints of the activities on the critical path have been updated, the
following issue is to update the fine-grained temporal constraints of the activities on non-critical paths.
Here, we assume the longest path in the choice or parallelism building block is denoted as $LP$. The fine-grained temporal constraints of activities on $LP$ have been updated, and the sums of their probability time deficit and time redundancy quotas are denoted as $PTDQ(LP)$ and $PTRQ(LP)$ respectively. The probability time deficit/redundancy quotas for the other non-longest paths are defined to be the same as those of the longest path, as shown in Formula 7 and Formula 8 respectively:

$PTDQ(P_i \mid P_i \neq LP)=PTDQ(LP)$    (7)

$PTRQ(P_i \mid P_i \neq LP)=PTRQ(LP)$    (8)
After that, the probability time deficit/redundancy quotas for the activities on the other non-longest paths are allocated in the same way as in Formula 3/Formula 4, and their fine-grained temporal constraints are updated according to Formula 5/Formula 6.
5.3 Updating Frequency and Overhead
Note that the probability time deficit/redundancy propagation processes need to be conducted many times in order to keep fine-grained temporal constraints updated. However, although the computation cost for a single propagation process is trivial, it is unnecessary to update fine-grained temporal constraints every time a minor time deficit/redundancy takes place. In practice, there are two
alternative ways to update fine-grained temporal constraints in a batch fashion. The first one is to set a
time deficit/redundancy threshold. Accordingly, the propagation process will only be conducted when
the accumulated deficit/redundancy exceeds the threshold. For example, if the threshold is set as 10
minutes, then the propagation process will be conducted if and only if the accumulated
deficit/redundancy exceeds the 10 minute threshold. An alternative way is to set a fixed number of workflow activities so that the propagation process will only be conducted at activity points separated by that fixed number. For example, if the fixed number is set as 20, then the propagation process will be conducted only at the 20th, 40th, 60th activities and so on. Besides the above two intuitive yet practical methods, some sophisticated strategies which can
choose specific activity points for conducting certain actions, such as existing work on temporal checkpoint selection, could also be adopted [11-12]. However, since this is not the focus of this paper, we leave it as our future work.
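Both batching policies described above can be sketched in a single helper; the class and names below are illustrative, not the paper's implementation.

```python
class BatchedUpdater:
    """Sketch of the two batching policies: trigger the propagation
    process either when the accumulated deficit/redundancy exceeds a
    time threshold, or every fixed number of completed activities."""

    def __init__(self, time_threshold, activity_interval):
        self.time_threshold = time_threshold
        self.activity_interval = activity_interval
        self.accumulated = 0.0
        self.count = 0

    def should_propagate(self, delta):
        # delta: the time deficit (+) or redundancy (-) of the activity
        # that just completed.
        self.count += 1
        self.accumulated += delta
        if abs(self.accumulated) >= self.time_threshold or \
           self.count % self.activity_interval == 0:
            self.accumulated = 0.0
            return True
        return False
```

For example, with a 10 minute threshold, deficits of 4, 5 and 2 minutes trigger propagation only after the third activity, when the accumulated deficit reaches 11 minutes.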
As for the overhead of the updating process, the major cost lies in the calculation of the probability time deficit/redundancy. However, apart from the run-time durations of completed activities, all the required information, such as the activity duration distribution models, the structure weights and the run-time workflow critical path, is either already available or can be easily obtained by simple computation based on the build-time setting results. Hence, with a moderate updating frequency, the overhead for updating fine-grained temporal constraints is acceptable.
6. Case Study
In this section, we evaluate the effectiveness of our probabilistic strategy for temporal constraint
management by further illustrating the motivating example introduced in Section 2.1. The process
model is the same as depicted in Figure 1. Since our strategy consists of setting temporal constraints at build time and updating temporal constraints at run time, the evaluation also includes two consecutive parts.
Here, we first illustrate our probabilistic strategy for setting temporal constraints at build time. As presented in Table 1, the first step is to calculate the weighted joint distribution. Based on the "3σ" rule, the normal distribution model and its associated weight for each activity duration are specified through statistical analysis of accumulated system logs. With the detailed specification of the workflow segment depicted in Table 3, the weighted joint distribution of each building block can be derived instantly with the formulas proposed in Section 4. We obtain the weighted joint distribution as N(6190, 217²) with second as the basic time unit.
Table 3. Specification of the workflow segment
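Under the formulas of Section 4, the weighted joint distribution of, e.g., a sequence building block is again normal, with mean Σ wi·μi and variance Σ wi²·σi². A minimal sketch of this composition (the function name and sample numbers are ours, not the Table 3 data):

```python
import math

def sequence_joint(activities):
    """Weighted joint normal distribution of a sequence building block.

    activities: list of (weight, mean, std) per activity duration.
    Returns (mean, std) of the joint distribution, assuming independent
    normally distributed durations as in Section 4.
    """
    mean = sum(w * mu for w, mu, _ in activities)
    var = sum((w * sigma) ** 2 for w, _, sigma in activities)
    return mean, math.sqrt(var)

# Toy example: three activities with unit weights.
mu, sigma = sequence_joint([(1.0, 100.0, 15.0), (1.0, 200.0, 8.0), (1.0, 50.0, 6.0)])
print(mu)      # 350.0
print(sigma)   # sqrt(225 + 64 + 36) = sqrt(325), about 18.03
```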
The second step is the negotiation process for setting an overall upper bound temporal constraint for this workflow segment. Here, we first illustrate the time oriented negotiation process. We assume that the client's bottom-line confidence of the probability consistency state is 80%. The client starts by proposing an upper bound temporal constraint of 6250s. Based on the weighted joint distribution of N(6190, 217²) and the cumulative normal distribution function, the service provider obtains the percentile as λ = 0.28 and replies with a probability of 61%, which is lower than the threshold of 80%. Hence the service provider advises the client to relax the temporal constraint. Afterwards, for example, the client proposes a series of new candidate upper bound temporal constraints one after another, e.g. 6300s, 6360s and 6380s, and the service provider replies with 69%, 78% and 81% as the corresponding probability consistency states. Since 81% is higher than the 80% minimum threshold, through the time oriented negotiation process, the final negotiation result could be an upper bound temporal constraint of 6190 + 0.88×217 = 6380s with a probability consistency state of 81%, where 0.88 is the 81% probability percentile. As for the probability oriented negotiation process, we assume the client's acceptable latest completion time is 6400s. The client starts by proposing a probability temporal consistency state of 90%. Based on the weighted joint distribution of N(6190, 217²) and the cumulative normal distribution function, the service provider replies with an upper bound temporal constraint of 6468s, which is higher than the threshold. Afterwards, for example, the client proposes a series of new candidate probability temporal consistency states one after another, e.g. 88%, 85% and 83%, and the service provider replies with 6445s, 6415s and 6397s as the corresponding upper bound temporal constraints. Since 6397s is lower than the 6400s maximum threshold, through the probability oriented negotiation process, the final negotiation result could be an upper bound temporal constraint of 6397s with a probability temporal consistency state of 83%. Evidently, with the results of 6380s and 6397s obtained through the two different negotiation processes, we can confirm that the setting process is effective no matter which kind of negotiation process is adopted. Furthermore, the final coarse-grained temporal constraints obtained are normally similar if the decision maker is the same client. The setting result of the time oriented negotiation process is presented in Table 4.
Table 4. Setting results
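Both negotiation directions reduce to evaluating the cumulative normal distribution of N(6190, 217²), or its inverse, at the proposed value. A minimal sketch using only the standard library (the inverse is obtained by bisection; the function names are ours):

```python
import math

MEAN, STD = 6190.0, 217.0   # weighted joint distribution N(6190, 217^2)

def consistency_probability(upper_bound):
    """Time oriented step: probability consistency state for a proposed upper bound."""
    z = (upper_bound - MEAN) / STD
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def upper_bound_for(probability, lo=MEAN - 6 * STD, hi=MEAN + 6 * STD):
    """Probability oriented step: upper bound for a proposed probability state
    (inverse CDF by bisection, accurate enough for negotiation)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if consistency_probability(mid) < probability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Client proposes 6250s -> about 61%, below the 80% bottom line.
print(round(consistency_probability(6250) * 100))   # 61
# Client proposes 83% -> about 6397s, the probability oriented negotiation result.
print(round(upper_bound_for(0.83)))                 # 6397
```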
The third step is to set the fine-grained temporal constraints for each workflow activity with the obtained overall upper bound constraint. As mentioned in Section 4, the probability based temporal consistency defines that the probability for each expected activity duration is the same as the probability consistency state of the workflow process. Therefore, taking the result obtained through the time oriented negotiation process for illustration, since the coarse-grained temporal constraint is 6380s with a probability consistency state of 81%, according to Formula 2 the fine-grained temporal constraints for each activity can be obtained instantly. Since ∑ wi·σi = 412, ∑ wi²·σi² = 217² and ∑ σi = 250 (summing over the n activities), the coefficient here equals 0.22. Therefore, for example, the fine-grained upper bound temporal constraint for activity X1 is (105 + 0.88×0.22×15) ≈ 108s (with σ = 15, i.e. variance 225) and that for activity X12 is (123 + 0.88×0.22×8) ≈ 125s (with σ = 8, i.e. variance 64). The detailed results are presented in Table 4.
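As a worked sketch of this step, under our reading of Formula 2 that each fine-grained constraint is μi + λ·c·σi, with percentile λ = 0.88 and coefficient c = 0.22 from the text, and with standard deviations of 15s and 8s for X1 and X12 inferred from the variances 225 and 64:

```python
LAMBDA = 0.88   # percentile of the negotiated 81% probability consistency state
COEFF = 0.22    # normalizing coefficient computed in the text

def fine_grained_constraint(mean, std):
    # Our reading of Formula 2: mean + lambda * coefficient * std (an assumption).
    return mean + LAMBDA * COEFF * std

# X1: mean 105s, variance 225 (std 15s); X12: mean 123s, variance 64 (std 8s).
print(round(fine_grained_constraint(105, 225 ** 0.5)))   # 108
print(round(fine_grained_constraint(123, 64 ** 0.5)))    # 125
```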
Now we further illustrate our strategy for updating temporal constraints at run time. Here, assume the fixed interval of workflow activities is set as 5; then the updating process is conducted at X5 and X10. We first take activity X5 as an example. Suppose the second radar is selected, i.e. the lower path in the first choice building block, and the activity durations for X3, X4 and X5 are 248s, 445s and 600s respectively. Here, the effective range for updating temporal constraints is from X6 to X12, and the critical path is easily identified as (X8, X9, X10, X11, X12) from the results shown in Table 3. Therefore, at X5 there occurs a probability time deficit of 200s, since PTD(U(SW), X5) = R(a1, a5) + ∑_{ak∈CriticalPath} wk·U(ak) − U(SW) = 1293 + 5287 − 6380 = 200s. Therefore, we first update the fine-grained temporal constraints for the activities on the critical path and then those for the activities on the non-critical path, i.e. (X6, X7) in the sequence building block. The updating results are shown in Table 5.
Table 5. Updating results
Based on Formula 4, it is easy to calculate the probability time deficit quota for each activity on the critical path. After that, the sum of the probability time deficit quotas for the non-critical path, i.e. (X6, X7), is set to be the same as that of its counterpart, the longest path (X8, X9, X10), i.e. 184s. Therefore, we can obtain the probability time deficit quotas for X6 and X7 as 80s and 104s respectively. The fine-grained temporal constraints are then updated according to Formula 6.
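The probability time deficit computation at the checkpoint can be sketched directly from the equation above (the function name is ours; the values are those of the X5 example):

```python
def probability_time_deficit(r_completed, sum_remaining_constraints, overall_upper_bound):
    """Probability time deficit at a checkpoint: run-time duration of the completed
    part plus the weighted fine-grained constraints of the remaining activities,
    minus the coarse-grained upper bound (positive = deficit, negative = redundancy)."""
    return r_completed + sum_remaining_constraints - overall_upper_bound

# Values from the X5 checkpoint: R(a1, a5) = 1293s, remaining weighted
# constraints sum to 5287s, coarse-grained upper bound U(SW) = 6380s.
print(probability_time_deficit(1293, 5287, 6380))   # 200

# After propagation the remaining constraints absorb the deficit (5287 -> 5087),
# so the estimated segment time again equals the build-time constraint:
print(1293 + 5087)   # 6380
```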
Here, in order to verify the effectiveness of the updated fine-grained temporal constraints, we test whether the sum of the estimated execution times can compensate for the occurring 200s time deficit. Since R(a1, a5) = 1293s and the sum of the updated fine-grained temporal constraints is ∑_{i=6}^{12} wi·U(ai) = 5087s, the estimated execution time for the workflow segment is R(a1, a5) + ∑_{i=6}^{12} wi·U(ai) = 1293 + 5087 = 6380s, which is equal to the coarse-grained temporal constraint set at build time as shown in Table 4. Therefore, the updated fine-grained temporal constraints ensure that the occurring 200s time deficit can be compensated for by our probability time deficit propagation process.
Since the probability time redundancy propagation process is symmetrical to the probability time
deficit propagation process as illustrated above, its evaluation is omitted here.
To conclude, the above demonstration of the setting and updating processes evidently shows that our probabilistic strategy is effective for the management of temporal constraints in scientific workflow systems. It meets the two basic requirements proposed in Section 2: at build time, effective negotiation for setting coarse-grained temporal constraints and automatic derivation of fine-grained temporal constraints; at run time, automatic propagation of time deficit and time redundancy for updating local temporal constraints. As for the overhead of the setting process, the major overhead lies in the calculation of the weighted joint normal distribution. However, as presented in Section 7, with the Stochastic Petri Nets based modelling tool provided in our scientific workflow system, the weighted joint normal distribution can be obtained on-the-fly during the user's modelling process. The four basic building blocks speed up both the modelling process and the calculation of the weighted joint normal distribution for scientific workflows. After that, the negotiation process for setting coarse-grained temporal constraints is totally under the control of the stakeholders, and the propagation process for setting fine-grained temporal constraints can be done instantly. As for the overhead of the updating process, the major overhead lies in the calculation of the probability time deficit/redundancy. However, since all the required information except the run-time durations of completed activities, such as the activity duration distribution model, the structure weight and the workflow run-time critical path, is either already available or can be easily obtained based on the build-time setting results, as demonstrated above, the overhead of temporal constraint management with our probabilistic strategy is acceptable while its effectiveness is guaranteed.
7. System Implementation
In this section, we introduce the implementation of our temporal constraint management strategy in our SwinDeW-G scientific workflow system.
7.1 SwinDeW-G Scientific Workflow System
SwinDeW-G (Swinburne Decentralised Workflow for Grid) is a peer-to-peer based grid workflow
system running on the SwinGrid (Swinburne service Grid) platform [47]. An overall picture of
SwinGrid is depicted in Figure 9 (bottom plane).
Figure 9. Overview of SwinDeW-G environment
SwinGrid contains many grid nodes distributed in different places. Each grid node contains many computers including high performance PCs and/or supercomputers composed of a significant number of computing units. The primary hosting nodes include the Swinburne CS3 (Centre for Complex Software Systems and Services) Node, the Swinburne ESR (Enterprise Systems Research laboratory) Node, the Swinburne Astrophysics Supercomputer Node, and the Beihang CROWN (China R&D environment Over Wide-area Network) Node in China. They run Linux, GT4 (Globus Toolkit) or CROWN grid toolkit 2.5, where CROWN is an extension of GT4 with more middleware and hence compatible with GT4. Besides, the CROWN Node is also connected to some other nodes such as those in the Hong Kong University of Science and Technology, and the University of Leeds in the UK. The Swinburne Astrophysics Supercomputer Node cooperates with PfC (Australian Platform for Collaboration) and VPAC (Victorian Partnership for Advanced Computing). Currently, SwinDeW-G is deployed at all primary hosting nodes as exemplified in the bottom plane of Figure 9. In SwinDeW-G, a scientific workflow is executed by different peers that may be distributed at different grid nodes. As shown in Figure 9, each grid node can have a number of peers, and each peer can be simply viewed as a grid service. In the top plane of Figure 9, we show a sample of how a scientific workflow can be executed in the simulation environment.
As an important reinforcement for the overall workflow QoS, temporal verification is being
implemented in SwinDeW-G. It currently supports dynamic checkpoint selection and temporal
verification at run-time [7, 10]. After the running of SwinDeW-G for a period of time, statistical
analysis can be applied to accumulated system logs to obtain probability attributes [21, 24]. The
probabilistic strategy for setting temporal constraints is being integrated into the scientific workflow
modelling tool, which supports Stochastic Petri Nets based modelling, composition of building blocks, temporal data analysis, and interactive and automatic setting of temporal constraints.
7.2 SwinDeW-G Scientific Workflow Modelling Tool
Our probabilistic strategy for temporal constraint management is being implemented into our SwinDeW-G scientific workflow system as integrated components. Specifically, the setting strategy is included in the SwinDeW-G modelling tool, which is in charge of the modelling and setting of QoS constraints at workflow build time, while the updating strategy is included in the SwinDeW-G temporal adjustment tool, which is in charge of maintaining temporal correctness at workflow run time. As shown in Figure 10(a), the modelling tool adopts Stochastic Petri Nets with additional graphic notations, e.g. notations for probability, for activity duration, for a sub-process, and for the start and end points of an upper bound temporal constraint, to support explicit representation of temporal information. It also supports the composition of the four basic building blocks and user specified ones.
(a) Temporal data analysis (b) Setting temporal constraints
(c) Updating temporal constraints
Figure 10. SwinDeW-G Constraint Management Tool
The modelling tool supports temporal data analysis from workflow system logs. Temporal data analysis follows the "3σ" rule and generates the normal distribution model for each activity duration. The probability attributes for each workflow structure, such as the choice probability and iteration times, can also be obtained through statistical analysis of historic workflow instances from system logs. After temporal data analysis, the attributes for each activity, i.e. its mean duration, variance, maximum duration, minimum duration and weight, are associated with the corresponding activity and explicitly displayed to the client. Meanwhile, the weighted joint distribution of the target process is obtained automatically with the basic building blocks. As shown in Figure 10(b), with our probability based temporal consistency, the client can specify either an upper bound temporal constraint or a probability consistency state, and the system will instantly reply with the corresponding probability consistency state or upper bound temporal constraint.
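The log-based analysis can be sketched as follows, assuming the "3σ" rule is applied to a fitted normal model to derive the minimum and maximum durations (the helper name and sample data are ours):

```python
import statistics

def activity_duration_model(samples):
    """Fit a normal model to logged durations and apply the 3-sigma rule
    to derive minimum and maximum durations (illustrative helper)."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)   # sample standard deviation
    return {
        "mean": mu,
        "variance": sigma ** 2,
        "min": mu - 3 * sigma,   # 3-sigma lower bound
        "max": mu + 3 * sigma,   # 3-sigma upper bound: ~99.73% coverage
    }

model = activity_duration_model([100, 104, 96, 102, 98])
print(round(model["mean"]))   # 100
```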
Based on the visualised results shown by the cumulative normal distribution curve, the
client can decide whether to accept or decline the results. If the client is not satisfied with the outcomes,
he or she can specify a new value for evaluation until a satisfactory result is achieved. Evidently, the
negotiation process between the client and the service provider is implemented as an interactive process
between the system user and our developed program. After setting the coarse-grained temporal
constraints, the fine-grained constraints for each activity are propagated automatically. These activity
duration distribution models, coarse-grained and fine-grained temporal constraints are explicitly
represented in the scientific workflow models and will be further deployed to facilitate the control of
overall workflow execution time by run-time temporal verification in scientific workflows. Detailed discussions on how temporal constraints are employed in scientific workflow temporal verification can be found in [10-11].
At workflow run time, fine-grained temporal constraints are updated according to run-time activity
durations. As shown in Figure 10(c), based on run-time activity durations and the statistical information
obtained at build time, the run-time workflow critical path (as the execution path highlighted in the
“Run-Time Workflow Critical Path” interface) and the probability time deficit/redundancy (as shown
in the “Run-Time Activity Durations” interface) can be derived automatically. As a component of the run-time temporal adjustment tool, the probability time deficit/redundancy propagation process is normally conducted automatically as a background program at a system-defined frequency (based on either a time deficit/redundancy threshold or a fixed interval of workflow activities, as addressed in Section 5.3). Meanwhile, it can also be conducted as a foreground program to facilitate the
decision making of system managers. When working as a foreground program, temporal constraints
will be updated under manual control. Therefore, in our SwinDeW-G constraint management tool, the
run-time fine-grained temporal constraints can be updated accordingly either in an automatic or manual
fashion.
8. Related Work
In this section, we review some related work on temporal constraints in workflow systems. The work in
[49] presents a taxonomy of grid workflow QoS constraints which includes five dimensions, i.e. time,
cost, fidelity, reliability and security. Some papers have presented an overview analysis of scientific
workflow QoS [18, 22, 27]. The work in [9] presents the taxonomy of grid workflow verification which
includes the verification of temporal constraints. In a distributed environment such as a distributed soft
real-time system, a task is usually divided into several subtasks to be executed in a specific order at
different sites. Therefore, the issue of automatically translating the overall deadline into deadlines for the individual subtasks is investigated in [20]. Generally speaking, there are two basic ways to assign
QoS constraints, one is activity-level assignment and the other is workflow-level assignment. Since the
whole workflow process is composed by all individual activities, an overall workflow-level constraint
can be obtained by the composition of activity-level constraints. On the contrary, activity-level
constraints can also be assigned by the decomposition of workflow-level constraints [49]. However, different QoS constraints have their own characteristics and require in-depth research to handle different scenarios.
As shown in our probabilistic strategy, the primary information required for temporal constraint
management includes the workflow process models, statistics of activity durations and the temporal
consistency model. Scientific workflows require the explicit representation of temporal information,
i.e. activity durations and temporal constraints to facilitate temporal verification. One of the classical
modelling methods is the Stochastic Petri Nets [2-3] which incorporates time and probability attributes
into workflow processes that can be employed to facilitate scientific workflow modelling. Activity
duration, as one of the basic elements to measure system performance, is of significant value to
workflow scheduling, performance analysis and temporal verification [8, 11, 41, 51]. Most work
obtains activity durations from workflow system logs and describes them by a discrete or continuous
probability distribution through statistical analysis [2, 25]. As for temporal consistency, traditionally there are only the binary states of consistency and inconsistency. However, the work in [7] argues that the conventional consistency condition is too restrictive and covers several different states which should be handled differently for the purpose of cost-effectiveness. Therefore, it divides conventional inconsistency into weak consistency, weak inconsistency and strong inconsistency and treats them accordingly. However, the multiple-state based temporal consistency model cannot support quantitative
measurement of temporal consistency states and lacks the ability to support statistical analysis for
constraint management. Therefore, in our preliminary work [25], a probability based build-time
temporal consistency model is presented to facilitate the setting of temporal constraints. Furthermore,
in this paper, its run-time counterpart is presented to facilitate the updating of temporal constraints.
Table 6. A survey on the support of temporal constraints
Temporal constraints are not well emphasised in traditional workflow systems. However, some
business workflow systems accommodate temporal information for the purpose of performance
analysis. For example, Staffware provides the audit trail tool to monitor the execution of individual
instances [1] and the SAP business workflow system employs workload analysis [23]. As for the support of temporal constraints in scientific workflow systems, we conducted a survey based on some of the work reported in [9, 49]. Since workflow modelling is highly related to the specification of temporal constraints, the survey also concerns the two aspects of the modelling language and the modelling tool (language-based, graph-based or both) in addition to the three aspects
of whether they support the specification of temporal constraints (the specification of temporal
constraints in workflow models), the management of temporal constraints (i.e. the setting and updating
of temporal constraints) and the temporal verification (the verification of temporal constraints). As
shown in Table 6, among the 10 representative scientific workflow projects (ASKALON [31],
CROWN [32], DAGMan [14], GridBus [33], JOpera [34], Kepler [35], SwinDeW-G [47], Taverna
[36], Triana [37] and UNICORE [38]), most projects use an XML-like modelling language and support a language-based or graph-based modelling tool. Therefore, in the modelling stage, a temporal constraint can either be implicitly specified as an element in the XML document or explicitly specified as a
graphic component in the workflow template. As for the representation of temporal constraints, the
management of temporal constraints and the support of temporal verification which we are most
concerned with, only some of the projects such as ASKALON, DAGMan, GridBus, JOpera, Kepler,
Taverna and SwinDeW-G have clearly stated in their published literature that temporal constraints are supported in their system QoS control or performance analysis. Yet, to the best of our knowledge, only SwinDeW-G has set up a series of strategies, such as the probabilistic strategy for temporal constraint
management [25] and the efficient checkpoint selection strategy to support dynamic temporal
verification [11]. In summary, even though temporal QoS has been recognised as an important aspect
in scientific workflow systems, the work in this area, e.g. the specification of temporal constraints and
the support of temporal verification, is still in its infancy [9].
9. Conclusions and Future Work
In this paper, we have proposed a probabilistic strategy for temporal constraint management in
scientific workflow systems. A novel probability based temporal consistency model which is defined
by the weighted joint distribution of activity durations has been provided to support statistical analysis
for temporal constraint management. Meanwhile, the weighted joint distribution of four Stochastic
Petri Nets based basic building blocks, i.e. sequence, iteration, parallelism and choice, has been
presented to facilitate the efficient calculation of the weighted joint distribution of specific workflows
and workflow segments by their compositions. Our temporal constraint management consists of a
negotiation based probabilistic strategy for setting temporal constraints at build time and a probabilistic
strategy for updating temporal constraints at run time. The setting strategy aims to achieve a set of
coarse-grained and fine-grained temporal constraints which are well balanced between user
requirements and system performance. With the probability based temporal consistency, well balanced
overall coarse-grained temporal constraints can be obtained through either a time oriented or
probability oriented negotiation process. Thereafter, fine-grained temporal constraints for each activity
can be propagated instantly in an automatic fashion. The updating strategy is designed to effectively
propagate the occurring time deficit and time redundancy to update run-time fine-grained temporal
constraints according to run-time activity durations. For such a purpose, the probability time deficit and
the probability time redundancy are defined in this paper to effectively estimate the occurring time
deficit and time redundancy along scientific workflow execution. Afterwards, the probability time
deficit/redundancy propagation process is conducted automatically to update the run-time fine-grained
temporal constraints.
A weather forecast scientific workflow has been first employed as a motivating example and then
revisited as a case study to evaluate the effectiveness of our strategy. The evaluation results have
shown that our strategy is capable of setting a set of coarse-grained and fine-grained temporal
constraints. Meanwhile, the sum of the weighted fine-grained temporal constraints is approximately the same as the coarse-grained temporal constraint, namely the coarse-grained and fine-grained temporal
constraints are consistent with the overall workflow execution time. The evaluation results have also
shown that our strategy can update the run-time fine-grained constraints effectively so that the
occurring probability time deficit can be compensated for and the occurring time redundancy can be
reasonably propagated among subsequent activities. The system implementation of our constraint
management strategy has been demonstrated with the components in our SwinDeW-G scientific workflow system. To the best of our knowledge, this is the first work that has systematically analysed
and addressed the issue of temporal constraint management (including setting build-time coarse-
grained and fine-grained temporal constraints, and updating run-time fine-grained temporal constraints)
in scientific workflow systems.
In this paper, to simplify the statistical analysis without losing generality, the normal distribution model has been used to model the workflow activity durations. In the future, other representative probability distribution models such as the uniform and exponential distributions will also be employed to represent the
performance of different underlying services. Furthermore, our strategy will be investigated and
modified accordingly to accommodate those more complex scenarios.
Acknowledgments. This work is partially supported by the Australian Research Council under Linkage Grant LP0990393 and the National High Technology Research and Development 863 Program of China under Grant No. 2007AA04Z116.
References
[1] W. M. P. v. d. Aalst and K. M. V. Hee, Workflow Management: Models, Methods, and Systems: The MIT
Press, Cambridge, 2002.
[2] W. M. P. v. d. Aalst, K. M. V. Hee, and H. A. Reijers, "Analysis of Discrete-Time Stochastic Petri Nets,"
Statistica Neerlandica, vol. 54, pp. 237-255, 2000.
[3] G. Bucci, L. Sassoli, and E. Vicario, "Correctness Verification and Performance Analysis of Real-Time
Systems Using Stochastic Preemptive Time Petri Nets," IEEE Trans. on Software Engineering, vol. 31, no.11,
pp. 913-927, 2005.
[4] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud computing and emerging IT platforms:
Vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems,
vol. 25, no.6, pp. 599-616, 2009.
[5] N. M. D. Center, China Meteorological Data Sharing Service System. Available: http://cdc.cma.gov.cn/index.jsp, accessed on 1st Oct. 2008.
[6] C. K. Chang, H. Jiang, and Y. Di, "Time-line Based Model for Software Project Scheduling with Genetic
Algorithms," Information and Software Technology, vol. 50, pp. 1142-1154, 2008.
[7] J. Chen and Y. Yang, "Multiple States based Temporal Consistency for Dynamic Verification of Fixed-time
Constraints in Grid Workflow Systems," Concurrency and Computation: Practice and Experience, Wiley, vol.
19, no.7, pp. 965-982, 2007.
[8] J. Chen and Y. Yang, "Activity Completion Duration Based Checkpoint Selection for Dynamic Verification
of Temporal Constraints in Grid Workflow Systems," Int. J. High Perform. Comput. Appl., vol. 22, no.3, pp.
319-329, 2008.
[9] J. Chen and Y. Yang, "A Taxonomy of Grid Workflow Verification and Validation," Concurrency and
Computation: Practice and Experience, vol. 20, no.4, pp. 347-360, 2008.
[10] J. Chen and Y. Yang, "Temporal Dependency based Checkpoint Selection for Dynamic Verification of Fixed-
time Constraints in Grid Workflow Systems," Proc. 30th International Conference on Software Engineering
(ICSE2008), Leipzig, Germany, 2008, pp. 141-150.
[11] J. Chen and Y. Yang, "Temporal Dependency based Checkpoint Selection for Dynamic Verification of
Temporal Constraints in Scientific Workflow Systems," ACM Transactions on Software Engineering and
Methodology, to appear, 2009. Available: http://www.swinflow.org/papers/TOSEM.pdf.
[12] J. Chen and Y. Yang, "Adaptive Selection of Necessary and Sufficient Checkpoints for Dynamic Verification
of Temporal Constraints in Grid Workflow Systems," ACM Trans. on Autonomous and Adaptive Systems, vol.
2, no.2, article 6, 2007.
[13] K. Cooper, A. Dasgupta, K. Kennedy, C. Koelbel, and A. Mandal, "New Grid Scheduling and Rescheduling
Methods in the GrADS Project," Proc. 18th International Parallel and Distributed Processing Symposium
(IPDPS 2004), Santa Fe, New Mexico, Apr. 2004, pp. 199-206.
[14] DAGMan, Condor Project. Available: http://www.cs.wisc.edu/condor/, accessed on 1st Sep. 2009.
[15] E. Deelman, D. Gannon, M. Shields, and I. Taylor, "Workflows and e-Science: An Overview of Workflow
System Features and Capabilities," Future Generation Computer Systems, vol. 25, no. 6, pp. 528-540, 2008.
[16] J. Eder, E. Panagos, and M. Rabinovich, "Time Constraints in Workflow Systems," in 11th International
Conference on Advanced Information Systems Engineering (CAiSE99), Heidelberg, Germany, 1999, pp. 286-
300.
[17] I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure (Second Edition):
Morgan Kaufmann, 2004.
[18] S. Y. Hwang, H. Wang, J. Tang, and J. Srivastava, "A Probabilistic Approach to Modeling and Estimating the
QoS of Web-Services-based Workflows," Information Sciences, vol. 177, no.23, pp. 5484-5503, Dec 2007.
[19] M. Jørgensen and M. Shepperd, "Systematic Review of Software Development Cost Estimation Studies,"
IEEE Transactions on Software Engineering, vol. 33, no.1, pp. 33-53, 2007.
[20] B. Kao and H. Garcia-Molina, "Deadline Assignment in a Distributed Soft Real-Time System," IEEE Trans.
Parallel Distrib. Syst., vol. 8, no.12, pp. 1268-1274, 1997.
[21] A. M. Law and W. D. Kelton, Simulation Modelling and Analysis (Fourth Edition): McGraw-Hill, 2007.
[22] L. Zeng, B. Benatallah, A. H. H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang, "QoS-Aware Middleware for Web Services Composition," IEEE Trans. on Software Engineering, vol. 30, no.5, pp. 311-327, 2004.
[23] SAP Library. Workflow System Administration. Available: http://help.sap.com/saphelp_nw2004s/helpdata/en,
accessed on 1st Sep. 2009.
[24] X. Liu, J. Chen, K. Liu, and Y. Yang, "Forecasting Duration Intervals of Scientific Workflow Activities based
on Time-Series Patterns," Proc. 4th IEEE International Conference on e-Science (e-Science08), Indianapolis,
USA, Dec. 2008, pp. 23-30.
[25] X. Liu, J. Chen, and Y. Yang, "A Probabilistic Strategy for Setting Temporal Constraints in Scientific
Workflows," in Proc. 6th International Conference on Business Process Management (BPM2008), Milan,
Italy, 2008, pp. 180-195.
[26] X. Liu, Y. Yang, J. Chen, Q. Wang, and M. Li, "Achieving On-Time Delivery: A Two-Stage Probabilistic
Scheduling Strategy for Software Projects," Proc. International Conference on Software Process: Trustworthy
Software Development Processes, Vancouver, B. C., Canada, 2009.
[27] A. Martinez, F. J. Alfaro, J. L. Sanchez, F. J. Quiles, and J. Duato, "A New Cost-Effective Technique for QoS
Support in Clusters," IEEE Trans. on Parallel and Distributed Systems, vol. 18, no.12, pp. 1714-1726, 2007.
[28] M. Miller, Cloud Computing: Web-Based Applications That Change the Way You Work and Collaborate
Online: Que, Aug. 2008.
[29] S. Nadarajah and S. Kotz, "Exact distribution of the max/min of two Gaussian random variables," IEEE
Trans. Very Large Scale Integr. Syst., vol. 16, no.2, pp. 210-212, 2008.
[30] R. Prodan and T. Fahringer, "Overhead Analysis of Scientific Workflows in Grid Environments," IEEE Trans.
on Parallel and Distributed Systems, vol. 19, no.3, pp. 378-393, Mar. 2008.
[31] ASKALON Project. Available: http://www.dps.uibk.ac.at/projects/askalon, accessed on 1st Sep. 2009.
[32] CROWN Project. CROWN portal. Available: http://www.crown.org.cn/en/, accessed on 1st Sep. 2009.
[33] GridBus Project. Available: http://www.gridbus.org, accessed on 1st Sep. 2009.
[34] JOpera Project. Available: http://www.iks.ethz.ch/jopera, accessed on 1st Sep. 2009.
[35] Kepler Project. Available: http://kepler-project.org/, accessed on 1st Sep. 2009.
[36] Taverna Project. Available: http://www.mygrid.org.uk/tools/taverna/, accessed on 1st Sep. 2009.
[37] Triana Project. Available: http://www.trianacode.org/, accessed on 1st Sep. 2009.
[38] UNICORE Project. Available: http://www.unicore.eu/, accessed on 1st Sep. 2009.
[39] N. Russell, W. M. P. v. d. Aalst, and A. H. M. t. Hofstede, "Exception Handling Patterns in Process-Aware
Information Systems," BPMcenter.org, 2006.
[40] N. Russell, W. M. P. v. d. Aalst, and A. H. M. t. Hofstede, "Workflow Exception Patterns," Proc. 18th
International Conference on Advanced Information Systems Engineering (CAiSE'06), Berlin, Germany, 2006,
pp. 288-302.
[41] J. H. Son, J. S. Kim, and M. H. Kim, "Extracting the workflow critical path from the extended well-formed
workflow schema," Journal of Computer and System Sciences, vol. 70, no.1, pp. 86-106, 2005.
[42] K. A. Stroud, Engineering Mathematics (Sixth Edition). New York: Palgrave Macmillan, 2007.
[43] I. J. Taylor, E. Deelman, D. B. Gannon, and M. Shields, Workflows for e-Science: Scientific Workflows for
Grids: Springer, 2007.
[44] M. Wang, R. Kotagiri, and J. Chen, "Trust-based Robust Scheduling and Runtime Adaptation of Scientific
Workflow," Concurrency and Computation: Practice and Experience, vol. 21, no.16, pp. 1982-1998, 2009.
[45] C. Wei-Neng and Z. Jun, "An Ant Colony Optimization Approach to a Grid Workflow Scheduling Problem
With Various QoS Requirements," IEEE Transactions on Systems, Man, and Cybernetics, Part C:
Applications and Reviews, vol. 39, no.1, pp. 29-43, 2009.
[46] M. Wieczorek, R. Prodan, and A. Hoheisel, "Taxonomies of the Multi-criteria Grid Workflow Scheduling
Problem," CoreGRID Technical Report Number TR-0106, August 30, 2007.
[47] Y. Yang, K. Liu, J. Chen, J. Lignier, and H. Jin, "Peer-to-Peer Based Grid Workflow Runtime Environment of
SwinDeW-G," Proc. 3rd International Conference on e-Science and Grid Computing (e-Science07),
Bangalore, India, Dec. 2007, pp. 51-58.
[48] J. Yu and R. Buyya, "A Taxonomy of Scientific Workflow Systems for Grid Computing," SIGMOD Rec., vol.
34, no.3, pp. 44-49, 2005.
[49] J. Yu and R. Buyya, "A Taxonomy of Workflow Management Systems for Grid Computing," Journal of Grid
Computing, no.3, pp. 171-200, 2005.
[50] J. Yu and R. Buyya, "Scheduling Scientific Workflow Applications with Deadline and Budget Constraints
Using Genetic Algorithms," Scientific Programming, vol. 14, no.3-4, pp. 217-230, Dec. 2006.
[51] J. Yu and R. Buyya, "Workflow Scheduling Algorithms for Grid Computing," Grid Computing and Distributed
Systems Laboratory, The University of Melbourne, Australia, May 31, 2007.
[52] Z. Yu and W. Shi, "An Adaptive Rescheduling Strategy for Grid Workflow Applications," Proc. 2007 IEEE
International Symposium on Parallel and Distributed Processing (IPDPS 2007), Long Beach, California,
USA, Mar. 2007, pp. 115-122.
[53] H. Zhuge, T. Cheung, and H. Pung, "A Timed Workflow Process Model," Journal of Systems and Software,
vol. 55, no.3, pp. 231-243, 2001.