Stochastic DAG Scheduling using Monte Carlo Approach
Heterogeneous Computing Workshop (at IPDPS) 2012
Extended version: Elsevier JPDC (accepted July 2013, in press)
Wei Zheng, Department of Computer Science, Xiamen University, Xiamen, China
Rizos Sakellariou, School of Computer Science, The University of Manchester, UK
Previous Presentation (9/06/13)
• Research area: scheduling workflows in heterogeneous environments with variable performance.
DAG Scheduling
• Static (full-ahead)
• Just in time
• Dynamic rescheduling (runtime)
This Presentation
DAG Scheduling
• Static (full-ahead)
• Just in time
• Dynamic rescheduling (runtime)
Introduction
• General DAG scheduling assumption:
  • The estimated execution time of each task is known in advance.
  • Several estimation techniques exist, e.g. averaging over several runs.
  • Similarly, the estimated data transfer time is known in advance.
• A study* has shown that there might be significant deviations in observed performance in Grids.
• To address these deviations, two approaches are prevalent:
  • Just-in-time (high overhead)
  • Runtime (static schedule + runtime changes) (hypothesis**: might waste resources and increase makespan if the static schedule is not very good)
* A. Lastovetsky, J. Twamley, Towards a realistic performance model for networks of heterogeneous computers, in: M. Ng, A. Doncescu, L. Yang, T. Leng (Eds.), High Performance Computational Science and Engineering, in: IFIP International Federation for Information Processing, vol. 172, Springer, Boston, 2005, pp. 39–57.
** R. Sakellariou, H. Zhao, A low-cost rescheduling policy for efficient mapping of workflows on grid systems, Sci. Program. 12 (4) (2004) 253–262.
Problem Addressed
• Generate a better “static” schedule (one with a smaller makespan) based on a stochastic model of the variations in the performance (execution time) of individual tasks in the graph.
Background and Related Work
• Heterogeneous Earliest Finish Time (HEFT) heuristic (discussed in the previous presentation)
• List-based scheduling.
• Prioritize tasks based on the “bLevel” (essentially, tasks on the critical path get higher priority).
• Once a task is chosen, map it to the “best” available resource.
$bLevel(i) = w_i + \max_{j \in Succ(i)} \left( w_{i \to j} + bLevel(j) \right)$
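The bLevel recursion can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: it assumes `w` holds a single cost per task (e.g. a mean over resources), `c` holds the cost of each edge, and an exit node (no successors) has bLevel equal to its own cost. The names `b_levels`, `w`, `c` and `succ`, and the diamond-shaped example graph, are hypothetical.

```python
def b_levels(w, c, succ):
    """Compute bLevel(i) = w[i] + max over successors j of (c[(i, j)] + bLevel(j)).

    w:    dict task -> execution cost (assumed: one representative value per task)
    c:    dict (i, j) -> communication cost on edge i -> j
    succ: dict task -> list of successor tasks (exit nodes may be missing)
    """
    memo = {}

    def b(i):
        if i not in memo:
            # An exit node has no successors, so the max defaults to 0.
            tail = max((c[(i, j)] + b(j) for j in succ.get(i, [])), default=0.0)
            memo[i] = w[i] + tail
        return memo[i]

    for task in w:
        b(task)
    return memo


# Hypothetical diamond DAG: A -> B, A -> C, B -> D, C -> D
w = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
c = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "D"): 1.0, ("C", "D"): 1.0}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

levels = b_levels(w, c, succ)
# List scheduling considers tasks in decreasing bLevel order.
order = sorted(w, key=levels.get, reverse=True)
print(levels)
print(order)
```

Sorting by decreasing bLevel always yields a precedence-respecting order when costs are positive, since a task's bLevel strictly exceeds that of each of its successors.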
Problem Description
• G = (N, E) -> DAG with one entry node and one exit node.
• R -> set of heterogeneous resources.
• $ET_{i,p}$ -> random variable for the execution time of task i on resource p.
• Assumption: network bandwidth is constant.
• M -> makespan = finish time of the exit node.
Goal: find a schedule Ω that minimizes the makespan (assign N to R, no overlap, no preemption, no migration).
Methodology
• Assumption: analytical methods that solve the probabilistic optimization problem are too expensive.
• Use the Monte Carlo Sampling (MCS) method:
1. Define a space comprising the possible input values:
   $I_G = \{ ET_{i,p} : i \in N, p \in R \}$
2. Take an independent random sample from the space:
   $P_G = f_{smp}(I_G) = \{ t_{i,p} : i \in N, p \in R \}$
3. Perform a deterministic computation using the sample input (store the result):
   $\Omega_G = Static\_Scheduling_{HEFT}(G, P_G)$
4. Repeat steps 2 and 3 until some exit condition is met (number of repetitions).
5. Aggregate the stored results of the individual computations into the final result.
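The sampling loop can be sketched as follows. This is a rough sketch under several assumptions that do not come from the slides. The deterministic scheduler `greedy_schedule` is a simplified HEFT-like stand-in (earliest finish time, no insertion, no communication costs), not HEFT itself. Samples are Gaussian around the mean with a hypothetical relative spread `sigma`. The aggregation rule picks the candidate with the lowest average makespan over a second set of samples. All function names, parameters and the example graph are illustrative.

```python
import random


def greedy_schedule(order, preds, times, resources):
    """Stand-in for HEFT: map each task, in a precedence-respecting order,
    to the resource that gives it the earliest finish time."""
    free = {p: 0.0 for p in resources}   # time at which each resource is idle
    finish, assign = {}, {}
    for t in order:
        ready = max((finish[q] for q in preds[t]), default=0.0)
        best = min(resources, key=lambda p: max(ready, free[p]) + times[(t, p)])
        finish[t] = max(ready, free[best]) + times[(t, best)]
        free[best] = finish[t]
        assign[t] = best
    return assign


def makespan(order, preds, assign, times):
    """Replay a fixed task-to-resource assignment under concrete times."""
    free, finish = {}, {}
    for t in order:
        p = assign[t]
        ready = max((finish[q] for q in preds[t]), default=0.0)
        finish[t] = max(ready, free.get(p, 0.0)) + times[(t, p)]
        free[p] = finish[t]
    return max(finish.values())


def sample(mean, sigma, rng):
    """One execution time per (task, resource); Gaussian, kept positive."""
    return {k: max(0.01 * mu, rng.gauss(mu, sigma * mu)) for k, mu in mean.items()}


def mcs_schedule(order, preds, mean, resources, m=200, n=50, sigma=0.2, seed=0):
    rng = random.Random(seed)
    # First loop: build one schedule per sample, keeping distinct ones only.
    candidates = {}
    for _ in range(m):
        a = greedy_schedule(order, preds, sample(mean, sigma, rng), resources)
        candidates[tuple(sorted(a.items()))] = a
    # Second loop: score every candidate on n fresh samples (assumed rule:
    # lowest average makespan wins).
    tests = [sample(mean, sigma, rng) for _ in range(n)]

    def avg(a):
        return sum(makespan(order, preds, a, s) for s in tests) / n

    return min(candidates.values(), key=avg)


# Hypothetical diamond DAG with two resources.
order = ["A", "B", "C", "D"]
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
resources = ["r1", "r2"]
mean = {("A", "r1"): 2.0, ("A", "r2"): 3.0, ("B", "r1"): 4.0, ("B", "r2"): 2.0,
        ("C", "r1"): 3.0, ("C", "r2"): 3.0, ("D", "r1"): 1.0, ("D", "r2"): 2.0}

best = mcs_schedule(order, preds, mean, resources)
print(best, makespan(order, preds, best, mean))
```

Storing candidates in a dictionary keyed by the assignment keeps only the distinct schedules, so the second loop evaluates each one once however often the first loop produced it.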
MCS Based Scheduling
Complexity:
• Depends on the deterministic scheduling algorithm
• For HEFT it is O(v + e * r) = O(e * r)
• First loop: O(e * r * m)
• Second loop: O(e * n * k)
• Total = O(e * r * m + e * n * k)
Example
10,000 iterations - production phase (Gaussian Distribution)
200 iterations - selection phase
20% reduction in makespan
Absolute increase in algorithm time: 1.2s
Evaluation
• Graphs
Threshold Calculation
Convergence (no. of repetitions)
Convergence
Makespan performance evaluation
• Static HEFT (baseline) with mean ET values
• Autopsy - static HEFT with known ET values
• MCS - static
• ReStatic
• ReMCS
• Graph generation (random generator of a given type)
• Task execution time for different runs:
  • Select a “mean” for each task.
  • Use a probability distribution to select the actual execution time. The variation is bounded by the Quality of Estimation (QoE) (0 < QoE < 1).
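One plausible reading of the QoE bound, shown purely as an illustration: the actual execution time deviates from the mean by at most a fraction QoE in either direction. Both the uniform distribution and the symmetric relative bound are assumptions made for this sketch; the slides do not state which distribution or bound form was used. The function name `actual_time` is hypothetical.

```python
import random


def actual_time(mean, qoe, rng):
    """Draw an actual execution time within mean * (1 +/- qoe).

    Assumption: relative deviation drawn uniformly from [-qoe, +qoe].
    """
    if not 0.0 < qoe < 1.0:
        raise ValueError("QoE must satisfy 0 < QoE < 1")
    return mean * (1.0 + rng.uniform(-qoe, qoe))


rng = random.Random(1)
draws = [actual_time(10.0, 0.3, rng) for _ in range(1000)]
print(min(draws), max(draws))  # both stay within [7.0, 13.0]
```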
Summary
• It is possible to obtain a good full-ahead static schedule that performs well under prediction inaccuracy, without too much overhead.
• MCS, which has a more robust procedure for selecting an initial schedule, generally results in better performance when rescheduling is applied.