

Contention-Aware Reliability Efficient Scheduling on Heterogeneous Computing Systems

Longxin Zhang, Kenli Li, Senior Member, IEEE, Weihua Zheng, and Kenqin Li, Fellow, IEEE

Abstract—Energy efficiency and system reliability are two main metrics in modern high-performance computing. Most recent studies have focused on parallel task scheduling with low energy consumption or short execution time. These approaches were developed with the classic scheduling model. However, the contention model is gaining recognition as a more practical tool for creating accurate and efficient schedules. This study proposes a contention-aware reliability management with deadline and energy budget constraints (CARMEB) algorithm for parallel task scheduling in heterogeneous computing systems. CARMEB involves three phases, namely, task priority calculation, communication edge allocation, and slack reclaiming. The results are validated through extensive experiments on randomly generated task graphs and three types of task graphs from real-world applications. The experiments demonstrate that the algorithm significantly improves system reliability.

Index Terms—Contention-aware, DVFS, parallel tasks, reliability, task scheduling

1 INTRODUCTION

Power dissipation has been a major challenge for promoting green computing in recent decades. Modern high-performance processors, even idle ones, consume large amounts of energy. For instance, the Intel Core i7-975 (3.33 GHz, 1 MB L2, 8 MB L3) consumes 83 W when idle, and its peak power consumption reaches 210 W [1]. Moreover, the trend of scaling transistors and operating voltages has markedly increased the susceptibility of processors to faults. For example, the soft-error failure rate at 16 nm is expected to be 100 times worse than that at 180 nm [2].

Energy efficiency is a popular topic in research. Dynamic voltage and frequency scaling (DVFS) is one of the most efficient techniques among energy-aware strategies for task scheduling. Lee and Zomaya [3] presented two energy-conscious scheduling algorithms that utilize DVFS to balance the completion time and energy consumption of the application. Zhang et al. [4] presented a novel reliability maximization strategy with an energy constraint algorithm. Zong et al. [5] developed energy-aware and duplication-based scheduling algorithms to improve the performance and energy efficiency of task scheduling in parallel systems. Singh et al. [6] designed a contention-aware, energy-efficient duplication (FastCEED) algorithm based on the mixed integer programming formulation for parallel task scheduling on heterogeneous multiprocessors. Tang et al. [7] established a hierarchical reliability-driven scheduling (HRDS) algorithm with a short makespan (schedule length) for grid computing systems. Dogan et al. [8] introduced a reliable dynamic level scheduling (RDLS) algorithm that minimizes the schedule length. However, these approaches are based on the classic model, in which communication resources are free of contention. These approaches thus heavily idealize the target parallel systems. The classic model is unrealistic and therefore cannot produce accurate task schedules. Avoiding or decreasing inter-processor communication under a contention model is a more realistic strategy.

The imperfect transient fault rates of DVFS require further study. Joint optimization schemes of energy efficiency and system reliability were implemented on directed acyclic graphs (DAGs) in [9] by adopting the shared recovery technique. This technique attained high system reliability and noticeable energy preservation. A novel bi-objective genetic algorithm was proposed in [10] to realize low energy consumption and high system reliability for workflow scheduling. Two novel speculative strategies were developed in [11] to improve the estimation of the remaining time for a task; these strategies used a linear relationship model to estimate the remaining time and assess the extensional maximum cost performance. Li [12] developed four key techniques to address the energy- and time-constrained scheduling of independent or precedence-constrained tasks on multiple heterogeneous computers. Benoit et al. [13] utilized an active replication scheme to establish two efficient fault-tolerant algorithms, namely, the contention-aware fault-tolerant (CAFT) and IsoLevel CAFT, for heterogeneous systems. Bozdağ et al. [14] proposed a novel schedule compaction algorithm that fits the schedule on fewer processors without increasing the schedule length.

• L. Zhang is with the College of Computer Science, Hunan University of Technology, Zhuzhou 412007, China. E-mail: [email protected].

• K. Li is with the College of Computer Science and Electronic Engineering, Hunan University, and the National Supercomputing Center, Changsha, Hunan 410082, China. E-mail: [email protected].

• W. Zheng is with the College of Electrical and Information Engineering, Hunan University of Technology, Zhuzhou 412007, China. E-mail: [email protected].

• K. Li is with the Department of Computer Science, State University of New York, New Paltz, NY 12561, and the College of Computer Science and Electronic Engineering and National Supercomputing Center, Hunan University, Changsha, Hunan 410082, China. E-mail: [email protected].

Manuscript received 30 Dec. 2016; revised 14 June 2017; accepted 10 Aug. 2017. Date of publication 23 Aug. 2017; date of current version 6 Sept. 2018. (Corresponding author: Kenli Li.)
Recommended for acceptance by D. Zhu, M. Shafique, M. Lin, and S. Pasricha.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TSUSC.2017.2743499

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. 3, NO. 3, JULY-SEPTEMBER 2018

2377-3782 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Most previous studies on the energy and reliability management of heterogeneous computing systems (HCSs) are based on communication subsystems that are free of contention. However, communication contention exists in almost all HCSs. Sinnen and Sousa [15] proved that scheduling communication applications without contention awareness is inaccurate. The present study employs a communication contention model with an energy budget on HCSs to maximize the system reliability of parallel tasks.

The major contributions of this study are summarized as follows:

• We analyze the necessity of considering communication contention in scheduling parallel tasks.

• We formulate parallel task scheduling and communication resource allocation in HCSs as a combinatorial optimization problem.

• We propose an efficient reliability management solution for parallel task scheduling in HCSs with contended communication resources.

• We evaluate the performance of the proposed algorithm on extensive randomly generated DAGs and three classic real-world applications.

The remainder of this paper is organized as follows. Section 2 reviews the existing studies on reliability management and energy saving for parallel tasks in heterogeneous systems. Section 3 describes the different models adopted in this study. Section 4 states the preliminaries of this study. Section 5 presents a novel contention-aware reliability management parallel algorithm. Section 6 discusses the experiments performed in this study. Finally, the conclusions are presented in Section 7.

2 RELATED WORK

Lowering the energy consumption of processors has become an urgent requirement because of the rising demand for high-performance and energy-efficient computing systems. An extensive body of literature on energy-efficient computing is available.

As one of the most popular heuristic algorithms, heterogeneous earliest finish time (HEFT) [16] aims to minimize the schedule length. Munir et al. [17] outlined the typical requirements of embedded applications and discussed state-of-the-art hardware/software high-performance energy-efficient embedded computing techniques that could satisfy the ever-increasing demand for high-performance embedded computing in an energy-efficient manner. Han et al. [18] focused on voltage and frequency island (VFI) multicore systems with DVFS capability. The authors examined both static and dynamic synchronization-aware energy management schemes for a set of periodic real-time tasks that accessed shared resources. Zhu et al. [19] explored the scheduling and allocation of independent tasks on HCSs to obtain reasonable trade-offs between user expectations and energy efficiency. The researchers improved the flexibility of the HCS through an energy-efficient elastic scheduling strategy that adaptively adjusted the supply voltages of new and queued tasks according to the system workload. A new communication-aware DAG model for cloud computing applications was discussed in [20]. This model used integrated communication awareness to overcome the shortcomings of existing approaches. Han et al. [21] studied static and dynamic contention-aware energy management schemes to improve the energy performance of task sets with precedence relationships and a common deadline. A set of new online slack management policies was established to slow down tasks and edges concurrently while satisfying the deadline. These policies included calculating the available slack times for edges along their routes and efficiently determining the lowest feasible link speed of a VFI.

Chen [22] proposed a heterogeneous allotment-aware scheduling algorithm that considered the expected makespan for scheduling tasks in heterogeneous distributed systems. Qiu and Pérez [23] evaluated the ability of concurrent replication with canceling to improve the reliability and response times of parallel jobs subject to failure. Dynamic voltage scaling (DVS) [24] and duplication [25] have been recognized as useful techniques for energy-efficient scheduling. DVS has been widely applied in energy-efficient scheduling both on multi-core processors [26] and on clusters of processors. Similarly, duplication in scheduling has received considerable interest [27]. Xu et al. [28] presented a hybrid chemical reaction optimization scheme for task scheduling on HCSs.

3 MODELS

This section describes the target HCS, application model, power model, and reliability model utilized in this study.

3.1 Application Model

A target HCS with communication contention can be represented by $M_{HG} = HG = (PE, L)$, where $PE$ comprises a set of DVFS-enabled processors, and $L$ is the communication network with contention. This dedicated system has the following characteristics: (1) local communication incurs no time cost; and (2) communication is completed in a communication subsystem.

A parallel application that involves a set of precedence-constrained tasks can be described as a DAG, specifically a two-tuple $G = \langle T, Q \rangle$, where $T$ is the parallel task set, and $Q$ comprises the edges that indicate the precedence constraints among tasks. For convenience, the set of direct predecessors of $t_i$ ($t_i \in T$) is denoted by $parent(t_i)$, namely, $parent(t_i) = \{\forall t_p \in T \mid e_{pi} \in Q\}$. All direct successors of $t_i$ are represented by $child(t_i)$ [i.e., $child(t_i) = \{\forall t_c \in T \mid e_{ic} \in Q\}$]. For a given node $t_i$ in the DAG, if no predecessor exists for $t_i$ [i.e., $parent(t_i) = \emptyset$], then $t_i$ is named an entry task. Similarly, if $t_i$ does not have a successor [i.e., $child(t_i) = \emptyset$], then $t_i$ is called an exit task. Fig. 1 shows a simple DAG. The notations employed in this study are summarized in Table 1.
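The DAG notation above can be made concrete with a minimal Python sketch. The edge list is a hypothetical four-task graph (not the DAG of Fig. 1), and `build_dag` is an illustrative helper name rather than anything defined in the paper.

```python
from collections import defaultdict

# Hypothetical edge set Q, given as (parent, child) pairs; not the DAG of Fig. 1.
edges = [("t0", "t1"), ("t0", "t2"), ("t1", "t3"), ("t2", "t3")]

def build_dag(edges):
    """Return the task set T and the parent/child maps induced by the edges."""
    parent, child = defaultdict(set), defaultdict(set)
    tasks = set()
    for src, dst in edges:
        child[src].add(dst)    # dst is a direct successor of src
        parent[dst].add(src)   # src is a direct predecessor of dst
        tasks.update((src, dst))
    return tasks, parent, child

tasks, parent, child = build_dag(edges)
entry_tasks = sorted(t for t in tasks if not parent[t])  # parent(t) is empty
exit_tasks = sorted(t for t in tasks if not child[t])    # child(t) is empty
```

In this toy graph, `t0` is the only entry task and `t3` the only exit task.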

3.2 Communication Contention Model

The communication system in the traditional model is free of contention. Every processor can send/receive messages to/from another processor at any time under this environment. However, this type of model is unrealistic. The processors in this study are connected to a central switch as depicted in Fig. 2, which shows that the target communication system model is a star network. The two-tuple on switches and links denotes their unit busy and idle energy consumptions [6]. Notably, all three links are of unit bandwidth. If remote communication occurs (e.g., $t_i$ is assigned to $pe_1$ whereas its immediate parent is assigned to $pe_0$), then the edge between $t_i$ and its immediate parent is scheduled on links $L_0$ and $L_1$ (the route from $pe_0$ to $pe_1$).
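The star-network routing described above can be sketched as follows. This is a simplified illustration under two assumptions that the text does not state: an edge occupies both links of its route during the same time interval, and a link serves one edge at a time. The names `route` and `schedule_edge` are hypothetical.

```python
def route(src, dst):
    """Links used by a message from processor pe_src to pe_dst in a star network."""
    if src == dst:
        return []                      # local communication uses no link
    return [f"L{src}", f"L{dst}"]      # pe_src -> L_src -> switch -> L_dst -> pe_dst

def schedule_edge(link_free, src, dst, ready_time, volume):
    """Place one edge on its route and return its finish time.

    link_free maps a link name to the time at which that link becomes idle.
    With unit bandwidth, transferring `volume` units takes `volume` time units.
    """
    links = route(src, dst)
    if not links:
        return ready_time              # no time cost for local communication
    start = max([ready_time] + [link_free.get(name, 0.0) for name in links])
    finish = start + volume
    for name in links:
        link_free[name] = finish       # both links stay busy until the edge finishes
    return finish

link_free = {}
first = schedule_edge(link_free, 0, 2, ready_time=5.0, volume=3.0)   # uses L0 and L2
second = schedule_edge(link_free, 0, 1, ready_time=6.0, volume=2.0)  # waits for L0
```

Here the second edge is ready at time 6 but cannot start before the first edge releases `L0`, which is the contention effect that the classic model ignores.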

3.3 Energy Model

The classic energy model of heterogeneous systems in [29] is utilized in this study. The system power consumption is calculated by the following expression:

$$P = P_s + \hbar (P_{ind} + P_d) = P_s + \hbar (P_{ind} + C_{eff} f^{a}), \tag{1}$$

where $P_s$ is the static power consumption, $P_{ind}$ denotes the active power that is independent of the CPU frequency, and $P_d$ is the dynamic power that depends on the CPU frequency. $P_s$ is the static power that keeps the basic circuits running and the memories in sleep status. This term can be eliminated only by switching off the entire system. $P_{ind}$ is a constant that arises when the system accesses external devices, such as the main memory, RAM, and I/O. $P_d$ is the dominant part of the entire system power consumption. $P_d$ can be obtained by $P_d = C_{eff} \cdot V_{dd}^{2} \cdot f$ as shown in [30], where $C_{eff}$ is the effective load capacitance, $V_{dd}$ is the CPU supply voltage, and $f$ is the CPU frequency.

According to [30], $V_{dd} \propto f^{1/g}$, where $0 < g < 1$. Thus, the dynamic power consumption satisfies $P_d \propto f^{a}$, where $a = 1 + 2/g \geq 3$. When all frequencies are normalized to the maximum frequency $f_{max}$, the energy consumption of task $t_i$ can be derived as follows:

$$E_i(f_i) = P_{ind_i} \cdot \frac{c_i}{f_i} + C_{eff} \cdot c_i \cdot f_i^{2}, \tag{2}$$

where $P_{ind_i}$ is the frequency-independent dynamic power while performing task $t_i$, $c_i$ is the computation cost of $t_i$, and $f_i$ is the corresponding execution frequency. In this study, contention arises on the links when several edges communicate at the same time, and all links are of unit bandwidth [6]. The communication energy consumption is derived as follows:

$$E^{L}_{i,j} = E^{s}_{i} + e_{ij} \cdot f_{max} = E^{s}_{i} + e_{ij}, \tag{3}$$

Fig. 1. Simple DAG.

TABLE 1
Notations Used in This Study

| Notation | Definition |
|---|---|
| HCS | The heterogeneous computing system |
| $PE$ | A set of processing elements |
| $V$ | A set of supply voltages |
| $F$ | A set of supply frequencies |
| $L_i$ | A link that transfers data between the processors and the switch |
| $w_{i,j}$ | The computation cost of task $t_i \in T$ executed on processor $pe_j \in PE$ |
| $c_{i,j}$ | The communication cost between node $t_i$ and node $t_j$ |
| $\overline{w}_i$ | The average computational time of a task when executed on different processors |
| $child(t_i)$ | The set of immediate successors of task $t_i$ |
| $parent(t_i)$ | The set of immediate predecessors of task $t_i$ |
| $EST(t_i, pe_j)$ | The earliest execution start time of task $t_i$ on processor $pe_j$ |
| $E^{L}_{ij}$ | The communication energy consumption from $p_i$ to $p_j$ |
| $E_{total}$ | The total energy consumption of processors while performing all tasks in a task set |
| $EFT(t_i, pe_j)$ | The earliest execution finish time of task $t_i$ on processor $pe_j$ |
| $R_i(f_i)$ | The probability that task $t_i$ executes successfully |
| $t_f(t_i, pe_{src})$ | The finish time of task $t_i$ on processor $pe_{src}$ |
| $t_{dr}(t_i, pe_k)$ | The earliest time at which node $t_i$ can be released on processor $pe_k$ |
| AFLs | Available frequency levels |
| DAG | Directed Acyclic Graph |
| DVFS | Dynamic Voltage Frequency Scaling |
| CCR | Communication to Computation Ratio |
| SLR | Scheduling Length Ratio |
| ECR | Energy Consumption Ratio |
| POF | Probability of Failure |
| RDLS-CA | The Reliable Dynamic Level Scheduling algorithm under the contention model |
| HRDS-CA | The Hierarchical Reliability-Driven Scheduling algorithm under the contention model |
| CARMEB | The contention-aware reliability management with energy budget algorithm |
| $S_n$ | The total number of nodes of a special DAG |

Fig. 2. Link model.


where $E^{L}_{ij}$ is the communication energy consumption from $p_i$ to $p_j$, $E^{s}_{i}$ is the start communication cost of link $L_i$, and $e_{ij}$ is the communication volume. The communication system in this study operates at the maximum frequency. Hence, the total energy consumption $E_{total}$ for performing every task in a task set can be estimated as follows:

$$E_{total} = \sum_{i=1}^{n} E_i(f_i) + \sum_{i=1}^{n} \sum_{j=1}^{n} E^{L}_{ij} + E_S, \tag{4}$$

where $E_S$ is the switch energy consumption.

The processors in the HCS investigated in this study are DVFS-enabled and have different computation capabilities. Only the computational energy is considered for simplicity (i.e., the first item in Eq. (4)). Table 2 lists the voltage-frequency pairs of the heterogeneous processors.
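As a numerical illustration of the per-task energy in Eq. (2), the following sketch evaluates the formula at the relative frequencies of Pair 1 in Table 2. The values of `p_ind` and `c_eff` are placeholders chosen for the example; the text does not specify them.

```python
def task_energy(c_i, f_i, p_ind=0.1, c_eff=1.0):
    """Eq. (2): E_i(f_i) = P_ind * c_i / f_i + C_eff * c_i * f_i**2.

    f_i is normalized to f_max = 1. p_ind and c_eff are assumed example values.
    """
    if not 0.0 < f_i <= 1.0:
        raise ValueError("frequency must be normalized to the interval (0, 1]")
    return p_ind * c_i / f_i + c_eff * c_i * f_i ** 2

# Relative frequencies of Pair 1 in Table 2.
pair1_freqs = [1.0, 0.8, 0.7, 0.6, 0.5, 0.4]
energies = {f: task_energy(c_i=6.0, f_i=f) for f in pair1_freqs}
```

With these placeholder constants, lowering the frequency reduces the dynamic term faster than it inflates the frequency-independent term, so the energy at 0.4 is well below the energy at 1.0.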

3.4 Reliability Model

Faults occur while processors perform tasks for various unavoidable reasons, such as software errors, hardware failures, and ambient electromagnetic interference. Transient faults occur much more frequently than permanent faults, as mentioned in [31]. Thus, the present study focuses on transient faults.

A Poisson distribution with an average rate $\lambda$ is generally adopted to model transient faults [32]. Previous studies show that transient faults occur independently during task execution. Moreover, most energy-efficient approaches are based on the DVFS technique, which aggravates transient faults. The average rate of transient faults can be obtained from the computing frequency $f$ as follows:

$$\lambda(f) = \lambda_0 \cdot g(f), \tag{5}$$

where $\lambda_0$ is the average transient fault rate at the maximum computing frequency $f_{max}$ ($f_{max} = 1$ for the supply voltage $V_{max}$), and $g(f) = 10^{\frac{d(1-f)}{1-f_{min}}}$. The constant $d > 0$ represents the dependency of the fault rate on frequency and voltage scaling. The fault rate increases exponentially as the CPU frequency decreases to save more energy. Therefore, $\lambda(f)$ is a strictly decreasing function of $f$.

For convenience, the reliability of a task $R_i(f_i)$ is defined as the probability that task $t_i$ is executed successfully. Thus, we obtain the following expression:

$$R_i(f_i) = e^{-\lambda(f_i) \cdot \frac{c_i}{f_i}}, \tag{6}$$

where $c_i$ is the computation cost of $t_i$, and $f_i$ is the corresponding CPU frequency.

The system reliability $R_{sys}$ is the probability that all $n$ tasks in the task set are executed successfully without any transient fault. It can be expressed as follows:

$$R_{sys} = \prod_{i=1}^{n} R_i(f_i). \tag{7}$$
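Eqs. (5) to (7) translate directly into a few lines of Python. The sketch below is only an illustration: the default values of `lambda0`, `d`, and `f_min` are assumptions picked for the example, not parameters reported in this section.

```python
import math

def fault_rate(f, lambda0=1e-6, d=2.0, f_min=0.4):
    """Eq. (5): lambda(f) = lambda0 * 10 ** (d * (1 - f) / (1 - f_min))."""
    return lambda0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))

def task_reliability(c_i, f_i, **kw):
    """Eq. (6): R_i(f_i) = exp(-lambda(f_i) * c_i / f_i)."""
    return math.exp(-fault_rate(f_i, **kw) * c_i / f_i)

def system_reliability(tasks, **kw):
    """Eq. (7): product of the task reliabilities; tasks is a list of (c_i, f_i)."""
    return math.prod(task_reliability(c, f, **kw) for c, f in tasks)

full_speed = system_reliability([(6.0, 1.0), (4.0, 1.0)])
scaled_down = system_reliability([(6.0, 0.5), (4.0, 1.0)])
```

Running the first task at half speed both raises its fault rate and doubles its exposure time, so `scaled_down` is smaller than `full_speed`. This is the tension between energy saving and reliability that the scheduling problem has to balance.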

4 PRELIMINARIES

The topology of a communication network is modeled as a graph $NG = (T, PE, Q, L)$.

Definition 1 (Edge Finish Time). Let $L = \langle L_1, L_2, \ldots, L_l \rangle$ be the communication route of $e_{ij} \in Q$, which sends data from $pe_{src}$ to $pe_{dst}$. The finish time of $e_{ij}$ is as follows:

$$t_f(e_{ij}, pe_{src}, pe_{dst}) = \begin{cases} t_f(t_i, pe_{src}) & \text{if } pe_{src} = pe_{dst}, \\ t_f(e_{ij}, L_l) & \text{otherwise,} \end{cases} \tag{8}$$

where $t_f(t_i, pe_{src})$ is the finish time of task $t_i$ on processor $pe_{src}$. Thus, $t_f(e_{ij}, pe_{src}, pe_{dst})$ is the finish time of $e_{ij}$ on the last link of the route, $L_l$, unless $pe_{src}$ and $pe_{dst}$ are the same processor. In that case, the communication is local, no remote communication occurs, and the edge finishes when $t_i$ finishes.

In the contention-aware model, the earliest time at which node $t_j$ can be released on processor $pe_k$ is defined as the data-ready time $t_{dr}$, which can be expressed as follows:

$$t_{dr}(t_j, pe_k) = \max_{e_{ij} \in Q,\; t_i \in parent(t_j)} \left\{ t_f(e_{ij}, pe(t_i), pe_k) \right\}, \tag{9}$$

where $pe(t_i)$ refers to the processor allocation of node $t_i$, $pe(t_i) \in PE$.
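Definition 1 and the data-ready time can be sketched as two small functions. The example assumes that the finish time of each incoming edge on the last link of its route is already known; in a real scheduler that value depends on the destination processor and on link contention. The tuple layout of `incoming` is a hypothetical choice made for this illustration.

```python
def edge_finish(parent_finish, last_link_finish, pe_src, pe_dst):
    """Eq. (8): a local edge finishes with its parent task; a remote edge
    finishes when it leaves the last link of its route."""
    return parent_finish if pe_src == pe_dst else last_link_finish

def data_ready_time(incoming, pe_k):
    """Eq. (9): latest finish time among all incoming edges of a task.

    incoming holds one (parent_finish, last_link_finish, pe_src) tuple per
    parent edge. An entry task has no incoming edge and is ready at time 0.
    """
    return max(
        (edge_finish(pf, lf, src, pe_k) for pf, lf, src in incoming),
        default=0.0,
    )

# Two parents: one on processor 0 (finishes at 5), one on processor 1 (finishes at 7).
incoming = [(5.0, 9.0, 0), (7.0, 8.0, 1)]
ready_on_pe0 = data_ready_time(incoming, 0)   # first edge is local, second is remote
ready_on_pe1 = data_ready_time(incoming, 1)   # first edge is remote, second is local
```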

Definition 2. The immediate neighboring frequency $f_{in}$ of $f_{ee}$ (the energy-efficient frequency) on processor $pe_j$ is the frequency selected to save more energy.

Given that $f_{ee} = \sqrt[3]{\dfrac{P_{ind}}{(m-1)\,C_{eff}}}$ and because $m$ is a constant (with a typical value of $m = 3$), $f_{ee}$ depends only on $P_{ind}$ [29].
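The relation between $f_{ee}$ and the discrete frequency levels can be illustrated as follows. Two assumptions are made here that the text leaves open: the root index is taken to be $m$ (which coincides with the cube root above for $m = 3$), and the neighboring frequency is interpreted as the lowest available level that is not below $f_{ee}$. The function names and the value of `p_ind` are hypothetical.

```python
def energy_efficient_frequency(p_ind, c_eff, m=3):
    """f_ee = (P_ind / ((m - 1) * C_eff)) ** (1 / m); cube root when m = 3."""
    return (p_ind / ((m - 1) * c_eff)) ** (1.0 / m)

def neighboring_frequency(f_ee, levels):
    """Assumed reading of Definition 2: the lowest level >= f_ee.

    Falls back to the highest level when f_ee exceeds every available level.
    """
    candidates = [f for f in levels if f >= f_ee]
    return min(candidates) if candidates else max(levels)

f_ee = energy_efficient_frequency(p_ind=0.1, c_eff=1.0)          # about 0.368
f_in_pair1 = neighboring_frequency(f_ee, [1.0, 0.8, 0.7, 0.6, 0.5, 0.4])
f_in_pair3 = neighboring_frequency(f_ee, [1.0, 0.85, 0.65, 0.5, 0.35])
```

Under this reading, running below $f_{ee}$ is avoided because the frequency-independent energy term then grows faster than the dynamic term shrinks.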

4.1 Task Priority

A feasible and efficient priority order is established in this study to comply with the precedence constraints of parallel tasks. A non-increasing upward rank $URank(t_i)$ similar to that in [4] is employed, and it is computed recursively for each task in a parallel task set as follows:

$$URank(t_i) = \overline{w}_i + \max_{t_j \in child(t_i)} \left\{ c_{i,j} + URank(t_j) \right\}, \tag{10}$$

where $\overline{w}_i$ is the average computation cost of $t_i$ over the processors, $t_j$ is an immediate successor of task $t_i$, and $c_{i,j}$ is the communication cost between nodes $t_i$ and $t_j$. For the exit task, the maximum term is 0, so its $URank$ equals its average computation cost (e.g., 7.33 for $t_7$ in Table 3). $URank$ has been proven in [4] to be more effective than topological sorting.

Table 3 lists the computation costs for each node on different processors. The last column of Table 3 indicates the corresponding $URank$ value for each node in Fig. 1.
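The recursive rank computation can be sketched in Python as shown below. The four-task graph and its costs are hypothetical, because the edge weights of Fig. 1 are not reproduced in the text; `upward_ranks` is an illustrative name.

```python
from functools import lru_cache

def upward_ranks(avg_cost, comm_cost):
    """Compute the upward rank of every task.

    avg_cost  : {task: average computation cost over the processors}
    comm_cost : {(t_i, t_j): communication cost of edge e_ij}
    """
    children = {}
    for (ti, tj) in comm_cost:
        children.setdefault(ti, []).append(tj)

    @lru_cache(maxsize=None)
    def rank(ti):
        # The max over an empty child set is taken as 0 (exit task).
        tail = max(
            (comm_cost[(ti, tj)] + rank(tj) for tj in children.get(ti, [])),
            default=0.0,
        )
        return avg_cost[ti] + tail

    return {t: rank(t) for t in avg_cost}

# Hypothetical DAG: a -> b -> d and a -> c -> d.
avg_cost = {"a": 4.0, "b": 3.0, "c": 5.0, "d": 2.0}
comm_cost = {("a", "b"): 1.0, ("a", "c"): 3.0, ("b", "d"): 4.0, ("c", "d"): 1.0}
ranks = upward_ranks(avg_cost, comm_cost)
priority_queue = sorted(ranks, key=ranks.get, reverse=True)  # non-increasing rank
```

Sorting by non-increasing rank always places a task before its successors, because every rank strictly exceeds the ranks of its children when costs are positive.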

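TABLE 2
Voltage-Relative Frequency Pairs

| Level | Pair 1 voltage | Pair 1 relative frequency | Pair 2 voltage | Pair 2 relative frequency | Pair 3 voltage | Pair 3 relative frequency |
|---|---|---|---|---|---|---|
| 0 | 1.75 | 1.0 | 1.5 | 1.0 | 2.2 | 1.00 |
| 1 | 1.50 | 0.8 | 1.4 | 0.9 | 1.9 | 0.85 |
| 2 | 1.40 | 0.7 | 1.3 | 0.8 | 1.6 | 0.65 |
| 3 | 1.20 | 0.6 | 1.2 | 0.7 | 1.3 | 0.50 |
| 4 | 1.00 | 0.5 | 1.1 | 0.6 | 1.0 | 0.35 |
| 5 | 0.90 | 0.4 | 1.0 | 0.5 | | |
| 6 | | | 0.9 | 0.4 | | |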


4.2 Problem Description

The problem addressed in this study is to select an appropriate frequency for each task (in a specific priority order) given (a) a parallel task application, (b) processors in an HCS that support the DVFS technique, and (c) contention in the communication system among these processors. Each task, together with its selected frequency, should then be assigned to an available processor. The system reliability should be maximized without exceeding the given deadline $D^{*}$ and energy budget $E^{*}$. The scheduling problem for parallel tasks in HCSs with communication contention is formalized as follows:

$$\text{Maximize: } R_{sys},$$

$$\text{subject to: } D_{total} \leq D^{*}, \tag{11}$$

$$E_{total} \leq E^{*}, \tag{12}$$

where $D_{total}$ is the makespan of the schedule.
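To make the constrained formulation tangible, the sketch below enumerates every frequency assignment for a toy instance and keeps the most reliable one that meets both constraints. It is a brute-force illustration, not the CARMEB heuristic: tasks are assumed to run back to back on a single processor, communication is ignored, and all constants (`p_ind`, `c_eff`, `lambda0`, `d`) are placeholder values.

```python
import itertools
import math

def best_assignment(costs, levels, deadline, budget,
                    p_ind=0.1, c_eff=1.0, lambda0=1e-6, d=2.0):
    """Exhaustively maximize R_sys subject to the deadline and energy budget.

    Returns (reliability, frequencies) or None when no assignment is feasible.
    """
    f_min = min(levels)
    best = None
    for freqs in itertools.product(levels, repeat=len(costs)):
        pairs = list(zip(costs, freqs))
        makespan = sum(c / f for c, f in pairs)            # sequential execution
        energy = sum(p_ind * c / f + c_eff * c * f ** 2 for c, f in pairs)
        if makespan > deadline or energy > budget:
            continue                                       # violates Eq. (11) or (12)
        # Sum of exponents of Eq. (6); maximizing it maximizes R_sys.
        log_r = -sum(lambda0 * 10 ** (d * (1 - f) / (1 - f_min)) * c / f
                     for c, f in pairs)
        if best is None or log_r > best[0]:
            best = (log_r, freqs)
    return None if best is None else (math.exp(best[0]), best[1])

result = best_assignment(costs=[4.0, 6.0], levels=[1.0, 0.8, 0.5],
                         deadline=15.0, budget=9.0)
```

In this instance, running both tasks at full speed exceeds the energy budget, so the search slows down the second task. The number of assignments grows exponentially with the number of tasks, which is why a heuristic is needed for realistic task graphs.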

4.3 Motivational Example

A motivational example is presented in this section. The scheduling of the task graph in Fig. 1 is illustrated in Fig. 3. Fig. 3a shows the task schedule in a computing system free of contention. Fig. 3b captures the task schedule in a system with communication contention. As shown in Fig. 3b, $e_{01}$ is scheduled before the release of $t_1$. Its immediate parent $t_0$ is assigned to $pe_0$, whereas $t_1$ is scheduled to $pe_2$. Remote communication then occurs, and the data should be transferred from $pe_0$ to $pe_2$. The route path is $pe_0 \to L_0 \to$ switch $\to L_2 \to pe_2$. Thus, $e_{01}$ is scheduled on $L_0$ and $L_2$. $t_2$ is ready to release once it receives these data. $t_6$ waits for the data from its immediate parent $t_1$ before its release. However, $e_{16}$ is delayed in the schedule, because $e_{14}$ has occupied the communication link $e_{12}$. The deadline and energy budget for Fig. 1 are 51 and 95, respectively. Under this scheme, the total energy consumption of the processors is 89.086, and the probability of failure ($1 - R_{sys}$) is 3.42e-07. The makespan values are 44.58 and 49.96 in Figs. 3a and 3b, respectively.

5 CONTENTION-AWARE RELIABILITY MANAGEMENT SCHEME

Reliability is the most important metric in heterogeneous computing. Therefore, the objective is to maximize the system reliability of task scheduling while guaranteeing that the total energy consumption of the processors does not exceed an energy budget $E^{*}$ and that the makespan does not exceed a deadline $D^{*}$.

This section devises a novel contention-aware reliability management with energy budget (CARMEB) algorithm based on a communication contention system. CARMEB performs each task assignment in a priority order, selects an appropriate execution frequency to attain higher reliability, and determines routes for the communications sent to this task. The pseudo code of CARMEB is described in Algorithm 1.

As shown in Algorithm 1, Steps 1 and 3 establish a priority queue for a DAG, such that the subsequent scheduling conforms to the precedence constraints. Step 2 calculates the energy-efficient frequency that satisfies the energy budget condition. The schedule is prepared in Steps 4 to 22. Steps 8 to 15 form the inner loop, which determines an efficient frequency for each candidate task on a particular processor without exceeding the deadline and energy budget. Step 17 assigns the most suitable frequency and processor to each candidate task. If remote communication is required, then edge scheduling begins from the processor where the scheduled candidate task is located. Steps 24 to 28 reclaim the slack in the prepared schedule. The task order is upward from the exit task of the priority queue. When a slack occurs, the nearest task updates its execution frequency with the most efficient one. The time complexity of CARMEB is $O(|T| \times |PE| \times \log|F|)$, where $|T|$ is the DAG size, $|PE|$ is the number of processors, and $|F|$ is the maximum number of frequency levels supported by a processor.

TABLE 3
Computation Costs on Different Processors

| Node | $pe_0$ | $pe_1$ | $pe_2$ | URank |
|---|---|---|---|---|
| $t_0$ | 5 | 7 | 8 | 45.67 |
| $t_1$ | 6 | 7 | 3 | 33.00 |
| $t_2$ | 7 | 10 | 5 | 41.00 |
| $t_3$ | 6 | 8 | 5 | 17.67 |
| $t_4$ | 8 | 5 | 7 | 17.67 |
| $t_5$ | 8 | 11 | 6 | 17.00 |
| $t_6$ | 7 | 5 | 9 | 16.33 |
| $t_7$ | 6 | 9 | 7 | 7.33 |

Fig. 3. Scheduling of task graph in Fig. 1. (a) Schedule without contention, (b) schedule under CARMEB with contention.

Algorithm 1. CARMEB

Input: A DAG $G = \langle T, Q \rangle$, a MHG, and a set $PE$ of DVFS-available processors.
Output: A schedule $S$ of $G$ onto MHG.
1: compute URank of $t_i \in T$ by traversing the graph from the exit node;
2: compute the energy-efficient frequency for each node in set $T$;
3: sort the tasks in non-increasing order of $URank(t_i)$ and establish a priority queue $Queue_{URank}$ for the sorted tasks;
4: while the priority queue $Queue_{URank}$ is not empty do
5:     $t_i \leftarrow$ the head node in $Queue_{URank}$;
6:     for each processor $pe_j \in PE$ do
7:         calculate the summation of $t_{dr}(t_i, pe_j)$ and $c_{ij}$;
8:         for $\forall f_{j,k} \in F$ do
9:             find the immediate neighboring frequency $f_{in}$ of $f_{ee}$ on processor $pe_j$ for task $t_i$, mark the one which has the earliest finish time;
10:            if $D_{total} \le D^{*} \wedge E_{total} \le E^{*}$ then
11:                $f_{j,k} \leftarrow f_{in}$;
12:            else
13:                $f_{j,k} \leftarrow f_{max}$;
14:            end
15:        end
16:    end
17:    assign the marked frequency $f_{in}$ to task $t_i$ on the marked processor;
18:    compute the reliability of task $t_i$ using Eq. (6);
19:    allocate $e_{pi}$ to $L_{pe(t_p)}$ and $L_{pe(t_i)}$ when they are assigned to different processors, where $t_p$ is the immediate parent of $t_i$;
20:    compute the energy consumption for task $t_i$ using Eq. (2);
21:    delete the head node $t_i$ in $Queue_{URank}$;
22: end
23: $t_i \leftarrow t_{exit}$;
24: while $t_i \neq t_{entry}$ do
25:     reclaim the previously scheduled tasks from $t_{exit}$ upward to $t_{entry}$, adjust the candidate task in each slack with $f_{ee}$;
26:     update the reliability and energy consumption for the candidate task $t_i$;
27:     update the candidate task;
28: end
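Steps 1 and 3 of Algorithm 1 can be illustrated with a short Python sketch. It assumes a HEFT-style upward rank (the mean computation cost of a task plus the largest sum of edge cost and successor rank), which may differ in detail from the URank definition used by CARMEB. The four-task graph, its costs, and the function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical 4-task DAG: succ[t] maps each successor of t to the edge cost.
succ = {
    "t0": {"t1": 4.0, "t2": 2.0},
    "t1": {"t3": 3.0},
    "t2": {"t3": 2.0},
    "t3": {},
}
# Hypothetical computation cost of each task on three processors.
cost = {
    "t0": [5, 7, 8],
    "t1": [6, 7, 3],
    "t2": [7, 10, 5],
    "t3": [6, 8, 5],
}

def mean_cost(task):
    """Average computation cost of a task over all processors."""
    return sum(cost[task]) / len(cost[task])

@lru_cache(maxsize=None)
def upward_rank(task):
    """Assumed rank: mean cost + max over successors of (edge cost + rank)."""
    tail = max((w + upward_rank(s) for s, w in succ[task].items()), default=0.0)
    return mean_cost(task) + tail

def priority_queue():
    """Tasks sorted in non-increasing order of upward rank (Step 3)."""
    return sorted(succ, key=upward_rank, reverse=True)

order = priority_queue()
```

With positive costs, the rank of a task always exceeds the rank of each of its successors, so the resulting order respects the precedence constraints.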

6 EXPERIMENTS

The proposed algorithm is assessed in this section. A brief introduction of three popular existing algorithms (i.e., FastCEED, RDLS, and HRDS) is presented before making comparisons.

FastCEED [6] is a contention-aware, energy-efficient duplication heuristic algorithm based on a mixed integer programming (MIP) formulation for parallel task scheduling in heterogeneous systems. Duplication is employed during task scheduling to minimize the makespan, total energy consumption, and tardiness of tasks in network resources. The communication energy consumption and latency are decreased significantly by duplicating the most important parent task, which is identified by the MIP method. FastCEED therefore improves both energy efficiency and performance.

HRDS, proposed by Tang et al. [7], attempts to improve system reliability and decrease the schedule length. HRDS includes local-level and global-level scheduling. Under the local-level scheduling strategy, the parallel tasks of an application with precedence constraints run on the heterogeneous processors of a virtual node. By contrast, independent applications are scheduled on the grid under the global-level scheduling strategy.

The RDLS [8] algorithm was designed to consider resource reliability in the HCS while minimizing the makespan of an application during task scheduling. The dynamic level (DL) of a task-processor pair plays an important part in RDLS and involves four items. The first three items are used to promote the most suitable resources to minimize the makespan, whereas the last item favors resources with higher reliability to enhance application reliability. When a candidate task is mapped to a processor, the task-processor pair with the largest dynamic level is selected.

Comparing scheduling algorithms under different models is generally not meaningful. Thus, the two aforementioned algorithms (i.e., RDLS and HRDS) are simulated with contention for comparison. In particular, the task scheduling in RDLS and HRDS maintains the same processor assignment in the same order, but the edges are scheduled under contention. These schemes are called RDLS-CS and HRDS-CS, respectively, for distinction.
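How a fixed task-to-processor assignment changes once edges compete for the network can be illustrated with a minimal Python sketch. It assumes a single shared link that carries one transfer at a time and serves transfers in the order in which their data become ready. This is a simplification for illustration, not the edge-scheduling routine of RDLS-CS or HRDS-CS, and all names are hypothetical.

```python
def schedule_edges(requests):
    """Serialize transfers on one shared link.

    requests: list of (name, ready_time, duration) tuples.
    Returns {name: (start, finish)} such that no two transfers overlap.
    """
    link_free = 0.0
    slots = {}
    # Serve transfers in order of the time their data becomes available.
    for name, ready, duration in sorted(requests, key=lambda r: r[1]):
        start = max(ready, link_free)  # wait while the link is busy
        finish = start + duration
        slots[name] = (start, finish)
        link_free = finish
    return slots

# Two hypothetical transfers that become ready at nearly the same time.
slots = schedule_edges([("e01", 5.0, 4.0), ("e02", 6.0, 3.0)])
```

Without contention, transfer e02 would occupy the interval from 6 to 9. On the shared link it must wait for e01 and finishes at 12, which delays the earliest start of its destination task.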

Experiments were performed to analyze many aspects of the developed parallel task scheduling with contention-aware reliability management. Our experiments were conducted on a workstation equipped with an Intel Core i5-6400 quad-core CPU, 8 GB DRAM, and a 2 TB disk. The operating system was Windows 7 (64 bit). As previously mentioned, the occurrence of transient faults follows a Poisson distribution. The parameter configurations utilized in this study were $d = 3$, $m = 3$, and $\lambda_0 = 10^{-9}$.

Parallel target systems were adopted in this study. The processor number was selected as 3, 6, 9, 15, 25, and 50. The switch employed in this study only supports half-duplex communication (i.e., only one communication in either direction is permitted at a time). The communication-to-computation ratio (CCR) is a highly important measure of a DAG. The CCR is defined as the sum of all edge weights divided by the sum of all node weights in a DAG, i.e., $\mathrm{CCR} = \sum_{e \in Q} w(e) \big/ \sum_{t \in T} c(t)$.

The following sections discuss the performance metrics, as well as randomly generated and real-world application DAGs, to evaluate the effect of the proposed algorithm.
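The CCR definition given above can be written as a small Python helper. This is a minimal sketch that assumes each task carries a single node weight (for example, its average computation cost); the dictionaries and the three-task graph are hypothetical.

```python
def ccr(edge_weights, node_costs):
    """Communication-to-computation ratio: sum of edge weights / sum of node weights."""
    total_computation = sum(node_costs.values())
    if total_computation == 0:
        raise ValueError("node costs must not sum to zero")
    return sum(edge_weights.values()) / total_computation

# Hypothetical 3-task graph with two communication edges.
edges = {("t0", "t1"): 12.0, ("t0", "t2"): 8.0}
nodes = {"t0": 10.0, "t1": 20.0, "t2": 10.0}
ratio = ccr(edges, nodes)  # 20 / 40
```

A ratio below 1 indicates a computation-intensive graph, whereas a ratio well above 1 indicates a communication-intensive one.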

6.1 Performance Metrics

The performance of the proposed algorithm is assessed in this section. Several important metrics must be introduced before the experimental results are evaluated. Randomly generated and real-world application DAGs are employed to verify the effects of our algorithm and existing algorithms.

6.1.1 Scheduling Length Ratio (SLR)

The scheduling length, which is also called the makespan, is determined by the finish time of task $t_{exit}$. In particular, $makespan = t_f(t_{exit})$. The makespan is one of the key metrics for task scheduling. Because the makespan becomes large when the DAG increases in size, it should always be normalized. The normalized metric is the SLR, which is defined as follows:

$$\mathrm{SLR} = \frac{makespan}{\sum_{t_i \in CP} \min_{pe_j \in PE} \left( w_{i,j} \right)}, \tag{13}$$

where $CP$ denotes the set of task nodes located on the critical path of the DAG.
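Eq. (13) can be sketched in a few lines of Python. The cost rows reuse three entries of Table 3, but the critical path and the makespan value are hypothetical and chosen only to show the arithmetic.

```python
def slr(makespan, critical_path, cost):
    """Eq. (13): makespan over the sum of minimum costs of the critical-path tasks."""
    lower_bound = sum(min(cost[t]) for t in critical_path)
    return makespan / lower_bound

# cost[t] lists the computation cost of task t on pe0, pe1, and pe2.
cost = {"t0": [5, 7, 8], "t2": [7, 10, 5], "t7": [6, 9, 7]}
# Assumed critical path t0 -> t2 -> t7 and assumed makespan of 32.
value = slr(makespan=32.0, critical_path=["t0", "t2", "t7"], cost=cost)
```

The denominator (5 + 5 + 6 = 16 in this example) is a lower bound on the schedule length, so the SLR cannot fall below 1, and values closer to 1 indicate shorter schedules.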

6.1.2 Energy Consumption Ratio (ECR)

The ECR for a given DAG is defined as follows:

$$\mathrm{ECR} = \frac{E_{petotal}}{\sum_{t_i \in CP} \min_{pe_j \in PE} \left( P_{ind_i} + C_{eff} \cdot c_i \cdot f_i^{2} \right)}, \tag{14}$$

where $E_{petotal}$ is the processor energy consumption.

6.1.3 POF

The POF indicates the probability that at least one task in a DAG fails when the tasks are mapped to specific processors with the allocated frequencies under a particular strategy. The POF can be expressed as follows:

$$\mathrm{POF} = 1 - R = 1 - \prod_{i=1}^{n} R_i(f_i). \tag{15}$$
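Eq. (15) translates directly into code. The sketch below takes the per-task reliabilities $R_i(f_i)$ as given inputs; how they are obtained from the fault model is outside its scope, and the sample values are hypothetical.

```python
import math

def pof(task_reliabilities):
    """Eq. (15): POF = 1 - product of the per-task reliabilities R_i(f_i)."""
    if not all(0.0 <= r <= 1.0 for r in task_reliabilities):
        raise ValueError("each reliability must lie in [0, 1]")
    return 1.0 - math.prod(task_reliabilities)

# Three hypothetical task reliabilities.
value = pof([0.99, 0.98, 0.995])
```

Because the reliabilities are multiplied, the POF grows with the number of tasks even when every individual task is highly reliable.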

6.2 Randomly Generated DAG

Without loss of generality, random DAGs generated with a specific probability for an edge between any two nodes in the graph are considered in our study. Different characteristics are considered to capture this type of weighted application DAG. The main parameters are listed as follows:

• DAG size: The number of task nodes in a DAG (ranges from 50 to 300).
• CCR: CCR = 0.5, 1.0, 2.0, 5, 10.
• Number of processors: The processor number ranges from 3 to 20.
• Computation cost: The computation cost of each task varies uniformly from 5 to 50.
• Average in/out degree: The average in/out degree of each node ranges from 3 to 10.
• Heterogeneity factor (HF): HF is relative to the computation cost of each task, which varies in the range of 5 to 50.

The DAGs utilized in our experiments were generated through combinations of these parameters. The communication weight between two task nodes is created according to the CCR. The probability of each communication edge follows a normal distribution.
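A generator along these lines can be sketched in Python. The sketch is not the generator used in the experiments: it assumes a fixed edge probability, draws provisional edge weights uniformly, and then rescales them to hit the target CCR. It does not control the in/out degree or the heterogeneity factor, and all names are hypothetical.

```python
import random

def random_dag(n_tasks, edge_prob, ccr, seed=0):
    """Return (node_costs, edge_weights) for a random weighted DAG.

    An edge i -> j is only considered for i < j, so the graph is acyclic.
    """
    rng = random.Random(seed)
    # Computation cost of each task drawn uniformly from [5, 50].
    costs = [rng.uniform(5, 50) for _ in range(n_tasks)]
    edges = {}
    for i in range(n_tasks):
        for j in range(i + 1, n_tasks):
            if rng.random() < edge_prob:
                edges[(i, j)] = rng.uniform(1, 10)  # provisional weight
    # Rescale edge weights so that sum(edges) / sum(costs) equals the target CCR.
    total = sum(edges.values())
    if total > 0:
        scale = ccr * sum(costs) / total
        edges = {e: w * scale for e, w in edges.items()}
    return costs, edges

costs, edges = random_dag(n_tasks=50, edge_prob=0.1, ccr=2.0, seed=1)
```

Restricting edges to pairs with i < j guarantees that the task indices already form a valid topological order.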

6.3 Effect of Random Applications

These random applications are aimed at realizing better comparisons between our presented algorithm and three widely known algorithms (i.e., RDLS-CS, HRDS-CS, and FastCEED).

Each datum in the charts of Figs. 4, 5, 6, and 7 denotes the average value of the results obtained after the algorithms were run 100 times for a specific set of configuration parameters. Figs. 4, 5, and 6 show that the SLR, ECR, and POF increase significantly as CCR increases, especially when CCR is large. When CCR is 0.5 (i.e., a computation-intensive application), the SLR, ECR, and POF are considerably lower than those for CCR = 10. Figs. 4, 5, and 6 also show that the average SLRs of CARMEB are extremely close to those of HRDS-CS and FastCEED. The reason is that ECR and POF are the two primary metrics in this study, whereas CARMEB only ensures that the SLR is modest. A slight increase in SLR must be accepted in several cases to guarantee a better POF. As expected, CARMEB outperforms the other three algorithms as the task number increases for all CCRs. This performance improvement can be attributed to the intelligent strategy in the processor selection and frequency assignment of CARMEB for each task in a DAG. The scheduling of communication edges is also considered on a parallel computing system with communication contention.

Fig. 6 illustrates the comparisons of the four algorithms for a large CCR. CCR = 10 corresponds to a communication-intensive application. Compared with the case of CCR = 1.0, the metrics in Fig. 6 (e.g., SLR) are significantly increased. The reason is that the communication between two nodes takes much more time when the CCR is large. Fig. 6 also shows that CARMEB surpasses RDLS-CS, HRDS-CS, and FastCEED by 8.59, 6.36, and 2.48 percent in terms of SLR, respectively. CARMEB significantly outperforms RDLS-CS, HRDS-CS, and FastCEED by 18.48, 18.72, and 12.58 percent in terms of the average POF, respectively. For the average value of SLR, CARMEB performs −3.15, −1.26, and 2.07 percent better than RDLS-CS, HRDS-CS, and FastCEED, respectively. These findings suggest that CARMEB performs well at a high CCR.

Fig. 4. Effect of varying task number for CCR = 0.5.

Fig. 7 shows the performance of the four algorithms with varying processor numbers for CCR = 5.0 and DAG size = 100. As the processor number increases, the SLR and ECR of the four algorithms decrease in general, whereas the POF increases significantly. Among the four algorithms, CARMEB surpasses RDLS-CS, HRDS-CS, and FastCEED in terms of ECR and POF. Increasing the processor number does not always improve the SLR and ECR, as shown in the case of 12 processors in Fig. 7. The bottleneck in this scenario is not caused by the processor number but by the communication data from the parents of the candidate task in the scheduling.

Fig. 5. Effect of varying task number for CCR = 1.0.

Fig. 6. Effect of varying task number for CCR = 10.

Fig. 7. Effect of varying processor number for CCR = 5 and DAG size = 100.

6.4 Real-World Application DAG

Besides randomly generated DAGs, three real-world applications (i.e., LU decomposition [33], fast Fourier transform (FFT) [33], and molecular dynamics code [34]) are considered to obtain a comprehensive evaluation of the presented algorithm. In the following sections, the experimental outputs denote the average values of the metrics after the four algorithms are run 100 times with specific configurations. The metrics used for the comparison are SLR, ECR, and POF.

Fig. 8. LU-decomposition task graph.

TABLE 4
Parameter Configuration for the LU Task Graphs

Parameter               Possible values
CCR                     0.5, 1, 2, 3, 4, ..., 10
Number of processors    3, 6, 9, 12, 15, 25
Size of matrices        5, 6, 7, ..., 30, 31, 32

Fig. 9. Effect of varying CCR for the LU decomposition task graph.


6.4.1 LU Decomposition

The structure of LU decomposition is shown in Fig. 8. LU decomposition is widely utilized because of its capability to solve systems of linear equations. The parameter configuration of the LU graphs employed in this study is listed in Table 4. Let $S_n$ be the DAG size of LU, where $S_n = (n^2 + 3n)/2$ $(n \geq 3)$. $S_n$ increases rapidly as $n$ increases. When $n$ increases to 30, the DAG size of LU reaches 495. The processor number takes the values 3, 6, 9, 12, 15, and 25. The CCR ranges from 1 to 10 with one-step increments.
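As a quick check of the size formula $S_n = (n^2 + 3n)/2$, substituting $n = 30$ reproduces the DAG size quoted above:

```latex
S_{30} = \frac{30^{2} + 3 \cdot 30}{2} = \frac{900 + 90}{2} = 495
```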

The LU experimental results are shown in Fig. 9. CARMEB surpasses RDLS-CS, HRDS-CS, and FastCEED for all CCRs in energy and reliability management. As CCR increases, the average values of ECR and POF for the four algorithms increase. For ECR, CARMEB performs 12.33, 11.68, and 9.89 percent better than RDLS-CS, HRDS-CS, and FastCEED, respectively. For the average POF value, CARMEB performs 31.00, 29.47, and 26.54 percent better than RDLS-CS, HRDS-CS, and FastCEED, respectively. For the average SLR, CARMEB performs 5.13, 8.39, and −3.37 percent better than RDLS-CS, HRDS-CS, and FastCEED, respectively.

Fig. 10. FFT with four points.

Fig. 11. Effect of varying CCR for the FFT task graph.


6.4.2 Fast Fourier Transform

The structure of FFT is shown in Fig. 10. FFT is widely used in many fields, including science, mathematics, and engineering. Fig. 10 illustrates an FFT with four points. It involves two parts. The first part is the area above the dash-dotted line. Tasks in this part recursively call tasks that are above them. The tasks below the dash-dotted line apply the butterfly operation.

According to the characteristics of FFT, the DAG size increases quickly as the number of points increases. Let $S_n$ be the FFT DAG size, where $n$ is the point value. Thus, $S_n = 2^{n}/2$ $(n \bmod 4 = 0)$; an exit task is added to each FFT graph. The parameter configuration utilized in our experiment is listed in Table 5. The DAG size reaches 512 when the point value becomes equal to 10. Fig. 11 presents the average values of the experimental outputs, including the average values of the SLR, ECR, and POF, which are based on the combinations of the parameters in Table 5. As shown in Figs. 11b and 11c, the ECR and POF increase slightly as CCR increases. The ECR is notably large for all CCRs. In addition, CARMEB exhibits a more competitive ECR and POF than the other three algorithms. Therefore, CARMEB surpasses RDLS-CS, HRDS-CS, and FastCEED for all CCRs in terms of ECR and POF. Specifically, in terms of ECR, CARMEB performs 17.59, 10.90, and 7.84 percent better for FFT than RDLS-CS, HRDS-CS, and FastCEED, respectively. For the average POF value, CARMEB performs 29.68, 22.02, and 18.72 percent better than RDLS-CS, HRDS-CS, and FastCEED, respectively. For the average SLR value, CARMEB performs 8.51, 1.38, and −0.45 percent better than RDLS-CS, HRDS-CS, and FastCEED, respectively.

6.4.3 Molecular Dynamics Code

As shown in Fig. 12, the DAG extracted from the molecular dynamics code presented in [34] is employed to evaluate the performance of the scheduling algorithms. This molecular dynamics code graph is selected because of its irregularity. Given that its structure is known and that the task number is fixed, only the CCR is varied in the experiment, from 1 to 10 in increments of 1. Three processors are chosen in this experiment.

As observed in Figs. 13b and 13c, CARMEB outperforms the other three algorithms for all CCRs in terms of ECR and POF. Both the ECR and POF increase as CCR increases. On average, the overall performance of CARMEB is 5.56, 4.99, and 3.68 percent better than that of RDLS-CS, HRDS-CS, and FastCEED, respectively. For the average POF value, CARMEB performs 24.34, 23.61, and 20.59 percent better than RDLS-CS, HRDS-CS, and FastCEED, respectively.

7 CONCLUSION

This study presents a novel contention-aware reliability management algorithm called CARMEB for parallel tasks in heterogeneous systems. Given that the majority of previous studies do not consider the realistic existence of contention in modern communication systems, CARMEB is proposed in the current study by applying DVFS and slack reclaiming techniques.

Extensive experiments were conducted to evaluate the proposed algorithm and three other widely utilized algorithms. The comparisons were based on real-world applications (i.e., LU decomposition, FFT, and a molecular dynamics code) and many randomly generated DAGs. Experimental results show that CARMEB significantly surpasses the other algorithms in terms of system reliability.

ACKNOWLEDGMENTS

The research was partially funded by the National Natural Science Foundation of China (Grant Nos. 61702178, 61672224, 61472124, 61662090), the Key Program of National Natural Science Foundation of China (Grant No. 61432005), the National Outstanding Youth Science Program of National Natural Science Foundation of China (Grant No. 61625202), the International (Regional) Cooperation and Exchange Program of National Natural Science Foundation of China (Grant No. 61661146006), the International Science & Technology Cooperation Program of China (Grant Nos. 2015DFA11240, 2014DFB30010), the National High-tech R&D Program of China (Grant No. 2015AA015305), the Key Technology Research and Development Programs of Guangdong Province (Grant No. 2015B010108006), and the Research Foundation of Education Bureau of Hunan Province (No. 15C0400).

Fig. 12. A molecular dynamics code.

TABLE 5
Parameter Configuration for the FFT Task Graphs

Parameter               Possible values
CCR                     0.5, 1, 2, 3, ..., 10
Number of processors    3, 6, 9, 15, 25
Size                    4, 8, 12, 16, 20, 24


REFERENCES

[1] K. Li, “Energy and time constrained task scheduling on multiprocessor computers with discrete speed levels,” J. Parallel Distrib. Comput., vol. 95, pp. 15–28, 2016.

[2] S. Mittal and J. S. Vetter, “A survey of techniques for modeling and improving reliability of computing systems,” IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 4, pp. 1226–1238, Apr. 2016.

[3] Y. C. Lee and A. Y. Zomaya, “Energy conscious scheduling for distributed computing systems under different operating conditions,” IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 8, pp. 1374–1381, Aug. 2011.

[4] L. Zhang, K. Li, Y. Xu, J. Mei, F. Zhang, and K. Li, “Maximizing reliability with energy conservation for parallel task scheduling in a heterogeneous cluster,” Inf. Sci., vol. 319, pp. 113–131, 2015.

[5] Z. Zong, A. Manzanares, X. Ruan, and X. Qin, “EAD and PEBD: Two energy-aware duplication scheduling algorithms for parallel tasks on homogeneous clusters,” IEEE Trans. Comput., vol. 60, no. 3, pp. 360–374, Mar. 2011.

[6] J. Singh, S. Betha, B. Mangipudi, and N. Auluck, “Contention aware energy efficient scheduling on heterogeneous multiprocessors,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 5, pp. 1251–1264, May 2015.

[7] X. Tang, K. Li, M. Qiu, and E. H.-M. Sha, “A hierarchical reliability-driven scheduling algorithm in grid systems,” J. Parallel Distrib. Comput., vol. 72, no. 4, pp. 525–535, 2012.

[8] A. Dogan and F. Özgüner, “Matching and scheduling algorithms for minimizing execution time and failure probability of applications in heterogeneous computing,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 3, pp. 308–323, Mar. 2002.

[9] L. Zhang, K. Li, K. Li, and Y. Xu, “Joint optimization of energy efficiency and system reliability for precedence constrained tasks in heterogeneous systems,” Int. J. Elect. Power Energy Syst., vol. 78, pp. 499–512, 2016.

[10] L. Zhang, K. Li, C. Li, and K. Li, “Bi-objective workflow scheduling of the energy consumption and reliability in heterogeneous computing systems,” Inf. Sci., vol. 379, pp. 241–256, 2017.

[11] X. Huang, L. Zhang, R. Li, L. Wan, and K. Li, “Novel heuristic speculative execution strategies in heterogeneous distributed environments,” Comput. Elect. Eng., vol. 50, pp. 166–179, 2016.

[12] K. Li, “Energy-efficient task scheduling on multiple heterogeneous computers: Algorithms, analysis, and performance evaluation,” IEEE Trans. Sustainable Comput., vol. 1, no. 1, pp. 7–19, Jan.–Jun. 2016.

[13] A. Benoit, M. Hakem, and Y. Robert, “Contention awareness and fault-tolerant scheduling for precedence constrained tasks in heterogeneous systems,” Parallel Comput., vol. 35, no. 2, pp. 83–108, 2009.

[14] D. Bozdağ, F. Özgüner, and U. V. Catalyurek, “Compaction of schedules and a two-stage approach for duplication-based dag scheduling,” IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 6, pp. 857–871, Jun. 2009.

Fig. 13. Effect of varying CCR for the molecular dynamics code task graph.


[15] O. Sinnen and L. Sousa, “Communication contention in task scheduling,” IEEE Trans. Parallel Distrib. Syst., vol. 16, no. 6, pp. 503–515, Jun. 2005.

[16] H. R. Topcuoglu, S. Hariri, and M. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 3, pp. 260–274, Mar. 2002.

[17] A. Munir, S. Ranka, and A. Gordon-Ross, “High-performance energy-efficient multicore embedded computing,” IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 4, pp. 684–700, Apr. 2012.

[18] J. Han, X. Wu, D. Zhu, H. Jin, L. T. Yang, and J. Gaudiot, “Synchronization-aware energy management for VFI-based multicore real-time systems,” IEEE Trans. Comput., vol. 61, no. 12, pp. 1682–1696, Dec. 2012.

[19] X. Zhu, R. Ge, J. Sun, and C. He, “3E: Energy-efficient elastic scheduling for independent tasks in heterogeneous computing systems,” J. Syst. Softw., vol. 86, no. 2, pp. 302–314, 2013.

[20] D. Kliazovich, J. E. Pecero, A. Tchernykh, P. Bouvry, S. U. Khan, and A. Y. Zomaya, “CA-DAG: Modeling communication-aware applications for scheduling in cloud computing,” J. Grid Comput., vol. 14, no. 1, pp. 23–39, 2015.

[21] J. Han, M. Lin, D. Zhu, and L. T. Yang, “Contention-aware energy management scheme for NoC-based multicore real-time systems,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 3, pp. 691–701, Mar. 2015.

[22] C. Chen, “Task scheduling for maximizing performance and reliability considering fault recovery in heterogeneous distributed systems,” IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 2, pp. 521–532, Feb. 2016.

[23] Z. Qiu and J. F. Pérez, “Evaluating replication for parallel jobs: An efficient approach,” IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 8, pp. 2288–2302, Aug. 2016.

[24] Y. C. Lee and A. Y. Zomaya, “Minimizing energy consumption for precedence-constrained applications using dynamic voltage scaling,” in Proc. 9th IEEE/ACM Int. Symp. Cluster Comput. Grid, May 2009, pp. 92–99.

[25] Z. Zong, A. Manzanares, X. Ruan, and X. Qin, “EAD and PEBD: Two energy-aware duplication scheduling algorithms for parallel tasks on homogeneous clusters,” IEEE Trans. Comput., vol. 60, no. 3, pp. 360–374, Mar. 2011.

[26] Y. Ma, B. Gong, R. Sugihara, and R. Gupta, “Energy-efficient deadline scheduling for heterogeneous systems,” J. Parallel Distrib. Comput., vol. 72, no. 12, pp. 1725–1740, 2012.

[27] O. Sinnen, A. To, and M. Kaur, “Contention-aware scheduling with task duplication,” J. Parallel Distrib. Comput., vol. 71, no. 1, pp. 77–86, 2011.

[28] Y. Xu, K. Li, L. He, L. Zhang, and K. Li, “A hybrid chemical reaction optimization scheme for task scheduling on heterogeneous computing systems,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 12, pp. 3208–3222, Dec. 2015.

[29] B. Zhao, H. Aydin, and D. Zhu, “Shared recovery for energy efficiency and reliability enhancements in real-time applications with precedence constraints,” ACM Trans. Des. Autom. Electron. Syst., vol. 18, no. 2, Mar. 2013, Art. no. 23.

[30] K. Li, “Scheduling precedence constrained tasks with reduced processor energy on multiprocessor computers,” IEEE Trans. Comput., vol. 61, no. 12, pp. 1668–1681, Dec. 2012.

[31] R. K. Iyer, D. J. Rossetti, and M. C. Hsueh, “Measurement and modeling of computer reliability as affected by system activity,” ACM Trans. Comput. Syst., vol. 4, no. 3, pp. 214–237, Aug. 1986.

[32] Y. Zhang and K. Chakrabarty, “Energy-aware adaptive checkpointing in embedded real-time systems,” in Proc. Des., Autom. Test Eur. Conf. Exhib., 2003, pp. 918–923.

[33] M. Wu and D. Gajski, “Hypertool: A programming aid for message-passing systems,” IEEE Trans. Parallel Distrib. Syst., vol. 1, no. 3, pp. 330–343, Jul. 1990.

[34] S. J. Kim and J. C. Browne, “A general approach to mapping of parallel computation upon multiprocessor architectures,” in Proc. Int. Conf. Parallel Process., 1988, pp. 1–8.

Longxin Zhang received the PhD degree in computer science from Hunan University, China, in 2015. He is currently an assistant professor of computer science with the Hunan University of Technology. His major research includes real-time systems, power aware computing and fault-tolerant systems, modeling and scheduling for distributed computing systems, distributed system reliability, parallel algorithms, cloud computing, and big data computing.

Kenli Li received the PhD degree in computer science from the Huazhong University of Science and Technology, Wuhan, China, in 2003. He was a visiting scholar with the University of Illinois at Urbana-Champaign, Champaign, Illinois, from 2004 to 2005. He is currently a full professor of computer science and technology, Hunan University, Changsha, China, and also the deputy director of the National Supercomputing Center, Changsha. He has authored more than 150 papers in international conferences and journals, such as the IEEE Transactions on Computers, the IEEE Transactions on Parallel and Distributed Systems, and the IEEE Transactions on Signal Processing. His current research interests include parallel computing, cloud computing, and big data computing. He is an outstanding member of CCF. He currently serves on the editorial boards of the IEEE Transactions on Computers and the International Journal of Pattern Recognition and Artificial Intelligence. He is a senior member of the IEEE.

Weihua Zheng received the PhD degree in computer science from Hunan University, China, in 2015. He is an associate professor of computer science and technology at the Hunan University of Technology. His research interests include fast Fourier transform, multiscale signal analysis, audio signal processing, image processing, parallel computing, natural language understanding, machine learning, knowledge representation, and artificial intelligence.

Keqin Li is a SUNY Distinguished Professor of computer science. His current research interests include parallel computing and high-performance computing, distributed computing, energy-efficient computing and communication, heterogeneous computing systems, cloud computing, big data computing, CPU-GPU hybrid and cooperative computing, multicore computing, storage and file systems, wireless communication networks, sensor networks, peer-to-peer file sharing systems, mobile computing, service computing, Internet of things, and cyber-physical systems. He has published more than 470 journal articles, book chapters, and refereed conference papers, and has received several best paper awards. He is currently serving or has served on the editorial boards of the IEEE Transactions on Parallel and Distributed Systems, the IEEE Transactions on Computers, the IEEE Transactions on Cloud Computing, the IEEE Transactions on Services Computing, and the IEEE Transactions on Sustainable Computing. He is a fellow of the IEEE.

