![Page 1: 1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University](https://reader033.vdocuments.us/reader033/viewer/2022051620/56649edb5503460f94bea986/html5/thumbnails/1.jpg)
Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids
Cong Liu and Xiao Qin
Auburn University
Outline
• Introduction and Motivation
• System Model
• Algorithm
• Performance Analysis
• Summary
Introduction
Distributed scientific applications in many cases require access to massive data sets.
In High Energy Physics (HEP) applications, for example, a handful of experiments have started producing petabytes of data per year and are expected to do so for decades.
Data grids have served as a technology bridge between the need to access extremely large data sets and the goal of achieving high data transfer rates by providing geographically distributed computing resources and large-scale storage systems.
Introduction
The Google Data Cluster
• 31,654 machines
• 63,184 CPUs
• 126,368 GHz of processing power
• Two identical buildings containing about 100,000 square feet of data center floor space
Introduction
Reliability: Computing at high temperatures is more error-prone than in an appropriate environment.
Operational Cost: For a single 200-Watt server, such as the IBM 1U*300, the energy bill would be $180/year.
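The $180/year figure can be sanity-checked with a little arithmetic. The sketch below assumes the server runs continuously all year (an assumption, not stated on the slide) and derives the electricity rate the slide's numbers imply.

```python
# Figures from the slide: a 200-Watt server and a $180/year energy bill.
# Assumption: the server runs 24 hours a day, 365 days a year.
POWER_WATTS = 200
ANNUAL_BILL_USD = 180.0

HOURS_PER_YEAR = 24 * 365
annual_kwh = POWER_WATTS / 1000 * HOURS_PER_YEAR  # energy drawn per year, in kWh
implied_rate = ANNUAL_BILL_USD / annual_kwh       # USD per kWh implied by the bill

print(f"{annual_kwh:.0f} kWh/year, implied rate ${implied_rate:.3f}/kWh")
```

Under that assumption the bill corresponds to roughly ten cents per kWh, so the cost scales quickly with the number of machines in a cluster.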
Introduction
A key factor in scheduling data-intensive tasks is the location of the input data sets required by the tasks.
A straightforward strategy to enhance performance of data-intensive applications on data grids is to replicate popular data sets to multiple resource sites.
Replication offers higher data access speeds than maintaining the data sets at a single site.
Drawbacks of Making Too Many Replicas
It is challenging to maintain consistency among replicas in Data Grids.
It is nontrivial to efficiently generate replicas of massive data sets on the fly in Data Grids.
A large number of data replicas can increase energy dissipation in storage resources.
Reduce Energy Consumption in Data Grids
• Minimize electricity cost
• Improve system reliability
• How to reduce energy consumption in Data Grids? Energy-efficient scheduling algorithms for applications running on data grids.
Goals of Scheduling
Make tradeoffs between energy efficiency and high performance for data-intensive applications.
Integrate data placement strategies with task scheduling
Consider real-time requirements
How to achieve the goals?
A Distributed Energy-Efficient Scheduler called DEES.
Three key components: energy-aware ranking, performance-aware scheduling, and energy-aware dispatching.
Design Goals of DEES
Maximize the number of tasks completed before their corresponding deadlines
Replicate data and place replicas in an energy-efficient way
Dispatch real-time tasks to peer computing sites, considering three factors: the computational capacities of peer computing sites, the energy consumption introduced by tasks, and data location.
Features of DEES
High scalability
Requires no full knowledge of the workload conditions of all the computing sites in a data grid.
Obtaining full knowledge of the state of the grid is a difficult task.
Key Ideas
High-priority tasks are scheduled first in order to meet their deadlines.
Exploit slack: low-priority tasks can still have their deadlines guaranteed.
The dynamic voltage scaling (DVS) technique is used to reduce energy consumption by exploiting available slack and adjusting voltage levels accordingly.
Dynamic Voltage Scaling
An effective technique for reducing energy consumption by adjusting the clock speed and supply voltage dynamically.
Energy dissipation per CPU cycle is proportional to $v^2$, where $v$ is the supply voltage.
Processor energy can be saved by reducing the CPU voltage while running the processor at a slower speed.
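To make the quadratic relationship concrete, here is a minimal Python sketch that computes energy per cycle relative to a nominal voltage. The function name and the voltage values are illustrative assumptions; only the proportionality to the square of the voltage comes from the slide.

```python
def relative_energy_per_cycle(voltage: float, nominal_voltage: float) -> float:
    """Energy per CPU cycle relative to nominal, using E_cycle proportional to v^2."""
    if voltage <= 0 or nominal_voltage <= 0:
        raise ValueError("voltages must be positive")
    return (voltage / nominal_voltage) ** 2


# Illustrative voltages only; real processors expose their own discrete levels.
for v in (1.0, 0.8, 0.6):
    print(f"v = {v:.1f}: {relative_energy_per_cycle(v, nominal_voltage=1.0):.2f}x energy per cycle")
```

Halving the voltage cuts energy per cycle to a quarter, which is why running slower at a lower voltage can save energy even though the task takes longer.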
Design Ideas
Two types of tasks: hard real-time tasks and soft real-time tasks.
Prioritize hard real-time tasks but create slack by delaying their execution until the latest possible moment.
After a schedule is made, the processor voltage is adjusted to the lowest possible level on a task-by-task basis at each scheduling point.
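The voltage adjustment step might look like the sketch below. It assumes a hypothetical table of discrete voltage/frequency levels, an execution time of cycles divided by frequency, and energy proportional to the square of the voltage times the cycle count. None of these names or numbers come from the slides.

```python
from typing import NamedTuple, Optional, Sequence


class Level(NamedTuple):
    voltage: float    # supply voltage in volts (hypothetical values)
    frequency: float  # clock frequency in cycles per second


def lowest_feasible_level(cycles: float, start: float, deadline: float,
                          levels: Sequence[Level]) -> Optional[Level]:
    """Return the lowest-voltage level that still finishes the task by its deadline."""
    available = deadline - start
    feasible = [lv for lv in levels if cycles / lv.frequency <= available]
    return min(feasible, key=lambda lv: lv.voltage) if feasible else None


def task_energy_at(cycles: float, level: Level, k: float = 1.0) -> float:
    """Energy under the assumed model E = k * v^2 * cycles."""
    return k * level.voltage ** 2 * cycles


levels = [Level(1.0, 1.0e9), Level(0.8, 0.75e9), Level(0.6, 0.5e9)]
chosen = lowest_feasible_level(cycles=6e8, start=0.0, deadline=1.0, levels=levels)
print(chosen)
```

In this example the 0.6 V level would miss the deadline, so the 0.8 V level is picked. The slack between the fastest possible finish and the deadline is what makes the lower voltage usable.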
System Model
Geographically distributed sites are interconnected through a WAN.
Each site consists of storage resources, computing resources, and a ticket server.
Energy Consumption Model
Consider energy consumption of executing tasks, making data replicas, and communicating.
The total energy consumption of a data grid, $E_{total}$, can be expressed as:
$$E_{total} = E_{comp} + E_{comm} + E_{rep}$$
where $E_{comp}$ is the total energy consumption of computing resources, $E_{comm}$ is the total energy consumption of communication, and $E_{rep}$ is the total energy consumption of replicating data.
Four Cases of Energy Consumption
Case 1: Local execution and local data
Case 2: Local execution and remote data
Case 3: Remote execution and same remote data
Case 4: Remote execution and different remote data
$$E_{i,k,v} = \begin{cases}
E^{c}_{i,k,v}, & \text{if } S = 1 \\
E^{c}_{i,k,v} + E^{td}_{i,j,o,v} + E^{wr}_{i,j,o,v}, & \text{if } S = 2 \\
E^{c}_{i,k,v} + E^{tt}_{i,u,v}, & \text{if } S = 3 \\
E^{c}_{i,k,v} + E^{td}_{i,j,o,v} + E^{tt}_{i,u,v} + E^{wr}_{i,j,o,v}, & \text{if } S = 4
\end{cases}$$
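A minimal Python sketch of the four cases follows. It assumes each task's energy is its computation energy plus whichever of data-transfer, task-transfer, and replica-write energy the case incurs. The function and parameter names are hypothetical, and the per-case breakdown is one reading of the slide's formula.

```python
def task_energy(case: int, e_c: float, e_td: float = 0.0,
                e_tt: float = 0.0, e_wr: float = 0.0) -> float:
    """Energy for one task under the four placement cases.

    e_c: computation energy, e_td: data transfer energy,
    e_tt: task transfer energy, e_wr: replica write energy.
    """
    if case == 1:   # local execution, local data
        return e_c
    if case == 2:   # local execution, remote data: fetch the data and write a replica
        return e_c + e_td + e_wr
    if case == 3:   # remote execution at the site that already holds the data
        return e_c + e_tt
    if case == 4:   # remote execution, data at a different remote site
        return e_c + e_td + e_tt + e_wr
    raise ValueError("case must be 1, 2, 3, or 4")


# Made-up energy values, chosen so that moving data costs more than moving the task.
costs = dict(e_c=10.0, e_td=8.0, e_tt=1.0, e_wr=2.0)
for case in (1, 2, 3, 4):
    print(case, task_energy(case, **costs))
```

With these illustrative numbers, case 3 is cheaper than case 2 because only the small task is moved rather than its input data.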
If data is not locally available, then?
Executing a task at a site where its data is located:
Energy efficient
No data transfer and no replication cost
Compared to the local execution and remote data scenario, executing the task at a remote site where the data is located is still more energy efficient if the task’s input data set is larger than its execution code size.
Algorithm Components
DEES is composed of:
Ranking
Scheduling
Dispatching
Goals:
Maximize the number of tasks meeting deadlines
Minimize energy consumption
Improve scalability
Task Grouping
Task Grouping:
Tasks requiring the same data are grouped together.
The task group whose data resides in the local site, called the local task group, is ranked first.
Other task groups are ranked in descending order, according to the number of tasks in the task group.
Considering Real-Time Requirements:
Within each group, tasks are ordered by increasing deadline. Thus, tasks with shorter deadlines are scheduled sooner.
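The grouping and ordering rules can be sketched in a few lines of Python. The `Task` record and the `local_datasets` set are hypothetical names introduced for illustration; ties between equally sized groups keep their first-seen order, a detail the slides do not specify.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    dataset: str     # identifier of the data set the task reads
    deadline: float


def rank_task_groups(tasks: Iterable[Task],
                     local_datasets: Set[str]) -> List[Tuple[str, List[Task]]]:
    """Group tasks by data set, local group(s) first, then larger groups first."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task.dataset].append(task)
    for members in groups.values():
        members.sort(key=lambda t: t.deadline)  # earliest deadline first within a group
    # False sorts before True, so groups with local data come first.
    return sorted(groups.items(),
                  key=lambda kv: (kv[0] not in local_datasets, -len(kv[1])))


tasks = [Task("a", "d1", 30), Task("b", "d2", 10), Task("c", "d2", 5),
         Task("d", "d3", 7), Task("e", "d2", 20), Task("f", "d3", 1)]
for dataset, members in rank_task_groups(tasks, local_datasets={"d1"}):
    print(dataset, [t.name for t in members])
```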
DEES Scheduling
DEES schedules tasks on a group basis.
A local task group is scheduled first.
In order to schedule task $t_i$ on site $s_u$, DEES selects the machine $m_k$ at $s_u$ that can complete $t_i$ within its deadline and provides the minimum completion time.
After processing all tasks, remaining unscheduled tasks will be dispatched to remote sites.
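The machine selection rule can be sketched as follows. The `Machine` fields and the completion-time model (ready time plus execution time divided by relative speed) are assumptions made for illustration, not details given on the slide.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Machine:
    name: str
    ready_time: float  # time at which the machine becomes free (assumed model)
    speed: float       # relative speed; 1.0 runs a task in its nominal time


def select_machine(exec_time: float, deadline: float,
                   machines: Sequence[Machine]) -> Optional[Machine]:
    """Pick the machine with the earliest completion time that meets the deadline."""
    best, best_finish = None, None
    for m in machines:
        finish = m.ready_time + exec_time / m.speed
        if finish <= deadline and (best_finish is None or finish < best_finish):
            best, best_finish = m, finish
    return best  # None means the task stays unscheduled at this site


site = [Machine("m1", ready_time=10.0, speed=1.0),
        Machine("m2", ready_time=0.0, speed=0.5)]
picked = select_machine(exec_time=8.0, deadline=20.0, machines=site)
print(picked.name if picked else "unscheduled")
```

A task for which `select_machine` returns `None` is one of the remaining unscheduled tasks that would be handed to the dispatching step.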
Dispatching
Dispatching: delivers tasks within each task group to data sites.
For task group $g_j$ whose data site is $s_o$, scheduling decisions are made by $s_o$’s scheduler based on its local resource status and the task information of $g_j$.
If $s_o$ cannot schedule all tasks in $g_j$, then the unscheduled tasks are dispatched to $s_o$’s immediate neighbors using tickets in a breadth-first manner.
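A rough sketch of breadth-first dispatching is given below. The ticket mechanism is not described in enough detail on the slides to reproduce, so this sketch does not model it: a hypothetical `max_hops` limit bounds the search instead, `try_schedule` is a hypothetical callback standing in for each site's local scheduler, and neighbors are visited in list order rather than by rank.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple


def dispatch_breadth_first(
    data_site: str,
    tasks: Sequence[str],
    neighbors: Dict[str, List[str]],
    try_schedule: Callable[[str, List[str]], List[str]],
    max_hops: int = 2,
) -> Tuple[Dict[str, str], List[str]]:
    """Offer tasks to the data site first, then to its neighbors level by level.

    try_schedule(site, tasks) returns the subset of tasks that site accepts.
    Returns (task -> site placements, tasks nobody could schedule).
    """
    placements: Dict[str, str] = {}
    remaining = list(tasks)
    visited = {data_site}
    queue = deque([(data_site, 0)])
    while queue and remaining:
        site, hops = queue.popleft()
        accepted = try_schedule(site, remaining)
        for task in accepted:
            placements[task] = site
        remaining = [t for t in remaining if t not in accepted]
        if hops < max_hops:
            for nb in neighbors.get(site, []):
                if nb not in visited:
                    visited.add(nb)
                    queue.append((nb, hops + 1))
    return placements, remaining


# Toy local scheduler: each site accepts tasks up to a fixed capacity.
capacity = {"so": 1, "s1": 2, "s2": 1}

def accept_up_to_capacity(site: str, offered: List[str]) -> List[str]:
    accepted = offered[:capacity.get(site, 0)]
    capacity[site] = capacity.get(site, 0) - len(accepted)
    return accepted

grid = {"so": ["s1", "s2"], "s1": ["so"], "s2": ["so"]}
placed, left = dispatch_breadth_first("so", ["t1", "t2", "t3", "t4", "t5"],
                                      grid, accept_up_to_capacity)
print(placed, left)
```

Because each site only talks to its neighbors, no site needs a global view of the grid, which matches the scalability goal stated earlier.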
Energy-Aware Ranking
To make tradeoffs between energy efficiency and real-time performance, we propose a ranking system to rank $s_o$’s neighbors:
$$rank(g_i, s_v, s_o) = \varepsilon \cdot n + \mu \cdot \frac{1}{\left(E^{rep}_{i,o,v} + E^{comm}_{i,o,v} + E^{comp}_{i,n,v}\right)/n}$$
where $n$ is the number of tasks in $g_i$ that can be scheduled on $s_v$, $\varepsilon$ is a coefficient concerning the task deadline, and $\mu$ is a coefficient concerning energy saving.
$E^{rep}_{i,o,v}$: energy consumed to replicate $g_i$’s data from $s_o$ to $s_v$,
$E^{comm}_{i,o,v}$: energy consumed to transfer $g_i$’s data from $s_o$ to $s_v$,
$E^{comp}_{i,n,v}$: energy consumed to execute these $n$ tasks at $s_v$.
Dispatching: Energy Efficiency vs. real-time
ε and μ: To manage the two conflicting goals of saving energy and meeting deadlines.
For mission-critical tasks: ε is set to 1 and μ is set to 0, which means the neighbor that can schedule more tasks is given preference.
For energy efficiency: ε is set to 0 and μ is set to 1. Thus, the neighbor that consumes the least amount of energy will be considered first.
$$rank(g_i, s_v, s_o) = \varepsilon \cdot n + \mu \cdot \frac{1}{\left(E^{rep}_{i,o,v} + E^{comm}_{i,o,v} + E^{comp}_{i,n,v}\right)/n}$$
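The effect of the two coefficients can be tried out with a short Python sketch. It assumes the rank is ε times the number of schedulable tasks plus μ times the reciprocal of the average energy per task (replication, transfer, and execution energy divided by n). The neighbor names and energy values are made up for illustration.

```python
def rank(n: int, e_rep: float, e_comm: float, e_comp: float,
         epsilon: float, mu: float) -> float:
    """Rank of a neighbor site under the assumed reading of the formula."""
    if n <= 0:
        return 0.0  # a neighbor that can schedule nothing gets the lowest rank
    energy_per_task = (e_rep + e_comm + e_comp) / n
    if energy_per_task <= 0:
        raise ValueError("total energy must be positive")
    return epsilon * n + mu / energy_per_task


# Hypothetical neighbors: (tasks schedulable, E_rep, E_comm, E_comp).
candidates = {"s1": (4, 10.0, 20.0, 10.0), "s2": (2, 1.0, 2.0, 5.0)}

for eps, mu in ((1.0, 0.0), (0.0, 1.0)):
    best = max(candidates, key=lambda s: rank(*candidates[s], epsilon=eps, mu=mu))
    print(f"epsilon={eps}, mu={mu}: prefer {best}")
```

With ε = 1 and μ = 0 the neighbor that can take more tasks wins. With ε = 0 and μ = 1 the neighbor with the lower energy per task wins.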
Simulation Parameters
| Parameter | Value (fixed) | Value (varied) |
| --- | --- | --- |
| Number of jobs | 9600 | 1600, 3200, 6400, 9600, 12800, 16000, 19200, 22400 |
| Number of sites | 32 | - |
| Site processing speed | 8*8 nodes | - |
| Number of datasets | 200 | 100, 200, 400 |
| Task execution time range (uniform distribution) | (1, 500) seconds | - |
| Size of datasets | 500-800 MB short jobs, 800 MB-1 GB medium jobs, 1-2 GB long jobs | 500 MB-2 GB |
| Dataset popularity distribution | Uniform | Uniform, Normal, Geometric |
| Dataset popularity threshold | 2 | 2, 4, 6, 8, 10 |
Performance Analysis
DEES is compared with an effective scheduling algorithm, Close-to-Files.
Features of the Close-to-Files algorithm:
Good performance, since Close-to-Files takes data locality into account. It schedules a task to its data site to decrease the amount of data transfer.
Scheduling overhead is high: it is an exhaustive algorithm that searches across all combinations of computing and data sites to find a result with the minimum computation and data transmission cost.
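The exhaustive search described above can be illustrated with a small Python sketch. This is not the published Close-to-Files implementation; the cost callbacks, site names, and cost values are hypothetical and only show why the search space grows with the product of the number of computing sites and data sites.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def exhaustive_site_search(
    compute_sites: Sequence[str],
    data_sites: Sequence[str],
    compute_cost: Callable[[str], float],
    transfer_cost: Callable[[str, str], float],
) -> Tuple[str, str]:
    """Try every (computing site, data site) pair and keep the cheapest one."""
    return min(
        product(compute_sites, data_sites),
        key=lambda pair: compute_cost(pair[0]) + transfer_cost(pair[1], pair[0]),
    )


# Made-up costs for two computing sites and two data sites holding the replica.
compute = {"A": 5.0, "B": 3.0}
transfer = {("X", "A"): 1.0, ("X", "B"): 6.0, ("Y", "A"): 4.0, ("Y", "B"): 2.0}

best = exhaustive_site_search(
    ["A", "B"], ["X", "Y"],
    compute_cost=lambda c: compute[c],
    transfer_cost=lambda d, c: transfer[(d, c)],
)
print(best)
```

Every task triggers a scan over all pairs, which is the source of the high scheduling overhead noted above.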
Performance Metrics
The Guarantee Ratio, Normalized Average Energy Consumption, and Total Energy Consumption are used as the performance metrics in the evaluation.
$$\text{Guarantee Ratio} = \frac{N_s}{N_{total}}$$
$$\text{Normalized Average Energy Consumption} = \frac{E_{total}}{N_s}$$
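The two ratio metrics can be computed as in the sketch below. It assumes the Guarantee Ratio is the number of tasks that meet their deadlines divided by the total number of tasks, and that Normalized Average Energy Consumption is the total energy divided by the number of tasks that meet their deadlines. The input numbers are made up and are not results from the evaluation.

```python
def guarantee_ratio(n_success: int, n_total: int) -> float:
    """Fraction of submitted tasks that met their deadlines (assumed definition)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_success / n_total


def normalized_average_energy(e_total: float, n_success: int) -> float:
    """Total energy divided by the number of tasks that met their deadlines."""
    if n_success <= 0:
        raise ValueError("n_success must be positive")
    return e_total / n_success


# Illustrative inputs only.
print(guarantee_ratio(n_success=7200, n_total=9600))
print(normalized_average_energy(e_total=3600.0, n_success=7200))
```

Dividing energy by the number of successful tasks means a scheduler is not rewarded for saving energy simply by dropping tasks.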
Real-Time Performance
Fig. 5. Guarantee Ratio by ranking coefficients
Energy Consumption
Fig. 6. Normalized Average Energy Consumption by ranking coefficients
Performance
Fig. 7. Guarantee Ratio by task loads
Energy Consumption
Fig. 8. Normalized Average Energy Consumption by task loads
Summary
An energy efficient algorithm to schedule real-time tasks with data access requirements on data grids.
By reducing the amount of data replication and task transfers, the proposed algorithm effectively saves energy.
Distributed since it does not need knowledge of the complete state of the grid.
Detailed simulations demonstrate that DEES significantly reduces the energy consumption while increasing the Guarantee Ratio.
Questions
Xiao Qin
http://www.eng.auburn.edu/~xqin