Modeling CPU Energy Consumption for Energy Efficient Scheduling
Abhishek Jaiantilal, Yifei Jiang, Shivakant Mishra
University of Colorado - Boulder
GCM '10 Proceedings of the 1st Workshop on Green Computing © 2010 ACM
Outline
• Introduction
• Energy Model Overview
• Power Consumed and CPU Cycles
• Experimental Results
• Conclusions
Introduction (2/2)
• Dynamic Voltage and Frequency Scaling (DVFS) is used
in CPUs; its operating points are referred to as P-states.
• Per Core Power Gating (PCPG), or Dynamic Core Gating
(DCG), is a hardware feature that allows the cores in a
multi-core CPU to shut themselves off. The resulting core
states are called C-states.
– C0 - Active state
– C1 - Inactive state; the core does not run during idle cycles
– C3 - Inactive state with the cache saved
– C6 - All the PLLs turned off
Energy Model Overview (1/3)
• Black Box approach
– PCPG is hardware controlled, so we use a black
box approach.
• We obtained statistics from the /proc/stat file
– A scheduling policy that limits these loops to a few
cores might not be the best choice compared with
running them on all the cores.
• Still a low power profile.
• Shorter execution time.
• So we need to know the power consumption of a task.
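The /proc/stat statistics mentioned above can be turned into a per-core load figure. The sketch below is only an illustration, not the authors' tooling: the function names `parse_proc_stat`, `busy_fraction`, and `sample` are hypothetical, and it assumes the standard Linux field order on each `cpu*` line (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time
from typing import Dict, List


def parse_proc_stat(text: str) -> Dict[str, List[int]]:
    """Return {cpu_name: [tick counters]} for every 'cpu*' line."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            stats[fields[0]] = [int(v) for v in fields[1:]]
    return stats


def busy_fraction(before: List[int], after: List[int]) -> float:
    """Fraction of elapsed ticks that were not idle or iowait."""
    # Only the first 8 counters are summed: guest time is already
    # included in user time, so adding it would double count.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return (total - idle) / total


def sample(interval: float = 1.0, path: str = "/proc/stat") -> Dict[str, float]:
    """Read the file twice, `interval` seconds apart, and report load per CPU."""
    with open(path) as fh:
        first = parse_proc_stat(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        second = parse_proc_stat(fh.read())
    return {name: busy_fraction(first[name], second[name])
            for name in first if name in second}
```

On a Linux machine, `sample()` returns a dictionary such as `{"cpu": ..., "cpu0": ..., ...}` with one load fraction per core. As the slides note, load alone does not determine the power drawn.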
Energy Model Overview (2/3)
• Even though the processes are running at 100% load, the
power consumed differs from task to task.
• This is because some of these tasks are float-cycle intensive
and others are integer- or memory-cycle intensive.
Energy Model Overview (3/3)
• Modified Black Box approach
– If we know how much power a task is consuming, then
we can fit a schedule that allows for a shorter
execution time and a lower energy consumption.
– We need “training data” to choose the best task
schedule, depending on the tradeoff between power
consumption and execution time.
• Disadvantages
– Need training data from all the possible tasks first
– Computers should have the same configuration
Power Consumed and CPU Cycles (1/7)
• System power consumption
$P(\text{System}) \propto f\big(P(\text{CPU}) + P(\text{Memory}) + P(\text{Fans}) + P(\text{HDD}) + P(\text{Northbridge}) + P(\text{Southbridge}) + P(\text{Graphics}) + P(\text{Other components})\big)$
– f() = Efficiency of the Power supply
Power Consumed and CPU Cycles (2/7)
• Simplified system power consumption
$P(\text{System}) \propto P(\text{CPU}) + P(\text{Memory}) + P(\text{Bias})$
– Bias = Power of Fans, Motherboard, North-bridge,
South-bridge, Graphics, HDD, and Other Components.
Power Consumed and CPU Cycles (3/7)
• We propose that if we know the CPU cycle profile of a
task, we can build a simple linear model relating the
CPU load to the energy consumed.
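$P(\text{System}) \propto \text{Cycles}(\text{FPU}) + \text{Cycles}(\text{INT}) + \text{Cycles}(\text{Memory}) + P(\text{Bias})$
$P(\text{Task}_i) \propto \text{Cycles}(\text{FPU}) + \text{Cycles}(\text{IU}) + \text{Cycles}(\text{Cache})$
$P(\text{System}) \propto \sum_{i=1}^{N} \text{Power}(\text{Task}_i) + \text{Bias}$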
Power Consumed and CPU Cycles (4/7)
• We need to know the counts and the types of CPU
cycles executed by a task.
– Dtrace for Solaris
– Oprofile
– Intel Vtune for Linux
• We used Vtune offline: we sampled the application and
stored the cycle time over a period of 30 minutes to
1 hour.
Power Consumed and CPU Cycles (5/7)
• Linear Regression Model
$\text{Power}(\text{Task}_i) = F \cdot (\text{number of FP cycles}) + I \cdot (\text{number of Int cycles}) + M \cdot (\text{number of Memory cycles})$
• F, I, and M are multipliers for the watt cost of running
a single FP, INT, or Memory cycle, but there is no
direct way to find them.
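As a rough illustration of the linear model above, the sketch below evaluates the per-task power and the system power (sum over tasks plus Bias). The function names and every number are hypothetical placeholders in arbitrary units, not measured values from the experiments.

```python
def task_power(fp_cycles, int_cycles, mem_cycles, F, I, M):
    """Power(Task_i) = F * FP cycles + I * Int cycles + M * Memory cycles."""
    return F * fp_cycles + I * int_cycles + M * mem_cycles


def system_power(tasks, F, I, M, bias):
    """P(System) = sum of Power(Task_i) over all tasks, plus Bias."""
    return sum(task_power(fp, it, mem, F, I, M) for fp, it, mem in tasks) + bias


# Hypothetical multipliers and cycle counts (illustrative only).
F, I, M, BIAS = 2.0, 1.0, 0.5, 10.0
tasks = [
    (3, 1, 2),  # a float-heavy task: (FP, Int, Memory) cycles
    (0, 4, 4),  # an integer/memory-heavy task
]
total = system_power(tasks, F, I, M, BIAS)
```

Two tasks with the same total cycle count can therefore yield different predicted power once F, I, and M differ.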
Power Consumed and CPU Cycles (6/7)
• We use the statistical approach of minimizing the squared
error to find these unknown variables.
$\min_{F,I,M} \left( \text{Measured wattage}(Y) - \text{Predicted wattage}(\hat{Y}) \right)^2$
$\hat{Y} = F \cdot (\text{Number of FP cycles}) + I \cdot (\text{Number of Int cycles}) + M \cdot (\text{Number of Memory cycles}) + \text{Bias} = X\beta$
$F, I, M > 0, \quad \beta = [F \;\; I \;\; M]^T$
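• Once we know X and Y, then F, I, and M (stored in the β
vector) can be obtained as:
$\beta = (X^T X + \lambda I)^{-1} X^T Y$
– Here λ is a regularization parameter and the I in λI is
the identity matrix, not the integer-cycle multiplier.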
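The closed-form solution above can be sketched in plain Python by solving the linear system $(X^T X + \lambda I)\beta = X^T Y$ directly rather than forming the inverse. Several things here are assumptions for illustration: the name `ridge_fit`, the choice to append a constant column of ones to X so that Bias is estimated together with F, I, and M, and the synthetic data. The sketch does not enforce the constraint F, I, M > 0, and it regularizes the bias column like any other.

```python
def ridge_fit(X, y, lam=0.0):
    """Solve (X^T X + lam * I) beta = X^T y by Gaussian elimination."""
    n = len(X[0])
    # Normal-equation matrix A = X^T X + lam * I and right-hand side b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger lam")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (b[i] - s) / A[i][i]
    return beta


# Synthetic example: each row is (FP cycles, Int cycles, Memory cycles, 1).
X = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 1], [2, 1, 3, 1]]
true_beta = [2.0, 1.0, 0.5, 10.0]  # hypothetical F, I, M, Bias
y = [sum(x * w for x, w in zip(row, true_beta)) for row in X]
beta = ridge_fit(X, y, lam=0.0)  # recovers true_beta up to rounding
```

With `lam=0.0` this reduces to ordinary least squares; a positive `lam` shrinks the coefficients and keeps the system solvable when cycle counts are strongly correlated.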
Power Consumed and CPU Cycles (7/7)
• We also used another statistical algorithm, Random
Forests, in our experiments.
– Random Forests is a popular machine learning/statistical
approach that uses decision trees.
– Unlike the linear regression formulation, it is a
non-linear algorithm.
Experimental Results (1/6)
• Regression Model Training
– We obtained training data from the following
benchmarks first:
• memcpy
• While-float
• mprime
– Then we obtained separate test data for:
• SPECjvm
• While-Int
• While-Branch
Experimental Results (3/6)
• Energy Efficient Scheduler
– We propose not waking a core from its idle state until it is needed. Cores that were not allocated any tasks were shut off.
– A core cannot execute more than a specific number of processor cycles.
– We used the average number of cycles executed to predict the energy consumed and then chose the most energy-efficient schedule.
• Ideally, the scheduler would run online, re-evaluating the task schedule every second based on the current load and the cycles executed.
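The first two scheduler rules above (wake a core only when needed, and cap the cycles per core) can be illustrated with a small packing sketch. This is not the authors' scheduler: the first-fit-decreasing heuristic, the function name `pack_tasks`, and the numbers are assumptions chosen for illustration, and the energy prediction step used to compare candidate schedules is not shown.

```python
def pack_tasks(task_cycles, core_cap, num_cores):
    """Assign tasks to cores, waking a new core only when no awake core has room.

    task_cycles: average cycles per task (hypothetical units)
    core_cap:    maximum cycles a single core may execute
    num_cores:   total cores available
    Returns a list with one entry per awake core, each a list of task indices.
    Cores that never appear in the result stay shut off.
    """
    cores = []  # task indices per awake core
    loads = []  # cycles already assigned to each awake core
    # Place the largest tasks first (first-fit decreasing).
    for idx in sorted(range(len(task_cycles)), key=lambda i: -task_cycles[i]):
        need = task_cycles[idx]
        if need > core_cap:
            raise ValueError(f"task {idx} exceeds the per-core cycle cap")
        for c, load in enumerate(loads):
            if load + need <= core_cap:
                cores[c].append(idx)
                loads[c] += need
                break
        else:
            # No awake core has room, so one more core must be woken up.
            if len(cores) == num_cores:
                raise ValueError("first-fit could not place all tasks")
            cores.append([idx])
            loads.append(need)
    return cores


# Five tasks on a 4-core machine with a cap of 10 cycles per core.
schedule = pack_tasks([6, 5, 4, 3, 2], core_cap=10, num_cores=4)
```

In this example only two of the four cores are woken; the remaining two are never allocated a task and could stay in a deep idle state.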
Conclusions
• We showed that linear regression and Random Forests
models can be used for predicting energy consumption.
• We also proposed a simple scheduler that uses this
model to minimize power consumption while
maintaining a similar execution time.
• In the future, we propose to develop a better
mathematical model for the scheduler.
• We also propose to use the model in an online fashion,
allowing the OS to limit processes that consume more
power than a fixed limit.