Power Management for Hard Disks and Main Memory
11/06/2008
Presented by Matthias Eiblmaier
Motivation
• Power consumption is a key factor in achieving environmental and financial goals
• There are several ways to save power in a computer:
– Throttling CPU speed
– Setting idle RAM banks and ranks into low-power modes
– Throttling disk speed
Outline
Several approaches have been proposed to save energy through efficient peripheral power management. The two papers discussed today:
A. Performance-Directed Energy Management for Main Memory and Disks
(by Xiaodong Li, Zhenmin Li, Francis David, Pin Zhou, Yuanyuan Zhou, Sarita Adve, and Sanjeev Kumar)
Department of Computer Science, University of Illinois. Proceedings of the Eleventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'04), October 2004
B. A Comprehensive Approach to DRAM Power Management (by Ibrahim Hur and Calvin Lin)
Department of Computer Sciences, The University of Texas. The 14th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2008), Salt Lake City, Utah, February 2008
Outline A. Performance-Directed Energy Management for Main Memory and Disks
1. Introduction and background
2. Performance guarantees
3. Control algorithms
4. Disk energy management
5. Experiment
6. Conclusion
7. Critiques
1. Introduction and background
• You can save power on a storage device by putting it into a low-power mode.
• Low-power modes can degrade performance.
• Current (threshold-based) algorithms:
– monitor usage (response time) and, when it exceeds certain thresholds, move the device into a low-power mode
– need painstaking, application-dependent manual tuning of thresholds
– have no performance guarantee
1. Introduction and background
This paper contributes:
1. A technique to guarantee performance
2. A self-tuning, threshold-based control algorithm (called PD)
3. A simpler, optimization-based, threshold-free control algorithm (called PS)

RDRAM memory power modes:
• Each chip can be activated independently
• There are 4 power modes: active, standby, nap, and powerdown
• A chip needs to be in active mode to serve a read/write request

Previous control algorithms:
• Static: put the device in a fixed power mode
• Dynamic: change the power mode after being idle for a specific amount of time (threshold)
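As an illustration, the dynamic (threshold-based) policy can be sketched as a lookup of the deepest mode whose idle-time threshold has been crossed. The mode names are the RDRAM modes above; the threshold values are hypothetical, chosen only for the sketch:

```python
# Sketch of a threshold-based dynamic policy: a chip drops into a deeper
# low-power mode each time its idle time crosses the next threshold.
# Mode names are the RDRAM modes; the cycle counts are made-up values.
MODES = ["active", "standby", "nap", "powerdown"]
THRESHOLDS = {"standby": 10, "nap": 100, "powerdown": 1000}  # idle cycles (hypothetical)

def mode_after_idle(idle_cycles):
    """Return the power mode a chip would be in after idling this long."""
    mode = "active"
    for m in MODES[1:]:
        if idle_cycles >= THRESHOLDS[m]:
            mode = m
    return mode
```

A static policy, by contrast, would simply pin the chip to one of these modes regardless of idle time.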
2. Performance Guarantee
• Assume the best performance is achieved without energy management
• An acceptable slowdown limit is given to the control algorithm
• Slowdown is the percentage increase in execution time

To estimate the slowdown, the following terms are used:
• t = execution time with the underlying energy management up to some point P in the program
• Tbase(t) = execution time without any energy management up to the same point in the program
• Delay(t) = absolute increase in execution time due to energy management = t − Tbase(t)
• Actual percentage slowdown = Delay(t) / Tbase(t) × 100
2. Performance Guarantee
The performance guarantee is subject to:
Slack(t) = amount of execution time that can still be spent without violating the timing constraint

Slack(t) = Tbase(t) × Slowdown_limit / 100 − Delay(t) = (t − Delay(t)) × Slowdown_limit / 100 − Delay(t)

Example: with Slowdown_limit = 20%, t = 105 ms, and Delay(t) = 5 ms:
Slack(t) = 20/100 × (105 ms − 5 ms) − 5 ms = 15 ms

Epoch-based algorithm:
• The application's execution time can be predicted.
• Estimate the available slack for the entire epoch at the start of the epoch.
• Check the slack after each access.
• If the slack is not enough, the algorithm forces all devices into active mode.
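The slack formula and the numeric example can be checked with a small sketch (all times in the same unit, here ms; the function name is mine):

```python
def slack(t, delay, slowdown_limit):
    """Slack(t) = Slowdown_limit/100 * Tbase(t) - Delay(t),
    with Tbase(t) = t - Delay(t)."""
    t_base = t - delay
    return slowdown_limit / 100.0 * t_base - delay
```

With the slide's numbers, slack(105, 5, 20) evaluates to 15 ms of remaining slack.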
2. Performance Guarantee
Available slack for the next epoch:

AvailableSlack + Delay(t) = (t − Delay(t) + t_epoch) × Slowdown_limit / 100

where t_epoch is the predicted execution time of the next epoch without power management. Equivalently:

AvailableSlack = Slowdown_limit / 100 × t_epoch + Slowdown_limit / 100 × (t − Delay(t)) − Delay(t)

[Timeline figure: the current execution up to t, with its slack, Slowdown/100 × t, and Delay(t), followed by the next epoch t_epoch with its slack, Slowdown/100 × t_epoch, and Delay(t+1).]
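A corresponding sketch for the next-epoch slack (the formula is the one above; variable names are mine):

```python
def available_slack(t, delay, t_epoch, slowdown_limit):
    """AvailableSlack for the next epoch:
    Slowdown_limit/100 * (t - Delay(t) + t_epoch) - Delay(t),
    where t_epoch is the predicted execution time of the next epoch
    without power management."""
    return slowdown_limit / 100.0 * (t - delay + t_epoch) - delay
```

Continuing the earlier example (t = 105 ms, Delay = 5 ms, 20% limit) with a predicted next epoch of 50 ms gives 25 ms of slack to spend.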
3. Control Algorithm
Two kinds of algorithms are used with the performance guarantee:
• Performance-directed static algorithm (PS): assigns a fixed power mode to each memory chip for the entire duration of an epoch.
• Performance-directed dynamic algorithm (PD): transitions to a low-power mode after some idle time and re-tunes the thresholds based on the available slack and workload characteristics.
3. Control Algorithm (PS)
The goal is to choose, for every chip, a configuration that maximizes the total energy savings subject to the constraint of the total available slack:

maximize: Σ_{i=0}^{N−1} E(Ci)
subject to: Σ_{i=0}^{N−1} D(Ci) ≤ AvailableSlack
where D(Ck) = A_i × (t_access(Ck) − t_access(C0)), i.e., the predicted access count times the extra per-access latency of configuration Ck relative to the full-power configuration C0

The PS algorithm is called at the beginning of every epoch:
1. Predict AvailableSlack for the next epoch.
2. Predict E(Ci) and D(Ci) for each device i.
3. Solve the knapsack problem.
4. Set the power mode for each device for the next epoch.
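Step 3 is a multiple-choice knapsack: pick one configuration per device to maximize energy savings under the delay budget. A brute-force sketch, fine only for a handful of devices (the paper would use a proper knapsack solver; all names here are mine):

```python
from itertools import product

def choose_configs(energy, delay, budget):
    """Pick one configuration per device maximizing total energy savings
    subject to total delay <= budget (multiple-choice knapsack).
    energy[i][c], delay[i][c]: savings and delay of device i in config c.
    Returns the tuple of chosen configuration indices."""
    best, best_choice = -1.0, None
    for choice in product(*[range(len(e)) for e in energy]):
        d = sum(delay[i][c] for i, c in enumerate(choice))
        if d <= budget:
            e = sum(energy[i][c] for i, c in enumerate(choice))
            if e > best:
                best, best_choice = e, choice
    return best_choice
```

For example, with two devices and three configurations each, the solver spends the whole slack budget on the combination with the largest total savings.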
3. Control Algorithm (PS)
• Obtain the available slack from the performance-guarantee algorithm.
• The algorithm needs to predict the next epoch's number and distribution of accesses.
• Prediction: the number of accesses is assumed to be the same as in the last epoch; the distribution of accesses is assumed uniform in time.
• The algorithm reclaims any unused slack from the last epoch.
3. Control Algorithm (PD)
PD automatically re-tunes its thresholds at the end of each epoch, based on the available slack and workload characteristics:
1. Predict AvailableSlack for the next epoch.
2. Predict the number of accesses for the next epoch.
3. Adjust the functions Thk(S) (1 ≤ k ≤ M−1) using the access count measured in the last epoch.
4. for k = 1, ..., M−1 do
5.   Use the Thk(S) functions to determine the values for Thk.
6. end for
7. Set thresholds Th1, ..., ThM for all chips.

[Block diagram: the performance-directed dynamic controller compares the slack against slack_opt, feeds the error through a transfer function, and issues threshold commands; if the slack is too low, higher thresholds are set.]
3. Control Algorithm (PD)
• When i > k: keep the device active during the short idle time, using the break-even time as the threshold.
• When 0 ≤ i ≤ k (threshold: C^(k−i) × t_k): put the device in mode k unless the device has already been idle for a large period.
• The lower the value of i, the higher the threshold C^(k−i) × t_k.
• The constant C is used to dynamically adjust the threshold:
– Slack not used up: C_next = 0.95 × C_current
– Slack used up: C_next = 2 × C_current
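A sketch of the two re-tuning rules above (the threshold form C^(k−i) × t_k is as read from the slide; the function names are mine):

```python
def threshold(c, k, i, t_k):
    """Threshold for moving a device currently in mode i into mode k:
    C^(k-i) * t_k. Smaller i (shallower current mode) gives a higher
    threshold, so deeper transitions require longer idle periods."""
    return c ** (k - i) * t_k

def adjust_c(c_current, slack_used_up):
    """End-of-epoch re-tuning of the constant C: shrink thresholds
    slowly (0.95x) while slack remains, double them when it runs out."""
    return 2 * c_current if slack_used_up else 0.95 * c_current
```

Doubling on a slack violation and decaying by 5% otherwise makes the controller back off quickly but probe for savings gradually.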
4. Disk energy management
Model:
• DRPM disk model:
– multi-speed disk
– can service requests at a low rotational speed
– no transition overhead
• Performance delay comes from:
– periods of speed change
– service at low speed

Performance guarantee:
• Static algorithm: the same as for memory.
• Dynamic algorithm: adjusts UT and LT (the upper and lower thresholds) based on
– the predicted access count
– the available slack
5. Experiments
The experimental verification is done on a simulator (SimpleScalar) with an enhanced RDRAM memory model.
Execution times with the original algorithms:
5. Experiments
Results for memory:
5. Experiments
Experiments and results for disk:
• Simulator: DiskSim
• Disk: IBM Ultrastar 36Z15
• Rotational speeds: 3K, 6K, 9K, 12K RPM
• Access distributions: exponential, Pareto, Cello'96
6. Conclusion
• The proposed PM algorithms bound the execution-time degradation.
• Self-tuning energy-management algorithms are introduced.
7. Critiques
• PD/PS cannot guarantee real-time behavior.
• The performance-guarantee algorithm is not tested for stability.
• PD causes overhead.
• The loop variable Delay, and hence the slack, is only estimated.
• The experimental verification lacks substantial benchmarks (e.g., real server workloads).
• It is not stated exactly where and how to implement the algorithm (chip, OS).
Outline B. A Comprehensive Approach to DRAM Power Management
1. Queue-Aware Power-Down Mechanism
2. Power/Performance-Aware Scheduling
3. Adaptive Memory Throttling
4. Delay Estimator Model
5. Simulation and Results
6. Conclusions
7. Critiques
1. Queue-Aware Power-Down Mechanism
[Diagram of the memory controller: Processors/Caches → Read/Write Queues → Scheduler → Memory Queue → DRAM.]

1. Read/write commands are queued in the read/write queues.
2. The scheduler (AHB, adaptive history-based) decides which command is preferred.
3. Selected commands are then transferred into the FIFO memory queue.
1. Queue-Aware Power-Down Mechanism
A rank is powered down only when all three conditions hold:
1. The rank counter is zero → the rank is idle, and
2. The rank status bit is 0 → the rank is not yet in a low-power mode, and
3. There is no command in the CAQ with the same rank number → avoids powering down a rank if an access to it is imminent.
[Worked example: a read/write queue of commands tagged with chip, rank, and bank numbers (e.g., C:1 – R:2 – B:1). As commands enter the CAQ, the counters of their ranks are set; as commands complete, the counters are decremented and the status bits updated. Once rank 1's counter reaches zero, its status bit is clear, and no rank-1 command remains in the CAQ, rank 1 is powered down.]
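The three conditions can be expressed directly. The data-structure layout below is my assumption; in hardware these are per-rank counters and status bits plus a scan of the CAQ:

```python
def may_power_down(rank, counters, status_bits, caq):
    """A rank may be powered down only if it is idle (counter zero),
    not already in a low-power mode (status bit 0), and no command
    in the CAQ targets the same rank."""
    return (counters[rank] == 0                       # rank idle
            and status_bits[rank] == 0                # not yet powered down
            and all(r != rank for r in caq))          # no imminent access
```

All three checks are cheap, which is what makes the mechanism practical to evaluate every cycle.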
2. Power/Performance-Aware Scheduling
1. An adaptive history-based scheduler uses the history of recently scheduled memory commands when selecting the next memory command.
2. A finite state machine (FSM) groups same-rank commands in the memory queue as close together as possible → the total number of power-down/up operations is reduced.
3. This FSM is combined with a performance-driven FSM and a latency-driven FSM.
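One way to sketch the same-rank grouping idea in software. The real scheduler is an FSM in the memory controller; this greedy pick over a rank-tagged queue is only an illustration:

```python
def pick_next(history, queue):
    """Prefer a queued command whose rank matches the most recently
    scheduled one, so same-rank commands are grouped together and
    fewer power-down/up transitions occur; otherwise take the oldest.
    history: ranks of recently scheduled commands; queue: pending ranks."""
    if history:
        last_rank = history[-1]
        for i, rank in enumerate(queue):
            if rank == last_rank:
                return queue.pop(i)
    return queue.pop(0)
```

Grouping by rank keeps other ranks idle for longer stretches, which is exactly what the power-down mechanism needs to be effective.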
3. Adaptive Memory Throttling
[Diagram: the memory controller from before, extended with a throttling mechanism in front of the memory queue that decides whether to throttle at every cycle, and a throttle delay estimator that determines how much to throttle every 1 million cycles, given a power target. A model builder (a software tool, active only during system design/install time) sets the parameters for the delay estimator.]
3. Adaptive Memory Throttling
• Stall all traffic from the memory controller to the DRAM for T cycles in every 10,000-cycle interval.

[Timeline figure: repeating 10,000-cycle intervals, each split into an active phase and a stall phase of T cycles.]

• How is T (the throttling delay) calculated?
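A sketch of the stall pattern. Placing the stall at the end of each interval is my assumption; the slide only fixes that T cycles of every 10,000 are stalled:

```python
def is_stalled(cycle, t_throttle, interval=10_000):
    """True while the controller is stalling DRAM traffic: the last
    t_throttle cycles of every interval form the stall phase."""
    return cycle % interval >= interval - t_throttle
```

With t_throttle = 2,000, cycles 0 to 7,999 of each interval are active and cycles 8,000 to 9,999 are stalled, i.e., 20% of memory bandwidth is traded for a power reduction.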
3. Adaptive Memory Throttling
Model building:

[Plots: DRAM power versus throttling degree (execution time) for two applications, with candidate operating points A and B and the chosen delay T marked against the power budget.]

• Throttling degrades performance.
• Inaccurate throttling either pushes power consumption over the budget or causes unnecessary performance loss.
4. Delay Estimation Model
• Calculates the throttling delay, T, using a linear model
– Input: the power threshold and information about the memory access behavior of the application
– Output: the throttling delay
• Calculates the delay periodically (in epochs)
– Assumes consecutive epochs have similar behavior
– The epoch length is long (1 million cycles), so the overhead is small
• What are the features and the coefficients of the linear model?
– Step 1: Perform experiments with various memory access behaviors
– Step 2: Determine models and model features (needs human interaction during system design time)
– Step 3: Compute the model coefficients (solution of a linear system of equations)
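A sketch of the estimator's linear form. The feature order and the non-negativity clamp are my assumptions; the coefficients come from the offline model-building step:

```python
def throttle_delay(features, coeffs, intercept=0.0):
    """Linear model for the throttling delay T: a weighted sum of
    features such as the power threshold, read count, write count,
    and bank-conflict information."""
    t = intercept + sum(c * f for c, f in zip(coeffs, features))
    return max(0, round(t))  # a delay in cycles cannot be negative
```

Evaluating a dot product once per million-cycle epoch is what keeps the runtime overhead of the estimator negligible.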
4. Delay Estimation Model
Model building:
• An offline process performed during system design/installation
• Step 1: Perform experiments with various memory access behaviors
• Step 2: Determine models and model features
– Needs human interaction during system design time
• Step 3: Compute the model coefficients
– Solution of a linear system of equations
4. Delay Estimation Model
• Model features that are determined:
– power threshold
– number of reads
– number of writes
– bank conflict information
• Possible models:
– T1: uses only the power threshold
– T2: uses power, reads, and writes
– T3: uses all features
4. Delay Estimation Model
• Step 1: Set up a system of equations
– The known values are the measurement data
– The unknowns are the model coefficients
• Step 2: Solve the system
R² = 0.191, R² = 0.122, R² = 0.003
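Step 2 amounts to solving a linear system in which the rows are measured feature vectors and the right-hand side holds the observed throttling delays. A self-contained sketch for a square, well-conditioned system (with more measurements than unknowns, least squares would be used instead):

```python
def solve_linear(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Rows of A: measured feature vectors; b: observed delays;
    x: the model coefficients."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x
```

Since this runs offline during system design/installation, its cost is irrelevant at run time.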
5. Simulation and Results
• Used a cycle-accurate IBM Power5+ simulator that the IBM design team uses
– Simulated performance and DRAM power
– 2.1 GHz, 533 MHz DDR2
• Evaluated single-thread and SMT configurations
– Stream
– NAS
– SPEC CPU2006fp
– Commercial benchmarks
• The Power5+ chip: 2 cores on a chip, SMT capability, ~300 million transistors; the memory controller takes 1.6% of the chip area
5. Simulation and Results
Energy-efficiency improvements from the power-down mechanism and the power-aware scheduler:
• Stream: 18.1%
• SPECfp2006: 46.1%
5. Simulation and Results
6. Conclusion
• Introduced three techniques for DRAM power management:
– Queue-aware power-down
– Power-aware scheduler
– Adaptive memory throttling
• Evaluated on a highly tuned system, the IBM Power5+
– Simple and accurate
– Low cost
• Results in the paper: energy-efficiency improvements from the power-down mechanism and power-aware scheduler
– Stream: 18.1%
– SPECfp2006: 46.1%
7. Critiques
• The overhead is not computed or estimated.
• Requires a relatively complicated architecture.
• Throttling and queuing result in delays → no real-time guarantee.
• Depends on the prediction model.
Overall Conclusion and Comparison

                 | PS/PD + performance guarantee                    | Queue-aware power-down + power-aware scheduling + throttling
Objective        | Minimize power; guarantee a fixed worst-case execution time | Minimize power; maximize performance
Realization      | Experimental                                     | Based on the AHB scheduler
Real-time        | No                                               | No
Implementation   | Memory controller or OS kernel (not specified)   | Memory controller
Methodology      | Simulation (SimpleScalar)                        | Simulation (IBM Power5+)
Controller       | Ad hoc                                           | Open loop / open loop / ad hoc
Thank You
3. Control Algorithm
To enforce the performance guarantee, for every epoch the algorithm needs to:
• apportion a part of the available slack to each chip,
• keep track of the actual delay each chip incurs,
• compare the actual delay with the predicted delay.
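One plausible way to sketch the first step. The slide does not specify the split; apportioning in proportion to each chip's predicted access count is my assumption:

```python
def apportion_slack(total_slack, predicted_accesses):
    """Split the available slack across chips in proportion to each
    chip's predicted access count; with no predicted accesses, split
    the slack evenly."""
    total = sum(predicted_accesses)
    if total == 0:
        n = len(predicted_accesses)
        return [total_slack / n] * n
    return [total_slack * a / total for a in predicted_accesses]
```

Each chip's actual delay is then tracked against its share, and the comparison against the predicted delay closes the loop at the end of the epoch.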