Power Management in the Linux Kernel Tate Hornbeck, Peter Hokanson 7 April 2011


Page 1: Power Management in the Linux Kernel - Rice Universitymobile/elec518/lectures/2011-tatepeter.pdf · Power Management in the Linux Kernel Tate Hornbeck, ... Provide system tuning suggestions

Power Management in the Linux Kernel

Tate Hornbeck, Peter Hokanson

7 April 2011


Intel Open Source Technology Center

Venkatesh Pallipadi
  Senior Staff Software Engineer
  2001 - Joined Intel
  Processor and platform power management
  2010 - Moved to Google

Suresh Siddha
  Senior Staff Software Engineer
  2001 - Joined Intel
  Multicore, process scheduling, system scalability


IBM Linux Technology Center

Vaidyanathan Srinivasan
  Energy optimizations for Linux servers
  Contributed to Linux power-aware scheduler

Gautham R Shenoy
  Worked on cpufreq, idle system power management, idle load balancer


Ottawa Linux Symposium (OLS)

Source of most of our papers

One of three major grassroots Linux and Open Source conferences in the world

Held since 1999


Intel's lesswatts.org

Intel-sponsored project to improve Linux power consumption

Major projects:
  PowerTOP
  Tickless idle
  Linux ACPI
  Linux Battery Life Toolkit (BLTK)
  Power management on Intel graphics chipsets


Why optimize Linux for energy?

Linux is being used in a wide variety of energy-critical applications:

Android phones and tablets
Datacenters:
  Utility costs
  Thermal management
Embedded Linux
Laptops


Overview of Technologies

Non-idle power management:
  CPU frequency scaling
  Process scheduling on multicore systems

Idle-time power management:
  CPU idle time allocation
  Removing timer interrupts
  Interrupt migration

Accurate energy-based measurements:
  PowerTOP
  Battery Life Toolkit


CPU Frequency Scaling

Kernel subsystem cpufreq
  Common interface to CPU frequency stepping
  Supports multiple platforms and methods (ACPI, BIOS)
  Standard frequency governors

CPU governors:
  performance - highest frequency
  powersave - lowest frequency
  userspace - proxy: exports data through sysfs interface for user-mode control
  ondemand
  conservative

V. Pallipadi and A. Starikovskiy, “The Ondemand Governor,” The Linux Symposium, 2006.
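The governor in use can be inspected from userspace through the cpufreq sysfs files. The sketch below is a minimal illustration, assuming the usual `/sys/devices/system/cpu/cpuN/cpufreq/` layout; the helper name `read_governor` and the `root` parameter are made up here so the function can also be pointed at a test directory. On machines without cpufreq support (many virtual machines, for example) the files are absent and the function returns empty values.

```python
from pathlib import Path

# Usual cpufreq sysfs location on Linux; it may be missing on some systems.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_governor(cpu=0, root=CPUFREQ_ROOT):
    """Return the active and available cpufreq governors for one CPU.

    Missing files yield None / an empty list instead of raising.
    """
    base = Path(root) / f"cpu{cpu}" / "cpufreq"

    def read(name):
        try:
            return (base / name).read_text().strip()
        except OSError:
            return None

    available = read("scaling_available_governors")
    return {
        "current": read("scaling_governor"),
        "available": available.split() if available else [],
    }
```

Calling `read_governor(0)` on a laptop would typically report something like `ondemand` or `powersave` as the current governor, depending on the distribution and driver.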


The Ondemand Governor

Modern CPUs can frequency-step in about 10 us
Userspace tools lack sufficient response time
Configurable thresholds and sample rates
Conservative differs only in frequency rise time

V. Pallipadi and A. Starikovskiy, “The Ondemand Governor,” The Linux Symposium, 2006.

For each CPU:
  Every X milliseconds:
    Get utilization since last check
    if (utilization > UP_THRESHOLD)
      increase frequency to MAX
  Every Y milliseconds:
    Get utilization since last check
    if (utilization < DOWN_THRESHOLD)
      decrease frequency 20%
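The ondemand pseudocode can be tried out as a small simulation. This is only a sketch under stated assumptions: the threshold values (80 and 20 percent) and the frequency table are invented for illustration, the two sampling periods X and Y are merged into a single sample interval, and "decrease 20%" is mapped to the highest supported frequency at or below 80% of the current one.

```python
UP_THRESHOLD = 80    # percent utilization; assumed value for illustration
DOWN_THRESHOLD = 20  # percent utilization; assumed value for illustration


def next_frequency(current_khz, utilization, freqs_khz):
    """One ondemand-style decision for a single CPU.

    freqs_khz must be sorted in ascending order.
    """
    if utilization > UP_THRESHOLD:
        # Busy: jump straight to the maximum frequency.
        return freqs_khz[-1]
    if utilization < DOWN_THRESHOLD:
        # Mostly idle: aim 20% lower, snapped down to a supported step.
        target = current_khz * 80 // 100
        lower = [f for f in freqs_khz if f <= target]
        return lower[-1] if lower else freqs_khz[0]
    return current_khz


def simulate(utilizations, freqs_khz):
    """Apply next_frequency to a trace of utilization samples."""
    freq = freqs_khz[-1]  # start at the maximum frequency
    history = []
    for u in utilizations:
        freq = next_frequency(freq, u, freqs_khz)
        history.append(freq)
    return history
```

With a table of 800, 1200, 1600 and 2000 MHz, a run of idle samples steps the frequency down one level at a time, and a single busy sample brings it back to the maximum.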


Energy Aware Scheduling on multicore systems

Traditional schedulers:
  Spread tasks across logical CPUs
  Unaware of underlying multicore and NUMA topology

Energy-aware scheduling goals:
  Consolidate all activity on the fewest number of CPUs
  Remaining CPUs can enter idle states
  Understand the underlying topology
  Trade-off between throughput and energy efficiency

V. Pallipadi and V. Srinivasan, “Energy-aware task and interrupt management in Linux”, The Linux Symposium, 2008.


Bad throughput, good energy usage

Good throughput, bad energy usage

Two sockets with 4 cores each

Frequency and Voltage can only be controlled at socket level
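The two placements can be compared with a toy model. The sketch below is not the kernel's load balancer; the function names and the simple "fill one socket first" versus "pick the least loaded socket" policies are assumptions made for illustration, using the two-socket, four-core layout from the slide.

```python
def place_tasks(n_tasks, sockets=2, cores_per_socket=4, consolidate=True):
    """Return the number of tasks placed on each socket.

    consolidate=True packs tasks onto as few sockets as possible;
    consolidate=False spreads them across sockets.
    """
    if n_tasks > sockets * cores_per_socket:
        raise ValueError("more tasks than cores in this toy model")
    load = [0] * sockets
    for _ in range(n_tasks):
        if consolidate:
            # First socket that still has a free core.
            s = next(k for k in range(sockets) if load[k] < cores_per_socket)
        else:
            # Least loaded socket (ties go to the lowest index).
            s = min(range(sockets), key=lambda k: load[k])
        load[s] += 1
    return load


def idle_sockets(load):
    """Sockets with no tasks, which could drop to a low-power state."""
    return sum(1 for tasks in load if tasks == 0)
```

With four tasks, consolidation leaves one whole socket idle, while spreading keeps both sockets awake. Because frequency and voltage are controlled per socket, only the consolidated layout lets one socket power down.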


CFS Scheduler by Ingo Molnar

Time-ordered red-black tree
Nanosecond-granularity accounting
Each task has a virtual runtime
  represents the amount of time the task has executed
  the smaller a task's virtual runtime, the more the task needs to be scheduled
  leftmost node is in most need of CPU time
  choosing a task is O(1)
  reinserting takes O(log N)
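The "smallest virtual runtime runs next" rule can be sketched in a few lines. Note the substitution: this example uses Python's `heapq` (a binary min-heap) instead of a red-black tree, so picking the next task costs O(log N) here rather than the O(1) the kernel gets from its cached leftmost node. The function name, the fixed time slice, and the equal task weights are all assumptions for illustration.

```python
import heapq


def schedule(tasks, n_slices, slice_ns=1_000_000):
    """Run n_slices time slices and return the order in which tasks ran.

    tasks maps a task name to its initial virtual runtime in nanoseconds.
    """
    heap = [(vruntime, name) for name, vruntime in tasks.items()]
    heapq.heapify(heap)
    order = []
    for _ in range(n_slices):
        # The task with the smallest virtual runtime is most in need of CPU.
        vruntime, name = heapq.heappop(heap)
        order.append(name)
        # Charge the task for the time it ran, then reinsert it.
        heapq.heappush(heap, (vruntime + slice_ns, name))
    return order
```

A task that has already accumulated more virtual runtime (for example `c` starting at 2 ms while `a` and `b` start at 0) waits until the others catch up.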

sched_mc_power_savings
  Bias to keep the workload on a single package


Idle-State Power: cpuidle

Idle States
  Trade-off between idle power, the amount of state the processor saves, and the enter/exit latency

Kernel subsystem cpuidle
  Generic idle management framework
  Clean interface for processor hardware to make use of different idle levels
  Creates abstraction between idle governors and idle drivers
  Makes informed decisions about which idle state to enter
    Kernel has the most information about currently running applications and idle state characteristics

V. Pallipadi, A. Belay: "cpuidle - Do nothing, efficiently...", The Linux Symposium 2007


cpuidle Architecture

Contains information about each individual idle state

1. Just before going idle, governor selects best idle state

2. cpuidle invokes the entry point for that particular state in the cpuidle driver


cpuidle Governors

Ladder Governor
  Simple, step-wise approach
  Works well with periodic tick-based kernels

Menu Governor
  Analyzes different parameters and current context
    dyntick expected sleep time
    previous C-state residency
    per-CPU load average
    number of processes waiting for I/O on current CPU
  Jumps to the lowest possible idle state that does not significantly affect performance
  Aims at getting the maximum possible power advantage with little impact on performance
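The menu governor's "jump to the deepest state that is still safe" idea can be sketched as a table lookup. This is a simplified illustration, not the kernel's algorithm: the state table and its numbers are hypothetical, and the prediction of idle time (which the real governor derives from the inputs listed above) is simply passed in as an argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    exit_latency_us: int      # time needed to wake up from this state
    target_residency_us: int  # minimum stay for the state to pay off


# Hypothetical idle states, ordered from shallowest to deepest.
STATES = [
    IdleState("C1", exit_latency_us=1, target_residency_us=1),
    IdleState("C2", exit_latency_us=20, target_residency_us=100),
    IdleState("C3", exit_latency_us=100, target_residency_us=1000),
]


def select_state(states, predicted_idle_us, latency_limit_us):
    """Pick the deepest state that fits the predicted idle period
    and the tolerated wake-up latency; fall back to the shallowest."""
    chosen = states[0]
    for state in states[1:]:
        fits_idle = state.target_residency_us <= predicted_idle_us
        fits_latency = state.exit_latency_us <= latency_limit_us
        if fits_idle and fits_latency:
            chosen = state
    return chosen
```

A ladder-style governor would instead move one state at a time; the lookup above goes directly to the deepest acceptable state, which suits a tickless kernel where long idle periods are common.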


Tickless Idle: The Timer Interrupt

Traditional systems use a periodic interrupt 'tick'
  update the system clock
  provide a timer interface for scheduling
  previously at 100 Hz to 1000 Hz
  tick requires wakeup from idle state

Tickless kernel
  requires cross-platform clock source interface: hrtimer
  skip over ticks before next required event
  currently only affects idle state

S. Siddha, V. Pallipadi, A. Van De Ven. "Getting maximum mileage out of tickless," The Linux Symposium, 2007.
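A back-of-the-envelope model shows why skipping ticks matters for an idle CPU. The two helpers below are illustrative only (the names are made up): one counts the periodic ticks that would fire during an idle window, the other counts just the distinct timer expiries that actually need the CPU.

```python
def periodic_wakeups(idle_ms, hz):
    """Ticks that fire during an idle window of idle_ms milliseconds."""
    return idle_ms * hz // 1000


def tickless_wakeups(idle_ms, timer_expiries_ms):
    """Wakeups when only real timer expiries interrupt the idle window.

    Expiries at the same instant share one wakeup; expiries outside
    the window are ignored.
    """
    return len({t for t in timer_expiries_ms if 0 <= t < idle_ms})
```

For a one-second idle window at 1000 Hz, the periodic tick wakes the CPU 1000 times, while a tickless kernel with pending timers at 250 ms and 900 ms wakes it twice.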


Tickless Idle: Cross-platform hrtimers

Moving to a tickless kernel requires high resolution timing
Linux is cross-platform, so changes must be portable
Original architecture

T. Gleixner, D. Niehaus, "Hrtimers and Beyond: Transforming the Linux Time Subsystems", The Linux Symposium, 2006.


Tickless Idle: Removing the Timer

Measurements show more idle time:

Some platform-dependent code required
Small delay in load-balancing incurred initially
  Due to delay in running load balancer
  Fixed by designating one idle CPU as load balancer

Does not prevent inefficient user-space polling

S. Siddha, V. Pallipadi, A. Van De Ven. "Getting maximum mileage out of tickless," The Linux Symposium, 2007.


Tickless Idle APIs

The tickless patch adds two flags to timers:

__round_jiffies()
  Rounds jiffy timeout values to the nearest second
  Prevents multiple wakeups from staggered interrupts
  Used where precise timeout not required

Deferrable timers
  Timer interrupts that are unnecessary when idle
  Interrupt deferred until processor is active
  Used for ondemand cpufreq governor
  __next_timer_interrupt() skips over deferrable timers
  Flag added to last bit of 2-byte aligned address

S. Siddha, V. Pallipadi, A. Van De Ven. "Getting maximum mileage out of tickless," The Linux Symposium, 2007.
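The effect of rounding timeouts to a whole second can be shown with plain integer arithmetic. This is a simplified stand-in for the kernel helper, not a copy of it: the function name, the assumed HZ of 1000, and the plain round-to-nearest rule are illustrative choices.

```python
HZ = 1000  # jiffies per second; assumed value for illustration


def round_to_second(jiffies, hz=HZ):
    """Round an absolute jiffies value to the nearest whole second."""
    return (jiffies + hz // 2) // hz * hz


def distinct_wakeups(timeouts, hz=HZ, rounded=True):
    """Number of separate wakeups needed to service the given timeouts."""
    if rounded:
        return len({round_to_second(t, hz) for t in timeouts})
    return len(set(timeouts))
```

Three timers expiring at jiffies 990, 1010 and 1040 would cause three wakeups as written, but all round to jiffy 1000 and can be serviced by one wakeup.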


Interrupt Routing/Balancing

HW (NIC, HDD, etc.) uses interrupts to inform the OS of events
Default interrupt routing either:
  Routes all interrupts to logical CPU0 (1st core of 1st socket)
    Could spend a disproportionate amount of time processing interrupts under heavy loads
    Other CPU0 tasks could starve/perform poorly
  Broadcasts to all CPUs
    Non-ideal if some CPUs are idle
    Causes unneeded wakeups

V. Pallipadi and V. Srinivasan, “Energy-aware task and interrupt management in Linux”, The Linux Symposium, 2008.


irqbalance daemon

Userspace solution by Intel
Included in most distros
Understands processor topology and cache domains
Classifies irqs based on performance sensitivity
  Network most important
Power Save Mode
  All irqs assigned to first logical CPU
  Consolidates irqs and distributes at lower rate
  When system activity increases:
    Increases distribution rate
    Begins to distribute among different CPUs
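On Linux, the set of CPUs allowed to service an interrupt is written as a hexadecimal bitmask (bit N set means CPU N is allowed), which is what a balancer ultimately has to produce. The helper below is a hypothetical sketch of that encoding only; it is not taken from irqbalance, and it ignores the comma-separated grouping used on systems with many CPUs.

```python
def cpus_to_affinity_mask(cpus):
    """Encode a collection of CPU numbers as a hex affinity bitmask string."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # bit N set: CPU N may handle the interrupt
    return format(mask, "x")
```

In a power-save style layout every interrupt would get the mask `1` (first logical CPU only); distributing over CPUs 0 to 3 corresponds to the mask `f`.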


Accurate Energy Measurement

Motivation
  Not all problems are solved in kernel space
  Userspace applications need to be written to take advantage of these optimized APIs
  Need a way to pinpoint misbehaving applications
  Measuring should not change system execution significantly

Tools
  PowerTOP
  Battery Life Toolkit


PowerTOP

PowerTOP can measure processor states and wakeups
Finds causes of CPU wakeups by thread
  Kernel
  User-level
Helps Linux developers test applications
Provides system tuning suggestions to achieve low power consumption
Helped expose inefficiencies in Firefox and other applications


PowerTOP


Battery Life Toolkit (BLTK)

Power consumption is difficult to measure reliably and easily
  External power supplies can affect power modes
  Modern batteries build in sufficient telemetry
  Runs battery to exhaustion
Package provides software infrastructure for launching workloads for power performance measurements
  No additional hardware required
  Repeatably measures battery life with precise workloads
    Web browsing, DVD playback, office applications
  Full scripting ability to simulate input

L. Brown, K. Karasyov, et al, "Linux Laptop Battery Life: Measurement Tools, Techniques, and Results," The Linux Symposium, 2006.


Outstanding Issues

Task consolidation/load balancing relies on accurate CPU load calculation
  CPU load is sampled periodically
  Short-lived tasks (e.g. daemons) may not be factored into the calculation
  Tasks that are factored into the load may not be CPU-intensive, but instead I/O-bound

Consider example workload: make -j 2
  compiling kernel w/ 2 threads
  2-socket dual-core machine
    CPU0/1 and CPU2/3 are core siblings
  sched_mc_power_savings = 1


Workload
  Large number of short-lived jobs
  High rate of process creation/destruction
  Mix of I/O operations (not CPU-bound)
  No imbalance detected, so the scheduler continues to wake up idle CPUs

2-core case
  Utilization high enough to raise ondemand frequency

4-core case
  Both threads jumped between all 4 processors
  Scheduler could never get accurate CPU load
  CPU utilization not high enough to trigger ondemand
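Why periodic sampling can miss short-lived work is easy to reproduce with a toy model. The sketch below is an assumption-laden illustration, not how the scheduler computes load: it treats load as "was the CPU busy at the instant of the sample" and compares that with the true busy fraction. Function names and the example numbers are made up.

```python
def sampled_load(busy_intervals, sample_times):
    """Fraction of samples that land inside a busy interval [start, end)."""
    hits = sum(
        any(start <= t < end for start, end in busy_intervals)
        for t in sample_times
    )
    return hits / len(sample_times)


def true_load(busy_intervals, window):
    """Actual busy fraction over a window (intervals must not overlap)."""
    return sum(end - start for start, end in busy_intervals) / window
```

If a job runs for 3 ms out of every 10 ms but always finishes before the next 10 ms sample, the sampled load is 0 even though the CPU is 30% busy, so neither the load balancer nor ondemand reacts.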


How to Keep Userspace Quiet

Do not poll periodically for status change
  Use some sort of event notification
  hal (hardware abstraction layer) daemon used to poll for media changes
    New SATA supports Asynchronous Notification
Use intelligent mechanisms to minimize periodic timers
  Group them and expire at the same instant
  Use provided API to help
    g_timeout_add_seconds() consolidates glib timers to the nearest second

Use tools like PowerTOP and BLTK


Conclusions

Linux power management has become a serious focus in the past 10 years
Several techniques have seriously improved efficiency:
  Frequency stepping with the ondemand governor
  Tickless kernels to improve idle time
  Energy-aware multicore scheduler improvements
  Improvements to the userspace software stack
Work is in progress on several other improvements:
  Virtualization
  Power-aware scheduling of short-lived tasks


References

[1] S. Siddha, V. Pallipadi, A. Van De Ven, "Getting maximum mileage out of tickless," The Linux Symposium, 2007.
[2] V. Pallipadi, A. Starikovskiy, "The Ondemand Governor: Past, Present, and Future," The Linux Symposium, 2006.
[3] T. Gleixner, D. Niehaus, "Hrtimers and Beyond: Transforming the Linux Time Subsystems," The Linux Symposium, 2006.
[4] V. Pallipadi, A. Belay, "cpuidle -- Do nothing, efficiently...," The Linux Symposium, 2007.
[5] http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/
[6] http://www.irqbalance.org/documentation.php