CMG‘08 INTERNATIONAL conference
WINDOWS SERVER POWER EFFICIENCY
Dr. Bruce Worthington
Principal Software Development Lead
Windows Server Performance
Microsoft Corporation
Server Power Ground Rules
• TANSTAAFL: Everything is a trade-off
  ○ Performance, Power, Functionality, Capacity, Cost, Reliability, Availability, Manageability, Maintainability, Usability, Environmental Impact, Lifetime, Footprint, Security, Morale
• Saving Power
• Power Efficiency
  ○ More work at fixed power level, or
  ○ Less power at fixed work level
• Shifting component power efficiencies
Outline
• Motivation
• Background
• Windows Server 2003 → 2008
• Windows Server 2008 R2
• Power Diagnostics and Control
• Summary
Rising Cost of Ownership
• From 2000 to 2006
  ○ Computing performance: 25x
  ○ Energy efficiency: 8x
  ○ US electricity cost: 1.35x
  ○ Power per $1K of server: 4x
  ○ Server(+) world electricity: >2x (>1% of total world production)
• Datacenters use 2% of all US electricity
Scale: Kilowatts → Megawatts
• Idle high-performance servers
  ○ 50-80% of max power draw
  ○ 2-sockets ~ 250 W
  ○ 4-sockets ~ 500 W
  ○ 8-sockets ~ 1000 W
• 25 15K rpm 2.5” disks + SAN = 3U ~ 300/450 W (idle/active)
• 10,000 2-socket 1U servers ~ 1-3 MW
• Datacenter “container” ~ 0.5 MW
  ○ ~1500 servers + storage + infrastructure
Datacenter Energy Demand
• Data centers are energy intensive facilities
  ○ Server racks now designed to carry 25 kW load
  ○ Surging demand for data storage
  ○ Typical facility ~ 1 MW, can be > 20 MW (even 200 MW)
  ○ Nationally 1.5% of US electricity consumption in 2006 (doubling every 5 years)
• Significant data center building boom
  ○ Power and cooling constraints in existing facilities
  ○ Growing demand for compute cycles
  ○ Growing computing performance
  ○ Commoditized hardware
  ○ Declining cost of computing
15 MW Datacenter Monthly Costs
“Good” (PUE=1.7) Internet-scale datacenter with DAS
• Servers: $3,000,000
• Infrastructure: $1,800,000
• Power: $1,000,000
3 yr server and 15 yr infrastructure amortization
• IT Equipment: 50%
• Cooling: 25%
• Air Movement: 12%
• Electricity Transformer/UPS: 10%
• Lighting, etc.: 3%
Source: EYP Mission Critical Facilities Inc., New York
Other than a common power source they are not connected.
Datacenter Costs Breakdown - 2
Datacenter Costs Breakdown - 1
Electricity Use by End-Use: 2000 - 2006
Environmental Impact
• Governments, businesses, and organizations are trying to reduce the production of greenhouse gases
• New EPA Energy Star mandates for enterprise server power efficiencies
Outline
• Motivation
• Background
  ○ ACPI Power States
  ○ Component Power
• Windows Server 2003 → 2008
• Windows Server 2008 R2
• Power Diagnostics and Control
• Summary
ACPI Power State Definitions
• Performance states (P-states)
  ○ Dynamic voltage and frequency scaling
  ○ More than linear savings (cubic function)
• Throttle states (T-states)
  ○ Linear scaling of CPU clock
• “Power” states (C-states)
  ○ Low-power idle (CPU “sleep”) states
  ○ Turn off increasing amounts of silicon in package
• System sleep states (S-states)
  ○ On, standby, hibernate, off
  ○ MS has not encouraged S-state support for servers (changing with the increased focus on power)
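The "cubic function" remark can be illustrated with the common CMOS approximation that dynamic power scales as C·V²·f. The sketch below additionally assumes that voltage scales linearly with frequency and ignores static (leakage) power; both are simplifying assumptions for illustration, not figures from the talk.

```python
def relative_dynamic_power(freq_fraction: float) -> float:
    """Dynamic power relative to full speed, assuming P ~ C * V^2 * f and V ~ f."""
    if not 0.0 < freq_fraction <= 1.0:
        raise ValueError("freq_fraction must be in (0, 1]")
    voltage_fraction = freq_fraction  # assumption: voltage tracks frequency linearly
    return voltage_fraction ** 2 * freq_fraction


for pct in (100, 90, 75, 50):
    share = relative_dynamic_power(pct / 100)
    print(f"{pct:3d}% frequency -> {share:.3f} of full dynamic power")
```

Under these assumptions, halving the frequency cuts dynamic power to one eighth, which is why P-states save more than a linear clock throttle does.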
ACPI Power State State Machine
• For entire system
  ○ Global System States (G-States)
  ○ Sleeping States (S-States): Standby (S1-S3), Hibernate (S4)
• For processor only
  ○ Processor Performance States (P-States): different processor frequency and voltage
  ○ Processor Throttling States (T-States): processor clock throttling to reduce processor utilization (and capacity)
  ○ Processor Power States (C-States): processor is executing instructions (C0) or is idle (C1, C2, …)
• Other devices
  ○ Device Power States (D-States): similar to C-States, but for devices other than processors
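[Diagram: ACPI global state machine. G0 (S0) Working, G1 Sleeping (S1-S4), G2 (S5) Soft Off, G3 Mech Off, and Legacy, linked by Wake Event, BIOS Routine, and Power Failure/Power Off transitions. Within G0, the CPU moves between C0 (with Performance State Px and Throttling) and C1, C2, …, Cn; devices (Modem, HDD, CDROM) each have D0-D3.]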
ACPI Specification Versions
• WS03 complies with ACPI 2.0
• WS08 complies with ACPI 3.0
  ○ Multiprocessor
    ○ Dependent (ganged) and independent control
    ○ Independent control w/ dependent behavior (may transition or not based on other processors’ states)
• MS has some ideas for ACPI 3.5
ACPI Power State Dependencies
• Dependency domains for ACPI power states (assumes S0)
  ○ Logical processors in the same domain should have the same C-state, P-state, or T-state
  ○ No dependence between a processor’s C-state domain, P-state domain, or T-state domain
• OS control mechanisms based on dependency relationships
  ○ Dependent control: transitioning one processor to a new state causes other processor(s) to transition to the same state
  ○ Independent control: transitioning one processor to a new P-state or T-state does not affect other processors’ power states
  ○ Independent control, dependent behavior: transitioning one processor to a new P-state or T-state may or may not transition other processor(s) to the same state, based on the current state of the other processor(s) that share this relationship
P-States
• Windows processor performance states are enabled by default
• Power policy allows flexible use of performance states
  ○ Values for min / max processor speed
  ○ Expressed as a percentage of maximum processor frequency
  ○ Windows will round up to the nearest available state
• Processor- and workload-dependent impact
  ○ E.g., one system configuration was determined to have insignificant perf impact from capping P-states at P1, but significant power savings
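The round-up rule for min / max percentages can be sketched as follows. The list of available percentages is taken from the example policy tables in this deck and is illustrative only; the function name is hypothetical and this is not the Windows kernel's actual code.

```python
# Percent-of-max values of the P-states in the deck's example tables (illustrative).
AVAILABLE_STATE_PERCENTS = [100, 90, 85, 75, 60, 50]


def round_up_to_state(requested_percent: int,
                      available=AVAILABLE_STATE_PERCENTS) -> int:
    """Return the lowest available state at or above the requested percentage."""
    candidates = [p for p in available if p >= requested_percent]
    if not candidates:
        raise ValueError("requested percentage exceeds every available state")
    return min(candidates)


print(round_up_to_state(5))    # a 5% minimum lands on the lowest state
print(round_up_to_state(80))   # 80% is not available, so the next state up is used
```

With these example states, a policy minimum of 5% resolves to the 50% state and a request for 80% resolves to 85%.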
Demand Based Switching
• Power policy will always use DBS within the range defined by min / max frequency
  ○ Full range or subset of available P-states
  ○ Policy may be set to use only one performance state (min / max / intermediate)
• Will not include linear clock throttle states
P-State Policy: Frequencies - 1
Example: Processor state power policy
• Note: This is the default policy in WS08
• Intended to minimize performance hit

State  Freq  %    Type
0      2800  100  Performance
1      2520  90   Performance
2      2142  85   Performance
3      1607  75   Performance
4      964   60   Performance
5      482   50   Performance

Maximum Processor State
Minimum Processor State
P-State Policy Settings
Example: Processor state power policy
• Using a subset of available states
  ○ Can use any contiguous range
  ○ Some performance loss (may not be significant) unless P0 included (targets minimal perf hit)

State  Freq  %    Type
0      2800  100  Performance
1      2520  90   Performance
2      2142  85   Performance
3      1607  75   Performance
4      964   60   Performance
5      482   50   Performance

Maximum Processor State
Minimum Processor State
P-State Policy: Frequencies - 3
Example: Processor state power policy
• Locking processor at one state
  ○ Any available state may be selected
  ○ Some performance loss (may not be significant) unless P0 is the state chosen (a la High Perf mode)

State  Freq  %    Type
0      2800  100  Performance
1      2520  90   Performance
2      2142  85   Performance
3      1607  75   Performance
4      964   60   Performance
5      482   50   Performance

Min & Max Processor State
Setting P-State Parameters
Use Perfmon to Monitor P-State
Processor Performance / % of Max Frequency
T-States
• Linear clock throttle states (T-states)
  ○ Compared to P-states, T-states do not save energy when performing identical workloads
  ○ However, throttle states may be useful for some scenarios (thermal overload)
  ○ By default, WS08 uses T-states only if P-states are unavailable or in case of thermal overload
  ○ No DBS: only the Maximum Processor State parameter is used
T-State Policy: Frequencies
• Default use of linear throttle states
• Performance is directly affected by throttling

State  Freq  %    Type
0      2800  100  Performance
1      2520  90   Performance
2      2380  85   Performance
3      2100  75   Performance
4      1680  60   Performance
5      1400  50   Performance
6      1400  50   Throttle
7      1120  40   Throttle
8      840   30   Throttle
9      560   20   Throttle

DBS Allowed (Performance states)
No DBS Allowed (Throttle states)
Power Capping / Budgeting
• Enforcing per-server power limits (static or dynamic)
  ○ Calculations based on “plate rating” are often over-configured (stranded capacity)
  ○ OS may not be able to respond fast enough to enforce hard limits when power spikes
• Typically lower-power P-states attempted, then T-states engaged as necessary
  ○ OS might not get a good estimate of the resulting effective frequency
  ○ Monitoring applications and diagnostic tools may give incorrect data
  ○ Opposite strategy from OS, where P-states move towards higher performance modes when load increases
• Potentially huge (and potentially unexpected) hit in performance right when it is most vital
  ○ Sudden hardware throttling should be last resort
C-States
• Although hardware may support more than 3 C-states, Windows only utilizes a maximum of 3. But that doesn’t mean Windows only uses the first three hardware C-states:
  ○ C1 = hardware C1
  ○ C2 = hardware C? (lowest-power C-state with _CST of type 2)
  ○ C3 = hardware Cn
• Wouldn’t expect P-state to affect C-state power, but it does on some processors
  ○ WS08R2 handles this by providing the capability to drop to Pn before transitioning to C-state
Processor Power Management - 1
• CPUs have increasing number and ranges of P-states and C-states
• Ballpark expectations per socket:
  ○ A few watts per P-state
  ○ Tens of watts for lowest C-state(s)
• Varying impact to server throughput and responsiveness
• Mature, reliable technology
  ○ Significant deployments in mobile and desktops
Processor Power Management - 2
• No user intervention required
• Managed by the operating system
• Balances power savings with CPU utilization
  ○ Kernel selects target P-state based on processor utilization history, Windows power policies, thread scheduler, system heuristics, node/socket/HW thread hierarchy
  ○ Transition processor to “sleep” C-states when idle (i.e., no thread to run on that processor)
Processor Power Management - 3
• Windows’ power policy includes various parameters that influence how the kernel chooses target power states
• Low voltage/power processors must be evaluated and targeted for the right scenarios
  ○ Reduces OS power management flexibility
  ○ Additional servers are required if the workload is CPU-bottlenecked
Hardware Support
• The correctness of all PPM tools and settings relies on accurate hardware / firmware support
  ○ Broken BIOSes found in some previous-generation servers
  ○ Reporting
    ○ Initialization of ACPI tables (e.g., power states, memory and I/O controller locations)
    ○ P-state and C-state monitoring
  ○ Controlling
    ○ PPM algorithm depends on correct historical information
    ○ HW should comply/cooperate with OS power state requests
Processor Power Management: Working together with OEMs/IHVs - 1
• Hardware must support PPM capabilities
• ACPI namespace must describe capabilities and contain processor objects
• On a processor there may be multiple independently-managed power planes, potentially shared between components, such as:
  ○ Cores, caches, memory controllers, and bus/serial interface(s) to other processors or IO components
• The performance impacts of turning off various pieces of silicon must be carefully weighed and understood
  ○ Snooping caches must be flushed before being shut down
  ○ Memory or IO channels attached to a package must still be accessible by other packages
  ○ Bus/serial interfaces must be running for active caches, memory, or IO
  ○ Different components have different power-up delays from the various power states they support
Collaborative Power Budgeting
• Ideal WS08R2 strategy
  ○ Platform guarantees operation within the allocated budget (HW fail-safe)
  ○ OS scales power/perf according to workload and respects platform notifications
  ○ New R2 Beta option: OS specifies target utilization and HW selects P-states accordingly
• Otherwise, if the OS and HW are fighting for power management control, both power and performance will suffer
  ○ Hardware-directed power control settings are on by default in some BIOSes
Servers Defaulting to Hardware-Controlled Power Mgmt
• Hardware-directed power control settings are on by default in some BIOSes
  ○ Platform alters P-states, C-states, T-states, and/or D-states without OS information
    ○ One alternative is to have platform dynamically restrict the available states and update the OS via ACPI (<= 2 Hz)
  ○ May take over processor performance counters!
    ○ Obviously this is a big concern when using performance monitoring tools that utilize the on-CPU counters
Outline
• Motivation
• Background
  ○ ACPI Power States
  ○ Component Power
• Windows Server 2003 → 2008
• Windows Server 2008 R2
• Power Diagnostics and Control
• Summary
Component Power Metering
• Only a small set of server models provide component power reporting
• Extra HW instrumentation (or fragile probing) is needed to monitor component power usage on most platforms
• Simplest alternative is to populate and then take away any removable components and track the overall system power delta
Example Component Power Distribution #1 Idle 3-Year-Old 4-Socket Single-Core Server
Example Component Power Distribution #2
Idle 4-Socket Quad-Core Server
Example Component Power Distribution #3
• CPU (2): 46%
• Mobo, 8GB RAM: 18%
• PCI Cards (3): 17%
• SCSI HDD (4): 12%
• Other: 7%
Processor power management represents the best opportunity today
Source: Intel Server Products Power Budget Analysis Tool
http://www.intel.com/support/motherboards/server/sb/cs-016976.htm
Selecting Memory Components
• Lots of permutations for a given capacity
  ○ Family (e.g., DDR#)
    ○ FB DIMMs draw more power
  ○ DIMM count
    ○ Especially for FB, where bus may decrease frequency if enough DIMMs
  ○ Bus frequencies
  ○ Ranks
  ○ Density
  ○ Data width
  ○ Channel count
• Low power memory must be evaluated and targeted for the right scenarios
  ○ Additional servers are required if the workload is memory-bottlenecked
Memory Power Savings
• Select the right type and number of DIMMs for the workload
• Reduce memory accesses
  ○ Overall
    ○ Smaller working set
    ○ Better cache hit ratios
    ○ Probably better performance, too
• More memory power states
  ○ Compare server memory idle characteristics to mobile memory
  ○ Deeper self-refresh states
    ○ Takes memory longer to come out of deeper states
“Green Memory”

Tech  Marking   Datarate  Capacity  Density  DQ   Ranks  Power/DIMM
DDR2  PC2-5300  667 MHz   1GB       256Mb    x4   DR     18.1 W
DDR2  PC2-5300  667 MHz   1GB       256Mb    x8   QR     18.6 W
DDR2  PC2-5300  667 MHz   1GB       512Mb    x4   SR     7.6 W
DDR2  PC2-5300  667 MHz   1GB       512Mb    x8   DR     7.8 W
DDR2  ECC       667 MHz   1GB       1Gb      x16  DR     6.1 W
DDR2  No ECC    667 MHz   1GB       1Gb      x16  DR     5.5 W
No "by 16" part with 4Gb density
DDR2  PC2-5300  667 MHz   4GB       1Gb      x4   DR     14.0 W
DDR2  PC2-5300  667 MHz   4GB       1Gb      x8   QR     14.4 W
DDR2  PC2-5300  667 MHz   4GB       2Gb      x4   SR     8.6 W
DDR2  PC2-5300  667 MHz   4GB       2Gb      x8   DR     8.8 W
Networking Power
• NIC idle power (examples)
  ○ 100 Mb: 1 W
  ○ 1Gb: 5 W
  ○ Quad 1Gb: 5-9 W
  ○ 10Gb: 10-15 W
  ○ Quad 10Gb: 17 W
• Don’t forget network switch power
• Windows Networking Optimizations
  ○ NDIS DPC timer period
  ○ Wake-on-LAN (see content in WinHEC 2008)
  ○ Low Power on Network Disconnect
Hard Disk Power
• Decreasing radius
  ○ Cubic power relationship (Power ~ Radius^3)
  ○ 3.5” 15K RPM drive = ~12/18 W (idle/active)
  ○ 2.5” 15K RPM drive = ~6/9 W (idle/active)
• Decreasing rotational speed
  ○ Quintic power relationship (Power ~ RPM^5)
  ○ 15K RPM = 2 ms avg rotational delay (serial workload)
  ○ 10K RPM = 3 ms avg (~3-4 W idle)
  ○ 7.2K RPM = 4 ms avg (may have slower seek as well)
• Frequently spinning down enterprise drives not advisable (yet)
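The average rotational delays quoted above follow from the time a platter needs for half a revolution. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay in milliseconds: the time for half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


for rpm in (15_000, 10_000, 7_200):
    print(f"{rpm:>6} RPM -> {avg_rotational_delay_ms(rpm):.2f} ms")
```

The 7.2K RPM result is slightly above 4 ms, which the slide rounds down.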
Storage Controller Power
• HBA / storage connection interface
  ○ E.g., PCI-X and PCI-e cards 5-8 W idle
• Array controller
  ○ E.g., small SAN ctlr (2U) = 200/300 W (idle/active in direct attached mode)
• Disk interface
  ○ SCSI: 80 → 160 → 320 MB/s
  ○ FC: 1 → 2 → 4 → 8 Gb/s
  ○ SAS/SATA: 1.5 → 3.0 → 6.0 Gb/s
PCI-Express Power Management
• Support for Active State Power Management (ASPM)
  ○ A.k.a. Link State Power Management
  ○ In-box power policy for ASPM state
  ○ Requires OS control of PCI Express features
  ○ Available white paper
Power Supply Efficiency
• Power factor: phase delta between input voltage and current
  ○ Active Power Factor Correction (PFC)
• Efficiency: ratio of output to input power (AC → DC)
  ○ Entropy means 100% efficiency is unobtainable
  ○ Default supplies at 70%; new models up to 85%
• Previous power supplies were often optimized for high workload levels, but most servers run at 5-20% of capacity (for now)
• Decreases power without decreasing perf
Power Supply Efficiency: “80 Plus” Requirement for Energy Star (July ‘08)
• 80% minimum efficiency at 20%, 50%, and 100% of rated output
  ○ Previous power supplies often optimized for high loads, but most servers run at 5-20%
• Minimum power factor of 0.9 or greater at 100% of rated output
• Decrease power without decreasing perf
Power Supply Waste Power

Efficiency (%)             Output Power  Required Input Power  Waste Power  Waste Power Cost per Annum
70 (default)               500 W         714 W                 214 W        $183.15
80 (near 80 Plus Bronze)   500 W         625 W                 125 W        $106.98
85 (80 Plus Silver)        500 W         588 W                 88 W         $75.31
90 (above 80 Plus Gold)    500 W         555 W                 55 W         $47.07
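The waste-power figures can be reproduced with a few lines of arithmetic. The slide does not state the electricity price; the rate of about $0.0977 per kWh used below is back-calculated from the table and should be treated as an assumption, as should the truncation to whole watts.

```python
HOURS_PER_YEAR = 8760
ASSUMED_USD_PER_KWH = 0.0977  # assumption: inferred from the table, not stated on the slide


def psu_waste(output_w: int, efficiency_pct: int):
    """Return (input watts, waste watts, annual cost of the waste in USD)."""
    input_w = output_w * 100 // efficiency_pct  # truncated to whole watts, as the table appears to do
    waste_w = input_w - output_w
    annual_kwh = waste_w * HOURS_PER_YEAR / 1000
    return input_w, waste_w, round(annual_kwh * ASSUMED_USD_PER_KWH, 2)


for eff in (70, 80, 85, 90):
    print(eff, psu_waste(500, eff))
```

Moving a continuously loaded 500 W supply from 70% to 85% efficiency saves roughly 126 W of waste heat, which also reduces the cooling load.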
Fan Power
• Fans in some 1U servers consume 15-20% of overall system power
• Fixed vs. variable-speed fans
• Decrease power without decreasing perf
Outline
• Motivation
• Background
• Windows Server 2003 → 2008
  ○ Overview
  ○ Server Power Measurements
• Windows Server 2008 R2
• Power Diagnostics and Control
• Summary
Windows Server 2003
• ACPI 2.0 compliant
• Windows processor driver required for specific CPU make/model
• Requires selecting appropriate power policy
• Each system power policy includes a processor throttling policy
  ○ Highest (default), lowest, or full range of P-states
• OEMs or server administrators may create additional power plans
Windows Server 2008 - 1
• ACPI 2.0 and 3.0 compliant
• Native OS support for PPM on multiprocessor systems
• Default power settings refined for each release (including WS08R2)
• Windows Server 2008 & SP2
  ○ Simplified configuration model
  ○ Group Policy over power settings
  ○ Power management enabled by default (“Balanced Mode”)
Power Plans

Power Plan        Min P-state  Max P-state
Balanced          5%           100%
Power Saver       5%           50%
High Performance  100%         100%
Windows Server 2008 - 2
• T-states used only when no P-states available
• Power management parameterization for improved flexibility of P- and T-state algorithms
  ○ Additional tunings available for OEMs to customize to processor, chipset, platform, role, etc.
• Improved C3 support
• Very hard to generalize, but 2-10% improvement in power efficiency observed at mid-to-low utilization levels (vs. 2003)
Processor Power Management: Windows Server Releases
• Fully supported by WS03, WS08, and WS08R2
• Feature parity with Windows client operating systems
  ○ For example, WS08 has full support for:
    ○ ACPI 2.0, 3.0 processor objects, Notify() events
    ○ Power policy for tuning Operating System (OS) target state algorithms
    ○ Deep idle C-states
Default Power Parameters - 1
* = May not appear in Control Panel options by default
PPM parameters
Non-PPM parameters
Default Power Parameters - 2
How frequent
Change P-state or not
How to change
Default Power Parameters - 3
Entry idle, promote only
Deep idle, demote only
Idle Improvement Techniques
• Shut down unnecessary services, applications, roles, devices, drivers
• Avoid polling and spinning in tight loops
• Avoid high-res periodic timers (<10 ms)
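To illustrate the advice against polling, here is a minimal sketch of a consumer that blocks on an event instead of waking up repeatedly to check a flag. The producer is simulated with a timer and all names are hypothetical; the talk itself shows no code.

```python
import threading


def wait_for_work(ready: threading.Event, timeout_s: float = 5.0) -> bool:
    """Block until work is signalled; the thread sleeps instead of spinning."""
    # Anti-pattern avoided here: `while not flag: time.sleep(0.001)`, which
    # wakes the CPU about a thousand times per second even when idle.
    return ready.wait(timeout=timeout_s)


ready = threading.Event()
threading.Timer(0.05, ready.set).start()  # hypothetical producer signalling after 50 ms
signalled = wait_for_work(ready)
print("work signalled:", signalled)
```

A blocked thread generates no periodic wake-ups, so an otherwise idle processor can stay in a deep C-state.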
Outline
• Motivation
• Background
• Windows Server 2003 → 2008
  ○ Overview
  ○ Server Power Measurements
• Windows Server 2008 R2
• Power Diagnostics and Control
• Summary
Measuring Power
• Few existing Windows servers are equipped with comprehensive power metering capabilities
  ○ In the future, servers are likely to have onboard power meters
    ○ AC power (into the power supply)
    ○ DC power (out of the power supply)
    ○ For individual components (CPU, RAM, IO, fans, disks, …)
• The Windows Server Performance team has resorted to two strategies:
  ○ Metering at the wall (AC)
  ○ Directly probing specially manufactured server motherboards (solder and data acquisition)
Measuring Power Efficiency: Which Watts/Amps to measure?
• Total server (wall) power
• External power
  ○ Network switches/hubs
  ○ Storage (disks, array controllers, SANs)
  ○ Power distribution and conditioning
  ○ HVAC
• Internal component power
  ○ Processor package
    ○ Threads, cores, caches, memory controllers, cross-package interconnect controllers, IO controllers (e.g., PCI-E)
  ○ Memory (controllers, DIMMs, ranks, banks)
  ○ Chipsets (north bridge, south bridge, IO controllers)
  ○ Power supplies
    ○ AC in, multiple DC out
    ○ Redundant (active/active, active/passive)
  ○ IO (network, storage, video, USB)
    ○ Embedded components and expansion cards
  ○ Fans and other internal misc.
Measuring Power Efficiency
• Traditional performance benchmarks optimize for high throughput or low response time by using all resources
• The load line approach tracks power use as load varies
  ○ Pick a power point and see how much load can be handled
  ○ Pick a load point and see how much power is required
• Workload breadth
  ○ Database, web server, file server, etc.
• MS uses SPECpower (a la SPECjbb) and is adding customer-accepted performance benchmarks
  ○ TPC-C/E/H, SpecWeb, NetBench, SAP, SPEC, …
  ○ Semi-internal: FSCT, LCW2, Web Fundamentals, TermSrv, PerfGates, …
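The load line approach can be sketched as a small calculation over (load, throughput, power) samples. The measurements below are hypothetical placeholders, not data from the talk, and the overall score (total operations divided by total watts across all load levels, idle included) is modelled loosely on how SPECpower summarizes a run.

```python
# Hypothetical (load %, ops per second, average watts) samples; not data from the talk.
MEASUREMENTS = [
    (100, 200_000, 300.0),
    (50, 100_000, 240.0),
    (0, 0, 180.0),  # active idle: no work done, but power is still drawn
]


def load_line(measurements):
    """Ops per watt at each load level."""
    return [(load, ops / watts) for load, ops, watts in measurements]


def overall_ops_per_watt(measurements):
    """Total ops over total watts across all levels, idle included."""
    total_ops = sum(ops for _, ops, _ in measurements)
    total_watts = sum(watts for _, _, watts in measurements)
    return total_ops / total_watts


for load, eff in load_line(MEASUREMENTS):
    print(f"{load:3d}% load: {eff:7.1f} ops/W")
print(f"overall: {overall_ops_per_watt(MEASUREMENTS):.1f} ops/W")
```

Including the idle point in the total is what penalizes servers whose power does not fall with load.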
Measuring Power Efficiency: Which workloads to test?
• Workload breadth
  ○ Database, web server, file server, etc.
    ○ Need to prioritize based on potential for power savings and for broadest customer coverage
  ○ Each has unique “work accomplished” metrics (e.g., ops per second)
• Industry standard workloads, such as SPEC and TPC
• Custom workloads designed to test power scenarios
• Microsoft is currently using SPECpower and customer-accepted performance benchmarks to convey power efficiency
  ○ TPC-C/E/H, SpecWeb, NetBench, SAP, SPEC, …
  ○ Semi-internal: FSCT, LCW2, Web Fundamentals, TermSrv, PerfGates, …
Industry Standard Workloads
• SPEC
  ○ SPECpower is the only standardized benchmark at this point
    ○ Single workload defined to date: order processing for a wholesale supplier running typical Java business applications
    ○ Basically SPECjbb with some changes
    ○ Minimal I/O and kernel time
    ○ Other SPEC benchmarks could have a “power” version, and each one may or may not be modified from the “perf” version
• TPC
  ○ Could add a power metric to each of their existing benchmarks, but details are still being worked out
    ○ What is server power vs. storage power?
    ○ What needs to be installed in the audited server? I suspect they will stick to the same approach used for pricing, in that the system has to be available as a purchasable product
    ○ What about the “price” of power?
    ○ Etc.
Measuring Power Efficiency: Windows Server Performance Lab
• Methodology for obtaining power load line data for TPC-C, TPC-E, FSCT, and Web Fundamentals has been demonstrated
  ○ Benchmark loads varied by throttling number of active users
  ○ Multiple workloads tested in Hyper-V environment
• SPECpower has been successfully tuned
• Data has been gathered on 2-, 4-, and 8-socket systems with various processors
  ○ Wall-socket power measurements
  ○ Component power measurement by brute force (device extraction)
Varying Load Levels

Iteration  SPECpower (Reduce load)  TPC-E (Reduce users)  FSCT (Increase users)
1          100% load                100% of max users     0 users
2          90% load                 ~90% of max users     10% of max users
3          80% load                 ~80% of max users     20% of max users
4          70% load                 ~70% of max users     30% of max users
5          60% load                 ~60% of max users     40% of max users
6          50% load                 ~50% of max users     50% of max users
7          40% load                 ~40% of max users     60% of max users
8          30% load                 ~30% of max users     70% of max users
9          20% load                 ~20% of max users     80% of max users
10         10% load                 ~10% of max users     90% of max users
11         0% load                  0 users               100% of max users

Similar strategy used for Web Fundamentals
Testbed
[Diagram: Server, Storage (Database Workloads), Clients / Controller]
HW and SW Test Configurations
• Sample platforms
  ○ 2-socket and 4-socket quad-core
  ○ 8-socket dual-core
  ○ x64 (AMD and Intel); ia64
• Hardware- and software-controlled power management modes
• WS03, WS08, WS08SP2 (prerelease), and WS08R2 (prerelease)
• Windows power schemes
  ○ Balanced, High Performance, Power Saver, …
• P-State settings and heuristics
• C-State settings and heuristics
• Parameterized power management optimizations
  ○ E.g., core parking, tick skipping
SPECpower: WS03 and WS08
[Chart: Power (% of max watts) vs. workload (% of max ssj_ops) for W2K3.SP1, W2K8.RTM, and W2K8.SP2; 2 sockets, 8 cores total]
SPECpower & FSCT: WS03 and WS08
[Chart: SPECpower throughput and power at different workload levels on a 4-socket quad-core system. Power (% of maximum watts) vs. workload (% of maximum throughput), Windows Server 2003 vs. Windows Server 2008]
[Chart: FSCT throughput and power at different workload levels on a 2-socket dual-core system. Power (% of maximum watts) vs. workload (% of maximum throughput), Windows Server 2003 vs. Windows Server 2008]
TPC-E: WS03 and WS08
[Chart: TPC-E power usage at varying workload levels. Watts (% of maximum) vs. workload (% of maximum tpsE), Windows Server 2003 vs. Windows Server 2008]
[Chart: TPC-E power efficiency (tpsE/Watt) at varying workload levels. tpsE/Watt (% of maximum) vs. workload (% of maximum tpsE), Windows Server 2003 vs. Windows Server 2008]
OOB Windows Server 2008
[Chart: SPECpower throughput (ssj_ops) and power at varying workload levels. Power (% of maximum) and ssj_ops per Watt (% of maximum) vs. workload (% of max ssj_ops)]
[Chart: Processor utilization and frequency as SPECpower workload decreases over time. Average processor utilization and processor frequency (% of maximum) vs. time (minutes) with decreasing workload]
TPC-E: Windows Server 2008
[Chart: Distribution of P-states as workload decreases over time. Cumulative P-state distribution (P0, P1, P2, P3, P4, C1) vs. time (minutes)]
WS03/IIS6 and WS08/IIS7
4 quad-core CPUs, 16 GB, RAID-5 array

                               Measured   Projected
Server Config  Active Clients  Avg Watts  kWh / yr  Cost  Kg of CO2
WS03, IIS6     0               468        4100      375   3190
WS08, IIS7     0               457        4000      357   3110
WS03, IIS6     20              537        4700      430   3660
WS08, IIS7     20              500        4380      401   3410
Outline
• Motivation
• Background
• Windows Server 2003 → 2008
• Windows Server 2008 R2
  ○ Server Energy Vision
  ○ Idle Power Optimizations
  ○ Core Parking
  ○ Hyper-V (v2)
  ○ Power Metering and Budgeting
  ○ SSD
• Power Diagnostics and Control
• Summary
Windows Server Core Energy Vision
• Dynamic Data Center
  ○ Coordination across all data center components to scale infrastructure and computing according to business needs
• Scalable Node: Server power efficiency
  ○ Low idle power consumption
  ○ Power consumption should scale with load
Dynamic Data Center
• Holistic approach spanning all infrastructure, not just the computing nodes
• Reducing waste and optimizing performance
  ○ Scaling and migrating workloads
  ○ Coordination with power and cooling systems
  ○ Watch out for over-eager workload consolidation or low-power component acquisition
• Building platform and management infrastructure
Dynamic Data Center – The Problem
• Addressing energy consumption in the data center requires a holistic approach spanning all infrastructure, not just the computing nodes
• Many factors affect how a data center consumes energy
  ○ Hardware, workload, time of day/week/year, locality, etc.
  ○ Data centers are generally statically configured for peak load
• Tremendous opportunities for reducing waste and optimizing performance exist
  ○ Scaling and migrating workloads across groups of machines
  ○ Coordination with power and cooling systems
  ○ Opportunities also exist for unexpected reduction in computing capacity through over-eager workload consolidation or low-power component acquisition without proper planning / testing
Dynamic Data Center – The Vision
• Enable the management of aggregate servers in conjunction with data center infrastructure
• Deliver this through building platform and management infrastructure
  ○ Power metering and budgeting
  ○ Virtualization and workload migration
  ○ Standards-based management technologies
  ○ Coordination between in-band and out-of-band management systems
Scalable Node
• Today power consumption does not scale in line with server utilization
  ○ Typical commodity servers consume 50-70% of the maximum power when completely idle
  ○ Basic approaches:
    ○ Increase server utilization via virtualization
    ○ Reduce power when full performance not needed
    ○ Power down / put to sleep excess servers
• Work with partners to provide the best power and performance by managing the system efficiently
• Windows power management improvements
Scalable Node – The Problem
• Today power consumption does not scale in line with server utilization
  ○ Typical commodity servers consume 50-70% of the maximum power when completely idle
    ○ Idle servers have low efficiency due to high idle power
    ○ Efficiency rises with utilization due to idle power amortization
  ○ Tremendous opportunities exist for reducing energy needs
    ○ Reduce power when full performance is not required
    ○ Leverage virtualization solutions to increase server utilization
    ○ Power down servers when they are not needed
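Idle power amortization can be made concrete with a simple model. The sketch below assumes power rises linearly from an idle floor to the maximum as utilization goes from 0 to 100%; the linear shape and the 60% idle fraction (picked from the 50-70% range above) are illustrative assumptions, not measurements.

```python
def power_fraction(utilization: float, idle_fraction: float = 0.6) -> float:
    """Fraction of maximum power drawn, assuming a linear rise from the idle floor."""
    return idle_fraction + (1.0 - idle_fraction) * utilization


def relative_efficiency(utilization: float, idle_fraction: float = 0.6) -> float:
    """Work per watt relative to a fully loaded server (1.0 at 100% utilization)."""
    return utilization / power_fraction(utilization, idle_fraction)


for pct in (10, 25, 50, 75, 100):
    print(f"{pct:3d}% utilized -> {relative_efficiency(pct / 100):.2f} of peak efficiency")
```

Under these assumptions a server at 10% utilization delivers only about 16% of its peak work per watt, which is the motivation for both consolidation and lower idle power.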
Scalable Node – The Vision
• Work with partners to provide the best power and performance by managing the system efficiently
• Deliver this through improvements to Windows Power Management
  ○ Build on existing infrastructure and extend Windows value
  ○ Enhancements to processor power management
  ○ Focus on idle and low-to-medium workload levels
  ○ Support for device performance states
Windows Server 2008 R2 - 1
• Refined “Balanced Mode” defaults to optimize power efficiency
• Takes advantage of advances in server platform hardware (e.g., powering down individual cores or sockets)
• Configurable power settings for new features (e.g., core parking)
• P-state and C-state selection algorithms updated
• Increased support for joint OS/HW power management
Windows Server 2008 R2 - 2
• Simplified configuration model
• Group Policy control over all power settings
• Rich command line interface and refined UI elements
• In-band WMI power metering and budgeting support
• Remote manageability of power policy via WMI
• Additional qualification logo to indicate enhanced power management support

Windows Device Power Management
  Extensible power policy infrastructure
    Allows easy incorporation of power management-enabled devices
      ○ Device power settings integrate with Windows system power policy
      ○ Device power settings can appear in Advanced power UI
      ○ Rich notification support
    Allows for true OEM power management innovation and value

Enhanced Power Management Logo
  Additional Qualification logo for “Enhanced Power Management” that indicates support for the following:
    Processor power management through Windows
    Power metering and budgeting
    Power On/Off via WS-Management (SMASH)

Windows Server 2008 P-State Parameters

Balanced Mode Settings     WS08         R2 Pre-Beta   WS08R2
Time Check                 100 ms       100 ms        50 ms
Increase Time              100 ms       100 ms        50 ms
Decrease Time              300 ms       100 ms        50 ms
Increase Percentage        30%          70%           80%
Decrease Percentage        50%          30%           70%
Domain Accounting Policy   0 (On)       Always Off    Always Off
Increase Policy            IDEAL (0)    IDEAL (0)     SINGLE (1)
Decrease Policy            SINGLE (1)   SINGLE (1)    IDEAL (0)
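
To make the role of these parameters concrete, here is a rough sketch of how increase/decrease thresholds and policies could drive a P-state choice at each time check. It is not the Windows algorithm: the function name, the `speeds_pct` list, and the interpretation of IDEAL (the slowest state that keeps utilization at or below the increase threshold) are assumptions for illustration, and the Time Check / Increase Time / Decrease Time parameters are not modeled. The defaults use the 80% / 70% thresholds and SINGLE / IDEAL policies from the last column.

```python
def next_pstate(current, utilization_pct, speeds_pct,
                increase_pct=80, decrease_pct=70,
                increase_policy="SINGLE", decrease_policy="IDEAL"):
    """Pick the next P-state index; index 0 is the fastest state.

    speeds_pct lists each state's speed as a percentage of maximum, fastest first.
    """
    slowest = len(speeds_pct) - 1
    # Demand expressed as a percentage of maximum frequency.
    demand_pct = utilization_pct * speeds_pct[current] / 100.0

    def projected_utilization(idx):
        return demand_pct * 100.0 / speeds_pct[idx]

    if utilization_pct > increase_pct and current > 0:
        if increase_policy == "SINGLE":      # move one state faster
            return current - 1
        if increase_policy == "ROCKET":      # jump straight to the fastest state
            return 0
        # IDEAL (assumed): slowest faster state that brings utilization back under the threshold
        for idx in range(current - 1, -1, -1):
            if projected_utilization(idx) <= increase_pct:
                return idx
        return 0

    if utilization_pct < decrease_pct and current < slowest:
        if decrease_policy == "SINGLE":      # move one state slower
            return current + 1
        if decrease_policy == "ROCKET":      # jump straight to the slowest state
            return slowest
        # IDEAL (assumed): slowest state that still keeps utilization under the increase threshold
        best = current
        for idx in range(current + 1, slowest + 1):
            if projected_utilization(idx) > increase_pct:
                break
            best = idx
        return best

    return current


speeds = [100, 80, 60, 40]  # hypothetical P-state table
print(next_pstate(2, 90, speeds))  # busy at 60% speed: step one state faster
print(next_pstate(0, 20, speeds))  # lightly loaded at full speed: drop to an ideal state
```

With SINGLE on the way up and IDEAL on the way down, this toy policy ramps up one step at a time but drops quickly once load falls.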

Optimized for Low-to-Medium Loads
  Even though 100% utilization may have the highest power efficiency, few servers run at full capacity
    Servers at maximum utilization provide fewer opportunities for power optimizations
  In the short term, targeting low utilization servers will provide most benefit
  In the medium term, targeting medium utilization servers will provide increased benefit
    E.g., consolidation and virtualization will increase average utilization levels

Outline
  Motivation
  Background
  Windows Server 2003 2008
  Windows Server 2008 R2
    Server Energy Vision
    Idle Power Optimizations
    Core Parking
    Hyper-V (v2)
    Power Metering and Budgeting
    SSD
  Power Diagnostics and Control
  Summary

Get Idle; Stay Idle
  Shut down unnecessary services, applications, roles, devices, drivers
  Avoid polling and spinning in tight loops
  Avoid high-res periodic timers (<10 ms)
  Timer Coalescing
  Intelligent Timer Tick Distribution (ITTD)
  Use NUMA-based affinity for threads and interrupts
    Thread (via APIs and tools): soft (IdealProc), hard (affinity mask)
    Interrupts (via IntPolicy.exe)
  Idle improvements extend to Hyper-V
    Significant reduction in platform interrupt activity
    Enables power savings and greater scalability

Timer Coalescing
  Platform energy efficiency can be improved by extending idle periods
    New timer coalescing API enables callers to specify a tolerance for due time
    Enables the kernel to expire multiple timers at the same time
  Extensions should integrate with WS08R2 API/DDI
  [Figure: periodic timer events relative to the 15.6 ms timer tick, Windows 7 versus Vista]
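
The idea of expiring several timers together can be sketched with a generic greedy grouping. This is an illustrative model only, not the kernel's implementation: each timer is assumed to be allowed to fire anywhere between its due time and its due time plus the caller's tolerance, and the function name and tuple layout are hypothetical.

```python
def coalesce(timers):
    """Group timers so each group can expire at one shared wake-up.

    timers: list of (due_ms, tolerance_ms); a timer may fire anywhere in
    [due_ms, due_ms + tolerance_ms].
    Returns a list of (wake_time_ms, [timer indexes]).
    """
    # Visit timers in order of their latest allowed expiry.
    order = sorted(range(len(timers)), key=lambda i: timers[i][0] + timers[i][1])
    wakeups = []
    for i in order:
        due, tolerance = timers[i]
        if wakeups and due <= wakeups[-1][0]:
            # The last scheduled wake-up already falls inside this timer's window.
            wakeups[-1][1].append(i)
        else:
            # Schedule a new wake-up as late as this timer tolerates.
            wakeups.append((due + tolerance, [i]))
    return wakeups


timers = [(10, 0), (12, 5), (15, 10), (40, 3)]
for wake_time, members in coalesce(timers):
    print(f"wake at {wake_time} ms for timers {members}")
```

In this example four timers need only three wake-ups; larger tolerances let more timers share a wake-up and lengthen the idle periods in between.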

Intelligent Timer Tick Distribution (Tick Skipping)
  Extend processor sleep states by not waking the CPU unnecessarily
  CPU 0 handles the periodic system timer tick; other processors are signaled as necessary
  Non-timer interrupts will still wake sleeping processors
  Not available on IA64
  Only enabled on systems with more C-states than just C1

Background Process Management
  Background activity on the macro scale (minutes, hours) is also important for power
    E.g., disk defragmentation, AV scans
    Prevents low-power idle and sleep modes
    Will collapsing multiple background activities result in a significantly heavier load during that interval and thus potentially impede concurrent foreground activity?
  Unified Background Process Manager (UBPM)
    New WS08R2 infrastructure
    Drives scheduling of services and scheduled tasks
    Transparent to users, IT pros, and existing APIs
    Enables trigger-starting services
    Delivers usage data and metrics to Microsoft via CEIP

UBPM: Trigger-Start Services
  Many services configured to Autostart and wait for rare events
  UBPM enables Trigger-Start services based on environmental changes
    Device arrival/removal, IP address change, domain join, etc.
    Examples
      ○ Bluetooth service is started only if a Bluetooth radio is currently attached
      ○ BitLocker encryption service started only when new volumes detected
  ISV Call to Action
    Leverage trigger-start capability for value-add services
    Validate performance impact with XPerf tools
      ○ Performance impact can be positive or negative

Coordinated Processor Clocking Control
  New processor performance state interface described via ACPI
  Feature enables OS and HW platform coordination of processor power management
    Platform is in direct control of T-states and P-states
    OS dynamically specifies processor performance requirements on per-processor basis as a percentage of maximum frequency
    Platform is responsible for delivering requested performance
      ○ In some cases, like a power budget condition, the platform may underdeliver, but must report this

Processor Core Parking
  This is a Windows scheduler optimization, not HW!
  Goals
    Save power on multi-processor systems by dynamically scaling number of active cores to match workload
    Drop parked cores into deepest C-states
  Approach
    Use historical information to predict future workload
    Calculate number of cores needed
    Heuristically select the “unparked” cores
  Monitoring
    Perfmon and ETW
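
The first two steps of the approach (predict from history, then calculate the number of cores needed) might look roughly like the sketch below. The predictor (peak of recent samples), the `target_utilization` headroom, and all names are assumptions for illustration; the slides do not describe the actual heuristics.

```python
import math


def cores_needed(busy_core_history, total_cores, target_utilization=0.5, min_cores=1):
    """Estimate how many cores to leave unparked.

    busy_core_history: recent samples of total load in "busy cores"
    (e.g. 1.5 means one and a half cores' worth of work).
    """
    if not busy_core_history:
        return total_cores  # no history yet: stay conservative, park nothing
    # Assumed predictor: the next interval looks like the recent peak.
    predicted = max(busy_core_history)
    # Leave headroom so unparked cores run at or below the target utilization.
    needed = math.ceil(predicted / target_utilization)
    return max(min_cores, min(total_cores, needed))


print(cores_needed([0.5, 1.5, 1.0], total_cores=8))
```

A lower `target_utilization` keeps more cores unparked and responds better to rising load; a higher one parks more aggressively, which mirrors the power versus responsiveness trade-off the deck describes.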

Processor (Logical) Core Parking
  Logical core = HW thread (e.g., Intel® Hyperthreading)
  Extension of Windows’ processor performance state engine
    Configurable via power policy settings
  Parking may reduce performance, depending on the parameter settings, by reducing OS responsiveness to rising load levels
  Parking could improve performance by concentrating work onto a smaller number of cores

Selecting Cores to Park - 1
  WS08R2 (Beta) approach:
    Leave one logical core unparked per NUMA node
  Other possible approaches, including customizable minimum unparked entities
    Park entire packages at once
    Park logical cores individually, regardless of packages
    Leave one logical core unparked per socket
    Leave one logical core unparked per physical core
  Affinitized activity does tend to unpark logical cores that must be used (selection heuristic)
    Beta tracks affinitized threads, not DPCs / Interrupts
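
A selection step along these lines could be sketched as follows. It assumes the Beta rule quoted above (at least one logical core unparked per NUMA node) plus a simple preference for cores that have affinitized threads; the function, its arguments, and the lowest-numbered-first tie-break are hypothetical.

```python
def select_unparked(core_to_node, needed, affinitized=()):
    """Choose which logical cores stay unparked.

    core_to_node: dict mapping logical core id -> NUMA node id.
    needed: number of cores the workload prediction asks for.
    affinitized: cores that threads are hard-affinitized to.
    """
    # Cores with affinitized threads must remain usable.
    unparked = {core for core in affinitized if core in core_to_node}
    covered_nodes = {core_to_node[core] for core in unparked}
    # Keep at least one logical core unparked per NUMA node.
    for core in sorted(core_to_node):
        node = core_to_node[core]
        if node not in covered_nodes:
            unparked.add(core)
            covered_nodes.add(node)
    # Fill up with the lowest-numbered cores until the target is met.
    for core in sorted(core_to_node):
        if len(unparked) >= needed:
            break
        unparked.add(core)
    return sorted(unparked)


topology = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}  # 8 cores, 2 NUMA nodes
print(select_unparked(topology, needed=3))
```

Note that the per-node minimum can leave more cores unparked than the prediction asks for, for example two cores on a two-node system even when one would carry the load.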

Selecting Cores to Park - 2
  Parking algorithm takes many inputs. At a minimum:
    Time since the last parking decision was made
    Average frequencies of each core over the last time interval
    Average CPU “utilization” over the last time interval
  Possible additional inputs depending on parameter setting and final WS08R2 refinements:
      ○ Power state domains (i.e., groups of associated cores)
      ○ Current processor P-States
      ○ P-State change rate policies (SINGLE, ROCKET, IDEAL)
      ○ Affinitized DPCs / Interrupts
      ○ Time spent in affinitized activity
      ○ More comprehensive or longer historical information
      ○ More system component topology information

Hyper-V Power Management
  Full P-state/C-state management already integrated between Windows root partition and Hyper-V v1 (WS08)
  Enlightenments added in Hyper-V v2 (WS08R2)
    Hypervisor delivers child clocks without requiring root interaction, plus Intelligent Timer Tick Distribution (to children)
    Core parking enabled for all partitions

Web Fundamentals Dynamic: WS08
  [Figure: two panels, “Adding load to each guest” and “Adding guests”, each plotting Watts and Throughput (Reqs / Sec) against System Utilization Percentage]
  For these experiments, the highest system utilization under WF workload is ~80%. This issue has been subsequently resolved.

SPECpower: WS08
  [Figure: two panels, “Adding load to each guest” and “Adding guests”, each plotting Watts and Throughput (in Thousands) against System Utilization Percentage]

SPECpower + WF (WS08)
  [Figure: SPECpower throughput and server power usage versus total system utilization; Watts and Throughput (in Thousands) against System Utilization Percentage]
  [Figure: Power usage for various throughput levels; Power (% of Watts) against Workload (% of maximum throughput)]
  • 4 Guests running WF (4940 requests/sec)
  • ~25% system utilization; ~35% guest virtual processor utilization
  • 4 Guests running SPECpower (similar efficiency as single workload)

Power Metering and Budgeting
  In the future, servers are likely to have onboard power meters
    AC power (into the power supply)
    DC power (out of the power supply)
    For components (CPU, RAM, IO, fans, disks, …)
  WS08R2 provides the capability to monitor such meters, as well as communicate with power management logic, through standard Windows and ACPI interfaces
  Power budget information is reported to OS
    Optional support for configuring the budget from within Windows

Power Metering and Budgeting
  [Diagram: power metering architecture; elements shown]
    WMI Consumers: System Center, Admin scripts, Management tools, …
    WMI Namespace root\cimv2\power: Power Supply class, Power Meter class, Power Meter Events
    User-mode Power Service: Power WMI providers
    Standard Windows IOCTL interface
    In-box ACPI-based implementation (vendors provide ACPI code in firmware)
    Other vendor specific implementations…
    Hardware: BMC hardware
    Label: “Implemented in WS08R2”

Power Metering and Budgeting
  WS08R2 introduces the ability to report power consumption and budgeting information
    Server platform reports this in-band to the OS via ACPI
    No additional drivers or HW changes are required, only platform support
  Power information is exposed via WMI
    Adheres to the DMTF Power Supply Profile v1.01
  Power budget information is reported to the OS
    Optional support for configuring the budget from within Windows
  Extendable to enable per-device metering
    WDM driver interface available
  Design goals
    Standard hardware and software interfaces
    Native infrastructure, easily extendable
    Leverages existing platform technology

Power Metering and Budgeting – Usage
  Statistical/inventory/auditing
  Data center can monitor power consumption across nodes
  Administrator can write scripts to control power policies and receive power condition events
  Model can be extended to per-device meters
  Another set of metrics for virtualization and consolidation

Power Metering and Budgeting – WDM
  Standard Windows driver IOCTL interface
  Event model based on pending IO requests (IRPs)
  Two separate device interfaces
  Consumed by the WMI providers
  An alternative to the ACPI implementation
  Future direction – potentially consumed by the kernel power manager
  Documented on MSDN

Power Metering and Budgeting – ACPI
  Rationale
    Works as the abstraction layer to the underlying platform technology (IPMI, WSMAN, etc.)
    Scales across different platforms
    Does not require special drivers
    Requires only firmware updates
  Currently being proposed to the ACPI 4.0 specification
  Delegate tasks to the BMC (e.g., rolling average calculation, polling for events, etc.)

Power Metering and Budgeting – ACPI
  Power supply device
    Extends the current power source device
    Control method to publish capabilities
  Power meter device
    Similar to control method batteries
    A set of control methods to get capabilities and set configuration parameters, trip points, and configure hardware enforced limits
    Event notification via Notify codes
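
The interplay of a rolling average, trip points, and event notification can be modeled in a few lines. This is a toy model for intuition only: it does not use ACPI, IPMI, or WMI, and the class, method, and event names are hypothetical. In the design described here, this kind of work is delegated to the BMC and surfaced through control methods and Notify codes.

```python
from collections import deque


class PowerMeterModel:
    """Toy power meter: rolling average over a window, with two trip points."""

    def __init__(self, window, lower_trip_w, upper_trip_w):
        self.samples = deque(maxlen=window)  # oldest reading drops out automatically
        self.lower_trip_w = lower_trip_w
        self.upper_trip_w = upper_trip_w

    def average(self):
        """Rolling average in watts, or None before the first reading."""
        return sum(self.samples) / len(self.samples) if self.samples else None

    def add_sample(self, watts):
        """Record a reading; return an event name when a trip point is crossed."""
        previous = self.average()
        self.samples.append(watts)
        current = self.average()
        if current > self.upper_trip_w and (previous is None or previous <= self.upper_trip_w):
            return "upper_trip_crossed"
        if current < self.lower_trip_w and (previous is None or previous >= self.lower_trip_w):
            return "lower_trip_crossed"
        return None


meter = PowerMeterModel(window=3, lower_trip_w=200, upper_trip_w=300)
for reading in (250, 320, 360, 360):
    print(reading, meter.average(), meter.add_sample(reading))
```

Reporting only on crossings, rather than on every sample above the limit, keeps the OS from being notified repeatedly while the average stays on one side of a trip point.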

Power Metering and Budgeting – ACPI
  WS08R2 will provide
    In-box driver to support power meter device(s) described in ACPI
    In-box IPMI operation region handler as part of the Microsoft IPMI driver – allowing ACPI control methods to communicate with IPMI using the KCS protocol
      ○ Format similar to the SMBUS OpRegion
      ○ 3rd-party IPMI drivers can register OpRegion handler for other IPMI protocol(s)
      ○ Also proposed to ACPI 4.0 specification

WS08R2 Enables Improved Endurance for SSD Technology
  SSD can identify itself differently from HDD in ATA as defined through ATA8-ACS Identify Word 217: Nominal media rotation rate
  Reporting non-rotating media will allow WS08R2 to set Defrag off as default; improving device endurance by reducing writes

WS08R2 Enables Optimization for SSD Technology
  Microsoft implementation of “Trim” feature
    NTFS will send down delete notification to the device supporting “trim”
      ○ File system operations: Format, Delete, Truncate, Compression
      ○ OS internal processes: e.g., Snapshot, Volume Manager
  Three optimization opportunities for the device
    Enhancing device wear leveling by eliminating merge operation for all deleted data blocks
    Making early garbage collection possible for fast write
    Keeping device’s unused storage area as high as possible; more room for device wear leveling.

Outline
  Motivation
  Background
  Windows Server 2003 2008
  Windows Server 2008 R2
  Power Diagnostics and Control
    Perfmon/Resmon
    Pwrtest
    Powercfg
  Summary

Check Processor ACPI States: System Event ID 4

Check Power State Settings
  Kernel Debugger
    !ppmperf
      Provides P-state and T-state information
    !ppmidle
      Provides C-state information

Monitoring Power Status - 1
  System Event Log: ID 4
  Perfmon/Logman
    Processor
      ○ Provide average C-state information
        % C1/2/3 Time and C1/2/3 Transitions/sec
    Processor Information
      ○ Parking status
    Processor Performance
      ○ Only present if P-states are exposed
      ○ Provide current P-state information (e.g., avg freq)
  Resource Monitor
    CPU % Max Frequency average and graph

Perfmon: Processor Frequency

Perfmon: Processor Frequency vs. Utilization

Resmon: Processor Frequency

Monitoring Power Status - 2
  ETW tracing (Windows Perf Tool Kit)
    Xperf -on power
  Pwrtest.exe
    Logs use of P-, T-, and C-states
    Pwrtest /ppm
      ○ Sampling P-state and C-state performance
    Pwrtest /ppm /live
      ○ Event driven logging for all the P-state and C-state transitions

Pwrtest.exe /info:ppm
C:\Program Files\Microsoft PwrTest> pwrtest /info:ppm

PROCESSOR_POWER_INFORMATION
  CPU Number       = 0
  MaxMhz           = xxxx
  CurrentMhz       = yyyy
  MhzLimit         = zzzz
  MaxIdleState     = M
  CurrentIdleState = N

InstanceName: CPU Model X

(continued)
Processor Performance States
  PerfStates:
    Max Transition Latency: xx us
    Number of States: yy

    State  Speed (Mhz)   Type
    0      aaaa (100%)   Perf
    1      bbbb ( ss%)   Perf
    2      cccc ( tt%)   Perf
    3      dddd ( uu%)   Throttle
    4      eeee ( vv%)   Throttle
    5      ffff ( ww%)   Throttle

Pwrtest.exe in Logging Mode - 1
C:\Program Files\Microsoft PwrTest> pwrtest /ppm

     Elapsed  Idle  C1   C2   C3   P-     Freq  Freq   Perf/
Cpu  [ms]     [%]   [%]  [%]  [%]  State  [%]   [MHz]  Throttle
---  -------  ----  ---  ---  ---  -----  ----  -----  --------
0    5007     98    0    73   26   2      54    1000   P
1    5007     99    0    93   6    2      54    1000   P
0    10014    97    0    72   27   2      54    1000   P
1    10014    97    0    91   8    2      54    1000   P
0    15021    88    1    0    0    2      54    1000   P
1    15021    89    1    0    0    2      54    1000   P
0    20028    99    0    0    100  2      54    1000   P
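
Sampled output like this is easy to post-process. The following is a small sketch that averages the Idle [%] column per CPU; it assumes the ten whitespace-separated columns shown on this slide (real pwrtest output may differ between versions), and the helper name is hypothetical.

```python
from collections import defaultdict

# Data rows copied from the slide:
# Cpu, Elapsed [ms], Idle [%], C1 [%], C2 [%], C3 [%], P-State, Freq [%], Freq [MHz], Perf/Throttle
LOG = """\
0 5007 98 0 73 26 2 54 1000 P
1 5007 99 0 93 6 2 54 1000 P
0 10014 97 0 72 27 2 54 1000 P
1 10014 97 0 91 8 2 54 1000 P
0 15021 88 1 0 0 2 54 1000 P
1 15021 89 1 0 0 2 54 1000 P
0 20028 99 0 0 100 2 54 1000 P
"""


def average_idle_by_cpu(text):
    """Return {cpu: mean Idle [%]} from whitespace-separated pwrtest /ppm rows."""
    idle = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10 or not fields[0].isdigit():
            continue  # skip headers, separators, and blank lines
        idle[int(fields[0])].append(int(fields[2]))
    return {cpu: sum(values) / len(values) for cpu, values in idle.items()}


print(average_idle_by_cpu(LOG))  # → {0: 95.5, 1: 95.0}
```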

Pwrtest.exe in Logging Mode - 2
C:\Program Files\Microsoft PwrTest> pwrtest /ppm /live
Waiting for PPM Events. Press 'Ctrl-C' to quit...

Timestamp     Proc#  Event Information
-------------------------------------------------------------------------------
21:27:41.133  0      Idle State Demotion (Old:2, New:1, Affinity:0x1)
21:27:41.133  1      Idle State Demotion (Old:2, New:1, Affinity:0x2)
21:27:41.196  1      Perf State Change (State:0, Speed:1833 Mhz)
21:27:41.196  1      Domain Perf State Change (State:0, Speed:1833 Mhz, Affinity:0x3)
21:27:41.196  0      Idle State Demotion (Old:1, New:0, Affinity:0x1)

Setting P-State Parameters

Power Controls: Powercfg.exe
  Configure power settings within a specific power scheme (WS03+)
  WS08R2: Detect common energy efficiency problems (via /ENERGY flag)
    USB device selective suspend
    Processor Power Management (PPM)
    Inefficient power policy settings
    Platform timer resolution
    Platform firmware problems
    …and more

Powercfg.exe Example
  Configure power setting within a specific power scheme
    Set AC, DC values for individual settings
    Every power setting belongs to a Subgroup
    -setdcvalueindex used for battery scenario

C:\> powercfg.exe -setacvalueindex <SCHEME> <SUBGROUP> <SETTING> <VALUE>
C:\> powercfg.exe -setacvalueindex SCHEME_BALANCED SUB_SLEEP STANDBYIDLE 0
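
For scripted deployment, the powercfg invocation from the example slide can be wrapped in a few lines. This is a minimal sketch: the helper names are hypothetical, the aliases are the ones shown on the slide, and the command is only printed unless `dry_run` is turned off (running it needs a Windows machine and will likely need administrative rights).

```python
import subprocess


def setacvalueindex_command(scheme, subgroup, setting, value):
    """Build the argument list for the powercfg invocation shown on the slide."""
    return ["powercfg.exe", "-setacvalueindex", scheme, subgroup, setting, str(value)]


def apply(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)


# Same setting as the slide: AC standby idle timeout of 0 in the Balanced scheme.
apply(setacvalueindex_command("SCHEME_BALANCED", "SUB_SLEEP", "STANDBYIDLE", 0))
```

Passing the arguments as a list avoids shell quoting problems when scheme or setting identifiers are supplied as GUIDs.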

Power Efficiency Diagnostics
  “Powercfg /ENERGY” to start tracing
    Close open applications and documents first
    Inbox with WS08R2 only
  Leverages new inbox ETW instrumentation
    Advanced users can run utility and view HTML output
    Automatically executed when the system is idle [Win7]
    Reports data to Microsoft via Customer Experience Improvement Program (CEIP)
  Attend COR-C633 Microsoft Tools for Energy Efficiency Diagnostics for demo and details

Power Efficiency Diagnostics: Detected problems

Problem Area: USB Device Selective Suspend
  Data Collected: Individual device suspend transitions; % of time device was in suspend state
  Warning Threshold: < 80% suspend time
  Error Threshold: < 50% suspend time
Problem Area: Power Policy Settings
  Data Collected: Idle timeouts (dim, display, sleep); PPM configuration; Power plan personality; 802.11 Wireless Power Save
  Warning Threshold: Idle timeouts < EnergyStar 4.0 Recommendations
  Error Threshold: Idle timeouts disabled
Problem Area: Processor Utilization
  Data Collected: Overall utilization; Per-process utilization (any process over .1%); Top 3 module utilization in each process
  Warning Threshold: Total utilization > 2%
  Error Threshold: Total utilization > 4%
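
The numeric thresholds in this table translate directly into simple checks. The sketch below applies the USB suspend-time and processor-utilization thresholds quoted above; the function names and the "ok" / "warning" / "error" labels are hypothetical and are not the output format of the diagnostics report.

```python
def classify_usb_suspend(suspend_time_pct):
    """Severity for a USB device from the % of time it spent suspended."""
    if suspend_time_pct < 50:
        return "error"
    if suspend_time_pct < 80:
        return "warning"
    return "ok"


def classify_cpu_utilization(total_utilization_pct):
    """Severity for total processor utilization observed during the trace."""
    if total_utilization_pct > 4:
        return "error"
    if total_utilization_pct > 2:
        return "warning"
    return "ok"


print(classify_usb_suspend(65), classify_cpu_utilization(1.5))
```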

Power Efficiency Diagnostics: Detected problems

Problem Area: Timer Resolution Requests
  Data Collected: Current system timer interrupt period (e.g., 15.6ms); Applications with outstanding timer requests, request amount
  Warning Threshold: None
  Error Threshold: Timer interrupt period < 15.6ms
Problem Area: Platform Capabilities
  Data Collected: Firmware validation problems; PCI Express ASPM status
  Warning Threshold: None
  Error Threshold: If any capability is disabled or missing

Lab Issues: Processor Utilization is Based on Non-Idle Wall Time
  Idle == idle loop or HALT
  It doesn’t take frequency into account, so 100% CPU utilization could be at P0 or at Pn
    There may actually be more performance on the table
  Idle time will include the time taken to return from C-states (HALT), which could be microseconds
  CPU utilization will include cache warm-up effects if the cache has been flushed to reach the deepest C-states
  CPU utilization will include latencies caused by remote memory being in low-power states
    In particular, AMD and future Intel processors where memory is socket-attached
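
One way to account for frequency when reading utilization counters is to scale non-idle time by the fraction of maximum frequency the processor ran at. This is a simplified sketch with a hypothetical helper; it assumes work scales linearly with clock speed, which ignores the cache and memory effects listed above.

```python
def capacity_used_pct(utilization_pct, current_mhz, max_mhz):
    """Scale non-idle wall time by the frequency the processor ran at."""
    if max_mhz <= 0 or not 0 <= current_mhz <= max_mhz:
        raise ValueError("expected max_mhz > 0 and 0 <= current_mhz <= max_mhz")
    return utilization_pct * current_mhz / max_mhz


# A core reported as 100% busy while held at 1000 MHz on an 1833 MHz part
# (the frequencies that appear in the pwrtest examples) is using only about
# half of its full-speed capacity.
print(round(capacity_used_pct(100, 1000, 1833), 1))
```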

Lab Issues: OS vs. HW C-States
  Only three C-states selected by the OS:
    C1: C1 in HW
    C2: lowest power “type 2” C-state reported by HW
    C3: Cn in HW
  Perfmon shows OS perspective of C-states

Summary
  Windows Server 2008 and 2008 R2 deliver real energy savings for the data center
  New WS08R2 features deliver enhanced power efficiency and better manageability
    Improvements to idle and low-to-medium workload operating efficiency
    Management of power policy via WMI
    Power metering support provides energy consumption information through Windows

Future Work Example: NonVolatile Memory (NVM)
  Solid State Disk (current server usage)
  Potential additional layer(s) in memory hierarchy
    Cache (a la ReadyBoost)
    DRAM complement
  Very low power when idle
    But low-power DRAM may narrow the gap significantly
  Poor performance of random writes
    Could be improved by coalescing and remapping writes
  Block orientation
    Difficult to use as DRAM complement
  Limited lifetime of Flash cells
    Future NVM technologies may improve on this

Call to Action - 1
  Make sure any reduction in server capabilities is a planned-for and acceptable tradeoff between power and performance (e.g.)
    TANSTAAFL, Do More With Less
    Reduce idle activity and power consumption
    Validate new platform power management using Power Efficiency Diagnostics
  ISV/IHV Call to Action for Power: eliminate activity during workload idle periods in applications and drivers
    Target average idle period at minimum >100ms
    Provide software with adjustable tradeoffs between power and performance when appropriate

Call To Action - 2
  Build power efficient platforms and solutions
  Expose complete processor (and memory and device) information from BIOS
  Ensure drivers and applications work with core parking enabled
  Speak with Microsoft about creating ACPI-based power meter and supply devices
  Get the Enhanced Power Management logo
  Review microsoft.com power whitepapers and presentations

The Power of WinHEC 2008!
  COR-T540  Windows 7 Power Management Overview
  MBL-T541  Improving Platform Energy Efficiency (part 1)
  MBL-T541  Improving Platform Energy Efficiency (part 2)
  COR-C622  Discussion: Windows 7 Power Management
  COR-T542  NDIS 6.20: Core Network Power Management Fundamentals
  COR-C633  Microsoft Tools for Energy Efficiency Diagnostics
  ENT-T551  Windows Server Power Management Overview
  ENT-T552  Windows Server Power Management Implementation Details
  ENT-C630  Windows Server and Intel® Dynamic Power Technology for Data Centers
  COR-S559  Power-Performance Benchmarks, AMD, and Scalable Windows with HP Integrity Servers, HP

Additional Resources
  WDK available with pre-Beta
  Web Resources:
    White papers and presentations at www.microsoft.com (search on “power”)
      ○ http://www.microsoft.com/whdc (search on “power”)
        Windows Hardware Developer Central – Power Management: …/whdc/system/pnppwr/
        Processor Power Management in Windows Vista and Windows Server 2008: …/whdc/system/pnppwr/powermgmt/ProcPowerMgmt.mspx
        ACPI / Power Management: …/whdc/system/pnppwr/powermgmt/default.mspx
        Recommendations for Power Budgeting with Windows Server: …/whdc/system/pnppwr/powermgmt/Svr_PowerBudget.mspx
        Active State Power Management in Windows Vista: …/whdc/connect/pci/aspm.mspx
      ○ Windows Server 2008 Power Savings: http://download.microsoft.com/download/4/5/9/459033a1-6ee2-45b3-ae76-a2dd1da3e81b/Windows_Server_2008_Power_Savings.docx
      ○ Designing Efficient Background Processes for Windows (Trigger-Start Services): http://go.microsoft.com/fwlink/?LinkId=128622
    ACPI Specifications: http://www.acpi.info
    80 Plus Program for power supplies: http://www.80plus.org
    Energy Star Power Supply Specification Draft: http://www.energystar.gov/ia/partners/prod_development/new_specs/downloads/Draft1_Server_Spec.pdf
  E-mail: Server Power Feedback alias [email protected]

Sources
  Estimating Total Power Consumption by Servers in the U.S. and the World – Jonathan G. Koomey, Ph.D.
    http://enterprise.amd.com/Downloads/svrpwrusecompletefinal.pdf
  Bureau of Labor Statistics
    http://data.bls.gov/cgi-bin/cpicalc.pl
  US Energy Information Administration
    http://www.eia.doe.gov/fuelelectric.html
  AFCOM Data Center Institute’s Five Bold Predictions, 2006
    http://www.afcom.com/News_Releases/Afcom_In_The_News_05010601.asp
  Intel Server Products Power Budget Analysis Tool
    http://www.intel.com/support/motherboards/server/sb/cs-016976.htm
  Data center TCO benefits of reduced air flow -- Malone, Vinson, and Bash
  Various Gartner press releases
  Aperture Research Institute
  EYP Mission Critical Facilities Inc.
  Power In, Dollars Out: How to Stem the Flow in the Data Center
    http://www.microsoft.com/whdc/system/pnppwr/powermgmt/Svr_Pwr_ITAdmin.mspx

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Find references for the following figures and include in the Resources slide

15 MW Datacenter Monthly Costs
  “Good” (PUE=1.7) Internet-scale datacenter with DAS
    Servers: $3,000,000
    Infrastructure: $1,800,000
    Power: $1,000,000
  3 yr server and 15 yr infrastructure amortization
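
The relative weight of each cost component follows from simple arithmetic on the three figures above. The sketch below only restates the slide's numbers as shares of the monthly total; the variable and function names are hypothetical.

```python
MONTHLY_COSTS = {"Servers": 3_000_000, "Infrastructure": 1_800_000, "Power": 1_000_000}


def cost_shares(costs):
    """Return each component's fraction of the total monthly cost."""
    total = sum(costs.values())
    return {name: amount / total for name, amount in costs.items()}


print(f"Total: ${sum(MONTHLY_COSTS.values()):,}")
for name, share in cost_shares(MONTHLY_COSTS).items():
    print(f"{name}: {share:.1%}")
```

On these figures, power is roughly one sixth of the monthly bill, with amortized servers accounting for about half.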

Growing Energy Demand
  2004 Energy Consumption = ~100 quads
  2004 Energy Expenditures = ~$910 billion
  [Figure: U.S. Energy Consumption 1949 - 2004, All Fuels (TBTU); Industrial = red, Transportation = purple, Residential = green, Commercial = blue]

Datacenter Costs Breakdown - 1

Electricity Use by End-Use: 2000 - 2006

BACKUP SLIDES

POWER METERING and BUDGETING

Power Metering
  In the future, servers are likely to have onboard power meters
    AC power (into the power supply)
    DC power (out of the power supply)
    For individual components (CPU, RAM, IO, fans, disks, …)
  WS08R2 provides the capability to monitor such meters, as well as communicate with power management logic, through standard Windows and ACPI interfaces

Power Metering and Budgeting – WMI
  Based on the DMTF management profiles
  New power namespace – root\cimv2\power
  1) Power supply device
    Inventory information
    Capabilities/characteristics
    Redundancy information
  [Diagram: classes shown]
    CIM_NumericSensor: Win32_PowerMeter
    CIM_PowerSupply: Win32_PowerSupply
    _ExtrinsicEvent: Win32_PowerMeterEvent, Win32_PowerSupplyEvent

Power Metering and Budgeting – WMI
  2) Power meter device
    Inventory information
    Capabilities/characteristics
    Latest meter measurements
    OS-Configurable trip-points
    Configurable platform enforced limit
  3) Power supply/meter events
    Notification for changes in configuration and capabilities
    Notification for trip-points crossed and platform limit enforced

Architecture Details
  [Diagram: power metering software stack; elements shown]
    User mode
      User-mode Power Service: Power supply provider, Power meter provider, Power policy provider
      Event feedback, Power policy feedback
    Kernel mode
      Power supply interface, Power meter interface (IOCTLs)
      WDF drivers: acpipsu.sys, acpipmi.sys, xyzpsu.sys, xyzpmi.sys
      Power Manager
      acpi.sys (Interprets; IPMI OpRegion encountered; IPMI handler)
      Microsoft IPMI driver (ipmidrv.sys) (IOCTLs)
    Firmware
      ACPI control methods
        E.g., Query power supply information, Set power meter trip points, Get power meter capabilities
    BMC hardware
      IPMI KCS protocol
CMG‘08 INTERNATIONAL conference
STORAGE: SSDs
CMG‘08 INTERNATIONAL conference
Flash SSD versus HDD (Jun ‘08)
                                            HDD                  Flash SSD
Endurance (write cycles per bit)            10^12                10^5 (SLC*), 10^6 (MLC*)
Cost per byte                               1x                   2.5x – 25x
Performance: small random read requests     1x                   10 – 100x
Active Power (Watts/byte)                   10-20x               1x
Shock Resistance, non-operating             100g, 200g (2010)    1500g
Shock Resistance, operating                 ~10g                 100g
Thermal (°C)                                5-55                 0-70
* SLC – Single Level Cell
* MLC – Multi Level Cell
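To give the endurance figure a sense of scale, the short Python estimate below converts write cycles into a wear-out time. It is only a back-of-the-envelope sketch: the 64 GB capacity and the sustained 25 MB/s write rate are assumed values chosen for illustration, and the model assumes ideal wear leveling with no write amplification.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def years_to_wear_out(capacity_bytes, cycles, write_bytes_per_sec):
    """Time until every cell has used its write cycles, under ideal wear leveling."""
    total_writable_bytes = capacity_bytes * cycles
    return total_writable_bytes / write_bytes_per_sec / SECONDS_PER_YEAR

# Assumed device: 64 GB, 10^5 cycles (the table's SLC figure), written nonstop at 25 MB/s.
years = years_to_wear_out(64 * 10**9, 10**5, 25 * 10**6)
print(round(years, 1))
```

Real devices fall short of this bound because cleaning and imperfect wear leveling add internal writes.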
CMG‘08 INTERNATIONAL conference
Flash Characteristics (Jun ’08)
Chip: Read 50 MB/s, Write 25 MB/s; scales with number of chips
Read Latency: 25 μs to start, 100 μs for 2KB “page”
Write Latency: 200 to 300 μs for 2KB “page”; 2,000 μs to erase
Active Power: 1-2 Watts for 8 chips + controller
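The read latency figures translate into a rough ceiling on small random reads. The Python sketch below works that out under two labeled assumptions that the slide does not state: the start and page-transfer times add for each 2KB read, and the chips service requests fully independently.

```python
READ_START_US = 25   # from the slide: time to start a read
READ_PAGE_US = 100   # from the slide: time for a 2KB page
CHIPS = 8            # from the slide's power figure: 8 chips + controller

def random_read_iops(chips):
    per_read_us = READ_START_US + READ_PAGE_US   # assumption: the two costs add
    return chips * 1_000_000 / per_read_us       # assumption: chips work independently

print(random_read_iops(1))      # 8000.0
print(random_read_iops(CHIPS))  # 64000.0
```

This is an upper bound only; controller overhead and interface limits are ignored.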
CMG‘08 INTERNATIONAL conference
SSD High-IOps Workload TCO
Decrease TCO for IOps-intensive systems
IOps bottleneck causes customers to buy spindles instead of capacity, driving up TCO and operational complexity (e.g., workload balancing)
SSDs provide less expensive systems for same performance targets
Smaller form factors
CMG‘08 INTERNATIONAL conference
SSD Performance Concerns - 1
Random write perf
○ Could be alleviated with next generation of products
○ New technological problems may arise with future generations (no guarantee that it will stay at same level)
Potential bottleneck on erasing/block cleaning
Mixing workloads creates unexpected performance characteristics
○ Read:write ratio, request sizes, sequentiality
CMG‘08 INTERNATIONAL conference
SSD Performance Concerns - 2
First-pass performance might be better than steady-state
When nearing EOL, perf may degrade as blocks are removed from pool
Does mapping metadata have to be re-read/initialized after power failure?
Need enough onboard parallelism to keep internal serial interfaces from becoming bottlenecks
Just like disk arrays, the wrong stripe unit size can kill perf in an SSD array
CMG‘08 INTERNATIONAL conference
WS08R2 Enables Improved Endurance for SSD Technology
SSD can identify itself differently from HDD in ATA as defined by ATA8-ACS Identify Word 217: Nominal media rotation rate
Reporting non-rotating media will allow WS08R2 to set Defrag off as default, improving device endurance by reducing writes
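As an illustration of the check described above, the Python sketch below reads word 217 from a raw 512-byte IDENTIFY DEVICE block and applies the slide's policy. The word encodings follow ATA8-ACS (0000h not reported, 0001h non-rotating media, 0401h to FFFEh a rate in rpm). The function names and the `fake_identify` helper are hypothetical, and nothing here reflects how Windows itself implements the check.

```python
import struct

def media_rotation(identify_data):
    """Classify a 512-byte ATA IDENTIFY DEVICE block by word 217."""
    if len(identify_data) != 512:
        raise ValueError("IDENTIFY DEVICE data is 256 16-bit words (512 bytes)")
    (rate,) = struct.unpack_from("<H", identify_data, 217 * 2)  # little-endian word
    if rate == 0x0000:
        return "not reported"
    if rate == 0x0001:
        return "non-rotating"        # solid state device
    if 0x0401 <= rate <= 0xFFFE:
        return f"{rate} rpm"
    return "reserved"

def should_defrag_by_default(identify_data):
    # Mirrors the slide's policy: defrag stays off when non-rotating media is reported.
    return media_rotation(identify_data) != "non-rotating"

def fake_identify(rate):
    # Hypothetical helper: an otherwise empty IDENTIFY block with only word 217 set.
    block = bytearray(512)
    struct.pack_into("<H", block, 217 * 2, rate)
    return bytes(block)

print(media_rotation(fake_identify(0x0001)))          # non-rotating
print(should_defrag_by_default(fake_identify(15000))) # True
```

A device that leaves word 217 at zero is treated here like a rotating disk, which is a conservative choice made for the sketch.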
CMG‘08 INTERNATIONAL conference
WS08R2 Enables Optimization for SSD Technology
Microsoft implementation of “Trim” feature
○ NTFS will send down delete notification to the device supporting “trim”
  ○ File system operations: Format, Delete, Truncate, Compression
  ○ OS internal processes: e.g., Snapshot, Volume Manager
Three optimization opportunities for the device
○ Enhancing device wear leveling by eliminating merge operation for all deleted data blocks
○ Making early garbage collection possible for fast write
○ Keeping device’s unused storage area as high as possible; more room for device wear leveling
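The benefit of delete notifications can be seen in a toy model of block cleaning. In the Python sketch below, a device that never hears about deletions must relocate every page before erasing a block, while a device that received the notification can skip the deleted pages. The block contents and the function name are made up for illustration and do not model any particular drive.

```python
def pages_to_copy(block, trimmed):
    """Count pages the device must relocate before erasing `block`.

    block   -- logical page numbers stored in one erase block
    trimmed -- logical pages the file system reported as deleted
    """
    return sum(1 for lpn in block if lpn not in trimmed)

block = [10, 11, 12, 13, 14, 15, 16, 17]   # hypothetical 8-page erase block
deleted_by_fs = {12, 13, 14, 15, 16, 17}   # pages freed by a file delete

without_trim = pages_to_copy(block, trimmed=set())       # device unaware of the delete
with_trim = pages_to_copy(block, trimmed=deleted_by_fs)  # notification delivered
print(without_trim, with_trim)  # 8 2
```

Fewer relocated pages means fewer internal writes, which is where the wear-leveling and fast-write benefits listed above come from.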
CMG‘08 INTERNATIONAL conference
Parallelism Tradeoffs
No one scheme optimal for all workloads
Highly sequential
○ Striping, ganging (for scale), and interleaving
Inherent parallelism in workload
○ Independent, deeply parallel request streams to the flash chips
Poor cleaning efficiency (no locality)
○ Background, intra-chip cleaning
With faster serial connect, intra-chip ops are less important
CMG‘08 INTERNATIONAL conference
SSD Performance Trends (Sequential write)
[Chart: sequential write performance of sample SSDs]
○ Sequential performance continues to improve
○ MLC drives' performance is increasing
○ Sequential performance advantage is big and real
Source: a subset of sample data from internal lab
CMG‘08 INTERNATIONAL conference
SSD Performance Trends (Random write)
[Chart: random write performance of sample SSDs]
○ Random write speed is increasing
○ MLC drives' random write is also improving
○ Random performance issues are being solved
Source: a subset of sample data from internal lab
CMG‘08 INTERNATIONAL conference
SSD Cost Trends
[Chart: SSD cost ($) forecast]
○ The mark shows where the cost is today, and it continues down
○ The device cost will be in affordable range by 2010
○ 1TB SSD is on the radar
Source: Semiconductor Forecast Worldwide--Forecast Database [SEQS-WW-DB-DATA], Gartner August 2008, by Joe Unsworth, et al.
CMG‘08 INTERNATIONAL conference
VIRTUALIZATION
CMG‘08 INTERNATIONAL conference
Hyper-V Power Management
Full P-state/C-state management already integrated between Windows root partition and Hyper-V v1 (WS08)
Enlightenments, such as timer assist, added in Hyper-V v2 (WS08R2)
○ Hypervisor delivers child clocks without requiring root interaction, plus ITTD
Core parking enabled for all partitions
CMG‘08 INTERNATIONAL conference
WS08R2 Hyper-V Core Parking
Overview
○ Scheduling virtual machines on a single server for density as opposed to dispersion
○ This allows cores to be “parked” (put to sleep) by placing them into deep C-states
Benefits
○ Significantly enhances Green IT by reducing the power required for CPUs
Idle improvements extend to Hyper-V
○ Significant reduction in platform interrupt activity
○ Enables power savings and greater scalability
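The density-versus-dispersion idea can be sketched in a few lines of Python. The two placement functions below are illustrative stand-ins, not the Hyper-V scheduler: one spreads virtual processors round-robin over all cores, the other packs them first-fit onto as few cores as possible, and the count of untouched cores approximates how many could be parked. The 16-core size and the 25% load per virtual processor are assumed values.

```python
def place_dispersed(vp_loads, cores):
    """Round-robin virtual processors over all cores (dispersion)."""
    load = [0.0] * cores
    for i, vp in enumerate(vp_loads):
        load[i % cores] += vp
    return load

def place_dense(vp_loads, cores, capacity=1.0):
    """First-fit packing onto the lowest-numbered cores (density)."""
    load = [0.0] * cores
    for vp in vp_loads:
        for c in range(cores):
            if load[c] + vp <= capacity:
                load[c] += vp
                break
        else:
            raise ValueError("not enough core capacity")
    return load

def parkable(load):
    """Cores with no work at all are candidates for deep C-states."""
    return sum(1 for x in load if x == 0.0)

vps = [0.25] * 16   # assumed: 16 virtual processors at 25% load each
print(parkable(place_dispersed(vps, 16)))  # 0
print(parkable(place_dense(vps, 16)))      # 12
```

Packing onto the lowest-numbered cores is an arbitrary choice for the sketch; a real scheduler may prefer a different end of the processor range.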
CMG‘08 INTERNATIONAL conference
Windows Server 2008
16 LP Server
CMG‘08 INTERNATIONAL conference
WS08R2 Hyper-V Core Parking
16 LP Server
[Figure callouts, shown twice: Processor is “parked”]
CMG‘08 INTERNATIONAL conference
VIRTUALIZATION TESTS
CMG‘08 INTERNATIONAL conference
Hyper-V Power Efficiency
Windows Server Performance Lab Testbed
Configurations
Single Workloads
○ Web Fundamentals
○ SPECpower
Mixed Workloads
CMG‘08 INTERNATIONAL conference
Hyper-V Power Efficiency
Workload configuration - 1
Methodology for obtaining power load line data for TPC-C, TPC-E, FSCT, and Web Fundamentals has been demonstrated
○ Benchmark loads varied by throttling number of active users
Multiple workloads tested in Hyper-V environment
SPECpower has been successfully tuned
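A power load line is a set of (throughput, power) points taken at different load levels. The Python sketch below shows two summaries commonly drawn from such data: an overall efficiency figure in the style of the SPECpower_ssj2008 metric (total throughput divided by total power across all levels, idle included), and idle power as a fraction of peak power. The data points are hypothetical and are not measurements from this talk.

```python
def overall_efficiency(load_line):
    """Sum of throughput over sum of power across all measured load levels."""
    total_ops = sum(ops for ops, _ in load_line)
    total_watts = sum(watts for _, watts in load_line)
    return total_ops / total_watts

def idle_fraction(load_line):
    """Idle power as a fraction of power at the highest load (points sorted by load)."""
    return load_line[0][1] / load_line[-1][1]

# Hypothetical (throughput, watts) points, idle first.
load_line = [(0, 250), (100_000, 300), (200_000, 340), (300_000, 380)]
print(round(overall_efficiency(load_line), 1))
print(round(idle_fraction(load_line), 2))
```

Including the idle point in the sum penalizes servers whose idle power is a large share of peak.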
CMG‘08 INTERNATIONAL conference
Workload Characteristics
Web Fundamentals (WF)
○ Dynamic scenario
○ CPU-bound workload
SPECpower (modified SPECjbb)
○ Kit version 1.0
○ Java version: JDK 1.6.0_02
○ JVM options: -Xms1024m -Xmx1024m -XXaggressive -XXlargePages -XXthroughputCompaction -XXcallprofiling -XXlazyUnlocking -Xgc:genpar -XXgcthreads:2 -XXtlasize:min=8k,preferred=1024k
CMG‘08 INTERNATIONAL conference
Hyper-V Power Efficiency
Workload configuration - 2
Single workloads
○ All the guests run the same workload
○ Two scenarios:
  ○ Fixing the number of active guests and scaling the load in each guest
  ○ Fixing the load in each guest and activating more guests
Mixed workloads
○ Half of guests run each workload
  ○ Fixed load in WF guests (~35% CPU utilization each)
  ○ Varying load in SPECpower guests
CMG‘08 INTERNATIONAL conference
HW and SW Test Configurations
Hardware
○ 2-socket quad-core processors
  ○ Minimal P-States
○ 16GB memory: 4x4GB 667MHz DIMMs
○ External (wall) power monitor
Software
○ OS: Windows Server 2008
  ○ OS Power Management: Balanced mode
○ Hyper-V v2 (pre-release build)
  ○ Configured with 8 guests, each with a single virtual processor (3.16GHz) and 1.75GB memory
CMG‘08 INTERNATIONAL conference
Web Fundamentals Dynamic Adding Load to Each Guest
[Chart: Power usage for various throughput levels. X axis: Throughput (Requests / Sec), 0 to 25,000; Y axis: Watts, 250 to 370]
[Chart: Throughput and power usage versus total system utilization. X axis: System Utilization Percentage; Y axes: Watts, 250 to 370, and Throughput (Requests / Sec), 0 to 25,000]
For these experiments, the highest system utilization under WF workload is ~80%. This issue has been subsequently resolved.
CMG‘08 INTERNATIONAL conference
Web Fundamentals Dynamic Activating Guests - 1
[Chart: Throughput and power usage versus total system utilization. X axis: System Utilization Percentage; Y axes: Watts, 250 to 370, and Throughput (requests / sec), 0 to 25,000]
[Chart: Power usage for various throughput levels. X axis: Throughput (requests / sec), 0 to 25,000; Y axis: Watts, 250 to 370]
• Data points from left to right: 0 guests, 1 guest, 2 guests, …, 8 guests active
• Each active guest tries to run at the maximum load
CMG‘08 INTERNATIONAL conference
Web Fundamentals Dynamic Activating Guests - 2
[Chart: Virtual processor utilizations for different numbers of active guests. Series: Guest 1 to Guest 8; X axis: Number of Guests, 1 to 8; Y axis: Virtual Processor Utilization (%), 0 to 100]
• The maximum utilization of each guest decreases as more guests are activated. Most of this decrease has been subsequently removed.
CMG‘08 INTERNATIONAL conference
SPECpower Adding Load to Each Guest - 1
[Chart: Throughput and power usage versus total system utilization. X axis: System Utilization Percentage; Y axes: Watts, 250 to 390, and Throughput (in Thousands), 0 to 300,000]
[Chart: Power usage for various throughput levels. X axis: Workload (% of maximum throughput), 0% to 100%; Y axis: Power (% of Max Watts), 65% to 95%]
CMG‘08 INTERNATIONAL conference
SPECpower Adding Load to Each Guest - 2
[Chart: Average processor frequency for various workload levels. X axis: Workload (% of Max Throughput), 0% to 100%; Y axis: Frequency (MHz)]
CMG‘08 INTERNATIONAL conference
SPECpower Adding Load to Each Guest - 3
[Chart: Virtual processor utilizations for various workload levels. Series: Guest 1 to Guest 8; X axis: Load (100%, 80%, 60%, 40%, 20%, Active Idle); Y axis: Utilization (%), 0 to 100]
[Chart: Logical processor utilizations for various workload levels. Series: Proc 1 to Proc 8; X axis: Load (100%, 80%, 60%, 40%, 20%, Active Idle); Y axis: Utilization (%), 0 to 100]
• All the guests change load levels concurrently.
• The VM scheduler is biased towards utilizing higher numbered processors.
CMG‘08 INTERNATIONAL conference
SPECpower Activating Guests
[Chart: Throughput and power usage versus total system utilization. X axis: System Utilization Percentage; Y axes: Watts, 250 to 390, and Throughput (in Thousands), 0 to 300,000]
[Chart: Power usage for various throughput levels. X axis: Workload (% of maximum throughput), 0% to 100%; Y axis: Power (% of Max Watts), 60% to 100%]
Similar scalability behavior of power and throughput as when adding load.
CMG‘08 INTERNATIONAL conference
SPECpower and WF Mixed Workloads - 1
[Chart: SPECpower throughput and server power usage versus total system utilization. X axis: System Utilization Percentage; Y axes: Watts, 250 to 390, and Throughput (in Thousands), 0 to 140,000]
[Chart: Power usage for various throughput levels. X axis: Workload (% of maximum throughput), 0% to 100%; Y axis: Power (% of Watts), 60% to 100%]
• 4 Guests running WF (4940 req/sec)
• ~25% system utilization; ~35% guest virtual processor utilization
• 4 Guests running SPECpower (similar efficiency as single workload)
CMG‘08 INTERNATIONAL conference
SPECpower and WF Mixed Workloads - 2
[Chart: Average processor frequency for various levels of SPECpower workload. X axis: Workload (% of Max SPECpower Throughput), 0% to 100%; Y axis: Frequency (MHz)]
CMG‘08 INTERNATIONAL conference
Hyper-V Power Efficiency
Future Experiments
More workloads
Different workload mix scenarios
○ Different combinations of fixed and varying workloads
More VM configurations
○ Multiple virtual processors per guest
○ Oversubscription
CMG‘08 INTERNATIONAL conference
WMI
CMG‘08 INTERNATIONAL conference
Power WMI Provider
Enables power policy configuration through standard WMI interface
○ Change power setting values
○ Activate a given plan
○ Conforms to DMTF data model
To get started…
○ Change a power setting: Win32_PowerSetting
○ Activate a plan: Win32_PowerPlan.Activate() method
Attend ENT-T552 Windows Server Power Management Implementation Details for additional details
CMG‘08 INTERNATIONAL conference
Configuration and Administration
WMI interfaces to query and set configuration settings
○ Configuration of systems
○ Global administration
○ Management applications
WMI interfaces to query current and hardware capabilities
○ 3rd party applications
○ Diagnostics
CMG‘08 INTERNATIONAL conference
WMI – Set Power Settings
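' Set the display blank timeout in the active power plan
TargetSetting = "Microsoft:PowerSetting\\{3c0bc021-c8a8-4e07-a973-6b14cbcb2b7e}" ' display blank timeout
Set objWMIService = GetObject("WinMgmts:\\.\root\cimv2\power")
Set SettingIndices = objWMIService.ExecQuery("ASSOCIATORS OF {Win32_PowerSetting.InstanceID=" & chr(34) & TargetSetting & chr(34) & "} WHERE ResultClass = Win32_PowerSettingDataIndex")
For Each SettingIndex In SettingIndices
  ' Backslashes in the instance ID must be escaped inside the query string
  EscapedInstanceID = Replace(SettingIndex.InstanceID, "\", "\\")
  Set Plans = objWMIService.ExecQuery("ASSOCIATORS OF {Win32_PowerSettingDataIndex.InstanceID=" & chr(34) & EscapedInstanceID & chr(34) & "} WHERE ResultClass = Win32_PowerPlan")
  ' ExecQuery returns a collection, so test each associated plan
  For Each Plan In Plans
    If Plan.IsActive Then
      SettingIndex.SettingIndexValue = 120 ' 120 seconds (2 minutes)
      SettingIndex.Put_
      Plan.Activate() ' re-activate the plan so the new value is applied
    End If
  Next
Next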
CMG‘08 INTERNATIONAL conference
Remote Power Manageability - 1
WS08R2 supports the configuration of power policy via WMI
○ Local and remote management via WMI
○ Adheres to DMTF conventions for setting data
○ Scriptable
Includes support for reading and writing of all power plan and setting data
Active power plan can be changed remotely
Power Action can be carried out (sending a node to S3)
CMG‘08 INTERNATIONAL conference
Remote Power Manageability - 2
[Diagram: class relationship among the power WMI classes; the legend distinguishes objects from associations]
Win32_PowerPlan
Win32_PowerSetting
Win32_PowerSettingSubgroup
Win32_PowerSettingInSubgroup
Win32_PowerSettingDataIndex
Win32_PowerSettingDataIndexInPlan
Win32_PowerSettingElementDataIndex
Win32_PowerSettingDefinition
Win32_PowerSettingDefinitionPossibleValue
Win32_PowerSettingDefinitionRangeData
Win32_PowerSettingCapabilities
Win32_PowerSettingDefineCapabilities
CMG‘08 INTERNATIONAL conference
Remote Power Manageability - 3
Get the Active Plan:
Set objWMIService = GetObject("WinMgmts:\\.\root\cimv2\power")
Set PowerPlans = objWMIService.InstancesOf("Win32_PowerPlan")
For Each PowerPlan in PowerPlans
  If PowerPlan.IsActive Then
    wscript.echo "Current Plan: " & PowerPlan.ElementName
  End If
Next
Set the Active Plan:
PowerPlan.Activate()
CMG‘08 INTERNATIONAL conference
Remote Power Manageability - 4
Get all power settings in the Active Plan:
(Continued with PowerPlan)
EscapedInstanceID = Replace(PowerPlan.InstanceID, "\", "\\")
Set PowerSettingIndexes = objWMIService.ExecQuery("ASSOCIATORS OF {Win32_PowerPlan.InstanceID=" & chr(34) & EscapedInstanceID & chr(34) & "}")
For Each PowerSettingIndex in PowerSettingIndexes
  EscapedInstanceID = Replace(PowerSettingIndex.InstanceID, "\", "\\")
  Set PowerSettings = objWMIService.ExecQuery("ASSOCIATORS OF {Win32_PowerSettingDataIndex.InstanceID=" & chr(34) & EscapedInstanceID & chr(34) & "} WHERE ResultClass = Win32_PowerSetting")
  For Each PowerSetting in PowerSettings
    wscript.echo "Power Setting: " & PowerSetting.InstanceID
    wscript.echo "Description: " & PowerSetting.Description
    wscript.echo "Index Value: " & PowerSettingIndex.SettingIndexValue