zZS28 Much Ado About CPU

Martin Packer

IBM System z Technical University – Vienna, Austria – May 2-6

Abstract

System z and zEnterprise processors have in recent years introduced a number of capabilities of real value to mainframe customers. These capabilities have, however, required changes in the way we think about CPU management.

This presentation describes these capabilities and how to evolve your CPU management to take them into account. It is based on the author's experience of evolving his reporting to support these changes.

This presentation has been substantially enhanced this year.

Agenda

• A brief review of technology
• Unfinished Business?
• Coupling Facility CPU
• zAAP and zIIP
• z/OS Release 10 Changes
• Soft Capping and Group Capacity Limits
• Blocked Workloads
• z10 HiperDispatch
• Cool It
• I/O Assist Processors (IOPs)
• SMF 23 and 113
• In Conclusion

A Brief Review of Technology

“Characterisable” Engines
– GCPs - Pool 1
– (Obsolete Pool 2)
– ICFs - Pool 5
– IFLs - Pool 3
– zAAPs - Pool 4
– zIIPs - Pool 6

“Non-Characterisable” Engines
– SAPs
– Spares

With zEnterprise zBX other engines
– Not connected in the same way at all
– Not discussed here
– Treating as a “z11”

Book-Structured
● Connected by a ring in z9
● z10 and zEnterprise ensure all books connected to all books directly
● Data transfers are direct between books via the L2 Cache chip in each book's MCM
● L2 Cache is shared by every PU on the MCM
● zEnterprise has an additional per-chip level of cache – and nomenclature “cleaned up”
● Only 1 book in BC models

IRD CPU Management

Weight Management for GCP engines
– Alter weights within an LPAR Cluster
– Shifts of 10% of weight

CP Management
– Doesn't work with HiperDispatch
– Vary LOGICAL CPs on and off
– Only for GCP engines

WLM objectives
– Optimise goal attainment
– Optimise PR/SM overhead
– Optimise LPAR throughput

Part of "On Demand" picture
– Ensure you have defined reserved engines
– Make weights sensible to allow shifts to happen

Unfinished Business?

How do we evolve our performance and capacity reporting?

Should we define an LPAR with dedicated engines?
– Or with shared engines?
  • What should the weights be?
    - In total and individually
    - And what about the total for each pool?
  • How many engines should each LPAR have?
  • And IRD makes all this so much more dynamic

Increasing Complexity

Installations are increasing the numbers of LPARs on a machine
– Many exceed 10 per footprint
  ● Expect 20+ soon
  ● My record: 51 and 52, 56
  ● 33 and 34 active, respectively
– And have more logical and physical engines
– And increasing the diversity of their LPARs
  ● Greater incidence of IFLs
  ● Fast uptake of zIIPs and zAAPs
  ● Sometimes meaning 2 engine speeds
  ● Fewer stand-alone CF configurations
– With mergers etc. the number of machines managed by a team is increasing
– And stuff's got more dynamic, too
– As an aside...
  ● Shouldn't systems be self-documenting?

Coupling Facility CPU

•Managed out of Pool 5

–Pool numbers given in SMF 70 as index into table of labels

– Label is “ICF”

Recommendation: Manage in reporting as a separate pool

Follow special CF sizing guidelines

–Especially for takeover situations

Always runs at full speed

So good technology match for coupled z/OS images on same footprint

Another good reason to use ICFs is IC links

Shared ICFs strongly discouraged for Production

Especially if the CF image has Dynamic Dispatch turned on

Internal Coupling Facility (ICF)

ICF ...

Need to correlate SMF 70-1 with SMF 74-4 CF Utilisation to get proper CPU picture

Since z/OS Release 8 74-4 has machine serial number
– Allows correlation in most cases

Partition number added to 74-4 in OA21140
• Enables correlation with 70-1 when LPAR name is not the Coupling Facility Name

Structure-Level CPU Consumption

CFLEVEL 15 and z/OS R.9

Always 100% Capture Ratio

Adds up to R744PBSY

Multiple uses:

Capacity planning for changing request rates

Examine which structures are large consumers

Compute CPU cost of a request

• And compare to service time
• Interesting number is “non-CPU” element of service time
  – as we shall see

NOTE: Need to collect 74-4 data from all z/OS systems sharing to get total request rate
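The "CPU cost of a request" and "non-CPU element" arithmetic above can be sketched as follows. This is a minimal illustration only: the function names, the system names and the sample values are hypothetical, and the structure CPU time is assumed to have already been extracted from the 74-4 data and converted to seconds.

```python
def cpu_per_request_us(structure_cpu_seconds: float, requests: int) -> float:
    """Average CF CPU cost of one request to a structure, in microseconds."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return structure_cpu_seconds * 1_000_000 / requests


def non_cpu_service_time_us(service_time_us: float, cpu_us: float) -> float:
    """Part of the average service time that is not CF CPU (links, queuing etc)."""
    return service_time_us - cpu_us


# Hypothetical interval: request counts must be summed over ALL sharing systems,
# because each system's 74-4 data only records its own requests.
requests_by_system = {"SYSA": 400_000, "SYSB": 200_000}
total_requests = sum(requests_by_system.values())

cpu_us = cpu_per_request_us(3.0, total_requests)       # 3.0 s of structure CPU
non_cpu_us = non_cpu_service_time_us(12.0, cpu_us)     # 12 us average service time
print(f"CPU per request: {cpu_us:.1f} us, non-CPU element: {non_cpu_us:.1f} us")
```

Dividing by the request count of a single system would overstate the cost per request, which is why the request rate is totalled first.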

Structure CPU ...

Where not trivial I plot Sync Request %
– Shows if deterioration with load

Different request types and technologies behave markedly differently
– For example modern lock structures locally accessed are typically around 5µs CPU and elapsed or lower
– For example XCF structures often in hundreds of µs elapsed
  • And quite high CPU
  • Though obviously all async

zAAP and zIIP

Must each not exceed number of GCPs

Run at full speed, even if GCPs don't

•Instrumentation documents “speed” difference

Hardcapping but no softcapping
• No Resource Group capping

Not managed by IRD
– Weight is the INITIAL LPAR weight

zAAP on zIIP

New with z/OS Release 11
– Retrofitted to R.9 and R.10 with OA27495

Not available if you already have zAAPs installed
– Or have reserved zAAP logical engines

Designed to enable further use of perhaps-underused zIIPs

Does not change the configuration rules relative to GCPs

Does not suddenly make zAAP-eligible work look like zIIP-eligible in terms of SRBs etc

No special metrics
– eg zAAP work now in zIIP bucket
– eg zAAP-eligible now in zIIP-eligible bucket

zIIP Instrumentation – Subsystems and Address Spaces

Instrumentation on consumption and potential for a number of exploiters:
– Latter is eg “zAAP on GCP”

Type 30 Address Space – Interval and Step/Job-End
– Takes RMF Workload Activity (72-3) to address space level

DB2 Accounting Trace
– Type 101 shows zIIP USED times by usage category
  • At plan and package level
  • ELIGIBLE is only reported on up to Version 9

WebSphere Application Server
– Type 120 Subtype 9 (Request Activity)
  • Both zIIP and zAAP usage and potential

z/OS Release 10 Changes

All RMF Records

Whether at least one zAAP was online

Whether at least one zIIP was online

In Type 70 and retrofitted to supported releases:

Permanent and Temporary Capacity Models and 3 capacities

HiperDispatch

• To be covered in a few minutes

Defined- and Group-Capacity instrumentation

Soft Capping and Group Capacity

Defined Capacity
– A throttle on the rolling 4-hour average of the LPAR
– When this exceeds the defined capacity PR/SM softcaps the LPAR
– CPU delay in RMF

SMF70PMA Average Adjustment Weight for pricing management
SMF70NSW Number of samples when WLM softcaps partition
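To make the rolling 4-hour average concrete, here is a minimal sketch. It assumes equally spaced MSU samples and, as a simplification, averages over however many samples exist until 4 hours of history has built up; how WLM treats the period before the first sample is not modelled. The function names and sample values are illustrative, not taken from any IBM interface.

```python
from collections import deque


def rolling_4hr_average(msu_samples, interval_minutes=5):
    """Yield the rolling 4-hour average MSU value after each sample.

    Simplification: until 4 hours of samples exist, the average is taken
    over the samples seen so far.
    """
    window = deque(maxlen=(4 * 60) // interval_minutes)
    for msu in msu_samples:
        window.append(msu)
        yield sum(window) / len(window)


def percent_of_defined_capacity(rolling_average, defined_capacity_msu):
    """Rolling average as a percentage of the defined capacity."""
    return 100.0 * rolling_average / defined_capacity_msu


# Hypothetical hourly samples for an LPAR with a defined capacity of 8 MSUs.
hourly_msu = [4, 8, 12, 12, 12]
averages = list(rolling_4hr_average(hourly_msu, interval_minutes=60))
for avg in averages:
    pct = percent_of_defined_capacity(avg, 8)
    print(f"4hr avg {avg:5.1f} MSU = {pct:6.1f}% of cap, candidate for capping: {pct > 100}")
```

Note that the instantaneous consumption exceeds 8 MSUs from the third hour, but the rolling average only exceeds it from the fourth: softcapping responds to the average, not to the peak.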

Group Capacity

Similar to Defined Capacity but for groups of LPARs on the same machine

SMF70GJT Timestamp when the system joined the Group Capacity group
SMF70GNM Group name
SMF70GMU Group Capacity MSU limit

Exceeding 8 MSUs (MSU_VS_CAP > 100%) in the morning leads to active capping (SOFTCAPPED > 0%). Note: OCPU and O2 are CPU Queuing numbers

Group Capacity Limits

Each partition (z/OS system) manages itself

Group capacity is based on defined capacity implementation
– 4hr rolling average of group MSU consumption is used for managing the group's partitions

Each partition is aware of the consumption of all other partitions on the CPC
– And identifies all other partitions that are members of the same capacity group
– Calculates its defined share of the capacity group, based on the partition weight.
  • This share is the target for the partition if all partitions of the group want to use as much CPU as possible

If some LPARs do not consume their share the unused capacity will be distributed over those LPARs that need additional capacity

If a defined capacity limit is defined to a partition that limit will not be violated even when the partition receives capacity from others.

WLM will only manage partitions with shared CPs and WC=NO
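The weight-based share described above can be sketched as below. This is an illustration under stated assumptions, not WLM's actual algorithm: the LPAR names, weights and limits are hypothetical, and the redistribution of unused capacity to LPARs that need more is deliberately not modelled. The only two rules taken from the text are "share is proportional to weight" and "a partition's own defined capacity is never violated".

```python
def group_capacity_targets(group_limit_msu, weights, defined_capacities=None):
    """Target MSUs per LPAR when every member of the group wants all it can get.

    weights            -- LPAR name -> weight (hypothetical values)
    defined_capacities -- optional LPAR name -> defined capacity limit in MSUs
    """
    defined_capacities = defined_capacities or {}
    total_weight = sum(weights.values())
    targets = {}
    for lpar, weight in weights.items():
        share = group_limit_msu * weight / total_weight
        # An individual defined capacity still applies on top of the group share.
        cap = defined_capacities.get(lpar)
        targets[lpar] = share if cap is None else min(share, cap)
    return targets


weights = {"PRD1": 500, "PRD2": 300, "TST1": 200}
targets = group_capacity_targets(100, weights, defined_capacities={"TST1": 15})
print(targets)
```

In this example TST1's weight-based share would be 20 MSUs, but its own defined capacity of 15 MSUs takes precedence.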

LPAR Table Fragment for Group Capacity

Blocked Workloads

z/OS Release 9 Blocked Workload Support
– Rolled back to R.7 and R.8

Blocked workloads:
– Lower priority work may not get dispatched for an elongated time
– May hold a resource that more important work is waiting for

WLM allows some throughput for blocked workloads

By dispatching low importance work from time to time, these “blocked workloads” are no longer blocked

Helps to resolve resource contention for workloads that have no resource management implemented

Additional information in WSC flash http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10609

Additional instrumentation in 70-1 and 72-3

IEAOPT BLWLTRPCT and BLWLINTHD (With OA22443)

BLWLTRPCT Percentage of the CPU capacity of the LPAR to be used for promotion

Specified in units of 0.1%

Default is 5 (=0.5%)

Maximum is 200 (=20%)

Would only be spent when sufficiently many dispatchable units need promotion.

BLWLINTHD Specifies threshold time interval for which a blocked address space or enclave must wait before being considered for promotion.

Minimum is 5 seconds. Maximum is 65535 seconds.

Default is 60 seconds.
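Because BLWLTRPCT is specified in tenths of a percent, it is easy to misread. The sketch below converts and range-checks the two parameters using only the limits quoted above; the function names are illustrative, and the lower bound of 0 for BLWLTRPCT is an assumption, since only the default and maximum are given here.

```python
BLWLTRPCT_DEFAULT = 5        # = 0.5%
BLWLTRPCT_MAX = 200          # = 20%
BLWLINTHD_MIN = 5            # seconds
BLWLINTHD_MAX = 65535        # seconds
BLWLINTHD_DEFAULT = 60       # seconds


def blwltrpct_to_percent(value=BLWLTRPCT_DEFAULT):
    """Convert a BLWLTRPCT setting (units of 0.1%) to a percentage of LPAR capacity."""
    # Assumption: 0 is taken as the lowest acceptable value.
    if not 0 <= value <= BLWLTRPCT_MAX:
        raise ValueError(f"BLWLTRPCT must be between 0 and {BLWLTRPCT_MAX}")
    return value / 10


def check_blwlinthd(seconds=BLWLINTHD_DEFAULT):
    """Return the threshold unchanged if it lies within the documented range."""
    if not BLWLINTHD_MIN <= seconds <= BLWLINTHD_MAX:
        raise ValueError(f"BLWLINTHD must be between {BLWLINTHD_MIN} and {BLWLINTHD_MAX}")
    return seconds


print(blwltrpct_to_percent(), "% of LPAR capacity by default")
print(blwltrpct_to_percent(BLWLTRPCT_MAX), "% at the maximum setting")
```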

Type 70 CPU Control Section

Type 72-3 Service/Report Class Period Data Section

IBM System z10 EC HiperDispatch

HiperDispatch – z10 EC unique function

– Dispatcher Affinity (DA) - New z/OS Dispatcher

– Vertical CPU Management (VCM) - New PR/SM Support

Hardware cache optimization occurs when a given unit of work is consistently dispatched on the same physical CPU

– Up until now software, hardware, and firmware have acted independently of each other

– Non-Uniform-Memory-Access has forced a paradigm change

• CPUs have different distance-to-memory attributes

• Memory accesses can take a number of cycles depending upon cache level / local or remote memory accessed

The entire z10 EC hardware / firmware / OS stack now tightly collaborates to manage these effects

z10 EC HiperDispatch

New z/OS Dispatcher

– Multiple dispatching queues

• Average 4 logical processors per queue

– Tasks distributed amongst queues

– Periodic rebalancing of task assignments

– Generally assign work to minimum # logicals needed to use weight

• Expand to use white space on box

– Real-time on/off switch (Parameter in IEAOPTxx)

– May require "tightening up" of WLM policies for important work

• Priorities are more sensitive with targeted dispatching queues

z10 EC HiperDispatch – z/OS Dispatcher Functionality

z10 EC HiperDispatch – z/OS Dispatcher Functionality…

Initialization:
– Single HIPERDISPATCH=YES z/OS parameter dynamically activates HiperDispatch (full S/W and H/W collaboration) without IPL
  • With HIPERDISPATCH=YES, IRD management of CPU is turned OFF

Four Vertical High LPs are assigned to each Affinity Node

A “Home” Affinity Node is assigned to each address space / task

zIIP, zAAP and standard CP “Home” Affinity Nodes must be maintained for work that transitions across specialty engines

Benefit increases as LPAR size increases (i.e. crosses books)

Workload Variability Issues:
– Short Term
  • Dealing with transient utilization spikes
– Intermediate
  • Balancing workload across multiple Affinity Nodes
    – Manages “Home” Book assignment
– Long Term
  • Mapping z/OS workload requirements to available physical resources
    – Via dynamic expansion into Vertical Low Logical Processors

z10 EC HiperDispatch – z/OS Dispatcher Functionality…

New PR/SM Support
– Topology information exchanged with z/OS
  • z/OS uses this to construct its dispatching queues
– Classes of logicals
  • High priority allowed to consume weight
    – Tight tie of logical processor to physical processor
  • Low priority generally run only to consume white space

z10 EC HiperDispatch – PR/SM Functionality

z10 EC HiperDispatch – PR/SM Functionality…

Firmware Support (PR/SM, millicode)

New z/OS invoked instruction to cause PR/SM to enter “Vertical mode”
• To assign vertical LPs subset and their associated LP to physical CP mapping
  – Based upon LPAR weight

Enables z/OS to concentrate its work on fewer vertical processors
• Key in PR/SM overcommitted environments to reduce the LP competition for physical CP resources

Vertical LPs are assigned High, Medium, and Low attributes

Vertical low LPs shouldn’t be used unless there is logical white space within the CEC and demand within LPAR

z10 EC HiperDispatch Instrumentation

HiperDispatch status
– SMF70HHF bits for Supported, Active, Status Changed

Parked Time
– SMF70PAT in CPU Data Section

Polarization Weight
– SMF70POW in Logical Processor Data Section
  • Highest weight for LPAR means Vertical High processor
  • Zero weight means Vertical Low processor
  • In-between means Vertical Medium processor

Example on next foil
– 2 x Vertical High (VH)
– 1 x Vertical Medium (VM)
– 4 x Vertical Low (VL)
– Because of HiperDispatch, all engines online in the interval are online all the time
  • But there are other engines reserved, so with Online Time = 0
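The polarization weight rule of thumb above can be expressed as a small classifier. This is a sketch of the reporting heuristic only: the weight values are made up to match the 2 VH / 1 VM / 4 VL example, and the units of SMF70POW are not assumed. One caveat of the heuristic as written: an LPAR with no Vertical High processors would have its highest-weighted engine labelled VH.

```python
from collections import Counter


def classify_polarisation(polar_weights):
    """Label each logical processor VH, VM or VL from its polarization weight.

    Rule: zero weight -> VL, highest weight in the LPAR -> VH, anything
    in between -> VM.
    """
    highest = max(polar_weights)
    labels = []
    for weight in polar_weights:
        if weight == 0:
            labels.append("VL")
        elif weight == highest:
            labels.append("VH")
        else:
            labels.append("VM")
    return labels


# Hypothetical weights for logical processors 0 to 6.
labels = classify_polarisation([100, 100, 40, 0, 0, 0, 0])
print(labels)
print(Counter(labels))
```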

Depiction Of An LPAR – With HiperDispatch Enabled

[Figure: logical processors 0 to 6, showing UNPARKED %, PARKED %, POLAR WEIGHT and I/O % for each]

HiperDispatch “GA2” Support in RMF - OA21140

SMF70POF Polarisation Indicators
– Bits 0,1
  • 00 is “Horizontal” or “Polarisation Not Indicated”
  • 01 is “Vertical Low”
  • 10 is “Vertical Medium”
  • 11 is “Vertical High”
– (Bit 2 is whether it changed in the interval)

SMF70Q00 - SMF70Q12 In & Ready counts based on the number of processors online and unparked

Refinement is to take into account parking and unparking

Also SMF70RNM
– Normalisation factor for zIIP
  • Which happens to be the same for zAAP

Also R744LPN – LPAR Number
– For correlation with SMF 70

(Also zHPF support)
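The SMF70POF bit meanings listed above can be decoded as in the sketch below. Two assumptions are made and should be checked against the SMF record mapping: that the flags occupy a single byte, and that IBM bit numbering applies, so bit 0 is the most significant bit. The function name is illustrative.

```python
POLARISATION = {
    0b00: "Horizontal or Polarisation Not Indicated",
    0b01: "Vertical Low",
    0b10: "Vertical Medium",
    0b11: "Vertical High",
}


def decode_smf70pof(flag_byte):
    """Return (polarisation, changed_in_interval) for one flag byte.

    Assumes a one-byte field with IBM bit numbering (bit 0 = most significant).
    """
    if not 0 <= flag_byte <= 0xFF:
        raise ValueError("expected a single byte value")
    polarisation = POLARISATION[(flag_byte >> 6) & 0b11]   # bits 0 and 1
    changed = bool(flag_byte & 0b0010_0000)                 # bit 2
    return polarisation, changed


print(decode_smf70pof(0xC0))   # bits 0,1 = 11, bit 2 = 0
print(decode_smf70pof(0x60))   # bits 0,1 = 01, bit 2 = 1
```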

“Cool It” - Cycle Steering

Introduced with z990
– http://www.research.ibm.com/journal/rd/483/goth.html

Refined in later processors
– BOTH frequency- and voltage-reduction in z9

When cooling degraded processor progressively slowed
– Much better than dying
– Rare event
  • But should not be ignored

WLM Policy refreshed
– Admittedly not that helpful a message:
  • IWM063I WLM POLICY WAS REFRESHED DUE TO A PROCESSOR SPEED CHANGE
  • Automate it

SMF70CPA not changed
• Used as part of SCRT
• Talk to IBM and consider excluding intervals round such an event

R723MADJ is changed
• Al Sherkow's news item shows an example:
  – http://www.sherkow.com/updates/20081014cooling.html

In R.12 Types 89, 70, 72 and 30 have instrumentation for this situation

IOPs – I/O Assist Processors

Not documented in Type 70
– Despite being regular engines characterised as IOPs
– NOT a pool

Instrumentation in Type 78-3
– Variable-length Control Section
  • 1 IOP Initiative Queue / Util Data Section per IOP inside it
– Processor Was Busy / Was Idle counts
  • NOT Processor Utilisation as such
  • Suggest stacking the two numbers on a by-hour plot
– I/O Retry counts
  • Channel Path Busy, CU Busy, Device Busy

Machines can be configured with different numbers of IOPs
– Depending on I/O intensiveness of workloads
  • Generally speaking it's only TPF that is said to need extra IOPs

Analysis can help get this right

SMF 23 and 113

SMF 23

SMF 23 – The “SMF” record
– New extensions to the SMF 23 record
  • Provide information related to Dispatching, Storage and I/O
  • Available on z/OS 1.8 and above

Why would you want to collect them?
– They may provide a way to help characterize your workload to improve your capacity planning
  • LoIO Mix zPCR is simply an estimate of your actual workload pattern

Record Size and Interval
– Small record - 210 bytes (258 bytes with “deltas”) per System per interval

What is in the SMF 23s? - New Fields via APAR OA22414

Storage
– Total Number of Getmain requests (NGR)
– Total Pages backed during Getmain requests (PBG)
– Total Number of Fixed requests for Storage below 2 GB (NFR)
– Total number of Frames for Fixed requests for Storage below 2 GB (PFX)

Faults
– Total number of first reference faults (1RF)
– Total number of non first reference faults (NRF)

I/Os
– Total Number of I/Os (NIO)

Dispatches (Dispatch)
– Number of unlocked TCB Dispatches (TCB)
– Number of SRB Dispatches (SRB)

APAR OA27161 – Closed 1/19/2009
– To provide “delta” counters for above fields
– Otherwise “cumulative” counters
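Without the “delta” counters, per-interval values have to be derived by subtracting consecutive cumulative records. A minimal sketch follows; the dictionary keys are the abbreviations used on this foil, not real SMF field names, the sample values are invented, and counter wrap or a reset at IPL is not handled.

```python
COUNTERS = ("NGR", "PBG", "NFR", "PFX", "1RF", "NRF", "NIO", "TCB", "SRB")


def interval_deltas(cumulative_records, counters=COUNTERS):
    """Yield per-interval deltas from time-ordered cumulative counter records.

    Each record is a dict keyed by the foil's abbreviations. The first record
    only establishes a baseline, so N records give N - 1 deltas.
    """
    previous = None
    for record in cumulative_records:
        if previous is not None:
            yield {name: record[name] - previous[name] for name in counters}
        previous = record


# Hypothetical cumulative values for three consecutive intervals (two counters only).
records = [
    {"NIO": 1_000, "SRB": 50_000},
    {"NIO": 1_600, "SRB": 58_000},
    {"NIO": 2_500, "SRB": 70_000},
]
deltas = list(interval_deltas(records, counters=("NIO", "SRB")))
print(deltas)
```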

What is the z10 CPU Measurement Facility?

New hardware instrumentation facility “CPU Measurement Facility” (CPU MF)
– Available on System z10 EC GA2 and z10 BC
– Supported by a new z/OS component (Instrumentation), Hardware Instrumentation Services (HIS)

Potential Future Uses – for this new “cool” virtualization technology
– CPU MF provides support built into the processor hardware
  • So exploiting mechanism allows the observation of performance behavior with nearly no impact to the system being observed
– Potential Uses
  • Future workload characterization
  • ISV product improvement
  • Application Tuning

CPU MF ...

Data collection done by System z hardware
– Low overhead
– Little/No skew in sampling
– Access to information which is not available from software

SAMPLING
– SAMPFREQ=800000 is default (samples per minute), = 13,333 /s
  • 8M samples in 10 minutes is the default (DURATION=10 is the default, 10 minutes)
  • Recommendation – Start with a small frequency, e.g. SAMPFREQ=320, and increase after early experiences – e.g. ensure enough disk space for output
    – Smaller z10 BCs should increase only up to SAMPFREQ=130000 (for DURATION=60)
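The sampling figures above are simple arithmetic on SAMPFREQ (samples per minute) and DURATION (minutes), which is worth checking before choosing values, since the sample count drives output volume. The function names below are illustrative; bytes per sample is not stated here, so no disk-space estimate is attempted.

```python
def samples_per_second(sampfreq):
    """SAMPFREQ is expressed in samples per minute."""
    return sampfreq / 60


def total_samples(sampfreq, duration_minutes):
    """Number of samples taken over a run of the given DURATION."""
    return sampfreq * duration_minutes


# Defaults quoted above: SAMPFREQ=800000, DURATION=10.
print(int(samples_per_second(800_000)), "samples/s")
print(total_samples(800_000, 10), "samples in the default 10 minutes")

# The suggested ceiling for smaller z10 BCs: SAMPFREQ=130000 for DURATION=60.
print(total_samples(130_000, 60), "samples in 60 minutes")
```

The last figure shows why 130000 pairs with DURATION=60: it yields roughly the same total, about 8M samples, as the default run.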

New IBM Research article
– “IBM System z10 performance improvements with software and hardware synergy”
– http://www.research.ibm.com/journal/rd/531/jackson.pdf

COUNTERS

Basic Counter Set
– Cycle count
– Instruction count
– Level-1 I-cache directory write count
– Level-1 I-cache penalty cycle count
– Level-1 D-cache directory write count
– Level-1 D-cache penalty cycle count

Problem State Counter Set
– Problem state cycle count
– Problem state instruction count
– Problem state level-1 I-cache directory write count
– Problem state level-1 I-cache penalty cycle count
– Problem state level-1 D-cache directory write count
– Problem state level-1 D-cache penalty cycle count

Extended Counter Set
– Number and meaning of counters are model-dependent

Crypto Activity Counter Set (CPACF activity)

PRNG function count

PRNG cycle count

PRNG blocked function count

PRNG blocked cycle count

SHA function count

SHA cycle count

SHA blocked function count

SHA blocked cycle count

DES function count

DES cycle count

DES blocked function count

DES blocked cycle count

AES function count

AES cycle count

AES blocked function count

AES blocked cycle count

Sample Report – Basic / Extended Counters

z10 L1 Cache Hierarchy Sourcing
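As an illustration of what the Basic Counter Set makes possible, the sketch below derives two commonly used ratios: cycles per instruction, and Level-1 cache misses per 100 instructions. Treating each L1 directory write as one L1 miss is an assumption made for this sketch, and the function names and counter values are invented; the sample report itself is not reproduced here.

```python
def cycles_per_instruction(cycle_count, instruction_count):
    """CPI from the Basic Counter Set cycle and instruction counts."""
    return cycle_count / instruction_count


def l1_misses_per_100_instructions(icache_dir_writes, dcache_dir_writes, instruction_count):
    """L1 miss rate per 100 instructions.

    Assumption: each Level-1 I-cache or D-cache directory write
    corresponds to one Level-1 cache miss.
    """
    return 100.0 * (icache_dir_writes + dcache_dir_writes) / instruction_count


# Hypothetical counter deltas for one interval on one CPU.
cpi = cycles_per_instruction(5_000_000, 1_000_000)
l1mp = l1_misses_per_100_instructions(20_000, 30_000, 1_000_000)
print(f"CPI = {cpi:.2f}, L1 misses per 100 instructions = {l1mp:.2f}")
```

The same two functions apply unchanged to the Problem State Counter Set, which gives the corresponding ratios for problem-state execution only.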

In Conclusion

Be prepared for fractional engines, multiple engine pools, varying weights etc

Understand the limitations of z/OS Image Level CPU Utilisation as a number

Take advantage of Coupling Facility Structure CPU
– For Capacity Planning
– For CF Request Performance Analysis

There’s additional instrumentation for Defined- and Group-Capacity limits

z9, z10 and zEnterprise ARE different from z990 – and from each other

The CPU data model is evolving
– To be more complete
– To be more comprehensible
– To meet new challenges
  • Such as HiperDispatch’s Parked Time state
  • For example SMF 23 and 113
