much ado about cpu
Post on 06-Dec-2014
© 2011 IBM Corporation
zZS28 Much Ado About CPU
Martin Packer
IBM System z Technical University – Vienna , Austria – May 2-6
Abstract
System z and zEnterprise processors have in recent years introduced a number of capabilities of real value to mainframe customers. These capabilities have, however, required changes in the way we think about CPU management.
This presentation describes these capabilities and how to evolve your CPU management to take them into account. It is based on the author's experience of evolving his reporting to support these changes.
This presentation is substantially enhanced this year
Agenda
A brief review of technology
Unfinished Business?
Coupling Facility CPU
zAAP and zIIP
z/OS Release 10 Changes
Soft Capping and Group Capacity Limits
Blocked Workloads
z10 Hiperdispatch
Cool It
I/O Assist Processors (IOPs)
SMF 23 and 113
In Conclusion
A Brief Review of Technology
"Characterisable" Engines
–GCPs - Pool 1
–(Obsolete Pool 2)
–ICFs - Pool 5
–IFLs - Pool 3
–zAAPs - Pool 4
–zIIPs - Pool 6
"Non-Characterisable" Engines
–SAPs
–Spares
With zEnterprise zBX other engines
–Not connected in the same way at all
–Not discussed here
–Treating as a “z11”
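As a reporting aid, the pool numbers above can be held in a small lookup table. This is only a sketch: the dictionary and function names are illustrative, not part of any IBM interface.

```python
# Pool numbers and engine types as listed on the slide above.
# ENGINE_POOLS and pool_label are illustrative names, not an IBM API.
ENGINE_POOLS = {
    1: "GCP",
    2: "(obsolete)",
    3: "IFL",
    4: "zAAP",
    5: "ICF",
    6: "zIIP",
}


def pool_label(pool_number: int) -> str:
    """Return the engine type for a pool number, or 'unknown' if unlisted."""
    return ENGINE_POOLS.get(pool_number, "unknown")


print(pool_label(5))
```

Keeping the table in one place makes it easier to report each pool separately.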
Book-Structured
● Connected by a ring in z9
● z10 and zEnterprise ensure all books connected to all books directly
● Data transfers are direct between books via the L2 Cache chip in each book's MCM
● L2 Cache is shared by every PU on the MCM
● zEnterprise has an additional per-chip level of cache – and nomenclature “cleaned up”
● Only 1 book in BC models
IRD CPU Management
Weight Management for GCP engines
–Alter weights within an LPAR Cluster
–Shifts of 10% of weight
CP Management
–Doesn't work with HiperDispatch
–Vary LOGICAL CPs on and off
–Only for GCP engines
WLM objectives
–Optimise goal attainment
–Optimise PR/SM overhead
–Optimise LPAR throughput
Part of "On Demand" picture
–Ensure you have defined reserved engines
–Make weights sensible to allow shifts to happen
Unfinished Business?
How do we evolve our performance and capacity reporting?
Should we define an LPAR with dedicated engines?
–Or with shared engines?
•What should the weights be?
- In total and individually
- And what about the total for each pool?
- How many engines should each LPAR have?
- And IRD makes all this so much more dynamic
Increasing Complexity
Installations are increasing the numbers of LPARs on a machine
–Many exceed 10 per footprint
● Expect 20+ soon
● My record: 51 and 52, 56
● 33 and 34 active, respectively
–And have more logical and physical engines
–And increasing the diversity of their LPARs
● Greater incidence of IFLs
● Fast uptake of zIIPs and zAAPs
● Sometimes meaning 2 engine speeds
● Fewer stand-alone CF configurations
–With mergers etc. the numbers of machines managed by a team is increasing
–And stuff's got more dynamic, too
–As an aside...
● Shouldn't systems be self-documenting?
Coupling Facility CPU
Internal Coupling Facility (ICF)
•Managed out of Pool 5
–Pool numbers given in SMF 70 as index into table of labels
–Label is “ICF”
Recommendation: Manage in reporting as a separate pool
Follow special CF sizing guidelines
–Especially for takeover situations
Always runs at full speed
So good technology match for coupled z/OS images on same footprint
Another good reason to use ICFs is IC links
Shared ICFs strongly discouraged for Production
Especially if the CF image has Dynamic Dispatch turned on
ICF ...
Need to correlate SMF 70-1 with SMF 74-4 CF Utilisation to get proper CPU picture
Since z/OS Release 8 74-4 has machine serial number
Allows correlation in most cases
Partition number added to 74-4 in OA21140
• Enables correlation with 70-1 when LPAR name is not the Coupling Facility Name
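A minimal sketch of that correlation, assuming the 70-1 and 74-4 data have already been parsed into dictionaries. All key names here are hypothetical except R744LPN, the 74-4 LPAR number field named elsewhere in this deck. The sample values are made up.

```python
# Hypothetical, already-parsed records. Real SMF field names differ,
# apart from R744LPN (the LPAR number carried in 74-4).
lpar_records = [  # from SMF 70-1
    {"serial": "0A1B2", "lpar_number": 7, "lpar_name": "CFPROD1", "busy_pct": 23.0},
    {"serial": "0A1B2", "lpar_number": 3, "lpar_name": "SYSA", "busy_pct": 61.5},
]
cf_records = [  # from SMF 74-4
    {"serial": "0A1B2", "R744LPN": 7, "cf_name": "CF01", "cf_util_pct": 21.4},
]


def correlate(lpars, cfs):
    """Join 70-1 and 74-4 data on (machine serial number, partition number)."""
    by_key = {(r["serial"], r["lpar_number"]): r for r in lpars}
    joined = []
    for cf in cfs:
        lpar = by_key.get((cf["serial"], cf["R744LPN"]))
        if lpar is not None:
            joined.append({**lpar, **cf})
    return joined


for row in correlate(lpar_records, cf_records):
    print(row["cf_name"], row["lpar_name"], row["busy_pct"], row["cf_util_pct"])
```

Joining on serial number plus partition number works even when the LPAR name and the Coupling Facility name differ, as in this made-up example.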
Structure-Level CPU Consumption
CFLEVEL 15 and z/OS R.9
Always 100% Capture Ratio
Adds up to R744PBSY
Multiple uses:
Capacity planning for changing request rates
Examine which structures are large consumers
Compute CPU cost of a request
• And compare to service time
• Interesting number is “non-CPU” element of service time – as we shall see
NOTE: Need to collect 74-4 data from all z/OS systems sharing to get total request rate
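The CPU cost of a request and the "non-CPU" element of service time can be worked out as below. This is a sketch with made-up numbers. The function and parameter names are illustrative, and the request count is assumed to be the total across all sharing z/OS systems.

```python
def cpu_per_request_us(structure_cpu_seconds: float, requests: int) -> float:
    """CPU microseconds per request for one structure over an interval."""
    if requests == 0:
        return 0.0
    return structure_cpu_seconds * 1_000_000 / requests


def non_cpu_us(service_time_us: float, cpu_us: float) -> float:
    """The part of the average service time not accounted for by CF CPU."""
    return service_time_us - cpu_us


# Made-up interval: 1.5 s of structure CPU, 500,000 requests summed over
# all sharing systems, 12 us average service time.
cpu = cpu_per_request_us(1.5, 500_000)
print(cpu, non_cpu_us(12.0, cpu))
```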
Structure CPU ...
Where not trivial I plot Sync Request %
Shows if deterioration with load
Different request types and technologies behave markedly differently
For example modern lock structures locally accessed are typically around 5us CPU and elapsed or lower
For example XCF structures often in hundreds of us elapsed
• And quite high CPU
• Though obviously all async
zAAP and zIIP
zAAP and zIIP
Must each not exceed number of GCPs
Run at full speed, even if GCPs don't
•Instrumentation documents “speed” difference
Hardcapping but no softcapping
•No Resource Group capping
Not managed by IRD
–Weight is the INITIAL LPAR weight
zAAP on zIIP
New with z/OS Release 11
Retrofitted to R.9 and R.10 with OA27495
Not available if you already have zAAPs installed
Or have reserved zAAP logical engines
Designed to enable further use of perhaps-underused zIIPs
Does not change the configuration rules relative to GCPs
Does not suddenly make zAAP-eligible work look like zIIP-eligible in terms of SRBs etc
No special metrics
eg zAAP work now in zIIP bucket
eg zAAP-eligible now in zIIP-eligible bucket
zIIP Instrumentation – Subsystems and Address Spaces
Instrumentation on consumption and potential for a number of exploiters:
Latter is eg “zAAP on GCP”
Type 30 Address Space – Interval and Step/Job-End
Takes RMF Workload Activity (72-3) to address space level
DB2 Accounting Trace
Type 101 shows zIIP USED times by usage category
• At plan and package level
• ELIGIBLE is only reported on up to Version 9
WebSphere Application Server
Type 120 Subtype 9 (Request Activity)
• Both zIIP and zAAP usage and potential
z/OS Release 10 Changes
z/OS Release 10 Changes
All RMF Records
Whether at least one zAAP was online
Whether at least one zIIP was online
In Type 70 and retrofitted to supported releases:
Permanent and Temporary Capacity Models and 3 capacities
Hiperdispatch
• To be covered in a few minutes
Defined- and Group-Capacity Instrumentation
Soft Capping and Group Capacity
Defined Capacity
A throttle on the rolling 4-hour average of the LPAR
– When this exceeds the defined capacity PR/SM softcaps the LPAR
– CPU delay in RMF
SMF70PMA Average Adjustment Weight for pricing management
SMF70NSW Number of samples when WLM softcaps partition
Group Capacity
Similar to Defined Capacity but for groups of LPARs on the same machines
SMF70GJT Timestamp when the system joined the Group Capacity group
SMF70GNM Group name
SMF70GMU Group Capacity MSU limit
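To see how a rolling 4-hour average behaves against a defined capacity, here is a small sketch. It assumes one MSU sample every 5 minutes, so 48 samples span 4 hours. The sample values and names are made up, and this is not how WLM or PR/SM is actually coded.

```python
from collections import deque


def rolling_average(msu_samples, window):
    """Yield the average of the last `window` samples after each new sample."""
    recent = deque(maxlen=window)
    for msu in msu_samples:
        recent.append(msu)
        yield sum(recent) / len(recent)


# Assumption: one MSU sample per 5 minutes, so 48 samples cover 4 hours.
WINDOW = 48
DEFINED_CAPACITY = 8  # MSUs

samples = [6] * 48 + [12] * 48  # a quiet 4 hours, then a busy 4 hours
averages = list(rolling_average(samples, WINDOW))

# Index of the first interval where the rolling average exceeds the cap.
first_over = next(i for i, avg in enumerate(averages) if avg > DEFINED_CAPACITY)
print(first_over)
```

In this example the LPAR runs well above 8 MSUs for over an hour before the rolling average crosses the defined capacity. That is why instantaneous usage and capping do not line up on a chart.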
Exceeding 8 MSUs (MSU_VS_CAP > 100%) in the morning leads to active capping (SOFTCAPPED > 0%).
Note: OCPU and O2 are CPU Queuing numbers
Group Capacity Limits
Each partition (z/OS system) manages itself
Group capacity is based on defined capacity implementation
4hr rolling average of group MSU consumption is used for managing the group's partitions
Each partition is aware of the consumption of all other partitions on the CPC
And identifies all other partitions that are member of the same capacity group
Calculates its defined share of the capacity group, based on the partition weight.
• This share is the target for the partition if all partitions of the group want to use as much CPU as possible
If some LPARs do not consume their share the unused capacity will be distributed over those LPARs that need additional capacity
If a defined capacity limit is defined to a partition that limit will not be violated even when the partition receives capacity from others.
WLM will only manage partitions with shared CPs and WC=NO
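The weight-based share can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions: the group, LPAR names, weights and limits are made up, and redistribution of unused capacity is not modelled.

```python
def group_shares(group_limit_msu, weights):
    """Split a group MSU limit across member LPARs in proportion to weight."""
    total = sum(weights.values())
    return {name: group_limit_msu * w / total for name, w in weights.items()}


def capped(msu, defined_capacity_msu=None):
    """A partition's own defined capacity, if it has one, is never exceeded."""
    if defined_capacity_msu is None:
        return msu
    return min(msu, defined_capacity_msu)


# Hypothetical group: 100 MSU limit, three LPARs weighted 500/300/200.
shares = group_shares(100, {"SYSA": 500, "SYSB": 300, "SYSC": 200})
print(shares)
print(capped(60, defined_capacity_msu=40))
```

Each share is the target only when every member wants as much CPU as it can get. The `capped` helper shows that capacity received from other members still cannot take a partition past its own defined capacity.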
LPAR Table Fragment for Group Capacity
Blocked Workloads
z/OS Release 9 Blocked Workload Support
Rolled back to R.7 and R.8
Blocked workloads:
Lower priority work may not get dispatched for an elongated time
May hold a resource that more important work is waiting for
WLM allows some throughput for blocked workloads
By dispatching low importance work from time to time, these “blocked workloads” are no longer blocked
Helps to resolve resource contention for workloads that have no resource management implemented
Additional information in WSC flash http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10609
Additional instrumentation in 70-1 and 72-3
IEAOPT BLWLTRPCT and BLWLINTHD (With OA22443)
BLWLTRPCT Percentage of the CPU capacity of the LPAR to be used for promotion
Specified in units of 0.1%
Default is 5 (=0.5%)
Maximum is 200 (=20%)
Would only be spent when sufficiently many dispatchable units need promotion.
BLWLINTHD Specifies threshold time interval for which a blocked address space or enclave must wait before being considered for promotion.
Minimum is 5 seconds. Maximum is 65535 seconds.
Default is 60 seconds.
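Because BLWLTRPCT is given in tenths of a percent, it is easy to misread. The sketch below converts and range-checks the two values using the limits quoted above. The function names are illustrative, and the lower bound of 0 for BLWLTRPCT is an assumption, since the slide only gives the default and the maximum.

```python
def promotion_percent(blwltrpct: int) -> float:
    """Convert BLWLTRPCT (units of 0.1%) into a percentage of LPAR capacity."""
    # Upper bound 200 is from the slide; lower bound 0 is an assumption.
    if not 0 <= blwltrpct <= 200:
        raise ValueError("BLWLTRPCT must be between 0 and 200")
    return blwltrpct / 10


def check_blwlinthd(seconds: int) -> int:
    """Validate BLWLINTHD against the 5 to 65535 second range on the slide."""
    if not 5 <= seconds <= 65535:
        raise ValueError("BLWLINTHD must be between 5 and 65535 seconds")
    return seconds


print(promotion_percent(5))    # the default
print(promotion_percent(200))  # the maximum
print(check_blwlinthd(60))     # the default
```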
Type 70 CPU Control Section
Type 72-3 Service/Report Class Period Data Section
IBM System z10 EC HiperDispatch
z10 EC HiperDispatch
HiperDispatch – z10 EC unique function
– Dispatcher Affinity (DA) - New z/OS Dispatcher
– Vertical CPU Management (VCM) - New PR/SM Support
Hardware cache optimization occurs when a given unit of work is consistently dispatched on the same physical CPU
– Up until now software, hardware, and firmware have acted independently of each other
– Non-Uniform-Memory-Access has forced a paradigm change
• CPUs have different distance-to-memory attributes
• Memory accesses can take a number of cycles depending upon cache level / local or remote memory accessed
The entire z10 EC hardware / firmware / OS stack now tightly collaborates to manage these effects
z10 EC HiperDispatch – z/OS Dispatcher Functionality
New z/OS Dispatcher
– Multiple dispatching queues
• Average 4 logical processors per queue
– Tasks distributed amongst queues
– Periodic rebalancing of task assignments
– Generally assign work to minimum # logicals needed to use weight
• Expand to use white space on box
– Real-time on/off switch (Parameter in IEAOPTxx)
– May require "tightening up" of WLM policies for important work
• Priorities are more sensitive with targeted dispatching queues
z10 EC HiperDispatch – z/OS Dispatcher Functionality…
Initialization:
Single HIPERDISPATCH=YES z/OS parameter dynamically activates HiperDispatch (full S/W and H/W collaboration) without IPL
• With HIPERDISPATCH=YES, IRD management of CPU is turned OFF
Four Vertical High LPs are assigned to each Affinity Node
A “Home” Affinity Node is assigned to each address space / task
zIIP, zAAP and standard CP “Home” Affinity Nodes must be maintained for work that transitions across specialty engines
Benefit increases as LPAR size increases (i.e. crosses books)
z10 EC HiperDispatch – z/OS Dispatcher Functionality…
Workload Variability Issues:
– Short Term
• Dealing with transient utilization spikes
– Intermediate
• Balancing workload across multiple Affinity Nodes
– Manages “Home” Book assignment
– Long Term
• Mapping z/OS workload requirements to available physical resources
– Via dynamic expansion into Vertical Low Logical Processors
z10 EC HiperDispatch – PR/SM Functionality
New PR/SM Support
– Topology information exchanged with z/OS
• z/OS uses this to construct its dispatching queues
– Classes of logicals
• High priority allowed to consume weight
– Tight tie of logical processor to physical processor
• Low priority generally run only to consume white space
z10 EC HiperDispatch – PR/SM Functionality…
Firmware Support (PR/SM, millicode)
New z/OS invoked instruction to cause PR/SM to enter “Vertical mode”
• To assign vertical LPs subset and their associated LP to physical CP mapping
– Based upon LPAR weight
Enables z/OS to concentrate its work on fewer vertical processors
• Key in PR/SM overcommitted environments to reduce the LP competition for physical CP resources
Vertical LPs are assigned High, Medium, and Low attributes
Vertical low LPs shouldn’t be used unless there is logical white space within the CEC and demand within LPAR
z10 EC HiperDispatch Instrumentation
Hiperdispatch status
– SMF70HHF bits for Supported, Active, Status Changed
Parked Time
– SMF70PAT in CPU Data Section
Polarization Weight
– SMF70POW in Logical Processor Data Section
• Highest weight for LPAR means Vertical High processor
• Zero weight means Vertical Low processor
• In-between means Vertical Medium processor
Example on next foil
– 2 x Vertical High (VH)
– 1 x Vertical Medium (VM)
– 4 x Vertical Low (VL)
– Because Hiperdispatch all engines online in the interval are online all the time
• But there are other engines reserved so with Online Time = 0
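The polarization-weight rule of thumb above can be sketched as a small classifier. The weights in the example are made up to reproduce the 2 VH, 1 VM, 4 VL pattern. The function name is illustrative, and the rule is only as good as the slide's heuristic.

```python
def classify(polar_weights):
    """Label each logical processor VH, VM or VL from its polarization weight.

    Follows the rule of thumb: zero weight is Vertical Low, the highest
    weight in the LPAR is Vertical High, anything in between is Vertical Medium.
    """
    highest = max(polar_weights)
    labels = []
    for weight in polar_weights:
        if weight == 0:
            labels.append("VL")
        elif weight == highest:
            labels.append("VH")
        else:
            labels.append("VM")
    return labels


# Made-up weights for logical processors 0 to 6.
print(classify([100, 100, 45, 0, 0, 0, 0]))
```

One caveat of this heuristic: an LPAR with no Vertical High processors at all would have its largest Vertical Medium labelled VH, so treat the output as a first pass.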
Depiction Of An LPAR – With HiperDispatch Enabled
[Chart: logical processors 0 to 6 on the horizontal axis; series UNPARKED %, PARKED %, POLAR WEIGHT and I/O %]
HiperDispatch “GA2” Support in RMF - OA21140
SMF70POF Polarisation Indicators Bits 0,1
00 is “Horizontal” or “Polarisation Not Indicated”
01 is “Vertical Low”
10 is “Vertical Medium”
11 is “Vertical High”
(Bit 2 is whether it changed in the interval)
SMF70Q00 - SMF70Q12 In & Ready counts based on the number of processors online and unparked
Refinement is to take into account parking and unparking
Also SMF70RNM
Normalisation factor for zIIP
• Which happens to be the same for zAAP
Also R744LPN – LPAR Number
For correlation with SMF 70
(Also zHPF support)
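Decoding the SMF70POF bits might look like the sketch below. It assumes the field arrives as a single byte and that bits are numbered the IBM way, with bit 0 the leftmost (most significant) bit. Check both assumptions against the record mapping before relying on it.

```python
# Bit values as described on the slide above.
POLARISATION = {
    0b00: "Horizontal or Polarisation Not Indicated",
    0b01: "Vertical Low",
    0b10: "Vertical Medium",
    0b11: "Vertical High",
}


def decode_pof(flag_byte: int):
    """Return (polarisation label, changed-in-interval) for one flag byte.

    Assumes a one-byte field with IBM bit numbering (bit 0 is leftmost).
    """
    polarisation = (flag_byte >> 6) & 0b11  # bits 0 and 1
    changed = bool((flag_byte >> 5) & 0b1)  # bit 2
    return POLARISATION[polarisation], changed


print(decode_pof(0b11000000))
print(decode_pof(0b01100000))
```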
“Cool It” - Cycle Steering
Introduced with z990
http://www.research.ibm.com/journal/rd/483/goth.html
Refined in later processors
BOTH frequency- and voltage-reduction in z9
When cooling degraded processor progressively slowed
Much better than dying
Rare event
• But should not be ignored
WLM Policy refreshed
Admittedly not that helpful a message:
• IWM063I WLM POLICY WAS REFRESHED DUE TO A PROCESSOR SPEED CHANGE
• Automate it
SMF70CPA not changed
• Used as part of SCRT
• Talk to IBM and consider excluding intervals round such an event
R723MADJ is changed
• Al Sherkow's news item shows an example:
– http://www.sherkow.com/updates/20081014cooling.html
In R.12 Types 89, 70, 72 and 30 have instrumentation for this situation
IOPs – I/O Assist Processors
Not documented in Type 70
Despite being regular engines characterised as IOPs
NOT a pool
Instrumentation in Type 78-3
Variable-length Control Section
• 1 IOP Initiative Queue / Util Data Section per IOP inside it
Processor Was Busy / Was Idle counts
• NOT Processor Utilisation as such
• Suggest stacking the two numbers on a by-hour plot
I/O Retry counts
• Channel Path Busy, CU Busy, Device Busy
Machines can be configured with different numbers of IOPs
Depending on I/O intensiveness of workloads
• Generally speaking it's only TPF that is said to need extra IOPs
Analysis can help get this right
SMF 23 and 113
SMF 23
SMF 23 – The “SMF” record
New extensions to the SMF 23 record
• Provide information related to Dispatching, Storage and I/O
• Available on z/OS 1.8 and above
Why would you want to collect them?
They may provide a way to help characterize your workload to improve your capacity planning
• LoIO Mix zPCR is simply an estimate of your actual workload pattern
Record Size and Interval
Small record - 210 bytes (258 bytes with “deltas”) per System per interval
What is in the SMF 23s? - New Fields via APAR OA22414
Storage
Total Number of Getmain requests (NGR)
Total Pages backed during Getmain requests (PBG)
Total Number of Fixed requests for Storage below 2 GB (NFR)
Total number of Frames for Fixed requests for Storage below 2 GB (PFX)
Faults
Total number of first reference faults (1RF)
Total number of non first reference faults (NRF)
I/Os
Total Number of I/Os (NIO)
Dispatches (Dispatch)
Number of unlocked TCB Dispatches (TCB)
Number of SRB Dispatches (SRB)
APAR OA27161 – Closed 1/19/2009
To provide “delta” counters for above fields
Otherwise “cumulative” counters
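Without the "delta" fields, per-interval values have to be derived from the cumulative counters. A minimal sketch with made-up readings follows. It does not handle counter resets (for example across an IPL) or wraparound, which a real reporting program would need to consider.

```python
def deltas(cumulative):
    """Turn a series of cumulative counter readings into per-interval deltas."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Made-up cumulative NIO (total number of I/Os) readings, one per interval.
nio = [1000, 1750, 2900, 2900, 4100]
print(deltas(nio))
```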
What is the z10 CPU Measurement Facility?
New hardware instrumentation facility “CPU Measurement Facility” (CPU MF)
Available on System z10 EC GA2 and z10 BC
Supported by a new z/OS component (Instrumentation), Hardware Instrumentation Services (HIS)
Potential Future Uses – for this new “cool” virtualization technology
CPU MF provides support built into the processor hardware
• So exploiting mechanism allows the observation of performance behavior with nearly no impact to the system being observed
Potential Uses
• Future workload characterization
• ISV product improvement
• Application Tuning
CPU MF ...
Data collection done by System z hardware
Low overhead
Little/No skew in sampling
Access to information which is not available from software
SAMPLING
SAMPFREQ=800000 is default (samples per minute), = 13,333 /s
• 8M samples in 10 minutes is the default (DURATION=10 is the default, 10 minutes)
• Recommendation – Start with a small frequency, e.g. SAMPFREQ=320, and increase after early experiences – e.g. ensure enough disk space for output
– Smaller z10 BCs should increase only up to SAMPFREQ=130000 (for DURATION=60)
New IBM Research article
“IBM System z10 performance improvements with software and hardware synergy”
http://www.research.ibm.com/journal/rd/531/jackson.pdf
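The sampling arithmetic quoted above is easy to reproduce when sizing output. This sketch takes SAMPFREQ as samples per minute and DURATION as minutes, as the slide states. The function names are illustrative.

```python
def samples_per_second(sampfreq_per_minute: int) -> float:
    """Sampling rate per second for a SAMPFREQ given in samples per minute."""
    return sampfreq_per_minute / 60


def total_samples(sampfreq_per_minute: int, duration_minutes: int) -> int:
    """Total samples collected over a run of DURATION minutes."""
    return sampfreq_per_minute * duration_minutes


# Defaults from the slide: SAMPFREQ=800000, DURATION=10.
print(round(samples_per_second(800_000)))
print(total_samples(800_000, 10))

# The smaller z10 BC suggestion: SAMPFREQ=130000 for DURATION=60.
print(total_samples(130_000, 60))
```

The suggested z10 BC settings collect about the same number of samples as the defaults, spread over an hour instead of ten minutes.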
COUNTERS
Basic Counter Set
Cycle count
Instruction count
Level-1 I-cache directory write count
Level-1 I-cache penalty cycle count
Level-1 D-cache directory write count
Level-1 D-cache penalty cycle count
Problem State Counter Set
Problem state cycle count
Problem state instruction count
Problem state level-1 I-cache directory write count
Problem state level-1 I-cache penalty cycle count
Problem state level-1 D-cache directory write count
Problem state level-1 D-cache penalty cycle count
Extended Counter Set
Number and meaning of counters are model-dependent
Crypto Activity Counter Set (CPACF activity)
PRNG function count
PRNG cycle count
PRNG blocked function count
PRNG blocked cycle count
SHA function count
SHA cycle count
SHA blocked function count
SHA blocked cycle count
DES function count
DES cycle count
DES blocked function count
DES blocked cycle count
AES function count
AES cycle count
AES blocked function count
AES blocked cycle count
Sample Report – Basic / Extended Counters
z10 L1 Cache Hierarchy Sourcing
In Conclusion
In Conclusion
Be prepared for fractional engines, multiple engine pools, varying weights etc
Understand the limitations of z/OS Image Level CPU Utilisation as a number
Take advantage of Coupling Facility Structure CPU
For Capacity Planning
For CF Request Performance Analysis
There’s additional instrumentation for Defined- and Group-Capacity limits
z9, z10 and zEnterprise ARE different from z990 – and from each other
The CPU data model is evolving
To be more complete
To be more comprehensible
To meet new challenges
Such as Hiperdispatch’s Parked Time state
For example SMF 23 and 113