Post on 12-Dec-2018
1
Improving XCF Performance to Keep Up with CPC Improvements
Donald Zeunert
Tuesday 1st November, 2016 (12:00 - 13:00)
Session LB in Woodcote
2
Objectives
• Why should you care about XCF performance?
• Why can XCF performance be an issue now?
• How to determine the source of issues
• How to correct performance issues
3
Common XCF Users
• GRS, VTAM, FTP
• CICS, IMS OTMA
Why should I care? Response time of online applications
4
Now (2016)
• CPs too fast / expensive
– Sharing ICFs
– Using a % of a GCP for the ICF
• CF Links
– ICs approx. 9 Gbps
– CS5 fastest at 6 Gbps
• Higher XCF volumes
– Mobile / Web workloads
• Tuning ignored or outdated
Then (2006)
• Last IBM XCF White Paper
• z9 (2094) Sep 16, 2005
– 2094-701 81 MSU CP
• CF Links
– ICs Internal – 3.2 Gbps
– ICB-4 fastest at 1.5 Gbps
– IC3 > 10km slowest 0.1 Gbps
• z9 physical memory cost ($) and size
Why an Issue Now? Hardware Advances
5
Typical Multi-CPC Transport Class Paths
Class – Path Definitions
• CLASSDEF CLASS(DEFAULT) CLASSLEN(956) GROUP(UNDESIG)
• PATHOUT CLASS(DEFAULT) MAXMSG(2000) STRNAME(IXCPLEX_DEF1)
• PATHOUT CLASS(DEFAULT) MAXMSG(2000) STRNAME(IXCPLEX_DEF2)
• PATHOUT CLASS(DEFAULT) DEVICE(C400)
[Diagram of 1 transport class: XCF in LPAR A (CPC 1), LPAR B and LPAR C (CPC 2), connected via CF paths and CTC paths]
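A quick way to see how many paths back each transport class is to count the PATHOUT statements per class. The Python sketch below parses the COUPLExx-style statements from this slide; it handles only the simple KEYWORD(value) form shown here (no comments or continuation lines), and treating a PATHOUT without a CLASS keyword as class DEFAULT is an assumption of the sketch.

```python
import re
from collections import Counter

# COUPLExx-style statements as shown on the slide, one statement per string.
STATEMENTS = [
    "CLASSDEF CLASS(DEFAULT) CLASSLEN(956) GROUP(UNDESIG)",
    "PATHOUT CLASS(DEFAULT) MAXMSG(2000) STRNAME(IXCPLEX_DEF1)",
    "PATHOUT CLASS(DEFAULT) MAXMSG(2000) STRNAME(IXCPLEX_DEF2)",
    "PATHOUT CLASS(DEFAULT) DEVICE(C400)",
]

# Matches KEYWORD(value) pairs such as CLASS(DEFAULT) or DEVICE(C400).
KEYWORD = re.compile(r"(\w+)\(([^)]*)\)")


def parse(statement: str) -> tuple:
    """Split a statement into its verb and a {keyword: value} dictionary."""
    verb, _, rest = statement.partition(" ")
    return verb, dict(KEYWORD.findall(rest))


def paths_per_class(statements) -> Counter:
    """Count PATHOUT definitions per (transport class, path type)."""
    counts = Counter()
    for statement in statements:
        verb, keywords = parse(statement)
        if verb != "PATHOUT":
            continue
        # STRNAME means a CF list structure path; DEVICE means a CTC path.
        kind = "CF" if "STRNAME" in keywords else "CTC"
        # Assumption: an omitted CLASS keyword is treated as DEFAULT.
        counts[(keywords.get("CLASS", "DEFAULT"), kind)] += 1
    return counts


for (klass, kind), n in sorted(paths_per_class(STATEMENTS).items()):
    print(f"{klass}: {n} {kind} path(s)")
```

For these definitions the DEFAULT class is backed by two CF structure paths and one CTC path.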
6
Coupling Facility – Faster Data transfer
Expected Data Transfer Rate (MB/sec)
System         Best x-CPC  %Slow     IC     CS5     IFB3   IFB    IFB   ICB-4
               Avg         Same CPC         <70m    12x    12x    1x
z9 EC (2005)   2750        31%       4000   -       -      600    -     1500
z10 EC (2008)  4500        40%       7500   -       -      1000   400   1500
z196 (2010)    6950        22%       8900   -       5000   1000   400   N/A
zEC12 (2012)   7200        23%       9400   -       5000   1000   400   N/A
z13 (2015)     7250        15%       8500   6000    5000   1000   400   N/A
Note: IBM XCF tuning guidance dates from 2006 on z9 hardware; since then:
X-CPC > 2.5x faster
Same CPC > 2x faster
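The ratios in the note can be checked against the table. This Python sketch assumes "X-CPC" corresponds to the "Best x-CPC Avg" column and "Same CPC" to the IC column, and compares the z9 EC (2005) row with the z13 (2015) row.

```python
# Best cross-CPC average and internal coupling (IC, same-CPC) rates in MB/sec,
# copied from the z9 EC (2005) and z13 (2015) rows of the table.
Z9_EC = {"x_cpc": 2750, "ic": 4000}
Z13 = {"x_cpc": 7250, "ic": 8500}


def speedup(old_rate: float, new_rate: float) -> float:
    """How many times faster the new transfer rate is than the old one."""
    return new_rate / old_rate


x_cpc_gain = speedup(Z9_EC["x_cpc"], Z13["x_cpc"])
same_cpc_gain = speedup(Z9_EC["ic"], Z13["ic"])
print(f"Cross-CPC: {x_cpc_gain:.2f}x faster")
print(f"Same CPC (IC): {same_cpc_gain:.2f}x faster")
```

Both results clear the thresholds in the note (above 2.5x and above 2x respectively).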
7
Determining CF resources for XCF
SDSF SYSLOG (Production often 2 CFs)
COMMAND INPUT ===> /D XCF,CF
RESPONSE=IMSA
IXC361I 13.56.40 DISPLAY XCF 721
CFNAME COUPLING FACILITY SITE
CF11 002964.IBM.02.0000000B62E7 N/A
PARTITION: 0D CPCID: 00
CF13 002964.IBM.02.0000000B62E7 N/A
PARTITION: 0C CPCID: 00
----------------------------------------------
COMMAND INPUT ===> /D XCF,CF
RESPONSE=BMCB
IXC361I 14.23.39 DISPLAY XCF 052
CFNAME COUPLING FACILITY SITE
CF0B 002964.IBM.02.0000000B62E7 N/A
PARTITION: 2F CPCID: 00
SDSF SYSLOG
COMMAND INPUT ===> /D XCF,PATHIN
RESPONSE=BMCB
IXC355I 14.13.36 DISPLAY XCF 060
PATHIN FROM SYSNAME: BMCA
STRNAME: IXCLIST1 IXCLIST2
IXCLIST3 IXCLIST4
IXCLIST5
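How many paths are defined?
What type of Coupling Facility serves LPARs A and B? The machine type at the start of the COUPLING FACILITY field (002964 above) identifies it:
• z Systems z13s (2965)
• z Systems z13 (2964)
• zEnterprise BC12 (2828)
• zEnterprise EC12 (2827)
• zEnterprise 114 (2818)
• zEnterprise 196 (2817)
CF speed relative to GCP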
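Both questions can be answered from the display output with a little text processing. The Python sketch below counts the structures listed after STRNAME: (including continuation lines) and maps the leading machine type of the CF identifier to the model names listed on this slide. Ending the name list at the next line containing a colon is an assumption about the IXC355I layout, not documented parsing rules.

```python
# Output lines copied from the D XCF,PATHIN and D XCF,CF displays.
PATHIN_DISPLAY = """\
IXC355I 14.13.36 DISPLAY XCF 060
PATHIN FROM SYSNAME: BMCA
STRNAME: IXCLIST1 IXCLIST2
IXCLIST3 IXCLIST4
IXCLIST5
"""
CF_ID = "002964.IBM.02.0000000B62E7"

# Machine type -> model, from the list on this slide.
MACHINE_TYPES = {
    2965: "z Systems z13s",
    2964: "z Systems z13",
    2828: "zEnterprise BC12",
    2827: "zEnterprise EC12",
    2818: "zEnterprise 114",
    2817: "zEnterprise 196",
}


def pathin_structures(display: str) -> list:
    """Collect structure names from STRNAME: and its continuation lines."""
    names, in_list = [], False
    for raw in display.splitlines():
        line = raw.strip()
        if line.startswith("STRNAME:"):
            in_list = True
            line = line[len("STRNAME:"):]
        elif ":" in line:
            # Assumption: any other "FIELD:" line ends the name list.
            in_list = False
        if in_list:
            names.extend(line.split())
    return names


def cf_machine(cf_id: str) -> str:
    """Map the leading machine type (e.g. 002964) to a model name."""
    machine_type = int(cf_id.split(".")[0])
    return MACHINE_TYPES.get(machine_type, f"unknown type {machine_type}")


paths = pathin_structures(PATHIN_DISPLAY)
print(f"{len(paths)} inbound structure paths: {' '.join(paths)}")
print(f"CF machine: {cf_machine(CF_ID)}")
```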
8
What Are the XCF Transport Class Definitions?
COMMAND INPUT ===> /D XCF,CLASSDEF
IXC343I 14.44.52 DISPLAY XCF 116
TRANSPORT CLASS: DEFAULT DEFLARGE DEFMED DEFSMALL DEFXLRGE
COMMAND INPUT ===> /D XCF,CD,CLASS=DEFLARGE
IXC344I 14.47.08 DISPLAY XCF 310
TRANSPORT CLASS DEFAULT ASSIGNED
CLASS LENGTH MAXMSG GROUPS
DEFLARGE 8124 5000 UNDESIG
DEFLARGE TRANSPORT CLASS USAGE FOR SYSTEM BMCA
SUM MAXMSG: 10000 IN USE: 100 NOBUFF: 0
SEND CNT: 101142 BUFFLEN (FIT): 8124
DEFLARGE TRANSPORT CLASS USAGE FOR SYSTEM BMCB
SUM MAXMSG: 5000 IN USE: 0 NOBUFF: 0
SEND CNT: 27360 BUFFLEN (FIT): 8124
• Many customers only have 2 XCF transport classes:
– DEFAULT and one larger size
• Issue the class display command for each one
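Issuing the class display for every class is mechanical once the IXC343I class list is known. This small Python sketch builds one D XCF,CD,CLASS=… command per class name in that response; the alternative D XCF,CD,CLASS=ALL shows all classes in a single response.

```python
def class_display_commands(ixc343i_line: str) -> list:
    """Build one D XCF,CD,CLASS=... command per class named in an IXC343I line."""
    _, _, names = ixc343i_line.partition("TRANSPORT CLASS:")
    return [f"D XCF,CD,CLASS={name}" for name in names.split()]


# Class list copied from the IXC343I response on this slide.
LINE = "TRANSPORT CLASS: DEFAULT DEFLARGE DEFMED DEFSMALL DEFXLRGE"
for command in class_display_commands(LINE):
    print(command)
```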
10
Determine XCF Capabilities
Production: Robust
Test: Slow / stall
Test w/ Load: Unacceptable
11
KPIs of Concern
• Coupling Facility
– CF CPU Utilization and # of effective processors
– SYNC times of Lock structures for reference
– ASYNC Service times and Standard Deviations
• XCF
– # of transport classes and paths, path busy
– Message sizes / fit
– Message transfer times (only in SMF records or XCF display commands)
12
Sample Range of Customer data
XCF ISSUES CF Issues
Deploy High Xfer Path Issue Class Sz Q/Bsy/Rtry Async StdDev Sync ICF Shr
ICF %Bsy
%Shr GCP
DED Issue
Bank A Test 0.06 N 2-Small 30 40 33 0.0 50&70 x% n N/A
Bank A Prod 0.3 N 2-Small 32 40 31 0.0 30&50 x% n N/A
Bank B Test 0.1 N 2-Small 0/50%/0 <40 500 <10 100 12 N/A PRSM?
Bank C Test 2.4 N N 0.04/35/0 250 4.8K 460 0.0 56 N/A N/A
Bank D test >100 N N 0.01/0/3 >275 5K 2310 1% / 6 ? 1% / 6 N/A
Government 1 Own/Prod > 2.0 N N 0/0/0 800 5K <10 100 < 1 N/A Y/ GCP%
Bank E test 0.002 – 50% Y N 0/133/0 1.9K 44K ? 100 ? ? Y/ GCP%
Insurance A Prod 0.00025 N ? Y/ GCP%
13
Coupling Facility Overview
Good: no path or subchannel busy
Dynamic Dispatch=THIN improves CF service StdDev
CF %Busy - Bad > 50% for 2, >30% for 1
ICF Share - Bad very small < 0.005% of a CP.
Desired >= 0.1
Production typically Dedicated=1
Note: z9+ w/ Dedicated ICF should deliver 7-10 microsec Sync service to DB2 Lock structures
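The rules of thumb on this slide can be turned into a simple checklist. This Python sketch is a hypothetical helper, not a product feature: it reads "> 50% for 2, > 30% for 1" as the number of CF engines (an assumption) and flags lock SYNC times above the 7-10 microsecond expectation for a dedicated ICF.

```python
def cf_findings(busy_pct: float, engines: int, dedicated: bool,
                lock_sync_us: float) -> list:
    """Apply the slide's rule-of-thumb CF checks; returns a list of findings."""
    findings = []
    # Assumption: "> 50% for 2, > 30% for 1" refers to the number of CF engines.
    busy_limit = 50.0 if engines >= 2 else 30.0
    if busy_pct > busy_limit:
        findings.append(
            f"CF busy {busy_pct:.0f}% > {busy_limit:.0f}% for {engines} engine(s)")
    # z9+ with a dedicated ICF should give 7-10 microsecond SYNC lock service.
    if dedicated and lock_sync_us > 10.0:
        findings.append(
            f"lock SYNC {lock_sync_us:.1f} us > 10 us on a dedicated ICF")
    return findings


# e.g. the 6.3 us ISGLOCK SYNC average for MVSA on a 2-engine dedicated CF
print(cf_findings(busy_pct=20, engines=2, dedicated=True, lock_sync_us=6.3))
```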
14
CF Structures used by XCF (IXC*)
XCF structures' % of total CF CPU
utilization: in n-way data sharing (DS),
DB2 and GRS typically use a large %
High XCF ASYNC service times on CF structures will cause high XCF message transfer times
High standard deviations will lead to erratic response times
ASYNC service times
Test 40-80 µs acceptable
Prod 20-50 µs typical
Standard deviations
Shared CF – 3x of above OK
Dedicated CF – 2x typical
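The ASYNC guidance can be encoded the same way. In this Python sketch, "3x of above" is interpreted as allowing a standard deviation up to 3x (shared CF) or 2x (dedicated CF) the upper end of the service-time range; that reading is an assumption, and the function name is hypothetical.

```python
# Upper ends of the slide's ASYNC service-time guidance, in microseconds:
# test 40-80 acceptable, production 20-50 typical.
ASYNC_UPPER_US = {"test": 80.0, "prod": 50.0}
# StdDev allowance as a multiple of that guidance: shared CF 3x, dedicated 2x.
STDDEV_FACTOR = {"shared": 3.0, "dedicated": 2.0}


def assess_async(env: str, cf_kind: str, avg_us: float, stddev_us: float) -> dict:
    """Compare an XCF structure's ASYNC average and StdDev with the guidance."""
    upper = ASYNC_UPPER_US[env]
    # Assumption: the StdDev limit is the factor times the upper end of the range.
    stddev_limit = STDDEV_FACTOR[cf_kind] * upper
    return {
        "service_ok": avg_us <= upper,
        "stddev_ok": stddev_us <= stddev_limit,
    }


print(assess_async("test", "shared", 60.0, 150.0))
```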
15
CF SYNC Service Times (Good)
C O U P L I N G F A C I L I T Y A C T I V I T Y
z/OS V2R2 SYSPLEX PLEX1 DATE 06/09/2016 INTERVAL 015.00.000
RPT VERSION V2R2 RMF TIME 07.45.00 CYCLE 01.000 SECONDS
----------------------------------------------------------------------------------------------------------
COUPLING FACILITY NAME = ZCF01
----------------------------------------------------------------------------------------------------------
COUPLING FACILITY STRUCTURE ACTIVITY
----------------------------------------------------------------------------------------------------------
STRUCTURE NAME = ISGLOCK TYPE = LOCK STATUS = ACTIVE
# REQ -------------- REQUESTS ------------- -------------- DELAYED REQUESTS -------------
SYSTEM TOTAL # % OF -SERV TIME(MIC)- REASON # % OF ---- AVG TIME(MIC) -----
NAME AVG/SEC REQ ALL AVG STD_DEV REQ REQ /DEL STD_DEV /ALL
MVSA 281K SYNC 281K 74.8 6.3 2.8 NO SCH 0 0.0 0.0 0.0 0.0
312.2 ASYNC 0 0.0 0.0 0.0 PR WT 0 0.0 0.0 0.0 0.0
CHNGD 0 0.0 INCLUDED IN ASYNC PR CMP 0 0.0 0.0 0.0 0.0
SUPPR 0 0.0
MVSB 7303 SYNC 7303 1.9 8.4 4.3 NO SCH 0 0.0 0.0 0.0 0.0
8.11 ASYNC 0 0.0 0.0 0.0 PR WT 0 0.0 0.0 0.0 0.0
CHNGD 0 0.0 INCLUDED IN ASYNC PR CMP 0 0.0 0.0 0.0 0.0
SUPPR 0 0.0
16
XCF (SYNC vs ASYNC) is not CF SYNC
[Diagram: CICS application program (DB2 thread in CICS) with an 80K message; XCF buffers of 32K, 32K, 32K and 16K; XCF pathout; buffers of other XCF applications]
17
Not Only a CF Issue
CF01 BC12                  Req/Sec  #Req  % All Req  Avg Srv  StdDev
MVSA IXC_SIG1 DEFAULT      56.13    51K   13.80      200.80   1,865
MVSB IXC_SIG1 DEFAULT      24.99    22K   6.20       656.50   6,392
2 dedicated ICFs shared by 2 CPCs, both < 10% busy, yet still service time issues.
Two LPARs on the same CPC use the same CF with drastically different service times.
The issue is MVSB's low PR/SM entitlement: it is using all of it on a CPC at 75% busy.
18
LPAR Weight – SYNC vs ASYNC
PARTITION  LPAR  WEIGHT  Share %   CP Shr  LOG CNT  Ent MSU  MSU USED
LP01       MVSA  200     61.92%    1.24    2        24.77    19
LP02       MVSB  41      12.69%    0.25    2        5.08     5
LP03       MVSC  41      12.69%    0.25    2        5.08     2
LP04       MVSD  41      12.69%    0.25    2        5.08     1
TOTAL            323     100.00%   2
SYNC request: z/OS LPAR keeps the GCP
ASYNC request: z/OS LPAR may give up the GCP
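The entitlement columns follow from the weights: each LPAR's share is its weight divided by the total weight (323), applied to the shared CPs and to the CPC's MSU capacity. This Python sketch reproduces that arithmetic; the 40 MSU total is an assumption back-calculated from MVSA's 24.77 MSU at 61.92%, and the 2 shared CPs are read from the TOTAL row.

```python
def entitlements(weights: dict, shared_cps: int, cpc_msu: float) -> dict:
    """Split shared CP capacity by PR/SM weight."""
    total = sum(weights.values())
    result = {}
    for lpar, weight in weights.items():
        share = weight / total
        result[lpar] = {
            "share_pct": round(100 * share, 2),
            "cp_share": round(shared_cps * share, 2),
            "ent_msu": round(cpc_msu * share, 2),
        }
    return result


WEIGHTS = {"MVSA": 200, "MVSB": 41, "MVSC": 41, "MVSD": 41}
# Assumption: 40 MSU of shared capacity, back-calculated from 24.77 / 61.92%.
ENT = entitlements(WEIGHTS, shared_cps=2, cpc_msu=40.0)
for lpar, values in ENT.items():
    print(lpar, values)
```

Set against the MSU USED column, MVSB consumes 5 of its 5.08 entitled MSU, which is why its XCF requests suffer when the CPC is busy.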
19
CF Local vs Remote ASYNC Service
[Chart: ASYNC service times for 1K and 64K buffers, local vs remote CF]
A 64K buffer is NOT full, but remote via a CF link it costs > 3x the 1K buffer
21
XCF Usage Overview
How many transport classes are defined (DEFSMALL, DEFMED, etc.), with what sizes (~1K, 4K, 8K, etc.), and how many paths?
Which transport classes moved the most messages? (Find the associated CF structure.)
22
XCF Path details
No queueing on paths is best
1% queueing is acceptable
If higher, add more paths
Highest XCF traffic was on CF structure IXCLIST2,
which matches the highest user in the CF structure report
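The queueing guideline can be applied to path statistics directly. This Python sketch assumes the path-available and path-busy request counts from the RMF XCF path statistics (AVAIL and BUSY columns) are the right inputs; the helper names are hypothetical.

```python
def queued_pct(available: int, busy: int) -> float:
    """Percent of outbound requests that found the path busy."""
    total = available + busy
    return 0.0 if total == 0 else 100.0 * busy / total


def needs_more_paths(available: int, busy: int, limit_pct: float = 1.0) -> bool:
    """Slide guidance: no queueing is best, about 1% is acceptable."""
    return queued_pct(available, busy) > limit_pct


# e.g. IXC1 in the RMF report: 8,175 requests, all with the path available
print(needs_more_paths(8175, 0))
```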
23
XCF Path Batch Report (CMF)
PRODUCED BY CMF ANALYZER (6.0.00 PUT 1502B)
Run Parm XCF TYPE=BOTH
----------------------------- PATH UTILIZATION SECTION ----------------------------
FROM TO TRANSPORT FROM-TO DEVICE/ REQUESTS AVERAGE AVG XFR -- RETRY -- -PERFORMANCE PERCENTAGES-
SYSTEM SYSTEM CLASS STRUCTURE SATISFIED QUEUE LEN TIME LIMIT COUNT REFUSED APPEND. IMMED. STATUS
IMSA ESAJ <INBOUND> IXCLIST1 172 0.00 0.535 10 0 0.00 0.00 0.00 ACTIVE
IMSA ESAJ <INBOUND> IXCLIST2 104,996 0.00 4.187 10 0 0.00 0.00 0.00 ACTIVE
IMSA ESAJ <INBOUND> IXCLIST3 208 0.00 0.190 10 0 0.00 0.00 0.00 ACTIVE
IMSA ESAJ <INBOUND> IXCLIST4 922 0.00 1.027 10 0 0.00 0.00 0.00 ACTIVE
IMSA ESAJ <INBOUND> IXCLIST5 307 0.00 1.316 10 0 0.00 0.00 0.00 ACTIVE
Average Transfer Times
• Are one-way times on inbound reports (need all LPARs)
• Are only calculated for the last minute of an SMF interval
• Good service times are 0.02 ms or less
• Shared CFs often cause high CF StdDev
SMF 74(2) field R742PIOT is in
microseconds, but CMF/RMF/TDS
show it in milliseconds: the displayed
0.535 ms is 535 microseconds.
So ~5.0 ms one way adds 10.0 ms round
trip, or 0.01 seconds per call
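The unit conversion and the round-trip doubling are easy to get wrong when moving between SMF fields and report columns. This Python sketch encodes the two rules stated on this slide; the function names are illustrative only.

```python
def us_to_ms(microseconds: float) -> float:
    """SMF 74(2) R742PIOT is in microseconds; CMF/RMF/TDS show milliseconds."""
    return microseconds / 1000.0


def round_trip_ms(one_way_ms: float) -> float:
    """Request plus reply: the slide doubles the one-way transfer time."""
    return 2.0 * one_way_ms


print(us_to_ms(535))                  # the value the reports display
print(round_trip_ms(5.0) / 1000.0)    # seconds added per call at ~5.0 ms one way
```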
24
XCF Path Batch Report (RMF)
CONVERTED TO z/OS V2R1 RMF TIME 11.29.41 CYCLE 1.000 SECONDS
Run parm - REPORTS(XCF)
TOTAL SAMPLES = 300 XCF PATH STATISTICS
--------------------------------------------------------------------------------------------------------------------------------
OUTBOUND FROM TX41 INBOUND TO TX41
--------------------------------------------------------------------------- -------------------------------------------------
T FROM/TO T FROM/TO
TO Y DEVICE, OR TRANSPORT REQ AVG Q FROM Y DEVICE, OR REQ BUFFERS TRANSFER
SYSTEM P STRUCTURE CLASS OUT LNGTH AVAIL BUSY RETRY SYSTEM P STRUCTURE IN UNAVAIL TIME
TX42 S IXC1 DEFAULT 8,175 0.00 8,175 0 0 TX42 S IXC1 12,894 0 0.068
S IXC2 DEFAULT 36,773 0.00 36,773 0 0 S IXC2 41,389 0 1.932
S IXC3 BIG 35 0.00 35 0 0 S IXC3 115 0 0.120
S IXC4 BIG 9 0.00 9 0 0 S IXC4 45 0 0.098
C 0B2C TO 0F2C DEFAULT 1,639 0.00 1,639 0 0 C 0F2A TO 0B2A 5 0 0.190
C 0B2D TO 0F2D DEFAULT 2,639 0.00 2,639 0 0 C 0F2B TO 0B2B 65 0 0.219
C 0B2E TO 0F2E DEFAULT 4,646 0.00 4,646 0 0 C 0F28 TO 0B28 5 0 0.186
C 0B2F TO 0F2F MID 67 0.00 67 0 0 C 0F29 TO 0B29 5 0 0.183
The customer reduced the RMF interval to 5 minutes (300 samples) and ran 6-minute load tests to ensure the test was running in the last
minute. Average transfer times are too high: 1.9 vs. < 0.1, preferably < 0.06.
The type field contains S = structure (CF) or C = channel-to-channel (CTC).
The typically slower CTCs are ~10x faster than this shared CF structure.
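The comparison on this slide can be reproduced from the inbound TRANSFER TIME column. This Python sketch copies those rows (milliseconds), lists the paths above the 0.1 ms target, and compares the worst CF structure path with the CTC average.

```python
from statistics import mean

# Inbound (type, path, transfer time in ms) rows from the RMF report.
INBOUND = [
    ("S", "IXC1", 0.068), ("S", "IXC2", 1.932),
    ("S", "IXC3", 0.120), ("S", "IXC4", 0.098),
    ("C", "0F2A TO 0B2A", 0.190), ("C", "0F2B TO 0B2B", 0.219),
    ("C", "0F28 TO 0B28", 0.186), ("C", "0F29 TO 0B29", 0.183),
]
TARGET_MS = 0.1  # slide: should be < 0.1, preferably < 0.06

slow = [(kind, path, ms) for kind, path, ms in INBOUND if ms > TARGET_MS]
ctc_avg = mean(ms for kind, _, ms in INBOUND if kind == "C")
worst_cf = max(ms for kind, _, ms in INBOUND if kind == "S")

for kind, path, ms in slow:
    print(f"{kind} {path}: {ms:.3f} ms exceeds {TARGET_MS} ms")
print(f"Worst CF path is {worst_cf / ctc_avg:.1f}x the CTC average")
```

Note that the CTC paths also sit above 0.1 ms; the CF structure IXC2 stands out at roughly ten times their average.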
25
XCF Transfer times – Display Command
MVS console command; it needs to be issued on both the sending and receiving LPARs, suggest once per minute to detect variances:
D XCF,PI,DEVICE=ALL,STATUS=WORKING
IXC356I 12.02.12 DISPLAY XCF 901
LOCAL DEVICE REMOTE PATHIN REMOTE LAST MXFER
PATHIN SYSTEM STATUS PATHOUT RETRY MAXMSG RECORD TIME
C200 JA0 WORKING C200 10 500 3496 339
C220 JA0 WORKING C220 10 500 3640 419
Useful if the application test cannot be guaranteed to:
• run for > 1 minute, or
• be running in the last minute of the RMF interval
Also useful when batch reporting is not timely:
• SMF is dumped at end of day, and reports are not allowed to run against active data
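Capturing the display once a minute produces many small responses; parsing them makes averages and outliers easy to compute. This Python sketch keeps the eight-field WORKING rows of an IXC356I response. It assumes MXFER TIME is in microseconds and that the column layout matches the response on this slide; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PathinSample:
    device: str
    remote_system: str
    status: str
    remote_pathout: str
    retry: int
    maxmsg: int
    last_record: int
    mxfer_time: int  # assumed to be microseconds


def parse_ixc356i(display: str) -> list:
    """Keep the data rows (8 fields, status WORKING) of an IXC356I response."""
    samples = []
    for line in display.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[2] == "WORKING":
            device, system, status, pathout = fields[:4]
            retry, maxmsg, last, mxfer = map(int, fields[4:])
            samples.append(PathinSample(device, system, status, pathout,
                                        retry, maxmsg, last, mxfer))
    return samples


# Response copied from the slide; in practice, capture one per minute on
# both the sending and receiving LPARs and append the samples.
DISPLAY = """\
IXC356I 12.02.12 DISPLAY XCF 901
LOCAL DEVICE REMOTE PATHIN REMOTE LAST MXFER
PATHIN SYSTEM STATUS PATHOUT RETRY MAXMSG RECORD TIME
C200 JA0 WORKING C200 10 500 3496 339
C220 JA0 WORKING C220 10 500 3640 419
"""
samples = parse_ixc356i(DISPLAY)
times = [s.mxfer_time for s in samples]
print(f"paths={len(samples)} mean MXFER={mean(times):.0f} max={max(times)}")
```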
26
XCF Performance Metrics
• XCF display commands at intervals
– Provide a consistent workload view with more than just the average
– Show the critical data points for outliers
• RMF / CMF XCF average transfer times
– Only the last minute of the interval
[Chart: Outliers]
27
Example Customer Capabilities Compare
LPAR to LPAR statistics  Prd 1-2   Prd 2-1   Prd Avg   Tst 1-2   Tst 2-1   Test Avg  Test % Worse
Average XMIT time        0.000064  0.000061  0.000063  0.001322  0.000589  0.000956  1429%
                         29945     29950     29947.5   29845     29568     29706.5   -0.80%
                         5         0         2.5       655       319       487       19380%
                         0         0         0         0         0         0         0.00%
                         0.003385  0.001460  0.002423  0.274515  0.205271  0.239893  9803%
                         0.002837  0.001043  0.001940  0.266698  0.190904  0.228801  11694%
                         0.002736  0.000702  0.001719  0.253822  0.189308  0.221565  12789%
                         0.002171  0.000641  0.001406  0.248940  0.178768  0.213854  15110%
                         0.002109  0.000634  0.001372  0.243544  0.177490  0.210517  15249%
                         0.001884  0.000633  0.001259  0.242594  0.172368  0.207481  16386%
                         0.001804  0.000624  0.001214  0.234656  0.170282  0.202469  16578%
                         0.001722  0.000608  0.001165  0.234443  0.169376  0.201910  17231%
                         0.001650  0.000552  0.001101  0.232075  0.165994  0.199035  17978%
                         0.001595  0.000512  0.001054  0.226652  0.165200  0.195926  18498%
Prod 14.29x faster than test
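The "Test % Worse" column is (test average / production average - 1) x 100, where each average is the mean of the two directions. This Python sketch reproduces the Average XMIT time row; the same function also matches the -0.80% and 19380% rows.

```python
def pct_worse(prod: float, test: float) -> float:
    """How much larger the test value is than production, in percent."""
    return (test / prod - 1.0) * 100.0


# Average XMIT times (seconds) for both directions, from the comparison table.
prod_avg = (0.000064 + 0.000061) / 2
test_avg = (0.001322 + 0.000589) / 2
print(f"{pct_worse(prod_avg, test_avg):.0f}% worse")
```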
28
Job Delays by XCF
CMF (all jobs) and RMF (one job) views of any
jobs delayed by XCF
30
XCF Transport class – buffer size analysis
• Only 2 transport class sizes
• LONGMSG transport class is not effectively sized:
– Too big (%SML) for 32-75% of messages
– Too small (%BIG) for 32-66% of messages
TO      TRANSPORT  BUFFER  REQ     %    %    %
SYSTEM  CLASS      LENGTH  OUT     SML  FIT  BIG
SYSJ    DEFAULT    956     75,579  0    100  0
SYSJ    LONGMSG    36,796  955     75   1    25
SYSA    LONGMSG    36,796  355     32   2    66
SYSB    LONGMSG    36,796  617     61   1    38
SYSC    LONGMSG    36,796  389     38   2    60
SYSF    LONGMSG    36,796  617     61   1    38
SYSL    LONGMSG    36,796  355     32   2    66
SYSN    LONGMSG    36,796  355     32   2    66
SYSP    LONGMSG    36,796  359     33   2    65
SYSQ    LONGMSG    36,796  355     32   2    66
SYSU    LONGMSG    36,796  355     32   2    66
Production LPAR reports are needed to confirm the same sizing issues
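Spotting poorly sized classes across many systems is a filter over the %FIT column. In this Python sketch the 50% fit threshold is a hypothetical cut-off, not a figure from the slide; with the rows above it flags every LONGMSG entry and reproduces the 32-75% "too big" range.

```python
# (to system, class, buffer length, requests out, %SML, %FIT, %BIG)
ROWS = [
    ("SYSJ", "DEFAULT", 956, 75579, 0, 100, 0),
    ("SYSJ", "LONGMSG", 36796, 955, 75, 1, 25),
    ("SYSA", "LONGMSG", 36796, 355, 32, 2, 66),
    ("SYSB", "LONGMSG", 36796, 617, 61, 1, 38),
    ("SYSC", "LONGMSG", 36796, 389, 38, 2, 60),
    ("SYSF", "LONGMSG", 36796, 617, 61, 1, 38),
    ("SYSL", "LONGMSG", 36796, 355, 32, 2, 66),
    ("SYSN", "LONGMSG", 36796, 355, 32, 2, 66),
    ("SYSP", "LONGMSG", 36796, 359, 33, 2, 65),
    ("SYSQ", "LONGMSG", 36796, 355, 32, 2, 66),
    ("SYSU", "LONGMSG", 36796, 355, 32, 2, 66),
]
MIN_FIT_PCT = 50  # hypothetical threshold for "not effectively sized"


def poorly_sized(rows, min_fit_pct=MIN_FIT_PCT):
    """Return (system, class, %SML, %BIG) for rows whose %FIT is too low."""
    return [(to, cls, sml, big)
            for to, cls, _, _, sml, fit, big in rows if fit < min_fit_pct]


flagged = poorly_sized(ROWS)
sml_values = [sml for _, _, sml, _ in flagged]
print(f"{len(flagged)} of {len(ROWS)} rows below {MIN_FIT_PCT}% fit")
print(f"class too big for {min(sml_values)}-{max(sml_values)}% of messages")
```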
31
XCF – Display Select Structures
• SDSF SYSLOG 18907.114 BMCB BMCB 03/11/2016 9W
• COMMAND INPUT ===> D XCF,STR,STRNAME=IXC*
• IXC360I 14.03.20 DISPLAY XCF 574
• STRNAME: IXCLIST1
• STATUS: ALLOCATED
• TYPE: LIST
• POLICY INFORMATION:
• DUPLEX : DISABLED
• ALLOWREALLOCATE: YES
• PREFERENCE LIST: CF0B (useful if > 1 path for the transport class)
• ENFORCEORDER : NO
• EXCLUSION LIST IS EMPTY
•
• ACTIVE STRUCTURE
• ----------------
• ALLOCATION TIME: 02/27/2016 04:12:06
• CFNAME : CF0B
• COUPLING FACILITY: 002964.IBM.02.0000000B62E7
• PARTITION: 2F CPCID: 00
• STORAGE CONFIGURATION ALLOCATED MAXIMUM %
• ACTUAL SIZE: 33 M 33 M 100
•
• SPACE USAGE IN-USE TOTAL %
• ENTRIES: 1 6052 0
• ELEMENTS: 16 6009 0
•
• PHYSICAL VERSION: D05CDEE9 131A6BC6
• LOGICAL VERSION: D05CDEE9 131A6BC6
• SYSTEM-MANAGED PROCESS LEVEL: NOT APPLICABLE
• DISPOSITION : DELETE
• ACCESS TIME : 0
• MAX CONNECTIONS: 32
• # CONNECTIONS : 2 (all Plex members connected?)
•
• CONNECTION NAME ID VERSION SYSNAME JOBNAME ASID STATE
• ---------------- -- -------- -------- -------- ---- ---------
• SIGPATH_01000023 01 00010013 BMCA XCFAS 0006 ACTIVE
• SIGPATH_02000024 02 00020011 BMCB XCFAS 0006 ACTIVE
32
XCF Usage / Fit Statistics
COMMAND INPUT ===> /D XCF,CD,CLASS=DEFXLRG
RESPONSE=IMSA
IXC344I 15.00.23 DISPLAY XCF 708
TRANSPORT CLASS DEFAULT ASSIGNED
CLASS LENGTH MAXMSG GROUPS
DEFXLRG 62396 10000 UNDESIG
DEFXLRG TRANSPORT CLASS USAGE FOR SYSTEM ESAJ
SUM MAXMSG: 30000 IN USE: 990 NOBUFF: 0
SEND CNT: 15722 BUFFLEN (SML): 32700
SEND CNT: 5948 BUFFLEN (SML): 36796
SEND CNT: 1695 BUFFLEN (SML): 40892
SEND CNT: 524 BUFFLEN (SML): 44988
SEND CNT: 438 BUFFLEN (SML): 49084
SEND CNT: 610 BUFFLEN (SML): 53180
SEND CNT: 483 BUFFLEN (SML): 57276
SEND CNT: 2037 BUFFLEN (SML): 61372
SEND CNT: 96583 BUFFLEN (FIT): 62464
DEFXLRG TRANSPORT CLASS USAGE FOR SYSTEM SYSM
SUM MAXMSG: 30000 IN USE: 660 NOBUFF: 0
SEND CNT: 13571 BUFFLEN (SML): 32700
SEND CNT: 5216 BUFFLEN (SML): 36796
SEND CNT: 1396 BUFFLEN (SML): 40892
SEND CNT: 108 BUFFLEN (SML): 44988
SEND CNT: 55 BUFFLEN (SML): 49084
SEND CNT: 6 BUFFLEN (SML): 53180
SEND CNT: 15 BUFFLEN (SML): 57276
SEND CNT: 800385 BUFFLEN (FIT): 62464
A large number of messages would have
fit in a 32K transport class,
wasting 32K of fixed real storage
D XCF,CD,CLASS=ALL
would show all of the classes
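The per-system SEND CNT lines can be totalled to quantify how much traffic a smaller class would absorb. This Python sketch reads each "SEND CNT: n BUFFLEN (kind): len" line and, as an assumption about the display's meaning, treats a BUFFLEN at or below a candidate size as a message that would have fit in a buffer of that size. It uses the ESAJ section of the response; SYSM can be appended the same way.

```python
import re

SEND = re.compile(r"SEND CNT:\s*(\d+)\s+BUFFLEN \((\w+)\):\s*(\d+)")
SYSTEM = re.compile(r"USAGE FOR SYSTEM (\S+)")


def usage_by_system(display: str) -> dict:
    """Map each system to a list of (kind, buffer length, send count)."""
    usage, system = {}, None
    for line in display.splitlines():
        found = SYSTEM.search(line)
        if found:
            system = found.group(1)
            usage[system] = []
            continue
        sent = SEND.search(line)
        if sent and system is not None:
            count, kind, length = sent.groups()
            usage[system].append((kind, int(length), int(count)))
    return usage


def share_fitting(entries: list, buffer_len: int) -> float:
    """Percent of sends whose BUFFLEN is <= buffer_len (assumed to fit it)."""
    total = sum(count for _, _, count in entries)
    fitting = sum(count for _, length, count in entries if length <= buffer_len)
    return 100.0 * fitting / total if total else 0.0


DISPLAY = """\
DEFXLRG TRANSPORT CLASS USAGE FOR SYSTEM ESAJ
SUM MAXMSG: 30000 IN USE: 990 NOBUFF: 0
SEND CNT: 15722 BUFFLEN (SML): 32700
SEND CNT: 5948 BUFFLEN (SML): 36796
SEND CNT: 1695 BUFFLEN (SML): 40892
SEND CNT: 524 BUFFLEN (SML): 44988
SEND CNT: 438 BUFFLEN (SML): 49084
SEND CNT: 610 BUFFLEN (SML): 53180
SEND CNT: 483 BUFFLEN (SML): 57276
SEND CNT: 2037 BUFFLEN (SML): 61372
SEND CNT: 96583 BUFFLEN (FIT): 62464
"""
usage = usage_by_system(DISPLAY)
print(f"ESAJ: {share_fitting(usage['ESAJ'], 32700):.1f}% of sends fit in 32700")
```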
34
Options
• Suggest CF & XCF changes
– CF DYNDISP=THIN, if not used (see appendix)
– Increase the PR/SM weight of the CF partition
– Ensure the CF has an ICF or CP not shared with z/OS LPARs
– Additional XCF paths if path busy > 2%
– Dedicated XCF transport class if the existing one is congested or the available sizes are poor (wasted space or too many packets)
35
If Requested Changes Are Not Possible
• Calculate delay in test from XCF transfer times
• Extrapolate production response from lower XCF times
• Cost justify additional CF capacity based on MLC savings
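The first two bullets amount to simple arithmetic once the number of XCF calls per transaction is known. This Python sketch uses the average XMIT times (seconds) from the customer comparison; the figure of 10 XCF calls per transaction is hypothetical and should be replaced with a measured count.

```python
def xcf_delay_s(one_way_s: float, calls_per_txn: int) -> float:
    """XCF time per transaction: each call pays a request and a reply."""
    return calls_per_txn * 2 * one_way_s


def projected_saving_s(test_one_way_s: float, prod_one_way_s: float,
                       calls_per_txn: int) -> float:
    """Response-time reduction if test transfer times dropped to production's."""
    return (xcf_delay_s(test_one_way_s, calls_per_txn)
            - xcf_delay_s(prod_one_way_s, calls_per_txn))


# Hypothetical: 10 XCF calls per transaction.
saving = projected_saving_s(0.000956, 0.000063, calls_per_txn=10)
print(f"~{saving * 1000:.1f} ms less XCF time per transaction")
```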
36
Additional Info on Sysplex (CF, XCF) Tuning
• IBM WSC Flash “Parallel Sysplex Performance”
FLASH10011
• Redbook - System z Parallel Sysplex Best Practices
SG24-7817
• z/OS MVS Setting up a Sysplex (SA23-1399)
– Chapter 6. Tuning a Sysplex
• XCF (first and large section)
37
Coupling Thin Interrupts
• Requirements
• CFCC Level 19 (Sept 2013) on zEC12 and zBC12, or L20 on z13
• z/OS V1R12 and V1R13+ w/ zOS APAR OA38734 from 2013-08-08
• DYNDISP keyword for the CF; the choices are:
• OFF (for dedicated CPs, typically production)
• ON (old option for CP sharing): fixed time slice
• THIN (new high-performance option for CP sharing): interrupt driven
• IBM Announcement for CFCC Level 19
• http://www-03.ibm.com/systems/z/advantages/pso/whatsnew.html
• IBM CF Performance report recommending and documenting performance of new option
• http://www-03.ibm.com/support/techdocs/atsmastr.nsf/5cb5ed706d254a8186256c71006d2e0a/b6f20816aca23acc86257c580053a8cb/$FILE/Coupling%20Thin%20Interrupts%2020131217.pdf
38
Software – Additional Sessions
Day  Start  End    Room          Session # / Title                                               Speaker
Tue  12:00  13:00  Woodcote      LB: Improving XCF performance to keep up with CPC improvements  Don Zeunert
Tue  12:00  13:00  Monza         IB: DB2 Partitioning: choices, choices, choices                 Phil Grainger
Tue  15:15  16:00  Indianapolis  BD: "System z" Scalability                                      Don Zeunert
Wed  12:00  13:00  Priory        HH: How to Improve IMS Scheduling                               Loc Tran
Wed  14:00  14:45  Monza         IJ: Solving DB2 Performance Problems                            Phil Grainger
Stop by the BMC booth for more information about sessions or other Q/A with speakers
39
Session feedback
• Please submit your feedback at
http://conferences.gse.org.uk/2016/feedback/lb
• Session is LB
Contact: Donald_Zeunert@bmc.com