TRANSCRIPT
© Copyright Velocity Software, 2006
Understanding Linux on z/VM Performance
Rob van der Heij, Velocity Software, Inc.
http://velocitysoftware.com/
Agenda
● Measurement Tools
– Performance Reporting
– Diagnostic Tools
– Velocity ESALPS – Linux Performance Suite
● Performance Analysis Examples
– Idle Linux Servers
– Analysis of Linux Q-drop
– VSWITCH Traffic
– Linux Swap
– Expanded Storage
Performance Tools
Routine Performance Measurements
– Planned reporting on known performance metrics
● Real-time monitoring (CPU Usage, Top-users)
● Daily reports on CPU Usage, I/O utilization, accounting
– Works well in “stable” environment
● No drastic application changes
● No hardware changes

[Diagram: tool coverage: VM PTK, ESALPS, Linux tools; realtime monitor and performance reports for z/VM and Linux]
Performance Tools
Performance Problem Solving
– Review detailed performance data
● Compare different “identical” servers or similar time periods
● Understand correlation between symptoms
– Reproduce your performance problem
● Set up instrumentation and do it again (if you can)
– Measure performance impact of changes
● Can you measure before and after?
● How soon can you have the data?
Velocity Linux Performance Suite

[Diagram: suite architecture: z/VM Monitor, ESAWRITE, MONDCSS, ESAMON DCSS, zMON (control block internals), ESAMAP (history), ESAMON Rexx API extract, ESATCP (SNMP), zTUNE (performance support subscription)]
Performance Tools
Destructive Measurements
– Various CP debugging tools
● TRACE, TRSOURCE
● DISPLAY, VMDUMP
– Linux debugging tools
● Programs reading /proc statistics
● Kernel modifications for experiments
● Extra kernel modules for measurements
Performance Research
● Reading Linux and z/VM source code
● Performance Experiments
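The "programs reading /proc statistics" mentioned above are simple text parsers. A minimal sketch (the sample values are hypothetical, not from this talk) of how a measurement tool derives CPU busy time from two snapshots of the `cpu` line in /proc/stat:

```python
def cpu_busy_pct(sample1, sample2):
    """Busy CPU percentage between two 'cpu' lines from /proc/stat.

    Fields after 'cpu' are cumulative jiffies: user, nice, system,
    idle, and (on newer kernels) iowait and friends.
    """
    t1 = [int(f) for f in sample1.split()[1:]]
    t2 = [int(f) for f in sample2.split()[1:]]
    total = sum(t2) - sum(t1)
    idle = t2[3] - t1[3]          # 4th field is idle jiffies
    return 100.0 * (total - idle) / total

# Two hypothetical snapshots, one measurement interval apart:
before = "cpu 100 0 50 850 0 0 0"
after  = "cpu 120 0 60 920 0 0 0"
print(round(cpu_busy_pct(before, after), 1))   # 30 busy of 100 jiffies
```

Agents such as snmpd do this bookkeeping continuously, which is what makes the cheap, routine measurements described earlier possible.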
Performance Analysis Examples
● Your Mileage May Vary
– These are not recommendations
– Main purpose is to illustrate the process and tools
● Some examples come from z/VM on a P/390
– This is unsupported hardware and software
– Too slow for real Linux workload
Screen: ESAUSR2 rvdheij.com ESAMON V3.5 01/26 1 of 3
User Resource Utilization
CLASS * USER LNX00A0 P390
                  <---CPU time--> <--------Main Storage (pages)--------->
UserID            <(seconds)> T:V <Resident> Lock <-----WSS----->
Time     /Class   Total  Virt Rat Total Activ  -ed Total Actv  Avg Resrvd
-------- -------- ----- ----- --- ----- ----- ---- ----- ---- ---- ------
12:48:00 LNX00A00  0.22  0.20 1.1 13413 13413    0 13413  13K  13K      0
12:47:00 LNX00A00  0.27  0.25 1.1 13413 13413    0 13413  13K  13K      0
12:46:00 LNX00A00  4.28  4.14 1.0 13364 13364    0 13364  13K  13K      0
12:45:00 LNX00A00  0.70  0.62 1.1 13257 13257    0 13213  13K  13K      0
12:44:00 LNX00A00  0.23  0.21 1.1 13733 13733    0 13811  13K  13K      0
12:43:00 LNX00A00  0.22  0.20 1.1 15107 15107    0 15107  15K  15K      0
12:42:00 LNX00A00  0.28  0.25 1.1 15106 15106    0 15053  15K  15K      0
12:41:00 LNX00A00  0.22  0.20 1.1 15041 15041    0 15020  15K  15K      0
12:40:00 LNX00A00  0.22  0.20 1.1 15041 15041    0 15020  15K  15K      0
12:39:00 LNX00A00  0.21  0.20 1.1 15041 15041    0 15020  15K  15K      0
12:38:00 LNX00A00  0.23  0.21 1.1 15041 15041    0 15041  15K  15K      0
12:37:00 LNX00A00  0.30  0.27 1.1 15041 15041    0 15012  15K  15K      0
12:36:00 LNX00A00  0.22  0.20 1.1 15126 15126    0 15106  15K  15K      0
Average of 0.54 CPU seconds per minute
The spikes are half of the resources
What is causing the spike every 15 minutes?
Idle Linux Servers
[Chart: CPU Usage Idle Server, CPU seconds per minute from 12:21 to 12:50]
Idle Linux Servers
5:25pm up 1 day, 3:57, 1 user, load average: 0.37, 0.24, 0.12
32 processes: 31 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 8.7% user, 13.7% system, 0.0% nice, 77.4% idle
Mem: 125916K av, 123616K used, 2300K free, 0K shrd, 18068K buff
Swap: 0K av, 0K used, 0K free, 67800K cached
 PID USER PRI NI SIZE  RSS SHARE STAT %CPU %MEM TIME  COMMAND
3064 root  23  0  972  968   788 R    15.5  0.7  0:37 top
2430 root  15  0 2332 2332  2012 S     6.4  1.8 11:50 sshd
  20 root  16  0    0    0     0 SW    0.3  0.0  0:26 kjournald
   1 root  15  0  236  236   196 S     0.0  0.1  0:20 init
   2 root  0K  0    0    0     0 SW    0.0  0.0  0:00 migration_CPU0
   3 root  25  0    0    0     0 SW    0.0  0.0  0:00 kmcheck
   4 root  15  0    0    0     0 SW    0.0  0.0  0:00 keventd
   5 root  34 19    0    0     0 SWN   0.0  0.0  0:00 ksoftirqd_CPU0
   6 root  16  0    0    0     0 SW    0.0  0.0  1:20 kswapd
   7 root  25  0    0    0     0 SW    0.0  0.0  0:00 bdflush
   8 root  15  0    0    0     0 SW    0.0  0.0  0:06 kupdated
   9 root  25  0    0    0     0 SW    0.0  0.0  0:00 kinoded
  10 root  25  0    0    0     0 SW    0.0  0.0  0:00 mdrecoveryd
  55 root  15  0    0    0     0 SW    0.0  0.0  0:24 kjournald
 240 root  15  0  656  656   552 S     0.0  0.5  0:03 syslogd
 243 root  25  0 1076 1076   460 S     0.0  0.8  0:30 klogd
Idle Linux Servers
Screen: ESAUSR2 rvdheij.com ESAMON V3.5 01/26 17:20-17:52 1 of 3
User Resource Utilization
CLASS * USER LNX00A0 P390 00000
                  <---CPU time--> <--------Main Storage (pages)--------->
UserID            <(seconds)> T:V <Resident> Lock <-----WSS----->
Time     /Class   Total  Virt Rat Total Activ  -ed Total Actv  Avg Resrvd
-------- -------- ----- ----- --- ----- ----- ---- ----- ---- ---- ------
17:26:00 LNX00A00 11.54 10.88 1.1  5716  5716    0  5716 5716 5716      0
17:25:00 LNX00A00 12.05 11.34 1.1  5576  5576    0  5574 5574 5574      0
17:24:00 LNX00A00 12.52 11.83 1.1  5426  5426    0  5389 5389 5389      0
17:23:00 LNX00A00 11.58 10.93 1.1  5388  5388    0  5388 5388 5388      0
17:22:00 LNX00A00  8.14  7.57 1.1  5319  5319    0  5319 5319 5319      0
17:21:00 LNX00A00  0.21  0.20 1.1  3983  3983    0  3983 3983 3983      0
But now we know what is causing the heavy load!
Idle Linux Servers
Screen: ESAUSR2 ESAMON V3.5 03/16 07:10-07:20 1 of 3
User Resource Utilization
CLASS * USER LNEALE1 2084
                  <---CPU time--> <--------Main Storage (pages)--------->
UserID            <(seconds)> T:V <Resident> Lock <-----WSS----->
Time     /Class   Total  Virt Rat Total Activ  -ed Total Actv  Avg Resrvd
-------- -------- ----- ----- --- ----- ----- ---- ----- ---- ---- ------
07:21:00 LNEALE1   2.41  2.39 1.0  130K  130K    0  130K 130K 130K      0
07:20:00 LNEALE1   2.41  2.38 1.0  130K  130K    0  130K 130K 130K      0
07:19:00 LNEALE1   2.34  2.31 1.0  130K  130K    0  130K 130K 130K      0
07:18:00 LNEALE1   2.40  2.37 1.0  130K  130K    0  130K 130K 130K      0
07:17:00 LNEALE1   2.38  2.35 1.0  130K  130K    0  130K 130K 130K      0
07:16:00 LNEALE1  20.55 20.43 1.0  130K  130K    0  130K 130K 130K      0
07:15:00 LNEALE1   2.40  2.38 1.0  130K  130K    0  130K 130K 130K      0
07:14:00 LNEALE1   2.41  2.38 1.0  130K  130K    0  130K 130K 130K      0
07:13:00 LNEALE1   2.35  2.33 1.0  130K  130K    0  130K 130K 130K      0
07:12:00 LNEALE1   2.40  2.37 1.0  130K  130K    0  130K 130K 130K      0
07:11:00 LNEALE1   2.38  2.35 1.0  130K  130K    0  130K 130K 130K      0
Screen: ESALNXP ESAMON V3.5 03/16 07:10-07:17 2 of 3
LINUX VSI Process Statistics Report
NODE * LIMIT 500 2084
                 <-Process Ident->        <------CPU Seconds------>
Time     Node    Name        ID PPID  GRP Total  sys user syst usrt
-------- ------- --------- ---- ---- ---- ----- ---- ---- ---- ----
07:18:00 NEALE1  mono      6106    1 6106   2.2  0.6  1.6  0.0  0.0
                 snmpd     4213    1 4212   0.2  0.1  0.1  0.0  0.0
                 *Totals*     0    0 24789  2.4  0.7  1.7  0.0  0.0
07:17:00 NEALE1  mono      6106    1 6106   2.1  0.8  1.3  0.0  0.0
                 snmpd     4213    1 4212   0.2  0.1  0.1  0.0  0.0
                 *Totals*     0    0 24789  2.3  1.0  1.3  0.0  0.0
07:16:00 NEALE1  mono      6106    1 6106   1.4  0.6  0.8  0.0  0.0
                 cron      1073    1 1073  19.0  0.0  0.0  7.8 11.2
                 *Totals*     0    0 24789 20.5  0.6  0.9  7.8 11.2
07:15:00 NEALE1  mono      6106    1 6106   2.3  0.5  1.7  0.0  0.0
                 snmpd     4213    1 4212   0.2  0.2  0.0  0.0  0.0
                 *Totals*     0    0 24789  2.5  0.7  1.8  0.0  0.0
Idle Linux Servers
Other causes of high usage
Screen: ESAUSR2 rvdheij.com ESAMON V3.5 02/14 08:05- 1 of 3
User Resource Utilization
CLASS * USER LNX00* P390
                  <---CPU time--> <--------Main Storage (pages)--------->
UserID            <(seconds)> T:V <Resident> Lock <-----WSS----->
Time     /Class   Total  Virt Rat Total Activ  -ed Total Actv  Avg Resrvd
-------- -------- ----- ----- --- ----- ----- ---- ----- ---- ---- ------
08:07:00 LNX00D00 54.13 52.83 1.0  5163  5163    0  5157 5157 5157      0
         LNX00A00  0.57  0.54 1.1  3651  3651    0  3634 3634 3634      0
         LNX00A01  0.43  0.40 1.1  3700  3700    0  3697 3697 3697      0
         LNX00A02  0.27  0.25 1.1  2621  2621    0  2618 2618 2618      0
         LNX00B00  0.23  0.21 1.1  3753  3753    0  3738 3738 3738      0

08:01:00 LNX00D00 27.76 26.20 1.1  5315  5315    0  5309 5309 5309      0
08:00:00 LNX00D00 45.99 44.67 1.0  5402  5402    0  5396 5396 5396      0
07:59:00 LNX00D00 53.68 52.28 1.0  5402  5402    0  5396 5396 5396      0
07:58:00 LNX00D00 53.92 52.85 1.0  5402  5402    0  5396 5396 5396      0
07:57:00 LNX00D00 53.69 52.60 1.0  5402  5402    0  5396 5396 5396      0
07:56:00 LNX00D00 50.45 49.35 1.0  5402  5402    0  5396 5396 5396      0
07:55:00 LNX00D00 54.09 52.79 1.0  5402  5402    0  5396 5396 5396      0
Other causes of high usage
Screen: ESAUSR3 rvdheij.com ESAMON V3.5 02/14 08:06-08:01 of 2
User Resource Utilization - Part 2
CLASS * USER LNX00* P390 000
                  DASD  MDisk  Virt Cache  I/O   <---Virtual Device---->
UserID            DASD  Block Cache  Disk  Hit  Prty <----I/O Requests----->
Time     /Class    I/O    I/O  Hits   I/O  Pct Queued  Cons   U/R  CTCA Other
-------- -------- ----- ----- ----- ---- ----- ----- ----- ----- ----- -----
08:07:00 LNX00D00   481     0   230    0  47.8     0   312     0     0     0
         LNX00A00     0     0     0    0     .     0     0     0     0     0
         LNX00A01     7     0     1    0  14.3     0     8     0     0     0
         LNX00A02     5     0     0    0   0.0     0     0     0     0     0
         LNX00B00     4     0     0    0   0.0     0     0     0     0     0
         LNX00C00     5     0     0    0   0.0     0     0     0     0     0

08:01:00 LNX00D00   962     0   757    0  78.7     0   232     0     0     0
08:00:00 LNX00D00   550     0   297    0  54.0     0   304     0     0     0
07:59:00 LNX00D00   387     0    15    0   3.9     0   470     0     0     0
07:58:00 LNX00D00   265     0    15    0   5.7     0   318     0     0     0
07:57:00 LNX00D00   272     0    24    0   8.8     0   312     0     0     0
07:56:00 LNX00D00   310     0    54    0  17.4     0   323     0     0     0
07:55:00 LNX00D00   339     0     7    0   2.1     0   432     0     0     0
Other causes of high usage
dasd_erp(3990): /dev/dasdb ( 94: 4),0201@01: 00aebec8: 00000000 00000000 00000000 00000000
dasd_erp(3990): /dev/dasdb ( 94: 4),0201@01: 00aebed8: 00000000 00000000 00000000 00000000
dasd_erp(3990): /dev/dasdb ( 94: 4),0201@01: Failed CCW (00aebeb8) already logged
end_request: I/O error, dev 5e:05 (dasd), sector 262184
dasd_erp(3990): /dev/dasdb ( 94: 4),0201@01: EXAMINE 32: fatal error
dasd(eckd): Sense data:
dasd(eckd):device 0201 on irq 1: I/O status report:
dasd(eckd):in req: 00aebf00 CS: 0x00 DS: 0x02
dasd(eckd):Failing CCW: 00aebfb8
dasd(eckd):Sense(hex)  0- 7: 80 02 00 00 00 00 00 00
dasd(eckd):Sense(hex)  8-15: 00 00 00 00 00 00 00 00
dasd(eckd):Sense(hex) 16-23: 20 03 0f 00 00 00 05 c0
dasd(eckd):Sense(hex) 24-31: 02 01 00 00 00 00 00 00
dasd(eckd):32 Byte: Format: 0 Exception class 0
Sense code explains “write inhibited”
Configuration problem (disk linked R/O)
Analyzing Linux Q-drop
● Virtual machines should drop from queue when idle
– Best time for CP to take resources away
● Various things make Linux servers stay in queue
– Active I/O makes Linux appear waiting for completion
● Virtual network devices (VM 63282 for z/VM 4.3)
● Real network devices (fixed in z/VM 5)
– Frequent timer interrupts make it appear active
● Recent kernels have hz_timer (default is still set wrong)
● Some applications are not well-behaved (or broken)
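The cost of the classic fixed-rate timer is easy to quantify with rough arithmetic (a sketch; the 120 ticks per minute for an idle server with the on-demand timer is measured elsewhere in this talk):

```python
# A kernel with a fixed 100 Hz timer wakes up 100 times per second
# even when completely idle, so CP never sees the guest go idle.
HZ = 100                       # classic Linux tick rate
always_on = HZ * 60            # wakeups per minute with the regular tick
print(always_on)               # 6000 wakeups per minute

# The idle server traced with TRACE EXT 1004 shows about 120 timer
# ticks per minute, a factor 50 fewer wakeups.
print(always_on // 120)
```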
Analyzing Linux Q-drop
Screen: ESAUSRQ rvdheij.com ESAMON V3.5 1 of 3
User Queue and Load Analysis
CLASS * USER
                  <----------User Load------------->
UserID            Logged  Non-         Disc- Total Tran
Time     /Class       on  Idle  Active  conn InQue /min
-------- -------- ------ ----- ------- ----- ----- ----
12:48:00 LNX00C01      1     1       1     0  0.18  104
12:47:00 LNX00C01      1     1       1     0  0.35  112
12:46:00 LNX00C01      1     1       1     0  0.60   72
12:45:00 LNX00C01      1     1       1     0  0.25   86
12:44:00 LNX00C01      1     1       1     0  0.05  120
12:43:00 LNX00C01      1     1       1     0  0.07  120
Counting timer ticks with TRACE EXT 1004
Idle server with about 120 ticks per minute
ESAMON confirms that Linux drops from queue
Analyzing Linux Q-drop
Locate schedule_timeout in System.map
– 001234bc T schedule_timeout
– Find a suitable place to put the trace
#cp trace i r 12350e.2 term run cmd d g2"#d 80.4;base1
Count Time PID
363   100  303
180   201  313
 72   500    1
 73   500    8
 24  1501  311
 24  1501  314
  1  5201  276
  2  5901  276
  4  6001  276
276 ?  S  0:03 /usr/sbin/cron
287 ?  S  1:59 /usr/sbin/sshd
303 ?  S  0:09 /usr/sbin/httpd -f /etc/httpd/httpd.conf
311 ?  S  0:01 /usr/sbin/nscd
313 ?  S  0:01 /usr/sbin/nscd
314 ?  S  0:01 /usr/sbin/nscd
315 ?  S  0:00 /usr/sbin/nscd
316 ?  S  0:00 /usr/sbin/nscd
317 ?  S  0:00 /usr/sbin/nscd
318 ?  S  0:00 /usr/sbin/nscd
320 ?  S  0:00 /usr/sbin/httpd -f /etc/httpd/httpd.conf
VSWITCH Traffic
Report: ESAQDIO Queued I/O Report Velocity Software Inc.
Monitor initialized: on 2086 serial First record analyzed:
-------------------------------------------------------------------------------------------
                             Number       <QDIO SIGA Instructions/Sec-> <-Throughtput / sec->
Date/    Dev.          Virt QDIO Queues <---Guest----> <----CP------> <Buffers> <--Bytes-->
Time     Nmbr owner    DevN Fmt  In Out Read Writ  "s" Read Writ  "s" Sent From  Sent  From
-------- ---- -------- ---- ---- -- --- ---- ---- ---- ---- ---- ---- ---- ---- ----- -----
13:23:00 0000 Totals   0000 QDIO  0   0    0    0    0    0    3    0    4    6  2744  2074
         A112 DTCVSW1  A112 QDIO  1   1    0    0    0    0    3    0    4    6  2744  2074
-------------------------------------------------------------------------------------------
13:24:00 0000 Totals   0000 QDIO  0   0    0    0    0    0    3    0    4    6  2768  2062
         A112 DTCVSW1  A112 QDIO  1   1    0    0    0    0    3    0    4    6  2768  2062
-------------------------------------------------------------------------------------------
13:25:00 0000 Totals   0000 QDIO  0   0    0    0    1    0    9    0   11   21  7071  8421
         A112 DTCVSW1  A112 QDIO  1   1    0    0    0    0    9    0   11   21  7052  8387
-------------------------------------------------------------------------------------------
13:33:00 0000 Totals   0000 QDIO  0   0    0  152  779    0  156    0  157  398 34360 49157
         A112 DTCVSW1  A112 QDIO  1   1    0    0    0    0    4    0    5    7  3552  2576
-------------------------------------------------------------------------------------------
13:34:00 0000 Totals   0000 QDIO  0   0    0  165  857    0  168    0  169  437 36166 53433
         A112 DTCVSW1  A112 QDIO  1   1    0    0    0    0    4    0    5    6  3337  2422
-------------------------------------------------------------------------------------------
13:35:00 0000 Totals   0000 QDIO  0   0    0  191  977    0  198    0  202  505 49730 66220
         A112 DTCVSW1  A112 QDIO  1   1    0    0    0    0    7    0   10   12  9039  8105
-------------------------------------------------------------------------------------------
VSWITCH Traffic
Report: ESATCP2 TCPIP Internetwork Layer Data Report Velocity Software Inc.
Monitor initialized: on 2086 serial First record analyzed:
-------------------------------------------------------------------------------------------
Date/    <----------Input Datagrams per Second----------> <Output Datagrams> <Fragment Reas>
Time/    <---Input Datagrams--> <Input in Error,Discarded>   <Discarded>     Fragmnts Data-
Node     Total ForWrded Delivrd Header Addrss Prot Other  Reqsts NoRte Other    Input grams
-------- ----- -------- ------- ------ ------ ---- -----  ------ ----- ----- -------- -----
13:23:00 TCPIP    3.67  0   0.27   0 0 0 0    1.37  0 0   0.03  0.02
         VMLOCAL  3.80  0   0.27   0 0 0 0    1.37  0 0   0.03  0.02
         LINUX1   3.82  0   3.82   0 0 0 0    4.15  0 0      0     0
-------------------------------------------------------------------------------------------
13:24:00 TCPIP    3.25  0   0.27   0 0 0 0    1.37  0 0   0.03  0.02
         VMLOCAL  3.15  0   0.27   0 0 0 0    1.37  0 0   0.03  0.02
         LINUX1   3.83  0   3.83   0 0 0 0    4.12  0 0      0     0
-------------------------------------------------------------------------------------------
13:34:00 TCPIP    3.63  0   0.27   0 0 0 0    1.40  0 0   0.03  0.02
         VMLOCAL  3.70  0   0.27   0 0 0 0    1.38  0 0   0.03  0.02
         LINUX1  435.5  0 435.55   0 0 0 0   168.8  0 0      0     0
-------------------------------------------------------------------------------------------
13:35:00 TCPIP    3.40  0   0.28   0 0 0 0    1.47  0 0   0.03  0.02
         VMLOCAL  3.15  0   0.27   0 0 0 0    1.40  0 0   0.03  0.02
         LINUX1  490.5  0 490.52   0 0 0 0   197.1  0 0      0     0
VSWITCH Traffic
● Graph shows almost perfect match
● Combining the data from different sources helps to explain how it works
● Traffic attributed to the VSWITCH is most likely management traffic
● The “real traffic” is only accumulated in the total
● Important to have Linux data interval match VM data
Linux Swap
● General recommendations:
– Make Linux small enough that it “starts to swap”
● This is not always easy to do
– Swap to VDISK so swapping hurts less
● Compare VDISK and real disk for Linux swapping
– VDISK is fast but takes CPU cycles
– Real disk is slower but uses less CPU
● Waiting on I/O is fine when others can use the CPU
– Currently not obvious which is best in what situation
Linux Swap
Noticed 50% MDC hit ratio on Linux swap
– Not seen before since most swap to VDISK
– Unexpected since swap is only read once
● MDC only works when data is read 2nd time
● We're told to disable MDC for swap disks
Analyze I/O trace to see what happens
-> 00220998' SSCH B2333000 01B7BE9C CC 0 SCH 000C CPA 011BAFA8
   PARM 01B7BE1C KEY 0 FPI C0
LCCW 011BAFA8 63400010 011BAF78 0000 63400010 ........ EXTENT
     40C00000 00000000 09670009 09670009
 CCW 011BAFB0 47400010 011BAF98 0008 47400010 ........ LOCATE RECORD
     06800008 09670009 00060009 054D1000
 CCW 011BAFB8 86401000 0127F000 0010 86441000 ........ IDAL 031A7000
     IDAL 031A7800
 CCW 011BAFC0 86401000 01A96000 0018 86441000 ........ IDAL 01EDF000
     IDAL 01EDF800
Linux Swap
TRACE I/O for 1 transaction (~1 minute)
– 1132 SSCH, 2522 swaps (2042 R, 479 W)
– Read: 1-8 pg/SSCH, Write: 1-14 pg/SSCH
Same pages are read back in, over and over again
75% of the page reads are re-reads
Linux is thrashing during this experiment
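The trace counts above reduce to a few derived numbers (a sketch using only the figures quoted on this slide):

```python
ssch = 1132                  # I/O operations in the ~1 minute trace
reads, writes = 2042, 479    # swap page reads and writes
swaps = reads + writes       # the slide quotes this as 2522

# CCW chaining packs more than one page into a single SSCH.
pages_per_ssch = swaps / ssch

# If 75% of the page reads are re-reads, only a quarter of the
# read traffic touches a page for the first time.
unique_reads = reads * (1 - 0.75)

print(round(pages_per_ssch, 1))   # about 2.2 pages per SSCH
print(int(unique_reads))          # only ~510 first-time page reads
```

The re-read fraction is exactly what makes MDC appear useful here: the cache hits come from pages the thrashing guest keeps swapping back in.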
[Chart: Swap CCW Chain Length: total blocks by chain length (1-14), read vs. write]

[Chart: SWAP MDC Page Read Count: total pages by re-read count (1-16), with cumulative %]
Transactions taking too long
Running Linux on z/VM on a P/390
– Very slow CPU and little storage
● Linux building up the working set
– After some idle time much has been paged out
– WSS grows by 600 pages
● At 3 ms per page, expected latency ~2 sec
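The two-second estimate follows directly from the numbers on this slide:

```python
pages = 600            # working set grows by ~600 pages after idle time
ms_per_page = 3        # ~3 ms to page in one page from DASD
latency_s = pages * ms_per_page / 1000
print(latency_s)       # 1.8, i.e. roughly 2 seconds of extra latency
```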
[Chart: Run 100 Servers #2: Avg Trans Resp (ms)]
Transactions taking too long
● Standard transaction takes some 100 ms
● 50 ms for Linux
● 40 ms for TCP/IP
– Experiment shows response time ~ 1 second
● Why?
[Chart: Run 100 Servers #2: Avg Trans Resp (ms)]
Transactions taking too long
● Idle load of all Linux servers was ~ 80%
– On average 45 servers in-queue
– Only 200 ms in a second for real work
● Any work you try takes 5 times as long
● Linux “small enough to just swap a bit”
– Swapping takes CPU cycles in Linux
● Approximately 5 ms per swap I/O (on a P/390!)
● Even the pressure on swapping takes CPU time
Transactions taking too long
● The math confirms our measurements
● Resource shortage made response worse
– 5 times because of CPU contention
– 2-3 times because of “moderate” swapping
– This makes 100 ms take 1.5 sec response time
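The math on this slide, spelled out:

```python
base_ms = 100         # normal transaction time
cpu_contention = 5    # ~45 servers in queue: only 200 ms of CPU per second
swap_factor = 3       # "moderate" swapping costs another factor 2-3
response_s = base_ms * cpu_contention * swap_factor / 1000
print(response_s)     # 1.5 seconds for a 100 ms transaction
```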
[Chart: Cost of swapping: CPU usage (ms) and swap I/O vs. memory shortage]
Expanded Storage
● With 31-bit z/VM, to use more than 2 GB of real storage
– There was no choice other than partitioning
● Relief for “2GB contention” in 64-bit configurations
– Very visible with large I/O intense Linux servers
– Only effective with sufficient expanded storage
● Avoid using expanded storage for MDC
– Resolved with z/VM 5.2
● Linux “fixed buffers” not needed
[Diagram: paging hierarchy: central storage above and below 2 GB, XSTORE, DASD]
Expanded Storage
“Do we still need expanded storage, now that ...”
“My hardware expert says that 64-bit does not ...”
Expanded Storage provides storage hierarchy to z/VM
– Various algorithms expect expanded storage
● Memory reference measurement is a trade-off
– When paging, CP will page-out the wrong pages
● Not a bug, it's just very hard to get it right
● Expanded storage makes that less painful
Expanded Storage
● Same transaction repeated
● 112 MB Central, 0 MB Expanded
● 80 MB Central, 32 MB Expanded
– Decreased transaction latency
– Lowered external page rate
[Chart: Benefit of Expanded Storage: paging (pg/s) over time, XSTORE vs. DASD]

[Chart: Benefit of Expanded Storage: Avg Transaction Resp (ms) over time, Main vs. Xstore]
Expanded Storage
● CPU Utilization higher with expanded storage
– DASD paging was limiting factor
– Transaction was 30% faster
● This is not your system, these are not your numbers
[Chart: CPU Usage: CPU utilization over time, Ttime vs. Vtime]

[Chart: Benefit of Expanded Storage: Avg Transaction Resp (ms) over time, Main vs. Xstore]
Summary
Most installations will need to manage performance
– Set up routine measurements to spot trends and problems
– Review your data also when there are no problems yet
– Measure all your Linux servers, not just production
● Enables you to spot any “pigs” that take resources away
● You need something cheap so measuring does not cause load
● When possible have standard instrumentation
Some performance problems are expectation problems
– Proper tools help you get realistic expectations