unix performance benchmarking - sas group...sas vs non-sas •disk io –not many options within sas...

Unix performance benchmarking Isolating application performance issues Establishing performance benchmarks

Upload: duongdang

Post on 23-May-2018


Page 1: Unix Performance Benchmarking - SAS Group...SAS vs non-SAS •Disk IO –Not many options within SAS code (bufno, bufsize) •Memory –no Unix equivalent to some SAS features (realmemsize,

Unix performance benchmarking

Isolating application performance issues

Establishing performance benchmarks

Why bother?

• Identify issues with shared Unix resources

• Understand your SAS processes

• Help system administrators who don’t understand SAS

• Prove that something is wrong

Measurements

• Disk Read/Write (I/O)

• Memory

• CPU

SAS vs non-SAS

• Disk IO – Not many options within SAS code (bufno, bufsize)

• Memory – no Unix equivalent to some SAS features (realmemsize, sortsize)

• CPU – SAS threaded kernel too complex to replicate in Unix

Basic SAS Disk IO testing

options fullstimer ;

/* Write performance */

libname outlib '/disk_output_path' ;

data outlib.mybigfile ;
  do n = 1 to 100000 ;
    randnum = ranuni(0) ;
    output ;
  end ;
run ;

/* Read performance */

data _null_ ;
  set outlib.mybigfile ;
run ;

Basic Unix IO testing

$ dd if=/path_to_big_file of=/path_to_write_location

or

$ ./iotest.sh -t /path_to_write_location
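A minimal sketch of the dd approach, assuming GNU dd and a scratch directory you can write to. The function name time_dd_io, the file name ddtest.$$ and the default sizes are illustrative assumptions; none of them come from iotest.sh.

```shell
#!/bin/sh
# Sketch only: time one sequential write and one sequential read with dd.
# time_dd_io, ddtest.$$ and the default sizes are illustrative assumptions.
time_dd_io() {
    dir=$1              # directory on the file system under test
    bs_kb=${2:-64}      # block size in KB
    blocks=${3:-1600}   # number of blocks (64 KB x 1600 = about 100 MB)
    file="$dir/ddtest.$$"

    start=$(date +%s)
    # conv=fsync makes dd flush the data to disk before it exits,
    # so the write time is not just the time taken to fill the cache.
    dd if=/dev/zero of="$file" bs="${bs_kb}k" count="$blocks" conv=fsync 2>/dev/null || return 1
    end=$(date +%s)
    echo "write_seconds=$((end - start))"

    start=$(date +%s)
    dd if="$file" of=/dev/null bs="${bs_kb}k" 2>/dev/null || return 1
    end=$(date +%s)
    echo "read_seconds=$((end - start))"

    rm -f "$file"
}
```

Called as time_dd_io /path_to_write_location, it prints one write time and one read time in whole seconds. A read that follows straight after the write is often served from memory rather than disk, so treat the read figure with suspicion unless the file is much larger than RAM.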

Disk caching

• Most Unix servers are 64-bit and have lots of memory

• Disk IO is always a bottleneck

• Modern Unix uses spare memory to cache file data, so many reads and writes never wait for the disk

• Performance is always great after a reboot

• Caching distorts IO performance measurement

Real Unix IO testing

• Disable caching (Linux)

• Lots of concurrent processes

• Large files (20 GB) created by each process

• Capture results
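The bullets above can be sketched roughly as follows. This is not iotest.sh: run_streams and the iotest_sketch.* file names are illustrative assumptions, and GNU dd is assumed for conv=fsync.

```shell
#!/bin/sh
# Sketch only: run several dd writers at once and report each one's elapsed time.
# run_streams and the iotest_sketch.* file names are illustrative assumptions.
run_streams() {
    dir=$1; streams=$2; bs_kb=$3; blocks=$4
    i=1
    while [ "$i" -le "$streams" ]; do
        (
            start=$(date +%s)
            dd if=/dev/zero of="$dir/iotest_sketch.$i" bs="${bs_kb}k" \
               count="$blocks" conv=fsync 2>/dev/null
            end=$(date +%s)
            echo "stream=$i elapsed=$((end - start))"
        ) &
        i=$((i + 1))
    done
    wait    # block until every background writer has finished
}

# On Linux, root can empty the page cache between a write pass and a read pass:
#   sync && echo 3 > /proc/sys/vm/drop_caches
# GNU dd can also bypass the cache with oflag=direct (writes) or iflag=direct
# (reads) on file systems that support direct IO.
```

For example, run_streams /saswork 3 64 312500 would start three writers of 64 KB x 312500 blocks (20,000,000 KB) each. Which cache-control method the original tests used is not stated on the slide; the two in the comments are common options.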

File systems

• Local (ext4)

• Shared (gfs2, gpfs)

• External (xfs) – includes Fusion IO

• Temp (tmpfs)

On Linux, use df -hT to list the file system type of each mount
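To check a single target directory rather than the whole list, a small helper can pull out just the Type column. This is a sketch assuming GNU df; fs_type is an illustrative name.

```shell
#!/bin/sh
# Sketch only: print the file system type behind a path (GNU df assumed).
# fs_type is an illustrative helper name.
fs_type() {
    # -P keeps each file system on one line; -T adds the Type column.
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}
```

Calling fs_type /saswork prints the type (for example ext4 or xfs) of whichever file system holds that directory.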

Typical results

Minimum SAS requirements

• ETL – 50-75 MB/sec per CPU core

• Ad hoc analytics – 15-25 MB/sec per CPU core

• SASWORK – 50-75 MB/sec per CPU core
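Because the guideline is per core, the target for a whole server scales with its core count. A worked sketch, where required_mbs is an illustrative name and the 8-core server is a made-up example:

```shell
#!/bin/sh
# Sketch only: scale the per-core guideline to a whole server.
# required_mbs is an illustrative helper name; 8 cores is a made-up example.
required_mbs() {
    cores=$1; low=$2; high=$3
    echo "$((cores * low))-$((cores * high)) MB/sec"
}

required_mbs 8 50 75    # ETL or SASWORK on an 8-core server: 400-600 MB/sec
required_mbs 8 15 25    # ad hoc analytics on the same server: 120-200 MB/sec
```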

Comparing channels

Running tests

• Schedule a quiet time

• Start small: 3 concurrent tests of 2 GB against /saswork:

$ ./iotestv1.sh -i 3 -t /saswork -b 15625 -s 128

• $ df /write_location_mountpoint

• $ du -s /path_to_write_location

• Clean up afterwards:

$ find /saswork -type f -name 'iotest*' -user $USER -exec rm -f {} \; 2>&1 | grep -v 'Permission denied'
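Before running, it helps to know how much space a test will need. The sketch below assumes that -i is the number of concurrent streams, -b a block count and -s a block size in KB. That reading is an inference from the results listing (64 x 312500 = 20,000,000), not something confirmed from the iotestv1.sh source, and test_volume_kb is an illustrative name.

```shell
#!/bin/sh
# Sketch only: file size per stream and total space needed for a test run.
# Assumes -b is a block count and -s a block size in KB (an inference, not
# confirmed from the script source). test_volume_kb is an illustrative name.
test_volume_kb() {
    streams=$1; blocks=$2; bs_kb=$3
    per_stream=$((blocks * bs_kb))
    echo "per_stream_kb=$per_stream total_kb=$((per_stream * streams))"
}

test_volume_kb 3 15625 128    # prints per_stream_kb=2000000 total_kb=6000000
```

Under that assumption, the small test above writes about 2 GB per stream and about 6 GB in total, which is the figure to compare against the free space reported by df.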

Gathering results

Example script output in listing

dc5cad,07Jan2014:00:09:37,3,64,312500,60,30.17,/saswork,iotestv1.sh-writetest.out.2

dc5cad,07Jan2014:00:09:37,3,64,312500,60,2.02,/saswork,iotestv1.sh-readtest.out.2

Processing results

hostname  streams  blksize  blocks  target    mode  dtime             iteratn  filesz    elapsed  thruput
dc5cad    3        64       312500  /saswork  R     07JAN14:00:09:37  1        20000000  62.04    322372.66
dc5cad    3        64       312500  /saswork  R     07JAN14:00:09:37  2        20000000  62.02    322476.62
dc5cad    3        64       312500  /saswork  R     07JAN14:00:09:37  3        20000000  64.36    310752.02
dc5cad    3        64       312500  /saswork  W     07JAN14:00:09:37  1        20000000  90.18    221778.66
dc5cad    3        64       312500  /saswork  W     07JAN14:00:09:37  2        20000000  90.17    221803.26
dc5cad    3        64       312500  /saswork  W     07JAN14:00:09:37  3        20000000  48       416666.67
dc5cad    3        128      156250  /saswork  R     07JAN14:00:11:48  1        20000000  50.15    398803.59
dc5cad    3        128      156250  /saswork  R     07JAN14:00:11:48  2        20000000  50.1     399201.6
dc5cad    3        128      156250  /saswork  R     07JAN14:00:11:48  3        20000000  50.03    399760.14
dc5cad    3        128      156250  /saswork  W     07JAN14:00:11:48  1        20000000  79.7     250941.03
dc5cad    3        128      156250  /saswork  W     07JAN14:00:11:48  2        20000000  79.13    252748.64
dc5cad    3        128      156250  /saswork  W     07JAN14:00:11:48  3        20000000  79.14    252716.7
dc5cad    4        64       312500  /saswork  R     07JAN14:00:14:49  1        20000000  70.21    284859.71

Results for interpretation and analysis

At this point, the testing has resulted in a set of results. The results are groups of tests, varying by:

– The server used for execution;

– How many concurrent streams were executed;

– The block size of the files being transferred;

– Whether the data were being written to or read from disk.

File size is constant (20 GB).

Information available for analysis is elapsed time.

Diagrammatically, it looks like this:

[Diagram: two concurrent streams of 20 GB each, and four concurrent streams of 20 GB each, all completing in the same 80 seconds]

• If we have two concurrent streams, each reading 20 GB in 80 seconds, our combined throughput is 40 GB in 80 seconds, or 0.5 GB/second (500 MB/second).

• In the case of four concurrent streams, it would be 80 GB in 80 seconds, or 1 GB/second.

Therefore, for each group, we must collapse the iterations, selecting the MAX of elapsed time, the SUM of the data volumes (individually always 20 GB in our case), and keeping the host name, number of streams, block size, and mode as classification variables.

And we need to calculate Throughput as Volume / Elapsed, to use as the analysis variable in our graphs.
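The collapse described above (MAX of elapsed, SUM of volume, then Throughput = Volume / Elapsed) can be sketched with awk. The function name collapse_results is an illustrative assumption, as is the input layout: whitespace-separated rows with a header line and the columns hostname, streams, blksize, blocks, target, mode, dtime, iteratn, filesz, elapsed, thruput.

```shell
#!/bin/sh
# Sketch only: collapse per-iteration rows into one row per test group.
# Assumes whitespace-separated input with a header line and the columns
#   hostname streams blksize blocks target mode dtime iteratn filesz elapsed thruput
# collapse_results is an illustrative name.
collapse_results() {
    awk '
    NR == 1 || NF < 11 { next }            # skip the header and incomplete lines
    {
        key = $1 " " $2 " " $3 " " $6      # host, streams, block size, mode
        volume[key] += $9                  # SUM of file sizes (KB)
        e = $10 + 0                        # force a numeric comparison
        if (e > elapsed[key]) elapsed[key] = e    # MAX of elapsed seconds
    }
    END {
        for (key in volume)
            printf "%s %d %.2f %.2f\n", key, volume[key], elapsed[key], volume[key] / elapsed[key]
    }' "$@" | sort
}
```

Each output row holds the classification variables followed by total volume (KB), longest elapsed time (seconds) and throughput (KB/second). For the three 64 KB read iterations on dc5cad, that is 60,000,000 KB over 64.36 seconds, or about 932,000 KB/second.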

Impact of block size, 64 KB and 128 KB

Duration increases with concurrent streams

Throughput is independent of number of concurrent streams

Reading data

Writing data

Throughput reading data is much higher than writing

Does server performance vary?

Reading data

Writing data

References

• Margaret Crevar’s definitive guide to performance tuning: http://support.sas.com/rnd/papers/sgf07/sgf2007-iosubsystem.pdf

Authors

• Tom Kari

• Andrew Farrer

Acknowledgements

• Dan Gelinas, IBM Canada for deep insights into the filesystem cache

• Clifford Myers, SAS Institute. The original author of iotest.sh