Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013


DESCRIPTION

In this session, we explain how to measure the key performance-impacting metrics in a cloud-based application. With specific examples of good and bad tests, we make it clear how to get reliable measurements of CPU, memory, and disk performance, and how to map benchmark results to your application. We also cover the importance of selecting tests wisely, repeating tests, and measuring variability.

TRANSCRIPT

Page 1: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Best Practices for Benchmarking and

Performance Analysis in the Cloud

Robert Barnes, Amazon Web Services

November 15, 2013

Page 2: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Benchmarks: Measurement Demo

How many ways to measure? At least 20…

Page 3: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Cloud Benchmarks: Prequel

• The best benchmark
• Absolute vs. relative measures
• Fixed time or fixed work
• What’s different?
• Use a good AMI

[Charts: average CPU result (0.00–30.00) and Coefficient of Variance (0%–60%) across five AMIs: Ubuntu 12.4 ami-…, AWS CentOS 5.4 ami-…, and three other CentOS 5.4 AMIs]

Page 4: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Scenario: CPU-based Instance Selection

• Application runs on premises

• Primary requirement is integer CPU performance

• Application is complex to set up, no benchmark tests exist, limited time

• What instance would work best?

1. Choose a synthetic benchmark

2. Baseline: Build, configure, tune, and run it on premises

3. Run the same test (or tests) on a set of instance types

4. Use results from the instance tests to choose the best match
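The comparison in step 4 is easiest when every score is normalized to the on-premises baseline, which is how the ratio columns on the following slides are built. A minimal sketch of that normalization, assuming one numeric score per line in baseline.txt and per-type files such as m3.xlarge.txt (all hypothetical file names):

# Normalize each instance type's score to the on-premises baseline score.
BASE=`cat baseline.txt`
for f in m3.xlarge.txt c3.xlarge.txt; do
  awk -v base="$BASE" -v type="${f%.txt}" \
    '{ printf "%s ratio=%.2f\n", type, $1 / base }' "$f"
done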

Page 5: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Testing CPU

• Choose a benchmark: geekbench, UnixBench, sysbench (cpu), and SPEC CPU2006 Integer

• How do you know when you have a good result?

• Tests run on 9 instance types
  – 10 instances of each of the 9 types launched
  – Tests run a minimum of 4 times on each instance
  – Ubuntu 13.04 base AMI
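A driver that satisfies the minimum-of-4-runs rule can be a few lines of shell. A minimal sketch, assuming the per-run geekbench script shown on a later slide is saved as run-gb.sh (a hypothetical file name) and takes the run's sequence number as its argument:

# Repeat the benchmark enough times to quantify run-to-run variability.
RUNS=4
for SEQNO in `seq 1 $RUNS`; do
  ./run-gb.sh "$SEQNO"   # hypothetical wrapper around the geekbench script
done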

Page 6: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

geekbench Overview

• Workloads in 3 categories
  – 13 Integer tests
  – 10 Floating Point tests
  – 4 Memory tests
• Commercial product (64-bit)
• No source code
• Runs single and multi-CPU
• Fast setup, fast runtime

Integer: AES, Twofish, SHA1, SHA2, BZip2 compress, BZip2 decompress, JPEG compress, JPEG decompress, PNG compress, PNG decompress, Sobel, LUA, Dijkstra

Floating Point: Black-Scholes, Mandelbrot, Sharpen image, Blur image, SGEMM, DGEMM, SFFT, DFFT, N-Body, Ray trace

Memory: STREAM copy, STREAM scale, STREAM add, STREAM triad

Page 7: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

geekbench Script

# Tag each result file with instance ID, type, run number, and elapsed time.
SEQNO=$1
GBTXT=gbtest.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
OUTID=$ID$DL$TYPE$DL
START=$(date +%s.%N)
./geekbench_x86_64 --no-upload >$GBTXT
END=$(date +%s.%N)
DIFF=$(echo "$END - $START" | bc)
OUTNAME=$OUTID$SEQNO$DL$DIFF$DL$GBTXT
mv $GBTXT $OUTNAME
# Collect the overall scores from all runs into a semicolon-separated CSV.
grep "Geekbench Score" i-*$GBTXT >gbresults.txt
cat gbresults.txt | sed s/:// | awk '/i-/ {print $1";"$4";"$5}' >gbresults.csv
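The C.O.V. columns in the tables that follow are the coefficient of variation: standard deviation divided by mean. A minimal sketch of computing it with awk, assuming scores.txt (a hypothetical intermediate file, e.g. one score column cut from gbresults.csv) holds one numeric score per line:

# C.O.V. = standard deviation / mean, reported as a percentage.
awk '{ s += $1; ss += $1 * $1; n++ }
     END { m = s / n; sd = sqrt(ss / n - m * m);
           printf "mean=%.2f cov=%.2f%%\n", m, 100 * sd / m }' scores.txt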

Page 8: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

geekbench

Geekbench     1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
m3.xlarge     0.93         1.04%    2.04         2.31%    2.06
m3.2xlarge    0.93         1.40%    3.80         1.46%    2.08
m2.xlarge     0.80         2.84%    1.54         4.06%    1.99
m2.2xlarge    0.80         1.34%    2.82         1.21%    2.04
m2.4xlarge    0.76         2.28%    5.11         1.71%    2.01
c3.large      1.13         0.93%    1.32         0.71%    1.76
c3.xlarge     1.13         0.39%    2.51         1.81%    1.74
c3.2xlarge    1.13         0.19%    4.88         0.25%    1.70
cc2.8xlarge   1.00         0.71%    15.46        1.93%    2.21

Page 9: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

geekbench – Run Variance (m3.xlarge)

instance      1CPU ratio   C.O.V.
instance-1    0.93         0.31%
instance-2    0.97         0.23%
instance-3    0.94         0.17%
instance-4    0.94         0.10%
instance-5    0.94         0.32%
instance-6    0.94         0.10%
instance-7    0.93         0.25%
instance-8    0.93         0.38%
instance-9    0.94         0.11%
instance-10   0.94         0.09%

Page 10: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

geekbench – Integer Portion

gb-integer    1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
c3.large      1.12         0.50%    1.37         0.43%    NA
c3.xlarge     1.13         0.38%    2.72         0.41%    NA
c3.2xlarge    1.12         0.38%    5.35         0.51%    NA
cc2.8xlarge   1.00         0.20%    17.88        3.31%    NA

geekbench
c3.large      1.13         0.93%    1.32         0.71%    1.76
c3.xlarge     1.13         0.39%    2.51         1.81%    1.74
c3.2xlarge    1.13         0.19%    4.88         0.25%    1.70
cc2.8xlarge   1.00         0.71%    15.46        1.93%    2.21

Page 11: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

UnixBench Overview

• Default: the BYTE Index
  – 12 workloads, run 2 times (roughly 29 minutes each time)
    • Integer computation
    • Floating point computation
    • System calls
    • File system calls
  – Geometric mean of results relative to a baseline produces the System Benchmarks Index Score
• Open source – must be built
  – Must be patched for > 16 CPUs

Page 12: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

UnixBench Script

SEQNO=$1
UBTXT=ubtest.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
FN=$ID$DL$TYPE$DL$SEQNO$DL$UBTXT
# Run one single-CPU pass and one pass with one copy per CPU.
COPIES=`cat /proc/cpuinfo | grep processor | wc -l`
./Run -c 1 -c $COPIES >$FN
# Collect the index scores from all runs into a semicolon-separated CSV.
grep "System Benchmarks Index Score" i-*$UBTXT >ubresults.txt
cat ubresults.txt | sed 's/.txt:System Benchmarks Index Score//' | \
awk '/i-/ {print $1";"$2}' >ubresults.csv

Page 13: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

UnixBench

UnixBench     1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
m3.xlarge     1.38         1.90%    2.49         1.36%    28.25
m3.2xlarge    1.42         1.85%    4.21         1.99%    28.29
m2.xlarge     0.40         5.82%    0.76         1.28%    28.30
m2.2xlarge    0.42         1.71%    1.23         1.75%    28.32
m2.4xlarge    0.48         3.31%    2.02         1.71%    28.34
c3.large      1.10         1.33%    1.91         1.54%    28.17
c3.xlarge     1.06         1.48%    2.85         1.26%    28.21
c3.2xlarge    1.10         0.54%    4.50         1.02%    28.96
cc2.8xlarge   1.00         2.97%    6.44         2.65%    30.20

Page 14: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

UnixBench – Dhrystone 2

UB-Integer    1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
c3.large      1.05         0.24%    1.10         0.30%    0.17
c3.xlarge     1.05         0.27%    2.20         0.28%    0.17
c3.2xlarge    1.05         0.07%    4.34         0.23%    0.17
cc2.8xlarge   1.00         0.10%    15.54        0.95%    0.17

UnixBench
c3.large      1.10         1.33%    1.91         1.54%    28.17
c3.xlarge     1.06         1.48%    2.85         1.26%    28.21
c3.2xlarge    1.10         0.54%    4.50         1.02%    28.96
cc2.8xlarge   1.00         2.97%    6.44         2.65%    30.20

Page 15: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

SPEC CPU2006 Overview

• Competitive (reviewed)

• Commercial (site) license required

• Source code provided, must be built

• Highly customizable

• Full “reportable” run takes 5+ hours

• Published results on www.spec.org

Page 16: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

SPEC CPU2006 Overview

Benchmark        Language   Category
400.perlbench    C          Programming language
401.bzip2        C          Compression
403.gcc          C          C compiler
429.mcf          C          Combinatorial optimization
445.gobmk        C          Artificial intelligence
456.hmmer        C          Search gene sequence
458.sjeng        C          Artificial intelligence
462.libquantum   C          Physics / quantum computing
464.h264ref      C          Video compression
471.omnetpp      C++        Discrete event simulation
473.astar        C++        Path-finding algorithms
483.xalancbmk    C++        XML processing

Page 17: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

SPEC CPU2006 Integer Script

SEQNO=$1
CPATH="/cpu2006/result"
COPIES=`cat /proc/cpuinfo | grep processor | wc -l`
SITXT=estspecint.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
FN=$ID$DL$TYPE$DL$SEQNO$DL$SITXT
# One non-reportable rate run of the integer suite, one copy per CPU.
runspec --noreportable --tune=base --size=ref --rate=$COPIES --iterations=1 \
  400 403 445 456 458 462 464 471 473 483
grep "_base" $CPATH/CINT*.ref.csv | cut -d, -f1-2 > $FN
grep "total seconds elapsed" $CPATH/CPU*.log | awk '/finished/ {print $9}' >>$FN
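SPEC's composite numbers are geometric means of per-benchmark ratios, so the estimated SPECint figures on the next slide can be reproduced from the per-benchmark ratios grepped out above. A minimal sketch, assuming ratios.txt (a hypothetical file) holds one per-benchmark ratio per line:

# Geometric mean of per-benchmark ratios: exp(arithmetic mean of logs).
awk '{ logsum += log($1); n++ }
     END { printf "estimated geomean = %.2f\n", exp(logsum / n) }' ratios.txt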

Page 18: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Estimated SPEC CPU2006 Integer

Est. SPECint   1CPU ratio   C.O.V.   RT (min)   NCPU ratio   C.O.V.   RT (min)
m3.xlarge      1.01         1.06%    54.39      2.24         1.15%    104.18
m3.2xlarge     1.01         1.67%    54.49      4.25         1.63%    109.22
m2.xlarge      0.76         1.97%    70.83      1.39         2.45%    85.37
m2.2xlarge     0.79         0.94%    68.85      2.76         1.24%    85.42
m2.4xlarge     0.78         0.16%    68.73      5.21         1.26%    89.91
c3.large       1.11         1.95%    50.00      1.25         1.47%    94.22
c3.xlarge      1.10         1.96%    50.29      2.39         1.28%    97.66
c3.2xlarge     1.08         0.87%    50.87      4.67         0.25%    100.22
cc2.8xlarge    1.00         0.29%    54.92      14.92        0.52%    125.74

Page 19: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Sysbench Overview

• Designed as a quick system test for MySQL servers
• Test categories:
  – fileio
  – cpu
  – memory
  – threads
  – mutex
  – oltp
• Source code provided, must be built
• Very simplistic defaults – tuning recommended (see the sketch below)
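A minimal sketch of the default-versus-tuned contrast that shows up in the results two slides ahead; the tuned parameters are taken from the script on the next slide, and the default run simply omits them:

# Default run: sysbench's built-in cpu parameters.
sysbench --test=cpu run
# Tuned run: two threads per vCPU, a fixed request count, and a much
# larger prime limit so each request does real work.
sysbench --num-threads=$((`grep -c processor /proc/cpuinfo` * 2)) \
  --max-requests=30000 --test=cpu --cpu-max-prime=100000 run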

Page 20: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Sysbench Script

COPIES=`cat /proc/cpuinfo | grep processor | wc -l`
TDS=$(($COPIES * 2))
STXT=sysbenchcpu.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
FN=$ID$DL$TYPE$DL$TDS$DL$STXT
sysbench --num-threads=$TDS --max-requests=30000 --test=cpu \
  --cpu-max-prime=100000 run > $FN
# Collect elapsed times into a separate file so the raw output is not overwritten.
grep "total time:" i-*$STXT | cut -d, -f1-2 > sbresults.txt

Page 21: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Sysbench – CPU

sysbench      Default   C.O.V.   RT (min)   tuned ratio   C.O.V.   RT (min)
m3.xlarge     3.21      1.44%    0.06       1.69          1.29%    3.86
m3.2xlarge    6.41      1.38%    0.03       3.38          1.41%    1.93
m2.xlarge     1.59      0.75%    0.11       0.80          0.23%    8.16
m2.2xlarge    3.19      0.64%    0.06       1.60          0.76%    4.07
m2.4xlarge    8.83      0.62%    0.02       4.71          0.20%    1.38
c3.large      1.78      0.26%    0.10       0.91          0.09%    7.13
c3.xlarge     3.55      0.53%    0.05       1.83          0.02%    3.57
c3.2xlarge    6.55      8.45%    0.03       3.54          3.31%    1.85
cc2.8xlarge   25.34     2.30%    0.01       13.69         1.10%    0.48

Page 22: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Summary: CPU Comparison

              GB      GB Int   UB     UB Int   Est. SPECInt   sysbench default   sysbench tuned
m3.xlarge     2.04    2.01     2.49   1.88     2.24           3.21               1.69
m3.2xlarge    3.80    3.96     4.21   3.77     4.25           6.41               3.38
m2.xlarge     1.54    1.52     0.76   1.59     1.38           1.59               0.80
m2.2xlarge    2.82    3.02     1.23   3.19     2.76           3.19               1.60
m2.4xlarge    5.11    5.54     2.02   6.48     5.21           8.83               4.71
c3.large      1.32    1.37     1.91   1.10     1.25           1.78               0.91
c3.xlarge     2.51    2.72     2.85   2.20     2.39           3.55               1.83
c3.2xlarge    4.88    5.35     4.50   4.34     4.67           6.55               3.54
cc2.8xlarge   15.46   17.88    6.44   15.54    14.92          25.34              13.69

Page 23: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Scenario: Memory Instance Selection

• Application runs on premises

• Primary requirement: memory throughput of 20K MB/sec

• What instance would work best?

1. Choose a synthetic benchmark

2. Baseline: Build, configure, tune, and run it on premises

3. Run the same test (or tests) on a set of instance types

4. Use results from the instance tests to choose the best match

Page 24: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Testing Memory

• Choose a benchmark: stream, geekbench, sysbench (memory)

• How do you know when you have a good result?

• Tests run on 9 instance types
  – Minimum of 10 instances launched
  – Tests run a minimum of 3 times on each instance
  – Ubuntu 13.04 base AMI

Page 25: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Stream* Overview

• Synthetic measure of sustainable memory bandwidth
  – Published results at www.cs.virginia.edu/stream/top20/Bandwidth.html
  – Must be built
  – By default, runs 1 thread per CPU
  – Use stream-scaling to automate array sizing and thread scaling
    • https://github.com/gregs1104/stream-scaling

name     kernel                 bytes/iter   FLOPS/iter
COPY:    a(i) = b(i)            16           0
SCALE:   a(i) = q*b(i)          16           1
SUM:     a(i) = b(i) + c(i)     24           1
TRIAD:   a(i) = b(i) + q*c(i)   24           2

* McCalpin, John D.: "STREAM: Sustainable Memory Bandwidth in High Performance Computers"
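Since STREAM must be built, a minimal build-and-run sketch, assuming stream.c in the current directory and gcc with OpenMP (the array-size macro and value vary by STREAM version; the value here is illustrative and should be large enough to overflow the caches, which is exactly what stream-scaling automates):

# Build with OpenMP so STREAM runs one thread per CPU by default.
gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=60000000 stream.c -o stream
# Pin the thread count explicitly, as the memory script on the next slide does.
OMP_NUM_THREADS=`grep -c processor /proc/cpuinfo` ./stream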

Page 26: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Memory Scripts

TDS=`cat /proc/cpuinfo | grep processor | wc -l`
export OMP_NUM_THREADS=$TDS
MTXT=stream.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
FN=$ID$DL$TYPE$DL$TDS$DL$MTXT
./stream | egrep \
"Number of Threads requested|Function|Triad|Failed|Expected|Observed" > $FN

MTXT=sysbench-mem.txt
FN=$ID$DL$TYPE$DL$TDS$DL$MTXT
./sysbench --num-threads=$TDS --test=memory run >$FN

Page 27: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Memory Comparison (MB/sec)

              Stream-Triad   Geekbench Memory-Triad   sysbench (default)
m3.xlarge     23640.56       15375.64                 302.95
m3.2xlarge    26046.17       14999.27                 603.40
m2.xlarge     18766.58       17365.76                 528.16
m2.2xlarge    22421.91       17600.00                 1019.08
m2.4xlarge    19634.50       14405.82                 1576.30
c3.large      11434.83       9967.96                  2116.84
c3.xlarge     21141.30       13972.65                 2643.33
c3.2xlarge    30235.78       20657.49                 2944.91
cc2.8xlarge   55200.86       37067.32                 1195.90

sysbench memory defaults:
--memory-block-size [1K]
--memory-total-size [100G]
--memory-scope {global,local} [global]
--memory-hugetlb [off]
--memory-oper {read, write, none} [write]
--memory-access-mode {seq,rnd} [seq]
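Those defaults help explain why the sysbench column lands so far below Stream: 1K blocks measure per-operation overhead more than sustained bandwidth. A hedged sketch of a run closer to a bandwidth test (the 1M block size is an illustrative choice, not from the deck):

# Bigger blocks shift the test from per-op overhead toward raw bandwidth.
./sysbench --num-threads=`grep -c processor /proc/cpuinfo` --test=memory \
  --memory-block-size=1M --memory-total-size=100G run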

Page 28: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Testing Disk I/O

• Storage options:
  – Amazon EBS
  – Amazon EBS PIOPs
  – Ephemeral
  – hi1.4xlarge local storage

• I/O metrics:
  – IOPs
  – Throughput
  – Latency

• Test parameters:
  – Read %
  – Write %
  – Sequential
  – Random
  – Queue depth

• Storage configuration:
  – Volume(s)
  – RAID
  – LVM

Page 29: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Benchmarking PIOPs

• Launch an Amazon EBS-optimized instance
• Create provisioned IOPS volumes
• Attach the volumes to the Amazon EBS-optimized instance
• Pre-warm volumes (see the sketch below)
• Tune queue depth and latency against IOPs
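Pre-warming matters because, on EBS volumes of this era, the first touch of each block carried an I/O penalty that would otherwise pollute the measurements. A minimal sketch of a read pre-warm, assuming the volume is attached as /dev/xvdf (a hypothetical device name):

# Touch every block once so first-access overhead is paid before the benchmark.
sudo dd if=/dev/xvdf of=/dev/null bs=1M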

[Chart: latency (usec, 0–1200) for PIOPs 2K volumes by queue depth, across Seq. Read, Seq. Write, Mixed Seq Read, Mixed Seq Write, Rand Read, Rand Write, Mixed Rand Read, Mixed Rand Write; series: 1D PIOPS 2K, 1D PIOPS 2K QD2, 2D PIOPS 2K, 2D PIOPS 2K QD2]

Page 30: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Testing Disk I/O Examples

fio job file:

[global]
clocksource=cpu
randrepeat=0
ioengine=libaio
direct=1
group_reporting
size=1G

[xvdd-fill]
filename=/data1/testfile1
refill_buffers
scramble_buffers=1
iodepth=4
rw=write
bs=2m
stonewall

[xvdd-1disk-write-1k-1]
time_based
ioscheduler=deadline
iodepth=1
rate_iops=4080
ramp_time=10
filename=/data1/testfile1
runtime=30
bs=1k
rw=write

Simple alternatives:
• disk copy: cp file1 /disk1/file1
• dd: dd if=/dev/zero of=/data1/testfile1 bs=1048 count=1024000
• fio – flexible I/O tester: fio simple.cfg
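The deck does not show simple.cfg itself; purely as an illustration of the job-file format, a hypothetical minimal version in the same style as the one above (every section name and parameter here is an assumption, not the presenter's actual file):

# simple.cfg (hypothetical): a single sequential-write job.
[global]
ioengine=libaio
direct=1
size=1G

[seq-write]
filename=/data1/testfile1
rw=write
bs=1m
iodepth=4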

Page 31: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Summary: Disk I/O

Command                                           Seconds   MB/sec
cp f1 f2                                          17.248    59.37
rm -rf f2; cp f1 f2                               .853      1200.47
cp f1 f3                                          .880      1164.96
dd if=/dev/zero bs=1048 count=1024000 of=d1       .722      1419.01
dd if=/dev/urandom bs=1048 count=1024000 of=d2    79.710    12.84
fio simple.cfg                                    NA        61.55
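The spread in the table above is mostly the page cache at work: the first cp pays for cold reads, the repeat copies run from cached data, and the /dev/urandom dd is throttled by random-number generation rather than the disk. One way to keep the cache out of a quick dd test is direct I/O, sketched below (oflag=direct is standard GNU dd, though this exact command is not in the deck):

# Bypass the page cache so dd measures the device, not memory.
dd if=/dev/zero of=/data1/ddtest bs=1M count=1024 oflag=direct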

Page 32: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Beyond Simple Disk I/O

Random 1M I/O   PIOPs 16-disk MBps
read            1006.73
write           904.03
r70w30          1005.91

Page 33: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Summary

If benchmarking your application is not practical, synthetic benchmarks can be used if you are careful.

• Choose the benchmark that best represents your application
• Analysis – what does “best” mean?
• Run enough tests to quantify variability
• Baseline – what is a “good result”?
• Samples – keep all of your results – more is better!

Page 34: Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013

Please give us your feedback on this presentation. As a thank you, we will select prize winners daily for completed surveys!

ENT305