Best Practices for Benchmarking and Performance Analysis in the Cloud (ENT305) | AWS re:Invent 2013


DESCRIPTION

In this session, we explain how to measure the key performance-impacting metrics in a cloud-based application. With specific examples of good and bad tests, we make it clear how to get reliable measurements of CPU, memory, and disk performance, and how to map benchmark results to your application. We also cover the importance of selecting tests wisely, repeating tests, and measuring variability.

TRANSCRIPT

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Best Practices for Benchmarking and

Performance Analysis in the Cloud

Robert Barnes, Amazon Web Services

November 15, 2013

Benchmarks: Measurement Demo

How many ways to measure? At least 20…

Cloud Benchmarks: Prequel

• The best benchmark

• Absolute vs. relative measures

• Fixed time or fixed work

• What’s different?

• Use a good AMI

[Charts: Average CPU result (0–30 scale) and Coefficient of Variance (0%–60%) across five AMIs – Ubuntu 12.04, AWS CentOS 5.4, and three other CentOS 5.4 AMIs (ami-…)]

Scenario: CPU-based Instance Selection

• Application runs on premises

• Primary requirement is integer CPU performance

• Application is complex to set up, no benchmark tests exist, limited time

• What instance would work best?

1. Choose a synthetic benchmark

2. Baseline: Build, configure, tune, and run it on premises

3. Run the same test (or tests) on a set of instance types

4. Use results from the instance tests to choose the best match
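Step 3 is the part that is easiest to under-do. A minimal local sketch of a per-instance driver, reusing the `ID+TYPE+SEQNO` file-naming convention the scripts in this deck use; the instance ID, type, and the no-op workload are placeholders (on EC2 the first two come from instance metadata):

```shell
# Repeat the same test several times on one instance and keep every
# per-run result file. ID and TYPE are placeholder values here.
DL=+
ID=i-example
TYPE=m3.xlarge
for SEQNO in 1 2 3 4; do
  START=$(date +%s)
  sh -c ':'                      # placeholder for the real benchmark
  END=$(date +%s)
  echo "$((END - START))" >"$ID$DL$TYPE$DL$SEQNO.txt"
done
ls "$ID$DL$TYPE$DL"*.txt         # one result file per run
```

Keeping one file per run (rather than only an average) is what makes the variance analysis later in the session possible.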

Testing CPU

• Choose a benchmark – geekbench, UnixBench, sysbench (cpu), and SPEC CPU2006 Integer

• How do you know when you have a good result?

• Tests run on 9 instance types – 10 instances of each of the 9 types launched

– Tests run a minimum of 4 times on each instance

– Ubuntu 13.04 base AMI
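The C.O.V. (coefficient of variation) figures reported in the tables below come from exactly this kind of repetition. A small awk sketch of the statistic, assuming semicolon-delimited `name;score` lines like the results CSVs the scripts in this session produce (the three sample scores are made up):

```shell
# Mean and coefficient of variation (C.O.V.) over repeated runs.
# Input: "name;score" lines; the scores below are invented samples.
printf 'i-a;9\ni-a;10\ni-a;11\n' >runs.csv
awk -F';' '
  { sum += $2; sumsq += $2 * $2; n++ }
  END {
    mean = sum / n
    sd = sqrt(sumsq / n - mean * mean)   # population std. deviation
    printf "mean=%.2f cov=%.2f%%\n", mean, 100 * sd / mean
  }' runs.csv
# -> mean=10.00 cov=8.16%
```

A C.O.V. of a few percent or less, as in most of the tables below, is what a stable test looks like; much higher, and the mean alone is not trustworthy.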

geekbench Overview

• Workloads in 3 categories

– 13 Integer tests

– 10 Floating Point tests

– 4 Memory tests

• Commercial product (64bit)

• No source code

• Runs single and multi-cpu

• Fast setup, fast runtime

Integer: AES, Twofish, SHA1, SHA2, BZip2 compress, BZip2 decompress, JPEG compress, JPEG decompress, PNG compress, PNG decompress, Sobel, LUA, Dijkstra

Floating Point: Black-Scholes, Mandelbrot, Sharpen image, Blur image, SGEMM, DGEMM, SFFT, DFFT, N-Body, Ray trace

Memory: STREAM copy, STREAM scale, STREAM add, STREAM triad

geekbench Script

SEQNO=$1
GBTXT=gbtest.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
OUTID=$ID$DL$TYPE$DL
START=$(date +%s.%N)
./geekbench_x86_64 --no-upload >$GBTXT
END=$(date +%s.%N)
DIFF=$(echo "$END - $START" | bc)
OUTNAME=$OUTID$SEQNO$DL$DIFF$DL$GBTXT
mv $GBTXT $OUTNAME
grep "Geekbench Score" i-*$GBTXT >gbresults.txt
cat gbresults.txt | sed s/:// | awk '/i-/ {print $1";"$4";"$5}' >gbresults.csv

geekbench

Instance | 1CPU ratio | C.O.V. | NCPU ratio | C.O.V. | RT (min)

m3.xlarge 0.93 1.04% 2.04 2.31% 2.06

m3.2xlarge 0.93 1.40% 3.80 1.46% 2.08

m2.xlarge 0.80 2.84% 1.54 4.06% 1.99

m2.2xlarge 0.80 1.34% 2.82 1.21% 2.04

m2.4xlarge 0.76 2.28% 5.11 1.71% 2.01

c3.large 1.13 0.93% 1.32 0.71% 1.76

c3.xlarge 1.13 0.39% 2.51 1.81% 1.74

c3.2xlarge 1.13 0.19% 4.88 0.25% 1.70

cc2.8xlarge 1.00 0.71% 15.46 1.93% 2.21
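The "ratio" columns in this and the following tables are scores normalized to the cc2.8xlarge result, which is why that row is always 1.00. A sketch of the normalization, assuming a semicolon-delimited `type;score` file; the file name and scores are invented:

```shell
# Normalize per-instance-type scores to the cc2.8xlarge baseline,
# as in the "ratio" columns. Sample scores are made up.
printf 'c3.large;113\nm2.xlarge;80\ncc2.8xlarge;100\n' >scores.csv
awk -F';' '
  { score[$1] = $2 }
  END {
    base = score["cc2.8xlarge"]
    for (t in score) printf "%s;%.2f\n", t, score[t] / base
  }' scores.csv | sort
# -> c3.large;1.13
#    cc2.8xlarge;1.00
#    m2.xlarge;0.80
```

Relative measures like this survive changes in benchmark version or units better than absolute scores do, which is the "absolute vs. relative measures" point from the prequel slide.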

geekbench – Run Variance

Instance | 1CPU ratio | C.O.V.

m3.xlarge

instance-1 0.93 0.31%

instance-2 0.97 0.23%

instance-3 0.94 0.17%

instance-4 0.94 0.10%

instance-5 0.94 0.32%

instance-6 0.94 0.10%

instance-7 0.93 0.25%

instance-8 0.93 0.38%

instance-9 0.94 0.11%

instance-10 0.94 0.09%

geekbench – Integer Portion

gb-integer | 1CPU ratio | C.O.V. | NCPU ratio | C.O.V. | RT (min)

c3.large 1.12 0.50% 1.37 0.43% NA

c3.xlarge 1.13 0.38% 2.72 0.41% NA

c3.2xlarge 1.12 0.38% 5.35 0.51% NA

cc2.8xlarge 1.00 0.20% 17.88 3.31% NA

geekbench (full suite, for comparison)

c3.large 1.13 0.93% 1.32 0.71% 1.76

c3.xlarge 1.13 0.39% 2.51 1.81% 1.74

c3.2xlarge 1.13 0.19% 4.88 0.25% 1.70

cc2.8xlarge 1.00 0.71% 15.46 1.93% 2.21

UnixBench Overview

• Default: the BYTE Index

– 12 workloads, run 2 times (roughly 29 minutes each time)

• Integer computation

• Floating point computation

• System calls

• File system calls

– Geometric mean of the results against a baseline produces the System Benchmarks Index Score

• Open source – must be built

– Must be patched for > 16 CPUs


UnixBench Script

SEQNO=$1
UBTXT=ubtest.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
FN=$ID$DL$TYPE$DL$SEQNO$DL$UBTXT
COPIES=`cat /proc/cpuinfo | grep processor | wc -l`
./Run -c 1 -c $COPIES >$FN
grep "System Benchmarks Index Score" i-*$UBTXT >ubresults.txt
cat ubresults.txt | sed s/".txt:System Benchmarks Index Score"// | \
awk '/i-/ {print $1";"$2}' >ubresults.csv

UnixBench

Instance | 1CPU ratio | C.O.V. | NCPU ratio | C.O.V. | RT (min)

m3.xlarge 1.38 1.90% 2.49 1.36% 28.25

m3.2xlarge 1.42 1.85% 4.21 1.99% 28.29

m2.xlarge 0.40 5.82% 0.76 1.28% 28.30

m2.2xlarge 0.42 1.71% 1.23 1.75% 28.32

m2.4xlarge 0.48 3.31% 2.02 1.71% 28.34

c3.large 1.10 1.33% 1.91 1.54% 28.17

c3.xlarge 1.06 1.48% 2.85 1.26% 28.21

c3.2xlarge 1.10 0.54% 4.50 1.02% 28.96

cc2.8xlarge 1.00 2.97% 6.44 2.65% 30.20

UnixBench – Dhrystone 2

UB-Integer | 1CPU ratio | C.O.V. | NCPU ratio | C.O.V. | RT (min)

c3.large 1.05 0.24% 1.10 0.30% 0.17

c3.xlarge 1.05 0.27% 2.20 0.28% 0.17

c3.2xlarge 1.05 0.07% 4.34 0.23% 0.17

cc2.8xlarge 1.00 0.10% 15.54 0.95% 0.17

UnixBench (full suite, for comparison)

c3.large 1.10 1.33% 1.91 1.54% 28.17

c3.xlarge 1.06 1.48% 2.85 1.26% 28.21

c3.2xlarge 1.10 0.54% 4.50 1.02% 28.96

cc2.8xlarge 1.00 2.97% 6.44 2.65% 30.20

SPEC CPU2006 Overview

• Competitive (reviewed)

• Commercial (site) license required

• Source code provided, must be built

• Highly customizable

• Full "reportable" run: 5+ hours

• Published results on www.spec.org

SPEC CPU2006 Overview

Benchmark | Language | Category

400.perlbench C Programming language

401.bzip2 C Compression

403.gcc C C compiler

429.mcf C Combinatorial optimization

445.gobmk C Artificial intelligence

456.hmmer C Search gene sequence

458.sjeng C Artificial intelligence

462.libquantum C Physics / quantum computing

464.h264ref C Video compression

471.omnetpp C++ Discrete event simulation

473.astar C++ Path-finding algorithms

483.xalancbmk C++ XML processing

SPEC CPU2006 Integer Script

SEQNO=$1   # run sequence number, as in the other scripts
CPATH="/cpu2006/result"
COPIES=`cat /proc/cpuinfo | grep processor | wc -l`
SITXT=estspecint.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
FN=$ID$DL$TYPE$DL$SEQNO$DL$SITXT
runspec --noreportable --tune=base --size=ref --rate=$COPIES --iterations=1 \
  400 403 445 456 458 462 464 471 473 483
grep "_base" $CPATH/CINT*.ref.csv | cut -d, -f1-2 > $FN
grep "total seconds elapsed" $CPATH/CPU*.log | awk '/finished/ {print $9}' >>$FN

Estimated SPEC CPU2006 Integer

Est. SPECint | 1CPU ratio | C.O.V. | RT (min) | NCPU ratio | C.O.V. | RT (min)

m3.xlarge 1.01 1.06% 54.39 2.24 1.15% 104.18

m3.2xlarge 1.01 1.67% 54.49 4.25 1.63% 109.22

m2.xlarge 0.76 1.97% 70.83 1.39 2.45% 85.37

m2.2xlarge 0.79 0.94% 68.85 2.76 1.24% 85.42

m2.4xlarge 0.78 0.16% 68.73 5.21 1.26% 89.91

c3.large 1.11 1.95% 50.00 1.25 1.47% 94.22

c3.xlarge 1.10 1.96% 50.29 2.39 1.28% 97.66

c3.2xlarge 1.08 0.87% 50.87 4.67 0.25% 100.22

cc2.8xlarge 1.00 0.29% 54.92 14.92 0.52% 125.74

Sysbench Overview

• Designed as a quick system test of MySQL servers

• Test categories

– fileio

– cpu

– memory

– threads

– mutex

– oltp

• Source code provided, must be built

• Very simplistic defaults – tuning recommended

Sysbench Script

COPIES=`cat /proc/cpuinfo | grep processor | wc -l`
TDS=$(($COPIES * 2))
STXT=sysbenchcpu.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
FN=$ID$DL$TYPE$DL$TDS$DL$STXT
sysbench --num-threads=$TDS --max-requests=30000 --test=cpu \
  --cpu-max-prime=100000 run > $FN
grep "total time:" i-*$STXT | cut -d, -f1-2 > sbresults.txt

Sysbench – CPU

sysbench | Default | C.O.V. | RT (min) | tuned ratio | C.O.V. | RT (min)

m3.xlarge 3.21 1.44% 0.06 1.69 1.29% 3.86

m3.2xlarge 6.41 1.38% 0.03 3.38 1.41% 1.93

m2.xlarge 1.59 0.75% 0.11 0.80 0.23% 8.16

m2.2xlarge 3.19 0.64% 0.06 1.60 0.76% 4.07

m2.4xlarge 8.83 0.62% 0.02 4.71 0.20% 1.38

c3.large 1.78 0.26% 0.10 0.91 0.09% 7.13

c3.xlarge 3.55 0.53% 0.05 1.83 0.02% 3.57

c3.2xlarge 6.55 8.45% 0.03 3.54 3.31% 1.85

cc2.8xlarge 25.34 2.30% 0.01 13.69 1.10% 0.48
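Because the tuned sysbench CPU run does a fixed amount of work (30,000 requests, per the script above), its "total time" line converts directly to a throughput figure, which is what the columns compare. A one-liner with a hypothetical measured time:

```shell
# Fixed-work benchmark: throughput = work / elapsed time.
# 231.6 s is a made-up example of sysbench's "total time" output.
awk -v req=30000 -v secs=231.6 'BEGIN { printf "%.1f req/s\n", req / secs }'
# -> 129.5 req/s
```

Note the very short default runtimes above (0.01–0.11 min): too little work to exercise steady-state behavior, which is one reason the default and tuned columns disagree.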

Summary: CPU Comparison

Instance | GB | GB Int | UB | UB Int | Est. SPECInt | sysbench default | sysbench tuned

m3.xlarge 2.04 2.01 2.49 1.88 2.24 3.21 1.69

m3.2xlarge 3.80 3.96 4.21 3.77 4.25 6.41 3.38

m2.xlarge 1.54 1.52 0.76 1.59 1.38 1.59 0.80

m2.2xlarge 2.82 3.02 1.23 3.19 2.76 3.19 1.60

m2.4xlarge 5.11 5.54 2.02 6.48 5.21 8.83 4.71

c3.large 1.32 1.37 1.91 1.10 1.25 1.78 0.91

c3.xlarge 2.51 2.72 2.85 2.20 2.39 3.55 1.83

c3.2xlarge 4.88 5.35 4.50 4.34 4.67 6.55 3.54

cc2.8xlarge 15.46 17.88 6.44 15.54 14.92 25.34 13.69

Scenario: Memory Instance Selection

• Application runs on premises

• Primary requirement: memory throughput of 20K MB/sec

• What instance would work best?

1. Choose a synthetic benchmark

2. Baseline: Build, configure, tune, and run it on premises

3. Run the same test (or tests) on a set of instance types

4. Use results from the instance tests to choose the best match

Testing Memory

• Choose a benchmark: – stream, geekbench, sysbench(memory)

• How do you know when you have a good result?

• Tests run on 9 instance types – Minimum of 10 instances launched

– Tests run a minimum of 3 times on each instance

– Ubuntu 13.04 base AMI

Stream* Overview

• Synthetic benchmark measuring sustainable memory bandwidth

– Published results at www.cs.virginia.edu/stream/top20/Bandwidth.html

– Must be built

– By default, runs 1 thread per CPU

– Use stream-scaling to automate array sizing and thread scaling • https://github.com/gregs1104/stream-scaling

Name: kernel | Bytes/iter | FLOPS/iter

COPY: a(i) = b(i) | 16 | 0

SCALE: a(i) = q*b(i) | 16 | 1

SUM: a(i) = b(i) + c(i) | 24 | 1

TRIAD: a(i) = b(i) + q*c(i) | 24 | 2

* McCalpin, John D.: "STREAM: Sustainable Memory Bandwidth in High Performance Computers",
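The bytes-per-iteration column is what turns a measured kernel time into bandwidth: TRIAD moves 24 bytes per array element (read b, read c, write a, with 8-byte doubles). A back-of-envelope check, with an invented element count and kernel time:

```shell
# TRIAD bandwidth = elements * 24 bytes / elapsed time.
# 20M elements in 48 ms (both hypothetical) implies:
awk -v n=20000000 -v ms=48 'BEGIN { printf "%.0f MB/s\n", n * 24 / (ms / 1000) / 1e6 }'
# -> 10000 MB/s
```

Running the same arithmetic against your own STREAM output is a quick sanity check that the reported bandwidth and timings agree.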

Memory Scripts

TDS=`cat /proc/cpuinfo | grep processor | wc -l`
export OMP_NUM_THREADS=$TDS
MTXT=stream.txt
DL=+
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
FN=$ID$DL$TYPE$DL$TDS$DL$MTXT
./stream | egrep \
"Number of Threads requested|Function|Triad|Failed|Expected|Observed" > $FN

MTXT=sysbench-mem.txt
FN=$ID$DL$TYPE$DL$TDS$DL$MTXT
./sysbench --num-threads=$TDS --test=memory run >$FN

Memory Comparison

Instance | Stream Triad | Geekbench Memory-Triad | sysbench (default)

m3.xlarge 23640.56 15375.64 302.95

m3.2xlarge 26046.17 14999.27 603.40

m2.xlarge 18766.58 17365.76 528.16

m2.2xlarge 22421.91 17600.00 1019.08

m2.4xlarge 19634.50 14405.82 1576.30

c3.large 11434.83 9967.96 2116.84

c3.xlarge 21141.30 13972.65 2643.33

c3.2xlarge 30235.78 20657.49 2944.91

cc2.8xlarge 55200.86 37067.32 1195.90

sysbench memory defaults

--memory-block-size [1K]

--memory-total-size [100G]

--memory-scope {global,local} [global]

--memory-hugetlb [off]

--memory-oper {read, write, none} [write]

--memory-access-mode {seq,rnd} [seq]
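Those defaults explain the comparatively low sysbench (default) column in the table above: 1K blocks over a 100G total transfer turn the run into a per-operation overhead measurement rather than a bandwidth one, simply because of how many operations are involved:

```shell
# 100 GiB of total transfer at 1 KiB per operation:
awk 'BEGIN { printf "%d memory operations\n", 100 * 1024 * 1024 }'
# -> 104857600 memory operations
```

Raising --memory-block-size (one of the defaults listed above) shifts the test back toward measuring bandwidth, which is the kind of tuning the sysbench overview slide recommends.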

Testing Disk I/O

• Storage options:

– Amazon EBS

– Amazon EBS PIOPs

– Ephemeral

– hi1.4xlarge local storage

• I/O metrics

– IOPs

– Throughput

– Latency

• Test parameters:

– Read %

– Write %

– Sequential

– Random

– Queue depth

• Storage configuration

– Volume(s)

– RAID

– LVM

Benchmarking PIOPs

• Launch an Amazon EBS-optimized instance

• Create provisioned IOPS volumes

• Attach the volumes to the Amazon EBS-optimized instance

• Pre-warm volumes

• Tune queue depth and latency against IOPs

[Chart: Latency (usec), 0–1200, for Seq.Read, Seq.Write, MixedSeqRead, MixedSeqWrite, RandRead, RandWrite, MixedRandRead, MixedRandWrite – PIOPS 2K at queue depths 1 and 2, on 1-disk and 2-disk configurations]
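The last bullet ("Tune queue depth and latency against IOPs") is Little's law in disguise: sustained IOPs = outstanding I/Os / mean latency. For example, sustaining 2,000 provisioned IOPs at 1 ms per I/O requires about 2 outstanding I/Os (the numbers are illustrative):

```shell
# Queue depth needed = IOPs * latency. 2000 IOPs at 1000 us per I/O:
awk -v iops=2000 -v lat_us=1000 'BEGIN { printf "QD=%.0f\n", iops * lat_us / 1e6 }'
# -> QD=2
```

This is why the latency chart is plotted against queue depth: past the depth that saturates the provisioned rate, extra outstanding I/Os only add queueing latency.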

Testing Disk I/O Examples

[global]
clocksource=cpu
randrepeat=0
ioengine=libaio
direct=1
group_reporting
size=1G

[xvdd-fill]
filename=/data1/testfile1
refill_buffers
scramble_buffers=1
iodepth=4
rw=write
bs=2m
stonewall

[xvdd-1disk-write-1k-1]
time_based
ioscheduler=deadline
iodepth=1
rate_iops=4080
ramp_time=10
filename=/data1/testfile1
runtime=30
bs=1k
rw=write

• disk copy: cp file1 /disk1/file1

• dd: dd if=/dev/zero of=/data1/testfile1 bs=1048 count=1024000

• fio – flexible I/O tester: fio simple.cfg

Summary: Disk I/O

Command | Seconds | MB/sec

cp f1 f2 | 17.248 | 59.37

rm -rf f2; cp f1 f2 | 0.853 | 1200.47

cp f1 f3 | 0.880 | 1164.96

dd if=/dev/zero bs=1048 count=1024000 of=d1 | 0.722 | 1419.01

dd if=/dev/urandom bs=1048 count=1024000 of=d2 | 79.710 | 12.84

fio simple.cfg | NA | 61.55
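The MB/sec column can be checked by hand. For the /dev/zero dd row, 1,024,000 blocks of 1,048 bytes in 0.722 seconds comes out close to the tabulated figure if "MB" is read as MiB (the small residual is timing-precision rounding):

```shell
# 1048-byte blocks * 1024000 blocks, moved in 0.722 s, in MiB/s:
awk 'BEGIN { printf "%.2f MiB/s\n", 1048 * 1024000 / 1048576 / 0.722 }'
# -> 1417.50 MiB/s
```

The same arithmetic on the /dev/urandom row shows it is bottlenecked on generating random bytes, not on the disk, and the fast repeat cp rows are the page cache at work, not the device.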

Beyond Simple Disk I/O

Random 1M I/O, PIOPS 16-disk | MBps

read | 1006.73

write | 904.03

r70w30 | 1005.91

Summary

If benchmarking your application is not practical, synthetic benchmarks can be used if you are careful.

• Choose the best benchmark that represents your application

• Analysis – what does “best” mean?

• Run enough tests to quantify variability

• Baseline – what is a “good result”?

• Samples – keep all of your results – more is better!

Please give us your feedback on this presentation.

As a thank you, we will select prize winners daily for completed surveys!

ENT305
