![Page 1: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/1.jpg)
UC Berkeley
Evaluating Amazon’s EC2 As A Research Platform
Michael Armbrust and Gunho Lee
![Page 2: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/2.jpg)
RAD Lab Overview
[Figure: RAD Lab architecture diagram. Labeled components: high-level spec, low-level spec, Compiler, Drivers, Director (performance & cost models, Log Mining), Instrumentation Backplane, Automatic Workload Engine, Policy-aware switching, Ruby on Rails environment, Web 2.0 apps, web svc APIs, SCADS, Berkeley DB, VM monitor, local OS functions, trace collection. Labeled flows: new apps, equipment, global policies (e.g. SLA); offered load, resource utilization, etc.; training data.]
![Page 3: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/3.jpg)
Overview of EC2
| Instance | Price / hour | Platform | Compute Units | Memory | Disk |
|---|---|---|---|---|---|
| Small | $0.10 | 32-bit | 1 | 1.7 GB | 160 GB |
| Large | $0.40 | 64-bit | 4 | 7.5 GB | 850 GB (2 spindles) |
| X Large | $0.80 | 64-bit | 8 | 15 GB | 1690 GB (4 spindles) |
| High CPU Med | $0.20 | 64-bit | 5 | 1.7 GB | 350 GB |
| High CPU Large | $0.80 | 64-bit | 20 | 7 GB | 1690 GB |
One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
• Elastic virtual machine capacity
• Programmatically controlled through web service API
• ~5 minutes to allocate new machines
• Charges for machine time and data transfer in/out of datacenter
![Page 4: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/4.jpg)
Appeal
• Automatically and dynamically scale your computing resources as demand changes
• For researchers:
– Run experiments for $100 per 1000 machine hours!
– AMIs provide simple containment of experimental setup and allow for easy verification
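The $100 figure follows from the Small instance price of $0.10 per hour. A minimal sketch of that arithmetic, using the hourly prices listed on the "Overview of EC2" slide; the function name `experiment_cost` is illustrative, and data transfer charges are left out:

```python
# Hourly prices (USD) as listed on the "Overview of EC2" slide.
HOURLY_PRICE_USD = {
    "small": 0.10,
    "large": 0.40,
    "xlarge": 0.80,
    "high_cpu_medium": 0.20,
    "high_cpu_large": 0.80,
}


def experiment_cost(instance_type: str, machines: int, hours: float) -> float:
    """Machine-time cost in USD; data transfer charges are not included."""
    return HOURLY_PRICE_USD[instance_type] * machines * hours


if __name__ == "__main__":
    # 100 small instances for 10 hours = 1000 machine hours.
    print(f"${experiment_cost('small', 100, 10):.2f}")
```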
![Page 5: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/5.jpg)
Rapid Changes
• Changing machines and processor types
• New features
– Availability Zones
– Elastic IP Addresses
– Persistent Storage
![Page 6: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/6.jpg)
Reverse Engineering
• How much hardware do they have?
– Not as much as we expected
– Seems to be changing
• What have we seen in our tests?
– 402 VMs
– 379 Physical Machines
– Overlap of as many as 7 VMs per machine
![Page 7: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/7.jpg)
Goal
• Characterize the performance and variance of different aspects of EC2
– Clock Accuracy
– CPU
– Memory Bandwidth
– Disk
– Network Throughput / Latency
• Provide recommendations to researchers hoping to use EC2
![Page 8: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/8.jpg)
Benchmark Architecture
• Suite of benchmarks controlled from the R Cluster
– SSH to machines
– Install required binaries
– Collect and record results to MySQL

Usage: util/runTests.rb [options]
    -r, --reservation RESERVATION_ID   Include the specified reservation in tests
    -d, --disk SIZE                    Execute a disktest of specified GBs
    -c, --cpu_info                     Capture CPU Info
    -t, --trace_path FILE              Run traceroute and capture to file (for graphing)
    -n, --network N                    Perform a network throughput test between random pairs of nodes N times
    -s, --skewserver SERVER            Install and start the skew monitor in the background reporting into SERVER
    -m, --memorytest                   Perform the stream memory benchmark
    -x, --detectcap                    Attempt to detect Xen CPU Capping
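The `--network N` option tests throughput between random pairs of nodes. The slides do not show how `runTests.rb` picks those pairs, so the following is only one plausible sketch: shuffle the node list each round and pair up neighbours so that no node takes part in two transfers at once. The function name `random_pairs` and its `seed` parameter are assumptions made for illustration.

```python
import random


def random_pairs(nodes, rounds, seed=None):
    """Return `rounds` lists of disjoint (sender, receiver) pairs.

    Each round shuffles the nodes and pairs neighbours, so every node is in
    at most one transfer per round; with an odd count one node sits out.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(rounds):
        shuffled = list(nodes)
        rng.shuffle(shuffled)
        schedule.append(list(zip(shuffled[0::2], shuffled[1::2])))
    return schedule


if __name__ == "__main__":
    for pairs in random_pairs(["n1", "n2", "n3", "n4", "n5"], rounds=3, seed=1):
        print(pairs)
```

Keeping the pairs disjoint within a round avoids one node's link being measured twice at the same time, which would understate its throughput.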
![Page 9: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/9.jpg)
Clock Accuracy
• Measured the time difference between EC2 instances & a local server
• Clock skews occur 1 to 3 times per hour with a magnitude of ~200 ms, and are seen by many machines over some period
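The slides do not describe how the skew monitor computes the time difference. One standard way to estimate the offset between two clocks over a network is the NTP-style four-timestamp calculation, sketched below under that assumption; the function name `estimate_offset` is illustrative.

```python
def estimate_offset(t0, t1, t2, t3):
    """NTP-style clock offset and round-trip delay, in seconds.

    t0: request sent      (local clock)
    t1: request received  (remote clock)
    t2: reply sent        (remote clock)
    t3: reply received    (local clock)

    Returns (offset, delay), where offset is remote clock minus local clock.
    The estimate assumes the network delay is symmetric in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


if __name__ == "__main__":
    # Synthetic example: remote clock runs 200 ms ahead, 10 ms each way.
    offset, delay = estimate_offset(100.00, 100.21, 100.21, 100.02)
    print(f"offset={offset * 1000:.1f} ms, delay={delay * 1000:.1f} ms")
```

With sub-millisecond round trips inside the datacenter, the symmetric-delay assumption introduces little error compared with a ~200 ms skew.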
![Page 10: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/10.jpg)
CPU Performance
• EC2 uses Xen CPU caps to provide fairly consistent performance across different processor types
• Programs may experience scheduling artifacts with short or latency-sensitive computations
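A generic way to look for such scheduling artifacts is to time many identical small units of work and flag the ones that take far longer than the median. This is only a heuristic sketch, not the method behind `--detectcap`, which the slides do not describe; the function names and the `factor` threshold are assumptions.

```python
import statistics
import time


def time_work_units(units=200, spins=20000):
    """Time `units` identical busy loops; return each duration in seconds."""
    durations = []
    for _ in range(units):
        start = time.perf_counter()
        total = 0
        for i in range(spins):
            total += i
        durations.append(time.perf_counter() - start)
    return durations


def find_stalls(durations, factor=3.0):
    """Return indices of work units that took more than `factor` x the median."""
    median = statistics.median(durations)
    return [i for i, d in enumerate(durations) if d > factor * median]


if __name__ == "__main__":
    durations = time_work_units()
    stalls = find_stalls(durations)
    print(f"{len(stalls)} of {len(durations)} work units stalled")
```

On a capped VM, a descheduled interval would show up as an occasional work unit that is several times slower than its neighbours.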
![Page 11: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/11.jpg)
CPU Performance
![Page 12: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/12.jpg)
Memory Bandwidth
• Used Stream Memory Benchmark
• Low variation between machines of the same type
– Std Dev < 55 MB/s
• Very high variation between different processor types
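The comparison above amounts to grouping bandwidth results by processor type and looking at the spread within each group. A small sketch of that grouping follows; the processor labels and numbers in the usage example are made up for illustration and are not measurements from the talk.

```python
import statistics
from collections import defaultdict


def bandwidth_spread(samples):
    """Group (processor_type, MB/s) samples; return per-type (mean, std dev)."""
    by_type = defaultdict(list)
    for processor_type, mb_per_s in samples:
        by_type[processor_type].append(mb_per_s)
    return {
        cpu: (statistics.mean(values), statistics.pstdev(values))
        for cpu, values in by_type.items()
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    samples = [("cpu_a", 1500), ("cpu_a", 1540), ("cpu_b", 3000), ("cpu_b", 3020)]
    for cpu, (mean, stdev) in bandwidth_spread(samples).items():
        print(f"{cpu}: mean={mean:.0f} MB/s, std dev={stdev:.0f} MB/s")
```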
![Page 13: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/13.jpg)
Disks – Warm Up Effect
• When first using /mnt (ephemeral storage) there are significant allocation performance artifacts
• Oddly, these seem to correlate with the processor type
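One way to observe a warm-up effect is to write the same region of a file twice and compare per-chunk throughput between the first and second pass. The sketch below shows only the measurement; it is not the disk test used in the talk, and the function names and default sizes are assumptions. To test ephemeral storage, `directory` would point at `/mnt`; on an ordinary local disk the two passes will probably look alike.

```python
import os
import tempfile
import time


def write_pass(path, chunks, chunk_bytes):
    """Overwrite `path` in place, chunk by chunk; return MB/s for each chunk."""
    block = b"\0" * chunk_bytes
    rates = []
    with open(path, "r+b") as f:
        for _ in range(chunks):
            start = time.perf_counter()
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data out of the page cache
            elapsed = max(time.perf_counter() - start, 1e-9)
            rates.append(chunk_bytes / 2**20 / elapsed)
    return rates


def compare_passes(directory=None, chunks=8, chunk_bytes=1 << 20):
    """Write the same file region twice; return (first_pass, second_pass) rates."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        first = write_pass(path, chunks, chunk_bytes)
        second = write_pass(path, chunks, chunk_bytes)
        return first, second
    finally:
        os.remove(path)


if __name__ == "__main__":
    first, second = compare_passes()
    print("first pass  MB/s:", [round(r, 1) for r in first])
    print("second pass MB/s:", [round(r, 1) for r in second])
```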
![Page 14: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/14.jpg)
Disks – Warm Up Effect
![Page 15: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/15.jpg)
Disks – Long Term Performance
• Over time, aggregate disk performance is more consistent
• Over 90% of writes occur at 40 MB/s or greater
• Average 54.88 MB/s, standard deviation 9.05 MB/s
![Page 16: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/16.jpg)
Individual Performance
• There is a much greater discrepancy between individual machines
• Most likely due to colocation with other disk-intensive customers
![Page 17: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/17.jpg)
Original Topology
• Originally we saw three distinct sections of their network
![Page 18: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/18.jpg)
New Topology
![Page 19: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/19.jpg)
Network Quirks and Latency
• 96% of RTTs < 1 ms
• Occasional route changes lead to unusual paths and increased latency
• Current EC2 bug limits the number of concurrent connections that can be established cluster-wide
– Expected fix in a few weeks
![Page 20: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/20.jpg)
Network Performance
![Page 21: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/21.jpg)
Long Term Network
![Page 22: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/22.jpg)
Long Term Network
![Page 23: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/23.jpg)
Large – CPU
• No CPU cap
– Slower CPU, but full speed
![Page 24: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/24.jpg)
Large – Disk
• Similar performance
![Page 25: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/25.jpg)
Large – Network
• 2x performance at 4x cost
![Page 26: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/26.jpg)
Small vs. Large
• Small instance shows statistically better performance/$
– Due to lack of Xen I/O caps coupled with low overall cluster utilization
• Large instance has higher network bandwidth
– Explicitly specified
– Not as high as the price difference would suggest
• No CPU caps for large instance
– Appears not to share a physical core with other instances
– No future guarantee
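The performance/$ comparison can be made concrete with the network figure from the "Large – Network" slide (2x performance at 4x cost). A minimal sketch of that ratio; the function name `relative_value` is illustrative:

```python
def relative_value(perf_ratio: float, cost_ratio: float) -> float:
    """Performance per dollar of one option relative to a baseline.

    perf_ratio: option performance / baseline performance
    cost_ratio: option price / baseline price
    A result below 1.0 means the baseline gives more performance per dollar.
    """
    return perf_ratio / cost_ratio


if __name__ == "__main__":
    # Large vs. small: 2x network performance, $0.40 vs. $0.10 per hour.
    print(relative_value(2.0, 0.40 / 0.10))
```

By this measure a large instance delivers about half the network performance per dollar of a small one.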
![Page 27: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/27.jpg)
Evaluation for Research
• Typical experimental environments for (distributed) systems research
• Single machine (simulation)
• Private cluster (many nodes)
• Emulab (network topology emulation)
• PlanetLab (geographically distributed)
– Many experiments on private cluster and Emulab can be done better on EC2
| Environment | Size | Cost | Usability | Control |
|---|---|---|---|---|
| Private Cluster | Small | Expensive | Easy | Full |
| Emulab / PlanetLab | Medium | Free | Hard | Partial |
| EC2 | Large | Cheap | Easy | Little |
![Page 28: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/28.jpg)
Evaluation for Research
• EC2 is useful when you do…
• Experiment on “large” cluster
– distributed storage/computation/service…
• Computation with “power” of large cluster
– computational biology, vision, ML…
• Research on virtualized environment just like EC2, of course
– cloud computing…
![Page 29: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/29.jpg)
Recommendations
• Design for I/O variability
– Do disk writes in bulk in the background
– Be topology aware
– Dynamically redistribute work
• Be aware of the difference between small & large instances
– Overall, better performance/$ for small
– Some advantages for large
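As one illustration of doing disk writes in bulk in the background, the sketch below queues records in memory and lets a worker thread append them to a file in batches, so the main computation never blocks on a slow disk. The class name `BackgroundBulkWriter` and the `batch_size` parameter are assumptions made for this example, not something from the talk.

```python
import queue
import threading


class BackgroundBulkWriter:
    """Buffer records in memory and write them in batches off the main thread."""

    _STOP = object()  # sentinel that tells the worker to flush and exit

    def __init__(self, path, batch_size=100):
        self._path = path
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, record: str) -> None:
        """Enqueue a record; returns immediately without touching the disk."""
        self._queue.put(record)

    def close(self) -> None:
        """Flush everything still queued and stop the writer thread."""
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self):
        batch = []
        with open(self._path, "a", encoding="utf-8") as f:
            while True:
                item = self._queue.get()
                if item is not self._STOP:
                    batch.append(item)
                # Write when the batch is full or when shutting down.
                if item is self._STOP or len(batch) >= self._batch_size:
                    f.writelines(line + "\n" for line in batch)
                    f.flush()
                    batch.clear()
                if item is self._STOP:
                    return
```

A caller would create one writer, call `write()` from the hot path, and call `close()` once at the end of the run. The unbounded queue trades memory for latency; a bounded queue would apply back-pressure instead.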
![Page 30: EC2 Performance v2.Ppt-2](https://reader033.vdocuments.us/reader033/viewer/2022061118/54694ab6af795922598b46ed/html5/thumbnails/30.jpg)
Q&A
• Thanks!