measuring cdn performance and why you're doing it wrong
TRANSCRIPT
Measuring CDN Performance
Hooman Beheshti
VP Technology
Why this matters • Performance is one of the main reasons we use a CDN
• Measurement often used during evaluation phase to compare CDNs – Most of what we’ll talk about is in this context
• Seems easy, but isn’t • Heavily vendor-influenced – “Ok Google: define irony!”
Goals
• What the measurement landscape looks like
• Share measurement experiences
• Help guide you toward a good testing plan if/when you decide to do this
Background
Delivery: static/cached objects
[Diagram: Client → CDN Node → Origin]
Delivery: dynamic/uncached objects
[Diagram: Client → CDN Node → Origin]
What we’ll be focusing on • Only on delivery and not all the other features CDNs provide
• How we measure • Metrics to measure • What to measure • Some gotchas, misconceptions, and common mistakes
Measurement Techniques
(how we measure)
Measurement techniques • Pretend Users – Synthetic tests – Not actual users
• Real Users – In the browser – Actual users
Synthetic testing
Synthetic testing
• Usually a large network of test nodes all over the globe
• Highly scalable, can do lots of tests at once • Many vendors have this model – Examples: Catchpoint, Dynatrace (Gomez), Keynote, Pingdom, etc.
Synthetic testing • Built to do full performance and availability testing
– Lots of “monitors” – emulating what real users do – DNS, Traceroute, Ping, Streaming, Mobile – HTTP
• Object • Browser • Transactions/Flows
• Tests set up with some frequency to repeatedly test things
– Aggregates reported
Backbone nodes • Test machines sitting in datacenters all around the globe • Really good at:
– Availability and reachability – Scale – Backend problems – Global reach
• Often terrible indicators of raw performance – No latency – Infinite bandwidth
Last mile nodes • Test machines sitting behind a real home-like internet connection
• Much better at reporting what you can expect from users, but sometimes unreliable
• Also not as dense in deployment
[Chart: backbone vs. last mile test results]
Real users (RUM)
RUM
• Use javascript to collect timing metrics
• Can collect lots of things through browser APIs – Page metrics, asset metrics, user-defined metrics
Use test assets
• Use this model to initiate tests in the browser • Some vendors: – Cedexis, TurboBytes, CloudHarmony, more… – Usually, this isn’t their business, but the data drives their main business objectives
• You can build this yourself too
Use real assets in the page • Collect timings from actual objects – Resource timing
• Vendors – SOASTA, New Relic, most synthetic vendors – Boomerang (open source) – Google Analytics User Timings
DATA, DATA, DATA
• For either RUM technique, we need A LOT of data
• Too much variance – Most vendors don’t use averages – Medians, percentiles, and histograms
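Why medians and percentiles beat averages can be sketched in a few lines of code. The TTFB samples below are hypothetical, not from any vendor:

```javascript
// Sketch of why RUM vendors avoid averages: a few slow outliers drag the
// mean far above what the typical user actually experienced.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}
const median = (samples) => percentile(samples, 50);
const mean = (samples) => samples.reduce((sum, x) => sum + x, 0) / samples.length;

// 95 ordinary page loads (80-174 ms) plus 5 pathological ones
const ttfbMs = Array.from({ length: 95 }, (_, i) => 80 + i)
  .concat([4000, 5000, 6000, 7000, 8000]);

console.log(mean(ttfbMs).toFixed(0)); // inflated by the outliers
console.log(median(ttfbMs));          // what the typical user saw
console.log(percentile(ttfbMs, 99));  // the tail, which a mean hides
```

With enough data, histograms of these percentiles are what make two CDNs comparable at all.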
Measurement Metrics
Client ↔ Server, building up an object fetch:
DNS → TCP (1 x RTT) → (TLS) → HTTP (TTFB) → HTTP (Download)

Timeline: DNS | TCP | (TLS) | TTFB | Download (TTLB-TTFB) → Time

• DNS: RTT to DNS server, DNS iterations, DNS caching and TTLs
• TCP: RTT to cache server (CDN footprint & routing algorithms)
• (TLS): RTT to cache server (or RTTs depending on TLS False Start), efficiency of TLS engine
• TTFB: RTT to where the object is stored + storage efficiency (different for requests to origin); lower bound = network RTT
• Download (TTLB-TTFB): Bandwidth, congestion avoidance algorithms (and RTT!)
Core object metrics
• Not every request experiences every metric: – DNS: once per domain – TCP/TLS setup once per connection – TTFB/Download for every object (not already in browser cache)
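These core metrics map directly onto Resource Timing fields. A minimal sketch; the entry here is a mocked plain object with made-up timings, since real entries come from `window.performance.getEntries()` in a browser:

```javascript
// Derive the core object metrics from a Resource Timing-style entry.
// Field names follow the W3C Resource Timing spec; values are hypothetical.
function coreMetrics(e) {
  return {
    dns: e.domainLookupEnd - e.domainLookupStart, // 0 when DNS was cached
    tcp: e.connectEnd - e.connectStart,           // 0 on a reused connection
    tls: e.secureConnectionStart > 0              // TLS sits inside the connect span
      ? e.connectEnd - e.secureConnectionStart
      : 0,
    ttfb: e.responseStart - e.requestStart,
    download: e.responseEnd - e.responseStart,    // TTLB - TTFB
  };
}

const entry = { // hypothetical timings in milliseconds
  domainLookupStart: 0, domainLookupEnd: 12,
  connectStart: 12, secureConnectionStart: 27, connectEnd: 57,
  requestStart: 57, responseStart: 89, responseEnd: 140,
};
console.log(coreMetrics(entry));
// { dns: 12, tcp: 45, tls: 30, ttfb: 32, download: 51 }
```

Note how dns and tcp legitimately come out as 0 for most objects on a page, which is exactly the point of the slide above.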
Resource timing
http://www.w3.org/TR/resource-timing/
window.performance.getEntries()
Mistakes we make
(when evaluating)
CDN X
vs CDN Y
“I’ll pick an image from my home page, use backbone synthetic tests from all over the world and pick the CDN that has the fastest average time”
“let’s test an asset via RUM on a million page views a day and pick the fastest CDN”
“let’s run webpagetest on both CDNs and go with whichever has a faster page load time”
~$ time curl -v http://…
we measure the wrong thing
Web application: objects • Your application should determine what you test: – Objects served from the edge – Objects served from origin (through CDN)
• If HTML is from origin (through CDN), we must measure it – Essential to critical page metrics
Web application: object sizes
• On any page
– DNS queries only happen a small number of times
– 6 TCP connections per domain
– 1 TLS setup per connection
– Many, many, many HTTP fetches
• Core metrics
– TTFB
– Download (TTLB-TTFB) if there are important large objects
– Should have a good idea of DNS/TCP/TLS, but less critical
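A back-of-the-envelope model makes the weighting obvious. All counts and per-event costs below are illustrative assumptions, not measurements:

```javascript
// Rough model: DNS is paid once per domain, TCP/TLS once per connection,
// but TTFB + download are paid for every fetched object, so they dominate.
// All numbers are illustrative assumptions.
const page = { objects: 48, connections: 6, dnsLookups: 1 };
const costMs = { dns: 20, tcp: 30, tls: 60, ttfb: 40, download: 25 };

const totalMs = {
  dns: page.dnsLookups * costMs.dns,                    // once per domain
  setup: page.connections * (costMs.tcp + costMs.tls),  // once per connection
  http: page.objects * (costMs.ttfb + costMs.download), // every object
};
console.log(totalMs); // { dns: 20, setup: 540, http: 3120 }
```

Under these assumptions the HTTP fetches outweigh all setup costs by roughly 5x, which is why TTFB and download deserve most of the attention.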
Web application • If CDN only for static/cacheable objects: – One or two representative assets – TTFB and maybe download most important
[Diagram: Client → CDN Node, response header X-Cache: HIT]
Web application • If CDN also for whole site (HTML going through CDN) – Sample of key HTML pages, delivered from origin – TTFB will show efficiency of routing (and connection management) to origin
– TTLB will show efficiency of delivery
[Diagram: Client → CDN Node → Web Server]
[Diagram: Client → CDN Node → CDN Node → Web Server]
we measure the wrong way
Backbone Nodes
(For true performance measurements)
[Histogram: TCP Connect Time (backbone nodes); x-axis: msec, y-axis: % of tests]
object metrics or
page metrics
Test profile: Download: 15 Mbps, Upload: 5 Mbps, Latency: 10 ms and 25 ms
[Charts: onload, Speed Index, and Start Render compared at 10 msec vs. 25 msec latency]
What the…??? • We always assume “all things equal” • Too many factors affect page load time
– 3rd parties (sometimes varying), content from origin, layout, JS execution, etc.
• Too much variance
Source: httparchive.org
To be clear… • Always use webpagetest (or something like it) to understand your application’s performance profile
• Continue to monitor application performance, and always spot check
• Be extremely careful when using it to compare CDN performance; it can mislead you – If using RUM to measure page metrics, with lots of data, things become a little more meaningful (data volume handles variance)
we overgeneralize and
draw the wrong conclusions
Cache hit ratios
Cache hit ratio: traditional calculation
1 - (Requests to Origin / Total Requests)
[Diagram build: a client request travels through the Cache to the Origin over TCP and HTTP. With multiple cache nodes, a HOT node already has the object while a COLD node does not; a request that misses a COLD node but is satisfied within the CDN never reaches the origin, so the traditional calculation still counts it as a cache “hit”.]
Isn’t this better?
Hits / (Hits + Misses) @edge

Cache hit ratio
1 - (Requests to Origin / Total Requests) → Offload
Hits / (Hits + Misses) @edge → Performance
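The difference between the two formulas shows up as soon as there is a cache tier between the edge and the origin. A sketch with hypothetical request counts:

```javascript
// Traditional hit ratio vs. edge hit ratio for a tiered CDN.
// A request that misses at the edge but is absorbed by an inner CDN tier
// never reaches the origin, so the traditional formula counts it as a hit.
// All request counts are hypothetical.
const traditionalRatio = (total, toOrigin) => 1 - toOrigin / total;
const edgeHitRatio = (hits, misses) => hits / (hits + misses);

const total = 1000;
const edgeHits = 800; // served straight from the edge cache
const toOrigin = 50;  // true misses that went all the way back
// the remaining 150 requests missed at the edge but hit an inner tier

console.log(traditionalRatio(total, toOrigin));        // 0.95, great offload
console.log(edgeHitRatio(edgeHits, total - edgeHits)); // 0.8, performance reality
```

Same traffic, two very different numbers: one describes origin offload, the other describes what users actually feel at the edge.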
Effect on long tail content
(long tail: cacheable but seldom fetched)
Popular Medium Tail (1hr) Long tail (6hr)

          Connect (median)   Wait (median)
Popular   14 msec            19 msec
1hr Tail  15 msec            26 msec
6hr Tail  16 msec            32 msec

(6,400+, 77,000+, and 38,000+ measurements across the three groups)
Isn’t this better?
[Charts: timing distributions for Popular, Medium Tail (1hr), and Long tail (6hr) content]
After all that….
How much of this really matters?
(when trying to choose between multiple CDNs)
The bigger picture
• It’s really easy to lock in on a metric
• Performance absolutely matters
• True performance isn’t always as easy to measure
We must ask questions …
What’s the storage model and how does it affect long tail content?
What should I expect with cache hit ratios for offload and performance?
Footprint?
(is what I’m testing the same as what I’m buying?)
HTTP vs TLS footprint?
Can I serve stale content if necessary?
(stale-while-revalidate & stale-if-error)
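Those two directives come from RFC 5861 and ride along in Cache-Control. A small sketch of reading them out of a header; the header value itself is just an example:

```javascript
// Parse a Cache-Control header that allows serving stale content
// (stale-while-revalidate / stale-if-error, RFC 5861).
// The header value below is an example, not a recommendation.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [key, value] = part.trim().split('=');
    directives[key.toLowerCase()] = value === undefined ? true : Number(value);
  }
  return directives;
}

const cc = parseCacheControl(
  'max-age=300, stale-while-revalidate=60, stale-if-error=86400'
);
console.log(cc['stale-while-revalidate']); // serve stale up to 60s while refetching
console.log(cc['stale-if-error']);         // serve stale up to a day if origin errors
```

Whether a CDN honors these directives (or offers an equivalent) is exactly the kind of question worth asking during evaluation.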
What if I can cache something I didn’t think I could?
Key takeaways • Everything is application-dependent
– Evaluate how your application works and what impacts performance the most
• Don’t get locked into a single number/metric
• Always know your application performance and bottlenecks
• Be mindful of the bigger picture
• Don’t stop measuring!