WebPerf: Evaluating “What-If” Scenarios for Cloud-hosted Web Applications
Yurong Jiang, Lenin Ravindranath, Suman Nath, Ramesh Govindan
A Cloud-Hosted Web Application
[Diagram: a front-end backed by a blob store, a relational store, and a cache, plus a CDN, workers, a queue, external services (Facebook, Google), a search store, a log store, and a notification service, all hosted in the cloud]
Modern Cloud Applications are Complex
The Problem
Latency of these applications is critical for user experience
Developers find it hard to optimize the cloud-side latency of cloud-hosted Web applications
Configuration Complexity: Front-end
Each choice impacts latency
Configuration Complexity: Relational Store (Azure SQL)
Latency implications hard to understand
Exploring Configuration Choices
What if I move the blob store from the basic to the standard tier? (cost: $30 → $100; latency: 200 ms → 100 ms)
Challenge
[Plot: per-request latency (ms) under 100 concurrent requests vs. price (USD/month)]
Answer to what-if question may depend on workload
Challenge
[Dependency graph: Start → Store Insert, Cache Insert → End, with per-operation latencies (100 ms, 20 ms, 100 ms, 10 ms) that vary by workload]
Answer to what-if question may depend on causal dependencies
Challenge
What if I re-locate this component (e.g., the relational store)?
What if I increase this component's load?
What if a replica (e.g., of the table store) fails?
A what-if capability should be expressive
WebPerf
WebPerf is a what-if scenario evaluator
❖ Input: a what-if scenario
❖ Output: the resulting cloud-side latency distribution
WebPerf: What if I upgrade blob storage from the basic to the standard tier?
[Dependency graph: Start → Frontend Processing → Blob Insert, Store Insert, Cache Insert → End]
Pipeline: dependency graph extraction → component profiling (computed offline) → baseline latency estimation → cloud-side latency estimation (developer supplies the workload)
Key Insights
Why might WebPerf work?
Cloud deployments are well-engineered
❖ Components are designed for predictable latency
❖ Components are often co-located in the same datacenter
Key Insights
Many component profiles are application-independent → compute offline, reuse
The dependency graph is usually independent of the what-if scenario → compute once
Automate most steps → cheap, fast, low effort
WebPerf Approach
What if I upgrade blob storage from the basic to the standard tier?
Pipeline: dependency graph extraction → component profiling → baseline latency estimation → cloud-side latency estimation
[Dependency graph: Start → Frontend Processing → Blob Insert, Store Insert, Cache Insert → End]
Dependency Extraction
Goal: fast, accurate dependency extraction with zero developer input
Approach: track dependencies at run time by instrumenting the binary
Challenge: Task Asynchronous Programming
❖ Many cloud apps use this
❖ It is the only mechanism for asynchronous I/O in Azure
❖ AWS provides TAP APIs for .NET
Prior work has not considered this
Task Asynchronous Programming

async processRequest (input)
{
    /* process input */
    task1 = store.get(key1);   // start task
    value1 = await task1;      // continue after task finishes
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}

(runs on the front-end)
Sequential awaits yield a chain: Start → task1 → task2 → End
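The sequential-await pattern above can be mirrored with Python's asyncio; this is a sketch with invented store_get/cache_get stubs and sleep-based latencies, not WebPerf code:

```python
import asyncio
import time

async def store_get(key):      # stand-in for store.get
    await asyncio.sleep(0.10)  # pretend the store call takes 100 ms
    return "store-value"

async def cache_get(key):      # stand-in for cache.get
    await asyncio.sleep(0.02)  # pretend the cache call takes 20 ms
    return "cache-value"

async def process_request():
    # Sequential awaits: cache_get does not start until store_get finishes,
    # so the dependency graph is the chain Start -> task1 -> task2 -> End.
    value1 = await store_get("key1")
    value2 = await cache_get("key2")
    return (value1, value2)

start = time.perf_counter()
result = asyncio.run(process_request())
elapsed = time.perf_counter() - start
print(result)          # ('store-value', 'cache-value')
print(elapsed > 0.11)  # True: the two latencies add up along the chain
```

Because the second task is created only after the first await completes, the graph extractor sees a chain, not a fork.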
Asynchronous Parallel Operations

async processRequest (input)
{
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value1, value2 = await Task.WhenAll(task1, task2);
    /* construct response */
    return response;
}

WhenAll: continue only when all tasks finish
[Dependency graph: Start → task1, task2 in parallel → End]
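Task.WhenAll's join semantics correspond to asyncio.gather in Python; a sketch under the same stubbed latencies (op is an invented helper):

```python
import asyncio

async def op(name, delay):
    # Stand-in for an I/O call with a fixed latency.
    await asyncio.sleep(delay)
    return name

async def process_request():
    # Both tasks start immediately; gather() continues only when ALL finish,
    # so the critical path is max(0.10, 0.02), not their sum.
    task1 = asyncio.create_task(op("store", 0.10))
    task2 = asyncio.create_task(op("cache", 0.02))
    return await asyncio.gather(task1, task2)

values = asyncio.run(process_request())
print(values)  # ['store', 'cache']
```

gather preserves argument order in its result, which is why the faster cache result still appears second.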
Asynchronous Parallel Operations

async processRequest (input)
{
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value = await Task.WhenAny(task1, task2);
    /* construct response */
    return response;
}

WhenAny: continue when any one task finishes
[Dependency graph: Start → task1 or task2, whichever finishes first → End]
Key Idea
The .NET compiler rewrites each async method into a state machine (Init → received value1 → received value2). WebPerf instruments the state-machine binary to dynamically track tasks and continuations.
Component Profiling
A component's profile contains latency distributions of API calls to that component
WebPerf profiles commonly used components offline and stores the profiles in a profile dictionary
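One way to picture such a profile dictionary is a map from (component, API, tier) to latency samples; this is a toy sketch, the API names and latency numbers are invented, and real profiles hold full measured distributions:

```python
import statistics

# Toy profile dictionary: latency samples (ms) per (component, API, tier).
profile_dict = {
    ("blob", "UploadFromStream", "basic"):    [210, 190, 205, 198, 230],
    ("blob", "UploadFromStream", "standard"): [105, 98, 110, 101, 95],
    ("redis", "StringGet", "standard"):       [2, 3, 2, 4, 2],
}

def profile_summary(component, api, tier):
    """Return (median, worst observed) of the stored latency samples."""
    samples = profile_dict[(component, api, tier)]
    return statistics.median(samples), max(samples)

basic_med, _ = profile_summary("blob", "UploadFromStream", "basic")
std_med, _ = profile_summary("blob", "UploadFromStream", "standard")
print(basic_med, std_med)   # 205 101
print(basic_med > std_med)  # the standard tier is faster in this toy data
```

Answering a tier-upgrade what-if then amounts to swapping which entry's distribution is plugged into the dependency graph.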
Generalized Profiles
Profiles are generalized along several dimensions: tier, location, load, and failure (e.g., a relational store at different locations and loads, a table store under replica failure)
Application-Dependent Profiles
❖ Relational store (Azure SQL): SQL join latency depends on table size
❖ Cache (Redis): cache latency depends on hit rate
❖ Sharded table store: access latency depends on key skew
Not all profiles can be computed offline
Workload Hints
WebPerf uses parameterized profiles
❖ The user must specify a workload hint: table size for Azure SQL, hit rate for the Redis cache, skew for the sharded table store
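A parameterized profile can be sketched as a function of the workload hint; here a toy Redis-style cache profile parameterized by hit rate (the function name and all latency numbers are invented for illustration):

```python
def cache_latency_profile(hit_rate, hit_ms=2.0, miss_ms=40.0):
    """Expected cache-lookup latency (ms) as a function of the hit rate.

    A miss is modeled as a cache probe plus a backing-store fetch; the
    profile is parameterized by the workload hint `hit_rate` in [0, 1].
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

print(cache_latency_profile(1.0))  # 2.0  (all hits)
print(cache_latency_profile(0.0))  # 40.0 (all misses)
print(cache_latency_profile(0.9))  # 5.8
```

Without the hint the profiler would have to assume some hit rate, which is exactly why hints improve accuracy in the evaluation.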
Cloud-Side Latency Estimation
[Dependency graph: Start → Frontend Processing → Blob Insert, Store Insert, Cache Insert → End]
What if I upgrade blob storage from the basic to the standard tier? Replace the blob-insert latency distribution with the one from the standard-tier profile.
Cloud-Side Latency Estimation
Simple operations on distributions suffice: Max for parallel joins (WhenAll), Min for races (WhenAny)
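These distribution operations can be sketched by sampling; the names prob_max/prob_min and the latency samples below are illustrative stand-ins, not WebPerf's implementation:

```python
import random

random.seed(0)

def sample(dist, n=10000):
    # Draw n samples from an empirical latency distribution (ms values).
    return [random.choice(dist) for _ in range(n)]

def prob_max(a, b):
    # WhenAll: the join finishes when the SLOWER branch finishes.
    return [max(x, y) for x, y in zip(a, b)]

def prob_min(a, b):
    # WhenAny: the race finishes when the FASTER branch finishes.
    return [min(x, y) for x, y in zip(a, b)]

store = sample([100, 110, 120])  # toy store-insert latencies (ms)
cache = sample([20, 25, 30])     # toy cache-insert latencies (ms)

when_all = prob_max(store, cache)
when_any = prob_min(store, cache)
print(min(when_all) >= 100)  # True: WhenAll is bounded below by the slow branch
print(max(when_any) <= 30)   # True: WhenAny is bounded above by the fast branch
```

Working element-wise over paired samples keeps the whole prediction a distribution rather than a point estimate.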
Evaluation
WebPerf is accurate, fast, cheap, and requires low developer effort
❖ How accurate is WebPerf?
❖ Are workload hints necessary?
Applications
Application      | Azure components used                                 | Average I/O calls
SocialForum      | Blob storage, Redis cache, Service bus, Search, Table | 116
SmartStore.Net   | SQL                                                   | 41
ContosoAds       | Blob storage, Queue, SQL, Search                      | 56
EmailSubscriber  | Blob storage, Queue, Table                            | 26
ContactManager   | Blob storage, SQL                                     | 8
CourseManager    | Blob storage, SQL                                     | 44

Different functionality, different components, varying complexity
What-if Scenarios
What-if scenario                                                            | Example
Tier: a component X is upgraded to tier Y                                   | X = a Redis cache, Y = a standard tier (from a basic tier)
Load: X concurrent requests to component Y                                  | X = 100, Y = the application or a SQL database
Interference: CPU and/or memory pressure of X% from collocated applications | X = 50% CPU, 80% memory
Location: a component X is deployed at location Y                           | X = a Redis cache or a front end, Y = Singapore
Failure: an instance of a replicated component X fails                      | X = a replicated front-end or SQL database
Accuracy
What if I move the Redis cache in SocialForum from the basic to the standard tier?
❖ Configuration choices can significantly impact latency
❖ The prediction closely matches the ground truth
❖ The median prediction error (the difference between the predicted distribution and the ground truth) is under 7%
❖ Low median error for tier and replication scenarios; slightly higher error for load and failure
Workload Hints
Workload hints can significantly improve accuracy
Conclusions
WebPerf predicts cloud-side latency distributions for different what-if scenarios
It accurately tracks dependencies and profiles components offline
Across six different applications and scenarios, its error is less than 7%
WebPerf Contributions and Summary
❖ An automated tool that instruments web apps and captures both browser-object and front-end cloud-processing dependencies
❖ Predicts web-app cloud latency and end-to-end latency in a probabilistic setting under six different scenarios
❖ Evaluations with six real websites show WebPerf achieves < 7% median prediction error
Thank you
Large number of configuration choices
[Diagram: front-end, blob store, relational store (Azure SQL), cache (Redis Cache), CDN, workers, queue, Facebook, Google, search store, log store, notification service]
Reasoning about the cost-performance trade-off is hard!
Cost-Performance Trade-off
● Configuration does not directly map to performance
● End-to-end latency depends on the application's causal dependencies
[Dependency graph: Start → Store Insert, Cache Insert → End, with per-operation latencies of 100 ms, 20 ms, 100 ms, 10 ms]
“What-If” Analysis
What if I move the front-end from the basic to the standard tier? (cost: $30 → $100; end-to-end latency estimate: 900 ms → 600 ms)
Create a new deployment and measure performance:
➢ Expensive
➢ Time-consuming
➢ High overhead
WebPerf: “What-If” Analysis
What if I move the front-end from the basic to the standard tier? (cost: $30 → $100; end-to-end latency estimate: 900 ms → 600 ms)
Predict performance under hypothetical configurations, instead of deploying with each configuration:
➢ Zero cost
➢ Near real-time
➢ Zero developer effort
WebPerf: Key Insights
● Offline, application-independent profiling is useful
  ● Modern cloud apps are built from existing services (PaaS)
  ● Individual services have predictable performance (S3, Azure Table Storage, DynamoDB, DocumentDB, …)
  ● Services are co-located inside the same datacenter, yielding tighter latency distributions
● The causal dependencies within an application are independent of the what-if scenarios we consider
Application-Independent Profiling
[Latency-distribution plots for common APIs (T: Table, R: Redis, S: SQL, B: Blob, Q: Queue): Delete(Async) (T), UploadFromStream (B), AddMessage (Q), Execute (T), ExecuteQuerySegmented (T), SortedSetRangeByValue (R), StringGet (R), SaveChanges (S), ToList (S), Send (R), ReadAsString (B)]
WebPerf Design
● Dependency graph extraction
● Application-independent profiling
● Baseline latency estimation
● Latency prediction
[Dependency graph: Start → Frontend Processing → Blob Insert, Store Insert, Cache Insert → End]
Task Asynchronous Pattern (TAP)

async processRequest (input)
{
    /* process input */
    task1 = store.get(key1);   // start task asynchronously
    value1 = await task1;      // continue after task finishes
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}

[Thread diagram: store.get and cache.get run on worker threads; each continuation resumes on a possibly different thread]
Dependency Graph Extraction
● Design goals
  ● Accurate
  ● Real-time, with minimal data collection
  ● Zero developer effort
  ● No modifications to the platform
  ● Low overhead
● Automatic binary instrumentation
  ● Modern cloud applications are highly asynchronous
  ● Task Asynchronous Programming pattern
Task Asynchronous Pattern (TAP)
● Asynchronous operations with a synchronous programming pattern
● Increasingly popular for writing cloud applications
  ● Supported by many major languages: C#, Java, Python, JavaScript
● Most Azure services support TAP as the only mechanism for asynchronous I/O
  ● AWS also provides TAP APIs for .NET
Synchronous Programming

processRequest (input)
{
    /* process input */
    value1 = store.get(key1);   // blocks the thread
    value2 = cache.get(key2);   // blocks the thread
    /* construct response */
    return response;
}

Blocking I/O limits server throughput
Asynchronous Programming Model (APM)

processRequest (input)
{
    /* process input */
    store.get(key1, callback1);
}

callback1 (value1)
{
    cache.get(key2, callback2);
}

callback2 (value2)
{
    /* construct response */
    send(response);
}

[Thread diagram: store.get and cache.get run asynchronously; callback1 and callback2 execute on different threads]
Task Asynchronous Pattern (TAP)

async processRequest (input)
{
    /* process input */
    task1 = store.get(key1);
    value1 = await task1;
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}

Dependency graph: Start → task1 → task2 → End
Task Asynchronous Pattern (TAP)

async processRequest (input)
{
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value1 = await task1;
    value2 = await task2;
    /* construct response */
    return response;
}

Dependency graph: Start → task1, task2 in parallel → End
Task Asynchronous Pattern (TAP)

async processRequest (input)
{
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value1, value2 = await Task.WhenAll(task1, task2);
    /* construct response */
    return response;
}

WhenAll: continue only when all tasks finish
Dependency graph: Start → task1, task2 → End
Task Asynchronous Pattern (TAP)

async processRequest (input)
{
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value = await Task.WhenAny(task1, task2);
    /* construct response */
    return response;
}

WhenAny: continue after any one of the tasks finishes
Dependency graph: Start → task1 or task2, whichever finishes first → End
Automatic Binary Instrumentation

The compiler rewrites the async method into a state machine (states -1 → 0 → 1, linked by continuations); WebPerf instruments the state machine to track tasks and continuations:

class processRequest__
{
    string input;
    AsyncTaskMethodBuilder builder;
    string key1, key2, response;
    int asyncId = -1;

    public void MoveNext()
    {
        asyncId = Tracker.AsyncStart(asyncId);
        Tracker.StateStart(asyncId);
        switch (state)
        {
            case -1:
                state = 0;
                /* process input */
                var task1 = store.get(key1);
                Tracker.TaskStart(task1, asyncId);
                builder.Completed(task1.Awaiter, this);
                Tracker.Await(task1, asyncId);
            case 0:
                state = 1;
                var task2 = cache.get(key2);
                Tracker.TaskStart(task2, asyncId);
                builder.Completed(task2.Awaiter, this);
                Tracker.Await(task2, asyncId);
            case 1:
                /* construct response */
                builder.SetResult(response);
                Tracker.StateEnd(asyncId);
                Tracker.AsyncEnd(asyncId);
                return;
        }
        Tracker.StateEnd(asyncId);
    }
}
Automatic Binary Instrumentation
For the same method, the tracked state machine (states -1 → 0 → 1) yields the dependency graph Start → task1 → task2 → End
Automatic Binary Instrumentation
● Tracking async state machines
  ● Monitor task start and completion
  ● Track state-machine transitions
● Tracking pull-based continuations
  ● Link tasks to corresponding awaits
  ● Link awaits to continuations
● Tracking synchronization points
  ● Track WhenAll, WhenAny, and cascaded task dependencies
● Keeping the overhead low
  ● Instrument APIs with known signatures
  ● Instrument only leaf tasks
Dependency Graph Extraction
● Highly accurate
● Real-time
● Zero developer effort
● Extremely low overhead
[Dependency graph: Start → task1, task2 → End]
API Profiling
● A profile of a cloud API is a distribution of its latency
● [Decision flow: Is the API in the what-if scenario? If no, use the baseline profile measured during dependency tracking. If yes and a workload hint is given, use a parameterized, workload-dependent profile (e.g., SQL); otherwise, use an application-independent profile (e.g., Redis), computed offline or on demand for reuse]
API Profiling
● WebPerf builds profiles offline and maintains them in a dictionary
● It starts with common profiles, builds additional profiles on demand, and reuses them
● Optimal profiling minimizes measurement costs (details in paper)
[Dependency graph with baseline latencies: Start → task1, task2 → Sync → task3 → task4, task5 → End]
What-If Engine
● Predicts cloud latency under a given what-if scenario
● Inputs: the instrumented app, a workload, the what-if question (e.g., what if task1 and task4 are upgraded?), and the profile dictionary; the engine then convolves distributions over the dependency graph
Convolving Distributions
[Dependency graph with per-task latency distributions T1…T5: Start → task1, task2 → Sync → task3 → task4, task5 → End]
Bottom-up evaluation:
• WhenAll: ProbMax(t1, t2, …)
• WhenAny: ProbMin(t1, t2, …)
• WhenDone: ProbAdd(t1, t2, …)
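The bottom-up evaluation can be sketched with sampled distributions; prob_add/prob_max/prob_min and the toy latencies below are illustrative stand-ins, not WebPerf's code:

```python
import random

random.seed(1)
N = 20000

def draws(lo, hi):
    # Toy latency distribution: uniform samples in [lo, hi] ms.
    return [random.uniform(lo, hi) for _ in range(N)]

def prob_add(*ds):  # WhenDone: sequential stages, latencies add
    return [sum(xs) for xs in zip(*ds)]

def prob_max(*ds):  # WhenAll: wait for the slowest branch
    return [max(xs) for xs in zip(*ds)]

def prob_min(*ds):  # WhenAny: take the fastest branch
    return [min(xs) for xs in zip(*ds)]

t1, t2 = draws(90, 110), draws(15, 25)  # parallel branch latencies
t3 = draws(5, 10)                       # stage after the sync point
t4, t5 = draws(30, 50), draws(60, 80)   # racing branches

# Start -> (t1 || t2) -> sync -> t3 -> (t4 race t5) -> End
end_to_end = prob_add(prob_max(t1, t2), t3, prob_min(t4, t5))
mean = sum(end_to_end) / N
print(140 < mean < 155)  # True: roughly 100 + 7.5 + 40 ms
```

Evaluating the graph bottom-up this way yields the full end-to-end latency distribution, not just its mean.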
End-to-end Latency Prediction
● Te2e = TCloud + TNetwork + TBrowser
● TCloud from WebPerf, TNetwork from a network latency model, TBrowser from WebProphet
● Combine the three distributions using Monte-Carlo simulation
● Details in paper
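That Monte-Carlo combination can be sketched as independent sampling from the three latency distributions (toy sample values; real WebPerf combines measured empirical distributions):

```python
import random

random.seed(2)

cloud = [100, 120, 150, 110]  # toy TCloud samples (ms)
network = [40, 60, 50]        # toy TNetwork samples (ms)
browser = [200, 220, 210]     # toy TBrowser samples (ms)

def monte_carlo_e2e(n=10000):
    # Te2e = TCloud + TNetwork + TBrowser, sampling each term independently.
    return [random.choice(cloud) + random.choice(network) + random.choice(browser)
            for _ in range(n)]

e2e = monte_carlo_e2e()
print(min(e2e) >= 100 + 40 + 200)  # True: lower bound of the sum
print(max(e2e) <= 150 + 60 + 220)  # True: upper bound of the sum
```

Sampling sidesteps computing the convolution of three arbitrary distributions in closed form.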
WebPerf Evaluation
● Six third-party applications and six scenarios

Application      | Azure services used                                   | Average I/O calls
SocialForum      | Blob storage, Redis cache, Service bus, Search, Table | 116
SmartStore.Net   | SQL                                                   | 41
ContosoAds       | Blob storage, Queue, SQL, Search                      | 56
EmailSubscriber  | Blob storage, Queue, Table                            | 26
ContactManager   | Blob storage, SQL                                     | 8
CourseManager    | Blob storage, SQL                                     | 44
[Plot: CDF of first-byte-time percentage]
WebPerf Evaluation
● Six third-party applications and six scenarios

What-if scenario                                                            | Example
Tier: a resource X is upgraded to tier Y                                    | X = a Redis cache, Y = a standard tier (from a basic tier)
Load: X concurrent requests to resource Y                                   | X = 100, Y = the application or a SQL database
Interference: CPU and/or memory pressure of X% from collocated applications | X = 50% CPU, 80% memory
Location: a resource X is deployed at location Y                            | X = a Redis cache or a front end, Y = Singapore
Failure: an instance of a replicated resource X fails                       | X = a replicated front-end or SQL database
WebPerf Evaluation
● Metric: the distribution of relative errors between WebPerf's prediction and the ground truth from a real deployment
Underspecified Configuration Dimensions
What if the Redis cache is upgraded from the original Standard C0 to the Standard C2 tier?
The maximum cloud-side latency prediction error is only 5%
Performance for Six Applications
[Per-scenario error plots for SocialForum, ContosoAds, EmailSubscriber, CourseManager, SmartStore.Net, and ContactManager]
The median prediction error for cloud-side latency is < 7%
Workload Hints
Workload hints can bring an order-of-magnitude accuracy improvement
Other findings
● The performance of many applications and scenarios can be predicted reasonably well, thanks to cloud providers' SLAs
● Harder cases:
  ● Workload-dependent performance: hints help
  ● High-variance profiles: the prediction has high variance
  ● Non-deterministic control flow (e.g., cache hit/miss): make a separate prediction for each control flow
  ● Hard-to-profile APIs (e.g., a SQL query with a join): poor prediction
Modern Cloud Applications are Complex
Behind apps sit several distributed, asynchronous components
Hard to reason about the performance (cloud-side latency) of cloud-hosted Web apps
Hard to reason about the cost/performance trade-offs of different configurations
Workload Dependence
Deploy: What if I move the front-end from the basic to the standard tier? (Basic: 900 ms, Standard: 600 ms)
➢ Expensive
➢ Slow
➢ High effort
Model and Predict
What if I move the front-end from the basic to the standard tier?
[Predicted latency under each service configuration, e.g., 200 ms under the current tier vs. 100 ms under the standard tier]
Combining Latency Distributions
[Latency distributions for task1 and task2]
WhenAll → take the Max of the two distributions
WhenAny → take the Min of the two distributions
Cloud-Side Latency Estimation
[Dependency graph: Start → Frontend Processing → Blob Insert, Store Insert, Cache Insert → End]
What if I move the front-end from the basic to the standard tier? Replace the front-end's latency distribution with the one from the standard-tier profile.
WebPerf Approach
What if I move the front-end from the basic to the standard tier?
Pipeline: dependency graph extraction → component profiling (computed offline) → baseline latency estimation → cloud-side latency estimation (developer supplies the workload)
[Dependency graph: Start → Frontend Processing → Blob Insert, Store Insert, Cache Insert → End]
Evaluation Goals
● How accurate is WebPerf?
● What is WebPerf's overhead?
● What are the primary sources of prediction error?
● Are workload hints necessary?
● Can WebPerf predict end-to-end latency?
Task Asynchronous Programming
[Thread diagram for the sequential-await example: store.get and cache.get start tasks asynchronously; each continuation runs after its await returns, on the front-end]
Measure of Prediction Accuracy
The distributional difference between WebPerf's prediction and the ground truth from a real deployment