locus - usenix.org · 3 serverless analytics … user function + data i.e., map() … f() results...
TRANSCRIPT
![Page 1: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/1.jpg)
Qifan Pu Shivaram Venkataraman Ion Stoica
LocusShuffling Fast and Slow
on Serverless Architecture
1
![Page 2: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/2.jpg)
Serverless Computing
2
…
![Page 3: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/3.jpg)
3
Serverless Analytics
…
User function
+
Data
i.e., Map()
…
F()
Results
Launch short-lived cloud workers with transparent elasticity and fine-grain usage billing
![Page 4: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/4.jpg)
4
Serverless AnalyticsSo far, a great fit for embarrassingly parallel analytics
ExCamera (NSDI’17): video encoding
PyWren (SoCC’17): scaling python functions
NumPyWren: large-block matrix computation
Stanford gg compiler: distributed compiling
AWS Redshift Spectrum: ETL
![Page 5: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/5.jpg)
5
General Serverless Analtyics
To enable general analytics, one needs to implement shuffle.
- Key operation for join() and groupby()
- 70+% of TPC-DS queries use shuffle
- Most expensive operation in production clusters
… … …Query Results
Shuffle Shuffle
How to perform cost-efficient shuffle on a serverless architecture?
![Page 6: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/6.jpg)
An Shuffle Example: Cloud Sort
6
Lowest cost to sort 100TB of data• Record: Apache Spark, 50min / $144
…
…
…
PartitionTask
PartitionTask
…
PartitionTask
mergetask
mergetask
mergetask
……
1…100
100…201
10000…10100
…
![Page 7: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/7.jpg)
Traditional AnalyticsData communicated directly between servers that execute tasks.
mappertask
mappertask
mappertask
reducertask
reducertask
Server
Server
Server
7
![Page 8: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/8.jpg)
Traditional AnalyticsData communicated directly between servers that execute tasks.
reducertask
reducertask
Server
Server
Server
8
![Page 9: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/9.jpg)
How to shuffle in serverless?
Tasks are short-lived
mappertask
mappertask
mappertask
reducertask
reducertask
Server
Server
Server
9
![Page 10: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/10.jpg)
Tasks are short-lived
reducertask
reducertask
Server
Server
Server
10
How to shuffle in serverless?
Shuffling directly between tasks is difficult in serverless.
![Page 11: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/11.jpg)
How about using S3?- Many frameworks already use it for input/results.- Cheap capacity and elastic bandwidth
11
How to shuffle in serverless?
0
40
80
120
0 1000 2000 3000 4000
Agg
rega
te B
W (G
B/s
)
Number of Lambda Workers
write
read
![Page 12: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/12.jpg)
Shuffle with S3
mappertask
mappertask
mappertask
reducertask
reducertask
Server
Server
Server
Compute
S3
12
How to shuffle in serverless?
![Page 13: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/13.jpg)
Shuffle with S3
reducertask
reducertask
Server
Server
Server
Compute 13
How to shuffle in serverless?
S3How much does it cost for CloudSort?
![Page 14: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/14.jpg)
Shuffle with S3
mappertask
mappertask
mappertask
reducertask
reducertask
Server
Server
Server
Compute
S3
14
How to shuffle in serverless?
$$$
How many requests does it take?
![Page 15: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/15.jpg)
An Shuffle Example: Cloud Sort
15
PartitionTask …
PartitionTask …
PartitionTask …
…
mergetask
mergetask
mergetask
…
• Total number of transfers:- Assuming 1GB serverless containers- Num partition/merge tasks = 100TB/1GB = 10#- Number of files: 10$% = 10 billion files
…
![Page 16: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/16.jpg)
An Shuffle Example: Cloud Sort
16
PartitionTask …
PartitionTask …
PartitionTask …
…
mergetask
mergetask
mergetask
…
• Total number of transfers:- Assuming 1GB serverless containers- Num partition/merge tasks = 100TB/1GB = 10#- Number of files: 10$% = 10 billion files- $0.000005/op -> $50k
…
![Page 17: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/17.jpg)
An Shuffle Example: Cloud Sort
17
PartitionTask …
PartitionTask …
PartitionTask …
…
mergetask
mergetask
mergetask
……
0
4000
8000
0 100 200 300
IOPS
Number of Lambda Workers
Bottlenecked at 4400 ops/sec!
Takes !"#$
%%"" seconds = 26 days
![Page 18: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/18.jpg)
An Shuffle Example: Cloud Sort
18
PartitionTask …
PartitionTask …
PartitionTask …
…
mergetask
mergetask
mergetask
……
0
4000
8000
0 100 200 300
IOPS
Number of Lambda Workers
Bottlenecked at 4400 ops/sec!
Takes !"#$
%%"" seconds = 26 daysS3 lacks cheap and elastic IOPS.
AWS s3 ElastiCache (Redis)Azure BlobAzure CacheGoogle Cloud StorageGoogle MemoryStore…
![Page 19: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/19.jpg)
An Shuffle Example: Cloud Sort
PartitionTask …
PartitionTask …
PartitionTask …
…
mergetask
mergetask
mergetask
……
Require 158 large (r5.24xlarge) instances!Costs $1.6K/hour.
Capacity alone is 10x more expansive than record.
…
![Page 20: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/20.jpg)
An Shuffle Example: Cloud Sort
PartitionTask …
PartitionTask …
PartitionTask …
…
mergetask
mergetask
mergetask
……
Require 158 large (r5.24xlarge) instances!Costs $1.6K/hour.
Capacity alone is 10x more expansive than record.
…
Redis capacity is too expensive for large intermediate data.
![Page 21: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/21.jpg)
Cloud Storage
Service $/Mo/GB $/million writes
AWS S3 0.023 5
GCS 0.026 5
Azure Blob 0.023 6.25
ElastiCache 7.9 0
MemoryStore 16.5 0
Azure Cache 11.6 0
… … …
Capacity IOPS
Slow Storage
Fast Storage
High Low
Low High
Can we leverage the strengths of two storages types?
![Page 22: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/22.jpg)
Locus
22
• Hybrid slow and fast cloud storage to achieve cost-efficient shuffle performance – Intuition: use fast storage to absorb IOPS and create bigger
chunks for slow storage.
![Page 23: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/23.jpg)
Hybrid sort (100TB)
Round1:
Round2:
Round20:
…
Clean cache after each round
…
finalmerge
5TB 5TB 5TB
num reqs = num mergers = 10#partition merge
• Total number of S3 reqs: 20×10# ('(. 10*+)• Total memory cache needed: 5TB (vs. 100TB)
1…100
100…201
10000…10100…5TB 5TB 5TBpartition merge
5TB 5TB 5TBpartition merge
![Page 24: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/24.jpg)
Hybrid sort (100TB)
Round1:
Round2:
Round20:
…
Clean cache after each round
…
5TB 5TB 5TB
num reqs = num mergers = 10#partition merge
• Spark record: 50min + $144• Locus: 50min + $163
1…100
100…201
10000…10100…5TB 5TB 5TBpartition merge
5TB 5TB 5TBpartition merge
finalmerge
![Page 25: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/25.jpg)
Hybrid sort (100TB)
Round1:
Round2:
Round20:
…
Clean cache after each round
…
5TB 5TB 5TB
num reqs = num mergers = 10#partition merge
• Spark record: 50min + $144• Locus: 50min + $163
1…100
100…201
10000…10100…5TB 5TB 5TBpartition merge
5TB 5TB 5TBpartition merge
Locus achieves same performance with 5X less resource.
finalmerge
![Page 26: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/26.jpg)
Locus
26
• Hybrid slow and fast cloud storage to achieve cost-efficient shuffle performance – Intuition: use fast storage to absorb IOPS and create bigger
chunks for slow storage.
• When to use hybrid shuffle?– Alongside, many other configurations
– Amount of fast cache– Level of parallelism– Worker size
![Page 27: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/27.jpg)
Locus Performance Model
27
Navigating cost-performance with different compute and storage configurations is difficult.
![Page 28: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/28.jpg)
28
Navigating cost-performance with different compute and storage configurations is difficult.– How many serverless workers? Which size?
1
10
100
1000
10000
0 0.5 1 1.5 2
Shuf
fle T
ime
(s)
Worker Memory Size (GB)
20GB
Locus Performance Model
![Page 29: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/29.jpg)
29
Navigating cost-performance with different compute and storage configurations is difficult.– How many serverless workers? Which size?
1
10
100
1000
10000
0 0.5 1 1.5 2
Shuf
fle T
ime
(s)
Worker Memory Size (GB)
20GB
1TB
Locus Performance Model
![Page 30: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/30.jpg)
30
Navigating cost-performance with different compute and storage configurations is difficult.– How many serverless workers? Which size?
1
10
100
1000
10000
0 0.5 1 1.5 2
Shuf
fle T
ime
(s)
Worker Memory Size (GB)
20GB200GB1TB
Locus Performance Model
![Page 31: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/31.jpg)
Performance model example: slow-storage only shuffle
31
! – IOPSS – shuffle sizeW – worker size
"2 = $%&%×(
" = max "1, "2
mappertask
mappertask
mappertask
Server
Server
Server
…
S – shuffle sizeB – worker BWP – parallelism
"1 = /0 × 1
![Page 32: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/32.jpg)
Evaluation
32
• Implement Locus on top of PyWren– AWS Lambda– S3 (slow) – Redis (fast)
• Workloads:– CloudSort– TPC-DS– Big data benchmark
• Baseline:– PyWren– Apache Spark on VMs– Redshift
![Page 33: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/33.jpg)
Evaluation
33
Reduce resource usage by up to 59% while achieving comparable performance!
Q1 Q16 Q94 Q95
clus
tert
ime
(cor
e·se
c)
Locus
Spark-58%
-59%
Q1 Q16 Q94 Q95av
erag
e qu
ery
time
(s)
Locus
Spark
TPC-DS
![Page 34: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/34.jpg)
Related Works
34
Pocket (OSDI’18)• New elastic store that scales to application demand
![Page 35: Locus - usenix.org · 3 Serverless Analytics … User function + Data i.e., Map() … F() Results Launch short-lived cloud workers with transparent elasticity and fine-grain usage](https://reader034.vdocuments.us/reader034/viewer/2022050717/5e14657173391f03852f8c2f/html5/thumbnails/35.jpg)
Locus
35
• By judiciously combining slow and fast cloud storage, one can achieve cost-efficient shuffle for serverless analytics.
• Locus achieves this by:– Design algorithms that leverage storage characteristics– A performance model that captures the performance and cost
metrics of shuffle operations.
• 4×-500× performance improvements over baseline and reduce resource usage by up to 59% while achieving comparable performance with traditional analytics