EC-Cache: Load-balanced, Low-latency Cluster Caching with
Online Erasure Coding
Joint work with
Mosharaf Chowdhury, Jack Kosaian (U Michigan), Ion Stoica, Kannan Ramchandran (UC Berkeley)
Rashmi Vinayak
UC Berkeley

Caching for data-intensive clusters
• Data-intensive clusters rely on distributed, in-memory caching for high performance
  - Reading from memory is orders of magnitude faster than from disk/SSD
  - Example: Alluxio (formerly Tachyon†)
†Li et al., SoCC 2014

Imbalances prevalent in clusters
Sources of imbalance:
• Skew in object popularity
• Background network imbalance
• Failures/unavailabilities
Small fraction of objects highly popular
  - Zipf-like distribution
  - Top 5% of objects 7x more popular than bottom 75%†
(Facebook and Microsoft production cluster traces)
†Ananthanarayanan et al., NSDI 2012

Imbalances prevalent in clusters
Sources of imbalance:
• Skew in object popularity
• Background network imbalance
• Failures/unavailabilities
Some parts of the network more congested than others
  - Ratio of maximum to average utilization more than 4.5x with > 50% utilization
(Facebook data-analytics cluster)
  - Similar observations from other production clusters†
†Chowdhury et al., SIGCOMM 2013

Imbalances prevalent in clusters
Sources of imbalance:
• Skew in object popularity
• Background load imbalance
• Failures/unavailabilities
Norm rather than the exception
  - Median > 50 machine unavailability events every day in a cluster of several thousand servers†
(Facebook data analytics cluster)
†Rashmi et al., HotStorage 2013

Imbalances prevalent in clusters
Sources of imbalance:
• Skew in object popularity
• Background network imbalance
• Failures/unavailabilities
➡ Adverse effects:
  - load imbalance
  - high read latency
Single copy in memory often not sufficient to get good performance

Popular approach: Selective Replication
• Uses some memory overhead to cache replicas of objects based on their popularity
  - more replicas for more popular objects
[Figure: Server 1, Server 2, Server 3, …; one cached copy each of objects A and B; GET A contributes 2x load, GET B contributes 1x load]

Popular approach: Selective Replication
• Uses some memory overhead to cache replicas of objects based on their popularity
  - more replicas for more popular objects
[Figure: Server 1, Server 2, Server 3, …; two cached copies of A and one of B; each GET A stream and the GET B stream contribute 1x load]
• Used in data-intensive clusters† as well as widely used in key-value stores for many web services such as Facebook TAO‡
†Ananthanarayanan et al., NSDI 2011, ‡Bronson et al., ATC 2013

[Figure: trade-off chart with axes "Memory Overhead" and "Read performance & Load balance"; points shown: single copy in memory, selective replication, and EC-Cache ("Erasure Coding")]

Quick primer on erasure coding
• Takes in k data units and creates r "parity" units
• Any k of the (k+r) units are sufficient to decode the original k data units
• Example: k = 5, r = 4
  - data units: d1 d2 d3 d4 d5
  - parity units: p1 p2 p3 p4
  - Read k = 5 of the 9 units, then decode to recover d1 d2 d3 d4 d5
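The "any k of (k+r)" property can be illustrated with a toy code. This is a minimal sketch, not the code used in EC-Cache: it assumes a Reed-Solomon-style construction by polynomial evaluation over the small prime field of integers modulo 257, and the names `encode`, `decode`, and `_interpolate_at` are hypothetical.

```python
from itertools import combinations

P = 257  # assumed small prime field; every symbol is an integer in [0, P)

def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, r):
    """Return the k data symbols followed by r parity symbols.
    Data symbol i is the polynomial's value at x = i; parity symbols
    are its values at x = k, ..., k + r - 1."""
    k = len(data)
    points = list(enumerate(data))
    parity = [_interpolate_at(points, x) for x in range(k, k + r)]
    return list(data) + parity

def decode(available, k):
    """Recover the k data symbols from any k units.
    `available` maps unit index -> symbol value."""
    if len(available) < k:
        raise ValueError("need at least k units to decode")
    points = list(available.items())[:k]
    return [_interpolate_at(points, x) for x in range(k)]

# Usage: k = 5 data units, r = 4 parity units, as on the slide.
data = [10, 20, 30, 40, 50]
units = encode(data, r=4)
for idx in combinations(range(len(units)), len(data)):
    # Every choice of 5 out of the 9 units decodes to the original data.
    assert decode({i: units[i] for i in idx}, k=len(data)) == data
```

Production systems work over GF(2^8) on byte arrays with optimized libraries; the prime field here only keeps the arithmetic readable.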

EC-Cache bird's eye view: Writes
• Object split into k data units
• Encoded to generate r parity units
• (k+r) units cached on distinct servers chosen uniformly at random
[Figure: Put X with k = 2, r = 1. X is split into d1 and d2, encoded to produce p1, and d1, d2, p1 are placed on different caching servers]
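The write path above can be sketched as follows. This is an illustration under stated assumptions, not the EC-Cache client: it uses a single XOR parity unit (r = 1, matching the k = 2, r = 1 figure) in place of Reed-Solomon, models each caching server as a plain dictionary, and the names `split`, `xor_parity`, and `put` are hypothetical.

```python
import random

def split(obj, k):
    """Split obj (bytes) into k equal-sized data units, zero-padding the tail."""
    unit_len = -(-len(obj) // k)  # ceiling division
    padded = obj.ljust(unit_len * k, b"\0")
    return [padded[i * unit_len:(i + 1) * unit_len] for i in range(k)]

def xor_parity(units):
    """Single parity unit (r = 1): byte-wise XOR of all data units."""
    parity = bytearray(len(units[0]))
    for unit in units:
        for i, b in enumerate(unit):
            parity[i] ^= b
    return bytes(parity)

def put(obj, k, servers, key, rng=random):
    """Cache k data units and one parity unit on k + 1 distinct servers
    chosen uniformly at random. Returns the chosen server indices,
    ordered by unit index (data units first, parity last)."""
    units = split(obj, k)
    units.append(xor_parity(units))
    chosen = rng.sample(range(len(servers)), len(units))  # distinct servers
    for unit_index, server_index in enumerate(chosen):
        servers[server_index][(key, unit_index)] = units[unit_index]
    return chosen

# Usage: Put X with k = 2, r = 1 on a cluster of five servers.
servers = [dict() for _ in range(5)]
placement = put(b"hello world", 2, servers, "X")
```

Sampling without replacement is what guarantees that no two units of the same object share a server.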

EC-Cache bird's eye view: Reads
• Read from (k + Δ) units of the object chosen uniformly at random
  - "Additional reads"
• Use the first k units that arrive
• Decode the data units
• Combine the decoded units
[Figure: Get X with k = 2, r = 1, Δ = 1, so k + Δ = 3 units are read from the caching servers. d2 and p1 arrive first, are decoded into d1 and d2, and are combined into X]
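The read steps above can be sketched in the same toy setting. The assumptions are loud ones: a single XOR parity unit (r = 1) stands in for Reed-Solomon, arrival order is simulated by a caller-supplied latency function rather than real network reads, and `get`, `xor_bytes`, and `latency_of` are hypothetical names.

```python
import random

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def get(units, k, delta, latency_of, rng=random):
    """Read an object stored as k data units plus one XOR parity unit.

    units:      list of k data units followed by the parity unit
    delta:      number of additional reads (0 or 1 with a single parity)
    latency_of: callable unit_index -> simulated read latency
    Returns (object bytes including any padding, request latency)."""
    # Request k + delta units chosen uniformly at random.
    requested = rng.sample(range(len(units)), k + delta)
    # Late binding: keep whichever k requested units arrive first.
    first_k = sorted(requested, key=latency_of)[:k]
    have = {i: units[i] for i in first_k}
    # Decode: with one parity unit, at most one data unit can be missing,
    # and it equals the XOR of the k units that did arrive.
    missing = [i for i in range(k) if i not in have]
    if missing:
        acc = bytes(len(units[0]))
        for unit in have.values():
            acc = xor_bytes(acc, unit)
        have[missing[0]] = acc
    # Combine the decoded data units in order.
    data = b"".join(have[i] for i in range(k))
    latency = max(latency_of(i) for i in first_k)
    return data, latency

# Usage: the server holding d1 is a straggler, so d2 and p1 arrive first.
d1, d2 = b"ab", b"cd"
units = [d1, d2, xor_bytes(d1, d2)]
simulated = {0: 10.0, 1: 1.0, 2: 1.0}
obj, latency = get(units, k=2, delta=1, latency_of=simulated.__getitem__)
```

In this usage the slow unit is simply never waited for: the request completes at the latency of the second-fastest unit.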

Erasure coding: How does it help?
1. Finer control over memory overhead
  - Selective replication allows only integer control
  - Erasure coding allows fractional control
  - E.g., k = 10 allows control in multiples of 0.1
2. Object splitting helps in load balancing
  - Smaller granularity reads help to smoothly spread load
  - Analysis on a certain simplified model:
Theorem 1. For the setting described above:
$$\frac{\mathrm{Var}(L_{\text{EC-Cache}})}{\mathrm{Var}(L_{\text{Selective Replication}})} = \frac{1}{k}.$$
Proof: Let $w > 0$ denote the popularity of each of the files. The random variable $L_{\text{Selective Replication}}$ is distributed as a Binomial random variable with $F$ trials and success probability $\frac{1}{S}$, scaled by $w$. On the other hand, $L_{\text{EC-Cache}}$ is distributed as a Binomial random variable with $kF$ trials and success probability $\frac{1}{S}$, scaled by $\frac{w}{k}$. Thus we have
$$\frac{\mathrm{Var}(L_{\text{EC-Cache}})}{\mathrm{Var}(L_{\text{Selective Replication}})} = \frac{\left(\frac{w}{k}\right)^{2} (kF)\,\frac{1}{S}\left(1 - \frac{1}{S}\right)}{w^{2} F\,\frac{1}{S}\left(1 - \frac{1}{S}\right)} = \frac{1}{k},$$
thereby proving our claim. □
Intuitively, the splitting action of EC-Cache leads to a smoother load distribution in comparison to selective replication. One can further extend Theorem 1 to accommodate a skew in the popularity of the objects. Such an extension leads to an identical result on the ratio of the variances. Additionally, the fact that each split of an object in EC-Cache is placed on a unique server further helps in evenly distributing the load, leading to even better load balancing.
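The variance ratio in Theorem 1 can be checked numerically. The sketch below assumes the setting implied by the proof, which the excerpt does not spell out: F equally popular files, S servers, and every file (or split) placed on a server chosen independently and uniformly at random. The function names are hypothetical.

```python
import random
from statistics import pvariance

def analytic_variance(trials, p, scale):
    """Variance of scale * Binomial(trials, p)."""
    return scale ** 2 * trials * p * (1 - p)

def simulated_load_variance(F, S, k, w, runs, seed=0):
    """Empirical variance of the load on one fixed server when each of
    F files is cut into k splits, each split carries load w / k, and
    every split lands on a uniformly random server (k = 1 models
    selective replication with a single copy per file)."""
    rng = random.Random(seed)
    loads = []
    for _ in range(runs):
        hits = sum(1 for _ in range(k * F) if rng.randrange(S) == 0)
        loads.append(hits * w / k)
    return pvariance(loads)

# Usage: F = 50 files, S = 10 servers, k = 4 splits, popularity w = 1.
F, S, k, w = 50, 10, 4, 1.0
ratio = analytic_variance(k * F, 1 / S, w / k) / analytic_variance(F, 1 / S, w)
empirical = (simulated_load_variance(F, S, k, w, runs=2000)
             / simulated_load_variance(F, S, 1, w, runs=2000, seed=1))
```

Here `ratio` follows the closed form from the proof, while `empirical` estimates the same quantity by sampling and should land near 1/k up to Monte Carlo noise.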
5.2 Impact on Latency
Next, we focus on how object splitting impacts read latencies. Under selective replication, a read request for an object is served by reading the object from a server. We first consider naive EC-Cache without any additional reads. Under naive EC-Cache, a read request for an object is served by reading k of its splits in parallel from k servers and performing a decoding operation. Let us also assume that the time taken for decoding is negligible compared to the time taken to read the splits.
Intuitively, one may expect that reading splits in parallel from different servers will reduce read latencies due to the parallelism. While this reduction indeed occurs for the average/median latencies, the tail latencies behave in an opposite manner due to the presence of stragglers: one slow split read delays the completion of the entire read request.
In order to obtain a better understanding of the aforementioned phenomenon, let us consider the following simplified model. Consider a parameter $p \in [0, 1]$ and assume that for any request, a server becomes a straggler with probability p, independent of all else. There are two primary contributing factors to the distributions of the latencies under selective replication and EC-Cache:
(a) Proportion of stragglers: Under selective replication, the fraction of requests that hit stragglers is p. On the other hand, under EC-Cache, a read request for an object will face a straggler if any of the k servers from where splits are being read becomes a straggler. Hence, a higher fraction $1 - (1 - p)^k$ of read requests can hit stragglers under naive EC-Cache.
(b) Latency conditioned on absence/presence of stragglers: If a read request does not face stragglers, the time taken for serving a read request is significantly smaller under EC-Cache as compared to selective replication because splits can be read in parallel. On the other hand, in the presence of a straggler in the two scenarios, the time taken for reading under EC-Cache is about as large as that under selective replication.
Putting the aforementioned two factors together, we get that the relatively higher likelihood of a straggler under EC-Cache increases the number of read requests incurring a higher latency. The read requests that do not encounter any straggler incur a lower latency as compared to selective replication. These two factors explain the decrease in the median and mean latencies, and the increase in the tail latencies.
In order to alleviate the impact on tail latencies, we use additional reads and late binding in EC-Cache. Reed-Solomon codes have the property that any k of the collection of all splits of an object suffice to decode the object. We exploit this property by reading more than k splits in parallel, and using the k splits that are read first. It is well known that such additional reads help in mitigating the straggler problem and alleviate the effect on tail latencies [36, 82].
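The straggler fraction in (a) and the effect of additional reads can be worked through numerically. The first case is the formula from the text; extending it to Δ additional reads is an assumption made here under the same independent-straggler model: a request is delayed only if more than Δ of the k + Δ contacted servers straggle. The function name is hypothetical.

```python
from math import comb

def straggler_hit_probability(p, k, delta=0):
    """Probability that a read request is delayed by stragglers when
    k + delta splits are requested and the first k to arrive are used.
    Each server straggles independently with probability p."""
    n = k + delta
    # Probability that at most `delta` of the n servers straggle,
    # in which case k fast splits are still available.
    unaffected = sum(
        comb(n, s) * p ** s * (1 - p) ** (n - s) for s in range(delta + 1)
    )
    return 1 - unaffected

# Usage: p = 0.05 and k = 10.
replication = straggler_hit_probability(0.05, k=1)        # equals p
naive = straggler_hit_probability(0.05, k=10)             # 1 - (1 - p)^k
one_extra = straggler_hit_probability(0.05, k=10, delta=1)
```

With these parameters the naive fraction is roughly 0.40, far above p = 0.05, and a single additional read brings it down to roughly 0.10.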
6 Evaluation
We evaluated EC-Cache through a series of experiments on Amazon EC2 [1] clusters using synthetic workloads and traces from Facebook production clusters. The highlights of the evaluation results are:
• For skewed popularity distributions, EC-Cache improves load balancing over selective replication by 3.3× while using the same amount of memory. EC-Cache also decreases the median latency by 2.64× and the 99.9th percentile latency by 1.79× (§6.2).
• For skewed popularity distributions and in the presence of background load imbalance, EC-Cache decreases the 99.9th percentile latency w.r.t. selective replication by 2.56× while maintaining the same benefits in median latency and load balancing as in the case without background load imbalance (§6.3).
• For skewed popularity distributions and in the presence of server failures, EC-Cache provides a graceful degradation as opposed to the significant degradation in tail latency faced by selective replication. Specifically, EC-Cache decreases the 99.9th percentile latency w.r.t. selective replication by 2.8× (§6.4).
• EC-Cache's improvements over selective replication increase as object sizes increase in production traces;

Erasure coding: How does it help?
3. Object splitting reduces median latency but hurts tail latency
  - Read parallelism helps reduce median latency
  - Straggler effect hurts tail latency (if no additional reads)
4. "Any k out of (k+r)" property helps to reduce tail latency
  - Read from (k + Δ) and use the first k that arrive
  - Δ = 1 often sufficient to rein in tail latency
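Points 3 and 4 can be illustrated with a toy latency simulation. Everything about the model is an assumption for illustration: reading a unit takes time proportional to its size, a straggling server adds a fixed delay, servers straggle independently, and decoding is free. The function name and parameter values are hypothetical.

```python
import random

def simulate(k, delta, p, straggler_delay, trials, seed=0):
    """Return (median, 99th percentile) read latency under a toy model.

    Each of the k + delta parallel reads fetches 1/k of the object and
    takes 1/k time units, plus `straggler_delay` with probability p.
    A request completes when its k fastest reads have finished."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(trials):
        reads = sorted(
            1.0 / k + (straggler_delay if rng.random() < p else 0.0)
            for _ in range(k + delta)
        )
        latencies.append(reads[k - 1])  # arrival time of the k-th unit
    latencies.sort()
    return latencies[trials // 2], latencies[int(0.99 * trials)]

# Usage: whole-object reads (k = 1) versus splitting into k = 10 units,
# without and with one additional read.
whole_object = simulate(k=1, delta=0, p=0.01, straggler_delay=5.0, trials=20000)
naive_split = simulate(k=10, delta=0, p=0.01, straggler_delay=5.0, trials=20000)
extra_read = simulate(k=10, delta=1, p=0.01, straggler_delay=5.0, trials=20000)
```

Under these assumptions splitting lowers the median, the naive split's 99th percentile is dominated by the straggler delay, and one additional read removes that delay from the 99th percentile.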

Design considerations
1. Purpose of erasure codes
Storage systems:
• Space-efficient fault tolerance
EC-Cache:
• Reduce read latency
• Load balance

Design considerations
2. Choice of erasure code
Storage systems:
• Optimize resource usage during reconstruction operations†
• Some codes do not have "any k out of (k+r)" property
EC-Cache:
• No reconstruction operations in caching layer; data persisted in underlying storage
• "Any k out of (k+r)" property helps in load balancing and reducing latency when reading objects
†Rashmi et al., SIGCOMM 2014; Sathiamoorthy et al., VLDB 2013; Huang et al., ATC 2012

Design considerations
3. How do we use erasure coding: across vs. within objects
Storage systems:
• Some systems encode across objects (e.g., HDFS-RAID); some within (e.g., Ceph)
• Does not affect fault tolerance
EC-Cache:
• Need to encode within objects
  - To spread load across both data & parity
• Encoding across: Very high BW overhead for reading object using parities†
†Rashmi et al., SIGCOMM 2014, HotStorage 2013
![Page 56: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,](https://reader034.vdocuments.us/reader034/viewer/2022042621/5f66c594711684327b628279/html5/thumbnails/56.jpg)
Implementation
• EC-Cache on top of Alluxio (formerly Tachyon)
  - Backend caching servers: cache data, unaware of erasure coding
  - EC-Cache client library: handles all read/write logic
• Reed-Solomon code
  - Any k out of (k+r) property
• Intel ISA-L hardware acceleration library
  - Fast encoding and decoding
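The "any k out of (k+r)" property can be illustrated with a small client-side sketch. This is not the EC-Cache implementation: it swaps Reed-Solomon (via ISA-L) for a single XOR parity, which gives the same property only for r = 1, and the function names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from operator import xor


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size splits and append one XOR parity split."""
    split_len = max(1, -(-len(data) // k))  # ceiling division, at least 1 byte
    padded = data.ljust(split_len * k, b"\0")
    splits = [padded[i * split_len:(i + 1) * split_len] for i in range(k)]
    # Parity byte j is the XOR of byte j across all k data splits.
    parity = bytes(reduce(xor, column) for column in zip(*splits))
    return splits + [parity]


def decode(available: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the object from any k of the k + 1 splits (index k is the parity)."""
    if len(available) < k:
        raise ValueError("need at least k of the k + 1 splits")
    data_splits = {i: s for i, s in available.items() if i < k}
    missing = [i for i in range(k) if i not in available]
    if missing:
        # One data split is absent, so the other k - 1 data splits and the
        # parity are present; their XOR restores the lost split.
        data_splits[missing[0]] = bytes(
            reduce(xor, column) for column in zip(*available.values())
        )
    return b"".join(data_splits[i] for i in range(k))[:length]


obj = b"EC-Cache splits each object and reads any k of the k + 1 splits."
k = 10
splits = encode(obj, k)
without_split_3 = {i: s for i, s in enumerate(splits) if i != 3}
assert decode(without_split_3, k, len(obj)) == obj
```

Because the backend servers only store opaque splits, all of this logic can live in the client library, which matches the division of labor listed on the slide.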
![Page 61: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,](https://reader034.vdocuments.us/reader034/viewer/2022042621/5f66c594711684327b628279/html5/thumbnails/61.jpg)
Evaluation set-up
• Amazon EC2
• 25 backend caching servers and 30 client servers
• Object popularity: Zipf distribution with high skew
• EC-Cache uses k = 10, Δ = 1
  - BW overhead = 10%
• Varying object sizes
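The two numeric settings on this slide can be checked with a short sketch. It assumes the bandwidth overhead is the fraction of extra splits read, Δ/k, and it uses a Zipf parameter of 0.9, the value the accompanying paper reports for "high skew". The object count of 1000 and the function names are hypothetical.

```python
def bandwidth_overhead(k: int, delta: int) -> float:
    """Extra data read per object when fetching k + delta splits instead of k."""
    return delta / k


def zipf_popularity(num_objects: int, s: float) -> list[float]:
    """Request probability of each object, ranked from most to least popular."""
    weights = [1.0 / (rank ** s) for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


print(f"BW overhead: {bandwidth_overhead(10, 1):.0%}")  # BW overhead: 10%

popularity = zipf_popularity(1000, 0.9)  # 1000 objects is an assumed catalog size
print(f"Top 10 objects receive {sum(popularity[:10]):.1%} of requests")
```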
![Page 64: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,](https://reader034.vdocuments.us/reader034/viewer/2022042621/5f66c594711684327b628279/html5/thumbnails/64.jpg)
Load balancing
[Figure, two panels: data read (GB) per server, with servers sorted by load. Panels: Selective Replication (λSR = 43.45%) and EC-Cache (λEC = 13.14%).]
• Percent imbalance metric:
e.g., 5.5× at median for 100 MB objects with an upward trend (§6.5).
• EC-Cache outperforms selective replication across a wide range of values of k, r, and Δ (§6.6).
6.1 Methodology
Cluster. Unless otherwise specified, our experiments use 55 c4.8xlarge EC2 instances. 25 of these machines act as the backend servers for EC-Cache, each with 8 GB cache space, and 30 machines generate thousands of read requests to EC-Cache. All machines were in the same Amazon Virtual Private Cloud (VPC) with 10 Gbps enhanced networking enabled; we observed around 4-5 Gbps bandwidth between machines in the VPC using iperf.
As mentioned earlier, we implemented EC-Cache on Alluxio [56], which, in turn, used Amazon S3 [2] as its persistence layer and runs on the 25 backend servers. We used DFS-Perf [5] to generate the workload on the 30 client machines.
Metrics. Our primary metrics for comparison are latency in reading objects and load imbalance across the backend servers.
Given a workload, we consider mean, median, and high-percentile latencies. We measure improvements in latency as:
$$\text{Latency Improvement} = \frac{\text{Latency w/ Compared Scheme}}{\text{Latency w/ EC-Cache}}$$
If the value of this "latency improvement" is greater (or smaller) than one, EC-Cache is better (or worse).
We measure load imbalance using the percent imbalance metric λ defined as follows:
$$\lambda = \left(\frac{L_{\max} - L_{\mathrm{avg}}^{\star}}{L_{\mathrm{avg}}^{\star}}\right) \times 100, \tag{1}$$
where $L_{\max}$ is the load on the server which is maximally loaded and $L_{\mathrm{avg}}^{\star}$ is the load on any server under an oracle scheme, where the total load is equally distributed among all the servers without any overhead. λ measures the percentage of additional load on the maximally loaded server as compared to the ideal average load. Because EC-Cache operates in the bandwidth-limited regime, the load on a server translates to the total amount of data read from that server. Lower values of λ are better. Note that the percent imbalance metric takes into account the additional load introduced by EC-Cache due to additional reads.
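A minimal sketch of Equation (1) follows. The per-server loads and the oracle total are made-up numbers for illustration, and `percent_imbalance` is a hypothetical helper name. The oracle total is passed separately because it excludes any overhead, so it can be smaller than the sum of the measured loads.

```python
def percent_imbalance(server_loads_gb: list[float], oracle_total_gb: float) -> float:
    """Percent of extra load on the busiest server relative to the oracle average."""
    if not server_loads_gb:
        raise ValueError("need at least one server")
    l_max = max(server_loads_gb)
    # Oracle average: overhead-free total load spread evenly over all servers.
    l_avg_star = oracle_total_gb / len(server_loads_gb)
    return 100.0 * (l_max - l_avg_star) / l_avg_star


# Hypothetical example: four servers, 400 GB of overhead-free load in total.
loads = [130.0, 110.0, 100.0, 60.0]
print(percent_imbalance(loads, oracle_total_gb=400.0))  # 30.0
```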
Setup. We consider a Zipf distribution for the popularity of objects, which is common in many real-world object popularity distributions [20, 23, 56]. Specifically, we consider the Zipf parameter to be 0.9 (that is, high skew).
[Figure 8: Read latencies under skewed popularity of objects. Bar chart of read latency (ms) at the mean, median, and 95th, 99th, and 99.9th percentiles for Selective Replication and EC-Cache.]
Unless otherwise specified, we allow both selective replication and EC-Cache to use 15% memory overhead to handle the skew in the popularity of objects. Selective replication uses all the allowed memory overhead for handling popularity skew. Unless otherwise specified, EC-Cache uses k = 10 and Δ = 1. Thus, 10% of the allowed memory overhead is used to provide one parity to each object. The remaining 5% is used for handling popularity skew. Both schemes make use of the skew information to decide how to allocate the allowed memory among different objects in an identical manner: the number of replicas for an object under selective replication and the number of additional parities for an object under EC-Cache are calculated so as to flatten out the popularity skew to the extent possible starting from the most popular object, until the memory budget is exhausted.
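The text does not spell out the allocation procedure, so the following is only one plausible greedy reading of "flatten out the popularity skew", not the authors' algorithm. It assumes equal-size objects, that one extra parity split costs 1/k of an object, and that an object's per-split load is its popularity divided by its number of splits. The name `allocate_extra_parities` and all inputs are hypothetical.

```python
import heapq


def allocate_extra_parities(popularity, k, base_parities, extra_fraction):
    """Greedily give extra parity splits to the objects with the highest per-split load.

    extra_fraction is the memory budget for extra parities as a fraction of
    the total (equal-size) object data, e.g. 0.05 for the remaining 5%.
    """
    # Each parity split costs 1/k of an object, so the budget buys this many splits.
    extra_splits = int(round(extra_fraction * len(popularity) * k))
    parities = [base_parities] * len(popularity)
    # Max-heap (negated keys) on per-split load = popularity / number of splits.
    heap = [(-p / (k + base_parities), i) for i, p in enumerate(popularity)]
    heapq.heapify(heap)
    for _ in range(extra_splits):
        _, i = heapq.heappop(heap)
        parities[i] += 1
        heapq.heappush(heap, (-popularity[i] / (k + parities[i]), i))
    return parities


# 100 objects with Zipf(0.9) popularity; 10% of memory already pays for one
# parity per object, and the remaining 5% is spent on the hottest objects.
popularity = [1.0 / (rank ** 0.9) for rank in range(1, 101)]
parities = allocate_extra_parities(popularity, k=10, base_parities=1, extra_fraction=0.05)
print(parities[:5], parities[-5:])
```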
Moreover, both schemes use a uniform random placement policy to evenly distribute objects (splits in case of EC-Cache) across memory servers.
Unless otherwise specified, the size of each object considered in these experiments is 40 MB. We present results for varying object sizes observed in the Facebook trace in Section 6.5. In Section 6.6, we perform a sensitivity analysis with respect to all the above parameters.
Furthermore, we note that while the evaluations presented here are for the setting of high skew in object popularity, EC-Cache outperforms selective replication in scenarios with low skew in object popularity as well. Under high skew, EC-Cache offers significant benefits in terms of load balancing and read latency. Under low skew, while there is not much to improve in load balancing, EC-Cache will still provide latency benefits.
6.2 Skew Resilience
We begin by evaluating the performance of EC-Cache in the presence of skew in object popularity.
Latency Characteristics. Figure 8 compares the mean, median, and tail latencies of EC-Cache and selective replication. We observe that EC-Cache improves median and mean latencies by 2.64× and 2.52×, respectively. EC-Cache outperforms selective replication at high per-
> 3x reduction in load imbalance metric
![Page 66: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,](https://reader034.vdocuments.us/reader034/viewer/2022042621/5f66c594711684327b628279/html5/thumbnails/66.jpg)
Read latency
• Median: 2.64x improvement
• 99th and 99.9th: ~1.75x improvement
[Figure: read latency (ms) for Selective Replication and EC-Cache; bar labels below]
Read latency (ms)        Mean   Median   95th   99th   99.9th
Selective Replication    242    238      283    340    881
EC-Cache                 96     90       134    193    492
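The improvement factors on this slide can be reproduced from the bar labels. The sketch below takes the improvement as the latency of the compared scheme divided by the latency of EC-Cache; the dictionary and function names are illustrative only.

```python
# Read latencies in ms, taken from the bar labels on the slide.
selective_replication_ms = {"mean": 242, "median": 238, "95th": 283, "99th": 340, "99.9th": 881}
ec_cache_ms = {"mean": 96, "median": 90, "95th": 134, "99th": 193, "99.9th": 492}


def latency_improvement(compared_ms: float, ec_ms: float) -> float:
    """Ratio greater than one means EC-Cache is faster than the compared scheme."""
    return compared_ms / ec_ms


improvements = {
    name: latency_improvement(selective_replication_ms[name], ec_cache_ms[name])
    for name in selective_replication_ms
}
for name, ratio in improvements.items():
    print(f"{name}: {ratio:.2f}x")
# mean: 2.52x, median: 2.64x, 95th: 2.11x, 99th: 1.76x, 99.9th: 1.79x
```

The 99th and 99.9th percentile ratios (1.76x and 1.79x) are what the slide rounds to "~1.75x".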
![Page 67: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,](https://reader034.vdocuments.us/reader034/viewer/2022042621/5f66c594711684327b628279/html5/thumbnails/67.jpg)
Varying object sizes
More improvement for larger object sizes
[Figure, two panels: read latency (ms) vs. object size (MB) for EC-Cache and Selective Replication. Panels: median latency and tail (99th percentile) latency.]
• Median latency: 5.5x improvement for 100 MB
• Tail latency (99th): 3.85x improvement for 100 MB
![Page 69: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,](https://reader034.vdocuments.us/reader034/viewer/2022042621/5f66c594711684327b628279/html5/thumbnails/69.jpg)
Role of additional reads (Δ)
[Figure: CDF of read latency (ms). Curves: EC-Cache, Δ=0; EC-Cache, Δ=1; Selective Replication.]
Significant degradation in tail latency without additional reads (i.e., Δ = 0)
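Why one additional read helps the tail can be seen in a toy simulation. It assumes a read of k + Δ splits completes once the fastest k have arrived, and it uses an invented latency model (exponential with a 5 ms mean, plus a 50 ms penalty for 2% of splits) that is not taken from the evaluation.

```python
import random


def completion_time(split_latencies_ms, k):
    """An object read finishes when the k fastest splits have arrived."""
    return sorted(split_latencies_ms)[k - 1]


def percentile(values, q):
    """Simple nearest-rank style percentile for 0 <= q < 1."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def simulate(k=10, trials=10000, seed=0):
    rng = random.Random(seed)
    no_extra, one_extra = [], []
    for _ in range(trials):
        # Hypothetical per-split latency: mostly fast, occasionally a straggler.
        latencies = [
            rng.expovariate(1 / 5.0) + (50.0 if rng.random() < 0.02 else 0.0)
            for _ in range(k + 1)
        ]
        no_extra.append(completion_time(latencies[:k], k))  # delta = 0: wait for all k
        one_extra.append(completion_time(latencies, k))     # delta = 1: any k of k + 1
    return no_extra, one_extra


no_extra, one_extra = simulate()
for q in (0.5, 0.99):
    print(f"p{q * 100:g}: delta=0 {percentile(no_extra, q):.1f} ms, "
          f"delta=1 {percentile(one_extra, q):.1f} ms")
```

With Δ = 0 a single straggling split delays the whole object, while with Δ = 1 the slowest of the k + 1 splits can be ignored.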
![Page 70: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,](https://reader034.vdocuments.us/reader034/viewer/2022042621/5f66c594711684327b628279/html5/thumbnails/70.jpg)
Additional evaluations in the paper
• With background network imbalance
• With server failures
• Write performance
• Sensitivity analysis for all parameters
![Page 73: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,](https://reader034.vdocuments.us/reader034/viewer/2022042621/5f66c594711684327b628279/html5/thumbnails/73.jpg)
Summary
• EC-Cache
  - Cluster cache employing erasure coding for load balancing and reducing read latencies
  - Demonstrates new application and new goals for which erasure coding is highly effective
• Implementation on Alluxio
• Evaluation
  - Load balancing: > 3x improvement
  - Median latency: > 5x improvement
  - Tail latency: > 3x improvement
Thanks!