Ares: a High Performance and Fault-tolerant Distributed Stream Processing System · 2019-08-12
TRANSCRIPT
Services Computing Technology and System Lab
Cluster and Grid Computing Lab
Ares: a High Performance and Fault-tolerant Distributed Stream Processing System
Changfu [email protected]
Joint work with JingJing Zhan, Hanhua Chen, Jie Tan & Hai Jin {zjj, chen, tjmaster, hjin}@hust.edu.cn
Cluster and Grid Computing Lab
Services Computing Technology and System Lab
School of Computer Science and Technology
Huazhong University of Science and Technology, Wuhan, 430074, China
http://grid.hust.edu.cn/
01
Real-time Stream Processing
Cluster and Grid Computing Lab · Services Computing Technology and System Lab · Ares
Use Cases: E-commerce Recommendation, Anomaly Detection
Ecosystem
02
Real-time Stream Processing
Use Cases: E-commerce Recommendation, Anomaly Detection
Requirements:
Low Latency: extract value from data streams in real time
High Availability: failures are unavoidable for long-running applications
03
Low Latency vs. High Availability
[Figure: a network topology of three racks (Rack 1–3, each behind Switch 1–3) and an application topology of tasks 1–8 grouped into operators O1–O4]
04
Low Latency vs. High Availability
[Figure: the same network and application topologies; co-locating tasks 1 and 5 on Node A shortens the 1→5 latency]
Low Latency:
Prior work (DEBS’13, CIKM’14, ICDCS’14, Middleware’15, INFOCOM’16) proposes elaborate task allocation schemes that co-locate upstream and downstream task pairs.
05
Low Latency vs. High Availability
[Figure: the same topologies; tasks 1 and 5 co-located on Node A for low latency, with the recovery behavior after a failure]
The elaborate co-location schemes (DEBS’13, CIKM’14, ICDCS’14, Middleware’15, INFOCOM’16) hurt recovery: after a failure, a downstream task must wait for its upstream task to recover first.
[Figure: co-located pairs (1,5) and (5,8) on Rack 1; a rack failure triggers cascaded waiting along 1→5→8]
06
Challenge: Exploit Task Dependency
[Figure: the same topologies; placing tasks 1 and 5 on different racks (Rack A, Rack B) eases recovery but raises latency, while co-locating them on Node A lowers latency but prolongs recovery]
Can we achieve the best trade-off between low latency and high availability by exploiting task dependency?
07
Ares’s Stream Latency Model
Idea: Divide the application topology into multiple source-sink paths
[Figure: the application topology, tasks 1–8 in operators O1–O4]
08
Ares’s Stream Latency Model
Idea: Divide the application topology into multiple source-sink paths
Sources and sinks are marked on the topology; # source-sink paths = 2 × 3 + 2 × 3 = 12
[Figure: the application topology with its source-sink paths highlighted]
09
Ares’s Stream Latency Model
Idea: Divide the application topology into multiple source-sink paths
Example: latency(1→5→8) = 0.1 + 0.4 + 0.5 + (0.3 + 0.2) = 1.5
[Figure: tasks 1, 5, 8 placed on Nodes A, B, C, with a table of per-task processing times and per-hop transferring times]
# source-sink paths = 2 × 3 + 2 × 3 = 12
Average latency of an allocation: (1/|P|) · Σ_{p∈P} latency(p)
[Figure: the application topology, tasks 1–8 in operators O1–O4]
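The path-based latency model above can be sketched in a few lines of Python. The small DAG used to exercise path enumeration is an assumption for illustration; only the 1→5→8 processing and transfer numbers come from the slide's example.

```python
def source_sink_paths(graph, sources, sinks):
    """Enumerate every source-to-sink path in the application DAG."""
    paths = []

    def dfs(node, path):
        if node in sinks:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            dfs(nxt, path + [nxt])

    for s in sources:
        dfs(s, [s])
    return paths

def path_latency(path, proc, xfer):
    """latency(p): per-task processing times plus per-hop transfer times."""
    hops = zip(path, path[1:])
    return sum(proc[t] for t in path) + sum(xfer[(a, b)] for a, b in hops)

def avg_latency(paths, proc, xfer):
    """(1/|P|) * sum over all source-sink paths p of latency(p)."""
    return sum(path_latency(p, proc, xfer) for p in paths) / len(paths)

# Slide's worked example: tasks 1, 5, 8 on Nodes A, B, C.
proc = {1: 0.1, 5: 0.4, 8: 0.5}            # processing time per task
xfer = {(1, 5): 0.3, (5, 8): 0.2}          # transfer time per hop
print(path_latency([1, 5, 8], proc, xfer))  # 1.5

# Hypothetical two-source DAG, just to exercise path enumeration.
graph = {1: [5, 6], 2: [5, 6], 5: [8], 6: [8]}
print(len(source_sink_paths(graph, [1, 2], {8})))  # 4
```

Averaging `path_latency` over all enumerated paths gives the (1/|P|) · Σ latency(p) objective from the slide.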
10
Ares’s Stream Recovery Model
[Figure: three racks, each behind its own switch]
Idea: Exploit the dependency between upstream and downstream tasks under rack failures
Task dependency is a main challenge for failure recovery [StreamScope, NSDI’16].
11
Ares’s Stream Recovery Model
[Figure: three racks, each behind its own switch]
Idea: Exploit the dependency between upstream and downstream tasks for the rack failure
Example: a failure of rack 3 affects the dependent task pairs (1,5), (1,6), (3,5), (5,8):
recovery(rack 3) = c · (w_15 + w_16 + w_35 + w_58), where e.g. w_58 = 4/12 ≈ 0.33
[Figure: the application topology, tasks 1–8 in operators O1–O4]
The recovery time is proportional to the sum of weights of task pairs
Average recovery cost over all racks: (1/|R|) · Σ_{r∈R} recovery(r)
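A minimal sketch of the recovery model, assuming `rack_of` maps each task to the rack hosting it and `weights` holds the normalized weight of each dependent task pair. The placement and most weights below are illustrative; only w_58 = 4/12 is taken from the slide.

```python
def recovery(rack, rack_of, weights, c=1.0):
    """Recovery cost of one rack failure: c times the total weight of
    upstream/downstream task pairs with an endpoint on the failed rack."""
    return c * sum(w for (i, j), w in weights.items()
                   if rack_of[i] == rack or rack_of[j] == rack)

def avg_recovery(racks, rack_of, weights, c=1.0):
    """(1/|R|) * sum over all racks r of recovery(r)."""
    return sum(recovery(r, rack_of, weights, c) for r in racks) / len(racks)

# Hypothetical placement: tasks 5 and 6 sit on rack 3.
rack_of = {1: "rack1", 3: "rack1", 5: "rack3", 6: "rack3", 8: "rack2"}
weights = {(1, 5): 2 / 12, (1, 6): 1 / 12, (3, 5): 2 / 12, (5, 8): 4 / 12}
print(recovery("rack3", rack_of, weights))  # 0.75 = (2+1+2+4)/12
```

As the slide states, the recovery time grows with the summed weights of the task pairs the failed rack touches, so spreading heavy pairs across racks lowers this cost.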
12
Fault Tolerant Scheduler (FTS) Problem
Definition: Given an application topology G(V, E), a task set T, a node set N, a rack set R, and a network topology ψ, find a task allocation π that minimizes the allocation cost u(π).
u(π) = W · (1/|P|) Σ_{p∈P} latency(p) + (1 − W) · (1/|R|) Σ_{r∈R} recovery(r)
μ(π) = α Σ_{t∈T} λ_t(π_t) + β Σ_{e_ij∈E} λ_ij · d(π_i, π_j) + γ Σ_{e_ij∈E, ψ(π_i)=ψ(π_j)} w_ij
(the three terms are the processing cost, the transferring cost, and the recovering cost)
Generalization
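The generalized cost μ(π) can be sketched as follows. All concrete names and numbers are assumptions for illustration; only the α/β/γ decomposition into processing, transferring, and recovering cost comes from the slide.

```python
def allocation_cost(tasks, edges, pi, rack_of, lam_t, lam_e, dist, w,
                    alpha=1.0, beta=1.0, gamma=1.0):
    """mu(pi) = alpha * processing + beta * transferring + gamma * recovering."""
    processing = sum(lam_t[t] for t in tasks)
    transferring = sum(lam_e[(i, j)] * dist[(pi[i], pi[j])] for i, j in edges)
    # Pairs on the same rack fail together under a rack failure, so their
    # weight counts toward the recovering cost.
    recovering = sum(w[(i, j)] for i, j in edges
                     if rack_of[pi[i]] == rack_of[pi[j]])
    return alpha * processing + beta * transferring + gamma * recovering

tasks = [1, 5, 8]
edges = [(1, 5), (5, 8)]
pi = {1: "A", 5: "A", 8: "B"}                   # task -> node (the allocation)
rack_of = {"A": "rack1", "B": "rack2"}          # node -> rack (the psi map)
lam_t = {1: 0.1, 5: 0.4, 8: 0.5}                # processing load per task
lam_e = {(1, 5): 1.0, (5, 8): 1.0}              # traffic per edge
dist = {("A", "A"): 0, ("A", "B"): 1, ("B", "A"): 1, ("B", "B"): 0}
w = {(1, 5): 0.5, (5, 8): 0.5}                  # dependent-pair weights
print(allocation_cost(tasks, edges, pi, rack_of, lam_t, lam_e, dist, w))  # 2.5
```

Note the tension the model captures: co-locating tasks 1 and 5 zeroes their transfer term but adds their pair weight to the recovering term.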
13
The Nirvana algorithm
Think like a player: when we can't solve a problem from a holistic perspective, why not solve it from an individual perspective?
Observation: the allocation decision of a task only depends on the decisions of its neighbor tasks
Game Theory ↔ FTS Game:
Players ↔ the task set T
Strategies ↔ the node set N
Cost function ↔ the individual allocation cost θ_t(π)
μ(π) = α Σ_{t∈T} λ_t(π_t) + β Σ_{e_ij∈E} λ_ij · d(π_i, π_j) + γ Σ_{e_ij∈E, ψ(π_i)=ψ(π_j)} w_ij
θ_t(π) = α λ_t(π_t) + (β/2) Σ_{i∈neighbor(t)} λ_it · d(π_i, π_t) + (γ/2) Σ_{i∈neighbor(t), ψ(π_i)=ψ(π_t)} w_it
μ(π) = Σ_{t∈T} θ_t(π)
14
The Nirvana algorithm
Think like a player: when we can't solve a problem from a holistic perspective, why not solve it from an individual perspective?
Observation: the allocation decision of a task only depends on the decisions of its neighbor tasks
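The Nirvana idea — each task repeatedly best-responds to its neighbors' current placements until no task can lower its individual cost — can be sketched like this. The toy cost function θ (node load plus a penalty when a task is separated from its neighbor) is an assumption for illustration, not the paper's exact θ_t.

```python
def best_response_dynamics(tasks, nodes, theta, placement, max_rounds=100):
    """Let each task in turn move to the node minimizing its individual
    cost given everyone else's current placement; stop at a fixed point
    (a Nash equilibrium of the FTS game)."""
    for _ in range(max_rounds):
        changed = False
        for t in tasks:
            best = min(nodes, key=lambda n: theta(t, n, placement))
            if theta(t, best, placement) < theta(t, placement[t], placement):
                placement[t] = best
                changed = True
        if not changed:  # no player wants to deviate: equilibrium reached
            break
    return placement

# Toy instance: two dependent tasks, two nodes.
node_load = {"A": 0.1, "B": 0.3}
neighbor = {"u": "v", "v": "u"}

def theta(t, n, placement):
    """Hypothetical individual cost: load of the chosen node, plus a
    transfer penalty if the neighbor sits on a different node."""
    separated = placement[neighbor[t]] != n
    return node_load[n] + (0.5 if separated else 0.0)

result = best_response_dynamics(["u", "v"], ["A", "B"], theta,
                                {"u": "A", "v": "B"})
print(result)  # both tasks co-locate: {'u': 'B', 'v': 'B'}
```

In this toy run, task u first moves next to v because the 0.5 separation penalty dominates the load difference; the next full round changes nothing, so the dynamics stop at an equilibrium.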
15
Theoretical Analysis Results
Theorem 1: There exists a Nash equilibrium for the FTS game.
Theorem 2: Our design achieves a 2-approximation ratio for the FTS problem.
Theorem 3: The number of rounds to converge to the Nash equilibrium of the FTS game is upper-bounded by h(X+Y+Z).
Theorem 4: The computation complexity of our design is O(2ξ|N|(|T| + 2|E|)), where ξ denotes the number of rounds to converge to the Nash equilibrium of the FTS game.
16
The Ares Architecture
[Figure: the Ares architecture. The Master Node hosts a Custom Scheduler with a Nirvana Agent for Nirvana-based control, backed by a Database and a Coordinator; each Worker Node runs a Supervisor, a Load Monitor, and worker processes hosting Executors. The Nirvana Agent pushes the scheduling solution down to the distributed stream processing system.]
17
Evaluation
Setup: a 30-node cluster, implemented on top of Storm; each node is a 16-core Intel Xeon server with 64 GB DDR3 memory and a 1 Gbps Ethernet interface; baseline: R-Storm [Middleware’15]
Applications: Word Count (following the setting of Heron [SIGMOD’15]); Join (on the TPC-H dataset)
Metrics: Throughput, Average Tuple Processing Time, Average Rack Recovery Time
18
Is Processing Latency Better?
Average Reduction of 50.2%
19
Is Recovery Time Better?
Average Reduction of 48.9%
20
Is Throughput Better?
Average Improvement of 2.24×
21
Summary
The fault-tolerant scheduling (FTS) problem
  The stream latency model
  The stream recovery model
The Nirvana algorithm
  Based on best-response dynamics
Implementation of Ares on top of Storm
Thank you! Any questions?