TRANSCRIPT
![Page 1: Distributed Machine Learning: An Intro. - UESTCdm.uestc.edu.cn/wp-content/uploads/seminar/Distributed ML.pdf · Distributed Machine Learning Frameworks 36 MapReduce ‒Synchronous](https://reader035.vdocuments.us/reader035/viewer/2022062505/5ecd7d5a860b3d1d3b6328f8/html5/thumbnails/1.jpg)
Chen Huang
Distributed Machine Learning: An Intro.
Feature Engineering Group,
Data Mining Lab,
Big Data Research Center, UESTC
Contents
‒ Background
‒ Some Examples
‒ Model Parallelism & Data Parallelism
‒ Parallelization Mechanisms
  ‒ Synchronous / Asynchronous / …
‒ Parallelization Frameworks
  ‒ MPI / AllReduce / MapReduce / Parameter Server / GraphLab / Spark GraphX / …
Background
Why Distributed ML?
• Big Data Problem
  • Efficient algorithms and online learning / data streams are feasible answers.
  • But what about high dimensionality?
  • Distributed machine learning: the more, the merrier.
Background
Why Distributed ML?
• Big Data
  • Efficient algorithms
  • Online learning / data streams
  • Distributed machines
• Big Model
  • Model splitting
  • Distributed models
Background
Distributed Machine Learning
• Big model over big data
Background
Overview
Distributed Machine Learning
‒ Motivation
  ‒ Big model over big data
‒ DML
  ‒ Multiple workers cooperate with each other via communication
‒ Targets
  ‒ Get the job done (convergence, …)
  ‒ Minimize communication cost (I/O, …)
  ‒ Maximize effect (time, performance, …)
Example
K-means
Example
Distributed K-means: how?
Example
Spark K-means
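The map/reduce pattern behind Spark K-means can be sketched in plain Python (a single-process stand-in for RDD partitions; hypothetical helper names, not MLlib's actual API):

```python
def closest(point, centroids):
    # Index of the nearest centroid under squared Euclidean distance.
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))

def kmeans_step(partitions, centroids):
    """One distributed K-means iteration over pre-partitioned data."""
    # Map: each partition independently accumulates, per centroid,
    # the sum of its assigned points and their count.
    partials = []
    for part in partitions:
        acc = {}
        for point in part:
            j = closest(point, centroids)
            s, n = acc.get(j, ((0.0,) * len(point), 0))
            acc[j] = (tuple(a + b for a, b in zip(s, point)), n + 1)
        partials.append(acc)
    # Reduce: merge the per-partition sums, then recompute each centroid.
    merged = {}
    for acc in partials:
        for j, (s, n) in acc.items():
            ms, mn = merged.get(j, ((0.0,) * len(s), 0))
            merged[j] = (tuple(a + b for a, b in zip(ms, s)), mn + n)
    new_centroids = list(centroids)
    for j, (s, n) in merged.items():
        new_centroids[j] = tuple(x / n for x in s)
    return new_centroids
```

Only the small (sum, count) pairs cross the network; the points themselves stay on their partitions, which is why the pattern scales.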
Example
Item filter
‒ Given two files, output the key-value pairs in file B whose key exists in file A.
‒ File B is super large (e.g. 100 GB).
‒ What if A is also super large?

A: Key1, Key12, Key3, Key5, …
B: (Key1, val1), (Key2, val2), (Key4, val4), (Key3, val3), (Key5, val5), …
Example
Item filter
Naive split: cut both files into chunks.
A → {Key1}, {Key12, Key3}, {Key5}, ……
B → {(Key1, val1)}, {(Key2, val2), (Key4, val4)}, {(Key3, val3), (Key5, val5)}, ……
Example
Item filter
Hash both files so that equal keys land in the same partition:
A → Hash → {Key1, Key3}, {Key12, Key5}, ……
B → Hash → {(Key1, val1), (Key2, val2), (Key3, val3)}, {(Key4, val4), (Key5, val5), (Key7, val7)}, ……
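The hash trick can be sketched as follows (hypothetical function names; Python's built-in `hash()` stands in for a real partitioner):

```python
def hash_partition(items, num_parts, key=lambda x: x):
    # Route every record to the partition chosen by hashing its key,
    # so equal keys from A and B always land in the same partition.
    parts = [[] for _ in range(num_parts)]
    for item in items:
        parts[hash(key(item)) % num_parts].append(item)
    return parts

def item_filter(a_keys, b_pairs, num_parts=4):
    a_parts = hash_partition(a_keys, num_parts)
    b_parts = hash_partition(b_pairs, num_parts, key=lambda kv: kv[0])
    out = []
    # Each (a_part, b_part) pair can now be filtered independently,
    # e.g. on a different worker, since no key crosses partitions.
    for a_part, b_part in zip(a_parts, b_parts):
        keys = set(a_part)
        out.extend(kv for kv in b_part if kv[0] in keys)
    return out
```

Each partition only has to fit one chunk of A in memory, which is what makes the "A is also super large" case tractable.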
Distributed Machine Learning
Overview
* See the AAAI 2017 Workshop on Distributed Machine Learning for more information.
Distributed Machine Learning
How To Distribute
Key Problems
‒ How to "split"
  ‒ Data parallelism / model parallelism
  ‒ Data / parameter dependency
‒ How to aggregate messages
  ‒ Parallelization mechanisms
  ‒ Consensus between local & global parameters
  ‒ Does the algorithm converge?
‒ Other concerns
  ‒ Communication cost, …
Distributed Machine Learning
How To Split
How To Distribute
‒ Data Parallelism
  1. Partition the data
  2. Train in parallel
  3. Combine the local updates
  4. Refresh each local model with the new parameters
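The four steps can be sketched as one round of a generic data-parallel loop (hypothetical callback names; the workers run sequentially here for clarity):

```python
def data_parallel_round(model, data, num_workers, local_update, combine):
    # 1. Data partition: split the samples across workers.
    shards = [data[i::num_workers] for i in range(num_workers)]
    # 2. Parallel training: each worker computes a local update
    #    from its own shard.
    locals_ = [local_update(model, shard) for shard in shards]
    # 3. Combine the local updates into new global parameters.
    model = combine(locals_)
    # 4. Refresh: the new model is what every worker starts the next round with.
    return model
```

For example, with `local_update` returning a shard's (sum, count) and `combine` averaging them, one round yields the global mean.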
Distributed Machine Learning
How To Split
How To Distribute
‒ Model Parallelism
  1. Partition the model across multiple local workers
  2. Workers collaborate with each other to perform the optimization
Distributed Machine Learning
How To Split
How To Distribute
‒ Model Parallelism & Data Parallelism
Example: Distributed Logistic Regression
Distributed Machine Learning
How To Split
Categories
‒ Data Parallelism
  ‒ Split the data into many sample sets
  ‒ Workers calculate the same parameter(s) on different sample sets
‒ Model Parallelism
  ‒ Split the model/parameters
  ‒ Workers calculate different parameter(s) on the same data set
‒ Hybrid Parallelism
Distributed Machine Learning
How To Split
Data / Parameter Split
‒ Data Allocation
  ‒ Random selection (shuffling)
  ‒ Partitioning (e.g. item filter, word count)
  ‒ Sampling
  ‒ Parallel graph computation (for non-i.i.d. data)
‒ Parameter Split
  ‒ Most algorithms assume the parameters are independent and split them randomly
  ‒ Petuum (KDD'15, Eric Xing)
Distributed Machine Learning
How To Aggregate Messages
Parallelization Mechanisms
• Given the feedback g_i(w) of worker i, how can we update the model parameter W?

  W = f(g_1(w), g_2(w), …, g_m(w))
Distributed Machine Learning
Parallelization Mechanism
Bulk Synchronous Parallel (BSP)
‒ Synchronous update
  ‒ Update the parameters only after all workers are done with their jobs
‒ Example: Sync SGD (mini-batch SGD), Hadoop
Distributed Machine Learning
Sync SGD
‒ Perceptron
Distributed Machine Learning
Sync SGD
‒ Each worker i computes a perceptron gradient on its misclassified samples M:

  ∇W_i = − Σ_{x_j ∈ M} y_j x_j

‒ The master collects ∇W_1, ∇W_2, ∇W_3 and updates:

  W ← W − Σ_i ∇W_i
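A minimal single-process sketch of this synchronous step, assuming the perceptron update above (shards are lists of `(x, y)` pairs with labels ±1):

```python
def local_grad(w, shard):
    # Perceptron gradient on one worker's shard: sum of -y*x over the
    # misclassified points M (those with y * (w . x) <= 0).
    g = [0.0] * len(w)
    for x, y in shard:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            g = [gi - y * xi for gi, xi in zip(g, x)]
    return g

def sync_sgd_step(w, shards, lr=1.0):
    # Bulk-synchronous step: wait for every worker's gradient,
    # sum them, then update the shared parameters once.
    grads = [local_grad(w, s) for s in shards]
    total = [sum(g[i] for g in grads) for i in range(len(w))]
    return [wi - lr * ti for wi, ti in zip(w, total)]
```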
Distributed Machine Learning
Parallelization Mechanism
Asynchronous Parallel
‒ Asynchronous update
  ‒ Update the parameters whenever the feedback of any worker is received
‒ Example: Downpour SGD (NIPS'12)
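A minimal sketch of the asynchronous pattern, assuming threads stand in for workers and a lock-protected scalar for the shared parameter (no round barrier; each push is applied on arrival):

```python
import threading

def run_async(grads_per_worker, lr=0.1):
    # Shared parameter updated without any synchronization barrier:
    # each worker pushes its gradients as soon as they are ready.
    state = {"w": 0.0}
    lock = threading.Lock()

    def worker(grads):
        for g in grads:
            with lock:  # protects the single update; there is no round barrier
                state["w"] -= lr * g

    threads = [threading.Thread(target=worker, args=(gs,))
               for gs in grads_per_worker]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["w"]
```

Because the updates are additive, the final value here is order-independent; in real training the gradients themselves depend on a possibly stale w, which is exactly the convergence concern discussed below.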
Distributed Machine Learning
Downpour SGD
Distributed Machine Learning
Async. vs. Sync.
‒ Sync.
  ‒ Straggler problem: it has to wait until every worker has finished its job, so the overall efficiency is determined by the slowest worker.
  ‒ Nice convergence
‒ Async.
  ‒ Very fast!
  ‒ May hurt the convergence of the algorithm (e.g. stale gradients)
  ‒ Use it if the model is not sensitive to asynchronous updates
Distributed Machine Learning
Parallelization Mechanism
ADMM for DML
‒ Alternating Direction Method of Multipliers
  ‒ Augmented Lagrangian + dual decomposition
  ‒ A popular optimization algorithm in both industry and academia (e.g. computational advertising)
‒ For the DML case: replace x_2^{k−1} with mean(x_2^{k−1}) and x_1^k with mean(x_1^k) when updating
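The "replace with the mean" remark matches the standard global-variable consensus form of ADMM, a sketch of which (not taken from the slides) is: each of the m workers solves a local subproblem, and the global variable z is the mean of the local results plus the scaled duals.

```latex
\begin{aligned}
x_i^{k+1} &= \arg\min_{x_i}\; f_i(x_i) + \tfrac{\rho}{2}\bigl\| x_i - z^k + u_i^k \bigr\|_2^2 \\
z^{k+1}   &= \frac{1}{m}\sum_{i=1}^{m}\bigl( x_i^{k+1} + u_i^k \bigr) \\
u_i^{k+1} &= u_i^k + x_i^{k+1} - z^{k+1}
\end{aligned}
```

The x-updates are independent across workers (run in parallel); only the averaging step requires communication.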
Distributed Machine Learning
Parallelization Mechanisms
Overview
‒ Sync.
‒ Async.
‒ ADMM
‒ Model Averaging
  ‒ Elastic Averaging SGD (NIPS'15)
‒ Lock Free: Hogwild! (NIPS'11)
‒ ……
![Page 31: Distributed Machine Learning: An Intro. - UESTCdm.uestc.edu.cn/wp-content/uploads/seminar/Distributed ML.pdf · Distributed Machine Learning Frameworks 36 MapReduce ‒Synchronous](https://reader035.vdocuments.us/reader035/viewer/2022062505/5ecd7d5a860b3d1d3b6328f8/html5/thumbnails/31.jpg)
31
Distributed ML Framework
Distributed Machine Learning
Frameworks
This is a joke, please laugh…
Distributed Machine Learning
Frameworks
Message Passing Interface (MPI)
‒ Parallel computing architecture
‒ Many operations:
  ‒ send, receive, broadcast, scatter, gather…
Distributed Machine Learning
Frameworks
Message Passing Interface (MPI)
‒ Parallel computing architecture
‒ Many operations:
  ‒ AllReduce = reduce + broadcast
‒ Hard to write code!
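The identity AllReduce = reduce + broadcast can be sketched without MPI (list slots stand in for workers; in real MPI this is a single `MPI_Allreduce` call, or `comm.allreduce` in mpi4py):

```python
def allreduce(local_values, op=lambda a, b: a + b):
    # Reduce: fold every worker's value into the root (worker 0).
    acc = local_values[0]
    for v in local_values[1:]:
        acc = op(acc, v)
    # Broadcast: send the reduced value back to every worker, so all
    # workers end up holding the same combined result.
    return [acc] * len(local_values)
```

This is exactly the communication pattern sync SGD needs: reduce the workers' gradients, then broadcast the new parameters.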
Distributed Machine Learning
Frameworks
MapReduce
‒ Well-encapsulated code, user-friendly!
‒ Dedicated scheduler
‒ Integration with HDFS / fault tolerance / ….
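The user-friendliness comes from the framework owning the shuffle: the user only supplies a mapper and a reducer. A single-process sketch, using word count as the classic example:

```python
from collections import defaultdict
from itertools import chain

def map_phase(records, mapper):
    # Run the user's mapper over every record, yielding (key, value) pairs.
    return list(chain.from_iterable(mapper(r) for r in records))

def shuffle(pairs):
    # Group values by key; the framework does this between map and reduce.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups, reducer):
    # Run the user's reducer once per key.
    return {k: reducer(k, vs) for k, vs in groups.items()}

# Word count: the user only writes these two small functions.
wc_map = lambda doc: [(w, 1) for w in doc.split()]
wc_reduce = lambda word, counts: sum(counts)
```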
Distributed Machine Learning
Frameworks
MapReduce
‒ Synchronous parallel; single point of failure.
‒ Data spill: intermediate data is spilled to disk between stages.
‒ Not so suitable for machine learning tasks
  ‒ Many ML models are solved iteratively, and Hadoop/MapReduce does not naturally support iterative computation
  ‒ Spark does
‒ Iterative MapReduce-style machine learning toolkits
  ‒ Hadoop Mahout
  ‒ Spark MLlib
Distributed Machine Learning
Frameworks
GraphLab (UAI'10, VLDB'12)
‒ Distributed computing framework for graphs
‒ Splits the graph into sub-graphs by node cut
‒ Asynchronous parallel
Distributed Machine Learning
Frameworks
GraphLab (UAI'10, VLDB'12)
‒ Data Graph + Update Function + Sync Operation
  ‒ Data graph
  ‒ Update function: user-defined function, working on scopes
  ‒ Sync: global parameter update
‒ Scopes are allowed to overlap
Distributed Machine Learning
Frameworks
GraphLab (UAI'10, VLDB'12)
‒ Data Graph + Update Function + Sync Operation
‒ Three steps = Gather + Apply + Scatter
  ‒ Gather: read only · Apply: write the node only · Scatter: write the edges only
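One Gather-Apply-Scatter round can be sketched as follows (a simplification: the scatter phase, which writes edge data and activates neighbors, is reduced to a comment):

```python
def gas_round(values, in_edges, apply_fn):
    # One GAS round over a graph given as per-vertex in-neighbor lists.
    new_values = dict(values)
    for v, preds in in_edges.items():
        # Gather: read-only pass over the in-neighbors' values.
        total = sum(values[u] for u in preds)
        # Apply: write only this vertex's own data.
        new_values[v] = apply_fn(values[v], total)
        # Scatter: would update edge data / signal neighbors; omitted here.
    return new_values
```

With `apply_fn = lambda old, g: 0.15 + 0.85 * g` (and gather weighted by out-degree, omitted here) this round becomes a PageRank-style iteration.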
Distributed Machine Learning
Frameworks
GraphLab: Consistency Control
‒ Trade-off between conflicts and parallelism
  ‒ Full consistency: nothing in the scope may be read or written by others
  ‒ Edge consistency: nothing in the scope may be read or written by others, except the neighbor vertices
  ‒ Vertex consistency: during an update, no other operation may read or write the vertex itself
Distributed Machine Learning
Frameworks
Spark GraphX
‒ Avoids the cost of moving sub-graphs among workers by combining the Table view & the Graph view
Distributed Machine Learning
Frameworks
Parameter Server
‒ Asynchronous parallel
  1. Workers query for the current parameters
  2. Parameters are stored in a distributed way among the server nodes
‒ Workers calculate partial parameters
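A minimal single-process sketch of the two points above (hypothetical class; hashing stands in for the real key-range sharding, and `pull`/`push` for the RPC interface):

```python
class ParameterServer:
    # Parameter shards live on "server nodes" (dict entries); workers
    # pull what they need and push deltas asynchronously, no barrier.
    def __init__(self, init_params, num_servers=2):
        # Shard the parameters across server nodes by hashing the key.
        self.shards = [{} for _ in range(num_servers)]
        for k, v in init_params.items():
            self.shards[hash(k) % num_servers][k] = v

    def pull(self, keys):
        # Workers query the current values of the parameters they need.
        return {k: self.shards[hash(k) % len(self.shards)][k] for k in keys}

    def push(self, deltas):
        # Workers send partial updates; each is applied on arrival.
        for k, d in deltas.items():
            self.shards[hash(k) % len(self.shards)][k] += d
```

Because workers only pull the keys they touch, a huge model never has to fit on any single machine.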
Distributed Machine Learning
Frameworks
Parameter Server
‒ Asynchronous parallel
Distributed Machine Learning
DML Trends Overview
• For more information, please see the AAAI-17 Tutorial on Distributed Machine Learning.
Distributed Machine Learning
Take Home Message
‒ How to "split"
  ‒ Data parallelism / model parallelism
  ‒ Data / parameter dependency
‒ How to aggregate messages
  ‒ Parallelization mechanisms
  ‒ Consensus between local & global parameters
  ‒ Does the algorithm converge?
‒ Frameworks
Thanks