![Page 1: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/1.jpg)
Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN
17 August 2020, ICPP 2020
Sai Qian Zhang, Jieyu Lin, Qi Zhang
![Page 2: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/2.jpg)
Figure: Option 1 (edge only): image, audio, and video user data are processed on edge devices. Option 2 (cloud only): the user data are sent to a cloud data center.
Executing DNN Inference Tasks for End Users
● Using an edge device to process end-user data leads to a long processing time, while using a cloud server incurs a large communication delay.
Edge only: limited computing capability. Cloud only: large communication overhead.
![Page 3: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/3.jpg)
Motivation
● Edge devices
  ○ Resource-limited
  ○ Pervasive
● Adaptive Distributed Convolutional Neural Network (ADCNN)
  ○ We propose a framework for agile execution of inference tasks on edge clusters for Convolutional Neural Networks (CNNs)
● Challenges
  ○ Reduce the inference latency while keeping the accuracy performance
  ○ Device heterogeneity and performance fluctuation
  ○ Applicable to different CNN models
![Page 4: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/4.jpg)
Agenda
● Background
● CNN partitioning strategies
● ADCNN framework
● Modification on CNN architecture
● Evaluation
● Conclusion
![Page 5: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/5.jpg)
CNN Background -- Convolutional Layer
Figure: K filters (Filter 1, Filter 2, ..., Filter K), each 3 × 3, are applied to the 224 × 224 input feature maps to produce the output feature maps.
● The weight filters slide across the input feature maps (ifmaps). At each position, the dot product between the entries of the ifmaps and the weight filter is calculated.
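The sliding dot product can be sketched in a few lines of Python. This is a minimal, unoptimized illustration assuming NumPy, stride 1, and zero padding; the function name `conv2d` and the small array sizes are hypothetical and chosen only to keep the example fast.

```python
import numpy as np

def conv2d(ifmaps, filters, pad=1):
    """Naive stride-1 convolution with zero padding.

    ifmaps:  shape (C, H, W),    C input feature maps
    filters: shape (K, C, R, S), K weight filters
    returns: shape (K, H', W'),  K output feature maps
    """
    C, H, W = ifmaps.shape
    K, Cf, R, S = filters.shape
    assert C == Cf, "filter depth must match the number of ifmaps"
    padded = np.pad(ifmaps, ((0, 0), (pad, pad), (pad, pad)))
    out_h = H + 2 * pad - R + 1
    out_w = W + 2 * pad - S + 1
    ofmaps = np.zeros((K, out_h, out_w))
    for k in range(K):                      # one ofmap per filter
        for y in range(out_h):              # slide the filter vertically
            for x in range(out_w):          # ... and horizontally
                window = padded[:, y:y + R, x:x + S]
                ofmaps[k, y, x] = np.sum(window * filters[k])  # dot product
    return ofmaps

rng = np.random.default_rng(0)
ifmaps = rng.standard_normal((3, 8, 8))      # 3 small 8 x 8 ifmaps
filters = rng.standard_normal((4, 3, 3, 3))  # 4 filters of size 3 x 3 x 3
ofmaps = conv2d(ifmaps, filters)
print(ofmaps.shape)  # (4, 8, 8)
```

Each of the K filters yields one output feature map, so the number of ofmaps equals the number of filters.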
![Page 6: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/6.jpg)
Background -- CNN Workload Characteristics
● Earlier layers take much longer to process than the later layers.
Processing time for VGG16
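As a rough cross-check of this observation, the sketch below counts multiply-accumulate operations (MACs) per layer. It assumes the standard VGG16 configuration for a 224 × 224 input (13 convolutional layers with 3 × 3 filters followed by 3 fully connected layers); the layer table is not taken from the slide, and MAC counts are only a proxy for the measured processing time shown in the figure.

```python
# (name, output height = width, input channels, output channels); all filters are 3 x 3
VGG16_CONV = [
    ("conv1_1", 224, 3, 64), ("conv1_2", 224, 64, 64),
    ("conv2_1", 112, 64, 128), ("conv2_2", 112, 128, 128),
    ("conv3_1", 56, 128, 256), ("conv3_2", 56, 256, 256), ("conv3_3", 56, 256, 256),
    ("conv4_1", 28, 256, 512), ("conv4_2", 28, 512, 512), ("conv4_3", 28, 512, 512),
    ("conv5_1", 14, 512, 512), ("conv5_2", 14, 512, 512), ("conv5_3", 14, 512, 512),
]
# (name, input features, output features)
VGG16_FC = [("fc6", 7 * 7 * 512, 4096), ("fc7", 4096, 4096), ("fc8", 4096, 1000)]

# One MAC per filter weight per output pixel.
conv_macs = {name: hw * hw * cin * cout * 3 * 3 for name, hw, cin, cout in VGG16_CONV}
# One MAC per weight of the fully connected layer.
fc_macs = {name: fin * fout for name, fin, fout in VGG16_FC}

total_conv = sum(conv_macs.values())
total_fc = sum(fc_macs.values())
conv_share = total_conv / (total_conv + total_fc)
print(f"conv layers: {conv_share:.1%} of all MACs")  # conv layers: 99.2% of all MACs
```

Under these assumptions the convolutional layers, which come first in the network, account for almost all of the arithmetic, which is consistent with distributing them across devices.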
![Page 7: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/7.jpg)
CNN Partitioning Strategies: CNN Channelwise Partitioning
● In channelwise partitioning, the nodes must exchange their partially accumulated ofmaps to produce the final ofmaps, which may lead to significant communication overhead.
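Figure: channelwise partitioning of a convolution between two devices. The ifmap channels (C/2 each) and the filters 1 to K (K/2 each) are divided between the workload of device 1 and the workload of device 2. Dimension labels in the figure: W, H, R, U, N, M.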
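The need to exchange partial results can be seen in a small Python sketch. It assumes NumPy and splits only the input channels between two hypothetical devices; the helper `conv` and the array sizes are illustrative and not taken from the slides.

```python
import numpy as np

def conv(ifmaps, filters):
    """Stride-1 convolution without padding: (C,H,W) x (K,C,R,S) -> (K,H-R+1,W-S+1)."""
    K, C, R, S = filters.shape
    _, H, W = ifmaps.shape
    out = np.zeros((K, H - R + 1, W - S + 1))
    for y in range(out.shape[1]):
        for x in range(out.shape[2]):
            window = ifmaps[:, y:y + R, x:x + S]
            # Contract over channels and both filter dimensions at once.
            out[:, y, x] = np.tensordot(filters, window, axes=3)
    return out

rng = np.random.default_rng(0)
C, K = 4, 2
ifmaps = rng.standard_normal((C, 6, 6))
filters = rng.standard_normal((K, C, 3, 3))

full = conv(ifmaps, filters)                            # single-device reference
partial_1 = conv(ifmaps[:C // 2], filters[:, :C // 2])  # device 1: first half of the channels
partial_2 = conv(ifmaps[C // 2:], filters[:, C // 2:])  # device 2: second half of the channels

print(np.allclose(partial_1, full))              # False
print(np.allclose(partial_1 + partial_2, full))  # True
```

Neither device holds a usable ofmap on its own; a full set of partial ofmaps has to cross the network before the sum can be formed, which is the overhead the slide refers to.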
![Page 8: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/8.jpg)
CNN Partitioning Strategies: Spatial Partitioning
● In spatial partitioning, the tiles must exchange their data halos in order to compute the correct result.
Figure (panels a to c): an ifmap divided into tiles A, B, C, and D; the data halo around each tile; data halo transmission among tiles.
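To make the data halo concrete, here is a minimal Python sketch. It assumes NumPy, a single-channel 8 × 8 ifmap, a 3 × 3 filter, and four 4 × 4 tiles; these sizes and the helper `conv2d_valid` are illustrative assumptions, not values from the slides.

```python
import numpy as np

def conv2d_valid(x, w):
    """Stride-1 convolution without padding of an (H, W) map with an (R, S) filter."""
    R, S = w.shape
    out = np.zeros((x.shape[0] - R + 1, x.shape[1] - S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + R, j:j + S] * w)
    return out

rng = np.random.default_rng(0)
ifmap = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))
tile = 4  # tile side length
halo = 1  # a 3 x 3 filter needs one extra pixel on every side

# Reference: zero-pad the whole ifmap once, then convolve.
padded = np.pad(ifmap, halo)
full = conv2d_valid(padded, w)

# Spatial partition: every tile is cut out together with its halo,
# which overlaps the neighboring tiles.
stitched = np.zeros_like(full)
for r0 in range(0, 8, tile):
    for c0 in range(0, 8, tile):
        tile_with_halo = padded[r0:r0 + tile + 2 * halo, c0:c0 + tile + 2 * halo]
        stitched[r0:r0 + tile, c0:c0 + tile] = conv2d_valid(tile_with_halo, w)

print(np.allclose(stitched, full))  # True
```

The result is exact only because each tile reads pixels that belong to its neighbors. In a multi-layer network this exchange would repeat at every convolutional layer.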
![Page 9: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/9.jpg)
Fully Decomposable Spatial Partition (FDSP)
Figure: normal spatial partition of tiles A, B, C, and D, in which border pixel values (e.g., 0.2, 0.6, 0.9, 0.4, 0.3) are copied from neighboring tiles.
● The cross-tile information transfer can be eliminated by padding the edge pixels with zeros.
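Figure: Fully Decomposable Spatial Partition (FDSP) of tiles A, B, C, and D, in which the border positions are filled with zeros (0.0) instead of neighboring pixel values.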
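The effect of zero padding each tile can be sketched as follows. The example assumes NumPy, a single-channel 8 × 8 ifmap, a 3 × 3 filter, and a 2 × 2 grid of tiles; the helper `conv2d_valid` and all sizes are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(x, w):
    """Stride-1 convolution without padding of an (H, W) map with an (R, S) filter."""
    R, S = w.shape
    out = np.zeros((x.shape[0] - R + 1, x.shape[1] - S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + R, j:j + S] * w)
    return out

rng = np.random.default_rng(0)
ifmap = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))
tile = 4

# Reference: convolution over the whole zero-padded ifmap.
full = conv2d_valid(np.pad(ifmap, 1), w)

# FDSP-style processing: each tile is zero-padded on its own,
# so no pixel from a neighboring tile is ever read.
fdsp = np.zeros_like(full)
for r0 in range(0, 8, tile):
    for c0 in range(0, 8, tile):
        independent_tile = np.pad(ifmap[r0:r0 + tile, c0:c0 + tile], 1)
        fdsp[r0:r0 + tile, c0:c0 + tile] = conv2d_valid(independent_tile, w)

interior_matches = np.allclose(fdsp[0:3, 0:3], full[0:3, 0:3])
whole_map_matches = np.allclose(fdsp, full)
print(interior_matches, whole_map_matches)  # True False
```

The tiles become fully independent tasks, at the cost of approximate values along the internal tile borders. The retraining step described on the following slides is what keeps the accuracy loss small.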
![Page 10: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/10.jpg)
ADCNN Framework
Figure: Step 1, progressive retraining converts the original CNN model into the output CNN model. Step 2, the Central node divides the input into tiles, and the Conv nodes in the edge device cluster process them to produce the result (e.g., "Dog").
![Page 11: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/11.jpg)
ADCNN Framework
● The Conv nodes need to transmit the intermediate results to the Central node, which may still cause a significant communication overhead.
Figure: the Central node sends input tiles to the Conv nodes in the edge device cluster and receives their results.
![Page 12: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/12.jpg)
Modification on CNN Topology
● We modify the CNN model to reduce this communication overhead.
● We progressively retrain the model with these modifications added to the CNN architecture.
Figure: the output from the CONV nodes passes through four numbered steps: (1) apply clipped ReLU, (2) quantization, (3) unroll the neurons, (4) RLE. In the example, the unrolled vector [1,0,2,0,1,0,0,0,0,0,0,0,2,0,0,0] is encoded as [1,1,2,1,1,7,2,3].
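The compression pipeline on this slide can be sketched in plain Python. Several details are assumptions: the clipped ReLU form (shift by a lower threshold, then clamp), the thresholds 0.2 and 2.0, rounding to integers as the quantizer, and the (value, number of following zeros) RLE layout are all inferred from the numbers on the slide rather than stated on it. The 4 × 4 arrangement of the input values is hypothetical.

```python
def clipped_relu(x, lower, upper):
    """Assumed form: shift by `lower`, then clamp to [0, upper - lower]."""
    return min(max(x - lower, 0.0), upper - lower)

def quantize(x):
    """Assumed quantizer: round to the nearest integer level."""
    return int(round(x))

def rle_encode(values):
    """Encode as [v0, z0, v1, z1, ...]: each value, then the count of zeros that follow it."""
    encoded = []
    i = 0
    while i < len(values):
        value = values[i]
        i += 1
        zeros = 0
        while i < len(values) and values[i] == 0:
            zeros += 1
            i += 1
        encoded += [value, zeros]
    return encoded

def rle_decode(encoded):
    """Inverse of rle_encode."""
    decoded = []
    for value, zeros in zip(encoded[0::2], encoded[1::2]):
        decoded += [value] + [0] * zeros
    return decoded

# Hypothetical 4 x 4 output of a Conv node.
ofmap = [
    [1.3,  0.1,  2.3, -0.5],
    [1.2, -2.2,  0.2,  0.1],
    [0.3, -3.8,  0.1, -1.3],
    [2.5,  0.1, -0.2,  0.0],
]

# Steps 1 to 3: clipped ReLU, quantization, row-major unrolling.
unrolled = [quantize(clipped_relu(x, 0.2, 2.0)) for row in ofmap for x in row]
# Step 4: run-length encoding.
encoded = rle_encode(unrolled)

print(unrolled)  # [1, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]
print(encoded)   # [1, 1, 2, 1, 1, 7, 2, 3]
```

Because the clipped ReLU and the quantizer push most activations to zero, the encoded vector is half the length of the unrolled one in this example, and each entry is a small integer.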
![Page 13: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/13.jpg)
ADCNN Architecture
● ADCNN takes advantage of the fine-grained, fully independent tiles generated by FDSP and adapts to dynamic conditions, allowing it to achieve fine-grained load balancing across heterogeneous edge nodes.
Figure: ADCNN system. The Central node performs input partition and sends input tiles to the CONV nodes cluster for layer computation. The CONV nodes return intermediate results, and the Central node performs statistics collection (# received results). Example records tagged with i_id, t_id, and n_id:
d:[-0.9,...,1.1],i_id:6,t_id:1,n_id:1
d:[0.3,...,-0.8],i_id:6,t_id:2,n_id:1
d:[-0.4,...,0.2],i_id:6,t_id:4,n_id:N
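One way the collected statistics could drive tile assignment is sketched below in Python. The slides do not show the scheduling algorithm, so the proportional policy, the function `allocate_tiles`, and the reading of `i_id`, `t_id`, and `n_id` as image, tile, and node IDs are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class TileResult:
    d: list    # intermediate result data
    i_id: int  # assumed meaning: image ID
    t_id: int  # assumed meaning: tile ID
    n_id: int  # assumed meaning: ID of the Conv node that produced the result

def allocate_tiles(num_tiles, received_counts):
    """Split num_tiles across nodes in proportion to their recent result counts."""
    total = sum(received_counts.values())
    if total == 0:
        # No statistics yet: fall back to equal weights.
        received_counts = {node: 1 for node in received_counts}
        total = len(received_counts)
    shares = {node: num_tiles * count / total for node, count in received_counts.items()}
    alloc = {node: int(share) for node, share in shares.items()}
    leftover = num_tiles - sum(alloc.values())
    # Give the remaining tiles to the nodes with the largest fractional share.
    by_remainder = sorted(shares, key=lambda node: shares[node] - alloc[node], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc

# Hypothetical statistics: nodes 1-4 returned 8 results each, nodes 5-8 only 4 each.
results = [TileResult(d=[], i_id=6, t_id=t, n_id=n)
           for n in range(1, 9)
           for t in range(8 if n <= 4 else 4)]
counts = Counter(r.n_id for r in results)

alloc = allocate_tiles(64, dict(counts))  # 64 tiles, as in an 8 x 8 partition
print(alloc)  # {1: 11, 2: 11, 3: 11, 4: 11, 5: 5, 6: 5, 7: 5, 8: 5}
```

Because the tiles are independent, a policy like this can move work away from slow nodes one tile at a time, without any coordination between the Conv nodes.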
![Page 14: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/14.jpg)
Accuracy Evaluation
● We evaluate different CNN models from different applications.
● Accuracy degradation is around 1% when an 8 × 8 FDSP is applied to the input sample.
Figure panels: VGG16; Fully Convolutional Network
![Page 15: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/15.jpg)
Inference Latency Comparison
● We implement the ADCNN system on nine identical Raspberry Pi devices that emulate edge devices. Eight of them serve as Conv nodes, and the remaining one serves as the Central node.
● Baselines:
  ○ Single device scheme
  ○ Remote cloud scheme
● Compared with these two baselines, ADCNN decreases the average processing latency by 6.68x and 4.42x, respectively.
![Page 16: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/16.jpg)
ADCNN Performance in Dynamic Environment
● We adjust the CPU processing speed of four of the Conv nodes (nodes 5, 6, 7, and 8) midway through processing 50 input images, and observe the impact on tile assignment and overall inference latency.
● ADCNN handles dynamic changes in node performance effectively.
Variation on Inference Latency
Changes on Tile Assignment
![Page 17: Adaptive Distributed Convolutional Neural Network ... · Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN 17 August 2020 ICPP 2020 Sai Qian](https://reader035.vdocuments.us/reader035/viewer/2022071006/5fc38fc2c30e7051e35d1615/html5/thumbnails/17.jpg)
Conclusion
● We introduce ADCNN, a distributed inference framework that jointly optimizes the CNN architecture and the computing system for better performance in dynamic network environments.
● ADCNN applies FDSP to partition the compute-intensive convolutional layers into many small independent computational tasks which can be executed in parallel on separate edge devices.
● The ADCNN system takes advantage of the fine-grained, fully independent tiles generated by FDSP and adapts to dynamic conditions, allowing it to achieve fine-grained load balancing across heterogeneous edge nodes.
● Compared to existing distributed CNN inference approaches, ADCNN provides up to 2.8x lower latency while achieving competitive inference accuracy. Additionally, ADCNN can quickly adapt to variations in edge device performance.