![Page 1: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/1.jpg)
A Random Forest using a Multi-valued Decision Diagram on an FPGA
Hiroki Nakahara¹, Akira Jinguji¹, Shimpei Sato¹, Tsutomu Sasao²
¹Tokyo Institute of Technology, JP; ²Meiji University, JP
May 22, 2017 @ ISMVL 2017
![Page 2: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/2.jpg)
Outline
• Background
• Random forest (RF)
• Multi-valued decision diagram (MDD)
• RF using MDDs
• Experimental results
• Conclusion
![Page 3: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/3.jpg)
Machine Learning
Much computation power and big data
(Left): “Single-Threaded Integer Performance,” 2016
(Right): Nakahara, “Trend of Search Engine on modern Internet,” 2014
![Page 4: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/4.jpg)
Machine Learning Algorithms
M. Warrick, “How to get started with machine learning,” PyCon2014
![Page 5: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/5.jpg)
Introduction
• Random Forest (RF)
  • Ensemble learning method
  • Consists of multiple decision trees (DTs)
  • Applications: segmentation, human pose detection
  • It is based on binary DTs (BDTs)
    • A node is evaluated by an if-then-else statement
    • The same variable may appear several times
• Multiple-valued decision diagram (MDD)
  • Each variable appears only once on a path
![Page 6: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/6.jpg)
Introduction (Contʼd)
• Target platform
  • CPU: Too slow
  • GPU: Not suitable for the RF → slow, and consumes much power
  • FPGA: Faster, low power, but long TAT (turnaround time)
• High-level synthesis (HLS) for the RF using MDDs on an FPGA
  • Low power, high performance, short design time
![Page 7: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/7.jpg)
Random Forest
![Page 8: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/8.jpg)
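Classification by a Binary Decision Tree (BDT)
• Partition of the feature map
[Figure: (left) feature map over X1 and X2, both ranging from 0.00 to 1.00, partitioned at X2 = 0.29, 0.53 and X1 = 0.09, 0.63, 0.71 into regions labeled C1 and C2; (right) the corresponding BDT with nodes X2<0.53?, X2<0.29?, X1<0.09?, X1<0.63?, X1<0.71?, Y/N edges, and leaves C1 and C2]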
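A BDT of this kind can be read as nested if-then-else tests, each comparing one feature with a threshold. The sketch below is illustrative only: the thresholds are taken from the slide, but how the nodes are wired to each other and to the leaves is an assumption (the figure is not recoverable), and the name `classify_bdt` is hypothetical.

```python
def classify_bdt(x1: float, x2: float) -> str:
    """Walk an assumed BDT; every node compares one feature with a threshold."""
    if x2 < 0.53:                  # root node
        if x2 < 0.29:              # x2 is tested a second time on this path
            return "C1"            # assumed leaf label
        # here 0.29 <= x2 < 0.53
        if x1 < 0.63:
            return "C2"
        return "C1"
    # here x2 >= 0.53
    if x1 < 0.09:
        return "C1"
    if x1 < 0.71:
        return "C2"
    return "C1"


print(classify_bdt(0.50, 0.40))    # falls in the assumed C2 region
```

Note that the path through `x2 < 0.53` and `x2 < 0.29` tests the same variable twice, which is the property of BDTs the deck points out.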
![Page 9: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/9.jpg)
Training of a BDT
• It is built from randomized samples
• Recursively partition the dataset so that each split reduces the entropy (maximizes the information gain) → The same variable may appear more than once
[Figure: feature map over X1 and X2 partitioned at X2 = 0.29, 0.53 and X1 = 0.09, 0.63, 0.71 into regions labeled C1 and C2, and the BDT with nodes X2<0.53?, X2<0.29?, X1<0.09?, X1<0.63?, X1<0.71?]
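Entropy-driven splitting can be sketched as follows: for one feature, pick the candidate threshold with the largest entropy reduction (information gain). This is a minimal sketch, not the authors' training code; the function names and the sample values are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the samples at value < threshold."""
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    if not left or not right:      # the split separates nothing
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Made-up samples of one feature and their classes.
x2 = [0.10, 0.20, 0.40, 0.60, 0.80, 0.90]
y = ["C1", "C1", "C1", "C2", "C2", "C2"]
best = max([0.29, 0.53, 0.71], key=lambda t: information_gain(x2, y, t))
print(best)
```

Applying the same selection recursively to each partition is what lets one variable reappear at several depths of the tree.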
![Page 10: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/10.jpg)
Random Forest (RF)
• Ensemble learning
• Classification and regression
• Consists of multiple BDTs
[Figure: (left) a binary decision tree (BDT), Tree 1, with nodes X1<0.53?, X3<0.71?, X2<0.63?, X2<0.63?, X3<0.72?, Y/N edges, and leaves C1, C2, C3; (right) a random forest in which Tree 1, Tree 2, ..., Tree n receive the input, output C1, C2, C1, and a voter selects the class C1]
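The voter can be sketched as a majority vote over the classes returned by the individual trees. This is a minimal sketch under that assumption; the names `vote` and `predict_forest`, and the stand-in trees, are hypothetical.

```python
from collections import Counter


def vote(tree_outputs):
    """Return the class chosen by the most trees (ties: first seen wins)."""
    return Counter(tree_outputs).most_common(1)[0][0]


def predict_forest(trees, x):
    """Evaluate every tree on the same input and let the voter decide."""
    return vote([tree(x) for tree in trees])


# Three stand-in "trees" (plain functions) that mimic the figure's outputs.
trees = [lambda x: "C1", lambda x: "C2", lambda x: "C1"]
print(predict_forest(trees, x=(0.2, 0.7)))
```

Because the trees do not depend on each other, they can all be evaluated in parallel before the vote.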
![Page 11: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/11.jpg)
Applications
• Key point matching [Lepetit et al., 2006]
• Object detector [Shotton et al., 2008][Gall et al., 2011]
• Handwritten character recognition [Amit & Geman, 1997]
• Visual word clustering [Moosmann et al., 2006]
• Pose recognition [Yamashita et al., 2010]
• Human detector [Mitsui et al., 2011][Dahang et al., 2012]
• Human pose estimation [Shotton 2011]
![Page 12: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/12.jpg)
Known Problem
• Build BDTs from randomized samples
  • The same variable may appear on a path
  • Tend to be slow, even if we use GPUs
[Figure: BDT with nodes X2<0.53?, X2<0.29?, X2<0.09?, X1<0.63?, X1<0.71?, Y/N edges, and leaves C1 and C2; each node is evaluated as follows]
if X2 < 0.09 then
    output C1;
else
    goto Child_node;
![Page 13: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/13.jpg)
Multi-valued Decision Diagram
![Page 14: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/14.jpg)
Binary Decision Diagram (BDD)
• Recursively apply Shannon expansion to a given logic function
  • Non-terminal node: If-then-else statement
  • Terminal node: Set functional value
[Figure: BDD over the variables x1 to x6, with non-terminal nodes and the terminal nodes 0 and 1]
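Shannon expansion splits a function on one variable into its two cofactors (the variable fixed to 0 and to 1) and repeats on each cofactor. The sketch below builds a small reduced ordered BDD this way by brute-force enumeration, which is only practical for a handful of variables. The node encoding and the names `build_bdd` and `evaluate` are assumptions made for illustration.

```python
from itertools import count, product


def build_bdd(f, n):
    """Build a reduced ordered BDD for f(x1..xn) by recursive Shannon expansion.

    Returns (root, nodes); nodes maps id -> (var_index, low_id, high_id)
    and the ids 0 and 1 are the terminal nodes.
    """
    nodes = {}           # id -> (var, low, high)
    unique = {}          # (var, low, high) -> id, so equal subgraphs are shared
    ids = count(2)

    def expand(assignment):
        i = len(assignment)
        if i == n:                            # all variables fixed: terminal
            return int(bool(f(*assignment)))
        low = expand(assignment + (0,))       # cofactor with variable i = 0
        high = expand(assignment + (1,))      # cofactor with variable i = 1
        if low == high:                       # the test is redundant: skip it
            return low
        key = (i, low, high)
        if key not in unique:
            unique[key] = next(ids)
            nodes[unique[key]] = key
        return unique[key]

    return expand(()), nodes


def evaluate(root, nodes, inputs):
    """One if-then-else decision per non-terminal node until a terminal."""
    node = root
    while node > 1:
        var, low, high = nodes[node]
        node = high if inputs[var] else low
    return node


f = lambda a, b, c: (a and b) or c
root, nodes = build_bdd(f, 3)
print(len(nodes), evaluate(root, nodes, (1, 1, 0)))
```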
![Page 15: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/15.jpg)
Measurement of BDD
Memory size: # of nodes × size of a node
Worst case performance: LPL (Longest Path Length)
→ Dedicated fully pipelined hardware
[Figure: BDD over the variables x1 to x6 with the terminal nodes 0 and 1]
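Both measures can be computed directly from a diagram stored as a graph. The sketch below assumes a hypothetical three-node diagram and a hypothetical node size of 32 bits; neither comes from the slides.

```python
from functools import lru_cache


def memory_bits(num_nodes, bits_per_node):
    """Memory size = number of nodes x size of one node."""
    return num_nodes * bits_per_node


def longest_path_length(children, root):
    """Number of non-terminal nodes on the longest root-to-terminal path."""
    @lru_cache(maxsize=None)
    def depth(node):
        if node not in children:       # terminal node
            return 0
        return 1 + max(depth(child) for child in children[node])

    return depth(root)


# Hypothetical diagram: node id -> child ids; "0" and "1" are the terminals.
children = {"a": ("b", "c"), "b": ("0", "c"), "c": ("0", "1")}
print(memory_bits(len(children), 32), longest_path_length(children, "a"))
```

The LPL bounds the number of node evaluations per input, which is why it serves as the worst-case performance figure.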
![Page 16: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/16.jpg)
Multi-Valued Decision Diagram (MDD)
• MDD(k): 2^k outgoing edges
• Evaluates k variables at a time
[Figure: (left) BDD over x1 to x6; (right) MDD(2) over X1 = {x1,x2}, X2 = {x3,x4}, X3 = {x5,x6}; both with the terminal nodes 0 and 1]
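Grouping k binary inputs into one multi-valued symbol is what gives an MDD(k) node its 2^k outgoing edges. A minimal sketch of that grouping follows; the helper names and the bit order inside a group (first bit is the most significant) are assumptions.

```python
def group_variables(bits, k):
    """Pack consecutive groups of k binary inputs into multi-valued symbols."""
    if len(bits) % k:
        raise ValueError("number of inputs must be a multiple of k")
    symbols = []
    for start in range(0, len(bits), k):
        value = 0
        for bit in bits[start:start + k]:
            value = (value << 1) | bit    # first bit of the group is the MSB
        symbols.append(value)
    return symbols


def outgoing_edges(k):
    """An MDD(k) node selects among 2**k edges with one k-bit symbol."""
    return 2 ** k


# Six binary inputs x1..x6 become three 4-valued symbols for an MDD(2).
print(group_variables([1, 0, 1, 1, 0, 1], 2), outgoing_edges(2))
```

With k = 2, a path over six binary inputs needs three node evaluations instead of six.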
![Page 17: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/17.jpg)
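Comparison of the BDT with the MDD
[Figure: (left) BDT with nodes X2<0.53?, X2<0.29?, X1<0.09?, X1<0.63?, X1<0.71?, Y/N edges, and leaves C1 and C2; (right) MDD with one X2 node and two X1 nodes, terminals C1 and C2, and edges labeled with interval bounds (<0.29, <0.53, <0.63, <0.71, <1.00)]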
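In the MDD, each variable is looked at once per path and a node picks its outgoing edge by the interval the value falls into. The sketch below stores the interval bounds per node and uses a binary search to select the edge. The wiring of the nodes, the leaf labels, and the names `MDD` and `classify_mdd` are assumptions, since the figure is not recoverable.

```python
from bisect import bisect_right

# node name -> (feature index, sorted interval bounds, child per interval)
# The last interval (up to 1.00) needs no explicit bound.
MDD = {
    "X2": (1, [0.29, 0.53], ["C1", "X1_mid", "X1_top"]),
    "X1_mid": (0, [0.63], ["C2", "C1"]),
    "X1_top": (0, [0.09, 0.71], ["C1", "C2", "C1"]),
}


def classify_mdd(features, root="X2"):
    """Follow one multi-way branch per variable until a class label is reached."""
    node = root
    while node in MDD:
        feature, bounds, children = MDD[node]
        # bisect_right returns how many bounds are <= the value,
        # which is the index of the interval the value falls into.
        node = children[bisect_right(bounds, features[feature])]
    return node


print(classify_mdd((0.50, 0.40)))   # features are (x1, x2)
```

Every path here visits at most two nodes, one per variable, regardless of how many thresholds each variable has.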
![Page 18: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/18.jpg)
# of Nodes
[Figure: feature-map partitions over X1 and X2 (both ranging from 0.00 to 1.00; thresholds 0.29 and 0.53 on X2, and 0.09, 0.63 and 0.71 on X1; regions labeled C1 and C2), shown for the BDT (left) and for the MDD (right)]
![Page 19: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/19.jpg)
Complexities of the BDT and the MDD

| | # Nodes | LPL |
|---|---|---|
| BDT | O(Σ|Xi|) | O(Σ|Xi|) |
| MDD | O(|Xi|^k) | O(n) |

The RF prefers shallow decision trees to avoid overfitting
![Page 20: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/20.jpg)
Random Forest using MDDs on an FPGA
![Page 21: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/21.jpg)
FPGA (Field Programmable Gate Array)
• Reconfigurable architecture
  • Look-up Table (LUT)
  • Configurable channel
• Advantages
  • Faster than a CPU
  • Dissipates less power than a GPU
  • Shorter design time than an ASIC
![Page 22: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/22.jpg)
Fully Pipelined Circuit
[Figure: the input X feeds Tree 1, Tree 2, ..., Tree b; their outputs (C1, C2, C1) go to a voter, which outputs C1]
![Page 23: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/23.jpg)
MUX-based Realization
![Page 24: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/24.jpg)
System Design Tool
[Figure: design tool flow with callouts ① to ④]
1. Behavior design + pragmas
2. Profile analysis
3. IP core generation by HLS
4. Bitstream generation by FPGA CAD tool
5. Middleware generation
↓
Automatically done
![Page 25: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/25.jpg)
Proposed Tool Flow
[Figure: tool flow. Training dataset and hyperparameters (by grid search) → scikit-learn → random forest → RF2AOC → host code and kernel code. Host code → gcc → binary → host PC. Kernel code → aoc → aocx → FPGA board. Tools: scikit-learn, Intel SDK for OpenCL]
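The slides do not show how RF2AOC turns a trained forest into kernel code. As a rough illustration of the general idea only, the sketch below emits C-like if-else source text from a small tree description. The tree encoding, the function names, and the generated function signature are all hypothetical and are not the RF2AOC tool or its output format.

```python
def emit_tree(node, indent=1):
    """Return C-like source lines for one decision tree.

    A leaf is an int class id; an internal node is a tuple
    (feature_index, threshold, subtree_if_less, subtree_otherwise).
    """
    pad = "    " * indent
    if isinstance(node, int):
        return [f"{pad}return {node};"]
    feature, threshold, less, otherwise = node
    lines = [f"{pad}if (x[{feature}] < {threshold}f) {{"]
    lines += emit_tree(less, indent + 1)
    lines.append(f"{pad}}} else {{")
    lines += emit_tree(otherwise, indent + 1)
    lines.append(f"{pad}}}")
    return lines


def emit_function(name, tree):
    """Wrap the generated if-else chain in a C-like function definition."""
    body = "\n".join(emit_tree(tree))
    return f"int {name}(const float *x) {{\n{body}\n}}"


# Hypothetical two-level tree: test x[1] first, then x[0].
source = emit_function("tree0", (1, 0.53, (0, 0.63, 1, 0), 0))
print(source)
```

Generating plain source text keeps the training side (Python) decoupled from the compilers that build the host binary and the FPGA image.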
![Page 26: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/26.jpg)
Experimental Results
![Page 27: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/27.jpg)
Comparison of the MDD-based RF with the BDT-based RF

| Name | BDT path len. (perform.) | BDT #nodes (mem.) | Max. path | MDD path len. (perform.) | MDD #nodes (mem.) |
|---|---|---|---|---|---|
| Dermatology | 720 | 676 | 15 | 322 | 118336 |
| Contraceptive Method | 600 | 1055 | 9 | 198 | 7360 |
| Glass Identification | 952 | 1260 | 10 | 268 | 17204 |
| Hayes-Roth | 480 | 577 | 5 | 73 | 448 |
| Hepatitis | 720 | 1040 | 15 | 357 | 145664 |
| Ionosphere | 1196 | 1077 | 20 | 381 | 671744 |
| Iris | 1056 | 777 | 4 | 199 | 517 |

Dataset: UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html
![Page 28: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/28.jpg)
Comparison of Platforms
• Implemented the RF on the following devices
  • CPU: Intel Core i7 650
  • GPU: NVIDIA GeForce GTX Titan
  • FPGA: Terasic DE5-NET
• Measured dynamic power, including the host PC
• Test bench: 10,000 random vectors
• Execution time includes the communication time between the host PC and the devices
[Figure: the GPU and the FPGA board]
![Page 29: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/29.jpg)
Comparison of Platforms
GPU: GeForce Titan @ 86 W; CPU: Xeon(R) E5607 @ 13 W; FPGA: Stratix V A7 @ 15 W

| Name | GPU LPS | GPU LPS/W | CPU LPS | CPU LPS/W | FPGA LPS | FPGA LPS/W |
|---|---|---|---|---|---|---|
| Dermatology | 336.2 | 3.9 | 211.6 | 16.3 | 3221.2 | 214.7 |
| Contraceptive Method | 521.9 | 6.1 | 286.4 | 22.0 | 10924.3 | 728.3 |
| Glass Identification | 726.7 | 8.5 | 587.5 | 45.2 | 6442.3 | 429.5 |
| Hayes-Roth | 1512.9 | 17.6 | 1165.5 | 89.7 | 12884.6 | 859.0 |
| Hepatitis | 739.1 | 8.6 | 662.7 | 51.0 | 8209.9 | 547.3 |
| Ionosphere | 821.0 | 9.5 | 595.9 | 45.8 | 9663.5 | 644.2 |
| Iris | 446.6 | 5.2 | 436.7 | 33.6 | 4831.7 | 322.1 |

LPS: #Looks Per Second
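The LPS/W columns are the LPS values divided by the power figure given for each platform. The sketch below reproduces that arithmetic and a per-dataset FPGA speedup for two rows of the table; the dictionary layout and the function names are assumptions, while the numbers are copied from the slide.

```python
POWER_W = {"GPU": 86.0, "CPU": 13.0, "FPGA": 15.0}

# LPS values copied from two rows of the table.
LPS = {
    "Dermatology": {"GPU": 336.2, "CPU": 211.6, "FPGA": 3221.2},
    "Iris": {"GPU": 446.6, "CPU": 436.7, "FPGA": 4831.7},
}


def lps_per_watt(dataset, platform):
    """Throughput per watt for one dataset on one platform."""
    return LPS[dataset][platform] / POWER_W[platform]


def fpga_speedup(dataset, baseline):
    """How many times more LPS the FPGA delivers than the baseline platform."""
    return LPS[dataset]["FPGA"] / LPS[dataset][baseline]


for name in LPS:
    print(name,
          round(lps_per_watt(name, "GPU"), 1),
          round(lps_per_watt(name, "CPU"), 1),
          round(fpga_speedup(name, "GPU"), 2))
```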
![Page 30: A Random Forest using a Multi-valued Decision Diagram on an FPGa](https://reader034.vdocuments.us/reader034/viewer/2022052117/5a6479c67f8b9a63568b4695/html5/thumbnails/30.jpg)
Conclusion
• Proposed the RF using MDDs
  • Reduced the path length
  • Increased the column multiplicity
    • # of nodes: O(|X|^k)
    • The shallow decision diagram is recommended to avoid overfitting
• Developed the high-level synthesis design flow toward the FPGA realization
  • 10.7x faster than the GPU
  • 14.0x faster than the CPU