MAERI Architecture and Implementation Details
Hyoukjun Kwon
Synergy Lab, Georgia Tech
Feb 08, 2019
http://synergy.ece.gatech.edu
Acknowledgement
• Mr. Charlie Hauck and Dr. Rishiyur Nikhil, for providing the BSV license for hands-on exercises
• Ananda Samajdar, Eric Qin, and Yehowshua Immanuel, for discussions and ASIC/FPGA synthesis
Outline
• Tool Flow of MAERI
• MAERI Implementation Details
• Using MAERI source code base
• Demo and exercises
Tool Flow of MAERI
• MAERI Input
  • Resource Description: NumPE 64, DistBW 4, GatrBW 4
  • Layer Description: K 16, C 3, R 3, S 3, Y 224, X 224
  • Building Block Library (BSV RTL): Adder Switch, Mult. Switch, Simple Switch, Cntl, …
• MAERI Framework
  • MAERI Front-end
  • BSV Compiler
• MAERI Outputs
  • RTL Generation: Verilog files
  • Simulation: cycle-accurate simulation reporting # Cycles, # Weight Distribution, # Input Distribution, # Local Communication, …
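The two input descriptions above can be pictured as plain records. The sketch below is only an illustration in Python, not the MAERI front-end's actual input format; the class names are hypothetical, and the reading of K, C, R, S, Y, X (filters, input channels, filter rows and columns, input rows and columns) is an assumption based on common convolution-layer notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceDescription:
    num_pe: int   # NumPE
    dist_bw: int  # DistBW: distribution bandwidth
    gatr_bw: int  # GatrBW: gather bandwidth


@dataclass(frozen=True)
class LayerDescription:
    k: int  # assumed: number of filters (output channels)
    c: int  # assumed: number of input channels
    r: int  # assumed: filter rows
    s: int  # assumed: filter columns
    y: int  # assumed: input rows
    x: int  # assumed: input columns

    def num_weights(self) -> int:
        """Total filter weights under the assumed dimension meanings."""
        return self.k * self.c * self.r * self.s

    def num_inputs(self) -> int:
        """Total input activations under the assumed dimension meanings."""
        return self.c * self.y * self.x


# Values taken from the slide.
resources = ResourceDescription(num_pe=64, dist_bw=4, gatr_bw=4)
layer = LayerDescription(k=16, c=3, r=3, s=3, y=224, x=224)

print(layer.num_weights())  # 432
print(layer.num_inputs())   # 150528
```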
Tool Flow of MAERI Simulation
[Figure: the MAERI-Compiler takes an mRNA-generated mapping and a target hardware config. as inputs and produces machine codes (switch configurations and tile configurations) for MAERI-Simulation.]
Bluespec System Verilog (BSV)
• A high-level hardware description language
• Generates fully synthesizable Verilog
• Inspired by Haskell and System Verilog
  • Strong type-checking system and polymorphism
  • System Verilog-like syntax
  • Intuitive module interfaces
• Based on “guarded atomic action” blocks
  • Provides coarse-grained description of parallel actions
For details, please refer to “BSV by Example” (http://csg.csail.mit.edu/6.S078/6_S078_2012_www/resources/bsv_by_example.pdf)
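To give a feel for “guarded atomic actions”, here is a toy model written in Python rather than BSV. Each rule is a guard plus an action; a rule may fire only when its guard holds, and its action updates the whole state in one step. The example is the classic GCD circuit with a swap rule and a subtract rule. The function name and the one-rule-per-step scheduling are simplifying assumptions for illustration; the BSV compiler can schedule several non-conflicting rules in the same clock cycle.

```python
def gcd_by_rules(x, y):
    """Compute gcd(x, y) by repeatedly firing guarded rules on the state (x, y)."""
    if x <= 0 or y < 0:
        raise ValueError("this sketch assumes x > 0 and y >= 0")

    # Each rule: (name, guard, action). Actions return the complete new state,
    # so every update is applied atomically.
    rules = [
        ("swap", lambda x, y: x > y and y != 0, lambda x, y: (y, x)),
        ("subtract", lambda x, y: x <= y and y != 0, lambda x, y: (x, y - x)),
    ]

    trace = []
    while True:
        enabled = [(name, action) for name, guard, action in rules if guard(x, y)]
        if not enabled:
            break  # no guard holds: the computation is done
        name, action = enabled[0]  # toy scheduler: fire one enabled rule per step
        x, y = action(x, y)
        trace.append(name)
    return x, trace


result, fired = gcd_by_rules(15, 6)
print(result)  # 3
print(fired)   # ['swap', 'subtract', 'subtract', 'swap', 'subtract', 'subtract']
```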
MAERI Implementation – Distribution Network
[Figure: distribution tree from Input Port 0 through Input Port 3 down to 16 multiplier switches (X)]
• # Multiplier Switches = 16
• Distribution Bandwidth = 4X
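As a back-of-the-envelope aid for the numbers on this slide, the sketch below estimates how many cycles a distribution takes. It assumes a bandwidth of 4X means four distinct values can enter the tree per cycle, and that a value shared by several multiplier switches is multicast in a single slot. Both are modeling assumptions, and the function names are hypothetical; this is not the BSV implementation.

```python
def min_distribution_cycles(num_unique_values, dist_bw):
    """Lower bound on cycles to inject num_unique_values at dist_bw values per cycle."""
    if dist_bw <= 0:
        raise ValueError("bandwidth must be positive")
    return -(-num_unique_values // dist_bw)  # ceiling division


def binary_tree_depth(num_leaves):
    """Levels in a balanced binary tree that reaches num_leaves leaves."""
    if num_leaves < 1:
        raise ValueError("need at least one leaf")
    return (num_leaves - 1).bit_length()


# 16 multiplier switches, each needing its own value, at bandwidth 4.
print(min_distribution_cycles(16, 4))  # 4
# One value multicast to all 16 switches.
print(min_distribution_cycles(1, 4))   # 1
print(binary_tree_depth(16))           # 4
```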
MAERI Implementation – Multiplier Network
[Figure: array of 16 multiplier switches (X)]
• # Multiplier Switches = 16
MAERI Implementation – Reduction Network
[Figure: 16 multiplier switches (X) feeding a tree of adder switches (+). Labels: Double Reduction switch, Single Reduction switch, Collection Bus 0, Collection Bus 1, Collection Bus 2, Collection Bus 3]
• # Multiplier Switches = 16
• Reduction Bandwidth = 4X
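The idea of reducing multiplier outputs through a tree of adders can be sketched functionally. The Python model below sums neighboring values level by level, as a plain binary adder tree would. It deliberately ignores the single and double reduction switches and the collection buses shown on the slide, and the function name is hypothetical.

```python
def tree_reduce(values):
    """Sum values pairwise, level by level; return (total, number_of_levels)."""
    level = list(values)
    if not level:
        return 0, 0
    depth = 0
    while len(level) > 1:
        # Add each adjacent pair; an unpaired last value passes through unchanged.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


# Outputs of 16 multiplier switches (made-up values 1..16).
products = list(range(1, 17))
print(tree_reduce(products))  # (136, 4)
```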
Source Code Directory Structure
maeri_code_hpca2019_tutorial
• src: MAERI core implementation
• scripts
• types: Custom BSV type definitions
• lib: Custom BSV libraries
• distribution_network: Distribution tree
• reduction_network: Augmented reduction tree
• multiplier_network: Multiplier switch and its array
• ALUs: Fixed point adder/multiplier
• maeri_accelerator: MAERI top module
• …
How to use MAERI front-end
• Changing design parameters
  • Modify AcceleratorConfig.bsv at the top directory
    • Distribution bandwidth
    • Reduction bandwidth
    • Number of multiplier switches
• Cycle-accurate simulation and Verilog code generation
  • ./Maeri -c : Compile a simulation
  • ./Maeri -r : Run compiled simulation
  • ./Maeri -w : Launch GTKWave for waveform analysis
  • ./Maeri -v : Generate Verilog code
  • ./Maeri -clean : Clean up intermediate files
• Simulation results example
  • Commands: “./Maeri -r” after “./Maeri -c”
• Waveform analysis
  • Commands: “./Maeri -r” and then “./Maeri -w”
• Verilog code generation
  • Commands: “./Maeri -v”
  • Verilog files are generated in “(Top_Directory)/Verilog”
  • Generated Verilog code is synthesizable!
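The command sequences above (compile, then run, then view waveforms) can be scripted. The helper below is a hypothetical Python wrapper, not part of the MAERI code base. It assumes the script is invoked as ./Maeri from the top directory and that the dash in the slide's flags is a plain hyphen (-c, -r, -w, -v, -clean).

```python
import subprocess

# Step names are made up for this sketch; flags are the ones listed on the slides,
# assuming the typographic dash stands for an ASCII hyphen.
MAERI_FLAGS = {
    "compile": "-c",
    "run": "-r",
    "waveform": "-w",
    "verilog": "-v",
    "clean": "-clean",
}


def maeri_command(step, script="./Maeri"):
    """Build the argument list for one front-end step."""
    return [script, MAERI_FLAGS[step]]


def run_steps(steps, cwd="."):
    """Run the given steps in order, stopping at the first failure."""
    for step in steps:
        subprocess.run(maeri_command(step), cwd=cwd, check=True)


# Show the commands for a simulate-then-inspect session without executing them.
for step in ["compile", "run", "waveform"]:
    print(" ".join(maeri_command(step)))
```

Calling run_steps(["compile", "run"]) from the top directory would then correspond to “./Maeri -c” followed by “./Maeri -r”.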
MAERI Synthesis and PnR
• Synthesis/PnR Environment
  • Technology: 28nm
  • Clock frequency: 1GHz
  • Design: 64 multiplier switches and 31 adder switches
  • Distribution Bandwidth: 32/16/8/4 data per cycle
  • Gather Bandwidth: 32/16/8/4 data per cycle
  • RTL Code: Verilog generated using MAERI code base
  • CAD Tool Chain: Synopsys Design Compiler, Cadence Innovus, PrimePower
Post-layout Area and Power
[Figure: area (um^2) versus Num PEs (16, 32, 64, 128, 256), broken down into Wire, RN, MN, and DN. Bandwidth: 8X]
Post-layout Area and Power
[Figure: area (um^2) versus Bandwidth (4X, 8X, 16X, 32X), broken down into Wire, RN, MN, and DN. NumPEs: 64]
FPGA Resource Usage
[Figure: LUT, FF, and DSP usage versus Num PEs (32, 64, 128, 256). Bandwidth: 8X]
* Based on Virtex 7 board, synthesis frequency: 50MHz
FPGA Resource Usage
[Figure: LUT, FF, and DSP usage versus Bandwidth (8X, 16X, 32X). NumPEs: 64]
* Based on Virtex 7 board, synthesis frequency: 50MHz
Demo
• Launching cycle-accurate simulations
• Modifying user configuration
• Compiling simulations
• Launching waveform analysis
• Generating Verilog files
Outline
• Tool Flow of MAERI
• Source code Structure
• Using MAERI source code base
• Demo
• Hands-on Exercises
Testbench Structure
[Figure: the testbench feeds Weights / Inputs into the generated simulation model (multiplier switches grouped into VN0 and VN1 with an adder tree) and collects Outputs.]
• AcceleratorConfig.bsv (Configuration File): Layer Size, NumMultSws, Dist. BW, Red. BW, …
• Interconnect Control: MAERI mapper-generated optimized configurations
  • Switch configurations
  • Tile configurations
  • Machine codes
  • RN_Config.vmh, Layer_Info.vmh
Testbench Dataflow (MAESTRO description)
let vnSize = sizeOf(R) × sizeOf(S)
let numVNs = floor(NumMultSwitches / vnSize)
• Temporal_Map (1, 1) C
• Spatial_Map (1, 1) K
• Temporal_Map (sizeOf(R), 1) Y
• Temporal_Map (sizeOf(S), 1) X
• Cluster (vnSize, L)
• Temporal_Map (sizeOf(S), sizeOf(S)) S
• Spatial_Map (1, 1) R
• Cluster (vnSize, P)
• Spatial_Map (1, 1) S
High weight filter parallelism
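The two definitions at the top of this slide are simple arithmetic, worked out below in Python for the layer used earlier in the deck (R = 3, S = 3). The function name is hypothetical, and the count of leftover multiplier switches is an inference from the floor in the formula, not a figure stated on the slides.

```python
def virtual_neurons(num_mult_switches, r, s):
    """Return (vnSize, numVNs, leftover switches) for a filter of r x s weights."""
    vn_size = r * s                              # vnSize = sizeOf(R) x sizeOf(S)
    num_vns = num_mult_switches // vn_size       # numVNs = floor(NumMultSwitches / vnSize)
    leftover = num_mult_switches - num_vns * vn_size  # switches outside any complete VN
    return vn_size, num_vns, leftover


for num_switches in (16, 32, 64):
    print(num_switches, virtual_neurons(num_switches, r=3, s=3))
# 16 (9, 1, 7)
# 32 (9, 3, 5)
# 64 (9, 7, 1)
```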
• Exercise #1: Compile a simulation with default, early, and late layers with 32 PEs (“./Maeri -c”). Run the simulations and compare the results.
• Exercise #2: Compile a simulation with 32 and 64 PEs using the default settings (“./Maeri -c”). Run the simulations and compare the results.
• Exercise #3: Compile a simulation with 4X, 8X, and 16X distribution bandwidth (fix the reduction bandwidth at 8X). Run the simulations and compare the results.
• Exercise #4: Compile a simulation with 4X and 8X reduction bandwidth (fix the distribution bandwidth at 8X). Run the simulations and compare the results.