10/09/2014
FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimisation in Lookup-Table Based FPGA Designs
Presented by Qiwei Jin
Overview
• The paper and the authors.
• Some background information.
• The algorithm in detail.
• Results and conclusion.
• Questions for discussion.
About the Paper
• Originally published in 1992; won the IEEE Circuits and Systems Society best paper award in 1994.
• 238 citations in total, 33 of them self-citations.
• The first polynomial-time algorithm for depth-optimal LUT mapping of general Boolean networks, in contrast to conventional library-based technology mapping, which is NP-hard.
• The algorithm is a key component in most commercial FPGA compilers.
• The authors published FlowMap-r and other more sophisticated algorithms in the same year or later, targeting both depth and area minimisation.
Jason Cong
• Chairman of the Computer Science Department, UCLA.
• Was an Assistant Professor in 1994 when this paper was published.
• Promoted to Associate Professor in the same year.
• His company Aplus was acquired by Magma in 2004 for "$13 million in stock, cash and incentives".
Picture borrowed from Jason Cong’s homepage
Yuzheng Ding
• Keeps a very low profile: no picture, no home page, not even on Facebook.
• Research assistant at UCLA, working towards a PhD, when this paper was published.
• May have left the university to work in industry (Mentor Graphics) after graduation.
• Still collaborating actively with Jason Cong; their latest joint paper was published in 2008.
Background
• FPGA (Field-Programmable Gate Array): Programmable hardware.
Xilinx Virtex 5 FPGA
Background Cont.
For more information, see Wayne Luk’s Custom Computing course.
Background Cont.
• FPGAs are essentially a bunch of wires and LUTs (Look-Up Tables) that can be configured to emulate the behaviour of a digital circuit.
• FPGAs are configured from a design written in a Hardware Description Language (HDL), such as VHDL.
• From the HDL, a netlist of LUTs can be generated automatically by a technology mapping algorithm (FlowMap!).
Background Cont.
ASIC = 4-Input 1-Output LUT (16 entries in total)
Addr. Value
0000 0
0001 0
... ...
1111 1
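As a rough illustration of the table above, a K-input LUT can be modelled as a list of 2^K stored bits indexed by the input address. The sketch below assumes the gate on the slide is a 4-input AND, which is consistent with the three rows shown but is not confirmed by the elided rows; `make_lut` and `lookup` are hypothetical helper names.

```python
from itertools import product

def make_lut(func, k):
    """Tabulate a k-input Boolean function into 2**k stored bits.
    Entry i holds the output for the address whose binary form is i."""
    return [int(func(*bits)) for bits in product((0, 1), repeat=k)]

def lookup(lut, *bits):
    """Read the LUT: the input bits form the address (first bit is the MSB)."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | b
    return lut[addr]

# Assumption: the ASIC gate on the slide is a 4-input AND.
and4 = make_lut(lambda a, b, c, d: a and b and c and d, 4)

print(len(and4))                 # number of stored entries
print(lookup(and4, 0, 0, 0, 1))  # address 0001
print(lookup(and4, 1, 1, 1, 1))  # address 1111
```

Any other 4-input function fits in the same 16 entries; only the stored bits change, which is why one LUT can replace a whole cluster of gates.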
Background Cont.
• Mappings from an ASIC design to FPGA LUTs are not necessarily one-to-one.
• The question is how to find the optimal mapping.
Background Cont.
• Trade-offs:
– Area (number of LUTs used)
– Depth (delay of the circuit)
• FlowMap focuses on depth optimisation
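To make the two metrics concrete, here is a minimal sketch that measures the area and depth of a mapped LUT network. The representation (a dict from each LUT's output name to its input names, with any name that is not a key treated as a primary input) and the function names are assumptions made for illustration.

```python
def area(luts):
    """Area: the number of LUTs used."""
    return len(luts)

def depth(luts):
    """Depth: the largest number of LUTs on any input-to-output path."""
    memo = {}

    def level(node):
        if node not in luts:          # primary input
            return 0
        if node not in memo:
            memo[node] = 1 + max((level(i) for i in luts[node]), default=0)
        return memo[node]

    return max((level(n) for n in luts), default=0)

# The same function mapped two ways (hypothetical example networks).
two_level = {"g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"]}
one_level = {"g3": ["a", "b", "c", "d"]}

print(area(two_level), depth(two_level))
print(area(one_level), depth(one_level))
```

With 2-input LUTs only the first network is possible; with 4-input LUTs the second packs everything into a single level.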
Depth Minimisation Example
The Key Idea of Depth Minimisation
• Try to pack as many gates from different levels into a single LUT as possible.
• The number of LUTs used (area) is not the primary concern.
• Conventional library-based technology mapping of a general network is equivalent to generating optimal code for expressions containing common subexpressions, hence NP-hard.*
• Conventional methods decompose the Boolean network into a forest of trees before mapping.
• FlowMap finds a depth-optimal mapping directly on the Boolean network in polynomial time.
Let’s see how.
* A. Aho, S. C. Johnson, “Optimal Code Generation for Expression Trees”, Journal of the ACM, 23, 3, 488-501 (1976).
Preliminaries
• Input(T)
• Cut (X, X̄)
• Node Cut Size
• Edge Capacity
• Edge Cut Size
• Whether a cut is K-feasible
• Height of a Cut
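The terms above can be made concrete with a small sketch. It uses the usual definitions, stated here as assumptions because the slide only lists the names: input(T) is the set of nodes outside T that feed a node in T; a cut (X, X̄) of the cone of a node t puts the inputs in X and t in X̄; its node cut size is the number of nodes in X with an edge into X̄; the cut is K-feasible when that size is at most K; and its height is the largest label found in X. Edge capacity and edge cut size are left out, and all function names are hypothetical.

```python
def inputs_of(T, fanins):
    """input(T): nodes outside T that drive at least one node in T."""
    return {u for v in T for u in fanins[v]} - set(T)

def node_cut_size(X_bar, fanins):
    """Number of nodes on the X side with an edge into X_bar."""
    return len(inputs_of(X_bar, fanins))

def is_k_feasible(X_bar, fanins, K):
    """A cut is K-feasible if X_bar could be implemented by one K-input LUT."""
    return node_cut_size(X_bar, fanins) <= K

def cut_height(X, label):
    """Height of the cut: the maximum label among the nodes in X."""
    return max(label[x] for x in X)

# Hypothetical network: fanins maps each node to the nodes that drive it.
fanins = {"a": [], "b": [], "c": [], "d": [],
          "g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"]}
label = {"a": 0, "b": 0, "c": 0, "d": 0, "g1": 1}

X, X_bar = {"a", "b", "c", "d", "g1"}, {"g2", "g3"}
print(sorted(inputs_of(X_bar, fanins)))
print(is_k_feasible(X_bar, fanins, 3), is_k_feasible(X_bar, fanins, 2))
print(cut_height(X, label))
```

In this example the cut is crossed at `c`, `d` and `g1`, so it is 3-feasible but not 2-feasible.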
The FlowMap Algorithm
• 2 Phases
– Node Labelling: compute, for every node t, the optimal depth of a LUT mapping solution for Nt (t together with all of its predecessors).
– LUT Mapping: generate the LUT network based on the labels from the first phase.
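The two phases can be sketched roughly as follows. This is a simplified illustration, not the authors' code: it assumes the network is a dict `fanins` listed in topological order, that every gate has at most K fanins, and that node names are sortable; `cone_of`, `feasible_cut` and `flowmap` are hypothetical names. The feasibility test uses a plain BFS augmenting-path max-flow on a node-split graph, stopping once the flow exceeds K.

```python
from collections import deque

def cone_of(t, fanins):
    """N_t: node t together with all of its predecessors."""
    seen, stack = {t}, [t]
    while stack:
        for u in fanins[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

def feasible_cut(cone, sink, fanins, K):
    """Return the (at most K) nodes feeding the sink side of a min cut, or None."""
    if any(not fanins[v] for v in sink):
        return None                       # a primary input was collapsed: no cut
    cap, adj, INF = {}, {}, K + 1         # residual capacities and adjacency

    def edge(a, b, c):
        cap[(a, b)] = c
        cap.setdefault((b, a), 0)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    for v in cone - sink:
        edge((v, "in"), (v, "out"), 1)    # node splitting: capacity 1 per node
        if not fanins[v]:
            edge("s", (v, "in"), INF)     # the source feeds every primary input
    for v in cone:
        head = "t" if v in sink else (v, "in")
        for u in fanins[v]:
            if u not in sink:
                edge((u, "out"), head, INF)

    flow = 0
    while True:
        prev, queue = {"s": None}, deque(["s"])
        while queue and "t" not in prev:  # BFS for an augmenting path
            a = queue.popleft()
            for b in adj.get(a, []):
                if b not in prev and cap[(a, b)] > 0:
                    prev[b] = a
                    queue.append(b)
        if "t" not in prev:
            break                         # max flow reached, and it is <= K
        flow += 1
        if flow > K:
            return None                   # more than K disjoint paths: infeasible
        b = "t"
        while prev[b] is not None:        # push one unit of flow along the path
            a = prev[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
    # Cut nodes: entry half reachable in the residual graph, exit half not.
    return sorted(v for v in cone - sink
                  if (v, "in") in prev and (v, "out") not in prev)

def flowmap(fanins, outputs, K):
    """Return (label, luts); luts maps each LUT output to its input nodes."""
    label, cut = {}, {}
    for t in fanins:                      # phase 1: labelling, topological order
        if not fanins[t]:
            label[t] = 0                  # primary inputs have label 0
            continue
        p = max(label[u] for u in fanins[t])
        cone = cone_of(t, fanins)
        sink = {t} | {v for v in cone if v != t and label[v] == p}
        cut[t], label[t] = feasible_cut(cone, sink, fanins, K), p
        if cut[t] is None:                # no K-feasible cut of height p - 1
            cut[t], label[t] = list(fanins[t]), p + 1
    luts, todo = {}, list(outputs)        # phase 2: build LUTs from the outputs
    while todo:
        t = todo.pop()
        if fanins[t] and t not in luts:
            luts[t] = cut[t]
            todo.extend(cut[t])
    return label, luts

net = {"a": [], "b": [], "c": [], "d": [],
       "g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"]}
print(flowmap(net, ["g3"], K=4))
print(flowmap(net, ["g3"], K=2))
```

On this small network a 4-input LUT absorbs all three gates (label 1 for `g3`), while 2-input LUTs need two levels (label 2). Refinements such as choosing the cut of maximum volume are not shown.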
Phase 1: Node Labelling
Phase 1: Node Labelling Cont.
Phase 2: LUT Mapping
Phase 2: LUT Mapping Cont.
The FlowMap Pseudocode
Enhancements
• Maximising Cut Volume During Mapping
• Postprocessing (flow-pack) Operations to reduce number of K-LUTs
Results
Conclusion
• The paper presents the first polynomial-time algorithm for depth-optimal LUT mapping of general Boolean networks, whereas conventional library-based technology mapping is NP-hard.
• Compared to other algorithms, FlowMap is able to reduce the depth of the LUT network by up to 7% and the number of LUTs by up to 50%.
Questions
• It is claimed that a minimum height K-feasible cut can be found in O(Km) time, where K is the number of inputs of a LUT and m is the number of edges in the network.
• But it is not clear to me how it is derived.
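One plausible reading of the bound, assuming the cut is found with the standard augmenting-path max-flow method on a network where every node has unit capacity: each augmenting path is found by a single graph search in O(m) time and raises the flow by exactly one unit. Since we only need to know whether the maximum flow exceeds K, the search can stop after at most K + 1 augmentations, which gives:

```latex
T_{\text{cut}} \;\le\; (K + 1) \cdot O(m) \;=\; O(Km)
```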
Questions Cont.
• It would be interesting to see the time taken to compute the mapping for the test cases with FlowMap vs. other algorithms.
• The test cases are generally small; it would be more convincing to see some large examples.