![Page 1: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/1.jpg)
Summary: Benefits of App-Specific Design
[Figure: design spectrum between "Speed, Efficiency" and "Flexibility, Programmability", with labels: H/W designs, General-Purpose Processors, General-Purpose Processors + ISA Extensions, Application-Specific Processor]
• Specialization limits the scope of a device's operation
  • Produces stronger properties and invariants
  • Results in higher-return optimizations
• Programmability preserves the flexibility afforded by GPPs
• A natural fit for embedded designs
  • Where application domains are more likely restrictive
  • Where cost and power are first-order concerns
• Overcomes growing silicon/architecture bottlenecks
  • Concentrated computation overcomes the dark silicon dilemma
  • Customized acceleration speeds up Amdahl's serial codes
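The point about Amdahl's serial codes can be made concrete with a small calculation. The sketch below is not from the slides: it applies the standard Amdahl's-law formula, and the function name `amdahl_speedup` and the sample numbers are illustrative assumptions.

```python
def amdahl_speedup(accelerated_fraction: float, accelerator_gain: float) -> float:
    """Overall speedup when a fraction of the run time is sped up by a given factor.

    accelerated_fraction: share of the original run time that the accelerator covers (0..1).
    accelerator_gain: speedup of that share (must be > 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_gain <= 0:
        raise ValueError("accelerator_gain must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerator_gain)


# Hypothetical numbers: 90% of the run time is accelerated 10x.
overall = amdahl_speedup(0.9, 10.0)
print(round(overall, 2))
```

The untouched 10% dominates the result, which is why accelerating the serial portion of a code matters so much.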
![Page 2: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/2.jpg)
A Take on Composable Customization
The work presented here is from Jason Cong's research group at UCLA
![Page 3: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/3.jpg)
TCA vs LCA
• Tightly Coupled Accelerator (TCA)
  • Extended instruction (e.g. MAC, SQRT)
  • Dedicated accelerator (e.g. FFT, MPEG4)
• Loosely Coupled Accelerator (LCA)
  • Acts independently of individual cores
  • Can be shared among cores/resources
  • Essentially the "accelerator" we normally see
![Page 4: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/4.jpg)
LCA
• Dedicated: the accelerator executes a program using a domain-specific component
  • Example: GPU
• Programmable accelerator: uses programmable fabric to build a customized accelerator
  • Example: FPGA-based accelerator
• Composable: combines accelerator building blocks into an accelerator
![Page 5: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/5.jpg)
Why bother composing accelerators?
• Does dark silicon provide extra area for incorporating more accelerators?
  • Yes… but how many accelerators do we really need?
• An LCA may be useless for new algorithms or new domains
• Essentially, it is not practical to build an accelerator for every single application
• An LCA
  • Is often under-utilized
  • Contains many replicated structures (e.g. FP ALUs, DMA engines, SPM)
    • These sit unused when the accelerator is unused
![Page 6: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/6.jpg)
How do we compose an accelerator?
• ABB (Accelerator Building Block)
  • An accelerator unit that performs a small, specific task
From CHARM: A Composable Heterogeneous Accelerator-Rich Microprocessor ISLPED’12
![Page 7: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/7.jpg)
Example of ABB Flow-Graph (Denoise)
[Figure: data-flow graph of the Denoise kernel with subtract (-), multiply (*), add (+), sqrt, and 1/x nodes. The graph is partitioned into four ABBs: ABB1: Poly, ABB2: Poly, ABB3: Sqrt, ABB4: Inv]
![Page 11: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/11.jpg)
Microarchitecture of CHARM
• ABB
  • Accelerator Building Blocks (ABBs)
  • Primitive components that can be composed into accelerators
• ABB islands
  • Multiple ABBs
  • Shared DMA controller, SPM and NoC interface
• ABC
  • Accelerator Block Composer (ABC)
  • Orchestrates the data flow between ABBs to create a virtual accelerator
  • Arbitrates requests from cores
• Other components
  • Cores
  • L2 banks
  • Memory controllers
![Page 12: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/12.jpg)
ABC Internal Design
• ABC sub-components
  • Resource Table (RT): keeps track of available/used ABBs
  • Composed LCA Table (CLT): eliminates the need to re-compose virtual LCAs
  • Task List (TL): queues the virtual LCA requests after they are broken into tasks of smaller data size
  • TLB: services and shares the translation requests from ABBs
  • Task Flow-Graph Interpreter (TFGI): breaks the virtual LCA DFG into ABBs
  • vLCA Composer (vLC): composes the virtual LCA from available ABBs
• Implementation
  • RT, CLT, TL and TLB are implemented using RAM
  • TFGI has a table of ABB types and an FSM that reads the task-flow graph and compares it against the table
  • vLC has an FSM that walks the CLT and RT and marks the available ABBs
[Figure: block diagram of the Accelerator Block Composer with Resource Table, Composed LCA Table, TLB, Task List, DFG Interpreter, and vLCA Composer. Interfaces: cores, to ABBs (allocate), from ABBs (done signal), ABBs (TLB service)]
![Page 13: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/13.jpg)
An Example of ABB Library (for Medical Imaging)
[Figure: ABB library and the internals of the Poly ABB, with elements labeled o0 to o6]
![Page 14: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/14.jpg)
Virtual LCA Composition Process
[Figure: four ABB islands. Available ABBs: island 1 has x, y; island 2 has x, w; island 3 has z, w; island 4 has y, z]
All islands have X, Y, Z, W. For simplicity, only the ABBs that are currently available are shown.
![Page 15: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/15.jpg)
Virtual LCA Composition Process
1. Core initiation
• The core sends the task description (the task flow-graph of the desired LCA) to the ABC, together with the polyhedral space for input and output
[Figure: four ABB islands with available ABBs as before. Task description: a flow graph over ABBs x, y, z with 10x10 input and output]
![Page 16: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/16.jpg)
Virtual LCA Composition Process
2. Task-flow parsing and task-list creation
• ABC parses the task-flow graph, breaks the request into a set of tasks with smaller data size, and fills the task list
• Needed ABBs: "x", "y", "z"
• With a task size of one 5x5 block, ABC generates 4 tasks
[Figure: four ABB islands with available ABBs as before. The task list is generated internally by the ABC]
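The arithmetic behind "4 tasks" is simply tiling the 10x10 space with 5x5 blocks. A minimal sketch, assuming a plain rectangular tiling (the slides do not say how the ABC actually splits the polyhedral space); `split_into_tasks` is a hypothetical helper name.

```python
def split_into_tasks(rows, cols, block_rows, block_cols):
    """Tile a rows x cols data space into blocks.

    Each task is (row_start, row_end, col_start, col_end) with half-open ranges.
    Edge blocks are clipped when the space is not a multiple of the block size.
    """
    tasks = []
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            tasks.append((r, min(r + block_rows, rows),
                          c, min(c + block_cols, cols)))
    return tasks


# The slide's example: 10x10 input/output, 5x5 task size.
task_list = split_into_tasks(10, 10, 5, 5)
print(len(task_list))  # 4
```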
![Page 17: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/17.jpg)
Virtual LCA Composition Process
3. Dynamic ABB mapping
• ABC uses a pattern-matching algorithm to assign ABBs to islands
• Fills the composed LCA table and the resource allocation table

| Island ID | ABB Type | ABB ID | Status |
|---|---|---|---|
| 1 | x | 1 | Free |
| 1 | y | 1 | Free |
| 2 | x | 1 | Free |
| 2 | w | 1 | Free |
| 3 | z | 1 | Free |
| 3 | w | 1 | Free |
| 4 | y | 1 | Free |
| 4 | z | 1 | Free |
![Page 18: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/18.jpg)
Virtual LCA Composition Process
3. Dynamic ABB mapping
• ABC uses a pattern-matching algorithm to assign ABBs to islands
• Fills the composed virtual LCA table and the resource allocation table

| Island ID | ABB Type | ABB ID | Status |
|---|---|---|---|
| 1 | x | 1 | Busy |
| 1 | y | 1 | Busy |
| 2 | x | 1 | Free |
| 2 | w | 1 | Free |
| 3 | z | 1 | Busy |
| 3 | w | 1 | Free |
| 4 | y | 1 | Free |
| 4 | z | 1 | Free |
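The change in the resource table from all Free to three Busy entries can be reproduced with a very simple allocator. The sketch below is an assumption, not the ABC's pattern-matching algorithm (which the slides do not describe): it scans the table in order and takes the first free ABB of each needed type. `ResourceEntry` and `compose_virtual_lca` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ResourceEntry:
    island_id: int
    abb_type: str
    abb_id: int
    busy: bool = False


def make_resource_table():
    # Available ABBs per island, copied from the slide's resource table.
    layout = [(1, "x"), (1, "y"), (2, "x"), (2, "w"),
              (3, "z"), (3, "w"), (4, "y"), (4, "z")]
    return [ResourceEntry(island, abb_type, 1) for island, abb_type in layout]


def compose_virtual_lca(table, needed_types):
    """First-fit stand-in for the ABC's mapping step (all-or-nothing)."""
    chosen = []
    for abb_type in needed_types:
        entry = next(
            (e for e in table
             if e.abb_type == abb_type and not e.busy
             and all(e is not c for c in chosen)),
            None,
        )
        if entry is None:
            return None  # cannot compose: nothing has been marked busy
        chosen.append(entry)
    for entry in chosen:
        entry.busy = True  # update the resource table
    return chosen


table = make_resource_table()
vlca = compose_virtual_lca(table, ["x", "y", "z"])
print([(e.island_id, e.abb_type) for e in vlca])  # [(1, 'x'), (1, 'y'), (3, 'z')]
```

With this scan order the result happens to match the slide: x and y on island 1 and z on island 3 become busy. A real mapper would presumably also weigh locality between islands.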
![Page 19: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/19.jpg)
Virtual LCA Composition Process
4. LCA cloning
• Repeat to generate more virtual LCAs if ABBs are available

| Island ID | ABB Type | ABB ID | Status |
|---|---|---|---|
| 1 | x | 1 | Busy |
| 1 | y | 1 | Busy |
| 2 | x | 1 | Busy |
| 2 | w | 1 | Free |
| 3 | z | 1 | Busy |
| 3 | w | 1 | Free |
| 4 | y | 1 | Busy |
| 4 | z | 1 | Busy |
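Cloning amounts to repeating the mapping step until some needed ABB type runs out. A minimal sketch under the same first-fit assumption (the real ABC uses an unspecified pattern-matching algorithm); `clone_virtual_lcas` is a hypothetical name and ABBs are modeled as `(island_id, abb_type)` tuples.

```python
def clone_virtual_lcas(free_abbs, needed_types):
    """Compose as many virtual LCAs as the free ABBs allow.

    free_abbs: list of (island_id, abb_type) tuples.
    Returns a list of virtual LCAs, each a list of the ABBs assigned to it.
    """
    if not needed_types:
        raise ValueError("needed_types must not be empty")
    free = list(free_abbs)
    lcas = []
    while True:
        remaining = list(free)
        picked = []
        for abb_type in needed_types:
            match = next((abb for abb in remaining if abb[1] == abb_type), None)
            if match is None:
                return lcas  # some type ran out: stop cloning
            remaining.remove(match)
            picked.append(match)
        free = remaining
        lcas.append(picked)


free_abbs = [(1, "x"), (1, "y"), (2, "x"), (2, "w"),
             (3, "z"), (3, "w"), (4, "y"), (4, "z")]
lcas = clone_virtual_lcas(free_abbs, ["x", "y", "z"])
print(len(lcas))  # 2
```

Two virtual LCAs fit; only the two w ABBs stay free, which matches the table on this slide.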
![Page 20: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/20.jpg)
Virtual LCA Composition Process
5. ABBs finishing task
• When ABBs finish, they signal the ABC. If the ABC has another task, it sends it; otherwise it frees the ABBs
[Figure: four ABB islands; a virtual LCA raises its DONE signal]

| Island ID | ABB Type | ABB ID | Status |
|---|---|---|---|
| 1 | x | 1 | Busy |
| 1 | y | 1 | Busy |
| 2 | x | 1 | Busy |
| 2 | w | 1 | Free |
| 3 | z | 1 | Busy |
| 3 | w | 1 | Free |
| 4 | y | 1 | Busy |
| 4 | z | 1 | Busy |
![Page 21: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/21.jpg)
Virtual LCA Composition Process
5. ABBs being freed
• When an ABB finishes, it signals the ABC. If the ABC has another task, it sends it; otherwise it frees the ABBs

| Island ID | ABB Type | ABB ID | Status |
|---|---|---|---|
| 1 | x | 1 | Busy |
| 1 | y | 1 | Busy |
| 2 | x | 1 | Free |
| 2 | w | 1 | Free |
| 3 | z | 1 | Busy |
| 3 | w | 1 | Free |
| 4 | y | 1 | Free |
| 4 | z | 1 | Free |
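The done-signal rule (send another task if one is queued, otherwise free the ABBs) can be sketched as a small event handler. This is a software illustration only; the names `handle_done`, `allocation`, and `busy_abbs` are hypothetical, and the real ABC implements this in hardware tables and FSMs.

```python
from collections import deque


def handle_done(vlca_name, allocation, task_list, busy_abbs):
    """React to a done signal from the ABBs of one virtual LCA.

    Returns the next task for that virtual LCA, or None after freeing its ABBs.
    """
    if task_list:
        return task_list.popleft()  # ABBs stay busy and start the next task
    busy_abbs.difference_update(allocation[vlca_name])  # no work left: free them
    return None


# Two virtual LCAs as (island, ABB type) sets, matching the busy entries above.
allocation = {
    "vLCA0": {(1, "x"), (1, "y"), (3, "z")},
    "vLCA1": {(2, "x"), (4, "y"), (4, "z")},
}
busy_abbs = set().union(*allocation.values())
task_list = deque([0, 1, 2, 3])  # the four 5x5 tasks
running = {name: task_list.popleft() for name in allocation}  # tasks 0 and 1 start

running["vLCA0"] = handle_done("vLCA0", allocation, task_list, busy_abbs)  # task 2
running["vLCA1"] = handle_done("vLCA1", allocation, task_list, busy_abbs)  # task 3
running["vLCA1"] = handle_done("vLCA1", allocation, task_list, busy_abbs)  # freed
after_first_free = set(busy_abbs)
running["vLCA0"] = handle_done("vLCA0", allocation, task_list, busy_abbs)  # freed
core_notified = not busy_abbs and not task_list  # everything done: signal the core
```

After the first free, only the ABBs on islands 1 and 3 remain busy, as in the table on this slide; once every ABB is free the ABC can notify the core.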
![Page 22: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/22.jpg)
Virtual LCA Composition Process
6. Core notified of end of task
• When the virtual LCA finishes, the ABC signals the core
[Figure: four ABB islands; the ABC raises DONE to the core]

| Island ID | ABB Type | ABB ID | Status |
|---|---|---|---|
| 1 | x | 1 | Free |
| 1 | y | 1 | Free |
| 2 | x | 1 | Free |
| 2 | w | 1 | Free |
| 3 | z | 1 | Free |
| 3 | w | 1 | Free |
| 4 | y | 1 | Free |
| 4 | z | 1 | Free |
![Page 23: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/23.jpg)
Limitation?
• Composing accelerators from building blocks still serves only a limited range of applications
  • So incorporate programmable fabric (PF)
![Page 24: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/24.jpg)
ASICs vs. Programmable Accelerators
ASICs
+ Fast
+ Small area (per accelerator)
+ Energy efficient
- Inflexible
- Need more as applications become diverse
Programmable (pretty much the opposite)
+ Reconfigurable
+ Small area (overall)
+ Good utilization
- Not efficient
- Slower than ASICs
![Page 25: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/25.jpg)
CAMEL Architecture (ISLPED’13)
![Page 26: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/26.jpg)
Challenges in incorporating PF
• Operating accelerators at different speeds (frequencies) can create a bottleneck, especially since PFs are slower than ABBs
![Page 27: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/27.jpg)
Rate-Matching Technique
• Duplicates slow accelerators to bring up throughput
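How many copies of a slow accelerator are needed follows from the ratio of throughputs. A minimal sketch, assuming throughput adds up linearly across duplicates; the function name and the rates are hypothetical, not CAMEL's actual numbers.

```python
import math


def replicas_for_rate_match(fast_rate: float, slow_rate: float) -> int:
    """Copies of the slow stage needed so that together they keep up with the fast stage.

    Rates are in any common unit (e.g. items per microsecond).
    """
    if fast_rate <= 0 or slow_rate <= 0:
        raise ValueError("rates must be positive")
    return max(1, math.ceil(fast_rate / slow_rate))


# Hypothetical: an ABB produces 800 items/us, a PF-mapped stage consumes 200 items/us.
print(replicas_for_rate_match(800, 200))  # 4
```

When the slow stage is already as fast as its producer the function returns 1, meaning no duplication is needed.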
![Page 28: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/28.jpg)
Runtime PF Allocations
![Page 29: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/29.jpg)
Compiler Framework
Note that if the kernel being mapped is too large for the total number of ASICs + PFs, the task flow graph is partitioned (in a way that minimizes data transfer)
![Page 30: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/30.jpg)
Result?
• 11.6X performance improvement and 13.9X energy savings over CHARM (up to over 30X from GP)
• Experimental results found the optimal percentage of PFs to be around 30% for application domains like medical imaging and navigation, and 50% for the commercial application domain and computer vision
• Still more work to be done!
![Page 31: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/31.jpg)
Brick and Mortar Silicon Manufacturing
Martha Mercaldi, Mark Oskin, Todd Austin, Karl Bohringer, Azita Emami
University of Washington, University of Michigan, Columbia University
January 11, 2007
![Page 32: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/32.jpg)
Declining ASIC Starts
[Figure: ASIC starts per year, 1997 to 2005; y-axis from 0 to 15,000. Source: DataQuest]
![Page 33: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/33.jpg)
Cost of Production
[Figure: production cost versus product volume for FPGA and standard cell ASIC. Source: www.edn.com]
![Page 34: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/34.jpg)
Cost of Production
[Figure: production cost versus product volume for FPGA, standard cell ASIC, and the Brick & Mortar goal. Source: www.edn.com]
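The cost-versus-volume trade-off can be illustrated with a simple linear model: a fixed non-recurring engineering (NRE) cost plus a per-unit cost. Both the model and every number below are assumptions for illustration; none come from the figure.

```python
def total_cost(nre: float, unit_cost: float, volume: float) -> float:
    """Linear cost model: fixed NRE cost plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_a, unit_a, nre_b, unit_b):
    """Volume above which option B (higher NRE, lower unit cost) beats option A."""
    if unit_a <= unit_b:
        raise ValueError("option B must have the lower unit cost")
    return (nre_b - nre_a) / (unit_a - unit_b)


# Hypothetical numbers: FPGA has no NRE but costs $100 per unit;
# a standard cell ASIC has $1,000,000 NRE and costs $10 per unit.
volume = break_even_volume(0, 100, 1_000_000, 10)
print(round(volume))  # 11111
```

Below the break-even volume the FPGA is cheaper; above it the ASIC wins. In this model, the Brick & Mortar goal would correspond to an NRE closer to the FPGA's with a unit cost closer to the ASIC's.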
![Page 35: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/35.jpg)
System on Chip
• Assemble the system out of pre-designed components
• Reduce design time
  • In 2004, one engineer cost $392,000 annually [www.design-reuse.com]
• Minimize bugs
  • Initial bugs can cost 50% of revenue [www.design-reuse.com]
[Figure: PXA27X processor. Source: www.tomshardware.co.uk]
![Page 36: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/36.jpg)
Brick and Mortar: Assembly
[Figure: a collection of bricks (uP, ETH, USB, VGA, DMA, PCI, 3DES) from which a subset is selected]
• Bricks: ASIC chips
  • Standard interface
  • Implement standard functions
  • e.g., USB, VGA controller, Ethernet NIC, PCI bridge, DMA, SRAM, 3DES, JPEG codec, RISC core
![Page 37: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/37.jpg)
Brick and Mortar: Assembly
[Figure: uP, ETH, and USB bricks being arranged side by side]
![Page 38: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/38.jpg)
Brick and Mortar: Assembly
[Figure: uP, ETH, and USB bricks bonded to an I/O cap]
![Page 39: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/39.jpg)
Brick and Mortar: Chip
![Page 40: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/40.jpg)
Brick and Mortar: I/O Pads
• One surface covered with I/O pads
• 25 µm x 25 µm / pad
• 2.5 Gbps / pad
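These pad figures give a rough upper bound on per-brick I/O bandwidth. The sketch assumes the pads tile the whole surface with no spacing and that every pad carries data (in practice some would be power or ground); the brick side lengths of 0.5 mm, 1 mm, and 2 mm are the ones listed with the brick-size table later in the deck.

```python
PAD_SIDE_UM = 25    # pad edge length from the slide, in micrometers
GBPS_PER_PAD = 2.5  # per-pad bandwidth from the slide


def max_pads(brick_side_um: int) -> int:
    """Upper bound on pads for a square brick, assuming gap-free tiling."""
    pads_per_side = brick_side_um // PAD_SIDE_UM
    return pads_per_side ** 2


def peak_bandwidth_gbps(brick_side_um: int) -> float:
    """Upper bound on aggregate bandwidth if every pad carried data."""
    return max_pads(brick_side_um) * GBPS_PER_PAD


for side_um in (500, 1000, 2000):
    print(side_um, max_pads(side_um), peak_bandwidth_gbps(side_um))
```

Under these assumptions even the smallest brick has 400 pads, which suggests why bandwidth is a limiting factor for only a few of the IP blocks.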
![Page 41: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/41.jpg)
Brick and Mortar: I/O Cap Interconnect
• I/O cap: ASIC chip implementing the inter-brick interconnect
  • Packet-switched network
  • FPGA-like, island-style configurable interconnect
![Page 42: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/42.jpg)
Brick and Mortar: Multiple Brick Sizes
[Figure: bricks of several sizes (uP, ETH, USB, FFT, JPEG codec, uP + 256K SRAM) bonded to an I/O cap]
![Page 43: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/43.jpg)
Brick and Mortar: Multiple Brick Sizes
| Function | Cite | Circuit Area (µm²) | Max. Circuit Freq. (MHz) | Min. Perf. (Mbps) | Valid Freq. Range (MHz), 0.25 mm² brick | Valid Freq. Range (MHz), 1.0 mm² brick | Valid Freq. Range (MHz), 4.0 mm² brick |
|---|---|---|---|---|---|---|---|
| Small Bricks | | | | | | | |
| USB 1.1 physical layer | [33] | 2,201 | 2941 | 12 | 2 - 2941 | No benefit | No benefit |
| Viterbi | [45] | 2,614 | 1961 | - | N/A - 1961 | No benefit | No benefit |
| VGA/LCD controller | [33] | 4,301 | 1219 | - | N/A - 1046 | N/A - 1219 | No benefit |
| WB DMA | [33] | 13,684 | 1163 | - | N/A - 521 | N/A - 1163 | No benefit |
| Memory controller | [33] | 29,338 | 952 | - | N/A - 843 | N/A - 952 | No benefit |
| Tri-mode Ethernet | [33] | 32,009 | 893 | 1000 | 125 - 893 | No benefit | No benefit |
| PCI bridge | [33] | 76,905 | 1042 | - | N/A - 610 | N/A - 1042 | No benefit |
| WB switch (8 master, 16 slave) | [33] | 81,073 | 1087 | - | N/A - 88 | N/A - 353 | N/A - 1087 |
| FPU | [33] | 85,250 | 1515 | - | N/A - 505 | N/A - 1515 | No benefit |
| DES | [33] | 85,758 | 1370 | 1000 | 16 - 1203 | 16 - 1370 | No benefit |
| 16K SRAM (single-port) | [6] | 195,360 | 2481 | - | N/A - 2481 | No benefit | No benefit |
| Aho-Corasick string match | [50] | 201,553 | 2481 | - | N/A - 1331 | N/A - 2481 | No benefit |
| RISC core (no FPU) / 8K cache | [33], [6] | 219,971 | 1087 | - | N/A - 1087 | No benefit | No benefit |
| 8K SRAM (dual-port) | [6] | 230,580 | 1988 | - | N/A - 1988 | No benefit | No benefit |
| Medium Bricks | | | | | | | |
| Triple DES | [33] | 294,075 | 1282 | 1000 | No space | 16 - 1282 | No benefit |
| FFT | [44] | 390,145 | 1220 | - | No space | N/A - 1220 | No benefit |
| JPEG decoder | [33] | 625,457 | 629 | - | No space | N/A - 629 | No benefit |
| 64K SRAM (single-port) | [6] | 682,336 | 2315 | - | No space | N/A - 2315 | No benefit |
| 32K SRAM (dual-port) | [6] | 733,954 | 1842 | - | No space | N/A - 1842 | No benefit |
| RISC core + 64K cache | [33], [6] | 864,017 | 1087 | - | No space | N/A - 1087 | No benefit |
| Large Bricks | | | | | | | |
| 256K SRAM (single-port) | [6] | 2,729,344 | 2315 | - | No space | No space | N/A - 2315 |
| 128K SRAM (dual-port) | [6] | 2,935,817 | 2882 | - | No space | No space | N/A - 2882 |
| RISC core + 256K cache | [33], [6] | 3,111,025 | 1087 | - | No space | No space | N/A - 1087 |

Table 1: IP Block Synthesis and Brick Assignment: This table shows the synthesis-produced area and timing characteristics of each brick-candidate IP block. Each block has been assigned to the smallest brick which met its area and bandwidth constraints. Note how some of the blocks that we have assigned to small bricks could take advantage of the increased I/O bandwidth afforded by larger bricks (indicated by the increased frequency range).
[Figure: the three brick sizes, with side lengths of 0.5 mm, 1 mm, and 2 mm]
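The area half of the assignment rule in the caption ("the smallest brick which met its area and bandwidth constraints") is easy to sketch. The code below checks area only and ignores the bandwidth constraint and any overhead inside a brick, so it is a simplification; `smallest_fitting_brick` is a hypothetical name, and the sample areas are taken from the table.

```python
# Brick areas in square micrometers (0.25, 1.0, and 4.0 square millimeters).
BRICK_AREAS_UM2 = {
    "small": 250_000,
    "medium": 1_000_000,
    "large": 4_000_000,
}


def smallest_fitting_brick(circuit_area_um2: int):
    """Return the smallest brick whose area covers the circuit, or None.

    Area-only simplification: the bandwidth constraint is not modeled.
    """
    for name, brick_area in sorted(BRICK_AREAS_UM2.items(), key=lambda kv: kv[1]):
        if circuit_area_um2 <= brick_area:
            return name
    return None


# Circuit areas copied from the table.
for block, area in [("USB 1.1", 2_201), ("Triple DES", 294_075),
                    ("256K SRAM", 2_729_344)]:
    print(block, smallest_fitting_brick(area))
```

For these three blocks the area check alone gives small, medium, and large, the same groups the table lists them under.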
![Page 44: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/44.jpg)
Advantages of Brick and Mortar
• Low manufacturing costs
  • No custom masks
  • Small design & verification costs
  • Low-cost assembly system (fluidic self-assembly)
• ASIC-like degree of circuit integration
• Heterogeneous processes for bricks
• Exclude defective components from assembly
• Leverage process variation for high-performance designs
![Page 45: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/45.jpg)
Preliminary Performance Analysis
• Three 16-way CMP designs
• Only 8% - 36% slowdown relative to ASIC
![Page 46: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/46.jpg)
Why RAMP?
• Once a design has been tested and validated on the RAMP platform:
  • Less costly, per unit, than FPGAs (or boards)
  • Higher speed than FPGAs
![Page 47: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/47.jpg)
Conclusion
• Systems built out of ASIC bricks bonded to an interconnect ASIC
• A viable, low-cost technology if properly architected:
  • Appropriate brick functions
  • General, flexible interconnect
  • Efficient inter-ASIC communication
![Page 48: Summary: Benefits of App-Specific Design...Summary: Benefits of App-Specific Design Speed, Efficiency Flexibility, Programmability H/W designs General Purpose Processors General Purpose](https://reader033.vdocuments.us/reader033/viewer/2022041522/5e2eed80cf57bf7c1876527f/html5/thumbnails/48.jpg)
Questions & Discussion