motivation and objective
DESCRIPTION
MOTIVATION AND OBJECTIVE. Discrete Signal Transforms (DSTs) DFT, DCT: major performance component in many applications Hardware accelerated but at high area cost Example: 4P DFT formulation and others Distributed (dedicated) hardware architectures (DHAs) - PowerPoint PPT PresentationTRANSCRIPT
1
MOTIVATION AND OBJECTIVE
Discrete Signal Transforms (DSTs)– DFT, DCT: major performance component in many applications– Hardware accelerated but at high area cost– Example: 4P DFT formulation and others
Distributed (dedicated) hardware architectures (DHAs)– Partitioning plays key role in performance and resource
optimization.
Partitioning beyond multi-device– Multi-core GPPs: IBM CELL BE, Intel Core Duo– System-on-Chip: Network-On-Chip
Need tools to aid in partition exploration, mapping, implementation design automation
Can we improve partitioning by introducing high-level DST considerations?
DST
Partitioning
Distributed Hardware Architecture
M0
D0
M1
D1
Mk-1
Dk-1
Crossbar
M0
D0
M1
D1
Mk-1
Dk-1
Crossbar
2
DMAGIC METHODOLOGY
DMAGIC = DST Mapping using Algorithmic and Graph Interaction and Computation
2
FormulationManipulator
FormulationTo DFG
Heuristic Control
Partition/Placement
Estimators
High-level partition solution
KPAFormulation
DFGCost andIndicators
RuleSelection
KPA DSTFormulation
ArchitecturalDescription
HypergraphDST Formulation
TOOLS
Weighted DFG with level information
Operator matrices: Identity, Transform, Permutation, Unitary, Unitary Transpose, Twiddle
KPA operations: Tensor product (), Direct Sum ( ), Matrix Multiplication
KTG2 4 2 2 2( )( )....F I I F I
Graph partitioning/placement algorithm
Kronecker to Graph Tool
P/P inspired by Kernighan Lin bipartition heuristic
Extended to k-way partitioning for heterogeneous channels
Cost function sensible to DHA main concerns
DST graphical considerations heuristics
0 1 1, , , mP
weight of channel iii i WR
required comm. through i
Cost Function
3
FORMULATION EXPLORATION
4
Use DST rules to explore space of equivalent formulations in search for one that better suits the target architecture.
Combinatorial explosion of the solution space.
Find rules amenable to hardware implementation.
Conducted experiments to assess the impact of transformations on partition quality. Results used to devise exploration strategy.
Objective:
Challenges:
Approach:
Experiments:
Effect of inter-column permutations Have effect on solution quality, yet hard to establish heuristic.Effect of node granularity
Effect of breakdown strategy using DST decomposition rule
10
5 5
4 1
2 2
10
2 8
4 4
Greedy ‘top-down’ formulation exploration using breakdown strategy
Observations on exhaustive small sized transforms
5
RESULTS
Latency Reduction: Average = 23.3%, Peak = 34.1% Run-Time Reduction: Average = 98.0%, Peak = 99.9%
Latency vs. FFT size
FFT size16 32 64 128 256
Late
ncy
(c-s
teps
)
1
10
100
1000
10000
[Srinivasan01]DMAGIC
Run time vs. FFT size
FFT size16 32 64 128 256
Run
-tim
e (s
econ
ds)
0.01
0.1
1
10
100
1000[Srinivasan01]DMAGIC
Latency vs. DCT size
DCT size16 32 64 128 256
Late
ncy
(c-s
teps
)
1
10
100
1000
10000
[Srinivasan01]DMAGIC
Run time vs. DCT size
DCT size16 32 64 128 256
Run
-tim
e (s
econ
ds)
0.01
0.1
1
10
100
1000[Srinivasan01]DMAGIC
53 4585
116226 175
501373
1047847
8560
13589
213161
389 330
749 595
3.4
12.3
41.189.4
196.4
0.04 0.06
13
0.06
1.6
3.7
17.5 23.5
89190
0.010.02
0.16
0.9
12.3
6
CONCLUSIONS AND CONTRIBUTIONS
Multiple opportunities to improve partitioning by taking advantage of DST and DHA features.
– Graph level: regularity of permutations and operations.– Graph partitioning and area estimator can be made more sensible to DHA and DST
concerns– Algorithmic level
– Reformulation has significant impact on partition quality– Improvements over generic methodology
Latency reduction (23.3% avg, 34.1% max), Runtime (98.8% avg, 99.9% max)
Tools:– KTG: automated and extensible methodology for conversion of Kronecker product algebra
(KPA) formulations into DFGs– Architectural model and high-level estimator for the implementation of distributed DSTs– Graph partitioning heuristic for k-way for DSTs to DHAs
New heuristic for exploring DST formulation space
New arbitrary decomposition DCT formulation