motivation and objective

6
1 MOTIVATION AND OBJECTIVE Discrete Signal Transforms (DSTs) DFT, DCT: major performance component in many applications Hardware accelerated but at high area cost Example: 4P DFT formulation and others Distributed (dedicated) hardware architectures (DHAs) Partitioning plays key role in performance and resource optimization. Partitioning beyond multi-device Multi-core GPPs: IBM CELL BE, Intel Core Duo System-on-Chip: Network-On-Chip Need tools to aid in partition exploration, mapping, implementation design automation Can we improve partitioning by introducing high-level DST considerations? DST Partitionin g Distributed Hardware Architecture M 0 D 0 M 1 D 1 M k-1 D k-1 M 0 D 0 M 1 D 1 M k-1 D k-1 C rossbar

Upload: kenyon-oneill

Post on 31-Dec-2015

18 views

Category:

Documents


0 download

DESCRIPTION

MOTIVATION AND OBJECTIVE. Discrete Signal Transforms (DSTs) DFT, DCT: major performance component in many applications Hardware accelerated but at high area cost Example: 4P DFT formulation and others Distributed (dedicated) hardware architectures (DHAs) - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: MOTIVATION AND OBJECTIVE

1

MOTIVATION AND OBJECTIVE

Discrete Signal Transforms (DSTs)– DFT, DCT: major performance component in many applications– Hardware accelerated but at high area cost– Example: 4P DFT formulation and others

Distributed (dedicated) hardware architectures (DHAs)– Partitioning plays key role in performance and resource

optimization.

Partitioning beyond multi-device– Multi-core GPPs: IBM CELL BE, Intel Core Duo– System-on-Chip: Network-On-Chip

Need tools to aid in partition exploration, mapping, implementation design automation

Can we improve partitioning by introducing high-level DST considerations?

DST

Partitioning

Distributed Hardware Architecture

M0

D0

M1

D1

Mk-1

Dk-1

Crossbar

M0

D0

M1

D1

Mk-1

Dk-1

Crossbar

Page 2: MOTIVATION AND OBJECTIVE

2

DMAGIC METHODOLOGY

DMAGIC = DST Mapping using Algorithmic and Graph Interaction and Computation

2

FormulationManipulator

FormulationTo DFG

Heuristic Control

Partition/Placement

Estimators

High-level partition solution

KPAFormulation

DFGCost andIndicators

RuleSelection

KPA DSTFormulation

ArchitecturalDescription

HypergraphDST Formulation

Page 3: MOTIVATION AND OBJECTIVE

TOOLS

Weighted DFG with level information

Operator matrices: Identity, Transform, Permutation, Unitary, Unitary Transpose, Twiddle

KPA operations: Tensor product (), Direct Sum ( ), Matrix Multiplication

KTG2 4 2 2 2( )( )....F I I F I

Graph partitioning/placement algorithm

Kronecker to Graph Tool

P/P inspired by Kernighan Lin bipartition heuristic

Extended to k-way partitioning for heterogeneous channels

Cost function sensible to DHA main concerns

DST graphical considerations heuristics

0 1 1, , , mP

weight of channel iii i WR

required comm. through i

Cost Function

3

Page 4: MOTIVATION AND OBJECTIVE

FORMULATION EXPLORATION

4

Use DST rules to explore space of equivalent formulations in search for one that better suits the target architecture.

Combinatorial explosion of the solution space.

Find rules amenable to hardware implementation.

Conducted experiments to assess the impact of transformations on partition quality. Results used to devise exploration strategy.

Objective:

Challenges:

Approach:

Experiments:

Effect of inter-column permutations Have effect on solution quality, yet hard to establish heuristic.Effect of node granularity

Effect of breakdown strategy using DST decomposition rule

10

5 5

4 1

2 2

10

2 8

4 4

Greedy ‘top-down’ formulation exploration using breakdown strategy

Observations on exhaustive small sized transforms

Page 5: MOTIVATION AND OBJECTIVE

5

RESULTS

Latency Reduction: Average = 23.3%, Peak = 34.1% Run-Time Reduction: Average = 98.0%, Peak = 99.9%

Latency vs. FFT size

FFT size16 32 64 128 256

Late

ncy

(c-s

teps

)

1

10

100

1000

10000

[Srinivasan01]DMAGIC

Run time vs. FFT size

FFT size16 32 64 128 256

Run

-tim

e (s

econ

ds)

0.01

0.1

1

10

100

1000[Srinivasan01]DMAGIC

Latency vs. DCT size

DCT size16 32 64 128 256

Late

ncy

(c-s

teps

)

1

10

100

1000

10000

[Srinivasan01]DMAGIC

Run time vs. DCT size

DCT size16 32 64 128 256

Run

-tim

e (s

econ

ds)

0.01

0.1

1

10

100

1000[Srinivasan01]DMAGIC

53 4585

116226 175

501373

1047847

8560

13589

213161

389 330

749 595

3.4

12.3

41.189.4

196.4

0.04 0.06

13

0.06

1.6

3.7

17.5 23.5

89190

0.010.02

0.16

0.9

12.3

Page 6: MOTIVATION AND OBJECTIVE

6

CONCLUSIONS AND CONTRIBUTIONS

Multiple opportunities to improve partitioning by taking advantage of DST and DHA features.

– Graph level: regularity of permutations and operations.– Graph partitioning and area estimator can be made more sensible to DHA and DST

concerns– Algorithmic level

– Reformulation has significant impact on partition quality– Improvements over generic methodology

Latency reduction (23.3% avg, 34.1% max), Runtime (98.8% avg, 99.9% max)

Tools:– KTG: automated and extensible methodology for conversion of Kronecker product algebra

(KPA) formulations into DFGs– Architectural model and high-level estimator for the implementation of distributed DSTs– Graph partitioning heuristic for k-way for DSTs to DHAs

New heuristic for exploring DST formulation space

New arbitrary decomposition DCT formulation