The Era of Many-Module SoC: Revisiting the NoC Mapping Problem

Isask’har (Zigi) Walter, Israel Cidon, Avinoam Kolodny, Daniel Sigalov
Technion – Israel Institute of Technology
December, 2009


Page 1: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

[Title-slide graphic: modules, each attached to a router (R).]

Technion – Israel Institute of Technology

The Era of Many-Module SoC: Revisiting the NoC Mapping Problem

Isask’har (Zigi) Walter, Israel Cidon, Avinoam Kolodny, Daniel Sigalov

December, 2009

Page 2: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

SoC Revolution

[Figure: six processing elements (PE1 to PE6) connected as a bus-based system and as a NoC-based system with routers (R).]

Page 3: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

SoC Evolution

[Figure: NoC-based SoCs drawn as grids of routers (R).]

Page 4: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Processor Evolution

[Figure: single-core (CPU, cache), dual-core (CPU1, CPU2, cache), and quad-core (CPU1 to CPU4, caches) processors.]

Page 5: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

The Era of Many-Module SoC

What will such chips look like?

Most likely:
  Power still important
  Highly parallel
  IP reuse
  Ease of design and verification

[Figure: a certainty scale running from "High certainty" to "Totally unknown", with the labels "Large number of modules", "NoC interconnect", and "Applications".]

Page 6: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Future SoCs - Observation #1

Special-purpose cores replace general-purpose processors
  Power considerations
Processing pipes are getting longer

[Figure: tasks (Task1 to Task5) and memories (MEM) shown for a general-purpose CPU and for specialized cores (pre-processor, DSP, CPU, GPU).]

Page 7: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Future SoCs - Observation #2

[Figure: grids of routers (R) illustrating the options below, with a question mark.]

Large diversity: all modules are unique
Highly regular: replicated cores
Classes of standard modules (DSP, HW accelerators, cache banks, etc.)

Page 8: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

The Era of Many-Module SoC

Observation #1
  Increased use of specialized cores
  Pipes are getting longer
Observation #2
  Replication of processing elements

How is the design flow affected?
  This work: mapping of the NoC

Page 9: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Outline

The Era of Many-Module SoC
Revisiting the Mapping Problem
Cross-Entropy Optimization
Evaluation

Page 10: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

NoC Mapping

Given
  Traffic pattern(s): a set (or sets) of pair-wise bandwidth requirements and timing constraints
  Routing
  Topology
Goal
  Find an efficient mapping of cores to tiles

[Figure: several alternative placements of PE1 to PE9 on the tiles of a mesh.]

Page 11: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Mapping Optimization

An important design step
  Mapping affects power and performance!
A difficult problem!
  Often heuristic algorithms are used
Common optimization goals
  Minimize (dynamic) power
  Minimize power + maximize performance
  Minimize power subject to performance constraints

Page 12: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Modeling

Typical modeling
  Power and latency proportional to distance
Cost function:

$$\mathrm{Cost}(\pi) = \sum_{l \in L} P_l \cdot BW_l = \sum_{i,j \in N} b_{i,j} \cdot \mathrm{Dist}(i,j)$$

Page 13: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

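Calculating Mapping Cost

$$\mathrm{Cost}(\pi) = \sum_{l \in L} P_l \cdot BW_l = \sum_{i,j \in N} b_{i,j} \cdot \mathrm{Dist}(i,j)$$

[Figure: PE1 to PE6 on a mesh under Mapping π1 and Mapping π2, with two flows: one between PE2 and PE6 (bandwidth 30) and one between PE4 and PE3 (bandwidth 100).]

$$\mathrm{Cost}(\pi_1) = bw(PE_2, PE_6) \cdot \mathrm{Dist}(PE_2, PE_6) + bw(PE_4, PE_3) \cdot \mathrm{Dist}(PE_4, PE_3)$$

$$\mathrm{Cost}(\pi_1) = 30 \cdot \mathrm{Dist}(PE_2, PE_6) + 100 \cdot \mathrm{Dist}(PE_4, PE_3)$$

$$\mathrm{Cost}(\pi_1) = 30 \cdot 2 + 100 \cdot 3 = 360$$

$$\mathrm{Cost}(\pi_2) = 30 \cdot 2 + 100 \cdot 2 = 260$$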
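The cost calculation on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiles are taken to form a 2x3 mesh, Dist is taken to be the Manhattan (hop) distance, and the row-major layout used for π1 is a guess at the figure. The names `manhattan`, `mapping_cost`, and `pi_1` are hypothetical.

```python
# Assumption: Dist is the Manhattan (hop) distance between tiles of a mesh.
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(placement, flows):
    """Sum of bandwidth * distance over all flows.

    placement: dict core name -> (row, col) tile
    flows: list of (core, core, bandwidth)
    """
    return sum(bw * manhattan(placement[s], placement[d]) for s, d, bw in flows)


# Flows from the slide: 30 between PE2 and PE6, 100 between PE4 and PE3.
flows = [("PE2", "PE6", 30), ("PE4", "PE3", 100)]

# Assumed layout for mapping pi_1: row-major placement on a 2x3 mesh.
pi_1 = {
    "PE1": (0, 0), "PE2": (0, 1), "PE3": (0, 2),
    "PE4": (1, 0), "PE5": (1, 1), "PE6": (1, 2),
}

cost_pi_1 = mapping_cost(pi_1, flows)
print(cost_pi_1)  # 30*2 + 100*3 = 360
```

Under these assumptions the result matches Cost(π1) = 360. The layout of π2 cannot be recovered from the transcript, so it is not reproduced here.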

Page 14: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

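Motivation - Example #1

[Figure: communication graph of PE1, PE2, PE3, PE4, MEM1, and MEM2; every edge has weight 1.]

Optimal mapping (π1):

[Figure: placement of PE1, MEM2, PE2, PE3, PE4, and MEM1 on the mesh.]

$$\mathrm{Cost}(\pi_1) = \sum_{i,j \in N} b_{i,j} \cdot \mathrm{Dist}(i,j) = 9$$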

Page 15: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Motivation - Example #1 (cont.)

[Figure: communication graph of PE1 to PE4 and a node labeled "2*MEM".]

Let the mapping algorithm assign the flows!

Optimal mapping (π2):

[Figure: placements of PE1 to PE4, MEM1, and MEM2 for the two mappings.]

Cost(π1) = 9
Cost(π2) = 7

Page 16: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Motivation - Example #1 (cont.)

[Figure: communication graph of PE1 to PE4, MEM1, and MEM2 with unit edge weights, and the placement for mapping π2.]

Cost(π2) = 7

The mapping algorithm should be aware of replicated modules!

Page 17: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Classic Performance Constraints

Pair-wise point-to-point requirements
For example, in a 4-module system:

        PE1   PE2   PE3   PE4
  PE1
  PE2    2
  PE3          1
  PE4          1     1

Page 18: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Motivation - Example #2

[Figure: PE1, PE2, PE3, and PE4 with Stream 1 and Stream 2 marked.]

  Stream ID   PEs                Timing Requirement
  Stream 1    PE1→PE2→PE3→PE4    4
  Stream 2    PE2→PE4            1

Page 19: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Example #2 – Pair-wise req.

        PE1   PE2   PE3
  PE2    2
  PE3          1
  PE4          1     1

[Figure: the four flows labeled Req=2, Req=1, Req=1, and Req=1, with several attempted placements of PE1 to PE4.]

No feasible mapping!

Page 20: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Application-Level Requirements

  Stream ID   PEs                Requirement
  Stream 1    PE1→PE2→PE3→PE4    4
  Stream 2    PE2→PE4            1

[Figure: a placement of PE1 to PE4 with Stream 1 and Stream 2 marked; labels Req=1, Req=2, Req=1, Req=1.]

A feasible mapping does exist!

It’s better to work with the application-level requirements
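The difference between the two formulations of Example #2 can be checked by brute force. The sketch below rests on assumptions that are not stated on the slides: the four PEs sit on a 2x2 mesh, each requirement is a bound on hop count, and the pair-wise table reads 2 for PE1-PE2 and 1 for PE2-PE3, PE3-PE4, and PE2-PE4. All names are hypothetical.

```python
from itertools import permutations

# Assumptions: a 2x2 mesh, and each timing requirement bounds the hop count.
TILES = [(0, 0), (0, 1), (1, 0), (1, 1)]
CORES = ["PE1", "PE2", "PE3", "PE4"]


def hops(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Pair-wise requirements (assumed reading of the table): pair -> max hops.
PAIRWISE = {
    ("PE1", "PE2"): 2,
    ("PE2", "PE3"): 1,
    ("PE3", "PE4"): 1,
    ("PE2", "PE4"): 1,
}

# Application-level requirements: (path of cores, max total hops).
STREAMS = [(["PE1", "PE2", "PE3", "PE4"], 4), (["PE2", "PE4"], 1)]


def meets_pairwise(place):
    return all(hops(place[a], place[b]) <= req for (a, b), req in PAIRWISE.items())


def meets_streams(place):
    for path, req in STREAMS:
        total = sum(hops(place[a], place[b]) for a, b in zip(path, path[1:]))
        if total > req:
            return False
    return True


# Enumerate all 24 placements of the four cores on the four tiles.
placements = [dict(zip(CORES, p)) for p in permutations(TILES)]
pairwise_ok = [p for p in placements if meets_pairwise(p)]
streams_ok = [p for p in placements if meets_streams(p)]

print(len(pairwise_ok), len(streams_ok))  # prints: 0 8
```

The pair-wise version requires PE2, PE3, and PE4 to be mutually adjacent, which no mesh allows. The end-to-end version lets the slack move between hops, so placements with PE2 next to PE4 and a two-hop PE2 to PE3 link become acceptable.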

Page 21: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

This Work

Find efficient mappings by extending the formulation of the mapping problem
  Adding degrees of freedom
Degree of freedom #1
  Leverage the existence of replicated modules
Degree of freedom #2
  Replace p2p constraints with end-to-end, application-level requirements

Page 22: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Modifying the Formulation (1)

Leverage the existence of replicated modules
  Allow the mapping algorithm to allocate flows to the best replicated module

  Flow             BW    Time Req.
  PE1→DSP3         100   3
  PE2→DSP4         200   12
  PE2→SRAM1        100   15
  PE3→SRAM2        100   5
  …                …     …

  Flow             BW    Time Req.
  PE1→<ANY DSP>    100   3
  PE2→<ANY DSP>    200   12
  PE2→<ANY SRAM>   100   15
  PE3→<ANY SRAM>   100   5
  …                …     …
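One way to picture the effect of class-addressed flows is to bind each flow to the nearest replica for a given placement and compare the cost with the fixed destinations. This is a simplified sketch, not the method of the talk, which leaves the choice to the mapping algorithm itself. It assumes Manhattan hop distance, ignores replica capacity and the timing column, and uses a made-up placement; only the flow table comes from the slide.

```python
# Assumptions: Manhattan hop distance on a mesh; each "<ANY class>" flow is bound
# greedily to the nearest replica; capacity and timing limits are not modeled.
def hops(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def bind_flows(placement, classes, flows):
    """Return (total cost, chosen destinations) for class-addressed flows.

    placement: core -> (row, col); classes: class name -> list of replica cores;
    flows: list of (source core, destination class, bandwidth).
    """
    total, chosen = 0, []
    for src, cls, bw in flows:
        best = min(classes[cls], key=lambda r: hops(placement[src], placement[r]))
        total += bw * hops(placement[src], placement[best])
        chosen.append((src, best))
    return total, chosen


# Hypothetical placement on a 3x3 mesh (not taken from the slides).
placement = {
    "PE1": (0, 0), "DSP3": (0, 1),
    "SRAM1": (1, 0), "PE3": (1, 1), "DSP4": (1, 2),
    "SRAM2": (2, 1), "PE2": (2, 2),
}
classes = {"DSP": ["DSP3", "DSP4"], "SRAM": ["SRAM1", "SRAM2"]}

# Flow table from the slide, first with fixed destinations, then with classes.
fixed = [("PE1", "DSP3", 100), ("PE2", "DSP4", 200),
         ("PE2", "SRAM1", 100), ("PE3", "SRAM2", 100)]
flexible = [("PE1", "DSP", 100), ("PE2", "DSP", 200),
            ("PE2", "SRAM", 100), ("PE3", "SRAM", 100)]

fixed_cost = sum(bw * hops(placement[s], placement[d]) for s, d, bw in fixed)
flex_cost, chosen = bind_flows(placement, classes, flexible)
print(fixed_cost, flex_cost)  # prints: 700 500
```

For this made-up placement, the saving comes from PE2 reaching SRAM2 in one hop instead of SRAM1 in three.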

Page 23: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Modifying the Formulation (2)

Replace p2p constraints with end-to-end, application-level requirements

P2P timing req.

  ∞  2
  4  4  3
  ∞  3  1  3
  ∞  4  7  7  3
  ∞  3  ∞  2  4  ∞
  4  ∞  7  2  3  ∞  5
  ∞  ∞  2  ∞  1  5  6  ∞
  7  7  3  ∞  ∞  5  3  ∞  1
  ∞  3  ∞  3  2  1  2  ∞  3  1

E2E timing req.

  Stream ID   Stream’s PEs                      E2E Req.
  1           PE1→PE3→PE9→PE4→PE10              23
  2           PE5→PE2→PE3→PE8→PE7→PE6→PE10      12
  3           PE5→PE3→PE9                       15
  4           PE7→PE8→PE2→PE3                   20
  5           PE1→PE2                           2
  …           …                                 …

In this paper, for synthetic task graphs
  Also done for a real application

Page 24: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Outline

The Era of Many-Module SoC
Revisiting the Mapping Problem
Cross-Entropy Optimization
Evaluation

Page 25: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Cross Entropy Optimization

Modern optimization heuristic
  Good at combinatorial optimization problems
Akin to evolutionary algorithms
  Generation of new solutions is based on sampling and estimation
Inherently a global search method
  Reduced risk of getting trapped in a local minimum

Page 26: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Cross Entropy Optimization

1. Given an initial parameter vector v = v0, sample a random population of K solutions x1, x2, …, xK from the distribution given by f(x;v).
2. Evaluate the costs S(xi), i = 1, …, K.
3. Using the ρK (0 < ρ < 1) elite (lowest-cost) samples, obtain a new density function f(x;v) by calculating a new vector v via Maximum Likelihood (ML) estimation.
4. Repeat steps 1-3 with the new vector v unless the maximum number of iterations is reached or no improvement is obtained for a predefined number of iterations.

Applied to mapping:

1. Generate 10 random mappings: π1, π2, …, π10
2. Find the 3 lowest-cost mappings: π2, π5, π7
3. Examine those 3 best mappings:
   A. For each tile, calculate the probability that core PEi is mapped to that tile
   B. Update the probabilities accordingly

For example:

Page 27: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

CE Example

Prob(TileA←PE1) = Prob(TileA←PE2) = Prob(TileA←PE3) = Prob(TileA←PE4) = 0.25
Prob(TileB←PE1) = Prob(TileB←PE2) = Prob(TileB←PE3) = Prob(TileB←PE4) = 0.25
Prob(TileC←PE1) = Prob(TileC←PE2) = Prob(TileC←PE3) = Prob(TileC←PE4) = 0.25
Prob(TileD←PE1) = Prob(TileD←PE2) = Prob(TileD←PE3) = Prob(TileD←PE4) = 0.25

[Figure: ten random mappings, π1 to π10, of PE1 to PE4 onto TileA to TileD.]

Page 28: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Updating Probabilities

[Figure: the three elite mappings (labeled π2, π5, π3) of PE1 to PE4 onto TileA to TileD.]

Prob(TileA←PE1) = 1
Prob(TileB←PE2) = 2/3, Prob(TileB←PE4) = 1/3
Prob(TileC←PE3) = 2/3, Prob(TileC←PE4) = 1/3
Prob(TileD←PE2) = 1/3, Prob(TileD←PE3) = 1/3, Prob(TileD←PE4) = 1/3

The following iteration uses these updated probabilities
Gradually, the probabilities converge to 0/1
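The sample, select, and update loop described on the cross-entropy slides can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions: mappings are drawn tile by tile without reusing a core, a small smoothing weight `eps` keeps every core possible, and the 2x2 tile positions and the two flows in the usage part are made up. The three elite mappings are the only ones consistent with the probabilities on the slide.

```python
import random
from collections import Counter
from fractions import Fraction


def update_probabilities(elites, tiles):
    """ML estimate: Prob(tile <- core) is the core's frequency on that tile among the elites."""
    probs = {}
    for t in tiles:
        counts = Counter(m[t] for m in elites)
        probs[t] = {core: Fraction(n, len(elites)) for core, n in counts.items()}
    return probs


def sample_mapping(probs, tiles, cores, rng, eps=0.05):
    """Draw one tile -> core mapping, tile by tile, never reusing a core.

    eps is a smoothing weight (an assumption) so every unused core stays possible.
    """
    free, mapping = list(cores), {}
    for t in tiles:
        weights = [float(probs[t].get(c, 0)) + eps for c in free]
        core = rng.choices(free, weights=weights)[0]
        mapping[t] = core
        free.remove(core)
    return mapping


def cross_entropy_map(tiles, cores, cost, k=10, n_elite=3, iters=30, seed=0):
    rng = random.Random(seed)
    # Start from uniform probabilities, as in the CE example.
    probs = {t: {c: Fraction(1, len(cores)) for c in cores} for t in tiles}
    best = None
    for _ in range(iters):
        population = [sample_mapping(probs, tiles, cores, rng) for _ in range(k)]
        population.sort(key=cost)
        if best is None or cost(population[0]) < cost(best):
            best = population[0]
        probs = update_probabilities(population[:n_elite], tiles)
    return best


# Elite mappings consistent with the probabilities on the slide (tile -> core).
tiles = ["TileA", "TileB", "TileC", "TileD"]
elites = [
    dict(zip(tiles, ["PE1", "PE2", "PE3", "PE4"])),
    dict(zip(tiles, ["PE1", "PE2", "PE4", "PE3"])),
    dict(zip(tiles, ["PE1", "PE4", "PE3", "PE2"])),
]
probs = update_probabilities(elites, tiles)
print(probs["TileB"])  # PE2 with probability 2/3, PE4 with probability 1/3

# Usage with a hypothetical 2x2 tile layout and hypothetical traffic.
pos = {"TileA": (0, 0), "TileB": (0, 1), "TileC": (1, 0), "TileD": (1, 1)}
flows = [("PE1", "PE2", 5), ("PE3", "PE4", 5)]


def cost(mapping):
    where = {core: pos[t] for t, core in mapping.items()}
    return sum(bw * (abs(where[a][0] - where[b][0]) + abs(where[a][1] - where[b][1]))
               for a, b, bw in flows)


best = cross_entropy_map(tiles, ["PE1", "PE2", "PE3", "PE4"], cost)
```

The stopping rule here is a fixed iteration count. The "no improvement for a predefined number of iterations" criterion from the slide is left out for brevity.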

Page 29: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Outline

The Era of Many-Module SoC
Revisiting the Mapping Problem
Cross-Entropy Optimization
Evaluation

Page 30: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Evaluation

Scenario
  6x6 mesh NoC
  Synthetic, randomized SoC
    Task graphs (and task-to-core mapping)
    Varying number of replicated modules
    Varying timing constraints
  (Real application in DATE10 paper)
Compare with best cost of classic mapping
  Averaging multiple runs

Page 31: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Accounting for Replication

“Class”: a group of identical PEs
Total number of replicated cores = {Number of classes} * {class size}

[Figure: cost reduction (savings) [%] versus total number of replicated modules [%] (10 to 50), with series for One Class, Two Classes, and Four Classes.]

Page 32: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Application-Level Requirements

SoCs with a pipeline data path and background P2P traffic
  Varying pipeline slack
  Different amounts of background constraints

[Figure: cost reduction (savings) [%] versus allowed pipeline slack (Tight, Medium, Loose, Very Loose), with one series per amount of extra constraints.]

Page 33: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Conclusions and Future Work

We are entering the era of the “many-module SoC”
Extend the mapping to account for
  Classes of replicated modules
  Application-level requirements
Meaningful power savings
But mapping is just one example
  Routing?
  Task assignment?
  Link design?
  Topology selection?

Page 34: The Era of Many-Module  SoC : Revisiting the  NoC  Mapping Problem

Thank you!

Questions?

[email protected]

QNoC Research Group