metrics for reconfigurable architectures characterization: remanence and scalability

Post on 14-Mar-2016



Metrics for Reconfigurable Architectures Characterization: Remanence and Scalability

Pascal BENOIT, G. Sassatelli, L. Torres, D. Demigny, M. Robert, G. Cambon

Name.Surname@lirmm.fr

Outline

• Context
• Remanence
• Operative Density
• Case Study: the Systolic Ring
• Conclusion and perspectives

Context: SoC and Customizable Platform-Based Design

Specifications
• Processing power
• Area
• Power consumption
• etc.

[Figure: SoC platform with Controller, CPU, RAM, ROM, Flash and an undetermined block "?"; candidate blocks shown: ASIC 1, ASIC 2, DSP, Reconfigurable Hardware (Coarse Grain), Reconfigurable Hardware (Fine Grain)]

We need metrics to compare!

Context: Architecture characterization

• Processing power
• Power consumption
• Flexibility
• Parallelism potential
• Dynamism
• Silicon area
• Scalability
• …

Metrics
• Dehon criterion
• Remanence
• Operative density

Generalisation to architectural model: characterisation and metrics depend on architectural parameters

« Comparing architectures with a minimum of criteria »

Remanence: Definition

• NPE: # of processing elements (PE)
• Nc: # of PEs configurable per cycle
• Fe: operating frequency
• Fc: configuration frequency

$$R = \frac{N_{PE} \cdot F_e}{N_c \cdot F_c}$$

Characterizes the dynamism
• # of cycles to (re)configure the whole architecture
• Amount of data to compute between 2 configurations

[Figure: generic architecture model with a Sequencing Unit (sequencer), a Configuration Memory (inst0, inst1, inst2, inst3, …, instn), Routing (interconnection) and Processing Elements (PE array); Fe and Fc are annotated on the model]

Remanence: Comparisons

$$R = \frac{N_{PE} \cdot F_e}{N_c \cdot F_c}$$

• Only 1 cycle to (re)configure the DSP
• Few cycles to (re)configure coarse grain RA (8)
• Many cycles to (re)configure fine grain RA

Name            Type              NPE    Nc     F (MHz)   R
ARDOISE         Fine Grain RA     2304   0.14   33        16457
Systolic Ring   Coarse Grain RA   24     4      200       6
DART            Coarse Grain RA   24     4      130       6
MorphoSys       Coarse Grain RA   128    16     100       8
TMS320C62       DSP VLIW          8      8      300       1
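The R column of the comparison can be recomputed from the definition. The sketch below is only illustrative: the function and variable names are made up here, and it assumes Fe = Fc for every architecture, since the table lists a single frequency F per row.

```python
def remanence(n_pe, n_c, f_e, f_c):
    """R = (N_PE * Fe) / (Nc * Fc)."""
    if n_c <= 0 or f_c <= 0:
        raise ValueError("n_c and f_c must be positive")
    return (n_pe * f_e) / (n_c * f_c)


# (name, N_PE, Nc, F in MHz), copied from the comparison table.
ARCHITECTURES = [
    ("ARDOISE", 2304, 0.14, 33),
    ("Systolic Ring", 24, 4, 200),
    ("DART", 24, 4, 130),
    ("MorphoSys", 128, 16, 100),
    ("TMS320C62", 8, 8, 300),
]


def remanence_table(rows):
    # Assumption: Fe = Fc = F, because only one frequency is given per row.
    return {name: remanence(n_pe, n_c, f, f) for name, n_pe, n_c, f in rows}


if __name__ == "__main__":
    for name, r in remanence_table(ARCHITECTURES).items():
        print(f"{name:14s} R = {r:.0f}")
```

Under that assumption the frequencies cancel and R reduces to NPE / Nc, which is why the DSP (8 PEs, all 8 configurable each cycle) gets R = 1.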

Operative Density: Definition

• NPE: # of PEs
• A: core area (relative unit λ²)

$$OD(N_{PE}) = \frac{N_{PE}}{A(N_{PE})}$$

Area can be expressed as a function of NPE (architectural model):

$$A(N_{PE}) = N_{PE} \cdot A_{PE} + A_{interconnect}(N_{PE}) + A_{memory}(N_{PE}) + A_{sequencer}(N_{PE})$$

Characterizes
• Fixed NPE: # of operators per relative area unit
• Variable NPE: OD as a function of NPE
• OD(NPE) = k (constant) ⇔ A(NPE) = NPE / k: the area grows linearly with NPE and the architectural model is scalable

[Figure: generic architecture model with a Sequencing Unit (sequencer), a Configuration Memory (inst0, inst1, inst2, inst3, …, instn), Routing (interconnection) and Processing Elements (PE array)]
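The scalability criterion can be illustrated numerically. In the sketch below every area coefficient is hypothetical and chosen only for illustration; none of them comes from the slides. It contrasts an area model whose terms all grow linearly with NPE (constant OD) with one whose interconnect area grows quadratically (decreasing OD).

```python
def operative_density(n_pe, area):
    """OD(N_PE) = N_PE / A(N_PE), with A in relative area units."""
    return n_pe / area


def area_model(n_pe, a_pe, a_interconnect, a_memory, a_sequencer):
    """A(N_PE) = N_PE*A_PE + A_interconnect + A_memory + A_sequencer."""
    return (n_pe * a_pe + a_interconnect(n_pe)
            + a_memory(n_pe) + a_sequencer(n_pe))


A_PE = 40.0  # hypothetical area of one PE, in relative units


def linear_area(n_pe):
    # Hypothetical model: every term is proportional to N_PE.
    return area_model(n_pe, A_PE,
                      lambda n: 5.0 * n, lambda n: 3.0 * n, lambda n: 2.0 * n)


def quadratic_interconnect_area(n_pe):
    # Hypothetical model: interconnect area grows as N_PE squared.
    return area_model(n_pe, A_PE,
                      lambda n: 0.5 * n * n, lambda n: 3.0 * n, lambda n: 2.0 * n)


def od_curve(area_fn, sizes):
    """OD evaluated for each N_PE in sizes."""
    return [operative_density(n, area_fn(n)) for n in sizes]


SIZES = [8, 16, 32, 64, 128]
flat = od_curve(linear_area, SIZES)
falling = od_curve(quadratic_interconnect_area, SIZES)
```

With these made-up coefficients, `flat` holds the same value for every size, while `falling` decreases as NPE grows, which is the non-scalable case.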

Operative Density: Comparisons

$$A\,(M\lambda^2) = \frac{A\,(\mu m^2)}{\left(W(\mu m)/2\right)^2}$$

$$OD(N_{PE}) = \frac{N_{PE}}{A(N_{PE})}$$

Name                             Type              NPE   Area (Mλ²)   OD(NPE)
ARDOISE                          Fine Grain RA     26    12300        0.2
Systolic Ring (S=1, C=6, N=2)    Coarse Grain RA   24    500          4.8
Systolic Ring (S=1, C=16, N=4)   Coarse Grain RA   128   7600         1.7
DART                             Coarse Grain RA   24    300          8.0
MorphoSys                        Coarse Grain RA   128   5500         2.3
TMS320C62                        DSP VLIW          8     12300        0.1

DSP: sequencer area
ARDOISE: fine granularity
Coarse granularity reconfigurable architectures
• Scalability of interconnect resources?
• Generalization to architectural models

Architectural Model Characterization

A Case Study: The Systolic Ring

Architectural model Characterization: The Systolic Ring

Architectural model
• Based on a coarse-grained configurable PE: the Dnode

[Figure: Dnode with Register File, ALU + MULT, and inputs IN 1 and IN 2]

• Circular datapaths

[Figure: ring of Dnodes and Switches forming a circular datapath]

• 3 parameters
  • C: # of layers
  • N: # of Dnodes per layer

[Figure: ring with layers 1 to 4; # of layers: 4 (C = 4), # of Dnodes per layer: 2 (N = 2)]

[Figure: ring with layers 1 to 8; # of layers: 8 (C = 8), # of Dnodes per layer: 2 (N = 2)]

  • S: # of rings

[Figure: 1 Systolic Ring (S = 1); # of layers: 8 (C = 8), # of Dnodes per layer: 2 (N = 2)]

[Figure: 4 Systolic Rings (S = 4) on a Global Bus; # of layers: 4 (C = 4), # of Dnodes per layer: 2 (N = 2)]

Control Units
• Local Dnode unit

[Figure: the Systolic Rings on the Global Bus, with a Dnode Sequencer]

• Local Ring unit

[Figure: the Systolic Rings on the Global Bus, with 4 Local Ring Sequencers]

• Global unit

[Figure: the Systolic Rings on the Global Bus, with 4 Local Ring Sequencers and a Global Sequencer]

Architectural model Characterization: Remanence

Only one Systolic Ring: S = 1
• NPE = # of Dnodes = N*C*S = N*C

Remanence formalisation
• k = C/N

$$R(N_{PE}) = \sqrt{k \cdot N_{PE}}$$

[Figure: remanence (0 to 40) versus # of Dnodes (0 to 180), with one curve each for k = 1, k = 2, k = 4 and k = 8]
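One way to read the remanence formalisation for a single ring is sketched below. It rests on two assumptions that the slide does not state: that one layer of N Dnodes is configured per cycle (Nc = N) and that Fe = Fc. Under those assumptions R = N*C / N = C, and since k * NPE = (C/N) * (N*C) = C², the same value can be written as the square root of k * NPE. The function names are illustrative only.

```python
import math


def ring_remanence(n, c):
    """Remanence of one ring (S = 1), assuming Nc = N and Fe = Fc."""
    n_pe = n * c      # N_PE = N * C when S = 1
    return n_pe / n   # R = N_PE / Nc, which equals C


def remanence_from_k(k, n_pe):
    """Closed form R(N_PE) = sqrt(k * N_PE), with k = C / N."""
    return math.sqrt(k * n_pe)


def check(n, c):
    """Return both evaluations for a ring with N Dnodes per layer and C layers."""
    k = c / n
    return ring_remanence(n, c), remanence_from_k(k, n * c)


if __name__ == "__main__":
    for n, c in [(2, 4), (2, 8), (4, 16)]:
        direct, closed = check(n, c)
        print(f"N={n} C={c}: R={direct:.1f} (direct), {closed:.1f} (closed form)")
```

For a fixed number of Dnodes, a smaller k (fewer, wider layers) gives a lower remanence under these assumptions.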

Architectural model Characterization: A(NPE) formalisation for OD(NPE)

0.18 µm CMOS technology
• C = 4, N = 2, S = 1
• A(8) = 3.3 mm²
• A(8) = 407 Mλ²

Area formalisation:
• A(NPE) = f(N, C, S): depends on the C/N ratio and on S
• NPE = N.C.S

Area formalisation calibrated on these results

[Figure: Systolic Ring layout (C=4, N=2, S=1) showing Switch 1 to Switch 4, Dnodes N1,1 to N4,2 and blocks BN1 to BN4]
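The two area figures for the 8-Dnode ring can be cross-checked with a short conversion. The sketch assumes that λ is half the technology feature size (0.09 µm for a 0.18 µm process), which is the reading that reproduces the 407 Mλ² value; the function name is illustrative.

```python
MM2_TO_UM2 = 1e6  # 1 mm^2 = 10^6 um^2


def area_in_mega_lambda2(area_um2, feature_size_um):
    """Convert an area in um^2 to millions of lambda^2.

    Assumption: lambda = feature_size / 2.
    """
    lam = feature_size_um / 2.0
    return area_um2 / (lam * lam) / 1e6


# A(8) = 3.3 mm^2 in a 0.18 um technology.
area_8 = area_in_mega_lambda2(3.3 * MM2_TO_UM2, 0.18)

# Operative density of the 8-Dnode ring, in Dnodes per M-lambda^2.
od_8 = 8 / area_8

if __name__ == "__main__":
    print(f"A(8)  = {area_8:.0f} M-lambda^2")
    print(f"OD(8) = {od_8:.4f}")
```

The resulting density of roughly 0.02 Dnodes per Mλ² falls within the 0 to 0.040 range of the operative density plots.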

Architectural model Characterization: OD(NPE)

OD(NPE) for 1 Systolic Ring (S=1)
• k = C/N in [0.25 ; 4]
• decreasing OD(NPE)

OD(NPE) for several Systolic Rings
• k = C/N = 4
• multi-ring instantiations increase scalability

[Figure: operative density (0 to 0.040) versus # of Dnodes (0 to 200) for C/N = 4, C/N = 0.5 and C/N = 0.25]

[Figure: operative density (0 to 0.045) versus # of Dnodes (0 to 200) for 1, 2, 4 and 8 Systolic Rings]

Architectural model Characterization: Customisation and design technique

• between 60 and 80 processing elements

[Figure: operative density (0 to 0.040) and remanence (0 to 20) versus # of Dnodes (0 to 140) for S=1, S=2, S=4 and S=8]


[Figure: the same plot of operative density and remanence versus # of Dnodes, with the Design Space marked]

Design Space
• Best OD and remanence, worst interconnect resources and processing power

• Worst OD and remanence, best interconnect resources and processing power

R and OD can be integrated in CAD tools to observe the effects of architectural parameters and choose the best trade-offs in the design space.
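A design-space exploration step of this kind could look like the sketch below. It is not the authors' tool: the function names and the candidate parameter values are hypothetical, and the only metric plugged in is the k = C/N ratio. A real flow would supply its own calibrated R and A(NPE) models as extra metric functions.

```python
from itertools import product


def candidates(n_values, c_values, s_values, min_pe=60, max_pe=80):
    """Yield (N, C, S) triples whose N_PE = N*C*S lies in [min_pe, max_pe]."""
    for n, c, s in product(n_values, c_values, s_values):
        if min_pe <= n * c * s <= max_pe:
            yield n, c, s


def explore(metric_fns, **kwargs):
    """Evaluate every metric function on every candidate configuration."""
    return [
        {"N": n, "C": c, "S": s, "NPE": n * c * s,
         **{name: fn(n, c, s) for name, fn in metric_fns.items()}}
        for n, c, s in candidates(**kwargs)
    ]


# Hypothetical parameter ranges; the 60 to 80 window is the one on the slides.
results = explore(
    {"k": lambda n, c, s: c / n},
    n_values=[2, 4],
    c_values=[4, 8, 16],
    s_values=[1, 2, 4, 8],
)

if __name__ == "__main__":
    for row in results:
        print(row)
```

Passing the metrics as a dictionary of functions keeps the enumeration independent of any particular area or remanence model.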

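Conclusion and perspectives

Specifications
• Processing power
• Area
• Power consumption
• etc.

[Figure: SoC platform (Controller, CPU, RAM, ROM, Flash and an undetermined block "?") and candidate blocks IP 1, IP 2, IP 3, …, IP n, labelled with their metrics R1 OD1, R2 OD2, R3 OD3, …, Rn ODn]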

• Architectural models: comparisons

• Architectural model: customisation

[Figure: SoC platform (Controller, CPU, RAM, ROM, Flash) with IP 3 customised as N=4, C=8, S=2]

Thank You
