International Symposium on Low Power Electronics and Design NoC Frequency Scaling with Flexible-Pipeline Routers Pingqiang Zhou, Jieming Yin, Antonia Zhai, and Sachin S. Sapatnekar University of Minnesota – Twin Cities

Page 1: International Symposium on Low Power Electronics and Design NoC Frequency Scaling with Flexible- Pipeline Routers Pingqiang Zhou, Jieming Yin, Antonia

International Symposium on Low Power Electronics and Design

NoC Frequency Scaling with Flexible-Pipeline Routers

Pingqiang Zhou, Jieming Yin, Antonia Zhai, and Sachin S. Sapatnekar

University of Minnesota – Twin Cities

Page 2: NoC dissipates substantial system energy

[Figure: Tile-Based Multicore System, with labels MEM, C, L1, L2, and R]

RAW – 36%; Intel 80-tile – 28% [Vangal et al. 2008]

Page 3: VFS and Its Limitations

[Figure: Superscalar Machine, with four MEM blocks]

• NoC is
– Potential performance bottleneck
– Source of energy consumption
Designed for diverse traffic patterns

• VFS (voltage and frequency scaling) to reduce energy
• Limitations of Aggressive VFS
– Reduces throughput
– Increases latency
Works only for limited traffic patterns

Can we make VFS work for other important traffic patterns?

[Figure: 2×2 grid of traffic patterns. Axes: Latency (Sensitive, Insensitive) and Throughput (High, Low)]

Page 4: Frequency Scaling

[Animation: router pipeline stages 1 2 3 4 at Frequency = F and at Frequency = 0.5F, with delays marked in units of T]

[Figure: Network Energy Breakdown (Static, Clock, Dynamic; axis 0 to 1) for ammp, art, blackscholes, equake, fkmeans, kmeans, and Avg]

Frequency scaling harms performance

Page 5: Flexible pipeline can reduce router pipeline delay

[Figure: pipeline stages 1 2 3 4 at Frequency = 0.5F, shown before and after the "Reconfigure Pipeline" step, with delays marked in units of T]

Page 6: Flexible Pipeline Routers

Reduce frequency without increasing router latency

+ Reduce NoC energy
+ Negligible performance degradation

Target Application
• Low throughput
• Latency sensitive

[Figure: 2×2 grid of traffic patterns. Axes: Latency (Sensitive, Insensitive) and Throughput (High, Low)]

Page 7: Outline

• Background/Motivation
• Router Design
• Experimental Results
• Related work
• Conclusion

Page 8: Baseline Router Architecture

[Figure: baseline router block diagram showing input ports, Input Controller (BW/RC) with buffers MC 1, VC 1 through MC n, VC 1, Route Computation, VC Allocator (VA), Switch Allocator (SA), Crossbar Switch (ST), and output ports]

Pipeline stages
Head flit: BW/RC, VA, SA, ST
Body/tail flit: BW, SA, ST

How to reconfigure pipeline?

Page 9: Pipeline Stage Delay

Stage:  BW+RC   VA       SA       ST
Delay:  100 τ   65.5 τ   77.7 τ   45 τ

τ : inverter delay

Time-borrowing
• Boost pipeline frequency
• Average out stage delays

Delay of 4-stage pipeline: Tclk = 72.1τ

The router delay model is presented in [Peh et al., HPCA 2001].
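The quoted clock period can be checked against the four stage delays on this slide. The following is a minimal Python sketch, assuming that ideal time-borrowing lets the clock period equal the mean stage delay; the variable names are illustrative and do not come from the talk.

```python
# Stage delays from the slide, in units of the inverter delay tau.
stage_delays = {"BW+RC": 100.0, "VA": 65.5, "SA": 77.7, "ST": 45.0}

# Without time-borrowing, the slowest stage would set the clock period.
t_clk_worst_case = max(stage_delays.values())

# Assumption: ideal time-borrowing averages the stage delays.
t_clk_borrowed = sum(stage_delays.values()) / len(stage_delays)

print(f"slowest stage:       {t_clk_worst_case:.2f} tau")
print(f"with time-borrowing: {t_clk_borrowed:.2f} tau")
```

The mean is 72.05τ, which agrees with the slide's Tclk = 72.1τ after rounding, while the slowest stage alone would require 100τ.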

Page 10: Pipeline Reconfiguration

• Flex Router: pipeline reconfiguration

4-stage pipeline, Vdd = 1.2 V
BW+RC 100 τ4 | VA 65.5 τ4 | SA 77.7 τ4 | ST 45 τ4
Tclk = 72.1τ4

3-stage pipeline, Vdd = 1.0 V
BW+RC 100 τ3 | VA 65.5 τ3 | SA+ST 113.7 τ3
Tclk = 93.1τ3 = 102.1τ4

2-stage pipeline, Vdd = 1.0 V
BW+RC 100 τ2 | VA+SA+ST 170.2 τ2
Tclk = 135.1τ2 = 148.7τ4

1-stage pipeline, Vdd = 0.8 V
BW+RC+VA+SA+ST 270.2 τ1
Tclk = 270.2τ1 = 337.7τ4

How much hardware overhead?
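The clock periods on this slide can be turned into a relative frequency and a per-hop router latency for each configuration. The sketch below is a minimal illustration, assuming that per-hop router latency is simply pipeline depth times Tclk (link traversal and contention are ignored) and using the periods already converted to τ4 units on the slide. The names `configs` and `summarize` are introduced here for illustration.

```python
# (pipeline depth, Vdd in volts, Tclk in units of tau4), as listed on the slide.
configs = [
    (4, 1.2, 72.1),
    (3, 1.0, 102.1),
    (2, 1.0, 148.7),
    (1, 0.8, 337.7),
]

base_stages, _, base_tclk = configs[0]
base_latency = base_stages * base_tclk  # per-hop latency of the 4-stage router


def summarize(stages, tclk):
    """Return (relative frequency, relative per-hop router latency)."""
    rel_freq = base_tclk / tclk
    rel_latency = (stages * tclk) / base_latency
    return rel_freq, rel_latency


for stages, vdd, tclk in configs:
    rel_freq, rel_latency = summarize(stages, tclk)
    print(f"{stages}-stage @ {vdd} V: frequency x{rel_freq:.2f}, latency x{rel_latency:.2f}")
```

Under these assumptions the 2-stage configuration runs at about 0.48 of the baseline frequency while its per-hop latency rises by only about 3%, which is the effect the talk describes as reducing frequency without increasing router latency.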

Page 11: Architecture Support

[Figure: router datapath from "Flits in" to "Flits out" through the Input Controller (with buffers), Route Computation, VC Allocator, and Switch Allocator (stages BW/RC, VA, SA, ST), with blocks labeled R between stages]

4-stage pipeline: BW+RC, VA, SA, ST with R R R between stages

Page 12: Architecture Support

[Figure: router datapath from "Flits in" to "Flits out" through the Input Controller (with buffers), Route Computation, VA, and SA (stages BW/RC, VA, SA, ST), with an R block and a MUX at each stage boundary]

Stage boundaries of BW+RC, VA, SA, ST in each configuration:
4-stage pipeline: R, R, R
3-stage pipeline: R, R, MUX
2-stage pipeline: R, MUX, MUX
1-stage pipeline: MUX, MUX, MUX

Less than 2% overhead in router area
+ Control Logics
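The R/MUX settings listed for each pipeline depth can be read as one bypass flag per stage boundary. The following is a minimal Python sketch of that reading, assuming that a MUX entry means the register at that boundary is bypassed so the two neighbouring stages share a clock cycle. The function name and the flag encoding are illustrative and are not taken from the router design.

```python
STAGES = ["BW+RC", "VA", "SA", "ST"]


def merged_stages(bypass):
    """Group router stages given one bypass flag per stage boundary.

    bypass[i] is True when the register between STAGES[i] and STAGES[i + 1]
    is bypassed (assumed meaning of a MUX entry on the slide).
    """
    if len(bypass) != len(STAGES) - 1:
        raise ValueError("need exactly one flag per stage boundary")
    groups = [[STAGES[0]]]
    for stage, skip_register in zip(STAGES[1:], bypass):
        if skip_register:
            groups[-1].append(stage)  # same clock cycle as the previous stage
        else:
            groups.append([stage])    # register starts a new pipeline stage
    return ["+".join(group) for group in groups]


for flags in ([False, False, False], [False, False, True],
              [False, True, True], [True, True, True]):
    print(len(merged_stages(flags)), "stage(s):", merged_stages(flags))
```

With this encoding, bypassing the last boundary yields BW+RC, VA, SA+ST, and bypassing the last two yields BW+RC, VA+SA+ST, matching the stage groupings shown on the Pipeline Reconfiguration slide.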

Page 13: Outline

• Background/Motivation
• Router Design
• Experimental Results
• Related work
• Conclusion

Page 14: Experimental Platform

• Simulator
– Full system simulator: GEMS
– Power module: Wattch & Orion 2.0
– Infrastructure: 8 cores, 1-issue in-order

• Benchmarks
– From SPEC OMP2001, NU-Mine and PARSEC

[Figure: tile diagram with labels MEM, C, L1, L2, and R; 1.5 GHz]

Page 15: Efficacy in Network Energy Saving

Base: Baseline Router
Base-2: VFS, Slowdown Factor of 2
Flex-2: VFS + Flexible-Pipeline Router

[Figure: Network Energy (axis 0 to 1.2), split into Dynamic, Clock, and Static, for Base, Base-2, and Flex-2 on ammp, art, blackscholes, equake, fkmeans, kmeans, and Avg. Chart annotations: 41% and 2%]

Dynamic energy decreases quadratically as voltage goes down
Clock energy reduction is significant (65%)

Changes in static energy are minimal
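The quadratic dependence of dynamic energy on voltage can be illustrated with the textbook first-order relation E_dyn ∝ C·V². The sketch below is only that first-order model, assuming switched capacitance and switching activity stay constant; it is not the Orion 2.0 power model used in the talk. The supply voltages are the ones listed on the Pipeline Reconfiguration slide, and the function name is illustrative.

```python
def dynamic_energy_ratio(v_new, v_ref):
    """Relative dynamic switching energy, assuming E_dyn ~ C * V^2
    with switched capacitance C and activity held constant."""
    return (v_new / v_ref) ** 2


V_REF = 1.2  # Vdd of the 4-stage configuration

for vdd in (1.0, 0.8):
    ratio = dynamic_energy_ratio(vdd, V_REF)
    print(f"Vdd = {vdd} V: dynamic energy x{ratio:.2f} relative to {V_REF} V")
```

Under this model, dropping from 1.2 V to 1.0 V leaves about 69% of the dynamic energy per switching event, and dropping to 0.8 V leaves about 44%.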

Page 16: Efficacy in Execution Time

Base: Baseline Router
Base-2: VFS
Flex-2: VFS + Flexible-Pipeline Router

[Figure: Normalized Execution Time (axis 0.8 to 1.2) for Base, Base-2, and Flex-2 on ammp, art, blackscholes, equake, fkmeans, kmeans, and G.M. Chart annotation: 1.5%]

[Figure: 2×2 grid of traffic patterns. Axes: Latency (Sensitive, Insensitive) and Throughput (High, Low)]

Workload       L1 data cache (misses/K instructions)   L2 cache (misses/K instructions)
ammp           13.7                                    4.4
art            40.8                                    18.1
blackscholes   8.1                                     0.9
equake         2.8                                     2.6
fkmeans        1.9                                     1.7
kmeans         2.4                                     1.9

Average system performance degradation is reduced

Page 17: System-Level Evaluation

• System-level ED² Product
– Cores, caches and the interconnection networks
– E: System Energy
– D: System Delay

[Figure: Network Energy vs. Network Delay tradeoff, alongside System Energy vs. System Delay]
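The ED² metric named on this slide multiplies system energy by the square of system delay, so a slowdown is penalized more heavily than an equal relative energy increase. The following is a minimal Python sketch; the input values are hypothetical and chosen only for illustration, they are not results from the talk, and the helper names are introduced here.

```python
def ed2(energy, delay):
    """Energy-delay-squared product E * D^2 (lower is better)."""
    return energy * delay ** 2


def break_even_energy(delay):
    """Largest normalized energy that still keeps ED^2 at or below a
    baseline with E = 1 and D = 1, for a given normalized delay."""
    return 1.0 / delay ** 2


# Hypothetical values normalized to a baseline of E = 1.0, D = 1.0.
scaled = ed2(0.95, 1.02)  # 5% less system energy, 2% longer execution time

print(f"normalized ED2:         {scaled:.4f}")
print(f"break-even energy @ 2%: {break_even_energy(1.02):.4f}")
```

In this illustration a 2% slowdown is acceptable only if system energy falls below about 96% of the baseline, which is why small execution-time penalties matter for a system-level comparison.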

Page 18: Efficacy in System ED² Product

Base: Baseline Router
Base-2: VFS
Flex-2: VFS + Flexible-Pipeline Router

[Figure: System ED² (axis 0.8 to 1.5) for Base, Base-2, and Flex-2 on ammp, art, blackscholes, equake, fkmeans, kmeans, and G.M. Chart annotation: ED² increase]

Frequency tuning should be based on workloads

Page 19: More Aggressive VFS: Network Energy Saving

Base: Baseline Router
Flex-2: Flexible-Pipeline Router + Slowdown Factor of 2
Flex-4: Flexible-Pipeline Router + Slowdown Factor of 4

[Figure: Network Energy (axis 0 to 1.2), split into Dynamic, Clock, and Static, for Base, Flex-2, and Flex-4 on ammp, art, blackscholes, equake, fkmeans, kmeans, and Avg. Chart annotations: 43% and 39%]

Flexible-Pipeline Router is scalable in reducing network energy

Page 20: More Aggressive VFS: Execution Time

Base: Baseline Router
Flex-2: Flexible-Pipeline Router + Slowdown Factor of 2
Flex-4: Flexible-Pipeline Router + Slowdown Factor of 4

[Figure: Normalized Execution Time (axis 0.8 to 1.1) for Base, Flex-2, and Flex-4 on ammp, art, blackscholes, equake, fkmeans, kmeans, and G.M.]

Performance degradation is increasing

Page 21: Limits of VFS: System ED² Product

Base: Baseline Router
Flex-2: Flexible-Pipeline Router + Slowdown Factor of 2
Flex-4: Flexible-Pipeline Router + Slowdown Factor of 4

[Figure: System ED² (axis 0.8 to 1.2) for Base, Flex-2, and Flex-4 on ammp, art, blackscholes, equake, fkmeans, kmeans, and G.M.]

Diminishing returns when pushing the frequency scaling limit
Workload-dependent

Page 22: Related Works

• “A case for dynamic frequency tuning in on-chip networks” [Mishra '09]
Dynamic router VFS for reducing network power consumption
– Flexible-pipeline routers enable more drastic scaling

• “A variable-pipeline on-chip router optimized to traffic pattern” [Hirata '10]
Dynamic router VFS + variable-pipeline routers
– Flexible-pipeline routers have lower hardware overhead
– Our work presents system-level evaluation

Page 23: Conclusions

Flexible-Pipeline Router
– Minimal hardware overhead
– Enables aggressive VFS

System Level Implications
– Considerable energy saving
– Negligible performance degradation

[Figure labels: Network Energy, Performance]

Page 24: Thank you!

Q & A

Page 25: Router Delay Model*

• Router stage delay: $T_{stage} = t_i + h$

ti: sequential logic latency
h: setup delay
τ: inverter delay

Stage   ti           h
BW/RC   constant     0
VA      f(p, v)      9 τ
SA      f(p, c, v)   9 τ
ST      f(p, ω)      0

p: # of input/output ports
c: # of message classes
v: # of VCs/message class
ω: flit size in bits

[Figure: baseline router block diagram showing input ports, Input Controller (BW/RC) with buffers MC 1, VC 1 through MC n, VC 1, Route Computation, VC Allocator (VA), Switch Allocator (SA), Crossbar Switch (ST), and output ports]

*This model is presented in [Peh et al., HPCA 2001].
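The stage-delay model on this slide can be cross-checked against the merged-stage delays quoted on the Pipeline Reconfiguration slide (113.7, 170.2, and 270.2). The sketch below is one reading of those numbers, not a rule stated in the talk: it assumes that a stage fused into a longer cycle no longer pays its setup delay h, so the fused delay is the sum of the component delays minus 9τ for each allocator (VA or SA) involved. The dictionary and function names are illustrative.

```python
# Per-stage delay T = t_i + h, in units of tau (values from the slides).
STAGE_DELAY = {"BW+RC": 100.0, "VA": 65.5, "SA": 77.7, "ST": 45.0}
SETUP = {"BW+RC": 0.0, "VA": 9.0, "SA": 9.0, "ST": 0.0}


def merged_delay(stages):
    """Delay of stages fused into one cycle, ASSUMING each fused stage
    drops its setup delay h (an interpretation, not from the source)."""
    return sum(STAGE_DELAY[s] - SETUP[s] for s in stages)


# Merged-stage delays quoted on the Pipeline Reconfiguration slide.
quoted = {
    ("SA", "ST"): 113.7,
    ("VA", "SA", "ST"): 170.2,
    ("BW+RC", "VA", "SA", "ST"): 270.2,
}

for stages, slide_value in quoted.items():
    computed = round(merged_delay(stages), 1)
    print("+".join(stages), "computed:", computed, "slide:", slide_value)
```

All three computed values agree with the slide, which suggests the setup overhead is what merging saves, although the talk itself does not spell this out.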

Page 26: System Energy Breakdown

[Figure: energy split into Network and Core+Cache (axis 0 to 1.2) for Base, Base-2, Flex-2, and Flex-4 on ammp, art, blackscholes, equake, fkmeans, and kmeans]