variable-width datapath for on-chip network static power … · 2014-09-17 · variable-width...
![Page 1: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/1.jpg)
Variable-Width Datapath for On-Chip Network Static Power Reduction
George Michelogiannakis, John Shalf
Postdoctoral Research Fellow
Computer Architecture Laboratory
Lawrence Berkeley National Laboratory
![Page 2: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/2.jpg)
In a Nutshell
Leakage power is an increasing problem in future and near-threshold voltage (NTV) technologies
Leakage power can be important even at high network loads
This work proposes variable-width datapaths
Parts of channels, buffers, and crossbars can be activated on demand
We demonstrate an average of 33% total power reduction with PARSEC benchmarks
![Page 3: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/3.jpg)
Today’s Menu
Leakage power / motivation
Related work
Variable-width datapaths
Results
Conclusions
![Page 4: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/4.jpg)
Leakage Power Contribution
32nm (above). 45nm (below). Orion 2.0.
[Top left] “FlexiBuffer: Reducing Leakage Power in On-Chip Network Routers”. DAC 2011
[Rest] “NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers”. MICRO 2012
Single router. 45nm. 1.0V Orion 2.0.
PARSEC benchmarks
![Page 5: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/5.jpg)
Subthreshold Leakage at NTV
[Figure: SD leakage power (0% to 60% of total) vs. technology node (45nm, 32nm, 22nm, 14nm, 10nm, 7nm, 5nm) at 100%, 75%, 50%, and 40% Vdd, annotated "Increasing Variations"]
NTV operation reduces total power, improves energy efficiency
Subthreshold leakage power is a substantial portion of the total
Near-threshold voltage (NTV) design — Opportunities and challenges. DAC 2012
Assumes 20% leakage power at 100% Vdd
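To see why the leakage share grows as Vdd drops, here is a deliberately crude first-order model. It is our assumption, not the model used in the DAC 2012 study: dynamic power scales as V²·f with f proportional to V, leakage power scales as V·I_leak with I_leak held constant, and leakage is 20% of the total at nominal Vdd, as the slide assumes. Real NTV circuits lose frequency much faster than linearly, and leakage current itself depends on V, so only the trend is meaningful.

```python
def leakage_fraction(v_scale: float, leak_frac_nominal: float = 0.2) -> float:
    """Fraction of total power that is leakage at v_scale * nominal Vdd.

    Crude first-order assumptions (illustrative only, not from the cited study):
      dynamic ~ V^2 * f with f ~ V   -> scales as v_scale**3
      leakage ~ V * I_leak, I_leak constant -> scales as v_scale
    """
    dynamic = (1.0 - leak_frac_nominal) * v_scale ** 3
    leakage = leak_frac_nominal * v_scale
    return leakage / (leakage + dynamic)


for pct in (100, 75, 50, 40):
    print(f"{pct:3d}% Vdd: leakage is {leakage_fraction(pct / 100):.0%} of total")
```

Under these assumptions the leakage share rises from 20% at nominal Vdd to 50% at half Vdd, because dynamic power falls roughly cubically while leakage falls only linearly.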
![Page 6: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/6.jpg)
Applications Load Network Unevenly
“Fine-grained bandwidth adaptivity in networks-on-chip using bidirectional channels”. NOCS 2012
![Page 7: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/7.jpg)
Today’s Menu
Leakage power / motivation
Related work
Variable-width datapaths
Results
Conclusions
![Page 8: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/8.jpg)
Power Gating
High threshold voltage “sleep switch” transistor
Net savings only when the sleep time is long enough to overcome the energy overheads
“MP3: Minimizing Performance Penalty for Power-gating of Clos Network-on-Chip”. HPCA 2014
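The break-even condition can be made concrete. This is a minimal sketch. It assumes that a gated block avoids a fixed leakage energy per cycle asleep and pays a fixed energy overhead per sleep/wake transition. The numbers are hypothetical placeholders, not values from the cited work.

```python
import math


def break_even_cycles(leak_energy_per_cycle_pj: float, transition_energy_pj: float) -> int:
    """Minimum sleep length (cycles) at which power gating breaks even."""
    return math.ceil(transition_energy_pj / leak_energy_per_cycle_pj)


def net_savings_pj(sleep_cycles: int, leak_energy_per_cycle_pj: float,
                   transition_energy_pj: float) -> float:
    """Leakage energy avoided while asleep minus the sleep/wake overhead."""
    return sleep_cycles * leak_energy_per_cycle_pj - transition_energy_pj


# Hypothetical numbers: 0.5 pJ of leakage per cycle, 5 pJ per sleep/wake pair.
print(break_even_cycles(0.5, 5.0))   # 10
print(net_savings_pj(4, 0.5, 5.0))   # -3.0: sleep too short, gating wastes energy
print(net_savings_pj(40, 0.5, 5.0))  # 15.0
```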
![Page 9: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/9.jpg)
Drowsy SRAMs
Put SRAM lines into low-power (low voltage) “drowsy” mode
Preserves data
Faster activation than power-gated SRAMs (1-2 cycles)
Higher leakage current while drowsy. Higher activation penalty
“Drowsy Caches: Simple Techniques for Reducing Leakage Power”. ISCA 2002
![Page 10: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/10.jpg)
CatNap: Multiple Networks
[Figure: grid of routers with a source, its injection point, a high-traffic area, and a low-traffic area]
Multinets cannot share resources (e.g., channels) in low-traffic regions
Optimal decisions at injection are a challenge
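The injection problem can be illustrated with a hypothetical policy. This is ours, not CatNap's actual selection logic. The source sees only its local queue occupancy for each subnetwork. It picks the lowest-numbered subnetwork below a threshold so that higher subnetworks can stay power-gated. The packet is then bound to that subnetwork for its whole path, so the choice cannot reflect congestion the packet meets downstream.

```python
def choose_subnet(local_occupancy: list[int], threshold: int) -> int:
    """Pick a subnetwork at injection time using only local information.

    Hypothetical policy: prefer the lowest-numbered subnetwork whose local
    injection queue is below `threshold`, so higher-numbered subnetworks can
    remain power-gated; if all are busy, fall back to the least-occupied one.
    The decision is final: the packet cannot move to another subnetwork later.
    """
    for subnet, occupancy in enumerate(local_occupancy):
        if occupancy < threshold:
            return subnet
    return min(range(len(local_occupancy)), key=lambda s: local_occupancy[s])


print(choose_subnet([1, 0], threshold=4))  # 0: subnetwork 1 can stay asleep
print(choose_subnet([6, 5], threshold=4))  # 1: least-occupied fallback
```

A hot spot may later form on subnetwork 0's path. Packets already injected there still cannot use idle channels in subnetwork 1, which is the inflexibility the slide points out.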
![Page 11: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/11.jpg)
Today’s Menu
Leakage power / motivation
Related work
Variable-width datapaths
Results
Conclusions
![Page 12: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/12.jpg)
Variable-Width Channels
[Figure: router-to-router channels carrying Packet 1 and Packet 2]
Same bisection bandwidth
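A back-of-the-envelope sketch shows why splitting a channel into lanes leaves bisection bandwidth unchanged. The total wire count stays the same. Each lane is narrower, so one packet needs more flits on its lane, but several packets can travel side by side. The channel width and packet size are hypothetical. We also assume that channel leakage scales with the number of powered wires.

```python
import math


def flits_per_packet(packet_bits: int, channel_bits: int, lanes: int) -> int:
    """Flits needed when a packet travels on one lane of a channel split into `lanes`."""
    if channel_bits % lanes:
        raise ValueError("channel width must divide evenly into lanes")
    lane_bits = channel_bits // lanes
    return math.ceil(packet_bits / lane_bits)


def powered_wire_fraction(active_lanes: int, lanes: int) -> float:
    """Fraction of channel wires left powered (assumes leakage scales with width)."""
    return active_lanes / lanes


# Hypothetical: 128-bit channel, 512-bit packet. Total width (and thus bisection
# bandwidth) is 128 bits in every case; only its division into lanes changes.
for lanes in (1, 2, 4):
    print(f"{lanes} lane(s): {flits_per_packet(512, 128, lanes)} flits per packet, "
          f"up to {lanes} packets side by side, "
          f"{powered_wire_fraction(1, lanes):.0%} of wires powered with one lane active")
```

At low load, a single active lane out of four leaves a quarter of the wires powered, but each packet is serialized into four times as many flits. At high load all lanes are on, and the full width is available.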
![Page 13: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/13.jpg)
Input Buffer Gating
[Figure: input buffer divided into VC1 to VC4]
“Adding slow-silent virtual channels for low-power on-chip networks”. NOCS 2008
[Figure: router receiving Packet 1 and Packet 2, with VC1 and VC2]
If flits from any channel lane can choose any VC, multiplexers are needed
We map each channel lane to only a subset of the VCs (1-to-1 when the numbers of VCs and lanes are equal)
[Figure: lane-to-VC mapping over VC 0 to VC 3]
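Here is one possible way to realize the restricted mapping. It assumes that the number of VCs is a multiple of the number of lanes and uses a contiguous assignment; both are our assumptions, and an interleaved assignment would work equally well. Each lane writes only into its own block of VCs, so the VCs behind a gated lane can be put to sleep. In the other direction, a flit's output VC fixes which lane it uses, in line with the ABN description in the methodology.

```python
def vcs_for_lane(lane: int, num_vcs: int, num_lanes: int) -> list[int]:
    """VCs that a given channel lane may write into (contiguous blocks, hypothetical)."""
    if num_vcs % num_lanes:
        raise ValueError("assumes num_vcs is a multiple of num_lanes")
    per_lane = num_vcs // num_lanes
    return list(range(lane * per_lane, (lane + 1) * per_lane))


def lane_for_vc(vc: int, num_vcs: int, num_lanes: int) -> int:
    """Lane a flit uses, chosen from its (output) VC."""
    return vc // (num_vcs // num_lanes)


def gateable_vcs(active_lanes: set[int], num_vcs: int, num_lanes: int) -> list[int]:
    """VCs whose buffers can sleep because their lane is inactive."""
    return [vc for vc in range(num_vcs)
            if lane_for_vc(vc, num_vcs, num_lanes) not in active_lanes]


print(vcs_for_lane(1, num_vcs=4, num_lanes=4))    # [1]  (1-to-1)
print(vcs_for_lane(1, num_vcs=4, num_lanes=2))    # [2, 3]
print(gateable_vcs({0}, num_vcs=4, num_lanes=2))  # [2, 3]
```

Each lane feeds a fixed set of VCs, so no lane-to-VC multiplexer is needed in front of the buffers.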
![Page 14: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/14.jpg)
Crossbar Gating
“Segment gating for static energy reduction in networks-on-chip”. NoCArc 2009
[Figure: crossbar between input VCs and output VCs (VC1 to VC4)]
![Page 15: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/15.jpg)
Activation Mechanism
Flits winning switch allocation (SA) activate in the next router:
Output channels and switch lanes (3 cycles)
Input buffers (1 cycle with drowsy SRAMs)
No false activations
With the 4-stage router pipeline below, activation causes no stalls
[Figure: 4-stage router pipeline (InBuf, VA, SA, ST) shown for consecutive routers]
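Whether activation stalls a flit comes down to slack. An activation triggered at cycle t with delay d stalls the flit only if the resource is needed before cycle t + d. The sketch below checks this for an assumed timeline, which is ours and only for illustration. The flit wins SA at cycle 0 and is written into the next router's input buffer at cycle 2. It then goes through VA, SA, and ST there at cycles 3, 4, and 5. We also assume that the next hop's output port is already known, as with lookahead routing under DOR. The delays are the 1-cycle drowsy-buffer and 3-cycle channel/switch-lane values from the slides.

```python
def stall_cycles(trigger_cycle: int, activation_delay: int, needed_cycle: int) -> int:
    """Cycles a flit waits because a resource is not yet active when it is needed."""
    ready_cycle = trigger_cycle + activation_delay
    return max(0, ready_cycle - needed_cycle)


# Assumed timeline, in cycles relative to the flit winning SA upstream (hypothetical).
SA_WIN = 0
NEXT_BUFFER_WRITE = 2  # InBuf stage in the next router
NEXT_ROUTER_ST = 5     # InBuf (2), VA (3), SA (4), ST (5) in the next router

BUFFER_DELAY = 1       # drowsy SRAM wake-up, from the slide
CHANNEL_DELAY = 3      # output channel and switch lane activation, from the slide

print(stall_cycles(SA_WIN, BUFFER_DELAY, NEXT_BUFFER_WRITE))  # 0
print(stall_cycles(SA_WIN, CHANNEL_DELAY, NEXT_ROUTER_ST))    # 0

# Hypothetical shallow router that switches a flit out in the cycle it arrives:
print(stall_cycles(SA_WIN, CHANNEL_DELAY, NEXT_BUFFER_WRITE))  # 1
```

The zero-stall result depends entirely on the slack between SA upstream and first use downstream. A shorter pipeline removes that slack, which is why the result can change with shallow router pipelines.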
![Page 16: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/16.jpg)
Router Impact
ABN switch allocators: ( Inputs x VCs ) x ( Outputs x ChanLanes )
As long as ChanLanes is no greater than VCs, the switch allocator is no more complex than the VC allocator
If the VC and switch allocators are in different pipeline stages, router cycle time is not extended
VC allocators consume 2-10mW and occupy 5000 µm²
Both are very small percentages of the router's power and area
Therefore, the increase in the switch allocator's cost is insignificant
“Allocator implementations for network-on-chip routers”. SC 2009
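The size argument can be checked by counting requesters and resources. The ABN switch allocator uses the dimensions given on the slide. For comparison, the VC allocator matches every input VC against every output VC, (Inputs x VCs) x (Outputs x VCs). This compares only request-matrix dimensions, not gate count or delay. The port, VC, and lane counts are hypothetical, and inputs and outputs are both taken to equal the router's port count.

```python
def vc_allocator_dims(ports: int, vcs: int) -> tuple[int, int]:
    """(requesters, resources) for a VC allocator: input VCs x output VCs."""
    return ports * vcs, ports * vcs


def abn_switch_allocator_dims(ports: int, vcs: int, lanes: int) -> tuple[int, int]:
    """(requesters, resources) for the ABN switch allocator: input VCs x output lanes."""
    return ports * vcs, ports * lanes


def switch_no_larger_than_vc(ports: int, vcs: int, lanes: int) -> bool:
    """True if the switch allocator's request matrix is no larger than the VC allocator's."""
    sa_req, sa_res = abn_switch_allocator_dims(ports, vcs, lanes)
    va_req, va_res = vc_allocator_dims(ports, vcs)
    return sa_req * sa_res <= va_req * va_res


# Hypothetical 5-port mesh router.
print(switch_no_larger_than_vc(ports=5, vcs=2, lanes=2))  # True  (lanes <= VCs)
print(switch_no_larger_than_vc(ports=5, vcs=2, lanes=4))  # False (lanes > VCs)
```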
![Page 17: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/17.jpg)
Today’s Menu
Leakage power / motivation
Related work
Variable-width datapaths
Results
Conclusions
![Page 18: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/18.jpg)
Methodology
Booksim network simulator
8x8 mesh. Dimension-order routing (DOR)
We compare:
Single-lane: Single-lane power-gated network
ABN: Flits choose a lane based on their output VC
Multinets: Multiple power-gated subnetworks
Same router pipeline as presented previously
2 VCs in the baseline. Buffer size is normalized by adjusting the number of VCs
Activation and deactivation delays (65nm at 1GHz):
Channel and crossbar activation delay: 3 cycles
Channel and crossbar activation wait: 15 cycles
Channel and crossbar deactivation wait: 6 cycles
Buffer (VC) deactivation wait: 3 cycles
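To see how the wait and delay parameters translate into sleep time, here is a simplified idle-timeout model. It is our assumption, not Booksim's code. A resource turns off after `deactivation_wait` consecutive idle cycles. It must be turned back on `activation_delay` cycles before its next use, so only the remainder of each idle gap is spent gated. The 15-cycle activation wait is not modeled.

```python
def gated_cycles(busy: list[bool], deactivation_wait: int, activation_delay: int) -> int:
    """Count cycles a resource spends powered off under an idle-timeout policy.

    busy[t] is True if the resource is used in cycle t; the resource starts active.
    Each idle gap first waits `deactivation_wait` cycles before gating. If the gap
    ends in a use, the last `activation_delay` cycles are spent waking up (wake-up
    is assumed to be triggered early enough that no stall occurs).
    """
    total = 0
    gap = 0
    for used in busy:
        if used:
            total += max(0, gap - deactivation_wait - activation_delay)
            gap = 0
        else:
            gap += 1
    total += max(0, gap - deactivation_wait)  # trailing gap: no wake-up needed
    return total


trace = [True] + [False] * 20 + [True]  # hypothetical 20-cycle idle gap
print(gated_cycles(trace, deactivation_wait=6, activation_delay=3))  # channel/crossbar: 11
print(gated_cycles(trace, deactivation_wait=3, activation_delay=1))  # buffer (VC): 16
```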
![Page 19: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/19.jpg)
Two Subnetworks/Lanes. Static Power
[Plot: static power (W) vs. injection rate (request packets/cycle * 1000) for Single lane, ABN, and MultiNets; 8x8 mesh, DOR, UR traffic]
UR traffic is the worst case for ABNs
ABNs power down resources more effectively at high loads
![Page 20: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/20.jpg)
Two Subnetworks/Lanes. Dynamic Power
[Plot: dynamic power (W) vs. injection rate (request packets/cycle * 1000) for Single lane, ABN, and MultiNets; 8x8 mesh, DOR, UR traffic]
UR traffic is the worst case for ABNs
Multinets takes advantage of lower-radix switches
![Page 21: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/21.jpg)
Two Subnetworks. Latency
[Plot: average latency (clock cycles) vs. injection rate (request packets/cycle * 1000) for Single lane, ABN, and MultiNets; 8x8 mesh, DOR, UR traffic]
Multinets cannot make perfect injection decisions, nor can they use resources in another subnetwork after injection to combat transient imbalance
![Page 22: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/22.jpg)
PARSEC Results
[Chart: percentage total power reduction (%) for the Low, Medium, and High groups]
Two ABN lanes and two multinet subnetworks
![Page 23: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/23.jpg)
Scaling Up
[Chart: percentage total power reduction (%) for the Low, Medium, and High groups]
Four ABN lanes and four multinet subnetworks
![Page 24: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/24.jpg)
Scaling Up
[Plot: average latency (clock cycles) vs. injection rate (request packets/cycle * 1000) for ABN two lanes, ABN four lanes, Two multinets, and Four multinets; 8x8 mesh, DOR, UR traffic]
The effects of transient imbalance in multinets intensify with four subnetworks
![Page 25: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/25.jpg)
Today’s Menu
Leakage power / motivation
Related work
Variable-width datapaths
Results
Conclusions
![Page 26: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/26.jpg)
Conclusions
Leakage power is a growing concern in future technologies
Dividing datapaths into lanes provides more flexibility than multi-network approaches
But there are tradeoffs
Using drowsy SRAMs allows hiding the activation delay without false activations
This can change with shallower router pipelines
We demonstrate an average of 33% total power reduction with PARSEC benchmarks
![Page 27: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf](https://reader033.vdocuments.us/reader033/viewer/2022050116/5f4d3648e0e6df0443297ddc/html5/thumbnails/27.jpg)
Questions?
Acknowledgment: DOE Office of Science