
Three-Dimensional Layout of On-Chip Tree-Based Networks

Hiroki Matsutani (Keio Univ, Japan), Michihiro Koibuchi (NII, Japan), D. Frank Hsu (Fordham Univ, USA), Hideharu Amano (Keio Univ, Japan)

Outline
• Introduction
  – Network-on-Chip (NoC)
  – 2-D vs. 3-D
• Fat Tree
  – 2-D layout
  – 3-D layout
• Fat H-Tree
  – 2-D layout
  – 3-D layout
• Evaluations
  – Area, Wire length, Energy

[Matsutani, IPDPS’07]

Network-on-Chip (NoC)
• Tile architectures
  – MIT RAW [Taylor, Micro’02]
  – Texas U. TRIPS [Burger, Computer’04]
  – Intel 80-tile NoC [Vangal, ISSCC’07]
• Various topologies
  – Mesh, Torus
  – Fat Trees
  – Fat H-Tree (FHT)

[Figure: 16-core tile architecture; each tile contains a core & router; packet-switched network on a chip]

We proposed FHT as an alternative to Fat Trees [Matsutani, IPDPS’07]

2D Topologies: Mesh & Torus

[Figure legend: Router, Core]

• 2-D Mesh
  – RAW [Taylor, IEEE Micro’02]
• 2-D Torus
  – 2x bandwidth of Mesh

2D Topologies: Fat Tree

• Fat Tree (p, q, c)
  – p: # of upward links
  – q: # of downward links
  – c: # of core ports

[Figure: Fat Tree (2,4,2) and Fat Tree (2,4,1) with Rank-1 and Rank-2 routers; legend: Router, Core]

In this talk, we focus on the 3-D layout scheme of tree-based topologies

2D NoC vs. 3D NoC
• 2D NoCs
  – Long wires (esp. trees)
  – Wire delay
  – Packets consume power at links according to their wire length
• 3D NoCs
  – Several small wafers or dice are stacked
• Vertical link
  – Micro bump
  – Through-wafer via
  – Very short (10-50um)

[Ezaki, ISSCC’04] [Burns, ISSCC’01]

Long horizontal wires in 2D NoCs can be replaced by very short vertical links in 3D

The next slides show the 3D layout schemes of Fat Tree and FHT

Fat Tree: 2-D layout

[Figure: 2-D layouts of Fat Tree (2,4,2) and Fat Tree (2,4,1); legend: Router, Core]

We first show the 3D layout scheme of Fat Trees

Fat Tree: 3-D layout (4-split)

• Transformation from 2-D coordinates $(X_{2D}, Y_{2D})$ to 3-D coordinates $(X_{3D}, Y_{3D}, Z_{3D})$
• The original 2-D layout is divided into 4 layers (Layer-0, Layer-1, Layer-2, Layer-3)
• Top-rank routers are distributed to each layer

Top-rank links are replaced with vertical interconnects (10-50um)

[Figure: 3-D layout (4-stacked), Layer-0 shown]

This 3-D layout is evaluated in terms of area, wire length, & energy

• Evaluations – Area, Wire length, Power
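The slides do not spell out the coordinate transformation itself. A minimal sketch, assuming the 4-split simply cuts the 2-D chip into four quadrants and stacks each quadrant as one layer; the function name `to_3d_4split` and the layer numbering are hypothetical:

```python
from typing import Tuple


def to_3d_4split(x2d: int, y2d: int, side: int) -> Tuple[int, int, int]:
    """Map a core at (x2d, y2d) on a side x side 2-D grid to (x3d, y3d, z3d).

    Assumption (not from the slides): each quadrant of the 2-D chip becomes
    one layer, so a 64-core chip (8x8) turns into four 16-core (4x4) layers.
    The layer index z3d = 2*qy + qx is a hypothetical numbering.
    """
    if side % 2:
        raise ValueError("side must be even")
    half = side // 2
    qx, qy = x2d // half, y2d // half  # quadrant index (0 or 1) along each axis
    return x2d % half, y2d % half, 2 * qy + qx


if __name__ == "__main__":
    # 64-core 2-D chip -> 16-core x 4 layers
    cores = [(x, y) for y in range(8) for x in range(8)]
    layers = [to_3d_4split(x, y, 8)[2] for x, y in cores]
    for z in range(4):
        print(f"Layer-{z}: {layers.count(z)} cores")
```

If each subtree below the top rank occupies one quadrant, its links stay inside one layer under this mapping, and only the links to the top-rank routers cross layers, which matches the slide's point that top-rank links become vertical interconnects.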

Fat H-Tree: Structure

• Fat H-Tree
  – Red Tree (H-Tree)
  – Black Tree (H-Tree)

Combining two H-Trees (red & black)

[Figure legend: Router, Core]

The black tree is shifted toward the lower right of the red tree

Because the black tree is shifted, the connection pattern of the trees differs from that of the original Fat Trees

Fat H-Tree is formed from the red & black trees

Rank-2 and upper routers are omitted in this figure

Each core is connected to both the red & black trees

A ring is formed with cores & rank-1 routers

Torus-level performance by combining only two H-Trees

Fat H-Tree: 2-D layout on VLSI

• Fat H-Tree
  – Torus structure
  – Folded in the same way as the folded layout of a 2-D Torus

[Figure: Fat H-Tree’s 2-D layout and a topologically equivalent layout with long feedback links across the chip; legend: Router, Core]
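Folding is the standard way to remove a torus's long wrap-around (feedback) links. A minimal sketch along one dimension, assuming the usual interleaved placement 0, k-1, 1, k-2, ...; the helper names are hypothetical, and how the slides place the Fat H-Tree's routers is not reproduced here:

```python
def folded_slot(i: int, k: int) -> int:
    """Physical slot of ring node i (0..k-1) in a folded ring of even size k.

    Nodes are interleaved as 0, k-1, 1, k-2, ... so the wrap-around
    link (k-1 -> 0) becomes a short link instead of spanning the chip.
    """
    if k % 2:
        raise ValueError("k must be even")
    return 2 * i if i < k // 2 else 2 * (k - 1 - i) + 1


def longest_link(k: int, folded: bool) -> int:
    """Longest ring link, measured in slots, for the plain or folded placement."""
    def slot(i: int) -> int:
        return folded_slot(i, k) if folded else i
    return max(abs(slot(i) - slot((i + 1) % k)) for i in range(k))


if __name__ == "__main__":
    k = 8  # one row (or column) of an 8x8 torus
    print("plain :", longest_link(k, folded=False))  # wrap-around spans the row
    print("folded:", longest_link(k, folded=True))
```

Applying the same placement to every row and column of a 2-D torus bounds every link to two core pitches, at the cost of no link being exactly one pitch long except at the fold points.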

The next slides propose the 3D layout scheme of Fat H-Tree

Fat H-Tree: 3-D layout (overview)

• Fat H-Tree
  – (Problem) Fat H-Tree has a torus structure consisting of red & black trees
  – Folding so as to keep the torus structure

(step 1) Fold it horizontally

(step 2) Fold it vertically

Repeat until the # of folded pieces matches the # of layers in the 3-D IC

E.g., for four layers, fold twice
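The two folding steps can be sketched on a plain k x k torus standing in for the Fat H-Tree's torus structure. The helper names and the layer numbering `2*hy + hx` are hypothetical, and the physical stacking order of the layers is not modeled. The check shows that, after folding twice, every torus link is either a short in-layer link or joins two cores at the same (x, y) on different layers, i.e. a link a vertical interconnect can carry:

```python
from typing import Iterator, Tuple

Coord = Tuple[int, int]


def fold(c: int, k: int) -> Tuple[int, int]:
    """Fold coordinate c (0..k-1) in half; return (folded coordinate, half index)."""
    half = k // 2
    return (c, 0) if c < half else (k - 1 - c, 1)


def fold_twice(x: int, y: int, k: int) -> Tuple[int, int, int]:
    """Step 1: fold horizontally; step 2: fold vertically -> 4 pieces (layers)."""
    fx, hx = fold(x, k)
    fy, hy = fold(y, k)
    return fx, fy, 2 * hy + hx  # hypothetical layer index, not a stacking order


def torus_links(k: int) -> Iterator[Tuple[Coord, Coord]]:
    """All links of a k x k 2-D torus, including the wrap-around links."""
    for x in range(k):
        for y in range(k):
            yield (x, y), ((x + 1) % k, y)
            yield (x, y), (x, (y + 1) % k)


def link_kind(a: Coord, b: Coord, k: int) -> str:
    """Classify a torus link after folding: in-layer, vertical, or long."""
    ax, ay, az = fold_twice(a[0], a[1], k)
    bx, by, bz = fold_twice(b[0], b[1], k)
    if az == bz and abs(ax - bx) + abs(ay - by) == 1:
        return "in-layer"
    if (ax, ay) == (bx, by) and az != bz:
        return "vertical"
    return "long"


if __name__ == "__main__":
    k = 8
    kinds = [link_kind(a, b, k) for a, b in torus_links(k)]
    print({kind: kinds.count(kind) for kind in sorted(set(kinds))})
```

The links that cross a fold line, including the former wrap-around links at the chip periphery, are exactly the ones that become vertical.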

Here we show the 3D layouts of red & black trees separately

Fat H-Tree: 3-D (Red tree; 4-split)

[Figure: transformation of the red tree from the original 2-D layout to the 3-D layout (4-stacked; Layer-0 to Layer-3)]

Fat H-Tree: 3-D (Black tree; 4-split)

[Figure: transformation of the black tree from the 2-D layout to the 3-D layout (Layer-0 to Layer-3)]

The periphery cores are connected to different layers; they can be connected via only a vertical link

Fat H-Tree: 3-D layout (4-split)

[Figure: red tree (3-D) + black tree (3-D) = Fat H-Tree (3-D), Layer-0 shown]

The 3-D layout of Fat H-Tree can be formed by superimposing the 3-D layouts of the red & black trees

Evaluations: 2-D vs. 3-D

• 2-D layout
  – 64-core
• 3-D layout
  – 16-core x 4-layer
  – Vertical interconnects

[Figure: 2-D vs. 3-D layout; dimension label: L/2 mm]

Network logic area: # of routers

Topology   # of routers ($N = 4^n$)   N=16   N=64   N=256
FT1        $(4^n - 2^n)/2$               6     28     120
FT2        $4^n - 2^n$                  12     56     240
FHT        $2(4^n - 1)/3$               10     42     170
3D mesh    $N$                          16     64     256
3D torus   $N$                          16     64     256

Trees need fewer routers, with fewer ports, than mesh/torus:
• 3-D mesh/torus: node degree 7
• Fat H-Tree: node degree 5
• Fat Tree (2,4,2): node degree 6

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2) FHT: Fat H-Tree
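The closed forms for the trees, $(4^n - 2^n)/2$, $4^n - 2^n$ and $2(4^n - 1)/3$, can be cross-checked against the tabulated counts. A minimal sketch with hypothetical function names; the rank-by-rank counts are an assumed derivation, not stated on the slide: Fat Tree (2,4,1) is taken to have $N/4$ rank-1 routers with each higher rank half as large (2 upward, 4 downward links per router), and each H-Tree is taken to be a 4-ary tree with $4^{n-1}$ rank-1 routers. Both reproduce the slide's numbers:

```python
def routers_ft1(n: int) -> int:
    """Fat Tree (2,4,1): (4^n - 2^n)/2 routers for N = 4^n cores."""
    return (4**n - 2**n) // 2


def routers_ft2(n: int) -> int:
    """Fat Tree (2,4,2): 4^n - 2^n routers."""
    return 4**n - 2**n


def routers_fht(n: int) -> int:
    """Fat H-Tree: two H-Trees (red & black), 2(4^n - 1)/3 routers in total."""
    return 2 * (4**n - 1) // 3


def routers_ft1_by_rank(n: int) -> int:
    """Assumed derivation: N/4 rank-1 routers, each higher rank half as many."""
    rank_size, total = 4**n // 4, 0
    for _ in range(n):
        total += rank_size
        rank_size //= 2
    return total


def routers_htree_by_rank(n: int) -> int:
    """Assumed derivation: one 4-ary H-Tree with ranks of 4^(n-1), ..., 4, 1 routers."""
    return sum(4**i for i in range(n))


TABLE = {  # values from the slide for N = 16, 64, 256
    "FT1": (6, 28, 120),
    "FT2": (12, 56, 240),
    "FHT": (10, 42, 170),
}

if __name__ == "__main__":
    for i, n in enumerate((2, 3, 4)):
        computed = (routers_ft1(n), routers_ft2(n), routers_fht(n))
        expected = tuple(TABLE[t][i] for t in ("FT1", "FT2", "FHT"))
        print(f"N={4**n}: computed {computed}, slide {expected}")
```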

Network logic area: 2-D vs. 3-D

[Davis, DToC’05]

• Network logic area
  – Routers, NIs
  – Inter-wafer vias
• Wormhole router
  – 1-flit = 64-bit
  – 3-stage pipeline
• Network interface
  – FIFO buffer
  – Packet forwarding (Fat H-Tree only)
• Inter-wafer via
  – 1-10um square
  – 100um per layer per 1-bit signal

Inter-wafer via area is calculated according to # of vertical links

[Figure: typical wormhole router (arbiter, 5x5 XBAR)]

Synthesized with a 90nm CMOS

[Matsutani, ASPDAC’08]

Network logic area: Overhead of 3D

Synthesis result of 64-core (16-core x 4)

[Figure: network logic area of 2D torus, 3D torus, and the 3-D trees; inter-wafer via area (+7.8%)]

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2) FHT: Fat H-Tree

3D layout of trees: area overhead is modest (at most 7.8%)

Total wire length of all links

• Total unit-length of links
  – Core-router
  – Router-router
• 1-unit = distance between neighboring cores

How many unit-links are required?

[Figure: 1-unit link]

Total wire length of all links (2-D layout)

$N = 4^n$ cores on a $2^n \times 2^n$ grid; lengths in 1-unit links

Topology   Total length [units]   N=16   N=64   N=256
2D FT1     $nN$                     32    192   1,024
2D FT2     –                        64    384   2,048
2D FHT     –                        72    392   1,800
2D mesh    $2(N - 2^n)$             24    112     480
2D torus   $4(N - 2^n)$             48    224     960

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2) FHT: Fat H-Tree
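The 2-D closed forms on this slide, $nN$ (FT1), $2(N - 2^n)$ (mesh) and $4(N - 2^n)$ (torus), can be checked against the tabulated totals. A minimal sketch with hypothetical function names; the direct mesh count is added only to show where $2(N - 2^n)$ comes from, since a $k \times k$ mesh with $k = 2^n$ has $2k(k-1)$ unit links:

```python
def wire_2d_ft1(n: int) -> int:
    """2-D Fat Tree (2,4,1): nN unit links for N = 4^n cores."""
    return n * 4**n


def wire_2d_mesh(n: int) -> int:
    """2-D mesh: 2(N - 2^n) unit links."""
    return 2 * (4**n - 2**n)


def wire_2d_torus(n: int) -> int:
    """2-D torus: 4(N - 2^n) units, twice the mesh total."""
    return 4 * (4**n - 2**n)


def mesh_links_by_counting(k: int) -> int:
    """Count unit links of a k x k mesh directly: k-1 per row and per column."""
    horizontal = k * (k - 1)
    vertical = k * (k - 1)
    return horizontal + vertical


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"N={4**n}: FT1 {wire_2d_ft1(n)}, mesh {wire_2d_mesh(n)}, torus {wire_2d_torus(n)}")
```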

Total wire length of all links (3-D layout, 4-stacked)

Topology   Total length [units]   N=16   N=64   N=256
3D FT1     $(n-1)N$                 16    128     768
3D FT2     $2(n-1)N$                32    256   1,536
3D FHT     –                        40    200     904
3D mesh    $2(N - 2^{n+1})$         16     96     448
3D torus   $4(N - 2^{n+1})$         32    192     896

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2) FHT: Fat H-Tree

Wire length of trees is reduced by 25%-50% (close to torus)
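The 25%-50% claim can be recomputed from the tabulated 2-D and 3-D totals, and the 3-D closed forms $(n-1)N$, $2(n-1)N$, $2(N - 2^{n+1})$ and $4(N - 2^{n+1})$ can be checked the same way. A minimal sketch; the function names are hypothetical and the numbers are copied from the two wire-length tables:

```python
WIRE_2D = {"FT1": (32, 192, 1024), "FT2": (64, 384, 2048), "FHT": (72, 392, 1800)}
WIRE_3D = {"FT1": (16, 128, 768), "FT2": (32, 256, 1536), "FHT": (40, 200, 904)}


def wire_3d(topology: str, n: int) -> int:
    """3-D (4-stacked) closed forms from the slide, for N = 4^n cores."""
    N = 4**n
    return {
        "FT1": (n - 1) * N,
        "FT2": 2 * (n - 1) * N,
        "mesh": 2 * (N - 2**(n + 1)),
        "torus": 4 * (N - 2**(n + 1)),
    }[topology]


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction in total wire length from 2-D to 3-D."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for topo in WIRE_2D:
        pcts = [reduction_pct(b, a) for b, a in zip(WIRE_2D[topo], WIRE_3D[topo])]
        print(topo, [f"{p:.1f}%" for p in pcts])
```

The smallest reduction comes from Fat Tree at N=256 and the largest from Fat Tree at N=16, which bounds the range quoted on the slide.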

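Energy: NoC’s energy model

• Ave. flit energy
  – How much energy [J] is needed to send 1 flit to its destination?
• Parameters
  – 8mm square chip
  – 64-core (16-core x 4)
  – 90nm CMOS
• Switching energy
  – 1-bit switching @ Router
  – Gate-level sim
  – 0.183 [pJ / hop]
• Link energy
  – 1-bit transfer @ Link
  – 0.150 [pJ / mm]
• Via energy
  – 4.34 [fF / via]

$E_{flit} = w \, (H_{ave} \, E_{sw} + E_{links})$

where $w$ is the flit width, $H_{ave}$ the average hop count, $E_{sw}$ the 1-bit switching energy per hop, and $E_{links}$ the 1-bit energy of the links a flit traverses

[Davis, DToC’05]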
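The model can be evaluated directly. A minimal sketch with the slide's per-bit constants and a 64-bit flit; the hop count and wire length in the example are hypothetical, and via energy is left out because the slide gives it as a capacitance (4.34 fF per via), and converting that to joules would need a supply voltage the slides do not state:

```python
E_SW_PJ_PER_HOP = 0.183   # 1-bit switching energy per router hop [pJ] (slide)
E_LINK_PJ_PER_MM = 0.150  # 1-bit link energy per mm of wire [pJ] (slide)
FLIT_WIDTH_BITS = 64      # 1 flit = 64 bits (router parameters)


def flit_energy_pj(hops: float, wire_mm: float, w: int = FLIT_WIDTH_BITS) -> float:
    """E_flit = w * (H * E_sw + E_links), in pJ.

    Simplification: E_links is taken as wire_mm * E_link only; via energy
    is ignored here (see the note above).
    """
    e_links = wire_mm * E_LINK_PJ_PER_MM
    return w * (hops * E_SW_PJ_PER_HOP + e_links)


if __name__ == "__main__":
    # Hypothetical flit path: 3 router hops over 4 mm of wire.
    print(f"{flit_energy_pj(3, 4.0):.3f} pJ")
```

With the hop count held fixed, shortening the wire a flit travels lowers $E_{flit}$ linearly, which is the mechanism behind the 3-D savings on the next slide.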

Energy: Reduction by going 3D

[Figure: flit energy of the 2-D layout vs. the 3-D layout]

• 2-D layout: frequent use of the longest links
• Short hop count → less energy

Moving distance of packets is reduced

The 3D layout of trees reduces the energy by 30.8%-42.9%

Summary: 3-D layout of trees

• Drawbacks of on-chip tree-based topologies
  – Long links around the root of the tree
  – Wire delay problem
  – Repeater insertion → additional energy consumption
• 3-D layout schemes of Fat Trees & Fat H-Tree
  – Wire length is reduced by 25%-50%
  – Area overhead is at most 7.8%
  – Flit transmission energy is reduced by 30.8%-42.9%

Need to consider negative impacts of 3-D (cost, heat, yield…)

In addition, energy-hungry repeater buffers can be removed

Thank you for your attention

Backup slides

[Figure: energy of the 2-D layout without repeaters vs. the 2-D layout with repeaters (*); energy is increased with repeaters]

(*) Repeater insertion model: N. Weste et al., “CMOS VLSI Design (3rd ed)”, 2005.
