Networks-on-Chips (NoCs) Basics ECE 284 On-Chip Interconnection Networks Spring 2013

Upload: rhoda-conley

Post on 16-Jan-2016


Page 1: Networks-on-Chips (NoCs) Basics ECE 284 On-Chip Interconnection Networks Spring 2013

Networks-on-Chips (NoCs) Basics

ECE 284 On-Chip Interconnection Networks

Spring 2013

Page 2

Examples of Tiled Multiprocessors

• 2D-mesh networks often used as on-chip fabric

[Figure: die photos of the Tilera Tile64 and the Intel 80-core chip. Labels: I/O Area (two regions), single tile (1.5mm × 2.0mm), die (21.72mm × 12.64mm).]

Technology:   65nm, 1 poly, 8 metal (Cu)
Transistors:  100 Million (full-chip), 1.2 Million (tile)
Die Area:     275mm² (full-chip), 3mm² (tile)
C4 bumps #:   8390

Page 3

Typical architecture

• Each tile typically comprises the CPU, a local L1 cache, a “slice” of a distributed L2 cache, and a router

[Figure: block diagram of a tile. Labels: Compute Unit, Router, CPU, L1 Cache, Slice of L2 Cache.]

Page 4

Router function

• The job of the router is to forward packets from a source tile to a destination tile (e.g., when a “cache line” is read from a “remote” L2 slice).

• Two example switching modes:
  – Store-and-forward: Bits of a packet are forwarded only after the entire packet has been stored.
  – Cut-through: Bits of a packet are forwarded as soon as the header portion is received.
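To make the difference between the two modes concrete, the sketch below compares their zero-load latency. It assumes a common textbook model that is not taken from the slides: a store-and-forward router pays the full serialization delay at every hop, while a cut-through router pays it only once. The function names and parameters are hypothetical.

```python
def store_and_forward_latency(hops, packet_bits, link_bits_per_cycle, router_delay):
    """Cycles to deliver a packet when every hop stores the whole packet first."""
    serialization = packet_bits / link_bits_per_cycle  # cycles to push the packet over one link
    return hops * (router_delay + serialization)


def cut_through_latency(hops, packet_bits, link_bits_per_cycle, router_delay):
    """Cycles to deliver a packet when forwarding starts once the header arrives."""
    serialization = packet_bits / link_bits_per_cycle
    return hops * router_delay + serialization
```

With assumed values of 4 hops, a 512-bit packet, 128-bit links, and a 2-cycle router delay, the first function gives 24 cycles and the second gives 12. The gap grows with both hop count and packet length.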

Page 5

Store-and-forward switching

Packets are completely stored before any portion is forwarded

[Figure: path from the source end node to the destination end node through switches with buffers for data packets; step shown: Store.]

[adapted from instructional slides of Pinkston & Duato, Computer Architecture: A Quantitative Approach]

Page 6

Store-and-forward switching

Packets are completely stored before any portion is forwarded

Requirement: buffers must be sized to hold entire packet

[Figure: path from the source end node to the destination end node; steps shown: Store, Forward.]

[adapted from instructional slides of Pinkston & Duato, Computer Architecture: A Quantitative Approach]

Page 7

Cut-through switching

• Virtual cut-through
  – Buffers for data packets
  – Requirement: buffers must be sized to hold entire packet

• Wormhole
  – Buffers for flits: packets can be larger than buffers

[Figure: for each mode, a packet in flight from the source end node to the destination end node.]

[adapted from instructional slides of Pinkston & Duato, Computer Architecture: A Quantitative Approach]

Page 8

Cut-through switching

• Virtual cut-through
  – Buffers for data packets
  – Requirement: buffers must be sized to hold entire packet (MTU)
  – Busy link: packet completely stored at the switch

• Wormhole
  – Buffers for flits: packets can be larger than buffers
  – Busy link: packet stored along the path

[Figure: for each mode, a packet blocked by a busy link between the source end node and the destination end node.]

[adapted from instructional slides of Pinkston & Duato, Computer Architecture: A Quantitative Approach]
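The behavior on a busy link can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that a blocked packet compacts fully into the buffers of the switches behind the busy link, and that every switch has the same buffer depth. The function name is hypothetical.

```python
import math


def switches_holding_blocked_packet(packet_flits, buffer_flits):
    """Number of consecutive switches whose buffers hold part of a blocked packet."""
    return math.ceil(packet_flits / buffer_flits)
```

For an 8-flit packet, virtual cut-through with 8-flit buffers keeps the whole packet at 1 switch, while wormhole with 2-flit buffers spreads it over 4 switches along the path.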

Page 9

Packets to flits

Transact. Type | Message Type | Packet Size
Read           | Request      | 1 flit
Read           | Reply        | 1+n flits
Write          | Request      | 1+n flits
Write          | Reply        | 1 flit

[adapted from Becker STM’09 talk]
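The packet sizes above can be expressed as a small lookup. This is a sketch under the assumption that n is the number of flits needed to carry the data payload (for example a cache line) and that the remaining flit is the header. The 64-byte line and 16-byte flit defaults are illustrative values, not taken from the slides.

```python
import math

# Messages that carry data in addition to the single header flit.
DATA_CARRYING = {("read", "reply"), ("write", "request")}


def packet_size_in_flits(transaction, message, line_bytes=64, flit_bytes=16):
    """Return the packet size in flits for a (transaction, message) pair."""
    n = math.ceil(line_bytes / flit_bytes)  # payload flits
    return 1 + n if (transaction, message) in DATA_CARRYING else 1
```

With the default values, a read request is 1 flit and a read reply is 5 flits.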

Page 10

Wormhole routing

• Head flit establishes the connection from input port to output port. It contains the destination address.

• Body flits go through the established connection (they do not need destination address information).

• Tail flit releases the connection.

• All other flits are blocked until the connection is released.
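The head/body/tail protocol can be sketched as a small state machine on one output port. This is a minimal model, not a router implementation: the class and field names are hypothetical, and single-flit packets (where the head is also the tail) are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Flit:
    packet_id: int
    kind: str  # "head", "body", or "tail"


class OutputPort:
    """Tracks which packet currently holds the connection to this output."""

    def __init__(self):
        self.owner = None  # packet_id of the packet holding the port, if any

    def try_send(self, flit):
        """Return True if the flit may use the port, updating ownership."""
        if flit.kind == "head":
            if self.owner is not None:
                return False  # blocked: another packet holds the connection
            self.owner = flit.packet_id  # head flit establishes the connection
            return True
        if self.owner != flit.packet_id:
            return False  # body/tail flit of a packet that does not own the port
        if flit.kind == "tail":
            self.owner = None  # tail flit releases the connection
        return True
```

A head flit from a second packet is rejected until the first packet's tail flit has passed, which is the blocking behavior described in the last bullet.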

Page 11

Deadlock

Page 12

Virtual channels

• Share channel capacity between multiple data streams
  – Interleave flits from different packets

• Provide dedicated buffer space for each virtual channel
  – Decouple channels from buffers

• “The Swiss Army Knife for Interconnection Networks”
  – Prevent deadlocks
  – Reduce head-of-line blocking
  – Also useful for providing QoS

[adapted from Becker STM’09 talk]

Page 13

Using VCs for deadlock prevention

• Protocol deadlock
  – Circular dependencies between messages at network edge
  – Solution:
    • Partition range of VCs into different message classes

• Routing deadlock
  – Circular dependencies between resources within network
  – Solution:
    • Partition range of VCs into different resource classes
    • Restrict transitions between resource classes to impose partial order on resource acquisition

• {packet classes} = {message classes} × {resource classes}

[adapted from Becker STM’09 talk]
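The Cartesian product of classes maps naturally onto a partition of the VC index range. In the sketch below, the two message classes and the two dateline-style resource classes are assumed examples, and all names are hypothetical. The `vcs_per_class` parameter shows how each packet class can be given more than one VC.

```python
from itertools import product

MESSAGE_CLASSES = ("request", "reply")                      # assumed example
RESOURCE_CLASSES = ("before_dateline", "after_dateline")    # assumed example

# {packet classes} = {message classes} x {resource classes}
PACKET_CLASSES = list(product(MESSAGE_CLASSES, RESOURCE_CLASSES))


def vc_range(message_class, resource_class, vcs_per_class=1):
    """Return the VC indices reserved for one packet class."""
    idx = PACKET_CLASSES.index((message_class, resource_class))
    return range(idx * vcs_per_class, (idx + 1) * vcs_per_class)


def transition_allowed(src_resource_class, dst_resource_class):
    """Allow only moves to the same or a later resource class (partial order)."""
    order = RESOURCE_CLASSES.index
    return order(dst_resource_class) >= order(src_resource_class)
```

With two VCs per class, requests before the dateline use VCs 0 and 1, and replies after the dateline use VCs 6 and 7. A packet may move from `before_dateline` to `after_dateline` but never back.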

Page 14

Using VCs for flow control

• Coupling between channels and buffers causes head-of-line blocking
  – Adds false dependencies between packets
  – Limits channel utilization
  – Increases latency
  – Even with VCs for deadlock prevention, still applies to packets in same class

• Solution:
  – Assign multiple VCs to each packet class

[adapted from Becker STM’09 talk]

Page 15

VC router pipeline

• Route Computation (RC)
  – Determine candidate output port(s) and VC(s)
  – Can be precomputed at upstream router (lookahead routing)

• Virtual Channel Allocation (VA)
  – Assign available output VCs to waiting packets at input VCs

• Switch Allocation (SA)
  – Assign switch time slots to buffered flits

• Switch Traversal (ST)
  – Send flits through crossbar switch to appropriate output

[Figure annotations: RC and VA are performed per packet; SA and ST are performed per flit.]

[adapted from Becker STM’09 talk]
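The per-packet versus per-flit split can be seen in a simple timing table. The sketch below assumes an idealized pipeline with one stage per cycle, one flit entering per cycle, and no stalls or contention. The function name is hypothetical.

```python
def pipeline_schedule(num_flits, start_cycle=0):
    """Return {flit_index: {stage: cycle}} for one packet with no stalls."""
    schedule = {}
    for i in range(num_flits):
        stages = {}
        if i == 0:
            # Only the head flit performs the per-packet stages.
            stages["RC"] = start_cycle
            stages["VA"] = start_cycle + 1
        # Every flit performs the per-flit stages, one cycle behind its predecessor.
        stages["SA"] = start_cycle + 2 + i
        stages["ST"] = start_cycle + 3 + i
        schedule[i] = stages
    return schedule
```

For a 3-flit packet, the head flit occupies RC, VA, SA, and ST in cycles 0 to 3, and the tail flit traverses the switch in cycle 5.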

Page 16

Allocation basics

• Arbitration:
  – Multiple requestors
  – Single resource
  – Request + grant vectors

• Allocation:
  – Multiple requestors
  – Multiple equivalent resources
  – Request + grant matrices

• Matching:
  – Each grant must satisfy a request
  – Each requestor gets at most one grant
  – Each resource is granted at most once

[adapted from Becker STM’09 talk]
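The two building blocks above can be sketched in a few lines. The arbiter below uses a round-robin policy, which is one common choice and an assumption here (the slide does not name a policy). The matching check encodes the three conditions listed for a matching. Function names are hypothetical.

```python
def round_robin_arbiter(requests, last_grant):
    """Grant one request from a request vector, starting after the last winner."""
    n = len(requests)
    grants = [False] * n
    for offset in range(1, n + 1):
        i = (last_grant + offset) % n
        if requests[i]:
            grants[i] = True
            break
    return grants


def is_matching(requests, grants):
    """Check the three matching conditions on request and grant matrices."""
    rows = range(len(requests))
    cols = range(len(requests[0]))
    grants_satisfy_requests = all(
        requests[r][c] for r in rows for c in cols if grants[r][c]
    )
    one_grant_per_requestor = all(sum(grants[r]) <= 1 for r in rows)
    one_grant_per_resource = all(sum(grants[r][c] for r in rows) <= 1 for c in cols)
    return grants_satisfy_requests and one_grant_per_requestor and one_grant_per_resource
```

In the matrices, rows are requestors and columns are resources, so the second and third conditions limit each row and each column to a single grant.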

Page 17

Separable allocators

• Matchings have at most one grant per row and per column

• Implement via two phases of arbitration
  – Column-wise and row-wise
  – Perform in either order
  – Arbiters in each stage are fully independent

• Fast and cheap

• But bad choices in first phase can prevent second stage from generating a good matching!

[Figures: input-first and output-first separable allocators.]

[adapted from Becker STM’09 talk]
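A minimal input-first separable allocator is sketched below. For simplicity every arbiter uses fixed priority (lowest index wins), which is an assumption. Real designs typically rotate priority. The function name is hypothetical.

```python
def input_first_allocate(requests):
    """Two-phase separable allocation: rows (inputs) first, then columns (outputs)."""
    n_in = len(requests)
    n_out = len(requests[0])

    # Phase 1: each input independently picks one of the outputs it requests.
    choice = [
        next((o for o in range(n_out) if requests[i][o]), None)
        for i in range(n_in)
    ]

    # Phase 2: each output independently picks one of the inputs that chose it.
    grants = [[False] * n_out for _ in range(n_in)]
    for o in range(n_out):
        winner = next((i for i in range(n_in) if choice[i] == o), None)
        if winner is not None:
            grants[winner][o] = True
    return grants
```

With requests `[[True, True], [True, False]]`, both inputs pick output 0 in the first phase, so only input 0 is granted. A matching with two grants exists (input 0 to output 1, input 1 to output 0), which illustrates how a bad first-phase choice limits the result.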

Page 18

Wavefront allocators

• Avoid separate phases
  – … and bad decisions in first

• Generate better matchings

• But delay scales linearly

• Also difficult to pipeline

• Principle of operation:
  – Pick initial diagonal
  – Grant all requests on diagonal
    • Never conflict!
  – For each grant, delete requests in same row, column
  – Repeat for next diagonal

[adapted from Becker STM’09 talk]
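The principle of operation can be sketched in software as follows. This models the behavior only, not the hardware structure or its timing. It assumes a square request matrix and wrapped diagonals defined by (row + column) mod n. The function name is hypothetical.

```python
def wavefront_allocate(requests, start_diagonal=0):
    """Grant requests diagonal by diagonal on an n x n request matrix."""
    n = len(requests)
    row_free = [True] * n
    col_free = [True] * n
    grants = [[False] * n for _ in range(n)]
    for k in range(n):
        d = (start_diagonal + k) % n
        # Cells on one wrapped diagonal have distinct rows and columns,
        # so grants made on the same diagonal never conflict.
        for r in range(n):
            c = (d - r) % n
            if requests[r][c] and row_free[r] and col_free[c]:
                grants[r][c] = True
                row_free[r] = False  # "delete" remaining requests in this row
                col_free[c] = False  # "delete" remaining requests in this column
    return grants
```

The result depends on the initial diagonal. With requests `[[True, True], [True, False]]`, starting at diagonal 1 grants both inputs, while starting at diagonal 0 grants only one.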

Page 19

Wavefront allocator timing

• Originally conceived as full-custom design

• Tiled design

• True delay scales linearly

• Signal wraparound creates combinational loops
  – Effectively broken at priority diagonal
  – But static timing analysis cannot infer that
  – Synthesized designs must be modified to avoid loops!

[adapted from Becker STM’09 talk]

Page 20

Diagonal Propagation Allocator

• Unrolled matrix avoids combinational loops

• Sliding priority window activates sub-matrix cells

• But static timing analysis again sees false paths!
  – Actual delay is ~n
  – Reported delay is ~(2n-1)
  – Hurts synthesized designs

[adapted from Becker STM’09 talk]

Page 21

VC allocation

• Before packets can proceed through router, need to acquire ownership of VC at downstream router

• VC allocator matches unassigned input VCs with output VCs that are not currently in use
  – P×V requestors (input VCs), P×V resources (output VCs)

• VC is acquired by head flit, inherited by body & tail flits

[adapted from Becker STM’09 talk]
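The (P×V) by (P×V) request matrix for the VC allocator can be sketched as below. The sketch assumes that route computation has already selected one output port per waiting input VC and that any free VC at that port is acceptable. The function name and data layout are hypothetical.

```python
def build_vc_requests(num_ports, num_vcs, waiting, busy):
    """Build the request matrix for VC allocation.

    waiting: dict mapping an input VC (port, vc) to its selected output port
    busy:    set of output VCs (port, vc) that are currently in use
    Row index = in_port * num_vcs + in_vc; column index = out_port * num_vcs + out_vc.
    """
    size = num_ports * num_vcs
    requests = [[False] * size for _ in range(size)]
    for (in_port, in_vc), out_port in waiting.items():
        row = in_port * num_vcs + in_vc
        for out_vc in range(num_vcs):
            if (out_port, out_vc) not in busy:  # mask output VCs already in use
                requests[row][out_port * num_vcs + out_vc] = True
    return requests
```

The resulting matrix can be handed to any allocator that produces a matching. Only head flits would appear in `waiting`, since body and tail flits inherit the VC acquired by the head.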

Page 22

VC allocator implementations

• Not shown:
  – Masking logic for busy VCs

[adapted from Becker STM’09 talk]

Page 23

Typical pipelined router

[Figure: router pipeline stages RC, VA/SA, ST, LT.]

RC: route computation
VA/SA: VC + switch allocation
ST: switch traversal
LT: link traversal