
Page 1

Ben Abdallah Abderazek

The University of Aizu,

Graduate School of Computer Science and Eng.

Adaptive Systems Laboratory,

E-mail: benab@u-aizu.ac.jp

Hong Kong University of Science and Technology, March 2010

Networks-on-Chip

03/01/2010

Page 2

Part I

Application Requirements

Network on Chip: A Paradigm Shift in VLSI

Critical problems addressed by NoC

Traffic abstractions

Data Abstraction

Network delay modeling

Page 3

Application Requirements

Signal processing
o Hard real time
o Very regular load
o High quality
Typically on DSPs

Media processing
o Hard real time
o Irregular load
o High quality
SoC/media processors

Multimedia
o Soft real time
o Irregular load
o Limited quality
PC/desktop

Very challenging!

Page 4

What Does the Internet Need?

An ever-increasing number of packets, plus routing, packet classification, encryption, QoS, new applications and protocols, etc.

General-purpose RISC: not capable enough

ASIC: large, expensive to develop, not flexible

What is needed:
• High processing power
• Support for wire speed
• Programmable
• Scalable
• Specialized for network applications


SoC, MCSoC?

Page 5

Example: Network Processor (NP)


IBM PowerNP
• 16 pico-processors and 1 PowerPC
• Each pico-processor supports 2 hardware threads and has a 3-stage pipeline: fetch/decode/execute
• Dyadic Processing Unit: two pico-processors, 2 KB shared memory, tree search engine
• Focus is layers 2-4
• PowerPC 405 for control-plane operations, with 16K I and D caches
• Target is OC-48

Page 6

Example: Network Processor (NP)


NP can be applied in various network layers and applications:
• Traditional apps: forwarding, classification
• Advanced apps: transcoding, URL-based switching, security, etc.
• New apps

Page 7

Telecommunication Systems and NoC Paradigm

The current trend is to integrate telecommunication systems on complex multicore SoCs (MCSoCs): network processors, multimedia hubs, and baseband telecom circuits.

These applications have tight time-to-market and performance constraints.

Page 8

Telecommunication Systems and NoC Paradigm

A telecommunication multicore SoC is composed of 4 kinds of components:
1. Software tasks
2. Processors executing software
3. Specific hardware cores
4. Global on-chip communication network

Page 9


The global on-chip communication network is the most challenging part.

Page 10

Technology & Architecture Trends

Technology trends:
• Vast transistor budgets
• Relatively poor interconnect scaling
• Need to manage complexity and power
• Build flexible designs (multi-/general-purpose)

Architectural trends:
• Go parallel!
• Keep core complexity constant or simplify

The result is lots of modules (cores, memories, off-chip interfaces, specialized IP cores, etc.)

Page 11

Wire Delay vs. Logic Delay

Operation                               Delay (0.13 µm)   Delay (0.05 µm)
32-bit ALU operation                    650 ps            250 ps
32-bit register read                    325 ps            125 ps
Read 32 bits from 8 KB RAM              780 ps            300 ps
Transfer 32 bits across chip (10 mm)    1400 ps           2300 ps
Transfer 32 bits across chip (20 mm)    2800 ps           4600 ps

Global on-chip communication to operation delay: 2:1, rising to 9:1 in 2010
Ref: W. J. Dally, HPCA panel presentation, 2002
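A quick check of the 2:1 and 9:1 figures. We assume (the slide does not say so) that the ratios compare the 10 mm cross-chip transfer with one 32-bit ALU operation; the dictionary copies the table values and `comm_to_op_ratio` is a hypothetical helper.

```python
# Delays in picoseconds, copied from the table above.
# Assumption: the quoted ratios compare a 10 mm transfer with a 32-bit ALU operation.
DELAYS_PS = {
    "0.13um": {"alu_op": 650, "transfer_10mm": 1400},
    "0.05um": {"alu_op": 250, "transfer_10mm": 2300},
}


def comm_to_op_ratio(node: str) -> float:
    """Cross-chip (10 mm) transfer delay divided by one 32-bit ALU operation delay."""
    d = DELAYS_PS[node]
    return d["transfer_10mm"] / d["alu_op"]


for node in DELAYS_PS:
    print(f"{node}: {comm_to_op_ratio(node):.1f}:1")
```

This prints 2.2:1 and 9.2:1, consistent with the rounded 2:1 and 9:1 quoted on the slide.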

Page 12

Communication Reliability

Information transfer is inherently unreliable at the electrical level, due to:
• Timing errors
• Cross-talk
• Electromagnetic interference (EMI)
• Soft errors

The problem will get increasingly worse as technology scales down.

Page 13

Evolution of on-chip communication

Page 14

Traditional SoC nightmare

• Variety of dedicated interfaces
• Design and verification complexity
• Unpredictable performance
• Many underutilized wires

Figure: DMA, CPU and DSP on a CPU bus, a bridge to a peripheral bus with IO blocks, further blocks (A, B, C) and dedicated control signals

Page 15

Network on Chip: A Paradigm Shift in VLSI

From: dedicated signal wires. To: shared network.

Figure: computing modules connected through network switches (s) by point-to-point links

Page 16

NoC essentials

• Communication by packets of bits
• Routing of packets through several hops, via switches
• Efficient sharing of wires
• Parallelism

Figure: modules connected through a network of switches (s)

Page 17

Characteristics of a paradigm shift

• Solves a critical problem
• Step-up in abstraction
• Design is affected:
  o Design becomes more restricted
  o New tools
• The changes enable higher complexity and capacity
• Jump in design productivity

Page 19

Origins of the NoC concept

The idea was discussed in the 1990s, but actual research came in the new millennium. Some well-known early publications:
• Guerrier and Greiner (2000), “A generic architecture for on-chip packet-switched interconnections”
• Hemani et al. (2000), “Network on chip: An architecture for billion transistor era”
• Dally and Towles (2001), “Route packets, not wires: on-chip interconnection networks”
• Wingard (2001), “MicroNetwork-based integration of SoCs”
• Rijpkema, Goossens and Wielage (2001), “A router architecture for networks on silicon”
• Kumar et al. (2002), “A Network on chip architecture and design methodology”
• De Micheli and Benini (2002), “Networks on chip: A new paradigm for systems on chip design”

Page 20

Don't we already know how to design interconnection networks?

Many network topologies, router designs and much theory have already been developed for high-end supercomputers and telecom switches.

Yes, and we'll cover some of this material, but the on-chip trade-offs lead to very different designs!

Page 21

Critical problems addressed by NoC

1) Global interconnect design problem: delay, power, noise, scalability, reliability

2) System integration productivity problem

3) Chip multiprocessors (key to power-efficient computing)

Page 22

1(a): NoC and global wire delay

Long wire delay is dominated by resistance, so repeaters are added

Repeaters become latches (with clock frequency scaling)

Latches evolve into NoC routers

Figure: a long wire divided into segments by NoC routers
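Why repeaters help can be sketched with a first-order Elmore-style model, which is our simplification and not from the slide: an unrepeated distributed RC wire has delay 0.5·r·c·L², while cutting it into k segments with repeaters of fixed delay t_rep gives 0.5·r·c·L²/k + k·t_rep. At the optimum segment count the delay grows only linearly with L. Driver resistance is ignored, k is treated as continuous, and r, c and t_rep are normalized, hypothetical values.

```python
import math


def unrepeated_delay(r: float, c: float, length: float) -> float:
    """Elmore delay of a distributed RC wire (r, c per unit length): quadratic in length."""
    return 0.5 * r * c * length ** 2


def repeated_delay(r: float, c: float, length: float, t_rep: float) -> float:
    """Delay of the same wire cut into k segments, each driven by a repeater.

    total(k) = 0.5*r*c*L^2/k + k*t_rep; the continuous optimum
    k = L*sqrt(0.5*r*c/t_rep) (at least one segment) gives 2*L*sqrt(0.5*r*c*t_rep).
    """
    k = max(1.0, length * math.sqrt(0.5 * r * c / t_rep))
    return 0.5 * r * c * length ** 2 / k + k * t_rep


r, c, t_rep = 1.0, 1.0, 0.5  # normalized, hypothetical parameters
for length in (10, 20, 40):
    print(length, unrepeated_delay(r, c, length), repeated_delay(r, c, length, t_rep))
```

Doubling the length quadruples the unrepeated delay but only doubles the repeated one, which is what makes pipelining the segments (latches, then routers) attractive.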

Page 23

1(b): Wire design for NoC

NoC links:
• Regular
• Point-to-point (no fanout tree)
• Can use transmission-line layout
• Well-defined current return path
• Can be optimized for noise / speed / power: low swing, current mode, ...

Page 24

1(c): NoC scalability

For the same performance, compare the wire area and power:

Interconnect      Wire area     Power
NoC               O(n)          O(n)
Point-to-point    O(n^2 √n)     O(n √n)
Simple bus        O(n^3 √n)     O(n √n)
Segmented bus     O(n^2 √n)     O(n √n)
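Dropping the constants, the scaling table can be turned into growth factors. The sketch below (helper names hypothetical) reports how much wire area and power grow when the number of modules n quadruples from 16 to 64; only the ratios are meaningful, since the hidden constants are unknown.

```python
import math

# Asymptotic (wire area, power) growth per interconnect, constants dropped.
SCALING = {
    "NoC":            (lambda n: n,                   lambda n: n),
    "Point-to-point": (lambda n: n ** 2 * math.sqrt(n), lambda n: n * math.sqrt(n)),
    "Simple bus":     (lambda n: n ** 3 * math.sqrt(n), lambda n: n * math.sqrt(n)),
    "Segmented bus":  (lambda n: n ** 2 * math.sqrt(n), lambda n: n * math.sqrt(n)),
}


def growth(name: str, n_small: int, n_large: int) -> tuple:
    """Factor by which (area, power) grow when n goes from n_small to n_large."""
    area, power = SCALING[name]
    return area(n_large) / area(n_small), power(n_large) / power(n_small)


for name in SCALING:
    a, p = growth(name, 16, 64)
    print(f"{name:15s} area x{a:6.0f}  power x{p:4.0f}")
```

Going from 16 to 64 modules, NoC wire area grows 4x, point-to-point and segmented bus 32x, and a simple bus 128x.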

Page 25

1(d): NoC and communication reliability

Fault tolerance & error correction

Figure: routers connected by links, with a micro-modem (UMODEM) at each link end. Blocks shown: input buffer, error correction, synchronization, ISI reduction, parallel-to-serial converter, modulation, link interface, interconnect.

A. Morgenshtein, E. Bolotin, I. Cidon, A. Kolodny, R. Ginosar, “Micro-modem – reliability solution for NOC communications”, ICECS 2004

Page 26

1(e): NoC and GALS (Globally Asynchronous, Locally Synchronous)

• Modules in a NoC system use different clocks and may use different voltages
• The NoC can take care of synchronization
• The NoC design may be asynchronous: no power is wasted when the links and routers are idle

Page 27

2: NoC and engineering productivity

• NoC eliminates ad-hoc global wire engineering
• NoC separates computation from communication
• NoC supports modularity and reuse of cores
• NoC is a platform for system integration, debugging and testing

Page 28

3: NoC and CMP

Uniprocessors cannot provide power-efficient performance growth:
o Interconnect dominates dynamic power
o Global wire delay doesn't scale
o Instruction-level parallelism is limited

Power efficiency requires many parallel local computations:
o Chip Multi Processors (CMP)
o Thread-Level Parallelism (TLP)

Figure: uniprocessor dynamic power breakdown into gate, interconnect and diffusion (Magen et al., SLIP 200…); uniprocessor performance vs. die area (or power)

Page 29

3: NoC and CMP

Network is a natural choice for CMP!

Page 31

Why is now the time for NoC?

• Difficulty of deep-submicron (DSM) wire design
• Productivity pressure
• CMPs

Page 32

Traffic abstractions

Traffic models are generally captured from actual traces of functional simulation.
A statistical distribution is often assumed for messages.

Figure: processing elements PE1 to PE12 and the traffic flows between them

Flow       Bandwidth    Packet size   Latency
1 -> 10    400 kb/s     1 kb          5 ns
2 -> 10    1.8 Mb/s     3 kb          12 ns
1 -> 4     230 kb/s     2 kb          6 ns
4 -> 10    50 kb/s      1 kb          3 ns
4 -> 5     300 kb/s     3 kb          4 ns
3 -> 10    34 kb/s      0.5 kb        15 ns
5 -> 10    400 kb/s     1 kb          4 ns
6 -> 10    699 kb/s     2 kb          1 ns
8 -> 10    300 kb/s     3 kb          12 ns
9 -> 8     1.8 Mb/s     5 kb          7 ns
9 -> 10    200 kb/s     5 kb          10 ns
7 -> 10    200 kb/s     3 kb          12 ns
11 -> 10   300 kb/s     4 kb          10 ns
12 -> 10   500 kb/s     5 kb          12 ns
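The flow table can be held in a small data structure to spot hotspots. In this sketch (names hypothetical) 1.8 Mb/s is converted to 1800 kb/s assuming decimal prefixes, and the latency column is read as a per-flow requirement, which is our interpretation of the slide.

```python
from collections import defaultdict

# (src PE, dst PE): (bandwidth in kb/s, packet size in kb, latency in ns), from the flow table.
FLOWS = {
    (1, 10): (400, 1, 5),    (2, 10): (1800, 3, 12),  (1, 4): (230, 2, 6),
    (4, 10): (50, 1, 3),     (4, 5): (300, 3, 4),     (3, 10): (34, 0.5, 15),
    (5, 10): (400, 1, 4),    (6, 10): (699, 2, 1),    (8, 10): (300, 3, 12),
    (9, 8): (1800, 5, 7),    (9, 10): (200, 5, 10),   (7, 10): (200, 3, 12),
    (11, 10): (300, 4, 10),  (12, 10): (500, 5, 12),
}


def ingress_bandwidth(flows: dict) -> dict:
    """Total bandwidth (kb/s) demanded at each destination PE."""
    total = defaultdict(int)
    for (_, dst), (bw, _, _) in flows.items():
        total[dst] += bw
    return dict(total)


demand = ingress_bandwidth(FLOWS)
hotspot = max(demand, key=demand.get)
tightest = min(FLOWS, key=lambda flow: FLOWS[flow][2])  # flow with the smallest latency value
print(hotspot, demand[hotspot], tightest)
```

PE10 receives 11 of the 14 flows, 4883 kb/s in total, and the 6 -> 10 flow has the tightest latency value (1 ns).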

Page 33

Data abstractions

Page 34

Layers of abstraction in network modeling

Software layers: application, OS

Network & transport layers:
• Network topology, e.g. crossbar, ring, mesh, torus, fat tree, ...
• Switching: circuit / packet switching (store-and-forward, SAF; virtual cut-through, VCT), wormhole
• Addressing: logical/physical, source/destination, flow, transaction
• Routing: static/dynamic, distributed/source, deadlock avoidance
• Quality of Service, e.g. guaranteed throughput, best effort
• Congestion control, end-to-end flow control

Data link layer:
• Flow control (handshake)
• Handling of contention
• Correction of transmission errors

Physical layer: wires, drivers, receivers, repeaters, signaling, circuits, ...
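The difference between store-and-forward (SAF) and cut-through style switching (VCT, wormhole) shows up in zero-load latency. The first-order textbook model below is our addition, not from the slide; it ignores contention, wire delay and header serialization, and the parameter values are hypothetical.

```python
def store_and_forward_latency(hops: int, packet_bits: int, link_bw: float, t_router: float) -> float:
    """SAF: every router buffers the whole packet before forwarding it."""
    return hops * (t_router + packet_bits / link_bw)


def cut_through_latency(hops: int, packet_bits: int, link_bw: float, t_router: float) -> float:
    """VCT / wormhole at zero load: the header pays the router delay at each hop,
    while the packet body is pipelined behind it, so serialization is paid once."""
    return hops * t_router + packet_bits / link_bw


# Hypothetical example: 4 hops, 512-bit packet, 32-bit-per-cycle links, 2-cycle routers.
print(store_and_forward_latency(4, 512, 32, 2))  # 72.0 cycles
print(cut_through_latency(4, 512, 32, 2))        # 24.0 cycles
```

The gap grows with both hop count and packet length, which is one reason wormhole switching is common on chip.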

Page 35

How to select an architecture?

Architecture choice depends on system needs.

Figure: ASIC, ASSP, FPGA and CMP/multicore placed by reconfiguration rate (at design time, at boot time, during run time) and flexibility (from single application to general purpose or embedded systems)

Page 36

How to select an architecture?


A large range of solutions!

Page 37

Example: OASIS

ASIC assumed: traffic requirements are known a priori.

Features:
• Packet switching: wormhole
• Quality of service
• Mesh topology

K. Mori, A. Ben Abdallah, and K. Kuroda, “Design and Evaluation of a Complexity Effective Network-on-Chip Architecture on FPGA”, The 19th Intelligent System Symposium (FAN 2009), pp. 318-321, Sep. 2009.

S. Miura, A. Ben Abdallah, and K. Kuroda, “PNoC - Design and Preliminary Evaluation of a Parameterizable NoC for MCSoC Generation and Design Space Exploration”, The 19th Intelligent System Symposium (FAN 2009), pp. 314-317, Sep. 2009.

Page 38

Perspective 1: NoC vs. Bus

NoC:
• Aggregate bandwidth grows
• Link speed unaffected by N
• Concurrent spatial reuse
• Pipelining is built-in
• Distributed arbitration
• Separate abstraction layers
However:
• No performance guarantee
• Extra delay in routers
• Area and power overhead?
• Modules need a network interface (NI)
• Unfamiliar methodology

Bus:
• Bandwidth is limited, shared
• Speed goes down as N grows
• No concurrency
• Pipelining is tough
• Central arbitration
• No layers of abstraction (communication and computation are coupled)
However:
• Fairly simple and familiar

Page 39

Perspective 2: NoC vs. Off-chip Networks

Off-chip networks:
• Cost is in the links
• Latency is tolerable
• Traffic/applications unknown
• Changes at runtime
• Adherence to networking standards

NoC:
• Sensitive to cost: area, power
• Wires are relatively cheap
• Latency is critical
• Traffic may be known a priori
• Design-time specialization
• Custom NoCs are possible

Page 40

VLSI CAD problems

• Application mapping
• Floorplanning / placement
• Routing
• Buffer sizing
• Timing closure
• Simulation
• Testing

Page 41

VLSI CAD problems in NoC

• Application mapping (map tasks to cores)
• Floorplanning / placement (within the network)
• Routing (of messages)
• Buffer sizing (size of FIFO queues in the routers)
• Timing closure (link bandwidth capacity allocation)
• Simulation (network simulation, traffic/delay/power modeling)
• Other NoC design problems (topology synthesis, switching, virtual channels, arbitration, flow control, ...)

Page 42

Typical NoC design flow


Place Modules

Determine routing and adjust link capacities

Page 43

Timing closure in NoC

1. Define inter-module traffic
2. Place modules
3. Increase link capacities
4. QoS satisfied? If no, return to step 3; if yes, finish.

Too low a capacity results in poor QoS.
Too high a capacity wastes area.
Uniform link capacities are a waste in an ASIP system.
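The capacity-sizing loop can be sketched as follows. This is only an illustration under our own assumptions: QoS is approximated by requiring every link's utilization (load divided by capacity) to stay at or below `max_util`, capacities grow by a fixed `step`, and the link names and loads are made up. Only violating links are enlarged, so the result has non-uniform capacities.

```python
def size_links(link_load: dict, initial_capacity: float, step: float,
               max_util: float = 0.7, max_iters: int = 1000) -> dict:
    """Grow link capacities until load/capacity <= max_util on every link.

    Assumption: utilization is used as a stand-in for the QoS check.
    """
    capacity = {link: initial_capacity for link in link_load}
    for _ in range(max_iters):
        violating = [link for link, load in link_load.items()
                     if load / capacity[link] > max_util]
        if not violating:          # "QoS satisfied?" -> yes: finish
            return capacity
        for link in violating:     # "no": increase only the links that need it
            capacity[link] += step
    raise RuntimeError("QoS target not reached within max_iters")


# Hypothetical loads (e.g. in Mb/s) on two links between modules A, B and C.
caps = size_links({("A", "B"): 900, ("B", "C"): 200}, initial_capacity=500, step=100)
print(caps)
```

The heavily loaded A-B link ends at 1300 while B-C keeps its initial 500, illustrating why uniform link capacities would waste area.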

Page 44

Network delay modeling

Analysis of mean packet delay in a wormhole network with:
• Multiple virtual channels
• Different link capacities
• Different communication demands

Page 45

NoC design requirements

• High-performance interconnect: high throughput, latency, power, area
• Complex functionality (performance again): support for virtual channels, QoS
• Synchronization
• Reliability, high throughput, low latency

Page 46

ISO/OSI network protocol stack model

Page 47

Part II

NoC topologies

Switching strategies

Routing algorithms

Flow control schemes

Clocking schemes

QoS

Basic Building Blocks

Status and Open Problems

Page 48

NoC Topology

The connection map between PEs

Adopted from large-scale networks and parallel computing

Topology classifications:
• Direct topologies
• Indirect topologies

Page 49

Direct topologies

• Each switch (SW) is connected to a single PE, and each PE to only a single SW
• As the number of nodes in the system increases, the total bandwidth also increases

Figure: four PEs, each attached to its own switch (SW)

Page 50

Direct topologies: Mesh

• 2D mesh is the most popular
• All links have the same length, which eases physical design
• Area grows linearly with the number of nodes

Figure: 4x4 mesh
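A k x k mesh has k² nodes and 2k(k − 1) bidirectional links, so the link count (and aggregate bandwidth) grows roughly linearly with the number of nodes, while the worst-case hop count is 2(k − 1). The sketch below (function names hypothetical) enumerates the links and computes minimal hop counts.

```python
def mesh_links(k: int) -> list:
    """Bidirectional links of a k x k 2D mesh; nodes are (x, y) coordinates."""
    links = []
    for x in range(k):
        for y in range(k):
            if x + 1 < k:
                links.append(((x, y), (x + 1, y)))  # horizontal neighbour
            if y + 1 < k:
                links.append(((x, y), (x, y + 1)))  # vertical neighbour
    return links


def mesh_hops(src: tuple, dst: tuple) -> int:
    """Minimal hop count between two mesh nodes (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


print(len(mesh_links(4)), mesh_hops((0, 0), (3, 3)))
```

For the 4x4 mesh this gives 24 links and a corner-to-corner distance of 6 hops.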

Page 51

Direct topologies: Torus and Folded Torus


Figure: torus and folded torus networks, each node a PE attached to a router (R)

Torus:
• Similar to a regular mesh
• Excessive delay problem due to the long end-around connections

Folded torus:
• Overcomes the long-link limitation of a 2-D torus
• Links have the same size
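To see why folding helps, the sketch below (an illustration, not taken from the slide) places one ring dimension of k nodes. In a plain torus the wraparound link spans k − 1 node pitches; the common folded layout interleaves the ring so that every link spans at most two pitches. `folded_position` and `max_link_length` are hypothetical names.

```python
def folded_position(i: int, k: int) -> int:
    """Physical slot of logical node i in a folded ring of k nodes.

    The first half of the ring occupies even slots left to right, the second
    half occupies odd slots right to left, so ring neighbours are at most 2 slots apart.
    """
    half = (k + 1) // 2
    return 2 * i if i < half else 2 * (k - 1 - i) + 1


def max_link_length(k: int, folded: bool) -> int:
    """Longest physical link (in node pitches) in one ring dimension."""
    pos = (lambda i: folded_position(i, k)) if folded else (lambda i: i)
    return max(abs(pos(i) - pos((i + 1) % k)) for i in range(k))


print(max_link_length(8, folded=False), max_link_length(8, folded=True))
```

For k = 8 the plain torus has a 7-pitch wraparound link, while the folded layout never exceeds 2 pitches.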

Page 52

Direct topologies: Octagon topology

• Messages sent between any two nodes require at most two hops
• More octagons can be tiled together to accommodate larger designs

Figure: eight PEs, each attached to a switch (SW), arranged as an octagon
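The two-hop bound can be checked with a breadth-first search. We assume the usual Octagon wiring (eight nodes on a ring plus a cross link from each node to the node opposite it); the slide only shows a figure, so this wiring and the helper names are our assumptions.

```python
from collections import deque


def octagon_neighbors(i: int) -> set:
    """Assumed Octagon wiring: ring neighbours plus a cross link to the opposite node."""
    return {(i + 1) % 8, (i - 1) % 8, (i + 4) % 8}


def hops_from(src: int) -> dict:
    """Breadth-first search: minimal hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in octagon_neighbors(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


diameter = max(max(hops_from(s).values()) for s in range(8))
print(diameter)
```

The diameter comes out as 2, matching the "at most two hops" property.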

Page 53

Indirect topologies

A set of PEs is connected to a switch (router).

Fat tree topology:
• Nodes are connected only to the leaves of the tree
• More links near the root, where bandwidth requirements are higher

Figure: fat tree with eight PEs at the leaves and switches (SW) at the internal levels

Page 54

Indirect topologies: k-ary n-fly butterfly network

Blocking multi-stage network: packets may be temporarily blocked or dropped in the network if contention occurs


Example: 2-ary 3-fly butterfly network
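A k-ary n-fly connects k^n terminals through n stages of k^(n−1) switches, each k x k. It is commonly routed with destination-tag routing: at each stage the packet takes the output port given by the next base-k digit of its destination. Because each source-destination pair has exactly one path, two packets that need the same link must wait, which is why the network is blocking. The helper names below are hypothetical.

```python
def kary_nfly_size(k: int, n: int) -> dict:
    """Terminals, stages and k x k switches per stage of a k-ary n-fly."""
    return {"terminals": k ** n, "stages": n, "switches_per_stage": k ** (n - 1)}


def destination_tag_route(dst: int, k: int, n: int) -> list:
    """Output port taken at each stage: base-k digits of dst, most significant first."""
    digits = []
    for _ in range(n):
        digits.append(dst % k)
        dst //= k
    return digits[::-1]


print(kary_nfly_size(2, 3))            # {'terminals': 8, 'stages': 3, 'switches_per_stage': 4}
print(destination_tag_route(5, 2, 3))  # [1, 0, 1]
```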

Page 55

Indirect topologies: (m, n, r) symmetric Clos network

3-stage network in which each stage is made up of a number of crossbar switches
• m: number of middle-stage switches
• n: number of input/output nodes on each input/output switch
• r: number of input and output switches

Example: (3, 3, 4) Clos network

Non-blocking network; expensive (several full crossbars)
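The classical conditions below are added here, not stated on the slide: an (m, n, r) Clos network is strictly non-blocking when m ≥ 2n − 1 (Clos) and rearrangeably non-blocking when m ≥ n (Slepian-Duguid). By these rules the (3, 3, 4) example is rearrangeably, but not strictly, non-blocking. The crosspoint count assumes every switch is a full crossbar; `clos_properties` is a hypothetical helper.

```python
def clos_properties(m: int, n: int, r: int) -> dict:
    """Classify an (m, n, r) symmetric Clos network and count its crosspoints."""
    return {
        "terminals": n * r,
        "strictly_nonblocking": m >= 2 * n - 1,  # Clos condition
        "rearrangeably_nonblocking": m >= n,     # Slepian-Duguid condition
        # r input (n x m) + r output (m x n) switches, plus m middle (r x r) switches
        "crosspoints": 2 * r * n * m + m * r * r,
    }


print(clos_properties(3, 3, 4))
```

For (3, 3, 4) this gives 12 terminals and 120 crosspoints, compared with 144 for a single 12 x 12 crossbar.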

Page 56

Indirect topologies: Beneš network

Rearrangeable network in which paths may have to be rearranged to provide a connection, requiring an appropriate controller

Clos topology composed of 2 x 2 switches


Example: (2, 2, 4) rearrangeable Clos network constructed using two (2, 2, 2) Clos networks as its 4 x 4 middle switches.
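Applying the recursive construction all the way down, an N x N Beneš network (N a power of two) has 2·log2(N) − 1 stages of N/2 switches, each 2 x 2. The sketch below only counts them; `benes_size` is a hypothetical name. For the 8-port (2, 2, 4) example it gives 5 stages and 20 switches: one input stage, the 3 stages of each 4-port middle network, and one output stage.

```python
import math


def benes_size(n_ports: int) -> tuple:
    """(stages, number of 2 x 2 switches) of an N x N Benes network."""
    levels = int(math.log2(n_ports))
    if n_ports < 2 or 2 ** levels != n_ports:
        raise ValueError("N must be a power of two, at least 2")
    stages = 2 * levels - 1
    return stages, stages * n_ports // 2


print(benes_size(8), benes_size(16))
```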

Page 57

Irregular Topologies: Customized

• Customized for an application
• Usually a mix of shared-bus, direct, and indirect network topologies

Figure: Example 1, reduced mesh; Example 2, cluster-based hybrid topology (PEs attached to switches, sw)

Page 58

Example 1: Partially irregular 2D-mesh topology

Figure: 2D mesh of routers (R) with PEs of sizes ∆x × ∆y, ∆x × 2∆y and 2∆x × 2∆y

Contains oversized, rectangularly shaped PEs.

Page 59

Example 2: Irregular Mesh


[Figure: Irregular mesh of routers (R) with PEs of arbitrary shape.]

This kind of chip does not limit the shape of the PEs or the placement of the routers. It may be considered a "custom" NoC.


How to Select a Topology?

The application decides the topology type.

If there are a few tens of PEs, Star and Mesh topologies are recommended.

If there are 100 or more PEs, Hierarchical Star and Mesh topologies are recommended.

Some topologies are better suited to certain designs than others.

Most of the time, a topology that is better in performance is worse in power consumption.
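The guidance above is qualitative. As a rough illustration of why a flat star stops scaling while a mesh keeps router degree bounded, the sketch below computes the worst-case path length (counting the PE attachment links), total link count, and largest router degree for the three named topologies. The function names, the square-mesh assumption, and the two-level hierarchical star with a hypothetical `cluster_size` are illustrative assumptions. The sketch models structure only and says nothing about power.

```python
import math


def star_metrics(n_pes: int) -> dict:
    """Flat star: every PE links to one central hub (PE -> hub -> PE)."""
    return {"diameter_links": 2, "links": n_pes, "max_degree": n_pes}


def mesh_metrics(n_pes: int) -> dict:
    """Square k x k 2D mesh with one PE attached to each router (assumed)."""
    k = math.isqrt(n_pes)
    if k * k != n_pes:
        raise ValueError("this sketch assumes a square number of PEs")
    return {
        # PE -> router, up to 2(k-1) router-to-router hops, router -> PE
        "diameter_links": 2 * (k - 1) + 2,
        # router-to-router links plus one attachment link per PE
        "links": 2 * k * (k - 1) + n_pes,
        # at most 4 mesh neighbours, plus the local PE
        "max_degree": min(4, 2 * (k - 1)) + 1,
    }


def hierarchical_star_metrics(n_pes: int, cluster_size: int) -> dict:
    """Two-level star: PEs -> local hubs -> one global hub (assumed layout)."""
    if n_pes % cluster_size:
        raise ValueError("cluster_size must divide n_pes")
    n_clusters = n_pes // cluster_size
    if n_clusters < 2:
        raise ValueError("need at least two clusters for a two-level star")
    return {
        # PE -> local hub -> global hub -> local hub -> PE
        "diameter_links": 4,
        # PE attachment links plus one uplink per local hub
        "links": n_pes + n_clusters,
        # local hub: its PEs plus the uplink; global hub: one link per cluster
        "max_degree": max(cluster_size + 1, n_clusters),
    }


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, star_metrics(n), mesh_metrics(n), hierarchical_star_metrics(n, 8))
```

For 64 PEs, the flat star needs a 64-port hub, the 8 x 8 mesh keeps every router at degree 5 but has a 16-link worst-case path, and a hierarchical star with clusters of 8 sits in between (degree 9, 4-link paths).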



Part II

NoC topologies

NoC Switching strategies

Routing algorithms

Flow control schemes

Clocking schemes

QoS

Basic Building Blocks

Status and Open Problems



NoC Switching Strategies

There are two basic modes: circuit switching and packet switching.


Switching determines how flits and packets flow through the routers in the network.


Circuit Switching

Network resources (channels) are reserved before a packet is sent: the entire path must be reserved first.

The packets do not contain routing information, only data and information about the data.

Circuit-switched networks require no overhead for packetization, packet header processing, or packet buffering.



Circuit Switching


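[Figure: Circuit switching timing across routers R1, R2, R3. Setup time: the header travels to the destination and an ACK returns. Transfer time: the data follows over the reserved path. Each hop adds a router delay (routing + switching delay).]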


Circuit Switching

Once the circuit is set up, router latency and control overheads are very low.

Channel bandwidth is used very poorly if many short packets must be sent to many different destinations.

Circuit switching is more commonly seen in embedded SoC applications, where traffic patterns may be static and involve streaming large amounts of data between different IP blocks.
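The trade-off above can be made concrete with a simple latency model. The sketch assumes the following, none of which comes from the slides: the header pays `t_r` cycles per hop for routing and switching; the ACK pays `t_ack` cycles per hop on its way back; wire delay is ignored; and data then streams at `b` flits per cycle. The function and parameter names are hypothetical.

```python
def circuit_latency(hops: int, t_r: float, t_ack: float,
                    length_flits: int, b: float) -> float:
    """Setup time (header out + ACK back) plus transfer time, in cycles."""
    setup = hops * (t_r + t_ack)      # reserve the whole path first
    transfer = length_flits / b       # then stream data with no per-hop overhead
    return setup + transfer


def transfer_fraction(hops: int, t_r: float, t_ack: float,
                      length_flits: int, b: float) -> float:
    """Share of the total latency spent actually moving data."""
    return (length_flits / b) / circuit_latency(hops, t_r, t_ack, length_flits, b)


if __name__ == "__main__":
    # Short message vs. long stream over the same 4-hop circuit.
    for flits in (4, 1024):
        print(flits, circuit_latency(4, 3, 1, flits, 1),
              round(transfer_fraction(4, 3, 1, flits, 1), 3))
```

With these numbers, a 4-flit message spends 16 of its 20 cycles on setup. A 1024-flit stream spends about 98% of its time transferring data. This matches the slide's point that circuit switching suits long, static streams rather than many short packets.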



Packet Switching

We can make better use of channel resources by buffering packets and then arbitrating for access to network resources dynamically.

Approaches are distinguished by the granularity at which resources (e.g. channels and buffers) are reserved, and by the conditions that must be met for a packet to advance to the next node.



Packet Switching

Store-and-forward (SaF): advance when the entire packet is buffered and L free flit buffers are available at the next node.

Cut-through: advance when L free flit buffers are available at the next node.

Wormhole: can advance when at least one flit buffer is available at the next node.

L: packet length (in flits)

Store-and-forward and cut-through use packet-buffer flow control; wormhole uses flit-buffer flow control.
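The three advance conditions can be written as a single predicate. This is a minimal sketch. `Switching`, `can_advance` and its parameters are hypothetical names. It also assumes the packet's head flit must already be at the current node before anything can move, which the slide does not state.

```python
from enum import Enum


class Switching(Enum):
    STORE_AND_FORWARD = "store-and-forward"
    CUT_THROUGH = "cut-through"
    WORMHOLE = "wormhole"


def can_advance(mode: Switching, flits_here: int, free_next: int, packet_len: int) -> bool:
    """Return True if the packet's head may move to the next node.

    flits_here: flits of this packet already buffered at the current node.
    free_next:  free flit buffers at the next node.
    packet_len: L, the packet length in flits.
    """
    if flits_here < 1:
        return False  # assumption: the head flit must have arrived
    if mode is Switching.STORE_AND_FORWARD:
        # Entire packet buffered here AND room for all L flits downstream.
        return flits_here >= packet_len and free_next >= packet_len
    if mode is Switching.CUT_THROUGH:
        # Head may leave before the tail arrives, but downstream must hold all L flits.
        return free_next >= packet_len
    if mode is Switching.WORMHOLE:
        # Flit-buffer flow control: one free flit buffer is enough.
        return free_next >= 1
    raise ValueError(f"unknown switching mode: {mode}")


if __name__ == "__main__":
    # A 4-flit packet with 2 flits received and 4 free buffers downstream.
    for mode in Switching:
        print(mode.value, can_advance(mode, flits_here=2, free_next=4, packet_len=4))
```

In the usage example, store-and-forward must wait for the rest of the packet. Cut-through and wormhole can both proceed. If only 3 buffers were free downstream, only wormhole could proceed.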


Packet Switching: Store and Forward (SAF)

A packet is sent from one router to the next only if the receiving router has buffer space for the entire packet.

The buffer size in each router is at least equal to the size of a packet.


[Figure: Store and Forward switching. Three switches, each with a buffer; a packet (header flit followed by data flits) is forwarded packet by packet.]
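Each router must receive the whole packet before forwarding it, so store-and-forward pays the packet's serialization time at every hop. A common contention-free estimate is T = H(t_r + L/b), where H is the number of hops, t_r the per-hop router delay, L the packet length in flits and b the channel bandwidth in flits per cycle. The sketch below evaluates that formula and the per-input buffer it implies. The function names are hypothetical, and contention is ignored.

```python
def saf_latency(hops: int, t_r: float, length_flits: int, b: float) -> float:
    """Contention-free store-and-forward latency: H * (t_r + L / b) cycles."""
    # Every hop: wait for the whole packet (L / b), then route it (t_r).
    return hops * (t_r + length_flits / b)


def saf_min_buffer_flits(max_packet_flits: int) -> int:
    """Each input buffer must hold at least one whole (maximum-size) packet."""
    return max_packet_flits


if __name__ == "__main__":
    # 4 hops, 1-cycle routers, 16-flit packet, 1 flit/cycle links.
    print(saf_latency(4, 1, 16, 1), saf_min_buffer_flits(16))
```

Here the 16-flit serialization time is repeated at all 4 hops, giving 68 cycles, and each input needs room for 16 flits.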


Packet Switching: Wormhole (WH)

A flit is forwarded to a router if space exists for that flit.

Parts of a packet can be distributed among two or more routers.

Buffer requirements are reduced to one flit instead of an entire packet.


[Figure: WH switching technique. Three switches, each with a buffer; a packet (header flit followed by data flits) is forwarded flit by flit.]
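Because wormhole pipelines flits across hops, the serialization term is paid once rather than per hop. The usual contention-free estimate is T = H·t_r + L/b, with the same symbols as for store-and-forward. The flip side of small buffers is that a stalled packet is spread over several routers. The sketch below evaluates the latency formula and counts how many routers a blocked packet occupies. The function names are hypothetical, and it assumes the stalled flits fill each router's buffers completely and the path is at least that long.

```python
import math


def wormhole_latency(hops: int, t_r: float, length_flits: int, b: float) -> float:
    """Contention-free wormhole latency: H * t_r + L / b cycles."""
    # Only the header pays t_r per hop; body flits follow in a pipeline.
    return hops * t_r + length_flits / b


def routers_spanned_when_blocked(length_flits: int, flit_buffers_per_router: int) -> int:
    """Routers a stalled packet occupies if its flits pack each buffer full."""
    return math.ceil(length_flits / flit_buffers_per_router)


if __name__ == "__main__":
    print(wormhole_latency(4, 1, 16, 1))        # store-and-forward would need 4 * (1 + 16) = 68
    print(routers_spanned_when_blocked(16, 1))  # one-flit buffers: the worm spans 16 routers
    print(routers_spanned_when_blocked(16, 4))
```

Under the same 4-hop, 16-flit assumptions, wormhole needs 20 cycles. With one-flit buffers, however, a blocked packet ties up 16 routers' worth of buffers along its path. This is the problem virtual channels address.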


Packet Switching: Virtual Channel (VC)

Virtual channels improve the performance of WH routing by preventing a single blocked packet from tying up an otherwise free physical channel. For example, if one packet is blocked, another packet sharing the same physical channel may still make progress through the network.

Flits from different packets can be interleaved over the same physical channel.
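Interleaving can be pictured as an arbiter that, each cycle, chooses one virtual channel with a flit ready and room downstream. The round-robin policy and `interleave_flits` below are an assumed, simplified sketch, not a specific router's arbiter. It sends at most one flit per cycle and models blocking as a fixed set of stalled VCs.

```python
from collections import deque


def interleave_flits(vcs: list, blocked: set, cycles: int) -> list:
    """Round-robin sketch of several VCs sharing one physical link.

    vcs:     one deque of flit labels per virtual channel (consumed in place).
    blocked: indices of VCs whose downstream buffer is full, so they cannot send.
    Returns the flits placed on the link; idle cycles send nothing.
    """
    sent = []
    last = -1  # index of the VC that sent most recently
    n = len(vcs)
    for _ in range(cycles):
        # Start searching just after the last winner (round-robin fairness).
        for offset in range(1, n + 1):
            vc = (last + offset) % n
            if vcs[vc] and vc not in blocked:
                sent.append(vcs[vc].popleft())
                last = vc
                break
    return sent


if __name__ == "__main__":
    a_and_b = [deque(["A0", "A1", "A2"]), deque(["B0", "B1"])]
    print(interleave_flits(a_and_b, blocked=set(), cycles=5))
    # VC 0 stalled: packet B still uses the physical link.
    print(interleave_flits([deque(["A0", "A1"]), deque(["B0", "B1"])], blocked={0}, cycles=4))
```

With no blocking, the flits of packets A and B alternate on the link. When VC 0 is stalled, packet B still gets through instead of waiting behind A, as it would with plain wormhole switching on a single channel.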


