approaching ideal noc latency with pre-con fi gured routes

31
Module Module Module Module Module Module Module Module Module Module Module Module Module Module R R R R R R R R R R R R R R Module R R R George Michelogiannakis Master’s Thesis Thesis Advisor: Prof. Manolis Katevenis Computer Science Department University of Crete Approaching Ideal NoC Latency with Pre- Configured Routes

Upload: everly

Post on 14-Jan-2016

28 views

Category:

Documents


0 download

DESCRIPTION

Approaching Ideal NoC Latency with Pre-Con fi gured Routes. George Michelogiannaki s Master’s Thesis Thesis Advisor: Prof. Manolis Katevenis Computer Science Department University of Crete. The Future is CMPs –> OCINs Critical. 2006. 2007.5. 2009. 2010.5. 2012. 2013.5? 2015?. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Module

Module

Module

Module

Module

Module

Module

Module

Module Module Module

Module

Module

Module

R

R

R R R

R

RR R R R

R R

R

Module

R

R

R

George Michelogiannakis

Master’s ThesisThesis Advisor: Prof. Manolis Katevenis

Computer Science DepartmentUniversity of Crete

Approaching Ideal NoC Latency with Pre-Configured

Routes

Page 2: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

The Future is CMPs –> OCINs Critical

NOCS: 2

2006 2007.5 2009

2010.5 2012 2013.5? 2015?

Page 3: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

SoCs Grow in Complexity

June 2007 CSD - UOC, Heraklion, Greece 3

Page 4: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

What are NoCs? On-chip structured communication

infrastructure. i.e. the solution!

Composed of routers (switches), channels (wires), network interface logic.

NoC approach inspired by macro networks success.

Some concepts shared.

June 2007 CSD - UOC, Heraklion, Greece 4

Page 5: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

NoC Cost, Channels & Workload

Resource limitation. Cost is Si area and power. Wires plentiful. Many long, wide channels. Buffers limited.

Worrying area & power overhead numbers!

Different constraints motivate some surprising differences in design. NoCs usually developed for a specific

application set.June 2007 CSD - UOC, Heraklion, Greece 5

Page 6: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Packet Format

Packet divided into flits (wormhole routing).

First flit address or request.

Data flits may follow. They contain no address information.

June 2007 CSD - UOC, Heraklion, Greece 6

Page 7: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Reference NoC Topology

2D mesh. Router 5x5. Relevant issues:

Routing. Floorplan. Topology. Fault-tolerance. Virtual Channels.

June 2007 CSD - UOC, Heraklion, Greece 7

Page 8: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007 CSD - UOC, Heraklion, Greece 9

Virtual Channel Routers

Page 9: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Our Work - Introduction Problem: Latency NoCs impose. Motivation: Latency introduced to every

communication pair. Past work: Achieves 1 cycle/hop at 500 MHz. We extend speculation to routing decisions. Goal: Approach buffered wire latency.

Fraction of cycle/hop.

11CSD - UOC, Heraklion, Greece

Page 10: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Our Approach

400 ps good scenario; 1 cycle otherwise.

130 nm library

12CSD - UOC, Heraklion, Greece

Page 11: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Preliminary Simulation ResultsDynamic multiplexer

I/O wires Typical c.

Worst c.

5 mm 514 ps 808 ps

2 mm 590 ps 893 ps

Pre-configured multiplexer

I/O wires Typical c.

Worst c.

5 mm 240 ps 368 ps

2 mm 254 ps 378 ps

June 2007 CSD - UOC, Heraklion, Greece 13

Latency Latency: 2 – 2.5 times lower

Page 12: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Preferred Paths

Each output has one preferred input. This pref. I/O pair is connected by a

single pre-enabled tri-state driver. Pre-enabling is crucial. Later check if flits correctly

forwarded. Thus, preferred paths are formed.

Reconfigurable at run-time. Custom routes (shapes) allowed.

14CSD - UOC, Heraklion, Greece

Page 13: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Opportunities for (Re-)configurability

Uniform allocation of memory blocks to processors

Non-uniform allocation of memory blocks to

processors

M M

M

M

M

M

M

M

M M

M

M M

M M

MM

MM

M

M M

M

M

M

M

MM

M

M

M

M

M

P

M

M

M

M

MM

P

M

M

M

M

M

M

M

P

M

M

M

M M

M

M P

M

M

P

M

P

M

P

M

P

M

M

M

M

M

M

June 2007 15CSD - UOC, Heraklion, Greece

Page 14: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Switch Architecture - Output

400

ps1

cycleInput FIFOs. Selectable when non-

empty, or flit to be enqueued.

Pref. path pre-enabled tri-

states.

Routing logic tri-state.

Config. & arbitration logic. Stores pref. path

config. & arbitrates.

16CSD - UOC, Heraklion, Greece

Page 15: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Switch Architecture - Input Dead flits:

Incorrectly eagerly forwarded. Terminated at end of

preferred path. Switch resembles a

buffered crossbar.Decides if flit needs

to be enqueued.

17CSD - UOC, Heraklion, Greece

Page 16: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Input Queueing Suboptimal

Dead flits enqueued in FIFOs. Impact non-preferred flits. Wasted power.

VCs wouldhelp.

June 2007 CSD - UOC, Heraklion, Greece 18

Page 17: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Routing Algorithm

Deterministic routing employed. Non-preferred paths follow XY

routing. We slightly modify XY routing to

handle preferred paths: Flit correctly eagerly forwarded if it

approaches the destination in any axis. Flit considered dead otherwise.

19CSD - UOC, Heraklion, Greece

Page 18: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Routing Characteristics Flits in preferred

paths may not follow XY routing.

Duplicate copies of a flit may be delivered.

XY routing. Pref. paths.

D

S

20CSD - UOC, Heraklion, Greece

Page 19: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Routing Characteristics Out-of-order

delivery is disallowed. By applying new

configuration at a “safe” time.

XY routing. Pref. paths.

D

S

21CSD - UOC, Heraklion, Greece

Page 20: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Adaptive Routing Many benefits to offer.

Extra challenging: dead flit classification. An output may switch to “adaptive

mode”. According to application-dictated factors.

Then routes according to the adaptive algorithm. Not previously-decided packets.

Flits routed this way are not dead.June 2007 CSD - UOC, Heraklion, Greece 23

Page 21: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Deadlock-Freedom

XY routing deadlock-free. What we added to it: Preferred paths.

Provide constraints to prevent circles. Networks remains functional in any case.

Adaptive routing. Depends on exact algorithm.

June 2007 CSD - UOC, Heraklion, Greece 24

Page 22: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

RAM Blocks 4096 x 32 chosen

to balance latency & area efficiency: Larger blocks were

disproportionally slower.

Smaller blocks imposed greater area overhead.

SP area efficient.

TP 75% and DP 100% larger.

June 2007 CSD - UOC, Heraklion, Greece 25

Page 23: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Simple 2D Mesh Topology 5x5 switches. One for each PE. Empty space

between PEs. Equally far away

from A/D pins. Does not take

advantage of environment.June 2007 CSD - UOC, Heraklion, Greece 26

Page 24: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

2D Mesh with 2 Subnetworks

Divide the network. Request, reply? X, Y axis?

Switches in front of pins.

Need to be interconnected.

Still 1 switch/PE.June 2007 CSD - UOC, Heraklion, Greece 27

Page 25: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Rotated RAM Blocks Switches every

two X axes. Half the number!

Switches slightly larger.

Our final topology an optimization of this.

June 2007 CSD - UOC, Heraklion, Greece 28

Page 26: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

NoC Topology – Bar Floorplan

Each switch is 6x6 and serves 4 PEs.

29CSD - UOC, Heraklion, Greece

Page 27: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Bar Floorplan

Would be 8x12: Vertical links drive

address inputs. 2 PE data ports

served by 1 switch port.

30CSD - UOC, Heraklion, Greece

Page 28: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Cross Floorplan

31CSD - UOC, Heraklion, Greece

Page 29: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Switch P&R Results 130 nm

implementation library. Typical case.

Pref. path latency: 300-420 ps. 450-500 ps (incl. 1mm).

1 cycle/node otherwise.

Past work: 1 cycle/node at 500 MHz.

Clock frequency 667 MHz

Flit width 39

FIFO lines 2

Number of FIFOs 30

Bar area overhead

13%

Cross area overhead

18%

Number of cells 15 K

Number of gates 45 K

Total dynamic power

80 mW32CSD - UOC, Heraklion, Greece

Page 30: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Future Work

Synchronization issues – A flit may arrive at any time. Impose preferred path constraints. Implement switch asynchronously.

Evaluation in complete system. Implement fault-tolerance.

33CSD - UOC, Heraklion, Greece

Page 31: Approaching Ideal NoC Latency with Pre-Con fi gured Routes

June 2007

Conclusion We approach ideal latency.

By pre-enabled tri-state paths. Our NoC is a generalized “mad-

postman” [C. R. Jesshope et al, 1989].

Our NoC is easily generalized – topology may need to be changed.

Past NoC research can be applied for further optimizations.

34CSD - UOC, Heraklion, Greece