Centralized and software-based run-time traffic management inside configurable regions of interest in mesh-based Networks-on-Chip

Philipp Gorski, Tim Wegner, Dirk Timmermann

Institute of Applied Microelectronics and Computer Engineering, University of Rostock

22.04.2015


Page 2: Outline

Fundamentals and trends

Chip-Multi-Processor

Network-on-Chip

Quadrant-based mesh (QMesh)

Overview

Traffic management

Experiments

Setup & flow

Results

Conclusion

Page 3: Fundamentals and trends - CMPs

Modern Chip-Multi-Processors (CMPs)

Modular design

High degree of parallelization (core/thread level)

Challenge: efficient on-chip communication despite rising core count

[Figure: CMP block diagram. On-chip components: IP blocks with computation (GPP, GPU, DSP, …) and local memory (L1/L2 cache, regs, FIFO, …); global memory (L3/L4 cache, eDRAM, …); communication infrastructure (P2P, busses, on-chip networks); system I/O. Off-chip: external memory.]

Page 4: Fundamentals and trends - CMPs

Key trends and issues

Starting point: on-chip communication infrastructure (here: Networks-on-Chip)

Architecture: number of IP cores increases; heterogeneous; on-chip memory; on-chip communication dominant

Workload: application diversity (multiple domains; latency: BW, comp.; virtualization); interferences (memory access, communication)

Technology scaling: utilization wall (power/temperature, bandwidth (BW), dark silicon); PVT variation increases; reliability

Page 5: Fundamentals and trends - NoCs

Networks-on-Chip (NoCs): approach for scalable on-chip communication

[Figure: 4x4 2D-mesh with tiles (0,0) to (3,3) and a path from (xSRC, ySRC) to (xDST, yDST)]

Packet-based communication, GALS principle

Topology: interconnection between components

Path: E2E packet route through NoC determined by routing algorithm

Figure labels: router (R), link, network interface (NI); tile: voltage & frequency island

Dimension-ordered XY/YX routing

Minimal path length

Deterministic

Dead-/livelock-free

Minimal HW effort

Non-adaptive

4x4 2D-mesh
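The dimension-ordered routing listed above can be sketched in a few lines. This is a minimal illustration, not the authors' router logic: the function names `xy_route` and `yx_route` and the (x, y) tuple representation of a tile are assumptions made for the example.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a tile in the mesh (assumed layout)


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the tiles visited from src to dst, resolving X first, then Y."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:          # travel along the X dimension first
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:          # then travel along the Y dimension
        y += step_y
        path.append((x, y))
    return path


def yx_route(src: Coord, dst: Coord) -> List[Coord]:
    """Same idea with the dimension order swapped (Y first, then X)."""
    swapped = xy_route((src[1], src[0]), (dst[1], dst[0]))
    return [(x, y) for (y, x) in swapped]


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))  # X first
    print(yx_route((0, 0), (2, 1)))  # Y first
```

The path is fully determined by source and destination, which is why the scheme is deterministic and non-adaptive: its length always equals the Manhattan distance, so it is minimal, but it cannot avoid a congested router.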

Page 6: QMesh - overview

Idea: improve IP core connectivity

Increase number of NIs per IP core (Q0 – Q3)

Connect each core to all surrounding routers

QMesh: quadrant-based mesh + XY routing

Dual-path routing

Spatially independent paths

[Figure: QMesh tile. IP core with NIs Q0 to Q3, one per quadrant, each attached to a surrounding router (R); router ports North, South, East, West]

Page 7: QMesh - overview

DST at same row (L, R) or column (U, D)

DST in quadrant (Q0 – Q3)

[Figure: two grids of IP cores with a marked SRC and DST, one for the same-row/same-column cases (L, R, U, D) and one for the quadrant cases (Q0 to Q3)]

L, R, U, D: path length reduced by 1 hop (compared to 2D-mesh)

Q0 – Q3: path length reduced by up to 2 hops (worst case: same as 2D-mesh)
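The two statements above can be encoded as bounds on the QMesh path length. This is only a sketch of the stated reductions: the function name `qmesh_hop_bounds` is hypothetical, and the 2D-mesh hop count is passed in by the caller because the slides do not define how hops are counted.

```python
from typing import Tuple

Coord = Tuple[int, int]  # (x, y) position of an IP core (assumed layout)


def qmesh_hop_bounds(src: Coord, dst: Coord, mesh_hops: int) -> Tuple[int, int]:
    """Return (best, worst) QMesh path length for a given 2D-mesh path length.

    Same row or column (L, R, U, D): exactly one hop fewer.
    Quadrant (Q0 to Q3): up to two hops fewer, worst case unchanged.
    """
    if src == dst:
        raise ValueError("source and destination must differ")
    same_row = src[1] == dst[1]
    same_column = src[0] == dst[0]
    if same_row or same_column:
        return (mesh_hops - 1, mesh_hops - 1)
    return (mesh_hops - 2, mesh_hops)


if __name__ == "__main__":
    print(qmesh_hop_bounds((1, 1), (3, 1), mesh_hops=3))  # same row
    print(qmesh_hop_bounds((1, 1), (3, 3), mesh_hops=5))  # quadrant case
```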

Page 8: QMesh - traffic management

Periodic algorithm for dynamic adaptation of routing paths: traffic (re-)balancing

Performed in entire NoC or Region Of Interest (ROI)

Centralized calculation on Master Tile (MT)

Sensing: local HW counters measure activity (routers, links, NIs)

Evaluation & update: SW on the MT chooses the path with the smallest workload

[Figure: 5x5 grid of IP cores with a marked ROI and the Master Tile (MT); periodic cycle: activity sensing, traffic evaluation, path update]
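The evaluation step (choose the path with the smallest workload) might look roughly like the sketch below. Everything concrete here is an assumption for illustration: the slides do not say how the workload of a path is computed, so the sketch simply sums the sensed counter values of the resources on each candidate path, and the names `path_workload`, `choose_path` and `rebalance` are hypothetical.

```python
from typing import Dict, List, Sequence


def path_workload(path: Sequence[str], activity: Dict[str, int]) -> int:
    """Sum the sensed activity of all resources on one path (assumed metric)."""
    return sum(activity.get(resource, 0) for resource in path)


def choose_path(candidates: Sequence[Sequence[str]],
                activity: Dict[str, int]) -> List[str]:
    """Pick the candidate path with the smallest summed workload."""
    return list(min(candidates, key=lambda p: path_workload(p, activity)))


def rebalance(flows: Dict[str, Sequence[Sequence[str]]],
              activity: Dict[str, int]) -> Dict[str, List[str]]:
    """One evaluation pass on the master tile: least-loaded path per flow."""
    return {flow: choose_path(paths, activity) for flow, paths in flows.items()}


if __name__ == "__main__":
    # Hypothetical counter values collected from routers inside the ROI.
    activity = {"R0": 120, "R1": 15, "R2": 300, "R3": 40}
    # Each flow has two candidate paths, mirroring dual-path routing.
    flows = {"core5->core9": [["R0", "R2"], ["R1", "R3"]]}
    print(rebalance(flows, activity))
```

In a periodic scheme the returned mapping would be written back to the path tables, and the counters would be sampled again in the next period.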

Page 9: Simulation flow

Sniper: IP core simulation → SW timing

SystemC simulator: NoC activity statistics

DSENT/HOTSPOT: provision of power/temperature calculation profiles

Post-processing: activity → power → temperature → wear-out

[Figure: simulation flow. Sniper supplies software timing to the NoC simulator, which is configured by the NoC setup and produces activity statistics; DSENT and HOTSPOT supply the calculation functions power = f(activity) and temperature = f(power) to post-processing]

Page 10: Experimental setup

Tools

NoC simulator: SystemC-based, cycle-accurate

Sniper, DSENT, HOTSPOT (third-party)

Parameters

2D-mesh (XY routing) and QMesh (XY routing + TM + PA)

8x8 NoC (1GHz frequency), 9 flit FIFO depth (router), 64 bit link width

Synthetic traffic patterns

Single-threaded applications (bit complement/reverse, transpose, shuffle)

Multi-threaded applications (nearest neighbor, hotspot, rentian)
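For reference, the four permutation patterns named above are commonly defined as bit operations on the node address. The sketch below uses those common textbook definitions, which the slides do not spell out, so treat them as an assumption. It also assumes 6-bit node addresses for the 8x8 NoC; the constant and function names are hypothetical.

```python
ADDRESS_BITS = 6                      # 64 nodes in an 8x8 NoC (assumed addressing)
MASK = (1 << ADDRESS_BITS) - 1


def bit_complement(src: int) -> int:
    """Destination is the bitwise inverse of the source address."""
    return ~src & MASK


def bit_reverse(src: int) -> int:
    """Destination is the source address with its bit order reversed."""
    dst = 0
    for i in range(ADDRESS_BITS):
        if src & (1 << i):
            dst |= 1 << (ADDRESS_BITS - 1 - i)
    return dst


def transpose(src: int) -> int:
    """Destination swaps the upper and lower halves of the address."""
    half = ADDRESS_BITS // 2
    return ((src >> half) | (src << half)) & MASK


def shuffle(src: int) -> int:
    """Destination is the source address rotated left by one bit."""
    return ((src << 1) | (src >> (ADDRESS_BITS - 1))) & MASK


if __name__ == "__main__":
    for pattern in (bit_complement, bit_reverse, transpose, shuffle):
        print(pattern.__name__, [pattern(s) for s in range(8)])
```

If node addresses are laid out as `y * 8 + x` (again an assumption), `transpose` sends the node at (x, y) to the node at (y, x).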

Evaluated parameters

Packet delay (∆DELAY) vs. power overhead (∆POWER)

Reliability: wear-out acceleration factor $a_{MTTF}$

Network saturation margin

Page 11: Experimental results – power and delay

Increased power consumption but reduced packet delay

Locality/fewer hops & dynamic path adaptations

Reduced & balanced traffic

Lower & evenly distributed activity

Reduced Pdyn & thermal hot spots

∆POWER lower than expected (~100%)

Page 12: Experimental results - reliability

General wear-out decrease through QMesh

Increase of mean router lifetime: 10% for low PIR, 60% for high PIR (avg. ~ 35%)

$$a_{MTTF} = \frac{MTTF_{QMesh}}{MTTF_{2D\text{-}mesh}}$$

Wear-out increase: $a_{MTTF} < 1$

Wear-out decrease: $a_{MTTF} > 1$

Page 13: Conclusions

Modern CMPs require an efficient architecture for on-chip communication

NoCs provide appropriate infrastructure

QMesh topology: integration of multiple NIs per IP core to improve connectivity

Preservation of basic NoC structure and associated benefits

Improvements over standard 2D-mesh

Increase of network saturation margin

Reduction of avg. packet delay

Reliability: increased router lifetime due to lower max. temperatures

Higher robustness due to dual-path routing (spatially independent)

Tolerable costs

Traffic monitoring (HW) and path adaptations (SW) for QMesh at runtime

Dynamic traffic (re-)balancing & hotspot reduction

Page 14

Thank you for your attention!

Questions?

Page 15: QMesh - overview

QMesh characteristics

Preservation of basic 2D-mesh structure

Dual-path routing

Spatially independent paths

Required modifications / additional HW costs

8-ported router

4 NIs per IP core

1 programmable Path Table (PT) per IP core

4 bit addressing extension (for Qin and Qout)

Advantages

Costs comparable to 2D-mesh with XY/YX routing

Reduced average path length

Mitigation of traffic interferences

Increased traffic locality

Benefits of 2D-mesh maintained (e.g. deterministic routing)

College of Computer Science and Electrical Engineering

Institute of Applied Microelectronics and Computer Engineering

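[Figure: QMesh tile. IP core with Path Table (PT) and four NIs (Q0 to Q3), surrounded by four routers (R); directions Left, Right, Up, Down; further labels: processing element, CX. Addressing extension: Qin and Qout fields, 4 bits in total, with quadrant encoding Q0 = 11, Q1 = 10, Q2 = 00, Q3 = 01 (example value: 1010)]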
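The 4 bit addressing extension can be illustrated by packing the two quadrant fields into one value. The quadrant codes are the ones given on the slide; the bit order (Qin in the upper two bits, Qout in the lower two) and the names `pack_quadrants` and `unpack_quadrants` are assumptions made for this sketch.

```python
from typing import Tuple

# Two-bit quadrant codes as listed on the slide.
QUADRANT_CODE = {"Q0": 0b11, "Q1": 0b10, "Q2": 0b00, "Q3": 0b01}
CODE_TO_QUADRANT = {code: name for name, code in QUADRANT_CODE.items()}


def pack_quadrants(q_in: str, q_out: str) -> int:
    """Pack Qin and Qout into a 4-bit field (Qin in the upper bits, assumed)."""
    return (QUADRANT_CODE[q_in] << 2) | QUADRANT_CODE[q_out]


def unpack_quadrants(field: int) -> Tuple[str, str]:
    """Split a 4-bit field back into the (Qin, Qout) quadrant names."""
    if not 0 <= field <= 0b1111:
        raise ValueError("field must fit into 4 bits")
    return CODE_TO_QUADRANT[field >> 2], CODE_TO_QUADRANT[field & 0b11]


if __name__ == "__main__":
    field = pack_quadrants("Q1", "Q1")
    print(format(field, "04b"), unpack_quadrants(field))
```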

Page 16: Traffic Management – Evaluation and Update

SNoC: transmission of monitoring data

Evaluation done by master tile (MT)

Basically: choose path with smallest workload (balancing)

SNoC: transmission of update data to path tables

Page 17: Experimental results - reliability

Evaluation via acceleration factor of Mean-Time-To-Failure (MTTF): $a_{MTTF}$

Wear-out increase: $a_{MTTF} < 1$

Wear-out decrease: $a_{MTTF} > 1$

$$a_{MTTF} = \frac{MTTF_{QMesh}}{MTTF_{2D\text{-}mesh}} = e^{\frac{E_a}{k} \cdot \left( \frac{1}{T_{QMesh}} - \frac{1}{T_{2D\text{-}mesh}} \right)}$$

$MTTF_{QMesh}$, $MTTF_{2D\text{-}mesh}$: MTTF of QMesh/2D-mesh

$T_{QMesh}$, $T_{2D\text{-}mesh}$: avg. router temperature for QMesh/2D-mesh

$k$: Boltzmann's constant ($8.6 \times 10^{-5}$ eV/K)

$E_a$: activation energy of the CMOS devices (here: 0.7 eV at 45nm CMOS)
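As a worked illustration of the acceleration factor, the sketch below evaluates the expression with the constants given on the slide. The function name `a_mttf` is hypothetical, temperatures are assumed to be absolute temperatures in kelvin, and the example temperatures are made-up inputs, not results from the experiments.

```python
import math

BOLTZMANN_EV_PER_K = 8.6e-5    # k as given on the slide
ACTIVATION_ENERGY_EV = 0.7     # E_a as given on the slide (45nm CMOS)


def a_mttf(t_qmesh_k: float, t_2dmesh_k: float,
           ea_ev: float = ACTIVATION_ENERGY_EV,
           k_ev_per_k: float = BOLTZMANN_EV_PER_K) -> float:
    """MTTF_QMesh / MTTF_2D-mesh from average router temperatures in kelvin."""
    if t_qmesh_k <= 0 or t_2dmesh_k <= 0:
        raise ValueError("temperatures must be positive absolute temperatures")
    exponent = (ea_ev / k_ev_per_k) * (1.0 / t_qmesh_k - 1.0 / t_2dmesh_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical example: QMesh routers run 5 K cooler than 2D-mesh routers.
    print(a_mttf(t_qmesh_k=345.0, t_2dmesh_k=350.0))
```

With these made-up inputs the factor comes out at roughly 1.4, i.e. a wear-out decrease; equal temperatures give exactly 1, and a hotter QMesh would give a value below 1.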

Page 18: Experimental results – network saturation

∆SAT: relative improvement of network saturation margin

Due to hop reduction and dual-path options