on-time network on-chip: analysis and architecture

On-time Network On-Chip: Analysis and Architecture. CS252 Project Presentation. Dai Bui

Post on 08-Jan-2016


TRANSCRIPT

Page 1: On-time Network On-Chip: Analysis and Architecture

On-time Network On-Chip: Analysis and Architecture
CS252 Project Presentation
Dai Bui

Page 2: On-time Network On-Chip: Analysis and Architecture

Introduction
The project aims to provide predictable timing delays for the network-on-chip communication paradigm.
The purpose is not only to improve network speed.
Packet worst-case delay should be estimated analytically rather than empirically.

Page 3: On-time Network On-Chip: Analysis and Architecture

Motivations
Cyber-physical systems (example from Hermann Kopetz at TU Vienna)

Need for separate flows instead of networks

Page 4: On-time Network On-Chip: Analysis and Architecture

QNoC From Technion

Asynchronous communication

Support multiple service levels:

Seems to be suitable for soft real-time applications like video streaming

But what happens when multiple real-time flows have to share the same link?
Non-deterministic behavior for flows.
So we need to keep track of the number of guaranteed flows and their demand on each link.

Page 5: On-time Network On-Chip: Analysis and Architecture

SoCBUS
From the University of Linköping

Guarantees real-time properties by setting up a path when sending:
Initiate a path by sending a setup packet.
The path is locked until all data have been sent.
After that, the path is unlocked.

Drawbacks:
What happens if we have two real-time flows on the same link? Other traffic is blocked.
This seems to be a good solution when sending a large bulk of data, but not for periodic, non-continuous flows.
Bad link utilization due to link locking.

Page 6: On-time Network On-Chip: Analysis and Architecture

Æthereal
From NXP

Tries to employ conflict-free routing -> no packet is dropped

Avoids conflicts between two packets on the same link by delaying one packet

Drawbacks:
Global scheduling of packets: inflexible, difficult to scale
Partial design-time scheduling -> not suitable for multi-core

Page 7: On-time Network On-Chip: Analysis and Architecture

Idea
Exploit:
Admission control
Real-time packet scheduling
Run-time configuration
Spatial diversity

Page 8: On-time Network On-Chip: Analysis and Architecture

Design Goals
Multiple real-time flows can be multiplexed on one link.

Utilize the spatially diverse data paths between sources and destination nodes to avoid conflicts between real-time flows.

Do not block links completely as SoCBUS does; best-effort flows can still travel on links used by real-time flows.

Avoid the unpredictable network behavior seen in QNoC when multiple real-time flows suddenly travel on the same link and their total bandwidth exceeds the bandwidth of the link. The admission control in our architecture prevents that. Senders should always know whether the specifications required for their communications can be met.

Provide a reconfigurable state for real-time flows on the network, so that it does not need to be pre-calculated at design time as in Æthereal, which is not well suited to multi-core architectures.

Verifiable for critical systems.

Page 9: On-time Network On-Chip: Analysis and Architecture

Path Setup Protocol
The sender initiates a new real-time flow by sending a REQUEST to the master node with the specifications of the new flow: end-to-end delay, max packet length, minimum interval between packets, …

The master node checks the delay constraints and specifications of the new flow against its knowledge of previously reserved flows:
If it cannot find a path, it sends a REJECT back to the requesting node.
If there is a possible path, it sends a SETUP command to the routers on the path to reconfigure them (possibly modifying the configurations for other flows as well).
It waits for all routers to receive this command and for the ACKs from these routers.
It sends an ACCEPT back to the requesting node.
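The REQUEST / REJECT / SETUP / ACCEPT exchange above can be sketched roughly as follows. This is a simplified illustration, not the project's implementation: the class `MasterNode`, its method `request`, the caller-supplied `candidate_paths`, and the use of `max_len / min_interval` as the only admission test are all assumptions made here. The delay-constraint check and the real SETUP/ACK messaging are left out.

```python
from fractions import Fraction


class MasterNode:
    """Hypothetical master node that admits flows by link utilization only."""

    def __init__(self):
        self.link_load = {}  # (src, dst) -> utilization already reserved
        self.flows = {}      # flow_id -> list of links reserved for that flow
        self.next_id = 0

    def request(self, candidate_paths, max_len, min_interval):
        """Handle a REQUEST; return ("ACCEPT", flow_id, path) or ("REJECT", None, None)."""
        # Assumed demand model: max_len flits every min_interval cycles.
        demand = Fraction(max_len, min_interval)
        for path in candidate_paths:
            links = list(zip(path, path[1:]))
            if all(self.link_load.get(link, 0) + demand <= 1 for link in links):
                # Stands in for sending SETUP to each router and collecting ACKs.
                for link in links:
                    self.link_load[link] = self.link_load.get(link, 0) + demand
                flow_id = self.next_id
                self.next_id += 1
                self.flows[flow_id] = links
                return ("ACCEPT", flow_id, path)
        return ("REJECT", None, None)
```

A caller would pass one or more candidate paths as lists of node numbers, for example `MasterNode().request([[7, 8], [7, 12]], 5, 7)`.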

Page 10: On-time Network On-Chip: Analysis and Architecture

Miscellaneous
When a real-time flow is no longer needed, its path can be torn down. The path tear-down protocol is almost the same as the path setup protocol, but with reversed commands.

Each real-time flow is uniquely identified by a flow ID, which is embedded in each real-time packet so that routers on the path can identify the packets.

The master node communicates with routers about flows using this ID.

Page 11: On-time Network On-Chip: Analysis and Architecture

Heterogeneous Communication
Best-effort packets are preemptible by real-time packets. Real-time packets are not preemptible.

No acknowledgement is needed for real-time flits, since the scheduling mechanism ensures that the buffer size for real-time flows is bounded (based on their specifications).

Packet speed is doubled since no ACK mechanism is needed.

Page 12: On-time Network On-Chip: Analysis and Architecture

Router Structure
Looks the same as a virtual-channel router.

When a packet is identified by its flow ID, the router puts it in a designated FIFO queue.

The information previously reserved for a flow tells the router which port that flow's packets are forwarded to.
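A minimal sketch of the per-flow bookkeeping described above, assuming one FIFO and one output port per flow ID. The class `Router` and its methods `setup`, `receive` and `forward` are hypothetical names chosen for illustration; they are not taken from the project's SystemC code.

```python
from collections import deque


class Router:
    """Hypothetical router state: one FIFO and one output port per flow ID."""

    def __init__(self):
        self.out_port = {}  # flow_id -> output port reserved at setup time
        self.queues = {}    # flow_id -> FIFO of buffered packets

    def setup(self, flow_id, port):
        """Reserve a queue and an output port for a newly admitted flow."""
        self.out_port[flow_id] = port
        self.queues[flow_id] = deque()

    def receive(self, flow_id, packet):
        """Place an arriving packet in the FIFO designated for its flow."""
        if flow_id not in self.queues:
            raise KeyError(f"flow {flow_id} has no reservation on this router")
        self.queues[flow_id].append(packet)

    def forward(self, flow_id):
        """Pop the oldest packet of a flow and return it with its output port."""
        return self.out_port[flow_id], self.queues[flow_id].popleft()
```

Keying both tables by flow ID means the router never inspects a destination address for real-time packets; the reservation alone decides the output port.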

Page 13: On-time Network On-Chip: Analysis and Architecture

Delay Model and Fixed Priority Scheduler
The delay bound of a packet of flow f on outgoing edge e of a node is defined as the queueing time at that node plus the propagation time for the head flit of the packet to reach the next node.

Fixed Priority Scheduler:
Step 1: Mature packets are scheduled first. Packets of the flows with the highest priority are selected to be forwarded first.
Step 2 (optional): Immature packets can be forwarded if there are no mature packets.

Queueing delay by fixed-priority scheduler

Where Oe(g) is the order function (used to compare priorities) of flows on edge e. There is no notion of global priority.

Details of the proof are in the report.
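The two scheduler steps above can be sketched as follows. The slides do not define "mature", so this sketch assumes a packet is mature once the current cycle has reached a per-packet `mature_at` time. It also assumes that a smaller `priority` value means higher priority, and that Step 2 picks among immature packets by the same priority order. The names `Packet` and `select_next` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: int
    priority: int   # assumed convention: smaller value = higher priority on this edge
    mature_at: int  # assumed meaning: first cycle at which the packet counts as mature


def select_next(queue_heads, now, allow_immature=False):
    """Choose the packet to forward on one outgoing edge, or None if nothing is eligible."""
    # Step 1: mature packets first, highest priority among them.
    mature = [p for p in queue_heads if p.mature_at <= now]
    if mature:
        return min(mature, key=lambda p: p.priority)
    # Step 2 (optional): forward an immature packet only when no mature one exists.
    if allow_immature and queue_heads:
        return min(queue_heads, key=lambda p: p.priority)
    return None
```

Here `queue_heads` holds the packet at the head of each flow's FIFO for the edge being scheduled, which keeps the priority comparison local to that edge.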

Page 14: On-time Network On-Chip: Analysis and Architecture

End-to-end Delay
The end-to-end delay has to be at least as large as the sum of the delays at each edge on the path

Assume that because it takes one cycle to transmit a flit in a NoC.
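In symbols, the end-to-end condition stated in words above might be written as below. The notation is assumed here for illustration and is not taken from the slides: D_f is the end-to-end delay requested for flow f, P_f is the set of edges on its path, and d_f^e is the delay bound of flow f on edge e.

```latex
D_f \;\ge\; \sum_{e \in P_f} d_f^{e}
```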

Page 15: On-time Network On-Chip: Analysis and Architecture

Utilization Constraint
The utilization of each link shared by multiple flows must not be greater than 1

Where tf is the minimum interval between two successive packets of flow f
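A small sketch of how such a utilization check could be computed. The formula used here is an assumption, since the slide's equation did not survive in this transcript: it sums max packet length over minimum interval for each flow, with one flit transmitted per cycle. The function names are hypothetical.

```python
from fractions import Fraction


def link_utilization(flows):
    """Sum of length / interval over flows given as (max_packet_length, min_interval)."""
    # Assumed model: a flow occupies the link for max_packet_length cycles
    # out of every min_interval cycles.
    return sum(Fraction(length, interval) for length, interval in flows)


def can_admit(existing_flows, new_flow):
    """True if adding new_flow keeps the assumed link utilization at or below 1."""
    return link_utilization(list(existing_flows) + [new_flow]) <= 1
```

With flows of (length 4, interval 9) and (length 3, interval 8) already on a link, `can_admit` returns False for a third flow of (length 5, interval 7) under this assumed formula. Exact fractions are used so the comparison against 1 is not affected by floating-point rounding.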

Page 16: On-time Network On-Chip: Analysis and Architecture

Buffer bound for each flow
Each flow has a buffer bound at each node:

With some constraints, the sufficient buffer size will be smaller than 2 packet sizes. In some cases, the buffer size is just 1 packet size.

Page 17: On-time Network On-Chip: Analysis and Architecture

Routing
Always deadlock-free

Test example: three flows
Flow 1: PE7 -> PE23
Flow 2: PE6 -> PE3
Flow 3: PE5 -> PE19

Exhaustive depth-first search

Routing example
The routing algorithm is based on XY routing.
Flow 3 comes last (its request packet reaches the master node last).
When the routing algorithm for it reaches the link PE7 -> PE8, the link cannot afford 3 flows because its utilization would be greater than 1.
The link PE7 -> PE12 is then selected.
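The detour in the routing example above can be illustrated with a depth-first search that tries the X direction first and falls back to the Y direction when a link cannot absorb the new flow. This is a sketch under stated assumptions, not the project's routing code. The 5x5 mesh with row-major PE numbering is inferred from the PE7 -> PE12 link, only moves toward the destination are explored, and the load values in the usage note are made up. The names `neighbors` and `find_path` are hypothetical.

```python
from fractions import Fraction

WIDTH = 5  # assumed mesh width: PE7 -> PE12 is then one step in the Y direction


def neighbors(node, dst):
    """Yield next hops that move toward dst, X-direction move first."""
    x, y = node % WIDTH, node // WIDTH
    dx, dy = dst % WIDTH, dst // WIDTH
    if dx != x:
        yield y * WIDTH + (x + (1 if dx > x else -1))
    if dy != y:
        yield (y + (1 if dy > y else -1)) * WIDTH + x


def find_path(src, dst, demand, load):
    """Depth-first search for a path on which every link can absorb `demand`."""
    if src == dst:
        return [src]
    for nxt in neighbors(src, dst):
        # Skip links whose reserved utilization would exceed 1.
        if load.get((src, nxt), Fraction(0)) + demand <= 1:
            rest = find_path(nxt, dst, demand, load)
            if rest is not None:
                return [src] + rest
    return None
```

With no reserved load, `find_path(5, 19, Fraction(3, 10), {})` follows the plain XY route through PE8 and PE9. With a made-up load of 9/10 already on the PE7 -> PE8 link, the search turns at PE7 and continues through PE12 instead.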

Page 18: On-time Network On-Chip: Analysis and Architecture

Example 1 Results
Implemented in SystemC, based loosely on Noxim

Graph for Flow 2: PE6 -> PE3
Max packet length: 3
Min interval: 8

Page 19: On-time Network On-Chip: Analysis and Architecture

Example 1 Results
Flow 1: PE7 -> PE23
Max packet length: 4
Min interval: 9

Flow 3: PE5 -> PE19
Max packet length: 5
Min interval: 7

Page 20: On-time Network On-Chip: Analysis and Architecture

Buffer size

Page 21: On-time Network On-Chip: Analysis and Architecture

Example 2 - Three flows on one link
Packet scheduling without step 2

Page 22: On-time Network On-Chip: Analysis and Architecture

Comparison
Experiments

The same total input buffer size of 10 flits for all protocols
Packet size = 5 flits
For the real-time NoC, we restrict each link to two real-time flows and exploit the spatial diversity of paths in the NoC

Page 23: On-time Network On-Chip: Analysis and Architecture

Future Work
Find a better optimization for delays and paths in the network

Integrate with real-time processors like PRET to form a real-time multi-core processor

Understand how it can support the Byzantine Generals problem for fault tolerance

Suitable for PTIDES:
No packet reordering
Bounded communication delay