run-time adaptive on-chip communication scheme

Post on 23-Feb-2016

Page 1: Run-time  Adaptive on-chip Communication Scheme

Run-time Adaptive on-chip Communication Scheme

林孟諭, Dept. of Electrical Engineering, National Cheng Kung University

Tainan, Taiwan, R.O.C

Page 2: Run-time  Adaptive on-chip Communication Scheme

2

Outline

• Abstract

• Introduction

• Motivation Case Study

• AdNoC Concept

– Definitions

• Algorithm

• Hardware Implementation

• Conclusion

Page 3: Run-time  Adaptive on-chip Communication Scheme

3

Abstract

• In embedded systems, workloads and constraints that vary during run-time require run-time adaptivity to provide a high degree of efficiency in every operation mode or scenario.

• We present the first approach to an adaptive on-chip communication scheme.

• It provides an adaptive routing/path allocation algorithm to meet a required level of Quality of Service (QoS), which here means guaranteed bandwidth.

Page 4: Run-time  Adaptive on-chip Communication Scheme

4

Introduction (1/2)

• We present a run-time adaptive network on chip (NoC) that adapts the underlying interconnection infrastructure on-demand in response to changing communication requirements imposed by an application.

• To provide on-demand interconnections, we present a novel adaptive routing/path allocation algorithm that meets QoS requirements (bandwidth).

Page 5: Run-time  Adaptive on-chip Communication Scheme

5

Introduction (2/2)

• The scheme makes decisions locally at each router depending on the available bandwidth in each direction to the neighboring router.

• Dynamic connections are realized by re-assigning a certain number of buffer blocks to different output ports of a router on-demand.

• It also increases the resource utilization, especially buffer utilization, through on-demand buffer block configuration.

Page 6: Run-time  Adaptive on-chip Communication Scheme

6

Motivation Case Study (1/4)

• We motivate the need for an adaptive NoC by means of a very simple scenario. We study an MPEG decoder [1] and an Image Processing Line (IPL) [18] application.

The task graphs are shown in Figures 1a and 1b.

Assume that at time t0 the NoC is running the MPEG video decoder (Fig. 1c).

At time t1, the IPL needs to be executed, so it is mapped onto the processing elements alongside the MPEG decoder. Once a mapping is performed, the routers attempt to set up meaningful routes (Fig. 1d).

Page 7: Run-time  Adaptive on-chip Communication Scheme

7

Fig. 1. Motivation to use an adaptive communication architecture

Page 8: Run-time  Adaptive on-chip Communication Scheme

8

Motivation Case Study (2/4)

• In this example, the Gauss task Gauss1 first establishes a route to its neighboring filter task Filter1. It then applies a deterministic XY routing algorithm, as used in QNoC, to reach Filter2.

• However, that attempt fails because of the limited available bandwidth.

• Consequently, it forces the router at Gauss1 to try another route which is successful (Fig. 1e).

Page 9: Run-time  Adaptive on-chip Communication Scheme

9

Motivation Case Study (3/4)

• With the routes, the routers supply a corresponding buffer block, allocating the buffer to output ports on-demand.

• The second Gauss task Gauss2 attempts to conduct the same action.

• However, it fails to find a route to Filter1 and Filter2. Thus it becomes necessary to invoke a re-mapping (Fig. 1f).

Page 10: Run-time  Adaptive on-chip Communication Scheme

10

Motivation Case Study (4/4)

• Routing needs to be implemented through an algorithm which can identify feasible routes.

• After path selection, appropriate buffer blocks need to be assigned on-demand to that path.

• If a path and buffer blocks are not available, the mapping function sends appropriate feedback to the upper layer.

• Therefore, a dynamic run-time application scenario needs an adaptive on-chip communication infrastructure that can build connections on-demand to provide QoS.

Page 11: Run-time  Adaptive on-chip Communication Scheme

11

AdNoC Concept

• The AdNoC architecture is proposed to support QoS-supported on-chip communication for a network exposed to varying system constraints.

• Like most NoCs, it uses packet-based communication. The architecture is pipelined and deploys wormhole routing because of its low latency in practice and its small buffer space requirements.

Page 12: Run-time  Adaptive on-chip Communication Scheme

12

Definitions (1/4)

• Definition 1: An application task graph (TG) is a directed graph $G_k = (T, F)$.

– $T$ is the set of all tasks $t_i$ used by an application.

– $f_{i,j} \in F$ represents the connection from task $t_i$ to $t_j$.

• Definition 2: The Physical Network (PN) is a directed graph $P = (N, V, B_t, r)$.

– $N$ is a set of tiles $n_i$.

– $v_{i,j} \in V$ represents an edge, the physical channel between $n_i$ and $n_j$.

– Each tile has a current buffer configuration at time $t$; $b_{i,t} \in B_t$ represents the state of the buffer assignment to individual output ports.

– $r$ is a routing function which determines the paths taken.
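Definitions 1 and 2 can be pictured as two small data structures. The following is a minimal sketch, not the authors' implementation: the class names, the rectangular mesh layout, and the `(x, y)` tile coordinates are assumptions made for illustration, and the buffer configuration $B_t$ and the routing function $r$ are left out.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Definition 1: G_k = (T, F). Names here are illustrative."""
    tasks: set = field(default_factory=set)   # T, the tasks t_i
    flows: set = field(default_factory=set)   # F, directed pairs (t_i, t_j)

    def connect(self, src: str, dst: str) -> None:
        # Adding a connection f_{i,j} also registers both end points in T.
        self.tasks.update((src, dst))
        self.flows.add((src, dst))


@dataclass
class PhysicalNetwork:
    """Tiles N and channels V of Definition 2 on an assumed 2D mesh."""
    width: int
    height: int

    def tiles(self) -> list:
        # N: one tile n_i per mesh coordinate.
        return [(x, y) for y in range(self.height) for x in range(self.width)]

    def channels(self) -> set:
        # V: one directed channel v_{i,j} per pair of neighbouring tiles.
        chans = set()
        for x, y in self.tiles():
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < self.width and 0 <= ny < self.height:
                    chans.add(((x, y), (nx, ny)))
        return chans


ipl = TaskGraph()
ipl.connect("Gauss1", "Filter1")
ipl.connect("Gauss1", "Filter2")
mesh = PhysicalNetwork(width=3, height=3)
```

Channels are stored as directed pairs because both graphs in the definitions are directed; a bidirectional link therefore appears twice.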

Page 13: Run-time  Adaptive on-chip Communication Scheme

13

Definitions (2/4)

• Definition 3: The Logical Network (LN) at time $t$ is a directed graph $L_t = (M, W)$.

– $M$ is a set of task groups $m_i$.

– $w_{i,j} \in W$ represents the set of connections between two task groups $m_i$ and $m_j$.

• Definition 4: The Task Mapping Function is a function $l_t : T' \subseteq T \to L_t$, which maps a subset $T'$ of each task graph $T$ to the logical network LN.

Page 14: Run-time  Adaptive on-chip Communication Scheme

14

Definitions (3/4)

• Definition 5: The Network Mapping Function is a function $p_t : L_t \to S \subseteq P$, which maps a logical network onto a subset of the physical network.

• Definition 6: A Routing Function $r : N \times N \to V$, $r : (n_i, n_k) \mapsto v_{i,j}$, returns a path $v_{i,j}$ away from the current PE ($n_i$), given the input port for each transaction and the destination $n_k$.
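As one concrete instance of a routing function in the sense of Definition 6, the deterministic XY routing mentioned in the motivation example can be sketched as below. The function names and the `(x, y)` tile coordinates are assumptions for illustration; the sketch returns the next tile rather than a channel object and ignores the input port that the definition also mentions.

```python
def xy_next_hop(current, dest):
    """Return the neighbouring tile to forward to, or None once at dest.

    Deterministic XY routing: correct the x coordinate first, then y.
    """
    x, y = current
    xd, yd = dest
    if x != xd:
        return (x + (1 if xd > x else -1), y)
    if y != yd:
        return (x, y + (1 if yd > y else -1))
    return None


def xy_path(src, dest):
    """Follow xy_next_hop from src until dest is reached."""
    path = [src]
    while (nxt := xy_next_hop(path[-1], dest)) is not None:
        path.append(nxt)
    return path


print(xy_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the route depends only on source and destination, XY routing cannot sidestep a channel that lacks bandwidth, which is the failure described in the case study.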

Page 15: Run-time  Adaptive on-chip Communication Scheme

15

Definitions (4/4)

• Definition 7:

– The Buffer Configuration $b_{i,t}$ is the current buffer configuration of tile $n_i \in N$.

– A Virtual Channel (VC) is a unidirectional logical or virtual connection between the tiles $n_i$ and $n_j$.

– Each VC is realized by an independently managed pair of message buffers referred to as the Virtual Channel Buffer (VCB).

Page 16: Run-time  Adaptive on-chip Communication Scheme

16

Definitions (4/4, continued)

• Definition 8: The System Monitor M is an infrastructure which is used to collect, aggregate, and process system statistics.

• Definition 9: Our Adaptive Network on Chip AdNoC is defined as the tuple $AdNoC = (P, M, L_t, G_i, p_t, l_t, r)$ with the parameters as given above.

Page 17: Run-time  Adaptive on-chip Communication Scheme

17

Algorithm (1/11)

• To provide bandwidth guarantees in an adaptive NoC, the underlying communication infrastructure needs to provide an adaptive path allocation strategy.

• Therefore, finding a path/routing for a given logical network and physical mapping of the application is a major challenge. The run-time path allocation algorithm is given in Alg. 1.

Page 18: Run-time  Adaptive on-chip Communication Scheme

18

Algorithm (2/11)

Page 19: Run-time  Adaptive on-chip Communication Scheme

19

Algorithm (3/11)

• For a requesting transaction, the path is checked in every possible direction and the VCB is assigned accordingly on-demand.

• The weighted XY algorithm wXY presented in Alg. 2 assigns each output port a weight based on the available bandwidth and on the distance dx or dy between the current and the destination node.

• This ideally gives the packet a maximum number of sensible routing choices along its path. The weight is also proportional to the available bandwidth.

Page 20: Run-time  Adaptive on-chip Communication Scheme

20

Algorithm (4/11)

Page 21: Run-time  Adaptive on-chip Communication Scheme

21

Algorithm (5/11)

• The wXY route allocation strategy is described as follows. Given is the tuple $\rho = \{N, E, S, W, P\}$.

• Each $i \in \rho$ has a weight $w_i$ and an available bandwidth $b_i$ with $b_i \le b_{max}$, where $b_{max}$ is the maximum line bandwidth.

Page 22: Run-time  Adaptive on-chip Communication Scheme

22

Algorithm (6/11)

• The current router coordinates are $x, y$. Each packet $p$ has destination coordinates $x_d, y_d$ and a required bandwidth $b_p$. The weights are assigned as follows:

Page 23: Run-time  Adaptive on-chip Communication Scheme

23

Algorithm (7/11)

• The route r chosen is then:

• The router distributes the VCBs to routes as needed by assigning each one to the corresponding output port.
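The weight and route-selection formulas were figures on the original slides and are not preserved in this transcript. The sketch below is therefore only one plausible reading of the surrounding description, not the authors' Alg. 2: the formula (remaining hops in a direction times available bandwidth, zero when the bandwidth is below $b_p$), the assumption that y grows towards North, and the function names are all hypothetical.

```python
def wxy_weights(pos, dest, avail_bw, required_bw):
    """Assumed weighting: hops still needed in a direction times its bandwidth.

    A port gets weight 0 if it leads away from the destination or cannot
    supply the bandwidth b_p the packet requires.
    """
    x, y = pos
    xd, yd = dest
    dx, dy = xd - x, yd - y
    # Hops each port still has to cover; 0 for unproductive directions.
    progress = {"E": max(dx, 0), "W": max(-dx, 0),
                "N": max(dy, 0), "S": max(-dy, 0)}
    weights = {}
    for port, hops in progress.items():
        bw = avail_bw.get(port, 0.0)
        weights[port] = hops * bw if bw >= required_bw else 0.0
    return weights


def wxy_route(pos, dest, avail_bw, required_bw):
    """Pick the port with the largest weight; 'P' delivers to the local PE."""
    if pos == dest:
        return "P"
    weights = wxy_weights(pos, dest, avail_bw, required_bw)
    best = max(weights, key=weights.get)
    return best if weights[best] > 0 else None  # None: no feasible port


bw = {"N": 10.0, "E": 10.0, "S": 10.0, "W": 10.0}
print(wxy_route((0, 0), (2, 1), bw, required_bw=4.0))  # E
```

Preferring the axis with more remaining distance keeps both dx and dy non-zero for longer, which matches the stated goal of leaving the packet as many sensible routing choices as possible. A `None` result corresponds to the case where the transaction must wait or the upper layer is informed.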

Page 24: Run-time  Adaptive on-chip Communication Scheme

24

Algorithm (8/11)

• Our scheme to assign buffers on-demand (at runtime) is given in Alg. 3.

• The benefit of such on-demand assignment is evident: buffers are allocated only when needed, which means that virtual channels can be reused by different ports.
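Alg. 3 itself is not reproduced in this transcript. As a rough illustration of the idea, a router can hold one shared pool of VCBs and bind a block to an output port only while a transaction uses it. The class name `VcbPool` and its methods are hypothetical, and the sketch covers only the bookkeeping, not the hardware mechanism.

```python
class VcbPool:
    """Shared pool of virtual channel buffers for one router (a sketch)."""

    def __init__(self, num_blocks):
        self.free = num_blocks   # blocks not bound to any port
        self.assigned = {}       # transaction id -> output port

    def assign(self, transaction, port):
        """Bind one free block to `port`; return False if none is left."""
        if transaction in self.assigned:
            raise ValueError(f"{transaction} already holds a buffer")
        if self.free == 0:
            return False
        self.free -= 1
        self.assigned[transaction] = port
        return True

    def release(self, transaction):
        """Return the transaction's block to the pool."""
        del self.assigned[transaction]
        self.free += 1

    def configuration(self):
        """Blocks per output port, i.e. the buffer configuration b_{i,t}."""
        counts = {}
        for port in self.assigned.values():
            counts[port] = counts.get(port, 0) + 1
        return counts


pool = VcbPool(num_blocks=2)
pool.assign("T1", "N")
pool.assign("T2", "N")          # both blocks now serve the North port
denied = pool.assign("T3", "E")  # False: no free block
pool.release("T1")
pool.assign("T3", "E")          # the freed block moves to the East port
```

With a static split, the East port would own a fixed number of blocks whether or not it carried traffic; here the same block serves North first and East later.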

Page 25: Run-time  Adaptive on-chip Communication Scheme

25

Algorithm (9/11)

• Fig. 3 shows an example scenario that illustrates the run-time behavior using different transactions in one router.

Page 26: Run-time  Adaptive on-chip Communication Scheme

26

Algorithm (10/11)

t0: All four directions are occupied with four different transactions; buffers are also assigned.

t1: Transaction T5 requests a path, and the weights are calculated until tδ, which takes 4 hardware cycles. A buffer is also assigned to the calculated direction before tδ.

t2: Transactions T1, T2, and T4 free their corresponding channels and assigned buffers.

Page 27: Run-time  Adaptive on-chip Communication Scheme

27

Algorithm (11/11)

t3: Four new transactions T1, T2, T4, and T6 request processing and they are granted resources.

t4: Transaction T7 requests a path and a buffer, but because no buffer resources are available, the request cannot be granted. So, the requesting transaction has to wait or inform the upper layer through the system monitor.
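The timeline t0 to t4 can be replayed with a few lines of bookkeeping. This is only a sketch: the function `replay` is hypothetical, and the pool size of six VCBs is an assumption inferred from the narrative (six transactions hold buffers after t3 and the seventh is refused), not a figure stated on the slides.

```python
def replay(events, num_vcbs):
    """Apply ('request' | 'free', transaction) events to one router.

    Returns a list of (transaction, outcome) pairs in event order.
    """
    held = set()   # transactions that currently own a VCB
    log = []
    for kind, tx in events:
        if kind == "request":
            granted = len(held) < num_vcbs
            if granted:
                held.add(tx)
            log.append((tx, "granted" if granted else "denied"))
        else:
            held.discard(tx)
            log.append((tx, "freed"))
    return log


events = (
    [("request", t) for t in ("T1", "T2", "T3", "T4")]    # t0
    + [("request", "T5")]                                 # t1
    + [("free", t) for t in ("T1", "T2", "T4")]           # t2
    + [("request", t) for t in ("T1", "T2", "T4", "T6")]  # t3
    + [("request", "T7")]                                 # t4
)
log = replay(events, num_vcbs=6)
print(log[-1])  # ('T7', 'denied')
```

The sketch stops at reporting the denial; in the scheme described here, T7 would then wait or the system monitor would pass the failure to the upper layer.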

Page 28: Run-time  Adaptive on-chip Communication Scheme

28

Hardware Implementation

• Our hardware platform for the AdNoC is illustrated in Fig. 4.

• It consists mainly of two parts: the run-time path allocation part and the on-demand VCB assignment part.

• The path allocation part decides either by consulting the lookup table or by calculation, depending on the type of the flit.
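The slide does not say which flit types use which mechanism. A common convention in wormhole routing, assumed here, is that the head flit triggers the route calculation and the following body and tail flits reuse the stored result. The class `PathAllocator` and the flit type strings are hypothetical names for this sketch.

```python
class PathAllocator:
    """Choose an output port per flit: calculate for heads, look up otherwise."""

    def __init__(self, compute_route):
        self.compute_route = compute_route  # e.g. a wXY-style function of dest
        self.table = {}                     # transaction id -> output port

    def output_port(self, flit_type, transaction, dest=None):
        if flit_type == "head":
            # Assumed: only the head flit carries the destination.
            port = self.compute_route(dest)
            self.table[transaction] = port
            return port
        port = self.table[transaction]      # body and tail follow the head
        if flit_type == "tail":
            del self.table[transaction]     # the worm has passed: free the entry
        return port


calls = []

def always_east(dest):
    # Stand-in route calculation that records how often it is invoked.
    calls.append(dest)
    return "E"

alloc = PathAllocator(always_east)
ports = [alloc.output_port("head", "T5", dest=(2, 1)),
         alloc.output_port("body", "T5"),
         alloc.output_port("tail", "T5")]
```

Under this assumption the comparatively slow weight calculation runs once per packet, and every later flit costs only a table lookup.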

Page 29: Run-time  Adaptive on-chip Communication Scheme

29

Page 30: Run-time  Adaptive on-chip Communication Scheme

30

Conclusion

• We have introduced the first approach to an adaptive on-chip communication architecture. It provides an adaptive path allocation algorithm to meet varying bandwidth guarantees.

• Run-time connections are realized by re-assigning a number of buffer blocks on-demand.

• Our buffer allocation scheme increases the buffer utilization and decreases the overall buffer use.