Improving Topological Mapping on NoCs
Rafael Tornero (PhD Student) and Juan M. Orduña Huertas (PhD Advisor)
Departament d'Informàtica, Universitat de València, Spain
Networks-on-Chip (NoCs) have been proposed as an efficient solution to the complex communication requirements of Systems-on-Chip (SoCs). The NoC design flow includes several key issues, and one of them is deciding where cores should be topologically mapped. This thesis proposes a new approach to the topological mapping strategy for NoCs. Specifically, we propose a new topological mapping technique for regular and irregular NoC platforms, and its application to optimizing application-specific NoCs based on distributed and source routing.
The Topological Mapping Technique [3]
Currently, we are studying heuristic methods for adapting the mapping technique to NoC platforms that use source routing. This kind of platform introduces new constraints into the mapping process, so the heuristic search methods employed on distributed-routing NoC platforms might not be suitable for it. Therefore, the goal of this study is to find the search method best suited to satisfying the new constraints of source-routing NoC platforms. We also plan to apply the technique to optimize applications on shared-memory MPSoCs, where communication is carried out through synchronization mechanisms. Finally, we would like to evaluate our technique on NoCs with irregular topologies, because most topological mapping proposals in the literature use NoCs with regular topologies, leaving mapping on irregular topologies an open issue.
[1] A. Jantsch and H. Tenhunen, Eds., Networks on Chip. Kluwer Academic Publishers, Feb. 2003.
[2] R. Marculescu, J. Hu, and U. Ogras, “Key research problems in NoC design: a holistic perspective,” in Proc. of CODES+ISSS '05, 2005, pp. 69–74.
[3] R. Tornero, J. M. Orduña, M. Palesi, and J. Duato, “A communication-aware topological mapping technique for NoCs,” in Proc. of the 14th International Euro-Par Conference on Parallel Processing. Springer-Verlag, 2008, pp. 910–919.
[4] R. Tornero, J. M. Orduña, A. Mejía, J. Flich, and J. Duato, “CART: Communication-aware routing technique for application-specific NoCs,” in Proc. 11th EUROMICRO Conf. on Digital System Design Architectures, Methods and Tools, 3–5 Sept. 2008, pp. 26–31.
[5] A. Mejia, J. Flich, J. Duato, S.-A. Reinemo, and T. Skeie, “Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori,” in Proc. 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 25–29 April 2006, 10 pp.
[6] R. Tornero, V. Sterrantino, M. Palesi, and J. M. Orduña, “A multi-objective strategy for concurrent mapping and routing in network on chip,” in Proc. of the 23rd IEEE International Parallel and Distributed Processing Symposium, 25–29 May 2009.
[7] M. Palesi, R. Holsmark, S. Kumar, and V. Catania, “Application specific routing algorithms for networks on chip,” IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 3, pp. 316–330, 2009.
[8] R. Tornero, S. Kumar, S. Mubeen, and J. M. Orduña, “Distance constrained mapping to support NoC platforms based on source routing,” in Proc. of the Euro-Par 2009 Workshops - Parallel Processing. Springer-Verlag, 25 August 2009, to be published January 2010.
24th IEEE International Parallel and Distributed Processing Symposium, April 19-23, 2010, Atlanta (Georgia), USA
Description of the Problem
A/V Multimedia system mapped on 16 IPs
Average packet latency of 3000 random mappings
Results for the top 478 mappings
The remaining 2522 mappings have latencies much higher than 200 clock cycles
The Network-on-Chip (NoC) paradigm has emerged recently as a promising solution to the complex on-chip communications derived from the increasing number of Intellectual Property (IP) cores that can be integrated on a single chip as semiconductor technology scales down to make complex Systems-on-Chip (SoCs) [1].
The use of NoCs as the interconnection infrastructure for complex SoCs has opened several interesting research and design issues [2], including topology selection, routing strategy selection and application mapping.
Impact of Mapping on Performance
What does Topological Mapping Consist of?
Splitting the application into a set of concurrent tasks, and assigning and scheduling them onto a list of IP cores (by procedures beyond the scope of this thesis).
Deciding how to topologically place the selected set of cores onto the tiles of the system so that the metrics of interest (energy, latency, etc.) are optimized.
Mapping Technique
Characteristics
Heuristic methods
Experimental validation avoided
Quality solutions for both regular and irregular network topologies
Communication-driven
Routing-awareness
Three key issues:
1. Modeling the network as a table of distances (or costs), that is, a model of the communication cost between each pair of source/destination nodes. This table contains the cost of communicating every pair of network nodes without taking into account any traffic pattern, but taking into account the underlying network topology and routing algorithm.
2. For each pair of source/destination IPs, the amount of information exchanged is measured, and a table of the communication requirements of the application is computed.
3. To search for the best assignment of IPs to network nodes, the technique defines a mapping array and an associated quality function for each array. This quality function measures the overall cost of sending all the messages exchanged by the application (that is, by all the tasks hosted in the IPs) if the IPs were mapped as indicated in the mapping array. Finally, the approach uses a heuristic method based on a random search to obtain a near-optimal assignment of IPs to network nodes (topological mapping) that minimizes the quality function value.
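The three steps can be sketched in a few lines of Python. This is a minimal sketch under assumptions that the poster does not state: the table of distances is taken to be the hop count of minimal routing on a 2-D mesh, the communication requirements are a dictionary of exchanged volumes, and the random search is a simple swap-based search. The names `mesh_distance_table`, `quality` and `random_search` are hypothetical, and this is not the authors' implementation.

```python
import random


def mesh_distance_table(rows, cols):
    """Step 1 (assumed cost model): the cost between two nodes of a rows x cols
    mesh is the hop count of a minimal path, i.e. their Manhattan distance."""
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    return [[abs(a[0] - b[0]) + abs(a[1] - b[1]) for b in nodes] for a in nodes]


def quality(mapping, comm, dist):
    """Step 3 quality function: total cost of sending all exchanged data when
    IP i is placed on node mapping[i]. comm maps (src_ip, dst_ip) -> volume."""
    return sum(vol * dist[mapping[s]][mapping[d]] for (s, d), vol in comm.items())


def random_search(num_nodes, comm, dist, iterations=2000, seed=0):
    """Hypothetical random search: the mapping array is a permutation of the
    nodes (IP i sits on perm[i]; extra positions are free nodes). Random swaps
    are kept when they do not worsen the quality function."""
    rng = random.Random(seed)
    perm = rng.sample(range(num_nodes), num_nodes)
    best = quality(perm, comm, dist)
    for _ in range(iterations):
        i, j = rng.sample(range(num_nodes), 2)
        perm[i], perm[j] = perm[j], perm[i]
        cost = quality(perm, comm, dist)
        if cost <= best:
            best = cost
        else:
            perm[i], perm[j] = perm[j], perm[i]  # undo the swap
    return perm, best


# Usage: four IPs on a 2x2 mesh with two heavily communicating pairs.
dist = mesh_distance_table(2, 2)
comm = {(0, 1): 10, (2, 3): 10}
mapping, cost = random_search(4, comm, dist)
print(mapping, cost)
```

Because the quality function only reads the tables, no simulation is needed while searching; a different routing algorithm only changes the distance table.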
Application-Specific NoC Platforms Based on Distributed Routing
Analysis of CART
4390 different routing algorithms for a 5x5 mesh have been constructed. Best, Worst and Random mappings have been combined with the Best, Worst and Random SR configurations.
All plots corresponding to Best Mappings show a similar stable throughput. Additionally, the Best Mappings significantly reduce the average network delay, regardless of the SR configuration used.
Fault Tolerance of the Multi-Objective Strategy
Distance Constrained Mapping. Feasibility exp.
Validation of the Network Model
Application-Specific NoC Platforms Using Source Routing
Communication-Aware Routing Technique (CART) [4]
Optimizes the network performance for Application-Specific NoCs.
Combines a flexible, topology-agnostic routing algorithm (SR) with the communication-aware mapping technique described above.
The Segment-based Routing (SR) algorithm [5] is based on a segmentation process of the network. It guarantees deadlock freedom and full connectivity among nodes by placing a bidirectional routing restriction at every segment. This results in a larger degree of freedom when placing turn restrictions.
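To illustrate what a bidirectional routing restriction does (this is not the SR segmentation algorithm itself), here is a hedged Python sketch: a restriction at node n between its links to a and b forbids the turns a→n→b and b→n→a, and a breadth-first search over (node, previous node) states yields the shortest hop counts that remain legal. The function names and the no-U-turn rule are assumptions made for the example.

```python
from collections import deque


def bidirectional_restriction(a, n, b):
    """Forbid the turn a -> n -> b in both directions."""
    return {(a, n, b), (b, n, a)}


def restricted_distances(adj, restrictions, src):
    """Shortest legal hop counts from src.

    adj: dict node -> list of neighbour nodes.
    restrictions: set of forbidden turns (prev, node, next).
    Assumption: packets never make U-turns.
    """
    dist = {src: 0}
    seen = {(src, None)}
    queue = deque([(src, None, 0)])
    while queue:
        node, prev, hops = queue.popleft()
        for nxt in adj[node]:
            if nxt == prev or (prev, node, nxt) in restrictions:
                continue
            state = (nxt, node)  # current node plus the link we arrived on
            if state not in seen:
                seen.add(state)
                dist.setdefault(nxt, hops + 1)  # BFS: first arrival is shortest
                queue.append((nxt, node, hops + 1))
    return dist


# Usage: 2x2 mesh 0-1 / 2-3 with one restriction at node 1 between links 0-1 and 1-3.
mesh = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
rules = bidirectional_restriction(0, 1, 3)
print(restricted_distances(mesh, rules, 0))
```

Traffic from 0 to 3 is forced through node 2. On a three-node line, the same kind of restriction would disconnect the end nodes, which is why SR places restrictions segment by segment so that full connectivity is preserved.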
Figure 1. An example of the SR algorithm: (a) topology formed by links and nodes; (b) partition into network segments S0 to S3; (c) all possible bidirectional routing restrictions; (d) randomly selected set of routing restrictions.
The first step consists of computing a random SR configuration (combination of segments and routing restrictions applied to the network topology). Then, the communication-aware mapping technique is applied to that SR configuration.
Multi-Objective Strategy for Concurrent Mapping and Routing [6]
This strategy determines an approximation of the Pareto set of NoC configurations that optimize multiple performance indexes simultaneously. We consider the routing algorithm designed by APSRA [7] and the topological mapping as the multidimensional interacting parameters, and message delay and fault tolerance as the objectives to be optimized.
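Approximating a Pareto set rests on a dominance test between candidate configurations. Below is a hedged Python sketch that assumes every objective is minimized, for example a normalized cost such as MC together with 1/RI, as in the axes of the Pareto plot. It only filters candidates and is not the multi-objective genetic algorithm used by the strategy. The names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in
    at least one (all objectives are minimised)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Usage: (delay, 1/RI) pairs for four hypothetical NoC configurations.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(pareto_front(candidates))
```

In a genetic algorithm, a filter like this would be applied to each generation, and the surviving set would be the current approximation of the Pareto set.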
Robustness Index
[Figure: two source/destination pairs s, d. In one case, a single link fault does not compromise the communication from s to d. In the other, a single link fault in either l' or l'' makes the communication from s to d impossible.]
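The situation in the figure can be sketched in Python. Given the paths that the routing algorithm allows from s to d, a link shared by every path is critical, because a single fault on it makes the communication impossible. We assume that a communication is "dead" when every allowed path crosses a faulty link. The exact Robustness Index is defined in [6] and is not reproduced here. The helper names are hypothetical.

```python
def path_links(path):
    """Undirected links used by a path given as a sequence of nodes."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def critical_links(paths):
    """Links shared by every allowed path (paths must be non-empty): a single
    fault on any of them cuts the communication."""
    return set.intersection(*(path_links(p) for p in paths))


def is_dead(paths, faulty_links):
    """A communication is dead when every allowed path crosses a faulty link."""
    faulty = {frozenset(link) for link in faulty_links}
    return all(path_links(p) & faulty for p in paths)


# Two link-disjoint paths versus a single path, as in the two cases of the figure.
disjoint = [["s", "a", "d"], ["s", "b", "d"]]
single = [["s", "x", "d"]]
print(critical_links(disjoint), critical_links(single))
```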
Design Space Exploration Flow
[Figure: Source Routing Packet Format vs. Distributed Routing Packet Format on a 4x4 mesh whose tiles each have a network interface and a switch. Source routing: the packet header carries the path (1,1) (1,2) (1,3) (2,3). Distributed routing: the packet header carries only the destination address (2,3). Core placement: (1,1) DSP (source), (1,2) Video Receiver, (1,3) Processor, (1,4) Audio Receiver, (2,1) FPGA, (2,2) Processor, (2,3) Memory (destination), (2,4) Processor, (3,1) DSP, (3,2) Memory, (3,3) Processor, (3,4) DSP, (4,1) Video Transmitter, (4,2) I/O Interface, (4,3) DSP, (4,4) Audio Transmitter.]
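The two packet formats can be contrasted with a short Python sketch based on the figure's example (DSP at (1,1) sending to Memory at (2,3)). We assume the minimal path first changes the column index and then the row index, because that order reproduces the path in the figure. The function names are hypothetical. The source-routing header grows with every hop, while the distributed-routing header stays a single address. This growth is the header overhead that motivates bounding the path length.

```python
def minimal_path(src, dst):
    """Minimal mesh path between (row, col) tiles: change the column index
    first, then the row index (assumed order, matching the figure)."""
    r, c = src
    path = [(r, c)]
    while c != dst[1]:
        c += 1 if dst[1] > c else -1
        path.append((r, c))
    while r != dst[0]:
        r += 1 if dst[0] > r else -1
        path.append((r, c))
    return path


def source_routing_header(src, dst):
    """Source routing: the header carries the whole path."""
    return {"path": minimal_path(src, dst)}


def distributed_routing_header(dst):
    """Distributed routing: the header carries only the destination address;
    each switch computes the next hop itself."""
    return {"destination": dst}


# DSP (source) at (1,1) sends to Memory (destination) at (2,3).
src, dst = (1, 1), (2, 3)
print(source_routing_header(src, dst))
print(distributed_routing_header(dst))
```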
Advantages:
Simple and fast routers
Topology independence
Guaranteed throughput
Mixing of minimal and non-minimal paths
Disadvantages:
Overhead and low bandwidth utilization
Static routing
Distance constrained mapping [8] is used to:
Minimize the overhead
Support NoC platforms using source routing with a constraint on the path length field of the packet header
The method consists of two steps:
1. Find a mapping using the technique described above, with an added distance constraint.
2. Find a source routing function for every core pair Ci and Cj such that:
The path length is equal to the Manhattan distance between Ci and Cj
There is no possibility of deadlock
The traffic is balanced among all the links
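Step 1 can be sketched by adding the constraint to the quality function. In this hedged Python sketch, `max_hops` (a hypothetical parameter) stands for the capacity of the header's path length field, and mesh nodes are numbered row by row. Infeasible mappings receive an infinite cost, which is one possible way to add the constraint and not necessarily the one used in [8]. Step 2, the deadlock-free and balanced route selection, is not shown.

```python
def node_coords(node, cols):
    """Row-major numbering of mesh nodes (assumption): node -> (row, col)."""
    return divmod(node, cols)


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def constrained_quality(mapping, comm, cols, max_hops):
    """Quality function with a distance constraint. A mapping that places a
    communicating pair more than max_hops apart is infeasible (infinite cost);
    otherwise volume x Manhattan distance is summed, because a minimal source
    route has exactly that many hops."""
    total = 0
    for (s, d), volume in comm.items():
        hops = manhattan(node_coords(mapping[s], cols), node_coords(mapping[d], cols))
        if hops > max_hops:
            return float("inf")
        total += volume * hops
    return total


# 3x3 mesh: IP 0 on node 0 and IP 1 on node 8 (opposite corners, 4 hops apart).
comm = {(0, 1): 50}
mapping = [0, 8]
print(constrained_quality(mapping, comm, cols=3, max_hops=3))
print(constrained_quality(mapping, comm, cols=3, max_hops=4))
```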
[Figure: Average delay (cycles) vs. pir scale factor for CART, Random mapping SR, Random mapping UD and Random mapping XY]
Performance Evaluation
MMS - CART provides more than 30% improvement in latency and 15% higher throughput than the rest of the routing techniques.
[Figure: Energy (J) vs. pir scale factor for CART, Random mapping SR, Random mapping UD and Random mapping XY]
MMS - the energy required by CART is less than half the energy required by the rest of the routing techniques considered.
The figure shows the average latency in cycles. At both low and high traffic loads, UNCONSTDISTMAP and MINDISTMAP behave similarly and save more than 20% and 25% of the cycles over RNDMAP, respectively. Therefore, the evaluation results show that it is possible to reduce the path length of the header flit at least to half of the network diameter without significant performance degradation (less than 5%).
Average delay has been measured at a packet injection rate where none of the algorithms are saturated.
For all traffic scenarios and low pirs, MCBMAP-XY performs slightly better than the others.
For higher values of pir, the best solution in terms of MC found by the proposed approach outperforms the others.
The best solution in terms of RI found by the proposed approach exhibits delay characteristics close to MCBMAP-AS.
Performance evaluation of the CART method compared to other routing techniques that do not take into account the traffic generated by the application.
Simulation of the 5×5 2D-mesh topology with different applications when using Up/Down (UD) routing, Dimension Order (XY) routing, SR and CART.
The evaluation results show that the CART method can significantly improve the network performance for those application-specific NoCs where the communication requirements of the application can be measured or estimated.
CART
Distance Constrained Mapping for NoC Platforms using Source Routing
Multi-Objective Strategy
MMS Traffic
How mapping is changed
Multi-objective genetic algorithm
Mapping of 500 random applications: uniform probability of spatial communications; communication bandwidth uniformly distributed between 10 and 100 KB/s
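The synthetic workload described above can be sketched as a Python generator. The number of cores (25, i.e. a 5x5 mesh) and the number of communicating pairs per application (`num_flows`) are hypothetical choices, since the poster gives only the bandwidth range, the uniform spatial distribution and the count of 500 applications.

```python
import random


def random_application(num_cores, num_flows, rng):
    """One synthetic application: num_flows distinct (src, dst) core pairs drawn
    with uniform probability, each with a bandwidth uniformly distributed in
    [10, 100] KB/s. A repeated pair simply redraws its bandwidth."""
    if num_flows > num_cores * (num_cores - 1):
        raise ValueError("too many flows for this number of cores")
    comm = {}
    while len(comm) < num_flows:
        src, dst = rng.sample(range(num_cores), 2)  # two distinct cores
        comm[(src, dst)] = rng.uniform(10.0, 100.0)
    return comm


rng = random.Random(42)
apps = [random_application(num_cores=25, num_flows=40, rng=rng) for _ in range(500)]
print(len(apps), len(apps[0]))
```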
[Figure: Average delay (cycles) vs. pir scale factor for Best mapping/Best SR, Random mapping/Best SR, Best mapping/Worst SR, Best mapping/Random SR, Random mapping/Random SR and Random mapping/Worst SR]
[Legend: path length constraints of 6, 5, 4 and 3 hops]
[Figure: Normalized 1/RI vs. normalized MC under MMS traffic for MOGARMAP, MCBMAP XY and MCBMAP AS]
MOGARMAP: multi-objective Pareto mapping with customized routing algorithm
MCBMAP AS: mono-objective mapping with customized routing algorithm
MCBMAP XY: mono-objective mapping with XY routing algorithm
[Figure: Correlation Coefficient Graph: correlation coefficient vs. points, for average latencies and for mean average latencies]
[Figure: % of feasible mappings for the 5x5, 6x6, 6x8 and 7x7 topologies]
[Figure: Least-squares linear adjustment for point S8: average latency value vs. value in table of distances; simulation points fitted by y = 4.4516 * x + 11.435, correlation coef. = 0.85507]
[Figure: Least-squares linear adjustment for point S10: average latency value vs. value in table of distances; simulation points fitted by y = 30.7631 * x + 333.2556, correlation coef. = 0.141]
[Figure: Least-squares linear adjustment for point S10 (mean): mean of average latency value vs. value in table of distances; simulation points fitted by y = 35.6185 * x + 312.1961, correlation coef. = 0.96539; P1 = 224, P2 = 388, P3 = 496, P4 = 552, P5 = 560, P6 = 524, P7 = 448, P8 = 336, P9 = 224, P10 = 140, P11 = 80, P12 = 40, P13 = 16, P14 = 4]
[Figure: Least-squares linear adjustment for point S1: average latency value vs. value in table of distances; simulation points fitted by y = 2.0402 * x - 1.6573, correlation coef. = 0.98901]
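The network-model validation plots report least-squares lines and correlation coefficients between the table-of-distances value and the simulated average latency. They do so both per simulation point and after averaging the latencies that share a table value (the "Mean" plot). The following small Python sketch shows these computations. The data are synthetic, not the poster's measurements, and the helper names are hypothetical. It illustrates how averaging removes per-point scatter and raises the correlation.

```python
import math
from collections import defaultdict


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def group_means(xs, ys):
    """Average the y values that share the same x (table-of-distances value)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    keys = sorted(groups)
    return keys, [sum(groups[k]) / len(groups[k]) for k in keys]


# Synthetic example: scattered latencies for three table-of-distances values.
table_values = [4, 4, 8, 8, 12, 12]
latencies = [10, 14, 20, 24, 30, 34]
xs, ys = group_means(table_values, latencies)
print(linear_fit(xs, ys), pearson(table_values, latencies), pearson(xs, ys))
```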
[Figure: Improvement in saturation pir vs. MCBMAP-XY (0% to 80%) for Uniform, Hot-spot and MMS traffic and their average; series MCBMAP-AS, MOGARMAP-RI, MOGARMAP-MC]
[Figure: Improvement in average delay vs. MCBMAP-XY (0% to 70%) for Uniform, Hot-spot and MMS traffic and their average; series MCBMAP-AS, MOGARMAP-RI, MOGARMAP-MC]
[Figure: % of dead communications vs. % of faulty links (0.5%, 2.0%, 4.0%, 10.0%) for MCBMAP-XY, MCBMAP-AS, MOGARMAP-MC and MOGARMAP-RI]