noc research summary - unict
TRANSCRIPT
University of Nevada, Las Vegas, July 30th, 2009
Università degli Studi di Catania, Dipartimento di Ingegneria Informatica e delle Telecomunicazioni (DIIT)
NoC Research Summary
Catania
UniCT
12 Faculties
~65,000 students
Faculty of Engineering (6,000 students)
➔ 7 Departments
✔ DAU: Architecture and Urban Design
✔ DIEES: Electrical, Electronic and Systems Engineering
✔ DICA: Civil Engineering
✔ DIIM: Industrial and Mechanical Engineering
✔ DIIT: Informatics and Telecommunications
✔ DM: Mathematics
✔ DMFCI: Physics and Chemistry
DIIT
Department of Computer Science and Telecommunications Engineering (DIIT)
~50 people
Research groups
➔ Computer Sciences
➔ Telecommunications
➔ Electromagnetic Fields
Research Areas of Interest: Computer Sciences
Computer architectures
Embedded systems
Hardware/software codesign
Operating systems
Real time systems
Peer to peer systems and applications
Trust and reputation systems
E-learning technologies
Sensor networks
Self-organizing and self-adaptive systems
Mobile agents
Ubiquitous computing
Human computer interaction
Research Areas of Interest: Telecommunications
Speech signal classification and recognition
Speech coding for mobile communications
Digital signal processing in communications
Distributed multimedia applications
Multimedia traffic modeling and analysis
Wireless and satellite networks
Mobile systems
Next generation internet
Wireless sensor networks
Research Areas of Interest: Electromagnetic Fields
Development of ultra-wideband antennas
Interaction of microwaves with anisotropic media
Single-mode solid-state waveguide lasers and amplifiers
Computational electromagnetism
Industrial Cooperations
Academic Cooperations
Austrian Academy of Sciences, Austria
Institute for Industrial IT, Germany
Institut für Automation und Kommunikation, Germany
University of Aveiro, Portugal
University of York, UK
University of Lund, Sweden
University of Porto, Portugal
Universitat Politècnica de Catalunya, Spain
Mälardalen University, Sweden
University of Illinois at Urbana-Champaign, USA
Technical University of Vienna, Austria
Halmstad University, Sweden
Jönköping University, Sweden
Universidad de Valencia, Spain
Universidad Politecnica de Valencia, Spain
Concordia University
Columbia University, New York, USA
Rice University, Houston, USA
Dartmouth College, Dartmouth, USA
German Aerospace Center (DLR), Munich, Germany
University of California, Riverside, USA
University of California, Irvine, USA
Florida Institute of Technology
Exeter University
Centre for Research on Embedded Systems, University of Halmstad, Sweden
Institut für Automation und Kommunikation, Magdeburg, Germany
State University of Aerospace Instrumentation, Saint Petersburg, Russia
University of Craiova, Romania
University of Helsinki, Finland
Technical University of Cluj
University Politehnica of Budapest
University of Thessaloniki
University of Edinburgh
National Center for High-performance Computing (NCHC), Taiwan
CoSBi: The Microsoft Research - University of Trento Centre for Computational and Systems Biology
Georgia Institute of Technology, Atlanta, USA
University of Cyprus, Cyprus
European Projects
NEWCOM: Network of Excellence in Wireless Communications
CRUISE: Creating Ubiquitous Intelligent Sensing Environments
ITRACE: Interactive Tracing and Graphical Annotation in Pen-based E-learning
flexWARE: Flexible Wireless Automation in Real-Time Environments
P2PPROVIDEO: P2P middleware for deployment of an innovative business model for the provision of a QoS-aware video multicast transport service over the Internet
Team
Vincenzo Catania, Full Professor
Giuseppe Ascia, Associate Professor
Daniela Panno, Associate Professor
Maurizio Palesi, Contract Researcher
Davide Patti, Contract Researcher
Alessandro G. Di Nuovo, Contract Researcher
Fabrizio Fazzino, PhD Student
Master thesis students
Undergraduate students
Research Topics in ES Area
Multi-objective DSE
Instruction-level power modeling
RISC-based, VLIW-based
Speeding-up techniques
Hybridization, Grid computing
Fuzzy Systems, Neural Networks
Many-core architectures
Parallelizing applications for MPSoC
Advanced on-chip interconnection systems (NoCs)
Bus encoding techniques
Open Source HW & OpenSPARC Initiative
Open-source Development Platforms
Open Embedded
Android
[Figure: the topics above arranged along a time axis labeled past (~2000), present, future]
Outline
Application Specific Routing Algorithms
Concurrent Mapping and Routing
Dealing with Manufacturing Defects
Encoding Scheme for Low Power
Routing Algorithms
NoC performance is determined by
➔ Topology
➔ Routing
➔ Flow control
➔ Switching
Routing determines the path selected by a packet to reach its destination
➔ Deterministic
➔ Adaptive
[Figure: network with a source node S and a destination node D]
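As a concrete instance of a deterministic routing function, XY routing on a 2D mesh (one of the algorithms compared later in the talk) can be sketched as below. The `(x, y)` coordinate convention and the name `xy_route` are assumptions made for illustration; they do not come from the slides.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst under XY routing.

    Nodes are (x, y) tuples on a 2D mesh (assumed convention).
    The packet first moves along X until the column matches, then
    along Y, so every source/destination pair has exactly one path.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

An adaptive algorithm would instead return several admissible next hops at each node and leave the final choice to a selection policy.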
Performance of Routing Algorithms
[Figures: one plot per traffic scenario, annotated from "Best" to "Worst"; algorithms in the order listed on each plot]
Uniform traffic: XY, West-First, Odd-Even, Negative-first
Transpose 1 traffic: Negative-first, Odd-Even, West-First, XY
Hotspot traffic: Odd-Even, West-First, XY and Negative-first
No Winner Routing Algorithm
The ordering of the algorithms changes with the traffic scenario (uniform, transpose 1, hotspot)
Application Specific Routing Algorithm
Information about
➔ Tasks which communicate and tasks which never communicate
✔ After task mapping → information about network nodes which communicate
➔ Concurrent/non-concurrent communications
➔ Communication bandwidth requirements
Many opportunities
➔ Improving performance (e.g., maximizing routing adaptivity)
➔ Simplifying the estimation/control of congestion
➔ Designing more effective selection policies
[Diagram: the Application Specification (task graph T1, T2, T3, T4, ..., Tn) and the Network Topology feed the design of the AS Routing Algorithm]
APSRA Example
[Figure sequence over ten slides: the Topology Graph with nodes P1-P6 and channels l12, l21, l23, l32, l45, l54, l56, l65, l14, l41, l25, l52, l36, l63; the corresponding Channel Dependency Graph; the Communication Graph with tasks T1-T6. Successive slides highlight the task pairs (T1, T3), (T4, T3), (T1, T6) and then (T4, T2), and the Channel Dependency Graph is then shown as the Application Specific Channel Dependency Graph]
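The graphs in this example can be sketched in code. The block below builds the channel dependency graph of the 2x3 topology (P1 P2 P3 on one row, P4 P5 P6 on the other, as the channel names suggest) and checks it for cycles. It then keeps only the dependencies exercised by a set of paths, assuming that this is what the application-specific graph retains. The paths are hypothetical, because the transcript does not preserve the task-to-node mapping, and all function names are illustrative.

```python
from itertools import product

# Bidirectional neighbours of the 2x3 topology; channel (i, j) stands for l_ij.
EDGES = [(1, 2), (2, 3), (4, 5), (5, 6), (1, 4), (2, 5), (3, 6)]
CHANNELS = EDGES + [(b, a) for a, b in EDGES]


def full_cdg(channels):
    """All dependencies l_ij -> l_jk between consecutive channels (no U-turns)."""
    return {(c1, c2) for c1, c2 in product(channels, repeat=2)
            if c1[1] == c2[0] and c1[0] != c2[1]}


def path_dependencies(paths):
    """Dependencies actually exercised by a set of node paths."""
    deps = set()
    for path in paths:
        hops = list(zip(path, path[1:]))
        deps.update(zip(hops, hops[1:]))
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in a dependency graph."""
    graph = {}
    for a, b in deps:
        graph.setdefault(a, []).append(b)
    state = {}  # 1 = on the DFS stack, 2 = fully explored

    def visit(node):
        state[node] = 1
        for nxt in graph.get(node, []):
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in list(graph))


# Hypothetical paths used by the communicating pairs (assumption).
paths = [[1, 2, 3], [4, 5, 2, 3], [1, 4, 5, 6]]
print(has_cycle(full_cdg(CHANNELS)))         # unrestricted routing
print(has_cycle(path_dependencies(paths)))   # application-specific subset
```

An acyclic dependency graph is the usual sufficient condition for deadlock freedom, which is why dropping the dependencies that the application never exercises can leave room for more adaptivity.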
APSRA Methodology
[Diagram: the Communication Graph of the application to be mapped (T1, ..., Tn), the Mapping Function and the Network Topology (P1-P13) are the inputs of APSRA, which produces the Routing Tables; a Compression step, driven by a memory budget, yields the Compressed Routing Tables]
GOAL: Maximize adaptivity
[Palesi, et al., TPDS'09]
[Palesi, et al., SAMOS'06]
APSRA Performance (1/2)
[Figures: results for Transpose 1 and MMS traffic. Series: XY, Odd-Even (sel=random), APSRA (sel=random), Odd-Even (sel=buffer level), APSRA (sel=buffer level)]
APSRA Performance (2/2)
Max. pir (packets/cycle/IP) and APSRA improvement
Traffic scenario | XY | OE | APSRA | APSRA vs. XY | APSRA vs. OE |
Random | 0.012 | 0.011 | 0.012 | 0.0% | 14.3% |
Locality | 0.019 | 0.020 | 0.021 | 10.5% | 5.0% |
Transpose 1 | 0.011 | 0.015 | 0.027 | 145.5% | 80.0% |
Transpose 2 | 0.011 | 0.016 | 0.027 | 145.5% | 68.8% |
Hotspot4c | 0.003 | 0.004 | 0.004 | 13.6% | 7.1% |
Hotspot4tr | 0.003 | 0.003 | 0.004 | 29.6% | 12.9% |
Hotspot8r | 0.004 | 0.006 | 0.007 | 71.8% | 13.6% |
Mms | 0.017 | 0.017 | 0.020 | 12.6% | 12.6% |
Average improvement | | | | 53.6% | 26.8% |
[Figure: throughput vs. pir, with the saturation pir marked]
Concurrent Mapping and Routing
DSE for NoC Architectures [Ogras et al., ASAP'05]
[Figure: design quality and design effort against increased customization level and flexibility]
The Mapping Problem
[Diagram: Application (concurrent apps.) → graph of concurrent tasks (T1, ..., Tm) → IP library (ASIC1, CPU1, DSP1, CPU2, MEM1) → mapping onto the tiles of the NoC]
The application is divided into a graph of concurrent tasks
The application tasks are assigned and scheduled
Mapping (NP-hard): decide to which tile each selected IP should be mapped such that the metrics of interest are optimized
Impact of Mapping on Performance
A/V multimedia system
➔ Mapped on 16 IPs
Average packet latency of 3000 random mappings
➔ Results shown for the top 478 mappings
➔ The remaining 2522 mappings have latency much higher than 200 clock cycles
Problem Formulation
Given
➔ An application (or a set of concurrent applications) already mapped and scheduled onto a set of IPs
➔ A network topology
Find the best mapping and the best routing function which
➔ Maximize performance (minimize the mapping coefficient)
➔ Maximize fault tolerance (maximize the robustness index)
Such that
➔ The aggregated communications assigned to any channel do not exceed its capacity
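The capacity constraint of this formulation can be checked mechanically once a mapping and a routing function are fixed. The sketch below uses XY routing on a mesh purely as a stand-in routing function; the IP names come from the mapping slides, while the tile coordinates, bandwidth figures, capacity value and function names are hypothetical.

```python
from collections import defaultdict


def xy_channels(src, dst):
    """Channels (pairs of tiles) crossed by XY routing between two (x, y) tiles."""
    x, y = src
    channels = []
    while (x, y) != dst:
        if x != dst[0]:
            nxt = (x + (1 if dst[0] > x else -1), y)
        else:
            nxt = (x, y + (1 if dst[1] > y else -1))
        channels.append(((x, y), nxt))
        x, y = nxt
    return channels


def channel_loads(mapping, flows):
    """Aggregate the bandwidth of every flow over the channels it crosses."""
    loads = defaultdict(float)
    for (src_ip, dst_ip), bandwidth in flows.items():
        for channel in xy_channels(mapping[src_ip], mapping[dst_ip]):
            loads[channel] += bandwidth
    return loads


def is_feasible(mapping, flows, capacity):
    """True if no channel carries more than its capacity."""
    return all(load <= capacity for load in channel_loads(mapping, flows).values())


# Hypothetical bandwidth requirements between IPs.
flows = {("CPU1", "MEM1"): 60.0, ("DSP1", "MEM1"): 50.0}
mapping_a = {"CPU1": (0, 0), "DSP1": (1, 0), "MEM1": (1, 1)}
mapping_b = {"CPU1": (0, 0), "DSP1": (1, 1), "MEM1": (1, 0)}
print(is_feasible(mapping_a, flows, capacity=100.0))
print(is_feasible(mapping_b, flows, capacity=100.0))
```

In `mapping_a` both flows share the channel into MEM1, so the same application can be feasible or infeasible depending only on where the IPs are placed.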
Robustness Index
An extension of the concept of path diversity
$RI(c) = \frac{1}{|L_c|} \sum_{l \in L_c} |P_c \setminus P_{c,l}|$
[Figure, first example: paths from s to d] A single link fault does not compromise the communication from s to d
$R_{sd} = \frac{1+1+1+1+1+1+1+1}{8} = 1$
[Figure, second example: paths from s to d through links l' and l''] A single link fault in either l' or l'' makes the communication from s to d impossible
$R_{sd} = \frac{0+1+1+1+1+0}{6} = 0.67$
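The two numerical examples can be reproduced with a few lines of code. The sketch reads the formula as follows, which is an interpretation rather than something the slide states: $L_c$ is the set of links used by the paths of communication c, $P_c$ is its set of paths, and $P_{c,l}$ is the subset of paths that cross link l. The link names and the two path sets are hypothetical, chosen to match the 8-link and 6-link cases.

```python
def robustness_index(paths):
    """Average, over the links in use, of the number of paths that survive
    a fault on that link. Each path is a list of link identifiers."""
    links = set().union(*paths)  # L_c
    surviving = sum(sum(1 for p in paths if link not in p) for link in links)
    return surviving / len(links)


# Two link-disjoint paths of four links each: any single fault leaves one path.
disjoint = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]]

# Two paths sharing their first and last link (l' and l''): a fault there is fatal.
shared = [["l'", "a1", "a2", "l''"], ["l'", "b1", "b2", "l''"]]

print(robustness_index(disjoint))
print(round(robustness_index(shared), 2))
```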
Armament to Deal with the Mapping Problem
Characterization of NoC resources
➔ As the first step to develop a mapping technique for NoCs
Find a model of the communication cost
➔ Correlated with the performance metrics
➔ Which does not require expensive simulations
Model of Communication Cost
Define a metric that does not depend on the traffic pattern
➔ Based exclusively on the inter-node distance
✔ Topology
✔ Routing algorithm
Equivalent distance
➔ Analogy to the electrical equivalent resistance
[Figure: source S and destination D nodes]
Multi Media System
Source: Hu and Marculescu, TCAD 24(4), 2005
8x8 mesh-based NoC
8-flit packets
3-flit buffers
SS packet injection distribution
50 simulations per point
Simulation Results
Correlation Index for S1
Correlation Index for S8
Correlation Index for S10
Correlation with the Mean of the Avg Latencies
Correlation Coefficients
Cyclic Dependency
[Diagram: the application (tasks T1, ..., Tn) and the network topology are the inputs; the mapping depends on the routing function, and the routing function depends on the mapping]
Design Space Exploration Flow
Experiments: Pareto front
MMS Traffic
[Figure legend: mono-objective mapping with customized routing algorithm; mono-objective mapping with XY routing algorithm; multi-objective Pareto mapping with customized routing algorithm]
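A Pareto front over the two objectives of the problem formulation (minimize the mapping coefficient, maximize the robustness index) can be extracted with a simple dominance filter. The sketch below is generic and is not the exploration algorithm used in the experiments; the candidate values are made up for illustration.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives and
    strictly better on one. Candidates are (mapping_coefficient,
    robustness_index) tuples: the first is minimized, the second maximized."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    better = a[0] < b[0] or a[1] > b[1]
    return no_worse and better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (mapping coefficient, robustness index) pairs.
candidates = [(1.0, 0.5), (2.0, 0.9), (1.5, 0.4), (3.0, 0.9)]
print(pareto_front(candidates))
```

A mono-objective search returns a single point of this set, whereas the multi-objective exploration returns the whole trade-off curve.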
Experiments: Dead Comms
MMS Traffic
[Figure: percentage of dead communications vs. percentage of faulty links (0.5%, 1.0%, 2.0%, 3.0%, 4.0%, 5.0%, 10.0%, 15.0%). Series: MCBMAPXY, MCBMAPAS, MOGARMAPMC, MOGARMAPRI]
Experiments: Delay
Experiments: Summary Improvement
[Figure: saturation pir improvement (MCBMAPXY as baseline) for Uniform, Hotspot, MMS and Average. Series: MCBMAPAS, MOGARMAPRI, MOGARMAPMC]
[Figure: average delay reduction (MCBMAPXY as baseline), same scenarios and series]
Dealing with Manufacturing Defects
Introduction and Motivations
Initial yield of a complex SoC is very small
➔ Yield goes down with the size and complexity of the chip
➔ Dealing with the reduction of yield due to manufacturing defects
✔ Isolate faulty blocks of the chip
✔ Use the chip as a "depowered" chip (e.g., Sun UltraSPARC T1)
On-chip communication system
➔ Represents the heart of the system
➔ Takes up a sizable share of the system silicon area
✔ E.g., 20% of the silicon area in Intel's TeraScale 80-core chip
✔ High probability of being affected by manufacturing defects
Managing Faulty Links
[Figure: network with source S, destination D and faulty links]
Faulty links elimination
➔ The routing function is computed on the network from which all fully faulty and partially faulty links have been removed
[Figure: router with TFM_TX (flit splitter) and TFM_RX (flit assembler); a flit is split into HFlit and LFlit]
Partially faulty links usage
➔ The routing function is computed on the network from which all partially faulty links with a fault degree greater than a certain threshold have been removed
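Both policies amount to filtering the link set before the routing function is computed. A minimal sketch, assuming each link carries a fault degree (FD) between 0 (fault free) and 1 (fully faulty); the link names, FD values and function name are hypothetical.

```python
def usable_links(fault_degree, threshold):
    """Links kept when computing the routing function.

    fault_degree maps each link to its FD in [0, 1].
    threshold = 0 reproduces faulty links elimination (only fault-free
    links survive); a larger threshold keeps mildly damaged links too.
    """
    return {link for link, fd in fault_degree.items() if fd <= threshold}


fd = {"l12": 0.0, "l23": 0.25, "l36": 1.0}
print(sorted(usable_links(fd, threshold=0.0)))
print(sorted(usable_links(fd, threshold=0.5)))
```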
Two Questions
How are routing paths determined?
➔ Routing function
When should a partially faulty link be used?
➔ Selection function
[Diagram: the header (H) of a packet carries the source and destination addresses; the Routing Function returns the admissible output channels; the Selection Function uses network conditions information to pick the output channel]
Routing Function
[Diagram: same flow as on the APSRA Methodology slide: the Communication Graph of the application to be mapped, the Mapping Function and the Network Topology feed APSRA, which produces Routing Tables that are compressed under a memory budget]
GOAL: Maximize adaptivity
[Palesi, et al., CODES+ISSS'06]
[Palesi, et al., SAMOS'06]
Selection Function
Faulty Links elimination (FE)
Partially Faulty Links usage strategy (FU)
Partially Faulty Links usage with Lookahead strategy (FUL)
Load Balancing based strategies (LB)
➔FE+LB
➔FU+LB
➔FUL+LB
Partially Faulty Links usage Strategy
In low traffic conditions
➔ Only fault-free links are chosen, if available. Otherwise, links with the lowest FD are chosen
In high traffic conditions
➔ A link is selected with a probability inversely proportional to its FD
$Pr_F(l_i) = \frac{1 - FD(l_i)}{\sum_{l_j \in L_{ao}} (1 - FD(l_j))}$
[Figure: a link of 16 lines with 4 faulty lines has FD = 4/16]
Detecting Traffic Conditions
Hu and Marculescu, DyAD: Smart Routing for Networks-on-Chip, DAC 2004
[Figure: router with the quantities S_N, S_E, S_S, S_W for the four directions; label: free slots]
If max{S_N, S_E, S_S, S_W} > T
  High traffic conditions
else
  Low traffic conditions
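The FU selection function and the traffic test can be put together in a short sketch. Two points are assumptions rather than statements from the slides: the high-traffic weights are taken as (1 - FD) normalized over the admissible outputs so that they sum to one, and ties in low traffic are broken at random. At least one admissible link is assumed not to be fully faulty.

```python
import random


def high_traffic(s_n, s_e, s_s, s_w, threshold):
    """Threshold test on the four per-direction quantities S_N, S_E, S_S, S_W."""
    return max(s_n, s_e, s_s, s_w) > threshold


def fu_probabilities(fd):
    """Selection probabilities for the admissible outputs (dict link -> FD)."""
    total = sum(1.0 - d for d in fd.values())
    return {link: (1.0 - d) / total for link, d in fd.items()}


def fu_select(fd, is_high_traffic, rng=random):
    """Pick an output link among the admissible ones."""
    if not is_high_traffic:
        # Fault-free links have FD = 0, so taking the lowest FD covers both
        # "fault free if available" and "otherwise lowest FD".
        lowest = min(fd.values())
        return rng.choice(sorted(l for l, d in fd.items() if d == lowest))
    probs = fu_probabilities(fd)
    links = sorted(probs)
    return rng.choices(links, weights=[probs[l] for l in links])[0]


admissible = {"E": 0.0, "S": 4 / 16}  # FD = 4/16 as in the slide example
print(fu_probabilities(admissible))
print(fu_select(admissible, high_traffic(1, 2, 5, 0, threshold=4)))
```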
Limitation of FU strategy
The selection function used in FU does not take the entire path into consideration
➔ It takes the decision based solely on the quality of the next link
➔ This can cause inefficiencies
✔ E.g., although the next link is of high quality (e.g., fault free), the rest of the path(s) on which the message will be obliged to travel may be formed by many low-quality links
FU with Lookahead Strategy (FUL)
[Figure: paths from node n_s to node n_d over links l_i, l_j, l_k, l_l, l_m, l_n with fault degrees FD(l_i), FD(l_j), FD(l_k), FD(l_l), FD(l_m), FD(l_n)]
Routing table entry: dst | AOs (N E S W) | ERM (N E S W) |, e.g. AOs = 0 1 1 0, ERM = 0 1 0 0
➔ An ERM bit (here E) is set if the equivalent resistance of the paths having that link as first link and ending at the destination node is minimum
In low traffic conditions
➔ One of the AO links l_i such that the i-th bit of ERM is set is randomly chosen
✔ If multiple fault-free paths exist, they are always used
In high traffic conditions
➔ The same as FU
Load Balancing Technique
Determine the best set of selection probabilities, i.e. the one that distributes the traffic optimally over the network
[Figure: for a given router and a given destination d, the Selection Function assigns the selection probabilities Pr(l_E), Pr(l_S), Pr(l_SE) to the set of admissible outputs, among the links l_N, l_NE, l_E, l_SE, l_S]
Load Balancing Technique
Traffic Function TF(l, X)
➔ Returns an estimation of the traffic load on network link l when the set of selection probabilities X is used
Goal
$\min_X \operatorname{var}_{l \in L} TF(l, X)$
➔ Such that X is a feasible set of selection probabilities ($x_i \in [0, 1]$, $\sum_{l_o} x_{l_i, d, l_o} = 1$)
➔ Sequential Quadratic Programming optimization
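The objective can be illustrated on a toy case with a single router and two admissible outputs, E and S. The sketch swaps Sequential Quadratic Programming for a plain grid search over one probability, which is enough for a one-variable problem but is not the method named on the slide. The traffic function, the background loads and the injection rate are all hypothetical.

```python
from statistics import pvariance


def link_loads(x, rate, background):
    """Toy traffic function: loads on outputs E and S when a fraction x of
    the traffic (rate) is sent on E and the remaining 1 - x on S."""
    return [background["E"] + x * rate, background["S"] + (1.0 - x) * rate]


def best_split(rate, background, steps=100):
    """Grid search for the probability x in [0, 1] that minimizes the
    variance of the link loads (stand-in for the SQP optimization)."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda x: pvariance(link_loads(x, rate, background)))


background = {"E": 0.3, "S": 0.1}  # load already present on each output
x = best_split(rate=0.4, background=background)
print(x, link_loads(x, 0.4, background))
```

The feasibility constraints are satisfied by construction here: x and 1 - x both lie in [0, 1] and sum to one.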
Load Balancing based Strategies
Fault Elimination + LB (FE+LB)
➔ Selection function based on the selection probabilities computed by solving the LB problem (Pr(LB))
Partially Faulty Links usage strategy + LB (FU+LB)
➔ In low traffic conditions: selection function uses Pr(LB)
➔ In high traffic conditions: Pr(FLB) = Pr(LB) Pr(F)
Partially Faulty Links usage with Lookahead strategy + LB (FUL+LB)
➔ In low traffic conditions: selection function uses Pr(LB) between AOs where ERM is set
➔ In high traffic conditions: like FU+LB
Entry in Routing Table
Routing table entries, in order of increasing size:
FE: dst | AOs |
FU: dst | AOs | Pr(F) |
FE+LB: dst | AOs | Pr(LB) |
FUL: dst | AOs | Pr(F) | ERM |
FU+LB: dst | AOs | Pr(LB) | Pr(FLB) |
FUL+LB: dst | AOs | Pr(LB) | Pr(FLB) | ERM |
Router Architecture (FUL+LB)
[Figures: router architecture; labels FE, FE+LB (2 bits), FE+LB (3 bits), FE+LB (continuous)]
Implication on Router Design
[Figures: area breakdown and power breakdown for FE, FU, FUL, FE+LB, FU+LB, FUL+LB. Components: TFM RX, TFM TX, CTRL, ARBITER, ROUTING TABLE, SELECTOR, CROSSBAR, FIFO]
[Figure: area, timing and power for FE, FU, FUL, FE+LB, FU+LB, FUL+LB]
Performance Evaluation
Simulation platform: Noxim
Traffic scenarios: uniform, transpose, bit-reversal, butterfly, shuffle
8x8 mesh-based NoC architecture
➔ Buffers: 4 flits
➔ Packet size: random, 2-16 flits
➔ Poisson packet injection distribution
➔ Simulation time: 100,000 clock cycles (warm-up session: 10,000 clock cycles)
Delay Variation
[Figure: delay variation under uniform traffic]
Energy Variation
[Figure: energy variation under uniform traffic]
Improvement in Saturation pir
[Figure: increase in saturation pir for Uniform, Transpose, Bit reversal, Butterfly, Shuffle. Series: FU, FUL, FE+LB, FU+LB, FUL+LB]
Improvement in Average Delay
[Figure: reduction in average delay, same scenarios and series]
Reduction in Energy
[Figure: reduction in energy, same scenarios and series]
Delay vs. Percentage of Link Faults
[Figures: faults randomly distributed and faults randomly clustered; series FE and FUL+LB]
Encoding Scheme for Low Power
Motivations
The interconnect system is one of the main elements which characterize the system in terms of both power dissipation and energy consumption
➔ Approx. 28% in Intel's 80-tile TeraFLOPS
As technology shrinks, the power ratio between NoC links and routers increases
➔ Links are becoming more power hungry than routers
General Scheme – bus-based Archs
[Diagram: processor (P) and memory (Mem) connected by a long, high-capacitance bus, with an encoder and a decoder (overhead) at the two ends]
Encoding Schemes
Applied in the context of bus-based architectures
Bus-invert method [Stan and Burleson, TVLSI'95]
Gray code [Su et al., D&T'94]
T0 method [Benini et al., GLSVLSI'97]
Working-zone encoding [Mussoll et al., PDAW'98]
…
Do not take into account coupling effects
➔ Dominant in DSM regime
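For reference, the classic bus-invert method listed above can be sketched as follows. This is the textbook bus-based scheme, not the coupling-aware scheme proposed later in the talk: it counts only self transitions, and the transition of the extra `inv` line itself is ignored for brevity. Function names and the 8-bit width are illustrative.

```python
def bus_invert_encode(prev_lines, data, width=8):
    """Return (lines, inv): the word driven on the bus and the invert flag.

    The data word is sent complemented when that toggles fewer lines than
    sending it as is, i.e. when more than half of the lines would switch.
    """
    mask = (1 << width) - 1
    toggles = bin((prev_lines ^ data) & mask).count("1")
    if toggles > width // 2:
        return (~data) & mask, 1
    return data & mask, 0


def bus_invert_decode(lines, inv, width=8):
    """Recover the original data word from the bus lines and the invert flag."""
    mask = (1 << width) - 1
    return (~lines) & mask if inv else lines & mask


prev = 0x00
lines, inv = bus_invert_encode(prev, 0xFF)
print(hex(lines), inv, hex(bus_invert_decode(lines, inv)))
```

Because the decision looks only at how many lines toggle, it says nothing about what neighbouring lines do, which is the coupling limitation noted on the slide.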
General Scheme – NoC Archs
[Diagram: source processing element PE_s, network interface (NI) with encoder (E), routers R_0, R_1, ..., R_h, network interface (NI) with decoder (D), destination processing element PE_d]
Power Model
$T_c = k_1 T_1 + k_2 T_2 + k_3 T_3 + k_4 T_4$
[Figure: coupling capacitances C_c between adjacent lines]
$P = [T_{0 \to 1} (C_s + C_l) + T_c C_c] V_{dd}^2 F_{CK}$
Proposed Scheme (1/2)
Invert if P > P'
$P \propto T_{0 \to 1} C_s + (k_1 T_1 + k_2 T_2 + k_3 T_3 + k_4 T_4) C_c$
$P' \propto T'_{0 \to 1} C_s + (k_1 T'_1 + k_2 T'_2 + k_3 T'_3 + k_4 T'_4) C_c$
Proposed Scheme (2/2)
Invert if P > P'
$T_{0 \to 1} + 8 T_2 > T_{0 \to 0} + 8 T_4^{**}$
Encoder and Decoder
Partitions
[Figure: a 32-bit data word encoded by one 32-bit encoder (Data out, inv), by two 16-bit encoders (Data out[0..15] with inv0, Data out[16..31] with inv1), or by four 8-bit encoders (Data out[0..7] with inv0, Data out[8..15] with inv1, Data out[16..23] with inv2, Data out[24..31] with inv3)]
Overhead
[Figure: percent overhead in area, delay and power for BI8, BI16, BI32, CDBI8, CDBI16, CDBI32, SC8, SC16, SC32]
Power and Energy Saving
[Figures: percent power saving and percent energy saving for the data streams Text, PDF, Pic BW bmp, Pic C bmp, Pic BW jpg, Pic C jpg, Music, Video. Series: BI32, CDBI32, SC32, BI16, CDBI16, SC16, BI8, CDBI8, SC8]
Saving vs. Path Length
[Figures: power saving and energy saving vs. number of hops (1 to 16). Series: BI32, CDBI32, SC32, Scs32, BI16, CDBI16, SC16, Scs16, BI8]
113University of Nevada, Las Vegas July 30th, 2009
References (1/3)
M. Palesi, R. Holsmark, S. Kumar, V. Catania. Application Specific Routing Algorithms for Networks on Chip. IEEE Transactions on Parallel and Distributed Systems, 20(3), pp. 316-330, March 2009.
A. Mejia, M. Palesi, J. Flich, S. Kumar, P. Lopez, R. Holsmark and J. Duato. Region-Based Routing: A Mechanism to Support Efficient Routing Algorithms in NoCs. IEEE Transactions on Very Large Scale Integration Systems, 17(3), pp. 356-369, March 2009.
G. Ascia, V. Catania, M. Palesi, D. Patti. Implementation and Analysis of a New Selection Strategy for Adaptive Routing in Networks-on-Chip. IEEE Transactions on Computers, 57(6), pp. 809-820, June 2008.
R. Holsmark, M. Palesi, S. Kumar. Deadlock free Routing Algorithms for Irregular Mesh Topology NoC Systems with Rectangular Regions. Journal of Systems Architecture, 54/3-4 (2008), pp. 427-440.
D. Bertozzi, S. Kumar, M. Palesi. Networks-on-Chip: Emerging Research Topics and Novel Ideas. VLSI Design, vol. 2007, Article ID 26454, doi:10.1155/2007/26454.
G. Ascia, V. Catania, M. Palesi. A Multi-objective Genetic Approach to Mapping Problem on Network-on-Chip. Journal of Universal Computer Science, 12(4):370-394, 2006.
G. Ascia, V. Catania, M. Palesi. Mapping Cores on Network-on-Chip. International Journal of Computational Intelligence Research (IJCIR), ISSN 0972-9836, 1(1-2):109-126, 2005.
M. Palesi, F. Fazzino, G. Ascia, V. Catania. Data Encoding for Low-Power in Wormhole-Switched Networks-on-Chip. To appear in The 12th Euromicro Conference on Digital System Design (DSD), 27-29 Aug 2009, Patras, Greece.
R. Tornero, V. Sterrantino, M. Palesi, J. M. Orduna. A Multi-objective Strategy for Concurrent Mapping and Routing in Networks on Chip. To appear in The 12th International Workshop on Nature Inspired Distributed Computing held in conjunction with The 23rd IEEE/ACM International Parallel and Distributed Processing Symposium, May 25-28, 2009, Rome, Italy.
R. Holsmark, M. Palesi, S. Kumar, A. Mejia. HiRA: A Methodology for Deadlock Free Routing in Hierarchical Networks on Chip. 3rd ACM/IEEE International Symposium on Networks on Chip, May 10-13, 2009, San Diego, CA.
References (2/3)
D. Frazzetta, G. Dimartino, M. Palesi, S. Kumar, V. Catania. Efficient Application Specific Routing Algorithms for NoC Systems utilizing Partially Faulty Links. 11th EUROMICRO Conference on Digital System Design, Architectures, Methods and Tools, pp. 18-25, Sep. 3-5, 2008, Parma, Italy.
R. Tornero, J. M. Orduna, M. Palesi, J. Duato. A Communication-Aware Topological Mapping Technique for NoCs. International Conference on Parallel and Distributed Computing, pp. 910-919, August 26-29th, 2008, Las Palmas de Gran Canaria, Spain.
M. Palesi, G. Longo, S. Signorino, S. Kumar, R. Holsmark, V. Catania. Design of Bandwidth Aware and Congestion Avoiding Efficient Routing Algorithms for Networks-on-Chip Platforms. IEEE International Symposium on Networks-on-Chip, pp. 97-106, 7th-11th April 2008, Newcastle University, UK.
G. Longo, S. Signorino, M. Palesi, S. Kumar, R. Holsmark, V. Catania. Bandwidth Aware Routing Algorithms for Networks-on-Chip. 2nd Workshop on Interconnection Network Architectures: On-Chip, Multi-Chip. Goteborg, Sweden, January 27, 2008.
R. Tornero, J. M. Orduna, M. Palesi, J. Duato. A Communication-Aware Task Mapping Technique for NoCs. 2nd Workshop on Interconnection Network Architectures: On-Chip, Multi-Chip. Goteborg, Sweden, January 27, 2008.
M. Palesi, S. Kumar, R. Holsmark, V. Catania. Exploiting Communication Concurrency for Efficient Deadlock Free Routing in Reconfigurable NoC Platforms. IEEE International Parallel and Distributed Processing Symposium, pp. 1-8, Long Beach, CA, March 2007.
G. Ascia, V. Catania, M. Palesi, D. Patti. Neighbors-on-Path: A New Selection Strategy for On-Chip Networks. Fourth IEEE Workshop on Embedded Systems for Real Time Multimedia, pp. 79-84. Seoul, Korea, October 26-27, 2006.
M. Palesi, R. Holsmark, S. Kumar, V. Catania. A Methodology for Design of Application Specific Deadlock-free Routing Algorithms for NoC Systems. International Conference on Hardware-Software Codesign and System Synthesis, pp. 142-147. Seoul, Korea, October 22-25, 2006.
R. Holsmark, M. Palesi, S. Kumar. Deadlock Free Routing Algorithms for Mesh Topology NoC Systems with Regions. DSD 2006, 9th EUROMICRO Conference on Digital System Design, Architectures, Methods and Tools, pp. 696-703. Croatia, Sept 2006.
References (3/3)
M. Palesi, S. Kumar, R. Holsmark. A Method for Router Table Compression for Application Specific Routing in Mesh Topology NoC Architectures. SAMOS VI Workshop: Embedded Computer Systems: Architectures, Modeling, and Simulation, pp. 373-384. Samos, Greece, July 17-20, 2006.
G. Ascia, V. Catania, M. Palesi, D. Patti. A New Selection Policy for Adaptive Routing in Network on Chip. International Conference on Electronics, Hardware, Wireless and Optical Communications. Madrid, Spain, February 15-17, 2006.
G. Ascia, V. Catania, M. Palesi. An Evolutionary Approach to Network-on-Chip Mapping Problem. IEEE Congress on Evolutionary Computation. Edinburgh, UK, September 2nd-5th, 2005.
G. Ascia, V. Catania, M. Palesi. Multi-objective Mapping for Mesh-based NoC Architectures. In Second IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, pages 182-187, Stockholm, Sweden, Sept. 8-10, 2004.