University of Rostock Institute of Applied Microelectronics and Computer Engineering
Monitoring and Control of Temperature in Networks-on-Chip
Tim Wegner, Claas Cornelius, Andreas Tockhorn, Dirk Timmermann
MEMICS 2010, Mikulov, Czech Republic, October 22-24
Tim Wegner - 23 October 2010, MEMICS 2010, Mikulov, Czech Republic, October 22-24

Outline
1. Introduction
2. Networks-on-Chip (NoCs)
3. Impact of Temperature on Reliability
4. Monitoring & Control of Temperature in NoCs
5. Summary
1. Introduction
Impacts of technological development:
• Increasing integration density → rising complexity, shrinking device sizes
• NoCs are able to deal with the arising requirements (e.g. for communication)
• But: reliability becomes a dominant factor in chip design
Goal: increase reliability in NoC-based systems
[Figure: transistor count over time, illustrated by the IBM 704 mainframe (1954), the IBM PC 5150 (1981) and the Apple iPhone (2007)]
2. Networks-on-Chip
Properties:
• Infrastructure for on-chip interconnection
• Point-to-point links replace long global buses
• Parallel packet-based communication
• Separation of communication & computation
• Globally asynchronous, locally synchronous (GALS)
• Modularity of IP cores (not part of the actual NoC) → reusability, high abstraction level
→ NoCs are able to satisfy the requirements of modern VLSI systems
[Figure: four IP cores, each attached to a router (R), in clock domains CLK0 to CLK3]
3. Impact of Temperature on Reliability
Impacts of technological progress:
• Increasing integration densities, progress of nanotechnology
• Growing number of transistors per chip = raised probability of failure
• Decreasing structural size of ICs = higher susceptibility to environmental influences & deterioration
Intel 8086 (1978): ≈879 transistors/mm²
Intel Bloomfield (2008): ≈2.78 million transistors/mm²
Why is thermal awareness important?
• Particular physical effects (e.g. TDDB, EM) contribute to deterioration, abetted by high temperatures
• The Arrhenius model establishes the correlation between temperature & failure mechanisms → exponential decrease of IC lifetime with temperature
• Growing influence of the on-chip temperature distribution on lifetime, operability, performance etc.
$$T_{fail} \propto e^{E_a/(k_b T)}$$
(E_a: activation energy, k_b: Boltzmann constant, T: absolute temperature)
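Because the proportionality constant in T_fail ∝ exp(E_a/(k_b T)) is unknown, the model is mainly useful for comparing lifetimes at two temperatures, where the constant cancels. The sketch below does exactly that. The activation energy of 0.7 eV and the temperatures are illustrative assumptions, not values from these slides; real activation energies depend on the failure mechanism (e.g. TDDB or EM).

```python
import math

# Boltzmann constant in eV/K
K_B_EV_PER_K = 8.617333262e-5


def lifetime_ratio(t_cool_c: float, t_hot_c: float, e_a_ev: float) -> float:
    """Return T_fail(t_cool) / T_fail(t_hot) for T_fail ∝ exp(E_a / (k_b * T)).

    The unknown proportionality constant cancels in the ratio. Temperatures
    are given in °C and converted to kelvin before use.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    return math.exp(e_a_ev / K_B_EV_PER_K * (1.0 / t_cool_k - 1.0 / t_hot_k))


if __name__ == "__main__":
    E_A = 0.7  # eV, hypothetical activation energy chosen for illustration only
    for delta in (10, 20, 30):
        ratio = lifetime_ratio(70.0, 70.0 + delta, E_A)
        print(f"+{delta} K above 70 °C: lifetime shorter by a factor of {ratio:.2f}")
```

Under these assumptions, a 10 K rise from 70 °C roughly halves the expected lifetime, which is the exponential behaviour the slide refers to.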
Control of the on-chip temperature distribution → mitigate effects contributing to deterioration & delay the occurrence of failures
4. Monitoring and Control of Temperature for NoCs
Objective:
• Effective mechanisms to monitor & control on-chip temperature
Requirements:
• Integration into existing NoC
• Preservation of modularity & reusability
• Minimum costs (area, frequency)
• Maximum performance of monitoring and control
• Minimum impact on system performance
4.1 Mechanisms for monitoring
Concept: attach physical monitoring probes to every IP core

Event-driven:
• Temperature variation ∆T
• Continuous checking of T_IPC
• |T_IPC,old - T_IPC,new| ≥ ∆T ? → report T_IPC,new
• Area: 66 LUT/FF pairs, frequency: 227 MHz

Time-driven:
• Period of time ∆t
• Report T_IPC,new every ∆t
• Area: 80 LUT/FF pairs, frequency: 338 MHz
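A behavioural sketch of the two probe policies, written in Python rather than hardware description code. It assumes the probe sees one temperature sample per time step, that T_IPC,old is the last reported value, and that the first sample is always reported; the function names and the temperature trace are hypothetical.

```python
from typing import List, Optional, Tuple

Report = Tuple[int, float]  # (time step, reported temperature T_IPC,new)


def event_driven(samples: List[float], delta_t_temp: float) -> List[Report]:
    """Report T_IPC,new whenever |T_IPC,old - T_IPC,new| >= ∆T.

    T_IPC,old is assumed to be the last reported value; the first sample
    is always reported so the control unit has a starting point.
    """
    reports: List[Report] = []
    last_reported: Optional[float] = None
    for step, t_new in enumerate(samples):
        if last_reported is None or abs(last_reported - t_new) >= delta_t_temp:
            reports.append((step, t_new))
            last_reported = t_new
    return reports


def time_driven(samples: List[float], period: int) -> List[Report]:
    """Report T_IPC,new every ∆t (= period) time steps, whether or not it changed."""
    return [(step, t) for step, t in enumerate(samples) if step % period == 0]


if __name__ == "__main__":
    # Hypothetical temperature trace in °C, one sample per time step.
    trace = [60.0, 60.2, 60.1, 61.5, 63.0, 63.1, 63.2, 63.1, 63.0, 63.0]
    print("event-driven:", event_driven(trace, delta_t_temp=1.0))
    print("time-driven: ", time_driven(trace, period=2))
```

On this made-up trace the event-driven probe sends 3 reports and the time-driven probe 5, which illustrates the "no redundant traffic" argument; with a rapidly fluctuating temperature the balance can tip the other way.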
4.2 Mechanisms for control
Central Control Unit (CCU):
• Reception & interpretation of probe packets
• Instructions for Dynamic Frequency Scaling (DFS) to probes (if necessary)
• Area: 507 LUT/FF pairs, frequency: 165 MHz
!!! Not the smartest approach, but sufficient to test functionality !!!
[Figure: NoC with four routers (R) connecting three IP cores and the Central Control Unit (CCU)]
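The slides only state that the CCU interprets probe packets and sends DFS instructions when necessary. The sketch below assumes one simple policy: two temperature thresholds with hysteresis and a fixed list of frequency levels. `ProbePacket`, `CentralControlUnit`, the thresholds and `FREQ_LEVELS_MHZ` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical DFS frequency levels in MHz; the slides do not specify any.
FREQ_LEVELS_MHZ: List[int] = [50, 100, 150, 200]


@dataclass
class ProbePacket:
    probe_id: int
    temperature_c: float


@dataclass
class CentralControlUnit:
    t_high_c: float = 85.0  # assumed threshold: scale frequency down above this
    t_low_c: float = 70.0   # assumed threshold: scale frequency up below this
    # probe id -> index into FREQ_LEVELS_MHZ (probes start at the highest level)
    level: Dict[int, int] = field(default_factory=dict)

    def receive(self, pkt: ProbePacket) -> Optional[int]:
        """Interpret a probe packet; return a new frequency for that probe, or None."""
        top = len(FREQ_LEVELS_MHZ) - 1
        current = self.level.get(pkt.probe_id, top)
        new = current
        if pkt.temperature_c > self.t_high_c and current > 0:
            new = current - 1
        elif pkt.temperature_c < self.t_low_c and current < top:
            new = current + 1
        self.level[pkt.probe_id] = new
        # Only send a DFS instruction when the level actually changes.
        return FREQ_LEVELS_MHZ[new] if new != current else None


if __name__ == "__main__":
    ccu = CentralControlUnit()
    for temp in (80.0, 90.0, 92.0, 75.0, 65.0):
        cmd = ccu.receive(ProbePacket(probe_id=0, temperature_c=temp))
        print(f"{temp:5.1f} °C -> {'no instruction' if cmd is None else f'{cmd} MHz'}")
```

The gap between the two thresholds keeps the CCU from toggling a probe's frequency on every small fluctuation; any real policy would need thresholds derived from the chip's thermal limits.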
4.3 Integration of monitoring
3 approaches with different impact on performance & costs:
• Into IP core
• Router port of IP core
• Extra router port
[Figure: the three integration variants, showing where the probe (P) is attached to the IP core or the router (R)]
Penalties annotated in the figure: area 30.5%, freq. 8.2% | area 7.3%, freq. none (but Mux/Demux) | area none, freq. none
4.4 Impact on system performance

4.5 Performance of monitoring & control
5. Summary
• Implementation of 2 approaches for monitoring on-chip temperature + 3 methods for integration into the NoC
• Investigation of: costs (area, frequency), impact on system performance, performance of monitoring & control
Conclusion:
• Event-driven approach preferable (situational monitoring, better performance, no redundant traffic, lower area costs)
• Integration into the NoC using the router port of the IP core offers the best trade-off between costs & preservation of modularity/non-intrusiveness
Thanks for your attention! Any questions?
www.networks-on-chip.com
University of Rostock, Germany
Institute of Applied Microelectronics and Computer Engineering
Arrhenius Model
• Establishes the relationship between temperature and failure mechanisms
• Describes the dependence of chemical reactions on temperature changes
• Assumption: all other parameters constant
$$T_{fail} \propto e^{E_a/(k_b T)}$$
[Figure: T_fail plotted against temperature]
Lifetime of ICs decreases exponentially with temperature
Time Dependent Dielectric Breakdown (TDDB)
• Inoperability of a transistor through gate oxide breakdown (long-term)
Electromigration (EM)
• Transport of material in conductors (i.e. wires)
• Cause: ion movement induced by current flow (the ions' mobility increases with temperature)
• Effects:
  • Hillocks → short circuits
  • Voids → interruption of current paths
Intel Processors
Processor          Year   Transistors   Die area   Density
Intel 8086         1978   29k           33 mm²     879 transistors/mm²
Intel Bloomfield   2008   731 million   263 mm²    2,779,467 transistors/mm²
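The densities above are transistor count divided by die area. A quick check using only the figures from the Intel processor comparison:

```python
from typing import Dict, Tuple

# (transistor count, die area in mm²), copied from the Intel processor comparison
PROCESSORS: Dict[str, Tuple[int, float]] = {
    "Intel 8086 (1978)": (29_000, 33.0),
    "Intel Bloomfield (2008)": (731_000_000, 263.0),
}


def density(transistors: int, area_mm2: float) -> float:
    """Transistors per mm² of die area."""
    return transistors / area_mm2


if __name__ == "__main__":
    for name, (n, area) in PROCESSORS.items():
        print(f"{name}: {density(n, area):,.0f} transistors/mm²")
    old = density(*PROCESSORS["Intel 8086 (1978)"])
    new = density(*PROCESSORS["Intel Bloomfield (2008)"])
    print(f"growth factor: {new / old:,.0f}x")
```

Rounded, the Bloomfield density is 2,779,468 transistors/mm² (2,779,467 when truncated to an integer), roughly 3,163 times the density of the 8086.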
Impact on system performance

Performance of monitoring & control
Synthesis results for monitoring & control

Component               Frequency [MHz]   Area [LUT/FF pairs]
Event-driven probe      227               66
Time-driven probe       338               80
Central Control Unit    165               507

Integration method      Frequency [MHz]   Area [LUT/FF pairs]
Into IP core            122               1901
Using IP core port      119               1896
Extra port              112               2312

Unmodified NoC router: 1771 LUT/FF pairs, 122 MHz
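Relative area and frequency penalties of the integration methods follow from the synthesis results by comparing each variant with the unmodified router (1771 LUT/FF pairs, 122 MHz). A small script with the numbers copied from the table:

```python
from typing import Dict, Tuple

# (frequency in MHz, area in LUT/FF pairs), copied from the synthesis results
BASELINE: Tuple[float, float] = (122, 1771)  # unmodified NoC router
VARIANTS: Dict[str, Tuple[float, float]] = {
    "Into IP core": (122, 1901),
    "Using IP core port": (119, 1896),
    "Extra port": (112, 2312),
}


def penalties(freq_mhz: float, area: float,
              base: Tuple[float, float] = BASELINE) -> Tuple[float, float]:
    """Return (area penalty %, frequency penalty %) relative to the baseline router."""
    base_freq, base_area = base
    area_penalty = 100.0 * (area - base_area) / base_area
    freq_penalty = 100.0 * (base_freq - freq_mhz) / base_freq
    return area_penalty, freq_penalty


if __name__ == "__main__":
    for name, (freq, area) in VARIANTS.items():
        a, f = penalties(freq, area)
        print(f"{name:20s} area +{a:4.1f}%   frequency -{f:3.1f}%")
```

For the extra port this yields about 30.5% area and 8.2% frequency penalty, and for integration into the IP core about 7.3% area with no frequency loss, the same figures annotated on the integration slide.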