Exploring Wireless Technology for Off-Chip Memory Access
Ashif Sikder†, Avinash Kodi†, Dominic DiTomaso†, Savas Kaya†, William Rayess‡, and David Matolak‡
†School of Electrical Engineering and Computer Science, Ohio University
‡Department of Electrical Engineering, University of South Carolina
E-mail: [email protected], [email protected], [email protected], [email protected]
Website: http://oucsace.cs.ohiou.edu/~avinashk/
24th Annual Symposium on High-Performance Interconnects (HOTI), August 24–26, 2016
Huawei North America Headquarters, Santa Clara, CA, USA
Talk Outline
• Motivation & Background
• Wireless Architectures & Communication Mechanism
• Performance Analysis
• Conclusions & Future Work
Energy in Multi-Core Processor
[Figure: Interconnect energy broken down by router, memory controller, and link, with dynamic and static components; labeled shares of 36% and 40%.]
=> Data movement energy will start to dominate
=> Potential solution: Emerging technologies such as wireless
S. Borkar, “Exascale computing-a fact or affliction?” Keynote presentation at IPDPS, 2013.
Latency in Multi-Core Processor
[Figure: NoC-based multicore with memory controllers MC0–MC3; each node contains a router, a core with L1, and an L2 bank. The off-chip memory module (DRAM) holds Banks 0–2. A request message and a response message trace one off-chip memory access.]
Steps of an off-chip memory access:
1. L1-L2 (request message)
2. L2-Mem (request message)
3. Mem
4. Mem-L2 (response message)
5. L2-L1 (response message)
A. Sharifi, E. Kultursay, M. Kandemir, and C. R. Das, “Addressing End-to-End Memory Access Latency in NoC-Based Multicores,” in 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Dec. 2012, pp. 294–304.
Off-Chip Memory Access Limitations
• Increased Energy and Latency: Multiple on-chip hops are needed to reach off-chip memory
• Hot-Spot: Corner routers are connected to the memory controllers (MCs), so traffic concentrates there
• Connection Inflexibility: How to connect distant routers to a memory controller (MC)?
• Long Trace Length: Increased off-chip memory access latency and energy cost
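To make the multi-hop cost concrete, here is a minimal sketch of on-chip hop counts. It assumes a 4x4 mesh with routers R0 to R15 numbered row by row, dimension-ordered (XY) routing, and memory controllers attached at the corner routers R0, R3, R12, and R15. The function names and the corner assignment are illustrative assumptions, not details taken from the talk.

```python
MESH_WIDTH = 4                   # assumed 4x4 mesh, routers numbered row by row
CORNER_ROUTERS = [0, 3, 12, 15]  # assumed attachment points of the four MCs


def coords(router_id):
    """Return (row, col) of a router in the row-major 4x4 mesh."""
    return divmod(router_id, MESH_WIDTH)


def hops(src, dst):
    """Router-to-router link traversals under XY routing (Manhattan distance)."""
    (r1, c1), (r2, c2) = coords(src), coords(dst)
    return abs(r1 - r2) + abs(c1 - c2)


# Worst case: the addressed MC sits at the corner farthest from the requester.
worst_one_way = max(hops(r, mc) for r in range(16) for mc in CORNER_ROUTERS)
worst_round_trip = 2 * worst_one_way  # request plus response

print(worst_one_way, worst_round_trip)
```

Under these assumptions the worst-case one-way distance is 6 hops, so a request and its response can cross 12 on-chip hops. That is consistent with the 12-hop annotation on the M-X-X-X communication slide, although the talk does not state how that figure is counted.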
[Figure: Memory controllers MC0–MC3 connected to DRAM chips; the trace length between a memory controller and the DRAM chips is annotated.]
=> Potential solution: Emerging technologies such as wireless
Wireless Interconnect
• Wireless offers several advantages:
  • CMOS compatibility
  • Omnidirectional communication without wires using multicasting and broadcasting
  • Bandwidth extension using Frequency Division Multiplexing (FDM), Time Division Multiplexing (TDM), and Space Division Multiplexing (SDM)
• Disadvantages of wireless:
  • High transceiver area and energy/bit
  • Low wireless bandwidth at 60 GHz center frequency for CMOS technology
  • Latency due to resource sharing
[Figure: RF-CMOS transceiver trend for WiNoC [1]; illustrations of multicasting and broadcasting.]
1. D. DiTomaso, A. Kodi, D. Matolak, S. Kaya, S. Laha, and W. Rayess, “Energy-efficient adaptive wireless NoCs architecture,” in Networks on Chip (NoCS), 2013 Seventh IEEE/ACM International Symposium on. IEEE, 2013, pp. 1–8.
Naming Convention
• (On-chip)-(Off-chip)-(Antenna Type)-(Bandwidth)
• M = Metallic (on-chip or off-chip)
• W = Wireless (on-chip or off-chip)
• O = Omnidirectional Antenna
• D = Directional Antenna
• C = Conservative wireless bandwidth (128 Gbps)
• A = Aggressive wireless bandwidth (512 Gbps)
• For example, M-W-O-A has metallic on-chip and wireless off-chip links with omnidirectional antenna and total wireless bandwidth of 512 Gbps
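The naming convention can be decoded mechanically. The sketch below is illustrative only: the `Architecture` class and `parse_architecture` function are hypothetical names, and treating `X` as "not applicable or any" is an assumption based on labels such as M-M-X-X, since the slides do not define it.

```python
from dataclasses import dataclass

# Letter codes from the naming convention; "X" -> None is an assumption.
LINK = {"M": "metallic", "W": "wireless", "X": None}
ANTENNA = {"O": "omnidirectional", "D": "directional", "X": None}
WIRELESS_BW_GBPS = {"C": 128, "A": 512, "X": None}


@dataclass(frozen=True)
class Architecture:
    on_chip: object           # "metallic", "wireless", or None
    off_chip: object          # "metallic", "wireless", or None
    antenna: object           # "omnidirectional", "directional", or None
    wireless_bw_gbps: object  # 128, 512, or None


def parse_architecture(name):
    """Decode a label of the form (On-chip)-(Off-chip)-(Antenna)-(Bandwidth)."""
    on_chip, off_chip, antenna, bandwidth = name.split("-")
    return Architecture(
        LINK[on_chip],
        LINK[off_chip],
        ANTENNA[antenna],
        WIRELESS_BW_GBPS[bandwidth],
    )


print(parse_architecture("M-W-O-A"))
```

For M-W-O-A this yields metallic on-chip links, wireless off-chip links, an omnidirectional antenna, and 512 Gbps of total wireless bandwidth, matching the example on the slide.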
M-M-X-X Network: 16 Core
[Figure: 4×4 mesh of routers R0–R15 with memory controllers MC0–MC3 connected to four DRAM modules; annotated dimensions: 5 mm, 10 mm, and 50.8 mm.]
W-M/W-X-X Network: 16 Core
[Figure: 4×4 mesh of routers R0–R15 with memory controllers MC0–MC3 and four DRAM modules.]
Communication Mechanism: M-X-X-X
[Figure: Memory access path through the 4×4 mesh (routers R0–R15, memory controllers MC0–MC3, DRAM modules); annotated as 12 hops.]
Communication Mechanism: W-X-O-X
[Figure: Memory access path through the 4×4 mesh (routers R0–R15, memory controllers MC0–MC3, DRAM modules); annotated as 8 hops.]
Communication Mechanism: W-X-D-X
[Figure: Memory access path through the 4×4 mesh (routers R0–R15, memory controllers MC0–MC3, DRAM modules); annotated as 8 hops.]
Summary of Architectures
Architecture | Description
M-M-X-X | Metallic on-chip and metallic off-chip (link BW 128 Gbps)
W-M-O-A | Hybrid wireless on-chip with omnidirectional antenna, metallic off-chip (link BW 128 Gbps), and total wireless BW 512 Gbps
W-M-D-C | Hybrid wireless on-chip with directional antenna, metallic off-chip (link BW 128 Gbps), and total wireless BW 128 Gbps
W-M-D-A | Hybrid wireless on-chip with directional antenna, metallic off-chip (link BW 128 Gbps), and total wireless BW 512 Gbps
M-W-O-A | Metallic on-chip, wireless off-chip (link BW 64 Gbps) with omnidirectional antenna, and total wireless BW 512 Gbps
W-W-D-C | Hybrid wireless on-chip with directional antenna, wireless off-chip (link BW 32 Gbps) with directional antenna, and total wireless BW 128 Gbps employing SDM
W-W-D-A | Hybrid wireless on-chip with directional antenna, wireless off-chip (link BW 128 Gbps) with directional antenna, and total wireless BW 512 Gbps employing SDM
Antenna Model: Physical Structure
Parameter | Dimension (mm)
Helix Diameter | 0.1
Helix Spacing | 0.112
Feed Pin Height | 0.0785
Wire Diameter | 0.095
d_S = 16 mm
d_E = 2 mm
d_D = 16√2 mm
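For a rough feel of how the side-to-side distance d_S and the diagonal distance d_D compare, the sketch below evaluates the standard free-space path loss formula, FSPL = 20·log10(4πdf/c). This is an idealized far-field estimate and an assumption on our part: the talk's loss figures come from its own antenna model, and the 60 GHz operating frequency is taken from the earlier wireless-interconnect slide rather than stated for this antenna.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
FREQ_HZ = 60e9  # assumed operating frequency (60 GHz)


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB between isotropic antennas (Friis)."""
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S
    )


d_side_m = 16e-3                     # d_S = 16 mm
d_diag_m = d_side_m * math.sqrt(2)   # d_D = 16*sqrt(2) mm

loss_side = fspl_db(d_side_m, FREQ_HZ)
loss_diag = fspl_db(d_diag_m, FREQ_HZ)

print(f"side-to-side: {loss_side:.1f} dB")
print(f"diagonal:     {loss_diag:.1f} dB")
print(f"difference:   {loss_diag - loss_side:.2f} dB")
```

In this idealized model the diagonal link loses about 3 dB more than the side-to-side link, because a factor of √2 in distance corresponds to 10·log10(2) dB. Real on-board links at these distances can deviate noticeably from the free-space estimate.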
Antenna Model: Loss Characteristics
[Figure: Three plots: return losses, diagonal link insertion loss, and side-to-side link insertion loss.]
Performance Analysis
– Architectures: Baseline (M-M-X-X), on-chip hybrid-wireless (W-M-X-X), off-chip hybrid-wireless (M-W-X-X), on- and off-chip hybrid-wireless (W-W-X-X)
– Number of cores: 16
– Real benchmarks: PARSEC 2.1 (Blackscholes)&
– Network simulation: Multi2Sim*
– Area and energy analysis
  • DSENT# to calculate metallic link and router area and energy at bulk 45 nm LVT
  • Wireless transceiver area is 0.62 mm²$; energy is 1 pJ/bit$ (on-chip) and 2.54 pJ/bit% (off-chip)
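The wireless transceiver energies above are quoted per bit, while the results later in the talk are reported per byte. A minimal conversion sketch follows; the names are illustrative, and the result covers only the transceiver energy, not router or memory controller energy.

```python
BITS_PER_BYTE = 8

# Transceiver energies quoted in the evaluation setup (pJ/bit).
WIRELESS_PJ_PER_BIT = {"on-chip": 1.0, "off-chip": 2.54}


def pj_per_byte(pj_per_bit):
    """Convert an energy cost from pJ/bit to pJ/byte."""
    return pj_per_bit * BITS_PER_BYTE


for link, energy in WIRELESS_PJ_PER_BIT.items():
    print(f"{link}: {pj_per_byte(energy):.2f} pJ/byte")
```

This gives 8 pJ/byte for an on-chip wireless transfer and 20.32 pJ/byte for an off-chip wireless transfer.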
* R. Ubal, J. Sahuquillo, S. Petit, P. Lopez, Z. Chen, and D. R. Kaeli, “The Multi2Sim simulation framework: A CPU-GPU model for heterogeneous computing,” 2011.
# C. Sun, C.-H. Chen, G. Kurian, L. Wei, J. Miller, A. Agarwal, L.-S. Peh, and V. Stojanovic, “DSENT-a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling,” in Networks on Chip (NoCS), 2012 Sixth IEEE/ACM International Symposium on. IEEE, 2012, pp. 201–210.
$ A. K. Kodi, M. A. I. Sikder, D. DiTomaso, S. Kaya, S. Laha, D. Matolak, and W. Rayess, “Kilo-core wireless network-on-chips (NoCs) architectures,” in Proceedings of the Second Annual International Conference on Nanoscale Computing and Communication. ACM, 2015, p. 33.
& C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC benchmark suite: Characterization and architectural implications,” in Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 2008, pp. 72–81.
% D. DiTomaso, A. Kodi, D. Matolak, S. Kaya, S. Laha, and W. Rayess, “Energy-efficient adaptive wireless NoCs architecture,” in Networks on Chip (NoCS), 2013 Seventh IEEE/ACM International Symposium on. IEEE, 2013, pp. 1–8.
Area Estimate
The on- and off-chip hybrid-wireless architecture, W-W-D-A, requires about 82% higher area compared to the baseline, M-M-X-X.
[Figure: Stacked bar chart of area (mm²) by component (router, MC, metallic link, wireless link) for M-M-X-X, M-W-O-A, W-M-O-A, W-M-D-C, W-M-D-A, W-W-D-C, and W-W-D-A; bar annotations: 82%, 77%, 31%, 6%, 6%, 4%.]
Execution Time
The on- and off-chip hybrid-wireless architecture, W-W-D-A, requires about 10% less execution time compared to the baseline, M-M-X-X.
[Figure: Bar chart of execution time (axis labeled "Second") for W-W-D-A, W-M-O-A, W-M-D-A, W-M-D-C, M-M-X-X, M-W-O-A, and W-W-D-C; bar annotations: 2%, 12%, 2%, 7%, 8%, 10%.]
Energy per Byte: Conservative
[Figure: Stacked bar chart of energy per byte (pJ) by component (M-On, W-On, M-Off, W-Off, Router, MC) for M-W-O-A, W-W-D-C, W-W-D-A, M-M-X-X, W-M-D-C, W-M-D-A, and W-M-O-A.]
Geometric mean:
• X-W-X-X: 22.95 pJ/Byte
• X-M-X-X: 60.51 pJ/Byte
• 62% improvement
Energy per Byte: Aggressive
[Figure: Stacked bar chart of energy per byte (pJ) by component (M-On, W-On, M-Off, W-Off, Router, MC) for M-W-O-A, W-W-D-C, W-W-D-A, M-M-X-X, W-M-D-C, W-M-D-A, and W-M-O-A.]
Geometric mean:
• X-W-X-X: 22.95 pJ/Byte
• X-M-X-X: 35.595 pJ/Byte
• 35% improvement
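The improvement percentages follow directly from the geometric means quoted on the two energy slides. The sketch below recomputes them as a relative reduction; the function and constant names are illustrative.

```python
def reduction_percent(baseline, proposed):
    """Relative reduction of `proposed` versus `baseline`, in percent."""
    return (1.0 - proposed / baseline) * 100.0


# Geometric means of energy per byte quoted on the slides (pJ/Byte).
WIRELESS_OFF_CHIP = 22.95        # X-W-X-X
METALLIC_CONSERVATIVE = 60.51    # X-M-X-X, conservative metallic link energy
METALLIC_AGGRESSIVE = 35.595     # X-M-X-X, aggressive metallic link energy

conservative = reduction_percent(METALLIC_CONSERVATIVE, WIRELESS_OFF_CHIP)
aggressive = reduction_percent(METALLIC_AGGRESSIVE, WIRELESS_OFF_CHIP)

print(f"conservative: {conservative:.1f}%")
print(f"aggressive:   {aggressive:.1f}%")
```

The computed reductions are roughly 62% and 35.5%, in line with the 62% and 35% improvements reported in the talk.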
Conclusions & Future Work
• Explores the potential of wireless technology for off-chip memory access
• On and off chip hybrid-wireless architecture requires 10% less execution time compared to the baseline metallic architecture
• On and off chip hybrid-wireless architecture consumes 62% and 35% less energy per byte compared to the baseline architecture under the conservative and aggressive off-chip metallic link energy assumptions, respectively
• On and off chip hybrid-wireless architecture requires 82% higher area compared to the baseline architecture
• Explore optical interconnect for off-chip memory access
Thank You
Questions?