1
Integrated Systems GroupMassachusetts Institute of Technology
Inside the PlayStation 5:Solving the Chip Bandwidth Challenge
Ben Moss
Integrated Systems Group 2
Moore’s Law
Photo credit: Intel
Inexpensive transistor density doubles roughly every two years
Integrated Systems Group 3
Processor Parallelism
Intel Desktop Processors: 1970 - Now
Pentium
486
4004
Pentium III
2fCVP
• Power Constrained• No power density scaling• No battery energy scaling
Multi-Core
Integrated Systems Group 4
Inside the PlayStation 3: IBM Cell
• How do we make the PS5? Add more cores!
Core
Core
Core
Core
Core
Core
Core
Core
Core
Power Core
L2 Cache
Central Bus
I/O C
ontr
olle
r
Photo Source: IBM
Integrated Systems Group 5
Manycore System Bandwidth Needs“PlayStation 5”
Integrated Systems Group 6
Interconnect Scaling: Not Good
• Wire gets smaller R increases• Wire gets smaller fringe cap dominates• Dense wires high crosstalk• Overall, RC time constant is not improving
Integrated Systems Group 7
Manycore Bandwidth Bottlenecks
CPU
Cache
DR
AM
D
IMM
Manycore systemcores
Cache
DR
AM
D
IMM
Cache
DR
AM
D
IMM
CPU CPU
InterconnectNetwork
InterconnectNetwork
Bottlenecks due to energy and
bandwidth density limitations
Sun Niagara8 GPP cores (32 threads)
Intel®
XScale™
Core32K IC
32K DC
MEv210
MEv211
MEv212
MEv215
MEv214
MEv213
Rbuf64 @ 128B
Tbuf64 @ 128BHas
h48/64/12
8
Scratch16K
B
QDRSRA
M2
QDRSRA
M1
RDRAM1
RDRAM3
RDRAM2
GASKET
PCI
(64b)66 MHz
IXP2IXP2800800 16b16b
16b16b
1188
1188
1188
1188
1188
1188
1188
6464bb
SPI4orCSIX
Stripe
E/D Q
E/D Q
QDRSRA
M3
E/D Q11
881188
MEv29
MEv216
MEv22
MEv23
MEv24
MEv27
MEv26
MEv25
MEv21
MEv28
CSRs-Fast_wr-UART-Timers-GPIO-BootROM/SlowPort
QDRSRA
M4
E/D Q1188
1188
Intel Network Processor1 GPP Core
16 ASPs (128 threads)
IBM Cell1 GPP (2 threads)
8 ASPs
Picochip DSP1 GPP core248 ASPs
Cisco CSR-1188 Tensilica GPPs
Constraints:• Area (Bandwidth Density)• Energy Efficiency• Compatibility with Bulk-CMOS
Integrated Systems Group 8
Silicon Photonicsmonolithic
• Conceived in the 1980’s
• Recently became feasible
• Must be compatible with Bulk-CMOS
• Need a fair electrical comparison
Integrated Systems Group 9
An Optical Link
65 nm bulk CMOS chip designed to test various optical devices
Integrated Systems Group 10
Optical Devices: Waveguides
• Basic Building Block• Poly (transistor gate) layer• ~0.5um wide, 0.1um tall• Must be undercut• Bi-directional• Can turn, bend, cross!• WDM
Integrated Systems Group 11
Post-Processing the Waveguide
Define Etch Hole Selectively Etch Substrate
Silicon Substrate
DielectricPoly Waveguide
Integrated Systems Group 12
Optical Devices: Vertical Couplers• Vital component• Sensitive to fiber position• Also made of poly• Can enable 3D stacking
Integrated Systems Group 13
Optical Devices: Ring Filters
• Drops one λ onto a different waveguide
• Geometry is very sensitive • 1 THz/nm sensitivity on thickness• 30 GHz/nm sensitivity on width• Thermally sensitive
Integrated Systems Group 14
Optical Devices: Ring Filter Bank
-200 0 200 400 600 800 1000
-14
-12
-10
-8
-6
-4
-2
0
Tran
smis
sion
, dB
Frequency, GHzFirst 100 GHz spaced bank in sub-100nm, first in bulk CMOS, first in poly Si- Enables 60-120 wavelengths/waveguide ( >200 Gb/s/um data rate density)
• FSR is ~16nm (2.7THz)• The scans are 1268nm -1276nm• 5nm ring radius step• Sensitivity
• 1THz/nm poly • 30GHz/nm poly W
Orcutt, J. S., A. Khilo, M. A. Popovic, C. W. Holzwarth, B. Moss, H. Li, M. S. Dahlem, T. D. Bonifield, F. X. Kärtner, E. P. Ippen, J. L. Hoyt, R. J. Ram, and V. Stojanovic, “Demonstration of an Electronic Photonic Integrated Circuit in a Commercial Scaled Bulk CMOS Process,” Optical Society of America - CLEO/QELS Conference, May 2008.
Integrated Systems Group 15
Optical Devices - Photodiode
The current signal from the photodiode goes to a receiver
Integrated Photonic Waveguide
…01001101…
Φ ΦΦ
ΦΦ
Latching Data Receiver
Integrated Systems Group 16
Optical Clocking
Φ
*
*
*
Integrated Photonic Waveguide
Off-Chip Laser
Clock Receiver
Optical clocking can reduce clock skew and increase energy efficiency
Integrated Systems Group 17
Optical Devices: Ring Modulators
10 Gb/s
Integrated Systems Group 18
Optical Devices: Ring Modulators
2
1
)0(11)(
o
injectedinjected
TQT
Integrated Systems Group 19
Optical Devices: Ring Modulators
• Thermal crosstalk is high• Temperature changes ring’s refractive index
Integrated Systems Group 20
P-I-N SPICE Model
Antonio G. M. Strollo, A New SPICE Model of Power P-I-N Diode Based on Asymptotic Waveform Evaluation, IEEE TRANSACTIONS ON POWER ELECTRONICS, VOL. 12, NO. 1, pp.12-20, JANUARY 1997.
q0q0
i1
Integrated Systems Group 21
Pre-emphasized Current Profile
with
without
Injected Current into P-I-N Diode
Configurability is necessary
Pre-emphasis Advantages
• Less power
• Controlling Qinjected
• Faster
Injected Current Profiles
Integrated Systems Group 22
Modulator Driver Schematic
Integrated Systems Group 23
Manufacturing Considerations
• Manhattan Geometry• Density requirements• No Optical LVS• 1-5 nm design grid
– Minimizes edge roughness• Thick-BOX SOI incompatible with
processor design (thermal issues)• Post-processing
Integrated Systems Group 24
An Optical Link
65 nm bulk CMOS chip designed to test various optical devices
Integrated Systems Group 25
Full System (Conventional DRAM)
25
Photonics < 7% chip area~40k rings, 200 waveguides, 64λ/waveguide/direction, 40Tbits/sec
NOT TO SCALE
Integrated Systems Group 26
Optical power budgetComponent Preliminary Design Power loss Optimized Design Power loss
Coupler loss 1 dB/coupler 3 dB 1 dB/coupler 3 dBSplitter loss 0.2 dB/split 1 dB 0.2 dB/split 1 dB
Non-linearity 1 dB 1dB 1dB 1dBThrough loss 0.01 dB/ring 3.17 dB 0.01 dB/ring 3.17 dB
Modulator Insertion loss 1 dB 1 dB 0.5 dB 0.5 dBCrossing loss 0.2 dB/crossing 12.8 dB 0.05 dB/crossing 3.2 dB
On-chip waveguide loss 5 dB/cm 20 dB 1 dB/cm 4 dBOff-chip waveguide loss 0.5e-5 dB/cm ~ 0 dB 0.5e-5 dB/cm ~ 0 dB
Drop loss 2.5 dB/drop 5 dB 1.5dB/drop 3 dBPhotodetector loss 0.1 dB 0.1 dB 0.1 dB 0.1 dBReceiver sensitivity -20 dBm -20 dBm -20 dBm -20 dBm
Power per wavelength 26.07 dBm(0.40 W)
-1.03 dBm(0.78 mW)
Power required at source 3.3 kW 6.38 W
Integrated Systems Group 27
Data transmission latency
• Total latency 14 bit times– Less than 4 clock cycles (with
2.5 GHz core clock)• Comparable to Electrical
Component LatencySerializer/Deserializer
(50ps each) 50ps
Modulator driver latency 108psThrough latency
(2.5ps/adjacent channel) 7.5ps
Drop latency (20ps/drop) 60ps
Waveguide latency (106.7ps/cm) 427ps
SM fiber latency(48.3ps/cm) 483ps
Photodetector+TIA latency 200ps
Total latency 1.385ns
Integrated Systems Group 28
Silicon Photonics Area and Energy Advantage
Metric Energy (pJ/b)
Bandwidth density (Gb/s/μm)
Global on-chip photonic link 0.25 160-320
Global on-chip optimally repeated electrical link 1 5
Off-chip photonic link (50 μm coupler pitch) 0.25 13-26
Off-chip electrical SERDES (100 μm pitch) 5 0.1
On-chip/off-chip seamless photonic link 0.25
Integrated Systems Group 29
Thermal Sensitivity/Crosstalk Issues
29
• High thermal crosstalk between adjacent rings• Released substrate can thermally isolate filter banks• Random variation small; systematic variation corrected thermally
NOT TO SCALE
Integrated Systems Group 30
The Problem Spans Many Layers
• Device Level – Jason Orcutt, Milos Popovic, Anatoly Khilo, Charles Holzwarth, Jie Sun, Hanquing Li, Reja Amatya
• Circuits Level – Ben Moss, Michael Georgas, Jonathan Leu
• System Architecture Level – Ajay Joshi, ImranShamim, Chris Batten
Integrated Systems Group 31
Conclusion
• Multi-core machines demand more bandwidth than ever before
• On-chip optical networks are a new technology to widen this bottleneck
• We have demonstrated that on-chip optical networks are possible in standard CMOS processes
• Many tradeoffs exist that must be analyzed at the system, circuit and device level
• Photonic interconnects have over 5x energy efficiency improvement compared to electrical