Post on 26-Jan-2021


  • More-than-Moore with Integrated Silicon-Photonics

    Vladimir Stojanović

    Berkeley Wireless Research Center

    UC Berkeley

    1

  • Acknowledgments

    • Milos Popović (Boulder/BU), Rajeev Ram, Jason Orcutt, Hanqing Li (MIT), Krste Asanović (UC Berkeley)

    • Jeffrey Shainline, Christopher Batten, Ajay Joshi, Anatoly Khilo

    • Mark Wade, Karan Mehta, Jie Sun, Josh Wang

    • Chen Sun, Sen Lin, Sajjad Moazeni, Nandish Mehta, Michael Georgas, Benjamin Moss, Jonathan Leu

    • Yong-Jin Kwon, Scott Beamer, Yunsup Lee, Andrew Waterman, Miquel Planas, Rimas Avizienis, Henry Cook, Huy Vo

    • Roy Meade, Gurtej Sandhu and Micron Fab12 team (Zvi, Ofer, Daniel, Efi, Elad, …)

    • DARPA, Micron, NSF, BWRC

    • IBM Trusted Foundry, Global Foundries

    Slide 2

  • More-than-Moore perspective

    Enhanced CMOS enables new applications

    • 1990: Inductors in IC process (Nguyen & Meyer)

    • 1997: One of the first CMOS radios (Rudell & Gray)

    • 2004: World’s first 60GHz CMOS amplifier (Niknejad & Brodersen)

    • 2012: World’s first silicon-photonic transmitter in 45nm SOI (Stojanović, Popović, Ram)

    Slide 3

  • IBM/GF 12SOI (45nm) CMOS

    [Figure: IBM Cell, IBM Power 7, IBM Espresso]

    • 300mm wafer, commercial process

    • MOSIS and TAPO MPW access

    • Advanced process used in microprocessors

    • Photonic enhancement enables VLSI photonic systems (no required process changes)

  • “Zero-Change” Photonics in 45nm

    Monolithic photonics platform with fastest transistors

    [C. Sun, JSSC 2016]

    • Photonics for free! (No modification to the process)

    • Closest proximity of electronics and photonics

    • Single substrate removal post-processing step

  • Integrated photonic interconnects

    • Each λ carries one bit of data

    • Bandwidth density achieved through DWDM

    • Energy efficiency achieved through low-loss optical components and tight integration

    Slide 6

  • Single channel link tradeoffs

    [Figure: single-channel link tradeoff plots; parameters shown: loss of 10 dB and 15 dB, Rx capacitance of 5 fF and 25 fF]

    Slide 7

  • 512 Gb/s aggregate throughput

    assuming 32nm CMOS

    [Georgas CICC 2011]

    • Laser energy increases with data rate

    – Limited Rx sensitivity

    – Modulation more expensive -> lower extinction ratio

    • Tuning costs decrease with data rate

    • Moderate data rates most energy-efficient

    Need to optimize carefully

    Slide 8
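The trade-off in these bullets can be illustrated with a toy model. This is only a sketch under assumed numbers, not the model from [Georgas CICC 2011]: the tuning power `P_TUNE_MW`, the laser coefficient `LASER_FJ_AT_1GBPS` and the exponent `LASER_EXPONENT` are hypothetical values chosen to show the shape of the curve, where a fixed tuning power is amortized over more bits while laser energy per bit grows with data rate.

```python
# Toy energy-per-bit model for one wavelength channel.
# All three constants are hypothetical and chosen only for illustration.
P_TUNE_MW = 1.0            # static thermal-tuning power per ring (assumed)
LASER_FJ_AT_1GBPS = 10.0   # laser energy per bit at 1 Gb/s (assumed)
LASER_EXPONENT = 1.5       # growth of laser energy per bit with rate (assumed)


def tuning_fj_per_bit(rate_gbps: float) -> float:
    """Fixed tuning power amortized over the bit rate (1 mW / 1 Gb/s = 1000 fJ/bit)."""
    return P_TUNE_MW * 1000.0 / rate_gbps


def laser_fj_per_bit(rate_gbps: float) -> float:
    """Laser energy per bit, assumed to rise as a power law of the data rate."""
    return LASER_FJ_AT_1GBPS * rate_gbps ** LASER_EXPONENT


def total_fj_per_bit(rate_gbps: float) -> float:
    return tuning_fj_per_bit(rate_gbps) + laser_fj_per_bit(rate_gbps)


rates_gbps = list(range(1, 33))
best_rate_gbps = min(rates_gbps, key=total_fj_per_bit)

for rate in (1, 2, 5, 10, 20, 32):
    print(f"{rate:2d} Gb/s: {total_fj_per_bit(rate):7.1f} fJ/bit")
print(f"minimum on this grid at {best_rate_gbps} Gb/s")
```

With these assumed constants the minimum falls at a moderate rate, neither the slowest nor the fastest point on the grid, which is the qualitative behavior the slide describes.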

  • DWDM link efficiency optimization

    • Optimize for min energy-cost

    • Bandwidth density dominated by circuit and photonics area (not coupler pitch)

    Slide 9

  • Photonic-Interconnected DRAM (PIM)

    Towards an Optical DRAM System

    [ISCA 2010]

    DARPA POEM

    • EOS22 Test Chip: high-performance 45nm SOI, 70M transistors, 1000 optical devices

    • Micron D1L Test Chip: power-optimized 0.22µm bulk, 2M transistors, 1000s of optical devices

    [Figure: die photos with 6 mm and 5 mm dimension labels]

    Slide 10

  • World’s First Processor to Communicate with Light

    Silicon-photonic components integrated directly in the chip

    C. Sun et al., Nature, Dec. 2015

    DARPA POEM & PERFECT – Stojanović, Asanović; zero-change 45nm SOI

    Slide 11

  • Processor Cores – 45nm SOI

    • RISC-V open ISA

    • Scalar-vector cores - Boot Linux

    • 0.2-1.35GHz, 4-16 GFlops/W

    Slide 12

    Gflops/W by Vdd (V, columns) and frequency (MHz, rows); "-" = Not Operational [Lee ESSCIRC 2014]

     MHz  0.65  0.70  0.75  0.80  0.85  0.90  0.95  1.00  1.05  1.10  1.15  1.20
     200  15.8  12.8  10.6   8.7   7.1   5.7   4.6   3.7   3.1   2.6   2.1   1.6
     250  16.7  14.0  11.6   9.6   7.9   6.4   5.3   4.3   3.6   3.0   2.4   1.9
     300     -  14.9  12.4  10.3   8.6   7.0   5.8   4.8   4.0   3.3   2.8   2.2
     350     -  15.6  13.0  10.9   9.1   7.5   6.2   5.2   4.3   3.7   3.0   2.4
     400     -     -  13.6  11.3   9.6   7.9   6.6   5.5   4.7   3.9   3.3   2.6
     450     -     -  14.1  11.8   9.9   8.3   6.9   5.8   4.9   4.2   3.5   2.8
     500     -     -     -  12.2  10.3   8.6   7.2   6.1   5.2   4.4   3.7   3.0
     550     -     -     -  12.5  10.5   8.8   7.5   6.3   5.4   4.5   3.8   3.1
     600     -     -     -     -  10.8   9.1   7.7   6.5   5.6   4.7   4.0   3.3
     650     -     -     -     -  11.0   9.3   7.9   6.7   5.8   4.9   4.2   3.3
     700     -     -     -     -  11.2   9.5   8.1   6.9   5.9   5.0   4.3   3.5
     750     -     -     -     -  11.4   9.7   8.3   7.1   6.1   5.2   4.4   3.6
     800     -     -     -     -     -   9.8   8.4   7.2   6.2   5.3   4.5   3.6
     850     -     -     -     -     -     -   8.6   7.4   6.4   5.4   4.6   3.7
     900     -     -     -     -     -     -   8.7   7.5   6.5   5.5   4.7   3.8
     950     -     -     -     -     -     -   8.8   7.6   6.6   5.6   4.8   3.9
    1000     -     -     -     -     -     -     -   7.3   6.6   5.7   4.8   4.0
    1050     -     -     -     -     -     -     -     -   6.7   5.7   4.9   4.0
    1100     -     -     -     -     -     -     -     -     -   5.8   5.0   4.1
    1150     -     -     -     -     -     -     -     -     -   5.9   5.0   4.2
    1200     -     -     -     -     -     -     -     -     -     -   5.1   4.2
    1250     -     -     -     -     -     -     -     -     -     -   5.1   4.3
    1300     -     -     -     -     -     -     -     -     -     -     -   4.2
    1350     -     -     -     -     -     -     -     -     -     -     -     -

  • Si Waveguides

    [Figure: Si waveguides, 470nm width and 700nm width]

  • Key Device Components

    Slide 14

    Vertical couplers

    Waveguide Taper

    Diffraction Grating

    Waveguide

    [Wade OIC 2015]

  • Key Device Components

    Slide 15

    SiGe from PMOS strain engineering used in Photodetectors

    SiGe Photodetector

    Waveguide

    Waveguide Taper

    [Orcutt 2013, Alloatti APL 2015]

  • Key Device Components

    Slide 16

    Integrated Heater

    Drop Waveguide

    Output Waveguide

    Modulator Microring

    Input Waveguide

    [Shainline OL 2013, Wade OFC 2014]

  • Transmitter

    • Driven by a CMOS logic inverter (1.2Vpp)

    • 5 Gb/s data rate at ~30fJ/b with >6dB extinction ratio, 3dB insertion loss

    – Up to 12 Gb/s with 2-3dB extinction

    [Figure: transmit site (50um), with labels: In, PRBS31, 8:1, modulator driver, modulator, VBIAS, in/out grating couplers]

    [Wade OFC 2014] [Sun VLSI 2015]

    Slide 17

  • Receiver

    • Low parasitics from monolithic integration enable single-stage 5kΩ TIA receiver

    • 10 Gb/s operation at 290 fJ/bit with 8.3uA sensitivity

    [Figure: receive site (50um), with labels: input grating coupler, photodetector, VPD, TIA and dummy TIA (5 kΩ each), clock buffers (φ), OutA, OutB, 2:8, BER checker]

    [Georgas VLSI 2014]

    Slide 18
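A quick back-of-the-envelope check on the receiver numbers, as a sketch only. It assumes the TIA behaves as an ideal 5 kΩ transimpedance (output voltage = input current x resistance), which the slide does not state, and it converts the quoted energy per bit into power at the quoted data rate.

```python
# Values quoted on the receiver slide.
R_TIA_OHM = 5e3          # single-stage TIA transimpedance
I_SENS_A = 8.3e-6        # receiver sensitivity
E_BIT_J = 290e-15        # energy per bit
DATA_RATE_BPS = 10e9     # 10 Gb/s operation

# Assumption: ideal transimpedance, so the swing at sensitivity is I * R.
v_swing_mv = I_SENS_A * R_TIA_OHM * 1e3

# Power = energy per bit x bit rate.
rx_power_mw = E_BIT_J * DATA_RATE_BPS * 1e3

print(f"TIA output swing at sensitivity: {v_swing_mv:.1f} mV")
print(f"Receiver power at 10 Gb/s: {rx_power_mw:.1f} mW")
```

Under that idealization the input signal at sensitivity corresponds to a few tens of millivolts at the TIA output, and 290 fJ/bit at 10 Gb/s corresponds to a few milliwatts.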

  • 5 Gb/s Chip-to-Chip Link

    [Figure: link block diagram, with labels: Chip 1 (PRBS31, 8:1, thermal tuner, PD, Tx), Chip 2 (PD, Rx, 2:8, BER check, Rx Out A, Rx Out B), 1189nm laser, optical amplifier]

    [Figure: optical power levels annotated along the link: 3.8 dBm, -0.2 dBm, -3.2 dBm, -10.5 dBm, -14.5 dBm, -7.2 dBm, 0.8 dBm, -5.9 dBm, -3.2 dBm, -9.9 dBm; laser power and photocurrent for Bit 1 and Bit 0: 9.65uA, 2.04uA]

    [Figure: receiver eye, decision threshold [uA] (-2 to 2) vs. time [ps] (0 to 200), BER contours from 1e-1 to 1e-10]
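The Bit 1 / Bit 0 annotations on this slide allow a rough extinction-ratio check. The sketch below assumes that 9.65uA and 2.04uA are the Bit 1 and Bit 0 photocurrents and that -3.2 dBm and -9.9 dBm are the corresponding optical levels; that pairing is a reading of the figure labels, not something the transcript states outright.

```python
import math

# Assumed pairing of the figure annotations (see lead-in).
I_BIT1_UA = 9.65
I_BIT0_UA = 2.04
P_BIT1_DBM = -3.2
P_BIT0_DBM = -9.9


def dbm_to_mw(p_dbm: float) -> float:
    """Convert optical power from dBm to mW."""
    return 10 ** (p_dbm / 10)


# Extinction ratio in dB from the photocurrent ratio.
er_from_current_db = 10 * math.log10(I_BIT1_UA / I_BIT0_UA)

# Extinction ratio in dB from the optical power ratio.
er_from_power_db = 10 * math.log10(dbm_to_mw(P_BIT1_DBM) / dbm_to_mw(P_BIT0_DBM))

print(f"ER from photocurrents: {er_from_current_db:.2f} dB")
print(f"ER from optical levels: {er_from_power_db:.2f} dB")
```

Both estimates come out a little under 7 dB, which would be consistent with the ">6dB extinction ratio" quoted for the transmitter at 5 Gb/s.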

  • 5 Gb/s Link Efficiency Summary

    • “Zero-change” monolithic competitive with state-of-the-art heterogeneous platforms

    • 680 fJ/bit, 14mW optical power [Zheng PTL 2012**]

    662 fJ/bit for circuits

    • Thermal Tuning*: 275 fJ/bit (42%)

    • Receiver: 357 fJ/bit (54%)

    • Modulator Driver: 30 fJ/bit (5%)

    Optical power: 3.6mW (13mW extrapolated without amplifier)

    *Includes all closed-loop circuits + 0.5 nm tuning power

    **0.5nm tuning power only

    Slide 20
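The circuit energy breakdown on this slide can be reproduced with a few lines of arithmetic. The sketch below uses only the per-block energies and the 5 Gb/s rate from the slide; the conversion to total circuit power is an added step (energy per bit x bit rate), not a figure quoted in the talk.

```python
# Circuit energy per bit reported on the slide, in fJ/bit.
circuit_fj_per_bit = {
    "thermal_tuning": 275.0,
    "receiver": 357.0,
    "modulator_driver": 30.0,
}
DATA_RATE_BPS = 5e9  # 5 Gb/s link

total_fj = sum(circuit_fj_per_bit.values())
shares = {name: fj / total_fj for name, fj in circuit_fj_per_bit.items()}

# Power = energy per bit x bit rate (fJ -> J via 1e-15), reported in mW.
circuit_power_mw = total_fj * 1e-15 * DATA_RATE_BPS * 1e3

for name, share in shares.items():
    print(f"{name}: {circuit_fj_per_bit[name]:.0f} fJ/bit ({share:.1%})")
print(f"total: {total_fj:.0f} fJ/bit, about {circuit_power_mw:.2f} mW at 5 Gb/s")
```

The shares round to the 42%, 54% and 5% shown on the slide, and the three blocks sum to the quoted 662 fJ/bit.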

  • 5 Gb/s Link Efficiency Summary

    • Not using our best devices in the link

    – 1dB loss couplers [Wade, OIC 2015] (on the same chip instead of 4dB in the link)

    – 5-10x better photodetector (0.1-0.2 A/W photodetector on the same chip)

    • Expect to obtain >40x smaller laser power (65fJ/b optical)

    662 fJ/bit for circuits

    • Thermal Tuning*: 275 fJ/bit (42%)

    • Receiver: 357 fJ/bit (54%)

    • Modulator Driver: 30 fJ/bit (5%)

    Optical power: 3.6mW (13mW extrapolated without amplifier)

    560 fJ/bit for laser – wall-plug**

    *Includes all closed-loop circuits + 0.5 nm tuning power

    **11.6% QD laser wall-plug efficiency

    Slide 21
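The laser figures on this slide appear to follow from simple arithmetic, sketched below. The interpretation is an assumption: that the 65 fJ/b optical target is the 13 mW extrapolated optical power at 5 Gb/s reduced by the quoted factor of 40, and that the 560 fJ/bit wall-plug number is that target divided by the 11.6% laser efficiency.

```python
DATA_RATE_BPS = 5e9          # 5 Gb/s link
OPTICAL_POWER_W = 13e-3      # extrapolated optical power without amplifier
IMPROVEMENT_FACTOR = 40.0    # ">40x smaller laser power" (taken as exactly 40 here)
WALL_PLUG_EFFICIENCY = 0.116 # QD laser wall-plug efficiency

# Optical energy per bit today: power / bit rate, in fJ.
optical_fj_now = OPTICAL_POWER_W / DATA_RATE_BPS * 1e15

# Optical energy per bit with the better couplers and photodetector.
optical_fj_target = optical_fj_now / IMPROVEMENT_FACTOR

# Electrical (wall-plug) laser energy per bit.
wall_plug_fj = optical_fj_target / WALL_PLUG_EFFICIENCY

print(f"optical now:    {optical_fj_now:.0f} fJ/bit")
print(f"optical target: {optical_fj_target:.0f} fJ/bit")
print(f"wall-plug:      {wall_plug_fj:.0f} fJ/bit")
```

Under this reading the numbers line up with the slide: roughly 65 fJ/b optical and roughly 560 fJ/bit at the wall plug.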

  • Electronic-Photonic Packaging

    • Flip-chip onto FR4 PCB using C4 bumps

    • Selective substrate removal of optical transceiver regions

    [Figure: die-thinned chip with selective substrate removal; labels: processor and SRAM regions, WDM transceiver regions, epoxy]

    Slide 22

  • Optical Memory System Demo

    • Chip 1 acts as processor, Chip 2 acts as memory

    • Custom memory controller, DRAM interface emulator

    – Takes advantage of full duplex (as opposed to half-duplex) memory interface

    • Video demonstration (thermal stress test): http://www.nature.com/nature/journal/v528/n7583/fig_tab/nature16454_SV1.html

    [Figure: two copies of the chip connected over single-mode fiber. Chip (Processor Mode): RISC-V processor active, 1MB memory array inactive. Chip (Memory Mode): 1MB memory array active, RISC-V processor inactive. Processor to memory link: command + address + write data. Memory to processor link: read data. Other labels: memory controller, interface, transmitters, receivers, PDs, laser, 50/50 power splitter, optical amplifiers]

    Slide 23

  • Tx and Rx DWDM Transceiver Banks

    Slide 24

    [Figure: Tx and Rx transceiver bank spectra, transmission [dB] vs. wavelength [nm] over 1170-1190 nm, with ring resonances labeled by channel number (0-10)]

    Advanced lithography enables tight ring resonance control

  • 11 x 8 Gbps Tx Demonstration

    • 11 rings, each demonstrating 8 Gbps modulation

    – Independently testing one at a time

    – Potential for 88 Gb/s on a single fiber/waveguide

    – Each ring is auto-locked

    [Figure: per-slice results, Slice 0 through Slice 10]

    >8 Gb/s limited by duty-cycle distortion of off-chip clock source

    [Sun JSSC 2016]

  • Going Faster – PAM2 and PAM4

    Direct Digital-to-Optical Conversion!

    [Moazeni et al, ISSCC 2017]

  • Chip floorplan

  • Transmitter eye diagrams

    • Extinction ratio (ER): 3dB, Insertion loss (IL): 5.5dB

    • PAM4 coding used: (0,5,10,15)

    • 42fJ/b driver energy efficiency
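The four PAM4 codes quoted above are equally spaced, so each symbol carries two bits. A minimal sketch of the bit-to-code mapping follows; the natural-binary ordering (00 -> 0, 01 -> 5, 10 -> 10, 11 -> 15) is an assumption, since the slide does not say which bit pair maps to which code or whether Gray coding is used, and `bits_to_pam4` is a hypothetical helper name.

```python
# Driver codes for the four PAM4 levels, as listed on the slide.
PAM4_CODES = (0, 5, 10, 15)


def bits_to_pam4(bits: list[int]) -> list[int]:
    """Map a bit sequence to PAM4 codes, two bits per symbol.

    Assumes natural-binary ordering (MSB first); this ordering is
    illustrative and not taken from the slide.
    """
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    symbols = []
    for msb, lsb in zip(bits[0::2], bits[1::2]):
        symbols.append(PAM4_CODES[2 * msb + lsb])
    return symbols


print(bits_to_pam4([0, 0, 0, 1, 1, 0, 1, 1]))
```

Because each symbol carries two bits, PAM4 doubles the bit rate of PAM2 at the same symbol rate.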

  • Transmitter specs

    29

  • Improved Rx Topologies

    • Leverage tight electronic-photonic integration to create new, more sensitive receiver structures

    – Differential, DDR receiver

    [Nandish Mehta et al. ESSCIRC 2016]

  • Platform Performance Summary

    Metric                     [Beamer ISCA 2010]      45nm SOI    Bulk Photonics
                               Conservative Estimates  Platform    Platform*
    Waveguide Loss             4 dB/cm                 3.7 dB/cm   10.5 dB/cm
    Vertical Coupler Loss      1 dB                    1 dB        3 dB
    Tx Data Rate               10 Gb/s                 20 Gb/s     5 Gb/s
    Tx Energy Per Bit          120 fJ/b                42 fJ/b     350 fJ/b
    Rx Data Rate               10 Gb/s                 12 Gb/s     5 Gb/s
    Rx Energy Per Bit          80 fJ/b                 297 fJ/b    1700 fJ/b
    Rx Sensitivity             10 μA                   8 μA        36 μA
    PD Responsivity            0.9 A/W                 0.44 A/W    0.2 A/W
    Thermal Tuning Efficiency  1.6 μW/GHz              3.8 μW/GHz  10 μW/GHz

    *considerably slower process than the one assumed in [Beamer ISCA 2010]

    • Comparison to a proposal for the processor-memory system we published 6 years ago

    • Meeting/exceeding most system specs

    Slide 31
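To make "meeting/exceeding most system specs" concrete, the table can be checked metric by metric. This is a sketch: the values are copied from the table, while the direction of each comparison (higher is better for data rates and responsivity, lower is better for everything else) is an assumption added here.

```python
# (metric, higher_is_better, ISCA 2010 estimate, 45nm SOI, bulk)
# Units follow the table; the higher_is_better flags are assumptions.
ROWS = [
    ("Waveguide loss (dB/cm)", False, 4.0, 3.7, 10.5),
    ("Vertical coupler loss (dB)", False, 1.0, 1.0, 3.0),
    ("Tx data rate (Gb/s)", True, 10.0, 20.0, 5.0),
    ("Tx energy per bit (fJ/b)", False, 120.0, 42.0, 350.0),
    ("Rx data rate (Gb/s)", True, 10.0, 12.0, 5.0),
    ("Rx energy per bit (fJ/b)", False, 80.0, 297.0, 1700.0),
    ("Rx sensitivity (uA)", False, 10.0, 8.0, 36.0),
    ("PD responsivity (A/W)", True, 0.9, 0.44, 0.2),
    ("Thermal tuning efficiency (uW/GHz)", False, 1.6, 3.8, 10.0),
]


def meets(higher_is_better: bool, target: float, value: float) -> bool:
    """True if the measured value is at least as good as the estimate."""
    return value >= target if higher_is_better else value <= target


soi_met = [name for name, hib, target, soi, _ in ROWS if meets(hib, target, soi)]
bulk_met = [name for name, hib, target, _, bulk in ROWS if meets(hib, target, bulk)]

print(f"45nm SOI meets {len(soi_met)} of {len(ROWS)} estimates:")
for name in soi_met:
    print(f"  {name}")
print(f"Bulk platform meets {len(bulk_met)} of {len(ROWS)} estimates")
```

With these directions the 45nm SOI platform meets six of the nine estimates, falling short on Rx energy per bit, PD responsivity and thermal tuning efficiency.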

  • Poly Si Photonics in Bulk CMOS

    DRAM processes heavily optimized for cost

    [Figure: DDR3-1333 technology, 2 Gb die cost ~90¢; 8 mm x 8 mm die with array and periphery regions; Micron wafers]

    Key constraints:

    • Bulk Substrate

    • Low Cost

    • No SiGe

    Meade et al. OI 13, VLSI Tech Symp 14

  • Memory: Bulk photonics integration

    DTI adjacent to STI

    [Meade et al. VLSI Tech Symp 14, Sun et al VLSI Ckts Symp 14]

    Micron D1L Reticle

    180nm Bulk chip

    First-ever link result with bulk CMOS photonics

  • WDM in bulk-photonics - Tx

    • All slices BER checked at 5Gb/s

    • 45Gb/s aggregate rate per waveguide

    Slide 34

  • WDM in bulk-photonics - Rx

    • All receive slices functional and BER checked at 5Gb/s

    • Single fiber more I/O BW than x16 DDR4 part
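The DDR4 comparison can be put in numbers, as a rough sketch. The 45 Gb/s aggregate and 5 Gb/s per-slice rates come from the bulk-photonics slides; the channel count is inferred from their ratio. The slide does not say which DDR4 speed grade it has in mind, so the code lists the peak bandwidth of a x16 part (16 data pins, one bit per pin per transfer) for the standard speed grades and leaves the comparison open.

```python
BULK_SLICE_RATE_GBPS = 5.0     # per-slice rate from the slides
BULK_AGGREGATE_GBPS = 45.0     # aggregate rate per waveguide from the slides

# Inferred, not stated: number of wavelength slices per waveguide.
bulk_channels = round(BULK_AGGREGATE_GBPS / BULK_SLICE_RATE_GBPS)


def aggregate_gbps(channels: int, rate_gbps: float) -> float:
    """Aggregate WDM throughput on one waveguide or fiber."""
    return channels * rate_gbps


def ddr4_x16_peak_gbps(mega_transfers_per_s: int) -> float:
    """Peak data bandwidth of a x16 DDR4 part: 16 pins x transfer rate."""
    return 16 * mega_transfers_per_s / 1000.0


DDR4_SPEED_GRADES = (1600, 1866, 2133, 2400, 2666, 2933, 3200)  # MT/s
slower_grades = [
    mt for mt in DDR4_SPEED_GRADES
    if ddr4_x16_peak_gbps(mt) < BULK_AGGREGATE_GBPS
]

print(f"{bulk_channels} slices x {BULK_SLICE_RATE_GBPS:g} Gb/s = "
      f"{aggregate_gbps(bulk_channels, BULK_SLICE_RATE_GBPS):g} Gb/s per waveguide")
for mt in DDR4_SPEED_GRADES:
    print(f"DDR4-{mt} x16: {ddr4_x16_peak_gbps(mt):.1f} Gb/s peak")
print(f"Grades below 45 Gb/s: {slower_grades}")
```

Note that a DDR4 data bus is shared between reads and writes, while each fiber direction here is a separate link, so the comparison is indicative only.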

  • Conclusions

    • Silicon-photonics – enabler of new capabilities

    – Think “new on-chip inductor” or “new on-chip t-line”

    • Potentially revolutionize many applications despite slowdown in CMOS scaling

    – VLSI compute and network infrastructure just a start

    • Need process, device, circuit and system-level understanding

    Slide 36