DiRAM™ Architecture Overview Bob Patti CTO


  • DiRAM™ Architecture Overview

    Bob Patti

    CTO

  • Why 3D? – Expiring Economics

    4/9/2014 2

    AMD 2014 3D-ASIP

  • Why 3D? – Power is killing us


  • Conventional RAM Architecture

    Memory Bits

    Periphery

    • Decoders

    • Amps

    • Drivers

    • etc.

  • Conventional 3D Packaging


    • Preserves traditional RAM problems

    • Adds stacking costs

  • DiRAM™ True 3D RAM Architecture


  • DiRAM4 Stack Overview

    • 64 Gb of Memory in 175 mm²

    • 256 fully independent RAMs

    • 16 Banks per RAM

    • 64-bit Separate I/O Data per RAM

    • 15 ns tRC (Page Open to Page Open in a Bank)

    • 16 Tb/s Data Bandwidth

    • Competitive Manufacturing Cost

  • 256 Independent RAMs

    Each RAM

    • 256 Mb Storage

    • 64 Gb/s Bandwidth

    • 9 ns Latency

    • 15 ns tRC

    • 16 Banks

  • DiRAM4 Stack Performance

    256 Independent RAMs

    • 64 Gb Storage

    • 16.4 Terabit/s Data Bandwidth

    • 4096 Banks

    • >500 Billion Transactions Per Second (Minimum)
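The stack-level storage, bandwidth, and bank figures follow from multiplying the per-RAM figures by 256. The Python sketch below reproduces that arithmetic. The constant and function names are illustrative, and the Mb-to-Gb step assumes binary prefixes (1 Gb = 1024 Mb). It also suggests why the deck quotes both 16 Tb/s and 16.4 Tb/s: 16,384 Gb/s is 16 Tb/s with binary prefixes and about 16.4 Tb/s with decimal ones. The transaction-rate figure is not derived here, because the slides do not state how it is computed.

```python
# Per-RAM figures as quoted on the slides.
RAMS_PER_STACK = 256
MBIT_PER_RAM = 256        # storage per RAM, in Mb
GBPS_PER_RAM = 64         # bandwidth per RAM, in Gb/s
BANKS_PER_RAM = 16
DIE_MM = (14.0, 12.5)     # footprint quoted in the deck, in mm


def stack_totals():
    """Scale the per-RAM figures up to one DiRAM4 stack."""
    bandwidth_gbps = RAMS_PER_STACK * GBPS_PER_RAM
    return {
        # Assumption: binary prefixes, 1 Gb = 1024 Mb.
        "storage_gbit": RAMS_PER_STACK * MBIT_PER_RAM / 1024,
        "bandwidth_tbps_decimal": bandwidth_gbps / 1000,
        "bandwidth_tbps_binary": bandwidth_gbps / 1024,
        "banks": RAMS_PER_STACK * BANKS_PER_RAM,
        "area_mm2": DIE_MM[0] * DIE_MM[1],
    }


if __name__ == "__main__":
    for name, value in stack_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the totals come out to 64 Gb, 4096 banks, and a 175 mm² footprint, matching the slide.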

  • Dis-Integrated 3D RAM Architecture

    Memory Cells and Access Transistors

    Sense Amps, etc.

    I/O Layer

    DiRAM™ Architecture

  • DiRAM4 Scale* Drawing

    Top View: 175 mm² (14 mm x 12.5 mm)

    Side View: 0.5 mm

    Isometric View

    * Almost to scale

  • Via-Free Wafer Stacking

    Aggressive Copper TSV ≈ 10µ x 50µ

    2x

    Tezzaron Tungsten SuperContact™ ≈ 1µ x 5.5µ

  • Tiny, Common, Cheap, Fast and…

    10µ Diameter Copper TSV

    1µ Diameter Tungsten SuperContact

    One 100
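The size comparison can be made concrete with a short calculation. The sketch below assumes both vias have a circular cross-section and reads the "10µ x 50µ" and "1µ x 5.5µ" figures as diameter x depth. Neither assumption is stated on the slides, and the function names are illustrative. Under those assumptions one copper TSV occupies the area of about 100 SuperContacts, which is consistent with the "One" and "100" labels on the slide.

```python
import math


def cross_section_um2(diameter_um: float) -> float:
    """Area of a circular via cross-section, in square microns."""
    return math.pi * (diameter_um / 2) ** 2


def aspect_ratio(diameter_um: float, depth_um: float) -> float:
    """Depth-to-diameter ratio of a via."""
    return depth_um / diameter_um


# Dimensions from the slides; the diameter x depth reading is an assumption.
copper_tsv = (10.0, 50.0)
supercontact = (1.0, 5.5)

area_ratio = cross_section_um2(copper_tsv[0]) / cross_section_um2(supercontact[0])

if __name__ == "__main__":
    print(f"area ratio (TSV : SuperContact): {area_ratio:.1f}")
    print(f"copper TSV aspect ratio: {aspect_ratio(*copper_tsv):.1f}")
    print(f"SuperContact aspect ratio: {aspect_ratio(*supercontact):.1f}")
```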

  • Radically Different Manufacturing

    Conventional Flow

    • Fabricate Wafer

    • Probe Test Die

    • Thin Wafer

    • Singulate Die

    • Stack Good Die

    • Package Stack

    • Burn-In & Test Stack

    Tezzaron Flow

    • Fabricate Wafer

    • Stack & Thin Wafers

    • Probe Test Wafer Stacks

    • Singulate Stacks

    • Package Stacks

    • Burn-In & Test Stack

    • Stack Wafers

    • Thin Top Wafer

    • Repeat

    • Probe Test Stacks

    • Singulate Stacks

  • Bi-STAR Repair Improves Yield

    [Chart: Yield (axis up to 100%) versus Stack Height]

  • Solve 3D Problems with 3D

    Enables Effective Repair

    Enables Tiny Vias

    Enables Super Thin Wafers

    Bonding Untested Wafers

  • DiRAM: Efficiency for the Future

    • Less aggressive wafers

    • Higher array efficiency

    • Much lower test cost

    • Higher yield

    • Longer product life cycles


  • The Right I/O for Each Market

    SerDes I/O

    • Highest Power

    • Moderate Performance

    Pico-SerDes I/O

    • Lower Power

    • Moderate Performance

    Photonic I/O

    • Lowest Power

    • Highest Performance

    DiRAM4 Launches with 0.7 V CMOS I/O

    2.5D Si or Organic

    • Low Power

    • High Performance

  • High End Routing

    Tasks

    • Packet Buffer (Burst Read/Write)

    • Tables (Read Dominated)

    • Stats (Read-Mod-Write)

    Performance Metrics

    • Density

    – Line Rate / Seconds

    • Bandwidth

    – Line Rate x 2.5

    • Transaction Rate

    – Transactions x Line Rate / Min Packet Size
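The bandwidth and transaction-rate rules of thumb can be worked through for a 400 Gb/s line rate, the case the deck considers. In the sketch below, the 2.5x factor and the formula shapes come from the slide. The 64-byte minimum packet and the four transactions per packet are hypothetical values chosen only for illustration, and the function names are assumptions. The density metric is left out because the slide does not give enough detail to compute it.

```python
BANDWIDTH_FACTOR = 2.5  # "Line Rate x 2.5", from the slide


def buffer_bandwidth_gbps(line_rate_gbps: float) -> float:
    """Memory bandwidth needed for a given line rate, in Gb/s."""
    return line_rate_gbps * BANDWIDTH_FACTOR


def transaction_rate_per_s(line_rate_gbps: float,
                           min_packet_bits: int,
                           transactions_per_packet: int) -> float:
    """Transactions x Line Rate / Min Packet Size, in transactions per second."""
    packets_per_s = line_rate_gbps * 1e9 / min_packet_bits
    return transactions_per_packet * packets_per_s


if __name__ == "__main__":
    line_rate = 400  # Gb/s
    print(buffer_bandwidth_gbps(line_rate))  # 1000.0 Gb/s, i.e. 1 Tb/s
    # Hypothetical inputs: 64-byte (512-bit) packets, 4 memory transactions each.
    print(transaction_rate_per_s(line_rate, 512, 4))
```

For 400 Gb/s the bandwidth rule gives 1 Tb/s, which agrees with the packet-buffer figure quoted in the deck's 400Gb standard-RAM bill of materials.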

  • 400Gb NP Standard RAM BOM


    (30) 4 Gb DDR3 DRAMs = 1 Tb/s Packet Buffer

    (4) 576 Mb SigmaQuad-IIIe SRAMs = 5 BT/s Stats

    (12) 576 Mb RLDRAM3 DRAMs = 12 BT/s Tables
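A quick tally of this bill of materials shows how many discrete devices the standard-RAM approach needs. The sketch below is only arithmetic on the part counts and densities listed above. The tuple layout and function name are illustrative, and 4 Gb is taken as 4096 Mb (binary prefixes, an assumption).

```python
# (part, count, density in Mb, role) as listed on the slide.
BOM = [
    ("DDR3 DRAM", 30, 4096, "Packet Buffer"),
    ("SigmaQuad-IIIe SRAM", 4, 576, "Stats"),
    ("RLDRAM3 DRAM", 12, 576, "Tables"),
]


def tally(bom):
    """Return (total device count, total capacity in Mb)."""
    devices = sum(count for _, count, _, _ in bom)
    total_mbit = sum(count * mbit for _, count, mbit, _ in bom)
    return devices, total_mbit


if __name__ == "__main__":
    devices, total_mbit = tally(BOM)
    print(f"{devices} devices, {total_mbit / 1024:.1f} Gb total")
```

Under these assumptions the list adds up to 46 memory devices of three different types.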

  • 400Gb Routing with DiRAM4

    Network Processor: 22 mm x 19 mm

    Interposer: 26 mm x 32 mm

    DiRAM4: 14 mm x 12.5 mm

  • 128 Gb Supercomputer Mainstore


    Two 2.5D 0.7V CMOS I/O DiRAM4

    One Hub Chip

    Three 25Gb/s x 4 Lane SerDes

  • 128 Gb, 1Tb/s Memory Tile

    Two 64 Gb DiRAM4 stacks

    One MHUB die

    Integrated Configuration Management and Power Resources

    580 pin low-swing interface

  • Contenders

                 DiRAM4     DiRAM4 CMOS    DiRAM4 Hub

    Density      64 Gb      64 Gb          128 Gb
                 8 GB       8 GB           16 GB
    Latency      9 ns       9 ns           Variable
    Min Ref      64 bits    64 bits        256 bits
    Interface    0.7 V      0.7 – 1.2 V    1.2 V
    tRC          15 ns      15 ns          15 ns
    BW           16 Tb/s    4 Tb/s         1 Tb/s
                 2 TB/s     500 GB/s       128 GB/s
    Channels     256        64             1
    Banks per    16         64             8192
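The bit and byte rows of the comparison can be cross-checked with a few lines of arithmetic. The sketch below is illustrative: the function names are assumptions, and both decimal and binary prefixes are computed because the slide does not say which it uses. The figures suggest a mix, since 4 Tb/s is 500 GB/s only with decimal prefixes, while 1 Tb/s is 128 GB/s only with binary ones. The last helper assumes 4096 banks per stack and two stacks behind the hub. Under that assumption the truncated "Banks per" row is consistent with banks per channel.

```python
BANKS_PER_STACK = 4096  # from the stack performance slide


def gbit_to_gbyte(gbit: float) -> float:
    """Convert a density in Gb to GB."""
    return gbit / 8


def tbps_to_gbytes_per_s(tbps: float, binary: bool = False) -> float:
    """Convert Tb/s to GB/s with decimal (1000) or binary (1024) prefixes."""
    return tbps * (1024 if binary else 1000) / 8


def banks_per_channel(stacks: int, channels: int) -> int:
    """Banks behind each channel, assuming BANKS_PER_STACK per stack."""
    return stacks * BANKS_PER_STACK // channels


if __name__ == "__main__":
    # (name, density in Gb, bandwidth in Tb/s, stacks, channels)
    rows = [
        ("DiRAM4", 64, 16, 1, 256),
        ("DiRAM4 CMOS", 64, 4, 1, 64),
        ("DiRAM4 Hub", 128, 1, 2, 1),
    ]
    for name, gbit, tbps, stacks, channels in rows:
        print(name,
              gbit_to_gbyte(gbit), "GB,",
              tbps_to_gbytes_per_s(tbps), "GB/s decimal,",
              tbps_to_gbytes_per_s(tbps, binary=True), "GB/s binary,",
              banks_per_channel(stacks, channels), "banks per channel")
```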

  • Luxtera 2.5D Photonic Data Pump

    • 2.5 pJ/bit power target

    • Bare metal protocol

    – Ultra low latency

    – Protocol agnostic

    • 8 core Fiber

    • 28 Gb SerDes or 2.5 Gb low-swing interface

    • Self-calibrating, self-tuning

    • >1.6 Tb/s payload
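The energy-per-bit target translates into link power by multiplying by the payload rate. The sketch below does that for the two figures on the slide. It assumes the 2.5 pJ/bit target applies to the whole 1.6 Tb/s payload, which the slide does not state explicitly, and the function name is illustrative.

```python
def link_power_watts(energy_pj_per_bit: float, payload_tbps: float) -> float:
    """Power = energy per bit x bits per second."""
    joules_per_bit = energy_pj_per_bit * 1e-12
    bits_per_second = payload_tbps * 1e12
    return joules_per_bit * bits_per_second


if __name__ == "__main__":
    # Slide figures: 2.5 pJ/bit target, >1.6 Tb/s payload (lower bound used).
    print(f"{link_power_watts(2.5, 1.6):.1f} W")
```

At the quoted lower-bound payload this works out to roughly 4 W.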

  • Tezzaron Memory Components

    • Memory Hubs

    – Full memory management and optimization for 2 memories

    – Can manage >500 in-progress memory transactions

    – Maintains up to 8096 open memory pages

    • Memory Fabric Switches

    – 16 ports, cascadable

  • Tezzaron Memory Components

    • Memory Fabric Hub

    – CPU / memory/ network interfaces

    – Low latency

  • Near End-of-Line TSV Insertion

    [Cross-section figure; layer labels: poly, STI, SIN, W, M1 to M8, TM]

    5.6µ TSV is 1.2µ Wide and ~10µ deep

    2x, 4x, 8x Wiring level

    ~0.2/0.2 µm S/W

  • Mixed CMOS-3/5 100mm InP/CMOS

    • DARPA DAHI

    • GaN

    • 3D CMOS/InP/GaN


  • 2.5/3D Circuits

    [Figure: IME A-Star / Tezzaron Collaboration. Labels: FPGA (4Xnm), CC, Active Silicon Circuit Board, 2 Layer Processor (x2), 3 Layer 3D Memory, Organic Substrate, level#0 to level#4, Solder Bumps, μBumps, C4 Bumps, Die to Wafer Cu Thermal Diffusion Bond]

  • “5.5D” Systems

    • SIP/SSIP

    – Power Conversion

    – Cooling

    – Photonics

    • Optimization

    – Extending to power

    • Mixed PCB/IC Metaphor

  • Cooling Block Illustrations


  • RAM is now Singular

    DiRAM4™ Changes the Language of System Design…

    DiRAM4™ in Networking

    • High Transaction Rate

    • High Bandwidth

    • High Density

    DiRAM4™ in Computing

    • High Bandwidth

    • High Density

    • Small References

    DiRAM4™ in Everything

    • Multiple Independent Channels

    • High Bandwidth

    • High Transaction Rate

    High Performance Processor with Tezzaron DiRAM4: Local Memory and/or Cache

    Network Processor with Tezzaron DiRAM4: Table Memory + Packet Buffer

    FPGA with Tezzaron DiRAM4: Fast Local Memory