-
DiRAM™ Architecture Overview
Bob Patti
CTO
-
Why 3D? – Expiring Economics
4/9/2014 2
AMD 2014 3D-ASIP
-
Why 3D? – Power is killing us
-
Conventional RAM Architecture
Memory
Bits
Periphery
• Decoders
• Amps
• Drivers
• etc.
-
Conventional 3D Packaging
• Preserves traditional RAM problems
• Adds stacking costs
-
DiRAM™ True 3D RAM Architecture
-
DiRAM4 Stack Overview
• 64 Gb of Memory in 175 mm²
• 256 fully independent RAMs
• 16 Banks per RAM
• 64 bit Sep I/O Data per RAM
• 15ns tRC (Page Open to Page Open in a Bank)
• 16 Tb/s Data Bandwidth
• Competitive Manufacturing Cost
-
256 Independent RAMs
Each RAM
• 256 Mb Storage
• 64 Gb/s Bandwidth
• 9ns Latency
• 15ns tRC
• 16 Banks
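The stack-level figures quoted elsewhere in this deck (64 Gb, roughly 16.4 Tb/s, 4096 banks) follow from these per-RAM numbers by multiplication. A minimal sketch, assuming binary prefixes for storage (1 Gb = 1024 Mb) and decimal prefixes for bandwidth; the variable names are illustrative and not from the deck:

```python
# Per-RAM figures from the slide; names are illustrative, not from the deck.
RAMS_PER_STACK = 256
STORAGE_MBIT_PER_RAM = 256      # megabits of storage per RAM
BANDWIDTH_GBPS_PER_RAM = 64     # gigabits per second per RAM
BANKS_PER_RAM = 16

# Assumption: binary prefix for storage (1 Gb = 1024 Mb).
stack_storage_gbit = RAMS_PER_STACK * STORAGE_MBIT_PER_RAM / 1024

# Assumption: decimal prefix for bandwidth (1 Tb/s = 1000 Gb/s).
stack_bandwidth_tbps = RAMS_PER_STACK * BANDWIDTH_GBPS_PER_RAM / 1000

stack_banks = RAMS_PER_STACK * BANKS_PER_RAM

print(stack_storage_gbit, stack_bandwidth_tbps, stack_banks)
```

Under these assumptions the bandwidth comes out at 16.384 Tb/s, which rounds to the 16.4 Tb/s quoted for the stack.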
-
DiRAM4 Stack Performance
• 64 Gb Storage
• 16.4 Terabit/s Data Bandwidth
• 4096 Banks
• >500 Billion Transactions Per Second (Minimum)
256 Independent RAMs
-
Dis-Integrated 3D RAM Architecture
Memory Cells and Access Transistors
Sense Amps, etc.
I/O Layer
DiRAM™ Architecture
-
DiRAM4 Scale* Drawing
• Top View: 175 mm² (14 mm x 12.5 mm)
• Side View: 0.5 mm
• Isometric View
* Almost to scale
-
Via-Free Wafer Stacking
• Aggressive Copper TSV: ≈ 10µ x 50µ
• Tezzaron Tungsten SuperContact™: ≈ 1µ x 5.5µ
[Figure label: 2x]
-
Tiny, Common, Cheap, Fast and…
• 10µ Diameter Copper TSV
• 1µ Diameter Tungsten SuperContact
[Figure labels: One, 100]
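One reading of the "One" and "100" labels is that a single 10µ copper TSV occupies about the same cross-sectional area as 100 of the 1µ SuperContacts. A minimal sketch of that arithmetic, assuming round vias and ignoring keep-out zones and pitch (neither is given in the deck); the function name is illustrative:

```python
import math

def via_area_um2(diameter_um: float) -> float:
    """Cross-sectional area of a round via, in square microns."""
    return math.pi * (diameter_um / 2) ** 2

copper_tsv_area = via_area_um2(10.0)    # 10µ diameter copper TSV
supercontact_area = via_area_um2(1.0)   # 1µ diameter tungsten SuperContact

# Area scales with the square of the diameter, so a 10x diameter gives 100x area.
area_ratio = copper_tsv_area / supercontact_area
print(round(area_ratio))
```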
-
Radically Different Manufacturing
Conventional Flow
• Fabricate Wafer
• Probe Test Die
• Thin Wafer
• Singulate Die
• Stack Good Die
• Package Stack
• Burn-In & Test Stack
Tezzaron Flow
• Fabricate Wafer
• Stack & Thin Wafers
• Probe Test Wafer Stacks
• Singulate Stacks
• Package Stacks
• Burn-In & Test Stack

• Stack Wafers
• Thin Top Wafer
• Repeat
• Probe Test Stacks
• Singulate Stacks
-
Bi-STAR Repair Improves Yield
[Figure: yield (axis up to 100%) versus stack height]
-
Solve 3D Problems with 3D
• Enables Effective Repair
• Enables Tiny Vias
• Enables Super Thin Wafers
• Bonding Untested Wafers
-
DiRAM: Efficiency for the Future
• Less aggressive wafers
• Higher array efficiency
• Much lower test cost
• Higher yield
• Longer product life cycles
-
The Right I/O for Each Market
SerDes I/O
• Highest Power
• Moderate Performance
Pico-SerDes I/O
• Lower Power
• Moderate Performance
Photonic I/O
• Lowest Power
• Highest Performance
DiRAM4 Launches with 0.7 V CMOS I/O (2.5D Si or Organic)
• Low Power
• High Performance
-
High End Routing
Tasks
• Packet Buffer (Burst Read/Write)
• Tables (Read Dominated)
• Stats (Read-Mod-Write)
Performance Metrics
• Density
– Line Rate / Seconds
• Bandwidth
– Line Rate x 2.5
• Transaction Rate
– Transactions x Line Rate / Min Packet Size
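The bandwidth and transaction-rate metrics above can be worked through for a 400 Gb/s line rate, the rate used in this deck's routing examples. This is a sketch only: the 64-byte minimum packet (with no framing overhead) and the four transactions per packet are hypothetical values chosen for illustration, and the function names are not from the deck.

```python
def buffer_bandwidth_gbps(line_rate_gbps: float) -> float:
    """Packet-buffer bandwidth per the slide: line rate x 2.5."""
    return line_rate_gbps * 2.5

def transaction_rate_per_s(transactions_per_packet: float,
                           line_rate_gbps: float,
                           min_packet_bits: int) -> float:
    """Transactions x line rate / minimum packet size."""
    packets_per_s = line_rate_gbps * 1e9 / min_packet_bits
    return transactions_per_packet * packets_per_s

LINE_RATE_GBPS = 400
MIN_PACKET_BITS = 64 * 8        # assumption: 64-byte minimum packet, no framing overhead
TRANSACTIONS_PER_PACKET = 4     # hypothetical value, not from the deck

bandwidth = buffer_bandwidth_gbps(LINE_RATE_GBPS)
transactions = transaction_rate_per_s(TRANSACTIONS_PER_PACKET,
                                      LINE_RATE_GBPS, MIN_PACKET_BITS)

print(f"{bandwidth:.0f} Gb/s buffer bandwidth")
print(f"{transactions / 1e9:.3f} billion transactions/s")
```

With these inputs the buffer bandwidth is 1000 Gb/s, that is, 1 Tb/s.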
-
400Gb NP Standard RAM BOM
(30) 4 Gb DDR3 DRAMs = 1 Tb/s Packet Buffer
(4) 576 Mb SigmaQuad-IIIe SRAMs = 5 BT/s Stats
(12) 576 Mb RLDRAM3 DRAMs = 12 BT/s Tables
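A quick tally of this bill of materials shows how many discrete memory devices the standard-RAM design needs and how much raw capacity they add up to. A minimal sketch, assuming binary prefixes (4 Gb = 4096 Mb); the tuple layout and names are illustrative:

```python
# (part, device count, megabits per device, role) as listed on the slide.
# Assumption: binary prefixes, so 4 Gb = 4096 Mb.
bom = [
    ("DDR3 DRAM", 30, 4 * 1024, "Packet Buffer"),
    ("SigmaQuad-IIIe SRAM", 4, 576, "Stats"),
    ("RLDRAM3 DRAM", 12, 576, "Tables"),
]

total_devices = sum(count for _, count, _, _ in bom)
total_mbit = sum(count * mbit for _, count, mbit, _ in bom)
total_gbit = total_mbit / 1024

print(total_devices, "devices,", total_gbit, "Gb raw capacity")
```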
-
400Gb Routing with DiRAM4
• Network Processor: 22 mm x 19 mm
• Interposer: 26 mm x 32 mm
• DiRAM4: 14 mm x 12.5 mm
-
128 Gb Supercomputer Mainstore
Two 2.5D 0.7V CMOS I/O DiRAM4
One Hub Chip
Three 25Gb/s x 4 Lane SerDes
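Reading "Three 25Gb/s x 4 Lane SerDes" as three links of four lanes each (an assumption about the slide's shorthand), the raw aggregate SerDes rate can be sketched as follows. Encoding and protocol overhead are not stated in the deck and are ignored here.

```python
# Figures from the slide; names are illustrative.
SERDES_LINKS = 3
LANES_PER_LINK = 4
LANE_RATE_GBPS = 25

# Raw aggregate rate, before any encoding or protocol overhead.
raw_aggregate_gbps = SERDES_LINKS * LANES_PER_LINK * LANE_RATE_GBPS
print(raw_aggregate_gbps, "Gb/s raw")
```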
-
128 Gb, 1Tb/s Memory Tile
Two 64 Gb DiRAM4 stacks
One MHUB die
Integrated Configuration Management and Power Resources
580 pin low-swing interface
-
Contenders
             DiRAM4     DiRAM4 CMOS    DiRAM4 Hub
Density      64 Gb      64 Gb          128 Gb
             8 GB       8 GB           16 GB
Latency      9 ns       9 ns           Variable
Min Ref      64 bits    64 bits        256 bits
Interface    0.7 V      0.7 – 1.2 V    1.2 V
tRC          15 ns      15 ns          15 ns
BW           16 Tb/s    4 Tb/s         1 Tb/s
             2 TB/s     500 GB/s       128 GB/s
Channels     256        64             1
Banks per    16         64             8192
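The table rows can be cross-checked against each other. The sketch below converts the densities from bits to bytes and multiplies channels by banks to get a total bank count per configuration. It assumes the truncated "Banks per" row means banks per channel, which the deck does not state; the dictionary keys are illustrative.

```python
# Values copied from the table. "banks_per_channel" is an assumed reading
# of the truncated "Banks per" row.
configs = {
    "DiRAM4":      {"density_gbit": 64,  "channels": 256, "banks_per_channel": 16},
    "DiRAM4 CMOS": {"density_gbit": 64,  "channels": 64,  "banks_per_channel": 64},
    "DiRAM4 Hub":  {"density_gbit": 128, "channels": 1,   "banks_per_channel": 8192},
}

summary = {
    name: {
        "density_gbyte": cfg["density_gbit"] // 8,   # 8 bits per byte
        "total_banks": cfg["channels"] * cfg["banks_per_channel"],
    }
    for name, cfg in configs.items()
}

for name, row in summary.items():
    print(name, row)
```

Under that reading, the two single-stack configurations both come to 4096 banks and the two-stack hub configuration to 8192.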
-
Luxtera 2.5D Photonic Data Pump
• 2.5pJ/bit power target
• Bare metal protocol
– Ultra low latency
– Protocol agnostic
• 8 core Fiber
• 28Gb SERDES or 2.5Gb low-swing interface
• Self-calibrating self-tuning
• >1.6Tb/s payload
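The energy target and the payload rate together imply a rough power figure. A minimal sketch, assuming the 2.5 pJ/bit target applies at exactly 1.6 Tb/s (the slide gives the payload only as a lower bound, so this is a floor at that rate, not a specification):

```python
ENERGY_PJ_PER_BIT = 2.5   # power target from the slide
PAYLOAD_TBPS = 1.6        # slide states ">1.6Tb/s"; the lower bound is used here

# pJ/bit x Tb/s = (1e-12 J/bit) x (1e12 bit/s) = W, so the scale factors cancel.
power_watts = ENERGY_PJ_PER_BIT * PAYLOAD_TBPS
print(f"{power_watts:.1f} W")
```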
-
Tezzaron Memory Components
• Memory Hubs
– Full memory management and optimization for 2 memories
– Can manage >500 in-progress memory transactions
– Maintains up to 8096 open memory pages
• Memory Fabric Switches
– 16 ports cascadable
–
-
Tezzaron Memory Components
• Memory Fabric Hub
– CPU / memory/ network interfaces
– Low latency
-
Near End-of-Line TSV Insertion
[Figure: cross-section with layer labels poly, STI, SIN, W, M1 through M8, and TM; dimension label 5.6µ]
• TSV is 1.2µ wide and ~10µ deep
• 2x, 4x, 8x wiring level
• ~0.2/0.2 µm S/W
-
Mixed CMOS-3/5 100mm InP/CMOS
• DARPA DAHI
• GaN
• 3D CMOS/InP/GaN
-
2.5/3D Circuits
[Figure: 2.5D/3D assembly, level#0 through level#4]
• CC
• FPGA (4Xnm)
• 2 Layer Processor (two instances)
• 3 Layer 3D Memory
• Active Silicon Circuit Board
• Organic Substrate
• Solder Bumps, μBumps, C4 Bumps
• Die to Wafer Cu Thermal Diffusion Bond
IME A-Star / Tezzaron Collaboration
-
“5.5D” Systems
• SIP/SSIP
– Power Conversion
– Cooling
– Photonics
• Optimization
– Extending to power
• Mixed PCB/IC Metaphor
-
Cooling Block Illustrations
-
RAM is now Singular
DiRAM4™ Changes the Language of System Design…
DiRAM4™ in Networking
• High Transaction Rate
• High Bandwidth
• High Density
DiRAM4™ in Computing
• High Bandwidth
• High Density
• Small References
DiRAM4™ in Everything
• Multiple Independent Channels
• High Bandwidth
• High Transaction Rate
[Figure: three example systems]
• High Performance Processor + Tezzaron DiRAM4: Local Memory and/or Cache
• Network Processor + Tezzaron DiRAM4: Table Memory + Packet Buffer
• FPGA + Tezzaron DiRAM4: Fast Local Memory