
Page 1: DBI®-Enabled Next Generation SoC Architectures

3D Design and Performance

Javi DeLaCruz

Special Thanks to [...] for contributions

Adapted from S3S Conference 2019 presentation

Page 2: Agenda

• Limitations of Computing
• Fine-pitch 3D interconnect confers unique, powerful, new capabilities
  • Lower power, higher performance, reduced area
• Production-proven 3D technologies: ZiBond® and DBI®
• Design in 3D instead of stacking 2D designs
• Reticle Limitations Emerging
• Case study in High Performance Compute

Page 3-5: Main computation bottleneck is connectivity

With 10nm manufacturing…
• 12 signals/µm of beachfront on middle layers
• 4 middle layers: ~100,000 connections / mm²

With most advanced TSVs…
• Only 625 connections / mm²

[Figure: area with 1 mm scale labels on two sides; annotation: 10⁵ to 10⁶ / mm²]
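To put the densities on this slide on a common footing, the sketch below converts connections per mm² into the pitch of a uniform square grid. The square-grid layout and the function names are assumptions made for illustration and do not come from the presentation. Under that assumption, 625 connections / mm² corresponds to a 40 µm pitch, and the 10⁵ to 10⁶ / mm² range corresponds to pitches of roughly 3.2 µm down to 1 µm.

```python
import math


def pitch_um_from_density(connections_per_mm2: float) -> float:
    """Pitch in µm of a uniform square grid with the given density.

    Assumption (not from the slides): connections sit on a square grid.
    A grid with pitch p µm holds (1000 / p) ** 2 sites per mm².
    """
    return 1000.0 / math.sqrt(connections_per_mm2)


def density_from_pitch_um(pitch_um: float) -> float:
    """Connections per mm² for a uniform square grid with the given pitch."""
    return (1000.0 / pitch_um) ** 2


# Densities quoted on the slide.
for label, density in [
    ("most advanced TSVs", 625),
    ("figure annotation, low end", 1e5),
    ("figure annotation, high end", 1e6),
]:
    pitch = pitch_um_from_density(density)
    print(f"{label}: {density:g} / mm² is a {pitch:.2f} µm square-grid pitch")
```

A staggered or hexagonal arrangement would give slightly different pitches for the same density, so these values are best read as order-of-magnitude guides.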

Page 6: Interface Between Die

• What’s the best interface for 2.5D and 3D? …the answers may be different
• Adding standard interfaces reduces the benefit of 3D design
• Leverage smaller load between die than within die
• Internal interconnects across die layers (AXI, APB, ASB, NoC, SRAM Bus)
• Folding alone, without planning, improves average net length by 30%
• Deliberate 3D architectural planning can shrink routes from mm to μm

Interface between die can be the same as (or better than) interfaces within die
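One way to see why folding alone could give a reduction of about 30% is a simple geometric model. The sketch below is an illustration under stated assumptions and is not the analysis behind the slide: it assumes a square block whose typical net length scales with the block's side, an even split of area across tiers, and negligible vertical hop length. The function name is hypothetical.

```python
import math


def folded_length_ratio(tiers: int) -> float:
    """Ratio of folded to unfolded characteristic net length.

    Assumptions (not from the slides): net length scales with the side of a
    square block, and folding splits the area evenly across `tiers` layers
    with negligible vertical hop length. A block of area A has side sqrt(A);
    each tier of area A / tiers has side sqrt(A / tiers).
    """
    if tiers < 1:
        raise ValueError("tiers must be >= 1")
    return 1.0 / math.sqrt(tiers)


reduction = 1.0 - folded_length_ratio(2)
print(f"two-tier fold: about {reduction:.0%} shorter nets in this model")
```

For two tiers the model gives a ratio of 1/√2, a reduction of roughly 29%, which is in line with the 30% quoted on the slide. It says nothing about the mm-to-μm gains attributed to deliberate 3D planning, which depend on where connected blocks are placed.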

Page 7: ZiBond & DBI 3D wafer/die bonding solutions

• ZiBond: Direct Bonding (figure labels: Si, Si)
• DBI: Hybrid Bonding (figure labels: Interconnect, Si)
• Die to Die (D2D) Bonding
• Wafer to Wafer (W2W) Bonding
• Die to Wafer (D2W) Bonding

Example: Sony IMX135 13-Mpixel CMOS Image Sensor (images courtesy Chipworks/Sony; all content © 2013, Chipworks Inc. All rights reserved.)
• 90-nm back-illuminated sensor bonded face-to-face with 65-nm image processor
• “up & over” TSVs filled with Cu & appear to be filled simultaneously

[Figure: SEM cross-section of stacked dies; labels: image sensor, bonding interface, image processor]

[Figure: SEM cross-section of TSVs; labels: Cu, Si, Oxide]

Page 8: The Ultimate 2.5D and 3D Integration Technology for High-Performance Computing

[Figure: DBI Ultra; image: Gao et al; ECTC 2019; labels: 8, 4, 2, 1]

Page 9: Reticle Buster Problem

• The industry is reaching a high hurdle with the reticle limits
• Impacts on yield, performance, cost, etc.
• Several ways to address this, which include chiplets

Examples shown:
• AMD EPYC 2 Rome (image from www.servethehome.com)
• NVIDIA Deep Neural Network Accelerator (image from HotChips 2019, Krizhevsky et al.)
• Intel 8th Generation Core with Radeon RX Vega M Graphics (image from Anandtech)

Page 10: High Performance Compute Case Study

51.2 Tbps Switch

Page 11-13: High Performance Compute Analysis

• 51.2 Tbps Switch requires ~4 reticles at 7nm
• 512 lanes of 112Gbps SerDes off package
• Same logic/memory area in each solution, DBI Ultra
• Logic and memory on both layers when stacked.
• IO on top die due to SerDes hard IP
• Base die uses 9 exposures on single 28nm die. Only center exposure uses active circuits

2.1 or 2.5 Interconnect      2 Stacks of 2 Die    2.5D Array of 4 Die
USR (no interposer)          Option A             Option C
HBI (Stitched interposer)    Option B             Option D
Native                       Option E

[Figures: Option A; Option B includes interposer; Option C has no interposer; Option D includes interposer; Option E. Labels: 7nm top layer, 7nm bottom layer, 7nm die, Package Substrate, Stitched Silicon Interposer 65nm, Active Bridge Regions, Pass-Thru Interconnects]
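The bandwidth figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the three numbers quoted above; the variable names are illustrative. It shows that 512 lanes at 112 Gbps give about 57.3 Tbps of raw line rate, while 51.2 Tbps of switch capacity works out to 100 Gbps of payload per lane. The slides do not explain the difference; attributing it to line coding and error-correction overhead is an assumption.

```python
# Figures quoted on the slide.
LANES = 512            # off-package SerDes lanes
LINE_RATE_GBPS = 112   # per-lane SerDes rate
SWITCH_TBPS = 51.2     # switch capacity

# Aggregate raw line rate across all lanes, in Tbps.
raw_tbps = LANES * LINE_RATE_GBPS / 1000

# Payload each lane must carry to reach the stated switch capacity, in Gbps.
per_lane_payload_gbps = SWITCH_TBPS * 1000 / LANES

# Share of the line rate that ends up as switch capacity.
# Assumption: the remainder is coding/FEC overhead; the slides do not say.
payload_fraction = per_lane_payload_gbps / LINE_RATE_GBPS

print(f"raw line rate:    {raw_tbps:.3f} Tbps")
print(f"payload per lane: {per_lane_payload_gbps:.1f} Gbps")
print(f"payload fraction: {payload_fraction:.1%}")
```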

Page 14: Obstacles and Advantages in Analysis

• Utilizing DBI Ultra for yield improvement
• Unable to floorplan the USR in Option A due to limited beachfront with two rows of USR.
• Option E utilizes active and unstitched large base die in 28nm

[Figures: Option A; Option B includes interposer; Option C has no interposer; Option D includes interposer; Option E. Labels: 7nm top layer, 7nm bottom layer, Package Substrate, Stitched Silicon Interposer 65nm, DBI Ultra Interconnects, Bridge Interconnects In Center Exposure, 28nm active bottom layer]

Page 15: Comparative Power Analysis

• Only the lateral chip-chip interconnect power considered
• Native interconnects on Option E consume the least power

Interface    2.1D + 3D: 2 Stacks of 2 Die    2.5D Array of 4 Die
USR          Option A                        Option C
HBI          Option B                        Option D
Native       Option E

[Chart: Normalized Interconnect Power for Options A to E; annotation: -79%]

Page 16: Comparative Latency Analysis

• HBI has an inherently lower latency than a USR interface
• Native interconnects have a 57% improvement over using a USR SerDes

[Chart: Normalized Latency of Short Route for Options A to E; annotations: -36%, -57%]
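The two chart annotations can be combined to compare native interconnects with HBI directly. The sketch below assumes that the -36% annotation is HBI measured against the same USR baseline as the -57% figure for native interconnects; the slide text confirms only the latter. The helper name is hypothetical.

```python
def relative_change(a_delta: float, b_delta: float) -> float:
    """Change of B relative to A, given each one's change vs a shared baseline.

    Deltas are fractions, e.g. -0.36 for "-36%".
    """
    return (1.0 + b_delta) / (1.0 + a_delta) - 1.0


HBI_VS_USR = -0.36     # assumption: the -36% annotation is HBI vs USR
NATIVE_VS_USR = -0.57  # from the slide text

native_vs_hbi = relative_change(HBI_VS_USR, NATIVE_VS_USR)
print(f"native vs HBI short-route latency: {native_vs_hbi:.0%}")
```

Under that assumption, native interconnects come out roughly a third lower in short-route latency than HBI (0.43 / 0.64 of the HBI value).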

Page 17: Comparative Mask NRE Analysis

• Options A and B comprise two 7nm tapeouts
• Option B has higher NRE due to additional cost of 65nm interposer
• Option C is the simplest with a single 7nm tapeout
• Option E has only one 7nm and one 28nm tapeout

[Chart: Normalized Mask Cost for Options A to E; annotations: +4%, -37%, 25%]

Page 18: Comparative Unit Cost Analysis

• Reduced total die area improves yield on Option E due to reduced interface area with native interconnects
• HBI is more efficient in space than a USR, but both impact die size

[Chart: Normalized Unit Cost for Options A to E; annotations: -20%, -77%]

Page 19: Data Summary

• The most compelling case is Option E
• Lowest interconnect power (-79%)
• Lowest short route latency (-57%)
• Lowest unit cost (-77%)
• Additional mask cost (25%)

Page 20: Summary

• What is the barrier for adoption on this?
• DBI Ultra® die-to-wafer strategies enable new architectures
• Leverage the existing interfaces used within die to span die boundaries.
• 3D allows for a path beyond reticle limits without PPA tradeoffs

Acknowledgements: Contributions and PPA analysis performed by Ferran Martorell and Prasad Subramaniam of eSilicon

[Figure: STEM from a thin lamella: Z contrast; scale bar: 3 μm]
