3dic: which flavor for you? paul franzon north carolina state university raleigh, nc [email protected]...
TRANSCRIPT
3DIC: Which Flavor for You?3DIC: Which Flavor for You?
Paul Franzon
North Carolina State University
Raleigh, NC
2© Paul Franzon, 2011
OutlineOutline
The 3D Smorgasbord
Matching to value propositions
Some NCSU work
Opportunities and Road Blocks
3© Paul Franzon, 2011
The Raw IngredientsThe Raw Ingredients
Bonding
Through Silicon Vias (TSV)
Sony
Ziptronix
Typically 2 mm – 30 mm pitch0.5 mm potential
height
diameter
Height/diameter ~ 10:1
4© Paul Franzon, 2011
AssemblyAssembly
Wafer bonding and thinning Thin and thick chip
handling
Substrates and Redistribution Layers
Test Dice
(Thin)
Temporary Carrier
5© Paul Franzon, 2011
Transistor/TSV Integration OptionsTransistor/TSV Integration Options
Face-to-Face Face-to-Back Back-to-Back
Via-First/
Via-Middle
Via-Last
SOI
Each option offers its own unique blend of Cost Via Density Routing Congestion Heat Dissipation
6© Paul Franzon, 2011
Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)
Wafer to Wafer (W2W)
Wafer 1
Wafer 2
Wafer 3
Mount Thin
Mount Thin BumpAdvantages Disadvantages
Simpler Identical sized chips
Lower Cost Accumulated Yield Loss
Higher via Density
Better Alignment Thinner Chips1 tier 2 tiers 3 tiers 4 tiers
90% 81% 73% 65%
7© Paul Franzon, 2011
Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)
ONE chip to wafer (or wafer stack) (face mounted)
Wafer 1
Wafer 2
Test
Test
Dice
Advantages Disadvantages
Known Good Die – no accumulated yield loss
Higher cost – serial pick and place
Different die sizes Worse alignment
Wafer die size largest
Solder bump requires coarse TSVs in one layer
Limited to stack of two
8© Paul Franzon, 2011
Thinned Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)Thinned Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)
Multiple chips to wafer (or wafer stack)
Wafer 1
Wafer 2
Test
Test
Dice
Wafer 3
Mount Thin
Temporary Carrier
Attach/Demount
Advantages Disadvantages
Known Good Die in multiple chips Highest cost – temporary carrier
Thin TSV – Little area loss to connections to solder bumps
Still in research
Limited to stack of two
9© Paul Franzon, 2011
Interposers and RDLsInterposers and RDLs
Redistribution layer= thick metal (layers) added to wafers to customize interface to next chip in 3D stack
Interposer= Silicon or other carrier used to mount chips WITHIN package Examples:
Modifed Older Process
50 – 200 mm
1-2 mm thick metal
Modifed 65 nm or 90 nm Back End of Line Process
10© Paul Franzon, 2011
Possible AssembliesPossible Assemblies
In a commercial product integrated amongst multiple vendors, the IO within the chip stack will not line up without standards
ASIC Memory
Face to Face
ASICMemory
Chip on Substrate on Chip
ASIC
Memory
TSV Through ASICwith backside
Redistribution Layer (RDL)
Memory
TSV Through Memorywith backside
Redistribution Layer (RDL)
ASIC
11© Paul Franzon, 2011
ASIC
Substrate AlternativesSubstrate Alternatives
Side-by-side mounting Silicon Interposer or thin film Multi-chip Module
Top-to-bottom mounting
RAMASIC
Memory
ASIC
Conventional Interposere.g. High Density Laminate
Memory
ASIC
TSV Enabled Silicon Interposer
Face-up-Silicon Interposer
ASICRAM
TSV Enabled
No TSVs
12© Paul Franzon, 2011
OutlineOutline
The 3D Smorgasbord
Matching to value propositions
Overview of NCSU work
Opportunities and Road Blocks
13© Paul Franzon, 2011
Value PropositionsValue Propositions
Fundamentally, 3DIC permits:
Shorter wires consuming less power, and costing less The memory interface is the biggest source of
large wire bundles
Heterogeneous integration Each layer is different! Giving fundamental performance and cost
advantages, particularly if high interconnectivity is advantageous
Consolidated “super chips” Reducing packaging overhead Enabling integrated microsystems
14© Paul Franzon, 2011
Memory on LogicMemory on Logic
Conventional TSV Enabled
nVidea
or
x32
to
x128
or
N x 128
“wide I/O”
Less Overhead
Flexible bank access
Less interface power
3.2 GHz @ >10 pJ/bit
1 GHz @ 0.3 pJ/bit
Flexible architecture
Short on-chip wires
Processor
Mobile
15© Paul Franzon, 2011
Energy per OperationEnergy per Operation
DDR3 4.8 nJ/word
MIPS 64 core 400 pJ/cycle
45 nm 0.8 V FPU 38 pJ/Op
20 mV I/O 128 pJ/Word
(64
bit
wo
rds)
LPDDR2 512 pJ/Word
SERDES I/O 1.9 nJ/Word
On-chip/mm 7 pJ/Word
TSV I/O (ESD) 7 pJ/Word
TSV I/O (no ESD) 2 pJ/Word
Optimized DRAM core 128 pJ/word
Optimized ReRAM core 12? pJ/word
11 nm 0.4 V core 200 pJ/op
1 cm on interposer 45 pJ/Word
Computation is more efficient than communication or memory
16© Paul Franzon, 2011
Heterogeneous IntegrationHeterogeneous Integration
The poorly understood step-child
Opportunities? “Disintegrated memory” (Tezzaron)
3D stacks for miniature sensors
Optimized technology stack for advanced imagers
Advanced processors for systolic computation
Highly connected neuromorphic architectures
17© Paul Franzon, 2011
Choosing a flavor!Choosing a flavor!
Highest 3Dconnection density
Continued stacking
Easiest, lowest assembly cost
Easy and cost effective
Most difficult but highest end yield
ASICRAMASIC
Memory
ASIC
Better Thermal Management and Power Integrity, simplified supply chain
18© Paul Franzon, 2011
OutlineOutline
The 3D Smorgasbord
Matching to value propositions
Overview of NCSU work
Opportunities and Road Blocks
19© Paul Franzon, 2011
Synthetic Aperture Radar ProcessorSynthetic Aperture Radar Processor
Built FFT in Lincoln Labs 3D Process
Metric Undivided Divided %Bandwidth (GBps) 13.4 128.4 +854.9Energy Per Write(pJ) 14.48 6.142 -57.6Energy Per Read (pJ)
68.205 26.718 -60.8
Memory Pins (#) 150 2272 +1414.7Total Area (mm2) 23.4 26.7 +16.8%
Thor Thorolfsson
20© Paul Franzon, 2011
3D FFT Floorplan3D FFT Floorplan
All communications is vertical
Support multiple small memories WITHOUT an interconnect penalty AND Gives 60% memory power savings
Thor Thorolfsson
21© Paul Franzon, 2011
2DIC vs. 3DIC Implementation2DIC vs. 3DIC Implementation
vs.
Metric 2D 3D Change
Total Area (mm2) 31.36 23.4 -25.3%
Total Wire Length (m) 19.107 8.238 -56.9%
Max Speed (Mhz) 63.7 79.4 +24.6%
Power @ 63.7MHz (mW) 340.0 324.9 -4.4%
FFT Logic Energy (µJ) 3.552 3.366 -5.2%
Thor Thorolfsson
22© Paul Franzon, 2011
Tezzaron SAR ProcessorTezzaron SAR Processor
2D 3D mPl 3D
588 487.3 -1.17% 464.8 -21%
31.6 33.84 +7.1% 38.74 +22.6%
316.1 338.4 +7.1% 387.4 +22.6%
1.51 1.79 -15.5% 0.984 -45.2%
5.975 5.692 -4.8% 5.21 -12.9%
10.3 3.1 -71%
Metric
Total Wire length (mm)
Max. Frequency (MHz)
Max Performance (MFlops)
Parasitic Power (mW)
Logic Power (mW)
Memory Power (W)
Thor Thorolfsson
23© Paul Franzon, 2011
Mobile GraphicsMobile Graphics
Problem: Want more graphics capacity but total power is constrained
Solution: Trade power in memory interface with power to spend on computation
POP with LPDDR2 TSV Enabled
LPDDR2
GPU
TSV IO
GPU
532 M triangles/s 695 M triangles/s
Won Ha Choi
Power Consumption
Power Consumption
24© Paul Franzon, 2011
3D Miniaturization3D Miniaturization
Miniature Sensors mm3 scale - Human Implantable (with Jan
Rabaey, UC(B)) cm3 scale - Food Safety & Agriculture (with
KP Sandeep, NCSU)
Biggest problem is power delivery and storage
Secondary problem is availability of “infrastructure” items (MEMS, sensors, power storage) in 3D compatible form factors
Peter Gadfort, Akalu Lentiro, Steve Lipa
25© Paul Franzon, 2011
CAD FlowCAD Flow
Architectural
Evaluator
NCSU tools Commercial Tools Desirable Tools
SystemC Modelse.g. NOC performance, power evaluator
Trial Designse.g. Interconnect, Thermal
SRAM evaluation
Partitioning
Floorplanner
2520
µm
2524 µm
Instruction SRAM
CPU 1
CPU 2
Data Memory Controller
Wishbone Traffic Cop SRAM Address Pins
Power Stripe
Power Rings
2520
µm
2524 µm
Data SRAM Data SRAM
IMMU 1 DMMU 1
OpenRISC Instruction Wishbone Interface Module 1
OpenRISC Data
Wishbone Interface Module 1
SRAM Address PinsSRAM Address Pins
2520
µm
2524 µm
Data SRAM Data SRAM
IMMU 2 DMMU 2
OpenRISC Instruction Wishbone Interface Module 2
OpenRISC Data
Wishbone Interface Module 2
SRAM Address PinsSRAM Address Pins
TSV placement
Routability Evaluation
True 3D Floorplanner
Richer Calculators
Rhett Davis, Steve Lipa
26© Paul Franzon, 2011
… CAD Flow… CAD Flow
NCSU tools Commercial Tools Desirable Tools
Individual Tier Place and Route
Chip Reassembly & Layout
True 3D Design Kit
Verification
Thermal Extraction
Thermal Verification
LVS, DRC and Performance Verification
True 3D Design Kit
3D DFM
DFT Optimization
Merged decks
27© Paul Franzon, 2011
OutlineOutline
The 3D Smorgasbord
Matching to value propositions
Overview of NCSU work
Opportunities and Road Blocks
28© Paul Franzon, 2011
Opportunities and RoadblocksOpportunities and Roadblocks
Roadblocks:
Cost and price! Fabrication AND test
Need to build up volume to reduce price
Cost of added test steps
Complexity of design and codesign Packaging, thermal,
power delivery, etc.
Opportunities:
Solving these problems!
Creating unique opportunities that make price (cost) not an issue!
High value products providing pathway to high volume products
29© Paul Franzon, 2011
ConclusionsConclusions
Three dimensional integration offers potential to Deliver memory bandwidth power-effectively; Improve system power efficiency through 3D
optimized codesign Miniaturize sensors to cm and mm scale
Main challenges in 3D integration (from design perspective) Effective early codesign to realize these
advantages in workable solutions Managing cost and yield, including test and test
escape Managing thermal, power and signal integrity
while achieving performance goals (For sensors) technology mix for sensing, power
scavenging, storage in 3D form factors
30© Paul Franzon, 2011
AcknowledgementsAcknowledgements
Faculty: William Rhett Davis, Michael B. Steer,
Professionals: Steven Lipa, Students: Hua Hao, Samson Melamed, Peter Gadfort, Akalu Lentiro,
Sonali Luniya, Christopher Mineo, Julie Oh, Won Ha Choi, Zhou Yang, Ambirish Sule, Gary Charles, Thor Thorolfsson,
Department of Electrical and Computer EngineeringNC State University
31© Paul Franzon, 2011
Challenges in 2013 TimeFrameChallenges in 2013 TimeFrame
Challenge Problem Solution(s)
Cost TSV processing adds ~10% to chip fab cost
Focus on adding value, orRecover cost from simpler packaging
Test Can not probe ~10,000 microbumps cost-effectively
Use of BIST and a dedicated probe test port; good partitioning
Thermal Hot spotsMemory leakage
Better integration of tools; Better thermal isolation and conduction layers
Codesign Managing performance; thermal; power and signal integrity simultaneously
ESL (SystemC) centric early design flow (“pathfinding”)
TSV self-test
32© Paul Franzon, 2011
Opportunities beyond 2013Opportunities beyond 2013
An airborne 1 cm resolution Synthetic Aperture Radar processor, with three dimensional imaging
Northrop Grumman
3D SAR (NASA)
Floating Point Performance
1.8 TFLOPS
Memory Bandwidth
600 GB/s
Memory Capacity
16+ GB
“Extreme” 3D Integration
33© Paul Franzon, 2011
Challenges in “Extreme” IntegrationChallenges in “Extreme” Integration
Challenges – Large Scale Problem
Processing cost, and yield management
Approaching cost levels of conventional packaging while managing yield and processing costs
Thermal Getting heat out laterally, not vertically, preferably without flowing liquids
Power delivery Getting current in laterally, not vertically
Early design Good system design in context of many tradeoffs
Test Managing test cost
Challenges – Small Scale Problem
Power delivery Power flux to suit draw in environment
Power storage Energy density; 3D form factor
Sensors Sensor integration in 3D form factor