© CEA. All rights reserved
Yvain Thonnart, Mounir Zid
September 18th, 2014
8th IEEE/ACM International Symposium on Networks-on-Chip, Ferrara, Italia
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive, or Optical?
© CEA. All rights reserved
Context
Technological
Convergence of 3D packaging technologies
From Foundry:
Si wafers, TSVmiddle…
From OSATs (Outsourced Assembly & test):
migration from organic/ceramic to Si wafers
With Optical modules for I/O:
chip-level connection
Architectural & Applicative
Manycore multiprocessors
Ever increasing processing needs
Multiprocessors in data centers
Thanks to NoC research on design, hardware & software architecture
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 2
Between technology and architecture the yield issue of advanced nodes
© CEA. All rights reserved
Objective
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 3
Stack N chips on a matrix fashion on an interposer Increase the overall yield with smaller dies
Provide a given average bandwidth
Adapt the design choices and architecture following the different technology options
© CEA. All rights reserved
Outline
Context and objectives
Technological and design assumptions
Results and analysis
Target architectures
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 4
© CEA. All rights reserved
Methodology
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 5
Consistent assumptions Use LETI’s own state of the art to size things
best-in-class in every domain can be inconsistent
Gather figures of merit on each building block Area per bit
Forward latency
Peak bandwidth
Static power per bit (cell leakage)
Idle power per bit (minimum operation point - some clocking in most cases)
Dynamic power per transferred bit
Size system to guarantee system bandwidth Closed-form formulas
(polynomial fits from measurements if not simple)
Report aggregated figures of merit Considering actual utilization of each component
© CEA. All rights reserved
Technological & Design Assumptions
STMicro CMOS 28nm FDSOI chiplets
STMicro CMOS 65nm Low-Power active interposer
40µm pitch for 3D interconnect Mature assembly process
Leti passive & optical interposers line pitches & electrical
characterization
Optical devices performance & sensitivity
External laser source with 20% wall-plug efficiency
Most design data from previous NoCs, electrical and optical chips at Leti, reported to values per bit
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 6
© CEA. All rights reserved
Applicative Assumptions
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 7
Chiplets : 4-core clusters running at 2GHz 40mm² per chiplet, 25W
Communication needs : 1byte/flop at chiplet I/O
Discarded external memory controller contributions Chip I/O is a board/datacom/telecom problematic in itself
Distributed last-level of cache traffic between clusters 512b/line+1/8 protocol overhead
Uniform random traffic between chiplets
Locality is contained inside a cluster
Target contention & NoC contention with XY routing brute-forced up to 20 chiplets
© CEA. All rights reserved
CMOS Interposer Architectures
Use of relaxed pitch CMOS interposer => up to full reticle design with standard process, more with accurate stepper
Must keep reasonably low density for yield
Digital communications with rail-to-rail switching, buffering and pipelining
2 options for physical layer : Synchronous & Asynchronous
2 options for routing layer : 2D-mesh NoC || crossbar/mesh-of-trees
Based on our design experience with ANoC/SNoC
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 8
Synchronous Asynchronous
2D
-me
sh
cro
ssb
ar
© CEA. All rights reserved
Passive Interposer Architectures
Metal lines on interposer, using relaxed pitch BEOL Finer pitch would be too
resistive
DC lines with R-C propagation delay not considered Similar to active CMOS
performance, without possibility of inserting repeaters or pipeline to restore signal
Might be OK for neighbour to neighbour communication
NoC is implemented within top dies, only links on interposer
Impedance controlled transmission lines High-speed serial (HSS)
transmission with TxRx, SerDes and ECC
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 9
© CEA. All rights reserved
Optical Interposer Architectures
Focus on simple Corona-like 1-to-N/N-to-1 ring topologies Early adoption: only end-to-
end routing
External laser sources: lower thermal impact on laser power
Thermal tuning of all microring resonators up to 24 rings/channel for
WDM
High-speed serial communication with TxRx, Serdes & ECC
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 10
© CEA. All rights reserved
Area results
worst of: logic area in chiplets
CuPi area
logic area on interposer
reported to chiplet or inteproser area
Point-to-point lines on CMOS interposer: quadratic explosion in the number
of links compared to the NoCs
High-speed links, electrical or optical: much lower footrprint
challenge the NoC in terms of area
staying around 10% of chiplet size.
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
1—2 4 8 12 16 20
Pe
rce
nta
ge o
f ar
ea
Chiplets
sync ptp async ptp sync NoC
async NoC trans. lines optical links
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 11
© CEA. All rights reserved
Critical height results
worst of: 3D interface height reported to
chiplet height
cumulative ligne height at interposer bissection reported to interposer height.
CMOS point-to-point lines, (synchronous or asynchronous) require the complete interposer
height to connect all chiplets.
High-speed transmission lines suffer from an interface on the perimeter of the chiplet due to pitch constraints.
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
1—2 4 8 12 16 20
Pe
rce
nta
ge o
f h
eig
ht
Chiplets
sync ptp async ptp sync NoC
async NoC trans. lines optical links
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 12
© CEA. All rights reserved
Power results
total power is the sum of leakage power
idle power dynamic power of clock distribution
for CMOS
laser and thermal tuning for optics
useful switching power
The optical links start with a high static cost compared to the other solutions Laser & thermal tuning
For bigger matrices, the electrical NoC is sized accordingly to avoid contention, at the cost of higher power consumption.
0%
10%
20%
30%
40%
50%
60%
1—2 4 8 12 16 20
Pe
rce
nta
ge o
f p
ow
er
Chiplets
sync ptp async ptp sync NoC
async NoC trans. lines optical links
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 13
© CEA. All rights reserved
Latency results
communication set-up latency (pipeline, signaling…) plus packet size×throughput, in processor cycles End to end latency
Synchronous solutions:
huge latency (up to 100 cycles) compared to other solutions
due to low-frequency pipelining on the interposer.
Asynchronous solutions:
latency similar to DC buffering across chip (diffusion equation)
twice as high as high-speed serial solutions (propagation equation)
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 14
0
10
20
30
40
50
60
70
80
90
100
1—2 4 8 12 16 20
Pro
cess
or
cycl
es
Chiplets
sync ptp async ptp sync NoC
async NoC trans. lines optical links
© CEA. All rights reserved
An interposer roadmap for computing
Metallic interposer Active interposer Photonic interposer
Today 2016-18 2020
1-4 chiplets 5-9 chiplets >10 chiplets
Technology Metallic Active Photonic
On-chip bandwidth
≤ 250 Gb/s ≤2 Tb/s >4Tb/s (>2x)
Number of cores
≤ 16 ≤ 36 > 72 (>2x)
Power for on-chip com
~ 1 W ~ 20 W ~ 20 W (~1x)
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 15
© CEA. All rights reserved
Perspectives: Hubeo+ Project
Build a demonstrator for an optical network-in-package to interconnect microprocessors and memories integrated in a silicon photonics interposer
Started Q2 2013 - Demonstrator planned for Q2 2016
Cut view Top view
Technology Assessment of Silicon Interposers for Manycore SoCs: Active, Passive or Optical | Yvain Thonnart | NOCS’14 16
© CEA. All rights reserved
Thank you
© CEA. All rights reserved
Technical challenges High-density high-performance integration Optimized short-range optical WDM system
devices and protocols
E/O co-design of drivers and optical NoC architecture
Optical Communications on Interposer for Chip-Multiprocessors | Yvain Thonnart | 18
Thermal control of WDM optical devices