![Page 1: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/1.jpg)
An Overview of Standard Cell Based Digital VLSI Design
With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip
Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker, Bevan BaasVCL Laboratory, UC Davis
![Page 2: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/2.jpg)
Outline
• Overview of standard cell-based design
• Design of the AsAP1 and KiloCore chips including CAD Tool Flow
![Page 3: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/3.jpg)
Standard cell vs. Full-custom IC design
Standard-cell based IC design
Design using standard cells
Standard cells come from library provider
Many different choices for cell size, delay, leakage power
Many EDA tools to automate this flow
Shorter design time
Custom IC design (e.g., magic)
Design all by yourself
Higher performance
Lower energy per workload (lower power)
Smaller chip area
![Page 4: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/4.jpg)
Standard cell based VLSI design flow
“Front end”
System specification and architecture
HDL (verilog or VHDL) coding
Behavioral simulations using RTL (HDL)
Synthesis
Gate-level simulations
“Back end”
Floorplanning, Power grid design
Standard-cell Placement
Interconnect routing
DRC (Design Rule Check)
LVS (Layout vs Schematic)
Dynamic simulation and static timing analysis
![Page 5: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/5.jpg)
Outline
• Overview of standard cell-based design
• Design of the AsAP1 and KiloCore chips including CAD Tool Flow
![Page 6: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/6.jpg)
© B. Baas 6
FIFO 032
words ALU
MAC
Control
IMem 64
words
DMem128
words
OSC
Static
config
Dynamic
config
Output
AsAP1 Block Diagram
• GALS array of identical processors – Each processor is a reduced complexity programmable DSP with small
memories
– Each processor can receive data from any two neighbors and send data to any of its four neighbors
FIFO 132
words
![Page 7: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/7.jpg)
KiloCore
• Developed by the VLSI Computation Laboratory at UC Davis, with a similar architecture to AsAP(Asynchronous Array of Processors)
• A processing chip containing multiple uniform simple processor elements
• Globally Asynchronous Locally Synchronous (GALS)– Each processor has its local clock generator
• Each processor can communicate with its neighbor processors using dual-clock FIFOs
![Page 8: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/8.jpg)
KiloCore Design
• Contains 1,000 processors on one chip
• Fastest clock rate processor designed at a university
• 12 memories containing 64 KB each for 768 KB of shared memory
8
KiloCore Block Diagram
Single Processor One 64 KB memory
![Page 9: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/9.jpg)
Simple Diagram of the Front-End Design Flow
System
Specification
RTL
CodingSynthesis Gate level code
INV (.in(a), .out(a_inv));
AND (.in1 (a_inv), .in2(b),
.out(c));
Example: c = !a & b
Cab
![Page 10: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/10.jpg)
Simple diagram of the Back-end design flow
gate level Verilog
from synthesisPlace
&
Route
Final layout
(go for fabrication)
DRC
Gate level VerilogLVS
Timing information
Gate level dynamic and/or static analysis
Design rule
check
Layout vs.
schematic
![Page 11: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/11.jpg)
Back-End: Physical Design Flow
![Page 12: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/12.jpg)
Back-end design of AsAP1
• Technology: TSMC 0.18 μm CMOS
• Standard cell library: Artisan
• Tools– Synthesis: Synopsis Design compiler
– Placement & Route: Cadence Encounter
– DRC & LVS: Mentor Calibre
– Static timing analysis: Primetime
![Page 13: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/13.jpg)
Back-End Design of KiloCore
• Technology: IBM 32 nm PD-SOI CMOS
• Standard cell library: ARM-Artisan
• Tools– Synthesis: Synopsis Design compiler
– Placement & Route: Cadence Encounter
– DRC & LVS: Mentor Calibre
– Static timing analysis: Primetime and UltraSim (Spice)
![Page 14: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/14.jpg)
Flow of Placement and Routing
• Import needed files
• Floorplan
• Placement & in-place optimization
• Clock tree generation
• Routing
![Page 15: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/15.jpg)
Import Needed Files
• Gate-level verilog (.v)
• Geometry information (.lef)
• Timing information (.lib)
INV (.in (a), .out (a_inv));
AND (.in1 (a_inv), .in2 (b), .out (c));
INV: 1um width AND: 2 um width
INV: 1ns delay; AND: 2 ns delay
INV ANDa
b
C
Delay (ac): 1ns + 2ns = 3ns
![Page 16: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/16.jpg)
Floorplan
• Size of chip
• Location of Pins
• Location of main blocks
• Power supply: give enough power for each gate
Vdd (Metal)
Power supply (1.8 V)
current
Gate 1 Gate 2 Gate 3 Gate 4
1.75 V
Voltage drop equation: V2 = V1 – I * R
1.7 V
(need another power stripe)
1.65 V
Gnd
![Page 17: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/17.jpg)
Floorplan of a single processor
Inst
Mem
ClockInFIFO
0
Data Mem
ALU
MAC
Control
InFIFO
0
![Page 18: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/18.jpg)
Floorplan of Single KiloCore Processor
18
![Page 19: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/19.jpg)
Placement & In-Place Optimization
• Placement: place the gates (standard cells)– Utilizes a long series of very complex optimizations to meet all
design goals as best as possible. Design goals include: maximum delay of all paths, minimize length of all wires (to increase probability of a successful route), etc.
• In-place optimization– Why: there will always be a timing difference between synthesis
and layout (e.g., actual wire delay is different than predicted)
– How: change gate size, insert buffers
– May not change the circuit function!!
![Page 20: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/20.jpg)
Placement of a single AsAP1 processor
![Page 21: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/21.jpg)
Placement of Single KiloCore Processor
21
![Page 22: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/22.jpg)
Placement of Single KiloCore Processor
22
![Page 23: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/23.jpg)
Clock Tree Design
• Main parameters: skew, delay, transition time
Q
QSET
CLR
S
R Q
QSET
CLR
S
R
Q
QSET
CLR
S
R
Q
QSET
CLR
S
R Q
QSET
CLR
S
R
Q
QSET
CLR
S
R
Q
QSET
CLR
S
R Q
QSET
CLR
S
R
Q
QSET
CLR
S
R
Original Clock
Clock Delay = y
Clock Skew= x -yClock Delay= x
![Page 24: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/24.jpg)
Clock tree of a single AsAP1 processor
![Page 25: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/25.jpg)
Clock Tree of a Single KiloCore Processor
25
• Colors
indicate clock
phase which
is the same as
clock skew
![Page 26: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/26.jpg)
Routing
• Routing is the second step in the “Standard Cell Place & Route” process and consists of the CAD P&R tool routing all necessary signal wires– Local power and ground connections to standard cells are
made during an earlier power striping step and are made by a much simpler process of simply laying down horizontal power stripes
• Routing consists of two main steps– Connection of global signals (power)
– Connect other signals
![Page 27: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/27.jpg)
Metal Layer Topology
Routing
![Page 28: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/28.jpg)
Layout of a Single AsAP1 Processor After Routing
Area:
0.8mm x 0.8mm
![Page 29: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/29.jpg)
Layout of the first generation 6x6 AsAP1
One processor
Area: 30 mm^2
in 180 nm CMOS
- 36 processors
- 114 PADs
![Page 30: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/30.jpg)
Routing on a Single KiloCore Processor
30
• 239 μm x 231.3 μm
• 574,733 transistors
![Page 31: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/31.jpg)
KiloCore Chip Layout
• KiloCore Chip
– Entire chip takes up 8 mm x 8 mm or 64 mm2
– LVDS pairs used for high speed data I/O
– Drivers connected through C4 bump array
31 KiloCore Chip Layout
Single Processor
SRAM Memories
I/O Drivers
![Page 32: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/32.jpg)
Verification after layout
• DRC (design rule check)
• LVS (layout vs. schematic)– GDS-II vs. (verilog + spice module)
• Gate level verilog dynamic simulation– Mainly check the function
– Different with synthesis result
![Page 33: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/33.jpg)
Useful tools
• Dynamic Simulation:
– Modelsim (Mentor), NC-verilog (Cadence), Active-HDL
• Synthesis:
– Design-compiler, design-analyzer (Synopsys)
• Placement & Routing
– Encounter & Virtuoso (Cadence)
– Astro (Synopsys)
• DRC & LVS
– Calibre (Mentor)
– Dracula (Cadence)
• Static Analysis
– Primetime (Synsopsys)
![Page 34: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/34.jpg)
34
Flow: Standard-cell based
Technology: TSMC 0.18 µm
Transistors:
1 Proc 230,000
Chip 8.5 million
Max speed: 610 MHz @ 2.0 V
Area:
1 Proc 0.66 mm²
Chip 32.1 mm²
Power (1 Proc @ 1.8V, 475 MHz):
Typical application 32 mW
Typical 100% active 84 mW
Power (1 Proc @ 0.9V, 116 MHz):
Typical application 2.4 mW
Single
Processor
OSC FIFOs
DMemIMem
Chip Micrograph of the 36-Core AsAP1
• ISSCC 2006
• HotChips 2006
• IEEE Micro 2007
• JSSC 2008
![Page 35: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/35.jpg)
KiloCore Chip
Slide 35
7.8
2 m
m
7.67 mm8 mm
8 m
m
Technology 32nm IBM PDSOI CMOS
Processors
1000 per chip
1.78 GHz @ 1.1 V
1.24 GHz, 18 mW
@ 0.90 V
115 MHz, 0.7 mW
@ 0.56 V
Indep. Mems 12 per chip
Num. Oscs. 2012
Die Area 64 mm2
Array Area 60 mm2
Transistors 621 Million
C4 Bumps 564 (162 I/O)
Package 676 Pad Flip-Chip BGA
HotChips 2016
![Page 36: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/36.jpg)
![Page 37: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/37.jpg)
KiloCore Measurements
• A maximum of 1.78 trillioninstructions per second
– Assuming a custompackage design
• Processors achieve theiroptimal energy times time of 11.1 (pJ x ns / instruction) at 0.9 V
• At minimum voltage, KiloCore can perform 115 billion instructions per second using 0.7 Watts—low enough to be powered by a single AA battery!
37
HotChips 2016
Supply
Voltage
Max Clock
Freq
(MHz)
Total Chip
Instructions /
sec
Total Chip
Power
1.10 V 1782 1.78 Trillion 39 Watts
0.90 V 1237 1.24 Trillion 17 Watts
0.84 V 1000 1.00 Trillion 13 Watts
0.75 V 638 638 Billion 6.3 Watts
0.56 V 115 115 Billion 0.7 Watts
![Page 38: An Overview of Standard Cell Based Digital VLSI Designbbaas/116/notes/Handout.std.cell.design.pdfBack-end design of AsAP1 •Technology: TSMC 0.18 μm CMOS •Standard cell library:](https://reader035.vdocuments.us/reader035/viewer/2022081407/605004a51701a351d0766629/html5/thumbnails/38.jpg)