High Performance, Multi-CPU Power Signoff for Mega Designs
Patrick Sproule
Director of Engineering, VLSI Methodology
Nvidia Power Analysis Requirements
Static and Dynamic Full Chip Power Analysis
Tool implementation must handle both sub-chip analysis and full-die analysis in a single session. Ideally, it provides full domain analysis at full accuracy in a single run.
Design Size Scalability
Full flat design analysis must handle both small and the largest production designs on existing/available compute resources.
Runtime Predictability
Designs get larger, but the schedule time for power analysis is required to stay constant or shrink. Closed-ended runtime estimates are required.
Clear Reporting
The large amount of analysis data must be condensed into clear reports.
Power Analysis Challenges
Designs have seen device count grow by 4 orders of magnitude in less than 10 years.
The increased number of metal layers and the modelled device count cause the calculation to grow faster than tools and compute resources.
This results in long runtimes and/or requires inefficient subdivision of designs.
Designs have also become highly replicated at a multitude of hierarchy levels.
Complexity of data handling and integration within the tools.
Many engineers run analysis at different hierarchy levels.
Recreation of the db and duplication of analysis cost schedule.
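To put the growth claim above in per-year terms, the following is a minimal sketch that assumes constant exponential growth (an assumption, not stated in the slides) and takes the full 10 years as the period. The function names are illustrative only.

```python
import math

def annual_growth_factor(orders_of_magnitude: float, years: float) -> float:
    """Constant yearly multiplier that yields 10**orders_of_magnitude over `years`."""
    return 10 ** (orders_of_magnitude / years)

def doubling_time_months(orders_of_magnitude: float, years: float) -> float:
    """Months needed for the count to double at that constant rate."""
    return 12 * years * math.log10(2) / orders_of_magnitude

# 4 orders of magnitude in 10 years (the slide says "less than 10",
# so these figures are a lower bound on the growth rate).
factor = annual_growth_factor(4, 10)
months = doubling_time_months(4, 10)
print(f"{factor:.2f}x per year, doubling every {months:.1f} months")
```

Under these assumptions the device count grows by roughly 2.5x per year, i.e. it doubles about every 9 months.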
Current Rail Analysis Methodology
• Partition-based hierarchical methodology is planned and executed within a large design team at many levels
• Unique design technologies, especially in low power: multi-power domains, power gating switches, …
[Figure: design hierarchy (Full Chip, chiplet, partition) with owners at each level: Partition Owners, Chiplet Owners, Full Chip Integration]
Typical Extraction and Rail Analysis
• Rail Analysis
– Power-Grid-View (PGV): physical modeling of IP
– Current Signatures
– Extraction
– Rail: RC, current, geometry
[Figure: flow diagram with blocks Physical Database, PGDB, Current Signatures, Primitive PGV, RC Extraction, Rail Analysis, and IR Drop Results/Plots]
Hierarchical Rail Analysis Method (H-PGV)
[Figure: hierarchical flow diagram with blocks Partition 1 … Partition N, H-PGV 1 … H-PGV N, Top-Level Database, PGDB, Current Signatures, Primitive PGV, RC Extraction, Rail Analysis, and IR Drop Results/Plots]
H-PGV Advantages
H-PGV generation runtime is minimal compared to full chip database setup for IR-drop analysis
H-PGVs can be generated in parallel
Hierarchical methodology supports bottom-up and top-down rail analysis.
Capturing H-PGV boundary condition for ECO at partition level (top down push)
Full-chip and sub-chip level analysis time is greatly improved with the same accuracy.
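The point that H-PGVs can be generated in parallel can be sketched as below. This is only an illustration: `generate_hpgv` is a hypothetical stand-in for whatever partition-level job a real flow would launch, the partition names are made up, and reusing one view for replicated partitions is an assumption rather than something the slides state.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_hpgv(partition: str) -> str:
    """Hypothetical stand-in for one partition-level H-PGV generation job."""
    # A real flow would launch the abstraction tool here; this only names the output.
    return f"{partition}.hpgv"

def generate_all(partitions: list[str], max_workers: int = 4) -> dict[str, str]:
    """Build one H-PGV per unique partition, running the jobs concurrently."""
    unique = sorted(set(partitions))  # assumed: replicated partitions share a view
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        views = list(pool.map(generate_hpgv, unique))
    return dict(zip(unique, views))

# Five placed partitions, three of them copies of the same block.
instances = ["part_a", "part_b", "part_a", "part_c", "part_a"]
views = generate_all(instances)
print(views)
```

Because each job depends only on its own partition, the wall-clock cost of the generation step is bounded by the slowest partition rather than by the sum of all of them, given enough workers.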
Flat vs. Hierarchical Correlation
Example Analysis: Sub-chip level
14.4M total primitive instance count (modelled cells)
8.9M regular logic and memory cells
5.5M filler, tap, decap cells
18 total partitions in chiplet
7 unique partitions
3 partitions replicated 4 times each
H-PGV run metrics:
Runtime: 18~32 minutes
Memory: 40~45G
[Figure: histogram, "% Difference Between Flat and Hierarchical IR Drop Analysis"; x-axis: % difference (0 to 2.5); y-axis: % of Instances (0 to 50)]
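A correlation histogram of this kind could be produced along the following lines. This is a sketch under stated assumptions: the per-instance IR drop values are invented sample data, the 0.5 bin width and 2.5 upper limit are read off the axis ticks, and the slides do not say how the original plot was actually computed.

```python
def percent_difference(flat_mv, hier_mv):
    """Per-instance |hierarchical - flat| as a percentage of the flat IR drop."""
    return [abs(h - f) / f * 100.0 for f, h in zip(flat_mv, hier_mv)]

def histogram_percent(values, bin_width=0.5, max_value=2.5):
    """Share of instances (in %) per bin; values above max_value land in the last bin."""
    n_bins = int(max_value / bin_width)
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v / bin_width), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [100.0 * c / total for c in counts]

# Hypothetical IR drop values in mV for four instances (not from the slides).
flat = [50.0, 40.0, 80.0, 100.0]
hier = [50.1, 40.3, 80.0, 101.2]

diffs = percent_difference(flat, hier)
histogram = histogram_percent(diffs)
print(histogram)
```

Each entry of `histogram` is the percentage of instances whose flat-versus-hierarchical difference falls in the corresponding 0.5-point bin.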
Rail Analysis at Full Chip Level
Design                 Metal Layers   # of Transistors (Billions)   RAM (GB)       CPU   Rail Analysis Runtime (Days)
GF100 (flat)           9              3.0                           200            1     2.25
GK104 (flat)           11             3.5                           600            8     10
GK110 (flat)           11             7                             1000+ (est.)   8     26 (est.)
GK110 (hierarchical)   11             7                             650            8     8
[Figure: "Design Progression" chart across Tesla, Fermi, and Kepler; axes: Design Instances (1.00E+06 to 1.00E+09) and Runtime (d) Per Design (0 to 30); series: Design Sizes, VS, EPS, EPS-H]
Nvidia Scale and Runtime Issues
Design size growth is outpacing tool and resource capability.
Voltus on Kepler
~380M instances flat analysis – tsmc28nm
Main resource:
~725 GB memory on a 1 TB, 32-CPU machine.
Static and Dynamic Signoff Power analysis at VDD & VSS (done as parallel runs)
21 hour runtime per analysis domain.
~8x runtime improvement over previous method with equivalent accuracy.
Rail Analysis at Full Chip Level
Design                 Metal Layers   # of Transistors (Billions)   RAM (GB)       CPU   Rail Analysis Runtime (Days)
GF100 (flat)           9              3.0                           200            1     2.25
GK104 (flat)           11             3.5                           600            8     10
GK110 (flat)           11             7                             1000+ (est.)   8     26 (est.)
GK110 (hierarchical)   11             7                             650            8     8
GK110 (VOLTUS)         11             7                             700            32    21 hours
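As a rough cross-check of the GK110 rows, the runtimes can be put on a common unit and compared. This is a minimal sketch using only the rounded table values; the flat figure is itself an estimate in the source, and the runs use different CPU counts (8 versus 32), so the ratios are wall-clock comparisons and not per-CPU efficiency.

```python
HOURS_PER_DAY = 24

# GK110 runtimes from the table, converted to hours.
RUNTIME_HOURS = {
    "GK110 (flat, est.)": 26 * HOURS_PER_DAY,
    "GK110 (hierarchical)": 8 * HOURS_PER_DAY,
    "GK110 (VOLTUS)": 21,
}

def speedup(baseline_hours: float, new_hours: float) -> float:
    """Wall-clock speedup of the new run relative to the baseline."""
    return baseline_hours / new_hours

voltus = RUNTIME_HOURS["GK110 (VOLTUS)"]
for name, hours in RUNTIME_HOURS.items():
    print(f"{name}: {hours} h, {speedup(hours, voltus):.1f}x the VOLTUS runtime")
```

With these rounded inputs the hierarchical run comes out at about 9x the Voltus runtime, the same order as the ~8x improvement quoted on the slide, and the estimated flat run at about 30x.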
[Figure: "Design Progression" chart across Tesla, Fermi, Kepler, Kepler-V, and Next; axes: Design Instances (1.00E+06 to 1.00E+09) and Runtime (d) Per Design (0 to 30); series: VS, EPS, EPS-H, VOLTUS]
Nvidia Scale and Runtime Issues
Memory requirement
Summary
Voltus meets our needs for rail analysis in both accuracy and runtime, with runtimes far lower than expected.
Further testing showed it is possible to run the combined VDD-GND domain in a single pass with a 50-hour runtime, using multi-threaded and distributed capabilities.
The capability to run both multi-threaded and distributed gives us the flexibility to manage schedule and resource requirements.
Congratulations to the Voltus team on delivering a disruptive runtime improvement.
Q&A