High Performance, Multi-CPU Power Signoff for Mega Designs

Patrick Sproule, Director of Engineering, VLSI Methodology

Nvidia Power Analysis Requirements

Static and Dynamic Full Chip Power Analysis
– Tool implementation must handle both sub-chip analysis and full-die analysis in a single session.
– Ideally provide full-domain analysis for full accuracy in a single run.

Design Size Scalability
– Full flat design analysis must handle both the smallest and the largest production designs on existing, available compute resources.

Runtime Predictability
– Designs get larger, but schedule time for power analysis is required to stay constant or shrink.
– Closed-ended runtime estimates are required.

Clear Reporting
– Large amounts of analysis data must be condensed into clear reports.

Power Analysis Challenges

Designs have seen device count grow by 4 orders of magnitude in less than 10 years.

The increased number of metal layers and the modelled device count cause the computation to grow faster than tools and compute resources.

The result is long runtimes and/or inefficient subdivision of designs.

Designs have also become highly replicated at a multitude of hierarchy levels.

Complexity of data handling and integration within the tools.
– Many engineers run analysis at different hierarchy levels.
– Recreation of the database and duplication of analysis cost schedule time.
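As a rough check on the growth figure quoted above (four orders of magnitude in under ten years), the implied yearly multiplier can be worked out directly. This is a minimal sketch that assumes steady exponential growth over exactly ten years; since the slide says "less than 10 years", the real rate would be at least this high.

```python
import math

# Figures from the slide: 4 orders of magnitude in (at most) 10 years.
ORDERS_OF_MAGNITUDE = 4
YEARS = 10

# Constant yearly multiplier g such that g**YEARS == 10**ORDERS_OF_MAGNITUDE.
# Steady exponential growth is an assumption, not something the slide states.
annual_factor = 10 ** (ORDERS_OF_MAGNITUDE / YEARS)

# Time for the device count to double at that rate.
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"annual growth factor: {annual_factor:.2f}x")
print(f"doubling time: {doubling_time_years * 12:.1f} months")
```

Under these assumptions the device count grows about 2.5x per year, doubling roughly every nine months.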

Current Rail Analysis Methodology

• Partition-based hierarchical methodology is planned and executed within a large design team at many levels

• Unique design technologies, especially in low power: multi-power domains, power gating switches, …

[Figure: design hierarchy of partition, chiplet, and full chip, with the matching owners: Partition Owners, Chiplet Owners, Full Chip Integration]

Typical Extraction and Rail Analysis

• Rail Analysis
– Power-Grid-View (PGV): physical modeling of IP
– Current Signatures
– Extraction
– Rail: RC, current, geometry

[Figure: rail analysis flow. Blocks: Physical Database, PGDB, RC Extraction, Current Signatures, Primitive PGV, Rail Analysis, IR Drop Results/Plots]

Hierarchical Rail Analysis Method (H-PGV)

[Figure: hierarchical rail analysis flow. Blocks: Partition 1 … Partition N, H-PGV 1 … H-PGV N, Top-Level Database, PGDB, RC Extraction, Current Signatures, Primitive PGV, Rail Analysis, IR Drop Results/Plots]

H-PGV Advantages

H-PGV generation runtime is minimal compared to full chip database setup for IR-drop analysis

H-PGVs can be generated in parallel

Hierarchical methodology supports bottom-up and top-down rail analysis.

Capturing H-PGV boundary condition for ECO at partition level (top down push)

Full and Sub-chip level analysis time greatly improved with same accuracy
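The parallel-generation point can be illustrated with a small scheduling sketch. It assumes that a replicated partition can reuse a single H-PGV built from its master, which the slides suggest but do not state outright. `build_hpgv`, the partition names, and the use of threads are all hypothetical stand-ins; a real flow would launch tool jobs on a compute farm.

```python
from concurrent.futures import ThreadPoolExecutor

def build_hpgv(master_name):
    """Hypothetical stand-in for one H-PGV generation job (not a real tool call)."""
    return f"{master_name}.hpgv"

def generate_hpgvs(instance_to_master, max_workers=4):
    """Build one H-PGV per unique master in parallel, then map it to every instance."""
    unique_masters = sorted(set(instance_to_master.values()))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each master with its view.
        views = dict(zip(unique_masters, pool.map(build_hpgv, unique_masters)))
    return {inst: views[master] for inst, master in instance_to_master.items()}

# Hypothetical chiplet: five placed partitions, but only three unique masters.
placement = {
    "p0": "core", "p1": "core", "p2": "core",
    "p3": "l2", "p4": "io",
}
hpgv_by_instance = generate_hpgvs(placement)
print(len(set(hpgv_by_instance.values())), "H-PGVs built for", len(placement), "partitions")
```

The design choice here is to deduplicate before dispatching work, so the number of jobs tracks the number of unique partitions instead of the number of placed ones.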

Flat vs. Hierarchical Correlation

Example Analysis: Sub-chip level
– 14.4M total primitive instance count (modelled cells)
– 8.9M regular logic and memory cells
– 5.5M filler, tap, decap cells
– 18 total partitions in chiplet
– 7 unique partitions
– 3 partitions replicated 4 times each

H-PGV run metrics:
– Runtime: 18~32 minutes
– Memory: 40~45G

[Figure: histogram of % of Instances (vertical axis, 0 to 50) against % Difference Between Flat and Hierarchical IR Drop Analysis (horizontal axis, 0 to 2.5)]
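A correlation plot of this kind can be produced by comparing the two results instance by instance and binning the differences. The sketch below is one plausible way to do it, not the method used for the slide: the sample values are invented, the difference is taken relative to the flat result, and flat IR drop is assumed to be nonzero.

```python
def percent_difference(flat_mv, hier_mv):
    """Absolute difference of the hierarchical result relative to the flat result, in %."""
    return abs(hier_mv - flat_mv) / flat_mv * 100.0

def histogram(values, bin_width, num_bins):
    """Return the percentage of values in each bin; the last bin also takes overflow."""
    counts = [0] * num_bins
    for v in values:
        index = min(int(v / bin_width), num_bins - 1)
        counts[index] += 1
    return [100.0 * c / len(values) for c in counts]

# Hypothetical per-instance IR drop in mV: (flat, hierarchical). Not measured data.
samples = [(50.0, 50.1), (40.0, 40.3), (60.0, 60.2), (30.0, 30.5)]
diffs = [percent_difference(f, h) for f, h in samples]

# Bin edges mirror the 0 to 2.5 axis of the figure in 0.5-point steps.
shares = histogram(diffs, bin_width=0.5, num_bins=5)
for i, share in enumerate(shares):
    print(f"{i * 0.5:.1f}-{(i + 1) * 0.5:.1f}%: {share:.0f}% of instances")
```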

Rail Analysis at Full Chip Level

Design                Metal Layers  # of Transistors (Billions)  RAM (GB)      CPU  Rail Analysis Runtime (Days)
GF100 (flat)          9             3.0                          200           1    2.25
GK104 (flat)          11            3.5                          600           8    10
GK110 (flat)          11            7                            1000+ (est.)  8    26 (est.)
GK110 (hierarchical)  11            7                            650           8    8

Nvidia Scale and Runtime Issues

[Figure: Design Progression chart over Tesla, Fermi, and Kepler. Axes: Design Instances (log scale, 1.00E+06 to 1.00E+09) and Runtime (d) Per Design (0 to 30). Series: Design Sizes, VS, EPS, EPS-H]

Design size growth is outpacing tool and resource capability.

Voltus on Kepler

~380M instances, flat analysis, TSMC 28nm

Main resource: ~725 GB memory on a 1 TB, 32-CPU machine.

Static and Dynamic Signoff Power analysis at VDD & VSS (done as parallel runs)

21 hour runtime per analysis domain.

~8x runtime improvement over previous method with equivalent accuracy.

Rail Analysis at Full Chip Level

Design                Metal Layers  # of Transistors (Billions)  RAM (GB)      CPU  Rail Analysis Runtime (Days)
GF100 (flat)          9             3.0                          200           1    2.25
GK104 (flat)          11            3.5                          600           8    10
GK110 (flat)          11            7                            1000+ (est.)  8    26 (est.)
GK110 (hierarchical)  11            7                            650           8    8
GK110 (VOLTUS)        11            7                            700           32   21 hours
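The runtime gain can be checked from the GK110 rows of the table by putting all figures in hours. This is a back-of-the-envelope sketch, not the calculation behind the ~8x quoted on the Voltus slide; which baseline that figure uses is not stated, and the Voltus run used 32 CPUs against 8 for the other rows, so these are wall-clock ratios, not per-CPU ones.

```python
HOURS_PER_DAY = 24

# Runtime figures from the GK110 rows of the table, converted to hours.
runtimes_hours = {
    "GK110 flat (est.)": 26 * HOURS_PER_DAY,
    "GK110 hierarchical": 8 * HOURS_PER_DAY,
    "GK110 Voltus": 21,
}

def speedup(baseline, improved):
    """Ratio of baseline runtime to improved runtime (wall clock only)."""
    return runtimes_hours[baseline] / runtimes_hours[improved]

vs_hier = speedup("GK110 hierarchical", "GK110 Voltus")
vs_flat = speedup("GK110 flat (est.)", "GK110 Voltus")
print(f"vs hierarchical: {vs_hier:.1f}x")
print(f"vs flat (est.):  {vs_flat:.1f}x")
```

Against the hierarchical run the ratio comes out near 9x, in the same range as the quoted ~8x; against the estimated flat run it is close to 30x.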

Nvidia Scale and Runtime Issues

[Figure: Design Progression chart over Tesla, Fermi, Kepler, Kepler-V, and Next. Axes: Design Instances (log scale, 1.00E+06 to 1.00E+09) and Runtime (d) Per Design (0 to 30). Series: Column1, VS, EPS, EPS-H, VOLTUS]

Memory requirement

Summary

Voltus meets our rail analysis needs for accuracy and runtime, with runtimes far shorter than expected.

Further testing proved it possible to run the combined VDD-GND domain in a single pass in 50 hours of runtime, using multi-threaded and distributed capabilities.

The capability to run both multi-threaded and distributed gives us the flexibility to manage schedule and resource requirements.

Congratulations to the Voltus team on delivering a disruptive runtime improvement.
