
Circuit Simulation via Matrix Exponential Method

Speaker: Shih-Hung Weng

Adviser: Chung-Kuan Cheng

Date: 05/31/2013

Foundation of Design Flow

[Figure: design flow stages (logic synthesis, placement, timing analysis, routing, …) built on lookup tables obtained by characterization, an abstraction layer over circuit simulation]

Circuit Simulation

Emerging Demands

• Full system verification and analysis
– scalability and performance

[Figure: voltage on an on-chip power grid; low-frequency behavior]

Publications (1/3)

• Circuit Simulation with Matrix Exponential Method:

1. S.-H. Weng, H. Zhuang and C.K. Cheng, “Adaptive Time Stepping for Power Grid Simulation using Matrix Exponential Method”, submitted to IEEE ICCAD 2013

2. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation using Matrix Exponential Method for Stiffness Handling and Parallel Processing”, IEEE ICCAD, Nov. 2012

3. Q. Chen, W. Schoenmaker, S.-H. Weng, C.K. Cheng, G.-H. Chen, L.-J. Jiang and N. Wong, “A Fast Time-Domain EM-TCAD Coupled Simulation Framework via Matrix Exponential,” IEEE ICCAD, Nov. 2012 (Best Paper Award Candidate)

4. Y. Li, Q. Cheng, S.-H. Weng, C.K. Cheng and N. Wong, “Globally Stable, Highly Parallelizable Fast Transient Circuit Simulation via Faber Series”, IEEE NewCAS, May 2012

5. S.-H. Weng, Q. Chen and C.K. Cheng, “Time-Domain Analysis of Large-Scale Circuits by Matrix Exponential Method with Adaptive Control”, IEEE Trans. on CAD, Jul. 2012

6. Q. Chen, S.-H. Weng and C.K. Cheng, “A Practical Regularization Technique for Modified Nodal Analysis in Large-Scale Time-Domain Circuit Simulation”, IEEE Trans. on CAD, Jun. 2012

7. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation by Matrix Exponential Method,” IEEE ASIC Conference, Oct. 2011

8. S.-H. Weng, P. Du and C.K. Cheng, “A Fast and Stable Explicit Integration Method by Matrix Exponential Operator for Large Scale Circuit Simulation”, IEEE ISCAS, May 2011

Publications (2/3)

• Clock Gating Synthesis:

9. S.-H. Weng, Y.-M. Kuo and S.-C. Chang, “Timing Optimization in Sequential Circuit by Exploiting Clock-Gating Logic,” ACM Trans. on DAES, April 2012

10. Y.-M. Kuo, S.-H. Weng, and S.-C. Chang, “A Novel Sequential Circuit Optimization with Clock Gating Logic,” IEEE ICCAD, Nov. 2008

• High-speed Interconnect:

11. G. Sun, S.-H. Weng, C.K. Cheng, B. Lin and L. Zeng, “An On-Chip Global Broadcast Network Design with Equalized Transmission Lines in the 1024-Core Era”, IEEE SLIP, Jun. 2012

12. S.-H. Weng, Y. Zhang, J. F. Buckwalter and C.K. Cheng, “Energy Efficiency Optimization through Co-Design of the Transmitter and Receiver in High-Speed On-Chip Interconnects”, accepted by IEEE Trans. on VLSI

• Placement and Routing:

13. C.K. Cheng, P. Du, A.B. Kahng and S.-H. Weng, “Low-Power Gated Bus Synthesis for 3D IC via Rectilinear Shortest-path Steiner Graph,” IEEE ISPD, Mar. 2012

14. P. Du, W. Zhao, S.-H. Weng, C.K. Cheng and R.L. Graham, “Character Design and Stamp Algorithms for Character Projection Electron-Beam Lithography,” IEEE ASPDAC, Feb. 2012

Publications (3/3)

• Power Grid Analysis:

15. X. Hu, P. Du, S.-H. Weng and C.K. Cheng, “Worst-Case Noise Prediction With Non-zero Current Transition Times for Power Grid Planning,” accepted by IEEE Trans. on VLSI.

16. C.-C. Chou, H.-H. Chuang, T.-L. Wu, S.-H. Weng, and C.K. Cheng, “Eye Prediction of Digital Driver with Power Distribution Network Noise,” IEEE EPEPS, Nov. 2012 (Best Student Paper Award)

17. P. Du, S.-H. Weng, X. Hu and C.K. Cheng, “Power Grid Sizing via Convex Programming,” IEEE ASIC Conference, Oct. 2011

18. P. Du, X. Hu, S.-H. Weng, A. Shayan, X. Chen, A. E. Engin and C.K. Cheng, “Worst-Case Noise Prediction with Non-zero Current Transition Times for Early Power Distribution System Verification,” IEEE ISQED, Mar. 2010

19. S.-H. Weng, Y.-M. Kuo, S.-C. Chang, and M. Marek-Sadowska, “Timing Analysis Considering IR Drop Waveforms in Power Gating Designs,” IEEE ICCD, Oct. 2008

Outline

• Numerical Integration in Circuit Simulation

• Matrix Exponential Method
– Krylov Subspace Approximation
– Rational Krylov Subspace Approximation
– Parallelism

• Experimental Results

• Conclusions

Circuit Formulation

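• Formulated as a system of DAEs [Ho et al. ‘75]

$$\dot{\mathbf{q}}(\mathbf{x}(t)) + \mathbf{C}_L\dot{\mathbf{x}}(t) + \mathbf{G}_L\mathbf{x}(t) + \mathbf{i}(\mathbf{x}(t)) = \mathbf{u}(t)$$

– $\mathbf{C}_L$: capacitance & inductance
– $\mathbf{G}_L$: resistance & incidence
– $\mathbf{x}$: branch currents & nodal voltages
– $\dot{\mathbf{q}}$: derivative of charges in nonlinear devices
– $\mathbf{i}$: currents of nonlinear devices
– $\mathbf{u}$: input sources
– nonlinear devices are linearized by compact models (BSIM, PSP, etc.)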

• Solve for x(t) with an implicit or explicit numerical method; after linearization:

$$\mathbf{C}\dot{\mathbf{x}}(t) = -\mathbf{G}\mathbf{x}(t) + \mathbf{u}(t)$$

– explicit: forward Euler
– implicit: backward Euler

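Numerical Integration (1/2)

• Forward Euler (1st-order explicit): one sparse matrix-vector product per step

$$\mathbf{x}_{n+1} = (\mathbf{I} - h\mathbf{C}^{-1}\mathbf{G})\mathbf{x}_n + h\mathbf{C}^{-1}\mathbf{u}_n$$

• Backward Euler (1st-order implicit): one linear system solve per step

$$\mathbf{x}_{n+1} = \left(\frac{\mathbf{C}}{h} + \mathbf{G}\right)^{-1}\left(\frac{\mathbf{C}}{h}\mathbf{x}_n + \mathbf{u}_{n+1}\right)$$

• Stability issue for stiff circuits
– forward Euler: unstable results
– backward Euler: performance & scalability issues

Performance vs. stiffness:

Method | Linear (high → low stiffness) | Nonlinear (high → low stiffness)
--- | --- | ---
Forward Euler | slow → fast | slow → fast
Backward Euler | medium | slow
Trapezoidal | > Backward Euler | > Backward Euler
And beyond? | fast | fast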

Numerical Integration (2/2)

Method | Computation | Scalability | Error | Stability | Step size
--- | --- | --- | --- | --- | ---
Forward Euler | x=Av | high | O(h^2) | low | tiny
Backward Euler | Ax=b | low | O(h^2) | A-stable | medium
Trapezoidal | Ax=b | low | O(h^3) | A-stable | > Backward Euler
And beyond? | simple | high | O(h^n) | high | large

• Performance = # steps × computation per step (both circuit dependent)
– stiffness forces more steps on forward Euler (tiny step size)
– implicit methods solve Ax=b at every step; with a fixed step size, C/h+G is factorized only once
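The stability entries above can be checked on a toy problem. The Python sketch below (NumPy assumed available) uses a hypothetical two-node circuit, not one of the benchmarks: C = I and a conductance matrix G with eigenvalues 1 and 2001, so its stiffness ratio is about 2×10³. Both methods use the same step size h = 0.1, chosen for the slow mode only.

```python
import numpy as np

# Hypothetical stiff two-node circuit (not from the benchmarks): C = I and
# G has eigenvalues 1 and 2001, so forward Euler is stable only for h < 2/2001.
C = np.eye(2)
G = np.array([[1001.0, -1000.0],
              [-1000.0, 1001.0]])
u = np.array([1.0, 0.0])   # constant input source
h = 0.1                    # step size chosen for the slow mode only


def forward_euler(x, steps):
    """x_{n+1} = (I - h C^{-1} G) x_n + h C^{-1} u_n: one matrix-vector product per step."""
    Cinv = np.linalg.inv(C)
    M = np.eye(2) - h * Cinv @ G
    for _ in range(steps):
        x = M @ x + h * Cinv @ u
    return x


def backward_euler(x, steps):
    """(C/h + G) x_{n+1} = C x_n / h + u_{n+1}: one linear solve per step."""
    K = C / h + G          # fixed h, so a simulator would factorize K only once
    for _ in range(steps):
        x = np.linalg.solve(K, C @ x / h + u)
    return x


x0 = np.zeros(2)
x_dc = np.linalg.solve(G, u)          # DC operating point: G x = u
x_fe = forward_euler(x0, 20)
x_be = backward_euler(x0, 200)
print(f"forward Euler  |x| after 20 steps:    {np.linalg.norm(x_fe):.2e}")
print(f"backward Euler error after 200 steps: {np.linalg.norm(x_be - x_dc):.2e}")
```

Forward Euler multiplies the fast mode by 1 − 0.1·2001 ≈ −199 at every step, so it is stable only for h < 2/2001 ≈ 10⁻³ (the "tiny" step size), while backward Euler settles to the DC point at the price of a linear solve per step.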


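Matrix Exponential Method (1/2)

• Analytical solution of $\mathbf{C}\dot{\mathbf{x}}(t) = -\mathbf{G}\mathbf{x}(t) + \mathbf{u}(t)$
– Let $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$ and $\mathbf{b} = \mathbf{C}^{-1}\mathbf{u}$ (C can be regularized [TCAD ‘12]), so that

$$\dot{\mathbf{x}}(t) = \mathbf{A}\mathbf{x}(t) + \mathbf{b}(t)$$

$$\mathbf{x}(t+h) = e^{\mathbf{A}h}\mathbf{x}(t) + \int_0^h e^{\mathbf{A}(h-\tau)}\mathbf{b}(t+\tau)\,d\tau$$

• Let the input be piecewise linear within a step:

$$\mathbf{x}(t+h) = e^{\mathbf{A}h}\mathbf{x}(t) + (e^{\mathbf{A}h} - \mathbf{I})\mathbf{A}^{-1}\mathbf{b}(t) + \left(e^{\mathbf{A}h} - (\mathbf{A}h + \mathbf{I})\right)\mathbf{A}^{-2}\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$$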
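A minimal NumPy sketch of one step of the piecewise-linear formula above, assuming a small dense A that is nonsingular. The helper `expm` is a plain Taylor scaling-and-squaring stand-in; a production code would call a library routine such as `scipy.linalg.expm`, or avoid forming e^{Ah} altogether as the Krylov methods in this talk do. The 3-node RC ladder and all function names are hypothetical.

```python
import numpy as np


def expm(M):
    """Dense e^M via scaling and squaring with a truncated Taylor series (small matrices only)."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**s                      # scaled so that ||X||_1 <= 1/2
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, 20):              # Taylor series of e^X
        term = term @ X / k
        E = E + term
    for _ in range(s):                  # e^M = (e^X)^(2^s)
        E = E @ E
    return E


def mexp_step(A, x, b0, b1, h):
    """x(t+h) for x' = A x + b(t), with b linear from b0 = b(t) to b1 = b(t+h)."""
    I = np.eye(A.shape[0])
    E = expm(A * h)
    Ainv = np.linalg.inv(A)             # assumes A is nonsingular (e.g. C regularized)
    slope = (b1 - b0) / h
    return (E @ x
            + (E - I) @ Ainv @ b0
            + (E - (A * h + I)) @ Ainv @ Ainv @ slope)


# Hypothetical 3-node RC ladder: A = -C^{-1} G, b = C^{-1} u.
C = np.diag([1.0, 2.0, 1.0])
G = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
A = -np.linalg.solve(C, G)
b0 = np.linalg.solve(C, np.array([1.0, 0.0, 0.0]))   # from u(t)
b1 = np.linalg.solve(C, np.array([3.0, 0.0, 0.0]))   # from u(t+h); a ramp in between
x0 = np.zeros(3)
h = 0.5

one = mexp_step(A, x0, b0, b1, h)                      # one step of size h
mid = (b0 + b1) / 2
two = mexp_step(A, mexp_step(A, x0, b0, mid, h / 2), mid, b1, h / 2)
print(np.max(np.abs(one - two)))                       # round-off level
```

Because the formula is exact for an input that is linear over the step, one step of size h and two steps of size h/2 agree to round-off, and a DC point x* = −A⁻¹b is a fixed point of the step.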

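Matrix Exponential Method (2/2)

• One-exponential formulation [Al-Mohy & Higham ‘11]
– reduces the three matrix exponential terms to one

$$\mathbf{x}(t+h) = \begin{bmatrix}\mathbf{I}_n & \mathbf{0}\end{bmatrix} e^{\tilde{\mathbf{A}}h}\begin{bmatrix}\mathbf{x}(t)\\ \mathbf{e}_2\end{bmatrix}, \qquad \tilde{\mathbf{A}} = \begin{bmatrix}\mathbf{A} & \mathbf{W}\\ \mathbf{0} & \mathbf{J}\end{bmatrix}$$

where $\mathbf{W} = \left[\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h},\ \mathbf{b}(t)\right]$, $\mathbf{J} = \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}$ and $\mathbf{e}_2 = [0,\ 1]^T$.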
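The augmented form can be sketched the same way (hypothetical helper names, NumPy assumed; `expm` is again a simple Taylor scaling-and-squaring stand-in). W stacks the slope of b and b(t), J = [[0, 1], [0, 0]] generates the polynomial [τ, 1], and only one exponential of the (n+2)×(n+2) matrix is needed. Unlike the three-term formula, no A⁻¹ appears, so a singular A is acceptable.

```python
import numpy as np


def expm(M):
    """Dense e^M via scaling and squaring with a truncated Taylor series (small matrices only)."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**s                      # scaled so that ||X||_1 <= 1/2
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, 20):              # Taylor series of e^X
        term = term @ X / k
        E = E + term
    for _ in range(s):                  # e^M = (e^X)^(2^s)
        E = E @ E
    return E


def one_exp_step(A, x, b0, b1, h):
    """x(t+h) = [I_n 0] expm(h [[A, W], [0, J]]) [x(t); e_2], with b linear over the step."""
    n = A.shape[0]
    W = np.column_stack([(b1 - b0) / h, b0])    # n x 2: slope of b, then b(t)
    J = np.array([[0.0, 1.0], [0.0, 0.0]])      # nilpotent block generating [tau, 1]
    A_tilde = np.block([[A, W], [np.zeros((2, n)), J]])
    y0 = np.concatenate([x, [0.0, 1.0]])        # [x(t); e_2]
    return (expm(A_tilde * h) @ y0)[:n]


# Scalar check: x' = -x + 1 + 2t with x(0) = 0 has the exact value x(1) = 1 + e^{-1}.
x1 = one_exp_step(np.array([[-1.0]]), np.array([0.0]),
                  np.array([1.0]), np.array([3.0]), 1.0)
print(x1[0], 1 + np.exp(-1.0))
```

The same function also handles A = 0 (x' = b(t)), where the three-term formula would need A⁻¹.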

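Advantages

• Accuracy: analytical solution (usable as a reference solution)
– approximating $e^{\mathbf{A}h}$ as $(\mathbf{I} + \mathbf{A}h)$ gives forward Euler
– approximating $e^{\mathbf{A}h}$ as $(\mathbf{I} - \mathbf{A}h)^{-1}$ gives backward Euler

• Stability: A-stable for passive circuits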
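The two approximations above can be compared on a single mode x' = λx with z = hλ (illustrative values only). For a passive circuit Re(z) ≤ 0, so the exact factor e^{z} never exceeds 1 in magnitude; the sketch shows that 1 + z can exceed it, while (1 − z)⁻¹ cannot for real negative z.

```python
import numpy as np


def amplification(z):
    """Per-step factor for the mode x' = lambda x, with z = h * lambda."""
    return {
        "exact": np.exp(z),                 # e^{z}: matrix exponential method
        "forward_euler": 1.0 + z,           # e^{z} ~ 1 + z
        "backward_euler": 1.0 / (1.0 - z),  # e^{z} ~ (1 - z)^{-1}
    }


for z in (-0.01, -1.0, -200.1):
    print(z, {name: f"{f:.3e}" for name, f in amplification(z).items()})
```

For small |z| all three agree; for a stiff mode (z = −200.1) forward Euler amplifies by about 199 per step, backward Euler damps, and the exact factor is essentially zero.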

How to compute $e^{\mathbf{A}}\mathbf{v}$?

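Computation on Matrix Exponential

• 19 dubious ways [van Loan ‘03]

Category | Based on
--- | ---
Series method | $\mathbf{I} + \mathbf{A} + \frac{\mathbf{A}^2}{2!} + \frac{\mathbf{A}^3}{3!} + \cdots$
Rational approximation |
Decomposition | $\mathbf{A} = \mathbf{S}\mathbf{B}\mathbf{S}^{-1}$
Splitting | $e^{\mathbf{B}+\mathbf{C}} = e^{\mathbf{B}}e^{\mathbf{C}}$ if $\mathbf{B}\mathbf{C} = \mathbf{C}\mathbf{B}$
Quadrature rule | spec(A)
Krylov subspace | $\mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \ldots, \mathbf{A}^m\mathbf{v}\}$; regular basis and rational basis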
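Two of the categories can be contrasted directly on a small symmetric matrix (a hypothetical RC ladder with C = I, so A = −G). Both are dense O(n³) computations, so neither scales to a full circuit matrix; they remain useful for small projected matrices. The function names are illustrative.

```python
import numpy as np


def expm_series(M, terms=20):
    """Series method: I + M + M^2/2! + ..., with scaling and squaring to keep the series short."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def expm_decomposition(M):
    """Decomposition method for symmetric M: M = S B S^T with B diagonal, so e^M = S e^B S^T."""
    lam, S = np.linalg.eigh(M)
    return S @ np.diag(np.exp(lam)) @ S.T


# Hypothetical symmetric RC ladder with C = I, so A = -G.
G = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
A = -G
print(np.max(np.abs(expm_series(A) - expm_decomposition(A))))
```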


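Krylov Subspace Approximation (1/2)

• Krylov subspace $K(\mathbf{A}, \mathbf{v}) = \{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$
– orthogonalized by the Arnoldi process, using sparse matrix-vector multiplications (m is about 10 to 100): $\mathbf{H}_m = \mathbf{V}_m^T\mathbf{A}\mathbf{V}_m$
– approximate $e^{\mathbf{A}h}\mathbf{v}$ through $e^{\mathbf{H}_m h}$ (efficiency; scaling invariant):

$$e^{\mathbf{A}h}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m e^{\mathbf{H}_m h}\mathbf{e}_1$$

– a posteriori error estimation [Saad92] (fast error estimation; adaptivity):

$$Err_{krylov}(m, h) = \|\mathbf{v}\|_2\, h_{m+1,m}\left|\mathbf{e}_m^T e^{\mathbf{H}_m h}\mathbf{e}_1\right|$$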
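A minimal Arnoldi-based sketch of the approximation and the a posteriori estimate above, assuming A is available only through a matrix-vector product (the hypothetical `matvec` argument). Gram-Schmidt is repeated once for numerical orthogonality, a common safeguard rather than something stated on the slide; `expm` is a Taylor scaling-and-squaring stand-in for the small m×m exponential. The test matrix is a hypothetical RC chain with C = I.

```python
import numpy as np


def expm(M):
    """Dense e^M via scaling and squaring with a truncated Taylor series (small matrices only)."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, 20):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def arnoldi(matvec, v, m):
    """Arnoldi process: orthonormal V (n x (k+1)) and Hessenberg H ((k+1) x k), k <= m."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = matvec(V[:, j])                  # the only access to A: one (sparse) product
        w_norm0 = np.linalg.norm(w)
        for _ in range(2):                   # Gram-Schmidt, repeated once for orthogonality
            for i in range(j + 1):
                c = w @ V[:, i]
                H[i, j] += c
                w = w - c * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] <= 1e-12 * w_norm0:   # invariant subspace found: stop early
            return V[:, : j + 2], H[: j + 2, : j + 1], beta
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta


def krylov_expv(matvec, v, h, m):
    """e^{Ah} v ~ ||v|| V_m e^{h H_m} e_1, plus Saad's a posteriori error estimate."""
    V, H, beta = arnoldi(matvec, v, m)
    k = H.shape[1]
    y = expm(h * H[:k, :k])[:, 0]                 # e^{h H_m} e_1
    approx = beta * V[:, :k] @ y
    err_est = beta * H[k, k - 1] * abs(y[-1])     # ||v|| h_{m+1,m} |e_m^T e^{h H_m} e_1|
    return approx, err_est


# Hypothetical RC chain with C = I: A = -tridiag(-1, 2, -1), n = 50.
n = 50
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
v = np.random.default_rng(0).standard_normal(n)
approx, err_est = krylov_expv(lambda x: A @ x, v, h=1.0, m=20)
exact = expm(A) @ v
print(np.linalg.norm(approx - exact), err_est)
```

Only `matvec` touches the large matrix, so in a simulator it would be a sparse product; the exponential is taken of the small 20×20 H_m.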

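• Stiffness affects step size and dimension
– the Arnoldi process captures extreme and clustered eigenvalues
– error bound [Saad92]:

$$\left\|e^{\mathbf{A}h}\mathbf{v} - \|\mathbf{v}\|_2\mathbf{V}_m e^{\mathbf{H}_m h}\mathbf{e}_1\right\|_2 \le 2\|\mathbf{v}\|_2\frac{\rho^m e^{\rho}}{m!}, \qquad \text{where } \rho = \|h\mathbf{A}\|_2$$

Krylov Subspace Approximation (2/2)

[Figure: spectrum of hA in the complex plane (Real{hλ}, Imag{hλ}) for a highly stiff circuit, spanning −λ_max to −λ_min, with the regions captured by an Arnoldi process with a small m]

• An Arnoldi process with a small m may miss the critical part of the spectrum for $e^{\mathbf{A}h}$
• Shrink h or increase m to capture the critical eigenvalues
• Remedied by a restarted scheme and the scaling effect [ICCAD ‘12]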
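Saad's a priori bound shows why stiffness hurts the regular basis: ρ = ‖hA‖₂ grows with the largest eigenvalue magnitude, and the bound only becomes small once m is a few times larger than ρ. The sketch below (standard-library Python, working in log space to avoid overflow) finds the smallest m meeting a tolerance for ‖v‖₂ = 1; the function names and tolerance are illustrative, and this is a worst-case bound, so actual errors are often much smaller.

```python
import math


def log_saad_bound(rho, m):
    """log of Saad's a priori bound 2 * rho^m * e^rho / m!  (for ||v||_2 = 1)."""
    return math.log(2.0) + m * math.log(rho) + rho - math.lgamma(m + 1)


def min_dimension(rho, tol=1e-8):
    """Smallest Krylov dimension m whose a priori bound is below tol (rho = ||hA||_2 > 0)."""
    m = 1
    while log_saad_bound(rho, m) > math.log(tol):
        m += 1
    return m


for rho in (1.0, 10.0, 100.0, 1000.0):
    print(f"rho = {rho:7.1f}  ->  m = {min_dimension(rho)}")
```

Since ρ is proportional to h, halving h halves ρ, which is the "shrink h or increase m" trade-off on the slide.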


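Rational Krylov Subspace Approximation (1/2)

• Rational basis $(\mathbf{I} - \gamma\mathbf{A})^{-1}$
– $K((\mathbf{I} - \gamma\mathbf{A})^{-1}, \mathbf{v}) = \{\mathbf{v}, (\mathbf{I} - \gamma\mathbf{A})^{-1}\mathbf{v}, \ldots, (\mathbf{I} - \gamma\mathbf{A})^{-m}\mathbf{v}\}$

Arnoldi process:

    for j = 1, 2, ..., m
        solve (I − γA) w = v_j, i.e. (C + γG) w = C v_j
        for i = 1, 2, ..., j
            H̃_{i,j} = w^T v_i
            w = w − H̃_{i,j} v_i
        end
        H̃_{j+1,j} = ‖w‖_2
        v_{j+1} = w / H̃_{j+1,j}
    end

• Solving (C + γG)w = Cv_j avoids regularization of C; one LU factorization suffices for a linear circuit
• $\tilde{\mathbf{H}}_m = \mathbf{V}_m^T(\mathbf{I} - \gamma\mathbf{A})^{-1}\mathbf{V}_m$
• Subspace for A: $\mathbf{H}_m = \frac{1}{\gamma}(\mathbf{I} - \tilde{\mathbf{H}}_m^{-1}) \approx \mathbf{V}_m^T\mathbf{A}\mathbf{V}_m$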
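A dense NumPy sketch of the shift-and-invert (rational) Arnoldi process and of the projection back to A, with hypothetical names. It solves (C + γG)w = Cv_j instead of forming A, so C need not be regularized; a real simulator would factorize C + γG once (e.g. a sparse LU) and reuse it, whereas `np.linalg.solve` here refactors on every call. The test circuit is a hypothetical RC chain whose capacitances span two decades, checked against an eigendecomposition of the symmetric form C^{-1/2}GC^{-1/2}; `expm` is a Taylor scaling-and-squaring stand-in.

```python
import numpy as np


def expm(M):
    """Dense e^M via scaling and squaring with a truncated Taylor series (small matrices only)."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, 20):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def rational_krylov_expv(C, G, v, h, gamma, m):
    """e^{Ah} v with A = -C^{-1} G, using the shift-and-invert basis of (I - gamma*A)^{-1}."""
    n = v.shape[0]
    K = C + gamma * G              # (I - gamma*A) w = v_j  <=>  (C + gamma*G) w = C v_j
    V = np.zeros((n, m + 1))
    Ht = np.zeros((m + 1, m))      # H-tilde: projection of (I - gamma*A)^{-1}
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = np.linalg.solve(K, C @ V[:, j])
        w_norm0 = np.linalg.norm(w)
        for _ in range(2):         # Gram-Schmidt, repeated once for orthogonality
            for i in range(j + 1):
                c = w @ V[:, i]
                Ht[i, j] += c
                w = w - c * V[:, i]
        Ht[j + 1, j] = np.linalg.norm(w)
        if Ht[j + 1, j] <= 1e-12 * w_norm0:   # invariant subspace reached
            m = j + 1
            break
        V[:, j + 1] = w / Ht[j + 1, j]
    Hm = (np.eye(m) - np.linalg.inv(Ht[:m, :m])) / gamma   # project back to A
    return beta * V[:, :m] @ expm(h * Hm)[:, 0]


# Hypothetical stiff RC chain: capacitances spread over two decades.
n = 30
G = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
G[-1, -1] = 1.0
cap = np.logspace(-2, 0, n)
C = np.diag(cap)
v = np.random.default_rng(1).standard_normal(n)
h, gamma = 1.0, 0.1

# Reference via the symmetric form C^{-1/2} G C^{-1/2} = Q diag(lam) Q^T.
d = np.sqrt(cap)
lam, Q = np.linalg.eigh(G / np.outer(d, d))
exact = (Q @ (np.exp(-h * lam) * (Q.T @ (d * v)))) / d

for m in (10, 20, n):
    approx = rational_krylov_expv(C, G, v, h, gamma, m)
    print(m, np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

With m = n the subspace is the whole space and the result matches the reference to round-off; for smaller m the error depends on γ relative to h, the trade-off swept in the figures that follow.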

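Rational Krylov Subspace Approximation (2/2)

• Approximation of $e^{\mathbf{A}h}\mathbf{v}$:

$$e^{\mathbf{A}h}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1, \qquad \mathbf{H}_m = \frac{1}{\gamma}(\mathbf{I} - \tilde{\mathbf{H}}_m^{-1})$$

• A posteriori error estimation [van den Eshof 06] (adaptivity):

$$Err_{rational}(m, h) = \frac{\|\mathbf{v}\|_2}{\gamma}\,\tilde{h}_{m+1,m}\left\|(\mathbf{I} - \gamma\mathbf{A})\mathbf{v}_{m+1}\mathbf{e}_m^T\tilde{\mathbf{H}}_m^{-1}e^{h\mathbf{H}_m}\mathbf{e}_1\right\|_2$$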

• Spectral transformation
– similar to preconditioning
– relaxes the stiffness constraint
– enables a large step size with a smaller dimension

[Figure: spectrum in the complex plane (Real{hλ}, Imag{hλ}). The eigenvalues from −λ_max to −λ_min are transformed by (I − γA)⁻¹ into λ'_min … λ'_max within the unit circle, with a small gap determined by γ; the Arnoldi process captures the critical part for e^{A}; projecting back to A by (I − H̃⁻¹)/γ gives −λ''_max … −λ''_min, and applying a large h gives −hλ''_max … −hλ''_min]

• A small m is acceptable without resorting to a small step size
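The spectral transformation can be seen with scalars: each real eigenvalue λ ≤ 0 of A becomes μ = 1/(1 − γλ) in (0, 1]. In the hypothetical spectrum below, fourteen decades of eigenvalues land inside the unit interval; eigenvalues much larger in magnitude than 1/γ collapse toward 0, and for step sizes well above γ they contribute little to e^{Ah}, while the slow, critical part stays near 1. The value of γ is an assumption.

```python
import numpy as np

# Hypothetical stiff spectrum of A = -C^{-1} G: real eigenvalues from -1e-2 to -1e12.
lam = -np.logspace(-2, 12, 8)
gamma = 1e-10                      # assumed shift parameter

mu = 1.0 / (1.0 - gamma * lam)     # spectrum of (I - gamma*A)^{-1}
for lam_i, mu_i in zip(lam, mu):
    print(f"lambda = {lam_i:9.1e}  ->  mu = {mu_i:.6f}")
```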

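[Figure: error $\left\|e^{\mathbf{A}h}\mathbf{v} - \|\mathbf{v}\|_2\mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1\right\|_2$ of the rational Krylov approximation; (a) fix γ (γ = 10⁻¹²), sweep m and h; (b) fix h, sweep m and γ; a large-error region is marked]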

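Performance vs. stiffness:

Method | Linear (high → low stiffness) | Nonlinear (high → low stiffness)
--- | --- | ---
Forward Euler | slow → fast | slow → fast
Backward Euler | medium | slow
Trapezoidal | > Backward Euler | > Backward Euler
Krylov approx. | slow → fast | slow → medium
Rational Krylov | fast | slow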

Wrap Up

Method | Computation | Scalability | Error | Stability | Step size
--- | --- | --- | --- | --- | ---
Forward Euler | x=Av | high | O(h^2) | low | tiny
Backward Euler | Ax=b | low | O(h^2) | A-stable | medium
Trapezoidal | Ax=b | low | O(h^3) | A-stable | > Backward Euler
Krylov approx. | x=Av | high | O(h^n) | high | medium
Rational Krylov | Ax=b | low | O(h^n) | high | large


Parallelism in Krylov Subspace

• Arnoldi process
– sparse matrix-vector multiplication [Bell&Garland ‘09]
• Exponential of a small matrix [Higham ‘05]
– dense matrix-by-matrix operations

[Figure: work distributed over thread 1, thread 2, …, thread n−1, thread n]
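A sketch of the row-partitioned SpMV behind the thread picture, in plain NumPy with a hand-built CSR format (hypothetical helpers, not the GPU kernels of [Bell&Garland ‘09]). Each thread owns a contiguous block of rows and writes disjoint entries of y, so no locking is needed; in CPython the global interpreter lock limits real speedup, so this only illustrates the partitioning.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def to_csr(M):
    """Compressed sparse row arrays (data, indices, indptr) of a dense matrix."""
    data, indices, indptr = [], [], [0]
    for row in M:
        nz = np.flatnonzero(row)
        data.extend(row[nz])
        indices.extend(nz)
        indptr.append(len(data))
    return np.array(data, dtype=float), np.array(indices, dtype=int), np.array(indptr, dtype=int)


def spmv_rows(csr, x, y, start, stop):
    """y[r] = sum_k A[r, k] x[k] for rows start..stop-1 (each thread owns disjoint rows)."""
    data, indices, indptr = csr
    for r in range(start, stop):
        lo, hi = indptr[r], indptr[r + 1]
        y[r] = data[lo:hi] @ x[indices[lo:hi]]


def parallel_spmv(csr, x, n_threads=4):
    n = len(csr[2]) - 1
    y = np.zeros(n)
    bounds = np.linspace(0, n, n_threads + 1).astype(int)   # contiguous row blocks
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        jobs = [pool.submit(spmv_rows, csr, x, y, bounds[k], bounds[k + 1])
                for k in range(n_threads)]
        for job in jobs:
            job.result()                                     # surface worker errors
    return y


n = 100
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))   # hypothetical RC chain
x = np.random.default_rng(2).standard_normal(n)
y = parallel_spmv(to_csr(A), x)
print(np.max(np.abs(y - A @ x)))
```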

Input Grouping

• Constant slope within a step

[Figure: without grouping, input 1 and input 2 have pivot points at different times (t1 … t15), so the shared simulation takes tiny steps to maintain a constant slope]

[Figure: with grouping, group 1 and group 2 are each simulated over their own time points t1 … t8, by thread 1 and thread 2 respectively]
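A sketch of the grouping idea with hypothetical pivot times. Each PWL source is represented only by its list of pivot points, and sources with identical pivots form a group. Assuming a linear circuit, the responses to the groups add up by superposition, so each group can be simulated on its own thread with only its own pivots as step boundaries. The helper names are illustrative.

```python
def union_pivots(inputs):
    """Pivot times forced on one shared simulation: the union over all PWL sources."""
    return sorted({t for pivots in inputs for t in pivots})


def group_inputs(inputs):
    """Group source indices whose PWL pivot times coincide."""
    groups = {}
    for idx, pivots in enumerate(inputs):
        groups.setdefault(tuple(sorted(pivots)), []).append(idx)
    return list(groups.values())


# Hypothetical pivot times (ps) of four current sources.
inputs = [[0, 2, 4, 6, 8], [0, 1, 3, 5, 7], [0, 2, 4, 6, 8], [0, 1, 3, 5, 7]]

shared = union_pivots(inputs)
groups = group_inputs(inputs)
steps_shared = len(shared) - 1
steps_per_group = [len(union_pivots([inputs[i] for i in g])) - 1 for g in groups]
print(steps_shared, groups, steps_per_group)
```

Without grouping, the shared simulation must stop at all 9 pivot times (8 steps of 1 ps); each of the two groups needs only 4 steps, and the groups can run in parallel.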


Settings of Experiments

• Environment
– implemented in Matlab
– Intel i7 2.67GHz with 4GB memory
• Benchmarks
– nonlinear and large-scale circuits
– power distribution networks
– IBM power grid testcases [Nassif 08]

Design | Category | # R | # C | # Trans. | Size | Stiffness
--- | --- | --- | --- | --- | --- | ---
D1 | 16bit adder | 723 | 34 | 448 | 579 | 1.1×10^3
D2 | ALU | 13.6K | 4.3K | 6502 | 10K | 5.4×10^6
D3 | IO | 1.26M | 34.6K | 1461 | 630K | 1.6×10^6
D4 | Power grid | 10.4M | 8.6M | 0 | 12M | 2.6×10^5

Stiffness is measured from the generalized eigenvalues of (G, C).

Design | Area (mm²) | # R | # C | # L | Size | Stiffness
--- | --- | --- | --- | --- | --- | ---
P1 | 0.35² | 23K | 15K | 15K | 45.7K | 8.7×10^9
P2 | 1.40² | 348K | 228K | 228K | 688K | 8.3×10^9
P3 | 2.80² | 1.46M | 0.97M | 0.97M | 2.90M | 1.0×10^10
P4 | 5.00² | 3.75M | 2.47M | 2.47M | 7.40M | 1.0×10^10

P1 to P4 include RC tanks for PCB and package.

Design | # R | # C | # L | # I | # V | Size | Stiffness
--- | --- | --- | --- | --- | --- | --- | ---
ibmpg2t | 245K | 36K | 330 | 36K | 330 | 164K | 3.5×10^12
ibmpg3t | 1.60M | 201K | 955 | 201K | 955 | 1M | 3.4×10^11
ibmpg4t | 1.83M | 265K | 962 | 266K | 962 | 1.2M | 2.5×10^11
ibmpg5t | 1.55M | 473K | 277 | 473K | 539K | 2.1M | 4.7×10^11
ibmpg6t | 2.41M | 761K | 281 | 761K | 836K | 3.2M | 3.8×10^11

Nonlinear and Large-scale Circuits

• Matrix exponential method (MEXP)
– Krylov subspace approximation
– restarted scheme and parallel SpMV on GPU
• Trapezoidal method (TRAP)
– same adaptive scheme as MEXP

Design | Size | Sim. time | m | TRAP | MEXP-Krylov (parallel SpMV) | Speedup
--- | --- | --- | --- | --- | --- | ---
D1 | 579 | 100ps | 20 | 671.4s | 408.7s | 1.64X
D2 | 10K | 100ps | 30 | 3,085.91s | 982.14s | 3.14X
D3 | 630K | 100ps | 30 | 8,053.45s | 535.92s | 15.05X
D4 | 12M | 1ns | 20 | fails | 629.56s | n/a

Power Distribution Networks

• Simulate a long time span (1μs) for the step response
• One LU factorization
– its cost is amortized over the forward/backward substitutions
• MEXP with rational basis adaptively scales h/γ
• TRAP uses a predetermined step size

Design | TRAP (h = 10ps) LU (s) | TRAP Total | MEXP-Rational (γ = 10⁻¹⁰) LU (s) | MEXP-Rational Total | Speedup
--- | --- | --- | --- | --- | ---
P1 | 0.67 | 44.85 min | 0.68 | 2.86 min | 15.73X
P2 | 15.60 | 15.43 h | 15.48 | 54.57 min | 16.96X
P3 | 91.60 | 76.92 h | 93.28 | 4.30 h | 17.91X
P4 | 293.81 | 203.64 h | 298.83 | 11.26 h | 18.08X

• MEXP benefits from adaptive & large step sizes

Power Distribution Networks: IBM Testcases

• Widely adopted benchmarks
• Many input current sources
• Same MEXP with rational basis and TRAP

Design | TRAP LU (s) | TRAP Total (s) | MEXP LU (s) | MEXP Total (s) | Speedup
--- | --- | --- | --- | --- | ---
ibmpg2t | 1.31 | 48.19 | 1.29 | 41.81 | 1.15X
ibmpg3t | 18.05 | 493.97 | 18.41 | 413.90 | 1.19X
ibmpg4t | 30.32 | 675.78 | 31.01 | 229.13 | 2.95X
ibmpg5t | 16.16 | 657.13 | 16.48 | 649.97 | 1.01X
ibmpg6t | 23.99 | 965.53 | 34.60 | 915.62 | 1.05X

• Speedup is limited by the ill alignment of input pivot points

IBM Testcases

• Applying simple grouping
– each group of inputs has the same pivot points
– 6X speedup on average

Design | TRAP LU (s) | TRAP Total (s) | # Groups | MEXP LU (s) | MEXP Total (s) | Speedup
--- | --- | --- | --- | --- | --- | ---
ibmpg2t | 1.31 | 48.19 | 25 | 1.29 | 7.93 | 6.77X
ibmpg3t | 18.05 | 493.97 | 25 | 18.41 | 86.24 | 6.08X
ibmpg4t | 30.32 | 675.78 | 4 | 31.01 | 124.16 | 5.73X
ibmpg5t | 16.16 | 657.13 | 25 | 16.48 | 111.97 | 5.44X
ibmpg6t | 23.99 | 965.53 | 25 | 34.60 | 166.34 | 5.80X

Conclusions

• Emerging challenges in circuit simulation
– scalability and performance

• Matrix exponential method
– accuracy, adaptivity and stability
– regular and rational Krylov subspace approximation

• Effectiveness of the matrix exponential method
– simulates a large-scale circuit with 12M nodes
– nonlinear circuits: 6.61X speedup on average
– impulse response for PDNs: 15X speedup
– IBM testcases: 6X speedup using input grouping

Future Work

• Variant bases in the Krylov subspace
– inverted, extended basis

• Model order reduction (MOR) and the matrix exponential method
– both exploit the Krylov subspace
– apply well-developed MOR techniques to MEXP

• Hybrid simulation via matrix exponential
– handle thermal and mechanical phenomena with FEM

Thank you!

Where are we?

• Trade-off between stability and performance

[Figure: methods placed by computational effort (low to high) versus stability: Forward Euler, Backward Euler, Trapezoidal Method (SPICE), ACES [Devgan & Rohrer, ‘97], SILCA [Li & Shi, ‘03], LIM [J. E. Schutt-Aine, ‘01], Telescopic [Dong & Li, ‘10], Waveform Relaxation [E Lelarasmee et al., ‘82], Domain Decomposition [K. Sun et al., ‘07], Matrix Exponential Method [Weng et al. ’11]]

• ETD in the numerical community: [Saad ‘92], [Ban et al. ‘11], [Aluffi-Pentini et al. ‘03], [Hochbruck et al. ‘97]
• Tailored for circuit simulation:
– adaptive step control
– scaling effect
– nonlinear devices
– parallelization

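Adaptive Step Control

• Typical circuit behavior

[Figure: typical waveform, using a larger h where the response is smooth and a smaller h around transitions]

– each step keeps its error Err(h) within the error budget err

$$\mathbf{x}(t+h) = \begin{bmatrix}\mathbf{I}_n & \mathbf{0}\end{bmatrix} e^{\tilde{\mathbf{A}}h}\begin{bmatrix}\mathbf{x}(t)\\ \mathbf{e}_2\end{bmatrix}$$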

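Adaptive Step Size Strategy

• Adjustment of step size
– Krylov subspace approximation
  • only requires scaling H_m: αA → αH_m
  • re-calculate $e^{\mathbf{H}_m}$
– backward Euler
  • (C/h + G) changes, so the linear system must be solved again:

$$\mathbf{x}_{n+1} = \left(\frac{\mathbf{C}}{h} + \mathbf{G}\right)^{-1}\left(\frac{\mathbf{C}}{h}\mathbf{x}_n + \mathbf{B}\mathbf{u}_{n+1}\right)$$

• Strategy
– maximize the step size within a given error budget $Err_{total}$
– errors come from the Krylov subspace method and from linearization:

$$Err_{total} = Err^{L}_{total} + Err^{NL}_{total}, \qquad Err_{krylov}(h) \le \frac{h}{T}\,Err^{L}_{total}$$

where T is the total simulation time.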
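The scaling property lets one Arnoldi process serve many trial step sizes: only the small m×m matrix H_m is rescaled. The sketch halves h from a hypothetical h_max until Saad's estimate fits the per-step share (h/T)·Err^L_total of the budget. The halving search, the parameter names (`h_max`, `T`, `err_L_total`) and the test matrix are assumptions, not the author's exact strategy; `expm` is a Taylor scaling-and-squaring stand-in.

```python
import numpy as np


def expm(M):
    """Dense e^M via scaling and squaring with a truncated Taylor series (small matrices only)."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, 20):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def arnoldi(matvec, v, m):
    """Arnoldi process with repeated Gram-Schmidt: returns V, H and ||v||."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = matvec(V[:, j])
        w_norm0 = np.linalg.norm(w)
        for _ in range(2):
            for i in range(j + 1):
                c = w @ V[:, i]
                H[i, j] += c
                w = w - c * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] <= 1e-12 * w_norm0:   # invariant subspace found: stop early
            return V[:, : j + 2], H[: j + 2, : j + 1], beta
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta


def choose_step(H, beta, h_max, T, err_L_total, h_min=1e-12):
    """Largest h = h_max / 2^k whose Krylov estimate fits the per-step budget (h/T) * err_L_total."""
    m = H.shape[1]
    h = h_max
    while h >= h_min:
        y = expm(h * H[:m, :m])[:, 0]          # rescale only the small m x m matrix
        est = beta * H[m, m - 1] * abs(y[-1])  # ||v|| h_{m+1,m} |e_m^T e^{h H_m} e_1|
        if est <= h / T * err_L_total:
            return h, est
        h /= 2.0
    raise RuntimeError("no step size above h_min meets the error budget")


n = 50
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))   # hypothetical RC chain
v = np.random.default_rng(3).standard_normal(n)
V, H, beta = arnoldi(lambda x: A @ x, v, m=10)               # one Arnoldi process ...
h, est = choose_step(H, beta, h_max=4.0, T=10.0, err_L_total=1e-6)  # ... many trial h
print(h, est)
```

By contrast, backward Euler would need a new factorization of C/h + G for every trial h.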

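Nonlinear Formulation

• Decouple nonlinear and linear components

$$\dot{\mathbf{x}}(t) = \mathbf{A}\mathbf{x}(t) + \mathbf{F}(\mathbf{x}(t)) + \mathbf{b}(t), \qquad \mathbf{F}(\mathbf{x}(t)) = -\mathbf{C}^{-1}\mathbf{i}(\mathbf{x}(t))$$

$$\mathbf{x}(t+h) = e^{\mathbf{A}h}\mathbf{x}(t) + \int_0^h e^{\mathbf{A}(h-\tau)}\left(\mathbf{F}(\mathbf{x}(t+\tau)) + \mathbf{b}(t+\tau)\right)d\tau$$

$$\mathbf{x}(t+h) = e^{\mathbf{A}h}\mathbf{x}(t) + (e^{\mathbf{A}h} - \mathbf{I})\mathbf{A}^{-1}\left(\mathbf{F}(\mathbf{x}(t)) + \mathbf{b}(t)\right) + \left(e^{\mathbf{A}h} - (\mathbf{A}h + \mathbf{I})\right)\mathbf{A}^{-2}\frac{\mathbf{F}(\mathbf{x}(t+h)) - \mathbf{F}(\mathbf{x}(t)) + \mathbf{b}(t+h) - \mathbf{b}(t)}{h}$$

– the terms that depend only on x(t) are constant during Newton's iteration; the Jacobian matrix is calculated for the F(x(t+h)) term
– J(F) in MEXP has fewer non-zeros; the products with $e^{\mathbf{A}}$ applied to F are approximated
– Jacobian in MEXP: $\mathbf{C}^{-1}\mathbf{G}_{NL}$; in BE: $\mathbf{C}/h + \mathbf{G}_L + \mathbf{G}_{NL}$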
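A dense sketch of one nonlinear MEXP step under stated assumptions: F(x) = −C⁻¹i(x) is treated as linear over the step, as in the formula above, and the implicit equation is solved by Newton's iteration with the terms that depend only on x(t) precomputed. The cubic device current i(x) = αx₀³ and all names are hypothetical, not a compact model such as BSIM; `expm` is a Taylor scaling-and-squaring stand-in.

```python
import numpy as np


def expm(M):
    """Dense e^M via scaling and squaring with a truncated Taylor series (small matrices only)."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, 20):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def mexp_nonlinear_step(A, Cinv, i_dev, g_dev, x, b0, b1, h, tol=1e-12, max_iter=50):
    """Implicit MEXP step with F(x) = -C^{-1} i(x) linear over [t, t+h], solved by Newton."""
    I = np.eye(A.shape[0])
    E = expm(A * h)
    Ainv = np.linalg.inv(A)
    P1 = (E - I) @ Ainv                    # (e^{Ah} - I) A^{-1}
    P2 = (E - A * h - I) @ Ainv @ Ainv     # (e^{Ah} - (Ah + I)) A^{-2}

    def F(y):
        return -Cinv @ i_dev(y)

    F0 = F(x)
    # Terms that depend only on x(t) stay constant during Newton's iteration.
    const = E @ x + P1 @ (F0 + b0) + P2 @ (b1 - b0 - F0) / h
    y = x.copy()
    for _ in range(max_iter):
        R = y - const - P2 @ F(y) / h      # residual of the implicit update
        if np.linalg.norm(R) < tol:
            return y
        J = I + P2 @ Cinv @ g_dev(y) / h   # dR/dy, using J(F) = -C^{-1} G_NL(y)
        y = y - np.linalg.solve(J, R)
    raise RuntimeError("Newton's iteration did not converge")


# Hypothetical 3-node RC ladder with a cubic nonlinear conductance at node 0.
C = np.diag([1.0, 2.0, 1.0])
G = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
Cinv = np.linalg.inv(C)
A = -Cinv @ G
alpha = 0.5
i_dev = lambda y: np.array([alpha * y[0] ** 3, 0.0, 0.0])        # device current i(x)
g_dev = lambda y: np.diag([3.0 * alpha * y[0] ** 2, 0.0, 0.0])   # G_NL = di/dx
b0 = Cinv @ np.array([1.0, 0.0, 0.0])
b1 = Cinv @ np.array([2.0, 0.0, 0.0])
x1 = mexp_nonlinear_step(A, Cinv, i_dev, g_dev, np.zeros(3), b0, b1, h=0.2)
print(x1)
```

With i(x) = 0 the Newton loop ends after one update and the step reduces to the linear three-term formula.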

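Only Inverted

• Rational basis $\mathbf{A}^{-1}$ (inverted, without shift)
– $K(\mathbf{A}^{-1}, \mathbf{v}) = \{\mathbf{v}, \mathbf{A}^{-1}\mathbf{v}, \ldots, \mathbf{A}^{-m}\mathbf{v}\}$
– requires a larger m and a smaller h

[Figure: spectra in the complex plane (Real{hλ}, Imag{hλ}) after shift-and-invert vs. only inverted; only inverting gives a smaller spectrum bounded by −1/λ_min that needs a large m; panels for different γ]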

Spectral Transformation – h = 10p
• Small RC mesh, 100 by 100
• Different h for Krylov subspace
• Different γ for rational Krylov subspace

Spectral Transformation – h = 10f
• Small RC mesh, 100 by 100
• Different h for Krylov subspace
• Different γ for rational Krylov subspace

Spectral Transformation – γ = 10f
• Small RC mesh, 100 by 100
• Different h for Krylov subspace
• Different γ for rational Krylov subspace

Spectral Transformation – γ = 1p
• Small RC mesh, 100 by 100
• Different h for Krylov subspace
• Different γ for rational Krylov subspace

Spectral Transformation – γ = 100p
• Small RC mesh, 100 by 100
• Different h for Krylov subspace
• Different γ for rational Krylov subspace

Sweep γ for Large Range

Difference Between Inverted and Rational

Fixed γ = 1p, sweep time step h

Fixed γ = 1n, sweep time step h

Fixed γ = 1u, sweep time step h

Fixed γ = 1m, sweep time step h

Fixed γ = 1, sweep time step h

Fixed γ = 1k, sweep time step h

Fixed γ = 1M, sweep time step h
