Task 1091.001: Highly Scalable Placement by Multilevel Optimization
• Task Leaders: Jason Cong (UCLA CS) and Tony Chan (UCLA Math)
• Students with Graduation Dates:
  • Michalis Romesis (UCLA CS, March 2005, graduated)
  • Kenton Sze (UCLA Math, July 2006, graduated)
  • Min Xie (UCLA CS, September 2006, graduated)
  • Guojie Luo (UCLA CS, September 2010)
• Research Staff: Joe Shinnerl, UCLA CS
23/4/22 UCLA VLSICAD LAB 2
Industrial Liaisons
Patrick McGuinness, Freescale Semiconductor, Inc.
Natesan Venkateswaran, IBM Corporation
Amit Chowdhary, Intel Corporation
Task Description and Anticipated Result
• Highly scalable multilevel, multiheuristic placement algorithms that address the critical placement needs of nanometer designs:
  • scalability
  • multi-constraint optimization (timing, routability, power, manufacturability, etc.)
  • support of mixed-size placement and incremental design
• Quantitative study of the optimality and scalability of placement algorithms
  • Construction of synthetic benchmarks with known optima to identify the deficiencies of existing methods
• Our goal is to achieve a benefit equivalent to one process generation through innovation in physical-design technologies, especially placement.
Task Deliverables
• Report on new placement benchmarks with known optimal or near-optimal solutions for all major objectives and constraints; scalability and optimization studies of existing placement techniques (completed 3-Nov-2003)
• Experiments and reports on the applicability of integrated AMG-based weighted aggregation and weighted interpolation; improvement measured on both PEKO examples and industrial examples from SRC member companies (completed 1-Jun-2004)
• Experiments and reports on multiheuristic, multilevel relaxation and the scalable incorporation of complex constraints into the enhanced multilevel framework; improvement measured on both PEKO and industrial examples (completed 1-Jun-2005)
• A highly scalable placement tool that (i) supports multi-constraint optimization, mixed-size placement, and incremental design and (ii) produces best-in-class results for both PEKO and industrial examples from SRC member companies (completed 1-Jun-2006)
• Final report summarizing research accomplishments and future directions (planned for 31-Oct-2006)
Accomplishments in the Past Year
1. Improvements in mPL for routing density control [Best quality, ISPD 2006 contest]
2. Thermal-Driven Placement
3. Heterogeneous Placement
A Brief History of mPL
[Figure: relative wirelength of successive mPL versions by year, 2000–2006, for uniform and non-uniform cell sizes]
• mPL 1.0 [ICCAD00]: ESC clustering; Goto relaxation
• mPL 1.1: FC clustering; partitioning added to legalization
• mPL 2.0: RDFL relaxation; primal-dual netlist pruning
• mPL 3.0 [ICCAD03]: QRS relaxation; AMG interpolation; multiple V-cycles
• mPL 4.0: improved DP; backtracking V-cycle
• mPL 5.0: multilevel force-directed; mixed-size capability
• mPL 6.0: enhanced routability handling
mPL: Generalized Force-Directed Placement
• Use of accurate objective functions [Bertsekas, 82, Naylor et al, 01]
• Optimization-based bin-density constraint formulation
• Iterative Uzawa solver
• Multilevel for better runtime and wirelength
$$\min_x\; W(x) \quad \text{s.t.} \quad \psi_{ij}(x) = 0 \;\text{ for every bin } (i,j), \qquad \text{where } \Delta\psi(x) = d(x) - c$$
$$\nabla W\big(x^{(k+1)}\big) + \sum_{i,j} \lambda_{ij}^{(k)}\, \nabla\psi_{ij}\big(x^{(k+1)}\big) = 0$$
$$\lambda_{ij}^{(k+1)} = \lambda_{ij}^{(k)} + \alpha\, \psi_{ij}\big(x^{(k+1)}\big)$$
Here d(x) is the bin density, c is the target density, and ∇ψ_ij(x) is a generalized force.
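The Uzawa solver alternates a primal step (solve the stationarity condition for x with the multipliers frozen) with a dual step (move each multiplier along its constraint residual). A minimal NumPy sketch, assuming a convex quadratic stand-in for W(x) and linear stand-ins A x − c for the density residuals ψ_ij(x); the real mPL problem is nonlinear and much larger, and the function name `uzawa` and all data below are hypothetical:

```python
import numpy as np


def uzawa(Q, b, A, c, alpha, iters=200):
    """Uzawa iteration for: minimize W(x) = 0.5 x^T Q x - b^T x  subject to  A x = c.

    Q models a convex quadratic wirelength; each row of A x - c stands in for one
    density residual psi_ij(x). Converges when 0 < alpha < 2 / ||A Q^-1 A^T||.
    """
    lam = np.zeros(A.shape[0])
    x = np.zeros_like(b)
    for _ in range(iters):
        # Primal step: grad W(x) + A^T lam = 0, i.e. Q x = b - A^T lam.
        x = np.linalg.solve(Q, b - A.T @ lam)
        # Dual step: lam^(k+1) = lam^(k) + alpha * residual(x^(k+1)).
        lam = lam + alpha * (A @ x - c)
    return x, lam


# Toy chain-of-springs quadratic with one linear constraint (hypothetical data).
Q = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])
A = np.array([[1.0, 1.0, 1.0]])
c = np.array([6.0])
x, lam = uzawa(Q, b, A, c, alpha=0.1)
print(x, lam)
```

For this toy problem the iteration settles at x = (1.9, 2.2, 1.9) with multiplier −0.6, which is the KKT solution; the step size α plays the same role as α in the multiplier update above.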
Accomplishments in the Past Year
1. Improvements in mPL for routing density control [Best quality, ISPD 2006 contest]
2. Thermal-Driven Placement
3. Heterogeneous Placement
Core Engine for Density Control
• Overall scheme
  • One V-cycle with comparable quality
  • Minimum perturbation in the last stages of GFD
  • Significant speedup without losing solution quality
• Routing density handling
  • Residual density in each bin
  • Even distribution of dummy density into bins
  • Cell area inflation for better convergence
[Figure: one V-cycle. The initial finest problem is coarsened level by level down to the coarsest problem, then interpolated back level by level to the final placement; stages are labeled "GFD with density control" and "minimum perturbation".]
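The V-cycle above coarsens the netlist, solves the coarsest problem, and interpolates back with local refinement at each finer level. A toy 1-D sketch of that control flow, assuming quadratic wirelength, greedy heavy-edge matching for coarsening, piecewise-constant interpolation, and Gauss-Seidel sweeps for relaxation; these are stand-ins, not mPL's ESC/FC clustering, AMG interpolation, or GFD, and every name here is hypothetical:

```python
from collections import defaultdict


def relax(pos, edges, fixed, sweeps):
    """Gauss-Seidel sweeps: each movable node moves to the weighted mean of its
    neighbours, which minimises quadratic wirelength with the others held still."""
    nbrs = defaultdict(list)
    for (u, v), w in edges.items():
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    for _ in range(sweeps):
        for node in pos:
            if node in fixed or not nbrs[node]:
                continue
            total = sum(w for _, w in nbrs[node])
            pos[node] = sum(w * pos[v] for v, w in nbrs[node]) / total
    return pos


def coarsen(nodes, edges, fixed):
    """Greedy heavy-edge matching: merge pairs of movable nodes into clusters."""
    parent, matched = {}, set()
    for (u, v), _ in sorted(edges.items(), key=lambda kv: -kv[1]):
        if {u, v} & matched or u in fixed or v in fixed:
            continue
        parent[u] = parent[v] = ("cluster", u, v)
        matched.update((u, v))
    for n in nodes:
        parent.setdefault(n, n)  # unmatched and fixed nodes map to themselves
    coarse_edges = defaultdict(float)
    for (u, v), w in edges.items():
        pu, pv = parent[u], parent[v]
        if pu != pv:  # edges inside a cluster disappear
            key = (pu, pv) if repr(pu) < repr(pv) else (pv, pu)
            coarse_edges[key] += w
    return parent, dict(coarse_edges)


def v_cycle(pos, edges, fixed, min_movable=4, sweeps=5):
    nodes = list(pos)
    if sum(n not in fixed for n in nodes) <= min_movable:
        return relax(pos, edges, fixed, sweeps=200)  # coarsest level: solve (nearly) exactly
    parent, coarse_edges = coarsen(nodes, edges, fixed)
    if len(set(parent.values())) == len(nodes):      # matching made no progress
        return relax(pos, edges, fixed, sweeps=200)
    # Restriction: each cluster starts at the mean position of its members.
    members = defaultdict(list)
    for n in nodes:
        members[parent[n]].append(pos[n])
    coarse_pos = {p: sum(xs) / len(xs) for p, xs in members.items()}
    coarse_pos = v_cycle(coarse_pos, coarse_edges, fixed, min_movable, sweeps)
    # Interpolation: every movable node inherits its cluster's position ...
    for n in nodes:
        if n not in fixed:
            pos[n] = coarse_pos[parent[n]]
    # ... followed by a few local relaxation sweeps at this level.
    return relax(pos, edges, fixed, sweeps)


# Hypothetical example: a 10-node chain between two fixed pads at 0.0 and 1.0.
n = 10
edges = {(i, i + 1): 1.0 for i in range(n - 1)}
fixed = {0, n - 1}
pos = {i: 0.5 for i in range(n)}
pos[0], pos[n - 1] = 0.0, 1.0
pos = v_cycle(pos, edges, fixed)
print([round(pos[i], 3) for i in range(n)])
```

Because the coarsest level is solved almost exactly and the finer levels only need a few sweeps, most of the work happens on small problems; the final positions land close to the evenly spaced optimum i/9.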
Macro Spreading [Nam, ISPD06]
• Need area density below the target value
• Target distance between neighboring macros (depends on the target density)
[Figure: two neighboring macros of areas A1 and A2 and widths w1 and w2 inside a W × H region, illustrating the target distance]
• Spreading represented as an objective:
$$\min \sum_{i=1}^{n}\big(|dx_i| + |dy_i|\big) + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\big(fx_{ij} + fy_{ij}\big)$$
  • dx_i and dy_i: perturbation of macro i
  • fx_ij and fy_ij: piecewise-linear functions
[Figure: piecewise-linear f_ij plotted against x, with a breakpoint at the target distance x^H_ij]
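The objective above trades macro perturbation against spacing penalties. A small evaluator, assuming the piecewise-linear penalty is zero once two neighboring macros are at least the target distance apart and grows linearly with the shortfall; the exact shape in [Nam, ISPD06], the `slope` parameter, and the pair list format are assumptions, and the real method minimizes this objective (typically as a linear program) rather than just evaluating it:

```python
def pl_penalty(gap, target, slope=1.0):
    """Assumed piecewise-linear penalty: slope * shortfall below the target spacing."""
    return slope * max(0.0, target - abs(gap))


def spreading_objective(pos, delta, pairs, slope=1.0):
    """Perturbation cost sum(|dx_i| + |dy_i|) plus spacing penalties.

    pos, delta: lists of (x, y) macro centres and perturbations (dx_i, dy_i).
    pairs: (i, j, axis, target) with axis 0 = x and 1 = y, one entry per pair of
    macros treated as neighbours in that direction (hypothetical input format).
    """
    cost = sum(abs(dx) + abs(dy) for dx, dy in delta)
    for i, j, axis, target in pairs:
        gap = (pos[i][axis] + delta[i][axis]) - (pos[j][axis] + delta[j][axis])
        cost += pl_penalty(gap, target, slope)
    return cost


pos = [(0.0, 0.0), (1.0, 0.0)]   # two overlapping macro centres (hypothetical)
pairs = [(0, 1, 0, 3.0)]         # should be at least 3.0 apart horizontally
print(spreading_objective(pos, [(0.0, 0.0), (0.0, 0.0)], pairs, slope=2.0))  # 4.0
print(spreading_objective(pos, [(0.0, 0.0), (2.0, 0.0)], pairs, slope=2.0))  # 2.0
```

Moving the second macro by 2.0 costs 2.0 in perturbation but removes the 4.0 spacing penalty, so the spread placement has the lower objective.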
Experiment Results on ISPD06
• mPL6 produces the best solution quality under the ISPD 2006 routability-driven metric
Demonstration of mPL6
http://cadlab.cs.ucla.edu/cpmo/videos/mPL6-density.wmv
Accomplishments in the Past Year
1. Improvements in mPL core engine for mixed-size global placement
2. Thermal-Driven Placement
3. Heterogeneous Placement
Motivation
• High power density due to technology scaling
• Problems caused by high temperature
  • Hot spots become more harmful
  • Higher temperature → higher leakage power → more heat
  • Previously negligible effects become first-order effects
  • Difficult estimation of power, timing, etc.
Thermal Model
• One-layer mesh to model the substrate:
$$\sum_j (T_i - T_j)\, C_{xy} + (T_i - T_{sink})\, C_z = P_i$$
• C_xy and C_z are the thermal conductances of the substrate and the heat sink, respectively
• Solved by fast DCT
  • Solve T from CT = P, given C and P (T measured relative to T_sink)
  • Diagonalize C = Γ^T Λ Γ, where Γ is the discrete cosine matrix and Λ is a diagonal matrix
  • T = Γ^{-1} Λ^{-1} Γ P
[Figure: mesh node T_i connected through C_xy to its neighbors T_j,1, T_j,2, T_j,3, T_j,4 and through C_z to T_sink, with injected power P]
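The DCT solve works because, on a rectangular mesh with adiabatic (reflecting) chip edges, the substrate Laplacian is diagonalized by the orthonormal DCT-II basis, so C = Γ^T Λ Γ with Λ formed from the 1-D eigenvalues 2 − 2cos(πk/n). A NumPy sketch under those assumptions (uniform C_xy and C_z, one layer, adiabatic sides); a production solver would use a fast O(N log N) DCT instead of the explicit matrix `dct_matrix` built here, and the function names are hypothetical:

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix Gamma; its rows are the cosine modes of an n-node path."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    gamma = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    gamma[0, :] = np.sqrt(1.0 / n)
    return gamma


def solve_temperature(power, c_xy, c_z):
    """Solve C T = P on an nx-by-ny mesh with adiabatic chip edges.

    Returns the temperature rise T_i - T_sink at every mesh node.
    """
    nx, ny = power.shape
    gx, gy = dct_matrix(nx), dct_matrix(ny)
    # Eigenvalues of the 1-D path-graph Laplacian with reflecting (Neumann) ends.
    lam_x = 2.0 - 2.0 * np.cos(np.pi * np.arange(nx) / nx)
    lam_y = 2.0 - 2.0 * np.cos(np.pi * np.arange(ny) / ny)
    lam = c_xy * (lam_x[:, None] + lam_y[None, :]) + c_z  # diagonal of Lambda
    p_hat = gx @ power @ gy.T                              # Gamma P
    return gx.T @ (p_hat / lam) @ gy                       # Gamma^-1 Lambda^-1 Gamma P


power = np.zeros((8, 8))
power[2, 3] = 1.0  # a single hot cell (hypothetical units)
rise = solve_temperature(power, c_xy=1.0, c_z=0.1)
# Energy balance: all injected power leaves through C_z, so sum(rise) is about 1.0 / 0.1 = 10.0.
print(rise.sum())
```

Since Γ is orthonormal, Γ^{-1} = Γ^T, which is why the inverse transform is just a transpose.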
Formulation & Solution
$$\begin{aligned}
\text{minimize}\quad & WL(x) \\
\text{subject to}\quad & \psi_i(x) = 0 && \text{(Nonoverlap)} \\
& T_i(x) + t_i(x) = T_{des} && \text{(Temperature)}
\end{aligned}$$
• ψ_i(x) and t_i(x) are implemented with filler cells and with "filler power" that has no area, respectively
• T_des is given by the user
• Solved by the Uzawa algorithm:
$$\nabla WL\big(x^{(k+1)}\big) + \sum_i \lambda_i^{(k)}\, \nabla\psi_i\big(x^{(k+1)}\big) + \sum_i \mu_i^{(k)}\, \nabla\big(T_i + t_i\big)\big(x^{(k+1)}\big) = 0$$
$$\lambda_i^{(k+1)} = \lambda_i^{(k)} + \alpha\, \psi_i\big(x^{(k+1)}\big)$$
$$\mu_i^{(k+1)} = \mu_i^{(k)} + \beta\, \big(T_i(x^{(k+1)}) + t_i(x^{(k+1)}) - T_{des}\big)$$
• Applied as an additional thermal-aware GFD pass following a WL-driven V-cycle
Experiment Results on IBM-FastPlace: Quality Improvement
• T_even is the ideal temperature with the same total power
• Max. on-chip temperature:
  • T_init after Step 1
  • T_final = T_des after the final step
• More than 90% quality improvement on average, within 5% WL increase
circuit  T_even  T_init  WL_init   T_final  WL_final  Qual_impr  WL_incr
ibm01    60      68.65   1.62E+06  60.34    1.71E+06  96.1%      1.06
ibm02    60      67.78   3.62E+06  60.30    3.82E+06  96.1%      1.05
ibm03    60      68.99   4.80E+06  60.38    5.08E+06  95.7%      1.06
ibm04    60      77.00   5.94E+06  61.21    6.45E+06  92.9%      1.08
ibm05    60      66.89   9.41E+06  60.38    9.99E+06  94.4%      1.06
ibm06    60      68.37   4.90E+06  60.42    5.01E+06  95.0%      1.02
ibm07    60      69.93   8.22E+06  60.62    8.72E+06  93.8%      1.06
ibm08    60      70.42   9.38E+06  61.80    9.59E+06  82.7%      1.02
ibm09    60      70.14   9.44E+06  60.47    9.88E+06  95.3%      1.05
ibm10    60      71.07   1.79E+07  60.89    1.85E+07  91.9%      1.03
ibm11    60      67.90   1.42E+07  60.69    1.50E+07  91.3%      1.05
ibm12    60      72.20   2.25E+07  61.74    2.38E+07  85.7%      1.06
ibm13    60      66.37   1.72E+07  60.87    1.78E+07  86.3%      1.03
ibm14    60      69.56   3.31E+07  61.18    3.47E+07  87.6%      1.05
ibm15    60      66.76   3.95E+07  60.73    4.02E+07  89.1%      1.02
ibm16    60      71.87   4.48E+07  61.03    4.58E+07  91.4%      1.02
ibm17    60      76.38   6.16E+07  61.33    6.42E+07  91.9%      1.04
ibm18    60      73.69   4.29E+07  61.19    4.51E+07  91.3%      1.05
Average          70.22             60.87              91.6%      1.05
Quality improvement:
$$\text{Qual\_impr} = \frac{T_{init} - T_{final}}{T_{init} - T_{even}}$$
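The quality-improvement column measures what fraction of the largest possible peak-temperature reduction (down to T_even) was actually achieved, and the WL-increase column is the final-to-initial wirelength ratio. A short check of both definitions against the ibm01 and ibm08 rows (function names are illustrative only):

```python
def quality_improvement(t_init, t_final, t_even):
    """Fraction of the achievable peak-temperature reduction actually obtained.

    t_init - t_even is the largest possible reduction, so 1.0 means the
    ideal even-temperature result was reached.
    """
    return (t_init - t_final) / (t_init - t_even)


def wl_increase(wl_init, wl_final):
    """Final wirelength relative to the wirelength before the thermal step."""
    return wl_final / wl_init


print(f"{quality_improvement(68.65, 60.34, 60.0):.1%}")  # 96.1%  (ibm01)
print(f"{wl_increase(1.62e6, 1.71e6):.2f}")              # 1.06   (ibm01)
print(f"{quality_improvement(70.42, 61.80, 60.0):.1%}")  # 82.7%  (ibm08)
```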
Accomplishments in the Past Year
1. Improvements in mPL for routing density control [1st in quality, ISPD 2006 contest]
2. Thermal-Driven Placement
3. Heterogeneous Placement
Motivation
• Need for placement on array-type chips with prefabricated resources
  • FPGA
  • Structured ASIC
• Need for heterogeneous capability
  • Memory, DSP, etc.
  • Blocks must be placed on sites of the same type
Related Work
• Academia: VPR [Betz & Rose 97], PATH [Kong 02], SPCD [Chen & Cong 04,05], PPFF [Maidee et al, 03], CAPRI [Gopalakrishnan et al, 06]
  • Most comparisons are to outdated tools
  • No heterogeneous capability
• Industry: Quartus II [Altera Corp.], ISE [Xilinx Inc.]
  • Proprietary chips only
  • Techniques not publicly documented
Heterogeneous Placement by mPL-H
• First analytical placer for heterogeneous placement
• Framework based on mPL6 [Chan et al, 05]
• Multiple-layered placement
  • One logical layer for each resource
  • Forbidden regions blocked by obstacles
  • Uniform wirelength computation
  • Filler cells on each layer
[Figure: logical layers for the DSP, M-RAM, and LAB resources]
Demonstration of mPL-H
http://cadlab.cs.ucla.edu/cpmo/videos/mPL-H.wmv
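"Uniform wirelength computation" means that blocks on different logical layers still share one chip coordinate frame, so a net's wirelength is computed the same way regardless of which resource types it connects. A minimal sketch, assuming half-perimeter wirelength (HPWL), the metric reported in the comparison that follows; mPL-H's internal smooth wirelength model is not described here, and the block names are hypothetical:

```python
def hpwl(nets, pos):
    """Half-perimeter wirelength summed over nets.

    pos maps a block name to its (x, y) centre. Blocks on different logical
    layers (LAB, M-RAM, DSP, ...) share one coordinate frame, so each net's
    bounding box is computed identically whatever resource types it spans.
    """
    total = 0.0
    for net in nets:
        xs = [pos[b][0] for b in net]
        ys = [pos[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical blocks: a LAB, an M-RAM, and a DSP block on three logical layers.
pos = {"lab0": (0.0, 0.0), "ram0": (3.0, 4.0), "dsp0": (1.0, 2.0)}
nets = [["lab0", "ram0"], ["lab0", "dsp0", "ram0"]]
print(hpwl(nets, pos))  # 14.0
```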
Experiment Setting
[Figure: experimental flow involving a Verilog netlist, Quartus_map, a clustered .vqm netlist, mPL-H, Quartus_fitter, Quartus_router, a Stratix chip-type description (.xml), and the placement exchanged as a .qsf file]
Wirelength Comparison
circuit                            #LAB  Quartus PWL  Quartus RWL  mPL-H PWL  PWL ratio  mPL-H RWL  RWL ratio
fip_risc8                          219   6322         15828        5872       0.93       15304      0.97
mux64_16bit                        150   4560         9464         4582       1.00       9728       1.03
mux8_128bit                        141   3608         7328         3541       0.98       6556       0.89
oc_cordic_p2r                      111   2982         6852         2786       0.93       6264       0.91
oc_cordic_r2p                      157   4239         8848         3889       0.92       8260       0.93
oc_aes_core                        181   16362        27100        15537      0.95       25944      0.96
oc_aes_core_inv                    183   18639        30288        17962      0.96       30680      1.01
oc_aquarius                        646   32684        78616        31703      0.97       78280      1.00
oc_cfft_1024x12                    191   5915         12256        5757       0.97       11988      0.98
oc_des_des3area                    120   8273         15588        8139       0.98       15472      0.99
oc_des_des3perf                    1569  62982        116312       59611      0.95       118028     1.01
oc_des_perf_opt                    550   17443        31548        17096      0.98       32476      1.03
oc_fpu                             793   26570        64324        24583      0.93       63628      0.99
oc_mem_ctrl                        387   16502        38428        16808      1.02       39132      1.02
oc_mips                            387   18550        43916        18639      1.00       43260      0.99
oc_oc8051                          331   13640        31172        13332      0.98       30520      0.98
oc_video_compression_systems_dct   4440  165835       423292       172902     1.04       436708     1.03
oc_video_compression_systems_jpeg  3924  139076       370064       142234     1.02       368528     1.00
oc_wb_dma                          396   25614        57128        24999      0.98       57704      1.01
os_blowfish                        168   27023        43832        22569      0.84       39804      0.91
oc_ethernet                        265   12013        24812        11935      0.99       25128      1.01
Avg.                                     1.00         1.00                    0.97                  0.98
PWL: placement (half-perimeter) wirelength; RWL: routed wirelength; ratios are mPL-H / Quartus II 5.0.
• WL is still important for architecture evaluation
• mPL-H is 3% better in HPWL and 2% better in routed WL than Quartus II v5.0
Runtime Comparison
[Figure: runtime (s) versus #LAB (111 to 4440) for Quartus II 5.0 and mPL-H]
mPL-H can be 2X faster than Quartus II v5.0 when the circuit becomes sufficiently large.
Overall Accomplishments Over the Funding Period
circuit   mPL3 WL   mPL3 runtime(s)  mPL5 WL   mPL5 runtime(s)
ibm01     1.88E+06  211              1.65E+06  63
ibm02     3.77E+06  456              3.62E+06  128
ibm03     5.45E+06  501              4.58E+06  110
ibm04     6.34E+06  565              5.71E+06  147
ibm05     1.06E+07  876              9.96E+06  158
ibm06     5.88E+07  756              5.12E+07  197
ibm07     9.62E+06  1000             8.18E+06  260
ibm08     9.77E+06  1578             9.28E+06  379
ibm09     1.10E+07  1243             9.24E+06  373
ibm10     1.98E+07  2017             1.74E+07  444
ibm11     1.67E+07  1568             1.40E+07  439
ibm12     2.51E+07  2124             2.23E+07  510
ibm13     2.04E+07  2281             1.65E+07  570
ibm14     3.74E+07  3776             3.14E+07  1095
ibm15     4.57E+07  5289             3.88E+07  1375
ibm16     5.10E+07  5891             4.34E+07  1532
ibm17     7.07E+07  7022             6.15E+07  1685
ibm18     4.86E+07  7127             4.13E+07  1853
Avg.      1.00      1.00             0.87      0.26

circuit   mPL5 WL   mPL5 runtime(s)  mPL6 WL   mPL6 runtime(s)
adaptec1  8.72E+07  5296             7.79E+07  2894
adaptec2  1.05E+08  5691             9.20E+07  2995
adaptec3  2.65E+08  7015             2.14E+08  9353
adaptec4  2.29E+08  6433             1.94E+08  8812
bigblue1  1.12E+08  3454             9.68E+07  3636
bigblue2  2.01E+08  7145             1.52E+08  10207
bigblue3  4.33E+08  32825            3.44E+08  13564
bigblue4  9.53E+08  36343            8.29E+08  30540
Avg.      1.20      1.31             1.00      1.00
Averages are normalized to mPL3 in the first table and to mPL6 in the second.
34% reduction in WL over 3 years
One technology generation of advancement
Technology Transfer in 2006
• Discussions at conferences and workshops
  • ASPDAC 2006, Yokohama, Japan
  • ISPD 2006, San Jose, USA
  • DAC 2006, San Francisco, USA
• Benchmark releases (PEKO-MS): http://cadlab.cs.ucla.edu/~pubbench
• mPL release: http://cadlab.cs.ucla.edu/src_686_mpl/
Software Download Record
• PEKO/PEKU [2002 – now]: more than 360 downloads
  • SRC member companies: Cadence, IBM, Intel, Mentor Graphics, etc.
  • Non-SRC member companies: Synopsys, Magma, Monterey Design, etc.
  • Universities: CMU, Michigan, MIT, UC Berkeley, UCSD, etc.
• mPL [2001 – now]: more than 480 downloads
  • SRC member companies: Cadence, Intel, Mentor Graphics, etc.
  • Non-SRC member companies: Synopsys, Magma, Intrinsity, Oasys, etc.
  • Universities: CMU, Michigan, Stanford, UCSD, Nat'l Taiwan U., etc.
Publications in 2006
• Conference papers
  • ASPDAC 2006: J. Cong, M. Xie, "A Robust Detailed Placement for Mixed-size IC Designs."
  • ISPD 2006: T. F. Chan, J. Cong, J. Shinnerl, K. Sze and M. Xie, "mPL6: Enhanced Multilevel Mixed-size Placement."
• Theses
  • Kenton Sze, "Multilevel Optimization for VLSI Circuit Placement."
  • Min Xie, "Constraint-Driven Large Scale Circuit Placement Algorithms."
Room for Further Improvement?
"Swirls" are difficult to correct with localized refinement.
[Figure: placements produced by mPL4 and mPL5]