Post on 22-Dec-2015
1
Enhancing Performance of Iterative Heuristics
for VLSI Netlist Partitioning
Dr. Sadiq M. Sait
Dr. Aiman El-Maleh
Mr. Raslan Al Abaji.
Computer Engineering Department
King Fahd University of Petroleum & Minerals
2
• Introduction
• Problem Formulation
• Cost Functions
• PowerFM
• Experimental Results
• Conclusion
Outline
3
• Technology: 0.1 um
• Transistors: 200 M
• Logic gates: 40 M
• Size: 520 mm²
• Clock: 2 - 3.5 GHz
• Chip I/O's: 4,000
• Wiring levels: 7 - 8
• Voltage: 0.9 - 1.2 V
• Power: 160 Watts
• Supply current: ~160 Amps
Performance, power consumption, noise immunity, area, cost, time-to-market
Tradeoffs!!!
The VLSI Chip in 2006
4
• Decomposition of a complex system into smaller subsystems
• Each subsystem can be designed independently, speeding up the design process (divide-and-conquer approach)
• Decompose a complex IC into a number of functional blocks, each of them designed by one or a team of engineers
• Decomposition scheme has to minimize the interconnections between subsystems
Why do we need Partitioning?
5
System Level Partitioning
Board Level Partitioning
Chip Level Partitioning
System
PCBs
Chips
Sub-circuits/Blocks
Levels of Partitioning
6
Partitioning Algorithms
• Group Migration
1. Kernighan-Lin
2. Fiduccia-Mattheyses (FM)
3. Multilevel K-way Partitioning
• Simulation Based Iterative
1. Simulated annealing
2. Simulated evolution
3. Tabu Search
4. Genetic
• Performance Driven
1. Lawler et al.
2. Vaishnav
3. Choi et al.
4. Jun'ichiro et al.
• Others
1. Spectral
2. Multilevel Spectral
Classification of Partitioning Algorithms
7
Objective: Design a class of iterative algorithms for VLSI multi-objective partitioning that optimize power, delay, and cutset together
Constraint: Balanced partitions within a given tolerance (10%)
Problem formulation
8
• Based on hypergraph model H = (V, E)
• c(e) = 1 if e spans more than 1 block, and 0 otherwise
• Cutset = sum of hyperedge costs
cutset = 3
Cutset
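The cutset definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `cutset_size`, the netlist, and the block assignment are all hypothetical.

```python
def cutset_size(hyperedges, block_of):
    """Count hyperedges whose cells lie in more than one block (c(e) = 1)."""
    return sum(1 for cells in hyperedges
               if len({block_of[c] for c in cells}) > 1)

# Hypothetical netlist: 6 cells, 4 nets (hyperedges).
hyperedges = [("a", "b"), ("b", "c", "d"), ("d", "e"), ("e", "f", "a")]
# Hypothetical two-way partition: cells a, b, c in block 0; d, e, f in block 1.
block_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(cutset_size(hyperedges, block_of))  # nets (b,c,d) and (e,f,a) are cut -> 2
```

Representing each hyperedge as a tuple of cells keeps the check independent of net size, which is the point of the hypergraph model.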
9
Delay
• Gate delay: $d(v)$
• Constant inter-chip wire delay: $d_c$
• Path delay between nodes $v_i$ and $v_j$: $d(p_{ij})$
• Number of nodes cut along path $p_{ij}$: $ncut(p_{ij})$
• Objective:
$$\text{Minimize } d(p_{ij}) = \sum_{v_i \in p_{ij}} d(v_i) + d_c \times ncut(p_{ij})$$
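As a rough illustration of the path delay model, the sketch below adds the gate delays along one path and charges the constant inter-chip delay once per block crossing. Counting a cut whenever two consecutive nodes on the path sit in different blocks is an assumption made here; the function name, delays, and partition are hypothetical.

```python
def path_delay(path, gate_delay, block_of, d_c):
    """d(p) = sum of gate delays on the path + d_c * ncut(p)."""
    gates = sum(gate_delay[v] for v in path)
    # Assumption: a cut is counted for each consecutive pair of nodes
    # on the path that lie in different blocks.
    ncut = sum(1 for u, v in zip(path, path[1:]) if block_of[u] != block_of[v])
    return gates + d_c * ncut

# Hypothetical path of four gates crossing the partition once.
path = ["g1", "g2", "g3", "g4"]
gate_delay = {"g1": 1.0, "g2": 2.0, "g3": 1.5, "g4": 0.5}
block_of = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}

print(path_delay(path, gate_delay, block_of, d_c=10.0))  # 5.0 + 10.0 * 1 = 15.0
```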
10
The average dynamic power consumed by a CMOS logic gate in a synchronous circuit is given by:
$$P_i^{average} = 0.5 \times \frac{V_{dd}^2}{T_{cycle}} \times C_i^{Load} \times N_i$$
$N_i$: the number of output gate transitions per cycle (switching probability)
$C_i^{Load}$: the load capacitance
Power
11
$$C_i^{Load} = C_i^{basic} + C_i^{extra}$$
$C_i^{basic}$: load capacitances driven by a cell before partitioning
$C_i^{extra}$: additional load due to off-chip capacitance (cut net)
Total power dissipation of a circuit:
$$P = 0.5 \times \frac{V_{dd}^2}{T_{cycle}} \sum_{i} \left( C_i^{basic} + C_i^{extra} \right) N_i$$
Power
12
$$\text{Objective: Minimize } \sum_{i \in v} N_i$$
$$C_i^{extra} \gg C_i^{basic}$$
$C_i^{extra}$: can be assumed identical for all nets
$v$: set of visible gates driving a load outside the partition
Power
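The simplified power objective can be sketched as follows: find the visible gates (those driving at least one cell in another block) and sum their switching probabilities. The netlist representation (one driver with a list of sinks), the function names, and all numbers are assumptions for illustration only.

```python
def visible_gates(nets, block_of):
    """Gates that drive at least one sink placed in a different block."""
    return {driver for driver, sinks in nets.items()
            if any(block_of[s] != block_of[driver] for s in sinks)}

def power_cost(nets, block_of, switching):
    """Sum of switching probabilities N_i over the visible gates."""
    return sum(switching[g] for g in visible_gates(nets, block_of))

# Hypothetical netlist: driver -> sinks.
nets = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
block_of = {"a": 0, "b": 0, "c": 1, "d": 1}
switching = {"a": 0.2, "b": 0.1, "c": 0.4, "d": 0.3}

print(sorted(visible_gates(nets, block_of)))          # ['a', 'b']
print(round(power_cost(nets, block_of, switching), 3))
```

Because the extra capacitance is taken as identical for all cut nets, the capacitance and voltage terms drop out and only the switching probabilities remain in the cost.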
13
The balance as a constraint is expressed as follows:
[Equation in terms of Cells(Block1) and Cells(Block2); the operators are not recoverable from this transcript]
However, balance as a constraint is not appealing because it may prohibit many good moves.
Objective: Minimize |Cells(block1) – Cells(block2)|
Balance
14
• A good partitioning can be described by the following fuzzy rule
IF solution has
small cutset AND
low power AND
short delay AND
good Balance.
THEN it is a good solution
Fuzzy Cost Function
15
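The above rule is translated to an AND-like OWA operator:
$$\mu(x) = \beta \times \min(\mu_C, \mu_P, \mu_D, \mu_B) + (1 - \beta) \times \frac{1}{4} \sum_{j = C, P, D, B} \mu_j$$
$\mu(x)$: the total fuzzy fitness of the solution; our aim is to maximize this fitness.
$\mu_C, \mu_P, \mu_D, \mu_B$: respectively the cutset, power, delay, and balance fitness.
$\beta$: a constant in the range [0, 1].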
Fuzzy cost function
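A minimal sketch of the AND-like OWA aggregation is given below. The function name, the membership values, and the choice of β = 0.7 are hypothetical; the slides do not state the value of β that was used.

```python
def fuzzy_fitness(memberships, beta):
    """AND-like OWA: beta * min(mu) + (1 - beta) * mean(mu)."""
    values = list(memberships.values())
    return beta * min(values) + (1 - beta) * sum(values) / len(values)

# Hypothetical memberships for cutset (C), power (P), delay (D), balance (B).
mu = {"C": 0.8, "P": 0.6, "D": 0.9, "B": 1.0}

print(round(fuzzy_fitness(mu, beta=0.7), 4))  # 0.7 * 0.6 + 0.3 * 0.825
```

With β = 1 the operator reduces to a pure minimum (strict AND), and with β = 0 to a plain average, so β controls how strongly the worst objective dominates.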
16
Where $O_i$ and $C_i$ are the lower bound and actual cost of objective $i$
$\mu_i(x)$ is the membership of solution $x$ in the set "good $i$"
$g_i$ is the relative acceptance limit for each objective.
Membership functions
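The membership plot itself did not survive in this transcript, so the sketch below assumes a common shape: membership is 1 when the cost equals the lower bound, falls linearly as the ratio C_i / O_i grows, and reaches 0 at the acceptance limit g_i. The linear shape, the function name, and the numbers are assumptions, not taken from the slides.

```python
def membership(cost, lower_bound, g):
    """Assumed linear membership of a solution in the set 'good i'.

    cost:        actual cost C_i of objective i
    lower_bound: lower bound O_i of objective i
    g:           relative acceptance limit g_i (> 1)
    """
    ratio = cost / lower_bound
    if ratio <= 1.0:
        return 1.0          # at (or below) the lower bound: fully "good"
    if ratio >= g:
        return 0.0          # beyond the acceptance limit: not "good" at all
    return (g - ratio) / (g - 1.0)

print(membership(150, 100, g=2.0))  # ratio 1.5, halfway to the limit -> 0.5
```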
Start with a balanced partition P = {X, Y}.
Repeat
    For i = 1 to n:
        Choose a free cell b ∈ X ∪ Y such that moving b to the other side gives the highest power gain, Pgain(b), and moving b preserves balance in P.
        Move and lock b.
        Let gi = Pgain(b).
    Find k such that G = g1 + g2 + ... + gk is maximized, and move the first k cells to their complement partitions.
Until G = 0
PowerFM Algorithm
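The pass structure above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the gain is recomputed from scratch instead of using FM bucket structures, the balance rule (a size difference of at most `tolerance` cells), the `EPS` guard, and the toy netlist are all assumptions, and `pgain` here is simply the drop in total switching probability of cut nets.

```python
EPS = 1e-12  # assumption: guards against floating-point noise in gain sums

def powerfm_pass(side, pgain, tolerance=2):
    """One pass: tentatively move each cell once, then keep the best prefix."""
    trial = dict(side)
    free = set(trial)
    moves, gains = [], []
    while free:
        sizes = [sum(1 for s in trial.values() if s == b) for b in (0, 1)]
        # A move is legal if the block sizes stay within the tolerance.
        legal = sorted(
            c for c in free
            if abs((sizes[trial[c]] - 1) - (sizes[1 - trial[c]] + 1)) <= tolerance
        )
        if not legal:
            break
        best = max(legal, key=lambda c: pgain(c, trial))
        gains.append(pgain(best, trial))
        trial[best] = 1 - trial[best]   # move the cell
        free.remove(best)               # and lock it
        moves.append(best)
    # Find k maximizing G = g1 + ... + gk and commit only those k moves.
    best_k, best_total, running = 0, 0.0, 0.0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_total + EPS:
            best_k, best_total = k, running
    for c in moves[:best_k]:
        side[c] = 1 - side[c]
    return best_total

def powerfm(side, pgain, tolerance=2):
    """Repeat passes until a pass yields no positive total gain."""
    while powerfm_pass(side, pgain, tolerance) > EPS:
        pass
    return side

# Hypothetical netlist: (cells on the net, switching probability).
nets = [(("a", "b"), 0.4), (("c", "d"), 0.3), (("a", "c"), 0.1)]

def cut_switching(side):
    return sum(s for cells, s in nets if len({side[c] for c in cells}) > 1)

def pgain(cell, side):
    moved = dict(side)
    moved[cell] = 1 - moved[cell]
    return cut_switching(side) - cut_switching(moved)

side = {"a": 0, "c": 0, "b": 1, "d": 1}
powerfm(side, pgain)
print(sorted(side.items()), round(cut_switching(side), 3))
```

In this toy run the first pass moves b and then c, which brings the switching probability on cut nets from 0.7 down to 0.1; the second pass finds no positive prefix and the loop stops.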
19
[Figure: sequence of cell moves on a six-cell example (cells a to f), labeled with gains up to g6]
If G = g1 + g2 + g3 + g4 is the largest partial sum, the final partition after this pass is {c, d, e} and {a, f, b}.
An Example
20
Power Gain Calculation
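$$Pgain(i) = \sum_{j=1,\; j \in X_i}^{K} S_j - \sum_{j=1,\; j \in U_i}^{K} S_j$$
[Figure: seven cells (1 to 7) split between Partition 1 and Partition 2, labeled with switching probabilities between 0.1 and 0.4]
$$Pgain(7) = 0 - (0.3 + 0.4) = -0.7$$
$$Pgain(1) = 0.1 - 0 = 0.1$$
$X_i$: the set of cut critical nets.
$U_i$: the set of uncut critical nets.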
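The gain formula can be sketched as below, reading $S_j$ as the switching probability of net $j$. The notion of "critical" used here is an assumption borrowed from standard FM: a cut net is critical for cell i if i is the only cell of that net on its side (moving i uncuts it), and an uncut net is critical if all of its cells sit on i's side (moving i cuts it). The original figure's connectivity is not recoverable, so the netlist is hypothetical and only chosen to reproduce the two gains quoted above.

```python
def pgain(cell, nets, side):
    """Pgain(i) = sum of S_j over cut critical nets - sum over uncut critical nets."""
    gain = 0.0
    for cells, s in nets:
        if cell not in cells or len(cells) < 2:
            continue
        same = sum(1 for c in cells if side[c] == side[cell])
        if same == 1:
            gain += s      # cut net that the move would uncut (in X_i)
        elif same == len(cells):
            gain -= s      # uncut net that the move would cut (in U_i)
    return gain

# Hypothetical netlist: (cells on the net, switching probability S_j).
nets = [((1, 5), 0.1), ((2, 3), 0.2), ((3, 4), 0.2),
        ((5, 7), 0.3), ((6, 7), 0.4), ((4, 6), 0.1)]
# Cells 1-4 in Partition 1 (0), cells 5-7 in Partition 2 (1).
side = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1}

print(round(pgain(7, nets, side), 3))  # 0 - (0.3 + 0.4)
print(round(pgain(1, nets, side), 3))  # 0.1 - 0
```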
23
GA from PowerFM vs Random Start
D: delay, C: cutset, P: power

            GA Random Start            GA Start From PowerFM
Circuit     D       C       P          D       C       P
S298        233     19      1013       191     10      921
S386        356     36      1529       345     31      1401
S641        1043    45      2355       861     43      2343
S832        444     45      3034       441     42      3032
S953        526     96      2916       465     89      3012
S1196       396     123     5443       390     86      4921
S1238       475     127     5713       461     91      5702
S1488       571     104     5648       541     83      5248
S1494       614     102     5474       601     97      5123
S2081       302     26      787        260     15      740
S3330       571     299     10358      435     203     9296
S5378       587     573     18437      442     423     15356
S9234       1313    1090    38149      856     375     28305
s13207      1399    1683    45611      951     750     39620
s15850      1820    2183    51747      1350    851     43680
24
TS from PowerFM vs Random Start
D: delay, C: cutset, P: power

            TS Random Start            TS Start From PowerFM
Circuit     D       C       P          D       C       P
S298        197     24      926        189     10      849
S386        386     30      1426       333     27      1264
S641        889     59      2281       844     48      2476
S832        446     50      2731       431     40      3135
S953        466     99      2518       430     85      2999
S1196       301     106     4920       335     77      4823
S1238       408     79      4597       401     74      5190
S1488       528     98      5529       521     94      6005
S1494       585     101     5339       534     95      5058
S2081       225     17      770        244     12      704
S3330       533     295     10298      419     257     9288
S5378       590     430     16527      432     400     15319
S9234       1052    918     34055      835     705     31837
s13207      843     1332    41114      823     1310    40235
s15850      1411    1671    47480      1210    1332    45320
25
Conclusion
• Proposed a modification to the FM algorithm, PowerFM, targeting low power.
• PowerFM results are comparable to SimE but with a faster runtime.
• Investigated the use of PowerFM as a starting solution to iterative algorithms, GA and TS.
• GA performed significantly better when starting from PowerFM.