Floorplanning
Try the online demos at http://foghorn.cadlab.lafayette.edu/cadapplets/
An Example Floorplan
• Alpha 21364
Floorplanning
• Problem
  - Given circuit modules (or cells) and their connections, determine the approximate location of circuit elements
  - Consistent with a hierarchical / building block design methodology
  - Modules (result of partitioning):
    o Fixed area, generally rectangular
    o Fixed aspect ratio: hard macro (aka fixed-shaped blocks)
      fixed / floating terminals (pins)
      Rotation might be allowed / denied
    o Flexible shape: soft macro (aka soft modules)
[Figure: modules with dimensions (w1,h1) through (wN,hN)]
[Bazargan]
Floorplanning (cont.)
• Objectives:
  - Minimize area
  - Determine best shape of soft modules
  - Minimize total wire length
    o to make subsequent routing phase easy (short wire length roughly translates into routability)
  - Additional cost components:
    o Wire congestion (exact routability measure)
    o Wire delays
    o Power consumption
    o System throughput (e.g., CPI of a processor)
• Possible additional constraints:
  - Fixed location for some modules
  - Fixed die, or range of die aspect ratio
[Bazargan]
Floorplanning: Why Important?
• Early stage of physical design
  - Determines the location of large blocks, which makes detailed placement easier (divide and conquer!)
  - Estimates of area, delay, and power inform important design decisions
  - Impact on subsequent design steps (e.g., routing, heat dissipation analysis and optimization)
[Figure: two example floorplans of blocks A through L]
Figs: [©Sherwani]
[Bazargan]
Floorplan Classes
• Slicing, recursively defined as:
  - A module, OR
  - A floorplan that can be partitioned into two slicing floorplans with a horizontal or vertical cut line
[Figure: slicing floorplan of modules 1-7 and the corresponding slicing tree] [©Sarrafzadeh]
• Non-slicing
  - Superset of slicing floorplans
  - Contains the “wheel” shape too.
[Figure: non-slicing floorplan]
[Bazargan]
Non-slicing Floorplan Example
• Hierarchical floorplan of order 5
  - Templates: L5, R5
[Figure: floorplan and tree example for modules 1-8, using an R5 node] [©Sarrafzadeh]
[Bazargan]
Floorplanning Algorithms
• Components
  - “Placeholder” representation
    o Usually in the form of a tree
    o Slicing class: Polish expression [Otten]
    o Non-slicing class: O-tree, Sequence Pair, BSG, etc.
    o Just defines the relative position of modules
  - Perturbation
    o Going from one floorplan to another
    o Usually done using Simulated Annealing
  - Floorplan sizing
    o Definition: Given a floorplan tree, choose the best shape for each module to minimize area
    o Slicing: polynomial, bottom-up algorithm
    o Non-slicing: NP! Use mathematical programming (exact solution)
  - Cost function
    o Area, wire-length, ...
[Bazargan]
Bounds on Aspect Ratios
• We can also allow several shapes for each block:
• For hard blocks, the orientations can be changed:
[Pan]
Area Utilization, Hard and Soft Modules
• The hierarchy tree and floorplan define “place holders” for modules
• Area utilization
  - Depends on how nicely the rigid modules’ shapes are matched
  - Soft modules can take different shapes to “fill in” empty slots: floorplan sizing
[Figure: floorplan tree over modules 1-7 and two packings of m1-m7; Area = 20x22 = 440 vs. Area = 20x19 = 380]
[Bazargan]
Bounds on Aspect Ratios
If there is no bound on the aspect ratios, can we pack everything tightly?
- Sure!
But we don’t want to lay out blocks as long strips, so we require ri ≤ hi/wi ≤ si for each i.
[Pan]
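Floorplan Sizing for Slicing Floorplans
• Bottom-up process
• Has to be done per floorplan perturbation
• Requires O(n) time. n is the total number of shapes of all the modules
[Figure, ©Sarrafzadeh: combining child shapes at a cut node]
  V node, left child L = (ai, bi), right child R = (xj, yj): combined shape (ai + xj, max(bi, yj))
  H node, top child T = (ai, bi), bottom child B = (xj, yj): combined shape (max(ai, xj), bi + yj)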
[Bazargan]
Sizing Slicing Floorplans
• Simple case: All modules are hard macros
  - No rotation allowed, one shape only
[Figure: slicing floorplan of m1-m7 and its slicing tree, with every node annotated with its size; the top-level subtrees 9x15 (modules 1, 6, 7) and 8x16 (modules 2, 3, 4, 5) combine into 17x16 at the root]
[Bazargan]
Sizing Slicing Floorplans (cont.)
• What if modules have more than one shape?
• If area is the only concern:
  - Module A has shapes 4x6, 7x8, 5x6, 6x4, 7x4, which ones should we pick?
  - Module A has shapes 4x6, 5x5, 6x4, which ones should we pick?
[Figure: modules A and B]
• Dominant points Shape (x1, y1) dominates (x2, y2)
if x1 x2 and y1 y2.
[Figure: points p, q, r with regions labeled "dominates p", "dominates q", "dominates r"]
[Bazargan]
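The shape-selection question above can be sketched in code. This is a minimal sketch, not the slides' own procedure: the helper name prune_redundant is hypothetical, and it assumes that when only area matters a shape can be discarded whenever another shape of the same module is no wider and no taller.

```python
def prune_redundant(shapes):
    """Keep only shapes for which no other shape is both no wider and no taller.

    shapes: iterable of (width, height) pairs.
    Returns the survivors sorted by increasing width and decreasing height,
    which is the ordering the sizing procedure expects as input.
    """
    kept = []
    # Sort by width, then height, so that any shape making the current one
    # redundant has already been seen.
    for w, h in sorted(set(shapes)):
        if not kept or h < kept[-1][1]:
            kept.append((w, h))
    return kept


first = prune_redundant([(4, 6), (7, 8), (5, 6), (6, 4), (7, 4)])
second = prune_redundant([(4, 6), (5, 5), (6, 4)])
print(first)   # shapes worth keeping from the first list
print(second)  # all three shapes of the second list survive
```

For the first list only 4x6 and 6x4 survive; in the second list no shape makes another redundant, so all three are kept.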
Sizing Slicing Floorplans: Example
[Figure: A placed beside B, so widths add and heights take the max]

            a1 = 4x6    a2 = 5x5    a3 = 6x4
b1 = 2x7    6x7         7x7         8x7
b2 = 3x4    7x6         8x5         9x4
b3 = 4x2    8x6         9x5         10x4
[Bazargan]
Slicing Floorplan Sizing Algorithm
Procedure Vertical_Node_Sizing
Input: Two sorted lists L = { (a1, b1), ... , (as, bs) }, R = { (x1, y1), ... , (xt, yt) }
       where ai < aj, bi > bj for all i < j; xi < xj, yi > yj for all i < j
Output: A sorted list H = { (c1, d1), ... , (cu, du) }
        where u ≤ s + t - 1, ci < cj, di > dj for all i < j
begin
  H := ∅
  i := 1, j := 1, k := 1
  while (i ≤ s) and (j ≤ t) do
  begin
    (ck, dk) := (ai + xj, max(bi, yj))
    H := H ∪ { (ck, dk) }
    k := k + 1
    if max(bi, yj) = bi then i := i + 1
    if max(bi, yj) = yj then j := j + 1
  end
end
[©Sarrafzadeh]
[Bazargan]
Slicing Floorplan Sizing
• Input: floorplan tree, modules shapes
• Start with sorted shapes lists of modules
• In a bottom-up fashion, perform:
  - Vertical_Node_Sizing AND Horizontal_Node_Sizing
• When we get to the root node, we have a list of shapes. Select the one that is best in terms of area
• In a top-down fashion, traverse the floorplan tree and set module locations
[Bazargan]
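The Vertical_Node_Sizing procedure and the bottom-up pass can be sketched in Python as follows. This is a sketch under stated assumptions, not the slides' reference code: the tree encoding ("V"/"H", first child, second child), the function names, and the module sizes in the usage example are all illustrative. The horizontal case is obtained here by transposing the shapes and reusing the vertical merge, which is one possible way to implement Horizontal_Node_Sizing.

```python
def vertical_node_sizing(left, right):
    """Merge two shape lists across a vertical cut.

    Both lists hold (width, height) pairs sorted by increasing width and
    decreasing height; the result is sorted the same way.
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        (a, b), (x, y) = left[i], right[j]
        height = max(b, y)
        result.append((a + x, height))
        # Advance past whichever child determined the height (both on a tie).
        if height == b:
            i += 1
        if height == y:
            j += 1
    return result


def horizontal_node_sizing(top, bottom):
    """Merge across a horizontal cut by transposing and reusing the vertical merge."""
    def flip(shapes):
        return [(h, w) for (w, h) in reversed(shapes)]
    return flip(vertical_node_sizing(flip(top), flip(bottom)))


def size_tree(node, shapes):
    """Bottom-up sizing. A node is a module name or a (cut, first, second) tuple."""
    if isinstance(node, str):
        return shapes[node]
    cut, first, second = node
    a, b = size_tree(first, shapes), size_tree(second, shapes)
    return vertical_node_sizing(a, b) if cut == "V" else horizontal_node_sizing(a, b)


# Illustrative single-shape modules (sizes are assumptions for this sketch).
shapes = {"m1": [(8, 8)], "m2": [(4, 8)], "m3": [(3, 6)], "m4": [(4, 5)],
          "m5": [(7, 5)], "m6": [(4, 7)], "m7": [(5, 4)]}
tree = ("V", ("H", "m1", ("V", "m6", "m7")),
             ("H", ("V", "m2", ("H", "m3", "m4")), "m5"))
root_shapes = size_tree(tree, shapes)
best = min(root_shapes, key=lambda s: s[0] * s[1])
print(root_shapes, best)
```

Each merge touches every input shape at most once, which is where the linear bound on the slide comes from. Setting module locations top-down is not shown.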
Find the Best Area
• Recursively combining shape curves.
[Figure: shape curves of the children combined at V and H nodes; pick the best point on the resulting curve]
[Pan]
Wire Length
• For hyperedges: Either of complete graph, MST, or Steiner tree
• For each edge:
  - Euclidean distance sqrt( (x1-x2)^2 + (y1-y2)^2 )
    o Direct lines
  - Manhattan distance |x1 - x2| + |y1 - y2|
    o Manhattan: Only horizontal / vertical lines
[Figure, ©Sherwani: (a) Steiner tree (length = 13), (b) minimum spanning tree (length = 11), (c) complete graph (length = 32)]
[Bazargan]
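The two distance measures and two of the three hyperedge models above can be sketched as follows. The function names and pin coordinates are illustrative assumptions; the Steiner tree model is omitted because computing a minimum Steiner tree is considerably harder than the other two.

```python
import math
from itertools import combinations


def manhattan(p, q):
    """Length using only horizontal and vertical segments."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def euclidean(p, q):
    """Straight-line length."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def complete_graph_length(pins, dist=manhattan):
    """Sum of distances over every pair of pins of the net."""
    return sum(dist(p, q) for p, q in combinations(pins, 2))


def mst_length(pins, dist=manhattan):
    """Total length of a minimum spanning tree over the pins (Prim's algorithm)."""
    if not pins:
        return 0
    # best[i] = cheapest connection from pin i to the tree built so far.
    best = {i: dist(pins[0], pins[i]) for i in range(1, len(pins))}
    total = 0
    while best:
        i = min(best, key=best.get)
        total += best.pop(i)
        for j in best:
            best[j] = min(best[j], dist(pins[i], pins[j]))
    return total


pins = [(0, 0), (2, 0), (0, 3)]
print(complete_graph_length(pins), mst_length(pins), euclidean((0, 0), (3, 4)))
```

The complete-graph model counts every pin pair and therefore overestimates multi-pin nets, which is why its length is the largest of the three in the figure.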
Polish Expression
• Tree representation of the floorplan
  - Left child of a V-cut in the tree represents the left slice in the floorplan
  - Left child of an H-cut in the tree represents the top slice in the floorplan
• Polish expression representation
  - A string of symbols obtained by traversing a binary tree in post-order.
[Figure: slicing floorplan of modules 1-7 and its slicing tree]
  1 7 6 | - 2 3 4 - | 5 - |
[Bazargan]
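The post-order relationship between a slicing tree and its Polish expression can be sketched as below. The tuple encoding (operator, left child, right child) and the function names are assumptions made for this sketch; the example tree is chosen so that it reproduces the expression shown on the slide.

```python
OPERATORS = ("|", "-")  # "|" is a V-cut, "-" is an H-cut


def to_polish(node):
    """Post-order traversal: left subtree, right subtree, then the cut operator."""
    if isinstance(node, str):
        return [node]
    cut, left, right = node
    return to_polish(left) + to_polish(right) + [cut]


def from_polish(tokens):
    """Rebuild the tree with a stack, the usual way to evaluate postfix notation."""
    stack = []
    for tok in tokens:
        if tok in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append((tok, left, right))
        else:
            stack.append(tok)
    (root,) = stack  # a valid expression leaves exactly one tree on the stack
    return root


tree = ("|",
        ("-", "1", ("|", "7", "6")),
        ("-", ("|", "2", ("-", "3", "4")), "5"))
expr = " ".join(to_polish(tree))
print(expr)
```

Because the expression is a plain token list, the annealer can perturb it cheaply and rebuild the tree only when sizing is needed.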
Normalized Polish Expression
• Problem with Polish expressions?
  - Multiple representations for some slicing trees
    o When more than one cut in one direction cut a floorplan
  - Larger solution space
  - A stochastic algorithm (e.g., Simulated Annealing) will be more biased towards floorplans with multiple representations
    o (More likely to be visited)
[Figure, ©Sarrafzadeh: one floorplan of modules 1-4 with two slicing trees: 1 2 - 3 4 | | and 1 2 - 3 | 4 |]
[Bazargan]
Normalized Polish Expression (cont.)
• Solution?
  - Assign priorities to the cuts
  - In a top-down tree construction,
    o Pick the right-most cut
    o Pick the lowest cut
  - Result: no two same operators adjacent in the Polish expression (i.e., no “| |” or “- -”)
[Figure: floorplan of modules 1-5 and its tree] 1 2 - 5 - 3 | 4 |
[Bazargan]
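The "no two same operators adjacent" rule can be checked mechanically. The sketch below assumes a token-list encoding and hypothetical function names; besides the adjacency rule it also checks that the token list is a well-formed postfix expression (every prefix has more operands than operators).

```python
OPERATORS = {"|", "-"}


def is_polish(tokens):
    """True if the tokens form a valid postfix expression of binary cuts."""
    operands = operators = 0
    for tok in tokens:
        if tok in OPERATORS:
            operators += 1
        else:
            operands += 1
        # Each operator needs two finished subtrees to its left.
        if operators >= operands:
            return False
    return operands == operators + 1


def is_normalized(tokens):
    """Valid Polish expression with no two identical operators next to each other."""
    no_repeat = all(not (a in OPERATORS and a == b)
                    for a, b in zip(tokens, tokens[1:]))
    return is_polish(tokens) and no_repeat


for text in ("1 2 - 3 4 | |", "1 2 - 3 | 4 |", "1 2 - 5 - 3 | 4 |"):
    print(text, is_normalized(text.split()))
```

The first expression is rejected because of the adjacent "| |"; the other two, including the five-module example, pass.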
Simulated Annealing
• Idea originated from observations of crystal formations (e.g., in lava)
  - A crystal is in a low energy state
  - Materials tend to form crystals (global minimum)
  - If at the right temperature (i.e., right speed), a molecule will adhere to a crystal formation
• Very slowly decrease temperature
  - When very hot, molecules move freely
    o When a molecule gets to a chunk of crystal, it *might* move away due to its high speed
  - When colder, molecules slow down
    o The probability of moving away from a local optimum decreases
  - When the material “freezes”, all molecules are fixed and the material is in minimum energy state
[Bazargan]
Simulated Annealing Algorithm
• Components:
  - Solution space (e.g., slicing floorplans)
  - Cost function (e.g., the area of a floorplan)
    o Determines how “good” a particular solution is
  - Perturbation rules (e.g., transforming a floorplan to a new one)
  - Simulated annealing engine
    o A variable T, analogous to temperature
    o An initial temperature T0 (e.g., T0 = 40,000)
    o A freezing temperature Tfreez (e.g., Tfreez = 0.1)
    o A cooling schedule (e.g., T = 0.95 * T)
[Bazargan]
Simulated Annealing Algorithm
Procedure SimulatedAnnealing
  curSolution = random initial solution
  T = T0  // initial temperature
  while (T > Tfreez) do
    for i=1 to NUM_MOVES_PER_TEMP_STEP do
      nextSol = perturb (curSolution)
      Δcost = cost(nextSol) - cost(curSolution)
      if acceptMove (Δcost, T) then
        curSolution = nextSol  // accept the move
    T = coolDown (T)

Procedure acceptMove (Δcost, T)
  if Δcost < 0 then return TRUE  // always accept a good move
  else
    boltz = e^(-Δcost / (k T))  // Boltzmann probability function
    r = random(0,1)  // uniform rand # between 0&1
    if r < boltz then return TRUE
    else return FALSE
[Bazargan]
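A runnable version of the two procedures might look like the sketch below. The default parameters reuse the example values from the slides (T0 = 40,000, Tfreez = 0.1, cooling factor 0.95); the moves-per-step count, the default k = 1, the toy cost function, and the extra bookkeeping that remembers the best solution seen are assumptions of this sketch rather than part of the slide's pseudocode.

```python
import math
import random


def accept_move(delta_cost, T, k=1.0, rng=random):
    """Always accept improvements; accept a worse move with Boltzmann probability."""
    if delta_cost < 0:
        return True
    boltz = math.exp(-delta_cost / (k * T))
    return rng.random() < boltz


def simulated_annealing(initial, cost, perturb, T0=40000.0, T_freeze=0.1,
                        alpha=0.95, moves_per_step=20, k=1.0, rng=random):
    """Generic annealing loop; perturb(solution, rng) returns a neighbouring solution."""
    cur = best = initial
    T = T0
    while T > T_freeze:
        for _ in range(moves_per_step):
            nxt = perturb(cur, rng)
            if accept_move(cost(nxt) - cost(cur), T, k, rng):
                cur = nxt
                # Extra bookkeeping (not in the pseudocode): remember the best visit.
                if cost(cur) < cost(best):
                    best = cur
        T *= alpha  # cooling schedule T = 0.95 * T
    return best


# Toy usage: minimise (x - 3)^2 over the integers with +/-1 moves.
rng = random.Random(0)
toy_cost = lambda x: (x - 3) ** 2
toy_perturb = lambda x, r: x + r.choice((-1, 1))
result = simulated_annealing(50, toy_cost, toy_perturb, rng=rng)
print(result, toy_cost(result))
```

In a floorplanner, the solution would be a normalized Polish expression or a sequence pair, and perturb would apply one of the move operators.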
Simulated Annealing: Move Acceptance
• Good moves are always accepted
• Accepting bad moves:
  - When T = T0, bad move acceptance probability ≈ 1
  - When T = Tfreez, bad move acceptance probability = 0
• Boltzmann probability function
  - boltz = e^(-Δcost / (k T)), where Δcost is the cost increase of the move
  - k is the Boltzmann constant, chosen so that all moves at the initial temperature are accepted
[Bazargan]
Simulated Annealing: More Insight...
[Plots: Temperature (0 to 40000) vs. annealing steps (1 to 401); "Boltzmann Exp" (0 to 1) vs. annealing steps (1 to 401)]
[Bazargan]
Simulated Annealing: More Insight...
[Plots: "Num Moves Acc" (0 to 250) vs. annealing steps (1 to 401); Cost Function (0 to 800) vs. annealing steps (1 to 401)]
[Bazargan]
Wong-Liu Floorplanning Algorithm
• Uses simulated annealing
• Normalized Polish expressions represent floorplans
• Cost function: cost = area + totalWireLength
  - Floorplan sizing is used to determine area
  - After floorplan sizing, the exact location of each module is known, hence wire-length can be calculated
[Bazargan]
Wong-Liu Floorplanning Algorithm (cont.)
• Moves:
  - OP1: Exchange two operands that have no other operands in between
  - OP2: Complement a series of operators between two operands
  - OP3: Exchange adjacent operand and operator if the resulting expression is still a normalized Polish expression
[Figure, ©Sarrafzadeh: four floorplans of modules 1-4, each obtained from the previous one by a single move]
  1 2 | 4 - 3 |  --OP1-->  1 2 | 3 - 4 |  --OP2-->  1 2 - 3 - 4 |  --OP3-->  1 2 - 3 4 - |
[Bazargan]
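The three moves can be sketched on token lists as shown below, replaying the OP1, OP2, OP3 chain from the figure. The function names, the index conventions, and the choice to return None for a rejected OP3 are assumptions of this sketch.

```python
COMPLEMENT = {"|": "-", "-": "|"}  # operators and their complements


def is_normalized(expr):
    """Valid postfix expression with no two identical adjacent operators."""
    operands = operators = 0
    for prev, tok in zip([None] + expr, expr):
        if tok in COMPLEMENT:
            operators += 1
            if tok == prev:
                return False
        else:
            operands += 1
        if operators >= operands:
            return False
    return operands == operators + 1


def op1(expr, k):
    """Swap the k-th and (k+1)-th operands (0-based, counting operands only)."""
    pos = [i for i, t in enumerate(expr) if t not in COMPLEMENT]
    i, j = pos[k], pos[k + 1]
    out = list(expr)
    out[i], out[j] = out[j], out[i]
    return out


def op2(expr, i):
    """Complement the maximal run of operators that contains position i."""
    assert expr[i] in COMPLEMENT
    lo = hi = i
    while lo > 0 and expr[lo - 1] in COMPLEMENT:
        lo -= 1
    while hi + 1 < len(expr) and expr[hi + 1] in COMPLEMENT:
        hi += 1
    return expr[:lo] + [COMPLEMENT[t] for t in expr[lo:hi + 1]] + expr[hi + 1:]


def op3(expr, i):
    """Swap an adjacent operand/operator pair at i, i+1; None if not allowed."""
    if (expr[i] in COMPLEMENT) == (expr[i + 1] in COMPLEMENT):
        return None
    out = list(expr)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out if is_normalized(out) else None


e0 = "1 2 | 4 - 3 |".split()
e1 = op1(e0, 2)
e2 = op2(e1, 2)
e3 = op3(e2, 4)
print(" ".join(e1), "/", " ".join(e2), "/", " ".join(e3))
```

OP1 and OP2 always preserve normalization, so only OP3 needs the validity check.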
The Sequence Pair Algorithm
• Sequence-Pair is a succinct representation of non-slicing floorplans of rectangles
  - Just like Polish Expression for slicing floorplans
• Represent a non-slicing floorplan by a pair of sequences of blocks
• Using Simulated Annealing to find a good sequence-pair
• Can only handle hard blocks
  - i.e., cannot do things like shape-curve computation
• Essentially macro placement
• Techniques for soft block shaping exist (e.g., using Lagrangian Relaxation) but are very slow
[Pan]
Positive step lines
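[Figure: positive step lines drawn through a packing of blocks a-f]
Is this unique?
[Figure: an alternative set of positive step lines for the same packing]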
Sequence Pair
• Positive step line sequence: ecadfb [or ecafdb in the alternative version]
• Negative step line sequence: fcbead
[Pan]
Positive Locus and Negative Locus
Positive Locus of Block b
Negative Locus of Block b
[Pan]
Sequence-Pair
[Figure: positive loci and negative loci of all blocks]
Sequence-Pair = (abdecf, cbfade)
[Pan]
Geometric Info of Sequence-Pair
Given a placement and the corresponding sequence-pair (P, N):
• a right of b ⇔ a is after b in both P and N.
[Figure: example placement of blocks a, b, c]
Geometric Info of Sequence-Pair
Given a placement and the corresponding sequence-pair (P, N):
• a above b ⇔ a is before b in P and after b in N
[Figure: example placement of blocks a, b, c]
Positive Locus and Negative Locus
Positive Locus of Block b
Negative Locus of Block b
[Figure: the loci of block b, with the surrounding regions labeled above, below, left, right]
[Pan]
Geometric Info of Sequence-Pair
Given a placement and the corresponding sequence-pair (P, N):
• a right of b ⇔ a is after b in both P and N.
• a left of b ⇔ a is before b in both P and N.
• a above b ⇔ a is before b in P and after b in N.
• a below b ⇔ a is after b in P and before b in N.
[Pan]
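The four rules translate directly into a lookup on the two sequences. The function name relation is a hypothetical helper; the sequence pair used in the example is the (abdecf, cbfade) pair from the earlier slide.

```python
def relation(P, N, a, b):
    """Return how block a sits relative to block b under sequence pair (P, N)."""
    before_in_p = P.index(a) < P.index(b)
    before_in_n = N.index(a) < N.index(b)
    if before_in_p and before_in_n:
        return "left of"
    if not before_in_p and not before_in_n:
        return "right of"
    if before_in_p:          # before in P, after in N
        return "above"
    return "below"           # after in P, before in N


P, N = "abdecf", "cbfade"
for a, b in [("a", "b"), ("a", "d"), ("d", "a"), ("b", "a")]:
    print(a, relation(P, N, a, b), b)
```

Every pair of distinct blocks falls into exactly one of the four cases, which is why a sequence pair always describes a non-overlapping arrangement.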
From Sequence-Pair to a Floorplan
• Given a sequence-pair, the floorplan with smallest area can be found in O(n^2) time.
• Algorithms of time O(n log log n) or O(n log n) exist, but they are faster than the O(n^2) algorithm only when n is quite large.
[Figure: labeled grid for (abdecf, cbfade)]
[Pan]
From Sequence-Pair to Placement
• Distance from left (bottom) edge can be found using the longest path algorithm on the horizontal (vertical) constraint graph.
[Figure: Horizontal Constraint Graph and Vertical Constraint Graph]
[Pan]
Sequence Pair (SP)
A floorplan is represented by a pair of permutations of the module names, e.g.
  1 3 2 4 5
  3 5 4 1 2
A sequence pair (s1, s2) of n modules can represent all possible floorplans formed by the n modules by specifying the pair-wise relationship between the modules.
[Pan]
Sequence Pair
Consider a pair of modules A and B. If the arrangement of A and B in s1 and s2 is:
• (…A…B…, …A…B…), then the right boundary of A is on the left hand side of the left boundary of B.
• (…A…B…, …B…A…), then the upper boundary of B is below the lower boundary of A.
[Pan]
Example
Consider the sequence pair: (13245, 41352)
[Figure: packing of modules 1-5]
Any other SP that is also valid for this packing?
[Pan]
Floorplan Realization
• Floorplan realization is the step to construct a floorplan from its representation.
• How to construct a floorplan from a sequence pair?
• We can make use of the horizontal and vertical constraint graphs (Gh and Gv).
[Pan]
Floorplan Realization
• Whenever we see (…A…B…, …A…B…), add an edge from A to B in Gh with weight wA.
• Whenever we see (…A…B…, …B…A…), add an edge from B to A in Gv with weight hB.
• Add a source vertex s to Gh and Gv pointing, with weight 0, to all vertices without incoming edges.
• Finally, find the longest paths from s to every vertex in Gh and Gv (how?), which are the coordinates of the lower left corner of the module in the packing.
[Pan]
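One way to carry out this realization is sketched below. Instead of building Gh and Gv explicitly, the sketch visits modules in an order that is already topological for each graph (the first sequence for x, the second for y) and takes the maximum over all predecessors, which yields the same longest-path values in O(n^2) time. The function name and the module dimensions are assumptions chosen for illustration; only the sequence pair (13245, 41352) comes from the slides.

```python
def realize(P, N, width, height):
    """Lower-left coordinates of every module for sequence pair (P, N)."""
    pos_p = {m: i for i, m in enumerate(P)}
    pos_n = {m: i for i, m in enumerate(N)}
    x, y = {}, {}
    # a is left of b when a precedes b in both sequences, so P order is topological.
    for b in P:
        x[b] = max((x[a] + width[a] for a in P
                    if pos_p[a] < pos_p[b] and pos_n[a] < pos_n[b]), default=0)
    # b is below a when b follows a in P and precedes a in N, so N order is topological.
    for a in N:
        y[a] = max((y[b] + height[b] for b in N
                    if pos_n[b] < pos_n[a] and pos_p[b] > pos_p[a]), default=0)
    return x, y


# Illustrative dimensions (assumptions, not the values in the slide's figure).
width = {"1": 2, "2": 2, "3": 1, "4": 3, "5": 2}
height = {"1": 1, "2": 1, "3": 1, "4": 2, "5": 1}
x, y = realize("13245", "41352", width, height)
chip_w = max(x[m] + width[m] for m in x)
chip_h = max(y[m] + height[m] for m in y)
print(x, y, chip_w, chip_h)
```

With these dimensions module 4 sits at the bottom left, module 5 to its right, and modules 1, 3, 2 form a row above them.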
Example
[Figure: packing for the sequence pair (13245, 41352) and its constraint graphs Gh and Gv, each with source vertex s and weighted edges]
[Pan]
Constraint Graphs
• How many edges are there in Gh and Gv in total?
• Are there any transitive edges in Gh and Gv?
• How to remove the transitive edges?
• Can we reduce the size of Gh and Gv to linear, i.e., no. of edges is of order O(n), by removing all the transitive edges?
[Pan]
Moves
• Three kinds of moves in the annealing process:
  - M1: Rotate a module, or change the shape of a module
  - M2: Interchange 2 modules in both sequences
  - M3: Interchange 2 modules in the first sequence
• Does this set of move operations ensure reachability? Why?
[Pan]
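The three moves are simple to express on Python lists. The helper names are hypothetical, and M1 is shown only in its rotation form (swapping width and height); changing the shape of a soft module is not covered by this sketch.

```python
def swap(seq, a, b):
    """Return a copy of seq with the positions of modules a and b interchanged."""
    return [b if t == a else a if t == b else t for t in seq]


def m1_rotate(sizes, m):
    """M1: rotate module m by exchanging its width and height."""
    out = dict(sizes)
    w, h = out[m]
    out[m] = (h, w)
    return out


def m2(P, N, a, b):
    """M2: interchange two modules in both sequences."""
    return swap(P, a, b), swap(N, a, b)


def m3(P, N, a, b):
    """M3: interchange two modules in the first sequence only."""
    return swap(P, a, b), list(N)


P, N = list("abdecf"), list("cbfade")
print(m2(P, N, "a", "b"))
print(m3(P, N, "a", "b"))
print(m1_rotate({"a": (2, 5)}, "a"))
```

M2 exchanges the two modules' places while leaving every positional relation of the packing intact, whereas M3 changes the relation between the two swapped modules.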
Pros and Cons of SP
• Advantages:
  - Simple representation
  - All floorplans can be represented.
  - The solution space is finite. (How big?)
• Disadvantages:
  - Redundant representation. The representation is not 1-to-1.
  - The size of the constraint graphs, and thus the runtime to construct the floorplan, is quadratic
[Pan]
*-Tree Methods
• Various methods and representations for nonslicing floorplans
  - Bounded slicing grid (BSG) (1996)
  - O-tree (1999)
  - B*-tree (2000)
  - Corner block list (CBL) (2000)
  - Transitive closure graph (TCG) (2001)
• These represent nonslicing floorplans by strings and use simulated annealing to optimize the layout.
Other Floorplanning Methods
• Integer linear programming
  - Uses integer variables to capture “left of,” “right of,” “above” and “below”
Overconstrained Shaping
• Why rectangles, L’s, T’s ?
  - available granularity is by site spacing, row height
  - placers can handle arbitrarily complex region constraints
  - hard IP reuse, generated modules benefit from shape freedom
• Why non-overlapping ?
  - only requirement: total assigned cell area ≤ total resource area
• Roundness and shape simplicity are mythical needs
  - constructive pin assignment doesn’t need roundness
  - path timing optimization may even want disconnected shapes
[Kahng]
This is Okay, Really... (Trust Me)
[Figure: Blk A and Blk B, with labels 1.0, 1.0, and 0.5,0.5]
[Kahng]
...The Cells Won’t Mind
[Kahng]
Using Floorplan Information: A Typical “Fluid” Placement
[I. Markov]
Flat vs. hierarchical placement
• Flat: Works well for highly interconnected networks
• Hierarchical: Good choice for SoC
Can hybridize the two to get best of both worlds
[Lackey et al., IBM, DAC 03]
Other Objective Functions
Motivation
• Critical length as a function of technology
  - Wire length at which delay = clock period
• Across-chip wire delays > clock period
  - Multicycle global communication is essential
[Figure: chip cross-section (M3, M6) and relative critical seq. length for the 90nm, 65nm, 45nm, and 32nm nodes, annotated 0.43x]
[Saxena (Intel), ISPD 03]
[Intel]
Wire-pipelining
• Interconnect delay is distributed among several clock cycles by inserting flip-flops
• Adds area/power overhead
  o Delay = 0.67ns (70nm) [Cong, Proc. IEEE 2001]
  o Target Frequency : 3GHz (clock period : 0.33ns)
[Figure: 1cm x 1cm chip]
• Widely used, e.g., Intel’s Itanium processor
An Example Microarchitecture
[Figure: block diagram of a processor pipeline with 41 blocks and 21 latch banks, including Int and FP rename, Int and FP schedulers, register files, execution units EX0-EX3, MDH, reorder buffer, D-cache, and Bus Interface Unit; lines between blocks are labeled 2, 4, or 8]
• Numbers below the lines indicate the # of instructions flowing across the line (not bit width)
• MDH = Memory Disambiguation Hardware
Impact on Microarchitecture
• Keep throughput critical wires short
Execution time = num-instr * cycles/instr (CPI) * cycle-time
• CPI estimation: cycle-accurate simulation of benchmark programs, using superscalar processor simulators
  - Simulators : Simplescalar (Wisc.), Turandot (IBM), etc.
  - Benchmarks : SPEC 2000, Mediabench
  - Very slow: a single simulation can take days to run to completion
Minimizing CPI
• A possible design flow
[Figure: design flow linking μ-arch, Freq, CPI estimator, Physical design, and Layout]
• A few objectives :
  - Optimal microarchitectural configuration for a particular frequency
  - Optimal design frequency : Wire-pipelining may not improve performance (exec time) after a certain operating frequency
Recent approaches
• MEVA [Jagannathan, DAC 03]: Floorplanning
  - Simulated Annealing (SA) based, no wire-pipelining
  - Assumption : Each block has multiple implementations
  - Cost function : CPI * cycle-time
    o CPI is determined by the chosen μ-arch configuration
    o Cycle-time is determined by the global wire delays
  - CPI is computed for each configuration beforehand
[Figure: flow from μ-arch blocks through Simplescalar (CPI) to Floorplanning (configuration, cycle-time)]
  - Expensive if there are too many candidate configurations
Microarchitecture Template
• A way to specify a class of microarchitectures
  - Define underlying building blocks for the architecture model and their connections
  - Individual blocks can still be parameterized
    o Examples: Size/associativity of caches, size of register file etc.
• Variation in area/latency/delay of a given block
  - Latency variation affects IPC in the architectural space
  - Area/delay affects physical design space
• Some examples of alternatives..
  - Cache: size, associativity, latency
  - Branch predictor: size, predictor type
  - Register File: size, latency
  - Instruction scheduler: different scheduling techniques
[Jagannathan, DAC03]
Illustration: Cache
32K Data cache: A=5.04 mm^2, L=4 (larger area, latency)
8K Data cache: A=1.44 mm^2, L=1 (smaller area, latency)
8K Data cache: A=1.44 mm^2, L=2
[Jagannathan, DAC03]
Bus Weights Approaches
• Used for floorplanning, incorporating wire latencies
• Search space is exponential
  - Say, up to k latencies per bus, n busses: k^n combinations
  - Each requires a cycle-accurate simulation for performance analysis
• Quantify the impact of each wire with a weight, which can be used in physical design optimizations
[Figure: pipeline Fetch, Decode, Exec with a branch mispredict loop]
  - The impact may vary with the loop latency
• [Ekpanyapong, DAC 04] : Wire weight = Number of times it is accessed, determined from simulation profiles
  - Are access ratios good estimators of criticality?
Bus Weights Approaches (Contd.)
• Weighted cost function:
    cost = W_A * Area + W_AR * AR + W_WL * WSFL
  where
    Area = area of the layout
    WL = wirelength
    WSFL = weighted sum of factor latencies
    AR = aspect ratio
• [Nookala, DAC 05] Another way of finding wire weights: wire weights are determined using a statistical design of experiments based strategy
  - Has some benefits over access ratios, which are an indirect metric
  - Captures the effect on throughput directly
  - Can add thermal issues [Nookala, ISLPED06], using HotSpot (built on top of SimpleScalar)
Controlling the Wire Length “Explosion”
An Architectural Solution to Interconnect Tyranny
• As seen earlier, alternate scaling scenarios also face interconnect tyranny (albeit to differing degrees)
• Most promising approach: simplify interconnection complexity architecturally
  - Modify wiring histogram shape (i.e. Rent’s parameters) of design
• An example: multi-core microprocessors
  - Goes counter to traditional approach of increased integration through block size scaling
[Figure: wiring histogram, # wires vs. wirelength]
[Saxena]
Planning a City: Land Usage
[Somewhere in Iowa; pop. density of Iowa = 20 persons/km2]
[Minneapolis, p.d. = 2700/km2] [Barcelona = 16000/km2] [New York = 26000/km2]
The Future of Chip Design
• Today’s chips are 2-dimensional
[Maly]
3D IC Using Wafer Bonding
[Figure, adapted from [Das et al., ISVLSI, 2003]: generalized view of a stack of Layer 1 through Layer 5 on a bulk substrate, and detailed view showing the bulk wafer, metal level of wafer 1, device level 1, SOI wafers with bulk substrate removed, and inter-layer bonds; dimension labels 500μm, 10μm, 1μm]
Global Net Length Distribution
• Histogram of net length, for various numbers of 3D layers
[Plot: 3D Global Net Distributions; net density (#/mm) vs. length (0 to 35 mm) for 1 stratum, 2 strata, and 4 strata]
3D Floorplanning
• Problem: getting the heat out!
• Need to incorporate thermal analysis into design
• Example of a 3D floorplanner
  - Cong et al., ICCAD 2004; ASPDAC06.