keith so university of new south wales, sydney, australia feb 25 @ fpga’08
Enforcing Long-Path Timing Closure for FPGA Routing with Path Searches on Clamped Lexicographic Spirals
Keith So
University of New South Wales, Sydney, Australia
Feb 25 @ FPGA’08
Outline
• Problem Statement
• Related Work
1. SpiralRoute Overview
a) Budget Generation
b) Clamped Lexicographic Search
2. Some Performance Optimizations
3. Experiments
• Conclusions and Future Work
Problem Statement & Assumptions
Long-Path Timing-Driven Detailed Routing
• Given: Placed circuit mapped into RR Graph + Timing Requirement D
• Find: Mutually RR-vertex disjoint routing trees s.t. Max. Long-Path Comb. Delay <= D
Assumptions
• D is achievable under given placement
• Buffered switching (delays summable)
Related Work
• [F’92] Iterative slack allocation
• [AR’95] Criticality bin + Steiner/Arbor.
• [ME’95] Negotiated Congestion
• [BR’97] VPR
• [LW’03] Lagrangian Rel. Weighting
• [ANC+’04] Auto. Constraint Gen.
• [FBC’04] RCV
SpiralRoute Overview
• Negotiated Congestion Routing over A*
• Paths are lexicographic-costed [S’07,ISPD07]
Major Deltas
• Optimal delay upper bound generation for FPGA routing domain
• Minimum-congestion bounded-delay searching (vs tradeoff using weights)
• Provable timing closure at completion
Connection Budget Generation – Optimization Component
Weighted Budget Distribution Problem [Ghiasi et al., ICCAD’04]
Given: DAG G=(V,A), min. delays dij, weights wij, long-path constraint T
Find: delay budgets bij such that:
1. (dij+bij) summed along all paths satisfies T
2. Sum of (wij.bij) over all edges is maximised
Transforms into min-cost flow problem; budgets recovered from dual of flow solution.
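The two conditions above can be checked for a candidate budget assignment with a short Python sketch. This is not the min-cost flow solver from the slide; it only verifies a given solution. The function names, the dictionary layout, the non-negativity check on budgets, and the three-edge example DAG are all assumptions made for illustration.

```python
from collections import defaultdict

def longest_path(edges, delay):
    """Length of the longest path in a DAG, processed in topological order."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for (u, v) in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    dist = {n: 0 for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:
        u = ready.pop()
        for v in succ[u]:
            dist[v] = max(dist[v], dist[u] + delay[(u, v)])
            indeg[v] -= 1
            if indeg[v] == 0:  # all predecessors of v are final
                ready.append(v)
    return max(dist.values())

def check_budgets(edges, d, b, w, T):
    """Return (feasible, objective) for budgets b on top of min delays d."""
    total = {e: d[e] + b[e] for e in edges}
    # Assumption: budgets are non-negative slack added to the minimum delay.
    feasible = all(b[e] >= 0 for e in edges) and longest_path(edges, total) <= T
    objective = sum(w[e] * b[e] for e in edges)
    return feasible, objective

# Hypothetical example: two paths from a to c, long-path constraint T = 4.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
d = {e: 1 for e in edges}
w = {e: 1 for e in edges}
b = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 3}
print(check_budgets(edges, d, b, w, T=4))
```

In this example both paths end up with total delay exactly 4, so every unit of slack under T has been handed out as budget.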
Connection Budget Generation – Mapping to FPGA Routing
1. Represent LE’s and pads as edges (split clocked LE’s)
2. Form super-DAG
3. dij = min connection delay (from congestion-oblivious routing)
4. Set T = D
5. Set wij = 1 for real edges, 0 for virtuals
6. Solved (dij+bij) is the maximum delay for each edge in our routing
Comparison with It. Minimax PERT (clma runtime ~ 20 mins)
Search Design – n-Lex. Search
• [1-Line A* search: f(v)=g(v)+h(v), expand v with minimum f(v) until t]
• 2-component lexicographic search used for routability router (Conceptually a*∞ + b)
• Need n-components and custom comparison functions (proofs needed to avoid ∞^k values!)
Theorem: A* of n-lexicographic search is admissible if all components are totally-ordered monoids with order-preserving addition
• Monoids helpful to avoid clutter from max()
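A minimal Python sketch of an A* search over lexicographic costs follows. It assumes each cost is a tuple of non-negative numbers, added componentwise and compared with Python's built-in tuple ordering, which is one simple instance of a totally-ordered monoid with order-preserving addition. The function names, the zero heuristic, and the example graph are hypothetical and not taken from SpiralRoute.

```python
import heapq

def add(x, y):
    """Componentwise addition of two cost tuples."""
    return tuple(a + b for a, b in zip(x, y))

def lex_astar(graph, source, target, h, zero):
    """A* where g, h and f are tuples compared lexicographically."""
    best = {source: zero}
    frontier = [(add(zero, h(source)), source)]
    while frontier:
        f, u = heapq.heappop(frontier)
        if u == target:
            return best[u]
        if f > add(best[u], h(u)):
            continue  # stale queue entry, a cheaper g(u) was found later
        for v, cost in graph.get(u, []):
            g = add(best[u], cost)
            if v not in best or g < best[v]:
                best[v] = g
                heapq.heappush(frontier, (add(g, h(v)), v))
    return None

# Hypothetical graph; edge costs are (congestion, delay).
graph = {
    "s": [("a", (0, 9)), ("b", (1, 1))],
    "a": [("t", (0, 9))],
    "b": [("t", (0, 1))],
}
result = lex_astar(graph, "s", "t", h=lambda v: (0, 0), zero=(0, 0))
print(result)
```

The route through `a` wins despite its much larger delay, because any difference in the first component outweighs every difference in the second. No large constant standing in for ∞ is needed.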
Search Design – Clamping Component
• 3-component vector
1. Delay, with pivot (x < y iff x <= T & y > T)
2. Congestion, regular <
3. Delay, regular <
Ex: f(w2)=[0,2,2]; f(x1)=[1,0,4]; f(w3)=[0,1,3]
Assumption: h(v) is at least close to h*(v) for clamping component
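The three-component vector can be sketched in Python as a sort key. Encoding the pivot as 0 (within budget) or 1 (over budget) reproduces the first entries of the example vectors above. The budget T = 3 is an assumption chosen to be consistent with those vectors, since the slide does not state it, and `clamped_key` is a hypothetical name.

```python
def clamped_key(delay, congestion, budget):
    """Build [pivoted delay, congestion, delay] as a comparable tuple."""
    over_budget = 1 if delay > budget else 0  # pivot: x < y iff x <= T and y > T
    return (over_budget, congestion, delay)

T = 3  # assumed budget; any value in [3, 4) matches the slide's vectors
candidates = {"w2": (2, 2), "x1": (4, 0), "w3": (3, 1)}  # name: (delay, congestion)
keys = {name: clamped_key(d, c, T) for name, (d, c) in candidates.items()}
order = sorted(keys, key=keys.get)
print(keys)
print(order)
```

Under this key, x1 sorts last even though it has the lowest congestion, because it is the only candidate over budget. The two within-budget candidates are then ranked by congestion.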
Search Design – Timing Closure
• Delay pivot element splits congestion-identical paths by budget
• Will always choose a budget-compliant path (a sum of finite congestion costs is finite)
• Over all connections => successful routing always yields timing closure!
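The selection rule can be illustrated by brute force on a tiny graph. The Python sketch below enumerates every source-to-sink path and takes the minimum under the clamped key. It is deliberately not the router's A* search, and it shows only the ordering, not the negotiation loop. The graph, the edge costs, and the function names are hypothetical.

```python
def all_paths(graph, u, target, delay=0, congestion=0, path=None):
    """Yield (path, total delay, total congestion) for each u-to-target path."""
    path = (path or []) + [u]
    if u == target:
        yield path, delay, congestion
        return
    for v, d, c in graph.get(u, []):
        yield from all_paths(graph, v, target, delay + d, congestion + c, path)

def pick_path(graph, source, target, budget):
    """Minimum path under the key (over budget?, congestion, delay)."""
    def key(item):
        _, delay, congestion = item
        return (delay > budget, congestion, delay)
    return min(all_paths(graph, source, target), key=key)

# Hypothetical DAG; each edge is (next node, delay, congestion cost).
graph = {
    "s": [("a", 1, 3), ("b", 3, 0)],
    "a": [("t", 1, 3)],
    "b": [("t", 3, 0)],
}
print(pick_path(graph, "s", "t", budget=4))  # fast but congested route
print(pick_path(graph, "s", "t", budget=6))  # slow but uncongested route
```

With a budget of 4 the congested route is chosen because it is the only compliant one. Relaxing the budget to 6 lets the search fall back to minimising congestion.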
Performance – [Low-Hanging] Optimizations
• Original implementation was ~2-2.5x slower than current runtime
• Introduced some low-hanging speed & quality optimizations
– Index structure for lexicographic costs
– Greedy tree mgmt. to ameliorate pin-ordering
• A high-hanging optimization in future work is congestion schedule handling (but many promising leads from global routers in ICCAD’07)
Trie-of-Stacks Index Structure
• Replaces f(v) index structure
• Exploits FPGA routing symmetry
• Index operations independent of size
• Reduces runtimes by ~15 %
Tree Topology Maintenance
Experiments - Setup
• Run against VPR4.30 on architecture similar to single-segment “challenge” arch.
– (Researcher timing constraints)
– Routability comparison with unclamped lex-search
• Route at the placement-allowed Fmax
• VPR pres_fac=1.5/1.1
Routed Solution Timing Quality
Runtime Comparison
Effects of Budget Quality
Future Work
• Runtime improvements
– Schedule improvement
– Performance tuning
• Multi-CLB segments (see backup slide)
• Multi-objective routing
• Other domains (e.g. standard cell global)
Conclusions
• Extended lexicographic search to timing-driven routing
– New budgeting component
– Clamped search design
– Supporting techniques for runtime
• Timing closure is guaranteed on routing success
• Solution quality is good but needs more runtime improvement to be viable
Acknowledgements
• J. Rose, V. Betz, A. Marquardt (Toronto) – VPR4.30 source & benchmarks
• Australian Centre for Advanced Computing and Communications (ac3) – High Performance Computing Support
• Advisor*: Dr. Aleks Ignjatovic
Question Time…
To Backup Slides
Issues with h(v) ~/~ h*(v)
• “Node locking” occurs when g(v)+h(v) <= D but really g(v)+h*(v) > D
– Expansion downstream will be truncated
– But a subpath with less delay but more congestion cannot expand into it
– But if reexpand on shorter delay then backtrace will ignore congestion – not locally decidable!
• Quick fix: precompute h*(v) (only needed for sink pins t)
– Only bounding components need the accuracy
• Fancy on-the-fly handling?
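One way to realise the quick fix is a reverse Dijkstra pass from the sink, which yields the exact remaining delay h*(v) for every vertex that can reach it. The Python sketch below assumes non-negative, summable edge delays. The graph layout and the function name are hypothetical, and the slides do not say how SpiralRoute precomputes h*(v).

```python
import heapq

def exact_delay_to_sink(graph, sink):
    """Exact minimum delay from each vertex to sink, via reverse Dijkstra."""
    reverse = {}
    for u, edges in graph.items():
        for v, d in edges:
            reverse.setdefault(v, []).append((u, d))
    h_star = {sink: 0}
    frontier = [(0, sink)]
    while frontier:
        dist, v = heapq.heappop(frontier)
        if dist > h_star[v]:
            continue  # stale entry, a shorter distance is already recorded
        for u, d in reverse.get(v, []):
            nd = dist + d
            if nd < h_star.get(u, float("inf")):
                h_star[u] = nd
                heapq.heappush(frontier, (nd, u))
    return h_star

# Hypothetical graph; each edge is (next node, delay).
graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("t", 5)],
    "b": [("t", 1)],
}
h_star = exact_delay_to_sink(graph, "t")
print(h_star)
```

With exact values in the delay component, the test g(v)+h(v) <= D agrees with g(v)+h*(v) <= D, so the mismatch behind node locking does not arise for that component.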