ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement


Page 1: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

ECE 506

Reconfigurable Computing

Lecture 8

FPGA Placement

Page 2: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Timing-driven Placement

• Why should placement take timing into account?
  - Placement sets the constraints for the router
  - A timing-driven router's performance is limited by the quality of the placement
  - For more speed, placement itself should be timing-driven
• Operating principle
  - Map blocks that are on the critical path onto physical locations that are closer together
  - Minimize the amount of interconnect that critical signals must traverse

Page 3: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Timing-Driven Placement Expectations
° High-quality placement
° Reasonable execution time
° Few sacrifices in routability

Page 4: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Timing-driven Placement

• Timing-driven-only placement
  - Increases demand on routing resources
• Wireability-driven-only placement
  - Slower circuit
• Take both wire length and critical path into account
  - Problem: modeling delay
    – The critical path changes as blocks are moved
    – The most accurate delay model:
      » Route each candidate placement
      » Extract the delay of each connection
      » Execution time is a major problem

Page 5: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Timing-Driven Placement – Delay Modeling

° Delay profile
  • Homogeneous FPGA
  • Exploit uniformity
    - Compute delay as a function of distance (Δx, Δy)
    - Use the VPR router to determine the delay between blocks
    - Compute a delay lookup matrix for every possible (Δx, Δy) (see the sketch below)
  • The router is timing-driven
    - Takes advantage of the architecture's features
      – Segment length
      – Uses long wires for blocks on far ends of the FPGA
  • Assumption: the router will probably find the minimum-delay path (a leap of faith!)
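To make the lookup concrete, here is a minimal Python sketch of the idea. It assumes a hypothetical `route_delay(dx, dy)` callback that invokes the timing-driven router once for a pair of blocks separated by (Δx, Δy); VPR's actual implementation differs in detail.

```python
# Sketch: build and use a delay lookup matrix for a homogeneous FPGA.
# route_delay(dx, dy) is an assumed stand-in for one call to the
# timing-driven router between two blocks separated by (dx, dy).

def build_delay_lookup(nx, ny, route_delay):
    """Tabulate routed delay for every possible block separation."""
    return [[route_delay(dx, dy) for dy in range(ny)] for dx in range(nx)]

def placement_delay(lookup, src_loc, sink_loc):
    """Constant-time delay estimate for a connection during placement."""
    dx = abs(src_loc[0] - sink_loc[0])
    dy = abs(src_loc[1] - sink_loc[1])
    return lookup[dx][dy]
```

The router is only invoked nx × ny times up front; every move during annealing then estimates connection delays with a table lookup.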

Page 6: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Determining Criticality

• Same basic approach as used for clustering criticality
• For each connection (i, j) from source i to sink j:
  - Determine arrival times (pre-order BFS)
  - Determine required arrival times (post-order BFS)
  - Determine slack: slack(i, j) = required_arrival_time − arrival_time
  - Criticality(i, j) = 1 − slack(i, j) / max_slack (see the sketch below)
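A compact Python sketch of this recipe, under the assumption that the timing graph is a DAG given as a list of (i, j) connections with per-connection delays; the names are illustrative, not VPR's, and the per-connection slack includes the connection's own delay.

```python
from collections import defaultdict

def compute_criticalities(connections, delay):
    """connections: list of (i, j) edges of the timing DAG.
    delay[(i, j)]: current estimated delay of each connection."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for i, j in connections:
        succ[i].append(j)
        pred[j].append(i)
        nodes.update((i, j))

    # Topological order (the slide's "pre-order BFS"), via Kahn's algorithm.
    indeg = {n: len(pred[n]) for n in nodes}
    frontier = [n for n in nodes if indeg[n] == 0]
    order = []
    while frontier:
        n = frontier.pop()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                frontier.append(m)

    # Forward pass: arrival times.
    arrival = {n: 0.0 for n in nodes}
    for n in order:
        for m in succ[n]:
            arrival[m] = max(arrival[m], arrival[n] + delay[(n, m)])

    # Backward pass: required arrival times, seeded with the critical-path delay.
    d_max = max(arrival.values())
    required = {n: d_max for n in nodes}
    for n in reversed(order):
        for m in succ[n]:
            required[n] = min(required[n], required[m] - delay[(n, m)])

    # Slack per connection, then criticality = 1 - slack / max_slack.
    slack = {(i, j): required[j] - arrival[i] - delay[(i, j)]
             for i, j in connections}
    max_slack = max(slack.values())
    if max_slack <= 0.0:          # every connection is critical
        return {e: 1.0 for e in slack}
    return {e: 1.0 - s / max_slack for e, s in slack.items()}
```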

Page 7: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

TVPLACE

Page 8: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Cost Function

What is the purpose of the criticality exponent?

°Heavily weight connections that are critical, while giving less weight to connections that are non-critical

Timing_Cost(i, j) = Delay(i, j) × Criticality(i, j) ^ Criticality_Exponent

where Delay(i, j) comes from the lookup table matrix and Criticality(i, j) lies in [0, 1]
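Written as code, the per-connection cost is just the lookup delay weighted by criticality raised to the exponent. A sketch that reuses the lookup matrix and criticality map from the earlier sketches (illustrative names, not VPR's API):

```python
def total_timing_cost(connections, locations, lookup, criticality, crit_exp):
    """Sum of Delay(i, j) * Criticality(i, j) ** Criticality_Exponent
    over all connections, for the current placement.
    locations[b] is the (x, y) position of block b."""
    cost = 0.0
    for i, j in connections:
        dx = abs(locations[i][0] - locations[j][0])
        dy = abs(locations[i][1] - locations[j][1])
        cost += lookup[dx][dy] * criticality[(i, j)] ** crit_exp
    return cost
```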

Page 9: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Balancing Wiring and Timing Cost

• Need to determine relative changes in timing and wiring based on moves

• Idea: Use changes relative to the costs of the previous placement
  - Both normalized values are less than 1
  - Helps balance the two effects through a scaling parameter (see the sketch below)
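A minimal sketch of one way to combine the two, with a single trade-off factor `lam` and both changes normalized by the costs of the previous placement. This follows the general form used by timing-driven VPR, but the names here are illustrative.

```python
def delta_cost(d_timing, prev_timing_cost, d_wiring, prev_wiring_cost, lam):
    """Normalized cost change for a proposed move.
    Dividing by the previous placement's costs keeps both terms well below 1,
    so the trade-off factor lam alone sets their relative weight."""
    return (lam * d_timing / prev_timing_cost
            + (1.0 - lam) * d_wiring / prev_wiring_cost)
```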

Page 10: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Path vs Connection Based Timing Analysis

° Path based:
  • Run timing analysis to compute path delays at every stage of the placement and use those delays in the cost function
  • Computationally expensive
    - Moving any connection triggers a new timing analysis
° Connection based:
  • Perform timing analysis before placement
    - Assign slacks to each connection
    - Pay attention to connections with low slack
  • Delay values are always up to date (from the (Δx, Δy) lookup)
  • Criticality becomes outdated after moves
° Approach: hybrid
  • Allow a certain number of moves between each timing analysis

Page 11: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

VPR, Placement

° VPlace is a simulated-annealing-based algorithm
  • Minimizes the amount of interconnect
  • Places circuit blocks that are on the same net close together
  • Uses a bounding-box-based cost function (sketched below)
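A sketch of the bounding-box wiring estimate in Python. VPR additionally scales each net by a fanout-dependent correction factor q(n) and by channel capacities; those refinements are omitted here.

```python
def bounding_box_cost(nets, locations):
    """Half-perimeter bounding-box wirelength summed over all nets.
    nets: mapping net -> list of blocks on that net.
    locations: mapping block -> (x, y) position."""
    cost = 0.0
    for pins in nets.values():
        xs = [locations[b][0] for b in pins]
        ys = [locations[b][1] for b in pins]
        cost += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return cost
```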

Page 12: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Updated Annealing Algorithm
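The slide's figure is not reproduced here; below is a rough Python skeleton of what such an annealer looks like when timing analysis is re-run only every few temperature updates. All of the callbacks (`propose_move`, `apply_move`, `delta_cost`, `update_criticalities`, `cool`) and the exit threshold are illustrative assumptions, not VPR's actual code.

```python
import math
import random

def timing_driven_anneal(placement, temp, moves_per_temp, analysis_interval,
                         propose_move, apply_move, delta_cost,
                         update_criticalities, cool):
    """Skeleton of a timing-driven simulated-annealing placer."""
    temps_since_analysis = 0
    while temp > 0.005:                      # illustrative stopping temperature
        if temps_since_analysis % analysis_interval == 0:
            update_criticalities(placement)  # refresh slacks / criticalities
            temps_since_analysis = 0
        for _ in range(moves_per_temp):
            move = propose_move(placement)
            dc = delta_cost(placement, move)
            # Accept improving moves always, worsening moves with Boltzmann probability.
            if dc < 0 or random.random() < math.exp(-dc / temp):
                apply_move(placement, move)
        temp = cool(temp)
        temps_since_analysis += 1
    return placement
```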

Page 13: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

How often to recalculate delay?

• Recalculating delay once per temperature is good
• It also simplifies the programming somewhat

(Figure: results plotted against the # of temperature changes between each timing analysis)

Page 14: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Criticality Exponent

°Large exponent
  • Fewer connections will have a large "Timing_Cost"
    - For these few connections, "Timing_Cost" is the effective term
  • For non-critical connections, "Wiring_Cost" is the effective term
  • Therefore, placement focuses on minimizing wiring as "Criticality_Exponent" increases
    - For example, with an exponent of 8 a connection with criticality 0.95 keeps most of its timing weight (0.95^8 ≈ 0.66), while one with criticality 0.7 contributes almost nothing (0.7^8 ≈ 0.06)

Page 15: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Criticality Exponent

°When the exponent is 1
  • The critical path is worse
  • The wiring cost is much worse

Page 16: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Oscillation Effect

° When the trade-off factor is 1
  • Only the delay component is in the cost
  • Attempts to minimize the critical path at the cost of extending other non-critical paths
  • Timing analysis runs once per temperature update
    - Several moves occur between temperature updates
  • Able to reduce the critical path during one iteration of the outer loop
  • But this makes other paths very critical
  • The oscillation effect makes it hard for the placement to converge to the best solution
° When the trade-off factor is 0.5
  • The wirelength term reduces the oscillation effect
    - It penalizes moves that increase wirelength

Page 17: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Effect of the trade-off factor (results figure)

Page 18: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

How important is timing-driven placement?

• Run-time penalty of timing-driven placement: ~2.5×

Page 19: ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement

Conclusion

° The greatest challenge facing FPGA placement is the need to produce high-quality placements for ever-larger circuits.

• FPGA capacity doubles every two to three years, doubling the size of the placement problem.

° In order to maintain the fast time to market and ease of use historically provided by FPGAs, placement algorithms cannot be allowed to take ever more CPU time.

° There is thus a compelling need for algorithms that are very scalable and parallel yet still produce high-quality results.