quiz 1. question 1) according to the study on “simultaneous timing driven clustering and placement...

1

QUIZ

2

Question
• 1) According to the study on “Simultaneous Timing Driven Clustering and Placement for FPGAs”, what is a fragment level move, and which drawbacks of the traditional FPGA CAD flow do fragment level moves target?

3

BSPlace: A BLE Swapping technique for placement

04.11.2014

Minsik Hong

George Hwang

Hemayamini Kurra

Minjun Seo

4

Outline
• SCPlace
  • Introduction
  • Algorithm flowchart
  • Net Counting Algorithm
  • Results
• BSPlace
  • Algorithm
  • Demo
• Backup Slides
  • If you ask minimal questions, we can cover more
  • Net Weighting
  • VPR Datastructures

5

Rajavel, Senthilkumar Thoravi, and Ali Akoglu. "MO-Pack: Many-objective clustering for FPGA CAD." Proceedings of the 48th Design Automation Conference. ACM, 2011.

6

Simultaneous timing driven clustering and placement for FPGAs.

Chen, Gang, and Jason Cong. Field Programmable Logic and Application. Springer Berlin Heidelberg, 2004. 158-167.

7

Key concept
• Fragment level move
  • BLE to a new CLB
  • Check for valid CLB configuration
  • Feasibility (number of BLEs and input pins)
  • Update the cost function
• Block level move
  • CLB to CLB
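The feasibility check for a fragment level move can be sketched as follows. This is a minimal illustration, not the SCPlace implementation: the `BLE` and `CLB` classes, the function names, and the rule that only nets not driven inside the cluster consume input pins are all assumptions made for the example. The cost function update is left out.

```python
from dataclasses import dataclass, field


@dataclass
class BLE:
    """Hypothetical BLE record: the nets it reads and the net it drives."""
    name: str
    inputs: frozenset
    output: str


@dataclass
class CLB:
    """Hypothetical CLB record: just the list of BLEs packed into it."""
    bles: list = field(default_factory=list)


def external_inputs(bles):
    """Nets read by some BLE in the group but driven by none of them."""
    driven = {b.output for b in bles}
    used = set().union(*(b.inputs for b in bles))
    return used - driven


def is_feasible(clb, ble, max_bles, max_inputs):
    """Check the two limits named on the slide: BLE count and input pins."""
    candidate = clb.bles + [ble]
    return (len(candidate) <= max_bles
            and len(external_inputs(candidate)) <= max_inputs)


def fragment_move(src, dst, ble, max_bles, max_inputs):
    """Move one BLE from src to dst if dst stays a valid CLB."""
    if not is_feasible(dst, ble, max_bles, max_inputs):
        return False
    src.bles.remove(ble)
    dst.bles.append(ble)
    return True
```

With N = 2 BLEs and I = 3 inputs per CLB, a BLE that shares a net with the destination cluster can fit where an unrelated BLE would exceed the pin limit.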

8

BLE Level Swapping
• Advantages
  • Fixes packing issues during simulated annealing
  • Better congestion mitigation
  • Better routability
• Disadvantages
  • Speed
  • Complexity

9

SCPlace Algorithm

11

Additional feature of Journal version SCPlace

12

Use Novel net weighting

13

A novel net weighting algorithm for timing-driven placement

Kong, Tim Tianming. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design. ACM, 2002.

14

Accurate All Path Counting

15

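Calculate F(t)

[Figure: timing graph a → c, b → c, c → d, d → e, d → f with edge delays d(a,c) = 7, d(b,c) = 5, d(c,d) = 1, d(d,e) = 5, d(d,f) = 3. ARR/REQ per node: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13.]

a = 2 (discount constant), T: the longest path delay (T = 13)

Fs(a, c) = 7 – 0 – 7 = 0
Fs(b, c) = 7 – 0 – 5 = 2

D{Fs(a, c), T} = D{0,13} = 1
D{Fs(b, c), T} = D{2,13} = 0.88
D{Fs(c, d), T} = D{0,13} = 1
D{Fs(d, e), T} = D{0,13} = 1
D{Fs(d, f), T} = D{0,13} = 1

F(c) = F(c) + D{Fs(a, c), T} x F(a) + D{Fs(b, c), T} x F(b) = 0 + 1x1 + 0.88x1 = 1.88

Initial values: F(a) = F(b) = 1, all other nodes 0.
Final values: F(a) = F(b) = 1, F(c) = F(d) = F(e) = F(f) = 1.88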

16

Calculate B(s)

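[Figure: timing graph a → c, b → c, c → d, d → e, d → f with edge delays d(a,c) = 7, d(b,c) = 5, d(c,d) = 1, d(d,e) = 5, d(d,f) = 3. ARR/REQ per node: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13.]

a = 2 (discount constant), T: the longest path delay (T = 13)

Bs(d, e) = 13 – 8 – 5 = 0
Bs(d, f) = 13 – 8 – 3 = 2

D{Bs(a, c), T} = D{0,13} = 1
D{Bs(b, c), T} = D{0,13} = 1
D{Bs(c, d), T} = D{0,13} = 1
D{Bs(d, e), T} = D{0,13} = 1
D{Bs(d, f), T} = D{2,13} = 0.88

B(d) = B(d) + D{Bs(d, e), T} x B(e) + D{Bs(d, f), T} x B(f) = 0 + 1x1 + 0.88x1 = 1.88

Initial values: B(e) = B(f) = 1, all other nodes 0.
Final values: B(e) = B(f) = 1, B(a) = B(b) = B(c) = B(d) = 1.88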

17

Calculate AP(s, t) (a=2)

D{slack(a, c), T} = D{0,13} = 1
D{slack(b, c), T} = D{2,13} = 0.88
D{slack(c, d), T} = D{0,13} = 1
D{slack(d, e), T} = D{0,13} = 1
D{slack(d, f), T} = D{2,13} = 0.88

[Figure: timing graph annotated with F(s)/B(t) per node (a = 1/1.88, b = 1/1.88, c = 1.88/1.88, d = 1.88/1.88, e = 1.88/1, f = 1.88/1) and slack per edge ((a,c) = 0, (b,c) = 2, (c,d) = 0, (d,e) = 0, (d,f) = 2).]

AP(a,c) = F(a) x B(c) x D{slack(a, c), T} = 1 x 1.88 x 1 = 1.88
AP(b,c) = F(b) x B(c) x D{slack(b, c), T} = 1 x 1.88 x 0.88 = 1.65

[Figure: resulting AP(s,t) per edge: (a,c) = 1.88, (b,c) = 1.65, (c,d) = 3.53, (d,e) = 1.88, (d,f) = 1.65.]
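The three steps of the worked example (F, then B, then AP) can be reproduced with a short script. This is a sketch under stated assumptions: the discount values are copied from the slides rather than computed, because the closed form of D is not given in this text, and the function and variable names are hypothetical. Edges must be listed in topological order.

```python
# Edges of the example timing graph, in topological order.
EDGES = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f")]

# Discount values as tabulated on the slides (a = 2, T = 13).
D_FS = {e: 1.0 for e in EDGES}
D_FS[("b", "c")] = 0.88          # Fs(b, c) = 2
D_BS = {e: 1.0 for e in EDGES}
D_BS[("d", "f")] = 0.88          # Bs(d, f) = 2
D_SLACK = {e: 1.0 for e in EDGES}
D_SLACK[("b", "c")] = 0.88       # slack(b, c) = 2
D_SLACK[("d", "f")] = 0.88       # slack(d, f) = 2


def forward_counts(edges, d_fs, primary_inputs):
    """F(t) = sum over fan-in edges of D{Fs(s, t), T} x F(s)."""
    f = {n: 1.0 for n in primary_inputs}
    for s, t in edges:
        f[t] = f.get(t, 0.0) + d_fs[(s, t)] * f[s]
    return f


def backward_counts(edges, d_bs, primary_outputs):
    """B(s) = sum over fan-out edges of D{Bs(s, t), T} x B(t)."""
    b = {n: 1.0 for n in primary_outputs}
    for s, t in reversed(edges):
        b[s] = b.get(s, 0.0) + d_bs[(s, t)] * b[t]
    return b


def all_path_counts(edges, f, b, d_slack):
    """AP(s, t) = F(s) x B(t) x D{slack(s, t), T}."""
    return {(s, t): f[s] * b[t] * d_slack[(s, t)] for s, t in edges}


F = forward_counts(EDGES, D_FS, ["a", "b"])
B = backward_counts(EDGES, D_BS, ["e", "f"])
AP = all_path_counts(EDGES, F, B, D_SLACK)
for edge, weight in AP.items():
    print(edge, round(weight, 2))
```

Rounded to two decimals, the edge weights match the figure: 1.88, 1.65, 3.53, 1.88, 1.65.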

18

Results (Only use BLE swapping)

CLB = 4

19

Results (Only use BLE swapping)

20

Results (BLE + CLB swapping)

where 0 ≤ α ≤ 1

The number of CLB moves: The number of BLE moves:

21

Results (BLE + CLB swapping)

T-Vpack+VPR vs SCPlace (α=0.5)

22

BSPlace

23

BSPlace
• BLE Level Swapping within Simulated Annealing with Rent’s Rule
• Advantages
  • Fix packing issues as they occur.
  • Potentially better routability.
  • Potentially better congestion due to combination of placement and packing.
• Disadvantages
  • Execution time: we need to do memory allocation and deallocation for any BLE swap.
  • Code complexity: VPR is complex. We spend a lot of time on debugging and testing instead of algorithms.

24

Rent’s Rule Threshold Value
• Calculate the k value to get the threshold
• Enter simulated annealing process
  • Outer loop process
  • Inner loop process
    • Choose a random CLB to move from its current position to another position
    • Check Rent’s Rule threshold
    • If we get a better result for the swap
      • Queue BLE swapping
    • Otherwise
      • Do CLB swapping: use T-VPlace
  • Loop through BLE swapping
    • Do BLE swap after checking whether the swap overlaps with a previous swap
    • Re-allocate memory and return to outer loop

Pio kBT
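The last step of the flow, applying queued BLE swaps while skipping any that overlap an earlier swap, might look like the sketch below. It is only an illustration: the `placement` dictionary, the function name, and the reading of "overlaps" as "touches a BLE that an earlier swap in the same batch already moved" are assumptions, not the BSPlace code or VPR data structures.

```python
def apply_queued_swaps(placement, queue):
    """Apply queued BLE swaps in order, skipping overlapping ones.

    placement: dict mapping BLE name -> CLB name (hypothetical structure).
    queue: list of (ble_a, ble_b) pairs queued during the inner loop.
    Returns the list of swaps that were actually applied.
    """
    touched = set()   # BLEs already moved in this batch
    applied = []
    for ble_a, ble_b in queue:
        if ble_a in touched or ble_b in touched:
            continue  # overlaps with a previous swap, so skip it
        placement[ble_a], placement[ble_b] = placement[ble_b], placement[ble_a]
        touched.update((ble_a, ble_b))
        applied.append((ble_a, ble_b))
    return applied


placement = {"b0": "clb0", "b1": "clb0", "b2": "clb1", "b3": "clb1"}
queue = [("b0", "b2"), ("b2", "b3"), ("b1", "b3")]
applied = apply_queued_swaps(placement, queue)
```

Here the second swap is dropped because b2 was already moved by the first one, while the third swap still goes through.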

25

Current Status
• Code
  • Created our own BLE swapping mechanism using VPR data structures.
  • We have a whole suite of test fixtures to test the code.
  • Testing is still continuing, but we are finding minimal issues.
  • We have done a swap within placement.
  • We have started to integrate our cost function.
• Validation
  • We intend to run VPR benchmarks. Our BLE swapping solution should be better than or the same as T-VPlace.
  • Our VPR benchmarks should also be comparable to IRAC.

26

Demo

The circuit below abstracts the MUX, switchboxes, and connection boxes. The connections represent the direct connections between BLEs in CLBs. Optimize this circuit by performing one BLE swap. Explain why your optimization will result in better performance.

Architecture Parameter
• K = 2
• I = 3
• N = 2

Measurement
• Critical Path Delay = 1.182 ns

27

Demo
• http://www.screenr.com/gJdN

28

Demo

29

Thanks.

30

Backup Slides

31

Impact of duplication on placement

Delay = 2 Delay = 1

32

A novel net weighting algorithm for timing-driven placement

Kong, Tim Tianming. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design. ACM, 2002.

33

A Novel Net Weighting Algorithm
• Accurate path counting algorithm
  • The first known accurate path counting algorithm that considers all paths
  • Due to the exponential number of paths present in the circuit, accurate all path counting has been considered very difficult.
• Significant performance improvement
• Little loss in total wirelength
• No runtime overhead

34

A Novel Net Weighting Algorithm
• Consider the path sharing effect
  • If two critical paths share a common segment, the edges in the common segment should receive higher weights.
• Define two variables
  • Forward path F(p): the number of different critical paths starting from PI elements, terminating at p.
  • Backward path B(p): the number of different critical paths starting from PO elements, terminating at p, if we reverse all signal flow directions.

35

Background

36

Background

37

Example

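[Figure: Timing of a circuit. Three copies of the timing graph a → c, b → c, c → d, d → e, d → f with edge delays 7, 5, 1, 5, 3.]
• Edge delays: d(a,c) = 7, d(b,c) = 5, d(c,d) = 1, d(d,e) = 5, d(d,f) = 3
• ARR(t): a = 0, b = 0, c = 7, d = 8, e = 13, f = 11
• REQ(s): a = 0, b = 2, c = 7, d = 8, e = 13, f = 13
• The longest path delay (T) = 13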

38

Example

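[Figure: the timing graph annotated with Slack(s, t) per edge and ARR/REQ per node.]
• Slack(s, t): (a,c) = 0, (b,c) = 2, (c,d) = 0, (d,e) = 0, (d,f) = 2
• ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13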
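The ARR, REQ, and slack values of this example can be recomputed with a small script. This is a sketch only: it assumes the usual static timing definitions (ARR as the longest arrival over fan-in, REQ as the earliest requirement over fan-out, and edge slack as REQ(t) – ARR(s) – d(s,t)), and all names are hypothetical.

```python
# Edge delays of the example timing graph.
DELAYS = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1,
          ("d", "e"): 5, ("d", "f"): 3}
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order of the nodes


def arrival_times(delays, order):
    """Forward pass: ARR(t) = max over fan-in of ARR(s) + d(s, t)."""
    arr = {n: 0 for n in order}
    for node in order:
        for (s, t), d in delays.items():
            if t == node:
                arr[node] = max(arr[node], arr[s] + d)
    return arr


def required_times(delays, order, longest):
    """Backward pass: REQ(s) = min over fan-out of REQ(t) - d(s, t)."""
    req = {n: longest for n in order}
    for node in reversed(order):
        for (s, t), d in delays.items():
            if s == node:
                req[node] = min(req[node], req[t] - d)
    return req


def edge_slacks(delays, arr, req):
    """Assumed definition: slack(s, t) = REQ(t) - ARR(s) - d(s, t)."""
    return {(s, t): req[t] - arr[s] - d for (s, t), d in delays.items()}


ARR = arrival_times(DELAYS, ORDER)
T = max(ARR.values())            # longest path delay
REQ = required_times(DELAYS, ORDER, T)
SLACK = edge_slacks(DELAYS, ARR, REQ)
```

For the example graph this gives T = 13 and a slack of 2 on edges (b,c) and (d,f), with all other edges at 0.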

39

Example

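[Figure: the four PI-to-PO paths of the example, each drawn with its edge delays and edge slacks.]
• a → c → d → e: d(π) = 13, slack(π) = 0
• b → c → d → f: d(π) = 9, slack(π) = 4
• a → c → d → f: d(π) = 11, slack(π) = 2
• b → c → d → e: d(π) = 11, slack(π) = 2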
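The path delays and path slacks can be checked by enumerating every PI-to-PO path of the small example. This is an illustrative sketch with hypothetical names. Brute-force enumeration only works here because the graph is tiny, and it takes slack(π) to be T – d(π), which matches the four values shown.

```python
# Edge delays of the example timing graph.
DELAYS = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1,
          ("d", "e"): 5, ("d", "f"): 3}


def enumerate_paths(delays, sources, sinks):
    """Return (path, delay) for every source-to-sink path by depth-first walk."""
    successors = {}
    for (s, t), d in delays.items():
        successors.setdefault(s, []).append((t, d))

    results = []

    def walk(node, path, total):
        if node in sinks:
            results.append((tuple(path), total))
            return
        for nxt, d in successors.get(node, []):
            walk(nxt, path + [nxt], total + d)

    for src in sources:
        walk(src, [src], 0)
    return results


PATHS = enumerate_paths(DELAYS, ["a", "b"], {"e", "f"})
T = max(delay for _, delay in PATHS)                 # longest path delay
PATH_SLACK = {path: T - delay for path, delay in PATHS}
```

Only one of the four paths is critical (slack 0), yet every edge of the graph lies on at least one path.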

40

Critical Path counting

41

Calculate F(p)

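[Figure: three steps of computing F(p) on the timing graph with edge delays 7, 5, 1, 5, 3.]
• Step 1: F(p) = 0 at every node
• Step 2: F(a) = F(b) = 1 at the PIs, all other nodes 0
• Step 3: F(a) = 1, F(b) = 1, F(c) = 2, F(d) = 2, F(e) = 2, F(f) = 2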

42

Calculate B(p)

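[Figure: three steps of computing B(p) on the timing graph with edge delays 7, 5, 1, 5, 3.]
• Step 1: B(p) = 0 at every node
• Step 2: B(e) = B(f) = 1 at the POs, all other nodes 0
• Step 3: B(a) = 2, B(b) = 2, B(c) = 2, B(d) = 2, B(e) = 1, B(f) = 1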

43

Calculate GP(s,t)

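[Figure: the timing graph annotated with B(p) and F(p) per node, and the resulting GP(s,t) per edge.]
• B(p): a = 2, b = 2, c = 2, d = 2, e = 1, f = 1
• F(p): a = 1, b = 1, c = 2, d = 2, e = 2, f = 2
• GP(s,t): (a,c) = 2, (b,c) = 2, (c,d) = 4, (d,e) = 2, (d,f) = 2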
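The counting steps on the preceding slides can be written as two passes over the edges followed by a product per edge. This is a sketch with hypothetical names. It assumes GP(s,t) = F(s) x B(t), which reproduces the values in the figure, and it requires the edge list to be in topological order.

```python
# Edges of the example timing graph, in topological order.
EDGES = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f")]


def count_paths(edges, primary_inputs, primary_outputs):
    """Return (F, B, GP) path counts without any slack discounting."""
    # Forward pass: paths from the PIs that terminate at each node.
    f = {n: 1 for n in primary_inputs}
    for s, t in edges:
        f[t] = f.get(t, 0) + f[s]

    # Backward pass: paths from the POs with signal flow reversed.
    b = {n: 1 for n in primary_outputs}
    for s, t in reversed(edges):
        b[s] = b.get(s, 0) + b[t]

    # Assumed edge weight: number of paths through the edge.
    gp = {(s, t): f[s] * b[t] for s, t in edges}
    return f, b, gp


F, B, GP = count_paths(EDGES, ["a", "b"], ["e", "f"])
```

Edge (c,d) receives weight 4 because all four PI-to-PO paths pass through it, regardless of how much slack each path has.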

44

Accurate All Path Counting
• Use discount function D{x, y} to get accurate counting result
  • ‘a’ is a positive constant number
  • x:
    • Fs(s,t) = ARR(t) – ARR(s) – d(s,t)
    • Bs(s,t) = REQ(t) – REQ(s) – d(s,t)
  • y is the longest path delay (T)

45

Accurate All Path Counting

46

Ex. Calculate F(t) (a=2)

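[Figure: timing graph with edge delays 7, 5, 1, 5, 3 and ARR/REQ per node: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13.]

D{Fs(a, c), T} = D{0,13} = 1
D{Fs(b, c), T} = D{2,13} = 0.88
D{Fs(c, d), T} = D{0,13} = 1
D{Fs(d, e), T} = D{0,13} = 1
D{Fs(d, f), T} = D{0,13} = 1

[Figure: resulting F(t) per node: F(a) = 1, F(b) = 1, F(c) = 1 + 0.88 = 1.88, F(d) = 1.88, F(e) = 1.88, F(f) = 1.88.]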

47

Ex. Calculate B(s) (a=2)

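[Figure: timing graph with edge delays 7, 5, 1, 5, 3 and ARR/REQ per node: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13.]

D{Bs(a, c), T} = D{0,13} = 1
D{Bs(b, c), T} = D{0,13} = 1
D{Bs(c, d), T} = D{0,13} = 1
D{Bs(d, e), T} = D{0,13} = 1
D{Bs(d, f), T} = D{2,13} = 0.88

[Figure: resulting B(s) per node: B(a) = 1.88, B(b) = 1.88, B(c) = 1.88, B(d) = 1 + 0.88 = 1.88, B(e) = 1, B(f) = 1.]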

48

Ex. Calculate AP(s,t) (a=2)

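[Figure: two copies of the timing graph, annotated with B(s) (a = 1.88, b = 1.88, c = 1.88, d = 1 + 0.88, e = 1, f = 1) and F(t) (a = 1, b = 1, c = 1 + 0.88, d = 1.88, e = 1.88, f = 1.88), and a third copy with the resulting AP(s,t) per edge.]

D{slack(a, c), T} = D{0,13} = 1
D{slack(b, c), T} = D{2,13} = 0.88
D{slack(c, d), T} = D{0,13} = 1
D{slack(d, e), T} = D{0,13} = 1
D{slack(d, f), T} = D{2,13} = 0.88

AP(a,c) = 1 x 1.88 x 1 = 1.88
AP(b,c) = 1 x 1.88 x 0.88 = 1.65
AP(c,d) = 1.88 x 1.88 x 1 = 3.53
AP(d,e) = 1.88 x 1 x 1 = 1.88
AP(d,f) = 1.88 x 1 x 0.88 = 1.65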

49

Compare results

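[Figure: the timing graph drawn twice, once with AP(s,t) edge weights and once with GP(s,t) edge weights.]
• (a, c): AP = 1.88, GP = 2
• (b, c): AP = 1.65, GP = 2
• (c, d): AP = 3.53, GP = 4
• (d, e): AP = 1.88, GP = 2
• (d, f): AP = 1.65, GP = 2

Using the critical path counting method (GPATH), it is difficult to get an accurate result. However, if we use the proposed algorithm, we can get a more accurate result.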

50

VPR Datastructures
• Resource Routing Graph
• Physical Block Graph
• Netlist
  • Global CLB Netlist
  • Global Atom Netlist
• Blocks

51

Blocks

• Contains CLB
• Contains the Input Output
• Contains the Resource Routing Graph
• Contains the Physical Blocks
  • Physical Blocks represent the BLE
  • Physical Blocks represent the Flip Flop
  • Physical Blocks also contain the LUTs

52

Resource Routing Graph

• Nodes are pins
• Edges are architectural connections
• Each pin is associated with a net num
• Prev Nodes and Edges represent the actual connections per BLE.

53

Global Netlist

54

Atom Netlist
