Switch

EECS 252 – Spring 2006

RAMP Blue Project

Jue Sun and Gary Voronel

Electrical Engineering and Computer Sciences

University of California, Berkeley

May 1, 2006

5/1/2006 CS252-s06, Project Presentation

Outline

• Goal of switch

• Implementation

• Performance

• Future implementation

• Current state of project

• Project experience


One Piece of the Puzzle

• Main goal of RAMP Blue is to build a large scale system

• To do useful work, processors must be able to communicate

• Therefore, we need an interconnection network


Implementation Goals

1. Support communication between all processors in system

2. Flexible hardware allowing parameterization of global system constants, especially number of Microblaze cores per FPGA

3. Minimal resource utilization

4. High throughput

5. Low latency

6. Simple, homogeneous hardware

7. Simple software interface


Hardware Design Constraints

• RAMP Blue will be implemented on the BEE2

• 4 user FPGAs per BEE2 board

• 2 LVCMOS links for FPGA-to-FPGA communication

– Relatively low latency (2 or 3 cycles)

– Throughput: more than 64 bits

• 16 MGT links per board (4 per FPGA) for board-to-board communication

– Relatively high latency (20 or more cycles)

– Throughput: 32 bits or 64 bits

• To achieve the lowest latency possible, we limit packet routes to at most 1 MGT link

• 16 Microblaze cores per FPGA (64 per board)

– Depending on resource utilization, the number of cores per FPGA may need to be reduced


Physical Topology

• Topology is fixed and homogeneous throughout the system

– Each FPGA directly connected to 2 other FPGAs on the same board and 4 other boards

– Number of cores per FPGA is the same on every FPGA

• Each board has a direct connection to every other board in the system (maximum of 17 boards)

– BOARD n connects to BOARD 16 through MGT n

– With 16 cores per FPGA, 17 boards supports 1088 processors!


Board Level Connectivity

[Figure: four example boards, each with four FPGAs and MGT links 0–15: BOARD 0 (FPGA 0–3), BOARD 7 (FPGA 28–31), BOARD 10 (FPGA 40–43) and BOARD 16 (FPGA 64–67).]


FPGA Level Connectivity

For clarity, configuration shown is with 4 Microblaze cores per FPGA

[Figure: BOARD 0 with FPGA 0–3, links MGT0–MGT15 around the board edge, and Microblaze cores 0–15 (4 per FPGA).]


Switch Fabric Specifications

• Crossbar switch with maximal connectivity

– Every Microblaze can access every other Microblaze on the same FPGA directly

– Every Microblaze can access both LVCMOS links

– Every Microblaze can access all FPGA-local MGT links

• Buffering on inputs and outputs

– Store-and-forward buffers for Microblazes to decrease complexity and simplify the software interface

– Cut-through buffers for LVCMOS links

– MGT links are wrapped in XAUI cores that already have internal buffers


Microblaze Level Connectivity

For clarity, configuration shown is with 4 Microblaze cores per FPGA

[Figure: central SWITCH connected to MICROBLAZE 0–3 through buffer units, to LVCMOS LEFT and LVCMOS RIGHT through LVCMOS buffers, and to XAUI0–XAUI3 through XAUI CTRL blocks.]


Switch Overall

[Figure: a Scheduler between four Input Buffers and four Output Buffers. Each Input Buffer has control signals Send request, Destination, DataDone, Allow Start and data signals Data, DataValid, DataDone. Each Output Buffer has a free signal and data signals Data, DataValid, DataDone.]


Scheduler

• If two input ports request the same output port at the same time, the one with the lowest port number is allowed to send first

• Other control logic, not shown here, implements the protocol between switch and buffers

[Figure: four first-come, first-served request schedulers. Each takes Req port1–Req port4, registers them (Req 1–Req 4 Registered) and outputs the port to service. Inputs Port in 1–Port in 4 each supply a Dest Port to the control logic.]
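The arbitration rule on this slide can be sketched as a small software model in C. This is only an illustration, not the project's hardware: the name `pick_port`, the `arrival` timestamps and the fixed size of 4 ports are assumptions made for the example, and ports are numbered from 0 here rather than from 1 as in the figure. The earliest registered request wins, and a tie goes to the lowest-numbered input port.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 4        /* assumed: 4 input ports, as in the figure */
#define NO_REQUEST (-1)

/* Hypothetical model of one per-output request scheduler.
 * requesting[i] is nonzero if input port i has a registered request for
 * this output; arrival[i] is the cycle in which it was registered.
 * Returns the input port to service, or NO_REQUEST if none is pending. */
int pick_port(const int requesting[NUM_PORTS],
              const uint32_t arrival[NUM_PORTS])
{
    assert(requesting && arrival);
    int winner = NO_REQUEST;
    for (int i = 0; i < NUM_PORTS; i++) {
        if (!requesting[i])
            continue;
        /* Strict '<' keeps the lower-numbered port on a tie, because
         * ports are scanned in increasing order. */
        if (winner == NO_REQUEST || arrival[i] < arrival[winner])
            winner = i;
    }
    return winner;
}
```

With requests from ports 1 and 2 registered in the same cycle, `pick_port` returns 1; if port 2 registered earlier, it returns 2.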


Source Routing

• Fixed topology allows for straightforward source routing implementation

• Destination routing would be more robust, but would require significantly more resources and greater complexity

• Packet header is extremely simple: just a concatenated sequence of hops

• Minimal hardware required to determine next hop and adjust the header at every hop (zero LUTs used – can’t get better than that!)

– The next hop is encoded in the lowest bits of the header

– To adjust the header, the hardware must simply shift out the lowest bits
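The two header operations described above can be written in C as follows. This is a minimal sketch, assuming a 32-bit header and 5-bit hops; the names `next_hop` and `advance_header` are made up for the example. In hardware both operations are presumably just a selection of wires, which would be consistent with the zero-LUT claim.

```c
#include <assert.h>
#include <stdint.h>

#define HOP_BITS 5u                        /* assumed hop width; parameterizable */
#define HOP_MASK ((1u << HOP_BITS) - 1u)

/* The next hop is the lowest HOP_BITS bits of the header. */
uint32_t next_hop(uint32_t header)
{
    return header & HOP_MASK;
}

/* Consuming a hop shifts the lowest HOP_BITS bits out of the header. */
uint32_t advance_header(uint32_t header)
{
    return header >> HOP_BITS;
}
```

Each switch on the route would read `next_hop(header)` to choose an output port and forward `advance_header(header)` with the packet.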


Source Routing – Hop Encoding

• Need 5 bits to represent each hop

– Must be able to encode 16 cores per FPGA + 4 MGT links + 2 LVCMOS links = 22 total encodings (+ 1 for a FIN code)

– If 8 or fewer cores per FPGA are used, then each hop can be represented using only 4 bits (hardware supports parameterization of the hop encoding width)

• Maximum of 6 hops based on physical topology

– Constrained MGT links to 1 hop per route

– Therefore, the worst-case route is: LVCMOS, LVCMOS, MGT, LVCMOS, LVCMOS, MB

• Hop encoding allows header to fit into 1 word

– 6 hops x 5 bits/hop = 30 bits


Header layout (one 32-bit word, most significant bits first):

Field:  00   HOP5  HOP4  HOP3  HOP2  HOP1  HOP0

Bits:   2    5     5     5     5     5     5
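The arithmetic behind the hop width, and the packing of hops into one header word, can be checked with a short C sketch. The function names and the choice of `uint8_t` hop codes are assumptions for illustration; only the counts (4 MGT links, 2 LVCMOS links, 1 FIN code, at most 6 hops) come from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define MGT_LINKS 4
#define LVCMOS_LINKS 2
#define MAX_HOPS 6

/* Smallest number of bits that can represent `codes` distinct values. */
unsigned bits_for(unsigned codes)
{
    unsigned bits = 0;
    while ((1u << bits) < codes)
        bits++;
    return bits;
}

/* Hop width for a given core count: cores + MGT + LVCMOS + 1 FIN code. */
unsigned hop_bits(unsigned cores_per_fpga)
{
    return bits_for(cores_per_fpga + MGT_LINKS + LVCMOS_LINKS + 1);
}

/* Pack up to MAX_HOPS hop codes into one 32-bit header,
 * with hops[0] (the first hop taken) in the lowest bits. */
uint32_t pack_header(const uint8_t *hops, unsigned count, unsigned width)
{
    assert(count <= MAX_HOPS && width * MAX_HOPS <= 32);
    uint32_t header = 0;
    for (unsigned i = 0; i < count; i++) {
        assert(hops[i] < (1u << width));   /* code must fit in one hop field */
        header |= (uint32_t)hops[i] << (i * width);
    }
    return header;
}
```

`hop_bits(16)` gives 5 and `hop_bits(8)` gives 4, matching the figures on the slide; six 5-bit hops use 30 of the 32 header bits.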


Source Routing – Global Naming

• Processors are globally named

– Necessary to reach the goal of a simple software interface

– If there are 16 cores per FPGA with 4 FPGAs per board and 17 total boards, then the processors are numbered 0 – 1087

• Naming scheme scales down with fewer cores

– Necessary to support parameterization of global system constants (especially number of cores per FPGA)

– If there are 4 cores per FPGA with 4 FPGAs per board and 17 total boards, then the processors are numbered 0 – 271

• Invalid processor number triggers error at the software level

– Again, supports simple software interface

– Ensures that only packets with valid headers enter the network
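One way to picture the naming scheme is a C helper that splits a global processor number into board, FPGA and core, and rejects invalid numbers. This is a sketch under an assumption: that numbers are assigned contiguously by board, then FPGA, then core, which agrees with the figures (processor 10 on FPGA 2, processor 24 on FPGA 6) but is not stated outright. The `struct location` type and the `locate` name are hypothetical.

```c
#include <assert.h>

#define FPGAS_PER_BOARD 4
#define NUM_BOARDS 17

struct location {
    int board;  /* 0..16 */
    int fpga;   /* global FPGA number, as in the figures */
    int core;   /* Microblaze index within the FPGA */
};

/* Returns 0 and fills *loc for a valid processor number, -1 otherwise. */
int locate(int proc, int cores_per_fpga, struct location *loc)
{
    assert(loc);
    int total = cores_per_fpga * FPGAS_PER_BOARD * NUM_BOARDS;
    if (cores_per_fpga <= 0 || proc < 0 || proc >= total)
        return -1;                      /* invalid processor number */
    loc->fpga = proc / cores_per_fpga;
    loc->core = proc % cores_per_fpga;
    loc->board = loc->fpga / FPGAS_PER_BOARD;
    return 0;
}
```

With 16 cores per FPGA, processor 1087 maps to core 15 of FPGA 67 on BOARD 16, and 1088 is rejected.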


Source Routing Example

• For simplicity, let’s assume there are 4 cores per FPGA

• Let’s send from processor #10 to processor #24 (representative of worst case path)

[Figure: BOARD 0 (FPGA 0–3, Microblaze cores 0–15) and BOARD 1 (FPGA 4–7, cores 16–31), each with links MGT0–MGT15.]

Header at the source: 00 MB0 LEFT LEFT MGT1 LEFT LEFT

• Destination core is on a different board, so the packet must first be routed from the source FPGA (FPGA 2) to the FPGA that is connected to the destination board (which is FPGA 0)

• This requires 2 hops over the LEFT LVCMOS link

Header after the first LEFT hop: 00 FIN MB0 LEFT LEFT MGT1 LEFT

Header after the second LEFT hop: 00 FIN FIN MB0 LEFT LEFT MGT1

• Once at the proper FPGA, the packet can be sent across the MGT link to an FPGA on the destination board

Header after the MGT1 hop: 00 FIN FIN FIN MB0 LEFT LEFT

• Then, the packet must be routed to the destination FPGA, which requires 2 more LVCMOS hops

Header after the third LEFT hop: 00 FIN FIN FIN FIN MB0 LEFT

Header after the fourth LEFT hop: 00 FIN FIN FIN FIN FIN MB0

• Finally, the packet must be forwarded to the destination Microblaze core

Header after the MB0 hop: 00 FIN FIN FIN FIN FIN FIN

• Each arrowhead in the figure represents a hop: it takes 5 hops to reach the destination FPGA

• One more hop is required to send the packet to the destination Microblaze core, totalling 6 hops in the worst case
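The worked example can be replayed in C by packing the route from processor #10 to processor #24 and consuming it hop by hop. The numeric hop codes below are hypothetical, since the slides do not give the real values. FIN is assumed to be 0, so that a plain right shift fills the vacated top field with FIN, as the header snapshots in the example suggest.

```c
#include <assert.h>
#include <stdint.h>

#define HOP_BITS 5u
#define HOP_MASK ((1u << HOP_BITS) - 1u)

/* Hypothetical hop codes; only the names come from the slides.
 * FIN = 0 is an assumption that makes a right shift fill with FIN. */
enum hop { FIN = 0, MB0 = 1, LEFT = 17, RIGHT = 18, MGT1 = 20 };

/* Route from the example, first hop (HOP0) in the lowest bits:
 * LEFT, LEFT, MGT1, LEFT, LEFT, MB0. */
uint32_t example_header(void)
{
    const enum hop route[6] = { LEFT, LEFT, MGT1, LEFT, LEFT, MB0 };
    uint32_t header = 0;
    for (unsigned i = 0; i < 6; i++)
        header |= (uint32_t)route[i] << (i * HOP_BITS);
    return header;
}

/* Follows the header hop by hop, storing each hop taken in taken[].
 * Returns the number of hops consumed before FIN was reached. */
unsigned trace_route(uint32_t header, enum hop taken[6])
{
    assert(taken);
    unsigned n = 0;
    while (n < 6 && (header & HOP_MASK) != FIN) {
        taken[n++] = (enum hop)(header & HOP_MASK);
        header >>= HOP_BITS;    /* used hop shifts out, FIN (0) shifts in */
    }
    return n;
}
```

Tracing `example_header()` consumes 6 hops, the worst case named on the slide, and the top 2 bits of the word stay 00 throughout.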


Source Routing – 17th Board

• To support the 17th board, boards communicate to the 17th board through the MGT link of their own board number

[Figure: BOARD 0 (FPGA 0–3), BOARD 7 (FPGA 28–31), BOARD 10 (FPGA 40–43) and BOARD 16 (FPGA 64–67), each with MGT links 0–15.]

• For example, for BOARD 0 to send to BOARD 16, it sends over MGT 0
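The link selection implied by this rule can be sketched as a C helper. It rests on an inference rather than a stated fact: that MGT link k on a board normally leads to board k (as in the earlier example, where BOARD 0 reaches BOARD 1 over MGT1), which leaves the link with the board's own number free for the 17th board. The name `mgt_link_for` is hypothetical, as is the behaviour for traffic that starts on BOARD 16.

```c
#include <assert.h>

#define LAST_BOARD 16   /* the 17th board */

/* Hypothetical helper: MGT link that board `src` uses to reach board `dst`.
 * Assumes link k normally leads to board k, and that the otherwise
 * unused "own number" link leads to the 17th board. */
int mgt_link_for(int src, int dst)
{
    assert(src >= 0 && src <= LAST_BOARD);
    assert(dst >= 0 && dst <= LAST_BOARD);
    assert(src != dst);     /* same-board traffic uses LVCMOS links only */
    if (dst == LAST_BOARD)
        return src;         /* e.g. BOARD 0 reaches BOARD 16 over MGT 0 */
    return dst;             /* e.g. BOARD 0 reaches BOARD 1 over MGT 1 */
}
```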


Microblaze Interface

• Store and forward

• Connected to the FSL bus for now

• Essentially double buffered

• MB FSL access is extremely slow compared to the switch delay time: even with the fastest compilation and the most efficient code, it takes 48 cycles to write one value to the FSL bus!

• Example: send from MB to LVCMOS, loop back over the LVCMOS link, and then back to MB


LVCMOS interface

• 2 cycles of latency

• Two buses connect 2 FPGAs and can be used for anything

• The control bus and data bus are wired directly over LVCMOS, except that the data_full (or free) signal goes high 2 cycles before the buffer is really full
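The early full signal can be modelled in C as an "almost full" threshold. This is an interpretation, not the project's logic: it assumes one word can arrive per cycle, so signalling full 2 cycles early means reserving 2 words of space for data already in flight on the link. The function name and parameters are made up for the sketch.

```c
#include <assert.h>

#define LINK_LATENCY 2   /* LVCMOS latency in cycles, from the slide */

/* Hypothetical receive-buffer model: report "full" once the words still
 * in flight on the link could fill the remaining space. */
int data_full(int occupancy, int depth)
{
    assert(depth > LINK_LATENCY);
    assert(occupancy >= 0 && occupancy <= depth);
    return occupancy >= depth - LINK_LATENCY;
}
```

For a 16-word buffer the signal would rise at 14 words, leaving room for the 2 words the sender may emit before it sees the signal.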


XAUI Interface

• Much simplified because XAUI has an internal buffer

• Essentially just some control signals

• Interface has recently changed, so this is still in progress


Software Interface

• Simple interface to send and receive data

• int send(int src, int dest, byte *buf, int len)

– Copies len bytes of buf into local outgoing Buffer Unit

– Constructs source route from src MB core to dest MB core

– Blocks until all data copied

– Returns number of bytes sent or -1 on error

• Receive is called by interrupt

• int recv(byte *buf, int len)

– Copies len bytes into buf from local incoming Buffer Unit

– Blocks until all data received

– Returns number of bytes received or -1 on error
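The contract of the two calls can be exercised with a software-only mock in C. It is a stand-in, not the project's driver: the names `sw_send` and `sw_recv`, the single loopback buffer, and the 64-byte Buffer Unit size (16 words of 32 bits) are assumptions, and no source route is built. It shows only the return-value behaviour: bytes transferred on success, -1 on an invalid processor number or length.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char byte;

#define NUM_PROCS 1088   /* 16 cores x 4 FPGAs x 17 boards */
#define BUF_BYTES 64     /* assumed Buffer Unit size: 16 words x 32 bits */

/* Stand-in for the hardware: one Buffer Unit that loops data back. */
static byte buffer_unit[BUF_BYTES];
static int buffered = 0;

/* Mock of send(): validates processor numbers, then copies len bytes. */
int sw_send(int src, int dest, const byte *buf, int len)
{
    if (src < 0 || src >= NUM_PROCS || dest < 0 || dest >= NUM_PROCS)
        return -1;                       /* invalid processor number */
    if (buf == NULL || len < 0 || len > BUF_BYTES)
        return -1;
    memcpy(buffer_unit, buf, (size_t)len);
    buffered = len;
    return len;
}

/* Mock of recv(): copies len buffered bytes out of the Buffer Unit. */
int sw_recv(byte *buf, int len)
{
    if (buf == NULL || len < 0 || len > buffered)
        return -1;
    memcpy(buf, buffer_unit, (size_t)len);
    /* Keep any unread bytes at the front of the buffer. */
    memmove(buffer_unit, buffer_unit + len, (size_t)(buffered - len));
    buffered -= len;
    return len;
}
```

A call such as `sw_send(10, 24, msg, 4)` followed by `sw_recv(out, 4)` returns 4 both times, while a destination of 2000 makes `sw_send` return -1 before anything enters the buffer.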


Simplifications

• Fixed packet length simplifies control hardware

• Packet length fits completely into all buffers in the system, so the entire packet can be transferred from hop to hop

• Once data transmission starts from the sending MB buffer, it is not interrupted until it reaches the receiving MB input buffer

• Store-and-forward implementation of MB buffers


Performance (still need to clean this up)

• Latency1 ≈ 48 × packet length, to write into the FSL bus

• Latency2 ≈ 2 × packet length, to wait for the MB buffer to be full

• Latency3 ≈ 2, in switch transmission

• Latency4 ≈ 48 × packet length, to read from the FSL bus

• Bandwidth = 32 bits/cycle or 64 bits/cycle (the current FSL does not support 64 bit)
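The four latency terms can be combined into a rough C estimate. It assumes the terms simply add, that is, the stages run one after another with no overlap, which the slide does not state; packet length is taken in words, and the function name is made up for the sketch.

```c
#include <assert.h>

#define FSL_CYCLES_PER_WORD 48    /* software write or read of one word */
#define BUFFER_CYCLES_PER_WORD 2  /* filling the store-and-forward MB buffer */
#define SWITCH_CYCLES 2           /* transmission through the switch */

/* Approximate end-to-end latency in cycles for a packet of `words` words. */
long packet_latency(long words)
{
    assert(words > 0);
    long write_fsl = FSL_CYCLES_PER_WORD * words;     /* Latency1 */
    long fill_buf  = BUFFER_CYCLES_PER_WORD * words;  /* Latency2 */
    long in_switch = SWITCH_CYCLES;                   /* Latency3 */
    long read_fsl  = FSL_CYCLES_PER_WORD * words;     /* Latency4 */
    return write_fsl + fill_buf + in_switch + read_fsl;
}
```

For a 16-word packet this gives 1570 cycles, of which 1536 are spent in the software FSL copies at either end.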


Utilization on BEE2:

Resource          With Switch (16x32 FIFO)     Without Switch

BSCANs            1 out of 1         100%      1 out of 1         100%

BUFGMUXs          7 out of 16         43%      6 out of 16         37%

DCMs              3 out of 8          37%      3 out of 8          37%

External DIFFMs   1 out of 496         1%      1 out of 496         1%

LOCed DIFFMs      1 out of 1         100%      1 out of 1         100%

External DIFFSs   1 out of 496         1%      1 out of 496         1%

LOCed DIFFSs      1 out of 1         100%      1 out of 1         100%

External IOBs     371 out of 996      37%      303 out of 996      30%

LOCed IOBs        371 out of 371     100%      303 out of 303     100%

MULT18X18s        14 out of 328        4%      14 out of 328        4%

RAMB16s           35 out of 328       10%      27 out of 328        8%

SLICEs            8136 out of 33088   24%      6901 out of 33088   20%

Note: Measured with a switch that connects 8 ports: 2 MB, 2 LVCMOS links, but no XAUI. All buffers are 32 bits wide and 16 words deep.


Future implementation

• Switch topology change

• Allow variable packet length, using the control signal in FSL

• DMA

• 4 MBs share a DMA


“Associated Switch”

[Figure: four Input Buffers and four Output Ports, each with control and data signals. Labels show a Scheduler block, four buffers and eight separate scheduler blocks.]


Clustered Organization

• Microblaze cores organized into clusters

– Since there are 4 DIMMs on the BEE2, split into 4 clusters

• NIC will coordinate transfer of data for all MBs in cluster

– Faster transfer for MBs in the same cluster because it uses DMA

– Faster overall transfer because data copying is done in hardware

• Only 4 bits per hop now, but an extra hop is needed

[Figure: four clusters, each with one NIC and four MBs: Cluster0 (NIC0, MB0–MB3), Cluster1 (NIC1, MB4–MB7), Cluster2 (NIC2, MB8–MB11), Cluster3 (NIC3, MB12–MB15).]


What's Working NOW!!

• Switch @ 100MHz

• Source route generation

• Store and forward buffer for MB

• TCL script and (partial) global parameterization

• Homogeneous hardware

• LVCMOS interface

• Single MB with switch booted on XUP

• Double MB with switch booted on BEE2


Almost Done / To Do

• Cut-through MB buffer

– Bottleneck of copying data from software limits performance gains from the cut-through version

• Need to test XAUI / MGT link

• Interrupt controller

• Complete parameterization


Trouble Spots

• Tools

• Interfaces

• Putting multiple MB on FPGA

• Lack of infrastructure during early stages