CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #8 – Applications II


Page 1: CprE / ComS 583 Reconfigurable Computing

CprE / ComS 583 Reconfigurable Computing

Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University

Lecture #8 – Applications II

Page 2: CprE / ComS 583 Reconfigurable Computing

CprE 583 – Reconfigurable Computing, September 14, 2006, Lect-08.2

Recap – Splash 1 Architecture

[Figure: Splash 1 block diagram. Labels: VME Bus, VSB Bus, Interface (two blocks), Control, FIFO IN, FIFO OUT, FPGAs F0–F31, memories M0–M31.]

Page 3: CprE / ComS 583 Reconfigurable Computing

Recap – Splash 2 Architecture

[Figure: Splash 2 array board. Labels: FPGAs F0–F16, memories M0–M16, Crossbar Switch, From Prev Board, To Next Board.]

Page 4: CprE / ComS 583 Reconfigurable Computing

Recap – Splash 2 Architecture

[Figure: Splash 2 system diagram. Labels: Sparc-based Host, S-Bus, Interface Board, External Input, External Output, Array Boards, R-Bus, SIMD.]

Page 5: CprE / ComS 583 Reconfigurable Computing

Recap – Dictionary Search

• XOR two-character value with temp result and hash function
• Rotate result
• Different hash function for each FPGA

Shift amount: 7 bits
Hash function: 1100 1000 1010 0011

00 0000 0000 0000 0000 0000   Clear hash register
01 1010 0001 1101 00          Input the letters “th”
---------------------------
10 1000 0011 0101 1100 0000   Temporary result

10 0000 0101 0000 0110 1011   Result for “th”
00 0000 0001 1001 01          Input for letters “e_”
---------------------------
01 0010 0110 0001 1110 1011   Temporary result

11 0101 1010 0100 1100 0011   Result for “the_”
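The XOR-and-rotate step can be sketched in a few lines of Python. This is an illustrative model only. The 22-bit register width, the placement of the 16-bit character pair in the upper 16 bits, the right-rotate direction, and the encoding of the trailing blank as a zero byte are all assumptions inferred from the bit patterns in the worked example, not details stated on the slide. The names hash_step, pack_pair, and fmt are hypothetical.

```python
# Illustrative model of the dictionary-search hash step (XOR, then rotate).
# Assumptions inferred from the worked example, not stated on the slide:
#   - the hash register is 22 bits wide
#   - the 16-bit character pair and the 16-bit hash function are XORed
#     into the upper 16 bits of the register
#   - the rotate is a right rotate by the shift amount
REG_BITS = 22
PAIR_BITS = 16
SHIFT = 7
HASH_FUNCTION = 0b1100_1000_1010_0011


def rotate_right(value, amount, width=REG_BITS):
    """Rotate a width-bit value right by amount bits."""
    amount %= width
    mask = (1 << width) - 1
    return ((value >> amount) | (value << (width - amount))) & mask


def hash_step(register, pair):
    """Return (temporary result, rotated result) for one character pair."""
    temp = register ^ ((pair ^ HASH_FUNCTION) << (REG_BITS - PAIR_BITS))
    return temp, rotate_right(temp, SHIFT)


def pack_pair(first, second="\0"):
    """Second character in the high byte, first in the low byte."""
    return (ord(second) << 8) | ord(first)


def fmt(value):
    """Format a register value in the slide's 2+4+4+4+4+4 bit grouping."""
    bits = format(value, "022b")
    return bits[:2] + " " + " ".join(bits[i:i + 4] for i in range(2, 22, 4))


register = 0  # clear hash register
temp_th, register = hash_step(register, pack_pair("t", "h"))
print(fmt(temp_th))   # 10 1000 0011 0101 1100 0000
print(fmt(register))  # 10 0000 0101 0000 0110 1011
```

Feeding the next pair with hash_step(register, pack_pair("e")) continues the chain for “the_”. Giving each FPGA a different HASH_FUNCTION constant corresponds to the third bullet above.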

Page 6: CprE / ComS 583 Reconfigurable Computing

Outline

• Recap
• The Field-Programmable Port Extender (FPX)
• FPX Architecture
• FPX Programming Model
• FPX Applications
  • Pattern Matching
  • Packet Classification
  • Rule Processing

Page 7: CprE / ComS 583 Reconfigurable Computing

Application – Network Processing

• Networking applications well-suited for reconfigurable hardware
  • Target signatures change often
  • Massive quantities of stream-based data
  • Repetitive operations
• Connecting up to a realistic networking environment is hard
  • Washington University experimental setup one of the best
  • Shows importance of both memory and processing capability
  • Numerous experiments performed over the past five years

Page 8: CprE / ComS 583 Reconfigurable Computing

Network Routing with the FPX

• FPX Modules distributed across each port of a switch
• IP packets (over ATM) enter and depart line card
• Packet fragments processed by modules
• Advantages:
  • New protocols implemented directly in silicon
  • Easy to upgrade in the field

[Figure: switch with FPX modules on its ports. Labels: IP Packets, Line Card (OC3/OC12/OC48), FPX (Field-programmable Port Extender), IPP, OPP, Gigabit Switch Fabric.]

Page 9: CprE / ComS 583 Reconfigurable Computing


FPX Hardware Device

Page 10: CprE / ComS 583 Reconfigurable Computing


FPX Hardware in a WUGS-20 Switch

Page 11: CprE / ComS 583 Reconfigurable Computing

FPGA-based Router

• FPX module contains two FPGAs
• NID – network interface device
  • Performs data queuing
• RAD – reprogrammable application device
  • Specialized control sequences

[Figure: FPX board layout (WashU / ARL, July 2000, JL / MR). Labels: RAD (Reprogrammable Application Device, Virtex 1000E fg680), NID (Network Interface Device), two 8 Mbit ZBT SRAMs, two SDRAMs (backside), RAD Program SRAM, NID EPROM, RAD/NID Status, oscillators (100 MHz, 10 MHz, 62.5 MHz), VRM: 1.8 V switcher, JTAG, OC3/OC12/OC48 Linecard Connector, WUGS Switch Backplane Connector.]

Page 12: CprE / ComS 583 Reconfigurable Computing

Reprogrammable Application Device

• Spatial Re-use of FPGA Resources
  • Modules implemented using FPGA logic
  • Module logic can be individually reprogrammed
• Shared Access to off-chip resources
  • Memory Interfaces to SRAM and SDRAM
  • Common Datapath to send and receive data

[Figure: RAD containing two Modules, with Data connections to two SRAMs and two SDRAMs, and Network Interfaces to NID.]

Page 13: CprE / ComS 583 Reconfigurable Computing

Architecture of the FPX

• RAD
  • Large Xilinx FPGA
  • Attaches to SRAM and SDRAM
  • Reprogrammable over network
  • Provides two user-defined Module Interfaces
• NID
  • Provides Utopia Interfaces between switch & line card
  • Forwards cells to RAD
  • Programs RAD

[Figure: FPX block diagram. Labels: NID between Switch and LineCard with VC and EC blocks; RAD Program SRAM; RAD with two Modules, each with SRAM and SDRAM Data connections.]

Page 14: CprE / ComS 583 Reconfigurable Computing

Architecture of the FPX (cont.)

[Figure: FPX Block Diagram and FPX Photo. Labels: RAD (FPGA), NID (FPGA), Extensible Modules, Layered Protocol Wrappers, Flow Buffer, Route, Filter, Program Cache, Config, PROM, Memory (SDRAM, SRAM), Network Interface, Switch.]

Page 15: CprE / ComS 583 Reconfigurable Computing

FPX SRAM

• Provide low latency for fast table-lookups
• Zero Bus Turnaround (ZBT) allows back-to-back read / write operations every 10 ns
• Dual, Independent Memories
• 36-bit wide bus

Page 16: CprE / ComS 583 Reconfigurable Computing

FPX SDRAM

• Dual, independent SDRAM memories
• 64-bit wide, 100 MHz
• 64 Mb / Module : 128 Mb total [expandable]
• Burst-based transactions [1-8 word transfers]
• Latency of 14 cycles to Read/Write 8-word burst
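As a rough check on what these memory figures imply, the peak per-memory bandwidth can be worked out from the numbers on the SRAM and SDRAM slides. The sketch below is a back-of-the-envelope estimate, not a measured result. It assumes that the 14 cycles cover a complete 8-word burst and that one SRAM word moves every 10 ns. The function names are hypothetical.

```python
# Back-of-the-envelope peak bandwidth per memory, using slide figures.
# Assumption: "14 cycles" is the total time for one 8-word SDRAM burst.
SRAM_WIDTH_BITS = 36
SRAM_CYCLE_NS = 10

SDRAM_WIDTH_BITS = 64
SDRAM_CLOCK_MHZ = 100
SDRAM_BURST_WORDS = 8
SDRAM_BURST_CYCLES = 14


def sram_peak_gbps():
    """One 36-bit word every 10 ns; bits per ns equals Gbit/s."""
    return SRAM_WIDTH_BITS / SRAM_CYCLE_NS


def sdram_burst_gbps():
    """Bits moved in one 8-word burst divided by the burst time in ns."""
    cycle_ns = 1000 / SDRAM_CLOCK_MHZ
    bits = SDRAM_BURST_WORDS * SDRAM_WIDTH_BITS
    return bits / (SDRAM_BURST_CYCLES * cycle_ns)


print(f"SRAM:  {sram_peak_gbps():.2f} Gbit/s per memory")
print(f"SDRAM: {sdram_burst_gbps():.2f} Gbit/s per memory")
```

Under these assumptions the two memory types deliver similar peak rates; the SRAM's advantage is the low, fixed latency that suits table lookups.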

Page 17: CprE / ComS 583 Reconfigurable Computing

Routing Traffic Flows

• Functions
  • Check packets for errors
  • Process commands
    • Control, status, & reprogramming
  • Implement per-flow forwarding
• Traffic flows routed among
  • Switch
  • Line Card
  • RAD.Switch
  • RAD.Linecard

[Figure: NID between Switch and LineCard, with VC, EC, and ccp blocks.]

Page 18: CprE / ComS 583 Reconfigurable Computing

Typical Flow Configurations

[Figure: six copies of the NID flow diagram (ports Switch, LineCard, RAD Switch, RAD LineCard, with VC, EC, and ccp blocks), one per configuration:]

• Default Flow Action (Bypass)
• Ingress Processing (IP Routing)
• Full RAD Processing (Packet Routing and Reassembly)
• Egress Processing (Per-flow Output Queueing)
• Full Loopback Testing (System Test)
• Partial Loopback Testing (Egress Flow Processing Test)

Page 19: CprE / ComS 583 Reconfigurable Computing

Reprogramming Logic

• NID programs at boot from EPROM
• Switch Controller writes RAD configuration memory to NID
  • Configuration file for RAD arrives over the network via control cells
• Switch Controller issues {Full/Partial} reconfigure command
• NID reads RAD config memory to program RAD
  • Performs complete or partial reprogramming of RAD

[Figure: FPX board layout (RAD, NID, ZBT SRAMs, SDRAMs, RAD Program SRAM, NID EPROM, FIFO, oscillators, 1.8 V and 2.5 V VRMs, connectors) beside a Switch Element with IPP, OPP, and LC ports.]

Page 20: CprE / ComS 583 Reconfigurable Computing

FPX Interfaces Provide

• Well defined Interface
  • Utopia-like 32-bit fast data interface
  • Flow control allows back-pressure
• Flow Routing
  • Arbitrary permutations of packet flows through ports
• Dynamically Reprogrammable
  • Other modules continue to operate even while new module is being reprogrammed
• Memory Access
  • Shared access to SRAM and SDRAM
  • Request/Grant protocol

Page 21: CprE / ComS 583 Reconfigurable Computing

Pattern Matching using the FPX

• Use Hardware to detect a pattern in data
• Modify packet based on match
• Pipeline operation to maximize throughput

Page 22: CprE / ComS 583 Reconfigurable Computing


“Hello, World” Module Function

Page 23: CprE / ComS 583 Reconfigurable Computing

Logical Implementation

[Figure: VCI Match → Append “WORLD” to payload → New Cell]
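The match-then-modify flow in the figure can be modeled in software as follows. This is a behavioral sketch only, not the module's hardware description. The VCI value, the “HELLO” trigger pattern, the variable-length payload, and the names Cell and process are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical values: the slide gives neither the VCI nor the payload layout.
MATCH_VCI = 5
PATTERN = b"HELLO"
APPENDED = b" WORLD"


@dataclass
class Cell:
    vci: int
    payload: bytes


def process(cell: Cell) -> Cell:
    """Return a new cell for a match; pass other cells through unchanged."""
    if cell.vci == MATCH_VCI and cell.payload.startswith(PATTERN):
        return Cell(cell.vci, cell.payload + APPENDED)
    return cell


print(process(Cell(5, b"HELLO")).payload)  # b'HELLO WORLD'
print(process(Cell(6, b"HELLO")).payload)  # b'HELLO'
```

In hardware the comparison and the payload rewrite would be pipelined so that cells can arrive back to back; the sequential function above only captures the decision logic.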

Page 24: CprE / ComS 583 Reconfigurable Computing

The Wrapper Concept

[Figure: App block enclosed by Wrapper layers.]

Page 25: CprE / ComS 583 Reconfigurable Computing

AAL5 Encapsulation

[Figure: AAL5 frame carried in ATM cells. Each cell has an ATM Header followed by Payload; the last-cell bit reads 0, 0, 1 across the cells shown. The final cell ends with Padding and the AAL5 Trailer (options, length, CRC-32).]

• Payload is packed in cells
• Padding may be added
• 64-bit Trailer at end of last cell
• Trailer contains CRC-32
• Last Cell indication bit (last bit of PTI field)
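The packing rules above determine how many cells a frame needs and how much padding is added. The sketch below works this out, using the standard 48-byte ATM cell payload (a figure not shown on the slide) and the 64-bit trailer from the bullets. The function names are hypothetical.

```python
import math

CELL_PAYLOAD_BYTES = 48  # payload bytes carried by one ATM cell
TRAILER_BYTES = 8        # the 64-bit AAL5 trailer


def aal5_layout(payload_len):
    """Return (cells, padding_bytes) for a payload of payload_len bytes."""
    cells = math.ceil((payload_len + TRAILER_BYTES) / CELL_PAYLOAD_BYTES)
    padding = cells * CELL_PAYLOAD_BYTES - payload_len - TRAILER_BYTES
    return cells, padding


def last_cell_flags(cells):
    """Last-cell indication bit per cell: 0 for all but the final cell."""
    return [0] * (cells - 1) + [1]


print(aal5_layout(100))    # (3, 36)
print(last_cell_flags(3))  # [0, 0, 1]
```

A 100-byte payload plus the 8-byte trailer needs 108 bytes, which rounds up to three cells (144 bytes) with 36 bytes of padding; the flag pattern matches the 0, 0, 1 sequence in the figure.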

Page 26: CprE / ComS 583 Reconfigurable Computing

HelloBob Module

[Figure: HelloBob module. Labels: Input, Output, Cell Processor, Frame Processor, IP Processor, UDP Processor, Echo, UDP Hello Bob, SRAM Interface.]

Page 27: CprE / ComS 583 Reconfigurable Computing

Results: Performance

• Operating Frequency: 119 MHz
  • 8.4 ns critical path
  • Well within the 10 ns period of the RAD's clock
  • Targeted to RAD’s V1000E-FG680-7
• Maximum packet processing rate:
  • 7.1 million packets per second
    • (100 MHz)/(14 Clocks/Cell)
  • Circuit handles back-to-back packets
• Slice utilization:
  • 0.4% (49/12,288 slices)
  • Less than one half of one percent of chip resources
• Search technique can be adapted for other types of data matching and modification
  • Regular expressions
  • Parsing image content …
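The three headline figures follow directly from the numbers quoted on the slide. A short calculation, using only those inputs, reproduces them:

```python
# Inputs taken from the slide.
CRITICAL_PATH_NS = 8.4
RAD_CLOCK_MHZ = 100
CLOCKS_PER_CELL = 14
SLICES_USED = 49
SLICES_TOTAL = 12_288

max_frequency_mhz = 1000 / CRITICAL_PATH_NS
packets_per_second = RAD_CLOCK_MHZ * 1e6 / CLOCKS_PER_CELL
slice_utilization_pct = 100 * SLICES_USED / SLICES_TOTAL

print(f"{max_frequency_mhz:.0f} MHz")             # 119 MHz
print(f"{packets_per_second / 1e6:.1f} Mpkt/s")   # 7.1 Mpkt/s
print(f"{slice_utilization_pct:.1f} %")           # 0.4 %
```

The packet rate is computed at the RAD's 100 MHz clock rather than at the 119 MHz the circuit could sustain, which is why the critical path leaves timing margin.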

Page 28: CprE / ComS 583 Reconfigurable Computing

CAM-based Packet Matching

• Sample Packet:
  • Source Address = 128.252.5.5 (dotted.decimal)
  • Destination Address = 141.142.2.2 (dotted.decimal)
  • Source Port = 4096 (decimal)
  • Destination Port = 50 (decimal)
  • Protocol = TCP (6)
  • Payload = “Consolidate your loans. CALL NOW”
  • Payload Lists = { General SPAM (0), Save Money SPAM (1) }
  • Content Vector = “00000011” (binary) = x”03” (hex)

Bits      Field      Value (hex)
111–104   Content    03
103–72    Src IP     80FC0505
71–40     Dest IP    8D8E0202
39–24     Src Port   1000
23–8      Dest Port  0050
7–0       Proto      06

Page 29: CprE / ComS 583 Reconfigurable Computing

Sample Filter

• Source Address = 128.252.0.0 / 16
• Destination Address = 141.142.0.0 / 16
• Source Port = Don’t Care
• Destination Port = 50
• Protocol = TCP (6)
• Payload includes general SPAM (List 0)

Field      IP Packet   Value      Mask (1 = care, 0 = don’t care)
Content    03          01         01
Src IP     80FC0505    80FC0000   FFFF0000
Dest IP    8D8E0202    8D8E0000   FFFF0000
Src Port   1000        0000       0000
Dest Port  0050        50         FFFF
Proto      06          06         FF

DROP the packet: it matches the filter
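The value/mask comparison can be sketched as a masked equality test on one wide word. The example below is a software model, not the FPX circuit. It uses the hex field values shown in the slide's diagram, and it assumes the fields are concatenated in the order content, source IP, destination IP, source port, destination port, protocol. The names pack and matches are hypothetical.

```python
# Field widths in bits, most significant field first (112 bits total).
# The field order is an assumption based on the slide's diagram.
FIELDS = [("content", 8), ("src_ip", 32), ("dst_ip", 32),
          ("src_port", 16), ("dst_port", 16), ("proto", 8)]


def pack(**values):
    """Concatenate the named fields into one 112-bit integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        assert 0 <= value < (1 << width), name
        word = (word << width) | value
    return word


def matches(packet, value, mask):
    """Ternary match: compare only the bit positions where mask is 1."""
    return (packet & mask) == (value & mask)


packet = pack(content=0x03, src_ip=0x80FC0505, dst_ip=0x8D8E0202,
              src_port=0x1000, dst_port=0x0050, proto=0x06)
value = pack(content=0x01, src_ip=0x80FC0000, dst_ip=0x8D8E0000,
             src_port=0x0000, dst_port=0x0050, proto=0x06)
mask = pack(content=0x01, src_ip=0xFFFF0000, dst_ip=0xFFFF0000,
            src_port=0x0000, dst_port=0xFFFF, proto=0xFF)

print("DROP" if matches(packet, value, mask) else "FORWARD")  # DROP
```

The all-zero source-port mask implements the don't-care, and the content mask of 01 tests only the General SPAM bit, so the packet's second content bit (Save Money SPAM) does not affect the result.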

Page 30: CprE / ComS 583 Reconfigurable Computing

Packet Classifier with FlowID

[Figure: packet classifier. Bits in the IP header (Payload Match Bits, Source Address, Destination Address, Source Port, Dest. Port, Protocol) are compared against a CAM Table of entries 1 to N, each holding a 112-bit CAM MASK and CAM VALUE, using Mask Matchers and Value Comparators. A Priority Encoder selects from the Flow List (Flow ID [1] to Flow ID [N], 16 bits each) to produce the Resulting Flow Identifier.]
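A software model of the classifier in the figure might look like the sketch below. In hardware all CAM entries are compared in parallel; here a list stands in for them. Treating the lowest-numbered matching entry as highest priority is an assumption, and CamEntry and classify are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class CamEntry(NamedTuple):
    mask: int     # 112-bit mask, 1 = care
    value: int    # 112-bit value
    flow_id: int  # 16-bit flow identifier


def classify(header: int, table: List[CamEntry]) -> Optional[int]:
    """Return the flow ID of the highest-priority matching entry."""
    # Mask matchers and value comparators: one hit flag per entry.
    hits = [(header & e.mask) == (e.value & e.mask) for e in table]
    # Priority encoder: the lowest-index hit wins (assumed ordering).
    for index, hit in enumerate(hits):
        if hit:
            return table[index].flow_id
    return None


table = [
    CamEntry(mask=0xFF, value=0x06, flow_id=10),  # protocol field == 6
    CamEntry(mask=0x00, value=0x00, flow_id=99),  # catch-all entry
]
print(classify(0x06, table))  # 10
print(classify(0x11, table))  # 99
```

Placing a catch-all entry last gives a default flow, so the function only returns None when the table has no such entry.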

Page 31: CprE / ComS 583 Reconfigurable Computing

Fast IP Lookup Algorithm

• Function
  • Search for best matching prefix using Trie algorithm

Prefix    Next Hop
*         4
01*       7
10*       2
110*      9
0001*     1
1011*     0
00110*    5
01011*    3

[Figure: binary trie for the prefixes above, with edges labeled 0 and 1.]
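Best-matching-prefix search on a trie can be sketched as follows. This is a plain one-bit-per-level binary trie written for illustration, not necessarily the data structure used in the FPX hardware. The route table is the slide's prefix table as read from the extracted text, and Node, build_trie, and lookup are hypothetical names.

```python
# Prefix table as read from the slide ("" stands for the default route *).
ROUTES = {"": 4, "01": 7, "10": 2, "110": 9,
          "0001": 1, "1011": 0, "00110": 5, "01011": 3}


class Node:
    def __init__(self):
        self.children = {}    # "0" or "1" -> Node
        self.next_hop = None  # set only where a prefix ends


def build_trie(routes):
    root = Node()
    for prefix, hop in routes.items():
        node = root
        for bit in prefix:
            node = node.children.setdefault(bit, Node())
        node.next_hop = hop
    return root


def lookup(root, address_bits):
    """Walk the trie bit by bit, remembering the last next hop seen."""
    node, best = root, root.next_hop
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best


root = build_trie(ROUTES)
print(lookup(root, "01011000"))  # 3
print(lookup(root, "01010000"))  # 7
print(lookup(root, "11100000"))  # 4
```

The second lookup shows why the walk keeps the last hop seen: the address follows the 0101 branch but never reaches 01011*, so the answer falls back to the shorter prefix 01*.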

Page 32: CprE / ComS 583 Reconfigurable Computing

Hardware Implementation in the FPX

[Figure: FIPL hardware. Labels: NID FPGA (LC, SW, Remap VCIs for IP packets); RAD FPGA (Extract IP Headers, Control Cell Processor, Packet Reassembler, On-Chip Cell Store, IP Lookup Engine, counter, SRAM1 Interface with Request/Grant); external SRAM1 and SRAM2; binary trie with edges labeled 0 and 1.]

Page 33: CprE / ComS 583 Reconfigurable Computing

Pipelined FIPL Operations

• Throughput: optimized by interleaving memory accesses
  • Operate 5 parallel lookups
  • t_pipelined_lookup = 550 ns / 5 = 110 ns
  • Throughput = 9.1 million packets / second

[Figure: pipeline diagram. Horizontal axis: Time (cycles); vertical axis: Space (parallel lookup units on FPGA). Stages: Generate Address, Latch ADDR into SRAM, SRAM D < M[A], Latch Data into FPGA, Compute.]
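The throughput figure follows from dividing the single-lookup latency across the interleaved lookup units. The sketch below reproduces the slide's arithmetic; it assumes the units overlap perfectly, so it is an upper bound, and the function name is hypothetical.

```python
LOOKUP_LATENCY_NS = 550  # time for one complete lookup (from the slide)
PARALLEL_LOOKUPS = 5     # interleaved lookup units (from the slide)


def pipelined_throughput_mpps(latency_ns, units):
    """Million lookups per second, assuming perfect interleaving."""
    effective_ns = latency_ns / units
    return 1e3 / effective_ns  # 1e9 ns per second, scaled to millions


effective_ns = LOOKUP_LATENCY_NS / PARALLEL_LOOKUPS
print(f"{effective_ns:.0f} ns per lookup")  # 110 ns per lookup
print(f"{pipelined_throughput_mpps(LOOKUP_LATENCY_NS, PARALLEL_LOOKUPS):.1f} Mpkt/s")  # 9.1 Mpkt/s
```

With a single lookup unit the same formula gives about 1.8 million lookups per second, which is the gain the interleaving buys.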

Page 34: CprE / ComS 583 Reconfigurable Computing

Other Modules Implemented

• IPv4 CAM Filter
  • 104 Bit header matching
• Fast IP Lookup (FIPL)
  • Longest Prefix Match
  • MAE-West at 10M pkts/second
• Packet Content Scanner
  • Reg. Expression Search
• Data Queueing
  • Per-flow queue in SDRAM
• IPv6 Tunneling Module
  • Tunnels IPv6 over IPv4
• Statistics Module
  • Event counter
• Traffic Generator
  • Per-flow mixing
• Video Recoder
  • Motion JPEG
• Embedded Processor
  • KCPSM

Page 35: CprE / ComS 583 Reconfigurable Computing

Summary

• Field Programmable Port Extender (FPX)
  • Network-accessible Hardware
  • Reprogrammable Application Device
• Module Deployment
  • Modules implement fast processing on data flow
  • Network allows Arbitrary Topologies of distributed systems
• Project Website
  • http://www.arl.wustl.edu/arl/projects/fpx/