Lecture 13: Reconfigurable Computing Applications October 10, 2013 ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Page 1: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


ECE 636

Reconfigurable Computing

Lecture 11

Reconfigurable Computing Applications

Page 2: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Hardware assisted Simulated Annealing

° Use FPGA to perform FPGA placement

° Take advantage of parallelism and specialization

° Some limitations

• Global view of cost

• Convergence

• Scalability

° Lots of benefits

• Massive parallelism

• Self-contained reconfigurable system

Courtesy: Wrighton/DeHon

Page 3: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Systolic Architectures

[Figure: block diagram contrasting a single Memory feeding Compute units through a Bottleneck with an array of Memory/Compute pairs.]

Page 4: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Strategy

° Reformulate simulated annealing allowing only local swaps

° Consider all swaps in parallel

° Maintain information in “systolic cells”

• Represent current placement spatially

• Construct hardware to operate on entire placement at once
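The strategy above can be sketched in software. The following is a minimal sequential emulation, not the hardware design from the slides: it assumes a fully occupied grid, two-pin connections, a Manhattan-distance wirelength metric, and a geometric cooling schedule. The function names, the four-phase pairing of neighbors, and all default parameters are illustrative assumptions.

```python
import math
import random


def build_neighbors(blocks, nets):
    """Map each block to the blocks it shares a two-pin connection with."""
    neighbors = {b: [] for b in blocks}
    for a, b in nets:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def positions(grid):
    """Read the spatial placement: block id -> (x, y)."""
    return {blk: (x, y) for y, row in enumerate(grid) for x, blk in enumerate(row)}


def wirelength(pos, nets):
    """Linear wirelength: sum of Manhattan distances over all connections."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)


def swap_delta(a, b, pos, neighbors):
    """Wirelength change if a and b trade places and nothing else moves."""
    trial = dict(pos)
    trial[a], trial[b] = pos[b], pos[a]

    def local(block, where):
        x, y = where[block]
        return sum(abs(x - where[n][0]) + abs(y - where[n][1])
                   for n in neighbors[block])

    return (local(a, trial) + local(b, trial)) - (local(a, pos) + local(b, pos))


def parallel_step(grid, neighbors, temperature, phase, rng):
    """Evaluate one set of disjoint neighbor swaps against a single snapshot."""
    h, w = len(grid), len(grid[0])
    pos = positions(grid)
    offset = phase % 2
    if phase < 2:   # horizontal pairs
        pairs = [((x, y), (x + 1, y)) for y in range(h) for x in range(offset, w - 1, 2)]
    else:           # vertical pairs
        pairs = [((x, y), (x, y + 1)) for x in range(w) for y in range(offset, h - 1, 2)]
    accepted = []
    for (x0, y0), (x1, y1) in pairs:
        delta = swap_delta(grid[y0][x0], grid[y1][x1], pos, neighbors)
        # Each pair decides as if no other pair moved (no global view of cost).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            accepted.append(((x0, y0), (x1, y1)))
    for (x0, y0), (x1, y1) in accepted:
        grid[y0][x0], grid[y1][x1] = grid[y1][x1], grid[y0][x0]
    return len(accepted)


def anneal(grid, nets, t_start=2.0, t_end=0.05, cooling_steps=400, seed=0):
    """Anneal in place with local swaps only; returns the grid."""
    rng = random.Random(seed)
    neighbors = build_neighbors([b for row in grid for b in row], nets)
    for step in range(cooling_steps):
        frac = step / max(1, cooling_steps - 1)
        temperature = t_start * (t_end / t_start) ** frac
        for phase in range(4):
            parallel_step(grid, neighbors, temperature, phase, rng)
    return grid
```

Because every pair in a phase is disjoint, the accepted swaps can be applied together without conflicts. Each decision still uses a stale snapshot, which is one way to see the "global view of cost" limitation listed earlier.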

Page 5: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Local Swaps

Local Communication

Local Swaps

Massively Parallel Operation

Page 6: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Individual Swap Element

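[Figure: block diagram of one swap element, with the following labels]

• State: myID; myX, myY; counter

• Fanout registers: Fanout0(id, x, y), Fanout1(id, x, y), Fanout2(id, x, y), ..., FanoutN(id, x, y)

• Fanin registers: Fanin0(id, x, y), Fanin1(id, x, y), Fanin2(id, x, y), ..., FaninN(id, x, y)

• Position chain: PosChain(id, x, y); position chain in, position chain out

• Neighbor links: left data in, left data out, right data in, right data out, up data in/out, down data in/out

• Randomness

• Arithmetic Unit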
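One plausible reading of the swap element's state can be written as a small data structure. This is a sketch under assumptions: the slide only names the registers, so the `observe` and `cost_at` methods, and the idea that the position chain broadcasts `(id, x, y)` tuples that each element uses to refresh its stored fanin and fanout coordinates, are illustrative guesses rather than the documented design.

```python
from dataclasses import dataclass, field


@dataclass
class SwapElement:
    """Hypothetical software model of one systolic placement cell."""
    my_id: int
    my_x: int
    my_y: int
    fanin: dict = field(default_factory=dict)    # block id -> (x, y)
    fanout: dict = field(default_factory=dict)   # block id -> (x, y)

    def observe(self, block_id, x, y):
        """Assumed position-chain update: refresh a connected block's coordinates."""
        if block_id in self.fanin:
            self.fanin[block_id] = (x, y)
        if block_id in self.fanout:
            self.fanout[block_id] = (x, y)

    def cost_at(self, x, y):
        """Manhattan wirelength to all stored fanin/fanout positions from (x, y)."""
        connected = list(self.fanin.values()) + list(self.fanout.values())
        return sum(abs(x - px) + abs(y - py) for px, py in connected)


cell = SwapElement(my_id=1, my_x=0, my_y=0,
                   fanin={2: (3, 0)}, fanout={3: (0, 2)})
print(cell.cost_at(cell.my_x, cell.my_y))   # cost at the current location
print(cell.cost_at(1, 0))                   # cost if moved one slot to the right
```

Comparing `cost_at` for the current slot and a neighboring slot is the kind of purely local arithmetic an element could perform before deciding on a swap.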

Page 7: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Linear Wirelength Improvement

[Plot: Apex4 benchmark. Metric (left axis, 0 to 160000) and P (right axis, 0 to 1) versus clock cycles (0 to 700000).]

Page 8: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Choosing 400 Cooling Steps

[Plot: normalized linear wirelength metric (0.8 to 2) versus swapsPerInterval (1, .02√N, .04√N, ..., .20√N), one curve per netlist: alu4.net, apex2.net, apex4.net, bigkey.net, clma.net, des.net, diffeq.net, dsip.net, elliptic.net, ex1010.net, ex5p.net, frisc.net, misex3.net, pdc.net, s298.net, s38417.net, s38584.1.net, seq.net, spla.net, tseng.net.]

Page 9: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


VPR Comparison Methodology

[Flow diagram]

• Input: netlists from the FPGA Place and Route Challenge, plus configuration options

• Path 1: Systolic Placement Algorithm → Placed Design → vpr Router → Routed Design

• Path 2: vpr -fast Placer → Placed Design → vpr Router → Routed Design

• Record statistics for both routed designs (channel utilization, critical path delay)

Page 10: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Speedups

° VPR on 2.2 GHz Xeon Workstation

° 500x for ex5p

• 18% channel growth

° 1200x for spla

• 41% channel growth

° More opportunity for speedups with better cooling schedules

° Better quality with better cost functions

° Feasible on a Virtex2000E part

Page 11: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Networking Application: Reconfigurable Firewall

° Networking applications are well suited to reconfigurable hardware

• Target signatures change often

• Massive quantities of stream-based data

• Repetitive operations

° Connecting up to a realistic networking environment is hard

• Washington University experimental setup is one of the best

• Shows importance of both memory and processing capability

° Numerous experiments performed over the past five years

Courtesy: Lockwood

Page 12: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Network Routing

• FPGAs popular in network hardware

• New protocols implemented directly in silicon

• Easy to upgrade in the field

• Washington University Gigabit Switch (WUGS)

- Switch provides up to 160 Gbps of bandwidth.

Page 13: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


FPGA-based Router

• FPX module contains two FPGAs

• NID – network interface device

- Performs data queuing

• RAD – reprogrammable application device

- Specialized control sequences

Page 14: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Reconfigurable Data Queuing

• Data may be congested.

• FPGA can be programmed for virtual channels.

Page 15: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Hardware Setup

• Stacked boards part of system

• Scalable to multiple boards

• Allows for cooling, power.

Page 16: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


IP Lookup Function

• RAD can be used to evaluate packet headers.

• Headers evaluated in groups of four bits
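Evaluating an address four bits at a time can be illustrated with a multibit trie that consumes one nibble per level. The sketch below is a generic stride-4 longest-prefix-match trie with prefix expansion inside each node. It is an illustrative stand-in, not the data structure used on the FPX, and the class names, next-hop labels, and IPv4-only restriction are assumptions.

```python
import ipaddress

STRIDE = 4                 # bits examined per step
FANOUT = 1 << STRIDE       # 16 entries per node


class Node:
    def __init__(self):
        self.hops = [None] * FANOUT       # (prefix_len, next_hop) per nibble
        self.children = [None] * FANOUT


class NibbleTrie:
    """IPv4 longest-prefix match, walking the address in 4-bit groups."""

    def __init__(self, default=None):
        self.root = Node()
        self.default = default

    def insert(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        addr, length = int(net.network_address), net.prefixlen
        if length == 0:
            self.default = next_hop
            return
        node, shift, remaining = self.root, 32 - STRIDE, length
        while remaining > STRIDE:          # descend through full nibbles
            nib = (addr >> shift) & 0xF
            if node.children[nib] is None:
                node.children[nib] = Node()
            node = node.children[nib]
            shift -= STRIDE
            remaining -= STRIDE
        # Expand the last 1..4 bits over every nibble value they cover.
        base = (addr >> shift) & 0xF
        for nib in range(base, base + (1 << (STRIDE - remaining))):
            old = node.hops[nib]
            if old is None or old[0] <= length:
                node.hops[nib] = (length, next_hop)

    def lookup(self, ip):
        addr = int(ipaddress.ip_address(ip))
        best, node, shift = self.default, self.root, 32 - STRIDE
        while node is not None and shift >= 0:
            nib = (addr >> shift) & 0xF
            if node.hops[nib] is not None:
                best = node.hops[nib][1]   # deeper levels hold longer prefixes
            node = node.children[nib]
            shift -= STRIDE
        return best


table = NibbleTrie(default="default")
table.insert("128.252.0.0/16", "port A")
table.insert("128.252.5.0/24", "port B")
print(table.lookup("128.252.5.5"))
```

A lookup touches at most eight nodes for a 32-bit address, which is the appeal of a fixed stride in hardware: each step is one small memory read indexed by four header bits.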

Page 17: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


FPX Hardware Platform

[Figure: FPX block diagram and FPX photo. Block diagram labels: NID (FPGA), RAD (FPGA), Switch, Network Interface, Memory, Layered Protocol Wrappers, Extensible Modules, Route Filter, Flow Buffer, Program Cache, PROM, Config, SDRAM, SRAM.]

Page 18: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


FPX Hardware in WUGS-20 Switch

Page 19: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


System-On-Chip Firewall

[Figure: block diagram of the firewall on a Xilinx XCV2000E FPGA]

• Data input from Gigabit Ethernet or SONET line card

• Layered Protocol Wrappers

• Payload Scanner, producing Payload Match Bits

• TCAM Filter, producing Flow ID

• Flow Buffer, Queue Manager, Free List Manager, Packet Scheduler

• Extensible Module(s)

• Interfaces to off-chip memories: SRAM 1 Controller, SDRAM 1 Controller, SDRAM 2 Controller

• Data output to switch, Gigabit Ethernet, or SONET line card

Page 20: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Content Matching Module

[Figure: regex_app (given) block inside wrapper_module.vhd, with the following signals]

• Clock and control: clk, reset_l, enable_l, ready_l

• From Protocol Wrappers: dataen_appl_in, d_appl_in, sof_appl_in, eof_appl_in, sod_appl_in, tca_appl_in

• To existing MP1 circuit: dataen_out_appl, d_out_appl, sof_out_appl, eof_out_appl, sod_out_appl, tca_out_appl

• Matched: 8 bits, to extended bits of CAM

• Data buses: 32 bits

Page 21: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Packet matching w/ Content Addressable Memory

° Sample Packet:

- Source Address = 128.252.5.5 (dotted.decimal)

- Destination Address = 141.142.2.2 (dotted.decimal)

- Source Port = 4096 (decimal)

- Destination Port = 50 (decimal)

- Protocol = TCP (6)

- Payload = “Consolidate your loans. CALL NOW”

– Payload Lists = { General SPAM (0), Save Money SPAM (1) }

– Content Vector = “00000011” (binary) = x"03" (hex)

[Figure: packet key layout, all values shown in hex]

• Bits 111 to 104: Content = 03

• Bits 103 to 72: Src IP = 80FC0505

• Bits 71 to 40: Dest IP = 8D8E0202

• Bits 39 to 8: Src Port = 1000, Dest Port = 0050

• Bits 7 to 0: Proto = 06
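The key layout in the figure can be reproduced with a few shifts. This is a minimal sketch, assuming the field order shown above (content vector in the top byte, protocol in the bottom byte) and 16 bits per port. The function names and the phrase lists are hypothetical, since the slides do not give the actual spam signatures. Note that the figure draws the destination port as hex 0050, which is 80 in decimal, while the bullet list says 50 (decimal); the sketch uses the hex value as drawn.

```python
import ipaddress


def content_vector(payload, phrase_lists):
    """Set bit i when the payload contains any phrase from list i."""
    vector = 0
    text = payload.lower()
    for bit, phrases in enumerate(phrase_lists):
        if any(phrase.lower() in text for phrase in phrases):
            vector |= 1 << bit
    return vector


def pack_key(src_ip, dst_ip, src_port, dst_port, proto, content):
    """Pack the 104 header bits plus 8 content bits into one 112-bit integer."""
    key = content & 0xFF                                   # bits 111..104
    key = (key << 32) | int(ipaddress.IPv4Address(src_ip))  # bits 103..72
    key = (key << 32) | int(ipaddress.IPv4Address(dst_ip))  # bits 71..40
    key = (key << 16) | (src_port & 0xFFFF)                 # bits 39..24
    key = (key << 16) | (dst_port & 0xFFFF)                 # bits 23..8
    key = (key << 8) | (proto & 0xFF)                       # bits 7..0
    return key


# Hypothetical phrase lists standing in for "General SPAM" and "Save Money SPAM".
LISTS = [["call now"], ["loans"]]

vector = content_vector("Consolidate your loans. CALL NOW", LISTS)
key = pack_key("128.252.5.5", "141.142.2.2", 0x1000, 0x0050, 6, vector)
print(f"{key:028x}")
```

Printing the key as 28 hex digits makes it easy to compare field by field against the figure.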

Page 22: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Sample Filter

- Source Address = 128.252.0.0 / 16

- Destination Address = 141.142.0.0 / 16

- Source Port = Don’t Care

- Destination Port = 50

- Protocol = TCP (6)

- Payload includes general SPAM (List 0)

[Figure: filter evaluation, values in hex]

• IP Packet: Content = 03, Src IP = 80FC0505, Dest IP = 8D8E0202, Src Port = 1000, Dest Port = 0050, Proto = 06

• Value: Content = 01, Src IP = 80FC0000, Dest IP = 8D8E0000, Src Port = 0000, Dest Port = 0050, Proto = 06

• Mask (1 = care, 0 = don’t care): Content = 01, Src IP = FFFF0000, Dest IP = FFFF0000, Src Port = 0000, Dest Port = FFFF, Proto = FF

DROP the packet: it matches the filter
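The value/mask comparison amounts to a bitwise AND followed by an equality test. The sketch below copies the packet, value, and mask from the figure as 112-bit hexadecimal constants (content byte first, protocol byte last); the function name `matches` is an illustrative choice.

```python
def matches(packet_key, value, mask):
    """Ternary match: compare only the bit positions where the mask is 1."""
    return (packet_key & mask) == (value & mask)


# Fields, left to right: content, src IP, dest IP, src port, dest port, protocol.
PACKET = 0x03_80FC0505_8D8E0202_1000_0050_06
VALUE = 0x01_80FC0000_8D8E0000_0000_0050_06
MASK = 0x01_FFFF0000_FFFF0000_0000_FFFF_FF

if matches(PACKET, VALUE, MASK):
    print("DROP")
else:
    print("FORWARD")
```

The source port mask of 0000 is what makes that field a don't-care, and the content mask of 01 means only membership in list 0 is tested, so the packet's second content bit does not affect the outcome.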

Page 23: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Packet Classifier with FlowID

[Figure: CAM-based packet classifier]

• CAM table: entries 1 to N, each with CAM MASK [i] and CAM VALUE [i]

• Key: bits in IP header (Source Address, Destination Address, Source Port, Dest. Port, Protocol) plus Payload Match Bits

• Mask Matchers and Value Comparators feed a Priority Encoder

• Flow list: Flow ID [1], Flow ID [2], Flow ID [3], ..., Flow ID [N]

• Output: Resulting Flow Identifier

• Width labels in the figure: 112 bits, 16 bits
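The classifier structure can be modeled as a table of value/mask pairs scanned for the first hit. This is a software sketch under assumptions: hardware compares all entries at once, and the convention that the lowest-numbered matching entry wins is assumed here rather than stated on the slide. The names `CamEntry` and `classify` and the flow ID values are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class CamEntry(NamedTuple):
    value: int     # 112-bit CAM VALUE
    mask: int      # 112-bit CAM MASK, 1 = care
    flow_id: int   # 16-bit Flow ID for this rule


def classify(key: int, table: Sequence[CamEntry]) -> Optional[int]:
    """Return the Flow ID of the highest-priority matching entry, if any."""
    # Mask matchers and value comparators: one result bit per entry.
    hits = [(key & e.mask) == (e.value & e.mask) for e in table]
    # Priority encoder: pick the first asserted hit (assumed ordering).
    for index, hit in enumerate(hits):
        if hit:
            return table[index].flow_id
    return None


TABLE = [
    # Spam from 128.252/16 to 141.142/16, dest port 0x0050, TCP, list 0 matched.
    CamEntry(0x01_80FC0000_8D8E0000_0000_0050_06,
             0x01_FFFF0000_FFFF0000_0000_FFFF_FF, flow_id=7),
    # Catch-all rule for any TCP packet.
    CamEntry(0x00_00000000_00000000_0000_0000_06,
             0x00_00000000_00000000_0000_0000_FF, flow_id=1),
]

print(classify(0x03_80FC0505_8D8E0202_1000_0050_06, TABLE))
```

Ordering rules from most to least specific lets a broad catch-all entry sit at the end of the table without shadowing the narrower rules before it.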

Page 24: ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications


Other Modules Implemented

° IPv4 CAM Filter

• 104 Bit header matching

° Fast IP Lookup (FIPL)

• Longest Prefix Match

• MAE-West at 10M pkts/second

° Packet Content Scanner

• Reg. Expression Search

° Data Queueing

• Per-flow queue in SDRAM

° IPv6 Tunneling Module

• Tunnels IPv6 over IPv4

° Statistics Module

• Event counter

° Traffic Generator

• Per-flow mixing

° Video Recoder

• Motion JPEG

° Embedded Processor

• KCPSM