Lecture 19: Reconfigurable Coprocessors November 15, 2004 ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Page 1: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

ECE 697F

Reconfigurable Computing

Lecture 19

Reconfigurable Coprocessors

Page 2: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Overview

• Focus on Processor and Array hybrids.

• Motivation

• Compute Models: how to fit into computation

• Examples: Garp, Prism, Remarc, OneChip, Prisc

• Some lecture material taken with permission from Dehon lecture on reconfigurable computing.

Page 3: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Motivation

• Processors efficient at sequential codes, regular arithmetic operations.

• FPGA efficient at fine-grained parallelism, unusual bit-level operations.

• Tight-coupling important: allows sharing of data/control

• Converging technologies: SRAM being migrated to same die as processor anyway. Why not integrate?

Page 4: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Motivation: Other Viewpoints

• Replace interface glue logic.

• I/O pre/post processing

• Handle real-time responsiveness

• Provide powerful, application specific operation.

• Allow migration of function/performance over time.

Page 5: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Compute Models

• Glue logic for buses, adapters.

• Dedicated I/O processor

• Instruction augmentation

- Special instructions/coprocessor ops

- VLIW/microcoded extension to processor

- Configurable vector unit

• Autonomous co/stream processor

Page 6: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Interfacing

• Logic replaces:

- ASIC customization

- External FPGA/CPLD

• Example

- Bus protocols

- Peripherals

- Sensors, actuators

• Argument

- Need customization

- Modern chips have capacity

- Reduce part count

- Migrate to system-on-a-chip

- Performance/power

Page 7: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Triscend E5 Architecture

Page 8: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Triscend E5 Architecture

Page 9: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

I/O Processor

• Array dedicated to servicing an I/O channel

- Sensor, LAN, WAN, peripheral

- Many protocols, services

• Provides protocol handling

- Stream computation

- Compression, encryption

• Effectively looks like I/O peripheral to processor.

- Don’t need all at same time

- Offload function from processor.

Page 10: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

I/O Processing

• Single threaded processor created in reconfigurable logic.

• No support for multiple data pipes or multiple contexts.

• Need some minimal, local control to handle events.

• For performance or real-time guarantees, may need to service rapidly.

• Checksum and acknowledge packets, for example
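As an illustration of the kind of per-packet work that could be handled locally, here is a minimal Python sketch of a 16-bit ones'-complement checksum in the style of the Internet checksum, plus a trivial acknowledge decision. The function names and the "ACK"/"NAK" strings are hypothetical and not from the lecture; a real array would implement this as a streaming datapath rather than a loop.

```python
def ones_complement_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back into the low bits
    return ~total & 0xFFFF


def service_packet(payload: bytes, received_checksum: int) -> str:
    """Hypothetical local handler: acknowledge good packets, reject bad ones."""
    if ones_complement_checksum(payload) == received_checksum:
        return "ACK"
    return "NAK"
```

Because each 16-bit word is folded into the running sum as it arrives, the computation needs only a small amount of local state, which is why it suits a stream-oriented I/O processor.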

Page 11: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Instruction Augmentation

• Processor can only describe a small number of basic computations in a cycle

- I bits -> 2^I operations

• Recall that for Boolean functions a total of ______ operations could be performed on two W-bit words.

• ALU implementations restrict execution of some simple operations.

- e.g. bit reversal

[Figure: bit reversal of a 32-bit word; input bits a31 a30 ... a0 map to output bits b31 ... b0 with bit positions swapped]
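To make the bit-reversal point concrete, the Python sketch below shows two software versions using only ordinary ALU operations: a bit-at-a-time loop and a five-stage mask-and-shift version. Both function names are chosen here for illustration. In reconfigurable logic the same operation is only a permutation of wires, so it needs no logic at all.

```python
def reverse_bits32(x: int) -> int:
    """Reverse the low 32 bits one bit at a time, as a simple ALU-only loop would."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)  # shift, mask, and OR for every bit
        x >>= 1
    return result


def reverse_bits32_swaps(x: int) -> int:
    """Same result in five mask-and-shift stages (swap neighbors, pairs, nibbles, ...)."""
    x &= 0xFFFFFFFF
    x = ((x >> 1) & 0x55555555) | ((x & 0x55555555) << 1)
    x = ((x >> 2) & 0x33333333) | ((x & 0x33333333) << 2)
    x = ((x >> 4) & 0x0F0F0F0F) | ((x & 0x0F0F0F0F) << 4)
    x = ((x >> 8) & 0x00FF00FF) | ((x & 0x00FF00FF) << 8)
    return ((x >> 16) | (x << 16)) & 0xFFFFFFFF
```

Even the faster version costs well over a dozen shift, mask, and OR operations on a conventional ALU for something that is a single trivial operation once it can be expressed directly in hardware.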

Page 12: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Instruction Augmentation

• Provide a way to augment the processor instruction set for an application.

• Avoid mismatch between hardware/software

What's required?

• Fit augmented instructions into data and control stream.

• Create a functional unit for augmented instructions.

• Compiler techniques to identify/use new functional unit.

Page 13: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Chimaera

• Start from Prisc idea.

- Integrate as a functional unit

- No state

- RFU Ops (like expfu)

- Stall processor on instruction miss

• Add

- Multiple instructions at a time

- More than 2 inputs possible

• Hauck: University of Washington

Page 14: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Chimaera Architecture

• Live copy of register file values feed into array

• Each row of array may compute from register values or intermediates

• Tag on array to indicate RFUOP

Page 15: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Chimaera Architecture

• Array can operate on values as soon as placed in register file.

• Logic is combinational

• When RFUOP matches

- Stall until result ready

- Drive result from matching row
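The tag-match behavior described on the Chimaera slides can be sketched as a toy Python model. This is an illustration under stated assumptions, not the actual Chimaera design: the class name, the `library` of configurations, and the round-robin replacement policy are all hypothetical. Each row is a stateless function of the register values, and a miss stands in for the processor stall while a configuration is loaded.

```python
from typing import Callable, Dict, List, Optional, Tuple

RowFn = Callable[[List[int]], int]  # combinational function of the register values


class RfuSketch:
    """Toy model of a tagged, stateless reconfigurable functional unit."""

    def __init__(self, num_rows: int, library: Dict[int, RowFn]):
        self.rows: List[Optional[Tuple[int, RowFn]]] = [None] * num_rows
        self.library = library  # hypothetical store of available configurations
        self.next_victim = 0    # round-robin replacement, an assumption of this sketch
        self.misses = 0

    def execute(self, rfuop: int, regs: List[int]) -> int:
        for row in self.rows:
            if row is not None and row[0] == rfuop:  # tag match: drive the result
                return row[1](regs) & 0xFFFFFFFF
        # Tag miss: the processor would stall here while the configuration loads.
        self.misses += 1
        self.rows[self.next_victim] = (rfuop, self.library[rfuop])
        self.next_victim = (self.next_victim + 1) % len(self.rows)
        return self.library[rfuop](regs) & 0xFFFFFFFF
```

A configuration such as `lambda r: (r[1] & r[2]) ^ r[3]` reads three registers at once, which illustrates the "more than 2 inputs" point: the array sees the whole register file, not just two operand ports.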

Page 16: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Chimaera Timing

• If R1 presented last then stall

• Might be helped by instruction reordering

• Physical implementation an issue.

[Figure: timing diagram showing register inputs R5, R3, R2, R1]

Page 17: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Chimaera Results

• Three SPEC92 benchmarks

- Compress 1.11 speedup

- Eqntott 1.8

- Life 2.06

• Small arrays with limited state

• Small speedup

• Perhaps focus on global router rather than local optimization.

Page 18: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Garp

• Integrate as coprocessor

- Similar bandwidth to processor as functional unit

- Own access to memory

• Support multi-cycle operation

- Allow state

- Cycle counter to track operation

• Configuration cache, path to memory

Page 19: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Garp – UC Berkeley

• ISA – coprocessor operations

- Issue gaconfig to make particular configuration present.

- Explicitly move data to/from array

- Processor suspension during coproc operation

- Use cycle counter to track progress

• Array may directly access memory

- Processor and array share memory

- Exploits streaming data operations

- Cache/MMU maintains data consistency
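The coprocessor model above (select a configuration, move data explicitly, let a cycle counter track a multi-cycle operation, and interlock the processor on the result) can be sketched as a toy Python model. Only `gaconfig` is a name taken from the slides; `move_to_array`, `start`, `tick`, and `move_from_array` are hypothetical names for this sketch and are not the real Garp mnemonics. The array's direct memory access is not modeled.

```python
class GarpSketch:
    """Toy model of a coprocessor array driven by explicit processor instructions."""

    def __init__(self, configurations):
        self.configurations = configurations  # name -> per-cycle step function
        self.active = None
        self.regs = {}                        # array-side state persists across cycles
        self.cycle_counter = 0
        self.processor_stall_cycles = 0

    def gaconfig(self, name):
        """Make the named configuration the active one (name taken from the slides)."""
        self.active = self.configurations[name]

    def move_to_array(self, reg, value):      # hypothetical name
        self.regs[reg] = value

    def start(self, cycles):                  # hypothetical name
        self.cycle_counter = cycles

    def tick(self):
        """Advance the array one clock if the cycle counter is nonzero."""
        if self.cycle_counter > 0:
            self.active(self.regs)
            self.cycle_counter -= 1

    def move_from_array(self, reg):           # hypothetical name
        """Interlocked read: the processor waits until the counter reaches zero."""
        while self.cycle_counter > 0:
            self.processor_stall_cycles += 1
            self.tick()
        return self.regs[reg]


def accumulate(regs):
    """Example configuration: add a running index into an accumulator each cycle."""
    regs["acc"] += regs["i"]
    regs["i"] += 1
```

Unlike the single-cycle, stateless functional-unit model, the accumulator here lives in the array between cycles, which is what makes multi-cycle operation possible and also what makes context swapping a concern.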

Page 20: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Garp Instructions

• Interlock indicates if processor waits for array to count to zero.

• Last three instructions useful for context swap

• Processor decode hardware augmented to recognize new instructions.

Page 21: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Garp Array

• Row-oriented logic

• Dedicated path for processor/memory

• Processor does not have to be involved in array-memory path

Page 22: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Garp Results

• General results

- 10-20x improvement on stream, feed-forward operation

- 2-3x when data dependencies limit pipelining

- [Hauser-FCCM97]

Page 23: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

PRISC/Chimaera vs. Garp

• Prisc/Chimaera

- Basic op is single cycle: expfu

- No state

- Could have multiple PFUs

- Fine grained parallelism

- Not effective for deep pipelines

• Garp

- Basic op is multi-cycle – gaconfig

- Effective for deep pipelining

- Single array

- Requires state swapping consideration

Page 24: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Common Theme

• To overcome instruction expression limits:

- Define new array instructions. This makes decode hardware slower and more complicated.

- Many bits of configuration mean long swap time. This is an issue -> recall tips for dynamic reconfiguration.

• Give array configuration short “name” which processor can call out.

• Store multiple configurations in array. Access as needed (DPGA)
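The last two bullets (short names for configurations, several configurations resident at once) can be sketched as a small Python model. Everything here is an assumption for illustration: the class name, the least-recently-used eviction policy, and the `load_cycles` cost are not from the slides.

```python
from collections import OrderedDict


class ContextStore:
    """Toy model of an array that keeps a few named configurations on chip."""

    def __init__(self, num_contexts, load_cycles):
        self.num_contexts = num_contexts
        self.load_cycles = load_cycles    # assumed cost of fetching one configuration
        self.resident = OrderedDict()     # name -> configuration bits, in LRU order
        self.cycles_spent_loading = 0

    def activate(self, name, fetch):
        """Switch to `name`; fetch it (slowly) only if it is not already resident."""
        if name in self.resident:
            self.resident.move_to_end(name)        # fast switch: already on chip
        else:
            if len(self.resident) >= self.num_contexts:
                self.resident.popitem(last=False)  # evict the least recently used
            self.resident[name] = fetch(name)
            self.cycles_spent_loading += self.load_cycles
        return self.resident[name]
```

The processor only ever issues the short name; the many configuration bits move only on a miss, so the swap cost is paid once per eviction rather than once per use.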

Page 25: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

ReMarc

• Miyamori/Olukotun – Stanford

• Array of “nano-processors”

- 16b, 32 instructions each

- VLIW-like instruction

• Coprocessor interface (similar to Garp)

- No direct array -> memory path

Page 26: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

ReMarc Architecture

• 8x8 array of nanoprocessors

• Reminiscent of DPGA except that processing element is ALU

Page 27: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Nanoprocessor Tile

• Each tile has own instruction RAM

• Communication with near-neighbor tiles

• Global sequencer specifies the nanoprocessor PC

• 16 bit output.
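A minimal Python sketch of the tile organization follows, assuming that a single global program counter indexes every tile's private instruction RAM each cycle and that each tile reads its left neighbor's previous output. The three opcodes ("add", "xor", "pass") and the one-dimensional row are simplifications invented for this sketch; they are not the ReMarc instruction set or its 8x8 topology.

```python
class Tile:
    """Toy nanoprocessor: a 16-bit datapath with its own small instruction RAM."""

    def __init__(self, instructions):
        assert len(instructions) <= 32    # slides: 32 instructions per nanoprocessor
        self.instructions = instructions
        self.out = 0                      # 16-bit output visible to neighbors

    def step(self, pc, left_in):
        op, operand = self.instructions[pc]
        if op == "add":
            value = left_in + operand
        elif op == "xor":
            value = left_in ^ operand
        else:                             # "pass"
            value = left_in
        return value & 0xFFFF


def run_row(tiles, pcs, row_input=0):
    """Drive every tile with the same program counter each cycle."""
    for pc in pcs:
        # Sample all neighbor outputs before any tile updates (synchronous step).
        left_values = [row_input] + [t.out for t in tiles[:-1]]
        new_outs = [t.step(pc, v) for t, v in zip(tiles, left_values)]
        for t, v in zip(tiles, new_outs):
            t.out = v
    return [t.out for t in tiles]
```

Since every tile holds different instructions at the same PC, one global sequence value selects a whole row of different operations at once, which is the VLIW-like behavior mentioned earlier.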

Page 28: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

ReMarc Results

• ReMarc 60X smaller than FPGA

• Performance comparable

Page 29: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Observation

• All coprocessors have been single-threaded

- Performance improvement limited by application parallelism

• Potential for task/thread parallelism

- DPGA

- Fast context switch

• Concurrent threads seen in discussion of IO/stream processor

• Added complexity needs to be addressed in software.
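One common way to quantify the first bullet is Amdahl's law. As a sketch, assume a fraction f of single-threaded execution time can be moved to the array and runs s times faster there (f and s are symbols introduced here, not from the slides):

```latex
S_{\text{overall}} = \frac{1}{(1 - f) + \dfrac{f}{s}} \;\le\; \frac{1}{1 - f}
```

For example, with f = 0.5 the overall speedup cannot exceed 2x no matter how fast the array is, which is why adding task- or thread-level parallelism is attractive.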

Page 30: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Scalability?

• Can scale….

- Number of inactive contexts.

- Similar to cache model

- Number of PFUs in PRISC/Chimaera

– Still limited by single execution thread.

– Exacerbate pressure/complexity of reconfigurable logic/interconnect

• Cannot scale?

- Amount of active resources.

- Perhaps take coarser-grain focus to parallel processing.

Page 31: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Parallel Computation: Processor and FPGA

• What would it take to let the processor and FPGA run in parallel?

Modern Processors

Deal with:

• Variable data delays

• Dependencies with data

• Multiple heterogeneous functional units

Via:

• Register scoreboarding

• Runtime data flow (Tomasulo)

Page 32: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

OneChip -> Toronto

• Allow array to have more memory-memory operations

• Want to fit into programming model/ISA without forcing exclusive processor/FPGA operation.

• Also allow decoupled processor/array execution.

• Allow interlocking of data in special “scoreboard” area.

Page 33: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

OneChip Innovations

• FPGA operates on certain memory regions only

• Makes regions explicit to processor issue.

• Scoreboard memory blocks

[Figure: memory map with address boundaries 0x0, 0x1000, and 0x10000; regions labeled FPGA and Proc]

Indicates usage of data pages, like a virtual memory system!
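The memory-block scoreboard idea can be sketched in Python as follows. This is a toy model under stated assumptions, not the OneChip implementation: the class and method names are hypothetical, the 0x1000 block size is borrowed loosely from the addresses in the figure, and the hazard rules (loads wait on blocks being written, stores wait on blocks being read or written) are the usual data-hazard rules rather than anything the slides spell out.

```python
class MemoryScoreboard:
    """Toy block-level scoreboard shared by a processor and an FPGA unit."""

    def __init__(self, block_size=0x1000):
        self.block_size = block_size
        self.busy = {}  # block number -> "read" or "write" claim held by the FPGA

    def _blocks(self, start, length):
        first = start // self.block_size
        last = (start + length - 1) // self.block_size
        return range(first, last + 1)

    def fpga_begin(self, src, dst, length):
        """Claim source blocks for reading and destination blocks for writing."""
        for b in self._blocks(src, length):
            self.busy.setdefault(b, "read")
        for b in self._blocks(dst, length):
            self.busy[b] = "write"

    def fpga_done(self):
        """Release every claim once the memory-to-memory operation completes."""
        self.busy.clear()

    def processor_may_load(self, addr):
        """Loads must wait only for blocks the FPGA is still writing."""
        return self.busy.get(addr // self.block_size) != "write"

    def processor_may_store(self, addr):
        """Stores must wait for any block the FPGA is reading or writing."""
        return addr // self.block_size not in self.busy
```

Accesses to unclaimed blocks proceed immediately, so the processor and the FPGA can run in parallel and only serialize when they actually touch the same region.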

Page 34: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

OneChip

• Basic op is FPGA: Mem -> Mem

• No state between ops

• Ops must appear sequential

• Could have multiple/parallel FPGA compute units

- Scoreboard between all

• Multiprocessing?

Page 35: ECE 697F Reconfigurable Computing Lecture 19 Reconfigurable Coprocessors

Summary

• Several different models and uses for “reconfigurable processor”

• Some move towards parallel computing. Others towards single processors

• Exploit density and expressiveness of fine-grained, parallel operations.

• Number of ways to integrate. Need to work around limitations.