February 12, 1998 Aman Sareen

DPGA-Coupled Microprocessors

Commodity ICs for the Early 21st Century

by

Aman Sareen

School of Electrical Engineering and Computer Science

Ohio University

What's going to be covered?

Part 1
• Technology Trends
• Application Outlook
• Some Developed Reconfigurable Engines
• Applications of Reconfigurable Logic
• Common Objectives of Reconfigurable Devices
• Limitations of the Current Systems

Part 2
• Unified Computational Array Model
  * FPGA
  * SIMD Arrays
  * Hybrid Arrays
• DPGA
  * Applications
  * Benefits
• DPGA Prototype
  * Highlights
  * Architecture
  * Implementation

Part 3
• DPGA-Coupled Processor Applications
• Costs and Benefits of Reconfiguration
• Challenges
• Conclusion

Technology Trends

What is going on in the industry?
• Operational performance of microprocessors is increasing by 60% each year.
• The number of transistors on a single chip is increasing by 25% each year.
• An estimated 12 million transistors will fit on a single chip by the end of the century.

Disadvantages?
• High performance is not always achieved.
• Not cost-effective.
• Risk of overspecialization.
• Reduced volume utilization per design investment.

So what do we do? => Reconfigurable Design

What does it do?
• Accelerates applications.
• Implements system-specific functions.

Application Outlook

There is always scope for additions and modifications.

So what do we do? => Reconfigurable Design

What does it do?
• It allows applications to specialize the hardware.

Some Developed Reconfigurable Engines

PRISM (Processor Reconfiguration through Instruction-Set Metamorphosis), built by Athanas and Silverman.
* Couples a programmable element with a microprocessor.
* Each application synthesizes new processor instructions for acceleration.

CM-2, built at the Supercomputing Research Center by Cuccaro and Reese.
* The processor is augmented with reconfigurable logic to perform common operations.

SPLASH, built at the Supercomputing Research Center.
* Used in genome sequence matching.

Applications of Reconfigurable Logic

• Binary operations.
• Arithmetic.
• Encryption/decryption/compression.
• Sequence and string matching.
• Sorting.
• Physical system simulation.
• Video and image processing.

Common Objectives in Reconfigurable Applications

• High performance.
• Clear potential for application acceleration.
• Exploring bit-level parallel computation.
• High performance through parallelism.
• Customize data paths.

Limitations of the Current Systems

Low-Bandwidth, High-Latency Interface
* Expected acceleration not achievable.
* Prevents close cooperation between fixed and reconfigurable logic circuits.
* Expensive.
* Limits throughput.

High Reconfiguration Overhead
* A single configuration must be maintained throughout an application.
* Multitasking/time sharing not possible.

Unified Computational Array Model

[Figure: Computational Block of an Array Element (AE). The array element's computational unit takes inputs from local state or from other array elements, is controlled by an instruction, and sends outputs to local state or to other array elements.]

Unified Computational Array Model

Lookup Models for AE Computational Unit

[Figure: two lookup-table (memory) views of the AE computational unit. In the first, the inputs from local state or from other array elements, together with the instruction, form the address inputs of the lookup table. In the second, only the data inputs form the address inputs and the instruction is the memory programming (Instruction = Memory Programming). In both views the data outputs go to local state or to other array elements.]
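The "Instruction = Memory Programming" view can be illustrated in a few lines of Python. This is only a sketch, assuming a k-input, single-output table; the names `make_lut` and `lut_eval` are hypothetical and do not come from the slides.

```python
from itertools import product

def make_lut(func, k):
    """Program a k-input lookup table: store func's output bit for every input combination."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=k)]

def lut_eval(table, inputs):
    """Evaluate the table: the input bits form the memory address (first input is the MSB)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]

# The same "hardware" (lut_eval) becomes XOR or AND depending only on the
# contents written into the table, i.e. on the memory programming.
xor_lut = make_lut(lambda a, b: a ^ b, 2)
and_lut = make_lut(lambda a, b: a & b, 2)
```

Calling `lut_eval(xor_lut, (1, 0))` reads address 2 of the XOR table, while the same call on `and_lut` reads address 2 of the AND table; nothing but the stored bits differs between the two functions.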

Unified Computational Array Model

Instruction Distribution

Ideally, a different instruction for each AE on each computational cycle.

Drawback: Instruction distribution resource requirement increases. Instruction bandwidth becomes unmanageable.

IBW = (P * log2(Nf)) / tcycle

Example: P = 100, Nf = 64, operating frequency = 10 MHz

IBW = 6 Gbits/sec
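The 6 Gbits/sec figure can be checked with a short calculation. The sketch below assumes that P is the number of array elements, that Nf is the number of distinct functions each element can select (so each instruction needs log2(Nf) bits), and that tcycle is the reciprocal of the operating frequency; the slides do not define these symbols, and the function name is hypothetical.

```python
import math

def instruction_bandwidth(p, nf, freq_hz):
    """IBW = P * log2(Nf) / tcycle, with tcycle = 1 / freq_hz. Result in bits per second."""
    bits_per_instruction = math.log2(nf)  # assumed meaning of log2(Nf)
    return p * bits_per_instruction * freq_hz

# Values from the slide: P = 100, Nf = 64, 10 MHz.
ibw = instruction_bandwidth(p=100, nf=64, freq_hz=10e6)
print(f"{ibw / 1e9:.1f} Gbit/s")
```

With these inputs each instruction is 6 bits wide, so 100 elements at 10 MHz need 100 * 6 * 10^7 = 6 * 10^9 bits per second.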

Unified Computational Array Model

Weakening Instruction Distribution

FPGA
• Instruction / AE
• Uniform in time
• Slow programming phase

SIMD Array
• Instruction / cycle
• Uniform in space

[Figure: the array element computational unit, with inputs from local state or from other array elements and outputs to local state or to other array elements, shown under each model. SIMD array: global instruction (common to all elements in the array). FPGA: static instruction (distinct for each array element, effectively constant during operation).]

FPGA vs. SIMD Computation

FPGA
• Fixed function in time
• Spatially varying computation
• Bit-parallel computation
• Build computation spatially
* Low latency

SIMD Array
• Operation varies in time
• Homogeneous computation in space
• Bit-serial computation
• Build computation in time
* High throughput on homogeneous data
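The contrast between bit-parallel and bit-serial computation can be made concrete with addition. The following is an illustrative software sketch, not a model of any particular device; both function names are hypothetical.

```python
def bit_parallel_add(a, b, width):
    """Spatial style: every bit position has its own logic, so the sum appears in one step."""
    return (a + b) & ((1 << width) - 1)

def bit_serial_add(a, b, width):
    """Temporal style: one full-adder cell is reused for `width` cycles, with the carry held as local state."""
    carry = 0
    result = 0
    for cycle in range(width):
        x = (a >> cycle) & 1
        y = (b >> cycle) & 1
        s = x ^ y ^ carry                      # sum bit for this cycle
        carry = (x & y) | (carry & (x ^ y))    # carry into the next cycle
        result |= s << cycle
    return result, width                       # sum and number of cycles used
```

Both functions produce the same `width`-bit sum. The bit-parallel version spends logic on every bit position to finish in one step (low latency), while the bit-serial version needs only one adder cell per operand pair but takes `width` cycles, which suits arrays that process many homogeneous data items at once.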

Dynamically Programmable Gate Arrays

Hybrid Model

Multiple-Context FPGA
• Broadcast a context identifier
• Indirect instruction lookup

Features:
• Rapid context switch
• Exploits local, on-chip bandwidth
• Spatially and temporally varying computation
• High logic density
• Reuse gates and wires in time

Dynamically Programmable Gate Arrays

Configurable Instruction-Store View of DPGA AE

[Figure: DPGA array element. A global context identifier (common to all elements) drives the address inputs of the instruction store (a lookup table), whose programming may differ for each array element. The instruction store's data outputs supply the instruction that configures the function of the computational unit (also a lookup table). The computational unit's address inputs come from local state or from other array elements, and its data outputs go to local state or to other array elements.]
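The instruction-store view can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each context is represented directly as the contents of a 2-input lookup table, and the class name, the `step` helper and the example programming are all hypothetical.

```python
class DPGAArrayElement:
    """One array element: a local instruction store feeding a lookup-table computational unit."""

    def __init__(self, contexts):
        # contexts[c] is the lookup-table programming (list of output bits) used in context c.
        self.instruction_store = list(contexts)

    def evaluate(self, context_id, inputs):
        # Indirect instruction lookup: the broadcast context ID selects locally stored table contents.
        table = self.instruction_store[context_id]
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return table[address]

AND, OR, XOR = [0, 0, 0, 1], [0, 1, 1, 1], [0, 1, 1, 0]

# Two elements with different programming; only the small context ID is global.
array = [DPGAArrayElement([AND, XOR]), DPGAArrayElement([OR, AND])]

def step(array, context_id, inputs):
    """Apply the same inputs to every element under one broadcast context ID."""
    return [ae.evaluate(context_id, inputs) for ae in array]
```

Switching `context_id` from 0 to 1 changes the function of every element in the same cycle, yet each element ends up with its own function, so the computation varies both across the array and over time without distributing a full instruction to each element.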

Dynamically Programmable Gate Arrays

Applications

Rapid Context Switch FPGA

Time-Slice Computation

Temporal Pipelining

Operation Cache

Processor Assistance

Multi-Stream SIMD

Boundary Condition handling

Virtual Cells

DPGA Prototype - Highlights

4 on-chip configuration contexts

DRAM configuration cells

Automatic refresh of dynamic memory elements

Non-intrusive background loading

Wide bus architecture for high-speed context loading

Two-level routing architecture

DPGA Prototype - Overview

DPGA Prototype - Context Memory

DPGA Prototype - Array Element

DPGA Prototype - Local Interconnect

DPGA Prototype - Subarray Interconnect

DPGA Prototype - Areas

3-metal CMOS process: 1 µm drawn, 0.85 µm effective

DPGA Prototype - Area Percentages

DPGA Prototype - Estimated Timings

tcycle = tmem + nl * tlut + nx * txbar
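The timing estimate is a simple sum and can be evaluated as below. The sketch assumes that tmem is the context memory read time, that nl and nx count the lookup-table and crossbar stages traversed in one cycle, and that tlut and txbar are the per-stage delays; the slide does not define these symbols, and the numbers used are placeholders rather than prototype measurements.

```python
def cycle_time(t_mem, n_l, t_lut, n_x, t_xbar):
    """tcycle = tmem + nl * tlut + nx * txbar (all times in the same unit)."""
    return t_mem + n_l * t_lut + n_x * t_xbar

# Placeholder values in nanoseconds, chosen only to exercise the formula;
# they are NOT figures from the DPGA prototype.
t = cycle_time(t_mem=2.0, n_l=1, t_lut=3.0, n_x=2, t_xbar=1.5)
print(f"tcycle = {t} ns")
```

Under this reading of the formula, the context memory read is paid once per cycle, while the lookup-table and crossbar terms grow with the number of stages evaluated in that cycle.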

DPGA-Coupled Processor Applications

General-Purpose Workstations and Personal Computers.

Special-Purpose Computing Machines.

Embedded Systems.

Multiprocessor Systems

Costs and Benefits of Reconfiguration

Specialized design limits range of application.

Moving exception handling into reconfigurable logic.
* Feature interaction.
* Migrating critical control of fixed resources to reconfigurable logic.

Challenges

Interfacing the processor with the reconfigurable logic.

Grain Size.

Area and Pin allocation.

Multitasking and state interaction.

Conclusion

• The prototype demonstrates that efficient DPGAs can be implemented.
• DPGAs allow computation to vary both spatially and temporally.
• DPGAs require no additional bandwidth.
• Both bit-parallel and bit-serial computation in a single array structure.
• Higher performance.
• Higher flexibility.
• Lower part count.
• Microprocessors with tightly integrated, rapidly reconfigurable logic promise to be a prime commodity building block.
