
Wireless Communication Extensions for DSPs and General Purpose Processors

Sridhar Rajagopal

COMP 625

April 17, 2000


April 17,2000 Sridhar Rajagopal 2

Motivation

Wireless, the next wave after Multimedia

Highly Compute-Intensive Algorithms

Real-Time Requirements

Design based on Time-to-Market


Outline

Processor Core with Reconfigurable Support

Permutation Based Interleaved Memory

Processor Architecture - EPIC

Instruction Set Extensions

Truncated Multipliers

Software Support Needed


Characteristics of Wireless Algorithms

Massive Parallelism

Bit-level Computations

Matrix Based Operations

Memory Intensive

Complex-valued Data

Approximate Computations


What’s wrong with Current Architectures for these applications?


Problems with Current Architectures

UltraSPARC, C6x, MMX, IA-64:

Not Enough MIPS/FLOPS

Unable to Fully Exploit Parallelism

Poor Support for Bit-Level Computations

Memory Bottlenecks

No Specialized Instructions for Wireless Communications


Why Reconfigurable

Adapt Algorithms to Environment

Seamless and Continuous Data Processing during Handoffs

Figure: handoffs between a home-area wireless LAN, a high-speed office wireless LAN, and an outdoor CDMA cellular network.


Reconfigurable Support

Figure: protocol stack.

User Interface, Translation, Synchronization, Transport, Network: OSI Layers 3-7

Data Link Layer (converts frames to bits): OSI Layer 2

Physical Layer (hardware; raw bit stream): OSI Layer 1


Different Protocols

Figure: processing chain with source coding, channel coding, the channel, channel estimation, multiuser detection, channel decoding, and source decoding.

MPEG-4, H.723 - Voice, Multimedia

Convolutional, Turbo - Channel Coding


A New Architecture

Figure: proposed architecture, with a processor core (GPP/DSP) and cache, queues (Q), a crossbar, main memory, reconfigurable logic handling real-time I/O of the bit stream from the RF unit, and an add-on PCMCIA network interface card.


Why Reconfigurable

Process Initial Bit-Level Computations

Optimize for Fast I/O Transfer

Figure: reconfigurable logic performing real-time I/O on the bit stream from the RF unit.


Reconfigurable Support

Figure: GARP architecture at UC Berkeley, with configuration caches, control blocks, and a sequencer; two 64-bit data buses and one 64-bit address bus.

Boolean values, 64-bit datapath, fast I/O


Reconfigurable Support

Wide Path to Memory

– Data Transfer

– Minimize Load Times

Configuration Caches

– Reload Recently Displaced Configurations (5 cycles)

– Can Hold 4 Full-Size Configurations

Independent Execution


Reconfigurable Support

Access to the Same Memory System as the Processor

– Minimize overhead

When idle

– Load Configurations

– Transfer Data


Operation

Load Configuration

– If already in the configuration cache, load time is minimal

Copy initial data with coprocessor move instructions

Start execution

Issue a wait that interlocks while the unit is active

Copy registers back at kernel completion
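The five-step sequence above can be modeled as a toy Python class. This is a minimal sketch, not GARP's actual interface: the class and method names are hypothetical, the configuration cache is assumed to be LRU with 4 slots (following the earlier slide), a cache hit is assumed to cost 5 cycles, and `MISS_CYCLES` is an arbitrary placeholder rather than a measured value.

```python
from collections import OrderedDict

CACHE_SLOTS = 4      # configuration cache holds 4 full-size configurations
HIT_CYCLES = 5       # assumed cost of reloading a cached configuration
MISS_CYCLES = 1000   # hypothetical cost of loading a configuration from memory


class ReconfigurableUnit:
    """Toy model of the load / copy-in / start / wait / copy-out sequence."""

    def __init__(self):
        self.config_cache = OrderedDict()  # name -> kernel, least recently used first
        self.registers = []
        self.load_cycles = 0

    def load_configuration(self, name, kernel):
        if name in self.config_cache:
            self.config_cache.move_to_end(name)        # hit: mark most recently used
            self.load_cycles += HIT_CYCLES
        else:
            if len(self.config_cache) == CACHE_SLOTS:
                self.config_cache.popitem(last=False)  # evict least recently used
            self.config_cache[name] = kernel
            self.load_cycles += MISS_CYCLES

    def run(self, name, kernel, data):
        self.load_configuration(name, kernel)          # 1. load configuration
        self.registers = list(data)                    # 2. copy initial data in
        # 3-4. start execution; this synchronous call stands in for the
        # interlocking wait that blocks the processor until the kernel is done
        self.registers = list(self.config_cache[name](self.registers))
        return list(self.registers)                    # 5. copy registers back
```

Calling `run` twice with the same configuration name pays `MISS_CYCLES` once and `HIT_CYCLES` after that, which is the benefit of keeping recently used configurations close to the array.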


Memory Interface

Access to Main Memory and L1 Data Cache

– Large, Fast Memory Store

Memory Prefetch Queues for Sequential Accesses

– Read-Aheads and Write-Behinds

Figure: memory interface, with the processor core (GPP/DSP), instruction cache, L1 data cache, queues (Q), crossbar, main memory, and FPGA.
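A read-ahead queue only pays off when accesses are sequential, and a write-behind queue lets the producer continue without waiting for memory. The sketch below is an illustrative Python model under assumed parameters (a queue depth of 4 and a plain list standing in for main memory); the class names are hypothetical, and the model counts hits and misses rather than cycles.

```python
from collections import deque


class ReadAheadQueue:
    """Prefetch queue that reads ahead of a sequential access stream."""

    def __init__(self, memory, depth=4):
        self.memory = memory      # backing store: a list standing in for main memory
        self.depth = depth        # assumed queue depth
        self.queue = deque()      # (address, value) pairs fetched ahead of use
        self.next_addr = 0        # next address to read ahead
        self.hits = 0
        self.misses = 0

    def _fill(self):
        while len(self.queue) < self.depth and self.next_addr < len(self.memory):
            self.queue.append((self.next_addr, self.memory[self.next_addr]))
            self.next_addr += 1

    def read(self, addr):
        if self.queue and self.queue[0][0] == addr:
            self.hits += 1                    # value was fetched ahead of time
            _, value = self.queue.popleft()
        else:
            self.misses += 1                  # stream broken: restart read-ahead here
            self.queue.clear()
            value = self.memory[addr]
            self.next_addr = addr + 1
        self._fill()
        return value


class WriteBehindQueue:
    """Buffers writes so the producer continues while memory catches up."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = deque()

    def write(self, addr, value):
        self.pending.append((addr, value))    # returns immediately

    def drain(self):
        while self.pending:
            addr, value = self.pending.popleft()
            self.memory[addr] = value
```

Reading addresses 0 through 15 in order costs one miss to start the stream; every later read is already waiting in the queue.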


Permutation Based Interleaved Memory (PBI)

High Memory Bandwidth Needed

Stride-Insensitive Memory System for Matrices

Multiple Banks

Sustained Throughput Near Peak (95%)

Figure: L1 data cache and main memory.


PBI Scheme

N: address length (bits)

$M = 2^n$ banks

$2^{N-n}$ words in each bank

To access a word:

– n-bit bank number

– (N-n)-bit address within the bank (the high-order address bits)

Calculation of the n-bit Bank Number


Calculate Bank Number

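Use all N address bits X to get the n-bit bank number Y, with arithmetic over GF(2) (addition is XOR):

$$Y = A X, \qquad A \in \{0,1\}^{n \times N}$$

Split X into its high-order $N-n$ bits $X_h$ and low-order $n$ bits $X_l$, and A into $A_h$ ($n \times (N-n)$) and $A_l$ ($n \times n$, rank $n$):

$$Y = A_h X_h + A_l X_l$$

Since $A_l$ has rank n, for each in-bank address $X_h$ the map from $X_l$ to Y is one-to-one, so no two words share both a bank and an in-bank address.

Each bit of Y is computed by an N-bit parity circuit with $\lceil \log_k N \rceil$ levels of XOR gates (fan-in k).

Figure: one parity circuit per row of A (rows 0 through n-1) takes the N-bit address; the n parity bit signals feed a decoder that produces the $2^n$ bank select signals.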
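The bank-number calculation can be sketched in Python for a small case. The parameters are assumptions chosen for illustration: N = 8 address bits, n = 2 (four banks), and a hypothetical matrix A whose row 0 XORs the even address bits and row 1 the odd ones, so that $A_l$ is the 2 × 2 identity and has rank n. Each bank bit is the parity of the address bits selected by one row of A, mirroring one parity circuit per row.

```python
N = 8                 # address length in bits (small, for illustration)
n = 2                 # M = 2**n = 4 banks
# Rows of the n x N matrix A, written as bit masks over the address.
# Hypothetical choice: row 0 XORs the even address bits, row 1 the odd ones.
# The low n columns (A_l) form the identity matrix, so A_l has rank n.
A_ROWS = [0b01010101, 0b10101010]


def parity(x):
    """XOR of all bits of x (what one parity circuit computes)."""
    return bin(x).count("1") & 1


def pbi_bank(addr):
    """n-bit bank number Y = A X over GF(2): one parity bit per row of A."""
    return sum(parity(addr & row) << i for i, row in enumerate(A_ROWS))


def pbi_location(addr):
    """(bank, address within bank): the in-bank address is the high N-n bits."""
    return pbi_bank(addr), addr >> n


def low_order_bank(addr):
    """Conventional low-order interleaving: bank = low n address bits."""
    return addr & ((1 << n) - 1)


def xor_tree_levels(bits, fan_in):
    """Levels of fan_in-input XOR gates to reduce `bits` inputs to one: ceil(log_k N)."""
    levels, width = 0, 1
    while width < bits:
        width *= fan_in
        levels += 1
    return levels
```

For stride 4, low-order interleaving sends addresses 0, 4, 8, 12 all to bank 0, while this A spreads them over banks 0, 1, 2, 3; the same holds for every power-of-two stride from 1 to 64 within the 8-bit address space. The choice of A still matters: with this particular A, addresses 0 and 5 both map to bank 0, so stride 5 fares worse here than under low-order interleaving.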


Interleaved Memory Model

Figure: interleaved memory model, with an address source, input buffers, memory banks M(0), M(1), ..., M(M-1), output buffers, a data sequencer, and a data sink.
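The effect of bank conflicts on throughput can be estimated with a very small cycle model. This Python sketch makes simplifying assumptions that are not from the slides: one address is issued per cycle, a bank stays busy for a fixed number of cycles after accepting a request, and there are no input buffers, so a conflict stalls the address source. The PBI mapping uses a hypothetical 4-bank matrix A over 8-bit addresses (row 0 XORs the even address bits, row 1 the odd ones); all names are hypothetical.

```python
ROWS = [0b01010101, 0b10101010]     # hypothetical A: even bits -> Y0, odd bits -> Y1
NUM_BANKS = 1 << len(ROWS)          # M = 2^n = 4


def low_order_bank(addr):
    """Conventional interleaving: bank = low n address bits."""
    return addr % NUM_BANKS


def pbi_bank(addr):
    """XOR-based bank number: bit i is the parity of addr & ROWS[i]."""
    return sum((bin(addr & row).count("1") & 1) << i for i, row in enumerate(ROWS))


def cycles_to_issue(addresses, bank_of, bank_busy):
    """Cycles until all accesses are issued, at most one per cycle, in order.

    A bank that accepts a request stays busy for `bank_busy` cycles; with no
    input buffers, an access to a busy bank stalls the address source.
    """
    free_at = [0] * NUM_BANKS               # first cycle at which each bank is free
    cycle = 0
    for addr in addresses:
        bank = bank_of(addr)
        cycle = max(cycle, free_at[bank])   # stall until the bank is free
        free_at[bank] = cycle + bank_busy
        cycle += 1                          # next access can issue next cycle
    return cycle


stride4 = [4 * i for i in range(16)]
low = cycles_to_issue(stride4, low_order_bank, bank_busy=4)
pbi = cycles_to_issue(stride4, pbi_bank, bank_busy=4)
print(f"low-order: {len(stride4) / low:.2f} accesses/cycle, PBI: {len(stride4) / pbi:.2f}")
```

With a 4-cycle bank busy time, the 16 stride-4 accesses take 61 cycles under low-order interleaving because every access waits for bank 0, and 21 cycles under the XOR mapping. The XOR mapping still stalls occasionally, because one group of four accesses can revisit a bank that the previous group left busy; per-bank input buffers are one way to absorb such stalls.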


Processor Core

64-bit EPIC Architecture with Extensions (IA-64/C6x)

Statically Determined Parallelism; Exploit ILP

Execution Time Predictability

Figure: processor core (GPP/DSP) with cache, queues (Q), crossbar, and FPGA.


EPIC Principle

Explicitly Parallel Instruction Computing

Evolution of VLIW Computing

Compiler Plays a Key Role

Architecture to assist Compiler

Better cope with dynamic factors

– which limited VLIW Parallelism


Aspects of EPIC

Designing a Plan of Execution (POE) at Compile Time

Permitting the Compiler to Play the Statistics

– Conditional Branches, Memory References

Communicating the POE to the Hardware

– Static Scheduling

– Branch Information


Architecture Features in EPIC

Static Scheduling

– MultiOP

– Non-Unit Assumed Latency (NUAL)

The Branch Problem

– Predicated Execution

– Control Speculation

– Predicated Code Motion

The Memory Problem

– Cache Specifiers

– Data Speculation
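Predicated execution replaces a branch with instructions guarded by a predicate, turning a control dependence into a data dependence. Python has no predicate registers, so this sketch only imitates if-conversion with an arithmetic select on a 0/1 predicate; the function names are hypothetical, and on an EPIC machine the compiler would instead attach a predicate register to each guarded instruction.

```python
def clip_with_branch(xs, limit):
    """Branchy version: a conditional branch picks one path per element."""
    out = []
    for x in xs:
        if x > limit:
            out.append(limit)
        else:
            out.append(x)
    return out


def clip_if_converted(xs, limit):
    """If-converted version: the compare produces a predicate p, both candidate
    results exist, and p selects one, so control flow no longer depends on x."""
    out = []
    for x in xs:
        p = int(x > limit)                    # predicate: 1 if true, 0 if false
        out.append(p * limit + (1 - p) * x)   # select guarded by p and not-p
    return out
```

Both versions compute the same result; the if-converted loop body has a single path, which is what lets the compiler schedule it statically without betting on the branch outcome.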


Instruction Set Extensions

To Accelerate Bit-Level Computations in Wireless Algorithms

Real/Complex Integer - Bit Multiplications

– Used in Multiuser Detection, Decoding

Bit - Bit Multiplications

– Used in Outer Product Updates

– Correlation, Channel Estimation

Complex Integer-Integer Multiplications

Useful in other Signal Processing applications

– Speech, Video, ...


Architecture Support

Support via Instruction Set Extensions

Minimal ALU Modifications necessary

Transparent to Register Files/Memory

Additional 8-bit Special Purpose Registers


Integer - Bit Multiplications

Figure: 64-bit registers A and C, a row of +/- units, 64-bit register D, and 8-bit register b.

$D[i] = D[i] + b[j]\,C[j]$ (e.g., cross-correlation)

Register Renaming?
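Because a bit stands for +1 or -1, the integer-bit multiply reduces to a per-lane add or subtract. The Python sketch below assumes details the slide does not fix: the 64-bit registers hold eight signed 8-bit lanes, bit j of the 8-bit register b encodes +1 (set) or -1 (clear), lanes wrap around on overflow, and the update is read lane-wise as D[j] = D[j] + b[j]C[j]. The function names are hypothetical.

```python
LANES = 8
LANE_BITS = 8
MASK = (1 << LANE_BITS) - 1
SIGN = 1 << (LANE_BITS - 1)


def unpack(reg):
    """Split a 64-bit register into 8 signed 8-bit lanes (lane 0 = low byte)."""
    lanes = []
    for i in range(LANES):
        v = (reg >> (LANE_BITS * i)) & MASK
        lanes.append(v - (1 << LANE_BITS) if v & SIGN else v)
    return lanes


def pack(lanes):
    """Pack 8 lane values into a 64-bit register (two's complement, wraparound)."""
    reg = 0
    for i, v in enumerate(lanes):
        reg |= (v & MASK) << (LANE_BITS * i)
    return reg


def int_bit_mac(d_reg, c_reg, b):
    """D[j] = D[j] + b[j]*C[j] with b[j] in {+1, -1} taken from bit j of b.

    Multiplying by +/-1 needs no multiplier: each lane just adds or subtracts.
    """
    d, c = unpack(d_reg), unpack(c_reg)
    out = [dj + cj if (b >> j) & 1 else dj - cj
           for j, (dj, cj) in enumerate(zip(d, c))]
    return pack(out)
```

For example, with every lane of D equal to 10, C holding 1 through 8, and b = 0b10101010, the lanes become 9, 12, 7, 14, 5, 16, 3, 18.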


8-bit to 64-bit conversions

$D = D + b\,b^T$ (e.g., auto-correlation)

b1 = b(1:8), b(1:8), ..., b(1:8): the 8-bit vector b repeated 8 times

b2 = b(1), ..., b(1), ..., b(8), ..., b(8): each bit of b repeated 8 times

Figure: the 8-bit register b is expanded into the 64-bit register A in each of these two patterns.
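The two expansions can be written with shifts and masks. This Python sketch assumes b(1) is the least significant bit of the 8-bit register and that byte k of a 64-bit register holds copy k; the function names are hypothetical. With that layout, bit 8k + j of b1 holds b(j+1) and the same bit of b2 holds b(k+1), so the two registers line up every pair of bits needed for the outer product.

```python
def expand_b1(b):
    """b1 = b(1:8), b(1:8), ..., b(1:8): the 8-bit vector repeated 8 times."""
    return sum(b << (8 * k) for k in range(8))


def expand_b2(b):
    """b2 = b(1)...b(1), ..., b(8)...b(8): each bit of b repeated 8 times."""
    return sum(0xFF << (8 * k) for k in range(8) if (b >> k) & 1)
```

For b = 0b00000101, b1 is 0x0505050505050505 and b2 is 0x0000000000FF00FF.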


Bit-Bit Multiplications

$D = D + b\,b^T$ (e.g., auto-correlation)

64-bit Register A = b1, 64-bit Register B = b2

Ex-NOR: 64-bit Register C = b1*b2 (bit-bit multiplications)

b1  b2  b1*b2
0   0   1
0   1   0
1   0   0
1   1   1
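With a set bit encoding +1 and a clear bit encoding -1, multiplying two bits is an XNOR, which is what the truth table above shows. A Python sketch of the 64-bit operation, with hypothetical function names:

```python
MASK64 = (1 << 64) - 1


def bit_bit_multiply(reg_a, reg_b):
    """C = b1*b2: bitwise XNOR of two 64-bit registers."""
    return ~(reg_a ^ reg_b) & MASK64


def as_pm1(bit):
    """Assumed encoding: bit 1 -> +1, bit 0 -> -1."""
    return 1 if bit else -1
```

All 64 bit products come out of a single logic operation, with no multiplier involved.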


Increment/Decrement

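Figure: a row of +/- units adds or subtracts 1 on each lane of the 64-bit register D, each lane controlled by one bit of the 8-bit register b1*b2; the result is the 64-bit register D + b1*b2.

$D = D + b\,b^T$ (e.g., auto-correlation)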
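Putting the expansion, the XNOR, and the increment/decrement together gives the whole auto-correlation update. This Python sketch assumes b(1) is the least significant bit, a set bit encodes +1, and D is an 8 × 8 matrix stored row-major as 64 integers; the function names are hypothetical. For brevity it updates all 64 entries in one call, whereas the datapath on the slide handles eight lanes of D per instruction. A direct evaluation is included for comparison.

```python
def pm1(bit):
    """Assumed encoding: bit 1 -> +1, bit 0 -> -1."""
    return 1 if bit else -1


def outer_update_bitwise(D, b):
    """D = D + b b^T for an 8-bit vector b, using only bit operations and +/-1."""
    b1 = sum(b << (8 * k) for k in range(8))                     # b repeated 8 times
    b2 = sum(0xFF << (8 * k) for k in range(8) if (b >> k) & 1)  # each bit repeated 8 times
    prod = ~(b1 ^ b2) & ((1 << 64) - 1)                          # XNOR = bit-bit product
    # Increment/decrement: bit 8k+j of prod selects +1 or -1 for element (k, j).
    return [d + (1 if (prod >> i) & 1 else -1) for i, d in enumerate(D)]


def outer_update_reference(D, b):
    """Direct evaluation of D[k][j] += b_k * b_j with b_k in {+1, -1}."""
    s = [pm1((b >> k) & 1) for k in range(8)]
    return [D[8 * k + j] + s[k] * s[j] for k in range(8) for j in range(8)]
```

The diagonal entries always increase by 1, since each bit agrees with itself.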


Complex-valued Data Processing

Is it easy to add?

Is it worth additional ALU support?

Typically supported in software!
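In software, a complex multiply costs four real multiplies and two additions; a well-known rearrangement trades one multiply for three extra additions. Both are sketched below in Python with hypothetical function names; either is the kind of sequence a dedicated complex-multiply instruction would replace.

```python
def cmul4(ar, ai, br, bi):
    """(ar + j ai)(br + j bi) with 4 real multiplies and 2 adds/subtracts."""
    return ar * br - ai * bi, ar * bi + ai * br


def cmul3(ar, ai, br, bi):
    """Same product with 3 real multiplies and 5 adds/subtracts."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2
```

For (1 + 2j)(3 + 4j) both return (-5, 10). Whether the three-multiply form wins depends on the relative cost of multiplies and adds on the target core.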


Truncated Multipliers

Many Applications Need Approximate Computations

Adaptive Algorithms: $Y = Y + \mu\,(Y \cdot C)$

Truncate Lower Bits

Truncated Multipliers: Half the Area, Half the Delay

Can Do 2 Truncated Multiplies in Parallel with a Regular Multiplier

Figure: ALU multipliers (multiplier 1, multiplier 2, truncated multiplier).
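A truncated multiplier can be modeled by discarding partial-product bits before accumulation. This Python sketch assumes unsigned n-bit operands, keeps only the upper n bits of the product, and drops every partial-product bit in the low n columns without any correction term (practical designs often add one to reduce the bias). The function names are hypothetical.

```python
def exact_high(a, b, n):
    """Upper n bits of the exact 2n-bit product of two n-bit unsigned operands."""
    return (a * b) >> n


def truncated_high(a, b, n):
    """Truncated multiplier: drop every partial-product bit in columns 0..n-1.

    Partial product j is a << j; its bits sit in columns i + j. Masking the
    low n columns of each partial product removes them before accumulation.
    """
    low_mask = (1 << n) - 1
    acc = 0
    for j in range(n):
        if (b >> j) & 1:
            acc += (a << j) & ~low_mask   # keep only columns >= n
    return acc >> n
```

Column c < n holds c + 1 partial-product bits, so n(n+1)/2 of the n^2 bits are dropped (36 of 64 for n = 8), in line with the roughly halved area on the slide. The discarded bits sum to at most (n-1)2^n + 1, so the truncated result never exceeds the exact upper half and falls short of it by at most n - 1.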


Software Support

Greater Interaction between Compilers and Architectures

– EPIC

– Reconfigurable Logic

Compiler needs to find and exploit bit level computations

Reconfigurable Logic Programming


Area Estimates

Area increases by 20% over an IA-64-sized architecture due to reconfigurable support

Instruction set extensions need minimal hardware support

Parallel interleaved memory banks will need a larger area


Other Uses

Reconfigurable Logic

– For accelerating loops of general purpose processors

Bit Level Support

– For other voice, video and multimedia applications


Conclusions

Processor Core with Reconfigurable Support developed for Wireless Applications

Instruction Set Extensions added for accelerating performance of the algorithms

Integration of Wireless Appliances with General Purpose Processors

Expected Great Impact on Performance of Wireless Algorithms


Future Work

Simulations for finding performance improvements

Other Processor Architectures

– Bit Slice Architectures

– Out-of-Order


References

The GARP Architecture and C Compiler

– T. J. Callahan, J. R. Hauser, J. Wawrzynek, IEEE Computer, April 2000, pp. 62-69

http://brass.cs.berkeley.edu

EPIC: Explicitly Parallel Instruction Computing

– M. S. Schlansker, B. R. Rau, IEEE Computer, Feb. 2000, pp. 37-45

High-Bandwidth Interleaved Memories for Vector Processors - A Simulation Study

– G. S. Sohi, IEEE Transactions on Computers, Vol. 42, No. 1, Jan. 1993, pp. 34-44


Acknowledgements

Vijay Pai, Partha Ranganathan, Joseph Cavallaro