

Page 1: RICE UNIVERSITY “Joint” architecture & algorithm designs for baseband signal processing Sridhar Rajagopal and Joseph R. Cavallaro Rice Center for Multimedia

RICE UNIVERSITY

“Joint” architecture & algorithm designs for baseband signal processing

Sridhar Rajagopal and Joseph R. Cavallaro, Rice Center for Multimedia Communications

This work has been supported by Nokia, TI, TATP and NSF

Page 2

Single-slide version of my talk

Algorithms

DSP

VLSI

FPGA

IMAGINE

Multiuser channel estimation, Multiuser detection

Task-partitioning, Parallelism, Pipelining

Conventional arithmetic, On-line arithmetic

Instruction set extensions, Co-processor support

Functional unit design and usage

Distant Past

Recent Past

Recent and Near Future

Page 3

Contents

Algorithms for channel estimation and detection

Conventional and on-line arithmetic designs

Programmable architecture design using the IMAGINE simulator

Page 4

Estimation - detection algorithms?

Sophisticated, computationally complex algorithms have been proposed for 3G and 4G standards

They typically require complex operations, large matrix sizes, and matrix inversions

They are difficult to implement in hardware and to run in real time

Page 5

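Multiuser channel estimation algorithm

$$\mathbf{R}_{br} = \sum_i \mathbf{r}_i \mathbf{b}_i^H, \qquad \mathbf{R}_{bb} = \sum_i \mathbf{b}_i \mathbf{b}_i^T, \qquad \mathbf{R}_{br} = \mathbf{A}\,\mathbf{R}_{bb}$$

$\mathbf{b}_i \in \{+1, -1\}^{2K}$: training/tracking bits

$\mathbf{r}_i \in \mathbb{C}^N$: received signal (8-bit complex integers)

N = spreading gain (typically fixed, e.g. 32)

K = number of users (variable, K ≤ N)

$\mathbf{A}$ = maximum likelihood channel estimate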
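As a rough illustration of the estimate on this slide, the NumPy sketch below forms the correlations over a block of training vectors and solves A R_bb = R_br for A. It is a floating-point sketch under assumptions made here, not the authors' design: noiseless data, each b_i and r_i stored as a row, A of shape N × 2K with r_i = A b_i. The function name `ml_channel_estimate` is hypothetical.

```python
import numpy as np


def ml_channel_estimate(bits, received):
    """Batch channel estimate from a window of training vectors (sketch).

    bits:     (L, 2K) array of +/-1 training bits, one b_i per row
    received: (L, N) complex array, one received vector r_i per row
    Returns A with shape (N, 2K) solving A @ R_bb = R_br.
    """
    b = np.asarray(bits, dtype=float)
    r = np.asarray(received, dtype=complex)
    R_bb = b.T @ b  # sum_i b_i b_i^T, shape (2K, 2K)
    R_br = r.T @ b  # sum_i r_i b_i^H (bits are real), shape (N, 2K)
    # A R_bb = R_br  <=>  R_bb^T A^T = R_br^T, and R_bb is symmetric
    return np.linalg.solve(R_bb, R_br.T).T
```

With enough training vectors R_bb is invertible, and in the noiseless case the solve returns the channel exactly; with noise it gives the least-squares fit over the window.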

Page 6

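Iterative scheme for channel estimation

Bit-streaming: suitable for tracking (window length L)

Method of gradient descent

Stable convergence behavior

Simple fixed-point VLSI architecture [ASAP 2000]

$$\mathbf{R}_{bb}^{(i)} = \mathbf{R}_{bb}^{(i-1)} + \mathbf{b}_L \mathbf{b}_L^T - \mathbf{b}_0 \mathbf{b}_0^T$$

$$\mathbf{R}_{br}^{(i)} = \mathbf{R}_{br}^{(i-1)} + \mathbf{r}_L \mathbf{b}_L^H - \mathbf{r}_0 \mathbf{b}_0^H$$

$$\mathbf{A}^{(i)} = \mathbf{A}^{(i-1)} - \mu \left(\mathbf{A}^{(i-1)} \mathbf{R}_{bb}^{(i)} - \mathbf{R}_{br}^{(i)}\right)$$

where $\mu$ is the gradient-descent step size.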
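The sliding-window tracker can be sketched in a few lines. Each new bit vector is added to the correlations and the oldest one is removed, and then a single gradient step moves A toward the solution of A R_bb = R_br. This is a floating-point sketch that assumes noiseless data, real ±1 bits (so b^H = b^T), and a step size `mu` chosen by the caller. `sliding_gradient_estimate` is an illustrative name and is not the ASAP 2000 architecture.

```python
import numpy as np


def sliding_gradient_estimate(bits, received, window, mu):
    """Track the channel with a sliding window and gradient descent (sketch).

    bits: (T, 2K) +/-1 array; received: (T, N) complex array.
    Each step adds the newest (b_L, r_L) to the correlations, drops the
    oldest (b_0, r_0), then takes one gradient step.
    Returns the final estimate A with shape (N, 2K).
    """
    b = np.asarray(bits, dtype=float)
    r = np.asarray(received, dtype=complex)
    # correlations over the first `window` vectors
    R_bb = b[:window].T @ b[:window]
    R_br = r[:window].T @ b[:window]
    A = np.zeros((r.shape[1], b.shape[1]), dtype=complex)
    for t in range(window, len(b)):
        old = t - window  # index of the vector leaving the window
        R_bb += np.outer(b[t], b[t]) - np.outer(b[old], b[old])
        R_br += np.outer(r[t], b[t]) - np.outer(r[old], b[old])
        # the gradient of sum_l ||r_l - A b_l||^2 over the window
        # is proportional to A R_bb - R_br
        A -= mu * (A @ R_bb - R_br)
    return A
```

The iteration converges only if `mu` stays below 2/λ_max(R_bb). For random bits λ_max is of the order of the window length, which is why a value such as 1/(2·window) is a reasonable starting point.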

Page 7

Comparisons

Implementation | Clock rate | Full adder cells | Data rate
C67 DSP | 166 MHz | - | 1.02 Kbps
Area | 500 MHz | 248 | 3.81 Kbps
⋮ | ⋮ | ⋮ | ⋮
Area-Time | 500 MHz | 10^4 | 256 Kbps
⋮ | ⋮ | ⋮ | ⋮
Time | 500 MHz | 2 x 10^7 | 83.33 Mbps

DSPs are unable to exploit bit-level parallelism

Bits are stored inefficiently

Multiplications by the ±1 bits are replaced by additions/subtractions
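Every training bit is +1 or −1. As a result, each entry of an outer product such as r_L b_L^T is either an entry of r_L or its negation, and the correlation updates need no multiplier. The pure-Python sketch below illustrates this idea and assumes integer or complex-integer samples. `rank1_update_no_mult` is a hypothetical helper and does not describe the VLSI datapath.

```python
def rank1_update_no_mult(R, r, b, sign):
    """Add (sign=+1) or subtract (sign=-1) the outer product r b^T into R.

    Every b[k] is +1 or -1, so r[n] * b[k] is either r[n] or -r[n]:
    each entry is updated with one addition or one subtraction and
    no multiplier. R is a list of lists, modified in place.
    """
    for n, x in enumerate(r):
        for k, bit in enumerate(b):
            if (bit > 0) == (sign > 0):
                R[n][k] += x  # signed product is +x
            else:
                R[n][k] -= x  # signed product is -x
```

Calling it with sign=+1 for the newest bit vector and sign=-1 for the oldest reproduces the window update of R_br.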

Page 8

Multiuser detection innovations

Developed a simple architecture for asynchronous multiuser detection for CDMA [ + , x ]

Bit-streaming: reduced latency, eliminates window-edge computations, lower memory requirements

Pipelined stages: higher throughput (with more hardware)

Page 9

Block Pipelined Detector

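Variable latency [worst case (1st bit): D × latency per bit]

2 extra edge-bit computations per stage

[Figure: timing diagram of the block pipelined detector along the time axis. A matched filter (MF) over bits 1-12 is followed by three PIC stages over the same bits, producing bits 2-11; the next block's MF and three PIC stages cover bits 11-22, producing bits 12-21.]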

Page 10

Bit-streaming multiuser detection

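$$\hat{\mathbf{d}} = \begin{bmatrix} \hat{\mathbf{d}}_1 \\ \vdots \\ \hat{\mathbf{d}}_{i-1} \\ \hat{\mathbf{d}}_i \\ \hat{\mathbf{d}}_{i+1} \\ \vdots \\ \hat{\mathbf{d}}_D \end{bmatrix}$$

$$\mathbf{y}_i^{m} = \mathbf{y}_i - \mathbf{L}\,\hat{\mathbf{d}}_{i-1}^{m-1} - \mathbf{C}\,\hat{\mathbf{d}}_{i}^{m-1} - \mathbf{R}\,\hat{\mathbf{d}}_{i+1}^{m-1}$$

$\mathbf{L}, \mathbf{C}, \mathbf{R} \in \mathbb{R}^{K \times K}$, $\mathbf{y}_i \in \mathbb{R}^K$

Savings in memory by a factor of order $D^2$: only the K × K blocks L, C, R are stored, instead of the full block-tridiagonal matrix over the D bits:

$$\begin{bmatrix} \mathbf{C} & \mathbf{R} & & \\ \mathbf{L} & \mathbf{C} & \mathbf{R} & \\ & \ddots & \ddots & \ddots \\ & & \mathbf{L} & \mathbf{C} \end{bmatrix}$$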
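The memory argument can be checked numerically. Applying the K × K blocks L, C, R bit by bit, as y_i = L d_{i-1} + C d_i + R d_{i+1}, gives the same result as multiplying the stacked d by the full DK × DK block-tridiagonal matrix. The NumPy sketch below assumes real-valued blocks and treats missing neighbours at the window edges as zero. The names `block_tridiagonal` and `streaming_apply` are illustrative. The sketch checks only this identity and does not implement the multistage detector.

```python
import numpy as np


def block_tridiagonal(Lm, Cm, Rm, D):
    """Full (D*K) x (D*K) matrix with Lm, Cm, Rm on the sub-, main and
    super-diagonal blocks; storing it costs (D*K)**2 entries."""
    K = Cm.shape[0]
    M = np.zeros((D * K, D * K))
    for i in range(D):
        rows = slice(i * K, (i + 1) * K)
        M[rows, i * K:(i + 1) * K] = Cm
        if i > 0:
            M[rows, (i - 1) * K:i * K] = Lm
        if i < D - 1:
            M[rows, (i + 1) * K:(i + 2) * K] = Rm
    return M


def streaming_apply(Lm, Cm, Rm, d):
    """y_i = Lm d_{i-1} + Cm d_i + Rm d_{i+1} for each bit i, keeping only
    the three K x K blocks (3*K**2 entries). d has shape (D, K); neighbours
    outside the window are treated as zero."""
    D, K = d.shape
    y = np.zeros((D, K))
    for i in range(D):
        y[i] = Cm @ d[i]
        if i > 0:
            y[i] += Lm @ d[i - 1]
        if i < D - 1:
            y[i] += Rm @ d[i + 1]
    return y
```

The full matrix holds (DK)^2 entries, while the streaming form keeps 3K^2. This ratio of D^2/3 is where the order-D^2 memory saving comes from.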

Page 11

Pipelining the multiuser detector

[Figure: timing diagram over bits 1-12 for the causal matched filter and PIC stages 1, 2 and 3, each processing the bit stream along the time axis.]

Latency = 2 × latency per bit (D/2 speedup over the block detector)

Edge-bit computations eliminated [ISCAS 2001]
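The latency and edge-bit figures quoted for the block and bit-streaming detectors reduce to simple arithmetic. The small sketch below assumes a uniform latency per bit `t_bit`, a block size of D bits, and the two extra edge-bit computations per stage stated for the block detector. `pipeline_summary` is a hypothetical helper.

```python
def pipeline_summary(D, t_bit, n_bits):
    """Compare a block detector with a bit-streaming detector (sketch).

    D: bits per block; t_bit: latency per bit; n_bits: bits to detect.
    Block detector: worst-case latency D * t_bit and 2 extra edge-bit
    computations per stage for every block.
    Streaming detector: latency 2 * t_bit and no edge bits.
    """
    n_blocks = -(-n_bits // D)  # ceiling division
    block_latency = D * t_bit
    stream_latency = 2 * t_bit
    return {
        "block_latency": block_latency,
        "stream_latency": stream_latency,
        "speedup": block_latency / stream_latency,  # equals D / 2
        "block_bit_computations_per_stage": n_blocks * (D + 2),
        "stream_bit_computations_per_stage": n_bits,
    }
```

For example, with D = 10 the worst-case latency drops from 10 to 2 bit-times, a speedup of 5 (D/2). Each stage also saves 2 bit computations per block.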

Page 12

Contents

Algorithms for channel estimation and detection

Conventional and on-line arithmetic designs

Programmable architecture design using the IMAGINE simulator

Page 13

Matched filter with conventional arithmetic

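$d_p = \operatorname{sign}(\mathbf{A}^H \mathbf{r})$

[Figure: the entries $A^H_{p,1}, \dots, A^H_{p,N}$ are multiplied with the received samples and summed by a binary tree of adders to give $\mathbf{A}^H\mathbf{r}$, whose sign gives $d_p$.]

T ~ log(N) · log(d)

N: dot-product size; d: precision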
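The T ~ log(N)·log(d) estimate comes from the adder tree, which has ceil(log2 N) levels of adders. The delay of each adder grows with the word length d. The behavioural sketch below models the tree for real-valued inputs. It maps sign(0) to +1, a tie-break chosen here that does not appear on the slide. `adder_tree_sum` and `matched_filter_bit` are hypothetical names.

```python
def adder_tree_sum(values):
    """Sum with a balanced binary adder tree; returns (sum, levels)."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        nxt = [level[j] + level[j + 1] for j in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes to the next level
        level = nxt
        depth += 1
    return level[0], depth


def matched_filter_bit(a_row, r):
    """d_p = sign(sum_j a_row[j] * r[j]) for real-valued inputs.

    Returns (decision, adder-tree levels); a zero sum maps to +1."""
    products = [a * x for a, x in zip(a_row, r)]
    total, depth = adder_tree_sum(products)
    return (1 if total >= 0 else -1), depth
```

For N = 32 the tree has 5 levels. Each level must finish a full-precision addition before the sign can be read, which is what on-line arithmetic avoids.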

Page 14

Conventional MF using CSAs

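[Figure: each product of an entry $A^H_{p,j}$ with a received sample is kept in carry-save form (sum s, carry C); a 2N:2 compressor reduces the 2N sum and carry words to two, and a single carry-propagate adder (CPA) produces $\mathbf{A}^H\mathbf{r}$, whose sign gives $d_p = \operatorname{sign}(\mathbf{A}^H \mathbf{r})$.]

T ~ a + log(d + c)

a, c: constants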
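The carry-save idea can be sketched with Python integers in three steps:

1. A 3:2 carry-save adder turns three operands into a sum word and a carry word without carry propagation.
2. Layers of these adders compress many operands down to two.
3. A single carry-propagate addition produces the result.

This is a behavioural model that assumes integer operands. `csa`, `compress_to_two` and `cpa` are illustrative names, and the layer structure does not model the slide's exact 2N:2 compressor.

```python
def csa(a, b, c):
    """3:2 carry-save adder: returns (sum, carry) with a + b + c == sum + carry.

    Python's bitwise operators act on integers as infinite two's
    complement, so the identity also holds for negative operands."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def compress_to_two(operands):
    """Reduce operands with layers of 3:2 CSAs until two remain.

    Returns (word0, word1, number_of_layers)."""
    ops = list(operands)
    layers = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3
        nxt = []
        for j in range(0, full, 3):
            nxt.extend(csa(*ops[j:j + 3]))
        nxt.extend(ops[full:])  # 0-2 leftover operands pass through
        ops = nxt
        layers += 1
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1], layers


def cpa(s, c):
    """Single carry-propagate addition at the end of the tree."""
    return s + c
```

Only the final `cpa` step propagates carries across the full word length. This is why the CSA version's delay depends on the precision d only through one addition.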

Page 15

Key concept in on-line arithmetic

Conventional detection: high-precision operations (8-32 bits) followed by a sign test.

Detection actually depends only on the most significant digits (1-3 bits).

Use most-significant-digit-first (MSDF) computation to find the sign and avoid computing the remaining digits. [Arith-15]
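The sign can be found early because of how signed-digit numbers are built. In a radix-2 signed-digit fraction with digits in {−1, 0, 1}, all digits after position k together contribute less than 2^−k in magnitude. The first non-zero digit therefore fixes the sign. The sketch below assumes that digit set. `msdf_sign` and `sd_value` are hypothetical helpers and do not model an on-line adder.

```python
from fractions import Fraction


def sd_value(digits, radix=2):
    """Exact value of the signed-digit fraction 0.d1 d2 ... (digits in {-1, 0, 1})."""
    return sum(Fraction(d, radix ** (j + 1)) for j, d in enumerate(digits))


def msdf_sign(digits):
    """Read digits most significant first and stop at the first non-zero one.

    Returns (sign, digits_examined); sign is 0 only if every digit is 0."""
    for j, d in enumerate(digits):
        if d != 0:
            return (1 if d > 0 else -1), j + 1
    return 0, len(digits)
```

For example, 0.0 0 1 (−1)(−1)(−1) is positive (1/64) although most of its digits are negative. The sign is known after only three digits.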

Page 16

Comparisons of arithmetic schemes

[Figure: three panels, each showing the digit-level dot-product computation (products $a_i \times b_i$, tree-addition level 1, ..., tree-addition result): (A) on-line arithmetic with full precision, $t_{OL} \sim d$; (B) conventional arithmetic with full precision, $t_{CONV} \sim \log(d)$; (C) on-line arithmetic with truncated precision, where the sign is determined after the first digits and computation stops ("Stop!"), $t_{OL}$ = constant.]

Page 17

Using on-line arithmetic for detection

[Figure: ±1 bits are sent through the channel; plot of the time taken for addition (normalized, 0 to 5) against the received signal amplitude (normalized, -1 to 1).]

Page 18

Equations

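Probability of error for optimal BPSK detection:

$$P_e^{OPT} = Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$$

Probability of error for on-line BPSK detection:

$$P_e^{OL} = 0.5\left[Q\left(\sqrt{\frac{2E_b}{N_0}}\,\left(1 - r^{-p}\right)\right) + Q\left(\sqrt{\frac{2E_b}{N_0}}\,\left(1 + r^{-p}\right)\right)\right]$$

r: radix of the number system; p: number of digits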
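The two error probabilities can be evaluated numerically. The sketch makes the following assumptions:

- E_b/N_0 is given as a linear ratio, not in dB.
- The on-line expression has the form P_e^OL = 0.5[Q(√(2E_b/N_0)(1 − r^−p)) + Q(√(2E_b/N_0)(1 + r^−p))]. This corresponds to guessing the sign at random whenever the normalized received value lies within ±r^−p of zero. That form is a reading of the slide, not a confirmed formula.

Q is computed from `math.erfc`. The helper names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_optimal(ebn0):
    """Optimal BPSK error probability; ebn0 is a linear ratio."""
    return q_function(math.sqrt(2.0 * ebn0))


def pe_online(ebn0, r, p):
    """Assumed on-line BPSK error probability with p radix-r digits:
    a received value within +/- r**-p of zero gives a random decision."""
    a = math.sqrt(2.0 * ebn0)
    delta = float(r) ** (-p)
    return 0.5 * (q_function(a * (1.0 - delta)) + q_function(a * (1.0 + delta)))
```

Because Q is convex for positive arguments, this P_e^OL is never below P_e^OPT. It decreases toward P_e^OPT as the number of digits p grows.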

Page 19

Probability of error using on-line arithmetic

Page 20

On-line MF implementation

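[Figure: on-line implementation of the matched filter: the entries $A^H_{p,1}, \dots, A^H_{p,N}$ are multiplied with the received samples and summed by a tree of on-line adders; the sign of the result gives $d_p = \operatorname{sign}(\mathbf{A}^H \mathbf{r})$.]

T ~ c

c: constant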

Page 21

Throughput comparisons

Page 22

Area comparisons

Page 23

Implementing higher modulation schemes

16-QAM (see the sign and magnitude)

[Figure: 16-QAM constellation on the real and imaginary axes with 3-digit labels; after the sign, the detector must wait until the next non-zero digit to resolve the magnitude.]

BPSK (just see the sign)

Page 24

Conclusions on arithmetic schemes

CSAs are better than the straightforward implementation

1.35 - 1.6X speedup for 8-32 bit precision; 1.64 - 1.14X less area

With reduced-precision computations, on-line arithmetic is better still

1.67 - 2.12X speedup over CSA; 0.64 - 12.73X less area than CSA

Page 25

Contents

Algorithms for channel estimation and detection

Conventional and on-line arithmetic designs

Programmable architecture design using the IMAGINE simulator

Page 26

A programmable architecture?

Flexibility in the algorithm requirements:

channel-dependent computations

changing algorithms on the fly

seamless switching between wireless LAN and wideband CDMA (RENE)

Simulator needed to test:

performance of algorithms

extensions/modifications for critical operations

Page 27

Algorithms needed for 3G baseband base-station implementation

Wireless LAN: Equalization, FFT, Viterbi decoding

W-CDMA: Channel estimation, Multiuser detection, Viterbi/Turbo decoding

If you felt that life was too easy: Multiple antennas, Long spreading codes, Space-Time codes

Page 28

The IMAGINE architecture and simulator

IMAGINE is a media signal processor, built at Stanford.

Many workload features in common with baseband processing

Good starting point to explore.

Local expertise: Dr. Scott Rixner

Page 29

IMAGINE architecture

Great for media processing algorithms: a 1024-point FFT in 7.4 µs on a 500 MHz processor with 8 clusters (48 units), using 3.8 W of power

Great for parallel, vector and streaming computations

Performance on, and extensions for, sequential computation kernels such as Viterbi traceback need to be investigated.

Page 30

Conclusions

Algorithm steps for designing communication systems

Design hardware-efficient versions

Fixed-point implementation

DSP implementation: bottlenecks

Task partitioning, pipelining, parallelism

Computer arithmetic ideas (VLSI)

Integration into a programmable processor