Overview of Implementation Issues for Multitier Networks on DSPs

Joseph R. Cavallaro

Electrical & Computer Engineering Dept.

Rice University

August 17, 1999

Outline

Overview of Multitier Networks

DSP Rapid Prototyping Tools

Channel Estimation and Multistage Detection

DSP Implementation and Real-Time Issues

ASIC Implementation of Algorithm Modules

Conclusions and Future Directions

Multitier Overlay Networks

Home Area Wireless LAN

High Speed Office Wireless LAN

Outdoor CDMA Cellular Network

Time Scales in Multitier Networks

[Figure: time-scale axis]

Medium Access: msec

Horizontal Handoff: secs

Vertical Handoff: 10 secs

Session Lifetime: mins

Multiple Radio Interfaces

Reconfigurability and Commonality of Modules

Multitier Network Interface Card

mNIC

[Figure: system diagram. Mobile Platform: Application, Proxy Awareness, Network Protocols, Proxy File System, Transcoders, mNIC. Three base stations (BS) and the INTERNET. Server: NIC, File System, Network Protocols, Proxy File System, Transcoders.]

Current Group

Suman Das - Universal Baseline Software System

Vishwas Sundaramurthy - System Design Issues

Sridhar Rajagopal - Channel Estimation Algorithms

Oscar Pan - Real-Time Workshop Implementation

Recent Graduates:

– Chaitali Sengupta - ML Synchronization

– Gang Xu - Differencing Multistage Detector

W-CDMA Simulation Testbed Overview

Development of an integrated software testbed

Unified framework to evaluate new algorithms for coding, synchronization, detection, etc.

Construction of a faster, efficient, and possibly hardware accelerated simulation testbed

TI TMS320C6201 / TMS320C6701 based system - Base Station

TI TMS320C54 and FPGA / ASIC - Mobile

Software Rapid Prototyping Methodology

[Figure: rapid prototyping flow. MATLAB code goes through the MATLAB compiler to C mex code and C code; a wrapper (C code or Simulink) and the host DSP code generation tools produce DSP code for the DSP hardware.]

Communication and Signal Processing Algorithms in MATLAB and “C”

Faster Execution of “C” Code

Acceleration on DSP Boards

Multiple DSP Boards

Simulink

Simulink

– Good system for algorithm evaluation in communication systems and signal processing

– Ties in well with MATLAB environment and functions

– More intuitive than (C/MATLAB) code based evaluation

Used in software version of wireless testbed

RTW

Real-Time Workshop

– Generates ANSI C-code for Simulink block diagrams

– Tool for DSP rapid prototyping

– Quick but inefficient/non-optimized C-code

RTW support for C67x generation boards

– Hardware (DSP)-in-the-loop simulations

CDMA Wireless System Testbed Simulink Version

[Figure: Simulink block diagram. Blocks: User_Data, Wireless Channel, Chip MF, Max. Likelihood Channel Est., Decorrelating Detector, Multiuser Detector, Error Counter, Show Stats, Update Parameters. Callouts: User Data, AWGN Channel, Chip matched filter, Channel Estimation, Multiuser Detection, Error Rate Calculation, Statistics, Parameters.]

Hardware Platform Issues

Current System

– TI TMS320C6201 and TMS320C6701 EVM boards

Multiple DSP Processor Configuration Issues and Task Decomposition.

Planned Upgrade to BlueWave, Spectrum

DSPs in Simulink based Wireless testbed

Use of C67 based boards for simulations

– Useful for study of individual algorithms on C67 generation processors

Multiprocessing issues

– Need block diagram partitioning and code generation support from Simulink/RTW

– Need cleaner external communication mechanisms in the C67x DSP

– Need support for controlling multiple DSPs

Architectural Issues

Memory

– More internal memory for large temporary matrices

Prefetch Buffers

– Matrices stored as arrays in memory.

ASIC / FPGA glue support

– To explore HW acceleration of critical parts of the code

Specialized instructions: square roots, reciprocals, rotations?

Compiler Support

Compilers for VLIW

– Scheduling & Tracking units difficult in manual assembly

– Challenge to generate code to keep all units busy.

– Small Operating System Support

Architectural improvements require coordinated advances in compiler support.

W-CDMA Software Testbed Experiments

Third generation wireless communication systems

Multimedia capabilities

Multirate services

Quality of service

Higher Data Rates: 2 Mbps, 384 Kbps, 144 Kbps.

The Wireless Channel : Multiuser, Multipath

[Figure: the Desired User's signal reaches the Antenna over a Direct Path and Reflected Paths, together with Noise + MAI.]

Faces Attenuation, Delays and Doppler Effects: Unknown Channel Parameters

W-CDMA Base-Station Receiver

[Figure: receiver block diagram. Blocks: Antenna, Demodulator, Demux (Pilot and Data outputs), Channel Estimator (Estimated Amplitudes & Delays), Multiuser Detector, Decoder.]

CDMA Uplink System

[Figure: uplink block diagram. Users 1, 2, ..., K with data d1, d2, ..., dK each pass through a Channel Encoder and Spreading; the user signals and AWGN are summed (+) into R(t). At the receiver, a bank of Matched Filters produces y1, y2, ..., yK; the Channel Estimator, Multi-User Detector, Demux and Channel Decoder produce the estimates d1', d2', ..., dK'.]
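The transmit and matched-filter part of the uplink can be sketched as follows. This is a minimal illustration, not the testbed code: it assumes synchronous users, real ±1 data bits, unit amplitudes, random ±1 spreading codes and no channel coding, and every name in it (`make_codes`, `transmit`, `matched_filter`) is hypothetical.

```python
import random


def make_codes(num_users, n_chips, rng):
    """Random +/-1 spreading code per user (an assumption, not from the slides)."""
    return [[rng.choice((-1, 1)) for _ in range(n_chips)] for _ in range(num_users)]


def transmit(bits, codes, noise_std, rng):
    """Spread each user's bit, sum all users, add white Gaussian noise."""
    n_chips = len(codes[0])
    return [
        sum(b * c[n] for b, c in zip(bits, codes)) + rng.gauss(0.0, noise_std)
        for n in range(n_chips)
    ]


def matched_filter(r, codes):
    """Correlate r with each user's code: y_k = (1/N) * sum_n c_k[n] * r[n]."""
    n_chips = len(r)
    return [sum(c[n] * r[n] for n in range(n_chips)) / n_chips for c in codes]


if __name__ == "__main__":
    rng = random.Random(0)
    codes = make_codes(num_users=3, n_chips=31, rng=rng)
    bits = [1, -1, 1]
    r = transmit(bits, codes, noise_std=0.5, rng=rng)
    y = matched_filter(r, codes)
    # Each y_k holds the user's own bit plus multiple access interference and noise.
    print(y)
```

With non-orthogonal codes each matched filter output still contains contributions from the other users, which is what the multiuser detector is meant to remove.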

Maximum Likelihood - Channel Estimation

Send a time-multiplexed Preamble (Pilot).

Channel properties extracted from received signal.

Compare received signal with known pilot and estimate channel parameters.

Keep estimate for remaining data bits (static).

Repeat preamble every frame, if no tracking.

The Maximum Likelihood Algorithm

Compute the correlation matrices $R_{rr}$, $R_{br}$ and $R_{bb}$.

Compute the channel estimate $Y = R_{br} R_{bb}^{-1}$.

Calculate the noise covariance matrix K.

Calculate the channel impulse response vector z.

Extract the amplitudes and delays from the channel impulse response vector using least squares fit.

The ML Algorithm Complexity

Complex-Real Dot Product.

Complex-Real Matrix Product.

Complex-Real Product.

Real Square roots.

– Solving quadratic equation for least squares fit.

Critical code: Matrix-vector multiplications / Dot Product

Assuming Unity Noise Covariance:

$R_{br} = \frac{1}{L}\, r \cdot b$

$Y = R_{br} R_{bb}^{-1}$

[Equation for $z_k$ in terms of $U_k^L$, $U_k^R$, $y_k^L$ and $y_k^R$]

Offline
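The two correlation steps and the estimate $Y = R_{br} R_{bb}^{-1}$ can be sketched in plain Python. This is only an illustration under stated assumptions: the pilot bits b_i are real ±1 vectors, the received vectors follow r_i = Y b_i plus noise, $R_{br}$ is taken as the average of r_i b_i^T and $R_{bb}$ as the average of b_i b_i^T (the slides do not spell out these definitions), and the helper names are hypothetical.

```python
def outer_mean(xs, ys):
    """(1/L) * sum_i x_i y_i^T for two equal-length lists of vectors."""
    count = len(xs)
    rows, cols = len(xs[0]), len(ys[0])
    return [
        [sum(x[m] * y[n] for x, y in zip(xs, ys)) / count for n in range(cols)]
        for m in range(rows)
    ]


def invert(a):
    """Gauss-Jordan inverse of a small real square matrix (partial pivoting)."""
    n = len(a)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("R_bb is singular: pilot too short or degenerate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product; entries may be complex."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def ml_channel_estimate(received, pilots):
    """Y = R_br * R_bb^{-1}, assuming unity noise covariance."""
    r_br = outer_mean(received, pilots)  # complex-real products
    r_bb = outer_mean(pilots, pilots)    # depends only on the known pilot
    return matmul(r_br, invert(r_bb))


if __name__ == "__main__":
    true_y = [[1 + 1j, 0.5j], [0.25 + 0j, -1 + 0j]]  # made-up channel matrix
    pilots = [[1, 1], [1, -1], [-1, 1], [1, 1]]
    received = [
        [sum(true_y[m][k] * b[k] for k in range(2)) for m in range(2)]
        for b in pilots
    ]
    print(ml_channel_estimate(received, pilots))
```

Because $R_{bb}$ depends only on the known pilot, its inverse can be prepared ahead of time, leaving the complex-real products as the per-frame work.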

Differencing Multistage - Multiuser Detection

Based on the principle of Parallel Interference Cancellation (PIC)

Cross-correlation information used to remove interference of other users from desired user

Repeated iterations for convergence

Differencing techniques applied for improving the performance of the algorithm

The Differencing Multistage Detector

Split the crosscorrelation matrix into lower, upper and the diagonal matrix.

$R = D + S + S^T$

Calculate the channel impulse response iteratively using

$z^{(l)} = z^{(l-1)} - (S + S^T) A\, \hat{x}^{(l-1)}$

where $\hat{x}^{(l-1)} = \hat{d}^{(l-1)} - \hat{d}^{(l-2)}$, $\hat{x}_k \in \{0, 2, -2\}$

x is called the differencing vector.

Multistage Detector Complexity

Matrix Multiplication: $B = (S + S^T) A$

– Computed only once for one frame

Dot Product: $z_k^{(l+1)} = z_k^{(l)} - \sum_j B_{kj}\, \hat{x}_j^{(l)}$

– Computed iteratively

Critical code: Dot Product
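The relation between conventional parallel interference cancellation and the differencing update can be sketched as below. This is a hedged illustration rather than the authors' implementation: it assumes synchronous users, hard ±1 decisions, a cross-correlation matrix with unit diagonal, and a start-up convention of zero decisions before the first stage; the function names are hypothetical.

```python
def sign(v):
    """Hard decision."""
    return 1 if v >= 0 else -1


def interference_matrix(R, amplitudes):
    """B = (S + S^T) A: off-diagonal cross-correlations scaled by amplitudes."""
    K = len(R)
    return [
        [0.0 if i == j else R[i][j] * amplitudes[j] for j in range(K)]
        for i in range(K)
    ]


def conventional_pic(y, B, stages):
    """Each stage recomputes z = y - B d from the full decision vector."""
    K = len(y)
    d = [sign(v) for v in y]
    for _ in range(stages):
        z = [y[k] - sum(B[k][j] * d[j] for j in range(K)) for k in range(K)]
        d = [sign(v) for v in z]
    return d


def differencing_pic(y, B, stages):
    """Each stage updates z with only the decisions that changed (x != 0)."""
    K = len(y)
    z = list(y)
    d_prev = [0] * K                     # start-up convention (assumption)
    d = [sign(v) for v in y]
    for _ in range(stages):
        x = [d[j] - d_prev[j] for j in range(K)]
        changed = [j for j in range(K) if x[j] != 0]
        z = [z[k] - sum(B[k][j] * x[j] for j in changed) for k in range(K)]
        d_prev, d = d, [sign(v) for v in z]
    return d


if __name__ == "__main__":
    R = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
    amplitudes = [1.0, 4.0, 1.0]         # strong interferer (near-far case)
    bits = [1, -1, 1]
    # Noise-free matched filter output y = R A d.
    y = [sum(R[k][j] * amplitudes[j] * bits[j] for j in range(3)) for k in range(3)]
    B = interference_matrix(R, amplitudes)
    print(conventional_pic(y, B, 2), differencing_pic(y, B, 2))
```

Once the decisions settle, most entries of x are zero, so the differencing form skips most of the multiply-accumulates in the dot product.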

TI Tools Used

Evaluation Modules (EVM) for C6201 and C6701 fixed and floating point DSPs

– 64 KB each internal program & data memory

– 256 KB SBSRAM, 8 MB SDRAM (external)

C Compiler ver 3.0 from Code Generation Tools

Code Composer ver 4.02 for profiling the code

DSP Implementation: Channel Estimation

Floating point implementation found more feasible due to matrix inversions and square-roots.

Code optimized for the DSP

Use of Specialized approximate instructions

– Approximate reciprocal square roots

– Approximate reciprocals

Use of Assembly Code for critical part.

– TI's C67 floating point benchmarks for Matrix-Vector Multiplication & Dot Product

Data Memory requirements for Channel Estimation

Use of Approximate Instructions

TMS320C67x DSP Cycles

Approx. FP Reciprocal instruction: 1
FP reciprocal function: 28
Approx. FP Reciprocal Sq. root Instruction: 1
FP Reciprocal Sq. root Instruction: 34

[Figure: Use of specialized instructions and assembly code on C6701 DSP. Execution time (in milliseconds) versus number of users (0 to 15) for C6701: Original, C6701: with Intrinsics and C6701: with Assembly. L = 150, P = 3, N = 31, SNR = 5 dB, SINR = -10 dB. Annotations: 10% improvement, 100% improvement.]
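A fast approximate reciprocal or reciprocal square root is commonly used as a seed that is then refined by Newton-Raphson steps, each of which roughly doubles the number of correct bits. The sketch below illustrates that general technique only; the seed error model (about 8 correct bits) and the function names are assumptions, not details taken from the slides or from the C67x instruction set.

```python
import math


def approx_seed(exact, rel_error=2.0 ** -8):
    """Stand-in for a low-precision hardware estimate (hypothetical error model)."""
    return exact * (1.0 + rel_error)


def refine_reciprocal(a, x, iterations=2):
    """Newton-Raphson for 1/a: x <- x * (2 - a * x)."""
    for _ in range(iterations):
        x = x * (2.0 - a * x)
    return x


def refine_rsqrt(a, x, iterations=2):
    """Newton-Raphson for 1/sqrt(a): x <- x * (1.5 - 0.5 * a * x * x)."""
    for _ in range(iterations):
        x = x * (1.5 - 0.5 * a * x * x)
    return x


if __name__ == "__main__":
    a = 3.0
    seed = approx_seed(1.0 / a)
    print(abs(seed * a - 1.0), abs(refine_reciprocal(a, seed) * a - 1.0))

    s = approx_seed(1.0 / math.sqrt(a))
    root = math.sqrt(a)
    print(abs(s * root - 1.0), abs(refine_rsqrt(a, s) * root - 1.0))
```

Whether the unrefined seed is accurate enough depends on the algorithm; the number of refinement steps trades cycles for precision.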

Optimization Effects for Channel Estimation

[Figure: Effect of optimizations for Channel Estimation on C6701. Normalized execution time (0 to 100) for three cases: (1) Base (-o3 -pm), (2) Approx. (-o3 -pm with intrinsics), (3) Assembly opt. (-o3 -pm with asm). Annotations: 1.08X improvement, 2.34X improvement.]

Data Memory Requirements

Data to be placed in External memory

1306

DSP Implementation: Multistage Detection

16-bit Fixed Point C Code

Code optimized for the DSP

Use of Assembly Code for critical part

– TI's C62 fixed point assembly benchmarks for Dot Product

Data memory requirements for Multistage Detection
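A 16-bit fixed-point dot product, the critical kernel named above, can be sketched as follows. The Q15 format, the truncating shift and the saturation policy are assumptions chosen for illustration (the slides only state 16-bit fixed point), and the function names are hypothetical.

```python
def to_q15(x):
    """Quantize a value in [-1, 1) to a 16-bit Q15 integer, with saturation."""
    return max(-32768, min(32767, int(round(x * 32768))))


def from_q15(v):
    """Convert a Q15 integer back to a float."""
    return v / 32768.0


def dot_q15(a, b):
    """Dot product of two Q15 vectors.

    Each 16x16-bit product is a Q30 value; products are summed in a wide
    accumulator, shifted back to Q15 (truncating) and saturated to 16 bits.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    result = acc >> 15
    return max(-32768, min(32767, result))


if __name__ == "__main__":
    a = [to_q15(v) for v in (0.5, -0.25, 0.125)]
    b = [to_q15(v) for v in (0.5, 0.5, -0.5)]
    print(from_q15(dot_q15(a, b)))
```

Keeping the accumulator wider than 16 bits until the final shift avoids losing precision on every product.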

Optimization Effects for Multistage Detector

[Figure: Effect of optimizations for Multistage Detection on C6201. Normalized execution time (0 to 100) for three cases: (1) Global opt. (-o3 -pm -mu), (2) Software Pipelining (-o3 -pm), (3) Assembly opt. (-o3 -pm with asm). Annotations: 5.22X improvement, 7.47X improvement.]

Data Memory Requirements

Data can be placed completely in Internal memory

Flops Count

[Figure: Number of Flops (x 10^4, 0 to 14) versus Total Number of Iterations (1 to 8) for the Conventional Method and the Differencing Method. Users: K=15, SNR=6dB.]

2X speedup for a three-stage detector

Real-Time Requirements

Real-Time capability by C6201 DSP

[Figure: Max bit rate per user (kb/s, 50 to 350) versus number of users (8 to 14) for the Conventional Method and the Differencing Method. SNR=10dB, WindowSize=12. Marked point: 12 users, 150 kb/s.]

Trends in Recent DSPs

More internal memory and higher clock speeds

– C6203 : 512 KB data, 384 KB program, 250 MHz

– useful for uplink channel estimation algorithms.

Specialized Blocks in the DSP Core.

– Viterbi decoding in C54.

Lower Voltage operation

– 1.2 V in C5402, useful for saving power consumption in the mobile.

ASIC Implementation

Differencing Multistage Detector Block

MOSIS Tiny-Chip (40-pin DIP)

– 8 synchronous users

– 12-bit fixed point implementation

– 6000 transistors

– 1.2 µm CMOS technology

– 190kb/s for each user (@12.5MHz)

– 3-stage cascade delay < 15 µs
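The quoted throughput and clock rate imply a budget of clock cycles per detected bit, which a few lines of arithmetic make explicit. The snippet only restates the slide's two numbers (190 kb/s per user, 12.5 MHz); the function names are hypothetical.

```python
def cycles_per_bit(clock_hz, bit_rate_bps):
    """Clock cycles available in one bit period."""
    return clock_hz / bit_rate_bps


def bit_period_us(bit_rate_bps):
    """Duration of one bit in microseconds."""
    return 1e6 / bit_rate_bps


if __name__ == "__main__":
    clock_hz = 12.5e6   # chip clock from the slide
    bit_rate = 190e3    # per-user bit rate from the slide
    print(round(cycles_per_bit(clock_hz, bit_rate), 1), "cycles per bit")
    print(round(bit_period_us(bit_rate), 2), "microseconds per bit")
```

At these figures one bit period is a little over 5 microseconds, or roughly 66 clock cycles.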

Chip (Single Stage) Architecture

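[Figure: single-stage datapath. Blocks: SHIFT, ALU, RECODER, REG, (L+L')A, Control Logic. Signals: $d^{(l-1)}$, $d^{(l)}$, $z^{(l)}$, $z^{(l+1)}$. Legend: internal signals, external signals.]

$z^{(l+1)} = z^{(l)} - (L + L^T) A\, \hat{x}^{(l)}$ where $\hat{x}^{(l)} = \hat{d}^{(l)} - \hat{d}^{(l-1)}$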

ASIC Architecture Features

Chip Layout

12-bit ALU

Soft Decisions

Cross-Correlation

Recoding logic

2.0 mm

3-stage Cascade Mode

[Figure: three identical stages connected in cascade. Each stage has pins Sin, Hin, Fin, Load, CLK, Sout, Hout and Fout, and a 1/2 block. Input: Matched Filter Output. Output: Detector Output. HandShaking signals: Load R, Clock, Output Valid.]

Current Work – GPP vs. DSP

• Joint work with Prof. Sarita Adve, Praful Kaul, and Parthasarathy Ranganathan

• Performance of general-purpose systems

• Comparing GPP and DSP performance

• Complete 3G benchmark suite with all components

• Identification of key performance bottlenecks

Preliminary Results (1 of 4)

(4 algorithms: channel estimation, multi-stage detection, FIR filter, dot product)

Performance of general-purpose processors

– Instruction-level parallelism features help (3.4X to 4.4X)

– Media ISA extensions help (1.2X to 5.4X)

New extensions for packing/multiplication useful

Comparing GPP and DSP performance

– GPPs outperform DSPs

UltraSPARC-II+VIS 2-4X better than TI TMS320C6701

Caveat: compiler issues with DSP

Preliminary Results (2 of 4)

Important to study complete system including all components

– Need for complete benchmark suite

[Figure: system block diagram. TRANSMITTER (MOBILE USER), K USERS: user's bits, SOURCE CODING, CHANNEL CODING, SPREADING, MODULATION. RECEIVER (BASE STATION): DEMODULATION, DETECTOR, CHANNEL ESTIMATION, DECODER, detected bits of all K users.]

Preliminary Results (3 of 4)

Complete 3G benchmark suite with all components

• Source coding

• Channel coding

• Spreading

• Modulation/De-modulation

• Multi-stage detection

• Channel estimation

• Channel decoding

• Source decoding

Used either public-domain or in-house “C” code

Optimized with ISA extensions

Preliminary Results (4 of 4)

Choice of source coding standard makes big difference

– G728 system: source coding/decoding dominant

– GSM system: channel estimation/detection dominant

Component                G728    GSM
Speech Coder             29%     11%
Speech Decoder           24%     6%
Channel Encoder          3%      3%
Channel Decoder          17%     20%
Channel Estimation       9%      20%
Multi-stage Detection    18%     40%

Conclusions

Implementation issues : Estimation & Detection Algorithms

Channel Estimation - Floating Point / External Memory

Multistage Detection - Fixed Point / Internal Memory

Specialized instructions : square root/reciprocals.

Additional support for complex arithmetic useful.

Recent trends in GPP / DSPs highly encouraging for next generation wireless communication applications.

Future Work

FPGA / ASIC Implementation via VHDL models and SPW

Program & DSP implementations for W-CDMA uplink and downlink

– Blind Algorithms

– Adaptive Algorithms

Architectural bottlenecks and compiler issues in DSPs to enhance suitability for next generation W-CDMA systems

Multiple DSPs – mixed DSP / FPGA for mNIC