
A System Solution for High-Performance, Low-Power SDR

Yuan Lin¹, Hyunseok Lee¹, Yoav Harel¹, Mark Woh¹, Scott Mahlke¹, Trevor Mudge¹ and Krisztian Flautner²

¹Advanced Computer Architecture Laboratory, University of Michigan

²ARM, Ltd.

SDR Design Challenges:

- Hardware design challenges
  - High computational throughput (~40 Gops)
  - Low power consumption (~200 mW)
  - Meet real-time requirements
- DSP programming support
  - System-level development: inter-algorithm communication
  - Algorithm-level development: efficient DSP representations

SDR Benchmark Design & Analysis

W-CDMA Protocol: 2Mbps

[Figure: W-CDMA physical-layer block diagram. Transmitter blocks: channel encoder, interleaver, spreader, scrambler, LPF-Tx, modulator, frontend. Receiver blocks: frontend, demodulator, LPF-Rx, searcher, descrambler and despreader (two paths), combiner, deinterleaver, channel decoder (turbo/Viterbi). Both sides connect to the upper layers.]

W-CDMA Characteristics

- Plenty of vector parallelism
- 8 & 16-bit DSP algorithms
- Multiplication is not dominant
- No floating-point operation
- Small instruction/data memory
- Has periodic real-time tasks

802.11a Protocol: 24Mbps

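[Figure: 802.11a physical-layer block diagram. Transmitter blocks: channel encoder, interleaver, QAM mapping, IFFT, preamble insertion, LPF-Tx, modulator, frontend. Receiver blocks: frontend, demodulator, LPF-Rx, frequency and time synchronization, channel estimation, frequency equalization, QAM demapping, deinterleaver, channel decoder (Viterbi).]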

802.11a Characteristics

- Similar to W-CDMA
  - Plenty of vector parallelism
  - No floating-point operation
  - Small instruction/data memory
- Different from W-CDMA
  - Mostly 16-bit DSP algorithms
  - Multiplication is more dominant
  - No periodic real-time tasks

SDR Processor Architecture Design

System Architecture Design Tradeoffs

[Figure: design spectrum from fine-grain processors (many small PEs joined by an inter-processor network) through coarse-grain processors (PEs with several ALUs each) to a uniprocessor. Wider PEs amortize instruction fetch and replace inter-processor communication with intra-processor communication.]

[Figure: comparison of configurations, given as number of processing elements x SIMD width, for W-CDMA 2Mbps (51.2 GOP/sec) at 90nm, 1V @ 400MHz. Configurations shown: 4x32, 2x64, 1x128, and 1x256 (50% utilization). Horizontal axis: SIMD width (arithmetic units), 0 to 300. Annotations: at narrow widths the critical delay path is ALU dominated; at wide widths the intra-processor shuffle network becomes the critical delay path.]
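The configurations compared on this slide can be checked with a short calculation. The sketch below assumes each arithmetic unit completes one operation per cycle, and it reads the "(50% Utilization)" label as applying to the 1x256 configuration. The function names are hypothetical and chosen only for illustration.

```python
import math


def required_alus(ops_per_second, clock_hz, utilization=1.0):
    """Arithmetic units needed to sustain a throughput.

    Assumes one operation per unit per cycle (an assumption, not a slide fact).
    """
    ops_per_cycle = ops_per_second / clock_hz
    return math.ceil(ops_per_cycle / utilization)


def total_alus(num_pes, simd_width):
    """Arithmetic units provided by num_pes PEs of the given SIMD width."""
    return num_pes * simd_width


if __name__ == "__main__":
    need_full = required_alus(51.2e9, 400e6)        # fully utilized units
    need_half = required_alus(51.2e9, 400e6, 0.5)   # units at 50% utilization
    print("units needed at 100% utilization:", need_full)
    print("units needed at 50% utilization:", need_half)
    for pes, width in [(4, 32), (2, 64), (1, 128), (1, 256)]:
        print(f"{pes}x{width} provides {total_alus(pes, width)} units")
```

Under these assumptions, 51.2 GOP/sec at 400 MHz corresponds to 128 operations per cycle, which is what the 4x32, 2x64, and 1x128 configurations provide, while a 256-wide unit would cover the same load at half utilization.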

System Architecture Design

[Figure: system architecture. A controller and several PEs share an interconnect to global memory. Each PE contains a SoC interface, a scalar pipeline (scalar memory, scalar ALU, scalar RF), and a SIMD pipeline (SIMD memory, SIMD ALU, SIMD RF).]

PE Design (Area < 1 mm², Power < 50 mW)

[Figure: PE block diagram. SIMD unit: SIMD memory, 16x8bit register file, data shuffle network, 8bit ALU. Scalar unit: scalar memory, 16x16bit register file, 16bit ALU. Datapath labels: 32x8bit and 16bit.]

Mapping DSP Algorithms: Filters

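[Figure: FIR filter drawn as a chain of Z⁻¹ delay elements with taps b0, b1, b2, b3, mapped onto the PE.]

Instruction sequence shown on the slide:

    spread Vin, Sin
    shift z, z, up
    mac z, Vin, Sin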
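The three-instruction sequence above can be emulated in software. The following is a minimal sketch, assuming `spread` broadcasts a scalar sample across all lanes, `shift` moves each accumulator one lane toward the output, and `mac` is a lane-wise multiply-accumulate, which together form a transposed-form FIR filter. This is one reading of the slide rather than the authors' exact semantics: the helper names are hypothetical, and the operand roles in `mac` (tap vector versus broadcast sample) are assumed.

```python
def spread(scalar, width):
    """Broadcast one scalar into every SIMD lane."""
    return [scalar] * width


def shift_toward_output(vec):
    """Move every lane one position toward lane 0, filling the last lane with 0."""
    return vec[1:] + [0]


def mac(acc, a, b):
    """Lane-wise multiply-accumulate: acc[i] + a[i] * b[i]."""
    return [z + x * y for z, x, y in zip(acc, a, b)]


def simd_fir(samples, taps):
    """Transposed-form FIR: one spread, one shift, one mac per input sample."""
    z = [0] * len(taps)           # one partial sum per tap
    out = []
    for s in samples:
        vin = spread(s, len(taps))
        z = shift_toward_output(z)
        z = mac(z, vin, taps)
        out.append(z[0])          # lane 0 holds the finished output sample
    return out


def direct_fir(samples, taps):
    """Reference implementation: y[n] = sum_k taps[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, b in enumerate(taps):
            if n - k >= 0:
                acc += b * samples[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    x = [3, -1, 4, 1, -5, 9]
    b = [2, 0, -1, 3]
    print(simd_fir(x, b))
    print(direct_fir(x, b))
```

The point of the transposed form is that every tap is updated in the same cycle, so the per-sample cost is three vector instructions regardless of the number of taps, as long as the taps fit in the SIMD width.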


Efficient Design

- Wide SIMD width
- Small register file with minimum ports
- Small memories
- Narrow system bus
- Data-path optimized for 8 bits
- Vector shuffle reduces memory ports

[Figure: W-CDMA kernel mapping onto one ARM controller, four PEs, and global memory, with the MAC and the A/D and D/A frontends attached. Kernels are grouped under "WCDMA Receiver" and "WCDMA Transmitter".]

Kernel workloads shown in the figure:

- 4 LPF-Rx: 307 Mops
- 2 LPF-Rx: 182 Mops
- Searcher: 200 Mops
- Descrambler: 22.5 Mops
- Despreader: 11.3 Mops
- Combiner: 3 Mops
- Deinterleaver: 16 Mops
- Turbo Decoder: 324 Mops
- Scrambler: 8.6 Mops
- Spreader: 4.7 Mops
- Turbo Encoder: 2 Mops
- Interleaver: 2 Mops
- PN Code TX/RX: 23 Mops
- Power Control: 15 Kops
- Misc. Control: 1 Mops

Inter-kernel storage shown in the figure: buffers of 10, 1024, 1280, 1360, and 2560 bytes and 20 KBytes, plus a 12.5-KByte FIFO queue.

802.11a PE Mapping

[Figure: 802.11a kernel mapping onto the PEs and global memory. Kernels are grouped under "802.11a Receiver" and "802.11a Transmitter".]

Kernel workloads shown in the figure:

- FIR (Rx): 320 Mops
- Sync.: 20 Mops
- FFT: 120 Mops
- Freq. Eq.: 120 Mops
- QAM Demod: 2 Mops
- Interleaver: 60 Mops
- Viterbi Dec.: 398 Mops
- Channel Enc.: 20 Mops
- QAM Mod: 2 Mops
- IFFT: 120 Mops
- Preamble Ins.: 50 Mops
- Interpolator: 250 Mops
- FIR (Tx): 320 Mops
- Misc. Control: 80 Mops

Buffer sizes shown in the figure: 1024 bytes, 2048 bytes, 22048 bytes, and 30 KBytes.

Power Results

[Figure: power breakdown chart for 802.11a. Categories legible in the extraction: PE Mem, PE Reg, PE ALU, PE Others, ARM, Bus, Others, Total.]

Configuration

- 4 PEs, 1 ARM (Cortex M3) controller
- Global scratchpad memory (64Kb)
- 90nm (1V @ 400 MHz), synthesized conservatively

Area Results

SDR Programming Language Support

Software Development Flow

[Figure: software development flow, with each artifact marked as programmer defined or compiler generated.]

- Algorithm-level design flow
  - Algorithm prototyping: Matlab-Simulink/C, floating point
  - Algorithm kernel implementations: C/SPEX, fixed point
  - Timing/machine-independent asm code
- System-level design flow
  - Inputs: timing requirements, IO/control code, memory access patterns
  - System description: C/SPEX
  - System description asm code
  - Controller asm code and PE asm code

SPEX (Signal Processing EXtension)

- Implemented as a library extension to C
- System-level development
  - Support concurrent DSP kernel function definitions
  - Channel variables for inter-kernel communications
- Algorithm-level development
  - Native vector & matrix variables
  - Explicit DSP variable attribute definition
  - Native vector & matrix operations

SPEX Overview

- DSP variable extension
  - DSP variable objects: scalar object, SIMD object
  - DSP variable attributes: mode, bit-width, type
  - DSP variable operations: arithmetic & logic operations, predication, permutation
- DSP kernel extension
  - Concurrent variable objects: kernel object, channel object
  - Concurrent variable operations: concurrent kernel management, channel communication operations

SPEX Example Code: Viterbi ACS

Concurrent DSP kernel definitions

    void* acs(void*) {
        /* variable declaration */
        saturated char<64> metrics1, metrics2;
        saturated char<64> states;
        saturated char<64> t1, t2;

        while (!viterbi.stop()) {
            /* receiving data from BMC */
            metrics1 = bmc_to_acs.receive();
            metrics2 = bmc_to_acs.receive();

            /* add */
            metrics1 += states;
            metrics2 += states;

            /* compare and select */
            t1 = (metrics1(0,2,62),metrics2(0,2,62));
            t2 = (metrics1(1,2,63),metrics2(1,2,63));
            states(t1<t2) = t1;
            states(t1>=t2) = t2;

            /* sending data to TB */
            acs_to_tb.send(states);
        }
    }
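For readers without a SPEX toolchain, one iteration of the add-compare-select loop can be emulated in plain Python. This is a sketch under stated assumptions, not the SPEX semantics as defined by the authors: `saturated char` is taken to be a signed 8-bit value clamped to [-128, 127], `v(a,s,b)` is read as a Matlab-like strided selection with an inclusive end, `(x, y)` is read as vector concatenation, and the two predicated assignments are read as a lane-wise minimum. The function names are hypothetical.

```python
CHAR_MIN, CHAR_MAX = -128, 127  # assumed range of a saturated signed char


def sat_add(a, b):
    """Lane-wise add that clamps each result instead of wrapping around."""
    return [max(CHAR_MIN, min(CHAR_MAX, x + y)) for x, y in zip(a, b)]


def strided(vec, start, step, stop):
    """Select lanes start, start+step, ... up to and including stop (assumed)."""
    return vec[start:stop + 1:step]


def acs_step(metrics1, metrics2, states):
    """One add-compare-select iteration over 64 lanes; returns the new states."""
    # add
    m1 = sat_add(metrics1, states)
    m2 = sat_add(metrics2, states)
    # permutation: even lanes of both vectors, then odd lanes of both vectors
    t1 = strided(m1, 0, 2, 62) + strided(m2, 0, 2, 62)
    t2 = strided(m1, 1, 2, 63) + strided(m2, 1, 2, 63)
    # predication: keep t1 where t1 < t2, otherwise keep t2
    return [a if a < b else b for a, b in zip(t1, t2)]


if __name__ == "__main__":
    new_states = acs_step(list(range(64)), [0] * 64, [0] * 64)
    print(len(new_states), new_states[:4])
```

The three steps correspond to the three SPEX vector features the example is meant to show: SIMD arithmetic, SIMD permutation, and SIMD predication.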

SPEX Example Code: Viterbi ACS

Native SIMD variable definition with explicit attributes. SPEX variables support:

1. saturated/overflow
2. various variable bit-widths
3. vectors & matrices

Inter-kernel communication through channel operations. Channel types:

1. FIFO queue
2. Broadcast queue
3. Sync/control channel
4. Random-read FIFO queue

SPEX vector operations (Matlab-like C code) support:

1. SIMD arithmetic operations
2. SIMD permutation
3. SIMD predication
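The FIFO-queue channel type can be illustrated with concurrent kernels connected by bounded queues. The sketch below uses Python's standard `queue` and `threading` modules as stand-ins for SPEX channels and kernels. The `STOP` sentinel, the kernel function names, and the queue depth of 4 are assumptions made for illustration, and the ACS body is reduced to a lane-wise minimum.

```python
import queue
import threading

STOP = None  # hypothetical end-of-stream marker standing in for viterbi.stop()


def bmc_kernel(frames, out_ch):
    """Producer kernel: sends two metric vectors per frame, then the stop marker."""
    for m1, m2 in frames:
        out_ch.put(m1)
        out_ch.put(m2)
    out_ch.put(STOP)


def acs_kernel(in_ch, out_ch):
    """Consumer/producer kernel: receives two vectors, sends their lane-wise minimum."""
    while True:
        m1 = in_ch.get()
        if m1 is STOP:
            out_ch.put(STOP)
            return
        m2 = in_ch.get()
        out_ch.put([min(a, b) for a, b in zip(m1, m2)])


def run_pipeline(frames):
    """Run BMC -> ACS as threads and collect what the traceback stage would receive."""
    bmc_to_acs = queue.Queue(maxsize=4)  # bounded FIFO channel
    acs_to_tb = queue.Queue(maxsize=4)   # bounded FIFO channel
    kernels = [
        threading.Thread(target=bmc_kernel, args=(frames, bmc_to_acs)),
        threading.Thread(target=acs_kernel, args=(bmc_to_acs, acs_to_tb)),
    ]
    for k in kernels:
        k.start()
    received = []
    while True:
        item = acs_to_tb.get()
        if item is STOP:
            break
        received.append(item)
    for k in kernels:
        k.join()
    return received


if __name__ == "__main__":
    print(run_pipeline([([1, 5], [3, 2]), ([0, 0], [9, -1])]))
```

A bounded queue blocks the sender when full and the receiver when empty, which gives the kernels the same producer-consumer synchronization that `send` and `receive` provide in the SPEX example.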

Summary

- Hardware & software solutions for SDR
- Hardware
  - 4 dual-issue asymmetric SIMD processing elements
  - Consumes 200~300mW for 90nm
  - Meets the performance requirements for WCDMA & 802.11a
- Software
  - SPEX provides efficient DSP algorithm and system implementation

Questions?
