parallelization strategies for wireless baseband · © imec 2010 / confidential our modulo...

26
© IMEC 2010 / CONFIDENTIAL

Upload: others

Post on 27-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL

Page 2: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL

PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND

MARTIN PALKOVIC

ARTISTDESIGN MPSOC CLUSTER MEETING

7TH JULY 2010

Page 3: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 3

WIRELESS DOMAIN CONTAINS “SEA” OF STANDARDS; AND CUSTOMERS WANT TO HAVE THEM ALL IN ONE DEVICE

Page 4: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 4

FOR MULTI-MODE >100MBPS WIRELESS COMMUNICATION SOFTWARE DEFINED RADIO (SDR) SOLUTIONS ARE CRUCIAL

Energy Efficiency

Flex

ibili

ty

DSP

ASIP

ASIC

Software Defined Radio

Page 5: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 5

THERE IS A LONG WAY TILL STANDARD SPECIFICATION IS MAPPED ON THE SDR PLATFORM

WLAN

3GPP LTE

WiMAX

Standard specification SDR platform

ADRES_1

ADRES_2

L2 RAM ARM 926

DFE_1

DFE_2

DFE_3

AD

R1_IM

EM

A

DR

1_IME

M

Page 6: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 6

OUTLINE

IMEC SDR BEAR platform

Baseband SW mapping flow

-  Sequential MATLAB code

-  Sequential C code

-  Parallel C code

IMEC SDR COBRA platform

Conclusions

Page 7: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 7

OUR PLATFORM DEALS WITH THE STREAM FROM THE ANTENNA RECEIVER TILL THE FORWARD ERROR CORRECTION ENGINE

ARM

Data SPM

ADRES2

Data SPM

ADRES1

FEC DFE

L2

DFE = Digital Front-End ARM = ARM processor FEC = Forward Error Correction

SPM = Scratch-Pad Memory L2 = Layer 2 memory ADRES = ADRES processor

Page 8: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 8

TO OBTAIN ENERGY EFFICIENCY ON THE PLATFORM, COMBINATION OF DSP, ASIP AND ASIC IS USED

Energy Efficiency

Flex

ibili

ty

Forward Error Correction FEC ASIC

Synchronization Signal detection

DFE ASIP

Modulation Demodulation

ADRES,ARM proc.

Page 9: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 9

THE BASEBAND PART OF OUR SDR PLATFORM, WE WILL FOCUS ON, FEATURES ALL DIFFERENT LEVELS OF PARALLELISM

ARM

Data SPM

ADRES2

Data SPM

ADRES1

a ib c id

e if g ih +

FU

FU FU

FU

RF

Page 10: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 10

OUTLINE

IMEC SDR BEAR platform

Baseband SW mapping flow

-  Sequential MATLAB code

-  Sequential C code

-  Parallel C code

IMEC SDR COBRA platform

Conclusions

Page 11: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 11

OUTLINE

IMEC SDR BEAR platform

Baseband SW mapping flow

-  Sequential MATLAB code

-  Sequential C code

-  Parallel C code

IMEC SDR COBRA platform

Conclusions

Page 12: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 12

KERNEL SELECTION FROM THE CODE GENERATED BY MATLAB2C CONVERSION TOOL

fft.c

chan_comp.c

demap.c

WLAN.c

Page 13: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 13

DATA LEVEL PARALLELISM (DLP) AND INSTRUCTION LEVEL PARALLELISM (ILP) ARE EXPLOITED IN THE KERNEL CODE

 DLP

▸  Intrinsics

 ILP

▸ DRESC Compiler

FU

RF

ADRES

V4HI_ADD

64

64 64

d e f g a b c d

a+d b+e c+f d+g

Page 14: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL

OUR MODULO SCHEDULING ALGORITHM IN THE DRESC COMPILER IS THE KEY TO EFFICIENT ILP EXPLOITATION

fu1 fu3 fu4 fu2 t=0

t=1

t=2..n-2

t=n-1

t=n

Modulo-Scheduling (Space-time representation)

+ *

/ +

steady state (kernel)

prologue

epilogue

Initiation Interval (II) = 1 Pipeline stages = 3 4 operations/cycle for kernel

+

* /

+

Matrix

Loop body: Where to place an operation? (placement) When to schedule an operation? (scheduling) How to connect operations? (routing)

for (i=0;i<n;i++) z = (x+y)*a + (x+y)/b;

DFG

+

* /

+

*

+

/

+

+

* /

+

x y

a b

z

?

14 Martin Palkovic

© imec 2010

Page 15: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 15

OPTIMIZED KERNELS ARE USED IN THE SKELETON CODE WHICH WAS OBTAINED BY MATLAB2C CONVERSION TOOL

WLAN.c

demapO .DRE

chan_ compO .DRE

fftO .DRE

Optimized kernel library Skeleton code

Page 16: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 16

OUTLINE

IMEC SDR BEAR platform

Baseband SW mapping flow

-  Sequential MATLAB code

-  Sequential C code

-  Parallel C code

IMEC SDR COBRA platform

Conclusions

Page 17: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 17

STILL, WLAN 40 MHZ MIMO NOT REAL-TIME FOR 1 ADRES. TASK LEVEL PARALLELISM (TLP) IS NOT EXPLOITED YET!

ARM

Data SPM

ADRES2

Data SPM

ADRES1

Preamble Symb1 Symb2 Symb3 Symb4 … 12µs 4µs 4µs

5.8µs

Input stream:

Decoding time:

Page 18: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 18

THE APPLICATION OFFERS MULTIPLE WAYS FOR PARALLELIZATION. WHICH ONE TO CHOOSE?

Preamble Symb1 Symb2 Ant1

Ant2

Symb3 Symb4 …

Preamble Symb1 Symb2 Symb3 Symb4 …

time&freq offset ↓

FFT ↓

chan.est. ↓

inv.

time&freq offset ↓

FFT ↓

tracking ↓

chan.comp. ↓

demap.

to next symbol

to FEC

Page 19: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 19

ANALYZING THE COMMUNICATION FOR POTENTIAL SPLITS IS ALWAYS A GOOD THING TO DO!

Preamble Symb1 Symb2 Ant1

Ant2

Symb3 Symb4 …

Preamble Symb1 Symb2 Symb3 Symb4 …

Preamble Symb1 Symb2 Ant1

Ant2

Symb3 Symb4 …

Preamble Symb1 Symb2 Symb3 Symb4 …

Per antenna split 100MB/s (1/3 bus)

Odd-even symbol split 2.5MB/s (<1%)

Page 20: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 20

CREATION OF THE PARALLEL CODE AND FAST EVALUATION ARE HARD THINGS FOR THE DESIGNER – TOOL SUPPORT NEEDED!

MPA

./*.c par.spec

./par/*.c

Page 21: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 21

HIGH-LEVEL SIMULATOR PROVIDES FAST INFORMATION ABOUT THE QUALITY OF OUR PARALLELIZATION

arp.data ./par/*.c

HLSim lib

Page 22: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 22

WE IMPLEMENTED THE FIFO COMMUNICATION VIA THE VLIW AND HALT INTERRUPTS OF ADRES ENGINES

ARM

Data SPM

ADRES2

Data SPM

ADRES1

Odd

sym

bols

Eve

n sy

mbo

ls

1 2

3 4

Pre

ambl

e

ADRES1 ADRES2

Halt int. ADR.2 VLIW int ADR.1 VLIW int ADR.2

Halt int ADR.1 VLIW int ADR.1, Halt int ADR.2

… … …

VLIW Int ADR.1

Copy data

Release ADR.2 (if halted)

Page 23: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 23

OUTLINE

IMEC SDR BEAR platform

Baseband SW mapping flow

-  Sequential MATLAB code

-  Sequential C code

-  Parallel C code

IMEC SDR COBRA platform

Conclusions

Page 24: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL

COBRA PLATFORM LEARNED FROM THE DRAWBACKS OF BEAR

Martin Palkovic © imec 2010 24

Page 25: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL Martin Palkovic

© imec 2010 25

CONCLUSIONS

 Multi-mode >100Mbps wireless communication requires multi-processor software-defined radio (SDR) solutions

 Parallelization for SDR should be exploited on each level – SIMD (DLP), ILP and TLP

 Odd-even symbol group split seems to be the right choice for TLP for 40 MHz MIMO SDM OFDM

 ILP was explored by our robust C DRESC compiler and TLP split was supported by our MPA parallelization tool

 With combination of DLP, ILP and TLP we achieved real-throughput behavior of 40 MHz MIMO SDM OFDM

 Cobra platform is the natural evolution of Bear to >500Mbps, multistream and run-time reconfiguration

Page 26: PARALLELIZATION STRATEGIES FOR WIRELESS BASEBAND · © imec 2010 / confidential our modulo scheduling algorithm in the dresc compiler is the key to efficient ilp exploitation fu1

© IMEC 2010 / CONFIDENTIAL