the scalable communications core: a multi-core wireless ...the scalable communications core: a...

75
1 The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun DSP Architect Wireless Communications Lab Corporate Technology Group Intel Corporation [email protected] IEEE SCV Signal Processing Society, Feb. 9, 2009 IEEE SCV Signal Processing Society, Feb. 9, 2009

Upload: others

Post on 09-May-2020

19 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

1

The Scalable Communications Core: A Multi-Core Wireless

Baseband Prototype

Dr. Anthony (Tony) ChunDSP Architect

Wireless Communications LabCorporate Technology Group

Intel [email protected]

IEEE SCV Signal Processing Society, Feb. 9, 2009IEEE SCV Signal Processing Society, Feb. 9, 2009

Page 2: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society2

Agenda

� Introduction

�Motivation

�Architecture

�Programming

�Test Chip

� Implementation Examples

� Learnings

�Summary

Page 3: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society3

Introduction

� What is the Scalable Communications Core?– Flexible baseband

– Supports multiple communication standards

– Multi-core DSP

– Heterogeneous coarse-grained accelerators

– NoC interconnect

� Contributions– Developed area and energy-efficient architecture

– Developed programming technology

– Taped out first test chip

– Validated WiFi and WiMAX

– Mapped components of Bluetooth, DVB-H and GPS

Page 4: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society4

Why is this work interesting?

�Intersection of several disciplines–Communications

–Signal Processing

–Algorithms

–Architecture

–On chip interconnect

–Programming tools

Page 5: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society5

Agenda

� Introduction

�Motivation

�Architecture

�Programming

�Test Chip

� Implementation Examples

� Learnings

�Summary

Page 6: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society6

Near Field Communication

60GHzUWB BluetoothWiMAX

WiFi

A,B,G,N 3GDTV GPS

Problems:• Large Form factor•Many SKUs• RF interference

Vision: Connectivity anytime, anywhere to any network

Motivation: Too Many Radios in Future Platforms

Page 7: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society7

Agenda

� Introduction

�Motivation

�Architecture

�Programming

�Test Chip

� Implementation Examples

� Learnings

�Summary

Page 8: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society8

SCC Architecture Overview

� Heterogeneous coarse-grained Processing Elements

– Each is programmable within its domain

– Support for multiple threads within PEs

– Stream processing

– Distributed memory

� Network-on-Chip (NoC) interconnect

– Packet-based

– Direct connection to nearest neighbors

– Stringent latency requirements

� Data-driven distributed control

– Control embedded within packet header

– Microcontroller is used only for low rate configuration

Page 9: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society9

Flexibility Tradeoffs� Flexible architecture trades

three vectors

� ASIC: low flexibility, low power, small area

� Digital Signal Processor (DSP): high flexibility, high power, medium area

� FPGA: high flexibility, high power, large area

� SCC: medium flexibility, medium power, small area

For multiple basebands

High power

High flexibility

Large area

Conceptual

Page 10: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society10

Flexibility Tradeoffs� Flexible architecture trades

three vectors

� ASIC: low flexibility, low power, small area

� Digital Signal Processor (DSP): high flexibility, high power, medium area

� FPGA: high flexibility, high power, large area

� SCC: medium flexibility, medium power, small area

For multiple basebands

High power

Large area

High flexibility

Conceptual

Page 11: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society11

Flexibility Tradeoffs� Flexible architecture trades

three vectors

� ASIC: low flexibility, low power, small area

� Digital Signal Processor (DSP): high flexibility, high power, medium area

� FPGA: high flexibility, high power, large area

� SCC: medium flexibility, medium power, small area

For multiple basebands

High power

Large area

High flexibility

Conceptual

Page 12: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society12

Flexibility Tradeoffs� Flexible architecture trades

three vectors

� ASIC: low flexibility, low power, small area

� Digital Signal Processor (DSP): high flexibility, high power, medium area

� FPGA: high flexibility, high power, large area

� SCC: medium flexibility, medium power, small area

For multiple basebands

High power

Large area

High flexibility

Conceptual

Page 13: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society13

Flexibility Tradeoffs� Flexible architecture trades

three vectors

� ASIC: low flexibility, low power, small area

� Digital Signal Processor (DSP): high flexibility, high power, medium area

� FPGA: high flexibility, high power, large area

� SCC: medium flexibility, medium power, small area

�SCC solution offers best combination of energy efficiency, area efficiency and flexibility

For multiple basebands

High power

Large area

High flexibility

Conceptual

Page 14: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society14

Observation: Many Commonalities Between Wireless Standards

√√√√√√FIR / IIR

√Spreading

3G

√√√√CRC

√√√√√Randomization

√√√√Reed-Solomon Coding

√Turbo Coding

√√√√√Convolutional Coding

√√√√√Interleaving

√√√√√QAM Mapping

√√√√√Channel Estimation

√√√√√FFT

√√√√√Correlation

60GHzUWBDVB-TWiMaxWiFiAlgorithm

Wireless standards share many of the same DSP algorithms

Page 15: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society15

Architecture Considerations

� Large superset of protocols, but only a few are active concurrently

�Complex control procedures with strict timing requirements

�Pipelined data flow through protocol stack

�Must support variable data block sizes

�Must be able to constrain timing jitter and latency

Page 16: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society16

Solution: Heterogeneous Processors on a 3-ary 2-cube NoC

� Heterogeneous Processing Elements

– Digital Front End (DFE)

– Data-Stream Processing Engine (DPE)

– Interleaving (ILV)

– High-Speed Viterbi (HSV)

– Low Power Viterbi (LPV)

– Turbo-Decoder (TD)

– Convolutional Coder (CC)

– Reed-Solomon Decode(RSD)

– Reed-Solomon Encode(RSE)

� 3-ary 2-cube NoC Data Plane

� 32-bit ARC™ RISC Processor

� 32-bit OCP™ Control Plane

� PLME Mailboxes

Page 17: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society17

Solution: Scalable Communications Core

A Baseband Processor for WiFi, WiMAX, and DVB Multi-radio

� Heterogeneous Processing Elements

– Digital Front End (DFE)

– Data-Stream Processing Engine (DPE)

– Interleaving (ILV)

– High-Speed Viterbi (HSV)

– Low Power Viterbi (LPV)

– Turbo-Decoder (TD)

– Convolutional Coder (CC)

– Reed-Solomon Decoder (RSD)

– Reed-Solomon Encoder (RSE)

� 3-ary 2-cube NoC Data Plane

� 32-bit OCP™ Control Plane

� PLME Mailboxes

� 32-bit ARC™ RISC Processor

Heterogeneous Processing Elements on 3-ary 2-cube NoC

Page 18: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society18

Data Stream Processing Element

x1 x2

...

...

NCO1

#

Add

ress

Gen

erat

or

StackCore Data Router Adaptor

VLIW microcode

OCPRegs

To/From Mesh

EngineCore

� 16-bit microcontroller (StackCore)� Configuration

� Micro/Macro-sequencing

� Scalar arithmetic

� Programmed using C or assembly

� Complex DSP machine (EngineCore)� Highly reconfigurable data path

� Crossbar connections

� Complex mult, add, sub, shift, round, sat, trunc, conj.

� Split VLIW microcode –� Long Configuration Words

� Long Address Words

� Address Generators

� Stream programming model

Page 19: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society19

Reed-Solomon Decoding� Maximum throughput

� 84.2Mbps ATSC

� 105.8Mbps DVB-H (PHY)

� 22.9Mbps DVB-H (MPE)

� Up to 4 resident configurations

� GF(2m); m<=8

� T<=32

� g(x)=(x+1)(x+a)…(x+a2T-1)

� p(x)=c0xm+ c1x

m-1… cm-1x+1

� Up to 4 simultaneous streams

� Example supported standards:

� ATSC

� DVB-H

� 802.16de

� ITU-T J.83

� Integrated clock gating

� Fine grained power management

Input DMA & Codeword

Reassembly

Error Correction & Output DMA

Header Table RAM

Code Profile Registers

Switch Matrix

Codeword RAM

Codeword RAM

to mesh

from mesh

OCP Slave

Socket

control

data

Syndrome Calculator (Horner’s

Rule)

Key Equation Solver

(Berlekamp-Massey

Algorithm)

Error Locator & Evaluator

(Chien Search & Forney

Algorithm)

1/x LUT

Context RAM

1/x LUT

Codeword RAM

Codeword RAM

Codeword RAM

Codeword RAM

Codeword RAM

Codeword RAM

Page 20: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society20

Opportunity: Radio Composition using Shared Resources

� Smaller – reduce redundancy by sharing resources

� More Energy Efficient – reduced redundancy equates to lower leakage

� Scalable – can easily add new processing elements to cover emerging standards

� Wider Roaming – can compose radios on-the-fly based on signals detected in the air

� Improved Coexistence – wider array of future interference mitigation and coordination options

� Potential Time to Market Reduction – future drag and drop methodology for building a multi-radio baseband processor using well characterized processing elements on a flexible and scalable interconnect

Page 21: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society21

Dataflow & Resource Sharing: WiFi vs. Mobile WiMax TX Case

Shared Shared

ResourcesResources

Mobile WiMAX WiFi

Page 22: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society22

Dataflow & Resource Sharing:Fixed WiMAX vs. DVB RX Case

Fixed WiMAX DVB

Shared Shared

ResourcesResources

Page 23: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society23

Distributed MemoryMemory Bandwidth

0.000E+00

2.000E+10

4.000E+10

6.000E+10

8.000E+10

1.000E+11

1.200E+11

1.400E+11

802.11n 802.16e DVB

Acc

esse

s pe

r Sec

ond

Cumulative

Single Stream

Memory Bandwidth

0.000E+00

1.000E+09

2.000E+09

3.000E+09

4.000E+09

5.000E+09

6.000E+09

7.000E+09

802.11n 802.16e DVB

Acc

esse

s pe

r Sec

ond

Cumulative

Single Stream

Number of Ports vs. Clock Frequency

0100200300400500600700800900

1000

125 250 500

Clock Frequency (MHz)

Num

ber of

Por

ts R

equire

d

DVB

802.16e

802.11n

DSP + FEC DSP alone

Shared memory not practical – distributed memory required for bandwidth.

Number of Ports vs. Clock Frequency

0

10

20

30

40

50

60

125 250 500

Clock Frequency (MHz)

Req

uire

d Por

tsDVB

802.16e

802.11n

Page 24: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society24

Power vs. Flexibility

0.000E+00

5.000E+10

1.000E+11

1.500E+11

2.000E+11

2.500E+11

0 100 200 300

Flexibility Metric

Pow

er M

etric NoC

Sparse OCP matrix

Split OCP Matrix

Full OCP Matrix

Interconnect Considerations

DPE0

DPE1

DFE0

DFE1

DFE2

HSV

ILV

RSE

CC

RSD

LPV

TD

MAC0

MAC1

MAC2

DPE0

DPE1

DFE0

DFE1

DFE2

HSV

ILV

RSE

CC

RSD

LPV

TD

MAC0

MAC1

MAC2

DPE0

DPE1

DFE0

DFE1

DFE2

HSV

ILV

RSE

CC

RSD

LPV

TD

MAC0

MAC1

MAC2

DPE0

DPE1

DFE0

DFE1

DFE2

HSV

ILV

RSE

CC

RSD

LPV

TD

MAC0

MAC1

MAC2

Full Matrix (shared bus) Split Matrix (segmented bus) Sparse Matrix

3-ary 2-cube NoC

NoC provides lowest NoC provides lowest

power with maximum power with maximum

flexibilityflexibility

Page 25: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society25

NoC Issues

� Latency – caused by multiple streams contending for a shared interconnect

� Jitter – caused by time division multiplexing with variations in workload

Page 26: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society26

Using Fragmentation to Constrain Latency

Single Long Single Long

PacketPacket

Many Small FragmentsMany Small Fragments

Page 27: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society27

Using Time Division Multiplexing to Share Interconnect Segments

DSP blocks DSP blocks

for transferfor transferMultiplexed Multiplexed

fragmentsfragmentsDemultiplexed Demultiplexed

DSP blocksDSP blocks

Page 28: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society28

Using Timestamps to Constrain Jitter

0

1

1

3Input

Timestamps

Output

2

32

0 1f (x,y...z) 2

4

t0

4

t1 t2 t3 t4 t5

Time

reference

=

timestampfalse

true

outputinput

5

5

f (x,y...z)

3 4 5

Time

...

...

...

Router

south

north

west

0

east

Packets arrive with jitterPackets arrive with jitter Functions complete with jitterFunctions complete with jitter

Output Output

transmission is transmission is

precisely timedprecisely timed

Page 29: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society29

Data Driven Processing: Using a System of Tags to form Linked Lists

Stream IDStream ID

references a references a

context for context for

multimulti--stream stream

processingprocessing

Function IDFunction ID

references references

function function

parametersparameters

Output headerOutput header

contains route to contains route to

next PE, FID, & next PE, FID, &

SIDSID

Page 30: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society30

NoC Performance Requirements

1898per channel

(aggregate)

314DVB

336802.16e

1248802.11n

Throughput

(Mbps)

Protocol

0.6per channel

(7 hops)

5.8

4.2

PE Budget

NoC Budget

6.0

10.0

MAC Budget

PHY Budget

16.0802.11n SIFS

Latency

(µs)

Budget

Worst Case NoC Throughput:(RX coded soft-bits @8 bits/soft-bit)

Worst Case NoC Latency:(802.11n SIFS timing budget)

Page 31: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society31

Dimension Order Minimal Routing Satisfies Throughput Requirement

Page 32: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society32

Latency is Constrained by Packet Size Not by Choice of Routing Algorithm

Page 33: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society33

Agenda

� Introduction

�Motivation

�Architecture

�Programming

�Test Chip

� Implementation Examples

� Learnings

�Summary

Page 34: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society34

Programming Technology Challenges

�Vision: program the architecture as if it was a single DSP

–We are not there yet

�Programming of heterogeneous accelerators

–Degree of programmability varies i.e. DPE is more programmable than Viterbi decoder

–Compilers for DPE and ILVPE

–Other PEs are configured via registers

�Parallel programming model is in progress

Page 35: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society35

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

Mapping of 802.11n Rx

PPL (Parallel Programming Language) describes protocol mapping

Page 36: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society36

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

DFE

PPL (Parallel Programming Language) describes protocol mapping

Page 37: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society37

Programming SCC

�Map algorithms

DFE

Page 38: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society38

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�ProfileDPE

PPL (Parallel Programming Language) describes protocol mapping

Page 39: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society39

Programming SCC

�Map algorithms

DPE

FFT

Phase

Tracking

Soft Bits

FFT

Phase

Tracking

Soft Bits

Channel Estimation

Equalizer / Spatial Demapper

Page 40: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society40

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�ProfileILVPE

PPL (Parallel Programming Language) describes protocol mapping

Page 41: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society41

Programming SCC

�Map algorithms

ILVPE

Page 42: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society42

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

HVDPPL (Parallel Programming Language)

describes protocol mapping

Page 43: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society43

Programming SCC

�Map algorithms

HVD

Page 44: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society44

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

CCPE

PPL (Parallel Programming Language) describes protocol mapping

Page 45: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society45

Programming SCC

�Map algorithms

CCPE

Page 46: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society46

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

ARCPPL (Parallel Programming Language)

describes protocol mapping

Page 47: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society47

Programming SCC

�Map algorithms

ARC

Page 48: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society48

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

PPL (Parallel Programming Language) describes protocol mapping

Algorithms mapped to PEs

Page 49: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society49

Programming SCC

�Map algorithmsAlgorithms mapped to PEs

Page 50: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society50

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

DPE configuration for 64 pt radix-4 FFT

DPE Example: FFT

Page 51: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society51

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

DPE Example: FFT

Page 52: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society52

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

selector[iter:16] dataSelector1[index:4] = {{ iter + index * 16}};

selectorselector[iter:16] dataSelector1[index:4] = {{ [iter:16] dataSelector1[index:4] = {{ iter + index * 16}};iter + index * 16}};

Data stream Programming

Language

DPE Example: FFT

Page 53: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society53

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

selector[iter:16] dataSelector2[index:4] =

{{(iter%4) + (iter/4) * 16 + index * 4}};

selectorselector[iter:16] dataSelector2[index:4] = [iter:16] dataSelector2[index:4] =

{{(iter%4) + (iter/4) * 16 + index * 4}};{{(iter%4) + (iter/4) * 16 + index * 4}};

DPE Example: FFT

selector[iter:16] dataSelector1[index:4] = {{ iter + index * 16}};

selectorselector[iter:16] dataSelector1[index:4] = {{ [iter:16] dataSelector1[index:4] = {{ iter + index * 16}};iter + index * 16}};

Data stream Programming

Language

Page 54: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society54

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

selector[iter:16] dataSelector3[index:4] = {{iter * 4 + index}};

selectorselector[iter:16] dataSelector3[index:4] = [iter:16] dataSelector3[index:4] = {{iter * 4 + index}};{{iter * 4 + index}};

DPE Example: FFT

selector[iter:16] dataSelector1[index:4] = {{ iter + index * 16}};

selectorselector[iter:16] dataSelector1[index:4] = {{ [iter:16] dataSelector1[index:4] = {{ iter + index * 16}};iter + index * 16}};

Data stream Programming

Language

selector[iter:16] dataSelector2[index:4] =

{{(iter%4) + (iter/4) * 16 + index * 4}};

selectorselector[iter:16] dataSelector2[index:4] = [iter:16] dataSelector2[index:4] =

{{(iter%4) + (iter/4) * 16 + index * 4}};{{(iter%4) + (iter/4) * 16 + index * 4}};

Page 55: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society55

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

Page 56: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society56

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

Page 57: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society57

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

Page 58: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society58

Programming SCC

�Map algorithms

�Code algorithms

�Build

�Debug

�Simulate

�Profile

Page 59: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society59

Agenda

� Introduction

�Motivation

�Architecture

�Programming

�Test Chip

� Implementation Examples

� Learnings

�Summary

Page 60: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society60

Onega Test Chip

�65nm process

�Taped out in Dec 2007

�Subset of PEs included

Page 61: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society61

Onega Die Photo5.77 mm5.77 mm

4.26 mm4.26 mm

Page 62: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society62

Silicon Results� Process technology: 65nm

� Silicon area (excl. pads): 20.75mm2

� Program memory-DPE1, DPE2 and microcontroller: 96+96+32=224kbytes

� Data memory-DPE1, DPE2 and microcontroller: 16+256+8=280kbytes

� Logic gate count: 1.36M

� Supply Voltage: 1.1V Core, 3.3V I/O

� Package: WB-PBGA 31x31 mm

� Signal I/O Count: 332

� Measured Clock frequency: 233MHz

� Paper summarizing power measurements has been submitted

to ISVLSI 2009

Page 63: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society63

Areas of Processing Elements

0.73Configuration and controlARC

0.091InterconnectNoC

0.21CRC, randomization, codingCCPE

0.09RS encodingRSE

0.26RS decodingRSD

0.21Low power Viterbi decodingLPV

1.57puncture, interleave, multiplexILV

4.698k-point FFT/IFFT, chn eq, QAMDPE2

2.1364-point FFT/IFFT, chn eq, QAMDPE1

3.600AGC, resample, filter, detectDFE

Areamm2

ClassificationLabel

Page 64: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society64

Agenda

� Introduction

�Motivation

�Architecture

�Programming

�Test Chip

� Implementation Examples

� Learnings

�Summary

Page 65: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society65

Protocol Implementations (to date)

�802.11a/n –subset of MCSs (limited by Viterbi decoder rate

of 54 Mbps) tested on Onega silicon

–16 µs SIFS requirement met

�802.16e: range of modes validated on Onega silicon

�DVB-H: Rx on RTL simulator

�Bluetooth: modulation and demodulation on DPE simulator

�GPS: code acquisition and tracking on DPE simulator

Page 66: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society66

802.11a High Rate Inter-Symbol Control

MACARCCCLPVILVDPEDFERFIC

ccaRstccaRst

samplessts

ltssignal

psdu

signalsignal

signal

rxStart

psdupsdu

psdu

samplessamples

samples

rxCfgrxCfg

psdu

samples

psdupsdu

psdupsdu

psdu

rxEndrxEnd

...

... ... ... ... ...

Mailbox

NoCOCP

rxCfg

rxOn

samples

rxOff

ccaBusyccaBusy

htSig

htSightSig

htSig

sampleshtSightSts

htLts...

htLts

samples

samplessamples

...

samples

rxCfg

(packets)(dwords)

(messages)

...

ADC

(samples)

Symbols arrive at Symbols arrive at

250kHz rate tagged 250kHz rate tagged

by typeby type

Header delivered Header delivered

to ARCto ARC™™Payload Payload

delivered to delivered to

MACMAC

Page 67: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society67

802.11a Low Rate Inter-Frame Control

MACARCCCLPVILVDPEDFERFIC

ccaRstccaRst

samplessts

ltssignal

psdu

signalsignal

signal

rxStart

psdupsdu

psdu

samplessamples

samples

rxCfgrxCfg

psdu

samples

psdupsdu

psdupsdu

psdu

rxEndrxEnd

...

... ... ... ... ...

Mailbox

NoCOCP

rxCfg

rxOn

samples

rxOff

ccaBusyccaBusy

htSig

htSightSig

htSig

sampleshtSightSts

htLts...

htLts

samples

samplessamples

...

samples

rxCfg

(packets)(dwords)

(messages)

...

ADC

(samples)

ARCARC™™ initiates RX initiates RX

operationoperationARCARC™™ processes header and processes header and

adjusts configurationadjusts configurationARCARC™™

terminates RX terminates RX

operationoperation

Page 68: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society68

GFSK modulator / demodulator

Modulator (Transmitter)

Choose Tx sampling rate as 8 Ms/s.

Demodulator (Receiver)

Choose Rx sampling rate as 8 Ms/s.

p[k]S/P

3 bits

Look-up Table

8 samples

Z-1

P/Ss[n]

y[i], y[i+1], ���, y[i+7]

y(i+7)

Z-8

Z-8

I(n)

Q(n)Q(n-8)

I(n-8)+

-

X[n]

1/0

Z-8

++

Z-1

m[n]-

w[n]

Demodulator operates at 200 kbps (goal is 1 Mbps)

Demodulator operates at Demodulator operates at

200 kbps (goal is 1 Mbps)200 kbps (goal is 1 Mbps)

Page 69: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society69

GPS Code Acquisition

GPS C/A Code Gold Code 1 Gold Code 2 Gold Code 3

To ARC

Peak valueDoppler shiftCode shift

Logic

al

packets

DPE Processing

End of receiving GPS C/A Code data

End of processing Gold Code 1

FFT

FFT

C/A Code data

Gold Code

IFFT

Doppler correction

|.|2Find Max

loop over all doppler bins

Find Max

loop over all satellites (different Gold codes)

Onega Data Flow

Peak

Dopple

rIn

dex

Sate

llite #

Peak

Dopple

rIn

dex

Acquires four satellites in 9 ms.

Acquires four satellites in 9 Acquires four satellites in 9

ms.ms.

Page 70: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society70

Agenda

� Introduction

�Motivation

�Architecture

�Programming

�Test Chip

� Implementation Examples

� Learnings

�Summary

Page 71: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society71

Learnings

�Heterogeneous coarse-grained PE NoC architecture validated as real-time wireless baseband

�Area and power competitive with fixed solutions

�Stream programming model and tools developed

�General parallel programming tool for entire set of PEs remains a goal

Page 72: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society72

Agenda

� Introduction

�Motivation

�Architecture

�Programming

�Test Chip

� Implementation Examples

� Learnings

�Summary

Page 73: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society73

Summary

�We have demonstrated a flexible radio baseband

�Taped out test chip, programmed it, validated and measured power

�Next steps

–Implement additional protocols

–Improvements to the architecture

–Can our learnings be applied to other signal processing applications?

Page 74: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society74

Acknowledgements

Aliaksei Chapyzhenka, Anton Bobkov, Vladimir Pudovkin, Veronica Mikheeva, Alexey Kostyakov, Tatiana Stounina, Victoria Slavinskaya, Mariano Aguirre, Jorge Carballido, Arturo Veloz, David Arditti, Brando Perez Esparza, Victor Rivera

Alvarez, Carlos Ornelas, Luis Cuellar, and Edgar Borrayo Sandoval, Jeffrey Hoffman, Thomas

Tetzlaff, Frank Carroll, Kyle McCanta, Jenny Chang, Jane Lin, Kapil Gulati, David Bormann, Denise Souza, Kirk Skeba, Ernest Tsui, Inching Chen

Page 75: The Scalable Communications Core: A Multi-Core Wireless ...The Scalable Communications Core: A Multi-Core Wireless Baseband Prototype Dr. Anthony (Tony) Chun ... Algorithm WiFi WiMax

IEEE Signal Processing Society75

Q&A