compiling for deeply embedded and heterogeneous signal ... · compiling for deeply embedded and...

Post on 16-Oct-2019

24 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Compiling for deeply embedded and heterogeneous signal processing systemsJeronimo CastrillonCfaed Chair for Compiler Construction (CCC)

5G Summit, Dresden, GermanySeptember 29, 2016

Multi-Processor/core Systems on Chip

q HW complexityq Increasing number of coresq Increasing heterogeneity

q Heterogeneity and size

© Prof. J. Castrillon. 5G Summit2

0 2 4 6 8

10 12 14 16

2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014

Num

ber o

f PEs

Year

PE Count in SoCs

OMAP Family

OMAP1 OMAP2OMAP3530

OMAP3640OMAP4430

OMAP4470

OMAP5430

Snapdragon Family

S1 QSD8650 S2 MSM8255S3 MSM8060

S4 MSM8960

S4 APQ8064

[Castrill14]

DMAs, sema-

phoresPMU

Peripherals

Communicationsupport

HW queues

Network Processor

Packet DMA

MEMsubsystem

NoC

A15L1

A15L1

L2A15L1

A15L1

VLIW DSP

L1,L2

TI Keystone IIEphiphany IV

Source: http://www.adapteva.com/docs/e64g401_datasheet.pdf

Multi-Processor/core Systems on Chip

q SW productivity gapq How to program these systems? q Meet performance/energy requirements

© Prof. J. Castrillon. 5G Summit3

DMAs, sema-

phoresPMU

Peripherals

Communicationsupport

HW queues

Network Processor

Packet DMA

MEMsubsystem

NoC

A15L1

A15L1

L2A15L1

A15L1

VLIW DSP

L1,L2

TI Keystone IIEphiphany IV

Source: http://www.adapteva.com/docs/e64g401_datasheet.pdf

Fragmented tools, different runtimes/OS on ARM and DSP, 500+ pages on APIs

Homogeneous! OpenMP support uses 1/3 of program memory!

5G

Massive MIMO

High-perf.Codes

Complex Beamform.

5G

Massive MIMO

High-perf.Codes

Complex Beamform.

Multi-Processor/core Systems on Chip

q SW productivity gapq How to program these systems? q Meet performance/energy requirements

© Prof. J. Castrillon. 5G Summit4

DMAs, sema-

phoresPMU

Peripherals

Communicationsupport

HW queues

Network Processor

Packet DMA

MEMsubsystem

NoC

A15L1

A15L1

L2A15L1

A15L1

VLIW DSP

L1,L2

TI Keystone IIEphiphany IV

Source: http://www.adapteva.com/docs/e64g401_datasheet.pdf

Fragmented tools, different runtimes/OS on ARM and DSP, 500+ pages on APIs

Homogeneous! OpenMP support uses 1/3 of program memory!

Need for software automation tools

© Prof. J. Castrillon. 5G Summit5

Programming flow

DMAs, sema-

phoresPMU

Peripherals

Communicationsupport

HW queues

Network Processor

Packet DMA

MEMsubsystem

NoC

A15L1

A15L1

L2A15L1

A15L1

VLIW DSP

L1,L2

Dataflow application

Architecture model

Non-functional specification

Analysis

Synthesis

Code generation

Property models (timing, energy, error, …)

PNargs_ifft_r.ID = 6U;PNargs_ifft_r.PNchannel_freq_coef = filtered_coef_rightPNargs_ifft_r.PNnum_freq_coef = 0U;PNargs_ifft_r.PNchannel_time_coef = sink_rightPNargs_ifft_r.channel = 1;sink_left = IPCllmrf_open(3, 1, 1);sink_right = IPCllmrf_open(7, 1, 1);PNargs_sink.ID = 7U;PNargs_sink.PNchannel_in_left = sink_leftPNargs_sink.PNnum_in_left = 0U;PNargs_sink.PNchannel_in_right = sink_rightPNargs_sink.PNnum_in_right = 0U;taskParams.arg0 = (xdc_UArg)&PNargs_src;taskParams.priority = 1;

ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtr&taskParams, &eb);

glob_proc_cnt++;hasProcess = 1;taskParams.arg0 = (xdc_UArg)&PNargs_fft_ltaskParams.priority = 1;

ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtrft_Templ, &taskParams, &eb);

glob_proc_cnt++;hasProcess = 1;taskParams.arg0 = (xdc_UArg)&PNargs_ifft_rtaskParams.priority = 1;

ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtrfft_Templ, &taskParams, &eb);

glob_proc_cnt++;hasProcess = 1;taskParams.arg0 = (xdc_UArg)&PNargs_sinktaskParams.priority = 1;

[Castrill14]

Dataflow programming models

q Graph representation of applicationsq Implicit repetitive execution of tasksq Good model for streaming applicationsq Good match for signal processing & multi-media applications

q Large body of research on multiple flavors of these models

6 © Prof. J. Castrillon. 5G Summit

Static Dynamic

Properties: No race conditions, determinism, strong/weak guarantees

Language: C for process networks

q FIFO Channels

q Processes & networks

© Prof. J. Castrillon. 5G Summit7

typedef struct { int i; double d; } my_struct_t;__PNchannel my_struct_t S;__PNchannel int A = {1, 2, 3}; /* Initialization */__PNchannel short C[2], D[2], F[2], G[2];

__PNkpn AudioAmp __PNin(short A[2]) __PNout(short B[2]) __PNparam(short boost){

while (1)__PNin(A) __PNout(B) {for (int i = 0; i < 2; i++)B[i] = A[i]*boost;

}}__PNprocess Amp1 = AudioAmp __PNin(C) __PNout(F) __PNparam(3);__PNprocess Amp2 = AudioAmp __PNin(D) __PNout(G) __PNparam(10);

[Sheng14]

Architecture model and constraints

q System model including:q Topology, interconnect, memoriesq Computation: uArch modelq Communication: cost functionsq MCA standardizing it to SHIM 2.0

q Constraintsq Time constraints

q Mapping constraintsq Platform constraints

© Prof. J. Castrillon. 5G Summit8

…… [Oden13]

1 ms

3 ms

1 ms

© Prof. J. Castrillon. 5G Summit9

Analysis and synthesis: Overview

Non-functional specification

CPN application

Architecture model

Analysis: Instrumentation, profiling, tracing

Sequential performance estimation

Mapping and scheduling

Mapping configuration

Time-annotated traces

Parallel perf. estimation

Increase resources

[Castrill13]

10

Example 1) From LabVIEW to Tomahawk Platform

© Prof. J. Castrillon. 5G Summit

Baseband processing application (data dependent execution paths)

hs-serial

hs-serial

hs-serial

hs-serial

hs-serial

parallelRouter(1,0)

FPGA-Interface

Router(1,1)

Router(0,1)

Router(0,0)

Duo-PE0

Duo-PE1

FEC

Duo-PE2

SDDuo-PE6

Duo-PE7

Duo-PE5

Duo-PE3 CM

ADPLL, PM

GT

ADPLL, PM

GT

ADPLL, PM

GT

AVS Contr.

UART-GPIO

ADPLL, PM

GT

ADPLL, PM

GT

ADPLL, PM

GT

ADPLL, PM

GT

ADPLL, PM

GT

ADPLL, PMGT

ADPLL, PMGT

ADPLL, PMGT

ADPLL

ADPLL

DDR-SDRAM-Interface

Tomahawk2_core

APP

Duo-PE4

ADPLL, PMGT

VDSPRISC

VDSPRISC

VDSPRISC

VDSPRISC

VDSPRISC

VDSPRISC

VDSPRISC

ADPLL, PMGT

VDSPRISC

Tomahawk Chip (accelerators for baseband processing)

Automatic code generation

Visit demo booth

[Arn

old1

3]

11

Example 2) LTE application mapping

© Prof. J. Castrillon. 5G Summit

Cor

tess

y: S

ilexi

caSo

ftwar

e So

lutio

ns G

mbH

12

Example 2) LTE application execution

© Prof. J. Castrillon. 5G Summit

Cor

tess

y: S

ilexi

caSo

ftwar

e So

lutio

ns G

mbHSchedule

13

Example 2) LTE application: Results

© Prof. J. Castrillon. 5G Summit

8,62

1,71 2,082,79

0

2

4

6

8

10

PeakPower(W)

2,62

1,55 1,641,99

0

0,5

1

1,5

2

2,5

3

AveragePower(W)

327,8

1930

966,5

348,2

0

500

1000

1500

2000

2500

ExecutionTime(ms)

1,164

0,3340,631

1,443

0

0,5

1

1,5

2

PowerEfficiency(1/(W*second))

Reference (LoadBalancer)

Constraint=2s

Constraint=1s

Constraint=0.35s

Cor

tess

y: S

ilexi

caSo

ftwar

e So

lutio

ns G

mbH

Acknowledgements

q Vodafone Chair for Mobile Communications Systems

q Silexica Software Solutions GmbH

q National Instruments

q German Cluster of Excellence: Center for Advancing Electronics Dresden (www.cfaed.tu-dresden.de)

© Prof. J. Castrillon. 5G Summit14

Thanks!Questions?

References

[Castrill14] J. Castrillon and R. Leupers, Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap. Springer, 2014

[Sheng14] W. Sheng, S. Schürmans, M. Odendahl, M. Bertsch, V. Volevach, R. Leupers, and G. Ascheid, “A compiler infrastructure for embedded heterogeneous MPSoCs”, Parallel Comput. 40, 2 (February 2014), 51-68

[Oden13] M. Odendahl, et al., “Split-cost communication model for improved MPSoC application mapping”, In International Symposium on System on Chip pp. 1-8, 2013

[Castrill13] J. Castrillon, R. Leupers, and G. Ascheid, “MAPS: Mapping concurrent dataflow applications to heterogeneous MPSoCs,” IEEE Transactions on Industrial Informatics, vol. 9, no. 1, pp. 527–545, 2013

[Arnold13] O. Arnold, et al. “Tomahawk - Parallelism and Heterogeneity in Communications Signal Processing MPSoCs”. TECS, 2013

© Prof. J. Castrillon. 5G Summit16

top related