Analysis and software synthesis of KPN applications

Jeronimo Castrillon
Chair for Compiler Construction
TU Dresden
jeronimo.castrillon@tu-dresden.de

DREAMS Seminar
UC Berkeley, CA. October 22nd 2015

Acknowledgements

- Institute for Communication Technologies and Embedded Systems (ICE), RWTH Aachen
- Silexica Software Solutions GmbH
- German Cluster of Excellence: Center for Advancing Electronics Dresden (www.cfaed.tu-dresden.de)

© J. Castrillon. DREAMS Seminar, Oct. 2015

Context: cfaed

- Programming flows for future heterogeneous systems

[Figure: SW layers and programming models on top of hardware abstractions. Panel labels: chemical processing [Voigt14]; reconfigurable HW with Si-nanowires [Trommer15]; plasmonic wave-guides. Courtesy: Thorsten Lars-Schmidt.]

Outline

- Motivation
- Input specs
- Analysis and synthesis
- Code generation and evaluation
- Summary


Multi-processor systems and applications

- HW complexity
  - Increasing number of cores
  - Increasing heterogeneity
- Multi-cores everywhere
  - Ex.: smartphones, tablets and e-readers
- SW "complexity"
  - No longer a simple control loop
  - Need expressive models

[Figure: PE count in SoCs. Number of PEs (axis 0 to 16) per year (2004 to 2014) for the OMAP family (OMAP1, OMAP2, OMAP3530, OMAP3640, OMAP4430, OMAP4470, OMAP5430) and the Snapdragon family (S1 QSD8650, S2 MSM8255, S3 MSM8060, S4 MSM8960, S4 APQ8064).] [Castrill14] [EETimes11]

SW productivity gap

- SW-productivity gap: complex SW for ever more complex HW
  → Cannot keep pace with requirements
  → Cannot leverage available parallelism
- Difficult to reason about time constraints
- Even more difficult to reason about energy consumption

[Figure: evolution (log scale) over time. One curve grows 2x every 10 months, the other 1.2x every 10 months; the widening distance between them is labeled "Gap".]

- Need domain-specific programming tools and methodologies!
  → Parallelizing sequential codes
  → Parallel abstractions and mapping methods

In this talk: one such flow for KPN applications (multimedia and signal processing domains)
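As a rough worked example of how quickly the two growth rates on this slide diverge, the ratio between the curves can be written out. This assumes both curves start from the same value at t = 0 months, which the slide does not state; it only gives the rates.

```latex
\[
  \mathrm{gap}(t) \;=\; \frac{2^{\,t/10}}{1.2^{\,t/10}}
  \;=\; \left(\frac{5}{3}\right)^{t/10},
  \qquad
  \mathrm{gap}(60) \;=\; \left(\frac{5}{3}\right)^{6} \;\approx\; 21.4 .
\]
```

Under that assumption, the faster curve is more than twenty times ahead after five years.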


Programming flow: Overview

[Figure: tool flow. Inputs: KPN application, architecture model, non-functional specification. Stages: analysis, synthesis, code generation. Supported by property models (timing, energy, error, …). Platform sketch labels: A15 cores with L1, L2, VLIW DSP with L1/L2, NoC, MEM subsystem, network processor, packet DMA, HW queues, communication support, DMAs, semaphores, PMU, peripherals.]

Generated code excerpt (cropped on the slide; "…" marks text cut off in the source):

    PNargs_ifft_r.ID = 6U;
    PNargs_ifft_r.PNchannel_freq_coef = filtered_coef_right;
    PNargs_ifft_r.PNnum_freq_coef = 0U;
    PNargs_ifft_r.PNchannel_time_coef = sink_right;
    PNargs_ifft_r.channel = 1;
    sink_left = IPCllmrf_open(3, 1, 1);
    sink_right = IPCllmrf_open(7, 1, 1);
    PNargs_sink.ID = 7U;
    PNargs_sink.PNchannel_in_left = sink_left;
    PNargs_sink.PNnum_in_left = 0U;
    PNargs_sink.PNchannel_in_right = sink_right;
    PNargs_sink.PNnum_in_right = 0U;
    taskParams.arg0 = (xdc_UArg)&PNargs_src;
    taskParams.priority = 1;
    ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtr…, &taskParams, &eb);
    glob_proc_cnt++;
    hasProcess = 1;
    taskParams.arg0 = (xdc_UArg)&PNargs_fft_l;
    taskParams.priority = 1;
    ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtr…ft_Templ, &taskParams, &eb);
    glob_proc_cnt++;
    hasProcess = 1;
    taskParams.arg0 = (xdc_UArg)&PNargs_ifft_r;
    taskParams.priority = 1;
    ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtr…fft_Templ, &taskParams, &eb);
    glob_proc_cnt++;
    hasProcess = 1;
    taskParams.arg0 = (xdc_UArg)&PNargs_sink;
    taskParams.priority = 1;


Kahn Process Networks (KPNs)

- Graph representation of applications
- Processes communicate only over FIFO buffers
- Good model for streaming applications
- Good match for signal processing & multimedia
- Example: stereo digital audio filter

[Figure: process network of the stereo digital audio filter with the processes src, fft_l, fft_r, filter_l, filter_r, ifft_l, ifft_r and sink. The slide background shows excerpts of Clang-based C++ transformation code (PnTransform) that are not recoverable from the extraction.]
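The channel semantics named on this slide (processes communicate only over FIFO buffers, and a read on an empty channel blocks) can be sketched in plain C. This is only an illustration: `fifo_t`, `fifo_read`, `fifo_write` and `FIFO_CAPACITY` are hypothetical names, not part of CPN or of the tool flow, and instead of suspending the calling process the functions simply report that the operation would block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bound. Pure KPNs assume unbounded FIFOs; any real
 * implementation has to pick a finite size. */
#define FIFO_CAPACITY 4

typedef struct {
    short  data[FIFO_CAPACITY];
    size_t head;   /* index of the oldest token */
    size_t count;  /* number of tokens currently stored */
} fifo_t;

void fifo_init(fifo_t *f)
{
    f->head = 0;
    f->count = 0;
}

/* Returns false if the channel is full; a real runtime would block
 * the writer here until the reader frees a slot. */
bool fifo_write(fifo_t *f, short token)
{
    if (f->count == FIFO_CAPACITY)
        return false;
    f->data[(f->head + f->count) % FIFO_CAPACITY] = token;
    f->count++;
    return true;
}

/* Returns false if the channel is empty; in a KPN the reader blocks
 * here, and it cannot test for emptiness and do something else. */
bool fifo_read(fifo_t *f, short *token)
{
    if (f->count == 0)
        return false;
    *token = f->data[f->head];
    f->head = (f->head + 1) % FIFO_CAPACITY;
    f->count--;
    return true;
}
```

Tokens come out in the order they were written, which is what makes the network's output independent of how the processes are scheduled.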

Language: C for process networks (CPN)

- FIFO channels
- Processes & networks

    typedef struct { int i; double d; } my_struct_t;
    __PNchannel my_struct_t S;
    __PNchannel int A = {1, 2, 3}; /* Initialization */
    __PNchannel short C[2], D[2], F[2], G[2];

    __PNkpn AudioAmp __PNin(short A[2]) __PNout(short B[2]) __PNparam(short boost)
    {
        while (1)
            __PNin(A) __PNout(B) {
                for (int i = 0; i < 2; i++)
                    B[i] = A[i]*boost;
            }
    }
    __PNprocess Amp1 = AudioAmp __PNin(C) __PNout(F) __PNparam(3);
    __PNprocess Amp2 = AudioAmp __PNin(D) __PNout(G) __PNparam(10);

[Sheng14]
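To make the AudioAmp example more concrete, here is a plain-C sketch of what one iteration of the `__PNin(A) __PNout(B)` block computes: one two-sample token is consumed, scaled, and one token is produced. The names `audio_amp_t` and `audio_amp_fire` are hypothetical and do not reflect the code the CPN compiler actually emits; the struct only mirrors the idea that each `__PNprocess` instance carries its own `__PNparam` value.

```c
/* Hypothetical per-instance state: one per __PNprocess declaration. */
typedef struct {
    short boost;  /* value passed through __PNparam */
} audio_amp_t;

/* One iteration of the process body: read one token (two samples),
 * scale it, write one token. Overflow behaves as in the original
 * expression B[i] = A[i]*boost (result converted back to short). */
void audio_amp_fire(const audio_amp_t *self, const short in[2], short out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (short)(in[i] * self->boost);
}
```

With this shape, `Amp1` and `Amp2` from the slide would be two `audio_amp_t` values with `boost` set to 3 and 10, sharing the same function.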

Architecture model

- System model including:
  - Topology, interconnect, memories
  - Computation: cost tables (as backup)
  - Communication: cost function (no contention)
- Example: Texas Instruments Keystone

… [Oden13]

Architecture model: Communication

- Piecewise curve-fitting from measurements

[Oden13]
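A piecewise communication cost function of the kind mentioned here could be evaluated as follows. This is a minimal sketch assuming piecewise-linear segments over the transfer size; the segment boundaries and coefficients in `example_model` are placeholders chosen for illustration, not measurements from [Oden13], and all names are hypothetical.

```c
#include <stddef.h>

/* One fitted segment: cost(bytes) = offset + slope * bytes,
 * valid for transfer sizes up to and including max_bytes. */
typedef struct {
    unsigned max_bytes;
    double   offset;  /* fixed cost in cycles */
    double   slope;   /* cycles per byte */
} cost_segment_t;

/* Placeholder coefficients (not measured values). The segments are
 * chosen so that the function is continuous at 256 and 4096 bytes. */
const cost_segment_t example_model[] = {
    {  256u,        100.0, 0.5   },
    { 4096u,        164.0, 0.25  },
    { 0xFFFFFFFFu,  676.0, 0.125 },
};

/* Looks up the segment covering `bytes` and evaluates it.
 * Requires n >= 1; sizes beyond the last segment are extrapolated
 * with the last segment's coefficients. */
double comm_cost(const cost_segment_t *segs, size_t n, unsigned bytes)
{
    for (size_t i = 0; i < n; i++)
        if (bytes <= segs[i].max_bytes)
            return segs[i].offset + segs[i].slope * bytes;
    return segs[n - 1].offset + segs[n - 1].slope * bytes;
}
```

A mapping tool would keep one such table per communication primitive and query it with the token size of each channel.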

Architecture model: Communication (2)

- Models for Networks on Chip (NoCs)
- Channels can be mapped to
  - Local scratchpad (producer or consumer)
  - Global SDRAM

[Figure: Tomahawk2 core block diagram. Labels: Duo-PE0 to Duo-PE7 (each VDSP + RISC), routers (0,0), (0,1), (1,0) and (1,1), APP, CM, FEC, SD, FPGA interface, DDR-SDRAM interface, UART-GPIO, AVS controller, ADPLL/PM/GT blocks, hs-serial and parallel links.] [Arnold13]

[Figure: static access costs. Bar chart, axis 0 to 250, values 114, 203, 140 and 229.]

[Figure: cost model & measurement. Horizontal axis 0 to 8192, vertical axis 0 to 2000.]

Constraints

- Timing constraints
  - Process throughput
  - Latencies along paths
  - Time triggering
- Mapping constraints
  - Processes to processors
  - Channels to primitives
- Platform constraints
  - Subset of resources (processors or memories)
  - Utilization

[Figure: example timing annotations of 1 ms, 3 ms and 1 ms.]

Algorithmic description

- Extended application specification
  - Selected processes are algorithmic kernels with algorithmic parameters
- Extended platform model
  - SW/HW accelerated kernels and their implementation parameters

[Castrill10, Castrill11]

[Figure labels: FFT; HWACC; Points; Data format; Latency equations; Interfacing; Types & parameters.]

Analysis and synthesis: Overview

[Figure: tool flow. Inputs: CPN application, architecture model, non-functional specification. Boxes: analysis (instrumentation, profiling, tracing); sequential performance estimation; time-annotated traces; mapping and scheduling; mapping configuration; parallel perf. estimation; increase resources.]

Tracing: Dealing with dynamic behavior

- KPNs do not have firing semantics
- White-box model of processes: source code analysis and tracing
- Tracing: instrumentation, token logging and event recording

Flow steps: Tracing → Seq. perf. estimation → Mapping → Par. perf. estimation → Resources

[Figure annotation: Elapsed time between events?]

Sequential performance estimation

- Fine-grained: sometimes within basic blocks
- IR-level instrumentation
- Cost tables for different architectures
- Execution counts in between events
- Advanced: emulate the effect of target compilers and back-annotate to the IR
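The combination of execution counts and per-architecture cost tables can be illustrated with a few lines of C. This is a sketch under simplifying assumptions: operations are grouped into five hypothetical classes, each class has a fixed cycle cost per processor type, and pipeline or compiler effects (the "advanced" item above) are ignored. The names and table contents are made up for the example.

```c
/* Hypothetical operation classes counted by the instrumentation. */
enum { OP_ALU, OP_MUL, OP_LOAD, OP_STORE, OP_BRANCH, OP_KINDS };

/* One cost table per processor type: cycles per operation class. */
typedef struct {
    const char *name;
    unsigned    cycles[OP_KINDS];
} cost_table_t;

/* Estimated time of the code between two trace events:
 * sum over all operation classes of count * cost. */
unsigned long estimate_cycles(const cost_table_t *table,
                              const unsigned long counts[OP_KINDS])
{
    unsigned long total = 0;
    for (int op = 0; op < OP_KINDS; op++)
        total += counts[op] * table->cycles[op];
    return total;
}
```

The same counts can then be priced with a different table for every processor type in the platform, which gives the per-processor timing annotations needed for heterogeneous mapping.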

Sequential performance estimation (2)

- Processor models

[Figure label: Execution counts, branch stats and execution traces.] [Eusse14]

Sequential performance estimation (3)

- Abstract models for compiler emulation
  - Resources (functional units, register banks)
  - Operations (pipeline effects, SIMD, addressing, predicated exec.)
  - SW-related costs (calling convention, register spilling, C-lib calls)

Processor model (example from the figure):

  Functional units         FU1: ADD, SUB, MUL, OR
                           FU2: ASR, LSL, LSR, ZEX
                           FU3: LD, ST, CMP, BR
  Register files           RF1(16,32), RF2(8,32)
  Conventions              Prologue: 1+2*Xarg; Epilogue: 4; Branch overhead: 3
  External library costs   malloc: 9+0.3*Nsize; fsqrt: 235
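The software-related costs listed in the processor model can be turned into a small cost calculation. The constants below are the ones shown on the slide; how they are combined is an assumption made for this sketch, as is reading `Xarg` as the number of call arguments and `Nsize` as the allocation size. All function names are hypothetical.

```c
/* Constants taken from the slide's example processor model. */
#define EPILOGUE_COST    4.0
#define BRANCH_OVERHEAD  3.0
#define FSQRT_COST       235.0

/* Prologue: 1 + 2*Xarg, with Xarg assumed to be the argument count. */
double prologue_cost(unsigned num_args)
{
    return 1.0 + 2.0 * num_args;
}

/* malloc: 9 + 0.3*Nsize, with Nsize assumed to be the request size. */
double malloc_cost(unsigned nsize)
{
    return 9.0 + 0.3 * nsize;
}

/* Assumed combination: body cycles plus per-call prologue/epilogue,
 * per-branch overhead and a fixed cost per fsqrt library call. */
double segment_cost(double body_cycles, unsigned calls, unsigned args_per_call,
                    unsigned branches, unsigned fsqrt_calls)
{
    return body_cycles
         + calls * (prologue_cost(args_per_call) + EPILOGUE_COST)
         + branches * BRANCH_OVERHEAD
         + fsqrt_calls * FSQRT_COST;
}
```

For instance, a 100-cycle body with one two-argument call, five branches and one `fsqrt` comes to 100 + 9 + 15 + 235 = 359 cycles under these assumptions.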

Parallel performance estimation

- Discrete-event simulator to evaluate a solution
  - Replay traces according to the mapping
  - Extract costs from the architecture file (NoC modeling, context switches, communication)

[Figure: the Trace Replay Module (TRM) takes the mapping configuration, the time-annotated traces and the architecture model. Outputs: Gantt chart (showing context switches and blocked time), platform utilization, channel profiles, ...]
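The idea of replaying time-annotated traces can be sketched for the smallest possible case: one producer and one consumer, each on its own processor, connected by a single bounded channel. This is a timestamp-propagation simplification, not the TRM itself; it has no scheduler, no context switches and no NoC model, and `replay`, `event_t` and `MAX_TOKENS` are hypothetical names.

```c
#include <stddef.h>

typedef enum { EV_COMPUTE, EV_READ, EV_WRITE } ev_kind_t;

/* One trace event; `cycles` is only used by EV_COMPUTE. */
typedef struct { ev_kind_t kind; long cycles; } event_t;

#define MAX_TOKENS 64  /* hypothetical limit on tokens per replay */

/* Replays a producer trace (compute/write) and a consumer trace
 * (compute/read) over one channel of the given capacity. Each write
 * costs comm_cost cycles. Returns the makespan, or -1 if the traces
 * deadlock, contain an unexpected event, or exceed MAX_TOKENS. */
long replay(const event_t *prod, size_t np, const event_t *cons, size_t nc,
            size_t capacity, long comm_cost)
{
    long avail[MAX_TOKENS] = {0};     /* time token k entered the channel */
    long consumed[MAX_TOKENS] = {0};  /* time token k was read */
    size_t nwritten = 0, nread = 0, ip = 0, ic = 0;
    long tp = 0, tc = 0;              /* local clocks of both processes */
    int progress = 1;

    while (progress && (ip < np || ic < nc)) {
        progress = 0;
        while (ip < np) {             /* advance the producer */
            if (prod[ip].kind == EV_COMPUTE) {
                tp += prod[ip].cycles;
            } else if (prod[ip].kind == EV_WRITE) {
                if (nwritten >= MAX_TOKENS)
                    return -1;
                if (nwritten >= capacity) {
                    size_t freed = nwritten - capacity;
                    if (freed >= nread)
                        break;        /* blocking write: no free slot yet */
                    if (consumed[freed] > tp)
                        tp = consumed[freed];
                }
                tp += comm_cost;
                avail[nwritten++] = tp;
            } else {
                return -1;
            }
            ip++;
            progress = 1;
        }
        while (ic < nc) {             /* advance the consumer */
            if (cons[ic].kind == EV_COMPUTE) {
                tc += cons[ic].cycles;
            } else if (cons[ic].kind == EV_READ) {
                if (nread >= nwritten)
                    break;            /* blocking read: token not there yet */
                if (avail[nread] > tc)
                    tc = avail[nread];
                consumed[nread++] = tc;
            } else {
                return -1;
            }
            ic++;
            progress = 1;
        }
    }
    if (ip < np || ic < nc)
        return -1;                    /* neither side can continue */
    return tp > tc ? tp : tc;
}
```

Replaying the same pair of traces with different capacities shows the effect of buffer sizing on the makespan: with a producer that writes twice and then computes for 100 cycles, and a consumer that computes for 30 cycles before reading twice, a capacity of 1 delays the second write until the first read and yields 130 cycles, while a capacity of 2 yields 100.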

Trace-based synthesis

- Synthesis based on code and trace analysis (using simple heuristics)
  - Mapping of processes and channels
  - Scheduling policies
  - Buffer sizing

[Castrill10b, Castrill13]

[Figure: time-annotated traces, the non-functional specification and the architecture model feed "Mapping, scheduling, buffer sizing", which yields a mapping configuration.]

Trace-based algorithms

- Event traces can be represented as large dependence graphs

[Figure: trace dependence graph. Edge labels: blocking read; blocking write, size(chan. 2) = 1; blocking write, size(chan. 2) = 2.]

Trace-based algorithms (2)

- Event traces can be represented as large dependence graphs
- Possible to reason about
  - Channel sizes and memory allocation
  - Mapping and scheduling onto heterogeneous processors

[Figure: trace dependence graph for size(chan. 2) = 2, size(chan. 1,3) = 1.]
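One simple way to reason about channel sizes from a trace is to track the fill level of a channel along the recorded order of its write and read events. The sketch below assumes the events of one channel are given as a string of 'W' and 'R' characters in the order they were observed; this encoding and the name `min_buffer_size` are made up for the example and are not the representation used by the tool flow.

```c
#include <stddef.h>

/* Returns the peak fill level of a channel for the given event order,
 * i.e. the smallest capacity for which no write in this particular
 * ordering has to block. Returns -1 if a read occurs on an empty
 * channel, which a valid KPN trace cannot contain. */
long min_buffer_size(const char *events)
{
    long fill = 0, peak = 0;
    for (size_t i = 0; events[i] != '\0'; i++) {
        if (events[i] == 'W') {
            fill++;
            if (fill > peak)
                peak = fill;
        } else if (events[i] == 'R') {
            if (fill == 0)
                return -1;
            fill--;
        }
    }
    return peak;
}
```

A smaller capacity can still be functionally correct, since it only adds blocking-write dependences to the graph (as in the size 1 versus size 2 cases on the slide); the trade-off is then between memory and the extra waiting time.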


Dealing with heterogeneity: group-based mapping (GBM)

1) Initialize: All to all

2) Select element: Trace graph critical path

3) Reduce group

4) Assess & propagate

5) Quasi-homogeneous

[Castrill12]

Mapping for HW accelerators

- Not only mapping but also configuration
  - Match algorithmic parameters with implementation parameters
  - Adjust synchronization and communication protocols

[Figure labels: plain code; algorithm library; platform model + characterization of special components. N: algorithmic actors. F: existing implementations in the target platform.]

[Castrill10, Castrill11]

Multiple traces

- Different inputs → different behaviors (traces)
- Characterize behaviors and their impact on mapping performance

[Figure: several traces, each with its own mapping configuration.] [Goens15]

Multiple traces (2)

[Figure labels: Behavior; Evolution (log); three mapping configurations.] [Goens15]

Multiple traces (3)

- Behavior difference as a metric in the trace/history monoid

[Figure: random KPNs. Slowdown factor w.r.t. the optimum (axis 1 to 1.9) vs. normalized trace distance (axis 0.5 to 1), for several distance variants including d1 and d2. → No correlation.]

[Figure: JPEG application, observed trace groups. Number of traces (axis 0 to 140) vs. trace distance normalized with d(0,ref) (axis 0 to 2).]

[Goens15]

Increasing resources

- Add resources to the synthesis until constraints are met
  - Add processors and memories
  - Easy for homogeneous platforms
  - Non-trivial for heterogeneous platforms

[Figure: loop of mapping and scheduling → parallel perf. estimation → increase resources.]
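For the homogeneous case, the loop of "map, estimate, add resources" reduces to a linear search over the processor count. The sketch below assumes the estimator is available as a callback; `grow_until_met` and the toy estimator are hypothetical, and the toy model (a serial part plus evenly divided parallel work) stands in for the real mapping and trace replay steps.

```c
/* Callback standing in for mapping, scheduling and parallel
 * performance estimation on a platform with num_pes processors. */
typedef long (*estimate_fn)(unsigned num_pes, void *ctx);

/* Returns the smallest processor count in [1, max_pes] whose estimated
 * makespan meets the deadline, or 0 if none does. */
unsigned grow_until_met(estimate_fn estimate, void *ctx,
                        unsigned max_pes, long deadline)
{
    for (unsigned n = 1; n <= max_pes; n++)
        if (estimate(n, ctx) <= deadline)
            return n;
    return 0;
}

/* Toy application model (an assumption for this example only). */
typedef struct { long parallel_work; long serial_work; } toy_app_t;

/* Serial part plus parallel work divided evenly, rounded up. */
long toy_estimate(unsigned num_pes, void *ctx)
{
    const toy_app_t *app = ctx;
    long n = (long)num_pes;
    return app->serial_work + (app->parallel_work + n - 1) / n;
}
```

On a heterogeneous platform the search is no longer over a single number, because it matters which kind of processor or memory is added next; that is the non-trivial case the slide refers to.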

Increasing resources: Exploit symmetries

- Identify mapping equivalence classes due to HW symmetries
  - Do not evaluate equivalent mappings
  - Reduce search space when adding resources
- Multiple traces (revisited)
  - Random traces: 5 out of 83 classes account for 50% of all optimal mappings
- Application to multi-application analysis

[Figure: two mappings compared for equivalence ("=?").] [Goens15]
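The simplest instance of such an equivalence is a platform where all cores are interchangeable: two mappings are then equivalent if one is obtained from the other by renaming cores. The sketch below detects this by relabeling cores in order of first use. It only covers this fully symmetric case, which is an assumption made here; real platforms usually have smaller symmetry groups, and this is not the method of [Goens15]. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CORES 16  /* hypothetical limits for the example */
#define MAX_PROCS 32

/* Rewrites a mapping (map[i] = core of process i) so that cores are
 * numbered in order of first appearance. Returns false if a core id
 * is out of range. */
bool canonicalize(const unsigned *map, size_t n, unsigned *out)
{
    int relabel[MAX_CORES];
    unsigned next = 0;

    for (size_t c = 0; c < MAX_CORES; c++)
        relabel[c] = -1;
    for (size_t i = 0; i < n; i++) {
        if (map[i] >= MAX_CORES)
            return false;
        if (relabel[map[i]] < 0)
            relabel[map[i]] = (int)next++;
        out[i] = (unsigned)relabel[map[i]];
    }
    return true;
}

/* Two mappings are equivalent under core renaming exactly when their
 * canonical forms are identical. */
bool mappings_equivalent(const unsigned *a, const unsigned *b, size_t n)
{
    unsigned ca[MAX_PROCS], cb[MAX_PROCS];

    if (n > MAX_PROCS || !canonicalize(a, n, ca) || !canonicalize(b, n, cb))
        return false;
    return memcmp(ca, cb, n * sizeof ca[0]) == 0;
}
```

A search procedure can store the canonical form of every evaluated mapping and skip any candidate whose canonical form has been seen before.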


Code generation

- Take the mapping configuration and generate code accordingly
  - From the architecture model: APIs, configuration parameters, …
- Sample targets: experimental heterogeneous systems, TI (Keystone, TDA3x), Parallella/Epiphany, ARM-based (Exynos, Snapdragon)

    PNargs_ifft_r.ID = 6U;
    PNargs_ifft_r.PNchannel_freq_coef = filtered_coef_right;
    PNargs_ifft_r.PNnum_freq_coef = 0U;
    PNargs_ifft_r.PNchannel_time_coef = sink_right;
    PNargs_ifft_r.channel = 1;
    sink_left = IPCllmrf_open(3, 1, 1);
    sink_right = IPCllmrf_open(7, 1, 1);
    PNargs_sink.ID = 7U;
    PNargs_sink.PNchannel_in_left = sink_left;
    PNargs_sink.PNnum_in_left = 0U;
    PNargs_sink.PNchannel_in_right = sink_right;
    PNargs_sink.PNnum_in_right = 0U;
    taskParams.arg0 = (xdc_UArg)&PNargs_src;
    taskParams.priority = 1;
    ...

[Figure: code generation, fed by the CPN application, the mapping configuration and the architecture model; a "Debugging info" box is attached.]

Debugging with virtual platforms

- Interactive debugging
  - Get snapshots of the system state
  - Full system stop
  - Track progress irrespective of mapping

[Figure: virtual platform with a debugging layer that uses the debugging info and an OS descriptor. Internal state of the MPSoC schedule: assigned, blocked and running tasks.] [Castrill11]

Debugging with virtual platforms (2)

- Deterministic replay and automatic bug exploration

[Murillo14]

Evaluation and results

- Virtual platforms: SystemC models of full systems
  - Explore heterogeneous architectures
  - Easier to integrate state-of-the-art accelerators
  - Configurable accuracy
- Real platforms for validation
  - Speedup on commercial platforms
  - Code generation against vendor stacks

Example: multi-media applications

- Platform: 2 RISCs, 4 VLIW, 7 memories

Example: multi-media applications (2)

- Dealing with real-time constraints

Tool: ~1 min. for LP-AF, ~7 min. for MJPEG ~10 min.
Sim.: ~6 days for LP-AF, ~24 for MJPEG ~10 days (~×10^3)

Example: With HW acceleration

- Application: MIMO OFDM receiver
- Hardware
  - Platform 1: Baseline software
  - Platform 2: Optimized software
  - Platform 3: Optimized SW + HW

[Figure labels: ARM, Mem, VLIW, FFT, Viterbi, Demap, custom interconnect, src, sink. Library 1: SW implementations; Library 2: SW optimized; Library 3: SW+HW accel.]

Achieved rate @ 100 MHz (log axis from 1.00E+00 to 1.00E+06):

  Platform                     Rate (bps)
  1) bsp1 (sw, unoptimized)    7680
  2) bsp2 (sw, optimized)      128000
  3) bsp3 (hw)                 1758241.758

Manual vs. Automatic: TI Keystone

[Figure panels: image processing; audio filtering application; LTE digital receiver.] [Aguilar14]

TRM vs. Actual execution: TI Keystone

- Makespan error < 20% (Avg. 7%)
- Speedup error < 8% (Avg. 5%)

Summary

- Tool flow for mapping KPN applications
  - CPN language: a language for KPNs close to C
  - Analysis and synthesis based on traces
  - Extensions for multiple traces and algorithmic descriptions
  - Backends for multiple platforms (bus- and NoC-based)
- Current and future work
  - More on static code analysis
  - Continue work on multiple traces and symmetries
  - Apply to other domains (server applications)

References

[Trommer15] Trommer, et al., "Functionality-Enhanced Logic Gate Design Enabled by Symmetrical Reconfigurable Silicon Nanowire Transistors", IEEE Transactions on Nanotechnology, pp. 689-698, July 2015
[Voigt14] A. Voigt, et al., "Towards Computation with Microchemomechanical Systems", in Journal of Foundations of Computer Science, vol. 25, 2014
[Castrill14] J. Castrillon, et al., "Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap", Springer, 2014
[EETimes11] http://www.eetimes.com/document.asp?doc_id=1259446
[Sheng14] W. Sheng, et al., "A compiler infrastructure for embedded heterogeneous MPSoCs", Parallel Computing, vol. 40, no. 2, pp. 51-68, 2014
[Oden13] M. Odendahl, et al., "Split-cost communication model for improved MPSoC application mapping", in International Symposium on System on Chip, pp. 1-8, 2013
[Arnold13] O. Arnold, et al., "Tomahawk - Parallelism and Heterogeneity in Communications Signal Processing MPSoCs", TECS, 2013
[Castrill10] J. Castrillon, et al., "Component-based waveform development: The nucleus tool flow for efficient and portable SDR", Wireless Innovation Conference and Product Exposition (SDR), 2010
[Castrill11] J. Castrillon, et al., "Component-based waveform development: The nucleus tool flow for efficient and portable software defined radio", Analog Integrated Circuits and Signal Processing, vol. 69, no. 2-3, pp. 173-190, 2011
[Eusse14] J.F. Eusse, et al., "Pre-architectural performance estimation for ASIP design based on abstract processor models", in SAMOS 2014, pp. 133-140, 2014
[Castrill10b] J. Castrillon, et al., "Trace-based KPN composability analysis for mapping simultaneous applications to MPSoC platforms", in DATE 2010, pp. 753-758
[Castrill13] J. Castrillon, et al., "MAPS: Mapping concurrent dataflow applications to heterogeneous MPSoCs", IEEE Transactions on Industrial Informatics, vol. 9, 2013
[Castrill12] J. Castrillon, et al., "Communication-aware mapping of KPN applications onto heterogeneous MPSoCs", in DAC 2012
[Goens15] A. Goens, et al., "Analysis of Process Traces for Mapping Dynamic KPN Applications to MPSoCs", in IFIP International Embedded Systems Symposium (IESS), Foz do Iguaçu, Brazil, 2015 (accepted for publication)
[Castrill11] J. Castrillon, et al., "Backend for virtual platforms with hardware scheduler in the MAPS framework", in LASCAS 2011, pp. 1-4, 2011
[Murillo14] L. G. Murillo, et al., "Automatic detection of concurrency bugs through event ordering constraints", in DATE 2014, pp. 1-6, 2014
[Aguilar14] M. Aguilar, et al., "Improving performance and productivity for software development on TI Multicore DSP platforms", in 6th European Embedded Design in Education and Research Conference (EDERC), pp. 31-35, 11-12 Sept. 2014

Thanks! Questions?