active cells: a programming model for configurable multicore … · a programming model for...

51
CASE STUDY 4: CUSTOM DESIGNED MULTI- PROCESSOR SYSTEM Active Cells: A Programming Model for Configurable Multicore Systems 311

Upload: others

Post on 25-Mar-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM

Active Cells:

A Programming Model for Configurable Multicore Systems

311

Page 2: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Vision

312

P P NILP NILNIL P P PP P PP P

core core

cache

bus

cache

memory

P P

core corePP

core engineP

core corePP

coreP

coreP

engine

General Purpose Shared Memory Computer Application Specific Multicore Network On Chip

Page 3: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Motivation: Multicore Systems Challenges

Cache Coherence

Shared Memory Communication Bottleneck

Thread Synchronization Overhead

Hard to predict performance of a program

Difficult to scale the design to massive multi-core architecture

313

Page 4: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Operating System Challenges

Processor Time Sharing

Interrupts

Context Switches

Thread Synchronisation

Memory Sharing

Inter-process: Paging

Intra-process, Inter-Thread: Monitors

314

Page 5: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Focus: Streaming Applications

315

Stream-Parallelism: Pipelining

Task Parallelism:

ParallelExecution Data Paralellism:

Vector ComputingLoop-level parallelism

Page 6: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

4.1. HARDWARE BUILDING BLOCKSTRM AND INTERCONNECTS

316

Page 7: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

TRM: Tiny Register Machine*

Extremely simple processor on FPGA with Harvard architecture.

Two-stage pipelined

Each TRM contains

Arithmetic-logic unit (ALU) and a shifter.

32-bit operands and results stored in a bank of 2*8 registers.

local data memory: d*512 words of 32 bits.

local program memory: i*1024 instructions with 18 bits.

7 general purpose registers

Register H for storing the high 32 bits of a product, and 4 conditional registers C, N, V, Z.

No caches

317* Invented and implemented by Dr. Ling Liu and Prof. Niklaus Wirth

Page 8: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

TRM Machine Language

Machine language: binary representation of instructions

18-bit instructions

Three instruction types:

Type a: arithmetical and logical operations

Type b: load and store instructions

Type c: branch instructions (for jumping)

318from Lectures on Reconfigurable Computing, Dr. Ling Liu, ETH Zürich

Page 9: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Encoding Overview Register Operations

Load and Store

Conditional Branches

Special Instructions

Branch and Link

319

091011131417

0 immRdop imm is zero extended to 32 bits

091011131417

1 RsRdop 0 0 0 x x 0

(a)

(b)

091011131417

1 VRsVRdop 1 0 0 x x x(c)

091011131417

1 x x xRdop 0 0 0 0 0 1(a)

(c)

091011131417

1 RsRdop 1 0 x x x x

091011131417

1 RsRdop 0 1 x x x x(d)

091011131417

1 x x xVRdop 1 0 0 001(b)

091011131417

1 RsRdop 101(e) xxxx

6

(a)

(b)

091011131417

0 RsRdop off

091011131417

1 RsVRdop off

3

3

off is zero extended

0910131417

cond1110 off off is sign extended0131417

1111 off off is 14-bit offset

Page 10: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

TRM architecture

320

Figure from: Niklaus Wirth, Experiments in Computer System Design, Technical Report, August 2010

http://www.inf.ethz.ch/personal/wirth/Articles/FPGA-relatedWork/ComputerSystemDesign.pdf

Page 11: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Variants of TRM

FTRM

includes floating point unit

VTRM (Master Thesis Dan Tecu)

includes a vector processing unit

supports 8 x 8-word registers

available with / without FP unit

TRM with software-configurable instruction width (Master Thesis Stefan Koster, 2015)

321

Page 12: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Initial ExperimentsTRM12 Bus TRM12 Ring

322

Column 0

Column 1

Column 2

Column 3

H0

H1

H2H3

C0

inbound arbiteroutbound arbiter

inbound arbiteroutbound arbiter

inbound arbiteroutbound arbiter

inbound arbiter

outbound arbiter

N0

C7

N1

N2

C2

N6

N7

N8

C6

C1

C8

N3

N4

N5

N9

N10

N11

C5 C11

C4

C3

C10

C9

RS232TR

CiNi

: processor core : network controller RS232TR : RS232 transmitter receiver

Timer

LCD

LEDs

DDR2

node0 node1 node2 node3 node4 node5

node11 node10 node9 node8 node7 node6

TRM0 TRM1 TRM2 TRM3 TRM4 TRM5

TRM11 TRM10 TRM9 TRM8 TRM7 TRM6T

RM

Rin

g

RS232

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

01111110

01111110

01111110

01111110

01111110

01111110

01111110

01111110

01111110 0111

111001111110

01111110

Page 13: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

4.2. HARDWARE SOFTWARE CODESIGN

323

Page 14: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Software / Hardware Co-designVision: Custom System on Button Push

System design as high-level program code

Electronic circuits

Computing model

Programming Language

Compiler, Synthesizer,

Hardware Library,

Simulator

Programmable Hardware

(FPGA)

324

Page 15: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

System specification, HW/SW partitioning

Program microcontroller in

C/C++

Programsystem specific

hardware in HDL

Compilation Synthesis

Microcontroller + machine code + specific hardware (eg. DSP)

Traditional HW/SW co-design for embedded

systems

Traditional HW/SW co-design

One Program

One Toolchain

System on FPGA

Active Cells approach for embedded systems

development

Goal

325

Page 16: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Active Cells Computing Model

On-chip distributed system

Cell

Scope and environment for a running isolated process.

Integrated control thread(s)

Provides communication ports

Net

Network of communication cells

Cells connected via channels (FIFOs)

326

Inspired by

• Kahn Process Networks

• Dataflow Programming

• CSP (e.g. Google's Go)

• Actor Model (e.g. Erlang)

Page 17: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Software Hardware Map

channel

cell

fifo

327

Page 18: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Consequences of the approach

No global memory

No processor sharing

No pecularities of specific processor

No predefined topology (NoC)

No interrupts

No operating system

328

Page 19: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Cell

329

type

BernoulliSampler* = cell (probIn: port in; valOut: port out);

var

r: Random.Generator;

p: real;

begin

new(r);

loop

p << probIn;

valOut << r.Bernoulli(p);

end

end BernoulliSampler;

non-typed communication ports

blocking receive

asynchronous send

BernoulliSampler

probIn

valOut

CellActivity

Page 20: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Properties

type

Controller = cell {Processor=TRM, FPU, DataMemory=2048, BitWidth=18} (in: port in (64); result: port out);

...

begin

(* ... controller action ... *)

end Controller;

....

Port Width

330

Properties can influence both, generation of hardware and the generation of software code.

Controller(TRM)

FPU

Page 21: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Configurable Processor on PL

331

Tiny Register Machine

FPU

offon

Vector Unit

offon

Instruction Width

14 16 18 20 22

Multiplier

offon

Program Memory

0k 1k 2k 3k 4k

Data Memory

0.5k 1k 1.5k

Page 22: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Engine

typeMerger = cell {Engine, inputs=1}

(ind: array inputs of port in; outd: port out);var data: longint;begin

loopfor i := 0 to len(in)-1 do

data << ind[i]outd << data;

endend

end Merger;

332

Engines are prebuilt components instantiated as electronic circuitson a target hardware

Merger

Page 23: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Unit of Deployment: (Terminal) CellnetLearnTest = cellnet;var

learner: CRBMNet.CRBMLearner;reader: MLUtil.imageReader;ims0,ims1: MLUtil.imshow;…

beginnew(learner)new(reader);new(ims0{name='v0debug',posx=0,posy=100});new(ims1{name='v1debug',posx=300,posy=100});…reader.imageOUT >> learner.imgIN;learner.v0DebugOUT >> ims0.imageIN;learner.v1DebugOUT >> ims1.imageIN;…

end LearnTest;

333

CRBMLearner

imShow

imageReader

imShow

connection

dynamic construction

connection

Page 24: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Hierarchical Composition

334

CRBMLearner

imShow

imageReader

imShow

CRBMLearner

DownNet

UpNet

Delay

Delay

Delay

Delay

debug

result

img in

… …

…Sample

UpNet

GetGradients

visible

P(h|v)visible

hidden

…hidden

kernels

UpNetsplit

UpStep

Page 25: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Hierarchic Composition: non-terminal CellnetUpNet = cellnet

{vr=28,vc=28,kr=5,kc=5,k=9,c=2,name='upstep'}(pvIN, kerIN, bIN : port in; phOUT: array k of port out; pvSideOUT: port out );

vari,hr,hc: longint;upstep: CRBMUpstepCell;split: MLFunctions.SplitterCell;

beginhr:=vr-kr+1; hc:=vc-kc+1;new(vSplit {dataSize=vr*vc,numOut=2});new(upstep {vr=vr,vc=vc,kr=kr,kc=kc,k=k,c=c});pvIN >> vSplit.dataIN;

vSplit.dataOUT[0] >> pvSideOUT;

vSplit.dataOUT[1] >> UpstepCell.vIN;

kerIN >> UpstepCell.kerIN;bIN >> UpstepCell.bIN;for i:=0 to k-1 do

upstep.phOUT[i] >> phOUT[i];end;

end CRBMUpNet;

335

UpNet

img in

split

UpStep

pvOut

phOut

kernIn

biasIn

port delegationport delegation

ports and properties

Page 26: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Software Hardware Map

336

ARM TRM

ARM ENGINE

ENGINE

ZYNQ PL

PS

cell

engine

port

thread

softcore

hw engine

AX

I4

FIFO

AXI4 Stream Interconnect

Page 27: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Hybrid Compilation

Code body Role Compilation method

Cell (Softcore) Program logic Software Compilation

Cell (Engine) Computation unit Hardware Generation

Cell Net Architecture Hardware Compilation

337

cellnet N;

type A=cell(pi: port in; po: port out);var x: integer;begin… pi ? x; … po ! x; …end A;

var a,b: A;begin… connect(a.po, b.pi)end N.

Page 28: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Implementation Alternatives (1)

338

compilerfrontend

Zybo Backend

Spartan3 Backend

Spartan3e Backend

+ simple- redundant

- not flexible- hard to extend

sourcebitstream

Page 29: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Implementation Alternatives (2)

339

compilerfrontend

+ extensible+ not redundant- not simple- static configuration limits flexibility

Hardware Library Component Library

cellnet interpreterbackend

source bitstream

Page 30: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Implementation Alternatives (3)

340

compilerfrontend

+ extensible+ not redundant+ simpler+ simulation becomes execution

Hardware LibraryExecutables

Component LibraryExecutables

hdl runtime source bitstreamexecutable

Page 31: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Frontend

341

Source Code(Module)

Intermediate Code

(Module)Executable

CompilerFrontend

CompilerBackend

SimulationRuntime

VisualizationRuntime

Software Libraries

Software Libraries

Software Libraries

HDL BackendRuntime

Page 32: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Backend

342

HDL Backend Runtime

DependencyAnalysis

HDL Code Generator

Instruction Code Generator

HW Library

Dependency Graph

HDL Code

TRM Kernel

TRM CodeExecutable

ARM Kernel

ARM CodeExecutable

TRM Kernel

TRM CodeExecutable

ARM Kernel

ARM CodeExecutable

TRM Kernel

TRM CodeExecutable

RuntimeLibraries

CompilerBackend

Executable

FPGA VendorImplementation

Toolsbitstream

Deployment

Intermediate Code

(Module)

RuntimeLibrariesRuntimeLibrariesRuntimeLibraries

Page 33: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Extensibility: Defining components and platforms

Software HLL Codetype Gpo = cell{Engine,DataWidth=32,InitState="0H"} (input: port in);

343

Hardware HDL Codemodule Gpo

#( parameter integer DW = 8 … )

(

input aclk, input aresetn , input [DW−1:0] …

);

Component Specification

Build CommandAcHdlBackend.Build

--target="ZyboBoard" -p="Vivado" CRBM.TestCellnet ~

Platform Specification

Hardware Platform and ToolsHardware Types

Platform Instances

Vendor specific tools

Page 34: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Component Specification

344

module Gpo;import Hdl := AcHdlBackend;var c: Hdl.Engine;begin

new(c,"Gpo","Gpo");c.SetDescription("General Purpose Output … ");

c.SupportedOn("∗"); (∗ portable ∗)

c.NewProperty("DataWidth","DW",Hdl.NewInteger(32),Hdl.IntegerPropertyRangeCheck(1,Hdl.MaxInteger));c.NewProperty("InitState","InitState",Hdl.NewBinaryValue("0H"),nil);

c.SetMainClockInput("aclk"); (* main component's clock *)c.SetMainResetInput("aresetn",Hdl.ActiveLow); (* active-low reset *)c.NewAxisPort("input","inp",Hdl.In,8);c.NewExternalHdlPort("gpo","gpo",Hdl.Out,8);

c.NewDependency("Gpo.v",true,false);

c.AddPostParamSetter(Hdl.SetPortWidthFromProperty("inp","DW"));c.AddPostParamSetter(Hdl.SetPortWidthFromProperty("gpo","DW"));

Hdl.hwLibrary.AddComponent(c);end Gpo.

Identification and description

Supported devices

Parameters

Ports

Dependencies

Configuration Actions

c.NewExternalHdlPort("gpo","gpo",Hdl.Out,8);

Page 35: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Generic Peer-to-Peer Communication Interface

Use of AXI4 Stream interconnect standard from ARM

Generic, flexible

Non-redundant

345

TVALID: Data valid

Senderport

Receiverport

TREADY:Ready to process

Mu

ltip

lexi

ng

net

wo

rkData out

Select Select

Data in

clk clk

master must assert tvalid and keep asserted

slave can wait for master's tvalid and then assert tready

Page 36: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Target Platform Specificationmodule Basys2Board;

import Hdl := AcHdlBackend, AcXilinx;

var t: Hdl.TargetDevice;

pldPart: AcXilinx.PldPart;

ioSetup: Hdl.IoSetup;

pin: Hdl.IoPin;

begin

new(pldPart,"XC3S100E-4CP132");

pldPart.SetJtagChainIndex(0);

new(t,"Basys2Board",pldPart);

new(pin,Hdl.In,"B8","LVCMOS33");

t.NewExternalClock(pin,50000000,50,0); (* ExternalClock0 *)

t.SetSystemClock(t.clocks.GetClockByName("ExternalClock0"),1,1);

new(pin,Hdl.In,"G12","LVCMOS33");

t.SetSystemReset(pin,true);

new(ioSetup,"Gpo_0");

ioSetup.NewIoPort("gpo",Hdl.Out,"U16,E19,U19,V19","LVCMOS33");

t.AddIoSetup(ioSetup);

Hdl.hwLibrary.AddTarget(t);

end Basys2Board.

346

System Signals

FPGA Part

Mapping of external Ports

ioSetup.NewIoPort("gpo",Hdl.Out,"U16,E19,U19,V19","LVCMOS33");

Page 37: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

TRM1

TRM2

TRM9

TRM10 TRM11 TRM12

FIFO1

FIFO8

FIFO9

FIFO16

FIFO17 FIFO18

FIFO19

FIFO20

FIFO33

FIFO34

UART

controller CF

controller

LCD

controller

Virtex-5LX50T FPGA

Xilinx ML505 board

RS232

CF

LCD

ECG

Sensor

·

·

·

·

·

·

Case Study 1: ECGFocus: Resources and Power

Real-time

ECG MonitorSignalinput

Wave proc_1

QRSdetect

HRV analysis

Disease classifier

Wave proc_2

Wave proc_8

ECG

bitstream

out

stream

347

Page 38: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Resources

ECG Monitor*

Maximum number of TRMs in communication chain

348

#TRMs #LUTs #BRAMs #DSPs TRM load

12 13859(48%)

52(86%)

12(25%)

<5%@116 MHz

FPGA #TRMs #LUTs #BRAMs #DSPs

Virtex-5 30 27692 (96%) 60 (100%)

30 (62%)

Virtex 6 500

*8 physical channels @ 500 Hz sampling frequency

implemented on Virtex 5 348

Page 39: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Comparative Power Usage

Preconfigured FPGA (#TRMs, IM/DM, I/O, Interconnect fixed)versus fully configurable FPGA (Active Cells)

System StaticPower (W)

Dynamic Power (W)

Preconfigured("TRM12")

3.44 0.59

Dynamicallyconfigured

0.5 0.58

86% saving!

Core 0

Core 1

Core 2

Core 3

Core 4

Core 5

Core 6

Core 7

Core 8

Core 9

Core 10

Core 11

Signal

inpu

t

Wave

proc_

1

QRS

det

ect

HRV analysis

Disease classifier

Wave

proc_2

Wave

proc_8

349

Page 40: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Case Study: Non-Invasive Continuous Blood Pressure MonitorFocus: Development Cycle Time

A2 Host OS with GUI on ARMSensor control and medical algorithms on Zynq PL

Sensors and Motorson Bracelet

350

Page 41: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Medical Monitor Network On Chip

351351

Dominated by TRM processors. Feedback driven. Not performance critical.

Page 42: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Development Cycle Times

Step Medical Monitor

OCT (full)

Software Compilation 4s 2s

Hardware Implementation

20 min 20 min

Stream Patching (all) 2 min -

Stream Patching (typical) 12 s -

Deployment 11s 16s

sporadic often352

352

Page 43: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Case Study 3: Optical Coherence TomographyFocus: Performance

A(λi) = A(1/fi)

f(z)

z-Axis Processing

1. Non uniform sampling

A(λi) A(f i)

2. Dispersion compensation

3. (Inverse) FFT

… for many lines x in a row (2d)

… and many rows y in a column (3d)

~

zx

y 353 353

Page 44: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

A component of OCT image processingDispersion Compensation

Dominated by Engines. Dataflow driven.

354

Page 45: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Case Study: ANN

355

Sigmoid

InnerProdFlt

Í Í Í

+

+

x0x0 x1x1 x2x2y0y0 y1y1 y2y2

- Í

+

Frac

LUT

yy

Fix

zz

input[0]input[0]

input[1]input[1]

outputoutput

xx

y

bx0 x1 x2w0 w1 w2

ARM SoC

x

w

P0

R0

P2

R2P1

R1

z

ANN Layer 1(MoreLayers)

b

Programmable Logic

Perceptron

Page 46: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Performance and Resource Usage

Medical Monitor Simple OCT OCT Perceptron

Architecture Spartan 6XC6SLX75

Zynq 7000XC7Z020

Zynq 7000XC7Z020

Zynq 7000XC7Z010

Resources 28% Slice LUTs, 4% Slice Registers80% BRAMs24% DSPs

11% Slice LUTs, 6% Slice Registers7% BRAMs15% DSPs1 ARM Cortex A9

17% Slice LUTS8% Slice Registers22% BRAMs31% DSPs1 ARM Cortex A9

83% Slice LUTS, 67% Slice Registers13.33% BRAMs, 21% DSP1 ARM Cortex A9

Clock Rate 58 MHz 118 MHz 50 MHz 147 MHz

Data Bandwidth 1.25 Mbit /s (in)23 kB/s (out)

236 MWords/s (in)118 MWords/s (out)

50 MWords/s (in)50 MWords/s (out)

9.6 GBits/s in9.6 GBits/s out

Performance -- 8.3 GFPOps*up to 32 GFPops**

4.3 GFPOps* 4.9 GFlops

Power ~2W ~5W ~5W 2.1 W

** Fixed point operations, 32bit

* when instantiated 4 times356

Page 47: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Conclusion

ActiveCells: Computing model and tool-chain for configurable computing

Configurable interconnect Simple Computing, Power Saving

Embedding of task engines High Performance

Hybrid compilation Quick Development

Backend execution Eased Flexibility and Extensibility

357

Page 48: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Mapping to Hardware – Simple ExampleIO*=CELL (input: PORT IN; output: PORT OUT;

buttons: PORT IN; leds: PORT OUT);BEGIN ...END IO;

Controller*=CELL (input: PORT IN; output: PORT OUT)BEGIN ...END Controller;

VAR controller: Controller; io: IO; gpo{DataWidth=8}: Engines.Gpo;gpi{DataWidth=11}: Engines.Gpi;

BEGINNEW(controller); NEW(io);NEW(gpi); NEW(gpo);

CONNECT (controller.output, io.input, 32);CONNECT (io.output, controller.input, 32);CONNECT (gpi.output, io.buttons);CONNECT (io.leds, gpo.input);

END Simple.

358

IOTRM

ControllerTRM

fifo fifo

gpi gpo

Blackboard / Code inspection

Page 49: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Spartan3 Board

Resources

4-input LUTS 3840

Flip Flops 3840

BRAMs 12

DSPs -

359

Page 50: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Resource Usage Scenarios

360

CELLNET Game;

IMPORT LED;

TYPE

IO*=CELL {DataMemorySize(512), CodeMemorySize(1024), LED} (in: PORT IN);BEGINEND IO;

VAR io: IO; BEGIN

NEW(io); END Game.

1 TRM

Resources

4-input LUTS 27%

Flip Flops 5%

BRAMs 16%

Resources

4-input LUTS 58%

Flip Flops 9%

BRAMs 33%

TRM ≈ 21-27% (LUTs), 16% (BRAMs); Fifo ≈6% (LUTs)

VAR io: IO; trm: TRM;BEGINNEW(io); NEW(trm);

CONNECT(trm.out, io.in);END Game.

TRM TRM

(cf. next page)

TRM TRM

-> TRM

Resources

4-input LUTS 86%

Flip Flops 18%

BRAMs 75%

Page 51: Active Cells: A Programming Model for Configurable Multicore … · A Programming Model for Configurable Multicore Systems 311. Vision 312 NIL PP PP PP NIL NIL P P P P P core core

Resource Usage (Lab)controller: Controller;ioTRM: IO;digitsDriver: DigitsDriver;digits: Engines.LEDDigits;gpo{DataWidth=8}: Engines.Gpo;gpi{DataWidth=11}: Engines.Gpi;

BEGINNEW(controller); NEW(ioTRM);NEW(digitsDriver);NEW(digits);NEW(gpi);NEW(gpo);CONNECT( controller.cmdOUT,ioTRM.cmdIN,10);CONNECT( ioTRM.cmdOUT,controller.cmdIN,10);CONNECT( gpi.output, ioTRM.ButtonsIN);CONNECT( ioTRM.LEDOUT, gpo.input);CONNECT(ioTRM.digitsOUT, digitsDriver.cmdIN,10);CONNECT(digitsDriver.ledOUT, digits.input);

3 TRMs

3 FIFOs

361

Resources

4-input LUTS 86%

Flip Flops 18%

BRAMs 75%