Multi-Processor Systems-on-a-Chip: Trends and Technologies Pierre Paulin, Director SoC Platform Automation, AST STMicroelectronics, Ottawa, Canada


Page 1: Multi-Processor Systems-on-a-Chip: Trends and Technologiesweb.cecs.pdx.edu/.../DSP1/Paulin_mpsoc_trends0206.pdf · Market Trends Summary Continued reduction in time-to market Fast

Multi-Processor Systems-on-a-Chip: Trends and Technologies

Pierre Paulin, Director, SoC Platform Automation, AST

STMicroelectronics, Ottawa, Canada


Outline

• MP-SoC Trends
  - Key market trends
  - Key technology trends
• MP-SoC Technology Objectives
• ST’s MP-SoC Technologies
  - Application to platform mapping
• Applications: Multimedia, Networking
• R&D Outlook


Emb. Systems Prog. Survey 2005: Number of Processors per Chip

• Nearly 50% of chips use multiple processors
• Over 100 projects used >10 processors

[Figure: chart of the number of processors per chip; categories: 1, 2, 3-5, 6-10, >10]

Source: Embedded Systems Programming Magazine, 2005


Processor Heterogeneity

• 60% of systems are heterogeneous MP

[Figure: chart of multi-processor system types; categories: multiple identical, multiple different, board multiple identical, board multiple different]

Source: Embedded Systems Programming Magazine, 2005


Processor Selection Criteria

• Quality of software tools sells the processor

[Figure: chart of processor selection criteria; labeled categories: S/W tools, performance, price]

Source: Embedded Systems Programming Magazine, 2005


Market Trends Summary

• Continued reduction in time-to-market
  - Fast-changing specs, requirements
  - Need for high-level capture and mapping
• Increasing cost of developing SoC platforms
  - $10M to $80M for today’s 90 nm SoCs
  - Need to increase time-in-market, which implies higher flexibility is needed
• Rising proportion of SoC function in eS/W
  - Currently 50% to 75% of design costs
• Majority of all SoCs have 2 or more processors
  - Leading edge (20%) have 6-10 processors


Outline

• MP-SoC Trends
  - Key market trends
  - Key technology trends
• MP-SoC Technology Objectives
• ST’s MP-SoC Technologies
• Applications
• R&D Outlook

Power Scaling (Core Only): Commercial Cores

[Figure: chart of MHz, MIPS, and mW/GIPS (scale 0 to 1000) for ARM7TDMI, MIPS M4K, ARM1176, and MIPS 24K; annotation: video codec target power (100 mW)]

Source: ARM, MIPS


Power Scaling (Cores + Caches, with 16KB/16KB D/I Caches)

[Figure: chart of MHz, MIPS, and mW/GIPS (scale 0 to 800) by processor for ARM920T, ARM1022E, and ARM1176; annotation: video codec target power (100 mW)]

Source: ARM


Voltage and Frequency Scaling (ADI Blackfin DSP)

[Figure: mW/GIPS (0 to 200) versus clock frequency (100 to 600 MHz), with points labeled 0.8 V, 0.9 V, 1.0 V, and 1.2 V]

Source: ADI
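
The trend in this chart is consistent with the usual first-order CMOS model, in which dynamic power is roughly C·V²·f. Under that model, energy per instruction (which is what mW/GIPS measures) scales with V² and does not depend on the clock rate. The sketch below is illustrative only: the capacitance value and the one-instruction-per-cycle assumption are hypothetical and are not taken from the ADI data.

```python
# First-order CMOS dynamic power model: P = C_eff * V^2 * f.
# C_EFF and the one-instruction-per-cycle assumption are hypothetical;
# they are not values from the ADI Blackfin chart.

C_EFF = 0.1e-9  # effective switched capacitance in farads (illustrative)


def dynamic_power_mw(voltage: float, freq_mhz: float) -> float:
    """Return dynamic power in mW for a supply voltage (V) and clock (MHz)."""
    return C_EFF * voltage ** 2 * (freq_mhz * 1e6) * 1e3


def mw_per_gips(voltage: float, freq_mhz: float) -> float:
    """Return mW/GIPS, assuming one instruction per clock cycle."""
    gips = freq_mhz / 1000.0
    return dynamic_power_mw(voltage, freq_mhz) / gips


if __name__ == "__main__":
    # Compare the four supply voltages labeled on the chart at one clock rate.
    for volts in (0.8, 0.9, 1.0, 1.2):
        print(f"{volts:.1f} V: {mw_per_gips(volts, 300.0):.0f} mW/GIPS")
```

In this model, running at 0.8 V instead of 1.2 V cuts energy per instruction by a factor of (1.2/0.8)² = 2.25, which is why lowering the voltage (and accepting a lower clock) is attractive when the work can be spread across several cores.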

Technology Trend: Latencies

• Increasing gap between memory and processor speeds (2x / 2 years)
• Increasing gap between interconnect and gate delays (multi-clock intra-chip delay)
• More parallel processing (lower power, higher perf./mm²) implies more inter-processor communication (IPC)

[Figure: MP-SoC block diagram with a NoC connecting In/Out ports, SRAM, a GPP (I$, D$), DSPs with H/W multithreading (H/W-MT), a RISC, and H/W processing elements]


Technology Trends Summary

• Embedded processor speed wall ~1 GHz
  - More function requires more parallelism
• Parallelism favors lower power solutions
  - A single 625 MHz MIPS processor consumes about 4x more power than 3x 225 MHz processors (365 mW vs. 3 x 32 mW = 96 mW)
  - Need to efficiently program these parallel cores though ...
• Three types of rising latency
  - Growing gap in interconnect vs. logic delays
  - Growing gap in memory vs. logic delays
  - MP implies inter-processor communication
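
The power comparison on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide; treating the summed clock rate as a proxy for throughput is an assumption (it holds only if the workload parallelizes perfectly across the three cores).

```python
# Figures quoted on the slide.
SINGLE_CORE_MW = 365.0   # one 625 MHz processor
SMALL_CORE_MW = 32.0     # one 225 MHz processor
SMALL_CORE_COUNT = 3


def power_ratio() -> float:
    """Power of the single fast core relative to the three slower cores."""
    return SINGLE_CORE_MW / (SMALL_CORE_MW * SMALL_CORE_COUNT)


def aggregate_mhz() -> float:
    """Summed clock rate of the slower cores (a rough throughput proxy,
    assuming perfectly parallel work)."""
    return 225.0 * SMALL_CORE_COUNT


if __name__ == "__main__":
    print(f"power ratio: {power_ratio():.2f}x")
    print(f"aggregate clock: {aggregate_mhz():.0f} MHz vs. 625 MHz")
```

The ratio works out to roughly 3.8, which the slide rounds to 4x, while the three slower cores together offer a slightly higher aggregate clock rate than the single fast core.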


Outline

• MP-SoC Trends
  - Key market trends
  - Key technology trends
• MP-SoC Technology Objectives
• ST’s MP-SoC Technologies
• Applications
• R&D Outlook


What is the Next SoC Paradigm Shift?

Architecture Platforms
• Hard-disk drive platform
• Set-top box, DVD, HDTV
• Mobile multimedia
• Image processing platform

System Applications
• Audio codecs
• Video codecs
• Still image processing
• Communication stacks

Component IP
• Value-add cores, IP
• Commodity cores: GPP, DSP, MCU, Bus
• Libraries: Cells, memories, I/O

Callouts
• Algm./system expert, S/W developers, platform programming
• Domain-specific platform expertise, fast platform derivatives
• Component expertise, process differentiators (BiCMOS, eDRAM, eFlash, eFPGA)

Abstraction boundaries
• 1985: Cell Library "Insulation"
• 1995: RTL, ISA "Insulation"
• 2005: Platform "Insulation"

[Figure: layered diagram relating system function, H/W-S/W architecture, and RTL to layout, with time-scale labels (2 yrs, year, months) and a platform sketch (Prog. Models, NoC, Proc., I/O, Mem, H/W)]


Configurability Trade-offs

[Figure: abstraction versus intrinsic efficiency for physical design, cell library, RTL, FPGA fabric, and C on a processor platform (ISA); time labels: Year, Year - 3 months, Month; annotation: deep submicron effects]

Flexible SoC Platform Objectives

[Figure: abstraction levels from cell library and RTL (sASIC, FPGA) through ISA and C up to platform programming models (SMP, DSOC) and a connectivity platform; time labels: Weeks, Weeks (reuse), <3 months]


Outline

• MP-SoC Trends
  - Key market trends
  - Key technology trends
• MP-SoC Technology Objectives
• ST’s MP-SoC Technologies
  - Application to platform mapping
• Applications
• R&D Outlook


System-on-Chip Platform Automation, Ottawa, Canada

• Info: [email protected], +1 (613) 768-9069
• Focus on use of flexible SoC platforms
  - Multi-processor, Network-on-Chip, Configurable H/W
• Applications
  - Multimedia (A/V/I), Networking, 3G basestation
• Leverage ST’s other system design technologies


HW/SW Platform Mapping: MultiFlex

System Application
• Parallel programming models: message passing (Object1 to Object4) and shared memory (threads T1, T2, T3)
• Temporal constraints, spatial constraints
• Proven, established programming models (msg. passing, SMP)
• User-defined parallelism

MultiFlex MP-SoC Tools
• Lightweight mapping tools
• Supported by H/W RTOS thread-level parallelism
• Interoperable with general-purpose O/S
• MP analysis tools

Multi-processor SoC Platform
• Simplify the use of heterogeneous HW/SW MP-SoC architectures

[Figure: platform with RISC, DSP, eFPGA, H/W PE, eMEM, and I/O blocks]


FlexMP Demo Virtual Platform

• Multi-threaded, multi-processor platform
  - Popular processor models with config. extensions: H/W multithreading and pipeline depth
• Processors integrated: MicroBlaze®, PowerPC®

[Figure: virtual platform with Processor 1 to Processor N (each with P1..Pn, memory, and an optional coprocessor), H/W PEs (sASIC, eFPGA), ASIC, eMEM, a host GPP (I$, D$), MPU I/O, and application-specific, memory, and general-purpose I/O, connected by a Network-on-Chip (OCP-IP, STBus) with packetization and H/W schedulers]


Hiding Latency with H/W MT

[Figure: timelines for Threads 1 to 4 on Core 1 and Core 2, in which each thread alternates C segments with Mem, IPC, and Co-proc segments]
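
A simple way to reason about latency hiding with H/W multithreading is a saturation model: if each thread computes for some cycles and then stalls (on memory, IPC, or a co-processor), a core with enough threads always has one ready to run. The sketch below is a textbook-style approximation, not the model used in the talk; it assumes zero-cost thread switching and identical threads, and the cycle counts in the usage example are hypothetical.

```python
def core_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Estimate core utilization under H/W multithreading.

    Assumes every thread repeats `compute_cycles` of work followed by
    `stall_cycles` of waiting, and that switching threads costs nothing.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("compute_cycles must be > 0 and stall_cycles >= 0")
    # One thread keeps the core busy for C out of every C + L cycles;
    # n threads overlap their stalls until the core saturates at 100%.
    return min(1.0, threads * compute_cycles / (compute_cycles + stall_cycles))


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of compute per 80-cycle stall.
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): {core_utilization(n, 20, 80):.0%} utilization")
```

Under these assumptions, utilization grows linearly with the thread count until the stalls are fully overlapped, after which extra threads add no benefit.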

MultiFlex MP-SoC Platform Tools

Application to platform mapping
• Two parallel programming models
  - DSOC: Object-oriented message passing
  - SMP: Shared memory
• Fine-grain parallelism
• H/W MP-O/S scheduler accelerators
• H/W message passing, IP plug and play

[Figure: parallel system-level application code mapped onto RISC and DSP processors (each with mem and eFPGA), H/W, eMEM, and I/O blocks connected by a NoC, with H/W O/S schedulers]
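
The two programming styles named here, message passing and shared memory, can be contrasted with a small example. The slide does not show the DSOC or SMP APIs, so the sketch below uses only Python's standard `queue` and `threading` modules as generic stand-ins; the function names are hypothetical.

```python
import queue
import threading


def message_passing_sum(values):
    """Message-passing style: a producer sends values to a consumer
    thread through a queue; no data is shared directly."""
    mailbox = queue.Queue()
    result = []

    def consumer():
        total = 0
        while True:
            item = mailbox.get()
            if item is None:  # sentinel: no more messages
                break
            total += item
        result.append(total)

    worker = threading.Thread(target=consumer)
    worker.start()
    for value in values:
        mailbox.put(value)
    mailbox.put(None)
    worker.join()
    return result[0]


def shared_memory_sum(values, num_threads=3):
    """Shared-memory style: threads add partial sums into one shared
    counter, protected by a lock."""
    shared = {"total": 0}
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            shared["total"] += partial

    threads = [
        threading.Thread(target=worker, args=(values[i::num_threads],))
        for i in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["total"]


if __name__ == "__main__":
    data = list(range(10))
    print(message_passing_sum(data), shared_memory_sum(data))
```

In the first style the communication is explicit, which makes it easier to move an object onto another processor or into hardware; in the second, threads share state implicitly and need synchronization.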


Impact of H/W Accelerators

• Dynamic management of fine-grain parallelism (~100 instrns. or more)
• Fault tolerant [Power, QoS in future]
• Msg passing rate: <15 instructions (15M msg/s @ 200 MHz)
• Fork-join (for N forks): 10 instructions + 12 cycles/thread
• Additional cost (5-processor system): +0.3 mm² (@ 90 nm)

[Figure: parallel system-level application code mapped onto RISC and DSP processors (each with mem and eFPGA), eSoG/eFPGA, eMEM, and I/O blocks connected by a NoC, with H/W O/S schedulers]
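
The overhead figures above can be turned into a quick estimate of how small a parallel grain remains worthwhile. The sketch below is a rough reading of the slide's numbers: it assumes one instruction per cycle, and it interprets the fork-join cost as a fixed 10 instructions plus 12 cycles for each forked thread. Both are assumptions, not statements from the talk.

```python
def cycles_per_message(clock_hz: float, messages_per_second: float) -> float:
    """Clock cycles available per message at a given message rate."""
    return clock_hz / messages_per_second


def fork_join_cost(num_threads: int,
                   fixed_instructions: int = 10,
                   cycles_per_thread: int = 12) -> int:
    """Fork-join overhead in cycles, assuming 1 instruction = 1 cycle."""
    return fixed_instructions + cycles_per_thread * num_threads


def overhead_fraction(num_threads: int, grain_instructions: int) -> float:
    """Share of total cycles spent on fork-join overhead when each
    forked thread executes `grain_instructions` of useful work."""
    overhead = fork_join_cost(num_threads)
    work = num_threads * grain_instructions
    return overhead / (overhead + work)


if __name__ == "__main__":
    print(f"{cycles_per_message(200e6, 15e6):.1f} cycles per message")
    print(f"fork-join cost for 4 threads: {fork_join_cost(4)} cycles")
    print(f"overhead at 100-instruction grain: {overhead_fraction(4, 100):.1%}")
```

At 200 MHz, 15M messages per second leaves about 13 cycles per message, which agrees with the "<15 instructions" figure. With a grain of about 100 instructions, the fork-join overhead in this reading stays near one eighth of the total.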


Exploitable Parallelism

[Figure: exploitable task parallelism versus minimum parallel grain size (instructions: 1, 10’s, 100’s, 1000’s, 10 000’s) for ILP (VLIW, SIMD), MultiFlex thread-level parallelism, and GP O/S thread-level parallelism; range labels: 1~8, 2~6, 1~100; annotation: //m overhead]

Overheads:
1. Context switching
2. Message passing
3. Scheduling


Application to Platform Mapping (1)

• High-level communicating parallel application objects
  - Platform independent
  - Map onto host GP O/S to validate HL function and parallelism

[Figure: DSC application objects (Video, Audio1, Audio2, Control), some containing SMP threads T1 to T3, running on a host (Linux/Unix)]

Application to Platform Mapping (2)

[Figure: the DSC application objects (Video, Audio1, Audio2, Control; SMP threads T1 to T3) mapped onto the MP-SoC platform: a host GPP (I$, D$) running a GP O/S (Linux/Symbian/WinCE), DSPs with H/W multithreading, a DSP/ASIP, a RISC, and H/W processing elements with local memories, connected by a NoC with In/Out ports, a DSOC H/W scheduler, and an SMP H/W scheduler]


Outline

• MP-SoC Trends
• MP-SoC Technology Objectives
• ST’s MP-SoC Technologies
  - Application to platform mapping
• Applications
  - Multimedia, Networking, 3G
• R&D Outlook


MultiFlex MP Applications

• Packet processing
  - 2.5/10 Gbps IPv4 forwarding, traffic manager
  - High-throughput platforms
• Nomadik mobile multimedia
  - Heterogeneous platform, O/S interoperability
  - Impact: Hundreds of programmers
• Video codec exploration
  - VGA (30 fps): Small grain, mixed SMP/DSOC
• Digital Still Camera
  - Array access optimization
• 3G basestation
  - Heterogeneous MP (ARM11, 3X DSP, 3X ASIP)

DSOC+SMP Traffic Manager (2.5 Gb/s)

• Results:
  - 85-92% PE utilization
  - Msg passing code <20%
• #proc. = 9-10 (@ 500 MHz)
• #threads = 4-16
• NoC latency = 0, 40, 80 clks (+/-25% jitter)

[Figure: RISC 1 to RISC N, each with H/W threads T1..Tm, pipes Pipe1..PipeP, and a data cache, connected through a parameterizable Network-on-Chip (latency L_{i,j}) to ingress and egress SPI, packet blocks (ingPkt, schPkt, egrPkt, shPort), queMgr, dataMgr, shared memories, SRAM, external SRAM and DRAM, and DSOC and SMP schedulers]


Traffic Manager Performance

[Figure: throughput (Gb/s, 0 to 4.5) versus number of H/W threads per processor (4, 8, 16) for NoC latencies of 0, 40, and 80 clock cycles; # Ports = 256, # Class = 32, # Proc. = 10 (500 MHz)]


MPEG4 Video Codec (VGA, 30 fps)

• Reference C code (4.1 GIPS)
• S/W (parallel C): 95% of functionality, 20% of performance (0.9 GIPS)
• H/W (HDL): 5% of functionality, 80% of performance (3.2 GIPS)
  - SAD, DCT/iDCT, Badd/Bdiff, Bq/Biq, Bsum
• Parallelization (SMP, DSOC), mapping in <2 months
• Flexibility in S/W, performance in H/W

[Figure: ARM 1..5, each with H/W threads T1..T8, pipes Pipe1..Pipe4, I$, and Data$, connected with the H/W blocks over STBus]
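
The workload split on this slide can be sanity-checked from the three GIPS figures. The sketch below assumes, as the "performance in H/W" tagline suggests, that the 3.2 GIPS portion is the part moved to hardware and the 0.9 GIPS portion stays in software; that pairing is an inference from the slide layout, not an explicit statement.

```python
# GIPS figures quoted on the slide.
REFERENCE_GIPS = 4.1   # reference C code
HW_GIPS = 3.2          # assumed to be the portion implemented in H/W
SW_GIPS = 0.9          # assumed to be the portion left in parallel S/W


def share(part_gips: float, total_gips: float = REFERENCE_GIPS) -> float:
    """Fraction of the reference workload represented by `part_gips`."""
    return part_gips / total_gips


if __name__ == "__main__":
    print(f"H/W share: {share(HW_GIPS):.0%}")
    print(f"S/W share: {share(SW_GIPS):.0%}")
```

The two portions add up to the reference workload, and their shares (about 78% and 22%) match the rounded 80% and 20% labels.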


MPEG4 Video Codec Architecture

[Figure: ARM 1 to ARM 5, each with H/W threads T1..Tm, pipes Pipe1..PipeP, a data cache, a program cache (#1..#N), and CLIP/DIV/ABS/SIGN units, on a local STBus with L1 data caches (#1..#N), an interleaver, L1 data banks (stack, data), an L2 data bank (data), an SMP/DSOC scheduler, and H/W blocks (SAD, DCT/iDCT, Badd/Bdiff, Bq/Biq, Bsum, Bie, Bzigzag); a system STBus connects video in, video out, and DDR external memory at < 200 MHz (DDR rate)]


MPEG4 VGA Video Codec Results: Execution Speedup

• 8 threads: 4X speedup, 1.5X area

[Figure: frames/sec (0 to 35) versus number of ARMs (2 to 6) for 1, 2, 4, and 8 threads; curves: theoretical upper bound, MultiFlex (0-latency bus), and MultiFlex result (STBus)]

MultiFlex Summary

Application to platform mapping
• Two parallel programming models
  - DSOC: Object-oriented message passing
  - SMP: Shared memory
• H/W MP-O/S scheduler accelerators
• H/W message passing, IP plug and play
• Programmability + high performance

[Figure: parallel system-level application code mapped onto RISC and DSP processors (each with mem and eFPGA), H/W, eMEM, and I/O blocks connected by a NoC, with H/W O/S schedulers]

MultiFlex Value-add

• Platform-independent S/W
  - S/W reuse across multiple products
• Platform scalability
  - Exploit fine-grain parallelism
• Architecture exploration
• High performance and flexibility
• Programming productivity

[Figure: one system application (message-passing objects Object1 to Object4 and shared-memory threads T1 to T3) mapped onto MP-SoC Platform 1 through MP-SoC Platform N, each built from GPP, RISC, or DSP processors, H/W PEs, eMEM, and I/O]


Outline

• MP-SoC Trends
  - Key market trends
  - Key technology trends
• MP-SoC Technology Objectives
• ST’s MP-SoC Technologies
• Applications
• R&D Outlook


Key Challenges

• Many architecture parameters to explore
  - HW-S/W partition
  - Number and specialization of processors
  - Memory architecture
• Virtual platform simulation
  - Good observability, controllability, debug
  - High level of abstraction
  - Too slow
• Verification, temporal correctness
  - Need formal parallelism analysis tools
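
To give a feel for why exploration is hard, the sketch below counts the design points produced by just three parameters. The value ranges are borrowed from the axes of the results charts earlier in the talk (number of ARMs, H/W threads per processor, NoC latency); combining them into a single sweep is a hypothetical illustration, not an experiment described in the talk.

```python
from itertools import product

# Parameter ranges taken from chart axes earlier in the talk; sweeping them
# together is an illustrative assumption.
NUM_PROCESSORS = (2, 3, 4, 5, 6)
THREADS_PER_PROCESSOR = (1, 2, 4, 8)
NOC_LATENCY_CYCLES = (0, 40, 80)


def design_points():
    """Enumerate every (processors, threads, latency) combination."""
    return list(product(NUM_PROCESSORS, THREADS_PER_PROCESSOR, NOC_LATENCY_CYCLES))


if __name__ == "__main__":
    points = design_points()
    print(f"{len(points)} configurations to simulate")
```

Even this small sweep yields 60 configurations. Adding the HW-S/W partition, processor specialization, and memory architecture multiplies that count further, which is why simulation speed matters.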