the next level platforms: networks-on-silicon - mpsoc … · the next level platforms:...

Albert van der Werf

Philips Research

Albert van der Werf

Philips Research

The Next Level Platforms:Networks-on-SiliconThe Next Level Platforms:Networks-on-Silicon

2Research

Overview• Trends

– Technology– Industry

• Networks-on-Silicon– Infrastructure– Mapping– Predictability

• Concluding Remarks

3Research

• Moore’s Law is driving us– ~60% yearly growth in number of transistors

• Enabling new products with – lower cost (mm²) and lower power (W) dissipation

and– higher flexibility (MIPS) and functional

performance (GOPS)

Trends

4Research

Moore’s Law: Computational Efficiency

2 1 0.5 0.25 0.13 0.07

feature size(µm)time

1000

100

10

1

0.1

0.01

0.001

Computing efficiency (MOPS/mWatt)

ISProcessors???????????????????????hardwired muxed

3DTV

MP3

sequential

parallel

2003[Roza]

5Research

• System with one or more CPUs: Moore’s law gives bothcost down and feature-up.

– General Purpose processors are eatingmore of the current application pie (e.g.audio on CPU ) – with IP in SW

– Flexibility becomes more importantbut also more affordable

Required by wireless standards

Offered by conventional DSPs

• Challenge/opportunity: high performance at low power requires (massive) parallelism

– System customized compute engines– Multi-processor systems– Subsystem integration

• New systems are demandingmore compute power

Technology Trends

6Research

I.McIC Empire Empress Chrysalis

Video processing & compression

The plots are at the same scale0.50

1997

0.35

1998

0.25

2000

0.18

2002

CPU

MPEG2 Storage: IC Roadmap

7Research

The OEM’s integration problem

2002: 8MB2004: 256 MB2007: 1 GBMRAM?

2004: 64 MB

TFT, LTPS…320 x 240 pixels18-bit colorstouch screen3D displays

GSM / GPRSCDMA2000WCDMAUMTS4G?

subVGA→VGA→1.3 Mpixel

polyphonic ring tonesMP3MEMS microphonesecho cancellation

SIM cardRFID PaymentNFC dev pairing

MP3JPEGMPEG4Voice commands3D accelerators

Camera

Host AudioI/O

Display

RF Modem

Flashstorage

RAM

Media

Security& identif.

Connectivity Software

Broadcast

GPSlocation-based services

USB 1.1

Bluetoothmultiple profiles

WLAN VoIPUWB?

IRDA

FM radioDVB-H

8Research

Industry Trends (Observations)

• Mask cost is increasing above a dangerous threshold.

• Design teams are becoming very large (> 100 FTE); design productivity is not increasing asfast as Moore’s Law.

• Embedded SW content is exploding.

103

102

101

4 8 12 16 year

complexity/productivity

HW gap

SW gap

58%

21%

8%

1000

500

250

0.25 0.18 0.12 0.090feature size(µm)

mask set cost (k$)

“Cost of design is the greatest threat to continuation of the semiconductor roadmap”

[ITRS 2003]

9Research

Industry Trends (Approach)

• Industry wide reuse of Intellectual Property (IP) blocks, from many IP suppliers; from captive cores to “open” cores with extensive Ecosystems

• Facilitated by platform choices (Nexperia Home/Mobile)

• From Application Specific ICs towards Programmable ICs for a certain Application Domain; product differentiation through SW.

10Research

Networks-on-Silicon: Positioning100 msec

10 msec

1 msec

100 µsec

10 µsec

1 µsec

100 nsec

10 nsec

1 nsec

100 psec

late

ncie

s →

Networks-on-Silicon

enabling technologies

sub systems

system integration

operating system

applications

HW infrastructure

OS + infrastructure (proprietary | Symbian | WinCE | Linux)

phone mess-aging games music still

imaging

gsm gps storage display camera

CPUs DSPs RF memories buses,dma, ...

...

...

...

DSM technologySI integration

11Research

From Busses to Networks-on-Silicon• IP blocks become

subsystems• Busses become

networks offeringservices at variouslevels à la OSI

CPU

DSP

DMA

bridge

bridge

bridge

IP

wrapperSRAM

Flash

SDRAMMem

ctrl

IP

IPIP

wrapper

IP

SRAM

IPIP

I/F

I/F

I/F

IP

IP

IP

SRAM

I/FI/F

CPU IP

Mem ctrl

SDRAM

bridge

bridge

IP

IP

IP

CPU

DSP

DMA

bridge

bridge

bridge

IP

wrapperSRAM

Flash

SDRAMMem

ctrl

IP

IPIP

wrapper

IP

SRAM

IPIP

I/F

I/F

I/F

IP

IP

IP

SRAM

I/FI/F

CPU IP

Mem ctrl

SDRAM

bridge

bridge

IP

IP

IP

“aut

onom

ous”

...

subs

yste

ms

On-/off-chip memoryInterconnect

Backbone network

GP p

roce

ssor

Diversity at physical layer; allow for implementation with few wires/pins

Streaming services with real-time guarantees required for some streams; best-effort for others

Escape for “dumb” subsystems, let GP CPU deal with part of protocol stack

“dum

b”

periph

erals

Combine stand-alone memories; let the network offer memory services to subsystems

auto

nomou

s“a

uton

omou

s”

...

subs

yste

ms

On-/off-chip memoryInterconnect

Backbone network

GP p

roce

ssor

Diversity at physical layer; allow for implementation with few wires/pins

Streaming services with real-time guarantees required for some streams; best-effort for others

Escape for “dumb” subsystems, let GP CPU deal with part of protocol stack

“dum

b”

periph

erals

“dum

b”

periph

erals

Combine stand-alone memories; let the network offer memory services to subsystems

auto

nomou

s

12Research

Networks-on-Silicon: Connecting ICs

RF Camera

App. Engine (APE)Baseband (CMT)

Router DisplayMass Memory

Ext. device

Touch

13Research

Networks-on-Silicon: Technology• To solve the transport delay problem and to

handle the complexity in future generations of SoCby defining the next generation paradigm for platforms.– Infrastructure and related design technology for

communication.– To organize the communication according to a layered &

standardized communication stack (OSI like). – To describe systems as networks and efficiently map

programs on multiple processors in the network– To guarantee (predictable) performance and scalability

with plug & play of subsystems.

14Research

Networks-on-Silicon: Infrastructure• Main active elements in

network infrastructure:– network interface– router

• ATM-like packet-based programming model

• 52Gb/s aggregate throughput• QoS

– Guaranteed Throughput– Best Effort

• Supporting interoperability

subsystem

router

router

routersub

system

router

IP

subsystem

... ...

...

...

... ...router router

IP IPIP IP

IP IPIP IP

IP

mesh, fat tree, express cube, ...

subsystem

router

router

routersub

system

router

IP

subsystem

... ......


IP IPIP IP

IP IPIP IP

IP

subsystem

router

router

routersub

system

router

IP

subsystem

... ......


IP IPIP IP

IP IPIP IP

IP

mesh, fat tree, express cube, ...

15Research

Networks-on-Silicon: Inter-Chip Link– Standardization of physical interconnect

• For seamless links between chips

– From SiP up to box level• Multiple chips in different process technologies

– Transparent to IPs / subsystems on chips• Physical interconnect hidden by more abstract interface• Abstract interface supports re-partitioning with a different

physical interconnect for on-chip communication

– Variety of classes: from Mbits/s up to Gbits/s• CMOS Audio, Compressed Video• Low swing differential Standard resolution images• Embedded clocks High resolution images

16Research

Mapping Applications to Networks

Architecture- multi-processor - distributed shared

memory

Application- parallel tasks- streams

Mapping- tasks to processors- FIFOs to memory

Processor ASIP ASD

Memory

System synthesis: minimal hardware that is required to meet the timing requirements as defined in the specification.

System programming: given a multiprocessor network find a mapping of the application that satisfies the timing constraints.

17Research

Networks-on-Silicon: Linking HW & SW

t3t1t2sh1 sh3

t1t2

b1sh1b2b3

CPU

Memory

Bus

Code Data

t1

t2

t3c1 c2

c3

t3t1t2sh1 sh3

t1t2

b1sh1b2b3

CPU ASIP

Memory

Bus

Code Data

t1

t2

t3c1 c2

c3

TTL interface

communication interface

t1 t2 t3

TTLshell

TTLshell

Hardware

Software

TTLinterface

t1

Infrastructure

t2 t3

c1c2

c3

TTL interface

t1 t2 t3

TTLshell

TTLshell

Hardware

Software

t1 t2 t3

TTLshell

TTLshell

Software

TTLinterface

t1

Infrastructure

t2 t3

c1c2

c3

t1

Platform Infrastructure

t2 t3

c1c2

c3

•Goal is to facilitate reuse of tasks through definition of interface for streaming (queues, fifos, channels)

18Research

Memory and Streaming

Support for streaming via on-chip memory• Streaming via off-chip memory:

– Bandwidth bottleneck– Power consumption– Pin count

• Streaming via on-chip memory– High sync rate is enabler (implementation challenge)– Small buffers (in distributed shared memory)– Low latencies

19Research

• Meeting the temporal requirements is essential for many consumersystems. – Hard real-time:

• don’t miss a deadline (= guarantee throughput and latency)• graceful degradation is not supported• e.g. channel decoders, picture improvement, audio decoding

– Soft real-time:• there is some diminished value when deadline is missed and

value does not increase if result is delivered earlier• graceful degradation or fall back must be supported• objective is constant Quality of Service (QoS)• e.g. video decoders

– Best-effort:• an earlier delivered result is appreciated• e.g. web browser

Predictable Multiprocessor Networks

20Research

Related Models of Computation

• Kahn Process Networks [Kahn, 1974]– concurrent processes communicating through unbounded fifos– deterministic communication only

• Communicating Sequential Processes [Hoare, 1978]– concurrent processes communicating through unbuffered channels– non-deterministic communication through probe [Martin, 1985]

• Dataflow Process Networks [Lee and Parks, 1995]– special case of KPN; processes are actors plus firing rules– Fire & Exit: each iteration has to be one atomic action.

Requires explicit state saving for data-dependent behavior• Communicating Finite State Machines [Balarin et al.,1997]

– broadcasting of time-stamped events– global notion of time difficult to implement in parallel and

distributed signal processing systems

21Research

Networks-on-Silicon: QoS

SwitchA

SwitchB

d1

d3

d2

d4

a2

c1a1

d3

d2a2

d4 a1

d1

b2

b1c1

b2

b1

d1d2

d3 d4

a2 c1

a1

TDMA

Guaranteed time behaviorSystem-level predictabilitySmaller buffers

22Research

• A number of (smaller processors) communication using a protocol with cache coherence extensions

• Each processor has its own L1 cache and shares an L2 cache with interleaved memory banks

• Escaping from Pollack’s rule (exploding power densities for higher performant CPUs)

Multi-processor network nodes

23Research

Concluding Remarks• From computation centric to communication

centric architectures: Networks-on-Silicon will be at the heart of future platforms

• Digital architectures offer a wealth of implementation options. Therefore standardizationis key– In interfaces, services, and protocols– In design environments (including SDK)

• Automated flow with fast performance verification is essential.

• Towards an Open Platform and Ecosystem

the next level platforms: networks-on-silicon - mpsoc … · the next level platforms:...

Documents