the next level platforms: networks-on-silicon - mpsoc … · the next level platforms:...
TRANSCRIPT
Albert van der Werf
Philips Research
Albert van der Werf
Philips Research
The Next Level Platforms:Networks-on-SiliconThe Next Level Platforms:Networks-on-Silicon
2Research
Overview• Trends
– Technology– Industry
• Networks-on-Silicon– Infrastructure– Mapping– Predictability
• Concluding Remarks
3Research
• Moore’s Law is driving us– ~60% yearly growth in number of transistors
• Enabling new products with – lower cost (mm²) and lower power (W) dissipation
and– higher flexibility (MIPS) and functional
performance (GOPS)
Trends
4Research
Moore’s Law: Computational Efficiency
2 1 0.5 0.25 0.13 0.07
feature size(µm)time
1000
100
10
1
0.1
0.01
0.001
Computing efficiency (MOPS/mWatt)
ISProcessors???????????????????????hardwired muxed
3DTV
MP3
sequential
parallel
2003[Roza]
5Research
• System with one or more CPUs: Moore’s law gives bothcost down and feature-up.
– General Purpose processors are eatingmore of the current application pie (e.g.audio on CPU ) – with IP in SW
– Flexibility becomes more importantbut also more affordable
Required by wireless standards
Offered by conventional DSPs
• Challenge/opportunity: high performance at low power requires (massive) parallelism
– System customized compute engines– Multi-processor systems– Subsystem integration
• New systems are demandingmore compute power
Technology Trends
6Research
I.McIC Empire Empress Chrysalis
Video processing & compression
The plots are at the same scale0.50
1997
0.35
1998
0.25
2000
0.18
2002
CPU
MPEG2 Storage: IC Roadmap
7Research
The OEM’s integration problem
2002: 8MB2004: 256 MB2007: 1 GBMRAM?
2004: 64 MB
TFT, LTPS…320 x 240 pixels18-bit colorstouch screen3D displays
GSM / GPRSCDMA2000WCDMAUMTS4G?
subVGA→VGA→1.3 Mpixel
polyphonic ring tonesMP3MEMS microphonesecho cancellation
SIM cardRFID PaymentNFC dev pairing
MP3JPEGMPEG4Voice commands3D accelerators
Camera
Host AudioI/O
Display
RF Modem
Flashstorage
RAM
Media
Security& identif.
Connectivity Software
Broadcast
GPSlocation-based services
USB 1.1
Bluetoothmultiple profiles
WLAN VoIPUWB?
IRDA
FM radioDVB-H
8Research
Industry Trends (Observations)
• Mask cost is increasing above a dangerous threshold.
• Design teams are becoming very large (> 100 FTE); design productivity is not increasing asfast as Moore’s Law.
• Embedded SW content is exploding.
103
102
101
4 8 12 16 year
complexity/productivity
HW gap
SW gap
58%
21%
8%
1000
500
250
0.25 0.18 0.12 0.090feature size(µm)
mask set cost (k$)
“Cost of design is the greatest threat to continuation of the semiconductor roadmap”
[ITRS 2003]
9Research
Industry Trends (Approach)
• Industry wide reuse of Intellectual Property (IP) blocks, from many IP suppliers; from captive cores to “open” cores with extensive Ecosystems
• Facilitated by platform choices (Nexperia Home/Mobile)
• From Application Specific ICs towards Programmable ICs for a certain Application Domain; product differentiation through SW.
10Research
Networks-on-Silicon: Positioning100 msec
10 msec
1 msec
100 µsec
10 µsec
1 µsec
100 nsec
10 nsec
1 nsec
100 psec
late
ncie
s →
Networks-on-Silicon
enabling technologies
sub systems
system integration
operating system
applications
HW infrastructure
OS + infrastructure (proprietary | Symbian | WinCE | Linux)
phone mess-aging games music still
imaging
gsm gps storage display camera
CPUs DSPs RF memories buses,dma, ...
...
...
...
DSM technologySI integration
11Research
From Busses to Networks-on-Silicon• IP blocks become
subsystems• Busses become
networks offeringservices at variouslevels à la OSI
CPU
DSP
DMA
bridge
bridge
bridge
IP
wrapperSRAM
Flash
SDRAMMem
ctrl
IP
IPIP
wrapper
IP
SRAM
IPIP
I/F
I/F
I/F
IP
IP
IP
SRAM
I/FI/F
CPU IP
Mem ctrl
SDRAM
bridge
bridge
IP
IP
IP
CPU
DSP
DMA
bridge
bridge
bridge
IP
wrapperSRAM
Flash
SDRAMMem
ctrl
IP
IPIP
wrapper
IP
SRAM
IPIP
I/F
I/F
I/F
IP
IP
IP
SRAM
I/FI/F
CPU IP
Mem ctrl
SDRAM
bridge
bridge
IP
IP
IP
“aut
onom
ous”
...
subs
yste
ms
On-/off-chip memoryInterconnect
Backbone network
GP p
roce
ssor
Diversity at physical layer; allow for implementation with few wires/pins
Streaming services with real-time guarantees required for some streams; best-effort for others
Escape for “dumb” subsystems, let GP CPU deal with part of protocol stack
“dum
b”
periph
erals
Combine stand-alone memories; let the network offer memory services to subsystems
auto
nomou
s“a
uton
omou
s”
...
subs
yste
ms
On-/off-chip memoryInterconnect
Backbone network
GP p
roce
ssor
Diversity at physical layer; allow for implementation with few wires/pins
Streaming services with real-time guarantees required for some streams; best-effort for others
Escape for “dumb” subsystems, let GP CPU deal with part of protocol stack
“dum
b”
periph
erals
“dum
b”
periph
erals
Combine stand-alone memories; let the network offer memory services to subsystems
auto
nomou
s
12Research
Networks-on-Silicon: Connecting ICs
RF Camera
App. Engine (APE)Baseband (CMT)
Router DisplayMass Memory
Ext. device
Touch
13Research
Networks-on-Silicon: Technology• To solve the transport delay problem and to
handle the complexity in future generations of SoCby defining the next generation paradigm for platforms.– Infrastructure and related design technology for
communication.– To organize the communication according to a layered &
standardized communication stack (OSI like). – To describe systems as networks and efficiently map
programs on multiple processors in the network– To guarantee (predictable) performance and scalability
with plug & play of subsystems.
14Research
Networks-on-Silicon: Infrastructure• Main active elements in
network infrastructure:– network interface– router
• ATM-like packet-based programming model
• 52Gb/s aggregate throughput• QoS
– Guaranteed Throughput– Best Effort
• Supporting interoperability
subsystem
router
router
routersub
system
router
IP
subsystem
... ...
...
...
... ...router router
IP IPIP IP
IP IPIP IP
IP
mesh, fat tree, express cube, ...
subsystem
router
router
routersub
system
router
IP
subsystem
... ......
... ...router router
IP IPIP IP
IP IPIP IP
IP
subsystem
router
router
routersub
system
router
IP
subsystem
... ......
... ...router router
IP IPIP IP
IP IPIP IP
IP
mesh, fat tree, express cube, ...
15Research
Networks-on-Silicon: Inter-Chip Link– Standardization of physical interconnect
• For seamless links between chips
– From SiP up to box level• Multiple chips in different process technologies
– Transparent to IPs / subsystems on chips• Physical interconnect hidden by more abstract interface• Abstract interface supports re-partitioning with a different
physical interconnect for on-chip communication
– Variety of classes: from Mbits/s up to Gbits/s• CMOS Audio, Compressed Video• Low swing differential Standard resolution images• Embedded clocks High resolution images
16Research
Mapping Applications to Networks
Architecture- multi-processor - distributed shared
memory
Application- parallel tasks- streams
Mapping- tasks to processors- FIFOs to memory
Processor ASIP ASD
Memory
System synthesis: minimal hardware that is required to meet the timing requirements as defined in the specification.
System programming: given a multiprocessor network find a mapping of the application that satisfies the timing constraints.
17Research
Networks-on-Silicon: Linking HW & SW
t3t1t2sh1 sh3
t1t2
b1sh1b2b3
CPU
Memory
Bus
Code Data
t1
t2
t3c1 c2
c3
t3t1t2sh1 sh3
t1t2
b1sh1b2b3
CPU ASIP
Memory
Bus
Code Data
t1
t2
t3c1 c2
c3
TTL interface
communication interface
t1 t2 t3
TTLshell
TTLshell
Hardware
Software
TTLinterface
t1
Infrastructure
t2 t3
c1c2
c3
TTL interface
t1 t2 t3
TTLshell
TTLshell
Hardware
Software
t1 t2 t3
TTLshell
TTLshell
Software
TTLinterface
t1
Infrastructure
t2 t3
c1c2
c3
t1
Platform Infrastructure
t2 t3
c1c2
c3
•Goal is to facilitate reuse of tasks through definition of interface for streaming (queues, fifos, channels)
18Research
Memory and Streaming
Support for streaming via on-chip memory• Streaming via off-chip memory:
– Bandwidth bottleneck– Power consumption– Pin count
• Streaming via on-chip memory– High sync rate is enabler (implementation challenge)– Small buffers (in distributed shared memory)– Low latencies
19Research
• Meeting the temporal requirements is essential for many consumersystems. – Hard real-time:
• don’t miss a deadline (= guarantee throughput and latency)• graceful degradation is not supported• e.g. channel decoders, picture improvement, audio decoding
– Soft real-time:• there is some diminished value when deadline is missed and
value does not increase if result is delivered earlier• graceful degradation or fall back must be supported• objective is constant Quality of Service (QoS)• e.g. video decoders
– Best-effort:• an earlier delivered result is appreciated• e.g. web browser
Predictable Multiprocessor Networks
20Research
Related Models of Computation
• Kahn Process Networks [Kahn, 1974]– concurrent processes communicating through unbounded fifos– deterministic communication only
• Communicating Sequential Processes [Hoare, 1978]– concurrent processes communicating through unbuffered channels– non-deterministic communication through probe [Martin, 1985]
• Dataflow Process Networks [Lee and Parks, 1995]– special case of KPN; processes are actors plus firing rules– Fire & Exit: each iteration has to be one atomic action.
Requires explicit state saving for data-dependent behavior• Communicating Finite State Machines [Balarin et al.,1997]
– broadcasting of time-stamped events– global notion of time difficult to implement in parallel and
distributed signal processing systems
21Research
Networks-on-Silicon: QoS
SwitchA
SwitchB
d1
d3
d2
d4
a2
c1a1
d3
d2a2
d4 a1
d1
b2
b1c1
b2
b1
d1d2
d3 d4
a2 c1
a1
TDMA
Guaranteed time behaviorSystem-level predictabilitySmaller buffers
22Research
• A number of (smaller processors) communication using a protocol with cache coherence extensions
• Each processor has its own L1 cache and shares an L2 cache with interleaved memory banks
• Escaping from Pollack’s rule (exploding power densities for higher performant CPUs)
Multi-processor network nodes
23Research
Concluding Remarks• From computation centric to communication
centric architectures: Networks-on-Silicon will be at the heart of future platforms
• Digital architectures offer a wealth of implementation options. Therefore standardizationis key– In interfaces, services, and protocols– In design environments (including SDK)
• Automated flow with fast performance verification is essential.
• Towards an Open Platform and Ecosystem