DAC 2006
CAD Challenges for Leading-Edge
Multimedia Designs
NOMADIK
“The challenge of low power, high performance and
scalable multimedia acceleration”
Alain Artieri - Patrick Blouet
STMicroelectronics
July 26, 2006
4
Multimedia Computing Landscape
5
The convergence paradigm
New Mobile Multimedia Computing
Architecture
Personal Computer
Mobile Phone
Consumer Electronics
6
Consumer versus Computer
Consumer Products
• High quality of service
• Designed for worst case
• Highly parallel architecture
• Hardware accelerators
Personal Computer
• Monolithic processor architecture
• High MHz for performance
• High power consumption
• Open OS
• Flexibility
• Rich set of standard interfaces for storage and connectivity
New computing architecture must combine the best of both worlds
• Open platform, multi OS
• Flexibility
• Rich set of standard interfaces for storage and connectivity
7
Cell Phones : a Key Driver
Year       1990    2000           2005         2010
M Units    <100    400            700          900
Features   Voice   Voice & Data   Multimedia   Global Convergence
8
Competing Technical Constraints
Scalability
Low Power
Multimedia Performance
9
Multimedia Performance Requirements :
Multiple video standards, encode and decode (MPEG4, H264, WMV, …), up to HDTV format
High resolution : VGA screen and above in small form factor, Output to HDTV with large screen
Multi megapixel camera, DSC class image reconstruction chain and picture improvement
Sophisticated Audio use cases : combination of multiple Codecs, sound effects, speech codecs, …
Advanced 3D graphics acceleration for gaming
Consume & produce high bandwidth multimedia content
10
Low Power
A key system technology driver
• Of course a product feature: battery life time
• But helps product manufacturability: stacking in a power budget
• And product cost: low cost packaging, no heat sink
11
Nomadik Architecture Overview
12
Application Processor Content
[Figure: application processor blocks: Host Processor, Peripherals, Embedded Memory, Multimedia Accelerator]
• Host processor & peripherals: no differentiation
• Multimedia acceleration: differentiating factor
The architecture & design challenge is in Multimedia Acceleration (Audio, Video, Imaging, Graphics)
This is where innovation is required and competitive advantage is built
13
Nomadik Multimedia Acceleration Model
[Figure: several sub-systems (three drawn, then "…"), each with a DSP, tightly coupled HW and a DMA engine, all attached to an interconnect. Labels: Multiple DSP; Attached to HW acceleration; Data mover]
Multiple DSP based sub-system
• Symmetrical DSPs (generic S/W component can run anywhere)
• Attached HW resources (dependence resolved at component manager level)
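The placement rule on this slide (a generic component can run on any DSP, while a component that depends on attached hardware is resolved by the component manager) can be sketched as below. This is only an illustration: `Subsystem`, `Component`, `place` and the accelerator labels are hypothetical names, not part of any Nomadik API, and the least-loaded policy is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subsystem:
    """One DSP sub-system: a DSP plus its tightly coupled HW (hypothetical model)."""
    name: str
    attached_hw: frozenset = frozenset()
    load: int = 0  # number of components currently placed here


@dataclass
class Component:
    """A software component; needs_hw is None for a generic component."""
    name: str
    needs_hw: Optional[str] = None


def place(component, subsystems):
    """Pick the least-loaded sub-system that satisfies the HW dependence."""
    candidates = [
        s for s in subsystems
        if component.needs_hw is None or component.needs_hw in s.attached_hw
    ]
    if not candidates:
        raise LookupError(f"no sub-system offers {component.needs_hw!r}")
    target = min(candidates, key=lambda s: s.load)
    target.load += 1
    return target.name


subsystems = [
    Subsystem("video", frozenset({"video_codec"})),
    Subsystem("imaging", frozenset({"image_pipe"})),
    Subsystem("audio"),
]
placements = {
    c.name: place(c, subsystems)
    for c in [
        Component("h264_decode", needs_hw="video_codec"),
        Component("mp3_decode"),      # generic: may land on any DSP
        Component("sound_effects"),   # generic: may land on any DSP
    ]
}
print(placements)
```

The HW-dependent decoder can only land on the sub-system that owns the matching accelerator; the two generic components are spread over whichever DSPs are least loaded.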
14
Multiple DSP approach benefits
High computing performance:
• Multiple non-interfering domains of intense activity, each having its own processor, DMA services and hardware accelerators for data intensive functions
• Hardware acceleration embedding standard functions (e.g. video codec, image reconstruction & improvement)
• Highest & predictable performance through a careful bus and memory hierarchy design
Low Power (target: 100’s of mW):
• Intrinsic low power sub-systems
• Fine grain power management at sub-system level
• Leakage management by switching sub-systems on & off
15
Power management
Combination of multiple techniques:
Dynamic power reduction:
• Clock gating
• Voltage scaling (DVFS)
• Pulse-Width Modulation (PWM)
Static power reduction:
• Biasing
• Power On/Off switching (Power gating)
A global system issue, from power management inside the OS down to silicon process (e.g. gate leakage)
16
DVFS Principle
[Figure: an Operating System Load Monitor (SW) maps CPU performance requirements onto Voltage/Frequency Tables]
Process Requirements:
- Large voltage excursion
- Low leakage
CPU performance   CPU voltage   Energy saving
100%              1.3V          (reference)
85%               1.2V          28%
62%               1.1V          55%
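The savings on the DVFS slide can be reproduced to within about one percentage point with the usual first-order model for dynamic power. The sketch below assumes P ∝ f·V² and a clock frequency proportional to the performance figure; the slides do not state the model, so both are assumptions. `pick_operating_point` is a hypothetical stand-in for the voltage/frequency table lookup done by the load monitor.

```python
# Operating points read off the DVFS slide: (relative performance, CPU voltage in volts).
OPERATING_POINTS = [(1.00, 1.3), (0.85, 1.2), (0.62, 1.1)]


def relative_power(perf, volts, ref_volts=1.3):
    """Dynamic power relative to the reference point.

    Assumes P ~ f * V^2 with clock frequency proportional to performance.
    """
    return perf * (volts / ref_volts) ** 2


def pick_operating_point(required_perf):
    """Return the slowest (lowest-power) point that still meets the requirement."""
    feasible = [p for p in OPERATING_POINTS if p[0] >= required_perf]
    if not feasible:
        raise ValueError("requirement exceeds the fastest operating point")
    return min(feasible, key=lambda p: p[0])


# Percentage saved at each operating point, relative to 100% performance at 1.3 V.
savings = {
    perf: round(100 * (1 - relative_power(perf, volts)), 1)
    for perf, volts in OPERATING_POINTS
}
print(savings)
print(pick_operating_point(0.70))
```

Under this model the saving comes from both the lower clock and the squared voltage term, which is why the process requirement for a large voltage excursion matters.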
17
PWM Principle
[Figure: an Operating System Load Monitor (SW) maps CPU performance requirements onto an Active clock ratio table]
Process Requirements:
- Clock as fast as possible
- Source bias or switch off when clock is stopped
CPU performance   CPU voltage   Energy saving
100%              1.0V          (reference)
85%               1.0V          15%
62%               1.0V          38%
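With the supply held at 1.0 V, the PWM figures are consistent with a saving that is simply the fraction of time the clock is stopped. The sketch below makes that assumption explicit (leakage during the stopped phase is ignored); `pwm_saving_percent` and `pwm_schedule` are hypothetical helper names, and the 100-tick period is an arbitrary choice.

```python
# Active clock ratios from the PWM slide; the supply stays at 1.0 V throughout.
ACTIVE_RATIOS = [1.00, 0.85, 0.62]


def pwm_saving_percent(active_ratio):
    """Dynamic energy saved when the clock runs only active_ratio of the time.

    Assumes fixed voltage and no energy spent while the clock is stopped.
    """
    if not 0.0 <= active_ratio <= 1.0:
        raise ValueError("active_ratio must be between 0 and 1")
    return round(100 * (1 - active_ratio))


def pwm_schedule(active_ratio, period_ticks=100):
    """Split one PWM period into (clock-running ticks, clock-stopped ticks)."""
    running = round(active_ratio * period_ticks)
    return running, period_ticks - running


for ratio in ACTIVE_RATIOS:
    print(ratio, pwm_saving_percent(ratio), pwm_schedule(ratio))
```

Because voltage never drops, the saving is linear in the active ratio, which is smaller than the DVFS saving at the same performance level.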
18
Multi-step PWM
Power management state machine under SW control
• Source Bias for short clock stop period
• Power off with context save/restore for long period
[Figure: timeline with a short stop (Source Bias – reduced leakage) and a long stop (Power Off – zero leakage), the long stop bracketed by context save and restore]
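A software-controlled state machine of this kind might choose between the two stop modes from the expected idle time. The sketch below is a minimal illustration: the 500 µs break-even threshold, the state names and the step names are all hypothetical, since the slides give no figures for the save/restore cost.

```python
from enum import Enum


class PowerState(Enum):
    RUN = "run"
    SOURCE_BIAS = "source bias"  # short stop: reduced leakage, state retained
    POWER_OFF = "power off"      # long stop: zero leakage, context saved first


# Hypothetical break-even point: below it, save/restore costs more than it saves.
LONG_STOP_THRESHOLD_US = 500


def stop_state(expected_idle_us, threshold_us=LONG_STOP_THRESHOLD_US):
    """Choose the power state for an upcoming clock-stop period."""
    if expected_idle_us <= 0:
        return PowerState.RUN
    if expected_idle_us < threshold_us:
        return PowerState.SOURCE_BIAS
    return PowerState.POWER_OFF


def stop_sequence(expected_idle_us):
    """List the software-visible steps for one clock-stop period."""
    state = stop_state(expected_idle_us)
    if state is PowerState.POWER_OFF:
        return ["save context", "power off", "power on", "restore context"]
    if state is PowerState.SOURCE_BIAS:
        return ["stop clock", "apply source bias", "release source bias", "restart clock"]
    return []


print(stop_state(50), stop_sequence(50))
print(stop_state(2_000), stop_sequence(2_000))
```

In practice the threshold would come from measured save/restore energy and leakage data, which is the kind of design-level information the next slide says must reach software developers.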
19
Power management
Power mode changes are managed by software:
• Constraints and impact must be known by the software developer.
• Information initially needed only at design level is now flowing into the software space.
Power awareness in the software world is coming from the design world, through a better link between design tools and software development tools.
Need for a power view of the application accessible to software developers.
20
Software Architecture for Multimedia Acceleration
21
Complex Multimedia Software Stack
[Figure: software stack with the layers Hardware; Codecs, Sensors, Presentation; Execution Infrastructure; Media Network Server; Multimedia API; Multimedia Framework; Operating System; User Interface. The SoC design perimeter is marked, with the label "Upward pervasion of design constraints"]
22
Objectives
A unified programming model for distributed computing
• One S/W component can run anywhere possible
• Dynamically configurable
• Run complex algorithms that require more than one DSP
Enforce software architecture
• Modularity
• Component programming model
• Multimedia framework
Comprehensive debug
• System level monitoring
• Component observable by construction (auto code instrumentation)
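One way to read "observable by construction" is that the component model wraps every component entry point with monitoring code, so the developer does not add it by hand. The decorator below is a small sketch of that idea only; `observable`, `CALL_COUNTS` and `scale_samples` are hypothetical names, and the slides do not describe how the actual instrumentation is generated.

```python
import functools
from collections import Counter

CALL_COUNTS = Counter()  # system-level monitor: invocations per component


def observable(func):
    """Wrap a component entry point so that every call is recorded automatically."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALL_COUNTS[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@observable
def scale_samples(samples, gain):
    """Toy audio component: apply a gain to a block of samples."""
    return [s * gain for s in samples]


scale_samples([1, 2, 3], 2)
scale_samples([4], 3)
print(CALL_COUNTS)
```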
23
Complex use case illustration
•16 QCIF decode
•1 Grab & Viewfinder
•Graphics & control on Host CPU
•SVGA display
•100mW
24
Architecture evolution
25
SoC evolution across technology nodes
Constant SoC Die Size
• Slow evolution of peripherals (area decrease)
• General purpose CPU sub-system complexity doubles at each node (constant area)
• Embedded memory capacity doubles at each node (constant area)
• Loosely coupled DSP sub-system complexity increases by 30% at each node (30% area decrease)
Year                          2004   2006   2008   2010   2012
Technology Node (nm)          90     65     45     32     22
Loosely coupled Sub-Systems   2      4      6      8      12
General Purpose CPU           Single → Multiple
Hardware Accelerator          Hardwired → Reconfigurable
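The area claims on this slide can be sanity-checked with ideal scaling, under which the area per gate shrinks with the square of the feature size. That scaling law is an assumption made here for illustration; the slides do not state how the area figures were derived, and `area_ratio` is a hypothetical helper.

```python
NODES_NM = [90, 65, 45, 32, 22]   # technology nodes from the slide
DSP_COMPLEXITY_GROWTH = 1.30      # +30% DSP sub-system complexity per node
CPU_COMPLEXITY_GROWTH = 2.00      # CPU sub-system complexity doubles per node


def area_ratio(old_nm, new_nm, complexity_growth):
    """Relative block area after a node change, assuming area per gate ~ (feature size)^2."""
    return complexity_growth * (new_nm / old_nm) ** 2


for old, new in zip(NODES_NM, NODES_NM[1:]):
    dsp = area_ratio(old, new, DSP_COMPLEXITY_GROWTH)
    cpu = area_ratio(old, new, CPU_COMPLEXITY_GROWTH)
    print(f"{old}->{new} nm: DSP sub-system area x{dsp:.2f}, CPU sub-system area x{cpu:.2f}")
```

Under this assumption a doubled CPU sub-system stays close to constant area, while a DSP sub-system that grows by 30% shrinks by roughly a third per node, which leaves room for the growing number of loosely coupled sub-systems in the table.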
26
Main trends
Host CPU evolving toward multi-core architecture to meet the performance increase requirements
HW acceleration mapped on reconfigurable arrays
• Performance close to dedicated HW in many areas
• Good fit with regular design constraints imposed by 45nm process and beyond
• Excellent structure for best optimized power management
• And … FLEXIBILITY …
27
Reconfigurable Hardware (DSP fabric)
Target signal processing and arithmetic intensive applications
Reconfigurable array of simple DSP cores (CNodes)
Low power architecture
• Hierarchical clock gating
• Distributed leakage control (fine grain power gating)
Programmable DMA engine
Reconfigurable at run time, multi task
28
Mapping Flow
• ALUs execute a cyclic micro-sequence
• Data exchanges through hierarchical clustered interconnect
• Configuration step is sequence loading and interconnect programming
[Figure: mapping flow stages, each drawn with Data in / Data out: Behavioral code; DFG; ILP + software pipelining; Partitioning/static scheduling; Coarse grained configuration]
Behavioral code:
Procedure(In,Out,inout)
Constant A,b,c,…;
Begin
X=a-in[0];
……..
End;
[Figure: hierarchical interconnect: Clusters at Level 0, MUX at Level 1 and Level 2, with ports N0_i/N0_o, N1_i/N1_o, N2_i/N2_o]
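The "Partitioning/static scheduling" step can be pictured as assigning each node of the data-flow graph to an ALU and a cycle. The sketch below uses plain resource-constrained list scheduling as a stand-in; it is not the tool described in the talk, it models neither software pipelining nor the interconnect, and the example graph and the one-cycle latency are assumptions.

```python
def static_schedule(dfg, num_alus):
    """Resource-constrained list scheduling of a data-flow graph.

    dfg maps each operation to the list of operations it depends on.
    Every operation is assumed to take one cycle on any ALU.
    Returns {operation: (cycle, alu_index)}.
    """
    schedule = {}
    remaining = dict(dfg)
    cycle = 0
    while remaining:
        # Everything already in `schedule` ran in an earlier cycle, so an
        # operation is ready as soon as all of its inputs are scheduled.
        ready = sorted(
            op for op, deps in remaining.items()
            if all(d in schedule for d in deps)
        )
        if not ready:
            raise ValueError("cyclic or unresolved dependency in the DFG")
        for alu, op in enumerate(ready[:num_alus]):
            schedule[op] = (cycle, alu)
            del remaining[op]
        cycle += 1
    return schedule


# Hypothetical DFG for y = (a - in0) * (b + in1) + c
dfg = {"sub": [], "add": [], "mul": ["sub", "add"], "acc": ["mul"]}
print(static_schedule(dfg, num_alus=2))
print(static_schedule(dfg, num_alus=1))
```

With two ALUs the independent `sub` and `add` run in the same cycle; with one ALU they are serialized, which lengthens the micro-sequence by one cycle.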
30
What can fit in 45mm² in 45nm
[Figure: floorplan built around an Interconnect, with these blocks:]
• HostCore 1 and HostCore 2, with L1 and L2
• 4MB Multi-port Embedded Memory
• Peripherals & analog
• Six sub-systems, each with L1, DSP, HW and DMA
• Programmable Multimedia Accelerator
• 192 CNode (40 GOPS)
• Imaging H/W
• Video H/W
31
CAD Challenges
32
Main areas of CAD challenges
Low Power design
• Static & Dynamic power global optimization
• Power control is becoming very fine grain. Must be tightly linked with software environment.
• Power control is beyond the pure SoC. System level power view is needed.
Software design
• Efficient software design on hierarchical multiprocessor engine
• Capability to architect & design software architecture as efficiently as HW
• Capture tools, simulation, verification, automated code generation
33
Main areas of CAD challenges
Synthesis on Reconfigurable hardware
Configuring the hardware network
• 3D place & route of massively parallel code on arrays of DSPs
• Design constraints going up in the software
– Reconfiguration latency
– Expected performance.
Reconfigurable hardware managed at software level.
• Software development environment has to be aware of reconfigurable hardware.
– Profiling to extract hot spots and the benefit of doing them in hardware.
– Code generation as well as the reconfiguration sequence for hardware.
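The profiling step above weighs the speed-up of a hot spot against the reconfiguration latency. A minimal sketch of that trade-off follows; the `HotSpot` fields, the one-reconfiguration-per-hot-spot cost model and every number in `profile` are hypothetical and serve only to illustrate the decision.

```python
from dataclasses import dataclass


@dataclass
class HotSpot:
    name: str
    calls: int     # invocations seen by the profiler
    sw_us: float   # time per call when run in software
    hw_us: float   # estimated time per call on the reconfigurable fabric


def offload_gain_us(spot, reconfig_latency_us):
    """Net time saved by moving a hot spot to hardware, after one reconfiguration."""
    return spot.calls * (spot.sw_us - spot.hw_us) - reconfig_latency_us


def select_offloads(spots, reconfig_latency_us):
    """Names of hot spots worth offloading, largest net gain first."""
    gains = {s.name: offload_gain_us(s, reconfig_latency_us) for s in spots}
    return sorted((n for n, g in gains.items() if g > 0), key=lambda n: -gains[n])


profile = [
    HotSpot("fir_filter", calls=10_000, sw_us=12.0, hw_us=2.0),
    HotSpot("header_parse", calls=50, sw_us=8.0, hw_us=6.0),
]
print(select_offloads(profile, reconfig_latency_us=1_000.0))
```

A frequently called kernel easily pays back the reconfiguration cost, while a rarely called one does not, which is why reconfiguration latency has to be visible to the software tools.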
34
Conclusion
For multimedia processors, the complexity is moving to software design
• Hardware complexity resolved through regular design (multicore host, multi-DSP, coarse-grained DSP fabric)
CAD challenge lies essentially in S/W design tools
• Multimedia software execution infrastructure, simulation, debug
• Programmable hardware acceleration