TRANSCRIPT
Renesas Electronics America Inc.
© 2010 Renesas Electronics America Inc. All rights reserved.
ID 112C: MCU Architecture Evolution – Now Better than Ever – So who’s the Best?
Mark Rootz
Sr. Marketing Manager
12 October 2010
Version: 1.2
Mark Rootz
Renesas: Sr. Marketing Manager, 32-bit MCUs
• Definition and promotion of 32-bit MCUs, N. America
BSEE and MSEE from University of Missouri – Rolla
Seven years at STMicroelectronics
• Marketing Manager, STR9 32-bit ARM9 MCU line (France)
• Product Marketing Manager, uPSD 8-bit 8051 MCU (San Jose, CA)
• Product definition, technical marketing, business mgt, infrastructure
Three years at Waferscale Inc
• Applications Manager, uPSD MCUs
• Tools, software, training, documentation, solutions, silicon validation
Three years at Hypertech Inc
• Project Manager and engineering
• Automotive powertrain controller software and hardware
Twelve years at McDonnell Aircraft (now Boeing)
• Project Manager and engineering
• F15/F18 fighter avionics systems engineering (weapons, radar, navigation)
• Real-time simulation/test environment for complete avionics suite
• Embedded MCUs, MPUs, PLDs software and hardware design
Renesas Technology and Solution Portfolio
Solutions for Innovation
• Microcontrollers & Microprocessors: #1 market share worldwide*
• Analog and Power Devices: #1 market share in low-voltage MOSFET**
• ASIC, ASSP & Memory: advanced and proven technologies
* MCU: 31% revenue basis from Gartner "Semiconductor Applications Worldwide Annual Market Share: Database" 25 March 2010
** Power MOSFET: 17.1% on unit basis from Marketing Eye 2009.
Microcontroller and Microprocessor Line-up
Superscalar, MMU, Multimedia; up to 1200 DMIPS; 45, 65 & 90nm process; video and audio processing on Linux; server, industrial & automotive
Up to 500 DMIPS; 150 & 90nm process; 600uA/MHz, 1.5uA standby; medical, automotive & industrial
Legacy Cores: next-generation migration to RX
High Performance CPU, FPU, DSC
Embedded Security
Up to 10 DMIPS; 130nm process; 350uA/MHz, 1uA standby; capacitive touch
Up to 25 DMIPS; 150nm process; 190uA/MHz, 0.3uA standby; application-specific integration
Up to 25 DMIPS; 180, 90nm process; 1mA/MHz, 100uA standby; crypto engine, hardware security
Up to 165 DMIPS; 90nm process; 500uA/MHz, 2.5uA standby; Ethernet, CAN, USB, motor control, TFT display
High Performance CPU, Low Power
Ultra Low Power
General Purpose
RX: Performance without Sacrifice
High Performance CPU, FPU, DSC
High Performance CPU, Low Power
Superscalar, MMU, Multimedia; up to 1200 DMIPS; 45, 65 & 90nm process; video and audio processing on Linux; server, industrial & automotive
Up to 500 DMIPS; 150 & 90nm process; 600uA/MHz, 1.5uA standby; medical, automotive & industrial
Legacy Cores: next-generation migration to RX
Up to 165 DMIPS; 90nm process; 500uA/MHz, 2.5uA standby; Ethernet, CAN, USB, motor control, TFT display
Key Attributes
RX Innovation – Single Chip Enablement
There are many 32-bit MCU/DSP architectures covering varied capabilities:
• PIC32
• Cortex-M3/M4
• ColdFire
• Kinetis
• TMS320
• ARM7/9
• AVR32
In a single family of devices, RX will encompass / exceed these capabilities
RX Innovation – Single Chip Enablement
A single RX MCU can:
• Interpret a multitude of analog and digital input sources
• Generate precision analog and digital outputs in real time
[Diagram: signals surrounding the RX MCU: sound, vibration and AC signals; temperature; voltage and current; streaming digital data; processed audio; power and motor control; graphic imaging]
RX Innovation – Single Chip Enablement
One MCU family for many applications
* Photos are examples of end-products that could use an RX600 MCU. RX600 MCUs not
necessarily used in these products.
RX Microcontrollers … Best of the Best
RX MCUs were conceived and designed from the best CPU
architecture and technology available in the industry today
delivering the perfect blend of:
• CPU and Memory Performance
• Analog and DSP Capability
• Power and Memory Efficiency
• Scalability
• Connectivity
• System Cost
“Best of the Best”
Agenda
Traditional Architectures
32-bit Choices
RX Architecture
Memory Speed vs. Performance
Comparing with Other 32-bit MCUs
Who’s the Best?
Q & A
Key Takeaways
By the end of this session you will be able to:
Understand Key MCU Architectural Elements
Understand RX Architecture
Compare RX with Other Architectures
Make an Informed Decision
MCU, DSP, Digital Signal Controller … What’s the Difference?
Traditional MCUs
• Single-Chip Device
• Interrupt Management System
• Fast Interrupt Response
• Efficient General Instructions
• Fine Power Management
• Wide Connectivity Choice
• Rich Supervisory Functions
• Easily Programmed in C
• Simple Low-Cost Tools
• Broad Ecosystem
• Simple Integer Math
Traditional DSPs
• Multi-Chip Solution
• Single-Task Oriented
• Slower Interrupt Response
• Very Specific Instructions
• High Power Consumption
• Limited Connectivity Choice
• Few Supervisory Functions
• Complex Software
• More Expensive Special Tools
• Narrow Selection of 3rd Parties
• Hardware Multiply and Divide
• Saturating Math
• 1-Cycle, wide Multiply-Accumulate
• Barrel Shifters
• Simultaneous Code/Data Access
• Floating Point Unit
DSC: Optimum Blend of MCU and DSP
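To make two of the DSP features in the list concrete, the sketch below contrasts wrap-around with saturating 16-bit addition and shows a multiply-accumulate loop. It is a minimal Python model for illustration only, not vendor code; the names `wrap16`, `sat16_add` and `mac` are made up for this example.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap16(x):
    """Wrap an integer to signed 16 bits, as plain integer hardware would."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000

def sat16_add(a, b):
    """Add two values and clamp to the signed 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, a + b))

def mac(samples, coeffs, acc=0):
    """Multiply-accumulate: acc += samples[i] * coeffs[i] for every pair."""
    for s, c in zip(samples, coeffs):
        acc += s * c
    return acc

print(wrap16(30000 + 10000))      # overflow wraps to a negative value
print(sat16_add(30000, 10000))    # overflow clamps at the maximum
print(mac([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6
```

Saturation keeps an overflowing signal pinned at full scale rather than flipping its sign, which is why it appears alongside the MAC unit in the DSP column.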
The Evolved DSC, Many Practical Uses
More MCUs are gaining DSC Features
• MCUs now have better analog capabilities
• Signal processing is a must
• Pushes bandwidth limits of traditional MCUs
DSC Applications
• Motor Control
• Digital Power Management
• Audio Codecs
• Medical Monitoring
• Factory Automation
Even benefits traditional MCU applications
• More work in less time
16/32-bit MCUs and DSCs in the Market

MCUs
Core | Vendor | CPU Width (bits) | DMIPS/MHz of CPU Core | Available Frequency (MHz) | Flash Speed (MHz) | Max Flash Size (KB)
V850ES | Renesas | 32 | 1.90 | 20 - 50 | 32 | 1024
ARM CortexM3 | Various | 32 | 1.25 (7) | 60 - 150 | <=50 (2) | 1024
PIC32 (6) | Microchip | 32 | 1.56 | 40 - 80 | 30 | 512
ARM7TDMI (Flash) | Various | 32 | 0.95 (7) | 24 - 60 | <=30 (8) | 1024

DSCs
Core | Vendor | CPU Width (bits) | DMIPS/MHz of CPU Core | Available Frequency (MHz) | Flash Speed (MHz) | Max Flash Size (KB) | MAC (result width bits) | FPU (width bits)
SH-2A (Flash) | Renesas | 32 | 2.00 | 100 - 200 | 100 | 1024 | 32 and 64 | 64
RX600 | Renesas | 32 | 1.65 | 80 - 100 | 100 | 2048 | 48 and 80 | 32
AVR32 (9, 10, 11) | Atmel | 32 | 1.50 | 40 - 66 | 33 | 512 | 32, 48, and 64 | -
ARM CortexM4 (12, 13) | Various | 32 | 1.25 | 150 (1) | <=50 (2) | 1024 | 32 and 64 | 32 (3)
STR9 ARM966E (14) | ST | 32 | 1.10 | 96 | 33 | 2048 | 32 and 64 | -
TMS320 Delfino (Flash) (15) | TI | 32 | n/a | 100 - 150 | 27 | 512 | 64 | 32
TMS320 Piccolo (16) | TI | 32 | n/a | 40 - 60 | 25 | 128 | 64 | -
56F8000/8300 (17) | Freescale | 16 | 1.00 (4) | 32 - 60 | No spec | 512 | 36 | -
dsPIC (18) | Microchip | 16 | 0.50 (5) | 60 - 80 | No spec | 256 | 40 | -

Notes:
1 Core is capable of, no released product yet
2 Based on existing CM3 and CM4-based MCUs in mass production today
3 Optional FPU
4 MIPS, not DMIPS
5 MIPS, not DMIPS. 80MHz external clock yields 40MIPS
6 Microchip, PIC32MX3XX/4XX Family Data Sheet, DS61143E
7 ARM, “An Introduction to the ARM Cortex-M3 Processor”, Oct 2006
8 Renesas 32-bit Flash MCU market assessment
9 Atmel, AVR32 brochure 7919F-AVR32-07/09/5K
10 Atmel, AVR32 Architecture Document 32000B-AVR32-11/07
11 Atmel, AT32UC3A datasheet 32058G-AVR32-01/09
12 ARM, CortexM4 Features Summary, www.arm.com
13 ARM, Cortex-M4 Technical Reference Manual r0p0
14 ST, STR91xFAxxx datasheet 13495 rev 6
15 TI, Data Manual, TMS320F283xx & TMS320F282xx DSCs, SPRS439H, March 2010
16 TI, Data Manual, TMS320F280xx MCUs, SPRS584D, June 2010
17 Freescale, Data Sheet, 56F8323/56F8123 16-bit DSCs, MC56F8323 rev 17, May 2007
18 Microchip, Data Sheet, dsPIC33FJXXXMCX06A/X08A/X10A, 16-bit DSCs, DS70594B, 2009
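One way to read the table is to multiply core efficiency by the top clock rate to get a peak DMIPS figure. The sketch below does that for a few rows; it is a rough illustration that assumes the core runs at its maximum listed frequency with no Flash wait states, which the Flash Speed column suggests is not always the case. The dictionary and function names are invented for the example.

```python
# (DMIPS/MHz, maximum listed frequency in MHz), copied from the table
cores = {
    "V850ES": (1.90, 50),
    "ARM CortexM3": (1.25, 150),
    "PIC32": (1.56, 80),
    "ARM7TDMI (Flash)": (0.95, 60),
    "SH-2A (Flash)": (2.00, 200),
    "RX600": (1.65, 100),
}

def peak_dmips(dmips_per_mhz, max_mhz):
    """Peak Dhrystone MIPS, ignoring memory wait states."""
    return dmips_per_mhz * max_mhz

for name, (efficiency, mhz) in cores.items():
    print(f"{name:18s} {peak_dmips(efficiency, mhz):6.1f} DMIPS")
```

For RX600 the product agrees with the "Up to 165 DMIPS" figure quoted on the line-up slide.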
Let’s Build an RX…
Traditional CISC: Complex Instruction Set Computer
GOAL: Small Memory Footprint
• Any inst accesses memory
• Many rich instructions
• Many addressing modes
• Variable instruction formats
• Smaller code size in memory
• Single register set
• Multi-clock instructions
• Less to no pipelining
• Longer interrupt response
Traditional RISC: Reduced Instruction Set Computer
GOAL: 1 Clock per Instruction
• Only load/store mem access
• Few instructions
• Few addressing modes
• Fixed instruction formats
• Larger code size in memory
• Multiple register sets
• Single-clock instructions
• Highly pipelined
• Faster interrupt response
RX is Best of Both: CISC and RISC
• Mem-to-Mem instructions
• 73 Inst + DSP + FPU
• 10 addressing modes
• 1 to 8 byte instructions
• Up to 28% smaller code
• 16 x 32-bit registers
• One clock per instruction
• 5-stage pipeline
• 5-clock interrupt response
Plus it has an FPU.
RX Architecture … CPU Core and Pipeline
RX600 CISC CPU: 100MHz CPU Core, 1.65 DMIPS/MHz
• 16 x 32bit General Purpose Registers
• 9 x 32bit Control Registers
• 32bit Floating Point Unit
• 16x16 or 32x32 MAC, 48bit or 80bit Result
• 32 x 32 DIV or MULT, 32bit or 64bit Result
• Memory Protect Unit
• Interrupt Control
• On-Chip Debug
ENHANCED HARVARD ARCHITECTURE
• Instruction: 64bit path
• Operand (Data): 32bit path
• Memory Interface: 64-bit paths to memory; typically Flash memory for code and SRAM for data
• RX Flash is 10 nsec, or 100 MHz zero-wait
• RX SRAM is also 10 nsec
• PRE-FETCH QUEUE (PFQ): holds 4 to 32 instructions for slower memory
• WRITE BUFFER: for slow memory; buffer only for writes
5-STAGE PIPELINE
• F = FETCH INSTRUCTION
• D = DECODE INSTRUCTION
• E = EXECUTE INSTRUCTION
• M = READ OR WRITE MEMORY
• W = WRITE BACK TO REGISTER
[Pipeline diagram: on each clock tick a new instruction enters F while earlier instructions advance through D, E, M and W, so all five stages work in parallel]
Achieves One Clock-Per-Instruction (CPI)
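The one-clock-per-instruction claim follows from simple pipeline arithmetic: the first instruction needs one clock per stage, and every later instruction completes one clock after its predecessor. The sketch below shows this for an idealized pipeline with no stalls, branches or multi-cycle instructions (an assumption; real code will see some of each). The function names are illustrative.

```python
def pipeline_cycles(n_instructions, stages=5):
    """Clocks to finish n instructions in an ideal pipeline with no stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction fills the pipeline; each later one adds one clock.
    return stages + (n_instructions - 1)

def cpi(n_instructions, stages=5):
    """Average clocks per instruction over the whole run."""
    return pipeline_cycles(n_instructions, stages) / n_instructions

for n in (1, 5, 100, 10000):
    print(n, pipeline_cycles(n), round(cpi(n), 4))
```

As the instruction count grows, the fill cost of the five stages is amortized and the average approaches 1.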
RX Architecture … Memory Interface
100 MHz Flash and SRAM means zero wait-state code and data access
PFQ minimizes stalls from slower memory, such as external memory
Bus master of Internal Main Bus 1 is the CPU
Next we look at Internal Bus 2…
[Block diagram of the RX600 MCU: the RX600 CPU (100MHz; pipeline, PFQ, write buffer; 64b instruction and 32b data paths) connects through the bus matrix to Flash memory (100MHz access, 64 bits) and SRAM (100MHz access, 64 bits). Internal Main Bus 1 (32 bits) connects the CPU to the External Bus Controller (BSC, 32 bits) with its external bus pins, and through a bus bridge to the peripherals.]
RX Architecture … System Interface
Multiple Peripheral Busses to Spread Bandwidth Loading
• Communication (USB, CAN, SCI, SPI, I2C)
• Timers (MTU, TPU, TMR, CMT)
• Analog (DAC, ADC, PGA), GPIO
• System Control (DMA, E2P, ICU, LVD, RTC, WDG, CLKS)
Bus masters
• CPU: bus master of Internal Main Bus 1 (32 bits)
• DTC, DMAC and Ethernet DMAC: bus masters on Internal Main Bus 2 (32 bits)
• EXDMA: external bus master
4 Transfers at one time, plus 2 interleaving!
[Block diagram of the RX600 MCU: CPU (100MHz; pipeline, PFQ, write buffer; 64b instruction and 32b data paths), bus matrix, Flash memory and SRAM (100MHz access, 64 bits each), bus bridges to the peripheral busses, Ethernet MAC with 2K FIFOs, and the External Bus Controller (BSC) serving two external devices through the external bus pins]
RX CPU Core Performance
[Bar chart: DMIPS per MHz (axis marks at 1.0 and 1.5) for ARM7, ARM9, Cortex-M3, Cortex-M4 and RX. RX: 1.65 DMIPS/MHz]
Note: Dhrystone 2.1 numbers for ARM processors taken from www.arm.com
Up to 43% Power Reduction
Low power design techniques
• Clock gating
• Low power HVT transistors in slower paths
• Power gating
Low power modes
• 500uA* per MHz in Run Mode
• All Peripherals ON
• Four Low-Power Modes
• Sleep
• All-Module Stop
• Standby
• Deep Standby
• 2.5uA* in Deep Standby
• RX63x, RTC ON
[Bar chart: Milliwatts* per DMIPS (axis marks at 1.0 and 2.0), RX600 vs. a Cortex-M3 based MCU; RX600 is 43% less]
Note: Derived from IDD specifications stated in product datasheets
* Typical Conditions, 3.3V and 25°C, all peripheral clocks on
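The milliwatts-per-DMIPS metric can be derived from figures quoted in this deck: run current per MHz, supply voltage, and DMIPS per MHz. The sketch below works it through for RX600 under the stated typical conditions; the function name and parameter names are made up for this example, and the competitor value is not computed because its inputs are not given here.

```python
def mw_per_dmips(ua_per_mhz, volts, dmips_per_mhz):
    """Run-mode power per DMIPS, in milliwatts."""
    mw_per_mhz = ua_per_mhz * volts / 1000.0  # uA * V = uW; divide by 1000 for mW
    return mw_per_mhz / dmips_per_mhz

# RX600: 500 uA/MHz at 3.3 V, 1.65 DMIPS/MHz
rx600 = mw_per_dmips(ua_per_mhz=500, volts=3.3, dmips_per_mhz=1.65)
print(f"RX600: {rx600:.2f} mW/DMIPS")
```

The result is about 1 mW per DMIPS, the same figure given in the closing Questions slide.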
RX600 Instruction Set
= Single clock instruction
RX Instruction Set Summary and Size

Instruction Length (bytes) | List of Instructions | Number of Instructions
1 | NOP, RTS, BRK | 3
1-3 | BCnd | 1
1-4 | BRA | 1
2 | RMPA, ROLC, RORC, SAT, SATR, POP, POPC, POPM, PUSHC, PUSHM, JMP, JSR, SCMPU, SMOVB, SMOVF, SMOVU, SSTR, SUNTIL, SWHILE, CLRPSW, RTE, RTFI, SETPSW, WAIT | 24
2-3 | ABS, NEG, NOT, SHAR, SHLL, SHLR, RTSD | 7
2-4 | MOVU, PUSH, BSR | 3
2-5 | SUB, BCLR, BSET, BTST | 4
2-6 | ADD, AND, CMP, MUL, OR | 5
2-8 | MOV | 1
3 | ROTL, ROTR, REVL, REVW, INT, MVFC, MACHI, MACLO, MULHI, MULLO, MVFACHI, MVFACMI, MVTACHI, MVTACLO, RACW | 15
3-5 | FTOI, ROUND, SCCnd, BMCnd, BNOT | 5
3-6 | SBB, ITOF, XCHG | 3
3-7 | DIV, DIVU, EMUL, EMULU, MAX, MIN, TST, XOR, FADD, FCMP, FDIV, FMUL, FSUB, MVTC | 14
4-6 | ADC | 1
4-7 | STNZ, STZ | 2

Total = 89 instructions
MOV instruction length is 2-8 bytes
• 6% have minimum instruction length of 1 byte
• 49% have minimum instruction length of 2 bytes
• 42% have minimum instruction length of 3 bytes
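The total and the percentage callouts can be checked directly from the per-row counts in the table. The short sketch below does that; the `rows` list simply transcribes each row's minimum length and instruction count, and the helper name `share` is invented for the example.

```python
# (minimum length in bytes, number of instructions) for each table row
rows = [(1, 3), (1, 1), (1, 1), (2, 24), (2, 7), (2, 3), (2, 4), (2, 5),
        (2, 1), (3, 15), (3, 5), (3, 3), (3, 14), (4, 1), (4, 2)]

total = sum(count for _, count in rows)

def share(min_len):
    """Percentage of instructions whose shortest encoding is min_len bytes."""
    return 100.0 * sum(c for m, c in rows if m == min_len) / total

print("total instructions:", total)
for length in (1, 2, 3, 4):
    print(f"minimum {length} byte(s): {share(length):.1f}%")
```

The remaining few percent are the three instructions (ADC, STNZ, STZ) whose shortest encoding is 4 bytes.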
MOV instruction example: Efficient Addressing Modes

Function | Source | Destination
MEM to MEM | [Rs] | [Rd]
MEM to REG | [Rs] | Rd
REG to MEM | Rs | [Rd]
REG to REG | Rs | Rd
IMM to REG | #IMM:32 | Rd
IMM to MEM | #IMM:8 | [Rd]
IMM to MEM | #IMM:16 | [Rd]
IMM to MEM | #IMM:32 | [Rd]
IMM to MEM | #IMM:32 | dsp:16[Rd]

MEM to MEM is a Direct Memory-to-Memory operation
[Diagram: encoding of each form (opcode, Rs and Rd fields, immediate and displacement fields) drawn against an instruction length scale of 1 to 8 bytes]
Example: Moving data in memory
Direct Memory-to-Memory operation allows RX to avoid lengthy load/store operations and results in smaller code size
RX
MOV [r1], [r2]     (2 bytes)
Code size = 2 bytes
Number of Cycles = 3
Traditional RISC
LDR r3, [r1]       (2 bytes)
STR r3, [r2]       (2 bytes)
Code size = 4 bytes
Number of Cycles = 4
Up to 28% Code Size Reduction
[Bar chart: relative code size, RX600 vs. a Cortex-M3 based MCU (= 1.0), across five application categories: motor control, data communication, data conversion, real-time control, system control. Bar labels for RX600: 28% less, 19% less, 17% less, 25% less, 25% less]
Note: Internal benchmark test, your results may vary
RX makes Out-of-Order Instruction Decisions
Instructions
1) MOV [R1], R2
2) ADD R4, R5
3) SUB R4, R5
Without out-of-order completion: instructions 2) and 3) delayed, waiting on 1)
With out-of-order completion: delay is eliminated
• Is possible when there are no dependencies
• Multiple WB within same clock cycle OK if destination is different
[Pipeline diagrams per CPU clock; legend: F = Fetch, D = Decode, E = Execute, M = Memory, WB = Write Back, S = Stall]
Interrupt Handling
RX Normal Interrupt
• Entry, 7 clks typ.: resolve interrupt, PC & PSW to stack
• Optional push of general registers to stack, ISR, optional pop of general registers from stack
• Return, 6 clks: pop PC & PSW from stack, return
RX Fast Interrupt (save 5 clocks)
• Entry, 5 clks typ.: resolve interrupt, PC & PSW to backup registers
• Optional push of general registers to stack, ISR, optional pop of general registers from stack
• Return, 3 clks: PC & PSW from backup registers, return
RX Fast Interrupt plus Gen Register Usage (save many clocks)
• Entry, 5 clks typ.; ISR; return, 3 clks
• Use general CPU registers (R0 to R15) instead of stack
[Diagram legend: each step is marked as automatic by CPU or done by firmware]
Interrupt Handling: RX compared with ARM Cortex M3 or M4*
ARM Cortex M3 or M4*
• Entry, 12 clks: resolve interrupt, and push CPU state and 5 regs to stack
• ISR
• Return, 12 clks: pop CPU state and 5 regs from stack, and return
RX Typical Interrupt
• Entry, 7 clks typ.: resolve interrupt, PC & PSW to stack
• Optional push of general registers to stack, ISR, optional pop of general registers from stack
• Return, 6 clks: pop PC & PSW from stack, return
RX Fast Interrupt
• Entry, 5 clks typ.: resolve interrupt, PC & PSW to backup registers
• Optional push of general registers to stack, ISR, optional pop of general registers from stack
• Return, 3 clks: PC & PSW from backup registers, return
RX Fast Interrupt plus Gen Register Usage
• Entry, 5 clks typ.; ISR; return, 3 clks
• Save up to 16 clocks
Zero-wait memory is needed at full CPU speed, else ISR takes longer
[Diagram legend: each step is marked as automatic by CPU or done by firmware]
* ARM, Technical Reference Manuals: CortexM3 r1p1, CortexM4 r0p0
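The "save 5 clocks" and "save up to 16 clocks" callouts are simply differences between entry-plus-return overheads. The sketch below tabulates them from the clock counts quoted on the slide; it excludes the ISR body and any optional register push/pop, and the dictionary and function names are illustrative rather than anything from a vendor library.

```python
# (entry clocks, return clocks) as quoted on the slide
CLOCKS = {
    "RX normal interrupt": (7, 6),
    "RX fast interrupt": (5, 3),
    "ARM Cortex-M3/M4": (12, 12),
}

def overhead(name):
    """Entry plus return overhead in CPU clocks."""
    entry, ret = CLOCKS[name]
    return entry + ret

def overhead_ns(name, cpu_mhz):
    """The same overhead expressed in nanoseconds at a given CPU clock."""
    return overhead(name) * 1000.0 / cpu_mhz

for name in CLOCKS:
    print(f"{name:22s} {overhead(name):2d} clks, {overhead_ns(name, 100):5.0f} ns at 100 MHz")
```

These figures assume zero-wait memory at full CPU speed, as the slide notes; slower memory stretches every one of them.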
RX FPU is Single-Precision, 32-bits, IEEE-754
Typical Operation
• Floating-Point Unit works on Dedicated Data Registers
• Load/Store moves data between General Registers and the Dedicated Data Registers
RX Operation
• FPU directly accesses General Registers
• No Load/Store Instructions Needed
• Higher FPU performance
• Smaller code size
FPU Applications
Pressure regulator
Pump control
Thermo couple conversion
Motion Control
Motor Control
Flow Control
Digital filtering
[Block diagram: input e(n) passes through a low pass filter (K = 1023/1024) to give y_n, then a derivative stage (d/dt) computing d_n = y_n - y_{n-1}, then a second low pass filter (K = 1023/1024)]
All of these applications require floating point math computation
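The filter chain in the diagram can be sketched in a few lines of floating-point code. The exact filter equations did not survive in this transcript, so the sketch assumes the common first-order form y_n = K*y_{n-1} + (1-K)*x_n with K = 1023/1024 and a simple difference for the derivative; treat both as assumptions. Python floats are double precision, whereas the RX FPU is single precision, so this shows the structure only.

```python
K = 1023.0 / 1024.0

def low_pass(prev_out, x, k=K):
    """One step of a first-order low-pass filter (assumed form)."""
    return k * prev_out + (1.0 - k) * x

def process(samples):
    """Low-pass, differentiate, then low-pass again; returns the final outputs."""
    y_prev = 0.0   # previous output of the first filter
    f_prev = 0.0   # previous output of the second filter
    out = []
    for e in samples:
        y = low_pass(y_prev, e)   # first low-pass stage
        d = y - y_prev            # derivative as a first difference
        f = low_pass(f_prev, d)   # second low-pass stage
        out.append(f)
        y_prev, f_prev = y, f
    return out

print(process([1.0, 1.0, 1.0, 1.0]))
```

With a coefficient this close to 1, a fixed-point version needs careful scaling to keep the (1 - K) term from vanishing, which is the kind of work an FPU removes.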
FPU benefits: Two examples
1- Motor Control
FPU removes limitations due to scaling or saturation
Improves accuracy for motor position and speed
Increases motor efficiency
Easy code development and maintenance. Write formulas directly into C code
Reduces CPU loading
Reduces code size
2- Thermocouple Conversion
[Chart: sensorless vector motor control compiled for Fixed Integer vs Floating Point FPU]
FPU provides the best combined execution time and code size
FPU Comparison
The FPU provides a dramatic increase in performance and code efficiency over math libraries.
Example: Conversion of thermocouple reading to temperature
Thermocouple formula: $\text{Temperature} = \sum_{n=0}^{5} a_n x^n$, where $a_0$ to $a_5$ are constants and $x$ is the A/D reading

MCU | Operating Frequency (MHz) | CPU Cycles (count) | Actual Execution Time (usec) | Execution Time with Ideal Memory (usec) | Code Size (bytes)
RX600 | 100 | 94 | 0.94 | 0.94 | 48
A CM3-based MCU | 72 | 1130 | 15.7 | 14.7 | 892

RX600 is > 16x Faster and > 18x Smaller
• RX610 MCU: Renesas Compiler v0.02 Alpha, Size Max
• A CM3-based MCU: IAR Compiler v4.42A, Size Max
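The thermocouple conversion is a fifth-order polynomial, which is typically evaluated with Horner's rule so that each term costs one multiply and one add. The sketch below shows that evaluation and recomputes the speed and size ratios from the table. The coefficient values are placeholders invented for the example, not real thermocouple constants, and the function name is illustrative.

```python
def thermocouple_temp(x, coeffs):
    """Evaluate sum(a_n * x**n) with Horner's rule; coeffs = [a0, a1, ..., a5]."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# Placeholder coefficients a0..a5 (made up for illustration only)
COEFFS = [0.0, 25.0, -0.5, 0.01, 0.0, 0.0]
print(thermocouple_temp(2.0, COEFFS))

# Ratios taken from the table: actual execution time and code size
speedup = 15.7 / 0.94
size_ratio = 892 / 48
print(f"{speedup:.1f}x faster, {size_ratio:.1f}x smaller")
```

On a core without an FPU, each of those multiplies and adds becomes a call into a floating-point math library, which is where the extra cycles and code bytes in the table come from.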
DSP Arithmetic Functions
Multiply and Accumulate (MAC)
• Two 16-bit operands from general registers
• 48-bit accumulate in the Multiply-Accumulate unit
Repeated Multiply and Accumulate (RMPA)
• Two 32-bit operands from memory (coefficients and ADC samples)
• 80-bit accumulate in the Multiply-Accumulate unit
Alternatively, FPU can be used for floating point DSP
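The behaviour of a repeated multiply-accumulate can be modelled in software as a dot product over two arrays with a fixed-width accumulator. The sketch below is a behavioural model only, assuming a signed 80-bit accumulator as described in this deck; it is not cycle-accurate, and `rmpa` and `to_signed` are names chosen for the example rather than compiler intrinsics.

```python
ACC_BITS = 80

def to_signed(value, bits=ACC_BITS):
    """Wrap an integer to a signed two's-complement value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value

def rmpa(a, b, length, acc=0):
    """Multiply a[i] by b[i] and accumulate, for i in 0..length-1."""
    for i in range(length):
        acc = to_signed(acc + a[i] * b[i])
    return acc

# One output of an 8-tap FIR filter: coefficients times the latest 8 samples
coeffs = [1, 2, 3, 4, 4, 3, 2, 1]
samples = [10, -20, 30, -40, 50, -60, 70, -80]
print(rmpa(coeffs, samples, 8))
```

The wide accumulator is what lets many products be summed without intermediate overflow checks.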
Performance and Flash Speed
[Chart: processing performance vs. MCU frequency (axis marks at 30, 60 and 100 MHz) for RX with 100 MHz Flash and a competing MCU with 30 MHz Flash. Pipeline diagrams (IF, D, E, M, WB) show the competing MCU running with no wait, then 1 wait cycle, then 2 wait cycles as frequency rises, with each wait (W) delaying the following instructions]
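The link between Flash access time and wait states can be approximated with two small formulas. The sketch below converts an access time to an equivalent frequency and estimates wait states as ceil(CPU MHz / Flash MHz) - 1. This is a simplified model and an assumption on my part; real devices specify their own wait-state thresholds in the datasheet, which may differ from this rule.

```python
def access_time_to_mhz(t_ns):
    """Highest clock (MHz) at which a memory with t_ns access needs no wait."""
    return 1000.0 / t_ns

def wait_states(cpu_mhz, flash_mhz):
    """Extra cycles per Flash access: ceil(cpu/flash) - 1, for integer MHz inputs."""
    return -(-cpu_mhz // flash_mhz) - 1

print(access_time_to_mhz(10))  # 10 nsec Flash
for cpu in (30, 60, 90):
    print(cpu, "MHz:", wait_states(cpu, 30), "wait(s) with 30 MHz Flash,",
          wait_states(cpu, 100), "with 100 MHz Flash")
```

Under this model a 100 MHz Flash never stalls a CPU running at or below 100 MHz, while a 30 MHz Flash starts adding wait cycles as soon as the CPU clock exceeds 30 MHz.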
DSP and Benefit of 10nsec Flash
FIR Filter, RX600 and a CM3-based MCU
[Line chart: completion time in usec (axis 0.000 to 5.000, lower is better), 100 iterations of FIR algorithm, vs. MCU operating frequency (16 to 100 MHz). Series: a CM3 MCU theoretical (73 CPU cycles per iteration); a CM3 MCU actual with memory acceleration; a CM3 MCU actual without memory acceleration; RX600 theoretical (46 CPU cycles per iteration); RX600 actual. The CM3 curves are annotated where 1 wait state and 2 wait states apply; with memory acceleration the result is "better, but delay remains"]
• Theoretical performance with “No-Wait Memory” for this CM3 MCU
• Performance loss due to Flash slower than CPU demand on a CM3 MCU
• Mitigation effect of Memory Acceleration on a CM3 MCU
• Theoretical performance with “No-Wait Memory” for RX600
• Theoretical is identical to actual performance for RX600 because of 10nsec Flash
RX has 63% better performance
Test conditions
• 8 Tap FIR Filter, 16 x 16 to 32-bit accumulate
• RX610 MCU: Renesas compiler v1.0, Speed 2, macro used for RMPA
• A CM3-based MCU: IAR Compiler v5.40.0.315, Speed Max
Flash-MCU History and Speed
[Chart: operating frequency in MHz (log scale, 10 to 100) vs. year (1990 to 2010), with curves for MCU frequency, Renesas Flash frequency and general (competitors) Flash frequency. Process nodes marked along the curves: 0.8um, 0.5um, 0.35um, 0.18um, 0.15um, 90nm, 40nm. Technology labels: MONOS for EEPROM & IC-card; Flash-MONOS]
Renesas MONOS reaches 100MHz single cycle access
Source: Renesas
RX Family Roadmap
[Roadmap chart: max MHz (50, 100, 200) vs. time (existing MCUs, 2010, 2011, 2012)]
• Existing MCUs: H8SX (32 Bit), R32C (32 Bit), M16C (16 Bit), H8S (16 Bit)
• RX600 Series: 32 Bit, 90nm; Extreme High Performance, High Efficiency
• RX200 Series: 32 Bit, 130 nm; High Performance, Low Power / Low Voltage
• RX600: 40 nm, 100MHz+
RX600 System On A Chip
Can Drive Color
TFT-LCD!
RX600 Series Portfolio
LGA64 5x5mm, 0.5mm
LQFP64 10x10mm, 0.5mm
LQFP80 14x14mm, 0.65mm
LGA85 7x7mm, 0.65mm
LQFP100 14x14mm, 0.5mm
LQFP112 20x20mm, 0.65mm
LQFP144 20x20mm, 0.5mm
LGA145 9x9mm, 0.65mm
BGA176 13x13mm, 0.8mm
RX Migration Between Series
Migration Within RX Family
[Chart: Flash size (32KB, 64KB, 128KB, 256KB, 384KB, 512KB, 1MB, 2MB) vs. pin count (32, 48, 64, 80/85, 100, 112, 144/145, 176), showing the ranges covered by each series]
RX600 Series - 100MHz Extreme Performance
RX200 Series - 50MHz Low Power / Low Voltage
RX600: 500uA/MHz (all peripherals on), 2.5uA RTC Deep Standby, 2.7V to 3.6V
RX200: 200uA/MHz (all peripherals on), <1uA RTC Deep Standby, 1.62V to 3.6V
Common CPU & Peripherals
RX Solutions
Motor Control, RX62T
• Drive Sensorless PMAC Motor
• Field Oriented Control, 3-phase
• High integration, low system cost
Direct Drive TFT-LCD, RX62N
• Drive 4.3” Color WQVGA TFT-LCD by RGB
• Full basic graphic library and demo
• Source code included
802.11b/g/n WiFi, RX62N
• Simple SPI connection to WiFi module
• Kit contains driver and examples
• Very low power 802.11b/g/n connectivity
Connectivity, RX62N RDK
• Ethernet, USB Host/Device/USB, CAN
• Many surrounding functions/features
• Source code, built-in JTAG debugger
See www.am.renesas.com/rx for details
RX Tools for Solutions
See www.am.renesas.com/rx for details
E1, $99*: On-Chip Debug
• JTAG and USB-HS connection
• Program Flash
• Single step execution
• 256 Software break points
• 12 Hardware breakpoints
• PC and data breakpoints
• On-chip Trace - 256 branches/cycles
• Read/Write SRAM
• Read/Write C variables
• Performance monitoring
• Non-intrusive
• Hot-plug capable
E20, $995*: Hi-Speed Trace
• JTAG, USB-HS, plus 6 lines connection
• Trace depth: 2M branches/cycles
• SRAM monitor, 4 KB
HEW4 plus Renesas C/C++, $1200*
• Single Integrated Development & Debugging Environment
• HEW4 also supports GNU-RX C/C++ compiler, all at $0
Wide 3rd Party Support for IDE, Compilers, Middleware, RTOS:
• Micrium, IAR, Segger, CMX, KPIT Cummins, FreeRTOS, and more
* Suggested resale price when sold individually
Comparing other 32-bit CPU Architectures

Feature | Unit | RX600 | CortexM3 (1) | CortexM4 (2) | AVR32A (3) | PIC32 (4)
CPU Type | - | CISC, DSC | RISC, MCU | RISC, DSC | RISC, DSC | RISC, MCU
Performance | DMIPS/MHz | 1.65 | 1.25 | 1.25 | 1.50 | 1.50
Pipeline Length | Stages | 5 | 3 | 3 | 3 | 5
Inst Lengths | Bytes | 1 to 8 | 2 and 4 | 2 and 4 | 2 and 4 | 2 and 4
# of Instructions | For CPU, DSP | 80, 9 | 97, 3 | 97, 83 | 115, 8 | 129, 2
FPU | # of instructions | Yes, 8 | No, 0 | Option, 25 | No, 0 | No, 0
General Regs | # of regs, bits | 15 x 32 | 12 x 32 | 12 x 32 | 13 x 32 | 27 x 32
Min Intr Latency | CPU Clocks | 7 or 5 | 12 or 6 | 12 or 6 | 12 or 2 | 12 instructions
MPU | - | Option | Option | Option | Option | No
Bit Manipulation | - | Yes | Yes | Yes | Yes | Yes
Debug | Connection | JTAG or 2-wire | JTAG or 2-wire | JTAG or 2-wire | JTAG | JTAG
Hi-Speed Trace | Connection | 6-wire | 6-wire | 6-wire | 12-wire | 4, 8, or 16-wire

References:
1 ARM, CortexM3 Technical Reference Manual Revision: r1p1, ARMv7-M Architecture Reference Manual DDI 0403C_errata_v3
2 ARM, CortexM4 Technical Reference Manual Revision: r0p0, ARMv7-M Architecture Reference Manual DDI 0403C_errata_v3
3 Atmel, AVR32C Technical Reference Manual 32002A-AVR32-03/07
4 Microchip, PIC32MX Family Reference Manual DS611271C. MIPS Technology, MIPS32 Architecture for Programmers Vol II: MIPS32 Instruction Set, rev 2.5, MIPS32 M4K Processor Core Datasheet, Rev 02.01
Who’s the Best? You decide, based on what you have seen. To help your decision, here are publicly released benchmark results based on the widely acknowledged CoreMark™ from EEMBC.
Sorted by CoreMark/MHz. Larger CoreMark score is better.

*Vendor | *Processor | Type | *CPU Freq (MHz) | *CoreMark/MHz | *CoreMark | *Compiler | Comment
Microchip | PIC32MX360F512L | MCU | 30 | 2.599 | 78 | GCC 4.3.2 | Only 30 MHz operation
Microchip | PIC32MX360F512L | MCU | 80 | 2.297 | 184 | GCC 4.3.2 | Negative effect of slow Flash
Renesas | RX610 | DSC | 100 | 2.240 | 224 | GNURX 201009 | Full speed with no loss of performance
TI | Stellaris LM3S9B96 CortexM3 | MCU | 50 | 1.921 | 96 | Keil V4.0.0.524 |
ST | STM32 CortexM3 120MHz, 90nm | MCU | 120 | 1.905 | 229 | KEIL 4.0.0.524 | Has new “ART” memory accelerator
Microchip | PIC24HJ128GP202 | MCU | 40 | 1.862 | 74 | GCC4.0.3 |
ST | STM32F103RB CortexM3 | MCU | 24 | 1.797 | 43 | GCC 4.4.1 |
NXP | LPC1768 | MCU | 100 | 1.753 | 175 | ARMCC 4.0 |
TI | Stellaris LM3S9B96 CortexM3 | MCU | 80 | 1.596 | 127 | Keil V4.0.0.524 | Negative effect of slow Flash
ST | STM32F103RB CortexM3 | MCU | 72 | 1.504 | 108 | GCC 4.4.1 | Negative effect of slow Flash
Freescale | ColdFire MCF52233 | MCU | 60 | 1.038 | 62 | IAR EW 1.20 |
Freescale | ColdFire MCF5274 | MCU | 150 | 0.773 | 115 | GCC4.1.1 |

*Source: www.coremark.org as of 1 Sep 2010
Who’s the Best? Now sorted by raw CoreMark, not CoreMark/MHz
Sorted by CoreMark. Larger CoreMark score is better.

*Vendor | *Processor | Type | *CPU Freq (MHz) | *CoreMark/MHz | *CoreMark | *Compiler | Comment
ST | STM32 CortexM3 120MHz, 90nm | MCU | 120 | 1.905 | 229 | KEIL 4.0.0.524 | Much higher CPU freq needed for same result
Renesas | RX610 | DSC | 100 | 2.240 | 224 | GNURX 201009 | Positive effect of efficient CPU and fast Flash
Microchip | PIC32MX360F512L | MCU | 80 | 2.297 | 184 | GCC 4.3.2 |
NXP | LPC1768 | MCU | 100 | 1.753 | 175 | ARMCC 4.0 |
TI | Stellaris LM3S9B96 CortexM3 | MCU | 80 | 1.596 | 127 | Keil V4.0.0.524 |
Freescale | ColdFire MCF5274 | MCU | 150 | 0.773 | 115 | GCC4.1.1 |
ST | STM32F103RB CortexM3 | MCU | 72 | 1.504 | 108 | GCC 4.4.1 |
TI | Stellaris LM3S9B96 CortexM3 | MCU | 50 | 1.921 | 96 | Keil V4.0.0.524 |
Microchip | PIC32MX360F512L | MCU | 30 | 2.599 | 78 | GCC 4.3.2 |
Microchip | PIC24HJ128GP202 | MCU | 40 | 1.862 | 74 | GCC4.0.3 |
Freescale | ColdFire MCF52233 | MCU | 60 | 1.038 | 62 | IAR EW 1.20 |
ST | STM32F103RB CortexM3 | MCU | 24 | 1.797 | 43 | GCC 4.4.1 |

*Source: www.coremark.org as of 1 Sep 2010
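The two orderings shown on these slides come from the same data: CoreMark is approximately CoreMark/MHz multiplied by the CPU frequency. The sketch below transcribes the published rows, checks that relationship within rounding, and produces both sort orders. The tuple layout, the `consistent` helper and its 1.0 tolerance are choices made for this example.

```python
# (vendor, processor, MHz, CoreMark/MHz, CoreMark), copied from the slides
results = [
    ("Microchip", "PIC32MX360F512L", 30, 2.599, 78),
    ("Microchip", "PIC32MX360F512L", 80, 2.297, 184),
    ("Renesas", "RX610", 100, 2.240, 224),
    ("TI", "Stellaris LM3S9B96", 50, 1.921, 96),
    ("ST", "STM32 120MHz 90nm", 120, 1.905, 229),
    ("Microchip", "PIC24HJ128GP202", 40, 1.862, 74),
    ("ST", "STM32F103RB", 24, 1.797, 43),
    ("NXP", "LPC1768", 100, 1.753, 175),
    ("TI", "Stellaris LM3S9B96", 80, 1.596, 127),
    ("ST", "STM32F103RB", 72, 1.504, 108),
    ("Freescale", "ColdFire MCF52233", 60, 1.038, 62),
    ("Freescale", "ColdFire MCF5274", 150, 0.773, 115),
]

def consistent(row, tolerance=1.0):
    """True if MHz * CoreMark/MHz matches the listed CoreMark within rounding."""
    _, _, mhz, per_mhz, score = row
    return abs(mhz * per_mhz - score) <= tolerance

by_efficiency = sorted(results, key=lambda r: r[3], reverse=True)
by_raw = sorted(results, key=lambda r: r[4], reverse=True)

print("Best CoreMark/MHz:", by_efficiency[0][:3])
print("Best raw CoreMark:", by_raw[0][:3])
```

Seeing both orderings side by side makes the point of the slides: efficiency per MHz only translates into throughput if the part can actually run at high frequency without Flash wait states.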
Questions
1: What is the read access time of RX600 Flash Memory?
10 nsec (100MHz) across entire voltage range 2.7V to 3.6V
2: How many DMIPS/MHz does RX600 produce, and how many mW/DMIPS does it consume?
1.65 DMIPS/MHz, and 1mW/DMIPS
3: What does the RMPA instruction do?
Repeat Multiply Accumulate. One instruction automatically multiplies data from two different memory arrays, and adds result to 80-bit accumulator, then post-increments to next two values. Repeats until specified array length is met. DSP!!
Innovation – Single Chip Enablement
One MCU Family for many applications
See www.am.renesas.com/rx for details
Thank You!
www.am.renesas.com/rx
Renesas Electronics America Inc.