Which Cortex-M? A Selection Guide
Joseph Yiu
Senior Embedded Technology Specialist, ARM
Agenda
Overview of the whole ARM processor family
Cortex®-M processor family
Key features of Cortex-M
Programmer’s model
Instruction Set
Exceptions and NVIC
System features
Performance
Debug features
Summary
The ARM Processor Family
Many cores have been developed over the years
Figure: ARM processor families, arranged by system capability and performance
Application Processors (with MMU, support Linux, MS mobile OS): Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A12, Cortex-A15, Cortex-A53, Cortex-A57
Real Time Processors: Cortex-R4, Cortex-R5, Cortex-R7
Microcontrollers and deeply embedded: Cortex-M0, Cortex-M0+, Cortex-M1 (FPGA), Cortex-M3, Cortex-M4
Classic ARM Processors: ARM7™ series, ARM920T™, ARM940T™, ARM946™, ARM966™, ARM926™, ARM11™ series
Cortex-M Mainstream Microcontroller Processors
Common strengths: configurable, low silicon area, ease of use, deterministic, energy efficient
Widely adopted, market proven, high volume
Cortex-M0: “8/16-bit” applications, lowest cost
Cortex-M0+: “8/16-bit” applications, lowest power, outstanding energy efficiency
Cortex-M3: “16/32-bit” applications, performance efficiency, feature-rich connectivity
Cortex-M4: “32-bit/DSC” applications, MCU plus DSP, accelerated SIMD, single-precision floating point (FP-SP)
Key features
Varying subsets of the Thumb instruction set
Baseline programmer’s model
Exception model and interrupt control (NVIC)
Sleep modes
OS support
Same baseline debug support, with various extras
Ease of use
Baseline Programmer’s Model
Register bank
R0 to R15
R13 – Stack Pointers
R14 – Link Register
R15 – Program Counter
Special registers
Program Status Registers (PSR)
CONTROL
PRIMASK
Cortex-M3/M4 have additional mask registers (FAULTMASK, BASEPRI)
Two privilege levels
Privileged
Unprivileged
Instruction Set
Figure: instruction set coverage by architecture
ARMv6-M Architecture: general data processing, I/O control tasks
ARMv7-M Architecture: adds advanced data processing, bit field manipulations, DSP (SIMD, fast MAC), floating point
Instruction Set Comparison
Feature | Cortex-M0/M0+ | Cortex-M3 | Cortex-M4
Baseline Thumb ISA (mostly 16-bit) | Y | Y | Y
Multiply | 32-bit results only | 32/64-bit results | 32/64-bit results
Hardware divide | - | Y | Y
High register (R8-R12) utilisation | - | Y | Y
Advanced instructions for memory accesses | - | Y | Y
Table branch, compare & branch, conditional exec | - | Y | Y
Bit field operations | - | Y | Y
Unaligned accesses | - | Y | Y
Exclusive access | - | Y | Y
Multiply accumulate | - | Y | Y (single cycle)
Saturation arithmetic | - | Adjust only | Full set
SIMD, DSP | - | - | Y
Floating point | - | - | Optional
Side labels on the slide: Integer divide, Performance, DSP
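The "Multiply" row can be illustrated in C. The sketch below is not from the slides; the function names `mul32_low` and `mul32_wide` are hypothetical. It shows the difference between a multiply that keeps only the low 32 bits of the product and one that keeps all 64 bits. On Cortex-M3/M4 a compiler can usually map the wide form to a single long-multiply instruction such as UMULL, while on Cortex-M0/M0+ it typically has to build it from several instructions or a library call.

```c
#include <stdint.h>

/* 32 x 32 -> low 32 bits of the product. Available on every Cortex-M. */
uint32_t mul32_low(uint32_t a, uint32_t b)
{
    return a * b;              /* upper 32 bits of the product are discarded */
}

/* 32 x 32 -> full 64-bit product. Needs 64-bit multiply results. */
uint64_t mul32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;    /* widen first so no bits are lost */
}
```

With `a = b = 0x10000`, the low-word version returns 0 because the product no longer fits in 32 bits, while the wide version returns 0x100000000.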
SIMD (Single Instruction Multiple Data)
Handle 4x 8-bit, or 2x 16-bit with single instruction
Figure: QADD8 {<Rd>,} <Rn>, <Rm>. The four int8_t lanes of Rn and Rm (bits 31 to 0) are added lane by lane with signed saturation at bit position 8, and the four int8_t results are written to Rd.
Figure: QADD16 {<Rd>,} <Rn>, <Rm>. The two int16_t lanes of Rn and Rm (bits 31 to 0) are added lane by lane with signed saturation at bit position 16, and the two int16_t results are written to Rd.
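The lane-wise behaviour of QADD8 and QADD16 can be modelled in portable C. This is a host-side sketch for understanding the arithmetic only, not how the instructions are used in practice (on a Cortex-M4 the hardware does this in one instruction). The names `qadd_lanes`, `qadd8_model` and `qadd16_model` are hypothetical.

```c
#include <stdint.h>

/* Add the signed lanes of rn and rm with saturation.
 * width is the lane size in bits: 8 (four lanes) or 16 (two lanes). */
static uint32_t qadd_lanes(uint32_t rn, uint32_t rm, unsigned width)
{
    const uint32_t mask = (1u << width) - 1u;     /* 0xFF or 0xFFFF */
    const int32_t  max  = (int32_t)(mask >> 1);   /* 127 or 32767 */
    const int32_t  min  = -max - 1;               /* -128 or -32768 */
    uint32_t rd = 0;

    for (unsigned shift = 0; shift < 32; shift += width) {
        /* Extract each lane and sign-extend it by hand. */
        int32_t a = (int32_t)((rn >> shift) & mask);
        int32_t b = (int32_t)((rm >> shift) & mask);
        if (a > max) a -= (int32_t)mask + 1;
        if (b > max) b -= (int32_t)mask + 1;

        int32_t sum = a + b;
        if (sum > max) sum = max;                 /* saturate high */
        if (sum < min) sum = min;                 /* saturate low */

        rd |= ((uint32_t)sum & mask) << shift;    /* pack the lane back */
    }
    return rd;
}

uint32_t qadd8_model(uint32_t rn, uint32_t rm)  { return qadd_lanes(rn, rm, 8);  }
uint32_t qadd16_model(uint32_t rn, uint32_t rm) { return qadd_lanes(rn, rm, 16); }
```

For example, `qadd8_model(0x7F010180, 0x01010180)` gives 0x7F020280: the top lane saturates at 127 (0x7F), the bottom lane saturates at -128 (0x80), and the two middle lanes add normally.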
Nested Vectored Interrupt Controller (NVIC)
NVIC
Supports multiple sources
Vectored interrupt/exception services
Stack-based exception handling
Automatic nested IRQ handling
Prioritization
Masking
Interrupt Priority
Cortex-M0/M0+: 4 programmable levels
Cortex-M3/M4: 8 to 256 programmable levels
Figure: block diagram of the Cortex-M processor showing the NVIC, SysTick (System Tick Timer), processor core, configuration registers, internal bus interconnect and bus interface, with NMI and IRQ inputs from peripherals and system exceptions.
Exception Types
Some exceptions are on ARMv7-M only (MemManage Fault, Bus Fault, Usage Fault, Debug Monitor)
Exception | Name | Priority | Description
Fault mode & start-up handlers
1 | Reset | -3 (Highest) | Reset
2 | NMI | -2 | Non-Maskable Interrupt
3 | Hard Fault | -1 | Default fault if other handler not implemented
4 | MemManage Fault | Programmable | MPU violation or access to illegal locations
5 | Bus Fault | Programmable | Fault if AHB interface receives error
6 | Usage Fault | Programmable | Exceptions due to program errors
System handlers
11 | SVCall | Programmable | System SerVice call
12 | Debug Monitor | Programmable | Break points, watch points, external debug
14 | PendSV | Programmable | Pendable SerVice request for System Device
15 | SysTick | Programmable | System Tick Timer
Custom handlers
16 | Interrupt #0 | Programmable | External Interrupt #0
… | … | … | …
255 | Interrupt #239 | Programmable | External Interrupt #239
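The numbering in the exception table follows a simple pattern that can be written down in C. The sketch below assumes the usual vector table layout, where each entry is one 32-bit word and word 0 holds the initial stack pointer, so the vector for exception N sits at byte offset 4 × N. The enum and function names are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Exception numbers as listed in the exception table. */
enum {
    EXC_RESET     = 1,
    EXC_NMI       = 2,
    EXC_HARDFAULT = 3,
    EXC_SVCALL    = 11,
    EXC_PENDSV    = 14,
    EXC_SYSTICK   = 15,
    EXC_FIRST_IRQ = 16   /* external Interrupt #0 */
};

/* External Interrupt #n is exception number 16 + n. */
static inline uint32_t exception_number_for_irq(uint32_t irq)
{
    return EXC_FIRST_IRQ + irq;
}

/* Byte offset of an exception's entry from the start of the vector table. */
static inline uint32_t vector_offset(uint32_t exception_number)
{
    return 4u * exception_number;
}
```

Under these assumptions, Interrupt #239 maps to exception 255, and the SysTick vector is at offset 0x3C.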
Nested Vectored Interrupt Controller (NVIC)
Number of Interrupts
Up to 32 IRQs + NMI in Cortex-M0/M0+
Up to 240 IRQs + NMI in Cortex-M3/M4
Priority levels
ARMv6-M priority level register: bits 7:6 implemented, up to 4 levels (0x00, 0x40, 0x80, 0xC0)
ARMv7-M priority level register: 3 to 8 bits implemented, up to 256 levels (with 3 bits: 0x00, 0x20, 0x40, … 0xE0)
Lower values mean higher priority; NMI (-2) and HardFault (-1) are fixed above the programmable IRQ levels (0 and up)
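Because only the top bits of each priority register are implemented, a priority level has to be shifted into place before it is written. A minimal sketch of that mapping follows; the function name `priority_register_value` is hypothetical, and `implemented_bits` is assumed to be between 1 and 8 (2 on ARMv6-M, 3 to 8 on ARMv7-M). CMSIS-Core performs an equivalent shift using the device's `__NVIC_PRIO_BITS` setting.

```c
#include <stdint.h>

/* Convert a priority level (0 = highest) into the 8-bit register value
 * when only the top `implemented_bits` bits of the register exist.
 * Precondition: 1 <= implemented_bits <= 8. */
uint8_t priority_register_value(unsigned level, unsigned implemented_bits)
{
    unsigned levels = 1u << implemented_bits;

    if (level >= levels) {
        level = levels - 1u;   /* clamp to the lowest available priority */
    }
    return (uint8_t)(level << (8u - implemented_bits));
}
```

With 2 implemented bits, levels 0 to 3 become 0x00, 0x40, 0x80 and 0xC0; with 3 bits, level 7 becomes 0xE0.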
Interrupt Latency
Assumes zero memory wait states
Affected by
System level design (worst case wait state in a system)
Other interrupt services
Do not forget to consider the processing time of ISRs
Processor | Interrupt latency (cycles)
Cortex-M0 | 16
Cortex-M0+ | 15
Cortex-M3 | 12
Cortex-M4 | 12
Includes stacking of 8 registers in the process
On Cortex-M3/M4, the Harvard bus architecture enables stacking and vector fetch/program fetches to take place in parallel
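The cycle counts translate into time once a clock frequency is chosen. The helper below is a small worked calculation; the function name `cycles_to_ns` and the 48 MHz clock in the example are assumptions for illustration, and the result is a best case because it inherits the zero-wait-state assumption.

```c
#include <stdint.h>

/* Convert a cycle count into nanoseconds at a given core clock.
 * The result is truncated to a whole number of nanoseconds. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    return ((uint64_t)cycles * 1000000000ull) / clock_hz;
}
```

At an assumed 48 MHz, the 12-cycle latency of a Cortex-M3/M4 is 250 ns and the 16-cycle latency of a Cortex-M0 is about 333 ns.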
Vector Table Relocation
Vector table
Stores starting addresses of ISRs/handlers
Located at 0x00000000 by default
Vector Table Offset Register (VTOR)
Places the vector table at a different address
Available on Cortex-M0+ (optional), M3, M4
Useful for
Boot loader
Faster vector table fetch (e.g. slow flash)
Dynamic changing of handlers
Execute programs from RAM
Figure: memory map from 0x00000000 to 0xFFFFFFFF showing Flash, SRAM, Peripherals and the System Control Space (e.g. NVIC), with the vector table placed in memory.
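The idea of relocating the vector table to change handlers at run time can be modelled on a host machine. The sketch below is a simulation only: the arrays stand in for flash and SRAM, the pointer `vtor` stands in for the VTOR register, and every name in it is hypothetical. On real hardware the table's first word is the initial stack pointer, the table must meet the device's alignment rules, and the switch would be a write to the VTOR register (exposed as `SCB->VTOR` in CMSIS-Core).

```c
#include <string.h>

#define NUM_VECTORS 48   /* assumption: 16 system entries + 32 IRQs */

typedef void (*handler_t)(void);

int default_calls;       /* counters so the effect can be observed */
int custom_calls;

void default_handler(void) { default_calls++; }
void custom_handler(void)  { custom_calls++; }

handler_t flash_table[NUM_VECTORS];      /* stands in for the table in flash */
handler_t ram_table[NUM_VECTORS];        /* relocated copy in SRAM */
const handler_t *vtor = flash_table;     /* stands in for the VTOR register */

/* Fill the "flash" table with a default handler and point VTOR at it. */
void vectors_init(void)
{
    for (unsigned i = 0; i < NUM_VECTORS; i++) {
        flash_table[i] = default_handler;
    }
    vtor = flash_table;
}

/* Copy the table to RAM, then switch VTOR to the copy. */
void vectors_relocate_to_ram(void)
{
    memcpy(ram_table, flash_table, sizeof ram_table);
    vtor = ram_table;
}

/* Replace one handler in the RAM copy. */
void vectors_set_handler(unsigned exception_number, handler_t handler)
{
    ram_table[exception_number] = handler;
}

/* Model of the processor fetching a vector and running the handler. */
void vectors_dispatch(unsigned exception_number)
{
    vtor[exception_number]();
}
```

After `vectors_relocate_to_ram()` and `vectors_set_handler(16, custom_handler)`, dispatching exception 16 runs the custom handler while every other entry still runs the default one.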
NVIC Comparison
Feature | Cortex-M0 | Cortex-M0+ | Cortex-M3/M4
Number of IRQs | Up to 32 | Up to 32 | Up to 240
NMI | Y | Y | Y
SysTick timer | Optional | Optional | Y
Fault handlers | 1 | 1 | 4
VTOR | - | Optional | Y
Programmable priority levels | 4 | 4 | 8 to 256
Software Trigger Interrupt reg | - | - | Y
Active status registers | - | - | Y
Register R/W | 32-bit only | 32-bit only | 8/16/32-bit
Dynamic Priority change | - | - | Y
CMSIS-Core support | Y | Y | Y
Notes for Cortex-M0/M0+
No Software Trigger Interrupt register: use the Interrupt Set Pending Register (ISPR) instead
32-bit only register access: use CMSIS-Core APIs for portable code
No dynamic priority change: disable the IRQ temporarily when changing priority
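The "32-bit only" register access on Cortex-M0/M0+ means one IRQ's priority byte cannot be written on its own; the whole word holding four priorities has to be read, modified and written back. The host-side model below sketches that read-modify-write. The array `ipr` stands in for the NVIC priority registers, and the names are hypothetical. In real code, the CMSIS-Core function `NVIC_SetPriority` hides this difference.

```c
#include <stdint.h>

#define NUM_IRQS 32

/* Stands in for the NVIC priority registers: four 8-bit fields per word. */
uint32_t ipr[NUM_IRQS / 4];

/* Set the 8-bit priority field of one IRQ using word accesses only. */
void set_irq_priority(unsigned irq, uint8_t value)
{
    unsigned index = irq / 4u;          /* which 32-bit word */
    unsigned shift = (irq % 4u) * 8u;   /* which byte lane in that word */

    uint32_t word = ipr[index];         /* read */
    word &= ~(0xFFu << shift);          /* clear the old field */
    word |= (uint32_t)value << shift;   /* insert the new value */
    ipr[index] = word;                  /* write back */
}
```

Setting IRQ 5 to 0xC0 changes only bits 15:8 of the second word and leaves the priorities of IRQs 4, 6 and 7 untouched.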
System Features
Feature | Cortex-M0 | Cortex-M0+ | Cortex-M3/M4
NMI (Non-Maskable Int.) | Yes | Yes | Yes
Sleep modes | Yes | Yes | Yes
WIC, SRPG support | Yes | Yes | Yes
OS support (SVC, PendSV) | Yes | Yes | Yes
SysTick | Optional | Optional | Yes
MPU + Unprivileged | - | Optional (0 or 8 regions) | Optional (0 or 8 regions)
Fault Exception/Handler | HardFault | HardFault | HardFault + 3 configurable
Fault Status Registers | - | - | Yes
Bit Band | - | - | Yes (optional)
Single cycle I/O interface | - | Yes (optional) | -
Row groups: Low power support (sleep modes, WIC, SRPG), OS support (SVC, PendSV, SysTick), Advanced OS support (MPU + unprivileged), Fault handling (fault exceptions, fault status registers), I/O related (bit band, single cycle I/O interface)
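Bit banding maps each bit in a 1 MB region to its own word in an alias region, so a single bit can be read or written with an ordinary word access. The address arithmetic can be sketched as follows, assuming the standard Cortex-M3/M4 layout (SRAM region at 0x20000000 with alias at 0x22000000, peripheral region at 0x40000000 with alias at 0x42000000). The function name `bitband_alias` is hypothetical.

```c
#include <stdint.h>

/* Compute the bit-band alias address for one bit of a byte address.
 * Precondition: addr lies in the first 1 MB of the SRAM region
 * (0x20000000) or the peripheral region (0x40000000), and bit is 0..7. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t region_base = addr & 0xF0000000u;        /* 0x20000000 or 0x40000000 */
    uint32_t alias_base  = region_base + 0x02000000u; /* 0x22000000 or 0x42000000 */
    uint32_t byte_offset = addr - region_base;

    /* Each byte expands to 32 alias bytes; each bit to one 4-byte word. */
    return alias_base + byte_offset * 32u + bit * 4u;
}
```

For instance, bit 3 of the byte at 0x20000004 is reachable at alias address 0x2200008C.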
Performance considerations
CoreMark® and Dhrystone (per MHz)
Results depend on the C compiler used
Might not be a good indication of performance for your real-world applications
Processor | Dhrystone (official) | Dhrystone (max opt) | CoreMark
Cortex-M0 | 0.84 | 1.21 | 2.33
Cortex-M0+ | 0.94 | 1.31 | 2.42
Cortex-M3 | 1.25 | 1.89 | 3.32
Cortex-M4 | 1.25 | 1.95 | 3.40
* CoreMark data from ARM website and CoreMark.org website
Cortex-M3/M4 can be faster because of
• Richer instruction set
• Harvard bus architecture
• Write buffer
• Speculative fetch of branch targets
Cortex-M0+ can be faster for simple I/O control tasks because of
• Shorter pipeline
• Single cycle I/O interface
Actual performance depends on
• System level design
• Memory wait states
• Clock configurations, etc.
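Since the benchmark figures are quoted per MHz, an absolute score is the per-MHz figure multiplied by the clock frequency. The short sketch below makes that arithmetic explicit; the function name `score_at_clock` and the 72 MHz clock in the example are assumptions, and the result is an upper bound because wait states and system design are ignored.

```c
/* Scale a per-MHz benchmark figure to an absolute score at a given clock.
 * This ignores memory wait states, so it is a best-case estimate. */
double score_at_clock(double score_per_mhz, double clock_mhz)
{
    return score_per_mhz * clock_mhz;
}
```

Using the Cortex-M3 CoreMark figure of 3.32 per MHz at an assumed 72 MHz gives roughly 239 CoreMark.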
Debug and Trace Features Comparison
Debug & Trace features | Cortex-M0 | Cortex-M0+ | Cortex-M3 | Cortex-M4
Halt, resume, single step | Yes | Yes | Yes | Yes
On-the-fly memory accesses | Yes | Yes | Yes | Yes
HW Breakpoints / Watchpoints | 4 / 2 | 4 / 2 | 8 / 4 | 8 / 4
Instruction Trace | - | Yes (MTB) | Yes (ETM) | Yes (ETM)
Data / Event / Instrumentation Trace | - | - | Yes | Yes
Profiling counters | - | - | Yes | Yes
Standard debug support: halt, resume, single step, on-the-fly memory accesses, HW breakpoints / watchpoints
Embedded Trace Macrocell (ETM): real-time, unlimited instruction trace using Trace Port connection
Micro Trace Buffer (MTB): limited history of instruction trace using Serial Wire or JTAG
Data / Event / Instrumentation trace
Data trace
Supported on Cortex-M3/M4
Capture data value when accessed
Event trace
History & timing of exceptions/interrupts
Instrumentation trace
Software-generated trace messages (e.g. printf)
Summary - Which architecture suits my applications?
ARMv6-M (Cortex-M0/M0+)
Efficient instruction set for simple applications
Ultra low power
Not fully optimized for speed
◦ Limited R8-R12 access
◦ Fewer memory addressing modes
◦ Smaller immediate data, offset size
Ideal for
◦ 8-bit/16-bit replacement
◦ Ultra-low power designs
ARMv7-M (Cortex-M3/M4)
Powerful instruction set (at a cost of larger size, power)
Cortex-M3 is capable enough for most applications
High performance, low power
Unaligned data support
Rich debug & trace features
Cortex-M4 provides additional advantages
◦ SIMD, DSP
◦ Floating point
Wide choice of Cortex-M processors
Various levels of features and performance
From 8-bit/16-bit low-cost MCU replacements
To high-performance microcontrollers running at over 200 MHz
Comprehensive debug and trace features
Standard debug operations available on all Cortex-M processors
Advanced trace features on Cortex-M3/M4
Instruction trace available on Cortex-M0+
Easy to use
Program in C
Wide choice of tools, low-cost development boards