arm cortex m4 architecture

Upload: tran-vo-khoi-nguyen

Post on 06-Jan-2016

220 views

Category:

Documents


13 download

DESCRIPTION

ARM Cortex M4 ArchitectureARM Cortex M4 ArchitectureARM Cortex M4 Architecture

TRANSCRIPT

  • erCortex-M4 Architecture and ASM ProgrammingIntroductionIn this chapter programming the Cortex-M4 in assembly and Cwill be introduced. Preference will be given to explaining codedevelopment for the STM32F4 Discovery and LPC4088 QuickStart. The basis for the material presented in this chapter is thecourse notes from the ARM LiB program1.

    Overview Cortex-M4 Memory Map

    Cortex-M4 Memory Map Bit-band Operations Cortex-M4 Program Image and Endianness

    ARM Cortex-M4 Processor Instruction Set ARM and Thumb Instruction Set Cortex-M4 Instruction Set

    1.LiB Low-level Embedded NXP LPC4088 Quick Start

    Chapt

    3

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Memory Map The Cortex-M4 processor has 4 GB of memory address space

    Support for bit-band operation (detailed later) The 4GB memory space is architecturally defined as a num-

    ber of regions Each region is given for recommended usage Easy for software programmer to port between different

    devices Nevertheless, despite of the default memory map, the actual

    usage of the memory map can also be flexibly defined by theuser, except some fixed memory addresses, such as internalprivate peripheral bus32 ECE 5655/4655 Real-Time DSP

  • M4 Memory Map (cont.)M4 Memory Map (cont.)

    Priv

    ate

    peri

    pher

    als

    e.g.

    NV

    IC, S

    CS

    Mai

    nly

    used

    for

    exte

    rnal

    per

    iphe

    rals

    e.g.

    SD c

    ard

    Mai

    nly

    used

    for

    exte

    rnal

    mem

    orie

    se.

    g. ex

    tern

    al D

    DR

    , FLA

    SH, L

    CD

    Mai

    nly

    used

    for

    on-c

    hip

    peri

    pher

    als

    e.g.

    AH

    B, A

    PB p

    erip

    hera

    ls

    Mai

    nly

    used

    for

    data

    mem

    ory

    e.g.

    on-c

    hip

    SRA

    M, S

    DR

    AM

    Mai

    nly

    used

    for

    prog

    ram

    cod

    e e.

    g. on

    -chi

    p FL

    ASH

    Vend

    or s

    peci

    ficM

    emor

    y

    Exte

    rnal

    dev

    ice

    Exte

    rnal

    RA

    M

    Peri

    pher

    als

    SRA

    M

    Cod

    e

    0xFF

    FFFF

    FF

    0xE0

    0000

    00Pr

    ivat

    e Pe

    riph

    eral

    Bus

    (PPB

    )0x

    DFF

    FFFF

    F

    0xA

    0000

    000

    0x9F

    FFFF

    FF

    0x60

    0000

    000x

    5FFF

    FFFF

    0x40

    0000

    000x

    3FFF

    FFFF

    0x1F

    FFFF

    FF0x

    2000

    0000

    0x00

    0000

    00

    512M

    B

    512M

    B

    512M

    B

    1GB

    1GB

    512M

    B0x

    E00F

    FFFF

    0xE0

    1000

    00R

    eser

    ved

    for

    othe

    r pu

    rpos

    esRO

    M t

    able

    Exte

    rnal

    PPB

    Embe

    dded

    trac

    e m

    acro

    cell

    Trac

    e po

    rt in

    terf

    ace

    unit

    Res

    erve

    d

    Syst

    em C

    ontr

    ol S

    pace

    , inc

    ludi

    ngN

    este

    d Ve

    ctor

    ed In

    terr

    upt

    Con

    trol

    ler

    (NV

    IC)

    Res

    erve

    dFe

    tch

    patc

    h an

    d br

    eakp

    oint

    uni

    t

    Dat

    a w

    atch

    poin

    tan

    d tr

    ace

    unit

    Inst

    rum

    enta

    tion

    trac

    e m

    acro

    cell

    Exte

    rnal

    PPB

    Inte

    rnal

    PPBECE 5655/4655 Real-Time DSP 33

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingM4 Memory Map (cont.) Code Region

    Primarily used to store program code Can also be used for data memory On-chip memory, such as on-chip FLASH

    SRAM Region Primarily used to store data, such as heaps and stacks Can also be used for program code On-chip memory; despite its name SRAM, the actual

    device could be SRAM, SDRAM or other types Peripheral Region

    Primarily used for peripherals, such as Advanced High-performance Bus (AHB) or Advanced Peripheral Bus(APB) peripherals

    External RAM Region Primarily used to store large data blocks, or memory

    caches Off-chip memory, slower than on-chip SRAM region

    External Device Region Primarily used to map to external devices Off-chip devices, such as SD card

    Internal Private Peripheral Bus (PPB)34 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Memory Map Example Used inside the processor core for internal control Within PPB, a special range of memory is defined as Sys-

    tem Control Space (SCS) The Nested Vectored Interrupt Controller (NVIC) is part of

    SCS

    Cortex-M4 Memory Map Example

    AHB bus

    External SRAM,FLASH External LCD SD card

    Cortex-M4 PPB SCS NVICDebug Ctrl

    On-chip FLASH(Code Region)

    On-chip SRAM(SRAM Region) Peripheral Region

    External memory interface(External RAM Region)

    External device interface(External Device Region)

    Timer UART GPIO

    Chip SiliconECE 5655/4655 Real-Time DSP 35

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingBit-band Operations Bit-band operation allows a single load/store operation to

    access a single bit in the memory, for example, to change asingle bit of one 32-bit data: Normal operation without bit-band (read-modify-write) Read the value of 32-bit data Modify a single bit of the 32-bit value (keep other bits

    unchanged) Write the value back to the address Bit-band operation Directly write a single bit (0 or 1) to the bit-band alias

    address of the data Bit-band alias address

    Each bit-band alias address is mapped to a real dataaddress

    When writing to the bit-band alias address, only a singlebit of the data will be changed36 ECE 5655/4655 Real-Time DSP

  • Bit-band Operation ExampleBit-band Operation Example For example, in order to set bit[3] in word data in address

    0x20000000:

    Read-Modify-Write operation Read the real data address (0x20000000) Modify the desired bit (retain other bits unchanged) Write the modified data back

    Bit-band operation Directly set the bit by writing 1 to address 0x2200000C,

    which is the alias address of the fourth bit of the 32-bitdata at 0x20000000

    In effect, this single instruction is mapped to 2 bus trans-fers: read data from 0x20000000 to the buffer, and thenwrite to 0x20000000 from the buffer with bit [3] set

    ;Read-Modify-Write Operation

    LDR R1, =0x20000000 ;Setup addressLDR R0, [R1] ;ReadORR.W R0, #0x8 ;Modify bitSTR R0, [R1] ;Write back

    ;Bit-band Operation

    LDR R1, =0x2200000C ;Setup addressMOV R0, #1 ;Load dataSTR R0, [R1] ;WriteECE 5655/4655 Real-Time DSP 37

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingBit-band Alias AddressEach bit of the 32-bit data is one-to-one mapped to the bit-bandalias address

    For example, the fourth bit (bit [3]) of the data at0x20000000 is mapped to the bit-band alias address at0x2200000C

    Hence, to set bit [3] of the data at 0x20000000, we onlyneed to write 1 to address 0x2200000C

    In Cortex-M4, there are two pre-defined bit-band aliasregions: one for SRAM region, and one for peripheralsregion

    0x20000000

    0x20000004

    0x20000008

    Real 32-bit data address

    0x22000000

    0x22000080

    0x22000100

    0x2200000C

    0x22000018

    Bit-band alias address38 ECE 5655/4655 Real-Time DSP

  • Bit-band Alias Address (cont.)Bit-band Alias Address (cont.) SRAM region

    32MB memory space (0x22000000 0x23FFFFFF) isused as the bit-band alias region for 1MB data(0x20000000 0x200FFFFF)

    Peripherals region 32MB memory space (0x42000000 0x43FFFFFF) is

    used as the bit-band alias region for 1MB data(0x40000000 0x400FFFFF)

    External RAM

    Peripherals

    SRAM

    Code

    0x600000000x5FFFFFFF

    0x400000000x3FFFFFFF

    0x1FFFFFFF0x20000000

    0x00000000

    512MB

    512MB

    512MB

    1MB Bit-band region

    32MB Bit-band alias

    31MB non-bit-band region

    0x20000000

    0x20100000

    0x22000000

    0x23FFFFFF

    0x21FFFFFF

    1MB Bit-band region

    32MB Bit-band alias

    31MB non-bit-band region

    0x40000000

    0x40100000

    0x42000000

    0x43FFFFFF

    0x41FFFFFFECE 5655/4655 Real-Time DSP 39

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingBenefits of Bit-Band Operations Faster bit operations Fewer instructions Atomic operation, avoid hazards

    For example, if an interrupt is triggered and served duringthe Read-Modify-Write operations, and the interrupt ser-vice routine modifies the same data, a data conflict willoccur

    Interrupt occurs

    Read data at 0x00 Modify bit [1]

    Read data at 0x00 Modify bit [1] Write data back

    Write data back

    Interrupt returns

    Bit [1] modified by ISR is overwritten by the main program

    Interrupt Service Routine

    Main program310 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Program ImageCortex-M4 Program Image The program image in Cortex-M4 contains

    Vector table -- includes the starting addresses of exceptions(vectors) and the value of the main stack point (MSP);

    C start-up routine; Program code application code and data; C library code program codes for C library functions

    0x00000000

    Code region

    Start-up routine &Program code &C library code

    Vector table

    ProgramImage

    Initial MSP valueReset vectorNMI vector

    Hard fault vectorMemManage fault

    ReservedPendSVSysTick

    External Interrupts

    Bus faultUsage fault

    Reserved

    SVCallDebug monitorECE 5655/4655 Real-Time DSP 311

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Program Image (cont) After Reset, the processor:

    First reads the initial MSP value; Then reads the reset vector; Branches to the start of the programme execution address

    (reset handler); Subsequently executes program instructions

    Reset

    Fetch initial value for MSP(Read address 0x00000000)

    Fetch reset vector(Read address 0x00000004)

    Fetch 1st instruction(Read address of reset vector)

    Fetch 2nd instruction(Read subsequent instructions)312 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 EndiannessCortex-M4 Endianness Endian refers to the order of bytes stored in memory

    Little endian: lowest byte of a word-size data is stored inbit 0 to bit 7

    Big endian: lowest byte of a word-size data is stored in bit24 to bit 31

    Cortex-M4 supports both little endian and big endian However, Endianness only exists in the hardware level

    Byte0 Byte1 Byte2 Byte3

    Byte0 Byte1 Byte2 Byte3

    Byte0 Byte1 Byte2 Byte3

    [7:0][15:8][23:16][31:24]

    Word 1

    Word 2

    Word 3

    Big endian 32-bit memory

    Byte0Byte1Byte2Byte3

    Byte0Byte1Byte2Byte3

    Byte0Byte1Byte2Byte3

    0x00000000

    0x00000004

    0x00000008

    Address [7:0][15:8][23:16][31:24]

    Word 1

    Word 2

    Word 3

    Little endian 32-bit memoryECE 5655/4655 Real-Time DSP 313

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingARM and Thumb Instruction Set Early ARM instruction set

    32-bit instruction set, called the ARM instructions Powerful and good performance Larger program memory compared to 8-bit and 16-bit pro-

    cessors

    Larger power consumption Thumb-1 instruction set

    16-bit instruction set, first used in ARM7TDMI processorin 1995

    Provides a subset of the ARM instructions, giving bettercode density compared to 32-bit RISC architecture

    Code size is reduced by ~30%, but performance is alsoreduced by ~20%314 ECE 5655/4655 Real-Time DSP

  • ARM and Thumb Instruction Set (cont.)ARM and Thumb Instruction Set (cont.) Mix of ARM and Thumb-1 Instruction sets

    Benefit from both 32-bit ARM (high performance) and 16-bit Thumb-1 (high code density)

    A multiplexer is used to switch between two states: ARMstate (32-bit) and Thumb state (16-bit), which requires aswitching overhead

    Thumb-2 instruction set Consists of both 32-bit Thumb instructions and original 16-

    bit Thumb-1 instruction sets Compared to 32-bit ARM instructions set, code size is

    reduced by ~26%, while keeping a similar performance Capable of handling all processing requirements in one oper-

    ation state

    IncomingInstructions Thumb remap

    to ARM

    ARMInstructiondecoder

    InstructionsExecuting

    T bit, 0: select ARM,1: select Thumb

    0

    1ECE 5655/4655 Real-Time DSP 315

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Cortex-M4 processor

    ARMv7-M architecture Supports 32-bit Thumb-2 instructions Possible to handle all processing requirements in one oper-

    ation state (Thumb state) Compared with traditional ARM processors (use state

    switching), advantages include:* No state switching overhead both execution time and instruc-

    tion space are saved* No need to separate ARM code and Thumb code source files,

    which makes the development and maintenance of softwareeasier

    * Easier to get optimized efficiency and performance316 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set (cont.)Cortex-M4 Instruction Set (cont.) ARM assembly syntax:

    labelmnemonic operand1,operand2, ; Comments

    Label is used as a reference to an address location; Mnemonic is the name of the instruction; Operand1 is the destination of the operation; Operand2 is normally the source of the operation; Comments are written after ; , which does not affect the

    program; For exampleMOVSR3, #0x11;Set register R3 to 0x11

    Note that the assembly code can be assembled by eitherARM assembler (armasm) or assembly tools from a vari-ety of vendors (e.g. GNU tool chain). When using GNUtool chain, the syntax for labels and comments is slightlydifferentECE 5655/4655 Real-Time DSP 317

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Tables

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    AD

    C, A

    DC

    S{R

    d,} R

    n, O

    p2A

    dd w

    ith C

    arry

    N,Z

    ,C,V

    AD

    D, A

    DD

    S{R

    d,} R

    n, O

    p2A

    ddN

    ,Z,C

    ,V

    AD

    D, A

    DD

    W{R

    d,} R

    n, #

    imm

    12A

    ddN

    ,Z,C

    ,V

    AD

    RR

    d, la

    bel

    Load

    PC

    -rel

    ativ

    e A

    ddre

    ss

    AN

    D, A

    ND

    S{R

    d,} R

    n, O

    p2Lo

    gica

    l AN

    DN

    ,Z,C

    ASR

    , ASR

    SR

    d, R

    m,

    Ari

    thm

    etic

    Shi

    ft R

    ight

    N,Z

    ,C

    Bla

    bel

    Bran

    ch

    BFC

    Rd,

    #ls

    b, #

    wid

    thBi

    t Fie

    ld C

    lear

    BFI

    Rd,

    Rn,

    #ls

    b, #

    wid

    thBi

    t Fie

    ld In

    sert

    BIC

    , BIC

    S{R

    d,} R

    n, O

    p2Bi

    t Cle

    arN

    ,Z,C

    BKPT

    #im

    mBr

    eakp

    oint

    BLla

    bel

    Bran

    ch w

    ith L

    ink

    BLX

    Rm

    Bran

    ch in

    dire

    ct w

    ith L

    ink

    BXR

    mBr

    anch

    indi

    rect318 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set TablesCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    CBN

    ZR

    n, la

    bel

    Com

    pare

    and

    Bra

    nch

    if N

    on Z

    ero

    CBZ

    Rn,

    labe

    lC

    ompa

    re a

    nd B

    ranc

    h if

    Zer

    o

    CLR

    EXC

    lear

    Exc

    lusi

    ve

    CLZ

    Rd,

    Rm

    Cou

    nt L

    eadi

    ng Z

    eros

    CM

    NR

    n, O

    p2C

    ompa

    re N

    egat

    ive

    N,Z

    ,C,V

    CM

    PR

    n, O

    p2C

    ompa

    reN

    ,Z,C

    ,V

    CPS

    IDi

    Cha

    nge

    Proc

    esso

    r St

    ate,

    Dis

    able

    Inte

    rrup

    ts

    CPS

    IEi

    Cha

    nge

    Proc

    esso

    r St

    ate,

    Ena

    ble

    Inte

    rrup

    ts

    DM

    BD

    ata

    Mem

    ory

    Barr

    ier

    DSB

    Dat

    a Sy

    nchr

    oniz

    atio

    n Ba

    rrie

    r

    EOR

    , EO

    RS

    {Rd,

    } R

    n, O

    p2Ex

    clus

    ive

    OR

    N,Z

    ,C

    ISB

    -In

    stru

    ctio

    n Sy

    nchr

    oniz

    atio

    n Ba

    rrie

    rECE 5655/4655 Real-Time DSP 319

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    ITIf-

    The

    n co

    nditi

    on b

    lock

    LDM

    Rn{

    !}, r

    eglis

    tLo

    ad M

    ultip

    le r

    egis

    ters

    , inc

    rem

    ent a

    fter

    LDM

    DB,

    LD

    MEA

    Rn{

    !}, r

    eglis

    tLo

    ad M

    ultip

    le r

    egis

    ters

    , dec

    rem

    ent b

    efor

    e

    LDM

    FD, L

    DM

    IAR

    n{!}

    , reg

    list

    Load

    Mul

    tiple

    reg

    iste

    rs, i

    ncre

    men

    t afte

    r

    LDR

    Rt,

    [Rn,

    #of

    fset

    ]Lo

    ad R

    egis

    ter

    with

    wor

    d

    LDR

    B, L

    DR

    BTR

    t, [R

    n, #

    offs

    et]

    Load

    Reg

    iste

    r w

    ith b

    yte

    LDR

    DR

    t, R

    t2, [

    Rn,

    #of

    fset

    ]Lo

    ad R

    egis

    ter

    with

    tw

    o by

    tes

    LDR

    EXR

    t, [R

    n, #

    offs

    et]

    Load

    Reg

    iste

    r Ex

    clus

    ive

    LDR

    EXB

    Rt,

    [Rn]

    Load

    Reg

    iste

    r Ex

    clus

    ive

    with

    Byt

    e

    LDR

    EXH

    Rt,

    [Rn]

    Load

    Reg

    iste

    r Ex

    clus

    ive

    with

    Hal

    fwor

    d

    LDR

    H, L

    DR

    HT

    Rt,

    [Rn,

    #of

    fset

    ]Lo

    ad R

    egis

    ter

    with

    Hal

    fwor

    d320 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set TablesCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    LDR

    SB, L

    DR

    SBT

    Rt,

    [Rn,

    #of

    fset

    ]Lo

    ad R

    egis

    ter

    with

    Sig

    ned

    Byte

    LDR

    SH, L

    DR

    SHT

    Rt,

    [Rn,

    #of

    fset

    ]Lo

    ad R

    egis

    ter

    with

    Sig

    ned

    Hal

    fwor

    d

    LDRT

    Rt,

    [Rn,

    #of

    fset

    ]Lo

    ad R

    egis

    ter

    with

    wor

    d

    LSL,

    LSL

    SR

    d, R

    m,

    Logi

    cal S

    hift

    Left

    N,Z

    ,C

    LSR

    , LSR

    SR

    d, R

    m,

    Logi

    cal S

    hift

    Rig

    htN

    ,Z,C

    MLA

    Rd,

    Rn,

    Rm

    , Ra

    Mul

    tiply

    with

    Acc

    umul

    ate,

    32-

    bit r

    esul

    t

    MLS

    Rd,

    Rn,

    Rm

    , Ra

    Mul

    tiply

    and

    Sub

    trac

    t, 32

    -bit

    resu

    lt

    MO

    V, M

    OV

    SR

    d, O

    p2M

    ove

    N,Z

    ,C

    MO

    VT

    Rd,

    #im

    m16

    Mov

    e To

    p

    MO

    VW

    , MO

    VR

    d, #

    imm

    16M

    ove

    16-b

    it co

    nsta

    ntN

    ,Z,C

    MR

    SR

    d, s

    pec_

    reg

    Mov

    e fr

    om S

    peci

    al R

    egis

    ter

    to g

    ener

    al r

    egis

    ter

    MSR

    spec

    _reg

    , Rm

    Mov

    e fr

    om g

    ener

    al r

    egis

    ter

    to S

    peci

    al R

    egis

    ter

    N,Z

    ,C,VECE 5655/4655 Real-Time DSP 321

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    MU

    L, M

    ULS

    {Rd,

    } Rn,

    Rm

    Mul

    tiply,

    32-

    bit r

    esul

    tN

    ,Z

    MV

    N, M

    VN

    SR

    d, O

    p2M

    ove

    NO

    TN

    ,Z,C

    NO

    PN

    o O

    pera

    tion

    OR

    N, O

    RN

    S{R

    d,} R

    n, O

    p2Lo

    gica

    l OR

    NO

    TN

    ,Z,C

    OR

    R, O

    RR

    S{R

    d,} R

    n, O

    p2Lo

    gica

    l OR

    N,Z

    ,C

    PKH

    TB,

    PKH

    BT{R

    d, }

    Rn,

    Rm

    , Op2

    Pack

    Hal

    fwor

    d

    POP

    regl

    ist

    Pop

    regi

    ster

    s fr

    om s

    tack

    PUSH

    regl

    ist

    Push

    reg

    iste

    rs o

    nto

    stac

    k

    QA

    DD

    {Rd,

    } R

    n, R

    mSa

    tura

    ting

    doub

    le a

    nd A

    ddQ

    QA

    DD

    16{R

    d, }

    Rn,

    Rm

    Satu

    ratin

    gAdd

    16

    QA

    DD

    8{R

    d, }

    Rn,

    Rm

    Satu

    ratin

    gAdd

    8322 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set TablesCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    QA

    SX{R

    d, }

    Rn,

    Rm

    Sa

    tura

    ting A

    dd a

    nd S

    ubtr

    act w

    ith E

    xcha

    nge

    QD

    AD

    D{R

    d, }

    Rn,

    Rm

    Satu

    ratin

    g Add

    Q

    QD

    SUB

    {Rd,

    } R

    n, R

    mSa

    tura

    ting

    doub

    le a

    nd S

    ubtr

    act

    Q

    QSA

    X{R

    d, }

    Rn,

    Rm

    Sa

    tura

    ting

    Subt

    ract

    and

    Add

    with

    Exc

    hang

    e

    QSU

    B{R

    d,}

    Rn,

    Rm

    Satu

    ratin

    g Su

    btra

    ctQ

    QSU

    B16

    {Rd,

    } R

    n, R

    mSa

    tura

    ting

    Subt

    ract

    16

    QSU

    B8{R

    d, }

    Rn,

    Rm

    Satu

    ratin

    g Su

    btra

    ct 8

    RBI

    TR

    d, R

    nR

    ever

    se B

    its

    REV

    Rd,

    Rn

    Rev

    erse

    byt

    e or

    der

    in a

    wor

    d

    REV

    16R

    d, R

    nR

    ever

    se b

    yte

    orde

    r in

    eac

    h ha

    lfwor

    d

    REV

    SHR

    d, R

    nR

    ever

    se b

    yte

    orde

    r in

    bot

    tom

    hal

    fwor

    dan

    d si

    gn e

    xten

    d

    ROR

    , RO

    RS

    Rd,

    Rm

    , R

    otat

    e R

    ight

    N,Z

    ,CECE 5655/4655 Real-Time DSP 323

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    RR

    X, R

    RX

    SR

    d, R

    mR

    otat

    e R

    ight

    with

    Ext

    end

    N,Z

    ,C

    RSB

    , RSB

    S{R

    d,} R

    n, O

    p2R

    ever

    se S

    ubtr

    act

    N,Z

    ,C,V

    SAD

    D16

    {Rd,

    } R

    n, R

    mSi

    gned

    Add

    16

    GE

    SAD

    D8

    {Rd,

    } R

    n, R

    mSi

    gned

    Add

    8G

    E

    SASX

    {Rd,

    }R

    n, R

    mSi

    gned

    Add

    and

    Sub

    trac

    t with

    Exc

    hang

    eG

    E

    SBC

    , SBC

    S{R

    d,} R

    n, O

    p2Su

    btra

    ct w

    ith C

    arry

    N,Z

    ,C,V

    SBFX

    Rd,

    Rn,

    #ls

    b, #

    wid

    thSi

    gned

    Bit

    Fiel

    d Ex

    trac

    t

    SDIV

    {Rd,

    } Rn,

    Rm

    Sign

    ed D

    ivid

    e

    SEV

    Send

    Eve

    nt

    SHA

    DD

    16{R

    d,} R

    n, R

    mSi

    gned

    Hal

    ving

    Add

    16

    SHA

    DD

    8{R

    d,} R

    n, R

    mSi

    gned

    Hal

    ving

    Add

    8

    SHA

    SX{R

    d,} R

    n, R

    mSi

    gned

    Hal

    ving

    Add

    and

    Sub

    trac

    t with

    Exc

    hang

    e324 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set TablesCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    SHSA

    X{R

    d,}

    Rn,

    Rm

    Sign

    ed H

    alvi

    ng S

    ubtr

    act a

    nd A

    dd w

    ith E

    xcha

    nge

    SHSU

    B16

    {Rd,

    } Rn,

    Rm

    Sign

    ed H

    alvi

    ng S

    ubtr

    act 1

    6

    SHSU

    B8{R

    d,}

    Rn,

    Rm

    Sign

    ed H

    alvi

    ng S

    ubtr

    act 8

    SMLA

    BB, S

    MLA

    BT, S

    MLA

    TB,

    SM

    LAT

    TR

    d, R

    n, R

    m, R

    aSi

    gned

    Mul

    tiply

    Acc

    umul

    ate

    Long

    (hal

    fwor

    ds)

    Q

    SMLA

    D, S

    MLA

    DX

    Rd,

    Rn,

    Rm

    , Ra

    Sign

    ed M

    ultip

    ly A

    ccum

    ulat

    e D

    ual

    Q

    SMLA

    LR

    dLo,

    RdH

    i, R

    n, R

    mSi

    gned

    Mul

    tiply

    with

    Acc

    umul

    ate

    (32

    x 32

    + 6

    4), 6

    4-bi

    t re

    sult

    SMLA

    LBB,

    SM

    LALB

    T, SM

    LALT

    B,

    SMLA

    LTT

    RdL

    o, R

    dHi,

    Rn,

    Rm

    Sign

    ed M

    ultip

    ly A

    ccum

    ulat

    e Lo

    ng, h

    alfw

    ords

    SMLA

    LD, S

    MLA

    LDX

    RdL

    o, R

    dHi,

    Rn,

    Rm

    Sign

    ed M

    ultip

    ly A

    ccum

    ulat

    e Lo

    ng D

    ual

    SMLA

    WB,

    SM

    LAW

    TR

    d, R

    n, R

    m, R

    aSi

    gned

    Mul

    tiply

    Acc

    umul

    ate,

    wor

    d by

    hal

    fwor

    dQ

    SMLS

    DR

    d, R

    n, R

    m, R

    aSi

    gned

    Mul

    tiply

    Sub

    trac

    t Dua

    lQ

    SMLS

    LDR

    dLo,

    RdH

    i, R

    n, R

    mSi

    gned

    Mul

    tiply

    Sub

    trac

    t Lon

    g D

    ual

    SMM

    LAR

    d, R

    n, R

    m, R

    aSi

    gned

    Mos

    t sig

    nific

    ant w

    ord

    Mul

    tiply

    Acc

    umul

    ateECE 5655/4655 Real-Time DSP 325

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    SMM

    LS, S

    MM

    LRR

    d, R

    n, R

    m, R

    aSi

    gned

    Mos

    t sig

    nific

    ant w

    ord

    Mul

    tiply

    Sub

    trac

    t

    SMM

    UL,

    SM

    MU

    LR{R

    d,} R

    n, R

    mSi

    gned

    Mos

    t sig

    nific

    ant w

    ord

    Mul

    tiply

    SMU

    AD

    {Rd,

    } Rn,

    Rm

    Sign

    ed d

    ual M

    ultip

    ly A

    ddQ

    SMU

    LBB,

    SM

    ULB

    T S

    MU

    LTB,

    SM

    ULT

    T{R

    d,} R

    n, R

    mSi

    gned

    Mul

    tiply

    (hal

    fwor

    ds)

    SMU

    LLR

    dLo,

    RdH

    i, R

    n, R

    mSi

    gned

    Mul

    tiply

    (32

    x 32

    ), 64

    -bit

    resu

    lt

    SMU

    LWB,

    SM

    ULW

    T{R

    d,} R

    n, R

    mSi

    gned

    Mul

    tiply

    wor

    d by

    hal

    fwor

    d

    SMU

    SD, S

    MU

    SDX

    {Rd,

    } Rn,

    Rm

    Sign

    ed d

    ual M

    ultip

    ly S

    ubtr

    act

    SSAT

    Rd,

    #n,

    Rm

    {,shi

    ft #s

    }Si

    gned

    Sat

    urat

    eQ

    SSAT

    16R

    d, #

    n, R

    mSi

    gned

    Sat

    urat

    e 16

    Q

    SSA

    X{R

    d,} R

    n, R

    mSi

    gned

    Sub

    trac

    t and

    Add

    with

    Exc

    hang

    eG

    E

    SSU

    B16

    {Rd,

    } Rn,

    Rm

    Sign

    ed S

    ubtr

    act 1

    6

    SSU

    B8{R

    d,} R

    n, R

    mSi

    gned

    Sub

    trac

    t 8326 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set TablesCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    STM

    Rn{

    !}, r

    eglis

    tSt

    ore

    Mul

    tiple

    reg

    iste

    rs, i

    ncre

    men

    t afte

    r

    STM

    DB,

    ST

    MEA

    Rn{

    !}, r

    eglis

    tSt

    ore

    Mul

    tiple

    reg

    iste

    rs, d

    ecre

    men

    t bef

    ore

    STM

    FD, S

    TM

    IAR

    n{!}

    , reg

    list

    Stor

    e M

    ultip

    le r

    egis

    ters

    , inc

    rem

    ent a

    fter

    STR

    Rt,

    [Rn,

    #of

    fset

    ]St

    ore

    Reg

    iste

    r w

    ord

    STR

    B, S

    TR

    BTR

    t, [R

    n, #

    offs

    et]

    Stor

    e R

    egis

    ter

    byte

    STR

    DR

    t, R

    t2, [

    Rn,

    #of

    fset

    ]St

    ore

    Reg

    iste

    r tw

    o w

    ords

    STR

    EXR

    d, R

    t, [R

    n, #

    offs

    et]

    Stor

    e R

    egis

    ter

    Excl

    usiv

    e

    STR

    EXB

    Rd,

    Rt,

    [Rn]

    Stor

    e R

    egis

    ter

    Excl

    usiv

    e By

    te

    STR

    EXH

    Rd,

    Rt,

    [Rn]

    Stor

    e R

    egis

    ter

    Excl

    usiv

    e H

    alfw

    ord

    STR

    H, S

    TR

    HT

    Rt,

    [Rn,

    #of

    fset

    ]St

    ore

    Reg

    iste

    r H

    alfw

    ord

    STRT

    Rt,

    [Rn,

    #of

    fset

    ]St

    ore

    Reg

    iste

    r w

    ord

    SUB,

    SU

    BS{R

    d,} R

    n, O

    p2Su

    btra

    ctN

    ,Z,C

    ,VECE 5655/4655 Real-Time DSP 327

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    SUB,

    SU

    BW{R

    d,} R

    n, #

    imm

    12Su

    btra

    ctN

    ,Z,C

    ,V

    SVC

    #im

    mSu

    perv

    isor

    Cal

    l

    SXTA

    B{R

    d,}

    Rn,

    Rm

    ,{,RO

    R #

    }Ex

    tend

    8 b

    its t

    o 32

    and

    add

    SXTA

    B16

    {Rd,

    } R

    n, R

    m,{,

    ROR

    #}

    Dua

    l ext

    end

    8 bi

    ts t

    o 16

    and

    add

    SXTA

    H{R

    d,}

    Rn,

    Rm

    ,{,RO

    R #

    }Ex

    tend

    16

    bits

    to 3

    2 an

    d ad

    d

    SXT

    B16

    {Rd,

    } Rm

    {,RO

    R #

    n}Si

    gned

    Ext

    end

    Byte

    16

    SXT

    B{R

    d,} R

    m{,R

    OR

    #n}

    Sign

    ext

    end

    a by

    te

    SXT

    H{R

    d,} R

    m{,R

    OR

    #n}

    Sign

    ext

    end

    a ha

    lfwor

    d

    TBB

    [Rn,

    Rm

    ]Ta

    ble

    Bran

    ch B

    yte

    TBH

    [Rn,

    Rm

    , LSL

    #1]

    Tabl

    e Br

    anch

    Hal

    fwor

    d

    TEQ

    Rn,

    Op2

    Test

    Equ

    ival

    ence

    N,Z

    ,C

    TST

    Rn,

    Op2

    Test

    N,Z

    ,C328 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set TablesCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    UA

    DD

    16{R

    d,} R

    n, R

    mU

    nsig

    ned

    Add

    16

    GE

    UA

    DD

    8{R

    d,} R

    n, R

    mU

    nsig

    ned

    Add

    8G

    E

    USA

    X{R

    d,}

    Rn,

    Rm

    Uns

    igne

    d Su

    btra

    ct a

    nd A

    dd w

    ith E

    xcha

    nge

    GE

    UH

    AD

    D16

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d H

    alvi

    ng A

    dd 1

    6

    UH

    AD

    D8

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d H

    alvi

    ng A

    dd 8

    UH

    ASX

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d H

    alvi

    ng A

    dd a

    nd S

    ubtr

    act w

    ith E

    xcha

    nge

    UH

    SAX

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d H

    alvi

    ng S

    ubtr

    act a

    nd A

    dd w

    ith E

    xcha

    nge

    UH

    SUB1

    6{R

    d,} R

    n, R

    mU

    nsig

    ned

    Hal

    ving

    Sub

    trac

    t 16

    UH

    SUB8

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d H

    alvi

    ng S

    ubtr

    act 8

    UBF

    XR

    d, R

    n, #

    lsb,

    #w

    idth

    Uns

    igne

    d Bi

    t Fie

    ld E

    xtra

    ct

    UD

    IV{R

    d,} R

    n, R

    mU

    nsig

    ned

    Div

    ide

    UM

    AA

    LR

    dLo,

    RdH

    i, R

    n, R

    mU

    nsig

    ned

    Mul

    tiply

    Acc

    umul

    ate

    Acc

    umul

    ate

    Long

    (32

    x 32

    +

    32 +

    32),

    64-b

    it re

    sultECE 5655/4655 Real-Time DSP 329

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    UM

    LAL

    RdL

    o, R

    dHi,

    Rn,

    Rm

    Uns

    igne

    d M

    ultip

    ly w

    ith A

    ccum

    ulat

    e (3

    2 x

    32 +

    64)

    , 64-

    bit

    resu

    lt

    UM

    ULL

    RdL

    o, R

    dHi,

    Rn,

    Rm

    Uns

    igne

    d M

    ultip

    ly (3

    2 x

    32),

    64-b

    it re

    sult

    UQ

    AD

    D16

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d Sa

    tura

    ting A

    dd 1

    6

    UQ

    AD

    D8

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d Sa

    tura

    ting A

    dd 8

    UQ

    ASX

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d Sa

    tura

    ting A

    dd a

    nd S

    ubtr

    act w

    ith E

    xcha

    nge

    UQ

    SAX

    {Rd,

    } R

    n, R

    mU

    nsig

    ned

    Satu

    ratin

    g Su

    btra

    ct a

    nd A

    dd w

    ith E

    xcha

    nge

    UQ

    SUB1

    6{R

    d,} R

    n, R

    mU

    nsig

    ned

    Satu

    ratin

    g Su

    btra

    ct 1

    6

    UQ

    SUB8

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d Sa

    tura

    ting

    Subt

    ract

    8

    USA

    D8

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d Su

    m o

    f Abs

    olut

    e D

    iffer

    ence

    s

    USA

    DA

    8{R

    d,}

    Rn,

    Rm

    , Ra

    Uns

    igne

    d Su

    m o

    f Abs

    olut

    e D

    iffer

    ence

    s an

    d A

    ccum

    ulat

    e

    USA

    TR

    d, #

    n, R

    m{,s

    hift

    #s}

    Uns

    igne

    d Sa

    tura

    teQ

    USA

    T16

    Rd,

    #n,

    Rm

    Uns

    igne

    d Sa

    tura

    te 1

    6Q330 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set TablesCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    UA

    SX{R

    d,} R

    n, R

    mU

    nsig

    ned

    Add

    and

    Sub

    trac

    t with

    Exc

    hang

    eG

    E

    USU

    B16

    {Rd,

    } Rn,

    Rm

    Uns

    igne

    d Su

    btra

    ct 1

    6G

    E

    USU

    B8{R

    d,} R

    n, R

    mU

    nsig

    ned

    Subt

    ract

    8G

    E

    UX

    TAB

    {Rd,

    } Rn,

    Rm

    ,{,RO

    R #

    }R

    otat

    e, e

    xten

    d 8

    bits

    to 3

    2 an

    d A

    dd

    UX

    TAB1

    6{R

    d,} R

    n, R

    m,{,

    ROR

    #}

    Rot

    ate,

    dua

    l ext

    end

    8 bi

    ts t

    o 16

    and

    Add

    UX

    TAH

    {Rd,

    } Rn,

    Rm

    ,{,RO

    R #

    }R

    otat

    e, u

    nsig

    ned

    exte

    nd a

    nd A

    dd H

    alfw

    ord

    UX

    TB

    {Rd,

    } Rm

    {,RO

    R #

    n}Z

    ero

    exte

    nd a

    Byt

    e

    UX

    TB1

    6{R

    d,} R

    m {,

    ROR

    #n}

    Uns

    igne

    d Ex

    tend

    Byt

    e 16

    UX

    TH

    {Rd,

    } Rm

    {,RO

    R #

    n}Z

    ero

    exte

    nd a

    Hal

    fwor

    d

    VABS

    .F32

    Sd, S

    mFl

    oatin

    g-po

    int A

    bsol

    ute

    VAD

    D.F

    32{S

    d,} S

    n, S

    mFl

    oatin

    g-po

    int A

    dd

    VC

    MP.F

    32Sd

    ,

    Com

    pare

    two

    float

    ing-

    poin

    t reg

    iste

    rs, o

    r on

    e flo

    atin

    g-po

    int r

    egis

    ter

    and

    zero

    FPSC

    RECE 5655/4655 Real-Time DSP 331

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    VC

    MPE

    .F32

    Sd, C

    ompa

    re tw

    o flo

    atin

    g-po

    int r

    egis

    ters

    , or

    one

    float

    ing-

    poin

    t reg

    iste

    r an

    d ze

    ro w

    ith In

    valid

    Ope

    ratio

    n ch

    eck

    FPSC

    R

    VC

    VT.

    S32.

    F32

    Sd, S

    mC

    onve

    rt b

    etw

    een

    float

    ing-

    poin

    t and

    inte

    ger

    VC

    VT.

    S16.

    F32

    Sd, S

    d, #

    fbits

    Con

    vert

    bet

    wee

    n flo

    atin

    g-po

    int a

    nd fi

    xed

    poin

    t

    VC

    VT

    R.S

    32.F

    32Sd

    , Sm

    Con

    vert

    bet

    wee

    n flo

    atin

    g-po

    int a

    nd in

    tege

    r w

    ith

    roun

    ding

    VC

    VT

    .F3

    2.F1

    6Sd

    , Sm

    Con

    vert

    s ha

    lf-pr

    ecis

    ion

    valu

    e to

    sin

    gle-

    prec

    isio

    n

    VC

    VT

    T.

    F32.

    F16

    Sd, S

    mC

    onve

    rts

    sing

    le-p

    reci

    sion

    reg

    iste

    r to

    hal

    f-pre

    cisi

    on

    VD

    IV.F

    32{S

    d,} S

    n, S

    mFl

    oatin

    g-po

    int D

    ivid

    e

    VFM

    A.F

    32{S

    d,} S

    n, S

    mFl

    oatin

    g-po

    int F

    used

    Mul

    tiply

    Acc

    umul

    ate

    VFN

    MA

    .F32

    {Sd,

    } Sn,

    Sm

    Floa

    ting-

    poin

    t Fus

    ed N

    egat

    e M

    ultip

    ly A

    ccum

    ulat

    e

    VFM

    S.F3

    2{S

    d,} S

    n, S

    mFl

    oatin

    g-po

    int F

    used

    Mul

    tiply

    Sub

    trac

    t

    VFN

    MS.

    F32

    {Sd,

    } Sn,

    Sm

    Floa

    ting-

    poin

    t Fus

    ed N

    egat

    e M

    ultip

    ly S

    ubtr

    act

    VLD

    M.F

    Rn{

    !}, li

    stLo

    ad M

    ultip

    le e

    xten

    sion

    reg

    iste

    rs332 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set TablesCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    VLD

    R.F

    ,

    [Rn]

    Load

    an

    exte

    nsio

    n re

    gist

    er fr

    om m

    emor

    y

    VLM

    A.F

    32{S

    d,} S

    n, S

    mFl

    oatin

    g-po

    int M

    ultip

    ly A

    ccum

    ulat

    e

    VLM

    S.F3

    2{S

    d,} S

    n, S

    mFl

    oatin

    g-po

    int M

    ultip

    ly S

    ubtr

    act

    VM

    OV.

    F32

    Sd, #

    imm

    Floa

    ting-

    poin

    t Mov

    e im

    med

    iate

    VM

    OV

    Sd, S

    mFl

    oatin

    g-po

    int M

    ove

    regi

    ster

    VM

    OV

    Sn, R

    tC

    opy A

    RM

    cor

    e re

    gist

    er t

    o si

    ngle

    pre

    cisi

    on

    VM

    OV

    Sm, S

    m1,

    Rt,

    Rt2

    Cop

    y 2

    AR

    M c

    ore

    regi

    ster

    s to

    2 s

    ingl

    e pr

    ecis

    ion

    VM

    OV

    Dd[

    x], R

    tC

    opy A

    RM

    cor

    e re

    gist

    er t

    o sc

    alar

    VM

    OV

    Rt,

    Dn[

    x]C

    opy

    scal

    ar t

    o A

    RM

    cor

    e re

    gist

    er

    VM

    RS

    Rt,

    FPSC

    RM

    ove

    FPSC

    R to

    AR

    M c

    ore

    regi

    ster

    or A

    PSR

    N,Z

    ,C,V

    VM

    SRFP

    SCR

    , Rt

    Mov

    e to

    FPS

    CR

    from

    AR

    M C

    ore

    regi

    ster

    FPSC

    R

    VM

    UL.

    F32

    {Sd,

    } Sn,

    Sm

    Floa

    ting-

    poin

    t Mul

    tiplyECE 5655/4655 Real-Time DSP 333

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingCortex-M4 Instruction Set Tables (cont.)

    Mne

    mon

    icO

    pera

    nds

    Bri

    ef d

    escr

    ipti

    onFl

    ags

    VN

    EG.F

    32Sd

    , Sm

    Floa

    ting-

    poin

    t Neg

    ate

    VN

    MLA

    .F32

    Sd, S

    n, S

    mFl

    oatin

    g-po

    int M

    ultip

    ly a

    nd A

    dd

    VN

    MLS

    .F32

    Sd, S

    n, S

    mFl

    oatin

    g-po

    int M

    ultip

    ly a

    nd S

    ubtr

    act

    VN

    MU

    L{S

    d,} S

    n, S

    mFl

    oatin

    g-po

    int M

    ultip

    ly

    VPO

    Plis

    tPo

    p ex

    tens

    ion

    regi

    ster

    s

    VPU

    SHlis

    tPu

    sh e

    xten

    sion

    reg

    iste

    rs

    VSQ

    RT.F

    32Sd

    , Sm

    Cal

    cula

    tes

    float

    ing-

    poin

    t Squ

    are

    Roo

    t

    VST

    MR

    n{!}

    , list

    Floa

    ting-

    poin

    t reg

    iste

    r St

    ore

    Mul

    tiple

    VST

    R.F

    Sd, [

    Rn]

    Stor

    es a

    n ex

    tens

    ion

    regi

    ster

    to

    mem

    ory

    VSU

    B.F{S

    d,} S

    n, S

    mFl

    oatin

    g-po

    int S

    ubtr

    act

    WFE

    Wai

    t Fo

    r Ev

    ent

    WFI

    Wai

    t Fo

    r In

    terr

    upt

    Note:

    full e

    xplan

    ation

    of ea

    ch in

    struc

    tion c

    an be

    foun

    d in C

    ortex

    -M4 D

    evice

    s Ge

    neric

    Use

    r Guid

    e (Re

    f-4)334 ECE 5655/4655 Real-Time DSP

  • Cortex-M4 Instruction Set TablesCortex-M4 Instruction Set Tables (cont.) Cortex-M4 Suffix

    Some instructions can be followed by suffixes to updateprocessor flags or execute the instruction on a certain con-dition

    Suffi

    x D

    escr

    ipti

    onE

    xam

    ple

    Exa

    mpl

    e ex

    plan

    atio

    n

    SU

    pdat

    e A

    PSR

    (flag

    s)A

    DD

    SR

    1,

    #0x2

    1A

    dd0x

    21 t

    o R

    1 an

    d up

    date

    APS

    R

    EQ, N

    E,C

    S, C

    C, M

    I, PL

    , VS,

    VC, H

    I, LS

    , G

    E, L

    T, G

    T, LE

    Con

    ditio

    nex

    ecut

    ion

    e.g.

    EQ=

    equa

    l, N

    E= n

    ot e

    qual

    , LT=

    less

    tha

    nBN

    Ela

    bel

    Bran

    ch t

    o th

    e la

    bel i

    f not

    equ

    alECE 5655/4655 Real-Time DSP 335

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingC Calling AssemblyFor real-time DSP applications the most common scenarioinvolving assembly code writing, if needed at all, will be C call-ing assembly. In simple terms the rules are:

    Formally, the ARM Architecture Procedure Call Standard(AAPCS) defines: Which registers must be saved and restored How to call procedures How to return from procedures

    AAPCS Register Use Conventions Make it easier to create modular, isolated and integrated code Scratch registers are not expected to be preserved upon

    returning from a called subroutine This applies to r0r3

    Preserved (variable) registers are expected to have theiroriginal values upon returning from a called subroutine This applies to r4r8, r10r11 Use PUSH {r4,..} and POP {r4,...}336 ECE 5655/4655 Real-Time DSP

  • C Calling AssemblyAAPCS Core Register Use

    Reg

    iste

    rSy

    nony

    mSp

    ecia

    lR

    ole

    in t

    he p

    roce

    dure

    cal

    l sta

    ndar

    dr1

    5PC

    The

    Pro

    gram

    Cou

    nter

    .r1

    4LR

    The

    Lin

    k R

    egis

    ter.

    r13

    SPT

    he S

    tack

    Poi

    nter

    .r1

    2IP

    The

    Intr

    a-Pr

    oced

    ure-

    call

    scra

    tch

    regi

    ster

    .r1

    1v8

    Vari

    able

    -reg

    ister

    8.

    r10

    v7Va

    riab

    le-r

    egist

    er 7

    .

    r9v6

    ,SB,

    TR

    Plat

    form

    reg

    iste

    r. T

    he m

    eani

    ng o

    f thi

    s re

    gist

    er is

    def

    ined

    by

    the

    pla

    tform

    sta

    ndar

    d.r8

    v5Va

    riab

    le-r

    egist

    er 5

    .r7

    v4Va

    riab

    le r

    egis

    ter

    4.r6

    v3Va

    riab

    le r

    egis

    ter

    3.r5

    v2Va

    riab

    le r

    egis

    ter

    2.r4

    v1Va

    riab

    le r

    egis

    ter

    1.r3

    a4A

    rgum

    ent

    / scr

    atch

    reg

    iste

    r 4.

    r2a3

    Arg

    umen

    t / s

    crat

    ch r

    egis

    ter

    3.r1

    a2A

    rgum

    ent

    / res

    ult

    / scr

    atch

    reg

    iste

    r 2.

    r0a1

    Arg

    umen

    t / r

    esul

    t / s

    crat

    ch r

    egis

    ter

    1.

    Mus

    t be

    sav

    ed, r

    esto

    red

    by

    calle

    e-pr

    oced

    ure

    it m

    ay m

    odify

    th

    em.

    Cal

    ling

    subr

    outi

    ne e

    xpec

    ts

    thes

    e to

    ret

    ain

    thei

    r va

    lue.

    Mus

    t be

    sav

    ed, r

    esto

    red

    by

    calle

    e-pr

    oced

    ure

    it m

    ay m

    odify

    th

    em.

    Don

    t n

    eed

    to b

    e sa

    ved.

    May

    be

    used

    for

    argu

    men

    ts, r

    esul

    ts, o

    r te

    mpo

    rary

    val

    ues.ECE 5655/4655 Real-Time DSP 337

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingExample: Vector Norm SquaredIn this example we will be computing the squared length of avector using 16-bit (int16_t) signed numbers. In mathematicalterms we are finding

    (3.1)

    where

    (3.2)is an -dimensional vector (column or row vector). The solution will be obtained in two different ways:

    Conventional C programming Cortex-M assembly

    Optimization is not a concern at this point The focus here is to see by way of a simple example, how to

    call a C routine from C (obvious), and how to call an assem-bly routine from C

    A 2 An2

    n 1=

    N

    =

    A A1 AN=N338 ECE 5655/4655 Real-Time DSP

  • Example: Vector Norm SquaredC Version We implement this simple routine in C using a declared vec-

    tor length N and vector contents in the array v The C source, which includes the called functionnorm_sq_c is given below:

    /******************************************************

    Vector norm-squared routine in C

    ******************************************************/

    int main(void){int16_t x = 0;int16_t v[5] = {1,2,3,6,7};...

    x = norm_sq_c (v, 5);// call c functionsprintf(my_debug, "Norm: The answer is %d\n", x);TM_USART_Puts(USART6, my_debug);...

    }

    int16_t norm_sq_c(int16_t* v, int16_t n){

    int16_t i;int16_t out = 0;for(i=0; i

  • Chapter 3 Cortex-M4 Architecture and ASM Programming The expected answer is

    The cycle count comparison follow assembly version

    Assembly Version The parent C routine is the following:

    /******************************************************

    Vector norm-squared routine in assembly

    ******************************************************/

    extern int16_t norm_sq_asm(int16_t *x, int16_t n);

    int main(){

    int16_t x = 0;int16_t v[5] = {1,2,3,6,7};...

    zInt_sq = simple_sqrt(zInt);sprintf(my_debug, "uint16_t SQRT of %d is %d\n",

    zInt, zInt_sq);TM_USART_Puts(USART6, my_debug);...

    }; File demo_asm.s

    PRESERVE8 ; Preserve 8 byte stack alignment THUMB ; indicate THUMB code is used

    AREA |.text|, CODE, READONLY;Start of the CODE areaEXPORT norm_sq_asm

    norm_sq_asm FUNCTION; Input array address: R0

    1 4 9 36 49+ + + + 99=From CoolTerm340 ECE 5655/4655 Real-Time DSP

  • Example: Vector Norm Squared; Number of elements: R1MOVS R2, R0 ; move the address in R0 to R2MOVS R0, #0 ; initialize the result

    sum_loopLDRSH R3, [R2],#0x2; load int16_t value pointed to

    ; by R2 into R3, then incrementMLA R0, R3, R3, R0; sq & accum in one step (faster)SUBS R1, R1, #1; R1 = R1 - 1, decrement the countCMP R1, #0 ; compare to 0 and set Z registerBNE sum_loop; branch if compare not zeroBX LR ; return R0ENDFUNCEND ; End of file

    From just the C source it is not obvious that the function pro-totype for norm_asm is actually an assembly routine

    The answer is again 99

    Performance Comparison In the Keil IDE debugger we set break points around the

    function to b timed:

    From CoolTermECE 5655/4655 Real-Time DSP 341

  • Chapter 3 Cortex-M4 Architecture and ASM Programming Then make note of the States and Sec in the registers win-dow:

    Example: Unsigned Integer Square Root1uint32_t zInt = 64;uint32_t zInt_sq;...

    arm_status status;// uint16_t Square root experimentzInt_sq = simple_sqrt(zInt);sprintf(my_debug, "uint16_t SQRT of %d is %d\n",

    zInt, zInt_sq);TM_USART_Puts(USART6, my_debug);

    ; in demo_asm.sPRESERVE8 ; Preserve 8 byte stack alignment

    THUMB ; indicate THUMB code is usedAREA |.text|, CODE, READONLY; Start of the CODE areaEXPORT simple_sqrt

    simple_sqrt FUNCTION; Input : R0; Output : R0 (square root result)

    1.Yiu Chapter 20, p. 664.

    norm

    _sq_

    c with

    O0

    norm

    _sq_

    asm

    norm

    _sq_

    c with

    O3

    cycles86

    time0.51us

    cycles49

    time0.29us

    cycles86

    time0.47us

    BEST!342 ECE 5655/4655 Real-Time DSP

  • Example: Unsigned Integer Square RootMOVW R1, #0x8000 ; R1 = 0x00008000MOVS R2, #0 ; Initialize result

    simple_sqrt_loopADDS R2, R2, R1 ; M = (M + N)MUL R3, R2, R2 ; R3 = M^2CMP R3, R0 ; If M^2 > InputIT HI ; Greater ThanSUBHI R2, R2, R1 ; M = (M - N)LSRS R1, R1, #1 ; N = N >> 1BNE simple_sqrt_loopMOV R0, R2 ; Copy to R0 and returnBX LR ; ReturnENDFUNC

    Function Flow ChartECE 5655/4655 Real-Time DSP 343

  • Chapter 3 Cortex-M4 Architecture and ASM ProgrammingSample Results For an input of 64 the output is 8, as expected

    For an input of 99 the output is 9 (81 is closest to 99), asexpected

    Useful Resources Architecture Reference Manual:

    http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403c/index.html

    Cortex-M4 Technical Reference Manual:http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439d/DDI0439D_cortex_m4_proces-

    sor_r0p1_trm.pdf

    Cortex-M4 Devices Generic User Guide:http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/DUI0553A_cortex_m4_dgug.pdf

    cycles125

    time0.74us

    From CoolTerm344 ECE 5655/4655 Real-Time DSP

    Cortex-M4 Architecture and ASM ProgrammingIntroductionOverviewCortex-M4 Memory MapM4 Memory Map (cont.)M4 Memory Map (cont.)Cortex-M4 Memory Map ExampleBit-band OperationsBit-band Operation ExampleBit-band Alias AddressBit-band Alias Address (cont.)Benefits of Bit-Band OperationsCortex-M4 Program ImageCortex-M4 Program Image (cont)Cortex-M4 EndiannessARM and Thumb Instruction SetARM and Thumb Instruction Set (cont.)Cortex-M4 Instruction SetCortex-M4 Instruction Set (cont.)Cortex-M4 Instruction Set TablesC Calling AssemblyAAPCS Register Use ConventionsAAPCS Core Register Use

    Example: Vector Norm SquaredC VersionAssembly VersionPerformance Comparison

    Example: Unsigned Integer Square RootFunction Flow ChartSample Results

    Useful Resources