1824 arm integer cores - university of...

40
© 2005 PEVE IT Unit – ARM System Design ARM integer cores – v5 – 1 MANCHEstER 1824 The University of Manchester ARM integer cores Outline: the ARM 3-stage pipeline the ARM7TDMI core the ARM 5-stage pipeline the ARM9TDMI core the ARM10TDMI core StrongARM & XScale the ARM11 core Cortex hands-on: system software – interrupts

Upload: others

Post on 12-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 1

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM integer cores

❏ Outline:

❍ the ARM 3-stage pipeline

❍ the ARM7TDMI core

❍ the ARM 5-stage pipeline

❍ the ARM9TDMI core

❍ the ARM10TDMI core

❍ StrongARM & XScale

❍ the ARM11 core

❍ Cortex

☞ hands-on: system software – interrupts

Page 2: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 2

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM integer cores

❏ Outline:

➜ the ARM 3-stage pipeline

❍ the ARM7TDMI core

❍ the ARM 5-stage pipeline

❍ the ARM9TDMI core

❍ the ARM10TDMI core

❍ StrongARM & XScale

❍ the ARM11 core

❍ Cortex

☞ hands-on: system software – interrupts

Page 3: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 3

MANCHEstER1824

The

Uni

vers

ity

line

n the instruction set.)

ath control signals

bank, shifted,ten back

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r The 3-stage ARM pipe

(The original architecture which has affected o

❏ fetch

❍ the instruction is fetched from memory

❏ decode

❍ the instruction is decoded and the datapprepared for the next cycle

❏ execute

❍ the operands are read from the register combined in the ALU and the result writ

Page 4: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 4

MANCHEstER1824

The

Uni

vers

ity

line

ently

xecute

ecode execute

time

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r The 3-stage ARM pipe

❏ Single cycle instructions

❍ complete at a rate of one per clock cycle

❍ intended to use (a single) memory effici

fetch decode execute

fetch decode e

fetch d

1

2

3

instruction

Page 5: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 5

MANCHEstER1824

The

Uni

vers

ity

line

s

i

time

execute

decode execute

etch ADD decode execute

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r The 3-stage ARM pipe

❏ More complex instructions:

❍ ‘STR’ causes a stall while transfer occur

fetch ADD decode execute1

2

3

nstruction

4

5

fetch STR decode calc. addr.

fetch ADD decode

fetch ADD

f

data xfer

Page 6: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 6

MANCHEstER1824

The

Uni

vers

ity

line

n executes

8

is architecturally

ssary adjustments,

ARMs ,ay vary.

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r The 3-stage ARM pipe

❏ PC behaviour

❍ r15 increments twice before an instructio

– due to pipeline operation

❍ therefore r15 = address of instruction +

– (+12 if used after first cycle, though this undefined)

– in Thumb code the offset is +4

❍ normally the assembler makes the necee.g. in branches

This behaviour is consistent for allalthough the pipeline structures m

Page 7: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 7

MANCHEstER1824

The

Uni

vers

ity

tion

5

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r 3-stage ARM organiza

❏ ARM components:

❍ register bank

– 2 read ports, 1 write port

– plus additional read and write ports for r1

❍ barrel shifter

❍ ALU

❍ address register and incrementer

❍ memory data registers

❍ instruction decoder and control

Page 8: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 8

MANCHEstER1824

The

Uni

vers

ity

tion

ter data in register

ster

incrementer

relfter

instruction

control

and

decode

instruction

control

register

D[31:0]

B bus

PC

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r 3-stage ARM organiza

❍ Separate addressincrementer

❍ Two register read ports

❍ Barrel shifter in serieswith ALU

data out regis

address regi

register

bank

barshi

ALU

multiplyregister

A[31:0]

ALU

bus

A bus

PC

Page 9: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 9

MANCHEstER1824

The

Uni

vers

ity

r?

sult in pipeline stallsumstances.

n a branch is taken

lt from a previous one

s long latency)

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r Why does this matte

The internal structure of the processor can re(⇒ loss of performance) in some circ

❏ Instruction dependencies

❍ pipeline needs flushing and refilling whe

❍ alleviated by branch prediction

❏ Data dependencies

❍ instruction may have to wait for the resu

❍ alleviated by forwarding

❍ particular problem with loads (memory i

– code reordering can help

Page 10: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 10

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM integer cores

❏ Outline:

❍ the ARM 3-stage pipeline

➜ the ARM7TDMI core

❍ the ARM 5-stage pipeline

❍ the ARM9TDMI core

❍ the ARM10TDMI core

❍ StrongARM & XScale

❍ the ARM11 core

❍ Cortex

☞ hands-on: system software – interrupts

Page 11: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 11

MANCHEstER1824

The

Uni

vers

ity

t

t

ware

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r The ARM7TDMI

❏ The ARM7TDMI is …

❍ an ARM7 3-stage pipeline core, with

❍ T - support for the Thumb instruction se

❍ D - support for debug

– the processor can stop on a debug even

❍ M - support for long multiplies

❍ I - the EmbeddedICE macrocell

– provides breakpoint and watchpoint hard

• described later

Page 12: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 12

MANCHEstER1824

The

Uni

vers

ity

tion

or debugging

ssore

othersignals

scan chain 0

hain 1

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM7TDMI organiza

❍ Scan chains provide access to signals f

EmbeddedICE

JTAG TAPcontroller

bussplitter

procecor

extern0extern1

Din[31:0]

Dout[31:0]

D[31:0]

A[31:0]

control

JTAG

scan chain 2

scan c

Page 13: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 13

MANCHEstER1824

The

Uni

vers

ity

A[31:0]

Din[31:0]

Dout[31:0]

D[31:0]

bl[3:0]r/wmas[1:0]mreqseqlock

transmode[4:0]abort

Tbit

tapsm[3:0]ir[3:0]tdoentck1tck0screg[3:0]

drivebsecapclkbsicapclkbshighzpclkbsrstclkbssdinbssdoutbsshclkbsshclk2bs

TRSTTCKTMSTDITDO

TDMI

re

boundaryscanextension

memoryinterface

state

TAPinformation

JTAGcontrols

MMUinterface

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r

TheARM7TDMI

core interfacesignals

mclkwaiteclk

bigend

irqfiq

isync

reset

eninenout

enoutiabeale

apedbetbe

busenhighz

busdisecapclk

dbgrqbreakptdbgack

execextern1extern0dbgen

rangeout0rangeout1

dbgrqicommrxcommtx

opccpi

cpacpb

VddVss

ARM7

co

clockcontrol

configuration

interrupts

initialization

buscontrol

debug

coprocessorinterface

power

Page 14: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 14

MANCHEstER1824

The

Uni

vers

ity

e signals

f bus cycle which will

design

active

ory inactive

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM7TDMI core interfac

❏ Memory interface

❍ ~mreq - memory request

❍ seq - sequential address

– together these signals indicate the sort ohappen next

– they are pipelined ahead to aid memory

mreq seq Cycle Use

0 0 N Non-sequential memory access

0 1 S Sequential memory access

1 0 I Internal cycle bus and memory in

1 1 C Coprocessor register transfer mem

Page 15: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 15

MANCHEstER1824

The

Uni

vers

ity

e signals

ore transfer occurs

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM7TDMI core interfac

❏ Notes:

❍ request for memory is in clock cycle bef

❍ wait state insertion is possible

mclk

Din[31:0]

Dout[31:0]

mreq, seq

abort

A[31:0] & control

Page 16: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 16

MANCHEstER1824

The

Uni

vers

ity

MIPS 60Power 87 mWMIPS/W 690

n’ ARM7 (0.18µm) is < mm212---

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM7TDMI

❏ ARM7TDMI debug support

❍ the EmbeddedICE module

❍ supports breakpoints and watchpoints

– controlled via the JTAG test access port

– EmbeddedICE & JTAG are covered later

❏ ARM7TDMI characteristics:

Process 0.35 µm Transistors 74,209Metal layers 3 Core area 2.1 mm2

Vdd 3.3V Clock 0 to 66 MHz

A ‘moder

Page 17: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 17

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM integer cores

❏ Outline:

❍ the ARM 3-stage pipeline

❍ the ARM7TDMI core

➜ the ARM 5-stage pipeline

❍ the ARM9TDMI core

❍ the ARM10TDMI core

❍ StrongARM & XScale

❍ the ARM11 core

❍ Cortex

☞ hands-on: system software – interrupts

Page 18: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 18

MANCHEstER1824

The

Uni

vers

ity

nce

ipeline stage

tages)

on)

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r Getting higher performa

❏ Increase the clock rate

❍ the clock rate is limited by the slowest p

– decrease the logic complexity per stage

– increase the pipeline depth (number of s

❏ improve the CPI (clocks per instructi

❍ fewer wasted cycles

– better memory bandwidth

Page 19: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 19

MANCHEstER1824

The

Uni

vers

ity

line

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r The 5-stage ARM pipe

❏ Fetch

❏ Decode

❍ instruction decode and register read

❏ Execute

❍ shift and ALU

❏ Memory

❍ data memory access

❏ Write-back

Page 20: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 20

MANCHEstER1824

The

Uni

vers

ity

line

clock cycle

r

k cycle

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r The 5-stage ARM pipe

❏ Reducing the CPI

❍ ARM7 uses the memory on nearly every

– for either instruction fetch or data transfe

❍ therefore a reduced CPI requiresmore than one memory access per cloc

❏ Possible solutions are:

❍ separate instruction and data memories

❍ double-bandwidth memory (e.g. ARM8)

Page 21: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 21

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM integer cores

❏ Outline:

❍ the ARM 3-stage pipeline

❍ the ARM7TDMI core

❍ the ARM 5-stage pipeline

➜ the ARM9TDMI core

❍ the ARM10TDMI core

❍ StrongARM & XScale

❍ the ARM11 core

❍ Cortex

☞ hands-on: system software – interrupts

Page 22: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 22

MANCHEstER1824

The

Uni

vers

ity

pipeline

rts

edICE debug

e than the

0-200 MHz

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM9TDMI

❏ The ARM9TDMI is …

❍ a ‘classic’ Harvard architecture 5-stage

– separate instruction and data memory po

❍ with full support for Thumb and Embedd

❍ aimed at significantly higher performancARM7TDMI

– enhanced pipeline (then) operated at 10

– (now) up to 250MHz

Page 23: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 23

MANCHEstER1824

The

Uni

vers

ity

e

data mem.

access

ad

g.

write

reg.shift/ALU

write

reg.

Execute

Memory Write

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM9TDMI pipelin

❍ Thumb instructions are decoded directly

instruction

fetch

ARM

decode re

re

shift/ALUread

Thumb

decompress

instruction

fetch

decode

DecodeFetch

Decode ExecuteFetch

ARM7TDMI:

ARM9TDMI:

Page 24: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 24

MANCHEstER1824

The

Uni

vers

ity

register write

shifter

ALU

mul

registerread

byte repl.

D-cache

rot/sgnex

I decode

I cache

reg shift

FE

TC

HD

EC

OD

EE

XE

CU

TE

BU

FF

ER

/DATA

WR

ITE

immediate

forwarding

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r

ARM9TDMIpipeline

❍ Pipelineintroduces moredependencies

❍ alleviated withforwarding paths

+ 4

mux

+ 4

next PC

PC+4

PC+8(r15)

LDM/STM

B, BL,MOV pc,SUB pc

LDR pc

address

post- index

pre- index

Page 25: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 25

MANCHEstER1824

The

Uni

vers

ity

MIPS 220Power 150 mWMIPS/W 1,500

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM9TDMI

❏ EmbeddedICE

❍ as ARM7TDMI, plus:

– hardware single-stepping

– breakpoints on exceptions

❏ On-chip coprocessor support

❍ for floating-point, DSP, and so on

Process 0.25 µm Transistors 111,000Metal layers 3 Core area 2.1 mm2

Vdd 2.5 V Clock 0-200 MHz

Page 26: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 26

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM integer cores

❏ Outline:

❍ the ARM 3-stage pipeline

❍ the ARM7TDMI core

❍ the ARM 5-stage pipeline

❍ the ARM9TDMI core

➜ the ARM10TDMI core

❍ StrongARM & XScale

❍ the ARM11 core

❍ Cortex

☞ hands-on: system software – interrupts

Page 27: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 27

MANCHEstER1824

The

Uni

vers

ity

e than the

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM10TDMI

❏ The ARM10TDMI is …

❍ aimed at significantly higher performancARM9TDMI

❍ achieved through use of:

– higher clock rate

– 64-bit I- and D-memory buses

– branch prediction

– hit-under-miss D-memory interface

Page 28: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 28

MANCHEstER1824

The

Uni

vers

ity

e

Write

multiplier

partial add write

reg.

Memory

ta mem.

ccess write

data

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM10TDMI pipelin

❏ 6-stage pipeline

❍ Additional time allowed for

– I- and D-memory accesses

– instruction decode

multiply

instruction

fetchdecode

decode

read shift/ALU

ExecuteDecodeIssueFetch

da

a

addr.

calc.

Page 29: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 29

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM integer cores

❏ Outline:

❍ the ARM 3-stage pipeline

❍ the ARM7TDMI core

❍ the ARM 5-stage pipeline

❍ the ARM9TDMI core

❍ the ARM10TDMI core

➜ StrongARM & XScale

❍ the ARM11 core

❍ Cortex

☞ hands-on: system software – interrupts

Page 30: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 30

MANCHEstER1824

The

Uni

vers

ity

le

gned by ARM Ltd.

performance

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r StrongARM & XSca

❏ Most ARM implementations are desi

❏ Two notable exceptions:

❍ StrongARM

– designed by Digital, later bought by Intel

– now largely obsolete

❍ XScale

– designed by Intel

– current

– up to 800 MHz

Both of these were designed primarily for high Used for proprietary devices

Page 31: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 31

MANCHEstER1824

The

Uni

vers

ity

lls

em.

ss

rly termination

iply

ulate

ALU

writeback

data

writeback

MAC

writeback

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r XScale™ pipeline

❍ another deep pipeline

❍ (unpredicted) branches are expensive

❍ data (memory) dependencies cause sta

BTB

fetch

data m

acce

ea

mult

accum

fetch2read/

shiftdecode

ALU

exec.

state

exec.

Page 32: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 32

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM integer cores

❏ Outline:

❍ the ARM 3-stage pipeline

❍ the ARM7TDMI core

❍ the ARM 5-stage pipeline

❍ the ARM9TDMI core

❍ the ARM10TDMI core

❍ StrongARM & XScale

➜ the ARM11 core

❍ Cortex

☞ hands-on: system software – interrupts

Page 33: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 33

MANCHEstER1824

The

Uni

vers

ity

ARM Ltd.)

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM11

❏ Most recent available ARM core (from

❏ process portable

❏ high speed

❍ faster clock (~550MHz)

❍ deeper pipeline

– also to accommodate DSP operations

❍ improved branch prediction

Page 34: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 34

MANCHEstER1824

The

Uni

vers

ity

y during LDM/STM)

e miss stalls

data mem.

access

ALU saturationregister

write

Writeback

ultiply

mulate

register

write

/data1/AC2

Sat./data2/MAC3

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM11 pipeline

❏ 8-stage pipeline

❍ Parallel ALU and data access (especiall

❍ ‘Hit under Miss’ in ‘data1’ alleviates cach

instruction

fetch

Fetch 1

branch

prediction

Fetch 2

instruction

decode

Decode

register

read

Issue

shift

shift/addr./

m

accu

address

generation

MAC1ALU

M

Page 35: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 35

MANCHEstER1824

The

Uni

vers

ity

tion

t taken)

sp!, {…,pc} }

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM 11 branch predic❍ Two-level dynamic branch prediction

– constant offset branches– 128 entry, direct mapped (addr[9:3])

– cost: 1 or 0 cycles if successful (taken/no

❍ Static branch prediction

– constant offset branches …– … that miss the dynamic predictor– cost: 4 cycles if successful

❍ Return stack

– three entries– Pushes on BL or BLX (inc. BLX Rn )– Pops on: {BX lr; MOV pc, lr; LDR pc, [sp], #n; LDMIA

– cost: 4 cycles if successful

Page 36: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 36

MANCHEstER1824

The

Uni

vers

ity

ies

cle penalty (cache hit)

s registers early

- no stall

waiting

le

in total

rom the pipeline figure

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM 11 dependenc

❍ Data operations forward results

❍ Load operations impose an extra two cy

❍ Memory operations require their addres

Thus: ADD r2, r1, r0 ; produce R2ADD r4, r3, r2 ; consume R2

LDR r2, [r1] ; load r2ADD r4, r3, r2 ; lose 2 cycles

ADD r2, r1, r0 ; produce R2LDR r4, [r2] ; lose one cyc

LDR r2, [r1] ;LDR r4, [r2] ; lose 3 cycles

– a few other dependencies may be seen f

Page 37: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 37

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r ARM integer cores

❏ Outline:

❍ the ARM 3-stage pipeline

❍ the ARM7TDMI core

❍ the ARM 5-stage pipeline

❍ the ARM9TDMI core

❍ the ARM10TDMI core

❍ StrongARM & XScale

❍ the ARM11 core

➜ Cortex

☞ hands-on: system software – interrupts

Page 38: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 38

MANCHEstER1824

The

Uni

vers

ity

ets

ms.

ets

ors

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r Cortex™

❏ Three ‘flavours’:

❍ Cortex-A Series

– applications processors

– ARM, Thumb and Thumb-2 instruction s

❍ Cortex-R Series

– embedded processors for real-time syste

– ARM, Thumb, and Thumb-2 instruction s

❍ Cortex-M Series

– deeply embedded/cost sensitive process

– Thumb-2 instruction set only

Page 39: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 39

MANCHEstER1824

The

Uni

vers

ity

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r Cortex-M3

❏ First of the line

❍ Thumb-2 only

❍ new design

– 3-stage pipeline

– Harvard core

❍ small core

– ~ mm2 (excluding cache)

❍ performance (0.18µm):

– ~100 MHz

– ~8000 MIPS/W

12---

Page 40: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from

ARM integer cores – v5 – 40

MANCHEstER1824

The

Uni

vers

ity

are

re issues

© 2005 PEVEIT Unit – ARM System Design

of M

anch

este

r Hands-on: system softw– interrupts

❏ Look further into ARM system softwa

❍ Write an interrupt handler

❍ Use it to trigger a context switch

☞ Follow the ‘Hands-on’ instructions