![Page 1: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/1.jpg)
ARM integer cores – v5 – 1
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM integer cores
❏ Outline:
❍ the ARM 3-stage pipeline
❍ the ARM7TDMI core
❍ the ARM 5-stage pipeline
❍ the ARM9TDMI core
❍ the ARM10TDMI core
❍ StrongARM & XScale
❍ the ARM11 core
❍ Cortex
☞ hands-on: system software – interrupts
![Page 2: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/2.jpg)
ARM integer cores – v5 – 2
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM integer cores
❏ Outline:
➜ the ARM 3-stage pipeline
❍ the ARM7TDMI core
❍ the ARM 5-stage pipeline
❍ the ARM9TDMI core
❍ the ARM10TDMI core
❍ StrongARM & XScale
❍ the ARM11 core
❍ Cortex
☞ hands-on: system software – interrupts
![Page 3: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/3.jpg)
ARM integer cores – v5 – 3
MANCHEstER1824
The
Uni
vers
ity
line
n the instruction set.)
ath control signals
bank, shifted,ten back
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r The 3-stage ARM pipe
(The original architecture which has affected o
❏ fetch
❍ the instruction is fetched from memory
❏ decode
❍ the instruction is decoded and the datapprepared for the next cycle
❏ execute
❍ the operands are read from the register combined in the ALU and the result writ
![Page 4: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/4.jpg)
ARM integer cores – v5 – 4
MANCHEstER1824
The
Uni
vers
ity
line
ently
xecute
ecode execute
time
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r The 3-stage ARM pipe
❏ Single cycle instructions
❍ complete at a rate of one per clock cycle
❍ intended to use (a single) memory effici
fetch decode execute
fetch decode e
fetch d
1
2
3
instruction
![Page 5: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/5.jpg)
ARM integer cores – v5 – 5
MANCHEstER1824
The
Uni
vers
ity
line
s
i
time
execute
decode execute
etch ADD decode execute
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r The 3-stage ARM pipe
❏ More complex instructions:
❍ ‘STR’ causes a stall while transfer occur
fetch ADD decode execute1
2
3
nstruction
4
5
fetch STR decode calc. addr.
fetch ADD decode
fetch ADD
f
data xfer
![Page 6: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/6.jpg)
ARM integer cores – v5 – 6
MANCHEstER1824
The
Uni
vers
ity
line
n executes
8
is architecturally
ssary adjustments,
ARMs ,ay vary.
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r The 3-stage ARM pipe
❏ PC behaviour
❍ r15 increments twice before an instructio
– due to pipeline operation
❍ therefore r15 = address of instruction +
– (+12 if used after first cycle, though this undefined)
– in Thumb code the offset is +4
❍ normally the assembler makes the necee.g. in branches
This behaviour is consistent for allalthough the pipeline structures m
![Page 7: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/7.jpg)
ARM integer cores – v5 – 7
MANCHEstER1824
The
Uni
vers
ity
tion
5
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r 3-stage ARM organiza
❏ ARM components:
❍ register bank
– 2 read ports, 1 write port
– plus additional read and write ports for r1
❍ barrel shifter
❍ ALU
❍ address register and incrementer
❍ memory data registers
❍ instruction decoder and control
![Page 8: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/8.jpg)
ARM integer cores – v5 – 8
MANCHEstER1824
The
Uni
vers
ity
tion
ter data in register
ster
incrementer
relfter
instruction
control
and
decode
instruction
control
register
D[31:0]
B bus
PC
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r 3-stage ARM organiza
❍ Separate addressincrementer
❍ Two register read ports
❍ Barrel shifter in serieswith ALU
data out regis
address regi
register
bank
barshi
ALU
multiplyregister
A[31:0]
ALU
bus
A bus
PC
![Page 9: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/9.jpg)
ARM integer cores – v5 – 9
MANCHEstER1824
The
Uni
vers
ity
r?
sult in pipeline stallsumstances.
n a branch is taken
lt from a previous one
s long latency)
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r Why does this matte
The internal structure of the processor can re(⇒ loss of performance) in some circ
❏ Instruction dependencies
❍ pipeline needs flushing and refilling whe
❍ alleviated by branch prediction
❏ Data dependencies
❍ instruction may have to wait for the resu
❍ alleviated by forwarding
❍ particular problem with loads (memory i
– code reordering can help
![Page 10: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/10.jpg)
ARM integer cores – v5 – 10
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM integer cores
❏ Outline:
❍ the ARM 3-stage pipeline
➜ the ARM7TDMI core
❍ the ARM 5-stage pipeline
❍ the ARM9TDMI core
❍ the ARM10TDMI core
❍ StrongARM & XScale
❍ the ARM11 core
❍ Cortex
☞ hands-on: system software – interrupts
![Page 11: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/11.jpg)
ARM integer cores – v5 – 11
MANCHEstER1824
The
Uni
vers
ity
t
t
ware
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r The ARM7TDMI
❏ The ARM7TDMI is …
❍ an ARM7 3-stage pipeline core, with
❍ T - support for the Thumb instruction se
❍ D - support for debug
– the processor can stop on a debug even
❍ M - support for long multiplies
❍ I - the EmbeddedICE macrocell
– provides breakpoint and watchpoint hard
• described later
![Page 12: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/12.jpg)
ARM integer cores – v5 – 12
MANCHEstER1824
The
Uni
vers
ity
tion
or debugging
ssore
othersignals
scan chain 0
hain 1
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM7TDMI organiza
❍ Scan chains provide access to signals f
EmbeddedICE
JTAG TAPcontroller
bussplitter
procecor
extern0extern1
Din[31:0]
Dout[31:0]
D[31:0]
A[31:0]
control
JTAG
scan chain 2
scan c
![Page 13: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/13.jpg)
ARM integer cores – v5 – 13
MANCHEstER1824
The
Uni
vers
ity
A[31:0]
Din[31:0]
Dout[31:0]
D[31:0]
bl[3:0]r/wmas[1:0]mreqseqlock
transmode[4:0]abort
Tbit
tapsm[3:0]ir[3:0]tdoentck1tck0screg[3:0]
drivebsecapclkbsicapclkbshighzpclkbsrstclkbssdinbssdoutbsshclkbsshclk2bs
TRSTTCKTMSTDITDO
TDMI
re
boundaryscanextension
memoryinterface
state
TAPinformation
JTAGcontrols
MMUinterface
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r
TheARM7TDMI
core interfacesignals
mclkwaiteclk
bigend
irqfiq
isync
reset
eninenout
enoutiabeale
apedbetbe
busenhighz
busdisecapclk
dbgrqbreakptdbgack
execextern1extern0dbgen
rangeout0rangeout1
dbgrqicommrxcommtx
opccpi
cpacpb
VddVss
ARM7
co
clockcontrol
configuration
interrupts
initialization
buscontrol
debug
coprocessorinterface
power
![Page 14: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/14.jpg)
ARM integer cores – v5 – 14
MANCHEstER1824
The
Uni
vers
ity
e signals
f bus cycle which will
design
active
ory inactive
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM7TDMI core interfac
❏ Memory interface
❍ ~mreq - memory request
❍ seq - sequential address
– together these signals indicate the sort ohappen next
– they are pipelined ahead to aid memory
mreq seq Cycle Use
0 0 N Non-sequential memory access
0 1 S Sequential memory access
1 0 I Internal cycle bus and memory in
1 1 C Coprocessor register transfer mem
![Page 15: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/15.jpg)
ARM integer cores – v5 – 15
MANCHEstER1824
The
Uni
vers
ity
e signals
ore transfer occurs
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM7TDMI core interfac
❏ Notes:
❍ request for memory is in clock cycle bef
❍ wait state insertion is possible
mclk
Din[31:0]
Dout[31:0]
mreq, seq
abort
A[31:0] & control
![Page 16: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/16.jpg)
ARM integer cores – v5 – 16
MANCHEstER1824
The
Uni
vers
ity
MIPS 60Power 87 mWMIPS/W 690
n’ ARM7 (0.18µm) is < mm212---
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM7TDMI
❏ ARM7TDMI debug support
❍ the EmbeddedICE module
❍ supports breakpoints and watchpoints
– controlled via the JTAG test access port
– EmbeddedICE & JTAG are covered later
❏ ARM7TDMI characteristics:
Process 0.35 µm Transistors 74,209Metal layers 3 Core area 2.1 mm2
Vdd 3.3V Clock 0 to 66 MHz
A ‘moder
![Page 17: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/17.jpg)
ARM integer cores – v5 – 17
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM integer cores
❏ Outline:
❍ the ARM 3-stage pipeline
❍ the ARM7TDMI core
➜ the ARM 5-stage pipeline
❍ the ARM9TDMI core
❍ the ARM10TDMI core
❍ StrongARM & XScale
❍ the ARM11 core
❍ Cortex
☞ hands-on: system software – interrupts
![Page 18: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/18.jpg)
ARM integer cores – v5 – 18
MANCHEstER1824
The
Uni
vers
ity
nce
ipeline stage
tages)
on)
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r Getting higher performa
❏ Increase the clock rate
❍ the clock rate is limited by the slowest p
– decrease the logic complexity per stage
– increase the pipeline depth (number of s
❏ improve the CPI (clocks per instructi
❍ fewer wasted cycles
– better memory bandwidth
![Page 19: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/19.jpg)
ARM integer cores – v5 – 19
MANCHEstER1824
The
Uni
vers
ity
line
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r The 5-stage ARM pipe
❏ Fetch
❏ Decode
❍ instruction decode and register read
❏ Execute
❍ shift and ALU
❏ Memory
❍ data memory access
❏ Write-back
![Page 20: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/20.jpg)
ARM integer cores – v5 – 20
MANCHEstER1824
The
Uni
vers
ity
line
clock cycle
r
k cycle
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r The 5-stage ARM pipe
❏ Reducing the CPI
❍ ARM7 uses the memory on nearly every
– for either instruction fetch or data transfe
❍ therefore a reduced CPI requiresmore than one memory access per cloc
❏ Possible solutions are:
❍ separate instruction and data memories
❍ double-bandwidth memory (e.g. ARM8)
![Page 21: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/21.jpg)
ARM integer cores – v5 – 21
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM integer cores
❏ Outline:
❍ the ARM 3-stage pipeline
❍ the ARM7TDMI core
❍ the ARM 5-stage pipeline
➜ the ARM9TDMI core
❍ the ARM10TDMI core
❍ StrongARM & XScale
❍ the ARM11 core
❍ Cortex
☞ hands-on: system software – interrupts
![Page 22: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/22.jpg)
ARM integer cores – v5 – 22
MANCHEstER1824
The
Uni
vers
ity
pipeline
rts
edICE debug
e than the
0-200 MHz
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM9TDMI
❏ The ARM9TDMI is …
❍ a ‘classic’ Harvard architecture 5-stage
– separate instruction and data memory po
❍ with full support for Thumb and Embedd
❍ aimed at significantly higher performancARM7TDMI
– enhanced pipeline (then) operated at 10
– (now) up to 250MHz
![Page 23: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/23.jpg)
ARM integer cores – v5 – 23
MANCHEstER1824
The
Uni
vers
ity
e
data mem.
access
ad
g.
write
reg.shift/ALU
write
reg.
Execute
Memory Write
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM9TDMI pipelin
❍ Thumb instructions are decoded directly
instruction
fetch
ARM
decode re
re
shift/ALUread
Thumb
decompress
instruction
fetch
decode
DecodeFetch
Decode ExecuteFetch
ARM7TDMI:
ARM9TDMI:
![Page 24: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/24.jpg)
ARM integer cores – v5 – 24
MANCHEstER1824
The
Uni
vers
ity
register write
shifter
ALU
mul
registerread
byte repl.
D-cache
rot/sgnex
I decode
I cache
reg shift
FE
TC
HD
EC
OD
EE
XE
CU
TE
BU
FF
ER
/DATA
WR
ITE
immediate
forwarding
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r
ARM9TDMIpipeline
❍ Pipelineintroduces moredependencies
❍ alleviated withforwarding paths
+ 4
mux
+ 4
next PC
PC+4
PC+8(r15)
LDM/STM
B, BL,MOV pc,SUB pc
LDR pc
address
post- index
pre- index
![Page 25: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/25.jpg)
ARM integer cores – v5 – 25
MANCHEstER1824
The
Uni
vers
ity
MIPS 220Power 150 mWMIPS/W 1,500
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM9TDMI
❏ EmbeddedICE
❍ as ARM7TDMI, plus:
– hardware single-stepping
– breakpoints on exceptions
❏ On-chip coprocessor support
❍ for floating-point, DSP, and so on
Process 0.25 µm Transistors 111,000Metal layers 3 Core area 2.1 mm2
Vdd 2.5 V Clock 0-200 MHz
![Page 26: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/26.jpg)
ARM integer cores – v5 – 26
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM integer cores
❏ Outline:
❍ the ARM 3-stage pipeline
❍ the ARM7TDMI core
❍ the ARM 5-stage pipeline
❍ the ARM9TDMI core
➜ the ARM10TDMI core
❍ StrongARM & XScale
❍ the ARM11 core
❍ Cortex
☞ hands-on: system software – interrupts
![Page 27: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/27.jpg)
ARM integer cores – v5 – 27
MANCHEstER1824
The
Uni
vers
ity
e than the
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM10TDMI
❏ The ARM10TDMI is …
❍ aimed at significantly higher performancARM9TDMI
❍ achieved through use of:
– higher clock rate
– 64-bit I- and D-memory buses
– branch prediction
– hit-under-miss D-memory interface
![Page 28: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/28.jpg)
ARM integer cores – v5 – 28
MANCHEstER1824
The
Uni
vers
ity
e
Write
multiplier
partial add write
reg.
Memory
ta mem.
ccess write
data
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM10TDMI pipelin
❏ 6-stage pipeline
❍ Additional time allowed for
– I- and D-memory accesses
– instruction decode
multiply
instruction
fetchdecode
decode
read shift/ALU
ExecuteDecodeIssueFetch
da
a
addr.
calc.
![Page 29: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/29.jpg)
ARM integer cores – v5 – 29
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM integer cores
❏ Outline:
❍ the ARM 3-stage pipeline
❍ the ARM7TDMI core
❍ the ARM 5-stage pipeline
❍ the ARM9TDMI core
❍ the ARM10TDMI core
➜ StrongARM & XScale
❍ the ARM11 core
❍ Cortex
☞ hands-on: system software – interrupts
![Page 30: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/30.jpg)
ARM integer cores – v5 – 30
MANCHEstER1824
The
Uni
vers
ity
le
gned by ARM Ltd.
performance
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r StrongARM & XSca
❏ Most ARM implementations are desi
❏ Two notable exceptions:
❍ StrongARM
– designed by Digital, later bought by Intel
– now largely obsolete
❍ XScale
– designed by Intel
– current
– up to 800 MHz
Both of these were designed primarily for high Used for proprietary devices
![Page 31: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/31.jpg)
ARM integer cores – v5 – 31
MANCHEstER1824
The
Uni
vers
ity
lls
em.
ss
rly termination
iply
ulate
ALU
writeback
data
writeback
MAC
writeback
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r XScale™ pipeline
❍ another deep pipeline
❍ (unpredicted) branches are expensive
❍ data (memory) dependencies cause sta
BTB
fetch
data m
acce
ea
mult
accum
fetch2read/
shiftdecode
ALU
exec.
state
exec.
![Page 32: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/32.jpg)
ARM integer cores – v5 – 32
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM integer cores
❏ Outline:
❍ the ARM 3-stage pipeline
❍ the ARM7TDMI core
❍ the ARM 5-stage pipeline
❍ the ARM9TDMI core
❍ the ARM10TDMI core
❍ StrongARM & XScale
➜ the ARM11 core
❍ Cortex
☞ hands-on: system software – interrupts
![Page 33: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/33.jpg)
ARM integer cores – v5 – 33
MANCHEstER1824
The
Uni
vers
ity
ARM Ltd.)
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM11
❏ Most recent available ARM core (from
❏ process portable
❏ high speed
❍ faster clock (~550MHz)
❍ deeper pipeline
– also to accommodate DSP operations
❍ improved branch prediction
![Page 34: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/34.jpg)
ARM integer cores – v5 – 34
MANCHEstER1824
The
Uni
vers
ity
y during LDM/STM)
e miss stalls
data mem.
access
ALU saturationregister
write
Writeback
ultiply
mulate
register
write
/data1/AC2
Sat./data2/MAC3
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM11 pipeline
❏ 8-stage pipeline
❍ Parallel ALU and data access (especiall
❍ ‘Hit under Miss’ in ‘data1’ alleviates cach
instruction
fetch
Fetch 1
branch
prediction
Fetch 2
instruction
decode
Decode
register
read
Issue
shift
shift/addr./
m
accu
address
generation
MAC1ALU
M
![Page 35: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/35.jpg)
ARM integer cores – v5 – 35
MANCHEstER1824
The
Uni
vers
ity
tion
t taken)
sp!, {…,pc} }
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM 11 branch predic❍ Two-level dynamic branch prediction
– constant offset branches– 128 entry, direct mapped (addr[9:3])
– cost: 1 or 0 cycles if successful (taken/no
❍ Static branch prediction
– constant offset branches …– … that miss the dynamic predictor– cost: 4 cycles if successful
❍ Return stack
– three entries– Pushes on BL or BLX (inc. BLX Rn )– Pops on: {BX lr; MOV pc, lr; LDR pc, [sp], #n; LDMIA
– cost: 4 cycles if successful
![Page 36: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/36.jpg)
ARM integer cores – v5 – 36
MANCHEstER1824
The
Uni
vers
ity
ies
cle penalty (cache hit)
s registers early
- no stall
waiting
le
in total
rom the pipeline figure
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM 11 dependenc
❍ Data operations forward results
❍ Load operations impose an extra two cy
❍ Memory operations require their addres
Thus: ADD r2, r1, r0 ; produce R2ADD r4, r3, r2 ; consume R2
LDR r2, [r1] ; load r2ADD r4, r3, r2 ; lose 2 cycles
ADD r2, r1, r0 ; produce R2LDR r4, [r2] ; lose one cyc
LDR r2, [r1] ;LDR r4, [r2] ; lose 3 cycles
– a few other dependencies may be seen f
![Page 37: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/37.jpg)
ARM integer cores – v5 – 37
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r ARM integer cores
❏ Outline:
❍ the ARM 3-stage pipeline
❍ the ARM7TDMI core
❍ the ARM 5-stage pipeline
❍ the ARM9TDMI core
❍ the ARM10TDMI core
❍ StrongARM & XScale
❍ the ARM11 core
➜ Cortex
☞ hands-on: system software – interrupts
![Page 38: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/38.jpg)
ARM integer cores – v5 – 38
MANCHEstER1824
The
Uni
vers
ity
ets
ms.
ets
ors
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r Cortex™
❏ Three ‘flavours’:
❍ Cortex-A Series
– applications processors
– ARM, Thumb and Thumb-2 instruction s
❍ Cortex-R Series
– embedded processors for real-time syste
– ARM, Thumb, and Thumb-2 instruction s
❍ Cortex-M Series
– deeply embedded/cost sensitive process
– Thumb-2 instruction set only
![Page 39: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/39.jpg)
ARM integer cores – v5 – 39
MANCHEstER1824
The
Uni
vers
ity
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r Cortex-M3
❏ First of the line
❍ Thumb-2 only
❍ new design
– 3-stage pipeline
– Harvard core
❍ small core
– ~ mm2 (excluding cache)
❍ performance (0.18µm):
– ~100 MHz
– ~8000 MIPS/W
12---
![Page 40: 1824 ARM integer cores - University of Manchesterapt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/07_Cores.pdf · prepared for the next cycle execute the operands are read from](https://reader030.vdocuments.us/reader030/viewer/2022041010/5eb9daa400ca5037e27e78b8/html5/thumbnails/40.jpg)
ARM integer cores – v5 – 40
MANCHEstER1824
The
Uni
vers
ity
are
re issues
© 2005 PEVEIT Unit – ARM System Design
of M
anch
este
r Hands-on: system softw– interrupts
❏ Look further into ARM system softwa
❍ Write an interrupt handler
❍ Use it to trigger a context switch
☞ Follow the ‘Hands-on’ instructions