Computer and Digital System Architecture (personal.stevens.edu/~bmcnair/cpe517-f12/week08-517.pdf)


Computer and Digital System Architecture

EE/CpE-517-A

Bruce McNair [email protected]

EE/CpE517A Copyright ©2011 Stevens Institute of Technology - All rights reserved 1-1/39


Week 8

ARM processor cores

Furber Ch. 9


FPGA architecture

I/O pin

Switch block

Interconnects

Logic blocks


FPGA logic block

[Figure: FPGA logic block. Labeled elements: inputs, Lookup Table (LUT), FF, a two-way (1/0) selector, two config inputs, output]


FPGA LUT

[Figure: FPGA logic block. Labeled elements: inputs, Lookup Table (LUT), FF, a two-way (1/0) selector, two config inputs, output]

2-input LUT example

Input    Output function
A  B     AND  OR   XOR  NAND  NOR  ...
0  0     0    0    0    1     1
0  1     0    1    1    1     0
1  0     0    1    1    1     0
1  1     1    1    0    0     0

A 2-input LUT can implement 16 logical functions.

Note: Xilinx Virtex-7 FPGAs provide 6-input LUTs with up to ~2M logic cells.
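The count of 16 follows from the LUT storing one configuration bit per input combination: 2 inputs give 4 bits, and 4 bits give 2^4 = 16 possible truth tables. The sketch below models a LUT as a small indexed table; the names make_lut, CONFIGS and count_functions are illustrative only and do not come from the slides or from any vendor tool.

```python
from itertools import product


def make_lut(config_bits):
    """Return a 2-input LUT; config_bits[i] is the output for index i = (A << 1) | B."""
    if len(config_bits) != 4:
        raise ValueError("a 2-input LUT needs exactly 4 configuration bits")

    def lut(a, b):
        return config_bits[(a << 1) | b]

    return lut


# Configuration bits read down each column of the slide's truth table
# (rows ordered AB = 00, 01, 10, 11).
CONFIGS = {
    "AND":  (0, 0, 0, 1),
    "OR":   (0, 1, 1, 1),
    "XOR":  (0, 1, 1, 0),
    "NAND": (1, 1, 1, 0),
    "NOR":  (1, 0, 0, 0),
}


def count_functions(n_inputs):
    """An n-input LUT holds 2**n bits, so it can realize 2**(2**n) functions."""
    return 2 ** (2 ** n_inputs)


if __name__ == "__main__":
    xor = make_lut(CONFIGS["XOR"])
    for a, b in product((0, 1), repeat=2):
        print(a, b, xor(a, b))
    print("2-input functions:", count_functions(2))
```

The same reasoning applied to a 6-input LUT gives 2^64 possible functions, which is why a single LUT can absorb a fairly large piece of combinational logic.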


ASIC/FPGA development

[Figure: ASIC/FPGA development flow. Boxes: Schematic design, HDL design, Design optimization, FPGA macrocell mapping, ASIC standard cell mapping, Placement, Routing, Mask generation, Programming]


ASIC/FPGA development

[Figure: ASIC/FPGA development flow. Boxes: Schematic design, HDL design, Design optimization, FPGA macrocell mapping, ASIC standard cell mapping, Placement, Routing, Mask generation, Programming. Additional label: Design iterations]


FPGA placement/routing


ASIC/FPGA development

[Figure: ASIC/FPGA development flow. Boxes: Schematic design, HDL design, Design optimization, FPGA macrocell mapping, ASIC standard cell mapping, Placement, Routing, Mask generation, Programming. Additional labels: Soft core, Hard core]


Typical ARM core designs

ARM core

Cache

Memory management

Signal processing

Interface logic


ARM cores


ARM7TDMI example core

ARM7TDMI
  ARM7 device: 3.3 V logic, 32-bit integer core, 3-stage pipeline
  T: Optional use of Thumb 16-bit compressed instruction set
  D: On-chip JTAG debug support
  M: Multiplier with 64-bit result
  I: EmbeddedICE support


ARM7TDMI example core

ARM7TDMI
  ARM7 device: 3.3 V logic, 32-bit integer core, 3-stage pipeline
  T: Optional use of Thumb 16-bit compressed instruction set
  D: On-chip JTAG debug support
  M: Multiplier with 64-bit result
  I: EmbeddedICE support

Applications:
  D-Link ADSL router
  Apple iPod
  Lego Mindstorms NXT
  Nokia cellular phones
  Nintendo DS
  Game Boy Advance
  Roomba 500 series
  Sirius Satellite Radio
  Automotive systems


ARM7TDMI organization

[Figure: ARM7TDMI organization. Blocks: EmbeddedICE, bus splitter, JTAG TAP controller, ARM processor core. Scan chains: scan chain 0, scan chain 1, scan chain 2. Signals: TCK, TMS, TRST, TDI, TDO; Dout[31:0], Din[31:0], D[31:0]; A[31:0], mas[1:0], mreq, trans, opc, r/w; extern1, extern0; other signals]


ARM7TDMI core interface signals

[Figure: ARM7TDMI core with its interface signals, grouped as follows]

clock control:            mclk, wait, eclk
configuration:            bigend
interrupts:               irq, fiq, isync
initialization:           reset
bus control:              enin, enout, enouti, abe, ale, ape, dbe, tbe, busen, highz, busdis, ecapclk
debug:                    dbgrq, breakpt, dbgack, exec, extern1, extern0, dbgen, rangeout0, rangeout1, dbgrqi, commrx, commtx
coprocessor interface:    opc, cpi, cpa, cpb
power:                    Vdd, Vss
memory interface:         A[31:0], Din[31:0], Dout[31:0], D[31:0], bl[3:0], r/w, mas[1:0], mreq, seq, lock
MMU interface:            trans, mode[4:0], abort
state:                    Tbit
TAP information:          tapsm[3:0], ir[3:0], tdoen, tck1, tck2, screg[3:0]
boundary scan extension:  drivebs, ecapclkbs, icapclkbs, highz, pclkbs, rstclkbs, sdinbs, sdoutbs, shclkbs, shclk2bs
JTAG controls:            TRST, TCK, TMS, TDI, TDO


ARM7TDMI core interface signals

Memory interface:  A[31:0], Din[31:0], Dout[31:0], D[31:0], bl[3:0], r/w, mas[1:0], mreq, seq, lock
MMU interface:     trans, mode[4:0], abort
State:             Tbit

trans: Translation control for user/supervisor mode
mode: CPSR[4:0] bits (processor mode)
abort: Disallowed access
Tbit: ARM or Thumb instruction set
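As a rough illustration of what the mode[4:0] and Tbit outputs convey, the sketch below decodes a CPSR value using the standard ARM mode encodings and the T flag in bit 5. The function name decode_cpsr and the returned dictionary layout are assumptions made for this example; the "privileged" field is only a loose stand-in for the user/supervisor distinction that trans reports.

```python
# Standard ARM processor mode encodings held in CPSR[4:0].
ARM_MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

USER_MODE = 0b10000
T_BIT = 1 << 5  # CPSR bit 5: set when the core is executing Thumb code


def decode_cpsr(cpsr):
    """Split a CPSR value into mode name, privilege flag and instruction set."""
    mode_bits = cpsr & 0x1F
    known = mode_bits in ARM_MODES
    return {
        "mode": ARM_MODES.get(mode_bits, "reserved"),
        # Every defined mode other than User is privileged.
        "privileged": known and mode_bits != USER_MODE,
        "instruction_set": "Thumb" if cpsr & T_BIT else "ARM",
    }


if __name__ == "__main__":
    for value in (0x10, 0xD3, 0x30):
        print(hex(value), decode_cpsr(value))
```

An external MMU would use the same information to decide whether an access is permitted, signalling a disallowed access back to the core on abort.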


ARM7TDMI memory interface timing

[Timing diagram. Waveforms: mclk; A[31:0], r’/w, mas, lock, trans’, opc’; mreq’, seq; Din[31:0]; Dout[31:0]; abort; enout’]


ARM7TDMI core interface signals

TAP information:          tapsm[3:0], ir[3:0], tdoen, tck1, tck2, screg[3:0]
Boundary scan extension:  drivebs, ecapclkbs, icapclkbs, highz, pclkbs, rstclkbs, sdinbs, sdoutbs, shclkbs, shclk2bs
JTAG controls:            TRST, TCK, TMS, TDI, TDO

TAP: Additional scan chains can be added to JTAG
Boundary scan extension: Allows for additional JTAG paths
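The tapsm[3:0] output reports the current state of the TAP controller, which has 16 states and so fits in four bits. The sketch below models the generic IEEE 1149.1 TAP state machine driven by TMS on each TCK edge. It uses symbolic state names rather than the actual 4-bit encoding presented on tapsm, which is not given in the slides; TAP_NEXT, step and run are names chosen for this example.

```python
# Next-state table for the IEEE 1149.1 TAP controller:
# state -> (next state if TMS = 0, next state if TMS = 1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Advance the TAP controller by one TCK cycle with the given TMS value."""
    return TAP_NEXT[state][1 if tms else 0]


def run(state, tms_bits):
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


if __name__ == "__main__":
    # Walk from reset into Shift-IR, where an instruction such as a
    # scan-chain select would be shifted in through TDI.
    print(run("Test-Logic-Reset", [0, 1, 1, 0, 0]))
```

Holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state, which is why TRST is optional in many JTAG implementations.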


ARM7TDMI core interface signals

clock control:   mclk, wait, eclk
configuration:   bigend
interrupts:      irq, fiq, isync
initialization:  reset
bus control:     enin, enout, enouti, abe, ale, ape, dbe, tbe, busen, highz, busdis, ecapclk

bigend: memory access mode (big-endian or little-endian)
isync: interrupt latency can be reduced if the interrupts are already synchronized externally
reset: start execution at address 00000000 (hexadecimal)
enout: ARM performing write cycle
ape: control latch to retime addresses if needed by external logic


ARM7TDMI core interface signals

debug:                  dbgrq, breakpt, dbgack, exec, extern1, extern0, dbgen, rangeout0, rangeout1, dbgrqi, commrx, commtx
coprocessor interface:  opc, cpi, cpa, cpb
power:                  Vdd, Vss

power: +5 or +3 volt power supply


ARM7TDMI hard core

ARM7TDMI standard core characteristics
  Process:      350 nm CMOS
  Transistors:  74,209
  MIPS:         60
  Core area:    2.1 mm²
  Power:        87 mW @ 3.3 V
  MIPS/W:       690
  Clock:        0-66 MHz

ARM7TDMI implementations
  250 nm CMOS process at 0.9 V: 12,000 MIPS/W

ARM7TDMI-S
  Synthesizable core


Improving performance

External memory

FPGA/ASIC

ARM7 core


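Improving performance

[Figure: two configurations. In the first, an FPGA/ASIC containing the ARM7 core connects to external memory. In the second, the FPGA/ASIC contains both a memory cache and the ARM7 core, and connects to external memory]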


Time to execute a program

$$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$$

$N_{inst}$ = number of instructions
$CPI$ = average cycles per instruction
$f_{clk}$ = clock speed of processor
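A short numeric sketch of the execution-time relation, T_prog = N_inst × CPI / f_clk, may help show how each term trades off. The function name execution_time and the instruction counts, CPI values and clock rates used below are hypothetical inputs chosen only for illustration; they are not measurements of any ARM core.

```python
def execution_time(n_inst, cpi, f_clk_hz):
    """Program execution time in seconds: N_inst * CPI / f_clk."""
    if f_clk_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return n_inst * cpi / f_clk_hz


if __name__ == "__main__":
    n_inst = 1_000_000  # hypothetical instruction count

    # Hypothetical baseline: CPI of 2.0 at 50 MHz.
    baseline = execution_time(n_inst, cpi=2.0, f_clk_hz=50e6)

    # Two ways to speed up: raise the clock, or lower the CPI.
    faster_clock = execution_time(n_inst, cpi=2.0, f_clk_hz=100e6)
    lower_cpi = execution_time(n_inst, cpi=1.5, f_clk_hz=50e6)

    print(f"baseline:     {baseline * 1e3:.1f} ms")
    print(f"faster clock: {faster_clock * 1e3:.1f} ms")
    print(f"lower CPI:    {lower_cpi * 1e3:.1f} ms")
```

For a fixed program, the only levers are the clock rate and the CPI, which is the framing the later cores in this lecture follow.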


ARM8 core organization

[Figure: ARM8 core organization. Blocks: memory (double-bandwidth), prefetch unit, integer unit, coprocessor(s). Labeled paths: addresses, read data, write data, PC, instructions, CPinst, CPdata]

To get around memory speed bottleneck, fetch more data/instruction information per access. Assume two sequential memory accesses in 1.5 cycles from on-chip cache memory.


ARM8 vs ARM7TDMI pipeline comparison

ARM7TDMI (3 stages)
  Fetch:    instruction fetch
  Decode:   Thumb decompress, ARM decode, register read
  Execute:  shift/ALU, register write

ARM8 (5 stages)
  Fetch:    instruction fetch (prefetch unit)
  Decode:   decode, register read (integer unit)
  Execute:  shift/ALU (integer unit)
  Memory:   data memory access (integer unit)
  Write:    register write (integer unit)


ARM8 integer unit organization

[Figure: ARM8 integer unit organization. Stages: decode, execute, memory, write. Blocks: inst decode, register read, multiplier, ALU/shifter, +4, mux, write pipeline, rot/sgnx, register write. Labeled paths: PC+8, instructions, coprocessor instructions, coproc data, write data, read data, address, forwarding paths]


ARM8 core

ARM8 standard core characteristics
  Process:      500 nm CMOS
  Transistors:  124,554
  MIPS:         120-180
  Core area:    ~5-6 mm²
  Clock:        0-72 MHz

[Figure: ARM810, containing the ARM8 core and an on-chip cache]


ARM8 core

ARM8 standard core characteristics
  Process:      500 nm CMOS
  Transistors:  124,554
  MIPS:         120-180
  Core area:    ~5-6 mm²
  Clock:        0-72 MHz

vs. ARM7TDMI hard core

ARM7TDMI standard core characteristics
  Process:      350 nm CMOS
  Transistors:  74,209
  MIPS:         60
  Core area:    2.1 mm²
  Power:        87 mW @ 3.3 V
  MIPS/W:       690
  Clock:        0-66 MHz


ARM9TDMI pipeline

[Figure: ARM9TDMI pipeline organization. Stages: FETCH, DECODE, EXECUTE, BUFFER/DATA, WRITE-BACK. Blocks: I-cache, I decode, register read, shift, mul, ALU, D-cache, rot/sgn ex, byte repl., register write, +4, mux. Labeled paths: next pc, pc + 4, pc + 8, r15, immediate fields, reg shift, forwarding paths, load/store address, pre-index, post-index, LDM/STM, B, BL, MOV pc, SUBS pc, LDR pc]


ARM9TDMI vs ARM7TDMI pipeline comparison

ARM7TDMI (3 stages)
  Fetch:    instruction fetch
  Decode:   Thumb decompress, ARM decode, register read
  Execute:  shift/ALU, register write

ARM9TDMI (5 stages)
  Fetch:    instruction fetch
  Decode:   decode, register read
  Execute:  shift/ALU
  Memory:   data memory access
  Write:    register write


ARM9TDMI characteristics

ARM9TDMI core characteristics
  Process:      250 nm CMOS
  Transistors:  110,000
  MIPS:         220
  Core area:    2.1 mm²
  Power:        150 mW @ 2.5 V
  MIPS/W:       1500
  Clock:        0-200 MHz


ARM9TDMI vs. ARM7TDMI

Parameter     ARM9             ARM7
Process       250 nm           350 nm
Transistors   110,000          74,209
MIPS          220              60
Core area     2.1 mm²          2.1 mm²
Power         150 mW @ 2.5 V   87 mW @ 3.3 V
MIPS/W        1500             690
Clock         0-200 MHz        0-66 MHz
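The MIPS/W row can be cross-checked from the MIPS and power rows, since power efficiency is simply MIPS divided by power in watts. The sketch below does that arithmetic with the figures quoted in the comparison; the dictionary layout and the name mips_per_watt are choices made for this example.

```python
# MIPS and power figures as quoted in the comparison above.
CORES = {
    "ARM7TDMI": {"mips": 60, "power_mw": 87},
    "ARM9TDMI": {"mips": 220, "power_mw": 150},
}


def mips_per_watt(mips, power_mw):
    """Power efficiency: MIPS divided by power converted from mW to W."""
    return mips / (power_mw / 1000.0)


if __name__ == "__main__":
    for name, core in CORES.items():
        eff = mips_per_watt(core["mips"], core["power_mw"])
        print(f"{name}: {eff:.0f} MIPS/W")
```

The ARM7TDMI figure comes out at about 690 MIPS/W, matching the quoted value, and the ARM9TDMI figure at about 1467 MIPS/W, which suggests the quoted 1500 is rounded.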


ARM9 Application – Qualcomm MSM6100 chip set


ARM10TDMI core

ARM7 -> ARM9 -> ARM10

ARM7 to ARM9: 3-stage to 5-stage pipeline
ARM9 to ARM10: increased clock speed, clocks/instruction reduced


ARM10TDMI pipeline

[Figure: ARM10TDMI six-stage pipeline]
  Fetch:    instruction fetch, branch prediction
  Issue:    decode
  Decode:   decode, register read
  Execute:  shift/ALU, address calculation, multiply
  Memory:   data memory access, multiplier partials add
  Write:    data write, register write


ARM10TDMI pipeline

[Figure: ARM10TDMI six-stage pipeline]
  Fetch:    instruction fetch, branch prediction
  Issue:    decode
  Decode:   decode, register read
  Execute:  shift/ALU, address calculation, multiply
  Memory:   data memory access, multiplier partials add
  Write:    data write, register write

Lengthened memory cycle time

Multiplier critical path shortened

Additional “Issue” stage added to decode


ARM10TDMI reduction in cycles/instruction

[Figure: ARM10TDMI six-stage pipeline]
  Fetch:    instruction fetch, branch prediction
  Issue:    decode
  Decode:   decode, register read
  Execute:  shift/ALU, address calculation, multiply
  Memory:   data memory access, multiplier partials add
  Write:    data write, register write

Double memory fetch allows improved branch prediction: backward branches are assumed taken (as in loops), forward branches are assumed not taken.

Non-blocking load/store: if execution does not depend on the load/store access delay, let it proceed.

Double-width memory access allows load/store multiple register operations to occur in parallel.
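The static prediction rule described here (backward taken, forward not taken) can be sketched in a few lines. This is a behavioural illustration only, not a model of the ARM10TDMI prefetch hardware; the function names, the trace format of (branch address, target address, actually taken) tuples, and the addresses themselves are hypothetical.

```python
def predict_taken(branch_pc, target_pc):
    """Static rule: predict taken when the target is at or before the branch."""
    return target_pc <= branch_pc


def accuracy(trace):
    """Fraction of branches in the trace that the static rule predicts correctly."""
    correct = sum(
        predict_taken(branch_pc, target_pc) == taken
        for branch_pc, target_pc, taken in trace
    )
    return correct / len(trace)


if __name__ == "__main__":
    # Hypothetical loop: a backward branch at 0x120 to 0x100 that is taken
    # on nine iterations and falls through on the tenth.
    loop_trace = [(0x120, 0x100, True)] * 9 + [(0x120, 0x100, False)]
    print("loop accuracy:", accuracy(loop_trace))

    # Hypothetical forward branch that skips an error path and is rarely taken.
    skip_trace = [(0x200, 0x240, False)] * 4 + [(0x200, 0x240, True)]
    print("forward-branch accuracy:", accuracy(skip_trace))
```

The rule mispredicts a loop's closing branch only on the final iteration, which is why it works well for loop-heavy code without needing any prediction state.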
