EENG449b/Savvides Lec 5.1 1/27/04 January 27, 2004 Prof. Andreas Savvides Spring 2004 http://www.eng.yale.edu/courses/ eeng449bG EENG 449bG/CPSC 439bG Computer Systems Lecture 5 FP Pipelining & Dynamically Scheduled Pipelines and Overview of ARM Architecture Part I



EENG449b/Savvides Lec 5.1

1/27/04

January 27, 2004

Prof. Andreas Savvides

Spring 2004

http://www.eng.yale.edu/courses/eeng449bG

EENG 449bG/CPSC 439bG Computer Systems

Lecture 5

FP Pipelining & Dynamically Scheduled Pipelines and Overview of ARM Architecture Part I

Floating-Point Support in Pipelines

• Floating-point operations take more than 1 or 2 cycles to complete, which introduces:

– Structural hazards

– Data hazards

• Multiple functional units are required:

– Loads, stores, and integer ALU operations

– FP and integer multiplier

– FP adder that handles FP add, subtract, and conversion

– FP and integer divider

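• Initiation interval – the number of cycles that must elapse between issuing two operations of a given type

• Latency – the number of intervening cycles between an instruction that produces a result and an instruction that uses it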

Multiple FUs and Latencies

Functional Unit                      Latency   Initiation Interval
Integer ALU                          0         1
Data memory (integer and FP loads)   1         1
FP add                               3         1
FP multiply                          6         1
FP divide                            24        25
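Taken together, the two columns give a simple issue rule: an operation can enter a unit only after that unit's initiation interval has elapsed since the previous operation of the same type. The Python sketch below applies only that rule. It assumes in-order issue, at most one issue per cycle, and no data hazards or write-port conflicts. The short unit names such as "fdiv" are hypothetical labels, not names from the lecture.

```python
# (latency, initiation interval) per functional unit, taken from the table above.
# The short unit names are hypothetical labels used only in this sketch.
UNITS = {
    "int": (0, 1),     # integer ALU
    "load": (1, 1),    # data memory (integer and FP loads)
    "fadd": (3, 1),    # FP add
    "fmul": (6, 1),    # FP multiply
    "fdiv": (24, 25),  # FP divide (not pipelined)
}


def issue_cycles(ops):
    """Return the issue cycle of each operation, in program order.

    Assumed model: in-order issue, at most one issue per cycle, and an
    operation waits until its unit's initiation interval has elapsed since
    the previous operation sent to the same unit. Data hazards are ignored.
    """
    next_free = {unit: 0 for unit in UNITS}  # earliest cycle each unit accepts a new op
    cycle = 0
    issued = []
    for op in ops:
        cycle = max(cycle, next_free[op])     # structural stall while the unit is busy
        issued.append(cycle)
        next_free[op] = cycle + UNITS[op][1]  # unit is occupied for one initiation interval
        cycle += 1                            # the next op issues no earlier than the next cycle
    return issued


if __name__ == "__main__":
    print(issue_cycles(["fdiv", "fadd", "fdiv", "fmul"]))
```

For the sequence in the `__main__` block this prints [0, 1, 25, 26]. The second divide waits 23 cycles for the unpipelined divider. Because issue is in order, the independent multiply waits behind it.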

Support for Multiple Outstanding Operations

Additional pipeline registers are needed

Hazards in Longer Pipelines

1. The divide unit is not fully pipelined, so structural hazards can occur.

2. Instructions have varying running times, so the number of register writes required in a cycle can be larger than 1.

3. WAW hazards are possible, since instructions no longer reach WB in order.

4. Instructions can complete in a different order than they were issued, which causes problems with exceptions.

5. Because operations have longer latencies, stalls for RAW hazards are more frequent.
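Point 5 can be made concrete with the latency column of the Multiple FUs and Latencies table. Latency there is the number of intervening cycles needed between an instruction that produces a result and one that uses it. The Python sketch below assumes full forwarding, one issue per cycle, and no other stalls. Under those assumptions, every independent instruction placed between producer and consumer hides one cycle. The unit labels are hypothetical.

```python
# Latency (intervening cycles between producer and consumer) per unit,
# from the latency table; the unit labels are hypothetical.
LATENCY = {"int": 0, "load": 1, "fadd": 3, "fmul": 6, "fdiv": 24}


def raw_stalls(producer, distance):
    """Stall cycles for a consumer placed `distance` instructions after its producer.

    distance == 1 means the consumer immediately follows the producer.
    Assumes full forwarding, one issue per cycle, and no other stalls.
    """
    if distance < 1:
        raise ValueError("the consumer must come after the producer")
    intervening = distance - 1  # independent instructions in between
    return max(0, LATENCY[producer] - intervening)


if __name__ == "__main__":
    for unit in LATENCY:
        print(unit, raw_stalls(unit, 1))
```

Suppose the consumer directly follows its producer. An integer result then causes no stall, a load 1 stall, an FP add 3, an FP multiply 6, and an FP divide 24. This is why RAW stalls become more frequent and more costly in the FP pipeline.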

FP Pipeline Hazards Example

Figure A.34

Simultaneous writeback: more than one instruction may reach WB in the same cycle. Two ways to resolve the conflict:

• Stall the instruction in the ID stage

• Stall the instruction when it tries to enter WB
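The first option can be sketched by reserving the register write port at issue time. Before an instruction leaves ID, compute the cycle in which it will reach WB, and stall while that cycle is already taken. The Python sketch below assumes:

- a single write port and in-order issue;
- a timing model in which an instruction issued in cycle t executes for e cycles, is in MEM in cycle t + e, and is in WB in cycle t + e + 1;
- EX lengths equal to latency + 1, with hypothetical unit labels.

Data hazards and the divider's initiation interval are ignored.

```python
# EX-stage length per unit, assumed to be latency + 1 (hypothetical unit labels).
EX_CYCLES = {"int": 1, "fadd": 4, "fmul": 7, "fdiv": 25}


def schedule_with_wb_port(ops):
    """Stall each op in ID until its write-back cycle is free.

    Returns a list of (issue_cycle, wb_cycle) pairs. Assumed timing: an op
    issued in cycle t executes in cycles t .. t+e-1, is in MEM in cycle t+e
    and in WB in cycle t+e+1. There is a single register write port.
    """
    reserved = set()  # cycles in which the write port is already promised
    cycle = 0
    schedule = []
    for op in ops:
        e = EX_CYCLES[op]
        while cycle + e + 1 in reserved:
            cycle += 1  # stall one cycle in ID and check again
        wb = cycle + e + 1
        reserved.add(wb)  # claim the write port for that cycle
        schedule.append((cycle, wb))
        cycle += 1  # next op issues no earlier than the next cycle
    return schedule


if __name__ == "__main__":
    program = ["fmul", "int", "int", "fadd"]
    for op, (issue, wb) in zip(program, schedule_with_wb_port(program)):
        print(f"{op}: issue {issue}, WB {wb}")
```

In this run, the FP add would reach WB in cycle 8, the same cycle as the multiply issued in cycle 0. It is therefore held in ID for one cycle, issuing in cycle 4 with WB in cycle 9.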

Checks for Detecting Hazards

Three checks must be performed before a multicycle instruction can issue from the ID stage:

• Check for structural hazards

– Wait until the required functional unit is not busy, and make sure a register write port is available when it will be needed

• Check for RAW data hazards

– Wait until the source registers are not listed as pending destinations

• Check for WAW data hazards

– Determine whether an instruction that has already issued has the same destination register as this instruction. If so, stall the issue of this instruction in ID.
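The three checks can be expressed as a single predicate evaluated in ID. The Python sketch below is a simplified model. It assumes the pipeline tracks three sets:

- the functional units that are busy;
- the cycles in which the write port is already reserved;
- the destination registers of issued but unfinished instructions.

The class and field names are hypothetical. The RAW check is deliberately conservative, since it ignores forwarding that could deliver a result in time.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    unit: str    # functional unit it needs, e.g. "fadd" (hypothetical labels)
    dest: str    # destination register, e.g. "F10"
    srcs: tuple  # source registers, e.g. ("F0", "F8")


@dataclass
class IssueState:
    busy_units: set = field(default_factory=set)     # units that cannot accept a new op
    reserved_wb: set = field(default_factory=set)    # cycles whose write port is taken
    pending_dests: set = field(default_factory=set)  # destinations of issued, unfinished ops


def can_issue(instr, wb_cycle, state):
    """Apply the three ID-stage checks in order; return (ok, reason)."""
    # 1. Structural: the unit must be free and the write port available at WB.
    if instr.unit in state.busy_units or wb_cycle in state.reserved_wb:
        return False, "structural"
    # 2. RAW: no source may be the destination of an unfinished instruction.
    if any(src in state.pending_dests for src in instr.srcs):
        return False, "RAW"
    # 3. WAW: no unfinished instruction may write the same destination.
    if instr.dest in state.pending_dests:
        return False, "WAW"
    return True, "ok"


if __name__ == "__main__":
    state = IssueState(busy_units={"fdiv"}, reserved_wb={9}, pending_dests={"F0"})
    print(can_issue(Instr("fadd", "F10", ("F0", "F8")), 7, state))  # reads pending F0
    print(can_issue(Instr("fadd", "F0", ("F2", "F4")), 7, state))   # writes pending F0
    print(can_issue(Instr("fmul", "F6", ("F2", "F4")), 9, state))   # WB cycle already taken
    print(can_issue(Instr("fmul", "F6", ("F2", "F4")), 8, state))   # no conflict
```

The four calls report RAW, WAW, structural, and ok, respectively.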

MIPS R4000 Pipeline

• Decompose the 5-stage pipeline into a deeper 8-stage pipeline (superpipelining)

– Goal: higher clock rates and thus better performance

• The extra stages come from decomposing the memory accesses

• Longer pipelines increase the amount of forwarding required and lengthen branch delays
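A deeper pipeline performs better only if the clock-rate gain outweighs the extra stalls from longer forwarding paths and delays. This follows because average time per instruction is CPI × clock period. The Python sketch below compares two designs. All clock rates and CPI values are made-up numbers for illustration, not R4000 measurements.

```python
def time_per_instruction_ns(clock_mhz, cpi):
    """Average time per instruction in ns: CPI x clock period (1000 / MHz ns)."""
    return cpi * 1000.0 / clock_mhz


if __name__ == "__main__":
    # Hypothetical numbers: a 5-stage design vs. a deeper 8-stage design.
    shallow = time_per_instruction_ns(clock_mhz=100, cpi=1.2)
    deep = time_per_instruction_ns(clock_mhz=200, cpi=1.5)
    deep_stall_heavy = time_per_instruction_ns(clock_mhz=200, cpi=2.5)
    print(f"5-stage: {shallow:.2f} ns, 8-stage: {deep:.2f} ns, speedup {shallow / deep:.2f}")
    print(f"8-stage with CPI 2.5: {deep_stall_heavy:.2f} ns")
```

With these numbers, doubling the clock rate while CPI rises from 1.2 to 1.5 gives a speedup of 1.6. A CPI of 2.5, by contrast, would make the deeper pipeline slower: 12.5 ns vs. 12 ns per instruction.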

Branch Delay Cycles

The branch outcome takes 3 cycles to resolve (a 3-cycle branch delay)
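If every branch simply stalled until its outcome is known, the 3-cycle delay would add branch frequency × 3 to the CPI. The Python sketch below evaluates that formula and compares it with a 1-cycle penalty. The 15% branch frequency and the base CPI of 1 are assumed values. The sketch ignores delay slots and branch prediction, both of which reduce the real penalty.

```python
def cpi_with_branch_stalls(base_cpi, branch_freq, penalty_cycles):
    """CPI when every branch stalls the pipeline for `penalty_cycles` cycles."""
    return base_cpi + branch_freq * penalty_cycles


if __name__ == "__main__":
    # Assumed workload: 15% of instructions are branches.
    for penalty in (1, 3):
        cpi = cpi_with_branch_stalls(base_cpi=1.0, branch_freq=0.15, penalty_cycles=penalty)
        print(f"{penalty}-cycle branch penalty: CPI = {cpi:.2f}")
```

Under these assumptions, the CPI is about 1.15 with a 1-cycle penalty and about 1.45 with a 3-cycle penalty.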

Dynamically Scheduled Pipelines

• Simple pipelines result in hazards that require stalling.

• Static scheduling – compilers rearrange instructions to avoid stalls.

• Dynamic scheduling – the processor executes instructions out of order to minimize stalls

• Dynamic scheduling requires splitting the ID stage into two stages:

– Issue – decode instructions, check for structural hazards

– Read operands – wait until there are no data hazards, then read operands

– The processor also needs to know when each instruction begins and ends execution

• Requires a lot more bookkeeping! More on this when we discuss Tomasulo’s algorithm in Chapter 3…
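The benefit of reading operands out of order shows up when you compute when each instruction may begin execution. The Python sketch below uses a DIV.D/ADD.D/SUB.D sequence in which SUB.D does not depend on the slow divide. It assumes that one instruction issues per cycle, each instruction has its own functional unit, and a result is usable as soon as execution finishes. The execution times are hypothetical.

```python
# Hypothetical execution times in cycles.
EXEC_CYCLES = {"DIV.D": 24, "ADD.D": 4, "SUB.D": 4}

PROGRAM = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F10", ("F0", "F8")),   # needs the result of DIV.D
    ("SUB.D", "F12", ("F8", "F14")),  # independent of DIV.D and ADD.D
]


def start_cycles(program, in_order):
    """Cycle in which each instruction begins execution.

    Instruction i issues in cycle i and starts once its operands are ready.
    With in_order=True it also cannot start before the previous instruction
    has started, as in a simple pipeline that stalls in ID.
    """
    ready = {}  # register -> cycle its new value becomes available
    starts = []
    prev_start = -1
    for i, (op, dest, srcs) in enumerate(program):
        start = max([i] + [ready.get(reg, 0) for reg in srcs])
        if in_order:
            start = max(start, prev_start + 1)  # cannot pass an older stalled instruction
        ready[dest] = start + EXEC_CYCLES[op]
        starts.append(start)
        prev_start = start
    return starts


if __name__ == "__main__":
    print("in order:    ", start_cycles(PROGRAM, in_order=True))
    print("out of order:", start_cycles(PROGRAM, in_order=False))
```

With in-order execution, SUB.D cannot start until cycle 25 because it is stuck behind the stalled ADD.D. With dynamic scheduling it starts in cycle 2. The sketch omits WAR and WAW checks, which this particular sequence does not need.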

Scoreboarding

Scoreboarding – a technique that allows out-of-order execution when resources are available and there are no data dependencies – originated in the CDC 6600 in the mid-1960s.

• The scoreboard is fully responsible for instruction issue, execution, and hazard detection

– Requires changes to the number of functional units and the latency of operations

– Needs to keep track of the status of all instructions in execution
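The bookkeeping can be sketched with two tables, modeled on the usual textbook description of the CDC 6600 scoreboard:

- a functional-unit status table (Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj, Rk);
- a register result status table.

The Python sketch below is a simplified model. Execution timing and the instruction status table are left out, and the functional-unit names are hypothetical. Each method only checks the condition for its stage and updates the tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    qj: Optional[str] = None  # unit that will produce fj (None if nothing pending)
    qk: Optional[str] = None  # unit that will produce fk
    rj: bool = False          # fj is ready and not yet read
    rk: bool = False          # fk is ready and not yet read


class Scoreboard:
    def __init__(self, unit_names):
        self.fu = {name: FUStatus() for name in unit_names}
        self.result = {}  # register result status: register -> unit that will write it

    def issue(self, unit, op, dest, src1, src2):
        """Issue if the unit is free and no active instruction writes dest (WAW)."""
        if self.fu[unit].busy or dest in self.result:
            return False
        qj, qk = self.result.get(src1), self.result.get(src2)
        self.fu[unit] = FUStatus(True, op, dest, src1, src2, qj, qk, qj is None, qk is None)
        self.result[dest] = unit
        return True

    def read_operands(self, unit):
        """Read both operands once neither is still being produced (RAW)."""
        status = self.fu[unit]
        if not (status.rj and status.rk):
            return False
        status.rj = status.rk = False  # operands have been read
        return True

    def write_result(self, unit):
        """Write back unless another unit still has to read the old value (WAR)."""
        if not self.fu[unit].busy:
            raise ValueError(f"{unit} has no instruction to complete")
        dest = self.fu[unit].fi
        for name, other in self.fu.items():
            if name != unit and ((other.fj == dest and other.rj) or (other.fk == dest and other.rk)):
                return False
        for other in self.fu.values():  # wake up units waiting for this result
            if other.qj == unit:
                other.qj, other.rj = None, True
            if other.qk == unit:
                other.qk, other.rk = None, True
        del self.result[dest]
        self.fu[unit] = FUStatus()  # free the unit
        return True


if __name__ == "__main__":
    sb = Scoreboard(["divide", "add", "sub"])  # hypothetical separate add and subtract units
    sb.issue("divide", "DIV.D", "F0", "F2", "F4")
    sb.issue("add", "ADD.D", "F10", "F0", "F8")
    sb.issue("sub", "SUB.D", "F8", "F8", "F14")
    sb.read_operands("divide")
    sb.read_operands("sub")
    print(sb.read_operands("add"))    # False: F0 not yet produced (RAW)
    print(sb.write_result("sub"))     # False: ADD.D has not read F8 yet (WAR)
    print(sb.write_result("divide"))  # True: wakes up ADD.D
    print(sb.read_operands("add"))    # True
    print(sb.write_result("sub"))     # True: the WAR hazard is gone
```

The demo gives ADD and SUB separate units so that the WAR case can occur. In a design where both share one FP adder, SUB.D would instead wait at issue for the adder to become free.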

Scoreboarding II

More Hazards

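• WAR and WAW hazards are now possible!

DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F8, F8, F14

WAR! If SUB.D executes first, it can overwrite F8 before ADD.D reads it.

DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F10, F8, F14

WAW! If SUB.D executes first, ADD.D's later write leaves the wrong value in F10.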
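The two cases differ only in which register the later instruction writes. The Python sketch below classifies the dependences between an earlier and a later instruction from their destination and source registers. A WAR or WAW dependence becomes a hazard only if the later instruction writes before the earlier one has read (WAR) or written (WAW). The (dest, sources) tuple encoding is an assumption of the sketch.

```python
def classify_dependences(earlier, later):
    """Return the set of data dependences between two instructions in program order.

    Each instruction is a (dest, srcs) pair, e.g. ("F10", ("F0", "F8")).
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    deps = set()
    if e_dest in l_srcs:
        deps.add("RAW")  # later reads what earlier writes (true dependence)
    if l_dest in e_srcs:
        deps.add("WAR")  # later overwrites a register earlier still reads
    if l_dest == e_dest:
        deps.add("WAW")  # both write the same register
    return deps


if __name__ == "__main__":
    add_d = ("F10", ("F0", "F8"))                               # ADD.D F10, F0, F8
    print(classify_dependences(add_d, ("F8", ("F8", "F14"))))   # SUB.D F8, F8, F14
    print(classify_dependences(add_d, ("F10", ("F8", "F14"))))  # SUB.D F10, F8, F14
    print(classify_dependences(("F0", ("F2", "F4")), add_d))    # DIV.D F0, F2, F4 then ADD.D
```

The three calls find a WAR dependence, a WAW dependence, and a RAW dependence, respectively.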

Refer to Figures A.52 – A.54 for example scoreboard tables

Scoreboarding is limited by:

• The amount of parallelism among instructions

• The number of scoreboard entries

• The number and types of functional units

• The presence of antidependencies and output dependencies (which cause WAR and WAW stalls)

Announcements

• The example on page 44 of the textbook is wrong

– The CPI for FPSQR is not included in the CPI computation…

– Everything after that is affected…

• Midterm I: Thursday, Feb. 19 – Chapters 1 and 2, Appendix A, and microcontroller material from class

• Readings for the next class and project-related material are posted on the class website

ARM Architecture Part I

Where is ARM Today?

Not the case when you have loads and stores!

Microcontroller View

Price/Performance/Peripheral Tradeoffs

• For many consumer electronics products, cost is a key issue

– ARM7TDMI cores have less hardware and cost less

– At today’s prices, you can get an ARM7-based chip for < $5.00

• Power tradeoffs

– Power performance is given in Watts/MIPS, but

– Lifetime is a bandwidth vs. throughput issue

» Bandwidth vs. throughput of battery life
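A Watts/MIPS figure is really an energy-per-instruction figure. 1 W / 1 MIPS = (1 J/s) / (10^6 instructions/s) = 1 µJ per instruction. Battery lifetime, in contrast, depends on the average power actually drawn. The Python sketch below does both conversions. The 0.1 W, 100 MIPS, and 2 Wh values are hypothetical, not figures for any particular ARM7 part.

```python
def energy_per_instruction_nj(power_w, mips):
    """Energy per instruction in nJ: W / MIPS gives uJ per instruction, x 1000 for nJ."""
    return power_w / mips * 1000.0


def battery_life_hours(battery_wh, avg_power_w):
    """Hours a battery of `battery_wh` watt-hours lasts at a constant average power."""
    return battery_wh / avg_power_w


if __name__ == "__main__":
    # Hypothetical core: 0.1 W while running at 100 MIPS, powered by a 2 Wh battery.
    print(f"{energy_per_instruction_nj(0.1, 100):.2f} nJ per instruction")
    print(f"{battery_life_hours(2.0, 0.1):.1f} hours if it never sleeps")
```

Lowering the average power, for example by sleeping between bursts of work, stretches battery life even when the Watts/MIPS rating stays the same.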

Features
• ARM7TDMI
• ROM-less (ML675001), 256KB MCP Flash (ML67Q5002), 512KB MCP Flash (ML67Q5003)
• 8KB unified cache
• 32KB RAM
• Interrupts: 25 + 1 FIQ
• I2C (1-ch x master)
• DMA (2-ch)
• Timers (7 x 16-bit)
• WDT (16-bit)
• PWM (2 x 16-bit)
• UART (2-ch) / SIO (1-ch)
• GPIO (5 x 8-bit)
• ADC (4-ch x 10-bit)
• Up to 66 MHz
• -40 to +85 °C
• Package: 144 LFBGA, 144 QFP

ML675001/67Q5002/67Q5003

Next Time

• Power Metrics

• Dynamic Voltage Scaling

• Microcontroller Programming Cycle