EENG449b/SavvidesLec 5.1
1/27/04
January 27, 2004
Prof. Andreas Savvides
Spring 2004
http://www.eng.yale.edu/courses/eeng449bG
EENG 449bG/CPSC 439bG Computer Systems
Lecture 5
FP Pipelining & Dynamically Scheduled Pipelines
and Overview of ARM Architecture Part I
Floating-Point Support in Pipelines
• Floating-point operations take more than 1 or 2 cycles to complete
– Structural hazards
– Data hazards
• Multiple functional units required
– Loads, stores and integer ALUs
– FP and integer multiplier
– FP adder that handles FP add, subtract and conversion
– FP and integer divider
• Initiation interval – number of cycles that must elapse between issuing two operations of a given type
Multiple FUs and Latencies
Functional Unit                       Latency   Initiation Interval
Integer ALU                           0         1
Data memory (integer and FP loads)    1         1
FP add                                3         1
FP multiply                           6         1
FP divide                             24        25
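A minimal Python sketch of how the two columns translate into stall cycles. It assumes the textbook definition of latency (the number of intervening cycles between an instruction that produces a result and one that uses it) and full forwarding. The dictionary keys and function names are hypothetical, chosen only for this illustration.

```python
# Latency and initiation interval per functional unit, copied from the table above.
# The unit names used as keys are made up for this sketch.
UNITS = {
    "int_alu":  {"latency": 0,  "initiation_interval": 1},
    "data_mem": {"latency": 1,  "initiation_interval": 1},
    "fp_add":   {"latency": 3,  "initiation_interval": 1},
    "fp_mul":   {"latency": 6,  "initiation_interval": 1},
    "fp_div":   {"latency": 24, "initiation_interval": 25},
}


def stalls_for_dependent_use(unit):
    """Stall cycles when the very next instruction consumes this unit's result.

    Assumes full forwarding, so the stall count equals the unit's latency.
    """
    return UNITS[unit]["latency"]


def stalls_for_back_to_back(unit):
    """Stall cycles when the very next instruction needs the same unit."""
    return UNITS[unit]["initiation_interval"] - 1


if __name__ == "__main__":
    for name in UNITS:
        print(name, stalls_for_dependent_use(name), stalls_for_back_to_back(name))
```

Under these assumptions, only the divider (initiation interval 25) forces a stall between two independent operations of the same type; every other unit accepts a new operation each cycle.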
Support for Multiple Outstanding Operations
Additional pipeline registers needed
Hazards in Longer Pipelines
1. The divide unit is not fully pipelined, so structural hazards can occur
2. Instructions have varying running times, so the number of register writes required in a cycle can be larger than 1
3. WAW hazards are possible, since instructions don’t reach WB in order
4. Instructions can complete in a different order than they were issued, causing problems with exceptions
5. Because operations have longer latencies, stalls for RAW hazards will be more frequent
FP Pipeline Hazards Example
Figure A.34
Simultaneous writeback
• Stall an instruction in the ID stage
• Stall the instruction when it tries to enter WB
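The first option (stalling in ID) can be sketched as a reservation table for the single register write port. This is only an illustration under simplifying assumptions: the class name, method name, and the cycle counts in the usage lines are hypothetical, and the sketch ignores the fact that a stall in ID also delays the instructions behind it.

```python
class WritePortReservation:
    """Tracks the future cycles in which the single register write port is taken."""

    def __init__(self):
        # Absolute cycle numbers already claimed for a write-back.
        self.reserved = set()

    def try_issue(self, issue_cycle, cycles_until_wb):
        """Return the cycle at which the instruction can actually leave ID.

        The instruction stalls in ID, one cycle at a time, until the cycle
        in which it would reach WB is not claimed by an earlier instruction.
        """
        cycle = issue_cycle
        while cycle + cycles_until_wb in self.reserved:
            cycle += 1  # stall one cycle in ID and retry
        self.reserved.add(cycle + cycles_until_wb)
        return cycle


if __name__ == "__main__":
    port = WritePortReservation()
    print(port.try_issue(1, 10))  # long operation, claims WB in cycle 11
    print(port.try_issue(8, 3))   # would also hit cycle 11, so it waits in ID
```

Stalling in ID keeps all interlock logic in one stage; the alternative of stalling at the entrance to WB needs stall logic further down the pipeline.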
Checks for Detecting Hazards
Three checks must be performed before a multicycle instruction can issue in the ID stage:
• Check for structural hazards
– Wait until the required functional unit is not busy and a register write port is available when needed
• Check for a RAW data hazard
– Wait until the source registers are not listed as pending destinations
• Check for a WAW data hazard
– Determine whether any instruction that has already issued has the same destination as this instruction. If so, stall the instruction in ID.
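The three checks can be sketched as a single predicate evaluated in ID. This is a simplified illustration, not the actual interlock logic: `PipelineState`, `can_issue`, and the unit and register names are hypothetical, and write-port availability is reduced to a single flag.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    # Functional units that cannot accept a new operation this cycle.
    busy_units: set = field(default_factory=set)
    # Registers that are destinations of instructions still in flight.
    pending_dests: set = field(default_factory=set)
    # Whether the write port will be free when this result is ready.
    write_port_free: bool = True


def can_issue(state, unit, dest, sources):
    """Return (ok, reason) for the instruction currently in ID."""
    # 1. Structural hazard: unit busy or no write port when needed.
    if unit in state.busy_units or not state.write_port_free:
        return False, "structural"
    # 2. RAW hazard: a source is still a pending destination.
    if any(src in state.pending_dests for src in sources):
        return False, "RAW"
    # 3. WAW hazard: an issued instruction has the same destination.
    if dest in state.pending_dests:
        return False, "WAW"
    return True, "issue"


if __name__ == "__main__":
    state = PipelineState(busy_units={"fp_div"}, pending_dests={"F0"})
    print(can_issue(state, "fp_add", "F10", ["F0", "F8"]))
    print(can_issue(state, "fp_add", "F0", ["F2", "F4"]))
    print(can_issue(state, "fp_div", "F6", ["F2", "F4"]))
    print(can_issue(state, "fp_mul", "F6", ["F2", "F4"]))
```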
MIPS R4000 Pipeline
• Decompose the 5-stage pipeline into a deeper 8-stage pipeline (superpipelining)
– achieve higher clock rates => better performance
• Extra stages come from decomposing memory accesses
• Deeper pipelines increase both the amount of forwarding required and the branch delay
Branch Delay Cycles
Branch outcome needs 3 cycles
Dynamically Scheduled Pipelines
• Simple pipelines result in hazards that require stalling.
• Static scheduling – compilers rearrange instructions to avoid stalls.
• Dynamic scheduling – processor executes instructions out-of-order to minimize stalls
• Dynamic scheduling requires splitting the ID stage into two stages:
– Issue – Decode instructions, check for structural hazards
– Read operands – Wait until there are no data hazards, then read operands
– Also need to know when each instruction begins and ends execution
• Requires a lot more bookkeeping! More when we discuss Tomasulo’s algorithm in chapter 3…
Scoreboarding
Scoreboarding – a technique that allows out-of-order execution when resources are available and there are no data dependencies. It originated in the CDC 6600 in the mid-1960s.
• The scoreboard is fully responsible for instruction execution and hazard detection
– Requires changes in the number of functional units and the latency of operations
– Needs to keep track of the status of all instructions in execution
Scoreboarding II
More Hazards
• WAR and WAW hazards are now possible!
DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F8, F8, F14
WAR! If SUB.D executes first, it overwrites F8 before ADD.D has read it

DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F10, F8, F14
WAW! If SUB.D executes first, ADD.D later overwrites F10 with the older result
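The two sequences on this slide can be checked mechanically by comparing register names. The sketch below is illustrative only: the `hazards` function and the (destination, sources) tuple encoding are assumptions made for this example, not part of the lecture material.

```python
def hazards(earlier, later):
    """Name the hazards that arise if `later` executes before `earlier` is done.

    Each instruction is encoded as (dest_register, [source_registers]).
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = set()
    if e_dest in l_srcs:
        found.add("RAW")  # later reads what earlier writes
    if l_dest in e_srcs:
        found.add("WAR")  # later writes what earlier still has to read
    if l_dest == e_dest:
        found.add("WAW")  # both write the same register
    return found


if __name__ == "__main__":
    div = ("F0", ["F2", "F4"])          # DIV.D F0, F2, F4
    add = ("F10", ["F0", "F8"])         # ADD.D F10, F0, F8
    sub_war = ("F8", ["F8", "F14"])     # SUB.D F8, F8, F14
    sub_waw = ("F10", ["F8", "F14"])    # SUB.D F10, F8, F14

    print(hazards(div, add))       # ADD.D waits on DIV.D through F0
    print(hazards(add, sub_war))   # first sequence
    print(hazards(add, sub_waw))   # second sequence
```

ADD.D depends on the long-running DIV.D through F0, which is why SUB.D, with no such dependence, would be free to run ahead of it in an out-of-order pipeline.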
Refer to Figures A.52 – A.54 for example scoreboard tables
Scoreboarding is limited by:
• Amount of parallelism among instructions
• The number of scoreboard entries
• The number and types of functional units
• Presence of antidependencies and output dependencies
Announcements
• Example on page 44 of the textbook is wrong
– CPI for FPSQR not included in the computation of CPI…
– Everything after that is affected…
• Midterm I, Thursday, Feb. 19
– Chapters 1, 2, Appendix A and microcontroller material from class.
• Readings for next class and project related material posted on the class website
ARM Architecture Part I
Where is ARM Today?
Not the case when you have loads and stores!!!!
Microcontroller View
Price/Performance/Peripheral Tradeoffs
• For many consumer electronics, cost is an issue
– ARM7TDMI cores have less HW and cost less
– With today’s prices you can get an ARM7 based chip for < $5.00
• Power Tradeoffs
– Power performance is given in Watts/MIPS but
– Lifetime is a bandwidth vs. throughput issue
» Bandwidth vs. throughput of battery life
ML675001/67Q5002/67Q5003
Features
• ARM7TDMI
• ROM-less (ML675001), 256KB MCP Flash (ML67Q5002), 512KB MCP Flash (ML67Q5003)
• 8KB Unified Cache
• 32KB RAM
• Interrupts 25 + 1 FIQ
• I2C (1-ch x master)
• DMA (2-ch)
• Timers (7 x 16-bit)
• WDT (16-bit)
• PWM (2 x 16-bit)
• UART (2-ch) / SIO (1-ch)
• GPIO (5 x 8-bit)
• ADC (4-ch x 10-bit)
• up to 66MHz
• -40 ~ +85 C
• Package: 144 LFBGA, 144 QFP
Next Time
• Power Metrics
• Dynamic Voltage Scaling
• Microcontroller Programming Cycle