chapter 12 reduced instruction set...

75
Yonsei Yonsei University University Chapter 12 Chapter 12 Reduced Instruction Reduced Instruction Set Computers Set Computers

Upload: others

Post on 14-Jun-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity

Chapter 12Chapter 12

Reduced InstructionReduced InstructionSet ComputersSet Computers

Page 2: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-2

• Instruction Execution Characteristics• The Use of a Large Register File• Compiler-based Register Optimization• Reduced Instruction Set Architecture• RISC Pipelining • MIPS R4000• SPARC• RISC versus CISC Controversy

Contents Contents

Page 3: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-3

Major Advances in ComputersMajor Advances in Computers• The family concept

– IBM System/360 1964 and DEC PDP-8– Separates architecture from implementation

• Microprogrammed control unit– Produced by IBM S/360 1964

• Cache memory– IBM S/360 model 85 1969

• Solid State RAM• Microprocessors• Pipelining

– Introduces parallelism into fetch execute cycle• Multiple processors

Introduction Introduction

Page 4: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-4

The Next Step The Next Step -- RISCRISC

• Reduced Instruction Set Computer

• Key features– Large number of general purpose registers or use

of compiler technology to optimize register use– Limited and simple instruction set– Emphasis on optimising the instruction pipeline

Introduction Introduction

Page 5: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-5

Comparison of ProcessorsComparison of Processors Introduction Introduction

643216-321283286464Cache size

-----246480420control memory

size(kbits)

3240-520

323240-520

81616Number of

general purpose register

1121111224Addressing

modes

444441-112-572-6Instruction size (bytes)

2259469235303208Number of instructions

19961996199319911987198919781973Year developed

MIPSR10000

UltraSPARC

PowerPCMIPSR4000

SPARCIntel

40486VAX

11/780IBM

370/168characteristic

SuperscalarReduced

instruction set

(RISC)computer

Complex Instruction set

CISC(Computer)

Page 6: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-6

Driving Driving FForceorce for CISCfor CISC

• Software costs far exceed hardware costs• Increasingly complex high level languages• Semantic gap• Leads to:

– Large instruction sets– More addressing modes– Hardware implementations of HLL statements

• e.g. CASE (switch) on VAX

Instruction executionInstruction executioncharacteristics characteristics

Page 7: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-7

Intention of CISCIntention of CISC

• Ease compiler writing• Improves execution efficiency

– Complex operations in microcode

• Supports more complex HLLs

Instruction executionInstruction executioncharacteristics characteristics

Page 8: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-8

Execution CharacteristicsExecution Characteristics

• Operations performed• Operands used• Execution sequencing• Studies have been done based on programs

written in HLLs• Dynamic studies are measured during the

execution of the program

Instruction executionInstruction executioncharacteristics characteristics

Page 9: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-9

OperationsOperations

• Assignments– Movement of data

• Conditional statements (IF, LOOP)– Sequence control

• Procedure call-return is very time consuming• Some HLL instruction lead to many machine

code operations

Instruction executionInstruction executioncharacteristics characteristics

Page 10: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-10

Relative Dynamic FrequencyRelative Dynamic Frequency Instruction executionInstruction executioncharacteristics characteristics

121316Other

----3-GoTo

13721114329If

454433311215Call

2633324235Loop

151413133845Assign

CPascalCPascalCPascal

Memory ReferenceWeighted

Machine-InstructionWeighted

Dynamic Occurrence

Page 11: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-11

OperandsOperands

• Mainly local scalar variables• Optimization should concentrate on

accessing local variables

Instruction executionInstruction executioncharacteristics characteristics

252426Array

structure

555358Scalar

variable

202316Integer

constant

AverageCPascal

Page 12: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-12

Procedure CallsProcedure Calls• Very time consuming• Depends on number of parameters passed• Depends on level of nesting• Most programs do not do a lot of calls

followed by lots of returns• Most variables are local• (c.f. locality of reference)

Instruction executionInstruction executioncharacteristics characteristics

Page 13: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-13

Procedure CallsProcedure Calls• Procedure Argument and Local Scalar

Variables– The number of words required per procedure

activation is not large

Instruction executionInstruction executioncharacteristics characteristics

Page 14: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-14

ImplicationsImplications• Best support is given by optimizing most

used and most time consuming features

• Large number of registers– Operand referencing

• Careful design of pipelines– Branch prediction etc.

• Simplified (reduced) instruction set

Instruction executionInstruction executioncharacteristics characteristics

Page 15: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-15

Large Register FileLarge Register File• Needed : a strategy that allows the most

frequently accessed operands to be kept in registers and to minimize registers-memory operations– Software solution : Rely on the compiler to

maximize register usage• Require compiler to allocate registers• Allocate based on most used variables in a

given time• Requires sophisticated program analysis

– Hardware solution• Have more registers• Thus more variables will be in registers

The use of a large The use of a large register fileregister file

Page 16: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-16

Registers for Local VariablesRegisters for Local Variables• Local variables must be saved from the

registers into memory• Store local scalar variables in registers• Reduce memory access• Every procedure (function) call changes

locality• Parameters must be passed• Results must be returned• Variables from calling programs must be

restored

The use of a large The use of a large register fileregister file

Page 17: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-17

Register WindowsRegister Windows• Only few parameters• Limited range of depth of call• Use multiple small sets of registers• Calls switch to a different set of registers• Returns switch back to a previously used

set of registers

The use of a large The use of a large register fileregister file

Page 18: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-18

Register WindowsRegister Windows• Three areas within a register set

– Parameter registers• Hold parameters passed down from the current

procedure• Hold the results to be passed back up

– Local registers• Used for local variables

– Temporary registers• Used to exchange parameters and results with the next

lower level• The temporary registers at one level are physically the

same as the parameter registers at the next level• This overlap permits parameters to be passed without

the actual movement of data

The use of a large The use of a large register fileregister file

Page 19: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-19

Overlapping Register WindowsOverlapping Register Windows The use of a large The use of a large register fileregister file

Page 20: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-20

Circular Buffer Circular Buffer DDiagramiagram The use of a large The use of a large register fileregister file

Page 21: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-21

Operation of Circular BufferOperation of Circular Buffer• When a call is made, a current window

pointer is moved to show the currently active register window

• If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory

• A saved window pointer indicates where the next saved windows should restore to

The use of a large The use of a large register fileregister file

Page 22: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-22

Global VariablesGlobal Variables• Allocated by the compiler to memory

– Inefficient for frequently accessed variables

• Have a set of registers for global variables

The use of a large The use of a large register fileregister file

Page 23: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-23

Registers Registers vsvs CacheCache• The register file acts as a small, fast buffer for a

subset of all variables• Register file acts much like a cache memory

• Large Register File Cache

• All local scalars Recently used local scalars• Individual variables Blocks of memory• Compiler assigned global variables Recently used global variables• Save/restore based on procedure Save/restore based on

nesting caching algorithm • Register addressing Memory addressing

The use of a large The use of a large register fileregister file

Page 24: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-24

Registers Registers vsvs CacheCache• The choice between a large register file and a

cache is not clear-cut• The register approach is superior

– A cache-based system will be noticeably slower• The amount of addressing overhead is larger

The use of a large The use of a large register fileregister file

Page 25: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-25

Referencing a ScalarReferencing a Scalar The use of a large The use of a large register fileregister file

Referencing a Scalar Referencing a Scalar -- Register FileRegister File

Page 26: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-26

Referencing a Scalar Referencing a Scalar -- CacheCache The use of a large The use of a large register fileregister file

Page 27: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-27

CompilerCompiler--Based Register Based Register OptimizationOptimization

• Assume small number of registers (16-32)• Optimizing use is up to compiler• HLL programs have no explicit references

to registers– usually - think about C - register int

• Assign symbolic or virtual register to each candidate variable

• Map (unlimited) symbolic registers to real registers

• Symbolic registers that do not overlap can share real registers

• If you run out of real registers some variables use memory

CompilerCompiler--based based register optimizationregister optimization

Page 28: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-28

CompilerCompiler--Based Register Based Register OptimizationOptimization

• The essential task– To decide which quantities are to be assigned

to registers at any given point in the program

• The technique most commonly used– Graph coloring

CompilerCompiler--based based register optimizationregister optimization

Page 29: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-29

Graph Graph ColoringColoring• Given a graph of nodes and edges• Assign a color to each node• Adjacent nodes have different colors• Use minimum number of colors• Nodes are symbolic registers• Two registers that are live in the same

program fragment are joined by an edge• Try to color the graph with n colors, where

n is the number of real registers• Nodes that can not be colored are placed

in memory

CompilerCompiler--based based register optimizationregister optimization

Page 30: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-30

Graph Graph ColoringColoring ApproachApproach CompilerCompiler--based based register optimizationregister optimization

Page 31: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-31

Register Register OptimizationsOptimizations

• Trade-off between the use of a large set of registers and compiler-based register optimization

• With even simple register opt.– The use of more than 64-registers is better

• With reasonably sophisticated register opt.– The use of more than 32-registers only

improve performance marginally

• With a shared register optimization– The use of a small number of registers

CompilerCompiler--based based register optimizationregister optimization

Page 32: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-32

CISC and RISCCISC and RISC

• Motivation– To simplify compilers– To improve performance

Reduced instruction set Reduced instruction set architecturearchitecture

Page 33: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-33

Why CISC?Why CISC?

• Compiler simplification?– Disputed…– Complex machine instructions harder to exploit– Optimization more difficult

• Smaller programs?– Program takes up less memory but…– Memory is now cheap– May not occupy less bits, just look shorter in

symbolic form• More instructions require longer op-codes• Register references require fewer bits

– The number of bits of memory occupied may not be noticeably smaller

Reduced instruction set Reduced instruction set architecturearchitecture

Page 34: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-34

Code Size Relative to RISCCode Size Relative to RISC Reduced instruction set Reduced instruction set architecturearchitecture

Page 35: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-35

Why CISC?Why CISC?• Faster programs?

– Bias towards use of simpler instructions– More complex control unit– Microprogram control store larger– Thus simple instructions take longer to execute– The speedup in the execution is due not so

much to the power of the complex machine instructions as their residence in high-speed control store

• It is far from clear that CISC is the appropriate solution

Reduced instruction set Reduced instruction set architecturearchitecture

Page 36: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-36

RISC CharacteristicsRISC Characteristics• One instruction per cycle• Register to register operations• Few, simple addressing modes• Few, simple instruction formats• Hardwired design (no microcode)• Fixed instruction format• More compile time/effort

Reduced instruction set Reduced instruction set architecturearchitecture

Page 37: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-37

RISC CharacteristicsRISC Characteristics• One instruction per cycle

– One machine instruction per machine cycle– A machine cycle : the time it takes to fetch two

operands from registers, perform an ALU operation and store the result in a register

• Register to register operations– Most operations should be register-to-register

with only simple LOAD and STORE operations– The design feature simplifies the instruction

set and the control unit

Reduced instruction set Reduced instruction set architecturearchitecture

Page 38: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-38

RISC CharacteristicsRISC Characteristics• Few, simple addressing modes

– Almost all instructions use register addressing– Several additional modes

• Displacement• PC-relative

• Few, simple instruction formats– Only one or a few formats are used– Instruction length is fixed and aligned on word

boundaries

Reduced instruction set Reduced instruction set architecturearchitecture

Page 39: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-39

RR--toto--R R vsvs MM--toto--MM Reduced instruction set Reduced instruction set architecturearchitecture

Page 40: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-40

BBenefitsenefits of RISCof RISC

• Related to performance• Related to VLSI implementation

Reduced instruction set Reduced instruction set architecturearchitecture

Page 41: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-41

Benefits of RISCBenefits of RISC• Related to performance

– More effective optimizing compiler can be developed

– Most instructions generated by a compiler are relatively simple

– The use of the instruction pipelining– RISC programs should be more responsive to

interrupts because interrupts are checked between rather elementary operations

Reduced instruction set Reduced instruction set architecturearchitecture

Page 42: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-42

Benefits of RISCBenefits of RISC

• Related to VLSI implementation– Possible to put an entire processor on a single

chip– On-chip delays are of much shorter duration

than interchip delays– Design-and-implementation time

Reduced instruction set Reduced instruction set architecturearchitecture

Page 43: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-43

Design and Layout EffortDesign and Layout Effort Reduced instruction set Reduced instruction set architecturearchitecture

Page 44: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-44

Characteristics of ProcessorsCharacteristics of Processors Reduced instruction Reduced instruction set architectureset architecture

Page 45: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-45

RISC PipeliningRISC Pipelining• Most instructions are register to register• Two phases of execution

– I : Instruction fetch– E: Execute

• ALU operation with register input and output

• For load and store– I : Instruction fetch– E: Execute

• Calculate memory address– D: Memory

• Register to memory or memory to register operation

• If an instruction needs an operand that is altered by the preceding instruction, a delay is required– This delay can be accomplished by a NOOP

RISC pipeliningRISC pipelining

Page 46: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-46

Sequential ExecutionSequential Execution RISC pipeliningRISC pipelining

Page 47: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-47

TwoTwo--way Pipelined Timingway Pipelined Timing RISC pipeliningRISC pipelining

Page 48: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-48

ThreeThree--way Pipelined Timingway Pipelined Timing RISC pipeliningRISC pipelining

Page 49: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-49

FourFour--way Pipelined Timingway Pipelined Timing RISC pipeliningRISC pipelining

Page 50: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-50

OptimizationOptimization of Pipeliningof Pipelining• Data and branch dependencies reduce the

overall execution rate• Delayed branch

– Does not take effect until after execution of following instruction

– This following instruction is the delay slot

RISC pipeliningRISC pipelining

Page 51: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-51

Normal and Delayed BranchNormal and Delayed Branch RISC pipeliningRISC pipelining

LOAD X, A

JUMP 105

ADD 1, A

ADD A, B

SUB C, B

STORE A, Z

LOAD X, A

ADD 1, A

JUMP 106

NOP

ADD A, B

SUB C, B

STORE A, Z

LOAD X, A

ADD 1, A

JUMP 105

ADD A, B

SUB C, B

STORE A, Z

100

101

102

103

104

105

106

Optimized Delayed Branch

Delayed Branch

Normal BranchAddress

Page 52: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-52

Use of Delayed BranchUse of Delayed Branch RISC pipeliningRISC pipelining

Page 53: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-53

Delayed Delayed BBranchranch• The interchange of instructions will work

successfully for unconditional branches, cells and returns– Cannot be blindly applied for conditional branches– If the condition that is tested for the branch can be

altered by the immediately preceding instruction, the compiler must refrain from doing the interchange and instead insert a NOOP

RISC pipeliningRISC pipelining

Page 54: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-54

Delayed Delayed BBranchranch• Delayed load can be used on LOAD

instructions– On the LOAD instruction, the register that is to be

the target of the load is locked by the processor– The processor continues execution of the

instruction stream until it reaches an instruction requiring that register

• At that point, it idles until the load is complete

• The scheduling of instructions for the pipeline and the dynamic allocation of registers should be considered together to achieve the greatest efficiency

RISC pipeliningRISC pipelining

Page 55: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-55

MIPS R4000MIPS R4000• 64 rather than 32bits for all internal and

external data paths and for addresses, registers and the ALU

• The processor chip– Partitioned into two sections

• One containing the CPU• The other containing a coprocessor for memory

management– Supports 32 64-bit registers– Provides for up to 128 kbytes of high-speed cache,

half each for instructions and data

MIPS R4000MIPS R4000

Page 56: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-56

MIPS RMIPS R--series Instruction Setseries Instruction Set MIPS R4000MIPS R4000

Page 57: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-57

Additional R4000 InstructionAdditional R4000 Instruction MIPS R4000MIPS R4000

Page 58: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-58

MIPS Instruction FormatsMIPS Instruction Formats MIPS R4000MIPS R4000

Page 59: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-59

Other Addressing ModesOther Addressing Modes

• Synthesizing other addressing modes with the MIPS addressing mode

MIPS R4000MIPS R4000

Page 60: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-60

Instruction PipelineInstruction Pipeline• Two classes of processors have evolved

to offer execution of multiple instructions per clock cycle – Superscalar architecture– Superpipelined architecture

MIPS R4000MIPS R4000

Page 61: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-61

Instruction PipelineInstruction Pipeline• Superscalar architecture

– Replicates each of the pipeline stages so that two or more instructions at the same stage of the pipeline can be processed simultaneously

• Superpipelined architecture– Makes use of more fine-grained, pipeline

stages– With more stages, more instructions can be in

the pipeline at the same time, increasing parallelism

MIPS R4000MIPS R4000

Page 62: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-62

Instruction PipelineInstruction Pipeline• Both approaches have limitations

– With superscalar architecture• Dependencies between instructions in

different pipelines can slow down the system• Overhead logic is required to coordinate these

dependencies– With superpipelining

• Overhead associated with transferring instructions from one stage to the next

MIPS R4000MIPS R4000

Page 63: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-63

R3000R3000

• Five pipeline stages– Instruction fetch– Source operand fetch from register file– ALU operation or data operand address

generation– Data memory reference– Write back into register file

MIPS R4000MIPS R4000

Page 64: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-64

R3000 PipelineR3000 Pipeline MIPS R4000MIPS R4000

Page 65: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-65

R3000 Pipeline StagesR3000 Pipeline Stages MIPS R4000MIPS R4000

Page 66: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-66

R4000 R4000 • Eight pipeline stages

– Instruction fetch first half– Instruction fetch second half– Register file– Instruction execute– Data cache first– Data cache second– Tag check– Write back

MIPS R4000MIPS R4000

Page 67: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-67

R3000 and Actual R4000R3000 and Actual R4000 MIPS R4000MIPS R4000

Page 68: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-68

SPARC Register Window LayoutSPARC Register Window Layout SPARCSPARC

Page 69: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-69

Eight Register WindowsEight Register Windows SPARCSPARC

Page 70: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-70

SPARC Register OverlapSPARC Register Overlap• The calling procedure places any parameter to

be passed in its outs registers• The called procedure treats these same

physical registers as its ins registers• The processor maintains a current window

pointer(CWP)– CWP is located in the processor status

register(PSR)• PSR points to the window of the currently

executing procedure

• The window valid mask indicates which window is invalid

SPARCSPARC

Page 71: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-71

Instruction SetInstruction Set• Available ALU operations

– Integer addition (with or without carry)– Integer subtraction (with or without carry)– Bitwise Boolean AND, OR, XOR and their

negations– Shift left logical, right logical or right arithmetic

SPARCSPARC

Page 72: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-72

SPARC Instruction SetSPARC Instruction Set SPARCSPARC

Page 73: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-73

SynthesizingSynthesizing• Synthesizing other addressing modes with

SPARC addressing modes

SPARCSPARC

Page 74: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-74

SPARC Instruction FormatSPARC Instruction Format SPARCSPARC

Page 75: Chapter 12 Reduced Instruction Set Computerssoc.yonsei.ac.kr/class/material/computersystems/2003/chapter12.pdf · Chapter 12 Reduced Instruction Set Computers. 12-2 Yonsei University

YonseiYonsei UniversityUniversity12-75

ControversyControversy• Quantitative

– compare program sizes and execution speeds

• Qualitative– examine issues of high level language support

and use of VLSI real estate

• Problems– No pair of RISC and CISC that are directly

comparable– No definitive set of test programs– Difficult to separate hardware effects from

complier effects– Most comparisons done on “toy” rather than

production machines– Most commercial devices are a mixture

ControversyControversy