Chapter One Introduction to Pipelined Processors

Uploaded by corby on 02-Feb-2016




Principle of Designing Pipeline Processors

(Design Problems of Pipeline Processors)


Register Tagging


Example : IBM Model 91 : Floating Point Execution Unit


Example : IBM Model 91-FPU

• The floating point execution unit consists of:
   – Data registers
   – Transfer paths
   – Floating Point Adder Unit
   – Multiply-Divide Unit
   – Reservation stations
   – Common Data Bus


• There are 3 reservation stations for the adder, named A1, A2 and A3, and 2 for the multiply/divide unit, named M1 and M2.

• Each station holds a sink and a source operand register, each with its own tag field, plus a control field.

• The stations hold the operands until the instruction can be executed.


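• The 3 store data buffers (SDBs) and 4 floating point registers (FLRs) carry tags.

• A busy bit in each FLR indicates that the register is waiting for the result of an instruction still in execution, so subsequent instructions that read it depend on that instruction.

• The Common Data Bus (CDB) transfers operands (results) from the units that produce them to the stations, registers and buffers waiting for them.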


• There are 11 units that can supply information to the CDB: 6 floating point buffers (FLBs), 3 adder reservation stations (ADD1 to ADD3, i.e. A1 to A3) and 2 multiply/divide reservation stations (M1, M2).

• The tags for these units are:

   Unit   Tag     Unit   Tag
   FLB1   0001    ADD1   1010
   FLB2   0010    ADD2   1011
   FLB3   0011    ADD3   1100
   FLB4   0100    M1     1000
   FLB5   0101    M2     1001
   FLB6   0110


• Internal forwarding can be achieved with the tagging scheme on the CDB.

• Example: let F refer to an FLR and FLBi to the ith FLB, with contents (F) and (FLBi).

• Consider the instruction sequence:

   ADD F,FLB1     F ← (F) + (FLB1)
   MPY F,FLB2     F ← (F) × (FLB2)


• During the addition:
   – The busy bit of F is set to 1.
   – The contents of F and FLB1 are sent to adder reservation station A1.
   – The tag of F is set to 1010 (the tag of A1, i.e. ADD1).

   State of F: busy bit = 1, tag = 1010

[Figure: FPU state during the addition. Shown: floating point operand stack (FLOS), floating point buffers (FLB) 1 to 6, store data buffers (SDB) 1 to 3, storage bus, instruction unit, decoder, adder and multiplier reservation stations (fields Tag, Sink, Tag, Source, CTRL) and the Common Data Bus. Adder station entry: 1010 | F | 0001 | FLB1 | CTRL; F: busy bit = 1, tag = 1010.]

• Meanwhile, decoding MPY reveals that F is busy, so:
   – The sink tag of station M1 is set to 1010, the current tag of F (the adder station that will produce F).
   – The tag of F is changed to 1000, the tag of M1.
   – The contents of FLB2 are sent to M1 as the source operand.

   State of F: busy bit = 1, tag = 1000

[Figure: FPU state before the addition completes (same layout as above). Multiplier station entry: 1010 | F | 0010 | FLB2 | CTRL; F: busy bit = 1, tag = 1000.]

[Figure: FPU state after the addition completes (same layout as above). Multiplier station entry: 1000 | F | 0010 | FLB2 | CTRL; F: busy bit = 1, tag = 1000.]

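• When the addition is done, its result is broadcast on the CDB with tag 1010. Station M1, whose sink tag is 1010, recognizes the tag and captures the result; F ignores it because its tag is now 1000.

• The multiplication is performed once both operands are available in M1. Its result is broadcast with tag 1000 and loaded into F, whose busy bit is then cleared.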
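The walk-through above can be sketched as a small Python model of tag-based forwarding over the CDB. It is a simplified illustration, not the Model 91's control logic: the dictionary field names (`busy`, `tag`, `sink_tag`, `sink`, `source_tag`, `source`), the convention that tag 0 marks a field already holding a valid value, and the numeric contents of F, FLB1 and FLB2 are all assumptions made for the example. ADD1 (1010) is the adder station called A1 in the text.

```python
# Simplified model of tag-based forwarding on the Common Data Bus (CDB).
# Field names, the "tag 0 = value present" convention and the numbers are
# illustrative assumptions, not details of the real Model 91.

ADD1, M1 = 0b1010, 0b1000   # station tags from the tag table (ADD1 is A1)
READY = 0                    # tag 0: the field already holds a valid value

F = {"busy": False, "tag": READY, "value": 2.0}  # register F
FLB = {1: 3.0, 2: 4.0}                           # hypothetical (FLB1), (FLB2)
stations = {}                                    # station tag -> entry

def issue(station_tag, reg, source_value):
    """Issue reg <- reg OP source to a reservation station."""
    if reg["busy"]:
        # Sink operand not ready: wait for the tag of the unit producing it.
        entry = {"sink_tag": reg["tag"], "sink": None}
    else:
        entry = {"sink_tag": READY, "sink": reg["value"]}
    # Source operands (FLB contents) are assumed to be loaded already.
    entry.update(source_tag=READY, source=source_value)
    stations[station_tag] = entry
    # The register will now be written by this station.
    reg["busy"], reg["tag"] = True, station_tag

def broadcast(tag, value):
    """Put (tag, value) on the CDB; every field waiting on `tag` captures it."""
    for entry in stations.values():
        if entry["sink_tag"] == tag:
            entry["sink_tag"], entry["sink"] = READY, value
    if F["busy"] and F["tag"] == tag:
        F["busy"], F["tag"], F["value"] = False, READY, value

issue(ADD1, F, FLB[1])   # ADD F,FLB1: F busy, tag 1010
issue(M1, F, FLB[2])     # MPY F,FLB2: M1 waits on 1010, F's tag becomes 1000

add = stations[ADD1]
broadcast(ADD1, add["sink"] + add["source"])   # 2.0 + 3.0 captured by M1 only
print(stations[M1]["sink"], F["busy"])         # 5.0 True

mul = stations[M1]
broadcast(M1, mul["sink"] * mul["source"])     # 5.0 * 4.0 written to F
print(F)                 # {'busy': False, 'tag': 0, 'value': 20.0}
```

The key design point the sketch mirrors is that consumers wait on a tag rather than a register name, so F can be re-tagged by MPY without losing track of the pending sum.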


Hazard Detection and Resolution


• Hazards are caused by resource-usage conflicts among instructions in the pipeline.

• They are triggered by inter-instruction dependencies.

Terminologies:

• Resource objects: the set of working registers, memory locations and special flags.

• Data objects: the contents of resource objects.

• Each instruction can be viewed as a mapping from a set of data objects to a set of data objects.

• Domain D(I): the set of resource objects whose data objects may affect the execution of instruction I (e.g. source registers).

• Range R(I): the set of resource objects whose data objects may be modified by the execution of instruction I (e.g. the destination register).

• An instruction reads from its domain and writes to its range.
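As a minimal sketch, D(I) and R(I) can be computed for simple instructions. The instruction format is a hypothetical assumption, not given in the slides: `OP dst,src` is read as dst ← dst OP src, and `LOAD r,a` reads memory location a and writes register r. `domain_range` is an illustrative helper name.

```python
# Domain D(I) and range R(I) for a hypothetical two-address instruction set.
# Assumptions (not from the slides): "OP dst,src" means dst <- dst OP src,
# and "LOAD r,a" reads memory location a and writes register r.

def domain_range(instr):
    """Return (D, R): the resource objects the instruction reads and writes."""
    op, operands = instr.split(maxsplit=1)
    dst, src = (x.strip() for x in operands.split(","))
    if op.upper() == "LOAD":
        return {src}, {dst}            # reads memory, writes a register
    return {dst, src}, {dst}           # reads both operands, writes dst

print(domain_range("LOAD r1,a"))   # ({'a'}, {'r1'})
print(domain_range("ADD r2,r1"))   # D = {r1, r2}, R = {r2}
```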

• Consider the execution of instructions I and J, where J appears immediately after I.

• There are 3 types of data-dependent hazards:
   1. RAW (Read After Write)
   2. WAW (Write After Write)
   3. WAR (Write After Read)

RAW (Read After Write)

• The necessary condition for this hazard is $R(I) \cap D(J) \neq \emptyset$.

• Example:
   I1 : LOAD r1,a
   I2 : ADD r2,r1

• I2 cannot be executed correctly until r1 has been loaded by I1.

• Thus I2 is RAW dependent on I1.

WAW (Write After Write)

• The necessary condition is $R(I) \cap R(J) \neq \emptyset$.

• Example:
   I1 : MUL r1,r2
   I2 : ADD r1,r4

• Here I1 and I2 both write to the same destination, r1, and hence they are WAW dependent.

WAR (Write After Read)

• The necessary condition is $D(I) \cap R(J) \neq \emptyset$.

• Example:
   I1 : MUL r1,r2
   I2 : ADD r2,r3

• Here I2 has r2 as its destination while I1 uses r2 as a source, and hence they are WAR dependent.
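The three necessary conditions can be checked with plain set intersections. In this sketch the domain and range sets for the examples are written out by hand under an assumed two-address reading (dst ← dst op src); `hazards` is a hypothetical helper, not part of any existing tool.

```python
# Classify data hazards between I and a later instruction J using the
# necessary conditions R(I)&D(J), R(I)&R(J), D(I)&R(J). The D/R sets below
# assume two-address semantics (dst <- dst op src) and are illustrative only.

def hazards(d_i, r_i, d_j, r_j):
    """Return the hazard types whose necessary condition holds for (I, J)."""
    found = []
    if r_i & d_j:
        found.append("RAW")   # J reads something I writes
    if r_i & r_j:
        found.append("WAW")   # both write the same object
    if d_i & r_j:
        found.append("WAR")   # J writes something I reads
    return found

# RAW example: I1: LOAD r1,a   I2: ADD r2,r1
print(hazards({"a"}, {"r1"}, {"r2", "r1"}, {"r2"}))          # ['RAW']
# WAR example: I1: MUL r1,r2   I2: ADD r2,r3
print(hazards({"r1", "r2"}, {"r1"}, {"r2", "r3"}, {"r2"}))   # ['WAR']
# WAW example: I1: MUL r1,r2   I2: ADD r1,r4
print(hazards({"r1", "r2"}, {"r1"}, {"r1", "r4"}, {"r1"}))   # ['RAW', 'WAW', 'WAR']
```

Under that assumption ADD r1,r4 also reads r1 (and MUL r1,r2 reads r1), so the WAW example satisfies the RAW and WAR conditions as well; if neither instruction read its destination, only the WAW condition would hold.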

Hazard Detection and Resolution

• Hazards can be detected in the fetch stage by comparing the domain and range of the incoming instruction with those of the instructions already in the pipeline.

• Once a hazard is detected, there are two methods:
   1. Generate a warning signal to prevent the hazard.
   2. Allow the incoming instruction through the pipe and distribute hazard detection to all pipeline stages.
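A sketch of the first method, under assumptions: at fetch time the incoming instruction's domain and range are compared with those of every instruction still in the pipeline, and any overlap raises a stall (warning) signal. The `Instr` record, the `must_stall` helper and the pipeline contents are hypothetical; a real pipeline would also remove instructions from the in-flight list as they complete.

```python
# Fetch-stage hazard detection: compare the incoming instruction's domain
# and range with those of unfinished instructions and raise a stall signal
# on any overlap. Instr and the pipeline contents are hypothetical.
from collections import namedtuple

Instr = namedtuple("Instr", "name domain range_")

def must_stall(incoming, in_flight):
    """Return (stall, reasons) for `incoming` against earlier, unfinished instructions."""
    reasons = []
    for older in in_flight:
        if older.range_ & incoming.domain:
            reasons.append(f"RAW with {older.name}")
        if older.range_ & incoming.range_:
            reasons.append(f"WAW with {older.name}")
        if older.domain & incoming.range_:
            reasons.append(f"WAR with {older.name}")
    return bool(reasons), reasons

pipeline = [Instr("I1: LOAD r1,a", {"a"}, {"r1"})]
print(must_stall(Instr("I2: ADD r2,r1", {"r2", "r1"}, {"r2"}), pipeline))
# (True, ['RAW with I1: LOAD r1,a'])
print(must_stall(Instr("I3: ADD r5,r6", {"r5", "r6"}, {"r5"}), pipeline))
# (False, [])
```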


Job Sequencing and Collision Prevention

• Consider the reservation table below for a single initiation A made at t = 1:

        1      2      3      4      5      6
   Sa   A      .      .      .      .      A
   Sb   .      A      .      A      .      .
   Sc   .      .      A      .      A      .

• Consider the next initiation made at t = 2.

• The second initiation fits into the reservation table without conflict:

        1      2      3      4      5      6      7      8
   Sa   A1     A2     .      .      .      A1     A2     .
   Sb   .      A1     A2     A1     A2     .      .      .
   Sc   .      .      A1     A2     A1     A2     .      .

• Now consider the case where the first initiation is made at t = 1 and the second at t = 3.

• Here markings A1 and A2 fall into the same stage at the same time unit (Sb at t = 4 and Sc at t = 5). This is called a collision, and it must be avoided.

        1      2      3      4      5      6      7      8
   Sa   A1     .      A2     .      .      A1     .      A2
   Sb   .      A1     .      A1,A2  .      A2     .      .
   Sc   .      .      A1     .      A1,A2  .      A2     .
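The overlay above can be checked mechanically. This sketch assumes the reservation table reconstructed from the charts (Sa used at t = 1 and 6, Sb at 2 and 4, Sc at 3 and 5); `collisions` is an illustrative helper that places each initiation's marks and reports any stage/time cell claimed by more than one job.

```python
# Overlay several initiations on a reservation table and report collisions.
# TABLE maps stage -> clock periods used by one initiation started at t = 1
# (reconstructed from the charts above).
from collections import defaultdict

TABLE = {"Sa": [1, 6], "Sb": [2, 4], "Sc": [3, 5]}

def collisions(table, starts):
    """Return {(stage, time): [jobs]} for cells used by more than one job.

    `starts` maps a job name to its initiation time (first initiation at t = 1).
    """
    usage = defaultdict(list)
    for job, t0 in starts.items():
        for stage, cols in table.items():
            for c in cols:
                usage[(stage, t0 + c - 1)].append(job)
    return {cell: jobs for cell, jobs in usage.items() if len(jobs) > 1}

print(collisions(TABLE, {"A1": 1, "A2": 2}))   # {}
print(collisions(TABLE, {"A1": 1, "A2": 3}))
# {('Sb', 4): ['A1', 'A2'], ('Sc', 5): ['A1', 'A2']}
```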


Terminologies


• Latency: Time difference between two initiations in units of clock period

• Forbidden Latency: Latencies resulting in collision

• Forbidden Latency Set: Set of all forbidden latencies


General Method of finding Latency

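Considering all initiations (a new initiation every clock period: A1 at t = 1, A2 at t = 2, and so on):

• Collisions occur between initiations two clock periods apart (e.g. A1 and A3 in Sb and Sc) and five clock periods apart (A1 and A6 in Sa), i.e. a second initiation at t = 3 or t = 6 collides with one at t = 1. The forbidden latencies are therefore 2 and 5.

        1      2      3      4      5      6      7      8      9      10     11
   Sa   A1     A2     A3     A4     A5     A6,A1  A2     A3     A4     A5     A6
   Sb   .      A1     A2     A1,A3  A2,A4  A3,A5  A4,A6  A5     A6     .      .
   Sc   .      .      A1     A2     A1,A3  A2,A4  A3,A5  A4,A6  A5     A6     .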
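The general method can be sketched by starting a new job in every clock period and reading latencies off the colliding pairs: if jobs Ai and Aj share a cell, latency j − i is forbidden. The table is the same reconstructed one (Sa at 1 and 6, Sb at 2 and 4, Sc at 3 and 5), latency is measured as the difference between initiation times, and `forbidden_by_overlay` is an illustrative helper name.

```python
# General method: start job k at t = k, overlay all jobs on the reservation
# table, and collect the latency (start-time difference) of every colliding pair.
from itertools import combinations

# Reconstructed table: stage -> clock periods used by one initiation.
TABLE = {"Sa": [1, 6], "Sb": [2, 4], "Sc": [3, 5]}

def forbidden_by_overlay(table):
    width = max(c for cols in table.values() for c in cols)
    n_jobs = width  # one job per period samples every latency below `width`
    occupied = {}   # (stage, time) -> job numbers using that cell
    for job in range(1, n_jobs + 1):
        for stage, cols in table.items():
            for c in cols:
                occupied.setdefault((stage, job + c - 1), []).append(job)
    forbidden = set()
    for jobs in occupied.values():
        for a, b in combinations(jobs, 2):
            forbidden.add(b - a)  # jobs were appended in start order, so b > a
    return forbidden

print(sorted(forbidden_by_overlay(TABLE)))  # [2, 5]
```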


Shortcut Method of finding Latency

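• For each stage, every distance between two marks in its row is a forbidden latency: Sa gives 6 − 1 = 5, Sb gives 4 − 2 = 2 and Sc gives 5 − 3 = 2. Hence Forbidden Latency Set = {5} ∪ {2} ∪ {2} = { 2, 5 }.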
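The shortcut amounts to taking, for each row, every difference between two marked columns and forming the union. A minimal sketch on the reconstructed table (Sa at 1 and 6, Sb at 2 and 4, Sc at 3 and 5), with latency measured as a difference of column numbers; `forbidden_latencies` is a hypothetical helper name.

```python
# Shortcut method: per stage, every distance between two marks is forbidden;
# the forbidden latency set is the union over all stages.
from itertools import combinations

TABLE = {"Sa": [1, 6], "Sb": [2, 4], "Sc": [3, 5]}  # reconstructed table

def forbidden_latencies(table):
    per_row = {stage: {b - a for a, b in combinations(sorted(cols), 2)}
               for stage, cols in table.items()}
    return per_row, set().union(*per_row.values())

rows, forbidden = forbidden_latencies(TABLE)
print(rows)               # {'Sa': {5}, 'Sb': {2}, 'Sc': {2}}
print(sorted(forbidden))  # [2, 5]
```

For larger tables this is much cheaper than overlaying initiations, since it only looks at pairs of marks within each row.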

Terminologies

• Latency sequence: a sequence of latencies between successive initiations.

• For a reservation table (RT), there are infinitely many valid initiation and latency sequences.

• Latency cycle: among the infinitely many possible latency sequences, the periodic ones are significant, e.g. { 1, 3, 3, 1, 3, 3, … }. The subsequence that repeats itself is called a latency cycle, e.g. { 1, 3, 3 }.

• Period of a cycle: the sum of the latencies in a latency cycle (1 + 3 + 3 = 7).

• Average latency (AL): the average of the latencies taken over a latency cycle (AL = 7/3 ≈ 2.33).

• To design a pipeline, we need a control strategy that maximizes throughput (the number of results per unit time).

• Maximizing throughput is equivalent to minimizing AL.
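Period and average latency follow directly from the definitions. The sketch below also tests whether a cycle is collision-free by repeating it and checking every pair of initiation times against the forbidden set; it assumes the forbidden set {2, 5} of the reconstructed reservation table, and the helper names are illustrative.

```python
# Period and average latency of a latency cycle, plus a collision check that
# repeats the cycle and tests all pairs of initiation times.

FORBIDDEN = {2, 5}  # forbidden latencies of the reconstructed reservation table

def period_and_al(cycle):
    """Return (period, average latency) of a latency cycle."""
    period = sum(cycle)
    return period, period / len(cycle)

def is_collision_free(cycle, forbidden):
    """Repeat the cycle and check that no two initiations are a forbidden distance apart."""
    period = sum(cycle)
    reps = max(forbidden) // period + 2  # covers every gap up to max(forbidden)
    times, t = [0], 0
    for lat in list(cycle) * reps:
        t += lat
        times.append(t)
    return all(b - a not in forbidden
               for i, a in enumerate(times) for b in times[i + 1:])

period, al = period_and_al([1, 3, 3])
print(period, round(al, 2))                     # 7 2.33
print(is_collision_free([1, 3, 3], FORBIDDEN))  # True
print(is_collision_free([1, 1], FORBIDDEN))     # False: initiations 2 apart
```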

• A latency sequence that is aperiodic in nature is impractical to realize as a pipeline control strategy.

• Thus the design problem is to find a latency cycle with minimal average latency.
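For small tables the design problem can be explored by brute force: enumerate latency cycles up to a bounded length, keep the collision-free ones and pick the smallest average latency. This is a sketch, not a method from the slides; the length bound `max_len = 4` and the forbidden set {2, 5} of the reconstructed table are assumptions, so the result is only the best cycle within that bound, not a proof of optimality.

```python
# Brute-force search for a minimal-average-latency cycle among short cycles.
from itertools import product

FORBIDDEN = {2, 5}  # forbidden latencies of the reconstructed table

def is_collision_free(cycle, forbidden):
    """Repeat the cycle and check that no two initiations are a forbidden distance apart."""
    period = sum(cycle)
    reps = max(forbidden) // period + 2  # covers every gap up to max(forbidden)
    times, t = [0], 0
    for lat in list(cycle) * reps:
        t += lat
        times.append(t)
    return all(b - a not in forbidden
               for i, a in enumerate(times) for b in times[i + 1:])

def best_cycle(forbidden, max_len=4):
    """Return (cycle, average latency) of the best collision-free cycle found."""
    # Any latency above max(forbidden) + 1 can be lowered to that value without
    # creating a collision, so larger latencies never improve the average.
    choices = [l for l in range(1, max(forbidden) + 2) if l not in forbidden]
    best = None
    for n in range(1, max_len + 1):
        for cycle in product(choices, repeat=n):
            if is_collision_free(cycle, forbidden):
                al = sum(cycle) / n
                if best is None or al < best[1]:
                    best = (cycle, al)
    return best

cycle, al = best_cycle(FORBIDDEN)
print(cycle, round(al, 2))   # (1, 3, 3) 2.33
```

Within this bound the search recovers the { 1, 3, 3 } cycle used as the example above; exhaustive enumeration grows exponentially with `max_len`, so it only suits small reservation tables.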