![Page 1: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/1.jpg)
Chapter One Introduction to Pipelined Processors
![Page 2: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/2.jpg)
Principle of Designing Pipeline Processors
(Design Problems of Pipeline Processors)
![Page 3: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/3.jpg)
Register Tagging
![Page 4: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/4.jpg)
Example : IBM Model 91 : Floating Point Execution Unit
![Page 5: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/5.jpg)
Example : IBM Model 91-FPU
• The floating point execution unit consists of:
– Data registers
– Transfer paths
– Floating Point Adder Unit
– Multiply-Divide Unit
– Reservation stations
– Common Data Bus
![Page 6: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/6.jpg)
![Page 7: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/7.jpg)
Example : IBM Model 91-FPU
• There are 3 reservation stations for the adder, named A1, A2 and A3, and 2 for the multiplier, named M1 and M2.
• Each station has source and sink registers with their tag and control fields.
• The stations hold operands for the next execution.
![Page 8: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/8.jpg)
![Page 9: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/9.jpg)
Example : IBM Model 91-FPU
• 3 store data buffers (SDBs) and 4 floating point registers (FLRs) are tagged.
• The busy bit of an FLR indicates that the register is the target of an instruction still in execution, so subsequent instructions that use it depend on that result.
• The Common Data Bus (CDB) is used to transfer operands.
![Page 10: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/10.jpg)
Example : IBM Model 91-FPU
• There are 11 units that supply information to the CDB: 6 FLBs, 3 adders and 2 multiply/divide units.
• The tags for these units are:

| Unit | Tag | Unit | Tag |
|------|------|------|------|
| FLB1 | 0001 | ADD1 | 1010 |
| FLB2 | 0010 | ADD2 | 1011 |
| FLB3 | 0011 | ADD3 | 1100 |
| FLB4 | 0100 | M1 | 1000 |
| FLB5 | 0101 | M2 | 1001 |
| FLB6 | 0110 | | |
![Page 11: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/11.jpg)
Example : IBM Model 91-FPU
• Internal forwarding can be achieved with the tagging scheme on the CDB.
• Example:
• Let F refer to an FLR and FLBi stand for the ith FLB, and let their contents be (F) and (FLBi).
• Consider the instruction sequence:
ADD F,FLB1    F ← (F) + (FLB1)
MPY F,FLB2    F ← (F) × (FLB2)
![Page 12: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/12.jpg)
Example : IBM Model 91-FPU
• During addition:
– Busy bit of F is set to 1
– Contents of F and FLB1 are sent to adder A1
– Tag of F is set to 1010 (tag of adder)
F: Busy Bit = 1, Tag = 1010
![Page 13: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/13.jpg)
[Figure: floating point execution unit during the addition, showing the Floating Point Operand Stack (FLOS), Floating Point Buffers (FLB) 1 to 6, store data buffers (SDB) 1 to 3, the decoder, the adder and multiplier reservation stations (fields: Tag, Sink, Tag, Source, CTRL) and the Common Data Bus. One adder station holds the entry 1010 F, 0001 FLB1. F: Busy Bit = 1, Tag = 1010.]
![Page 14: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/14.jpg)
Example : IBM Model 91-FPU
• Meanwhile, the decode of MPY reveals that F is busy, so:
– The sink tag of M1 is set to 1010 (F's current tag, the tag of the adder)
– F changes its tag to 1000 (tag of the multiplier)
– The content of FLB2 is sent to M1
F: Busy Bit = 1, Tag = 1000
![Page 15: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/15.jpg)
[Figure: floating point execution unit before the addition completes, with the same blocks as the previous figure. One multiplier station holds the entry 1010 F, 0010 FLB2. F: Busy Bit = 1, Tag = 1000.]
![Page 16: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/16.jpg)
[Figure: floating point execution unit after the addition, with the same blocks as the previous figure. One multiplier station holds the entry 1000 F, 0010 FLB2. F: Busy Bit = 1, Tag = 1000.]
![Page 17: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/17.jpg)
Example : IBM Model 91-FPU
• When the addition is done, the result is placed on the CDB with the adder's tag (1010); M1 is waiting for that tag, so the result is sent to M1
• The multiplication is carried out when both operands are available
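
The forwarding steps of this example can be traced with a small Python sketch. It is only an illustration under simplifying assumptions: one register F, one adder station and one multiplier station, and no timing. The tag values come from the tag table; the names `Register`, `Operand`, `broadcast` and `run_example` are hypothetical and not part of the Model 91 design.

```python
from dataclasses import dataclass
from typing import Optional

ADD1_TAG = 0b1010  # tag of the first adder station (from the tag table)
M1_TAG = 0b1000    # tag of multiplier station M1
FLB2_TAG = 0b0010  # tag of floating point buffer 2


@dataclass
class Register:
    value: float
    busy: bool = False
    tag: int = 0  # tag of the unit that will deliver the next value


@dataclass
class Operand:
    tag: int                       # unit expected to deliver the value
    value: Optional[float] = None  # None until the value has arrived


def broadcast(tag, value, registers, operands):
    """Model one CDB transfer: every waiting holder with a matching tag takes the value."""
    for op in operands:
        if op.value is None and op.tag == tag:
            op.value = value
    for reg in registers:
        if reg.busy and reg.tag == tag:
            reg.value, reg.busy = value, False


def run_example(f0, flb1, flb2):
    F = Register(f0)
    # ADD F,FLB1: both operands are available and go to the adder; F waits on the adder.
    add_operands = (F.value, flb1)
    F.busy, F.tag = True, ADD1_TAG
    # MPY F,FLB2: F is busy, so M1's sink operand records F's current tag instead of a value.
    m1_sink = Operand(tag=F.tag)
    m1_source = Operand(tag=FLB2_TAG, value=flb2)
    F.tag = M1_TAG  # F now waits for the multiplier
    # Adder finishes: the sum is forwarded to M1 and does not update F.
    broadcast(ADD1_TAG, add_operands[0] + add_operands[1], [F], [m1_sink, m1_source])
    f_after_add = (F.busy, F.value)
    # Multiplier finishes: its result carries tag 1000, which F is waiting for.
    broadcast(M1_TAG, m1_sink.value * m1_source.value, [F], [m1_sink, m1_source])
    return f_after_add, F
```

Calling `run_example(2.0, 3.0, 4.0)` leaves F busy with its old value after the addition, and F receives (2.0 + 3.0) × 4.0 only when the multiplier broadcasts its result.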
![Page 18: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/18.jpg)
Hazard Detection and Resolution
![Page 19: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/19.jpg)
Hazard Detection and Resolution
• Hazards are caused by resource usage conflicts among various instructions
• They are triggered by inter-instruction dependencies
Terminologies:
• Resource Objects: the set of working registers, memory locations and special flags
![Page 20: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/20.jpg)
Hazard Detection and Resolution
• Data Objects: contents of resource objects
• Each instruction can be considered as a mapping from a set of data objects to a set of data objects.
• Domain D(I): set of resource objects whose data objects may affect the execution of instruction I (e.g. source registers)
![Page 21: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/21.jpg)
Hazard Detection and Resolution
• Range R(I): set of resource objects whose data objects may be modified by the execution of instruction I (e.g. destination register)
• An instruction reads from its domain and writes to its range
![Page 22: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/22.jpg)
Hazard Detection and Resolution
• Consider the execution of instructions I and J, where J appears immediately after I.
• There are 3 types of data dependent hazards:
1. RAW (Read After Write)
2. WAW (Write After Write)
3. WAR (Write After Read)
![Page 23: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/23.jpg)
RAW (Read After Write)
• The necessary condition for this hazard is $R(I) \cap D(J) \neq \emptyset$
![Page 24: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/24.jpg)
RAW (Read After Write)
• Example:
I1 : LOAD r1,a
I2 : ADD r2,r1
• I2 cannot be correctly executed until r1 is loaded
• Thus I2 is RAW dependent on I1
![Page 25: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/25.jpg)
WAW(Write After Write)
• The necessary condition is $R(I) \cap R(J) \neq \emptyset$
![Page 26: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/26.jpg)
WAW(Write After Write)
• Example:
I1 : MUL r1,r2
I2 : ADD r1,r4
• Here I1 and I2 write to the same destination and hence they are said to be WAW dependent.
![Page 27: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/27.jpg)
WAR(Write After Read)
• The necessary condition is $D(I) \cap R(J) \neq \emptyset$
![Page 28: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/28.jpg)
WAR(Write After Read)
• Example:
I1 : MUL r1,r2
I2 : ADD r2,r3
• Here I2 has r2 as its destination while I1 uses it as a source, and hence they are WAR dependent
![Page 29: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/29.jpg)
Hazard Detection and Resolution
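• Hazards can be detected in the fetch stage by comparing the domain and range of the incoming instruction with those of the instructions already in the pipe.
• There are two methods of handling them:
1. Generate a warning signal on detection to prevent the hazard from taking place
2. Allow the incoming instruction through the pipe and distribute detection to all pipeline stages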
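
The domain and range comparison can be sketched in Python as three set intersections. This is a minimal illustration, not a description of any particular hardware: the names `Instr` and `possible_hazards` are hypothetical, and the examples assume a two-address format in which the first register operand is the destination and also one of the sources (LOAD only writes it).

```python
from typing import FrozenSet, List, NamedTuple


class Instr(NamedTuple):
    name: str
    reads: FrozenSet[str]   # domain D(I): resource objects the instruction may read
    writes: FrozenSet[str]  # range R(I): resource objects the instruction may modify


def possible_hazards(i: Instr, j: Instr) -> List[str]:
    """Return the hazard types whose necessary condition holds, with I before J."""
    found = []
    if i.writes & j.reads:   # R(I) and D(J) overlap
        found.append("RAW")
    if i.writes & j.writes:  # R(I) and R(J) overlap
        found.append("WAW")
    if i.reads & j.writes:   # D(I) and R(J) overlap
        found.append("WAR")
    return found


# The RAW example: LOAD r1,a followed by ADD r2,r1
load = Instr("LOAD r1,a", frozenset({"a"}), frozenset({"r1"}))
add = Instr("ADD r2,r1", frozenset({"r1", "r2"}), frozenset({"r2"}))

# The WAR example: MUL r1,r2 followed by ADD r2,r3
mul = Instr("MUL r1,r2", frozenset({"r1", "r2"}), frozenset({"r1"}))
add2 = Instr("ADD r2,r3", frozenset({"r2", "r3"}), frozenset({"r2"}))
```

Because these are only necessary conditions, a pair of instructions can satisfy more than one of them at once, and a non-empty intersection does not by itself prove that a hazard will occur.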
![Page 30: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/30.jpg)
Job Sequencing and Collision Prevention
![Page 31: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/31.jpg)
Job Sequencing and Collision Prevention
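• Consider the reservation table given below, with an initiation made at t = 1

|    | 1 | 2 | 3 | 4 | 5 | 6 |
|----|---|---|---|---|---|---|
| Sa | A |   |   |   |   | A |
| Sb |   | A |   | A |   |   |
| Sc |   |   | A |   | A |   |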
![Page 32: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/32.jpg)
Job Sequencing and Collision Prevention
• Consider next initiation made at t=2
• The second initiation easily fits in the reservation table
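
|    | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8 |
|----|----|----|----|----|----|----|----|---|
| Sa | A1 | A2 |    |    |    | A1 | A2 |   |
| Sb |    | A1 | A2 | A1 | A2 |    |    |   |
| Sc |    |    | A1 | A2 | A1 | A2 |    |   |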
![Page 33: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/33.jpg)
Job Sequencing and Collision Prevention
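• Now consider the case when the first initiation is made at t = 1 and the second at t = 3.
• Here the markings A1 and A2 fall in the same stage at the same time unit. This is called a collision and it must be avoided.

|    | 1  | 2  | 3  | 4     | 5     | 6  | 7  | 8  |
|----|----|----|----|-------|-------|----|----|----|
| Sa | A1 |    | A2 |       |       | A1 |    | A2 |
| Sb |    | A1 |    | A1 A2 |       | A2 |    |    |
| Sc |    |    | A1 |       | A1 A2 |    | A2 |    |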
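
The collision check can be sketched in Python by marking every (stage, time unit) cell that each initiation occupies and reporting cells claimed more than once. The mark positions used here (Sa at time units 1 and 6, Sb at 2 and 4, Sc at 3 and 5) are an assumption inferred from the tables in this section, and the names `RESERVATION_TABLE` and `collisions` are hypothetical.

```python
# Assumed reservation table: stage -> time units (1-based) used by one initiation.
RESERVATION_TABLE = {"Sa": {1, 6}, "Sb": {2, 4}, "Sc": {3, 5}}


def collisions(table, start_times):
    """Return the sorted (stage, time) cells claimed by more than one initiation."""
    usage = {}
    for start in start_times:
        for stage, marks in table.items():
            for mark in marks:
                # An initiation started at `start` uses the stage at start + mark - 1.
                cell = (stage, start + mark - 1)
                usage[cell] = usage.get(cell, 0) + 1
    return sorted(cell for cell, count in usage.items() if count > 1)


print(collisions(RESERVATION_TABLE, [1, 2]))  # initiations at t = 1 and t = 2
print(collisions(RESERVATION_TABLE, [1, 3]))  # initiations at t = 1 and t = 3
```

Under this assumed table the first call reports no shared cells, while the second reports one shared cell in Sb and one in Sc.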
![Page 34: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/34.jpg)
Terminologies
![Page 35: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/35.jpg)
Terminologies
• Latency: Time difference between two initiations in units of clock period
• Forbidden Latency: Latencies resulting in collision
• Forbidden Latency Set: Set of all forbidden latencies
![Page 36: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/36.jpg)
General Method of finding Latency
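Considering all initiations:
• A3 (initiated at t = 3) and A6 (initiated at t = 6) collide with A1 (initiated at t = 1), so the forbidden latencies are 2 and 5

|    | 1  | 2  | 3  | 4     | 5     | 6     | 7     | 8     | 9  | 10 | 11 |
|----|----|----|----|-------|-------|-------|-------|-------|----|----|----|
| Sa | A1 | A2 | A3 | A4    | A5    | A6 A1 | A2    | A3    | A4 | A5 | A6 |
| Sb |    | A1 | A2 | A1 A3 | A2 A4 | A3 A5 | A4 A6 | A5    | A6 |    |    |
| Sc |    |    | A1 | A2    | A1 A3 | A2 A4 | A3 A5 | A4 A6 | A5 | A6 |    |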
![Page 37: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/37.jpg)
Shortcut Method of finding Latency
• Take the distances between marks in the same row of the reservation table (Sa, Sb and Sc in turn): Forbidden Latency Set = {5} U {2} U {2} = { 2, 5 }
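
The shortcut method amounts to collecting, for each row of the reservation table, the distance between every pair of marks in that row. The Python sketch below illustrates this. It assumes a table with Sa marked at time units 1 and 6, Sb at 2 and 4, and Sc at 3 and 5 (an inference from this section's tables), and the name `forbidden_latencies` is hypothetical. A different table would give a different set.

```python
from itertools import combinations


def forbidden_latencies(table):
    """Union, over all stages, of the distances between two marks in the same row."""
    forbidden = set()
    for marks in table.values():
        for earlier, later in combinations(sorted(marks), 2):
            forbidden.add(later - earlier)
    return forbidden


# Assumed reservation table: stage -> time units used by one initiation.
table = {"Sa": {1, 6}, "Sb": {2, 4}, "Sc": {3, 5}}
print(sorted(forbidden_latencies(table)))
```

Two initiations separated by one of these distances would both need the same stage in the same time unit, which is exactly the collision condition.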
![Page 38: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/38.jpg)
Terminologies
• Latency Sequence: sequence of latencies between successive initiations
• For a reservation table (RT), the number of valid initiations and latencies is infinite
![Page 39: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/39.jpg)
Terminologies
• Latency Cycle:
• Among the infinitely many possible latency sequences, the periodic ones are significant, e.g. { 1, 3, 3, 1, 3, 3, … }
• The subsequence that repeats itself is called the latency cycle, e.g. { 1, 3, 3 }
![Page 40: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/40.jpg)
Terminologies
• Period of cycle: the sum of the latencies in a latency cycle (1 + 3 + 3 = 7)
• Average Latency (AL): the average taken over its latency cycle (AL = 7/3 = 2.33)
• To design a pipeline, we need a control strategy that maximizes the throughput (no. of results per unit time)
• Maximizing throughput is equivalent to minimizing AL
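
The two quantities defined above are simple to compute. The short Python sketch below reproduces the arithmetic for the cycle {1, 3, 3}. The function names are hypothetical, and `Fraction` is used only to keep the average exact.

```python
from fractions import Fraction


def period(cycle):
    """Sum of the latencies in one latency cycle."""
    return sum(cycle)


def average_latency(cycle):
    """Period divided by the number of initiations in the cycle."""
    return Fraction(period(cycle), len(cycle))


cycle = (1, 3, 3)
print(period(cycle))                            # 7
print(average_latency(cycle))                   # 7/3
print(round(float(average_latency(cycle)), 2))  # 2.33
```

One initiation is made every AL clock periods on average, so the throughput is 1/AL results per clock period. That is why minimizing AL maximizes throughput.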
![Page 41: Chapter One Introduction to Pipelined Processors](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681595d550346895dc69ba2/html5/thumbnails/41.jpg)
Terminologies
• A control strategy based on an aperiodic latency sequence is impractical to design
• Thus the design problem is to arrive at a latency cycle with minimal average latency.
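
For a small table, this design problem can be explored by brute force: enumerate short latency cycles, discard those that would ever place two initiations a forbidden distance apart, and keep the one with the lowest average latency. The Python sketch below is only an illustration, not an established design procedure. The function names and the search bounds are hypothetical, and the forbidden set {2, 5} assumes a reservation table with Sa marked at time units 1 and 6, Sb at 2 and 4, and Sc at 3 and 5.

```python
from fractions import Fraction
from itertools import product


def is_valid_cycle(cycle, forbidden):
    """True if repeating `cycle` never separates two initiations by a forbidden latency."""
    if not forbidden:
        return True
    horizon = max(forbidden)
    n = len(cycle)
    for start in range(n):
        gap, k = 0, start
        while True:
            gap += cycle[k % n]  # distance from initiation `start` to a later one
            if gap > horizon:
                break            # later initiations are too far away to collide
            if gap in forbidden:
                return False
            k += 1
    return True


def best_cycle(forbidden, max_latency, max_length):
    """Lowest-average-latency valid cycle within the given search bounds."""
    best = None
    for length in range(1, max_length + 1):
        for cycle in product(range(1, max_latency + 1), repeat=length):
            if is_valid_cycle(cycle, forbidden):
                avg = Fraction(sum(cycle), length)
                if best is None or avg < best[0]:
                    best = (avg, cycle)
    return best


print(best_cycle({2, 5}, max_latency=6, max_length=3))
```

Within these bounds the search settles on the cycle (1, 3, 3) with average latency 7/3. Because only cycles up to the chosen length and latency are examined, the result is a minimum within those bounds rather than a proof of optimality.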