Chapter One: Introduction to Pipelined Processors
Principles of Designing Pipeline Processors
(Design Problems of Pipeline Processors)
Instruction Prefetch and Branch Handling

• The instructions in computer programs can be classified into four types:
– Arithmetic/load operations (60%)
– Store-type instructions (15%)
– Branch-type instructions (5%)
– Conditional branch type (Yes: 12%, No: 8%)
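The branch parameters used later in the chapter (p and q) follow directly from this mix; a short Python sanity check:

```python
# Instruction mix from the slide, as fractions of all instructions.
mix = {
    "arithmetic/load": 0.60,
    "store": 0.15,
    "unconditional branch": 0.05,
    "conditional branch taken": 0.12,
    "conditional branch not taken": 0.08,
}
assert abs(sum(mix.values()) - 1.0) < 1e-9  # the mix covers everything

# p: probability an instruction is a conditional branch (12% + 8%).
p = mix["conditional branch taken"] + mix["conditional branch not taken"]
# q: probability a conditional branch is actually taken (12 out of 20).
q = mix["conditional branch taken"] / p

print(round(p, 2), round(q, 2))  # prints: 0.2 0.6
```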
• Arithmetic/load operations (60%):
– These operations require one or two operand fetches.
– Different operations require different numbers of pipeline cycles to execute.
• Store-type instructions (15%):
– These require a memory access to store the data.
• Branch-type instructions (5%):
– These correspond to an unconditional jump.
• Conditional branch type (Yes: 12%, No: 8%):
– The Yes path requires calculation of the new address.
– The No path proceeds to the next sequential instruction.
• Arithmetic/load and store instructions do not alter the execution order of the program.
• Branch instructions and interrupts degrade the performance of pipelined computers.
Interrupts

• When instruction I is being executed, the occurrence of an interrupt postpones instruction I+1 until the ISR has been serviced.
• There are two types of interrupt:
– Precise: caused by illegal operation codes; detectable at the decoding stage.
– Imprecise: caused by faults in the storage, address, and execution functions.
Handling Interrupts

• Precise: since decoding is the first stage, instruction I prevents I+1 from entering the pipeline, and all preceding instructions complete before the ISR runs.
• Imprecise: no new instructions are admitted, and all incomplete instructions, whether they precede or follow I, are executed before the ISR runs.
Handling Example – Interrupt System of the Cray-1
Cray-1 System

• The interrupt system is built around an exchange package.
• When an interrupt occurs, the Cray-1 saves eight scalar registers, eight address registers, the program counter, and the monitor flags.
• These are packed into 16 words and swapped with a block whose address is specified by a hardware exchange address register.
• Since the exchange package does not hold all state information, the software interrupt handler has to save the remaining state.
Instruction Prefetch and Branch Handling

• In general, the higher the percentage of branch-type instructions in a program, the slower the program will run on a pipelined processor.
Effect of Branching on Pipeline Performance

• Consider a linear pipeline of five stages:

Fetch Instruction → Decode → Fetch Operands → Execute → Store Results
Overlapped Execution of Instructions without Branching

(Space-time diagram: instructions I1–I8 enter the five-stage pipeline one clock apart and proceed with no idle cycles.)
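The overlap can be reproduced in a few lines of Python — a sketch that prints which instruction occupies each stage at each clock (the stage abbreviations are ours, matching the five stages above):

```python
# Five-stage linear pipeline: Fetch Instruction, Decode, Fetch Operands,
# Execute, Store Results (abbreviated).
stages = ["FI", "DI", "FO", "EX", "SR"]
m, n = 8, len(stages)  # 8 instructions, 5 stages

# With no branches, instruction i (1-based) occupies stage s (0-based)
# during clock i + s, so stage s at clock t holds instruction t - s.
for clock in range(1, m + n):
    row = []
    for s in range(n):
        i = clock - s
        row.append(f"I{i}" if 1 <= i <= m else "--")
    print(f"t{clock:2d}: " + " ".join(row))
# first line: t 1: I1 -- -- -- --
```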
I5 Is a Branch Instruction

(Space-time diagram: I1–I5 proceed as before, but I6–I8 are held back until I5's branch is resolved, leaving idle cycles in every stage.)
Estimation of the effect of branching on an n-segment instruction pipeline
Estimation of the effect of branching
• Consider an instruction cycle with n pipeline clock periods.
• Let:
– p = probability that an instruction is a conditional branch (20%)
– q = probability that a conditional branch is successful (12/20 = 0.6, i.e. 60% of the 20%)
• Suppose there are m instructions.
• Then the number of successfully branching instructions = m × p × q (i.e. m × 0.2 × 0.6).
• A delay of (n − 1)/n of an instruction cycle is required for each successful branch to flush the pipeline.
• Thus, the total number of instruction cycles required for m instructions is

1 + (m − 1)/n + pqm(n − 1)/n
• As m becomes large, the average number of instructions per instruction cycle is

lim (m → ∞) m / [1 + (m − 1)/n + pqm(n − 1)/n] = n / [1 + pq(n − 1)]
• When p = 0 (no conditional branches), the above measure reduces to n, which is ideal.
• In reality, it is always less than n.
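Plugging the chapter's numbers (n = 5, p = 0.2, q = 0.6) into these formulas — a small Python check:

```python
def total_instruction_cycles(m, n, p, q):
    # Slide formula: 1 + (m - 1)/n + p*q*m*(n - 1)/n
    return 1 + (m - 1) / n + p * q * m * (n - 1) / n

def avg_instructions_per_cycle(n, p, q):
    # Large-m limit: n / (1 + p*q*(n - 1))
    return n / (1 + p * q * (n - 1))

n, p, q = 5, 0.20, 0.60                      # values from the earlier slides
print(avg_instructions_per_cycle(n, 0, q))   # 5.0  (p = 0: the ideal case)
print(avg_instructions_per_cycle(n, p, q))   # ~3.38 instructions per cycle

# The finite-m formula converges to the same limit as m grows:
m = 10**6
print(m / total_instruction_cycles(m, n, p, q))
```

So with this instruction mix, branching costs roughly a third of the pipeline's ideal throughput.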
Solution = ?
Multiple Prefetch Buffers

• Buffers can be used to match the instruction fetch rate to the pipeline consumption rate:
1. Sequential buffers: hold instructions fetched in sequence, for in-sequence pipelining.
2. Target buffers: hold instructions fetched from a branch target, for out-of-sequence pipelining.
• A conditional branch causes both buffers to fill; once the condition is resolved, one buffer is selected and the other is discarded.
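The select-and-discard step can be pictured with a minimal sketch — the buffer contents and the `resolve_branch` helper are invented for illustration:

```python
from collections import deque

# Hypothetical prefetch state at a conditional branch: the sequential
# path past the branch, and the path starting at the branch target.
sequential_buffer = deque(["I6", "I7", "I8"])  # in-sequence prefetch
target_buffer = deque(["T1", "T2", "T3"])      # branch-target prefetch

def resolve_branch(taken):
    """Keep the buffer that matches the branch outcome; discard the other."""
    if taken:
        sequential_buffer.clear()  # wrong-path prefetch is thrown away
        return target_buffer
    target_buffer.clear()
    return sequential_buffer

chosen = resolve_branch(taken=True)
print(list(chosen))  # ['T1', 'T2', 'T3']
```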
Data Buffering and Busing Structures
Speeding up of pipeline segments
• The processing speeds of pipeline segments are usually unequal.
• Consider the example below: three segments S1, S2, S3 with delays T1, T2, T3.
• If T1 = T3 = T and T2 = 3T, S2 becomes the bottleneck and we need to remove it.
• How? One method is to subdivide the bottleneck; two subdivisions are possible:
• First method: subdivide S2 into two subsegments with delays T and 2T, giving the stage sequence S1(T), S2a(T), S2b(2T), S3(T).
• Second method: subdivide S2 into three subsegments, each with delay T, giving the stage sequence S1(T), S2a(T), S2b(T), S2c(T), S3(T).
• If the bottleneck is not subdivisible, we can duplicate S2 in parallel: three copies of S2 (each with delay 3T) are placed between S1(T) and S3(T), and incoming operands are distributed across the copies.
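The three remedies can be compared numerically — a sketch assuming the steady-state throughput of a linear pipeline is one result per clock, with the clock set by the slowest stage:

```python
T = 1.0  # base stage delay (take T = 1 time unit)

def throughput(stage_times):
    # One result per clock in steady state; the clock must
    # accommodate the slowest stage.
    return 1 / max(stage_times)

original = throughput([T, 3*T, T])       # S2 (3T) is the bottleneck: 1/(3T)
method_1 = throughput([T, T, 2*T, T])    # S2 split into T + 2T:      1/(2T)
method_2 = throughput([T, T, T, T, T])   # S2 split into T + T + T:   1/T

# Three parallel copies of S2, each 3T, together accept one new input
# per T, so the replicated stage has an effective delay of 3T/3 = T.
parallel = throughput([T, 3*T / 3, T])   # also 1/T

print(original, method_1, method_2, parallel)
```

Both the three-way subdivision and the three-way replication restore the full 1/T rate; replication trades extra hardware and control complexity for not having to split the stage.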
• Control and synchronization are more complex with parallel segments.
Data Buffering

• Instruction and data buffering provides a continuous flow of work to the pipeline units.
• Example: the 4X TI ASC.
Example: 4X TI ASC

• This system uses a memory buffer unit (MBU), which
– supplies the arithmetic unit with a continuous stream of operands, and
– stores results back into memory.
• The MBU has three double buffers X, Y, and Z (one octet per buffer): X and Y for input, Z for output.
• This supports pipeline processing at a high rate and alleviates the bandwidth mismatch between the memory and the arithmetic pipeline.
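The double-buffering idea behind the X, Y, and Z buffers can be sketched generically — the class below is an illustration of the technique, not the ASC's actual hardware interface:

```python
class DoubleBuffer:
    """Illustrative sketch: one half fills from memory while the other
    half drains into the arithmetic pipeline; the roles swap each round."""

    def __init__(self):
        self.filling, self.draining = [], []

    def fill(self, operands):
        self.filling.extend(operands)      # memory side writes here

    def swap(self):
        # Exchange roles: the freshly filled half becomes drainable.
        self.filling, self.draining = self.draining, self.filling

    def drain(self):
        data, self.draining = self.draining, []
        return data                        # pipeline side reads here

x_buffer = DoubleBuffer()
x_buffer.fill([1.0, 2.0, 3.0])  # memory delivers a batch of operands
x_buffer.swap()
print(x_buffer.drain())          # [1.0, 2.0, 3.0]
```

Because filling and draining use disjoint halves, the memory and the arithmetic unit never wait on each other as long as both keep pace.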
Busing Structures

• Problem: ideally, the subfunctions in a pipeline should be independent; otherwise, the pipeline must be halted until the dependency is removed.
• Solution: an efficient internal busing structure.
• Example: the TI ASC.
Example: TI ASC

• In the TI ASC, once an instruction dependency is recognized, an update capability is provided by transferring the contents of the Z buffer to the X or Y buffer.