Pipeline Computing, by S. M. Risalat Hasan Chowdhury
Abstract:
The computer is one of the defining tools of the modern world. It is no longer used only for arithmetic operations; it is used everywhere, from space to earth. For this reason engineers try to reduce the size of the computer while rapidly increasing its power for computing, networking, gaming, and more. People now use computers for entertainment as well, so it is a challenge for engineers to deliver a device that meets users' expectations. Pipeline computing is a technique for building machines that can provide such satisfactory service to users. Pipelining began in the late 1970s in supercomputers, in the form of vector processors and array processors. This paper describes pipeline computing and how it works.
1. Introduction:
Pipelining is a technique for decomposing a sequential process into sub-operations. Each sub-operation is executed in a dedicated segment that operates concurrently with all other segments. Registers isolate the segments from one another, so each segment can operate on distinct data simultaneously. For a simple arithmetic task such as A x B + C, the technique works as follows:
[Fig. 1.1: Example of pipelined arithmetic data processing: A and B feed a multiplier, C and the product feed an adder]
The processor holds the value of A in register R1, B in R2, and C in R4. It first multiplies the values of A and B and stores the product in register R3. It then takes the value of C from R4 and adds it to the contents of R3 with the help of the adder. Finally, the result is held in register R5. Each individual task still passes through the segments in order, but because the segments are separated by registers, different segments can work on different data at the same time. A reservation table stores the schedule of the pipeline processes; it mainly displays the time-space flow of data through the pipeline for a function [1]. Pipelining is thus a technique for improving the processing performance of a microprocessor: the architecture allows the concurrent execution of several instructions. To achieve pipelining, a task is subdivided into a sequence of subtasks, each of which can be executed by a specialized hardware stage that operates concurrently with the other stages in the pipeline.
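The register transfers described above can be sketched in code. The following Python fragment is our own illustration, not part of any referenced design: it models the multiplier and adder as two segments separated by latches (R3, R4), so the two segments work on different operand sets at the same time.

```python
# Illustrative sketch: a two-segment arithmetic pipeline for A*B + C.
# Each loop iteration is one "clock tick" that moves data one segment forward.

def pipeline_abc(operands):
    """operands: iterable of (A, B, C) tuples; yields A*B + C in order."""
    r3 = None              # latch between segment 1 (multiply) and segment 2 (add)
    r4 = None              # carries C alongside the product
    for a, b, c in operands:
        if r3 is not None:
            yield r3 + r4          # segment 2: R5 <- R3 + R4 (previous operands)
        r3, r4 = a * b, c          # segment 1: R3 <- A*B, R4 <- C (new operands)
    if r3 is not None:
        yield r3 + r4              # drain the last result from the pipeline

print(list(pipeline_abc([(2, 3, 4), (1, 5, 6), (7, 2, 1)])))  # [10, 11, 15]
```

Note that while the adder is producing one result, the multiplier is already working on the next operand set, which is the essence of the technique.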
2. Types of Pipelining:
There are two types of pipeline computing: linear and nonlinear (dynamic) pipelining. Each has its own characteristics and way of performing a task, and each uses a reservation table (for linear) or a set of reservation tables (for dynamic). A reservation table for a linear pipeline can be generated easily because the data flow follows a linear stream. A dynamic or non-linear pipeline follows a non-linear pattern, so multiple reservation tables can be generated for different functions. The reservation table generally displays the time-space flow of data through the pipeline for a function; different functions follow different paths through the table.
[Figure: Types of pipeline computing (linear and non-linear)]
A linear pipelining processor is a series of processing stages and memory accesses. It uses a single reservation table to schedule processes, regardless of their results; the reservation table describes the data or process flow. Nonlinear pipelining, in contrast, uses several reservation tables, and selects whichever table a given process needs. The tables are always available to support the nonlinear technique [2].
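A reservation table can be represented directly as a stage-by-time grid. The sketch below is our own illustration of the linear case described above, where every function follows the same diagonal path through the stages.

```python
# Illustrative sketch: a reservation table as a stages x cycles grid of booleans.
# In a linear pipeline, data entering at time 0 occupies stage s at cycle s.

def linear_reservation_table(stages, cycles):
    table = [[False] * cycles for _ in range(stages)]
    for s in range(stages):
        if s < cycles:
            table[s][s] = True     # linear flow: stage s is busy at time s
    return table

for row in linear_reservation_table(3, 5):
    print("".join("X" if used else "." for used in row))
# X....
# .X...
# ..X..
```

A dynamic pipeline would instead keep one such table per function, with marks that need not lie on a single diagonal.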
3. Computer-related pipelines are:
- Instruction pipelines, such as the classic RISC pipeline, which are used in central processing units (CPUs) to allow overlapping execution of multiple instructions with the same circuitry. The circuitry is usually divided into stages, including instruction decoding, arithmetic, and register fetching stages, where each stage processes one instruction at a time.
- Graphics pipelines, found in most graphics processing units (GPUs), which consist of multiple arithmetic units, or complete CPUs, that implement the various stages of common rendering operations (perspective projection, window clipping, color and light calculation, rendering, etc.).
- Software pipelines, where commands can be written so that the output of one operation is automatically fed to the next operation. The UNIX system call pipe is a classic example of this concept, although other operating systems support pipes as well.
- HTTP pipelining, where multiple requests are sent without waiting for the result of the first request [3].
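The software-pipeline idea can be shown with a short example. The sketch below is our own, using Python generators to stand in for processes connected by UNIX pipes; the stage names are illustrative.

```python
# Illustrative sketch: a software pipeline in the UNIX spirit, where each stage
# consumes the previous stage's output lazily, one item at a time.

def produce(lines):
    for line in lines:
        yield line

def grep(pattern, stream):
    for line in stream:
        if pattern in line:
            yield line

def to_upper(stream):
    for line in stream:
        yield line.upper()

# Roughly the shell pipeline:  printf '...' | grep pipe | tr a-z A-Z
text = ["pipeline computing", "assembly line", "wave pipelines"]
print(list(to_upper(grep("pipe", produce(text)))))
# ['PIPELINE COMPUTING', 'WAVE PIPELINES']
```

As with a hardware pipeline, each stage begins work as soon as the previous stage produces its first item, rather than waiting for the whole input.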
4.1. Instruction Pipelining:
- The first stage fetches the instruction and buffers it.
- When the second stage is free, the first stage passes it the buffered instruction.
- While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction.
- This is called instruction pre-fetch or fetch overlap.
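The steps above can be sketched as an ideal two-stage trace. This is our own illustration (instruction names are made up), assuming fetch and execute each take one cycle and there are no stalls.

```python
# Illustrative sketch: two-stage fetch/execute overlap. While the execute stage
# works on instruction i, the fetch stage pre-fetches instruction i+1.

def run(program):
    """Return (cycle, fetching, executing) tuples for an ideal two-stage pipe."""
    n = len(program)
    trace = []
    for cycle in range(n + 1):                    # n fetch cycles + 1 drain cycle
        fetching = program[cycle] if cycle < n else None
        executing = program[cycle - 1] if cycle > 0 else None
        trace.append((cycle, fetching, executing))
    return trace

for cycle, f, e in run(["I1", "I2", "I3"]):
    print(f"cycle {cycle}: fetch={f}, execute={e}")
# From cycle 1 onward, fetch and execute proceed in parallel.
```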
4.2. Inefficiency in two-stage instruction pipelining:
There are two reasons:
- The execution time will generally be longer than the fetch time, so the fetch stage may have to wait for some time before it can empty its buffer.
- When a conditional branch occurs, the address of the next instruction to be fetched becomes unknown, and the execution stage has to wait while the next instruction is fetched.
Fig-1: Two stage instruction pipelining
4.3. Use the Idea of Pipelining in a Computer:
The speed of execution of programs is influenced by many factors. One way to obtain better performance is to use faster circuit technology to build the processor and the main memory. Another option is to arrange the hardware so that more than one operation can be performed at the same time. In this way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
We have encountered concurrent activities several times before. Chapter 1 introduced the concept of multiprogramming and explained how it is possible for I/O transfers and computational activities to proceed simultaneously. DMA devices make this possible because they can perform I/O transfers independently once those transfers are initiated by the processor.
Pipelining is a particularly effective way of organizing concurrent activity in a computer system. The basic idea is very simple, and it is frequently encountered in manufacturing plants, where pipelining is commonly known as an assembly-line operation. Readers are undoubtedly familiar with the assembly line used in car manufacturing: the first station may prepare the chassis of a car, the next station adds the body, the next installs the engine, and so on. While one group of workers is installing the engine on one car, another group is fitting a car body on the chassis of another car, and yet another group is preparing a new chassis for a third car. It may take days to complete work on a given car, but it is possible to have a new car rolling off the end of the assembly line every few minutes.
Fig-2: Building a car without pipelining    Fig-3: Building a car using pipelining
Consider how the idea of pipelining can be used in a computer. The processor executes a program by fetching and executing instructions, one after the other. Let Fi and Ei refer to the fetch and execute steps for instruction Ii. Execution of a program consists of a sequence of fetch and execute steps, as shown in Figure 3a. Now consider a computer that has two separate hardware units, one for fetching instructions and another for executing them, as shown in Figure 3b. The instruction fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer is needed to enable the execution unit to execute the instruction while the fetch unit is fetching the next instruction. The results of execution are deposited in the destination location specified by the instruction. For the purposes of this discussion, we assume that both the source and the destination of the data operated on by the instructions are inside the block labeled "Execution unit" [4].
Fig 3: Basic idea of instruction pipelining [7]
4.4. Decomposition of instruction processing
- To gain further speedup, the pipeline can have more stages (here, 6 stages):
- Fetch instruction (FI)
- Decode instruction (DI)
- Calculate operands, i.e., effective addresses (CO)
- Fetch operands (FO)
- Execute instruction (EI)
- Write operand (WO)
Fig: Six-stage CPU instruction pipeline [7]
Fig 4: A 4 stage pipeline [7]
Fetch Instruction (FI): Read the next expected instruction into a buffer.
Decode Instruction (DI): Determine the opcode and the operand specifiers.
Calculate Operands (CO): Calculate the effective address of each source operand.
Fetch Operands (FO): Fetch each operand from memory. Operands in registers need not be fetched.
Execute Instruction (EI): Perform the indicated operation and store the result.
Write Operand (WO): Store the result in memory.
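The ideal timing diagram of Section 4.5 for this six-stage pipeline can be generated mechanically. The sketch below is our own illustration, assuming equal stage durations and no conflicts, so instruction i (counting from 0) occupies stage s at cycle i + s.

```python
# Illustrative sketch: ideal timing diagram for a six-stage instruction pipeline.

STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def timing_diagram(n_instructions):
    total = n_instructions + len(STAGES) - 1   # cycles to finish all instructions
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total
        for s, name in enumerate(STAGES):
            row[i + s] = name                  # instruction i is in stage s at cycle i+s
        rows.append(f"I{i+1}: " + " ".join(row))
    return rows

print("\n".join(timing_diagram(3)))
# I1: FI DI CO FO EI WO -- --
# I2: -- FI DI CO FO EI WO --
# I3: -- -- FI DI CO FO EI WO
```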
4.5. Timing diagram for instruction pipeline operation:
Fig 5: Timing diagram for instruction pipeline operation [7]
4.6. High efficiency of instruction pipelining:
Assume all of the following, as in the diagram:
- All stages are of equal duration.
- Each instruction goes through all six stages of the pipeline.
- All the stages can be performed in parallel.
- There are no memory conflicts.
- All the accesses occur simultaneously.
Under these assumptions, the instruction pipeline works very efficiently and gives high performance.
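Under these assumptions the gain can be quantified. This short calculation is our own illustration: with k equal stages, n instructions finish in k + (n - 1) cycles instead of n * k.

```python
# Illustrative sketch: ideal cycle counts with and without pipelining.

def cycles(n, k, pipelined=True):
    """Cycles to complete n instructions on a k-stage machine (ideal case)."""
    return k + (n - 1) if pipelined else n * k

n, k = 100, 6
speedup = cycles(n, k, pipelined=False) / cycles(n, k)
print(cycles(n, k), round(speedup, 2))   # 105 cycles, speedup ~5.71
```

For large n the speedup approaches k, the number of stages, which is why the six-stage pipeline above is so effective in the ideal case.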
4.7. Limits to performance enhancement:
The factors affecting performance are:
1. If the six stages are not of equal duration, there will be some waiting time at various stages.
2. Conditional branch instructions, which can invalidate several instruction fetches.
3. Interrupts, which are unpredictable events.
4. Register and memory conflicts.
5. The CO stage may depend on the contents of a register that could be altered by a previous instruction that is still in the pipeline.
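The cost of factor 2 can be estimated with a simple model. The numbers below are made-up, illustrative assumptions of our own, not measurements from any cited design: if a fraction b of instructions are branches and each taken branch flushes p partially processed instructions, the effective cycles per instruction grow from the ideal 1 to 1 + b * taken * p.

```python
# Illustrative sketch: rough effect of conditional branches on an ideal pipeline.

def effective_cpi(branch_fraction, taken_fraction, flush_penalty):
    """Average cycles per instruction once branch flushes are accounted for."""
    return 1 + branch_fraction * taken_fraction * flush_penalty

# Assumed numbers: 20% branches, 60% taken, 3 instructions flushed per taken branch.
print(effective_cpi(0.2, 0.6, 3))   # ~1.36, i.e. roughly a 36% slowdown vs. ideal
```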
4.8. Effect of conditional branch on instruction pipeline operation:
Fig 6: Effect of conditional branch on instruction pipeline operation [7]
5. Implementations:
Buffered, synchronous pipelines:
Conventional microprocessors are synchronous circuits that use buffered, synchronous pipelines. In these
pipelines, "pipeline registers" are inserted in-between pipeline stages, and are clocked synchronously. The
time between each clock signal is set to be greater than the longest delay between pipeline stages, so that
when the registers are clocked, the data that is written to them is the final result of the previous stage.
Buffered, asynchronous pipelines:
Asynchronous pipelines are used in asynchronous circuits, and have their pipeline registers clocked asynchronously. Generally speaking, they use a request/acknowledge system, wherein each stage can detect when it is "finished". When stage Si is ready to transmit, it sends a ready signal to stage Si+1. After stage Si+1 receives the incoming data, it returns an acknowledgement signal to Si. The AMULET microprocessor is an example of a microprocessor that uses buffered, asynchronous pipelines.
Unbuffered pipelines:
Unbuffered pipelines, called "wave pipelines", do not have registers in-between pipeline stages. Instead, the
delays in the pipeline are "balanced" so that, for each stage, the difference between the first stabilized
output data and the last is minimized. Thus, data flows in "waves" through the pipeline, and each wave is
kept as short (synchronous) as possible. The maximum rate that data can be fed into a wave pipeline is
determined by the maximum difference in delay between the first piece of data coming out of the pipe and
the last piece of data, for any given wave. If data is fed in faster than this, it is possible for waves of data to
interfere with each other [3].
6. Applications of Pipeline Computing:
6.1. RISC: A Reduced Instruction Set Computer (RISC) uses instructions with simple constructs so that they can execute faster within the CPU, often without referencing memory. RISC uses a smaller instruction set but works faster than other designs, because the simpler instructions shorten the execution cycle. RISC is a CPU design based on a simplified instruction set that provides higher performance when combined with a microprocessor architecture capable of executing those instructions using fewer cycles per instruction. Its major characteristics are:
- Relatively few instructions,
- Relatively few addressing modes,
- Memory access limited to load and store instructions,
- Operations done within the registers of the CPU,
- Single-cycle instruction execution.
RISC uses pipelining for its execution and other tasks. The RiSC-16, for example, is an 8-register, 16-bit computer in which all addresses are short-word addresses. Clock speeds are increased as the amount of logic between successive latches is decreased: if the full task is sliced into smaller sub-tasks, the clock can run as fast as the largest sub-task allows. A pipeline of n stages could theoretically run with a clock n times faster than any sequential implementation. This theoretical limit is never reached, because of latch overhead and sub-tasks of unequal length, but the clock rate can still become extremely fast. This process of slicing up instruction execution is called pipelining, and it appears in every aspect of modern computer design, from the processor core to the DRAM sub-system to the overlapping of transactions on memory and I/O buses [9].
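The reason the theoretical n-times limit is never reached can be made concrete. The delay figures below are made-up, illustrative values of our own: the clock period must cover the longest sub-task plus the latch (pipeline register) overhead.

```python
# Illustrative sketch: pipelined speedup limited by the slowest stage and
# by latch overhead.

def pipelined_speedup(stage_delays, latch_overhead):
    sequential_time = sum(stage_delays)                 # one long combinational path
    clock_period = max(stage_delays) + latch_overhead   # set by the slowest stage
    return sequential_time / clock_period

# Four equal 2 ns stages with 0.2 ns latches: close to the ideal 4x, but not 4x.
print(round(pipelined_speedup([2.0, 2.0, 2.0, 2.0], 0.2), 2))   # 3.64
# Sub-tasks of unequal length waste even more of the ideal speedup.
print(round(pipelined_speedup([1.0, 3.0, 2.0, 2.0], 0.2), 2))   # 2.5
```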
6.2. Graphical Pipeline Viewer:
Figure: Graphical Pipeline Viewer
The figure shows a Graphical Pipeline Viewer (GPV). An architectural simulator produces a pipetrace stream, which contains a detailed description of the instruction flow through the machine and documents the movement of instructions through the pipeline from start to end. The pipetrace stream also records various other events and stage transitions that occur during an instruction's lifetime. The pipetrace stream coming from the architectural simulator can be sent directly into the GPV or buffered in a file for further analysis or processing. The GPV digests this data and produces a graphical representation of it: the generated graph plots instructions in program order, showing over each instruction's lifetime which operation it was performing.
Data that must be turned into a visual form is a natural fit for pipeline computing. On a TV screen or computer monitor, people see images one after another; a new image can be shown only after the previous one, so each frame waits for the one before it to complete. The architectural simulator controls the flow of information to be turned into visual form, and after the pipelining process any kind of display device can present the data. As an example, consider a CRT monitor and its internal elements.
[Figure: GPV data flow: architectural simulator -> pipetrace stream (optionally buffered in a text file) -> GPV (Perl/Tk) -> screen]
[Fig.: Shadow masking in CRT: red, green, and blue electron guns, shadow mask, and phosphor screen]
A CRT monitor has three types of electron guns, plus control electrodes, focusing electrodes, horizontal deflection plates, vertical deflection plates, a shadow mask, and a phosphor screen. The electron guns shoot electrons; the control electrodes, focusing electrodes, deflection plates, and shadow mask steer the electrons to the exact point on the screen, and the shadow mask also filters the electrons. When electrons hit the phosphor screen it glows, showing dots of different colors, and together the dots make a full image. RGB color coding is a common form of color coding; using this technique, display and other devices can apply color exactly where, and in the amount, the machine needs [10].
6.3. Pipelined and Unpipelined Execution in a CPU:
Pipelining is a concept used in many fields, including CPU architecture, to speed up a job. The principle behind it is basically the following. Consider a job J completed by a sequential system in a time T. In this case the latency is T and the throughput is Tseq = 1/T. Obviously, given the nature of the system, N jobs of the same kind as J are completed in a time N*T. It is likely, though, that such jobs can be divided into smaller stages, each executed by a sub-system. Consider the following figure, with J divided into 4 parts.
In this scenario the system does not need to be strictly sequential. Indeed, when job A is in the first stage, only the sub-system in charge of executing stage J1 is busy, so it is possible to start other jobs in the same clock cycle. In fact, as in the example, four jobs are executing in parallel by the fourth clock. Ideally, if K is the number of stages, the throughput of a pipelined system is Tpip = K/T, hence K times larger than Tseq [11].
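The latency/throughput arithmetic above can be checked directly. This is our own sketch of the ideal case: splitting a job of duration T into K stages of T/K each leaves the per-job latency at T but raises the throughput to K/T once the pipe is full.

```python
# Illustrative sketch of the throughput formulas Tseq = 1/T and Tpip = K/T.

def throughput(T, K=1):
    """Jobs completed per unit time once a K-stage pipeline is full."""
    return K / T

T, K = 8.0, 4
print(throughput(T))        # sequential: 0.125 jobs per time unit
print(throughput(T, K))     # pipelined:  0.5 jobs per time unit, K times higher
```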
7. Advantages and Limitations of Pipelining:
Pipelining does not help in all cases. There are several possible limitations. An instruction pipeline is said to be fully
pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles
that delay the progress of the pipeline.
7.1. Advantages of Pipelining:
- The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.
- Some combinational circuits, such as adders or multipliers, can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry compared with a more complex combinational circuit.
7.2. Limitations of Pipelining:
- The first is its complexity, and the second is the inability to run the pipeline at full speed at all times.
- A non-pipelined processor executes only a single instruction at a time. This avoids branch delays (in effect, every branch is delayed) and the problems of serial instructions being executed concurrently; consequently, its design is simpler and cheaper to manufacture.
- The instruction latency of a non-pipelined processor is slightly lower than that of a pipelined equivalent, because extra flip-flops must be added to the data path of a pipelined processor.
- A non-pipelined processor has a stable instruction bandwidth, whereas the performance of a pipelined processor is much harder to predict and may vary more widely between different programs [12].
Contribution:
S. M. Risalat Hasan Chowdhury
Introduction, Types of Pipelining, Reservation Table, Application (RISC-
16, Graphical Pipeline Viewer, CRT monitor basic).
Sohan Khan
Instruction Pipelining, Inefficiency in two stage instruction pipelining, Use
the Idea of Pipelining in a Computer, Decomposition of instruction
processing, Timing diagram for instruction pipeline operation, High
efficiency of instruction pipelining, Limits to performance enhancement,
Effect of conditional branch on instruction pipeline operation.
Umma Habiba
Computer-related pipelines, Implementations, Pipeline and Unpipeline in
CPU, Advantages and Limitations of Pipelining.
References:
1. https://app.box.com/shared/i3uj6z2y78
2. Performance Evaluation of Nonlinear Pipeline through UML, by Dr. Vipin Saxena and Manish Shrivastava.
3. Graphics pipeline. (n.d.). Computer Desktop Encyclopedia. Retrieved December 13, 2005.
4. www.mhhe.com/engcs/electrical/hamacher/5e/.../ch08_453-510.pdf
5. A. Bright, J. Fritts, and M. Gschwind. Decoupled fetch-execute engine with static branch prediction support. Technical report, IBM Research Report RC23261, IBM Research Division, 1999.
6. https://www.cs.auckland.ac.nz/~jmor159/363/html/pipelines.html
7. http://www.slideshare.net/siddiqueibrahim37/pipelining-41608675
8. The SPARC Architecture Manual, Version 9, D. Weaver and T. Germond, ed., PTR Prentice Hall, Englewood Cliffs, New Jersey, 1994.
9. ENEE 446: Digital Computer Design — The Pipelined RiSC-16, by Prof. Bruce Jacob.
10. Performance Analysis Using Pipeline Visualization, by Chris Weaver, Kenneth C. Barr, Eric Marsman, Dan Ernst, and Todd Austin; Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI 48104.
11. https://www.quora.com/Processor-Architecture-In-a-CPU-what-is-the-benefit-of-having-many-pipeline-stages
12. https://www.ukessays.com/essays/information-technology/pipelining-and-superscalar-architecture-information-technology-essay.php