Pipeline Computing, by S. M. Risalat Hasan Chowdhury
Abstract:
The computer is one of the defining tools of the modern world. It is no longer used only for arithmetic operations; it is used everywhere, from space to earth. For this reason engineers try to reduce the size of the computer while rapidly increasing its power for computing, networking, gaming, and more. People now use computers for entertainment as well, so it is a challenge for engineers to deliver a device that meets users' expectations. Pipeline computing is a technique for building machines that can provide such satisfactory service to users. Pipelining began in the late 1970s in supercomputers, in the form of vector processors and array processors. This paper describes pipeline computing and how it works.
1. Introduction:
Pipelining is a technique for decomposing a sequential process into sub-operations. Each sub-operation is executed in a dedicated segment that operates concurrently with all other segments. Registers isolate the segments from one another, so each segment can operate on distinct data simultaneously. For a simple arithmetic task such as A x B + C, the technique works as follows:
[Fig. 1.1: Example of pipelined arithmetic data processing: A and B feed a multiplier, C and the product feed an adder]
The processor holds the value of A in register R1, B in R2, and C in R4. It first multiplies the values of A and B and stores the product in register R3. It then takes the value of C from R4 and adds it to the contents of R3 with the help of the adder. Finally, the result is held in register R5. Each individual task still passes through the segments in order, but because the segments are separated by registers, different segments can work on different data at the same time. A reservation table stores the schedule of the pipeline processes; it mainly displays the time-space flow of data through the pipeline for a function [1]. Pipelining is thus a technique for improving the processing performance of a microprocessor: the architecture allows the concurrent execution of several instructions. To achieve pipelining, a task is subdivided into a sequence of subtasks, each of which can be executed by a specialized hardware stage that operates concurrently with the other stages in the pipeline.
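The register transfers described above can be sketched in code. The following Python fragment is our own illustration, not part of any referenced design: it models the multiplier and adder as two segments separated by latches (R3, R4), so the two segments work on different operand sets at the same time.

```python
# Illustrative sketch: a two-segment arithmetic pipeline for A*B + C.
# Each loop iteration is one "clock tick" that moves data one segment forward.

def pipeline_abc(operands):
    """operands: iterable of (A, B, C) tuples; yields A*B + C in order."""
    r3 = None              # latch between segment 1 (multiply) and segment 2 (add)
    r4 = None              # carries C alongside the product
    for a, b, c in operands:
        if r3 is not None:
            yield r3 + r4          # segment 2: R5 <- R3 + R4 (previous operands)
        r3, r4 = a * b, c          # segment 1: R3 <- A*B, R4 <- C (new operands)
    if r3 is not None:
        yield r3 + r4              # drain the last result from the pipeline

print(list(pipeline_abc([(2, 3, 4), (1, 5, 6), (7, 2, 1)])))  # [10, 11, 15]
```

Note that while the adder is producing one result, the multiplier is already working on the next operand set, which is the essence of the technique.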
2. Types of Pipelining:
There are two types of pipeline computing: linear and nonlinear (dynamic) pipelining. Each has its own characteristics and way of performing a task, and each uses a reservation table (for linear) or a set of reservation tables (for dynamic). A reservation table for a linear pipeline can be generated easily because the data flow follows a linear stream. A dynamic or non-linear pipeline follows a non-linear pattern, so multiple reservation tables can be generated for different functions. The reservation table generally displays the time-space flow of data through the pipeline for a function; different functions follow different paths through the table.
[Figure: Types of pipeline computing (linear and non-linear)]
A linear pipelining processor is a series of processing stages and memory accesses. It uses a single reservation table to schedule processes, regardless of their results; the reservation table describes the data or process flow. Nonlinear pipelining, in contrast, uses several reservation tables, and selects whichever table a given process needs. The tables are always available to support the nonlinear technique [2].
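A reservation table can be represented directly as a stage-by-time grid. The sketch below is our own illustration of the linear case described above, where every function follows the same diagonal path through the stages.

```python
# Illustrative sketch: a reservation table as a stages x cycles grid of booleans.
# In a linear pipeline, data entering at time 0 occupies stage s at cycle s.

def linear_reservation_table(stages, cycles):
    table = [[False] * cycles for _ in range(stages)]
    for s in range(stages):
        if s < cycles:
            table[s][s] = True     # linear flow: stage s is busy at time s
    return table

for row in linear_reservation_table(3, 5):
    print("".join("X" if used else "." for used in row))
# X....
# .X...
# ..X..
```

A dynamic pipeline would instead keep one such table per function, with marks that need not lie on a single diagonal.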
3. Computer-related pipelines are:
- Instruction pipelines, such as the classic RISC pipeline, which are used in central processing units (CPUs) to allow overlapping execution of multiple instructions with the same circuitry. The circuitry is usually divided into stages, including instruction decoding, arithmetic, and register fetching stages, where each stage processes one instruction at a time.
- Graphics pipelines, found in most graphics processing units (GPUs), which consist of multiple arithmetic units, or complete CPUs, that implement the various stages of common rendering operations (perspective projection, window clipping, color and light calculation, rendering, etc.).
- Software pipelines, where commands can be written so that the output of one operation is automatically fed to the next operation. The UNIX system call pipe is a classic example of this concept, although other operating systems support pipes as well.
- HTTP pipelining, where multiple requests are sent without waiting for the result of the first request [3].
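The software-pipeline idea can be shown with a short example. The sketch below is our own, using Python generators to stand in for processes connected by UNIX pipes; the stage names are illustrative.

```python
# Illustrative sketch: a software pipeline in the UNIX spirit, where each stage
# consumes the previous stage's output lazily, one item at a time.

def produce(lines):
    for line in lines:
        yield line

def grep(pattern, stream):
    for line in stream:
        if pattern in line:
            yield line

def to_upper(stream):
    for line in stream:
        yield line.upper()

# Roughly the shell pipeline:  printf '...' | grep pipe | tr a-z A-Z
text = ["pipeline computing", "assembly line", "wave pipelines"]
print(list(to_upper(grep("pipe", produce(text)))))
# ['PIPELINE COMPUTING', 'WAVE PIPELINES']
```

As with a hardware pipeline, each stage begins work as soon as the previous stage produces its first item, rather than waiting for the whole input.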
4.1. Instruction Pipelining:
- The first stage fetches the instruction and buffers it.
- When the second stage is free, the first stage passes it the buffered instruction.
- While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction.
- This is called instruction pre-fetch or fetch overlap.
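The steps above can be sketched as an ideal two-stage trace. This is our own illustration (instruction names are made up), assuming fetch and execute each take one cycle and there are no stalls.

```python
# Illustrative sketch: two-stage fetch/execute overlap. While the execute stage
# works on instruction i, the fetch stage pre-fetches instruction i+1.

def run(program):
    """Return (cycle, fetching, executing) tuples for an ideal two-stage pipe."""
    n = len(program)
    trace = []
    for cycle in range(n + 1):                    # n fetch cycles + 1 drain cycle
        fetching = program[cycle] if cycle < n else None
        executing = program[cycle - 1] if cycle > 0 else None
        trace.append((cycle, fetching, executing))
    return trace

for cycle, f, e in run(["I1", "I2", "I3"]):
    print(f"cycle {cycle}: fetch={f}, execute={e}")
# From cycle 1 onward, fetch and execute proceed in parallel.
```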
4.2. Inefficiency in two-stage instruction pipelining:
There are two reasons:
- The execution time will generally be longer than the fetch time, so the fetch stage may have to wait for some time before it can empty its buffer.
- When a conditional branch occurs, the address of the next instruction to be fetched becomes unknown, and the execution stage has to wait while the next instruction is fetched.
Fig-1: Two stage instruction pipelining
4.3. Use the Idea of Pipelining in a Computer:
The speed of execution of programs is influenced by many factors. One way to obtain better performance is to use faster circuit technology to build the processor and the main memory. Another option is to arrange the hardware so that more than one operation can be performed at the same time. In this way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
We have encountered concurrent activities several times before. Chapter 1 introduced the concept of multiprogramming and explained how it is possible for I/O transfers and computational activities to proceed simultaneously. DMA devices make this possible because they can perform I/O transfers independently once those transfers are initiated by the processor.
Pipelining is a particularly effective way of organizing concurrent activity in a computer system. The basic idea is very simple, and it is frequently encountered in manufacturing plants, where pipelining is commonly known as an assembly-line operation. Readers are undoubtedly familiar with the assembly line used in car manufacturing: the first station may prepare the chassis of a car, the next station adds the body, the next installs the engine, and so on. While one group of workers is installing the engine on one car, another group is fitting a car body on the chassis of another car, and yet another group is preparing a new chassis for a third car. It may take days to complete work on a given car, but it is possible to have a new car rolling off the end of the assembly line every few minutes.
Fig-2: Building a car without pipelining    Fig-3: Building a car using pipelining
Consider how the idea of pipelining can be used in a computer. The processor executes a program by fetching and executing instructions, one after the other. Let Fi and Ei refer to the fetch and execute steps for instruction Ii. Execution of a program consists of a sequence of fetch and execute steps, as shown in Figure 3a. Now consider a computer that has two separate hardware units, one for fetching instructions and another for executing them, as shown in Figure 3b. The instruction fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer is needed to enable the execution unit to execute the instruction while the fetch unit is fetching the next instruction. The results of execution are deposited in the destination location specified by the instruction. For the purposes of this discussion, we assume that both the source and the destination of the data operated on by the instructions are inside the block labeled "Execution unit" [4].
Fig 3: Basic idea of instruction pipelining [7]
4.4. Decomposition of instruction processing
- To gain further speedup, the pipeline can have more stages (here, 6 stages):
- Fetch instruction (FI)
- Decode instruction (DI)
- Calculate operands, i.e., effective addresses (CO)
- Fetch operands (FO)
- Execute instruction (EI)
- Write operand (WO)
Fig: Six-stage CPU instruction pipeline [7]
Fig 4: A 4 stage pipeline [7]
Fetch Instruction (FI): Read the next expected instruction into a buffer.
Decode Instruction (DI): Determine the opcode and the operand specifiers.
Calculate Operands (CO): Calculate the effective address of each source operand.
Fetch Operands (FO): Fetch each operand from memory. Operands in registers need not be fetched.
Execute Instruction (EI): Perform the indicated operation and store the result.
Write Operand (WO): Store the result in memory.
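The ideal timing diagram of Section 4.5 for this six-stage pipeline can be generated mechanically. The sketch below is our own illustration, assuming equal stage durations and no conflicts, so instruction i (counting from 0) occupies stage s at cycle i + s.

```python
# Illustrative sketch: ideal timing diagram for a six-stage instruction pipeline.

STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def timing_diagram(n_instructions):
    total = n_instructions + len(STAGES) - 1   # cycles to finish all instructions
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total
        for s, name in enumerate(STAGES):
            row[i + s] = name                  # instruction i is in stage s at cycle i+s
        rows.append(f"I{i+1}: " + " ".join(row))
    return rows

print("\n".join(timing_diagram(3)))
# I1: FI DI CO FO EI WO -- --
# I2: -- FI DI CO FO EI WO --
# I3: -- -- FI DI CO FO EI WO
```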
4.5. Timing diagram for instruction pipeline operation:
Fig 5: Timing diagram for instruction pipeline operation [7]
4.6. High efficiency of instruction pipelining:
Assume all of the following, as in the diagram:
- All stages are of equal duration.
- Each instruction goes through all six stages of the pipeline.
- All the stages can be performed in parallel.
- There are no memory conflicts.
- All the accesses occur simultaneously.
Under these assumptions, the instruction pipeline works very efficiently and gives high performance.
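Under these assumptions the gain can be quantified. This short calculation is our own illustration: with k equal stages, n instructions finish in k + (n - 1) cycles instead of n * k.

```python
# Illustrative sketch: ideal cycle counts with and without pipelining.

def cycles(n, k, pipelined=True):
    """Cycles to complete n instructions on a k-stage machine (ideal case)."""
    return k + (n - 1) if pipelined else n * k

n, k = 100, 6
speedup = cycles(n, k, pipelined=False) / cycles(n, k)
print(cycles(n, k), round(speedup, 2))   # 105 cycles, speedup ~5.71
```

For large n the speedup approaches k, the number of stages, which is why the six-stage pipeline above is so effective in the ideal case.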
4.7. Limits to performance enhancement:
The factors affecting performance are:
1. If the six stages are not of equal duration, there will be some waiting time at various stages.
2. Conditional branch instructions, which can invalidate several instruction fetches.
3. Interrupts, which are unpredictable events.
4. Register and memory conflicts.
5. The CO stage may depend on the contents of a register that could be altered by a previous instruction that is still in the pipeline.
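The cost of factor 2 can be estimated with a simple model. The numbers below are made-up, illustrative assumptions of our own, not measurements from any cited design: if a fraction b of instructions are branches and each taken branch flushes p partially processed instructions, the effective cycles per instruction grow from the ideal 1 to 1 + b * taken * p.

```python
# Illustrative sketch: rough effect of conditional branches on an ideal pipeline.

def effective_cpi(branch_fraction, taken_fraction, flush_penalty):
    """Average cycles per instruction once branch flushes are accounted for."""
    return 1 + branch_fraction * taken_fraction * flush_penalty

# Assumed numbers: 20% branches, 60% taken, 3 instructions flushed per taken branch.
print(effective_cpi(0.2, 0.6, 3))   # ~1.36, i.e. roughly a 36% slowdown vs. ideal
```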
4.8. Effect of conditional branch on instruction pipeline operation:
Fig 6: Effect of conditional branch on instruction pipeline operation [7]
5. Implementations:
Buffered, synchronous pipelines:
Conventional microprocessors are synchronous circuits that use buffered, synchronous pipelines. In these
pipelines, "pipeline registers" are inserted in-between pipeline stages, and are clocked synchronously. The
time between each clock signal is set to be greater than the longest delay between pipeline stages, so that
when the registers are clocked, the data that is written to them is the final result of the previous stage.
Buffered, asynchronous pipelines:
Asynchronous pipelines are used in asynchronous circuits, and have their pipeline registers clocked asynchronously. Generally speaking, they use a request/acknowledge system, wherein each stage can detect when it is "finished". When stage Si is ready to transmit, it sends a ready signal to stage Si+1. After stage Si+1 receives the incoming data, it returns an acknowledgement signal to Si. The AMULET microprocessor is an example of a microprocessor that uses buffered, asynchronous pipelines.
Unbuffered pipelines:
Unbuffered pipelines, called "wave pipelines", do not have registers in-between pipeline stages. Instead, the
delays in the pipeline are "balanced" so that, for each stage, the difference between the first stabilized
output data and the last is minimized. Thus, data flows in "waves" through the pipeline, and each wave is
kept as short (synchronous) as possible. The maximum rate that data can be fed into a wave pipeline is
determined by the maximum difference in delay between the first piece of data coming out of the pipe and
the last piece of data, for any given wave. If data is fed in faster than this, it is possible for waves of data to
interfere with each other [3].
6. Applications of Pipeline Computing:
6.1. RISC: A Reduced Instruction Set Computer (RISC) uses instructions with simple constructs so that they can execute faster within the CPU, often without referencing memory. RISC uses a smaller instruction set but works faster than other designs, because the simpler instructions shorten the execution cycle. RISC is a CPU design based on a simplified instruction set that provides higher performance when combined with a microprocessor architecture capable of executing those instructions using fewer cycles per instruction. Its major characteristics are:
- Relatively few instructions,
- Relatively few addressing modes,
- Memory access limited to load and store instructions,
- Operations done within the registers of the CPU,
- Single-cycle instruction execution.
RISC uses pipelining for its execution and other tasks. The RiSC-16, for example, is an 8-register, 16-bit computer in which all addresses are short-word addresses. Clock speeds are increased as the amount of logic between successive latches is decreased: if the full task is sliced into smaller sub-tasks, the clock can run as fast as the largest sub-task allows. A pipeline of n stages could theoretically run with a clock n times faster than any sequential implementation. This theoretical limit is never reached, because of latch overhead and sub-tasks of unequal length, but the clock rate can still become extremely fast. This process of slicing up instruction execution is called pipelining, and it appears in every aspect of modern computer design, from the processor core to the DRAM sub-system to the overlapping of transactions on memory and I/O buses [9].
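The reason the theoretical n-times limit is never reached can be made concrete. The delay figures below are made-up, illustrative values of our own: the clock period must cover the longest sub-task plus the latch (pipeline register) overhead.

```python
# Illustrative sketch: pipelined speedup limited by the slowest stage and
# by latch overhead.

def pipelined_speedup(stage_delays, latch_overhead):
    sequential_time = sum(stage_delays)                 # one long combinational path
    clock_period = max(stage_delays) + latch_overhead   # set by the slowest stage
    return sequential_time / clock_period

# Four equal 2 ns stages with 0.2 ns latches: close to the ideal 4x, but not 4x.
print(round(pipelined_speedup([2.0, 2.0, 2.0, 2.0], 0.2), 2))   # 3.64
# Sub-tasks of unequal length waste even more of the ideal speedup.
print(round(pipelined_speedup([1.0, 3.0, 2.0, 2.0], 0.2), 2))   # 2.5
```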
6.2. Graphical Pipeline Viewer:
Figure: Graphical Pipeline Viewer
The figure shows a Graphical Pipeline Viewer (GPV). An architectural simulator produces a pipetrace stream, which contains a detailed description of the instruction flow through the machine and documents the movement of instructions through the pipeline from start to end. The pipetrace stream also records various other events and stage transitions that occur during an instruction's lifetime. The pipetrace stream coming from the architectural simulator can be sent directly into the GPV or buffered in a file for further analysis or processing. The GPV digests this data and produces a graphical representation of it: the generated graph plots instructions in program order, showing over each instruction's lifetime which operation it was performing.
Data that must be turned into a visual form is a natural fit for pipeline computing. On a TV screen or computer monitor, people see images one after another; a new image can be shown only after the previous one, so each frame waits for the one before it to complete. The architectural simulator controls the flow of information to be turned into visual form, and after the pipelining process any kind of display device can present the data. As an example, consider a CRT monitor and its internal elements.
[Figure: GPV data flow: architectural simulator -> pipetrace stream (optionally buffered in a text file) -> GPV (Perl/Tk) -> screen]
[Fig.: Shadow masking in CRT: red, green, and blue electron guns, shadow mask, and phosphor screen]
A CRT monitor has three types of electron guns, plus control electrodes, focusing electrodes, horizontal deflection plates, vertical deflection plates, a shadow mask, and a phosphor screen. The electron guns shoot electrons; the control electrodes, focusing electrodes, deflection plates, and shadow mask steer the electrons to the exact point on the screen, and the shadow mask also filters the electrons. When electrons hit the phosphor screen it glows, showing dots of different colors, and together the dots make a full image. RGB color coding is a common form of color coding; using this technique, display and other devices can apply color exactly where, and in the amount, the machine needs [10].
6.3. Pipelined and Unpipelined Execution in a CPU:
Pipelining is a concept used in many fields, including CPU architecture, to speed up a job. The principle behind it is basically the following. Consider a job J completed by a sequential system in a time T. In this case the latency is T and the throughput is Tseq = 1/T. Obviously, given the nature of the system, N jobs of the same kind as J are completed in a time N*T. It is likely, though, that such jobs can be divided into smaller stages, each executed by a sub-system. Consider the following figure, with J divided into 4 parts.
In this scenario the system does not need to be strictly sequential. Indeed, when job A is in the first stage, only the sub-system in charge of executing stage J1 is busy, so it is possible to start other jobs in the same clock cycle. In fact, as in the example, four jobs are executing in parallel by the fourth clock. Ideally, if K is the number of stages, the throughput of a pipelined system is Tpip = K/T, hence K times larger than Tseq [11].
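The latency/throughput arithmetic above can be checked directly. This is our own sketch of the ideal case: splitting a job of duration T into K stages of T/K each leaves the per-job latency at T but raises the throughput to K/T once the pipe is full.

```python
# Illustrative sketch of the throughput formulas Tseq = 1/T and Tpip = K/T.

def throughput(T, K=1):
    """Jobs completed per unit time once a K-stage pipeline is full."""
    return K / T

T, K = 8.0, 4
print(throughput(T))        # sequential: 0.125 jobs per time unit
print(throughput(T, K))     # pipelined:  0.5 jobs per time unit, K times higher
```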
7. Advantages and Limitations of Pipelining:
Pipelining does not help in all cases. There are several possible limitations. An instruction pipeline is said to be fully
pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles
that delay the progress of the pipeline.
7.1. Advantages of Pipelining:
- The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.
- Some combinational circuits, such as adders or multipliers, can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry compared with a more complex combinational circuit.
7.2. Limitations of Pipelining:
- The first is its complexity, and the second is the inability to run the pipeline at full speed at all times.
- A non-pipelined processor executes only a single instruction at a time. This avoids branch delays (in effect, every branch is delayed) and the problems of serial instructions being executed concurrently; consequently, its design is simpler and cheaper to manufacture.
- The instruction latency of a non-pipelined processor is slightly lower than that of a pipelined equivalent, because extra flip-flops must be added to the data path of a pipelined processor.
- A non-pipelined processor has a stable instruction bandwidth, whereas the performance of a pipelined processor is much harder to predict and may vary more widely between different programs [12].
Contribution:
S. M. Risalat Hasan Chowdhury
Introduction, Types of Pipelining, Reservation Table, Application (RISC-
16, Graphical Pipeline Viewer, CRT monitor basic).
Sohan Khan
Instruction Pipelining, Inefficiency in two stage instruction pipelining, Use
the Idea of Pipelining in a Computer, Decomposition of instruction
processing, Timing diagram for instruction pipeline operation, High
efficiency of instruction pipelining, Limits to performance enhancement,
Effect of conditional branch on instruction pipeline operation.
Umma Habiba
Computer-related pipelines, Implementations, Pipeline and Unpipeline in
CPU, Advantages and Limitations of Pipelining.
References:
1. https://app.box.com/shared/i3uj6z2y78
2. Performance Evaluation of Nonlinear Pipeline through UML, by Dr. Vipin Saxena and Manish Shrivastava.
3. Graphics pipeline. (n.d.). Computer Desktop Encyclopedia. Retrieved December 13, 2005.
4. www.mhhe.com/engcs/electrical/hamacher/5e/.../ch08_453-510.pdf
5. A. Bright, J. Fritts, and M. Gschwind. Decoupled fetch-execute engine with static branch prediction support. Technical report, IBM Research Report RC23261, IBM Research Division, 1999.
6. https://www.cs.auckland.ac.nz/~jmor159/363/html/pipelines.html
7. http://www.slideshare.net/siddiqueibrahim37/pipelining-41608675
8. The SPARC Architecture Manual, Version 9, D. Weaver and T. Germond, ed., PTR Prentice Hall, Englewood Cliffs, New Jersey, 1994.
9. ENEE 446: Digital Computer Design — The Pipelined RiSC-16, by Prof. Bruce Jacob.
10. Performance Analysis Using Pipeline Visualization, by Chris Weaver, Kenneth C. Barr, Eric Marsman, Dan Ernst, and Todd Austin; Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI 48104.
11. https://www.quora.com/Processor-Architecture-In-a-CPU-what-is-the-benefit-of-having-many-pipeline-stages
12. https://www.ukessays.com/essays/information-technology/pipelining-and-superscalar-architecture-information-technology-essay.php