Instruction Issue Logic for High-Performance Interruptible Pipelined Processors

Gurinder S. Sohi, Professor

UW-Madison Computer Architecture Group

University of Wisconsin-Madison

Sriram Vajapeyam, Real-Time Collaboration space at Oracle, Bangalore, India

What is this about?

The performance of pipelined processors is severely limited by data dependencies and branch instructions.

Another major problem that arises in pipelined computer design is that an interrupt can be imprecise.

Both of these cause performance degradation.

A hardware solution is offered in this paper.

Problems and previous solutions

Data dependencies: code scheduling; waiting or reservation stations.

Branch instructions: delayed branching; branch prediction.

Imprecise interrupts: reorder buffer; reorder buffer with bypass logic.

Basic Architecture

Same instruction set as the scalar unit of the CRAY-I.

Several functional units connected to a common result bus.

Instruction Fetch Unit.

Decode and Issue Unit.

144 registers.

Tomasulo’s Algorithm

First presented for the floating-point unit of the IBM 360/91.

An extension of this algorithm for the scalar unit of the CRAY-I is presented later.

Algorithm:

An instruction whose operands are not available is forwarded to a reservation station (RS).

It waits in the RS until its operands are available; it is then dispatched to the appropriate functional unit.

Each register is assigned a bit that indicates whether the register is busy (i.e., it is the destination of an instruction whose result has not yet been written).

A busy register is assigned a tag that identifies the result to be stored in the register.
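
The steps above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware described in the paper: the names `Register`, `ReservationStation`, `issue`, and `broadcast` are hypothetical, tags are plain integers chosen by the caller, and functional-unit timing is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Register:
    value: int = 0
    busy: bool = False          # busy bit: a pending instruction will write here
    tag: Optional[int] = None   # tag of the result that will be stored


@dataclass
class ReservationStation:
    op: str
    dest: int                                      # destination register number
    tag: int                                       # tag naming this result
    src_vals: list = field(default_factory=list)   # operand values (None if pending)
    src_tags: list = field(default_factory=list)   # awaited tags (None if ready)

    def ready(self) -> bool:
        # Dispatch is possible once no operand is still waiting on a tag.
        return all(t is None for t in self.src_tags)

    def snoop(self, tag: int, value: int) -> None:
        # Capture a broadcast result that matches a pending operand tag.
        for i, t in enumerate(self.src_tags):
            if t == tag:
                self.src_vals[i], self.src_tags[i] = value, None


def issue(regs, stations, op, dest, srcs, tag):
    """Place an instruction in a reservation station (sources are read first)."""
    vals, tags = [], []
    for s in srcs:
        if regs[s].busy:            # operand not available: remember its tag
            vals.append(None)
            tags.append(regs[s].tag)
        else:                       # operand available: copy the value
            vals.append(regs[s].value)
            tags.append(None)
    regs[dest].busy, regs[dest].tag = True, tag
    rs = ReservationStation(op, dest, tag, vals, tags)
    stations.append(rs)
    return rs


def broadcast(regs, stations, tag, value):
    """Model the result bus: waiting stations and the tagged register pick it up."""
    for rs in stations:
        rs.snoop(tag, value)
    for r in regs:
        if r.busy and r.tag == tag:
            r.value, r.busy, r.tag = value, False, None
```

For example, issuing an add into register 2 and then a multiply that reads register 2 leaves the multiply waiting on the add's tag until `broadcast` delivers the result.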

Tomasulo’s Algorithm (Contd...)

Fields in Reservation Station

Disadvantage: high hardware cost of register tagging and of the associative comparison logic it requires.

Extension to Tomasulo’s Algorithm

A Separate Tag Unit

Only a few sink registers (busy registers) are active at any time, so all tags for active registers are consolidated into a Tag Unit (TU).

Each register retains its busy bit.

Algorithm:

At instruction issue time, if a source register is busy, the TU is queried for the current tag of that register and the tag is forwarded to the reservation stations.

If the destination register is not busy, obtaining a tag is straightforward.

If it is busy, a new tag is obtained. The Latest field is used to keep the register busy even after the older instruction has executed.

If the TU is full, instruction issue is stopped.
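
A minimal sketch of how such a Tag Unit might behave, assuming each TU entry holds a register number and a Latest flag and that the entry index serves as the tag. The class and method names (`TagUnit`, `current_tag`, `allocate`, `release`) are hypothetical and chosen only for illustration.

```python
class TagUnit:
    def __init__(self, size):
        # Each entry is None when free, else {"reg": number, "latest": bool}.
        self.entries = [None] * size

    def current_tag(self, reg):
        """Tag of the latest pending write to reg, or None if reg is not busy."""
        for tag, e in enumerate(self.entries):
            if e is not None and e["reg"] == reg and e["latest"]:
                return tag
        return None

    def allocate(self, reg):
        """Obtain a tag for a new write to reg; None means the TU is full."""
        try:
            free = self.entries.index(None)
        except ValueError:
            return None                      # TU full: issue must stop
        old = self.current_tag(reg)
        if old is not None:
            # The register was already busy: the older entry is no longer latest.
            self.entries[old]["latest"] = False
        self.entries[free] = {"reg": reg, "latest": True}
        return free

    def release(self, tag):
        """Free a tag when its result returns.

        Returns (reg, clear_busy). clear_busy is False when a newer write to
        the same register is still pending, so the register stays busy.
        """
        e = self.entries[tag]
        self.entries[tag] = None
        return e["reg"], e["latest"]
```

With a two-entry TU, two back-to-back writes to the same register take both tags, a third allocation returns `None`, and releasing the older tag reports that the busy bit must stay set.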

Fields in Reservation Station

Extension to Tomasulo’s Algorithm (contd…)

Other Extensions

Merging the reservation stations into an RS pool (Disadvantage: only one instruction can be issued at a time! NO).

Merging the RS pool with the Tag Unit to make the RS Tag Unit (RSTU).

Fields in RSTU

Implementation of Precise interrupts

Reorder Buffer: It allows instructions to finish execution out of order but updates registers, memory, etc. in the order that the instructions were present in the program. So it assures that a precise state of the machine is recoverable at any time.

Bypass Logic: An instruction does not have to wait for the reorder buffer to update a source register; it can fetch the value from the reorder buffer (if it is available) and issue.
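
The two ideas can be illustrated with a small Python model. It is a sketch under simplifying assumptions: the register file is a plain dict, entries carry only a destination and a value, and the names `ReorderBuffer`, `allocate`, `complete`, `commit`, and `read` are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # oldest instruction at the left (head)

    def allocate(self, dest):
        """Reserve an entry in program order when an instruction issues."""
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """Record a result; completion may happen in any order."""
        entry["value"], entry["done"] = value, True

    def commit(self, regs):
        """Update registers from the head only; return how many entries retired."""
        n = 0
        while self.entries and self.entries[0]["done"]:
            e = self.entries.popleft()
            regs[e["dest"]] = e["value"]
            n += 1
        return n

    def read(self, reg, regs):
        """Bypass: return (available, value) for source register reg.

        The newest pending write to reg wins. If it has finished, its value is
        taken from the buffer; if not, the reader must wait.
        """
        for e in reversed(self.entries):
            if e["dest"] == reg:
                return (True, e["value"]) if e["done"] else (False, None)
        return True, regs[reg]   # no pending write: register file is current
```

If a younger instruction finishes first, `commit` retires nothing until the older one is done, yet `read` can already supply the younger result to a consumer.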

MERGING DEPENDENCY RESOLUTION AND PRECISE INTERRUPTS

The RSTU can be made to behave like a reorder buffer by managing it as a queue, which forces it to update the state of the machine in the order in which the instructions are encountered.

The modified unit is called the Register Update Unit (RUU). It

(i) determines which instruction should be issued to the functional units for execution, reserves the result bus and dispatches the instruction to the functional unit,

(ii) determines which instruction can commit, i.e., update the state of the machine,

(iii) monitors the result bus to resolve dependencies and

(iv) provides tags to and accepts new instructions from the decode and issue unit.

Fields in RUU

Merging … (Contd…)

Destination Field

In the RSTU, the issue logic needed to search the TU to obtain the correct tag for the source operand and to update the latest copy field for the destination.

Here, counters are used instead of tracking multiple copies of a destination.

Two n-bit counters are kept per register: Number of Instances (NI) and Latest Instance (LI).

When an instruction that writes into a destination register is issued to the RUU, both NI and LI for that register are incremented; LI is incremented modulo n.

When such an instruction leaves the RUU, the associated NI is decremented.

The register tag consists of the register number appended with the LI counter.
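
The counter scheme can be sketched as follows. The class name `InstanceCounters` and its methods are hypothetical. The wrap point `modulus` for LI, and the choice to stall issue when a register already has `modulus - 1` instances in flight, are assumptions of this sketch made so that live tags stay distinct.

```python
class InstanceCounters:
    def __init__(self, num_regs, modulus=8):
        self.modulus = modulus        # assumed wrap point for the LI counter
        self.ni = [0] * num_regs      # Number of Instances in the RUU
        self.li = [0] * num_regs      # Latest Instance

    def is_busy(self, reg):
        # A register is busy while at least one instance of it is in the RUU.
        return self.ni[reg] > 0

    def source_tag(self, reg):
        # Tag for a source operand: register number plus current LI.
        return (reg, self.li[reg])

    def issue_dest(self, reg):
        """Allocate a new instance of reg; None means issue must stall."""
        if self.ni[reg] == self.modulus - 1:   # assumed cap, see lead-in
            return None
        self.ni[reg] += 1
        self.li[reg] = (self.li[reg] + 1) % self.modulus
        return (reg, self.li[reg])

    def retire_dest(self, reg):
        """Called when an instruction writing reg leaves the RUU."""
        self.ni[reg] -= 1
```

No associative search is needed here: a source tag is formed directly from the register number and its LI value.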

Merging … (Contd…)

Bypass Logic in the RUU

A case in which bypass logic might be helpful is when Ij has completed execution but has not committed when Ii is issued to the RUU (Ii is issued after Ij).

To provide bypass logic for this case, the monitoring capabilities of the reservation stations are extended to monitor both the result bus and the RUU-to-register bus.

SIMULATION

Simulation Results

The benchmark programs used were the Lawrence Livermore loops.

A large RUU is needed to achieve a performance improvement.

An RUU of size 10 has the same hardware requirements as an architecture that has a reservation station with each functional unit.

BRANCH PREDICTION AND CONDITIONAL INSTRUCTIONS

To allow conditional execution of instructions, a hardware mechanism is needed that would allow the machine to recover from an incorrect branch prediction.

The RUU provides a method for nullifying instructions, just as it does for interrupts.
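
One way to picture nullification is as discarding the tail of a queue. The sketch below is only an illustration under simplifying assumptions: entries are identified by a name string, a branch counts as done once it has been resolved, and the class `RUU` with its methods `issue`, `complete`, `commit`, and `nullify_after` is hypothetical rather than the paper's design.

```python
from collections import deque


class RUU:
    def __init__(self, size):
        self.size = size
        self.queue = deque()     # head (left) holds the oldest instruction

    def issue(self, name):
        """Append an instruction at the tail; False means the RUU is full."""
        if len(self.queue) == self.size:
            return False
        self.queue.append({"name": name, "done": False})
        return True

    def complete(self, name):
        """Mark an instruction as executed (for a branch: resolved)."""
        for e in self.queue:
            if e["name"] == name:
                e["done"] = True

    def commit(self):
        """Retire finished instructions from the head, in program order."""
        committed = []
        while self.queue and self.queue[0]["done"]:
            committed.append(self.queue.popleft()["name"])
        return committed

    def nullify_after(self, name):
        """Drop every entry younger than name; return the dropped names."""
        names = [e["name"] for e in self.queue]
        keep = names.index(name) + 1
        while len(self.queue) > keep:
            self.queue.pop()     # none of these has committed
        return names[keep:]
```

Because commits happen only from the head, instructions issued after an unresolved branch cannot update machine state, so dropping them on a misprediction leaves nothing to undo in this model.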

Conclusions

The paper combines the issues of hardware dependency resolution and implementation of precise interrupts.

A scheme that resolves dependencies and allows out-of-order execution is devised with low hardware cost.

It is integrated with precise interrupts. This integration makes each issue simpler than before.

Results of the performance evaluation are quite encouraging.

This mechanism can be easily extended to support conditional execution of instructions from a predicted path.
