Instruction Issue Logic for High-Performance Interruptible Pipelined
Processors
Gurinder S. Sohi, Professor
UW-Madison Computer Architecture Group
University of Wisconsin-Madison
Sriram Vajapeyam, Real-Time Collaboration space
at Oracle, Bangalore, India
What is this about?
The performance of pipelined processors is severely limited by data dependencies and branch instructions.
Another major problem that arises in pipelined computer design is that an interrupt can be imprecise.
Both of these cause performance degradation.
A hardware solution is offered in this paper.
Problems and previous solutions
Data dependency: code scheduling; waiting stations or reservation stations
Branch instructions: delayed branching; branch prediction
Imprecise interrupts: reorder buffer; reorder buffer with bypass logic
Basic Architecture
Same instruction set as the scalar unit of the CRAY-I
Several functional units connected to a common result bus
Instruction Fetch Unit
Decode and Issue Unit
144 registers
Tomasulo’s Algorithm
First presented for the floating-point unit of the IBM 360/91. An extension of this algorithm for the scalar unit of the CRAY-I is presented later.
Algorithm:
An instruction whose operands are not available is forwarded to a reservation station (RS).
It waits in the RS until its operands are available; it is then dispatched to the appropriate functional unit.
Each register is assigned a bit that indicates whether the register is busy (that is, it is the destination of an instruction that has not yet completed).
A busy register is assigned a tag that identifies the result to be stored in the register.
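The issue and wake-up steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the IBM 360/91 design: the names `Register`, `Station`, `Machine`, `issue`, `broadcast` and `execute` are hypothetical, tags are simple sequence numbers, only two operations are modelled, and for simplicity every instruction passes through a station even when its operands are already available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Register:
    value: int = 0
    busy: bool = False          # set while the register is the destination of a pending instruction
    tag: Optional[int] = None   # identifies the result that will be stored here


@dataclass
class Station:
    tag: int                    # tag of the result this instruction will produce
    op: str
    dest: str
    src_values: list            # operand values; None while still waiting
    src_tags: list              # tag awaited for each operand; None once the operand is ready

    def ready(self) -> bool:
        return all(t is None for t in self.src_tags)


class Machine:
    def __init__(self, names):
        self.regs = {n: Register() for n in names}
        self.stations = []
        self.next_tag = 0       # assumption: tags are handed out sequentially

    def issue(self, op, dest, srcs):
        values, tags = [], []
        for s in srcs:
            r = self.regs[s]
            if r.busy:          # operand not available: remember the tag to wait for
                values.append(None)
                tags.append(r.tag)
            else:               # operand available: copy the value
                values.append(r.value)
                tags.append(None)
        tag = self.next_tag
        self.next_tag += 1
        d = self.regs[dest]
        d.busy, d.tag = True, tag
        st = Station(tag, op, dest, values, tags)
        self.stations.append(st)
        return st

    def broadcast(self, tag, value):
        # Result bus: every waiting operand and every busy register compares tags.
        for st in self.stations:
            for i, t in enumerate(st.src_tags):
                if t == tag:
                    st.src_values[i] = value
                    st.src_tags[i] = None
        for r in self.regs.values():
            if r.busy and r.tag == tag:
                r.value, r.busy, r.tag = value, False, None

    def execute(self, st):
        # Dispatch a ready instruction to a (modelled) functional unit.
        assert st.ready()
        a, b = st.src_values
        result = a + b if st.op == "add" else a * b
        self.stations.remove(st)
        self.broadcast(st.tag, result)
        return result


m = Machine(["r1", "r2", "r3", "r4"])
m.regs["r1"].value, m.regs["r2"].value = 2, 3
i1 = m.issue("add", "r3", ["r1", "r2"])   # operands available
i2 = m.issue("mul", "r4", ["r3", "r1"])   # must wait for r3
```

Here `i2` stays in its station until `m.execute(i1)` puts the result for `r3` on the bus; the tag comparison in `broadcast` is the associative matching that the next slide names as the main hardware cost.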
Tomasulo’s Algorithm (Contd...)
Fields in Reservation Station
Disadvantage: high hardware cost for register tagging and the associative comparison hardware it requires.
Extension to Tomasulo’s Algorithm
A Separate Tag Unit (TU)
Only a few sink registers (busy registers) are active at any time.
All tags from active registers are consolidated into the Tag Unit.
Each register retains its busy bit.
Algorithm:
At instruction issue time, if a source register is busy, the TU is queried for the current tag of that register, and the tag is forwarded to the reservation stations.
If the destination register is not busy, obtaining a tag is straightforward.
If it is busy, a new tag is obtained. The Latest field is used to keep the register busy even after the old instruction has executed.
If the TU is full, instruction issue is stopped.
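The Tag Unit behaviour described above might be modelled as follows. This is a sketch under stated assumptions: the class `TagUnit` and its methods `current_tag`, `allocate` and `release` are hypothetical names, a tag is taken to be the index of a TU entry, and a full TU is signalled by returning `None` so that the caller can stall issue.

```python
class TagUnit:
    def __init__(self, size):
        # Each entry is None when free, else {"reg": name, "latest": bool}.
        self.entries = [None] * size

    def current_tag(self, reg):
        """Tag of the latest pending write to reg, or None if reg is not busy."""
        for tag, e in enumerate(self.entries):
            if e is not None and e["reg"] == reg and e["latest"]:
                return tag
        return None

    def allocate(self, reg):
        """Obtain a tag for a new write to reg; None means the TU is full (stop issue)."""
        if None not in self.entries:
            return None
        old = self.current_tag(reg)
        if old is not None:
            # The older instance no longer holds the latest copy.
            self.entries[old]["latest"] = False
        tag = self.entries.index(None)
        self.entries[tag] = {"reg": reg, "latest": True}
        return tag

    def release(self, tag):
        """A result for tag has arrived; free the entry.

        Returns (reg, is_latest). Only when is_latest is True should the
        register be written and its busy bit cleared.
        """
        e = self.entries[tag]
        self.entries[tag] = None
        return e["reg"], e["latest"]


tu = TagUnit(2)
t0 = tu.allocate("S1")   # first write to S1
t1 = tu.allocate("S1")   # second write to S1 while the first is pending
```

With both writes pending, `tu.current_tag("S1")` returns `t1`, so later readers wait for the newer result, and `tu.release(t0)` reports `("S1", False)`: the register stays busy even though the older instruction has finished.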
Fields in Reservation Station
Extension to Tomasulo’s Algorithm (contd…)
Other Extensions
Merging the reservation stations into an RS pool (Disadvantage: only one instruction can be issued at a time! NO)
Merging the RS pool with the Tag Unit to make the RS Tag Unit (RSTU)
Fields in RSTU
Implementation of Precise interrupts
Reorder Buffer: It allows instructions to finish execution out of order but updates registers, memory, etc. in the order that the instructions were present in the program. So it assures that a precise state of the machine is recoverable at any time.
Bypass Logic: An instruction does not have to wait for the reorder buffer to update a source register; it can fetch the value from the reorder buffer (if it is available) and then issue.
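A small model makes the two ideas on this slide concrete: results may arrive in any order, but registers are updated only from the head of the buffer, and a reader may take a finished but uncommitted value directly from the buffer. The class `ReorderBuffer` and its method names are hypothetical, and the register file is just a Python dict in this sketch.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self, regs):
        self.regs = regs          # architectural register file (a dict in this sketch)
        self.entries = deque()    # program order, oldest entry at the left

    def allocate(self, dest):
        """Reserve an entry at issue time, in program order."""
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """A functional unit finished; completion may happen out of order."""
        entry["value"], entry["done"] = value, True

    def commit(self):
        """Update registers from the head only, so the state is always precise."""
        retired = 0
        while self.entries and self.entries[0]["done"]:
            e = self.entries.popleft()
            self.regs[e["dest"]] = e["value"]
            retired += 1
        return retired

    def read(self, reg):
        """Bypass: return (available, value) using the youngest pending write to reg."""
        for e in reversed(self.entries):
            if e["dest"] == reg:
                return (True, e["value"]) if e["done"] else (False, None)
        return True, self.regs[reg]


regs = {"r1": 0, "r2": 0}
rob = ReorderBuffer(regs)
a = rob.allocate("r1")    # older instruction
b = rob.allocate("r2")    # younger instruction
rob.complete(b, 7)        # the younger one finishes first
```

At this point `rob.commit()` retires nothing because the head entry `a` is still executing, yet `rob.read("r2")` already yields `(True, 7)` through the bypass path.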
MERGING DEPENDENCY RESOLUTION AND PRECISE
INTERRUPTS
RSTU can be made to behave like a reorder buffer if it is forced to update the state of the machine in the order that the instructions are encountered by making it a queue.
The modified unit is called the Register Update Unit (RUU). It
(i) determines which instruction should be issued to the functional units for execution, reserves the result bus and dispatches the instruction to the functional unit,
(ii) determines which instruction can commit, i.e., update the state of the machine,
(iii) monitors the result bus to resolve dependencies and
(iv) provides tags to and accepts new instructions from the decode and issue unit.
Fields in RUU
Merging … (Contd…)
Destination Field
In the RSTU, the issue logic needed to search the TU to obtain the correct tag for a source operand and to update the latest copy field for the destination.
Here, counters are used instead of tracking multiple copies of a destination.
Each register has two n-bit counters: Number of Instances (NI) and Latest Instance (LI).
When an instruction that writes into a destination register is issued to the RUU, both NI and LI are incremented. LI is incremented modulo n.
When such an instruction leaves the RUU, the associated NI is decremented.
The register tag consists of the register number appended with the LI counter.
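The counter scheme can be illustrated with a short sketch. Several points here are assumptions rather than statements from the slides: the names `InstanceCounters`, `issue_writer`, `retire_writer` and `make_tag` are hypothetical; the slide's "modulo n" is read as the wrap-around of an n-bit counter, that is, modulo 2^n; issue is assumed to stall when NI has reached its maximum value; and a register is assumed to be busy exactly when NI is non-zero.

```python
class InstanceCounters:
    """NI and LI counters for one register (n-bit each)."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.ni = 0   # Number of Instances currently in the RUU
        self.li = 0   # Latest Instance

    def issue_writer(self):
        """An instruction writing this register enters the RUU.

        Returns the new LI value, or None if NI is saturated (assumed stall).
        """
        if self.ni == (1 << self.n_bits) - 1:
            return None
        self.ni += 1
        self.li = (self.li + 1) % (1 << self.n_bits)  # assumed n-bit wrap-around
        return self.li

    def retire_writer(self):
        """An instruction writing this register leaves the RUU."""
        self.ni -= 1

    def busy(self):
        # Assumption: the register is busy while any instance is in the RUU.
        return self.ni > 0


def make_tag(reg_number, li, n_bits):
    """Register tag: the register number with the LI counter appended as low bits."""
    return (reg_number << n_bits) | li


c = InstanceCounters(n_bits=2)
first = c.issue_writer()
second = c.issue_writer()
tag = make_tag(5, second, n_bits=2)
```

A source operand that reads register 5 at this point would wait on `tag`, which is obtained from the counters directly and needs no associative search.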
Merging … (Contd…)
Bypass Logic in the RUU
A case in which bypass logic might be helpful is when Ij has completed execution but has not committed when Ii is issued to the RUU (Ii is issued after Ij and uses its result).
To provide bypass logic for this case, the monitoring capabilities of the reservation stations are extended to monitor both the result bus and the RUU-to-register bus.
SIMULATION
Simulation Results
The benchmark programs used were the Lawrence Livermore loops.
A large RUU is needed to achieve a performance improvement.
An RUU of size 10 has the same hardware requirements as an architecture that has a reservation station with each functional unit.
BRANCH PREDICTION AND CONDITIONAL INSTRUCTIONS
To allow conditional execution of instructions, a hardware mechanism is needed that would allow the machine to recover from an incorrect branch prediction.
The RUU provides a method for nullifying instructions, just as it does for interrupts.
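Nullification can be pictured as discarding the tail of a queue kept in program order. The sketch below is only an illustration of that idea: `nullify_after` is a hypothetical helper, entries are plain dicts, and the queue stands in for the RUU. Because entries commit only from the head, none of the discarded instructions can have updated the machine state.

```python
from collections import deque


def nullify_after(queue, branch):
    """Remove every entry younger than `branch` and return them in program order."""
    if not any(e is branch for e in queue):
        raise ValueError("branch is not in the queue")
    squashed = []
    while queue[-1] is not branch:
        squashed.append(queue.pop())   # youngest entries sit at the right
    squashed.reverse()
    return squashed


br = {"name": "branch"}
queue = deque([{"name": "i1"}, br, {"name": "i3"}, {"name": "i4"}])
squashed = nullify_after(queue, br)    # the prediction for `br` was wrong
```

After the call, `queue` holds only `i1` and the branch, and fetching can resume from the correct path.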
Conclusions
The paper combines the issues of hardware dependency resolution and implementation of precise interrupts.
A scheme that resolves dependencies and allows out-of-order execution is devised with low hardware cost, and it is integrated with precise interrupts. This integration made each issue simpler than before.
Results of the performance evaluation are quite encouraging.
This mechanism can be easily extended to support conditional execution of instructions from a predicted path.