We examined compiler techniques for scheduling the instructions so as to separate dependent instructions and minimize the number of actual hazards and resultant stalls. This approach called static scheduling became popular with pipelining.
Another approach, that earlier processors used, is called dynamic scheduling, where the hardware rearranges the instruction execution to reduce the stalls. Dynamic scheduling offers several advantages:
It enables handling some cases when dependencies are unknown at compile time (e.g., because they may involve a memory reference);These advantages are gained at a cost of a significant increase in hardware complexity!
It simplifies the compiler;
It allows code that was compiled with one pipeline in mind to run efficiently on a different pipeline.
Dynamic scheduling: The Idea
A major limitation of the pipelining techniques is that they use in-order
instruction issue: if an instruction is stalled
in the pipeline, no later instructions can proceed. Thus,
if there is a dependency between two closely spaced instructions in the
pipeline, it will stall. For example:
|DIVD||F0, F2, F4|
|ADDD||F10, F0, F8|
|SUBD||F12, F8, F14|
To allow SUBD to begin executing, we must separate the instruction issue process into two parts: checking the structural hazards and waiting for the absence of a data hazard. We can still check for structural hazards when we issue the instruction; thus, we still use in order instruction issue. However, we want the instructions to begin execution as soon as their data operands are available. Thus, the pipeline will do out-of-order execution, which implies out-of-order completion.
In introducing out-of-order execution, we have essentially split the ID pipe stage into two stages:
Issue - Decode instructions, check for structural hazards;An instruction fetch proceeds with the issue stage and may fetch either into a single-entry latch or into a queue; instructions are then issued from the latch or queue. The EX stage follows the read operands stage, just as in the DLX pipeline. As in the DLX floating-point pipeline, execution may take multiple cycles, depending on the operation. Thus, we may need to distinguish when an instruction begins execution and when it completes execution; between the two times, the instruction is in execution. This allows multiple instructions to be in execution at the same time.
Read operands - Wait until no data hazards, then read operands.
Scoreboarding is a technique for allowing instructions to execute out of order when there are sufficient resources and no data dependencies; it is named after the CDC 6600 scoreboard, which developed this capability.
The goal of a scoreboard is to maintain an execution rate of one instruction per clock cycle (when there are no structural hazards) by executing an instruction as early as possible. Thus, when the next instruction to execute is stalled, other instructions can be issued and executed if they do not depend on any active or stalled instruction. The scoreboard takes full responsibility for instruction issue and execution, including all hazard detection.
Every instruction goes through the scoreboard, where a record of the data dependencies is constructed; this step corresponds to instruction issue and replaces part of the ID step in the DLX pipeline. The scoreboard then determines when the instruction can read its operands and begin execution.
Tomasulo Approach is another scheme to allow execution to proceed in the presence of hazards developed by the IBM 360/91 floating-point unit. This scheme combines key elements of the scoreboarding scheme with the introduction of register renaming.
In the loop unrolling section we showed how a compiler could rename registers to avoid WAW and WAR hazards. In Tomasulo's scheme this functionality is provided by the reservation stations, which buffer the operands of instructions waiting to issue, and by issue logic.
The basic idea is that a reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from a register. In addition, pending instructions designate the reservation station that will provide their input. Finally, when successive writes to a register appear, only the last one is actually used to update the register.
As instructions are issued, the register specifiers for pending operands are renamed to the names of the reservation station in a process called register renaming. This combination of issue logic and reservation stations provides renaming and eliminates WAW and WAR hazards.
This additional capability is the major conceptual difference between scoreboarding and Tomasulo's algorithm. Since there can be more reservation stations than real registers, the technique can eliminate hazards that could not be eliminated by a compiler.