Dynamic Scheduling Techniques

We examined compiler techniques for scheduling the instructions so as to separate dependent instructions and minimize the number of actual hazards and resultant stalls. This approach called static scheduling became popular with pipelining.

Another approach, that earlier processors used, is called dynamic scheduling, where the hardware rearranges the instruction execution to reduce the stalls. Dynamic scheduling offers several advantages:

It enables handling some cases when dependencies are unknown at compile time (e.g., because they may involve a memory reference);
It simplifies the compiler;
It allows code that was compiled with one pipeline in mind to run efficiently on a different pipeline.
These advantages are gained at a cost of a significant increase in hardware complexity!

Dynamic scheduling: The Idea

A major limitation of the pipelining techniques is that they use in-order instruction issue: if an instruction is stalled in the pipeline, no later instructions can proceed. Thus, if there is a dependency between two closely spaced instructions in the pipeline, it will stall. For example:
DIVD F0, F2, F4
ADDD F10, F0, F8
SUBD F12, F8,  F14
SUBD instruction cannot execute because the dependency of ADDD on DIVD causes the pipeline to stall; yet SUBD is not data dependent on anything in the pipeline. This is a performance limitation that can be eliminated by not requiring instructions to execute in order.

To allow SUBD to begin executing, we must separate the instruction issue process into two parts: checking the structural hazards and waiting for the absence of a data hazard. We can still check for structural hazards when we issue the instruction; thus, we still use in order instruction issue. However, we want the instructions to begin execution as soon as their data operands are available. Thus, the pipeline will do out-of-order execution, which implies out-of-order completion.

In introducing out-of-order execution, we have essentially split the ID pipe stage into two stages:

Issue - Decode instructions, check for structural hazards;
Read operands - Wait until no data hazards, then read operands.
An instruction fetch proceeds with the issue stage and may fetch either into a single-entry latch or into a queue; instructions are then issued from the latch or queue. The EX stage follows the read operands stage, just as in the DLX pipeline. As in the DLX floating-point pipeline, execution may take multiple cycles, depending on the operation. Thus, we may need to distinguish when an instruction begins execution and when it completes execution; between the two times, the instruction is in execution. This allows multiple instructions to be in execution at the same time.

Scoreboarding is a technique for allowing instructions to execute out of order when there are sufficient resources and no data dependencies; it is named after the CDC 6600 scoreboard, which developed this capability.

The goal of a scoreboard is to maintain an execution rate of one instruction per clock cycle (when there are no structural hazards) by executing an instruction as early as possible. Thus, when the next instruction to execute is stalled, other instructions can be issued and executed if they do not depend on any active or stalled instruction. The scoreboard takes full responsibility for instruction issue and execution, including all hazard detection.

Every instruction goes through the scoreboard, where a record of the data dependencies is constructed; this step corresponds to instruction issue and replaces part of the ID step in the DLX pipeline. The scoreboard then determines when the instruction can read its operands and begin execution.

Tomasulo Approach is another scheme to allow execution to proceed in the presence of hazards developed by the IBM 360/91 floating-point unit. This scheme combines key elements of the scoreboarding scheme with the introduction of register renaming.

In the loop unrolling section we showed how a compiler could rename registers to avoid WAW and WAR hazards. In Tomasulo's scheme this functionality is provided by the reservation stations, which buffer the operands of instructions waiting to issue, and by issue logic.

The basic idea is that a reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from a register. In addition, pending instructions designate the reservation station that will provide their input. Finally, when successive writes to a register appear, only the last one is actually used to update the register.

As instructions are issued, the register specifiers for pending operands are renamed to the names of the reservation station in a process called register renaming. This combination of issue logic and reservation stations  provides renaming and eliminates WAW and WAR hazards.

This additional capability is the major conceptual difference between scoreboarding and Tomasulo's algorithm. Since there can be more reservation stations than real registers, the technique can eliminate hazards that could not be eliminated by a compiler.