Forwarding

The data hazard introduced by the following sequence of instructions, in which SUB and AND both use R1 before ADD has written it back, can be solved with a simple hardware technique called forwarding.
 
Clock cycle     1      2      3      4      5      6      7
ADD R1, R2, R3  IF     ID     EX     MEM    WB
SUB R4, R5, R1         IF     IDsub  EX     MEM    WB
AND R6, R1, R7                IF     IDand  EX     MEM    WB
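To see why this sequence is a hazard, the sketch below computes the clock cycle in which each instruction reads and writes its registers. It assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB), registers read in ID and written in WB, and a register file that is written in the first half of a cycle and read in the second half. The tuple encoding and the names `stage_cycle` and `raw_hazards` are hypothetical, chosen for illustration.

```python
# Assumed model: ideal 5-stage pipeline, one instruction enters IF per cycle,
# registers are read in ID and written in WB. The register file is written in
# the first half of a cycle and read in the second half, so a read in the
# same cycle as the write is safe.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Hypothetical encoding: (assembly text, destination, source registers)
PROGRAM = [
    ("ADD R1, R2, R3", "R1", ("R2", "R3")),
    ("SUB R4, R5, R1", "R4", ("R5", "R1")),
    ("AND R6, R1, R7", "R6", ("R1", "R7")),
]


def stage_cycle(index, stage):
    """Clock cycle (1-based) in which instruction `index` occupies `stage`."""
    return index + 1 + STAGES.index(stage)


def raw_hazards(program):
    """Return (reader, register, read_cycle, write_cycle) for every source
    that is read in ID before its nearest earlier producer writes it in WB."""
    hazards = []
    for r, (text, _, sources) in enumerate(program):
        for src in sources:
            for w in range(r - 1, -1, -1):  # nearest earlier producer wins
                if program[w][1] == src:
                    read = stage_cycle(r, "ID")
                    write = stage_cycle(w, "WB")
                    if read < write:
                        hazards.append((text, src, read, write))
                    break
    return hazards


for hazard in raw_hazards(PROGRAM):
    print(hazard)
```

Running it reports that SUB reads R1 in cycle 3 and AND reads it in cycle 4, while ADD does not write R1 until cycle 5.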

The key insight in forwarding is that the result is not really needed by SUB until after the ADD actually produces it. The only problem is to make it available for SUB when it needs it.

If the result can be moved from where ADD produces it (the EX/MEM register) to where SUB needs it (the ALU input latch), the stall can be avoided.
Using this observation, forwarding works as follows:

The ALU result from the EX/MEM register is always fed back to the ALU input latches.
If the forwarding hardware detects that the previous ALU operation has written the register corresponding to the source for the current ALU operation, control logic selects the forwarded result as the ALU input rather than the value read from the register file.
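The two-step rule above can be sketched for a single ALU input. This is a minimal model, assuming a hypothetical `ExMem` record standing in for the EX/MEM register and a hypothetical `alu_input` helper; registers are named by strings and only the EX/MEM path described by this rule is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExMem:
    """Hypothetical model of the EX/MEM pipeline register (ALU instructions)."""
    dest: Optional[str]  # register the instruction will write, or None
    alu_result: int      # ALU output latched at the end of EX


def alu_input(src, regfile_value, ex_mem):
    """Select one ALU input following the two-step forwarding rule."""
    # Step 1: the EX/MEM ALU result is always present at the ALU input mux.
    # Step 2: if the previous ALU operation writes this source register,
    # choose the forwarded result instead of the stale register-file value.
    if ex_mem.dest is not None and ex_mem.dest == src:
        return ex_mem.alu_result
    return regfile_value


# ADD R1, R2, R3 with R2 = 10 and R3 = 5 has just left EX; the register file
# still holds an old R1 (say 0) when SUB R4, R5, R1 reaches EX.
ex_mem = ExMem(dest="R1", alu_result=15)
print(alu_input("R1", 0, ex_mem))  # 15, forwarded from EX/MEM
print(alu_input("R5", 7, ex_mem))  # 7, read from the register file
```

DLX/MIPS-style designs also exclude the hardwired-zero register R0 from this comparison; the sketch omits that check.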
Forwarding results to the ALU requires adding three extra inputs to each ALU multiplexer and three paths to those new inputs.


The three paths correspond to forwarding:
(a) the ALU output at the end of EX,
(b) the ALU output at the end of MEM, and
(c) the memory output at the end of MEM.
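Each ALU input multiplexer therefore chooses among four values: the register-file value and the three paths (a) to (c). A minimal sketch, assuming hypothetical `ExMem`/`MemWb` records and a hypothetical `select_alu_input` helper. It gives the EX/MEM path priority over MEM/WB because EX/MEM holds the more recent write, and it assumes the instruction in EX/MEM is an ALU operation, since a load sitting there has not yet read memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExMem:
    """Hypothetical EX/MEM contents; assumes an ALU instruction."""
    dest: Optional[str] = None
    alu_result: int = 0


@dataclass
class MemWb:
    """Hypothetical MEM/WB contents: an ALU result or loaded data."""
    dest: Optional[str] = None
    value: int = 0
    is_load: bool = False  # True if `value` is the data-memory output


def select_alu_input(src, regfile_value, ex_mem, mem_wb):
    """Model one ALU input mux: register file plus forwarding paths (a)-(c).

    Returns (path, value)."""
    if ex_mem.dest is not None and ex_mem.dest == src:
        # (a) ALU output at the end of EX, checked first because EX/MEM
        # holds the most recent write to this register
        return "a", ex_mem.alu_result
    if mem_wb.dest is not None and mem_wb.dest == src:
        # (c) memory output at the end of MEM, or (b) ALU output at the end of MEM
        return ("c" if mem_wb.is_load else "b"), mem_wb.value
    return "register file", regfile_value


print(select_alu_input("R1", 0, ExMem("R1", 15), MemWb("R1", 99)))  # ('a', 15)
print(select_alu_input("R4", 0, ExMem(), MemWb("R4", 42, True)))    # ('c', 42)
```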

Without forwarding, our example still executes correctly, but only with stalls:
 
Clock cycle     1      2      3      4      5      6      7      8      9
ADD R1, R2, R3  IF     ID     EX     MEM    WB
SUB R4, R5, R1         IF     stall  stall  IDsub  EX     MEM    WB
AND R6, R1, R7                stall  stall  IF     IDand  EX     MEM    WB
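The stall pattern can be reproduced by delaying each instruction's ID stage until every source register has been written back. A minimal sketch, assuming an in-order five-stage pipeline and a split-cycle register file (written in the first half of WB, read in the second half of ID); the encoding and `schedule_without_forwarding` are hypothetical.

```python
# Hypothetical encoding: (assembly text, destination, source registers)
PROGRAM = [
    ("ADD R1, R2, R3", "R1", ("R2", "R3")),
    ("SUB R4, R5, R1", "R4", ("R5", "R1")),
    ("AND R6, R1, R7", "R6", ("R1", "R7")),
]


def schedule_without_forwarding(program):
    """Return (WB cycle of each instruction, total stall cycles) when every
    source must be read from the register file in ID.

    Assumes an in-order 5-stage pipeline whose register file is written in the
    first half of a cycle and read in the second half, so an instruction may be
    in ID in the same cycle its producer is in WB.
    """
    written_in = {}  # register -> cycle in which its newest value is written (WB)
    prev_id = 1      # chosen so the first instruction is in ID in cycle 2
    stalls = 0
    wb_cycles = []
    for _, dest, sources in program:
        earliest_id = prev_id + 1
        id_cycle = max([earliest_id] + [written_in.get(s, 0) for s in sources])
        stalls += id_cycle - earliest_id
        wb = id_cycle + 3  # EX, MEM and WB follow ID
        wb_cycles.append(wb)
        written_in[dest] = wb
        prev_id = id_cycle
    return wb_cycles, stalls


print(schedule_without_forwarding(PROGRAM))  # ([5, 8, 9], 2)
```

SUB waits in ID until cycle 5, the two-cycle bubble also delays AND, and the sequence finishes in cycle 9 instead of cycle 7.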

As our example shows, we need to forward results not only from the immediately previous instruction, but possibly from an instruction that started two cycles earlier. Forwarding can therefore also be arranged from the MEM/WB latch to the ALU input. Using these forwarding paths, the code sequence can be executed without stalls:
 
Clock cycle     1      2      3      4      5      6      7
ADD R1, R2, R3  IF     ID     EXadd  MEMadd WB
SUB R4, R5, R1         IF     ID     EXsub  MEM    WB
AND R6, R1, R7                IF     ID     EXand  MEM    WB

The first forwarding passes the value of R1 from EXadd to EXsub, through the EX/MEM register.
The second forwarding also passes the value of R1, this time from MEMadd to EXand, through the MEM/WB register.
With both paths in place, the code executes without stalls.
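Which path the forwarding control selects depends only on how far back the producing instruction is. A minimal sketch, assuming every producer is an ALU instruction (loads are not modeled) and a split-cycle register file; the encoding and `operand_sources` are hypothetical.

```python
# Hypothetical encoding: (assembly text, destination, source registers).
# All producers are ALU instructions, so each result exists at the end of EX.
PROGRAM = [
    ("ADD R1, R2, R3", "R1", ("R2", "R3")),
    ("SUB R4, R5, R1", "R4", ("R5", "R1")),
    ("AND R6, R1, R7", "R6", ("R1", "R7")),
]


def operand_sources(program):
    """For every source operand, name where the ALU takes its value from."""
    plan = []
    for r, (text, _, sources) in enumerate(program):
        for src in sources:
            where = "register file"
            for w in range(r - 1, -1, -1):  # nearest earlier producer wins
                if program[w][1] == src:
                    distance = r - w
                    if distance == 1:
                        where = "EX/MEM"  # e.g. EXadd -> EXsub
                    elif distance == 2:
                        where = "MEM/WB"  # e.g. MEMadd -> EXand
                    # distance >= 3: the producer writes in WB no later than
                    # this instruction's ID, so the register file suffices
                    break
            plan.append((text.split()[0], src, where))
    return plan


for entry in operand_sources(PROGRAM):
    print(entry)
```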

Forwarding can be generalized to include passing the result directly to the functional unit that requires it: a result is forwarded from the output of one unit to the input of another, rather than just from the result of a unit to the input of the same unit.
 
One More Example
To prevent a stall in this example, we would need to forward the values of R1 and R4 from the pipeline registers to the ALU and data memory inputs.
 
Clock cycle     1      2      3      4      5      6      7
ADD R1, R2, R3  IF     ID     EXadd  MEMadd WB
LW R4, d(R1)           IF     ID     EXlw   MEMlw  WB
SW R4, 12(R1)                 IF     ID     EXsw   MEMsw  WB

Stores require an operand during MEM, and forwarding of that operand is shown here.
The first forwarding passes the value of R1 from EXadd to EXlw.
The second forwarding also passes the value of R1, this time from MEMadd to EXsw.
The third forwarding passes the value of R4 from MEMlw to MEMsw.
Observe that the SW instruction stores the value of R4 into the memory location whose address is computed by adding the displacement 12 to the contents of register R1. This effective-address computation is done in the ALU during the EX stage of the SW instruction. The value to be stored (R4 in this case) is needed only in the MEM stage, as an input to the data memory. Thus the value of R1, which is forwarded to the EX stage for the effective-address computation, is needed earlier than the value of R4, which is forwarded to the data-memory input in the MEM stage.
So forwarding always takes place "left to right" in time, but operands are not always forwarded to the EX stage: where an operand goes depends on the instruction and on the point in the datapath where that operand is needed. Of course, forwarding requires dedicated hardware support.
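The same bookkeeping applies to the store example: each operand has a stage in which it is consumed (R1 in EX for the address computation, R4 in MEM for the data memory), and each result has a stage at whose end it exists (EX for ADD, MEM for LW). A minimal sketch under those assumptions, with a hypothetical encoding and a hypothetical `forwarding_plan` helper that only models forwarding out of the EX/MEM and MEM/WB registers; it prints labels in the same EXadd/MEMlw notation as the text.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
# Only values sitting in EX/MEM or MEM/WB are modeled as forwarding sources.
FORWARD_FROM = (STAGES.index("EX"), STAGES.index("MEM"))

# Hypothetical encoding: (assembly text,
#   {destination: stage at whose end the result exists},
#   {source: stage in which the operand is consumed})
PROGRAM = [
    ("ADD R1, R2, R3", {"R1": "EX"}, {"R2": "EX", "R3": "EX"}),
    ("LW R4, d(R1)", {"R4": "MEM"}, {"R1": "EX"}),     # base used in EX
    ("SW R4, 12(R1)", {}, {"R1": "EX", "R4": "MEM"}),  # data used in MEM
]


def cycle(index, stage):
    """Clock cycle (1-based) in which instruction `index` occupies `stage`."""
    return index + 1 + STAGES.index(stage)


def forwarding_plan(program):
    """List each forwarding as 'REG: <source> -> <destination>' in the text's
    notation, or report a stall if the value is not ready in time."""
    plan = []
    for r, (text, _, needs) in enumerate(program):
        consumer = text.split()[0].lower()
        for reg, need_stage in needs.items():
            for w in range(r - 1, -1, -1):  # nearest earlier producer
                produced = program[w][1]
                if reg not in produced:
                    continue
                need = cycle(r, need_stage)
                # Stage the producer occupies in the cycle before the operand is used
                src_index = need - w - 2
                if src_index not in FORWARD_FROM:
                    raise NotImplementedError("distance not modeled in this sketch")
                producer = program[w][0].split()[0].lower()
                if src_index < STAGES.index(produced[reg]):
                    plan.append(f"{reg}: stall, {producer} result not ready")
                else:
                    plan.append(f"{reg}: {STAGES[src_index]}{producer} -> {need_stage}{consumer}")
                break
    return plan


for line in forwarding_plan(PROGRAM):
    print(line)

# Contrast (hypothetical): if the store data were consumed in EX instead of MEM,
# the LW result would arrive one cycle too late.
variant = PROGRAM[:2] + [("SW R4, 12(R1)", {}, {"R1": "EX", "R4": "EX"})]
print(forwarding_plan(variant)[-1])  # R4: stall, lw result not ready
```

The contrast case shows why the consuming stage matters: the LW to SW dependence that forwards cleanly into MEM would cost a stall if the value were needed in EX.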