Pipeline Hazards

There are situations, called hazards, that prevent the next instruction in the instruction stream from being executed during its designated clock cycle. Hazards reduce performance below the ideal speedup gained by pipelining.

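There are three classes of hazards:
Structural hazards. They arise from resource conflicts, when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
Data hazards. They arise when an instruction depends on the result of a previous instruction that is still in the pipeline.
Control hazards. They arise from the pipelining of branches and other instructions that change the program counter.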

Hazards in pipelines can make it necessary to stall the pipeline. The processor can stall on different events:
A cache miss. A cache miss stalls all the instructions in the pipeline, both before and after the instruction causing the miss.
A hazard in the pipeline. Eliminating a hazard often requires that some instructions in the pipeline be allowed to proceed while others are delayed. When an instruction is stalled, all instructions issued later than the stalled instruction are also stalled. Instructions issued earlier than the stalled instruction must continue, since otherwise the hazard will never clear.
A hazard causes pipeline bubbles to be inserted. The following tables show how the stalls are actually implemented. In the tables, no new instruction is fetched during clock cycle 4 and, as a result, no instruction finishes during clock cycle 8.
In case of structural hazards:

           Clock cycle number
Instr      1      2      3      4      5      6      7      8      9      10
Instr i    IF     ID     EX     MEM    WB
Instr i+1         IF     ID     EX     MEM    WB
Instr i+2                IF     ID     EX     MEM    WB
Stall                           bubble bubble bubble bubble bubble
Instr i+3                              IF     ID     EX     MEM    WB
Instr i+4                                     IF     ID     EX     MEM    WB

To simplify the picture, it is also commonly shown like this:

           Clock cycle number
Instr      1      2      3      4      5      6      7      8      9      10
Instr i    IF     ID     EX     MEM    WB
Instr i+1         IF     ID     EX     MEM    WB
Instr i+2                IF     ID     EX     MEM    WB
Instr i+3                       stall  IF     ID     EX     MEM    WB
Instr i+4                                     IF     ID     EX     MEM    WB

In case of data hazards:

           Clock cycle number
Instr      1      2      3      4      5      6      7      8      9      10
Instr i    IF     ID     EX     MEM    WB
Instr i+1         IF     ID     bubble EX     MEM    WB
Instr i+2                IF     bubble ID     EX     MEM    WB
Instr i+3                       bubble IF     ID     EX     MEM    WB
Instr i+4                                     IF     ID     EX     MEM    WB

The same data hazard case, drawn with stalls instead of bubbles:

           Clock cycle number
Instr      1      2      3      4      5      6      7      8      9      10
Instr i    IF     ID     EX     MEM    WB
Instr i+1         IF     ID     stall  EX     MEM    WB
Instr i+2                IF     stall  ID     EX     MEM    WB
Instr i+3                       stall  IF     ID     EX     MEM    WB
Instr i+4                                     IF     ID     EX     MEM    WB
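
The stall rule described earlier (the stalled instruction and everything issued after it slip by one cycle, while earlier instructions continue) can be sketched in a few lines of Python. This is only an illustrative model, not a description of real hardware: the names schedule and render, and the parameters stall_cycle and first_stalled, are assumptions introduced here, and the model handles a single one-cycle stall in a five-stage pipeline.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule(num_instr, stall_cycle=None, first_stalled=0):
    """Return one dict per instruction mapping clock cycle -> stage or "stall".

    Without a stall, instruction k (0-based) is in stage s during cycle
    k + 1 + s. A one-cycle stall in `stall_cycle` delays instruction
    `first_stalled` and every later instruction; earlier ones are unaffected.
    """
    rows = []
    for k in range(num_instr):
        row = {}
        delayed = stall_cycle is not None and k >= first_stalled
        for s, stage in enumerate(STAGES):
            nominal = k + 1 + s
            # Stages at or after the stall cycle slip by one cycle.
            if delayed and nominal >= stall_cycle:
                row[nominal + 1] = stage
            else:
                row[nominal] = stage
        # Mark the stall only for instructions that would have been
        # in the pipeline during the stall cycle.
        if delayed and k + 1 <= stall_cycle <= k + len(STAGES):
            row[stall_cycle] = "stall"
        rows.append(row)
    return rows


def render(rows, num_cycles):
    """Format the schedule as a plain-text timing table."""
    cycles = range(1, num_cycles + 1)
    lines = [("Instr".ljust(11) + "".join(f"{c:<7}" for c in cycles)).rstrip()]
    for k, row in enumerate(rows):
        label = "Instr i" if k == 0 else f"Instr i+{k}"
        cells = "".join(f"{row.get(c, ''):<7}" for c in cycles)
        lines.append((label.ljust(11) + cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Data hazard case: instruction i+1 stalls in cycle 4.
    print(render(schedule(5, stall_cycle=4, first_stalled=1), 10))
    print()
    # Structural hazard case: the fetch of instruction i+3 is delayed in cycle 4.
    print(render(schedule(5, stall_cycle=4, first_stalled=3), 10))
```

Calling schedule with first_stalled=1 reproduces the data hazard table, and first_stalled=3 reproduces the simplified structural hazard table. In both cases the last instruction writes back in cycle 10 instead of cycle 9.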

Performance of Pipelines with Stalls

A stall causes the pipeline performance to degrade from the ideal performance.

$$
\text{Speedup from pipelining}
= \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}}
= \frac{\text{CPI unpipelined} \times \text{Clock cycle time unpipelined}}{\text{CPI pipelined} \times \text{Clock cycle time pipelined}}
$$

The ideal CPI on a pipelined machine is almost always 1. Hence, the pipelined CPI is

$$
\begin{aligned}
\text{CPI pipelined} &= \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction} \\
&= 1 + \text{Pipeline stall clock cycles per instruction}
\end{aligned}
$$
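
As a small worked instance of the pipelined CPI formula, the sketch below computes it from a total stall count. The function name pipelined_cpi and the numbers used (1000 instructions, 250 stall cycles) are hypothetical and serve only to illustrate the arithmetic.

```python
def pipelined_cpi(stall_cycles, instructions, ideal_cpi=1.0):
    """Pipelined CPI = ideal CPI + pipeline stall clock cycles per instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    if stall_cycles < 0:
        raise ValueError("stall_cycles must be non-negative")
    return ideal_cpi + stall_cycles / instructions


# Hypothetical run: 1000 instructions that together incur 250 stall cycles.
print(pipelined_cpi(250, 1000))  # 1.25
```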

If we ignore the cycle time overhead of pipelining and assume the stages are all perfectly balanced, then the cycle times of the two machines are equal and

$$
\text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall cycles per instruction}}
$$
 

If all instructions take the same number of cycles, which must also equal the number of pipeline stages (the depth of the pipeline), then the unpipelined CPI is equal to the depth of the pipeline, leading to

$$
\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}
$$

If there are no pipeline stalls, this leads to the intuitive result that pipelining can improve performance by a factor equal to the depth of the pipeline.
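
To see how quickly stalls erode this ideal speedup, the final formula can be evaluated for a few stall rates. The sketch below relies on the same simplifying assumptions as the derivation (balanced stages, no pipelining overhead, unpipelined CPI equal to the pipeline depth). The function name pipeline_speedup and the sample values are illustrative assumptions.

```python
def pipeline_speedup(depth, stalls_per_instr):
    """Speedup = pipeline depth / (1 + pipeline stall cycles per instruction)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if stalls_per_instr < 0:
        raise ValueError("stalls_per_instr must be non-negative")
    return depth / (1.0 + stalls_per_instr)


# Five-stage pipeline with 0, 0.25, and 1 stall cycles per instruction.
for stalls in (0.0, 0.25, 1.0):
    print(stalls, pipeline_speedup(5, stalls))
# 0.0 5.0
# 0.25 4.0
# 1.0 2.5
```

With no stalls, the five-stage pipeline reaches the full speedup of 5. One stall cycle per instruction on average halves it to 2.5.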