Basic Performance Issues in Pipelining

Pipelining increases CPU instruction throughput, that is, the number of instructions completed per unit of time. It does not reduce the execution time of an individual instruction. In fact, it usually increases the execution time of each instruction slightly, because of overhead in the pipeline control.
Because throughput is higher, a program still runs faster and has a lower total execution time.

Limitations on practical depth of a pipeline arise from:

- Pipeline latency. The fact that the execution time of each instruction does not decrease puts limitations on pipeline depth;
- Imbalance among pipeline stages. Imbalance among the pipe stages reduces performance since the clock can run no faster than the time needed for the slowest pipeline stage;
- Pipeline overhead. Pipeline overhead arises from the combination of pipeline register delay (setup time plus propagation delay) and clock skew. Once the clock cycle is as small as the sum of the clock skew and latch overhead, no further pipelining is useful, since there is no time left in the cycle for useful work.
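
The effect of overhead on useful pipeline depth can be sketched numerically. This is a minimal illustration, not a model of any real processor: it assumes a hypothetical datapath with 320 ns of logic that can be split into perfectly balanced stages, and a fixed 5 ns of latch and skew overhead per stage. The function names and both numbers are chosen only for this example.

```python
def cycle_time_ns(total_logic_ns, depth, overhead_ns):
    """Clock cycle when the logic is split evenly into `depth` stages (idealized)."""
    return total_logic_ns / depth + overhead_ns


def ideal_speedup(total_logic_ns, depth, overhead_ns):
    """Throughput gain over an unpipelined datapath that pays no latch overhead.

    Hazards and pipeline fill time are ignored.
    """
    return total_logic_ns / cycle_time_ns(total_logic_ns, depth, overhead_ns)


if __name__ == "__main__":
    # Assumed values, for illustration only.
    TOTAL_LOGIC_NS = 320.0
    OVERHEAD_NS = 5.0
    for depth in (1, 2, 4, 8, 16, 32, 64, 128):
        cycle = cycle_time_ns(TOTAL_LOGIC_NS, depth, OVERHEAD_NS)
        gain = ideal_speedup(TOTAL_LOGIC_NS, depth, OVERHEAD_NS)
        print(f"depth={depth:3d}  cycle={cycle:7.2f} ns  speedup={gain:6.2f}")
```

Under these assumptions the speedup can never reach 320 / 5 = 64, however many stages are added. Each doubling of the depth buys less, because a growing share of every cycle is spent on overhead rather than useful work.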
 

Simple example

Consider a nonpipelined machine with 6 execution stages of lengths 50 ns, 50 ns, 60 ns, 60 ns, 50 ns, and 50 ns.
- Find the instruction latency on this machine.
- How much time does it take to execute 100 instructions?

Solution:

Instruction latency = 50 + 50 + 60 + 60 + 50 + 50 = 320 ns
Time to execute 100 instructions = 100 * 320 = 32000 ns
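
The same arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation above; the names `STAGE_NS`, `nonpipelined_latency_ns` and `nonpipelined_total_ns` are made up for this example.

```python
# Stage lengths of the nonpipelined machine, in ns (taken from the example).
STAGE_NS = [50, 50, 60, 60, 50, 50]


def nonpipelined_latency_ns(stages):
    """One instruction runs through every stage in sequence."""
    return sum(stages)


def nonpipelined_total_ns(stages, n_instructions):
    """Instructions do not overlap, so the total is n times the latency."""
    return n_instructions * nonpipelined_latency_ns(stages)


print(nonpipelined_latency_ns(STAGE_NS))     # 320
print(nonpipelined_total_ns(STAGE_NS, 100))  # 32000
```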

Now suppose we introduce pipelining on this machine. Assume that clock skew then adds 5 ns of overhead to each execution stage.

- What is the instruction latency on the pipelined machine?
- How much time does it take to execute 100 instructions?

Solution:
Remember that in the pipelined implementation all pipe stages must have the same length, namely the length of the slowest stage plus the overhead. With 5 ns of overhead this comes to:

The length of pipelined stage = MAX(lengths of unpipelined stages) + overhead = 60 + 5 = 65 ns
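Instruction latency = 6 * 65 = 390 ns (each instruction still passes through all 6 stages, one 65 ns cycle per stage)
Time to execute 100 instructions = 65*6*1 + 65*1*99  = 390 + 6435 = 6825 ns (the first instruction needs 6 cycles; each of the remaining 99 completes one cycle after the previous one)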
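
The pipelined timing can be checked the same way. This is a sketch under the assumptions of the example (no stalls, one instruction entering the pipeline per cycle); the names used are hypothetical. It uses the equivalent closed form (depth + n - 1) * cycle: the first instruction needs one cycle per stage, and every later instruction finishes one cycle after its predecessor.

```python
# Stage lengths (ns) and per-stage overhead (ns), taken from the example.
STAGE_NS = [50, 50, 60, 60, 50, 50]
OVERHEAD_NS = 5


def pipelined_cycle_ns(stages, overhead_ns):
    """Every stage is stretched to the slowest stage plus the overhead."""
    return max(stages) + overhead_ns


def pipelined_total_ns(stages, overhead_ns, n_instructions):
    """Total time with no stalls: fill the pipe once, then one result per cycle."""
    cycle = pipelined_cycle_ns(stages, overhead_ns)
    depth = len(stages)
    return (depth + n_instructions - 1) * cycle


print(pipelined_cycle_ns(STAGE_NS, OVERHEAD_NS))       # 65
print(pipelined_total_ns(STAGE_NS, OVERHEAD_NS, 100))  # 6825
```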

- What is the speedup obtained from pipelining?
 
Solution:
Speedup is the ratio of the average instruction time without pipelining to the average instruction time with pipelining.
Here we do not consider stalls introduced by the different types of hazards, which we will look at in the next section.

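Average instruction time not pipelined = 320 ns
Average instruction time pipelined = 65 ns (once the pipeline is full, one instruction completes every cycle)
Speedup with the pipeline full = 320 / 65 = 4.92
Speedup for 100 instructions = 32000 / 6825 = 4.69 (slightly lower, because the 5 extra cycles needed to fill the pipeline are included)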
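
How the speedup depends on the number of instructions can be seen from a short sketch. It reuses the numbers of the example and assumes no stalls; the function name `speedup` and its parameters are chosen for illustration only.

```python
# Stage lengths (ns) and per-stage overhead (ns), taken from the example.
STAGE_NS = [50, 50, 60, 60, 50, 50]
OVERHEAD_NS = 5


def speedup(n_instructions, stages=STAGE_NS, overhead_ns=OVERHEAD_NS):
    """Ratio of nonpipelined to pipelined time for n instructions, no stalls."""
    unpipelined = n_instructions * sum(stages)
    cycle = max(stages) + overhead_ns
    pipelined = (len(stages) + n_instructions - 1) * cycle
    return unpipelined / pipelined


for n in (1, 10, 100, 10_000):
    print(n, round(speedup(n), 2))
```

For a single instruction the ratio is below 1 (320 / 390, about 0.82), which reflects the longer latency of the pipelined machine. For 100 instructions it is 4.69, and as the instruction count grows it approaches 320 / 65, about 4.92.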