The Basic Pipeline for DLX

We can pipeline the DLX datapath with almost no changes by starting a new instruction on each clock cycle. Each of the clock cycles of the DLX datapath now becomes a pipe stage: a cycle in the pipeline.
While each instruction still takes five clock cycles to complete, during each clock cycle the hardware initiates a new instruction and, once the pipeline is full, executes some part of five different instructions. The typical way to show what is going on is:
                       Clock cycle
Instr Num    1    2    3    4    5    6    7    8    9
instr i      IF   ID   EX   MEM  WB
instr i+1         IF   ID   EX   MEM  WB
instr i+2              IF   ID   EX   MEM  WB
instr i+3                   IF   ID   EX   MEM  WB
instr i+4                        IF   ID   EX   MEM  WB
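
The diagram follows a simple rule: instruction i+k enters IF in clock cycle k+1 and then advances one stage per cycle. A minimal Python sketch of that rule is given here; the names stage_at and diagram are illustrative only and are not part of DLX.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(instr, cycle):
    """Stage occupied by instruction `instr` (0-based) in clock `cycle` (1-based).

    Returns None when the instruction has not started yet or has already finished.
    """
    k = cycle - 1 - instr
    return STAGES[k] if 0 <= k < len(STAGES) else None

def diagram(n_instr):
    """Build the rows of the pipeline diagram for n_instr back-to-back instructions."""
    n_cycles = n_instr + len(STAGES) - 1
    rows = []
    for i in range(n_instr):
        label = "instr i" if i == 0 else f"instr i+{i}"
        cells = [(stage_at(i, c) or "").ljust(5) for c in range(1, n_cycles + 1)]
        rows.append(label.ljust(13) + "".join(cells).rstrip())
    return rows

for row in diagram(5):
    print(row)
```

Reading one column of the output top to bottom gives the set of stages that are busy in that clock cycle; in cycle 5 all five stages are occupied.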

Let's check again what happens on every clock cycle of the machine and make sure that it does not perform two different operations with the same datapath resource in the same clock cycle. For example, a single ALU cannot compute an effective address and perform a subtract operation at the same time.
Fortunately, the simplicity of the DLX instruction set makes resource evaluation relatively easy. The major functional units are used in different cycles and hence overlapping the execution of multiple instructions introduces relatively few conflicts.
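
This check can be done mechanically. The sketch below assigns each stage the functional unit it uses and looks for any unit claimed by two instructions in the same cycle. The resource names and the function find_conflicts are assumptions made for illustration, and the model is pessimistic: it treats every instruction as if it accessed memory in MEM, which is only true of loads and stores. A hypothetical machine with one shared memory is included for contrast.

```python
from collections import defaultdict

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Assumed mapping from pipe stage to the functional unit it occupies.
SPLIT_MEMORY = {
    "IF": {"instruction memory"},
    "ID": {"register read ports"},
    "EX": {"ALU"},
    "MEM": {"data memory"},
    "WB": {"register write port"},
}

# Hypothetical variant: instruction fetch and data access share one memory.
SHARED_MEMORY = dict(SPLIT_MEMORY, IF={"memory"}, MEM={"memory"})

def find_conflicts(usage, n_instr=5):
    """Return sorted (cycle, resource) pairs claimed by more than one instruction."""
    claims = defaultdict(int)
    for instr in range(n_instr):
        for k, stage in enumerate(STAGES):
            cycle = instr + k + 1  # instruction `instr` is in stage k at this cycle
            for resource in usage[stage]:
                claims[(cycle, resource)] += 1
    return sorted(key for key, count in claims.items() if count > 1)

print(find_conflicts(SPLIT_MEMORY))   # no unit is claimed twice in a cycle
print(find_conflicts(SHARED_MEMORY))  # IF of a later instruction meets MEM of an earlier one
```

With five instructions, the shared-memory variant reports conflicts in cycles 4 and 5, where instr i+3 and instr i+4 are fetched while instr i and instr i+1 are in MEM.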


There are three observations on which this fact rests:

The basic datapath uses separate instruction and data memories. This eliminates a conflict for a single memory that would arise between instruction fetch and data memory access.
The register file is used in two stages: for reading in ID and for writing in WB. This means that we need to perform two reads and one write on every clock cycle. Question for you: What if a read and a write are to the same register?
To start a new instruction every clock cycle, we must increment and store the PC every clock cycle, and this must be done during the IF stage in preparation for the next instruction. A problem arises when we consider the effect of branches, which also change the PC, but not until the MEM stage.
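
The third observation can be made concrete by counting how many instructions are fetched from the sequential path before a branch can redirect the PC. The sketch below assumes that the new PC is written at the end of the cycle in which the branch occupies its resolving stage and is first used for a fetch in the following cycle. The function name and the resolve_stage parameter are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def sequential_fetches_after_branch(branch_fetch_cycle, resolve_stage="MEM"):
    """Count fetches made from PC+4, PC+8, ... after a branch is fetched.

    Assumes the PC is updated at the end of the cycle in which the branch
    is in `resolve_stage`, so the target is fetched one cycle later.
    """
    resolve_cycle = branch_fetch_cycle + STAGES.index(resolve_stage)
    first_target_fetch = resolve_cycle + 1
    # Every cycle strictly between the branch's own fetch and the target's
    # fetch starts an instruction from the sequential path.
    return first_target_fetch - branch_fetch_cycle - 1

print(sequential_fetches_after_branch(1))  # branch fetched in cycle 1, resolved in MEM
```

Under these assumptions a branch fetched in cycle 1 is in MEM in cycle 4, so the instructions fetched in cycles 2, 3 and 4 come from the sequential path whether or not the branch is taken.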

Pipelining the datapath requires that values passed from one pipe stage to the next must be placed in registers, called pipeline registers.

Figure: Pipelined datapath

All registers needed to hold values temporarily between clock cycles within one instruction are subsumed into these pipeline registers. The pipeline registers carry both data and control from one pipeline stage to the next.
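
One way to picture the pipeline registers is as four records, one between each pair of adjacent stages. The sketch below is only an illustration: the field names follow the register-transfer notation used in this section (IR, NPC, A, B, Imm, ALUoutput, cond, LMD), while the class names and the Python layout are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class IFID:
    IR: int = 0    # fetched instruction
    NPC: int = 0   # next sequential PC

@dataclass
class IDEX:
    IR: int = 0
    NPC: int = 0
    A: int = 0     # first register operand
    B: int = 0     # second register operand
    Imm: int = 0   # sign-extended immediate

@dataclass
class EXMEM:
    IR: int = 0
    ALUoutput: int = 0
    B: int = 0         # store data
    cond: bool = False # branch condition

@dataclass
class MEMWB:
    IR: int = 0
    ALUoutput: int = 0
    LMD: int = 0   # load memory data

@dataclass
class PipelineRegisters:
    if_id: IFID = field(default_factory=IFID)
    id_ex: IDEX = field(default_factory=IDEX)
    ex_mem: EXMEM = field(default_factory=EXMEM)
    mem_wb: MEMWB = field(default_factory=MEMWB)

regs = PipelineRegisters()
regs.if_id.IR = 0x1234
regs.id_ex.IR = regs.if_id.IR  # the instruction word travels with the instruction
```

The IR field appears in every record because later stages still need the instruction word, for example to find the destination register in WB.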

Any instruction is active in exactly one stage of the pipeline at a time; therefore, any action taken on behalf of an instruction occurs between a pair of pipeline registers. The following table shows what happens in any pipeline stage depending on the instruction type.
 
Stage  Any instruction
IF     IF/ID.IR <- Mem[PC];
       IF/ID.NPC, PC <- (if EX/MEM.cond {EX/MEM.ALUoutput} else {PC+4});
ID     ID/EX.A <- Regs[IF/ID.IR_{6..10}];
       ID/EX.B <- Regs[IF/ID.IR_{11..15}];
       ID/EX.NPC <- IF/ID.NPC;
       ID/EX.IR <- IF/ID.IR;
       ID/EX.Imm <- (IF/ID.IR_{16})^{16} ## IF/ID.IR_{16..31};

Stage  Instruction type            Actions
EX     ALU instruction             EX/MEM.IR <- ID/EX.IR;
                                   EX/MEM.ALUoutput <- ID/EX.A op ID/EX.B;
                                   or
                                   EX/MEM.ALUoutput <- ID/EX.A op ID/EX.Imm;
                                   EX/MEM.cond <- 0;
       Load or Store instruction   EX/MEM.IR <- ID/EX.IR;
                                   EX/MEM.ALUoutput <- ID/EX.A + ID/EX.Imm;
                                   EX/MEM.cond <- 0;
                                   EX/MEM.B <- ID/EX.B;
       Branch instruction          EX/MEM.ALUoutput <- ID/EX.NPC + ID/EX.Imm;
                                   EX/MEM.cond <- (ID/EX.A op 0);
MEM    ALU instruction             MEM/WB.IR <- EX/MEM.IR;
                                   MEM/WB.ALUoutput <- EX/MEM.ALUoutput;
       Load or Store instruction   MEM/WB.IR <- EX/MEM.IR;
                                   MEM/WB.LMD <- Mem[EX/MEM.ALUoutput];
                                   or
                                   Mem[EX/MEM.ALUoutput] <- EX/MEM.B;
WB     ALU instruction             Regs[MEM/WB.IR_{16..20}] <- MEM/WB.ALUoutput;
                                   or
                                   Regs[MEM/WB.IR_{11..15}] <- MEM/WB.ALUoutput;
       Load or Store instruction   Regs[MEM/WB.IR_{11..15}] <- MEM/WB.LMD;
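
To see the transfers in the table at work, the sketch below carries a single register-immediate ALU instruction through the five stages, one set of pipeline-register writes per stage. It is a simplified illustration, not a DLX simulator: it handles one instruction at a time with no overlap, it assumes bit 0 is the most significant bit of the instruction word, the opcode bits are left at zero as a placeholder, and the ALU operation is passed in directly instead of being decoded. The helper names bits, sign_extend_imm and run_alu_immediate are hypothetical.

```python
import operator

def bits(word, first, last):
    """Bits first..last of a 32-bit word, counting bit 0 as the most significant."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)

def sign_extend_imm(ir):
    """Sign-extend the 16-bit immediate: copy bit 16 into the upper 16 bits."""
    upper = 0xFFFF0000 if bits(ir, 16, 16) else 0
    return upper | bits(ir, 16, 31)

def run_alu_immediate(ir, pc, regs, op):
    """Apply the IF, ID, EX, MEM and WB transfers for one ALU-immediate instruction."""
    # IF: fetch the instruction and compute the next sequential PC.
    if_id = {"IR": ir, "NPC": pc + 4}
    # ID: read both register operands and sign-extend the immediate.
    id_ex = {
        "A": regs[bits(if_id["IR"], 6, 10)],
        "B": regs[bits(if_id["IR"], 11, 15)],
        "NPC": if_id["NPC"],
        "IR": if_id["IR"],
        "Imm": sign_extend_imm(if_id["IR"]),
    }
    # EX: ALU operation on A and the immediate, truncated to 32 bits.
    ex_mem = {
        "IR": id_ex["IR"],
        "ALUoutput": op(id_ex["A"], id_ex["Imm"]) & 0xFFFFFFFF,
        "cond": 0,
    }
    # MEM: an ALU instruction only passes its result along.
    mem_wb = {"IR": ex_mem["IR"], "ALUoutput": ex_mem["ALUoutput"]}
    # WB: the destination of a register-immediate instruction is in bits 11..15.
    regs[bits(mem_wb["IR"], 11, 15)] = mem_wb["ALUoutput"]
    return regs

regs = [0] * 32
regs[1] = 100
# Source register 1 in bits 6..10, destination register 2 in bits 11..15,
# immediate 0xFFFC (that is, -4) in bits 16..31; opcode bits are a placeholder.
ir = (1 << 21) | (2 << 16) | 0xFFFC
run_alu_immediate(ir, pc=0, regs=regs, op=operator.add)
print(regs[2])  # 100 + (-4) = 96
```

Note that the destination register is not read until WB, which is why the instruction word has to be carried in every pipeline register along the way.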