MIPS Pipeline Stages
IF: Instruction FetchID: Instruction Decode & Register ReadEX: Execute operation or calculate addressMEM: Access operand in data memoryWB: Write back the result into a register
Pipeline Registers
Datapath
Store data used by the “current” instruction, and pass it down step by step. Here is what they store at the end of a cycle
IFStage,IF/IDstores- Instruction read from
InstructionMemory[PC](Split into register numbers and 16-bit offset to be sign extended) - PC+4
- Instruction read from
IDStage,ID/EXstores- Data read from register file
- 32-bit immediate value
- PC+4
- Write register number
EXStage,EX/MEMstores- PC+4 + Immediate 4
- ALU result
- isZero? signal
- Data read 2 from register file
- Write register number
MEMStage,MEM/WBstores- ALU result
- Memory read data
- Write register number
WBStage- Result is written back to register file if applicable
Control
Control signals are generated in the ID stage, and passed down through the pipeline registers: WB, MEM and EX control are all passed to ID/EX, then WB and MEM are passed to EX/MEM etc.

Performance Comparison
Single Cycle Processor (1 instr takes 1 cycle)
- Cycle time is the time taken by the slowest instruction:
- : Time for operation in stage (e.g.
IF,IDstage) - : Number of stages
- : Time for operation in stage (e.g.
- Execution time for instructions:
Multi Cycle Processor (1 instr at a time, many cycles per instr)
- Cycle time is the time taken by the slowest stage:
- Execution time for instructions:
- Average cycles per instruction needed since instructions take different number of cycles
Pipelining Processor
- Cycle time is time taken by slowest stage + overhead for pipelining:
- : Overhead for pipelining e.g. pipeline register
- Cycles needed for instructions: (need cycles to fill up the pipeline)
- Execution time for instructions:
- Analysing ideal case:
- Every stage takes same amount of time →
- No pipeline overhead →
- Number of instructions executed much greater than number of stages
- Pipeline speedup =
- So pipeline processor can have a times speedup, where is the number of stages
Hazards
A hazard is something that prevents the next instruction from immediately executing after the previous one
Structural
Simultaneous use of hardware resource
- Reading instruction from memory at
IFstage access memory at the same time asMEMstage of previous instruction- Solved by either: delaying the instruction by 3 clock cycles
- or having a separate instruction memory and data memory (this is what we have)
- Writing to register in
WBstage at the same time as reading inIDstage- Solved by splitting the clock cycle in half, write in first half, read in second half
- This order is used because the later instruction could depend on the data written by the earlier one
Data
Dependencies between instructions (instruction reading from previous instruction’s result) WAR & WAW dependencies exist, but will not cause problem here
- RAW read after write: instruction reads register written by previous instruction (aka true data dependency)
- e.g.
add $1, $2, $3thensub $4, $1, $5the issue is$1 - Depending on non-
lwinstruction: required data is already available, just not written to register yet, so it can be forwarded from one stage to another 
- Depending on
lwinstruction: data is not available when required, so delay execution, then forward data
Examples
- e.g.
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)Without forwarding:
With forwarding:

lw $2, 20($3)
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)Without forwarding:
With forwarding:

Control
When there is a change in program flow (branch)
When there is a branch instruction, decision whether to branch is made in the MEM stage. By then, the subsequent instructions could have partially executed
- Solution 1: Stall pipeline (by 3 cycles)
- This is bad because many instructions are branch instructions, so this will lead to a big slow down
- Solution 2: Early branch resolution (compute branch decision early)
- Add a comparator to the
IDstage, so that the branch result can be known afterIDstage - Then only need to stall for 1 cycle while waiting for
IDstage of branch instruction - But the register that decides whether to branch might be written by the previous instruction, causing issues
- So, add forwarding path from
EXtoIDstage, then add 1 cycle of delay so the register value is available 
- If previous instruction is
lw, need to add forwarding path fromMEMtoID, and more delay, but this is the same 3 cycle delay as no early branch resolution 
- Summary:
branch→ 1 cycle delay. Depend on previous instruction → 2 cycle delay. Depend on previouslwinstruction → 3 cycle delay
- So, add forwarding path from
- Add a comparator to the
- Solution 3: Branch prediction (only simplest implementation in 2100: predict that branch is not taken)
- Right prediction: no stall, since we can execute as per normal
- Wrong prediction: stop executing, flush the pipeline and execute the correct instruction in the next cycle (1 cycle delay with early branch resolution, 3 cycle delay without)
- Solution 4: Delayed branch Move instructions that do not affect the branch (non-control dependent instructions) to after the branch instruction. They can be executed while waiting for the branch to be decided
- The number of instructions that can be moved to after the branch: Branch-Delay slot (early branch: 1, no early branch: 3)
- Best case: there is an instruction that can be moved into branch delay slot
- Worst case: nothing can be moved there, so add a
nopinstruction in the slot - The compiler has to do this
or $8, $9, $10 //does not affect branch
add $1, $2, $3
sub $4, $5, $6
beq $1, $4, Exit
xor $10, $1, $11
Exit:
becomes
add $1, $2, $3
sub $4, $5, $6
beq $1, $4, Exit
or $8, $9, $10
xor $10, $1, $11
Exit: