MIPS Pipeline Stages

  • IF: Instruction Fetch
  • ID: Instruction Decode & Register Read
  • EX: Execute operation or calculate address
  • MEM: Access operand in data memory
  • WB: Write back the result into a register

Pipeline Registers

Datapath

Store data used by the “current” instruction, and pass it down step by step. Here is what they store at the end of a cycle

  • IF Stage, IF/ID stores
    • Instruction read from InstructionMemory[PC] (Split into register numbers and 16-bit offset to be sign extended)
    • PC+4
  • ID Stage, ID/EX stores
    • Data read from register file
    • 32-bit immediate value
    • PC+4
    • Write register number
  • EX Stage, EX/MEM stores
    • PC+4 + Immediate 4
    • ALU result
    • isZero? signal
    • Data read 2 from register file
    • Write register number
  • MEM Stage, MEM/WB stores
    • ALU result
    • Memory read data
    • Write register number
  • WB Stage
    • Result is written back to register file if applicable

Control

Control signals are generated in the ID stage, and passed down through the pipeline registers: WB, MEM and EX control are all passed to ID/EX, then WB and MEM are passed to EX/MEM etc.

Performance Comparison

Single Cycle Processor (1 instr takes 1 cycle)

  • Cycle time is the time taken by the slowest instruction:
    • : Time for operation in stage (e.g. IF, ID stage)
    • : Number of stages
  • Execution time for instructions:

Multi Cycle Processor (1 instr at a time, many cycles per instr)

  • Cycle time is the time taken by the slowest stage:
  • Execution time for instructions:
    • Average cycles per instruction needed since instructions take different number of cycles

Pipelining Processor

  • Cycle time is time taken by slowest stage + overhead for pipelining:
    • : Overhead for pipelining e.g. pipeline register
  • Cycles needed for instructions: (need cycles to fill up the pipeline)
  • Execution time for instructions:
  • Analysing ideal case:
    • Every stage takes same amount of time →
    • No pipeline overhead →
    • Number of instructions executed much greater than number of stages
    • Pipeline speedup =
    • So pipeline processor can have a times speedup, where is the number of stages

Hazards

A hazard is something that prevents the next instruction from immediately executing after the previous one

Structural

Simultaneous use of hardware resource

  • Reading instruction from memory at IF stage access memory at the same time as MEM stage of previous instruction
    • Solved by either: delaying the instruction by 3 clock cycles
    • or having a separate instruction memory and data memory (this is what we have)
  • Writing to register in WB stage at the same time as reading in ID stage
    • Solved by splitting the clock cycle in half, write in first half, read in second half
    • This order is used because the later instruction could depend on the data written by the earlier one

Data

Dependencies between instructions (instruction reading from previous instruction’s result) WAR & WAW dependencies exist, but will not cause problem here

  • RAW read after write: instruction reads register written by previous instruction (aka true data dependency)
    • e.g. add $1, $2, $3 then sub $4, $1, $5 the issue is $1
    • Depending on non-lw instruction: required data is already available, just not written to register yet, so it can be forwarded from one stage to another
    • Depending on lw instruction: data is not available when required, so delay execution, then forward data
    • Examples
sub $2,  $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

Without forwarding: With forwarding:

lw  $2,  20($3)
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

Without forwarding: With forwarding:

Control

When there is a change in program flow (branch) When there is a branch instruction, decision whether to branch is made in the MEM stage. By then, the subsequent instructions could have partially executed

  • Solution 1: Stall pipeline (by 3 cycles)
    • This is bad because many instructions are branch instructions, so this will lead to a big slow down
  • Solution 2: Early branch resolution (compute branch decision early)
    • Add a comparator to the ID stage, so that the branch result can be known after ID stage
    • Then only need to stall for 1 cycle while waiting for ID stage of branch instruction
    • But the register that decides whether to branch might be written by the previous instruction, causing issues
      • So, add forwarding path from EX to ID stage, then add 1 cycle of delay so the register value is available
      • If previous instruction is lw, need to add forwarding path from MEM to ID, and more delay, but this is the same 3 cycle delay as no early branch resolution
      • Summary: branch → 1 cycle delay. Depend on previous instruction → 2 cycle delay. Depend on previous lw instruction → 3 cycle delay
  • Solution 3: Branch prediction (only simplest implementation in 2100: predict that branch is not taken)
    • Right prediction: no stall, since we can execute as per normal
    • Wrong prediction: stop executing, flush the pipeline and execute the correct instruction in the next cycle (1 cycle delay with early branch resolution, 3 cycle delay without)
  • Solution 4: Delayed branch Move instructions that do not affect the branch (non-control dependent instructions) to after the branch instruction. They can be executed while waiting for the branch to be decided
    • The number of instructions that can be moved to after the branch: Branch-Delay slot (early branch: 1, no early branch: 3)
    • Best case: there is an instruction that can be moved into branch delay slot
    • Worst case: nothing can be moved there, so add a nop instruction in the slot
    • The compiler has to do this
or  $8, $9, $10 //does not affect branch
add $1, $2, $3
sub $4, $5, $6
beq $1, $4, Exit
xor $10, $1, $11
Exit:

becomes

add $1, $2, $3
sub $4, $5, $6
beq $1, $4, Exit
or  $8, $9, $10
xor $10, $1, $11
Exit: