Pipeline Stall Related to BNE Execution and Label

Below is the solution related to a pipeline question.

After reading the solution, I have a question.

Why is the EX stage of the first line, bne $7, $0, L1, in the same cycle as the IF of the last line, L1: sw $8, 0($3)? As I understand it, before fetching the last line, the pipeline should wait until bne has finished evaluating its condition and knows whether that instruction needs to be fetched at all.

Any hint is appreciated. Thanks so much for your time and help.

According to https://en.wikipedia.org/wiki/Classic_RISC_pipeline#Control_hazards, classic MIPS resolves branches in the ID stage, and the branch-delay slot completely hides the front-end bubble (assuming the compiler can fill it with something other than a NOP).
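To make the "delay slot hides the bubble" claim concrete, here is a rough sketch of the arithmetic. It assumes a scalar in-order 5-stage pipeline that fetches one instruction per cycle and redirects fetch in the cycle right after the branch leaves the resolving stage. The function name `wasted_fetch_slots` is made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def wasted_fetch_slots(resolve_stage: str, delay_slots: int) -> int:
    """Wrong-path fetches behind a taken branch (hypothetical helper).

    Assumes one fetch per cycle, and that the corrected fetch happens in
    the cycle right after the branch occupies resolve_stage.
    """
    # While the branch moves from IF to resolve_stage, this many younger
    # instructions get fetched behind it.
    fetched_behind = STAGES.index(resolve_stage)
    # Delay-slot instructions execute either way, so they are not wasted.
    return max(0, fetched_behind - delay_slots)

print(wasted_fetch_slots("ID", delay_slots=1))  # classic MIPS: prints 0
print(wasted_fetch_slots("EX", delay_slots=1))  # prints 1
print(wasted_fetch_slots("EX", delay_slots=0))  # prints 2
```

With resolution in ID and one delay slot, nothing fetched behind the branch is thrown away, which is the classic MIPS case described above.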

Even if that wasn't true and the branch did need to wait for EX to be resolved, the CPU can speculatively fetch and decode the later instructions; none of them reach MEM or WB before the correct branch direction is detected so they have no permanent effect on the architectural state. (In fact none of them even reach EX, so there's no speculative execution at all, just speculative decode).

If EX detected that the branch should have been taken, then the pipeline would have to restart fetch of the sw instruction without the jr in the pipe. (The add stays because it's in the branch-delay slot: it's executed in both cases.)
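As a sanity check on that re-steer, here is a toy cycle-by-cycle model. It is only a sketch under stated assumptions: no data hazards, one delay slot, fetch continues sequentially until the branch resolves, and only bne is modelled as a branch (jr is treated as an ordinary instruction for fetch purposes). The operands of add and jr aren't given in the question, so only the mnemonics appear, and the assumption that the four instructions are adjacent is mine.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# (name, index of the branch target, or None for a non-branch).
# Assumed layout: bne, its delay-slot add, jr, then L1: sw.
PROGRAM = [("bne", 3), ("add", None), ("jr", None), ("sw", None)]

def simulate(program, taken, resolve_stage):
    """Return ({name: [(cycle, stage), ...]}, [squashed names])."""
    resolve = STAGES.index(resolve_stage)
    pipe = [None] * len(STAGES)   # pipe[i] = program index in stage i
    pc, cycle = 0, 0
    timeline, squashed = {}, []
    while True:
        cycle += 1
        pipe = [None] + pipe[:-1]          # everything advances one stage
        if pc < len(program):              # fetch the next sequential instr
            pipe[0] = pc
            pc += 1
        if all(slot is None for slot in pipe):
            break                          # pipeline has drained
        for stage, idx in enumerate(pipe):
            if idx is not None:
                name = program[idx][0]
                timeline.setdefault(name, []).append((cycle, STAGES[stage]))
        # Resolve the branch at the end of the cycle it spends in `resolve`.
        idx = pipe[resolve]
        if idx is not None and program[idx][1] is not None and taken:
            for s in range(resolve):
                # Anything younger than the delay slot is wrong-path.
                if pipe[s] is not None and pipe[s] > idx + 1:
                    squashed.append(program[pipe[s]][0])
                    pipe[s] = None
            pc = program[idx][1]           # re-steer fetch to the target
    return timeline, squashed

timeline, squashed = simulate(PROGRAM, taken=True, resolve_stage="EX")
print(squashed)           # ['jr']
print(timeline["sw"][0])  # (4, 'IF')
```

In this model bne is in EX in cycle 3 while jr is being fetched; jr is squashed, add (the delay slot) carries on, and sw is fetched in cycle 4. Calling `simulate(PROGRAM, taken=True, resolve_stage="ID")` instead fetches sw in cycle 3 with nothing squashed.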

Further reading: "difference between speculation and prediction", and also this unclearly-worded question, "Out-of-order execution vs. speculative execution". Hadi's nice answer covers a range of things CPUs can do before they're sure which way a branch goes.

Simply fetching and decoding instructions based on branch prediction, but not executing them, is one of the easier ones, and many people don't consider it speculative execution at all. It is still speculation that requires a pipeline flush / re-steer on a wrong guess, unlike stalling until the correct fetch address is known for sure. (Without a branch delay slot, you can't even detect a branch in decode until you've already fetched an instruction from potentially the wrong path. In deeper / wider pipelines, branch prediction is important for predicting the next fetch-block address even before decode has figured out if there are any branches in the current block. This is separate from a detailed prediction of where a specific branch instruction goes.)
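A back-of-the-envelope way to compare the two policies, as a sketch only: assume a wrong guess costs the same number of fetch cycles as a stall would, and that there is no delay slot. The function name and the 10% mispredict rate are made-up illustrative values, not measurements.

```python
def expected_lost_cycles(penalty_cycles: int, mispredict_rate: float) -> dict:
    """Average fetch cycles lost per branch (hypothetical model).

    Stalling pays the penalty on every branch; speculative fetch pays
    it only when the guess turns out wrong and the front end re-steers.
    """
    return {
        "stall": float(penalty_cycles),
        "predict": penalty_cycles * mispredict_rate,
    }

# Branch resolved in EX with no delay slot: 2 fetch cycles at stake.
print(expected_lost_cycles(penalty_cycles=2, mispredict_rate=0.1))
```

Under those assumptions, speculative fetch is never worse than stalling and is better whenever the guess is sometimes right, which is why even simple pipelines keep fetching instead of waiting.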

The weird thing with this diagram is that it shows jr and sw in the same stage in the same cycle. That makes no sense, and the sw can't stall before fetch even reaches it.

Is this for a taken-branch case? That wouldn't make sense either, because then jr shouldn't be in the pipeline at all. And sw can't stall on the same cycle where add is in the fetch stage.