There are some questions about branch prediction that I can't confidently figure out. Assume that I have to work with a static branch predictor.
- At which stage of the pipeline should branch prediction happen?
- How to know that a prediction has gone wrong? How does the datapath come to know that a misprediction has happened?
- If it comes to know that a misprediction has happened, how does it send the signal to take up the not-taken branch?
- After a misprediction, execution has to resume at the address that was not taken earlier. What if some memory write or register write has happened in the meantime? How is that prevented?
It will be very helpful even if some proper references with datapath in them are suggested. Thanks in advance.
I took my time reading the reference manual for the Cortex-A8: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/DDI0344K_cortex_a8_r3p2_trm.pdf
From section 5.1:
The processor contains program flow prediction hardware, also known as
branch prediction. With program flow prediction disabled, all taken
branches incur a 13-cycle penalty. With program flow prediction
enabled, all mispredicted branches incur a 13-cycle penalty.
Basically, this means that with prediction disabled the processor effectively always assumes branches are not taken: only taken branches pay the penalty. This is different from PowerPC, which has "special instructions" for hinting the processor about taken/not-taken branches (suffix +/-).
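To make the quoted numbers concrete, here is a rough back-of-the-envelope sketch. The 13-cycle figure comes from section 5.1 above; the function name, the 60% taken rate, and the 95% predictor accuracy are made-up values for illustration only, not measurements.

```python
MISPREDICT_PENALTY = 13  # cycles, taken from section 5.1 of the Cortex-A8 TRM


def avg_branch_penalty(fraction_penalized, penalty=MISPREDICT_PENALTY):
    """Average extra cycles per branch when the given fraction of branches pays the penalty."""
    if not 0.0 <= fraction_penalized <= 1.0:
        raise ValueError("fraction_penalized must be between 0 and 1")
    return fraction_penalized * penalty


# Hypothetical workload (assumed numbers):
# prediction disabled -> every taken branch pays; assume 60% of branches are taken.
penalty_without_prediction = avg_branch_penalty(0.60)
# prediction enabled -> only mispredicted branches pay; assume 5% are mispredicted.
penalty_with_prediction = avg_branch_penalty(0.05)

print(penalty_without_prediction, penalty_with_prediction)
```

Under these assumed rates, the average cost per branch drops by roughly an order of magnitude once prediction is enabled, which is the whole point of the hardware.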
From section 1.3.1:
The instruction fetch unit predicts the instruction stream, fetches
instructions from the L1 instruction cache, and places the fetched
instructions into a buffer for consumption by the decode pipeline.
- Instruction Fetch, the first stage, makes the prediction.
From section 7.6.2:
An instruction can remain in the pipeline between being fetched and
being executed. Because there can be several unresolved branches in
the pipeline, instruction fetches are speculative, meaning there is
no guarantee that they are executed. A branch or exceptional
instruction in the code stream can cause a pipeline flush, discarding the currently fetched instructions. Fetches or instruction
table walks that begin without an empty pipeline are marked
speculative. If the pipeline contains any instruction up to the
point of branch and exception resolution, then the pipeline is
considered not empty.
I interpret this to mean that nothing reaches the execution stage while a branch is being processed. If a misprediction occurs, as discovered when the branch is executed in Instruction Execute, all instructions in the pipeline are "flushed". They are never executed. That should answer questions 2 and 4. I'm not so sure how the "marking" is performed.
- I don't know how it sends the signal. As far as I can tell the reference manual does not cover that part. Guess it's magic.
(For the record, I find the PowerPC reference manuals (e500/e600) I'm used to much easier to understand because of the many instruction timing samples.)
I guess that there are many different mechanisms that are possible, but some quick answers:
- Branch prediction certainly needs to happen before the instructions are decoded, during the fetch stages. Otherwise, you're going to decode instructions that are not correct.
- You normally carry extra information along with the predicted branch instruction, such as the target that was predicted. The branch is then executed, and if the real target does not match the predicted target, you need to flush the pipe.
- It really depends on the implementation. If the branch is executed, you can use the real target, like a branch that was not predicted.
- You certainly need a mechanism to recover, or you have to wait for the branches to be resolved before you write the results. This will lose some time, but not as much as a branch that was not predicted.
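The points above can be sketched with a toy model. This is not how any real core is built, just a minimal two-stage (fetch/execute) simulation under several assumptions of my own: a static "always not taken" predictor, a made-up three-instruction ISA (`addi`, `beqz`, `halt`), register writes that happen only in the execute stage, and a program that ends with `halt` and has valid branch targets. The names `run`, `PROGRAM`, and `latch` are hypothetical.

```python
def run(program, regs):
    """Toy 2-stage pipeline with a static 'predict not taken' fetch stage.

    Returns (final_registers, number_of_flushes).
    """
    regs = list(regs)  # work on a copy
    pc = 0             # next address the fetch stage will read
    latch = None       # (addr, instr, predicted_next) between fetch and execute
    flushes = 0
    while True:
        redirect = None
        # Execute stage: the only place architectural state is written.
        if latch is not None:
            addr, instr, predicted_next = latch
            op = instr[0]
            if op == "halt":
                return regs, flushes
            if op == "addi":
                _, rd, imm = instr
                regs[rd] += imm
                actual_next = addr + 1
            elif op == "beqz":
                _, rs, target = instr
                actual_next = target if regs[rs] == 0 else addr + 1
            else:
                raise ValueError(f"unknown opcode: {op}")
            # Misprediction check: the prediction travelled with the instruction.
            if actual_next != predicted_next:
                redirect = actual_next
        # Fetch stage.
        if redirect is not None:
            # Flush: the wrong-path fetch becomes a bubble and never executes.
            flushes += 1
            pc = redirect
            latch = None
        else:
            # Static prediction: assume the next instruction is at pc + 1.
            latch = (pc, program[pc], pc + 1)
            pc += 1


PROGRAM = [
    ("addi", 0, 5),    # 0: r0 += 5
    ("beqz", 1, 3),    # 1: if r1 == 0 goto 3
    ("addi", 0, 100),  # 2: wrong path when the branch is taken
    ("addi", 0, 1),    # 3: r0 += 1
    ("halt",),         # 4
]

print(run(PROGRAM, [0, 0]))  # branch taken: one flush, instruction 2 never writes
print(run(PROGRAM, [0, 1]))  # branch not taken: prediction was right, no flush
```

The design choice that matters here is that nothing younger than the branch can write state before the branch is resolved, so a flush only has to discard pipeline contents. A deeper pipeline would need the same guarantee, either by delaying writes or by being able to undo them.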