Just to make it clear, I'm not going for any sort of portability here, so any solutions that will tie me to a certain box is fine.
Basically, I have an if statement that will 99% of the time evaluate to true, and am trying to eke out every last clock of performance, can I issue some sort of compiler command (using GCC 4.1.2 and the x86 ISA, if it matters) to tell the branch predictor that it should cache for that branch?
I suggest rather than worry about branch prediction, profile the code and optimize the code to reduce the number of branches. One example is loop unrolling and another using boolean programming techniques rather than using
if
statements.Most processors love to prefetch statements. Generally, a branch statement will generate a fault within the processor causing it to flush the prefetch queue. This is where the biggest penalty is. To reduce this penalty time, rewrite (and design) the code so that fewer branches are available. Also, some processors can conditionally execute instructions without having to branch.
I've optimized a program from 1 hour of execution time to 2 minutes by using loop unrolling and large I/O buffers. Branch prediction would not have offered much time savings in this instance.