Why are conditionally executed instructions not pr

2019-03-18 01:05发布

Naively, conditionally executed instructions seem like a great idea to me.

As I read more about ARM (and ARM-like) instruction sets (Thumb2, Unicore, AArch64) I find that they all lack the bits for conditional execution.

Why is conditional execution missing from each of these?

Was conditional execution a mistake at the time, or have subsequent changes made it an expensive waste of instruction bits?

7条回答
爱情/是我丢掉的垃圾
2楼-- · 2019-03-18 01:37

Conditional execution is a good choice in implementation of many auxiliary or bit-twiddling routines, such as sorting, list or tree manipulation, number to string conversion, sqrt or long division. We could add UART drivers and extracting bit fields in routers. Those have a high branch to non-branch ratio with somewhat high unpredictability too.

However, once you get beyond the lowest level of services (or increase the abstraction level by using a higher level language), the code looks completely different: code blocks inside different branches of conditions consists more of moving data and calling sub-routines. Here the benefits of those extra 4 bits rapidly fade away. It's not only personal development but cultural: Culturally programming has grown from unstructured (Basic, Fortran, Assembler) towards structural. Different programming paradigms are supported better also in different instruction set architectures.

A technological compromise could have been the possibility to compress the five bit 'cond.S' field to four or three most frequently used combinations.

查看更多
叛逆
3楼-- · 2019-03-18 01:42

It's somewhat misleading to say that conditional execution is not present in ARMv8. The issue is to understand why you don't want to execute some instructions. Perhaps in the early ARM days, the actual non-execution of instructions mattered (for power or whatever) but today the significance of this feature is that it allows you to avoid branches for small dumb jumps, for example code like a=(b>0? 1: 2). This sort of thing is more common than you might imagine --- conceptually it's things like MAX/MIN or ABS (though for some CPUs there may be instructions to do these particular tasks).

In ARMv8, while there are not general conditionally executed instructions there are a few instructions that perform the specific task I am describing, namely allowing you to avoid branching for short dumb jumps; CSEL is the most obvious example, though there are other cases (e.g. conditional setting of conditions) to handle other common patterns (in that case the pattern of C short-circuited expression evaluation).

IMHO what ARM has done here is what makes the most sense. They've extracted the feature of conditional execution that remains valuable on modern CPUs (avoid many branches) while changing the details of the implementation to match the micro-architecture of modern CPUs.

查看更多
成全新的幸福
4楼-- · 2019-03-18 01:43

On the old ARM v4, the conditional instructions only saved time if there was a high probability that they would end up getting executed, or if the probability was about 50%, then if there were just 2 to 4 of them in a row. If they weren't getting executed, then it was wasting cycles to have to fetch past them, versus the overhead of using a branch to get past them. If they were being executed, the branch would be fetched but not executed.

A minor nuisance is that when debugging, placing a break on a conditional instruction always resulted in taking a break on that instruction, regardless of the condition (unless there's some really smart debugger that my company didn't have).

查看更多
Bombasti
5楼-- · 2019-03-18 01:43

Just like, the defer slot in mips being a trick (at the time), conditional execution in arm is a trick (at the time), as is the pc being two instructions ahead. Now down the road how much affect do they have? Will ARMs branch predictor actually make that much difference or is the real answer they needed more bits in a 32 bit instruction word and like thumb the first and easiest thing to get rid of is the condition bits.

it is not too difficult to do some performance tests to see how good or back the branch predictor really is, I tried it with unconditional branches on an arm11, granted that is an old architecture now but still in wide use. It was difficult at best to get the branch prediction to show any improvement, and in no way, shape, or form could it compete with the conditional execution. I have not repeated these experience on anything in the cortex-a family.

查看更多
smile是对你的礼貌
6楼-- · 2019-03-18 01:45

General claim is modern systems have better branch predictors and compilers are much more advanced so their cost on instruction encoding space is not justified.

This is from ARMv8 Instruction Set Overview

The A64 instruction set does not include the concept of predicated or conditional execution. Benchmarking shows that modern branch predictors work well enough that predicated execution of instructions does not offer sufficient benefit to justify its significant use of opcode space, and its implementation cost in advanced implementations.

And it continues

A very small set of “conditional data processing” instructions are provided. These instructions are unconditionally executed but use the condition flags as an extra input to the instruction. This set has been shown to be beneficial in situations where conditional branches predict poorly, or are otherwise inefficient.

Another paper titled Trading Conditional Execution for More Registers on ARM Processors claims:

... conditional execution takes up precious instruction space as conditions are encoded into a 4-bit condition code selector on every 32-bit ARM instruction. Besides, only small percentages of instructions are actually conditionalized in modern embedded applications, and conditional execution might not even lead to performance improvement on modern embedded processors.

查看更多
再贱就再见
7楼-- · 2019-03-18 01:56

One of the reasons is that because of encoding.

In thumb, you cannot squeeze more 4 bits into the tight 16-bit space while there isn't even enough room for the 3 high bit of the registers and they must be reduced to a subset of only 8 registers. Note that in thumb2 you have a separate IT(E) instruction for selecting the conditions for the next 4 instructions. You can't store the condition in the same instruction though, because of the reason stated above.

For AArch64 the number of registers has been doubled compared to 32-bit ARM, but again you don't have any remaining bits for the new 3 high bits of the registers. If you want to use the old encoding then you must "borrow" either from the narrow 12-bit immediate or the 4-bit condition. 12-bit is too small compared to other RISC architectures such as MIPS and reducing it making everything worse, so removing the condition is a better choice. Because branch prediction has become more and more advanced, it won't be much a problem

查看更多
登录 后发表回答