
What is the rationale for limitations on pointer arithmetic or comparison?

Posted on 2019-01-22 12:25

Question:

In C/C++, addition or subtraction on a pointer is defined only if the resulting pointer lies within the original pointed-to complete object. Moreover, relational comparison of two pointers is only defined if the two pointed-to objects are subobjects of a unique complete object.

What are the reasons for such limitations?

I supposed that the segmented memory model (see here §1.2.1) could be one of the reasons, but since compilers can actually define a total order on all pointers, as demonstrated by this answer, I am doubting this.
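For concreteness, here is a minimal sketch of what these rules allow and forbid (the problematic operations are left commented out so the program itself stays well-defined):

```c
#include <stdio.h>

int main(void)
{
    int a[4] = {0, 1, 2, 3};
    int b[4] = {0};

    int *one_past = a + 4;      /* defined: one past the end of a           */
    /* int *bad = a + 5; */     /* undefined: result lies outside a         */
    /* if (a < b) { ... } */    /* undefined (C) / unspecified (C++): a and */
                                /* b are distinct complete objects          */

    int *mid = a + 2;
    printf("%td\n", mid - a);   /* defined: both point into a; prints 2     */

    (void)one_past;
    (void)b;
    return 0;
}
```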

Answer 1:

There are architectures where program and data spaces are separated, and it's simply impossible to subtract two arbitrary pointers. A pointer to a function or to const static data will be in a completely different address space from a normal variable.

Even if you arbitrarily supplied a ranking between different address spaces, there's a possibility that the ptrdiff_t type would need to be larger. And the process of comparing or subtracting two pointers would be greatly complicated. That's a bad idea in a language that is designed for speed.



Answer 2:

You only prove that the restriction could be removed, but you miss that it would come with a cost (in terms of memory and code), which was contrary to the goals of C.

Specifically, the difference needs to have a type, which is ptrdiff_t, and one would assume it is similar to size_t.

In a segmented memory model you (normally) indirectly have a limitation on the sizes of objects, assuming that the answers in What's the real size of `size_t`, `uintptr_t`, `intptr_t` and `ptrdiff_t` type on 16-bit systems using segmented addressing mode? are correct.

Thus, at least for differences, removing that restriction would not only add extra instructions to ensure a total order for an unimportant corner case (as in the other answer), but would also double the amount of memory spent on differences, etc.
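To make the size trade-off concrete, here is a small sketch (not from the original answer) that simply prints the widths the current implementation chose for size_t and ptrdiff_t; the rule that a difference is only taken within one object is what allows ptrdiff_t to stay no wider than the maximum object size:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

int main(void)
{
    /* On a 16-bit segmented implementation a single object cannot exceed one
     * segment, so ptrdiff_t can stay narrow; allowing differences between
     * pointers into arbitrary objects would force a wider, costlier type. */
    printf("sizeof(size_t)    = %zu, SIZE_MAX    = %ju\n",
           sizeof(size_t), (uintmax_t)SIZE_MAX);
    printf("sizeof(ptrdiff_t) = %zu, PTRDIFF_MAX = %jd\n",
           sizeof(ptrdiff_t), (intmax_t)PTRDIFF_MAX);
    return 0;
}
```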

C was designed to be more minimalistic and not to force the compiler to spend memory and code on such cases. (In those days memory limitations mattered more.)

Obviously there are also other benefits, like the possibility of detecting errors when mixing pointers from different arrays. Similarly, mixing iterators of two different containers is undefined in C++ (with some minor exceptions), and some debug implementations detect such errors.



Answer 3:

The reason is to keep the possibility of generating reasonable code. This applies to systems with a flat memory model as well as to systems with more complex memory models. If you forbid the (not very useful) corner cases, like adding or subtracting out of arrays and demanding a total order on pointers to different objects, you can skip a lot of overhead in the generated code.

The limitations imposed by the standard allow the compiler to make assumptions about pointer arithmetic and use them to improve the quality of the code. This covers both computing things statically in the compiler instead of at runtime and choosing which instructions and addressing modes to use. As an example, consider a program with two pointers p1 and p2. If the compiler can derive that they point to different data objects, it can safely assume that no operation based on following p1 will ever affect the object pointed to by p2. This allows the compiler to reorder loads and stores based on p1 without considering loads and stores based on p2, and the other way around.
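A minimal sketch of this kind of reasoning (the names are illustrative, not from the answer): because arithmetic on p must stay inside the complete object a, the compiler can prove the store never touches b and may reuse the earlier load of b[0] instead of reloading it:

```c
#include <stdio.h>

static int demo(void)
{
    int a[4] = {0, 1, 2, 3};
    int b[2] = {10, 20};
    int *p = a;
    const int *q = b;

    int before = q[0];   /* load b[0]                                 */
    p[1] = 42;           /* store that can only land inside a         */
    int after  = q[0];   /* compiler may reuse the value loaded above */

    return before + after;   /* 20 */
}

int main(void)
{
    printf("%d\n", demo());
    return 0;
}
```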



Answer 4:

The rationale is that some architectures have segmented memory, and pointers to different objects may point at different memory segments. The difference between the two pointers would then not necessarily be something meaningful.

This goes back all the way to pre-standard C. The C rationale doesn't mention this explicitly, but it hints at this being the reason, if we look at where it explains why using a negative array index is undefined behavior (C99 rationale 5.10, 6.5.6, emphasis mine):

In the case of p-1, on the other hand, an entire object would have to be allocated prior to the array of objects that p traverses, so decrement loops that run off the bottom of an array can fail. This restriction allows segmented architectures, for instance, to place objects at the start of a range of addressable memory.



Answer 5:

Since the C standard intends to cover the majority of processor architectures, it should also cover this one: imagine an architecture (I know one, but won't name it) where pointers are not just plain numbers, but are like structures or "descriptors". Such a structure contains information about the object it points into (its virtual address and size) and the offset within it. Adding to or subtracting from a pointer produces a new structure with only the offset field adjusted; producing a structure with an offset greater than the size of the object is prohibited by the hardware. There are other restrictions (such as how the initial descriptor is produced or what the other ways to modify it are), but they are not relevant to the topic.
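A hypothetical model of such a descriptor, written in plain C purely to illustrate the idea; on the real hardware these fields live in the pointer representation itself and the check is a hardware trap rather than an assert:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    unsigned char *base;   /* start of the complete object */
    size_t         size;   /* size of that object in bytes */
    size_t         offset; /* current position within it   */
} descriptor_ptr;

/* Arithmetic adjusts only the offset; leaving the object (except for the
 * one-past-the-end position) is rejected, mirroring the hardware trap. */
static descriptor_ptr descriptor_add(descriptor_ptr p, ptrdiff_t delta)
{
    ptrdiff_t new_offset = (ptrdiff_t)p.offset + delta;
    assert(new_offset >= 0 && (size_t)new_offset <= p.size);
    p.offset = (size_t)new_offset;
    return p;
}

int main(void)
{
    unsigned char buf[16];
    descriptor_ptr p = { buf, sizeof buf, 0 };

    p = descriptor_add(p, 8);       /* fine: still inside buf            */
    p = descriptor_add(p, 8);       /* fine: one past the end            */
    /* p = descriptor_add(p, 1); */ /* would assert: outside the object  */

    printf("offset = %zu\n", p.offset);
    return 0;
}
```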



Answer 6:

In most cases where the Standard classifies an action as invoking Undefined Behavior, it has done so because:

  1. There might be platforms where defining the behavior would be expensive. Segmented architectures could behave weirdly if code tries to do pointer arithmetic that extends beyond object boundaries, and some compilers may evaluate p > q by testing the sign of q-p.

  2. There are some kinds of programming where defining the behavior would be useless. Many kinds of code can get by just fine without relying upon forms of pointer addition, subtraction, or relational comparison beyond those given by the Standard.

  3. People writing compilers for various purposes should be capable of recognizing cases where quality compilers intended for such purposes should behave predictably, and handling such cases when appropriate, whether or not the Standard compels them to do so.

Both #1 and #2 are very low bars, and #3 was thought to be a "gimme". Although it has become fashionable for compiler writers to show off their cleverness by finding ways of breaking code whose behavior was defined by quality implementations intended for low-level programming, I don't think the authors of the Standard expected compiler writers to perceive a huge difference between actions which were required to behave predictably, versus those where nearly all quality implementations were expected to behave identically, but where it might conceivably be useful to let some arcane implementations do something else.



Answer 7:

I would like to answer this by inverting the question. Instead of asking why pointer addition and most of the arithmetic operations are not allowed, why do pointers allow only adding or subtracting an integer, post- and pre-increment and decrement, and comparison (or subtraction) of pointers pointing to the same array? It has to do with the logical consequence of the arithmetic operation. Adding or subtracting an integer n to a pointer p gives the address of the nth element from the currently pointed-to element, in either the forward or reverse direction. Similarly, subtracting p1 and p2 pointing to the same array gives the count of elements between the two pointers. The fact (or design decision) that pointer arithmetic is defined consistently with the type of the variable being pointed to is a real stroke of genius. Any operation other than the permitted ones defies programming or philosophically logical reasoning and is therefore not allowed.
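A small illustration of those permitted operations, showing that the arithmetic is scaled by the pointed-to type and that a difference counts elements rather than bytes:

```c
#include <stdio.h>

int main(void)
{
    double a[8] = {0};
    double *p = &a[2];
    double *q = p + 4;                       /* 4 elements past *p           */

    printf("elements apart: %td\n", q - p);  /* 4                            */
    printf("bytes apart:    %zu\n",
           (size_t)(q - p) * sizeof *p);     /* 4 * sizeof(double), e.g. 32  */
    return 0;
}
```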



Answer 8:

(Please do not upvote) This is a summary of the answer, and the reason why I chose Mark Ransom's answer and made this post: the answer is his post plus its following comment.

The question was: what are the reasons for such limitations (that pointer arithmetic and comparison must stay within a unique complete object)?

Ransom's comment + answer, summarized: since C has no concept of an address space, the only possibility for constraining pointer arithmetic to fall inside an address space was to constrain it to an object.

Krazy Glew's comment also provides a sociologically oriented answer.