Why does C have a distinction between -> and .?

Posted 2020-01-25 07:04

Question:

OK, this is of no serious consequence, but it's been bugging me for a while: Is there a reason for the distinction between the -> and . operators?

Of course, the current rule is that . acts on a struct, and -> acts on a pointer-to-struct (or union). But here's how it works in practice. Let s be a struct including an element x, and let ps be a pointer to a struct of the same form.

If you write

s->x

the compiler will spit out a warning in the way of

You meant s.x. Please retype that and recompile.

If you write

ps.x

the compiler will spit out a warning in the way of

You meant ps->x. Please retype that and recompile.

Because the compiler knows the type of both s and ps at compile time, it has all the information it needs to interpret what the correct operator would be. I suspect that this isn't like other warnings (like a missing semicolon), in that there is no ambiguity about the correct fix.

So here's a hypothetical proposal to the C1x standards committee (that would never be considered, because the ISO is on a conservative streak):

Given the expression lhs.rhs, if lhs is a struct or union type, then the expression shall refer to the element of lhs named rhs. If lhs is of type pointer-to-struct or -union, then this shall be interpreted as (*lhs).rhs.

This would certainly save us all time, and make it easier for people to learn C [and I've taught enough C to say with authority that learners find the -> thing to be either confusing or annoying.]

There's even precedent, where C does a handful of similar things. E.g., for implementation reasons, function designators are always implicitly converted to pointer-to-function, so f(x,y) and (*f)(x,y) will both work regardless of whether f was declared as a function or a pointer to function.

So, my question: what's wrong with this proposal? Can you think of examples where there would be fatal ambiguity between ps.x and s.x, or why keeping the mandatory distinction is otherwise useful?

Answer 1:

Well, if you really wanted to introduce that kind of functionality into the specification of C language, then in order to make it "blend" with the rest of the language the logical thing to do would be to extend the concept of "decay to pointer" to struct types. You yourself made an example with a function and a function pointer. The reason it works that way is because function type in C decays to pointer type in all contexts, except for sizeof and unary & operators. (The same thing happens to arrays, BTW.)

So, in order to implement something similar to what you suggest, we could introduce the concept of "struct-to-pointer decay", which would work in exactly the same way as all other "decays" in C (namely, array-to-pointer decay and function-to-pointer decay) work: when a struct object of type T is used in an expression, its type immediately decays to type T* - pointer to the beginning of the struct object - except when it's an operand of sizeof or unary &. Once such a decay rule is introduced for structs, you could use -> operator to access struct elements regardless of whether you have a pointer to struct or the struct itself on the left-hand side. Operator . would become completely unnecessary in this case (unless I'm missing something), you'd always use -> and only ->.

The above, once again, is what this feature would look like, in my opinion, if it was implemented in the spirit of C language.

But I'd say (agreeing with what Charles said) that the loss of visual distinction between the code that works with pointers to structs and the code that works with structs themselves is not exactly desirable.

P.S. An obvious negative consequence of such a decay rule for structs would be that besides the current army of newbies selflessly believing that "arrays are just constant pointers", we'd have an army of newbies selflessly believing that "struct objects are just constant pointers". And Chris Torek's array FAQ would have to be about 1.5-2x larger to cover structs as well :)



Answer 2:

I don't think there's anything crazy about what you've said. Using . for pointers to structs would work.

However, I like the fact that pointers to structs and structs are treated differently.

It gives some context about operations and clues as to what might be expensive.

Consider this snippet, imagine that it's in the middle of a reasonably large function.

s.c = 99;
f(s);

assert(s.c == 99);

Currently I can tell that s is a struct. I know that it's going to be copied in its entirety for the call to f. I also know that that assert can't fire.

If using . with pointers to struct were allowed, I wouldn't know any of that and the assert might fire, f might set s.c (err s->c) to something else.

The other downside is that it would reduce compatibility with C++. C++ allows -> to be overloaded by classes so that classes can be 'like' pointers. It's important that . and -> behave differently. "New" C code that used . with pointers to structs would probably not be acceptable as C++ code any more.



Answer 3:

Well there clearly isn't any ambiguity or the proposal couldn't be made. The only issue is that if you see:

p->x = 3;

you know p is a pointer but if you allow:

p.x = 3;

in that circumstance then you don't actually know, which could potentially create problems, particularly if you later cast that pointer and use the wrong number of levels of indirection.



Answer 4:

A distinguishing feature of the C programming language (as opposed to its relative C++) is that the cost model is very explicit. The dot is distinguished from the arrow because the arrow requires an additional memory reference, and C is very careful to make the number of memory references evident from the source code.



Answer 5:

Well, there could definitely be cases where you have something complex like:

(*item)->elem

(which I have had happen in some programs), and if you wrote something like

item.elem

meaning the above, it could be confusing whether elem is an element of struct item, or an element of a struct that item points to, or an element of a struct that is pointed to by an element in a list that is pointed to by an iterator item, and so on and so forth.

So yeah, it does make things somewhat clearer when using pointers to pointers to structs, &c.



Answer 6:

Yes, that's OK, but it is not what C really needs at all

Not only is it OK, but it is the modern style. Java and Go both just use the . operator. Since everything that doesn't fit in a register is at some level a reference, the distinction between thing and pointer to thing is definitely a bit arbitrary, at least until you get to function calls.

The first evolutionary step was to make the dereference operator postfix, something dmr once implied he at some point preferred. Pascal does this, so it has p^.field. The only reason there even is a -> operator is because it's goofy to have to type (*p).field or p[0].field.

So yes, it would work. It would even be better as it works at a higher level of abstraction. One really should be able to make as many changes as possible without requiring downstream code to change, that's in a sense the entire point of higher level languages.

I have argued that using () for function calls and [] for array subscripting is wrong. Why not allow different implementations to export different abstractions?

But there isn't much reason to make the change. C programmers are unlikely to revolt over the lack of a syntactic sugar extension that saves one character in an expression, and it would be hard to use anyway because it would not be immediately, if ever, universally adopted. Remember that when standards committees go rogue they end up preaching to empty rooms. They require the willing cooperation and agreement of the world's compiler developers.

What C really needs isn't ever-so-slightly faster ways to write unsafe code. I don't mind working in C, but project managers don't like having their reliability determined by their worst guy, and it's possible that what C really needs is a safe dialect, something like Cyclone, or perhaps something just like Go.



Answer 7:

If anything, the current syntax lets readers of the code know whether the code is working with a pointer or with the actual object. Someone who does not know the code beforehand understands it better.



Tags: c struct