What's the rationale for preventing assignment

2020-02-13 07:36发布

问题:

I've tried to google this and have read:

  • Why can't arrays of same type and size be assigned?
  • Assigning arrays
  • Assign to array in struct in c

But they all state the obvious: you can't assign to arrays because the standard says so. That's great and all, but I want to know why the standard doesn't include support for assigning to arrays. The standard committee discusses things in detail, and I'd be surprised if they never discussed making arrays assignable. Assuming they've discussed it, they must have some rationale for not letting arrays be assigned to.

I mean, we can put an array in a struct and assign to the struct just fine:

struct wrapper
{
    int array[2];
};

struct wrapper a = {{1, 2}};
struct wrapper b = {{3, 4}};
a = b; // legal

But using an array directly is prohibited, even though it accomplishes effectively the same thing:

int a[2] = {1, 2};
int b[2] = {3, 4};
a = b; // Not legal

What is the standard committee's rationale for prohibiting assigning to arrays?

回答1:

Understand that the intent wasn't to make array expressions unassignable; that wasn't the goal1. Rather, this behavior falls out of a design decision Ritchie made that simplified array handling in the compiler, but in exchange made arrays expressions "second-class" objects; they lose their "array-ness" in most contexts.

Read this paper (especially the section titled "Embryonic C") for some background; I also have a more detailed answer here.


1. With the possible exception of Perl or PHP2, most blatant language WTFs are generally accidents of design or the result of compromises; most languages aren't deliberately designed to be stupid.

2. I'm only trolling a little bit; Perl and PHP are straight-up messes.



回答2:

In C, assignment copies the contents of a fixed-size object to another fixed-size object. This is well defined and fairly straightforward to implement for scalar types (integers, floating-point, pointers, complex types since C99). Assignment of structs is nearly as simple; larger ones might require a call to memcpy() or equivalent, but it's still straightforward since the size and alignment are known at compile time.

Arrays are a different matter. Most array objects have sizes that aren't determined until run time. A good example is argv. The runtime environment constructs an array of char for each command-line argument, and an array of char* containing pointers to the arguments. These are made available to main via argv, a char**, and via the dynamically allocated char[] arrays that the elements of argv point to.

C arrays are objects in their own right, but they're not generally accessed as objects. Instead, their elements are accessed via pointers, and code traverses from one element to the next using pointer arithmetic.

Languages can be designed to treat arrays as first-class objects, with assignment -- but it's complicated. As a language designer, you have to decide whether an array of 10 integers and an array of 20 integers are the same type. If they are, you have to decide what happens when you try to assign one to the other. Does it copy the smaller size? Does it cause a runtime exception? Do you have to add a slice operation so you can operate on subsets of arrays?

If int[10] and int[20] are distinct types with no implicit conversion, then array operations are inflexible (see Pascal, for example).

All these things can be defined (see Ada), but only by defining higher-level constructs than what's typical in C. Instead, the designers of C (mostly Dennis Ritchie) chose to provide arrays with low-level operations. It's admittedly inconvenient at times, but it's a framework that can be used to implement all the higher-level array operations of any other language.



回答3:

The answer is simple: It never was allowed before the committee got involved (even struct-assignment was considered too heavy), and considering there's array-decay, allowing it would have all kinds of interesting consequences.

Let's see what would change:

int a[3], b[3], *c = b, *d = b;
a = b; // Currently error, would assign elements
a = c; // Currently error, might assign array of 3?
c = a; // Currently pointer assignment with array decay
c = d; // Currently pointer assignemnt

So, allowing array-assignment would make (up to) two currently disallowed assignments valid.

That's not the trouble though, it's that near-identical expressions would have wildly different results.

That gets especially piquant if you consider that array-notation in function arguments is currently just a different notation for pointers.
If array assignment was introduced, that would become even more confusing.
Not that enough people aren't completely confounded by things as they are today...

int g(int* x);  // Function receiving pointer to int and returning int
int f(int x[3]);// Currently the same. What afterwards? Change to value-copy?


回答4:

The reason is basically historic. There was a C even before ISO C89 which was called "K&R" C, after Kernighan and Ritchie. The language was designed to be small enough so a compiler would fit in severely limited (by today's standards) memory of 64kb.

This language did not allow assigning arrays. If you wanted to copy same-sized arrays, memcpy was there for your needs. Writing memcpy(a, b, sizeof a) instead of a = b is certainly not a big complication. It has the additional advantage of being generalizable to different-sized arrays and array slices.

Interestingly, the struct assignment workaround you mention also did not work in K&R C. You had to either assign members one by one or, again, use memcpy. The first edition of K&R's The C Programming language mentions struct assignment as a feature for future implementation in the language. Which eventually happened with C89.



回答5:

C is written in such a way that the address of the first element would be computed when the array expression is evaluated.

Quoting an excerpt from this answer:

This is why you can't do something like

int a[N], b[N];
a = b;

because both a and b evaluate to pointer values in that context; it's equivalent to writing 3 = 4. There's nothing in memory that actually stores the address of the first element in the array; the compiler simply computes it during the translation phase.



回答6:

Maybe it would be helpful to turn the question around, and ask why you'd ever want to assign arrays (or structs), instead of using pointers? That's much cleaner & easier to understand (at least if you've assimilated the Zen of C), and it has the benefit of not concealing the fact that a lot of work is hidden under the "simple" assignment of multi-megabyte arrays.