Is argv[n] writable?

2019-01-06 18:19发布

问题:

C11 5.1.2.2.1/2 says:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

My interpretation of this is that it specifies:

int main(int argc, char **argv)
{
    if ( argv[0][0] )
        argv[0][0] = 'x';   // OK

    char *q;
    argv = &q;              // OK
}

however it does not say anything about:

int main(int argc, char **argv)
{
    char buf[20];
    argv[0] = buf;
}

Is argv[0] = buf; permitted?

I can see (at least) two possible arguments:

  • The above quote deliberately mentioned argv and argv[x][y] but not argv[x], so the intent was that it is not modifiable
  • argv is a pointer to non-const objects, so by in the absence of specific wording to the contrary, we should assume they are modifiable objects.

回答1:

IMO, code like argv[1] = "123"; is UB.


"The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination." C11dr §5.1.2.2.1 2

Recall that const came into C many years after C's creation.

Much like char *s = "abc"; is valid when it should be const char *s = "abc";. The need for const was not required else too much existing code would have be broken with the introduction of const.

Likewise, even if argv today should be considered char * const argv[] or some other signature with const, the lack of const in the char *argv[] does not complete specify the const-ness needs of the argv, argv[], or argv[][]. The const-ness needs would need to be driven by the spec.

From my reading, since the spec is silent on the issue, it is UB.

Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior" §4 2


[edit]:

main() is a very special function is C. What is allowable in other functions may or may not be allowed in main(). The C spec details attributes about its parameters that given the signature int argc, char *argv[] that shouldn't need. main(), unlike other functions in C, can have an alternate signature int main(void) and potentially others. main() is not reentrant. As the C spec goes out of its way to detail what can be modified: argc, argv, argv[][], it is reasonable to question if argv[] is modifiable due to its omission from the spec asserting that code can.

Given the specialty of main() and the omission of specifying that argv[] as modifiable, a conservative programmer would treat this greyness as UB, pending future C spec clarification.


If argv[i] is modifiable on a given platform, certainly the range of i should not exceed argc-1.

As "argv[argc] shall be a null pointer", assignining argv[argc] to something other than NULL appears to be a violation.

Although the strings are modifiable, code should not exceed the original string's length.

char *newstr = "abc";
if (strlen(newstr) <= strlen(argv[1])) 
  strcpy(argv[1], newstr);


回答2:

argc is just an int and is modifiable without any restriction.

argv is a modifiable char **. It means that argv[i] = x is valid. But it does not say anything about argv[i] being itself modifiable. So argv[i][j] = c leads to undefined behaviour.

The getopt function of C standard library does modify argc and argv but never modifies the actual char arrays.



回答3:

It is clearly mentioned that argv and argv[x][x] is modifiable. If argv is modifiable then it can point to another first element of an array of char and hence argv[x] can point to the first element of some another string. Ultimately argv[x] is modifiable too and that could be the reason that there is no need to mention it explicitly in standard.



回答4:

The answer is that argv is an array and yes, its contents are modifiable.

The key is earlier in the same section:

If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.

From this it is clear that argv is to be thought of as an array of a specific length (argc). Then *argv is a pointer to that array, having decayed to a pointer.

Read in this context, the statement to the effect that 'argv shall be modifiable...and retain its contents' clearly intends that the contents of that array be modifiable.

I concede that there remains some ambiguity in the wording, particularly as to what might happen if argc is modified.


Just to be clear, what I'm saying is that I read this language as meaning:

[the contents of the] argv [array] and the strings pointed to by the argv array shall be modifiable...

So both the pointers in the array and the strings they point to are in read-write memory, no harm is done by changing them, and both preserve their values for the life of the program. I would expect that this behaviour is to be found in all the major C/C++ runtime library implementations, without exception. This is not UB.

The ambiguity is the mention of argc. It is hard to imagine any purpose or any implementation in which the value of argc (which appears to be simply a local function parameter) could not be changed, so why mention it? The standard clearly states that a function can change the value of its parameters, so why treat argc specially in this respect? It is this unexpected mention of argc that has triggered this concern about argv, which would otherwise pass without remark. Delete argc from the sentence and the ambiguity disappears.