C11 5.1.2.2.1/2 says:
"The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination."
My interpretation of this is that it specifies:
int main(int argc, char **argv)
{
    if ( argv[0][0] )
        argv[0][0] = 'x'; // OK
    char *q;
    argv = &q; // OK
}
However, it does not say anything about:
int main(int argc, char **argv)
{
    char buf[20];
    argv[0] = buf;
}
Is argv[0] = buf; permitted?
I can see (at least) two possible arguments:
- The above quote deliberately mentions argv and argv[x][y] but not argv[x], so the intent was that argv[x] is not modifiable.
- argv is a pointer to non-const objects, so in the absence of specific wording to the contrary, we should assume they are modifiable objects.
IMO, code like argv[1] = "123"; is UB.
"The parameters argc and argv and the strings pointed to by the argv array shall
be modifiable by the program, and retain their last-stored values between program
startup and program termination." C11dr §5.1.2.2.1 2
Recall that const came into C many years after C's creation. That is why char *s = "abc"; is still valid when it should be const char *s = "abc";. const could not be made mandatory, or too much existing code would have been broken by its introduction.
Likewise, even if argv today should be considered char * const argv[] or some other signature with const, the lack of const in char *argv[] does not completely specify the const-ness needs of argv, argv[], or argv[][]. The const-ness needs would need to be driven by the spec.
From my reading, since the spec is silent on the issue, it is UB.
"Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior." §4 2
[edit]:
main() is a very special function in C. What is allowable in other functions may or may not be allowed in main(). The C spec details attributes of its parameters that, given the signature int argc, char *argv[], should not need stating. main(), unlike other functions in C, can have an alternate signature int main(void) and potentially others. main() is not reentrant. As the C spec goes out of its way to detail what can be modified (argc, argv, argv[][]), it is reasonable to question whether argv[] is modifiable, given its omission from the list of things the spec says code can modify.
Given the special status of main() and the omission of any statement that argv[] is modifiable, a conservative programmer would treat this greyness as UB, pending future C spec clarification.
If argv[i] is modifiable on a given platform, certainly the range of i should not exceed argc-1.
As "argv[argc] shall be a null pointer", assigning argv[argc] to something other than NULL appears to be a violation.
Although the strings are modifiable, code should not exceed the original string's length.
const char *newstr = "abc";
/* argv[1] only points to a string when argc > 1 */
if (argc > 1 && strlen(newstr) <= strlen(argv[1]))
    strcpy(argv[1], newstr);
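The same length check can be wrapped in a small function. This is only a sketch under my own assumptions: overwrite_in_place is a hypothetical name, and it treats the current strlen of the destination as the only capacity that can be relied on.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: overwrite the string at dst with src, but only if
   src is no longer than the string currently stored at dst.
   Returns true if the copy happened, false otherwise. */
bool overwrite_in_place(char *dst, const char *src)
{
    if (dst == NULL || src == NULL)
        return false;

    size_t n = strlen(src);
    if (n > strlen(dst))
        return false;           /* would write past the original string */

    memmove(dst, src, n + 1);   /* n + 1 copies the terminating '\0' too */
    return true;
}
```

Called as overwrite_in_place(argv[1], "abc") after checking argc > 1. Note that once a shorter string has been stored, a later call measures the new, shorter length, so the usable room only ever shrinks with this approach.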
argc is just an int and is modifiable without any restriction.
argv is a modifiable char **. It means that argv[i] = x is valid. But it does not say anything about argv[i] being itself modifiable. So argv[i][j] = c leads to undefined behaviour.
The getopt function (POSIX, not part of the C standard library) may reorder the pointers in argv, as the GNU implementation does by default, but never modifies the actual char arrays.
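To make "rearranging argv without touching the char arrays" concrete, here is a simplified sketch. It is not getopt's actual algorithm; options_first is a hypothetical function of mine that just moves every argument starting with '-' ahead of the rest, and it assumes every pointer in the given range is non-null.

```c
#include <string.h>

/* Hypothetical helper, NOT getopt itself: stably moves every element of
   args[first..count-1] that looks like an option ("-x", "--long") ahead
   of the remaining elements. Only pointers move; no character is written.
   Returns the index of the first non-option after reordering. */
int options_first(int count, char **args, int first)
{
    int next = first;  /* slot where the next option pointer belongs */

    for (int i = first; i < count; i++) {
        if (args[i][0] == '-' && args[i][1] != '\0') {
            char *opt = args[i];
            /* shift the non-options in [next, i) one slot to the right */
            memmove(&args[next + 1], &args[next],
                    (size_t)(i - next) * sizeof *args);
            args[next++] = opt;
        }
    }
    return next;
}
```

Whether it is strictly conforming to run this directly on argv is exactly the question under discussion; running it on a program-owned array of pointers sidesteps that.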
It is clearly mentioned that argv and argv[x][y] are modifiable. If argv is modifiable then it can point to the first element of another array of char *, and hence argv[x] can point to the first element of some other string. Ultimately argv[x] is modifiable too, and that could be the reason there is no need to mention it explicitly in the standard.
The answer is that argv is an array and yes, its contents are modifiable.
The key is earlier in the same section:
If the value of argc is greater than zero, the array members argv[0] through
argv[argc-1] inclusive shall contain pointers to strings, which are given
implementation-defined values by the host environment prior to program startup.
From this it is clear that argv is to be thought of as an array of a specific length (argc). argv is then a pointer to the first element of that array, the array having decayed to a pointer.
Read in this context, the statement to the effect that 'argv shall be modifiable...and retain its contents' clearly intends that the contents of that array be modifiable.
I concede that there remains some ambiguity in the wording, particularly as to what might happen if argc is modified.
Just to be clear, what I'm saying is that I read this language as meaning:
[the contents of the] argv [array] and the strings pointed to by the argv array shall be modifiable...
So both the pointers in the array and the strings they point to are in read-write memory, no harm is done by changing them, and both preserve their values for the life of the program. I would expect that this behaviour is to be found in all the major C/C++ runtime library implementations, without exception. This is not UB.
The ambiguity is the mention of argc. It is hard to imagine any purpose or any implementation in which the value of argc (which appears to be simply a local function parameter) could not be changed, so why mention it? The standard clearly states that a function can change the value of its parameters, so why treat argc specially in this respect? It is this unexpected mention of argc that has triggered this concern about argv, which would otherwise pass without remark. Delete argc from the sentence and the ambiguity disappears.