String Literal address across translation units

Posted 2019-01-03 03:38

I'd like to ask whether it is portable to rely on a string literal's address across translation units. That is:

Suppose a file foo.c refers to the string literal "I'm a literal!". Is it correct and portable to rely on the same string literal "I'm a literal!" having the same memory address in another file, bar.c for instance? Assume that each file is translated to an individual .o file.

To illustrate, here is some example code:

# File foo.c
/* ... */
const char *x = "I'm a literal!";

# File bar.c
/* ... */
const char *y = "I'm a literal!";

# File test.c
#include <assert.h>
/* ... */
extern const char *x;
extern const char *y;

int main(void)
{
    assert(x == y); /* Is this assertion going to fail? */
    return 0;
}

And example gcc command lines:

gcc -c -o foo.o -Wall foo.c
gcc -c -o bar.o -Wall bar.c
gcc -c -o test.o -Wall test.c
gcc -o test foo.o bar.o test.o

What about within the same translation unit? Would this be reliable if both string literals are in the same translation unit?

2 Answers
神经病院院长
#2 · 2019-01-03 03:52

You cannot rely on identical string literals having the same memory location; it is an implementation decision. The C99 draft standard tells us that it is unspecified whether identical string literals are distinct. From section 6.4.5 String literals:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
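In practice this means that only the contents of two identical literals can be compared reliably, not their addresses. A minimal sketch, assuming nothing beyond the standard library; the names same_text and check_literals are hypothetical and only used for illustration:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: compares two strings by content, not by address. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical demo: returns 1 when the content comparison succeeds. */
int check_literals(void)
{
    const char *p = "I'm a literal!";
    const char *q = "I'm a literal!";

    /* Unspecified: this may be 0 or 1 depending on the implementation,
     * so nothing should depend on its value. */
    int same_address = (p == q);
    (void)same_address;

    /* Always holds: both arrays contain the same characters. */
    assert(same_text(p, q));
    return same_text(p, q);
}
```

Code that needs to know whether two strings are "the same" should use the strcmp-based check; the pointer comparison only answers whether the implementation happened to pool them.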

For C++ this is covered in the draft standard, section 2.14.5 String literals, which says:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.

The compiler is allowed to pool string literals, but how it does so varies from compiler to compiler, so relying on it would not be portable and the behavior could change. Visual Studio includes an option for string literal pooling:

In some cases, identical string literals may be pooled to save space in the executable file. In string-literal pooling, the compiler causes all references to a particular string literal to point to the same location in memory, instead of having each reference point to a separate instance of the string literal. To enable string pooling, use the /GF compiler option.

Note that it qualifies this with "In some cases".

gcc does support pooling, including across compilation units, and you can turn it on via -fmerge-constants:

Attempt to merge identical constants (string constants and floating-point constants) across compilation units.

This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Note the use of "Attempt" and "if ... support it".
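To see what a particular toolchain does, a small probe can be built with and without the option. This is only a sketch: the function name literals_pooled and the file name probe.c are made up, and whatever the probe reports holds for that one build, not as a guarantee.

```c
#include <stdio.h>

/* Hypothetical probe: reports whether this build gave both literals the
 * same address. It could be saved as probe.c (an assumed name), called
 * from main, and built for example with:
 *   gcc -O2 -fmerge-constants probe.c
 *   gcc -O2 -fno-merge-constants probe.c
 * Either result is permitted, so nothing should rely on the outcome. */
int literals_pooled(void)
{
    const char *p = "I'm a literal!";
    const char *q = "I'm a literal!";

    printf("p = %p, q = %p\n", (void *)p, (void *)q);
    return p == q;
}
```

The probe deliberately asserts nothing; it only reports what this compiler, these flags, and this linker happened to do.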

As for the rationale, at least for C, for not requiring string literals to be pooled: this archived comp.std.c discussion on string literals indicates that it was due to the wide variety of implementations at the time:

GCC might have served as an example but not as motivation. Partly the desire to have string literals in ROMmable data was to support, er, ROMming. I vaguely recall having used a couple of C implementations (before the X3J11 decision was made) where string literals were either automatically pooled or stored in a constant data program section. Given the existing variety of practice and the availability of an easy work-around when the original UNIX properties were wanted, it seemed best to not try to guarantee uniqueness and writability of string literals.
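One plausible reading of the "easy work-around" mentioned there (the quoted post does not spell it out) is to initialize a char array from the literal, which gives each array its own writable storage. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>

/* Each array below is a separate, modifiable object initialized from the
 * literal, so both uniqueness and writability are guaranteed. */
int arrays_are_distinct_and_writable(void)
{
    char a[] = "I'm a literal!";
    char b[] = "I'm a literal!";

    a[0] = 'i';            /* well defined: a is a writable array */
    assert(b[0] == 'I');   /* b is a different object and is unaffected */

    /* Distinct objects never share an address. */
    return &a[0] != &b[0];
}
```

This is the opposite of pooling: the arrays are guaranteed to be distinct, whereas identical literals are guaranteed neither to be distinct nor to be shared.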

贼婆χ
#3 · 2019-01-03 03:52

No, you can't expect the same address. If it happens, it happens. But there's nothing enforcing it.

§ 2.14.5/p12

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.

The compiler can do as it pleases. The literals can be stored at different addresses if they are in different translation units, or even if they are in the same translation unit, regardless of the fact that they live in read-only memory.

On MSVC, for instance, the addresses are totally different in both cases, but again: nothing prevents the compiler from merging the pointers' values, and nothing dictates where the literals are placed, as long as the read-only constraint is honored.
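If code really needs a single address, a portable approach is to define the string once as a named object and refer to that object everywhere. A sketch, assuming a hypothetical name shared_text; in a multi-file program the definition would live in one .c file, with an extern declaration in a shared header:

```c
#include <assert.h>

/* Defined exactly once. In a real project this would sit in one .c file,
 * with `extern const char shared_text[];` in a header (assumed layout). */
const char shared_text[] = "I'm a literal!";

/* Stand-ins for the x and y of the question, both naming the same object. */
const char *x = shared_text;
const char *y = shared_text;

int same_object(void)
{
    assert(x == y);   /* guaranteed: both point at shared_text */
    return x == y;
}
```

Here the equality follows from the language rules for objects rather than from an optimization, so it does not depend on compiler flags or linker support.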
