Type of strings

Posted 2019-06-14 20:41

Question:

I got quite confused about what is what. Would you please tell me what each variable's type is?

char foo[] = "bar";
char *bar = nullptr;
char const *qux = nullptr;

Additionally, what is the type of "bar"?

Answer 1:

The type of foo is char[4], i.e. a character array containing 4 chars (including the trailing null character '\0'.)

String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
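A minimal sketch of the "copy" behaviour described above. The function name array_is_independent_copy is hypothetical and exists only for this illustration.

```cpp
#include <cstring>

// Hypothetical helper: returns true if writing to one array leaves a second
// array, initialized from the same literal text, untouched.
bool array_is_independent_copy() {
    char str[] = "foo";    // str holds its own copy: {'f', 'o', 'o', '\0'}
    char other[] = "foo";  // a second, separate copy
    str[0] = 'g';          // allowed: str is a modifiable array of char
    return std::strcmp(str, "goo") == 0 && std::strcmp(other, "foo") == 0;
}
```

Calling array_is_independent_copy() should return true, since each array owns its characters and the write goes to str only.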

The type of bar is char *, and the type of qux is char const *, just as you declared.

"bar" is a string literal with type const char[4], i.e. an array containing 4 const chars (also including the trailing null character '\0'.)

The null character ('\0', L'\0', char16_t(), etc) is always appended to the string literal: thus, a string literal "Hello" is a const char[6] holding the characters 'H', 'e', 'l', 'l', 'o', and '\0'.
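As a sketch, the types stated so far can be checked by the compiler itself with static_assert and std::is_same. This assumes a C++11 or later compiler; the assertion messages are illustrative only.

```cpp
#include <type_traits>

char foo[] = "bar";
char *bar = nullptr;
char const *qux = nullptr;

// Each assertion fails to compile if the stated type is wrong.
static_assert(std::is_same<decltype(foo), char[4]>::value,
              "foo is an array of 4 char");
static_assert(std::is_same<decltype(bar), char *>::value,
              "bar is a pointer to char");
static_assert(std::is_same<decltype(qux), char const *>::value,
              "qux is a pointer to const char");

// sizeof counts the trailing null character as well.
static_assert(sizeof(foo) == 4, "3 characters plus the terminator");
static_assert(sizeof("Hello") == 6, "5 characters plus the terminator");
```

If this translation unit compiles, every claim above holds for that compiler.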

Here's a helper class template that reveals the exact type at compile time (the idea is borrowed from Effective Modern C++ by Scott Meyers).

template <typename>
struct TD;

Then use it like this:

TD<decltype(foo)> td1;
TD<decltype("bar")> td2;
TD<decltype(bar)> td3;
TD<decltype(qux)> td4;

For example, clang reports errors that contain the type information:

prog.cc:12:23: error: implicit instantiation of undefined template 'TD<char [4]>'
    TD<decltype(foo)> td1;
                      ^
prog.cc:13:25: error: implicit instantiation of undefined template 'TD<char const (&)[4]>'
    TD<decltype("bar")> td2;
                        ^
prog.cc:14:23: error: implicit instantiation of undefined template 'TD<char *>'
    TD<decltype(bar)> td3;
                      ^
prog.cc:15:23: error: implicit instantiation of undefined template 'TD<const char *>'
    TD<decltype(qux)> td4;
                      ^    

BTW: string literals are lvalues, and decltype yields T& for an lvalue expression of type T (unless the expression is just a plain variable name or member access). That is why the message from clang gives the type of "bar" as an lvalue reference to an array, i.e. char const (&)[4].
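A small sketch of this decltype behaviour, again using static_assert with the standard type traits. The constant literal_is_lvalue_ref is a hypothetical name used only here.

```cpp
#include <type_traits>

char foo[] = "bar";

// A string literal is an lvalue expression, so decltype adds a reference.
static_assert(std::is_same<decltype("bar"), char const (&)[4]>::value,
              "decltype of the literal is a reference to const char[4]");

// Stripping the reference leaves the literal's own type.
static_assert(std::is_same<std::remove_reference<decltype("bar")>::type,
                           char const[4]>::value,
              "the literal itself is const char[4]");

// A plain variable name gives the declared type, with no reference...
static_assert(std::is_same<decltype(foo), char[4]>::value, "declared type");
// ...while the parenthesized name is treated as an lvalue expression.
static_assert(std::is_same<decltype((foo)), char (&)[4]>::value,
              "lvalue expression, so a reference is added");

// Hypothetical constant for illustration.
constexpr bool literal_is_lvalue_ref =
    std::is_lvalue_reference<decltype("bar")>::value;
```

The contrast between decltype(foo) and decltype((foo)) is the same rule that produces the reference in the clang output for "bar".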



Answer 2:

The variable foo is a character array.

Somewhere in your computer's memory, the compiler arranges for the four bytes [ 0x62, 0x61, 0x72, 0x00 ], i.e. "bar\0", to be stored. The compiler added the trailing \0 (0x00) for you, to mark the end of the string. Let's say these bytes sit at memory address 0x00001000 (4096 in decimal).

The variable foo is those four bytes, not a pointer to them: sizeof(foo) is 4. In most expressions, though, an array converts ("decays") to the address of its first element, so used that way foo behaves like the value 0x00001000.
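To make the relationship between the array foo and the address of its first element concrete, here is a minimal sketch. The function name decays_to_first_element is hypothetical.

```cpp
#include <type_traits>

char foo[] = "bar";

// foo is an array object occupying 4 bytes, not a pointer variable.
static_assert(sizeof(foo) == 4, "array of 4 char");
// Taking its address gives a pointer to the whole array.
static_assert(std::is_same<decltype(&foo), char (*)[4]>::value,
              "pointer to array of 4 char");
// std::decay models the array-to-pointer conversion.
static_assert(std::is_same<std::decay<decltype(foo)>::type, char *>::value,
              "decays to char *");

// Hypothetical helper: checks that using foo where a pointer is expected
// yields the address of its first element.
bool decays_to_first_element() {
    char *p = foo;  // implicit array-to-pointer conversion
    return p == &foo[0];
}
```

So foo keeps its array type, while the pointer obtained from it holds the address of foo[0].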

The variable bar is a pointer, which is just a number. The number it holds is the address in memory of whatever it is "pointing at". Initially you set bar to be nullptr, so (probably) bar = 0x00000000.

It's quite OK to say:

bar = foo;

Which would mean that bar now points at the first character of foo. Since we said the bytes for foo were stored at some location in memory (an "address"), that number is just copied into bar. So now bar = 0x00001000 too.

The variable qux is a pointer to const char. The const tells the compiler to reject any attempt to modify the pointed-to characters through qux; the characters themselves do not have to be constant.

It's OK to code:

qux = foo;
qux = bar;

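This works because foo converts to char *, and a char * converts implicitly to char const *. The reverse (bar = qux;) is an error, because it would drop the const.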
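The assignments above can be sketched in one place as follows. The function name first_char_after_assignments is hypothetical, and the commented-out lines are ones a conforming compiler is expected to reject.

```cpp
char foo[] = "bar";
char *bar = nullptr;
char const *qux = nullptr;

// Hypothetical helper: performs the assignments discussed above and returns
// the first character as seen through qux.
char first_char_after_assignments() {
    bar = foo;  // array decays to char *
    qux = foo;  // char * converts implicitly to char const *
    qux = bar;  // same conversion, starting from a char * variable

    bar[0] = 'c';     // fine: writing through a pointer to non-const char
    // qux[0] = 'd';  // error: cannot modify through pointer to const char
    // bar = qux;     // error: would silently drop const

    return qux[0];    // reads the character written through bar
}
```

Reading through qux sees the write made through bar, because both point at the same array; const restricts only what can be done through qux.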