This question already has an answer here:
-
How to declare strings in C [duplicate]
4 answers
A few weeks ago I started learning the programming language C. I have knowledge in web technologies like HMTL/CSS, Javscript, PHP, and basic server administration, but C is confusing me. To my understanding, the C language does not have a data type for strings, just characters, however I may be wrong.
I have heard there are two ways of declaring a string. What is the difference between these two lines of declaring a string:
a.) char stringName[];
b.) char *stringName;
I get that char stringName[];
is an array of characters. However, the second line confuses me. To my understanding the second line makes a pointer variable. Aren't pointer variables supposed to be the memory address of another variable?
In the C language, a "string" is, as you say, an array of char
. Most string functions built into the C spec expect the string to be "NUL terminated", meaning the last char
of the string is a 0
. Not the code representing the numeral zero, but the actual value of 0
.
For example, if you're platform uses ASCII, then the following "string" is "ABC":
char myString[4] = {65, 66, 67, 0};
When you use the char varName[] = "foo"
syntax, you're allocating the string on the stack (or if its in a global space, you're allocating it globally, but not dynamically.)
Memory management in C is more manual than in many other langauges you may have experience with. In particular, there is the concept of a "pointer".
char *myString = "ABC"; /* Points to a string somewhere in memory, the compiler puts somewhere. */
Now, a char *
is "an address that points to a char or char array". Notice the "or" in that statement, it is important for you, the programmer, to know what the case is.
It's important to also ensure that any string operations you perform don't exceed the amount of memory you've allocated to a pointer.
char myString[5];
strcpy(myString, "12345"); /* copy "12345" into myString.
* On no! I've forgot space for my nul terminator and
* have overwritten some memory I don't own. */
"12345" is actually 6 characters long (don't forget the 0
at the end), but I've only reserved 5 characters. This is what's called a "buffer overflow", and is the cause of many serious bugs.
The other difference between "[]" and "*", is that one is creating an array (as you guessed). The other one is not reserving any space (other than the space to hold the pointer itself.) That means that until you point it somewhere that you know is valid, the value of the pointer should not be used, for either reading or writing.
Another point (made by someone in the comment)
You cannot pass an array as a parameter to a function in C. When you try, it gets converted to a pointer automatically. This is why we pass around pointers to strings rather than the strings themselves
In C, a string is a sequence of character values followed by a 0-valued byte1 . All the library functions that deal with strings use the 0 terminator to identify the end of the string. Strings are stored as arrays of char
, but not all arrays of char
contain strings.
For example, the string "hello"
is represented as the character sequence {'h', 'e', 'l', 'l', 'o', 0}
2 To store the string, you need a 6-element array of char
- 5 characters plus the 0 terminator:
char greeting[6] = "hello";
or
char greeting[] = "hello";
In the second case, the size of the array is computed from the size of the string used to initialize it (counting the 0 terminator). In both cases, you're creating a 6-element array of char
and copying the contents of the string literal to it. Unless the array is declared at file scope (oustide of any function) or with the static
keyword, it only exists for the duration of the block in which is was declared.
The string literal "hello"
is also stored in a 6-element array of char
, but it's stored in such a way that it is allocated when the program is loaded into memory and held until the program terminates3, and is visible throughout the program. When you write
char *greeting = "hello";
you are assigning the address of the first element of the array that contains the string literal to the pointer variable greeting
.
As always, a picture is worth a thousand words. Here's a simple little program:
#include <string.h>
#include <stdio.h>
#include <ctype.h>
int main( void )
{
char greeting[] = "hello"; // greeting contains a *copy* of the string "hello";
// size is taken from the length of the string plus the
// 0 terminator
char *greetingPtr = "hello"; // greetingPtr contains the *address* of the
// string literal "hello"
printf( "size of greeting array: %zu\n", sizeof greeting );
printf( "length of greeting string: %zu\n", strlen( greeting ) );
printf( "size of greetingPtr variable: %zu\n", sizeof greetingPtr );
printf( "address of string literal \"hello\": %p\n", (void * ) "hello" );
printf( "address of greeting array: %p\n", (void * ) greeting );
printf( "address of greetingPtr: %p\n", (void * ) &greetingPtr );
printf( "content of greetingPtr: %p\n", (void * ) greetingPtr );
printf( "greeting: %s\n", greeting );
printf( "greetingPtr: %s\n", greetingPtr );
return 0;
}
And here's the output:
size of greeting array: 6
length of greeting string: 5
size of greetingPtr variable: 8
address of string literal "hello": 0x4007f8
address of greeting array: 0x7fff59079cf0
address of greetingPtr: 0x7fff59079ce8
content of greetingPtr: 0x4007f8
greeting: hello
greetingPtr: hello
Note the difference between sizeof
and strlen
- strlen
counts all the characters up to (but not including) the 0 terminator.
So here's what things look like in memory:
Item Address 0x00 0x01 0x02 0x03
---- ------- ---- ---- ---- ----
"hello" 0x4007f8 'h' 'e' 'l' 'l'
0x4007fc 'o' 0x00 ??? ???
...
greetingPtr 0x7fff59079ce8 0x00 0x00 0x00 0x00
0x7fff59879cec 0x00 0x40 0x7f 0xf8
greeting 0x7fff59079cf0 'h' 'e' 'l' 'l'
0x7fff59079cf4 'o' 0x00 ??? ???
The string literal "hello"
is stored at a vary low address (on my system, this corresponds to the .rodata
section of the executable, which is for static, constant data). The variables greeting
and greetingPtr
are stored at much higher addresses, corresponding to the stack on my system. As you can see, greetingPtr
stores the address of the string literal "hello"
, while greeting
stores a copy of the string contents.
Here's where things can get kind of confusing. Let's look at the following print statements:
printf( "greeting: %s\n", greeting );
printf( "greetingPtr: %s\n", greetingPtr );
greeting
is a 6-element array of char
, and greetingPtr
is a pointer to char
, yet we're passing them both to printf
in exactly the same way, and the string is being printed out correctly; how can that work?
Unless it is the operand of the sizeof
or unary &
operators, or is a string literal used to initialize another array in a declaration, an expression of type "N-element array of T
" will be converted ("decay") to an expression of type "pointer to T
", and the value of the expression will be the address of the first element of the array.
In the printf
call, the expression greeting
has type "6-element array of char
"; since it isn't the operand of the sizeof
or unary &
operators, it is converted ("decays") to an expression of type "pointer to char
" (char *
), and the address of the first element is actually passed to printf
. IOW, it behaves exactly like the greetingPtr
expression in the next printf
call4.
The %s
conversion specifer tells printf
that its corresponding argument has type char *
, and that it it should print out the character values starting from that address until it sees the 0 terminator.
Hope that helps a bit.
1. Often referred to as the NUL
terminator; this should not be confused with the NULL
pointer constant, which is also 0-valued but used in a different context.
2. You'll also see the terminating 0-valued byte written as '\0'
. The leading backslash "escapes" the value, so instead of being treated as the character '0'
(ASCII 48), it's treated as the value 0
(ASCII 0)).
3. In practice, space is set aside for it in the generated binary file, often in a section marked read-only; attempting to modify the contents of a string literal invokes undefined behavior.
4. This is also why the declaration of greeting
copies the string contents to the array, while the declaration of greetingPtr
copies the address of the first element of the string. The string literal "hello"
is also an array expression. In the first declaration, since it's being used to initialize another array in a declaration, the contents of the array are copied. In the second declaration, the target is a pointer, not an array, so the expression is converted from an array type to a pointer type, and the resulting pointer value is copied to the variable.
In C (and in C++), arrays and pointers are represented similarly; an array is represented by the address of the first element in the array (which is sufficient to gain access to the other elements, since elements are contiguous in memory within an array). This also means that an array does not, by itself, indicate where it ends, and thus you need some way of identifying the end of the array, either by passing around the length as a separate variable or by using some convention (such as that there is a sentinel value that is placed in the last position of the array to indicate the end of the array). For strings, the latter is the common convention, with '\0' (the NUL character) indicating the end of the string.