Just wondering why strings in C are null-terminated. I'm eager to learn more about low-level languages, but I'm only at the basics of C and this is already confusing me.
Do languages like PHP automatically null-terminate strings as they are being interpreted and/or parsed?
From Joel's excellent article on the topic:
Think about what memory is: a contiguous block of byte-sized units that can be filled with any bit patterns.
A character is simply one of those bit patterns. Its meaning as a string is determined by how you treat it. If you looked at the same part of memory, but using an integer view (or some other type), you'd get a different value.
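To make the "different view of the same memory" point concrete, here is a minimal sketch in C. The helper name `view_as_u32` is made up for this illustration, and the number it returns depends on the machine's byte order and character set, so treat it as a demonstration rather than anything portable.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the first four bytes at `bytes` as one 32-bit unsigned
   integer. memcpy is used instead of a pointer cast to avoid alignment
   and aliasing problems. The numeric result depends on byte order. */
uint32_t view_as_u32(const char *bytes)
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Assuming ASCII, `view_as_u32("ABCD")` gives 0x44434241 on a little-endian machine and 0x41424344 on a big-endian one. The four bytes are identical in both cases; only the interpretation differs.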
If you have a variable which is a pointer to the start of a bunch of characters in memory, you must know when that string ends and the next piece of data (or garbage) begins.
Example
Let's look at this string in memory...
...we can see that the string logically ends after the '!' character. If there were no '\0' (or any other method to determine its end), how would we know when seeking through memory that we had finished with that string? Other languages carry the string length around with the string type to solve this.

I asked this question when my underlying knowledge of computers was limited, and this is the answer that would have helped many years ago. I hope it helps someone else too. :)
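The "seeking through memory" idea from the example can be sketched as a hand-written length function. This is roughly what the standard `strlen` does; the name `my_strlen` and the sample string used in the note below are just assumptions for illustration.

```c
#include <stddef.h>

/* Count characters by walking memory from `s` until the '\0'
   terminator is found. Without that terminator the loop would
   run off the end of the string into whatever follows it. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

For a literal such as `"Hello, world!"`, `my_strlen` returns 13, while the array itself occupies 14 bytes because the compiler appends the terminator.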
In C, strings are represented by an array of characters allocated in a contiguous block of memory, and thus there must either be an indicator stating the end of the block (i.e. the null character) or a way of storing the length (like Pascal strings, which are prefixed by a length).
In languages like PHP, Perl, C#, etc., strings may or may not be backed by more complex data structures, so you cannot assume they have a null character. As a contrived example, you could have a language that represents a string like so:
but you only see it as a regular string with no length field, as this can be calculated by the runtime environment of the language and is only used internally by it to allocate and access memory correctly.
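A contrived representation along those lines might look like the following in C. The type `rt_string` and its functions are hypothetical and not taken from any real language runtime; they only show how a length field can replace the terminator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime string: the length travels with the data,
   so no terminator is needed and '\0' may appear inside the text. */
struct rt_string {
    size_t length;
    char  *data;   /* exactly `length` bytes, not null-terminated */
};

/* Copy `length` bytes into a new rt_string.
   On allocation failure the result has data == NULL and length == 0. */
struct rt_string rt_string_new(const char *bytes, size_t length)
{
    struct rt_string s = { 0, NULL };
    char *copy = malloc(length > 0 ? length : 1);
    if (copy == NULL) {
        return s;
    }
    memcpy(copy, bytes, length);
    s.length = length;
    s.data = copy;
    return s;
}

/* Release the buffer and reset the fields. */
void rt_string_free(struct rt_string *s)
{
    free(s->data);
    s->data = NULL;
    s->length = 0;
}
```

The user of such a language never sees `length`; the runtime consults it whenever it needs to know where the string stops.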
They are null-terminated because a great many standard library functions expect them to be.
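As a small illustration of that dependency: raw bytes without a terminator cannot safely be handed to functions such as `strlen` or `strcmp`, so the terminator has to be added first. The helper `make_c_string` below is a made-up name for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Copy `n` raw bytes into a new buffer and append '\0', so the result
   can be passed to library functions that expect a terminated string.
   Returns NULL if allocation fails. The caller must free the result. */
char *make_c_string(const char *bytes, size_t n)
{
    char *s = malloc(n + 1);
    if (s == NULL) {
        return NULL;
    }
    memcpy(s, bytes, n);
    s[n] = '\0';
    return s;
}
```

Calling `strlen` directly on an unterminated array such as `char raw[3] = {'a', 'b', 'c'};` is undefined behaviour, whereas `strlen(make_c_string(raw, 3))` is well defined and yields 3.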
It is a convention - one could have implemented it with another algorithm (e.g. length at the beginning of the buffer).
In a "low level" language such as assembler, it is easy to test for a null byte efficiently; that might have eased the decision to go with null-terminated strings as opposed to keeping track of a length counter.
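The same idea shows up in the classic C copy loop, where the test against zero is the entire loop condition and no counter is kept. This is only a sketch with a made-up name, `copy_string`; it makes no claim about the instructions a particular compiler emits.

```c
/* Copy `src`, including its terminator, into `dst`.
   `dst` must have room for the whole string. Returns `dst`. */
char *copy_string(char *dst, const char *src)
{
    char *out = dst;
    while ((*out++ = *src++) != '\0') {
        /* nothing to do: copying and testing for zero happen together */
    }
    return dst;
}
```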
Because in C, strings are just a sequence of characters accessed via a pointer to the first character.
There is no space in a pointer to store the length so you need some indication of where the end of the string is.
In C it was decided that this would be indicated by a null character.
In Pascal, for example, the length of a string is recorded in the byte immediately preceding the first character, which is why classic Pascal strings have a maximum length of 255 characters.
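For comparison, a length-prefixed layout of that kind can be mimicked in C. This is a rough sketch with invented function names, not how any particular Pascal compiler is implemented; it just shows where the 255 limit comes from when the length lives in one byte.

```c
#include <stddef.h>
#include <string.h>

/* Store `text` in `buf` Pascal-style: buf[0] holds the length and
   buf[1..] holds the characters, with no terminator.
   Returns 0 on success, or -1 if the text is longer than 255
   characters or does not fit in `capacity` bytes. */
int to_pascal_string(unsigned char *buf, size_t capacity, const char *text)
{
    size_t len = strlen(text);
    if (len > 255 || len + 1 > capacity) {
        return -1;
    }
    buf[0] = (unsigned char)len;
    memcpy(buf + 1, text, len);
    return 0;
}

/* Reading the length is a single byte load, with no scan needed. */
size_t pascal_length(const unsigned char *buf)
{
    return buf[0];
}
```

The trade-off is visible here: getting the length is constant time, but a single length byte cannot describe anything longer than 255 characters.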