I was implementing a version of memcpy() to be able to use it with volatile. Is it safe to use char *, or do I need unsigned char *?
    #include <stddef.h>  /* size_t */

    /* Byte-wise copy that accepts volatile-qualified buffers. */
    volatile void *memcpy_v(volatile void *dest, const volatile void *src, size_t n)
    {
        const volatile char *src_c = (const volatile char *)src;
        volatile char *dest_c = (volatile char *)dest;

        for (size_t i = 0; i < n; i++) {
            dest_c[i] = src_c[i];
        }
        return dest;
    }
I think unsigned should be necessary to avoid overflow problems if the data in any cell of the buffer is > INT8_MAX, which I think might be UB.
You do not need unsigned. Like so:
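A minimal sketch of that claim, using the question's function as posted. The helper names high_bytes_survive and run_check are hypothetical and only there for illustration. Bytes above INT8_MAX come through intact, because each byte is only read and stored as a char and never takes part in arithmetic that could overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The question's function, unchanged apart from layout. */
volatile void *memcpy_v(volatile void *dest, const volatile void *src, size_t n)
{
    const volatile char *src_c = (const volatile char *)src;
    volatile char *dest_c = (volatile char *)dest;

    for (size_t i = 0; i < n; i++) {
        dest_c[i] = src_c[i];   /* the byte value is never promoted or compared */
    }
    return dest;
}

/* Hypothetical check: returns 1 if bytes above INT8_MAX are copied intact. */
int high_bytes_survive(void)
{
    const unsigned char in[4] = { 0x00, 0x7F, 0x80, 0xFF };
    unsigned char out[4] = { 0, 0, 0, 0 };

    memcpy_v(out, in, sizeof in);
    return memcmp(in, out, sizeof in) == 0;
}

/* Example use: abort if the round trip changed any byte. */
void run_check(void)
{
    assert(high_bytes_survive());
}
```

The check compares the buffers with memcmp(), which interprets each byte as unsigned char, so it does not depend on whether plain char is signed.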
Attempting to make a conforming implementation where char has a trap value will eventually lead to a contradiction:

- fread() and fwrite()
- fgets() takes a char * as its first argument and can be used on binary files.
- strlen() finds the distance to the next null from a given char *. Since fgets() is guaranteed to have written one, it will not read past the end of the array and therefore will not trap.

Perhaps.

"String handling" functions such as memcpy() have the specification: "For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value)." (C11 §7.24.1p3)

Using unsigned char is the specified "as if" type. There is little to be gained by attempting others, which may or may not work.

Using char with memcpy() may work, but extending that paradigm to other similar functions leads to problems.

A single big reason to avoid char for str...() and mem...()-like functions is that sometimes it unexpectedly makes a functional difference. memcmp() and strcmp() certainly differ with (signed) char vs. unsigned char.

Pedantic: on relic non-2's-complement machines with signed char, only '\0' should end a string. Yet negative_zero == 0 too, and a char with negative_zero should not indicate the end of a string.

In theory, your code might run on a machine which forbids one bit pattern in a signed char. It might use ones' complement or sign-magnitude representations of negative integers, in which one bit pattern would be interpreted as a 0 with a negative sign. Even on two's-complement architectures, the standard allows the implementation to restrict the range of negative integers so that INT_MIN == -INT_MAX, although I don't know of any actual machine which does that.

So, according to §6.2.6.2p2, there may be one signed character value which an implementation might treat as a trap representation.
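Whether a given implementation even leaves room for such a value can be read off <limits.h>. The sketch below is only an illustration under C11/C17 rules, and the function names are hypothetical. It reports whether plain char is signed and whether the range of signed char is symmetric, which is the only case where one bit pattern is left over.

```c
#include <assert.h>
#include <limits.h>

/* 1 if plain char is a signed type here, 0 if it is unsigned. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* 1 if signed char's range is symmetric (SCHAR_MIN == -SCHAR_MAX), which
   leaves one bit pattern over: a negative zero or possibly a trap
   representation. 0 on the usual asymmetric two's-complement range
   (e.g. -128..127 with 8-bit bytes), where every bit pattern is an
   ordinary value. */
int schar_has_spare_pattern(void)
{
    return SCHAR_MIN == -SCHAR_MAX;
}

/* The hypothetical problem case needs both conditions at once. */
int plain_char_may_trap(void)
{
    return char_is_signed() && schar_has_spare_pattern();
}

/* Example use: CHAR_MIN is either 0 or SCHAR_MIN on any implementation. */
void run_check(void)
{
    assert(CHAR_MIN == 0 || CHAR_MIN == SCHAR_MIN);
}
```

On mainstream two's-complement platforms plain_char_may_trap() is expected to return 0, which is why the concern stays theoretical.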
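(There cannot be any other trap values for character types, because §6.2.6.2 requires that signed char not have any padding bits, which is the only other way that a trap representation can be formed. For the same reason, no bit pattern is a trap representation for unsigned char.)

So, if this hypothetical machine has a C implementation in which char is signed, then it is possible that copying an arbitrary byte through a char will involve copying a trap representation.

For signed integer types other than char (if it happens to be signed) and signed char, reading a value which is a trap representation is undefined behaviour. But §6.2.6.1/5 allows reading and writing these values for character types only:

"Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation."

(The third sentence is a bit clunky, but to simplify: storing a value into memory is a "side effect that modifies all of the object", so it's permitted as well.)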
In short, thanks to that exception, you can use char in an implementation of memcpy without worrying about undefined behaviour.

However, the same is not true of strcpy. strcpy must check for the trailing NUL byte which terminates a string, which means it needs to compare the value it reads from memory with 0. And the comparison operators (indeed, all arithmetic operators) first perform integer promotion on their operands, which will convert the char to an int. Integer promotion of a trap representation is undefined behaviour, as far as I know, so on the hypothetical C implementation running on the hypothetical machine, you would need to use unsigned char in order to implement strcpy.

The unsigned is not needed, but there is no reason to use plain char for this function. Plain char should only be used for actual character strings. For other uses, the types unsigned char or uint8_t and int8_t are more precise as the signedness is explicitly specified.

If you want to simplify the function code, you can remove the casts:
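A minimal sketch of what the cast-free version might look like. The choice of unsigned char is an assumption that follows the recommendation above, and the helper names copy_matches and run_check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

volatile void *memcpy_v(volatile void *dest, const volatile void *src, size_t n)
{
    /* In C, a void pointer converts implicitly to any object pointer type,
       so no casts are needed as long as the qualifiers are kept. */
    const volatile unsigned char *src_c = src;
    volatile unsigned char *dest_c = dest;

    for (size_t i = 0; i < n; i++) {
        dest_c[i] = src_c[i];
    }
    return dest;
}

/* Hypothetical check: copy a small volatile buffer and compare byte by byte. */
int copy_matches(void)
{
    volatile unsigned char src[3] = { 0x01, 0x80, 0xFF };
    volatile unsigned char dst[3] = { 0, 0, 0 };

    memcpy_v(dst, src, sizeof src);
    for (size_t i = 0; i < sizeof src; i++) {
        unsigned char expected = src[i];   /* one volatile read per statement */
        unsigned char actual = dst[i];
        if (actual != expected) {
            return 0;
        }
    }
    return 1;
}

/* Example use: abort if any byte differs after the copy. */
void run_check(void)
{
    assert(copy_matches());
}
```

Note that this relies on C's implicit conversion from void pointers; the same code would need the casts back if compiled as C++.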