From man strchr:
char *strchr(const char *s, int c);
The strchr() function returns a pointer to the first occurrence of the character c in the string s.
Here "character" means "byte"; these functions do not work with wide or multibyte characters.
Still, if I try to search for a multi-byte character like é (0xC3A9 in UTF-8):
const char str[] = "This string contains é which is a multi-byte character";
char * pos = strchr(str, (int)'é');
printf("%s\n", pos);
printf("0x%X 0x%X\n", pos[-1], pos[0]);
I get the following output:
� which is a multi-byte character
0xFFFFFFC3 0xFFFFFFA9
Despite the warning:
warning: multi-character character constant [-Wmultichar]
So here are my questions:
- What does it mean that strchr doesn't work with multi-byte characters? (It seems to work, provided the int type is big enough to contain your multi-byte character, which can be at most 4 bytes.)
- How do I get rid of the warning, i.e. how do I safely recover the multi-byte value and store it in an int?
- Why the 0xFFFFFF prefixes?
That's the problem. It seems to work. Firstly, it's entirely up to the compiler what it puts in the string if you put multibyte characters in it, if indeed it compiles it at all. Clearly you are lucky (for some appropriate interpretation of lucky) in that it has filled your string with the bytes c3 a9 and that you are looking for c3a9, as it can find that fairly easily.
strchr takes its second argument as an int, as the prototype in the man page shows, and converts it to a char before searching. So you pass c3a9 to this, which is converted to a char with value 'a9'. It finds the a9 character, and you get returned a pointer to it.
The ffffff prefix is because you are outputting a signed character as a 32 bit hex number, so it sign extends it for you. This is as expected.
The problem is that 'undefined behaviour' is just that (strictly, the value of a multi-character constant is implementation-defined). It might work almost correctly. And it might not, depending on circumstances.
And again it is almost. You are not getting a pointer to the multibyte character, you are getting a pointer to the middle of it (and I'm surprised you're interpreting that as working). If the multibyte character had evaluated to 0xff20, you'd get pointed to somewhere much earlier in the string: the first space (0x20).
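To make the "pointer to the middle of it" point concrete, here is a minimal sketch that compares where strchr() lands with where the character really starts. The helper names strchr_offset and strstr_offset are made up for this illustration, and it assumes the usual behaviour where converting the int to char keeps only the low byte. Searching for the whole byte sequence with strstr() is one common alternative, not something the answers here propose.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the byte strchr() matches, or -1 if none.
   strchr() converts c to char first, so on typical platforms only the
   low byte of c takes part in the search. */
static long strchr_offset(const char *s, int c)
{
    const char *p = strchr(s, c);
    return p ? (long)(p - s) : -1L;
}

/* Hypothetical helper: offset of the first occurrence of a whole byte
   sequence (for example the UTF-8 encoding of one character), or -1. */
static long strstr_offset(const char *s, const char *seq)
{
    const char *p = strstr(s, seq);
    return p ? (long)(p - s) : -1L;
}
```

With the string "caf\xC3\xA9", strchr_offset(s, 0xC3A9) gives the offset of the second byte of é, while strstr_offset(s, "\xC3\xA9") gives the offset of its first byte. In "\xC2\xA9 \xC3\xA9" (a © followed by é), the strchr() version stops inside the ©, because that character also ends in the byte 0xA9.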
strchr() only seems to work for your multi-byte character.
In memory, the é in your string is stored as the two bytes 0xC3 0xA9.
When you call strchr(), you are really only searching for the 0xA9, which are the lower 8 bits. That's why pos[-1] has the first byte of your multi-byte character: it was ignored during the search.
A char is signed on your system, which is why your characters are sign extended (the 0xFFFFFF) when you print them out.
As for the warning, it seems that the compiler is trying to tell you that you are doing something odd, which you are. Don't ignore it.
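The sign extension can be avoided by going through unsigned char before the value is widened. A small sketch, where byte_value is a hypothetical helper name introduced only for this example:

```c
#include <stdio.h>

/* Hypothetical helper: the value of one byte of a string as 0..255.
   Converting to unsigned char first means a byte such as 0xC3 is not
   sign extended when it is promoted for printing. */
static unsigned byte_value(char c)
{
    return (unsigned char)c;
}
```

In the question's code, printf("0x%X 0x%X\n", byte_value(pos[-1]), byte_value(pos[0])); would then print 0xC3 0xA9 instead of the 0xFFFFFF-prefixed values.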