I know that UTF-16 has two endiannesses: big-endian and little-endian.
Does the C++ standard define the endianness of std::wstring, or is it implementation-defined?
If it is defined by the standard, which section of the C++ standard provides the rules on this issue?
If it is implementation-defined, how can I determine it, e.g. under VC++? Does the compiler guarantee that the endianness of std::wstring depends strictly on the processor?
I need to know this because I want to send a UTF-16 string to others, and I must add the correct BOM at the beginning of the UTF-16 string to indicate its endianness.
In short: Given a std::wstring, how should I reliably determine its endianness?
Endianness is MACHINE dependent, not language dependent. It is determined by the processor and how it arranges data into and out of memory. When dealing with wchar_t (which is wider than a single byte), the processor itself, on a read or write, orders the multiple bytes as it needs to in order to move them to and from RAM. Code simply sees the 16-bit (or larger) value as it is represented in a processor-internal register.
To determine endianness on your own (if that is really what you want to do), you could write a KNOWN 32-bit (unsigned int) value out to RAM, then read it back through a char pointer and look at which byte comes first.
It would look something like this:
#include <cstdio>
int main() {
    unsigned int aVal = 0x11223344;
    // The first byte in memory is 0x11 (the most significant byte) only on a big-endian machine.
    unsigned char *myValReadBack = (unsigned char *)&aVal;
    if (*myValReadBack == 0x11) printf("Big endian\n");
    else printf("Little endian\n");
}
I'm sure there are other ways, but something like the above should work; double-check my little versus big, though :-)
Further, until Windows RT, VC++ really only compiled for Intel-type processors, which have only ever had one endianness (little-endian).
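If your compiler supports C++20, you can also ask the compiler instead of probing memory: std::endian in <bit> exposes the target's byte order as a compile-time constant. A minimal sketch, assuming a C++20 compiler:

#include <bit>
#include <cstdio>

int main() {
    // std::endian::native reports the byte order of the target platform (C++20).
    if constexpr (std::endian::native == std::endian::big)
        std::printf("Big endian\n");
    else if constexpr (std::endian::native == std::endian::little)
        std::printf("Little endian\n");
    else
        std::printf("Mixed endian\n");
}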
It is implementation-defined. wstring is just a string of wchar_t, and that can be any byte ordering, or for that matter, any old size.
wchar_t is not required to be UTF-16 internally, and UTF-16 endianness does not affect how wchar_t values are stored; it is a matter of how you save and read them.
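For example, a quick check of the width shows how much this varies between implementations; wchar_t is typically 2 bytes with MSVC on Windows and 4 bytes with GCC/Clang on Linux, though neither size is guaranteed:

#include <cstdio>

int main() {
    // wchar_t is commonly 2 bytes (UTF-16 code units) on Windows and
    // 4 bytes (UTF-32 code points) on most Unix-like systems.
    std::printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
}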
You have to use an explicit procedure to convert a wstring to a UTF-16 byte stream before sending it anywhere. The internal endianness of wchar_t is architecture-dependent, and it is better to use an opaque conversion interface than to try to convert it manually.
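To illustrate what "explicit" means here, below is a minimal sketch that serializes a wstring to big-endian UTF-16 bytes regardless of the host's byte order. The function name toUtf16BE is invented for this example, and it assumes wchar_t already holds UTF-16 code units (as it does on Windows):

#include <cstdint>
#include <string>
#include <vector>

// Serialize a UTF-16 wstring to big-endian bytes, independent of the
// host's endianness. Assumes wchar_t holds UTF-16 code units (true on
// Windows); surrogate pairs pass through as ordinary code units.
std::vector<std::uint8_t> toUtf16BE(const std::wstring &s) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(s.size() * 2 + 2);
    bytes.push_back(0xFE);  // big-endian BOM: FE FF
    bytes.push_back(0xFF);
    for (wchar_t wc : s) {
        std::uint16_t u = static_cast<std::uint16_t>(wc);
        bytes.push_back(static_cast<std::uint8_t>(u >> 8));   // high byte first
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF)); // then low byte
    }
    return bytes;
}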
For the purposes of sending the correct BOM, you don't need to know the endianness. Just use the code point \uFEFF. That will come out big-endian or little-endian depending on the endianness of your implementation. You don't even need to know whether your implementation is UTF-16 or UTF-32. As long as it is some Unicode encoding, you'll end up with the appropriate BOM.
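A minimal sketch of that idea; the file name and the raw write are only illustrative, and the point is that the BOM is stored in the same native byte order as the rest of the string:

#include <fstream>
#include <string>

int main() {
    std::wstring text = L"hello";
    // Prepend U+FEFF; it is stored in the same (native) byte order as the
    // rest of the string, so the BOM automatically matches the payload.
    std::wstring payload = L"\uFEFF" + text;

    std::ofstream out("out.bin", std::ios::binary);
    out.write(reinterpret_cast<const char *>(payload.data()),
              payload.size() * sizeof(wchar_t));
}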
Unfortunately, neither wchar_t nor wide streams are guaranteed to be Unicode.