Difference and conversions between wchar_t for Lin

Posted 2019-08-19 06:02

Question:

I understand from this and this thread that on Windows wchar_t is 16-bit, whereas on Linux wchar_t is 32-bit.

I have a client-server architecture (using just pipes, not sockets) where my server is Windows-based and my client is Linux-based.

The server has an API to retrieve the hostname from the client. When the client is Windows-based, it can just call GetComputerNameW and return a wide string. However, when the client is Linux-based, things get messy.

As a first naive approach, I used mbstowcs(), hoping to return a wchar_t* to the Windows server side. However, this LPWSTR (I have typedef wchar_t* LPWSTR on my Linux client side) is not usable on Windows, since Windows expects its wchar_t to be 16-bit.

So, is converting the output of gethostname() on Linux, which is a char*, to unsigned short (16-bit) my only option?

Thanks in Advance!

Answer 1:

You will have to decide on the actual protocol for transporting the data across the wire. There are several options, but UTF-8 is usually the most sensible one. It also means that on Linux you can basically use the data as-is (there is no reason to use wchar_t to begin with, although you can obviously convert it into whatever you want).
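To make the "decide on a protocol" step concrete, here is a minimal sketch of what the Linux client could put on the pipe if UTF-8 is chosen. The framing (a 4-byte big-endian byte count followed by the UTF-8 bytes) and the names frame_utf8 and frame_hostname are assumptions made for illustration, not something the original thread specifies. It also assumes the hostname is plain ASCII, which is the usual case and is already valid UTF-8.

```c
#define _POSIX_C_SOURCE 200112L  /* expose gethostname() under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>              /* gethostname (POSIX) */

/* Hypothetical wire format: 4-byte big-endian byte count, then the UTF-8
 * bytes without a terminating NUL.
 * Returns the total frame size, or 0 if the frame does not fit in `cap`. */
size_t frame_utf8(const char *s, unsigned char *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    size_t n = strlen(s);
    if ((uint64_t)n > UINT32_MAX || cap < 4 || n > cap - 4)
        return 0;
    out[0] = (unsigned char)((n >> 24) & 0xFF);
    out[1] = (unsigned char)((n >> 16) & 0xFF);
    out[2] = (unsigned char)((n >> 8) & 0xFF);
    out[3] = (unsigned char)(n & 0xFF);
    memcpy(out + 4, s, n);
    return n + 4;
}

/* Frame the local hostname. gethostname() returns plain bytes, which are
 * passed through unchanged (assumed to be ASCII, hence valid UTF-8). */
size_t frame_hostname(unsigned char *out, size_t cap)
{
    char name[256];
    if (gethostname(name, sizeof name) != 0)
        return 0;
    name[sizeof name - 1] = '\0';  /* termination is not guaranteed on truncation */
    return frame_utf8(name, out, cap);
}
```

The client would then write the returned number of bytes from the buffer to the pipe. An explicit length prefix avoids any dependence on the width of a terminator character, which is exactly where 16-bit and 32-bit wchar_t disagree.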

On Windows you will have to convert the incoming UTF-8 into the UTF-16 that Windows wants (yes, it is not exactly UTF-16, but close enough), and if you want to send data you have to convert it to UTF-8. Luckily, Windows provides this function and this function, one for each direction, for exactly these purposes.

Obviously you can pick any encoding you want, not necessarily UTF-8; the process is the same. When receiving data, convert it to the native format of the OS; when sending, convert it to your on-wire encoding. iconv works on Linux if you don't use UTF-8.



Answer 2:

You are best off choosing a standard character encoding for the data you send over the pipe, and then requiring all machines to send their data using that encoding.

Windows uses UTF-16LE, so you could choose to use UTF-16LE over the pipe and then Windows machines can send their UTF-16LE encoded strings as-is, but Linux machines would have to convert to/from UTF-16LE as needed.
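If UTF-16LE is chosen as the wire encoding, the Linux side needs a conversion that does not depend on its 32-bit wchar_t. The following is a hand-written sketch of that conversion using fixed-width types; the function names utf8_to_utf16le and put_u16le are made up for this example, and it assumes the input is a NUL-terminated UTF-8 string. It writes bytes rather than 16-bit integers so that the result is little-endian regardless of the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append one 16-bit code unit as two bytes, low byte first (little-endian). */
static size_t put_u16le(unsigned char *out, size_t pos, uint16_t u)
{
    out[pos]     = (unsigned char)(u & 0xFF);
    out[pos + 1] = (unsigned char)(u >> 8);
    return pos + 2;
}

/* Convert NUL-terminated UTF-8 to UTF-16LE bytes (no BOM, no terminator).
 * Returns the number of bytes written, or (size_t)-1 if the input is not
 * valid UTF-8 or the output buffer is too small. */
size_t utf8_to_utf16le(const char *in, unsigned char *out, size_t cap)
{
    /* Smallest code point allowed for each sequence length (rejects overlongs). */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)in;
    size_t pos = 0;

    assert(in != NULL && out != NULL);
    while (*p) {
        uint32_t cp;
        int extra;
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        p++;
        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)  /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;
        if (cp < 0x10000) {
            if (cap - pos < 2) return (size_t)-1;
            pos = put_u16le(out, pos, (uint16_t)cp);
        } else {
            /* Code points above the BMP become a surrogate pair. */
            if (cap - pos < 4) return (size_t)-1;
            cp -= 0x10000;
            pos = put_u16le(out, pos, (uint16_t)(0xD800 | (cp >> 10)));
            pos = put_u16le(out, pos, (uint16_t)(0xDC00 | (cp & 0x3FF)));
        }
    }
    return pos;
}
```

In practice a library routine such as iconv() is usually preferable to hand-rolled code; the sketch is mainly meant to show that the 16-bit units Windows expects can be produced without touching wchar_t at all.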

Or you could choose UTF-8 instead, which would reduce network bandwidth, but both Windows and Linux machines would have to convert to/from UTF-8 as needed. For network communications, UTF-8 would be the better choice.

On Windows, you can use MultiByteToWideChar() and WideCharToMultiByte() with the CP_UTF8 codepage.

On Linux, use the iconv() API, which lets you specify the UTF-8 charset for encoding/decoding.
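As a rough illustration of the iconv() route on Linux, the sketch below converts a UTF-8 buffer to UTF-16LE in one call. The wrapper name iconv_utf8_to_utf16le is hypothetical, and the example assumes the platform's iconv recognizes the charset names "UTF-8" and "UTF-16LE" (glibc does). Swapping the two names in iconv_open() gives the opposite direction.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert `inlen` bytes of UTF-8 into UTF-16LE (no BOM) using iconv.
 * Returns the number of bytes written to `out`, or (size_t)-1 on failure
 * (unsupported charset, invalid input, or output buffer too small). */
size_t iconv_utf8_to_utf16le(const char *in, size_t inlen, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);

    /* Note the argument order: destination charset first, then source. */
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;   /* iconv takes char **, but does not modify the input bytes */
    char *dst = out;
    size_t inleft = inlen;
    size_t outleft = cap;

    size_t rc = iconv(cd, &src, &inleft, &dst, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return cap - outleft;     /* bytes actually produced */
}
```

For a hostname obtained from gethostname(), the call would look like iconv_utf8_to_utf16le(name, strlen(name), buf, sizeof buf), after which the returned number of bytes can be written to the pipe.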