String marshalling with marshal_as and encodings

Posted 2019-05-12 11:07

Question:

Converting between String^ and std::string is very easy using marshal_as. However, I have not found any description of how encodings are handled in such a conversion. String^ uses UTF-16, but what about std::string? Its contents can be interpreted in various ways, and it would be very useful if the marshalling could convert to the encoding that is native to your application.

In my case all std::string instances contain UTF-8 encoded text. So how would I tell marshal_as to give me a UTF-8 encoded variant of the original String^ (and vice versa)?

Answer 1:

I agree that the documentation is lacking. Without proper documentation we are programming by coincidence. marshal_as can be very useful but when I have a question that isn't answered in the documentation, I just skip it and do it in multiple steps. Someone may have an accurate answer about how marshal_as works in each case but unless you add it to your code as a comment, the next programmer isn't going to think of the issue or understand it, even after checking the documentation.

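The BCL is very capable of converting characters. I suggest calling GetBytes on an Encoding instance (for example Encoding::UTF8) and then copying the resulting bytes into a C or C++ string data structure/class. For the opposite direction, the same Encoding instance's GetString turns the bytes back into a String^. Despite requiring more steps, it is then clear which character sets and encodings you are using, how mismatches are handled, how the string ownership can be transferred and how it should be destroyed. (Mismatches are, of course, not applicable when converting between UTF-16 and UTF-8, since both can represent every Unicode character; the only exception is malformed input such as an unpaired surrogate.)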
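
To make the remark about mismatches concrete, here is a minimal sketch of what a UTF-16 to UTF-8 conversion does. It is written in standard C++ rather than C++/CLI, it is not how marshal_as or the BCL is implemented, and the helper name utf16_to_utf8 is made up for illustration. In managed code you would normally let Encoding::UTF8 do this work. The point is only that every code point maps cleanly, and that the one input that cannot be converted is an unpaired surrogate, which this sketch chooses to reject with an exception.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of marshal_as or the BCL): encodes UTF-16
// code units as UTF-8 bytes. Throws std::invalid_argument on an unpaired
// surrogate, the only input that has no UTF-8 representation.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed as well
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {                  // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {          // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {        // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                          // 4 bytes: outside the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Whether to throw, substitute U+FFFD, or drop the offending code unit is exactly the kind of decision that stays invisible behind marshal_as and becomes explicit once you do the conversion in separate steps.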