Is it safe to encode and decode in C#?

Posted 2019-04-11 02:46

Question:

In C# I can encode binary data with Encoding.UTF8.GetString() and later convert it back with binary = Encoding.UTF8.GetBytes().

I expect the result to be my original binary data in every case, without exception.

But is it true in any case?

Or does it depend on the specific behaviour of the UTF-8 character set?

Or would it be better to use Encoding.ASCII.GetString() and Encoding.ASCII.GetBytes()?

If anybody knows exactly what Encoding does (how it treats special characters or special bytes), please give me advice.

Answer 1:

> In C# I can encode binary data with Encoding.UTF8.GetString() and later convert it back with binary = Encoding.UTF8.GetBytes().

No, because that isn't what a text encoding does.

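A text encoding transforms arbitrary text to/from structured bytes (meaning: structured in the way defined by that encoding). Bytes that do not follow that structure have no matching text, so decoding them cannot be reversed reliably.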

You have arbitrary bytes, not structured bytes. You should use base-64 (Convert.ToBase64String / Convert.FromBase64String), which converts arbitrary bytes to/from a structured string - in this case, structured according to the rules of base-64.

byte[] orig = ...
string storeThis = Convert.ToBase64String(orig);
// ...
byte[] backAgain = Convert.FromBase64String(storeThis);
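To make the difference concrete, here is a minimal sketch that pushes the same three bytes through UTF-8, ASCII, and base-64 and checks whether they come back unchanged. The class name RoundTripDemo and its helper methods are hypothetical names chosen for this illustration. The sample bytes were picked because none of them is valid on its own in UTF-8 or ASCII. With the default settings, .NET substitutes a replacement character for each invalid byte when decoding, so the original values are lost.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // Decode the bytes as UTF-8 text, then encode that text back to bytes.
    public static byte[] ViaUtf8(byte[] data) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));

    // Same idea with ASCII, which only defines the values 0x00-0x7F.
    public static byte[] ViaAscii(byte[] data) =>
        Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data));

    // Base-64 is designed to carry arbitrary bytes as text.
    public static byte[] ViaBase64(byte[] data) =>
        Convert.FromBase64String(Convert.ToBase64String(data));

    public static void Main()
    {
        // 0xFF and 0xFE never appear in valid UTF-8, and a lone 0x80
        // is a continuation byte with nothing to continue.
        byte[] orig = { 0xFF, 0xFE, 0x80 };

        Console.WriteLine(orig.SequenceEqual(ViaUtf8(orig)));   // False
        Console.WriteLine(orig.SequenceEqual(ViaAscii(orig)));  // False
        Console.WriteLine(orig.SequenceEqual(ViaBase64(orig))); // True
    }
}
```

Only the base-64 path returns the original bytes. Switching to Encoding.ASCII does not help, because every byte above 0x7F is replaced with '?' during decoding.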


Answer 2:

You should only use Encoding.UTF8 when you know the bytes are UTF-8 encoded text. Calling GetString() on arbitrary bytes can lead to unexpected results.

So if you create the bytes with Encoding.UTF8.GetBytes("Hello world!"), you can turn them back into a string with Encoding.UTF8.GetString(byteArray).
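If you are not sure whether some bytes really are UTF-8 text, one option is to decode them with a strict UTF8Encoding instance that throws instead of silently substituting characters. The sketch below assumes that approach. StrictUtf8Demo and TryDecode are hypothetical names introduced for illustration, not part of the framework.

```csharp
using System;
using System.Text;

public static class StrictUtf8Demo
{
    // A UTF-8 encoding that throws on invalid bytes instead of
    // replacing them (the default Encoding.UTF8 replaces them).
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                         throwOnInvalidBytes: true);

    // Returns true, with the decoded text, only for well-formed UTF-8.
    public static bool TryDecode(byte[] data, out string text)
    {
        try
        {
            text = Strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = string.Empty;
            return false;
        }
    }

    public static void Main()
    {
        // Bytes that came from text decode cleanly.
        byte[] fromText = Encoding.UTF8.GetBytes("Hello world!");
        Console.WriteLine(TryDecode(fromText, out string s) ? s : "(not UTF-8)");

        // 0xC3 opens a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] arbitrary = { 0xC3, 0x28 };
        Console.WriteLine(TryDecode(arbitrary, out _) ? "valid" : "(not UTF-8)");
    }
}
```

This only detects the problem; it does not make arbitrary bytes storable as text. For that, base-64 is still the appropriate tool.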