How do I get a consistent byte representation of s

2018-12-31 01:44发布

How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding?

I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know why encoding comes to play here.

Also, why should encoding be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?

30条回答
春风洒进眼中
2楼-- · 2018-12-31 01:53

The accepted answer is very, very complicated. Use the included .NET classes for this:

const string data = "A string with international characters: Norwegian: ÆØÅæøå, Chinese: 喂 谢谢";
var bytes = System.Text.Encoding.UTF8.GetBytes(data);
var decoded = System.Text.Encoding.UTF8.GetString(bytes);

Don't reinvent the wheel if you don't have to...

查看更多
何处买醉
3楼-- · 2018-12-31 01:53
bytes[] buffer = UnicodeEncoding.UTF8.GetBytes(string something); //for converting to UTF then get its bytes

bytes[] buffer = ASCIIEncoding.ASCII.GetBytes(string something); //for converting to ascii then get its bytes
查看更多
不再属于我。
4楼-- · 2018-12-31 01:54

With the advent of Span<T> released with C# 7.2, the canonical technique to capture the underlying memory representation of a string into a managed byte array is:

byte[] bytes = "rubbish_\u9999_string".AsSpan().AsBytes().ToArray();

Converting it back should be a non-starter because that means you are in fact interpreting the data somehow, but for the sake of completeness:

string s;
unsafe
{
    fixed (char* f = &bytes.AsSpan().NonPortableCast<byte, char>().DangerousGetPinnableReference())
    {
        s = new string(f);
    }
}

The names NonPortableCast and DangerousGetPinnableReference should further the argument that you probably shouldn't be doing this.

Note that working with Span<T> requires installing the System.Memory NuGet package.

Regardless, the actual original question and follow-up comments imply that the underlying memory is not being "interpreted" (which I assume means is not modified or read beyond the need to write it as-is), indicating that some implementation of the Stream class should be used instead of reasoning about the data as strings at all.

查看更多
看淡一切
5楼-- · 2018-12-31 01:55

You can use the following code for conversion between string and byte array.

string s = "Hello World";

// String to Byte[]

byte[] byte1 = System.Text.Encoding.Default.GetBytes(s);

// OR

byte[] byte2 = System.Text.ASCIIEncoding.Default.GetBytes(s);

// Byte[] to string

string str = System.Text.Encoding.UTF8.GetString(byte1);
查看更多
大哥的爱人
6楼-- · 2018-12-31 01:55

simple code with LINQ

string s = "abc"
byte[] b = s.Select(e => (byte)e).ToArray();

EDIT : as commented below, it is not a good way.

but you can still use it to understand LINQ with a more appropriate coding :

string s = "abc"
byte[] b = s.Cast<byte>().ToArray();
查看更多
若你有天会懂
7楼-- · 2018-12-31 01:56

Also please explain why encoding should be taken into consideration. Can't I simply get what bytes the string has been stored in? Why this dependency on encoding?!!!

Because there is no such thing as "the bytes of the string".

A string (or more generically, a text) is composed of characters: letters, digits, and other symbols. That's all. Computers, however, do not know anything about characters; they can only handle bytes. Therefore, if you want to store or transmit text by using a computer, you need to transform the characters to bytes. How do you do that? Here's where encodings come to the scene.

An encoding is nothing but a convention to translate logical characters to physical bytes. The simplest and best known encoding is ASCII, and it is all you need if you write in English. For other languages you will need more complete encodings, being any of the Unicode flavours the safest choice nowadays.

So, in short, trying to "get the bytes of a string without using encodings" is as impossible as "writing a text without using any language".

By the way, I strongly recommend you (and anyone, for that matter) to read this small piece of wisdom: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

查看更多
登录 后发表回答