How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding?
I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know why encoding comes into play here.
Also, why should encoding be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?
The accepted answer is very, very complicated. Use the included .NET classes for this:
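A minimal sketch of what that can look like, assuming UTF-8 is an acceptable encoding for your data (the class and method names here are made up for illustration):

```csharp
using System;
using System.Text;

public static class BuiltInEncodingDemo
{
    // Encodes text with the framework's built-in UTF-8 encoder.
    // UTF-8 is an assumed choice; any System.Text.Encoding would work the same way.
    public static byte[] ToUtf8Bytes(string text) => Encoding.UTF8.GetBytes(text);

    public static void Main()
    {
        // Escapes are used so the sample does not depend on the source file's encoding.
        const string data = "Norwegian: \u00C6\u00D8\u00C5, Chinese: \u8C22\u8C22";
        byte[] bytes = ToUtf8Bytes(data);

        // The byte count differs from the character count for non-ASCII text.
        Console.WriteLine($"{data.Length} chars -> {bytes.Length} bytes");
    }
}
```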
Don't reinvent the wheel if you don't have to...
With the advent of Span<T>, released with C# 7.2, the canonical technique to capture the underlying memory representation of a string into a managed byte array is:
Converting it back should be a non-starter because that means you are in fact interpreting the data somehow, but for the sake of completeness:
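As a rough sketch of both directions: this is written against the released System.Memory API, where, as far as I know, MemoryMarshal.AsBytes and MemoryMarshal.Cast play the roles of the earlier preview methods. The class and method names are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

public static class RawStringBytes
{
    // Copies the string's in-memory UTF-16 code units into a new byte array.
    // No System.Text.Encoding is involved; byte order is whatever the machine uses.
    public static byte[] ToRawBytes(string text) =>
        MemoryMarshal.AsBytes(text.AsSpan()).ToArray();

    // Reinterprets the bytes as UTF-16 code units again.
    // Assumes an even byte count produced by ToRawBytes on a machine with the same byte order.
    public static string FromRawBytes(byte[] bytes) =>
        MemoryMarshal.Cast<byte, char>(bytes.AsSpan()).ToString();
}
```

The result is only meaningful to code that also treats it as raw UTF-16 in the same byte order, which is itself an interpretation of the data.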
The names NonPortableCast and DangerousGetPinnableReference should further the argument that you probably shouldn't be doing this.
Note that working with Span<T> requires installing the System.Memory NuGet package.
Regardless, the actual original question and follow-up comments imply that the underlying memory is not being "interpreted" (which I assume means it is not modified or read beyond the need to write it as-is), indicating that some implementation of the Stream class should be used instead of reasoning about the data as strings at all.
You can use the following code for conversion between string and byte array.
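A sketch of such a conversion in both directions might look like this. The helper names are hypothetical, and the encoding is passed in explicitly because decoding only works if it matches the encoding used to produce the bytes:

```csharp
using System;
using System.Text;

public static class StringBytesConverter
{
    // string -> byte[] using the caller's chosen encoding.
    public static byte[] ToBytes(string text, Encoding encoding) =>
        encoding.GetBytes(text);

    // byte[] -> string; must use the same encoding that produced the bytes.
    public static string FromBytes(byte[] bytes, Encoding encoding) =>
        encoding.GetString(bytes);

    public static void Main()
    {
        byte[] bytes = ToBytes("Hello World", Encoding.UTF8);
        Console.WriteLine(FromBytes(bytes, Encoding.UTF8));
    }
}
```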
Simple code with LINQ:
EDIT: as commented below, this is not a good way to do it,
but you can still use it to understand LINQ, with more appropriate code:
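To illustrate the difference, here is a sketch contrasting a per-character cast with a lossless LINQ projection. Both method names are invented for this example, and the second variant is one possible "more appropriate" form, not necessarily the one this answer had in mind:

```csharp
using System;
using System.Linq;

public static class LinqStringBytes
{
    // Lossy: keeps only the low 8 bits of each UTF-16 code unit,
    // so any character above U+00FF is silently corrupted.
    public static byte[] TruncatingCast(string text) =>
        text.Select(c => unchecked((byte)c)).ToArray();

    // Lossless: emits both bytes of every UTF-16 code unit, in machine byte order.
    public static byte[] Utf16Bytes(string text) =>
        text.SelectMany(c => BitConverter.GetBytes(c)).ToArray();
}
```

For plain ASCII input the truncating version happens to give the ASCII bytes, which is why it can look correct in a quick test.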
Because there is no such thing as "the bytes of the string".
A string (or more generically, a text) is composed of characters: letters, digits, and other symbols. That's all. Computers, however, do not know anything about characters; they can only handle bytes. Therefore, if you want to store or transmit text by using a computer, you need to transform the characters to bytes. How do you do that? Here's where encodings come onto the scene.
An encoding is nothing but a convention to translate logical characters to physical bytes. The simplest and best known encoding is ASCII, and it is all you need if you write in plain English. For other languages you will need more complete encodings, with any of the Unicode flavours being the safest choice nowadays.
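A small sketch makes this concrete: the same one-character text yields different bytes under three of the encodings that ship with .NET. The class and helper names are made up for the example.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // Encodes the text and formats the resulting bytes as hex, e.g. "C3-A9".
    public static string Hex(string text, Encoding encoding) =>
        BitConverter.ToString(encoding.GetBytes(text));

    public static void Main()
    {
        const string text = "\u00E9"; // the single character é

        Console.WriteLine(Hex(text, Encoding.ASCII));   // 3F (replaced with '?', ASCII cannot represent it)
        Console.WriteLine(Hex(text, Encoding.UTF8));    // C3-A9
        Console.WriteLine(Hex(text, Encoding.Unicode)); // E9-00 (UTF-16, little-endian)
    }
}
```

None of the three results is "the" bytes of the string; each is just the outcome of applying one convention.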
So, in short, trying to "get the bytes of a string without using encodings" is as impossible as "writing a text without using any language".
By the way, I strongly recommend that you (and anyone, for that matter) read this small piece of wisdom: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)