In this conversion function:
public static byte[] GetBytes(string str)
{
byte[] bytes = new byte[str.Length * sizeof(char)];
System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
return bytes;
}
byte[] test = GetBytes("abc");
The resulting array contains zero bytes:
test = [97, 0, 98, 0, 99, 0]
And when we convert the byte[] back to a string, the result is:
string test = "a b c "
How do we make it so that it doesn't create those zeroes?
In reality, .NET (at least 4.0) automatically adjusts the number of bytes written per char when a string is serialized with BinaryWriter, which uses UTF-8 by default.
UTF-8 characters have variable length (not necessarily 1 byte); ASCII characters always take 1 byte:
'ē' = 2 bytes
'e' = 1 byte
This must be kept in mind: in UTF-8 the word "ēvalds" takes 7 bytes, while "evalds" takes 6 bytes.
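The size difference can be checked directly. This is a minimal sketch; the class name Utf8SizeDemo and its method names are made up for illustration, while Encoding.UTF8.GetByteCount is the standard framework call.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class Utf8SizeDemo
{
    // Number of bytes the string occupies once encoded as UTF-8.
    public static int Utf8ByteCount(string s) => Encoding.UTF8.GetByteCount(s);

    // Number of bytes of the in-memory UTF-16 form (2 bytes per char).
    public static int Utf16ByteCount(string s) => s.Length * sizeof(char);
}
```

Utf8ByteCount("\u0113valds") returns 7 and Utf8ByteCount("evalds") returns 6, while Utf16ByteCount returns 12 for both, because each of the six chars takes two bytes in memory.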
Try to specify the Encoding explicitly. You can use the following code to convert a string to bytes with a specified encoding. If you print the contents of bytes, you will get
{ 97, 98, 99 }
which doesn't contain zeros, unlike your example.
Your example effectively uses an encoding with 16 bits per symbol, which can be observed by printing the resulting bytes.
Then, when converting back, you should select the appropriate encoding:
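The snippets this answer originally referred to are not preserved here, so the following is only a sketch of the idea under that caveat. The class EncodingRoundTrip and its method names are hypothetical; Encoding.GetBytes and Encoding.GetString are the real framework methods.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class EncodingRoundTrip
{
    // Encode the text with an explicitly chosen encoding.
    public static byte[] ToBytes(string text, Encoding encoding) => encoding.GetBytes(text);

    // Decode with the same encoding that produced the bytes.
    public static string FromBytes(byte[] bytes, Encoding encoding) => encoding.GetString(bytes);
}
```

For example, ToBytes("abc", Encoding.UTF8) yields { 97, 98, 99 }, whereas ToBytes("abc", Encoding.Unicode) yields the six-byte array from the question. Passing either array to FromBytes with the matching encoding gives the original string back.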
Prints
"abc"
as you might expect.
Just to clear up the confusion: the char type in C# takes 2 bytes, so string.ToCharArray() returns an array in which each item takes 2 bytes of storage. Buffer.BlockCopy copies that memory byte by byte into the byte array, where each item takes 1 byte, so every character ends up spread across two bytes. No data is lost; for ASCII characters the second byte is simply zero, hence the zeroes showing up in the result.
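To illustrate that the raw copy keeps every byte of every char, here is a sketch that pairs the question's GetBytes with a reverse copy. The class name RawCharBytes and the GetString helper are additions for illustration, not part of the original question.

```csharp
using System;

// Hypothetical wrapper for illustration only.
public static class RawCharBytes
{
    // The question's approach: copy the chars' memory byte by byte.
    public static byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // The reverse copy: reinterpret each pair of bytes as one char.
    public static string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        Buffer.BlockCopy(bytes, 0, chars, 0, chars.Length * sizeof(char));
        return new string(chars);
    }
}
```

GetString(GetBytes("abc")) returns "abc" again, zero bytes and all. The spaces in "a b c " appear only when the six bytes are decoded with a single-byte encoding.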
As suggested, Encoding.ASCII.GetBytes is a safer option to use, as long as the string contains only ASCII characters (anything else is replaced with '?').
First, let's look at what your code does wrong.
char is 16-bit (2 bytes) in the .NET Framework, which means that sizeof(char) returns 2. For a one-character string, str.Length is 1, so your code effectively becomes byte[] bytes = new byte[2]. When you then use the Buffer.BlockCopy() method, you actually copy 2 bytes from the source array to the destination array. This means your GetBytes() method returns bytes[0] = 32 and bytes[1] = 0 if your string is " ".
Try to use Encoding.ASCII.GetBytes() instead.
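The two bytes per char described above can be made visible with a small sketch. CharLayout and Split are hypothetical names; the byte order shown by BlockCopy in the question corresponds to (Low, High) only on a little-endian machine, which is an assumption here.

```csharp
// Hypothetical helper for illustration only.
public static class CharLayout
{
    // sizeof(char) is a compile-time constant in C#: 2 bytes.
    public const int BytesPerChar = sizeof(char);

    // Splits one UTF-16 code unit into its low and high byte.
    public static (byte Low, byte High) Split(char c) =>
        ((byte)(c & 0xFF), (byte)(c >> 8));
}
```

Split(' ') gives (32, 0) and Split('a') gives (97, 0), matching the bytes[0] and bytes[1] values mentioned above.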
(97, 0) is the UTF-16 (little-endian) representation of 'a', which .NET calls "Unicode". That encoding stores each char in two bytes, so you cannot remove the zeros while keeping it. But you can change the encoding to ASCII. Try the following to convert a string to byte[]:
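The original snippet is not preserved, so this is a minimal sketch of the ASCII approach. The wrapper class AsciiConversion is a hypothetical name; Encoding.ASCII.GetBytes and Encoding.ASCII.GetString are the standard calls.

```csharp
using System.Text;

// Hypothetical wrapper for illustration only.
public static class AsciiConversion
{
    // One byte per character; non-ASCII characters become '?' (63).
    public static byte[] GetBytes(string str) => Encoding.ASCII.GetBytes(str);

    // Decode bytes that were produced with the ASCII encoding.
    public static string GetString(byte[] bytes) => Encoding.ASCII.GetString(bytes);
}
```

GetBytes("abc") returns { 97, 98, 99 } and GetString turns that back into "abc". If the text may contain characters outside ASCII, Encoding.UTF8 is probably the better choice, since ASCII silently replaces them.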