Are the raw bytes written by .NET System.IO.Binary

Posted 2019-07-14 12:32

Background

I am manually writing a large data block into a binary file with System.IO.BinaryWriter. I have chosen this due to the improved performance compared to a wide variety of other means of serialization & deserialization (I am currently deserializing with System.IO.BinaryReader).

Question

I may need to use the serialized format in other programming languages such as Java and/or Rust. Would they be able to understand the raw binary written by System.IO.BinaryWriter and read it in a similar manner to .NET's System.IO.BinaryReader?

(I am assuming that the new platforms (Java/Rust) will have implicit knowledge of the specific order in which the raw binary was written.)

Side Info

I am aware that Protocol Buffers is meant to be a performant, language-agnostic framework for serializing/deserializing in this scenario, but: (1) I am using F# and it struggles with discriminated unions; (2) it wasn't much effort to write my own custom serializer, as my types aren't too complex.

2 answers
#2 · 2019-07-14 13:00

Yes, you can.

bool     --> 1 byte, 0 | 1
sbyte    --> x
byte[]   --> xxxxxx
char[]   --> encoding.GetBytes(char[])
byte     --> x
char     --> encoding.GetBytes(char), size depends on the encoding
decimal  --> decimal.GetBytes(), 16 bytes, see the System.Decimal class code
double   --> 8 bytes, see the System.Double class code
short    --> 2 bytes, <lsb><msb>
int      --> 4 bytes, <lsb>xx<msb>
long     --> 8 bytes, <lsb>xxxxxx<msb>
float    --> 4 bytes, see the System.Single class code
string   --> 7-bit encoded length (variable size) + encoding.GetBytes(), see the 7-bit encoding method below
ushort   --> same as short
uint     --> same as int
ulong    --> same as long

For numeric types, data is written in little-endian format.
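To make the little-endian point concrete from the reader's side, here is a minimal sketch in Rust (one of the languages the question mentions). The function names are hypothetical helpers, not part of any library; the only standard-library calls used are `Read::read_exact` and the `from_le_bytes` constructors. It assumes the reader already knows the order in which the fields were written.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads exactly N bytes from the stream.
fn read_array<R: Read, const N: usize>(r: &mut R) -> io::Result<[u8; N]> {
    let mut buf = [0u8; N];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// bool is one byte; any non-zero value is treated as true here.
fn read_bool<R: Read>(r: &mut R) -> io::Result<bool> {
    let b: [u8; 1] = read_array(r)?;
    Ok(b[0] != 0)
}

/// short: 2 bytes, least significant byte first.
fn read_i16<R: Read>(r: &mut R) -> io::Result<i16> {
    let b: [u8; 2] = read_array(r)?;
    Ok(i16::from_le_bytes(b))
}

/// int: 4 bytes, least significant byte first.
fn read_i32<R: Read>(r: &mut R) -> io::Result<i32> {
    let b: [u8; 4] = read_array(r)?;
    Ok(i32::from_le_bytes(b))
}

/// long: 8 bytes, least significant byte first.
fn read_i64<R: Read>(r: &mut R) -> io::Result<i64> {
    let b: [u8; 8] = read_array(r)?;
    Ok(i64::from_le_bytes(b))
}

/// double: 8 bytes, IEEE 754 binary64 stored little-endian.
fn read_f64<R: Read>(r: &mut R) -> io::Result<f64> {
    let b: [u8; 8] = read_array(r)?;
    Ok(f64::from_le_bytes(b))
}
```

A `&[u8]` slice implements `Read`, so these can be tried against an in-memory buffer before pointing them at a `File` wrapped in a `BufReader`. The unsigned and `float` variants follow the same pattern with `u16`/`u32`/`u64`/`f32`.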

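// Writes an int 7 bits at a time, least significant group first.
// The high bit of each byte is set when more bytes follow.
protected void Write7BitEncodedInt(int value)
{
    uint num = (uint) value;   // negative values are treated as large unsigned ones
    while (num >= 0x80)
    {
        this.Write((byte) (num | 0x80));   // low 7 bits plus the continuation flag
        num = num >> 7;
    }
    this.Write((byte) num);   // last byte, high bit clear
}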
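For the reading side, the inverse of the method above might look roughly like this in Rust. This is a sketch, not a port of BinaryReader: `read_7bit_encoded_int` and `read_string` are hypothetical names, and `read_string` assumes the writer used UTF-8 (the BinaryWriter default when no encoding is passed to the constructor).

```rust
use std::io::{self, Read};

/// Decodes a 7-bit encoded int: each byte carries 7 payload bits,
/// lowest group first, and a set high bit means another byte follows.
fn read_7bit_encoded_int<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut result: u32 = 0;
    let mut shift: u32 = 0;
    loop {
        let mut byte = [0u8; 1];
        r.read_exact(&mut byte)?;
        result |= u32::from(byte[0] & 0x7F) << shift;
        if byte[0] & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
        // A 32-bit value never needs more than 5 bytes.
        if shift >= 35 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "7-bit encoded int is too long",
            ));
        }
    }
}

/// Reads a length-prefixed string: the prefix is the number of encoded
/// bytes, followed by the bytes themselves (assumed to be UTF-8 here).
fn read_string<R: Read>(r: &mut R) -> io::Result<String> {
    let len = read_7bit_encoded_int(r)? as usize;
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

If the writer was constructed with a different encoding, the `String::from_utf8` step would need to be replaced with a matching decoder.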
#3 · 2019-07-14 13:01

It depends on the types you write with the BinaryWriter.

  • byte, sbyte and byte[]: no problem.
  • (U)IntXX: a matter of endianness. The .NET BinaryWriter writes these types in little-endian format, so the reader must decode them as little-endian.
  • float and double: no problem as long as the reading side uses the same IEEE 754 formats and reads the bytes as little-endian.
  • decimal: this is a .NET-specific type, similar to Currency but with a different format. Use it carefully.
  • char and char[]: written using the current Encoding of the BinaryWriter. Use the same encoding on both sides and everything is all right.
  • string: the length is written first in the so-called 7-bit encoded int format (1 byte for lengths up to 127, etc.), followed by the string in the current encoding. The length counts encoded bytes, not characters. To make things compatible, you could instead write character arrays preceded by length information that you write yourself.
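Since decimal is the one type above with no direct counterpart elsewhere, here is a hedged Rust sketch of how its 16 bytes could be unpacked. It assumes the bytes follow the layout documented for `decimal.GetBits` (four little-endian 32-bit words in the order lo, mid, hi, flags, with the scale in bits 16 to 23 of flags and the sign in bit 31); verify this against a few known values from your own writer before relying on it. `DotNetDecimal`, `decode_decimal` and `to_f64` are hypothetical names.

```rust
/// Parts of a .NET decimal as recovered from its 16-byte form.
#[derive(Debug, PartialEq)]
struct DotNetDecimal {
    negative: bool,
    scale: u8,      // number of digits after the decimal point
    mantissa: u128, // 96-bit unsigned integer
}

/// Assumed layout: lo, mid, hi, flags, each a little-endian 32-bit word.
fn decode_decimal(bytes: [u8; 16]) -> DotNetDecimal {
    let word = |i: usize| {
        u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]])
    };
    let (lo, mid, hi, flags) = (word(0), word(4), word(8), word(12));
    DotNetDecimal {
        negative: flags & 0x8000_0000 != 0,
        scale: ((flags >> 16) & 0xFF) as u8,
        mantissa: (u128::from(hi) << 64) | (u128::from(mid) << 32) | u128::from(lo),
    }
}

/// Lossy convenience conversion; a real port would keep the exact parts
/// or hand them to an arbitrary-precision decimal type.
fn to_f64(d: &DotNetDecimal) -> f64 {
    let v = d.mantissa as f64 / 10f64.powi(i32::from(d.scale));
    if d.negative { -v } else { v }
}
```

Keeping the sign, scale and mantissa separate avoids rounding, which is usually the reason decimal was chosen on the .NET side in the first place.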