Reading a "string in little-endian UTF-16 encoding"

Posted 2019-01-28 08:58

Question:

I am following the specification for this file format: https://github.com/rouault/dump_gdbtable/wiki/FGDB-Spec

utf16: string in little-endian UTF-16 encoding

How do I read this? I tried BinaryReader.ReadString(), but it returns something along the lines of:

"\0e\0y\0w\0o\0r\0d\0\0 \0\0\0\0\rP\0a\0r\0a\0m\0e\0t\0e\0r\0N\0a\0m\0e\0\0 \0\0\0\0\fC\0o\0n\0f\0i\0g\0S\0t\0r\0"

That definitely isn't right.


From the specification:

ubyte: number of UTF-16 characters (not bytes) of the name of the field
utf16: name of the field
ubyte: number of UTF-16 characters (not bytes) of the alias of the field. Might be 0
utf16: alias of the field (omitted if previous field is 0)
ubyte: field type ( 0 = int16, 1 = int32, 2 = float32, 3 = float64, 4 = string, 5 = datetime, 6 = objectid, 7 = geometry, 8 = binary, 9=raster, 10/11 = UUID, 12 = XML )

Could I somehow use the number of UTF-16 characters to read the name of the field?

Answer 1:

BinaryReader's ReadString() method has no overload that takes a string length. Instead, it expects the string to be preceded by a length prefix stored as a 7-bit encoded integer, which doesn't match the format in the spec you linked.

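Therefore, you cannot use ReadString() directly, but you can:

  1. use ReadByte() to get the string length in characters,
  2. multiply it by 2 to get the length in bytes (each UTF-16 code unit is two bytes),
  3. use ReadBytes(count) to read that many bytes,
  4. use Encoding.Unicode.GetString(bytes) to decode them.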
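
The steps above can be sketched as a small helper. This is a minimal sketch, not code from the spec or from the original poster: the class and method names (Utf16FieldReader, ReadCountedUtf16) are hypothetical, and it assumes that "number of UTF-16 characters" means the number of 2-byte code units.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf16FieldReader
{
    // Hypothetical helper: reads a one-byte character count followed by
    // that many UTF-16 little-endian code units (assumed 2 bytes each).
    public static string ReadCountedUtf16(BinaryReader reader)
    {
        int charCount = reader.ReadByte();        // ubyte: number of UTF-16 characters
        if (charCount == 0)
            return string.Empty;                  // no string data follows a zero count

        int byteCount = charCount * 2;            // 2 bytes per UTF-16 code unit
        byte[] bytes = reader.ReadBytes(byteCount);
        if (bytes.Length != byteCount)
            throw new EndOfStreamException("String data is truncated.");

        return Encoding.Unicode.GetString(bytes); // Encoding.Unicode is UTF-16 little-endian
    }
}
```

Calling the helper twice in a row would read the field name and then the alias, since an alias count of 0 yields an empty string without consuming further bytes.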


Answer 2:

The reader should be created with a UTF-16 encoding:

// Encoding.Unicode is UTF-16 little-endian; the reader uses it to decode characters.
BinaryReader br = new BinaryReader(
    File.Open("C:\\florida.gdb\\a00000002.gdbtable",
              FileMode.Open,
              FileAccess.Read,
              FileShare.Read | FileShare.Delete),
    Encoding.Unicode);

Where Encoding is System.Text.Encoding.
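
Note that changing the encoding does not change the length prefix that ReadString() expects, so the character count from the spec still has to be read separately. One possible way to combine the two, assuming the reader was constructed with Encoding.Unicode as shown, is to read the count with ReadByte() and then call ReadChars(count). The names Utf16CharReader and ReadCountedChars below are hypothetical, and this is only a sketch:

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf16CharReader
{
    // Hypothetical helper. Assumes `reader` was created with Encoding.Unicode,
    // so that ReadChars decodes UTF-16 little-endian data.
    public static string ReadCountedChars(BinaryReader reader)
    {
        int charCount = reader.ReadByte();          // ubyte: number of UTF-16 characters
        char[] chars = reader.ReadChars(charCount); // decodes charCount UTF-16 code units
        if (chars.Length != charCount)
            throw new EndOfStreamException("String data is truncated.");

        return new string(chars);
    }
}
```

With a reader created using the default encoding (UTF-8), the same call would decode the bytes incorrectly, so the helper is only meaningful together with the constructor argument.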


For various historical reasons, Microsoft and Windows refer to UTF-16 (specifically, the little-endian variant) as "Unicode".
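
To illustrate the naming, the sketch below compares Encoding.Unicode with Encoding.BigEndianUnicode on the same input. The class and method names (Utf16ByteOrderDemo, LittleEndianBytes, BigEndianBytes) are made up for this example.

```csharp
using System.Text;

public static class Utf16ByteOrderDemo
{
    // Encoding.Unicode writes the low byte of each code unit first (little-endian).
    public static byte[] LittleEndianBytes(string s)
    {
        return Encoding.Unicode.GetBytes(s);
    }

    // Encoding.BigEndianUnicode writes the high byte first (big-endian).
    public static byte[] BigEndianBytes(string s)
    {
        return Encoding.BigEndianUnicode.GetBytes(s);
    }
}
```

For the string "A" (U+0041), LittleEndianBytes returns the bytes 0x41 0x00 and BigEndianBytes returns 0x00 0x41. The little-endian layout is the one the spec describes.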