Parsing a raw Protocol Buffer byte stream in C#

Posted 2020-07-18 11:01

Question:

Given a protocol-buffer-encoded Stream or byte[], but NOT knowing the message type itself, how can we print the skeleton of the message? The use case is debugging protobuf-based I/O for root cause analysis.

If an existing tool can parse a raw Protocol Buffer byte stream from a binary file, that would be great! An alternative would be to use the protobuf-net ProtoReader class and keep reading until it hits an error, but it isn't clear how ProtoReader is meant to be used. I started with the code below but couldn't find good documentation on how to use ProtoReader for this, and the project's source code wasn't easy to follow either, so I would appreciate some tips/help.

using (var fs = File.OpenRead(filePath))
{
    using (var pr = new ProtoReader(fs, TypeModel.Create(), null))
    {
        // Use ProtoReader to march through the bytes
        // Printing field number, type, size and payload values/bytes
    }
}

Answer 1:

First, note that Google's "protoc" command-line tool has an option (--decode_raw) that tries to disassemble a raw message without schema information. With protobuf-net, you can do something like the WriteTree method below, but I need to emphasize that without the schema the format is ambiguous: there are more data types/formats than there are "wire types" (the actual encoding formats). The method shows just one possible interpretation of each wire type; the same data can be parsed in other ways.
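As an aside before the protobuf-net code, the ambiguity can be made concrete without any library. The sketch below is only an illustration: WireAmbiguity, SplitKey and ZigZagDecode are made-up names, not part of protobuf-net. It splits a field key into its field number and wire type, then reads the same varint as both a plain and a zigzag integer, and the same four bytes as both a float and an integer. It assumes a little-endian host, since BitConverter uses the machine's byte order.

```csharp
using System;

static class WireAmbiguity
{
    // A field key is itself a varint holding (fieldNumber << 3) | wireType.
    public static (int FieldNumber, int WireType) SplitKey(uint key)
        => ((int)(key >> 3), (int)(key & 7));

    // ZigZag maps 0, 1, 2, 3, ... back to 0, -1, 1, -2, ...
    public static long ZigZagDecode(ulong value)
        => (long)(value >> 1) ^ -(long)(value & 1);

    public static void Main()
    {
        var (field, wire) = SplitKey(0x08);
        Console.WriteLine($"key 0x08: field {field}, wire type {wire}");

        // The same varint payload gives two different answers.
        ulong raw = 3;
        Console.WriteLine($"varint 3 as int64: {(long)raw}, as sint64: {ZigZagDecode(raw)}");

        // The same fixed32 payload gives two different answers.
        // Assumption: little-endian host, matching the protobuf wire order.
        byte[] fixed32 = { 0x00, 0x00, 0x80, 0x3F };
        Console.WriteLine($"as float: {BitConverter.ToSingle(fixed32, 0)}, as int32: {BitConverter.ToInt32(fixed32, 0)}");
    }
}
```

Nothing in the bytes says which reading is the intended one; only the schema does.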

// requires: using System; using ProtoBuf;
static void WriteTree(ProtoReader reader)
{
    while (reader.ReadFieldHeader() > 0)
    {
        Console.WriteLine(reader.FieldNumber);
        Console.WriteLine(reader.WireType);
        switch (reader.WireType)
        {
            case WireType.Variant:
                // warning: this will appear to be wrong if the
                // value was written signed ("zigzag"); to
                // read zigzag, add: reader.Hint(WireType.SignedVariant);
                Console.WriteLine(reader.ReadInt64());
                break;
            case WireType.String:
                // note: "string" here just means "some bytes"; could
                // be UTF-8, could be a BLOB, could be a "packed array",
                // or could be sub-object(s); showing UTF-8 for simplicity
                Console.WriteLine(reader.ReadString());
                break;
            case WireType.Fixed32:
                // could be an integer, but probably floating point
                Console.WriteLine(reader.ReadSingle());
                break;
            case WireType.Fixed64:
                // could be an integer, but probably floating point
                Console.WriteLine(reader.ReadDouble());
                break;
            case WireType.StartGroup:
                // one of 2 sub-object formats (the other is
                // length-prefixed and arrives as WireType.String)
                var tok = ProtoReader.StartSubItem(reader);
                WriteTree(reader);
                ProtoReader.EndSubItem(tok, reader);
                break;
            default:
                reader.SkipField();
                break;
        }
    }
}
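
The comment on the String case says a length-prefixed field could be UTF-8, a BLOB, a packed array or a sub-object. One common heuristic is to try the payload as a nested message first and fall back to text if that fails. Below is a rough, library-free sketch of that idea working directly on a byte[]. RawDump, TryDump and TryReadVarint are hypothetical names, not protobuf-net APIs; groups are not handled, and the fixed-width cases assume a little-endian host.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RawDump
{
    // Reads a base-128 varint at pos; returns false on truncated or overlong input.
    static bool TryReadVarint(byte[] buf, ref int pos, int end, out ulong value)
    {
        value = 0;
        for (int shift = 0; shift < 64 && pos < end; shift += 7)
        {
            byte b = buf[pos++];
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return true;
        }
        return false;
    }

    // Walks buf[pos..end) as a sequence of fields, adding one line per field.
    // Returns false if the bytes do not form a well-formed message.
    public static bool TryDump(byte[] buf, int pos, int end, int depth, List<string> lines)
    {
        string pad = new string(' ', depth * 2);
        while (pos < end)
        {
            if (!TryReadVarint(buf, ref pos, end, out ulong key)) return false;
            ulong field = key >> 3;
            int wire = (int)(key & 7);
            if (field == 0) return false; // field number 0 is never valid
            switch (wire)
            {
                case 0: // varint
                    if (!TryReadVarint(buf, ref pos, end, out ulong v)) return false;
                    lines.Add($"{pad}{field}: varint {v}");
                    break;
                case 1: // fixed 64-bit
                    if (end - pos < 8) return false;
                    lines.Add($"{pad}{field}: fixed64 0x{BitConverter.ToUInt64(buf, pos):X16}");
                    pos += 8;
                    break;
                case 5: // fixed 32-bit
                    if (end - pos < 4) return false;
                    lines.Add($"{pad}{field}: fixed32 0x{BitConverter.ToUInt32(buf, pos):X8}");
                    pos += 4;
                    break;
                case 2: // length-prefixed: guess "message" first, then fall back to text
                    if (!TryReadVarint(buf, ref pos, end, out ulong len) || len > (ulong)(end - pos)) return false;
                    int stop = pos + (int)len;
                    var nested = new List<string>();
                    if (len > 0 && TryDump(buf, pos, stop, depth + 1, nested))
                    {
                        lines.Add($"{pad}{field}: message (guess)");
                        lines.AddRange(nested);
                    }
                    else
                    {
                        lines.Add($"{pad}{field}: bytes \"{Encoding.UTF8.GetString(buf, pos, (int)len)}\"");
                    }
                    pos = stop;
                    break;
                default: // groups (3, 4) and unknown wire types: not handled in this sketch
                    return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        // field 1 = varint 150; field 2 = "abc"; field 3 = nested { field 1 = varint 1 }
        byte[] data = { 0x08, 0x96, 0x01, 0x12, 0x03, 0x61, 0x62, 0x63, 0x1A, 0x02, 0x08, 0x01 };
        var lines = new List<string>();
        if (TryDump(data, 0, data.Length, 0, lines))
            foreach (var line in lines) Console.WriteLine(line);
    }
}
```

The guess can be wrong in both directions. For example, the two bytes of the string "hi" (0x68 0x69) also decode cleanly as field 13 holding the varint 105, so this sketch would report them as a message rather than as text.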