How do protocol buffers handle versioning?

Posted 2019-03-12 06:35

Question:

How do protocol buffers handle type versioning?

For example, what happens when I need to change a type definition over time, such as adding or removing fields?

Answer 1:

Google designed protobuf to be pretty forgiving about versioning:

  • unexpected data is either stored as "extensions" (making it round-trip safe), or silently dropped, depending on the implementation
  • new fields are generally added as "optional", meaning that old data can be loaded successfully
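To make the first bullet concrete: on the wire, every protobuf field starts with a key that packs the field number and the wire type. A reader built from an older schema can therefore recognise a field number it does not know, skip its payload, and keep the raw bytes for re-serialization. Below is a minimal, language-neutral sketch in Python. The helper names (encode_varint, parse, etc.) are hypothetical and not from any protobuf library, and only varint and length-delimited fields are handled:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte, low bits first)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    """Encode an int as a varint field (wire type 0), or bytes as length-delimited (wire type 2)."""
    if isinstance(value, int):
        return encode_varint(number << 3 | 0) + encode_varint(value)
    return encode_varint(number << 3 | 2) + encode_varint(len(value)) + value


def parse(buf: bytes, known: set[int]):
    """Split a message into known fields and the raw bytes of unknown ones."""
    fields, unknown, pos = {}, b"", 0
    while pos < len(buf):
        start = pos
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[number] = value
        else:
            unknown += buf[start:pos]  # keep the raw bytes so they can be re-emitted


    return fields, unknown


# A "v2" writer adds field 2, which a "v1" reader (knowing only field 1) has never seen.
v2_message = encode_field(1, 150) + encode_field(2, b"new")
fields, unknown = parse(v2_message, known={1})
print(fields)  # {1: 150}
# Re-serializing the known fields plus the preserved unknown bytes round-trips exactly.
assert encode_field(1, fields[1]) + unknown == v2_message
```

Implementations differ in whether they keep or discard those unknown bytes, which is the "stored or silently dropped" distinction in the first bullet.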

However:

  • do not renumber fields; that would break existing data
  • you should not normally change the way any given field is stored (e.g. from a fixed-width 32-bit int to a "varint")
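Both rules come down to the key in front of each field, which is the varint (field_number << 3) | wire_type. The following minimal Python sketch uses hypothetical helper names. It shows two things. Moving field 3 from varint to fixed32 changes both the key and the payload size. Renumbering the field changes the key. Either way, a reader compiled against the old definition would no longer decode the value it expects:

```python
import struct

WIRE_VARINT = 0
WIRE_FIXED32 = 5


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low bits first, high bit means 'more follows'."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    # Every field starts with a varint key: (field_number << 3) | wire_type.
    return encode_varint(field_number << 3 | wire_type)


def as_varint(field_number: int, value: int) -> bytes:
    return key(field_number, WIRE_VARINT) + encode_varint(value)


def as_fixed32(field_number: int, value: int) -> bytes:
    # fixed32 payload: always 4 bytes, little-endian, regardless of magnitude.
    return key(field_number, WIRE_FIXED32) + struct.pack("<I", value)


print(as_varint(3, 1).hex())   # 1801
print(as_fixed32(3, 1).hex())  # 1d01000000  (different key byte and payload length)
print(as_varint(4, 1).hex())   # 2001        (renumbering 3 -> 4 changes the key)
```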

Generally speaking, though, it will just work, and you don't need to worry much about versioning.



Answer 2:

I know this is an old question, but I ran into this recently. The way I got around it was to use facades and run-time decisions about what to serialize. This way I can deprecate/upgrade a field to a new type, with both old and new messages handled gracefully.

I am using Marc Gravell's protobuf-net (v2.3.5) with C#, but the theory of facades should work for any language, including Google's original protobuf implementation.

My old class had a Timestamp of type DateTime, which I wanted to change to include its "Kind" (a .NET quirk). Adding this effectively meant it serialized to 9 bytes instead of 8, which would be a breaking serialization change!

    [ProtoMember(3, Name = "Timestamp")]
    public DateTime Timestamp { get; set; }

A fundamental rule of protobuf is NEVER to change the proto ids (field numbers)! I wanted to keep reading old serialized binaries, which meant "3" was here to stay.

So, I renamed the old property and turned it into a private field (yes, it can still deserialize through reflection magic), so my API no longer exposes it:

    [ProtoMember(3, Name = "Timestamp-v1")]
    private DateTime __Timestamp_v1 = DateTime.MinValue;

I created a new Timestamp property with a new proto id (30002) that includes the DateTime.Kind:

    [ProtoMember(30002, Name = "Timestamp", DataFormat = ProtoBuf.DataFormat.WellKnown)]
    public DateTime Timestamp { get; set; }

I added an AfterDeserialization method that fills in the new Timestamp when an old message is read:

    [ProtoAfterDeserialization]
    private void AfterDeserialization()
    {
        // The v2 Timestamp includes a Kind; __Timestamp_v1 is no longer used, so copy it forward
        if (__Timestamp_v1 != DateTime.MinValue)
        {
            // Old message: assume the timestamp was in UTC (as it was) and update the v2 Timestamp
            Timestamp = new DateTime(__Timestamp_v1.Ticks, DateTimeKind.Utc);
        }
    }
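For readers not on .NET, here is a hedged Python sketch of the same migration hook. Like the comment in the method above, it assumes legacy values were in UTC. A .NET tick is 100 ns counted from 0001-01-01, so the old tick count can be converted into a timezone-aware datetime. The Event class and its fields are hypothetical stand-ins for the C# members. Python's datetime only has microsecond resolution, so sub-microsecond ticks are dropped:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)  # DateTime.MinValue, read as UTC
TICKS_PER_MICROSECOND = 10  # one .NET tick is 100 ns


def ticks_to_utc(ticks: int) -> datetime:
    """Interpret a legacy tick count as UTC, like new DateTime(ticks, DateTimeKind.Utc)."""
    return DOTNET_EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


@dataclass
class Event:
    legacy_ticks: int = 0                 # hypothetical old field 3; 0 plays DateTime.MinValue
    timestamp: Optional[datetime] = None  # hypothetical new field 30002, timezone-aware

    def after_deserialization(self) -> None:
        # Old messages only carry field 3: assume it was UTC and copy it forward.
        if self.legacy_ticks != 0:
            self.timestamp = ticks_to_utc(self.legacy_ticks)


old = Event(legacy_ticks=621355968000000000)  # 1970-01-01T00:00:00 expressed in .NET ticks
old.after_deserialization()
print(old.timestamp.isoformat())  # 1970-01-01T00:00:00+00:00
```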

Now both old and new messages serialize and deserialize correctly, and my Timestamp includes DateTime.Kind. Nothing is broken!

However, this does mean that BOTH fields will be in all new messages going forward. So the final touch is a run-time serialization decision that excludes the old Timestamp (note that this won't work if the field is marked as required!):

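    // protobuf-net calls ShouldSerialize{MemberName}() to decide whether to write that member
    bool ShouldSerialize__Timestamp_v1()
    {
        // Only write the legacy field when it holds a value, i.e. it came from an old message
        return __Timestamp_v1 != DateTime.MinValue;
    }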
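On the wire, the run-time decision simply means the skipped field contributes no bytes. This minimal Python sketch uses hypothetical names. Both values are plain integers for brevity, unlike protobuf-net's real DateTime encoding. The sketch writes field 3 only when the legacy value differs from its default, so a brand-new message carries field 30002 alone:

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low bits first, high bit means 'more follows'."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    # Key = (field number << 3) | wire type 0 (varint), followed by the value.
    return encode_varint(number << 3) + encode_varint(value)


LEGACY_TIMESTAMP = 3  # old field number: never renumbered, so old binaries still load
TIMESTAMP = 30002     # new field number for the replacement value


def serialize(legacy_value: int, new_value: int) -> bytes:
    out = b""
    # Run-time decision, like ShouldSerialize__Timestamp_v1: 0 stands in for
    # DateTime.MinValue, and a skipped field contributes no bytes at all.
    if legacy_value != 0:
        out += varint_field(LEGACY_TIMESTAMP, legacy_value)
    out += varint_field(TIMESTAMP, new_value)
    return out


print(serialize(0, 5).hex())  # 90d30e05      (new message: field 30002 only)
print(serialize(5, 5).hex())  # 180590d30e05  (migrated old message: both fields)
```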

And that's it. I have a nice unit test that does this end-to-end if anyone wants it.

I know my method relies on .NET magic, but I reckon the concept could be translated to other languages.