What are the trade-offs, advantages and disadvantages of each of these implementations? Are they any different at all? What I want to achieve is to store a vector of boxes in a protobuf.
Impl 1:
package foo;
message Boxes
{
    message Box
    {
        required int32 w = 1;
        required int32 h = 2;
    }
    repeated Box boxes = 1;
}
Impl 2:
package foo;
message Box
{
    required int32 w = 1;
    required int32 h = 2;
}
message Boxes
{
    repeated Box boxes = 1;
}
Impl 3: Stream multiple of these messages into the same file.
package foo;
message Box
{
    required int32 w = 1;
    required int32 h = 2;
}
1 & 2 differ only in where the types are declared (nested inside Boxes, or at the top level). The work itself, and the data written, will be identical.
3 is more interesting: you can't just stream Box after Box after Box, because the root object in protobuf is not terminated (to allow concat === merge). If you only write Box messages back to back, then when you deserialize you will get exactly one Box, holding the last w and h that were written. You need to add a length prefix; you could do that any way you like, but if you happen to varint-encode the length, you are close to what repeated gives you, except that repeated also writes a field header (field 1, wire type 2, so binary 1010 = decimal 10) before each varint length.
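To make the byte-level point concrete, here is a minimal sketch that encodes Box by hand instead of using the protobuf library. The helper names (encode_varint, decode_varint, encode_box, decode_box) are made up for this illustration, and the encoder assumes non-negative values only. It shows that two concatenated Box messages parse as a single Box with the last values, and that a repeated element is a varint length prefix with one extra header byte in front.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Return (value, next_position) for the varint starting at buf[pos]."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_box(w: int, h: int) -> bytes:
    """Box { w = 1; h = 2; }: both fields use wire type 0 (varint)."""
    return (encode_varint((1 << 3) | 0) + encode_varint(w)
            + encode_varint((2 << 3) | 0) + encode_varint(h))


def decode_box(buf: bytes) -> dict:
    """Parse the WHOLE buffer as one Box, the way a root message is read."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        value, pos = decode_varint(buf, pos)
        fields[key >> 3] = value  # a later value overwrites an earlier one
    return {"w": fields.get(1), "h": fields.get(2)}


a = encode_box(3, 4)
b = encode_box(5, 6)

# Plain concatenation: the parser sees one Box whose fields were set twice.
merged = decode_box(a + b)

# Framing a Box with a bare varint length prefix...
bare = encode_varint(len(a)) + a
# ...versus what one element of `repeated Box boxes = 1` looks like:
# header byte (field 1, wire type 2), then the same varint length and payload.
repeated = encode_varint((1 << 3) | 2) + encode_varint(len(a)) + a

print(merged)          # {'w': 5, 'h': 6}
print(bare.hex())      # 0408031004
print(repeated.hex())  # 0a0408031004
```

The only difference between the two framings is the leading 0x0a byte, which is why a hand-rolled varint prefix and the repeated field end up so close.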
If I were you, I'd just use repeated for simplicity. Which of 1 / 2 you choose comes down to personal preference.
Marc Gravell's answer is certainly correct, but one point he missed is:
- options 1 & 2 (the repeated options) will serialise / deserialise all the boxes at once
- option 3 (multiple messages in the file) will serialise / deserialise box by box.
If using Java, you can use delimited files (writeDelimitedTo / parseDelimitedFrom), which add a varint length at the start of each message.
Most of the time it will not matter whether you use repeated or multiple messages, but if there are millions / billions of boxes, memory will be an issue for options 1 and 2 (repeated), and option 3 (multiple messages in the file) would be the better choice.
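As a rough sketch of the box-by-box approach, the following writes each message with a varint length prefix and reads the records back one at a time through a generator, so only one record needs to be in memory at once. It is hand-rolled rather than built on the protobuf library; write_delimited, read_delimited and small_box are hypothetical names for this illustration, and small_box only handles values from 0 to 127 so that each field fits in a single byte.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def small_box(w: int, h: int) -> bytes:
    """Wire bytes of Box { w = 1; h = 2; } for values in 0..127 only."""
    if not (0 <= w < 128 and 0 <= h < 128):
        raise ValueError("this sketch only handles single-byte values")
    return bytes([0x08, w, 0x10, h])


def write_delimited(stream, payload: bytes) -> None:
    """Write one record: varint length, then the message bytes."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    """Yield one record at a time until a clean end of stream."""
    while True:
        length = 0
        shift = 0
        while True:
            raw = stream.read(1)
            if not raw:
                if shift == 0:
                    return  # end of stream between two records
                raise EOFError("stream ended inside a length prefix")
            byte = raw[0]
            length |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("stream ended inside a record")
        yield payload


# An in-memory buffer stands in for a file opened in binary mode.
buf = io.BytesIO()
for w, h in [(3, 4), (5, 6), (7, 8)]:
    write_delimited(buf, small_box(w, h))

buf.seek(0)
count = 0
for record in read_delimited(buf):
    count += 1  # each record could be parsed and discarded here
```

With the repeated options, by contrast, the whole Boxes message has to be parsed before the first Box is available.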
So in summary:
- If there are millions / billions of boxes, use option 3 (multiple messages in the file).
- Otherwise use one of the repeated options (1 / 2), because it is simpler and supported across all Protocol Buffers versions.
Personally, I would like to see a "standard" multiple-message format.