Protobuf streaming (lazy serialization) API

Posted 2019-03-30 08:55

Question:

We have an Android app that uses Protocol Buffers to store application data. The data format (roughly) is a single protobuf ("container") that contains a list of protobufs ("items") as a repeated field:

message Container {
    repeated Item item = 1;
}

When we want to save a change to an item, we must recreate the protobuf container, add all the items to it, then serialize it and write it to a file.

The problem with this approach is that it can potentially triple the memory used when saving: the data is first copied from the model class into the protobuf builder, and then into a byte array when the protobuf is serialized, all before it is written out to a file stream.

What we would like is to create our protobuf container and lazily serialize it to a stream: each protobuf item (created from our model data) would be added to the container and immediately serialized and written to the stream, rather than keeping all the items in memory until the entire container has been built.

Is there a way to build a protobuf and serialize it lazily to a stream?

If there's no official way to do this, are there any libraries that can help? Does anyone have suggestions or ideas on how to solve this in other ways? Are there alternative data formats or technologies (e.g. JSON or XML containing protobufs) that would make this possible?

Answer 1:

For serialization:

protobuf is an appendable format, with individual items being merged, and repeated items being appended

Therefore, to write a sequence as a lazy stream, all you need to do is repeatedly write the same structure with only one item in the list: serializing a sequence of 200 x "Container with 1 Item" is 100% identical to serializing 1 x "Container with 200 Items".

So: just do that!
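As a minimal sketch of the writing side, assuming each Item has already been serialized to a byte array (for example via toByteArray() on a generated message), the hypothetical helper appendItem below writes exactly what a "Container with 1 Item" serializes to: the 0x0A tag for field 1, a varint length, then the item bytes. The varint encoder is hand-rolled so the example has no dependencies; it is not part of any protobuf library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class LazyContainerWriter {
    // Tag for field 1 with wire type 2 (length-delimited): (1 << 3) | 2 == 0x0A
    private static final int ITEM_TAG = (1 << 3) | 2;

    /** Writes an unsigned varint: 7 bits per byte, high bit set while more bytes follow. */
    public static void writeVarint(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /**
     * Appends one already-serialized Item as a "Container with 1 Item".
     * Calling this N times yields the same bytes as one Container with N items.
     */
    public static void appendItem(OutputStream out, byte[] itemBytes) throws IOException {
        writeVarint(out, ITEM_TAG);
        writeVarint(out, itemBytes.length);
        out.write(itemBytes);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Hypothetical Item payloads: field 1 (varint) set to 1 and to 150
        appendItem(out, new byte[] {0x08, 0x01});
        appendItem(out, new byte[] {0x08, (byte) 0x96, 0x01});
        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // prints 0A 02 08 01 0A 03 08 96 01
    }
}
```

Since each call writes straight to the stream, only one Item needs to be held in memory at a time. With protobuf-java and classes generated from the question's .proto, Container.newBuilder().addItem(item).build().writeTo(out) per item should produce the same bytes.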


For deserialization:

Reading this back as a stream is technically very easy, but how you do it depends on which library you are using. For example, I expose this in protobuf-net (a .NET / C# implementation) as Serializer.DeserializeItems<T>, which reads a sequence of messages of type T fully lazily (streaming), assuming they are laid out as described in the question. So Serializer.DeserializeItems<Item> would be the streaming replacement for Serializer.Deserialize<Container> - the outermost object doesn't really exist on the wire in protobuf.

If this isn't available, but you have access to a raw reader API, what you need to do is:

  • read one varint for the header - this will be the value 10 (0x0A), i.e. "(1 << 3) | 2" for the field number (1) and wire type (2) respectively - so this could also be phrased: "read a single byte from the stream and check that the value is 10"
  • read one varint for the length of the following item
  • now:
    • if the reader API allows you to restrict the maximum number of bytes to process, use this length as the limit for the item that follows
    • or wrap the stream API with a length-limiting stream, limited to that length
    • or just manually read that many bytes, and construct an in-memory stream from the payload
  • rinse, repeat
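The reading steps above can be sketched in plain Java as follows. This assumes the stream contains only field 1 (as in the question's Container) and takes the third option, reading each payload into its own byte array; LazyContainerReader, nextItem and readVarint are hypothetical names, not library API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class LazyContainerReader {
    private static final int ITEM_TAG = (1 << 3) | 2; // 0x0A: field 1, wire type 2

    /** Reads an unsigned 32-bit varint (at most 5 bytes). */
    static int readVarint(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IOException("malformed varint");
    }

    /** Returns the next Item payload, or null when the stream is exhausted. */
    public static byte[] nextItem(InputStream in) throws IOException {
        int tag = in.read(); // the header fits in a single byte: 0x0A
        if (tag < 0) return null; // clean end of stream
        if (tag != ITEM_TAG) throw new IOException("unexpected tag: " + tag);
        int length = readVarint(in);
        if (length < 0) throw new IOException("negative length");
        byte[] payload = new byte[length];
        new DataInputStream(in).readFully(payload); // read exactly 'length' bytes
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = {0x0A, 0x02, 0x08, 0x01, 0x0A, 0x03, 0x08, (byte) 0x96, 0x01};
        InputStream in = new ByteArrayInputStream(stream);
        List<byte[]> items = new ArrayList<>();
        byte[] item;
        while ((item = nextItem(in)) != null) {
            items.add(item);
        }
        System.out.println(items.size() + " items"); // prints 2 items
    }
}
```

Each returned payload could then be decoded with the generated Item.parseFrom(byte[]), so only one item is materialized at a time.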


Answer 2:

There is no such thing. A protobuf is a packed structure. In order to do this effectively it would need all the data. You will have to add the "streaming protocol" yourself. Maybe send a protobuf msg every N items.



Answer 3:

In the normal Java version of Protocol Buffers there are delimited files, where you write protobuf messages one at a time. I am not sure whether this is available in the Android version:

 aLocation.writeDelimitedTo(out);
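For comparison, here is a sketch of the framing writeDelimitedTo produces: a varint length followed by the message bytes, with no field tag. The helper names are hypothetical and the varint encoder is hand-rolled. Because the 0x0A tag is missing, such a file is not a valid serialized Container; it is read back one message at a time instead, e.g. with the generated parseDelimitedFrom(InputStream), which returns null at end of stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class DelimitedFraming {
    /** Writes an unsigned varint: 7 bits per byte, high bit set while more bytes follow. */
    public static void writeVarint(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Length prefix + message bytes, with no field tag (unlike a Container item). */
    public static void writeDelimited(OutputStream out, byte[] messageBytes) throws IOException {
        writeVarint(out, messageBytes.length);
        out.write(messageBytes);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeDelimited(out, new byte[] {0x08, 0x01});
        writeDelimited(out, new byte[] {0x08, (byte) 0x96, 0x01});
        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // prints 02 08 01 03 08 96 01
    }
}
```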

As Marc has indicated, it is easily implemented: just write a length followed by the serialised bytes. In the normal (non-Android) Java version of Protocol Buffers you can also do the following (you have to serialise to a byte array or something similar first):

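import com.google.protobuf.CodedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class DelimitedWriter {
    private final CodedOutputStream codedStream;

    public DelimitedWriter(OutputStream out) {
        codedStream = CodedOutputStream.newInstance(out);
    }

    // Writes one varint-length-prefixed payload; null/empty payloads are skipped
    public void write(byte[] bytes) throws IOException {
        if (bytes != null && bytes.length > 0) {
            codedStream.writeUInt32NoTag(bytes.length); // varint length prefix
            codedStream.writeRawBytes(bytes);           // serialized message bytes
            codedStream.flush();
        }
    }
}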

and

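import com.google.protobuf.CodedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DelimitedReader {
    private final CodedInputStream coded;

    public DelimitedReader(InputStream in) {
        coded = CodedInputStream.newInstance(in);
    }

    // Returns the next length-prefixed payload, or null at end of stream
    public byte[] read() throws IOException {
        if (coded.isAtEnd()) {
            return null;
        }
        // readBytes() reads a varint length, then that many bytes
        return coded.readBytes().toByteArray();
    }
}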
Something similar may be possible in other Protocol Buffers versions.