I'm working with data from spinn3r, which consists of multiple different protobuf messages serialized into a byte stream:
http://code.google.com/p/spinn3r-client/wiki/Protostream
"A protostream is a stream of protocol buffer messages, encoded on the wire as length prefixed varints according to the Google protocol buffer specification. The stream has three parts: a header, the payload, and a tail marker."
This seems like a pretty standard use case for protobufs. In fact, the protobuf core distribution provides CodedInputStream for both C++ and Java. However, protobuf does not appear to provide such a tool for Python; the 'internal' tools are not set up for this kind of external use:
https://groups.google.com/forum/?fromgroups#!topic/protobuf/xgmUqXVsK-o
So... before I go and cobble together a Python varint parser and tools for parsing a stream of different message types: does anyone know of any existing tools for this?
Why is it missing from protobuf? (Or am I just failing to find it?)
This seems like a big gap in protobuf, especially when compared to Thrift's equivalent tools for both 'transport' and 'protocol'. Am I viewing that correctly?
This is simple enough that I can see why maybe nobody has bothered to make a reusable tool:
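For illustration, here is a minimal pure-Python sketch of the kind of varint parser described above. It assumes only that each message is preceded by its byte length encoded as a base-128 varint. The function names (encode_varint, read_varint, read_delimited, write_delimited) are hypothetical, and the sketch does not handle the protostream header or tail marker.

```python
import io


def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("varint lengths must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def read_varint(stream):
    """Read one varint from a binary stream; return None on a clean EOF."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # nothing left to read
            raise EOFError("stream ended inside a varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("varint is too long")


def write_delimited(stream, payload):
    """Write one serialized message preceded by its varint length."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    """Yield the raw bytes of each length-prefixed message in the stream."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield payload
```

Each payload yielded by read_delimited would then be handed to ParseFromString on an instance of whichever generated message class is expected at that point in the stream.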
I've implemented a small Python package to serialize multiple protobuf messages into a stream and deserialize them from a stream. You can install it with pip.
Here's sample code writing two lists of protobuf messages into a file:
and then reading the same messages (e.g. Alignment messages defined in vg_pb2.py) from the stream:
It looks like the code in the other answer is potentially lifted from here. Check the licence before using this file, but I managed to get it to read varint32s using code such as this:
This is very simple code designed to load messages of a single type delimited by varint32s, each of which gives the size of the next message.
Update: It may also be possible to import this file directly from the protobuf library by using:
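As a hedged sketch of that idea: the import below assumes the installed protobuf package exposes the private helper _DecodeVarint32 in google.protobuf.internal.decoder. Because it is private, it may change or disappear between releases, so the sketch falls back to a small pure-Python decoder with the same (value, new_position) contract. The names load_messages and parse are hypothetical and not part of protobuf.

```python
try:
    # Assumption: private helper, not a stable public API.
    from google.protobuf.internal.decoder import _DecodeVarint32 as decode_varint32
except ImportError:
    def decode_varint32(buffer, pos):
        """Pure-Python fallback returning (value, new_pos)."""
        result = 0
        shift = 0
        while True:
            byte = buffer[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result & 0xFFFFFFFF, pos
            shift += 7
            if shift >= 64:
                raise ValueError("too many bytes when decoding varint")


def load_messages(data, parse):
    """Split `data` into varint32-delimited chunks and parse each one.

    `parse` is any callable that turns the raw bytes of one message into
    an object, e.g. the FromString method of a generated message class.
    """
    messages = []
    pos = 0
    while pos < len(data):
        size, pos = decode_varint32(data, pos)
        end = pos + size
        if end > len(data):
            raise ValueError("truncated message at end of buffer")
        messages.append(parse(data[pos:end]))
        pos = end
    return messages
```

With a generated class, here a hypothetical MyMessage, the call would look like load_messages(open(path, "rb").read(), MyMessage.FromString). This only works when every message in the buffer has the same type.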