Recently I read this guide on undocumented features in TensorFlow, as I needed to pass variable-length sequences as input. However, I found the protocol for tf.train.SequenceExample relatively confusing (especially due to the lack of documentation), and I managed to build an input pipeline using tf.train.Example just fine instead.

Are there any advantages to using tf.train.SequenceExample? Using the standard Example protocol when there is a dedicated one for variable-length sequences seems like a cheat, but does it have any consequences?
The link you provided lists some benefits. You can see how parse_single_sequence_example is used here: https://github.com/tensorflow/magenta/blob/master/magenta/common/sequence_example_lib.py
If you managed to get the data into your model with Example, it should be fine. SequenceExample just gives a little more structure to your data and some utilities for working with it.

Here are the definitions of the Example and SequenceExample protocol buffers, and all the protos they may contain:

An Example contains a Features, which contains a mapping from feature name to Feature, which contains either a bytes list, a float list, or an int64 list.

A SequenceExample also contains a Features, but in addition it contains a FeatureLists, which contains a mapping from list name to FeatureList, which contains a list of Features. So it can do everything an Example can do, and more. But do you really need that extra functionality? What does it do?

Since each Feature contains a list of values, a FeatureList is a list of lists. And that's the key: if you need lists of lists of values, then you need SequenceExample.

For example, if you handle text, you can represent it as one big string:
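A minimal sketch of this first option, using a plain tf.train.Example; the feature name "text" and the sentences themselves are just placeholders:

```python
import tensorflow as tf

# The whole text stored as a single bytes value in one Feature.
text = b"Do you want to go to the movies tonight? Sure, why not."
example = tf.train.Example(features=tf.train.Features(feature={
    "text": tf.train.Feature(bytes_list=tf.train.BytesList(value=[text])),
}))
```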
Or you could represent it as a list of words and tokens:
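A plain Example handles this case too, since a Feature's BytesList can hold many values. A sketch with a made-up feature name:

```python
import tensorflow as tf

# One element per token: a flat list of values inside a single Feature.
words = [b"Do", b"you", b"want", b"to", b"go", b"to",
         b"the", b"movies", b"tonight", b"?"]
example = tf.train.Example(features=tf.train.Features(feature={
    "words": tf.train.Feature(bytes_list=tf.train.BytesList(value=words)),
}))
```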
Or you could represent each sentence separately. That's where you would need a list of lists:
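Sketching that with one Feature per sentence, collected in a FeatureList (the sentence text is illustrative):

```python
import tensorflow as tf

# Each sentence becomes one Feature (a list of words); the FeatureList
# collects them, giving a list (of sentences) of lists (of words).
sentences = [
    [b"Do", b"you", b"want", b"to", b"go", b"to",
     b"the", b"movies", b"tonight", b"?"],
    [b"Sure", b",", b"why", b"not", b"."],
]
sentence_list = tf.train.FeatureList(feature=[
    tf.train.Feature(bytes_list=tf.train.BytesList(value=words))
    for words in sentences
])
```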
Then create the SequenceExample, and you can serialize it and perhaps save it to a TFRecord file.
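Putting it together, here is a sketch that builds the SequenceExample, serializes it, and writes it to a TFRecord file. The "sentences" list name, the "author" context feature, and the filename are all made up for illustration:

```python
import tensorflow as tf

sentences = [
    [b"Do", b"you", b"want", b"to", b"go", b"to",
     b"the", b"movies", b"tonight", b"?"],
    [b"Sure", b",", b"why", b"not", b"."],
]
seq_example = tf.train.SequenceExample(
    # Optional context: features that apply to the whole sequence.
    context=tf.train.Features(feature={
        "author": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b"alice"])),
    }),
    # The variable-length part: one Feature per sentence.
    feature_lists=tf.train.FeatureLists(feature_list={
        "sentences": tf.train.FeatureList(feature=[
            tf.train.Feature(bytes_list=tf.train.BytesList(value=words))
            for words in sentences
        ]),
    }),
)

serialized = seq_example.SerializeToString()
with tf.io.TFRecordWriter("sentences.tfrecord") as writer:
    writer.write(serialized)
```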
Later, when you read the data, you can parse it using tf.io.parse_single_sequence_example().
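A sketch of the parsing side. The serialized SequenceExample is built inline here so the snippet is self-contained; the "sentences" list name and the choice of VarLenFeature are assumptions for illustration:

```python
import tensorflow as tf

# Build and serialize a small SequenceExample to parse back.
seq_example = tf.train.SequenceExample(
    feature_lists=tf.train.FeatureLists(feature_list={
        "sentences": tf.train.FeatureList(feature=[
            tf.train.Feature(bytes_list=tf.train.BytesList(
                value=[b"Sure", b",", b"why", b"not", b"."])),
        ]),
    }),
)
serialized = seq_example.SerializeToString()

# Variable-length feature lists come back as SparseTensors.
context, sequence = tf.io.parse_single_sequence_example(
    serialized,
    sequence_features={"sentences": tf.io.VarLenFeature(tf.string)},
)
words = tf.sparse.to_dense(sequence["sentences"])
```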