Notes:
This question extends a previous question of mine. In that question I asked about the best way to store some dummy data as Example and SequenceExample, seeking to know which is better for data similar to the dummy data provided. I provide explicit formulations of both the Example and SequenceExample construction as well as, in the answers, a programmatic way to do so.
Because this is still a lot of code, I am providing a Colab (interactive Jupyter notebook hosted by Google) file where you can try the code out yourself. All the necessary code is there and it is generously commented.
I am trying to learn how to convert my data into TFRecords, as the claimed benefits are worthwhile for my data. However, the documentation leaves a lot to be desired, and the tutorials and blogs I have seen that try to go deeper only scratch the surface or rehash the sparse docs that exist.
For the demo data considered in my previous question - as well as here - I have written a decent class that takes:
- a sequence with n channels (in this example integer based and of fixed length)
- soft-labeled class probabilities (in this example there are n classes and float based)
- some meta data (in this example a string and two floats)
and can encode the data in 1 of 6 forms:
- Example, with sequence channels / classes separate in a numeric type (int64 in this case) with meta data tacked on
- Example, with sequence channels / classes separate as a byte string (via numpy.ndarray.tostring()) with meta data tacked on
- Example, with sequence / classes dumped as byte string with meta data tacked on
- SequenceExample, with sequence channels / classes separate in a numeric type and meta data as context
- SequenceExample, with sequence channels separate as a byte string and meta data as context
- SequenceExample, with sequence and classes dumped as byte string and meta data as context
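For readers without the Colab open, here is a minimal sketch of what the first and fourth forms in the list above might look like. It is not the class from the notebook. The dummy values, the feature names (channel_0, probs, name, a, b) and the helper functions are all hypothetical. Storing the class probabilities as a feature list, rather than in the context, is also just one possible choice. The byte-string forms would instead store the output of ndarray.tobytes() (the current name for tostring()) in a BytesList.

```python
import numpy as np
import tensorflow as tf

# Hypothetical dummy data: 3 integer channels of fixed length 4,
# 3 soft-label class probabilities, and meta data (a string and two floats).
channels = np.arange(12, dtype=np.int64).reshape(3, 4)
probs = np.array([0.1, 0.7, 0.2], dtype=np.float32)
meta = {"name": b"sample-0", "a": 1.5, "b": -2.0}


def int64_feature(values):
    # Convert to plain Python ints so protobuf accepts them regardless of dtype.
    return tf.train.Feature(
        int64_list=tf.train.Int64List(value=[int(v) for v in values]))


def float_feature(values):
    return tf.train.Feature(
        float_list=tf.train.FloatList(value=[float(v) for v in values]))


def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def meta_features(meta):
    return {
        "name": bytes_feature(meta["name"]),
        "a": float_feature([meta["a"]]),
        "b": float_feature([meta["b"]]),
    }


def make_example(channels, probs, meta):
    # Form 1: each channel is its own int64 feature, meta data tacked on.
    feature = {"channel_%d" % i: int64_feature(ch)
               for i, ch in enumerate(channels)}
    feature["probs"] = float_feature(probs)
    feature.update(meta_features(meta))
    return tf.train.Example(features=tf.train.Features(feature=feature))


def make_sequence_example(channels, probs, meta):
    # Form 4: meta data goes in the context; every channel is a named
    # FeatureList whose entries are unnamed Features, one per step.
    feature_list = {
        "channel_%d" % i: tf.train.FeatureList(
            feature=[int64_feature([v]) for v in ch])
        for i, ch in enumerate(channels)
    }
    feature_list["probs"] = tf.train.FeatureList(
        feature=[float_feature([p]) for p in probs])
    return tf.train.SequenceExample(
        context=tf.train.Features(feature=meta_features(meta)),
        feature_lists=tf.train.FeatureLists(feature_list=feature_list))


example = make_example(channels, probs, meta)
sequence_example = make_sequence_example(channels, probs, meta)
```

Both objects expose SerializeToString(), which yields the bytes that get written to a TFRecord file.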
This works fine.
In the Colab I show how to write dummy data all in the same file as well as in separate files.
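As a rough illustration of the two layouts (not the notebook's code), the sketch below writes records with tf.io.TFRecordWriter, once into a single file and once into one file per record. The record contents, the file names and the helper functions are hypothetical placeholders.

```python
import os
import tempfile

import tensorflow as tf


def make_example(index):
    # Hypothetical minimal record holding a single int64 feature.
    return tf.train.Example(features=tf.train.Features(feature={
        "index": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[index])),
    }))


def write_single_file(path, n):
    # All n records are appended to the same TFRecord file.
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(n):
            writer.write(make_example(i).SerializeToString())


def write_separate_files(directory, n):
    # Each record gets its own TFRecord file.
    paths = []
    for i in range(n):
        path = os.path.join(directory, "record_%03d.tfrecord" % i)
        with tf.io.TFRecordWriter(path) as writer:
            writer.write(make_example(i).SerializeToString())
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()
single_path = os.path.join(out_dir, "all.tfrecord")
write_single_file(single_path, 5)
separate_paths = write_separate_files(out_dir, 5)
```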
My question is how can I recover this data?
I have given 4 attempts at doing so in the linked file.
Why is TFReader under a different sub-package from TFWriter?
Solved by updating the features to include shape information and remembering that SequenceExample are unnamed FeatureLists.
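A minimal sketch of one way to read that fix, assuming TensorFlow 2 with eager execution. The feature names, the SEQ_LEN constant and the dummy values are hypothetical. For an Example, the parsing spec states the shape of each feature. For a SequenceExample, each named FeatureList holds unnamed Features, so the per-step shape goes into tf.io.FixedLenSequenceFeature and the meta data is parsed as context.

```python
import tensorflow as tf

SEQ_LEN = 4  # hypothetical fixed sequence length


def int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


# --- Example: the spec carries the shape of each feature ---
example = tf.train.Example(features=tf.train.Features(feature={
    "channel_0": int64_feature([0, 1, 2, 3]),
    "name": bytes_feature(b"sample-0"),
}))
example_spec = {
    "channel_0": tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
    "name": tf.io.FixedLenFeature([], tf.string),
}
parsed = tf.io.parse_single_example(
    example.SerializeToString(), example_spec)

# --- SequenceExample: context plus named lists of unnamed Features ---
sequence_example = tf.train.SequenceExample(
    context=tf.train.Features(feature={"name": bytes_feature(b"sample-0")}),
    feature_lists=tf.train.FeatureLists(feature_list={
        "channel_0": tf.train.FeatureList(
            feature=[int64_feature([v]) for v in [0, 1, 2, 3]]),
    }),
)
context_spec = {"name": tf.io.FixedLenFeature([], tf.string)}
# The shape [] describes a single step; the number of steps is inferred.
sequence_spec = {"channel_0": tf.io.FixedLenSequenceFeature([], tf.int64)}
context, sequences = tf.io.parse_single_sequence_example(
    sequence_example.SerializeToString(),
    context_features=context_spec,
    sequence_features=sequence_spec,
)
```

When reading from files, the same specs could be applied to each serialized record, for example inside a function passed to tf.data.TFRecordDataset(...).map.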