tensorflow: Reading time series data from TFRecord

Posted 2019-08-17 04:15

I'm using a SequenceExample protobuf to read/write time-series data into a TFRecord file.

I serialized a pair of NumPy arrays as follows:

import numpy as np
import tensorflow as tf

writer = tf.python_io.TFRecordWriter(file_name)

context = tf.train.Features( ... Feature( ... ) ... )

feature_data = tf.train.FeatureList(feature=[
                  tf.train.Feature(float_list=tf.train.FloatList(value=
                                   np.random.normal(size=4065000)))])
labels = tf.train.FeatureList(feature=[
                  tf.train.Feature(int64_list=tf.train.Int64List(value=
                           np.random.randint(0, 11, size=1084)))])

## feature_data and labels have similar but varying lengths

feature_list = {"feature_data": feature_data,
                "labels": labels}

feature_lists = tf.train.FeatureLists(feature_list=feature_list)
example = tf.train.SequenceExample(context=context,
                                   feature_lists=feature_lists)

        ## serialize and close

When I try to read the .tfrecords file, I get several errors, primarily because the SequenceExample protobuf writes the time-series data as a series of values (e.g. value: -12.2549, value: -18.1372, ..., value: 13.1234). My code to read the .tfrecords file is as follows:

dataset = tf.data.TFRecordDataset("data/tf_record.tfrecords")
dataset = dataset.map(decode)
dataset = dataset.make_one_shot_iterator().get_next()

### reshape tensors and feed to estimator###

My decode() function is defined as follows:

def decode(serialized_proto):
    context_features = {...}
    sequence_features = {"feature_data": tf.FixedLenSequenceFeature((None,),
                                                                tf.float32),
                         "labels": tf.FixedLenSequenceFeature((None,),
                                                              tf.int64)}

    context, sequence = tf.parse_single_sequence_example(serialized_proto,
                                        context_features=context_features,
                                        sequence_features=sequence_features)

    return context, sequence

One of the errors is as follows:

Shape [?] is not fully defined for 'ParseSingleSequenceExample/ParseSingleSequenceExample' (op: 'ParseSingleSequenceExample') with input shapes: [], [0], [], [], [], [], [], [], [].

My primary question is how to think about the structure of Datasets. I'm not sure I really understand the structure of the data returned. I'm having a hard time iterating through this Dataset and returning the variably-sized Tensors. Thanks in advance!

1 Answer
叛逆
#2 · 2019-08-17 04:57

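You can only use tf.FixedLenSequenceFeature when the shape of the feature at each step is fully known. (None,) is not a fully defined shape, which is what the "Shape [?] is not fully defined" error reports. Otherwise, use tf.VarLenFeature instead.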
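As a rough illustration of the suggestion above, here is a minimal sketch that parses the question's layout (one Feature holding all values of each list) with VarLenFeature. It assumes TensorFlow 2.x with eager execution and the tf.io names rather than the TF 1.x names used in the question. The helper make_example and the tiny in-memory inputs are made up for the example and are not part of the original code.

```python
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution


def make_example(values, labels):
    """Hypothetical helper: one Feature per list, as in the question."""
    feature_lists = tf.train.FeatureLists(feature_list={
        "feature_data": tf.train.FeatureList(feature=[
            tf.train.Feature(float_list=tf.train.FloatList(value=values))]),
        "labels": tf.train.FeatureList(feature=[
            tf.train.Feature(int64_list=tf.train.Int64List(value=labels))]),
    })
    return tf.train.SequenceExample(feature_lists=feature_lists)


def decode(serialized_proto):
    # VarLenFeature does not need to know how many values each step holds.
    sequence_features = {
        "feature_data": tf.io.VarLenFeature(tf.float32),
        "labels": tf.io.VarLenFeature(tf.int64),
    }
    _, sequence = tf.io.parse_single_sequence_example(
        serialized_proto, sequence_features=sequence_features)
    # Each parsed entry is a SparseTensor of shape [num_steps, max_values].
    # With a single step per list, flattening recovers the original 1-D
    # array. With several ragged steps, to_dense would pad with zeros.
    data = tf.reshape(tf.sparse.to_dense(sequence["feature_data"]), [-1])
    labels = tf.reshape(tf.sparse.to_dense(sequence["labels"]), [-1])
    return data, labels


# Two records whose lists have different lengths, kept in memory here
# instead of being written to a .tfrecords file.
serialized = [
    make_example([1.5, -2.0, 0.25], [3, 7]).SerializeToString(),
    make_example([4.0], [1, 2, 3]).SerializeToString(),
]
dataset = tf.data.Dataset.from_tensor_slices(serialized).map(decode)

# Each dataset element is a (data, labels) tuple of 1-D tensors whose
# lengths can differ from one record to the next.
for data, labels in dataset:
    print(data.shape, labels.shape)
```

On the structure question: every element of the mapped Dataset has whatever structure decode() returns, here a tuple of two variable-length tensors. Because the lengths differ between records, a plain batch() would likely fail, and something like padded_batch() would be needed to batch them.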
