Dataset API for TensorFlow: Variable-Sized Input

Published 2019-06-23 00:28

Question:

I have my entire dataset in memory as a list of tuples, where each tuple corresponds to a batch of fixed size N, i.e.

(x[i], label[i], length[i])

  • x[i]: NumPy array of shape [N, W, F]; there are N examples with W timesteps each, and every timestep has a fixed number of features F
  • label[i]: class labels, shape [N,], one for each example in the batch
  • length[i]: lengths, shape [N,], the number of timesteps (W) for each example in the batch

Main problem: W varies across batches.
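To make the layout concrete, here is a minimal sketch of such a list. The values N = 4, F = 3 and the W values 5, 7 and 2 are made up for illustration, and the arrays are filled with zeros. It also shows that the batches cannot be stacked into one array, which is the same kind of shape mismatch that the tensor-based constructors run into.

```python
import numpy as np

N, F = 4, 3  # hypothetical batch size and feature count

# One (x, label, length) tuple per batch; W differs from batch to batch.
batches = []
for W in (5, 7, 2):  # hypothetical timestep counts
    x = np.zeros((N, W, F), dtype=np.float32)
    label = np.zeros((N,), dtype=np.int64)
    length = np.full((N,), W, dtype=np.int64)
    batches.append((x, label, length))

# Stacking requires identical shapes, so this fails because W differs.
try:
    np.stack([x for x, _, _ in batches])
    stack_failed = False
except ValueError:
    stack_failed = True
```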

I looked at the examples and documentation for the Dataset API but could not work out how to create a Dataset object for my case. APIs like Dataset.from_tensor_slices and Dataset.from_tensors don't seem to work (they throw broadcasting errors), because they require the tensors to have the same shape, i.e. W must be the same across batches. Is there any way to do this without padding my batches (using Dataset.padded_batch)?
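One possible approach, offered as a sketch and not taken from the original post, is to feed the pre-built batches through tf.data.Dataset.from_generator and leave the W dimension unspecified (None). The example assumes TensorFlow 2.4 or later (for the output_signature argument) with eager execution; N, F, the W values and the random data are hypothetical.

```python
import numpy as np
import tensorflow as tf

N, F = 4, 3  # hypothetical batch size and feature count

# Hypothetical in-memory data: one (x, label, length) tuple per batch.
batches = []
for W in (5, 7, 2):
    x = np.random.rand(N, W, F).astype(np.float32)
    label = np.random.randint(0, 2, size=(N,)).astype(np.int64)
    length = np.full((N,), W, dtype=np.int64)
    batches.append((x, label, length))

def gen():
    # Yield the batches one at a time; each keeps its own W.
    for x, label, length in batches:
        yield x, label, length

dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(N, None, F), dtype=tf.float32),  # W left open
        tf.TensorSpec(shape=(N,), dtype=tf.int64),
        tf.TensorSpec(shape=(N,), dtype=tf.int64),
    ),
)

# Each element is already a full batch, so no .batch() or .padded_batch() call is applied.
shapes = [tuple(x.numpy().shape) for x, _, _ in dataset]
```

Because every element of the dataset is a whole batch, W only has to be consistent within a batch, not across batches. On older TensorFlow 1.x releases the same idea would use the output_types and output_shapes arguments instead of output_signature.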