I'm finding it really hard to understand the input dimensions of the 1D convolutional layer in Keras:
Input shape
3D tensor with shape: (samples, steps, input_dim).
Output shape
3D tensor with shape: (samples, new_steps, nb_filter). steps value might have changed due to padding.
I want my network to take in a time series of prices (101, in order) and output 4 probabilities. My current non-convolutional network which does this fairly well (with a training set of 28000) looks like this:
standardModel = Sequential()
standardModel.add(Dense(input_dim=101, output_dim=100, W_regularizer=l2(0.5), activation='sigmoid'))
standardModel.add(Dense(4, W_regularizer=l2(0.7), activation='softmax'))
To improve this, I want to make a feature map from the input layer with a local receptive field of length 10 (and therefore 10 shared weights and 1 shared bias). I then want to use max pooling, feed the result into a hidden layer of 40 or so neurons, and output through 4 neurons with softmax in the output layer.
So ideally, the convolutional layer would take a 2d tensor of dimensions:
(minibatch_size, 101)
and output a 3d tensor of dimensions
(minibatch_size, 92, no_of_featuremaps)
However, the Keras layer seems to require a dimension in the input called steps. I've tried to understand this and still don't quite get it. In my case, should steps be 1, as each step in the vector is an increase in time by 1? Also, what is new_steps?
In addition, how do you turn the output of the pooling layers (a 3d tensor) into input suitable for the standard hidden layer (i.e. a Keras Dense layer) in the form of a 2d tensor?
Update: After the very helpful suggestions given, I tried making a convolutional network like so:
conv = Sequential()
conv.add(Convolution1D(64, 10, input_shape=(1,101)))
conv.add(Activation('relu'))
conv.add(MaxPooling1D(2))
conv.add(Flatten())
conv.add(Dense(10))
conv.add(Activation('tanh'))
conv.add(Dense(4))
conv.add(Activation('softmax'))
The line conv.add(Flatten()) throws a "range exceeds valid bounds" error. Interestingly, this error is not thrown for just this code:
conv = Sequential()
conv.add(Convolution1D(64, 10, input_shape=(1,101)))
conv.add(Activation('relu'))
conv.add(MaxPooling1D(2))
conv.add(Flatten())
doing
print conv.input_shape
print conv.output_shape
results in
(None, 1, 101)
(None, -256)
being returned
Update 2:
Changed
conv.add(Convolution1D(64, 10, input_shape=(1,101)))
to
conv.add(Convolution1D(10, 10, input_shape=(101,1)))
and it started working. However, is there any important difference between inputting (None, 101, 1) and (None, 1, 101) to a 1d conv layer that I should be aware of? Why does (None, 1, 101) not work?
The reason it looks like this is that the Keras designers intended the 1-dimensional convolutional layer to be interpreted as a tool for dealing with sequences. To fully understand the difference, imagine that you have a sequence of multiple feature vectors. Each sample is then at least two-dimensional: the first dimension is connected with time and the other dimensions are connected with features. The 1-dimensional convolutional layer was designed to emphasize this time dimension and to find recurring patterns along it, rather than to perform a classical multidimensional convolution.
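To make the (samples, steps, input_dim) layout concrete for a single-feature series, here is a minimal NumPy sketch. The `prices` array is a made-up stand-in (8 series instead of the 28000 in the question), and the variable names are only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the price data: 8 series of 101 prices each.
prices = np.arange(8 * 101, dtype=np.float32).reshape(8, 101)

# Add a trailing feature axis so the layout is (samples, steps, input_dim).
x = prices.reshape(prices.shape[0], prices.shape[1], 1)

print(prices.shape)  # (8, 101)
print(x.shape)       # (8, 101, 1)

# The values are unchanged: step t of series i now sits at x[i, t, 0].
assert x[3, 50, 0] == prices[3, 50]
```

Only the shape changes here; time stays on the second axis and the single price feature gets an axis of length 1.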
In your case you simply need to reshape your data to have shape (dataset_size, 101, 1), because you have only one feature. This is easily done with the numpy.reshape function. To understand what new_steps means, keep in mind that you are convolving over time, so you change the temporal structure of your data, which leads to a new time-indexed structure with new_steps positions. To get your data into a format suitable for dense / static layers, use the keras.layers.Flatten layer, the same as in the classic convolutional case.
UPDATE: As I mentioned before, the first dimension of the input is connected with time. So the difference between (1, 101) and (101, 1) is that in the first case you have one timestep with 101 features, and in the second you have 101 timesteps with 1 feature. The problem you mentioned after your first change originates from pooling with size 2 on such input. With only one timestep, you cannot pool over a time window of size 2, simply because there are not enough timesteps to do that. The convolution window of length 10 does not fit into a single timestep either.
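The shape arithmetic behind this can be sketched in plain Python. This is a minimal sketch, assuming a convolution without padding and with stride 1, and a pooling stride equal to the pool size. The helper names are made up for illustration and are not Keras functions.

```python
def conv1d_valid_steps(steps, kernel_size):
    # Number of window positions for an unpadded convolution with stride 1.
    return steps - kernel_size + 1


def pool1d_steps(steps, pool_size):
    # Number of pooling windows when the stride equals the pool size.
    return (steps - pool_size) // pool_size + 1


def flattened_size(input_shape, n_filters, kernel_size=10, pool_size=2):
    # input_shape is (steps, input_dim); the batch axis is left out.
    steps, _input_dim = input_shape
    after_conv = conv1d_valid_steps(steps, kernel_size)
    after_pool = pool1d_steps(after_conv, pool_size)
    return after_conv, after_pool, after_pool * n_filters


# One timestep with 101 features, 64 filters (first attempt in the question).
print(flattened_size((1, 101), n_filters=64))  # (-8, -4, -256)

# 101 timesteps with 1 feature, 10 filters (the version that works).
print(flattened_size((101, 1), n_filters=10))  # (92, 46, 460)
```

Under these assumptions the formulas reproduce the (None, -256) reported in the question: with a single timestep the step count is already negative after the convolution, so Flatten ends up with a negative size.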