I have a 5000 by 9 2d numpy array of features, trainX, which are the features of a time sequence. I also have a 1d numpy array of floating point feature labels, trainY. This is exactly the format you would need for scikit-learn, for example.
I would like to use these with keras+LSTM. This is my code at present:
from keras.models import Sequential
from keras.layers import LSTM, Dense

NUM_EPOCHS = 20
model = Sequential()
model.add(LSTM(8, input_shape=(1, window_size)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=NUM_EPOCHS, batch_size=1, verbose=2)
However, this doesn't work, as it seems keras needs trainX in a different format. I have read the manual but I can't work out exactly what that format is.
How can I convert my data into a format that keras will accept?
The format is (samples, timeSteps, features).
How many sequences do you have? It sounds like one sequence of 5000 steps, is that right?
Then the format is (1,5000,9). The labels should also be (1,5000,1), if you have one label per time step (then use return_sequences=True). Otherwise the labels are (1,1).
Optionally, you may want to split your single sequence into many segments, as in a classical sliding window case, where you'd have many samples with fewer time steps, such as (4998,3,9), supposing you want a 3-step window and keep all 9 features. Then the labels should follow: (4998,1).
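A minimal sketch of the sliding-window split, keeping the 9 features from the question so the windows come out as (4998, 3, 9). It assumes each window is labelled with the label of its last time step, which is only one possible convention; the random arrays again stand in for your data.

```python
import numpy as np

# Placeholder data with the shapes from the question (random values, illustration only).
trainX = np.random.rand(5000, 9)
trainY = np.random.rand(5000)

window = 3
n_windows = len(trainX) - window + 1  # 5000 - 3 + 1 = 4998

# Stack overlapping windows of 3 consecutive time steps: (samples, timeSteps, features).
X_win = np.stack([trainX[i:i + window] for i in range(n_windows)])

# Assumption: each window takes the label of its last time step.
Y_win = trainY[window - 1:].reshape(-1, 1)

print(X_win.shape, Y_win.shape)  # (4998, 3, 9) (4998, 1)
```

For this layout the LSTM would take input_shape=(3, 9) and would not need return_sequences=True, since there is a single label per window.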