I'm trying to forecast a time series: given the 50 previous values, I want to predict the next 5 values.
To do so, I'm using the skflow package (based on TensorFlow). This problem is fairly close to the Boston example provided in the GitHub repo.
My code is as follows:
```python
%matplotlib inline
import numpy as np
import pandas as pd
import skflow
from sklearn import cross_validation, metrics
from sklearn import preprocessing

filepath = 'CSV/FILE.csv'
ts = pd.Series.from_csv(filepath)

nprev = 50      # number of past values used as input
deltasuiv = 5   # number of future values to predict

def load_data(data, n_prev=nprev, delta_suiv=deltasuiv):
    """Build sliding windows: each X row holds n_prev consecutive values,
    each y row holds the delta_suiv values that follow them."""
    docX, docY = [], []
    for i in range(len(data) - n_prev - delta_suiv):
        docX.append(np.array(data[i:i + n_prev]))
        docY.append(np.array(data[i + n_prev:i + n_prev + delta_suiv]))
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY

X, y = load_data(ts.values)

# Scale data to 0 mean and unit std dev.
scaler = preprocessing.StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = cross_validation.train_test_split(
    X, y, test_size=0.2, random_state=42)

regressor = skflow.TensorFlowDNNRegressor(hidden_units=[30, 50],
                                          steps=5000, learning_rate=0.1,
                                          batch_size=1)
regressor.fit(X_train, y_train)
score = metrics.mean_squared_error(regressor.predict(X_test), y_test)
print('MSE: {0:f}'.format(score))
```
This leads to the following error at the end of training:

```
ValueError: y_true and y_pred have different number of output (1!=5)
```
When I call `predict` directly, I run into the same kind of problem:

```python
ypred = regressor.predict(X_test)
print(ypred.shape, y_test.shape)
```

```
(200, 1) (200, 5)
```
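The `(200, 5)` shape of `y_test` comes straight from `load_data`: every target row holds the 5 values that follow its 50-value input window. A minimal vectorized NumPy sketch of the same windowing makes those shapes easy to check on a toy series. `make_windows` is a hypothetical helper, not part of skflow or the code above. Like `load_data`, it produces `len(series) - n_prev - horizon` windows, which skips the last complete window; adding 1 to that count would include it.

```python
import numpy as np

def make_windows(series, n_prev=50, horizon=5):
    """Vectorized equivalent of load_data: each X row holds n_prev consecutive
    values, the matching Y row holds the `horizon` values that follow."""
    series = np.asarray(series)
    # Same window count as load_data's range(len(data) - n_prev - delta_suiv)
    n_windows = len(series) - n_prev - horizon
    starts = np.arange(n_windows)[:, None]            # shape (n_windows, 1)
    X = series[starts + np.arange(n_prev)]            # shape (n_windows, n_prev)
    Y = series[starts + n_prev + np.arange(horizon)]  # shape (n_windows, horizon)
    return X, Y

X, Y = make_windows(np.arange(100.0))
print(X.shape, Y.shape)  # (45, 50) (45, 5)
print(X[0, -1], Y[0])    # 49.0 [50. 51. 52. 53. 54.]
```

Because the targets are 2-D with 5 columns, any model trained on them must also emit 5 columns per sample for the shapes to line up.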
So the model predicts only 1 value per sample instead of the 5 I want.
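The `ValueError` appears because scikit-learn's `mean_squared_error` requires both arrays to have the same number of output columns. The sketch below is a hand-written per-output MSE that makes this requirement explicit. `multioutput_mse` is a hypothetical helper; scikit-learn gives the same per-column result through `mean_squared_error(..., multioutput='raw_values')`. It shows why a `(200, 1)` prediction cannot be scored against `(200, 5)` targets.

```python
import numpy as np

def multioutput_mse(y_true, y_pred):
    """Mean squared error over samples, returning one value per output column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Explicit check: NumPy broadcasting would otherwise silently subtract
    # a (n, 1) array from a (n, 5) array.
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: y_true {y_true.shape} vs y_pred {y_pred.shape}")
    return ((y_true - y_pred) ** 2).mean(axis=0)

y_test = np.zeros((200, 5))
print(multioutput_mse(y_test, np.ones((200, 5))))  # [1. 1. 1. 1. 1.]
try:
    multioutput_mse(y_test, np.ones((200, 1)))
except ValueError as e:
    print(e)  # shape mismatch: y_true (200, 5) vs y_pred (200, 1)
```

The explicit shape check is a deliberate choice. Without it, broadcasting would silently turn a single predicted column into five identical ones and return a misleading score.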
How can I use the same model to predict several output values at once?
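For comparison, a single model *can* map a 50-value window to 5 outputs at once when its output layer has 5 columns. The sketch below illustrates this with plain NumPy linear least squares. It is a swapped-in linear model, not skflow and not a fix for `TensorFlowDNNRegressor`. The data is synthetic, with targets that are an exact linear map of the inputs; it is an assumption chosen only to make the shapes and the fit easy to verify.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_prev, horizon = 1000, 50, 5

# Synthetic stand-in for windowed data: Y is a fixed linear map of X.
X = rng.normal(size=(n_samples, n_prev))
true_W = rng.normal(size=(n_prev, horizon))
Y = X @ true_W

# Append a bias column and solve for all 5 outputs in one least-squares problem.
X_b = np.hstack([X, np.ones((n_samples, 1))])
W, *_ = np.linalg.lstsq(X_b, Y, rcond=None)  # W has shape (n_prev + 1, horizon)

Y_pred = X_b @ W
print(W.shape, Y_pred.shape)  # (51, 5) (1000, 5)

# Forecasting the next 5 values from one new 50-value window:
x_new = rng.normal(size=(1, n_prev))
forecast = np.hstack([x_new, np.ones((1, 1))]) @ W  # shape (1, 5)
```

The key point is the weight matrix's second dimension: it equals the forecast horizon, so each prediction row contains 5 values rather than 1.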