scikit-learn - Convert pipeline prediction to orig

Posted 2019-04-29 10:47

Question:

I've created a pipeline as follows (using the Keras scikit-learn API):

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)

and fit it with

pipeline.fit(trainX,trainY)

If I predict with pipeline.predict(testX), I believe I get standardised predictions.

How do I predict on testX so that predictedY is at the same scale as the actual (untouched) testY (i.e. NOT a standardised prediction, but the actual values)? I see there is an inverse_transform method for Pipeline, but it appears to be only for reverting a transformed X.

Answer 1:

Exactly. The StandardScaler() in a pipeline only transforms the inputs (trainX) passed to pipeline.fit(trainX, trainY); trainY is handed to the final estimator unchanged.
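To make this concrete, here is a minimal sketch showing that a Pipeline with a StandardScaler step leaves the target alone, so predictions come back on the original scale of y. The synthetic data and the use of LinearRegression as a stand-in for the KerasRegressor step are assumptions made so that the example runs without Keras.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): the target lives on a scale of thousands.
rng = np.random.RandomState(0)
X = rng.uniform(0, 100, size=(200, 3))
y = 5000.0 + 30.0 * X[:, 0] - 12.0 * X[:, 1] + rng.normal(0, 1.0, size=200)

# LinearRegression is a stand-in for the KerasRegressor step.
pipe = Pipeline([
    ("standardize", StandardScaler()),
    ("model", LinearRegression()),
])
pipe.fit(X, y)  # only X goes through StandardScaler; y reaches the model unchanged

pred = pipe.predict(X)
print(y.mean(), pred.mean())  # both values are on the original scale of y
```

In other words, unless y was scaled by hand before fitting, there is nothing to invert on the prediction side.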

So, if you also want the target to be standardised while fitting, scale trainY yourself with a separate scaler and invert that scaling on the predictions:

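import numpy as np
trainY2d = np.asarray(trainY).reshape(-1, 1)  # StandardScaler needs 2D input (single target assumed)
scalerY = StandardScaler().fit(trainY2d)  # fit y scaler
pipeline.fit(trainX, scalerY.transform(trainY2d))  # fit your pipeline to scaled Y
scaledPred = np.asarray(pipeline.predict(testX)).reshape(-1, 1)  # predictions are on the scaled-Y scale
predictedY = scalerY.inverse_transform(scaledPred).ravel()  # rescale to the original units of Y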

The inverse_transform() function undoes the scaling using the mean and standard deviation computed in StandardScaler().fit(): it multiplies each value by the standard deviation and adds the mean back.
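As a small worked check of that arithmetic, the sketch below compares inverse_transform() against the manual formula z * scale_ + mean_. The four target values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

y = np.array([[10.0], [20.0], [30.0], [40.0]])  # hypothetical target column
scaler = StandardScaler().fit(y)

z = scaler.transform(y)                    # (y - mean_) / scale_
manual = z * scaler.scale_ + scaler.mean_  # the inverse mapping written out by hand
restored = scaler.inverse_transform(z)

print(np.allclose(manual, restored), np.allclose(restored, y))
```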

You can always fit your model without scaling Y, as you mentioned, but depending on your data this can be risky, since it may lead your model to overfit. You have to test it ;)
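One way to run that test is sketched below. It uses scikit-learn's TransformedTargetRegressor, which scales y before fitting and applies the inverse transform to predictions automatically, and compares it with the same pipeline fitted on unscaled y. The synthetic data, the helper build_pipeline, and LinearRegression standing in for the KerasRegressor step are all assumptions; whether the wrapper works with a particular Keras wrapper version would need to be checked separately.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical), split into train and test sets.
rng = np.random.RandomState(0)
X = rng.uniform(0, 100, size=(300, 3))
y = 5000.0 + 30.0 * X[:, 0] - 12.0 * X[:, 1] + rng.normal(0, 1.0, size=300)
trainX, testX, trainY, testY = train_test_split(X, y, random_state=0)


def build_pipeline():
    # Hypothetical helper; LinearRegression stands in for the KerasRegressor step.
    return Pipeline([("standardize", StandardScaler()), ("model", LinearRegression())])


# Variant A: y left unscaled.
plain = build_pipeline().fit(trainX, trainY)

# Variant B: y standardised for fitting, predictions mapped back to original units.
scaled = TransformedTargetRegressor(regressor=build_pipeline(), transformer=StandardScaler())
scaled.fit(trainX, trainY)

print("R^2 with unscaled y:", plain.score(testX, testY))
print("R^2 with scaled y:  ", scaled.score(testX, testY))
```

With a linear stand-in the two variants give essentially the same predictions, because least squares is unaffected by an affine rescaling of y. With a neural network the scores can differ, which is exactly what the comparison is meant to reveal.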