How to implement walk-forward testing in sklearn?

Posted 2019-03-09 19:11

In sklearn, GridSearchCV can take a pipeline as a parameter and find the best estimator through cross-validation. However, the usual cross-validation looks like this:

To cross-validate time series data, the training and testing data are often split like this:

That is to say, the testing data should always come after the training data in time.
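To make the splitting scheme concrete, here is a minimal sketch in plain Python. The helper name `walk_forward_splits` and its parameters are made up for illustration and are not part of scikit-learn; it assumes a fixed-size training window that slides forward by one test block at a time.

```python
def walk_forward_splits(n_samples, n_train, n_test):
    """Yield (train_indices, test_indices) for a rolling walk-forward scheme.

    Hypothetical helper for illustration only; not a scikit-learn API.
    """
    start = 0
    while start + n_train + n_test <= n_samples:
        train = list(range(start, start + n_train))
        test = list(range(start + n_train, start + n_train + n_test))
        yield train, test
        start += n_test  # slide the window forward by one test block


splits = list(walk_forward_splits(n_samples=10, n_train=4, n_test=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

Every test block starts right where its training block ends, so no fold is evaluated on data older than what it was trained on.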

My thoughts so far:

Write my own version of a k-fold class and pass it to GridSearchCV, so I can keep the convenience of the pipeline. The problem is that it seems difficult to make GridSearchCV use specified indices for the training and testing data.
Write a new class, GridSearchWalkForwardTest, similar to GridSearchCV. I am studying the source code in grid_search.py and finding it a little complicated.
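On the first option: in recent scikit-learn versions the `cv` parameter of GridSearchCV is documented to accept an iterable of (train, test) index arrays, so specified indices can be passed directly. The following is only a sketch under that assumption; the data is synthetic, and the window sizes, the Ridge model and the step names are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed to be ordered in time (illustration only).
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

# Hand-built walk-forward folds: train on 20 rows, test on the next 10.
custom_cv = [
    (np.arange(s, s + 20), np.arange(s + 20, s + 30))
    for s in range(0, 40, 10)
]

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])
search = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    cv=custom_cv,  # explicit (train, test) index pairs
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```

The pipeline is refit inside each fold, so the scaler only ever sees that fold's training rows.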

Any suggestion is welcome.

Tags: python scikit-learn time-series cross-validation

2 Answers

虎瘦雄心在

Reply #2 · 2019-03-09 20:00

I think you could use TimeSeriesSplit, either instead of your own implementation or as the basis for a CV method that does exactly what you describe.
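As a rough sketch of that suggestion, TimeSeriesSplit from `sklearn.model_selection` can be passed as the `cv` argument of GridSearchCV together with a pipeline. The data below is synthetic, and the Ridge model and the alpha grid are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed to be ordered in time (illustration only).
rng = np.random.RandomState(42)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.2, size=100)

# Each fold trains on an expanding prefix and tests on the block after it.
tscv = TimeSeriesSplit(n_splits=5)

# make_pipeline names the steps after the lowercased class names.
pipe = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(pipe, {"ridge__alpha": [0.01, 0.1, 1.0]}, cv=tscv)
search.fit(X, y)
print(search.best_params_)
```

By default the training window grows with each fold, which is an expanding-window variant of walk-forward testing rather than a fixed-size rolling one.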

After digging around a bit, it seems someone added a max_train_size parameter to TimeSeriesSplit in this PR, which seems to do what you want.
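To see what `max_train_size` does, the sketch below prints the folds for a tiny array. It assumes a scikit-learn version that includes the parameter; the array length and window sizes are arbitrary.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 time-ordered samples

# Cap the training window at 3 samples so it rolls forward instead of growing.
tscv = TimeSeriesSplit(n_splits=4, max_train_size=3)
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train, test in folds:
    print("train:", train, "test:", test)
```

With the cap in place, each fold trains only on the most recent samples before its test block, which matches the fixed-window walk-forward scheme from the question.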


成全新的幸福

Reply #3 · 2019-03-09 20:01

My opinion is that you should try to implement your own GridSearchWalkForwardTest. I once used GridSearch for training and then implemented the same grid search myself, and I didn't get the same results, even though I should have.

What I did in the end was use my own function. You get more control over the training and test sets, and more control over the parameters you train.
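The answer does not show its function, so the following is only a guess at what such a hand-rolled approach could look like, not the author's code. The helper `walk_forward_search`, its parameters, the Ridge model and the synthetic data are all assumptions for illustration; only `ParameterGrid`, `clone` and `mean_squared_error` are real scikit-learn utilities.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ParameterGrid


def walk_forward_search(estimator, param_grid, X, y, n_train, n_test):
    """Hypothetical helper: mean walk-forward MSE for every parameter setting."""
    results = []
    for params in ParameterGrid(param_grid):
        errors = []
        start = 0
        while start + n_train + n_test <= len(X):
            train = slice(start, start + n_train)
            test = slice(start + n_train, start + n_train + n_test)
            # Fit a fresh copy so no state leaks between folds.
            model = clone(estimator).set_params(**params)
            model.fit(X[train], y[train])
            errors.append(mean_squared_error(y[test], model.predict(X[test])))
            start += n_test  # roll the window forward by one test block
        results.append((params, float(np.mean(errors))))
    best = min(results, key=lambda r: r[1])  # lowest mean error wins
    return best, results


# Synthetic data, assumed to be ordered in time (illustration only).
rng = np.random.RandomState(1)
X = rng.normal(size=(80, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=80)

(best_params, best_mse), all_results = walk_forward_search(
    Ridge(), {"alpha": [0.1, 1.0, 10.0]}, X, y, n_train=30, n_test=10
)
print(best_params, best_mse)
```

Writing the loop by hand makes the window size, step and scoring explicit, at the cost of losing GridSearchCV conveniences such as parallel fitting and `cv_results_`.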
