How to transform items using sklearn Pipeline?

I have a simple scikit-learn Pipeline of two steps: a TfidfVectorizer followed by a LinearSVC.

I have fit the pipeline using my data. All good.

Now I want to transform (not predict!) an item, using my fitted pipeline.

I tried pipeline.transform([item]), but it gives a different result from pipeline.named_steps['tfidf'].transform([item]). Even the shape and type of the results are different: the first is a 1x3000 CSR matrix, the second a 1x15000 CSC matrix. Which one is correct? Why do they differ?

How do I transform items, i.e. get an item's vector representation before the final estimator, when using scikit-learn's Pipeline?

Tags: python machine-learning scikit-learn

3 Answers

Emotional °昔

#2 · 2020-07-10 07:59

Actually, I think it is possible to transform the data, ignoring the last step if it is a predictor, utilizing one of the "hidden" methods, more or less as follows:

data = my_pipeline._fit(X_train, y_train)

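where data is a tuple whose first item is the transformed dataset. Note that _fit is a private method that refits the transformers, and its return value may differ between scikit-learn versions.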


甜甜的少女心

#3 · 2020-07-10 08:06

The reason why the results are different (and why calling transform even works) is that LinearSVC also has a transform method (now deprecated) that performs feature selection.

If you want to transform using just the first step, pipeline.named_steps['tfidf'].transform([item]) is the right thing to do. If you would like to transform using all but the last step, olologin's answer provides the code.

By default, pipeline.transform runs every step of the pipeline, including the transform of the last step, which here is the feature selection performed by the LinearSVC.
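To make the first-step-only option concrete, here is a minimal sketch. The toy corpus, the labels, and the step names "tfidf" and "svc" are assumptions made up for illustration; they are not from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy corpus and labels, invented purely for illustration.
docs = ["red apple pie", "green apple tart", "fast red car", "slow green truck"]
labels = [0, 0, 1, 1]

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])
pipeline.fit(docs, labels)

# Transform with the fitted first step only: the result has one row per item
# and one column per term in the fitted vocabulary.
item = "red apple"
vec = pipeline.named_steps["tfidf"].transform([item])
vocab_size = len(pipeline.named_steps["tfidf"].vocabulary_)
print(vec.shape, vocab_size)
```

The number of columns equals the size of the fitted vocabulary, which is how you can tell that no feature selection has been applied to this vector.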


淡お忘

#4 · 2020-07-10 08:14

You can't call transform on a pipeline whose last step is not a transformer. If you want to call transform on such a pipeline, the last estimator must be a transformer.

Even the method's docstring says so:

Applies transforms to the data, and the transform method of the final estimator. Valid only if the final estimator implements transform.
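A quick way to see this rule in action is to check whether the pipeline exposes transform at all. The sketch below assumes the same two-step pipeline as in the question (the step names are made up); the result depends on whether LinearSVC has a transform method in your installed scikit-learn version, so no output is shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Pipeline ending in a classifier, as in the question (step names are illustrative).
pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])

# Pipeline ending in a transformer, for comparison.
transformer_only = Pipeline([("tfidf", TfidfVectorizer())])

# The pipeline exposes transform only if its final estimator does.
has_transform = hasattr(pipeline, "transform")
final_has_transform = hasattr(LinearSVC(), "transform")
transformer_only_has_transform = hasattr(transformer_only, "transform")
print(has_transform, final_has_transform, transformer_only_has_transform)
```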

Also, there is no method that applies every estimator except the last one. You can, however, make your own Pipeline that inherits everything from scikit-learn's Pipeline and adds one method, something like:

def just_transforms(self, X):
    """Apply every transform step to the data, skipping the final estimator.

    Parameters
    ----------
    X : iterable
        Data to transform. Must fulfill input requirements of first step of
        the pipeline.

    Returns
    -------
    Xt : the data as seen by the final estimator.
    """
    Xt = X
    # self.steps is a list of (name, estimator) pairs; skip the last pair.
    for _, transform in self.steps[:-1]:
        Xt = transform.transform(Xt)
    return Xt
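As an alternative sketch that avoids subclassing: scikit-learn 0.21 and later let you slice a Pipeline, so pipeline[:-1] gives a sub-pipeline without the final estimator. The toy data and step names below are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy corpus and labels, invented purely for illustration.
docs = ["red apple pie", "green apple tart", "fast red car", "slow green truck"]
labels = [0, 0, 1, 1]

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])
pipeline.fit(docs, labels)

# pipeline[:-1] is a sub-pipeline holding every step except the final estimator.
# It reuses the already fitted steps, so nothing is refitted here.
features = pipeline[:-1].transform(["red apple"])

# With a single transformer in front of the classifier, this should match
# calling the tfidf step on its own.
expected = pipeline.named_steps["tfidf"].transform(["red apple"])
difference = abs(features - expected).sum()
print(features.shape, difference)
```

With more than one transformer before the classifier, the same slice would apply all of them in order.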
