Sklearn custom transformers: difference between us

To do proper cross-validation (CV), it is advisable to use pipelines so that the same transformations are applied to each fold. I can define custom transformations either by using sklearn.preprocessing.FunctionTransformer or by subclassing sklearn.base.TransformerMixin. Which one is the recommended approach? Why?

Tags: python machine-learning scikit-learn cross-validation

1 answer

一纸荒年 Trace。

Reply #2 · 2019-04-09 21:43

Well, it is totally up to you. Both achieve more or less the same results; only the way you write the code differs.

For instance, with sklearn.preprocessing.FunctionTransformer you simply define the function you want to apply and wrap it in the transformer, like this (code from the official documentation):

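from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def all_but_first_column(X):
    # Keep every column except the first one
    return X[:, 1:]

def drop_first_component(X, y):
    """
    Create a pipeline with PCA and the column selector and use it to
    transform the dataset.
    """
    pipeline = make_pipeline(PCA(), FunctionTransformer(all_but_first_column))

    X_train, X_test, y_train, y_test = train_test_split(X, y)
    pipeline.fit(X_train, y_train)
    return pipeline.transform(X_test), y_test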
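If the function already exists, you do not even need to write a wrapper. Here is a minimal sketch of that, assuming NumPy's log1p as the transformation; the toy data and the choice of LinearRegression are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Toy data invented for this example: non-negative features and a target
# that is exactly linear in log1p of the features
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(60, 3))
y = np.log1p(X).sum(axis=1)

# Wrap the existing function; the pipeline applies it inside every CV fold
log_pipeline = make_pipeline(FunctionTransformer(np.log1p), LinearRegression())
scores = cross_val_score(log_pipeline, X, y, cv=5)
```

Because the function has nothing to learn from the data, it behaves identically in every fold.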

On the other hand, when subclassing sklearn.base.TransformerMixin you have to define the whole class, including its fit and transform methods. So you will have to create a class like this (example code taken from this blog post):

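from sklearn.base import TransformerMixin

class FunctionFeaturizer(TransformerMixin):
    def __init__(self, *featurizers):
        self.featurizers = featurizers

    def fit(self, X, y=None):
        # Nothing to learn here; return self so calls can be chained
        return self

    def transform(self, X):
        # Do transformations here, then return the transformed data
        transformed_data = X  # placeholder: replace with the real transformation
        return transformed_data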

So as you can see, subclassing TransformerMixin gives you more flexibility than FunctionTransformer. You can apply multiple transformations, or a partial transformation that depends on the value, and so on. For example, you might want to take the log of the first 50 values and the inverse log of the next 50. You can easily define your transform method to deal with the data selectively. A subclass can also learn parameters from the training data in fit and reuse them in transform, which a stateless FunctionTransformer cannot do.
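As a rough sketch of such a selective transform, the hypothetical class below leaves small values untouched and log-compresses only the part of a value that exceeds a threshold. The class name and the median-based rule are assumptions made for illustration and do not come from the blog post. The threshold is learned in fit, so inside cross-validation it comes from the training fold only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ThresholdLogTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical example: compress values above the per-column training median."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # State learned from the training data only
        self.threshold_ = np.median(X, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Amount by which each value exceeds the threshold (0 if it does not)
        excess = np.clip(X - self.threshold_, 0, None)
        # Values at or below the threshold pass through unchanged
        return np.where(X > self.threshold_, self.threshold_ + np.log1p(excess), X)

X_train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
X_new = np.array([[2.0], [3.0], [3.0 + np.e - 1.0]])

transformer = ThresholdLogTransformer().fit(X_train)
X_out = transformer.transform(X_new)
```

Inheriting from BaseEstimator as well is a design choice here: it supplies get_params and set_params, which scikit-learn needs in order to clone the transformer for each fold.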

If you just want to use a function as it is, use sklearn.preprocessing.FunctionTransformer. If you need more customization or more complex transformations, I would suggest subclassing sklearn.base.TransformerMixin.
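To make the comparison concrete, here is a small sketch in which both routes produce the same output for a simple stateless function. Log1pTransformer is a hypothetical class written only for this comparison:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import FunctionTransformer

class Log1pTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical class-based equivalent of FunctionTransformer(np.log1p)."""

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data
        return self

    def transform(self, X):
        return np.log1p(X)

X = np.array([[0.0, 1.0], [2.0, 3.0]])

from_function = FunctionTransformer(np.log1p).fit_transform(X)
from_class = Log1pTransformer().fit_transform(X)

# True when both approaches give the same numbers
same = np.allclose(from_function, from_class)
```

For a case like this the class is just extra boilerplate, so the one-line FunctionTransformer is the simpler choice.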

