To do cross-validation (CV) properly, it is advisable to use pipelines so that the same transformations are applied to each fold. I can define custom transformations either by using sklearn.preprocessing.FunctionTransformer or by subclassing sklearn.base.TransformerMixin. Which one is the recommended approach? Why?
Well, it is totally up to you. Both will achieve more or less the same results; only the way you write the code differs.
For instance, with sklearn.preprocessing.FunctionTransformer you can simply define the function you want to use and wrap it directly (the official documentation has example code for this).
On the other hand, when subclassing sklearn.base.TransformerMixin you have to define the whole class, including its fit and transform methods.
TransformerMixin gives you more flexibility than FunctionTransformer with regard to the transform function. You can apply multiple transformations, or a partial transformation depending on the value, and so on. For example, you might want to take the log of the first 50 values and the inverse log of the next 50. You can easily define your transform method to deal with the data selectively.
If you just want to use a function directly as it is, use sklearn.preprocessing.FunctionTransformer; if you want to do more modification or more complex transformations, I would suggest subclassing sklearn.base.TransformerMixin.
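To make the comparison concrete, here is a minimal sketch of both approaches used inside a cross-validated pipeline. The SelectiveLog class, its columns parameter, and the synthetic data are hypothetical and made up purely for illustration; treat this as one possible way to write it, not as the canonical one.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


class SelectiveLog(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: apply log1p only to the chosen columns."""

    def __init__(self, columns=(0,)):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing is learned here; fit exists so the class works in a Pipeline.
        return self

    def transform(self, X):
        X = np.array(X, dtype=float)  # np.array copies, so the input is untouched
        cols = list(self.columns)
        X[:, cols] = np.log1p(X[:, cols])
        return X


# Synthetic data (an assumption for the demo): y depends on log1p of column 0
# and linearly on column 1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3 * np.log1p(X[:, 0]) + X[:, 1] + 0.01 * rng.normal(size=100)

# Approach 1: wrap an existing function; it is applied to every column.
pipe_fn = make_pipeline(FunctionTransformer(np.log1p), LinearRegression())

# Approach 2: a custom class whose transform treats columns selectively.
pipe_cls = make_pipeline(SelectiveLog(columns=(0,)), LinearRegression())

scores_fn = cross_val_score(pipe_fn, X, y, cv=5)
scores_cls = cross_val_score(pipe_cls, X, y, cv=5)
print("FunctionTransformer mean R^2:", scores_fn.mean())
print("SelectiveLog mean R^2:", scores_cls.mean())
```

In both cases cross_val_score clones and refits the whole pipeline for every fold, so the transformation is applied consistently to each fold. Subclassing BaseEstimator as well as TransformerMixin is what provides get_params and set_params, which cloning relies on.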
Take a look at the following links to get a better idea.