I saw both transformer and estimator were mentioned in the sklearn documentation.
Is there any difference between these two words?
The basic difference is that a:

- `Transformer` transforms the input data (`X`) in some way.
- `Estimator` predicts a new value (or values) (`y`) by using the input data (`X`).

Both the `Transformer` and the `Estimator` should have a `fit()` method which can be used to train them (they learn some characteristics of the data). The signature is:

`fit(X, y)`

`fit()` stores the learnt characteristics inside the object; by scikit-learn convention it also returns the fitted object itself (`self`), so calls can be chained.

Here `X` represents the samples (feature vectors) and `y` is the target vector (which may have a single value or multiple values per corresponding sample in `X`). Note that `y` is optional for transformers that do not need it, but it is mandatory for most estimators (supervised estimators). Look at `StandardScaler`, for example: it needs the initial data `X` to find the mean and std of the data (it learns the characteristics of `X`; `y` is not needed).
Each `Transformer` should have a `transform(X)` method which, like `fit()`, takes the input `X` and returns a new, transformed version of `X` (which generally has the same number of samples, but may or may not have the same number of features).
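A minimal sketch of this with made-up data: `StandardScaler.transform` keeps the number of features, while a dimensionality-reducing transformer such as `PCA` returns the same number of samples but fewer features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

X_scaled = StandardScaler().fit(X).transform(X)
print(X_scaled.shape)    # (4, 2): same number of samples and features

X_reduced = PCA(n_components=1).fit(X).transform(X)
print(X_reduced.shape)   # (4, 1): same number of samples, fewer features
```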
On the other hand, an `Estimator` should have a `predict(X)` method which outputs the predicted value of `y` for the given `X`.
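A minimal sketch of a supervised estimator, using `LinearRegression` and made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data where y is roughly 2 * x.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

model = LinearRegression()
model.fit(X, y)                    # supervised: both X and y are required

y_pred = model.predict([[5.0]])    # predict y for previously unseen X
print(y_pred)                      # approximately [10.]
```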
There are some classes in scikit-learn which implement both `transform()` and `predict()`, like `KMeans`; in that case, carefully reading the documentation should solve your doubts.
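For instance, `KMeans.predict()` returns the index of the nearest cluster for each sample, while `KMeans.transform()` returns the distance of each sample to every cluster centre; a small sketch with made-up data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs (made-up values).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.predict(X))           # cluster label per sample, shape (6,)
print(km.transform(X).shape)   # distances to both cluster centres, shape (6, 2)
```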