In Tfidf.fit_transform we only pass the parameter X and do not use y when fitting on the data set. Is this right? We generate the TF-IDF matrix from the features of the training set only, and ytrain is not used in fitting the model. How, then, do we make predictions for the test data set?
https://datascience.stackexchange.com/a/12346/122 has a good explanation of why the methods are called fit(), transform() and fit_transform().
In gist:
- fit(): Fit the vectorizer/model to the training data and save the fitted vectorizer/model to a variable (returns sklearn.feature_extraction.text.TfidfVectorizer).
- transform(): Use the fitted object returned by fit() to transform validation/test data (returns scipy.sparse.csr_matrix).
- fit_transform(): Sometimes you want to transform the training data directly, so you use fit() + transform() together, thus fit_transform() (returns scipy.sparse.csr_matrix).
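As a minimal sketch of how the pieces fit together with respect to the question: TF-IDF is an unsupervised feature-extraction step, so the vectorizer never needs y; the labels only come in when a classifier is fitted on the resulting matrix. The toy sentences, the labels, and the choice of LogisticRegression below are assumptions made for illustration and are not part of the original answer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data, invented purely for illustration.
X_train = [
    "the cat sat on the mat",
    "dogs chase the ball",
    "a cat purrs softly",
    "the dog barks loudly",
]
y_train = [0, 1, 0, 1]
X_test = ["my cat sleeps", "that dog runs"]

vectorizer = TfidfVectorizer()

# fit_transform() learns the vocabulary and IDF weights from the training
# text only; no labels are involved at this step.
X_train_tfidf = vectorizer.fit_transform(X_train)

# transform() reuses the vocabulary learned above, so the test matrix has
# the same columns as the training matrix.
X_test_tfidf = vectorizer.transform(X_test)

# The labels enter here, when the classifier (an assumed choice) is fitted.
clf = LogisticRegression()
clf.fit(X_train_tfidf, y_train)
predictions = clf.predict(X_test_tfidf)

print(type(X_train_tfidf))
print(X_train_tfidf.shape, X_test_tfidf.shape)
print(predictions)
```

Calling transform() rather than fit_transform() on the test data matters here: refitting on the test set would build a different vocabulary, and the columns would no longer line up with what the classifier was trained on.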