This page, http://scikit-learn.org/stable/modules/feature_extraction.html, mentions:
As tf–idf is very often used for text features, there is also another class called TfidfVectorizer that combines all the options of CountVectorizer and TfidfTransformer in a single model.
I followed the code there and used fit_transform() on my corpus. How do I get the weight of each feature computed by fit_transform()?
I tried:
In [39]: vectorizer.idf_
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-39-5475eefe04c0> in <module>()
----> 1 vectorizer.idf_
AttributeError: 'TfidfVectorizer' object has no attribute 'idf_'
but this attribute is missing.
Thanks
See also this on how to get the TF-IDF values of all the documents:
I think the results are normalized by document, so that the squared values in each row sum to 1:
>>> 0.4483208731992**2 + 0.4483208731992**2 + 0.4483208731992**2 + 0.6300993445182**2
0.9999999999997548
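The same check can be sketched for a whole matrix at once. This assumes the default settings of TfidfVectorizer (where norm='l2'), and the two-sentence corpus and variable names are made up for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, used only for illustration.
corpus = [
    "This is very strange",
    "This is very nice",
]

# With default settings each row of X is one document's tf-idf vector.
X = TfidfVectorizer().fit_transform(corpus)

# Sum of squared tf-idf values per document (one number per row).
row_sq_sums = np.asarray(X.multiply(X).sum(axis=1)).ravel()

print(X.toarray())   # tf-idf values of all documents
print(row_sq_sums)   # each entry should be very close to 1.0
```

If the rows should not be rescaled, passing norm=None to the vectorizer is worth trying; the squared sums then generally differ from 1.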
Since version 0.15, the IDF weight of each feature can be retrieved via the attribute idf_ of the TfidfVectorizer object.
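A minimal sketch of reading idf_ after fitting, assuming scikit-learn 0.15 or later. The corpus and the names terms and idf_by_term are illustrative, not taken from the original answer:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, used only for illustration.
corpus = [
    "This is very strange",
    "This is very nice",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

# vocabulary_ maps each term to its column index; idf_ is ordered by
# column index, so sort the terms the same way before pairing them up.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
idf_by_term = dict(zip(terms, vectorizer.idf_))

for term, idf in idf_by_term.items():
    print(term, idf)
```

Note that idf_ holds one value per term for the whole corpus. The per-document tf-idf values are the entries of the matrix returned by fit_transform().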
As discussed in the comments, prior to version 0.15, a workaround is to access the attribute idf_ via the supposedly hidden _tfidf (an instance of TfidfTransformer) of the vectorizer, which should give the same output as above.
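A sketch of that workaround follows. It relies on _tfidf, a private attribute that may change between releases, so treat it as an assumption rather than a supported API; the corpus is again made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, used only for illustration.
corpus = [
    "This is very strange",
    "This is very nice",
]

vectorizer = TfidfVectorizer()
vectorizer.fit_transform(corpus)

# _tfidf is the internal TfidfTransformer held by the vectorizer
# (private, not part of the public API).
idf_via_transformer = vectorizer._tfidf.idf_

# Pair each term with its IDF weight, ordered by column index.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(dict(zip(terms, idf_via_transformer)))
```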