I am trying to fit a TfidfVectorizer on a text corpus and then use the same fitted vectorizer to compute the sum of the tf-idf values for new text. However, the sums are not what I expect. Below is an example:
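import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

text = ["I am new to python and R , how can anyone help me", "why is no one able to crack the python code without help"]
tf = TfidfVectorizer(stop_words='english', ngram_range=(1, 1))
tf.fit_transform(text)
# pair each vocabulary term with its learned idf weight
zip(tf.get_feature_names(), tf.idf_)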
[(u'able', 1.4054651081081644),
(u'code', 1.4054651081081644),
(u'crack', 1.4054651081081644),
(u'help', 1.0),
(u'new', 1.4054651081081644),
(u'python', 1.0)]
Now when I apply the same fitted tf to new text:
new_text = "i am not able to code"
np.sum(tf.transform([new_text]))
1.4142135623730951
I am expecting the output to be around 2.80, i.e. roughly 1.405 + 1.405 for the idf weights of 'able' and 'code'. Any suggestions on what might be going wrong here would be really helpful.
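To make the two numbers concrete, here is a minimal sketch of the arithmetic in plain Python, without scikit-learn. It assumes the smoothed idf formula ln((1 + n) / (1 + df)) + 1, which reproduces the idf values printed above, and it assumes the gap comes from the vectorizer's default norm='l2' rescaling each row to unit length. The helper name smoothed_idf is hypothetical and only for illustration.

```python
import math

def smoothed_idf(n_docs, doc_freq):
    # Assumed formula: ln((1 + n) / (1 + df)) + 1 (hypothetical helper)
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

# 'able' and 'code' each appear in 1 of the 2 training documents
# and once each in the new text, so tf = 1 for both.
weights = [1 * smoothed_idf(2, 1), 1 * smoothed_idf(2, 1)]

# Sum of the unnormalized tf-idf weights: the value I expected (about 2.81)
raw_sum = sum(weights)

# Divide by the row's Euclidean length, as an L2 normalization step would
l2_length = math.sqrt(sum(w * w for w in weights))
normalized_sum = sum(w / l2_length for w in weights)

print(raw_sum)         # about 2.81
print(normalized_sum)  # about 1.414, i.e. sqrt(2)
```

If that assumption holds, constructing the vectorizer with norm=None (TfidfVectorizer(stop_words='english', norm=None)) should leave the weights unnormalized, so the sum for the new text would come out near 2.81 rather than 1.414.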