I found the following code on the internet for calculating TFIDF:
I added "1 +" to the denominator of the function idf(word, documentList) so that I don't get a division-by-zero error:
return math.log(len(documentList) / (1 + float(numDocsContaining(word,documentList))))
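Here is a minimal sketch of how that modified formula behaves. The original numDocsContaining helper is not shown, so num_docs_containing below is a hypothetical stand-in that counts the documents whose whitespace-separated tokens include the word. With N documents and a word that appears in df of them, the modified idf is log(N / (1 + df)). It is zero when df = N - 1 and negative when df = N, that is, for any word that appears in every document. For comparison, the sketch also includes the smoothed idf that scikit-learn's TfidfVectorizer uses when smooth_idf=True, ln((1 + N) / (1 + df)) + 1, which is never below 1.

```python
import math

def num_docs_containing(word, document_list):
    # Hypothetical stand-in for numDocsContaining: number of documents
    # whose whitespace-separated tokens include the word.
    return sum(1 for doc in document_list if word in doc.split())

def idf_plus_one(word, document_list):
    # The modified formula from the question: log(N / (1 + df)).
    n = len(document_list)
    return math.log(n / (1 + float(num_docs_containing(word, document_list))))

def idf_smooth(word, document_list):
    # Smoothed variant (as in scikit-learn with smooth_idf=True):
    # ln((1 + N) / (1 + df)) + 1, always >= 1 because df <= N.
    n = len(document_list)
    df = num_docs_containing(word, document_list)
    return math.log((1 + n) / (1 + df)) + 1

docs = ["the cat sat", "the dog ran", "the cat ran"]
for w in ["the", "cat", "dog", "bird"]:
    print(w, round(idf_plus_one(w, docs), 3), round(idf_smooth(w, docs), 3))
```

In this toy corpus "the" occurs in all three documents, so the modified formula gives log(3/4), which is about -0.288. "cat" occurs in two of the three documents and gets exactly 0. Any word that appears in every document will therefore get a negative TF-IDF score.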
But I am confused about two things:
- I get negative values in some cases. Is this correct?
- I am confused by lines 62, 63 and 64:
documentNumber = 0
for word in documentList[documentNumber].split(None):
    words[word] = tfidf(word,documentList[documentNumber],documentList)
Should TF-IDF be calculated for the first document only?
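Because documentNumber is fixed at 0, the loop in those three lines only scores the words of documentList[0]. TF-IDF is defined per (word, document) pair, so scoring every document means running the same loop once per document. Below is a minimal sketch under stated assumptions. The original tf, idf and tfidf helpers are not shown, so the versions here are hypothetical stand-ins: tf is the word's count divided by the number of tokens in the document, and idf is the "1 +" variant from the question.

```python
import math

def tf(word, document):
    # Hypothetical term frequency: occurrences of word / number of tokens.
    tokens = document.split()
    return tokens.count(word) / len(tokens)

def idf(word, document_list):
    # Hypothetical idf using the "1 +" variant: log(N / (1 + df)).
    df = sum(1 for doc in document_list if word in doc.split())
    return math.log(len(document_list) / (1 + df))

def tfidf(word, document, document_list):
    return tf(word, document) * idf(word, document_list)

def tfidf_all(document_list):
    # One {word: score} dictionary per document, instead of only document 0.
    results = []
    for document in document_list:
        words = {}
        for word in document.split():
            words[word] = tfidf(word, document, document_list)
        results.append(words)
    return results

docs = ["the cat sat", "the dog ran", "the cat ran"]
for number, scores in enumerate(tfidf_all(docs)):
    print(number, scores)
```

The sketch keeps a separate dictionary per document because the same word can score differently in each document (its tf depends on the document). A single shared words dictionary would overwrite earlier scores.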