
Co-occurrence matrix for TF-IDF vectorizer for top 2

I fitted a TF-IDF vectorizer on text data with max_features = 2000 and got vectors of shape (100000, 2000).

I am computing the co-occurrence matrix with the code below.

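import numpy as np

# window and tf_vec (the TF-IDF matrix) are defined earlier in the notebook
length = 2000
m = np.zeros([length,length]) # length is the count of all words
def cal_occ(sentence,m):
    for i,word in enumerate(sentence):
        for j in range(max(i-window,0),min(i+window,length)):
            print(j)
            print(sentence[j])
            m[word,sentence[j]]+=1
for sentence in tf_vec:
    cal_occ(sentence, m)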

I am getting the following error.

IndexError                                Traceback (most recent call last)
<ipython-input-96-ad505b6df734> in <module>()
 11             m[word,sentence[j]]+=1
 12 for sentence in tf_vec:
 ---> 13     cal_occ(sentence, m)

 <ipython-input-96-ad505b6df734> in cal_occ(sentence, m)
  9             print(j)
 10             print(sentence[j])
 ---> 11             m[word,sentence[j]]+=1
 12 for sentence in tf_vec:
 13     cal_occ(sentence, m)

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices


The problem is most likely here:

for j in range(max(i-window,0),min(i+window,length)):

The min function returns length when i+window exceeds the bound. Can you try this instead of the line above?

for j in range(max(i-window,0),min(i+window,length-1)):
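
If the error persists, note that the message is about the type of the index rather than its range: NumPy raises this IndexError when an array is indexed with something that is not an integer, such as a float. Each row of tf_vec holds TF-IDF weights, so word and sentence[j] are most likely weights (or sparse rows) rather than integer word ids. Below is a minimal sketch, not your exact setup, of two ways to end up with integer indices: counting within a window over sequences of integer word ids, and counting document-level co-occurrence from the TF-IDF matrix itself. The function names and the toy data are made up for illustration, and the second function assumes a dense array.

```python
import numpy as np

def window_cooccurrence(sentences, vocab_size, window):
    """Count pairs of word ids that appear within `window` positions of each other."""
    m = np.zeros((vocab_size, vocab_size), dtype=np.int64)
    for ids in sentences:
        for i, word in enumerate(ids):
            # Bound the window by the sentence length, not the vocabulary size.
            for j in range(max(i - window, 0), min(i + window + 1, len(ids))):
                if j != i:
                    m[word, ids[j]] += 1  # both indices are integer word ids
    return m

def document_cooccurrence(tfidf):
    """Count documents in which two features both have a nonzero weight."""
    present = (np.asarray(tfidf) > 0).astype(np.int64)
    counts = present.T @ present
    np.fill_diagonal(counts, 0)  # ignore a word co-occurring with itself
    return counts

# Hypothetical toy data, only for illustration.
toy_sentences = [[0, 1, 2], [2, 1, 3]]   # sequences of integer word ids
toy_tfidf = np.array([[0.5, 0.0, 0.7],
                      [0.2, 0.9, 0.0]])  # 2 documents x 3 features

toy_m = window_cooccurrence(toy_sentences, vocab_size=4, window=1)
doc_m = document_cooccurrence(toy_tfidf)
```

On the real data, the integer word ids for the first function would have to come from your own tokenization, for example by mapping each token through the fitted vectorizer's vocabulary_ dictionary. That part is an assumption about your pipeline.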

Hope this helps.
