I am getting the following error for this code:
model = lda.LDA(n_topics=15, n_iter=50, random_state=1)
model.fit(X)
topic_word = model.topic_word_
print("type(topic_word): {}".format(type(topic_word)))
print("shape: {}".format(topic_word.shape))
print("\n")
n = 15
doc_topic = model.doc_topic_
for i in range(15):
    print("{} (top topic: {})".format(titles[i], doc_topic[0][i].argmax()))
    topic_csharp = np.zeros(shape=[1, n])
    np.copyto(topic_csharp, doc_topic[0][i])
for i, topic_dist in enumerate(topic_word):
    topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n+1):-1]
    print('*Topic {}\n- {}'.format(i, ' '.join(topic_words)))
The error is:
Traceback (most recent call last):
  File "C:\Users\csharp.py", line 56, in <module>
    topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n+1):-1]
MemoryError
The document I am fitting the model on has about 150,000 lines of text. After preprocessing: vocab_size: 558270, n_words: 13075390.
How can I solve this error?
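If your arrays are too large to fit in RAM, use numpy.memmap, which backs an array with a file on disk so that only the parts you access are read into memory. See http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.memmap.html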
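A minimal sketch of two ways to reduce memory use here, offered as a likely improvement rather than a confirmed diagnosis of the traceback above. The failing line rebuilds np.array(vocab) for every topic and reorders the whole vocabulary before slicing. Building the array once and slicing the sorted indices first means only n words are copied per topic. The second part shows the numpy.memmap suggestion on a small file-backed array. The helper name top_words_per_topic, the toy vocabulary and matrix, and the temporary file path are made up for illustration.

```python
import os
import tempfile

import numpy as np


def top_words_per_topic(topic_word, vocab, n=15):
    """Return the n highest-weight words for each row of topic_word.

    Hypothetical helper, not part of the lda package.
    """
    # Build the vocabulary array once, outside the loop. dtype=object stores
    # references to the existing strings instead of a fixed-width copy padded
    # to the length of the longest token.
    vocab_arr = np.asarray(vocab, dtype=object)
    result = []
    for topic_dist in topic_word:
        # Slice the sorted indices first, so only n entries are looked up.
        top_idx = np.argsort(topic_dist)[:-(n + 1):-1]
        result.append(vocab_arr[top_idx].tolist())
    return result


# Toy stand-ins for vocab and model.topic_word_ (illustrative values only).
demo_vocab = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
demo_topic_word = np.array([
    [0.10, 0.40, 0.05, 0.30, 0.10, 0.05],
    [0.30, 0.05, 0.35, 0.05, 0.20, 0.05],
])
top_in_memory = top_words_per_topic(demo_topic_word, demo_vocab, n=2)

# numpy.memmap: keep the matrix in a file and read rows from disk on demand.
path = os.path.join(tempfile.mkdtemp(), "topic_word.dat")  # assumed location
writer = np.memmap(path, dtype=np.float64, mode="w+",
                   shape=demo_topic_word.shape)
writer[:] = demo_topic_word
writer.flush()
del writer

on_disk = np.memmap(path, dtype=np.float64, mode="r",
                    shape=demo_topic_word.shape)
top_from_disk = top_words_per_topic(on_disk, demo_vocab, n=2)
```

With the names from the question, the call would be top_words_per_topic(model.topic_word_, vocab, n). A memmap does not store the shape or dtype in the file, so both have to be passed again when reopening it.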