Memory error in Python using NumPy array

Posted 2019-03-04 15:47

Question:

I am getting a MemoryError from the following code:

import numpy as np
import lda

# X, titles and vocab come from the preprocessing step (not shown)
model = lda.LDA(n_topics=15, n_iter=50, random_state=1)
model.fit(X)
topic_word = model.topic_word_
print("type(topic_word): {}".format(type(topic_word)))
print("shape: {}".format(topic_word.shape))
print("\n")
n = 15
doc_topic = model.doc_topic_
for i in range(15):
    # doc_topic[i] is the topic distribution of document i
    print("{} (top topic: {})".format(titles[i], doc_topic[i].argmax()))

topic_csharp = np.zeros(shape=[1, n])
np.copyto(topic_csharp, doc_topic[0])
for i, topic_dist in enumerate(topic_word):
    # pick the n highest-probability words for this topic
    topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n + 1):-1]
    print('*Topic {}\n- {}'.format(i, ' '.join(topic_words)))

The error is:

Traceback (most recent call last):
File "C:\Users\csharp.py", line 56, in <module>
topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n+1):-1]
MemoryError

The document I am fitting into the model has about 150,000 lines of text. After preprocessing, vocab_size is 558,270 and n_words is 13,075,390.

How can I solve this error?

Answer 1:

If your arrays are too large to fit in RAM, use numpy.memmap, which stores the array on disk and loads only the parts you access into memory. See http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.memmap.html
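
Here is a minimal sketch of the idea; the filename, dtype, and the fill loop are placeholders for illustration, not values from the question (only the array sizes are taken from it):

import numpy as np

n_topics, vocab_size = 15, 558270  # sizes from the question

# Create a disk-backed array instead of an in-memory one.
topic_word_mm = np.memmap('topic_word.dat', dtype='float64',
                          mode='w+', shape=(n_topics, vocab_size))

# Fill it one row at a time, e.g. from model.topic_word_, so that
# only a single row has to be resident in RAM at any moment:
# for i in range(n_topics):
#     topic_word_mm[i, :] = model.topic_word_[i, :]

# Slices read like an ordinary ndarray; only the pages actually
# touched are pulled in from disk.
row = np.asarray(topic_word_mm[0])

# Flush pending writes to disk when finished.
topic_word_mm.flush()

A memmap slice behaves like a normal ndarray, so the per-topic loop in the question can iterate over topic_word_mm unchanged.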