In Python 2.7, when I load all the data from a 2.5GB text file into memory for faster processing, like this:
>>> f = open('dump.xml','r')
>>> dump = f.read()
I got the following error:
Python(62813) malloc: *** mmap(size=140521659486208) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
Why did Python try to allocate 140521659486208 bytes of memory for 2563749237 bytes of data? How do I fix the code so that it loads all the bytes?
I have around 3GB of RAM free. The file is a Wiktionary XML dump.
Based on some quick googling, I came across this forum post that seems to address the issue you appear to be having. Since the error output suggests you are running Mac or Linux, you may try triggering garbage collection with
gc.enable()
or gc.collect()
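The forum post's suggestion amounts to something like this sketch (the processing step in the middle is hypothetical; whether collection actually helps depends on what else your program has allocated):

```python
import gc

gc.enable()  # make sure automatic garbage collection is on (it is by default)

# ... read and process chunks of the file here ...

# Force a full collection pass; returns the number of
# unreachable objects found and freed.
unreachable = gc.collect()
```

Note that `gc.collect()` only reclaims objects involved in reference cycles; memory held by live objects (such as a single huge string) is unaffected.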
as suggested in the forum post. Alternatively, if you use mmap, the file's contents are mapped into your address space and paged in lazily on demand, so you can work with the whole file without reading it all into memory at once.
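A minimal sketch of the mmap approach, using a small temporary file to stand in for dump.xml:

```python
import mmap
import os
import tempfile

# Create a small sample file standing in for dump.xml (hypothetical data).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"<page>hello</page>" * 1000)

with open(path, "rb") as f:
    # Map the whole file read-only; pages are loaded on demand,
    # not read up front.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mm[:6]          # slicing only touches the pages it needs
    size = mm.size()         # full file size, without reading the file
    idx = mm.find(b"hello")  # searching works as on a bytes object
    mm.close()

os.remove(path)
```

Because the operating system pages data in and out as needed, this works even when the file is larger than available RAM, which seems closer to what you want here than a single 2.5GB `read()`.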