In Python 2.7, when I load all the data from a 2.5GB text file into memory for faster processing, like this:
>>> f = open('dump.xml','r')
>>> dump = f.read()
I got the following error:
Python(62813) malloc: *** mmap(size=140521659486208) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
Why did Python try to allocate 140521659486208 bytes of memory for 2563749237 bytes of data? How do I fix the code so that it loads all the bytes?
I have around 3GB of RAM free. The file is a Wiktionary XML dump.
Based on some quick googling, I came across this forum post that seems to address the issue you appear to be having. Since the error output suggests you are running Mac or Linux, you may try triggering garbage collection with
gc.enable()
or gc.collect()
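The forum post's suggestion amounts to something like this sketch (the processing step in the middle is hypothetical; whether collection actually helps depends on what else your program has allocated):

```python
import gc

gc.enable()  # make sure automatic garbage collection is on (it is by default)

# ... read and process chunks of the file here ...

# Force a full collection pass; returns the number of
# unreachable objects found and freed.
unreachable = gc.collect()
```

Note that `gc.collect()` only reclaims objects involved in reference cycles; memory held by live objects (such as a single huge string) is unaffected.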
as suggested in the forum post. Alternatively, if you use mmap, the file's contents are mapped into your address space and paged in lazily on demand, so you can work with the whole file without reading it all into memory at once.
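A minimal sketch of the mmap approach, using a small temporary file to stand in for dump.xml:

```python
import mmap
import os
import tempfile

# Create a small sample file standing in for dump.xml (hypothetical data).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"<page>hello</page>" * 1000)

with open(path, "rb") as f:
    # Map the whole file read-only; pages are loaded on demand,
    # not read up front.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mm[:6]          # slicing only touches the pages it needs
    size = mm.size()         # full file size, without reading the file
    idx = mm.find(b"hello")  # searching works as on a bytes object
    mm.close()

os.remove(path)
```

Because the operating system pages data in and out as needed, this works even when the file is larger than available RAM, which seems closer to what you want here than a single 2.5GB `read()`.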