How do I split a huge text file in Python?

Posted 2019-01-17 20:34

I have a huge text file (~1 GB) and sadly the text editor I use won't read such a large file. However, if I can just split it into two or three parts I'll be fine, so, as an exercise, I wanted to write a program in Python to do it.

What I think I want the program to do is find the size of the file, divide that number into parts, and for each part, read up to that point in chunks, writing to a filename.nnn output file, then read up to the next line break and write that, then close the output file, and so on. Obviously the last output file just copies to the end of the input file.

Can you help me with the key filesystem-related parts: getting the file size, reading and writing in chunks, and reading to a line break?

I'll be writing this code test-first, so there's no need to give me a complete answer, unless it's a one-liner ;-)
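(For reference, a rough, untested sketch of the approach described above, using os.path.getsize for the file size, read() for fixed-size chunks, and readline() to finish the current line. The split_file name, the default part count, and the .nnn suffix are only illustrative, not a complete answer.)

import os

def split_file(path, parts=3, chunk_size=1024 * 1024):
    """Split path into `parts` pieces named path.000, path.001, ...
    Every piece except the last ends at a line break."""
    total = os.path.getsize(path)        # size of the input in bytes
    target = total // parts              # rough size of each piece
    with open(path, 'rb') as src:
        for part in range(parts):
            with open('%s.%03d' % (path, part), 'wb') as dst:
                if part == parts - 1:
                    # last piece: copy whatever is left of the input
                    for chunk in iter(lambda: src.read(chunk_size), b''):
                        dst.write(chunk)
                    return
                written = 0
                # copy roughly `target` bytes in fixed-size chunks
                while written < target:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        return           # input ended early
                    dst.write(chunk)
                    written += len(chunk)
                # read up to the next line break so no line is cut in half
                dst.write(src.readline())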

14 answers
我想做一个坏孩纸
Answer #2 · 2019-01-17 21:27

You can use wc and split (see the respective manpages) to get the desired effect. In bash:

split -dl$((`wc -l 'filename'|sed 's/ .*$//'` / 3 + 1)) filename filename-chunk.

produces 3 parts with the same line count (with a rounding error in the last, of course), named filename-chunk.00 to filename-chunk.02.
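If you would rather stay in Python, a rough equivalent of the same line-based split might look like the sketch below; the split_by_lines name and the -chunk.NN suffix just mirror the command above.

from itertools import islice

def split_by_lines(path, parts=3):
    # count the lines first, the same job wc -l does above
    with open(path) as f:
        line_count = sum(1 for _ in f)
    lines_per_part = line_count // parts + 1
    # then write every lines_per_part lines to a new numbered file
    with open(path) as f:
        part = 0
        while True:
            lines = list(islice(f, lines_per_part))
            if not lines:
                break
            with open('%s-chunk.%02d' % (path, part), 'w') as out:
                out.writelines(lines)
            part += 1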

Anthone
Answer #3 · 2019-01-17 21:28

There is now a PyPI module available that you can use to split files of any size into chunks. Check it out:

https://pypi.org/project/filesplit/
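Usage is roughly as follows; the class and method names below reflect the filesplit documentation at the time of writing and have changed between releases, so treat this as a sketch and check the project page for the current interface.

from filesplit.split import Split

# split filename.txt into pieces of roughly 300 MB each,
# written to the out/ directory
split = Split(inputfile='filename.txt', outputdir='out')
split.bysize(size=300 * 1024 * 1024)

# or split by number of lines instead
# split.bylinecount(linecount=1000000)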
