I have a huge text file (~1 GB), and sadly the text editor I use won't read such a large file. However, if I can just split it into two or three parts I'll be fine, so as an exercise I wanted to write a program in Python to do it.
What I think I want the program to do is find the size of the file, divide that number into parts, and for each part read up to that point in chunks, writing to a filename.nnn output file, then read up to the next line break and write that, then close the output file, and so on. Obviously the last output file just copies to the end of the input file.
Can you help me with the key filesystem related parts: filesize, reading and writing in chunks and reading to a line-break?
I'll be writing this code test-first, so there's no need to give me a complete answer, unless it's a one-liner ;-)
As an alternative method, using the logging library:
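A minimal sketch of that idea, assuming the input is UTF-8 text. The function name, the logger name and the default backup count are made up for illustration; only RotatingFileHandler and its maxBytes/backupCount arguments come from the standard library:

```python
import logging
from logging.handlers import RotatingFileHandler


def split_with_rotating_handler(src_path, dest_path, max_bytes, backup_count=100):
    """Replay src_path line by line through a RotatingFileHandler.

    Hypothetical helper: the handler starts a new file whenever the
    current one would reach max_bytes, which splits on line boundaries.
    """
    logger = logging.getLogger("file_splitter")  # illustrative logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the lines away from the root logger

    handler = RotatingFileHandler(
        dest_path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    # Emit only the raw line, with no timestamp or level prefix.
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    try:
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                # The handler appends its own newline terminator.
                logger.info(line.rstrip("\n"))
    finally:
        logger.removeHandler(handler)
        handler.close()
```

Two things to watch: backup_count must be large enough for every part, because the handler deletes the oldest backup once the count is exceeded, and the numbering runs backwards (the highest-numbered backup holds the start of the input, the unnumbered file holds the end).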
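Your files will appear under the handler's base file name plus numbered backups (name, name.1, name.2, and so on), which is the RotatingFileHandler naming scheme.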
This is a quick and easy way to make a huge log file match your RotatingFileHandler implementation.
Or, a Python version of wc and split:
Then some code to read the first lines/3 lines into one file, the next lines/3 into another, and so on.
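Something along these lines should do it. This is only a sketch: count_lines and split_by_lines are names I picked, the filename.nnn output naming is borrowed from the question, and I round lines/3 up so that no trailing lines get dropped:

```python
from itertools import islice


def count_lines(path):
    """Count lines the way `wc -l` roughly does, without loading the file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def split_by_lines(path, parts=3):
    """Write roughly total_lines / parts lines into each of `parts` files."""
    total = count_lines(path)
    per_part = -(-total // parts)  # ceiling division
    out_paths = []
    with open(path, "rb") as src:
        for index in range(parts):
            out_path = f"{path}.{index:03d}"
            with open(out_path, "wb") as dst:
                # islice pulls at most per_part lines from the open file,
                # so the next iteration continues where this one stopped.
                dst.writelines(islice(src, per_part))
            out_paths.append(out_path)
    return out_paths
```

This reads the file twice (once to count, once to copy), which is the price of not knowing the line count up front.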
This worked for me.
I had a requirement to split CSV files for import into Dynamics CRM, since the file size limit for import is 8 MB and the files we receive are much larger. This program lets the user enter FileNames and LinesPerFile, and then splits the specified files into pieces of the requested number of lines. I can't believe how fast it works!
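The core of such a program might look like the sketch below. It is not the original script: the function name and the name_1.csv output naming are assumptions, the interactive prompts are left out, and the header row is not repeated in each piece:

```python
from itertools import islice
from pathlib import Path


def split_file_by_line_count(file_name, lines_per_file):
    """Split one file into pieces of at most lines_per_file lines each.

    Hypothetical helper; output names like data_1.csv are an assumption.
    """
    source = Path(file_name)
    written = []
    with source.open("rb") as src:
        part = 1
        while True:
            # Only one piece is held in memory at a time.
            lines = list(islice(src, lines_per_file))
            if not lines:
                break
            out_path = source.with_name(f"{source.stem}_{part}{source.suffix}")
            out_path.write_bytes(b"".join(lines))
            written.append(out_path)
            part += 1
    return written
```

Note that this splits on physical lines, so a CSV with quoted fields containing embedded newlines could be cut in the middle of a record.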
Here is a Python script you can use for splitting large files using subprocess:
You can call it externally:
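A sketch of what such a script can look like, assuming a POSIX split utility is on the PATH (so not stock Windows). The function name and parameters are illustrative; -l is the standard split option for lines per output file:

```python
import subprocess


def split_with_subprocess(path, lines_per_file, prefix):
    """Delegate the work to the system `split` command.

    Equivalent to running: split -l <lines_per_file> <path> <prefix>
    Output files are named <prefix>aa, <prefix>ab, and so on.
    """
    subprocess.run(
        ["split", "-l", str(lines_per_file), path, prefix],
        check=True,  # raise CalledProcessError if split fails
    )
```

Passing the arguments as a list avoids going through a shell, so file names with spaces need no extra quoting.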
You can also import subprocess and run it directly in your program.
The issue with this approach is high memory usage: subprocess forks your process, and the child nominally gets a memory footprint the same size as the parent's. If your process is already memory-heavy, that can double the memory committed for as long as the child runs (on systems with copy-on-write the physical cost is usually lower). The same applies to os.system.
Here is another, pure Python way of doing this. I haven't tested it on huge files; it's going to be slower, but leaner on memory:
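For illustration, a pure Python version could stream the input one line at a time and start a new output file whenever the next line would push the current one past a size limit. The function name, the max_bytes parameter and the filename.nnn naming are assumptions of this sketch:

```python
def split_by_size(path, max_bytes):
    """Stream path line by line into pieces of at most max_bytes each.

    A single line longer than max_bytes still gets a file of its own.
    """
    out_paths = []
    dst = None
    written = 0
    try:
        with open(path, "rb") as src:
            for line in src:
                # Roll over when the current piece is non-empty and the
                # next line would not fit in it.
                if dst is None or (written and written + len(line) > max_bytes):
                    if dst is not None:
                        dst.close()
                    out_path = f"{path}.{len(out_paths):03d}"
                    dst = open(out_path, "wb")
                    out_paths.append(out_path)
                    written = 0
                dst.write(line)
                written += len(line)
    finally:
        if dst is not None:
            dst.close()
    return out_paths
```

Only one line is held in memory at any moment, which is where the saving comes from.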
Check out os.stat() for file size and file.readlines([sizehint]). Those two functions should be all you need for the reading part, and hopefully you know how to do the writing :)
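Putting those two together might look like the following sketch. The function name, the buffer size and the filename.nnn naming (taken from the question) are my assumptions. It relies on readlines(hint) returning whole lines and stopping once their total size reaches the hint, so every part ends on a line break:

```python
import os


def split_into_parts(path, parts=3, buffer_bytes=1024 * 1024):
    """Split path into `parts` files of roughly equal size on line breaks."""
    total = os.stat(path).st_size
    target = -(-total // parts)  # ceiling division: bytes per part
    out_paths = []
    with open(path, "rb") as src:
        for index in range(parts):
            out_path = f"{path}.{index:03d}"
            out_paths.append(out_path)
            with open(out_path, "wb") as dst:
                remaining = target
                while remaining > 0:
                    # Ask for at most buffer_bytes at a time; readlines
                    # finishes the line it is on before returning.
                    lines = src.readlines(min(buffer_bytes, remaining))
                    if not lines:
                        break  # end of input
                    dst.writelines(lines)
                    remaining -= sum(len(line) for line in lines)
    return out_paths
```

Because each part reads at least its target number of bytes unless the input runs out, the last part ends up copying through to the end of the file. If there are fewer lines than parts, the trailing files are simply empty.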