Python: append multiple files in a given order to one file

Posted 2019-01-14 08:32

I have up to 8 separate Python processes creating temp files in a shared folder. Then I'd like the controlling process to append all the temp files, in a certain order, into one big file. What's the quickest way of doing this at an OS-agnostic shell level?

8 Answers
ら.Afraid
Answer 2 · 2019-01-14 09:07

Rafe's answer was lacking proper open/close handling, e.g.:

# tempfiles is a list of paths to your temp files. Order them however you like.
with open("bigfile.txt", "w") as fo:
    for tempfile in tempfiles:
        with open(tempfile, "r") as fi:
            fo.write(fi.read())
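
Note that fi.read() loads each whole temp file into memory at once. If the temp files are large, a chunked copy with shutil.copyfileobj from the standard library avoids that; a minimal sketch, assuming the same tempfiles list of paths as above:

import shutil

# shutil.copyfileobj streams each file in fixed-size chunks instead of
# reading it fully into memory. Binary mode avoids newline translation.
with open("bigfile.txt", "wb") as fo:
    for tempfile in tempfiles:
        with open(tempfile, "rb") as fi:
            shutil.copyfileobj(fi, fo)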

However, be forewarned: if you later want to sort the contents of the big file, this method does not catch the case where the last line in one or more of your temp files has a different EOL format (or no trailing newline at all), which will cause some strange sort results. In that case, strip the temp files' lines as you read them, and write consistent EOL lines to the big file (i.e., one extra line of code), as sketched below.
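
A sketch of that variant, again assuming tempfiles holds the temp file paths:

# Line-by-line variant that normalizes line endings.
with open("bigfile.txt", "w") as fo:
    for tempfile in tempfiles:
        with open(tempfile, "r") as fi:
            for line in fi:
                # Strip whatever EOL (or lack of one) the temp file used,
                # then write a consistent "\n".
                fo.write(line.rstrip("\r\n") + "\n")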

【Aperson】
Answer 3 · 2019-01-14 09:07

Try this. It's very fast (much faster than line-by-line, and it shouldn't cause virtual-memory thrashing on large files), and it should run on just about anything, including CPython 2.x, CPython 3.x, PyPy, PyPy3, and Jython. It should also be highly OS-agnostic, and it makes no assumptions about file encodings.

#!/usr/bin/env python3

'''Cat 3 files to one: example code'''

import os

def main():
    '''Main function'''
    input_filenames = ['a', 'b', 'c']

    block_size = 1024 * 1024

    # os.O_BINARY exists only on Windows; elsewhere binary is the
    # default, so fall back to 0 (a no-op).
    o_binary = getattr(os, 'O_BINARY', 0)

    # O_CREAT | O_TRUNC: create the output file if it doesn't exist,
    # and empty it if it does.
    output_file = os.open('output-file',
                          os.O_WRONLY | os.O_CREAT | os.O_TRUNC | o_binary,
                          0o644)
    for input_filename in input_filenames:
        input_file = os.open(input_filename, os.O_RDONLY | o_binary)
        while True:
            input_block = os.read(input_file, block_size)
            if not input_block:
                # An empty read means end of this input file.
                break
            os.write(output_file, input_block)
        os.close(input_file)
    os.close(output_file)

main()

There is one (nontrivial) optimization I've left out: rather than assuming any particular block size is good, it would be better to try a bunch of random sizes and slowly back off the randomization to focus on the ones that perform well (sometimes called "simulated annealing"). But that's a lot more complexity for little actual performance benefit.

You could also make os.write keep track of its return value and restart partial writes, but that's only really necessary if you expect to receive (nonterminal) *ix signals; a sketch follows.
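
A minimal sketch of such a retry loop (write_all is a hypothetical helper, not part of the answer above); you would call write_all(output_file, input_block) in place of the bare os.write:

import os

def write_all(fd, data):
    '''Write all of data to fd, restarting after partial writes.
    (Hypothetical helper, for illustration only.)'''
    view = memoryview(data)
    while view:
        # os.write may write fewer bytes than requested, e.g. if
        # interrupted by a signal; resend the unwritten tail.
        written = os.write(fd, view)
        view = view[written:]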
