How to concatenate a huge number of files

Published 2020-07-09 07:05

Question:

I would like to concatenate my files. I use

cat *txt > newFile

But I have almost 500,000 files and the shell complains that the

argument list is too long.

Is there an efficient and fast way of merging half a million files?

Thanks

Answer 1:

If your directory structure is shallow (there are no subdirectories) then you can simply do:

find . -type f -exec cat {} \; > newFile

If you have subdirectories, you can limit find to the top level (e.g. with -maxdepth 1 on GNU find), or you might consider moving some of the files into subdirectories so you don't hit this problem in the first place!

This is not particularly efficient, and some versions of find allow you to do:

find . -type f -exec cat {} \+ > newFile

for greater efficiency: each cat is invoked with as many file names as fit, instead of once per file. (Note that the backslash before the + is not necessary, since + is not special to the shell the way ; is, but I keep it for symmetry with the previous example.)
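If your find lacks -exec … +, a similar batching effect comes from piping find into xargs. The sketch below runs in a throwaway directory with two made-up sample files standing in for the ~500,000 real ones:

```shell
# Work in a scratch directory with two sample files.
tmp=$(mktemp -d)
cd "$tmp"
printf 'one\n' > a.txt
printf 'two\n' > b.txt

# find prints the names itself, so the shell never builds one giant
# argument list; xargs packs them into as few cat calls as fit under
# the limit. -print0 / -0 keep odd filenames intact, and sort -z
# (supported by GNU and recent BSD sort) gives a deterministic order.
find . -maxdepth 1 -name '*.txt' -print0 | sort -z | xargs -0 cat > newFile
```

The output file is not matched by the '*.txt' pattern, so it cannot be swept into its own concatenation.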



Answer 2:

How about doing it in a loop:

for a in *.txt ; do cat "$a" >> newFile ; done

This has the disadvantage of spawning a new cat instance for each file, which might be costly, but if the files are reasonably large the I/O overhead should dominate over the CPU time required to spawn a new process.
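The loop can be refined slightly, sketched here with throwaway sample files: quoting "$a" protects names containing spaces, and redirecting the whole loop once avoids reopening newFile on every iteration:

```shell
# Scratch directory with sample files; the space in the names shows
# why the quoting matters.
tmp=$(mktemp -d)
cd "$tmp"
printf 'x\n' > '1 a.txt'
printf 'y\n' > '2 b.txt'

# Glob expansion happens inside the shell itself, so it is not subject
# to the exec() argument-length limit that broke the original cat *.txt.
# Redirecting the loop opens newFile exactly once.
for a in *.txt ; do cat "$a" ; done > newFile
```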

I would also recommend creating a file that lists the inputs in the proper order; I'm not 100% sure about the ordering guarantees of globbing like this (or as in the question).
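A sketch of that file-list idea, with made-up names for the demo: write the list, arrange it however you like, then read it back line by line. This assumes no filename contains a newline:

```shell
# Scratch directory; the contents are deliberately out of
# alphabetical order so the listed order is visible in the result.
tmp=$(mktemp -d)
cd "$tmp"
printf 'first\n'  > b.txt
printf 'second\n' > a.txt

# Build the list with an explicit order (here: b before a).
printf '%s\n' b.txt a.txt > filelist

# Concatenate in exactly the listed order; newFile is opened once.
while IFS= read -r f; do
  cat "$f"
done < filelist > newFile
```

For half a million names, the same loop works unchanged; only building filelist (e.g. with find piped through sort) takes longer.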