I would like to concatenate my files. I use
cat *txt > newFile
But I have almost 500000 files and it complains that the
argument list is too long.
Is there an efficient and fast way of merging half a million files?
Thanks
If your directory structure is shallow (there are no subdirectories), then you can simply do:
find . -type f ! -name newFile -exec cat {} \; > newFile
(The ! -name newFile test matters because the redirection creates newFile before find runs; without it, find would feed the output file back into itself.) If you have subdirectories, you can limit the find to the top level, or you might consider putting some of the files in the subdirectories so you don't have this problem!
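For example, here is a minimal sketch assuming a find that supports -maxdepth (GNU and BSD find both do, though POSIX does not require it); the -name pattern mirrors the glob from the question:

find . -maxdepth 1 -type f -name '*txt' -exec cat {} + > newFile

-maxdepth 1 keeps find from descending into subdirectories, and the -name test also keeps the output file newFile from being matched and fed back into itself.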
This is not particularly efficient, since it spawns a separate cat for every file, and some versions of find allow you to do:
find . -type f ! -name newFile -exec cat {} \+ > newFile
for greater efficiency: the + form passes as many file names as possible to each cat invocation instead of running one per file. (Note the backslash before the +
is not necessary, but I find it nice for symmetry with the previous example.)
How about doing it in a loop:
for a in *.txt ; do cat "$a" >> newFile ; done
This has the disadvantage of spawning a new cat
instance for each file, which might be costly, but if the files are reasonably large the I/O overhead should dominate over the CPU time required to spawn a new process.
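As a small variant (a sketch, not part of the original answer), you can move the redirection outside the loop so that newFile is opened once rather than reopened on every iteration; the per-file cat processes remain:

for a in *.txt ; do cat "$a" ; done > newFile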
If the order of concatenation matters, I would recommend creating a file that lists the inputs in the proper order; I'm not 100% sure about the ordering guarantees of globbing like this (or like in the question).
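A minimal sketch of that idea, assuming the file names contain no spaces, quotes, or newlines (filelist is just a placeholder name):

# printf is a shell builtin, so expanding the glob here
# does not hit the argument-list limit
printf '%s\n' *.txt > filelist

# inspect or reorder filelist here if the order matters

# xargs batches the names so that no single cat invocation
# exceeds the argument-list limit
xargs cat < filelist > newFile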