I would like to concatenate my files. I use
cat *txt > newFile
But I have almost 500000 files and it complains that the
argument list is too long.
Is there an efficient and fast way of merging half a million files?
Thanks
If your directory structure is shallow (there are no subdirectories) then you can simply do:
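The command itself seems to have been lost from this post. A minimal sketch of what it most likely looked like, assuming the files end in .txt (the scratch directory and the sample file names are only there so it is safe to try):

```shell
# Scratch directory with three sample files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > a.txt
printf 'two\n' > b.txt
printf 'three\n' > c.txt

# find hands the file names to cat itself, so the shell never has to build
# one huge argument list. With \; a separate cat is started for each file.
# newFile does not match '*.txt', so the output is not read back as input.
find . -type f -name '*.txt' -exec cat {} \; > newFile
```

Note that find makes no promise about the order in which it visits files, so the pieces may not come out in name order.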
If you have subdirectories, you can limit the find to the top level, or you might consider putting some of the files in the sub-directories so you don't have this problem!
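As a sketch of limiting find to the top level: GNU and BSD find accept -maxdepth 1 for this, though it may not be available in every find. The directory layout below is made up for the demonstration.

```shell
# Scratch directory with two top-level files and one nested file (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sub
printf 'one\n' > a.txt
printf 'two\n' > b.txt
printf 'nested\n' > sub/c.txt

# -maxdepth 1 keeps find from descending into sub/, so sub/c.txt is skipped.
# It is placed right after the start path, before the other tests.
find . -maxdepth 1 -type f -name '*.txt' -exec cat {} \; > newFile
```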
This is not particularly efficient, and some versions of find allow you to do:
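Presumably the same command with + as the terminator, again assuming .txt files in a throwaway directory:

```shell
# Scratch directory with three sample files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > a.txt
printf 'two\n' > b.txt
printf 'three\n' > c.txt

# With + instead of \; find packs as many file names as fit into each cat
# invocation, so far fewer processes are started than one per file.
find . -type f -name '*.txt' -exec cat {} + > newFile
```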
for greater efficiency. (Note the backslash before the + is not necessary, but I find it nice for symmetry with the previous example.)
How about doing it in a loop:
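The loop is missing from the post. Something along these lines should work, as a sketch assuming .txt files. The glob is expanded inside the shell and each cat receives a single name, so the argument list limit should not come into play.

```shell
# Scratch directory with three sample files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > a.txt
printf 'two\n' > b.txt
printf 'three\n' > c.txt

# One cat per file. Redirecting the whole loop opens newFile once,
# rather than reopening it for every file as >> inside the loop would.
for f in *.txt; do
    cat "$f"
done > newFile
```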
This has the disadvantage of spawning a new cat instance for each file, which might be costly, but if the files are reasonably large the I/O overhead should dominate over the CPU time required to spawn a new process.
I would recommend creating a file that lists the files in the proper order; I'm not 100% sure about the ordering guarantees of using globbing like this (and like in the question).
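One possible way to do that, as a sketch: write the names to a list file, put the list in whatever order you need, then concatenate in list order. The name filelist is made up, and this assumes no file name contains a newline. As far as I know, POSIX shells sort glob results by the current locale's collation, so the explicit sort below is mainly there to pin the order independently of the locale.

```shell
# Scratch directory with three sample files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > a.txt
printf 'two\n' > b.txt
printf 'three\n' > c.txt

# One name per line. printf is a builtin in bash and dash, so the expanded
# glob is never passed to an external command.
printf '%s\n' *.txt > filelist

# Fix the order explicitly (plain byte order here); edit this step to taste.
LC_ALL=C sort -o filelist filelist

# Concatenate in exactly the order given by the list.
while IFS= read -r name; do
    cat "$name"
done < filelist > newFile
```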