I am using GNU xargs (version 4.2.2) in parallel mode and I seem to be reliably losing output when redirecting to a file. When redirecting to a pipe, it appears to work correctly.
The following shell commands demonstrate a minimal, complete, and verifiable example of the issue. I generate 2550 numbers and use xargs to split them into lines of 100 args each, for a total of 26 lines, where the 26th line contains only 50 args.
# generate numbers 1 to 2550 where each number is on its own line
$ seq 1 2550 > /tmp/nums
$ wc -l /tmp/nums
2550 /tmp/nums
# piping to wc is accurate: 26 lines, 2550 args
$ xargs -P20 -n 100 </tmp/nums | wc
26 2550 11643
# redirecting to a file is clearly inaccurate: 22 lines, 2150 args
$ xargs -P20 -n 100 </tmp/nums >/tmp/out; wc /tmp/out
22 2150 10043 /tmp/out
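To see which numbers go missing, rather than only counting lines, a check along these lines might help. This is a minimal sketch: `missing_numbers` is a hypothetical helper name (not part of xargs), and it assumes the output file holds whole numbers separated by spaces or newlines.

```shell
# Hypothetical helper: print every number from the input file that is
# absent from the xargs output file.
# Usage: missing_numbers /tmp/nums /tmp/out
missing_numbers() {
    expected=$1   # one number per line, e.g. /tmp/nums
    actual=$2     # space-separated xargs output, e.g. /tmp/out
    seen=$(mktemp) || return 1
    # xargs joins its arguments with spaces, so split them back to one per line.
    tr -s ' ' '\n' <"$actual" >"$seen"
    # -F fixed strings, -x whole-line match, -v invert: keep lines of
    # "$expected" that match no line of "$seen".
    grep -vxFf "$seen" "$expected"
    rm -f "$seen"
}
```

Running `missing_numbers /tmp/nums /tmp/out | wc -l` after the failing command should report how many of the 2550 numbers never reached the file.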
I believe the problem is not related to the underlying shell, since the shell performs the redirection before the command executes and then waits for xargs to complete. In this case, I hypothesize that xargs is completing before flushing its buffer. However, if my hypothesis is correct, I do not know why the problem doesn't manifest when writing to a pipe.
Edit:
It appears that when using >> (create/append to file) in the shell, the problem doesn't manifest:
# appending to file
$ >/tmp/out
$ xargs -P20 -n 100 </tmp/nums >>/tmp/out; wc /tmp/out
26 2550 11643
# creating and appending to file
$ rm /tmp/out
$ xargs -P20 -n 100 </tmp/nums >>/tmp/out; wc /tmp/out
26 2550 11643
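Since the loss may vary from run to run, repeating the experiment could make the difference between > and >> easier to see. The following is a rough sketch only: `count_lines_per_run` is a hypothetical helper, and it uses its own temporary files rather than /tmp/nums and /tmp/out.

```shell
# Hypothetical helper: run the xargs command several times and print the
# number of output lines produced by each run (26 means nothing was lost).
# $1 = "truncate" (uses >) or "append" (uses >>)
# $2 = number of trials
# $3 = value passed to -P (defaults to 20)
count_lines_per_run() {
    mode=$1
    trials=$2
    procs=${3:-20}
    nums=$(mktemp) || return 1
    out=$(mktemp) || return 1
    seq 1 2550 >"$nums"
    i=0
    while [ "$i" -lt "$trials" ]; do
        if [ "$mode" = append ]; then
            : >"$out"                                   # start from an empty file
            xargs -P"$procs" -n 100 <"$nums" >>"$out"
        else
            xargs -P"$procs" -n 100 <"$nums" >"$out"
        fi
        wc -l <"$out"
        i=$((i + 1))
    done
    rm -f "$nums" "$out"
}
```

For example, `count_lines_per_run truncate 10` and `count_lines_per_run append 10` can be compared side by side.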