Page 38 of the book Linux 101 Hacks suggests:
cat url-list.txt | xargs wget -c
I usually do:
for i in `cat url-list.txt`
do
wget -c "$i"
done
Is there anything, other than length, in which the xargs technique is superior to the good old for-loop technique in bash?
Added:
The C source code of xargs seems to have only one fork. In contrast, how many forks does the bash combination make? Please elaborate.
From the Rationale section of a UNIX manpage for xargs. (Interestingly, this section doesn't appear in the OS X BSD version of xargs, nor in the GNU version.)
The classic application of the xargs utility is in conjunction with the find utility to reduce the number of processes launched by a simplistic use of the find -exec combination. The xargs utility is also used to enforce an upper limit on memory required to launch a process. With this basis in mind, this volume of POSIX.1-2008 selected only the minimal features required.
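The rationale's point about find -exec can be seen in a small sketch. It assumes bash with a standard find and xargs; the throwaway directory and the file1 to file10 names are hypothetical, and an inline sh script that prints one line per run stands in for a real command, so counting output lines counts processes.

```bash
#!/usr/bin/env bash
# Sketch: compare how many processes "find -exec ... \;" starts with
# piping the same names to xargs. Each run of the inline sh script prints
# exactly one line, so the line count equals the process count.
set -euo pipefail

dir=$(mktemp -d)                 # hypothetical throwaway directory
trap 'rm -rf "$dir"' EXIT
for n in 1 2 3 4 5 6 7 8 9 10; do touch "$dir/file$n"; done

# -exec ... \; starts one process per file found.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# xargs packs many names into each command line.
via_xargs=$(find "$dir" -type f | xargs sh -c 'echo run' sh | wc -l)

echo "find -exec \\;: $((per_file)) processes, find | xargs: $((via_xargs)) process(es)"
```

With ten short names, all of them fit into a single xargs command line.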
In your follow-up, you ask how many forks the other version will have. Jim already answered this: one per iteration. How many iterations are there? It's impossible to give an exact number, but easy to answer the general question. How many lines are there in your url-list.txt file?
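A minimal sketch of "one per iteration", assuming bash: a generated temp file stands in for url-list.txt and /bin/echo stands in for wget, so nothing touches the network. The loop runs one external command per line, so the fork count matches the line count (plus one extra fork for the cat in the command substitution).

```bash
#!/usr/bin/env bash
# Sketch: the for loop forks one external process per line of the list.
# The temp file and the example.com URLs are hypothetical stand-ins.
set -euo pipefail

list=$(mktemp)
trap 'rm -f "$list"' EXIT
for n in 1 2 3 4 5; do echo "http://example.com/file$n"; done > "$list"

lines=$(wc -l < "$list")
forks=0
for i in $(cat "$list"); do          # cat itself costs one more fork
    /bin/echo "$i" > /dev/null       # external command: one fork+exec
    forks=$((forks + 1))
done

echo "lines: $((lines)), forks from the loop body: $forks"
```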
There are some other considerations. xargs requires extra care with filenames that contain spaces or other problematic characters, and find's -exec has a form (+) that groups processing into batches. So not everyone prefers xargs, and it may not be the best choice in every situation.
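Both caveats can be seen in a short sketch, assuming bash and a find/xargs pair that supports -print0 and -0 (GNU and the BSDs do; these are not in older POSIX). The temp directory and file names are hypothetical. Plain xargs splits "with space.txt" into two arguments, while -print0 | xargs -0 and the POSIX -exec ... {} + form both keep each name intact.

```bash
#!/usr/bin/env bash
# Sketch: plain xargs splits names on blanks; NUL-separated input and
# find -exec ... {} + pass each name through whole.
set -euo pipefail

dir=$(mktemp -d)                 # hypothetical demo directory
trap 'rm -rf "$dir"' EXIT
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/another one.txt"

# Naive: each blank-separated word becomes its own argument.
naive=$(find "$dir" -type f | xargs -n 1 echo | wc -l)

# NUL-separated names survive intact.
nul_safe=$(find "$dir" -type f -print0 | xargs -0 -n 1 echo | wc -l)

# -exec ... {} + batches names like xargs, with no parsing step;
# the inline script prints how many names it received in one run.
batched=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "naive: $((naive)) args, -print0/-0: $((nul_safe)) args, -exec +: $batched names in one run"
```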
See these links:
- http://www.sunmanagers.org/pipermail/summaries/2005-March/006255.html
- http://fahdshariff.blogspot.com/2009/05/find-exec-vs-xargs.html
Also consider:
xargs -I'{}' wget -c '{}' < url-list.txt
but wget provides an even better way to do the same thing:
wget -c -i url-list.txt
With respect to xargs versus a loop, I prefer xargs when the meaning and implementation are relatively "simple" and "clear"; otherwise, I use loops.
xargs will also let you handle a huge list, because it splits its input into as many command lines as needed to stay under the system's command-line length limit; the "for" version, by contrast, expands the entire list in the shell's memory before the loop even starts.
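A minimal sketch of that splitting, assuming bash and an xargs with the POSIX -s option. The 2048-byte cap is an arbitrary small value chosen only so the split is visible; by default xargs picks a limit based on the system's ARG_MAX. Each batch reports how many arguments it received, and the batches add up to the full input.

```bash
#!/usr/bin/env bash
# Sketch: xargs keeps every command line under a size limit by running
# the command several times. -s caps the command-line size in bytes.
set -euo pipefail

getconf ARG_MAX    # the system's limit on exec() argument size

# 10000 short arguments: far more than fit into 2048 bytes at once.
batches=$(seq 1 10000 | xargs -s 2048 sh -c 'echo "$#"' sh)

total=0; count=0
for n in $batches; do
    total=$((total + n))
    count=$((count + 1))
done
echo "$count batches, $total arguments in total"
```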
xargs is designed to process multiple inputs for each process it forks. A shell script with a for loop over its inputs must fork a new process for each input. Avoiding that per-process overhead can give an xargs solution a significant performance enhancement.
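The batching is easy to see directly. This sketch assumes bash and uses echo as a stand-in command: each output line comes from one echo process, so by default xargs runs a single process for all five inputs, while the POSIX -n 2 option forces at most two inputs per process. The -t flag prints each command to stderr before running it.

```bash
#!/usr/bin/env bash
# Sketch: one output line per process started by xargs.
set -euo pipefail

# Default: as many arguments as fit go into one command line.
default_out=$(printf '%s\n' a b c d e | xargs echo)

# -n 2 caps each process at two arguments, so three processes run.
batched_out=$(printf '%s\n' a b c d e | xargs -t -n 2 echo)

printf 'default:\n%s\n' "$default_out"
printf 'with -n 2:\n%s\n' "$batched_out"
```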
Instead of GNU Parallel, I prefer using xargs' built-in parallel processing. Add -P to set how many processes to run in parallel, as in:
seq 1 10 | xargs -n 1 -P 3 echo
This runs up to 3 echo processes at a time, which the kernel can schedule on different cores. -P is supported by modern GNU xargs; if you use BSD or Solaris, you will have to check your own xargs manual.
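A sketch of the effect, assuming bash and an xargs that supports -P (GNU does; it is not part of POSIX). A one-second sleep stands in for a slow download, so nothing touches the network: six such jobs with -P 3 should finish in roughly two seconds rather than six.

```bash
#!/usr/bin/env bash
# Sketch: -P runs several processes concurrently. sleep is a stand-in
# for a slow download; each job prints its number when done.
set -euo pipefail

start=$SECONDS
out=$(seq 1 6 | xargs -n 1 -P 3 sh -c 'sleep 1; echo "$1"' sh)
elapsed=$((SECONDS - start))

# Parallel jobs finish in no guaranteed order, so sort before comparing.
sorted=$(printf '%s\n' "$out" | sort -n | tr '\n' ' ')
echo "finished: $sorted in about $elapsed s"
```

Applied to the question, something like `xargs -n 1 -P 4 wget -c < url-list.txt` would run up to four downloads at a time; 4 is an arbitrary choice that depends on your connection and the servers involved.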
Depending on your internet connection, you may want to use GNU Parallel (http://www.gnu.org/software/parallel/) to run the downloads in parallel:
cat url-list.txt | parallel wget -c
One advantage I can think of is that, if you have lots of files, it could be slightly faster since you don't have as much overhead from starting new processes.
I'm not really a bash expert though, so there could be other reasons it's better (or worse).