cat/Xargs/command VS for/bash/command

Posted 2019-05-06 14:59

Question:

Page 38 of the book Linux 101 Hacks suggests:

cat url-list.txt | xargs wget -c

I usually do:

for i in `cat url-list.txt`
   do
      wget -c $i
   done 

Is there anything, other than length, in which the xargs technique is superior to the good old for-loop technique in bash?

Added

The C source code seems to have only one fork. In contrast, how many forks does the bash combo have? Please elaborate on the issue.

Answer 1:

From the Rationale section of a UNIX manpage for xargs. (Interestingly this section doesn't appear in the OS X BSD version of xargs, nor in the GNU version.)

The classic application of the xargs utility is in conjunction with the find utility to reduce the number of processes launched by a simplistic use of the find -exec combination. The xargs utility is also used to enforce an upper limit on memory required to launch a process. With this basis in mind, this volume of POSIX.1-2008 selected only the minimal features required.

In your follow-up, you ask how many forks the other version will have. Jim already answered this: one per iteration. How many iterations are there? It's impossible to give an exact number, but easy to answer the general question. How many lines are there in your url-list.txt file?

There are some other considerations. xargs requires extra care for filenames with spaces or other special characters, and find's -exec has an option (+) that groups processing into batches. So not everyone prefers xargs, and perhaps it's not best for all situations.
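
As a rough illustration of both points, the sketch below builds a scratch directory (the filenames are made up for the example) and counts how many arguments the command receives in each case. It assumes a find and xargs that support -print0 and -0, which GNU and BSD versions do.

```shell
# Scratch directory with one awkward filename (hypothetical names).
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"

# Naive pipeline: xargs splits on whitespace, so "with space.txt"
# arrives as two separate arguments.
naive=$(find "$dir" -type f | xargs sh -c 'echo "$#"' sh)

# NUL-delimited: -print0 and -0 keep each filename intact.
safe=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

# find's own batching: "{} +" appends many filenames to one command,
# with no xargs involved.
batched=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "naive=$naive safe=$safe batched=$batched"
rm -rf "$dir"
```

With two files on disk, the naive pipeline should report three arguments while the other two report two, provided the temporary directory path itself contains no whitespace.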

See these links:

  • http://www.sunmanagers.org/pipermail/summaries/2005-March/006255.html
  • http://fahdshariff.blogspot.com/2009/05/find-exec-vs-xargs.html


Answer 2:

Also consider:

xargs -I'{}' wget -c '{}' < url-list.txt

but wget provides an even better means for the same:

wget -c -i url-list.txt

With respect to the xargs versus loop consideration, I prefer xargs when the meaning and implementation are relatively "simple" and "clear"; otherwise, I use loops.
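
One thing worth knowing about the -I form shown above: it runs the command once per input line, so it behaves like the loop in terms of process count. A small sketch, using echo in front of wget so nothing is downloaded and made-up example.com URLs:

```shell
# Three hypothetical URLs, one per line.
urls=$(printf '%s\n' http://example.com/a http://example.com/b http://example.com/c)

# Default mode: xargs packs all three URLs into a single command.
batched=$(printf '%s\n' "$urls" | xargs echo wget -c)

# -I mode: one command per input line, with {} replaced by that line.
per_line=$(printf '%s\n' "$urls" | xargs -I'{}' echo wget -c '{}')

printf 'batched:\n%s\nper line:\n%s\n' "$batched" "$per_line"
```

The batched form prints a single wget command line; the -I form prints three.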



Answer 3:

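xargs will also allow you to have a huge list: it splits the arguments across as many command invocations as needed to stay within the system's limit on command-line length. The "for" version has no such batching and expands the entire list in the shell at once.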
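
A quick way to watch xargs deal with a list that is too long for one command line is to have each invocation report how many arguments it was given. This is only a sketch: the number of batches depends on the xargs implementation and its buffer size, so no specific count is assumed.

```shell
# System limit on the combined size of arguments and environment for one exec.
echo "ARG_MAX: $(getconf ARG_MAX)"

# Feed 100000 numbers to xargs; each invocation prints its argument count.
batch_sizes=$(seq 1 100000 | xargs sh -c 'echo "$#"' sh)

# One output line per invocation, so the line count is the batch count.
batches=$(printf '%s\n' "$batch_sizes" | wc -l | tr -d ' ')
total=$(printf '%s\n' "$batch_sizes" | awk '{ s += $1 } END { print s }')

echo "xargs ran $batches commands covering $total arguments"
```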



Answer 4:

xargs is designed to process multiple inputs for each process it forks. A shell script with a for loop over its inputs must fork a new process for each input. Avoiding that per-process overhead can give an xargs solution a significant performance enhancement.
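
To make the difference concrete, here is a minimal sketch that counts how many times the command is run in each style. It uses a temporary file of made-up URLs, and echo stands in for wget so nothing is fetched.

```shell
# Hypothetical URL list; no network access is needed.
list=$(mktemp)
printf 'http://example.com/file%s\n' 1 2 3 4 5 > "$list"

# Loop: the body runs once per URL (a real wget would be forked each time).
loop_runs=0
for url in $(cat "$list"); do
    echo "wget -c $url" > /dev/null
    loop_runs=$((loop_runs + 1))
done

# xargs: each invocation prints one line, so counting lines counts invocations.
xargs_runs=$(xargs echo wget -c < "$list" | wc -l | tr -d ' ')

echo "loop: $loop_runs commands, xargs: $xargs_runs command(s)"
rm -f "$list"
```

With five short URLs the loop runs the command five times, while xargs fits them all into one invocation.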



Answer 5:

Instead of GNU Parallel, I prefer using xargs' built-in parallel processing. Add -P to indicate how many processes to run in parallel. As in...

 seq 1 10 | xargs -n 1 -P 3 echo

would run up to 3 processes at a time, which the scheduler can place on different cores. This is supported by modern GNU xargs. If you are using BSD or Solaris, you will have to verify support yourself.
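
To check that -P really overlaps the work on your system, a rough timing sketch like the following may help. It assumes an xargs that accepts -P; the one-second sleep is just a stand-in for a slow download.

```shell
# Four jobs that each sleep for one second, run at most four at a time.
start=$(date +%s)
out=$(seq 1 4 | xargs -n 1 -P 4 sh -c 'sleep 1; echo "$1"' sh)
end=$(date +%s)

# Completion order is not guaranteed under -P, so sort before displaying.
sorted=$(printf '%s\n' "$out" | sort -n | tr '\n' ' ')
elapsed=$((end - start))

echo "finished: $sorted in about ${elapsed}s"
```

Run sequentially, the four jobs would take about four seconds; with -P 4 the elapsed time should be close to one.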



Answer 6:

Depending on your internet connection you may want to use GNU Parallel http://www.gnu.org/software/parallel/ to run it in parallel.

cat url-list.txt | parallel wget -c


Answer 7:

One advantage I can think of is that, if you have lots of files, it could be slightly faster since you don't have as much overhead from starting new processes.

I'm not really a bash expert though, so there could be other reasons it's better (or worse).