This is probably in many FAQs: instead of using

`cat file | command`

(which is called "useless use of cat"), the correct way is supposed to be:

`command < file`

In the second, "correct" way, the OS does not have to spawn an extra process.
Despite knowing that, I continued to use the useless `cat` for two reasons:

1. Aesthetics: I like it when data moves uniformly from left to right. And it is easier to replace `cat` with something else (`gzcat`, `echo`, ...), to add a second file, or to insert a new filter (`pv`, `mbuffer`, `grep`, ...).

2. I "felt" that it might be faster in some cases. Faster because there are two processes: the first (`cat`) does the reading and the second does whatever it does. They can run in parallel, which sometimes means faster execution (see the sketch below).
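A rough way to test this, assuming a large file and a CPU-bound consumer such as `gzip` (both just examples):

```sh
# Compare wall-clock time of the two forms on a large file.
time gzip -9 < bigfile > /dev/null
time cat bigfile | gzip -9 > /dev/null
```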
Is my logic correct (for the second reason)?
I disagree with most instances of the excessively smug UUOC Award because, when teaching someone else, `cat` is a convenient place-holder for any command or crusty complicated pipeline of commands that produces output suitable for the problem or task being discussed.

This is especially true on sites like Stack Overflow, ServerFault, Unix & Linux, or any of the SE sites.
If someone specifically asks about optimisation, or if you feel like adding extra information about it, then great: talk about how using `cat` is inefficient. But don't berate people because they chose to aim for simplicity and ease of understanding in their examples rather than look-at-me-how-cool-am-I complexity.

In short: because `cat` isn't always cat.
Also because most people who enjoy going around awarding UUOCs do it because they're more concerned with showing off about how 'clever' they are than they are about helping or teaching people. In reality, they demonstrate that they're probably just another newbie who has found a tiny stick to beat their peers with.
Update
Here's another UUOC that I posted in an answer at https://unix.stackexchange.com/a/301194/7696:
UUOC pedants would say that that's a UUOC, because it's easily possible to make `$filter` default to the empty string and have the `if` statement do `filter='| grep -v "^$"'`; but IMO, by not embedding the pipe character in `$filter`, this "useless" `cat` serves the extremely useful purpose of self-documenting the fact that `$filter` on the `printf` line isn't just another argument to `sqlplus`, it's an optional, user-selectable output filter.

If there's any need to have multiple optional output filters, the option processing could just append `| whatever` to `$filter` as often as needed - one extra `cat` in the pipeline isn't going to hurt anything or cause any noticeable loss of performance.
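As a sketch, the pattern being described looks something like this (the variable names and the `sqlplus` invocation are illustrative, not the actual code from that answer):

```sh
filter='cat'                        # default: the "useless" pass-through
if [ "$strip_blank_lines" = yes ]; then
    filter='grep -v ^$'             # optional user-selectable output filter
fi
# $filter holds a command, not "| command", so the pipeline below
# self-documents that a filter stage always exists:
printf '%s\n' "$query" | sqlplus -S "$connect_string" | $filter
```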
I was not aware of the award until today, when some rookie tried to pin the UUOC on me for one of my answers. It was a `cat file.txt | grep foo | cut ... | cut ...`. I gave him a piece of my mind, and only after doing so visited the link he gave me referring to the origins of the award and the practice of doing so. Further searching led me to this question. Somewhat unfortunately, despite conscious consideration, none of the answers included my rationale.

I had not meant to be defensive in responding to him. After all, in my younger years I would have written the command as `grep foo file.txt | cut ... | cut ...`, because whenever you do frequent single `grep`s you learn the placement of the file argument, and it is ready knowledge that the first argument is the pattern and the later ones are file names.

It was a conscious choice to use `cat` when I answered the question, partly for a reason of "good taste" (in the words of Linus Torvalds) but chiefly for a compelling reason of function.

The latter reason is more important, so I will put it out first. When I offer a pipeline as a solution I expect it to be reusable. It is quite likely that the pipeline will be added at the end of, or spliced into, another pipeline. In that case, having a file argument to `grep` screws up reusability, and quite possibly does so silently, without an error message, if the file argument exists.
For example, `grep foo xyz | grep bar xyz | wc` will give you the number of lines in `xyz` that contain `bar`, while you are expecting the number of lines that contain both `foo` and `bar`.
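To make the failure concrete, here is a small reproduction (the file name and contents are assumed for illustration):

```sh
printf 'foo\nbar\nfoo bar\n' > xyz

cat xyz | grep foo | grep bar | wc -l   # prints 1: lines containing both
grep foo xyz | grep bar xyz | wc -l     # prints 2: the second grep silently
                                        # ignores its stdin and re-reads xyz
```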
Having to change arguments to a command in a pipeline before using it is prone to errors. Add to that the possibility of silent failures, and it becomes a particularly insidious practice.

The former reason is not unimportant either, since a lot of "good taste" is merely an intuitive, subconscious rationale for things like the silent failures above - things you cannot think of right at the moment when some person in need of education says "but isn't that cat useless?".
However, I will try to also make conscious the former "good taste" reason I mentioned. That reason has to do with the orthogonal design spirit of Unix.
`grep` does not `cut`, and `ls` does not `grep`. Therefore, at the very least, `grep foo file1 file2 file3` goes against the design spirit. The orthogonal way of doing it is `cat file1 file2 file3 | grep foo`. Now, `grep foo file1` is merely a special case of `grep foo file1 file2 file3`, and if you do not treat it the same, you are at least using up brain clock cycles trying to avoid the useless cat award.

That leads us to the argument that `grep foo file1 file2 file3` is concatenating, and `cat` concatenates, so it is proper to `cat file1 file2 file3`; but because `cat` is not concatenating in `cat file1 | grep foo`, we are violating the spirit of both `cat` and the almighty Unix. Well, if that were the case, then Unix would need a different command to read a single file and spit it to stdout (not paginate it or anything, just a pure spit to stdout). So you would have the situation where you say `cat file1 file2` or you say `dog file1`, and conscientiously remember to avoid `cat file1` so as not to get the award, while also avoiding `dog file1 file2`, since hopefully the design of `dog` would throw an error if multiple files were specified.

Hopefully at this point you sympathize with the Unix designers for not including a separate command to spit a file to stdout, and for naming `cat` for concatenation rather than giving it some other name.

<edit> removed incorrect comments on `<`; in fact, `<` is an efficient no-copy facility to spit a file to stdout, which you can position at the beginning of a pipeline, so the Unix designers did include something specifically for this </edit>
The next question is: why is it important to have commands that merely spit a file, or the concatenation of several files, to stdout without any further processing? One reason is to avoid having every single Unix command that operates on standard input know how to parse at least one command-line file argument and use it as input if it exists. The second reason is to save users from having to (a) remember where the filename arguments go, and (b) avoid the silent pipeline bug mentioned above.
That brings us to why `grep` does have the extra logic. The rationale is to allow user fluency for commands that are used frequently and on a stand-alone basis (rather than as part of a pipeline). It is a slight compromise of orthogonality for a significant gain in usability. Not all commands should be designed this way, and commands that are not frequently used should completely avoid the extra logic of file arguments (remember, extra logic leads to unnecessary fragility - the possibility of a bug). The exception is to allow file arguments in cases like `grep`. (By the way, note that `ls` has a completely different reason to not just accept but pretty much require file arguments.)

Finally, what could have been done better is for such exceptional commands as `grep` (but not necessarily `ls`) to generate an error if file arguments are specified while standard input is also available. This is reasonable because these commands include logic that violates the orthogonal spirit of Unix for user convenience. For further user convenience, i.e. to prevent the suffering caused by a silent failure, such commands should not hesitate to violate their own violation by having extra logic to alert the user whenever there is a possibility of a silent bug.
With the UUoC version, `cat` has to read the file into memory, then write it out to the pipe, and the command has to read the data from the pipe, so the kernel has to copy the whole file three times; in the redirected case, the kernel only has to copy the file once. It is quicker to do something once than to do it three times.

Using:
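```sh
cat file1 file2 file3 | command    # file names and 'command' are placeholders
```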
is a wholly different, and not necessarily useless, use of `cat`. It is still useless if the command is a standard filter that accepts zero or more filename arguments and processes them in turn. Consider the `tr` command: it is a pure filter that ignores or rejects filename arguments, so to feed multiple files to it, you have to use `cat` as shown. (Of course, there's a separate discussion to be had about whether the design of `tr` is very good; there's no real reason it could not have been designed as a standard filter.) This use of `cat` can also be valid if you want the command to treat all the input as a single file rather than as multiple separate files, even when the command would accept multiple separate files: for example, `wc` is such a command.
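For instance (file names assumed):

```sh
wc -l a.txt b.txt          # one count per file, plus a total line
cat a.txt b.txt | wc -l    # a single count for the combined input
```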
It is the `cat single-file` case that is unconditionally useless.

Nope!
First of all, it doesn't matter where in a command the redirection happens. So if you like your redirection to the left of your command, that's fine:
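```sh
< file command
```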
is the same as
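```sh
command < file
```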
Second, when you use a pipe there are n + 1 processes plus a subshell; it is most decidedly slower. In some cases n would have been zero (for example, when you're redirecting to a shell builtin), so by using `cat` you're adding a new process entirely unnecessarily.

As a generalization, whenever you find yourself using a pipe, it's worth taking 30 seconds to see if you can eliminate it. (But probably not worth taking much longer than 30 seconds.) Here are some examples where pipes and processes are frequently used unnecessarily:
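(These pairs are illustrative; each second form avoids the extra process or subshell, and bash is assumed for the here-string.)

```sh
cat file | grep pattern                      # extra cat process
grep pattern file                            # no pipe needed

echo "$var" | grep pattern                   # pipe plus extra process
grep pattern <<< "$var"                      # bash here-string

cat file | while read -r line; do :; done    # loop runs in a subshell
while read -r line; do :; done < file        # loop runs in the current shell
```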
Feel free to edit to add more examples.
An additional problem is that the pipe can silently mask a subshell. For this example, I'll replace `cat` with `echo`, but the same problem exists:
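```sh
echo foo | while read x; do : ; done
echo "$x"    # prints an empty line (or a stale value), not foo
```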
You might expect `x` to contain `foo`, but it doesn't. The `x` you set was in a subshell spawned to execute the `while` loop; the `x` in the shell that started the pipeline has an unrelated value, or is not set at all.

In bash 4, you can configure some shell options so that the last command of a pipeline executes in the same shell as the one that starts the pipeline, but then you might try this
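```sh
# (Sketch: bash 4's 'shopt -s lastpipe' is presumably the option in question.
# It only applies when job control is off and the pipeline is not backgrounded.)
shopt -s lastpipe
echo foo | while read x; do : ; done &
wait
echo "$x"    # empty again: the backgrounded pipeline ran in a subshell
```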
and `x` is once again local to the `while`'s subshell.

As someone who regularly points out this and a number of other shell programming antipatterns, I feel obliged to, belatedly, weigh in.
Shell script is very much a copy/paste language. Most people who write shell scripts are not in it to learn the language; it's just an obstacle they have to overcome in order to continue doing things in the language(s) they are actually somewhat familiar with.
In that context, I see it as disruptive and potentially even destructive to propagate various shell scripting anti-patterns. The code that someone finds on Stack Overflow should ideally be possible to copy/paste into their environment with minimal changes, and incomplete understanding.
Among the many shell scripting resources on the net, Stack Overflow is unusual in that users can help shape the quality of the site by editing the questions and answers on the site. However, code edits can be problematic because it's easy to make changes which were not intended by the code author. Hence, we tend to leave comments to suggest changes to the code.
The UUCA (Useless Use of Cat Award) and related antipattern comments are not just for the authors of the code we comment on; they are just as much a caveat emptor to help readers of the site become aware of problems in the code they find here.
We can't hope to achieve a situation where no answers on Stack Overflow recommend useless `cat`s (or unquoted variables, or `chmod 777`, or a large variety of other antipattern plagues), but we can at least help educate the user who is about to copy/paste this code into the innermost tight loop of his script which executes millions of times.

As far as technical reasons go, the traditional wisdom is that we should try to minimize the number of external processes; this continues to hold as good general guidance when writing shell scripts.