This is probably in many FAQs. Instead of using:
cat file | command
(which is called a useless use of cat), the correct way is supposed to be:
command < file
In the second, "correct" way, the OS does not have to spawn an extra process.
Despite knowing that, I continued to use the useless cat for two reasons:
1. It is more aesthetic: I like it when data moves uniformly from left to right only. It is also easier to replace cat with something else (gzcat, echo, ...), add a second file, or insert a new filter (pv, mbuffer, grep, ...).
2. I "felt" that it might be faster in some cases. Faster because there are two processes: the first (cat) does the reading and the second does whatever. They can run in parallel, which sometimes means faster execution.
Is my logic correct (for the second reason)?
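One way to check the second reason is to run both forms on the same input and time them. The following is only a minimal sketch, assuming a POSIX shell with mktemp and awk available; the generated file, its size, and the wc -l filter are arbitrary choices for illustration, and a single run on a small file says little about real workloads.

```shell
#!/bin/sh
# Hypothetical test input: 200000 short lines in a temporary file.
tmp=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i }' > "$tmp"

# Form 1: cat reads the file and feeds the filter through a pipe (two processes).
piped=$(cat "$tmp" | wc -l)

# Form 2: the shell opens the file as the filter's stdin (no extra process).
redirected=$(wc -l < "$tmp")

echo "piped=$piped redirected=$redirected"

# time is a keyword in some shells and an external utility in others;
# skip the timing step if neither is available.
if command -v time >/dev/null 2>&1; then
    time sh -c 'cat "$1" | wc -l > /dev/null' sh "$tmp"
    time sh -c 'wc -l < "$1" > /dev/null' sh "$tmp"
fi

rm -f "$tmp"
```

Both forms should report the same line count; any timing difference needs larger inputs and repeated runs before it means anything.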
I think that using a pipe (the traditional way) is a bit faster. On my box I used the strace command to see what's going on.
Without pipe:
And with pipe:
You can do some testing with the strace and time commands, using more and longer commands, for good benchmarking.
In defense of cat:
Yes, redirecting the file into the command instead of piping it through cat is more efficient, but many invocations don't have performance issues, so you don't care.
ergonomic reasons:
We are used to reading from left to right, so a command like
is trivial to understand.
has to jump over process1, and then read left to right. This can be healed by:
looks somehow, as if there were an arrow pointing to the left, where nothing is. More confusing and looking like fancy quoting is:
and generating scripts is often an iterative process,
where you see your progress stepwise, while
does not even work. Simple approaches are less error-prone, and ergonomic command concatenation is simple with cat.
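The stepwise way of working described above might look like the following sketch. It is not the original example: the temporary file and the filters sort, uniq and wc are hypothetical stand-ins for process1 and process2, and the last step assumes that the remark about a form that does not work refers to a bare redirection with no command.

```shell
#!/bin/sh
# Hypothetical input file with a duplicated line.
tmp=$(mktemp)
printf 'b\na\nb\nc\n' > "$tmp"

# Each step extends the previous command at its right-hand end.
cat "$tmp"
cat "$tmp" | sort
cat "$tmp" | sort | uniq
result=$(cat "$tmp" | sort | uniq | wc -l)
echo "distinct lines: $result"

# A bare redirection has no command to consume the file, so sh prints nothing.
bare=$(sh -c '< "$1"' sh "$tmp")
echo "output of bare redirection: '$bare'"

rm -f "$tmp"
```

Some interactive shells treat a bare redirection differently, so the last step is only claimed for sh here.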
Another point is that most people were exposed to > and < as comparison operators long before using a computer, and as programmers they encounter them far more often in that role.
Comparing two operands with < and > is contra-commutative, which means that swapping the operands flips the operator: a > b holds exactly when b < a.
I remember that the first time I used < for input redirection, I feared
could mean the same as
and somehow overwrite my a.sh script. Maybe this is an issue for many beginners.
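For beginners with the same worry, the direction of the two operators can be checked safely in a scratch directory. This is a minimal sketch, assuming a POSIX shell with mktemp; a.sh here is a throwaway script and input.txt is a hypothetical file name, neither taken from the original example.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real can be damaged.
dir=$(mktemp -d)
printf 'echo "script ran"\n' > "$dir/a.sh"
printf 'some input\n' > "$dir/input.txt"
before=$(cat "$dir/a.sh")

# < opens input.txt read-only as the script's stdin; neither file is written.
out=$(sh "$dir/a.sh" < "$dir/input.txt")
after=$(cat "$dir/a.sh")
[ "$before" = "$after" ] && echo "a.sh unchanged"

# By contrast, > truncates and overwrites the file named on its right.
echo "new content" > "$dir/input.txt"
overwritten=$(cat "$dir/input.txt")
echo "input.txt now contains: $overwritten"

rm -rf "$dir"
```

So < never modifies the file it names; only > (and >>) write to their operand.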
rare differences
The latter can be used in calculations directly.
Of course the < can be used here too, instead of a file parameter:
but who cares - 15k?
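The difference described here can be reproduced with wc -c, which prints the file name after the count when given a file operand but prints only the count when reading standard input. A small sketch, assuming a POSIX shell with mktemp; the temporary file and its content are made up for illustration and are not the file from the original example.

```shell
#!/bin/sh
# Hypothetical sample file of six bytes ("hello" plus a newline).
tmp=$(mktemp)
printf 'hello\n' > "$tmp"

with_name=$(wc -c "$tmp")        # count followed by the file name
from_pipe=$(cat "$tmp" | wc -c)  # count only
from_redirect=$(wc -c < "$tmp")  # count only

echo "file operand: $with_name"

# The bare counts can go straight into shell arithmetic.
echo "twice the size (pipe): $((from_pipe * 2))"
echo "twice the size (redirection): $((from_redirect * 2))"

rm -f "$tmp"
```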
If I occasionally ran into issues, I would surely change my habit of invoking cat.
When using very large or many, many files, avoiding cat is fine. For most questions, the use of cat is orthogonal, off topic, not an issue.
Starting this useless "useless use of cat" discussion on every second shell topic is only annoying and boring. Get a life and wait for your minute of fame when dealing with performance questions.