I have a text file:
1 Q0 1657 1 19.6117 Exp
1 Q0 1410 2 18.8302 Exp
2 Q0 3078 1 18.6695 Exp
2 Q0 2434 2 14.0508 Exp
2 Q0 3129 3 13.5495 Exp
I want to take the 3rd and 5th word of every line like this:
1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495
I'm using this code:
nol=$(cat "/path/of/my/text" | wc -l)
x=1
while [ $x -le "$nol" ]
do
line=($(sed -n "${x}p" /path/of/my/text))
echo "${line[2]} ${line[4]}" >> out.txt
x=$(( $x + 1 ))
done
It works, but it is very complicated and takes a long time to process long text files.
Is there a simpler way to do this?
You can use the cut command, for example cut -d' ' -f3,5 on your file. Here -d' ' means use a space as the delimiter, and -f3,5 means take and print the 3rd and 5th columns.
cut is much faster for large files than a pure shell solution. If your file is delimited with multiple whitespace characters, you can squeeze them first, for example with (GNU) sed replacing any run of tab or space characters with a single space.
As a variant, Perl can do the same in a one-liner.
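A minimal sketch of the commands described above, assuming the sample data sits in a file called data.txt (a made-up name). The answer mentions GNU sed; this sketch uses the POSIX [[:blank:]] class instead, which should behave the same for tabs and spaces.

```shell
# Sample data from the question, written to a hypothetical file name.
cat > data.txt <<'EOF'
1 Q0 1657 1 19.6117 Exp
1 Q0 1410 2 18.8302 Exp
2 Q0 3078 1 18.6695 Exp
2 Q0 2434 2 14.0508 Exp
2 Q0 3129 3 13.5495 Exp
EOF

# Fields separated by exactly one space: print columns 3 and 5.
cut -d' ' -f3,5 data.txt

# Fields separated by runs of tabs/spaces: squeeze each run to one space first.
sed 's/[[:blank:]][[:blank:]]*/ /g' data.txt | cut -d' ' -f3,5

# Perl variant: -a autosplits each line on whitespace into @F (0-based).
perl -lane 'print "$F[2] $F[4]"' data.txt
```

All three commands read the file once, so the run time grows linearly with the number of lines.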
If your file contains n lines, then your script has to read the file n times; so if you double the length of the file, you quadruple the amount of work your script does — and almost all of that work is simply thrown away, since all you want to do is loop over the lines in order.
Instead, the best way to loop over the lines of a file is to use a while loop, with the condition-command being the read builtin.
In your case you want to split each line into an array, and the read builtin actually has special support for populating an array variable (the -a option in bash), so you can let read fill the array directly.
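A minimal sketch of the loops described above, assuming bash (read -a is a bash feature) and a made-up file name data.txt. The array name fields is only illustrative.

```shell
# Sample data from the question, written to a hypothetical file name.
cat > data.txt <<'EOF'
1 Q0 1657 1 19.6117 Exp
1 Q0 1410 2 18.8302 Exp
2 Q0 3078 1 18.6695 Exp
2 Q0 2434 2 14.0508 Exp
2 Q0 3129 3 13.5495 Exp
EOF

# Plain loop: read consumes one line per iteration, so the file is read once.
while IFS= read -r line; do
    printf '%s\n' "$line"
done < data.txt

# Array variant (bash): read -a splits the line on whitespace into "fields".
# Redirecting the whole loop opens out.txt once instead of once per line.
while read -r -a fields; do
    echo "${fields[2]} ${fields[4]}"
done < data.txt > out.txt
```

Compared with calling sed once per line, this does a single pass over the input, so doubling the file roughly doubles the work.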
However, for what you're doing you can just use the cut utility (or awk, as Tom van der Woerdt suggests, or perl, or even sed).
If you are using structured data, this has the added benefit of not invoking an extra shell process to run tr and/or cut or something. ... (Of course, you will want to guard against bad inputs with conditionals and sane alternatives.)
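For the sake of completeness: read can also split each line into named variables, with _ as a throwaway name for the columns you don't need.
Instead of _ an arbitrary variable (such as junk) can be used as well. The point is just to extract the columns.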
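A minimal sketch of that idea, assuming the sample data is in a made-up file data.txt. The variable names id and score are only illustrative.

```shell
# Sample data from the question, written to a hypothetical file name.
cat > data.txt <<'EOF'
1 Q0 1657 1 19.6117 Exp
1 Q0 1410 2 18.8302 Exp
2 Q0 3078 1 18.6695 Exp
2 Q0 2434 2 14.0508 Exp
2 Q0 3129 3 13.5495 Exp
EOF

# _ soaks up the columns that are not needed; the final _ takes the rest of the line.
while read -r _ _ id _ score _; do
    echo "$id $score"
done < data.txt
```

No external process is started per line here; the splitting is done by the shell itself.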
One more simple variant, IIRC:
or, as mentioned in the comments :