Take nth column in a text file

I have a text file:

1 Q0 1657 1 19.6117 Exp
1 Q0 1410 2 18.8302 Exp
2 Q0 3078 1 18.6695 Exp
2 Q0 2434 2 14.0508 Exp
2 Q0 3129 3 13.5495 Exp

I want to take the 3rd and 5th word of every line like this:

1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495

I'm using this code:

 nol=$(cat "/path/of/my/text" | wc -l)
 x=1
 while [ "$x" -le "$nol" ]
 do
     line=($(sed -n "${x}p" /path/of/my/text))
     echo "${line[2]} ${line[4]}" >> out.txt
     x=$(( x + 1 ))
 done

It works, but it is very complicated and takes a long time to process long text files.

Is there a simpler way to do this?

Tags: linux bash

6 Answers

神经病院院长

#2 · 2019-01-08 12:30

You can use the cut command:

cut -d' ' -f3,5 < datafile.txt

prints

1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495

where

-d' ' means: use a space as the delimiter
-f3,5 means: select and print the 3rd and 5th columns

cut is much faster on large files than a pure shell solution. If the columns in your file are separated by runs of several whitespace characters, you can collapse them first, like:

sed 's/[\t ][\t ]*/ /g' < datafile.txt | cut -d' ' -f3,5

where (GNU) sed replaces each run of tab or space characters with a single space.

As a variant, here is a Perl solution too:
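To see why the squeeze step matters, here is a minimal sketch. The helper name `cols_3_5` and the sample line with uneven spacing are made up for illustration. It swaps the sed expression above for two `tr` calls, which should behave the same way on most systems:

```shell
#!/bin/sh
# cols_3_5 is a hypothetical helper name, not part of any tool.
# It turns tabs into spaces, squeezes runs of spaces into one,
# then lets cut pick fields 3 and 5.
cols_3_5() {
    tr '\t' ' ' < "$1" | tr -s ' ' | cut -d' ' -f3,5
}

# Made-up sample: one line of the question's data with uneven spacing.
sample=$(mktemp)
printf '1  Q0\t1657   1 19.6117 Exp\n' > "$sample"

cut -d' ' -f3,5 "$sample"   # without squeezing, cut counts the empty fields too
cols_3_5 "$sample"          # with squeezing, prints the 3rd and 5th words
```

Note that a line with leading whitespace would still start with one space after squeezing, so cut would see an empty first field there.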

perl -lanE 'say "$F[2] $F[4]"' < datafile.txt

Explosion°爆炸

#3 · 2019-01-08 12:38

If your file contains n lines, then your script has to read the file n times; so if you double the length of the file, you quadruple the amount of work your script does — and almost all of that work is simply thrown away, since all you want to do is loop over the lines in order.

Instead, the best way to loop over the lines of a file is to use a while loop, with the condition-command being the read builtin:

while IFS= read -r line ; do
    # $line is a single line of the file, as a single string
    : ... commands that use $line ...
done < input_file.txt

In your case, since you want to split each line into an array, you can use the read builtin's special support for populating an array variable (the -a option):

while read -r -a line ; do
    echo "${line[2]} ${line[4]}" >> out.txt
done < /path/of/my/text

or better yet:

while read -r -a line ; do
    echo "${line[2]} ${line[4]}"
done < /path/of/my/text > out.txt

However, for what you're doing you can just use the cut utility:

cut -d' ' -f3,5 < /path/of/my/text > out.txt

(or awk, as Tom van der Woerdt suggests, or perl, or even sed).
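The sed option is only mentioned in passing, so here is one way it might look. This is a sketch, not the only way to write it: the helper name `third_and_fifth` is hypothetical, and it assumes words are separated by one or more plain spaces:

```shell
#!/bin/sh
# third_and_fifth is a hypothetical helper name.
# The regex skips two words, captures the 3rd, skips the 4th,
# captures the 5th, and drops the rest of the line.
# -E enables extended regular expressions (GNU and BSD sed).
third_and_fifth() {
    sed -E 's/^[^ ]+ +[^ ]+ +([^ ]+) +[^ ]+ +([^ ]+).*$/\1 \2/' "$1"
}

# Two lines taken from the question's data.
sample=$(mktemp)
printf '%s\n' '1 Q0 1657 1 19.6117 Exp' '2 Q0 3129 3 13.5495 Exp' > "$sample"
third_and_fifth "$sample"
```

Lines with fewer than five words do not match the pattern and are passed through unchanged, which is one reason cut or awk is usually the more readable choice.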


别忘想泡老子

#4 · 2019-01-08 12:39

If you are using structured data, this has the added benefit of not invoking an extra shell process to run tr and/or cut or something. ...

(Of course, you will want to guard against bad inputs with conditionals and sane alternatives.)

...
while read -r line ;
do
    lineCols=( $line ) ;    # unquoted on purpose: split the line into words
    echo "${lineCols[0]}"   # first column
    echo "${lineCols[1]}"   # second column
done < "$myFQFileToRead" ;
...
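As for guarding against bad input, a minimal sketch could look like the following. The helper name `print_cols_guarded` and the error message are made up, and it uses positional parameters instead of the array above so that it also runs under plain sh:

```shell
#!/bin/sh
# print_cols_guarded is a hypothetical helper name.
# It prints the 1st and 2nd words of each line, and reports lines with
# fewer than two words on stderr instead of printing partial output.
print_cols_guarded() {
    file=$1
    lineno=0
    while IFS= read -r line; do
        lineno=$((lineno + 1))
        set -f              # disable globbing so a word like * stays literal
        set -- $line        # split the line on whitespace into $1, $2, ...
        set +f
        if [ "$#" -lt 2 ]; then
            echo "line $lineno: expected at least 2 columns, got $#" >&2
            continue
        fi
        echo "$1 $2"
    done < "$file"
}

# Made-up sample with one short line and one empty line.
sample=$(mktemp)
printf 'a b c\nonly\n\nx y\n' > "$sample"
print_cols_guarded "$sample"
```

Skipping the bad line and carrying on is just one policy; depending on the data, stopping at the first malformed line may be the saner alternative.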


迷人小祖宗

#5 · 2019-01-08 12:47

For the sake of completeness:

while read _ _ one _ two _; do
    echo "$one $two"
done < file.txt

Instead of _, any variable name (such as junk) can be used. Its only job is to soak up the columns you do not want.

Demo:

$ while read _ _ one _ two _; do echo "$one $two"; done < /tmp/file.txt
1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495


再贱就再见

#6 · 2019-01-08 12:53

One more simple variant -

$ while read line ;
  do
      set -- $line       # assigns words in line to positional parameters
      echo "$3 $5"
  done < file


一夜七次

#7 · 2019-01-08 12:54

IIRC:

cat filename.txt | awk '{ print $3, $5 }'

or, as mentioned in the comments:

awk '{ print $3, $5 }' filename.txt
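Since the question asks about the nth column in general, the column numbers can also be passed in as awk variables. A small sketch, where the helper name `pick_cols` and its argument order are assumptions made for illustration:

```shell
#!/bin/sh
# pick_cols is a hypothetical helper: pick_cols FILE A B
# prints columns A and B of FILE, separated by a single space.
pick_cols() {
    awk -v a="$2" -v b="$3" '{ print $a, $b }' "$1"
}

# Two lines taken from the question's data.
sample=$(mktemp)
printf '%s\n' '1 Q0 1657 1 19.6117 Exp' '2 Q0 3129 3 13.5495 Exp' > "$sample"
pick_cols "$sample" 3 5
```

awk splits on any run of spaces or tabs by default, so this needs no squeezing step, and the comma in print inserts the output field separator (a space unless OFS is changed).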
