Print lines that contain a value in a specific column

Posted 2019-09-10 17:11

I want to extract only those lines whose value in Column 2 is shared by at least 2 unique values in Column 1.

Using the following input (3 tab-separated columns):

waterline-n    below-sheath-v    14.8097 
dock-n    below-sheath-v     14.5095 
waterline-n    below-steel-n    11.0330 
picnic-n    below-steel-n    12.2277 
wavefront-n    at-part-of-variance-n    18.4888 
wavefront-n    between-part-of-variance-n    17.0656
audience-b    between-part-of-variance-n    17.6346 
game-n    between-part-of-variance-n    14.9652 
whereabouts-n    become-rediscovery-n    11.3556 
whereabouts-n    get-tee-n    10.9091

The desired output is:

waterline-n    below-sheath-v    14.8097 
dock-n    below-sheath-v     14.5095 
waterline-n    below-steel-n    11.0330
picnic-n    below-steel-n    12.2277 
wavefront-n    between-part-of-variance-n    17.0656 
audience-b    between-part-of-variance-n    17.6346 
game-n    between-part-of-variance-n    14.9652

Is it possible to do this using grep?

Tags: terminal grep
2 answers
迷人小祖宗
#2 · 2019-09-10 17:30

Read the file twice with awk and use an array to count the column 2 values.
I think this would be hard to do with grep alone.

awk 'FNR==NR {a[$2]++;next} a[$2]>1' file file
waterline-n    below-sheath-v    14.8097
dock-n    below-sheath-v     14.5095
waterline-n    below-steel-n    11.0330
picnic-n    below-steel-n    12.2277
wavefront-n    between-part-of-variance-n    17.0656
audience-b    between-part-of-variance-n    17.6346
game-n    between-part-of-variance-n    14.9652

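In the first pass (FNR==NR is true only while awk reads the first copy of the file), each column 2 value is used as an array key and its count is incremented for every line that carries it.
In the second pass, awk looks up the column 2 value of each line in the array and prints the line if the count is greater than one.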
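The command above counts lines per column 2 value, which matches the sample because no (column 1, column 2) pair repeats there. If the same pair could occur more than once, a variant that counts distinct column 1 values may be closer to the question's wording. The following is only a sketch under that assumption; the temporary file, the duplicated get-tee-n row, and the array names seen and n are made up for illustration.

```shell
# Hypothetical sample: a few rows from the question, plus one row repeated
# on purpose to show the difference from a plain line count.
tmp=$(mktemp)
printf '%s\t%s\t%s\n' \
  waterline-n below-sheath-v 14.8097 \
  dock-n below-sheath-v 14.5095 \
  wavefront-n at-part-of-variance-n 18.4888 \
  whereabouts-n get-tee-n 10.9091 \
  whereabouts-n get-tee-n 10.9091 > "$tmp"

# Pass 1: count each distinct (column 1, column 2) pair only once per column 2 value.
# Pass 2: print lines whose column 2 value was seen with 2 or more distinct column 1 values.
result=$(awk -F'\t' '
  FNR == NR { if (!seen[$1 FS $2]++) n[$2]++; next }
  n[$2] > 1
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Here the repeated whereabouts-n / get-tee-n row is not printed, whereas a[$2]>1 would print it twice because it counts lines rather than distinct column 1 values.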

你好瞎i
#3 · 2019-09-10 17:38

You can get the desired output with grep and uniq. Note that grep matches each pattern anywhere on the line, so a value from the second column must not also occur in any other column. Also note that identical fields need to be on consecutive lines, because uniq -d only reports adjacent duplicates, unless you sort the output of cut:

grep -f <(cut -f2 infile | uniq -d) infile

Output:

waterline-n below-sheath-v  14.8097
dock-n  below-sheath-v  14.5095
waterline-n below-steel-n   11.0330
picnic-n    below-steel-n   12.2277
wavefront-n between-part-of-variance-n  17.0656
audience-b  between-part-of-variance-n  17.6346
game-n  between-part-of-variance-n  14.9652
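As noted in this answer, identical column 2 values must be adjacent unless the output of cut is sorted. A possible variant for unsorted input is sketched below; it also adds -F so that the values are treated as fixed strings rather than regular expressions. The temporary files and the three sample rows are assumptions for illustration, and the pattern list is written to a file instead of using <( ) so the sketch does not depend on bash process substitution.

```shell
# Hypothetical sample: the two below-steel-n rows are deliberately not adjacent.
tmp=$(mktemp)
pats=$(mktemp)
printf '%s\t%s\t%s\n' \
  waterline-n below-steel-n 11.0330 \
  whereabouts-n get-tee-n 10.9091 \
  picnic-n below-steel-n 12.2277 > "$tmp"

# Sort the column 2 values first so that uniq -d sees duplicates side by side.
cut -f2 "$tmp" | sort | uniq -d > "$pats"

# -F: treat each pattern as a literal string; -f: read the patterns from a file.
result=$(grep -F -f "$pats" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp" "$pats"
```

The lines are still printed in their original order, since only the pattern list is sorted. The patterns can still match text outside column 2, so the caveat about other columns applies here as well.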