I want to extract only those values in Column 2 that are shared by at least 2 unique values in Column 1.
Using the same input (in this case, 3 tab-separated columns):
waterline-n below-sheath-v 14.8097
dock-n below-sheath-v 14.5095
waterline-n below-steel-n 11.0330
picnic-n below-steel-n 12.2277
wavefront-n at-part-of-variance-n 18.4888
wavefront-n between-part-of-variance-n 17.0656
audience-b between-part-of-variance-n 17.6346
game-n between-part-of-variance-n 14.9652
whereabouts-n become-rediscovery-n 11.3556
whereabouts-n get-tee-n 10.9091
For the following desired output:
waterline-n below-sheath-v 14.8097
dock-n below-sheath-v 14.5095
waterline-n below-steel-n 11.0330
picnic-n below-steel-n 12.2277
wavefront-n between-part-of-variance-n 17.0656
audience-b between-part-of-variance-n 17.6346
game-n between-part-of-variance-n 14.9652
Is it possible to do this using grep?
You can do this by reading the file twice with awk and using an array. I think this would be hard to do with grep only. In the first pass (FNR==NR), it adds every value of column 2 to an array and increments its count for every occurrence. In the second pass, it looks each value up in the array and, if the count is more than one, prints the line.
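The original command is not preserved here, so the following is only a minimal sketch of the two-pass approach described above. The file name input.tsv and the array name count are assumptions made for illustration.

```shell
# Sample input from the question, written to a hypothetical file (input.tsv).
# printf reuses the format string for every group of three arguments.
printf '%s\t%s\t%s\n' \
  waterline-n below-sheath-v 14.8097 \
  dock-n below-sheath-v 14.5095 \
  waterline-n below-steel-n 11.0330 \
  picnic-n below-steel-n 12.2277 \
  wavefront-n at-part-of-variance-n 18.4888 \
  wavefront-n between-part-of-variance-n 17.0656 \
  audience-b between-part-of-variance-n 17.6346 \
  game-n between-part-of-variance-n 14.9652 \
  whereabouts-n become-rediscovery-n 11.3556 \
  whereabouts-n get-tee-n 10.9091 \
  > input.tsv

# Pass 1 (FNR==NR is true only for the first file argument):
#   count how often each column-2 value occurs, then skip to the next line.
# Pass 2: print every line whose column-2 value was counted more than once.
result=$(awk 'FNR==NR { count[$2]++; next } count[$2] > 1' input.tsv input.tsv)
printf '%s\n' "$result"
```

Note that this counts lines per column-2 value, not distinct column-1 values. For the sample input the two are the same, but if an identical column-1/column-2 pair could appear on several lines, the count would need to be kept per pair instead.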
You can get the desired output with grep and uniq. Note that the values in the second column must not also occur in the other columns, because grep matches a pattern anywhere on the line. Also note that identical fields need to be on consecutive lines unless you sort the output of cut.
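The command from this answer is not preserved here, so this is one possible form of the grep and uniq approach, not necessarily the original. The file names input.tsv and dups.txt are assumptions made for illustration.

```shell
# Sample input from the question, written to a hypothetical file (input.tsv).
printf '%s\t%s\t%s\n' \
  waterline-n below-sheath-v 14.8097 \
  dock-n below-sheath-v 14.5095 \
  waterline-n below-steel-n 11.0330 \
  picnic-n below-steel-n 12.2277 \
  wavefront-n at-part-of-variance-n 18.4888 \
  wavefront-n between-part-of-variance-n 17.0656 \
  audience-b between-part-of-variance-n 17.6346 \
  game-n between-part-of-variance-n 14.9652 \
  whereabouts-n become-rediscovery-n 11.3556 \
  whereabouts-n get-tee-n 10.9091 \
  > input.tsv

# Collect the column-2 values that occur more than once.
# sort puts identical values on adjacent lines, which uniq -d requires.
cut -f2 input.tsv | sort | uniq -d > dups.txt

# -F treats each pattern as a fixed string; -f reads the patterns from dups.txt.
# grep matches anywhere on the line, hence the caveat about the other columns.
result=$(grep -F -f dups.txt input.tsv)
printf '%s\n' "$result"
```

For the sample input, dups.txt holds below-sheath-v, below-steel-n and between-part-of-variance-n, and grep prints the seven lines of the desired output in their original order.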