I'd like to read filein.txt (tab delimited) and output a fileout.txt with only rows that match the value of a given column, and eliminate the column being queried. i.e.,
filein.txt
#name\thouse\taddress
roger\tvictorian\t223 dolan st.
maggie\tfrench\t12 alameda ave.
kingston\tvictorian\t224 house st.
robert\tamerican\t22 dolan st.
Let us say I'd like to select only the rows where the houses are of victorian
style, then my fileout.txt should look like:
fileout.txt
#name\taddress
roger\t223 dolan st.
kingston\t224 house st.
awk -F"\t" '$2 == "victorian" { print $1"\t"$3 }' file.in
You can do it with the following awk
script:
#!/bin/bash
style="victorian"
awk -v s_style=$style 'BEGIN{FS=OFS="\t"}
$2==s_style {$2=""; sub("\t\t","\t"); print}'
Explanation:
style="victorian"
: assign the house style that you want to select outside of the awk
script so it's easier to maintain
awk
: invoke awk
-v s_style=$style
: the -v
option passes an external variable into awk. Need to specify this for each variable you pass in. In this case it assigns the external variable $style
to the awk variable s_style
.
BEGIN{FS=OFS="\t"}
: tells awk that the field separators in the output should be tabs, not spaces by default.
{$2==s_style {$2=""; sub("\t\t","\t"); print}}'
: If the 2nd field is the house type specified in s_style
(in this case, victorian
), then remove it and print the line.
Alternatively, you could do:
#!/bin/bash
style="victorian"
awk -v s_style=$style 'BEGIN{FS=OFS="\t"}
$2==s_style {print $1, $3}'
but this assumes that your input files will not have additional fields separated by tabs in the future.
Using the OFS (Output Field Separator) variable, you can avoid hard coding between rows:
awk -F"\t" -v OFS="\t" '$2 == "victorian" { print $1,$3 }' file.in