I'd like to read filein.txt (tab delimited) and output a fileout.txt with only rows that match the value of a given column, and eliminate the column being queried. i.e.,
filein.txt
#name\thouse\taddress
roger\tvictorian\t223 dolan st.
maggie\tfrench\t12 alameda ave.
kingston\tvictorian\t224 house st.
robert\tamerican\t22 dolan st.
Let us say I'd like to select only the rows where the houses are of victorian
style, then my fileout.txt should look like:
fileout.txt
#name\taddress
roger\t223 dolan st.
kingston\t224 house st.
You can do it with the following
awk
script:Explanation:
style="victorian"
: assign the house style that you want to select outside of theawk
script so it's easier to maintainawk
: invoke awk-v s_style=$style
: the-v
option passes an external variable into awk. Need to specify this for each variable you pass in. In this case it assigns the external variable$style
to the awk variables_style
.BEGIN{FS=OFS="\t"}
: tells awk that the field separators in the output should be tabs, not spaces by default.{$2==s_style {$2=""; sub("\t\t","\t"); print}}'
: If the 2nd field is the house type specified ins_style
(in this case,victorian
), then remove it and print the line.Alternatively, you could do:
but this assumes that your input files will not have additional fields separated by tabs in the future.
Using the OFS (Output Field Separator) variable, you can avoid hard coding between rows: