I would like to write an awk script that modifies text as follows:
- Separate all columns with tabs
- Delete all lines that start with "##text"
- Keep the header line, which starts with "#header"
I have this code, but it does not work:
#!/bin/bash
for i in *.vcf;
do
awk 'BEGIN {print "CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILT\tINFO\tFORMAT"}' |
awk '{$1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 "\t" $8 "\t" $9}' $i |
awk '!/#/' > ${i%.vcf}.tsv;
done
INPUT:
> ##fileformat=VCFv4.1
> ##FORMAT=<ID=GQX,Number=1,Type=Integer,Description="Minimum of {Genotype quality assuming variant position,Genotype quality assuming non-variant position}">
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1
> chr1 10385471 rs17401966 A G 100.00 PASS DP=67;TI=NM_015074;GI=KIF1B;FC=Silent GT:GQ:AD:VF:NL:SB:GQX 0/1:100:29,38:0.5672:20:-100.0000:100
> chr1 17380497 rs2746462 G T 100.00 PASS DP=107;TI=NM_003000;GI=SDHB;FC=Synonymous_A6A;EXON GT:GQ:AD:VF:NL:SB:GQX 1/1:100:0,107:1.0000:20:-100.0000:100
> chr1 222045446 rs6691170 G T 100.00 PASS DP=99 GT:GQ:AD:VF:NL:SB:GQX 0/1:100:49,50:0.5051:20:-100.0000:100
OUTPUT: What I want
> CHROM POS ID REF ALT QUAL FILTER INFO etc...
> chr1 10385471 rs17401966 A G 100.00 PASS DP=67;TI=NM_015074;GI=KIF1B;
You want to put your whole program in a single awk call:
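A minimal sketch of such a call, reconstructed from the description that follows. The file name `sample.vcf` and its shortened contents are made up for illustration, and the input is assumed to be whitespace-separated as in the question:

```shell
# Create a small, hypothetical sample input (shortened from the question).
cat > sample.vcf <<'EOF'
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 10385471 rs17401966 A G 100.00 PASS DP=67
chr1 17380497 rs2746462 G T 100.00 PASS DP=107
EOF

awk -v OFS='\t' '
    /^##/ { next }          # skip meta-information lines
    { sub(/^#/, "") }       # strip the leading hash from the header line
    { $1 = $1; print }      # rebuild the record with OFS, then print it
' sample.vcf > sample.tsv
```

Inside the loop from the question, the same call would presumably read from "$i" and write to "${i%.vcf}.tsv" instead of the sample file names used here.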
This program will skip any record that begins with ##, will remove the leading hash for lines that have it, and then print each line using tab as the field separator.
awk programs are a series of `condition {action}` pairs. For each record in the input, if the condition is true, the action block is performed; otherwise it is skipped. If the condition is omitted, the action block is performed for every record.
One tricky bit in this example is `$1=$1`: when a field is assigned to, awk rebuilds the record, joining the fields with the output field separator (the `OFS` variable).
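To see the rebuild in isolation, here is a small sketch (the input string and the variable names `unchanged` and `rebuilt` are made up). Without the assignment the record is printed as it was read; with it, the fields are rejoined using `OFS`:

```shell
# Without a field assignment, $0 is printed exactly as it was read.
unchanged=$(printf 'a b  c\n' | awk -v OFS='\t' '{ print }')

# Assigning to any field (even to itself) makes awk rebuild $0 with OFS.
rebuilt=$(printf 'a b  c\n' | awk -v OFS='\t' '{ $1 = $1; print }')

printf '%s\n' "$unchanged"
printf '%s\n' "$rebuilt"
```

The first line printed keeps the original spacing, while the second has a single tab between each pair of fields.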