AWK reference columns from file based on header of

Posted 2019-09-14 23:17

I have the following code in cmd.awk:

 BEGIN {FS=","}
 {
    if(FNR==1) print $0",Header";
    else if (FNR>1)
            {
                    if($79==0 && $80==0 && $81==0) print $0",0";
                    else if ($80==0 && $81!=0) print $0","($79-$81)/$81;
                    else if ($81==0 && $80!=0) print $0","($79-$80)/$80;
                    else if ($81==0 && $80==0 && $79!=0) print $0",10";
                    else if ($81!=0 && $80!=0) print $0","(($79-$80)/$80)+(($80-$81)/$81);
            }
}

When I execute the following command:

awk -f cmd.awk input.txt

it performs the required operation (as specified in the AWK script) and produces the required result.

But in this script, all the columns of the input text file are accessed by column index, i.e. $79, $80, $81, etc.

My requirement is to use this script as a function that takes $79, $80, $81 and Header (as given in the script) as parameters, performs the operations, stores the result in a newly appended column named Header, and writes the new contents to an output file. However, I am only allowed to specify the parameters as column headers, not as column indexes, i.e. my function call has to look something like this:

cmd(column_header1, column_header2, column_header3,new_header)

and the function definition of cmd() has to perform the operation mentioned in the awk script above.

Is there any way to do this? Please bear in mind that I'm very new to awk. Thanks in advance.

My input file contains 150 columns and over 50M rows. A sample of the file is given below:

RN,DATE,ID,PRE_M1,PRE_M2,GALV,GALG,PRE_M5.........................TOTAL
0624873840,2016/04/28,201610,1618,0,0,0,Active,.................12234
0747269250,2016/02/02,201610,227,93,0,0,Daat,....................99988

The input file contains both numeric and character columns. The columns accessed in the above AWK script are all numeric.

A sample of the required output file is as below:

RN,DATE,ID,PRE_M1,PRE_M2,GALV,GALG,PRE_M5.........................TOTAL,Header
0624873840,2016/04/28,201610,1618,0,0,0,Active,.................12234,10
0747269250,2016/02/02,201610,227,93,0,0,Daat,....................99988,0

Please note that a new column is being appended to the file with name "Header" and this column contains the result of the AWK script for each individual row of the input file.

Tags: shell awk
2 answers
Ridiculous、
#2 · 2019-09-14 23:50

I think you can simplify this a lot. No input file was provided, so I'm flying blind...

Assuming the columns of interest are consecutive and the fields are all numeric, just provide the starting column index:

$ awk -F, -v s=79 'BEGIN {OFS=FS}
                   NR==1 {$(NF+1)="Header"}
                   NR >1 {v1=$s; v2=$(s+1); v3=$(s+2)
                          # r is the value of the appended column
                          if(!v2 && !v3) r = v1?10:0
                          else if(!v2)   r = (v1-v3)/v3
                          else if(!v3)   r = (v1-v2)/v2
                          else           r = (v1-v2)/v2 + (v2-v3)/v3
                          $(NF+1) = r}1' file

To pass the column names as parameters instead, it can be written as:

$ cols="c1,c2,c3"; header="Header"
$ awk -F, -v cols="$cols" -v hdr="$header" '
           BEGIN {OFS=FS}
           NR==1 {n=split(cols,cn)
                  # c[j] holds the field index of the j-th requested name
                  for(i=1;i<=NF;i++)
                    for(j=1;j<=n;j++)
                      if($i==cn[j]) c[j]=i
                  $(NF+1)=hdr}
           NR >1 {v1=$(c[1]); v2=$(c[2]); v3=$(c[3])
                  if(!v2 && !v3) r = v1?10:0
                  else if(!v2)   r = (v1-v3)/v3
                  else if(!v3)   r = (v1-v2)/v2
                  else           r = (v1-v2)/v2 + (v2-v3)/v3
                  $(NF+1) = r}1' file

id,c1,c2,c3,Header
1,0,0,0,0
2,0,0,1,-1
3,0,1,0,-1
4,0,1,1,-1
5,1,0,0,10
6,1,0,1,0
7,1,1,0,0
8,1,1,1,0

for the following input file:

id,c1,c2,c3
1,0,0,0
2,0,0,1
3,0,1,0
4,0,1,1
5,1,0,0
6,1,0,1
7,1,1,0
8,1,1,1

Explanation

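n=split(cols,cn) splits the string cols into the array cn, using FS as the separator because no third argument is given. It returns the number of elements, which is assigned to n.

The trailing 1 is a pattern that is always true; with no action attached, awk applies its default action, so it is shorthand for {print}.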
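
Since the question asks for a call of the form cmd(column_header1, column_header2, column_header3, new_header), the header lookup can also be wrapped in a shell function. What follows is only a sketch under assumptions: the function name cmd, the argument order (file first) and the error message are illustrative choices, not something given in the question, and the arithmetic follows the branches of the original cmd.awk.

```shell
# Hypothetical wrapper: cmd FILE HEADER1 HEADER2 HEADER3 NEW_HEADER
cmd() {
    awk -F, -v h1="$2" -v h2="$3" -v h3="$4" -v hdr="$5" '
        BEGIN { OFS = FS }
        NR == 1 {
            # Map each requested header name to its column index.
            for (i = 1; i <= NF; i++) {
                if ($i == h1) c1 = i
                if ($i == h2) c2 = i
                if ($i == h3) c3 = i
            }
            if (!c1 || !c2 || !c3) {
                print "cmd: header not found" > "/dev/stderr"
                exit 1
            }
            print $0, hdr
            next
        }
        {
            v1 = $c1; v2 = $c2; v3 = $c3
            if (!v2 && !v3) r = (v1 ? 10 : 0)
            else if (!v2)   r = (v1 - v3) / v3
            else if (!v3)   r = (v1 - v2) / v2
            else            r = (v1 - v2) / v2 + (v2 - v3) / v3
            print $0, r
        }
    ' "$1"
}
```

It could then be called as, for example, cmd input.txt PRE_M1 PRE_M2 GALV Header > output.txt, where the three header names are just placeholders taken from the sample. Because the header row is scanned only once, the lookup adds no per-row cost on a 50M-row file.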

贪生不怕死
#3 · 2019-09-14 23:50
# --- for your sample test ----------
column_1=1;column_2=2;column_3=3;new_header="Header"

# --- the generic code ------

awk -v Col1="${column_1}" -v Col2="${column_2}" -v Col3="${column_3}" -v NewH="${new_header}" -F ',' '
  BEGIN { OFS = FS }
  FNR == 1 {
     $(NF + 1) = NewH
     print $0
     next
     }
  {
   if      ( ! $Col1 && ! $Col2 && ! $Col3) NewV = 0
   else if ( ! $Col2 && $Col3 )             NewV = ($Col1-$Col3)/$Col3
   else if ( ! $Col3 && $Col2 )             NewV = ($Col1-$Col2)/$Col2
   else if ( ! $Col3 && ! $Col2 && $Col1 )  NewV = 10
   else if ( $Col3 && $Col2)                NewV = (($Col1-$Col2)/$Col2)+(($Col2-$Col3)/$Col3)
   # append the computed value as the new last column
   print $0, NewV
   }' YourFile
  • Pass values to awk variables with the -v option (integers can be passed as is; strings should be quoted, see NewH).
  • In a condition, var == 0 is the same as ! var for numeric fields (you can keep your version if you find it more readable).
  • There is no need for a ; at the end of a line (it is an instruction separator, useful in one-liners).

You can also read the arguments directly inside awk, but that is a bit heavier than calling awk with -v.
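
For completeness, here is a rough sketch of that heavier variant, reading the arguments inside awk through ARGV instead of -v. It is an illustration under assumptions rather than the method used above: the function name cmd_argv, the argument order and the use of header names (instead of column numbers) are all made up for this example.

```shell
# Hypothetical: cmd_argv FILE HEADER1 HEADER2 HEADER3 NEW_HEADER
cmd_argv() {
    awk '
        BEGIN {
            FS = OFS = ","
            if (ARGC != 6) {
                print "usage: cmd_argv FILE H1 H2 H3 NEW_HEADER" > "/dev/stderr"
                exit 2
            }
            want[1] = ARGV[2]; want[2] = ARGV[3]; want[3] = ARGV[4]
            NewH = ARGV[5]
            ARGC = 2    # keep only ARGV[1] (the data file) as input
        }
        FNR == 1 {
            # Resolve each requested header name to a field index.
            for (i = 1; i <= NF; i++)
                for (j = 1; j <= 3; j++)
                    if ($i == want[j]) col[j] = i
            for (j = 1; j <= 3; j++)
                if (!(j in col)) {
                    print "header not found: " want[j] > "/dev/stderr"
                    exit 1
                }
            print $0, NewH
            next
        }
        {
            a = $(col[1]); b = $(col[2]); c = $(col[3])
            if      (!b && !c) NewV = (a ? 10 : 0)
            else if (!b)       NewV = (a - c) / c
            else if (!c)       NewV = (a - b) / b
            else               NewV = (a - b) / b + (b - c) / c
            print $0, NewV
        }
    ' "$@"
}
```

Setting ARGC to 2 in BEGIN is what stops awk from trying to open the header names as input files; with -v none of that bookkeeping is needed, which is why -v is the lighter option.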
