Multidimensional arrays in awk

Posted 2020-03-03 08:32

Question:

I tried creating a pseudo-multidimensional array in awk.

# Calculate cumulative context score

BEGIN { FS=OFS="\t" }

{
        a[$2+FS+$7,$3]+=$6
}

END { for (i,j) in a
        { print i,j,a[i,j] }

}

Output:

awk: ccstscan.awk:9: END { for (i,j) in a
awk: ccstscan.awk:9:             ^ syntax error

This is what the GNU awk manual says:

To test whether a particular index sequence exists in a multidimensional array, use the same operator (in) that is used for single dimensional arrays. Write the whole sequence of indices in parentheses, separated by commas, as the left operand:

 (subscript1, subscript2, ...) in array

I tried modifying the script to create a true multidimensional array:

BEGIN { FS=OFS="\t" }

{
    a[$2+FS+$7][$3]+=$6
}

END { for i in a
    {
     for j in a[i]
        { print i,j,a[i][j]
        }

    }
}

I ran it with gawk. It also gave errors:

gawk: ccstscan.awk:6:   a[$2+FS+$7][$3]+=$6
gawk: ccstscan.awk:6:              ^ syntax error
gawk: ccstscan.awk:9: END { for i in a
gawk: ccstscan.awk:9:           ^ syntax error
gawk: ccstscan.awk:11:   for j in a[i]
gawk: ccstscan.awk:11:       ^ syntax error
gawk: ccstscan.awk:11:   for j in a[i]
gawk: ccstscan.awk:11:             ^ syntax error
gawk: ccstscan.awk:12:          { print i,j,a[i][j]
gawk: ccstscan.awk:12:                          ^ syntax error

What is the correct syntax to create and scan multidimensional associative arrays?

Answer 1:

If you are using the simulated multi-dimensional arrays, your loop would need to be like this:

  END {
    # ij holds the combined key: the subscripts joined by SUBSEP
    for (ij in a) {
      split(ij,indices,SUBSEP);   # recover the individual subscripts
      i=indices[1];
      j=indices[2];
      print i,j,a[ij]
    }
  }

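The (i,j) in a syntax only works for testing whether a particular index combination is in the array. It does not work in a for loop, even though the for loop uses a similar-looking syntax: for (var in array) accepts only a single variable, and with simulated multidimensional arrays that variable receives the combined key, with the subscripts joined by SUBSEP.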
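As a minimal sketch of the two uses side by side, the following shell wrapper runs a small awk program on made-up input. The function name sum_pairs, the array name total, and the sample rows are assumptions for illustration, not part of the original script. The output is piped through sort because awk does not guarantee the order in which for (key in array) visits the keys.

```shell
# Hypothetical helper: sums column 3 per (column 1, column 2) pair.
sum_pairs() {
  awk '
    BEGIN { FS = OFS = "\t" }
    { total[$1, $2] += $3 }        # stored under the key $1 SUBSEP $2
    END {
      # Membership test: the (i, j) in array form is valid here.
      if (("x", "p") in total)
        print "has", "x", "p"
      # Iteration: a single loop variable, then split the combined key.
      for (key in total) {
        split(key, idx, SUBSEP)
        print idx[1], idx[2], total[key]
      }
    }
  ' | sort
}

# Made-up sample input: three tab-separated rows.
printf 'x\tp\t1\nx\tp\t2\ny\tq\t5\n' | sum_pairs
```

This form should work in any POSIX awk, not only gawk, since it relies only on SUBSEP and split.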

For the true multi-dimensional arrays (arrays of arrays), you can write it like this:

BEGIN { FS=OFS="\t" }

{ a[$2 FS $7][$3] += $6 }    # juxtaposition concatenates; + would add numerically

END { 
  for (i in a) {
    for (j in a[i]) { 
      print i,j,a[i][j]
    }
  }
}

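However, arrays of arrays were only added in gawk 4.0, so your version of gawk may not support them; the syntax error reported at a[...][...] suggests that it does not. Note also that the parentheses in for (i in a) are required: for i in a is a syntax error in any awk.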
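If you are not sure which case applies, one possible way to find out is to probe the interpreter directly rather than parse its version string. The sketch below assumes the executable is called gawk and is on your PATH; the function name supports_subarrays is made up for this example. An awk that cannot parse a[1][2] exits with a non-zero status, as does a missing executable.

```shell
# Hypothetical probe: succeeds only if the given awk accepts arrays of arrays.
supports_subarrays() {
  # $1 is the awk executable to test, for example gawk
  "$1" 'BEGIN { a[1][2] = 3; exit (a[1][2] == 3 ? 0 : 1) }' >/dev/null 2>&1
}

if supports_subarrays gawk; then
  echo "arrays of arrays supported"
else
  echo "fall back to SUBSEP keys"
fi
```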

Another note: on this line:

a[$2+FS+$7,$3]+=$6

It looks like you are trying to concatenate $2, FS, and $7, but + is numeric addition, not concatenation. In awk, strings are concatenated by writing the expressions next to each other, so you would need to write it like this:

a[$2 FS $7,$3] += $6
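To see the difference, here is a small sketch with made-up values (the variable names and the strings chr1 and geneA are assumptions for illustration). With +, each operand is converted to a number first, and a string that does not start with a number converts to 0, so every row would end up under the same key.

```shell
# Hypothetical demo: compares + with juxtaposition on two non-numeric strings.
show_difference() {
  awk 'BEGIN {
    FS = "\t"
    a = "chr1"; b = "geneA"
    added  = a + FS + b      # numeric addition: 0 + 0 + 0
    joined = a FS b          # concatenation: chr1, a tab, geneA
    print "added=" added
    print "joined=" joined
  }'
}

show_difference
```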