I have a dataset with many missing values as -999. Part of the data is
input.txt
30
-999
10
40
23
44
-999
-999
31
-999
54
-999
-999
-999
-999
-999
-999
-999 and so on
I would like calculate the average in each 6 rows interval without considering the missing values.
Desire output is
ofile.txt
29.4
42.5
-999
While I am trying with this
awk '!/\-999/{sum += $1; count++} NR%6==0{print count ? (sum/count) : count;sum=count=0}' input.txt
it is giving
29.4
42.5
0
I'm not entirely sure why, if you're discounting -999
values, you'd think that -999
was a better choice than zero for the average of the third group. In the first two groups, the -999
values contribute to neither the sum nor the count, so an argument could be made that zero is a better choice.
However, it may be that you want -999
to represent a "lack of value" (which would certainly be the case where there were no values in a group). If that's the case, you can just ouput -999
rather than count
in your original code:
awk '!/\-999/{sm+=$1;ct++} NR%6==0{print ct?(sm/ct):-999;sm=ct=0}' input.txt
Even if you decide that zero is a better answer, I'd still make that explicit rather than outputting the count variable itself:
awk '!/\-999/{sm+=$1;ct++} NR%6==0{print ct?(sm/ct):0;sm=ct=0}' input.txt