I have a dataset with many missing values as -999. Part of the data is
input.txt
30
-999
10
40
23
44
-999
-999
31
-999
54
-999
-999
-999
-999
-999
-999
-999 and so on
I would like calculate the average in each 6 rows interval without considering the missing values.
Desire output is
ofile.txt
29.4
42.5
-999
While I am trying with this
awk '!/\-999/{sum += $1; count++} NR%6==0{print count ? (sum/count) : count;sum=count=0}' input.txt
it is giving
29.4
42.5
0
I'm not entirely sure why, if you're discounting
-999
values, you'd think that-999
was a better choice than zero for the average of the third group. In the first two groups, the-999
values contribute to neither the sum nor the count, so an argument could be made that zero is a better choice.However, it may be that you want
-999
to represent a "lack of value" (which would certainly be the case where there were no values in a group). If that's the case, you can just ouput-999
rather thancount
in your original code:Even if you decide that zero is a better answer, I'd still make that explicit rather than outputting the count variable itself: