Calculating average without considering missing va

Posted 2019-08-11 15:06

Question:

I have a dataset in which missing values are coded as -999. Part of the data is:

input.txt
30
-999
10
40
23
44
-999
-999
31
-999
54
-999 
-999
-999
-999
-999
-999
-999 and so on

I would like to calculate the average over each 6-row interval, ignoring the missing values.

The desired output is:

ofile.txt
29.4
42.5
-999

When I try this:

awk '!/\-999/{sum += $1; count++} NR%6==0{print count ? (sum/count) : count;sum=count=0}' input.txt

it gives:

29.4
42.5
0

Answer 1:

I'm not entirely sure why, if you're discounting -999 values, you'd think that -999 was a better choice than zero for the average of the third group. In the first two groups, the -999 values contribute to neither the sum nor the count, so an argument could be made that zero is a better choice.

However, it may be that you want -999 to represent a "lack of value" (which would certainly be the case where there were no values in a group). If that's the case, you can just output -999 rather than count in your original code:

awk '!/\-999/{sm+=$1;ct++} NR%6==0{print ct?(sm/ct):-999;sm=ct=0}' input.txt

Even if you decide that zero is a better answer, I'd still make that explicit rather than outputting the count variable itself:

awk '!/\-999/{sm+=$1;ct++} NR%6==0{print ct?(sm/ct):0;sm=ct=0}' input.txt
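
The one-liners above skip any line that contains the text -999 and print nothing for a trailing group shorter than 6 rows. A slightly more defensive sketch, assuming the values are in the first column, is below. The function name avg6 and the variables n and miss are made up for illustration and are not part of the original question. It compares $1 numerically against the sentinel and flushes a partial final group in END:

```shell
# avg6: hypothetical helper; reads one value per line on stdin and prints the
# average of every n rows (default 6), ignoring the -999 sentinel.
avg6() {
  awk -v n="${1:-6}" -v miss=-999 '
    # Numeric comparison on the first field, so a value such as -9990 is kept.
    $1 != miss { sum += $1; cnt++ }

    # At the end of each n-row block, print the average, or the sentinel
    # if every row in the block was missing, then reset the accumulators.
    NR % n == 0 { print (cnt ? sum / cnt : miss); sum = cnt = 0 }

    # Flush a trailing block that has fewer than n rows.
    END { if (NR % n) print (cnt ? sum / cnt : miss) }
  '
}
```

It could be used as avg6 < input.txt > ofile.txt, or avg6 12 < input.txt for 12-row blocks. Whether a short trailing block should be averaged at all is a judgment call; drop the END rule to keep the original behaviour.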


Tags: linux shell awk