I have a data set that looks like this:
id a b
1 AA 2
1 AB 5
1 AA 1
2 AB 2
2 AB 4
3 AB 4
3 AB 3
3 AA 1
I need to calculate the cumulative mean for each record within each group, excluding the cases where a == 'AA'. So sample output should be:
id a b mean
1 AA 2 -
1 AB 5 5
1 AA 1 5
2 AB 2 2
2 AB 4 (4+2)/2
3 AB 4 4
3 AB 3 (4+3)/2
3 AA 1 (4+3)/2
3 AA 4 (4+3)/2
I tried to achieve it using dplyr and cummean, but I am getting an error:
df <- df %>%
  group_by(id) %>%
  mutate(mean = cummean(b[a != 'AA']))
Error: incompatible size (123), expecting 147 (the group size) or 1
Can you suggest a better way to achieve the same in R?
The trick here is to reconstruct the cummean by dividing the adjusted cumsum by the adjusted count: the running sum of b with the 'AA' rows zeroed out (b multiplied by a != 'AA'), over the running number of non-'AA' rows. This can be written as a one-liner.

We can make this a little nicer (the "multiply by a != 'AA'" magic is the ugliness, in my mind) by taking out a != 'AA' as a column.

There may be an easier approach. Here, we group by 'id' and create a new column 'Mean' by first converting the elements in 'b' that correspond to 'AA' in 'a' to NA (b*NA^(a=='AA')). NA^(a=='AA') gives NA for 'AA' in 'a' and 1 for all other values, so when we multiply by 'b', the 1s are replaced with the values in 'b' while the NAs remain as such. We use na.aggregate (from the zoo package) to replace the NAs with the mean of the non-NA elements in each group, then wrap with cummean to get the cumulative mean. If the first value in each group for 'a' is 'AA', we can get NA for that row by multiplying with NA^(row_number()==1 & a=='AA').
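The cumsum-over-count idea described above can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the exact code of the original answer: the data frame is rebuilt by hand from the table in the question, and the helper column name keep is chosen here purely for illustration.

```r
library(dplyr)

# Sample data, retyped from the table in the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  a  = c("AA", "AB", "AA", "AB", "AB", "AB", "AB", "AA"),
  b  = c(2, 5, 1, 2, 4, 4, 3, 1),
  stringsAsFactors = FALSE
)

# One-liner: running sum of the non-'AA' values divided by
# the running count of non-'AA' rows, computed per id
res1 <- df %>%
  group_by(id) %>%
  mutate(mean = cumsum(b * (a != "AA")) / cumsum(a != "AA")) %>%
  ungroup()

# Same computation with the filter pulled out into a helper column
# ('keep' is a hypothetical name, dropped again at the end)
res2 <- df %>%
  group_by(id) %>%
  mutate(keep = a != "AA",
         mean = cumsum(b * keep) / cumsum(keep)) %>%
  ungroup() %>%
  select(-keep)

print(res1)
```

A group that starts with an 'AA' row gets 0/0 = NaN on that row, which plays the role of the "-" in the desired output; it can be turned into NA afterwards if that is preferred.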
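The na.aggregate approach described above might look something like this. It is a sketch pieced together from the expressions quoted in the text, assuming the zoo and dplyr packages are installed and using a hand-typed copy of the question's data.

```r
library(dplyr)
library(zoo)  # provides na.aggregate()

# Sample data, retyped from the table in the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  a  = c("AA", "AB", "AA", "AB", "AB", "AB", "AB", "AA"),
  b  = c(2, 5, 1, 2, 4, 4, 3, 1),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(id) %>%
  mutate(
    # b * NA^(a == "AA"): NA on 'AA' rows, b elsewhere
    # na.aggregate(): fill those NAs with the group's mean of non-NA values
    # cummean(): cumulative mean of the filled vector
    # trailing factor: force NA when the group's first row is 'AA'
    Mean = cummean(na.aggregate(b * NA^(a == "AA"))) *
      NA^(row_number() == 1 & a == "AA")
  ) %>%
  ungroup()

print(res)
```

On this sample the result agrees with the desired output. Because the 'AA' rows are filled with the group mean before cummean is applied, a group in which an 'AA' row sits between two non-'AA' rows can give different numbers from the cumsum/count version, so it is worth checking on your own data.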