I want to calculate the mean across several columns for each row of my data frame, which contains missing values, and place the results in a new column called 'means'. Here's my data frame:
df <- data.frame(A=c(3,4,5),B=c(0,6,8),C=c(9,NA,1))
  A B  C
1 3 0  9
2 4 6 NA
3 5 8  1
The code below seems to accomplish the task when the columns contain no missing values, such as columns A and B:
df %>%
rowwise() %>%
mutate(means=mean(A:B, na.rm=T))
A B C means
<dbl> <dbl> <dbl> <dbl>
1 3 0 9 1.5
2 4 6 NA 5.0
3 5 8 1 6.5
However, if a column has missing values, such as C, then I get an error:
> df %>% rowwise() %>% mutate(means=mean(A:C, na.rm=T))
Error: NA/NaN argument
Ideally, I'd like to implement it with dplyr.
library(dplyr)
df %>%
mutate(means=rowMeans(., na.rm=TRUE))
The . is a "pronoun" that references the data frame df that was piped into mutate.
A B C means
1 3 0 9 4.000000
2 4 6 NA 5.000000
3 5 8 1 4.666667
You can also select only specific columns to include, using all the usual methods (column names, indices, grep, etc.).
df %>%
mutate(means=rowMeans(.[ , c("A","C")], na.rm=TRUE))
A B C means
1 3 0 9 6
2 4 6 NA 4
3 5 8 1 3
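If you would rather keep the OP's A:C range syntax inside dplyr, c_across() is a tidyselect-aware helper for rowwise() data. Inside c_across(), A:C means "columns A through C" rather than a numeric sequence. The sketch below assumes dplyr 1.0.0 or later (the release that introduced c_across()); the column names means_all and means_AC are illustrative only.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides c_across()

df <- data.frame(A = c(3, 4, 5), B = c(0, 6, 8), C = c(9, NA, 1))

res <- df %>%
  rowwise() %>%
  mutate(
    # c_across() uses tidyselect, so A:C selects columns A through C
    means_all = mean(c_across(A:C), na.rm = TRUE),
    # c(A, C) selects just those two columns, like .[ , c("A","C")]
    means_AC = mean(c_across(c(A, C)), na.rm = TRUE)
  ) %>%
  ungroup()  # drop the rowwise grouping so later verbs work column-wise

res
```

rowwise() evaluates the expression once per row, so on large data frames the vectorised rowMeans() approach is usually faster.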
It is simple to accomplish in base R as well:
cbind(df, "means"=rowMeans(df, na.rm=TRUE))
A B C means
1 3 0 9 4.000000
2 4 6 NA 5.000000
3 5 8 1 4.666667
rowMeans performs the calculation, and its na.rm argument skips missing values, while cbind binds the resulting means to the data frame df under whatever column name you choose.
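To make the effect of na.rm = TRUE concrete: each row's mean is the sum of its non-missing values divided by the number of non-missing values in that row, which is why row 2 gives 5 rather than 10/3. This base R sketch reproduces rowMeans() by hand using that reasoning.

```r
df <- data.frame(A = c(3, 4, 5), B = c(0, 6, 8), C = c(9, NA, 1))

# Sum of the non-missing values in each row
sums <- rowSums(df, na.rm = TRUE)
# Number of non-missing values in each row (!is.na() gives a logical matrix)
counts <- rowSums(!is.na(df))
# Dividing one by the other matches rowMeans(df, na.rm = TRUE)
manual <- sums / counts

manual
rowMeans(df, na.rm = TRUE)
```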
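Regarding the error in the OP's code: inside mutate(), A:C is not a column selection. After rowwise(), A, B and C hold the current row's values, so A:C is R's sequence operator applied to two numbers (for row 2, 4:NA), which fails with "NA/NaN argument" when a value is missing. It only seemed to work for A:B because, when both endpoints are integer-valued, the mean of the sequence equals the mean of the two endpoints. Instead, concatenate the columns with c into a single vector and take its mean, since mean expects the values as a single vector (its x argument):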
df %>%
rowwise() %>%
mutate(means = mean(c(A, B, C), na.rm = TRUE))
# A B C means
# <dbl> <dbl> <dbl> <dbl>
#1 3 0 9 4.000000
#2 4 6 NA 5.000000
#3 5 8 1 4.666667
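The base R sketch below illustrates why the original A:B looked correct and why A:C failed. It uses the literal values from rows 1 and 2 of df; the 3.5:0 case is a made-up counterexample added only for illustration.

```r
# After rowwise(), A and B are single values, so A:B is the sequence operator.
seq_row1 <- 3:0   # row 1: A = 3, B = 0
seq_row1          # 3 2 1 0
mean(seq_row1)    # 1.5, which happens to equal mean(c(3, 0))

# With non-integer endpoints the coincidence disappears:
mean(3.5:0)       # 2 (sequence 3.5 2.5 1.5 0.5), whereas mean(c(3.5, 0)) is 1.75

# Row 2 has C = NA, and ':' cannot build a sequence that ends in NA
err <- tryCatch(4:NA, error = function(e) conditionMessage(e))
err               # "NA/NaN argument"
```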
We can also use rowMeans with transform:
transform(df, means = rowMeans(df, na.rm = TRUE))
# A B C means
#1 3 0 9 4.000000
#2 4 6 NA 5.000000
#3 5 8 1 4.666667
Or, using data.table (note that setDT converts df to a data.table by reference):
library(data.table)
setDT(df)[, means := rowMeans(.SD, na.rm = TRUE)]
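In data.table, the equivalent of selecting specific columns is the .SDcols argument, which restricts .SD to the listed columns. The sketch below works on a copy made with as.data.table() so the original df is left untouched; the column names means_AC and means_all are illustrative, and the colA:colB range form of .SDcols assumes a reasonably recent data.table release.

```r
library(data.table)

df <- data.frame(A = c(3, 4, 5), B = c(0, 6, 8), C = c(9, NA, 1))
dt <- as.data.table(df)  # copy, so df itself is not modified by reference

# Only columns A and C are passed to rowMeans via .SD
dt[, means_AC := rowMeans(.SD, na.rm = TRUE), .SDcols = c("A", "C")]

# A column range also works; it covers A, B and C but not means_AC
dt[, means_all := rowMeans(.SD, na.rm = TRUE), .SDcols = A:C]

dt
```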