Calculate a mean by criteria in R

Posted 2019-07-20 22:04

Question:

I would like to calculate a sample mean in R using only the rows that meet a specific criterion. For example, given the table below, I want the mean of wage_accepted for each stage, but only for rows where stage is 1 or 2:

treatment  session  period  stage  wage_accepted  type
1          1        1       1      25             low
1          1        1       3      19             low
1          1        1       3      15             low
1          1        1       2      32             high
1          1        1       2      13             low
1          1        1       2      14             low
1          1        2       1      17             low
1          1        2       4      16             low
1          1        2       5      21             low

The desired output in this case is:

stage     mean
    1  21.0000
    2  19.6667

Thanks in advance.

Answer 1:

Using the dplyr library:

library(dplyr)

df %>%
  filter(stage == 1 | stage == 2) %>%
  group_by(stage) %>%
  summarise(mean = mean(wage_accepted))

If you are new to dplyr, here is a brief explanation:

Take the data frame df and keep only the rows where stage equals 1 or 2. Then, for each value of stage, calculate the mean of wage_accepted.
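For comparison, the same two steps (filter, then a per-group mean) can be sketched in base R without dplyr. This is a swapped-in technique, not part of the answer above; the data frame is typed in by hand from the question's table and keeps only the two columns the calculation needs.

```r
# Question's data, typed in by hand (only the columns used here)
df <- data.frame(
  stage = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21)
)

# Step 1: keep only rows where stage is 1 or 2 (the filter() step)
kept <- df[df$stage == 1 | df$stage == 2, ]

# Step 2: mean of wage_accepted for each stage (group_by() + summarise())
result <- aggregate(wage_accepted ~ stage, data = kept, FUN = mean)

# Rename the summary column to match the desired output
names(result)[names(result) == "wage_accepted"] <- "mean"
print(result)
```

This should give one row per stage: 21 for stage 1 and 59/3 (about 19.67) for stage 2.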



Answer 2:

Check this out. It's a toy example, but it shows how compact data.table syntax is. dplyr is, of course, also a good option.


    library(data.table)

    dat <- data.table(iris)
    dat[Species == "setosa" | Species == "virginica", mean(Sepal.Width), by = Species]

If you need speed, data.table is known for being very fast, especially on large data sets, so it is worth looking into. I'll leave it to you to apply this to your question. Best, M2K
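Applied to the question, the same data.table pattern might look like the sketch below. It assumes the data.table package is installed and types the relevant columns in by hand; keyby (rather than by) is used so that the result is sorted by stage.

```r
library(data.table)

# Question's data (only the columns needed), typed in by hand
dt <- data.table(
  stage = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21)
)

# i: keep stages 1 and 2; j: mean of wage_accepted; keyby: group and sort by stage
res <- dt[stage %in% c(1, 2), .(mean = mean(wage_accepted)), keyby = stage]
print(res)
```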



Answer 3:

Assuming your data is in a CSV file, you can read it into a data frame with:

data <- read.csv("PATH_TO_YOUR_CSV_FILE/Name_of_the_CSV_File.csv")
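To try this without a file, read.csv() also accepts the CSV content directly through its text argument. The sketch below rebuilds the question's table that way (the inline CSV is typed in by hand). Note that read.csv() keeps the case of the header names, so the columns are stage and wage_accepted, in lower case.

```r
# Question's table as inline CSV text (typed in by hand)
csv_text <- "treatment,session,period,stage,wage_accepted,type
1,1,1,1,25,low
1,1,1,3,19,low
1,1,1,3,15,low
1,1,1,2,32,high
1,1,1,2,13,low
1,1,1,2,14,low
1,1,2,1,17,low
1,1,2,4,16,low
1,1,2,5,21,low"

# text = reads from the string instead of a file path
data <- read.csv(text = csv_text)
str(data)
```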

Then you can use either this code, relying on sapply():

sapply(split(data$wage_accepted, data$stage), mean)

       1        2        3        4        5
21.00000 19.66667 17.00000 16.00000 21.00000

Or this code relying on tapply():

tapply(data$wage_accepted, data$stage, mean)

       1        2        3        4        5
21.00000 19.66667 17.00000 16.00000 21.00000
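Both calls compute the mean for every stage. To keep only stages 1 and 2, the result can be indexed by name, because the names are the stage values as character strings. A minimal sketch, with the question's data typed in by hand (lower-case column names, matching the question):

```r
# Question's data (only the columns needed), typed in by hand
data <- data.frame(
  stage = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21)
)

# split() returns a named list: one vector of wages per stage
groups <- split(data$wage_accepted, data$stage)

# Mean for every stage, then pick stages 1 and 2 by name
all_means <- sapply(groups, mean)
wanted <- all_means[c("1", "2")]
print(wanted)

# tapply() returns the same numbers as a 1-d array, which can be indexed the same way
by_tapply <- tapply(data$wage_accepted, data$stage, mean)
print(by_tapply[c("1", "2")])
```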


Answer 4:

You can compute the mean for every stage first, and then filter for the stages you need:

# Calculate the mean of wage_accepted for each stage
df <- do.call(rbind, lapply(split(data, data$stage), function(x) {
  data.frame(stage = unique(x$stage), mean = mean(x$wage_accepted))
}))

# Keep only stages 1 and 2
required <- subset(df, stage %in% c(1, 2))
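Alternatively, the filtering can come first, so that only the required stages are split and averaged. This is a variation on the answer, sketched with the question's data typed in by hand. It relies on split() creating groups only for the stage values that remain after subsetting, which holds here because stage is numeric (a factor column with unused levels would produce empty groups).

```r
# Question's data (only the columns needed), typed in by hand
data <- data.frame(
  stage = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21)
)

# Keep only stages 1 and 2 before grouping
kept <- subset(data, stage %in% c(1, 2))

# One small data frame per remaining stage, then one summary row per stage
rows <- lapply(split(kept, kept$stage), function(x) {
  data.frame(stage = unique(x$stage), mean = mean(x$wage_accepted))
})
required <- do.call(rbind, rows)
print(required)
```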


Tags: r mean