I would like to calculate a sample mean in R subject to a specific criterion. For example, I have this table and I want the mean of wage_accepted for only those rows where stage = 1 or 2:
treatment session period stage wage_accepted type
1 1 1 1 25 low
1 1 1 3 19 low
1 1 1 3 15 low
1 1 1 2 32 high
1 1 1 2 13 low
1 1 1 2 14 low
1 1 2 1 17 low
1 1 2 4 16 low
1 1 2 5 21 low
The desired output in this case should be:
stage mean
1 21.0
2 19.6667
Thanks in advance.
With the dplyr library:
library(dplyr)
df %>% filter(stage==1 | stage ==2) %>% group_by(stage) %>%
summarise(mean=mean(wage_accepted))
If you are new to dplyr, a bit of explanation: take the data frame df, then filter the rows where stage is equal to 1 or 2. Then, for each group in stage, calculate the mean of wage_accepted.
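To make the pipeline above easy to try, here is a minimal sketch that rebuilds the table from the question by hand. The data frame name df and the use of %in% instead of the two == comparisons are my own choices for illustration; the column names follow the question.

```r
library(dplyr)

# Rebuild the example table from the question by hand
df <- data.frame(
  treatment     = rep(1, 9),
  session       = rep(1, 9),
  period        = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  stage         = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21),
  type          = c("low", "low", "low", "high", "low", "low", "low", "low", "low")
)

# Keep stages 1 and 2, then average wage_accepted within each stage
result <- df %>%
  filter(stage %in% c(1, 2)) %>%
  group_by(stage) %>%
  summarise(mean = mean(wage_accepted))

print(result)
```

With this data the two means should come out as 21 for stage 1 and about 19.67 for stage 2, matching the desired output in the question.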
Check this out. It's a toy example, but data.table is so compact. dplyr is great as well obviously.
library(data.table)
dat <- data.table(iris)
dat[Species == "setosa" | Species == "virginica", mean(Sepal.Width), by = Species]
In terms of your need for speed, data.table is a rocket ship; look it up. I'll leave it to you to apply this to your question. Best, M2K
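For anyone who wants to see the same idea applied to the question's data, here is a rough sketch. It assumes only the stage and wage_accepted columns matter, so the other columns are left out; the object names dt and res, and the choice of keyby (to return the groups sorted by stage), are mine and not part of the answer above.

```r
library(data.table)

# Only the two columns needed for the calculation, copied from the question
dt <- data.table(
  stage         = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21)
)

# i: keep stages 1 and 2; j: compute the mean; keyby: group and sort by stage
res <- dt[stage %in% c(1, 2), .(mean = mean(wage_accepted)), keyby = stage]

print(res)
```

The i, j, by layout is what makes data.table so compact: the filter, the summary and the grouping all sit inside one pair of brackets.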
Assuming you have a csv file for the data, you can read data into a data frame using:
data<-read.csv("PATH_TO_YOUR_CSV_FILE/Name_of_the_CSV_File.csv")
Then you can use either this code relying on sapply():
sapply(split(data$wage_accepted, data$stage), mean)
1 2 3 4 5
21.00000 19.66667 17.00000 16.00000 21.00000
Or this code relying on tapply():
tapply(data$wage_accepted, data$stage, mean)
1 2 3 4 5
21.00000 19.66667 17.00000 16.00000 21.00000
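Both calls return the mean for every stage. If you only want stages 1 and 2, as in the question, one option is to subset the rows first. This is a sketch in base R only; the data frame is typed in by hand here instead of being read from a CSV, and the names keep, means and tab are just illustrative.

```r
# Stand-in for the data frame that read.csv() would return
data <- data.frame(
  stage         = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21)
)

# Keep only the rows for stages 1 and 2
keep <- data[data$stage %in% c(1, 2), ]

# Named vector of means, one entry per remaining stage
means <- tapply(keep$wage_accepted, keep$stage, mean)
print(means)

# Same result as a two-column data frame, closer to the desired output
tab <- aggregate(wage_accepted ~ stage, data = keep, FUN = mean)
names(tab)[2] <- "mean"
print(tab)
```

The aggregate() version may be more convenient if you need a table with stage and mean columns instead of a named vector.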
You can compute the mean for every stage first and then filter for the stages you need:
# Calculating mean with respect to stages
df <- do.call(rbind, lapply(split(data, data$stage), function(x) {
  data.frame(stage = unique(x$stage), mean = mean(x$wage_accepted))
}))
# mean for stage 1 and 2
required <- subset(df, stage %in% c(1, 2))