I am trying to wrap a dplyr computation in a function so that I can change the column name and the dataset name. The code is as follows:
sample_table <- function(byvar = TRUE, dataset = TRUE) {
  tcount <-
    df2 %>% group_by(.dots = byvar) %>% tally() %>% arrange(byvar) %>% rename(tcount = n) %>%
    left_join(
      select(
        dataset %>% group_by(.dots = byvar) %>% tally() %>% arrange(byvar) %>% rename(scount = n), byvar, scount
      ), by = c("byvar")
    ) %>%
    mutate_each(funs(replace(., is.na(.), 0)), -byvar) %>% mutate(
      tperc = round(tcount / rcount, digits = 2), sperc = round(scount / samplesize, digits = 2),
      absdiff = abs(sperc - tperc)
    ) %>%
    select(byvar, tcount, tperc, scount, sperc, absdiff)
  return(tcount)
}
category_Sample1 <- sample_table(byvar = "category", dataset = Sample1)
The function is named sample_table. The error message is as follows:
Error: All select() inputs must resolve to integer column positions.
The following do not:
* byvar
I know this is a repeat question, and I have gone through these links: "Function writing passing column reference to group_by" and "Error when combining dplyr inside a function".
I am not sure where I am going wrong. Any help would be really appreciated. rcount is the number of rows in df2, and samplesize is the number of rows in the "dataset" data frame. I have to compute the same thing for another variable with three different "dataset" names.
You use column references as strings (byvar) (standard evaluation) and normal references (tcount, tperc, etc.) (non-standard evaluation) together. Make sure you use one or the other consistently, with the appropriate function: select() or select_(). You can fix your issue by using
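As a rough illustration of the two evaluation styles described in this answer, here is a sketch on made-up data. It is not necessarily the fix the answer goes on to give. The data frame df and its values are hypothetical, and the first call assumes a dplyr version that still provides the underscore verbs such as select_(), which were deprecated from dplyr 0.7.0 onwards.

```r
library(dplyr)

# Hypothetical toy data; the column names mirror those used in the question.
df <- data.frame(
  category = c("a", "b", "c"),
  tcount   = c(10, 20, 30),
  other    = c(1, 2, 3)
)

# The column to keep arrives as a string, as it does inside sample_table().
byvar <- "category"

# Standard evaluation: every column is passed as a string.
# Assumption: select_() is still available in the installed dplyr.
se_result <- select_(df, .dots = c(byvar, "tcount"))

# Non-standard evaluation: bare column names, with one_of() turning the
# string held in byvar into a column selection.
nse_result <- select(df, one_of(byvar), tcount)

print(names(se_result))
print(names(nse_result))
```

Both calls keep the same two columns. The point is that each call sticks to one style: the first passes only strings, while the second uses bare names and converts the string explicitly instead of passing byvar as if it were a column.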