I have an imported data frame that has column names with various punctuations including parentheses, e.g. BILLNG.STATUS.(COMPLETED./.INCOMPLTE)
.
I was trying to use group_by
from dplyr
to do some summarizing, something like
df <- df %>% group_by(ORDER.NO, BILLNG.STATUS.(COMPLETED./.INCOMPLTE))
which brings the error Error in mutate_impl(.data, dots) :
could not find function "BILLNG.STATUS."
Short of changing the column names, is there a way to handle such column names directly in group_by
?
I think you can make this work if you enclose the "illegal" column names in backticks. For example, let's say I start with this data frame (called df
):
BILLING.STATUS.(COMPLETED./.INCOMPLETE) ORDER.VALUE.(USD)
1 A 0.01544196
2 A 0.95522706
3 B 1.13479303
4 B 1.22848285
Then I can summarise it like this:
dat %>% group_by(`BILLING.STATUS.(COMPLETED./.INCOMPLETE)`) %>%
summarise(count=n(),
mean = mean(`ORDER.VALUE.(USD)`))
Giving:
BILLING.STATUS.(COMPLETED./.INCOMPLETE) count mean
1 A 2 0.4853345
2 B 2 1.1816379
Backticks also come in handy for referring to or creating variable names with whitespace. You can find a number of questions related to dplyr
and backticks on SO, and there's also some discussion of backticks in the help for Quotes
.
I'm just using this not-an-answer as a counter-example or illustration of limitations for the the backtick method. (It was the first strategem I tried. Perhaps it is the fact that two language operations ("(" and "/") are being handled adjacently that makes this fail.)
names(iris)[5] <- "Specie(/)s"
library(dplyr)
by_species <- iris %>% group_by(`Specie(/)s`)
by_species %>% summarise_each(funs(mean(., na.rm = TRUE)))
#Error: cannot modify grouping variable
Tried a variety or other language-oriented efforts with quote
, as.name
and substitute
that also failed. (I wish there were a mechanism to request that this sink to the bottom of the answers.)