column name with brackets or other punctuations fo

2020-02-12 07:43发布

问题:

I have an imported data frame that has column names with various punctuations including parentheses, e.g. BILLNG.STATUS.(COMPLETED./.INCOMPLTE) .

I was trying to use group_by from dplyr to do some summarizing, something like

df <- df %>% group_by(ORDER.NO, BILLNG.STATUS.(COMPLETED./.INCOMPLTE))

which brings the error Error in mutate_impl(.data, dots) : could not find function "BILLNG.STATUS."

Short of changing the column names, is there a way to handle such column names directly in group_by ?

回答1:

I think you can make this work if you enclose the "illegal" column names in backticks. For example, let's say I start with this data frame (called df):

  BILLING.STATUS.(COMPLETED./.INCOMPLETE) ORDER.VALUE.(USD)
1                                       A        0.01544196
2                                       A        0.95522706
3                                       B        1.13479303
4                                       B        1.22848285

Then I can summarise it like this:

dat %>% group_by(`BILLING.STATUS.(COMPLETED./.INCOMPLETE)`) %>% 
  summarise(count=n(),
            mean = mean(`ORDER.VALUE.(USD)`))

Giving:

  BILLING.STATUS.(COMPLETED./.INCOMPLETE) count      mean
1                                       A     2 0.4853345
2                                       B     2 1.1816379

Backticks also come in handy for referring to or creating variable names with whitespace. You can find a number of questions related to dplyr and backticks on SO, and there's also some discussion of backticks in the help for Quotes.



回答2:

I'm just using this not-an-answer as a counter-example or illustration of limitations for the the backtick method. (It was the first strategem I tried. Perhaps it is the fact that two language operations ("(" and "/") are being handled adjacently that makes this fail.)

names(iris)[5] <- "Specie(/)s"
library(dplyr)
by_species <- iris %>% group_by(`Specie(/)s`)
by_species %>% summarise_each(funs(mean(., na.rm = TRUE)))
#Error: cannot modify grouping variable

Tried a variety or other language-oriented efforts with quote, as.name and substitute that also failed. (I wish there were a mechanism to request that this sink to the bottom of the answers.)



标签: r dplyr