Dash in column name yields “object not found” Error

Posted 2020-04-21 03:17

Question:

I have a function to generate scatter plots from data, where an argument is provided to select which column to use for coloring the points. Here is a simplified version:

library(ggplot2)

plot_gene <- function (df, gene) {
   ggplot(df, aes(x, y)) + 
     geom_point(aes_string(col = gene)) +
     scale_color_gradient()
}

where df is a data.frame with columns x, y, and then a bunch of gene names. This works fine for most gene names; however, some have dashes and these fail:

print(plot_gene(df, "Gapdh")) # great!
print(plot_gene(df, "H2-Aa")) # Error: object "H2" not found

It appears the gene variable is getting parsed ("H2-Aa" becomes H2 - Aa). How can I get around this? Is there a way to indicate that a string should not go through eval in aes_string?

Reproducible Input

If you need some input to play with, this fails like my data:

df <- data.frame(c(1,2), c(2,1), c(1,2), c(2,1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")

For my real data, I am using read.table(..., header=TRUE) and get column names with dashes because the raw data files have them.

Answer 1:

Normally R tries very hard to make sure the column names in your data.frame are valid variable names. Non-standard column names (those that are not valid variable names) cause problems with functions that rely on non-standard evaluation syntax. When forced to use such names, you usually have to wrap them in backticks. In the normal case

ggplot(df, aes(x, y)) + 
  geom_point(aes(col = H2-Aa)) +
  scale_color_gradient()
# Error in FUN(X[[i]], ...) : object 'H2' not found

would return an error but

ggplot(df, aes(x, y)) + 
  geom_point(aes(col = `H2-Aa`)) +
  scale_color_gradient()

would work.
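
To see why the backticks matter, you can ask base R's parser directly what it makes of each form. This is only a small illustrative sketch; the variable names bare and quoted are made up for the example and are not part of ggplot2.

```r
# Without backticks, the dash is parsed as the subtraction operator.
bare <- parse(text = "H2-Aa")[[1]]
class(bare)                # a call, not a single name
as.character(bare[[1]])    # the function being called is "-"
all.vars(bare)             # the two variables R will look up: H2 and Aa

# With backticks, the whole string is parsed as one symbol.
quoted <- parse(text = "`H2-Aa`")[[1]]
class(quoted)              # a name (symbol)
identical(quoted, as.name("H2-Aa"))

# For comparison, this is how R would sanitize the name if asked to.
make.names("H2-Aa")
```

The first form is why the error mentions H2: R evaluates H2 - Aa and fails on the first operand it cannot find.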

You can paste in backticks if you really want

geom_point(aes_string(col = paste0("`", gene, "`")))

or you could treat it as a symbol from the get-go and use aes_q instead

geom_point(aes_q(col = as.name(gene)))

Recent releases of ggplot2 (3.0.0 and later) support unquoting via !! rather than using aes_string or aes_q, so you could do

geom_point(aes(col = !!rlang::sym(gene)))
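
Putting that last variant back into the function from the question gives something like the sketch below. It assumes ggplot2 3.0.0 or later and the rlang package are installed, and it reuses the two-row example data from the question; p_plain and p_dash are just names chosen for the example.

```r
library(ggplot2)

# Example data from the question, including a column name with a dash.
df <- data.frame(c(1, 2), c(2, 1), c(1, 2), c(2, 1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")

plot_gene <- function(df, gene) {
  ggplot(df, aes(x, y)) +
    # rlang::sym() turns the string into a symbol without parsing it,
    # and !! inserts that symbol into the aesthetic mapping.
    geom_point(aes(col = !!rlang::sym(gene))) +
    scale_color_gradient()
}

p_plain <- plot_gene(df, "Gapdh")  # ordinary column name
p_dash  <- plot_gene(df, "H2-Aa")  # column name containing a dash
```

Calling print(p_dash) should then draw the plot instead of raising the "object not found" error, since the column name never passes through the parser.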