I have a function to generate scatter plots from data, where an argument is provided to select which column to use for coloring the points. Here is a simplified version:
library(ggplot2)
plot_gene <- function(df, gene) {
  ggplot(df, aes(x, y)) +
    geom_point(aes_string(col = gene)) +
    scale_color_gradient()
}
where df is a data.frame with columns x, y, and then a bunch of gene names. This works fine for most gene names; however, some have dashes and these fail:
print(plot_gene(df, "Gapdh")) # great!
print(plot_gene(df, "H2-Aa")) # Error: object "H2" not found
It appears the gene variable is getting parsed ("H2-Aa" becomes H2 - Aa). How can I get around this? Is there a way to indicate that a string should not go through eval in aes_string?
Reproducible Input
If you need some input to play with, this fails like my data:
df <- data.frame(c(1,2), c(2,1), c(1,2), c(2,1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")
For my real data, I am using read.table(..., header=TRUE) and get column names with dashes because the raw data files have them.
Normally R tries very hard to make sure the column names in your data.frame are valid variable names. Using non-standard column names (those that are not valid variable names) leads to problems with functions that rely on non-standard evaluation. When forced to use such names, you often have to wrap them in backticks. In the normal case, df$H2-Aa would return an error but df$`H2-Aa` would work.
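To make the backtick point concrete, here is a small sketch using the reproducible input from the question. The variable names unquoted, backticked, and by_name are only illustrative, and it assumes no object called Aa exists in your session.

```r
# Rebuild the example data from the question
df <- data.frame(c(1, 2), c(2, 1), c(1, 2), c(2, 1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")

# Unquoted, the dash is parsed as subtraction, so this errors
unquoted <- tryCatch(df$H2-Aa, error = function(e) e)

# Backticks make R treat the whole name as one symbol
backticked <- df$`H2-Aa`

# Indexing by string never parses the name at all
by_name <- df[["H2-Aa"]]

# This is what R does when it "fixes" a name for you
make.names("H2-Aa")  # "H2.Aa"
```

The last line shows the kind of renaming that functions with a check.names argument apply by default, which is why dashed column names are unusual in the first place.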
You can paste backticks around the name if you really want, for example with paste0("`", gene, "`").
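As a rough sketch of the backtick approach applied to the function from the question: plot_gene_backtick is a made-up name for this illustration, and note that aes_string is deprecated in recent ggplot2 releases, so it may emit a warning.

```r
library(ggplot2)

# Hypothetical variant of plot_gene that quotes the column name
plot_gene_backtick <- function(df, gene) {
  # Wrap the name in backticks so it is parsed as a single symbol
  quoted <- paste0("`", gene, "`")
  ggplot(df, aes(x, y)) +
    geom_point(aes_string(col = quoted)) +
    scale_color_gradient()
}

df <- data.frame(c(1, 2), c(2, 1), c(1, 2), c(2, 1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")

p_backtick <- plot_gene_backtick(df, "H2-Aa")
print(p_backtick)
```

Names that are already valid, such as "Gapdh", should be unaffected by the extra backticks.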
Alternatively, you could treat it as a symbol from the get-go and use aes_q with as.name(gene) instead.
The latest release of ggplot2 supports escaping via !! rather than using aes_string or aes_q, so you could do aes(col = !!rlang::sym(gene)) instead.
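The two symbol-based options might look like the sketch below. The function names plot_gene_q and plot_gene_sym are invented for this example. The second version assumes ggplot2 3.0.0 or later and the rlang package (installed alongside ggplot2). aes_q is deprecated in recent releases and may warn.

```r
library(ggplot2)

# Option 1 (hypothetical name): build a symbol and hand it to aes_q
plot_gene_q <- function(df, gene) {
  ggplot(df, aes(x, y)) +
    geom_point(aes_q(col = as.name(gene))) +
    scale_color_gradient()
}

# Option 2 (hypothetical name): unquote a symbol inside a regular aes()
plot_gene_sym <- function(df, gene) {
  ggplot(df, aes(x, y)) +
    geom_point(aes(col = !!rlang::sym(gene))) +
    scale_color_gradient()
}

df <- data.frame(c(1, 2), c(2, 1), c(1, 2), c(2, 1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")

p_q <- plot_gene_q(df, "H2-Aa")
p_sym <- plot_gene_sym(df, "H2-Aa")
print(p_sym)
```

Both avoid parsing the string as an expression: the name is turned directly into a symbol, so the dash never gets interpreted as subtraction.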