R: Legend color according to factor levels

Posted 2019-07-27 06:39

Question:

This is inspired by this question, where the top answer apparently uses an unsafe/erroneous way to add colors to the legend of a scatter plot.

The top answer suggests doing this:

data<-iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
legend(7,4.3,unique(data$Species),col=1:length(data$Species),pch=1)

Comments suggest using levels() instead of unique() for controlling the text and colors in the call to legend(), but are unclear on why it would help. I would need a better explanation to trust that code.

How can I write code that guarantees proper coloring?

Answer 1:

A solution I've found is:

data <- iris
# Create a translation table that couples each species to a color
colorcode <- data.frame(
  colorsMy = c("red", "green", "blue"),
  species = levels(data$Species),
  stringsAsFactors = FALSE)
# Make a vector with the color of each point in the scatter
iriscolors <- sapply(data$Species,  # translate species to colors according to colorcode
                     function(x) colorcode$colorsMy[colorcode$species == x])
# Plot the scatter using the color vector constructed according to colorcode
plot(data$Sepal.Length, data$Sepal.Width, col = iriscolors, pch = 19)
# Since iriscolors was built from colorcode, colorcode can be reused for the legend
legend("bottomright", legend = colorcode$species, fill = colorcode$colorsMy)

This code is a bit bulky, but easy to follow and explicitly constructs correct color labeling in the legend. The "trick" is to create the colorcode variable that serves as a translation table between levels of the factor (iris species in this case) and colors for the legend.
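
As for why the comments recommend levels() over unique(), here is a minimal sketch of my understanding, not taken from the original answer. It assumes that col = data$Species is resolved through the factor's integer codes, which index into levels(data$Species), while unique() returns values in order of first appearance. Reversing the rows of iris (an arbitrary choice made only for this illustration) makes the two orders differ:

```r
# Reverse the rows so the order of first appearance differs from the level order
data <- iris[nrow(iris):1, ]

# A factor stores integer codes; code i refers to levels(data$Species)[i]
codes <- as.integer(data$Species)
stopifnot(identical(levels(data$Species)[codes], as.character(data$Species)))

# unique() follows the data order, levels() follows the factor definition
appearance_order <- as.character(unique(data$Species))
level_order <- levels(data$Species)
print(appearance_order)
print(level_order)

# Points are colored by code, so legend entry i must be level i with color i
plot(data$Sepal.Length, data$Sepal.Width, col = codes)
legend("topright", legend = level_order,
       col = seq_along(level_order), pch = 1)
```

With the legend text taken from levels() and the colors from seq_along() over those levels, legend entry i and color i refer to the same level no matter how the rows are ordered. Using unique() for the text would, under the assumption above, attach the labels to the wrong colors whenever the data is not sorted by level.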