Force ggplot2 to apply a separate legend for each

Posted 2020-03-31 05:49

Question:

In the chart produced by the code below, the triangles drawn by the two geom_point layers end up in the same legend. I would like each geom_ to have its own separate legend instead. How would I go about doing so?

Here's my code for reproducibility:

mydf <- data.frame(year = c(rep(2000, 3), rep(2002, 3), rep(2004, 3), rep(2006, 3), rep(2008, 3), rep(2010, 3), rep(2012, 3), rep(2014, 3), rep(2016, 3)),
                 answer = rep(c("A great deal", "Hardly any", "Only some"), 9),
                 result = c(0.3015940, 0.1399303, 0.5584757, 0.2269548, 0.1792754, 0.5937698, 0.2955301, 0.1309859, 0.5734840, 0.3008197, 0.1344499,
                            0.5647303, 0.1919454, 0.2026290, 0.6054256, 0.1059793, 0.4190533, 0.4749674, 0.1190636, 0.3631279, 0.5178085, 0.1518314,
                            0.3181203, 0.5300483, 0.1424715, 0.3094615, 0.5480669))
mydf$year <- factor(mydf$year)
mydf$answer <- factor(mydf$answer)
triangle_up <- data.frame(year = c(2004, 2008, 2010),
                        direction = c("A great deal", "Hardly any", "Hardly any"),
                        result = c(0.2955301, 0.2026290, 0.4190533))
triangle_up$year <- factor(triangle_up$year)
triangle_up$direction <- factor(triangle_up$direction)
triangle_down <- data.frame(year = c(2002, 2008, 2010, 2010, 2012),
                            direction = c(rep("A great deal", 3), "Only some", "Hardly any"),
                            result = c(0.2269548, 0.1919454, 0.1059793, 0.4749674, 0.3631279))
triangle_down$year <- factor(triangle_down$year)
triangle_down$direction <- factor(triangle_down$direction)

library(ggplot2)

ggplot(mydf, aes(x = year, y = result)) +
  geom_line(aes(colour = answer, group = answer)) +
  geom_point(data = triangle_up, aes(x = year, y = result, group = direction, fill = direction), shape = 24, size = 3) +
  geom_point(data = triangle_down, aes(x = year, y = result, group = direction, fill = direction), shape = 25, size = 3)

Answer 1:

The issue here is the use of multiple data frames to store values. It would be better to add a column to the mydf data frame to store the direction variable. Confusingly, you have used answer as the variable name in mydf, but direction to store the same values in the other data frames.

So here is the new mydf with values "up", "down" or NA in the direction column:

mydf <- structure(structure(list(year = c(2000, 2000, 2000, 2002, 2002, 2002, 2004, 
2004, 2004, 2006, 2006, 2006, 2008, 2008, 2008, 2010, 2010, 2010, 
2012, 2012, 2012, 2014, 2014, 2014, 2016, 2016, 2016), answer = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("A great deal", 
"Hardly any", "Only some"), class = "factor"), result = c(0.301594, 
0.1399303, 0.5584757, 0.2269548, 0.1792754, 0.5937698, 0.2955301, 
0.1309859, 0.573484, 0.3008197, 0.1344499, 0.5647303, 0.1919454, 
0.202629, 0.6054256, 0.1059793, 0.4190533, 0.4749674, 0.1190636, 
0.3631279, 0.5178085, 0.1518314, 0.3181203, 0.5300483, 0.1424715, 
0.3094615, 0.5480669), direction = c(NA, NA, NA, "down", NA, 
NA, "up", NA, NA, NA, NA, NA, "down", "up", NA, "down", "up", 
"down", NA, "down", NA, NA, NA, NA, NA, NA, NA)), .Names = c("year", 
"answer", "result", "direction"), row.names = c(NA, -27L), class = 
"data.frame"))

Now you can plot with separate legends for direction and answer. Shapes are specified manually with scale_shape_manual, using breaks to omit the NA values from the legend. For line colour, we use scale_color_manual and override the legend aesthetics so that only lines, not shapes, are shown.

ggplot(mydf, aes(year, result)) + 
  geom_line(aes(group = answer, color = answer)) + 
  geom_point(aes(shape = direction, fill = answer), size = 3) + 
  scale_shape_manual(values = c(25, 24), breaks = c("down", "up")) + 
  scale_color_manual(values = c("red", "green", "blue"),
                     guide = guide_legend(override.aes = list(shape = rep(NA, 3)))) +
  theme_light()
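
A possible variant, sketched here under assumptions: rows whose direction is NA have no shape to draw, and geom_point typically drops them with a warning about missing values. Passing only the marked rows to the point layer avoids relying on that. The names excerpt and marked are hypothetical, the data are a six-row excerpt of mydf (years 2002 and 2004), and the fill mapping is left out to keep the sketch short.

```r
library(ggplot2)

# Six-row excerpt of mydf (years 2002 and 2004); values copied from above.
# `excerpt` and `marked` are illustrative names, not part of the answer.
excerpt <- data.frame(
  year      = factor(rep(c(2002, 2004), each = 3)),
  answer    = rep(c("A great deal", "Hardly any", "Only some"), 2),
  result    = c(0.2269548, 0.1792754, 0.5937698,
                0.2955301, 0.1309859, 0.5734840),
  direction = c("down", NA, NA, "up", NA, NA)
)

# Keep only the rows that carry an up/down marker
marked <- excerpt[!is.na(excerpt$direction), ]

p <- ggplot(excerpt, aes(year, result)) +
  # lines use every row, one line per answer
  geom_line(aes(group = answer, colour = answer)) +
  # points use only the marked rows, so no NA shapes reach the layer
  geom_point(data = marked, aes(shape = direction), size = 3) +
  # a named vector ties each shape code to its direction value
  scale_shape_manual(values = c(down = 25, up = 24)) +
  theme_light()

print(p)
```

Naming the shape values means the mapping no longer depends on the order of the levels, which is an assumption worth checking if more direction values are added later.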



Answer 2:

Answer 1 has demonstrated how to create the chart if the plot data are supplied in the proper format. However, that is only the second part of the story.

The answer below suggests a way for the OP to create the plot data from the original data.frames mydf, triangle_up, and triangle_down. The goal is to have all plot data in one data.frame.

Original data

The data are as provided by the OP, but without conversion to factors (this is done later when needed):

mydf <- data.frame(year = c(rep(2000, 3), rep(2002, 3), rep(2004, 3), rep(2006, 3), rep(2008, 3), rep(2010, 3), rep(2012, 3), rep(2014, 3), rep(2016, 3)),
                 answer = rep(c("A great deal", "Hardly any", "Only some"), 9),
                 result = c(0.3015940, 0.1399303, 0.5584757, 0.2269548, 0.1792754, 0.5937698, 0.2955301, 0.1309859, 0.5734840, 0.3008197, 0.1344499,
                            0.5647303, 0.1919454, 0.2026290, 0.6054256, 0.1059793, 0.4190533, 0.4749674, 0.1190636, 0.3631279, 0.5178085, 0.1518314,
                            0.3181203, 0.5300483, 0.1424715, 0.3094615, 0.5480669))
triangle_up <- data.frame(year = c(2004, 2008, 2010),
                        direction = c("A great deal", "Hardly any", "Hardly any"),
                        result = c(0.2955301, 0.2026290, 0.4190533))
triangle_down <- data.frame(year = c(2002, 2008, 2010, 2010, 2012),
                            direction = c(rep("A great deal", 3), "Only some", "Hardly any"),
                            result = c(0.2269548, 0.1919454, 0.1059793, 0.4749674, 0.3631279))

triangle_up
#  year    direction    result
#1 2004 A great deal 0.2955301
#2 2008   Hardly any 0.2026290
#3 2010   Hardly any 0.4190533

triangle_down
#  year    direction    result
#1 2002 A great deal 0.2269548
#2 2008 A great deal 0.1919454
#3 2010 A great deal 0.1059793
#4 2010    Only some 0.4749674
#5 2012   Hardly any 0.3631279

Apparently, triangle_up and triangle_down hold rows selected by some means from the master data.frame mydf: the data points that should be emphasized in the chart by plotting them with special symbols. Note that the OP has decided to rename the answer column to direction in both.

Preparing and joining the data

data.table is used for these tasks here.

library(data.table)   # CRAN version 1.10.4 used

# combine separate triangle data.frames into one data.table,
# thereby adding an id column to indicate origin of each row
triangles <- rbindlist(list(triangle_up, triangle_down), idcol = "arrow")
# turn id col into factor, giving meaningful labels
triangles[, arrow := factor(arrow, labels = c("up", "down"))]

triangles
#   arrow year    direction    result
#1:    up 2004 A great deal 0.2955301
#2:    up 2008   Hardly any 0.2026290
#3:    up 2010   Hardly any 0.4190533
#4:  down 2002 A great deal 0.2269548
#5:  down 2008 A great deal 0.1919454
#6:  down 2010 A great deal 0.1059793
#7:  down 2010    Only some 0.4749674
#8:  down 2012   Hardly any 0.3631279

# rename direction column to make join easier
setnames(triangles, "direction", "answer")

# coerce mydf to data.table and right join with triangles, i.e.,
# result contains all rows of mydf
mydf2 <- triangles[setDT(mydf), on = .(year, answer, result)]
# turn year into factor to have ggplot using a discrete scale
mydf2[, year := factor(year)]

Finally, all data required for plotting are combined in one data.table:

mydf2
#    arrow year       answer    result
# 1:    NA 2000 A great deal 0.3015940
# 2:    NA 2000   Hardly any 0.1399303
# 3:    NA 2000    Only some 0.5584757
# 4:  down 2002 A great deal 0.2269548
# 5:    NA 2002   Hardly any 0.1792754
# 6:    NA 2002    Only some 0.5937698
# 7:    up 2004 A great deal 0.2955301
# 8:    NA 2004   Hardly any 0.1309859
# ...
#19:    NA 2012 A great deal 0.1190636
#20:  down 2012   Hardly any 0.3631279
#21:    NA 2012    Only some 0.5178085
#22:    NA 2014 A great deal 0.1518314
#23:    NA 2014   Hardly any 0.3181203
#24:    NA 2014    Only some 0.5300483
#25:    NA 2016 A great deal 0.1424715
#26:    NA 2016   Hardly any 0.3094615
#27:    NA 2016    Only some 0.5480669
#    arrow year       answer    result
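
For readers who prefer not to depend on data.table, the same combination can be sketched in base R with rbind and merge. This is an illustrative alternative, not the code used in this answer. It assumes each (year, answer) pair occurs only once in mydf, which holds for the data shown, so the join uses those two columns and avoids matching on the floating-point result column.

```r
# Original data from the question, kept as plain numeric/character columns
mydf <- data.frame(
  year   = rep(seq(2000, 2016, by = 2), each = 3),
  answer = rep(c("A great deal", "Hardly any", "Only some"), 9),
  result = c(0.3015940, 0.1399303, 0.5584757, 0.2269548, 0.1792754, 0.5937698,
             0.2955301, 0.1309859, 0.5734840, 0.3008197, 0.1344499, 0.5647303,
             0.1919454, 0.2026290, 0.6054256, 0.1059793, 0.4190533, 0.4749674,
             0.1190636, 0.3631279, 0.5178085, 0.1518314, 0.3181203, 0.5300483,
             0.1424715, 0.3094615, 0.5480669),
  stringsAsFactors = FALSE
)
triangle_up <- data.frame(
  year      = c(2004, 2008, 2010),
  direction = c("A great deal", "Hardly any", "Hardly any"),
  result    = c(0.2955301, 0.2026290, 0.4190533),
  stringsAsFactors = FALSE
)
triangle_down <- data.frame(
  year      = c(2002, 2008, 2010, 2010, 2012),
  direction = c(rep("A great deal", 3), "Only some", "Hardly any"),
  result    = c(0.2269548, 0.1919454, 0.1059793, 0.4749674, 0.3631279),
  stringsAsFactors = FALSE
)

# Label each set of rows with its origin, then stack them
triangle_up$arrow   <- "up"
triangle_down$arrow <- "down"
triangles <- rbind(triangle_up, triangle_down)

# Rename direction to answer so the key columns match mydf
names(triangles)[names(triangles) == "direction"] <- "answer"

# Left join on (year, answer): all.x = TRUE keeps every row of mydf,
# and arrow is NA where no triangle row matches
mydf2 <- merge(mydf, triangles[c("year", "answer", "arrow")],
               by = c("year", "answer"), all.x = TRUE)

# Turn year into a factor so ggplot uses a discrete scale
mydf2$year <- factor(mydf2$year)
```

The result should have the same 27 rows as mydf, with arrow filled in for the eight emphasized points, although the column order differs from the data.table output above.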

Plotting

To plot the data, the code from Answer 1 can be used with two modifications: the data source is mydf2 instead of mydf, and shape is mapped to arrow instead of direction:

ggplot(mydf2, aes(year, result)) + 
  geom_line(aes(group = answer, color = answer)) + 
  geom_point(aes(shape = arrow, fill = answer), size = 3) + 
  scale_shape_manual(values = c(25, 24), breaks = c("down", "up")) + 
  scale_color_manual(values = c("red", "green", "blue"),
                     guide = guide_legend(override.aes = list(shape = rep(NA, 3)))) +
  theme_light()


Tags: r ggplot2