I am creating fluctuation diagrams in ggplot2 using geom_tile, and would like to add a legend for size, but I am at a loss as to how to do so. Here is an MWE:
library(dplyr)
library(ggplot2)
# create data frame of total number of passengers in each Sex-Age group
df <- data.frame(Titanic) %>%
  group_by(Sex, Age) %>%
  summarise(freq = sum(Freq))
# calculate the lengths of the sides of the tiles so the largest has
# area = 1 and the others are smaller proportional to frequency
df$tileside <- sqrt(df$freq / max(df$freq))
df
## Source: local data frame [4 x 4]
## Groups: Sex [?]
##
##      Sex    Age  freq  tileside
##   (fctr) (fctr) (dbl)     (dbl)
## 1   Male  Child    64 0.1959396
## 2   Male  Adult  1667 1.0000000
## 3 Female  Child    45 0.1643003
## 4 Female  Adult   425 0.5049248
# using geom_tile, no size legend
ggplot(df, aes(x = Sex, y = Age,
               height = tileside, width = tileside)) +
  geom_tile() +
  coord_fixed(ratio = 1)
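As a sanity check on the tileside calculation above, here is a small base R sketch that confirms the tile areas are proportional to frequency. The names agg, side, area and ratio are just illustrative and not part of the MWE.

```r
# Base R only; rebuilds the Sex-Age totals without dplyr
agg <- aggregate(Freq ~ Sex + Age, data = as.data.frame(Titanic), FUN = sum)

side <- sqrt(agg$Freq / max(agg$Freq))  # same formula as tileside above
area <- side^2                          # area of each square tile

# If area is proportional to frequency, area / Freq is the same constant
# (1 / max frequency) for every group
ratio <- area / agg$Freq
print(ratio)
```

If every entry of ratio is the same, the largest tile has area 1 and the others shrink in proportion to frequency, which is the property I want to keep.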
I should mention that one alternative would be to use geom_point instead of geom_tile (see this post: https://stats.stackexchange.com/questions/56322/graph-for-relationship-between-two-ordinal-variables/56357#56357).
Here is an MWE of this approach:
ggplot(df, aes(x = Sex, y = Age, size = freq)) +
  geom_point(shape = 15) +
  coord_fixed(ratio = 1)
The problem is that the squares are too small, and if I rescale them with scale_size(), I lose the most important feature of the fluctuation diagram: that the area of the squares is proportional to frequency. (I’m not sure this condition is met even without rescaling, since it’s hard to tell how the area is calculated.)
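One thing that may be relevant: as far as I can tell from the ggplot2 documentation, scale_size() maps values to area, but over a default range that does not start at zero, whereas scale_size_area() maps a value of 0 to a size of 0. If that reading is right, scale_size_area() should keep area proportional to frequency while letting me enlarge the squares. A minimal sketch under that assumption follows; max_size = 40 is an arbitrary value I picked for illustration, and agg is just a base R rebuild of the data frame above.

```r
library(ggplot2)

# Rebuild the Sex-Age totals with base R so the snippet is self-contained
agg <- aggregate(Freq ~ Sex + Age, data = as.data.frame(Titanic), FUN = sum)

# scale_size_area() maps 0 to size 0, so area should stay proportional to Freq;
# max_size = 40 is an arbitrary illustrative value, not a tuned one
p <- ggplot(agg, aes(x = Sex, y = Age, size = Freq)) +
  geom_point(shape = 15) +
  scale_size_area(max_size = 40) +
  coord_fixed(ratio = 1)
print(p)
```

This does produce a size legend, but I have not verified that the drawn squares match the geom_tile version exactly.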
Many thanks for your help.