When do you normally use factor to encode variables as colour or size in ggplot2 in R? Example:
ggplot(mtcars) + geom_point(aes(x=mpg, y=drat, colour=gear))
versus:
ggplot(mtcars) + geom_point(aes(x=mpg, y=drat, colour=factor(gear)))
Is the general rule to use factor when the variable that determines the shape/size/colour is discrete rather than continuous? Or is there another use of factor in this context? It seems the first command could be made to look like the second with the right legend, even without factor. Thanks.
Edit: I get this when I use colour=gear:
The issue isn't the legend, it's the choice of colors. When it is not a factor, the points are different shades of the same hue:
This communicates a continuum of points, and it's thus not ideal for a set of separate possibilities. (Indeed, once you get to five or six possibilities the colors can be hard to distinguish from each other).
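To see why ggplot2 picks a gradient here, it can help to check how the column is stored. This is a minimal sketch using only base R and the built-in mtcars data:

```r
# gear is stored as a plain number, so ggplot2 treats it as continuous
class(mtcars$gear)
#> [1] "numeric"

# ...even though it only ever takes three distinct values
sort(unique(mtcars$gear))
#> [1] 3 4 5
```

A numeric column with only a handful of distinct values is usually a good candidate for factor when it is mapped to colour or shape.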
When gear is treated as a factor, the colors are chosen to be distinguishable:
Note that if you're not getting a gradient plot when not using factor, you should try upgrading to a more recent version of ggplot2.
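If you would rather not write factor(gear) inside every aes() call, one option is to convert the column once in a copy of the data and set the legend title yourself. This is only a sketch: the names mtcars2 and p and the legend title "Gears" are made up for illustration.

```r
library(ggplot2)

# Convert gear to a factor once, in a copy of the data (mtcars2 is a hypothetical name)
mtcars2 <- transform(mtcars, gear = factor(gear))

# gear is now discrete, so each level gets its own distinguishable hue
p <- ggplot(mtcars2) +
  geom_point(aes(x = mpg, y = drat, colour = gear)) +
  labs(colour = "Gears")  # legend title, chosen for illustration

print(p)
```

Converting in the data also keeps the legend title as gear by default, rather than factor(gear).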