When to use factor() when plotting with ggplot in R

Posted 2020-07-10 03:23

When do you normally use factor to color/size encode variables in ggplot2 in R? Example:

ggplot(mtcars) + geom_point(aes(x=mpg, y=drat, colour=gear))

versus:

ggplot(mtcars) + geom_point(aes(x=mpg, y=drat, colour=factor(gear)))

Is the general rule to use factor when the variable that determines the shape/size/colour is discrete rather than continuous? Or is there another use of factor in this context? It seems the first command could be made to look like the second with the right legend, even without factor. Thanks.

Edit: I get this when I use colour=gear: [image]

Tags: r ggplot2
1 answer
欢心
Reply #2 · 2020-07-10 03:54

The issue isn't the legend, it's the choice of colors. When it is not a factor, the points are different shades of the same hue:

ggplot(mtcars) + geom_point(aes(x=mpg, y=drat, colour=gear))


This suggests a continuum of values, so it is not ideal for a set of separate categories. (Indeed, once you get to five or six categories, the shades can be hard to tell apart.)
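To see why ggplot2 falls back to a gradient here, it can help to look at how gear is stored. This is only a small base-R check on the built-in mtcars data, not something ggplot2 itself requires you to run:

```r
# gear is stored as a plain numeric column, so it is treated as continuous
gear_class <- class(mtcars$gear)

# yet it only ever takes three distinct values
gear_values <- sort(unique(mtcars$gear))

# factor() turns those three values into discrete levels
gear_levels <- levels(factor(mtcars$gear))

print(gear_class)
print(gear_values)
print(gear_levels)
```

A numeric column with only a handful of distinct values is usually a sign that it is really a category, which is the case where factor() is the better choice.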

When gear is treated as a factor, the colors are chosen to be distinguishable:

ggplot(mtcars) + geom_point(aes(x=mpg, y=drat, colour=factor(gear)))

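If the variable is categorical everywhere you use it, one option is to convert it once in the data rather than inside every aes() call. A minimal sketch, assuming you are happy to work on a copy (the name cars is just an illustrative choice):

```r
library(ggplot2)

# Work on a copy so the built-in mtcars data set is left untouched
cars <- mtcars
cars$gear <- factor(cars$gear)   # convert once, up front

# colour = gear now gets a discrete palette without calling factor() here
p <- ggplot(cars) +
  geom_point(aes(x = mpg, y = drat, colour = gear)) +
  labs(colour = "gear")          # explicit legend title
p
```

This also keeps the legend title as "gear" instead of "factor(gear)".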

Note that if you're not getting a gradient plot when not using factor, you should try upgrading to a more recent version of ggplot2.
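On the "another use of factor" part of the question: for colour the choice mostly affects the palette, but the shape aesthetic has no continuous scale at all, so a numeric variable has to be made discrete before it can be mapped to shape. A short sketch along the same lines as the commands above:

```r
library(ggplot2)

# shape = gear (numeric) fails when the plot is drawn;
# wrapping it in factor() gives one point shape per gear value
p_shape <- ggplot(mtcars) +
  geom_point(aes(x = mpg, y = drat, shape = factor(gear)))
p_shape
```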
