I am making a scatter plot of two variables and would like to colour the points by a factor variable. Here is some reproducible code:
data <- iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
This is all well and good, but how do I know which factor level has been given which colour?
The command `palette()` tells you the colours, and their order, that are used when `col = somefactor`. It can also be used to set the colours. To see that mapping in your graph, you could use a legend.
You'll notice that I only specified the colours in the legend with 3 numbers. This works just like using the factor. I could also have used the factor that was originally used to colour the points, which would make everything flow together logically, but I just wanted to show that you can use a variety of things.
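A minimal sketch of the palette-plus-legend approach, using the iris data from the question. The specific colours, the `"topright"` legend position and `pch = 1` are assumptions chosen for illustration; the key point is that the legend gets `col = 1:3`, the same palette indices that the factor's levels map to.

```r
data <- iris

# With no arguments, palette() returns the colours that integer codes
# (and therefore factor levels) are mapped to, in order
print(palette())

# Set the palette explicitly; palette(rainbow(3)) would also work.
# palette(value) invisibly returns the previous palette.
old_pal <- palette(c("red", "green3", "blue"))

# setosa, versicolor, virginica have codes 1, 2, 3 -> palette entries 1, 2, 3
plot(data$Sepal.Length, data$Sepal.Width, col = data$Species, pch = 1,
     xlab = "Sepal.Length", ylab = "Sepal.Width")

# The legend uses the same three palette indices, so it matches the points
legend("topright", legend = levels(data$Species), col = 1:3, pch = 1)

# Restore the palette that was active before
palette(old_pal)
```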
You could also be specific about the colours. Try ?rainbow for starters and go from there. You can specify your own colours or have R generate them for you. As long as you use the same method for the points and the legend, you're OK.
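A sketch of the lattice approach described next, assuming the lattice package (installed with R as a recommended package). `auto.key = list(space = "right")` builds a legend on the right side from the `groups` factor, and `jitter.x`/`jitter.y` are passed through to `panel.xyplot` to jitter overlapping points, using its default jitter amount.

```r
library(lattice)

# groups = Species colours the points by factor level; auto.key creates a
# legend whose colours match the groups, placed on the right side
p <- xyplot(Sepal.Width ~ Sepal.Length, data = iris,
            groups = Species,
            auto.key = list(space = "right"),
            # passed on to panel.xyplot to jitter overlapping points
            jitter.x = TRUE, jitter.y = TRUE)

# lattice plots are objects; printing one draws it
print(p)
```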
The lattice library is another good option. In my version I added a legend on the right side and jittered the points because some of them overlapped.
There are two ways that I know of to color plot points by factor and then also have a corresponding legend generated automatically. I'll give examples of both:
1. Using ggplot2 and qplot
2. Using R's built-in plot function together with a colorRampPalette function (trickier, but many people prefer or need R's built-in plotting facilities)
For both examples, I will use the ggplot2 diamonds dataset. We'll be using the numeric columns `diamonds$carat` and `diamonds$price`, and the factor (categorical) column `diamonds$color`. You can load the dataset with the following code if you have ggplot2 installed:
Using ggplot2 and qplot
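Loading the data and making the qplot one-liner might look like this, assuming ggplot2 is installed. `diamonds` ships with ggplot2. Newer ggplot2 releases may warn that `qplot()` is deprecated, so the commented line shows an equivalent `ggplot()` call.

```r
library(ggplot2)

# diamonds is a data set bundled with ggplot2
data(diamonds)
str(diamonds[, c("carat", "price", "color")])

# Passing the factor as the color argument colours the points by level,
# and qplot adds a matching legend automatically
p <- qplot(carat, price, data = diamonds, color = color)
print(p)

# Equivalent call with ggplot():
# ggplot(diamonds, aes(carat, price, color = color)) + geom_point()
```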
It's a one-liner. The key item here is to give `qplot` the factor you want to color by as the `color` argument. `qplot` will make a legend for you by default. Your output should look like this:
Using R's built-in plot functionality
Using R's built-in plot functionality to get a plot colored by a factor and an associated legend is a 4-step process, and it's a little more technical than using ggplot2.
First, we will make a colorRampPalette function. `colorRampPalette()` returns a new function that generates a vector of colors interpolated between the colors you give it. For example, if `color_pallet_function` is built from red, orange and blue, calling `color_pallet_function(5)` returns a vector of 5 colors on a scale from red to orange to blue.
Second, we need to make a vector of colors with exactly one color per diamond color. This is the mapping we will use both to assign colors to individual plot points and to create our legend.
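Steps one and two can be sketched as follows, assuming ggplot2 is installed for the `diamonds` data. `color_pallet_function` is the name used in the text; `num_colors` and `diamond_color_colors` are illustrative names.

```r
library(ggplot2)  # only needed for the diamonds data

# Step 1: colorRampPalette() returns a *function*; calling it with n
# returns n colours interpolated from red through orange to blue
color_pallet_function <- colorRampPalette(colors = c("red", "orange", "blue"))
print(color_pallet_function(5))

# Step 2: exactly one colour per level of diamonds$color (D, E, ..., J)
num_colors <- nlevels(diamonds$color)
diamond_color_colors <- color_pallet_function(num_colors)
print(diamond_color_colors)
```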
Third, we create our plot. This is done just like any other plot you've likely made, except that we use the vector of colors we made for our `col` argument, indexed by `diamonds$color` so that each point gets the color of its level. As long as we always use this same vector, our mapping between colors and `diamonds$color` will be consistent across our R script.
Fourth and finally, we add our legend so that someone reading our graph can clearly see the mapping between the plot point colors and the actual diamond colors.
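A self-contained sketch of all four steps, assuming ggplot2 is installed for the `diamonds` data. The legend position, `pch = 20` and the axis labels are arbitrary choices; `point_colors` is an illustrative name.

```r
library(ggplot2)  # only needed for the diamonds data

# Steps 1-2: one colour per level of diamonds$color
color_pallet_function <- colorRampPalette(c("red", "orange", "blue"))
diamond_color_colors <- color_pallet_function(nlevels(diamonds$color))

# Step 3: index the colour vector by the factor's integer codes so each
# point gets the colour of its diamond color level
point_colors <- diamond_color_colors[as.integer(diamonds$color)]
plot(x = diamonds$carat, y = diamonds$price,
     xlab = "Carat", ylab = "Price",
     pch = 20, col = point_colors)

# Step 4: the legend reuses the same colour vector, in level order
legend(x = "topleft", legend = levels(diamonds$color),
       col = diamond_color_colors, pch = 20, title = "Color")
```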
Your output should look like this:
Nifty, right?
should do it for you. But I prefer
ggplot2 and would suggest that for better graphics in R.
Like Maiasaura, I prefer ggplot2. Its transparent reference manual is one of the reasons. However, this is one quick way to get it done. And because someone famous said that plot-related posts are not complete without the plot, here's the result:
Here are a couple of references: the qplot.R example (note that it uses the same diamonds dataset I use, but subsets the data first to get better performance), the book at http://ggplot2.org/book/ and the manual at http://docs.ggplot2.org/current/
The `col` argument in the `plot` function treats a vector of integers as indices into the current palette, so colors are assigned automatically. If you convert `iris$Species` to numeric, notice that you get a vector of 1s, 2s and 3s, so you can apply this as:
Suppose you want red, blue and green instead of the default colors; then you can simply adjust it:
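A minimal sketch of both steps on the iris data; `species_codes` and `my_colors` are illustrative names, and the legend with `pch = 1` is an addition so the custom colours can be read off the plot.

```r
# as.numeric() on a factor gives its level codes:
# 1 = setosa, 2 = versicolor, 3 = virginica
species_codes <- as.numeric(iris$Species)
print(table(species_codes))

# Integer col values index the current palette()
plot(iris$Sepal.Length, iris$Sepal.Width, col = species_codes)

# Custom colours: index your own vector by the same codes
my_colors <- c("red", "blue", "green")
plot(iris$Sepal.Length, iris$Sepal.Width, col = my_colors[species_codes])
legend("topright", legend = levels(iris$Species), col = my_colors, pch = 1)
```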
You can probably see how to further modify the code above to get any unique combination of colors.