I saved my data as a .csv file with 12 columns. Columns two through 12 (labeled F1, F2, ..., F11) are features. Column one contains the label of these features, either good or bad.
I would like to plot a boxplot of all these 11 features against the label, but separated by good or bad. My code so far is:
    qplot(Label, F1, data=testData, geom = "boxplot", fill=Label,
          binwidth=0.5, main="Test") + xlab("Label") + ylab("Features")
However, this only shows F1 against the label.
My question is: how can I show F2, F3, ..., F11 against the label in one graph with some dodge position? I have normalized the features so they are on the same scale, within the [0, 1] range.
The test data can be found here. I have drawn something by hand to explain the problem (see below).
Since you don't mention a plot package, I propose here a lattice version (I think there are more ggplot2 answers than lattice ones, at least since I have been here on SO).
I know this is a bit of an older question, but it is one I had as well, and while the accepted answers work, there is a way to do something similar without using additional packages like ggplot or lattice. It isn't quite as nice in that the boxplots overlap rather than showing side by side, but:
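A minimal sketch of this base R overlay, assuming two hypothetical data frames `df1` and `df2` that share the same numeric columns (the names and the random data are made up for illustration):

```r
# Hypothetical example data: two data frames with the same four numeric columns
set.seed(1)
df1 <- as.data.frame(matrix(runif(80), ncol = 4))
df2 <- as.data.frame(matrix(runif(80, min = 0.2, max = 1), ncol = 4))
names(df1) <- names(df2) <- paste0("F", 1:4)

# First set of boxplots, one box per column, with default styling
boxplot(df1, ylim = c(0, 1))

# Second set drawn on top of the first: red outline and red outlier points.
# col = NA keeps these boxes unfilled whatever the default fill colour is;
# axes = FALSE avoids drawing the axes a second time.
boxplot(df2, add = TRUE, border = "red", col = NA, axes = FALSE)
```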
This puts in two sets of boxplots, with the second having an outline (no fill) in red, and also puts the outliers in red. The nice thing is, it works for two different dataframes rather than trying to reshape them. Quick and dirty way.
You should get your data into a specific format by melting it (see below for what melted data looks like) before you plot. Otherwise, what you have done seems to be okay.
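As a rough sketch of the melt-then-plot idea, assuming `reshape2` and `ggplot2` are installed: the data frame below is a made-up stand-in for the CSV (only three features, random values), so swap in your own `read.csv()` call.

```r
library(reshape2)  # provides melt()
library(ggplot2)

# Hypothetical stand-in for the CSV: a Label column plus features in [0, 1]
set.seed(1)
df <- data.frame(
  Label = rep(c("Good", "Bad"), each = 10),
  F1 = runif(20), F2 = runif(20), F3 = runif(20)
)

# Melt to long format: one row per (Label, feature, value) combination
df.m <- melt(df, id.vars = "Label")
head(df.m, 3)  # columns: Label, variable, value

# One box per feature and label; boxes for the same feature are dodged
# side by side because fill is mapped to Label
p <- ggplot(df.m, aes(x = variable, y = value, fill = Label)) +
  geom_boxplot()
print(p)
```

With the real data the same two calls should extend to all of F1 through F11, since `melt()` treats every non-id column as a measured variable.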
Edit: I realise that you might need to facet. Here's an implementation of that as well:
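A possible faceted variant, again using a made-up data frame in place of the CSV. The `scales = "free"` argument is my assumption about what is wanted here; drop it if all panels should share the same axis.

```r
library(reshape2)
library(ggplot2)

# Hypothetical stand-in data (the real file has F1 ... F11)
set.seed(1)
df <- data.frame(
  Label = rep(c("Good", "Bad"), each = 10),
  F1 = runif(20), F2 = runif(20), F3 = runif(20)
)
df.m <- melt(df, id.vars = "Label")

# Same dodged boxplots, but each feature gets its own panel
p <- ggplot(df.m, aes(x = variable, y = value, fill = Label)) +
  geom_boxplot() +
  facet_wrap(~ variable, scales = "free")
print(p)
```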
Edit 2: How to add x-labels, y-labels, title, change legend heading, add a jitter?
Edit 3: How to align `geom_point()` points to the center of the box-plot? It could be done using `position_dodge`. This should work.
In base R, a formula interface with interactions (`:`) can be used to achieve this.
Using base graphics, we can use `at =` to control box position, combined with `boxwex =` for the width of the boxes. The 1st `boxplot` statement creates a blank plot. Then add the 2 traces in the following two statements.
Note that in the following, we use `df[,-1]` to exclude the 1st (id) column from the values to plot. With different data frames, it may be necessary to change this to subset for whichever columns contain the data you want to plot.
ggplot version of the lattice plot:
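A sketch of what such a ggplot version could look like, assuming the lattice plot shows one panel per feature with the label on the x-axis. The data frame is a hypothetical stand-in for the CSV, and `ncol = 4` is just an illustrative layout choice.

```r
library(reshape2)
library(ggplot2)

# Hypothetical stand-in data (the real file has F1 ... F11)
set.seed(1)
df <- data.frame(
  Label = rep(c("Good", "Bad"), each = 10),
  F1 = runif(20), F2 = runif(20), F3 = runif(20), F4 = runif(20)
)
df.m <- melt(df, id.vars = "Label")

# Label on the x-axis, one panel per feature (assumed layout of the lattice plot)
p <- ggplot(df.m, aes(x = Label, y = value)) +
  geom_boxplot() +
  facet_wrap(~ variable, ncol = 4)
print(p)
```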
Plot: