I'm having difficulty drawing a parallel coordinates plot with `ggparcoord` from the GGally package. I have two categorical variables, and what I want to show in the visualisation is like the image below. I've found that in `ggparcoord`, `groupColumn` accepts only a single variable to group (colour) by. I can use `showPoints` to mark the values on the axes, but I also need to vary the shape of these markers according to the categorical variables. Is there another package that can help me realise this idea?

Any response will be appreciated! Thanks!

It's not that difficult to roll your own parallel coordinates plot in ggplot2, which will give you the flexibility to customize the aesthetics. Below is an illustration using the built-in `diamonds` data frame.

To get parallel coordinates, you need to add an `ID` column so you can identify each row of the data frame, which we'll use as a `group` aesthetic in ggplot. You also need to `scale` the numeric values so that they'll all be on the same vertical scale when we plot them. Then you need to take all the columns that you want on the x-axis and reshape them to "long" format. We do all that on the fly below with the tidyverse/dplyr pipe operator.

Even after limiting the number of category combinations, the lines are probably too intertwined for this plot to be easily interpretable, so consider this merely a "proof of concept". Hopefully, you can create something more useful with your data. I've used `colour` (for the lines) and `fill` (for the points) aesthetics below. You can use `shape` or `linetype` instead, depending on your needs.

I haven't used `ggparcoord` before, but the only option that seemed straightforward (at least on my first try with the function) was to paste together two columns of data. Below is an example. Even with just four category combinations, the plot is confusing, but maybe it will be interpretable if there are strong patterns in your data:
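
Roughly, the two approaches might look like the sketch below. The choice of numeric columns, the two `cut` and two `color` levels, the 25 rows sampled per combination, the seed, and the helper names (`d`, `long`, `d2`, `cut_color`) are all my assumptions for illustration, not anything specific to your data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(GGally)

set.seed(2)  # arbitrary seed, only so the random sample is reproducible

num_cols <- c("carat", "depth", "table", "price")  # assumed x-axis variables

# Assumed subset: four category combinations, 25 random rows each
d <- diamonds %>%
  filter(cut %in% c("Fair", "Ideal"), color %in% c("D", "J")) %>%
  group_by(cut, color) %>%
  slice_sample(n = 25) %>%
  ungroup() %>%
  droplevels()

# Approach 1: roll your own in ggplot2
long <- d %>%
  mutate(ID = row_number()) %>%                                   # one ID per row, used as the group aesthetic
  mutate(across(all_of(num_cols), ~ as.numeric(scale(.x)))) %>%   # put every axis on a common vertical scale
  pivot_longer(all_of(num_cols),
               names_to = "variable", values_to = "value") %>%    # reshape to "long" format
  mutate(variable = factor(variable, levels = num_cols))          # keep the axis order

p1 <- ggplot(long, aes(variable, value, group = ID)) +
  geom_line(aes(colour = cut), alpha = 0.5) +           # first categorical variable on the lines
  geom_point(aes(fill = color), shape = 21, size = 2) + # second categorical variable on the points
  labs(x = NULL, y = "Scaled value")

# Approach 2: ggparcoord with the two categorical columns pasted together
d2 <- d %>%
  mutate(cut_color = factor(paste(cut, color, sep = "_"))) %>%
  select(all_of(num_cols), cut_color) %>%
  as.data.frame()

p2 <- ggparcoord(d2, columns = 1:4, groupColumn = 5,
                 showPoints = TRUE, alphaLines = 0.5)
```

Printing `p1` or `p2` draws the corresponding plot. To vary the marker shape by the second variable in the ggplot2 version, one option is to map `shape = color` inside `aes()` in `geom_point()` and drop the fixed `shape = 21`.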