I have a graph with 12 variables divided into two groups. I can't use facets, but using colour and shape, I have been able to make the visualization easy to understand. However, there are some points that overlap (partially or wholly). I am using jitter to deal with these, but as you can see from the attached graph, this leads to all points being moved around, not just those with overlap.
Is there a way to use jitter or dodge conditionally? Even better, is there a way to put the partially overlapping points side-by-side? As you can see, my x-axis is discrete categories, and a slight shift to left/right won't matter. I tried using dotplot with binaxis='y', but that completely spoils the x-axis.
Edit: This graph has managed to do exactly what I am searching for.
Further edit: Adding the code behind this visualization.
library(ggplot2)   # plotting
library(reshape2)  # provides melt()
disciplines <- c("Comp. Sc.\n(17.2%)", "Physics\n(19.6%)", "Maths\n(29.4%)", "Pol.Sc.\n(40.4%)", "Psychology\n(69.8%)")
# To stop ggplot from imposing alphabetical ordering on x-axis
disciplines <- factor(disciplines, levels=disciplines, ordered=TRUE)
# involved aspects
intensive <- c( 0.660, 0.438, 0.515, 0.028, 0.443)
comparative <- c( 0.361, 0.928, 0.270, 0.285, 0.311)
wh_adverbs <- c( 0.431, 0.454, 0.069, 0.330, 0.577)
past_tense <- c(0.334, 0.229, 0.668, 0.566, 0.838)
present_tense <- c(0.680, 0.408, 0.432, 0.009, 0.996)
conjunctions <- c( 0.928, 0.207, 0.162, -0.299, -0.045)
personal <- c(0.498, 0.521, 0.332, 0.01, 0.01)
interrogative <- c(0.266, 0.202, 0.236, 0.02, 0.02)
sbj_objective <- c(0.913, 0.755, 0.863, 0.803, 0.913)
possessive <- c(0.896, 0.802, 0.960, 0.611, 0.994)
thrd_person <- c(-0.244, -0.265, -0.310, -0.008, -0.384)
nouns <- c(-0.602, -0.519, -0.388, -0.244, -0.196)
df1 <- data.frame(disciplines,
"Intensive Adverbs"=intensive,
"Comparative Adverbs"=comparative,
"Wh-adverbs (WRB)"=wh_adverbs,
"Verb: Past Tense"=past_tense,
"Verb: Present Tense"=present_tense,
"Conjunctions"=conjunctions,
"Personal Pronouns"=personal,
"Interrogative Pronouns"=interrogative,
"Subjective/Objective Pronouns"=sbj_objective,
"Possessive Pronouns"=possessive,
"3rd-person verbs"=thrd_person,
"Nouns"=nouns,
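check.names=FALSE)
# Reshape to long format: one row per (discipline, variable) pair
df1.m <- melt(df1, id.vars="disciplines")
# The last two variables are informational features; the rest are involved features
grp <- ifelse(df1.m$variable %in% c('3rd-person verbs','Nouns'), 'Informational Features', 'Involved Features')
g <- ggplot(df1.m, aes(group=grp, disciplines, value, shape=grp, colour=variable))
g <- g + geom_hline(yintercept=0, size=9, color="white")
g <- g + geom_smooth(method=loess, span=0.75, level=0.95, alpha=I(0.16), linetype="dashed")
# Jitter horizontally only; this moves every point, not just the overlapping ones
g <- g + geom_point(size=4, alpha=I(0.7), position=position_jitter(width=0.1, height=0))
g <- g + scale_shape_manual(values=c(17,19))
# Draw the plot
print(g)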
I am curious what others might suggest, but to get the side-by-side effect, you could code the major x-axis categories as numbers (10, 20, ..., 50) and add or subtract a small amount, like (0..10)/2, based on the categories you are using for color. The x-axis values would then be 9.6, 9.8, 10.0, 10.2, ... and then 20.0, 20.2, 20.4, and so on. This gives an organized plot, because the fractional adjustments are assigned systematically instead of randomly.
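To make the arithmetic concrete, here is a minimal base-R sketch of that coding scheme. The names `major`, `offsets` and `x_pos` are made up for illustration, and the choice of five sub-categories with a step of 0.2 is an assumption read off the example values above (9.6, 9.8, 10.0, 10.2, ...), not something the question specifies.

```r
# Major x-axis categories coded as 10, 20, ..., 50 (one per discipline).
major <- seq(10, 50, by = 10)

# Assumption: five colour sub-categories, spaced 0.2 apart and centred on 0.
n_sub   <- 5
step    <- 0.2
offsets <- (seq_len(n_sub) - (n_sub + 1) / 2) * step

# One column per major category, one row per sub-category.
x_pos <- outer(offsets, major, "+")
print(x_pos)
```

Because the offset depends only on the sub-category, each colour lands in the same relative slot within every discipline.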
Here is a quick implementation of that idea for your data set. It offsets the main x variable `disciplines` by one sixth of the sub-category `variable` and uses that, without jitter, for the x value. Note that the values within each category are evenly spaced and occur in the same order. (This code doesn't include the curve fitting, etc., that is shown in the figure.)
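A sketch of what such an implementation could look like is below. It is not the answer's original snippet: the column name `xmod`, the centring of the offsets and the `ScaleFactor` value of 0.05 are assumptions chosen so that twelve sub-categories fit inside one category slot. The data are the question's, rebuilt in long format with base R so that `melt()` is not needed, and the plot is only drawn if ggplot2 is installed.

```r
# Question's data, rebuilt directly in long format.
disc_labels <- c("Comp. Sc.\n(17.2%)", "Physics\n(19.6%)", "Maths\n(29.4%)",
                 "Pol.Sc.\n(40.4%)", "Psychology\n(69.8%)")
vals <- list(
  "Intensive Adverbs"             = c(0.660, 0.438, 0.515, 0.028, 0.443),
  "Comparative Adverbs"           = c(0.361, 0.928, 0.270, 0.285, 0.311),
  "Wh-adverbs (WRB)"              = c(0.431, 0.454, 0.069, 0.330, 0.577),
  "Verb: Past Tense"              = c(0.334, 0.229, 0.668, 0.566, 0.838),
  "Verb: Present Tense"           = c(0.680, 0.408, 0.432, 0.009, 0.996),
  "Conjunctions"                  = c(0.928, 0.207, 0.162, -0.299, -0.045),
  "Personal Pronouns"             = c(0.498, 0.521, 0.332, 0.01, 0.01),
  "Interrogative Pronouns"        = c(0.266, 0.202, 0.236, 0.02, 0.02),
  "Subjective/Objective Pronouns" = c(0.913, 0.755, 0.863, 0.803, 0.913),
  "Possessive Pronouns"           = c(0.896, 0.802, 0.960, 0.611, 0.994),
  "3rd-person verbs"              = c(-0.244, -0.265, -0.310, -0.008, -0.384),
  "Nouns"                         = c(-0.602, -0.519, -0.388, -0.244, -0.196)
)
df_long <- data.frame(
  disciplines = factor(rep(disc_labels, times = length(vals)), levels = disc_labels),
  variable    = factor(rep(names(vals), each = length(disc_labels)), levels = names(vals)),
  value       = unlist(vals, use.names = FALSE)
)
df_long$grp <- ifelse(df_long$variable %in% c("3rd-person verbs", "Nouns"),
                      "Informational Features", "Involved Features")

# Assumed spacing between neighbouring sub-categories (hypothetical value).
ScaleFactor <- 0.05
n_var <- nlevels(df_long$variable)

# Deterministic x position: the discipline's integer code plus a centred,
# evenly spaced offset that depends only on the sub-category.
df_long$xmod <- as.numeric(df_long$disciplines) +
  ScaleFactor * (as.numeric(df_long$variable) - (n_var + 1) / 2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(df_long, aes(x = xmod, y = value, shape = grp, colour = variable)) +
    geom_point(size = 4, alpha = 0.7) +   # no position_jitter needed
    scale_shape_manual(values = c(17, 19))
  print(g)
}
```

With these settings every point stays within about 0.3 of its discipline's integer position, so the groups do not run into each other.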
Variation: You can see the effect even more clearly if you "quantize" your y values, so more of them plot side by side.
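One way to quantize, sketched here under assumptions, is to round each value to the nearest multiple of a bin width. The name `bin_width` and its value of 0.2 are hypothetical; `valmod` is the column name used in this answer. The sample values are two of the question's series (present tense and personal pronouns).

```r
# A few y values from the question's data.
df <- data.frame(
  value = c(0.680, 0.408, 0.432, 0.009, 0.996,
            0.498, 0.521, 0.332, 0.01, 0.01)
)

# Assumed bin width: round each value to the nearest multiple of 0.2.
bin_width <- 0.2
df$valmod <- round(df$value / bin_width) * bin_width

print(df)
```

A larger `bin_width` collapses more points onto the same y level, which makes the side-by-side placement easier to see at the cost of y precision.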
Then use `valmod` in place of `value` in the `aes()` statement to see the effect.
To get the category labels back, manually set them with `scale_x_discrete`. This version uses a different `ScaleFactor` for broader spacing and the quantized y axis:
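The following is a sketch of such a version rather than the original snippet. It uses three of the question's variables to stay short, and the `ScaleFactor` of 0.15 and `bin_width` of 0.2 are assumed values. One deliberate swap: because `xmod` is numeric here, the labels are restored with `scale_x_continuous(breaks = ..., labels = ...)` instead of `scale_x_discrete`, which is the call I am sure behaves this way in current ggplot2.

```r
disc_labels <- c("Comp. Sc.\n(17.2%)", "Physics\n(19.6%)", "Maths\n(29.4%)",
                 "Pol.Sc.\n(40.4%)", "Psychology\n(69.8%)")
vals <- list(
  "Conjunctions"           = c(0.928, 0.207, 0.162, -0.299, -0.045),
  "Personal Pronouns"      = c(0.498, 0.521, 0.332, 0.01, 0.01),
  "Interrogative Pronouns" = c(0.266, 0.202, 0.236, 0.02, 0.02)
)
df <- data.frame(
  disciplines = factor(rep(disc_labels, times = length(vals)), levels = disc_labels),
  variable    = factor(rep(names(vals), each = length(disc_labels)), levels = names(vals)),
  value       = unlist(vals, use.names = FALSE)
)

ScaleFactor <- 0.15   # assumed: broader spacing between sub-categories
bin_width   <- 0.2    # assumed: quantization step for the y axis
n_var <- nlevels(df$variable)

# Structured x offset and quantized y value.
df$xmod   <- as.numeric(df$disciplines) +
  ScaleFactor * (as.numeric(df$variable) - (n_var + 1) / 2)
df$valmod <- round(df$value / bin_width) * bin_width

# Put one tick at each discipline's integer position and label it.
x_breaks <- seq_along(disc_labels)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(df, aes(x = xmod, y = valmod, colour = variable)) +
    geom_point(size = 4, alpha = 0.7) +
    scale_x_continuous(breaks = x_breaks, labels = disc_labels) +
    labs(x = "disciplines", y = "value (quantized)")
  print(g)
}
```

In this subset, all three variables quantize to 0 for Psychology, so they appear side by side at the same height instead of on top of each other.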