Conditional use of jitter in ggplot2 with geom_poi

I have a graph with 12 variables divided into two groups. I can't use facets, but using colour and shape, I have been able to make the visualization easy to understand. However, there are some points that overlap (partially or wholly). I am using jitter to deal with these, but as you can see from the attached graph, this leads to all points being moved around, not just those with overlap.

Is there a way to use jitter or dodge conditionally? Even better, is there a way to put the partially overlapping points side-by-side? As you can see, my x-axis is discrete categories, and a slight shift to left/right won't matter. I tried using dotplot with binaxis='y', but that completely spoils the x-axis.

Edit: This graph has managed to do exactly what I am searching for.

Further edit: Adding the code behind this visualization.

library(ggplot2)
library(reshape2)   # provides melt(), used below

disciplines <- c("Comp. Sc.\n(17.2%)", "Physics\n(19.6%)", "Maths\n(29.4%)", "Pol.Sc.\n(40.4%)", "Psychology\n(69.8%)")

# To stop ggplot from imposing alphabetical ordering on x-axis
disciplines <- factor(disciplines, levels=disciplines, ordered=TRUE)

# involved aspects
intensive   <- c( 0.660,  0.438,  0.515,  0.028,  0.443)
comparative <- c( 0.361,  0.928,  0.270,  0.285,  0.311)
wh_adverbs  <- c( 0.431,  0.454,  0.069,  0.330,  0.577)
past_tense    <- c(0.334, 0.229, 0.668, 0.566, 0.838)
present_tense <- c(0.680, 0.408, 0.432, 0.009, 0.996)
conjunctions <- c( 0.928,  0.207,  0.162, -0.299, -0.045)
personal      <- c(0.498, 0.521, 0.332, 0.01, 0.01)
interrogative <- c(0.266, 0.202, 0.236, 0.02, 0.02)
sbj_objective <- c(0.913, 0.755, 0.863, 0.803, 0.913)
possessive    <- c(0.896, 0.802, 0.960, 0.611, 0.994)
thrd_person <- c(-0.244, -0.265, -0.310, -0.008, -0.384)
nouns       <- c(-0.602, -0.519, -0.388, -0.244, -0.196)

df1 <- data.frame(disciplines,
                 "Intensive Adverbs"=intensive,
                 "Comparative Adverbs"=comparative,
                 "Wh-adverbs (WRB)"=wh_adverbs,
                 "Verb: Past Tense"=past_tense,
                 "Verb: Present Tense"=present_tense,
                 "Conjunctions"=conjunctions,
                 "Personal Pronouns"=personal,
                 "Interrogative Pronouns"=interrogative,
                 "Subjective/Objective Pronouns"=sbj_objective,
                 "Possessive Pronouns"=possessive,
                 "3rd-person verbs"=thrd_person,
                 "Nouns"=nouns,
                 check.names=FALSE)

# Long format: one row per (discipline, variable) pair
df1.m <- melt(df1, id.vars="disciplines")
grp <- ifelse(df1.m$variable %in% c('3rd-person verbs','Nouns'), 'Informational Features', 'Involved Features')
g <- ggplot(df1.m, aes(group=grp, disciplines, value, shape=grp, colour=variable))
g <- g + geom_hline(yintercept=0, size=9, color="white")
g <- g + geom_smooth(method=loess, span=0.75, level=0.95, alpha=I(0.16), linetype="dashed")
g <- g + geom_point(size=4,  alpha=I(0.7), position=position_jitter(width=0.1, height=0))
g <- g + scale_shape_manual(values=c(17,19))

Tags: r plot ggplot2 visualization

1 Answer

Lonely孤独者°

#2 · 2019-06-23 15:29

I am curious what others might suggest, but to get the side-by-side effect, you could code the major x-axis categories as numbers (10, 20, ..., 50) plus or minus a small amount like (0..10)/2, based on the categories you are using for colour. The x values would then be 9.6, 9.8, 10.0, 10.2, ... and then 20.0, 20.2, 20.4, and so on. This gives an organized plot, because the fractional adjustments are assigned systematically instead of randomly.

Here is a quick implementation of that idea for your dataset. It offsets the main x variable, disciplines, by one sixth of the sub-category variable's index and uses the result, without jitter, as the x value:

M = df1.m
ScaleFactor = 6
xadj = as.numeric(M$variable)/ScaleFactor
xadj = xadj - mean(xadj)   # shift it to center around zero
x10  = as.numeric(M$disciplines) * 10
M$x = x10 + xadj
g = ggplot(M, aes(group=grp, x, value, shape=grp, colour=variable))
# x is numeric now, so put the category labels on a continuous scale,
# one break per discipline (10, 20, ..., 50)
g + geom_point(size=4,alpha=I(0.7)) +
    scale_x_continuous(breaks=unique(x10), labels=levels(M$disciplines))

Note that the points within each category are evenly spaced and appear in the same order. (This code doesn't include the curve fitting, etc., shown in the figure.)
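
As an alternative sketch (not part of the code above), ggplot2's position_dodge() can give similar fixed side-by-side offsets while keeping the x-axis discrete. The toy data frame below is made up purely for illustration, and the dodge width of 0.5 is an arbitrary choice you would tune for your own plot.

```r
library(ggplot2)

# Toy data (illustrative only): 3 variables measured in 2 disciplines,
# with values chosen so that some points overlap within a discipline.
toy <- data.frame(
  disciplines = factor(rep(c("Physics", "Maths"), each = 3),
                       levels = c("Physics", "Maths")),
  variable    = factor(rep(c("A", "B", "C"), times = 2)),
  value       = c(0.50, 0.50, 0.52, 0.20, 0.80, 0.20)
)

# group = variable gives position_dodge() one slot per variable, so each
# variable gets the same fixed horizontal offset in every discipline.
p <- ggplot(toy, aes(disciplines, value, colour = variable, group = variable)) +
  geom_point(size = 4, alpha = 0.7,
             position = position_dodge(width = 0.5))
print(p)
```

Like the numeric-offset approach, this moves every point, not only the overlapping ones, but the shift is the same for a given variable in every category, so the plot stays orderly.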


Variation: You can see the effect even more clearly if you "quantize" your y values, so more of them plot side by side.

M$valmod = M$value - M$value %% 0.2 + .1

Then use valmod in place of value in the aes() statement to see the effect.

To get the category labels back, set them manually with scale_x_continuous, since x is now numeric. This version uses a different ScaleFactor for broader spacing and the quantized y axis:

M=df1.m
ScaleFactor = 3
# Note this could just be xadj instead of adding to data frame
M$xadj = as.numeric(M$variable)/ScaleFactor
M$xadj = M$xadj - mean(M$xadj)   # shift it to center around zero
M$x10  = as.numeric(M$disciplines) * 10
M$x = M$x10 + M$xadj

Qfact = 0.2  # resolution to quantize y values
M$valmod = M$value - M$value %% Qfact + Qfact/2  # clump y to given resolution

# One break per discipline (10, 20, ..., 50), labelled with the factor levels
g = ggplot(M, aes(group=grp, x, valmod, shape=grp, colour=variable)) +
    scale_x_continuous(breaks=unique(M$x10), labels=levels(M$disciplines))
g + geom_point(size=3,alpha=I(0.7))

(Figure: the plot with quantized y values.)
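
If you want the shift to be truly conditional, one possible sketch is to offset only the points that land in the same quantized y bin within a category and leave lone points centred. This is an assumption about what "conditional" should mean, not something from the code above; the toy data and the names bin_width, step, and x_adj are hypothetical.

```r
# Toy data (illustrative only): x is the discipline index, value the y value.
toy <- data.frame(
  x     = c(1, 1, 1, 2, 2),
  value = c(0.50, 0.52, 0.90, 0.20, 0.80)
)

bin_width <- 0.2   # y resolution below which points count as overlapping (assumed)
step      <- 0.1   # horizontal gap between neighbours in a crowded bin (assumed)

# Bin the y values, then rank each point within its (x, bin) cell.
toy$bin <- floor(toy$value / bin_width)
cell    <- interaction(toy$x, toy$bin, drop = TRUE)
rank_in_cell <- ave(seq_along(cell), cell, FUN = seq_along)
cell_size    <- ave(seq_along(cell), cell, FUN = length)

# Centre each cell on its category: a lone point gets offset 0,
# two points get -step/2 and +step/2, and so on.
toy$x_adj <- toy$x + (rank_in_cell - (cell_size + 1) / 2) * step
print(toy)
```

Here only the first two points share a cell, so they move to 0.95 and 1.05 while the other three stay on their category positions. Plotting x_adj in place of x in the aes() call, with the same manual axis labels, would then shift only the crowded points.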
