I am trying to place observation-count labels at the ends of boxplot whiskers, but it doesn't seem to work when there are outliers.
I have attempted to compare the max/min values with what I believe is the calculated whisker length [quartile 1 - 1.5 * interquartile range, or quartile 3 + 1.5 * interquartile range]. But the labels get placed at neither the max/min value nor the whisker end.
Example using mtcars, with the y-axis reversed to demonstrate:
library(ggplot2)
library(dplyr)

mtcars %>%
  select(qsec, cyl, am) %>%
  ggplot(aes(factor(cyl), qsec, fill = factor(am))) +
  stat_boxplot(geom = "errorbar") + ## Draw horizontal lines across ends of whiskers
  geom_boxplot(outlier.shape = 1, outlier.size = 3,
               position = position_dodge(width = 0.75)) +
  scale_y_reverse() +
  geom_text(data = mtcars %>%
              select(qsec, cyl, am) %>%
              group_by(cyl, am) %>%
              summarize(min_qsec = min(qsec), Count = n(), med = median(qsec),
                        q1 = quantile(qsec, 0.25),
                        q3 = quantile(qsec, 0.75), iqr = IQR(qsec),
                        qsec = mean(qsec),
                        lab_pos = max(min_qsec, q1 - 1.5 * iqr)),
            aes(y = lab_pos, label = Count), position = position_dodge(width = 0.75))
This produces a plot in which the labels for am(1) at cyl(4) and am(0) at cyl(8) are misaligned.
Is my calculation for lab_pos incorrect, or is there a better approach to position labels at the whisker ends, regardless of outliers? I would like to accomplish it using ggplot2 and dplyr, if possible.
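If I understand correctly, you want each label to sit at the actual end of the whisker.
The whiskers extend to the furthest data point within the fence, not to the fence itself. So when a group has outliers, max(min_qsec, q1 - 1.5 * iqr) returns the fence value, which usually lies beyond the whisker end; lab_pos should instead be the smallest qsec that is still at or above q1 - 1.5 * iqr.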
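A minimal sketch of that idea, staying within ggplot2 and dplyr. The helper name whisker_low and the labels data frame are made up for illustration, and the sketch assumes the default whisker coefficient of 1.5 and the default quantile() type, which as far as I can tell is what stat_boxplot uses as well:

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper: lower whisker end for one group.
# The fence is Q1 - coef * IQR; the whisker stops at the smallest
# observation that is still at or above that fence.
whisker_low <- function(x, coef = 1.5) {
  fence <- quantile(x, 0.25, names = FALSE) - coef * IQR(x)
  min(x[x >= fence])
}

# One row per cyl/am group: the count and the label position.
labels <- mtcars %>%
  group_by(cyl, am) %>%
  summarize(Count = n(), lab_pos = whisker_low(qsec), .groups = "drop")

p <- ggplot(mtcars, aes(factor(cyl), qsec, fill = factor(am))) +
  stat_boxplot(geom = "errorbar") +
  geom_boxplot(outlier.shape = 1, outlier.size = 3,
               position = position_dodge(width = 0.75)) +
  scale_y_reverse() +
  geom_text(data = labels,
            aes(y = lab_pos, label = Count),
            position = position_dodge(width = 0.75))
```

Printing p draws the plot. For the upper whisker the same logic would apply with max(x[x <= fence]) and a fence of Q3 + coef * IQR.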