I have code that creates a boxplot with ggplot2 in R. I want to label my outliers with the year and the battle.
Here is the code that creates my boxplot:
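library(ggplot2)
ggplot(seabattle, aes(x = PortugesOutcome, y = RatioPort2Dutch)) +
  geom_boxplot(outlier.size = 2, outlier.colour = "green") +
  # mean of each group as a pink diamond; fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, fill = "pink") +
  # axis titles belong in labs(), not as extra arguments to ggplot()
  labs(x = "Outcome", y = "Ratio of Portuguese to Dutch/British ships") +
  ggtitle("Portuguese Sea Battles")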
Can anyone help? The plot itself works; I just want to label the outliers.
A similar answer to the one above, but it gets the outliers directly from ggplot2, which avoids any mismatch between your own outlier rule and the one ggplot2 uses.
Does this work for you?
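A minimal sketch of pulling the outliers straight from ggplot2, assuming ggplot2's layer_data() (which builds the plot and returns a layer's computed data, including an outliers list column for boxplots). The built-in mtcars data stands in for seabattle, and the names cars, cyl_f, outliers_by_box and outlier_label are hypothetical; hjust = -0.3 is just one reasonable offset. Matching is done within each box because a value can be an outlier in one group and ordinary in another (in mtcars, drat = 2.76 is an outlier among 8-cylinder cars but not among 6-cylinder cars).

```r
library(ggplot2)

# mtcars stands in for seabattle: cyl_f ~ PortugesOutcome, drat ~ RatioPort2Dutch
cars <- mtcars
cars$cyl_f <- factor(cars$cyl)

p <- ggplot(cars, aes(x = cyl_f, y = drat)) +
  geom_boxplot(outlier.size = 2, outlier.colour = "green")

# layer_data() builds the plot and returns the computed data of layer 1;
# for a boxplot its `outliers` list column holds the outlying values of each box.
box_stats <- layer_data(p, 1)
outliers_by_box <- setNames(box_stats$outliers,
                            levels(cars$cyl_f)[box_stats$group])

# Match within each box, so a value that is an outlier in one group
# is not labelled in another group where it is ordinary.
is_out <- mapply(function(value, box) value %in% outliers_by_box[[box]],
                 cars$drat, as.character(cars$cyl_f))
cars$outlier_label <- ifelse(is_out, rownames(cars), NA_character_)

# Rows with NA labels are dropped silently thanks to na.rm = TRUE.
p + geom_text(data = cars, aes(label = outlier_label),
              na.rm = TRUE, hjust = -0.3)
```

For your data, build outlier_label from the year and battle columns (for example with paste()) instead of rownames().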
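By default, ggplot2 treats a point as an outlier if it lies more than 1.5 * IQR (interquartile range) beyond either edge of the box, that is, below the first quartile or above the third quartile by more than that amount.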
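Written out, the rule is as follows, where Q1 and Q3 are the 25th and 75th percentiles from R's default quantile() (type 7), which is what stat_boxplot() uses for unweighted data, and 1.5 is its coef argument. As a worked example, the drat values of the 8-cylinder cars in the built-in mtcars dataset have Q1 = 3.07 and Q3 = 3.225:

```latex
\[
  x < Q_1 - 1.5\,\mathrm{IQR} \quad\text{or}\quad x > Q_3 + 1.5\,\mathrm{IQR},
  \qquad \mathrm{IQR} = Q_3 - Q_1
\]
\[
  \mathrm{IQR} = 3.225 - 3.07 = 0.155, \qquad
  3.07 - 1.5(0.155) = 2.8375, \qquad
  3.225 + 1.5(0.155) = 3.4575
\]
```

So in that group, 2.76, 3.54, 3.73 and 4.22 fall outside the fences and are drawn as outliers.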
To label the outliers with row names (based on JasonAizkalns's answer):
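A minimal sketch of the row-name variant, using dplyr with mtcars in place of seabattle (cyl plays the role of PortugesOutcome and drat of RatioPort2Dutch). The helper is_outlier() and the names car and labelled are illustrative; is_outlier() applies the same 1.5 * IQR rule as ggplot2. To print the value itself instead of the name, put drat in place of car in the ifelse() call.

```r
library(dplyr)
library(ggplot2)

# TRUE when x lies more than 1.5 * IQR outside the box, mirroring ggplot2's rule.
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

labelled <- mtcars %>%
  mutate(car = rownames(mtcars)) %>%      # keep the row names as a column
  group_by(cyl) %>%                       # outliers are judged within each box
  mutate(outlier = ifelse(is_outlier(drat), car, NA_character_)) %>%
  ungroup()

# NA labels are dropped, so only the outliers get text next to them.
ggplot(labelled, aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)
```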
The following is a reproducible solution that uses dplyr and the built-in mtcars dataset.
Walking through the code: first, create a function, is_outlier, that returns a logical TRUE/FALSE indicating whether each value passed to it is an outlier. We then do the checking and plot the data. First we group_by our grouping variable (cyl in this example; in your example, this would be PortugesOutcome), and we add a variable outlier in the call to mutate: if the drat value is an outlier (drat corresponds to RatioPort2Dutch in your example), we store the drat value; otherwise we store NA so that nothing is plotted for that point. Finally, we plot the results and draw the text values via geom_text, with the label aesthetic set to our new variable. We also offset the text (slide it a bit to the right) with hjust so that the values appear next to, rather than on top of, the outlier points.
You can also do this entirely within ggplot itself, using an appropriate stat_summary call.
With a small twist on @JasonAizkalns's solution, you can label outliers with their location in your data frame.
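One way to sketch this twist, assuming the same illustrative is_outlier() helper and the mtcars stand-in: record each row's position with row_number() before grouping, keep it only for outliers, and use those positions to pull the outlier rows back out of the original data. The names row_id, located and outlier_rows are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Same 1.5 * IQR rule that ggplot2 uses for its boxplot outliers.
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

located <- mtcars %>%
  mutate(row_id = row_number()) %>%       # position in the original data frame
  group_by(cyl) %>%
  mutate(outlier = ifelse(is_outlier(drat), row_id, NA_integer_)) %>%
  ungroup()

# Label each outlier with its row number.
ggplot(located, aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)

# The stored positions index straight back into the original data.
outlier_rows <- mtcars[na.omit(located$outlier), ]
```

In RStudio, View(outlier_rows) opens those rows in the data viewer.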
I then load the data frame into the RStudio environment so I can take a closer look at the data in the outlier rows.
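For the stat_summary route mentioned above, a sketch (assuming ggplot2 3.3.0 or later for the fun argument and after_stat()) passes a summary function that returns a group's outlier values, or NA when there are none, and labels each point with its value. outlier_values is a hypothetical helper, and mtcars again stands in for seabattle. Because the stat only sees the y values, this labels outliers with their values; it cannot attach other columns such as the year or battle.

```r
library(ggplot2)

# Values ggplot2 would draw as outliers (1.5 * IQR rule), or NA when a group
# has none, so stat_summary always gets at least one row back per group.
outlier_values <- function(y) {
  q <- quantile(y, c(0.25, 0.75))
  fence <- 1.5 * (q[2] - q[1])
  out <- y[y < q[1] - fence | y > q[2] + fence]
  if (length(out) == 0) NA_real_ else out
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  # one text label per returned value; NA rows are dropped by na.rm = TRUE
  stat_summary(fun = outlier_values, geom = "text",
               aes(label = after_stat(y)), hjust = -0.3, na.rm = TRUE)
p
```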