Question:
I have code that creates a boxplot using ggplot in R, and I want to label the outliers with the year and Battle.
Here is the code that creates the boxplot:
require(ggplot2)
ggplot(seabattle, aes(x=PortugesOutcome,y=RatioPort2Dutch ),xlim="OutCome",
y="Ratio of Portuguese to Dutch/British ships") +
geom_boxplot(outlier.size=2,outlier.colour="green") +
stat_summary(fun.y="mean", geom = "point", shape=23, size =3, fill="pink") +
ggtitle("Portugese Sea Battles")
Can anyone help? I know the plot itself is correct; I just want to label the outliers.
Answer 1:
The following is a reproducible solution that uses dplyr and the built-in mtcars dataset.
Walking through the code: first, create a function, is_outlier, that returns a logical TRUE/FALSE vector indicating whether each value passed to it is an outlier. We then flag the outliers and plot the data. First we group_by our variable (cyl in this example; in your example, this would be PortugesOutcome), and we add a variable outlier in the call to mutate: if the drat value is an outlier (drat corresponds to RatioPort2Dutch in your example), we keep the drat value; otherwise we return NA so that the value is not plotted. Finally, we plot the results and add the text labels via geom_text with the label aesthetic set to our new variable. In addition, we offset the text (slide it a bit to the right) with hjust so that the values appear next to, rather than on top of, the outlier points.
library(dplyr)
library(ggplot2)
is_outlier <- function(x) {
return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
mtcars %>%
group_by(cyl) %>%
mutate(outlier = ifelse(is_outlier(drat), drat, as.numeric(NA))) %>%
ggplot(., aes(x = factor(cyl), y = drat)) +
geom_boxplot() +
geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)
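The question asks for labels built from two columns (the year and the battle). As a minimal sketch of that idea, the same pipeline can paste two columns together for the outlier rows. The seabattle data is not available here, so mtcars stands in: the car name plays the role of a Battle column and gear plays the role of a Year column (both pairings are assumptions made only for illustration).

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Same 1.5 * IQR rule as in the answer above
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

labelled <- mtcars %>%
  rownames_to_column(var = "name") %>%  # "name" stands in for a Battle column
  group_by(cyl) %>%
  mutate(label = ifelse(is_outlier(drat),
                        paste(name, gear, sep = ", "),  # gear stands in for Year
                        NA_character_)) %>%
  ungroup()

# Non-outlier rows carry NA labels, which geom_text drops via na.rm = TRUE
ggplot(labelled, aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  geom_text(aes(label = label), na.rm = TRUE, hjust = -0.1, size = 3)
```

With the asker's data, the paste() call would presumably combine the year and battle columns instead, under whatever names they have in seabattle.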
Answer 2:
To label the outliers with row names (based on JasonAizkalns's answer):
library(dplyr)
library(ggplot2)
library(tibble)
is_outlier <- function(x) {
return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
# Move the row names (car names) into a column that will hold the labels
dat <- mtcars %>%
  tibble::rownames_to_column(var = "outlier") %>%
  group_by(cyl) %>%
  mutate(is_outlier = ifelse(is_outlier(drat), drat, as.numeric(NA)))
# Blank out the label for every row that is not an outlier
dat$outlier[which(is.na(dat$is_outlier))] <- NA_character_
ggplot(dat, aes(y = drat, x = factor(cyl))) +
  geom_boxplot() +
  geom_text(aes(label = outlier), na.rm = TRUE, nudge_y = 0.05)
Answer 3:
Does this work for you?
library(ggplot2)
library(data.table)
#generate some data
set.seed(123)
n=500
dat <- data.table(group=c("A","B"),value=rnorm(n))
By default, ggplot treats a point as an outlier when it lies more than 1.5*IQR beyond the edges of the box (the first and third quartiles).
# function that takes a vector of data and a coefficient and
# returns a logical vector flagging which points are outliers
check_outlier <- function(v, coef = 1.5) {
  quantiles <- quantile(v, probs = c(0.25, 0.75))
  # named iqr (lowercase) so it does not shadow stats::IQR
  iqr <- quantiles[2] - quantiles[1]
  res <- v < (quantiles[1] - coef * iqr) | v > (quantiles[2] + coef * iqr)
  return(res)
}
#apply this to our data
dat[,outlier:=check_outlier(value),by=group]
dat[,label:=ifelse(outlier,"label","")]
#plot
ggplot(dat,aes(x=group,y=value))+geom_boxplot()+geom_text(aes(label=label),hjust=-0.3)
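The plot above prints the placeholder string "label" next to every outlier. As a rough sketch of how a real identifier could be shown instead, the version below adds an id column (a hypothetical column introduced only for this example) and uses it as the label text. It also builds the group column with rep() so that it does not rely on data.table recycling a shorter vector.

```r
library(data.table)
library(ggplot2)

set.seed(123)
n <- 500
# id is a hypothetical identifier column added for this sketch
dat <- data.table(id = seq_len(n),
                  group = rep(c("A", "B"), length.out = n),
                  value = rnorm(n))

# Same rule as check_outlier above
check_outlier <- function(v, coef = 1.5) {
  q <- quantile(v, probs = c(0.25, 0.75))
  iqr <- q[2] - q[1]
  unname(v < (q[1] - coef * iqr) | v > (q[2] + coef * iqr))
}

dat[, outlier := check_outlier(value), by = group]
# Label outliers with their id; leave all other rows as NA so they are skipped
dat[, label := ifelse(outlier, as.character(id), NA_character_)]

ggplot(dat, aes(x = group, y = value)) +
  geom_boxplot() +
  geom_text(aes(label = label), na.rm = TRUE, hjust = -0.3)
```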
Answer 4:
Similar to the answers above, but this gets the outliers directly from ggplot2, thus avoiding any potential conflict in method:
# calculate boxplot object
g <- ggplot(mtcars, aes(factor(cyl), drat)) + geom_boxplot()
# get list of outliers
out <- ggplot_build(g)[["data"]][[1]][["outliers"]]
# label list elements with factor levels
names(out) <- levels(factor(mtcars$cyl))
# convert to tidy data
tidyout <- purrr::map_df(out, tibble::as_tibble, .id = "cyl")
# plot boxplots with labels
g + geom_text(data = tidyout, aes(cyl, value, label = value),
hjust = -.3)
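To see whether there is any conflict in method for a given dataset, one option is to compare the values ggplot2 reports with those flagged by a hand-written rule. The sketch below does this for mtcars, reusing the is_outlier function from the first answer; it is a check under those assumptions, not a general guarantee.

```r
library(ggplot2)

# Hand-written 1.5 * IQR rule, as in the first answer
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

g <- ggplot(mtcars, aes(factor(cyl), drat)) + geom_boxplot()

# Outliers as computed by ggplot2 (a list with one numeric vector per box)
from_ggplot <- sort(unlist(ggplot_build(g)[["data"]][[1]][["outliers"]]))

# Outliers according to the hand-written rule, computed per cyl group
from_rule <- sort(unlist(tapply(mtcars$drat, mtcars$cyl,
                                function(v) v[is_outlier(v)])))

# TRUE when both approaches flag the same set of values
same_outliers <- isTRUE(all.equal(unname(from_ggplot), unname(from_rule)))
same_outliers
```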
Answer 5:
You can do this simply within ggplot itself, using an appropriate stat_summary call.
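ggplot(mtcars, aes(x = factor(cyl), y = drat, fill = factor(cyl))) +
  geom_boxplot() +
  stat_summary(
    # since ggplot2 3.3.0, after_stat() and fun replace the older stat() and fun.y
    aes(label = round(after_stat(y), 1)),
    geom = "text",
    # boxplot.stats() returns the outlying values in $out; use NA when there are none
    fun = function(y) { o <- boxplot.stats(y)$out; if (length(o) == 0) NA else o },
    hjust = -1
  )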
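One caveat worth keeping in mind: boxplot.stats() takes its box edges from Tukey's hinges (via fivenum()), whereas the ggplot2 documentation describes the geom_boxplot hinges as the 25th and 75th percentiles. The two usually agree, but not always, so a label could in principle be missing for a point that is drawn as an outlier. The sketch below uses a small constructed vector (toy values chosen purely to show the difference) to compare the two rules.

```r
# Constructed toy vector, chosen so that the two rules disagree
x <- c(1, 2, 3, 4, 5, 6, 7, 12)

# Quantile-based rule: quartiles from quantile() with its default type 7
q <- quantile(x, c(0.25, 0.75))
by_quantile <- x[x < q[1] - 1.5 * diff(q) | x > q[2] + 1.5 * diff(q)]

# Hinge-based rule used by boxplot.stats()
by_hinges <- boxplot.stats(x)$out

by_quantile  # 12
by_hinges    # numeric(0)
```

Here the quartiles are 2.75 and 6.25 (upper fence 11.5), while the hinges are 2.5 and 6.5 (upper fence 12.5), so the value 12 is flagged by the first rule only.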
Answer 6:
With a small twist on @JasonAizkalns's solution, you can label outliers with their row position in your data frame.
mtcars[,'row'] <- row(mtcars)[,1]
...
mutate(outlier = ifelse(is_outlier(drat), row, as.numeric(NA)))
...
I load the data frame into the RStudio environment so that I can then take a closer look at the data in the outlier rows.
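As a rough sketch of that workflow, and assuming the elided parts follow the pipeline from the first answer, the row numbers of the outliers can be collected and used to pull the full rows out of the original data frame. The names flagged and outlier_rows are introduced here only for illustration.

```r
library(dplyr)

# 1.5 * IQR rule from the first answer
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

flagged <- mtcars %>%
  mutate(row = row_number()) %>%  # position of each row in mtcars
  group_by(cyl) %>%
  mutate(outlier = ifelse(is_outlier(drat), row, NA_integer_)) %>%
  ungroup()

# Row positions of the outliers, then the corresponding full rows
outlier_rows <- flagged$outlier[!is.na(flagged$outlier)]
mtcars[outlier_rows, ]
```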