Plotting top n and other in a bar chart

Posted 2020-04-22 10:48

Question:

I am working with a WW2 Allied bombings dataset from Kaggle, which has a categorical variable with five levels. I am trying to plot the two countries with the most bombings (USA and Great Britain) and group the others (South Africa, New Zealand, Australia) into a single bar on a bar chart.

How can I aggregate the smallest groups into a group "Other"?

Current Approach

I have a bit of a workaround going that plots only the top two using xlim. Is there a simple way in R to group the remaining levels and plot them as 'Other'?

ggplot(data = operations) +
  geom_bar(mapping = aes(x = Country, fill = Country)) +
  xlim('USA', 'GREAT BRITAIN') +
  ggtitle("Allied Bombings") +
  xlab("Country") + ylab("Bombs Dropped") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(panel.background = element_rect(fill = 'transparent', colour = NA))

Answer 1:

You can use the fct_lump function from the forcats package, which lumps the least (or most) common factor levels together into a single level, named "Other" by default.

Here is a reproducible example that keeps only the two largest groups:

df <- data.frame(group = rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

library(forcats)
# Keep the 2 most frequent levels; all others are lumped into "Other"
df$groupLump <- fct_lump(df$group, n = 2)

library(ggplot2)
ggplot(df) +
  geom_bar(aes(x = groupLump, fill = groupLump))
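
Applied to the data in the question, the same idea might look like the sketch below. The `operations` data frame here is a made-up stand-in with invented row counts, since the real Kaggle data is not shown, and `CountryLump` is just an illustrative column name. It uses fct_lump_n(), which forcats 0.5.0 and later provide as the explicit "keep the n most frequent levels" variant of fct_lump().

```r
library(forcats)
library(ggplot2)

# Hypothetical stand-in for the Kaggle data: one row per record.
# The row counts are invented for illustration only.
operations <- data.frame(
  Country = rep(
    c("USA", "GREAT BRITAIN", "AUSTRALIA", "NEW ZEALAND", "SOUTH AFRICA"),
    times = c(50, 30, 5, 3, 2)
  )
)

# Keep the 2 most frequent countries; lump the rest into "Other".
operations$CountryLump <- fct_lump_n(operations$Country, n = 2)

# Titles and axis labels are copied from the question's plot.
p <- ggplot(operations) +
  geom_bar(aes(x = CountryLump, fill = CountryLump)) +
  ggtitle("Allied Bombings") +
  xlab("Country") + ylab("Bombs Dropped") +
  theme(plot.title = element_text(hjust = 0.5))
print(p)
```

Wrapping the lumped factor in fct_infreq() would additionally order the bars by frequency.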



Tags: r ggplot2