How to reorder factor levels in a tidy way?

Posted 2020-02-28 06:59

Question:

Hi, I usually use code like the following to reorder the bars in ggplot or other types of plots.

Normal plot (unordered)

library(tidyverse)
iris.tr <- iris %>% group_by(Species) %>% mutate(mSW = mean(Sepal.Width)) %>%
  select(mSW,Species) %>% 
  distinct()
ggplot(iris.tr,aes(x = Species,y = mSW, color = Species)) +
  geom_point(stat = "identity")

Ordering the factor + ordered plot

iris.tr$Species <- factor(iris.tr$Species,
                          levels = iris.tr[order(iris.tr$mSW),]$Species,
                          ordered = TRUE)
ggplot(iris.tr,aes(x = Species,y = mSW, color = Species)) + 
  geom_point(stat = "identity")

The factor line is extremely unpleasant to me, and I wonder why arrange() or some other function can't simplify this. Am I missing something?

Note:

This does not work, but I would like to know whether something like it exists in the tidyverse.

iris.tr <- iris %>% group_by(Species) %>% mutate(mSW = mean(Sepal.Width)) %>%
  select(mSW,Species) %>% 
  distinct() %>% 
  arrange(mSW)
ggplot(iris.tr,aes(x = Species,y = mSW, color = Species)) + 
  geom_point(stat = "identity")

Answer 1:

Using forcats:

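iris.tr %>%
    # iris.tr is still grouped by Species from the question's code, and mutate()
    # may refuse to modify a grouping column, so drop the grouping first
    ungroup() %>%
    mutate(Species = fct_reorder(Species, mSW)) %>%
    ggplot() +
    aes(Species, mSW, color = Species) +
    geom_point()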
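
To see what fct_reorder() does without drawing the plot, the levels can be inspected directly. This is a minimal sketch, assuming dplyr and forcats are installed; the names sw and sw_fct are made up for this example and are not part of the answer above.

```r
library(dplyr)
library(forcats)

# One row per species with its mean sepal width
# (summarise() over a single grouping column returns an ungrouped tibble)
sw <- iris %>%
  group_by(Species) %>%
  summarise(mSW = mean(Sepal.Width))

# Levels as they come with the iris data
levels(sw$Species)
# "setosa" "versicolor" "virginica"

# Levels sorted by increasing mSW
sw_fct <- sw %>% mutate(Species = fct_reorder(Species, mSW))
levels(sw_fct$Species)
# "versicolor" "virginica" "setosa"

# .desc = TRUE sorts by decreasing mSW instead
levels(fct_reorder(sw$Species, sw$mSW, .desc = TRUE))
# "setosa" "virginica" "versicolor"
```

Only the level order changes; the rows of sw_fct stay where they were, which is all ggplot needs to order the axis and legend.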


Answer 2:

Reordering the factor using base:

iris.ba = iris
iris.ba$Species = with(iris.ba, reorder(Species, Sepal.Width, mean))

Translating to dplyr:

iris.tr = iris %>% mutate(Species = reorder(Species, Sepal.Width, mean))

After that, you can continue on to summarize and plot as in your question.


A couple of comments: reordering a factor means modifying a data column, and the dplyr command to modify a data column is mutate. All arrange does is reorder rows; this has no effect on the levels of the factor and hence no effect on the order of a legend or axis in ggplot.
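
The difference between sorting rows and rewriting the column can be checked by looking at levels(). A minimal sketch on the built-in iris data, assuming dplyr is installed; the names arranged and reordered are made up for illustration.

```r
library(dplyr)

# Sorting the rows leaves the factor levels untouched
arranged <- iris %>% arrange(Sepal.Width)
levels(arranged$Species)
# "setosa" "versicolor" "virginica"

# Rewriting the column with mutate() changes the levels
reordered <- iris %>% mutate(Species = reorder(Species, Sepal.Width, mean))
levels(reordered$Species)
# "versicolor" "virginica" "setosa"
```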

All factors have an order for their levels. The difference between an ordered = TRUE factor and a regular factor is how the contrasts are set up in a model. ordered = TRUE should only be used if your factor levels have a meaningful rank order, like "Low", "Medium", "High", and even then it only matters if you are building a model and don't want the default contrasts comparing everything to a reference level.
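
As a small illustration of that point, the sketch below builds the same data as a regular and as an ordered factor and compares their default contrasts. It uses base R only; the names sizes, lvls, plain and ranked are hypothetical.

```r
sizes <- c("Low", "High", "Medium", "Low")
lvls  <- c("Low", "Medium", "High")

plain  <- factor(sizes, levels = lvls)
ranked <- factor(sizes, levels = lvls, ordered = TRUE)

# Both carry the same level order, so a plot would arrange them identically
identical(levels(plain), levels(ranked))
# TRUE

# With R's default options, the regular factor gets treatment contrasts
# (each level compared to the reference level "Low") ...
contrasts(plain)

# ... while the ordered factor gets polynomial contrasts (.L, .Q)
contrasts(ranked)
```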



Answer 3:

If you happen to have a character vector to order, for example:

iris2 <- iris %>% 
    mutate(Species = as.character(Species)) %>% 
    group_by(Species) %>% 
    mutate(mSW = mean(Sepal.Width)) %>% 
    ungroup()

You can also set the order of the factor levels by relying on the behavior of the forcats::as_factor function:

"Compared to base R, this function creates levels in the order in which they appear"

library(forcats)
iris2 %>% 
    arrange(mSW) %>%  
    mutate(Species = as_factor(Species)) %>%
    ggplot() +
    aes(Species, mSW, color = Species) +
    geom_point()
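
The difference between as_factor() and base factor() on a character vector can be seen in isolation. A minimal sketch, assuming forcats is installed; the vector x is made up for this example.

```r
library(forcats)

x <- c("virginica", "setosa", "versicolor", "setosa")

# Base R sorts the unique values to build the levels
levels(factor(x))
# "setosa" "versicolor" "virginica"

# as_factor() keeps the order of first appearance
levels(as_factor(x))
# "virginica" "setosa" "versicolor"
```

This is why the arrange(mSW) step has to come before as_factor() in the pipeline above: the row order at that moment becomes the level order.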