Geom_area order in ggplot

Posted 2019-08-04 15:18

Question:

I have sample cohort data whose cohorts are not colored or stacked in chronological order when I plot it with ggplot(). The following code generates the plot:

library(ggplot2)

blues <- colorRampPalette(c('lightblue', 'darkblue'))

p <- ggplot(cohort.chart, aes(x=month, y=users, group=cohort))
p + geom_area(aes(fill = cohort)) +
  scale_fill_manual(values = blues(12)) +
  ggtitle('Users by cohort') +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

As you can see in the legend, the cohort months are not in chronological order, and the plot's stacking and colors follow that same order. I have tried converting both cohort and month to yearmon objects with as.yearmon(), but this produces an error saying that continuous variables cannot be used. How can I solve this in ggplot()?

Here is the data:

> dput(cohort.chart)
structure(list(cohort = structure(c(11L, 10L, 3L, 5L, 4L, 8L, 
1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 
6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 
11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 
3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 
8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 
7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 
12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 
10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 
4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 
9L, 7L, 6L, 2L, 12L), .Label = c("Apr 2017", "Aug 2017", "Dec 2016", 
"Feb 2017", "Jan 2017", "Jul 2017", "Jun 2017", "Mar 2017", "May 2017", 
"Nov 2016", "Oct 2016", "Sep 2017"), class = "factor"), month = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L), .Label = c("Oct 2016", "Nov 2016", 
"Dec 2016", "Jan 2017", "Feb 2017", "Mar 2017", "Apr 2017", "May 2017", 
"Jun 2017", "Jul 2017", "Aug 2017", "Sep 2017"), class = "factor"), 
    users = c(795, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 672, 92, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 526, 48, 26, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 506, 37, 14, 26, 0, 0, 0, 0, 0, 0, 0, 0, 514, 
    46, 18, 19, 37, 0, 0, 0, 0, 0, 0, 0, 532, 47, 16, 18, 22, 
    57, 0, 0, 0, 0, 0, 0, 589, 55, 15, 20, 24, 39, 88, 0, 0, 
    0, 0, 0, 548, 53, 18, 21, 25, 39, 62, 172, 0, 0, 0, 0, 559, 
    53, 20, 20, 23, 36, 61, 139, 133, 0, 0, 0, 538, 58, 17, 18, 
    22, 39, 54, 130, 99, 109, 0, 0, 519, 45, 16, 19, 26, 39, 
    50, 125, 96, 86, 115, 0, 530, 54, 18, 20, 23, 37, 51, 129, 
    88, 75, 85, 126)), row.names = c(NA, -144L), .Names = c("cohort", 
"month", "users"), class = "data.frame")

The plot should resemble the following:

Answer 1:

Based on the data frame you provided, your month column already has its factor levels in chronological order, but your cohort column does not. Set the factor levels of cohort to match those of month:

library(ggplot2)

blues <- colorRampPalette(c('lightblue', 'darkblue'))

# Change the factor level
cohort.chart$cohort <- factor(cohort.chart$cohort, levels = levels(cohort.chart$month))

p <- ggplot(cohort.chart, aes(x=month, y=users, group=cohort))
p + geom_area(aes(fill = cohort)) +
  scale_fill_manual(values = blues(12)) +
  ggtitle('Users by cohort') +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
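To see the effect of the releveling without drawing the plot, here is a minimal base-R sketch on a small hypothetical data frame (`toy` and its three months are made up for illustration, not the asker's data). It prints the alphabetical levels before the fix and the chronological levels after it, then builds a named palette (`pal` is also a hypothetical name). scale_fill_manual() matches a named `values` vector to factor levels by name, so passing `values = pal` would keep each cohort's shade tied to its label even if the level order changed later.

```r
# Minimal sketch on hypothetical toy data (three months), base R only.
months <- c("Oct 2016", "Nov 2016", "Dec 2016")

toy <- data.frame(
  cohort = factor(rep(months, times = 3)),                  # levels sorted alphabetically by default
  month  = factor(rep(months, each = 3), levels = months),  # levels given in chronological order
  users  = c(10, 0, 0, 8, 5, 0, 6, 4, 3)
)

print(levels(toy$cohort))  # "Dec 2016" "Nov 2016" "Oct 2016"

# The fix from this answer: reuse month's chronological levels for cohort
toy$cohort <- factor(toy$cohort, levels = levels(toy$month))
print(levels(toy$cohort))  # "Oct 2016" "Nov 2016" "Dec 2016"

# Named palette: one shade per cohort, lightest for the earliest cohort
blues <- colorRampPalette(c("lightblue", "darkblue"))
pal <- setNames(blues(nlevels(toy$cohort)), levels(toy$cohort))
print(pal)
```

The row values themselves are untouched; only the order of the levels changes, which is what ggplot2 uses for stacking, legend order, and assigning unnamed palette colors.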



Answer 2:

When you created the data frame, R automatically converted the strings into factors (data.frame() and read.csv() did this by default before R 4.0.0). Each distinct cohort date became a factor level, and the levels were sorted alphabetically:

> cohort.chart$cohort
[1] ...
12 Levels: Apr 2017 Aug 2017 Dec 2016 Feb 2017 Jan 2017 Jul 2017 Jun 2017 Mar 2017 May 2017 ... Sep 2017
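The alphabetical ordering is easy to reproduce with a few of the labels from the data. A small base-R sketch (`dates`, `alpha`, and `chrono` are illustrative names only):

```r
# A few of the cohort labels, listed in chronological order
dates <- c("Oct 2016", "Nov 2016", "Dec 2016", "Jan 2017", "Feb 2017")

# factor() sorts its levels as strings, i.e. alphabetically by month name
alpha <- factor(dates)
print(levels(alpha))  # "Dec 2016" "Feb 2017" "Jan 2017" "Nov 2016" "Oct 2016"

# Supplying levels explicitly keeps the order you give; unique() preserves
# first-appearance order, which is only chronological if the input already is
chrono <- factor(dates, levels = unique(dates))
print(levels(chrono))  # "Oct 2016" "Nov 2016" "Dec 2016" "Jan 2017" "Feb 2017"
```

Relying on unique() depends on the row order of the data, so parsing the labels as dates (as below) is the more robust route.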

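You can fix this with as.yearmon() from the zoo package; you just need to convert the result back to a factor afterwards. A yearmon value is treated as continuous, which is why scale_fill_manual() (a discrete scale) rejects it, but factor() sorts yearmon values chronologically when building the levels: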

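library(zoo)  # as.yearmon() comes from the zoo package
cohort.chart$cohort <- as.yearmon(as.character(cohort.chart$cohort), format = "%b %Y")  # "Oct 2016" -> yearmon
cohort.chart$cohort <- as.factor(cohort.chart$cohort)  # discrete again, with levels in chronological order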
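If you would rather not add zoo as a dependency, the same idea can be sketched in base R: parse each level as the first day of its month, then reorder the levels by that date. This assumes labels shaped like "Oct 2016" and month abbreviations that match the current locale (`%b` is locale-dependent). relevel_by_month() is a hypothetical helper written for this example, not part of any package.

```r
# Hypothetical helper: reorder the levels of a "Mon YYYY" factor chronologically.
# Assumes month abbreviations match the current locale (e.g. English "Oct").
relevel_by_month <- function(f, format = "%b %Y") {
  lv <- levels(f)
  # Prepend day 01 so each label parses to a full Date
  first_day <- as.Date(paste("01", lv), format = paste("%d", format))
  if (anyNA(first_day)) {
    stop("could not parse levels: ", paste(lv[is.na(first_day)], collapse = ", "))
  }
  # Same values, same label strings; only the level order changes
  factor(f, levels = lv[order(first_day)])
}

# Usage on a small example; with the question's data this would be
# cohort.chart$cohort <- relevel_by_month(cohort.chart$cohort)
f <- factor(c("Oct 2016", "Jan 2017", "Nov 2016", "Oct 2016"))
f_chrono <- relevel_by_month(f)
print(levels(f_chrono))  # "Oct 2016" "Nov 2016" "Jan 2017"
```

Keeping the original label strings as levels, rather than converting the column to Date, leaves the legend text unchanged and keeps the fill scale discrete, which is what scale_fill_manual() expects.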


Tags: r ggplot2 zoo