overlay/superimpose grouped bar plots in ggplot2

Posted 2019-07-31 15:25

I'd like to make a bar plot featuring an overlay of data from two time points, 'before' and 'after'.

At each time point, participants were asked two questions ('pain' and 'fear'), which they would answer by stating a score of 1, 2, or 3.

My existing code plots the counts for the data from the 'before' time point nicely, but I can't seem to add the counts for the 'after' data.

This is a sketch of what I'd like the plot to look like with the 'after' data added, with the black bars representing the 'after' data:

[Image: sketch of the desired plot]

I'd like to make the plot in ggplot2. I've tried to adapt the code from "How to superimpose bar plots in R?", but I can't get it to work for grouped data.

Many thanks!

#DATA PREP
library(dplyr)
library(ggplot2)
library(tidyr)


df <- data.frame(before_fear=c(1,1,1,2,3),before_pain=c(2,2,1,3,1),after_fear=c(1,3,3,2,3),after_pain=c(1,1,2,3,1))


df <- df %>% gather("question", "answer_option")

# Get the counts for each answer of each question
df2 <- df %>%
  group_by(question, answer_option) %>%
  summarise(n = n())
df2 <- as.data.frame(df2)


# Label each row as 'before' or 'after'
df3 <- df2 %>% mutate(time = factor(ifelse(grepl("before", question), "before", "after"),
                                    c("before", "after")))

# change classes and split data into two data frames
df3$n <- as.numeric(df3$n)
df3$answer_option <- as.factor(df3$answer_option)
df3after <- df3[ which(df3$time=='after'), ]
df3before <- df3[ which(df3$time=='before'), ]


# CODE FOR 'BEFORE' DATA ONLY PLOT - WORKS
ggplot(df3before, aes(fill=answer_option, y=n, x=question)) + geom_bar(position="dodge", stat="identity")



# CODE FOR 'BEFORE' AND 'AFTER' DATA PLOT - DOESN'T WORK
ggplot(mapping = aes(x, y,fill)) +
  geom_bar(data = data.frame(x = df3before$question, y = df3before$n, fill= df3before$index_value), width = 0.8, stat = 'identity') +
  geom_bar(data = data.frame(x = df3after$question, y = df3after$n, fill=df3after$index_value), width = 0.4, stat = 'identity', fill = 'black') +
  theme_classic() + scale_y_continuous(expand = c(0, 0))

Tags: r ggplot2
2 answers
何必那么认真
Reply #2 · 2019-07-31 16:06

I think the key is to set the width of the "after" bars, but to dodge them as if their width were 0.9 (i.e. the same default width as the "before" bars). In addition, because we don't map fill for the "after" bars, we need to use the group aesthetic instead to achieve the dodging.

I prefer to have only one data set and just subset it in each call to geom_col.

ggplot(mapping = aes(x = question, y = n, fill = factor(ans))) +
  geom_col(data = d[d$t == "before", ], position = "dodge") +
  geom_col(data = d[d$t == "after", ], aes(group = ans),
           fill = "black", width = 0.5, position = position_dodge(width = 0.9))

Data:

set.seed(2)
d <- data.frame(t = rep(c("before", "after"), each = 6),
                question = rep(c("pain", "fear"), each = 3),
                ans = 1:3, n = sample(12))

Alternative data preparation using data.table, starting with your original 'df':

library(data.table)
d <- melt(setDT(df), measure.vars = names(df), value.name = "ans")
d[ , c("t", "question") := tstrsplit(variable, "_")]

Either pre-calculate the counts per time point, question and answer, and proceed as above with geom_col:

# d2 <- d[ , .(n = .N), by = .(t, question, ans)]

Or let geom_bar do the counting:

ggplot(mapping = aes(x = question, fill = factor(ans))) +
  geom_bar(data = d[d$t == "before", ], position = "dodge") +
  geom_bar(data = d[d$t == "after", ], aes(group = ans),
           fill = "black", width = 0.5, position = position_dodge(width = 0.9))

Data:

df <- data.frame(before_fear = c(1,1,1,2,3), before_pain = c(2,2,1,3,1),
                 after_fear = c(1,3,3,2,3), after_pain = c(1,1,2,3,1))
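
To make the pre-counted route concrete, here is a minimal sketch that goes from the original 'df' to the overlaid plot with geom_col. It is an illustration under a few assumptions rather than part of the answer above: the names d2 and p are made up for this example, the count column is named n so that the geom_col code can be reused unchanged, and as.data.table() is used instead of setDT() so that 'df' itself is left unmodified.

```r
library(data.table)
library(ggplot2)

df <- data.frame(before_fear = c(1, 1, 1, 2, 3), before_pain = c(2, 2, 1, 3, 1),
                 after_fear = c(1, 3, 3, 2, 3), after_pain = c(1, 1, 2, 3, 1))

# Long format: one row per response
d <- melt(as.data.table(df), measure.vars = names(df), value.name = "ans")

# Split e.g. "before_fear" into time point ("before") and question ("fear")
d[, c("t", "question") := tstrsplit(variable, "_")]

# Count responses per time point, question and answer (d2 is a hypothetical name)
d2 <- d[, .(n = .N), by = .(t, question, ans)]

# Coloured 'before' bars at the default width, narrow black 'after' bars on top,
# both dodged as if they were 0.9 wide so that they share the same centres
p <- ggplot(mapping = aes(x = question, y = n, fill = factor(ans))) +
  geom_col(data = d2[t == "before"], position = "dodge") +
  geom_col(data = d2[t == "after"], aes(group = ans),
           fill = "black", width = 0.5, position = position_dodge(width = 0.9))
p
```

Grouping by t as well as by question and ans matters here: without it the 'before' and 'after' responses would be pooled into one count, and there would be no t column left to subset on.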
够拽才男人
Reply #3 · 2019-07-31 16:14

My solution is very similar to @Henrik's, but I wanted to point out a few things.

First, you're building new data frames inside your geom_bar calls, which is probably messier than it needs to be. If you've already created df3after, etc., you might as well pass them to the geoms directly.

Second, I had a hard time following your tidying. I think a couple of tidyr functions might make this task easier, so I went a different route: I used separate to create the time and measure columns rather than searching for them manually with grepl, which should scale better. This also lets you put "pain" and "fear" on the x-axis instead of "before_pain" and "before_fear", labels that are no longer accurate once the "after" values are on the plot as well. But feel free to disregard this and stick with your own method.

library(tidyverse)

df <- data.frame(before_fear = c(1,1,1,2,3),
                 before_pain = c(2,2,1,3,1),
                 after_fear = c(1,3,3,2,3),
                 after_pain = c(1,1,2,3,1))
df_long <- df %>%
  gather(key = question, value = answer_option) %>%
  mutate(answer_option = as.factor(answer_option)) %>%
  count(question, answer_option) %>%
  separate(question, into = c("time", "measure"), sep = "_", remove = FALSE)

df_long
#> # A tibble: 12 x 5
#>    question    time   measure answer_option     n
#>    <chr>       <chr>  <chr>   <fct>         <int>
#>  1 after_fear  after  fear    1                 1
#>  2 after_fear  after  fear    2                 1
#>  3 after_fear  after  fear    3                 3
#>  4 after_pain  after  pain    1                 3
#>  5 after_pain  after  pain    2                 1
#>  6 after_pain  after  pain    3                 1
#>  7 before_fear before fear    1                 3
#>  8 before_fear before fear    2                 1
#>  9 before_fear before fear    3                 1
#> 10 before_pain before pain    1                 2
#> 11 before_pain before pain    2                 2
#> 12 before_pain before pain    3                 1

I split this into before and after datasets, as you did, then plotted them with two geom_cols. I still pass df_long to ggplot, treating it almost as a dummy to get uniform x and y aesthetics. As @Henrik said, you can use different widths in geom_col and in its position_dodge: dodge the bars at a width of 0.9 (90%) but give the black bars themselves a width of only 0.4 (40%).

df_before <- df_long %>% filter(time == "before")
df_after <- df_long %>% filter(time == "after")

ggplot(df_long, aes(x = measure, y = n)) +
  geom_col(aes(fill = answer_option), 
    data = df_before, width = 0.9, 
    position = position_dodge(width = 0.9)) +
  geom_col(aes(group = answer_option), 
    data = df_after, fill = "black", width = 0.4, 
    position = position_dodge(width = 0.9))
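
If you want to convince yourself that the narrow black bars really sit centred on the coloured ones, one option is to inspect the computed positions with ggplot2's layer_data(). The following is only a sketch: the counts data frame retypes the numbers from the df_long table by hand, and counts, p, before_layer and after_layer are names made up for this check.

```r
library(ggplot2)

# Counts copied from the df_long table (hypothetical helper data frame)
counts <- data.frame(
  time = rep(c("after", "before"), each = 6),
  measure = rep(rep(c("fear", "pain"), each = 3), times = 2),
  answer_option = factor(rep(1:3, times = 4)),
  n = c(1, 1, 3, 3, 1, 1, 3, 1, 1, 2, 2, 1)
)

p <- ggplot(counts, aes(x = measure, y = n)) +
  geom_col(aes(fill = answer_option),
           data = subset(counts, time == "before"), width = 0.9,
           position = position_dodge(width = 0.9)) +
  geom_col(aes(group = answer_option),
           data = subset(counts, time == "after"), fill = "black", width = 0.4,
           position = position_dodge(width = 0.9))

# Computed data for each layer after dodging: x is the bar centre,
# xmin and xmax are the bar edges
before_layer <- layer_data(p, 1)
after_layer <- layer_data(p, 2)

# TRUE when the bar centres of the two layers coincide
all.equal(sort(as.numeric(before_layer$x)), sort(as.numeric(after_layer$x)))

# Drawn bar widths: the 'after' bars should be narrower than the 'before' bars
range(before_layer$xmax - before_layer$xmin)
range(after_layer$xmax - after_layer$xmin)
```

Because both layers use position_dodge(width = 0.9) and have three groups per question, I would expect the centres to match while only the drawn widths differ. If an answer option were missing from one of the time points, the groups would no longer line up, so it is worth checking this on real data.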

What you could do instead of making two separate data frames is filter inside each geom_col. This is generally my preference unless the filtering is more complex. This code produces the same plot as above.

ggplot(df_long, aes(x = measure, y = n)) +
  geom_col(aes(fill = answer_option), 
    data = . %>% filter(time == "before"), width = 0.9, 
    position = position_dodge(width = 0.9)) +
  geom_col(aes(group = answer_option), 
    data = . %>% filter(time == "after"), fill = "black", width = 0.4, 
    position = position_dodge(width = 0.9))
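
As a side note, gather() is superseded in newer tidyr releases, and pivot_longer() can do the reshaping and the split into time and measure in one step. The sketch below assumes tidyr 1.0.0 or later. The name df_long2 is made up for this example, and unlike df_long it has no combined question column.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(before_fear = c(1, 1, 1, 2, 3),
                 before_pain = c(2, 2, 1, 3, 1),
                 after_fear = c(1, 3, 3, 2, 3),
                 after_pain = c(1, 1, 2, 3, 1))

# Split column names such as "before_fear" into time and measure while reshaping,
# then count each answer per time point and measure
df_long2 <- df %>%
  pivot_longer(everything(),
               names_to = c("time", "measure"), names_sep = "_",
               values_to = "answer_option") %>%
  mutate(answer_option = factor(answer_option)) %>%
  count(time, measure, answer_option)

# Same two-layer plot as above, filtering inside each geom_col
p2 <- ggplot(df_long2, aes(x = measure, y = n)) +
  geom_col(aes(fill = answer_option),
           data = . %>% filter(time == "before"), width = 0.9,
           position = position_dodge(width = 0.9)) +
  geom_col(aes(group = answer_option),
           data = . %>% filter(time == "after"), fill = "black", width = 0.4,
           position = position_dodge(width = 0.9))
p2
```

The counts should match those in the df_long table above; only the reshaping step differs.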