ggplot2 alternatives to fill in barplots, occurenc

Posted 2019-08-20 11:09

I'm pretty new to R and I have a problem with plotting a barplot out of my data which looks like this:

condition answer
2    H
1    H
8    H
5    W
4    M
7    H
9    H
10   H
6    H
3    W

The data consists of 100 rows covering conditions 1 to 10, with each condition appearing 10 times in random order (10 rows with condition 1, 10 rows with condition 8, ...). Each row also has an answer, which is H for Hit, M for Miss or W for Wrong.

I want to plot the number of hits for each condition as a bar plot (for example, 8 hits out of 10 for condition 1). To do that, I tried the following in ggplot2:

library(ggplot2)

ggplot(data=test, aes(x=condition, fill=answer=="H"))+
  geom_bar()+labs(x="Conditions", y="Hitrate")+
  coord_cartesian(xlim = c(1, 10), ylim = c(0, 10))+ 
  scale_x_continuous(breaks=seq(1,10,1))

And it looked like this:

This is almost exactly what I need, except for the red (non-hit) part that covers the rest of every bar. You can see that conditions 3 to 5 have no blue bar, because there are no hits for these conditions.

Is there any way to get rid of this red color, or alternatively to count the number of hits for each condition? I tried dplyr's count function, but it only reported the number of H's for conditions that had at least one. Conditions 3 to 5 were simply ignored by count; there wasn't even a 0 in the output, but I still need those values for the plot.
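To make the count behaviour described above concrete, here is a minimal reproduction. It assumes a small hypothetical data frame `toy` (not the real data) in which condition 3 has no hits: filtering before counting removes every row of condition 3, so `count()` never sees that condition and cannot report a 0 for it.

```r
library(dplyr)

# Hypothetical toy data: condition 3 has no "H" answers at all
toy <- data.frame(condition = c(1, 1, 2, 2, 3, 3),
                  answer    = c("H", "M", "H", "H", "W", "M"))

# filter() drops every row of condition 3 before count() runs,
# so the result has rows for conditions 1 and 2 only (no 0 for 3)
res <- toy %>%
  filter(answer == "H") %>%
  count(condition)

print(res)
```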

Sorry for the long post, but I've really reached the limit of my knowledge here. I'm open to suggestions or alternatives! Thanks in advance!

3 answers
时光不老,我们不散
Post #2 · 2019-08-20 11:31

One option is to filter your dataset down to the rows where answer == "H" and then plot that.
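A minimal sketch of that first option, assuming randomly generated data shaped like the question's `test` (the seed, the `hits_only` name and the colour are illustrative choices). Only hit rows are left, so geom_bar() counts hits directly and conditions without any hits simply have no bar.

```r
library(ggplot2)

# Illustrative data shaped like the question's `test` (seed is arbitrary)
set.seed(1)
test <- data.frame(condition = rep(1:10, each = 10),
                   answer = sample(c("H", "M", "W"), 100, replace = TRUE))

# Keep only the hits; geom_bar() then counts the remaining rows per condition
hits_only <- test[test$answer == "H", ]

p <- ggplot(hits_only, aes(x = condition)) +
  geom_bar(fill = "cadetblue") +
  labs(x = "Conditions", y = "Hits") +
  # Zoom so all ten condition positions are shown even if one has no hits
  coord_cartesian(xlim = c(0.5, 10.5), ylim = c(0, 10)) +
  scale_x_continuous(breaks = 1:10)

print(p)
```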

An alternative is to use a grouped bar plot, made by setting position = "dodge":

library(ggplot2)

test <- data.frame(condition = rep(1:10, each = 10),
                   answer = sample(c('H', 'M', 'W'), 100, replace = TRUE))

ggplot(data=test) +
    geom_bar(aes(x = condition, fill = answer), position = "dodge") +
    labs(x="Conditions", y="Hitrate") +
    coord_cartesian(xlim = c(1, 10), ylim = c(0, 10)) + 
    scale_x_continuous(breaks=seq(1,10,1))

[Image: grouped barplot]

Also note that if the condition is actually a categorical variable, it may be better to make it a factor:

test$condition <- as.factor(test$condition)

This means that you no longer need the scale_x_continuous call (in fact you have to drop it, because a continuous scale can't be applied to a factor), and the grid lines will be cleaner.
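Putting the factor advice together with the dodged plot might look like the sketch below. It assumes freshly simulated data (the seed is arbitrary) and relabels the y-axis as a count, since the bars show counts rather than a rate.

```r
library(ggplot2)

set.seed(42)
test <- data.frame(condition = rep(1:10, each = 10),
                   answer = sample(c("H", "M", "W"), 100, replace = TRUE))

# Treat condition as categorical: one discrete x position per level
test$condition <- as.factor(test$condition)

# No scale_x_continuous() here: a continuous x scale can't take a factor
p <- ggplot(test, aes(x = condition, fill = answer)) +
  geom_bar(position = "dodge") +
  labs(x = "Conditions", y = "Count")

print(p)
```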

我欲成王,谁敢阻挡
Post #3 · 2019-08-20 11:32

Another option is to pick your fill colors explicitly with scale_fill_manual and make FALSE transparent. Since FALSE sorts before TRUE, the first value you specify applies to FALSE and the second to TRUE.

library(ggplot2)

ggplot(data=test, aes(x=condition, fill=answer=="H"))+
  geom_bar()+labs(x="Conditions", y="Hitrate")+
  coord_cartesian(xlim = c(1, 10), ylim = c(0, 10))+ 
  scale_x_continuous(breaks=seq(1,10,1)) +
  scale_fill_manual(values = c(alpha("red", 0), "cadetblue")) +
  # guides(fill = FALSE) is deprecated in recent ggplot2; "none" hides the legend
  guides(fill = "none")

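If you'd rather not depend on level order at all, scale_fill_manual also accepts a named vector of values. The sketch below assumes simulated data (arbitrary seed) and uses the built-in colour name "transparent" in place of alpha("red", 0); passing guide = "none" to the scale hides the legend in the same way as guides(fill = "none").

```r
library(ggplot2)

set.seed(7)
test <- data.frame(condition = rep(1:10, each = 10),
                   answer = sample(c("H", "M", "W"), 100, replace = TRUE))

p <- ggplot(test, aes(x = condition, fill = answer == "H")) +
  geom_bar() +
  labs(x = "Conditions", y = "Hits") +
  scale_x_continuous(breaks = 1:10) +
  # Named values tie each colour to a level, so level order doesn't matter;
  # guide = "none" hides the legend
  scale_fill_manual(values = c("FALSE" = "transparent", "TRUE" = "cadetblue"),
                    guide = "none")

print(p)
```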

够拽才男人
Post #4 · 2019-08-20 11:36

This is a situation where a little preprocessing goes a long way. I made sample data that recreates the issue, i.e. it contains conditions without any "H"s.

Instead of relying on ggplot to aggregate the data the way you want, do the aggregation yourself with the proper tools. Since you mentioned dplyr::count, I use dplyr functions.

The preprocessing task is to count the observations with answer "H", including cases where the count is 0. To make sure all combinations are retained, convert condition to a factor and set .drop = FALSE in count, which passes it on to group_by.

library(dplyr)
library(ggplot2)

set.seed(529)
test <- data.frame(condition = rep(1:10, times = 10),
                   answer = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                              sample(c("M", "W"), 50, replace = TRUE)))

hit_counts <- test %>%
  mutate(condition = as.factor(condition)) %>%
  filter(answer == "H") %>%
  # .drop = FALSE keeps factor levels that have no rows left after filtering
  count(condition, .drop = FALSE)

hit_counts
#> # A tibble: 10 x 2
#>    condition     n
#>    <fct>     <int>
#>  1 1             0
#>  2 2             1
#>  3 3             4
#>  4 4             2
#>  5 5             3
#>  6 6             0
#>  7 7             3
#>  8 8             2
#>  9 9             1
#> 10 10            1

Then just plot that. geom_col is the version of geom_bar for when you already have your y-values, instead of having ggplot tally them up for you.

ggplot(hit_counts, aes(x = condition, y = n)) +
  geom_col()
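A variation on the same preprocessing idea, swapping count() for summarise(): instead of filtering first, sum the logical answer == "H" within each condition. Every condition keeps its rows, so zero-hit conditions come out as 0 without needing a factor or .drop. This is a sketch assuming dplyr 1.0 or later (for the .groups argument) and reuses the sample data from above.

```r
library(dplyr)
library(ggplot2)

set.seed(529)
test <- data.frame(condition = rep(1:10, times = 10),
                   answer = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                              sample(c("M", "W"), 50, replace = TRUE)))

# Sum a logical per group instead of filtering first: every condition keeps
# its rows, so a condition with no hits gets n = 0 rather than disappearing
hit_counts <- test %>%
  group_by(condition) %>%
  summarise(n = sum(answer == "H"), .groups = "drop")

p <- ggplot(hit_counts, aes(x = factor(condition), y = n)) +
  geom_col() +
  labs(x = "Conditions", y = "Hits")

print(p)
```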
