可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I'm pretty new to R and I have a problem with plotting a barplot out of my data which looks like this:
condition answer
2 H
1 H
8 H
5 W
4 M
7 H
9 H
10 H
6 H
3 W
The data consists of 100 rows with the conditions 1 to 10, each randomly generated 10 times (10 times condition 1, 10 times condition 8,...). Each of the conditions also has a answer which could be H for Hit, M for Miss or W for wrong.
I want to plot the number of Hits for each condition in a barplot (for example 8 Hits out of 10 for condition 1,...) for that I tried to do the following in ggplot2
ggplot(data=test, aes(x=test$condition, fill=answer=="H"))+
geom_bar()+labs(x="Conditions", y="Hitrate")+
coord_cartesian(xlim = c(1:10), ylim = c(0:10))+
scale_x_continuous(breaks=seq(1,10,1))
And it looked like this:
This actually exactly what I need except for the red color which covers everything. You can see that conditions 3 to 5 have no blue bar, because there are no hits for these conditions.
Is there any way to get rid of this red color and to maybe count the amount of hits for the different conditions? -> I tried the count function of dplyr but it only showed me the amount of H when there where some for this particular condition. 3-5 where just "ignored" by count, there wasn't even a 0 in the output.-> but I'd still need those numbers for the plot
I'm sorry for this particular long post but I'm really at the end of knowledge considering this. I'd be open for suggestions or alternatives! Thanks in advance!
回答1:
This is a situation where a little preprocessing goes a long way. I made sample data that would recreate the issue, i.e. has cases where there won't be any "H"s.
Instead of relying on ggplot
to aggregate data in the way you want it, use proper tools. Since you mention dplyr::count
, I use dplyr
functions.
The preprocessing task is to count observations with answer "H", including cases where the count is 0. To make sure all combinations are retained, convert condition to a factor and set .drop = F
in count
, which is in turn passed to group_by
.
library(dplyr)
library(ggplot2)
set.seed(529)
test <- data.frame(condition = rep(1:10, times = 10),
answer = c(sample(c("H", "M", "W"), 50, replace = T),
sample(c("M", "W"), 50, replace = T)))
hit_counts <- test %>%
mutate(condition = as.factor(condition)) %>%
filter(answer == "H") %>%
count(condition, .drop = F)
hit_counts
#> # A tibble: 10 x 2
#> condition n
#> <fct> <int>
#> 1 1 0
#> 2 2 1
#> 3 3 4
#> 4 4 2
#> 5 5 3
#> 6 6 0
#> 7 7 3
#> 8 8 2
#> 9 9 1
#> 10 10 1
Then just plot that. geom_col
is the version of geom_bar
for where you have your y-values already, instead of having ggplot
tally them up for you.
ggplot(hit_counts, aes(x = condition, y = n)) +
geom_col()
回答2:
One option is to just filter out anything but where answer == "H"
from your dataset, and then plot.
An alternative is to use a grouped bar plot, made by setting position = "dodge"
:
test <- data.frame(condition = rep(1:10, each = 10),
answer = sample(c('H', 'M', 'W'), 100, replace = T))
ggplot(data=test) +
geom_bar(aes(x = condition, fill = answer), position = "dodge") +
labs(x="Conditions", y="Hitrate") +
coord_cartesian(xlim = c(1:10), ylim = c(0:10)) +
scale_x_continuous(breaks=seq(1,10,1))
Also note that if the condition is actually a categorical variable, it may be better to make it a factor:
test$condition <- as.factor(test$condition)
This means that you don't need the scale_x_continuous
call, and that the grid lines will be cleaner.
回答3:
Another option is to pick your fill colors explicitly and make FALSE transparent by using scale_fill_manual
. Since FALSE comes alphabetically first, the first value to specify is FALSE, the second TRUE.
ggplot(data=test, aes(x=condition, fill=answer=="H"))+
geom_bar()+labs(x="Conditions", y="Hitrate")+
coord_cartesian(xlim = c(1:10), ylim = c(0:10))+
scale_x_continuous(breaks=seq(1,10,1)) +
scale_fill_manual(values = c(alpha("red", 0), "cadetblue")) +
guides(fill = F)