I have three cohorts of students identified by an ExperimentCohort factor. For each student, I have a LetterGrade, also a factor. I'd like to plot a histogram-like bar graph of LetterGrade for each ExperimentCohort. Using
ggplot(df, alpha = 0.2,
       aes(x = LetterGrade, group = ExperimentCohort, fill = ExperimentCohort)) +
  geom_bar(position = "dodge")
gets me very close, but the three ExperimentCohorts don't have the same number of students. To compare these on a more even footing, I'd like the y-axis to be the in-cohort proportion of each letter grade. So far, short of calculating this proportion and putting it in a separate data frame before plotting, I have not been able to find a way to do this.
Every solution to a similar question on SO and elsewhere involves aes(y = ..count../sum(..count..)), but sum(..count..) is executed across the whole data frame rather than within each cohort. Anyone got a suggestion? Here's code to create an example data frame:
df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))
Thanks.
I recently attempted this and received an error calling ddply:
Column prop must be length 1 (a summary value), not 6
I spent some time with ddply but couldn't quite get that solution to work, so I offer an alternative (note that this still makes use of plyr):
Then you can plot it just as @didzis-elferts mentioned:
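A minimal sketch of one plyr-based way to compute in-cohort proportions and plot them; it is not necessarily the code originally posted with this answer. It passes an anonymous function to ddply instead of summarise, which would sidestep any masking between plyr and dplyr. That masking is only an assumed cause of the error above (the message is worded like dplyr's summarise). The seed and the names df.new and p are illustrative choices.

```r
library(plyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

# For each cohort, tabulate the grades and convert the counts to proportions.
# ddply adds the ExperimentCohort column back to the combined result.
df.new <- ddply(df, .(ExperimentCohort), function(d) {
  tab <- prop.table(table(d$LetterGrade))
  data.frame(LetterGrade = names(tab), prop = as.vector(tab))
})

# The proportions are precomputed, so the bars are drawn with stat = "identity".
p <- ggplot(df.new, aes(x = LetterGrade, y = prop, fill = ExperimentCohort)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

Within each cohort the prop values sum to 1, so the bar heights are comparable across cohorts of different sizes.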
Wrong solution
You can use stat_bin() and y=..density.. to get percentages in each group.
UPDATE - correct solution
As pointed out by @rpierce, y=..density.. will calculate density values for each group, not the percentages (they are not the same).
To get the correct solution with percentages, one way is to calculate them before plotting. For this, I used the function ddply() from the plyr package. Within each ExperimentCohort, proportions are calculated with the functions prop.table() and table() and saved as prop. names() and table() are used to get LetterGrade back.
Now use this new data frame for plotting. As the proportions are already calculated, provide them as the y values and add stat="identity" inside geom_bar.
You can also do this by creating a
weight column that sums to 1 for each group:
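A minimal sketch of the weight idea, assuming base R's ave() is used to get each row's cohort size (the original answer may have built the column differently). Each row gets weight 1/n, where n is the size of its cohort, and geom_bar sums the weight aesthetic instead of counting rows. The seed, the y-axis label, and the name p are illustrative.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

# ave() returns, for every row, the number of rows in that row's cohort,
# so 1 / n gives per-row weights that sum to 1 within each cohort.
df$weight <- 1 / ave(df$ID, df$ExperimentCohort, FUN = length)

# With a weight aesthetic, each bar's height is the sum of the weights,
# i.e. the in-cohort proportion of that letter grade.
p <- ggplot(df, aes(x = LetterGrade, fill = ExperimentCohort, weight = weight)) +
  geom_bar(position = "dodge") +
  ylab("Proportion within cohort")
print(p)
```

This keeps the original data frame and avoids building a separate summary table, at the cost of one extra column.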