ggplot2: Using `fill = …` in aes(..) and geom_bar(

Posted 2019-07-28 18:27


Here is a barplot with ggplot:

ggplot(subset(dat, Gene=='3_RH2B'), aes(x=Morpho, y=Weights, fill=Model2)) + geom_bar(stat='identity') + ggtitle('RH2B')

My problem is that the colors repeat within a bar instead of forming one contiguous block per color. I would like each bar to consist of three blocks of color, one for each of the three levels of the variable dat$Model2. How can I achieve this? And why does ggplot produce this graph rather than the one I want?

Here is the data.frame dat:

      Gene    Morpho Model     Weights Model2
1   1_RH1 Morph_PC1  OUMV 0.081666667   OUMx
2   1_RH1 Morph_PC1   OUM 0.093333333   OUMx
3   1_RH1 Morph_PC1   BM1 0.286666667    BMx
4   1_RH1 Morph_PC1 OUMVA 0.191666667   OUMx
5   1_RH1 Morph_PC1   OU1 0.076666667    OU1
6   1_RH1 Morph_PC1   BMS 0.255000000    BMx
7   1_RH1 Morph_PC1  OUMA 0.013333333   OUMx
8   1_RH1 Morph_PC2   OU1 0.106666667    OU1
9   1_RH1 Morph_PC2   BM1 0.030000000    BMx
10  1_RH1 Morph_PC2   OUM 0.226666667   OUMx
11  1_RH1 Morph_PC2 OUMVA 0.346666667   OUMx
12  1_RH1 Morph_PC2  OUMA 0.238333333   OUMx
13  1_RH1 Morph_PC2  OUMV 0.045000000   OUMx
14  1_RH1 Morph_PC2   BMS 0.003333333    BMx
15  2_LWS Morph_PC1   BM1 0.545000000    BMx
16  2_LWS Morph_PC1   BMS 0.253333333    BMx
17  2_LWS Morph_PC1   OUM 0.061666667   OUMx
18  2_LWS Morph_PC1  OUMV 0.018333333   OUMx
19  2_LWS Morph_PC1  OUMA 0.015000000   OUMx
20  2_LWS Morph_PC1 OUMVA 0.110000000   OUMx
21  2_LWS Morph_PC1   OU1 0.000000000    OU1
22  2_LWS Morph_PC2   OU1 0.136666667    OU1
23  2_LWS Morph_PC2   OUM 0.078333333   OUMx
24  2_LWS Morph_PC2 OUMVA 0.373333333   OUMx
25  2_LWS Morph_PC2   BM1 0.028333333    BMx
26  2_LWS Morph_PC2  OUMV 0.018333333   OUMx
27  2_LWS Morph_PC2  OUMA 0.353333333   OUMx
28  2_LWS Morph_PC2   BMS 0.013333333    BMx
29 3_RH2B Morph_PC1   BM1 0.301666667    BMx
30 3_RH2B Morph_PC1   BMS 0.478333333    BMx
31 3_RH2B Morph_PC1   OU1 0.091666667    OU1
32 3_RH2B Morph_PC1   OUM 0.066666667   OUMx
33 3_RH2B Morph_PC1  OUMA 0.028333333   OUMx
34 3_RH2B Morph_PC1  OUMV 0.023333333   OUMx
35 3_RH2B Morph_PC1 OUMVA 0.008333333   OUMx
36 3_RH2B Morph_PC2   OUM 0.246666667   OUMx
37 3_RH2B Morph_PC2  OUMA 0.171666667   OUMx
38 3_RH2B Morph_PC2  OUMV 0.096666667   OUMx
39 3_RH2B Morph_PC2   BMS 0.106666667    BMx
40 3_RH2B Morph_PC2   OU1 0.213333333    OU1
41 3_RH2B Morph_PC2   BM1 0.140000000    BMx
42 3_RH2B Morph_PC2 OUMVA 0.025000000   OUMx


It appears that your data.frame is a summary table, in which case stat = 'identity' would seem appropriate in the geom_bar command. It is not enough here, though: the table has several rows for each combination of Morpho and Model2, so ggplot still has to aggregate the summary table. For the first stacked bar (Morph_PC1), the rows are already grouped by Model2, so even with stat = 'identity' the segments of each colour sit next to each other and add up to one block of the appropriate height. If you change the row order for the first stacked bar, it too will contain repeated colours. To see the effect, run your ggplot command on the following data frame. It is your data for 3_RH2B, except that the OUM row (row 32) has been moved up between the two BMx rows, which changes the order of the Model2 variable.

dat = read.table(text = "      Gene    Morpho Model     Weights Model2
29 3_RH2B Morph_PC1   BM1 0.301666667    BMx
32 3_RH2B Morph_PC1   OUM 0.066666667   OUMx
30 3_RH2B Morph_PC1   BMS 0.478333333    BMx
31 3_RH2B Morph_PC1   OU1 0.091666667    OU1
33 3_RH2B Morph_PC1  OUMA 0.028333333   OUMx
34 3_RH2B Morph_PC1  OUMV 0.023333333   OUMx
35 3_RH2B Morph_PC1 OUMVA 0.008333333   OUMx
36 3_RH2B Morph_PC2   OUM 0.246666667   OUMx
37 3_RH2B Morph_PC2  OUMA 0.171666667   OUMx
38 3_RH2B Morph_PC2  OUMV 0.096666667   OUMx
39 3_RH2B Morph_PC2   BMS 0.106666667    BMx
40 3_RH2B Morph_PC2   OU1 0.213333333    OU1
41 3_RH2B Morph_PC2   BM1 0.140000000    BMx
42 3_RH2B Morph_PC2 OUMVA 0.025000000   OUMx", header = TRUE, sep = "")

Additional solutions to the one offered by @Alpha:

Perform the additional aggregation outside the ggplot2 command, then plot:

datRevised = aggregate(Weights ~ Morpho + Model2, data = dat, FUN = "sum")
ggplot(datRevised, aes(x=Morpho, y=Weights, fill=Model2)) + geom_bar(stat='identity') + ggtitle('RH2B')

Or use the weight aesthetic on the original data frame. With the default stat, geom_bar() then sums the weights within each group instead of counting rows, so no prior aggregation is needed.

ggplot(dat, aes(x=Morpho, weight=Weights, fill=Model2)) + geom_bar() + ggtitle('RH2B')
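To see what both approaches end up stacking, the block heights can be computed without plotting. This is only a minimal base-R sketch: it rebuilds rows 29-42 of the question's data with the Gene and Model columns dropped for brevity, and the names block_heights and height_table are illustrative, not part of the original answer.

```r
# Rows 29-42 of the question's data (Gene == '3_RH2B');
# the Gene and Model columns are omitted because they are not needed here.
dat <- read.table(text = "Morpho Weights Model2
Morph_PC1 0.301666667 BMx
Morph_PC1 0.478333333 BMx
Morph_PC1 0.091666667 OU1
Morph_PC1 0.066666667 OUMx
Morph_PC1 0.028333333 OUMx
Morph_PC1 0.023333333 OUMx
Morph_PC1 0.008333333 OUMx
Morph_PC2 0.246666667 OUMx
Morph_PC2 0.171666667 OUMx
Morph_PC2 0.096666667 OUMx
Morph_PC2 0.106666667 BMx
Morph_PC2 0.213333333 OU1
Morph_PC2 0.140000000 BMx
Morph_PC2 0.025000000 OUMx", header = TRUE)

# One row per (Morpho, Model2) pair: the height of each coloured block.
block_heights <- aggregate(Weights ~ Morpho + Model2, data = dat, FUN = sum)
print(block_heights)

# The same sums laid out as a Morpho x Model2 table, one table row per bar.
height_table <- xtabs(Weights ~ Morpho + Model2, data = dat)
print(height_table)
```

Each row of height_table corresponds to one bar and each column to one fill colour, so the row order of the input no longer matters.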


What you need to do is sort the rows of your data by Model2; then the plot works fine:

sub <- subset(dat, Gene == '3_RH2B')
df <- sub[with(sub, order(Model2)), ]

ggplot(df, aes(x = Morpho, y = Weights, fill = Model2)) + 
  geom_bar(stat = 'identity') + ggtitle('RH2B')

If you plot the unordered data without stat = 'identity' (which also means dropping the y = Weights mapping, so the bars show row counts rather than weights), you can see that the repeated-block problem does not occur:

ggplot(sub, aes(x = Morpho, fill = Model2)) + geom_bar() + ggtitle('RH2B')