ggplot barplot : How to display small positive num

Posted 2020-02-15 09:08

Question:

Main issue: I want to display data ranging from 0 to 1.0 as upward bars (starting from 0), with the y-axis intervals log-spaced rather than equally spaced.

I am trying to display the column labeled "mean" in the dataset below as a bar plot in ggplot. Because the numbers are very small, I would like to put the y-axis on a log scale rather than log-transform the data itself. In other words, I want upright bars with y-axis labels at 0, 1e-8, 1e-6, 1e-4, 1e-2 and 1e-0 (i.e. from 0 to 1.0, but with log-scaled intervals).

The attempt below does not work because the bars are inverted.

> print(df)
        type         mean           sd           se snp
V7    outer 1.596946e-07 2.967432e-06 1.009740e-08   A
V8    outer 7.472417e-07 6.598652e-06 2.245349e-08   B
V9    outer 1.352327e-07 2.515771e-06 8.560512e-09   C
V10   outer 2.307726e-07 3.235821e-06 1.101065e-08   D
V11   outer 4.598375e-06 1.653457e-05 5.626284e-08   E
V12   outer 5.963164e-07 5.372226e-06 1.828028e-08   F
V71  middle 2.035414e-07 3.246161e-06 1.104584e-08   A
V81  middle 9.000131e-07 7.261463e-06 2.470886e-08   B
V91  middle 1.647716e-07 2.875840e-06 9.785733e-09   C
V101 middle 3.290817e-07 3.886779e-06 1.322569e-08   D
V111 middle 6.371170e-06 1.986268e-05 6.758752e-08   E
V121 middle 8.312429e-07 6.329386e-06 2.153725e-08   F

The code below properly generates the grouped barplot with error bars:

ggplot(data=df, aes(x=snp,y=mean,fill=type))+
  geom_bar(stat="identity",position=position_dodge(),width=0.5) + 
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se),width=.3, position=position_dodge(.45)) 

However, I want to make the y-axis log scaled and so I add in scale_y_log10() as follows:

 ggplot(data=df, aes(x=snp,y=mean,fill=type))+
  geom_bar(stat="identity",position=position_dodge(),width=0.5) + scale_y_log10() +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se),width=.3, position=position_dodge(.45)) 

Strangely, the bars now hang down from the top of the plot. I simply want them to go up as usual, and I don't know what I am doing wrong.

Thank you

Answer 1:

Here's a bit of hacking to show what happens if you try to get bars that start at zero on a log scale. I've used geom_segment for illustration, so that I can create "bars" (wide line segments, actually) extending over arbitrary ranges. To make this work, I've also had to do all the dodging manually, which is why the x mapping looks weird.

In the example below, the scale goes from y=1e-20 to y=1. The y-axis intervals are log scaled, meaning that the physical distance from, say, 1e-20 to 1e-19 is the same as the physical distance from, say, 1e-8 to 1e-7, even though the magnitudes of those intervals differ by a factor of one trillion.
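To make the "same physical distance" point concrete, here is a small base R sketch (not part of the plotting code; the variable names are just for illustration). It compares the distance on a log10 axis with the raw width of the two intervals mentioned above.

```r
# Distance along a log10 axis is the difference of the log10 values.
low  <- c(1e-20, 1e-19)
high <- c(1e-8, 1e-7)

axis_dist_low  <- diff(log10(low))    # one decade on the axis
axis_dist_high <- diff(log10(high))   # also one decade on the axis

# The raw widths of those two intervals are wildly different.
width_ratio <- diff(high) / diff(low)

print(c(axis_dist_low, axis_dist_high))
print(width_ratio)
```

Both intervals take up one decade of axis length, while their raw widths differ by a factor of about 1e12.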

Bars that go down to zero can't be displayed, because zero on the log scale is an infinite distance below the bottom of the graph. We could get closer to zero by, for example, changing 1e-20 to 1e-100 in the code below. But that will just make the already-small physical distances between the data values even smaller and thus even harder to distinguish.
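A quick base R check of this, using the "outer" means copied from the question (the variable names are mine). It probably also explains the inverted bars in the question: as far as I understand ggplot2's behaviour, the bars are anchored at 0 after the transformation, which corresponds to 1 on the original scale, and every value here sits below that.

```r
# Mean values for the "outer" group, copied from the printed data.
means <- c(1.596946e-07, 7.472417e-07, 1.352327e-07,
           2.307726e-07, 4.598375e-06, 5.963164e-07)

print(log10(0))   # -Inf: zero has no finite position on a log axis
print(log10(1))   # 0: a baseline of zero in log space corresponds to y = 1

# All of the data are below 1, so their log10 values are negative
# and a bar drawn from the log-space baseline has to point downward.
log_means <- log10(means)
print(log_means)
```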

The bars are also misleading in another way because, as @hrbrmstr pointed out, our brains treat distance along the bar linearly, but the magnitude represented by each increment of distance along the bar changes by a factor of 10 every few millimeters in the example below. The bars simply aren't encoding meaningful information about the data.

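# Note: snp and type must be factors for as.numeric() to return level codes.
# Manual dodge: shift each type by -0.15 or +0.15 around the integer snp position.
ggplot(data=df, aes(x=as.numeric(snp) + 0.3*(as.numeric(type) - 1.5), 
                    y=mean, colour=type)) +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.3) +
  # Wide line segments stand in for bars, starting at the bottom of the scale
  geom_segment(aes(xend=as.numeric(snp) + 0.3*(as.numeric(type) - 1.5),
                   y=1e-20, yend=mean), size=5) +
  scale_y_log10(limits=c(1e-20, 1), breaks=10^(-100:0), expand=c(0,0)) +
  scale_x_continuous(breaks=1:6, labels=LETTERS[1:6])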
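In case the x mapping is hard to read, here is a sketch of what the manual dodging computes. The two vectors are rebuilt by hand from the printed data and are only for illustration. They are created as factors on purpose: since R 4.0 strings in a data frame are no longer converted to factors by default, and as.numeric() on a character column would give NA instead of level codes.

```r
# Sketch only: snp and type rebuilt by hand from the printed data, as factors.
snp  <- factor(rep(LETTERS[1:6], times = 2))
type <- factor(rep(c("outer", "middle"), each = 6))

# Factor levels sort alphabetically, so "middle" is code 1 and "outer" is code 2.
offset <- 0.3 * (as.numeric(type) - 1.5)   # -0.15 for middle, +0.15 for outer
x_pos  <- as.numeric(snp) + offset          # A is centred on 1, B on 2, and so on

print(data.frame(snp, type, x_pos))
```

Each pair of segments therefore straddles an integer position, which is why scale_x_continuous() can relabel positions 1 to 6 with the letters A to F.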

If you want to stick with a log scale, maybe plotting points would be a better approach:

pd = position_dodge(.5)
ggplot(data=df, aes(x=snp,y=mean,fill=type))+
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se, colour=type), width=.3, position=pd) +
  geom_point(aes(colour=type), position=pd) +
  scale_y_log10(limits=c(1e-7, 1e-5), breaks=10^(-10:0)) +
  annotation_logticks(sides="l")