How to improve the aspect of ggplot histograms wit

Posted 2020-05-28 10:57

Question:

I am trying to improve the clarity and appearance of a histogram of discrete values that I need to show on a log scale.

Please consider the following MWE:

library(ggplot2)
set.seed(99)
# as.integer() truncates, so every draw below 1 becomes 0
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)  # "integer"
ggplot(data, aes(x=dist)) + geom_histogram()

which produces

and then

ggplot(data, aes(x=dist)) + geom_histogram() + scale_x_log10(breaks=c(1,2,3,4,5,10,100))

which is probably even worse,

since it now gives the impression that something is missing between "1" and "2", and it is also not clear which bar has value "1" (the bar is to the right of the tick) and which bar has value "2" (the bar is to the left of the tick).

I understand that technically ggplot provides the "right" visual answer for a log scale. Yet as an observer I have trouble understanding it.

Is it possible to improve this?

EDIT:

This is what happens when I apply Jaap's solution to my real data:

Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why is the plot also mapping x=1.5 and x=2.5?

Answer 1:

The first thing that comes to mind is playing with the binwidth, but that doesn't give a great solution either:

ggplot(data, aes(x=dist)) +
  geom_histogram(binwidth=10) +
  scale_x_continuous(expand=c(0,0)) +
  scale_y_continuous(expand=c(0.015,0)) +
  theme_bw()

gives:


In this case it is probably better to use a density plot. However, with scale_x_log10 you will get a warning message (Removed 524 rows containing non-finite values (stat_density)), because the zeros in the data are mapped to log10(0) = -Inf and dropped. This can be resolved by using a log-plus-one transformation (trans="log1p").
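To see where the non-finite values come from, here is a small base-R check on the question's data. It is only a sketch: the variable names `n_zero` and `n_nonfinite` are illustrative, and the exact count depends on the random draw, so no number is asserted here.

```r
set.seed(99)
dist <- as.integer(rlnorm(1000, sdlog = 2))

# as.integer() truncates toward zero, so every draw below 1 becomes 0.
n_zero <- sum(dist == 0)

# log10(0) is -Inf, which a log10 scale cannot place on the axis.
n_nonfinite <- sum(!is.finite(log10(dist)))

# log1p(x) = log(1 + x) keeps zero finite and stays monotone for x >= 0.
zero_on_log1p <- log1p(0)

n_zero
n_nonfinite
zero_on_log1p
```

The two counts are equal by construction: the only values that a log10 scale drops are the zeros, and the log1p transform maps them to 0 instead.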

The following code:

library(ggplot2)
library(scales)

ggplot(data, aes(x=dist)) +
  stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
  scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
  scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
  theme_bw()

will give this result:


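On the dips asked about in the question's edit: a kernel density estimate is a smooth curve evaluated over a continuous range, so it is drawn at non-integer positions such as 1.5 even when the data are integers. On the log1p axis the small integers sit far apart, so the curve sags between them. A minimal sketch of one alternative, assuming a spike plot (one vertical segment per distinct value) is acceptable; the `counts` data frame and the column name `n` are illustrative and not part of the original answers:

```r
library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# One row per distinct integer value, with its frequency in column n.
counts <- as.data.frame(table(dist = data$dist), responseName = "n")
# table() stores the values as factor labels; convert them back to integers.
counts$dist <- as.integer(as.character(counts$dist))

# Draw each value as a spike from 0 up to its count, on the same log1p x axis.
p <- ggplot(counts, aes(x = dist, y = n)) +
  geom_segment(aes(xend = dist, yend = 0)) +
  geom_point(size = 1) +
  scale_x_continuous(breaks = c(0, 1, 2, 3, 4, 5, 10, 30, 100, 300, 1000),
                     trans = "log1p") +
  theme_bw()
p
```

Because nothing is drawn between the observed values, there is no interpolation to misread, at the cost of losing the filled-area look of the density plot.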

Answer 2:

I wonder what happens if the y-axis is scaled instead of the x-axis. It will result in a few warnings wherever bin counts are 0, but it may serve your purpose.

library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram() + scale_y_log10()

Also, you may want to display the frequencies as data labels, since people might overlook the y scale and it takes some time to realize that it is logarithmic.

ggplot(data, aes(x=dist)) + geom_histogram(fill = 'skyblue', color = 'grey30') + scale_y_log10() +
  stat_bin(geom="text", size=3.5, aes(label=..count.., y=0.8*(..count..)))
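The warnings with scale_y_log10() arise because empty bins have a count of 0 and log10(0) is -Inf. A hedged variation, assuming a log-plus-one y axis is acceptable for the plot: the "log1p" transformation already used for the x axis in Answer 1 can be applied to y instead, which keeps empty bins at 0. The `bins = 30` value and the break positions below are illustrative choices.

```r
library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# log10 sends an empty bin (count 0) to -Inf; log1p keeps it at 0.
log10(0)
log1p(0)

# Same histogram as above, but with a log(1 + count) y axis.
p <- ggplot(data, aes(x = dist)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "grey30") +
  scale_y_continuous(trans = "log1p", breaks = c(0, 1, 10, 100, 1000))
p
```

With this axis a bin holding a single observation still shows as a visible bar, whereas on a pure log10 axis a count of 1 sits at height log10(1) = 0.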



Answer 3:

A solution could be to convert your data to a factor:

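library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
# geom_bar() counts rows per factor level; current ggplot2 versions reject
# a discrete x in geom_histogram() (stat_bin), so geom_bar() is used here
ggplot(data, aes(x=factor(dist))) +
    geom_bar() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))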

Resulting in:



Tags: r ggplot2