Forcing a 1e3 instead of 1000 format in ggplot R

Posted 2019-05-20 09:07

Question:

I'm having trouble with the format of the y-axis labels. When I use scale_y_log10() in my plot, the axis is labelled 0.1, 10, 1000. I need it to display 1e-1, 1e1, 1e3 instead. The math_format help page does not help, because I don't know which format to pass.

Anything I can answer I will.

Answer 1:

You can use the breaks and labels parameters of scale_y_log10 as in

library(ggplot2)
library(ggplot2movies)  # in current ggplot2 releases the movies data set lives here

ggplot(data=subset(movies, votes > 1000)) +
  aes(x = rating, y = votes / 10000) +
  scale_y_log10(breaks = c(0.1, 1, 10), labels = expression(10^-1, 10^0, 10^1)) +
  geom_point()

This might not be an elegant solution, but it works if you only have a limited number of plots.
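
If the labels must read exactly 1e-1, 1e1, 1e3 (no zero padding, no plus sign), one option is to pass a small function to labels instead of a fixed vector. The following is a minimal sketch, assuming every break is an exact power of ten; label_1e is a hypothetical helper name, not part of ggplot2, and the data frame is made up for illustration.

```r
# Hypothetical helper: format exact powers of ten as "1e-1", "1e1", "1e3".
# Assumption: every break is a power of ten; other values get a rounded exponent.
label_1e <- function(x) {
  ifelse(is.na(x), NA_character_, paste0("1e", round(log10(x))))
}

label_1e(c(0.1, 10, 1000))   # "1e-1" "1e1" "1e3"

# Usage with ggplot2 (only runs if the package is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  df <- data.frame(x = 1:3, y = c(0.1, 10, 1000))   # illustrative data
  p <- ggplot(df, aes(x, y)) +
    geom_point() +
    scale_y_log10(breaks = c(0.1, 10, 1000), labels = label_1e)
}
```

Because the function is applied to whatever breaks the scale uses, the breaks argument can be changed without rewriting the labels by hand.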



Answer 2:

The problem is that R uses a poorly understood penalty mechanism to decide whether to print a number in fixed or scientific notation. It is controlled by options(scipen).

The value is a penalty, in characters, that R adds to the width of the scientific representation before comparing it with the width of the fixed-point representation; the narrower of the two is printed. With options(scipen = 3), for example, 100 in scientific notation is 1e+02 (5 characters, plus a penalty of 3, giving 8), whereas the fixed-point form 100 needs only 3 characters, so 100 is printed. To fix your example, set options(scipen = -10) so that scientific notation is always favoured over fixed point. So (using @PeterB's example) you can set scipen and not worry about setting breaks manually:

options(scipen = -10)  # global setting: affects all later number formatting
ggplot(data=subset(movies, votes > 1000)) +
  aes(x = rating, y = votes / 10000) +
  geom_point()
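
The penalty rule can be checked in plain R with format(), without drawing a plot. This is a small sketch using only base R; it saves and restores the previous option so the session is left unchanged.

```r
# Remember the current setting so it can be restored afterwards.
old <- options(scipen = 0)   # R's default penalty

format(1e5)                  # "1e+05": 5 characters beats "100000" (6 characters)
options(scipen = 3)
format(1e5)                  # "100000": scientific now costs 5 + 3 characters
options(scipen = -10)
format(100)                  # "1e+02": scientific is strongly favoured

options(old)                 # restore the previous setting

# Alternative without touching global options: ask for scientific per call.
format(c(0.1, 10, 1000), scientific = TRUE)   # "1e-01" "1e+01" "1e+03"
```

Note that base R always pads the exponent to two digits and prints its sign, so this route gives 1e+03 rather than 1e3.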



Answer 3:

The easiest way to achieve what you ask, with automatic limits and breaks and without side effects, is this:

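library(ggplot2)
library(ggplot2movies)  # provides the movies data set
library(scales)

ggplot(data = subset(movies, votes > 1000)) +
  aes(x = rating, y = votes / 10000) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x, n = 3),
                # scientific_format() labels the breaks as 1e-01, 1e+00, ...;
                # trans_format("log10") alone would format the exponents instead
                labels = scientific_format()) +
  geom_point()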
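
To see what the breaks argument above is doing, here is a rough base-R sketch of the idea behind trans_breaks("log10", function(x) 10^x, n = 3): transform the data range to log10 space, pick "nice" values there, and transform back. This is an assumption about the behaviour for illustration, not the scales source code, and log10_breaks is a hypothetical name.

```r
# Rough sketch of log-spaced break selection (assumed behaviour, see lead-in).
log10_breaks <- function(x, n = 3) {
  rng <- range(x, na.rm = TRUE)
  10^pretty(log10(rng), n = n)   # nice exponents, back-transformed to data units
}

# Example range similar to votes / 10000 in the plot above.
log10_breaks(c(0.1, 15.7))
```

The n argument is only a hint: pretty() may return a few more or fewer values than requested.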

I prefer to use superscripts for the powers of ten, hide the minor grid, and add tick marks at logarithmic spacing. This is also easy to achieve:

ggplot(data=subset(movies, votes > 1000)) +
  aes(x = rating, y = votes / 10000) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x, n=3), 
               labels = trans_format("log10", math_format(10^.x))) +
  theme(panel.grid.minor = element_blank()) +
  annotation_logticks(sides="l") + 
  geom_point()

The code above is adapted from the examples on the annotation_logticks help page. There is a lot of flexibility for adjusting the exact format.
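
As one illustration of that flexibility, superscript labels can also be built by hand with plotmath. The sketch below uses only base R and assumes the breaks are exact powers of ten; label_pow10 is a hypothetical helper name.

```r
# Hypothetical helper: build plotmath labels such as 10^-1, 10^0, 10^1
# from numeric breaks. Assumption: breaks are exact powers of ten.
label_pow10 <- function(x) {
  parse(text = paste0("10^", round(log10(x))))
}

labs <- label_pow10(c(0.1, 1, 10))
length(labs)   # one expression per break
```

Passing labels = label_pow10 to scale_y_log10() should then render the exponents as superscripts, much like the math_format(10^.x) version above.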



Tags: r ggplot2