How to keep consistent axes scaling in a grid of g

Posted 2019-07-07 17:26

EDIT: clarified the description and added a code example and plots.

I have a data set with locations of several animals.

I created a grid of location scatter plots, one for every animal. Because the x and y of each plot are distances, I want x and y on the same scale within each plot (so distances are not distorted) and across plots (so I can compare different plots at the same scale).

Faceting is a natural choice for this, and it works with coord_fixed(). However, it becomes more complex when there are outliers in the data (which could be errors). I modified the example from @Mark Peterson's great answer to add some outlier points.

library(ggplot2)

set.seed(8675309)
df <-
  data.frame(
    x = runif(40, 1, 20)
    , y = runif(40, 100, 140)
    , ind = sample(LETTERS[1:4], 40, TRUE)
  )
# add some outliers to stretch the plot
outliers <- data.frame(x = c(-100, 30, 60,-50),
                       y = c(20, 200, -100, 500),
                       ind = LETTERS[1:4])
df <- rbind(df, outliers)

ggplot(df , aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()

This is what we got:

[Figure 1: facet plot with coord_fixed(), with outliers: consistent scales, aligned axes]

This plot satisfies both the scale ratio requirement and the scale consistency requirement. It also has all axes aligned, i.e. xlim and ylim are the same in every panel. This is useful because it shows the positions of the animals relative to each other.

I also want to check the pattern in each plot and compare them. Keeping the facet plot for relative position, I want to add another plot that has consistent scales but axes that are not aligned. If you draw each plot individually, ggplot chooses xlim and ylim to just cover the data, without the alignment requirement. So I just need to draw each plot separately and arrange them with gridExtra or cowplot.

Then, to deal with the outliers, our plan is to add a zoom button that zooms in on all plots at once (the plots will be in a Shiny app).

We decided to center every plot on its centroid. Although this wastes more space, with every plot centered correctly, zooming them all together will show the bulk of the data in each plot, and the plots remain comparable in scale.
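As a rough sketch of the zoom idea (the helper name zoom_lim and the zoom factor are hypothetical, not from any package), the limits of an already centered plot could be shrunk about their midpoint by a common factor, so every plot keeps its center and all plots are zoomed by the same amount:

```r
# Hypothetical helper: shrink a c(min, max) limit pair around its midpoint.
# zoom = 1 leaves the limits unchanged; zoom = 2 shows half the span.
zoom_lim <- function(lim, zoom = 1) {
  center <- mean(lim)
  half_span <- diff(lim) / 2 / zoom
  c(center - half_span, center + half_span)
}

# Example: a window from -100 to 60, zoomed in by a factor of 4
zoomed <- zoom_lim(c(-100, 60), zoom = 4)
print(zoomed)
```

The zoomed limits could then be passed to coord_fixed(xlim = ..., ylim = ...) for each plot, with the same zoom value used everywhere.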

I have a function that adjusts each plot to its median center, somewhat similar to @Mark Peterson's code.

I know the median center is not well defined for 2D points, but it's good enough for my needs. Because I need to adjust each plot individually, I cannot use facets anymore.

expand_1D_center <- function(vec){
  center <- median(vec)
  new_diff <- max(center - min(vec), 
                  max(vec) - center)
  return(c(new_min = center - new_diff, 
           new_max = center + new_diff))
}
# given x and y vectors, get new x/y limits centered on the median
expand_2D_center <- function(x_vec, y_vec){
  return(list(xlim = expand_1D_center(x_vec),
              ylim = expand_1D_center(y_vec)))
}
# plot each with center adjusted
id_vector <- sort(unique(df$ind))
g_list <- vector("list", length = length(id_vector))
for (i in seq_along(id_vector)) {
  data_i <- df[df$ind == id_vector[i], ]
  new_lim <- expand_2D_center(data_i$x, data_i$y)
  g_list[[i]] <- ggplot(data = data_i, aes(x, y)) +
    geom_point() +
    coord_fixed(xlim = new_lim$xlim, ylim = new_lim$ylim) 
}
library(gridExtra)  # provides grid.arrange()
grid.arrange(grobs = g_list, ncol = 2, respect = TRUE)

[Figure 2: center-adjusted plots; the x/y scale is right within each plot, but not consistent across plots]
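To see why the scales differ across plots, here is a small check, offered as a sketch rather than part of the original code. It rebuilds the example data, repeats expand_1D_center, and prints the width of the x window chosen for each individual. Because of the outliers, each individual ends up with a different window width, so the same panel size represents different distances.

```r
set.seed(8675309)
df <- data.frame(
  x = runif(40, 1, 20),
  y = runif(40, 100, 140),
  ind = sample(LETTERS[1:4], 40, TRUE)
)
outliers <- data.frame(x = c(-100, 30, 60, -50),
                       y = c(20, 200, -100, 500),
                       ind = LETTERS[1:4])
df <- rbind(df, outliers)

# Same helper as in the question: symmetric limits around the median
expand_1D_center <- function(vec) {
  center <- median(vec)
  new_diff <- max(center - min(vec), max(vec) - center)
  c(new_min = center - new_diff, new_max = center + new_diff)
}

# Width of the x window chosen for each individual
x_span <- sapply(split(df$x, df$ind),
                 function(v) unname(diff(expand_1D_center(v))))
print(x_span)
```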

I hope this is clearer now. My first post didn't state the problem clearly because I was focused on the current problem and left out the history, which is needed to explain our requirements.

@Mark Peterson's answer seems to solve this problem; I'll read the code further to verify.

Thanks!

EDIT: to give some context, I added the plots from the real data here:

The overview plot, with all gulls in one plot. Note that some outliers stretch the plot.

[Figure: the overview plot]

This is the facet plot, which is useful because everything is aligned.

[Figure: facet]

These are the individual plots, each with the right scale but not aligned across plots.

[Figure: plots not adjusted]

In this one, each plot is centered on its centroid. I plan to zoom in on all of them at the same time. The only problem is that the scales are not consistent across plots.

[Figure: plots centered on the centroid]

EDIT: I tried @Mark Peterson's code on my data. It cropped some points, but the plots are consistent. The cropping is probably because my data has much larger values, so the original padding is not big enough.

Mark uses the largest x range across all plots as the range for each plot, so every plot has the same range. My code tried to fit every plot to its own pattern, but placing those plots in a grid with consistent scales would mean shrinking the plot with the biggest canvas or padding the smallest one. Setting the range of every plot to the same size has a similar effect and is much simpler to implement.
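The two ideas can be combined. The sketch below centers each window on the median, as in my code, and gives every window the same width, as in Mark's. The helper names, the toy data, and the 5% proportional padding (used instead of a fixed 0.5 so that it scales with large coordinates) are assumptions for illustration only.

```r
# Hypothetical helper: limits of a fixed total width centered on the median
center_lim <- function(vec, width) {
  center <- median(vec)
  c(center - width / 2, center + width / 2)
}

# Hypothetical helper: smallest symmetric window around the median covering vec
needed_width <- function(vec) {
  center <- median(vec)
  2 * max(center - min(vec), max(vec) - center)
}

# Made-up toy data: two individuals at very different locations and spreads
toy <- data.frame(
  x = c(1, 2, 3, 10, 1000, 1004, 1005, 1006),
  ind = rep(c("A", "B"), each = 4)
)
x_by_ind <- split(toy$x, toy$ind)

# One shared width: the largest needed by any individual, padded by 5%
shared_width <- max(sapply(x_by_ind, needed_width)) * 1.05

# Per-individual limits, all with the same width
x_lims <- lapply(x_by_ind, center_lim, width = shared_width)
print(x_lims)
```

The same calculation would be repeated for y. With a shared width and a shared height in every plot, coord_fixed() then draws all panels at the same scale, and no point falls outside its window.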

Tags: r ggplot2
1 answer
smile是对你的礼貌
Reply #2 · 2019-07-07 17:53

Alright, I think I have my best guess at what you are asking, though I agree with @MrFlick that explicitly sharing data would be a huge help.

If you had simple data with all of your animals on the same basic grid, I am guessing you wouldn't be asking (at least not the way you are). That is, given these data:

set.seed(8675309)
df <-
  data.frame(
    x = runif(40, 1, 20)
    , y = runif(40, 100, 140)
    , ind = sample(LETTERS[1:4], 40, TRUE)
  )

This straightforward facet_wrap works:

ggplot(df , aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()

to give this:

[Figure: facet plot of df]

But, you said that facet_wrap solutions wouldn't work. So, I am guessing that you have data where each animal is in a different grid, like this (note, using dplyr here and much more below):

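library(dplyr)

modDF <-
  df %>%
  # In R >= 4.0, `ind` is a character column, so convert it to a factor first
  mutate(ind = factor(ind)
         , x = x + as.numeric(ind)*10
         , y = y + as.numeric(ind)*20)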

And that means that the above code (using modDF instead of df)

ggplot(modDF, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()

gives this:

[Figure: facet plot of modDF]

which has a ton of wasted space and doesn't look great. So, I think you are asking how to handle data like these. For that, I think what you need to do is calculate the largest range (in each axis) and then generate that range centered on the data for each individual. For that, I am relying heavily on dplyr to group_by individual and calculate the minimum and maximum x/y locations. Then, I calculate a number of additional columns to calculate the midpoint of the data for each individual, the size of the range, and then where the range should extend to be set to the largest width/height needed and be centered on that individual's data. Note that I am also padding these a little bit so that I can set expand = FALSE when I implement the ranges.

getRanges <-
  modDF %>%
  group_by(ind) %>%
  summarise(
    minx = min(x)
    , maxx = max(x)
    , miny = min(y)
    , maxy = max(y)
  ) %>%
  mutate(
    # Find mid points for range setting
    midx = (maxx + minx)/2
    , midy = (maxy + miny)/2
    # Find size of all ranges
    , xrange = maxx - minx
    , yrange = maxy - miny
    # Set X lims the size of the biggest range, centered at the middle
    , xstart = midx - max(xrange)/2 - 0.5
    , xend = midx + max(xrange)/2 + 0.5
    # Set Y lims the size of the biggest range, centered at the middle
    , ystart = midy - max(yrange)/2 - 0.5
    , yend = midy + max(yrange)/2 + 0.5
    )

gives

     ind     minx     maxx     miny     maxy     midx     midy   xrange   yrange   xstart     xend   ystart     yend
  <fctr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1      A 14.91873 29.53871 120.0743 157.6944 22.22872 138.8844 14.61997 37.62010 14.17717 30.28027 119.5743 158.1944
2      B 22.50432 37.27647 153.5654 179.0589 29.89039 166.3122 14.77215 25.49352 21.83884 37.94195 147.0021 185.6222
3      C 32.15187 47.08845 165.9829 195.0261 39.62016 180.5045 14.93658 29.04320 31.56861 47.67171 161.1945 199.8146
4      D 44.49392 59.59702 192.7243 214.5523 52.04547 203.6383 15.10310 21.82806 43.99392 60.09702 184.3283 222.9484
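As a quick sanity check (a sketch that copies the printed xstart/xend/ystart/yend values rather than recomputing them), every individual's window should have the same width and the same height: the largest xrange or yrange plus the 0.5 padding on each side.

```r
# Values copied from the xstart/xend/ystart/yend columns printed above
ranges <- data.frame(
  ind    = c("A", "B", "C", "D"),
  xstart = c(14.17717, 21.83884, 31.56861, 43.99392),
  xend   = c(30.28027, 37.94195, 47.67171, 60.09702),
  ystart = c(119.5743, 147.0021, 161.1945, 184.3283),
  yend   = c(158.1944, 185.6222, 199.8146, 222.9484)
)

# Window size per individual; these agree up to rounding in the printed table
widths  <- ranges$xend - ranges$xstart
heights <- ranges$yend - ranges$ystart
print(data.frame(ind = ranges$ind, width = widths, height = heights))
```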

Then, I loop through each individual, generating the plot needed and setting the range to what was calculated for that individual. (You could use ggtitle instead of facet_wrap but I like the strip effect from facet_wrap.)

sepPlots <- lapply(levels(factor(modDF$ind)), function(thisInd){
  thisRange <-
    filter(getRanges, ind == thisInd)

  modDF %>%
    filter(ind == thisInd) %>%
    ggplot(aes(x = x, y = y)) +
    geom_point() +
    coord_fixed(
      xlim = c(thisRange$xstart, thisRange$xend)
      , ylim = c(thisRange$ystart, thisRange$yend)
      , expand = FALSE
    ) +
    # ggtitle(thisInd)
    facet_wrap(~ind)
})

Then, I use plot_grid from cowplot to arrange the plots together. Note that loading cowplot sets a theme (this applies to older versions, before cowplot 1.0). So, I am resetting the theme because I am not a huge fan of the one from cowplot.

library(cowplot)
theme_set(theme_gray())

plot_grid(plotlist = sepPlots)

gives:

[Figure: grid of the separate plots arranged with plot_grid]

From there, you can play around with scales and axis labels as you see fit.
