EDIT: description clarified; code example and plots added.
I have a data set with locations of several animals.
I created a grid of location scatter plots, one per animal. Because x and y are both distances, I want x and y on the same scale within each plot (so distances are not distorted) and across plots (so different plots can be compared at the same scale).
Faceting is a natural choice for this, and it works with coord_fixed(). However, it becomes more complex when the data contain outliers (which could be errors). I modified @Mark Peterson's great answer to add some outlier points.
library(ggplot2)

set.seed(8675309)
df <-
  data.frame(
    x = runif(40, 1, 20)
    , y = runif(40, 100, 140)
    , ind = sample(LETTERS[1:4], 40, TRUE)
  )

# add some outliers to stretch the plot
outliers <- data.frame(x = c(-100, 30, 60, -50),
                       y = c(20, 200, -100, 500),
                       ind = LETTERS[1:4])
df <- rbind(df, outliers)

ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()
Figure 1. Facet plot with coord_fixed(): consistent scales, aligned axes.
This plot satisfies both the fixed aspect ratio requirement and the consistent scale requirement. It also has all axes aligned, i.e. every panel has the same xlim and ylim. This is useful because it shows the animals' positions relative to each other.
I also want to examine the pattern in each plot and compare them. Keeping the facet plot for relative position, I want to add another view in which the scales are consistent but the axes are not aligned. If you draw each plot individually, ggplot chooses xlim and ylim to just cover that plot's data, with no alignment requirement. So I just need to draw each plot separately and arrange them with gridExtra or cowplot.
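A minimal sketch of the "draw each plot individually, then arrange" step, assuming data shaped like df above (without the outliers, to keep it short). Using split() and lapply() instead of a loop is my choice here, not the question's code; each plot's limits are trained only on that animal's points.

```r
library(ggplot2)
library(gridExtra)

# Data shaped like the question's df (x, y, ind), without outliers
set.seed(8675309)
df <- data.frame(
  x = runif(40, 1, 20),
  y = runif(40, 100, 140),
  ind = sample(LETTERS[1:4], 40, TRUE)
)

# One plot per animal: each plot only sees its own rows, so ggplot
# picks x/y limits that just cover that animal's points
g_list <- lapply(split(df, df$ind), function(d) {
  ggplot(d, aes(x, y)) +
    geom_point() +
    coord_fixed() +            # 1 unit of x is drawn as long as 1 unit of y
    ggtitle(as.character(d$ind[1]))
})

# Arrange the separate plots in a 2-column grid
grid.arrange(grobs = g_list, ncol = 2)
```

Because each plot gets its own limits, plots of the same size can represent quite different distances, which is the inconsistency the rest of the question deals with.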
Then, to deal with the outliers, our plan is to add a zoom button that zooms in on all plots at once (the plots will be in a Shiny app).
We decided to center every plot on its centroid. This wastes more space, but with every plot centered correctly, zooming in on all of them will still show most of each animal's points, and the plots remain comparable in scale.
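The shared zoom reduces to one calculation: each plot keeps its own center, while the visible half-width is divided by a zoom factor common to all plots. A sketch, assuming a hypothetical helper zoom_limits() and a shared half-width that has already been chosen; in a Shiny app the factor could come from an input, and the result would be passed to coord_fixed(xlim = ..., ylim = ...).

```r
# Hypothetical helper: limits for one axis after zooming in by `zoom`.
# zoom = 1 shows the full shared extent, zoom = 2 shows half of it, etc.
zoom_limits <- function(center, half_width, zoom = 1) {
  stopifnot(zoom > 0)
  c(center - half_width / zoom, center + half_width / zoom)
}

# Hypothetical x centers for three animals and one shared half-width
centers <- c(A = 10, B = 55, C = -20)
half_width <- 40

# Zoom every plot by the same factor: centers differ, widths stay equal
lims <- lapply(centers, zoom_limits, half_width = half_width, zoom = 2)
lims$A                          # c(-10, 30)
vapply(lims, diff, numeric(1))  # A = 40, B = 40, C = 40
```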
I had a function to center each plot on its median, somewhat similar to @Mark Peterson's code.
I know a median center is not well defined for 2D points (the code below takes the median of each axis separately), but it's good enough for my needs. Because I need to adjust each plot individually, I cannot use facets anymore.
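One reason a per-axis median suits outlier-prone data: a single outlier drags the mean far from the bulk of the points, while the median barely moves. A small illustration with made-up coordinates:

```r
# Made-up track: four ordinary points plus one outlier (last element)
x <- c(2, 3, 4, 5, 100)
y <- c(10, 11, 12, 13, -200)

# Coordinate-wise center, computed two ways
mean_center   <- c(x = mean(x),   y = mean(y))    # x = 22.8, y = -30.8
median_center <- c(x = median(x), y = median(y))  # x = 4,    y = 11
```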
library(ggplot2)
library(gridExtra)

# given a vector, return limits centered on its median that still cover all values
expand_1D_center <- function(vec){
  center <- median(vec)
  new_diff <- max(center - min(vec),
                  max(vec) - center)
  return(c(new_min = center - new_diff,
           new_max = center + new_diff))
}

# given x y vectors, get new x y limits centered on the per-axis median
expand_2D_center <- function(x_vec, y_vec){
  return(list(xlim = expand_1D_center(x_vec),
              ylim = expand_1D_center(y_vec)))
}

# plot each animal with its center adjusted
id_vector <- sort(unique(df$ind))
g_list <- vector("list", length = length(id_vector))
for (i in seq_along(id_vector)) {
  data_i <- df[df$ind == id_vector[i], ]
  new_lim <- expand_2D_center(data_i$x, data_i$y)
  g_list[[i]] <- ggplot(data = data_i, aes(x, y)) +
    geom_point() +
    coord_fixed(xlim = new_lim$xlim, ylim = new_lim$ylim)
}
grid.arrange(grobs = g_list, ncol = 2, respect = TRUE)
Figure 2. Center-adjusted plots: the x/y scale is correct within each plot but not consistent across plots.
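To see why the scales differ across plots, run expand_1D_center() (copied from the question) on two made-up coordinate vectors: each result is centered on its own median, but the widths differ, so plots drawn at the same size end up at different scales.

```r
# expand_1D_center() as defined in the question
expand_1D_center <- function(vec) {
  center <- median(vec)
  new_diff <- max(center - min(vec), max(vec) - center)
  c(new_min = center - new_diff, new_max = center + new_diff)
}

# Two made-up x vectors for two animals
lim_a <- expand_1D_center(c(1, 2, 3, 10))  # median 2.5 -> new_min -5, new_max 10
lim_b <- expand_1D_center(c(0, 1, 2))      # median 1   -> new_min  0, new_max  2

# Widths 15 and 2: same panel size, very different distance per unit of axis
diff(lim_a)
diff(lim_b)
```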
I hope this is clearer now. My first post didn't state the problem clearly: I was focused on the immediate issue and left out the history needed to explain our requirements.
@Mark Peterson's answer seems to solve this problem; I'll read the code more closely to verify.
Thanks!
EDIT: to give some context, I added the plots from the real data here:
The overview plot with all gulls in one plot; note that some outliers stretch the plot.
This is the facet plot, which is useful for keeping everything aligned.
These are the individual plots, each with the right scale, but not aligned across plots.
This one has each plot centered on its centroid. I plan to zoom in on all of them at the same time. The only problem is that the scales are not consistent across plots.
EDIT: I tried @Mark Peterson's code on my data. It cropped some points, probably because my values are much bigger and the original padding is not large enough, but the plots are consistent.
Mark uses the largest x range across all plots for every plot, so every plot has the same range. My code tried to fit each plot to its own pattern, but placing those plots in a grid with consistent scales would require shrinking the plot with the biggest canvas or padding the smaller ones. Setting every plot to the same range has a similar effect but is much simpler to implement.
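A sketch of that simpler approach combined with the median centering above: per axis, take the largest distance from any animal's median to its farthest point, then give every plot that same half-width around its own median. It assumes the question's df (with outliers) and uses split() and Map() instead of the for loop; coord_fixed()'s default expansion adds a little padding so edge points are not clipped.

```r
library(ggplot2)
library(gridExtra)

# The question's data, including the outliers
set.seed(8675309)
df <- data.frame(
  x = runif(40, 1, 20),
  y = runif(40, 100, 140),
  ind = sample(LETTERS[1:4], 40, TRUE)
)
df <- rbind(df, data.frame(x = c(-100, 30, 60, -50),
                           y = c(20, 200, -100, 500),
                           ind = LETTERS[1:4]))
by_ind <- split(df, df$ind)

# Largest median-to-farthest-point distance per axis, over all animals
# (the quantity expand_1D_center() computes, maximised across plots)
half_x <- max(vapply(by_ind, function(d) max(abs(d$x - median(d$x))), numeric(1)))
half_y <- max(vapply(by_ind, function(d) max(abs(d$y - median(d$y))), numeric(1)))

# Each plot: centered on its own medians, shared half-widths
lims <- lapply(by_ind, function(d) {
  list(xlim = median(d$x) + c(-half_x, half_x),
       ylim = median(d$y) + c(-half_y, half_y))
})

g_list <- Map(function(d, l) {
  ggplot(d, aes(x, y)) +
    geom_point() +
    ggtitle(as.character(d$ind[1])) +
    coord_fixed(xlim = l$xlim, ylim = l$ylim)
}, by_ind, lims)

grid.arrange(grobs = g_list, ncol = 2, respect = TRUE)
```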
Alright, I think I have a best guess at what you are asking, though I agree with @MrFlick that explicitly sharing data would help a lot.
If you had simple data with all of your animals on the same basic grid, I am guessing you wouldn't be asking (at least not in the way you are). That is, for data where every animal occupies the same area, a straightforward facet_grid works well.
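A minimal sketch of that simple case, assuming all animals roam the same area (the question's df without outliers): facet_grid() gives every panel the same limits by default, and coord_fixed() keeps a 1:1 aspect ratio.

```r
library(ggplot2)

# All animals share the same area: x in [1, 20], y in [100, 140]
set.seed(8675309)
df <- data.frame(
  x = runif(40, 1, 20),
  y = runif(40, 100, 140),
  ind = sample(LETTERS[1:4], 40, TRUE)
)

# Fixed (shared) scales are facet_grid's default; coord_fixed()
# makes one unit of x as long as one unit of y in every panel
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  facet_grid(. ~ ind) +
  coord_fixed()
p
```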
But you said that facet_wrap solutions wouldn't work. So I am guessing that you have data where each animal is in a different grid (note: I am using dplyr here and much more below). That means the same faceting code, run on data like that (modDF instead of df), gives a plot
which has a ton of wasted space and doesn't look great. So I think you are asking how to handle data like these. For that, I think you need to calculate the largest range on each axis and then apply a range of that size, centered on the data, to each individual. I rely heavily on dplyr to group_by individual and calculate the minimum and maximum x/y locations. Then I calculate additional columns for the midpoint of each individual's data, the size of its range, and the limits that make the range as wide and tall as the largest one needed while staying centered on that individual's data. Note that I also pad these a little so that I can set expand = FALSE when I apply the ranges.
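A sketch of that calculation with dplyr. It builds a hypothetical modDF by shifting each animal to its own area with made-up offsets, uses a hypothetical 5% padding so that expand = FALSE does not clip points, and the column names are illustrative.

```r
library(dplyr)

# Hypothetical modDF: each animal moved to its own area by a made-up shift
set.seed(8675309)
modDF <- data.frame(
  x = runif(40, 1, 20),
  y = runif(40, 100, 140),
  ind = sample(LETTERS[1:4], 40, TRUE)
)
shift <- c(A = 0, B = 200, C = -150, D = 400)
modDF$x <- modDF$x + unname(shift[as.character(modDF$ind)])
modDF$y <- modDF$y - unname(shift[as.character(modDF$ind)])

pad <- 0.05  # hypothetical padding fraction for coord_fixed(expand = FALSE)

ranges <- modDF %>%
  group_by(ind) %>%
  summarise(xmin = min(x), xmax = max(x),
            ymin = min(y), ymax = max(y),
            .groups = "drop") %>%           # ungrouped, so max() below spans all animals
  mutate(
    x_mid  = (xmin + xmax) / 2,             # midpoint of each animal's data
    y_mid  = (ymin + ymax) / 2,
    x_size = xmax - xmin,                   # size of each animal's range
    y_size = ymax - ymin,
    x_half = max(x_size) * (1 + pad) / 2,   # largest width over all animals, padded
    y_half = max(y_size) * (1 + pad) / 2,
    x_lo = x_mid - x_half, x_hi = x_mid + x_half,
    y_lo = y_mid - y_half, y_hi = y_mid + y_half
  )

ranges
```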
Then I loop through each individual, generating the plot and setting the range to what was calculated for that individual. (You could use ggtitle instead of facet_wrap, but I like the strip effect from facet_wrap.) Then I use plot_grid from cowplot to arrange the plots together. Note that loading cowplot (before version 1.0) sets its own default theme, so I reset the theme, because I am not a huge fan of the one from cowplot.
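A sketch of the plotting step under the same kind of assumptions: a hypothetical modDF built with made-up offsets, and shared half-widths recomputed in base R with 5% padding so the block runs on its own. facet_wrap(~ind) is used only for its strip label, and theme_set(theme_gray()) resets the default theme, which matters only for older cowplot versions that changed it on load.

```r
library(ggplot2)
library(cowplot)

# Hypothetical modDF: each animal shifted to its own area by a made-up offset
set.seed(8675309)
modDF <- data.frame(
  x = runif(40, 1, 20),
  y = runif(40, 100, 140),
  ind = sample(LETTERS[1:4], 40, TRUE)
)
shift <- c(A = 0, B = 200, C = -150, D = 400)
modDF$x <- modDF$x + unname(shift[as.character(modDF$ind)])
modDF$y <- modDF$y - unname(shift[as.character(modDF$ind)])

by_ind <- split(modDF, modDF$ind)

# Shared half-widths: largest range over all animals, padded by 5%
x_half <- max(vapply(by_ind, function(d) diff(range(d$x)), numeric(1))) * 1.05 / 2
y_half <- max(vapply(by_ind, function(d) diff(range(d$y)), numeric(1))) * 1.05 / 2

# One plot per animal, centered on the midpoint of its own data
plots <- lapply(by_ind, function(d) {
  x_mid <- mean(range(d$x))
  y_mid <- mean(range(d$y))
  ggplot(d, aes(x, y)) +
    geom_point() +
    facet_wrap(~ind) +   # only for the strip label; ggtitle() would also work
    coord_fixed(xlim = x_mid + c(-x_half, x_half),
                ylim = y_mid + c(-y_half, y_half),
                expand = FALSE)
})

theme_set(theme_gray())  # undo the theme older cowplot versions set on load
plot_grid(plotlist = plots, ncol = 2)
```

Scaling the padding with the data, rather than adding a fixed amount, may also help with the cropping mentioned in the question's last edit, since the margin grows with the coordinates.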
From there, you can play around with scales and axis labels as you see fit.