EDIT: clarified the description and added a code example and plots.
I have a data set with locations of several animals.
I created a grid of location scatter plots, one per animal. Because the x and y axes of each plot are distances, I want x and y on the same scale within each plot (so distances are not distorted) and across plots (so I can compare plots at the same scale).
Facet is a natural choice for this, and it works with coord_fixed(). However, it became more complex when there are outliers in the data (which could be errors). I modified the example from @Mark Peterson's great answer to add some outlier points.
library(ggplot2)

set.seed(8675309)
df <- data.frame(
  x = runif(40, 1, 20),
  y = runif(40, 100, 140),
  ind = sample(LETTERS[1:4], 40, TRUE)
)
# add some outliers to stretch the plot
outliers <- data.frame(x = c(-100, 30, 60, -50),
                       y = c(20, 200, -100, 500),
                       ind = LETTERS[1:4])
df <- rbind(df, outliers)
ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()
This is what we got.
1. Facet plot with coord_fixed(): consistent scales, aligned axes
This plot satisfies both the scale ratio requirement and the consistent scale requirement. All of its axes are also aligned, i.e. every panel has the same xlim and ylim. This is useful because it shows where the animals are relative to each other.
I also want to check the pattern in each plot and compare them. Keeping the facet plot for relative position, I want to add another set of plots that have consistent scales but axes that are not aligned. If you draw each plot individually, ggplot chooses the xlim and ylim to just cover the data, without the alignment requirement. So I just need to draw each plot and arrange them with gridExtra or cowplot.
Then, to deal with the outliers, our plan is to add a zoom button that zooms in on all plots at once (the plots will be in a Shiny app).
We decided to center every plot on its centroid. This wastes more space, but with every plot centered correctly, zooming them all will show the bulk of the points in each plot, and the plots stay comparable in scale.
I had a function to center each plot on its median, a little similar to @Mark Peterson's code.
I know a median center is not well defined for 2D points, but it's good enough for my needs. Because I need to adjust each plot individually, I cannot use facets anymore.
library(gridExtra)

# given a vector, get new limits that put its median in the middle
expand_1D_center <- function(vec) {
  center <- median(vec)
  # distance from the median to the farthest point on either side
  new_diff <- max(center - min(vec),
                  max(vec) - center)
  return(c(new_min = center - new_diff,
           new_max = center + new_diff))
}
# given x y vectors, get new x y lim to put the median center in the middle
expand_2D_center <- function(x_vec, y_vec) {
  return(list(xlim = expand_1D_center(x_vec),
              ylim = expand_1D_center(y_vec)))
}
# plot each individual with its center adjusted
id_vector <- sort(unique(df$ind))
g_list <- vector("list", length = length(id_vector))
for (i in seq_along(id_vector)) {
  data_i <- df[df$ind == id_vector[i], ]
  new_lim <- expand_2D_center(data_i$x, data_i$y)
  g_list[[i]] <- ggplot(data = data_i, aes(x, y)) +
    geom_point() +
    coord_fixed(xlim = new_lim$xlim, ylim = new_lim$ylim)
}
grid.arrange(grobs = g_list, ncol = 2, respect = TRUE)
2. Center-adjusted plots: the x/y scale ratio is right within each plot, but the scale is not consistent across plots.
I hope this is clearer now. My first post didn't state the problem clearly: I was focused on the current problem and left out the history needed to explain our requirements.
@Mark Peterson's answer seems to solve this problem; I'll read the code further to verify.
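To make the inconsistency concrete, here is a small worked example of the 1D centering step with made-up numbers (the vectors `x_a` and `x_b` are only for illustration). Both vectors have the same median, but one has an outlier, so the two sets of limits end up with very different widths:

```r
# same helper as above, repeated so this snippet runs on its own
expand_1D_center <- function(vec) {
  center <- median(vec)
  new_diff <- max(center - min(vec), max(vec) - center)
  c(new_min = center - new_diff, new_max = center + new_diff)
}

x_a <- c(1, 2, 3, 10)  # median 2.5, farthest point is 7.5 away
x_b <- c(1, 2, 3, 4)   # median 2.5, farthest point is 1.5 away

lim_a <- expand_1D_center(x_a)  # new_min = -5, new_max = 10
lim_b <- expand_1D_center(x_b)  # new_min = 1,  new_max = 4

diff(lim_a)  # width 15
diff(lim_b)  # width 3
```

When the plots are drawn in grid cells of equal size, a width of 15 and a width of 3 are squeezed into the same space, so one unit of distance is drawn five times larger in the second plot.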
Thanks!
EDIT: to give some context, I added the plots from the real data here:
The overview plot with all gulls in one plot. Note that some outliers stretch the plot.
This is the facet plot, which is useful for having everything aligned.
These are the individual plots, each with the right scales but not aligned across plots.
This one has each plot centered on its centroid. I plan to zoom in on them all at the same time. The only problem is that the scales are not consistent across plots.
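For the zoom itself, one possible approach is to shrink each plot's limits around their midpoint by a common factor. This is only a sketch: `zoom_limits` and its `zoom` argument are hypothetical names, and the Shiny wiring (where the zoom value would come from) is left out.

```r
# Hypothetical helper: shrink a c(min, max) limit around its midpoint.
# zoom = 1 keeps the limits, zoom = 2 shows half the width, and so on.
zoom_limits <- function(lim, zoom = 1) {
  mid  <- mean(lim)
  half <- diff(lim) / 2 / zoom
  c(mid - half, mid + half)
}

xlim <- c(-100, 120)          # example limits with midpoint 10
zoomed <- zoom_limits(xlim, 4)
zoomed                        # -17.5  37.5
```

Because each centered plot has its median at the midpoint of its limits, passing `zoom_limits(new_lim$xlim, z)` and `zoom_limits(new_lim$ylim, z)` to `coord_fixed()` with the same `z` for every plot would keep each plot centered while zooming them all by the same amount.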
EDIT: I tried @Mark Peterson's code on my data. The plots are consistent, but some points were cropped, probably because my data has much larger values, so the original padding is not big enough.
Mark uses the maximum x range across all plots for every plot, so every plot has the same range. My code tried to fit every plot to its own pattern, but placing the plots in a grid with consistent scales would require shrinking the plot with the biggest canvas or padding the smallest plot. Setting every plot to the same range has a similar effect and is much simpler to implement.
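As I understand the same-range idea, it can be sketched roughly like this. This is my own minimal version, not Mark's actual code: `same_range_limits` and `pad` are names I made up, and the padding is relative to the data (a fraction of the half-range) so that it should not crop points when the values are large.

```r
# example data from the question
set.seed(8675309)
df <- data.frame(x = runif(40, 1, 20),
                 y = runif(40, 100, 140),
                 ind = sample(LETTERS[1:4], 40, TRUE))
df <- rbind(df, data.frame(x = c(-100, 30, 60, -50),
                           y = c(20, 200, -100, 500),
                           ind = LETTERS[1:4]))

# Hypothetical helper: center each individual on its own median, but use
# the largest half-range found across all individuals for every plot.
same_range_limits <- function(df, pad = 0.05) {
  groups <- split(df, df$ind)
  # distance from the median to the farthest point
  half_span <- function(v) max(abs(v - median(v)))
  half_x <- max(sapply(groups, function(d) half_span(d$x))) * (1 + pad)
  half_y <- max(sapply(groups, function(d) half_span(d$y))) * (1 + pad)
  lapply(groups, function(d) {
    list(xlim = median(d$x) + c(-1, 1) * half_x,
         ylim = median(d$y) + c(-1, 1) * half_y)
  })
}

lims <- same_range_limits(df)

# draw the plots only if the plotting packages are installed
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("gridExtra", quietly = TRUE)) {
  library(ggplot2)
  plots <- lapply(names(lims), function(id) {
    ggplot(df[df$ind == id, ], aes(x, y)) +
      geom_point() +
      coord_fixed(xlim = lims[[id]]$xlim, ylim = lims[[id]]$ylim) +
      ggtitle(id)
  })
  gridExtra::grid.arrange(grobs = plots, ncol = 2)
}
```

Every plot gets limits of the same width and height, so the scales match across plots, and each individual's farthest point is still inside its own limits because the shared half-range is at least as large as its own.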