I'm sure this has been answered before, but I can't find the thread for the life of me!
I am trying to use R to produce a list of all the distances between pairs of xy coordinates in a data frame. The data is stored something like this:
ID <- c('1','2','3','4','5','6','7')
x <- c(1,2,4,5,1,3,1)
y <- c(3,5,6,3,1,5,1)
df <- data.frame(ID, x, y)
At the moment I can calculate the distance between two points using:
length = sqrt((x1 - x2)^2+(y1 - y2)^2).
However, I am uncertain as to where to go next. Should I use something from plyr or a for loop?
Thanks for any help!
Have you tried ?dist? The formula you listed is the Euclidean distance, which is the default method for dist. Drop the ID column and pass the coordinates:
dist(df[,-1])
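If you want the result as a list of pairs rather than the compact triangle that dist prints, one option is to convert it to a matrix and pull out the upper triangle. This is only a rough sketch in base R; the names m, idx and pairs are my own choices, not anything from the question:

```r
# Recreate the example data from the question
ID <- c('1','2','3','4','5','6','7')
x <- c(1,2,4,5,1,3,1)
y <- c(3,5,6,3,1,5,1)
df <- data.frame(ID, x, y)

# Full symmetric distance matrix, labelled by ID
m <- as.matrix(dist(df[, c("x", "y")]))
rownames(m) <- colnames(m) <- df$ID

# Row/column positions of the upper triangle, i.e. each unordered pair once
idx <- which(upper.tri(m), arr.ind = TRUE)

# One row per pair of points
pairs <- data.frame(
  ID1  = rownames(m)[idx[, "row"]],
  ID2  = colnames(m)[idx[, "col"]],
  dist = m[idx]
)
pairs
```

With 7 points this gives choose(7, 2) = 21 rows, and no point is paired with itself.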
You can use a self-join to get all combinations, then apply your distance formula. All of this is easily doable using the tidyverse (a collection of packages from Hadley Wickham):
# Load the tidyverse
library(tidyverse)
# Set up a fake key to join on (just a constant)
df <- df %>% mutate(k = 1)
# Perform the join, remove the key, then create the distance
df %>%
  full_join(df, by = "k") %>%
  mutate(dist = sqrt((x.x - x.y)^2 + (y.x - y.y)^2)) %>%
  select(-k)
N.B. using this method, you'll also calculate the distance between each point and itself (as well as with all other points). It's easy to filter those points out though:
df %>%
  full_join(df, by = "k") %>%
  filter(ID.x != ID.y) %>%
  mutate(dist = sqrt((x.x - x.y)^2 + (y.x - y.y)^2)) %>%
  select(-k)
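Note that filtering on ID.x != ID.y still leaves every pair in the result twice (1-2 and 2-1). If you only want each pair once, a possible variation is to filter on ID.x < ID.y instead. The sketch below assumes dplyr 1.1.0 or later, where cross_join() is available and removes the need for the constant key; pairs_once is just a name I made up:

```r
library(dplyr)

# Recreate the example data from the question
df <- data.frame(
  ID = c("1", "2", "3", "4", "5", "6", "7"),
  x  = c(1, 2, 4, 5, 1, 3, 1),
  y  = c(3, 5, 6, 3, 1, 5, 1)
)

pairs_once <- df %>%
  # Every row paired with every row (assumes dplyr >= 1.1.0)
  cross_join(df) %>%
  # Keep each unordered pair once and drop self-pairs
  filter(ID.x < ID.y) %>%
  mutate(dist = sqrt((x.x - x.y)^2 + (y.x - y.y)^2)) %>%
  select(ID.x, ID.y, dist)

pairs_once
```

The IDs here are character strings, so the comparison is alphabetical rather than numeric. That is fine for this purpose, because any consistent ordering keeps exactly one of the two copies of each pair.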
For more information about using the tidyverse set of packages, I'd recommend R for Data Science or the tidyverse website.