Method for calculating distance between all points

Posted 2019-07-04 16:02

Question:

I'm sure this has been answered before, but I can't find the thread for the life of me!

I am trying to use R to produce a list of the distances between all pairs of xy coordinates in a data frame. The data is stored something like this:

ID = c('1','2','3','4','5','6','7')
x = c(1,2,4,5,1,3,1)
y = c(3,5,6,3,1,5,1)
df= data.frame(ID,x,y)

At the moment I can calculate the distance between two points using:

length = sqrt((x1 - x2)^2+(y1 - y2)^2).

However, I am uncertain as to where to go next. Should I use something from plyr or a for loop?

Thanks for any help!

Answer 1:

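Have you tried dist() (see ?dist)? The formula you listed is the Euclidean distance, which is the default method of dist(). Drop the ID column so that only x and y are used: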

dist(df[,-1]) 
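
Since the question asks for a list of distances rather than a matrix, the following is a minimal base R sketch of one way to reshape the dist() result into one row per pair. The names d, m, idx and pairs are chosen here for illustration and are not part of the original answer.

# Rebuild the example data from the question
ID <- c('1','2','3','4','5','6','7')
x <- c(1,2,4,5,1,3,1)
y <- c(3,5,6,3,1,5,1)
df <- data.frame(ID, x, y)

# dist() returns a compact "dist" object holding only the lower triangle
d <- dist(df[, c("x", "y")])

# Expand to a full symmetric matrix and label rows and columns by ID
m <- as.matrix(d)
dimnames(m) <- list(df$ID, df$ID)

# Row/column positions of the lower triangle: one entry per unordered pair
idx <- which(lower.tri(m), arr.ind = TRUE)

# Long format: the two IDs and the distance between them
pairs <- data.frame(
  ID1 = df$ID[idx[, "col"]],
  ID2 = df$ID[idx[, "row"]],
  dist = m[idx]
)

With 7 points this gives choose(7, 2) = 21 rows, and neither self-pairs nor mirrored duplicates appear because only the lower triangle is read.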


Answer 2:

You can use a self-join to get all combinations of points, then apply your distance formula. All of this is easy to do with the tidyverse (a collection of packages from Hadley Wickham):

# Load the tidyverse
library(tidyverse)

# Set up a fake key to join on (just a constant)
df <- df %>% mutate(k = 1) 

# Perform the join, remove the key, then create the distance
df %>% 
 full_join(df, by = "k") %>% 
 mutate(dist = sqrt((x.x - x.y)^2 + (y.x - y.y)^2)) %>%
 select(-k)

N.B. using this method, you'll also calculate the distance between each point and itself (as well as with all other points). It's easy to filter those points out though:

df %>% 
 full_join(df, by = "k") %>% 
 filter(ID.x != ID.y) %>%
 mutate(dist = sqrt((x.x - x.y)^2 + (y.x - y.y)^2)) %>%
 select(-k)
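
Even after removing self-pairs, the join still returns every pair twice (1 to 2 and 2 to 1). If each unordered pair should appear only once, one option is sketched below. It swaps in base R's merge() with by = NULL (a Cartesian product) in place of the constant-key join so that no packages are needed. It also assumes the IDs look like numbers, as they do in the example. The names all_pairs, id_x, id_y and unique_pairs are illustrative only.

# Rebuild the example data from the question
ID <- c('1','2','3','4','5','6','7')
x <- c(1,2,4,5,1,3,1)
y <- c(3,5,6,3,1,5,1)
df <- data.frame(ID, x, y)

# Cartesian product of df with itself; duplicated column names get
# the default suffixes .x and .y, as in the join above
all_pairs <- merge(df, df, by = NULL)

# Compare IDs as numbers (assumption: IDs are numeric-looking strings)
id_x <- as.integer(as.character(all_pairs$ID.x))
id_y <- as.integer(as.character(all_pairs$ID.y))

# Keeping only ID.x < ID.y drops self-pairs and mirrored duplicates
unique_pairs <- all_pairs[id_x < id_y, ]

# Euclidean distance for each remaining pair
unique_pairs$dist <- sqrt((unique_pairs$x.x - unique_pairs$x.y)^2 +
                          (unique_pairs$y.x - unique_pairs$y.y)^2)

For the 7 example points, the cross join has 49 rows and 21 remain after the filter. In the tidyverse version, the same effect could be had by changing the filter condition from != to a less-than comparison on suitably typed IDs.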

For more information about using the tidyverse set of packages I'd recommend R for Data Science or the tidyverse website.