Calculating Euclidean Distance for Large Data Sets

Posted 2019-07-07 03:10

Question:

I have to calculate the Euclidean distance between train and test data. The train data has 1389 observations and the test data has 364. The data are the handwritten ZIP codes on envelopes from U.S. postal mail, downloaded from the website of "The Elements of Statistical Learning".

I am a beginner and have only read the data into R so far. I don't know how to start calculating the distances between the train and test data. Can anyone give me an idea of how to write a loop for this data?

I would be thankful.

Answer 1:

For Euclidean distances, I like using rdist from the fields package. One advantage over dist from the stats package is that it can take two matrices as input and returns the distance between every row of the first and every row of the second:

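# Simulated stand-ins for the real data: one row per observation
train.data <- matrix(runif(1389*2), ncol = 2)
test.data  <- matrix(runif(364*2),  ncol = 2)

library(fields)
# One row per train observation, one column per test observation
distances <- rdist(train.data, test.data)
dim(distances)
# [1] 1389  364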
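
Since the question asks about a loop, here is a minimal base R sketch of the same calculation without the fields package. The names loop_dist and pairwise_dist are hypothetical helpers made up for this illustration, not functions from any package. It assumes both inputs are numeric matrices with the same number of columns, so any label column in the real ZIP code data would have to be dropped first.

```r
# Hypothetical helper: explicit loop over the test rows.
loop_dist <- function(a, b) {
  out <- matrix(0, nrow = nrow(a), ncol = nrow(b))
  for (j in seq_len(nrow(b))) {
    # Subtract row j of b from every row of a, then take row-wise norms
    diffs <- sweep(a, 2, b[j, ])
    out[, j] <- sqrt(rowSums(diffs^2))
  }
  out
}

# Hypothetical helper: loop-free version based on
# ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * sum(x * y)
pairwise_dist <- function(a, b) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
}

# Simulated stand-ins with the same row counts as in the question
set.seed(1)
train.data <- matrix(runif(1389 * 2), ncol = 2)
test.data  <- matrix(runif(364 * 2),  ncol = 2)

distances <- pairwise_dist(train.data, test.data)
dim(distances)
# [1] 1389  364

# Cross-check the two versions against each other
all.equal(distances, loop_dist(train.data, test.data))
```

The loop version is easier to follow for a beginner. The loop-free version avoids the loop but can lose a little precision for points that are very close together, which is why the squared values are clamped at zero before taking the square root.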


Tags: r, distance