I've written a short 'for' loop to find the minimum euclidean distance between each row in a dataframe and all the other rows (and to record which row is closest). In theory this avoids the errors associated with trying to calculate distance measures for very large matrices. However, while not that much is being saved in memory, it is very very slow for large matrices (my use case of ~150K rows is still running).
I'm wondering whether anyone can advise or point me in the right direction in terms of vectorising my function, using apply or similar. Apologies for what may seem a simple question, but I'm still struggling to think in a vectorised way.
Thanks in advance (and for your patience).
require(proxy)
df<-data.frame(matrix(runif(10*10),nrow=10,ncol=10), row.names=paste("site",seq(1:10)))
min.dist<-function(df) {
#df for results
all.min.dist<-data.frame()
#set up for loop
for(k in 1:nrow(df)) {
#calcuate dissimilarity between each row and all other rows
df.dist<-dist(df[k,],df[-k,])
# find minimum distance
min.dist<-min(df.dist)
# get rowname for minimum distance (id of nearest point)
closest.row<-row.names(df)[-k][which.min(df.dist)]
#combine outputs
all.min.dist<-rbind(all.min.dist,data.frame(orig_row=row.names(df)[k],
dist=min.dist, closest_row=closest.row))
}
#return results
return(all.min.dist)
}
#example
min.dist(df)