Wrong Euclidean distance from H2O calculations in R

Posted 2020-04-08 15:04

Question:

I am using H2O with R to calculate the Euclidean distance between two data frames:

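library(h2o)

set.seed(121)

# create the data: 100 x 10 and 30 x 10 matrices of standard normal draws
df1 <- data.frame(matrix(rnorm(1000), ncol = 10))
df2 <- data.frame(matrix(rnorm(300), ncol = 10))

# start (or connect to) a local H2O cluster
h2o.init()

# convert the data frames to H2OFrames
df1.h <- as.h2o(df1)
df2.h <- as.h2o(df2)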

If I compute the distance directly in base R, e.g. between the first row of each data frame:

distance1<-sqrt(sum((df1[1,]-df2[1,])^2))

And if I use the H2O library:

distance.h2o<-h2o.distance(df1.h[1,],df2.h[1,],"l2")

print(distance1)
print(distance.h2o)

distance1 and distance.h2o are not the same. Does anybody know why? Thanks!

Answer 1:

It seems that h2o.distance with "l2" returns the sum of squared differences without taking the square root. Taking the square root of its result gives the standard Euclidean distance:

distance.h2o <- h2o.distance(df1.h[1, ], df2.h[1, ], "l2")
sqrt(distance.h2o)
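
The relationship between the two quantities can be checked in base R alone, without a running H2O cluster. The following is only a sketch using the data from the question; the names d, sum_sq, euclidean and reference are introduced here for illustration, and the comment about what h2o.distance returns reflects the observation above rather than documented behaviour.

```r
set.seed(121)

# Same data as in the question
df1 <- data.frame(matrix(rnorm(1000), ncol = 10))
df2 <- data.frame(matrix(rnorm(300), ncol = 10))

# Difference between the first rows, as a plain numeric vector
d <- as.numeric(df1[1, ]) - as.numeric(df2[1, ])

# Sum of squared differences (what h2o.distance with "l2" appears to return)
sum_sq <- sum(d^2)

# Standard Euclidean distance: the square root of that sum
euclidean <- sqrt(sum_sq)

# Cross-check against stats::dist, whose default method is "euclidean"
reference <- as.numeric(dist(rbind(as.numeric(df1[1, ]), as.numeric(df2[1, ]))))

print(sum_sq)
print(euclidean)
print(isTRUE(all.equal(euclidean, reference)))
```

If the value printed for sum_sq matches distance.h2o from the question and euclidean matches distance1, that supports the explanation that only the square root is missing.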