I have a Spark DataFrame in R as follows:
head(df)
Lat1 Lng1 Lat2 Lng2
23.123 24.234 25.345 26.456
... ... ... ...
The DataFrame contains the latitude and longitude of two points.
I would like to calculate the geographic distance between the two points in each row and add it as a new column.
In R I am using the distCosine function from the geosphere library.
df$dist <- distCosine(cbind(df$Lng1, df$Lat1), cbind(df$Lng2, df$Lat2))
How should I calculate this in SparkR?
Running the code above on the Spark DataFrame produces the following error:
Error in as.integer(length(x) > 0L) :
cannot coerce type 'S4' to vector of type 'integer'
You cannot use standard R functions directly on Spark DataFrames. If you use a recent Spark release you can use dapply, but it is a bit verbose and slowish.
In practice I would rather use the formula directly. It will be much faster, all required functions are already available, and it is not very complicated:
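A minimal sketch of that approach, assuming the spherical law of cosines (the formula distCosine is based on) and geosphere's default radius of 6378137 m. The helper name dist_cosine is hypothetical. It uses only sin, cos, acos and arithmetic, which SparkR also defines for Column objects, so the same function should work on plain numeric vectors and on DataFrame columns.

```r
# Great-circle distance via the spherical law of cosines.
# Inputs are in degrees; r is the sphere radius in metres
# (6378137 is assumed here to match geosphere's default).
dist_cosine <- function(lat1, lng1, lat2, lng2, r = 6378137) {
  to_rad <- pi / 180  # degrees -> radians
  acos(
    sin(lat1 * to_rad) * sin(lat2 * to_rad) +
      cos(lat1 * to_rad) * cos(lat2 * to_rad) *
      cos((lng2 - lng1) * to_rad)
  ) * r
}

# Plain R: one degree of longitude along the equator, in metres
dist_cosine(0, 0, 0, 1)

# SparkR (assumes library(SparkR) is attached, a session is running,
# and df is the SparkDataFrame from the question):
# df$dist <- dist_cosine(df$Lat1, df$Lng1, df$Lat2, df$Lng2)
```

When called with Columns, the function builds a column expression that Spark evaluates itself, so no data passes through an R worker process. One caveat: for identical or nearly identical points, rounding can push the acos argument slightly above 1 and yield NaN, so a guard may be needed if such rows are expected.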