Let's say I have an array where columns 1 and 2 are in feet and column 3 is in seconds. For example:
x = [50, 40, 30]
I then have another array, y, with the same units and the same number of columns, but many rows. I then turn it into a KDTree with SciPy:
tree = scipy.spatial.KDTree(y)
and then query that tree:
distance,index = tree.query(x,k=1)
By default, I believe the distance is calculated based on the Euclidean norm.
So, for example, distance might be:
print(distance)
[34]
What units are these? Are they still in the original feet, feet, & seconds?
It doesn't return any interpretable unit when the measurements are of quantities whose units can't be converted into each other (time and distance, for example). It's returning sqrt(feet**2 + feet**2 + sec**2), which is not a unit of measure. It's the Euclidean norm, but over an abstract space in this case.
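To make that concrete, here is a minimal sketch with made-up numbers (the rows of y and the seconds-to-minutes conversion are my own assumptions, not from the question). It shows that the result is just the Euclidean norm of the raw numbers, and that re-expressing the time column in different units changes which row counts as "nearest":

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical data: columns are (feet, feet, seconds).
y = np.array([
    [50.0, 40.0, 90.0],  # same position as x, 60 seconds later
    [80.0, 80.0, 30.0],  # 50 feet away from x, same time
])
x = np.array([50.0, 40.0, 30.0])

# Default query uses the Euclidean (p=2) norm on the raw numbers.
distance, index = KDTree(y).query(x, k=1)

# The same numbers computed by hand: sqrt of the sum of squared differences.
manual = np.linalg.norm(y - x, axis=1)

# Express the third column in minutes instead of seconds and query again.
to_minutes = np.array([1.0, 1.0, 1.0 / 60.0])
distance_min, index_min = KDTree(y * to_minutes).query(x * to_minutes, k=1)

print(index, distance)          # nearest row with time in seconds
print(index_min, distance_min)  # nearest row with time in minutes
```

With time in seconds the nearest row is the one 50 feet away; with time in minutes it is the one at the same position a minute later. Nothing about the data changed, only the units, which is why the returned number has no physical unit of its own.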
This isn't really a Python question, by the way. scipy is just manipulating the numbers you give it and doesn't know the units. It's more a question of how to interpret the math: for instance, whether you want to treat a 5' x 5' box as 'closer' to a 7' x 7' box than to a 6' x 6' box because you happened to measure the first two within seconds of each other and measured the third hours later. Only you know your data and which features really count when building a similarity score. In the case I just gave, it doesn't make sense. If you're ranking the similarity of sprinters based on both body size and best 100m time, then it probably makes sense.
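If you do decide that mixing the features makes sense, one common approach is to rescale each column before building the tree, so that no feature dominates just because of its units. The sketch below assumes dividing each column by its standard deviation; that choice, the sprinter numbers, and the names sprinters, query, and scale are all hypothetical, and a different weighting may suit your data better:

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical sprinters: columns are (height in feet, best 100m time in seconds).
sprinters = np.array([
    [5.5, 10.2],
    [6.2, 10.9],
    [6.0, 11.5],
])
query = np.array([6.1, 10.4])

# Assumed weighting: divide each column by its standard deviation,
# which makes every column dimensionless and comparably spread out.
scale = sprinters.std(axis=0)

# Build the tree on the rescaled data and rescale the query the same way.
tree = KDTree(sprinters / scale)
distance, index = tree.query(query / scale, k=1)

print(index, distance)  # the distance is now in units of "standard deviations"
```

The query point has to be rescaled with the same factors as the data, otherwise the tree and the query live in different spaces.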