I have a function which takes two arrays containing the tokens/words of two texts and gives out the cosine similarity value which shows the relationship between both texts.
The function takes an array $tokensA (0=>house, 1=>bike, 2=>man) and an array $tokensB (0=>bike, 1=>house, 2=>car) and calculates the similarity which is given back as a floating point value.
function cosineSimilarity($tokensA, $tokensB) {
$a = $b = $c = 0;
$uniqueTokensA = $uniqueTokensB = array();
$uniqueMergedTokens = array_unique(array_merge($tokensA, $tokensB));
foreach ($tokensA as $token) $uniqueTokensA[$token] = 0;
foreach ($tokensB as $token) $uniqueTokensB[$token] = 0;
foreach ($uniqueMergedTokens as $token) {
$x = isset($uniqueTokensA[$token]) ? 1 : 0;
$y = isset($uniqueTokensB[$token]) ? 1 : 0;
$a += $x * $y;
$b += $x;
$c += $y;
}
return $b * $c != 0 ? $a / sqrt($b * $c) : 0;
}
If I want to compare 75 texts with each other, I need to make 5,625 single comparisons to have all texts compared with each other.
Is it possible to use MySQL's spatial columns to reduce the number of comparisons?
I don't want to talk about my function or about ways to compare texts. Just about reducing the number of comparisons.
MySQL's spatial columns
- You create spatial columns with: CREATE TABLE abc (clmnName TYPE)
- possible types are listed here
- here is how I select the data later [e.g. MultiPointFromText() or AsText()]
- You insert values like this: INSERT INTO clmnName VALUES (GeomFromText('POINT(1 1)'))
But how do you use this for my problem?
PS: I'm looking for ways to reduce the number of comparisons with algorithms in this question. Vinko Vrsalovic told me that I should open another question for the spatial features.