I would like to use the Jaccard method of the stringdist function (R package stringdist) to measure the similarity of bags of words. From what I can tell, however, method='jaccard' only compares character q-grams (here, pairs of letters) within each string, not whole words.
```r
library(stringdist)

x <- c('cat', 'dog', 'person')  # avoid naming a variable c, which shadows base::c()
y <- c('cat', 'dog', 'ufo')
stringdist(x, y, method='jaccard', q=2)
#> [1] 0 0 1
```
So stringdist compares the two vectors element by element: it returns the Jaccard distance (not the similarity) between 'cat' and 'cat', between 'dog' and 'dog', and between 'person' and 'ufo'.
I also tried collapsing the words into one long string. This gets closer to what I need, but it still computes 1 - (number of shared 2-grams / total number of unique 2-grams), i.e. it still compares letter pairs rather than words:
```r
f <- 'cat dog person'
g <- 'cat dog ufo'
stringdist(f, g, method='jaccard', q=2)
#> [1] 0.5625
```
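To confirm that 0.5625 really comes from letter pairs, the q = 2 Jaccard distance can be reproduced by hand in base R. This is a sketch for checking the arithmetic, not stringdist's internal code; bigrams() is a hypothetical helper that returns the set of unique overlapping 2-character substrings of one string.

```r
# Hypothetical helper: set of unique overlapping 2-grams of a single string.
bigrams <- function(s) {
  n <- nchar(s)
  if (n < 2) return(character(0))  # no 2-grams in strings shorter than 2
  unique(substring(s, 1:(n - 1), 2:n))
}

f <- 'cat dog person'
g <- 'cat dog ufo'

bf <- bigrams(f)  # 13 unique 2-grams, including ones spanning spaces like "t "
bg <- bigrams(g)  # 10 unique 2-grams

shared <- length(intersect(bf, bg))  # 7 shared 2-grams
total  <- length(union(bf, bg))      # 16 distinct 2-grams overall
jaccard_dist <- 1 - shared / total
print(jaccard_dist)
#> [1] 0.5625
```

So the result is 1 - 7/16: the words only matter through the letter pairs they contribute.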
How can I get it to compute the Jaccard similarity at the word level, i.e. treating each word as a single element of the set?
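One way to get word-level Jaccard without relying on stringdist at all is to split each string into words and compare the word sets directly in base R. A minimal sketch, assuming words are separated by single spaces and that repeated words count once (set semantics, like stringdist's q-gram Jaccard); jaccard_words() is a hypothetical helper, not a stringdist function.

```r
# Hypothetical helper: Jaccard similarity of two strings treated as word sets.
# Assumes words are separated by single spaces; duplicates are ignored.
jaccard_words <- function(x, y) {
  a <- unique(strsplit(x, " ", fixed = TRUE)[[1]])
  b <- unique(strsplit(y, " ", fixed = TRUE)[[1]])
  length(intersect(a, b)) / length(union(a, b))  # similarity in [0, 1]
}

f <- 'cat dog person'
g <- 'cat dog ufo'
sim <- jaccard_words(f, g)
print(sim)      # 2 shared words / 4 distinct words
#> [1] 0.5
print(1 - sim)  # distance, on the same scale as stringdist(method = 'jaccard')
#> [1] 0.5
```

If the words are already in character vectors (like x and y above), the splitting step can be skipped: length(intersect(x, y)) / length(union(x, y)) gives the same word-level similarity. Two empty inputs would give 0/0 (NaN), so that case may need special handling.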