Is there an algorithm that tells the semantic simi

2019-01-05 07:45发布

input: phrase 1, phrase 2

output: semantic similarity value (between 0 and 1), or the probability these two phrases are talking about the same thing

11条回答
贼婆χ
2楼-- · 2019-01-05 08:40

This requires your algorithm actually knows what your talking about. It can be done in some rudimentary form by just comparing words and looking for synonyms etc, but any sort of accurate result would require some form of intelligence.

查看更多
forever°为你锁心
3楼-- · 2019-01-05 08:41

For anyone just coming at this, i would suggest taking a look at SEMILAR - http://www.semanticsimilarity.org/ . They implement a lot of the modern research methods for calculating word and sentence similarity. It is written in Java.

SEMILAR API comes with various similarity methods based on Wordnet, Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), BLEU, Meteor, Pointwise Mutual Information (PMI), Dependency based methods, optimized methods based on Quadratic Assignment, etc. And the similarity methods work in different granularities - word to word, sentence to sentence, or bigger texts.

查看更多
贼婆χ
4楼-- · 2019-01-05 08:42

One simple solution is to use the dot product of character n-gram vectors. This is robust over ordering changes (which many edit distance metrics are not) and captures many issues around stemming. It also prevents the AI-complete problem of full semantic understanding.

To compute the n-gram vector, just pick a value of n (say, 3), and hash every 3-word sequence in the phrase into a vector. Normalize the vector to unit length, then take the dot product of different vectors to detect similarity.

This approach has been described in J. Mitchell and M. Lapata, “Composition in Distributional Models of Semantics,” Cognitive Science, vol. 34, no. 8, pp. 1388–1429, Nov. 2010., DOI 10.1111/j.1551-6709.2010.01106.x

查看更多
男人必须洒脱
5楼-- · 2019-01-05 08:47

I would look into latent semantic indexing for this. I believe you can create something similar to a vector space search index but with semantically related terms being closer together i.e. having a smaller angle between them. If I learn more I will post here.

查看更多
唯我独甜
6楼-- · 2019-01-05 08:47

Sorry to dig up a 6 year old question, but as I just came across this post today, I'll throw in an answer in case anyone else is looking for something similar.

cortical.io has developed a process for calculating the semantic similarity of two expressions and they have a demo of it up on their website. They offer a free API providing access to the functionality, so you can use it in your own application without having to implement the algorithm yourself.

查看更多
登录 后发表回答