Which tools would you recommend to look into for semantic analysis of text?
Here is my problem: I have a corpus of words (keywords, tags).
I need to process sentences entered by users and find out whether they are semantically close to the words in my corpus.
Any suggestions (books or actual toolkits / APIs) are very welcome.
Regards,
Some useful links to begin with:
- http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
- http://kmandcomputing.blogspot.com/2008/06/opinion-mining-with-rapidminer-quick.html
- http://rapid-i.com/content/blogcategory/38/69/
- http://www.cs.cornell.edu/People/pabo/movie-review-data/otherexperiments.html
- http://wordnet.princeton.edu/
Tools/Libraries:
If you consider your corpus as an ontology, Apache Stanbol - http://incubator.apache.org/stanbol/ - might be useful. It uses DBpedia as the default ontology when analyzing text. Although it is still incubating, the enhancer component is good enough for adoption, so you can give it a try.
You can try some WordNet similarity measures. Ted Pedersen has compiled a number of these metrics in WordNet::Similarity, which you can experiment with. There are counterpart implementations in other languages (e.g. Java).
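
To give a feel for what a path-based measure of this kind computes, here is a minimal sketch in Python. It does not call WordNet or WordNet::Similarity; the `HYPERNYM` table is a tiny made-up is-a hierarchy, and `path_similarity`, `closest_tags` and the 0.3 threshold are hypothetical names and values chosen for illustration. The score is 1 / (number of edges on the shortest path + 1), which is one common definition of path similarity.

```python
# Hypothetical toy is-a hierarchy (child -> parent). Real WordNet is far
# larger and a word can have several senses; this sketch ignores both.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def ancestors(word):
    """Map the word and each of its hypernyms to its distance in edges."""
    distances = {}
    depth = 0
    while word is not None:
        distances[word] = depth
        word = HYPERNYM.get(word)
        depth += 1
    return distances


def path_similarity(a, b):
    """Return 1 / (shortest path in edges + 1), or 0.0 if unconnected."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    if not common:
        return 0.0
    shortest = min(up_a[node] + up_b[node] for node in common)
    return 1.0 / (shortest + 1)


def closest_tags(sentence, tags, threshold=0.3):
    """Rank tags by their best similarity to any word in the sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    scored = []
    for tag in tags:
        best = max((path_similarity(w, tag) for w in words), default=0.0)
        if best >= threshold:
            scored.append((tag, best))
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    print(closest_tags("My wolf howls at night.", ["dog", "cat", "sparrow"]))
```

With this toy hierarchy only "dog" clears the threshold for the example sentence, because "wolf" and "dog" share the direct parent "canine". On a real corpus you would replace the hand-written table with WordNet lookups and would probably need to tune the threshold and handle words with multiple senses.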