Question:

Which tools would you recommend to look into for semantic analysis of text?

Here is my problem: I have a corpus of words (keywords, tags).

I need to process sentences entered by users and determine whether they are semantically close to the words in my corpus.

Any kind of suggestions (books or actual toolkits / APIs) are very welcome.

Regards,

Answer 1:

Some useful links to begin with:

http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
http://kmandcomputing.blogspot.com/2008/06/opinion-mining-with-rapidminer-quick.html
http://rapid-i.com/content/blogcategory/38/69/
http://www.cs.cornell.edu/People/pabo/movie-review-data/otherexperiments.html
http://wordnet.princeton.edu/

Tools/Libraries:

Apache OpenNLP
LingPipe

Answer 2:

If you treat your corpus as an ontology, Apache Stanbol - http://incubator.apache.org/stanbol/ - might be useful. It uses DBpedia as the default ontology when analyzing text. Although the project is still incubating, its enhancer component is good enough for adoption, so you can give it a try.

Answer 3:

You can try some WordNet similarity measures. Ted Pedersen has compiled a set of these metrics in WordNet::Similarity, which you can experiment with. There are counterpart implementations in other languages (e.g. Java).
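
To give a feel for what such a measure computes, here is a minimal Python sketch of one of them, Wu-Palmer similarity, applied to the keyword-matching problem from the question. It is only an illustration: the tiny is-a hierarchy in `PARENT` is made up and is not real WordNet data, and the helper names (`path_to_root`, `wup_similarity`, `closest_keyword`) are hypothetical, not part of WordNet::Similarity or any other library.

```python
# Toy is-a hierarchy (hypothetical, NOT real WordNet data): child -> parent.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "bicycle": "vehicle",
}
KNOWN = set(PARENT) | set(PARENT.values())


def path_to_root(word):
    """Return [word, parent, ..., root] by following PARENT links."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def wup_similarity(a, b):
    """Wu-Palmer: 2 * depth(lcs) / (depth(a) + depth(b)), with root depth 1."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_a = set(path_a)
    # The first node on b's path that also subsumes a is the
    # lowest common subsumer (lcs) of the two words.
    lcs = next((node for node in path_b if node in ancestors_a), None)
    if lcs is None:
        return 0.0
    # Depth of a node = number of nodes on its path to the root.
    return 2 * len(path_to_root(lcs)) / (len(path_a) + len(path_b))


def closest_keyword(sentence, keywords):
    """Return (keyword, score) for the best token/keyword pair in the sentence."""
    # Naive whitespace tokenization; tokens outside the toy hierarchy are skipped.
    tokens = [t for t in sentence.lower().split() if t in KNOWN]
    scored = [(wup_similarity(t, k), k) for t in tokens for k in keywords]
    if not scored:
        return None, 0.0
    score, keyword = max(scored)
    return keyword, score


if __name__ == "__main__":
    print(wup_similarity("dog", "cat"))
    print(wup_similarity("dog", "car"))
    print(closest_keyword("my dog barks", ["cat", "car"]))
```

In this toy tree "dog" and "cat" share the parent "animal", so they score higher than "dog" and "car", which only meet at the root. Real WordNet is messier: a word can have several senses and a sense can have several hypernym paths, so in practice you would lean on an existing implementation (for example, NLTK's WordNet interface provides `wup_similarity` on synsets) rather than roll your own.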