What represents the state-of-the-art in Word Sense Disambiguation (WSD) software? What metrics determine the state-of-the-art, and what toolkits / open source packages are available?
Answer 1:
This list is not exhaustive, but searching around will surely turn up more for your purposes.
For software, here's a short list (remember to cite the relevant sources!):
- GWSD: Unsupervised Graph-based Word Sense Disambiguation http://lit.csci.unt.edu/~rada/downloads/GWSD/GWSD.1.0.tar.gz
- SenseLearner: All-Words Word Sense Disambiguation Tool http://lit.csci.unt.edu/~rada/downloads/senselearner/SenseLearner2.0.tar.gz
- KYOTO UKB: graph-based WSD http://ixa2.si.ehu.es/ukb/
- pyWSD: Python implementations of simple WSD algorithms (see the sketch after this list) https://github.com/alvations/pywsd
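Since pyWSD builds on NLTK, a quick way to try a baseline is NLTK's built-in Lesk implementation (nltk.wsd.lesk); pyWSD provides extended variants such as simple_lesk. A minimal sketch, assuming the WordNet and tokenizer data have been downloaded, with an illustrative sentence:

```python
# Baseline dictionary-based WSD with NLTK's Lesk algorithm.
# Assumes: nltk.download('wordnet') and nltk.download('punkt') have been run.
from nltk import word_tokenize
from nltk.wsd import lesk

sentence = "I went to the bank to deposit my money"
sense = lesk(word_tokenize(sentence), "bank", pos="n")

print(sense)               # a WordNet Synset, e.g. the 'financial institution' sense
print(sense.definition())  # gloss of the chosen sense
```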
WSD results also depend on the sense-tagged data you train and evaluate on, so here are a few datasets (cite them too!):
- Open Mind Word Expert Sense Tagged Data http://teach-computers.org/word-expert.html
- TWA Sense Tagged Data http://lit.csci.unt.edu/~rada/downloads/TWA/TWA.tar.gz
- SemCor (see the sketch after this list) http://lit.csci.unt.edu/~rada/downloads/semcor/semcor1.6.tar.gz
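If you use SemCor, note that NLTK also ships a reader for it. A minimal sketch of inspecting its sense tags, assuming the copy distributed with NLTK (nltk.download('semcor')) rather than the raw tarball above:

```python
# Inspect SemCor sense annotations via NLTK's corpus reader.
# Assumes: nltk.download('semcor') and nltk.download('wordnet') have been run.
from nltk.corpus import semcor

# With tag='sem', each sense-tagged chunk is a Tree whose label is
# (typically) a WordNet Lemma; untagged tokens are plain lists.
sent = semcor.tagged_sents(tag="sem")[0]
for chunk in sent:
    if hasattr(chunk, "label"):
        print(chunk.label(), chunk.leaves())
```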
Lastly, WSD depends on preprocessing, and if you're looking into state-of-the-art cross-lingual WSD, you should look out for word-level aligners (see the alignment-parsing sketch after this list) such as:
- MOSES
- MGIZA++
- GIZA++
- BerkeleyAligner
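These aligners are usually run through the Moses training pipeline, which symmetrizes the GIZA++/MGIZA++ output into one line of space-separated `i-j` source-target index pairs per sentence pair. A minimal sketch of reading that format; the filename is a hypothetical example:

```python
# Read Moses-style symmetrized word alignments, where each line holds
# space-separated "srcIndex-tgtIndex" pairs for one sentence pair.
# GIZA++'s raw A3 output uses a different format and is normally
# symmetrized into this one by the Moses scripts.
def read_alignments(path):
    """Yield one list of (src, tgt) index pairs per sentence pair."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            pairs = []
            for token in line.split():
                s, t = token.split("-")
                pairs.append((int(s), int(t)))
            yield pairs

# Hypothetical usage (filename is illustrative):
# for pairs in read_alignments("model/aligned.grow-diag-final-and"):
#     print(pairs)
```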
Also, look at previous Senseval/SemEval tasks to see what has already been done and what directions future tasks are moving in: http://en.wikipedia.org/wiki/SemEval