I would like to apply lemmatization to reduce words to their base forms by removing inflection. I know that WordNet provides this for English, but I am also interested in lemmatizing Dutch, French, Spanish and Italian words. Is there a reliable, well-tested way to do this? Thank you!
Try the pattern library from CLIPS. It supports German, English, Spanish, French and Italian, which is just what you need: http://www.clips.ua.ac.be/pattern. Unfortunately it only works with Python 2; there is no Python 3 support yet.
The textacy library (http://textacy.readthedocs.io/en/latest/api_reference.html) provides the essential tools for building a bag of words or bag of terms, with lemmatization available as one of the options. I've tried it with Spanish and it works reasonably well.
The library automatically detects the language of the text and lemmatizes accordingly, but you can also specify the language explicitly.
You'll get output like the following: {'perro': 1, 'y': 1, 'gato': 1, 'jugar': 1, 'casar': 1, 'Los': 1, 'patio': 1}
The library recognizes some of the words well, but not all of the lemmas were correct. Hope this helps.
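textacy's API has changed across versions, so rather than guess at its exact function signatures, here is a minimal library-free sketch of the underlying idea: map each token to a lemma through a lookup table, then count the lemmas to get a bag of terms. The tiny Spanish table `LEMMAS_ES` and the helper names `lemmatize` and `bag_of_lemmas` are hypothetical, for illustration only. A real setup would use a full lemma dictionary or a trained model.

```python
import re
from collections import Counter

# Hypothetical mini lookup table (lowercased surface form -> lemma), illustration only.
LEMMAS_ES = {
    "los": "el",
    "perros": "perro",
    "gatos": "gato",
    "juegan": "jugar",
    "casas": "casa",
}


def lemmatize(token, table):
    """Return the lemma for a token, falling back to the lowercased token if unknown."""
    key = token.lower()  # lowercase first so "Los" and "los" map to the same lemma
    return table.get(key, key)


def bag_of_lemmas(text, table):
    """Split text into word tokens (Unicode-aware \\w) and count their lemmas."""
    tokens = re.findall(r"\w+", text)
    return Counter(lemmatize(t, table) for t in tokens)


print(bag_of_lemmas("Los perros y los gatos juegan", LEMMAS_ES))
```

Lowercasing before the lookup keeps capitalized forms such as "Los" from being counted as a separate term. A plain lookup table cannot resolve homographs such as Spanish "casa", which is a noun but also a form of the verb "casar"; picking the right reading needs part-of-speech context, which model-based tools try to provide.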