I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit".
The function looks like:
String stemTerm(String term){
...
}
I've found the Lucene Analyzer, but it looks way too complicated for what I need. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PorterStemFilter.html
Is there a way to use it to stem words without building an Analyzer? I don't understand all the Analyzer business...
EDIT: I actually need stemming + lemmatization. Can Lucene do this?
Why aren't you using the "EnglishAnalyzer"? It's simple to use, and I think it would solve your problem:
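A minimal sketch of this approach, assuming Lucene 5 or later with the analyzers module on the classpath. The class name `EnglishAnalyzerStemDemo`, the method `stemAll`, and the field name `"text"` are placeholders for illustration and are not taken from the original answer:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class EnglishAnalyzerStemDemo {

    /** Runs the text through EnglishAnalyzer and collects the terms it emits. */
    public static List<String> stemAll(String text) throws IOException {
        List<String> stems = new ArrayList<>();
        try (Analyzer analyzer = new EnglishAnalyzer();
             // The field name is arbitrary here because nothing is indexed.
             TokenStream stream = analyzer.tokenStream("text", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset(); // must be called before the first incrementToken()
            while (stream.incrementToken()) {
                stems.add(term.toString());
            }
            stream.end();
        }
        return stems;
    }

    public static void main(String[] args) throws IOException {
        // Both forms should reduce to the same stem.
        System.out.println(stemAll("amenities amenity"));
    }
}
```

Note that EnglishAnalyzer also lowercases the input and drops English stop words, so the returned list can be shorter than the number of words in the text.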
Hope it helps you!
The previous example applies stemming to a search query, so if you are interested in stemming a full text you can try the following:
The TermAttribute class has been deprecated and will no longer be supported in Lucene 4, but the documentation is not clear on what to use in its place.
Also, in the first example the PorterStemmer class is not publicly accessible (it is hidden), so you cannot use it directly.
Hope this helps.
SnowballAnalyzer is deprecated; you can use Lucene's Porter stemmer instead:
Hope this helps!
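One way this could look, sketched under the assumption of Lucene 5 or later: a `StandardTokenizer` feeds a `PorterStemFilter`, and terms are read through `CharTermAttribute`, which replaced the deprecated `TermAttribute`. The class name `PorterStemDemo` and its methods are hypothetical; `stemTerm` only mirrors the signature from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PorterStemDemo {

    /** Tokenizes the text and returns the Porter stem of every token. */
    public static List<String> stemAll(String text) throws IOException {
        List<String> stems = new ArrayList<>();
        Tokenizer tokenizer = new StandardTokenizer();
        // PorterStemFilter expects lowercase input, so normalize up front.
        tokenizer.setReader(new StringReader(text.toLowerCase(Locale.ROOT)));
        // Closing the filter also closes the tokenizer it wraps.
        try (TokenStream stream = new PorterStemFilter(tokenizer)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset(); // must be called before the first incrementToken()
            while (stream.incrementToken()) {
                stems.add(term.toString());
            }
            stream.end();
        }
        return stems;
    }

    /** Stems a single term; returns the input unchanged if no token is produced. */
    public static String stemTerm(String term) throws IOException {
        List<String> stems = stemAll(term);
        return stems.isEmpty() ? term : stems.get(0);
    }
}
```

Unlike a full analyzer, this chain removes no stop words, so every token in the input produces one stem.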
Here is how you can use the Snowball stemmer in Java:
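A rough sketch, assuming the standalone Snowball Java distribution is on the classpath, where the English stemmer is the class `org.tartarus.snowball.ext.englishStemmer`. Other packagings, such as the copy bundled with Lucene, may name the class differently. The wrapper class `SnowballStemDemo` is a placeholder, and `stemTerm` follows the signature from the question.

```java
import java.util.Locale;

import org.tartarus.snowball.ext.englishStemmer;

public class SnowballStemDemo {

    /** Returns the Snowball (English) stem of a single term. */
    public static String stemTerm(String term) {
        // Create a fresh instance per call; this sketch does not assume thread safety.
        englishStemmer stemmer = new englishStemmer();
        // The stemmer works on lowercase input, so normalize the term first.
        stemmer.setCurrent(term.toLowerCase(Locale.ROOT));
        stemmer.stem();
        return stemmer.getCurrent();
    }

    public static void main(String[] args) {
        // Both forms should reduce to the same stem.
        System.out.println(stemTerm("amenities"));
        System.out.println(stemTerm("amenity"));
    }
}
```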
See here for more details. If stemming is all you want to do, then you should use this instead of Lucene.
Edit: You should lowercase term before passing it to stem().
LingPipe provides a number of tokenizers. They can be used for stemming and stop word removal. It's a simple and effective means of stemming.