I'm familiar with word stemming and completion from the tm package in R.
I'm trying to come up with a quick-and-dirty method for finding all variants of a given word within some corpus. For example, if my input is "leukocyte", I'd like to get "leukocytes" and "leukocytic".
If I had to do it right now, I would probably just go with something like:
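library(tm)     # provides the crude corpus; attaches NLP, which supplies words()
library(RWeka)  # provides LovinsStemmer()
data("crude")   # load the example corpus shipped with tm
# Build a dictionary of the unique words in the corpus
dictionary <- unique(unlist(lapply(crude, words)))
# Return every dictionary entry that contains the stem of the input word
grep(pattern = LovinsStemmer("company"), x = dictionary,
     ignore.case = TRUE, value = TRUE)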
I used Lovins because Snowball's Porter doesn't seem to be aggressive enough.
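To make the matching step concrete without depending on any particular stemmer, here is a minimal base-R sketch of the same idea. The helper name find_variants, the toy dictionary, and the hand-picked stem "leukocyt" are all assumptions for illustration; the stem is not the verified output of LovinsStemmer or any other stemmer. Unlike the grep call above, which matches the stem anywhere in a word, this version only accepts words that start with the stem.

```r
# Hypothetical helper: return the dictionary entries that begin with a stem.
# In practice `stem` would come from a stemmer such as RWeka::LovinsStemmer();
# here it is passed in directly so the sketch needs only base R.
find_variants <- function(stem, dictionary) {
  dictionary <- unique(dictionary)
  # Case-insensitive prefix match
  hits <- startsWith(tolower(dictionary), tolower(stem))
  dictionary[hits]
}

# Toy dictionary, made up for illustration (not taken from a real corpus).
toy_dictionary <- c("leukocyte", "leukocytes", "leukocytic", "Leukocytosis",
                    "lymphocyte", "erythrocyte")

# "leukocyt" is an assumed stem chosen by hand.
find_variants("leukocyt", toy_dictionary)
```

Anchoring the match at the start of the word should cut down on false positives when the stem is short, at the cost of missing prefixed variants.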
I'm open to suggestions for other stemmers, scripting languages (Python?), or entirely different approaches.