Given a word, which may or may not be a singular-form noun, how would you generate its plural form?
Based on this NLTK tutorial and this informal list on pluralization rules, I wrote this simple function:
def plural(word):
    """
    Converts a word to its plural form.
    """
    if word in c.PLURALE_TANTUMS:
        # defective nouns, fish, deer, etc
        return word
    elif word in c.IRREGULAR_NOUNS:
        # foot->feet, person->people, etc
        return c.IRREGULAR_NOUNS[word]
    elif word.endswith('fe'):
        # knife -> knives
        return word[:-2] + 'ves'
    elif word.endswith('f'):
        # wolf -> wolves
        return word[:-1] + 'ves'
    elif word.endswith('o'):
        # potato -> potatoes
        return word + 'es'
    elif word.endswith('us'):
        # cactus -> cacti
        return word[:-2] + 'i'
    elif word.endswith('on'):
        # criterion -> criteria
        return word[:-2] + 'a'
    elif word.endswith('y'):
        # community -> communities
        return word[:-1] + 'ies'
    elif word.endswith(('s', 'x', 'sh', 'ch')):
        # box -> boxes, dish -> dishes
        return word + 'es'
    elif word.endswith('an'):
        # man -> men
        return word[:-2] + 'en'
    else:
        return word + 's'
But I think this is incomplete. Is there a better way to do this?
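The constants module c is not shown above. A minimal stand-in, with purely illustrative contents (the entries below are assumptions, not the real lists), might look like this. It also hints at why the lookups have to come before the suffix rules: without the irregular table, "person" would fall through to the 'on' rule.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the constants module `c` used by plural().
# The entries are illustrative examples only, not a complete list.
c = SimpleNamespace(
    # Words returned unchanged by plural()
    PLURALE_TANTUMS={"fish", "deer", "sheep", "scissors", "trousers"},
    # Singular -> plural for nouns the suffix rules would get wrong
    IRREGULAR_NOUNS={
        "foot": "feet",
        "person": "people",
        "child": "children",
        "mouse": "mice",
        "tooth": "teeth",
    },
)
```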
Another option, which supports Python 3, is inflect.
First, it's worth noting that, as the FAQ explains, WordNet cannot generate plural forms.
If you want to use it anyway, you can. With Morphy, WordNet might be able to generate plurals for many nouns… but it still won't help with most irregular nouns, like "children".
Anyway, the easy way to use WordNet from Python is via NLTK. One of the NLTK HOWTO docs explains the WordNet Interface. (Of course it's even easier to just use NLTK without specifying a corpus, but that's not what you asked for.)
There is a lower-level API to WordNet called pywordnet, but I believe it's no longer maintained (it became the foundation for the NLTK integration), and it only works with older versions of Python (maybe 2.7, but not 3.x) and of WordNet (only 2.x).
Alternatively, you can always access the C API by using ctypes or cffi or building custom bindings, or access the Java API by using Jython instead of CPython.
Or, of course, you can call the command-line interface via subprocess.
Anyway, at least on some installations, if you give the simple Morphy interface a singular noun, it will return its plural, while if you give it a plural noun, it will return its singular.
This isn't actually documented, or even implied, to be true, and in fact it's clearly not true for the OP, so I'm not sure I'd want to rely on it (even if it happens to work on your computer).
The other way around is documented to work, so you could write some rules that apply all possible English plural rules, call morphy on each candidate, and take the first one for which morphy returns the starting string as the right plural.
However, the way it's documented to work is effectively by blindly applying the same kind of rules. So, for example, it will properly tell you that doges is not the plural of dog, but not because it knows dogs is the right answer; only because it knows doge is a different word, and it likes the "+s" rule more than the "+es" rule. So, this isn't going to be helpful.
Also, as explained above, it has no rules for any irregular plurals; WordNet has no idea that children and child are related in any way.
Also, wn.morphy('reckless') will return 'reckless' rather than None. If you want None for non-nouns, you'll have to test whether the word is a noun first. You can do this while sticking with the same interface, although it's a bit hacky.
To do this properly, you will actually need to add a plurals database instead of trying to trick WordNet into doing something it can't do.
Also, a word can have multiple meanings, and they can have different plurals, and sometimes there are even multiple plurals for the same meaning. So you probably want to start with something like
(lemma for s in wn.synsets(word, wn.NOUN) for lemma in s.lemmas() if lemma.name() == word)
and then get all appropriate plurals, instead of just returning "the" plural.
The pattern-en package (for Python 2.5+, but not Python 3 yet) offers pluralization