I'm preparing some table names for an ORM, and I want to turn plural table names into singular entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now:
- If a word ends with -ies, I replace the ending with -y
- If a word ends with -es, I remove this ending. This doesn't always work, however: for example, it turns Types into Typ
- Otherwise, I just remove the trailing -s
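For reference, the three rules above can be sketched in a few lines of Python. The function name `naive_singularize` is just an illustrative choice, and this is only one reading of the rules as described, not the original code:

```python
def naive_singularize(word: str) -> str:
    """Apply the three suffix rules in order; the first rule that matches wins."""
    lower = word.lower()
    if lower.endswith("ies"):
        return word[:-3] + "y"   # Categories -> Category
    if lower.endswith("es"):
        return word[:-2]         # Boxes -> Box, but also Types -> Typ
    if lower.endswith("s"):
        return word[:-1]         # Users -> User
    return word                  # no plural suffix recognised


for name in ["Categories", "Boxes", "Users", "Types"]:
    print(name, "->", naive_singularize(name))
```

The last line of output shows the failure case: Types comes back as Typ because the -es rule fires before the plain -s rule gets a chance.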
Does anyone know of a better algorithm?
Those are all general rules (and good ones) but English is not a language for the faint of heart :-).
My own preference would be to have a transformation engine along with a set of transformations (surprisingly enough) for doing the actual work.
You would run through the transformations (from specific to general) and, when a match was found, apply the transformation to the word.
Regular expressions would be an ideal approach to this due to their expressiveness. An example rule set:
Note that an earlier version of the rules may not have had entry number 4. However, when we found the problem with "types" being transformed to "typ" at 98, we then created a higher-priority transformation at 4 to cater for this.
You'll basically need to keep this transformation table updated as you find all those wondrous exceptions that English has spawned.
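A minimal sketch of such a transformation engine might look like the following. The rule table here is hypothetical: the priorities 4 and 98 simply echo the numbers mentioned above, and the patterns themselves are assumptions made for illustration, not the original rule set.

```python
import re

# Hypothetical rule table: (priority, compiled pattern, replacement).
# Lower priority numbers are more specific and are tried first.
RULES = [
    (4, re.compile(r"(typ)es$", re.IGNORECASE), r"\1e"),  # exception: types -> type
    (97, re.compile(r"ies$", re.IGNORECASE), "y"),        # categories -> category
    (98, re.compile(r"es$", re.IGNORECASE), ""),          # boxes -> box
    (99, re.compile(r"s$", re.IGNORECASE), ""),           # users -> user
]


def singularize_by_rules(word: str) -> str:
    """Try each rule from specific to general and apply the first that matches."""
    for _priority, pattern, replacement in sorted(RULES, key=lambda rule: rule[0]):
        new_word, count = pattern.subn(replacement, word)
        if count:
            return new_word
    return word  # nothing matched; leave the word unchanged


for name in ["Types", "Categories", "Boxes", "Users"]:
    print(name, "->", singularize_by_rules(name))
```

Adding a newly discovered exception is then just a matter of adding one more tuple with a priority lower than the general rule it overrides.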
The other possibility is to not waste your time with a general rule. Since the names of the tables will be relatively limited, just create another table (or some sort of data structure) called singulars which maps all the relevant plural table names (employees) to singular object names (employee). Then every time a table is added, add an entry to the singulars "table" so you can singularize it.
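As a rough illustration of the lookup approach, assuming the mapping lives in an in-memory dictionary (the entries and the helper name `entity_name` are made up for this example):

```python
# Hypothetical mapping; in practice this could equally be a database table.
SINGULARS = {
    "employees": "employee",
    "types": "type",
    "categories": "category",
}


def entity_name(table_name: str) -> str:
    """Look up the singular entity name, failing loudly if it was never registered."""
    try:
        return SINGULARS[table_name.lower()]
    except KeyError:
        raise KeyError(f"no singular registered for table {table_name!r}") from None


print(entity_name("Employees"))
```

Failing on an unknown table, rather than guessing, is a deliberate choice here: it forces the mapping to be updated whenever a table is added.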
You may find this useful if you are comfortable with PHP. It works well, and it converts plural words to singular as well as singular words to plural.
There are some examples.
The code is also on GitHub; click here.
I think you have to use a list to translate plural into singular for some special words (in your example Types->Type).
I think you could have a look at the source code of CakePHP (you might start your search here). They use such an algorithm on their table names and field names to automagically join tables.
[Edit:] Here is some academic work on "Plural inflection in English"
As an improvement, you could use rules that generate multiple possibilities and then look up the results in a dictionary to weed out impossible options.
For example, replace -ies with both -y and -ie. Pies becomes Py and Pie. Only one of those is in the dictionary, so choose that one.
Perhaps you can even find a dictionary with frequency information and select the most common word you generate.
If you combine this with an ordered list of rules that covers a few exceptions, you might get pretty good accuracy.
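Here is a small sketch of the generate-and-filter idea. The word list and its frequency counts are invented for the example; a real implementation would load them from an actual dictionary file.

```python
# Hypothetical dictionary with made-up frequency counts.
WORD_FREQ = {"pie": 120, "type": 900, "category": 300, "box": 400, "user": 800}

# Each plural suffix maps to every singular ending it might have come from.
CANDIDATE_RULES = [
    ("ies", ["y", "ie"]),
    ("es", ["", "e"]),
    ("s", [""]),
]


def candidates(word):
    """Generate every singular form that any matching rule could produce."""
    lower = word.lower()
    found = []
    for suffix, replacements in CANDIDATE_RULES:
        if lower.endswith(suffix):
            stem = lower[: -len(suffix)]
            found.extend(stem + ending for ending in replacements)
    return found


def singularize_with_dictionary(word):
    """Keep only candidates that are real words and pick the most frequent one."""
    known = [c for c in candidates(word) if c in WORD_FREQ]
    if not known:
        return None  # no candidate is a known word
    return max(known, key=WORD_FREQ.get)


for name in ["pies", "types", "categories", "boxes"]:
    print(name, "->", singularize_with_dictionary(name))
```

Returning `None` when no candidate is in the dictionary leaves room to fall back on an exception list or a manual fix.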
I'm sure you can google to find plenty of libs that do this.
But if you feel like coding, you could try the reverse process: start with the singular words from a dictionary (download a free one, such as those used by aspell or whatever), apply pluralization rules, collect the mappings, and then reverse the direction. For "type" you would pluralize to "types", and the reverse mapping would work as expected. While there are exceptions here too, it is slightly easier to pluralize things reliably. I did this a while back (in the mid 90s... :-) ) for an online game (a MUD), where descriptions of multiple identical items were concatenated and automatic pluralization was needed.
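A minimal sketch of that reverse-mapping approach, assuming a tiny hard-coded word list in place of a real dictionary file and a deliberately simplified set of pluralization rules:

```python
import re

# Stand-in for a real dictionary of singular words (an assumption for this example).
SINGULAR_WORDS = ["type", "category", "box", "employee", "user", "day"]


def pluralize(word: str) -> str:
    """Simplified English pluralization; real rule sets need more exceptions."""
    if re.search(r"[^aeiou]y$", word):
        return word[:-1] + "ies"       # category -> categories
    if re.search(r"(s|x|z|ch|sh)$", word):
        return word + "es"             # box -> boxes
    return word + "s"                  # type -> types, day -> days


# Pluralize every known singular, then flip the mapping around.
PLURAL_TO_SINGULAR = {pluralize(word): word for word in SINGULAR_WORDS}


def singularize_by_reverse_lookup(plural: str):
    """Return the singular that pluralizes to this word, or None if unknown."""
    return PLURAL_TO_SINGULAR.get(plural.lower())


print(singularize_by_reverse_lookup("Types"))
```

Because "types" is generated from "type" rather than derived by stripping a suffix, the Types/Typ problem never arises.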
Also: given that it's a finite number of tables, you could just use the simplest algorithm, get the raw output, eyeball it, and fix the error cases manually. :-)
There's a nice implementation of an inflector in the uNnAddIns project that even includes an experimental Spanish inflector. The idea is borrowed from the Rails Inflector module.
It can also be used for other things, such as converting CamelCase to normal text or generating browser-friendly URLs from titles.