I'm looking to use regex in C# to search for terms, and I want to include the plurals of those terms in the search. For example, if the user searches for 'pipe', I want to return results for 'pipes' as well.
So I can do this...
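```
using System.Text.RegularExpressions;
// Escape the term; "s?" allows one optional trailing 's' ("s*" also matched "pipess").
string s = @"\b" + Regex.Escape(term) + @"s?\b";
if (Regex.IsMatch(bigtext, s)) { /* do stuff */ }
```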
How would I modify the above to allow me to match, say, 'stresses' when the user enters 'stress' and still work for 'pipe'/'pipes'?
If you are using SQL Server as your backend, couldn't you use SOUNDEX? I'm unsure what you are trying to search for; I assume you are building dynamic SQL from the search input. If not, I think there is SOUNDEX support for LINQ.
EDIT: I stand corrected; it appears SOUNDEX can be used in some form through LINQ to SQL / LINQ to Entities.
However, MSDN does have a Soundex example, which did fine in the simple tests I ran this morning: http://msdn.microsoft.com/en-us/library/bb669073.aspx
The changes I made were to call .ToUpperInvariant() instead of .ToUpper(invariant), and to turn it into an extension method taking (this string word) instead of (string word).
Here is an example of what I ran, using the data: dogs, dog, doggie
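The MSDN code itself is not reproduced here. As a rough sketch of the same idea, here is a simplified American Soundex written as a string extension method, following the two changes described above (upper-casing with `ToUpperInvariant()` and a `this string word` parameter). The `SoundexExtensions` class and `Soundex` method names are hypothetical, not taken from the MSDN sample.

```csharp
using System;
using System.Linq;
using System.Text;

public static class SoundexExtensions
{
    // American Soundex digit codes for A..Z. '0' marks vowels, H, W and Y (never emitted).
    private const string Codes = "01230120022455012623010202";

    public static string Soundex(this string word)
    {
        if (string.IsNullOrEmpty(word)) return string.Empty;

        // Keep only the letters A..Z, upper-cased culture-invariantly.
        string letters = new string(word.ToUpperInvariant().Where(c => c >= 'A' && c <= 'Z').ToArray());
        if (letters.Length == 0) return string.Empty;

        var result = new StringBuilder();
        result.Append(letters[0]);              // the first letter is kept as-is
        char last = Codes[letters[0] - 'A'];    // its code still suppresses an identical next code

        for (int i = 1; i < letters.Length && result.Length < 4; i++)
        {
            char c = letters[i];
            if (c == 'H' || c == 'W') continue; // H and W do not break a run of equal codes
            char code = Codes[c - 'A'];
            if (code != '0' && code != last) result.Append(code);
            last = code;                        // a vowel resets 'last', so a repeated code is emitted again
        }
        return result.ToString().PadRight(4, '0');
    }
}
```

With this sketch, "dogs", "dog" and "doggie" all encode to D200, and "stress" and "stresses" both encode to S362. Note that Soundex is a phonetic code rather than a pluralization tool: "pipe" encodes to P100 while "pipes" encodes to P120.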
With SQL Server, you could also rank your results by using CONTAINS/FREETEXT/CONTAINSTABLE etc. together with SOUNDEX against a full-text catalog (this is based on a SQL Server 2000 implementation I used; I am not familiar with the newer versions).
Also, if you are able to use SQL Server, you may want to look into this option: LINQ to SQL SOUNDEX - possible?
One concern with the pluralization solution is that it requires .NET 4.
There is also the Levenshtein distance algorithm that may be useful.
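For reference, here is a minimal sketch of the standard dynamic-programming Levenshtein distance in C#, plus a hypothetical `ContainsSimilarWord` helper that treats any word within `maxDistance` edits of the search term as a hit. The class name, the helper and the threshold are illustrative assumptions, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringDistance
{
    // Classic Levenshtein distance (insertions, deletions, substitutions each cost 1),
    // computed with two rolling rows of the DP table.
    public static int Levenshtein(string a, string b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // distance from "" to b[0..j)

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp; // the finished row becomes 'prev'
        }
        return prev[b.Length];
    }

    // Hypothetical helper: true if any word in text is within maxDistance edits of term.
    public static bool ContainsSimilarWord(string text, string term, int maxDistance)
    {
        string needle = term.ToLowerInvariant();
        foreach (Match m in Regex.Matches(text, @"\b\w+\b"))
            if (Levenshtein(m.Value.ToLowerInvariant(), needle) <= maxDistance)
                return true;
        return false;
    }
}
```

For example, the distance from "pipe" to "pipes" is 1 and from "stress" to "stresses" is 2, so a threshold of 2 catches both plurals. The trade-off is false positives: "pope" is also only 1 edit away from "pipe".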
The problem you can face is that there are a lot of irregular nouns, such as man, fish and index. So you should consider using the PluralizationService, which has a Pluralize method. Here is an example that shows how to use it. After you get the plural of the term, you can easily construct a regex that searches for either the singular or the plural term.
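A minimal sketch of that last step, assuming you already have both forms of the term. To keep the sketch self-contained, the plural is passed in as a parameter; on .NET Framework 4+ it could come from `PluralizationService.CreateService(new CultureInfo("en-US")).Pluralize(term)` (namespace `System.Data.Entity.Design.PluralizationServices`, assembly `System.Data.Entity.Design.dll`). The `PluralSearch` class and `BuildTermRegex` method are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class PluralSearch
{
    // Builds a pattern that matches either form as a whole word, e.g. \b(?:man|men)\b.
    // Regex.Escape guards against regex metacharacters in user input.
    public static Regex BuildTermRegex(string singular, string plural)
    {
        string pattern = @"\b(?:" + Regex.Escape(singular) + "|" + Regex.Escape(plural) + @")\b";
        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

For example, `PluralSearch.BuildTermRegex("man", "men").IsMatch("Two men arrived")` is true, while "manager" does not match because of the trailing `\b`. For nouns like "fish", where both forms are identical, the alternation is redundant but harmless.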
Here's a regex created to remove the plurals:
(Demo & source)
I know it's not exactly what you need, but it may point you in the right direction.
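The regex from that demo is not reproduced here. A related, much simpler sketch of the suffix idea for the original question is shown below. It is a heuristic for regular English plurals only: it allows an optional "s" or "es" (pipe/pipes, stress/stresses) and handles consonant + "y" to "ies" (city/cities). Irregular nouns such as man/men are not covered. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SuffixPlurals
{
    // Heuristic pattern for regular English plurals only; irregular nouns are NOT handled.
    public static string BuildPattern(string term)
    {
        // Consonant + y: city -> cities
        if (Regex.IsMatch(term, "[^aeiou]y$", RegexOptions.IgnoreCase))
        {
            string stem = Regex.Escape(term.Substring(0, term.Length - 1));
            return @"\b" + stem + @"(?:y|ies)\b";
        }
        // Otherwise allow an optional "s" or "es": pipe -> pipes, stress -> stresses
        return @"\b" + Regex.Escape(term) + @"(?:e?s)?\b";
    }
}
```

Usage: `Regex.IsMatch(bigtext, SuffixPlurals.BuildPattern(term), RegexOptions.IgnoreCase)`. Because it is a heuristic, it also accepts a few non-words (for example "pipees"), which is usually harmless for searching.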