Consider the requirement to find a matched pair of delimiter characters and remove both the delimiters and any characters between them.
Here are the sets of delimiters:
[] square brackets
() parentheses
"" double quotes
'' single quotes
Here are some examples of strings that should match:
Given:                         Results In:
-------------------------------------------
Hello "some" World             Hello World
Give [Me Some] Purple          Give Purple
Have Fifteen (Lunch Today)     Have Fifteen
Have 'a good'day               Have day
And some examples of strings that should not match:
Does Not Match:
------------------
Hello "world
Brown]co[w
Cheese'factory
If the given string doesn't contain a matching set of delimiters, it isn't modified. The input string may contain many matching pairs of delimiters. If two sets of delimiters overlap (e.g. he[llo "worl]d"), that is an edge case we can ignore here.
The algorithm would look something like this:
string myInput = "Give [Me Some] Purple (And More) Elephants";
string pattern; //some pattern
string output = Regex.Replace(myInput, pattern, string.Empty);
Question: How would you achieve this with C#? I am leaning towards a regex.
Bonus: Are there easy ways of keeping those start and end delimiters in constants or in a list of some kind? Ideally the solution would make it easy to change the delimiters in case the business analysts come up with new sets.
I have to add the old adage, "You have a problem and you want to use regular expressions. Now you have two problems."
I've come up with a quick regex that will hopefully help you in the direction you are looking:
The parentheses, brackets, and double quotes are escaped, while the single quote can be left as is.
To put the above expression into English: it allows any number of characters before and any number after, and matches whatever sits between a pair of delimiters.
The open delimiter phrase is
(\(|\[|\"|')
This has a matching closing phrase. To make this a bit more extensible in the future, you could remove the actual delimiters and keep them in a config file, a database, or wherever you choose.
Use the following regex:
This regex replaces every occurrence of {word} with the modifiedWord you supply.
Some sample C# code:
In a sentence such as
It will replace only {Silverlight}, rather than everything from the first { bracket to the last } bracket.
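The pattern and sample code for this answer are not reproduced above. As a minimal sketch of the behaviour it describes, assuming a lazy quantifier is what keeps the match from running from the first { to the last }, something like the following would replace each braced token on its own. The class name, method name, and sample sentence are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class BraceReplacer
{
    // Assumed pattern (not taken from the original answer): '{', then as few
    // characters as possible, then the nearest '}'. The greedy form \{.*\}
    // would instead run from the first '{' to the last '}' on the line.
    private const string LazyPattern = @"\{.*?\}";

    // Replaces every {token} in the input with modifiedWord.
    public static string ReplaceBraced(string input, string modifiedWord)
    {
        return Regex.Replace(input, LazyPattern, modifiedWord);
    }
}
```

For example, `BraceReplacer.ReplaceBraced("Learn {Silverlight} and {WPF} today", "X")` handles the two tokens separately instead of swallowing the text between them.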
A simple way would be to do this:
Changing the return statement to the following will avoid duplicate empty spaces:
The final result for this would be:
Disclaimer: A single regex would probably be faster than this.
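The code for this answer did not survive, so the following is only a guess at what a "simple way" without regex could look like: scan the string, and whenever an opening delimiter has a closer later in the string, skip everything up to and including that closer. `DelimiterStripper`, `Pairs`, `Strip`, and `StripAndCollapse` are hypothetical names, and the space-collapsing variant is one assumed reading of the "avoid duplicate empty spaces" remark.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DelimiterStripper
{
    // Hypothetical delimiter table: opening character -> closing character.
    private static readonly Dictionary<char, char> Pairs = new Dictionary<char, char>
    {
        { '[', ']' }, { '(', ')' }, { '"', '"' }, { '\'', '\'' }
    };

    // Removes each delimited span; unmatched delimiters are left untouched.
    public static string Strip(string input)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (Pairs.TryGetValue(c, out char close))
            {
                int end = input.IndexOf(close, i + 1);
                if (end >= 0)
                {
                    i = end + 1; // skip the delimiters and everything between them
                    continue;
                }
            }
            result.Append(c);
            i++;
        }
        return result.ToString();
    }

    // Same as Strip, but collapses runs of spaces left behind by the removal.
    // Note that this also trims leading and trailing spaces.
    public static string StripAndCollapse(string input)
    {
        string[] words = Strip(input).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }
}
```

With the question's input, `StripAndCollapse("Give [Me Some] Purple (And More) Elephants")` yields "Give Purple Elephants", while strings such as `Brown]co[w` come back unchanged.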
Building on Bryan Menard's regular expression, I made an extension method which will also work for nested replacements like "[Test 1 [[Test2] Test3]] Hello World":
Usage of this method would in the suggested case look like this:
Returning the string "Hello World".
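Neither the extension method nor the regular expression it builds on is shown above. One way to get nested removal, sketched here under the assumption that repeatedly deleting the innermost pair is acceptable, is to match an opener, any run of characters that are neither opener nor closer, and a closer, then loop until nothing changes. `RemoveBetween` and `Hex` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class NestedRemovalExtensions
{
    // Encodes a character as \uXXXX so it is safe both inside and outside
    // a character class (Regex.Escape does not escape ']').
    private static string Hex(char ch)
    {
        return "\\u" + ((int)ch).ToString("x4");
    }

    // Repeatedly removes innermost open...close spans until none remain.
    public static string RemoveBetween(this string input, char open, char close)
    {
        string o = Hex(open);
        string c = Hex(close);
        // Innermost pair: opener, no further opener/closer inside, closer.
        string pattern = o + "[^" + o + c + "]*" + c;

        string previous;
        do
        {
            previous = input;
            input = Regex.Replace(input, pattern, string.Empty);
        } while (input != previous);

        return input;
    }
}
```

Calling `"[Test 1 [[Test2] Test3]] Hello World".RemoveBetween('[', ']')` leaves a leading space in this sketch, so a `Trim()` is needed to arrive at exactly "Hello World".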
Simple regex would be:
As for doing it a custom way where you want to build up the regex you would just need to build up the parts:
Then concatenate the individual regex parts with an OR (the | in regex), as in my original example. Once you have built your regex string, just run it once. The key is to get the regex into a single check, because running many separate regex matches on each item and then iterating through a lot of items will probably cause a significant drop in performance.
In my first example that would take the place of the following line:
I am sure someone will post a cool LINQ expression that builds the regex from an array of delimiter objects, or something along those lines.
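As a rough sketch of that idea (not the code from any answer here), the delimiter pairs can live in one array and LINQ can turn each pair into an "opener, anything but the closer, closer" part, joined with |. `DelimiterRegex`, `Delimiters`, `Build`, and `Strip` are hypothetical names, and the tuple syntax assumes C# 7 or later.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterRegex
{
    // Hypothetical delimiter list; add a pair here to support a new delimiter.
    public static readonly (char Open, char Close)[] Delimiters =
    {
        ('[', ']'), ('(', ')'), ('"', '"'), ('\'', '\'')
    };

    // Built once and reused, so each input needs only a single regex pass.
    private static readonly Regex Combined = Build(Delimiters);

    // \uXXXX form is safe inside and outside character classes.
    private static string Hex(char ch)
    {
        return "\\u" + ((int)ch).ToString("x4");
    }

    // One alternative per pair: open, any characters except close, close.
    public static Regex Build((char Open, char Close)[] delimiters)
    {
        var parts = delimiters.Select(
            d => Hex(d.Open) + "[^" + Hex(d.Close) + "]*" + Hex(d.Close));
        return new Regex(string.Join("|", parts), RegexOptions.Compiled);
    }

    public static string Strip(string input)
    {
        return Combined.Replace(input, string.Empty);
    }
}
```

This version does not collapse the spaces left behind, so `Strip("Give [Me Some] Purple (And More) Elephants")` keeps two spaces where each span was removed. It also does not attempt the nested or overlapping cases.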