I have the following string:
<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind
I want to find the last "SEM" start tag (not the SEM end tag) before the "PARTITION" tag. The result should be:
<SEM>is<I>love</I>, <PARTITION />
I have tried this regular expression:
<SEM>[^<]*<PARTITION[ ]/>
but it only works if the final "SEM" and "PARTITION" tags do not have any other tag between them. Any ideas?
Have you tried this:
Your regular expression matches anything but "<" after the "SEM" start tag, so it stops matching as soon as it hits the next tag: a closing "SEM" tag, or the "<I>" tag in your example.
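To see the failure concretely, here is a quick check that uses Python's re module as a stand-in for .NET (a negated character class such as [^<] behaves the same way in both). The second input string is a made-up example with no tag between the last SEM and PARTITION.

```python
import re

# The pattern from the question: [^<]* can never cross a "<",
# so the match dies at the first tag after <SEM>.
ORIGINAL = re.compile(r"<SEM>[^<]*<PARTITION[ ]/>")

sample = '<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind'
no_inner_tag = '<SEM>is love, <PARTITION />mind'  # hypothetical input without an inner tag

print(ORIGINAL.search(sample))                 # None: </SEM> or <I> blocks every attempt
print(ORIGINAL.search(no_inner_tag).group(0))  # <SEM>is love, <PARTITION />
```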
Use String.IndexOf to find PARTITION and String.LastIndexOf to find SEM?
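A sketch of the IndexOf/LastIndexOf idea, with Python's str.find and str.rfind standing in for String.IndexOf and String.LastIndexOf. The helper name last_sem_before_partition is hypothetical, and the code assumes the PARTITION tag is self-closing ("/>").

```python
def last_sem_before_partition(text):
    """Return text from the last <SEM> before <PARTITION up to the end of that tag, or None."""
    part = text.find("<PARTITION")          # like String.IndexOf
    if part == -1:
        return None
    start = text.rfind("<SEM>", 0, part)    # like String.LastIndexOf, limited to text before PARTITION
    if start == -1:
        return None
    end = text.find("/>", part)             # end of the self-closing PARTITION tag (assumed)
    if end == -1:
        return None
    return text[start:end + 2]


sample = '<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind'
print(last_sem_before_partition(sample))  # <SEM>is<I>love</I>, <PARTITION />
```

Plain string searches like these do a single left-to-right scan and a single backward scan, with no backtracking.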
And here's your goofy Regex!!!
What that says is: "While somewhere ahead there is a PARTITION tag, but NOT another SEM tag, match a SEM tag."
Enjoy!
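The regex itself did not survive here, so the following is only one possible reading of that description, written with Python's re (the lookahead syntax (?=...) and (?!...) is the same in .NET). The negative lookahead is interpreted as "no other <SEM> that is itself followed by a PARTITION tag", so SEM tags after the PARTITION do not block the match.

```python
import re

# Hypothetical reconstruction of the lookahead idea:
#   (?=.*<PARTITION\s*/>)          a PARTITION tag lies somewhere ahead
#   (?!.*<SEM>.*<PARTITION\s*/>)   no later <SEM> is followed by a PARTITION tag
#   .*?<PARTITION\s*/>             consume up to the nearest PARTITION tag
LOOKAHEAD = re.compile(
    r"<SEM>(?=.*<PARTITION\s*/>)(?!.*<SEM>.*<PARTITION\s*/>).*?<PARTITION\s*/>",
    re.DOTALL,  # let "." span newlines if the markup is multi-line
)

sample = '<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind'
print(LOOKAHEAD.search(sample).group(0))  # <SEM>is<I>love</I>, <PARTITION />
```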
Here's that regex broken down:
If you are going to use a regex to find the last occurrence of something, then you might also want to use the right-to-left regex option (RegexOptions.RightToLeft in .NET):
Bit quick-and-dirty, but try this:
and take a look at what's in the C#/.NET equivalent of $2 (match.Groups[2].Value).
The secret lies in the lazy-matching construct (.*?). I assume/hope C# supports this.
Clearly, Jon Skeet's solution will perform better, but you may want to use a regex (to simplify breaking up the bits that interest you, for example).
(Disclaimer: I'm a Perl/Python/Ruby person myself...)
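The original quick-and-dirty pattern is missing, so this is only a guess at a two-group regex in that spirit, written with Python's re (greedy and lazy quantifiers work the same way in .NET). Note that it is the greedy first group that pushes the match to the last SEM; the lazy second group only makes it stop at the first PARTITION tag after that SEM.

```python
import re

# Group 1 is greedy: it swallows as much as possible, then backtracks until
# group 2 can start at the LAST <SEM> that is still followed by a PARTITION tag.
# Group 2's lazy .*? stops at the first PARTITION tag after that <SEM>.
QUICK = re.compile(r"(.*)(<SEM>.*?<PARTITION\s*/>)", re.DOTALL)

sample = '<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind'
m = QUICK.search(sample)
print(m.group(2))  # the Python equivalent of $2: <SEM>is<I>love</I>, <PARTITION />
```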
Here is the solution; I have tested it at http://regexlib.com/RETester.aspx
As you want the last one, the only way to identify it is to match only characters that don't contain </SEM>. I have included "\s*" in case there are spaces inside <SEM> or <PARTITION/>. Basically, what we do is exclude the word </SEM> with:
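The exact pattern is not shown here; a minimal sketch of the same idea uses a "tempered" token, (?:(?!</SEM>).)*, which advances one character at a time but refuses to step onto </SEM>. It is written with Python's re, whose lookahead and \s syntax match .NET's; the variable names are hypothetical.

```python
import re

# <\s*SEM\s*>            SEM start tag, allowing optional spaces
# (?:(?!</SEM>).)*       any characters, as long as none of them starts "</SEM>"
# <\s*PARTITION\s*/\s*>  self-closing PARTITION tag, allowing optional spaces
EXCLUDE_CLOSE = re.compile(r"<\s*SEM\s*>(?:(?!</SEM>).)*<\s*PARTITION\s*/\s*>", re.DOTALL)

sample = '<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind'
print(EXCLUDE_CLOSE.search(sample).group(0))            # <SEM>is<I>love</I>, <PARTITION />
print(EXCLUDE_CLOSE.search('<SEM >x<PARTITION/>').group(0))  # <SEM >x<PARTITION/>
```

This relies on every earlier SEM being closed: with an unclosed earlier tag, as in "<SEM>a <SEM>b<PARTITION />", the match would start at the first <SEM>.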