Question:
I have the following string:
<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind
I want to find the last "SEM" start tag before the "PARTITION" tag (the start tag, not the SEM end tag). The result should be:
<SEM>is<I>love</I>, <PARTITION />
I have tried this regular expression:
<SEM>[^<]*<PARTITION[ ]/>
but it only works if the final "SEM" and "PARTITION" tags do not have any other tag between them. Any ideas?
Answer 1:
And here's your goofy Regex!!!
(?=[\s\S]*?\<PARTITION)(?![\s\S]+?\<SEM\>)\<SEM\>
What that says is "While ahead somewhere is a PARTITION tag... but while ahead is NOT another SEM tag... match a SEM tag."
Enjoy!
Here's that regex broken down:
(?=[\s\S]*?\<PARTITION) means "While ahead somewhere is a PARTITION tag"
(?![\s\S]+?\<SEM\>) means "While ahead somewhere is not a SEM tag"
\<SEM\> means "Match a SEM tag"
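To see the lookaheads in action, here is a minimal C# sketch. The class and method names are made up for illustration. `Original` is the pattern above with the backslashes before `<` and `>` dropped (they are redundant in .NET). `Scoped` is not part of this answer: it is a hypothetical variant that only forbids another `<SEM>` between the match and the `<PARTITION` tag, on the assumption that a SEM tag appearing after the PARTITION tag should not block the match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // The answer's pattern: a PARTITION tag lies ahead, and no further SEM start tag lies ahead.
    public const string Original = @"(?=[\s\S]*?<PARTITION)(?![\s\S]+?<SEM>)<SEM>";

    // Hypothetical variant (an assumption, not from the answer): only the stretch
    // between this SEM tag and the next PARTITION tag must be free of SEM start tags.
    public const string Scoped = @"<SEM>(?=(?:(?!<SEM>)[\s\S])*?<PARTITION)";

    // Returns the index of the matched SEM start tag, or -1 if the pattern does not match.
    public static int IndexOfLastSem(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Index : -1;
    }
}
```

On the string from the question, both patterns match the third `<SEM>` (index 45). If another `<SEM>` is appended after the PARTITION tag, `Original` no longer matches at all, while `Scoped` still finds index 45.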
Answer 2:
Use String.IndexOf to find PARTITION and String.LastIndexOf to find SEM?
// Locate the PARTITION tag first, then search backward from it for the last SEM start tag.
int partitionIndex = text.IndexOf("<PARTITION");
int semIndex = text.LastIndexOf("<SEM>", partitionIndex);
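Fleshed out a little, the same idea might look like the sketch below. The class and method names are hypothetical, and a few things are assumptions on my part rather than part of the answer: ordinal comparison, returning an empty string when nothing is found, and treating the first `>` after `<PARTITION` as the end of that tag. The guard matters because `LastIndexOf` throws if it is handed a negative start index.

```csharp
using System;

public static class IndexOfDemo
{
    // Returns the text from the last <SEM> start tag before the first <PARTITION tag
    // through the end of that PARTITION tag, or "" if either tag is missing.
    public static string LastSemBeforePartition(string text)
    {
        int partitionIndex = text.IndexOf("<PARTITION", StringComparison.Ordinal);
        if (partitionIndex < 0) return "";   // no PARTITION tag at all

        // Search backward, starting at the PARTITION tag.
        int semIndex = text.LastIndexOf("<SEM>", partitionIndex, StringComparison.Ordinal);
        if (semIndex < 0) return "";         // no SEM start tag before it

        // Assumption: the first '>' after "<PARTITION" closes that tag.
        int tagEnd = text.IndexOf('>', partitionIndex);
        if (tagEnd < 0) return "";

        return text.Substring(semIndex, tagEnd - semIndex + 1);
    }
}
```

Called on the string from the question, this returns `<SEM>is<I>love</I>, <PARTITION />`.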
Answer 3:
If you are going to use a regex to find the last occurrence of something then you might also want to use the right-to-left parsing regex option:
new Regex("...", RegexOptions.RightToLeft);
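As a rough sketch of how that option could be applied here: the pattern, the class name and the `Singleline` flag below are my own assumptions, not something given in this answer. Scanning right to left, the engine matches the PARTITION tag first, and the lazy `.*?` then grows leftward only as far as the nearest `<SEM>`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RightToLeftDemo
{
    // Singleline lets '.' cross line breaks; RightToLeft makes the scan start at the end of the input.
    private static readonly Regex LastSemToPartition = new Regex(
        @"<SEM>.*?<PARTITION\s*/>",
        RegexOptions.RightToLeft | RegexOptions.Singleline);

    // Returns the matched text, or "" when there is no match.
    public static string Find(string text)
    {
        Match m = LastSemToPartition.Match(text);
        return m.Success ? m.Value : "";
    }
}
```

One caveat with this sketch: if the input contains several PARTITION tags, a right-to-left scan reaches the last one first, so it is the last PARTITION tag that anchors the match.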
Answer 4:
Here is a solution; I tested it at http://regexlib.com/RETester.aspx
<\s*SEM\s*>(?!.*</SEM>.*).*<\s*PARTITION\s*/>
Since you want the last one, the only way to identify it is to accept only text that does not contain </SEM>.
I have included "\s*" in case there are spaces inside <SEM> or <PARTITION/>.
Basically, what we do is exclude </SEM> with:
(?!.*</SEM>.*)
Answer 5:
Bit quick-and-dirty, but try this:
(<SEM>.*?</SEM>.*?)*(<SEM>.*?<PARTITION)
and take a look at what's in the C#/.net equivalent of $2
The secret lies in the lazy-matching construct (.*?) --- I assume/hope C# supports this.
Clearly, Jon Skeet's solution will perform better, but you may want to use a regex (to simplify breaking up the bits that interest you, for example).
(Disclaimer: I'm a Perl/Python/Ruby person myself...)
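For what it's worth, the .NET counterpart of `$2` is `match.Groups[2]`, and lazy quantifiers are supported. A minimal sketch follows; the class name and the `Singleline` option are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    // Group 1 consumes each complete <SEM>...</SEM> pair (plus the text after it);
    // group 2 captures from the remaining <SEM> start tag up to "<PARTITION".
    private const string Pattern = @"(<SEM>.*?</SEM>.*?)*(<SEM>.*?<PARTITION)";

    // Returns the contents of group 2, or "" when the pattern does not match.
    public static string Find(string text)
    {
        Match m = Regex.Match(text, Pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[2].Value : "";
    }
}
```

On the string from the question, group 2 holds `<SEM>is<I>love</I>, <PARTITION`. Note that this approach relies on every earlier SEM tag being closed, and on the last one before PARTITION being left open, as in the sample.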
Answer 6:
Have you tried this:
<SEM>.*<PARTITION\s*/>
Your regular expression was matching anything but "<" after the "SEM" tag. Therefore it would stop matching when it hit the closing "SEM" tag.