How can I search for keywords that are not inside a string?
For example, if I have the text:
Hello this text is an example.
bla bla bla "this text is inside a string"
"random string" more text bla bla bla "foo"
I would like to be able to match every occurrence of the word text that is not inside " ". In other words, I would like to match:
Note that I do not want to match the text that is highlighted in red, because it is inside a string.
Possible solution:
I have been working on it and this is what I have so far:
(?s)((?<q>")|text)(?(q).*?"|)
Note that the regex uses the conditional construct: (?(predicate) true alternative|false alternative)
So the regex reads:
Find " or text. If you find ", then continue selecting until you find " again (.*?"); if you find text, then do nothing.
When I run that regex I match the whole string, though. I am asking this question for the purpose of learning. I know I can remove all strings and then look for what I need.
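For comparison, here is a minimal sketch of the same conditional idea in Python's re module. The choice of Python is an assumption (the question appears to target .NET, and engines may behave differently). Python spells the named group (?P<q>...), and the helper name keyword_offsets is made up for illustration. Because the pattern also matches whole quoted strings, the sketch keeps only the matches where the q group did not participate.

```python
import re

SAMPLE = (
    'Hello this text is an example.\n'
    'bla bla bla "this text is inside a string"\n'
    '"random string" more text bla bla bla "foo"'
)

# Same shape as the pattern in the question, with Python's named-group syntax.
# (?(q).*?"|) means: if group q matched, consume up to the closing quote;
# otherwise match nothing more.
PATTERN = re.compile(r'(?s)((?P<q>")|text)(?(q).*?"|)')


def keyword_offsets(source):
    """Return start offsets of matches that did not begin with a quote."""
    return [m.start() for m in PATTERN.finditer(source) if m.group("q") is None]
```

Calling keyword_offsets(SAMPLE) should give the offsets of the two unquoted occurrences only, since the quoted strings are consumed as whole matches and then filtered out.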
I have used these answers many times and want to share an alternative approach, since sometimes I was not able to implement and use the given answers.
Instead of matching keywords outside of something, break the task into two subtasks:
For example, to replace the text in quotes I use:
or, more clearly:
'.*?(?<!\\)'
I know that this may look like double work and may have a performance impact on some platforms/languages, so everyone needs to test this, too.
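A rough sketch of this two-step idea in Python (the language and the names QUOTED, KEYWORD and find_outside_quotes are assumptions for illustration, not the code from this answer). It adapts the quote pattern above to double quotes and overwrites each quoted string with spaces of the same length, so the offsets found in the second step still refer to the original text.

```python
import re

SAMPLE = (
    'Hello this text is an example.\n'
    'bla bla bla "this text is inside a string"\n'
    '"random string" more text bla bla bla "foo"'
)

# Step 1 pattern: a double-quoted string whose closing quote is not preceded
# by a backslash (the double-quote counterpart of '.*?(?<!\\)').
QUOTED = re.compile(r'".*?(?<!\\)"')
# Step 2 pattern: the keyword as a whole word.
KEYWORD = re.compile(r'\btext\b')


def find_outside_quotes(source):
    """Mask quoted strings with spaces, then search what is left."""
    masked = QUOTED.sub(lambda m: " " * len(m.group(0)), source)
    return [m.start() for m in KEYWORD.finditer(masked)]
```

Masking instead of deleting is a design choice: deleting the strings would be simpler, but the match positions would no longer line up with the original input.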
Here is one answer:
This means:
You can easily extend this to handle strings containing escapes as well.
In C# code:
Added from the comment discussion: an extended version (matches on a per-line basis and handles escapes). Use RegexOptions.Multiline for this:
In a C# string this looks like:
Since you now want to use ** instead of ", here is a version for that:
Explanation:
Since this version doesn't contain " characters, it's cleaner to use a literal string:
I would simply greedily match the texts in quotes within a non-capturing group to filter them out and then use a capturing group for the non-quoted answer, like this:
which you might want to refine a little for word boundaries etc. But this should get you where you want to go, and be a clear, readable sample.
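A minimal Python sketch of this non-capturing/capturing idea. The exact pattern is an assumption, since the answer's own regex is not shown here, and unquoted_keywords is a hypothetical helper name. Quoted strings are consumed by the non-capturing alternative, so group 1 is only set when the keyword is found outside quotes.

```python
import re

SAMPLE = (
    'Hello this text is an example.\n'
    'bla bla bla "this text is inside a string"\n'
    '"random string" more text bla bla bla "foo"'
)

# Left alternative: swallow a whole quoted string without capturing it.
# Right alternative: capture the keyword in group 1.
PATTERN = re.compile(r'(?:"[^"]*")|(text)')


def unquoted_keywords(source):
    """Return start offsets of the keyword where group 1 took part in the match."""
    return [m.start(1) for m in PATTERN.finditer(source) if m.group(1) is not None]
```

This version does not handle escaped quotes or word boundaries; those are the refinements mentioned above.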
This can get pretty tricky, but here is one potential method that works by making sure that there is an even number of quotation marks between the matching text and the end of the string:
Replace text with the regex that you want to match.
Rubular: http://www.rubular.com/r/cut5SeWxyK
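To make the even-number-of-quotes idea concrete, here is a small Python sketch. The pattern is an assumption reconstructed from the description, not necessarily the one behind the Rubular link, and outside_quotes is a made-up helper name. The lookahead only succeeds if the rest of the input can be split into non-quote characters and complete pairs of quotation marks.

```python
import re

SAMPLE = (
    'Hello this text is an example.\n'
    'bla bla bla "this text is inside a string"\n'
    '"random string" more text bla bla bla "foo"'
)

# After the keyword, allow non-quote characters followed by any number of
# complete "..." pairs up to the end of the input. An odd number of quotes
# remaining means the keyword sits inside a string, so the lookahead fails.
OUTSIDE = re.compile(r'text(?=[^"]*(?:"[^"]*"[^"]*)*$)')


def outside_quotes(source):
    """Return start offsets of the keyword followed by an even number of quotes."""
    return [m.start() for m in OUTSIDE.finditer(source)]
```

Note that this treats the whole input as one unit and assumes the quotes are balanced and unescaped; a single stray quotation mark changes the parity for everything before it.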
Explanation: