I'm doing some regular expression gymnastics. I set myself the task of searching C# code for uses of the as operator that are not followed by a null check within a reasonable distance. I don't want to actually parse the C# code. For example, I want to capture code snippets such as
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
but not capture
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
nor, for that matter,
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null) {...}
if(x1.a == y1.a)
Thus any null check, even an unrelated one, counts as a "good check", and the code should then not be found.
The question is: how do I match something while ensuring that something else is not found in its surroundings?
I've tried the naive approach: looking for 'as', then doing a negative lookahead within 150 characters.
\bas\b.{1,150}(?!\b==\s*null\b)
Unfortunately, the above regular expression matches all of the above examples. My gut tells me the problem is that .{1,150} can stop at many different positions, and at most of them the negative lookahead does not find the '== null'.
If I try negating the whole expression, that doesn't help either, as that would match most C# code around.
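For illustration, here is a small Python sketch that runs the naive pattern against the three snippets above. It uses Python's re module as a stand-in for the .NET engine, on the assumption that \b, the lookahead and the newline-excluding . behave the same way for this ASCII input.

```python
import re

# The three snippets from the question.
CAPTURE = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
NULL_CHECK = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
UNRELATED_CHECK = (
    "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
    "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)"
)

# The naive pattern from the question.
NAIVE = re.compile(r"\bas\b.{1,150}(?!\b==\s*null\b)")

for name, snippet in [("capture", CAPTURE), ("null check", NULL_CHECK),
                      ("unrelated check", UNRELATED_CHECK)]:
    match = NAIVE.search(snippet)
    # .{1,150} stops at the end of the first line (no DOTALL), and the
    # lookahead at that position sees no "== null", so every snippet matches.
    print(f"{name}: {match.group(0)!r}")
```

Each snippet matches with the text 'as SimpleRes;', which is exactly the behaviour described above.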
Let me try to redefine your problem. For each "as" assignment:
- if there is an if (... == null) within 150 characters, don't match;
- if there is no if (... == null) within 150 characters, match.
Your expression
\bas\b.{1,150}(?!\b==\s*null\b)
won't work because of the negative lookahead. The regex can always shift where .{1,150} ends by a character to dodge the negative lookahead, so you end up matching even when there is an if (... == null) there.
Regexes are really not good at not matching something. In this case, you're better off trying to match an "as" assignment with an "if == null" check within 150 characters,
and then negating the check:
if (!regex.IsMatch(text)) ...
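A minimal Python sketch of this "match the good case, then negate" idea. The exact regex is an assumption, not the answer's own: it requires whitespace around as, and uses [\s\S] in place of the Singleline option so the 150-character window can span lines. missing_null_check is a hypothetical helper name.

```python
import re

# Assumed pattern: whitespace around "as", then up to 150 characters of
# anything (including newlines), then "== null".
AS_THEN_NULL_CHECK = re.compile(r"\sas\s[\s\S]{0,150}?==\s*null\b")


def missing_null_check(code: str) -> bool:
    """Return True when no "as" is followed by "== null" within 150 characters."""
    # Negating the search result is the "then negate the check" step.
    return AS_THEN_NULL_CHECK.search(code) is None


snippets = {
    "capture": "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)",
    "null check": "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)",
    "unrelated check": "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
                       "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)",
}
for name, code in snippets.items():
    print(name, missing_null_check(code))
```

Because the negation applies to the whole text, one null check near any cast clears it, which is consistent with the question's rule that any null check counts.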
I'm activating the Singleline option with the inline (?s:...) construct; you can set it in the options of your Regex instead if you want. I'll add that I'm putting \s around as because I think only whitespace is "legal" around the as. You could probably use \b there instead.
Be aware that \s will probably catch spaces that aren't "valid spaces". In .NET it's defined as [\f\n\r\t\v\x85\p{Z}], where \p{Z} covers the Unicode categories 'Separator, Space', 'Separator, Line' and 'Separator, Paragraph'.
The question isn't clear. What EXACTLY do you want? I'm sorry, but I still don't understand, even after reading the question and comments numerous times.
Must the code be in C#? In Python? Something else? There is no indication on this point.
Do you want a match only if an if(... == ...) line directly follows a block of var ... = ... lines? Or may an unrelated line sit BETWEEN the block and the if(... == ...) line without stopping the match? My code assumes the second option.
Does an if(... == null) line AFTER an if(... == ...) line stop the match or not? Unable to tell whether the answer is yes or no, I wrote two regexes, one for each option.
I hope my code is clear enough and addresses your concern. It is in Python.
Result
I think it would help to put the variable name in a capturing group (...) so you can use it as a backreference. Something like the following:
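A minimal Python sketch of that backreference idea; the assignment shape name = expr as Type;, the pattern, and the unchecked_variables helper are all assumptions for illustration. Note that it only accepts a null check on the same variable, which is stricter than the question's rule: in the third example, the unrelated null check no longer clears x1 and y1.

```python
import re
from typing import List

UNCHECKED_AS = re.compile(
    r"\b(\w+)\s*=\s*\w+\s+as\s+\w+\s*;"      # group 1: the variable assigned from the cast
    r"(?![\s\S]{0,150}?\b\1\s*==\s*null\b)"  # reject if "<same var> == null" follows
)


def unchecked_variables(code: str) -> List[str]:
    """Names assigned via "as" with no "name == null" in the next 150 characters."""
    return UNCHECKED_AS.findall(code)


code = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
print(unchecked_variables(code))  # ['y1']
```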
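Put the .{1,150} inside the lookahead, and replace . with [\s\S] (in general, . doesn't match newlines). Also, the \b might be misleading near the ==: = is not a word character, so \b== only matches when a word character immediately precedes it, as in x1== null, and never in x1 == null.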
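Applying these suggestions to the question's pattern, sketched in Python (re standing in for the .NET engine; FIXED is an illustrative name). The \b before == is dropped, and the last lines show why it was misleading.

```python
import re

# 150-character window moved inside the negative lookahead, [\s\S] instead
# of "." so the window can cross newlines, and no \b before "==".
FIXED = re.compile(r"\bas\b(?![\s\S]{1,150}?==\s*null\b)")

capture = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
null_check = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"

print(bool(FIXED.search(capture)))     # True: no null check after either "as"
print(bool(FIXED.search(null_check)))  # False: both casts are followed by one

# "=" is not a word character, so \b only matches before "==" when a word
# character sits immediately in front of it.
print(re.search(r"\b==", "x1 == null"))        # None
print(bool(re.search(r"\b==", "x1== null")))  # True
```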
I love regex gymnastics! Here is a commented PHP regex:
And here it is in JavaScript style:
This one did make my head hurt a little...
Here is the test data I am using: