I ran Perl::Critic on one of my scripts, and got this message:
Regular expression without "/x" flag at line 21, column 26. See page 236 of PBP.
I looked up the policy information here, and I understand that writing regular expressions in extended mode will help anyone who is looking at the code.
However, I am stuck as how to convert my code to use the /x flag.
CPAN Example:
# Match a single-quoted string efficiently...
m{'[^\\']*(?:\\.[^\\']*)*'}; #Huh?
# Same thing with extended format...
m{
' # an opening single quote
[^\\'] # any non-special chars (i.e. not backslash or single quote)
(?: # then all of...
\\ . # any explicitly backslashed char
[^\\']* # followed by an non-special chars
)* # ...repeated zero or more times
' # a closing single quote
}x;
This makes sense if you only look at the regex.
My Code:
if ($line =~ /^\s*package\s+(\S+);/ ) {
I am not exactly sure how to use an extended regex inside of an if statement. I can write it like this:
if (
$line =~ /
^\s* # starting with zero or more spaces
package
\s+ # at least one space
(\S+) # capture any non-space characters
; # ending in a semi-colon
/x
)
{
And this works, but I think this is almost harder to read than the original. Is there a better way (or a best practice way) to write this? I guess I could create a variable using qr//.
I'm not really looking for advice on re-writing this specific regex (although if I can improve it, I'll take advice) - I'm more looking for advice on how to expand a regex inside of an if statement.
I know Perl::Critic is just a guideline, but it would be nice to follow it.
Thanks in advance!
EDIT: So after receiving a few answers, it became clear to me that making a regex multi-line with comments is not always necessary. People who understand basic regex should be able to understand what my example was doing - the comments I added were maybe a little unnecessary and verbose. I like the idea of using the extended regex flag, but still embedding spaces in the regex to make each part of the regex a little more clear. Thanks for all the input!
Well, I really don't think you should waste vertical screen real estate on this. On the other hand, if I were to write this pattern over several lines, I would have used braces and indented the pattern:
IMHO, the following version is perfectly fine:
in terms of getting the benefit of
m//x
.The comments are completely unnecessary in this case because you are not doing anything tricky. I did add
\s*
before the semi-colon because sometimes people set the semi-colon apart from the package name and that should not throw off your match.Seeing this topic is about alternative ways to write regular expressions, there are ways to write complicated regular expressions without variables, and without comments, and it still be useful.
I reflowed Chas Owens date validating regex to the new declarative form available in Perl-5.10, which has numerous benefits.
It may not be everyones kettle of fish, but for extremely complex things such as date validating it can be handy ( ps: in the real world, please use a module for date stuff, don't DIY , this is just an example to learn from )
Note the addition of the
(?!\d)
snippets, which are added so that"45" wont match
~= m{(?&twentyeightdays) $syntax}
due to '4' matching 0?[4]Seems like this is more a question of how to consistently indent a multiline if condition...to which there are a great many answers. What really matters is consistency. If you use perltidy or some other formatter, be consistent with what it comes up with (with your configuration). I would indent the contents of the regex one level from the delimiters, though.
Your post shows one major flaw in running existing code through something like Perl::Critic -
youthe CPAN example left out a * from the original regex. If you do a lot of "cleanup", you can expect to introduce bugs, so I hope for your sake that you have a good test suite.Never write a comment that says what the code says. Comments should tell you why the code says what it says. Take a look at this monstrosity, without the comments it is very difficult to see what is going on, but the comments make it clear what is trying to be matched:
It is pretty much your call as to the value added by such extra information.
Sometimes you're right, it doesn't add anything to explain what is going on and just makes the code look messy, but for complex regular expressions, the
x
flag can be a boon.Actually, this "making a call" regarding the added value of additional information can be quite difficult.
I cannot remember how many times I've seen legacy code where beautifully formatted comments have not been maintained and so drift away from what the code is doing. In fact, when I was a lot less-experienced, I went completely up the wrong path because a comment associated with a piece of code was old and hadn't been maintained.
Edit: In some ways, the CPAN example is not really that useful. When using the x flag to add comments to describe a complex regexp, I tend to describe the components that the regexp is trying to match rather than just describe the regexp "bits" themselves. For example, I'd write things like:
which tells me more than
My feeling would be to leave the regexp comments out in this case. Your gut feeling is right!