I have some complex regular expressions that I need to comment for readability and maintenance. The Java spec is rather terse, and I struggled for a long time to get this working. I finally caught my bug and will post it as an answer, but I'd be grateful for any other advice on maintaining regexes.
As an example I want to comment the subcomponents (of patternS) in a simple name parser:
String testTarget = "Waldorf T. Flywheel";
String patternS = "([A-Za-z]+)\\s+([A-Z]\\.)?\\s+([A-Za-z]+)";
Pattern pattern = Pattern.compile(patternS, Pattern.COMMENTS);
Assert.assertTrue(pattern.matcher(testTarget).matches());
EDIT: I would be grateful for examples of the (?x) format as well.
EDIT: @geowa4 has a good suggestion which avoids embedded comments. Since Java and others have provided for embedded comments, what are the cases where they are useful? (I think I have a case but I'd be interested to see others.)
EDIT: As noted below by @mikej, the regex does not handle the optional initial well and would be better as:
String patternS = "([A-Za-z]+)\\s+([A-Z]\\.\\s+)?([A-Za-z]+)";
but that would end up capturing the trailing whitespace as part of the initial.
See the post by Martin Fowler on ComposedRegex for some more ideas on improving regexp readability. In summary, he advocates breaking down a complex regexp into smaller parts which can be given meaningful variable names. e.g.
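A minimal sketch of that idea applied to the name parser from the question. The constant names and the class name are hypothetical, chosen only for illustration. The non-capturing group `(?:...)` around the initial is an assumption of this sketch, not something from Fowler's post; it keeps the trailing whitespace out of the captured initial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComposedNameRegex {
    // Each sub-pattern gets a descriptive name instead of an embedded comment.
    static final String FIRST_NAME = "([A-Za-z]+)";
    static final String SEPARATOR = "\\s+";
    // Non-capturing wrapper: the initial is optional together with its trailing
    // whitespace, but only the letter and the dot are captured (group 2).
    static final String OPTIONAL_INITIAL = "(?:([A-Z]\\.)\\s+)?";
    static final String LAST_NAME = "([A-Za-z]+)";

    static final Pattern NAME =
        Pattern.compile(FIRST_NAME + SEPARATOR + OPTIONAL_INITIAL + LAST_NAME);

    public static void main(String[] args) {
        Matcher m = NAME.matcher("Waldorf T. Flywheel");
        if (m.matches()) {
            // prints: Waldorf | T. | Flywheel
            System.out.println(m.group(1) + " | " + m.group(2) + " | " + m.group(3));
        }
    }
}
```

For a name without an initial, such as "Waldorf Flywheel", the pattern still matches and `m.group(2)` is `null`.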
I found the following worked:
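The key thing was to include the newline character \n explicitly at the end of each commented line in the string. In comments mode a # comment runs to the end of the line, so without the \n the first comment swallows the rest of the pattern.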
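A sketch of what this can look like, using the pattern from the question. The class and constant names are hypothetical. It also shows the inline `(?x)` form, which turns on the same mode as the `Pattern.COMMENTS` flag.

```java
import java.util.regex.Pattern;

class CommentedNameRegex {
    // Whitespace is ignored in comments mode, and each "#" comment ends at the "\n".
    static final String PATTERN_S =
        "([A-Za-z]+)   # first name\n" +
        "\\s+          # separator\n" +
        "([A-Z]\\.)?   # optional initial\n" +
        "\\s+          # separator\n" +
        "([A-Za-z]+)   # last name";

    // Comments mode enabled through the flag ...
    static final Pattern WITH_FLAG = Pattern.compile(PATTERN_S, Pattern.COMMENTS);
    // ... or through the embedded (?x) flag expression at the start of the pattern.
    static final Pattern WITH_INLINE_FLAG = Pattern.compile("(?x)" + PATTERN_S);

    // The same text with the newlines removed: everything after the first "#"
    // becomes one long comment, leaving only "([A-Za-z]+)" as the pattern.
    static final Pattern WITHOUT_NEWLINES =
        Pattern.compile(PATTERN_S.replace("\n", ""), Pattern.COMMENTS);

    public static void main(String[] args) {
        String testTarget = "Waldorf T. Flywheel";
        System.out.println(WITH_FLAG.matcher(testTarget).matches());
        System.out.println(WITH_INLINE_FLAG.matcher(testTarget).matches());
        System.out.println(WITHOUT_NEWLINES.matcher(testTarget).matches());
    }
}
```

The first two patterns match the whole test string; the third does not, which is the bug described above.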
Why don't you just do this:
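The code for this suggestion is not preserved here. One way to avoid embedded comments, sketched under the assumption that the idea is to use ordinary Java comments next to concatenated string pieces (the class name is hypothetical):

```java
import java.util.regex.Pattern;

class JavaCommentedNameRegex {
    // Ordinary Java comments sit beside each piece; the regex itself contains
    // no comments, so Pattern.COMMENTS is not needed.
    static final String PATTERN_S =
        "([A-Za-z]+)" +   // first name
        "\\s+" +          // separator
        "([A-Z]\\.)?" +   // optional initial
        "\\s+" +          // separator
        "([A-Za-z]+)";    // last name

    static final Pattern PATTERN = Pattern.compile(PATTERN_S);

    public static void main(String[] args) {
        System.out.println(PATTERN.matcher("Waldorf T. Flywheel").matches());
    }
}
```

The compiler discards the comments, so the pattern string is exactly the one from the question.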
CONTINUATION:
If you want to keep the comments with the pattern and you need to read it in from a properties file, use this:
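The original snippet is not preserved here, so the following is only a sketch of one possible approach. The key `name.pattern` is hypothetical, and the file contents are held in a string and read through a `StringReader` so the example is self-contained. Two details of `java.util.Properties` matter: a trailing backslash continues the value on the next line but drops the line break, so each commented line needs an explicit `\n` escape, and regex backslashes must be doubled in the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Pattern;

class PropertiesNameRegex {
    // Stand-in for a .properties file. As it would appear on disk:
    //   name.pattern = (?x) \
    //       ([A-Za-z]+)      # first name \n\
    //       \\s+             # separator \n\
    //       ([A-Z]\\.\\s+)?  # optional initial \n\
    //       ([A-Za-z]+)      # last name
    static final String FILE_CONTENTS =
        "name.pattern = (?x) \\\n" +
        "    ([A-Za-z]+)      # first name \\n\\\n" +
        "    \\\\s+             # separator \\n\\\n" +
        "    ([A-Z]\\\\.\\\\s+)?  # optional initial \\n\\\n" +
        "    ([A-Za-z]+)      # last name\n";

    static Pattern loadPattern() throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(FILE_CONTENTS));
        // The leading (?x) switches on comments mode, so no flag is needed here.
        return Pattern.compile(props.getProperty("name.pattern"));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(loadPattern().matcher("Waldorf T. Flywheel").matches());
    }
}
```

Because the `(?x)` flag travels with the pattern text, the code that loads the property does not need to know that the pattern uses comments.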