I have a regular expression with named capture groups, where the last group is optional. I can't figure out how to iterate over the groups and properly deal with the optional group when it's empty; I get an "Index out of bounds" exception (ERegularExpressionError).
The regular expression parses a file, generated by an external system and received by email, that contains information about checks issued to vendors. The file is pipe-delimited; a sample is in the code below.
program Project1;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.RegularExpressions, System.RegularExpressionsCore;
{
  File format (pipe-delimited):
  Check #|Batch|CheckDate|System|Vendor#|VendorName|CheckAmount|Cancelled (if voided - optional)
}
const
  CheckFile = '201|3001|12/01/2015|1|001|JOHN SMITH|123.45|'#13 +
              '202|3001|12/01/2015|1|002|FRED JONES|234.56|'#13 +
              '103|2099|11/15/2015|2|001|JOHN SMITH|97.95|C'#13;
var
  RegEx: TRegEx;
  MatchResult: TMatch;
begin
  try
    RegEx := TRegEx.Create(
      '^(?<Check>\d+)\|'#10 +
      ' (?<Batch>\d{3,4})\|'#10 +
      ' (?<ChkDate>\d{2}\/\d{2}\/\d{4})\|'#10 +
      ' (?<System>[1-3])\|'#10 +
      ' (?<PayID>[0-9X]+)\|'#10 +
      ' (?<Payee>[^|]+)\|'#10 +
      ' (?<Amount>\d+\.\d+)\|'#10 +
      '(?<Cancelled>C)?$',
      [roIgnorePatternSpace, roMultiLine]);
    MatchResult := RegEx.Match(CheckFile);
    while MatchResult.Success do
    begin
      WriteLn('Check: ', MatchResult.Groups['Check'].Value);
      WriteLn('Dated: ', MatchResult.Groups['ChkDate'].Value);
      WriteLn('Amount: ', MatchResult.Groups['Amount'].Value);
      WriteLn('Payee: ', MatchResult.Groups['Payee'].Value);
      // Problem is here, where Cancelled is optional and doesn't
      // exist (first two lines of sample CheckFile.)
      // Raises ERegularExpressionError
      // with message 'Index out of bounds (8)' exception.
      WriteLn('Cancelled: ', MatchResult.Groups['Cancelled'].Value);
      WriteLn('');
      MatchResult := MatchResult.NextMatch;
    end;
    ReadLn;
  except
    // Regular expression syntax error.
    on E: ERegularExpressionError do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
I've tried checking whether MatchResult.Groups['Cancelled'].Index is less than MatchResult.Groups.Count, checking whether MatchResult.Groups['Cancelled'].Length > 0, and checking whether MatchResult.Groups['Cancelled'].Value <> '', all with no success.
How do I correctly deal with the optional capture group Cancelled when there is no match for that group?
You could also avoid using an optional group and make the Cancelled group mandatory, so that it contains either C or nothing. Just change the last line of the regex so that the ? sits inside the group, e.g. '(?<Cancelled>C?)$' instead of '(?<Cancelled>C)?$'.
For your test application, this wouldn't change the output. If you need to work further with Cancelled, you can simply check whether it contains C or an empty string.
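As a rough sketch of that idea (not tested against every Delphi version), the question's program could be reduced to the following. It reuses the question's sample data and pattern and only moves the quantifier inside the last group. The program name CancelledMandatory is made up for illustration.

```pascal
program CancelledMandatory; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.RegularExpressions;

const
  // Same sample data as in the question.
  CheckFile = '201|3001|12/01/2015|1|001|JOHN SMITH|123.45|'#13 +
              '202|3001|12/01/2015|1|002|FRED JONES|234.56|'#13 +
              '103|2099|11/15/2015|2|001|JOHN SMITH|97.95|C'#13;
var
  RegEx: TRegEx;
  MatchResult: TMatch;
begin
  RegEx := TRegEx.Create(
    '^(?<Check>\d+)\|'#10 +
    ' (?<Batch>\d{3,4})\|'#10 +
    ' (?<ChkDate>\d{2}\/\d{2}\/\d{4})\|'#10 +
    ' (?<System>[1-3])\|'#10 +
    ' (?<PayID>[0-9X]+)\|'#10 +
    ' (?<Payee>[^|]+)\|'#10 +
    ' (?<Amount>\d+\.\d+)\|'#10 +
    '(?<Cancelled>C?)$', // the "?" now applies to C, not to the whole group
    [roIgnorePatternSpace, roMultiLine]);
  MatchResult := RegEx.Match(CheckFile);
  while MatchResult.Success do
  begin
    // The group should now take part in every match, so its Value is
    // expected to be either 'C' or an empty string.
    if MatchResult.Groups['Cancelled'].Value = 'C' then
      WriteLn(MatchResult.Groups['Check'].Value, ': cancelled')
    else
      WriteLn(MatchResult.Groups['Check'].Value, ': not cancelled');
    MatchResult := MatchResult.NextMatch;
  end;
end.
```

The trade-off is that the pattern no longer distinguishes "group absent" from "group empty", which is fine here because both mean the check was not voided.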
If the requested named group does not exist in the result, an ERegularExpressionError exception is raised. This is by design (though the wording of the exception message is misleading). If you move your ReadLn() after your try/except block, you will see the exception message in your console window before your process exits. As written, your code does not wait for user input when an exception is raised.
Since your other groups are not optional, you can simply test whether MatchResult.Groups.Count is large enough to hold the Cancelled group (the whole match is the group at index 0, so it is included in the Count). With this pattern, the Cancelled group is available only when Count is greater than 8.
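A minimal sketch of that count check, assuming (as the 'Index out of bounds (8)' message suggests) that Groups only extends as far as the last group that took part in the match. The helper CancelledValue and the constant CancelledIndex are hypothetical names introduced for this example; they are not part of the RTL.

```pascal
program CancelledCountCheck; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.RegularExpressions;

const
  // Same sample data as in the question.
  CheckFile = '201|3001|12/01/2015|1|001|JOHN SMITH|123.45|'#13 +
              '202|3001|12/01/2015|1|002|FRED JONES|234.56|'#13 +
              '103|2099|11/15/2015|2|001|JOHN SMITH|97.95|C'#13;

// Hypothetical helper: returns the Cancelled flag, or '' when the optional
// group did not take part in the match.
function CancelledValue(const M: TMatch): string;
const
  CancelledIndex = 8; // eighth capture group; index 0 is the whole match
begin
  if M.Groups.Count > CancelledIndex then
    Result := M.Groups['Cancelled'].Value
  else
    Result := '';
end;

var
  RegEx: TRegEx;
  MatchResult: TMatch;
begin
  // Pattern unchanged from the question: the last group is optional.
  RegEx := TRegEx.Create(
    '^(?<Check>\d+)\|'#10 +
    ' (?<Batch>\d{3,4})\|'#10 +
    ' (?<ChkDate>\d{2}\/\d{2}\/\d{4})\|'#10 +
    ' (?<System>[1-3])\|'#10 +
    ' (?<PayID>[0-9X]+)\|'#10 +
    ' (?<Payee>[^|]+)\|'#10 +
    ' (?<Amount>\d+\.\d+)\|'#10 +
    '(?<Cancelled>C)?$',
    [roIgnorePatternSpace, roMultiLine]);
  MatchResult := RegEx.Match(CheckFile);
  while MatchResult.Success do
  begin
    WriteLn('Check: ', MatchResult.Groups['Check'].Value);
    WriteLn('Cancelled: ', CancelledValue(MatchResult));
    MatchResult := MatchResult.NextMatch;
  end;
end.
```

This only works because Cancelled is the last group; an optional group in the middle of the pattern would still be inside the Count range even when it did not match.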
BTW, your loop is also missing a call to NextMatch(), so your code is getting stuck in an endless loop.
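To illustrate the two structural points (advancing with NextMatch inside the loop, and keeping ReadLn outside the try/except), here is a stripped-down skeleton. The pattern and input are simplified stand-ins, not the check-file format, and the program name is made up.

```pascal
program MatchLoopSkeleton; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.RegularExpressions, System.RegularExpressionsCore;

var
  MatchResult: TMatch;
begin
  try
    // Simplified stand-in input and pattern, just to show the control flow.
    MatchResult := TRegEx.Match('201 202 103', '\d+');
    while MatchResult.Success do
    begin
      WriteLn('Check: ', MatchResult.Value);
      // Advance to the next match; without this the loop never terminates.
      MatchResult := MatchResult.NextMatch;
    end;
  except
    on E: ERegularExpressionError do
      WriteLn(E.ClassName, ': ', E.Message);
  end;
  // Placed after the try/except so the console stays open whether or not
  // an exception was raised.
  ReadLn;
end.
```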