How to handle TRegEx named capture groups that mig

Posted 2019-07-07 17:34

Question:

I have a regular expression with named capture groups, where the last group is optional. I can't figure out how to iterate the groups and properly deal with the optional group when it's empty; accessing it raises an ERegularExpressionError exception ('Index out of bounds').

The regular expression is parsing a file generated by an external system that we receive by email which contains information about checks that have been issued to vendors. The file is pipe-delimited; a sample is in the code below.

program Project1;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions, System.RegularExpressionsCore;
{
  File format (pipe-delimited): 
   Check #|Batch|CheckDate|System|Vendor#|VendorName|CheckAmount|Cancelled (if voided - optional)
}
const 
  CheckFile = '201|3001|12/01/2015|1|001|JOHN SMITH|123.45|'#13 +
              '202|3001|12/01/2015|1|002|FRED JONES|234.56|'#13 +
              '103|2099|11/15/2015|2|001|JOHN SMITH|97.95|C'#13 ;

var
  RegEx: TRegEx;
  MatchResult: TMatch;
begin
  try
    RegEx := TRegEx.Create(
      '^(?<Check>\d+)\|'#10 +
      '  (?<Batch>\d{3,4})\|'#10 +
      '  (?<ChkDate>\d{2}\/\d{2}\/\d{4})\|'#10 +
      '  (?<System>[1-3])\|'#10 +
      '  (?<PayID>[0-9X]+)\|'#10 +
      '  (?<Payee>[^|]+)\|'#10 +
      '  (?<Amount>\d+\.\d+)\|'#10 +
      '(?<Cancelled>C)?$',
      [roIgnorePatternSpace, roMultiLine]);
    MatchResult := RegEx.Match(CheckFile);
    while MatchResult.Success do
    begin
      WriteLn('Check: ', MatchResult.Groups['Check'].Value);
      WriteLn('Dated: ', MatchResult.Groups['ChkDate'].Value);
      WriteLn('Amount: ', MatchResult.Groups['Amount'].Value);
      WriteLn('Payee: ', MatchResult.Groups['Payee'].Value);
      // Problem is here, where Cancelled is optional and doesn't 
      // exist (first two lines of sample CheckFile.)
      // Raises ERegularExpressionError 
      // with message 'Index out of bounds (8)' exception.
      WriteLn('Cancelled: ', MatchResult.Groups['Cancelled'].Value);
      WriteLn('');
      MatchResult := MatchResult.NextMatch;
    end;
    ReadLn;
  except
    // Regular expression syntax error.
    on E: ERegularExpressionError do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

I've tried checking whether MatchResult.Groups['Cancelled'].Index is less than MatchResult.Groups.Count, whether MatchResult.Groups['Cancelled'].Length > 0, and whether MatchResult.Groups['Cancelled'].Value <> '', all without success.

How do I correctly deal with the optional capture group Cancelled when there is no match for that group?

Answer 1:

If the requested named group does not exist in the result, an ERegularExpressionError exception is raised. This is by design (though the wording of the exception message is misleading). If you move your ReadLn() after your try/except block, you would see the exception message in your console window before your process exits. Your code is not waiting for user input when an exception is raised.

Since your other groups are not optional, you can simply test whether MatchResult.Groups.Count is large enough to include the Cancelled group. The entire match is stored as the group at index 0, so it is included in the Count, and Cancelled, the eighth named group, sits at index 8:

if MatchResult.Groups.Count > 8 then
  WriteLn('Cancelled: ', MatchResult.Groups['Cancelled'].Value)
else
  WriteLn('Cancelled: ');

Or:

Write('Cancelled: ');
if MatchResult.Groups.Count > 8 then
  Write(MatchResult.Groups['Cancelled'].Value);
WriteLn('');
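
The hard-coded index 8 only works while Cancelled is the last group and the only optional one. As a minimal sketch of a more general approach, assuming a missing group raises ERegularExpressionError as described above, the lookup can be wrapped in a small helper. GroupValueOrDefault, the program name and the shortened pattern are hypothetical and used only for illustration; they are not part of the RTL or of the original code.

```pascal
program OptionalGroupDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions, System.RegularExpressionsCore;

// Hypothetical helper (not part of the RTL): returns the value of a named
// group, or ADefault when TRegEx raises ERegularExpressionError because the
// group is not present in the match result.
function GroupValueOrDefault(const AMatch: TMatch; const AName: string;
  const ADefault: string = ''): string;
begin
  try
    Result := AMatch.Groups[AName].Value;
  except
    on ERegularExpressionError do
      Result := ADefault;
  end;
end;

var
  M: TMatch;
begin
  // Shortened pattern for illustration: Cancelled is optional and absent here.
  M := TRegEx.Match('97.95|', '(?<Amount>\d+\.\d+)\|(?<Cancelled>C)?$');
  if M.Success then
  begin
    WriteLn('Amount: ', GroupValueOrDefault(M, 'Amount'));
    WriteLn('Cancelled: ', GroupValueOrDefault(M, 'Cancelled', '(none)'));
  end;
end.
```

The trade-off is that a misspelled group name is also silently replaced by the default, so this is best kept for groups that are known to be optional. When running under the IDE debugger, the exception may still be reported before the helper handles it.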

Also, make sure the loop calls NextMatch() on every iteration. Without that call, MatchResult never advances and the code gets stuck in an endless loop.

while MatchResult.Success do
begin
  ...
  MatchResult := MatchResult.NextMatch; // advance to the next match
end;


Answer 2:

You could also avoid the optional group altogether by making the Cancelled group mandatory and letting it match either C or the empty string. Just change the last line of the regex to

'(?<Cancelled>C|)$'

For your test application, this wouldn't change the output. If you need to work further with cancelled you can simply check if it contains C or an empty string.

if MatchResult.Groups['Cancelled'].Value = 'C' then
  DoSomething;
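
A minimal sketch of this approach on a shortened record format, assuming that a group which matches the empty string is still reported in the result (which is what this answer relies on). The program name, the two sample lines and the reduced pattern are made up for illustration.

```pascal
program MandatoryGroupDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  // Hypothetical, shortened records: Check #|CheckAmount|Cancelled
  Lines: array[0..1] of string = ('201|123.45|', '103|97.95|C');

var
  Line: string;
  M: TMatch;
  IsCancelled: Boolean;
begin
  for Line in Lines do
  begin
    // (?<Cancelled>C|) always takes part in the match: it captures either
    // 'C' or the empty string.
    M := TRegEx.Match(Line,
      '^(?<Check>\d+)\|(?<Amount>\d+\.\d+)\|(?<Cancelled>C|)$');
    if not M.Success then
      Continue;
    IsCancelled := M.Groups['Cancelled'].Value = 'C';
    WriteLn('Check ', M.Groups['Check'].Value,
      ' cancelled: ', BoolToStr(IsCancelled, True));
  end;
end.
```

Converting the group to a Boolean once, right after matching, keeps the rest of the code free of string comparisons against 'C'.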