Regex to catch these patterns recursively

Posted 2019-09-13 01:20

Question:

    java.util.regex.Pattern ips
        = java.util.regex.Pattern.compile("(\\d{1,3}(?:\\.\\d{1,3}){2}\\.(\\d{1,3}))(?:(?:-|\\s+to\\s+)(\\d{1,3}(?![\\d\\.]))|(?:-|\\s*to\\s+)(\\d{1,3}(?:\\.\\d{1,3}){3})|\\s+(25\\d(?:\\.\\d{1,3}){3})|\\s*\\/(\\d{1,3}))?");

Currently my regex accepts the following types of IP address input, but only one entry at a time:

  • ip: "47.1.2.3"
  • range: "47.1.2.3-4"
  • ip range: "47.1.2.3-47.1.2.4"
  • ip to range: "47.1.2.3 to 4"
  • ip to ip range: "47.1.2.3 to 47.1.2.4"
  • ip CIDR: "47.1.2.4/32"
  • ip Mask: "47.1.2.4 255.255.255.255"

I would like to modify my regex to accept combinations of these separated by a comma or space. Ideally the regex would have named capture groups as listed above to make handling easier.

I want the following to also be a valid input, but I want to be able to pull out the matches described above with named groups.

"47.1.2.3 to 4, 47.1.2.7, 47.1.3.9-47.1.3.19"

I'm attempting to use the regex to verify input into a text field. The following code is the textfield:

public class HostCollectionTextField extends JFormattedTextField implements CellEditor, MouseListener {

ArrayList listeners = new ArrayList();
HostCollection hc;
java.util.regex.Pattern ips
        = java.util.regex.Pattern.compile("(\\d{1,3}(?:\\.\\d{1,3}){2}\\.(\\d{1,3}))(?:(?:-|\\s+to\\s+)(\\d{1,3}(?![\\d\\.]))|(?:-|\\s*to\\s+)(\\d{1,3}(?:\\.\\d{1,3}){3})|\\s+(25\\d(?:\\.\\d{1,3}){3})|\\s*\\/(\\d{1,3}))?");

public HostCollectionTextField() {
    this.addMouseListener(this);
    this.hc = new HostCollection();

    this.setFormatterFactory(new AbstractFormatterFactory() {

        @Override
        public JFormattedTextField.AbstractFormatter getFormatter(JFormattedTextField tf) {
            RegexFormatter f = new RegexFormatter(ips);
            return f;
        }
    });
    this.getDocument().addDocumentListener(new DocListener(this));
    addActionListener(new ActionListener() {
        @Override
        public void actionPerformed(ActionEvent ae) {
            if (stopCellEditing()) {
                fireEditingStopped();
            }
        }
    });

}

//class methods....
}

This is the RegexFormatter Class:

public class RegexFormatter extends DefaultFormatter {

protected java.util.regex.Matcher matcher;

public RegexFormatter(java.util.regex.Pattern regex) {
    setOverwriteMode(false);
    matcher = regex.matcher(""); // create a Matcher for the regular expression
}

public Object stringToValue(String string) throws java.text.ParseException {
    if (string == null) {
        return null;
    }
    matcher.reset(string); // set 'string' as the matcher's input

    if (!matcher.matches()) // Does 'string' match the regular expression?
    {
        throw new java.text.ParseException("does not match regex", 0);
    }

    // If we get this far, then it did match.
    return super.stringToValue(string); // will honor the 'valueClass' property
}

}

Answer 1:

The IP entries are distinctive enough that matches should not overlap
when whitespace and/or commas are used as separators.

You probably need two versions of the same regex:
one to validate, one to extract.

The extraction regex is just your original regex used in a global match
(repeated Matcher.find() calls in Java). Run it after validation succeeds.
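As a rough sketch of the extraction step, the loop below applies the original regex repeatedly with Matcher.find(). The class name IpEntryExtractor and the method extract are made up for illustration; only the pattern string comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
final class IpEntryExtractor {

    // The single-entry regex from the question, split across lines for readability only.
    static final Pattern ENTRY = Pattern.compile(
        "(\\d{1,3}(?:\\.\\d{1,3}){2}\\.(\\d{1,3}))"
        + "(?:(?:-|\\s+to\\s+)(\\d{1,3}(?![\\d\\.]))"
        + "|(?:-|\\s*to\\s+)(\\d{1,3}(?:\\.\\d{1,3}){3})"
        + "|\\s+(25\\d(?:\\.\\d{1,3}){3})"
        + "|\\s*\\/(\\d{1,3}))?");

    // Collects every entry found in the input, in order of appearance.
    static List<String> extract(String input) {
        List<String> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {          // find() resumes where the previous match ended
            entries.add(m.group()); // group() is the whole matched entry
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String e : extract("47.1.2.3 to 4, 47.1.2.7, 47.1.3.9-47.1.3.19")) {
            System.out.println(e);
        }
    }
}
```

For the sample input from the question this should yield three entries: "47.1.2.3 to 4", "47.1.2.7" and "47.1.3.9-47.1.3.19". Since find() silently skips anything it cannot match, it is only safe to rely on after the whole input has been validated.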

The validation regex is below. It anchors the whole input with ^ and $ and
repeats the original regex (without capture groups) one or more times,
requiring the separator [\s,]+ between entries.

I'm not sure this will work with your validation code, but if entering a single IP entry works now, this should too.

Validation regex:

"^(?:\\d{1,3}(?:\\.\\d{1,3}){2}\\.\\d{1,3}(?:(?:-|\\s+to\\s+)\\d{1,3}(?![\\d\\.])|(?:-|\\s*to\\s+)\\d{1,3}(?:\\.\\d{1,3}){3}|\\s+25\\d(?:\\.\\d{1,3}){3}|\\s*\\/\\d{1,3})?(?:[\\s,]*$|[\\s,]+))+$"

Formatted:

 ^     
 (?:
      \d{1,3} 
      (?: \. \d{1,3} ){2}
      \.
      \d{1,3} 
      (?:
           (?: - | \s+ to \s+ )
           \d{1,3} 
           (?! [\d\.] )
        |  
           (?: - | \s* to \s+ )
           \d{1,3} 
           (?: \. \d{1,3} ){3}
        |  
           \s+ 
           25 \d 
           (?: \. \d{1,3} ){3}
        |  
           \s* \/
           \d{1,3} 
      )?

      (?:
           [\s,]* $ 
        |  
           [\s,]+  
      )
 )+
 $  
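A quick way to try the validation regex outside the text field is a small check with Matcher.matches(), which is the same call RegexFormatter makes. This is only a sketch: the class name IpListValidator and the method isValid are invented here, and the pattern is the validation regex above, unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
final class IpListValidator {

    // The validation regex from above, split across lines for readability only.
    static final Pattern LIST = Pattern.compile(
        "^(?:\\d{1,3}(?:\\.\\d{1,3}){2}\\.\\d{1,3}"
        + "(?:(?:-|\\s+to\\s+)\\d{1,3}(?![\\d\\.])"
        + "|(?:-|\\s*to\\s+)\\d{1,3}(?:\\.\\d{1,3}){3}"
        + "|\\s+25\\d(?:\\.\\d{1,3}){3}"
        + "|\\s*\\/\\d{1,3})?"
        + "(?:[\\s,]*$|[\\s,]+))+$");

    // True only if the entire input is a separator-delimited list of entries.
    static boolean isValid(String input) {
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("47.1.2.3 to 4, 47.1.2.7, 47.1.3.9-47.1.3.19"));
        System.out.println(isValid("47.1.2.3; 47.1.2.7"));
    }
}
```

With matches() the ^ and $ anchors are redundant, since the whole input has to match anyway, but they do no harm and keep the pattern usable with find() as well.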

Edit: added group names to the extraction regex.

 # "(?<IP>\\d{1,3}(?:\\.\\d{1,3}){2}\\.(?<FromSeg>\\d{1,3}))(?:(?:-|\\s+to\\s+)(?<ToSeg>\\d{1,3}(?![\\d\\.]))|(?:-|\\s*to\\s+)(?<ToRange>\\d{1,3}(?:\\.\\d{1,3}){3})|\\s+(?<Mask>25\\d(?:\\.\\d{1,3}){3})|\\s*/(?<Cidr>\\d{1,3}))?"

 (?<IP>                        # (1), IP
      \d{1,3} 
      (?: \. \d{1,3} ){2}
      \.
      (?<FromSeg> \d{1,3} )         # (2), From segment
 )
 (?:
      (?: - | \s+ to \s+ )
      (?<ToSeg>                     # (3), Dash/To segment
           \d{1,3} 
           (?! [\d\.] )
      )
   |  
      (?: - | \s* to \s+ )
      (?<ToRange>                   # (4), Dash/To range
           \d{1,3} 
           (?: \. \d{1,3} ){3}
      )
   |  
      \s+     
      (?<Mask>                      # (5), Mask
           25 \d 
           (?: \. \d{1,3} ){3}
      )
   |  
      \s* /     
      (?<Cidr>                      # (6), CIDR prefix length
           \d{1,3} 
      )
 )?
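
To show how the named groups might be read back in Java, here is a minimal sketch. It assumes underscore-free group names (IP, FromSeg, ToSeg, ToRange, Mask, Cidr), because java.util.regex only accepts letters and digits in a group name and throws a PatternSyntaxException otherwise. The class NamedIpGroups, the method describeAll and the output labels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
final class NamedIpGroups {

    // Extraction regex with Java-legal (alphanumeric) group names.
    static final Pattern ENTRY = Pattern.compile(
        "(?<IP>\\d{1,3}(?:\\.\\d{1,3}){2}\\.(?<FromSeg>\\d{1,3}))"
        + "(?:(?:-|\\s+to\\s+)(?<ToSeg>\\d{1,3}(?![\\d\\.]))"
        + "|(?:-|\\s*to\\s+)(?<ToRange>\\d{1,3}(?:\\.\\d{1,3}){3})"
        + "|\\s+(?<Mask>25\\d(?:\\.\\d{1,3}){3})"
        + "|\\s*/(?<Cidr>\\d{1,3}))?");

    // Labels each entry by whichever optional group took part in the match.
    static List<String> describeAll(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            String ip = m.group("IP");
            // group(name) returns null when that group did not participate.
            if (m.group("ToSeg") != null) {
                out.add("segment range: " + ip + " to " + m.group("ToSeg"));
            } else if (m.group("ToRange") != null) {
                out.add("ip range: " + ip + " to " + m.group("ToRange"));
            } else if (m.group("Mask") != null) {
                out.add("mask: " + ip + " " + m.group("Mask"));
            } else if (m.group("Cidr") != null) {
                out.add("cidr: " + ip + "/" + m.group("Cidr"));
            } else {
                out.add("single: " + ip);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeAll("47.1.2.3 to 4, 47.1.2.7, 47.1.3.9-47.1.3.19")) {
            System.out.println(line);
        }
    }
}
```

Checking the groups for null is what tells the entry types apart, since at most one of the four optional alternatives takes part in any single match.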