Parsing regex with alternatives and optionals

Posted 2019-07-13 03:18

Question:

I'm building a chatbot that supports a subset of RiveScript, and I'm trying to build the pattern-matching parser with regular expressions. Which three regexes match the following three examples?

ex1: I am * years old
valid match:
- "I am 24 years old"
invalid match:
- "I am years old"

ex2: what color is [my|your|his|her] (bright red|blue|green|lemon chiffon) *
valid matches:
- "what color is lemon chiffon car"
- "what color is my some random text till the end of string"

ex3: [*] told me to say *
valid matches:
- "Bob and Alice told me to say hallelujah"
- "told me to say by nobody"

The wildcards mean any text that is not empty is acceptable.

In example 2, anything between [ ] is optional, anything between ( ) is alternative, each option or alternative is separated by a |.

In example 3, the [*] is an optional wildcard, meaning blank text can be accepted.

Answer 1:

  1. https://regex101.com/r/CuZuMi/4

    I am (?:.+) years old
    
  2. https://regex101.com/r/CuZuMi/2

    what color is(?: (?:my|your|his|her))?(?: (?:bright red|blue|green|lemon chiffon))? .+
    
  3. https://regex101.com/r/CuZuMi/3

    (?:.+ )?told me to say .+
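
As a quick check, the sketch below runs three patterns against the question's examples with Python's `re.fullmatch`. The patterns are one strict reading of the triggers and are not necessarily the exact ones saved at the regex101 links: every required wildcard uses `.+`, optional parts carry their own space, and the colour group is left optional because the question's second valid example contains no colour. The names `PATTERNS` and `matches` are made up for illustration.

```python
import re

# One strict reading of the three triggers (an assumption, see the note above).
PATTERNS = {
    "ex1": r"I am (?:.+) years old",
    "ex2": (
        r"what color is(?: (?:my|your|his|her))?"
        r"(?: (?:bright red|blue|green|lemon chiffon))? .+"
    ),
    "ex3": r"(?:.+ )?told me to say .+",
}


def matches(name: str, text: str) -> bool:
    """Return True if the whole text matches the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None


EXAMPLES = [
    ("ex1", "I am 24 years old"),
    ("ex1", "I am years old"),
    ("ex2", "what color is lemon chiffon car"),
    ("ex2", "what color is my some random text till the end of string"),
    ("ex3", "Bob and Alice told me to say hallelujah"),
    ("ex3", "told me to say by nobody"),
]

for name, text in EXAMPLES:
    print(f"{name}: {text!r} -> {matches(name, text)}")
```

Using `re.fullmatch` rather than `re.search` means the pattern has to cover the whole message, which plays the role of the `^` and `$` anchors.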
    

I am using mostly 2 things:

  1. (?:) non-capturing groups, which group things together the way parentheses do in math, without storing what they matched.
  2. .* matches any character (except a newline) 0 or more times. The * can be replaced by a counted quantifier such as {1,3} to match between 1 and 3 times.
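
To make the two building blocks concrete, here is a small Python sketch. The sample sentences are shortened versions of the question's examples, and the variable names are only illustrative.

```python
import re

text = "what color is my car"

# A capturing group stores what it matched; a non-capturing group only groups.
capturing = re.fullmatch(r"what color is (my|your) car", text)
grouping = re.fullmatch(r"what color is (?:my|your) car", text)
print(capturing.groups())  # ('my',)
print(grouping.groups())   # ()

# Without any group, "|" splits the whole pattern in two:
# "what color is my" OR "your car", so the full sentence no longer matches.
ungrouped = re.fullmatch(r"what color is my|your car", text)
print(ungrouped)  # None

# ".{1,3}" accepts between 1 and 3 characters: "24" fits, "2019" does not.
short_ok = re.fullmatch(r"I am .{1,3} years old", "I am 24 years old") is not None
long_ok = re.fullmatch(r"I am .{1,3} years old", "I am 2019 years old") is not None
print(short_ok, long_ok)  # True False
```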

You can replace * with + to require at least 1 character instead of 0. A ? placed after a non-capturing group makes that whole group optional.
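
The difference between the two quantifiers, and the effect of the trailing ?, can be seen in a short Python sketch. The test strings are assumptions chosen to expose the edge cases; note that the first one contains two consecutive spaces, that is, an empty wildcard.

```python
import re

empty_wildcard = "I am  years old"  # two spaces: nothing where the age should be

# ".*" accepts the empty wildcard, ".+" rejects it.
star_ok = re.fullmatch(r"I am .* years old", empty_wildcard) is not None
plus_ok = re.fullmatch(r"I am .+ years old", empty_wildcard) is not None
print(star_ok, plus_ok)  # True False

# A "?" after the group makes the possessive optional. The space sits inside
# the group so that leaving the word out does not leave a double space behind.
optional = r"what color is(?: (?:my|your|his|her))? car"
with_word = re.fullmatch(optional, "what color is my car") is not None
without_word = re.fullmatch(optional, "what color is car") is not None
other_word = re.fullmatch(optional, "what color is their car") is not None
print(with_word, without_word, other_word)  # True True False
```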


These are good places to start:

  1. http://www.rexegg.com/regex-quickstart.html
  2. https://regexone.com/
  3. http://www.regular-expressions.info/quickstart.html
  4. Reference - What does this regex mean?