Match Optional Components in Any Order

Posted 2019-09-15 00:54

Question:

I have a URL that can contain any combination of parameters used in filtering results. The two params are type and sort. If type exists in the URL, it has to be either 'article', 'opinion', 'review', or 'media'. If sort exists in the URL, it has to be one of the following: date-desc, date-asc, views-desc, views-asc, comments-desc, comments-asc.

Right now, the expression I have only matches URLs with both type and sort. It doesn't match URLs that contain neither parameter. I would like the expression to match URLs without any parameters and URLs that have only one parameter. It also has to start with either nintendo, pc, playstation, or xbox.

Here are my example strings, each labeled with what my current expression does:

xbox/type:article/sort:date-desc  (match)
nintendo/type:media/sort:comments-asc  (match)
pc/sort:views-desc  (no match)
playstation/type:opinion/ (no match)
playstation/sort:views-asc (no match)
xbox/sort:views-asc/type:article (no match)
playstation/type:media/sort:views-asc (match)
xbox (no match)

All of the above combinations need to match. Here is my current expression:

(nintendo|pc|playstation|xbox)[\/]((type\:(article|opinion|review|media))[\/](sort\:(date-desc|date-asc|views-desc|views-asc|comments-desc|comments-asc)))

Here is the Regex101 link: http://regex101.com/r/eN0tJ5
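
The behavior described above can be reproduced with a short script. This is only a sketch in Python (the question does not name a language); the names CURRENT, SAMPLES, and matches are made up for illustration, and re.search is assumed as the matching mode.

```python
import re

# The question's expression, split across lines for readability.
CURRENT = re.compile(
    r"(nintendo|pc|playstation|xbox)[\/]"
    r"((type\:(article|opinion|review|media))[\/]"
    r"(sort\:(date-desc|date-asc|views-desc|views-asc|comments-desc|comments-asc)))"
)

# The example strings from the question.
SAMPLES = [
    "xbox/type:article/sort:date-desc",
    "nintendo/type:media/sort:comments-asc",
    "pc/sort:views-desc",
    "playstation/type:opinion/",
    "playstation/sort:views-asc",
    "xbox/sort:views-asc/type:article",
    "playstation/type:media/sort:views-asc",
    "xbox",
]

def matches(url):
    """True if the current expression is found anywhere in url."""
    return CURRENT.search(url) is not None

for sample in SAMPLES:
    print(sample, "(match)" if matches(sample) else "(no match)")
```

Only the three strings that contain type followed by sort match, because neither component is optional and the order is fixed.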

Answer 1:

You can suffix any atom with ? to make it optional, so you could end up with something like this:

(nintendo|pc|...)(/type:(article|media|...))?(/sort:(date|views|comments)-(asc|desc))?
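
As a rough sketch of this idea in Python, the alternatives below are filled in from the lists in the question. The names PATTERN and parse, the use of fullmatch, and the optional trailing slash are assumptions made for illustration, not part of the answer above.

```python
import re

# Each component is its own group, made optional with a trailing "?".
PLATFORM = r"(nintendo|pc|playstation|xbox)"
TYPE = r"(?:/type:(article|opinion|review|media))?"
SORT = r"(?:/sort:((?:date|views|comments)-(?:asc|desc)))?"

# Assumption: a single trailing slash is tolerated, as in "playstation/type:opinion/".
PATTERN = re.compile(PLATFORM + TYPE + SORT + r"/?")

def parse(url):
    """Return (platform, type, sort) or None if the whole URL does not match."""
    m = PATTERN.fullmatch(url)
    return m.groups() if m else None

print(parse("xbox"))                              # no parameters
print(parse("pc/sort:views-desc"))                # sort only
print(parse("playstation/type:opinion/"))         # type only
print(parse("xbox/sort:views-asc/type:article"))  # sort before type
```

Because the optional groups are written in sequence, type can only appear before sort. The last call returns None, so this form does not cover the sort-before-type example.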


Answer 2:

Capturing Two Optional Groups that Can Occur in Either Order

This is an interesting question because, as your samples show, type can happen before sort or vice-versa.

It sounds like you would like:

  1. Match everything
  2. Capture the type, if present
  3. Capture the sort, if present

(If this is not right, let me know so I can tweak the regex.)

Since type and sort can happen in any order, we'll use lookaheads to capture them:

(?m)^(?=(?:.*type:([^/\s]+))?)(?=(?:.*sort:([^/\s]+))?).*

The type will be captured by Group 1, and the sort will be captured by Group 2.

On the demo, look at the capture groups in the right pane.

Explanation

  • (?m) turns on multi-line mode, allowing ^ and $ to match on each line
  • The ^ anchor asserts that we are at the beginning of the string
  • The (?=(?:.*type:([^/\s]+))?) lookahead allows us to capture the type, if present. It does so by asserting that (?:.*type:([^/\s]+)) may be found zero or one time. That optional content is .* (any chars), then type:, then [^/\s]+ (any chars that are not a slash or a white-space character), i.e., the type, captured to Group 1 by the parentheses.
  • Likewise, the (?=(?:.*sort:([^/\s]+))?) lookahead allows us to capture the sort, if present.
  • .* matches the whole string as we want it anyway.
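
The explanation above can be tried out with a few lines of Python. This is a minimal sketch: the pattern is the one given in this answer, while the names PATTERN and capture and the choice of Python's re module are assumptions made for illustration.

```python
import re

# The answer's pattern, unchanged. (?m) lets ^ match at the start of every line.
PATTERN = re.compile(r"(?m)^(?=(?:.*type:([^/\s]+))?)(?=(?:.*sort:([^/\s]+))?).*")

def capture(url):
    """Return (type, sort); each is None when that component is absent."""
    m = PATTERN.match(url)
    return (m.group(1), m.group(2)) if m else (None, None)

print(capture("xbox/type:article/sort:date-desc"))  # type first
print(capture("xbox/sort:views-asc/type:article"))  # sort first
print(capture("pc/sort:views-desc"))                # sort only
print(capture("xbox"))                              # neither
```

Note that this pattern only captures. It does not check the leading nintendo, pc, playstation, or xbox segment, nor that the captured values come from the allowed lists, so a string such as zelda/type:banana still matches with Group 1 set to banana.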

Reference

  • Lookahead and Lookbehind Zero-Length Assertions
  • Mastering Lookahead and Lookbehind