I have a URL that can contain any combination of parameters used in filtering results. The two params are type and sort. If type exists in the URL, it has to be either 'article', 'opinion', 'review', or 'media'. If sort exists in the URL, it has to be one of the following: date-desc, date-asc, views-desc, views-asc, comments-desc, comments-asc.
Right now, the expression I have only matches URLs with both type and sort; it doesn't match URLs that contain neither parameter. I would like the expression to match URLs without any parameters and URLs that have only one parameter. The URL also has to start with either nintendo, pc, playstation, or xbox.
Here are my example strings, each marked with the result my current expression gives:
xbox/type:article/sort:date-desc (match)
nintendo/type:media/sort:comments-asc (match)
pc/sort:views-desc (no match)
playstation/type:opinion/ (no match)
playstation/sort:views-asc (no match)
xbox/sort:views-asc/type:article (no match)
playstation/type:media/sort:views-asc (match)
xbox (no match)
All of the above combinations need to match. Here is my current expression:
(nintendo|pc|playstation|xbox)[\/]((type\:(article|opinion|review|media))[\/](sort\:(date-desc|date-asc|views-desc|views-asc|comments-desc|comments-asc)))
Here is the Regex101 link:
http://regex101.com/r/eN0tJ5
You can suffix any atom with ? to make it optional, so you could end up with something like this:
(nintendo|pc|...)(/type:(article|media|...))?(/sort:(date|views|comments)-(asc|desc))?
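As a rough sketch of how that optional-group pattern might be used, here it is in Python with the elided alternatives filled in from the question. The `parse` helper, the optional trailing `/?`, the non-capturing outer groups, and the use of `fullmatch` are assumptions added for illustration, not part of the pattern above. Note that this form still expects type before sort, so the reversed-order example does not match.

```python
import re

# Alternatives are taken from the question. The trailing "/?" and the
# non-capturing wrappers are assumptions made for this sketch.
PATTERN = re.compile(
    r"(nintendo|pc|playstation|xbox)"
    r"(?:/type:(article|opinion|review|media))?"
    r"(?:/sort:((?:date|views|comments)-(?:asc|desc)))?"
    r"/?"
)


def parse(url):
    """Return (platform, type, sort), or None if the URL does not match.

    A missing optional parameter is reported as None.
    """
    m = PATTERN.fullmatch(url)  # the whole string must match
    return m.groups() if m else None


if __name__ == "__main__":
    for url in [
        "xbox",
        "pc/sort:views-desc",
        "playstation/type:opinion/",
        "xbox/type:article/sort:date-desc",
        "xbox/sort:views-asc/type:article",  # reversed order: not matched
    ]:
        print(url, "->", parse(url))
```

Using `fullmatch` rather than `search` keeps a string such as `xbox/type:banana` from being accepted on the strength of its `xbox` prefix alone.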
Capturing Two Optional Groups that Can Occur in Either Order
This is an interesting question because, as your samples show, type can happen before sort or vice versa.
It sounds like you would like:
- Match everything
- Capture the type, if present
- Capture the sort, if present
(If this is not right, let me know so I can tweak the regex.)
Since type and sort can happen in any order, we'll use lookaheads to capture them:
(?m)^(?=(?:.*type:([^/\s]+))?)(?=(?:.*sort:([^/\s]+))?).*
The type will be captured by Group 1, and the sort will be captured by Group 2.
On the demo, look at the capture groups in the right pane.
Explanation
- (?m) turns on multi-line mode, allowing ^ and $ to match on each line
- The ^ anchor asserts that we are at the beginning of a line (which is the beginning of the string when the input is a single line)
- The (?=(?:.*type:([^/\s]+))?) lookahead allows us to capture the type, if present. It does so by asserting that (?:.*type:([^/\s]+)) may be found zero or one time. That optional content is any chars .*, then type:, then [^/\s]+ any chars that are not a slash or a white-space character, i.e., the type, captured to Group 1 by the parentheses.
- Likewise, the (?=(?:.*sort:([^/\s]+))?) lookahead allows us to capture the sort, if present, to Group 2.
- .* matches the whole line, since we want the entire URL anyway.
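A minimal Python sketch of the lookahead pattern, using a few of the sample strings from the question. The `extract` helper is a hypothetical name introduced here. Python's `re` module accepts the leading `(?m)` flag and keeps captures made inside lookaheads, so the pattern is used unchanged.

```python
import re

# The lookahead pattern from the answer, unchanged.
PATTERN = re.compile(
    r"(?m)^(?=(?:.*type:([^/\s]+))?)(?=(?:.*sort:([^/\s]+))?).*"
)


def extract(url):
    """Return (type, sort) for one URL; a missing parameter is None.

    The pattern always matches, because both lookaheads are optional
    and .* can match an empty remainder.
    """
    m = PATTERN.match(url)
    return m.group(1), m.group(2)


if __name__ == "__main__":
    for url in [
        "xbox/type:article/sort:date-desc",
        "xbox/sort:views-asc/type:article",
        "playstation/type:opinion/",
        "xbox",
    ]:
        print(url, "->", extract(url))
```

This pattern captures whatever follows type: or sort: without checking it against the allowed values, and it does not check the platform prefix, so a string such as xbox/type:banana yields banana in Group 1. If validation is needed, the captured values could be compared against the allowed lists after matching.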
Reference
- Lookahead and Lookbehind Zero-Length Assertions
- Mastering Lookahead and Lookbehind