Get only URI segments from URL

Posted 2019-03-04 18:22

Question:

I am trying to get the URI segments using regular expression.

Example URI:

http://abc.com/hello/hi/bye?humm/ok=hi&ya=yaya/wow/waaah
               ^^^^^ ^^ ^^^                    ^^^ ^^^^^

I am trying:

/(?<=\/)[\w-]+(?=(\/|$|\r|\?))/g

But it's not working properly: segments inside the query string (wow, waaah) are still being matched.

So I tried the following, but then nothing matched at all:

/(?<!?.+)(?<=\/)[\w-]+(?=(\/|$|\r|\?))/g

What's wrong with this?

Answer 1:

You forgot to escape the second ? in the second regex (the one right after (?<!), so it is read as a quantifier instead of a literal question mark. It should read:

/(?<!\?.+)(?<=\/)[\w-]+(?=(\/|$|\r|\?))/g

Note: You could simplify the lookahead by using a character class, like so:

/(?<!\?.+)(?<=\/)[\w-]+(?=[/\r\n?]|$)/g
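One caveat: the variable-length lookbehind (?<!\?.+) is only accepted by some regex engines. As a quick check, assuming Python's built-in re module (the helper name compiles is made up for illustration), the pattern is rejected at compile time because re requires fixed-width lookbehind:

```python
import re

# The pattern from above, without the /.../g delimiters (Python has no regex literals).
PATTERN = r"(?<!\?.+)(?<=/)[\w-]+(?=[/\r\n?]|$)"


def compiles(pattern):
    """Hypothetical helper: report whether Python's re engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised here because ".+" inside a lookbehind has no fixed width.
        return False


print(compiles(PATTERN))  # False
```

Engines that allow variable-length lookbehind should accept the same pattern as written.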

EDIT:

For a lowest-common-denominator solution that works across the different regex flavours, you need a two-step process:

  • Strip the ? and everything after it (if present), along with the scheme and host:


    ^[^/]+//[^/]+([^?]+)

    Keep the string returned in capture group 1.

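  • Extract the URI segments by looping over every match of the following pattern in that string (the leading / is a literal slash, not a delimiter):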


    /([\w-]+)

    Each match returns one segment in capture group 1.
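The two steps above could be sketched roughly like this in Python. This is only an illustration, not the one true way: the function name uri_segments and the constant names are my own, and it assumes the URL starts with a scheme://host part.

```python
import re

# Step 1: skip "scheme://host" and capture everything up to the first "?".
PATH_RE = re.compile(r"^[^/]+//[^/]+([^?]+)")
# Step 2: a segment is a run of word characters or hyphens following a "/".
SEGMENT_RE = re.compile(r"/([\w-]+)")


def uri_segments(url):
    """Return the path segments of url, ignoring the query string."""
    match = PATH_RE.match(url)
    if match is None:
        # Not in scheme://host/... form, so there is nothing to extract.
        return []
    # findall returns the contents of capture group 1 for every match.
    return SEGMENT_RE.findall(match.group(1))


print(uri_segments("http://abc.com/hello/hi/bye?humm/ok=hi&ya=yaya/wow/waaah"))
# ['hello', 'hi', 'bye']
```

Since neither pattern uses lookbehind, the same two regexes should carry over to most other engines unchanged.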



Tags: regex url