Matching loosely formed URLs using regex and PHP?

I'm trying to identify URLs in a body of text. However, I would like to be able to identify loosely formed URLs such as:

example.com
www.example.com

I'm not very good at regex :(

I found the pattern below, but unfortunately it requires a scheme or a leading www., so it misses bare domains such as example.com.

/(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]*)([[:alnum:]#?\/&=])/i

Would it be possible to match a whole string (no spaces) that includes .com, .net, .org, etc.?

Thanks

Tags: php regex url

3 Answers

一夜七次

#2 · 2019-07-23 13:23

To match any run of non-whitespace characters that ends in ".com", ".net" or ".org":

/[^\s]+\.(?:com|net|org)\b/i

Explanation:

/ = Start of a Regular Expression
[^\s] = Not (^) a whitespace (\s) character
+ = One or more of the preceding set (non-whitespace characters)
\. = A literal dot. An unescaped dot is a special character in a RegExp (it matches almost any character)
(?: ... ) = A group, but not one to be stored
com|net|org = com OR net OR org (You can add more here, separated by "|")
\b = A word boundary - the end of a word
/ = End of the Regular Expression (apart from optional flags)
i = Insensitive to case
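
For use from PHP, the pattern above can be passed to preg_match_all to collect every match in a block of text. This is only a minimal sketch; the sample text and variable names are made up for illustration and are not from the question.

```php
<?php
// Pattern from the answer above: a run of non-whitespace characters
// ending in .com, .net or .org (case-insensitive).
$pattern = '/[^\s]+\.(?:com|net|org)\b/i';

// Hypothetical sample text, not taken from the original question.
$text = 'Visit example.com or www.example.org today, not example.xyz.';

// $matches[0] receives the list of full matches.
preg_match_all($pattern, $text, $matches);

print_r($matches[0]); // example.com and www.example.org
```

Because [^\s]+ takes everything in front of the dot, a scheme or leading punctuation attached to the domain ends up inside the match, while anything after the TLD (such as a path) is left out.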

Extension of Answer

At the request of the OP, below is a (rough) RegExp which should match a URL for a domain ending in one of the specified strings, with one or more key=value pairs in the query string.

/[^\s]+\.(?:com|net|org)[^\s]+\?[^\s]+=[^\s]+(?:\&?[^\s]+=[^\s]+)*\b/i

/ = Start of a Regular Expression
[^\s]+\.(?:com|net|org) = As before
[^\s]+ = One or more non-whitespace characters (this would be any folder or file names)
\? = A question mark. It has a \ before it so that it is treated as a normal character; otherwise it has a special meaning here
[^\s]+=[^\s]+ = One or more non-whitespaces, then an equals sign, then one or more non-whitespaces
(?:\&?[^\s]+=[^\s]+)* = Zero or more sets of an optional ampersand &, then one or more non-whitespaces, an equals sign, and one or more non-whitespaces
\b = A word boundary - the end of a word
/ = End of the Regular Expression
i = Insensitive to case

NOTE: This does not check that URLs are completely valid, nor does it allow for the multitude of country-code domains (like '.com.au' for Australia) or other top-level domains (like '.edu'). But it will match the example string provided, twitter.com/example?var=true
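
As a rough check of the extended pattern in PHP, the sketch below runs it over a sentence containing the example string twitter.com/example?var=true. The surrounding sentence and the variable names are assumptions added for illustration.

```php
<?php
// Extended pattern from the answer above: domain, then a path,
// then a query string with at least one key=value pair.
$pattern = '/[^\s]+\.(?:com|net|org)[^\s]+\?[^\s]+=[^\s]+(?:\&?[^\s]+=[^\s]+)*\b/i';

// Hypothetical sample text built around the example string from the answer.
$text = 'See twitter.com/example?var=true for details, but not twitter.com/example alone.';

preg_match_all($pattern, $text, $matches);

print_r($matches[0]); // only twitter.com/example?var=true

// The pattern expects at least one character between the TLD and the "?",
// so a query string attached directly to the domain is not matched.
$bare = preg_match($pattern, 'example.com?x=1');
var_dump($bare); // int(0)
```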


冷血范

#3 · 2019-07-23 13:28

The risk of false positives is there, but minimal. So you can indeed use something like:

/\b(([-\w]{2,}\.)+(com|net|org|info)|www(\.\w{3,})+\.\w{2,6})\b/i

The first half is for ordinary .com/.net domains; the second matches anything with a www. prefix. It gets more difficult if you also want to detect these domain names alongside full http:// URLs.
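
To see the two halves of this pattern at work, here is a small PHP sketch. The sample text is invented for illustration: one domain is caught by the TLD list, one only by its www. prefix, and one file name is skipped.

```php
<?php
// Pattern from the answer above: either a domain ending in a listed TLD,
// or anything that starts with "www." and ends in a 2-6 character suffix.
$pattern = '/\b(([-\w]{2,}\.)+(com|net|org|info)|www(\.\w{3,})+\.\w{2,6})\b/i';

// Hypothetical sample text, not taken from the original question.
$text = 'Try sub.example.info or www.example.de, but not readme.txt.';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches; the other indexes hold the
// capture groups of the pattern and are not needed here.
print_r($matches[0]); // sub.example.info and www.example.de
```

Here sub.example.info matches through the first half, while www.example.de is picked up only by the second half because .de is not in the TLD list.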


We Are One

#4 · 2019-07-23 13:28

~(?:https?://)?(?:[-\w]+\.)+[a-z]{2,6}[^\s]*~

Regex@Rubular
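
This answer gives no explanation, so here is a small PHP sketch of how the pattern behaves. The sample text and variable names are assumptions for illustration. Note that the pattern uses ~ as its delimiter, which avoids escaping the slashes in the optional scheme, and that it has no i flag, so the TLD part only matches lowercase letters.

```php
<?php
// Pattern from the answer above: optional http(s) scheme, one or more
// dot-separated labels, a 2-6 letter suffix, then any trailing non-whitespace.
$pattern = '~(?:https?://)?(?:[-\w]+\.)+[a-z]{2,6}[^\s]*~';

// Hypothetical sample text, not taken from the original question.
$text = 'Links: http://example.com/path?x=1 and www.example.org plus example.net here';

preg_match_all($pattern, $text, $matches);

print_r($matches[0]); // the three URLs, with and without a scheme

// Any 2-6 letter suffix is accepted, so file names also match.
$fileHit = preg_match($pattern, 'readme.txt');

// The trailing [^\s]* also swallows punctuation that follows the URL.
preg_match($pattern, 'Go to example.com.', $trailing);
```

This is the most permissive of the three patterns: it handles full URLs and bare domains alike, at the cost of more false positives such as file names, and matches may need trailing punctuation trimmed afterwards.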
