C# regex pattern to extract URLs from a given string - not full HTML URLs but bare links as well

Posted 2019-05-14 01:54

I need a regular expression that does the following:

Extract all strings which start with http://
Extract all strings which start with www.

So I need to extract both of these.

For example, given the string below:

house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue

From the string above I would get:

www.monstermmorpg.com
http://www.monstermmorpg.com
http://www.monstermmorpg.commerged

I'm looking for a regex or another way to do it. Thanks.

C# 4.0

Answer 1:

You can write a fairly simple regular expression to handle this, or go with a more traditional string split + LINQ approach.

Regex

var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach(Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value);

Pattern explanation:

\b        - matches a word boundary: the position between a word character and a non-word character (space, period, etc.)
(?:       - begins a group; the ?: means the contents of the group are not captured
https?:// - matches http:// or https:// (the '?' after the "s" makes it optional)
|         - OR
www\.     - matches the literal string www. (the \. means a literal ".")
)         - ends the group
\S+       - matches one or more non-whitespace characters
\b        - matches the closing word boundary, so the match ends on a word character

Basically, the pattern looks for strings that start with http:// OR https:// OR www. (that is the (?:https?://|www\.) part) and then matches all characters up to the next whitespace.

Traditional string option
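To try the pattern outside a Windows Forms project, it can be wrapped in a small console program. This is only a sketch: the `LinkExtractor` and `LinkExtractorDemo` class names and the `ExtractLinks` method are made up for illustration, and the second sample string is an assumption added to show that the closing `\b` leaves a trailing period out of the match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class LinkExtractor
{
    // Same pattern as in the answer above.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns every match of the pattern, in order of appearance.
    public static List<string> ExtractLinks(string text)
    {
        var links = new List<string>();
        foreach (Match m in LinkParser.Matches(text))
            links.Add(m.Value);
        return links;
    }
}

public static class LinkExtractorDemo
{
    public static void Main()
    {
        var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";

        // Prints the three links from the question, one per line.
        foreach (var link in LinkExtractor.ExtractLinks(rawString))
            Console.WriteLine(link);

        // Assumed extra sample: the final "." is not part of the match,
        // because the pattern has to end on a word boundary.
        foreach (var link in LinkExtractor.ExtractLinks("See http://example.com/page."))
            Console.WriteLine(link); // http://example.com/page
    }
}
```

Returning a list instead of showing message boxes keeps the extraction separate from how the results are displayed.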

var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
    MessageBox.Show(s);
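
Note that the split version does not behave exactly like the regex: `StartsWith(string)` is case-sensitive, while the regex uses `RegexOptions.IgnoreCase`, and a split token keeps any trailing punctuation. A possible variant that at least matches the regex on case is sketched below; the `SplitLinkExtractor` class, its `Prefixes` field, and the sample inputs in the test are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; names are illustrative only.
public static class SplitLinkExtractor
{
    private static readonly string[] Prefixes = { "http://", "https://", "www." };

    // Splits on tab, newline and space, then keeps the tokens that
    // start with one of the prefixes, ignoring case.
    public static List<string> ExtractLinks(string text)
    {
        return text
            .Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
            .Where(token => Prefixes.Any(
                p => token.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }
}
```

Unlike the regex, this still returns a token such as "http://example.com." with its final period, so trailing punctuation would need to be trimmed separately if that matters.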


Source: C# regex pattern to extract urls from given string - not full html urls but bare links as well