
Question:

Our customers can enter websites (domain names). They can also enter the e-mail addresses of their contacts.

Now we need to find the customers whose website domains can be associated with the domains of those e-mail addresses.

So my idea is to extract the host from the web address and from the e-mail address and compare them.

So what's the most reliable algorithm to get the hostname from a URL?

For example, the input can be any of:

foo.com
www.foo.com
http://foo.com
https://foo.com
https://www.foo.com

The result should always be foo.com
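The comparison step also needs the domain part of each contact's e-mail address. Below is a minimal JavaScript sketch, assuming the addresses are already syntactically valid and the website has already been reduced to a bare domain such as foo.com. The helper names emailDomain and matchesEmail are hypothetical, and counting a subdomain such as mail.foo.com as a match is a design choice you may not want.

```javascript
// Hypothetical helper: returns the lower-cased domain part of an e-mail
// address (everything after the last "@"), or null if there is no "@".
function emailDomain(address) {
  const at = address.lastIndexOf("@");
  if (at < 0) return null;
  return address.slice(at + 1).trim().toLowerCase();
}

// Hypothetical helper: true if the e-mail domain equals the customer's
// website domain, or is a subdomain of it (e.g. "mail.foo.com" vs "foo.com").
function matchesEmail(websiteDomain, address) {
  const domain = emailDomain(address);
  if (domain === null) return false;
  const site = websiteDomain.toLowerCase();
  return domain === site || domain.endsWith("." + site);
}

console.log(matchesEmail("foo.com", "alice@foo.com"));    // true
console.log(matchesEmail("foo.com", "bob@mail.foo.com")); // true
console.log(matchesEmail("foo.com", "carol@notfoo.com")); // false
```

The leading "." in the endsWith check is what keeps notfoo.com from matching foo.com.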

Answer 1:

Rather than relying on an unreliable regex, use System.Uri to do the parsing for you, with code like this:

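using System;

string uriStr = "www.foo.com";
// new Uri(...) rejects "www.foo.com" as-is, so prepend "http://" when no "://" is present.
if (!uriStr.Contains(Uri.SchemeDelimiter)) {
    uriStr = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, uriStr);
}
Uri uri = new Uri(uriStr);
string domain = uri.Host; // returns www.foo.com (no scheme, no port)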

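Note that GetLeftPart does not return just the domain; it returns the scheme plus the authority:

string leftPart = uri.GetLeftPart(UriPartial.Authority); // returns http://www.foo.com

To end up with foo.com you still need to remove a leading "www." from uri.Host.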
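For comparison, here is a minimal JavaScript sketch of the same parser-based idea. It uses the standard WHATWG URL class as a stand-in for System.Uri (this is the JavaScript API, not .NET), and because the parsed host is still www.foo.com it also strips a leading "www.". The function name hostWithoutWww is hypothetical.

```javascript
// Hypothetical helper: returns the host of a web address with a leading
// "www." removed, e.g. "https://www.foo.com" -> "foo.com".
function hostWithoutWww(input) {
  let s = input.trim();
  // As in the C# answer: assume http:// when no scheme delimiter is present.
  if (!s.includes("://")) {
    s = "http://" + s;
  }
  // hostname is lower-cased by the parser and never includes the port.
  const host = new URL(s).hostname;
  return host.startsWith("www.") ? host.slice(4) : host;
}

const inputs = ["foo.com", "www.foo.com", "http://foo.com",
                "https://foo.com", "https://www.foo.com"];
for (const input of inputs) {
  console.log(input, "->", hostWithoutWww(input)); // each maps to foo.com
}
```

Stripping only "www." is a heuristic: a host such as shop.foo.com or foo.co.uk is returned unchanged. Reducing every host to its registrable domain would need a Public Suffix List lookup, which this sketch does not attempt. new URL also throws a TypeError for input it cannot parse, so callers may want to catch that.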

Answer 2:

Here's a regular expression that matches the URLs you have provided. The http:// or https:// scheme is optional, as is the www. prefix; everything after that is captured up to an optional path:

var expression = /(https?:\/\/)?(www\.)?([^\/]*)(\/.*)?$/;

This means that

var result = 'https://www.foo.com.vu/blah'.replace(expression, '$3');

evaluates to

result === 'foo.com.vu'
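A quick way to check the expression is to run it over every input from the question. This JavaScript sketch reuses the pattern unchanged and also shows two limitations: a port number stays in the result because [^\/]* stops only at "/", and the www. match is case-sensitive.

```javascript
// The regular expression from the answer, unchanged.
const expression = /(https?:\/\/)?(www\.)?([^\/]*)(\/.*)?$/;

// Apply it to each input from the question, keeping capture group 3.
const inputs = ["foo.com", "www.foo.com", "http://foo.com",
                "https://foo.com", "https://www.foo.com"];
for (const input of inputs) {
  console.log(input.replace(expression, "$3")); // foo.com
}

// Limitations: the port is kept, and upper-case "WWW." is not stripped.
console.log("https://www.foo.com:8080/x".replace(expression, "$3")); // foo.com:8080
console.log("WWW.FOO.COM".replace(expression, "$3"));                // WWW.FOO.COM
```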

Answer 3:

There is already a URL parser in C# for extracting this information.

Here are some examples: http://www.stev.org/post/2011/06/27/C-HowTo-Parse-a-URL.aspx

Answer 4:

See the documentation linked below. Unlike the Authority property, the Host property does not include the port number.

http://msdn.microsoft.com/en-us/library/system.uri.host(v=vs.110).aspx
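For anyone doing this in JavaScript, the WHATWG URL API makes the same distinction under different names: host includes a non-default port (roughly like Authority), while hostname never does (like Host). A short illustration, assuming a runtime with the global URL class (modern browsers, Node.js 10 or later):

```javascript
const url = new URL("https://www.foo.com:8443/path");
console.log(url.host);     // "www.foo.com:8443" (includes the port)
console.log(url.hostname); // "www.foo.com"      (port excluded)
console.log(url.port);     // "8443"

// A default port (443 for https) is dropped during parsing,
// so host and hostname are then identical.
const plain = new URL("https://www.foo.com:443/");
console.log(plain.host);   // "www.foo.com"
```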

Parsing string for Domain / hostName
