Get specific subdomain from URL in foo.bar.car.com

2020-01-31 07:08发布

问题:

Given a URL as follows:

foo.bar.car.com.au

I need to extract foo.bar.

I came across the following code :

private static string GetSubDomain(Uri url)
{
    if (url.HostNameType == UriHostNameType.Dns)
    {
        string host = url.Host;
        if (host.Split('.').Length > 2)
        {
            int lastIndex = host.LastIndexOf(".");
            int index = host.LastIndexOf(".", lastIndex - 1);
            return host.Substring(0, index);
        }
    }         
    return null;     
}

This gives me like foo.bar.car. I want foo.bar. Should i just use split and take 0 and 1?

But then there is possible wwww.

Is there an easy way for this?

回答1:

Given your requirement (you want the 1st two levels, not including 'www.') I'd approach it something like this:

private static string GetSubDomain(Uri url)
{

    if (url.HostNameType == UriHostNameType.Dns)
    {

        string host = url.Host;

        var nodes = host.Split('.');
        int startNode = 0;
        if(nodes[0] == "www") startNode = 1;

        return string.Format("{0}.{1}", nodes[startNode], nodes[startNode + 1]);

    }

    return null; 
}


回答2:

You can use the following nuget package Nager.PublicSuffix. It uses the PUBLIC SUFFIX LIST from Mozilla to split the domain.

PM> Install-Package Nager.PublicSuffix

Example

 var domainParser = new DomainParser();
 var data = await domainParser.LoadDataAsync();
 var tldRules = domainParser.ParseRules(data);
 domainParser.AddRules(tldRules);

 var domainName = domainParser.Get("sub.test.co.uk");
 //domainName.Domain = "test";
 //domainName.Hostname = "sub.test.co.uk";
 //domainName.RegistrableDomain = "test.co.uk";
 //domainName.SubDomain = "sub";
 //domainName.TLD = "co.uk";


回答3:

I faced a similar problem and, based on the preceding answers, wrote this extension method. Most importantly, it takes a parameter that defines the "root" domain, i.e. whatever the consumer of the method considers to be the root. In the OP's case, the call would be

Uri uri = "foo.bar.car.com.au";
uri.DnsSafeHost.GetSubdomain("car.com.au"); // returns foo.bar
uri.DnsSafeHost.GetSubdomain(); // returns foo.bar.car

Here's the extension method:

/// <summary>Gets the subdomain portion of a url, given a known "root" domain</summary>
public static string GetSubdomain(this string url, string domain = null)
{
  var subdomain = url;
  if(subdomain != null)
  {
    if(domain == null)
    {
      // Since we were not provided with a known domain, assume that second-to-last period divides the subdomain from the domain.
      var nodes = url.Split('.');
      var lastNodeIndex = nodes.Length - 1;
      if(lastNodeIndex > 0)
        domain = nodes[lastNodeIndex-1] + "." + nodes[lastNodeIndex];
    }

    // Verify that what we think is the domain is truly the ending of the hostname... otherwise we're hooped.
    if (!subdomain.EndsWith(domain))
      throw new ArgumentException("Site was not loaded from the expected domain");

    // Quash the domain portion, which should leave us with the subdomain and a trailing dot IF there is a subdomain.
    subdomain = subdomain.Replace(domain, "");
    // Check if we have anything left.  If we don't, there was no subdomain, the request was directly to the root domain:
    if (string.IsNullOrWhiteSpace(subdomain))
      return null;

    // Quash any trailing periods
    subdomain = subdomain.TrimEnd(new[] {'.'});
  }

  return subdomain;
}


回答4:

private static string GetSubDomain(Uri url)
{
    if (url.HostNameType == UriHostNameType.Dns)
    {

        string host = url.Host;   
        String[] subDomains = host.Split('.');
        return subDomains[0] + "." + subDomains[1];
     }
    return null; 
}


回答5:

OK, first. Are you specifically looking in 'com.au', or are these general Internet domain names? Because if it's the latter, there is simply no automatic way to determine how much of the domain is a "site" or "zone" or whatever and how much is an individual "host" or other record within that zone.

If you need to be able to figure that out from an arbitrary domain name, you will want to grab the list of TLDs from the Mozilla Public Suffix project (http://publicsuffix.org) and use their algorithm to find the TLD in your domain name. Then you can assume that the portion you want ends with the last label immediately before the TLD.



回答6:

I would recommend using Regular Expression. The following code snippet should extract what you are looking for...

string input = "foo.bar.car.com.au";
var match = Regex.Match(input, @"^\w*\.\w*\.\w*");
var output = match.Value;


回答7:

In addition to the NuGet Nager.PubilcSuffix package specified in this answer, there is also the NuGet Louw.PublicSuffix package, which according to its GitHub project page is a .Net Core Library that parses Public Suffix, and is based on the Nager.PublicSuffix project, with the following changes:

  • Ported to .NET Core Library.
  • Fixed library so it passes ALL the comprehensive tests.
  • Refactored classes to split functionality into smaller focused classes.
  • Made classes immutable. Thus DomainParser can be used as singleton and is thread safe.
  • Added WebTldRuleProvider and FileTldRuleProvider.
  • Added functionality to know if Rule was a ICANN or Private domain rule.
  • Use async programming model

The page also states that many of above changes were submitted back to original Nager.PublicSuffix project.



标签: c# url