What is the fastest way to get the domain/host nam

Posted 2019-02-02 13:24

I need to go through a large list of URL strings and extract the domain name from each of them.

For example:

http://www.stackoverflow.com/questions would extract www.stackoverflow.com

I was originally using new URL(theUrlString).getHost(), but initializing the URL object adds a lot of time to the process and seems unnecessary.

Is there a faster method to extract the host name that would be as reliable?

Thanks

Edit: My mistake; yes, the www. would be included in the domain name in the example above. Also, these URLs may be http or https.

Tags: java url dns
7 answers
迷人小祖宗
#2 · 2019-02-02 14:16

If you want to handle https etc, I suggest you do something like this:

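// Assumes url contains "//" (e.g. after http: or https:).
int slashslash = url.indexOf("//") + 2;
int end = url.indexOf('/', slashslash);
String domain = url.substring(slashslash, end >= 0 ? end : url.length());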

Note that this includes the www part (just as URL.getHost() would), which is actually part of the domain name.
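
As a quick sanity check, the index-based extraction can be compared against URL.getHost() on a few sample inputs. This is only a sketch: the class name HostCheck, the helper name fastHost, and the sample URLs are made up for illustration, and it assumes every input contains "//". The URL object is built via URI.toURL() here rather than new URL(...), since that constructor is deprecated in recent Java versions.

```java
import java.net.MalformedURLException;
import java.net.URI;

public class HostCheck {

    // Index-based extraction; assumes the input contains "//" after the scheme.
    public static String fastHost(String url) {
        int start = url.indexOf("//") + 2;
        int end = url.indexOf('/', start);
        // No path after the host: take the rest of the string.
        return url.substring(start, end >= 0 ? end : url.length());
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical sample inputs, not taken from the question's data.
        String[] samples = {
            "http://www.stackoverflow.com/questions",
            "https://www.stackoverflow.com",
            "https://example.org/a/b?c=d"
        };
        for (String s : samples) {
            String expected = URI.create(s).toURL().getHost();
            String actual = fastHost(s);
            System.out.println(s + " -> " + actual
                    + " (same as URL.getHost(): " + expected.equals(actual) + ")");
        }
    }
}
```

Running a check like this over a slice of your real list is probably the easiest way to find out whether the shortcut is reliable enough for your data before dropping URL entirely.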

Edit (requested via comments):

Here are two methods that might be helpful:

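/**
 * Takes a URL such as http://www.stackoverflow.com and returns www.stackoverflow.com.
 * Any port (e.g. :8080) is dropped.
 *
 * @param url the URL string; may be null or empty
 * @return the host part of the URL, or "" if url is null or empty
 */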
public static String getHost(String url){
    if(url == null || url.length() == 0)
        return "";

    int doubleslash = url.indexOf("//");
    if(doubleslash == -1)
        doubleslash = 0;
    else
        doubleslash += 2;

    int end = url.indexOf('/', doubleslash);
    end = end >= 0 ? end : url.length();

    int port = url.indexOf(':', doubleslash);
    end = (port > 0 && port < end) ? port : end;

    return url.substring(doubleslash, end);
}


/**  Based on : http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/2.3.3_r1/android/webkit/CookieManager.java#CookieManager.getBaseDomain%28java.lang.String%29
 * Gets the base domain for a given host or URL, e.g. mail.google.com returns google.com.
 * Only the last two labels are kept, so a host such as www.bbc.co.uk returns co.uk.
 * @param url a host name or a full URL
 * @return the last two dot-separated labels of the host
 */
public static String getBaseDomain(String url) {
    String host = getHost(url);

    int startIndex = 0;
    int nextIndex = host.indexOf('.');
    int lastIndex = host.lastIndexOf('.');
    while (nextIndex < lastIndex) {
        startIndex = nextIndex + 1;
        nextIndex = host.indexOf('.', startIndex);
    }
    if (startIndex > 0) {
        return host.substring(startIndex);
    } else {
        return host;
    }
}
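
One caveat with getHost above: it only looks for '/' and ':', so a URL with user info (http://user:pw@example.com/) yields "user", and a query placed directly after the host (http://example.com?x=1) is returned as part of the host. If your list can contain such URLs, something along these lines might help. It is a sketch only: the names HostExtractor and getHostStrict are hypothetical, the sample URLs are invented, and IPv6 literals such as [::1] are not handled.

```java
public class HostExtractor {

    /**
     * Hypothetical variant of getHost: also stops at '?' and '#',
     * and skips any "user:password@" prefix. IPv6 literals are NOT handled.
     */
    public static String getHostStrict(String url) {
        if (url == null || url.isEmpty()) {
            return "";
        }

        int start = url.indexOf("//");
        start = (start == -1) ? 0 : start + 2;

        // The authority ends at the first '/', '?' or '#', or at the end of the string.
        int end = url.length();
        for (int i = start; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                end = i;
                break;
            }
        }

        // Skip user info, if any: everything up to the last '@' inside the authority.
        int at = url.lastIndexOf('@', end - 1);
        if (at >= start) {
            start = at + 1;
        }

        // Drop the port, if any.
        int colon = url.indexOf(':', start);
        if (colon != -1 && colon < end) {
            end = colon;
        }

        return url.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(getHostStrict("http://www.stackoverflow.com/questions"));    // www.stackoverflow.com
        System.out.println(getHostStrict("https://user:secret@example.com:8443/path")); // example.com
        System.out.println(getHostStrict("http://example.com?x=1"));                    // example.com
    }
}
```

It is still a single pass over the string with no object allocation beyond the final substring, so it should stay close to the speed of the simpler version.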