Java: how can I find out whether a URL uses http or https?

Posted 2019-08-12 01:06

I am writing a web crawler tool in Java. When I enter a website name, how can I make it connect to that site over http or https without specifying the protocol myself?

try {
   Jsoup.connect("google.com").get();
} catch (IOException ex) {
   Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}

But I get the error:

java.lang.IllegalArgumentException: Malformed URL: google.com

What can I do? Are there any classes or libraries that do this?

For context: I have a list of 165 courses, each with 65 to 71 HTML pages that contain links throughout. I am writing a Java program to test whether each link is broken.

1 answer
地球回转人心会变
#2 · 2019-08-12 01:23

You can write your own simple method to try both protocols, like:

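import java.io.IOException;
import org.jsoup.Jsoup;

/** Returns false if plain http worked, true if only https worked. */
static boolean usesHttps(final String urlWithoutProtocol) throws IOException {
    try {
        // Try plain http first; any IOException (including an HTTP error status) falls through.
        Jsoup.connect("http://" + urlWithoutProtocol).get();
        return false;
    } catch (final IOException e) {
        // http failed, so try https; if this also fails, the IOException propagates to the caller.
        Jsoup.connect("https://" + urlWithoutProtocol).get();
        return true;
    }
}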

Then, your original code can be:

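try {
    // Probe once, then build the full URL with the protocol that worked.
    final boolean shouldUseHttps = usesHttps("google.com");
    final String url = (shouldUseHttps ? "https://" : "http://") + "google.com";
    Jsoup.connect(url).get();
} catch (final IOException ex) {
    Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}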

Note: call usesHttps() only once per URL, to figure out which protocol to use. After you know that, connect using Jsoup.connect() directly. This is more efficient, because every call to usesHttps() makes one or two extra requests.
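
One way to follow this advice is to remember the protocol that worked for each site. The following is only a minimal sketch under some assumptions: SchemeResolver, resolve() and the probe parameter are hypothetical names that are not part of Jsoup, and the probe is passed in as a plain Predicate so that the example does not depend on any library. It tries http first, in the same order as usesHttps() above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch: remembers which protocol worked for each site so it is probed only once. */
final class SchemeResolver {
    // Hypothetical probe: returns true if the full URL could be fetched.
    private final Predicate<String> probe;
    // Site name -> "http" or "https". Plain HashMap, so this sketch is not thread-safe.
    private final Map<String, String> schemeBySite = new HashMap<>();

    SchemeResolver(Predicate<String> probe) {
        this.probe = probe;
    }

    /** Returns the site prefixed with the protocol that worked, probing at most once per site. */
    String resolve(String siteWithoutProtocol) {
        String scheme = schemeBySite.get(siteWithoutProtocol);
        if (scheme == null) {
            if (probe.test("http://" + siteWithoutProtocol)) {
                scheme = "http";
            } else if (probe.test("https://" + siteWithoutProtocol)) {
                scheme = "https";
            } else {
                // Neither protocol worked; nothing is cached, so a later call probes again.
                throw new IllegalStateException("Unreachable over http and https: " + siteWithoutProtocol);
            }
            schemeBySite.put(siteWithoutProtocol, scheme);
        }
        return scheme + "://" + siteWithoutProtocol;
    }
}
```

In the crawler, the probe could be a small lambda that calls Jsoup.connect(url).get() and returns false when an IOException is thrown. The string returned by resolve() can then be passed to Jsoup.connect() directly for every later request to the same site.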
