Java - How to find the redirected url of a url?

2019-01-01 03:45发布

I am accessing web pages through java as follows:

URLConnection con = url.openConnection();

But in some cases, a url redirects to another url. So I want to know the url to which the previous url redirected.

Below are the header fields that I got as a response:

null-->[HTTP/1.1 200 OK]
Cache-control-->[public,max-age=3600]
last-modified-->[Sat, 17 Apr 2010 13:45:35 GMT]
Transfer-Encoding-->[chunked]
Date-->[Sat, 17 Apr 2010 13:45:35 GMT]
Vary-->[Accept-Encoding]
Expires-->[Sat, 17 Apr 2010 14:45:35 GMT]
Set-Cookie-->[cl_def_hp=copenhagen; domain=.craigslist.org; path=/; expires=Sun, 17     Apr 2011 13:45:35 GMT, cl_def_lang=en; domain=.craigslist.org; path=/; expires=Sun, 17 Apr 2011 13:45:35 GMT]
Connection-->[close]
Content-Type-->[text/html; charset=iso-8859-1;]
Server-->[Apache]

So at present, I am constructing the redirected url from the value of the Set-Cookie header field. In the above case, the redirected url is copenhagen.craigslist.org

Is there any standard way through which I can determine which url the particular url is going to redirect.

I know that when a url redirects to other url, the server sends an intermediate response containing a Location header field that tells the redirected url but I am not receiving that intermediate response through the url.openConnection(); method.

6条回答
谁念西风独自凉
2楼-- · 2019-01-01 04:08
public static URL getFinalURL(URL url) {
    try {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setInstanceFollowRedirects(false);
        con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36");
        con.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
        con.addRequestProperty("Referer", "https://www.google.com/");
        con.connect();
        //con.getInputStream();
        int resCode = con.getResponseCode();
        if (resCode == HttpURLConnection.HTTP_SEE_OTHER
                || resCode == HttpURLConnection.HTTP_MOVED_PERM
                || resCode == HttpURLConnection.HTTP_MOVED_TEMP) {
            String Location = con.getHeaderField("Location");
            if (Location.startsWith("/")) {
                Location = url.getProtocol() + "://" + url.getHost() + Location;
            }
            return getFinalURL(new URL(Location));
        }
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }
    return url;
}

To get "User-Agent" and "Referer" by yourself, just go to developer mode of one of your installed browser (E.g. press F12 on Google Chrome). Then go to tab 'Network' and then click on one of the requests. You should see it's details. Just press 'Headers' sub tab (the image below) request details

查看更多
牵手、夕阳
3楼-- · 2019-01-01 04:12

Simply call getUrl() on URLConnection instance after calling getInputStream():

URLConnection con = new URL( url ).openConnection();
System.out.println( "orignal url: " + con.getURL() );
con.connect();
System.out.println( "connected url: " + con.getURL() );
InputStream is = con.getInputStream();
System.out.println( "redirected url: " + con.getURL() );
is.close();

If you need to know whether the redirection happened before actually getting it's contents, here is the sample code:

HttpURLConnection con = (HttpURLConnection)(new URL( url ).openConnection());
con.setInstanceFollowRedirects( false );
con.connect();
int responseCode = con.getResponseCode();
System.out.println( responseCode );
String location = con.getHeaderField( "Location" );
System.out.println( location );
查看更多
看风景的人
4楼-- · 2019-01-01 04:12

@balusC I did as you wrote . In my case , I've added cookie information to be able to reuse the session .

   // get the cookie if need
    String cookies = conn.getHeaderField("Set-Cookie");

    // open the new connnection again
    conn = (HttpURLConnection) new URL(newUrl).openConnection();
    conn.setRequestProperty("Cookie", cookies);
查看更多
深知你不懂我心
5楼-- · 2019-01-01 04:19

I'd actually suggest using a solid open-source library as an http client. If you take a look at http client by ASF you'll find life a lot easier. It is an easy-to-use,scalable and robust client for http.

查看更多
ら面具成の殇う
6楼-- · 2019-01-01 04:21

You need to cast the URLConnection to HttpURLConnection and instruct it to not follow the redirects by setting HttpURLConnection#setInstanceFollowRedirects() to false. You can also set it globally by HttpURLConnection#setFollowRedirects().

You only need to handle redirects yourself then. Check the response code by HttpURLConnection#getResponseCode(), grab the Location header by URLConnection#getHeaderField() and then fire a new HTTP request on it.

查看更多
浪荡孟婆
7楼-- · 2019-01-01 04:29

Have a look at the HttpURLConnection class API documentation, especially setInstanceFollowRedirects().

查看更多
登录 后发表回答