URLConnection Doesn't Follow Redirect

2018-12-31 19:26发布

I can't understand why Java's HttpURLConnection doesn't follow redirect. I use the following code to get this page:

import java.net.URL;
import java.net.HttpURLConnection;
import java.io.InputStream;

public class Tester {

    public static void main(String argv[]) throws Exception{
        InputStream is = null;

        try {
            String bitlyUrl = "http://bit.ly/4hW294";
            URL resourceUrl = new URL(bitlyUrl);
            HttpURLConnection conn = (HttpURLConnection)resourceUrl.openConnection();
            conn.setConnectTimeout(15000);
            conn.setReadTimeout(15000);
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.0; ru; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)");
            conn.connect();
            is = conn.getInputStream();
            String res = conn.getURL().toString();
            if (res.toLowerCase().contains("bit.ly"))
                System.out.println("bit.ly is after resolving: "+res);
       }
       catch (Exception e) {
           System.out.println("error happened: "+e.toString());
       }
       finally {
            if (is != null) is.close(); 
        }
    }
}

Moreover, I get the following response (it seems absolutely right!):

GET /4hW294 HTTP/1.1
Host: bit.ly
Connection: Keep-Alive
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; ru-RU; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
HTTP/1.1 301 Moved
Server: nginx/0.7.42
Date: Thu, 10 Dec 2009 20:28:44 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Location: https://www.myganocafe.com/CafeMacy
MIME-Version: 1.0
Content-Length: 297

Unfortunately, the res variable contains the same URL and stream contains the following (obviously, Java's HttpURLConnection doesn't follow redirect!):

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>Moved</TITLE>
</HEAD>
<BODY>
<H2>Moved</H2>
<A HREF="https://www.myganocafe.com/CafeMacy">The requested URL has moved here.</A>
<P ALIGN=RIGHT><SMALL><I>AOLserver/4.5.1 on http://127.0.0.1:7400</I></SMALL></P>
</BODY>
</HTML>

6条回答
冷夜・残月
2楼-- · 2018-12-31 19:33

HTTPUrlConnection is not responsible for handling the response of the object. It is performance as expected, it grabs the content of the URL requested. It is up to you the user of the functionality to interpret the response. It is not able to read the intentions of the developer without specification.

查看更多
伤终究还是伤i
3楼-- · 2018-12-31 19:37

HttpURLConnection by design won't automatically redirect from HTTP to HTTPS (or vice versa). Following the redirect may have serious security consequences. SSL (hence HTTPS) creates a session that is unique to the user. This session can be reused for multiple requests. Thus, the server can track all of the requests made from a single person. This is a weak form of identity and is exploitable. Also, the SSL handshake can ask for the client's certificate. If sent to the server, then the client's identity is given to the server.

As erickson points out, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.

With that understood, here's the code which will follow the redirects.

  URL resourceUrl, base, next;
  HttpURLConnection conn;
  String location;

  ...

  while (true)
  {
     resourceUrl = new URL(url);
     conn        = (HttpURLConnection) resourceUrl.openConnection();

     conn.setConnectTimeout(15000);
     conn.setReadTimeout(15000);
     conn.setInstanceFollowRedirects(false);   // Make the logic below easier to detect redirections
     conn.setRequestProperty("User-Agent", "Mozilla/5.0...");

     switch (conn.getResponseCode())
     {
        case HttpURLConnection.HTTP_MOVED_PERM:
        case HttpURLConnection.HTTP_MOVED_TEMP:
           location = conn.getHeaderField("Location");
           location = URLDecoder.decode(location, "UTF-8");
           base     = new URL(url);               
           next     = new URL(base, location);  // Deal with relative URLs
           url      = next.toExternalForm();
           continue;
     }

     break;
  }

  is = conn.openStream();
  ...
查看更多
忆尘夕之涩
4楼-- · 2018-12-31 19:39

As mentioned by some of you above, the setFollowRedirect and setInstanceFollowRedirects only work automatically when the redirected protocol is same . ie from http to http and https to https.

setFolloRedirect is at class level and sets this for all instances of the url connection, whereas setInstanceFollowRedirects is only for a given instance. This way we can have different behavior for different instances.

I found a very good example here http://www.mkyong.com/java/java-httpurlconnection-follow-redirect-example/

查看更多
后来的你喜欢了谁
5楼-- · 2018-12-31 19:48

Another option can be to use Apache HttpComponents Client:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>

Sample code:

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("https://media-hearth.cursecdn.com/avatars/330/498/212.png");
CloseableHttpResponse response = httpclient.execute(httpget);
final HttpEntity entity = response.getEntity();
final InputStream is = entity.getContent();
查看更多
不流泪的眼
6楼-- · 2018-12-31 19:49

Has something called HttpURLConnection.setFollowRedirects(false) by any chance?

You could always call

conn.setInstanceFollowRedirects(true);

if you want to make sure you don't affect the rest of the behaviour of the app.

查看更多
爱死公子算了
7楼-- · 2018-12-31 19:59

I don't think that it will automatically redirect from HTTP to HTTPS (or vice-versa).

Even though we know it mirrors HTTP, from the HTTP protocol point of view, HTTPS is just some other, completely different, unknown protocol. It would be unsafe to follow the redirect without user approval.

For example, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.

查看更多
登录 后发表回答