I am accessing web pages from Java as follows:
URLConnection con = url.openConnection();
In some cases, however, a URL redirects to another URL, and I want to know the URL to which the original URL redirected.
Below are the header fields that I got as a response:
null-->[HTTP/1.1 200 OK]
Cache-control-->[public,max-age=3600]
last-modified-->[Sat, 17 Apr 2010 13:45:35 GMT]
Transfer-Encoding-->[chunked]
Date-->[Sat, 17 Apr 2010 13:45:35 GMT]
Vary-->[Accept-Encoding]
Expires-->[Sat, 17 Apr 2010 14:45:35 GMT]
Set-Cookie-->[cl_def_hp=copenhagen; domain=.craigslist.org; path=/; expires=Sun, 17 Apr 2011 13:45:35 GMT, cl_def_lang=en; domain=.craigslist.org; path=/; expires=Sun, 17 Apr 2011 13:45:35 GMT]
Connection-->[close]
Content-Type-->[text/html; charset=iso-8859-1;]
Server-->[Apache]
At present, I am constructing the redirected URL from the value of the Set-Cookie header field. In the above case, the redirected URL is copenhagen.craigslist.org.
Is there a standard way to determine which URL a given URL is going to redirect to?
I know that when a URL redirects to another URL, the server sends an intermediate response containing a Location header field that gives the redirect target, but I am not receiving that intermediate response through the url.openConnection() method.
To find the "User-Agent" and "Referer" values yourself, open the developer tools of one of your installed browsers (e.g. press F12 in Google Chrome). Go to the 'Network' tab and click on one of the requests to see its details, then open the 'Headers' sub-tab.
Simply call getURL() on the URLConnection instance after calling getInputStream():
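A minimal sketch of that approach, assuming the default behaviour where HttpURLConnection follows redirects automatically. The class name FinalUrlExample and the helper finalUrl are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

class FinalUrlExample {
    // Returns the URL the connection points at once the response has been received.
    static URL finalUrl(URL start) throws IOException {
        URLConnection con = start.openConnection();
        // getInputStream() actually sends the request; redirects are followed here.
        try (InputStream in = con.getInputStream()) {
            // After a followed redirect, getURL() reports the new location.
            return con.getURL();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: FinalUrlExample <url>");
            return;
        }
        URL start = URI.create(args[0]).toURL();
        System.out.println(start + " -> " + finalUrl(start));
    }
}
```

As far as I know, HttpURLConnection does not follow a redirect that switches protocol (for example http to https), so in that case getURL() would still return the original URL.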
If you need to know whether a redirect happened before actually getting its contents, here is the sample code:
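A minimal sketch of such a check, not necessarily what the original poster had in mind. It turns off automatic redirect following for one connection and reads only the status line and headers. RedirectProbe and redirectTarget are hypothetical names, and treating 301, 302, 303, 307 and 308 as redirects is a choice made for this sketch.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class RedirectProbe {
    // Returns the raw Location header if the server answers with a redirect, else null.
    static String redirectTarget(URL url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setInstanceFollowRedirects(false); // report the 3xx instead of following it
        try {
            int code = con.getResponseCode();  // sends the request, reads status and headers
            boolean redirect = code == 301 || code == 302 || code == 303
                    || code == 307 || code == 308;
            return redirect ? con.getHeaderField("Location") : null;
        } finally {
            con.disconnect();
        }
    }
}
```

The returned value may be a relative path, so it has to be resolved against the requested URL before it can be opened.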
@balusC I did as you wrote. In my case, I've added cookie information to be able to reuse the session.
I'd actually suggest using a solid open-source library as an HTTP client. If you take a look at the HTTP client by the ASF, you'll find life a lot easier. It is an easy-to-use, scalable and robust client for HTTP.
You need to cast the URLConnection to HttpURLConnection and instruct it to not follow the redirects by setting HttpURLConnection#setInstanceFollowRedirects() to false. You can also set it globally by HttpURLConnection#setFollowRedirects().
You only need to handle redirects yourself then. Check the response code by HttpURLConnection#getResponseCode(), grab the Location header by URLConnection#getHeaderField() and then fire a new HTTP request on it.
Have a look at the HttpURLConnection class API documentation, especially setInstanceFollowRedirects().
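The steps above could be put together roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: RedirectChain, resolve, follow and the MAX_HOPS limit of 10 are made up for illustration, and only the status codes 301, 302, 303, 307 and 308 are treated as redirects.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class RedirectChain {
    static final int MAX_HOPS = 10; // arbitrary cap (assumption) to stop redirect loops

    // Resolves a possibly relative Location value against the URL that returned it.
    static String resolve(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    // Returns every URL requested, starting URL first, final URL last.
    static List<String> follow(String start) throws IOException {
        List<String> visited = new ArrayList<>();
        String current = start;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            visited.add(current);
            HttpURLConnection con =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            con.setInstanceFollowRedirects(false); // handle redirects ourselves
            try {
                int code = con.getResponseCode();
                String location = con.getHeaderField("Location");
                boolean redirect = code == 301 || code == 302 || code == 303
                        || code == 307 || code == 308;
                if (!redirect || location == null) {
                    return visited; // not a redirect: this is the final URL
                }
                current = resolve(current, location); // fire a new request on it
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("Too many redirects starting from " + start);
    }
}
```

Calling RedirectChain.follow("http://www.craigslist.org/") would return the list of URLs requested along the way. Cookies set by an intermediate response are not carried over to the next request in this sketch, so a session that depends on them would need the Cookie header added by hand.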