I have been stuck on this issue for several days now; my eyes are starting to hurt from the time spent trying different combinations, without success. I am making an app that has to get data from the internet, parse it, and then show it to the user. I've tried several methods for doing that, and JSOUP was very helpful, especially for parsing and getting the data out of the results.
However, there is one issue which I cannot resolve. I have tried with the regular HTTPClient and with JSOUP, but I can't successfully get the data I need. Here is my code (JSOUP version):
public void bht_ht(Context c, int pozivni, int broj) throws IOException {
//this is the first connection, to get the cookies (I have also tried without making this a separate request, but the result is the same)
Connection.Response resCookie = Jsoup.connect("http://www.bhtelecom.ba/imenik_telefon.html")
.method(Method.GET)
.execute();
String sessionId = resCookie.cookie("PHPSESSID");
String fetypo = resCookie.cookie("fe_typo_user");
//these two above are the cookies
//the POST request, with the data asked
Connection.Response res = Jsoup.connect("http://www.bhtelecom.ba/imenik_telefon.html?a=search")
.data("di", some_data)
.data("br", some_data)
.data("btnSearch","Tra%C5%BEi")
.cookie("PHPSESSID", sessionId)
.cookie("fe_typo_user", fetypo)
.method(Method.POST)
.execute();
Document dok = res.parse();
//So, here is the GET request for the page which contains the results; the POST response redirects to this page with an HTTP 302
Connection.Response res2 = Jsoup.connect("http://www.bhtelecom.ba/index.php?id=3226&")
.cookie("PHPSESSID", sessionId)
.cookie("fe_typo_user", fetypo)
.referrer("http://www.bhtelecom.ba/imenik_telefon.html")
.method(Method.GET)
.execute();
Document doc = res2.parse();
Element elemenat = doc.select("div.boxtexter").get(0);
String ime = elemenat.text();
}
So the end result should be a string containing the returned data. But whatever I try, I get the "blank" page and its parsed text, even though I've simulated everything the browser sends.
Here are the POST and GET raw headers captured by the browser: (post)
> POST /imenik_telefon.html?a=search HTTP/1.1
> Host: www.bhtelecom.ba
> Content-Length: 56
> Cache-Control: max-age=0
> Origin: http://www.bhtelecom.ba
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1
> Content-Type: application/x-www-form-urlencoded
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Referer: http://www.bhtelecom.ba/index.php?id=3226&
> Accept-Encoding: gzip,deflate,sdch
> Accept-Language: en-US,en;q=0.8
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
> Cookie: PHPSESSID=opavncj3317uidbt93t9bie980; fe_typo_user=332a76d0b1d4944bdbbcd28d63d62d75; __utma=206281024.1997742542.1319583563.1319583563.1319588786.2; __utmb=206281024.1.10.1319588786; __utmc=206281024; __utmz=206281024.1319583563.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
>
> di=033&br=123456&_uqid=&_cdt=&_hsh=&btnSearch=Tra%C5%BEi
(get)
> GET /index.php?id=3226& HTTP/1.1
> Host: www.bhtelecom.ba
> Cache-Control: max-age=0
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Referer: http://www.bhtelecom.ba/index.php?id=3226&
> Accept-Encoding: gzip,deflate,sdch
> Accept-Language: en-US,en;q=0.8
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
> Cookie: PHPSESSID=opavncj3317uidbt93t9bie980; __utma=206281024.1997742542.1319583563.1319583563.1319588786.2; __utmb=206281024.1.10.1319588786; __utmc=206281024; __utmz=206281024.1319583563.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); fe_typo_user=07745dd2a36a23c64c2297026061a2c2
The response to this GET contains the data I need, but with every combination of parameters and cookies I tried, I couldn't get the server to "think" that I had made a POST and now wanted that data.
Here is the version of my code without the JSOUP parser. I can't get it to work either: when I check the cookies they are OK, the same for POST and GET, but still without success.
try {
DefaultHttpClient client = new DefaultHttpClient();
String postURL = "http://www.bhtelecom.ba/imenik_telefon.html?a=search";
HttpPost post = new HttpPost(postURL);
post.getParams().setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE, Boolean.FALSE);
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair("di", "035"));
params.add(new BasicNameValuePair("br", "819443"));
params.add(new BasicNameValuePair("btnSearch","Tra%C5%BEi"));
UrlEncodedFormEntity ent = new UrlEncodedFormEntity(params,HTTP.UTF_8);
post.setEntity(ent);
HttpResponse responsePOST = client.execute(post);
HttpEntity resEntity = responsePOST.getEntity();
if (resEntity != null) {
//todo
}
//checking for cookies, they are OK
List<Cookie> cookies = client.getCookieStore().getCookies();
if (cookies.isEmpty()) {
Log.d(TAG, "no cookies");
} else {
for (int i = 0; i < cookies.size(); i++) {
Log.d(TAG, "cookies: " + cookies.get(i).toString());
}
}
resEntity.consumeContent();
HttpGet get = new HttpGet("http://www.bhtelecom.ba/index.php?id=3226&");
get.getParams().setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE, Boolean.FALSE);
HttpResponse responseGET = client.execute(get);
HttpEntity entityGET = responseGET.getEntity();
List<Cookie> cookiesGet = client.getCookieStore().getCookies();
if (cookiesGet.isEmpty()) {
Log.d(TAG, "no cookies");
} else {
for (int i = 0; i < cookiesGet.size(); i++) {
Log.d(TAG, "cookies GET: " + cookiesGet.get(i).toString());
}
}
//a method to check the data, I pass the InputStream to it, and do the operations, I've tried "manually", and passing the InputStream to JSOUP, but without success in either case.
samplemethod(entityGET.getContent());
client.getConnectionManager().shutdown();
} catch (Exception e) {
e.printStackTrace();
}
So, if anyone can find an error in my setup, or show me a way to make these two requests and then get the data as an HTTP entity that I could pass as an InputStream to the lovely JSOUP parser, that would be amazing. Or maybe I have misunderstood what the page needs and have to make my requests with different parameters; if so, I would appreciate a pointer. I used Wireshark and Charles Debugging Proxy to see what to send (I tried both, to double-check), and found only the session id, fe_typo_user and some other parameters used for tracking time on site and so on ("__utma", "__utmb", ...), and I've tried passing those too.
I have some other methods that use "simpler", POST-only requests with the data in the response, and those work fine, but this specific issue with this site is driving me crazy. Thanks in advance for your help.
After many, many hours of trying things and tracking incoming/outgoing packets, I finally managed to find a solution.
The problem was a "bug", or rather the behavior, of HTTPClient: if you add a parameter to a POST and the parameter is empty (has the value ""), it is not sent with the request. I didn't know that; I thought that those parameters, since they are empty, wouldn't change anything, so when working with JSOUP I didn't pass them in the requests either.
So the empty fields in the captured POST body (_uqid=&_cdt=&_hsh=) were the places of interest.
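To illustrate the point about empty parameters, here is a minimal sketch that builds the form body by hand with JDK classes only, so that empty fields cannot be dropped silently. It assumes the empty fields are _uqid, _cdt and _hsh, as in the POST body captured from the browser; the class and method names are hypothetical and not part of JSOUP or HTTPClient.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
final class FormBodySketch {

    // Fields in the order the browser sent them; the three empty ones are kept on purpose.
    static Map<String, String> searchFields(String areaCode, String number) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("di", areaCode);
        fields.put("br", number);
        fields.put("_uqid", "");
        fields.put("_cdt", "");
        fields.put("_hsh", "");
        fields.put("btnSearch", "Tra\u017Ei"); // raw button label; it is percent-encoded in encode()
        return fields;
    }

    // Joins the fields as application/x-www-form-urlencoded, writing "name=" for empty values.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Prints the same 56-byte body as the captured browser request.
        System.out.println(encode(searchFields("033", "123456")));
    }
}
```

With JSOUP, the equivalent would presumably be to chain .data("_uqid", ""), .data("_cdt", "") and .data("_hsh", "") onto the POST request.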
Another thing: since this page answers with a 302, and JSOUP has followRedirects set to true by default, I had to set that to false as well. The original request is a POST and the follow-up request has to be a GET, but JSOUP assumes it is still a POST and messes things up.
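The manual redirect handling can be sketched roughly as follows. This is not the JSOUP code I used; it is a dependency-free illustration with the JDK's HttpURLConnection, and it assumes the session cookies are passed in as a ready-made Cookie header string and that the 302 carries a Location header (absolute or relative). All names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class RedirectSketch {

    // Resolves a possibly relative Location header against the URL that was requested.
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    // Sends the POST without following redirects, then fetches the redirect target with a GET.
    static InputStream postThenGet(String postUrl, String formBody, String cookieHeader)
            throws IOException {
        HttpURLConnection post =
                (HttpURLConnection) URI.create(postUrl).toURL().openConnection();
        post.setInstanceFollowRedirects(false); // handle the 302 ourselves
        post.setRequestMethod("POST");
        post.setDoOutput(true);
        post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        post.setRequestProperty("Cookie", cookieHeader);
        try (OutputStream out = post.getOutputStream()) {
            out.write(formBody.getBytes(StandardCharsets.UTF_8));
        }
        int status = post.getResponseCode();
        String location = post.getHeaderField("Location");
        post.disconnect();
        if (status != HttpURLConnection.HTTP_MOVED_TEMP || location == null) {
            throw new IOException("Expected a 302 with a Location header, got " + status);
        }

        // The follow-up request is a plain GET carrying the same session cookies.
        String target = resolveLocation(postUrl, location);
        HttpURLConnection get =
                (HttpURLConnection) URI.create(target).toURL().openConnection();
        get.setRequestMethod("GET");
        get.setRequestProperty("Cookie", cookieHeader);
        get.setRequestProperty("Referer", postUrl);
        return get.getInputStream(); // can be handed to a parser such as JSOUP
    }
}
```

The sketch leaves out cookie updates: in the captured headers the GET carries a different fe_typo_user than the POST, so if the 302 response sets new cookies they would need to be merged in before the GET.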
So that's it, hope someone will find this useful :)