If you enter this in a browser url:
https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019
It returns a lot of data. But if I try to capture that data with an Input StreamReader, the only data returned is
{"retHTML":"", "rlist":""}
Here is the program:
List<Property> scrapePropertyInfo(List<Date> auctionDates) {
List<Property> properties = new ArrayList<>();
String urlStr = "https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019";
String str = null;
try {
URL url = new URL(urlStr);
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
StringBuilder stringBuilder = new StringBuilder();
while ((str = in.readLine()) != null) {
stringBuilder.append(str);
}
System.out.println("Url: "+urlStr);
System.out.println(stringBuilder.toString());
in.close();
} catch (MalformedURLException ex) {
Logger.getLogger(CharlotteCtyFL.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(CharlotteCtyFL.class.getName()).log(Level.SEVERE, null, ex);
}
return properties;
}
Does anybody know why?
Edit: a little smarter now So apparently more stuff is required to be sent to the server than just the url. Since this is dynamic ajax data being populated only if you ask it nice using the original web page, need to simulate that in java.
I discovered how to get that info in the chrome F12 debugger console. Under Network->XHR->Preview, click on each item until you see the expected data. Then right-click on it and select Copy->Copy Request Headers.
Here is what got copied:
GET /index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563231065712&bypassPage=1&test=1&_=1563231065712 HTTP/1.1 Host: charlotte.realforeclose.com Connection: keep-alive Accept: application/json, text/javascript, /; q=0.01 X-Requested-With: XMLHttpRequest User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36 Origin: http://evil.com/ Referer: https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/16/2019 Accept-Encoding: gzip, deflate, br Accept-Language: en-US,en;q=0.9 Cookie: cfid=6f228aa1-bb7e-4734-92ff-39eabf23ed9b; cftoken=0; CF_CLIENT_CHARLOTTE_REALFORECLOSE_TC=1563229207612; AWSELB=E7779D5F1C1F6ABE3513A5C5B6B0C754520B66675A407900314ABAC5333A52E93FD1A8D7401D89BC8D5E8B98059C8AAC5507D12A2C6ED07F7E7CB77311BD7FB09B738DB945; _ga=GA1.2.1823487290.1563231012; _gid=GA1.2.1418453663.1563231012; _gat=1; _gcl_au=1.1.273755450.1563231013; __utma=65865852.1823487290.1563231012.1563231014.1563231014.1; __utmc=65865852; __utmz=65865852.1563231014.1.1.utmcsr=realauction.com|utmccn=(referral)|utmcmd=referral|utmcct=/client-sites; __utmt_UA-51657054-1=1; __utmb=65865852.2.10.1563231014; testcookiesenabled=enabled; CF_CLIENT_CHARLOTTE_REALFORECLOSE_LV=1563231067363; CF_CLIENT_CHARLOTTE_REALFORECLOSE_HC=73
Now how do I get that into the request from java? I know how to do it in javascript but not java.