I'm trying to extract information from this page with the Jsoup library, but I cannot grab the text that follows a JavaScript element.
I inspected each of the td elements on the page with Opera Dragonfly. Here is the result:
<td class="t_port">
<script type="text/javascript">
//<![CDATA[
document.write(Socks^GrubMe^51959);
//]]>
</script>
"1080
"
</td>
When I use the view-source function of any browser, it returns the same lines of code but without "1080", which is the information I'm looking for. I get the same result when I fetch this page with Jsoup. The JS code is more or less the same in each cell, for example:
document.write(SmallBlind^NineBeforeZero^64881);
or
document.write(ProxyMoxy^DexterProxy^29182);
or something similar, such as
document.write(Defender^Agile^57721);
Knowing the policy of this service, I suppose that this JS code hides the necessary information and loads it later dynamically, by editing the DOM and adding the "1080" type of information.
Any suggestions on how to grab this info?
P.S: Here is my code:
Document doc = Jsoup.connect(socks4URL).post();
Elements ips = doc.select("table.proxytbl td.t_ip");
for (Element e : ips) {
    System.out.println("e is " + e.text());
}
Elements ports = doc.select("table.proxytbl td.t_port");
for (Element e : ports) {
    System.out.println("port is " + e);
}
First
I suppose the site uses this technique precisely to discourage people like you from scraping its information. Having said that, I'll just assume you understand this and give up.
Second
This site does not load the port info via AJAX. It simply defines some global variables in a script tag and uses the bitwise XOR operator (^) to calculate the port number. To understand what is going on, you need to understand the XOR operator and find the little script that is included inline in the source (hint: the script tag inside the div with id="incontent"). Here is what I got, but it might be a dynamically generated script, so it might differ from call to call:
Now you can parse the data and recreate variables with the same values. Just parse the port field and interpret the little XOR calculation. For example:
According to the above script, SmallBlind=35900 and BigProxy=13097 (after evaluation!),
so the calculation is 35900 ^ 13097 ^ 47917 = 1080
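A minimal Java sketch of that calculation, assuming the variable values have already been recovered from the inline script (how to parse that script is left out here, since its exact format may change between requests). The class name PortDecoder and the methods evalXor and decodePort are made up for illustration. The expression SmallBlind^BigProxy^47917 is reconstructed from the numbers above rather than copied from the page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PortDecoder {
    // Captures the argument of document.write(...), e.g. "SmallBlind^BigProxy^47917".
    private static final Pattern WRITE_CALL =
            Pattern.compile("document\\.write\\(([^)]*)\\)");

    // XORs all '^'-separated terms. A term is either a decimal literal
    // or the name of a variable whose value is already known.
    static int evalXor(String expr, Map<String, Integer> vars) {
        int result = 0; // 0 is the identity for XOR
        for (String term : expr.split("\\^")) {
            String t = term.trim();
            Integer value = t.matches("\\d+") ? Integer.valueOf(t) : vars.get(t);
            if (value == null) {
                throw new IllegalArgumentException("Unknown variable: " + t);
            }
            result ^= value;
        }
        return result;
    }

    // Finds the first document.write(...) call in a script body and evaluates it.
    static int decodePort(String scriptBody, Map<String, Integer> vars) {
        Matcher m = WRITE_CALL.matcher(scriptBody);
        if (!m.find()) {
            throw new IllegalArgumentException("No document.write(...) call found");
        }
        return evalXor(m.group(1), vars);
    }

    public static void main(String[] args) {
        // Values taken from the example above; on the real page they must be
        // read from the inline script on every request.
        Map<String, Integer> vars = new HashMap<>();
        vars.put("SmallBlind", 35900);
        vars.put("BigProxy", 13097);

        String script = "//<![CDATA[\ndocument.write(SmallBlind^BigProxy^47917);\n//]]>";
        System.out.println(decodePort(script, vars)); // prints 1080
    }
}
```

With Jsoup, the script body for each port cell should be available through something like e.select("script").first().data(), because the content of a script tag is exposed as data rather than text.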
Third
Just subscribe to one of the many services that send you ready-to-use SOCKS proxy lists, if you need them so badly :)