I work on a website with JAVA Jsoup Library to extract some hyperlinks
Document doc = Jsoup.connect("http://www.saudisale.com/SS_a_mpg.aspx").get();
Elements script = doc.select("script") ;
for(Element elementary :doc.select("table"))
{
System.out.println(""+elementary.select("tbody").select("tr").select("td").select("input").attr("onClick")+"");
Sample Output:-
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dalel.html','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dalel.html','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
Based on the fact that Jsoup does not support javascript, so I have to do some manual java code to convert window.open(hyperlink ) javascript code to absolute hyperlink
For example the following output JavaScript code has to be converted
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode=1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1')
To: http://saudisale.com/arPrivatePage.aspx?id=21871638
and
window.open('SS_a_car.aspx?carid=37149','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
To http://www.saudisale.com/SS_a_car.aspx?carid=37149
Could someone guide me how to accomplish this task with JAVA?
For relative URl I used this code. It works fine.
Use a regex. This will do what you want:
Your last two URLs are relative, so you have to convert them to absolute URLs as described here.