I'm very new to HTML parsing with Java. I previously used Jsoup to parse simple, static HTML, but I now need to parse a web page that has dynamic elements. This is the code I tried to parse the page with, but it could not find the elements because they were added after the page had loaded. The page in question uses Google Maps with markers on it, and I'm attempting to scrape the images of these markers.
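    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public static void main(String[] args) {
        try {
            // Fetches the HTML as served; scripts on the page are not executed.
            Document doc = Jsoup.connect("https://pokevision.com")
                    .userAgent(
                            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36")
                    .get();
            // Select <img> elements whose src looks like a PNG, JPEG, or GIF.
            Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
            for (Element image : images) {
                System.out.println("src : " + image.attr("src"));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }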
Since this apparently can't be done with Jsoup alone, what other libraries can I use to find the image sources?
The problem you are facing is that Jsoup retrieves the static source code, as it would be delivered to a browser; it does not execute JavaScript. What you want is the DOM after the JavaScript has run. For this, you can use HtmlUnit to get the rendered page and then pass its content to Jsoup for parsing.
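A minimal sketch of that approach might look like the following. It assumes HtmlUnit 3.x (package `org.htmlunit`; in 2.x the same classes live under `com.gargoylesoftware.htmlunit`) and Jsoup on the classpath. The class name `RenderedImageScraper`, the helper names `renderPage` and `imageSources`, and the 10-second wait are illustrative choices, not anything prescribed by either library. Whether the marker images actually show up depends on how the page builds them and on how well HtmlUnit's JavaScript engine copes with the site's scripts.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RenderedImageScraper {

    // Hypothetical helper: loads the URL in HtmlUnit, lets background
    // JavaScript run for up to waitMillis, and returns the resulting DOM as markup.
    static String renderPage(String url, long waitMillis) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setCssEnabled(false);
            // Real-world pages often contain scripts HtmlUnit cannot run; do not abort on them.
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = webClient.getPage(url);
            // Assumption: a fixed wait is enough for the page to add its elements.
            webClient.waitForBackgroundJavaScript(waitMillis);
            return page.asXml();
        }
    }

    // Hypothetical helper: parses markup with Jsoup and collects the absolute
    // URLs of <img> elements whose src looks like a PNG, JPEG, or GIF.
    static List<String> imageSources(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> sources = new ArrayList<>();
        for (Element image : doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]")) {
            sources.add(image.absUrl("src"));
        }
        return sources;
    }

    public static void main(String[] args) throws IOException {
        String url = "https://pokevision.com";
        String html = renderPage(url, 10_000);
        for (String src : imageSources(html, url)) {
            System.out.println("src : " + src);
        }
    }
}
```

Keeping the rendering step separate from the Jsoup step means the selector logic can be tried against a saved HTML string without hitting the network. If the images are still missing, a longer wait or inspecting the output of `renderPage` directly is a reasonable next step.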