Parse JavaScript-generated content using Java

Posted 2019-08-12 04:18

http://support.xbox.com/en-us/contact-us uses JavaScript to create some lists. I want to be able to parse these lists for their text. For the page above, I want to return the following:

Billing and Subscriptions
Xbox 360
Xbox LIVE
Kinect
Apps
Games

I tried using jsoup for a while before noticing that the lists were generated by JavaScript. I have no idea how to go about parsing a page for its JavaScript-generated content.

Where do I begin?

3 answers
放我归山
#2 · 2019-08-12 04:38

I don't think that text is generated by JavaScript. With JavaScript disabled, those options are still present in the HTML at this location (given as a jQuery selector, because that was easier to write by hand than working out the XPath):

'div#ShellNavigationBar ul.NavigationElements li ul li a'

Regardless, in direct answer to your question: you would have to evaluate the JavaScript within the scope of the document, which I expect would be rather complex in Java. You would have more luck identifying the JavaScript file that generates the relevant content and parsing that directly.
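If the links really are in the static HTML, the selector above can be handed to jsoup more or less as-is. The following is only a minimal sketch: it assumes the jsoup library is on the classpath, the class name `NavLinkExtractor` is made up for illustration, and the sample markup is a simplified guess at the page structure, not a copy of the real page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; requires the jsoup library on the classpath.
final class NavLinkExtractor {

    // CSS selector quoted from the answer above.
    static final String SELECTOR =
            "div#ShellNavigationBar ul.NavigationElements li ul li a";

    // Parses an HTML string and returns the text of every matching link.
    static List<String> extractLinkTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element link : doc.select(SELECTOR)) {
            texts.add(link.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Simplified, made-up markup; the real page will differ.
        // For a live page, Jsoup.connect(url).get() returns a Document instead.
        String html =
                "<div id=\"ShellNavigationBar\">"
              + "<ul class=\"NavigationElements\"><li>Support<ul>"
              + "<li><a href=\"#\">Billing and Subscriptions</a></li>"
              + "<li><a href=\"#\">Xbox 360</a></li>"
              + "</ul></li></ul></div>";
        for (String text : extractLinkTexts(html)) {
            System.out.println(text);
        }
    }
}
```

If the selector matches nothing on the live page, that is a hint the content really is inserted by a script, and jsoup alone will not see it.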

孤傲高冷的网名
#3 · 2019-08-12 04:40

You'll want to use an HTML+JavaScript library like Cobra. It'll parse the DOM elements in the HTML as well as apply any DOM changes caused by JavaScript.

姐就是有狂的资本
#4 · 2019-08-12 04:44

You could always download the whole page as a string, split it on a separator (line breaks, etc.), look for the piece containing the information, and then pull the parts you want out of that string. That is the dirty way of doing it; I'm not sure whether there is a clean way.
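A rough sketch of that split-and-search approach, using only the standard library. The class name `LineScanner`, the `marker` parameter, and the sample markup are all made up for illustration, and the code assumes each link sits on its own line, which a real page may not guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the "dirty" string-splitting approach.
final class LineScanner {

    // Splits the page into lines, keeps the lines containing the marker,
    // and returns the text that sits directly before the first "</a>" on each.
    static List<String> extract(String page, String marker) {
        List<String> results = new ArrayList<>();
        for (String line : page.split("\\R")) {   // \R matches any line break
            if (!line.contains(marker)) {
                continue;
            }
            int end = line.indexOf("</a>");
            if (end < 0) {
                continue;                          // no closing link tag here
            }
            int start = line.lastIndexOf('>', end - 1);
            if (start < 0) {
                continue;                          // no opening tag found
            }
            String text = line.substring(start + 1, end).trim();
            if (!text.isEmpty()) {
                results.add(text);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up sample; in practice the string would be the downloaded page.
        String page = "<ul>\n"
                + "<li><a class=\"nav\" href=\"#\">Xbox 360</a></li>\n"
                + "<li><a class=\"nav\" href=\"#\">Kinect</a></li>\n"
                + "</ul>";
        for (String text : extract(page, "class=\"nav\"")) {
            System.out.println(text);
        }
    }
}
```

This breaks as soon as the markup changes shape (several links on one line, nested tags inside the link), which is why it counts as the dirty option.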
