Parse JavaScript-generated content using Java

Posted 2019-08-12 04:18

http://support.xbox.com/en-us/contact-us uses JavaScript to create some lists. I want to parse these lists for their text. For that page, I want to get the following:

Billing and Subscriptions
Xbox 360
Xbox LIVE
Kinect
Apps
Games

I tried jsoup for a while before noticing that the lists were generated by JavaScript. I have no idea how to go about parsing a page for its JavaScript-generated content.

Where do I begin?

Tags: java javascript parsing

3 answers

放我归山

#2 · 2019-08-12 04:38

I don't think that text is generated by JavaScript. If I disable JavaScript, those options can still be found inside the HTML at this location (given as a jQuery selector, just because it was easier to hand-write than figuring out the XPath :))

'div#ShellNavigationBar ul.NavigationElements li ul li a'

Regardless, in direct answer to your question: you'd have to evaluate the JavaScript within the scope of the document, which I expect would be rather complex in Java. You'd have more luck identifying the JavaScript file that generates the relevant content and parsing that directly.
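
For what it's worth, here is a minimal sketch of how the selector above might translate to XPath using only the JDK's built-in XML tooling. It assumes the markup is well-formed XML, which real-world HTML often is not, so a lenient HTML parser would usually have to come first (jsoup, which the question mentions, accepts the CSS selector directly through `select`). The class name `NavLinkExtractor`, the method `extractLinkTexts`, and the sample markup are made up for illustration and are not taken from the actual page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NavLinkExtractor {
    // Hand-written XPath counterpart of:
    //   div#ShellNavigationBar ul.NavigationElements li ul li a
    // The contains(concat(...)) idiom matches one token inside a class attribute.
    static final String NAV_LINKS_XPATH =
            "//div[@id='ShellNavigationBar']"
            + "//ul[contains(concat(' ', normalize-space(@class), ' '), ' NavigationElements ')]"
            + "//li//ul//li//a";

    // Hypothetical helper: returns the trimmed text of every matching <a> element.
    // Assumes the input is well-formed XML (an assumption, not true of most HTML).
    static List<String> extractLinkTexts(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(NAV_LINKS_XPATH, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            // Parsing or XPath evaluation failed, most likely because the input is not well-formed.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample that mimics the structure the selector describes.
        String sample =
                "<html><body><div id='ShellNavigationBar'>"
                + "<ul class='NavigationElements'><li>Support<ul>"
                + "<li><a href='#'>Billing and Subscriptions</a></li>"
                + "<li><a href='#'>Xbox 360</a></li>"
                + "</ul></li></ul></div></body></html>";
        extractLinkTexts(sample).forEach(System.out::println);
    }
}
```

Each space in the CSS selector is a descendant combinator, which is why every step in the XPath uses `//` rather than `/`.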


孤傲高冷的网名

#3 · 2019-08-12 04:40

You'll want to use an HTML+JavaScript library such as Cobra. It parses the DOM elements in the HTML and also applies any DOM changes made by JavaScript.


姐就是有狂的资本

#4 · 2019-08-12 04:44

You could always load the whole page as a string, split it on a separator (line breaks, for example), and look for the line containing the information, then pull the pieces you want out of that line. That is the dirty way of doing it; I'm not sure whether there is a clean way.
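
A rough sketch of that dirty approach, assuming the page has already been downloaded into a string and that each item of interest sits on its own line. The marker `class="NavLink"`, the class name `DirtyLineScraper`, and the sample markup are hypothetical; a real page would need its own marker, and this breaks as soon as the markup layout changes.

```java
import java.util.ArrayList;
import java.util.List;

class DirtyLineScraper {
    // Splits the page into lines, keeps the lines that contain the marker,
    // and returns the text between the first '>' after the marker and the next '<'.
    static List<String> extractTexts(String page, String marker) {
        List<String> texts = new ArrayList<>();
        for (String line : page.split("\\R")) {       // \R matches any line break
            int markerPos = line.indexOf(marker);
            if (markerPos < 0) {
                continue;                              // not a line we care about
            }
            int open = line.indexOf('>', markerPos);   // end of the opening tag
            if (open < 0) {
                continue;
            }
            int close = line.indexOf('<', open + 1);   // start of the closing tag
            if (close < 0) {
                continue;
            }
            texts.add(line.substring(open + 1, close).trim());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Made-up sample page; the real markup will differ.
        String page = String.join("\n",
                "<ul>",
                "<li><a class=\"NavLink\" href=\"/billing\">Billing and Subscriptions</a></li>",
                "<li><a class=\"NavLink\" href=\"/xbox-360\">Xbox 360</a></li>",
                "</ul>");
        System.out.println(extractTexts(page, "class=\"NavLink\""));
        // prints [Billing and Subscriptions, Xbox 360]
    }
}
```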

