I am writing a web crawler. I extracted the heading and the main discussion from this link, but I cannot find any of the comments in the page source (Ctrl+U, then Ctrl+F for the comment text). I think the comments are loaded by JavaScript. Can I extract them?
RT uses a commenting service from spot.im.
You need to make two POST requests, first to
https://api.spot.im/me/network-token/spotim
to get a token, then to
https://api.spot.im/conversation-read/spot/sp_6phY2k0C/post/353493/get
to get the comments as JSON. I wrote a quick script to do this.
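The two-request flow described above could be sketched roughly as follows. This is not the script mentioned above, only an illustration that uses the standard library. The two URLs come from the answer; everything else is an assumption: the name of the token field in the first response (`token`), the header used to send the token back (`X-Spotim-Token`), and the empty JSON request bodies are guesses that would need to be checked against the real traffic in the browser's network tab.

```python
import json
import urllib.request

# Both URLs are taken from the answer above.
TOKEN_URL = "https://api.spot.im/me/network-token/spotim"
COMMENTS_URL = (
    "https://api.spot.im/conversation-read/spot/sp_6phY2k0C/post/353493/get"
)


def post_json(url, payload=None, headers=None):
    """POST a JSON body to url and return the decoded JSON response."""
    body = json.dumps(payload or {}).encode("utf-8")
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})
    request = urllib.request.Request(
        url, data=body, headers=all_headers, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_comments(post=post_json):
    """Request a token first, then request the comments with it.

    `post` is injectable so the flow can be exercised without network access.
    """
    token_reply = post(TOKEN_URL)
    # ASSUMPTION: the token is returned in a JSON field named "token".
    token = token_reply["token"]
    # ASSUMPTION: the API expects the token in this header.
    return post(COMMENTS_URL, headers={"X-Spotim-Token": token})
```

Calling `fetch_comments()` performs the two live requests. If the service returns the token somewhere else (for example in a response header), only the two lines marked as assumptions need to change.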
Yes, if it can be viewed with a web browser, you can extract it.
If you look at the source, the comment section is really an iframe that loads a piece of JavaScript. That script creates a new script tag in the document whose source is bundle.js, which contains the actual commenting software. This in turn fetches the comments.
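Following that chain by hand starts with listing the iframe and script URLs in each document the crawler fetches. A minimal sketch with the standard library is shown here; `EmbedFinder` and `find_embeds` are hypothetical names chosen for illustration. Note that it only sees tags present in the static HTML, so a script tag that is created at runtime by JavaScript (as with bundle.js here) will not show up. That limitation is exactly why the manual approach gets tedious.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbedFinder(HTMLParser):
    """Collect the src URLs of <iframe> and <script> tags in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.iframes = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if not src:
            # Inline scripts and tags without a src have nothing to fetch.
            return
        # Resolve relative URLs against the page they were found on.
        absolute = urljoin(self.base_url, src)
        if tag == "iframe":
            self.iframes.append(absolute)
        elif tag == "script":
            self.scripts.append(absolute)


def find_embeds(html, base_url):
    """Return (iframe_urls, script_urls) found in the given HTML text."""
    finder = EmbedFinder(base_url)
    finder.feed(html)
    return finder.iframes, finder.scripts
```

The crawler would fetch each returned URL and repeat the search on the result, until it reaches the request that actually returns the comments.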
Instead of going through this manually, you could consider using, for example, WebKit to create a headless browser that executes the JavaScript like an ordinary browser. Then you can scrape the rendered page instead of making your crawler fetch the external resources itself.
Examples of such headless browsers are Spynner, Dryscrape, or the PhantomJS-derived PhantomPy (the latter seems to be an abandoned project now).