Question:

I want a script that will scrape a certain web page every hour, and will look for a certain string inside that page.

However, when I open that page and use "view source", I cannot see that string in the source. I was told this is because the string comes from an element that is rendered on the client side (JavaScript), so I can see it only when I manually inspect that element, for example with the Chrome console.

Which practice, programming language, or environment would be the most efficient way to achieve this, considering that I want to run the script on my web host's server, which has 2.25 GB of RAM?

Someone suggested that I use PyQt4, but my web host warned me that this would consume too much RAM and hurt server performance. I should note that the script is supposed to be very simple and scrape only a single page, once an hour.

Answer 1:

It seems this problem could be solved with PhantomJS, a headless browser: it behaves like a real browser and executes the page's client-side code, so the rendered content can be extracted.

For PhantomJS with JavaScript, you may check testing-javascript-with-phantomjs

For how to use PhantomJS with Python, please take a look at this
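
To make the headless-browser idea concrete, here is a minimal sketch. It assumes `driver` is a Selenium-style WebDriver object, that is, anything exposing `get(url)` and `page_source`; the function name, the timeout values, and the polling approach are illustrative assumptions, not something taken from the links above.

```python
import time


def rendered_page_contains(driver, url, target, timeout=10.0, poll=0.5):
    """Load `url` in a browser driver and report whether `target`
    shows up in the rendered DOM within `timeout` seconds.

    `driver` is assumed to behave like a Selenium WebDriver:
    it must provide `get(url)` and a `page_source` attribute.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # page_source reflects the DOM after scripts have run so far,
        # so content injected by JavaScript can appear here.
        if target in driver.page_source:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With a real driver, you would call `rendered_page_contains(driver, "http://example.com/", "some string")` and then `driver.quit()` so the browser process does not keep holding memory between hourly runs.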

Hope it helps~

Answer 2:

I cannot see that string in the source

If you only need to fetch one string from the page, you can reproduce in your own program what the page's JavaScript does. If the JavaScript sends an AJAX request (GET or POST), send the same request from pure Python and fetch the missing string directly.

Suppose the in-page script performs the following (note: the code might also be written in pure JS; see here for an example):

$.ajax({
  url: "test.html",
  context: document.body
}).done(function() {
  $( this ).addClass( "done" );
});

Then in your Python script you request the 'test.html' file:

import requests

base = 'http://example.com/'
r = requests.get(base + 'test.html')

thus getting the desired data:

print(r.headers['content-type'])
# application/json; charset=utf8
print(r.text)
# {"data":"<string>"...
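
Putting the pieces together, a rough sketch of the whole check might look like the following. It uses only the standard library (`urllib` rather than `requests`) to keep the memory footprint small; the function names are hypothetical, and the `"data"` field is an assumption based on the example response above.

```python
import json
import urllib.request


def fetch_text(url, timeout=10):
    """Download `url` and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def contains_target(body, target):
    """Return True if `target` appears in the response body.

    If the body is a JSON object with a "data" field (an assumption
    taken from the example response), only that field is searched;
    otherwise the raw text is searched.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return target in body
    if isinstance(payload, dict) and "data" in payload:
        return target in str(payload["data"])
    return target in body
```

A call such as `contains_target(fetch_text("http://example.com/test.html"), "some string")` performs one lightweight HTTP request and no JavaScript rendering, so running it once an hour from a scheduler such as cron should cost very little RAM.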
