Is it possible to write a web crawler in JavaScript?

Posted 2019-03-10 16:16

I want to crawl a page, check it for hyperlinks, follow those hyperlinks, and capture data from the linked pages.

10 answers
老娘就宠你
#2 · 2019-03-10 16:55
Melony?
#3 · 2019-03-10 16:56

You can crawl pages with JavaScript on the server side with the help of a headless WebKit browser. A few libraries exist for this, such as PhantomJS and CasperJS, and there is also a newer wrapper around PhantomJS called Nightmare JS that makes the work easier.

#4 · 2019-03-10 17:02

Yes, it is possible.

  1. Use Node.js, which is server-side JavaScript.
  2. Node.js has npm, a package manager that handles third-party modules.
  3. Use PhantomJS from Node.js; it is a third-party module that can crawl through websites.
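
To make the crawl-and-follow idea concrete, here is a minimal sketch of the loop itself in plain Node.js, without PhantomJS. The names `extractLinks`, `crawl`, and `fetchPage` are hypothetical helpers invented for this illustration, and the regex-based link extraction is a simplification; a real crawler would more likely use an HTML parser or a headless browser.

```javascript
// Breadth-first crawler sketch. All function names here are illustrative
// assumptions, not part of Node.js or any library.

// Collect href values from <a> tags and resolve them against the page URL.
// The regex is a deliberate simplification of real HTML parsing.
function extractLinks(html, baseUrl) {
  const links = [];
  const pattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    try {
      const url = new URL(match[1], baseUrl);
      // Keep only web links; skip mailto:, javascript:, and similar schemes.
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        url.hash = ''; // the same page with a different fragment is one page
        links.push(url.href);
      }
    } catch (err) {
      // Ignore hrefs that cannot be parsed as URLs.
    }
  }
  return links;
}

// Visit pages starting at startUrl, following links until maxPages pages
// have been fetched. fetchPage(url) must return a Promise of the page HTML.
async function crawl(startUrl, fetchPage, maxPages = 20) {
  const queue = [new URL(startUrl).href];
  const visited = new Set();
  const pages = new Map(); // url -> html, the "captured data"
  while (queue.length > 0 && pages.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    let html;
    try {
      html = await fetchPage(url);
    } catch (err) {
      continue; // skip pages that fail to load
    }
    pages.set(url, html);
    for (const link of extractLinks(html, url)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return pages;
}
```

The fetcher is passed in so the loop can be tried against any source of pages. On Node.js 18 or later, where `fetch` is a global, something like `crawl('https://example.com/', (url) => fetch(url).then((res) => res.text()))` should work; the URL is a placeholder.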
何必那么认真
#5 · 2019-03-10 17:03

There are ways to work around the same-origin policy with JavaScript. I wrote a crawler for Facebook that gathered information from the profiles of my friends and my friends' friends and allowed filtering the results by gender, current location, age, and marital status (you catch my drift). It was simple: I just ran it from the browser console. That way your script gets the privilege to make requests on the current domain. You can also make a bookmarklet to run the script from your bookmarks.

Another way is to provide a PHP proxy. Your script accesses the proxy on the current domain, and the proxy requests files from another domain with PHP. Just be careful with these: if you are not careful, a third party might hijack the proxy and use it as a public proxy.
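
One common way to keep such a proxy from becoming an open relay is to forward requests only to hosts on an explicit allowlist. The sketch below shows that check in JavaScript rather than PHP; `isAllowedTarget` and the host names are hypothetical, and the same logic would apply in whatever language the proxy is written in.

```javascript
// Hypothetical guard for a fetch proxy: decide whether the proxy may
// forward a request to rawUrl. allowedHosts is a Set of exact host names.
function isAllowedTarget(rawUrl, allowedHosts) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch (err) {
    return false; // not an absolute URL, refuse
  }
  // Refuse file:, ftp:, and any other non-web scheme.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return false;
  }
  // Exact host match, so "example.com.evil.test" does not slip through.
  return allowedHosts.has(url.hostname);
}

// Example: a proxy meant to fetch pages from a single site only.
const allowed = new Set(['example.com']);
console.log(isAllowedTarget('https://example.com/page', allowed));
console.log(isAllowedTarget('https://evil.test/', allowed));
```

An exact-match allowlist is stricter than a suffix or substring check, which is the point: anything not listed is refused.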

Good luck, maybe you make a friend or two in the process like I did :-)
