Python: get URL contents when page requires JavaScript

Posted 2019-06-13 17:22

I am looking to get the contents of a text file hosted on my website using Python. The server requires JavaScript to be enabled on your browser. Therefore when I run:

    import urllib2
    target_url = "http://09hannd.me/ai/request.txt"
    data = urllib2.urlopen(target_url)

I receive an HTML page telling me to enable JavaScript. I was wondering if there was a way of faking having JS enabled or something.

Thanks
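(Side note for Python 3 readers: `urllib2` is Python 2 only; it was split into `urllib.request` and `urllib.error` in Python 3. A minimal Python 3 equivalent of the snippet above, wrapped in a helper function of my own naming, still gets the "enable JavaScript" page back, as the answers explain:)

```python
from urllib.request import urlopen

def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download `url` and decode the body as UTF-8."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

# fetch_text("http://09hannd.me/ai/request.txt")
```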

2 Answers
叛逆
#2 · 2019-06-13 18:21

Selenium is the way to go here, but there is another "hacky" option.

Based on this answer: https://stackoverflow.com/a/26393257/2517622

    import requests

    url = 'http://09hannd.me/ai/request.txt'
    response = requests.get(url, cookies={'__test': '2501c0bc9fd535a3dc831e57dc8b1eb0'})
    print(response.content)  # Output: find me a cafe nearby
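(One caveat: the hard-coded `__test` cookie is site-specific and will eventually expire. A small sketch, with helper names that are my own rather than from the answer, which keeps the cookie in a `requests.Session` and detects when the challenge page came back instead of the file:)

```python
def looks_like_js_challenge(body: bytes) -> bool:
    """Heuristic: the anti-bot page mentions JavaScript; the text file does not."""
    return b"javascript" in body.lower()

def fetch_with_cookie(url: str, cookie_value: str) -> bytes:
    """Fetch `url`, sending a pre-computed `__test` challenge cookie.

    `cookie_value` must be copied from a real browser session, as in the
    answer above; it cannot be computed without running the page's JS.
    """
    import requests  # third-party: pip install requests

    session = requests.Session()
    session.cookies.set("__test", cookie_value)
    resp = session.get(url, timeout=10)
    if looks_like_js_challenge(resp.content):
        raise RuntimeError("got the challenge page back; the cookie is stale")
    return resp.content
```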
家丑人穷心不美
#3 · 2019-06-13 18:26

I would suggest a tool like dryscrape: https://github.com/niklasb/dryscrape

Additionally, you can find more info here: Using python with selenium to scrape dynamic web pages
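(A hedged sketch of the Selenium route, assuming the `selenium` package and a Chrome driver are installed; the function name is mine. Both Selenium and dryscrape follow the same load-the-page-so-its-JS-runs, then read-the-rendered-text pattern:)

```python
def fetch_with_browser(url: str) -> str:
    """Load `url` in headless Chrome so its JavaScript runs, then return
    the rendered page text."""
    from selenium import webdriver  # third-party: pip install selenium
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # "tag name" is the By.TAG_NAME locator strategy in Selenium 4.
        return driver.find_element("tag name", "body").text
    finally:
        driver.quit()
```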
