Python: get URL contents when page requires JavaScript

Posted 2019-06-13 17:22

I am looking to get the contents of a text file hosted on my website using Python. The server requires JavaScript to be enabled on your browser. Therefore when I run:

    import urllib2
    target_url = "http://09hannd.me/ai/request.txt"
    data = urllib2.urlopen(target_url)

I receive an HTML page telling me to enable JavaScript. I was wondering if there was a way of faking having JS enabled or something.

Thanks
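(Side note for Python 3 readers: `urllib2` is Python 2 only; it was split into `urllib.request` and `urllib.error` in Python 3. A minimal Python 3 equivalent of the snippet above, wrapped in a helper function of my own naming, still gets the "enable JavaScript" page back, as the answers explain:)

```python
from urllib.request import urlopen

def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download `url` and decode the body as UTF-8."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

# fetch_text("http://09hannd.me/ai/request.txt")
```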

2 Answers
叛逆
#2 · 2019-06-13 18:21

Selenium is the way to go here, but there is another "hacky" option.

Based on this answer: https://stackoverflow.com/a/26393257/2517622

    import requests

    url = 'http://09hannd.me/ai/request.txt'
    response = requests.get(url, cookies={'__test': '2501c0bc9fd535a3dc831e57dc8b1eb0'})
    print(response.content)  # Output: find me a cafe nearby
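(One caveat: the hard-coded `__test` cookie is site-specific and will eventually expire. A small sketch, with helper names that are my own rather than from the answer, which keeps the cookie in a `requests.Session` and detects when the challenge page came back instead of the file:)

```python
def looks_like_js_challenge(body: bytes) -> bool:
    """Heuristic: the anti-bot page mentions JavaScript; the text file does not."""
    return b"javascript" in body.lower()

def fetch_with_cookie(url: str, cookie_value: str) -> bytes:
    """Fetch `url`, sending a pre-computed `__test` challenge cookie.

    `cookie_value` must be copied from a real browser session, as in the
    answer above; it cannot be computed without running the page's JS.
    """
    import requests  # third-party: pip install requests

    session = requests.Session()
    session.cookies.set("__test", cookie_value)
    resp = session.get(url, timeout=10)
    if looks_like_js_challenge(resp.content):
        raise RuntimeError("got the challenge page back; the cookie is stale")
    return resp.content
```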
家丑人穷心不美
#3 · 2019-06-13 18:26

I would suggest a tool like dryscrape: https://github.com/niklasb/dryscrape

Additionally, you can find more info here: Using python with selenium to scrape dynamic web pages
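(A hedged sketch of the Selenium route, assuming the `selenium` package and a Chrome driver are installed; the function name is mine. Both Selenium and dryscrape follow the same load-the-page-so-its-JS-runs, then read-the-rendered-text pattern:)

```python
def fetch_with_browser(url: str) -> str:
    """Load `url` in headless Chrome so its JavaScript runs, then return
    the rendered page text."""
    from selenium import webdriver  # third-party: pip install selenium
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # "tag name" is the By.TAG_NAME locator strategy in Selenium 4.
        return driver.find_element("tag name", "body").text
    finally:
        driver.quit()
```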
