python requests return a different web page from b

Posted 2019-05-30 03:40

I use requests to scrape a web page for some content.
When I use

import requests
# requests needs the scheme in the URL, otherwise it raises MissingSchema
requests.get('http://example.org')

I get a different page from the one I get in my browser or when using

import urllib.request
# urlopen also needs the scheme in the URL
urllib.request.urlopen('http://example.org')

I tried using urllib, but it was really slow.
In a comparison test I ran, it was 50% slower than requests.

How do you solve this?

Tags: python python-requests urllib

1 answer

Ridiculous、

Reply #2 · 2019-05-30 04:04

After a lot of investigation, I found that the site sets a cookie in the response headers, and only on a visitor's first request to the site.

So the solution is to get the cookies with a HEAD request, then send them along with your GET request:

import requests
# Get the cookies with head(); this doesn't download the body, so it's fast
response = requests.head('http://example.com')
# Send the GET request with the cookies from the HEAD response
result = requests.get('http://example.com', cookies=response.cookies)

Now it's faster than urllib and returns the same result :)
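
If you make more than one request, a requests.Session may be a simpler way to do the same thing, since it stores the cookies a site sets and sends them back on later requests. Here is a minimal sketch, assuming the site sets its cookie on a HEAD request as described above. The helper name fetch_with_cookies and the 10-second timeout are my own choices for illustration, not part of requests.

```python
import requests


def fetch_with_cookies(url, session=None, timeout=10):
    """Prime cookies with a HEAD request, then GET the page with them.

    Hypothetical helper for illustration; not part of the requests API.
    """
    # A Session keeps cookies from each response and sends them back
    # automatically on later requests to the same site.
    if session is None:
        session = requests.Session()
    # HEAD fetches headers only. requests does not follow redirects for
    # HEAD by default, so turn that on in case the cookie is set after one.
    session.head(url, allow_redirects=True, timeout=timeout)
    # The GET reuses whatever cookies the HEAD response stored.
    return session.get(url, timeout=timeout)
```

Calling fetch_with_cookies('http://example.com') returns the usual Response object, so .text and .status_code work as before. If the site only sets the cookie on GET, this sketch would not help, and you would have to make the first request a GET on the same session instead.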
