requests.history not showing all redirects

Published 2019-03-05 22:41

Question:

I'm trying to get the redirects of some Wikipedia pages, and something curious is happening.

If I do:

>>> request = requests.get("https://en.wikipedia.org/wiki/barcelona", allow_redirects=True)
>>> request.url
u'https://en.wikipedia.org/wiki/Barcelona'
>>> request.history
[<Response [301]>]

As you can see, the redirect is followed correctly and I get the same URL in the browser as in Python.

But if I try:

>>> request = requests.get("https://en.wikipedia.org/wiki/Yardymli_Rayon", allow_redirects=True)
>>> request.url
u'https://en.wikipedia.org/wiki/Yardymli_Rayon'
>>> request.history
[]

And in the browser I see that the URL has changed to: https://en.wikipedia.org/wiki/Yardymli_District

Does anyone know how to solve this?

Answer 1:

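Requests doesn't show a redirect because you're not actually being redirected in the HTTP sense: Wikipedia answers the request for the redirect title with an ordinary (non-3xx) response that contains the target article. It then uses some JavaScript (probably the HTML5 History API, e.g. history.replaceState or pushState) to change the address shown in the address bar. Requests doesn't execute JavaScript, so that change never applies to it.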

In other words, both requests and your browser are correct: requests is showing the URL you actually requested (and Wikipedia actually served), while your browser's address bar is showing the 'proper', canonical URL.
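That canonical URL is declared in the HTML the server sends, as a <link rel="canonical"> element, so a script can recover it without running any JavaScript. Below is a minimal sketch using only the standard library's html.parser; the class and function names (CanonicalLinkParser, find_canonical_url) are hypothetical helpers, and it assumes the page declares its canonical link in that standard form.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Hypothetical helper: remembers the href of the first <link rel="canonical">."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; a self-closing
        # <link ... /> is routed here too by the default handle_startendtag.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so split before testing.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_tokens and href:
            self.canonical = href


def find_canonical_url(html: str, base_url: str = "") -> Optional[str]:
    """Return the canonical URL declared in html, resolved against base_url."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    parser.close()
    if parser.canonical is None:
        return None
    # urljoin leaves absolute hrefs unchanged and resolves relative ones.
    return urljoin(base_url, parser.canonical)
```

With Requests you would call it as find_canonical_url(request.text, request.url); for the Yardymli_Rayon request in the question it should return the Yardymli_District URL, assuming Wikipedia still emits the canonical link for redirected pages.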

You could parse the response and look for the <link rel="canonical"> tag if you want to find out the 'proper' URL from your script, or query Wikipedia's API instead, which can resolve redirects for you (the redirects parameter of action=query).
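For the API route, a minimal standard-library sketch is below. The query parameters (action=query, titles, redirects, format=json) and the "normalized" and "redirects" lists of from/to pairs in the response are part of the MediaWiki Action API; the helper names (build_query_url, resolve_title, title_to_url, fetch_resolved_url) and the USER_AGENT string are hypothetical placeholders, not an official client.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent (hypothetical); Wikimedia asks clients to identify themselves.
USER_AGENT = "redirect-resolver-example/0.1 (contact: you@example.com)"


def build_query_url(title: str) -> str:
    """Build an action=query URL that asks the API to resolve redirects."""
    params = {"action": "query", "titles": title, "redirects": 1, "format": "json"}
    return API_URL + "?" + urlencode(params)


def resolve_title(data: dict, title: str) -> str:
    """Follow the 'normalized' and then 'redirects' mappings in an API response."""
    query = data.get("query", {})
    # Normalization (e.g. underscores -> spaces) comes first, then redirects.
    # Entries are applied in list order, which also follows a chain of
    # redirects if the API lists its hops in order.
    for key in ("normalized", "redirects"):
        for entry in query.get(key, []):
            if entry.get("from") == title:
                title = entry["to"]
    return title


def title_to_url(title: str) -> str:
    """Turn a page title back into an article URL on en.wikipedia.org."""
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


def fetch_resolved_url(title: str) -> str:
    """Ask the live API where `title` ends up (needs network access)."""
    request = Request(build_query_url(title), headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        data = json.load(response)
    return title_to_url(resolve_title(data, title))
```

Calling fetch_resolved_url("Yardymli_Rayon") should give the Yardymli_District article URL, assuming the redirect still exists on Wikipedia. The parsing is split from the network call so it can be checked offline against a saved response.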