urllib2 not retrieving entire HTTP response

Posted 2019-01-17 17:37

I'm perplexed as to why I'm not able to download the entire contents of some JSON responses from FriendFeed using urllib2.

>>> import urllib2
>>> stream = urllib2.urlopen('http://friendfeed.com/api/room/the-life-scientists/profile?format=json')
>>> stream.headers['content-length']
'168928'
>>> data = stream.read()
>>> len(data)
61058
>>> # We can see here that I did not retrieve the full JSON
... # given that the stream doesn't end with a closing }
... 
>>> data[-40:]
'ce2-003048343a40","name":"Vincent Racani'

How can I retrieve the full response with urllib2?
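The check done by hand in the session above (comparing Content-Length with len(data), then looking for the closing brace) can be automated. The sketch below is a minimal illustration under assumptions. `check_body` is a hypothetical helper, not part of urllib2. The headers are passed as a plain dict rather than a real response object, so the example runs without network access. It targets Python 3, where the body is bytes.

```python
import json


def check_body(headers, body):
    """Return a list of problems found in a downloaded JSON body (empty if none).

    Assumes `headers` is a dict with lowercase keys and `body` is the raw
    bytes read from the response (hypothetical helper for illustration).
    """
    problems = []
    declared = headers.get("content-length")
    if declared is not None and int(declared) != len(body):
        problems.append("got %d of %s declared bytes" % (len(body), declared))
    try:
        json.loads(body.decode("utf-8"))
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        problems.append("body is not valid JSON (possibly truncated)")
    return problems


full = b'{"name": "example"}'  # made-up sample body
print(check_body({"content-length": str(len(full))}, full))        # []
print(check_body({"content-length": str(len(full))}, full[:10]))  # two problems reported
```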

4 answers
Reply #2 · 2019-01-17 17:40

The best way to get all of the data is to keep reading until read() returns nothing:

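import urllib2

fp = urllib2.urlopen("http://www.example.com/index.cfm")

chunks = []
while True:
    data = fp.read(8192)   # read at most 8 KB per call
    if not data:           # read() returns '' only at EOF, so `not data` is enough
        break
    chunks.append(data)
response = "".join(chunks)  # join once instead of repeated string concatenation

print response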

The reason is that, given the nature of sockets, a single .read() call isn't guaranteed to return the entire response. I thought this was discussed in the documentation (maybe for urllib), but I can't find it.
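To see why looping matters, the read-until-empty pattern can be exercised without a network connection. In this minimal sketch, the hypothetical `ShortReader` class wraps `io.BytesIO` and deliberately returns at most a few bytes per `read()` call. That mimics a socket that delivers data in pieces. `read_all` keeps calling `read(size)` until it gets an empty result, which is how file-like objects signal EOF. The sketch is written for Python 3, so the data is bytes rather than str.

```python
import io


class ShortReader(object):
    """Hypothetical file-like wrapper that returns at most `limit` bytes per read()."""

    def __init__(self, data, limit):
        self._buf = io.BytesIO(data)
        self._limit = limit

    def read(self, size=-1):
        # Clamp every request to `limit` bytes, as a socket may deliver short reads.
        if size < 0 or size > self._limit:
            size = self._limit
        return self._buf.read(size)


def read_all(stream, chunk_size=8192):
    """Call stream.read(chunk_size) until it returns b'' (EOF), then join the chunks."""
    chunks = []
    while True:
        data = stream.read(chunk_size)
        if not data:  # an empty bytes object means the stream is exhausted
            break
        chunks.append(data)
    return b"".join(chunks)


payload = b'{"items": [1, 2, 3]}' * 100
print(len(ShortReader(payload, limit=7).read()))           # 7: one call is not enough
print(read_all(ShortReader(payload, limit=7)) == payload)  # True
```

Collecting chunks in a list and joining once at the end avoids the repeated copying that `response += data` can cause for large bodies.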

Bombasti
Reply #3 · 2019-01-17 17:41

Use tcpdump (or something like it) to monitor the actual network traffic; then you can analyze why the site is broken for some client libraries. Script the test and run it several times so you can see whether the problem is consistent:

import urllib2

url = 'http://friendfeed.com/api/room/friendfeed-feedback/profile?format=json'
stream = urllib2.urlopen(url)
expected = int(stream.headers['content-length'])  # size the server says it is sending
data = stream.read()                              # bytes actually received
datalen = len(data)
print expected, datalen, expected == datalen      # False means the body was truncated

The site is working consistently for me, so I can't show an example of a failure :)
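The suggestion to script the test and repeat it can be sketched as a small loop. This is a minimal illustration under assumptions. `fetch` is a hypothetical callable standing in for the urlopen/read step, and it must return the declared Content-Length together with the bytes actually read. This keeps the example runnable offline. The fake fetcher truncates every third attempt purely to show what a failure report looks like. That is invented behavior, not a FriendFeed result.

```python
import itertools


def run_trials(fetch, trials=10):
    """Call fetch() `trials` times; return (attempt, expected, got) for each mismatch."""
    failures = []
    for attempt in range(1, trials + 1):
        expected, data = fetch()
        if len(data) != expected:
            failures.append((attempt, expected, len(data)))
    return failures


def make_fake_fetch(body):
    """Hypothetical stand-in for a real request: truncates every third response."""
    counter = itertools.count(1)

    def fetch():
        n = next(counter)
        data = body[: len(body) // 2] if n % 3 == 0 else body
        return len(body), data

    return fetch


for attempt, expected, got in run_trials(make_fake_fetch(b"x" * 1000), trials=9):
    print("attempt %d: expected %d bytes, got %d" % (attempt, expected, got))
```

Against a real server, `fetch` would open the URL, read the body fully, and return `(int(headers['content-length']), body)`.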

淡お忘
Reply #4 · 2019-01-17 17:43

Calling stream.readlines() also works.
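`readlines()` keeps reading until EOF and returns a list of lines, so the full body is recovered by joining them back together. This is a minimal sketch that uses `io.BytesIO` as a stand-in for the response object so it runs offline. That substitution is an assumption; a urllib2 response object also provides `readlines()`. The code targets Python 3.

```python
import io
import json

# Stand-in for the HTTP response (made-up JSON body spread over two lines).
response = io.BytesIO(b'{"a": 1,\n "b": [2, 3]}\n')

lines = response.readlines()  # reads to EOF, keeping each b"\n" line ending
body = b"".join(lines)        # reassemble the original bytes exactly
print(json.loads(body.decode("utf-8")))  # {'a': 1, 'b': [2, 3]}
```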

一纸荒年 Trace。
Reply #5 · 2019-01-17 17:45

Keep calling stream.read() until it returns an empty string, which means the response is done:

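# iter() calls stream.read(8192) repeatedly and stops when it returns '' (EOF)
for data in iter(lambda: stream.read(8192), ''):
    handle(data)  # hypothetical handler: do stuff with each chunk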