python-requests: fetching the head of the response content without consuming it all

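Using python-requests and python-magic, I would like to test the MIME type of a web resource without fetching all of its content (especially if the resource happens to be, say, an OGG file or a PDF file). Based on the result, I might then decide to fetch the whole thing. However, calling .text after testing the MIME type only returns what has not been consumed yet. How can I test the MIME type without consuming the response content?

Here is my current code.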

import requests
import magic


r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
mime = magic.from_buffer(r.iter_content(256).next(), mime=True)

if mime == "text/html":
    print(r.text)  # I'd like r.text to give me the entire response content

Thanks!

Answer 1:

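Note: at the time this question was asked, the correct way to fetch only the headers and stream the body was prefetch=False. That option has since been renamed to stream and its boolean value inverted, so you now want stream=True.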
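As a rough sketch of the same idea with the renamed option: the helper below peeks at the first chunk, sniffs it, and only reads the rest when the type matches. The names read_if_mime and fetch_if_mime are made up for illustration, and the wrapper assumes both requests and python-magic are installed.

```python
def read_if_mime(response, sniff, wanted="text/html", peek_size=256):
    """Return the whole body as bytes if the first chunk sniffs as `wanted`, else None.

    `response` is anything exposing requests' iter_content(chunk_size) method;
    `sniff` is a callable mapping a bytes buffer to a MIME type string.
    """
    chunks = response.iter_content(peek_size)
    peek = next(chunks, b"")          # b"" if the body is empty
    if sniff(peek) != wanted:
        return None                   # the rest of the body is never read
    return peek + b"".join(chunks)    # re-attach the peeked bytes


def fetch_if_mime(url, wanted="text/html"):
    """Hypothetical wrapper wiring read_if_mime to requests and python-magic."""
    # Imported lazily so read_if_mime stays usable without the third-party packages.
    import magic
    import requests

    # stream=True fetches only the headers; the body is pulled on demand.
    response = requests.get(url, stream=True)
    try:
        return read_if_mime(
            response, lambda buf: magic.from_buffer(buf, mime=True), wanted
        )
    finally:
        response.close()  # release the connection even if the body was skipped
```

Calling fetch_if_mime("http://www.december.com/html/demo/hello.html") would then return the page bytes, or None for a resource that sniffs as something else.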

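The original answer follows.

Once you use iter_content(), you have to keep using it; .text uses the same interface under the hood (indirectly, via .content).

In other words, by using iter_content() at all, you have to do the work that .text does by hand: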

import magic
import requests
from requests.compat import chardet

# stream=True defers the body download (older releases spelled this prefetch=False)
r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = next(r.iter_content(256))  # first chunk only, enough for MIME sniffing
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    # re-attach the peeked bytes, then read whatever is left of the body
    contents = peek + b''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = str(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = str(contents, errors='replace')
    print(textcontent)

This assumes you are using Python 3.
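If the decoding step is needed in more than one place, it could be pulled out into a small helper along these lines. This is only a sketch: decode_body and its detect parameter are names invented here, with detect expected to behave like chardet.detect (returning a dict with an 'encoding' key), and UTF-8 chosen as the fallback by assumption.

```python
def decode_body(raw, declared=None, detect=None):
    """Decode `raw` bytes to str, roughly the way the hand-written code above does.

    `declared` is the encoding taken from the response (r.encoding), if any.
    `detect` is an optional chardet.detect-style callable used when nothing
    was declared. UTF-8 is the assumed last resort.
    """
    encoding = declared
    if encoding is None and detect is not None:
        encoding = detect(raw).get("encoding")
    try:
        return raw.decode(encoding or "utf-8", errors="replace")
    except LookupError:
        # Unknown codec name: fall back instead of failing.
        return raw.decode("utf-8", errors="replace")
```

With the variables from the answer, the call would look like decode_body(contents, r.encoding, chardet.detect).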

An alternative is to make two requests:

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
mime = magic.from_buffer(next(r.iter_content(256)), mime=True)
r.close()  # drop the partially read response

if mime == "text/html":
    # second, ordinary request: .text now covers the whole body
    print(requests.get("http://www.december.com/html/demo/hello.html").text)

Python 2 version of the first approach:

r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
peek = r.iter_content(256).next()
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    contents = peek + ''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = unicode(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = unicode(contents, errors='replace')
    print(textcontent)

Answer 2:

If the "Content-Type" header is enough, you can issue an HTTP "HEAD" request instead of a "GET", so that you receive only the HTTP headers.

import requests

url = 'http://www.december.com/html/demo/hello.html'
response = requests.head(url)
print(response.headers['content-type'])
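The header value often carries parameters such as a charset, and some servers omit it altogether, so a comparison against a bare MIME type may need a little normalisation first. A minimal sketch, with media_type and declared_mime as made-up helper names:

```python
def media_type(content_type):
    """Reduce a Content-Type header value to its bare, lower-cased media type.

    Returns None when the header is missing or empty.
    """
    if not content_type:
        return None
    # Drop parameters such as "; charset=UTF-8".
    return content_type.split(";", 1)[0].strip().lower()


def declared_mime(url):
    """Hypothetical wrapper: HEAD the URL and return the declared media type."""
    import requests  # imported lazily so media_type works without requests

    # requests.head() does not follow redirects unless asked to.
    response = requests.head(url, allow_redirects=True)
    return media_type(response.headers.get("content-type"))
```

Keep in mind that this reports what the server declares, whereas python-magic inspects the actual bytes, so the two can disagree.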

Source: python-requests: fetching the head of the response content without consuming it all