Requests: Explanation of the .text format

Question:

I'm using the requests module along with Python 2.7 to build a basic web crawler.

source_code = requests.get(url)
plain_text = source_code.text

In the lines above, I'm storing the source code of the specified URL, along with other metadata, in the source_code variable. But what exactly is the .text attribute in source_code.text? It is not a function, and I couldn't find anything in the documentation that explains where .text comes from or what it does.

Answer 1:

requests.get() returns a Response object, and it is that object that has the .text attribute. The object is not the "source code" of the URL; it lets you access the source code (the body) of the response, as well as other information. The Response.text attribute gives you the body of the response, decoded to unicode.
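Why does .text work without parentheses if it is computed? In Requests, Response.text is a Python property: a method that runs automatically whenever the attribute is read. The class below is a hypothetical, heavily simplified stand-in (MiniResponse is not part of Requests) that only illustrates that mechanism, assuming the encoding is already known. It is written for Python 3, where the decoded result is str rather than Python 2's unicode.

```python
class MiniResponse:
    """Hypothetical, simplified stand-in for requests.Response (not the real class)."""

    def __init__(self, content, encoding):
        self.content = content    # raw body bytes, as received from the server
        self.encoding = encoding  # name of the codec used to decode the body

    @property
    def text(self):
        # Runs on every attribute access, so callers write r.text, not r.text().
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
        return self.content.decode(self.encoding, errors="replace")


r = MiniResponse(b"<html>caf\xc3\xa9</html>", "utf-8")
print(type(r.content).__name__)  # bytes
print(r.text)                    # <html>café</html>
```

Because the decoding happens at access time, changing r.encoding and then reading r.text again yields a differently decoded string from the same bytes.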

See the Response Content section of the Quickstart documentation:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text.
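As a rough illustration of a header-only guess, the hypothetical helper below (charset_from_headers is not a Requests function) reads the charset parameter of the Content-Type header and otherwise applies the RFC 2616 default of ISO-8859-1 for text/* types. Requests does something along these lines internally, but this is a simplified sketch in Python 3, not its source code.

```python
from email.message import Message


def charset_from_headers(headers):
    """Hypothetical helper: guess a body encoding from HTTP response headers."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type")
    if not content_type:
        return None  # no header to go on; a detector would be needed instead

    # Let the stdlib parse 'type/subtype; charset=...' (returned lower-cased).
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset:
        return charset

    # RFC 2616, section 3.7.1: text/* without a charset defaults to ISO-8859-1.
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"
    return None


print(charset_from_headers({"Content-Type": "text/html; charset=UTF-8"}))  # utf-8
print(charset_from_headers({"content-type": "text/html"}))                 # ISO-8859-1
print(charset_from_headers({}))                                            # None
```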

Further information can be found in the API documentation; see the Response.text entry:

Content of the response, in unicode.

If Response.encoding is None, encoding will be guessed using chardet.

The encoding of the response content is determined based solely on HTTP headers, following RFC 2616 to the letter. If you can take advantage of non-HTTP knowledge to make a better guess at the encoding, you should set r.encoding appropriately before accessing this property.

You can also use Response.content to access the response body undecoded, as raw bytes.
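The difference between .content and .text, and why you might set r.encoding first, can be shown with plain bytes. The sketch below assumes a hypothetical page whose body is UTF-8 but whose Content-Type header is "text/html" with no charset, so a header-only guess would pick ISO-8859-1. It uses only the Python 3 standard library; with Requests, the override step corresponds to assigning r.encoding = "utf-8" before reading r.text.

```python
# Hypothetical body: UTF-8 bytes served as "text/html" with no charset parameter.
content = "<p>naïve café</p>".encode("utf-8")  # what r.content would hold

# Header-only guess for text/* without a charset (RFC 2616 default).
guessed = "ISO-8859-1"
print(content.decode(guessed))   # <p>naÃ¯ve cafÃ©</p>

# If you know the page is really UTF-8 (e.g. from a <meta> tag), override first.
override = "utf-8"
print(content.decode(override))  # <p>naïve café</p>
```

The raw bytes never change; only the codec used to turn them into text does, which is why .content is the safer choice for binary data such as images.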

Answer 2:

In this line:

source_code = requests.get(url)

source_code holds a Response object, not the source code.

It should be:

import requests

response = requests.get(url)   # a Response object
source_code = response.text    # the decoded body of the response
