Upload a large XML file with Python Requests library

Posted 2020-06-04 08:59

I'm trying to replace curl with Python & the requests library. With curl, I can upload a single XML file to a REST server with the curl -T option. I have been unable to do the same with the requests library.

A basic scenario works:

payload = '<person test="10"><first>Carl</first><last>Sagan</last></person>'
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=payload, headers=headers, auth=HTTPDigestAuth("*", "*"))

When I change payload to a bigger string by opening an XML file, the .put method hangs (I use the codecs library to get a proper unicode string). For example, with a 66KB file:

xmlfile = codecs.open('trb-1996-219.xml', 'r', 'utf-8')
headers = {'content-type': 'application/xml'}
content = xmlfile.read()
r = requests.put(url, data=content, headers=headers, auth=HTTPDigestAuth("*", "*"))

I've been looking into using the multipart option (files), but the server doesn't seem to like that.

So I was wondering if there is a way to simulate curl -T behaviour in Python requests library.

UPDATE 1: The program hangs in TextMate, but raises a UnicodeEncodeError on the command line. That seems to be the real problem. So the question becomes: is there a way to send unicode strings to a server with the requests library?

UPDATE 2: Thanks to the comment of Martijn Pieters the UnicodeEncodeError went away, but a new issue turned up. With a literal (ASCII) XML string, logging shows the following lines:

2012-11-11 15:55:05,154 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:55:05,294 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:55:05,430 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 201 0

Seems the server always bounces the first authentication attempt (?) but then accepts the second one.

With a file object (open('trb-1996-219.xml', 'rb')) passed to data, the logfile shows:

2012-11-11 15:50:54,309 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:50:55,105 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:51:25,603 WARNING Retrying (0 attempts remain) after connection broken by 'BadStatusLine("''",)': /v1/documents?uri=/example/test.xml

So, first attempt is blocked as before, but no second attempt is made.

According to Martijn Pieters (below), the second issue can be explained by a faulty server (empty line). I will look into this, but if someone has a workaround (apart from using curl) I wouldn't mind hearing it.

And I am still surprised that the requests library behaves so differently for a small string and a file object. Isn't the file object serialized before it gets to the server anyway?

3 answers
甜甜的少女心
#2 · 2020-06-04 09:41

To PUT large files, don't read them into memory. Simply pass the file as the data keyword:

xmlfile = open('trb-1996-219.xml', 'rb')
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=xmlfile, headers=headers, auth=HTTPDigestAuth("*", "*"))

Moreover, you were opening the file as unicode (decoding it from UTF-8). Since you are sending it to a remote server, you need raw bytes, not unicode values, so you should open the file in binary mode instead.
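To make the bytes-versus-unicode point concrete, here is a small sketch that needs no server. It assumes Python 3, and the sample XML and file name are made up for illustration. It writes a short document with one non-ASCII character, then reads it back in text mode and in binary mode:

```python
import os
import tempfile

# Hypothetical sample document with one non-ASCII character (e with diaeresis).
xml_text = '<person><first>Zo\u00eb</first></person>'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.xml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(xml_text)

    # Text mode decodes the stored bytes into a str of code points.
    with open(path, 'r', encoding='utf-8') as f:
        as_text = f.read()

    # Binary mode returns the bytes exactly as stored on disk,
    # which is what actually has to travel over the wire.
    with open(path, 'rb') as f:
        as_bytes = f.read()

text_length = len(as_text)    # number of characters
byte_length = len(as_bytes)   # number of bytes on disk
```

The two lengths differ as soon as the file contains anything outside ASCII, which is one reason to hand the HTTP layer the bytes rather than a decoded string.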

乱世女痞
#3 · 2020-06-04 09:54

I used requests in Python to upload an XML file with the following commands. First, open the file with open() and read its contents:

file = open("PIR.xsd")
fragment = file.read()
file.close()

Then copy the XML data into the payload of the request and POST it:

payload = {'key':'PFAkrzjmuZR957','xmlFragment':fragment}
r = requests.post(URL,data=payload)

To check the validation result the server returns:

print (r.text)

一纸荒年 Trace。
#4 · 2020-06-04 09:56

Digest authentication always requires at least two requests to the server. The first request doesn't contain any authentication data. It fails with a 401 "Authorization required" response code and a digest challenge (called a nonce) to be used for hashing your password etc. (the exact details don't matter here). The challenge is then used to make a second request to the server containing your credentials hashed with it.
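For the curious, the hashing step can be sketched with nothing but hashlib. This is a rough illustration of the MD5 / qop=auth variant described in RFC 2617, not the code the requests library runs internally. The function names are my own, and the sample values are the ones from the RFC's worked example, not from the asker's server:

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a text string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop='auth'):
    """Compute the 'response' field of a Digest Authorization header."""
    ha1 = md5_hex(f'{username}:{realm}:{password}')  # who you are
    ha2 = md5_hex(f'{method}:{uri}')                 # what you are asking for
    # The server's nonce ties the hash to this particular challenge.
    return md5_hex(f'{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}')


# Sample values taken from the worked example in RFC 2617.
response = digest_response(
    username='Mufasa',
    realm='testrealm@host.com',
    password='Circle Of Life',
    method='GET',
    uri='/dir/index.html',
    nonce='dcd98b7102dd2f0e8b11d0f600bfb0c093',
    nc='00000001',
    cnonce='0a4f113b',
)
```

Because the nonce only arrives with the 401 response, the client cannot build this value before it has made at least one request, hence the two round trips.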

The problem lies in this two-step authentication: your large file was already sent with the first, unauthorized request (sent in vain), but on the second request the file object is already at the EOF position. Since the file size was also sent in the Content-Length header of the second request, the server waits for a file that will never be sent.
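The EOF effect is easy to reproduce without any network traffic. In this sketch an in-memory io.BytesIO stands in for the opened XML file (the content is a made-up example), and the two read() calls stand in for the two requests:

```python
import io

# Stand-in for the opened XML file; the content is a made-up example.
body = io.BytesIO(b'<person><first>Carl</first></person>')

first_send = body.read()    # first (unauthenticated) request consumes the stream
second_send = body.read()   # retry without rewinding: nothing is left to send

body.seek(0)                # rewinding restores the full payload
after_rewind = body.read()
```

A string payload has no read position to lose, which would explain why the small literal string worked while the file object did not.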

You could solve it by using a requests Session and first making a simple request for authentication purposes (say, a GET request). Then make a second PUT request containing the actual payload, using the same digest challenge from the first request.

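import requests
from requests.auth import HTTPDigestAuth

sess = requests.Session()
sess.auth = HTTPDigestAuth("*", "*")
sess.get(url)  # throwaway request that completes the digest handshake
headers = {'content-type': 'application/xml'}
# Open in binary mode so the raw bytes are sent unchanged
with open('trb-1996-219.xml', 'rb') as xmlfile:
    sess.put(url, data=xmlfile, headers=headers)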