cgi.parse_multipart function throws TypeError in P

Posted 2020-07-06 03:42

Question:

I'm working through an exercise from Udacity's Full Stack Foundations course. I have a do_POST method in my subclass of BaseHTTPRequestHandler, and I want to read a POST value named message that is submitted with a multipart form. This is the code for the method:

def do_POST(self):
    try:
        if self.path.endswith("/Hello"):
            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.end_headers()
            ctype, pdict = cgi.parse_header(self.headers['content-type'])
            if ctype == 'multipart/form-data':
                fields = cgi.parse_multipart(self.rfile, pdict)
                messagecontent = fields.get('message')
            output = ""
            output += "<html><body>"
            output += "<h2>Ok, how about this?</h2>"
            output += "<h1>{}</h1>".format(messagecontent)
            output += "<form method='POST' enctype='multipart/form-data' action='/Hello'>"
            output += "<h2>What would you like to say?</h2>"
            output += "<input name='message' type='text'/><br/><input type='submit' value='Submit'/>"
            output += "</form></body></html>"
            self.wfile.write(output.encode('utf-8'))
            print(output)
            return
    except:
        self.send_error(404, "{}".format(sys.exc_info()[0]))
        print(sys.exc_info())

The problem is that cgi.parse_multipart(self.rfile, pdict) throws an exception: TypeError: can't concat bytes to str. The implementation was provided in the course videos, but they use Python 2.7 and I'm using Python 3. I've looked for a solution all afternoon but could not find anything useful. What is the correct way to read data passed from a multipart form in Python 3?

Answer 1:

I came here trying to solve the same problem. I found a simple workaround: convert the 'boundary' item in the dictionary from a string to bytes, specifying an encoding.

    ctype, pdict = cgi.parse_header(self.headers['content-type'])
    pdict['boundary'] = bytes(pdict['boundary'], "utf-8")
    if ctype == 'multipart/form-data':
        fields = cgi.parse_multipart(self.rfile, pdict)

In my case, it seems to work properly.
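
To see why the conversion helps, here is a minimal sketch of the underlying type mismatch. It assumes, as the TypeError in the question suggests, that parse_multipart joins the boundary with bytes literals. The boundary value is made up for illustration.

```python
# Hypothetical boundary value, as cgi.parse_header would return it (a str).
boundary_text = "abc123"

# Joining a bytes literal with a str raises TypeError on Python 3.
try:
    b"--" + boundary_text
    concat_failed = False
except TypeError:
    concat_failed = True

# Converting the boundary to bytes first makes the concatenation valid.
boundary_bytes = bytes(boundary_text, "utf-8")
next_part = b"--" + boundary_bytes
last_part = b"--" + boundary_bytes + b"--"

print(concat_failed, next_part, last_part)
```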



Answer 2:

To make the tutor's code work on Python 3, there are three error messages you'll have to deal with:

If you get these error messages

c_type, p_dict = cgi.parse_header(self.headers.getheader('Content-Type'))
AttributeError: 'HTTPMessage' object has no attribute 'getheader'

or

boundary = pdict['boundary'].decode('ascii')
AttributeError: 'str' object has no attribute 'decode'

or

headers['Content-Length'] = pdict['CONTENT-LENGTH']
KeyError: 'CONTENT-LENGTH'

when running

c_type, p_dict = cgi.parse_header(self.headers.getheader('Content-Type'))
if c_type == 'multipart/form-data':
    fields = cgi.parse_multipart(self.rfile, p_dict)
    message_content = fields.get('message')

this applies to you.

Solution

First of all change the first line to accommodate Python 3:

- c_type, p_dict = cgi.parse_header(self.headers.getheader('Content-Type'))
+ c_type, p_dict = cgi.parse_header(self.headers.get('Content-Type'))

Secondly, the error about 'str' object having no attribute 'decode' occurs because strings are Unicode strings as of Python 3, instead of being equivalent to byte strings as in Python 2, while parse_multipart expects the boundary as bytes. To fix it, add this line just under the one above:

p_dict['boundary'] = bytes(p_dict['boundary'], "utf-8")

Thirdly, to fix the KeyError caused by the missing 'CONTENT-LENGTH' entry in p_dict, add these lines before the if statement:

content_len = int(self.headers.get('Content-length'))
p_dict['CONTENT-LENGTH'] = content_len
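
The second and third fixes can be gathered into one small helper, sketched below. The name prepare_pdict and the fallback of 0 for a missing Content-Length header are assumptions made for illustration and are not part of the course code. The helper only prepares the dictionary, so it does not call cgi itself.

```python
def prepare_pdict(p_dict, headers):
    """Return a copy of p_dict in the form parse_multipart expects on Python 3.

    p_dict  -- the dict returned by cgi.parse_header (boundary is a str)
    headers -- any mapping with .get(), such as self.headers in the handler
    """
    prepared = dict(p_dict)

    # Fix 2: the boundary must be bytes, not str.
    boundary = prepared.get("boundary")
    if isinstance(boundary, str):
        prepared["boundary"] = boundary.encode("utf-8")

    # Fix 3: the body length goes in under the 'CONTENT-LENGTH' key.
    # Assumption: fall back to 0 when the header is absent.
    prepared["CONTENT-LENGTH"] = int(headers.get("Content-Length", 0))
    return prepared


# Example with plain dicts standing in for the parsed header and self.headers.
example = prepare_pdict({"boundary": "abc"}, {"Content-Length": "42"})
print(example)
```

In the handler this would be used as p_dict = prepare_pdict(p_dict, self.headers) just before the call to cgi.parse_multipart.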

Full solution on my GitHub:

https://github.com/rSkogeby/web-server



Answer 3:

I am doing the same course and ran into the same problem. Instead of getting it to work with cgi, I am now using the urllib.parse module. This was shown in the same course just a few lessons earlier.

from urllib.parse import parse_qs

length = int(self.headers.get('Content-length', 0))
body = self.rfile.read(length).decode()
params = parse_qs(body)

messagecontent = params["message"][0]

For this to work, you have to remove enctype='multipart/form-data' from your form, so that the browser submits the default URL-encoded body that parse_qs understands.
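
As a rough sketch of how this behaves on a URL-encoded body, the snippet below wraps the same parse_qs call in a helper. The function name read_form_field and the empty-string default are hypothetical, and the sample body is made up.

```python
from urllib.parse import parse_qs


def read_form_field(raw_body, field_name, default=""):
    """Return the first value of field_name from a URL-encoded request body."""
    # parse_qs maps each field name to a list of values.
    params = parse_qs(raw_body.decode("utf-8"))
    values = params.get(field_name)
    return values[0] if values else default


# A form with one text input named 'message' would send a body like this.
message = read_form_field(b"message=hello+world", "message")
print(message)
```

Using params.get instead of params["message"] avoids a KeyError when the field is missing or was submitted empty, since parse_qs drops blank values by default.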



Answer 4:

Another hack solution is to edit the source of the cgi module.

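At the very beginning of parse_multipart (around line 226), change each use of boundary in a concatenation to boundary.encode('ascii'), so that the str boundary is converted to bytes before it is joined with the bytes literals:

...
boundary = b""
if 'boundary' in pdict:
    boundary = pdict['boundary']
if not valid_boundary(boundary):
    raise ValueError('Invalid boundary in multipart form: %r'
                     % (boundary,))

# pdict['boundary'] arrives as a str, so encode it before joining with bytes
nextpart = b"--" + boundary.encode('ascii')
lastpart = b"--" + boundary.encode('ascii') + b"--"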
...


Answer 5:

In my case I used cgi.FieldStorage instead of cgi.parse_multipart to extract the file and the name:

form = cgi.FieldStorage(
    fp=self.rfile,
    headers=self.headers,
    environ={'REQUEST_METHOD':'POST',
             'CONTENT_TYPE':self.headers['Content-Type'],
             })

print('File', form['file'].file.read())
print('Name', form['name'].value)
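
The cgi module was deprecated in Python 3.11 and removed in Python 3.13, so on newer interpreters neither cgi.FieldStorage nor cgi.parse_multipart is available. As one possible alternative, and not something used in the course, the standard library's email parser can split a multipart body. The sketch below is a swapped-in technique under that assumption. The function name parse_multipart_form and the sample body are made up for illustration.

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_multipart_form(content_type, body):
    """Return {field name: raw bytes} for a multipart/form-data body."""
    # Prepend the Content-Type header so the parser can find the boundary.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=HTTP).parsebytes(prefix + body)

    fields = {}
    if message.is_multipart():
        for part in message.iter_parts():
            # Each part names its field in the Content-Disposition header.
            name = part.get_param("name", header="content-disposition")
            fields[name] = part.get_payload(decode=True)
    return fields


# Hypothetical body, shaped like what a browser sends for one text field.
sample_body = (
    b"--xyz\r\n"
    b'Content-Disposition: form-data; name="message"\r\n'
    b"\r\n"
    b"hello\r\n"
    b"--xyz--\r\n"
)
fields = parse_multipart_form("multipart/form-data; boundary=xyz", sample_body)
print(fields)
```

In the handler, content_type would come from self.headers['Content-Type'] and body from self.rfile.read(length), with length taken from the Content-Length header.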