Python: Google API - Getting mimeTypes from a mess

2020-03-04 12:34发布

问题:

My goal is to use the Google API to take data from an email I have specified. Currently I can find the message, get the message data and decode the message data into a readable format. After this I need to find the correct part of my message (type text/html) and then scan for my link using beautiful soup. Unfortunately I don't understand enough about the structure of the email/Google API to scan for this specific part of the mail.

        try:
            message = gmail_service.users().messages().get(userId='me', id=thread['id'], format='raw').execute()
            print 'Message snippet: %s' % message['snippet']
            msg_str = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
            mime_msg = email.message_from_string(msg_str)

            print mime_msg        #this line gives the output I quoted
            for parts in mime_msg['payload']:    #this line produces error quoted
                if parts['text/html']:
                    mylink = base64.urlsafe_b64decode(part[0]['body']['data'].encode('UTF-8'))
                    print mylink

And the error that this code gives me is:

Traceback (most recent call last):
  File "gmailAPI.py", line 55, in <module>
    for parts in mime_msg['payload']:
TypeError: 'NoneType' object is not iterable

In the output for the code I also receive the information on different parts of the mail, this is the part I want:

----boundary_1_81681de2-2c9a-4827-802a-91544e5e6e28
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64

PCFET0NUWVBFIGh0bWwgUFVCTElDICItLy9XM0MvL0RURCBIVE1MIDQuMDEgVHJhbnNpdGlvbmFsLy9FTiINCiAgICJodHRwOi8vd3d3LnczLm9yZy9UUi9odG1sNC9sb29zZS5kdGQiPg0KDQo8aHRtbCBsYW5nPSJlbiI+DQo8aGVhZD4NCgk8bWV0YSBodHRwLWVxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRtbDsgY2hhcnNldD11dGYtOCI+DQoJPHRpdGxlPlNpZ251cDwvdGl0bGU+DQo8L2hlYWQ+DQoNCjxib2R5IGJnY29sb3I9IiNmZmZmZmYiIHRvcG1hcmdpbj0iMCIgbGVmdG1hcmdpbj0iMCIgbWFyZ2luaGVpZ2h0PSIwIiBtYXJnaW53aWR0aD0iMCIgc3R5bGU9Ii13ZWJraXQtZm9udC1zbW9vdGhpbmc6IGFudGlhbGlhc2VkO3dpZHRoOjEwMCUgIWltcG9ydGFudDtiYWNrZ3JvdW5kOiNmZmZmZmY7LXdlYmtpdC10ZXh0LXNpemUtYWRqdXN0Om5vbmU7Ij4NCg0KPHRhYmxlIHdpZHRoPSIxMDAlIiBjZWxscGFkZGluZz0iMCIgY2VsbHNwYWNpbmc9IjAiIGJvcmRlcj0iMCIgYmdjb2xvcj0iI2ZmZmZmZiI+DQoJPHRyPg0KCQk8dGQgYmdjb2xvcj0iI2ZmZmZmZiIgd2lkdGg9IjEwMCUiPg0KCQkJPHRhYmxlIHdpZHRoPSI2MDAiIGNlbGxwYWRkaW5nPSIwIiBjZWxsc3BhY2luZz0iMCIgYm9yZGVyPSIwIiBhbGlnbj0iY2VudGVyIiBjbGFzcz0i
dGFibGUiPg0KCQkJCTx0cj4NCgkJCQkJPHRkIHdpZHRoPSI2MDAiIGNsYXNzPSJjZWxsIj4NCgkgICAJCQkJCTx0YWJsZSB3aWR0aD0iNjAwIiBjZWxscGFkZGluZz0iMCIgY2VsbHNwYWNpbmc9IjAiIGNsYXNzPSJtYXN0Ij4NCgkJCQkJCQk8dHI+DQoJCQkJCQkJCTx0ZCB3aWR0aD0iMjUwIiBiZ2NvbG9yPSIjZmZmZmZmIj4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDxpbWcgc3JjPSJjaWQ6QlRfTG9nby5qcGciIGFsdD0iQm90dG9tbGluZSBsb2dvIiBzdHlsZT0iLW1zLWludGVycG9sYXRpb24tbW9kZTpiaWN1YmljOyI+PGJyLz48YnIgLz4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPC90ZD4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgIAk8L3RyPg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgIDx0cj4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPHRkIGFsaWduPSJsZWZ0IiB3aWR0aD0iMzUwIiBzdHlsZT0icGFkZGluZy1ib3R0b206IDE1cHg7IiB2YWxpZ249InRvcCIgYmdjb2xvcj0iI2ZmZmZmZiIgY2xhc3M9InN1YkxvZ28iPjxpbWcgc3JjPSJjaWQ6QlRfTGluZS5qcGciIGFsdD0ibGluZSI+PC90ZD4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICA8L3RyPg0KCQkJCQkJPC90YWJsZT4JDQogICAgICAgICAgICAgICAgICAgICAgICA8dGFibGUgd2lkdGg9IjEwMCUiIGNlbGxwYWRkaW5nPSIwIiBjZWxsc3BhY2luZz0iMCIgYm9yZGVyPSIwIj4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICA8dHI+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDx0ZCBiZ2NvbG9yPSIjZmZmZmZmIiBzdHlsZT0icGFkZGluZzogMjBweDsiIGNsYXNzPSJlbnRyeSIgdmFsaWduPSJ0b3AiPg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA8c3BhbiBzdHlsZT0iY29sb3I6IzMzMzMzMztmb250LXNpemU6MTRweDtsaW5lLWhlaWdodDoxLjI7Zm9udC1mYW1pbHk6J0hlbHZldGljYSBOZXVlJyxIZWx2ZXRpY2EsQXJpYWwsc2Fucy1zZXJpZjttYXJnaW4tYm90dG9tOjA7cGFkZGluZy10b3A6MDtwYWRkaW5nLWJvdHRvbTowO2ZvbnQtd2VpZ2h0Om5vcm1hbDsiPg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPGJyLz5UaGFuayB5b3UgZm9yIGNob29zaW5nIGVDb25uZWN0IE9ubGluZSBmcm9tIEJvdHRvbWxpbmUgVGVjaG5vbG9naWVzOyB5b3VyIHNlY3VyZSBjbG91ZCBkb2N1bWVudCBkZWxpdmVyeSBzZXJ2aWNlLiBUbyBjb21wbGV0ZSB0aGUgIHNldHVwIG9mIHlvdXIgYWNjb3VudCwgcGxlYXNlIGZvbGxvdyB0aGUgbGluayBiZWxvdy48YnIgLz4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDwvc3Bhbj48YnIgLz48YnIgLz48YnIgLz4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPGEgc3R5bGU9ImJhY2tncm91bmQtY29sb3I6IzNlN2Q2NTt0ZXh0LWRlY29yYXRpb246bm9uZTsgZm9udC1mYW1pbHk6J0hlbHZldGljYSBOZXVlJyxIZWx2ZXRpY2EsQXJpYWwsc2Fucy1zZXJpZjsgY29sb3I6I2ZmZmZmZjsgcGFkZGluZy10b3A6OHB4OyBwYWRkaW5nLWJvdHRvbTo4cHg7IHBhZGRpbmctbGVmdDo4cHg7IHBhZGRpbmctcmlnaHQ6OHB4OyBmb250LXNpemU6MThweDsgbWFyZ2luOiA4cHg7IiBocmVmPSJodHRwOi8vZWNvbm5lY3QuZW1lYS1ib3R0b21saW5lLnJvb3QuYm90dG9tbGluZS5jb20vYXBpL2FjY291bnQvc2lnbnVwY29tcGxldGUvMjBhNTE4YjktZGIzZS00OTkzLWFjN2UtYjE0YzZjMGVkMzMzIj48c3BhbiBzdHlsZT0iY29sb3I6I2ZmZmZmZiI+Q29tcGxldGUgQWNjb3VudCBTZXR1cCAmcmFxdW87PC9zcGFuPjwvYT48YnIgLz48YnIgLz48YnIgLz4NCg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPHNwYW4gc3R5bGU9ImNvbG9yOiMzMzMzMzM7Zm9udC1mYW1pbHk6J0hlbHZldGljYSBOZXVlJyxIZWx2ZXRpY2EsQXJpYWwsc2Fucy1zZXJpZjtmb250LXNpemU6MTRweDtsaW5lLWhlaWdodDoxLjI7Zm9u
dC1mYW1pbHk6J0hlbHZldGljYSBOZXVlJyxIZWx2ZXRpY2EsQXJpYWwsc2Fucy1zZXJpZjttYXJnaW4tYm90dG9tOjA7cGFkZGluZy10b3A6MDtwYWRkaW5nLWJvdHRvbTowO2ZvbnQtd2VpZ2h0Om5vcm1hbDsiPg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPGJyLz5LaW5kIFJlZ2FyZHMsPGJyIC8+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgVGhlIGVDb25uZWN0IE9ubGluZSBUZWFtDQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA8L3NwYW4+PGJyIC8+PGJyIC8+DQoNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDxzcGFuIHN0eWxlPSJjb2xvcjojMzMzMzMzO2ZvbnQtc2l6ZToxNHB4O2xpbmUtaGVpZ2h0OjEuMjtmb250LWZhbWlseTonSGVsdmV0aWNhIE5ldWUnLEhlbHZldGljYSxBcmlhbCxzYW5zLXNlcmlmO21hcmdpbi1ib3R0b206MDtwYWRkaW5nLXRvcDowO3BhZGRpbmctYm90dG9tOjA7Zm9udC13ZWlnaHQ6bm9ybWFsOyI+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA8YnIvPkZvciBzdXBwb3J0IHBsZWFzZSBjb250YWN0OiBlbWVhLXN1cHBvcnRAYm90dG9tbGluZS5jb20gPGJyIC8+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgVGVsOiAwODcwIDA4MSA4MjUwPGJyIC8+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA8L3NwYW4+DQoNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPC90ZD4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICA8L3RyPg0KICAgICAgICAgICAgICAgICAgICAgICAgPC90YWJsZT4NCiAgICAgICAgICAgICAgICAgICAgICAgIDxici8+DQoJCQkJCQkJCQkNCiAgICAgICAgICAgICAgICAgICAgPC90ZD4NCiAgICAgICAgICAgICAgICA8L3RyPg0KICAgICAgICAgICAgPC90YWJsZT4NCgkJPC90ZD4NCgk8L3RyPg0KPC90YWJsZT4NCgkJCQkNCjx0YWJsZSB3aWR0aD0iNjAwIiBjZWxscGFkZGluZz0iMCIgY2VsbHNwYWNpbmc9IjAiIGJvcmRlcj0iMCIgYWxpZ249ImNlbnRlciIgY2xhc3M9ImZvb3RlciI+DQogICAgPHRyPg0KICAgICAgICA8dGQ+DQogICAgICAgICAgICA8dGFibGUgd2lkdGg9IjYwMCIgY2VsbHBhZGRpbmc9IjAiIGNlbGxzcGFjaW5nPSIwIiBib3JkZXI9IjAiIGFsaWduPSJjZW50ZXIiIGNsYXNzPSJ0YWJsZSIgc3R5bGU9ImJvcmRlci10b3A6MXB4IHNvbGlkICNjY2NjY2M7Ij4NCiAgICAgICAgICAgICAgICA8dHI+DQogICAgICAgICAgICAgICAgICAgIDx0ZD4NCiAgICAgICAgICAgICAgICAgICAgICAgIDwvdGQ+DQogICAgICAgICAgICAgICAgPC90cj4NCiAgICAgICAgICAgICAgICA8dHI+DQogICAgICAgICAgICAgICAgICAgIDx0ZD48cCBzdHlsZT0iZm9udC1mYW1pbHk6dmVyZGFuYTsgY29sb3I6IzQ0NDQ0NDsgZm9udC1zaXplOjEwcHg7Ij4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICZjb3B5OyAyMDE0IEJvdHRvbWxpbmUgVGVjaG5vbG9naWVzLCBJbmMuIEFsbCBSaWdodHMgUmVzZXJ2ZWQ8L3A+PC90ZD4NCiAgICAgICAgICAgICAgICA8L3RyPgkNCiAgICAgICAgICAgIDwvdGFibGU+ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICANCiAgICAgICAgPC90ZD4NCiAgICA8L3RyPg0KPC90YWJsZT4NCgkNCjwvYm9keT4NCjwvaHRtbD4NCg==

Link to full dump from my code

Edit: My fixed code

try:
        message = gmail_service.users().messages().get(userId='me', id=thread['id'], format='raw').execute()
        # print 'Message snippet: %s' % message['snippet']
        msg_str = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        msg = email.message_from_string(msg_str)

        for part in msg.walk():
            msg.get_payload()
            if part.get_content_type() == 'text/html':
                mytext = base64.urlsafe_b64decode(part.get_payload().encode('UTF-8'))
                # print part.get_payload()
                print mytext

The information found on my chosen answers documentation link was invaluable in solving my issue!

回答1:

To iterate through the parts of a multipart message in Python, you should use get_payload(): https://docs.python.org/2/library/email.message.html#email.message.Message.get_payload

In your example, the call to mime_msg['payload'] is looking up a message header named "payload", which doesn't exist and is not what you want anyway.

Once you have a part in hand, you can check its type using part['Content-Type'] to examine the Content-Type header.

In general, MIME messages are trees of parts, so you may need to recurse.