Gmail API: where to find body of email depending o

Posted 2019-01-15 09:07

Question:

I am making a request to the Users.messages endpoint. All objects returned (the emails) have a mimeType property which I'm struggling to understand.

More specifically, I want to extract the body of the email based on its mimeType, since I've noticed that, depending on the mimeType, the body is either in the body property of the payload or in the parts array. What are the different mimeTypes that can be returned, and where can I find the body of the email for each one of them?

Answer 1:

I think it will make sense if you think of the payload as a part in and of itself. Let's say I send a message with just a subject and a plain message text:

From: emtholin@gmail.com
To: emtholin@gmail.com
Subject: Example Subject

This is the plain text message

This will result in the following parsed message:

{
 "id": "154ecb53c10b74d8",
 "threadId": "154ecb53c10b74d8",
 "labelIds": [
  "INBOX",
  "SENT"
 ],
 "snippet": "This is the plain text message",
 "historyId": "38877",
 "internalDate": "1464260181000",
 "payload": {
  "partId": "",
  "mimeType": "text/plain",
  "filename": "",
  "headers": [
   ...
  ],
  "body": {
   "size": 31,
   "data": "VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg=="
  }
 },
 "sizeEstimate": 355
}
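
The data field is base64url-encoded, meaning it uses the URL-safe alphabet with - and _ in place of + and /. As a minimal sketch, assuming Node.js and a hypothetical helper named decodeBody (not part of the Gmail API), the body above could be decoded like this:

```javascript
// Hypothetical helper: decode a Gmail body.data string (base64url) to UTF-8 text.
function decodeBody(data) {
  // Map the URL-safe alphabet back to standard base64 before decoding.
  const base64 = data.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(base64, 'base64').toString('utf8');
}

// The payload body copied from the parsed message above.
const body = { size: 31, data: 'VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg==' };
const text = decodeBody(body.data);

// JSON.stringify makes the trailing newline visible.
console.log(JSON.stringify(text)); // "This is the plain text message\n"
```

In this example the decoded text is 31 bytes long, which matches body.size.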

If I send a message with a plain text part, an HTML part and an image, it will look like this when parsed:

{
 "id": "154ed5ccaa12f3df",
 "threadId": "154ed5ccaa12f3df",
 "labelIds": [
  "SENT",
  "INBOX",
  "IMPORTANT"
 ],
 "snippet": "This is a plain/html message with an image.",
 "historyId": "841379",
 "internalDate": "1464271162000",
 "payload": {
  "mimeType": "multipart/mixed",
  "filename": "",
  "headers": [
     ...
  ],
  "body": {
   "size": 0
  },
  "parts": [
   {
    "mimeType": "multipart/alternative",
    "filename": "",
    "headers": [
     {
      "name": "Content-Type",
      "value": "multipart/alternative; boundary=089e0122896c7c80d80533bf3205"
     }
    ],
    "body": {
     "size": 0
    },
    "parts": [
     {
      "partId": "0.0",
      "mimeType": "text/plain",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/plain; charset=UTF-8"
       }
      ],
      "body": {
       "size": 47,
       "data": "VGhpcyBpcyBhIHBsYWluL2h0bWwgKm1lc3NhZ2UqIHdpdGggYW4gaW1hZ2UuDQo="
      }
     },
     {
      "partId": "0.1",
      "mimeType": "text/html",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/html; charset=UTF-8"
       }
      ],
      "body": {
       "size": 73,
       "data": "PGRpdiBkaXI9Imx0ciI-VGhpcyBpcyBhIHBsYWluL2h0bWwgPGI-bWVzc2FnZTwvYj4gd2l0aCBhbiBpbWFnZS48L2Rpdj4NCg=="
      }
     }
    ]
   },
   {
    "partId": "1",
    "mimeType": "image/png",
    "filename": "smile.png",
    "headers": [
       ...
    ],
    "body": {
     "attachmentId": "ANGjdJ-OrSy7VAYL-UbRyNtmySbZLlV-fV43zJF0_neNGZ8yKugsZAxb32eSb-CrbYIhF9NvjGwBVEjSkRrUWoCS7aDpgoQnt9WR7f2sa17qVEyOg_JVSbrGrunirvQw2dY-SxxB3Y0JP3aYDHSBXpNO6fFCByVFWQDw1et5Mh9di7bGO4AWOLKFVe_Yb2RmdDwuazGXGb8zA88TTMaiEPIacPTNiVtBrIWG0EKGxHBhep9j8ujyWeCS5P9X80dBHvBNj4T9XjUwcrN6FvwegRewRMM9cBupY7jQESR7915OcbhCNyi5l64x6vVh1ZU",
     "size": 2002
    }
   }
  ]
 },
 "sizeEstimate": 3077
}

As you can see, it's just the RFC 822 message parsed to JSON. If you traverse the parts, and treat the payload as a part itself, you will find what you are looking for.

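// Treat the payload itself as the root part and walk the tree breadth-first.
var parts = [response.payload];

while (parts.length) {
  var part = parts.shift();
  if (part.parts) {
    parts = parts.concat(part.parts);
  }

  // body.data is missing on multipart containers and on attachments, so check for it first.
  if (part.mimeType === 'text/html' && part.body && part.body.data) {
    // The data is URL-safe base64: map '-' and '_' back to '+' and '/' before calling atob.
    var base64 = part.body.data.replace(/-/g, '+').replace(/_/g, '/');
    // atob returns a byte string; escape + decodeURIComponent reinterprets those bytes as UTF-8.
    var decodedPart = decodeURIComponent(escape(atob(base64)));
    console.log(decodedPart);
  }
}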
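
The same traversal can also separate inline bodies from attachments. In the parsed message above, the image part carries body.attachmentId instead of body.data. The following is a minimal sketch, assuming Node.js; the name collectParts and the trimmed sample payload (with a placeholder attachment ID) are illustrative and not part of the API:

```javascript
// Hypothetical helper: walk the part tree and sort leaf parts into
// inline bodies (body.data present) and attachments (body.attachmentId present).
function collectParts(payload) {
  const bodies = [];
  const attachments = [];
  const queue = [payload]; // the payload is treated as the root part

  while (queue.length) {
    const part = queue.shift();
    if (part.parts) {
      queue.push(...part.parts);
    }
    const body = part.body || {};
    if (body.attachmentId) {
      attachments.push({
        mimeType: part.mimeType,
        filename: part.filename,
        attachmentId: body.attachmentId,
      });
    } else if (body.data) {
      bodies.push({ mimeType: part.mimeType, data: body.data });
    }
  }
  return { bodies, attachments };
}

// Trimmed stand-in for the multipart/mixed message above (headers omitted,
// attachment ID replaced by a placeholder).
const samplePayload = {
  mimeType: 'multipart/mixed',
  filename: '',
  body: { size: 0 },
  parts: [
    {
      mimeType: 'multipart/alternative',
      filename: '',
      body: { size: 0 },
      parts: [
        {
          partId: '0.0',
          mimeType: 'text/plain',
          filename: '',
          body: { size: 47, data: 'VGhpcyBpcyBhIHBsYWluL2h0bWwgKm1lc3NhZ2UqIHdpdGggYW4gaW1hZ2UuDQo=' },
        },
        {
          partId: '0.1',
          mimeType: 'text/html',
          filename: '',
          body: { size: 73, data: 'PGRpdiBkaXI9Imx0ciI-VGhpcyBpcyBhIHBsYWluL2h0bWwgPGI-bWVzc2FnZTwvYj4gd2l0aCBhbiBpbWFnZS48L2Rpdj4NCg==' },
        },
      ],
    },
    {
      partId: '1',
      mimeType: 'image/png',
      filename: 'smile.png',
      body: { attachmentId: 'EXAMPLE_ATTACHMENT_ID', size: 2002 },
    },
  ],
};

const result = collectParts(samplePayload);
console.log(result.bodies.map((b) => b.mimeType));
console.log(result.attachments.map((a) => a.filename));
```

The attachment bytes themselves are not in the message; they would have to be fetched separately with the attachment ID.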


Answer 2:

There are many MIME types that can be returned; here are a few:

  • text/plain: the message body only in plain text
  • text/html: the message body only in HTML
  • multipart/alternative: will contain two parts that are alternatives for each other, for example:
    • a text/plain part for the message body in plain text
    • a text/html part for the message body in HTML
  • multipart/mixed: will contain many unrelated parts, which can be:
    • multipart/alternative as above, or text/plain or text/html as above
    • application/octet-stream, or other application/* types, for application-specific attachments
    • image/png or other image/* types for images, which could be embedded in the message.

The definitive reference for all this is RFC 2046, https://www.ietf.org/rfc/rfc2046.txt (you might also want to see RFC 2045 and RFC 2047, which belong to the same MIME series).

To answer your question, build a tree of the message, and look either for:

  • the first text/plain or text/html part (either in the message body or in a multipart/mixed)
  • the first text/plain or text/html inside of a multipart/alternative, which may itself be part of a multipart/mixed.
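
The lookup described above can be sketched as a depth-first search over the part tree. This is an illustration under stated assumptions rather than an official algorithm: findBodyPart and findBody are hypothetical names, the sample tree is hand-built, and preferring HTML over plain text is just one possible choice.

```javascript
// Hypothetical helper: return the first inline part with the wanted MIME type,
// checking the part itself and then its nested parts depth-first.
function findBodyPart(part, wantedType) {
  // A part with a filename or an attachmentId is treated as an attachment, not a body.
  const isAttachment =
    Boolean(part.filename) || Boolean(part.body && part.body.attachmentId);
  if (part.mimeType === wantedType && !isAttachment) {
    return part;
  }
  for (const child of part.parts || []) {
    const found = findBodyPart(child, wantedType);
    if (found) return found;
  }
  return null;
}

// Assumption for this sketch: prefer the HTML body, fall back to plain text.
function findBody(payload) {
  return findBodyPart(payload, 'text/html') || findBodyPart(payload, 'text/plain');
}

// Hand-built sample: a multipart/alternative body plus a text attachment.
const message = {
  mimeType: 'multipart/mixed',
  filename: '',
  parts: [
    {
      mimeType: 'multipart/alternative',
      filename: '',
      parts: [
        { mimeType: 'text/plain', filename: '', body: { size: 5, data: 'aGVsbG8=' } },
        { mimeType: 'text/html', filename: '', body: { size: 12, data: 'PGI-aGVsbG88L2I-' } },
      ],
    },
    {
      mimeType: 'text/plain',
      filename: 'notes.txt',
      body: { attachmentId: 'EXAMPLE_ATTACHMENT_ID', size: 10 },
    },
  ],
};

console.log(findBody(message).mimeType);
console.log(findBodyPart(message, 'text/plain').body.data);
```

The attachment check matters here: without it, a text/plain file attached to the message could be mistaken for the body.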

An example of a complex message:

  • multipart/mixed
    • multipart/alternative
      • text/plain <- message body in plain text
      • text/html <- message body in HTML
    • application/zip <- a zip file attachment
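
To check what structure a particular message has, the part tree can be printed as an indented outline. This is a small sketch assuming Node.js; describeTree is a hypothetical name, and the sample object simply mirrors the example above with bodies and headers left out.

```javascript
// Hypothetical helper: return one line per part, indented by nesting depth.
function describeTree(part, depth = 0) {
  const lines = ['  '.repeat(depth) + part.mimeType];
  for (const child of part.parts || []) {
    lines.push(...describeTree(child, depth + 1));
  }
  return lines;
}

// Same shape as the complex message above.
const complexMessage = {
  mimeType: 'multipart/mixed',
  parts: [
    {
      mimeType: 'multipart/alternative',
      parts: [{ mimeType: 'text/plain' }, { mimeType: 'text/html' }],
    },
    { mimeType: 'application/zip', filename: 'archive.zip' },
  ],
};

const outline = describeTree(complexMessage).join('\n');
console.log(outline);
```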


Answer 3:

I know this question is not new, but I've written a PHP script that correctly parses messages pulled from the Gmail API, including any type of attachment.

The script includes a recursive "iterateParts" function which iterates all message parts so we can be sure we extracted all available data from each message.

The script's steps are:

  1. Pull all message IDs from the API
  2. Get some important headers (subject & from address)
  3. Either read the body directly from the payload, or send the payload to iterateParts
  4. iterateParts parses each message part into $msgArr with its data, still base64 encoded
  5. Push $msgArr to the master array $allmsgArr
  6. Traverse the master array and save each part as a file according to its MIME type and filename

    $maxToPull = 1;
    $gmailQuery = "ALL";

    // Initializing Google API
    $service = new Google_Service_Gmail($client);

    // Pulling all gmail messages into $messages array
    $user = 'me';
    $msglist = $service->users_messages->listUsersMessages($user, ["maxResults"=>$maxToPull, "q"=>$gmailQuery]);
    $messages = $msglist->getMessages();

    // Master array that will hold all parsed messages data, including attachments
    $allmsgArr = array();

    // Traverse each message
    foreach($messages as $message)
    {
        $msgArr = array();
        $single_message = $service->users_messages->get('me', $message->getId());
        $payload = $single_message->getPayload();

        // Nice to have the gmail msg id, can be used to direct access the message in Gmail's web gui
        $msgArr['gmailmsgid'] = $message->getId();

        // Retrieving the subject and "from" email address
        foreach($payload->getheaders() as $oneheader)
        {
            if($oneheader['name'] == 'Subject')
                $msgArr['subject'] = $oneheader['value'];
            if($oneheader['name'] == 'From')
                $msgArr['fromaddress'] = substr($oneheader['value'], strpos($oneheader['value'], '<')+1, -1);
        }

        // If body is directly in the message payload (only for plain text messages where there's no HTML part and no attachments, normally this is not the case)
        if($payload['body']['size'] > 0)
            $msgArr['textplain'] = $payload['body']['data'];     
        // Else, iterate over each message part and continue to dig if necessary
        else
            iterateParts($payload, $message->getId());

        // Push the parsed $msgArr (parsed by iterateParts) to master array
        array_push($allmsgArr, $msgArr);
    }


    // Traverse each parsed message and save its content and attachments to files
    foreach($allmsgArr as $onemsgArr)
    {

        $folder = "messages/".$onemsgArr['gmailmsgid'];
        // Create the folder recursively so a missing "messages" parent is not an error
        mkdir($folder, 0777, true);

        // !empty() avoids undefined-index notices when a message lacks one of these keys
        if(!empty($onemsgArr['textplain']))
            file_put_contents($folder."/textplain.txt", decodeData($onemsgArr['textplain']));
        if(!empty($onemsgArr['texthtml']))
            file_put_contents($folder."/texthtml.html", decodeData($onemsgArr['texthtml']));
        if(!empty($onemsgArr['attachments']))
        {
            foreach($onemsgArr['attachments'] as $oneattachment)
            {
                if(!empty($oneattachment['filename']))
                    $filename = $oneattachment['filename'];
                else if($oneattachment['mimetype'] == "message/rfc822" && empty($oneattachment['filename'])) // email attachments
                    $filename = "noname.eml";
                else
                    $filename = "unknown";
                file_put_contents($folder."/".$filename, decodeData($oneattachment['data']));
            }
        }
    }


    function iterateParts($obj, $msgid) {

        global $msgArr;
        global $service;
        foreach($obj as $parts)
        {
            // if found body data
            if($parts['body']['size'] > 0)
            {
                // plain text representation of message body
                if($parts['mimeType'] == 'text/plain')
                {
                    $msgArr['textplain'] = $parts['body']['data'];
                }
                // html representation of message body
                else if($parts['mimeType'] == 'text/html')
                {
                    $msgArr['texthtml'] = $parts['body']['data'];
                }
                // if it's an attachment
                else if(!empty($parts['body']['attachmentId']))
                {
                    // Start from an empty array so nothing leaks over from a previous attachment
                    $attachArr = array();
                    $attachArr['mimetype'] = $parts['mimeType'];
                    $attachArr['filename'] = $parts['filename'];
                    $attachArr['attachmentId'] = $parts['body']['attachmentId'];

                    // the message holds the attachment id, retrieve its data from users_messages_attachments
                    $attachmentId_base64 = $parts['body']['attachmentId'];
                    $single_attachment = $service->users_messages_attachments->get('me', $msgid, $attachmentId_base64);

                    $attachArr['data'] = $single_attachment->getData();

                    $msgArr['attachments'][] = $attachArr;
                }       
            }

            // if there are other parts inside, go get them
            if(!empty($parts['parts']) && !empty($parts['mimeType']) && empty($parts['body']['attachmentId']))
            {
                iterateParts($parts->getParts(), $msgid);
            }

        }
    }

    // Body and attachment data returned from the API is base64url encoded (URL-safe alphabet: '-' and '_')
    function decodeData($data)
    {
        $sanitizedData = strtr($data,'-_', '+/');
        return base64_decode($sanitizedData);
    }

This is what $allmsgArr will look like (when only one message is pulled):


Array
(
    [0] => Array
        (
            [gmailmsgid] => 25k1asfa556x2da
            [fromaddress] => john@gmail.com
            [subject] => Fwd: Sea gulls picture
            [textplain] => UE5SIDQxQzAwMg0KDQpBUkJFTFRFU1QxDQoNCg0K
            [texthtml] => PGRpdiBkaXI9Imx0ciI-PHNwYW4gc3R5bGU9ImZi
            [attachments] => Array
                (
                    [0] => Array
                        (
                            [mimetype] => image/png
                            [filename] => sea_gulls.png
                            [attachmentId] => ANGjdJ9tmy4d8vPXhU_BjNEFEaDODOpu29W2u5OTM7a0
                            [data] => iVBORw0KGgoAAAANSUhEUgAABSYAAAKWCAYAAABUP
                        )

                    [1] => Array
                        (
                            [mimetype] => image/jpeg
                            [filename] => Outlook_Signature.jpg
                            [attachmentId] => ANGjdJ-CgZTK0oK44Q8j7TlN_JlaexxGKZ_wHFfoEB
                            [data] => 6jRXhpZgAATU0AKgAAAAgABwESAAMAAAABAAEAAAEa
                        )

                )
        )
)



Tags: gmail-api