Search UTF-8 string with Gmail X-GM-RAW IMAP command

Posted 2019-05-27 05:14

Question:

Gmail's IMAP extension command X-GM-RAW lets me perform a search when the query string is ASCII. If the query contains UTF-8 characters, the server returns a BAD response.

https://developers.google.com/google-apps/gmail/imap_extensions#extension_of_the_search_command_x-gm-raw

How should the UTF-8 input string be encoded so that an X-GM-RAW search works? I do not want to lose the flexibility to search specific fields such as "subject" or "rfc822msgid".

Thanks

Answer 1:

Specify CHARSET UTF-8 and send the UTF-8 search term in a literal. For example, to search for 你好, which is 6 bytes long when encoded in UTF-8:

a SEARCH CHARSET UTF-8 X-GM-RAW {6}
+ go ahead
你好
* SEARCH 15
a OK SEARCH completed (Success)

In this example you would actually send the 6-byte UTF-8 encoding of 你好 on the third line.
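
The framing above can be sketched in Python as follows. This is only an illustration of how the command line and the literal fit together: the helper name build_search is hypothetical, and the point it shows is that the {n} count is the number of UTF-8 bytes, not the number of characters.

```python
def build_search(tag, keyword, term):
    """Return (command line, literal payload) for a UTF-8 IMAP SEARCH.

    Hypothetical helper for illustration; not part of any library.
    """
    payload = term.encode("utf-8")
    # The {n} count is the length in bytes of the encoded term.
    line = f"{tag} SEARCH CHARSET UTF-8 {keyword} {{{len(payload)}}}\r\n"
    # A client sends `line`, waits for the server's "+" continuation,
    # and only then sends the payload followed by CRLF.
    return line.encode("ascii"), payload + b"\r\n"


line, payload = build_search("a", "X-GM-RAW", "你好")
print(line)
print(payload)
```

The same helper should also cover field-specific Gmail queries such as "subject:你好", since the whole query string travels inside the literal.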

This will work for any SEARCH keyword that accepts an astring, including SUBJECT and HEADER MESSAGE-ID.
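
With Python's standard imaplib, one way to get the same exchange is sketched below. It relies on the connection's literal attribute, which imaplib uses internally and does not document, so treat this as an assumption that may not hold across versions. The function name search_utf8 is hypothetical, and conn is assumed to be an authenticated imaplib.IMAP4_SSL connection with a mailbox already selected.

```python
def search_utf8(conn, keyword, term):
    """Run SEARCH CHARSET UTF-8 <keyword> with `term` sent as a literal.

    Assumption: `conn` is an imaplib.IMAP4 / IMAP4_SSL instance that is
    logged in and has a mailbox selected. `literal` is an undocumented
    imaplib attribute; the next command appends it as a {n} literal.
    """
    conn.literal = term.encode("utf-8")
    # imaplib sends: <tag> SEARCH CHARSET UTF-8 <keyword> {n}
    typ, data = conn.search("UTF-8", keyword)
    if typ != "OK":
        raise RuntimeError(f"SEARCH failed: {typ} {data!r}")
    # data[0] is a space-separated list of message sequence numbers.
    return data[0].split()
```

A call might look like search_utf8(conn, "X-GM-RAW", "subject:你好") or search_utf8(conn, "SUBJECT", "你好").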



Answer 2:

IMAP isn't 8-bit clean, so it has to use a variety of different encodings to represent any 8-bit data.

For things like folders and labels, IMAP4 uses Modified UTF-7 to represent these characters. Conveniently, ASCII data encoded in Modified UTF-7 encodes as itself (apart from "&", which becomes "&-"), so normally nothing special needs to be done.
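
To make the folder-name encoding concrete, here is a minimal encoder sketch for Modified UTF-7 as described in RFC 3501. The function name encode_mutf7 is hypothetical, it handles encoding only, and for real use a maintained implementation is a safer choice.

```python
import base64


def encode_mutf7(text):
    """Encode `text` as IMAP Modified UTF-7 (illustrative sketch only)."""
    out = []
    pending = []  # run of characters outside printable ASCII

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            # Modified base64: no "=" padding, "," instead of "/".
            b64 = base64.b64encode(raw).decode("ascii")
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in text:
        if ch == "&":
            flush()
            out.append("&-")  # a literal ampersand is escaped as "&-"
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)  # printable ASCII passes through unchanged
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_mutf7("INBOX"))
print(encode_mutf7("台北"))
```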

For message headers (including subjects), the text is encoded as MIME encoded-words (RFC 2047).
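
As a small illustration of the header encoding, Python's standard email.header module can produce and parse these encoded-words. The helper names encode_subject and decode_subject are hypothetical wrappers added for this sketch.

```python
from email.header import Header, decode_header, make_header


def encode_subject(text):
    """Return `text` as an ASCII-only header value using encoded-words."""
    return Header(text, "utf-8").encode()


def decode_subject(value):
    """Turn a header value containing encoded-words back into a str."""
    return str(make_header(decode_header(value)))


encoded = encode_subject("你好")
print(encoded)
print(decode_subject(encoded))
```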

And finally, attachments are generally encoded as either Base64 or Quoted-Printable.
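
For comparison, the two transfer encodings can be tried on the same bytes with Python's standard base64 and quopri modules. This is just a sketch of what the encodings do to 8-bit data, not of how any particular mail client builds attachments.

```python
import base64
import quopri

raw = "你好".encode("utf-8")  # 6 bytes of 8-bit data

b64 = base64.b64encode(raw)    # Base64: 4 ASCII characters per 3 bytes
qp = quopri.encodestring(raw)  # Quoted-Printable: =XX per non-ASCII byte

print(b64)
print(qp)

# Both encodings are reversible.
print(base64.b64decode(b64) == raw, quopri.decodestring(qp) == raw)
```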

My best guess is that Gmail uses Modified UTF-7 for its X-GM-RAW queries. The best reference implementation of Modified UTF-7 I've found is in the IMAPClient Python library.

Hope this helps!