Gmail's IMAP extension command X-GM-RAW lets me perform a search if the query string is ASCII. If the query contains UTF-8 characters, the IMAP server returns a BAD response.
https://developers.google.com/google-apps/gmail/imap_extensions#extension_of_the_search_command_x-gm-raw
How should a UTF-8 query string be encoded so that an X-GM-RAW search works? I do not want to lose the flexibility to search specific fields such as "subject" or "rfc822msgid".
Thanks
Specify CHARSET UTF-8 and send the UTF-8 search term in a literal. For example, to search for 你好, which is 6 bytes long when encoded in UTF-8:
A SEARCH CHARSET UTF-8 X-GM-RAW {6}
+ go ahead
你好
* SEARCH 15
A OK SEARCH completed (Success)
In this example you would actually send the raw 6-byte UTF-8 encoding of 你好 (E4 BD A0 E5 A5 BD) on the third line, followed by CRLF to end the command.
This will work for any SEARCH keyword that accepts an astring, including SUBJECT and HEADER MESSAGE-ID.
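To make the byte counting concrete, here is a minimal Python sketch that builds the two client writes from the exchange above. `build_gm_raw_search` is a hypothetical helper, not part of any library. It only assembles bytes and does not talk to a server. The key detail is that the `{n}` literal length counts UTF-8 bytes, not characters, and that Gmail operators such as `subject:` can stay inside the query.

```python
from typing import Tuple


def build_gm_raw_search(tag: str, query: str) -> Tuple[bytes, bytes]:
    """Hypothetical helper: build the client side of a UTF-8 X-GM-RAW search.

    Returns (command_line, literal). The client sends command_line, waits for
    the server's "+" continuation response, then sends literal.
    """
    payload = query.encode("utf-8")
    # The literal length is a byte count: "你好" is 2 characters but 6 bytes.
    command_line = f"{tag} SEARCH CHARSET UTF-8 X-GM-RAW {{{len(payload)}}}\r\n"
    # The CRLF after the literal terminates the whole SEARCH command.
    return command_line.encode("ascii"), payload + b"\r\n"


if __name__ == "__main__":
    for q in ("你好", "subject:你好"):
        line, literal = build_gm_raw_search("A", q)
        print(line, literal)
```

With `subject:你好` the literal is 14 bytes (8 for `subject:` plus 6 for 你好), so field-specific operators keep working. If you use Python's `imaplib`, setting the connection's `literal` attribute to the payload bytes before calling `search("UTF-8", "X-GM-RAW")` should produce the same `{n}` framing. That attribute is an internal detail, so treat this as an assumption to verify against your Python version.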
IMAP isn't 8-bit clean, so it uses a variety of encodings to represent 8-bit data.
For mailbox names such as folders and labels, IMAP4 uses modified UTF-7 to represent non-ASCII characters. Conveniently, printable ASCII encodes as itself in modified UTF-7 (the one exception is "&", which becomes "&-"), so normally nothing special needs to be done.
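For reference, modified UTF-7 as RFC 3501 (section 5.1.3) defines it for mailbox names can be sketched in a few lines of Python. This is an illustrative encoder only (decoding is omitted), and `encode_modified_utf7` is a hypothetical name. In real code, a maintained implementation such as the one in the IMAPClient library is the better choice.

```python
import base64


def encode_modified_utf7(text: str) -> str:
    """Sketch of IMAP modified UTF-7 (RFC 3501, section 5.1.3) for mailbox names."""
    out = []
    pending = []  # current run of characters that must be base64-encoded

    def flush() -> None:
        if pending:
            # Non-ASCII runs become UTF-16BE, base64-encoded with "," instead of "/"
            # and with the "=" padding removed, wrapped in "&" ... "-".
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=").replace("/", ",")
            out.append("&" + b64 + "-")
            pending.clear()

    for ch in text:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            # Printable ASCII represents itself, except "&", which becomes "&-".
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_modified_utf7("~peter/mail/台北/日本語"))  # ~peter/mail/&U,BTFw-/&ZeVnLIqe-
```

The input and output in the last line are the example mailbox name given in RFC 3501 itself.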
For message headers (including subjects), non-ASCII text is encoded as MIME encoded-words (RFC 2047).
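Python's standard `email.header` module handles encoded-words in both directions. Here is a small sketch, with `encode_subject` and `decode_subject` as hypothetical wrapper names:

```python
from email.header import Header, decode_header, make_header


def encode_subject(text: str) -> str:
    # Header.encode() emits RFC 2047 encoded-words for non-ASCII text.
    return Header(text, "utf-8").encode()


def decode_subject(raw: str) -> str:
    # decode_header() splits the value into (bytes, charset) chunks;
    # make_header() reassembles them into a Header, and str() gives the text.
    return str(make_header(decode_header(raw)))


print(decode_subject("=?utf-8?b?5L2g5aW9?="))  # 你好
```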
Finally, attachments are generally encoded as either Base64 or Quoted-Printable.
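Both transfer encodings are in Python's standard library. This quick sketch shows what the same six bytes (the UTF-8 encoding of 你好, standing in for attachment data) look like in each:

```python
import base64
import quopri

data = "你好".encode("utf-8")  # stand-in for attachment bytes

b64 = base64.b64encode(data)    # b'5L2g5aW9'
qp = quopri.encodestring(data)  # b'=E4=BD=A0=E5=A5=BD'

# Both transfer encodings round-trip back to the original bytes.
assert base64.b64decode(b64) == data
assert quopri.decodestring(qp) == data
print(b64, qp)
```

Base64 is compact for binary data. Quoted-Printable keeps mostly-ASCII text readable and escapes only the other bytes as `=XX`.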
My best guess is that Gmail uses modified UTF-7 for its X-GM-RAW queries. The best reference implementation of modified UTF-7 I've found is in the IMAPClient Python library.
Hope this helps!