I am currently using Python's imaplib to process email text.
I use the fetch command to retrieve the raw email data from the Gmail server. However, I found one thing really tricky: the equals sign '='. It does not behave like a normal equals sign but like a special symbol.
For example:
'=' sometimes acts like a hyphenation mark at the end of a line of text:
Depending upon your module selections, course lecturers may also contact yo=
u with preparatory work over the next few weeks. It would be wise to start =
reviewing the preparatory reading lists provided on the module syllabi now =
Sometimes it acts as an escape mark similar to '%', for example:
a=20b
is actually a<SPACE>b
=46rom here
is actually From here
I am totally confused by this weird notation. I think there must be a standard way to handle it, because Gmail handles it correctly in its apps.
I see that this is related to HTML encoding, just as '%' is used for escaping. But the problem is that all I get from the IMAP response is a string that contains this '=' symbol. How should I handle this? With a regular expression?
In a nutshell, an equal sign at the end of a line indicates a soft line break. An equal sign followed by two hexadecimal characters (0-9, A-F) encodes a single octet (byte).
This encoding scheme is called "quoted-printable" and is defined in section 6.7 of RFC 2045. See rules (1) and (5) in particular.
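A regular expression should not be needed, since Python's standard library already decodes quoted-printable. Below is a minimal sketch, not a definitive implementation: the helper names decode_qp and first_text_body are made up for illustration, and the UTF-8 default charset is an assumption (the real charset comes from the message's Content-Type header).

```python
import email
import quopri
from email import policy


def decode_qp(raw: bytes, charset: str = "utf-8") -> str:
    """Hypothetical helper: decode a quoted-printable fragment to text.

    The charset default is an assumption; use the one declared in the
    message's Content-Type header when it is known.
    """
    return quopri.decodestring(raw).decode(charset, errors="replace")


def first_text_body(raw_message: bytes) -> str:
    """Hypothetical helper: return the decoded text/plain body of a full message.

    The email package reads Content-Transfer-Encoding and the charset from
    the headers, so no manual quoted-printable handling is required.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


# The fragments from the question:
print(decode_qp(b"a=20b"))                    # a b
print(decode_qp(b"=46rom here"))              # From here
print(decode_qp(b"may also contact yo=\nu"))  # may also contact you
```

If the whole message was fetched (for example with the RFC822 item), passing the raw bytes from the fetch response to first_text_body is probably the more robust route, because it also copes with multipart messages and other transfer encodings such as base64. decode_qp is only suitable when you already know that a fragment is quoted-printable.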