Basically, I want to read all new emails from an inbox and put them in a database. I'm using Python because it has imaplib, but I know nothing about it.
Currently, I have something like this:
def primitive_get_text_blocks(email_message_instance):
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        return_parts = ""
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return_parts += " " + part.get_payload()
        return return_parts
    elif maintype == 'text':
        return email_message_instance.get_payload()
    return ""

fromField = con.escape(email_message["From"])
contentField = con.escape(primitive_get_text_blocks(email_message))
primitive_get_text_blocks is copy-pasted from somewhere.
The result is that I get database entries like this:
<META http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8">
From what I understand, that has something to do with the text being encoded in utf-7. So I changed to get_payload(decode=True), but that gives me byte arrays. If I append another decode('utf-8'), it sometimes crashes with errors like 'codec error can't decode to ...'.
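For reference, the `=3D` sequences in the stored entry look like quoted-printable transfer encoding rather than utf-7, and the bytes returned by get_payload(decode=True) still need to be decoded with the charset each part declares. Below is a minimal Python 3 sketch of that idea. The helper name `get_text_body`, the UTF-8 fallback, and the sample message are assumptions made for illustration, not part of the original code.

```python
import email

def get_text_body(msg):
    """Return all text parts of a message joined into one str (hypothetical helper)."""
    chunks = []
    for part in msg.walk():  # walk() also yields the message itself if it is not multipart
        if part.get_content_maintype() != "text":
            continue
        raw = part.get_payload(decode=True)  # undoes quoted-printable/base64, returns bytes
        if raw is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(raw.decode(charset, errors="replace"))
        except LookupError:  # the declared charset name is unknown to Python
            chunks.append(raw.decode("utf-8", errors="replace"))
    return " ".join(chunks)

# Made-up sample message: Latin-1 text wrapped in quoted-printable.
raw_message = (
    b"From: someone@example.com\r\n"
    b"Content-Type: text/html; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<META content=3D"text/html">caf=E9\r\n'
)
msg = email.message_from_bytes(raw_message)
body = get_text_body(msg)
```

Using errors="replace" trades exactness for robustness: a mislabeled part yields replacement characters instead of raising UnicodeDecodeError. Note that this sketch also picks up text attachments, which may or may not be wanted.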
I don't know how encodings work; I only want a Unicode string with the body of my email. Why is there no simple convert(charset from, charset to)? How do I get a readable email body (and address)? I've found "IMAP Fetch Encoding" and tried using decode_header, but I got no further.
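For the address part, one possible approach is to split the From header with email.utils.parseaddr first and then decode only the display name with decode_header and make_header. This is a sketch under assumptions: the helper name `decode_header_value` and the sample header are invented for illustration.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

def decode_header_value(value):
    """Turn a header value that may contain RFC 2047 encoded words into a str."""
    return str(make_header(decode_header(value)))

# Made-up From header with an encoded display name.
raw_from = "=?utf-8?q?J=C3=BCrgen_M=C3=BCller?= <juergen@example.com>"

# Split into display name and address first, then decode only the name.
encoded_name, address = parseaddr(raw_from)
name = decode_header_value(encoded_name)
```

Parsing before decoding keeps a decoded name that happens to contain commas or angle brackets from confusing the address parser.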
--
I assume an encoding is the way bytes represent characters. With that in mind, shouldn't decode take a byte array and spit out a string? Here on Stack Overflow I came across somebody claiming it had something to do with being encoded with utf-8 and utf-7. What does that even mean?
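To make that assumption concrete, here is a small Python 3 sketch showing that the same text becomes different bytes under different charsets, that decoding only works with the matching charset, and that quoted-printable is a separate wrapping layer on top. The sample strings are made up for illustration.

```python
import quopri

text = "caf\u00e9"  # the word "café"

# The same characters become different bytes depending on the charset.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

# Decoding with the matching charset gives the text back.
round_trip_utf8 = as_utf8.decode("utf-8")
round_trip_latin1 = as_latin1.decode("latin-1")

# Decoding Latin-1 bytes as UTF-8 fails, which is the kind of codec error described above.
try:
    as_latin1.decode("utf-8")
    wrong_charset_failed = False
except UnicodeDecodeError:
    wrong_charset_failed = True

# Quoted-printable is a transfer encoding: it wraps bytes so they survive mail transport.
# It has to be undone before the charset decoding step.
wrapped = b'content=3D"text/html"'
unwrapped = quopri.decodestring(wrapped)
```

So decode does take bytes and return a string, but only if it is told the charset the bytes were actually written in.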
I did google this, and there appear to be tons of duplicates, but the answers they got didn't really help me out (I've tried most of them).