A bit of context...
Some time ago, I wrote a Python program that deals with email messages. One thing that always comes up is the need to know whether an email is "multipart" or not.
After a bit of research, I learned that it has something to do with emails containing HTML, attachments, etc., but I didn't really understand it.
My usage of it was limited to 2 instances:
1. When I had to save the attachment from the raw email
I just found this on the internet (probably on here - Sorry for not crediting the person who wrote it but I can't seem to find him again :/) and pasted it in my code
    import os

    def downloadAttachments(emailMsg, pathToSaveFile):
        """
        Save Attachments to pathToSaveFile (Example: pathToSaveFile = "C:\\Program Files\\")
        """
        att_path_list = []
        for part in emailMsg.walk():
            # multipart are just containers, so we skip them
            if part.get_content_maintype() == 'multipart':
                continue
            # is this part an attachment ?
            if part.get('Content-Disposition') is None:
                continue
            filename = part.get_filename()
            # a part can carry a Content-Disposition header without a filename
            if filename is None:
                continue
            att_path = os.path.join(pathToSaveFile, filename)
            # Check if it's already there
            if not os.path.isfile(att_path):
                # finally write the stuff
                with open(att_path, 'wb') as fp:
                    fp.write(part.get_payload(decode=True))
                # record only the files written by this call
                att_path_list.append(att_path)
        return att_path_list
2. When I had to get the text from the raw email
Also pasted from someone on the internet without really understanding how it works.
    def get_text(emailMsg):
        """
        Output: body of the email (text content)
        """
        if emailMsg.is_multipart():
            return get_text(emailMsg.get_payload(0))
        else:
            return emailMsg.get_payload(None, True)
What I do understand...
Is that if the email message is multipart, the parts can be iterated over.
My question is
What exactly are these parts? How do you know which one is html for example? Or which one is an attachment? Or just the body?
There is no strict hierarchy or guidance for how exactly to use multipart messages. MIME simply defines a way to collect multiple payloads into a single email message. One of the original motivations, I believe, was to be able to embed pictures in text. But attaching binaries to a text message and, more generally, creating structured messages whose payloads are related in arbitrary ways are things which have simply been there for applications to use in whatever way they see fit.
A common misunderstanding is postulating a hierarchy into a "main part" and "subordinate" parts. It's certainly possible to create this structure, but it is by no means universally done. In fact, most multipart messages simply have a sequence of parts without any hierarchy. The user's email client will commonly pick one of the "inline" parts as the preferred "main" part to display in a message pane, but this is by no means dictated by the standard, or possible to enforce by the sending party.
Each MIME part has a set of headers which tell you the type, encoding, and disposition; for parts of type `text/*` the default disposition is "inline" (so it is often not explicitly spelled out), whereas most other parts have a default disposition of "attachment". You'll need to refer to the pertinent standards for a strict definition, but probably take it with a grain of salt, because many real-world applications are not particularly RFC-conformant.
For your concrete question, find the topmost leaf parts which are (implicitly or explicitly) inline, and display one which supports your use case as the "main" one. If you want to enforce HTML as the preferred format, you can do that; but many email applications defer this to the user to decide, and some users will definitely (because of technical necessity, physical disabilities, or personal taste) prefer plain text when it's available.
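As a rough illustration of that advice, the sketch below uses Python's standard `email` package to build a small sample message, walk its MIME tree, and label each part. The helper names `build_sample` and `describe_parts`, the addresses, and the file name are made up for this example, and the "text/* defaults to inline, everything else to attachment" fallback is the rule of thumb from this answer, not a strict reading of the RFCs.

```python
from email.message import EmailMessage


def build_sample():
    """Build a hypothetical message: plain text + HTML alternative + one attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Report"
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg.set_content("Plain-text body")
    msg.add_alternative("<p>HTML body</p>", subtype="html")
    msg.add_attachment(
        b"%PDF-1.4 fake",
        maintype="application",
        subtype="pdf",
        filename="report.pdf",
    )
    return msg


def describe_parts(msg):
    """Return (content_type, kind, filename) for every node in the MIME tree."""
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            kind = "container"
        else:
            # Explicit Content-Disposition wins; otherwise fall back to the
            # rule of thumb: text/* is inline, anything else is an attachment.
            kind = part.get_content_disposition()
            if kind is None:
                is_text = part.get_content_maintype() == "text"
                kind = "inline" if is_text else "attachment"
        rows.append((part.get_content_type(), kind, part.get_filename()))
    return rows


sample = build_sample()
for content_type, kind, filename in describe_parts(sample):
    print(content_type, kind, filename)

# Let the library pick a "main" part, preferring plain text over HTML.
main_part = sample.get_body(preferencelist=("plain", "html"))
print(main_part.get_content_type())
```

Swapping the order in `preferencelist` to `("html", "plain")` would prefer the HTML part instead. Messages from the wild can be far messier than this sample, so treat the classification as a starting point.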
Unfortunately, common practice by message producers recently has been to create a `multipart/alternative` container with `text/plain` and `text/html` members, but then provide a completely useless `text/plain` part and have all the actual content in a `text/html` part. The correct arrangement in this situation would be to simply not supply a `text/plain` part if you can't put anything useful in it (but I guess they only care about getting past some misguided spam filter, not about actually accommodating the preferences of the recipients).
The answers that you're looking for are all in the MIME standard, especially:
These standards together transformed e-mail from a plaintext, English-only state to its current status, where we have interesting ways of sending Unicode poo, proprietary bitmaps with cute kittens, and also dozens of ways for non-conformant software and middleboxes along the path to corrupt the message in subtle and non-subtle ways. More details for these features are in:
For the IMAP-specific part of your question, i.e., how to best access the MIME tree of these parts via IMAP, see RFC 3501, especially the sections that discuss the `BODY` and `BODYSTRUCTURE` constructs.
If you would like to marvel at the beauty of MIME in action, take a look at the "MIME torture test". It is a bit tricky to find, because this random item on GitHub is definitely not what I meant. Here's the original from Mark Crispin, the engineer who created IMAP:
Yes, that's a lot of reading. Unfortunately, you will really need to understand all of the above to handle MIME properly and safely. Please, do not skip these resources and standards unless you want to create abominations such as a random bulk-mailer which consistently splits non-ASCII codepoints in UTF-8 into several adjacent MIME encoded chunks, etc. Thank you.
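To make the last point a little more concrete, here is a minimal sketch of encoding a long non-ASCII header with Python's standard `email.header` module and then checking that every encoded chunk decodes on its own, i.e. that no UTF-8 codepoint was split across chunks. The sample subject and the regular expression used to pick out the chunks are assumptions made for this example, not something taken from the standards.

```python
import re
from email.header import Header, decode_header, make_header

# Hypothetical subject, long enough that it has to be split into several chunks.
subject = " ".join(["Grüße aus Zürich, façade, naïve, café"] * 3)

# Encode the subject as one or more MIME encoded words, folded over several lines.
encoded = Header(subject, charset="utf-8", header_name="Subject").encode()
print(encoded)

# Round trip: decoding the whole header should give back the original text.
decoded = str(make_header(decode_header(encoded)))

# Each encoded word should also be valid UTF-8 by itself.
words = re.findall(r"=\?[^?]+\?[bBqQ]\?[^?]*\?=", encoded)
for word in words:
    raw, charset = decode_header(word)[0]
    raw.decode(charset)  # raises UnicodeDecodeError if a codepoint was split
```

If the loop finishes without raising, each chunk is independently decodable, which is what a conforming producer is expected to emit.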