I'm downloading messages from IMAP with imaplib into an mbox (with the mailbox module):
import imaplib, mailbox
svr = imaplib.IMAP4_SSL('imap.gmail.com')
svr.login('myname@gmail.com', 'mypassword')
resp, [countstr] = svr.select("[Gmail]/All Mail", True)
mbox = mailbox.mbox('mails.mbox')
for n in range(...):
    resp, lst1 = svr.fetch(n, 'UID') # the UID of the message
    resp, lst2 = svr.fetch(n, '(RFC822)') # the message itself
    mbox.add(lst2[0][1]) # add the downloaded message to the mbox
    #
    # how to store the UID of this current mail inside mbox?
    #
Let's say I download the mails with UID = 1 .. 1000. Next time, I would like to begin at the 1001st message and not from the 1st. However, mailbox.mbox does not store the UID anywhere. So the next time I open the mbox file, it will be impossible to know where I stopped.
Is there a natural way to store the UID of the emails with the mailbox module?
Or maybe I'm not using mailbox + imaplib the way they should be used?
To answer your question: after staring at the docs for a long time, I didn't see any clean way to do what you are looking for. If it is an absolute requirement that the UIDs be stored in the mbox file, then I'd suggest adding a custom UID header to the emails that you are storing:
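import email, re
message = email.message_from_bytes(lst2[0][1])  # on Python 2, use email.message_from_string
# lst1[0] looks like b'1 (UID 1234)', so pull out the number that follows "UID"
uid = re.search(rb'UID (\d+)', lst1[0]).group(1).decode()
message.add_header("my_internal_uid_header", uid)
mbox.add(message)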
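If the goal is to resume after the last saved UID, imaplib's `uid()` method can search and fetch by UID directly instead of by sequence number. The following is only a sketch under a few assumptions: Python 3, a server with standard UID SEARCH and UID FETCH behaviour, and `fetch_newer_than` being a hypothetical helper name rather than anything from imaplib or mailbox.

```python
import email


def fetch_newer_than(svr, mbox, last_uid, header="my_internal_uid_header"):
    """Fetch every message whose UID exceeds last_uid and add it to mbox.

    svr is a logged-in imaplib connection with a mailbox selected;
    mbox is any object with an add() method, such as mailbox.mbox.
    Returns the highest UID that has been stored.
    """
    typ, data = svr.uid("search", None, "UID %d:*" % (last_uid + 1))
    for raw_uid in (data[0] or b"").split():
        # A range "n:*" always matches the newest message, even when its
        # UID is below n, so filter explicitly.
        if int(raw_uid) <= last_uid:
            continue
        typ, parts = svr.uid("fetch", raw_uid, "(RFC822)")
        meta, body = parts[0]  # (response metadata, raw message bytes)
        message = email.message_from_bytes(body)
        message.add_header(header, raw_uid.decode())
        mbox.add(message)
        last_uid = max(last_uid, int(raw_uid))
    return last_uid
```

Called as `fetch_newer_than(svr, mbox, 1000)`, this would store only messages with UID 1001 and above. UIDs are only comparable between sessions while the mailbox's UIDVALIDITY stays the same, which this sketch does not check.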
Of course, it is then a HUGE pain to get the largest saved UID, because you have to iterate through all the messages in the mbox. I imagine that this would be really slow for a large file. If at all possible, it would be better to store such information elsewhere.
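To make the two options concrete, here is a rough sketch of both: scanning the mbox for the largest value of the custom header, and keeping the last UID in a small separate file instead. The function names and the idea of a plain-text state file are my own assumptions for illustration, not features of the mailbox module.

```python
import mailbox
import os
from email.message import Message

UID_HEADER = "my_internal_uid_header"  # the custom header suggested above


def max_saved_uid(mbox_path, header=UID_HEADER):
    """Scan every message in the mbox; return the largest stored UID, or 0."""
    box = mailbox.mbox(mbox_path)
    try:
        uids = [int(msg[header]) for msg in box if msg[header] is not None]
    finally:
        box.close()
    return max(uids, default=0)


def read_last_uid(state_path):
    """Return the UID kept in the separate state file, or 0 if it is missing."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as fh:
        return int(fh.read().strip() or 0)


def write_last_uid(state_path, uid):
    """Overwrite the state file with the most recently stored UID."""
    with open(state_path, "w") as fh:
        fh.write(str(uid))
```

The scan has to parse the whole mbox on every run, while the state file is a single small read. The trade-off is that the state file can get out of sync with the mbox if one is moved or edited without the other.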
Best of luck!
I hope this will be useful:
1) Libraries and environment: Win7, Anaconda3-4.3.1-Windows-x86_64.exe (a newer version is available, but that is what I used).
2) To list all your mailboxes:
import imaplib

def main():
    hostname = 'my.mail.server'
    username = 'my_user_name'
    m = imaplib.IMAP4_SSL(hostname)
    m.login(username, 'password')
    try:
        print('Capabilities:', m.capabilities)
        print('Listing mailboxes ')
        status, data = m.list()
        print('Status:', repr(status))
        print('Data:')
        for datum in data:
            print(repr(datum))
    finally:
        m.logout()

if __name__ == '__main__':
    main()
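The lines printed by the listing are raw bytes, typically of the form `(flags) "delimiter" "name"`. To turn them into mailbox names that can be passed on to the next step, something like the following might work. It is a sketch that assumes the common single-line response format; the regular expression and the name `parse_list_line` are mine, not part of imaplib.

```python
import re

# Typical LIST line: b'(\\HasNoChildren) "/" "archive/2013/201301"'
LIST_RE = re.compile(rb'\((?P<flags>.*?)\) (?P<delim>"[^"]*"|NIL) (?P<name>.*)')


def parse_list_line(line):
    """Return (flags, delimiter, mailbox_name) parsed from one LIST line."""
    match = LIST_RE.match(line)
    if match is None:
        raise ValueError("unrecognised LIST line: %r" % (line,))
    flags = match.group("flags").decode().split()
    delim = match.group("delim").decode().strip('"')
    name = match.group("name").decode().strip('"')
    return flags, delim, name
```

Mailboxes flagged `\Noselect` cannot be selected and should probably be skipped. Names containing spaces may need to be wrapped in double quotes again before being passed to `m.select()`.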
3) Using the information generated above, we can dump all email messages from the mail server into directories:
import getpass, imaplib, sys, email, os
BASE_NAME = 'msg_no_'
BASE_DIR = 'D:/my_email/'
def writeTofile(mailDir, partOfName, msg):
    # forward slashes work on Windows too, so no need to convert to backslashes
    newDir = BASE_DIR + mailDir
    if not os.path.exists(newDir):
        os.makedirs(newDir)
    file_name = BASE_NAME + partOfName + '.eml'
    # print('Write:' + file_name)
    with open(newDir + '/' + file_name, 'w', encoding="utf-8") as fw:
        fw.write(msg)
def processMailDir(m, mailDir):
    print('MailDIR:' + mailDir)
    m.select(mailbox=mailDir, readonly=True)
    typ, data = m.search(None, 'ALL')
    for num in data[0].split():
        typ, msg_data = m.fetch(num, '(RFC822)')
        msg = email.message_from_bytes(msg_data[0][1])
        # ISO-8859-1 maps every byte to a character, so this decode cannot fail
        smsg = msg.as_bytes().decode(encoding='ISO-8859-1')
        writeTofile(mailDir, num.decode(), smsg)
    m.close()
def main():
    if len(sys.argv) != 3:
        hostname = 'my.mail.server'
        username = 'my_username'
        m = imaplib.IMAP4_SSL(hostname)
        m.login(username, 'password')
    else:
        hostname, username = sys.argv[1:]
        m = imaplib.IMAP4_SSL(hostname)
        m.login(username, getpass.getpass())
    try:
        print('Start...')
        processMailDir(m, 'INBOX')
        processMailDir(m, 'Sent')
        processMailDir(m, 'archive/2013/201301')
        processMailDir(m, 'archive/2013/201302')
        # etc.. etc.. simple as it can be but not simpler
        print('Done...')
    finally:
        m.logout()

if __name__ == '__main__':
    main()
The above will dump your emails to:
D:\my_email\INBOX\msg_no_1.eml ... msg_no_203.eml
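As an alternative to decoding the message and re-encoding it as UTF-8, the raw bytes returned by FETCH could be written to the .eml file unchanged. This is a different technique from the one in the script above, shown only as a sketch; `save_eml` is a hypothetical helper, and the file naming just mirrors the `msg_no_` convention used here.

```python
import os


def save_eml(base_dir, mail_dir, part_of_name, raw_message):
    """Write raw RFC822 bytes to <base_dir>/<mail_dir>/msg_no_<part_of_name>.eml."""
    target_dir = os.path.join(base_dir, mail_dir)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, "msg_no_" + part_of_name + ".eml")
    # Binary mode: the bytes are stored exactly as the server sent them,
    # with no decoding or re-encoding step.
    with open(path, "wb") as fw:
        fw.write(raw_message)
    return path
```

Inside `processMailDir` this would be called as `save_eml(BASE_DIR, mailDir, num.decode(), data[0][1])`, using the bytes from `m.fetch` directly.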
Then you need this trick to open .eml files on Windows.
In a cmd.exe prompt run as Administrator:
assoc .eml=Outlook.File.eml
ftype Outlook.File.eml="C:\Program Files (x86)\Microsoft Office\Office12\OUTLOOK.EXE" /eml "%1"
Dear Stack Overflow moderators, please be merciful: I would have found the above useful. For example, this line took a long time to figure out: smsg = msg.as_bytes().decode(encoding='ISO-8859-1')