Saving IMAP messages with Python mailbox module

Posted 2019-08-01 09:55

Question:

I'm downloading messages from IMAP with imaplib into an mbox file (using the mailbox module):

import imaplib, mailbox
svr = imaplib.IMAP4_SSL('imap.gmail.com')
svr.login('myname@gmail.com', 'mypassword')
resp, [countstr] = svr.select("[Gmail]/All Mail", True)

mbox = mailbox.mbox('mails.mbox')

for n in range(...):
  resp, lst1 = svr.fetch(n, 'UID')    # the UID of the message
  resp, lst2 = svr.fetch(n, '(RFC822)')   # the message itself
  mbox.add(lst2[0][1])      # add the downloaded message to the mbox
  #
  # how to store the UID of this current mail inside mbox? 
  #

Suppose I download the mails with UID = 1 .. 1000. Next time, I would like to begin at the 1001st message and not at the 1st. However, mailbox.mbox does not store the UID anywhere, so the next time I open the mbox file it will be impossible to know where I stopped.

Is there a natural way, within the mailbox module, to store the UID of each email?

Or maybe I'm not using mailbox + imaplib the way they should be used?

Answer 1:

To answer your question: after staring at the docs for a long time, I didn't see any clean way to do what you are looking for. If it is an absolute requirement that the UIDs be stored in the mbox file, then I'd suggest adding a custom UID header to the emails that you are storing:

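import email, re
uid = re.search(rb'UID (\d+)', lst1[0]).group(1).decode()   # lst1[0] looks like b'1 (UID 123)'
message = email.message_from_bytes(lst2[0][1])   # Python 3: the fetched message is bytes
message.add_header("my_internal_uid_header", uid)
mbox.add(message)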
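Reading that header back later means scanning the whole mbox. A minimal sketch of such a scan, assuming the header name used above and that its value is a plain decimal UID (the function name `highest_saved_uid` is made up for illustration):

```python
import mailbox

# Assumption: same custom header name as in the snippet above.
UID_HEADER = "my_internal_uid_header"

def highest_saved_uid(mbox_path):
    """Return the largest UID found in UID_HEADER, or 0 if none is stored."""
    box = mailbox.mbox(mbox_path)
    try:
        highest = 0
        for message in box:                      # full scan: every message is parsed
            value = message.get(UID_HEADER)
            if value is not None and str(value).strip().isdigit():
                highest = max(highest, int(str(value).strip()))
        return highest
    finally:
        box.close()
```

The cost grows with the size of the mbox, since every stored message is read and parsed on each call.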

Now of course it is a HUGE pain to get the largest saved UID, because you have to iterate through all the messages in the mbox, and I imagine that would be really slow for a large file. If at all possible, it would be better to store such information elsewhere.
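One possible way to keep the resume point "elsewhere" is a small state file next to the mbox. This is only a sketch: the JSON layout and the function names (`load_last_uid`, `save_last_uid`, `uids_after`) are my own invention, not something the mailbox module provides.

```python
import json
import os

def load_last_uid(state_path):
    """Return the last UID recorded in state_path, or 0 if the file is missing."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path, encoding="utf-8") as fh:
        return int(json.load(fh)["last_uid"])

def save_last_uid(state_path, uid):
    """Record uid as the resume point, replacing any previous state file."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"last_uid": int(uid)}, fh)
    os.replace(tmp_path, state_path)   # swap the new file in only after it is fully written

def uids_after(search_data, last_uid):
    """Keep only the UIDs greater than last_uid from a UID SEARCH response."""
    raw = search_data[0] or b""
    return [int(u) for u in raw.split() if int(u) > last_uid]
```

With imaplib this could be combined roughly as `typ, data = svr.uid('search', None, 'UID %d:*' % (last + 1))` followed by `svr.uid('fetch', str(uid), '(RFC822)')` for each UID in `uids_after(data, last)`. The extra filter matters because a range ending in `*` always includes the last message in the mailbox, even when its UID is below the start of the range. Note also that UIDs are only comparable while the mailbox's UIDVALIDITY value stays the same.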

Best of luck!



Answer 2:

I hope it will be useful:

1) Libraries and environment: Windows 7, Anaconda3-4.3.1-Windows-x86_64.exe (a newer version is available, but that is what I used).

2) To list all your mailboxes:

import imaplib

def main():
   hostname = 'my.mail.server'
   username = 'my_user_name'
   m = imaplib.IMAP4_SSL(hostname)
   m.login(username, 'password')

   try:
      print('Capabilities:', m.capabilities)
      print('Listing mailboxes ')
      # each item in data is one raw LIST response line (flags, delimiter, name)
      status, data = m.list()
      print('Status:', repr(status))
      print('Data:')
      for datum in data:
         print(repr(datum))

   finally:
      # always log out, even if listing fails
      m.logout()

if __name__ == '__main__':
   main()
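The listing prints raw response lines such as b'(\\HasNoChildren) "/" "INBOX"'. If you want just the mailbox names, a small parser can help. This is a rough sketch that assumes each line has the usual "(flags) delimiter name" shape; the name `parse_list_line` is made up, and servers that send names as literals or in modified UTF-7 would need extra handling.

```python
import re

# Assumed shape of one LIST response line: (flags) "delimiter" name
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.*)')

def parse_list_line(line):
    """Split one raw LIST response line into (flags, delimiter, mailbox name)."""
    match = LIST_LINE.match(line)
    if match is None:
        raise ValueError("unexpected LIST response: %r" % (line,))
    flags = match.group('flags').decode('ascii').split()
    delim = match.group('delim').decode('ascii')
    delim = None if delim == 'NIL' else delim.strip('"')
    name = match.group('name').decode('utf-8').strip('"')
    return flags, delim, name
```

For example, `parse_list_line(b'(\\HasNoChildren) "/" "INBOX"')` gives `(['\\HasNoChildren'], '/', 'INBOX')`, and the third element is what you would pass on as the mailbox name.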

3) Using the mailbox names listed above, we can dump all email messages from the mail server into directories:

import getpass, imaplib, sys, email, os

BASE_NAME = 'msg_no_'
BASE_DIR = 'D:/my_email/'

def writeTofile(mailDir, partOfName, msg ):

   # forward slashes in mailDir work on Windows, no need to convert them to backslashes
   newDir = BASE_DIR + mailDir

   # create the target directory (and any parents) if it does not exist yet
   os.makedirs(newDir, exist_ok=True)

   file_name = BASE_NAME + partOfName  + '.eml'

   # the with block closes the file even if the write fails
   with open(newDir + '/' + file_name, 'w', encoding="utf-8") as fw:
      fw.write( msg )

   return


def processMailDir(m, mailDir):

   print('MailDIR:' + mailDir)

   m.select(mailbox=mailDir, readonly=True)
   typ, data = m.search(None, 'ALL')

   for num in data[0].split():
      typ, data = m.fetch(num, '(RFC822)')
      msg = email.message_from_bytes(data[0][1])

      smsg = msg.as_bytes().decode(encoding='ISO-8859-1')

      writeTofile(mailDir, num.decode(), smsg )

   m.close()

   return


def main():

   if len(sys.argv) != 3:
      hostname = 'my.mail.server'
      username = 'my_username'
      m = imaplib.IMAP4_SSL(hostname)
      m.login(username, 'password')

   else:
      hostname, username = sys.argv[1:]
      m = imaplib.IMAP4_SSL(hostname)
      m.login(username, getpass.getpass())

   try:
      print('Start...')

      processMailDir(m, 'INBOX')
      processMailDir(m, 'Sent')
      processMailDir(m, 'archive/2013/201301')
      processMailDir(m, 'archive/2013/201302')
      # etc. Add one call per mailbox; as simple as it can be, but not simpler
      print('Done...')

   finally:
      m.logout()

if __name__ == '__main__':
   main()

The script above will dump your emails to: D:\my_email\INBOX\msg_no_1.eml ... msg_no_203.eml

Then you need this trick to open .eml files on Windows:

In a Command Prompt (cmd.exe) run as Administrator:

assoc .eml=Outlook.File.eml
ftype Outlook.File.eml="C:\Program Files (x86)\Microsoft Office\Office12\OUTLOOK.EXE" /eml "%1"

Dear Stack Overflow moderator, please be merciful: I would have found the above useful. For example, this line took a long time to figure out: smsg = msg.as_bytes().decode(encoding='ISO-8859-1')
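A note on that decode step: ISO-8859-1 never fails because every byte value maps to a character, but writing the resulting string as UTF-8 re-encodes any byte above 0x7F as two bytes, so the file can differ from what the server sent. An alternative that may be simpler is to write the fetched bytes unchanged in binary mode. A minimal sketch, with a made-up helper name (`save_raw_message`) and the same file naming as above:

```python
import os

def save_raw_message(base_dir, mailbox_name, msg_id, raw_bytes):
    """Write one fetched message, byte for byte, to
    <base_dir>/<mailbox_name>/msg_no_<msg_id>.eml and return the path."""
    target_dir = os.path.join(base_dir, mailbox_name)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, "msg_no_" + msg_id + ".eml")
    with open(path, "wb") as fh:      # binary mode: no decode/encode step at all
        fh.write(raw_bytes)
    return path
```

In processMailDir this would be called as `save_raw_message(BASE_DIR, mailDir, num.decode(), data[0][1])`, skipping the email.message_from_bytes round trip entirely.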