PHP: fetching more than 20,000 IMAP emails

Posted 2019-03-27 17:01

Question:

I'm trying to export several mailboxes to a database. My current script connects over IMAP and simply loops over all messages. With larger mailboxes this doesn't work: the script slows down or even stops.

The idea is to run the script daily and "copy" every message that is not in the database yet. What's the best way to fetch large numbers of e-mails (20k mails spread over about 40 to 50 folders)?

Eventually this will need to run from a single server and scan hundreds or even thousands of accounts daily (so imagine the amount of data). It will store each mail (UID and subject) in the database and create a package that will be stored on the data server, so it also needs to fetch the attachments.
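Storing the subject and packaging the attachments both start from the raw message bytes that a `BODY[]` fetch returns. A minimal Python sketch (the thread is about PHP; Python's standard `email` package is used here only to illustrate the parsing step), assuming the raw bytes are already in hand; `summarize_message` is a hypothetical helper name:

```python
import email
from email import policy
from email.message import EmailMessage


def summarize_message(raw: bytes) -> tuple[str, list[tuple[str, bytes]]]:
    """Return (subject, [(filename, payload_bytes), ...]) for one raw RFC 822 message."""
    # policy.default makes the parser return an EmailMessage, which has iter_attachments().
    msg: EmailMessage = email.message_from_bytes(raw, policy=policy.default)
    subject = str(msg.get("Subject", ""))
    attachments = []
    # iter_attachments() skips the main text/html body parts of a multipart message
    # and yields nothing for a single-part message.
    for part in msg.iter_attachments():
        filename = part.get_filename() or "unnamed"
        payload = part.get_payload(decode=True) or b""  # undo base64 / quoted-printable
        attachments.append((filename, payload))
    return subject, attachments
```

Keeping the parsing separate from the IMAP code means the same function can be reused whether messages come from a live connection or from a local export.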

Answer 1:

So you want to perform email backup via IMAP. There are professional software tools that do this.

Let's start with something simple: downloading an email for one specific user from the INBOX folder. This requires you to (a) log in with the user's credentials, (b) select the INBOX folder, and (c) download the message (let's assume you already know its UID, which is 55). In IMAP you do this as follows (requests only; responses not shown):

01 LOGIN username password
02 SELECT INBOX
03 UID FETCH 55 BODY[]
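The three commands above map directly onto Python's standard-library `imaplib` (shown only as an illustration; host, credentials and the helper names are hypothetical). One deliberate deviation: the sketch uses `BODY.PEEK[]` instead of `BODY[]`, because plain `BODY[]` sets the `\Seen` flag, which a backup job probably should not do. It also opens the folder read-only, which makes `imaplib` send `EXAMINE` instead of `SELECT`.

```python
import imaplib
from typing import Optional


def extract_body(fetch_data: list) -> Optional[bytes]:
    """Pick the message bytes out of imaplib's FETCH response list.

    imaplib returns a literal as a (header_bytes, payload_bytes) tuple,
    followed by a closing b')' entry.
    """
    for item in fetch_data:
        if isinstance(item, tuple) and len(item) == 2:
            return item[1]
    return None


def fetch_message(host: str, username: str, password: str, uid: int,
                  folder: str = "INBOX") -> Optional[bytes]:
    """LOGIN, open the folder and UID FETCH one message over IMAPS (port 993)."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(username, password)                             # 01 LOGIN
        conn.select(folder, readonly=True)                         # 02 SELECT (as EXAMINE)
        typ, data = conn.uid("FETCH", str(uid), "(BODY.PEEK[])")   # 03 UID FETCH
        return extract_body(data) if typ == "OK" else None
    finally:
        conn.logout()
```

Usage would be something like `fetch_message("imap.example.com", "user", "secret", 55)`.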

Each message in a particular folder is given a UID. This is a unique identifier for the message that never changes - it cannot be used by any other message in that folder. New messages must have a higher UID than previous ones. This makes it a useful tool to determine whether you already downloaded the message previously.
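Since a UID is only unique within one folder, a "have I already stored this?" check needs the folder (and the folder's UIDVALIDITY value) as part of the key. A minimal sketch using SQLite as a stand-in for whatever database you actually use; the table and column names are hypothetical:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    account     TEXT    NOT NULL,
    folder      TEXT    NOT NULL,
    uidvalidity INTEGER NOT NULL,
    uid         INTEGER NOT NULL,
    subject     TEXT,
    PRIMARY KEY (account, folder, uidvalidity, uid)
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def already_stored(db, account, folder, uidvalidity, uid) -> bool:
    """True if this (account, folder, UIDVALIDITY, UID) combination is already saved."""
    row = db.execute(
        "SELECT 1 FROM messages WHERE account=? AND folder=? AND uidvalidity=? AND uid=?",
        (account, folder, uidvalidity, uid),
    ).fetchone()
    return row is not None


def store(db, account, folder, uidvalidity, uid, subject) -> None:
    # INSERT OR IGNORE keeps a re-run of the daily job from creating duplicates.
    db.execute("INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?, ?)",
               (account, folder, uidvalidity, uid, subject))
    db.commit()
```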

Next step: let us now look at downloading all new messages in the INBOX folder. Let's assume that you're downloading messages for the first time, and the INBOX currently has messages with UIDs 54, 55 and 57. You can download these messages all at once using a command such as:

03 UID FETCH 54,55,57 BODY[]

(You might want to break this up into batches, e.g. 30 at a time, if there are many messages to download.) After that, store the highest UID you have downloaded so far. Next time, you can check for UIDs higher than that as follows:
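Batching is just a matter of splitting the known UIDs into comma-separated UID sets like the `54,55,57` above. A Python sketch, assuming an `imaplib` connection that is already logged in and has the folder selected; the batch size of 30 is the answer's example, not a tuned value, and the helper names are hypothetical:

```python
from typing import Iterable, Iterator


def uid_batches(uids: Iterable[int], size: int = 30) -> Iterator[str]:
    """Yield UID sets such as "54,55,57", each holding at most `size` UIDs."""
    if size < 1:
        raise ValueError("size must be positive")
    batch: list[str] = []
    for uid in sorted(uids):
        batch.append(str(uid))
        if len(batch) == size:
            yield ",".join(batch)
            batch = []
    if batch:
        yield ",".join(batch)


def fetch_in_batches(conn, uids: Iterable[int], size: int = 30):
    """Yield (uid_set, fetch_data) per batch; conn is a selected imaplib.IMAP4."""
    for uid_set in uid_batches(uids, size):
        typ, data = conn.uid("FETCH", uid_set, "(UID BODY.PEEK[])")
        if typ != "OK":
            raise RuntimeError(f"UID FETCH {uid_set} failed: {data!r}")
        yield uid_set, data
```

After the last batch succeeds, `max(uids)` is the value to store for the next run.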

04 UID FETCH 58:* UID

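That retrieves only the UIDs of messages with a UID of 58 or higher. One caveat: a range ending in * always includes the last message in the folder (unless the folder is empty), so if no new mail has arrived, the server still returns the message with UID 57. Ignore any UID that is not greater than the one you stored. If you do get new UIDs, download those messages, store the new highest UID, and so on.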
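The incremental check can be sketched in Python with `imaplib`, assuming a connection with the folder already selected and `last_uid` loaded from your own storage (helper names are hypothetical). The filter on `> last_uid` is what handles the `58:*` behaviour, where the last existing message is returned even if its UID is below 58:

```python
import re

_UID_RE = re.compile(rb"UID (\d+)")


def parse_new_uids(fetch_data: list, last_uid: int) -> list[int]:
    """Extract UIDs from a `UID FETCH n:* (UID)` response, keeping only those > last_uid."""
    uids = []
    for item in fetch_data:
        if isinstance(item, tuple):
            item = item[0]
        if not item:
            continue  # imaplib reports [None] when there is nothing to return
        match = _UID_RE.search(item)
        if match and int(match.group(1)) > last_uid:
            uids.append(int(match.group(1)))
    return sorted(uids)


def fetch_new_uids(conn, last_uid: int) -> list[int]:
    """Send `UID FETCH <last_uid + 1>:* (UID)` on a selected imaplib connection."""
    typ, data = conn.uid("FETCH", f"{last_uid + 1}:*", "(UID)")
    if typ != "OK":
        raise RuntimeError(f"UID FETCH failed: {data!r}")
    return parse_new_uids(data, last_uid)
```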

There is one catch. A message's UID is valid only as long as the folder's UIDVALIDITY value (included in the response to the SELECT command) does not change. If it changes for whatever reason, the UIDs you stored for that folder no longer mean anything, and you need to download all messages in that folder again.
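A Python sketch of that rule, assuming you persist the last seen UIDVALIDITY and highest UID per folder somewhere (the storage is not shown, and the helper names are hypothetical). `imaplib` keeps the `[UIDVALIDITY n]` response code from the SELECT reply, and `conn.response("UIDVALIDITY")` retrieves it:

```python
from typing import Optional


def start_uid(stored_validity: Optional[int], stored_last_uid: Optional[int],
              current_validity: int) -> int:
    """Return the first UID to fetch for a folder.

    With no stored state, or when the server reports a different UIDVALIDITY,
    every cached UID for the folder is meaningless, so start again from 1
    (rows saved under the old value should be treated as stale).
    """
    if stored_validity is None or stored_last_uid is None:
        return 1
    if stored_validity != current_validity:
        return 1
    return stored_last_uid + 1


def select_with_validity(conn, folder: str) -> int:
    """Open a folder read-only on an imaplib connection and return its UIDVALIDITY."""
    typ, data = conn.select(folder, readonly=True)
    if typ != "OK":
        raise RuntimeError(f"cannot select {folder!r}: {data!r}")
    _, values = conn.response("UIDVALIDITY")
    if not values or values[0] is None:
        raise RuntimeError("server did not report UIDVALIDITY")
    return int(values[0])
```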

Finally, you want to extend this to work for all folders of all users. To get all folders for a particular user, use the IMAP LIST command:

05 LIST "" "*"

You will need to know the credentials for the users beforehand and loop over them.
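Turning the LIST reply into folder names takes a little parsing. A Python sketch for `imaplib`, with hypothetical helper names. Assumptions worth stating: folders flagged `\Noselect` are skipped because they cannot be opened; non-ASCII names arrive in modified UTF-7 and are not decoded here; and Python 3's `imaplib` does not quote arguments for you, so names containing spaces need quoting before `select`.

```python
import re
from typing import Optional

# One untagged LIST line: (flags) "delimiter" name
_LIST_RE = re.compile(
    rb'\((?P<flags>[^)]*)\) (?P<delim>"(?:[^"\\]|\\.)*"|NIL) (?P<name>.+)'
)


def parse_list_line(line: bytes) -> Optional[str]:
    """Return the folder name from one LIST response line, or None if it is not selectable."""
    m = _LIST_RE.match(line)
    if not m or b"\\noselect" in m.group("flags").lower():
        return None
    name = m.group("name")
    if name.startswith(b'"') and name.endswith(b'"'):
        name = re.sub(rb"\\(.)", rb"\1", name[1:-1])  # unescape \" and \\
    # Modified UTF-7 (non-ASCII names) is not decoded in this sketch.
    return name.decode("ascii", errors="replace")


def quote_mailbox(name: str) -> str:
    """Quote a folder name before passing it to imaplib, e.g. "Sent Items"."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def list_folders(conn) -> list[str]:
    """Send LIST "" * on a logged-in imaplib connection; return selectable folder names."""
    typ, data = conn.list('""', "*")
    if typ != "OK":
        raise RuntimeError(f"LIST failed: {data!r}")
    folders = []
    for item in data:
        if isinstance(item, tuple):
            # Name sent as an IMAP literal: flags in item[0], raw name in item[1].
            if b"\\noselect" not in item[0].lower():
                folders.append(item[1].decode("ascii", errors="replace"))
            continue
        name = parse_list_line(item) if item else None
        if name is not None:
            folders.append(name)
    return folders
```

Looping over users is then an outer loop over your stored credentials: one connection per account, then `conn.select(quote_mailbox(name), readonly=True)` for each folder.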

This is the IMAP theory behind what you need to do. Implementing it in PHP is left as an exercise.

Answer 2:

Are you using imap_ping?

imap_ping() pings the stream to see if it's still active. It may discover new mail; this is the preferred method for a periodic "new mail check" as well as a "keep alive" for servers which have inactivity timeout.

Other functions worth looking at: imap_timeout and imap_reopen.

The fact that there is a function called reopen suggests something, doesn't it? :)
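The same keep-alive-or-reconnect idea, sketched in Python with `imaplib` for comparison (this is not PHP's `imap_ping`, whose internals are not assumed here; it simply uses the IMAP NOOP command). The helper names are hypothetical, and the `timeout` argument to `IMAP4_SSL` requires Python 3.9 or later:

```python
import imaplib
from typing import Callable


def make_connector(host: str, username: str, password: str,
                   timeout: float = 30.0) -> Callable[[], imaplib.IMAP4]:
    """Return a function that opens and logs in a fresh IMAPS connection."""
    def connect() -> imaplib.IMAP4:
        conn = imaplib.IMAP4_SSL(host, timeout=timeout)  # socket timeout in seconds
        conn.login(username, password)
        return conn
    return connect


def ensure_alive(conn, reconnect: Callable[[], imaplib.IMAP4]):
    """Return `conn` if a NOOP succeeds, otherwise a new connection from `reconnect()`."""
    try:
        typ, _ = conn.noop()
        if typ == "OK":
            return conn
    except (imaplib.IMAP4.error, OSError):
        pass  # dropped socket, server BYE, timeout, ...
    try:
        conn.logout()
    except (imaplib.IMAP4.error, OSError):
        pass  # the old connection is already unusable; ignore cleanup errors
    return reconnect()
```

Note that a reconnected session starts unselected, so the caller has to select the current folder again.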

Another option, if you just can't keep the connection alive, is to export the data to mbox format and process it locally. That might be faster for a huge mailbox and avoids the timeout and connection issues altogether.
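How the server-side export is done depends on the mail server and is not covered here. Once an mbox file exists, reading it locally is simple; a Python sketch using the standard `mailbox` module (helper names and paths are hypothetical). `append_to_mbox` is included mainly to show the format round-tripping; `subjects_in_mbox` is the part that replaces the IMAP loop.

```python
import mailbox
from typing import Iterable


def append_to_mbox(path: str, raw_messages: Iterable[bytes]) -> int:
    """Append raw RFC 822 messages to an mbox file (created if missing); return the total count."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        for raw in raw_messages:
            box.add(raw)
        box.flush()
        return len(box)
    finally:
        box.unlock()
        box.close()


def subjects_in_mbox(path: str) -> list[str]:
    """Read every message in an existing mbox file and return the subjects in order."""
    box = mailbox.mbox(path, create=False)
    try:
        return [msg["Subject"] or "" for msg in box]
    finally:
        box.close()
```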