My question is the following:
If you look below you'll see there is a data structure with message ids, and then the final data structure containing the message details, which should be aggregated from imap_fetch_overview. The message ids are from imap_thread. The problem is that it's not putting the email details in the position where the message id is.
Here is my data structure:
[5] => Array
(
[0] => 5
[1] => 9
)
[10] => Array
(
[0] => 10
[1] => 11
)
What I'd like to have is:
[5] => Array
(
[0] => messageDetails for id 5
[1] => messageDetails for id 9
)
[10] => Array
(
[0] => messageDetails for id 10
[1] => messageDetails for id 11
)
Here is the code I have thus far:
$emails = imap_fetch_overview($imap, implode(',',$ids));
// root is the array index position of the threads message, such as 5 or 10
foreach($threads as $root => $messages){
// id is the id being given to us from `imap_thread`
foreach($message as $key => $id){
foreach($emails as $index => $email){
if($id === $email->msgno){
$threads[$root][$key] = $email;
break;
}
}
}
}
Here is a printout from one of the $emails:
[0] => stdClass Object
(
[subject] => Cloud Storage Dump
[from] => Josh Doe
[to] => jondoe@domain.com
[date] => Mon, 21 Jan 2013 23:18:00 -0500
[message_id] => <50FE12F8.9050506@domain.com>
[size] => 2559
[uid] => 5
[msgno] => 5
[recent] => 0
[flagged] => 0
[answered] => 1
[deleted] => 0
[seen] => 0
[draft] => 0
[udate] => 1358828308
)
If you notice, the msgno is 5, which correlates to the $id, so technically the data should be populating into the final data structure.
Also, this seems like an inefficient way to handle this.
Please let me know if you need any additional clarification.
UPDATE CODE
This code is a combination of code I found on the PHP API and some fixes by me. What I think is still problematic is the $root.
$addedEmails = array();
$thread = imap_thread($imap);
foreach ($thread as $i => $messageId) {
list($sequence, $type) = explode('.', $i);
//if type is not num or messageId is 0 or (start of a new thread and no next) or is already set
if($type != 'num' || $messageId == 0 || ($root == 0 && $thread[$sequence.'.next'] == 0) || isset($rootValues[$messageId])) {
//ignore it
continue;
}
if(in_array($messageId, $addedEmails)){
continue;
}
array_push($addedEmails,$messageId);
//if this is the start of a new thread
if($root == 0) {
//set root
$root = $messageId;
}
//at this point this will be part of a thread
//let's remember the root for this email
$rootValues[$messageId] = $root;
//if there is no next
if($thread[$sequence.'.next'] == 0) {
//reset root
$root = 0;
}
}
$ids=array();
$threads = array();
foreach($rootValues as $id => $root){
if(!array_key_exists($root,$threads)){
$threads[$root] = array();
}
if(!in_array($id,$threads[$root])){
$threads[$root][] = $id;
$ids[]=$id;
}
}
$emails = imap_fetch_overview($imap, implode(',', array_keys($rootValues)));
$keys = array();
foreach($emails as $k => $email)
{
$keys[$email->msgno] = $k;
}
$threads = array_map(function($thread) use($emails, $keys)
{
// Iterate emails in these threads
return array_map(function($msgno) use($emails, $keys)
{
// Swap the msgno with the email details
return $emails[$keys[$msgno]];
}, $thread);
}, $threads);
I don't have access to PHP right now to test, but I believe what you're trying to do is something like:
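A minimal sketch of that idea, run against stand-in data rather than a live mailbox. The sample $emails and $threads values are assumptions modelled on the question; the two differences from the question's code are the inner loop iterating over $messages (plural) and a loose == comparison.

```php
<?php
// Stand-in for imap_fetch_overview() output: a list of objects with a msgno field.
// (Assumed sample data; a real call needs the imap extension and a mailbox.)
$emails = array();
foreach (array(5, 9, 10, 11) as $n) {
    $email = new stdClass();
    $email->msgno = $n;
    $email->subject = "Message $n";
    $emails[] = $email;
}

// Thread root => message numbers, as in the question.
$threads = array(
    5  => array(5, 9),
    10 => array(10, 11),
);

foreach ($threads as $root => $messages) {
    // Iterate $messages, the variable bound by the outer loop.
    foreach ($messages as $key => $id) {
        foreach ($emails as $email) {
            // Loose comparison, in case one side is a string and the other an int.
            if ($id == $email->msgno) {
                $threads[$root][$key] = $email;
                break;
            }
        }
    }
}
```

After the loops, each entry in $threads holds the overview object instead of the bare message number.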
That being said, even if this works, there is probably a more efficient way to approach this than with three nested loops. What is your reason for needing to store the output in this format?
When you print_r the $emails array, what structure do you get? Maybe the below should do it?
An implementation with branches (more complex than a single thread array('5' => array(5,7,8)), but unless I was only talking to 1 person, threads always tend to branch for me personally, so I'll have to cope with the added complexity):
Which gives us:
A few things of note:
- imap_thread isn't perfect: we see id=9 as an orphan, although it seems it should be in the first thread somewhere. However, because the headers don't mention this, Google Apps here decided to make it its own node.
- The N.num, N.branch, N.next method apparently has no other way to return to root. This is the //return to root $nodes[$treeid] = &$nodes[0]; bit. You can/should filter this out after determining all other nodes, but you need it to build the array at first.
To get only the nodes starting new threads (Nth reply on message, N>1):
Which gives us:
And indeed, 35,49,50 & 32 are starts of threads, 9 is recognized as such by the imap server too, and the rest are 2nd or more replies starting their own branches.
Now, you could indeed split out branches as separate conversations, but as you can see, these are often only 1 or 2 replies long; longer threads tend to develop a bit more rarely. To see how these 'branches' go:
Which gives you roots & branches & their replies:
With some slight alterations we can get the messages in there:
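As a rough sketch of the general idea (not the original code), the flat imap_thread array can be walked into a nested tree with message details attached. The layout assumed here is that "N.num" holds the message number, "N.next" the index of the first reply, "N.branch" the index of the next sibling, and 0 means "none"; the sample $thread and $details arrays and the buildNodes helper are hypothetical.

```php
<?php
// Assumed imap_thread()-style output for two threads: 5 -> 9 and 10 -> 11.
$thread = array(
    '0.num' => 5,  '0.next' => 1,
    '1.num' => 9,  '1.next' => 0, '1.branch' => 0,
    '0.branch' => 2,
    '2.num' => 10, '2.next' => 3,
    '3.num' => 11, '3.next' => 0, '3.branch' => 0,
    '2.branch' => 0,
);

// Stand-in message details keyed by message number (hypothetical data).
$details = array(
    5  => 'Cloud Storage Dump',
    9  => 'Re: Cloud Storage Dump',
    10 => 'Lunch',
    11 => 'Re: Lunch',
);

// Hypothetical helper: returns the list of sibling nodes starting at $index.
function buildNodes(array $thread, $index, array $details)
{
    $nodes = array();
    do {
        $num  = $thread["$index.num"];
        $next = $thread["$index.next"];
        $nodes[] = array(
            'id'       => $num,
            // Nodes without a known message (e.g. num 0) get null here.
            'message'  => isset($details[$num]) ? $details[$num] : null,
            // Recurse into the replies, if any.
            'children' => $next ? buildNodes($thread, $next, $details) : array(),
        );
        // Move on to the next sibling; 0 ends the chain.
        $index = $thread["$index.branch"];
    } while ($index);

    return $nodes;
}

$tree = buildNodes($thread, 0, $details);
```

With a real mailbox, $details would come from imap_fetch_overview, keyed by msgno.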
Another option is to just sort them by datetime value, which would be alright for conversations with little/negligible branching, probably making most of the code you are planning just work.
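A minimal sketch of the date-sorting option, assuming each message is an overview object with a udate field (a Unix timestamp, as in the printout in the question). The makeEmail helper and the sample timestamps are made up for illustration.

```php
<?php
// Hypothetical helper building a stand-in overview object.
function makeEmail($msgno, $udate)
{
    $e = new stdClass();
    $e->msgno = $msgno;
    $e->udate = $udate;
    return $e;
}

// One conversation, deliberately out of order.
$conversation = array(
    makeEmail(9, 1358830000),
    makeEmail(5, 1358828308),
    makeEmail(7, 1358829000),
);

// Sort oldest first by timestamp.
usort($conversation, function ($a, $b) {
    if ($a->udate == $b->udate) {
        return 0;
    }
    return ($a->udate < $b->udate) ? -1 : 1;
});

// Message numbers in chronological order.
$order = array_map(function ($e) {
    return $e->msgno;
}, $conversation);
```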
A combination of the two would be 'moving branches': follow threads in series, so this:
Becomes a sequence of 1,2,3,4,5, but a reply on 3 would re-sort it:
Making it a sequence of 1,4,5,2,3,6, which would keep it a logically flowing conversation, always with the thread/branch with the latest reply last.
Remember that in PHP, whatever function you use will ultimately be converted to some sort of loop. There are, however, some steps you could take to make it more efficient, and they differ between PHP 5.5 and 5.3/5.4.
PHP 5.3/5.4 way
The most efficient way of doing this would be to split the function into 2 separate steps. In the first step you generate a map of keys for the list of emails.
In the second step you iterate over all values in the multi-dimensional $threads and replace them with the email details:
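The two steps could be sketched as follows, using stand-in data in place of a live imap_fetch_overview call (the sample $emails and $threads are assumptions modelled on the question):

```php
<?php
// Stand-in overview objects, deliberately not in msgno order.
$emails = array();
foreach (array(11, 5, 10, 9) as $n) {
    $email = new stdClass();
    $email->msgno = $n;
    $emails[] = $email;
}

$threads = array(
    5  => array(5, 9),
    10 => array(10, 11),
);

// Step 1: map each msgno to its position in $emails (one pass).
$keys = array();
foreach ($emails as $k => $email) {
    $keys[$email->msgno] = $k;
}

// Step 2: replace every msgno in $threads with the matching email.
// array_map with a single array preserves the thread root keys.
$threads = array_map(function ($thread) use ($emails, $keys) {
    return array_map(function ($msgno) use ($emails, $keys) {
        return $emails[$keys[$msgno]];
    }, $thread);
}, $threads);
```

Each lookup is now a direct array access instead of a scan over all emails.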
Proof of concept: http://pastebin.com/rp5QFN4J
Explanation of keyword use in anonymous functions:
In order to make use of variables defined in the parent scope, it is possible to import variables from the parent scope into the closure scope with the use () keyword. Although it was introduced in PHP 5.3 it hasn't been documented in the official PHP manual yet. There's only a draft document on php's wiki here https://wiki.php.net/rfc/closures#userland_perspective
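A small illustration of use, with made-up variable names: the closure imports $prefix by value at the moment it is defined, so a later reassignment in the parent scope does not affect it.

```php
<?php
$prefix = 'Re: ';

// Import $prefix from the parent scope into the closure.
$addPrefix = function ($subject) use ($prefix) {
    return $prefix . $subject;
};

// Changing the outer variable afterwards does not change the captured copy.
$prefix = 'Fwd: ';

$result = $addPrefix('Cloud Storage Dump');
```

Here $result is built with the 'Re: ' value that was in effect when the closure was created.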
PHP 5.5
One of the new features in this version enables you to use generators, which have a significantly smaller memory footprint and are thus more efficient.
Explanation of keyword yield in generators:
The heart of a generator function is the yield keyword. In its simplest form, a yield statement looks much like a return statement, except that instead of stopping execution of the function and returning, yield instead provides a value to the code looping over the generator and pauses execution of the generator function.
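To make the pausing behaviour concrete, here is a small sketch (the countTo function and $log array are hypothetical, PHP 5.5 or later): the generator only does work when the consuming loop asks for the next value, and stops for good once the loop breaks.

```php
<?php
// Hypothetical generator that records each value it produces.
function countTo($limit, array &$log)
{
    for ($i = 1; $i <= $limit; $i++) {
        $log[] = "producing $i";
        yield $i; // hand $i to the caller and pause here
    }
}

$log  = array();
$seen = array();

foreach (countTo(3, $log) as $n) {
    $seen[] = $n;
    if ($n === 2) {
        break; // the generator is never resumed, so 3 is never produced
    }
}
```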
1st step:
2nd step:
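A sketch of how both steps might look with generators (PHP 5.5 or later). The function names emailKeys and threadDetails and the sample data are assumptions for illustration, not the original code.

```php
<?php
// Stand-in overview objects (hypothetical data).
$emails = array();
foreach (array(11, 5, 10, 9) as $n) {
    $email = new stdClass();
    $email->msgno = $n;
    $emails[] = $email;
}
$threads = array(5 => array(5, 9), 10 => array(10, 11));

// 1st step: yield msgno => position pairs.
function emailKeys(array $emails)
{
    foreach ($emails as $k => $email) {
        yield $email->msgno => $k;
    }
}

// 2nd step: yield thread root => list of email details.
function threadDetails(array $threads, array $emails, array $keys)
{
    foreach ($threads as $root => $msgnos) {
        $details = array();
        foreach ($msgnos as $msgno) {
            $details[] = $emails[$keys[$msgno]];
        }
        yield $root => $details;
    }
}

// iterator_to_array() keeps the yielded keys by default.
$keys    = iterator_to_array(emailKeys($emails));
$threads = iterator_to_array(threadDetails($threads, $emails, $keys));
```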
A few words about the values being returned by generators:
A generator function returns a Generator object, which implements the Iterator interface, so you need iterator_to_array() to convert it into exactly the same array structure your code is expecting. You don't have to do this, but skipping it would require updating the code that follows the generator function, which could be even more efficient.
Proof of concept: http://pastebin.com/9Z4pftBH
Testing Performance:
I generated a list of 7000 threads with 5 messages each and tested the performance of each method (avg from 5 tests):
The results on your machine/server might be different, but the overview shows that the 2-step method is around 45-77 times faster than using 3 foreach loops.
Test script: http://pastebin.com/M40hf0x7
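For anyone who wants to repeat a comparison like this locally, a rough timing harness might look as follows. The makeData, nestedLoops and twoStep names are hypothetical, the data set is much smaller than the one described above, and no timings are claimed here.

```php
<?php
// Build $threadCount threads of $perThread messages each (synthetic data).
function makeData($threadCount, $perThread)
{
    $emails = array();
    $threads = array();
    $msgno = 1;
    for ($t = 0; $t < $threadCount; $t++) {
        $root = $msgno;
        for ($m = 0; $m < $perThread; $m++) {
            $email = new stdClass();
            $email->msgno = $msgno;
            $emails[] = $email;
            $threads[$root][] = $msgno;
            $msgno++;
        }
    }
    return array($emails, $threads);
}

// Three nested foreach loops, as in the question.
function nestedLoops(array $threads, array $emails)
{
    foreach ($threads as $root => $messages) {
        foreach ($messages as $key => $id) {
            foreach ($emails as $email) {
                if ($id == $email->msgno) {
                    $threads[$root][$key] = $email;
                    break;
                }
            }
        }
    }
    return $threads;
}

// Key map first, then direct lookups.
function twoStep(array $threads, array $emails)
{
    $keys = array();
    foreach ($emails as $k => $email) {
        $keys[$email->msgno] = $k;
    }
    return array_map(function ($thread) use ($emails, $keys) {
        return array_map(function ($msgno) use ($emails, $keys) {
            return $emails[$keys[$msgno]];
        }, $thread);
    }, $threads);
}

list($emails, $threads) = makeData(200, 5);

$start = microtime(true);
$slow = nestedLoops($threads, $emails);
$slowTime = microtime(true) - $start;

$start = microtime(true);
$fast = twoStep($threads, $emails);
$fastTime = microtime(true) - $start;

printf("nested loops: %.4fs, two-step: %.4fs\n", $slowTime, $fastTime);
```

Both functions should produce the same structure; only the time taken differs.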