My question is the following:
If you look below you'll see there is a datastructure with message ids and then the final datastructure containing the message details which should be aggregated from imap_fetch_overview
. The message ids are from imap_thread
. The problem is its not putting the email details in the position where the message id is.
Here is my datastructure:
[5] => Array
(
[0] => 5
[1] => 9
)
[10] => Array
(
[0] => 10
[1] => 11
)
What I'd like to have is:
[5] => Array
(
[0] => messageDetails for id 5
[1] => messageDetails for id 9
)
[10] => Array
(
[0] => messageDetails for id 10
[1] => messageDetails for id 11
)
Here is the code I have thus far:
$emails = imap_fetch_overview($imap, implode(',',$ids));
// root is the array index position of the threads message, such as 5 or 10
foreach($threads as $root => $messages){
// id is the id being given to us from `imap_thread`
foreach($message as $key => $id){
foreach($emails as $index => $email){
if($id === $email->msgno){
$threads[$root][$key] = $email;
break;
}
}
}
}
Here is a printout from one of the $emails:
[0] => stdClass Object
(
[subject] => Cloud Storage Dump
[from] => Josh Doe
[to] => jondoe@domain.com
[date] => Mon, 21 Jan 2013 23:18:00 -0500
[message_id] => <50FE12F8.9050506@domain.com>
[size] => 2559
[uid] => 5
[msgno] => 5
[recent] => 0
[flagged] => 0
[answered] => 1
[deleted] => 0
[seen] => 0
[draft] => 0
[udate] => 1358828308
)
If you notice, the msgno is 5 which corrolates to the $id
, so technically the data should be populating into the final datastructure.
Also, this seems like an inefficient way to handle this.
Please let me know if I you need any additional clarification.
UPDATE CODE
This code is a combination of code I found on php api and some fixes by me. What I think is problematic still is the $root
.
$addedEmails = array();
$thread = imap_thread($imap);
foreach ($thread as $i => $messageId) {
list($sequence, $type) = explode('.', $i);
//if type is not num or messageId is 0 or (start of a new thread and no next) or is already set
if($type != 'num' || $messageId == 0 || ($root == 0 && $thread[$sequence.'.next'] == 0) || isset($rootValues[$messageId])) {
//ignore it
continue;
}
if(in_array($messageId, $addedEmails)){
continue;
}
array_push($addedEmails,$messageId);
//if this is the start of a new thread
if($root == 0) {
//set root
$root = $messageId;
}
//at this point this will be part of a thread
//let's remember the root for this email
$rootValues[$messageId] = $root;
//if there is no next
if($thread[$sequence.'.next'] == 0) {
//reset root
$root = 0;
}
}
$ids=array();
$threads = array();
foreach($rootValues as $id => $root){
if(!array_key_exists($root,$threads)){
$threads[$root] = array();
}
if(!in_array($id,$threads[$root])){
$threads[$root][] = $id;
$ids[]=$id;
}
}
$emails = imap_fetch_overview($imap, implode(',', array_keys($rootValues)));
$keys = array();
foreach($emails as $k => $email)
{
$keys[$email->msgno] = $k;
}
$threads = array_map(function($thread) use($emails, $keys)
{
// Iterate emails in these threads
return array_map(function($msgno) use($emails, $keys)
{
// Swap the msgno with the email details
return $emails[$keys[$msgno]];
}, $thread);
}, $threads);
Remember that in php whatever function you use it will be finally converted to some sort of loop.
There are, however some steps you could take to make it more efficient and they are different in PHP 5.5 and in 5.3/5.4.
PHP 5.3/5.4 way
The most efficient way of doing this would be to split the function to 2 separate steps.
In first step you would generate a map of keys for the list of emails.
$keys = array();
foreach($emails as $k => $email)
{
$keys[$email->msgno] = $k;
}
In 2nd step you iterate all values in the multi-dimensional $threads and replace them with the email details:
// Iterate threads
$threads = array_map(function($thread) use($emails, $keys)
{
// Iterate emails in these threads
return array_map(function($msgno) use($emails, $keys)
{
// Swap the msgno with the email details
return $emails[$keys[$msgno]];
}, $thread);
}, $threads);
Proof of concept: http://pastebin.com/rp5QFN4J
Explanation of keyword use in anonymous functions:
In order to make use of variables defined in the parent scope, it is possible to import variables from the parent scope into the closure scope with the use () keyword. Although it was introduced in PHP 5.3 it hasn't been documented in the official PHP manual yet. There's only a draft document on php's wiki here https://wiki.php.net/rfc/closures#userland_perspective
PHP 5.5
One of the new features in this version enables you to use generators, which have significantly smaller memory thumbprint thus are more efficient.
Explanation of keyword yield in generators:
The heart of a generator function is the yield keyword. In its simplest form, a yield statement looks much like a return statement, except that instead of stopping execution of the function and returning, yield instead provides a value to the code looping over the generator and pauses execution of the generator function.
1st step:
function genetateKeyMap($emails)
{
foreach($emails as $k => $email)
{
// Yielding key => value pair to result set
yield $email->msgno => $k;
}
};
$keys = iterator_to_array(genetateKeyMap($emails));
2nd step:
function updateThreads($emails, $threads, $keys)
{
foreach($threads as $thread)
{
$array = array();
// Create a set of detailed emails
foreach($thread as $msgno)
{
$array[] = $emails[$keys[$msgno]];
}
// Yielding array to result set
yield $array;
}
};
$threads = iterator_to_array(updateThreads($emails, $threads, $keys));
A few words about the values being returned by genrators:
Generators return an object which is an instance of SPL Iterator thus it needs to use iterator_to_array() in order to convert it into exactly the same array structure your code is expecting. You don't need to do this, but it would require an update of your code following the generator function, which could be even more efficient.
Proof of concept: http://pastebin.com/9Z4pftBH
Testing Performance:
I generated a list of 7000 threads with 5 messages each and tested the performance of each method (avg from 5 tests):
Takes: Memory used:
----------------------------
3x foreach(): 2.8s 5.2 MB
PHP 5.3/5.4 way 0.061s 2.7 MB
PHP 5.5 way 0.036s 2.7 MB
Although the results on your machine/server might be different but the overview shows that the 2-step method is around 45-77 times faster than using 3 foreach loops
Test script: http://pastebin.com/M40hf0x7
I don't have access to PHP right now, to test, but I believe what you're trying to do is something like
foreach($emails as $email) {
foreach($threads as $root => $messages) {
foreach($messages as $index =>$message_id){
if($message_id == $email->msgno){
$threads[$root][$index] = $email;
}
}
}
}
That being said, even if this works, there is probably a more efficient way to approach this than with three nested loops. What is your reason for needing to store the output in this format?
An implementation with branches (more complex then a single thread array('5' => array(5,7,8))
, but unless I was only talking to 1 person, threads always tend to branch for me personally, so I'll have to cope with the added complexity)
<?php
$threads = imap_thread($imap, SE_UID);
/*
* threads returns entries as follows:
* <id>.num = <messageid>
* <id>.next = <messageid of first reply to <id>>, 0 = no replies
* <id>.branch = <messageid of nth. reply to <parent of id>>, 0 = no more branches
* Keep in mind: _every_ message 'starts' a branch, but that may be empty.
*/
$nodes = array( 0 => array( 'children' => array()));
$ids = array();
foreach ($threads as $key => $val) {
list($treeid,$type) = explode('.',$key);
switch($type){
case 'num':
//the actual message number of this tree node
//store id for retrieval later:
$ids[$val] = null;
if($val==0){
//return to root
$nodes[$treeid] = &$nodes[0];
} else {
if(!isset($nodes[$treeid])) $nodes[$treeid] = array();
$nodes[$treeid] = array_merge($nodes[$treeid],array(
'id' => $val,
'message' => &$ids[$val],
'treeid' => $treeid));
}
break;
case 'next':
// 0 means no next message, anything else is a reply
if (0!=$val) {
if(!isset($nodes[$val])) $nodes[$val] = array('parent' => $treeid);
$nodes[$treeid][] = &$nodes[$val];
}
break;
case 'branch':
//0 means end of branch, a number means continue as sibling \
//so we need to know the parent
if (0!=$val) {
if(!isset($nodes[$val])) $nodes[$val] = array('parent' => $nodes[$treeid]['parent']?:0);
$nodes[$nodes[$val]['parent']][] = &$nodes[$val];
}
break;
default:
trigger_error("Unknown tree traverse-type: $type", E_USER_WARNING);
}
}
//the great thing is we can get all our ID's at once:
$keystofetch = implode(',',array_filter(array_keys($nodes)));
$messages = imap_fetch_overview($imap,$keystofetch, FT_UID);
foreach($messages as $message){
// you can of course store the _whole_ message in this thread like:
// $nodes[$message->uid]['message'] = get_object_vars($message);
// and do what you like with $tree[0]['children'] (be it a resursive array iterator,
// or a resursive function, your pick.
// However, for this example we are going to only set message to a string of p.o.c
// (which is also nicer for our treeiterator)
$ids[$message->uid] = $message->from.':'.$message->subject;
}
//let's show the result:
$it = new RecursiveTreeIterator(new RecursiveArrayIterator($nodes[0]),
RecursiveTreeIterator::BYPASS_CURRENT,
CachingIterator::TOSTRING_USE_KEY);
foreach($it as $key => $item){
echo "$key".(is_scalar($item)?': '.$item:'').PHP_EOL;
}
Which give us:
|-children
|-0
| |-parent: 0
| |-id: 35
| |-message: Friend Purple Acc2 <purple2@example.com>:A bigger message thread
| |-treeid: 1
| \-0
| |-parent: 1
| |-id: 7
| |-message: Friend White <white@example.com>:Re: A bigger message thread
| |-treeid: 2
| \-0
| |-parent: 2
| |-id: 11
| |-message: Friend Grey <grey@example.com>Re: A bigger message thread
| |-treeid: 3
| \-0
| |-parent: 3
| |-id: 39
| |-message: Friend Purple Acc2 <purple2@example.com>:Re: A bigger message thread
| |-treeid: 4
| \-0
| |-parent: 4
| |-id: 40
| |-message: Friend Pink <pink@example.com>:Re: A bigger message thread
| |-treeid: 5
| \-0
| |-parent: 5
| |-id: 38
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 6
| \-0
| |-parent: 6
| |-id: 12
| |-message: Friend Pink <pink@example.com>:Re: A bigger message thread
| |-treeid: 7
| \-0
| |-parent: 7
| |-id: 25
| |-message: Friend White <white@example.com>:Re: A bigger message thread
| |-treeid: 8
| \-0
| |-parent: 8
| |-id: 19
| |-message: Friend Black <black@example.com>:Re: A bigger message thread
| |-treeid: 9
| \-0
| |-parent: 9
| |-id: 23
| |-message: Friend Black <black@example.com>:Re: A bigger message thread
| |-treeid: 10
| \-0
| |-parent: 10
| |-id: 30
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 11
| \-0
| |-parent: 11
| |-id: 2
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 12
| |-0
| | |-parent: 12
| | |-id: 20
| | |-message: Me <me@example.com>:Re: A bigger message thread
| | |-treeid: 13
| | \-0
| | |-parent: 13
| | |-id: 1
| | |-message: Fiend Silver <silver@example.com>:Re: A bigger message thread
| | |-treeid: 14
| | \-0
| | |-parent: 14
| | |-id: 41
| | |-message: Fiend Silver <silver@example.com>:Re: A bigger message thread
| | |-treeid: 15
| | \-0
| | |-parent: 15
| | |-id: 27
| | |-message: Friend Grey <grey@example.com>Re: A bigger message thread
| | |-treeid: 16
| | \-0
| | |-parent: 16
| | |-id: 17
| | |-message: Friend Magenta <magenta@example.com>:Re: A bigger message thread
| | |-treeid: 17
| | |-0
| | | |-parent: 17
| | | |-id: 31
| | | |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| | | |-treeid: 18
| | | \-0
| | | |-parent: 18
| | | |-id: 4
| | | |-message: Friend Black <black@example.com>:Re: A bigger message thread
| | | |-treeid: 19
| | | \-0
| | | |-parent: 19
| | | |-id: 37
| | | |-message: Friend Black <black@example.com>:Re: A bigger message thread
| | | |-treeid: 20
| | | \-0
| | | |-parent: 20
| | | |-id: 24
| | | |-message: Friend Purple Acc2 <purple2@example.com>:Re: A bigger message thread
| | | |-treeid: 21
| | | \-0
| | | |-parent: 21
| | | |-id: 13
| | | |-message: Friend White <white@example.com>:Re: A bigger message thread
| | | \-treeid: 22
| | \-1
| | |-parent: 17
| | |-id: 15
| | |-message: Friend Grey <grey@example.com>Re: A bigger message thread
| | |-treeid: 23
| | \-0
| | |-parent: 23
| | |-id: 18
| | |-message: Friend Magenta <magenta@example.com>:Re: A bigger message thread
| | |-treeid: 24
| | \-0
| | |-parent: 24
| | |-id: 45
| | |-message: Friend Black <black@example.com>:Re: A bigger message thread
| | \-treeid: 25
| \-1
| |-parent: 12
| |-id: 46
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 26
| \-0
| |-parent: 26
| |-id: 29
| |-message: Fiend Silver <silver@example.com>:Re: A bigger message thread
| |-treeid: 27
| \-0
| |-parent: 27
| |-id: 26
| |-message: Friend Magenta <magenta@example.com>:Re: A bigger message thread
| |-treeid: 28
| |-0
| | |-parent: 28
| | |-id: 34
| | |-message: Friend Grey <grey@example.com>Re: A bigger message thread
| | \-treeid: 29
| |-1
| | |-parent: 28
| | |-id: 33
| | |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| | |-treeid: 30
| | \-0
| | |-parent: 30
| | |-id: 36
| | |-message: Friend White <white@example.com>:Re: A bigger message thread
| | |-treeid: 31
| | |-0
| | | |-parent: 31
| | | |-id: 10
| | | |-message: Friend White <white@example.com>:Re: A bigger message thread
| | | \-treeid: 32
| | \-1
| | |-parent: 31
| | |-id: 48
| | |-message: Friend Pink <pink@example.com>:Re: A bigger message thread
| | \-treeid: 33
| \-2
| |-parent: 28
| |-id: 47
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| |-treeid: 34
| \-0
| |-parent: 34
| |-id: 5
| |-message: Friend White <white@example.com>:Re: A bigger message thread
| |-treeid: 35
| \-0
| |-parent: 35
| |-id: 3
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| |-treeid: 36
| \-0
| |-parent: 36
| |-id: 21
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 37
| \-0
| |-parent: 37
| |-id: 8
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| |-treeid: 38
| \-0
| |-parent: 38
| |-id: 43
| |-message: Friend White <white@example.com>:Re: A bigger message thread
| |-treeid: 39
| \-0
| |-parent: 39
| |-id: 28
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| |-treeid: 40
| \-0
| |-parent: 40
| |-id: 42
| |-message: Friend Brown <brown@example.com>:Re: A bigger message thread
| |-treeid: 41
| \-0
| |-parent: 41
| |-id: 22
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| \-treeid: 42
|-1
| |-parent: 0
| |-id: 9
| |-message: Friend Blue <blue@example.com>:RE: A bigger message thread
| \-treeid: 43
|-2
| \-parent: 0
|-3
| |-parent: 44
| |-id: 49
| |-message: Some Subcription <foo@example.com>:Newsletter #1
| \-treeid: 45
|-4
| |-parent: 44
| |-id: 50
| |-message: Some Subcription <foo@example.com>:Newsletter #2
| \-treeid: 46
\-5
|-parent: 0
|-id: 32
|-message: Friend Red <red@example.com>:A second mainthread
|-treeid: 47
\-0
|-parent: 47
|-id: 16
|-message: Friend Black <black@example.com>:Re: A second mainthread
|-treeid: 48
\-0
|-parent: 48
|-id: 14
|-message: Friend Red <red@example.com>:Re: A second mainthread
|-treeid: 49
\-0
|-parent: 49
|-id: 6
|-message: Friend White <white@example.com>:Re: A second mainthread
|-treeid: 50
\-0
|-parent: 50
|-id: 44
|-message: Fiend Silver <silver@example.com>:Re: A second mainthread
\-treeid: 51
A few things of note:
- The first incarnation of this script incorrectly added branches to
the first child of a node instead of the actual node itself, which is
fixed now by also storing its parent.
imap_thread
isn't perfect: we see id=9
as an orphan, although it seems it should be in the first thread somewhere. However, due to
the headers not mentioning this, Google Apps here decided to make it
it's own node.
- The 3rd (key=2) entry is a method to 'return to root', as the
N.num.N.branch,N.next
method has apparently no other way to return
to root. This is the /return to root $nodes[$treeid] = &$nodes[0];
bit. You can/should filter this out after determining all other
nodes, but you need it to build the array at first.
To get only the nodes starting new threads (Nth reply on message, N>1):
$threads = imap_thread($imap, SE_UID);
$branchestarts = array();
foreach($threads as $key => $value){
list($num,$type) = explode('.',$key);
if (
$type=='num' // an id
&& $value == 0 // which is actually root
&& isset($threads[$num.'.next']) // then check for next
&& isset($threads[$threads[$num.'.next'].'.num'])
){
$branchestarts[] = $threads[$threads[$num.'.next'].'.num'];
} else if(
$type=='branch' // branch movement
&& $value != 0 // not back
&& isset($threads[$value.'.num']) // sanity: target exists
&& $threads[$value.'.num'] != 0 // and is not a return to root
){
$branchestarts[] = $threads[$value.'.num'];
}
}
echo json_encode($branchestarts);
Which gives us:
[35,15,46,33,48,47,9,49,50,32]
And indeed, 35,49,50 & 32 are starts of threads, 9 is recognized as such by the imap server too, and the rest are 2nd or more replies starting their own branches.
Now, you could indeed split out branches as seperate conversation, but as you can see, these are often only 1 or 2 replies more, longer threads tend to develop a bit more rarely.
To see how these 'branches' go:
$branches = array();
$currenttree = null;
foreach($threads as $key => $value){
list($num,$type) = explode('.',$key);
switch($type){
case 'num':
//nothing
break;
case 'next':
if(is_null($currenttree)) $currenttree = &$branches[$threads[$value.'.num']];
if($value && isset($threads[$value.'.num'])) $currenttree[] = $threads[$value.'.num'];
break;
case 'branch':
unset($currenttree);
if($value && $threads[$value.'.num']){
$branches[$threads[$value.'.num']] = array($threads[$value.'.num']);
$currenttree =& $branches[$threads[$value.'.num']];
}
}
}
echo json_encode($branches, JSON_PRETTY_PRINT);
Which gives you roots & branches & their replies:
{
"35": [
35,
7,
11,
39,
40,
38,
12,
25,
19,
23,
30,
2,
20,
1,
41,
27,
17,
31,
4,
37,
24,
13
],
"15": [
15,
18,
45
],
"46": [
46,
29,
26,
34
],
"33": [
33,
36,
10
],
"48": [
48
],
"47": [
47,
5,
3,
21,
8,
43,
28,
42,
22
],
"9": [
9
],
"49": [
49
],
"50": [
50
],
"32": [
32,
16,
14,
6,
44
]
}
With some slight alterations we can get the messages in there:
$branches = array();
$currenttree = null;
$messages = array();
foreach($threads as $key => $value){
list($num,$type) = explode('.',$key);
switch($type){
case 'num':
//nothing
break;
case 'next':
if(is_null($currenttree)) $currenttree = &$branches[$threads[$value.'.num']];
if($value && isset($threads[$value.'.num'])) $currenttree[] = &$messages[$threads[$value.'.num']];
break;
case 'branch':
unset($currenttree);
if($value && $threads[$value.'.num']){
$branches[$threads[$value.'.num']] = array(&$messages[$threads[$value.'.num']]);
$currenttree =& $branches[$threads[$value.'.num']];
} else {
$currenttree = null;
}
}
}
$keystofetch = implode(',',array_filter(array_keys($messages)));
foreach(imap_fetch_overview($imap,$keystofetch,FT_UID) as $message){
$messages[$message->uid] = $message;
}
echo json_encode($branches);//won't show it's output, this answer is to large as it is ;)
Another option is to just sort them by datetime value, which would be alright for conversations with little/negligible branching, probably making most of the code you are planning just work.
A combination of the two would be 'moving branches', follow threads in series, so this:
1 2013-06-01
2 2013-06-02
3 2013-06-03
4 2013-06-03
5 2013-06-04
Becomes a sequence of 1,2,3,4,5
but a reply on 3
would resort it:
1 2013-06-01
4 2013-06-03
5 2013-06-04
2 2013-06-02
3 2013-06-03
6 2013-06-05
Making it a sequence of 1,4,5,2,3,6
, which would keep it a logically flowing conversation, with always the thread/branch with the last reply as last.