I'm trying to create a dbase solution for email threading. When I receive an email.. I want to know if this email belongs to an existing thread.
So I match the subject ie subject = 'cool bro' matches 're: cool bro'
and i also want to match the sender and receiver pair
ie
(sender = 'A@gmail.com' and receiver = 'B@gmail.com')
OR
(sender ='B@gmail.com' and receiver = 'A@gmail.com')
for those exact cases.. this query worked fine (see more details here):
(SELECT COUNT(search_email.threadID) FROM search_email
WHERE search_email.subject MATCH '%query%' AND
(
(search_email.sender = '%sender' AND search_email.tos = '%receiver%')
OR
(search_email.sender = '%receiver%' AND search_email.tos = '%sender%')
)
problem is when you have a case like this:
(sender = 'Amanda Collins A@gmail.com' and receiver = 'B@gmail.com')
OR
(sender ='Billy Bob B@gmail.com' and receiver = 'A@gmail.com')
clearly.. a MATCH clause (or some regex or somthing) is in order here..
problem is that sqllite doesn't allow MATCH
to work with OR
: a UNION
is needed instead..
but I couldn't figure out a way to convert the above statement into a statement that has UNION/INTERSECT
instead:
SELECT * FROM search_email WHERE
search_email.subject MATCH '%query%' INTERSECT SELECT * FROM
(SELECT * FROM (SELECT * FROM search_email WHERE
search_email.sender MATCH '%sender%'
INTERSECT SELECT * FROM search_email WHERE search_email.tos MATCH '%receiver%' )
UNION
SELECT * FROM( SELECT * FROM search_email WHERE search_email.sender MATCH '%sender%'
INTERSECT SELECT * FROM search_email WHERE search_email.tos MATCH '%receiver%'))
any ideas?
update:
it seems the answer is just by changing the first intersect
into a union
:
SELECT * FROM search_email WHERE
search_email.subject MATCH '%query%' UNION SELECT * FROM
(SELECT * FROM (SELECT * FROM search_email WHERE
search_email.sender MATCH '%sender%'
INTERSECT SELECT * FROM search_email WHERE search_email.tos MATCH '%receiver%' )
UNION
SELECT * FROM( SELECT * FROM search_email WHERE search_email.sender MATCH '%sender%'
INTERSECT SELECT * FROM search_email WHERE search_email.tos MATCH '%receiver%'))
but this solution seems very inefficient.. mysql compiler takes a while to figure it out.. is there a more efficient way?
Using
MATCH
orLIKE '%string%'
will be always very slow. You might be able to improve that using full text search, but you still will not get very far.There is another way to do this, which is more efficient and is widely supported.
According to RFC2822 (and older RFC822) mail client programs should create and maintain fields in mail headers that facilitate threading support. In practice, all known mail clients actually support that. Those fields are:
<randomstring@sender.com>
If you extract these headers and save them in separate columms and/or tables in your database, you should be able to easily create threads by using exact comparison (no
MATCH '%string%
'), so it will be very fast.