Two tables:

emails: id (int10) | ownership (int10)
messages: emailid (int10, indexed) | message (mediumtext)
The subquery approach (which is terrible in MySQL):
SELECT COUNT(*) FROM messages WHERE message LIKE '%word%' AND emailid IN (SELECT id FROM emails WHERE ownership = 32)
The usage here is that I run a search on emails (obviously simplified in the sample above), which generates a list of, say, 3,000 email IDs. I then want to run a search against messages, because I need to do a text match on the message, but only for those 3,000 emails.
The query against messages is expensive (message is not indexed), but this is fine because it would only ever be checking a few rows.
Ideas:
i) A join. My attempts at this so far have not worked and have resulted in full table scans of the messages table (i.e. the emailid index is not used).
ii) A temporary table. This could work, I think.
iii) Cache the IDs in the client and run 2 queries. This does work, but is not elegant.
iv) A subquery. MySQL re-runs the subquery for each row of the outer query, so this does not work. Maybe fixed in MySQL 6.
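Idea (ii) can be sketched with the table names from the question (the temporary table name here is made up):

```sql
-- Materialise the cheap, indexed email search once
CREATE TEMPORARY TABLE tmp_ids (id INT UNSIGNED PRIMARY KEY);
INSERT INTO tmp_ids SELECT id FROM emails WHERE ownership = 32;

-- The expensive LIKE now only touches messages for those IDs
SELECT COUNT(*)
FROM messages m
JOIN tmp_ids t ON m.emailid = t.id
WHERE m.message LIKE '%word%';

DROP TEMPORARY TABLE tmp_ids;
```

This is essentially idea (iii) done server-side, so the 3,000 IDs never travel to the client.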
Ok, here is what I have so far. These are the actual field names (I had simplified a bit in the question).
The query:
SELECT COUNT(*) FROM ticket LEFT JOIN ticket_subject
ON (ticket_subject.ticketid = ticket.id)
WHERE category IN (1)
AND ticket_subject.subject LIKE "%about%"
The results:
id select_type table          type   possible_keys    key     key_len ref               rows  Extra
1  SIMPLE      ticket         ref    PRIMARY,category category 4      const             28874
1  SIMPLE      ticket_subject eq_ref PRIMARY          PRIMARY  4      deskpro.ticket.id 1     Using where
It takes 0.41 seconds and returns a count(*) of 113.
Running:
SELECT COUNT(*) FROM ticket WHERE category IN (1)
Takes 0.01 seconds and finds 33,000 results.
Running
SELECT COUNT(*) FROM ticket_subject WHERE subject LIKE "%about%"
Takes 0.14 seconds and finds 1,300 results.
Both the ticket table and ticket_subject table have 300,000 rows.
There is an index on ticket_subject.ticketid and ticket.category.
I realise now that using the LIKE syntax was a mistake, as it has been a bit of a red herring, prompting suggestions about FULLTEXT. That is not the issue. The issue is:
1) Table A: very fast query, run on an index. 0.001 seconds.
2) Table B: moderate-to-slow query, no index, does a full table scan. 0.1 seconds.
Both of these results are fine. The problem is that when I have to JOIN them, the search takes 0.3 seconds, which makes no sense to me: the slow part of the combined query, on Table B, should be quicker because we are now only searching a fraction of that table. That is, it should not be doing a full table scan, because the field being JOINED on is indexed.
I think this is what you are looking for:
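A join along these lines, using the question's table names (the exact query is an assumption):

```sql
SELECT COUNT(*)
FROM emails e
JOIN messages m ON m.emailid = e.id
WHERE e.ownership = 32
  AND m.message LIKE '%word%';
```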
Hard to tell for sure how it will perform. If the full table scan is because of the leading wildcard on 'word', then doing it this way won't solve the problem. But the good news is that perhaps the join will limit the records in the messages table you have to look at.
The problem, though, is with the '%word%' predicate. This will always require a scan of message. You might want to look into full-text search if you are using MyISAM.

Remember to take advantage of Boolean short-circuit evaluation:
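Applied to the question's original query, that ordering looks like this (whether MySQL actually short-circuits here is optimizer-dependent):

```sql
-- Cheap, indexed condition first; expensive LIKE last
SELECT COUNT(*)
FROM messages
WHERE emailid IN (SELECT id FROM emails WHERE ownership = 32)
  AND message LIKE '%word%';
```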
This filters by ownership before it evaluates the LIKE predicate. Always put your cheaper expressions on the left.

Also, I agree with @Martin Smith and @MJB that you should consider using MySQL's FULLTEXT indexing to make this faster.

Re your comment and additional information, here's some analysis:
The note "Using index" is a good thing to see because it means it can satisfy the query just by reading the index data structure, not even touching the data of the table. This is certain to run very fast.
This shows that there are no possible keys that can benefit the wildcard LIKE predicate. It uses the condition in the WHERE clause, but it has to evaluate it by running a table-scan.

Likewise, accessing the ticket table is quick, but that's spoiled by the table-scan incurred by the LIKE condition.

You're never going to make LIKE perform well. See my presentation, Practical Full-Text Search in MySQL.

Re your comment: Okay, I've done some experiments on a dataset of similar size (the Users and Badges tables in the Stack Overflow data dump :-). Here's what I found:
That's really fast, because I have an index on the reputation column.
That's as expected, since the table has 700k rows, and it has to do a table-scan. Now let's do the join:
That doesn't seem so bad. Here's the explain report:
This does seem like it's using indexes intelligently for the join, and it helps that I have a compound index including userid and reputation. Remember that MySQL can generally use only one index per table in a query, so it's important to define the right compound indexes for the query you need to run.
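Such a compound index might be declared like this (the table layout and index name here are assumptions based on the Stack Overflow data dump):

```sql
-- Reputation first so the range filter can use the index;
-- Id is included so the join column is covered as well.
ALTER TABLE Users ADD INDEX idx_reputation_id (Reputation, Id);
```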
Re your comment: OK, I've tried this where reputation > 5000, and where reputation > 500, and where reputation > 50. These should match a much larger set of users.
The explain report is the same in all cases, but if the query finds more matching rows in the Users table, then it naturally has to evaluate the LIKE predicate against a lot more matching rows in the Badges table.

It's true that there is some cost to doing a join. It's a little surprising that it's so dramatically expensive. But this can be mitigated if you use indexes.
I know you said you have a query that can't use an index, but perhaps it's time to consider creating a redundant column with some transformed version of the data of your original column, so you can index it. In the example above, I might create a column creationdate_day and populate it from DAYOFYEAR(creationdate).

Here's what I mean:
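A minimal sketch of that redundant-column approach (column and index names are assumptions):

```sql
-- Add the redundant, index-friendly column
ALTER TABLE Users ADD COLUMN creationdate_day INT;

-- Populate it from the transformed original column
UPDATE Users SET creationdate_day = DAYOFYEAR(creationdate);

-- Index it so the transformed search no longer needs a table-scan
CREATE INDEX idx_users_creationdate_day ON Users (creationdate_day);

-- A search on the transformed value can now use the index
SELECT COUNT(*) FROM Users WHERE creationdate_day = 150;
```

Keeping the redundant column in sync on insert/update is left to the application or a trigger.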
The explain report then shows the query using the new index.
Is it possible for you to turn the join the other way around? It seems that the second query is the less expensive one, and since the whole thing is a simple join, you want to perform the less expensive query first, to narrow the data set as much as possible, and then join to your more expensive query.