My db is running on mysql v5.x. I have a table T1 with 5 columns and column C1 is the primary key. C1 is of type varchar(20). It contains about 2000 rows with values like:
fxg
axt3
tru56
and so on..
Now my application's job is to read input data and find if the input data has a starting pattern similar to that found in column C1 in table T1. For example: my input may appear as:
trx879478986
fxg87698x84
784xtr783utr
axt3487ghty
... and so on
So for the above input, I have to return true for 'fxg87698x84' and 'axt3487ghty' and false for others. The query I use is:
select 1 from T1 where (? like concat(C1,'%'));
note: the ? is replaced by the input value got from the application.
The issue is my input is huge (about 1 million records to be processed in 30 minutes) and my query is not fast enough. Any ideas on how to re-write the query or force it to use indexes? Even if I have to use a different object structure, I can do it, if that helps. So any help will be appreciated. Thx.
the way your problem is set up, you almost by definition need to check every row in the database against every input doing it the way you are currently doing it. The index doesn't really matter in this case since any row could be a match.
I'm not sure it would be faster, but one thing you could try would be to query the database for an exact match on every possibly valid substring of your input.
For example, if you know that your substrings have to be at least length 3 to match, start with the first 3 characters: trx879478986 => trx, trx8, trx87, ...
Build an array of those possible matches and use the
IN()
operator to query for them:I'm pretty sure mysql can use an index to match against a list of values given to
IN()
you could try a Top-N query to find the first candidate, and then apply that candidate only to the actual pattern:
the top-n query should use a regular index on c1.
EDIT: Explained that in more detail in my blog: http://blog.fatalmind.com/2010/09/29/finding-the-best-match-with-a-top-n-query/