Having this table:
CREATE TABLE `example` (
`id` int(11) unsigned NOT NULL auto_increment,
`keywords` varchar(200) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
We would like to optimize the following query:
SELECT id FROM example WHERE keywords LIKE '%whatever%'
The table is InnoDB, (so no FULLTEXT for now) which would be the best index to use in order to optimize such query?
We've tried a simple :
ALTER TABLE `example` ADD INDEX `idxSearch` (`keywords`);
But an explain query shows that need to scan the whole table
if our queries where LIKE 'whatever%' instead, this index performs well, but otherwise has no value.
Is there anyway to optimize this for innoDB ?
Thanks!
Indexes are built from the start of the string towards the end. When you use LIKE 'whatever%'
type clause, MySQL can use those start-based indexes to look for whatever
very quickly.
But switching to LIKE '%whatever%'
removes that anchor at the start of the string. Now the start-based indexes can't be used, because your search term is no longer anchored at the start of the string - it's "floating" somewhere in the middle and the entire field has to be search. Any LIKE '%...
query can never use indexes.
That's why you use fulltext indexes if all you're doing are 'floating' searches, because they're designed for that type of usage.
Of major note: InnoDB now supports fulltext indexes as of version 5.6.4. So unless you can't upgrade to at least 5.6.4, there's nothing holding you back from using InnoDB *AND fulltext searches.
I would like to comment that surprisingly, creating an index also helped speed up queries for like '%abc%'
queries in my case.
Running MySQL 5.5.50
on Ubuntu
(leaving everything on default), I have created a table with a lot of columns and inserted 100,000
dummy entries. In one column, I inserted completely random strings with 32 characters (i.e. they are all unique).
I ran some queries and then added an index on this column.
A simple
select id, searchcolumn from table_x where searchcolumn like '%ABC%'
returns a result in ~2 seconds
without the index and in 0.05 seconds
with the index.
This does not fit the explanations above (and in many other posts). What could be the reason for that?
EDIT
I have checked the EXPLAIN output. The output says rows is 100,000
, but Extra info is "Using where; Using index
". So somehow, the DBMS has to search all rows, but still is able to utilise the index?