In another post on stackoverflow, I read that INSTR
could be used to order results by relevance.
My understanding of col LIKE '%str%' and
INSTR(col, 'str')` is that they both behave the same. There seems to be a difference in how collations are handled.
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO users (name)
VALUES ('Joël'), ('René');
SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned
Although INSTR
does some conversion, it finds ë
in é
.
SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0
Am I missing something?
http://sqlfiddle.com/#!2/9bf21/6 (using mysql-version: 5.5.22)
This is due to bug 70767 on
LOCATE()
andINSTR()
, which has been verified.Though the
INSTR()
documentation states that it can be used for multi-byte strings, it doesn't seem to work, as you note, with collations likeutf8_general_ci
, which should be case and accent insensitiveThe bug report states that although MySQL does this correctly it only does so when the number of bytes is also identical:
To pervert the reports example, if you create the following table:
then run this query, you can see the same behaviour demonstrated:
SQL Fiddle