INSTR(str,substr) does not work when str contains

In another post on stackoverflow, I read that INSTR could be used to order results by relevance.

My understanding of col LIKE '%str%' andINSTR(col, 'str')` is that they both behave the same. There seems to be a difference in how collations are handled.

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO users (name)
VALUES ('Joël'), ('René');

SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned

Although INSTR does some conversion, it finds ë in é.

SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0

Am I missing something?

http://sqlfiddle.com/#!2/9bf21/6 (using mysql-version: 5.5.22)

标签： mysql utf-8 collation

1条回答

Melony?

2楼-- · 2019-06-24 15:16

This is due to bug 70767 on LOCATE() and INSTR(), which has been verified.

Though the INSTR() documentation states that it can be used for multi-byte strings, it doesn't seem to work, as you note, with collations like utf8_general_ci, which should be case and accent insensitive

This function is multi-byte safe, and is case sensitive only if at least one argument is a binary string.

The bug report states that although MySQL does this correctly it only does so when the number of bytes is also identical:

However, you can easily observe that they do not (completely) respect collations when looking for one string inside another one. It seems that what's happening is that MySQL looks for a substring which is collation-equal to the target which has exactly the same length in bytes as the target. This is only rarely true.

To pervert the reports example, if you create the following table:

create table t ( needle varchar(10), haystack varchar(10)
                  ) COLLATE=utf8_general_ci;
insert into t values ("A", "a"), ("A", "XaX");
insert into t values ("A", "á"), ("A", "XáX");
insert into t values ("Á", "a"), ("Á", "XaX");
insert into t values ("Å", "á"), ("Å", "XáX");

then run this query, you can see the same behaviour demonstrated:

select needle
     , haystack
     , needle=haystack as `=`
     , haystack LIKE CONCAT('%',needle,'%') as `like`
     , instr(needle, haystack) as `instr`
  from t;

SQL Fiddle

0人赞添加讨论(0) 举报

INSTR(str,substr) does not work when str contains

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间