Negative backreferences in MySQL REGEXP

2019-05-14 23:07发布

MySQL manual is not very detailed about what expressions it supports, so I am not sure if the following is possible with MySQL at all.

I am trying to create a query with RLIKE which matches the following.

The task is to get from SQL all the sentences which contains at least any two words from the given sentence.

Let's say, I have some certain words to use in regex:

hello, dog

I have following sentences in the database:

hello from dog
hello hello cat
dog says hello
dog dog goes away
big bad dog

From those all I want to match only

hello from dog
dog says hello

For now I have it like this:

SELECT *
FROM test
WHERE 
test RLIKE '(hello|dog).*(hello|dog)'

The problem is - I get also those unneeded

hello hello cat
dog dog goes away

So I guess, I need a backreference right before the second (hello|dog).

In pseudo code it would look like this:

RLIKE '(hello OR dog) anything can be here (hello OR dog, but not the word which already was in the previous group)'

so it could be like:

'(hello|dog).*(negative backreference to the 1st group goes here)(hello|dog)'

Can such negative backreference be done in MySQL regex? Or maybe there is some better way to write the regex which does the same thing, but also considering that the query will get generated by some C++ code, so it shouldn't be too complex to generate?

1条回答
SAY GOODBYE
2楼-- · 2019-05-14 23:48

MySQL uses a Posix Extended Regular Expression engine (POSIX ERE) and therefore doesn't support backreferences at all. It also doesn't support lookaround which you would need in order to construct a single regex that could handle this.

Therefore you have to spell out all possible combinations:

hello.*dog|dog.*hello

Of course, this will get unwieldy if the number of match candidates increases, so regular expressions are not the right tool for this in MySQL, unless you can install/use LIB_MYSQLUDF_PREG.

查看更多
登录 后发表回答