-->

SQL INDEX not used on WHERE ABS(x-y) < k condit

2019-02-20 16:37发布

问题:

I have a query involving couples of rows which have a less-than-2-hours time-difference (~0.08333 days):

SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 
                    WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333

This query is rather slow, i.e. ~ 1 second (the table has ~ 10k rows).

An idea was to use an INDEX. Obviously CREATE INDEX id1 ON mytable(date) didn't improve anything, that's normal.


Then I noticed that the magical query CREATE INDEX id2 ON mytable(JULIANDAY(date))

  1. didn't help when using:

    ... WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
    
  2. didn't help when using:

    ... WHERE JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
    
  3. ... but massively improved the performance (query time happily divided by 50 !) when using:

    ... WHERE JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
          AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333
    

Of course 1., 2. and 3. are equivalent since mathematically,

|x-y| < 0.08333 <=> y - 0.08333 < x < y + 0.08333
                <=> x < y + 0.08333 AND x > y - 0.08333

Question: Why are solutions 1. and 2. not making use of INDEX whereas solution 3. is using it?


Note:

  • I'm using Python + Sqlite sqlite3 module

  • The fact solutions 1. and 2. are not using the index is confirmed when doing EXPLAIN QUERY PLAN SELECT ...:

    (0, 0, 0, u'SCAN TABLE mytable AS mt1')
    (0, 1, 1, u'SCAN TABLE mytable AS mt2')
    

    The fact solution 3. is using the index is shown when doing EXPLAIN QUERY PLAN SELECT ...:

    (0, 0, 1, u'SCAN TABLE mytable AS mt2')
    (0, 1, 0, u'SEARCH TABLE mytable AS mt1 USING INDEX id2 (<expr>>? AND <expr><?)')
    

回答1:

I believe that the inclusion of AND is the reasoning as per :

The WHERE clause on a query is broken up into "terms" where each term is separated from the others by an AND operator. If the WHERE clause is composed of constraints separate by the OR operator then the entire clause is considered to be a single "term" to which the OR-clause optimization is applied.

The SQLite Query Optimizer Overview

It may be worthwhile running ANALYZE to see if that improves matters.

As per the comment:

I think the previously added paragraph can clarify why ABS(x-y) < k is not using index, and why x < y + k is using it, don't you think so? Would you want to include this paragraph? [All terms of the WHERE clause are analyzed to see if they can be satisfied using indices. To be usable by an index a term must be of one of the following forms: column = expression, column IS expression, column > expression ...

The following has been added.

To be usable by an index a term must be of one of the following forms:
column = expression
column IS expression
column > expression
column >= expression
column < expression
column <= expression
expression = column
expression > column
expression >= column
expression < column
expression <= column
column IN (expression-list)
column IN (subquery)
column IS NULL

I'm not sure if it would work with a BETWEEN (e.g. WHERE column BETWEEN expr1 AND expr2).



回答2:

You are using an expression index. The documentation says:

The SQLite query planner will consider using an index on an expression when the expression that is indexed appears in the WHERE clause or in the ORDER BY clause of a query, exactly as it is written in the CREATE INDEX statement. The query planner does not do algebra.

So it is not possible to use an index to speed up lookups of a call to abs() if the indexed expression is only a parameter. (And it is not possible to index the entire abs() call because it involves two tables.)

So converting the expression as you did is the only way to make it more efficient.

(Please note that a<b<c compares a and b first, and then compares the resulting boolean value against c. This is not what you want.)