Why Index is not used with subquery

2019-04-29 18:20发布

问题:

This takes 0.001 seconds to execute and it uses index seek

SELECT * FROM CUSTOMER WHERE ID IN (1008,1122)

Now I have a stored procedure U_VIP which returns the same ID as example one (1008,1122), and it takes only 0.001 second to execute

SELECT ID FROM U_VIP    //returns (1008,1122)

Now when I combine them, it takes around half-a-second to execute and index is not used

SELECT * FROM CUSTOMER WHERE ID IN (SELECT ID FROM U_VIP)

I've simplified the example above, in actual application the performance is impacted by much higher magnitude. How to force Firebird to use index in this case?

**Using Firebird 2.1

** EDIT **

Base on Mark's answer, use JOIN does improve the execution time because it is now doing index seek.

SELECT CUSTOMER.* 
FROM CUSTOMER
INNER JOIN U_VIP ON U_VIP.ID = CUSTOMER.ID

This is great, however, it introduces another problem for me which I'll try to explain in the following example.

SELECT CUSTOMER.* 
FROM CUSTOMER
WHERE (:AREAID = 0 OR ID IN (SELECT ID FROM U_VIP(:AREAID)))

Using where clause, I can conditionally apply the filter base on whether :AREAID is supplied by the user. How do I achieve the same when I replace the where clause with a join?

Something like:

SELECT CUSTOMER.* 
FROM CUSTOMER
{IF :AREAID > 0 THEN}
INNER JOIN (SELECT ID FROM U_VIP(:AREAID)) VIP ON VIP.ID = CUSTOMER.ID
{END IF}

Which of course, Firebird dislikes the part with braces =/

回答1:

Instead of IN, you need to use EXISTS or an INNER JOIN. I am not entirely sure about the details, but I believe in your query the CUSTOMER table is fully read, evaluating the result of the subquery for every row (maybe even executing the subquery for every row). As the optimizer doesn't know the number of results of the subquery in advance, it can't create an optimization like it can if you use a fixed number of literal values like in your first query.

Try changing your query to:

SELECT * 
FROM CUSTOMER 
WHERE EXISTS (SELECT 1 FROM U_VIP WHERE U_VIP.ID = CUSTOMER.ID)

Or:

SELECT CUSTOMER.* 
FROM CUSTOMER
INNER JOIN U_VIP ON U_VIP.ID = CUSTOMER.ID

Or (changing the order sometimes can lead to better performance):

SELECT CUSTOMER.* 
FROM U_VIP
INNER JOIN CUSTOMER ON CUSTOMER.ID = U_VIP.ID

In general I would expect those queries to perform better than the query with IN.

Edit in response to update

Based on your updated question I can think of multiple solutions, I am not entirely sure on their performance though.

  • Use separate queries for :AREAID is 0 and :AREAID is not 0
  • Use a stored procedure or EXECUTE BLOCK with an EXECUTE STATEMENT with a dynamically built statement (variant of the previous)
  • Make the stored procedure U_VIP return all customers if :AREAID is 0
  • Use an additional JOIN condition OR :AREAID = 0; this might not yield results if U_VIP returns nothing for 0 (and might not perform *)
  • Use a LEFT JOIN and add WHERE U_VIP.ID IS NOT NULL OR :AREAID = 0 (might not perform *)
  • Use a UNION of the 'normal' query and a second query on CUSTOMER with WHERE :AREAID = 0 (might not perform *)

For (*) see the 'Smart logic' anti-pattern

For the dynamically built query you can think of something like:

EXECUTE BLOCK (INPUTCONDITION INTEGER = ?)
    RETURNS (ID INTEGER)
AS
    DECLARE VARIABLE QUERY VARCHAR(6400);
BEGIN
    QUERY = 'SELECT a.ID FROM SORT_TEST a';
    IF (INPUTCONDITION <> 0) then
        QUERY = QUERY || ' WHERE a.ID = ' || INPUTCONDITION;
    FOR EXECUTE STATEMENT QUERY INTO :ID
    DO
        SUSPEND;
END

In this example the value 0 for INPUTCONDITION will generate a query without WHERE-clause, and for other inputs a query with a WHERE-clause. Doing it like this is prone to SQL injection if the parameter is a (VAR)CHAR or BLOB, so be careful. You could also consider two branches where one uses EXECUTE STATEMENT with parameters and the other without.

Instead of EXECUTE BLOCK, you can also use a selectable procedure like you already use for U_VIP; EXECUTE BLOCK is essentially a stored procedure that isn't stored in the database.

See also 'Myth: dynamic SQL is slow'



标签: sql firebird