Why this mysql query (with is null check) is so sl

2019-06-02 11:35发布

问题:

I getting in doubt with "IS NULL" MySQL check. I have this 2 queries. The first one runs in about 300 seconds. The second one run less then 1 second!

Slow query:

SELECT count(distinct(u.id))
FROM ips_usuario AS u 
JOIN ips_fatura AS f
    ON ((u.id = f.ips_usuario_id) OR
       (u.ips_usuario_id_titular IS NOT NULL AND
        u.ips_usuario_id_titular = f.ips_usuario_id));

Fast query:

SELECT count(distinct(u.id))
FROM ips_usuario AS u 
JOIN ips_fatura AS f
    ON ((u.id = f.ips_usuario_id) OR
       (u.ips_usuario_id_titular = f.ips_usuario_id));

All join conditions use foreign keys indexed columns. The table ips_usuario have about 20.000 records and the table ips_fatura have about 500.000 records.

回答1:

I am surprised that either is fast. I would suggest replacing them with exists:

SELECT COUNT(*)
FROM ips_usuario u  
WHERE EXISTS (SELECT 1 FROM ips_fatura f WHERE u.id = f.ips_usuario_id) OR
      EXISTS (SELECT 1 FROM ips_fatura f WHERE u.ips_usuario_id_titular = f.ips_usuario_id);

And for the second:

SELECT COUNT(*)
FROM ips_usuario u  
WHERE EXISTS (SELECT 1 FROM ips_fatura f WHERE u.id = f.ips_usuario_id) OR
      (u.ips_usuario_id_titular IS NOT NULL AND
       EXISTS (SELECT 1 FROM ips_fatura f WHERE u.ips_usuario_id_titular = f.ips_usuario_id)
      )

For both these, you want two indexes: ips_fatura(ips_usuario_id) and ips_fatura(ips_usuario_id_titular). You can check the explain to be sure that EXISTS is using the index. If not, the newer releases of MySQL use indexes for IN:

SELECT COUNT(*)
FROM ips_usuario u  
WHERE u.id IN (SELECT f.ips_usuario_id FROM ips_fatura f) OR
      u.ips_usuario_id_titular IN (SELECT f.ips_usuario_id FROM ips_fatura f);

In either case (EXISTS or IN) the goal is to do a "semi-join". That is, to only fine the first row with a match rather than all matches. This is an important efficiency, because it allows the query to avoid duplication removal.

I would speculate that the issue is the optimization of the or -- usually this results in inefficient JOIN algorithms. However, perhaps MySQL is smart in your first case. But the addition of the IS NULL to the outer table throws it off.