I have 2 tables (A and B) with the same primary keys. I want to select all row that are in A and not in B. The following works:
select * from A where not exists (select * from B where A.pk=B.pk);
however it seems quite bad (~2 sec on only 100k rows in A and 3-10k less in B)
Is there a better way to run this? Perhaps as a left join?
select * from A left join B on A.x=B.y where B.y is null;
On my data this seems to run slightly faster (~10%) but what about in general?
I use queries in the format of your second example. A join is usually more scalable than a correlated subquery.
I think your last statement is the best way. You can also try
SELECT A.*
from A left join B on
A.x = B.y
where B.y is null
I also use left joins with a "where table2.id is null" type criteria.
Certainly seems to be more efficient than the nested query option.
Joins are generally faster (in MySQL), but you should also consider your indexing scheme if you find that it's still moving slowly. Generally, any field setup as a foreign key (using INNODB) will already have an index set. If you're using MYISAM, make sure that any columns in the ON statement are indexed, and consider also adding any columns in the WHERE clause to the end of the index, to make it a covering index. This allows the engine to have access to all the data needed in the index, removing the need to make a second round-trip back to the original data. Keep in mind that this will impact the speed of inserts/updates/deletes, but can significantly increase the speed of the query.
This helped me a lot. Joins
are always faster than Sub Queries to give results:
SELECT tbl1.id FROM tbl1 t1
LEFT OUTER JOIN tbl2 t2 ON t1.id = t2.id
WHERE t1.id>=100 AND t2.id IS NULL ;