How bad is IN operator for SQL query perfomance?

2019-05-30 00:45发布

问题:

I had SQL query that was taking 9 hours for execution. See below:

Select Field1, Field2
From A
Where Field3 IN (45 unique values here) 

When I have split this query into 3 exactly the same queries only with each having 15 values within IN clause, they each took 2 minutes to execute. So instead of spending 9 hours I now spend 6 minutes (3 queries * 2 minutes).

Can anyone explain please how practically the same queries with only difference in quantity of values in the IN clause take that different an amount of time to run?

回答1:

  • What database engine are you using (Oracle, SQL Server, MySQL, etc.)?
  • Can you post the query plan for both queries?

It sounds like the optimizer your database is using chose a very poor execution plan for the first query. Assuming that your database engine uses some sort of cost-based optimizer (most of the large vendors do), that generally implies that the optimizer's guess as to how many rows are returned was quite incorrect. Most likely, its estimates are equally incorrect whether there are 15 or 45 unique values specified but it happens that tripling the bad estimate leads the optimizer to believe that a different query plan would be appropriate. Generally, this means that the information you've given the optimizer in terms of statistics about table A are incorrect and need to be fixed.

That being said, how large is table A? If you are really querying from a single table, the optimizer has relatively few options to consider. It really only has to decide between doing a full table scan on A and using an index on Field3. Even if it decides to do a table scan rather than use an index on Field3, there is no way that it should take 9 hours to scan a single table barring really exceptional circumstances where table A is measured in at least hundreds of GB and the hardware is relatively pedestrian.



回答2:

Did you try to use EXISTS ?

 Select Field1, Field2 From A Where
 EXISTS (SELECT NULL FROM B Where
 B.value = A.Value )


回答3:

Two possibilities that I can see:

1) There might be a billion records with Field3 = 10001 so that'll be really slow. And there might be no records with the other values, so that would be very fast.

2) the issue could be the method that the DB uses to run the query. for example there may be a cut off point where it will probably switch from using the index to a full table scan. the optimizer isn't always right. when it's wrong you have to dance around it.



回答4:

Poor query performance can be caused by a number of different issues, none of which I can even venture to guess would be your problem because there just isn't enough information:

  • Multiple or unnecessary joins
  • Rowcounts of joined tables
  • No indexes or indexes not being utilized by query engine
  • Out of date table statistics causing indexes to be inefficient
  • Inefficient query

With that being said you can always get a better idea of how your query is being interpreted by the query engine by running an EXPLAIN PLAN. Here is information on how to do this for an Oracle database, http://download.oracle.com/docs/cd/B10500_01/server.920/a96533/ex_plan.htm



回答5:

First, make sure you have an index on your "Field3" in your table "A", and if possible include "Field1", "Field2" (covering index). You don't say which database you are running so it is hard to be precise in giving any better advice.

If that still doesn't speed it up, try creating a temp table, then inserting all the "IN" values and then "INNER JOIN" or "WHERE EXISTS" that temp table to your table "A".