Why “IN ” query tag is so costly in sql stored pro

2020-05-09 19:10发布

问题:

How can I improve my performance issue? I have an sql query with 'IN' I guess 'IN' making some costly performance issue. But I need index my sql query?

My sql query:


SELECT [p].[ReferencedxxxId]  
FROM [Common].[xxxReference] AS [p]  
WHERE ([p].[IsDeleted] = 0)  
  AND (([p].[ReferencedxyzType] = @__refxyzType_0)
  AND [p].[ReferencedxxxId] IN ('42342','ffsdfd','5345345345'))

My solution: (BUT I NEED YOUR HELP FOR BETTER ADVISE) Whichone is correct clustered or nonclustred index?

USE [xxx]
GO
CREATE NONCLUSTERED INDEX IX_NonClusteredIndexDemo_xxxId
ON [Common].[xxxReference](xxxId)
INCLUDE ([ID],[ReferencedxxxId])
WITH (DROP_EXISTING=ON, ONLINE=ON, FILLFACTOR=90)
GO

Second:

CREATE INDEX xxxReference_ReferencedxxxId_index
ON [Common].[xxxReference]  (ReferencedxxxId)[/code]

Whichone is correct or do you have better solution?

回答1:

Couple of things:

  1. If you have a SELECT statement inside the IN, that should be avoided and should be replaced with an EXISTS clause. But in your above example, that is not relevant as you have direct values inside IN.

    Using EXISTS and NOT EXISTS instead of IN and NOT IN helps SQL Server to not needing to scan each value of the column for each values inside the IN / NOT IN and rather can short circuit the search once a match or non-match found.

  2. Avoid the implicit conversion. They degrade the performance due to many reasons including i> SQL Server not able to find proper statistics on an index and hence not able to leverage an index and would rather go make use of a clustered index available in the table (which may not be covering your query), ii> Not assigning proper required RAM during memory allocation phase of the query by storage engine, iii> Cardinality estimation becomes wrong as SQL Server would not have statistics on the computed value of that column, and rather probably had statistics on that column.

    If you look at your execution plan posted above, you will see a yellow mark in your 'SELECT'. If you hover over it, you will see one/more warning messages. If your warning is related to implicit conversion, try to use proper datatypes during comparison.

    Eg. What is the datatype of the column '[ReferencedxxxId]'? If it is not an NVARCHAR and is rather a VARCHAR, then I would suggest:

  3. Make the values inside the IN as VARCHAR (currently you are making them NVARCHAR). This way you will still be able to take full advantage of the rowstore index created on [ReferencedxxxId] column.

  4. If you must have the values as NVARCHAR inside the IN clause, then you should:

    • CONVERT/CAST the column [ReferencedxxxId] in your IN clause. This is going to get rid of the Implicit conversion but you will no longer be able to take full advantage of the rowstore index on [ReferencedxxxId] column.

      +

    • Rather create a clustered/nonclustered columnstore index on the table covering the columns used in the query. This should significantly enhance the performance of your SELECT query.

  5. If you decided to go with the route of using rowstore index by correcting the values inside the IN, you need to make sure that you create a clustered/nonclustered index which covers the query. Meaning the index covers the columns on which you are doing search ([ReferencedxxxId], [ReferencedxxxType], [IsDeleted]) and then including the columns used in SELECT statement under INCLUDE clause (if it is a nonclustered index)

  6. Also, when you are creating a composite rowstore index, try to keep the order of columns inside the index high cardinality to low cardinality from left to right to make the best use of that index.



回答2:

The performance problem of this query is not the result of using the IN operator.
This operator performs very well with small lists (say, less than 1000 members).

The performance bottle neck here is the fact that SQL Server performs an index scan instead of an index seek (which is very costly), and the key lookup, which is 20% of the query cost.

To avoid both problems, you can add an index on IsDeleted, ReferencedxyzType and ReferencedxxxId - probably in this exact order.

SQL Performance tuning is a science that tends to look a little like art or magic - either way you look at it it requires a good knowledge of both the theory and practice of index settings and the relevant systems requirements.

Therefor, my suggestion is this: Do not attempt to solve it yourself with the help of strangers on the internet. Get an expert for a consulting job for a couple of hours/days to analyze the system and help you fine-tune it.
Learn whatever you can during this process. Ask questions about everything that is not trivial. This will be money well spent.



回答3:

On the basis of assuming an OLTP based system and not OLAP, my first pass would be an NC Index - given isDeleted is likely to have the least selectivity, I would place it last, first pass would be an NC index ReferencedxyzType, ReferencedxxxId, IsDeleted

I might even be tempted in a higher volume scenario to move the IsDeleted out of the index onto an include instead, since it provides so little selectivity to the index itself.

There is clearly already a clustered index in place on the table (from the query plan we can see it), we don't have the details of what is in it.

The question around clustered vs non-clustered is more complex and requires a lot more knowledge of the system and usage.