Reads are not getting low after putting a Index

2019-08-12 04:48发布

问题:

The requirement is to load 50 records in paging with all 65 columns of table "empl" with minimum IO. There are 280000+ records in table. There is only one clustered index over the PK.

Pagination query is as following:

WITH result_set AS (    
SELECT    
ROW_NUMBER() OVER (ORDER BY  e.[uon] DESC ) AS [row_number], e.*    
FROM    
empl e with (NOLOCK)    
LEFT JOIN empl_add ea with (NOLOCK)    
ON ea.ptid = e.ptid    
WHERE    
e.del = 0 AND e.pub = 1 AND e.sid = 2
AND e.md = 0     
AND e.tid = 3    
AND e.coid = 2     
AND (e.cid = 102)
AND ea.ptgid IN (SELECT ptgid FROM empl_dep where psid = 1001
AND ib = 1)) 
SELECT  
*  
FROM  
result_set  
WHERE  
[row_number] BETWEEN 0 AND 50

Following are the stats after running the above query from profiler:

CPU: 1500, Reads: 25576, Duration: 25704

Then I put the following index over the table empl:

CREATE NONCLUSTERED INDEX [ci_empl]
ON [dbo].[empl] ([del],[md],[pub],[tid],[coid],[sid],[ptid],[cid],[uon])
GO

After putting index CPU and Reads are still higher. I don't know what's wrong with the index or something wrong with the query?

Edit:

The following query is also taking high reads after putting index. And there are only 3 columns and 1 count.

SELECT TOP (2147483647)
ame.aid ID, ame.name name,         
COUNT(empl.pid) [Count], ps.uff uff FROM ame with (NOLOCK)        
JOIN pam AS pa WITH (NOLOCK) ON pa.aid = ame.aid         
JOIN empl WITH (NOLOCK) ON empl.pid = pa.pid         
LEFT JOIN psam AS ps
ON ps.psid = 1001
AND ps.aid = ame.aid
LEFT JOIN empl_add ea with (NOLOCK)        
ON ea.ptid = empl.ptid        
WHERE 
empl.del = 0 AND empl.pub = 1 AND empl.sid = 2
AND empl.md = 0         
AND (empl.tid = 3)        
AND (empl.coid = 2)        
AND (empl.cid = 102)        
AND ea.ptgid IN (SELECT ptgid FROM empl_dep where psid = 1001
AND ib = 1)        
AND ame.pub = 1 AND ame.del = 0        
GROUP BY ame.aid, ame.name, ps.uff        
ORDER BY ame.name ASC

Second Edit:

Now I had put the following index on "uon" column:

CREATE NONCLUSTERED INDEX [ci_empl_uon]
ON [dbo].[empl] (uon)
GO

But still CPU and Reads are Higher.

Third Edit:

DTA is suggesting me index with all columns included for the first query so I altered the suggested index convert it to a filter index for the basic four filters to make it more effective.

I added the line below after Include while creating the index.

Where e.del = 0 AND e.pub = 1 AND e.sid = 2 AND e.md = 0 AND e.coid = 2

But still the reads are high on both development and production machine.

Fourth Edit:

Now I had come to a solution that has improved the performance, but still not up to the goal. The key is that it's not going for ALL THE DATA.

The query is a following:

WITH result_set AS (    
SELECT    
ROW_NUMBER() OVER (ORDER BY  e.[uon] DESC ) AS [row_number], e.pID pID   
FROM    
empl e with (NOLOCK)    
LEFT JOIN empl_add ea with (NOLOCK)    
ON ea.ptid = e.ptid    
WHERE    
e.del = 0 AND e.pub = 1 AND e.sid = 2
AND e.md = 0     
AND e.tid = 3    
AND e.coid = 2     
AND (e.cid = 102)
AND ea.ptgid IN (SELECT ptgid FROM empl_dep where psid = 1001
AND ib = 1)) 
SELECT  
*  
FROM  
result_set join empl on result_set.pID = empl.pID
WHERE  
[row_number] BETWEEN @start AND @end

And recreated the index with key column alterations, include and filter:

CREATE NONCLUSTERED INDEX [ci_empl]
ON [dbo].[empl] ([ptid],[cid],[tid],[uon])
INCLUDE ([pID])
Where 
[coID] = 2 and
[sID] = 2 and
[pub] = 1 and
[del] = 0 and
[md] = 0
GO

It improves the performance, but not up to the goal.

回答1:

You are selecting the top 50 rows ordered by e.uon desc. An index that starts with uon will speed up the query:

create index IX_Empl_Uon on dbo.empl (uon)

The index will allow SQL Server to scan the top N rows of a this index. N is the highest number in your pagination: for the 3rd page of 50 elements, N equals 150. SQL Server then does 50 key lookups to retrieve the full rows from the clustered index. As far as I know, this is a textbook example of where an index can make a big difference.

Not all query optimizers will be smart enough to notice that row_number() over ... as rn with where rn between 1 and 50 means the top 50 rows. But SQL Server 2012 does. It uses the index both for first and consecutive pages, like row_number() between 50 and 99.



回答2:

You are trying to find the X through X+Nth row from a dataset, based on an order specified by column uon.

I’m assuming here that uon is the mentioned primary key. If not, without an index where uon is the first (if not only) column, a table scan is inevitable.

Next wrinkle: You don’t want that direct span of columns, you want that span of columns as filtered by a fairly extensive assortment of filters. The clustered index might pull the first 50 columns, but the WHERE may filter none, some, or all of those out. More will almost certainly have to read in order to "fill your span".

More fun: you perform a left outer join on table empl_add (e.g. retaing the empl row even if there is no empl_add found), and then require filter out all rows where empladd.ptgid is not found in the subquery. Might as well make this an inner join, it may speed things up and certainly will not make them slower. It is also a "filtering factor" that cannot be addressed with an index on table empl.

So: as I see it (i.e. I’m not testing it all out locally), SQL has to first assemble the data, filter out the invalid rows (which involves table joins), order what remains, and return that span of rows you are interested in. I believe that, with or without the index on uon, SQL is identifying a need to read all the data and filter/sort before it can pick out the desired range.

(Your new index would appear to be insufficient. The sixth column is sid, but sid is not referenced in the query, so it might only be able to help “so far”. This raises lots of questions about data cardinality and whatnot, at which point I defer to @Aarons’ point that we have insufficient information on the overall problem set for a full analysis.)