Alternative to using WHERE … IN (…) for slow SQL q

2019-03-02 22:17发布

问题:

This is actually part of a much larger complex query.
According to the query plan the sort on this statement dominates the cost of the larger query.
And by materializing this part of the query I verified it dominates the cost.

    select [sID], ROW_NUMBER() over (partition by [sID] order by [wordPos]) [rn], [wordPos], [wordID]
      from [FTSindex] 
     where [wordID] in (428,2112)
  order by [sID], [rn] 

From right to left:
- Index seek 5% (IX_FTSindex_wordID_sID)
- Sort 76%
- Parallelism 19%

CREATE TABLE [dbo].[FTSindex](
    [sID] [int] NOT NULL,
    [wordPos] [int] NOT NULL,
    [wordID] [int] NOT NULL,
    [charPos] [int] NOT NULL,
 CONSTRAINT [PK_FTSindex] PRIMARY KEY CLUSTERED 
(
    [sID] ASC,
    [wordPos] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 100) ON [PRIMARY]
) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_FTSindex_wordID_sID] ON [dbo].[FTSindex] 
(
    [wordID] ASC,
    [sID] ASC,
    [wordPos] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 100) ON [PRIMARY]
GO

Given the IX_FTSindex_wordID_sID includes [sID] and [wordPos] I thought that sort would be very fast.
Tried [wordID] alone and [wordID], [sID] and still the sort is still 76% of the cost.

Even this query

    select [sID], [wordPos] -- , wordID
      from [FTSindex] 
     where [wordID] in (428,2112)
  order by [sID], [wordPos]   

The sort is sort is 76% or the cost.

How can I get the sort cost to come down?
The PK has to stay as it is.
I can add or revise other indexes.

回答1:

Just for giggles again, could you try this query:

  select 
    [sID], 
    ROW_NUMBER() over (partition by [sID] order by [wordPos]) [rn], 
    [wordPos], [FTSindex].[wordID]
  from [FTSindex] 
  join ( 
    values (428), (2112)
  ) w (wordID) on w.wordID = [FTSindex].wordID
  order by [sID], [rn] 

Sometimes, throwing more hardware at the problem is the correct answer; though I agree that this should be a last resort and not a first. Whether this particular problem requires more CPU, more memory, or more spindles is dependent on many factors, including your present hardware.

Your result set of 1.6 million rows, each 4 integers, should sort quickly on any reasonable amount of current hardware. Since delays are occurring it seems likely that too much processing is occurring on the base set of 900 million rows, and the challenge is to identify why. Can you attach more details about the query plan?