T SQL - Eloquent replacement of Correlated Subquer

2020-06-18 01:09发布

问题:

I have a query that is currently using a correlated subquery to return the results, but I am thinking the problem could be solved more eloquently perhaps using ROW_NUMBER().

The problem is around the profile of a value v, through a number of years for an Item. Each item has a number of versions, each with its own profile whick starts when the version is introduced and the data currently looks like this:

    
ItemId    ItemVersionId    Year    Value
===========================================
1         1                01      0.1
1         1                02      0.1
1         1                03      0.2
1         1                04      0.2
1         1                05      0.2
1         1                06      0.3
1         1                07      0.3
1         1                08      0.4
1         2                04      0.3
1         2                05      0.3
1         2                06      0.3
1         2                07      0.4
1         2                08      0.5
1         3                07      0.6
1         3                08      0.7
2         1                01      0.1
2         1                01      0.1
2         1                01      0.2
etc

I want to return the full profile for an Item using the most recent version where applicable. For the above example for item 1:

ItemId    ItemVersionId    Year    Value
===========================================
1         1                01      0.1
1         1                02      0.1
1         1                03      0.2
1         2                04      0.3
1         2                05      0.3
1         2                06      0.3
1         3                07      0.6
1         3                08      0.7

I am currently using

SELECT ItemId, ItemVersionId, Year, Value
FROM table t
WHERE
    ItemId = 1
    AND ItemVersionId = (SELECT MAX(ItemVersionId) FROM table WHERE ItemId = t.ItemId AND Year = t.Year)   

Whilst this returns the correct I suspect there is a more efficient way to do it, especially when the table gets large.

I am using SQL Server 2005.

Thanks in advance

回答1:

I would do it with a CTE:

WITH Result AS
(
  SELECT Row_Number() OVER (PARTITION BY ItemId, Year
ORDER BY ItemversionId DESC) AS RowNumber
      ,ItemId
      ,ItemversionId
      ,Year
      ,Value
  FROM table
)
SELECT ItemId
  ,ItemversionId
  ,Year
  ,Value
FROM Result
WHERE RowNumber = 1
ORDER BY ItemId, Year


回答2:

I think it's okay how you do it. You could check if there is a composite index on ItemId and Year.

You could inspect the query plan to see the impact of that query.

If there is an "Item" table in your database you could try another approach. Insert a column ItemVersionId in that table and make sure you update that value when new versions are saved. Then in your query join the Item table using ItemId and ItemVersionId instead of using that subquery.



回答3:

This should work, although you will have to test performance with your own data:

SELECT
    T1.ItemID,
    T1.ItemVersionID,
    T1.Year,
    T1.Value
FROM
    MyTable T1
INNER JOIN (SELECT Year, MAX(ItemVersionID) AS MaxItemVersionID FROM MyTable T2 WHERE T2.ItemID = 1 GROUP BY Year) SQ ON
    SQ.Year = T1.Year AND
    SQ.MaxItemVersionID = T1.ItemVersionID
WHERE
    T1.ItemID = 1

Also, you can alter the subquery to also group by and return an ItemID so that you can return data for more than one item at a time if you need to for some other part of your application. Just be sure to then add the ItemID to the join criteria.