Get first/last row of n-th consecutive group

2019-03-27 09:45发布

问题:

What's the easiest way to select a single record/value from the n-th group? The group is determined by a material and it's price(prices can change). I need to find the first date of the last and the last date of the next to last material-price-groups. So i want to know when exactly a price changed.

I've tried following query to get the first date of the current(last) price which can return the wrong date if that price was used before:

DECLARE @material VARCHAR(20)
SET @material = '1271-4303'

SELECT TOP 1 Claim_Submitted_Date 
FROM   tabdata
WHERE Material = @material 
AND Price = (SELECT TOP 1 Price FROM tabdata t2 
             WHERE Material = @material
             ORDER BY Claim_Submitted_Date DESC)
ORDER BY Claim_Submitted_Date ASC

This also only returns the last, how do i get the previous? So the date when the previous price was used last/first?

I have simplified my schema and created this sql-fiddle with sample-data. Here in chronological order. So the row with ID=7 is what i need since it's has the next-to-last price with the latest date.

ID   CLAIM_SUBMITTED_DATE                   MATERIAL    PRICE
5   December, 04 2013 12:33:00+0000         1271-4303   20
4   December, 03 2013 12:33:00+0000         1271-4303   20   <-- current
3   November, 17 2013 10:13:00+0000         1271-4846   40
7   November, 08 2013 12:16:00+0000         1271-4303   18   <-- last(desired)
2   October, 17 2013 09:13:00+0000          1271-4303   18
1   September, 17 2013 08:13:00+0000        1271-4303   10
8   September, 16 2013 12:15:00+0000        1271-4303   17
6   June, 23 2013 14:22:00+0000             1271-4303   18
9   January, 11 2013 12:22:10+0000          1271-4303   20   <-- a problem since this is older than the desired but will be returned by my simply sub-query approach above

Is it even possible to parametrize this value, so the nthLatestPriceGroup if i want to know the 3rd last price-date? Note that the query sits in a scalar-valued-function.

Edit: Many thanks to all. But unfortunately a simple ROW_NUMBER seems not to help here since i'm trying to get the row with the most recent price before the current price for a given material. So GROUP BY/PARTITION BY material,price includes rows with the same price that don't belong to the last recent material-price group.

Consider that a price can change from

Date             Price     Comment
5 months ago     20        original price, note that this is the same as the curent which causes my query to fail!
3 months ago     18        price has changed, i might need the first and last date
2 months ago     20        price has changed, i might need the first and last date
1 month ago      18        previous price, i need the oldest and newest dates 
NOW              20        current price, i need the first/oldest date from this group

So i want the date of the most recent row of the last 20-group, the oldest 20-group is irrelevant. So i must somehow group by consecutive prices since a price can repeat after it has already changed.

So actually i only need the most recent Claim_Submitted_Date from the price-group that starts with 1 month ago ... previous price in the list above which is the date until the previous price was valid. The other informations listed in the comments are just nice to have(the nthLatestPriceGroup sub-question). That's the row with ID=7 in the sample data above. By the way, the oldest row of this price-group would be the one with ID=2(October, 17) and not ID=6(June, 23) even if the latter is older. There was a different price(10) after. That's the reason why i can't use simple ranking functions.

回答1:

You will need to use the windowed function ROWNUMBER in a subquery,...

something like this will get you there:

ROW_NUMBER() OVER(PARTITION BY Price ORDER BY Claim_Submitted_Date DESC) AS Row 

Here's the update based on your fiddle:

DECLARE @material VARCHAR(20)
SET @material = '1271-4303'


SELECT * FROM
(
SELECT  *,
        ROW_NUMBER() OVER(PARTITION BY Material ORDER BY Claim_Submitted_Date ASC) AS rn  
FROM tabdata t2 
WHERE Material = @material
) res
WHERE rn=2

If idData is incremental(and therefore chronological) you could use this:

SELECT * FROM
(
SELECT  *,
        ROW_NUMBER() OVER(PARTITION BY Material ORDER BY idData DESC) AS rn  
FROM tabdata t2 
WHERE Material = @material
) res

Looking at your latest requirements we could all be over thinking it(if I understand you correctly):

DECLARE @MATERIAL AS VARCHAR(9)
SET @MATERIAL = '1271-4303'

SELECT  TOP 1 *
FROM tabdata t2 
WHERE Material = @material
AND PRICE <> (  SELECT TOP 1 Price
                FROM tabdata 
                WHERE Material = @material 
                ORDER BY CLAIM_SUBMITTED_DATE desc)
ORDER BY CLAIM_SUBMITTED_DATE desc

--results
idData  Claim_Submitted_Date        Material    Price
7       2013-11-08 12:16:00.000     1271-4303   18

Here's a fiddle based on this.



回答2:

Try this

DECLARE @material VARCHAR(20), @Nth INT
SET @material = '1271-4303'
SET @Nth = 2

;with CTE1 ([idData],[Claim_Submitted_Date], [Material], [Price], Rn)
as
(
SELECT  *,
        DENSE_RANK() OVER(ORDER BY PRICE DESC) AS rn  
FROM tabdata  
WHERE Material = @material
)
,
CTE2 ([idData],  [Material], [Price], LastDate)
AS(
SELECT [idData],  [Material], [Price], MAX([Claim_Submitted_Date])
FROM CTE1
WHERE rn = @Nth
GROUP BY [idData],  [Material], [Price]
)
SELECT Top 1 [idData],  [Material], [Price], LastDate
FROM CTE2 
ORDER BY LastDate DESC

Result Set

idData  Material    Price   LastDate
  7     1271-4303   18      2013-11-08 12:16:00.000


回答3:

Following your last comments, only solution I came with is counting the different price groups according to their Claim_Submitted_Date, and then include the obtained group indexes as part as the grouping criteria. Not sure it will be highly efficient. Hope it will help though.

declare @materialId nvarchar(max), @targetrank int
set @materialId = '1271-4303'
set @targetrank =2


;with grouped as (
    select *, 
              (select count( t.price)  -- don't put a DISTINCT here. (I know, I did)
               from tabdata as t 
               where t.Price <> tj.Price 
                 and t.Claim_Submitted_Date> tj.Claim_Submitted_Date 
                  and t.Material= @materialId
              )as group_indicator 
      from tabdata tj 
      where Material= @materialId
), 
rankedClaims as
(
    select grouped.*, row_number() over (PARTITION BY material,price,group_indicator  ORDER BY claim_submitted_date desc) as rank
    from grouped
),
numbered as
(
   select *, ROW_NUMBER() OVER (order by Claim_Submitted_Date desc) as RowNumber from
   rankedClaims 
   where rank =1
)
select Id, Claim_Submitted_Date, Material, Price from numbered
    where RowNumber=@targetrank

(Not sure also of should two claims on different prices on the same date should be treated t.Claim_Submitted_Date> tj.Claim_Submitted_Date)

-------------------- Previous answer

Maybe you can try something like :

SELECT ranked.[CLAIM_SUBMITTED_DATE]
FROM
(
  SELECT trimmed.*, ROW_NUMBER() OVER (ORDER BY claim_submitted_date) AS rank FROM
  (
    SELECT a.*
      ,row_number() over (PARTITION BY material,price ORDER BY claim_submitted_date) AS daterank
    FROM tabdata a
    WHERE a.material= '1271-4303'
  )
  AS trimmed
  WHERE daterank=1
) AS ranked
WHERE rank=2

Parameterizing the rank seems possible as it is only involved in WHERE rank=2



回答4:

Have you tried windowed functions such as row_number()

 select a.[IDDATA]
, a.[CLAIM_SUBMITTED_DATE]
, a.[MATERIAL]
 , a.[PRICE]
 ,row_number() over (PARTITION by material,price order by claim_submitted_date) as seq
 from tabdata a
 where a.material= '1271-4303'

SQLFiddle