What's the most efficient way to match values

2019-03-28 12:53发布

问题:

I've got two tables in MS SQL Server:

dailyt - which contains daily data:

date             val
---------------------
2014-05-22       10
2014-05-21       9.5
2014-05-20       9
2014-05-19       8
2014-05-18       7.5
etc...

And periodt - which contains data coming in at irregular periods:

date             val
---------------------
2014-05-21       2
2014-05-18       1

Given a row in dailyt, I want to adjust its value by adding the corresponding value in periodt with the closest date prior or equal to the date of the dailyt row. So, the output would look like:

addt

date             val
---------------------
2014-05-22       12      <- add 2 from 2014-05-21
2014-05-21       11.5    <- add 2 from 2014-05-21
2014-05-20       10      <- add 1 from 2014-05-18
2014-05-19       9       <- add 1 from 2014-05-18
2014-05-18       8.5     <- add 1 from 2014-05-18

I know that one way to do this is to join the dailyt and periodt tables on periodt.date <= dailyt.date and then imposing a ROW_NUMBER() (PARTITION BY dailyt.date ORDER BY periodt.date DESC) condition, and then having a WHERE condition on the row number to = 1.

Is there another way to do this that would be more efficient? Or is this pretty much optimal?

回答1:

I think using APPLY would be the most efficient way:

SELECT  d.Val,
        p.Val,
        NewVal = d.Val + ISNULL(p.Val, 0)
FROM    Dailyt AS d
        OUTER APPLY
        (   SELECT  TOP 1 Val
            FROM    Periodt p
            WHERE   p.Date <= d.Date
            ORDER BY p.Date DESC
        ) AS p;

Example on SQL Fiddle



回答2:

If there relatively very few periodt rows, then there is an option that may prove quite efficient.

Convert periodt into a From/To ranges table using subqueries or CTEs. (Obviously performance depends on how efficiently this initial step can be done, which is why a small number of periodt rows is preferable.) Then the join to dailyt will be extremely efficient. E.g.

;WITH PIds AS (
        SELECT ROW_NUMBER() OVER(ORDER BY PDate) RN, *
        FROM   @periodt
    ),
    PRange AS (
        SELECT  f.PDate AS FromDate, t.PDate as ToDate, f.PVal
        FROM    PIds f
                LEFT OUTER JOIN PIds t ON
                  t.RN = f.RN + 1
    )
SELECT  d.*, p.PVal
FROM    @dailyt d
        LEFT OUTER JOIN PRange p ON
            d.DDate >= p.FromDate
        AND (d.DDate < p.ToDate OR p.ToDate IS NULL)
ORDER BY 1 DESC

If you want to try the query, the following produces the sample data using table variables. Note I added an extra row to dailyt to demonstrate no periodt entries with a smaller date.

DECLARE  @dailyt table (
    DDate date NOT NULL,
    DVal float NOT NULL
)

INSERT INTO @dailyt(DDate, DVal)
          SELECT '20140522', 10
UNION ALL SELECT '20140521', 9.5
UNION ALL SELECT '20140520', 9
UNION ALL SELECT '20140519', 8
UNION ALL SELECT '20140518', 7.5
UNION ALL SELECT '20140517', 6.5

DECLARE @periodt table (
    PDate date NOT NULL,
    PVal int NOT NULL
)

INSERT INTO @periodt
          SELECT '20140521', 2
UNION ALL SELECT '20140518', 1