SQL: How to fill empty cells with previous row val

2020-02-09 04:08发布

问题:

I need to produce the column "required" in the following table using SQL without using loops and correlated sub queries. Is this possible in SQL 2008?

Date    Customer   Value   Required   Rule
20100101       1      12         12
20100101       2                  0   If no value assign 0
20100101       3      32         32
20100101       4      42         42
20100101       5      15         15
20100102       1                 12   Take last known value
20100102       2                  0   Take last known value
20100102       3      39         39
20100102       4                 42   Take last known value
20100102       5      16         16
20100103       1      13         13
20100103       2      24         24
20100103       3                 39   Take last known value
20100103       4                 42   Take last known value
20100103       5      21         21
20100104       1      14         14
20100104       2                 24   Take last known value
20100104       3                 39   Take last known value
20100104       4      65         65
20100104       5      23         23

Basically I am filling empty "Value" cells with last know value for that customer. Remember the last row may not have a valid value, so you will have to pick it from the row before that with a valid value.

回答1:

Faiz,

how about the following query, it does what you want as far as I understand it. The comments explain each step. Take a look at CTEs on Books Online. This example could even be changed to use the new MERGE command for SQL 2008.

/* Test Data & Table */
DECLARE @Customers TABLE
    (Dates datetime,
     Customer integer,
     Value integer) 

    INSERT  INTO @Customers
    VALUES  ('20100101', 1, 12),
        ('20100101', 2, NULL),
        ('20100101', 3, 32),
        ('20100101', 4, 42),
        ('20100101', 5, 15),
        ('20100102', 1, NULL),
        ('20100102', 2, NULL),
        ('20100102', 3, 39),
        ('20100102', 4, NULL),
        ('20100102', 5, 16),
        ('20100103', 1, 13),
        ('20100103', 2, 24),
        ('20100103', 3, NULL),
        ('20100103', 4, NULL),
        ('20100103', 5, 21),
        ('20100104', 1, 14),
        ('20100104', 2, NULL),
        ('20100104', 3, NULL),
        ('20100104', 4, 65),
        ('20100104', 5, 23) ;

/* CustCTE - This gives us a RowNum to allow us to build the recursive CTE CleanCust */
WITH    CustCTE
          AS (SELECT    Customer,
                        Value,
                        Dates,
                        ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Dates) RowNum
              FROM      @Customers),

/* CleanCust - A recursive CTE. This runs down the list of values for each customer, checking the Value column, if it is null it gets the previous non NULL value.*/
        CleanCust
          AS (SELECT    Customer,
                        ISNULL(Value, 0) Value, /* Ensure we start with no NULL values for each customer */
                        Dates,
                        RowNum
              FROM      CustCte cur
              WHERE     RowNum = 1
              UNION ALL
              SELECT    Curr.Customer,
                        ISNULL(Curr.Value, prev.Value) Value,
                        Curr.Dates,
                        Curr.RowNum
              FROM      CustCte curr
              INNER JOIN CleanCust prev ON curr.Customer = prev.Customer
                                           AND curr.RowNum = prev.RowNum + 1)

/* Update the base table using the result set from the recursive CTE */
    UPDATE trg
    SET Value = src.Value
    FROM    @Customers trg
    INNER JOIN CleanCust src ON trg.Customer = src.Customer
                                AND trg.Dates = src.Dates

/* Display the results */
SELECT * FROM @Customers


回答2:

This is the "Last non-null puzzle," and here's one of several elegant solutions:

If your "sparse" table is SparseTable with columns Date, Customer, Value then:

with C as
(select *,
    max(case when Value is not null then [Date] end)
        over (partition by Customer order by [Date] rows unbounded preceding) as grp
 from SparseTable
)
insert into FullTable
select *, 
    max(Value) over (partition by Customer, grp order by [Date] rows unbounded preceding) as Required
from C

Where Value couldn't be filled forward it will still be NULL, so you can then

update FullTable set Required = 0 where Required is null


回答3:

I need to produce the column "required" in the following table using SQL without using loops and correlated sub queries. Is this possible in SQL 2008?

Impossible. Point. Not possibly on ANY SQL based server, including oracle.

The main problem here is that you rule out loops AND correlated subqueries and any way to retrieve the value at query time will ultimately use another query to find the valid value (actually one per field). This is how SQL works. Yes, you could hide them in a custom scalar function, but still they would contain a logical sub query.



回答4:

I'm not sure if following counts considering your constraints but it gets the job done.

Test data

DECLARE @Customers TABLE (Date DATETIME, Customer INTEGER, Value INTEGER)

INSERT INTO @Customers VALUES ('20100101', 1, 12  )       
INSERT INTO @Customers VALUES ('20100101', 2, NULL)           
INSERT INTO @Customers VALUES ('20100101', 3, 32  ) 
INSERT INTO @Customers VALUES ('20100101', 4, 42  ) 
INSERT INTO @Customers VALUES ('20100101', 5, 15  ) 
INSERT INTO @Customers VALUES ('20100102', 1, NULL) 
INSERT INTO @Customers VALUES ('20100102', 2, NULL)
INSERT INTO @Customers VALUES ('20100102', 3, 39  )
INSERT INTO @Customers VALUES ('20100102', 4, NULL)
INSERT INTO @Customers VALUES ('20100102', 5, 16  )
INSERT INTO @Customers VALUES ('20100103', 1, 13  )
INSERT INTO @Customers VALUES ('20100103', 2, 24  )
INSERT INTO @Customers VALUES ('20100103', 3, NULL)
INSERT INTO @Customers VALUES ('20100103', 4, NULL)
INSERT INTO @Customers VALUES ('20100103', 5, 21  )
INSERT INTO @Customers VALUES ('20100104', 1, 14  )
INSERT INTO @Customers VALUES ('20100104', 2, NULL)
INSERT INTO @Customers VALUES ('20100104', 3, NULL)
INSERT INTO @Customers VALUES ('20100104', 4, 65  )
INSERT INTO @Customers VALUES ('20100104', 5, 23  )

Query

SELECT  c.Date
        , c.Customer
        , Value = COALESCE(c.Value, cprevious.Value, 0)
FROM    @Customers c
        INNER JOIN (
          SELECT  c.Date
                  , c.Customer
                  , MaxDate = MAX(cdates.Date)
          FROM    @Customers c
                  LEFT OUTER JOIN (
                    SELECT  Date
                            , Customer
                    FROM    @Customers
                  ) cdates ON cdates.Date < c.Date AND cdates.Customer = c.Customer
          GROUP BY
                  c.Date, c.Customer
        ) cmax ON cmax.Date = c.Date AND cmax.Customer = c.Customer                  
        LEFT OUTER JOIN @Customers cprevious ON cprevious.Date = cmax.MaxDate AND cprevious.Customer = cmax.Customer
ORDER BY
        1, 2, 3        

Update statement

UPDATE  @Customers 
SET     Value = c2.Value 
OUTPUT  Inserted.* 
FROM    @Customers c 
        INNER JOIN ( 
          SELECT  c.Date
                  , c.Customer
                  , Value = COALESCE(c.Value, cprevious.Value, 0)
          FROM    @Customers c
                  INNER JOIN (
                    SELECT  c.Date
                            , c.Customer
                            , MaxDate = MAX(cdates.Date)
                    FROM    @Customers c
                            LEFT OUTER JOIN (
                              SELECT  Date
                                      , Customer
                              FROM    @Customers
                            ) cdates ON cdates.Date < c.Date AND cdates.Customer = c.Customer
                    GROUP BY
                            c.Date, c.Customer
                  ) cmax ON cmax.Date = c.Date AND cmax.Customer = c.Customer                  
                  LEFT OUTER JOIN @Customers cprevious ON cprevious.Date = cmax.MaxDate AND cprevious.Customer = cmax.Customer
        ) c2 ON c2.Date = c.Date 
                AND c2.Customer = c.Customer 


回答5:

How about a left outer join on the same table where the date is less than the current one and the value is non-empty, ordered by date desc (limit 1), returning zero when null ? (No server available to test at the moment). Unless this counts as a sub-query...