Oracle SQL, fill missing value with the closest no

2019-07-29 06:53发布

问题:

I have a dataset in which I want to fill missing values witht the closest non-missing value. I found two elegant solutions in the answers to this question, but I don't understand why they are not working for me.

Table:

create table Tab1(data date, V1 number);
insert into Tab1 values (date '2000-01-01', 1);
insert into Tab1 values (date '2000-02-01', 1);
insert into Tab1 values (date '2000-03-01', 1);
insert into Tab1 values (date '2000-04-01', 1);
insert into Tab1 values (date '2000-05-01', NULL);
insert into Tab1 values (date '2000-06-01', NULL);
insert into Tab1 values (date '2000-03-01', 2);
insert into Tab1 values (date '2000-04-01', 2);
insert into Tab1 values (date '2000-05-01', NULL);
insert into Tab1 values (date '2000-06-01', NULL);
select * from Tab1;

DATA       V1
2000-01-01  1
2000-02-01  1
2000-03-01  1
2000-04-01  1
2000-05-01  
2000-06-01  
2000-03-01  2
2000-04-01  2
2000-05-01  
2000-06-01  

Attempt #1:

select A.*, 
    (case when V1 is null then lag(V1 ignore nulls)  
               over (partition by V1 order by V1, data) 
          else V1 
          end) V2
 from Tab1 A;

Attempt #2:

select A.*, 
    (case when V1 is null 
               then last_value(V1 ignore nulls)
               over (partition by V1 order by data 
               range between unbounded preceding and 1 preceding)  
          else V1 
          end) V2
 from Tab1 A;

Both give me the same unwanted result:

DATA       V1   V2
2000-01-01  1   1
2000-02-01  1   1
2000-03-01  1   1
2000-04-01  1   1
2000-03-01  2   2
2000-04-01  2   2
2000-05-01      
2000-05-01      
2000-06-01      

What am I doing wrong?

回答1:

Your first version should work, with a slight tweak:

select A.*, 
       coalesce(V1, lag(V1 ignore nulls)  over (order by data)) V2
from Tab1 A;

The tweak is to remove the partition by v1 from the lag(). The coalesce() is just my preference for simpler expressions.

The same tweak should work for the second version as well.

Your version doesn't work because the lag() value must come from the same partition (or be null). When you have partition by v1, you are actually ensuring that v1 has the same value as in the current row.



回答2:

Hi or you can try manual create the ignore null solution by refer to my below. thanks https://stackoverflow.com/a/57016373/10562099

  1. You need use a 0/1 column to indicate the Null/Non-Null data
  2. Then create accumulate summary column to calculate the indicator number in step 1. -now you can see the data already looks like grouped by your non_Null data.
  3. As last step, pls use a Max function group by the accumulate sum (in step 2) to populate the data(here is amont) in to empty items.