I have a dataset in which I want to fill missing values witht the closest non-missing value. I found two elegant solutions in the answers to this question, but I don't understand why they are not working for me.
Table:
create table Tab1(data date, V1 number);
insert into Tab1 values (date '2000-01-01', 1);
insert into Tab1 values (date '2000-02-01', 1);
insert into Tab1 values (date '2000-03-01', 1);
insert into Tab1 values (date '2000-04-01', 1);
insert into Tab1 values (date '2000-05-01', NULL);
insert into Tab1 values (date '2000-06-01', NULL);
insert into Tab1 values (date '2000-03-01', 2);
insert into Tab1 values (date '2000-04-01', 2);
insert into Tab1 values (date '2000-05-01', NULL);
insert into Tab1 values (date '2000-06-01', NULL);
select * from Tab1;
DATA V1
2000-01-01 1
2000-02-01 1
2000-03-01 1
2000-04-01 1
2000-05-01
2000-06-01
2000-03-01 2
2000-04-01 2
2000-05-01
2000-06-01
Attempt #1:
select A.*,
(case when V1 is null then lag(V1 ignore nulls)
over (partition by V1 order by V1, data)
else V1
end) V2
from Tab1 A;
Attempt #2:
select A.*,
(case when V1 is null
then last_value(V1 ignore nulls)
over (partition by V1 order by data
range between unbounded preceding and 1 preceding)
else V1
end) V2
from Tab1 A;
Both give me the same unwanted result:
DATA V1 V2
2000-01-01 1 1
2000-02-01 1 1
2000-03-01 1 1
2000-04-01 1 1
2000-03-01 2 2
2000-04-01 2 2
2000-05-01
2000-05-01
2000-06-01
What am I doing wrong?
Your first version should work, with a slight tweak:
The tweak is to remove the
partition by v1
from thelag()
. Thecoalesce()
is just my preference for simpler expressions.The same tweak should work for the second version as well.
Your version doesn't work because the
lag()
value must come from the same partition (or benull
). When you havepartition by v1
, you are actually ensuring thatv1
has the same value as in the current row.Hi or you can try manual create the ignore null solution by refer to my below. thanks https://stackoverflow.com/a/57016373/10562099