I have a data set containing an unbalanced panel of observations, where I want to forward and backward fill missing and/or "wrong" observations of ticker with the latest non-missing string.
id time ticker_have ticker_want
------------------------------
1 1 ABCDE YYYYY
1 2 . YYYYY
1 3 . YYYYY
1 4 YYYYY YYYYY
1 5 . YYYYY
------------------------------
2 4 . ZZZZZ
2 5 ZZZZZ ZZZZZ
2 6 . ZZZZZ
------------------------------
3 1 . .
------------------------------
4 2 OOOOO OOOOO
4 3 OOOOO OOOOO
4 4 OOOOO OOOOO
Basically, if the observation already has a ticker, but this ticker is not the same as the latest non-empty ticker, we replace this ticker using the latest ticker.
So far, I have managed to fill missing observations forward using this code
proc sql;
create table have as select * from old_have order by id, time desc;
quit;
data want;
drop temp;
set have;
by id;
/* RETAIN the new variable*/
retain temp; length temp $ 5;
/* Reset TEMP when the BY-Group changes */
if first.id then temp=' ';
/* Assign TEMP when X is non-missing */
if ticker ne ' ' then temp=ticker;
/* When X is missing, assign the retained value of TEMP into X */
else if ticker=' ' then ticker=temp;
run;
Now I am stuck figuring out the cases where I can't access the non-missing value using last.ticker
or first.ticker
...
How would one do this using DATA
or PROC SQL
or any other SAS commands?
You can do this several ways, but
proc sql
with some nested sub-queries is one solution.(Read it from inside out, #1 then 2 then 3. You could build each subquery into a dataset first if it helps)
Consider using a data step to retrieve the last ticker by time for each id, then joining it to main table. Also, use a
CASE
statement to conditionally assign new ticker if missing or not.Data
Output