Comparing start/end dates between different rows b

2019-08-02 20:23发布

SAS 9.4

Example dataset

╔═════════╦═══════╦════════════╦══════════╗  
║ subject ║ event ║ start_date ║ end_date ║  
╠═════════╬═══════╬════════════╬══════════╣  
║ s1      ║ e1    ║ 2-Mar-19   ║ 1-Jun-19 ║  
║ s2      ║ e1    ║ 1-Mar-19   ║ 1-May-19 ║  
╚═════════╩═══════╩════════════╩══════════╝ 

Expected Output:

╔═════════╦═══════╦════════════╦══════════╦════════════╗  
║ subject ║ event ║ start_date ║ end_date ║ query_text ║  
╠═════════╬═══════╬════════════╬══════════╬════════════╣   
║ s1      ║ e1    ║ 1-Mar-19   ║ 1-Jun-19 ║ error?     ║  
║ s2      ║ e1    ║ 1-Mar-19   ║ 1-May-19 ║ error?     ║  
╚═════════╩═══════╩════════════╩══════════╩════════════╝  

I want to return all that have the same 'event' and where the 'end_date' of one with the earlier 'start_date' is greater than (>) the 'start_date' of the other one with the later start_date.

Here, subject 's2' has an earlier 'start_date' than 's1' but 's2' has a later 'end_date' than "s1's" 'start_date', so these two would get returned

This is what I have so far, but I'm not sure how to make sure what I'm comparing have the same 'event':

data Out_2;   
/* set relevant dataset */   
if input(compress(end_date,"/"), ??date9.) > input(compress(start_date,"/"), ??date9.);  

/* do something */  

run;

Thanks

标签: sas
1条回答
SAY GOODBYE
2楼-- · 2019-08-02 20:49

The way to process rows in groups in a DATA step is to sort by the key, then use BY key_field (event in your example) in the DATA step, retaining the data from the previous rows in a group, while identifying the group's first and last row using if first.key_field and if last.key_field).

That would be easy if you have a maximum of two subjects per event. If you have more you'd have to write duplicate code to retain the values from the first, second, etc rows, or retain the data into an array.

So I'd ditch the DATA step in favor of SQL, which lets you do this easily:

proc sql;
    create table result as
    select i1.*
      from example_input i1,
           example_input i2
     where i1.event = i2.event
       and i1.subject ^= i2.subject
       and (
             (i1.start_date < i2.start_date and
              i1.end_date > i2.start_date
             ) 
           or /* the other way around */
             (i2.start_date < i1.start_date and
              i2.end_date > i1.start_date
             )
           )
   ; /* untested - I don't have SAS handy */
quit;

The output I get with this SQL on your input data (not in SAS, but I'm pretty sure it shouldn't give different results!):

| subject | event | start_date | end_date   |
| ------- | ----- | ---------- | ---------- |
| s1      | e1    | 2019-03-02 | 2019-06-01 |
| s2      | e1    | 2019-03-01 | 2019-05-01 |

View on DB Fiddle

查看更多
登录 后发表回答