Mark timestamps into session in SAS

2019-08-07 09:43发布

I have a series of dataset with two variables: uid and timestamp. I want to create a new variable called "session_num" to parse the timestamps into session numbers (when two timestampes are 30 min + apart, it will be marked as a new session).

For example:

enter image description here

I try to use retain statement in sas to loop through the timestamp, but it didn't work. Here's my code:

Data test;
SET test1;
By uid;
RETAIN session_num session_len;
IF first.uid THEN DO;
session=1;
session_len=0;
END;
session_len=session_len+timpestamp;
IF timpestamp-session_len>1800 THEN session_num=session_num+1; 
ELSE session_num=session_num;
IF last.uid;
KEEP uid timestamp session_num;
RUN;

Really appreciate if you could point out my mistake, and suggest the right solution.

Thanks!

标签: sas
2条回答
一纸荒年 Trace。
2楼-- · 2019-08-07 09:52

First, here is some sample input data (in the future, you should supply your own code to generate the sample input data so others don't have to spend time doing this for you),

data test;
    input uid$ timestamp : DATETIME.;
    format timestamp DATETIME.;
    datalines;
    a 05jul2014:03:55:00
    a 05jul2014:03:57:00
    a 07jul2014:20:08:00
    a 10jul2014:19:02:00
    a 10jul2014:19:05:00
    a 11jul2014:14:39:00
;

Then you can create the session variable as you defined it with

data testsession;
    set test;
    retain last;
    retain session 0;
    by uid;
    if first.uid then do;
        session = session +1;
        last = timestamp;
    end;
    if (timestamp-last)/60>15 then do;
        session = session+1;
    end;
    last = timestamp;
    drop last;
run;
查看更多
劫难
3楼-- · 2019-08-07 09:53

MrFlick's method is probably the more normal way to do this, but another option involves the look-ahead self-merge. (Yes, look-ahead, even though this is supposed to look behind - look behind is more complicated in this manner.)

data want;
  retain session 1; *have to initialize to 1 for the first record!;
  merge have(in=_h) have(rename=(timestamp=next uid=nextuid) firstobs=2);
  output;  *output first, then we adjust session for the next iteration;
  if (uid ne nextuid) then session=1;
  else if timestamp + 1800 < next then session+1;
  drop next:;
run;
查看更多
登录 后发表回答