How to count distinct ID in the last 90 days on SA

2019-09-14 10:12发布

问题:

I have a dataset like this:

CustomerID  AccountManager TransactionID  Transaction_Time 
1111111111  FA001          TR2016001      08SEP16:11:19:25
1111111111  FA001          TR2016002      26OCT16:08:22:49
1111111111  FA002          TR2016003      04NOV16:08:05:36
1111111111  FA003          TR2016004      04NOV16:17:15:52
1111111111  FA004          TR2016005      25NOV16:13:04:16
1231231234  FA005          TR2016006      25AUG15:08:03:29
1231231234  FA005          TR2016007      16SEP15:08:24:24
1231231234  FA008          TR2016008      18SEP15:14:42:29

CustomerID represents each customer, each customer could have multiple transactions. Each account manager could deal with multiple transactions too. But transactionID is unique in this table.

Now I would like to count for each customer, when the transation happened, if I went back to last 90 days, how many distinct Account Manager get involved and how many transactions happened. The result I am looking for is like this:

CustomerID  Manager TransacID  Transaction_Time    CountTransac CountManager
1111111111  FA001   TR2016001  08SEP16:11:19:25    1            1
1111111111  FA001   TR2016002  26OCT16:08:22:49    2            1
1111111111  FA002   TR2016003  04NOV16:08:05:36    3            2
1111111111  FA003   TR2016004  04NOV16:17:15:52    4            3
1111111111  FA004   TR2016005  25NOV16:13:04:16    5            4
1231231234  FA005   TR2016006  25AUG15:08:03:29    1            1
1231231234  FA005   TR2016007  16SEP15:08:24:24    2            1
1231231234  FA008   TR2016008  18SEP15:14:42:29    3            2

Now using the following code, I figure out how to calculate transaction count, but I did not know how to calculate the distinct manager count. It would be highly appreciated if someone could help me out. Thanks a lot.

DATA want;
    SET transaction;
    COUNT=1;
    DO point=_n_-1 TO 1 BY -1;
        SET want(KEEP=CustomerID Transaction_Time COUNT POINT=point
            RENAME=(CustomerID =SAME_ID Transaction_Time =OTHER_TIME COUNT=OTHER_COUNT));

        IF CustomerID NE SAME_ID 
            OR INTCK ("DAY", DATEPART(OTHER_TIME), DATEPART(Transaction_Time )) > 90 
            THEN LEAVE;   
        COUNT + OTHER_COUNT;
    END;
    DROP SAME_ID OTHER_TIME OTHER_COUNT;
    RENAME COUNT=COUNT_TRANSAC;
RUN;

回答1:

Your code does not work at all as it is, but I see what you want to do. Here is something that does work. I commented out the WHERE statement so you can see that it produces the result you asked for. You need the WHERE statement if you really want just the last 90 days.

* Always a good idea to sort first unless you are CERTAIN that
* your values are in the order you want.;
proc sort data=have;
    by customerid AccountManager transactionid;
run;

DATA want;
    SET have;
* Uncomment the WHERE statement to activate the 90-day time frame.;
*   where today()-datepart(transaction_time)<=90;
    by customerid AccountManager transactionid;
    if first.customerid
     then do;
        counttransac=0;
        countmanager=0;
     end;
    if first.AccountManager
     then countmanager+1;

    counttransac+1;

RUN;

Taking advantage of SAS's BY statement and the first. and last. variable modifiers, you can reset your counter each time you see a new customer ID and manager ID.

[EDIT] Okay, that's much more difficult. Here is code that looks back at the history before each transaction. I see why you were using two SET statements because you have to join the dataset to itself. Probably you can do this with PROC SQL, but I didn't have time to check it out. Let me know if this works for you.

* Sort each customer's and manager's transactions;
proc sort data=transaction;
    by customerid accountmanager;
run;


DATA want;
    SET transaction nobs=pmax;
    by customerid;

    length lastmgr $ 100;
    retain pstart;      * Starting row for each customer;

    * Save starting row for each customer;
    if first.customerid
     then pstart=_n_;

    * Initialize current account manager and counters for
    * managers and transactions. The current transaction always
    * counts as one transaction and one manager.
    * Save the beginning of the 90-day period to avoid 
    * recalculating it each time.;
    lastmgr=accountmanager;
    mgrct=1;
    tranct=1;
    ninetyday=datepart(transaction_time)-90;

    * Set the starting row to search for each transaction;
    p=pstart;

    * Loop through all rows for the customer and only count
    * those that occur before the current transaction and
    * after the 90-day period before it.;
    * Note that the transactions are not necessarily sorted
    * in chronological order but rather in groups by customer
    * and manager, so we have to look through all of the
    * customer's transactions each time.;
    * DO UNTIL(0) means loop forever, so be careful that
    * there is always a LEAVE statement executed.;
    do until(0);

        * p > pmax means the end of the transaction list, so stop.;
        if p > pmax
         then leave;

        set transaction (keep=customerid accountmanager transaction_time
                  rename=(customerid=cust2 accountmanager=mgr2 transaction_time=tt2))
            point=p;

        * When customer ID changes, we are done with the loop.;
        if cust2 ~= customerid
         then leave;
         else do;
            * To be counted, the transaction needs to be within the 
            * 90-day period. Using "<" for the transaction time pre-
            * vents counting the current transaction twice.;
            if datepart(tt2) >= ninetyday and tt2 < transaction_time
             then do;
                tranct=tranct+1;
                if mgr2 ~= lastmgr
                 then do;
                    mgrct=mgrct+1;
                    lastmgr=mgr2;
                 end;
             end;
          end;

        * Look at the next transaction.;
        p=p+1;

    end;

    keep CustomerID AccountManager TransactionID Transaction_Time tranct mgrct;

RUN;

[EDIT] Here is a PROC SQL approach that works. It's by Tom in answer to my question here about how to create an elegant query to accomplish your task:

proc sql noprint ;
 create table want as
   select a.*
        , count(distinct b.accountmanager) as mgrct
        , count(*) as tranct
   from transaction a
   left join transaction b
   on a.customerid = b.customerid
    and b.transaction_time <= a.transaction_time
    and datepart(a.transaction_time)-datepart(b.transaction_time)
        between 0 and 90
   group by 1,2,3,4
 ;
quit;


标签: sas