I have a dataset like this:
CustomerID AccountManager TransactionID Transaction_Time
1111111111 FA001 TR2016001 08SEP16:11:19:25
1111111111 FA001 TR2016002 26OCT16:08:22:49
1111111111 FA002 TR2016003 04NOV16:08:05:36
1111111111 FA003 TR2016004 04NOV16:17:15:52
1111111111 FA004 TR2016005 25NOV16:13:04:16
1231231234 FA005 TR2016006 25AUG15:08:03:29
1231231234 FA005 TR2016007 16SEP15:08:24:24
1231231234 FA008 TR2016008 18SEP15:14:42:29
CustomerID represents each customer, each customer could have multiple transactions. Each account manager could deal with multiple transactions too. But transactionID is unique in this table.
Now I would like to count for each customer, when the transation happened, if I went back to last 90 days, how many distinct Account Manager get involved and how many transactions happened. The result I am looking for is like this:
CustomerID Manager TransacID Transaction_Time CountTransac CountManager
1111111111 FA001 TR2016001 08SEP16:11:19:25 1 1
1111111111 FA001 TR2016002 26OCT16:08:22:49 2 1
1111111111 FA002 TR2016003 04NOV16:08:05:36 3 2
1111111111 FA003 TR2016004 04NOV16:17:15:52 4 3
1111111111 FA004 TR2016005 25NOV16:13:04:16 5 4
1231231234 FA005 TR2016006 25AUG15:08:03:29 1 1
1231231234 FA005 TR2016007 16SEP15:08:24:24 2 1
1231231234 FA008 TR2016008 18SEP15:14:42:29 3 2
Now using the following code, I figure out how to calculate transaction count, but I did not know how to calculate the distinct manager count. It would be highly appreciated if someone could help me out. Thanks a lot.
DATA want;
SET transaction;
COUNT=1;
DO point=_n_-1 TO 1 BY -1;
SET want(KEEP=CustomerID Transaction_Time COUNT POINT=point
RENAME=(CustomerID =SAME_ID Transaction_Time =OTHER_TIME COUNT=OTHER_COUNT));
IF CustomerID NE SAME_ID
OR INTCK ("DAY", DATEPART(OTHER_TIME), DATEPART(Transaction_Time )) > 90
THEN LEAVE;
COUNT + OTHER_COUNT;
END;
DROP SAME_ID OTHER_TIME OTHER_COUNT;
RENAME COUNT=COUNT_TRANSAC;
RUN;
Your code does not work at all as it is, but I see what you want to do. Here is something that does work. I commented out the WHERE
statement so you can see that it produces the result you asked for. You need the WHERE
statement if you really want just the last 90 days.
* Always a good idea to sort first unless you are CERTAIN that
* your values are in the order you want.;
proc sort data=have;
by customerid AccountManager transactionid;
run;
DATA want;
SET have;
* Uncomment the WHERE statement to activate the 90-day time frame.;
* where today()-datepart(transaction_time)<=90;
by customerid AccountManager transactionid;
if first.customerid
then do;
counttransac=0;
countmanager=0;
end;
if first.AccountManager
then countmanager+1;
counttransac+1;
RUN;
Taking advantage of SAS's BY
statement and the first.
and last.
variable modifiers, you can reset your counter each time you see a new customer ID and manager ID.
[EDIT] Okay, that's much more difficult. Here is code that looks back at the history before each transaction. I see why you were using two SET
statements because you have to join the dataset to itself. Probably you can do this with PROC SQL
, but I didn't have time to check it out. Let me know if this works for you.
* Sort each customer's and manager's transactions;
proc sort data=transaction;
by customerid accountmanager;
run;
DATA want;
SET transaction nobs=pmax;
by customerid;
length lastmgr $ 100;
retain pstart; * Starting row for each customer;
* Save starting row for each customer;
if first.customerid
then pstart=_n_;
* Initialize current account manager and counters for
* managers and transactions. The current transaction always
* counts as one transaction and one manager.
* Save the beginning of the 90-day period to avoid
* recalculating it each time.;
lastmgr=accountmanager;
mgrct=1;
tranct=1;
ninetyday=datepart(transaction_time)-90;
* Set the starting row to search for each transaction;
p=pstart;
* Loop through all rows for the customer and only count
* those that occur before the current transaction and
* after the 90-day period before it.;
* Note that the transactions are not necessarily sorted
* in chronological order but rather in groups by customer
* and manager, so we have to look through all of the
* customer's transactions each time.;
* DO UNTIL(0) means loop forever, so be careful that
* there is always a LEAVE statement executed.;
do until(0);
* p > pmax means the end of the transaction list, so stop.;
if p > pmax
then leave;
set transaction (keep=customerid accountmanager transaction_time
rename=(customerid=cust2 accountmanager=mgr2 transaction_time=tt2))
point=p;
* When customer ID changes, we are done with the loop.;
if cust2 ~= customerid
then leave;
else do;
* To be counted, the transaction needs to be within the
* 90-day period. Using "<" for the transaction time pre-
* vents counting the current transaction twice.;
if datepart(tt2) >= ninetyday and tt2 < transaction_time
then do;
tranct=tranct+1;
if mgr2 ~= lastmgr
then do;
mgrct=mgrct+1;
lastmgr=mgr2;
end;
end;
end;
* Look at the next transaction.;
p=p+1;
end;
keep CustomerID AccountManager TransactionID Transaction_Time tranct mgrct;
RUN;
[EDIT] Here is a PROC SQL
approach that works. It's by Tom in answer to my question here about how to create an elegant query to accomplish your task:
proc sql noprint ;
create table want as
select a.*
, count(distinct b.accountmanager) as mgrct
, count(*) as tranct
from transaction a
left join transaction b
on a.customerid = b.customerid
and b.transaction_time <= a.transaction_time
and datepart(a.transaction_time)-datepart(b.transaction_time)
between 0 and 90
group by 1,2,3,4
;
quit;