Merging two datasets conditional on 3 variables in

2019-09-05 05:34发布

问题:

I am quite new to using SAS and would like to ask for your help on merging two datasets that do not contain one key identifier but require a combination of three conditions to be merged successfully.

I have two datasets:

  • Dataset 1 contains information on client visits including StartDate, EndDate, EmployeeNumber and CustomerNumber.
  • Dataset 2 contains information on certain activities that may have been performed during the visits in addition to the "Date" when the activity has been carried out and EmployeeNumber and CustomerNumber

The goal is to keep the first dataset and add information from the second dataset in case it is available - based on a match of date, employee number and customer number.

It is important that the date variable in the activity dataset is just on day while in the visits dataset it is a range between start and end date. So, the data sets look like this: Base datasets

Logically, it should be a merge based on three conditions:

  1. IF EmployeeNumber = EmployeeNumber
  2. IF CustomerNumber = CustomerNumber
  3. IF Date is between StartDate and EndDate
    Then: Update visits dataset with information from activity dataset

So, the ideal new dataset would look like this: Intended result of merge
If there is no entry in the activity dataset, the new data set should simply contain a blank for the respective variables. For all matches, it should add the respective variable.

When trying (rather simple) merge statements in SAS, I usually get an error message stating that "BY variables are not properly sorted on data set". Furthermore, I am not sure how to implement the conditions (in particular the match of date to a range) properly.

Can this sort of merge be done in SAS or SQL?
Any help is greatly appreciated!!!

回答1:

I tend to use proc sql for this kind of stuff, it's just a relatively simple join in which SQL excels. Try something like this:

proc sql;

create table merged_data as
  select a.*, b.other_variable, b.other_variable2
  from dataset1 as a
  left join dataset2 as b 
    on a.EmployeeNumber = b.EmployeeNumber 
    and a.CustomerNumber = b.CustomerNumber
    and b.Date between a.StartDate and a.EndDate
;

quit;