I have an administrative dataset for store visits from multiple years that I am trying to merge into one under the ID
variable.
Each dataset has duplicates of an ID
that occur during different store visits, annotated by Date
. Some of the more recent data files also have new variables (Y
) not contained in the old data files. Datasets from different years will also contain different cases indicated by different ID
. Also, some variables may be the same for each case but at different dates. I want the merged file to retain these duplicates.
Example data files:
File 1
ID Date X
1 3 4
1 5 3
2 1 4
File 2
ID Date X Y
1 6 4 2
1 7 1 5
2 8 4 7
3 7 2 3
I want the merged file to continue listing ALL duplicate cases, as such:
ID Date X Y
1 3 4 .
1 5 3 .
1 6 4 2
1 7 1 5
2 1 4 .
2 8 4 7
3 7 2 3
I then plan to restructure (CASESTOVARS /AUTOFIX=0
) the merged file so that it looks like this:
ID Date.1 Date.2 Date.3 Date.4 X.1 X.2 X.3 X.4 Y.1 Y.2 Y.3 Y.4
1 3 5 6 7 4 3 4 1 . . 2 5
2 1 8 . . 4 4 . . . 7 . .
3 7 . . . 2 . . . 3 . . .
I am having trouble with the initial merging process, however. I have tried looking up the safest way to merge files when they both have duplicate cases in order to make sure no data are lost in the process. It seems that the "Add Variables" method results in lost values for duplicate variables.
Thanks!
EDIT: If I used the "Add Variables" function and used both the ID
and Date
variables as the key variables, would that help avoid deletion of duplicate cases?
Why not try
add cases
instead ofadd variables
? if there are no occurrences of the same Id with the same date it should work OK with thecasestovars
.If there are such cases, you'll need to think what you want to do with them before you can proceed with the
casestovars
.One way would be to aggregate by ID and DATE and decide if you want to e.g. add up the data vars for this case.