My question seems simple and I hope it is.
I have a data frame that contains the date of diagnosis of a disease (Diag_date), a binary variable indicating which drug the patient was on (Treatment, i.e. exposed vs. unexposed), a start and stop date for each drug period (Drug.start, drug.end), and an overall end date (End.date).
ID Diag_date Treatment End.date Drug.start drug.end
1 NA 0 15/03/2002 01/01/2002 01/02/2002
1 NA 1 15/03/2002 01/02/2002 01/03/2002
1 NA 0 15/03/2002 01/03/2002 NA
2 01/04/2002 1 01/05/2002 01/01/2015 01/02/2002
2 01/04/2002 0 01/05/2002 01/02/2002 01/03/2002
2 01/04/2002 0 01/05/2002 01/03/2002 NA
As you can see, the date of diagnosis is constant within each ID (not time-varying), whereas the drug start and stop dates change from row to row.
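To make the examples reproducible, the table above can be rebuilt as an R data frame. One likely complication: dates stored as text such as "15/03/2002" are compared as character strings rather than chronologically (as strings, "15/03/2002" sorts after "01/05/2002" even though it is earlier). A minimal sketch, assuming all dates are in day/month/year format, converts them with as.Date() first:

```r
# Rebuild the example data; the date columns start out as character strings
df <- data.frame(
  ID         = c(1, 1, 1, 2, 2, 2),
  Diag_date  = c(NA, NA, NA, "01/04/2002", "01/04/2002", "01/04/2002"),
  Treatment  = c(0, 1, 0, 1, 0, 0),
  End.date   = c(rep("15/03/2002", 3), rep("01/05/2002", 3)),
  Drug.start = c("01/01/2002", "01/02/2002", "01/03/2002",
                 "01/01/2015", "01/02/2002", "01/03/2002"),
  drug.end   = c("01/02/2002", "01/03/2002", NA,
                 "01/02/2002", "01/03/2002", NA),
  stringsAsFactors = FALSE
)

# Convert day/month/year strings to Date so that >= and <= compare chronologically;
# missing values stay NA
date_cols <- c("Diag_date", "End.date", "Drug.start", "drug.end")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d/%m/%Y")

str(df)
```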
Ideally I would like answers to two questions:
1.) How do I transfer the overall End.date to the last drug.end for each ID?
2.) How do I create a binary column that shows whether the diagnosis date falls within the interval between Drug.start and drug.end?
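For question 1, a minimal base-R sketch, assuming the rows are already sorted by ID and then by Drug.start (so the last row of each ID is the final drug period) and that the date columns are already of class Date. duplicated(..., fromLast = TRUE) marks every row except the last one per ID, so negating it picks out the final rows:

```r
# Small example with Date columns already parsed
df <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 2),
  End.date = as.Date(c(rep("2002-03-15", 3), rep("2002-05-01", 3))),
  drug.end = as.Date(c("2002-02-01", "2002-03-01", NA,
                       "2002-02-01", "2002-03-01", NA))
)

# TRUE only for the last row of each ID (assumes rows are sorted within ID)
last_row <- !duplicated(df$ID, fromLast = TRUE)

# Copy the overall End.date into drug.end on those final rows only
df$drug.end[last_row] <- df$End.date[last_row]
```

If some final rows already have a real drug.end that should be kept, the index could be narrowed to last_row & is.na(df$drug.end).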
I want my final data to look like this:
ID Diag_date Treatment End.date Drug.start drug.end Event
1 NA 0 15/03/2002 01/01/2002 01/02/2002 0
1 NA 1 15/03/2002 01/02/2002 01/03/2002 0
1 NA 0 15/03/2002 01/03/2002 15/03/2002 0
2 01/04/2002 1 01/05/2002 01/01/2015 01/02/2002 0
2 01/04/2002 0 01/05/2002 01/02/2002 01/03/2002 0
2 01/04/2002 0 01/05/2002 01/03/2002 01/05/2002 1
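For the Event column shown above, a vectorised sketch avoids the loop entirely. It assumes the dates are Date objects and that the final drug.end has already been filled in. The element-wise & compares all rows at once and yields NA where Diag_date is missing; those NAs are then treated as "no event":

```r
df <- data.frame(
  ID         = c(1, 1, 1, 2, 2, 2),
  Diag_date  = as.Date(c(NA, NA, NA, "2002-04-01", "2002-04-01", "2002-04-01")),
  Drug.start = as.Date(c("2002-01-01", "2002-02-01", "2002-03-01",
                         "2015-01-01", "2002-02-01", "2002-03-01")),
  drug.end   = as.Date(c("2002-02-01", "2002-03-01", "2002-03-15",
                         "2002-02-01", "2002-03-01", "2002-05-01"))
)

# Element-wise & (not &&): one TRUE/FALSE/NA per row; both bounds inclusive
in_interval <- df$Diag_date >= df$Drug.start & df$Diag_date <= df$drug.end

# A missing diagnosis date counts as no event; store the result as 0/1
df$Event <- as.integer(!is.na(in_interval) & in_interval)
```

This reproduces the Event values in the desired output. Because both bounds are inclusive, a diagnosis falling exactly on a date where one period ends and the next begins would flag both rows; using < for the upper bound would avoid that, if it matters.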
Not everyone has a diagnosis date because not everyone in the sample had the disease. The code I wrote is the following:
# Flag rows where the diagnosis date falls between the drug start and end dates
for (i in 1:nrow(df)) {
  if ((df$Diag_date[i] >= df$Drug.start[i]) && (df$Diag_date[i] <= df$drug.end[i])) {
    df$Event[i] <- 1
  } else {
    df$Event[i] <- 0
  }
}
The error I get when I run this code is:
missing value where TRUE/FALSE needed
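A likely cause: whenever Diag_date (or drug.end) is NA, the comparison evaluates to NA, and if() cannot branch on NA. The sketch below uses two hypothetical rows that reproduce this, then keeps the loop but wraps the condition in isTRUE(), which returns FALSE for NA:

```r
# Hypothetical rows: one with a missing Diag_date, one with a missing drug.end
df <- data.frame(
  Diag_date  = as.Date(c(NA, "2002-04-01")),
  Drug.start = as.Date(c("2002-01-01", "2002-03-01")),
  drug.end   = as.Date(c("2002-02-01", NA))
)

# Both rows give NA here, which is what breaks the original if()
cond <- df$Diag_date >= df$Drug.start & df$Diag_date <= df$drug.end

# Passing such an NA straight to if() reproduces the reported error
msg <- tryCatch(if (cond[1]) 1, error = function(e) conditionMessage(e))

df$Event <- 0
for (i in seq_len(nrow(df))) {
  # isTRUE() treats NA as FALSE, so rows with a missing date keep Event = 0
  if (isTRUE(df$Diag_date[i] >= df$Drug.start[i] && df$Diag_date[i] <= df$drug.end[i])) {
    df$Event[i] <- 1
  }
}
```

seq_len(nrow(df)) is used instead of 1:nrow(df) so the loop does nothing, rather than iterating over c(1, 0), when the data frame has zero rows.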
Any help would be much appreciated.