Since my terrible execution and interpretation of my previous question I'll start over and will try to formulate the question as short and general possible.
I have two dataframes (see the examples below). Each dataset contains the same number of columns.
tc <- textConnection('
ID Track1 Track2 Track3 Track4 Time Loc
4 15 "" "" 50 40 1
5 17 115 109 55 50 1
6 17 115 109 55 60 1
7 13 195 150 60 70 1
8 13 195 150 60 80 1
9 "" "" 181 70 90 2 #From this row, example data added
10 "" "" 182 70 92 2
11 429 31 "" 80 95 3
12 480 31 12 80 96 3
13 118 "" "" 90 100 4
14 120 16 213 90 101 4
')
MATCHINGS <- read.table(tc, header=TRUE)
tc <- textConnection('
ID Track1 Track2 Track3 Track4 Time Loc
"" 15 "" "" 50 40 1
"" 17 "" 109 55 50 1
"" 17 432 109 55 65 1
"" 17 115 109 55 59 1
"" 13 195 150 60 68 1
"" 13 195 150 60 62 1
"" 10 5 1 10 61 3
"" 13 195 150 60 72 1
"" 40 "" 181 70 82 2 #From this row, example data added
"" "" "" 182 70 85 2
"" 429 "" "" 80 90 3
"" "" 31 12 80 92 3
"" "" "" "" 90 95 4
"" 118 16 213 90 96 4
')
INVOLVED <- read.table(tc, header=TRUE)
The goal is to place the least recent ID from MATCHINGS
into INVOLVED
by matching on Track1
to Track4
and Loc
. An extra condition is that the Time
of the matching INVOLVED
entry may not be higher than the Time
of the entry in MATCHING
. Furthermore a match on Track1
is most preferred, a match on Track4
is least preferred. However only Track4
is always available (all other Track
-columns can be empty). Thus the expected results are:
ID Track1 Track2 Track3 Track4 Time Loc
4 15 "" "" 50 40 1
5 17 "" 109 55 50 1
"" 17 432 109 55 65 1
6 17 115 109 55 59 1
7 13 195 150 60 68 1
7 13 195 150 60 62 1
"" 10 5 1 10 61 3
8 13 195 150 60 72 1
9 40 "" 181 70 82 2 #From this row, example data added
10 "" "" 182 70 85 2
11 429 "" "" 80 90 3
12 "" 31 12 80 92 3
13 "" "" "" 90 95 4
13 118 16 213 90 96 4
I tried to this with the data.table
package, but fail in doing this efficient. Is it possible to get rid of the vector scans and efficiently go through the data without looping?
dat <- data.table(MATCHINGS)
for(i in 1:nrow(INVOLVED)){
row <- INVOLVED[i,]
match <- dat[Time>=row$Time][Loc==row$Loc][Track4==row$Track4][Track4!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
match <- dat[Time>=row$Time][Loc==row$Loc][Track3==row$Track3][Track3!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
match <- dat[Time>=row$Time][Loc==row$Loc][Track2==row$Track2][Track2!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
match <- dat[Time>=row$Time][Loc==row$Loc][Track1==row$Track1][Track1!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
}
update
Updated the example data showing the need for Track 1 to 3
. As shown Track1
is most important and Track4
least important. Even if Track1 to 3
match to MATCHINGS x
and Track4
matches to MATCHINGS y
, the ID
of y
should be assigned to that INVOLVED row
. So: Track3
match overrides Track4
match, Track2
match overrides Track3
match, Track1
match overrides Track2
match.