In this question I need to be able to lookup a value from a dataframe's column not only based on one attribute, but based on more attributes and range comparing against a dictionary. (Yes, this is actually a continuation of a story in R conditional replace more columns by lookup )
It should be easy question for R-known ppl, because I provide working solution for basic indexing, that needs to be upgraded, possibly easily ... but it is very hard for me, because Iam in a process of learning R.
From where to start:
When I do want to replace missing values from columns testcolnames from (big) table df1 according to column default of (small) dictionary testdefs (row selected by making testdefs$LABMET_ID equal to column name from testcolnames), I use this code:
testcolnames=c("80","116") #...result of regexp on colnames(df1), originally much longer
df1[,testcolnames] <- lapply(testcolnames, function(x) { tmpcol<-df1[,x];
tmpcol[is.na(tmpcol)] <- testdefs$default[match(x, testdefs$LABMET_ID)];
tmpcol })
To where to go:
Now - I need to upgrade this solution. The table testdefs will have (example below) multiple rows of the same LABMET_ID differing only by new two columns called lower and upper ... which need to be the bounds for variable df1$rngvalue when selecting which value to replace.
In another words - to upgrade this solution to not only select the row from testdefs (where testdefs$LABMET_ID equals the column name), but to select from these rows such a row, where df1$rngvalue is in the bounds of testdefs$lower and testdefs$upper (if none such exists, take the range closest - either the lowest or the highest, if the dictionary doesnt have LABMET_ID, we can leave NA in the original data).
An example:
testdefs
"LABMET_ID","lower","upper","default","notuse","notuse2"
30,0,54750,25,80,2 #..."many columns we dont care about"
46,0,54750,1.45,3.5,0.2
80,0,54750,0.03,0.1,0.01
116,0,30,0.09,0.5,0.01
116,31,365,0.135,0.7,0.01
116,366,5475,0.11,0.7,0.01
116,5476,54750,0.105,0.7,0.02
df1:
"rngvalue","80","116"
36,NA,NA
600000,NA,NA
367,5,NA
90,NA,6
to be transformed into:
"rngvalue","80","116"
36,0.03,0.135 #col80 is always replaced by 0.03
600000,0.03,0.105 #col116 needs to be decided on range, this value is bigger than everything in dictionary so take the last one
367,5,0.11 #5 not replaced, but second column nicely looks up to 0.11
90,0.03,6 #6 not replaced