I have a data frame of values, and for each value I want to determine whether it is within, say, 10% of any other value in its row. I want to do this generically, as I do not know how many columns I will have or what they will be called.
Some values are NA. If a value is the only non-NA value in its row (all the other values are NA), I want to return TRUE. For the NA values themselves I want to return FALSE. The values are all non-negative, so they can be 0.
For example, say I have the following data frame:
dataDF <- data.frame(
a = c(100, 250, NA, 700, 0),
b = c(105, 300, 280, NA, 0),
c = c(200, 400, 280, NA, 0)
)
In the first row we have a = 100, b = 105 and c = 200. a and b are within 10% of each other so we would have TRUE for both of those, c is not within 10% of either a or b so would be FALSE.
In the second row no values are within 10% of each other, so all would be FALSE.
In the third row b and c are equal so are TRUE, a is NA so is FALSE.
In the fourth row only a has a value, so it is returned as TRUE; b and c are NA, so they are FALSE.
In the final row all values are the same, so we would have TRUE for all.
So my output would be:
data.frame(
a = c( TRUE, FALSE, FALSE, TRUE, TRUE),
b = c( TRUE, FALSE, TRUE, FALSE, TRUE),
c = c(FALSE, FALSE, TRUE, FALSE, TRUE)
)
How I calculate the percentage difference doesn't really matter, but the way I was going to do it was to divide the absolute difference by the average of the two values, so that I get the same result whichever way round I compare them.
So, for example, the percentage difference between 100 and 105 would be:
abs(100 - 105)/((100 + 105)/2) = 5/102.5 = 0.0488
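The calculation above could be wrapped in a small helper. This is only a sketch: the name `pct_diff` is made up for illustration, and treating two zeros as a 0% difference is an assumption (the raw formula gives 0/0 = NaN there, which matters for the all-zero row in the example).

```r
# Symmetric percentage difference: |x - y| divided by the mean of x and y.
# pct_diff is a hypothetical helper name, not part of any package.
pct_diff <- function(x, y) {
  d <- abs(x - y) / ((x + y) / 2)
  # Assumption: 0 vs 0 yields NaN (0/0); two zeros are equal, so use 0.
  d[!is.na(x) & !is.na(y) & x == 0 & y == 0] <- 0
  d
}

pct_diff(100, 105)  # 5 / 102.5
pct_diff(0, 0)      # 0 rather than NaN
pct_diff(NA, 105)   # NA is passed through
```

Because the denominator is the mean of both values, `pct_diff(x, y)` and `pct_diff(y, x)` give the same answer.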
Any ideas on the quickest and neatest way of doing this would be appreciated.
Thanks
Define a function and apply it on each row of your data.frame:
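A minimal sketch of one such function, using the symmetric percentage difference from the question. The names `within_tol` and `tol` are illustrative choices, and mapping 0/0 to a zero difference is an assumption made so that the all-zero row comes out TRUE.

```r
dataDF <- data.frame(
  a = c(100, 250, NA, 700, 0),
  b = c(105, 300, 280, NA, 0),
  c = c(200, 400, 280, NA, 0)
)

# For one row: is each value within `tol` of at least one other value?
within_tol <- function(row, tol = 0.10) {
  ok <- !is.na(row)
  out <- logical(length(row))   # starts all FALSE, so NA positions stay FALSE
  names(out) <- names(row)
  vals <- row[ok]
  if (length(vals) == 1) {
    out[ok] <- TRUE             # a lone non-NA value counts as TRUE
  } else if (length(vals) > 1) {
    # Pairwise |x - y| / mean(x, y) for every pair in the row
    d <- abs(outer(vals, vals, "-")) / (outer(vals, vals, "+") / 2)
    d[is.nan(d)] <- 0           # assumption: 0 vs 0 (0/0) counts as equal
    diag(d) <- Inf              # do not compare a value with itself
    out[ok] <- apply(d, 1, min) <= tol
  }
  out
}

# apply() returns one column per input row, hence the transpose
result <- as.data.frame(t(apply(dataDF, 1, within_tol)))
result
```

Nothing here depends on the number or names of the columns, since `apply()` passes each row as a named vector. A different tolerance can be passed through, for example `apply(dataDF, 1, within_tol, tol = 0.25)`.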
This gives the wanted result:
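For the example `dataDF`, and assuming the result is converted to a data frame, the printed output should match the expected output from the question:

```text
      a     b     c
1  TRUE  TRUE FALSE
2 FALSE FALSE FALSE
3 FALSE  TRUE  TRUE
4  TRUE FALSE FALSE
5  TRUE  TRUE  TRUE
```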