I have what I think is a very simple question related to the use of data.table and the := function. I don't think I quite understand the behaviour of := and often I run into similar problems.
Here is some example data
mat <- structure(list(
col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491)),
.Names = c("col1", "col2"),
row.names = c(NA, -5L),
class = c("data.table", "data.frame"))
which gives
> mat
col1 col2
1: NA 0.003745
2: 0.000000 0.007463
3: -0.015038 -0.007407
4: 0.003817 -0.003731
5: -0.011407 -0.007491
I want to create a column called col3 which gives the sum of col1 and col2. If I use
mat[,col3 := col1 + col2]
# col1 col2 col3
#1: NA 0.003745 NA
#2: 0.000000 0.007463 0.007463
#3: -0.015038 -0.007407 -0.022445
#4: 0.003817 -0.003731 0.000086
#5: -0.011407 -0.007491 -0.018898
then I get an NA for the first row, but I want NAs to be ignored. So I tried instead
mat[,col3 := sum(col1,col2,na.rm=TRUE)]
# col1 col2 col3
#1: NA 0.003745 -0.030049
#2: 0.000000 0.007463 -0.030049
#3: -0.015038 -0.007407 -0.030049
#4: 0.003817 -0.003731 -0.030049
#5: -0.011407 -0.007491 -0.030049
which is not what I am after, since it is giving me the sum of all elements of col1 and col2. I think I don't quite get := ... How can I get the element-wise sum of col1 and col2, ignoring NA values?
Not sure this is relevant, but here is my sessionInfo
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.8.3
This is standard R behaviour, nothing really to do with data.table.
Adding anything to NA will return NA, and sum will return a single number.
If you want 1 + NA to return 1, then you will have to deal separately with the cases where col1 or col2 are NA, for example by filling col3 from the other column in those rows.
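The approach described above can be sketched roughly as follows. This is a minimal illustration rather than the exact code from the answer: it assumes a reasonably recent data.table and rebuilds the question's data with data.table() instead of structure().

```r
library(data.table)

# Toy data mirroring the question
mat <- data.table(
  col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
  col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491)
)

NA + 1                       # NA: arithmetic with NA propagates NA
sum(c(1, NA), na.rm = TRUE)  # 1, but sum() always collapses to a single number

# Element-wise sum first; rows where either input is NA come out as NA
mat[, col3 := col1 + col2]

# Then patch the NA cases by sub-assigning from the other column
mat[is.na(col1), col3 := col2]
mat[is.na(col2), col3 := col1]
```

If both col1 and col2 are NA in a row, col3 stays NA under this scheme.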
EDIT - an easier solution
You could also use rowSums, which has a na.rm argument.
rowSums is what you want (by definition, the rowSums of a matrix containing col1 and col2, removing NA values).
(@JoshuaUlrich suggested this as a comment.)
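One way the rowSums idea might look inside a data.table call is sketched below. It assumes a data.table version in which .SDcols can be combined with :=, and the toy data is rebuilt with data.table() for self-containment.

```r
library(data.table)

# Toy data mirroring the question
mat <- data.table(
  col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
  col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491)
)

# .SD is restricted to col1 and col2 via .SDcols;
# rowSums() then adds across those columns within each row, dropping NAs
mat[, col3 := rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]
```

Note that with na.rm = TRUE, a row in which every value is NA sums to 0 rather than NA.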
It's not a lack of understanding of data.table but rather one regarding vectorized functions in R. You can define a dyadic operator that will behave differently than the "+" operator with regard to missing values:
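A sketch of such an operator is shown here. The name %+na% is purely illustrative (any valid %...% name would do), and the toy data is rebuilt with data.table() so the snippet runs on its own.

```r
library(data.table)

# Hypothetical NA-tolerant addition: if one side is NA, return the other side
`%+na%` <- function(x, y) {
  ifelse(is.na(x), y, ifelse(is.na(y), x, x + y))
}

# Toy data mirroring the question
mat <- data.table(
  col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
  col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491)
)

# The operator is vectorized, so it can be used directly on the right of :=
mat[, col3 := col1 %+na% col2]
```

When both operands are NA, the result is still NA, which differs from rowSums(..., na.rm = TRUE), where such a row would give 0.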
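You can also use mrdwad's comment and do it with sum(..., na.rm=TRUE), applied row by row (for instance by grouping on the row number) so that each call to sum only sees the values of a single row: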
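A possible form of that row-wise sum is sketched below. Grouping on seq_len(nrow(mat)) is one assumed way to get one group per row, not necessarily the exact expression used in the comment.

```r
library(data.table)

# Toy data mirroring the question
mat <- data.table(
  col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
  col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491)
)

# One group per row, so sum() only ever sees that row's col1 and col2
mat[, col3 := sum(col1, col2, na.rm = TRUE), by = seq_len(nrow(mat))]
```

Because this creates a separate group for every row, it is likely to be noticeably slower than a vectorized approach on large tables.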