Fastest way to replace NAs in a large data.table

Posted 2019-01-01 05:02

I have a large data.table, with many missing values scattered throughout its ~200k rows and 200 columns. I would like to recode those NA values to zeros as efficiently as possible.

I see two options:
1: Convert to a data.frame, and use something like this
2: Some kind of cool data.table subsetting command

I'll be happy with a fairly efficient solution of type 1. Converting to a data.frame and then back to a data.table won't take too long.

7 answers
查无此人
#2 · 2019-01-01 05:32
library(data.table)

# Mixing numbers and strings in c() makes both columns character
DT = data.table(a=c(1,"A",NA),b=c(4,NA,"B"))

DT
    a  b
1:  1  4
2:  A NA
3: NA  B

# Returns a new data.table; each NA becomes 0 ("0" in character columns)
DT[, lapply(.SD, function(x) ifelse(is.na(x), 0, x))]
   a b
1: 1 4
2: A 0
3: 0 B

For reference: this is slower than the gdata or data.matrix approaches, but it uses only the data.table package and can handle non-numeric entries. Note that it returns a modified copy rather than changing DT in place.
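If speed and memory matter on a table of the size described in the question, a commonly suggested alternative is to update each column by reference with data.table's set(). The following is only a minimal sketch, not a benchmarked solution: the example data and the names `na_rows` and `fill` are made up for illustration, and it assumes every column is either numeric or character.

```r
library(data.table)

# Hypothetical example data: one numeric and one character column.
DT <- data.table(x = c(1, NA, 3), y = c("A", NA, "B"))

# Walk over the columns and overwrite the NA cells in place.
# set() modifies DT by reference, so no copy of the table is made.
for (j in seq_along(DT)) {
  na_rows <- which(is.na(DT[[j]]))
  # Assumption: columns are numeric or character only. The replacement is
  # matched to the column type so that set() does not have to coerce it.
  fill <- if (is.character(DT[[j]])) "0" else 0
  set(DT, i = na_rows, j = j, value = fill)
}

print(DT)
```

Because the loop touches only the NA cells of one column at a time, it avoids the full copy that the lapply(.SD, ...) form creates. For purely numeric columns, recent data.table versions also provide setnafill(), which may be simpler still.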
