I'm trying to figure out a simple way to do something like this with dplyr (data set = dat, variable = x):
dat$x[dat$x<0]=NA
Should be simple but this is the best I can do at the moment. Is there an easier way?
dat = dat %>% mutate(x=ifelse(x<0,NA,x))
You can use replace, which is a bit faster than ifelse.
You can speed it up a bit more by supplying an index to replace using which.
On my machine, this cut the time to a third, see below.
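The original snippets are not reproduced here, so the following is only a minimal sketch of the two variants described above. It assumes dplyr is installed and uses a small made-up data frame that follows the question's naming (dat, x); res1 and res2 are hypothetical names.

```r
library(dplyr)

# Made-up example data; `dat` and `x` follow the question's naming
dat <- data.frame(x = c(-2L, 5L, -1L, 0L, 3L))

# Variant 1: replace() with a logical index
res1 <- dat %>% mutate(x = replace(x, x < 0, NA))

# Variant 2: replace() with an integer index built by which()
res2 <- dat %>% mutate(x = replace(x, which(x < 0L), NA))

res1$x
res2$x
```

Both calls should leave the non-negative values untouched and set the negative ones to NA. Whether the which() variant is actually faster depends on the data and the package versions.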
Here's a little comparison of the different answers, which is only indicative of course:
(I'm using dplyr_0.3.0.2 and data.table_1.9.4)
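The benchmark code itself is missing here. As a rough stand-in, the sketch below times the three vector operations directly, outside of mutate, using only base R's system.time. The vector size, the number of repetitions, and the helper name time_it are arbitrary assumptions, and the timings will differ between machines and versions.

```r
set.seed(1)
# Made-up test vector; the size is arbitrary
x <- sample(-10:10, 1e6, replace = TRUE)

# Hypothetical helper: median elapsed seconds over a few repetitions
time_it <- function(f, reps = 5L) {
  median(vapply(seq_len(reps),
                function(i) system.time(f())[["elapsed"]],
                numeric(1)))
}

timings <- c(
  ifelse        = time_it(function() ifelse(x < 0, NA, x)),
  replace       = time_it(function() replace(x, x < 0, NA)),
  replace_which = time_it(function() replace(x, which(x < 0L), NA))
)
timings
```

This is only indicative: it measures the vector functions alone, not the overhead of dplyr or data.table.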
Since we're always very interested in benchmarking, especially in the course of data.table-vs-dplyr discussions, I provide another benchmark of 3 of the answers using microbenchmark and the data by akrun. Note that I modified dplyr1 to be the updated version of my answer.
You can use the is.na<- function. Or you can use mathematical operators.
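A minimal sketch of both ideas, assuming dplyr is installed and using a made-up numeric column. The original code is not shown here, so the exact form below is an assumption; res_isna and res_math are hypothetical names.

```r
library(dplyr)

# Made-up example data; `dat` and `x` follow the question's naming
dat <- data.frame(x = c(-2, 5, -1, 0, 3))

# `is.na<-`(x, value) returns x with the positions selected by `value` set to NA
res_isna <- dat %>% mutate(x = `is.na<-`(x, x < 0))

# Arithmetic variant: NA^TRUE is NA, while NA^FALSE is NA^0, which R defines as 1,
# so multiplying by x keeps the non-negative values and turns the rest into NA
res_math <- dat %>% mutate(x = NA^(x < 0) * x)

res_isna$x
res_math$x
```

The arithmetic variant only makes sense for numeric columns, whereas is.na<- also works for other vector types.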
If you are using data.table, the code below is faster. Benchmarks follow.
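The snippet itself is missing from this copy. A minimal sketch of what an in-place data.table update for this task could look like, assuming data.table is installed and using made-up data:

```r
library(data.table)

# Made-up example data; `dat` and `x` follow the question's naming
dat <- data.frame(x = c(-2L, 5L, -1L, 0L, 3L))

setDT(dat)                     # convert to a data.table by reference (no copy)
dat[x < 0L, x := NA_integer_]  # update only the matching rows, in place

dat$x
```

Because := modifies the column by reference, no new copy of dat is created, which is a plausible reason for the speed difference reported here. as.data.table(dat) would also work but makes a copy first.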
Using data.table_1.9.5 and dplyr_0.3.0.9000.
Updated Benchmarks
Using data.table_1.9.5 and dplyr_0.4.0. I used a slightly bigger dataset and replaced as.data.table with setDT. (Included @Sven Hohenstein's faster function as well.)
Updated Benchmarks 2
At the request of @docendo discimus, benchmarking again his "new" version of dplyr using data.table_1.9.5 and dplyr_0.4.0.
NOTE: Because there is a change in @docendo discimus code, I changed 0 to 0L for the data.table.
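For readers unfamiliar with the 0 versus 0L distinction: 0 is a double literal and 0L an integer literal. When the column is integer, comparing against 0L avoids converting it to double first, which presumably is why the change keeps the comparison fair. A small base R illustration with a made-up vector:

```r
typeof(0)   # "double"
typeof(0L)  # "integer"

# Made-up integer vector
x <- c(-2L, 5L, -1L)

# Both comparisons select the same elements; only the types involved differ
identical(x < 0, x < 0L)  # TRUE
```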