I am trying to exclude some rows from a data.table based on day and month, for example summer holidays that always begin on the 15th of June and end on the 15th of the following month. I can extract those days based on Date, but because as.Date is awfully slow to work with, I have separate integer columns for Month and Day and I want to do it using only them.
It is easy to select the given entries with
DT[Month==6][Day>=15]
DT[Month==7][Day<=15]
Is there any way to take the "difference" of the two data.tables (the original one and the rows I selected)? (Why not subset? Maybe I am missing something simple, but I don't want to exclude days like 10/6 or 31/7.)
I am aware of a way to do it with a join, but only day by day:
setkey(DT, Month, Day)
DT[-DT[J(Month,Day), which= TRUE]]
Can anyone help me solve this in a more general way?
Based on the answer here, you might try something like
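A minimal sketch of that idea, assuming DT is keyed on Month and Day. The example data and the names excl, idx and res are made up for illustration:

```r
library(data.table)

# Hypothetical example data: every calendar day of a non-leap year,
# stored as integer Month and Day columns (no Date column needed).
days_in_month <- c(31L, 28L, 31L, 30L, 31L, 30L, 31L, 31L, 30L, 31L, 30L, 31L)
DT <- data.table(
  Month = rep(1:12, times = days_in_month),
  Day   = sequence(days_in_month)
)
setkey(DT, Month, Day)

# All (Month, Day) pairs to exclude: 15-30 June and 1-15 July.
excl <- data.table(
  Month = c(rep(6L, 16), rep(7L, 15)),
  Day   = c(15:30, 1:15)
)

# which = TRUE returns the matching row numbers of DT; days that are
# absent from DT come back as NA, which na.omit() drops before the
# negative index removes the matched rows.
idx <- na.omit(DT[excl, which = TRUE])
res <- DT[-idx]
```

Note that a negative index built from an empty idx would return no rows at all, so this pattern assumes at least one day in excl is present in DT.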
If your data contain at least one entry for each day you want to exclude, na.omit might not be required.
Great question. I've edited the question title to match the question.
A simple approach avoiding as.Date which reads nicely:
That's probably fast enough in many cases. If you have a lot of different ranges, then you may want to step up a gear:
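One way these two steps might look, as a sketch rather than a definitive implementation. It assumes integer Month and Day columns; the mmdd encoding (15 June becomes 615) and the names simple, keyed, from and to are illustrative choices:

```r
library(data.table)

# Hypothetical example data: one row per day of a non-leap year.
days_in_month <- c(31L, 28L, 31L, 30L, 31L, 30L, 31L, 31L, 30L, 31L, 30L, 31L)
DT <- data.table(
  Month = rep(1:12, times = days_in_month),
  Day   = sequence(days_in_month)
)

# Simple version: encode month and day as one integer (15 June -> 615)
# and drop everything between 615 and 715 inclusive with a vector scan.
simple <- DT[!(Month * 100L + Day) %between% c(615L, 715L)]

# Keyed version: store the encoding as a column, sort by it, and use
# binary search to find the first and last row of the range.
# This assumes both endpoints (615 and 715) occur in the data;
# otherwise the lookups return NA.
DT[, mmdd := Month * 100L + Day]
setkey(DT, mmdd)
from <- DT[J(615L), mult = "first", which = TRUE]
to   <- DT[J(715L), mult = "last",  which = TRUE]
keyed <- DT[-(from:to)]
```

Both results keep the same days; the keyed version only pays off when the table is large or many ranges have to be looked up.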
That's a bit long and error prone because it's DIY. So one idea is that a list column in an i table would represent a range query (FR#203, like a binary search %between%). Then a not-join (also not yet implemented, FR#1384) could be combined with the list column range query to do exactly what you asked:
That would extend to multiple different ranges, or the same range for many different ids, in the usual way; i.e., more rows added to i.
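For several ranges at once, here is a sketch that avoids the proposed syntax altogether, assuming a data.table version that provides inrange(). The ranges table, its column names, and the December range are invented for illustration:

```r
library(data.table)

# Hypothetical example data: one row per day of a non-leap year,
# with month and day encoded as a single integer (15 June -> 615).
days_in_month <- c(31L, 28L, 31L, 30L, 31L, 30L, 31L, 31L, 30L, 31L, 30L, 31L)
DT <- data.table(
  Month = rep(1:12, times = days_in_month),
  Day   = sequence(days_in_month)
)
DT[, mmdd := Month * 100L + Day]

# Hypothetical table of ranges to exclude, one row per range:
# 15 June to 15 July, and 24 December to 31 December.
ranges <- data.table(from = c(615L, 1224L), to = c(715L, 1231L))

# inrange() is TRUE where mmdd falls inside any [from, to] interval
# (bounds inclusive by default), so negating it keeps the other days.
kept <- DT[!inrange(mmdd, ranges$from, ranges$to)]
```

Adding another excluded period is then just another row in ranges.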