How can I select columns of a data.table based on a regex? Consider a simple example as follows:
```r
library(data.table)
mydt <- data.table(foo=c(1,2), bar=c(2,3), baz=c(3,4))
```
Is there a way to select the columns `bar` and `baz` from the data.table based on a regex? I know that the following solution works, but if the table is much bigger and I want to choose more variables, this could easily get cumbersome.
```r
mydt[, .(bar, baz)]
```
I would like to have something like `matches()` in `dplyr::select()`, but only by reference.
There is also a `subset` method for "data.table", so you can always pass the result of `grep()` on the column names to its `select` argument. It turns out that creating a `startswith` type of function for "data.table" is not very straightforward.
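A minimal sketch of the `subset()` approach, using the example table from the question and assuming the regex `"bar|baz"` (any pattern matching the wanted names would do). `grep()` returns the positions of the matching column names, which `select` accepts:

```r
library(data.table)

mydt <- data.table(foo = c(1, 2), bar = c(2, 3), baz = c(3, 4))

# grep() returns the positions of the matching names (here 2 and 3);
# subset() then keeps only those columns and returns a new data.table
res <- subset(mydt, select = grep("bar|baz", names(mydt)))
print(res)
```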
Since data.table v1.12.0 (Jan 2019), `.SDcols` accepts `patterns()`, so you can select columns by regex directly. This is described under the `.SDcols` argument in the official documentation (`?data.table`).
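A short sketch of the `patterns()` form on the question's table. The regex `"^ba"` (names starting with "ba") is just an illustrative choice; `"bar|baz"` would select the same columns here:

```r
library(data.table)  # patterns() in .SDcols needs data.table >= 1.12.0

mydt <- data.table(foo = c(1, 2), bar = c(2, 3), baz = c(3, 4))

# Keep only the columns whose names match the regex "^ba" (bar and baz)
res <- mydt[, .SD, .SDcols = patterns("^ba")]
print(res)
```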
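You can also try to use `%like%` from the data.table package, which is a "convenience function for calling regexpr". It adds no new functionality, but it makes the code more readable ;) Applied to your question, you can pass the logical vector `names(mydt) %like% "bar|baz"` to `.SDcols`.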
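A minimal sketch of the `%like%` approach, including the negated form that keeps every column except those matching "foo". The variable names `sel` and `keep` are illustrative only:

```r
library(data.table)

mydt <- data.table(foo = c(1, 2), bar = c(2, 3), baz = c(3, 4))

# %like% returns a logical vector over the names: FALSE TRUE TRUE
sel <- names(mydt) %like% "bar|baz"
res <- mydt[, .SD, .SDcols = sel]

# ! negates the logical vector: keep every column NOT matching "foo"
keep <- !(names(mydt) %like% "foo")
res_neg <- mydt[, .SD, .SDcols = keep]

print(res)
print(res_neg)
```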
As `%like%` returns a logical vector, we can negate it with `!` to get every column except those which contain "foo".

David's answer will work. But if your regex is long and you would rather evaluate it first, try storing the matching column names with `grep(..., value = TRUE)` and then selecting them with `with = FALSE`.
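A sketch of the "evaluate the regex first" approach on the question's table. The name `cols` is an arbitrary choice, and `"bar|baz"` stands in for a longer pattern:

```r
library(data.table)

mydt <- data.table(foo = c(1, 2), bar = c(2, 3), baz = c(3, 4))

# Evaluate the (possibly long) regex once and keep the matching names
cols <- grep("bar|baz", names(mydt), value = TRUE)

# with = FALSE makes data.table treat cols as column names, not an expression;
# the result is assigned to a new variable, so mydt keeps all its columns
res <- mydt[, cols, with = FALSE]
print(res)
```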
It just depends on your preferences and needs. You can also assign the subsetted table to a new variable if you need to keep the original intact.
UPDATE: I updated the comparison to include @sindri_baldur's answer, using data.table version 1.12.6. According to the results, `patterns()` is a handy shortcut, but if performance matters, one should stick with the `..` or `with = FALSE` solution.

Apparently, there is a new way of achieving this from version 1.10.2 onwards: store the column names in a variable and prefix that variable with `..` inside `j`.
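A minimal sketch of the `..` prefix on the question's table, with `cols` as an arbitrary variable name:

```r
library(data.table)  # the .. prefix in j needs data.table >= 1.10.2

mydt <- data.table(foo = c(1, 2), bar = c(2, 3), baz = c(3, 4))

cols <- grep("bar|baz", names(mydt), value = TRUE)

# ..cols tells data.table to look up `cols` in the calling scope
# and use its value as the vector of column names to select
res <- mydt[, ..cols]
print(res)
```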
It seems to work the fastest out of the posted solutions.
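To check a claim like this on your own data, one rough approach is to build a wider table, confirm that the approaches select the same columns, and then time them with base R's `system.time()`. The table size, the `"^V1"` pattern, and the repetition count below are arbitrary assumptions, and no timings are reported here; a dedicated benchmarking package (for example microbenchmark or bench) gives more reliable measurements.

```r
library(data.table)

# Hypothetical wide table (sizes are arbitrary): 10,000 rows x 200 columns,
# which as.data.table() names V1..V200
set.seed(1)
wide <- as.data.table(matrix(runif(1e4 * 200), ncol = 200))

# Columns whose names start with "V1": V1, V10-V19, V100-V199
cols <- grep("^V1", names(wide), value = TRUE)

res_dots     <- wide[, ..cols]
res_with     <- wide[, cols, with = FALSE]
res_patterns <- wide[, .SD, .SDcols = patterns("^V1")]

# All three approaches should select the same columns with the same values
stopifnot(isTRUE(all.equal(res_dots, res_with)),
          isTRUE(all.equal(res_dots, res_patterns)))

# Rough elapsed times for repeating each selection n_rep times
n_rep <- 50
t_dots     <- system.time(for (i in seq_len(n_rep)) wide[, ..cols])
t_with     <- system.time(for (i in seq_len(n_rep)) wide[, cols, with = FALSE])
t_patterns <- system.time(for (i in seq_len(n_rep)) wide[, .SD, .SDcols = patterns("^V1")])

print(c(dots       = t_dots[["elapsed"]],
        with_false = t_with[["elapsed"]],
        patterns   = t_patterns[["elapsed"]]))
```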