I've stored the names of a data.table as a vector:
library(data.table)
set.seed(42)
DT <- data.table(x = runif(100), y = runif(100))
names1 <- names(DT)
As far as I can tell, it's a plain vanilla character vector:
str(names1)
# chr [1:2] "x" "y"
class(names1)
# [1] "character"
dput(names1)
# c("x", "y")
However, this is no ordinary character vector. It's a magic character vector! When I add a new column to my data.table, this vector gets updated!
DT[ , z := runif(100)]
names1
# [1] "x" "y" "z"
I know this has something to do with how := assigns by reference, but this still seems magic to me, as I expect <- to make a copy of the data.table's names.
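For context, the same aliasing applies to the data.table itself: <- does not copy it, so := on the second name also changes the first. A minimal sketch, assuming a recent data.table release; DT2 and DT3 are illustrative names not taken from the question:

```r
library(data.table)

DT <- data.table(x = 1:3, y = 4:6)

# `<-` binds a second name to the same object; no copy is made
DT2 <- DT
DT2[, z := x + y]   # := adds the column in place...
print(names(DT))    # ...so DT sees it too: "x" "y" "z"

# copy() returns an independent (deep) copy
DT3 <- copy(DT)
DT3[, w := 0L]
print(names(DT))    # unchanged: "x" "y" "z"
print(names(DT3))   # "x" "y" "z" "w"
```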
I can fix this by wrapping the names in c():
library(data.table)
set.seed(42)
DT <- data.table(x = runif(100), y = runif(100))
names1 <- names(DT)
names2 <- c(names(DT))
all.equal(names1, names2)
# [1] TRUE
DT[ , z := runif(100)]
names1
# [1] "x" "y" "z"
names2
# [1] "x" "y"
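Besides c(), data.table's own copy() can be applied to the names vector to force an independent copy; its help page (?copy) describes this use. A minimal sketch comparing the two; names3 is an illustrative name not in the question:

```r
library(data.table)

set.seed(42)
DT <- data.table(x = runif(100), y = runif(100))

names1 <- names(DT)          # per the behavior above, tracks DT's names
names2 <- c(names(DT))       # c() allocates a new character vector
names3 <- copy(names(DT))    # explicit deep copy via data.table::copy()

DT[, z := runif(100)]

print(names2)   # "x" "y"
print(names3)   # "x" "y"
```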
My question is twofold:
- Why doesn't names1 <- names(DT) create a copy of the data.table's names? In other instances, we are explicitly warned that <- creates copies, both of data.tables and data.frames.
- What's the difference between names1 <- names(DT) and names2 <- c(names(DT))?
Update: As of version 1.9.3, this behavior is documented in ?copy (as noted in NEWS).

Part of your first question makes it a bit unclear to me what you really mean about the <- operator (at least in the context of data.table), especially this part: "In other instances, we are explicitly warned that <- creates copies, both of data.tables and data.frames."

So, before answering your actual question, I'll briefly touch on that here. In the case of a
data.table, <- (assignment) alone is not sufficient to copy it: after DT2 <- DT, both names refer to the same object, so modifying DT2 by reference (for example, adding a column with :=) modifies DT as well. If you want a copy, you have to ask for one explicitly with the copy() command.

From CauchyDistributedRV, I understand that what you mean is the assignment
names(DT) <- ., which does result in that warning. I'll leave it at that.

Now, to answer your first question: it seems that
names1 <- names(DT) behaves similarly. I hadn't thought about (or known about) this until now. The .Internal(inspect(.)) command is very useful here: inspecting names1 and names(DT) shows that both point to the same memory location, @7fc86a851480. Even the truelength
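of names1 is 100 (data.table over-allocates spare column slots by default; see ?alloc.col), even though its length is only 2. That spare capacity is what lets := append "z" to the existing names vector in place rather than allocating a new one. So basically, the assignment names1 <- names(DT) seems to happen by reference; that is, names1 points to the same location as DT's column-names pointer.

To answer your second question: the command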
c(.) seems to create a copy because there is no check as to whether the result of the concatenation differs from its input. That is, because a c(.) operation can change the contents of the vector, it always produces a new vector (a "copy") without checking whether the contents were actually modified.
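A minimal sketch of the inspection described above, assuming a recent data.table release (which exports address() and truelength()). Addresses differ between sessions, and the truelength value depends on the data.table version and the datatable.alloccol option, so no specific numbers are shown:

```r
library(data.table)

set.seed(42)
DT <- data.table(x = runif(100), y = runif(100))
names1 <- names(DT)
names2 <- c(names(DT))

# Memory addresses as strings; the values differ between sessions
print(address(names1))
print(address(names(DT)))   # expected to equal address(names1)
print(address(names2))      # a fresh allocation made by c()

# Low-level header: address, type, length and truelength
.Internal(inspect(names1))

# length is 2; truelength is expected to be larger, reflecting
# data.table's spare column slots
print(length(names1))
print(truelength(names1))
```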